* [RFC] [PATCH 0/3] sched: Support for real CPU runtime and SMT scaling
@ 2015-01-30 14:02 Philipp Hachtmann
  2015-01-30 14:02 ` [PATCH 1/3] sched: Support for CPU runtime and SMT based adaption Philipp Hachtmann
                   ` (3 more replies)
  0 siblings, 4 replies; 7+ messages in thread
From: Philipp Hachtmann @ 2015-01-30 14:02 UTC (permalink / raw)
  To: mingo, peterz, linux-kernel
  Cc: heiko.carstens, linux-s390, schwidefsky, Philipp Hachtmann

Hello,

When using "real" processors the scheduler can make its decisions based
on wall time. But CPUs under hypervisor control are sometimes
unavailable without further notice to the guest operating system.
Using wall time for scheduling decisions in this case leads to unfair
decisions and an erroneous distribution of CPU bandwidth when cgroups
are used.
On (at least) S390 every CPU has a timer that counts the real execution
time from IPL. When the hypervisor has scheduled out the CPU, the timer
is stopped. So it is desirable to use this timer as a source for the
scheduler's rq runtime calculations.

On SMT systems the runtime consumed by a task may be worth more or
less, depending on whether the task ran alone on its core during the
last delta. The runtime should therefore be scaled based on the
current CPU utilization.

The first patch introduces two small hooks for the optional
architecture functions cpu_exec_time and scale_rq_clock_delta.
Calls to cpu_exec_time replace a few calls to sched_clock_cpu but fall
back to sched_clock_cpu if the architecture does not define
cpu_exec_time. The call to scale_rq_clock_delta is added to
update_rq_clock (sched/core.c) and defaults to a NOP when not defined
by architecture code.

Regards

Philipp


Philipp Hachtmann (3):
  sched: Support for CPU runtime and SMT based adaption
  s390/cputime: Provide CPU runtime since IPL
  s390/cputime: SMT based scaling of CPU runtime deltas

 arch/s390/include/asm/cputime.h | 31 +++++++++++++++++++++++++++++++
 arch/s390/kernel/vtime.c        |  4 ++--
 kernel/sched/core.c             |  4 +++-
 kernel/sched/fair.c             |  8 ++++----
 kernel/sched/sched.h            |  8 ++++++++
 5 files changed, 48 insertions(+), 7 deletions(-)

-- 
2.1.4



* [PATCH 1/3] sched: Support for CPU runtime and SMT based adaption
  2015-01-30 14:02 [RFC] [PATCH 0/3] sched: Support for real CPU runtime and SMT scaling Philipp Hachtmann
@ 2015-01-30 14:02 ` Philipp Hachtmann
  2015-01-30 14:02 ` [PATCH 2/3] s390/cputime: Provide CPU runtime since IPL Philipp Hachtmann
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 7+ messages in thread
From: Philipp Hachtmann @ 2015-01-30 14:02 UTC (permalink / raw)
  To: mingo, peterz, linux-kernel
  Cc: heiko.carstens, linux-s390, schwidefsky, Philipp Hachtmann

On virtualized systems like s390 the CPU runtimes used in the
scheduler's calculations must be adapted to correctly represent real
CPU working time instead of slices of wall time.
On SMT CPUs this real runtime may additionally have to be scaled
depending on the number of threads (Linux: CPUs) active in the same
core.

This patch changes some calls to sched_clock_cpu into calls to cpu_exec_time.
cpu_exec_time is defined as sched_clock_cpu by default but can be overridden
by architecture code to provide precise CPU runtime timestamps.

One might think it would be better to override the weak symbol
sched_clock instead of adding something new. This seems to be
impossible because sched_clock is used by other facilities (like
printk timestamping) which assume that it delivers wall time rather
than a purely virtual time stamp which differs from CPU to CPU.

The second hook is a call to an architecture function scale_rq_clock_delta
which additionally scales the calculated delta by an SMT based factor.

Signed-off-by: Philipp Hachtmann <phacht@linux.vnet.ibm.com>
---
 kernel/sched/core.c  | 4 +++-
 kernel/sched/fair.c  | 8 ++++----
 kernel/sched/sched.h | 8 ++++++++
 3 files changed, 15 insertions(+), 5 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 89e7283..c611055 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -122,9 +122,11 @@ void update_rq_clock(struct rq *rq)
 	if (rq->skip_clock_update > 0)
 		return;
 
-	delta = sched_clock_cpu(cpu_of(rq)) - rq->clock;
+	delta = cpu_exec_time(cpu_of(rq)) - rq->clock;
 	if (delta < 0)
 		return;
+
+	scale_rq_clock_delta(&delta);
 	rq->clock += delta;
 	update_rq_clock_task(rq, delta);
 }
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index ef2b104..4921d1d 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -3180,7 +3180,7 @@ static inline u64 sched_cfs_bandwidth_slice(void)
 
 /*
  * Replenish runtime according to assigned quota and update expiration time.
- * We use sched_clock_cpu directly instead of rq->clock to avoid adding
+ * We use cpu_exec_time directly instead of rq->clock to avoid adding
  * additional synchronization around rq->lock.
  *
  * requires cfs_b->lock
@@ -3192,7 +3192,7 @@ void __refill_cfs_bandwidth_runtime(struct cfs_bandwidth *cfs_b)
 	if (cfs_b->quota == RUNTIME_INF)
 		return;
 
-	now = sched_clock_cpu(smp_processor_id());
+	now = cpu_exec_time(smp_processor_id());
 	cfs_b->runtime = cfs_b->quota;
 	cfs_b->runtime_expires = now + ktime_to_ns(cfs_b->period);
 }
@@ -6969,13 +6969,13 @@ static int idle_balance(struct rq *this_rq)
 		}
 
 		if (sd->flags & SD_BALANCE_NEWIDLE) {
-			t0 = sched_clock_cpu(this_cpu);
+			t0 = cpu_exec_time(this_cpu);
 
 			pulled_task = load_balance(this_cpu, this_rq,
 						   sd, CPU_NEWLY_IDLE,
 						   &continue_balancing);
 
-			domain_cost = sched_clock_cpu(this_cpu) - t0;
+			domain_cost = cpu_exec_time(this_cpu) - t0;
 			if (domain_cost > sd->max_newidle_lb_cost)
 				sd->max_newidle_lb_cost = domain_cost;
 
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 2df8ef0..720664f 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1569,3 +1569,11 @@ static inline u64 irq_time_read(int cpu)
 }
 #endif /* CONFIG_64BIT */
 #endif /* CONFIG_IRQ_TIME_ACCOUNTING */
+
+#ifndef cpu_exec_time
+#define cpu_exec_time sched_clock_cpu
+#endif
+
+#ifndef scale_rq_clock_delta
+#define scale_rq_clock_delta(arg)
+#endif
-- 
2.1.4



* [PATCH 2/3] s390/cputime: Provide CPU runtime since IPL
  2015-01-30 14:02 [RFC] [PATCH 0/3] sched: Support for real CPU runtime and SMT scaling Philipp Hachtmann
  2015-01-30 14:02 ` [PATCH 1/3] sched: Support for CPU runtime and SMT based adaption Philipp Hachtmann
@ 2015-01-30 14:02 ` Philipp Hachtmann
  2015-01-30 14:02 ` [PATCH 3/3] s390/cputime: SMT based scaling of CPU runtime deltas Philipp Hachtmann
  2015-01-31 11:43 ` [RFC] [PATCH 0/3] sched: Support for real CPU runtime and SMT scaling Peter Zijlstra
  3 siblings, 0 replies; 7+ messages in thread
From: Philipp Hachtmann @ 2015-01-30 14:02 UTC (permalink / raw)
  To: mingo, peterz, linux-kernel
  Cc: heiko.carstens, linux-s390, schwidefsky, Philipp Hachtmann

The CPU maintains a CPU timer which runs only while the CPU is
available to the system (i.e. not scheduled away by the hypervisor).
This patch introduces a function cpu_exec_time that returns a time
stamp reflecting the CPU's real processing time since IPL.

Signed-off-by: Philipp Hachtmann <phacht@linux.vnet.ibm.com>
---
 arch/s390/include/asm/cputime.h | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/arch/s390/include/asm/cputime.h b/arch/s390/include/asm/cputime.h
index b91e960..fee50dd 100644
--- a/arch/s390/include/asm/cputime.h
+++ b/arch/s390/include/asm/cputime.h
@@ -167,6 +167,22 @@ static inline clock_t cputime64_to_clock_t(cputime64_t cputime)
 	return clock;
 }
 
+/*
+ * Read out the current CPU's timer
+ *
+ * Returns an incrementing time stamp in ns.
+ */
+static inline u64 cpu_exec_time(int cpu_unused)
+{
+	u64 timer;
+	asm volatile(
+		"	stpt	%0\n"	/* Store current cpu timer value */
+		: "=m" (timer));
+
+	return ((ULLONG_MAX - timer) * 1000) / 4096;
+}
+#define cpu_exec_time cpu_exec_time
+
 cputime64_t arch_cpu_idle_time(int cpu);
 
 #define arch_idle_time(cpu) arch_cpu_idle_time(cpu)
-- 
2.1.4



* [PATCH 3/3] s390/cputime: SMT based scaling of CPU runtime deltas
  2015-01-30 14:02 [RFC] [PATCH 0/3] sched: Support for real CPU runtime and SMT scaling Philipp Hachtmann
  2015-01-30 14:02 ` [PATCH 1/3] sched: Support for CPU runtime and SMT based adaption Philipp Hachtmann
  2015-01-30 14:02 ` [PATCH 2/3] s390/cputime: Provide CPU runtime since IPL Philipp Hachtmann
@ 2015-01-30 14:02 ` Philipp Hachtmann
  2015-01-31 11:43 ` [RFC] [PATCH 0/3] sched: Support for real CPU runtime and SMT scaling Peter Zijlstra
  3 siblings, 0 replies; 7+ messages in thread
From: Philipp Hachtmann @ 2015-01-30 14:02 UTC (permalink / raw)
  To: mingo, peterz, linux-kernel
  Cc: heiko.carstens, linux-s390, schwidefsky, Philipp Hachtmann

The scheduler calculates CPU runtime deltas to account for a task's
runtime. These deltas have to be adapted on SMT CPUs to reflect the real
processing power consumed during the last delta.
This patch introduces scale_rq_clock_delta which is used by the
scheduler to scale the calculated delta by an SMT based factor.

Signed-off-by: Philipp Hachtmann <phacht@linux.vnet.ibm.com>
---
 arch/s390/include/asm/cputime.h | 17 ++++++++++++++++-
 arch/s390/kernel/vtime.c        |  4 ++--
 2 files changed, 18 insertions(+), 3 deletions(-)

diff --git a/arch/s390/include/asm/cputime.h b/arch/s390/include/asm/cputime.h
index fee50dd..e44f285 100644
--- a/arch/s390/include/asm/cputime.h
+++ b/arch/s390/include/asm/cputime.h
@@ -8,6 +8,8 @@
 #define _S390_CPUTIME_H
 
 #include <linux/types.h>
+#include <linux/percpu.h>
+#include <asm/smp.h>
 #include <asm/div64.h>
 
 #define CPUTIME_PER_USEC 4096ULL
@@ -20,6 +22,9 @@ typedef unsigned long long __nocast cputime64_t;
 
 #define cmpxchg_cputime(ptr, old, new) cmpxchg64(ptr, old, new)
 
+DECLARE_PER_CPU(u64, mt_scaling_mult);
+DECLARE_PER_CPU(u64, mt_scaling_div);
+
 static inline unsigned long __div(unsigned long long n, unsigned long base)
 {
 #ifndef CONFIG_64BIT
@@ -172,6 +177,7 @@ static inline clock_t cputime64_to_clock_t(cputime64_t cputime)
  *
  * Returns an incrementing time stamp in ns.
  */
+#define cpu_exec_time cpu_exec_time
 static inline u64 cpu_exec_time(int cpu_unused)
 {
 	u64 timer;
@@ -181,7 +187,16 @@ static inline u64 cpu_exec_time(int cpu_unused)
 
 	return ((ULLONG_MAX - timer) * 1000) / 4096;
 }
-#define cpu_exec_time cpu_exec_time
+
+#define scale_rq_clock_delta scale_rq_clock_delta
+static inline void scale_rq_clock_delta(u64 *delta)
+{
+	u64 mult = __get_cpu_var(mt_scaling_mult);
+	u64 div = __get_cpu_var(mt_scaling_div);
+
+	if (smp_cpu_mtid)
+		*delta = (*delta * mult) / div;
+}
 
 cputime64_t arch_cpu_idle_time(int cpu);
 
diff --git a/arch/s390/kernel/vtime.c b/arch/s390/kernel/vtime.c
index 97b3c12..5d9bbd0 100644
--- a/arch/s390/kernel/vtime.c
+++ b/arch/s390/kernel/vtime.c
@@ -26,8 +26,8 @@ static atomic64_t virt_timer_current;
 static atomic64_t virt_timer_elapsed;
 
 static DEFINE_PER_CPU(u64, mt_cycles[32]);
-static DEFINE_PER_CPU(u64, mt_scaling_mult) = { 1 };
-static DEFINE_PER_CPU(u64, mt_scaling_div) = { 1 };
+DEFINE_PER_CPU(u64, mt_scaling_mult) = { 1 };
+DEFINE_PER_CPU(u64, mt_scaling_div) = { 1 };
 
 static inline u64 get_vtimer(void)
 {
-- 
2.1.4



* Re: [RFC] [PATCH 0/3] sched: Support for real CPU runtime and SMT scaling
  2015-01-30 14:02 [RFC] [PATCH 0/3] sched: Support for real CPU runtime and SMT scaling Philipp Hachtmann
                   ` (2 preceding siblings ...)
  2015-01-30 14:02 ` [PATCH 3/3] s390/cputime: SMT based scaling of CPU runtime deltas Philipp Hachtmann
@ 2015-01-31 11:43 ` Peter Zijlstra
  2015-02-03 14:11   ` Martin Schwidefsky
  3 siblings, 1 reply; 7+ messages in thread
From: Peter Zijlstra @ 2015-01-31 11:43 UTC (permalink / raw)
  To: Philipp Hachtmann
  Cc: mingo, linux-kernel, heiko.carstens, linux-s390, schwidefsky

On Fri, Jan 30, 2015 at 03:02:39PM +0100, Philipp Hachtmann wrote:
> Hello,
> 
> when using "real" processors the scheduler can make its decisions based
> on wall time. But CPUs under hypervisor control are sometimes
> unavailable without further notice to the guest operating system.
> Using wall time for scheduling decisions in this case will lead to
> unfair decisions and erroneous distribution of CPU bandwidth when
> using cgroups.
> On (at least) S390 every CPU has a timer that counts the real execution
> time from IPL. When the hypervisor has scheduled out the CPU, the timer
> is stopped. So it is desirable to use this timer as a source for the
> scheduler's rq runtime calculations.
> 
> On SMT systems the consumed runtime of a task might be worth  more
> or less depending on the fact that the task can have run alone or not
> during the last delta. This should be scalable based on the current
> CPU utilization.

So we've explicitly never done this before because at the end of the
day it's wall time that people using the computer react to.

Also, once you open this door you can have endless discussions of what
constitutes work. People might want to use instructions retired for
instance, to normalize against pipeline stalls.

Also, if your hypervisor starves its vcpus of compute time; how is that
our problem?

Furthermore, we already have some stealtime accounting in
update_rq_clock_task() for the virt crazies^Wpeople.


* Re: [RFC] [PATCH 0/3] sched: Support for real CPU runtime and SMT scaling
  2015-01-31 11:43 ` [RFC] [PATCH 0/3] sched: Support for real CPU runtime and SMT scaling Peter Zijlstra
@ 2015-02-03 14:11   ` Martin Schwidefsky
  2015-02-05 11:24     ` Peter Zijlstra
  0 siblings, 1 reply; 7+ messages in thread
From: Martin Schwidefsky @ 2015-02-03 14:11 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Philipp Hachtmann, mingo, linux-kernel, heiko.carstens, linux-s390

On Sat, 31 Jan 2015 12:43:07 +0100
Peter Zijlstra <peterz@infradead.org> wrote:

> On Fri, Jan 30, 2015 at 03:02:39PM +0100, Philipp Hachtmann wrote:
> > Hello,
> > 
> > when using "real" processors the scheduler can make its decisions based
> > on wall time. But CPUs under hypervisor control are sometimes
> > unavailable without further notice to the guest operating system.
> > Using wall time for scheduling decisions in this case will lead to
> > unfair decisions and erroneous distribution of CPU bandwidth when
> > using cgroups.
> > On (at least) S390 every CPU has a timer that counts the real execution
> > time from IPL. When the hypervisor has scheduled out the CPU, the timer
> > is stopped. So it is desirable to use this timer as a source for the
> > scheduler's rq runtime calculations.
> > 
> > On SMT systems the consumed runtime of a task might be worth  more
> > or less depending on the fact that the task can have run alone or not
> > during the last delta. This should be scalable based on the current
> > CPU utilization.
> 
> So we've explicitly never done this before because at the end of the day
> its wall time that people using the computer react to.

Oh yes, absolutely. That is why we go to all the pain with virtual
cputime: to get at the absolute time a process has been running on a
CPU *without* the steal time. Only the scheduler "thinks" in
wall-clock because sched_clock is defined to return nanoseconds since
boot.
 
> Also, once you open this door you can have endless discussions of what
> constitutes work. People might want to use instructions retired for
> instance, to normalize against pipeline stalls.

Yes, we had that discussion in the design for SMT as well. In the end
the user's view is ambivalent; we got used to a simplified approach.
A process that runs on a CPU 100% of the wall-time gets 100% CPU,
ignoring pipeline stalls, cache misses, temperature throttling and so on.
But with SMT we suddenly complain about the other thread on the core
impacting the work.
 
> Also, if your hypervisor starves its vcpus of compute time; how is that
> our problem?

Because we see the effects of that starvation in the guest OS, no?
 
> Furthermore, we already have some stealtime accounting in
> update_rq_clock_task() for the virt crazies^Wpeople.

Yes, defining PARAVIRT_TIME_ACCOUNTING and a paravirt_steal_clock
would solve one of the problems (the one with the cpu_exec_time hook).
But it does so in an indirect way; on s390 we do have an instruction
for that ..

Which leaves the second hook scale_rq_clock_delta. That one only makes
sense if the steal time has been subtracted from sched_clock. It scales
the delta with the average number of threads that have been running
in the last interval. Basically if two threads are running the delta
is halved.

This technique has an interesting effect. Consider a setup with 2-way
SMT and CFS bandwidth control. With the new cpu_exec_time hook the
time counted against the quota is normalized with the average thread
density. Two logical CPUs on a core use the same quota as a single
logical CPU on a core. In effect by specifying a quota as a multiple
of the period you can limit a group to use the CPU capacity of as
many *cores*. This avoids that nasty group scheduling issue we
briefly talked about ..

-- 
blue skies,
   Martin.

"Reality continues to ruin my life." - Calvin.



* Re: [RFC] [PATCH 0/3] sched: Support for real CPU runtime and SMT scaling
  2015-02-03 14:11   ` Martin Schwidefsky
@ 2015-02-05 11:24     ` Peter Zijlstra
  0 siblings, 0 replies; 7+ messages in thread
From: Peter Zijlstra @ 2015-02-05 11:24 UTC (permalink / raw)
  To: Martin Schwidefsky
  Cc: Philipp Hachtmann, mingo, linux-kernel, heiko.carstens, linux-s390

On Tue, Feb 03, 2015 at 03:11:12PM +0100, Martin Schwidefsky wrote:
> On Sat, 31 Jan 2015 12:43:07 +0100
> Peter Zijlstra <peterz@infradead.org> wrote:
> 
> > On Fri, Jan 30, 2015 at 03:02:39PM +0100, Philipp Hachtmann wrote:
> > > Hello,
> > > 
> > > when using "real" processors the scheduler can make its decisions based
> > > on wall time. But CPUs under hypervisor control are sometimes
> > > unavailable without further notice to the guest operating system.
> > > Using wall time for scheduling decisions in this case will lead to
> > > unfair decisions and erroneous distribution of CPU bandwidth when
> > > using cgroups.
> > > On (at least) S390 every CPU has a timer that counts the real execution
> > > time from IPL. When the hypervisor has scheduled out the CPU, the timer
> > > is stopped. So it is desirable to use this timer as a source for the
> > > scheduler's rq runtime calculations.
> > > 
> > > On SMT systems the consumed runtime of a task might be worth  more
> > > or less depending on the fact that the task can have run alone or not
> > > during the last delta. This should be scalable based on the current
> > > CPU utilization.
> > 
> > So we've explicitly never done this before because at the end of the day
> > its wall time that people using the computer react to.
> 
> Oh yes, absolutely. That is why we go to all the pain with virtual cputime.
> That is to get to the absolute time a process has been running on a CPU
> *without* the steal time. Only the scheduler "thinks" in wall-clock because
> sched_clock is defined to return nano-seconds since boot.

I'm not entirely sure what you're trying to say there, but if it's
agreement -- as the first few words seem to suggest -- then I'll leave
it at that ;-)

> > Also, once you open this door you can have endless discussions of what
> > constitutes work. People might want to use instructions retired for
> > instance, to normalize against pipeline stalls.
> 
> Yes, we had that discussion in the design for SMT as well. In the end
> the view of a user is ambivalent, we got used to a simplified approach.
> A process that runs on a CPU 100% of the wall-time gets 100% CPU,
> ignoring pipeline stalls, cache misses, temperature throttling and so on.
> But with SMT we suddenly complain about the other thread on the core
> impacting the work.

Welcome to SMT ;-) So far our approach has been, tough luck. That's what
you get, and I see no reason to change that for s390.

For x86, sparc, powerpc, mips, ia64 who all have SMT we completely
ignore the fact that the (logical) CPU is suddenly slower than it was.
In that respect it's no different from cpufreq mucking about with your
clock speeds. We account the task runtime in walltime, irrespective of
what might (or might not) have ran on a sibling.

Now, there is a bunch of people that want to do DVFS accounting; but
that is mostly so we can guestimate relative gain; like can I fit this
new task by making the CPU go faster or should I use this other CPU.

Also, how does your hypervisor thingy deal with vcpu vs SMT? Does it
schedule it like any other logical CPU and Linux is completely oblivious
to actual machine topology?

> > Also, if your hypervisor starves its vcpus of compute time; how is that
> > our problem?
> 
> Because we see the effects of that starvation in the guest OS, no?

But why ruin Linux for an arguably broken hypervisor? If your HV causes
starvation, fix that.

> > Furthermore, we already have some stealtime accounting in
> > update_rq_clock_task() for the virt crazies^Wpeople.
> 
> Yes, defining PARAVIRT_TIME_ACCOUNTING and a paravirt_steal_clock would
> solve one of the problems (the one with the cpu_exec_time hook). But
> it does so in an indirect way, for s390 we do have an instruction for
> that ..

Of course you do! How's work on the crystal ball instruction coming? ;-)

I really _really_ like to not have more than 1 virt means of mucking
with time. I detest virt (everybody knows that, right?) and having all
the virt flavours of the month do different things to me makes me sad.

Computing steal time should not be too expensive for you right? Just
take the walltime and subtract this new time. Maybe you can even
micro-code a new instruction to do that for you :-)

> Which leaves the second hook scale_rq_clock_delta. That one only makes
> sense if the steal time has been subtracted from sched_clock. It scales
> the delta with the average number of threads that have been running
> in the last interval. Basically if two threads are running the delta
> is halved.

Right; so the patches were decidedly light on detail there. I'm very
sure I did not get what you were attempting to do there, and I'm not
sure I do now.

Isn't the whole point of SMT to get _more_ than a single thread of
performance out of a core?

> This technique has an interesting effect. Consider a setup with 2-way
> SMT and CFS bandwidth control. With the new cpu_exec_time hook the
> time counted against the quota is normalized with the average thread
> density. Two logical CPUs on a core use the same quota as a single
> logical CPU on a core. In effect by specifying a quota as a multiple
> of the period you can limit a group to use the CPU capacity of as
> many *cores*.

*groan*... So we muck about with time because you want to do accounting
tricks? That should have been in big bright neon letters in a comment
somewhere. Not squirreled away in a detail.

Arguably one could make that an (optional) feature of
account_cfs_rq_runtime() and only affect the accounting while leaving
the actual scheduling alone.

This needs more thought and certainly more description.

> This avoids that nasty group scheduling issue we
> briefly talked about ..

I remember we did talk; I'm afraid however I seem to have lost many of
the details in the post baby haze (which still hasn't entirely lifted).



