From: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Youquan Song <youquan.song@intel.com>,
	linux-kernel@vger.kernel.org, mingo@elte.hu, tglx@linutronix.de,
	hpa@zytor.com, akpm@linux-foundation.org, stable@vger.kernel.org,
	suresh.b.siddha@intel.com, arjan@linux.intel.com,
	len.brown@intel.com, anhua.xu@intel.com, chaohong.guo@intel.com,
	Youquan Song <youquan.song@linux.intel.com>,
	Paul Turner <pjt@google.com>
Subject: Re: [PATCH] x86,sched: Fix sched_smt_power_savings totally broken
Date: Mon, 9 Jan 2012 22:35:21 +0530
Message-ID: <20120109170521.GB29142@dirshya.in.ibm.com>
In-Reply-To: <1326125597.2442.90.camel@twins>

* Peter Zijlstra <peterz@infradead.org> [2012-01-09 17:13:17]:

> On Mon, 2012-01-09 at 21:33 +0530, Vaidyanathan Srinivasan wrote:
> 
> > Yes, based on the architecture and topology, we do have two sweet
> > spots for power vs performance trade offs.  The first level should be
> > to reduce power savings with marginal performance impact and second
> > one will be to go for the most aggressive power savings.
> 
> Colour me unconvinced, cache heavy workloads will suffer greatly from
> your 1.

Certain workloads will be hit heavily by '1', so a default of '1' is
bad for those cases.  However, many general workloads could see good
power savings with only a marginal performance loss.  For those
general cases, we could keep '1' as the default.

> > The first one should generally be recommended as default to have
> > a right balance between performance and power savings, while the
> > second one should be used for reducing power consumption on
> > unimportant workloads or under certain constraints.
> > 
> > Some example policies:
> > 
> > sched_powersavings=1:
> > 
> >         Enable consolidation at MC level
> > 
> > sched_powersavings=2:
> > 
> >         Enable aggressive consolidation at MC level and SMT level if
> >         available. In case arch can benefit from cross node
> >         consolidation, then enable it.
> 
> You fail for mentioning MC/SMT..

My point was that SMT thread-level consolidation comes with a larger
performance loss than core-level consolidation.  We need not expose
this setting to the end user, but the kernel can choose 'what' to
enable at '2' based on architecture/topology.
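
As a rough illustration of that split, here is a minimal userspace
sketch of how a single level could be mapped to the consolidation
domains described above.  Every name in it (powersavings_level,
topology_caps, consolidation_mask and the CONSOLIDATE_* flags) is
hypothetical and invented for this example; none of them is an
existing kernel interface.

```c
#include <stdbool.h>

/* Hypothetical user-visible levels, mirroring sched_powersavings=0/1/2. */
enum powersavings_level {
	POWERSAVINGS_OFF = 0,
	POWERSAVINGS_BALANCED = 1,	/* consolidate at MC level only */
	POWERSAVINGS_AGGRESSIVE = 2,	/* MC plus whatever the topology allows */
};

/* Hypothetical description of what the architecture/topology offers. */
struct topology_caps {
	bool has_smt;			/* SMT threads are present */
	bool node_consolidation_helps;	/* arch benefits from cross-node packing */
};

/* Hypothetical flags naming the domains at which tasks get consolidated. */
enum {
	CONSOLIDATE_MC   = 1u << 0,
	CONSOLIDATE_SMT  = 1u << 1,
	CONSOLIDATE_NODE = 1u << 2,
};

/*
 * The user picks only the level; this function, not the user, decides
 * 'what' is enabled at '2' from the topology description.
 */
static unsigned int consolidation_mask(enum powersavings_level level,
				       const struct topology_caps *caps)
{
	unsigned int mask = 0;

	if (level >= POWERSAVINGS_BALANCED)
		mask |= CONSOLIDATE_MC;

	if (level >= POWERSAVINGS_AGGRESSIVE) {
		if (caps->has_smt)
			mask |= CONSOLIDATE_SMT;
		if (caps->node_consolidation_helps)
			mask |= CONSOLIDATE_NODE;
	}

	return mask;
}
```

In this sketch level '1' always yields MC-level consolidation alone,
while level '2' adds SMT or cross-node consolidation only where the
topology reports it as available or beneficial.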

> > Having the above simple split in policy will enable wide adoption
> > where the first level can be a recommended default.  Having just
> > a boolean enable/disable will mean the end-user will have to decide
> > when to turn on and later off for best workload experience.
> 
> Picking one of two states is too hard, hence we given them one of three
> states to pick from.. How does that make sense?

OK, I am suggesting that having three states will allow the user to
decide 'once' and leave the setting alone, rather than keep switching
between enable and disable.

I am suggesting that designing powersavings=1 as a good default will
make adoption simple.  On the other hand, having only
power_savings=enable would mean users have to decide 'when' to
enable it based on some policy, since leaving it enabled could affect
overall performance significantly.

> > Just similar to cpufreq policy of performance, ondemand and powersave.
> > They have their unique use cases and this design choice helps us ship
> > ondemand as default.
> 
> You fail for thinking having multiple cpufreq governors is a good thing.
> The result is that they all suck.

The majority of users are served by the good default 'ondemand'
governor, which gives good power savings without affecting
performance much.  If we had only the performance and powersave
governors, we would have to choose 'performance' as the default and
design a method to choose powersave only when utilization is low.

I agree with you that we have more cpufreq governors and tunables
than required.  But a good default covers most use cases, leaving the
rest for corner cases and workload-specific tuning.  On modern
systems, cpuidle states and scheduling policy have become significant
power-savings tradeoffs, and hence we will need the flexibility.

--Vaidy




