On Fri, Jun 06, 2014 at 08:35:21AM +0800, Yuyang Du wrote:
> > > Actually, silicon supports independent non-Turbo P-state, but it's just
> > > not enabled.
> >
> > Then it doesn't exist, so no point in mentioning it.
>
> Well, things actually get more complicated. Not-enabled is for Core. For Atom
> Baytrail, each core indeed can operate at a different frequency. I am not
> sure about Xeon, :)

Yes, I understand Atom is an entirely different thing.

> > So frequency isn't _that_ interesting, voltage is. And while
> > predictability might be their assumption, is it actually true? I
> > mean, there's really nothing else except to assume that; if it's not you
> > can't do anything at all, so you _have_ to assume this.
> >
> > But again, is the assumption true? Or just happy thoughts in an attempt
> > to do something.
>
> Voltage is combined with frequency: roughly, voltage is proportional
> to frequency, so roughly, power is proportional to voltage^3. You

P ~ V^2, last time I checked.

> can't say which is more important, and there is no reason to raise
> voltage without raising frequency.

Well, some chips have far fewer voltage steps than frequency steps; or,
differently put, they have multiple frequency steps for a single voltage
level. And since the power (Watts) is proportional to voltage squared,
it's the biggest term.

If you have a distinct voltage level for each frequency, it all doesn't
matter.

> If only one word to say, true or false: it is true. Because given any
> fixed workload, I can't see why performance would be worse if
> frequency is higher.

Well, our work here is to redefine performance as performance/watt. So
running at a higher frequency (and thus likely a higher voltage) is a
definite performance decrease in that sense.

> The reality, as opposed to the assumption, is twofold:
> 1) if the workload is CPU bound, performance scales with frequency
> absolutely; if the workload is memory bound, it does not scale.
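To make the V^2 vs V^3 point above concrete, here is a minimal sketch (my own illustration, not anything from the kernel; the capacitance value and operating points are made up) of the usual dynamic-power model P = C * f * V^2. If voltage scales roughly linearly with frequency, P grows like V^3 overall; but between frequency steps that share one voltage level, power only grows linearly with f:

```python
def dynamic_power(cap_eff, freq_hz, volts):
    """Dynamic (switching) power: P = C * f * V^2."""
    return cap_eff * freq_hz * volts ** 2

# Hypothetical operating points: two frequencies sharing one voltage
# level, and a higher frequency that needs a voltage bump.
C = 1e-9  # effective switched capacitance (made-up value)
p_low  = dynamic_power(C, 1.0e9, 0.9)
p_mid  = dynamic_power(C, 1.5e9, 0.9)   # same voltage: linear in f
p_high = dynamic_power(C, 2.0e9, 1.2)   # voltage bump: V^2 dominates

print(f"{p_mid / p_low:.2f}x")   # 1.5x freq at same V  -> 1.50x power
print(f"{p_high / p_low:.2f}x")  # 2x freq + higher V   -> 3.56x power
```

This is why the voltage term is "the biggest term": the jump that crosses a voltage level costs far more than the same frequency gain within one level.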
> But from the kernel, we don't know whether it is CPU bound
> or not (or it is hard to know). uArch statistics can model that.

Well, we could know for a number of archs; it's just that these
statistics are expensive to track.

Also, lowering the P-state is 'fine', as long as you can 'guarantee' you
don't lose IPC performance, since running at a lower voltage for the
same IPC is actually better IPC/watt than estimated.

But what was said earlier is that the P-state is a lower limit, not a
higher limit. In that case the core can run at a higher voltage and the
estimate is just plain wrong.

> But still, the assumption is a must, or at least does no harm, because
> we adjust frequency continuously. For example, if the workload is
> fixed, and the performance does not scale with frequency, we stop
> increasing frequency. So a good frequency governor or driver should and
> can continuously pursue a "good" frequency for the changing workload.
> Therefore, in the long term, we will be better off.

Sure, but realize that we must fully understand this governor and
integrate it in the scheduler if we're to attain the goal of IPC/watt
optimized scheduling behaviour.

So you (or rather, Intel in general) will have to be very explicit about
how your stuff works; it can no longer hide in some driver and do magic.
The same is true for all other vendors, for that matter.

If you (vendors, not Yuyang specifically) do not want to play (and be
explicit and expose how your hardware functions), then you simply will
not get power-efficient scheduling, full stop.

There are no rocks to hide under, no magic veils to hide behind. You
tell it _in_public_ or you get nothing.
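The governor behaviour being argued about can be sketched in a few lines. This is purely illustrative (it is not the intel_pstate algorithm, and `stall_frac` is a hypothetical uArch statistic standing in for "fraction of cycles stalled on memory"): only raise the P-state when the workload looks CPU bound, i.e. when higher frequency would actually buy IPC rather than just watts.

```python
def next_pstate(pstate, util, stall_frac,
                pstate_min=0, pstate_max=10,
                util_high=0.9, util_low=0.5, stall_limit=0.5):
    """Pick the next P-state from CPU utilization and the fraction of
    cycles stalled on memory (both in [0, 1]; thresholds are made up)."""
    if util > util_high and stall_frac < stall_limit:
        # Busy and CPU bound: a higher frequency should buy real IPC.
        return min(pstate + 1, pstate_max)
    if util < util_low or stall_frac >= stall_limit:
        # Idle-ish or memory bound: a higher frequency is wasted watts.
        return max(pstate - 1, pstate_min)
    return pstate

print(next_pstate(5, util=0.95, stall_frac=0.1))  # CPU bound    -> 6
print(next_pstate(5, util=0.95, stall_frac=0.7))  # memory bound -> 4
print(next_pstate(5, util=0.20, stall_frac=0.1))  # mostly idle  -> 4
```

The point of the thread is exactly that logic like this, and the counters feeding it, must be visible to the scheduler rather than buried in a vendor driver.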