On Fri, Jun 06, 2014 at 08:35:21AM +0800, Yuyang Du wrote:
> > > Actually, silicon supports independent non-Turbo P-state, but it's just
> > > not enabled.
> >
> > Then it doesn't exist, so no point in mentioning it.
>
> Well, things actually get more complicated. Not-enabled is for Core. For Atom
> Baytrail, each core indeed can operate at a different frequency. I am not
> sure about Xeon, :)

Yes, I understand Atom is an entirely different thing.

> > So frequency isn't _that_ interesting, voltage is. And while
> > predictability might be their assumption, is it actually true? I
> > mean, there's really nothing else except to assume that; if it's not you
> > can't do anything at all, so you _have_ to assume this.
> >
> > But again, is the assumption true? Or just happy thoughts in an attempt
> > to do something.
>
> Voltage is combined with frequency: roughly, voltage is proportional
> to frequency, so roughly, power is proportional to voltage^3. You

P ~ V^2, last time I checked.

> can't say which is more important, and there is no reason to raise
> voltage without raising frequency.

Well, some chips have far fewer voltage steps than frequency steps; or,
differently put, they have multiple frequency steps for a single voltage
level. And since the power (Watts) is proportional to voltage squared,
it's the biggest term.

If you have a distinct voltage level for each frequency, it all doesn't
matter.

> If only one word to say, true or false: it is true. Because given any
> fixed workload, I can't see why performance would be worse if
> frequency is higher.

Well, our work here is to redefine performance as performance/watt. So
running at a higher frequency (and thus likely a higher voltage) is a
definite performance decrease in that sense.

> The reality, as opposed to the assumption, is twofold:
> 1) if the workload is CPU bound, performance scales with frequency
> absolutely; if the workload is memory bound, it does not scale.
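To make the V^2 vs V^3 point above concrete, here is a minimal sketch (my own illustration, not anything from the kernel; the capacitance value and operating points are made up) of the usual dynamic-power model P = C * f * V^2. If voltage scales roughly linearly with frequency, P grows like V^3 overall; but between frequency steps that share one voltage level, power only grows linearly with f:

```python
def dynamic_power(cap_eff, freq_hz, volts):
    """Dynamic (switching) power: P = C * f * V^2."""
    return cap_eff * freq_hz * volts ** 2

# Hypothetical operating points: two frequencies sharing one voltage
# level, and a higher frequency that needs a voltage bump.
C = 1e-9  # effective switched capacitance (made-up value)
p_low  = dynamic_power(C, 1.0e9, 0.9)
p_mid  = dynamic_power(C, 1.5e9, 0.9)   # same voltage: linear in f
p_high = dynamic_power(C, 2.0e9, 1.2)   # voltage bump: V^2 dominates

print(f"{p_mid / p_low:.2f}x")   # 1.5x freq at same V  -> 1.50x power
print(f"{p_high / p_low:.2f}x")  # 2x freq + higher V   -> 3.56x power
```

This is why the voltage term is "the biggest term": the jump that crosses a voltage level costs far more than the same frequency gain within one level.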
> But from the kernel, we don't know whether it is CPU bound
> or not (or it is hard to know). uArch statistics can model that.

Well, we could know for a number of archs; it's just that these
statistics are expensive to track.

Also, lowering the P-state is 'fine', as long as you can 'guarantee' you
don't lose IPC performance, since running at a lower voltage for the
same IPC is actually better IPC/watt than estimated.

But what was said earlier is that the P-state is a lower limit, not a
higher limit. In that case the core can run at a higher voltage and the
estimate is just plain wrong.

> But still, the assumption is a must, or at least does no harm, because
> we adjust frequency continuously. For example, if the workload is
> fixed, and the performance does not scale with frequency, we stop
> increasing frequency. So a good frequency governor or driver should and
> can continuously pursue a "good" frequency for the changing workload.
> Therefore, in the long term, we will be better off.

Sure, but realize that we must fully understand this governor and
integrate it in the scheduler if we're to attain the goal of IPC/watt
optimized scheduling behaviour.

So you (or rather, Intel in general) will have to be very explicit about
how your stuff works; it can no longer hide in some driver and do magic.
The same is true for all other vendors, for that matter.

If you (vendors, not Yuyang specifically) do not want to play (and be
explicit and expose how your hardware functions), then you simply will
not get power-efficient scheduling, full stop.

There are no rocks to hide under, no magic veils to hide behind. You
tell it _in_public_ or you get nothing.
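The governor behaviour being argued about can be sketched in a few lines. This is purely illustrative (it is not the intel_pstate algorithm, and `stall_frac` is a hypothetical uArch statistic standing in for "fraction of cycles stalled on memory"): only raise the P-state when the workload looks CPU bound, i.e. when higher frequency would actually buy IPC rather than just watts.

```python
def next_pstate(pstate, util, stall_frac,
                pstate_min=0, pstate_max=10,
                util_high=0.9, util_low=0.5, stall_limit=0.5):
    """Pick the next P-state from CPU utilization and the fraction of
    cycles stalled on memory (both in [0, 1]; thresholds are made up)."""
    if util > util_high and stall_frac < stall_limit:
        # Busy and CPU bound: a higher frequency should buy real IPC.
        return min(pstate + 1, pstate_max)
    if util < util_low or stall_frac >= stall_limit:
        # Idle-ish or memory bound: a higher frequency is wasted watts.
        return max(pstate - 1, pstate_min)
    return pstate

print(next_pstate(5, util=0.95, stall_frac=0.1))  # CPU bound    -> 6
print(next_pstate(5, util=0.95, stall_frac=0.7))  # memory bound -> 4
print(next_pstate(5, util=0.20, stall_frac=0.1))  # mostly idle  -> 4
```

The point of the thread is exactly that logic like this, and the counters feeding it, must be visible to the scheduler rather than buried in a vendor driver.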