linux-kernel.vger.kernel.org archive mirror
* [PATCH][2.5.32] CPU frequency and voltage scaling (0/4)
@ 2002-08-28 11:46 Dominik Brodowski
  2002-08-28 18:47 ` Linus Torvalds
  0 siblings, 1 reply; 46+ messages in thread
From: Dominik Brodowski @ 2002-08-28 11:46 UTC
  To: torvalds; +Cc: cpufreq, linux-kernel


Hi Linus, lkml,

The following patches add CPU frequency and voltage scaling
support (Intel SpeedStep, AMD PowerNow, etc.) to kernel 2.5.32.


Patch 1/4: cpufreq-core
-----------------------
The cpufreq core offers a common interface to the CPU clock 
speed features of ARM, PPC and x86 CPUs.  

For communication with user space, sysctl entries are placed in
/proc/sys/cpu/{0,1,...,NR_CPUS-1}/ .  Entries provided are:

	speed-min  (readonly)
	speed-max  (readonly)
	speed-sync (readonly - all CPUs need the same frequency,
	                       changes affect all CPUs)
	speed      (read/write)
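
As a rough illustration of how user space might consume these entries,
here is a minimal Python sketch. The entry names come from the list above;
the assumption that each file holds a single integer (taken here to be kHz)
and the helper names read_speed_entries and write_speed are hypothetical
and not part of the patch.

```python
import os

# Entry names are taken from the patch description; the file contents
# (one plain integer, assumed to be kHz) are an assumption for illustration.
ENTRIES = ("speed-min", "speed-max", "speed-sync", "speed")


def read_speed_entries(cpu, base="/proc/sys/cpu"):
    """Return {entry name: int} for one CPU, skipping unreadable entries."""
    values = {}
    for name in ENTRIES:
        path = os.path.join(base, str(cpu), name)
        try:
            with open(path) as f:
                values[name] = int(f.read().strip())
        except (OSError, ValueError):
            continue  # entry absent or not a plain integer
    return values


def write_speed(cpu, khz, base="/proc/sys/cpu"):
    """Request a new speed through the only read/write entry."""
    with open(os.path.join(base, str(cpu), "speed"), "w") as f:
        f.write(f"{khz}\n")
```

The base directory is a parameter so the helpers can be exercised against
a scratch directory on systems that do not provide this sysctl tree.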

In order for this code to be built, an architecture must define the
CONFIG_CPU_FREQ configuration symbol.  The merged ARM code already
has the necessary configuration in place; the i386 code follows in
parts 2 and 3.

On ARM CPUs in particular, the core is important, since
various ARM system-on-a-chip implementations derive peripheral clocks
from the CPU clock (e.g., LCD controllers, SDRAM controllers, etc.).
The core allows these peripherals to take action before and/or
after the actual CPU clock adjustment so we don't go out of tolerance.
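
The before/after hook idea can be modelled roughly as follows. This is a
Python sketch, not the kernel's notifier code: FreqTransition, SdramRefresh,
the phase names and the 15 microsecond refresh figure are all hypothetical.
It assumes a refresh counter clocked from the CPU clock, so the divider has
to shrink before a slowdown and may only grow after a speedup.

```python
PRECHANGE, POSTCHANGE = "prechange", "postchange"


class FreqTransition:
    """Calls every registered callback before and after a clock change."""

    def __init__(self, khz):
        self.cur_khz = khz
        self.callbacks = []

    def register(self, callback):
        self.callbacks.append(callback)

    def set_speed(self, new_khz):
        old_khz = self.cur_khz
        for cb in self.callbacks:
            cb(PRECHANGE, old_khz, new_khz)
        self.cur_khz = new_khz  # stands in for the real clock switch
        for cb in self.callbacks:
            cb(POSTCHANGE, old_khz, new_khz)


class SdramRefresh:
    """Hypothetical peripheral whose refresh counter ticks at the CPU clock."""

    REFRESH_US = 15  # assumed maximum refresh interval, illustration only

    def __init__(self, khz):
        self.divider = self.divider_for(khz)

    def divider_for(self, khz):
        # khz kHz is khz/1000 ticks per microsecond
        return khz * self.REFRESH_US // 1000

    def __call__(self, phase, old_khz, new_khz):
        slowing_down = new_khz < old_khz
        if phase == PRECHANGE and slowing_down:
            # Shrink the divider first so the interval never gets too long.
            self.divider = self.divider_for(new_khz)
        elif phase == POSTCHANGE and not slowing_down:
            # Only relax the divider once the faster clock is running.
            self.divider = self.divider_for(new_khz)
```

With this ordering the refresh interval is at worst shorter than required
during a transition, never longer.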


Patch 2/4: cpufreq-i386-core
----------------------------
The main part of this patch is a CPUFreq notifier in arch/i386/kernel/time.c.
It updates the i386-specific cpu_khz, cpu_data[].loops_per_jiffy and
fast_gettimeoffset_quotient on each frequency change.
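
To make the arithmetic concrete, here is a small Python sketch of the kind
of rescaling such a notifier has to perform. It is not the code from the
patch: the function names and the dictionary layout are hypothetical, and
it assumes that loops_per_jiffy scales linearly with the clock while
fast_gettimeoffset_quotient (taken here to be 2^32 divided by TSC cycles
per microsecond) scales inversely.

```python
def scale(value, divisor, multiplier):
    """Integer rescale: value * multiplier / divisor, rounded down."""
    return value * multiplier // divisor


def on_frequency_change(ref, new_khz):
    """Derive timing values for new_khz from values measured at a reference speed.

    ref is a dict holding the reference cpu_khz, loops_per_jiffy and
    fast_gettimeoffset_quotient (hypothetical layout for this sketch).
    """
    ref_khz = ref["cpu_khz"]
    return {
        "cpu_khz": new_khz,
        # Delay loops per jiffy grow with the clock.
        "loops_per_jiffy": scale(ref["loops_per_jiffy"], ref_khz, new_khz),
        # Assumed inverse relation: fewer cycles per usec, larger quotient.
        "fast_gettimeoffset_quotient": scale(
            ref["fast_gettimeoffset_quotient"], new_khz, ref_khz
        ),
    }
```

Scaling from fixed reference values rather than from the previous values
avoids accumulating rounding error over many transitions.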

Additionally, this patch allows "cpu_khz" to be exported (it is needed
for some cpufreq drivers) and adds some MSR #defines to asm-i386/msr.h.


Patch 3/4: cpufreq-i386-drivers
-------------------------------
Four i386 CPUFreq drivers are ready to be merged this time. These are:
elanfreq.c:	  The AMD Elan CPU family offers extensive clock scaling
longhaul.c:	  VIA Longhaul processor clock + voltage scaling
powernow-k6.c:	  mobile AMD K6-2+ / mobile AMD K6-3+ clock scaling
speedstep.c:	  clock and voltage scaling on mobile Intel Pentium III and
		  Pentium 4 CPUs, but (unfortunately) only on ICH2-M or
		  ICH3-M based chipsets.

Support for mobile AMD K7 processors is still in development.


Patch 4/4: cpufreq-doc
----------------------
Adds an entry to the CREDITS and MAINTAINERS files, Config.help texts, and
extensive documentation in linux/Documentation/cpufreq.


Comments welcome; however, please ensure that the cpufreq development
list at cpufreq@www.linux.org.uk receives a copy of all comments.

	Dominik


* RE: [PATCH][2.5.32] CPU frequency and voltage scaling (0/4)
@ 2002-08-28 20:25 Grover, Andrew
  2002-08-28 20:46 ` Linus Torvalds
  0 siblings, 1 reply; 46+ messages in thread
From: Grover, Andrew @ 2002-08-28 20:25 UTC
  To: 'Linus Torvalds', Dominik Brodowski; +Cc: cpufreq, linux-kernel

> From: Linus Torvalds [mailto:torvalds@transmeta.com] 
> In other words: there is no valid way that a _user_ can set the policy
> right now: the user can set the frequency, but since any sane policy
> depends on how busy the CPU is, the user isn't even the right person to
> _do_ that, since the user doesn't _know_.
>
> Also note that policy is not just about how busy the CPU is, but also
> about how _hot_ the CPU is. Again, a user-mode application (that maybe
> polls the situation every minute or so), simply _cannot_ handle this
> situation. You need to have the ability to poll the CPU tens of times a
> second to react to heat events, and clearly user mode cannot do that
> without impacting performance in a big way.
>
> The interface needs to be improved upon. It is simply _not_ valid to say
> "run at this speed" as the primary policy.

Well TMTA CPUs would seem to be easy, because all this is done behind the
OS's back, right?

Let's talk about CPUs in which the OS has to control processor performance.
The way I see it, there are a bunch of inputs that are going to determine
CPU speed & voltage: user preference, workload, and thermals.

Wouldn't you have your initial perf setting determined by the workload, and
then revised down, based upon user preferences (such as "I want to conserve
battery") and the thermal requirements?

Any workload analysis has to be in the kernel. The user interface can be one
that just allows a limit to be placed upon the setting the workload demands.
Then, the thermal control can further drop the setting, if needs be.
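
The layering described above (workload picks the setting, user preference
caps it, thermal control caps it further) could be sketched like this in
Python. Everything here is hypothetical: select_khz, its parameters, and
the simple "lowest step that covers the load" workload rule are
illustrations, not anything proposed in the thread.

```python
def select_khz(steps, busy_fraction, user_cap_khz=None, thermal_cap_khz=None):
    """Pick a frequency step from workload, then lower it for each cap.

    steps          available frequencies in kHz (any order, non-empty)
    busy_fraction  recent CPU utilisation in the range 0.0 .. 1.0
    *_cap_khz      optional upper limits; None means "no limit"
    """
    steps = sorted(steps)
    # Workload: lowest step that still covers the demanded share of full speed.
    target = busy_fraction * steps[-1]
    choice = next((s for s in steps if s >= target), steps[-1])
    # User preference first, then thermal control; each can only lower it.
    for cap in (user_cap_khz, thermal_cap_khz):
        if cap is not None:
            allowed = [s for s in steps if s <= cap]
            # If a cap is below every step, fall back to the slowest step.
            choice = min(choice, allowed[-1] if allowed else steps[0])
    return choice
```

For example, with steps of 300000, 600000 and 1000000 kHz, a 90% busy CPU
would ask for the top step, a user cap of 600000 would lower that to
600000, and a thermal cap of 400000 would lower it again to 300000.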

Regards -- Andy

* RE: [PATCH][2.5.32] CPU frequency and voltage scaling (0/4)
@ 2002-08-29 15:07 Pering, Trevor
  2002-08-30  8:04 ` Helge Hafting
  0 siblings, 1 reply; 46+ messages in thread
From: Pering, Trevor @ 2002-08-29 15:07 UTC
  To: 'Alan Cox', Linus Torvalds
  Cc: Dominik Brodowski, cpufreq, linux-kernel

And now comes the problem with the policy approach -- what to include in the
policy... event-HZ? Temperature? Mem Bus Freq? The list is endless. But,
here are my thoughts on the matter (not sure this adds anything new, but it
helps clarify it for me, at least).

Taking the graphics card analogy... Graphics subsystems have evolved over
many years. Initial implementations were just direct-access-framebuffers,
and in fact, were initially often indirect-access frame-buffers (e.g., the
Apple II didn't even have a linear memory map for the character display,
IIRC). Over time, individual companies would develop optimized hardware and
write libraries for it, and then eventually abstractions like OpenGL or
DirectX appeared -- *after* people knew what was useful and what was not. 

In a "well-formed" world, we could probably start off with a very basic
interface, which is what cpufreq has tried to do, and then build policy on
top of that. However, we are already effectively building on top of an
abstraction, so things are a little more complicated -- as Linus points out,
just specifying the exact frequency makes no sense anymore.

If you want to continue the graphics analogy, then we're in the position of
trying to write a driver that handles both bitmapped displays and
vector plotters. Just allowing direct bitmap access makes *no* sense in this
situation because it is meaningless for a vector plotter, which always
draws lines.

So, if you need to support two sets of graphics drivers, one that provides
draw_point(x,y), and one that provides draw_line(x1,y1,x2,y2) -- what do you
do? I would say you provide draw_line(x1,y1,x2,y2), which can be reduced
to draw_point(x,y), if necessary.

So, cpufreq is trying to just support set_freq(x), while some processors
_require_ a call in the form of bound_freq(x1,x2). Exact same situation.

Given that, there are still a couple of open questions:

1) About the policy field -- I think this should be as simple as possible...
because the use is either going to be simple, or way too complex to
effectively capture. So, the enumerated policy is probably best. Start
simple, then add other things if absolutely necessary.

2) To use MHz or something else? The problem is that the number here is
virtually meaningless. It does not translate from machine to machine,
processor to processor, or application to application. So, if you have to
pick a meaningless metric, what do you use? I would actually argue for % of
full capacity instead of MHz, but it doesn't really matter in the end.

3) Thermal overloading -- this, I believe, is a separate issue from the
cpufreq setting. I would leave this out of the equation, and let
the lower-level components handle it. I.e., think of "cpufreq" as a
suggestion, and if the suggestion would break something, then it is ignored.
If you really wanted, you could have a policy that is something like
"IgnoreThermal" -- but I think that would be silly.

4) The whole "one number describes processor behavior" idea is also somewhat
silly -- there is the core frequency, memory bus frequency, internal bus
frequency, etc... multipliers, dividers, PLLs -- everywhere!  Still not sure
what to do about this, at the moment -- but, I think this might be a
convenient use for the policy field. I.e., in "Performance" state it does
one thing (fastest mem bus freq available), in "Conservation" state it does
another (slowest available). But... (see point #1)... should this be a
separate field or not?  Start simple, then build on later, if necessary.
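
As a sketch of point 2, a "% of full capacity" request can be mapped onto
whatever range a given machine supports. The Python below is only an
illustration: percent_to_khz and khz_to_percent are hypothetical names,
and the choice to interpret the percentage relative to the maximum
frequency and clamp it to the hardware minimum is an assumption.

```python
def percent_to_khz(percent, min_khz, max_khz):
    """Map 0..100 percent of full capacity to a frequency within hardware limits."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    khz = max_khz * percent // 100
    # The hardware cannot go below its slowest setting.
    return max(min_khz, min(max_khz, khz))


def khz_to_percent(khz, max_khz):
    """Report a frequency as a rounded-down percentage of the maximum."""
    return khz * 100 // max_khz
```

The same request of 50% then means 500000 kHz on a part that tops out at
1000000 kHz and 300000 kHz on one that tops out at 600000 kHz, so the
caller does not need to know the machine's actual clock rates.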

Another way of looking at this is to break the calls up into component
parts:
freq_set_minmax(x,y)
freq_set_exact(x)  (same as freq_set_minmax(x,x))
freq_set_policy(p)
(but then there are synchronization issues...)
freq_synchronize()
etc...
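
One possible shape for these calls, sketched in Python purely for
illustration. The method names follow the list above and the two policy
names come from point 4; the FreqControl class, the clamping to hardware
limits and the target_khz helper are assumptions, and freq_synchronize()
is left out.

```python
from enum import Enum


class Policy(Enum):
    PERFORMANCE = "performance"
    CONSERVATION = "conservation"


class FreqControl:
    """Toy model: a frequency range plus an enumerated policy."""

    def __init__(self, hw_min_khz, hw_max_khz):
        self.hw_min_khz = hw_min_khz
        self.hw_max_khz = hw_max_khz
        self.min_khz = hw_min_khz
        self.max_khz = hw_max_khz
        self.policy = Policy.PERFORMANCE

    def _clamp(self, khz):
        return max(self.hw_min_khz, min(self.hw_max_khz, khz))

    def freq_set_minmax(self, lo_khz, hi_khz):
        if lo_khz > hi_khz:
            raise ValueError("min must not exceed max")
        # Requests outside the hardware range are pulled back into it.
        self.min_khz = self._clamp(lo_khz)
        self.max_khz = self._clamp(hi_khz)

    def freq_set_exact(self, khz):
        # An exact request is just a range of width zero.
        self.freq_set_minmax(khz, khz)

    def freq_set_policy(self, policy):
        self.policy = policy

    def target_khz(self):
        # The policy decides which end of the allowed range to aim for.
        if self.policy is Policy.PERFORMANCE:
            return self.max_khz
        return self.min_khz
```

Treating the exact setting as a degenerate range mirrors the draw_line /
draw_point argument: the more general call subsumes the simpler one.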

	Trevor


-----Original Message-----
From: Alan Cox [mailto:alan@lxorguk.ukuu.org.uk]
Sent: Thursday, August 29, 2002 3:54 AM
To: Linus Torvalds
Cc: Dominik Brodowski; cpufreq@www.linux.org.uk;
linux-kernel@vger.kernel.org
Subject: Re: [PATCH][2.5.32] CPU frequency and voltage scaling (0/4)


>  { min-Hz, max-Hz, policy }
> 

For a few of the processors "event-hz" or similar would be nice. The
Geode supports hardware assisted bursting to full processor speed when
doing SMM, I/O and IRQ handling.



end of thread, other threads:[~2002-09-06 20:53 UTC | newest]

Thread overview: 46+ messages
2002-08-28 11:46 [PATCH][2.5.32] CPU frequency and voltage scaling (0/4) Dominik Brodowski
2002-08-28 18:47 ` Linus Torvalds
2002-08-28 18:48   ` Cort Dougan
2002-08-28 19:25     ` Alan Cox
2002-08-28 19:32       ` Cort Dougan
2002-08-29 10:26         ` Zwane Mwaikambo
2002-08-28 19:41       ` Peter Riocreux
2002-08-28 19:58       ` Linus Torvalds
2002-08-29  9:51         ` Padraig Brady
2002-08-29 10:23     ` Zwane Mwaikambo
2002-08-28 19:21   ` Alan Cox
2002-08-28 19:49     ` Linus Torvalds
2002-08-28 20:25       ` Alan Cox
2002-08-28 20:29         ` Linus Torvalds
2002-08-28 23:26           ` Alan Cox
2002-08-28 23:49             ` Linus Torvalds
2002-08-30  0:39               ` jw schultz
2002-08-29  7:01             ` Dominik Brodowski
2002-08-28 20:39         ` Dominik Brodowski
2002-08-28 21:05           ` Linus Torvalds
2002-09-06 11:31             ` Pavel Machek
2002-08-28 20:27       ` Dominik Brodowski
2002-08-28 20:19   ` Dominik Brodowski
2002-08-28 20:43     ` Linus Torvalds
2002-08-28 20:53       ` Dominik Brodowski
2002-08-28 21:08         ` Linus Torvalds
2002-08-28 23:00           ` george anzinger
2002-08-28 23:30           ` Alan Cox
2002-08-29  0:08             ` Linus Torvalds
2002-08-29  7:07               ` Dominik Brodowski
2002-08-29 10:02               ` Padraig Brady
2002-08-29 10:53               ` Alan Cox
2002-08-29 13:38                 ` Dave Jones
2002-08-29 18:47                 ` Linus Torvalds
2002-08-29 19:24                   ` Alan Cox
2002-08-29 21:22                   ` george anzinger
2002-08-30  6:46                     ` David Gibson
2002-08-30  7:54                     ` Helge Hafting
2002-08-30  3:21                 ` David Lang
2002-08-28 20:25 Grover, Andrew
2002-08-28 20:46 ` Linus Torvalds
2002-08-29 15:07 Pering, Trevor
2002-08-30  8:04 ` Helge Hafting
2002-08-30 11:53   ` Dave Jones
2002-08-30 12:36     ` Helge Hafting
2002-08-30 22:43       ` george anzinger
