All of lore.kernel.org
 help / color / mirror / Atom feed
* Re: intel_pstate: Lower p-state when putting down CPU
       [not found]   ` <5320D12B.4040303@gmail.com>
@ 2014-03-13  0:07     ` Rafael J. Wysocki
  2014-03-13  4:56       ` Viresh Kumar
  2014-03-14 21:03       ` [PATCH v2 0/2] Add exit_prepare callback to the cpufreq_driver interface dirk.brandewie
  0 siblings, 2 replies; 28+ messages in thread
From: Rafael J. Wysocki @ 2014-03-13  0:07 UTC (permalink / raw)
  To: Dirk Brandewie
  Cc: Patrick Marlier, Viresh Kumar, Srivatsa S. Bhat, Linux PM list

On Wednesday, March 12, 2014 02:27:07 PM Dirk Brandewie wrote:
> This is a multi-part message in MIME format.
> --------------030804060705000409040008
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
> Content-Transfer-Encoding: 7bit
> 
> Resend to snare Rafael since intel.com rejects mail with attached shell
> scripts

You might CC the list as well.

> On 03/12/2014 02:23 PM, Dirk Brandewie wrote:
> > Hi Patrick,
> >
> > Adding a couple of people to the CC list so I will restate the issue please
> > correct me if I get it wrong :-)
> >
> > With Patrick's reproduction case below if a core is off-lined while it
> > is requesting a high P state that request continues to be factored into
> > the hardware coordination for selecting the package P state which will
> > set the floor P state at the P state requested by the offline core.
> >
> > I changes intel_pstate's ->exit() procedure to set the P state request
> > for the core being off-lined to the MIN P state. This did not work due to the
> > exit callback being called in the CPU_POST_DEAD phase of the CPU offline
> > process.
> >
> > I hacked a call to the exit() callback into the end of __cpufreq_remove_dev_prepare()
> > and removed the exit() call in __cpufreq_remove_dev_finish() and everything
> > started as expected (at least for this test case).
> >
> > This is most likely *WRONG* given the hell Viresh and Srivatsa when through
> > getting the lightweight tear down stuff working.

I agree with this observation.  The ordering in __cpufreq_remove_dev_finish()
most likely matters and you'd need to audit all drivers implementing ->exit()
even if you thought that it didn't matter.

> > So the question for Rafael, Viresh and Srivatsa what is the way to proceed
> > here given that intel_pstate's exit() callback needs to be called in the
> > CPU_DOWN_PREPARE phase of cpu offline for it to do the work it needs to do
> > to fix the problem Patrick is seeing?
> >
> > I see two possibilities:
> >   1. Move the exit() callback to __cpufreq_remove_dev_prepare().  I don't
> >      have a good understanding of what carnage this would cause in the core
> >      or other scaling drivers.
> >
> >   2. Add another callback to the cpufreq_driver interface that would be call
> >      from __cpufreq_remove_dev_prepare() if the callback was set.

I prefer 2, the reason being that it pretty much is guaranteed not to break
anything.  For the record, I'm not a fan of adding new cpufreq driver callbacks,
but in this particular case it seems we can't really do anything better.

Thanks!

> > I am open to other suggestions as well.
> >
> > The two patches I have locally that "fix" the problem are included at the end.
> >
> > Comments, suggestions?
> > --Dirk
> >
> >
> > On 03/12/2014 10:59 AM, Patrick Marlier wrote:
> >> Here is how I proceed to test it (on a 8 HyperThread CPU):
> >>
> >> 1. make sure that the governor is not performance and that your system is idle
> >> 2. run perf.sh (script attached) and check it is running in a low frequency
> >> 3. run stress -c 4 and keep it running in background
> >> 4. run perf.sh and check it is now running full speed.
> >> 5. turn off 4 cores
> >> for i in {4..7} ; do echo 0 | sudo tee /sys/devices/system/cpu/cpu$i/online ; done
> >> 6. stop stress program
> >> 7. wait few seconds
> >> 7. run perf.sh
> >>
> >> perf.sh will show that the requested p-state for all 4 online cores is low but
> >> the effective p-state is high frequency due to offline cores.
> >>
> >> I will test your patch tomorrow and thanks again for your time.
> >>
> >> Keep you posted,
> >> --
> >> Patrick Marlier
> >>
> > commit 293dd3c8d176c3c807471a60dac308d71275251a
> > Author: Dirk Brandewie <dirk.j.brandewie@intel.com>
> > Date:   Tue Mar 11 11:48:28 2014 -0700
> >
> >      intel_pstate: Set core to min P state during core offline
> >
> >      If a core is off-lined and becomes non-idle it will have the last
> >      requested P state for the core which could adversely affect the
> >      package P state.
> >
> >      Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
> > ---
> >   drivers/cpufreq/intel_pstate.c | 14 +++++++++-----
> >   1 file changed, 9 insertions(+), 5 deletions(-)
> >
> > diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
> > index c883ebc..23e3fa1 100644
> > --- a/drivers/cpufreq/intel_pstate.c
> > +++ b/drivers/cpufreq/intel_pstate.c
> > @@ -447,7 +447,7 @@ static void core_set_pstate(struct cpudata *cpudata, int pstate)
> >   	if (limits.no_turbo)
> >   		val |= (u64)1 << 32;
> >
> > -	wrmsrl(MSR_IA32_PERF_CTL, val);
> > +	wrmsrl_on_cpu(cpudata->cpu, MSR_IA32_PERF_CTL, val);
> >   }
> >
> >   static struct cpu_defaults core_params = {
> > @@ -776,12 +776,16 @@ static int intel_pstate_verify_policy(struct cpufreq_policy *policy)
> >
> >   static int intel_pstate_cpu_exit(struct cpufreq_policy *policy)
> >   {
> > -	int cpu = policy->cpu;
> > +	int cpu_num = policy->cpu;
> > +	struct cpudata *cpu = all_cpu_data[cpu_num];
> >
> > +	pr_info("intel_pstate CPU %d exiting\n", cpu_num);
> > +	
> > +	intel_pstate_set_pstate(cpu, cpu->pstate.min_pstate);
> > +	del_timer(&all_cpu_data[cpu_num]->timer);
> > +	kfree(all_cpu_data[cpu_num]);
> > +	all_cpu_data[cpu_num] = NULL;
> >
> > -	del_timer(&all_cpu_data[cpu]->timer);
> > -	kfree(all_cpu_data[cpu]);
> > -	all_cpu_data[cpu] = NULL;
> >   	return 0;
> >   }
> >
> > commit 9277ca812181ac246c76e4634e937845ecb5c81f
> > Author: Dirk Brandewie <dirk.j.brandewie@intel.com>
> > Date:   Wed Mar 12 12:19:05 2014 -0700
> >
> >      cpufreq: call scaling driver exit from down prepare
> >
> >      Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
> > ---
> >   drivers/cpufreq/cpufreq.c      | 7 ++++---
> >   drivers/cpufreq/intel_pstate.c | 1 +
> >   2 files changed, 5 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
> > index cf485d9..28a3c2a 100644
> > --- a/drivers/cpufreq/cpufreq.c
> > +++ b/drivers/cpufreq/cpufreq.c
> > @@ -1337,7 +1337,8 @@ static int __cpufreq_remove_dev_prepare(struct device *dev,
> >   			}
> >   		}
> >   	}
> > -
> > +	if (cpufreq_driver->exit)
> > +		cpufreq_driver->exit(policy);
> >   	return 0;
> >   }
> >
> > @@ -1386,8 +1387,8 @@ static int __cpufreq_remove_dev_finish(struct device *dev,
> >   		 * since this is a core component, and is essential for the
> >   		 * subsequent light-weight ->init() to succeed.
> >   		 */
> > -		if (cpufreq_driver->exit)
> > -			cpufreq_driver->exit(policy);
> > +//		if (cpufreq_driver->exit)
> > +//			cpufreq_driver->exit(policy);
> >
> >   		/* Remove policy from list of active policies */
> >   		write_lock_irqsave(&cpufreq_driver_lock, flags);
> > diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
> > index 585e5f2..c883ebc 100644
> > --- a/drivers/cpufreq/intel_pstate.c
> > +++ b/drivers/cpufreq/intel_pstate.c
> > @@ -778,6 +778,7 @@ static int intel_pstate_cpu_exit(struct cpufreq_policy *policy)
> >   {
> >   	int cpu = policy->cpu;
> >
> > +
> >   	del_timer(&all_cpu_data[cpu]->timer);
> >   	kfree(all_cpu_data[cpu]);
> >   	all_cpu_data[cpu] = NULL;
> >
> 
> 
> --------------030804060705000409040008
> Content-Type: application/x-shellscript;
>  name="perf.sh"
> Content-Transfer-Encoding: base64
> Content-Disposition: attachment;
>  filename="perf.sh"
> 
> IyEvYmluL2Jhc2ggCkNQVXM9JChzZXEgMCA3KQoKU1BFRURTVEVQX0VOQUJMRT0kKHN1ZG8g
> cmRtc3IgLWYgMTY6MTYgMHgxQTAgMj4vZGV2L251bGwpClRVUkJPTU9ERV9ESVNBQkxFPSQo
> c3VkbyByZG1zciAtZiAzODozOCAweDFBMCAyPi9kZXYvbnVsbCkKZWNobyAtbiAiU3BlZWRT
> dGVwICIKaWYgWyAiJFNQRUVEU1RFUF9FTkFCTEUiID09ICIxIiBdOyB0aGVuCiAgZWNobyAi
> RW5hYmxlIgplbHNlCiAgZWNobyAiRGlzYWJsZSIKZmkKZWNobyAtbiAiVHVyYm9Nb2RlICIK
> aWYgWyAiJFRVUkJPTU9ERV9ESVNBQkxFIiA9PSAiMCIgXTsgdGhlbgogIGVjaG8gIkVuYWJs
> ZSIKZWxzZQogIGVjaG8gIkRpc2FibGUiCmZpCgplY2hvIC1uICJSZXEgKDB4MTk5KTogIgpm
> b3IgaSBpbiAkQ1BVcwpkbwpWMTk5PSQoc3VkbyByZG1zciAtdSAtZjE2OjggLXAgJGkgMHgx
> OTkgMj4vZGV2L251bGwpCmlmIFtbICIke1YxOTl9eCIgIT0gIngiIF1dIDsgdGhlbgpwcmlu
> dGYgIiU0dSAoMHglMngpICAiICQoKCRWMTk5ICogMTAwKSkgJHtWMTk5fSAKZmkKZG9uZQoK
> IyBOb3RlIHRoYXQgaXQgaXMgbm90IGV4ZWN1dGVkIGFsbCBhdCB0aGUgc2FtZSB0aW1lIHNv
> IHAtc3RhdGUgY2FuIGNoYW5nZSBpbiBiZXR3ZWVuCmVjaG8gIiIKZWNobyAtbiAiQ3VyICgw
> eDE5OCk6ICIKZm9yIGkgaW4gJENQVXMKZG8KVjE5OD0kKHN1ZG8gcmRtc3IgLXUgLXAgJGkg
> LWYxNjo4IDB4MTk4IDI+L2Rldi9udWxsKQppZiBbWyAiJHtWMTk4fXgiICE9ICJ4IiBdXSA7
> IHRoZW4KcHJpbnRmICIlNHUgKDB4JTJ4KSAgIiAkKCgkVjE5OCAqIDEwMCkpICR7VjE5OH0g
> CmZpCmRvbmUKZWNobyAiIgoKCg==
> --------------030804060705000409040008--

-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: intel_pstate: Lower p-state when putting down CPU
  2014-03-13  0:07     ` intel_pstate: Lower p-state when putting down CPU Rafael J. Wysocki
@ 2014-03-13  4:56       ` Viresh Kumar
  2014-03-13 21:39         ` Rafael J. Wysocki
  2014-03-14 21:03       ` [PATCH v2 0/2] Add exit_prepare callback to the cpufreq_driver interface dirk.brandewie
  1 sibling, 1 reply; 28+ messages in thread
From: Viresh Kumar @ 2014-03-13  4:56 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Dirk Brandewie, Patrick Marlier, Srivatsa S. Bhat, Linux PM list

On 13 March 2014 08:07, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> On Wednesday, March 12, 2014 02:27:07 PM Dirk Brandewie wrote:

>> > I see two possibilities:
>> >   1. Move the exit() callback to __cpufreq_remove_dev_prepare().  I don't
>> >      have a good understanding of what carnage this would cause in the core
>> >      or other scaling drivers.
>> >
>> >   2. Add another callback to the cpufreq_driver interface that would be call
>> >      from __cpufreq_remove_dev_prepare() if the callback was set.
>
> I prefer 2, the reason being that it pretty much is guaranteed not to break
> anything.  For the record, I'm not a fan of adding new cpufreq driver callbacks,
> but in this particular case it seems we can't really do anything better.

I haven't thought a lot about which one of these two looks better, probably
Rafael might be correct. But I can see another way out here as this is very
much driver specific. Why can't we do a register_hotcpu_notifier() from
pstate driver alone?

And by that time I will think a bit more about it.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: intel_pstate: Lower p-state when putting down CPU
  2014-03-13  4:56       ` Viresh Kumar
@ 2014-03-13 21:39         ` Rafael J. Wysocki
  0 siblings, 0 replies; 28+ messages in thread
From: Rafael J. Wysocki @ 2014-03-13 21:39 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Dirk Brandewie, Patrick Marlier, Srivatsa S. Bhat, Linux PM list

On Thursday, March 13, 2014 12:56:02 PM Viresh Kumar wrote:
> On 13 March 2014 08:07, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> > On Wednesday, March 12, 2014 02:27:07 PM Dirk Brandewie wrote:
> 
> >> > I see two possibilities:
> >> >   1. Move the exit() callback to __cpufreq_remove_dev_prepare().  I don't
> >> >      have a good understanding of what carnage this would cause in the core
> >> >      or other scaling drivers.
> >> >
> >> >   2. Add another callback to the cpufreq_driver interface that would be call
> >> >      from __cpufreq_remove_dev_prepare() if the callback was set.
> >
> > I prefer 2, the reason being that it pretty much is guaranteed not to break
> > anything.  For the record, I'm not a fan of adding new cpufreq driver callbacks,
> > but in this particular case it seems we can't really do anything better.
> 
> I haven't thought a lot about which one of these two looks better, probably
> Rafael might be correct. But I can see another way out here as this is very
> much driver specific. Why can't we do a register_hotcpu_notifier() from
> pstate driver alone?

Why would that be better than adding a new callback?

-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH v2 0/2] Add exit_prepare callback to the cpufreq_driver interface.
  2014-03-13  0:07     ` intel_pstate: Lower p-state when putting down CPU Rafael J. Wysocki
  2014-03-13  4:56       ` Viresh Kumar
@ 2014-03-14 21:03       ` dirk.brandewie
  2014-03-14 21:03         ` [PATCH v2 1/2] cpufreq: Add exit_prepare callback to " dirk.brandewie
                           ` (2 more replies)
  1 sibling, 3 replies; 28+ messages in thread
From: dirk.brandewie @ 2014-03-14 21:03 UTC (permalink / raw)
  To: linux-pm
  Cc: linux-kernel, rjw, patrick.marlier, dirk.brandewie, Dirk Brandewie

From: Dirk Brandewie <dirk.j.brandewie@intel.com>

Some drivers (intel_pstate) need to modify state on a core before it
is completely offline.  The ->exit() callback is executed during the
CPU_POST_DEAD phase of the cpu offline process which is too late to
change the state of the core.

Patch 1 adds an optional callback function to the cpufreq_driver
interface which is called by the core during the CPU_DOWN_PREPARE
phase of cpu offline in __cpufreq_remove_dev_prepare().

Patch 2 changes intel_pstate to use the ->stop callback to do
its cleanup during cpu offline. 

Dirk Brandewie (2):
  cpufreq: Add exit_prepare callback to cpufreq_driver interface
  intel_pstate: Set core to min P state during core offline

 Documentation/cpu-freq/cpu-drivers.txt |  8 +++++++-
 drivers/cpufreq/cpufreq.c              |  3 +++
 drivers/cpufreq/intel_pstate.c         | 20 +++++++++++++-------
 include/linux/cpufreq.h                |  1 +
 4 files changed, 24 insertions(+), 8 deletions(-)

-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH v2 1/2] cpufreq: Add exit_prepare callback to cpufreq_driver interface
  2014-03-14 21:03       ` [PATCH v2 0/2] Add exit_prepare callback to the cpufreq_driver interface dirk.brandewie
@ 2014-03-14 21:03         ` dirk.brandewie
  2014-03-15  2:04           ` Rafael J. Wysocki
  2014-03-14 21:03         ` [PATCH v2 2/2] intel_pstate: Set core to min P state during core offline dirk.brandewie
  2014-03-18 17:22         ` [PATCH v3 0/2] Add stop callback to the cpufreq_driver interface dirk.brandewie
  2 siblings, 1 reply; 28+ messages in thread
From: dirk.brandewie @ 2014-03-14 21:03 UTC (permalink / raw)
  To: linux-pm
  Cc: linux-kernel, rjw, patrick.marlier, dirk.brandewie, Dirk Brandewie

From: Dirk Brandewie <dirk.j.brandewie@intel.com>

This callback allows the driver to do clean up before the CPU is
completely down and its state cannot be modified.  This is used
by the intel_pstate driver to reduce the requested P state prior to
the core going away.  This is required because the requested P state
of the offline core is used to select the package P state. This
effectively sets the floor package P state to the requested P state on
the offline core.

Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
---
 Documentation/cpu-freq/cpu-drivers.txt | 8 +++++++-
 drivers/cpufreq/cpufreq.c              | 3 +++
 include/linux/cpufreq.h                | 1 +
 3 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/Documentation/cpu-freq/cpu-drivers.txt b/Documentation/cpu-freq/cpu-drivers.txt
index 8b1a445..79def80 100644
--- a/Documentation/cpu-freq/cpu-drivers.txt
+++ b/Documentation/cpu-freq/cpu-drivers.txt
@@ -61,7 +61,13 @@ target_index		-	See below on the differences.
 
 And optionally
 
-cpufreq_driver.exit -		A pointer to a per-CPU cleanup function.
+cpufreq_driver.exit -		A pointer to a per-CPU cleanup
+		    		function called during CPU_POST_DEAD
+		    		phase of cpu hotplug process.
+
+cpufreq_driver.stop -		A pointer to a per-CPU stop function
+			    	called during CPU_DOWN_PREPARE phase of
+				cpu hotplug process.
 
 cpufreq_driver.resume -		A pointer to a per-CPU resume function
 				which is called with interrupts disabled
diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index cf485d9..0d430d7 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -1338,6 +1338,9 @@ static int __cpufreq_remove_dev_prepare(struct device *dev,
 		}
 	}
 
+	if (cpufreq_driver->stop)
+		cpufreq_driver->stop(policy);
+
 	return 0;
 }
 
diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h
index 4d89e0e..ff8db19 100644
--- a/include/linux/cpufreq.h
+++ b/include/linux/cpufreq.h
@@ -224,6 +224,7 @@ struct cpufreq_driver {
 	int	(*bios_limit)	(int cpu, unsigned int *limit);
 
 	int	(*exit)		(struct cpufreq_policy *policy);
+	int	(*stop)		(struct cpufreq_policy *policy);
 	int	(*suspend)	(struct cpufreq_policy *policy);
 	int	(*resume)	(struct cpufreq_policy *policy);
 	struct freq_attr	**attr;
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v2 2/2] intel_pstate: Set core to min P state during core offline
  2014-03-14 21:03       ` [PATCH v2 0/2] Add exit_prepare callback to the cpufreq_driver interface dirk.brandewie
  2014-03-14 21:03         ` [PATCH v2 1/2] cpufreq: Add exit_prepare callback to " dirk.brandewie
@ 2014-03-14 21:03         ` dirk.brandewie
  2014-03-18  5:44           ` Viresh Kumar
  2014-03-18 17:22         ` [PATCH v3 0/2] Add stop callback to the cpufreq_driver interface dirk.brandewie
  2 siblings, 1 reply; 28+ messages in thread
From: dirk.brandewie @ 2014-03-14 21:03 UTC (permalink / raw)
  To: linux-pm
  Cc: linux-kernel, rjw, patrick.marlier, dirk.brandewie, Dirk Brandewie

From: Dirk Brandewie <dirk.j.brandewie@intel.com>

Change to use ->stop() callback to do clean up during CPU
hotplug. The requested P state for an offline core will be used by the
hardware coordination function to select the package P state. If the
core is under load when it is offlined it will fix the package P state
floor to the requested P state of offline core.

Reported-by: Patrick Marlier <patrick.marlier@gmail.com>
Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
---
 drivers/cpufreq/intel_pstate.c | 20 +++++++++++++-------
 1 file changed, 13 insertions(+), 7 deletions(-)

diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
index 2cd36b9..e9092fd 100644
--- a/drivers/cpufreq/intel_pstate.c
+++ b/drivers/cpufreq/intel_pstate.c
@@ -447,7 +447,7 @@ static void core_set_pstate(struct cpudata *cpudata, int pstate)
 	if (limits.no_turbo)
 		val |= (u64)1 << 32;
 
-	wrmsrl(MSR_IA32_PERF_CTL, val);
+	wrmsrl_on_cpu(cpudata->cpu, MSR_IA32_PERF_CTL, val);
 }
 
 static struct cpu_defaults core_params = {
@@ -773,16 +773,22 @@ static int intel_pstate_verify_policy(struct cpufreq_policy *policy)
 	return 0;
 }
 
-static int intel_pstate_cpu_exit(struct cpufreq_policy *policy)
+static int intel_pstate_cpu_stop(struct cpufreq_policy *policy)
 {
-	int cpu = policy->cpu;
+	int cpu_num = policy->cpu;
+	struct cpudata *cpu = all_cpu_data[cpu_num];
+
+	pr_info("intel_pstate CPU %d exiting\n", cpu_num);
+
+	del_timer(&all_cpu_data[cpu_num]->timer);
+	intel_pstate_set_pstate(cpu, cpu->pstate.min_pstate);
+	kfree(all_cpu_data[cpu_num]);
+	all_cpu_data[cpu_num] = NULL;
 
-	del_timer(&all_cpu_data[cpu]->timer);
-	kfree(all_cpu_data[cpu]);
-	all_cpu_data[cpu] = NULL;
 	return 0;
 }
 
+
 static int intel_pstate_cpu_init(struct cpufreq_policy *policy)
 {
 	struct cpudata *cpu;
@@ -818,7 +824,7 @@ static struct cpufreq_driver intel_pstate_driver = {
 	.setpolicy	= intel_pstate_set_policy,
 	.get		= intel_pstate_get,
 	.init		= intel_pstate_cpu_init,
-	.exit		= intel_pstate_cpu_exit,
+	.stop		= intel_pstate_cpu_stop,
 	.name		= "intel_pstate",
 };
 
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 1/2] cpufreq: Add exit_prepare callback to cpufreq_driver interface
  2014-03-14 21:03         ` [PATCH v2 1/2] cpufreq: Add exit_prepare callback to " dirk.brandewie
@ 2014-03-15  2:04           ` Rafael J. Wysocki
  2014-03-18  5:43             ` Viresh Kumar
  0 siblings, 1 reply; 28+ messages in thread
From: Rafael J. Wysocki @ 2014-03-15  2:04 UTC (permalink / raw)
  To: dirk.brandewie; +Cc: linux-pm, linux-kernel, patrick.marlier, Dirk Brandewie

On Friday, March 14, 2014 02:03:56 PM dirk.brandewie@gmail.com wrote:
> From: Dirk Brandewie <dirk.j.brandewie@intel.com>
> 
> This callback allows the driver to do clean up before the CPU is
> completely down and its state cannot be modified.  This is used
> by the intel_pstate driver to reduce the requested P state prior to
> the core going away.  This is required because the requested P state
> of the offline core is used to select the package P state. This
> effectively sets the floor package P state to the requested P state on
> the offline core.
> 
> Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
> ---
>  Documentation/cpu-freq/cpu-drivers.txt | 8 +++++++-
>  drivers/cpufreq/cpufreq.c              | 3 +++
>  include/linux/cpufreq.h                | 1 +
>  3 files changed, 11 insertions(+), 1 deletion(-)
> 
> diff --git a/Documentation/cpu-freq/cpu-drivers.txt b/Documentation/cpu-freq/cpu-drivers.txt
> index 8b1a445..79def80 100644
> --- a/Documentation/cpu-freq/cpu-drivers.txt
> +++ b/Documentation/cpu-freq/cpu-drivers.txt
> @@ -61,7 +61,13 @@ target_index		-	See below on the differences.
>  
>  And optionally
>  
> -cpufreq_driver.exit -		A pointer to a per-CPU cleanup function.
> +cpufreq_driver.exit -		A pointer to a per-CPU cleanup
> +		    		function called during CPU_POST_DEAD
> +		    		phase of cpu hotplug process.
> +
> +cpufreq_driver.stop -		A pointer to a per-CPU stop function
> +			    	called during CPU_DOWN_PREPARE phase of
> +				cpu hotplug process.
>  
>  cpufreq_driver.resume -		A pointer to a per-CPU resume function
>  				which is called with interrupts disabled
> diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
> index cf485d9..0d430d7 100644
> --- a/drivers/cpufreq/cpufreq.c
> +++ b/drivers/cpufreq/cpufreq.c
> @@ -1338,6 +1338,9 @@ static int __cpufreq_remove_dev_prepare(struct device *dev,
>  		}
>  	}
>  
> +	if (cpufreq_driver->stop)

What about doing

+	if (cpufreq_driver->setpolicy && cpufreq_driver->stop)

here instead?  That would make it clear where the new callback belongs.

If you're fine with that, I can make that change when applying the patch.

> +		cpufreq_driver->stop(policy);
> +
>  	return 0;
>  }
>  
> diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h
> index 4d89e0e..ff8db19 100644
> --- a/include/linux/cpufreq.h
> +++ b/include/linux/cpufreq.h
> @@ -224,6 +224,7 @@ struct cpufreq_driver {
>  	int	(*bios_limit)	(int cpu, unsigned int *limit);
>  
>  	int	(*exit)		(struct cpufreq_policy *policy);
> +	int	(*stop)		(struct cpufreq_policy *policy);
>  	int	(*suspend)	(struct cpufreq_policy *policy);
>  	int	(*resume)	(struct cpufreq_policy *policy);
>  	struct freq_attr	**attr;
> 

-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 1/2] cpufreq: Add exit_prepare callback to cpufreq_driver interface
  2014-03-15  2:04           ` Rafael J. Wysocki
@ 2014-03-18  5:43             ` Viresh Kumar
  0 siblings, 0 replies; 28+ messages in thread
From: Viresh Kumar @ 2014-03-18  5:43 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Dirk Brandewie, Linux PM list, linux-kernel, Patrick Marlier,
	Dirk Brandewie

On Sat, Mar 15, 2014 at 7:34 AM, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> On Friday, March 14, 2014 02:03:56 PM dirk.brandewie@gmail.com wrote:

>> +++ b/drivers/cpufreq/cpufreq.c
>> @@ -1338,6 +1338,9 @@ static int __cpufreq_remove_dev_prepare(struct device *dev,
>>               }
>>       }
>>
>> +     if (cpufreq_driver->stop)
>
> What about doing
>
> +       if (cpufreq_driver->setpolicy && cpufreq_driver->stop)
>
> here instead?  That would make it clear where the new callback belongs.

This is called after stopping governor and so might be useful for ->target()
drivers. So, wouldn't be a bad option if we keep it available for all..

@Dirk: I thought about the solution I mentioned in another mail. And it looks
like I will end up getting a similar solution. So, we will go with
your solution.

Just few changes for your solution..

You need to call ->stop() only for the last CPU of every policy and not for
every CPU, so you need something like this:

diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index 19d25a8..78d41c0 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -1371,6 +1371,8 @@ static int __cpufreq_remove_dev_prepare(struct
device *dev,
                                         __func__, new_cpu, cpu);
                        }
                }
+       } else if (cpufreq_driver->stop) {
+               cpufreq_driver->stop(policy);
        }

        return 0;

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 2/2] intel_pstate: Set core to min P state during core offline
  2014-03-14 21:03         ` [PATCH v2 2/2] intel_pstate: Set core to min P state during core offline dirk.brandewie
@ 2014-03-18  5:44           ` Viresh Kumar
  2014-03-18 15:01             ` Dirk Brandewie
  0 siblings, 1 reply; 28+ messages in thread
From: Viresh Kumar @ 2014-03-18  5:44 UTC (permalink / raw)
  To: Dirk Brandewie
  Cc: Linux PM list, linux-kernel, Rafael J. Wysocki, Patrick Marlier,
	Dirk Brandewie

On Sat, Mar 15, 2014 at 2:33 AM,  <dirk.brandewie@gmail.com> wrote:
> From: Dirk Brandewie <dirk.j.brandewie@intel.com>
>
> Change to use ->stop() callback to do clean up during CPU
> hotplug. The requested P state for an offline core will be used by the
> hardware coordination function to select the package P state. If the
> core is under load when it is offlined it will fix the package P state
> floor to the requested P state of offline core.
>
> Reported-by: Patrick Marlier <patrick.marlier@gmail.com>
> Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
> ---
>  drivers/cpufreq/intel_pstate.c | 20 +++++++++++++-------
>  1 file changed, 13 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
> index 2cd36b9..e9092fd 100644
> --- a/drivers/cpufreq/intel_pstate.c
> +++ b/drivers/cpufreq/intel_pstate.c
> @@ -447,7 +447,7 @@ static void core_set_pstate(struct cpudata *cpudata, int pstate)
>         if (limits.no_turbo)
>                 val |= (u64)1 << 32;
>
> -       wrmsrl(MSR_IA32_PERF_CTL, val);
> +       wrmsrl_on_cpu(cpudata->cpu, MSR_IA32_PERF_CTL, val);
>  }
>
>  static struct cpu_defaults core_params = {
> @@ -773,16 +773,22 @@ static int intel_pstate_verify_policy(struct cpufreq_policy *policy)
>         return 0;
>  }
>
> -static int intel_pstate_cpu_exit(struct cpufreq_policy *policy)
> +static int intel_pstate_cpu_stop(struct cpufreq_policy *policy)
>  {
> -       int cpu = policy->cpu;
> +       int cpu_num = policy->cpu;
> +       struct cpudata *cpu = all_cpu_data[cpu_num];
> +
> +       pr_info("intel_pstate CPU %d exiting\n", cpu_num);
> +
> +       del_timer(&all_cpu_data[cpu_num]->timer);
> +       intel_pstate_set_pstate(cpu, cpu->pstate.min_pstate);
> +       kfree(all_cpu_data[cpu_num]);
> +       all_cpu_data[cpu_num] = NULL;
>
> -       del_timer(&all_cpu_data[cpu]->timer);
> -       kfree(all_cpu_data[cpu]);
> -       all_cpu_data[cpu] = NULL;
>         return 0;
>  }
>
> +
>  static int intel_pstate_cpu_init(struct cpufreq_policy *policy)
>  {
>         struct cpudata *cpu;
> @@ -818,7 +824,7 @@ static struct cpufreq_driver intel_pstate_driver = {
>         .setpolicy      = intel_pstate_set_policy,
>         .get            = intel_pstate_get,
>         .init           = intel_pstate_cpu_init,
> -       .exit           = intel_pstate_cpu_exit,
> +       .stop           = intel_pstate_cpu_stop,

Probably, keep exit as is and only change P-state in stop(). So that
allocation of resources happen in init() and they are freed in exit()?

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 2/2] intel_pstate: Set core to min P state during core offline
  2014-03-18  5:44           ` Viresh Kumar
@ 2014-03-18 15:01             ` Dirk Brandewie
  2014-03-18 18:52               ` Srivatsa S. Bhat
  0 siblings, 1 reply; 28+ messages in thread
From: Dirk Brandewie @ 2014-03-18 15:01 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: dirk.brandewie, Linux PM list, linux-kernel, Rafael J. Wysocki,
	Patrick Marlier, Dirk Brandewie

On 03/17/2014 10:44 PM, Viresh Kumar wrote:
> On Sat, Mar 15, 2014 at 2:33 AM,  <dirk.brandewie@gmail.com> wrote:
>> +
>>   static int intel_pstate_cpu_init(struct cpufreq_policy *policy)
>>   {
>>          struct cpudata *cpu;
>> @@ -818,7 +824,7 @@ static struct cpufreq_driver intel_pstate_driver = {
>>          .setpolicy      = intel_pstate_set_policy,
>>          .get            = intel_pstate_get,
>>          .init           = intel_pstate_cpu_init,
>> -       .exit           = intel_pstate_cpu_exit,
>> +       .stop           = intel_pstate_cpu_stop,
>
> Probably, keep exit as is and only change P-state in stop(). So that
> allocation of resources happen in init() and they are freed in exit()?
>
I looked at doing just that but it junked up the code.  if stop() is called
during PREPARE then init() will be called via __cpufreq_add_dev() in the ONLINE
and DOWN_FAILED case. So once stop() is called the driver will be ready for
init() to be called exactly like when exit() is called.



^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH v3 0/2] Add stop callback to the cpufreq_driver interface.
  2014-03-14 21:03       ` [PATCH v2 0/2] Add exit_prepare callback to the cpufreq_driver interface dirk.brandewie
  2014-03-14 21:03         ` [PATCH v2 1/2] cpufreq: Add exit_prepare callback to " dirk.brandewie
  2014-03-14 21:03         ` [PATCH v2 2/2] intel_pstate: Set core to min P state during core offline dirk.brandewie
@ 2014-03-18 17:22         ` dirk.brandewie
  2014-03-18 17:22           ` [PATCH 1/2] cpufreq: Add stop callback to " dirk.brandewie
                             ` (2 more replies)
  2 siblings, 3 replies; 28+ messages in thread
From: dirk.brandewie @ 2014-03-18 17:22 UTC (permalink / raw)
  To: linux-pm
  Cc: linux-kernel, rjw, patrick.marlier, viresh.kumar, srivatsa.bhat,
	dirk.brandewie, Dirk Brandewie

From: Dirk Brandewie <dirk.j.brandewie@intel.com>

Changes: 
v2->v3
Changed the calling of the ->stop() callback to be conditional on the
core being the last core controlled by a given policy.

v1->2
Change name of callback function added to cpufreq_driver interface.

Some drivers (intel_pstate) need to modify state on a core before it
is completely offline.  The ->exit() callback is executed during the
CPU_POST_DEAD phase of the cpu offline process which is too late to
change the state of the core.

Patch 1 adds an optional callback function to the cpufreq_driver
interface which is called by the core during the CPU_DOWN_PREPARE
phase of cpu offline in __cpufreq_remove_dev_prepare().

Patch 2 changes intel_pstate to use the ->stop callback to do
its cleanup during cpu offline. 

Dirk Brandewie (2):
  cpufreq: Add exit_prepare callback to cpufreq_driver interface
  intel_pstate: Set core to min P state during core offline

 Documentation/cpu-freq/cpu-drivers.txt |  8 +++++++-
 drivers/cpufreq/cpufreq.c              |  3 +++
 drivers/cpufreq/intel_pstate.c         | 20 +++++++++++++-------
 include/linux/cpufreq.h                |  1 +
 4 files changed, 24 insertions(+), 8 deletions(-)

-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH 1/2] cpufreq: Add stop callback to cpufreq_driver interface
  2014-03-18 17:22         ` [PATCH v3 0/2] Add stop callback to the cpufreq_driver interface dirk.brandewie
@ 2014-03-18 17:22           ` dirk.brandewie
  2014-03-19  5:04             ` Viresh Kumar
  2014-03-18 17:22           ` [PATCH 2/2] intel_pstate: Set core to min P state during core offline dirk.brandewie
  2014-03-18 19:08           ` [PATCH v3 0/2] Add stop callback to the cpufreq_driver interface Srivatsa S. Bhat
  2 siblings, 1 reply; 28+ messages in thread
From: dirk.brandewie @ 2014-03-18 17:22 UTC (permalink / raw)
  To: linux-pm
  Cc: linux-kernel, rjw, patrick.marlier, viresh.kumar, srivatsa.bhat,
	dirk.brandewie, Dirk Brandewie

From: Dirk Brandewie <dirk.j.brandewie@intel.com>

This callback allows the driver to do clean up before the CPU is
completely down and its state cannot be modified.  This is used
by the intel_pstate driver to reduce the requested P state prior to
the core going away.  This is required because the requested P state
of the offline core is used to select the package P state. This
effectively sets the floor package P state to the requested P state on
the offline core.

Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
---
 Documentation/cpu-freq/cpu-drivers.txt | 8 +++++++-
 drivers/cpufreq/cpufreq.c              | 3 ++-
 include/linux/cpufreq.h                | 1 +
 3 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/Documentation/cpu-freq/cpu-drivers.txt b/Documentation/cpu-freq/cpu-drivers.txt
index 8b1a445..79def80 100644
--- a/Documentation/cpu-freq/cpu-drivers.txt
+++ b/Documentation/cpu-freq/cpu-drivers.txt
@@ -61,7 +61,13 @@ target_index		-	See below on the differences.
 
 And optionally
 
-cpufreq_driver.exit -		A pointer to a per-CPU cleanup function.
+cpufreq_driver.exit -		A pointer to a per-CPU cleanup
+		    		function called during CPU_POST_DEAD
+		    		phase of cpu hotplug process.
+
+cpufreq_driver.stop -		A pointer to a per-CPU stop function
+			    	called during CPU_DOWN_PREPARE phase of
+				cpu hotplug process.
 
 cpufreq_driver.resume -		A pointer to a per-CPU resume function
 				which is called with interrupts disabled
diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index cf485d9..6e6beae 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -1336,7 +1336,8 @@ static int __cpufreq_remove_dev_prepare(struct device *dev,
 						__func__, new_cpu, cpu);
 			}
 		}
-	}
+	} else if (cpufreq_driver->stop)
+		cpufreq_driver->stop(policy);
 
 	return 0;
 }
diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h
index 4d89e0e..ff8db19 100644
--- a/include/linux/cpufreq.h
+++ b/include/linux/cpufreq.h
@@ -224,6 +224,7 @@ struct cpufreq_driver {
 	int	(*bios_limit)	(int cpu, unsigned int *limit);
 
 	int	(*exit)		(struct cpufreq_policy *policy);
+	int	(*stop)		(struct cpufreq_policy *policy);
 	int	(*suspend)	(struct cpufreq_policy *policy);
 	int	(*resume)	(struct cpufreq_policy *policy);
 	struct freq_attr	**attr;
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 2/2] intel_pstate: Set core to min P state during core offline
  2014-03-18 17:22         ` [PATCH v3 0/2] Add stop callback to the cpufreq_driver interface dirk.brandewie
  2014-03-18 17:22           ` [PATCH 1/2] cpufreq: Add stop callback to " dirk.brandewie
@ 2014-03-18 17:22           ` dirk.brandewie
  2014-03-18 19:08             ` Srivatsa S. Bhat
  2014-03-18 19:08           ` [PATCH v3 0/2] Add stop callback to the cpufreq_driver interface Srivatsa S. Bhat
  2 siblings, 1 reply; 28+ messages in thread
From: dirk.brandewie @ 2014-03-18 17:22 UTC (permalink / raw)
  To: linux-pm
  Cc: linux-kernel, rjw, patrick.marlier, viresh.kumar, srivatsa.bhat,
	dirk.brandewie, Dirk Brandewie

From: Dirk Brandewie <dirk.j.brandewie@intel.com>

Change to use ->exit_prepare() callback to do clean up during CPU
hotplug. The requested P state for an offline core will be used by the
hardware coordination function to select the package P state. If the
core is under load when it is offlined it will fix the package P state
floor to the requested P state of offline core.

Reported-by: Patrick Marlier <patrick.marlier@gmail.com>
Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
---
 drivers/cpufreq/intel_pstate.c | 20 +++++++++++++-------
 1 file changed, 13 insertions(+), 7 deletions(-)

diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
index 2cd36b9..e9092fd 100644
--- a/drivers/cpufreq/intel_pstate.c
+++ b/drivers/cpufreq/intel_pstate.c
@@ -447,7 +447,7 @@ static void core_set_pstate(struct cpudata *cpudata, int pstate)
 	if (limits.no_turbo)
 		val |= (u64)1 << 32;
 
-	wrmsrl(MSR_IA32_PERF_CTL, val);
+	wrmsrl_on_cpu(cpudata->cpu, MSR_IA32_PERF_CTL, val);
 }
 
 static struct cpu_defaults core_params = {
@@ -773,16 +773,22 @@ static int intel_pstate_verify_policy(struct cpufreq_policy *policy)
 	return 0;
 }
 
-static int intel_pstate_cpu_exit(struct cpufreq_policy *policy)
+static int intel_pstate_cpu_stop(struct cpufreq_policy *policy)
 {
-	int cpu = policy->cpu;
+	int cpu_num = policy->cpu;
+	struct cpudata *cpu = all_cpu_data[cpu_num];
+
+	pr_info("intel_pstate CPU %d exiting\n", cpu_num);
+
+	del_timer(&all_cpu_data[cpu_num]->timer);
+	intel_pstate_set_pstate(cpu, cpu->pstate.min_pstate);
+	kfree(all_cpu_data[cpu_num]);
+	all_cpu_data[cpu_num] = NULL;
 
-	del_timer(&all_cpu_data[cpu]->timer);
-	kfree(all_cpu_data[cpu]);
-	all_cpu_data[cpu] = NULL;
 	return 0;
 }
 
+
 static int intel_pstate_cpu_init(struct cpufreq_policy *policy)
 {
 	struct cpudata *cpu;
@@ -818,7 +824,7 @@ static struct cpufreq_driver intel_pstate_driver = {
 	.setpolicy	= intel_pstate_set_policy,
 	.get		= intel_pstate_get,
 	.init		= intel_pstate_cpu_init,
-	.exit		= intel_pstate_cpu_exit,
+	.stop		= intel_pstate_cpu_stop,
 	.name		= "intel_pstate",
 };
 
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 2/2] intel_pstate: Set core to min P state during core offline
  2014-03-18 15:01             ` Dirk Brandewie
@ 2014-03-18 18:52               ` Srivatsa S. Bhat
  2014-03-18 19:44                 ` Dirk Brandewie
  0 siblings, 1 reply; 28+ messages in thread
From: Srivatsa S. Bhat @ 2014-03-18 18:52 UTC (permalink / raw)
  To: Dirk Brandewie
  Cc: Viresh Kumar, Linux PM list, linux-kernel, Rafael J. Wysocki,
	Patrick Marlier, Dirk Brandewie

On 03/18/2014 08:31 PM, Dirk Brandewie wrote:
> On 03/17/2014 10:44 PM, Viresh Kumar wrote:
>> On Sat, Mar 15, 2014 at 2:33 AM,  <dirk.brandewie@gmail.com> wrote:
>>> +
>>>   static int intel_pstate_cpu_init(struct cpufreq_policy *policy)
>>>   {
>>>          struct cpudata *cpu;
>>> @@ -818,7 +824,7 @@ static struct cpufreq_driver intel_pstate_driver = {
>>>          .setpolicy      = intel_pstate_set_policy,
>>>          .get            = intel_pstate_get,
>>>          .init           = intel_pstate_cpu_init,
>>> -       .exit           = intel_pstate_cpu_exit,
>>> +       .stop           = intel_pstate_cpu_stop,
>>
>> Probably, keep exit as is and only change P-state in stop(). So that
>> allocation of resources happen in init() and they are freed in exit()?
>>
> I looked at doing just that but it junked up the code.  if stop() is called
> during PREPARE then init() will be called via __cpufreq_add_dev() in the
> ONLINE
> and DOWN_FAILED case. So once stop() is called the driver will be ready for
> init() to be called exactly like when exit() is called.
>

I'm sorry, but that didn't make much sense to me. Can you be a little
more specific as to what problems you hit while trying to have a
->stop() which sets min P state and a separate ->exit() which frees
the resources? I think we can achieve this with almost no trouble.

If you ignore the failure case (such as DOWN_FAILED) for now, do you
still see any serious roadblocks?

Regards,
Srivatsa S. Bhat


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v3 0/2] Add stop callback to the cpufreq_driver interface.
  2014-03-18 17:22         ` [PATCH v3 0/2] Add stop callback to the cpufreq_driver interface dirk.brandewie
  2014-03-18 17:22           ` [PATCH 1/2] cpufreq: Add stop callback to " dirk.brandewie
  2014-03-18 17:22           ` [PATCH 2/2] intel_pstate: Set core to min P state during core offline dirk.brandewie
@ 2014-03-18 19:08           ` Srivatsa S. Bhat
  2014-03-18 19:25             ` Dirk Brandewie
  2 siblings, 1 reply; 28+ messages in thread
From: Srivatsa S. Bhat @ 2014-03-18 19:08 UTC (permalink / raw)
  To: dirk.brandewie
  Cc: linux-pm, linux-kernel, rjw, patrick.marlier, viresh.kumar,
	Dirk Brandewie

On 03/18/2014 10:52 PM, dirk.brandewie@gmail.com wrote:
> From: Dirk Brandewie <dirk.j.brandewie@intel.com>
>

I don't mean to nitpick, but generally its easier to deal with
patchsets if you post the subsequent versions in fresh email threads.
Otherwise it can get a bit muddled along with too many other email
discussions in the same thread :-(
 
> Changes: 
> v2->v3
> Changed the calling of the ->stop() callback to be conditional on the
> core being the last core controlled by a given policy.
> 

Wait, why? I'm sorry if I am not catching up with the discussions on
this issue quickly enough, but I don't see why we should make it
conditional on _that_. I thought we agreed that we should make it
conditional in the sense that ->stop() should be invoked only for
->setpolicy drivers, right?

The way I look at it, ->stop() gives you a chance to stop managing
the CPU going offline. As in "stop this CPU". ->exit() is your chance
to cleanup the policy, since all its users have gone offline (or this
is the last CPU belonging to that policy which is going offline).

With this in mind, we should invoke ->stop() every time we take a
CPU offline, and invoke ->exit() only when the last CPU in the policy
goes offline.

What am I missing?

Regards,
Srivatsa S. Bhat

> v1->2
> Change name of callback function added to cpufreq_driver interface.
> 
> Some drivers (intel_pstate) need to modify state on a core before it
> is completely offline.  The ->exit() callback is executed during the
> CPU_POST_DEAD phase of the cpu offline process which is too late to
> change the state of the core.
> 
> Patch 1 adds an optional callback function to the cpufreq_driver
> interface which is called by the core during the CPU_DOWN_PREPARE
> phase of cpu offline in __cpufreq_remove_dev_prepare().
> 
> Patch 2 changes intel_pstate to use the ->stop callback to do
> its cleanup during cpu offline. 
> 
> Dirk Brandewie (2):
>   cpufreq: Add exit_prepare callback to cpufreq_driver interface
>   intel_pstate: Set core to min P state during core offline
> 
>  Documentation/cpu-freq/cpu-drivers.txt |  8 +++++++-
>  drivers/cpufreq/cpufreq.c              |  3 +++
>  drivers/cpufreq/intel_pstate.c         | 20 +++++++++++++-------
>  include/linux/cpufreq.h                |  1 +
>  4 files changed, 24 insertions(+), 8 deletions(-)
> 



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 2/2] intel_pstate: Set core to min P state during core offline
  2014-03-18 17:22           ` [PATCH 2/2] intel_pstate: Set core to min P state during core offline dirk.brandewie
@ 2014-03-18 19:08             ` Srivatsa S. Bhat
  0 siblings, 0 replies; 28+ messages in thread
From: Srivatsa S. Bhat @ 2014-03-18 19:08 UTC (permalink / raw)
  To: dirk.brandewie
  Cc: linux-pm, linux-kernel, rjw, patrick.marlier, viresh.kumar,
	Dirk Brandewie

On 03/18/2014 10:52 PM, dirk.brandewie@gmail.com wrote:
> From: Dirk Brandewie <dirk.j.brandewie@intel.com>
> 
> Change to use ->exit_prepare() callback to do clean up during CPU

->stop()

> hotplug. The requested P state for an offline core will be used by the
> hardware coordination function to select the package P state. If the
> core is under load when it is offlined it will fix the package P state
> floor to the requested P state of offline core.
> 
> Reported-by: Patrick Marlier <patrick.marlier@gmail.com>
> Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
> ---
>  drivers/cpufreq/intel_pstate.c | 20 +++++++++++++-------
>  1 file changed, 13 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
> index 2cd36b9..e9092fd 100644
> --- a/drivers/cpufreq/intel_pstate.c
> +++ b/drivers/cpufreq/intel_pstate.c
> @@ -447,7 +447,7 @@ static void core_set_pstate(struct cpudata *cpudata, int pstate)
>  	if (limits.no_turbo)
>  		val |= (u64)1 << 32;
> 
> -	wrmsrl(MSR_IA32_PERF_CTL, val);
> +	wrmsrl_on_cpu(cpudata->cpu, MSR_IA32_PERF_CTL, val);
>  }
> 
>  static struct cpu_defaults core_params = {
> @@ -773,16 +773,22 @@ static int intel_pstate_verify_policy(struct cpufreq_policy *policy)
>  	return 0;
>  }
> 
> -static int intel_pstate_cpu_exit(struct cpufreq_policy *policy)
> +static int intel_pstate_cpu_stop(struct cpufreq_policy *policy)
>  {
> -	int cpu = policy->cpu;
> +	int cpu_num = policy->cpu;
> +	struct cpudata *cpu = all_cpu_data[cpu_num];
> +
> +	pr_info("intel_pstate CPU %d exiting\n", cpu_num);
> +
> +	del_timer(&all_cpu_data[cpu_num]->timer);
> +	intel_pstate_set_pstate(cpu, cpu->pstate.min_pstate);
> +	kfree(all_cpu_data[cpu_num]);
> +	all_cpu_data[cpu_num] = NULL;
>

I think it should be relatively simple to keep the intel_pstate_set_pstate()
here inside ->stop() and move the rest of the code to ->exit().

Regards,
Srivatsa S. Bhat
 
> -	del_timer(&all_cpu_data[cpu]->timer);
> -	kfree(all_cpu_data[cpu]);
> -	all_cpu_data[cpu] = NULL;
>  	return 0;
>  }
> 
> +
>  static int intel_pstate_cpu_init(struct cpufreq_policy *policy)
>  {
>  	struct cpudata *cpu;
> @@ -818,7 +824,7 @@ static struct cpufreq_driver intel_pstate_driver = {
>  	.setpolicy	= intel_pstate_set_policy,
>  	.get		= intel_pstate_get,
>  	.init		= intel_pstate_cpu_init,
> -	.exit		= intel_pstate_cpu_exit,
> +	.stop		= intel_pstate_cpu_stop,
>  	.name		= "intel_pstate",
>  };
> 


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v3 0/2] Add stop callback to the cpufreq_driver interface.
  2014-03-18 19:08           ` [PATCH v3 0/2] Add stop callback to the cpufreq_driver interface Srivatsa S. Bhat
@ 2014-03-18 19:25             ` Dirk Brandewie
  2014-03-18 20:04               ` Srivatsa S. Bhat
  2014-03-19  0:53               ` Rafael J. Wysocki
  0 siblings, 2 replies; 28+ messages in thread
From: Dirk Brandewie @ 2014-03-18 19:25 UTC (permalink / raw)
  To: Srivatsa S. Bhat
  Cc: dirk.brandewie, linux-pm, linux-kernel, rjw, patrick.marlier,
	viresh.kumar, Dirk Brandewie

On 03/18/2014 12:08 PM, Srivatsa S. Bhat wrote:
> On 03/18/2014 10:52 PM, dirk.brandewie@gmail.com wrote:
>> From: Dirk Brandewie <dirk.j.brandewie@intel.com>
>>
>
> I don't mean to nitpick, but generally its easier to deal with
> patchsets if you post the subsequent versions in fresh email threads.
> Otherwise it can get a bit muddled along with too many other email
> discussions in the same thread :-(
>
>> Changes:
>> v2->v3
>> Changed the calling of the ->stop() callback to be conditional on the
>> core being the last core controlled by a given policy.
>>
>
> Wait, why? I'm sorry if I am not catching up with the discussions on
> this issue quickly enough, but I don't see why we should make it
> conditional on _that_. I thought we agreed that we should make it
> conditional in the sense that ->stop() should be invoked only for
> ->setpolicy drivers, right?

This was done at Viresh's suggestion since thought there might be value
for ->target drivers.

Any of the options work for me
    called only for set_policy scaling drivers
    called unconditionally for all scaling drivers
    called for last core controlled by a given policy

>
> The way I look at it, ->stop() gives you a chance to stop managing
> the CPU going offline. As in "stop this CPU". ->exit() is your chance
> to cleanup the policy, since all its users have gone offline (or this
> is the last CPU belonging to that policy which is going offline).
>
> With this in mind, we should invoke ->stop() every time we take a
> CPU offline, and invoke ->exit() only when the last CPU in the policy
> goes offline.

This is exactly what will happen for intel_pstate since the policies cover
a single core.

I will defer to you and Viresh how policies that affect more that one
cpu should be handled.

What intel_pstate needs it to be called during the PREPARE phase of the
offline process.

>
> What am I missing?
>
> Regards,
> Srivatsa S. Bhat
>


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 2/2] intel_pstate: Set core to min P state during core offline
  2014-03-18 18:52               ` Srivatsa S. Bhat
@ 2014-03-18 19:44                 ` Dirk Brandewie
  2014-03-18 20:15                   ` Srivatsa S. Bhat
  2014-03-19  5:20                   ` Viresh Kumar
  0 siblings, 2 replies; 28+ messages in thread
From: Dirk Brandewie @ 2014-03-18 19:44 UTC (permalink / raw)
  To: Srivatsa S. Bhat, Dirk Brandewie
  Cc: dirk.j.brandewie, Viresh Kumar, Linux PM list, linux-kernel,
	Rafael J. Wysocki, Patrick Marlier

On 03/18/2014 11:52 AM, Srivatsa S. Bhat wrote:
> On 03/18/2014 08:31 PM, Dirk Brandewie wrote:
>> On 03/17/2014 10:44 PM, Viresh Kumar wrote:
>>> On Sat, Mar 15, 2014 at 2:33 AM,  <dirk.brandewie@gmail.com> wrote:
>>>> +
>>>>    static int intel_pstate_cpu_init(struct cpufreq_policy *policy)
>>>>    {
>>>>           struct cpudata *cpu;
>>>> @@ -818,7 +824,7 @@ static struct cpufreq_driver intel_pstate_driver = {
>>>>           .setpolicy      = intel_pstate_set_policy,
>>>>           .get            = intel_pstate_get,
>>>>           .init           = intel_pstate_cpu_init,
>>>> -       .exit           = intel_pstate_cpu_exit,
>>>> +       .stop           = intel_pstate_cpu_stop,
>>>
>>> Probably, keep exit as is and only change P-state in stop(). So that
>>> allocation of resources happen in init() and they are freed in exit()?
>>>
>> I looked at doing just that but it junked up the code.  if stop() is called
>> during PREPARE then init() will be called via __cpufreq_add_dev() in the
>> ONLINE
>> and DOWN_FAILED case. So once stop() is called the driver will be ready for
>> init() to be called exactly like when exit() is called.
>>
>
> I'm sorry, but that didn't make much sense to me. Can you be a little
> more specific as to what problems you hit while trying to have a
> ->stop() which sets min P state and a separate ->exit() which frees
> the resources? I think we can achieve this with almost no trouble.
>

There was no problem per se.  In stop() all I really needed to do is stop the
timer and set the P state to MIN.

At init time I need to allocate memory and start timer.  If stopping the timer
and deallocating memory are separated then I need code in init() to detect
this case.

Moving all the clean up to stop() make my code simpler, covers the
failure case and maintains the behaviour expected by the core.

> If you ignore the failure case (such as DOWN_FAILED) for now, do you
> still see any serious roadblocks?

Why would I ignore a valid failure case?

>
> Regards,
> Srivatsa S. Bhat
>


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v3 0/2] Add stop callback to the cpufreq_driver interface.
  2014-03-18 19:25             ` Dirk Brandewie
@ 2014-03-18 20:04               ` Srivatsa S. Bhat
  2014-03-19  0:53               ` Rafael J. Wysocki
  1 sibling, 0 replies; 28+ messages in thread
From: Srivatsa S. Bhat @ 2014-03-18 20:04 UTC (permalink / raw)
  To: Dirk Brandewie
  Cc: linux-pm, linux-kernel, rjw, patrick.marlier, viresh.kumar,
	Dirk Brandewie

On 03/19/2014 12:55 AM, Dirk Brandewie wrote:
> On 03/18/2014 12:08 PM, Srivatsa S. Bhat wrote:
>> On 03/18/2014 10:52 PM, dirk.brandewie@gmail.com wrote:
>>> From: Dirk Brandewie <dirk.j.brandewie@intel.com>
>>>
>>
>> I don't mean to nitpick, but generally its easier to deal with
>> patchsets if you post the subsequent versions in fresh email threads.
>> Otherwise it can get a bit muddled along with too many other email
>> discussions in the same thread :-(
>>
>>> Changes:
>>> v2->v3
>>> Changed the calling of the ->stop() callback to be conditional on the
>>> core being the last core controlled by a given policy.
>>>
>>
>> Wait, why? I'm sorry if I am not catching up with the discussions on
>> this issue quickly enough, but I don't see why we should make it
>> conditional on _that_. I thought we agreed that we should make it
>> conditional in the sense that ->stop() should be invoked only for
>> ->setpolicy drivers, right?
> 
> This was done at Viresh's suggestion since thought there might be value
> for ->target drivers.
> 
> Any of the options work for me
>    called only for set_policy scaling drivers
>    called unconditionally for all scaling drivers
>    called for last core controlled by a given policy
> 
>>
>> The way I look at it, ->stop() gives you a chance to stop managing
>> the CPU going offline. As in "stop this CPU". ->exit() is your chance
>> to cleanup the policy, since all its users have gone offline (or this
>> is the last CPU belonging to that policy which is going offline).
>>
>> With this in mind, we should invoke ->stop() every time we take a
>> CPU offline, and invoke ->exit() only when the last CPU in the policy
>> goes offline.
> 
> This is exactly what will happen for intel_pstate since the policies cover
> a single core.
> 
> I will defer to you and Viresh how policies that affect more that one
> cpu should be handled.
> 
> What intel_pstate needs it to be called during the PREPARE phase of the
> offline process.
>

Sure, so here are my thoughts:

1. ->target() and ->setpolicy() drivers are mutually exclusive. Rafael
   had sent a patch to enforce this, and Viresh acked this patch.

   http://marc.info/?l=linux-pm&m=139458107925911&w=2
   http://marc.info/?l=linux-pm&m=139460177829875&w=2

2. The way I understand, ->target() drivers use one of the defined governors,
   and hence have a way to stop managing the CPUs upon CPU offline events.
   This is done via the GOV_STOP mechanism (in the DOWN_PREPARE stage).

   On the other hand, the ->setpolicy() drivers don't have any equivalent
   mechanism at all, and all they have today is the ->exit() callback which
   is invoked way too late in the offline process, for it to be of any use.

So the goal here is to introduce a new mechanism or callback to help those
->setpolicy drivers to do whatever they wish to do, while taking the CPU
offline. In the case of intel_pstate, the driver wants to set the outgoing
CPU's frequency to the min P state.

With this reasoning, the cpufreq core should invoke the ->stop() callback
only for the ->setpolicy() drivers. Let us not over-generalize interfaces
unless they are actually going to be useful in broader scenarios.


Now let's focus on the second issue - how often should we call ->stop()?
Should we call it on every CPU offline or only upon the last CPU going
offline in a given policy?

If we look back at the ->target() drivers who use the defined governors,
we _effectively_ call GOV_STOP during every CPU offline event. That is,
the policy needs to stop governing the CPU going offline from this point
onwards, hence the GOV_STOP (and subsequent GOV_START without the offline
CPU) makes sense.

Similarly, I believe that we should call ->stop() on every CPU offline.
We already have ->exit() that gets called when the last CPU goes offline
in that policy. With this scheme, we give 2 unique properties to ->stop()
as compared to ->exit():

a. ->stop() gets called during the CPU_DOWN_PREPARE stage, which lets
   the ->setpolicy driver actually take some action on the CPU's frequency.

b. ->stop() gets called for every CPU offline. The driver can use this
   fact if useful. Otherwise, the driver can still ignore some of these
   calls and do the actual work when the last CPU goes offline in that
   policy. From what I know, the former case is much more useful, and
   hence this semantics of invoking it on every CPU offline makes sense.

Thoughts?

Regards,
Srivatsa S. Bhat


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 2/2] intel_pstate: Set core to min P state during core offline
  2014-03-18 19:44                 ` Dirk Brandewie
@ 2014-03-18 20:15                   ` Srivatsa S. Bhat
  2014-03-19  5:20                   ` Viresh Kumar
  1 sibling, 0 replies; 28+ messages in thread
From: Srivatsa S. Bhat @ 2014-03-18 20:15 UTC (permalink / raw)
  To: Dirk Brandewie
  Cc: dirk.j.brandewie, Viresh Kumar, Linux PM list, linux-kernel,
	Rafael J. Wysocki, Patrick Marlier

On 03/19/2014 01:14 AM, Dirk Brandewie wrote:
> On 03/18/2014 11:52 AM, Srivatsa S. Bhat wrote:
>> On 03/18/2014 08:31 PM, Dirk Brandewie wrote:
>>> On 03/17/2014 10:44 PM, Viresh Kumar wrote:
>>>> On Sat, Mar 15, 2014 at 2:33 AM,  <dirk.brandewie@gmail.com> wrote:
>>>>> +
>>>>>    static int intel_pstate_cpu_init(struct cpufreq_policy *policy)
>>>>>    {
>>>>>           struct cpudata *cpu;
>>>>> @@ -818,7 +824,7 @@ static struct cpufreq_driver
>>>>> intel_pstate_driver = {
>>>>>           .setpolicy      = intel_pstate_set_policy,
>>>>>           .get            = intel_pstate_get,
>>>>>           .init           = intel_pstate_cpu_init,
>>>>> -       .exit           = intel_pstate_cpu_exit,
>>>>> +       .stop           = intel_pstate_cpu_stop,
>>>>
>>>> Probably, keep exit as is and only change P-state in stop(). So that
>>>> allocation of resources happen in init() and they are freed in exit()?
>>>>
>>> I looked at doing just that but it junked up the code.  if stop() is
>>> called
>>> during PREPARE then init() will be called via __cpufreq_add_dev() in the
>>> ONLINE
>>> and DOWN_FAILED case. So once stop() is called the driver will be
>>> ready for
>>> init() to be called exactly like when exit() is called.
>>>
>>
>> I'm sorry, but that didn't make much sense to me. Can you be a little
>> more specific as to what problems you hit while trying to have a
>> ->stop() which sets min P state and a separate ->exit() which frees
>> the resources? I think we can achieve this with almost no trouble.
>>
> 
> There was no problem per se.  In stop() all I really needed to do is
> stop the
> timer and set the P state to MIN.
> 
> At init time I need to allocate memory and start timer.  If stopping the
> timer
> and deallocating memory are separated then I need code in init() to detect
> this case.
> 
> Moving all the clean up to stop() make my code simpler, covers the
> failure case and maintains the behaviour expected by the core.
> 
>> If you ignore the failure case (such as DOWN_FAILED) for now, do you
>> still see any serious roadblocks?
> 
> Why would I ignore a valid failure case?
> 

Of course you shouldn't ignore it. I was just trying to make it easier
to think about the design without complicating it with arcane failure
cases right at the outset, that's all.

Now that I looked at it again, I see your point. The problem is that
by the DOWN_PREPARE stage, the core would have completed only half the
tear-down (via __cpufreq_remove_dev_prepare()), but on failure, it tries
to do a full init (via __cpufreq_add_dev()). I would say that's actually
not a great design from the cpufreq core perspective, but perhaps we can
fix it at a later point in time if it is that painful to endure.

So yes, now I understand see why you do all the teardown in ->stop(),
to workaround the somewhat inconvenient rollback performed by the
cpufreq core. Your approach looks good to me.

Regards,
Srivatsa S. Bhat


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v3 0/2] Add stop callback to the cpufreq_driver interface.
  2014-03-18 19:25             ` Dirk Brandewie
  2014-03-18 20:04               ` Srivatsa S. Bhat
@ 2014-03-19  0:53               ` Rafael J. Wysocki
  2014-03-19  5:33                 ` Viresh Kumar
  1 sibling, 1 reply; 28+ messages in thread
From: Rafael J. Wysocki @ 2014-03-19  0:53 UTC (permalink / raw)
  To: Dirk Brandewie
  Cc: Srivatsa S. Bhat, linux-pm, linux-kernel, patrick.marlier,
	viresh.kumar, Dirk Brandewie

On Tuesday, March 18, 2014 12:25:14 PM Dirk Brandewie wrote:
> On 03/18/2014 12:08 PM, Srivatsa S. Bhat wrote:
> > On 03/18/2014 10:52 PM, dirk.brandewie@gmail.com wrote:
> >> From: Dirk Brandewie <dirk.j.brandewie@intel.com>
> >>
> >
> > I don't mean to nitpick, but generally its easier to deal with
> > patchsets if you post the subsequent versions in fresh email threads.
> > Otherwise it can get a bit muddled along with too many other email
> > discussions in the same thread :-(
> >
> >> Changes:
> >> v2->v3
> >> Changed the calling of the ->stop() callback to be conditional on the
> >> core being the last core controlled by a given policy.
> >>
> >
> > Wait, why? I'm sorry if I am not catching up with the discussions on
> > this issue quickly enough, but I don't see why we should make it
> > conditional on _that_. I thought we agreed that we should make it
> > conditional in the sense that ->stop() should be invoked only for
> > ->setpolicy drivers, right?
> 
> This was done at Viresh's suggestion since thought there might be value
> for ->target drivers.
> 
> Any of the options work for me
>     called only for set_policy scaling drivers

And that's what we should do *today* in my opinion, unless we want to add
it to any ->target() drivers *right* now.  Do we want that?

Rafael


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 1/2] cpufreq: Add stop callback to cpufreq_driver interface
  2014-03-18 17:22           ` [PATCH 1/2] cpufreq: Add stop callback to " dirk.brandewie
@ 2014-03-19  5:04             ` Viresh Kumar
  0 siblings, 0 replies; 28+ messages in thread
From: Viresh Kumar @ 2014-03-19  5:04 UTC (permalink / raw)
  To: Dirk Brandewie
  Cc: linux-pm, Linux Kernel Mailing List, Rafael J. Wysocki,
	Patrick Marlier, Srivatsa S. Bhat, Dirk Brandewie

On 18 March 2014 22:52,  <dirk.brandewie@gmail.com> wrote:
> From: Dirk Brandewie <dirk.j.brandewie@intel.com>
>
> This callback allows the driver to do clean up before the CPU is
> completely down and its state cannot be modified.  This is used
> by the intel_pstate driver to reduce the requested P state prior to
> the core going away.  This is required because the requested P state
> of the offline core is used to select the package P state. This
> effectively sets the floor package P state to the requested P state on
> the offline core.
>
> Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
> ---
>  Documentation/cpu-freq/cpu-drivers.txt | 8 +++++++-
>  drivers/cpufreq/cpufreq.c              | 3 ++-
>  include/linux/cpufreq.h                | 1 +
>  3 files changed, 10 insertions(+), 2 deletions(-)
>
> diff --git a/Documentation/cpu-freq/cpu-drivers.txt b/Documentation/cpu-freq/cpu-drivers.txt
> index 8b1a445..79def80 100644
> --- a/Documentation/cpu-freq/cpu-drivers.txt
> +++ b/Documentation/cpu-freq/cpu-drivers.txt
> @@ -61,7 +61,13 @@ target_index         -       See below on the differences.
>
>  And optionally
>
> -cpufreq_driver.exit -          A pointer to a per-CPU cleanup function.
> +cpufreq_driver.exit -          A pointer to a per-CPU cleanup
> +                               function called during CPU_POST_DEAD
> +                               phase of cpu hotplug process.
> +
> +cpufreq_driver.stop -          A pointer to a per-CPU stop function
> +                               called during CPU_DOWN_PREPARE phase of
> +                               cpu hotplug process.
>
>  cpufreq_driver.resume -                A pointer to a per-CPU resume function
>                                 which is called with interrupts disabled
> diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
> index cf485d9..6e6beae 100644
> --- a/drivers/cpufreq/cpufreq.c
> +++ b/drivers/cpufreq/cpufreq.c
> @@ -1336,7 +1336,8 @@ static int __cpufreq_remove_dev_prepare(struct device *dev,
>                                                 __func__, new_cpu, cpu);
>                         }
>                 }
> -       }
> +       } else if (cpufreq_driver->stop)
> +               cpufreq_driver->stop(policy);
>
>         return 0;
>  }
> diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h
> index 4d89e0e..ff8db19 100644
> --- a/include/linux/cpufreq.h
> +++ b/include/linux/cpufreq.h
> @@ -224,6 +224,7 @@ struct cpufreq_driver {
>         int     (*bios_limit)   (int cpu, unsigned int *limit);
>
>         int     (*exit)         (struct cpufreq_policy *policy);
> +       int     (*stop)         (struct cpufreq_policy *policy);
>         int     (*suspend)      (struct cpufreq_policy *policy);
>         int     (*resume)       (struct cpufreq_policy *policy);
>         struct freq_attr        **attr;

Acked-by: Viresh Kumar <viresh.kumar@linaro.org>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 2/2] intel_pstate: Set core to min P state during core offline
  2014-03-18 19:44                 ` Dirk Brandewie
  2014-03-18 20:15                   ` Srivatsa S. Bhat
@ 2014-03-19  5:20                   ` Viresh Kumar
  2014-03-19 15:32                     ` Dirk Brandewie
  1 sibling, 1 reply; 28+ messages in thread
From: Viresh Kumar @ 2014-03-19  5:20 UTC (permalink / raw)
  To: Dirk Brandewie
  Cc: Srivatsa S. Bhat, Dirk Brandewie, Linux PM list, linux-kernel,
	Rafael J. Wysocki, Patrick Marlier

On 19 March 2014 01:14, Dirk Brandewie <dirk.brandewie@gmail.com> wrote:
> There was no problem per se.  In stop() all I really needed to do is stop
> the
> timer and set the P state to MIN.
>
> At init time I need to allocate memory and start timer.  If stopping the
> timer
> and deallocating memory are separated then I need code in init() to detect
> this case.

Sorry, I didn't understood what exactly is special here :(

If we return failure from CPU_POST_DEAD for some reason without
calling exit() then you will have memory leak in your init() as we are
allocating memory without checking if we already have that (nothing wrong
in it though as other parts of kernel should handle things properly here).

Probably the situation would be exactly same if we divide the exit path into
stop and exit routines, which I still feel is the right way forward. Because
ideally cpufreq shouldn't call init() if it hasn't called exit() (If
it is doing that
right now then its wrong and can be fixed). And so you must do the cleanup
in exit()..

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v3 0/2] Add stop callback to the cpufreq_driver interface.
  2014-03-19  0:53               ` Rafael J. Wysocki
@ 2014-03-19  5:33                 ` Viresh Kumar
  2014-03-19 14:01                   ` Rafael J. Wysocki
  0 siblings, 1 reply; 28+ messages in thread
From: Viresh Kumar @ 2014-03-19  5:33 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Dirk Brandewie, Srivatsa S. Bhat, linux-pm,
	Linux Kernel Mailing List, Patrick Marlier, Dirk Brandewie

On 19 March 2014 06:23, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> On Tuesday, March 18, 2014 12:25:14 PM Dirk Brandewie wrote:
>> On 03/18/2014 12:08 PM, Srivatsa S. Bhat wrote:
>> > On 03/18/2014 10:52 PM, dirk.brandewie@gmail.com wrote:
>> >> From: Dirk Brandewie <dirk.j.brandewie@intel.com>
>> >>
>> >
>> > I don't mean to nitpick, but generally its easier to deal with
>> > patchsets if you post the subsequent versions in fresh email threads.
>> > Otherwise it can get a bit muddled along with too many other email
>> > discussions in the same thread :-(
>> >
>> >> Changes:
>> >> v2->v3
>> >> Changed the calling of the ->stop() callback to be conditional on the
>> >> core being the last core controlled by a given policy.
>> >>
>> >
>> > Wait, why? I'm sorry if I am not catching up with the discussions on
>> > this issue quickly enough, but I don't see why we should make it
>> > conditional on _that_. I thought we agreed that we should make it
>> > conditional in the sense that ->stop() should be invoked only for
>> > ->setpolicy drivers, right?
>>
>> This was done at Viresh's suggestion since thought there might be value
>> for ->target drivers.
>>
>> Any of the options work for me
>>     called only for set_policy scaling drivers
>
> And that's what we should do *today* in my opinion, unless we want to add
> it to any ->target() drivers *right* now.  Do we want that?

We don't have a platform right now that might want to use it, but people
are doing this during suspend and so there is a high possibility that they
will use it while normal cpu offline as well..

This is what I think:
- We are adding a new callback to struct cpufreq_driver..
- We have a classic case here because a driver (set-policy ones) is both a
driver and governor. And that's why its special..
- All we want here is to give drivers a chance to do something useful on the
CPUs that are going down..
- There is nothing like GOV_STOP for setpolicy drivers as we never needed
it and if we really want that, probably we better register setpolicy drivers as
governors as well (only to a level where they would get GOV_START|STOP|etc)
callbacks and nothing else.
- And so I wanted the ->stop() callback to be more about the driver than the
governor.
- And because a policy contains only the CPUs that share clock line, its
only required to call stop() for last CPU of the policy.

--
viresh

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v3 0/2] Add stop callback to the cpufreq_driver interface.
  2014-03-19 14:01                   ` Rafael J. Wysocki
@ 2014-03-19 13:49                     ` Viresh Kumar
  2014-03-19 14:25                       ` Rafael J. Wysocki
  0 siblings, 1 reply; 28+ messages in thread
From: Viresh Kumar @ 2014-03-19 13:49 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Dirk Brandewie, Srivatsa S. Bhat, linux-pm,
	Linux Kernel Mailing List, Patrick Marlier, Dirk Brandewie

On 19 March 2014 19:31, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> Let's add ->stop() (or whatever to call it) *specifically* for ->setpolicy
> drivers

setpolicy_stop() ? I know that's a bad name :)

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v3 0/2] Add stop callback to the cpufreq_driver interface.
  2014-03-19  5:33                 ` Viresh Kumar
@ 2014-03-19 14:01                   ` Rafael J. Wysocki
  2014-03-19 13:49                     ` Viresh Kumar
  0 siblings, 1 reply; 28+ messages in thread
From: Rafael J. Wysocki @ 2014-03-19 14:01 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Dirk Brandewie, Srivatsa S. Bhat, linux-pm,
	Linux Kernel Mailing List, Patrick Marlier, Dirk Brandewie

On Wednesday, March 19, 2014 11:03:56 AM Viresh Kumar wrote:
> On 19 March 2014 06:23, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> > On Tuesday, March 18, 2014 12:25:14 PM Dirk Brandewie wrote:
> >> On 03/18/2014 12:08 PM, Srivatsa S. Bhat wrote:
> >> > On 03/18/2014 10:52 PM, dirk.brandewie@gmail.com wrote:
> >> >> From: Dirk Brandewie <dirk.j.brandewie@intel.com>
> >> >>
> >> >
> >> > I don't mean to nitpick, but generally its easier to deal with
> >> > patchsets if you post the subsequent versions in fresh email threads.
> >> > Otherwise it can get a bit muddled along with too many other email
> >> > discussions in the same thread :-(
> >> >
> >> >> Changes:
> >> >> v2->v3
> >> >> Changed the calling of the ->stop() callback to be conditional on the
> >> >> core being the last core controlled by a given policy.
> >> >>
> >> >
> >> > Wait, why? I'm sorry if I am not catching up with the discussions on
> >> > this issue quickly enough, but I don't see why we should make it
> >> > conditional on _that_. I thought we agreed that we should make it
> >> > conditional in the sense that ->stop() should be invoked only for
> >> > ->setpolicy drivers, right?
> >>
> >> This was done at Viresh's suggestion since thought there might be value
> >> for ->target drivers.
> >>
> >> Any of the options work for me
> >>     called only for set_policy scaling drivers
> >
> > And that's what we should do *today* in my opinion, unless we want to add
> > it to any ->target() drivers *right* now.  Do we want that?
> 
> We don't have a platform right now that might want to use it, but people
> are doing this during suspend and so there is a high possibility that they
> will use it while normal cpu offline as well..
> 
> This is what I think:
> - We are adding a new callback to struct cpufreq_driver..
> - We have a classic case here because a driver (set-policy ones) is both a
> driver and governor. And that's why its special..
> - All we want here is to give drivers a chance to do something useful on the
> CPUs that are going down..
> - There is nothing like GOV_STOP for setpolicy drivers as we never needed
> it and if we really want that, probably we better register setpolicy drivers as
> governors as well (only to a level where they would get GOV_START|STOP|etc)
> callbacks and nothing else.
> - And so I wanted the ->stop() callback to be more about the driver than the
> governor.
> - And because a policy contains only the CPUs that share clock line, its
> only required to call stop() for last CPU of the policy.

So you're still insisting on putting ->setpolicy and ->target drivers into one
bag, which in my opinion is a mistake.  Moreover, it has always been a mistake.

Let's add ->stop() (or whatever to call it) *specifically* for ->setpolicy
drivers, so that the meaning of it is entirely clear.  And *if* any ->target
drivers need a similar callback, let's add it for them *separately* (just as
a different callback pointer).

Yes, we'll potentially waste a pointer size worth of storage this way, but then
it will be clear who's supposed to use those new callbacks and when.

-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v3 0/2] Add stop callback to the cpufreq_driver interface.
  2014-03-19 13:49                     ` Viresh Kumar
@ 2014-03-19 14:25                       ` Rafael J. Wysocki
  0 siblings, 0 replies; 28+ messages in thread
From: Rafael J. Wysocki @ 2014-03-19 14:25 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Dirk Brandewie, Srivatsa S. Bhat, linux-pm,
	Linux Kernel Mailing List, Patrick Marlier, Dirk Brandewie

On Wednesday, March 19, 2014 07:19:13 PM Viresh Kumar wrote:
> On 19 March 2014 19:31, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> > Let's add ->stop() (or whatever to call it) *specifically* for ->setpolicy
> > drivers
> 
> setpolicy_stop() ? I know that's a bad name :)

Well, what about ->stop_cpu()?

-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 2/2] intel_pstate: Set core to min P state during core offline
  2014-03-19  5:20                   ` Viresh Kumar
@ 2014-03-19 15:32                     ` Dirk Brandewie
  0 siblings, 0 replies; 28+ messages in thread
From: Dirk Brandewie @ 2014-03-19 15:32 UTC (permalink / raw)
  To: Viresh Kumar, Dirk Brandewie
  Cc: dirk.j.brandewie, Srivatsa S. Bhat, Linux PM list, linux-kernel,
	Rafael J. Wysocki, Patrick Marlier

On 03/18/2014 10:20 PM, Viresh Kumar wrote:
> On 19 March 2014 01:14, Dirk Brandewie <dirk.brandewie@gmail.com> wrote:
>> There was no problem per se.  In stop() all I really needed to do is stop
>> the
>> timer and set the P state to MIN.
>>
>> At init time I need to allocate memory and start timer.  If stopping the
>> timer
>> and deallocating memory are separated then I need code in init() to detect
>> this case.
>
> Sorry, I didn't understood what exactly is special here :(
>
> If we return failure from CPU_POST_DEAD for some reason without
> calling exit() then you will have memory leak in your init() as we are
> allocating memory without checking if we already have that (nothing wrong
> in it though as other parts of kernel should handle things properly here).

No.  If you got the CPU_POST_DEAD callback CPU_DOWN_PREPARE has already
succeeded.  init() is called on the CPU_ONLINE and CPU_DOWN_FAILED path.

The issue is there is a two part teardown that can fail and the teardown
fail will be followed by a call to init().

If the timer is not running (stopped in stop()) then there is no reason to
have the memory around. If CPU_DOWN_PREPARE happens followed by CPU_DOWN_FAILED
then intel_pstate is ready for init() to be called with no special case
code.  This maintains the semantics the core expects.


>
> Probably the situation would be exactly same if we divide the exit path into
> stop and exit routines, which I still feel is the right way forward. Because
> ideally cpufreq shouldn't call init() if it hasn't called exit() (If
> it is doing that
> right now then its wrong and can be fixed). And so you must do the cleanup
> in exit()..
>

The core *is* doing this on the CPU_DOWN_FAILED path ATM.

On the CPU_DOWN_FAILED path the core should be undoing the work it did in the
CPU_DOWN_PREPARE path this would require another callback to drivers to let
them "restart" after a call to stop() as well.

I don't think it is worth that level of effort IMHO.

--Dirk

^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2014-03-19 15:32 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <CAKQMxzQhVz0QT6tV0PKFjXXpYpDbtnOBH=miARoeQhrvs2TKFQ@mail.gmail.com>
     [not found] ` <5320D05F.7050304@gmail.com>
     [not found]   ` <5320D12B.4040303@gmail.com>
2014-03-13  0:07     ` intel_pstate: Lower p-state when putting down CPU Rafael J. Wysocki
2014-03-13  4:56       ` Viresh Kumar
2014-03-13 21:39         ` Rafael J. Wysocki
2014-03-14 21:03       ` [PATCH v2 0/2] Add exit_prepare callback to the cpufreq_driver interface dirk.brandewie
2014-03-14 21:03         ` [PATCH v2 1/2] cpufreq: Add exit_prepare callback to " dirk.brandewie
2014-03-15  2:04           ` Rafael J. Wysocki
2014-03-18  5:43             ` Viresh Kumar
2014-03-14 21:03         ` [PATCH v2 2/2] intel_pstate: Set core to min P state during core offline dirk.brandewie
2014-03-18  5:44           ` Viresh Kumar
2014-03-18 15:01             ` Dirk Brandewie
2014-03-18 18:52               ` Srivatsa S. Bhat
2014-03-18 19:44                 ` Dirk Brandewie
2014-03-18 20:15                   ` Srivatsa S. Bhat
2014-03-19  5:20                   ` Viresh Kumar
2014-03-19 15:32                     ` Dirk Brandewie
2014-03-18 17:22         ` [PATCH v3 0/2] Add stop callback to the cpufreq_driver interface dirk.brandewie
2014-03-18 17:22           ` [PATCH 1/2] cpufreq: Add stop callback to " dirk.brandewie
2014-03-19  5:04             ` Viresh Kumar
2014-03-18 17:22           ` [PATCH 2/2] intel_pstate: Set core to min P state during core offline dirk.brandewie
2014-03-18 19:08             ` Srivatsa S. Bhat
2014-03-18 19:08           ` [PATCH v3 0/2] Add stop callback to the cpufreq_driver interface Srivatsa S. Bhat
2014-03-18 19:25             ` Dirk Brandewie
2014-03-18 20:04               ` Srivatsa S. Bhat
2014-03-19  0:53               ` Rafael J. Wysocki
2014-03-19  5:33                 ` Viresh Kumar
2014-03-19 14:01                   ` Rafael J. Wysocki
2014-03-19 13:49                     ` Viresh Kumar
2014-03-19 14:25                       ` Rafael J. Wysocki

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.