linux-kernel.vger.kernel.org archive mirror
* Re: [PATCH V8 1/8] PM / devfreq: Add cpu based scaling support to passive_governor
       [not found]   ` <1616499241-4906-2-git-send-email-andrew-sh.cheng@mediatek.com>
@ 2021-03-25  7:42     ` Chanwoo Choi
  2021-03-25  8:14     ` Chanwoo Choi
  1 sibling, 0 replies; 14+ messages in thread
From: Chanwoo Choi @ 2021-03-25  7:42 UTC (permalink / raw)
  To: Andrew-sh.Cheng, MyungJoo Ham, Kyungmin Park, Rob Herring,
	Mark Rutland, Matthias Brugger, Rafael J. Wysocki, Viresh Kumar,
	Nishanth Menon, Stephen Boyd, Liam Girdwood, Mark Brown
  Cc: linux-pm, devicetree, linux-arm-kernel, linux-mediatek,
	linux-kernel, srv_heupstream, Saravana Kannan, Sibi Sankar

Hi,

You did not send these patches to the linux-pm mailing list.
Please send them to the linux-pm ML as well.

Also, before I received this series, I tried to clean up these patches
on my testing branch[1], so my comments below are based on that clean-up.
[1] https://git.kernel.org/pub/scm/linux/kernel/git/chanwoo/linux.git/log/?h=devfreq-testing-passive-gov


On 3/23/21 8:33 PM, Andrew-sh.Cheng wrote:
> From: Saravana Kannan <skannan@codeaurora.org>
> 
> Many CPU architectures have caches that can scale independent of the
> CPUs. Frequency scaling of the caches is necessary to make sure that the
> cache is not a performance bottleneck that leads to poor performance and
> power. The same idea applies for RAM/DDR.
> 
> To achieve this, this patch adds support for cpu based scaling to the
> passive governor. This is accomplished by taking the current frequency
> of each CPU frequency domain and then adjust the frequency of the cache
> (or any devfreq device) based on the frequency of the CPUs. It listens
> to CPU frequency transition notifiers to keep itself up to date on the
> current CPU frequency.
> 
> To decide the frequency of the device, the governor does one of the
> following:
> * Derives the optimal devfreq device opp from required-opps property of
>   the parent cpu opp_table.
> 
> * Scales the device frequency in proportion to the CPU frequency. So, if
>   the CPUs are running at their max frequency, the device runs at its
>   max frequency. If the CPUs are running at their min frequency, the
>   device runs at its min frequency. It is interpolated for frequencies
>   in between.
> 
> Andrew-sh.Cheng change
> dev_pm_opp_xlate_opp to dev_pm_opp_xlate_required_opp devfreq->max_freq
> to devfreq->user_min_freq_req.data.freq.qos->min_freq.target_value
> after kernel-5.7
> Don't return -EINVAL in devfreq_passive_event_handler()
> since it doesn't handle DEVFREQ_GOV_SUSPEND DEVFREQ_GOV_RESUME cases.
> 
> Signed-off-by: Saravana Kannan <skannan@codeaurora.org>
> [Sibi: Integrated cpu-freqmap governor into passive_governor]
> Signed-off-by: Sibi Sankar <sibis@codeaurora.org>
> Signed-off-by: Andrew-sh.Cheng <andrew-sh.cheng@mediatek.com>
> ---
>  drivers/devfreq/Kconfig            |   2 +
>  drivers/devfreq/governor_passive.c | 329 +++++++++++++++++++++++++++++++++++--
>  include/linux/devfreq.h            |  29 +++-
>  3 files changed, 342 insertions(+), 18 deletions(-)
> 
> diff --git a/drivers/devfreq/Kconfig b/drivers/devfreq/Kconfig
> index 00704efe6398..f56132b0ae64 100644
> --- a/drivers/devfreq/Kconfig
> +++ b/drivers/devfreq/Kconfig
> @@ -73,6 +73,8 @@ config DEVFREQ_GOV_PASSIVE
>  	  device. This governor does not change the frequency by itself
>  	  through sysfs entries. The passive governor recommends that
>  	  devfreq device uses the OPP table to get the frequency/voltage.
> +	  Alternatively the governor can also be chosen to scale based on
> +	  the online CPUs current frequency.
>  
>  comment "DEVFREQ Drivers"
>  
> diff --git a/drivers/devfreq/governor_passive.c b/drivers/devfreq/governor_passive.c
> index b094132bd20b..9cc57b083839 100644
> --- a/drivers/devfreq/governor_passive.c
> +++ b/drivers/devfreq/governor_passive.c
> @@ -8,11 +8,103 @@
>   */
>  
>  #include <linux/module.h>
> +#include <linux/cpu.h>
> +#include <linux/cpufreq.h>
> +#include <linux/cpumask.h>
>  #include <linux/device.h>
>  #include <linux/devfreq.h>
> +#include <linux/slab.h>
>  #include "governor.h"
>  
> -static int devfreq_passive_get_target_freq(struct devfreq *devfreq,
> +struct devfreq_cpu_state {
> +	unsigned int curr_freq;
> +	unsigned int min_freq;
> +	unsigned int max_freq;
> +	unsigned int first_cpu;
> +	struct device *cpu_dev;
> +	struct opp_table *opp_table;
> +};

As far as I know, the previous version had a description of this structure.
I want to add a description like the one below.

And if you have no objection, I'd like you to order
the fields as follows and use 'dev' instead of 'cpu_dev',
because this patch uses 'cpu_state->cpu_dev' at multiple points.
I think that 'cpu_state->dev' is better than 'cpu_state->cpu_dev'.
Also, I prefer 'cur_freq' over 'curr_freq'
because the devfreq subsystem uses 'cur_freq' for the current frequency.

/**
 * struct devfreq_cpu_state - Hold the per-cpu data
 * @dev:        reference to cpu device.
 * @first_cpu:  the cpumask of the first cpu of a policy.
 * @opp_table:  reference to cpu opp table.
 * @cur_freq:   the current frequency of the cpu.
 * @min_freq:   the min frequency of the cpu.
 * @max_freq:   the max frequency of the cpu.
 *
 * This structure stores the required cpu_data of a cpu.
 * This is auto-populated by the governor.
 */
struct devfreq_cpu_state {
	struct device *dev;
	unsigned int first_cpu;

	struct opp_table *opp_table;
	unsigned int cur_freq;
	unsigned int min_freq;
	unsigned int max_freq;
};


> +
> +static unsigned long xlate_cpufreq_to_devfreq(struct devfreq_passive_data *data,
> +					      unsigned int cpu)
> +{
> +	unsigned int cpu_min_freq, cpu_max_freq, cpu_curr_freq_khz, cpu_percent;
> +	unsigned long dev_min_freq, dev_max_freq, dev_max_state;
> +
> +	struct devfreq_cpu_state *cpu_state = data->cpu_state[cpu];
> +	struct devfreq *devfreq = (struct devfreq *)data->this;
> +	unsigned long *dev_freq_table = devfreq->profile->freq_table;
> +	struct dev_pm_opp *opp = NULL, *p_opp = NULL;
> +	unsigned long cpu_curr_freq, freq;
> +
> +	if (!cpu_state || cpu_state->first_cpu != cpu ||
> +	    !cpu_state->opp_table || !devfreq->opp_table)
> +		return 0;
> +
> +	cpu_curr_freq = cpu_state->curr_freq * 1000;
> +	p_opp = devfreq_recommended_opp(cpu_state->cpu_dev, &cpu_curr_freq, 0);
> +	if (IS_ERR(p_opp))
> +		return 0;
> +
> +	opp = dev_pm_opp_xlate_required_opp(cpu_state->opp_table,
> +					    devfreq->opp_table, p_opp);
> +	dev_pm_opp_put(p_opp);
> +
> +	if (!IS_ERR(opp)) {
> +		freq = dev_pm_opp_get_freq(opp);
> +		dev_pm_opp_put(opp);
> +		goto out;
> +	}
> +
> +	/* Use Interpolation if required opps is not available */
> +	cpu_min_freq = cpu_state->min_freq;
> +	cpu_max_freq = cpu_state->max_freq;
> +	cpu_curr_freq_khz = cpu_state->curr_freq;
> +
> +	if (dev_freq_table) {
> +		/* Get minimum frequency according to sorting order */
> +		dev_max_state = dev_freq_table[devfreq->profile->max_state - 1];
> +		if (dev_freq_table[0] < dev_max_state) {
> +			dev_min_freq = dev_freq_table[0];
> +			dev_max_freq = dev_max_state;
> +		} else {
> +			dev_min_freq = dev_max_state;
> +			dev_max_freq = dev_freq_table[0];
> +		}
> +	} else {
> +		dev_min_freq = dev_pm_qos_read_value(devfreq->dev.parent,
> +						     DEV_PM_QOS_MIN_FREQUENCY);
> +		dev_max_freq = dev_pm_qos_read_value(devfreq->dev.parent,
> +						     DEV_PM_QOS_MAX_FREQUENCY);
> +
> +		if (dev_max_freq <= dev_min_freq)
> +			return 0;
> +	}
> +	cpu_percent = ((cpu_curr_freq_khz - cpu_min_freq) * 100) / cpu_max_freq - cpu_min_freq;
> +	freq = dev_min_freq + mult_frac(dev_max_freq - dev_min_freq, cpu_percent, 100);
> +
> +out:
> +	return freq;
> +}
> +
> +static int get_target_freq_with_cpufreq(struct devfreq *devfreq,
> +					unsigned long *freq)
> +{
> +	struct devfreq_passive_data *p_data =
> +				(struct devfreq_passive_data *)devfreq->data;
> +	unsigned int cpu;
> +	unsigned long target_freq = 0;
> +
> +	for_each_online_cpu(cpu)
> +		target_freq = max(target_freq,
> +				  xlate_cpufreq_to_devfreq(p_data, cpu));
> +
> +	*freq = target_freq;
> +
> +	return 0;
> +}

As you know, governor_passive.c already uses
both 'dev_pm_opp_xlate_required_opp' and 'devfreq_recommended_opp'
to get the target frequency from the OPP. So I want to add a common
function, 'get_target_freq_by_required_opp', as shown below.
Once 'get_target_freq_by_required_opp' is defined like this,
it can also be used by get_target_freq_with_devfreq().
After the review of this patch is finished, I'll send the patch[2].
[2] https://git.kernel.org/pub/scm/linux/kernel/git/chanwoo/linux.git/commit/?h=devfreq-testing-passive-gov&id=101c5a087586ab2b5cf3370166a7e39227ca83cf

For example (this code is not tested):
static unsigned long get_target_freq_by_required_opp(struct device *p_dev,
						struct opp_table *p_opp_table,
						struct opp_table *opp_table,
						unsigned long freq)
{
	struct dev_pm_opp *opp = NULL, *p_opp = NULL;

	if (!p_dev || !p_opp_table || !opp_table || !freq)
		return 0;

	p_opp = devfreq_recommended_opp(p_dev, &freq, 0);
	if (IS_ERR(p_opp))
		return 0;

	opp = dev_pm_opp_xlate_required_opp(p_opp_table, opp_table, p_opp);
	dev_pm_opp_put(p_opp);

	if (IS_ERR(opp))
		return 0;

	freq = dev_pm_opp_get_freq(opp);
	dev_pm_opp_put(opp);

	return freq;
}

static int get_target_freq_with_cpufreq(struct devfreq *devfreq,
					unsigned long *target_freq)
{
	struct devfreq_passive_data *p_data =
				(struct devfreq_passive_data *)devfreq->data;
	struct devfreq_cpu_data *cpu_data;
	unsigned long cpu, cpu_cur, cpu_min, cpu_max, cpu_percent;
	unsigned long dev_min, dev_max;
	unsigned long freq = 0;

	for_each_online_cpu(cpu) {
		cpu_data = p_data->cpu_data[cpu];
		if (!cpu_data || cpu_data->first_cpu != cpu)
			continue;

		/* Get target freq via required opps */
		cpu_cur = cpu_data->cur_freq * HZ_PER_KHZ;
		freq = get_target_freq_by_required_opp(cpu_data->dev,
					cpu_data->opp_table,
					devfreq->opp_table, cpu_cur);
		if (freq) {
			*target_freq = max(freq, *target_freq);
			continue;
		}

		/* Use Interpolation if required opps is not available */
		devfreq_get_freq_range(devfreq, &dev_min, &dev_max);

		cpu_min = cpu_data->min_freq;
		cpu_max = cpu_data->max_freq;
		cpu_cur = cpu_data->cur_freq;

		/* Divide by the whole CPU frequency range */
		cpu_percent = ((cpu_cur - cpu_min) * 100) / (cpu_max - cpu_min);
		freq = dev_min + mult_frac(dev_max - dev_min, cpu_percent, 100);

		*target_freq = max(freq, *target_freq);
	}

	return 0;
}
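
To make the interpolation fallback easier to check, here is a rough
userspace model of it. This is only a sketch and not kernel code: the
function names, the range guards and the overflow-safe helper are
assumptions added for illustration, and the helper only imitates what
mult_frac() is used for above. The point to notice is that the divisor
must be the whole CPU range, so 'cpu_max - cpu_min' needs its own
parentheses.

```c
#include <stdio.h>

/*
 * Hypothetical helper for this sketch: computes x * n / d without
 * overflowing the intermediate product, in the spirit of mult_frac().
 */
static unsigned long mul_frac_ul(unsigned long x, unsigned long n,
				 unsigned long d)
{
	return (x / d) * n + ((x % d) * n) / d;
}

/*
 * Userspace model of the interpolation fallback. Frequencies are plain
 * numbers; the CPU and device values may use different units.
 */
static unsigned long interpolate_dev_freq(unsigned long cpu_cur,
					  unsigned long cpu_min,
					  unsigned long cpu_max,
					  unsigned long dev_min,
					  unsigned long dev_max)
{
	unsigned long cpu_percent;

	/* Guards added for this sketch: no division by zero, no underflow. */
	if (cpu_max <= cpu_min || cpu_cur <= cpu_min)
		return dev_min;
	if (cpu_cur >= cpu_max)
		return dev_max;

	/* Position of cpu_cur inside [cpu_min, cpu_max], in percent. */
	cpu_percent = ((cpu_cur - cpu_min) * 100) / (cpu_max - cpu_min);

	/* Same position inside [dev_min, dev_max]. */
	return dev_min + mul_frac_ul(dev_max - dev_min, cpu_percent, 100);
}
```

With made-up numbers, a CPU at 1250000 in a 500000..2000000 range sits
at 50 percent, so a device range of 100000000..400000000 gives 250000000.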

> +
> +static int get_target_freq_with_devfreq(struct devfreq *devfreq,
>  					unsigned long *freq)
>  {
>  	struct devfreq_passive_data *p_data
> @@ -23,14 +115,6 @@ static int devfreq_passive_get_target_freq(struct devfreq *devfreq,
>  	int i, count;
>  
>  	/*
> -	 * If the devfreq device with passive governor has the specific method
> -	 * to determine the next frequency, should use the get_target_freq()
> -	 * of struct devfreq_passive_data.
> -	 */
> -	if (p_data->get_target_freq)
> -		return p_data->get_target_freq(devfreq, freq);
> -
> -	/*
>  	 * If the parent and passive devfreq device uses the OPP table,
>  	 * get the next frequency by using the OPP table.
>  	 */
> @@ -98,6 +182,37 @@ static int devfreq_passive_get_target_freq(struct devfreq *devfreq,
>  	return 0;
>  }
>  
> +static int devfreq_passive_get_target_freq(struct devfreq *devfreq,
> +					   unsigned long *freq)
> +{
> +	struct devfreq_passive_data *p_data =
> +		(struct devfreq_passive_data *)devfreq->data;
> +	int ret;
> +
> +	/*
> +	 * If the devfreq device with passive governor has the specific method
> +	 * to determine the next frequency, should use the get_target_freq()
> +	 * of struct devfreq_passive_data.
> +	 */
> +	if (p_data->get_target_freq)
> +		return p_data->get_target_freq(devfreq, freq);
> +
> +	switch (p_data->parent_type) {
> +	case DEVFREQ_PARENT_DEV:
> +		ret = get_target_freq_with_devfreq(devfreq, freq);
> +		break;
> +	case CPUFREQ_PARENT_DEV:
> +		ret = get_target_freq_with_cpufreq(devfreq, freq);
> +		break;
> +	default:
> +		ret = -EINVAL;
> +		dev_err(&devfreq->dev, "Invalid parent type\n");
> +		break;
> +	}
> +
> +	return ret;
> +}
> +
>  static int devfreq_passive_notifier_call(struct notifier_block *nb,
>  				unsigned long event, void *ptr)
>  {
> @@ -130,16 +245,200 @@ static int devfreq_passive_notifier_call(struct notifier_block *nb,
>  	return NOTIFY_DONE;
>  }
>  
> +static int cpufreq_passive_notifier_call(struct notifier_block *nb,
> +					 unsigned long event, void *ptr)
> +{
> +	struct devfreq_passive_data *data =
> +			container_of(nb, struct devfreq_passive_data, nb);
> +	struct devfreq *devfreq = (struct devfreq *)data->this;
> +	struct devfreq_cpu_state *cpu_state;
> +	struct cpufreq_freqs *cpu_freq = ptr;

Please use 'freqs' as the variable name. I prefer to use the same variable
name for both devfreq_freqs and cpufreq_freqs instances.

> +	unsigned int curr_freq;

As I commented above, it is better to use 'cur_freq' instead of 'curr_freq'
unless there is a special reason.

> +	int ret;
> +
> +	if (event != CPUFREQ_POSTCHANGE || !cpu_freq ||
> +	    !data->cpu_state[cpu_freq->policy->cpu])
> +		return 0;
> +
> +	cpu_state = data->cpu_state[cpu_freq->policy->cpu];
> +	if (cpu_state->curr_freq == cpu_freq->new)
> +		return 0;
> +
> +	/* Backup current freq and pre-update cpu state freq*/

I think that this comment is not critical, so please drop it.

> +	curr_freq = cpu_state->curr_freq;
> +	cpu_state->curr_freq = cpu_freq->new;
> +
> +	mutex_lock(&devfreq->lock);
> +	ret = update_devfreq(devfreq);

I recommend using 'devfreq_update_target' instead of 'update_devfreq'
as follows:
	devfreq_update_target(devfreq, freqs->new);

> +	mutex_unlock(&devfreq->lock);
> +	if (ret) {
> +		cpu_state->curr_freq = curr_freq;
> +		dev_err(&devfreq->dev, "Couldn't update the frequency.\n");
> +		return ret;
> +	}
> +
> +	return 0;
> +}
> +
> +static int cpufreq_passive_register(struct devfreq_passive_data **p_data)

To keep the function names consistent, please change the name
as follows, because devfreq names its corresponding function
'devfreq_register_notifier':
- cpufreq_passive_register -> cpufreq_passive_register_notifier

> +{
> +	struct devfreq_passive_data *data = *p_data;
> +	struct devfreq *devfreq = (struct devfreq *)data->this;
> +	struct device *dev = devfreq->dev.parent;
> +	struct opp_table *opp_table = NULL;
> +	struct devfreq_cpu_state *cpu_state;
> +	struct cpufreq_policy *policy;
> +	struct device *cpu_dev;
> +	unsigned int cpu;
> +	int ret;
> +
> +	get_online_cpus();
> +
> +	data->nb.notifier_call = cpufreq_passive_notifier_call;
> +	ret = cpufreq_register_notifier(&data->nb,
> +					CPUFREQ_TRANSITION_NOTIFIER);
> +	if (ret) {
> +		dev_err(dev, "Couldn't register cpufreq notifier.\n");
> +		data->nb.notifier_call = NULL;
> +		goto out;
> +	}
> +
> +	/* Populate devfreq_cpu_state */

This comment is not needed. Please drop it.

> +	for_each_online_cpu(cpu) {
> +		if (data->cpu_state[cpu])
> +			continue;
> +
> +		policy = cpufreq_cpu_get(cpu);
> +		if (!policy) {
> +			ret = -EINVAL;
> +			goto out;
> +		} else if (PTR_ERR(policy) == -EPROBE_DEFER) {
> +			ret = -EPROBE_DEFER;
> +			goto out;
> +		} else if (IS_ERR(policy)) {
> +			ret = PTR_ERR(policy);
> +			dev_err(dev, "Couldn't get the cpufreq_poliy.\n");
> +			goto out;
> +		}

Use the dev_err_probe() function to handle the EPROBE_DEFER case.
It makes the code simpler.

> +
> +		cpu_state = kzalloc(sizeof(*cpu_state), GFP_KERNEL);
> +		if (!cpu_state) {
> +			ret = -ENOMEM;
> +			goto out;
> +		}
> +
> +		cpu_dev = get_cpu_device(cpu);
> +		if (!cpu_dev) {
> +			dev_err(dev, "Couldn't get cpu device.\n");
> +			ret = -ENODEV;
> +			goto out;
> +		}
> +
> +		opp_table = dev_pm_opp_get_opp_table(cpu_dev);
> +		if (IS_ERR(devfreq->opp_table)) {
> +			ret = PTR_ERR(opp_table);
> +			goto out;
> +		}
> +
> +		cpu_state->cpu_dev = cpu_dev;
> +		cpu_state->opp_table = opp_table;
> +		cpu_state->first_cpu = cpumask_first(policy->related_cpus);
> +		cpu_state->curr_freq = policy->cur;
> +		cpu_state->min_freq = policy->cpuinfo.min_freq;
> +		cpu_state->max_freq = policy->cpuinfo.max_freq;
> +		data->cpu_state[cpu] = cpu_state;
> +
> +		cpufreq_cpu_put(policy);
> +	}
> +
> +out:
> +	put_online_cpus();
> +	if (ret)
> +		return ret;
> +
> +	/* Update devfreq */
> +	mutex_lock(&devfreq->lock);
> +	ret = update_devfreq(devfreq);

> +	mutex_unlock(&devfreq->lock);
> +	if (ret)
> +		dev_err(dev, "Couldn't update the frequency.\n");
> +
> +	return ret;
> +}
> +
> +static int cpufreq_passive_unregister(struct devfreq_passive_data **p_data)

As I commented above, please change the name as following:
- cpufreq_passive_unregister -> cpufreq_passive_unregister_notifier

> +{
> +	struct devfreq_passive_data *data = *p_data;
> +	struct devfreq_cpu_state *cpu_state;
> +	int cpu;
> +
> +	if (data->nb.notifier_call)
> +		cpufreq_unregister_notifier(&data->nb,
> +					    CPUFREQ_TRANSITION_NOTIFIER);
> +
> +	for_each_possible_cpu(cpu) {
> +		cpu_state = data->cpu_state[cpu];
> +		if (cpu_state) {
> +			if (cpu_state->opp_table)
> +				dev_pm_opp_put_opp_table(cpu_state->opp_table);
> +			kfree(cpu_state);
> +			cpu_state = NULL;
> +		}
> +	}
> +
> +	return 0;
> +}
> +
> +int register_parent_dev_notifier(struct devfreq_passive_data **p_data)
> +{
> +	struct notifier_block *nb = &(*p_data)->nb;
> +	int ret = 0;
> +
> +	switch ((*p_data)->parent_type) {
> +	case DEVFREQ_PARENT_DEV:
> +		nb->notifier_call = devfreq_passive_notifier_call;
> +		ret = devfreq_register_notifier((struct devfreq *)(*p_data)->parent, nb,
> +						DEVFREQ_TRANSITION_NOTIFIER);
> +		break;
> +	case CPUFREQ_PARENT_DEV:
> +		ret = cpufreq_passive_register(p_data);
> +		break;
> +	default:
> +		ret = -EINVAL;
> +		break;
> +	}
> +	return ret;
> +}
> +
> +int unregister_parent_dev_notifier(struct devfreq_passive_data **p_data)
> +{
> +	int ret = 0;
> +
> +	switch ((*p_data)->parent_type) {
> +	case DEVFREQ_PARENT_DEV:
> +		WARN_ON(devfreq_unregister_notifier((struct devfreq *)(*p_data)->parent,
> +						    &(*p_data)->nb,
> +						    DEVFREQ_TRANSITION_NOTIFIER));
> +		break;
> +	case CPUFREQ_PARENT_DEV:
> +		cpufreq_passive_unregister(p_data);
> +		break;
> +	default:
> +		ret = -EINVAL;
> +		break;
> +	}
> +	return ret;
> +}

I think that you don't need to define register_parent_dev_notifier
and unregister_parent_dev_notifier as separate functions.

Instead of adding separate functions, just put the code
directly into devfreq_passive_event_handler.
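
As a rough picture of that structure, here is a small userspace model
(not kernel code). Every name in it is hypothetical: the enums, the
struct and the two flags only stand in for the real parent types and
notifier registrations. It shows the parent-type check living inside
the START/STOP cases of a single event handler.

```c
#include <errno.h>

/* Hypothetical stand-ins for the parent types and governor events. */
enum parent_type { PARENT_DEVFREQ, PARENT_CPUFREQ };
enum gov_event { GOV_START, GOV_STOP, GOV_OTHER };

struct passive_data {
	enum parent_type parent_type;
	int devfreq_nb_registered;	/* models the devfreq notifier */
	int cpufreq_nb_registered;	/* models the cpufreq notifier */
};

/* One handler: the parent-type check sits inside each event case. */
static int passive_event_handler(struct passive_data *p, enum gov_event event)
{
	int ret = 0;

	switch (event) {
	case GOV_START:
		if (p->parent_type == PARENT_DEVFREQ)
			p->devfreq_nb_registered = 1;
		else if (p->parent_type == PARENT_CPUFREQ)
			p->cpufreq_nb_registered = 1;
		else
			ret = -EINVAL;
		break;
	case GOV_STOP:
		if (p->parent_type == PARENT_DEVFREQ)
			p->devfreq_nb_registered = 0;
		else if (p->parent_type == PARENT_CPUFREQ)
			p->cpufreq_nb_registered = 0;
		else
			ret = -EINVAL;
		break;
	default:
		/* Other events are ignored, as in the existing handler. */
		break;
	}

	return ret;
}
```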


> +
>  static int devfreq_passive_event_handler(struct devfreq *devfreq,
>  				unsigned int event, void *data)
>  {
>  	struct devfreq_passive_data *p_data
>  			= (struct devfreq_passive_data *)devfreq->data;
>  	struct devfreq *parent = (struct devfreq *)p_data->parent;
> -	struct notifier_block *nb = &p_data->nb;
>  	int ret = 0;
>  
> -	if (!parent)
> +	if (p_data->parent_type == DEVFREQ_PARENT_DEV && !parent)
>  		return -EPROBE_DEFER;
>  
>  	switch (event) {
> @@ -147,13 +446,11 @@ static int devfreq_passive_event_handler(struct devfreq *devfreq,
>  		if (!p_data->this)
>  			p_data->this = devfreq;
>  
> -		nb->notifier_call = devfreq_passive_notifier_call;
> -		ret = devfreq_register_notifier(parent, nb,
> -					DEVFREQ_TRANSITION_NOTIFIER);
> +		ret = register_parent_dev_notifier(&p_data);
>  		break;
> +
>  	case DEVFREQ_GOV_STOP:
> -		WARN_ON(devfreq_unregister_notifier(parent, nb,
> -					DEVFREQ_TRANSITION_NOTIFIER));
> +		ret = unregister_parent_dev_notifier(&p_data);
>  		break;
>  	default:
>  		break;
> diff --git a/include/linux/devfreq.h b/include/linux/devfreq.h
> index 26ea0850be9b..e0093b7c805c 100644
> --- a/include/linux/devfreq.h
> +++ b/include/linux/devfreq.h
> @@ -280,6 +280,25 @@ struct devfreq_simple_ondemand_data {
>  
>  #if IS_ENABLED(CONFIG_DEVFREQ_GOV_PASSIVE)
>  /**
> + * struct devfreq_cpu_state - holds the per-cpu state
> + * @freq:	the current frequency of the cpu.
> + * @min_freq:	the min frequency of the cpu.
> + * @max_freq:	the max frequency of the cpu.
> + * @first_cpu:	the cpumask of the first cpu of a policy.
> + * @dev:	reference to cpu device.
> + * @opp_table:	reference to cpu opp table.
> + *
> + * This structure stores the required cpu_state of a cpu.
> + * This is auto-populated by the governor.
> + */
> +struct devfreq_cpu_state;
> +
> +enum devfreq_parent_dev_type {
> +	DEVFREQ_PARENT_DEV,
> +	CPUFREQ_PARENT_DEV,
> +};
> +
> +/**
>   * struct devfreq_passive_data - ``void *data`` fed to struct devfreq
>   *	and devfreq_add_device
>   * @parent:	the devfreq instance of parent device.
> @@ -290,13 +309,15 @@ struct devfreq_simple_ondemand_data {
>   *			using governors except for passive governor.
>   *			If the devfreq device has the specific method to decide
>   *			the next frequency, should use this callback.
> + * @parent_type:	parent type of the device
>   * @this:	the devfreq instance of own device.
>   * @nb:		the notifier block for DEVFREQ_TRANSITION_NOTIFIER list
> + * @cpu_state:		the state min/max/current frequency of all online cpu's
>   *
>   * The devfreq_passive_data have to set the devfreq instance of parent
>   * device with governors except for the passive governor. But, don't need to
> - * initialize the 'this' and 'nb' field because the devfreq core will handle
> - * them.
> + * initialize the 'this', 'nb' and 'cpu_state' field because the devfreq core
> + * will handle them.
>   */
>  struct devfreq_passive_data {
>  	/* Should set the devfreq instance of parent device */
> @@ -305,9 +326,13 @@ struct devfreq_passive_data {
>  	/* Optional callback to decide the next frequency of passvice device */
>  	int (*get_target_freq)(struct devfreq *this, unsigned long *freq);
>  
> +	/* Should set the type of parent device */
> +	enum devfreq_parent_dev_type parent_type;
> +
>  	/* For passive governor's internal use. Don't need to set them */
>  	struct devfreq *this;
>  	struct notifier_block nb;
> +	struct devfreq_cpu_state *cpu_state[NR_CPUS];
>  };
>  #endif
>  
> 


-- 
Best Regards,
Chanwoo Choi
Samsung Electronics


* Re: [PATCH V8 4/8] devfreq: add mediatek cci devfreq
       [not found]   ` <1616499241-4906-5-git-send-email-andrew-sh.cheng@mediatek.com>
@ 2021-03-25  8:04     ` Chanwoo Choi
  0 siblings, 0 replies; 14+ messages in thread
From: Chanwoo Choi @ 2021-03-25  8:04 UTC (permalink / raw)
  To: Andrew-sh.Cheng, MyungJoo Ham, Kyungmin Park, Rob Herring,
	Mark Rutland, Matthias Brugger, Rafael J. Wysocki, Viresh Kumar,
	Nishanth Menon, Stephen Boyd, Liam Girdwood, Mark Brown
  Cc: linux-pm, devicetree, linux-arm-kernel, linux-mediatek,
	linux-kernel, srv_heupstream

Hi,

On 3/23/21 8:33 PM, Andrew-sh.Cheng wrote:
> From: "Andrew-sh.Cheng" <andrew-sh.cheng@mediatek.com>
> 
> This adds a devfreq driver for the Cache Coherent Interconnect (CCI)
> of the Mediatek MT8183.
> 
> On the MT8183 the CCI is supplied by the same regulator as the LITTLE
> cores. The driver is notified when the regulator voltage changes
> (driven by cpufreq) and adjusts the CCI frequency to the maximum
> possible value.
> 
> Signed-off-by: Andrew-sh.Cheng <andrew-sh.cheng@mediatek.com>
> ---
>  drivers/devfreq/Kconfig              |  10 ++
>  drivers/devfreq/Makefile             |   1 +
>  drivers/devfreq/mt8183-cci-devfreq.c | 198 +++++++++++++++++++++++++++++++++++
>  3 files changed, 209 insertions(+)
>  create mode 100644 drivers/devfreq/mt8183-cci-devfreq.c
> 
> diff --git a/drivers/devfreq/Kconfig b/drivers/devfreq/Kconfig
> index f56132b0ae64..2538255ac2c1 100644
> --- a/drivers/devfreq/Kconfig
> +++ b/drivers/devfreq/Kconfig
> @@ -111,6 +111,16 @@ config ARM_IMX8M_DDRC_DEVFREQ
>  	  This adds the DEVFREQ driver for the i.MX8M DDR Controller. It allows
>  	  adjusting DRAM frequency.
>  
> +config ARM_MT8183_CCI_DEVFREQ
> +	tristate "MT8183 CCI DEVFREQ Driver"
> +	depends on ARM_MEDIATEK_CPUFREQ
> +	help
> +		This adds a devfreq driver for Cache Coherent Interconnect
> +		of Mediatek MT8183, which is shared the same regulator
> +		with cpu cluster.
> +		It can track buck voltage and update a proper CCI frequency.
> +		Use notification to get regulator status.
> +
>  config ARM_TEGRA_DEVFREQ
>  	tristate "NVIDIA Tegra30/114/124/210 DEVFREQ Driver"
>  	depends on ARCH_TEGRA_3x_SOC || ARCH_TEGRA_114_SOC || \
> diff --git a/drivers/devfreq/Makefile b/drivers/devfreq/Makefile
> index a16333ea7034..991ef7740759 100644
> --- a/drivers/devfreq/Makefile
> +++ b/drivers/devfreq/Makefile
> @@ -11,6 +11,7 @@ obj-$(CONFIG_DEVFREQ_GOV_PASSIVE)	+= governor_passive.o
>  obj-$(CONFIG_ARM_EXYNOS_BUS_DEVFREQ)	+= exynos-bus.o
>  obj-$(CONFIG_ARM_IMX_BUS_DEVFREQ)	+= imx-bus.o
>  obj-$(CONFIG_ARM_IMX8M_DDRC_DEVFREQ)	+= imx8m-ddrc.o
> +obj-$(CONFIG_ARM_MT8183_CCI_DEVFREQ)	+= mt8183-cci-devfreq.o
>  obj-$(CONFIG_ARM_RK3399_DMC_DEVFREQ)	+= rk3399_dmc.o
>  obj-$(CONFIG_ARM_TEGRA_DEVFREQ)		+= tegra30-devfreq.o
>  
> diff --git a/drivers/devfreq/mt8183-cci-devfreq.c b/drivers/devfreq/mt8183-cci-devfreq.c
> new file mode 100644
> index 000000000000..018543db7bae
> --- /dev/null
> +++ b/drivers/devfreq/mt8183-cci-devfreq.c
> @@ -0,0 +1,198 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Copyright (c) 2021 MediaTek Inc.
> +
> + * Author: Andrew-sh.Cheng <andrew-sh.cheng@mediatek.com>
> + */
> +
> +#include <linux/clk.h>
> +#include <linux/devfreq.h>
> +#include <linux/module.h>
> +#include <linux/of.h>
> +#include <linux/platform_device.h>
> +#include <linux/regulator/consumer.h>
> +#include <linux/time.h>
> +
> +#define MAX_VOLT_LIMIT		(1150000)
> +
> +struct cci_devfreq {
> +	struct devfreq *devfreq;
> +	struct regulator *cpu_reg;
> +	struct clk *cci_clk;
> +	int old_vproc;

nitpick: how about using 'old_voltage'?
'vproc' is not easy to understand.

> +	unsigned long old_freq;
> +};
> +
> +static int mtk_cci_set_voltage(struct cci_devfreq *cci_df, int vproc)

nitpick: how about changing 'vproc -> voltage'?

> +{
> +	int ret;
> +
> +	ret = regulator_set_voltage(cci_df->cpu_reg, vproc,
> +				    MAX_VOLT_LIMIT);
> +	if (!ret)
> +		cci_df->old_vproc = vproc;
> +	return ret;
> +}
> +
> +static int mtk_cci_devfreq_target(struct device *dev, unsigned long *freq,
> +				  u32 flags)
> +{
> +	int ret;
> +	struct cci_devfreq *cci_df = dev_get_drvdata(dev);
> +	struct dev_pm_opp *opp;
> +	unsigned long opp_rate, opp_voltage, old_voltage;
> +
> +	if (!cci_df)
> +		return -EINVAL;
> +
> +	if (cci_df->old_freq == *freq)
> +		return 0;
> +
> +	opp_rate = *freq;
> +	opp = devfreq_recommended_opp(dev, &opp_rate, 1);
> +	opp_voltage = dev_pm_opp_get_voltage(opp);
> +	dev_pm_opp_put(opp);
> +
> +	old_voltage = cci_df->old_vproc;
> +	if (old_voltage == 0)
> +		old_voltage = regulator_get_voltage(cci_df->cpu_reg);
> +
> +	// scale up: set voltage first then freq
> +	if (opp_voltage > old_voltage) {
> +		ret = mtk_cci_set_voltage(cci_df, opp_voltage);
> +		if (ret) {
> +			pr_err("cci: failed to scale up voltage\n");
> +			return ret;
> +		}
> +	}
> +
> +	ret = clk_set_rate(cci_df->cci_clk, *freq);
> +	if (ret) {
> +		pr_err("%s: failed cci to set rate: %d\n", __func__,
> +		       ret);
> +		mtk_cci_set_voltage(cci_df, old_voltage);
> +		return ret;
> +	}
> +
> +	// scale down: set freq first then voltage
> +	if (opp_voltage < old_voltage) {
> +		ret = mtk_cci_set_voltage(cci_df, opp_voltage);
> +		if (ret) {
> +			pr_err("cci: failed to scale down voltage\n");
> +			clk_set_rate(cci_df->cci_clk, cci_df->old_freq);
> +			return ret;
> +		}
> +	}
> +
> +	cci_df->old_freq = *freq;
> +
> +	return 0;
> +}
> +
> +static struct devfreq_dev_profile cci_devfreq_profile = {
> +	.target = mtk_cci_devfreq_target,
> +};
> +
> +static int mtk_cci_devfreq_probe(struct platform_device *pdev)
> +{
> +	struct device *cci_dev = &pdev->dev;
> +	struct cci_devfreq *cci_df;
> +	struct devfreq_passive_data *passive_data;
> +	int ret;
> +
> +	cci_df = devm_kzalloc(cci_dev, sizeof(*cci_df), GFP_KERNEL);
> +	if (!cci_df)
> +		return -ENOMEM;
> +
> +	cci_df->cci_clk = devm_clk_get(cci_dev, "cci_clock");
> +	ret = PTR_ERR_OR_ZERO(cci_df->cci_clk);
> +	if (ret) {
> +		if (ret != -EPROBE_DEFER)
> +			dev_err(cci_dev, "failed to get clock for CCI: %d\n",
> +				ret);

Use dev_err_probe() to handle the EPROBE_DEFER case. It makes the code simpler.
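
To show why this is simpler, here is a userspace sketch (not kernel
code). 'mock_dev_err_probe' is a hypothetical stand-in that only imitates
the relevant behaviour of dev_err_probe(): it stays quiet at error level
for -EPROBE_DEFER, logs any other error, and returns the error so the
caller can return it directly. It takes a name string where the real
function takes a 'struct device *', and 'probe_get_clock' is also made up
for this example.

```c
#include <stdarg.h>
#include <stdio.h>

/* Kernel-internal errno value, defined here because userspace lacks it. */
#define EPROBE_DEFER 517

/*
 * Userspace stand-in for dev_err_probe(): print the message unless the
 * error is -EPROBE_DEFER, then hand the error code back to the caller.
 */
static int mock_dev_err_probe(const char *name, int err, const char *fmt, ...)
{
	va_list args;

	if (err != -EPROBE_DEFER) {
		fprintf(stderr, "%s: error %d: ", name, err);
		va_start(args, fmt);
		vfprintf(stderr, fmt, args);
		va_end(args);
	}

	return err;
}

/*
 * Hypothetical probe step. 'clk_err' models the PTR_ERR_OR_ZERO() result
 * of a clock lookup; the open-coded "ret != -EPROBE_DEFER" test is gone.
 */
static int probe_get_clock(int clk_err)
{
	if (clk_err)
		return mock_dev_err_probe("cci", clk_err,
					  "failed to get clock for CCI\n");

	return 0;
}
```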

> +		return ret;
> +	}
> +	cci_df->cpu_reg = devm_regulator_get_optional(cci_dev, "proc");
> +	ret = PTR_ERR_OR_ZERO(cci_df->cpu_reg);
> +	if (ret) {
> +		if (ret != -EPROBE_DEFER)
> +			dev_err(cci_dev, "failed to get regulator for CCI: %d\n",
> +				ret);

ditto. Use dev_err_probe()

> +		return ret;
> +	}
> +	ret = regulator_enable(cci_df->cpu_reg);
> +	if (ret) {
> +		dev_err(cci_dev, "enable buck for cci fail\n");
> +		return ret;
> +	}
> +
> +	ret = dev_pm_opp_of_add_table(cci_dev);
> +	if (ret) {
> +		dev_err(cci_dev, "Fail to get OPP table for CCI: %d\n", ret);
> +		return ret;
> +	}
> +
> +	platform_set_drvdata(pdev, cci_df);
> +
> +	passive_data = devm_kzalloc(cci_dev, sizeof(*passive_data), GFP_KERNEL);
> +	if (!passive_data) {
> +		ret = -ENOMEM;
> +		goto err_opp;
> +	}
> +
> +	passive_data->parent_type = CPUFREQ_PARENT_DEV;
> +
> +	cci_df->devfreq = devm_devfreq_add_device(cci_dev,
> +						  &cci_devfreq_profile,
> +						  DEVFREQ_GOV_PASSIVE,
> +						  passive_data);
> +	if (IS_ERR(cci_df->devfreq)) {
> +		ret = PTR_ERR(cci_df->devfreq);
> +		dev_err(cci_dev, "cannot create cci devfreq device:%d\n", ret);
> +		goto err_opp;
> +	}
> +
> +	return 0;
> +
> +err_opp:
> +	dev_pm_opp_of_remove_table(cci_dev);
> +	return ret;
> +}
> +
> +static int mtk_cci_devfreq_remove(struct platform_device *pdev)
> +{
> +	struct device *cci_dev = &pdev->dev;
> +	struct cci_devfreq *cci_df;
> +	struct notifier_block *opp_nb;
> +
> +	cci_df = platform_get_drvdata(pdev);
> +	opp_nb = &cci_df->opp_nb;
> +
> +	dev_pm_opp_unregister_notifier(cci_dev, opp_nb);

Why do you call this function without registering the notifier first?
If you want to catch the OPP changes of devfreq,
you can use the devfreq_register_opp_notifier/devfreq_unregister_opp_notifier
functions.

> +	dev_pm_opp_of_remove_table(cci_dev);
> +	regulator_disable(cci_df->cpu_reg);
> +
> +	return 0;
> +}
> +
> +static const __maybe_unused struct of_device_id
> +	mediatek_cci_of_match[] = {

Please put this declaration on a single line, as follows:
static const __maybe_unused struct of_device_id mediatek_cci_of_match[] = {


> +	{ .compatible = "mediatek,mt8183-cci" },
> +	{ },
> +};
> +MODULE_DEVICE_TABLE(of, mediatek_cci_of_match);
> +
> +static struct platform_driver cci_devfreq_driver = {
> +	.probe	= mtk_cci_devfreq_probe,
> +	.remove	= mtk_cci_devfreq_remove,
> +	.driver = {
> +		.name = "mediatek-cci-devfreq",
> +		.of_match_table = of_match_ptr(mediatek_cci_of_match),
> +	},
> +};
> +
> +module_platform_driver(cci_devfreq_driver);
> +
> +MODULE_DESCRIPTION("Mediatek CCI devfreq driver");
> +MODULE_AUTHOR("Andrew-sh.Cheng <andrew-sh.cheng@mediatek.com>");
> +MODULE_LICENSE("GPL v2");
> 


-- 
Best Regards,
Chanwoo Choi
Samsung Electronics


* Re: [PATCH V8 7/8] devfreq: mediatek: cci devfreq register opp notification for SVS support
       [not found]   ` <1616499241-4906-8-git-send-email-andrew-sh.cheng@mediatek.com>
@ 2021-03-25  8:11     ` Chanwoo Choi
  0 siblings, 0 replies; 14+ messages in thread
From: Chanwoo Choi @ 2021-03-25  8:11 UTC (permalink / raw)
  To: Andrew-sh.Cheng, MyungJoo Ham, Kyungmin Park, Rob Herring,
	Mark Rutland, Matthias Brugger, Rafael J. Wysocki, Viresh Kumar,
	Nishanth Menon, Stephen Boyd, Liam Girdwood, Mark Brown
  Cc: linux-pm, devicetree, linux-arm-kernel, linux-mediatek,
	linux-kernel, srv_heupstream

Hi,

I think that you can squash this patch into patch 4.

On 3/23/21 8:34 PM, Andrew-sh.Cheng wrote:
> From: "Andrew-sh.Cheng" <andrew-sh.cheng@mediatek.com>
> 
> SVS will change the voltage of opp item.

What is the full name of SVS?

> CCI devfreq need to react to change frequency.
> 
> Signed-off-by: Andrew-sh.Cheng <andrew-sh.cheng@mediatek.com>
> ---
>  drivers/devfreq/mt8183-cci-devfreq.c | 27 +++++++++++++++++++++++++++
>  1 file changed, 27 insertions(+)
> 
> diff --git a/drivers/devfreq/mt8183-cci-devfreq.c b/drivers/devfreq/mt8183-cci-devfreq.c
> index 018543db7bae..6942a48f3f4f 100644
> --- a/drivers/devfreq/mt8183-cci-devfreq.c
> +++ b/drivers/devfreq/mt8183-cci-devfreq.c
> @@ -21,6 +21,7 @@ struct cci_devfreq {
>  	struct clk *cci_clk;
>  	int old_vproc;
>  	unsigned long old_freq;
> +	struct notifier_block opp_nb;
>  };
>  
>  static int mtk_cci_set_voltage(struct cci_devfreq *cci_df, int vproc)
> @@ -89,6 +90,26 @@ static int mtk_cci_devfreq_target(struct device *dev, unsigned long *freq,
>  	return 0;
>  }
>  
> +static int ccidevfreq_opp_notifier(struct notifier_block *nb,

I think it would be better to rename the function as follows:
ccidevfreq_opp_notifier -> mtk_cci_devfreq_opp_notifier

> +				   unsigned long event, void *data)
> +{
> +	struct dev_pm_opp *opp = data;
> +	struct cci_devfreq *cci_df = container_of(nb, struct cci_devfreq,
> +						  opp_nb);
> +	unsigned long	freq, volt;
> +
> +	if (event == OPP_EVENT_ADJUST_VOLTAGE) {
> +		freq = dev_pm_opp_get_freq(opp);
> +		/* current opp item is changed */
> +		if (freq == cci_df->old_freq) {
> +			volt = dev_pm_opp_get_voltage(opp);
> +			mtk_cci_set_voltage(cci_df, volt);
> +		}
> +	}
> +
> +	return 0;
> +}
> +
>  static struct devfreq_dev_profile cci_devfreq_profile = {
>  	.target = mtk_cci_devfreq_target,
>  };
> @@ -98,12 +119,15 @@ static int mtk_cci_devfreq_probe(struct platform_device *pdev)
>  	struct device *cci_dev = &pdev->dev;
>  	struct cci_devfreq *cci_df;
>  	struct devfreq_passive_data *passive_data;
> +	struct notifier_block *opp_nb;
>  	int ret;
>  
>  	cci_df = devm_kzalloc(cci_dev, sizeof(*cci_df), GFP_KERNEL);
>  	if (!cci_df)
>  		return -ENOMEM;
>  
> +	opp_nb = &cci_df->opp_nb;

Just move this line next to the 'opp_nb->notifier_call' initialization code.

> +
>  	cci_df->cci_clk = devm_clk_get(cci_dev, "cci_clock");
>  	ret = PTR_ERR_OR_ZERO(cci_df->cci_clk);
>  	if (ret) {
> @@ -152,6 +176,9 @@ static int mtk_cci_devfreq_probe(struct platform_device *pdev)
>  		goto err_opp;
>  	}
>  
> +	opp_nb->notifier_call = ccidevfreq_opp_notifier;
> +	dev_pm_opp_register_notifier(cci_dev, opp_nb);

You need to check the return value of dev_pm_opp_register_notifier().

> +
>  	return 0;
>  
>  err_opp:
> 


-- 
Best Regards,
Chanwoo Choi
Samsung Electronics

* Re: [PATCH V8 1/8] PM / devfreq: Add cpu based scaling support to passive_governor
       [not found]   ` <1616499241-4906-2-git-send-email-andrew-sh.cheng@mediatek.com>
  2021-03-25  7:42     ` [PATCH V8 1/8] PM / devfreq: Add cpu based scaling support to passive_governor Chanwoo Choi
@ 2021-03-25  8:14     ` Chanwoo Choi
       [not found]       ` <1617177820.15067.1.camel@mtksdaap41>
  2021-03-31 10:46       ` Hsin-Yi Wang
  1 sibling, 2 replies; 14+ messages in thread
From: Chanwoo Choi @ 2021-03-25  8:14 UTC (permalink / raw)
  To: Andrew-sh.Cheng, MyungJoo Ham, Kyungmin Park, Rob Herring,
	Mark Rutland, Matthias Brugger, Rafael J. Wysocki, Viresh Kumar,
	Nishanth Menon, Stephen Boyd, Liam Girdwood, Mark Brown
  Cc: linux-pm, devicetree, linux-arm-kernel, linux-mediatek,
	linux-kernel, srv_heupstream, Sibi Sankar

Hi,

These patches were not sent to the linux-pm mailing list.
Please send them to the linux-pm ML as well.

Also, before I received this series, I tried to clean up these patches
on a testing branch[1], so my comments below are based on that cleaned-up version.
[1] https://git.kernel.org/pub/scm/linux/kernel/git/chanwoo/linux.git/log/?h=devfreq-testing-passive-gov

Also, 'Saravana Kannan <skannan@codeaurora.org>' is a wrong email address.
Please update the address or drop it.


On 3/23/21 8:33 PM, Andrew-sh.Cheng wrote:
> From: Saravana Kannan <skannan@codeaurora.org>
>
> Many CPU architectures have caches that can scale independent of the
> CPUs. Frequency scaling of the caches is necessary to make sure that the
> cache is not a performance bottleneck that leads to poor performance and
> power. The same idea applies for RAM/DDR.
>
> To achieve this, this patch adds support for cpu based scaling to the
> passive governor. This is accomplished by taking the current frequency
> of each CPU frequency domain and then adjust the frequency of the cache
> (or any devfreq device) based on the frequency of the CPUs. It listens
> to CPU frequency transition notifiers to keep itself up to date on the
> current CPU frequency.
>
> To decide the frequency of the device, the governor does one of the
> following:
> * Derives the optimal devfreq device opp from required-opps property of
>   the parent cpu opp_table.
>
> * Scales the device frequency in proportion to the CPU frequency. So, if
>   the CPUs are running at their max frequency, the device runs at its
>   max frequency. If the CPUs are running at their min frequency, the
>   device runs at its min frequency. It is interpolated for frequencies
>   in between.
>
> Andrew-sh.Cheng change
> dev_pm_opp_xlate_opp to dev_pm_opp_xlate_required_opp devfreq->max_freq
> to devfreq->user_min_freq_req.data.freq.qos->min_freq.target_value
> after kernel-5.7
> Don't return -EINVAL in devfreq_passive_event_handler()
> since it doesn't handle DEVFREQ_GOV_SUSPEND DEVFREQ_GOV_RESUME cases.
>
> Signed-off-by: Saravana Kannan <skannan@codeaurora.org>
> [Sibi: Integrated cpu-freqmap governor into passive_governor]
> Signed-off-by: Sibi Sankar <sibis@codeaurora.org>
> Signed-off-by: Andrew-sh.Cheng <andrew-sh.cheng@mediatek.com>
> ---
>  drivers/devfreq/Kconfig            |   2 +
>  drivers/devfreq/governor_passive.c | 329 +++++++++++++++++++++++++++++++++++--
>  include/linux/devfreq.h            |  29 +++-
>  3 files changed, 342 insertions(+), 18 deletions(-)
>
> diff --git a/drivers/devfreq/Kconfig b/drivers/devfreq/Kconfig
> index 00704efe6398..f56132b0ae64 100644
> --- a/drivers/devfreq/Kconfig
> +++ b/drivers/devfreq/Kconfig
> @@ -73,6 +73,8 @@ config DEVFREQ_GOV_PASSIVE
>  	  device. This governor does not change the frequency by itself
>  	  through sysfs entries. The passive governor recommends that
>  	  devfreq device uses the OPP table to get the frequency/voltage.
> +	  Alternatively the governor can also be chosen to scale based on
> +	  the online CPUs current frequency.
>  
>  comment "DEVFREQ Drivers"
>  
> diff --git a/drivers/devfreq/governor_passive.c b/drivers/devfreq/governor_passive.c
> index b094132bd20b..9cc57b083839 100644
> --- a/drivers/devfreq/governor_passive.c
> +++ b/drivers/devfreq/governor_passive.c
> @@ -8,11 +8,103 @@
>   */
>  
>  #include <linux/module.h>
> +#include <linux/cpu.h>
> +#include <linux/cpufreq.h>
> +#include <linux/cpumask.h>
>  #include <linux/device.h>
>  #include <linux/devfreq.h>
> +#include <linux/slab.h>
>  #include "governor.h"
>  
> -static int devfreq_passive_get_target_freq(struct devfreq *devfreq,
> +struct devfreq_cpu_state {
> +	unsigned int curr_freq;
> +	unsigned int min_freq;
> +	unsigned int max_freq;
> +	unsigned int first_cpu;
> +	struct device *cpu_dev;
> +	struct opp_table *opp_table;
> +};

As far as I know, the previous version had a description of this structure.
I want to add a description like the one below.

Also, if you have no objection, I'd like you to order the fields as follows
and use 'dev' instead of 'cpu_dev', because this patch uses
'cpu_state->cpu_dev' at multiple points.
I think that 'cpu_state->dev' is better than 'cpu_state->cpu_dev'.
I also prefer 'cur_freq' to 'curr_freq', because the devfreq subsystem
uses 'cur_freq' to express the current frequency.

/**
 * struct devfreq_cpu_state - Hold the per-cpu data
 * @dev:        reference to cpu device.
 * @first_cpu:  the cpumask of the first cpu of a policy.
 * @opp_table:  reference to cpu opp table.
 * @cur_freq:   the current frequency of the cpu.
 * @min_freq:   the min frequency of the cpu.
 * @max_freq:   the max frequency of the cpu.
 *
 * This structure stores the required cpu_data of a cpu.
 * This is auto-populated by the governor.
 */
struct devfreq_cpu_state {
	struct device *dev;
	unsigned int first_cpu;

	struct opp_table *opp_table;
	unsigned int cur_freq;
	unsigned int min_freq;
	unsigned int max_freq;
};
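
As an illustration of how 'first_cpu' is used by the quoted code, here is a
minimal userspace sketch (not kernel code). It assumes that every CPU has its
own entry, that only the entry of the first CPU of each policy takes part, and
that the device follows the highest request. 'cpu_state_model' and
'max_policy_target' are hypothetical names, and 'target' stands in for the
device frequency that the real code derives from the OPP tables.

```c
#include <stddef.h>

/* Simplified per-CPU entry; not the real struct devfreq_cpu_state. */
struct cpu_state_model {
	unsigned int first_cpu;	/* first CPU of the policy this CPU belongs to */
	unsigned long target;	/* device frequency this policy asks for */
};

/*
 * Walk all CPUs, let only the first CPU of each policy vote, and keep the
 * highest request. A NULL entry models a CPU without state.
 */
static unsigned long max_policy_target(struct cpu_state_model *const *state,
				       unsigned int nr_cpus)
{
	unsigned long best = 0;
	unsigned int cpu;

	for (cpu = 0; cpu < nr_cpus; cpu++) {
		if (!state[cpu] || state[cpu]->first_cpu != cpu)
			continue;
		if (state[cpu]->target > best)
			best = state[cpu]->target;
	}

	return best;
}
```

Entries whose 'first_cpu' differs from their own index are skipped, so each
policy is counted once no matter how many CPUs it contains.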


> +
> +static unsigned long xlate_cpufreq_to_devfreq(struct devfreq_passive_data *data,
> +					      unsigned int cpu)
> +{
> +	unsigned int cpu_min_freq, cpu_max_freq, cpu_curr_freq_khz, cpu_percent;
> +	unsigned long dev_min_freq, dev_max_freq, dev_max_state;
> +
> +	struct devfreq_cpu_state *cpu_state = data->cpu_state[cpu];
> +	struct devfreq *devfreq = (struct devfreq *)data->this;
> +	unsigned long *dev_freq_table = devfreq->profile->freq_table;
> +	struct dev_pm_opp *opp = NULL, *p_opp = NULL;
> +	unsigned long cpu_curr_freq, freq;
> +
> +	if (!cpu_state || cpu_state->first_cpu != cpu ||
> +	    !cpu_state->opp_table || !devfreq->opp_table)
> +		return 0;
> +
> +	cpu_curr_freq = cpu_state->curr_freq * 1000;
> +	p_opp = devfreq_recommended_opp(cpu_state->cpu_dev, &cpu_curr_freq, 0);
> +	if (IS_ERR(p_opp))
> +		return 0;
> +
> +	opp = dev_pm_opp_xlate_required_opp(cpu_state->opp_table,
> +					    devfreq->opp_table, p_opp);
> +	dev_pm_opp_put(p_opp);
> +
> +	if (!IS_ERR(opp)) {
> +		freq = dev_pm_opp_get_freq(opp);
> +		dev_pm_opp_put(opp);
> +		goto out;
> +	}
> +
> +	/* Use Interpolation if required opps is not available */
> +	cpu_min_freq = cpu_state->min_freq;
> +	cpu_max_freq = cpu_state->max_freq;
> +	cpu_curr_freq_khz = cpu_state->curr_freq;
> +
> +	if (dev_freq_table) {
> +		/* Get minimum frequency according to sorting order */
> +		dev_max_state = dev_freq_table[devfreq->profile->max_state - 1];
> +		if (dev_freq_table[0] < dev_max_state) {
> +			dev_min_freq = dev_freq_table[0];
> +			dev_max_freq = dev_max_state;
> +		} else {
> +			dev_min_freq = dev_max_state;
> +			dev_max_freq = dev_freq_table[0];
> +		}
> +	} else {
> +		dev_min_freq = dev_pm_qos_read_value(devfreq->dev.parent,
> +						     DEV_PM_QOS_MIN_FREQUENCY);
> +		dev_max_freq = dev_pm_qos_read_value(devfreq->dev.parent,
> +						     DEV_PM_QOS_MAX_FREQUENCY);
> +
> +		if (dev_max_freq <= dev_min_freq)
> +			return 0;
> +	}
> +	cpu_percent = ((cpu_curr_freq_khz - cpu_min_freq) * 100) / cpu_max_freq - cpu_min_freq;
> +	freq = dev_min_freq + mult_frac(dev_max_freq - dev_min_freq, cpu_percent, 100);
> +
> +out:
> +	return freq;
> +}
> +
> +static int get_target_freq_with_cpufreq(struct devfreq *devfreq,
> +					unsigned long *freq)
> +{
> +	struct devfreq_passive_data *p_data =
> +				(struct devfreq_passive_data *)devfreq->data;
> +	unsigned int cpu;
> +	unsigned long target_freq = 0;
> +
> +	for_each_online_cpu(cpu)
> +		target_freq = max(target_freq,
> +				  xlate_cpufreq_to_devfreq(p_data, cpu));
> +
> +	*freq = target_freq;
> +
> +	return 0;
> +}

As you know, governor_passive.c already uses
both 'dev_pm_opp_xlate_required_opp' and 'devfreq_recommended_opp'
to get the target frequency from the OPP table. So, I want to add a common
function like 'get_taget_freq_by_required_opp', as shown below.
If 'get_taget_freq_by_required_opp' is defined this way,
it can also be used by get_target_freq_with_devfreq().
After the review of this patch is finished, I'll send the patch[2].
[2] https://git.kernel.org/pub/scm/linux/kernel/git/chanwoo/linux.git/commit/?h=devfreq-testing-passive-gov&id=101c5a087586ab2b5cf3370166a7e39227ca83cf

For example (this code is not tested):
static unsigned long get_taget_freq_by_required_opp(struct device *p_dev,
						struct opp_table *p_opp_table,
						struct opp_table *opp_table,
						unsigned long freq)
{
	struct dev_pm_opp *opp = NULL, *p_opp = NULL;

	if (!p_dev || !p_opp_table || !opp_table || !freq)
		return 0;

	p_opp = devfreq_recommended_opp(p_dev, &freq, 0);
	if (IS_ERR(p_opp))
		return 0;

	opp = dev_pm_opp_xlate_required_opp(p_opp_table, opp_table, p_opp);
	dev_pm_opp_put(p_opp);

	if (IS_ERR(opp))
		return 0;

	freq = dev_pm_opp_get_freq(opp);
	dev_pm_opp_put(opp);

	return freq;
}

static int get_target_freq_with_cpufreq(struct devfreq *devfreq,
					unsigned long *target_freq)
{
	struct devfreq_passive_data *p_data =
				(struct devfreq_passive_data *)devfreq->data;
	struct devfreq_cpu_data *cpu_data;
	unsigned long cpu, cpu_cur, cpu_min, cpu_max, cpu_percent;
	unsigned long dev_min, dev_max;
	unsigned long freq = 0;

	for_each_online_cpu(cpu) {
		cpu_data = p_data->cpu_data[cpu];
		if (!cpu_data || cpu_data->first_cpu != cpu)
			continue;

		/* Get target freq via required opps */
		cpu_cur = cpu_data->cur_freq * HZ_PER_KHZ;
		freq = get_taget_freq_by_required_opp(cpu_data->dev,
					cpu_data->opp_table,
					devfreq->opp_table, cpu_cur);
		if (freq) {
			*target_freq = max(freq, *target_freq);
			continue;
		}

		/* Use Interpolation if required opps is not available */
		devfreq_get_freq_range(devfreq, &dev_min, &dev_max);

		cpu_min = cpu_data->min_freq;
		cpu_max = cpu_data->max_freq;
		cpu_cur = cpu_data->cur_freq;

		cpu_percent = ((cpu_cur - cpu_min) * 100) / (cpu_max - cpu_min);
		freq = dev_min + mult_frac(dev_max - dev_min, cpu_percent, 100);

		*target_freq = max(freq, *target_freq);
	}

	return 0;
}
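
To make the interpolation fallback concrete, here is a minimal userspace
sketch of the arithmetic (not kernel code). 'interpolate_dev_freq' and
'mul_frac' are hypothetical names; 'mul_frac' only imitates the kernel's
mult_frac() macro. The sketch assumes the intended divisor is the CPU
frequency range, so 'cpu_max - cpu_min' is parenthesized; without the
parentheses the division binds first and 'cpu_min' is subtracted from the
quotient. Returning 0 for an empty range and clamping 'cpu_cur' are my
assumptions about reasonable guards, not something taken from the patch.

```c
/* Imitates the kernel's mult_frac(): x * n / d without overflowing x * n. */
static unsigned long mul_frac(unsigned long x, unsigned long n, unsigned long d)
{
	unsigned long q = x / d;
	unsigned long r = x % d;

	return q * n + (r * n) / d;
}

/*
 * Map a CPU frequency (kHz) onto the device range [dev_min, dev_max] (Hz).
 * Returns 0 when either range is empty or inverted (assumed guard).
 */
static unsigned long interpolate_dev_freq(unsigned long cpu_cur,
					  unsigned long cpu_min,
					  unsigned long cpu_max,
					  unsigned long dev_min,
					  unsigned long dev_max)
{
	unsigned long cpu_percent;

	if (cpu_max <= cpu_min || dev_max <= dev_min)
		return 0;

	/* Clamp so the unsigned subtraction below cannot wrap. */
	if (cpu_cur < cpu_min)
		cpu_cur = cpu_min;
	if (cpu_cur > cpu_max)
		cpu_cur = cpu_max;

	cpu_percent = ((cpu_cur - cpu_min) * 100) / (cpu_max - cpu_min);

	return dev_min + mul_frac(dev_max - dev_min, cpu_percent, 100);
}
```

With a CPU range of 500000..2000000 kHz and a device range of
100000000..800000000 Hz, a CPU running at 1250000 kHz sits at 50 percent of
its range, so the device is placed at the midpoint of its own range.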

> +
> +static int get_target_freq_with_devfreq(struct devfreq *devfreq,
>  					unsigned long *freq)
>  {
>  	struct devfreq_passive_data *p_data
> @@ -23,14 +115,6 @@ static int devfreq_passive_get_target_freq(struct devfreq *devfreq,
>  	int i, count;
>  
>  	/*
> -	 * If the devfreq device with passive governor has the specific method
> -	 * to determine the next frequency, should use the get_target_freq()
> -	 * of struct devfreq_passive_data.
> -	 */
> -	if (p_data->get_target_freq)
> -		return p_data->get_target_freq(devfreq, freq);
> -
> -	/*
>  	 * If the parent and passive devfreq device uses the OPP table,
>  	 * get the next frequency by using the OPP table.
>  	 */
> @@ -98,6 +182,37 @@ static int devfreq_passive_get_target_freq(struct devfreq *devfreq,
>  	return 0;
>  }
>  
> +static int devfreq_passive_get_target_freq(struct devfreq *devfreq,
> +					   unsigned long *freq)
> +{
> +	struct devfreq_passive_data *p_data =
> +		(struct devfreq_passive_data *)devfreq->data;
> +	int ret;
> +
> +	/*
> +	 * If the devfreq device with passive governor has the specific method
> +	 * to determine the next frequency, should use the get_target_freq()
> +	 * of struct devfreq_passive_data.
> +	 */
> +	if (p_data->get_target_freq)
> +		return p_data->get_target_freq(devfreq, freq);
> +
> +	switch (p_data->parent_type) {
> +	case DEVFREQ_PARENT_DEV:
> +		ret = get_target_freq_with_devfreq(devfreq, freq);
> +		break;
> +	case CPUFREQ_PARENT_DEV:
> +		ret = get_target_freq_with_cpufreq(devfreq, freq);
> +		break;
> +	default:
> +		ret = -EINVAL;
> +		dev_err(&devfreq->dev, "Invalid parent type\n");
> +		break;
> +	}
> +
> +	return ret;
> +}
> +
>  static int devfreq_passive_notifier_call(struct notifier_block *nb,
>  				unsigned long event, void *ptr)
>  {
> @@ -130,16 +245,200 @@ static int devfreq_passive_notifier_call(struct notifier_block *nb,
>  	return NOTIFY_DONE;
>  }
>  
> +static int cpufreq_passive_notifier_call(struct notifier_block *nb,
> +					 unsigned long event, void *ptr)
> +{
> +	struct devfreq_passive_data *data =
> +			container_of(nb, struct devfreq_passive_data, nb);
> +	struct devfreq *devfreq = (struct devfreq *)data->this;
> +	struct devfreq_cpu_state *cpu_state;
> +	struct cpufreq_freqs *cpu_freq = ptr;

Use 'freqs' as the variable name. I prefer to use the same variable name
for both devfreq_freqs and cpufreq_freqs instances.

> +	unsigned int curr_freq;

As I commented above, it is better to use 'cur_freq' instead of 'curr_freq'
unless there is a special reason.

> +	int ret;
> +
> +	if (event != CPUFREQ_POSTCHANGE || !cpu_freq ||
> +	    !data->cpu_state[cpu_freq->policy->cpu])
> +		return 0;
> +
> +	cpu_state = data->cpu_state[cpu_freq->policy->cpu];
> +	if (cpu_state->curr_freq == cpu_freq->new)
> +		return 0;
> +
> +	/* Backup current freq and pre-update cpu state freq*/

I think that this comment is not critical, so please drop it.

> +	curr_freq = cpu_state->curr_freq;
> +	cpu_state->curr_freq = cpu_freq->new;
> +
> +	mutex_lock(&devfreq->lock);
> +	ret = update_devfreq(devfreq);

I recommend using 'devfreq_update_target' instead of 'update_devfreq',
as follows:
	devfreq_update_target(devfreq, freqs->new);

> +	mutex_unlock(&devfreq->lock);
> +	if (ret) {
> +		cpu_state->curr_freq = curr_freq;
> +		dev_err(&devfreq->dev, "Couldn't update the frequency.\n");
> +		return ret;
> +	}
> +
> +	return 0;
> +}
> +
> +static int cpufreq_passive_register(struct devfreq_passive_data **p_data)

To keep the function naming style consistent,
please rename this function as follows, because devfreq names
its function 'devfreq_register_notifier':
- cpufreq_passive_register -> cpufreq_passive_register_notifier

> +{
> +	struct devfreq_passive_data *data = *p_data;
> +	struct devfreq *devfreq = (struct devfreq *)data->this;
> +	struct device *dev = devfreq->dev.parent;
> +	struct opp_table *opp_table = NULL;
> +	struct devfreq_cpu_state *cpu_state;
> +	struct cpufreq_policy *policy;
> +	struct device *cpu_dev;
> +	unsigned int cpu;
> +	int ret;
> +
> +	get_online_cpus();
> +
> +	data->nb.notifier_call = cpufreq_passive_notifier_call;
> +	ret = cpufreq_register_notifier(&data->nb,
> +					CPUFREQ_TRANSITION_NOTIFIER);
> +	if (ret) {
> +		dev_err(dev, "Couldn't register cpufreq notifier.\n");
> +		data->nb.notifier_call = NULL;
> +		goto out;
> +	}
> +
> +	/* Populate devfreq_cpu_state */

Don't need this comment. Please drop it.

> +	for_each_online_cpu(cpu) {
> +		if (data->cpu_state[cpu])
> +			continue;
> +
> +		policy = cpufreq_cpu_get(cpu);
> +		if (!policy) {
> +			ret = -EINVAL;
> +			goto out;
> +		} else if (PTR_ERR(policy) == -EPROBE_DEFER) {
> +			ret = -EPROBE_DEFER;
> +			goto out;
> +		} else if (IS_ERR(policy)) {
> +			ret = PTR_ERR(policy);
> +			dev_err(dev, "Couldn't get the cpufreq_poliy.\n");
> +			goto out;
> +		}

Use the dev_err_probe() function to handle EPROBE_DEFER.
It makes the code simpler.
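
The point of dev_err_probe() is that a single call both reports the error and
returns it, while keeping probe deferral out of the error log. The userspace
sketch below only imitates that behaviour. 'report_probe_err' and 'get_policy'
are hypothetical stand-ins, not kernel functions, and EPROBE_DEFER is defined
locally because it is a kernel-internal error code.

```c
#include <stdarg.h>
#include <stdio.h>

#define EPROBE_DEFER 517	/* kernel-internal errno, defined here for the model */

static int error_lines;	/* counts messages logged at error level */

/*
 * Hypothetical stand-in for the kernel's dev_err_probe(): log the message
 * unless the error is a probe deferral, and always hand the error back so
 * the caller can write "return report_probe_err(...)".
 */
static int report_probe_err(int err, const char *fmt, ...)
{
	va_list args;

	if (err != -EPROBE_DEFER) {
		va_start(args, fmt);
		vfprintf(stderr, fmt, args);
		va_end(args);
		error_lines++;
	}

	return err;
}

/* Hypothetical lookup that fails with the given code (0 means success). */
static int get_policy(int fail_with)
{
	if (fail_with)
		return report_probe_err(fail_with,
					"couldn't get the cpufreq policy: %d\n",
					fail_with);
	return 0;
}
```

The separate EPROBE_DEFER branch and the explicit 'ret = ...; goto out;'
pairs collapse into one statement per failure path.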

> +
> +		cpu_state = kzalloc(sizeof(*cpu_state), GFP_KERNEL);
> +		if (!cpu_state) {
> +			ret = -ENOMEM;
> +			goto out;
> +		}
> +
> +		cpu_dev = get_cpu_device(cpu);
> +		if (!cpu_dev) {
> +			dev_err(dev, "Couldn't get cpu device.\n");
> +			ret = -ENODEV;
> +			goto out;
> +		}
> +
> +		opp_table = dev_pm_opp_get_opp_table(cpu_dev);
> +		if (IS_ERR(devfreq->opp_table)) {
> +			ret = PTR_ERR(opp_table);
> +			goto out;
> +		}
> +
> +		cpu_state->cpu_dev = cpu_dev;
> +		cpu_state->opp_table = opp_table;
> +		cpu_state->first_cpu = cpumask_first(policy->related_cpus);
> +		cpu_state->curr_freq = policy->cur;
> +		cpu_state->min_freq = policy->cpuinfo.min_freq;
> +		cpu_state->max_freq = policy->cpuinfo.max_freq;
> +		data->cpu_state[cpu] = cpu_state;
> +
> +		cpufreq_cpu_put(policy);
> +	}
> +
> +out:
> +	put_online_cpus();
> +	if (ret)
> +		return ret;
> +
> +	/* Update devfreq */
> +	mutex_lock(&devfreq->lock);
> +	ret = update_devfreq(devfreq);

> +	mutex_unlock(&devfreq->lock);
> +	if (ret)
> +		dev_err(dev, "Couldn't update the frequency.\n");
> +
> +	return ret;
> +}
> +
> +static int cpufreq_passive_unregister(struct devfreq_passive_data **p_data)

As I commented above, please change the name as following:
- cpufreq_passive_unregister -> cpufreq_passive_unregister_notifier

> +{
> +	struct devfreq_passive_data *data = *p_data;
> +	struct devfreq_cpu_state *cpu_state;
> +	int cpu;
> +
> +	if (data->nb.notifier_call)
> +		cpufreq_unregister_notifier(&data->nb,
> +					    CPUFREQ_TRANSITION_NOTIFIER);
> +
> +	for_each_possible_cpu(cpu) {
> +		cpu_state = data->cpu_state[cpu];
> +		if (cpu_state) {
> +			if (cpu_state->opp_table)
> +				dev_pm_opp_put_opp_table(cpu_state->opp_table);
> +			kfree(cpu_state);
> +			cpu_state = NULL;
> +		}
> +	}
> +
> +	return 0;
> +}
> +
> +int register_parent_dev_notifier(struct devfreq_passive_data **p_data)
> +{
> +	struct notifier_block *nb = &(*p_data)->nb;
> +	int ret = 0;
> +
> +	switch ((*p_data)->parent_type) {
> +	case DEVFREQ_PARENT_DEV:
> +		nb->notifier_call = devfreq_passive_notifier_call;
> +		ret = devfreq_register_notifier((struct devfreq *)(*p_data)->parent, nb,
> +						DEVFREQ_TRANSITION_NOTIFIER);
> +		break;
> +	case CPUFREQ_PARENT_DEV:
> +		ret = cpufreq_passive_register(p_data);
> +		break;
> +	default:
> +		ret = -EINVAL;
> +		break;
> +	}
> +	return ret;
> +}
> +
> +int unregister_parent_dev_notifier(struct devfreq_passive_data **p_data)
> +{
> +	int ret = 0;
> +
> +	switch ((*p_data)->parent_type) {
> +	case DEVFREQ_PARENT_DEV:
> +		WARN_ON(devfreq_unregister_notifier((struct devfreq *)(*p_data)->parent,
> +						    &(*p_data)->nb,
> +						    DEVFREQ_TRANSITION_NOTIFIER));
> +		break;
> +	case CPUFREQ_PARENT_DEV:
> +		cpufreq_passive_unregister(p_data);
> +		break;
> +	default:
> +		ret = -EINVAL;
> +		break;
> +	}
> +	return ret;
> +}

I think that you don't need to define register_parent_dev_notifier
and unregister_parent_dev_notifier as separate functions.

Instead of the separate functions, just add the code
to devfreq_passive_event_handler.
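
One possible shape of that suggestion, as a userspace model rather than
kernel code: the event handler itself branches on the parent type for both
start and stop. Every name ending in '_model' and every 'MODEL_' constant is
a hypothetical stand-in; the flags only record which notifier would have been
registered.

```c
#include <errno.h>

/* Local stand-ins for the kernel definitions; the values are arbitrary here. */
enum parent_type_model { MODEL_DEVFREQ_PARENT, MODEL_CPUFREQ_PARENT };
enum gov_event_model { MODEL_GOV_START, MODEL_GOV_STOP, MODEL_GOV_SUSPEND };

struct passive_model {
	int parent_type;		/* int so an invalid type can be modelled */
	int devfreq_nb_registered;
	int cpufreq_nb_registered;
};

static int passive_event_handler_model(struct passive_model *p, int event)
{
	int ret = 0;

	switch (event) {
	case MODEL_GOV_START:
		if (p->parent_type == MODEL_DEVFREQ_PARENT)
			p->devfreq_nb_registered = 1;
		else if (p->parent_type == MODEL_CPUFREQ_PARENT)
			p->cpufreq_nb_registered = 1;
		else
			ret = -EINVAL;
		break;
	case MODEL_GOV_STOP:
		if (p->parent_type == MODEL_DEVFREQ_PARENT)
			p->devfreq_nb_registered = 0;
		else if (p->parent_type == MODEL_CPUFREQ_PARENT)
			p->cpufreq_nb_registered = 0;
		else
			ret = -EINVAL;
		break;
	default:
		/* Other events (e.g. suspend/resume) are not handled and not errors. */
		break;
	}

	return ret;
}
```

This keeps the parent-type decision in one place and avoids passing a
pointer to the data pointer into helper functions.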


> +
>  static int devfreq_passive_event_handler(struct devfreq *devfreq,
>  				unsigned int event, void *data)
>  {
>  	struct devfreq_passive_data *p_data
>  			= (struct devfreq_passive_data *)devfreq->data;
>  	struct devfreq *parent = (struct devfreq *)p_data->parent;
> -	struct notifier_block *nb = &p_data->nb;
>  	int ret = 0;
>  
> -	if (!parent)
> +	if (p_data->parent_type == DEVFREQ_PARENT_DEV && !parent)
>  		return -EPROBE_DEFER;
>  
>  	switch (event) {
> @@ -147,13 +446,11 @@ static int devfreq_passive_event_handler(struct devfreq *devfreq,
>  		if (!p_data->this)
>  			p_data->this = devfreq;
>  
> -		nb->notifier_call = devfreq_passive_notifier_call;
> -		ret = devfreq_register_notifier(parent, nb,
> -					DEVFREQ_TRANSITION_NOTIFIER);
> +		ret = register_parent_dev_notifier(&p_data);
>  		break;
> +
>  	case DEVFREQ_GOV_STOP:
> -		WARN_ON(devfreq_unregister_notifier(parent, nb,
> -					DEVFREQ_TRANSITION_NOTIFIER));
> +		ret = unregister_parent_dev_notifier(&p_data);
>  		break;
>  	default:
>  		break;
> diff --git a/include/linux/devfreq.h b/include/linux/devfreq.h
> index 26ea0850be9b..e0093b7c805c 100644
> --- a/include/linux/devfreq.h
> +++ b/include/linux/devfreq.h
> @@ -280,6 +280,25 @@ struct devfreq_simple_ondemand_data {
>  
>  #if IS_ENABLED(CONFIG_DEVFREQ_GOV_PASSIVE)
>  /**
> + * struct devfreq_cpu_state - holds the per-cpu state
> + * @freq:	the current frequency of the cpu.
> + * @min_freq:	the min frequency of the cpu.
> + * @max_freq:	the max frequency of the cpu.
> + * @first_cpu:	the cpumask of the first cpu of a policy.
> + * @dev:	reference to cpu device.
> + * @opp_table:	reference to cpu opp table.
> + *
> + * This structure stores the required cpu_state of a cpu.
> + * This is auto-populated by the governor.
> + */
> +struct devfreq_cpu_state;
> +
> +enum devfreq_parent_dev_type {
> +	DEVFREQ_PARENT_DEV,
> +	CPUFREQ_PARENT_DEV,
> +};
> +
> +/**
>   * struct devfreq_passive_data - ``void *data`` fed to struct devfreq
>   *	and devfreq_add_device
>   * @parent:	the devfreq instance of parent device.
> @@ -290,13 +309,15 @@ struct devfreq_simple_ondemand_data {
>   *			using governors except for passive governor.
>   *			If the devfreq device has the specific method to decide
>   *			the next frequency, should use this callback.
> + * @parent_type:	parent type of the device
>   * @this:	the devfreq instance of own device.
>   * @nb:		the notifier block for DEVFREQ_TRANSITION_NOTIFIER list
> + * @cpu_state:		the state min/max/current frequency of all online cpu's
>   *
>   * The devfreq_passive_data have to set the devfreq instance of parent
>   * device with governors except for the passive governor. But, don't need to
> - * initialize the 'this' and 'nb' field because the devfreq core will handle
> - * them.
> + * initialize the 'this', 'nb' and 'cpu_state' field because the devfreq core
> + * will handle them.
>   */
>  struct devfreq_passive_data {
>  	/* Should set the devfreq instance of parent device */
> @@ -305,9 +326,13 @@ struct devfreq_passive_data {
>  	/* Optional callback to decide the next frequency of passvice device */
>  	int (*get_target_freq)(struct devfreq *this, unsigned long *freq);
>  
> +	/* Should set the type of parent device */
> +	enum devfreq_parent_dev_type parent_type;
> +
>  	/* For passive governor's internal use. Don't need to set them */
>  	struct devfreq *this;
>  	struct notifier_block nb;
> +	struct devfreq_cpu_state *cpu_state[NR_CPUS];
>  };
>  #endif
>  
>


* Re: [PATCH V8 2/8] cpufreq: mediatek: Enable clock and regulator
       [not found] ` <1616499241-4906-3-git-send-email-andrew-sh.cheng@mediatek.com>
@ 2021-03-30  4:36   ` Viresh Kumar
       [not found]     ` <1617168099.18405.8.camel@mtksdaap41>
  0 siblings, 1 reply; 14+ messages in thread
From: Viresh Kumar @ 2021-03-30  4:36 UTC (permalink / raw)
  To: Andrew-sh.Cheng
  Cc: MyungJoo Ham, Kyungmin Park, Chanwoo Choi, Rob Herring,
	Mark Rutland, Matthias Brugger, Rafael J. Wysocki,
	Nishanth Menon, Stephen Boyd, Liam Girdwood, Mark Brown,
	linux-pm, devicetree, linux-arm-kernel, linux-mediatek,
	linux-kernel, srv_heupstream

On 23-03-21, 19:33, Andrew-sh.Cheng wrote:
> From: "Andrew-sh.Cheng" <andrew-sh.cheng@mediatek.com>
> 
> Need to enable regulator,
> so that the max/min requested value will be recorded
> even it is not applied right away.
> 
> Intermediate clock is not always enabled by ccf in different projects,
> so cpufreq should enable it by itself.
> 
> Signed-off-by: Andrew-sh.Cheng <andrew-sh.cheng@mediatek.com>
> ---
>  drivers/cpufreq/mediatek-cpufreq.c | 33 +++++++++++++++++++++++++++++----
>  1 file changed, 29 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/cpufreq/mediatek-cpufreq.c b/drivers/cpufreq/mediatek-cpufreq.c
> index f2e491b25b07..432368707ea6 100644
> --- a/drivers/cpufreq/mediatek-cpufreq.c
> +++ b/drivers/cpufreq/mediatek-cpufreq.c
> @@ -350,6 +350,11 @@ static int mtk_cpu_dvfs_info_init(struct mtk_cpu_dvfs_info *info, int cpu)
>  		ret = PTR_ERR(proc_reg);
>  		goto out_free_resources;
>  	}
> +	ret = regulator_enable(proc_reg);
> +	if (ret) {
> +		pr_warn("enable vproc for cpu%d fail\n", cpu);
> +		goto out_free_resources;
> +	}

Regulators are enabled by OPP core as well now, you sure this is
required ?

-- 
viresh

* Re: [PATCH V8 2/8] cpufreq: mediatek: Enable clock and regulator
       [not found]     ` <1617168099.18405.8.camel@mtksdaap41>
@ 2021-03-31  6:17       ` Viresh Kumar
  0 siblings, 0 replies; 14+ messages in thread
From: Viresh Kumar @ 2021-03-31  6:17 UTC (permalink / raw)
  To: andrew-sh.cheng
  Cc: MyungJoo Ham, Kyungmin Park, Chanwoo Choi, Rob Herring,
	Mark Rutland, Matthias Brugger, Rafael J. Wysocki,
	Nishanth Menon, Stephen Boyd, Liam Girdwood, Mark Brown,
	linux-pm, devicetree, linux-arm-kernel, linux-mediatek,
	linux-kernel, srv_heupstream

On 31-03-21, 13:21, andrew-sh.cheng wrote:
> Hi Viresh,
> Yes.
> As you mentioned, it will be enable by OPP core.
> 
> Per discuss with hotplug owner and regulator owner,
> they suggest that "users should not suppose other module, will enable
> regulators for them".
> They suggest to add enable_regulator here.

Which is fine if the modules in question aren't closely related to each other,
but OPP core and cpufreq are too closely bound to each other. So much that the
cpufreq driver can depend on the OPP core for doing it.

Though I won't Nack a patch just for that, but it was just a suggestion.

-- 
viresh

* Re: [PATCH V8 1/8] PM / devfreq: Add cpu based scaling support to passive_governor
       [not found]       ` <1617177820.15067.1.camel@mtksdaap41>
@ 2021-03-31  8:27         ` Chanwoo Choi
  2021-03-31  8:35           ` Chanwoo Choi
  0 siblings, 1 reply; 14+ messages in thread
From: Chanwoo Choi @ 2021-03-31  8:27 UTC (permalink / raw)
  To: andrew-sh.cheng
  Cc: MyungJoo Ham, Kyungmin Park, Rob Herring, Mark Rutland,
	Matthias Brugger, Rafael J. Wysocki, Viresh Kumar,
	Nishanth Menon, Stephen Boyd, Liam Girdwood, Mark Brown,
	linux-pm, devicetree, linux-arm-kernel, linux-mediatek,
	linux-kernel, srv_heupstream, Sibi Sankar

Hi,

On 3/31/21 5:03 PM, andrew-sh.cheng wrote:
> On Thu, 2021-03-25 at 17:14 +0900, Chanwoo Choi wrote:
>> Hi,
>>
>> You are missing to add these patches to linux-pm mailing list.
>> Need to send them to linu-pm ML.
>>
>> Also, before received this series, I tried to clean-up these patches
>> on testing branch[1]. So that I add my comment with my clean-up case.
>> [1] https://git.kernel.org/pub/scm/linux/kernel/git/chanwoo/linux.git/log/?h=devfreq-testing-passive-gov
>>
>> And 'Saravana Kannan <skannan@codeaurora.org>' is wrong email address.
>> Please update the email or drop this email.
> 
> Hi Chanwoo,
> 
> Thank you for the advices.
> I will resend patch v9 (add to linux-pm ML), remove this patch, and note
> that my patch set base on
> https://git.kernel.org/pub/scm/linux/kernel/git/chanwoo/linux.git/log/?h=devfreq-testing-passive-gov

I have not yet tested this patch[1] on the devfreq-testing-passive-gov branch.
So, if possible, I'd like you to test your patches together with this patch[1],
and if there is no problem, could you send the next version together with patch[1]?

[1]https://git.kernel.org/pub/scm/linux/kernel/git/chanwoo/linux.git/commit/?h=devfreq-testing-passive-gov&id=39c80d11a8f42dd63ecea1e0df595a0ceb83b454

> 
> 
>>
>>
>> On 3/23/21 8:33 PM, Andrew-sh.Cheng wrote:
>>> From: Saravana Kannan <skannan@codeaurora.org>
>>>
>>> Many CPU architectures have caches that can scale independent of the
>>> CPUs. Frequency scaling of the caches is necessary to make sure that the
>>> cache is not a performance bottleneck that leads to poor performance and
>>> power. The same idea applies for RAM/DDR.
>>>
>>> To achieve this, this patch adds support for cpu based scaling to the
>>> passive governor. This is accomplished by taking the current frequency
>>> of each CPU frequency domain and then adjust the frequency of the cache
>>> (or any devfreq device) based on the frequency of the CPUs. It listens
>>> to CPU frequency transition notifiers to keep itself up to date on the
>>> current CPU frequency.
>>>
>>> To decide the frequency of the device, the governor does one of the
>>> following:
>>> * Derives the optimal devfreq device opp from required-opps property of
>>>   the parent cpu opp_table.
>>>
>>> * Scales the device frequency in proportion to the CPU frequency. So, if
>>>   the CPUs are running at their max frequency, the device runs at its
>>>   max frequency. If the CPUs are running at their min frequency, the
>>>   device runs at its min frequency. It is interpolated for frequencies
>>>   in between.
>>>
>>> Andrew-sh.Cheng changes:
>>> * Change dev_pm_opp_xlate_opp to dev_pm_opp_xlate_required_opp.
>>> * Change devfreq->max_freq to
>>>   devfreq->user_min_freq_req.data.freq.qos->min_freq.target_value
>>>   after kernel-5.7.
>>> * Don't return -EINVAL in devfreq_passive_event_handler(),
>>>   since it doesn't handle the DEVFREQ_GOV_SUSPEND and
>>>   DEVFREQ_GOV_RESUME cases.
>>>
>>> Signed-off-by: Saravana Kannan <skannan@codeaurora.org>
>>> [Sibi: Integrated cpu-freqmap governor into passive_governor]
>>> Signed-off-by: Sibi Sankar <sibis@codeaurora.org>
>>> Signed-off-by: Andrew-sh.Cheng <andrew-sh.cheng@mediatek.com>
>>> ---
>>>  drivers/devfreq/Kconfig            |   2 +
>>>  drivers/devfreq/governor_passive.c | 329 +++++++++++++++++++++++++++++++++++--
>>>  include/linux/devfreq.h            |  29 +++-
>>>  3 files changed, 342 insertions(+), 18 deletions(-)
>>>
>>> diff --git a/drivers/devfreq/Kconfig b/drivers/devfreq/Kconfig
>>> index 00704efe6398..f56132b0ae64 100644
>>> --- a/drivers/devfreq/Kconfig
>>> +++ b/drivers/devfreq/Kconfig
>>> @@ -73,6 +73,8 @@ config DEVFREQ_GOV_PASSIVE
>>>  	  device. This governor does not change the frequency by itself
>>>  	  through sysfs entries. The passive governor recommends that
>>>  	  devfreq device uses the OPP table to get the frequency/voltage.
>>> +	  Alternatively the governor can also be chosen to scale based on
>>> +	  the online CPUs current frequency.
>>>  
>>>  comment "DEVFREQ Drivers"
>>>  
>>> diff --git a/drivers/devfreq/governor_passive.c b/drivers/devfreq/governor_passive.c
>>> index b094132bd20b..9cc57b083839 100644
>>> --- a/drivers/devfreq/governor_passive.c
>>> +++ b/drivers/devfreq/governor_passive.c
>>> @@ -8,11 +8,103 @@
>>>   */
>>>  
>>>  #include <linux/module.h>
>>> +#include <linux/cpu.h>
>>> +#include <linux/cpufreq.h>
>>> +#include <linux/cpumask.h>
>>>  #include <linux/device.h>
>>>  #include <linux/devfreq.h>
>>> +#include <linux/slab.h>
>>>  #include "governor.h"
>>>  
>>> -static int devfreq_passive_get_target_freq(struct devfreq *devfreq,
>>> +struct devfreq_cpu_state {
>>> +	unsigned int curr_freq;
>>> +	unsigned int min_freq;
>>> +	unsigned int max_freq;
>>> +	unsigned int first_cpu;
>>> +	struct device *cpu_dev;
>>> +	struct opp_table *opp_table;
>>> +};
>>
>> As far as I know, the previous version had a description of the structure.
>> I want to add the description as shown below.
>>
>> And if you have no objection, I'd like you to order
>> the variables as follows and use 'dev' instead of 'cpu_dev',
>> because this patch uses 'cpu_state->cpu_dev' at multiple points.
>> I think that 'cpu_state->dev' is better than 'cpu_state->cpu_dev'.
>> Also, I prefer 'cur_freq' to 'curr_freq'
>> because the devfreq subsystem uses 'cur_freq' for the 'current frequency'.
>>
>> /**
>>  * struct devfreq_cpu_state - Hold the per-cpu data
>>  * @dev:        reference to cpu device.
>>  * @first_cpu:  the cpumask of the first cpu of a policy.
>>  * @opp_table:  reference to cpu opp table.
>>  * @cur_freq:   the current frequency of the cpu.
>>  * @min_freq:   the min frequency of the cpu.
>>  * @max_freq:   the max frequency of the cpu.
>>  *
>>  * This structure stores the required cpu_data of a cpu.
>>  * This is auto-populated by the governor.
>>  */
>> struct devfreq_cpu_state {
>>          struct device *dev;
>>          unsigned int first_cpu;
>>
>>          struct opp_table *opp_table;
>>          unsigned int cur_freq;
>>          unsigned int min_freq;
>>          unsigned int max_freq;
>> };
>>
>>
>>> +
>>> +static unsigned long xlate_cpufreq_to_devfreq(struct devfreq_passive_data *data,
>>> +					      unsigned int cpu)
>>> +{
>>> +	unsigned int cpu_min_freq, cpu_max_freq, cpu_curr_freq_khz, cpu_percent;
>>> +	unsigned long dev_min_freq, dev_max_freq, dev_max_state;
>>> +
>>> +	struct devfreq_cpu_state *cpu_state = data->cpu_state[cpu];
>>> +	struct devfreq *devfreq = (struct devfreq *)data->this;
>>> +	unsigned long *dev_freq_table = devfreq->profile->freq_table;
>>> +	struct dev_pm_opp *opp = NULL, *p_opp = NULL;
>>> +	unsigned long cpu_curr_freq, freq;
>>> +
>>> +	if (!cpu_state || cpu_state->first_cpu != cpu ||
>>> +	    !cpu_state->opp_table || !devfreq->opp_table)
>>> +		return 0;
>>> +
>>> +	cpu_curr_freq = cpu_state->curr_freq * 1000;
>>> +	p_opp = devfreq_recommended_opp(cpu_state->cpu_dev, &cpu_curr_freq, 0);
>>> +	if (IS_ERR(p_opp))
>>> +		return 0;
>>> +
>>> +	opp = dev_pm_opp_xlate_required_opp(cpu_state->opp_table,
>>> +					    devfreq->opp_table, p_opp);
>>> +	dev_pm_opp_put(p_opp);
>>> +
>>> +	if (!IS_ERR(opp)) {
>>> +		freq = dev_pm_opp_get_freq(opp);
>>> +		dev_pm_opp_put(opp);
>>> +		goto out;
>>> +	}
>>> +
>>> +	/* Use Interpolation if required opps is not available */
>>> +	cpu_min_freq = cpu_state->min_freq;
>>> +	cpu_max_freq = cpu_state->max_freq;
>>> +	cpu_curr_freq_khz = cpu_state->curr_freq;
>>> +
>>> +	if (dev_freq_table) {
>>> +		/* Get minimum frequency according to sorting order */
>>> +		dev_max_state = dev_freq_table[devfreq->profile->max_state - 1];
>>> +		if (dev_freq_table[0] < dev_max_state) {
>>> +			dev_min_freq = dev_freq_table[0];
>>> +			dev_max_freq = dev_max_state;
>>> +		} else {
>>> +			dev_min_freq = dev_max_state;
>>> +			dev_max_freq = dev_freq_table[0];
>>> +		}
>>> +	} else {
>>> +		dev_min_freq = dev_pm_qos_read_value(devfreq->dev.parent,
>>> +						     DEV_PM_QOS_MIN_FREQUENCY);
>>> +		dev_max_freq = dev_pm_qos_read_value(devfreq->dev.parent,
>>> +						     DEV_PM_QOS_MAX_FREQUENCY);
>>> +
>>> +		if (dev_max_freq <= dev_min_freq)
>>> +			return 0;
>>> +	}
>>> +	cpu_percent = ((cpu_curr_freq_khz - cpu_min_freq) * 100) / (cpu_max_freq - cpu_min_freq);
>>> +	freq = dev_min_freq + mult_frac(dev_max_freq - dev_min_freq, cpu_percent, 100);
>>> +
>>> +out:
>>> +	return freq;
>>> +}
>>> +
>>> +static int get_target_freq_with_cpufreq(struct devfreq *devfreq,
>>> +					unsigned long *freq)
>>> +{
>>> +	struct devfreq_passive_data *p_data =
>>> +				(struct devfreq_passive_data *)devfreq->data;
>>> +	unsigned int cpu;
>>> +	unsigned long target_freq = 0;
>>> +
>>> +	for_each_online_cpu(cpu)
>>> +		target_freq = max(target_freq,
>>> +				  xlate_cpufreq_to_devfreq(p_data, cpu));
>>> +
>>> +	*freq = target_freq;
>>> +
>>> +	return 0;
>>> +}
>>
>> As you know, governor_passive.c already uses
>> both 'dev_pm_opp_xlate_required_opp' and 'devfreq_recommended_opp'
>> to get the target frequency from the OPP. So, I want to add a common function
>> like 'get_taget_freq_by_required_opp', as shown below.
>> If 'get_taget_freq_by_required_opp' is defined this way,
>> it can also be used by get_target_freq_with_devfreq().
>> After finishing the review of this patch, I'll send the patch[2].
>> [2] https://git.kernel.org/pub/scm/linux/kernel/git/chanwoo/linux.git/commit/?h=devfreq-testing-passive-gov&id=101c5a087586ab2b5cf3370166a7e39227ca83cf
>>
>> For example (this code is not tested):
>> static unsigned long get_taget_freq_by_required_opp(struct device *p_dev,
>> 						struct opp_table *p_opp_table,
>> 						struct opp_table *opp_table,
>> 						unsigned long freq)
>> {
>> 	struct dev_pm_opp *opp = NULL, *p_opp = NULL;
>>
>> 	if (!p_dev || !p_opp_table || !opp_table || !freq)
>> 		return 0;
>>
>> 	p_opp = devfreq_recommended_opp(p_dev, &freq, 0);
>> 	if (IS_ERR(p_opp))
>> 		return 0;
>>
>> 	opp = dev_pm_opp_xlate_required_opp(p_opp_table, opp_table, p_opp);
>> 	dev_pm_opp_put(p_opp);
>>
>> 	if (IS_ERR(opp))
>> 		return 0;
>>
>> 	freq = dev_pm_opp_get_freq(opp);
>> 	dev_pm_opp_put(opp);
>>
>> 	return freq;
>> }
>>
>> static int get_target_freq_with_cpufreq(struct devfreq *devfreq,
>> 					unsigned long *target_freq)
>> {
>> 	struct devfreq_passive_data *p_data =
>> 				(struct devfreq_passive_data *)devfreq->data;
>> 	struct devfreq_cpu_data *cpu_data;
>> 	unsigned long cpu, cpu_cur, cpu_min, cpu_max, cpu_percent;
>> 	unsigned long dev_min, dev_max;
>> 	unsigned long freq = 0;
>>
>> 	for_each_online_cpu(cpu) {
>> 		cpu_data = p_data->cpu_data[cpu];
>> 		if (!cpu_data || cpu_data->first_cpu != cpu)
>> 			continue;
>>
>> 		/* Get target freq via required opps */
>> 		cpu_cur = cpu_data->cur_freq * HZ_PER_KHZ;
>> 		freq = get_taget_freq_by_required_opp(cpu_data->dev,
>> 					cpu_data->opp_table,
>> 					devfreq->opp_table, cpu_cur);
>> 		if (freq) {
>> 			*target_freq = max(freq, *target_freq);
>> 			continue;
>> 		}
>>
>> 		/* Use Interpolation if required opps is not available */
>> 		devfreq_get_freq_range(devfreq, &dev_min, &dev_max);
>>
>> 		cpu_min = cpu_data->min_freq;
>> 		cpu_max = cpu_data->max_freq;
>> 		cpu_cur = cpu_data->cur_freq;
>>
>> 		cpu_percent = ((cpu_cur - cpu_min) * 100) / (cpu_max - cpu_min);
>> 		freq = dev_min + mult_frac(dev_max - dev_min, cpu_percent, 100);
>>
>> 		*target_freq = max(freq, *target_freq);
>> 	}
>>
>> 	return 0;
>> }
>>
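The interpolation fallback discussed above maps the CPU's position within
[cpu_min, cpu_max] linearly onto [dev_min, dev_max], and the governor then takes
the highest result across all CPU policies. Below is a minimal userspace sketch
of that arithmetic only. It does not model the required-opps lookup, and the
names cpu_state_sketch, mult_frac_ul(), interp_dev_freq() and pick_target_freq()
are hypothetical stand-ins rather than kernel API. The clamping and the
empty-range checks are assumptions added so the unsigned arithmetic cannot wrap
or divide by zero.

```c
#include <stddef.h>

/* Hypothetical per-policy snapshot; all frequencies share one unit (e.g. kHz). */
struct cpu_state_sketch {
	unsigned long cur_freq;
	unsigned long min_freq;
	unsigned long max_freq;
};

/* Userspace stand-in for the kernel's mult_frac(): x * n / d without forming x * n. */
unsigned long mult_frac_ul(unsigned long x, unsigned long n, unsigned long d)
{
	return (x / d) * n + ((x % d) * n) / d;
}

/*
 * Map cur_freq in [min_freq, max_freq] linearly onto [dev_min, dev_max].
 * Returns 0 when either range is empty or inverted.
 */
unsigned long interp_dev_freq(const struct cpu_state_sketch *s,
			      unsigned long dev_min, unsigned long dev_max)
{
	unsigned long cur, percent;

	if (s->max_freq <= s->min_freq || dev_max <= dev_min)
		return 0;

	/* Clamp so the unsigned subtraction below cannot wrap. */
	cur = s->cur_freq;
	if (cur < s->min_freq)
		cur = s->min_freq;
	if (cur > s->max_freq)
		cur = s->max_freq;

	/* The divisor is the whole CPU span, so it must be parenthesized. */
	percent = ((cur - s->min_freq) * 100) / (s->max_freq - s->min_freq);

	return dev_min + mult_frac_ul(dev_max - dev_min, percent, 100);
}

/* The device must satisfy the most demanding policy, so take the maximum. */
unsigned long pick_target_freq(const struct cpu_state_sketch *states, size_t n,
			       unsigned long dev_min, unsigned long dev_max)
{
	unsigned long target = 0, freq;
	size_t i;

	for (i = 0; i < n; i++) {
		freq = interp_dev_freq(&states[i], dev_min, dev_max);
		if (freq > target)
			target = freq;
	}

	return target;
}
```

With a CPU range of 500000..2000000 kHz and a device range of
100000000..800000000 Hz, a CPU at 1250000 kHz sits at 50% of its span, so the
sketch yields 450000000 Hz for the device.
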
>>> +
>>> +static int get_target_freq_with_devfreq(struct devfreq *devfreq,
>>>  					unsigned long *freq)
>>>  {
>>>  	struct devfreq_passive_data *p_data
>>> @@ -23,14 +115,6 @@ static int devfreq_passive_get_target_freq(struct devfreq *devfreq,
>>>  	int i, count;
>>>  
>>>  	/*
>>> -	 * If the devfreq device with passive governor has the specific method
>>> -	 * to determine the next frequency, should use the get_target_freq()
>>> -	 * of struct devfreq_passive_data.
>>> -	 */
>>> -	if (p_data->get_target_freq)
>>> -		return p_data->get_target_freq(devfreq, freq);
>>> -
>>> -	/*
>>>  	 * If the parent and passive devfreq device uses the OPP table,
>>>  	 * get the next frequency by using the OPP table.
>>>  	 */
>>> @@ -98,6 +182,37 @@ static int devfreq_passive_get_target_freq(struct devfreq *devfreq,
>>>  	return 0;
>>>  }
>>>  
>>> +static int devfreq_passive_get_target_freq(struct devfreq *devfreq,
>>> +					   unsigned long *freq)
>>> +{
>>> +	struct devfreq_passive_data *p_data =
>>> +		(struct devfreq_passive_data *)devfreq->data;
>>> +	int ret;
>>> +
>>> +	/*
>>> +	 * If the devfreq device with passive governor has the specific method
>>> +	 * to determine the next frequency, should use the get_target_freq()
>>> +	 * of struct devfreq_passive_data.
>>> +	 */
>>> +	if (p_data->get_target_freq)
>>> +		return p_data->get_target_freq(devfreq, freq);
>>> +
>>> +	switch (p_data->parent_type) {
>>> +	case DEVFREQ_PARENT_DEV:
>>> +		ret = get_target_freq_with_devfreq(devfreq, freq);
>>> +		break;
>>> +	case CPUFREQ_PARENT_DEV:
>>> +		ret = get_target_freq_with_cpufreq(devfreq, freq);
>>> +		break;
>>> +	default:
>>> +		ret = -EINVAL;
>>> +		dev_err(&devfreq->dev, "Invalid parent type\n");
>>> +		break;
>>> +	}
>>> +
>>> +	return ret;
>>> +}
>>> +
>>>  static int devfreq_passive_notifier_call(struct notifier_block *nb,
>>>  				unsigned long event, void *ptr)
>>>  {
>>> @@ -130,16 +245,200 @@ static int devfreq_passive_notifier_call(struct notifier_block *nb,
>>>  	return NOTIFY_DONE;
>>>  }
>>>  
>>> +static int cpufreq_passive_notifier_call(struct notifier_block *nb,
>>> +					 unsigned long event, void *ptr)
>>> +{
>>> +	struct devfreq_passive_data *data =
>>> +			container_of(nb, struct devfreq_passive_data, nb);
>>> +	struct devfreq *devfreq = (struct devfreq *)data->this;
>>> +	struct devfreq_cpu_state *cpu_state;
>>> +	struct cpufreq_freqs *cpu_freq = ptr;
>>
>> Use 'freqs' as the variable name. I prefer to use the same variable name
>> for both devfreq_freqs and cpufreq_freqs instances.
>>
>>> +	unsigned int curr_freq;
>>
>> As I commented above, it is better to use 'cur_freq' instead of 'curr_freq'
>> unless there is a special reason.
>>
>>> +	int ret;
>>> +
>>> +	if (event != CPUFREQ_POSTCHANGE || !cpu_freq ||
>>> +	    !data->cpu_state[cpu_freq->policy->cpu])
>>> +		return 0;
>>> +
>>> +	cpu_state = data->cpu_state[cpu_freq->policy->cpu];
>>> +	if (cpu_state->curr_freq == cpu_freq->new)
>>> +		return 0;
>>> +
>>> +	/* Backup current freq and pre-update cpu state freq*/
>>
>> I think that this comment is not critical, so please drop it.
>>
>>> +	curr_freq = cpu_state->curr_freq;
>>> +	cpu_state->curr_freq = cpu_freq->new;
>>> +
>>> +	mutex_lock(&devfreq->lock);
>>> +	ret = update_devfreq(devfreq);
>>
>> I recommend using 'devfreq_update_target' instead of 'update_devfreq',
>> as follows:
>> 	devfreq_update_target(devfreq, freqs->new);
>>
>>> +	mutex_unlock(&devfreq->lock);
>>> +	if (ret) {
>>> +		cpu_state->curr_freq = curr_freq;
>>> +		dev_err(&devfreq->dev, "Couldn't update the frequency.\n");
>>> +		return ret;
>>> +	}
>>> +
>>> +	return 0;
>>> +}
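
The notifier callback above follows an update-then-roll-back pattern: it stores
the new CPU frequency first so that the target computation sees it, and restores
the old value if the update fails. A minimal userspace sketch of that pattern
follows. The names cpu_freq_state_sketch, retarget_fn, handle_freq_change(),
retarget_ok() and retarget_fail() are hypothetical, and the callback is only a
stand-in for whatever call re-evaluates the device frequency.

```c
#include <stddef.h>

/* Hypothetical cached state for one CPU policy. */
struct cpu_freq_state_sketch {
	unsigned int cur_freq;	/* last CPU frequency the governor acted on */
};

/* Stand-in for the re-evaluation call; returns 0 on success, negative on error. */
typedef int (*retarget_fn)(void *ctx);

int handle_freq_change(struct cpu_freq_state_sketch *st, unsigned int new_freq,
		       retarget_fn retarget, void *ctx)
{
	unsigned int old_freq = st->cur_freq;
	int ret;

	/* Nothing to do when the frequency did not actually change. */
	if (old_freq == new_freq)
		return 0;

	/* Publish the new value before re-evaluating the device frequency. */
	st->cur_freq = new_freq;

	ret = retarget(ctx);
	if (ret)
		st->cur_freq = old_freq;	/* roll back on failure */

	return ret;
}

/* Two trivial callbacks that exercise the success and failure paths. */
int retarget_ok(void *ctx)
{
	(void)ctx;
	return 0;
}

int retarget_fail(void *ctx)
{
	(void)ctx;
	return -22;
}
```

Rolling back keeps the cached frequency consistent with what the device was
actually programmed for, so a later notification with the same frequency is not
skipped by the equality check.
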
>>> +
>>> +static int cpufreq_passive_register(struct devfreq_passive_data **p_data)
>>
>> To keep the function naming style consistent,
>> please change the name as follows, because devfreq defines
>> the function name as 'devfreq_register_notifier':
>> - cpufreq_passive_register -> cpufreq_passive_register_notifier
>>
>>> +{
>>> +	struct devfreq_passive_data *data = *p_data;
>>> +	struct devfreq *devfreq = (struct devfreq *)data->this;
>>> +	struct device *dev = devfreq->dev.parent;
>>> +	struct opp_table *opp_table = NULL;
>>> +	struct devfreq_cpu_state *cpu_state;
>>> +	struct cpufreq_policy *policy;
>>> +	struct device *cpu_dev;
>>> +	unsigned int cpu;
>>> +	int ret;
>>> +
>>> +	get_online_cpus();
>>> +
>>> +	data->nb.notifier_call = cpufreq_passive_notifier_call;
>>> +	ret = cpufreq_register_notifier(&data->nb,
>>> +					CPUFREQ_TRANSITION_NOTIFIER);
>>> +	if (ret) {
>>> +		dev_err(dev, "Couldn't register cpufreq notifier.\n");
>>> +		data->nb.notifier_call = NULL;
>>> +		goto out;
>>> +	}
>>> +
>>> +	/* Populate devfreq_cpu_state */
>>
>> This comment is not needed. Please drop it.
>>
>>> +	for_each_online_cpu(cpu) {
>>> +		if (data->cpu_state[cpu])
>>> +			continue;
>>> +
>>> +		policy = cpufreq_cpu_get(cpu);
>>> +		if (!policy) {
>>> +			ret = -EINVAL;
>>> +			goto out;
>>> +		} else if (PTR_ERR(policy) == -EPROBE_DEFER) {
>>> +			ret = -EPROBE_DEFER;
>>> +			goto out;
>>> +		} else if (IS_ERR(policy)) {
>>> +			ret = PTR_ERR(policy);
>>> +			dev_err(dev, "Couldn't get the cpufreq_poliy.\n");
>>> +			goto out;
>>> +		}
>>
>> Use the dev_err_probe() function to handle EPROBE_DEFER.
>> It makes the code simpler.
>>
>>> +
>>> +		cpu_state = kzalloc(sizeof(*cpu_state), GFP_KERNEL);
>>> +		if (!cpu_state) {
>>> +			ret = -ENOMEM;
>>> +			goto out;
>>> +		}
>>> +
>>> +		cpu_dev = get_cpu_device(cpu);
>>> +		if (!cpu_dev) {
>>> +			dev_err(dev, "Couldn't get cpu device.\n");
>>> +			ret = -ENODEV;
>>> +			goto out;
>>> +		}
>>> +
>>> +		opp_table = dev_pm_opp_get_opp_table(cpu_dev);
>>> +		if (IS_ERR(opp_table)) {
>>> +			ret = PTR_ERR(opp_table);
>>> +			goto out;
>>> +		}
>>> +
>>> +		cpu_state->cpu_dev = cpu_dev;
>>> +		cpu_state->opp_table = opp_table;
>>> +		cpu_state->first_cpu = cpumask_first(policy->related_cpus);
>>> +		cpu_state->curr_freq = policy->cur;
>>> +		cpu_state->min_freq = policy->cpuinfo.min_freq;
>>> +		cpu_state->max_freq = policy->cpuinfo.max_freq;
>>> +		data->cpu_state[cpu] = cpu_state;
>>> +
>>> +		cpufreq_cpu_put(policy);
>>> +	}
>>> +
>>> +out:
>>> +	put_online_cpus();
>>> +	if (ret)
>>> +		return ret;
>>> +
>>> +	/* Update devfreq */
>>> +	mutex_lock(&devfreq->lock);
>>> +	ret = update_devfreq(devfreq);
>>
>>> +	mutex_unlock(&devfreq->lock);
>>> +	if (ret)
>>> +		dev_err(dev, "Couldn't update the frequency.\n");
>>> +
>>> +	return ret;
>>> +}
>>> +
>>> +static int cpufreq_passive_unregister(struct devfreq_passive_data **p_data)
>>
>> As I commented above, please change the name as follows:
>> - cpufreq_passive_unregister -> cpufreq_passive_unregister_notifier
>>
>>> +{
>>> +	struct devfreq_passive_data *data = *p_data;
>>> +	struct devfreq_cpu_state *cpu_state;
>>> +	int cpu;
>>> +
>>> +	if (data->nb.notifier_call)
>>> +		cpufreq_unregister_notifier(&data->nb,
>>> +					    CPUFREQ_TRANSITION_NOTIFIER);
>>> +
>>> +	for_each_possible_cpu(cpu) {
>>> +		cpu_state = data->cpu_state[cpu];
>>> +		if (cpu_state) {
>>> +			if (cpu_state->opp_table)
>>> +				dev_pm_opp_put_opp_table(cpu_state->opp_table);
>>> +			kfree(cpu_state);
>>> +			data->cpu_state[cpu] = NULL;
>>> +		}
>>> +	}
>>> +
>>> +	return 0;
>>> +}
>>> +
>>> +int register_parent_dev_notifier(struct devfreq_passive_data **p_data)
>>> +{
>>> +	struct notifier_block *nb = &(*p_data)->nb;
>>> +	int ret = 0;
>>> +
>>> +	switch ((*p_data)->parent_type) {
>>> +	case DEVFREQ_PARENT_DEV:
>>> +		nb->notifier_call = devfreq_passive_notifier_call;
>>> +		ret = devfreq_register_notifier((struct devfreq *)(*p_data)->parent, nb,
>>> +						DEVFREQ_TRANSITION_NOTIFIER);
>>> +		break;
>>> +	case CPUFREQ_PARENT_DEV:
>>> +		ret = cpufreq_passive_register(p_data);
>>> +		break;
>>> +	default:
>>> +		ret = -EINVAL;
>>> +		break;
>>> +	}
>>> +	return ret;
>>> +}
>>> +
>>> +int unregister_parent_dev_notifier(struct devfreq_passive_data **p_data)
>>> +{
>>> +	int ret = 0;
>>> +
>>> +	switch ((*p_data)->parent_type) {
>>> +	case DEVFREQ_PARENT_DEV:
>>> +		WARN_ON(devfreq_unregister_notifier((struct devfreq *)(*p_data)->parent,
>>> +						    &(*p_data)->nb,
>>> +						    DEVFREQ_TRANSITION_NOTIFIER));
>>> +		break;
>>> +	case CPUFREQ_PARENT_DEV:
>>> +		cpufreq_passive_unregister(p_data);
>>> +		break;
>>> +	default:
>>> +		ret = -EINVAL;
>>> +		break;
>>> +	}
>>> +	return ret;
>>> +}
>>
>> I think that you don't need to define register_parent_dev_notifier
>> and unregister_parent_dev_notifier as separate functions.
>>
>> Instead of separate functions, just add the code
>> into devfreq_passive_event_handler.
>>
>>
>>> +
>>>  static int devfreq_passive_event_handler(struct devfreq *devfreq,
>>>  				unsigned int event, void *data)
>>>  {
>>>  	struct devfreq_passive_data *p_data
>>>  			= (struct devfreq_passive_data *)devfreq->data;
>>>  	struct devfreq *parent = (struct devfreq *)p_data->parent;
>>> -	struct notifier_block *nb = &p_data->nb;
>>>  	int ret = 0;
>>>  
>>> -	if (!parent)
>>> +	if (p_data->parent_type == DEVFREQ_PARENT_DEV && !parent)
>>>  		return -EPROBE_DEFER;
>>>  
>>>  	switch (event) {
>>> @@ -147,13 +446,11 @@ static int devfreq_passive_event_handler(struct devfreq *devfreq,
>>>  		if (!p_data->this)
>>>  			p_data->this = devfreq;
>>>  
>>> -		nb->notifier_call = devfreq_passive_notifier_call;
>>> -		ret = devfreq_register_notifier(parent, nb,
>>> -					DEVFREQ_TRANSITION_NOTIFIER);
>>> +		ret = register_parent_dev_notifier(&p_data);
>>>  		break;
>>> +
>>>  	case DEVFREQ_GOV_STOP:
>>> -		WARN_ON(devfreq_unregister_notifier(parent, nb,
>>> -					DEVFREQ_TRANSITION_NOTIFIER));
>>> +		ret = unregister_parent_dev_notifier(&p_data);
>>>  		break;
>>>  	default:
>>>  		break;
>>> diff --git a/include/linux/devfreq.h b/include/linux/devfreq.h
>>> index 26ea0850be9b..e0093b7c805c 100644
>>> --- a/include/linux/devfreq.h
>>> +++ b/include/linux/devfreq.h
>>> @@ -280,6 +280,25 @@ struct devfreq_simple_ondemand_data {
>>>  
>>>  #if IS_ENABLED(CONFIG_DEVFREQ_GOV_PASSIVE)
>>>  /**
>>> + * struct devfreq_cpu_state - holds the per-cpu state
>>> + * @freq:	the current frequency of the cpu.
>>> + * @min_freq:	the min frequency of the cpu.
>>> + * @max_freq:	the max frequency of the cpu.
>>> + * @first_cpu:	the cpumask of the first cpu of a policy.
>>> + * @dev:	reference to cpu device.
>>> + * @opp_table:	reference to cpu opp table.
>>> + *
>>> + * This structure stores the required cpu_state of a cpu.
>>> + * This is auto-populated by the governor.
>>> + */
>>> +struct devfreq_cpu_state;
>>> +
>>> +enum devfreq_parent_dev_type {
>>> +	DEVFREQ_PARENT_DEV,
>>> +	CPUFREQ_PARENT_DEV,
>>> +};
>>> +
>>> +/**
>>>   * struct devfreq_passive_data - ``void *data`` fed to struct devfreq
>>>   *	and devfreq_add_device
>>>   * @parent:	the devfreq instance of parent device.
>>> @@ -290,13 +309,15 @@ struct devfreq_simple_ondemand_data {
>>>   *			using governors except for passive governor.
>>>   *			If the devfreq device has the specific method to decide
>>>   *			the next frequency, should use this callback.
>>> + * @parent_type:	parent type of the device
>>>   * @this:	the devfreq instance of own device.
>>>   * @nb:		the notifier block for DEVFREQ_TRANSITION_NOTIFIER list
>>> + * @cpu_state:		the state min/max/current frequency of all online cpu's
>>>   *
>>>   * The devfreq_passive_data have to set the devfreq instance of parent
>>>   * device with governors except for the passive governor. But, don't need to
>>> - * initialize the 'this' and 'nb' field because the devfreq core will handle
>>> - * them.
>>> + * initialize the 'this', 'nb' and 'cpu_state' field because the devfreq core
>>> + * will handle them.
>>>   */
>>>  struct devfreq_passive_data {
>>>  	/* Should set the devfreq instance of parent device */
>>> @@ -305,9 +326,13 @@ struct devfreq_passive_data {
>>>  	/* Optional callback to decide the next frequency of passvice device */
>>>  	int (*get_target_freq)(struct devfreq *this, unsigned long *freq);
>>>  
>>> +	/* Should set the type of parent device */
>>> +	enum devfreq_parent_dev_type parent_type;
>>> +
>>>  	/* For passive governor's internal use. Don't need to set them */
>>>  	struct devfreq *this;
>>>  	struct notifier_block nb;
>>> +	struct devfreq_cpu_state *cpu_state[NR_CPUS];
>>>  };
>>>  #endif
>>>  
>>>
>>
> 


-- 
Best Regards,
Chanwoo Choi
Samsung Electronics

* Re: [PATCH V8 1/8] PM / devfreq: Add cpu based scaling support to passive_governor
  2021-03-31  8:27         ` Chanwoo Choi
@ 2021-03-31  8:35           ` Chanwoo Choi
       [not found]             ` <1617195800.18432.3.camel@mtksdaap41>
  0 siblings, 1 reply; 14+ messages in thread
From: Chanwoo Choi @ 2021-03-31  8:35 UTC (permalink / raw)
  To: andrew-sh.cheng
  Cc: MyungJoo Ham, Kyungmin Park, Rob Herring, Mark Rutland,
	Matthias Brugger, Rafael J. Wysocki, Viresh Kumar,
	Nishanth Menon, Stephen Boyd, Liam Girdwood, Mark Brown,
	linux-pm, devicetree, linux-arm-kernel, linux-mediatek,
	linux-kernel, srv_heupstream, Sibi Sankar

On 3/31/21 5:27 PM, Chanwoo Choi wrote:
> Hi,
> 
> On 3/31/21 5:03 PM, andrew-sh.cheng wrote:
>> On Thu, 2021-03-25 at 17:14 +0900, Chanwoo Choi wrote:
>>> Hi,
>>>
>>> These patches were not sent to the linux-pm mailing list.
>>> Please send them to the linux-pm ML.
>>>
>>> Also, before I received this series, I had tried to clean up these patches
>>> on a testing branch[1], so I add my comments together with my cleaned-up version.
>>> [1] https://git.kernel.org/pub/scm/linux/kernel/git/chanwoo/linux.git/log/?h=devfreq-testing-passive-gov
>>>
>>> Also, 'Saravana Kannan <skannan@codeaurora.org>' is the wrong email address.
>>> Please update the address or drop it.
>>
>> Hi Chanwoo,
>>
>> Thank you for the advice.
>> I will resend the series as v9 (adding the linux-pm ML), remove this patch, and note
>> that my patch set is based on
>> https://git.kernel.org/pub/scm/linux/kernel/git/chanwoo/linux.git/log/?h=devfreq-testing-passive-gov
> 
> I have not yet tested this patch[1] on the devfreq-testing-passive-gov branch.
> So, if possible, I'd like you to test your patches together with this patch[1],
> and if there is no problem, could you send the next patches with patch[1]?
> 
> [1] https://git.kernel.org/pub/scm/linux/kernel/git/chanwoo/linux.git/commit/?h=devfreq-testing-passive-gov&id=39c80d11a8f42dd63ecea1e0df595a0ceb83b454


Sorry for the confusion. I made the devfreq-testing-passive-gov[1]
branch based on the latest devfreq-next branch.
[1] https://git.kernel.org/pub/scm/linux/kernel/git/chanwoo/linux.git/log/?h=devfreq-testing-passive-gov

First of all, if possible, I'd like them[1] to be tested together with your patches
in this series. If there is no problem, please let me know. After your confirmation,
I'll send the patches of the devfreq-testing-passive-gov[1] branch.
How about that?


> 
>>
>>
>>>
>>>
>>> On 3/23/21 8:33 PM, Andrew-sh.Cheng wrote:
>>>> From: Saravana Kannan <skannan@codeaurora.org>
>>>>
>>>> Many CPU architectures have caches that can scale independent of the
>>>> CPUs. Frequency scaling of the caches is necessary to make sure that the
>>>> cache is not a performance bottleneck that leads to poor performance and
>>>> power. The same idea applies for RAM/DDR.
>>>>
>>>> To achieve this, this patch adds support for cpu based scaling to the
>>>> passive governor. This is accomplished by taking the current frequency
>>>> of each CPU frequency domain and then adjust the frequency of the cache
>>>> (or any devfreq device) based on the frequency of the CPUs. It listens
>>>> to CPU frequency transition notifiers to keep itself up to date on the
>>>> current CPU frequency.
>>>>
>>>> To decide the frequency of the device, the governor does one of the
>>>> following:
>>>> * Derives the optimal devfreq device opp from required-opps property of
>>>>   the parent cpu opp_table.
>>>>
>>>> * Scales the device frequency in proportion to the CPU frequency. So, if
>>>>   the CPUs are running at their max frequency, the device runs at its
>>>>   max frequency. If the CPUs are running at their min frequency, the
>>>>   device runs at its min frequency. It is interpolated for frequencies
>>>>   in between.
>>>>
>>>> Andrew-sh.Cheng changes:
>>>> * Change dev_pm_opp_xlate_opp to dev_pm_opp_xlate_required_opp.
>>>> * Change devfreq->max_freq to
>>>>   devfreq->user_min_freq_req.data.freq.qos->min_freq.target_value
>>>>   after kernel-5.7.
>>>> * Don't return -EINVAL in devfreq_passive_event_handler(),
>>>>   since it doesn't handle the DEVFREQ_GOV_SUSPEND and
>>>>   DEVFREQ_GOV_RESUME cases.
>>>>
>>>> Signed-off-by: Saravana Kannan <skannan@codeaurora.org>
>>>> [Sibi: Integrated cpu-freqmap governor into passive_governor]
>>>> Signed-off-by: Sibi Sankar <sibis@codeaurora.org>
>>>> Signed-off-by: Andrew-sh.Cheng <andrew-sh.cheng@mediatek.com>
>>>> ---
>>>>  drivers/devfreq/Kconfig            |   2 +
>>>>  drivers/devfreq/governor_passive.c | 329 +++++++++++++++++++++++++++++++++++--
>>>>  include/linux/devfreq.h            |  29 +++-
>>>>  3 files changed, 342 insertions(+), 18 deletions(-)
>>>>
>>>> diff --git a/drivers/devfreq/Kconfig b/drivers/devfreq/Kconfig
>>>> index 00704efe6398..f56132b0ae64 100644
>>>> --- a/drivers/devfreq/Kconfig
>>>> +++ b/drivers/devfreq/Kconfig
>>>> @@ -73,6 +73,8 @@ config DEVFREQ_GOV_PASSIVE
>>>>  	  device. This governor does not change the frequency by itself
>>>>  	  through sysfs entries. The passive governor recommends that
>>>>  	  devfreq device uses the OPP table to get the frequency/voltage.
>>>> +	  Alternatively the governor can also be chosen to scale based on
>>>> +	  the online CPUs current frequency.
>>>>  
>>>>  comment "DEVFREQ Drivers"
>>>>  
>>>> diff --git a/drivers/devfreq/governor_passive.c b/drivers/devfreq/governor_passive.c
>>>> index b094132bd20b..9cc57b083839 100644
>>>> --- a/drivers/devfreq/governor_passive.c
>>>> +++ b/drivers/devfreq/governor_passive.c
>>>> @@ -8,11 +8,103 @@
>>>>   */
>>>>  
>>>>  #include <linux/module.h>
>>>> +#include <linux/cpu.h>
>>>> +#include <linux/cpufreq.h>
>>>> +#include <linux/cpumask.h>
>>>>  #include <linux/device.h>
>>>>  #include <linux/devfreq.h>
>>>> +#include <linux/slab.h>
>>>>  #include "governor.h"
>>>>  
>>>> -static int devfreq_passive_get_target_freq(struct devfreq *devfreq,
>>>> +struct devfreq_cpu_state {
>>>> +	unsigned int curr_freq;
>>>> +	unsigned int min_freq;
>>>> +	unsigned int max_freq;
>>>> +	unsigned int first_cpu;
>>>> +	struct device *cpu_dev;
>>>> +	struct opp_table *opp_table;
>>>> +};
>>>
>>> As far as I know, the previous version had a description of the structure.
>>> I want to add the description as shown below.
>>>
>>> And if you have no objection, I'd like you to order
>>> the variables as follows and use 'dev' instead of 'cpu_dev',
>>> because this patch uses 'cpu_state->cpu_dev' at multiple points.
>>> I think that 'cpu_state->dev' is better than 'cpu_state->cpu_dev'.
>>> Also, I prefer 'cur_freq' to 'curr_freq'
>>> because the devfreq subsystem uses 'cur_freq' for the 'current frequency'.
>>>
>>> /**
>>>  * struct devfreq_cpu_state - Hold the per-cpu data
>>>  * @dev:        reference to cpu device.
>>>  * @first_cpu:  the cpumask of the first cpu of a policy.
>>>  * @opp_table:  reference to cpu opp table.
>>>  * @cur_freq:   the current frequency of the cpu.
>>>  * @min_freq:   the min frequency of the cpu.
>>>  * @max_freq:   the max frequency of the cpu.
>>>  *
>>>  * This structure stores the required cpu_data of a cpu.
>>>  * This is auto-populated by the governor.
>>>  */
>>> struct devfreq_cpu_state {
>>>          struct device *dev;
>>>          unsigned int first_cpu;
>>>
>>>          struct opp_table *opp_table;
>>>          unsigned int cur_freq;
>>>          unsigned int min_freq;
>>>          unsigned int max_freq;
>>> };
>>>
>>>
>>>> +
>>>> +static unsigned long xlate_cpufreq_to_devfreq(struct devfreq_passive_data *data,
>>>> +					      unsigned int cpu)
>>>> +{
>>>> +	unsigned int cpu_min_freq, cpu_max_freq, cpu_curr_freq_khz, cpu_percent;
>>>> +	unsigned long dev_min_freq, dev_max_freq, dev_max_state;
>>>> +
>>>> +	struct devfreq_cpu_state *cpu_state = data->cpu_state[cpu];
>>>> +	struct devfreq *devfreq = (struct devfreq *)data->this;
>>>> +	unsigned long *dev_freq_table = devfreq->profile->freq_table;
>>>> +	struct dev_pm_opp *opp = NULL, *p_opp = NULL;
>>>> +	unsigned long cpu_curr_freq, freq;
>>>> +
>>>> +	if (!cpu_state || cpu_state->first_cpu != cpu ||
>>>> +	    !cpu_state->opp_table || !devfreq->opp_table)
>>>> +		return 0;
>>>> +
>>>> +	cpu_curr_freq = cpu_state->curr_freq * 1000;
>>>> +	p_opp = devfreq_recommended_opp(cpu_state->cpu_dev, &cpu_curr_freq, 0);
>>>> +	if (IS_ERR(p_opp))
>>>> +		return 0;
>>>> +
>>>> +	opp = dev_pm_opp_xlate_required_opp(cpu_state->opp_table,
>>>> +					    devfreq->opp_table, p_opp);
>>>> +	dev_pm_opp_put(p_opp);
>>>> +
>>>> +	if (!IS_ERR(opp)) {
>>>> +		freq = dev_pm_opp_get_freq(opp);
>>>> +		dev_pm_opp_put(opp);
>>>> +		goto out;
>>>> +	}
>>>> +
>>>> +	/* Use Interpolation if required opps is not available */
>>>> +	cpu_min_freq = cpu_state->min_freq;
>>>> +	cpu_max_freq = cpu_state->max_freq;
>>>> +	cpu_curr_freq_khz = cpu_state->curr_freq;
>>>> +
>>>> +	if (dev_freq_table) {
>>>> +		/* Get minimum frequency according to sorting order */
>>>> +		dev_max_state = dev_freq_table[devfreq->profile->max_state - 1];
>>>> +		if (dev_freq_table[0] < dev_max_state) {
>>>> +			dev_min_freq = dev_freq_table[0];
>>>> +			dev_max_freq = dev_max_state;
>>>> +		} else {
>>>> +			dev_min_freq = dev_max_state;
>>>> +			dev_max_freq = dev_freq_table[0];
>>>> +		}
>>>> +	} else {
>>>> +		dev_min_freq = dev_pm_qos_read_value(devfreq->dev.parent,
>>>> +						     DEV_PM_QOS_MIN_FREQUENCY);
>>>> +		dev_max_freq = dev_pm_qos_read_value(devfreq->dev.parent,
>>>> +						     DEV_PM_QOS_MAX_FREQUENCY);
>>>> +
>>>> +		if (dev_max_freq <= dev_min_freq)
>>>> +			return 0;
>>>> +	}
>>>> +	cpu_percent = ((cpu_curr_freq_khz - cpu_min_freq) * 100) / cpu_max_freq - cpu_min_freq;
>>>> +	freq = dev_min_freq + mult_frac(dev_max_freq - dev_min_freq, cpu_percent, 100);
>>>> +
>>>> +out:
>>>> +	return freq;
>>>> +}
>>>> +
>>>> +static int get_target_freq_with_cpufreq(struct devfreq *devfreq,
>>>> +					unsigned long *freq)
>>>> +{
>>>> +	struct devfreq_passive_data *p_data =
>>>> +				(struct devfreq_passive_data *)devfreq->data;
>>>> +	unsigned int cpu;
>>>> +	unsigned long target_freq = 0;
>>>> +
>>>> +	for_each_online_cpu(cpu)
>>>> +		target_freq = max(target_freq,
>>>> +				  xlate_cpufreq_to_devfreq(p_data, cpu));
>>>> +
>>>> +	*freq = target_freq;
>>>> +
>>>> +	return 0;
>>>> +}
>>>
>>> As you know, governor_passive.c already uses
>>> both 'dev_pm_opp_xlate_required_opp' and 'devfreq_recommended_opp'
>>> to get the target frequency from the OPP. So, I want to make a common
>>> function like 'get_taget_freq_by_required_opp', as shown below.
>>> If 'get_taget_freq_by_required_opp' is defined like this,
>>> it can also be used for get_target_freq_with_devfreq().
>>> After finishing the review of this patch, I'll send the patch[2].
>>> [2] https://git.kernel.org/pub/scm/linux/kernel/git/chanwoo/linux.git/commit/?h=devfreq-testing-passive-gov&id=101c5a087586ab2b5cf3370166a7e39227ca83cf
>>>
>>> For example (this code is not tested):
>>> static unsigned long get_taget_freq_by_required_opp(struct device *p_dev,
>>> 						struct opp_table *p_opp_table,
>>> 						struct opp_table *opp_table,
>>> 						unsigned long freq)
>>> {
>>> 	struct dev_pm_opp *opp = NULL, *p_opp = NULL;
>>>
>>> 	if (!p_dev || !p_opp_table || !opp_table || !freq)
>>> 		return 0;
>>>
>>> 	p_opp = devfreq_recommended_opp(p_dev, &freq, 0);
>>> 	if (IS_ERR(p_opp))
>>> 		return 0;
>>>
>>> 	opp = dev_pm_opp_xlate_required_opp(p_opp_table, opp_table, p_opp);
>>> 	dev_pm_opp_put(p_opp);
>>>
>>> 	if (IS_ERR(opp))
>>> 		return 0;
>>>
>>> 	freq = dev_pm_opp_get_freq(opp);
>>> 	dev_pm_opp_put(opp);
>>>
>>> 	return freq;
>>> }
>>>
>>> static int get_target_freq_with_cpufreq(struct devfreq *devfreq,
>>> 					unsigned long *target_freq)
>>> {
>>> 	struct devfreq_passive_data *p_data =
>>> 				(struct devfreq_passive_data *)devfreq->data;
>>> 	struct devfreq_cpu_data *cpu_data;
>>> 	unsigned long cpu, cpu_cur, cpu_min, cpu_max, cpu_percent;
>>> 	unsigned long dev_min, dev_max;
>>> 	unsigned long freq = 0;
>>>
>>> 	for_each_online_cpu(cpu) {
>>> 		cpu_data = p_data->cpu_data[cpu];
>>> 		if (!cpu_data || cpu_data->first_cpu != cpu)
>>> 			continue;
>>>
>>> 		/* Get target freq via required opps */
>>> 		cpu_cur = cpu_data->cur_freq * HZ_PER_KHZ;
>>> 		freq = get_taget_freq_by_required_opp(cpu_data->dev,
>>> 					cpu_data->opp_table,
>>> 					devfreq->opp_table, cpu_cur);
>>> 		if (freq) {
>>> 			*target_freq = max(freq, *target_freq);
>>> 			continue;
>>> 		}
>>>
>>> 		/* Use Interpolation if required opps is not available */
>>> 		devfreq_get_freq_range(devfreq, &dev_min, &dev_max);
>>>
>>> 		cpu_min = cpu_data->min_freq;
>>> 		cpu_max = cpu_data->max_freq;
>>> 		cpu_cur = cpu_data->cur_freq;
>>>
>>> 		cpu_percent = ((cpu_cur - cpu_min) * 100) / (cpu_max - cpu_min);
>>> 		freq = dev_min + mult_frac(dev_max - dev_min, cpu_percent, 100);
>>>
>>> 		*target_freq = max(freq, *target_freq);
>>> 	}
>>>
>>> 	return 0;
>>> }
>>>
>>>> +
>>>> +static int get_target_freq_with_devfreq(struct devfreq *devfreq,
>>>>  					unsigned long *freq)
>>>>  {
>>>>  	struct devfreq_passive_data *p_data
>>>> @@ -23,14 +115,6 @@ static int devfreq_passive_get_target_freq(struct devfreq *devfreq,
>>>>  	int i, count;
>>>>  
>>>>  	/*
>>>> -	 * If the devfreq device with passive governor has the specific method
>>>> -	 * to determine the next frequency, should use the get_target_freq()
>>>> -	 * of struct devfreq_passive_data.
>>>> -	 */
>>>> -	if (p_data->get_target_freq)
>>>> -		return p_data->get_target_freq(devfreq, freq);
>>>> -
>>>> -	/*
>>>>  	 * If the parent and passive devfreq device uses the OPP table,
>>>>  	 * get the next frequency by using the OPP table.
>>>>  	 */
>>>> @@ -98,6 +182,37 @@ static int devfreq_passive_get_target_freq(struct devfreq *devfreq,
>>>>  	return 0;
>>>>  }
>>>>  
>>>> +static int devfreq_passive_get_target_freq(struct devfreq *devfreq,
>>>> +					   unsigned long *freq)
>>>> +{
>>>> +	struct devfreq_passive_data *p_data =
>>>> +		(struct devfreq_passive_data *)devfreq->data;
>>>> +	int ret;
>>>> +
>>>> +	/*
>>>> +	 * If the devfreq device with passive governor has the specific method
>>>> +	 * to determine the next frequency, should use the get_target_freq()
>>>> +	 * of struct devfreq_passive_data.
>>>> +	 */
>>>> +	if (p_data->get_target_freq)
>>>> +		return p_data->get_target_freq(devfreq, freq);
>>>> +
>>>> +	switch (p_data->parent_type) {
>>>> +	case DEVFREQ_PARENT_DEV:
>>>> +		ret = get_target_freq_with_devfreq(devfreq, freq);
>>>> +		break;
>>>> +	case CPUFREQ_PARENT_DEV:
>>>> +		ret = get_target_freq_with_cpufreq(devfreq, freq);
>>>> +		break;
>>>> +	default:
>>>> +		ret = -EINVAL;
>>>> +		dev_err(&devfreq->dev, "Invalid parent type\n");
>>>> +		break;
>>>> +	}
>>>> +
>>>> +	return ret;
>>>> +}
>>>> +
>>>>  static int devfreq_passive_notifier_call(struct notifier_block *nb,
>>>>  				unsigned long event, void *ptr)
>>>>  {
>>>> @@ -130,16 +245,200 @@ static int devfreq_passive_notifier_call(struct notifier_block *nb,
>>>>  	return NOTIFY_DONE;
>>>>  }
>>>>  
>>>> +static int cpufreq_passive_notifier_call(struct notifier_block *nb,
>>>> +					 unsigned long event, void *ptr)
>>>> +{
>>>> +	struct devfreq_passive_data *data =
>>>> +			container_of(nb, struct devfreq_passive_data, nb);
>>>> +	struct devfreq *devfreq = (struct devfreq *)data->this;
>>>> +	struct devfreq_cpu_state *cpu_state;
>>>> +	struct cpufreq_freqs *cpu_freq = ptr;
>>>
>>> Use 'freqs' as the variable name. I prefer to use the same variable name
>>> for both devfreq_freqs and cpufreq_freqs instances.
>>>
>>>> +	unsigned int curr_freq;
>>>
>>> As I commented above, it is better to use 'cur_freq' instead of 'curr_freq'
>>> unless there is a special reason.
>>>
>>>> +	int ret;
>>>> +
>>>> +	if (event != CPUFREQ_POSTCHANGE || !cpu_freq ||
>>>> +	    !data->cpu_state[cpu_freq->policy->cpu])
>>>> +		return 0;
>>>> +
>>>> +	cpu_state = data->cpu_state[cpu_freq->policy->cpu];
>>>> +	if (cpu_state->curr_freq == cpu_freq->new)
>>>> +		return 0;
>>>> +
>>>> +	/* Backup current freq and pre-update cpu state freq*/
>>>
>>> I think that this comment is not critical, so please drop it.
>>>
>>>> +	curr_freq = cpu_state->curr_freq;
>>>> +	cpu_state->curr_freq = cpu_freq->new;
>>>> +
>>>> +	mutex_lock(&devfreq->lock);
>>>> +	ret = update_devfreq(devfreq);
>>>
>>> I recommend using 'devfreq_update_target' instead of 'update_devfreq',
>>> as follows:
>>> 	devfreq_update_target(devfreq, freqs->new);
>>>
>>>> +	mutex_unlock(&devfreq->lock);
>>>> +	if (ret) {
>>>> +		cpu_state->curr_freq = curr_freq;
>>>> +		dev_err(&devfreq->dev, "Couldn't update the frequency.\n");
>>>> +		return ret;
>>>> +	}
>>>> +
>>>> +	return 0;
>>>> +}
>>>> +
>>>> +static int cpufreq_passive_register(struct devfreq_passive_data **p_data)
>>>
>>> In order to keep the function naming style consistent,
>>> please change the name as follows, because devfreq names
>>> its function 'devfreq_register_notifier':
>>> - cpufreq_passive_register -> cpufreq_passive_register_notifier
>>>
>>>> +{
>>>> +	struct devfreq_passive_data *data = *p_data;
>>>> +	struct devfreq *devfreq = (struct devfreq *)data->this;
>>>> +	struct device *dev = devfreq->dev.parent;
>>>> +	struct opp_table *opp_table = NULL;
>>>> +	struct devfreq_cpu_state *cpu_state;
>>>> +	struct cpufreq_policy *policy;
>>>> +	struct device *cpu_dev;
>>>> +	unsigned int cpu;
>>>> +	int ret;
>>>> +
>>>> +	get_online_cpus();
>>>> +
>>>> +	data->nb.notifier_call = cpufreq_passive_notifier_call;
>>>> +	ret = cpufreq_register_notifier(&data->nb,
>>>> +					CPUFREQ_TRANSITION_NOTIFIER);
>>>> +	if (ret) {
>>>> +		dev_err(dev, "Couldn't register cpufreq notifier.\n");
>>>> +		data->nb.notifier_call = NULL;
>>>> +		goto out;
>>>> +	}
>>>> +
>>>> +	/* Populate devfreq_cpu_state */
>>>
>>> Don't need this comment. Please drop it.
>>>
>>>> +	for_each_online_cpu(cpu) {
>>>> +		if (data->cpu_state[cpu])
>>>> +			continue;
>>>> +
>>>> +		policy = cpufreq_cpu_get(cpu);
>>>> +		if (!policy) {
>>>> +			ret = -EINVAL;
>>>> +			goto out;
>>>> +		} else if (PTR_ERR(policy) == -EPROBE_DEFER) {
>>>> +			ret = -EPROBE_DEFER;
>>>> +			goto out;
>>>> +		} else if (IS_ERR(policy)) {
>>>> +			ret = PTR_ERR(policy);
>>>> +			dev_err(dev, "Couldn't get the cpufreq_poliy.\n");
>>>> +			goto out;
>>>> +		}
>>>
>>> Use the dev_err_probe() function to handle EPROBE_DEFER.
>>> It makes the code simpler.
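
To illustrate the dev_err_probe() suggestion above, here is a minimal userspace sketch of the pattern. It is not kernel code: err_probe() is a hypothetical stand-in that only mimics the "log, then return err" contract of dev_err_probe(), get_policy() and the "devfreq0" name are made up for the example, and EPROBE_DEFER is defined locally because it is not exported to userspace.

```c
#include <stdio.h>

/* Kernel-internal errno value, defined here because userspace <errno.h>
 * does not provide it. */
#define EPROBE_DEFER 517

/*
 * Hypothetical userspace stand-in for dev_err_probe(): log the failure
 * and hand the error code back so the caller can simply return it.
 * A deferred probe is reported quietly, any other error loudly.
 */
int err_probe(const char *dev_name, int err, const char *msg)
{
	if (err == -EPROBE_DEFER)
		fprintf(stderr, "%s: deferred: %s\n", dev_name, msg);
	else
		fprintf(stderr, "%s: error %d: %s\n", dev_name, err, msg);

	return err;
}

/*
 * Hypothetical caller: one call replaces the separate
 * "is it -EPROBE_DEFER?" and "is it another error?" branches.
 */
int get_policy(int lookup_result)
{
	if (lookup_result < 0)
		return err_probe("devfreq0", lookup_result,
				 "Couldn't get the cpufreq policy");

	return 0;
}
```

With a helper like this, the caller keeps a single error branch and the deferral case no longer needs its own special-cased return.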
>>>
>>>> +
>>>> +		cpu_state = kzalloc(sizeof(*cpu_state), GFP_KERNEL);
>>>> +		if (!cpu_state) {
>>>> +			ret = -ENOMEM;
>>>> +			goto out;
>>>> +		}
>>>> +
>>>> +		cpu_dev = get_cpu_device(cpu);
>>>> +		if (!cpu_dev) {
>>>> +			dev_err(dev, "Couldn't get cpu device.\n");
>>>> +			ret = -ENODEV;
>>>> +			goto out;
>>>> +		}
>>>> +
>>>> +		opp_table = dev_pm_opp_get_opp_table(cpu_dev);
>>>> +		if (IS_ERR(devfreq->opp_table)) {
>>>> +			ret = PTR_ERR(opp_table);
>>>> +			goto out;
>>>> +		}
>>>> +
>>>> +		cpu_state->cpu_dev = cpu_dev;
>>>> +		cpu_state->opp_table = opp_table;
>>>> +		cpu_state->first_cpu = cpumask_first(policy->related_cpus);
>>>> +		cpu_state->curr_freq = policy->cur;
>>>> +		cpu_state->min_freq = policy->cpuinfo.min_freq;
>>>> +		cpu_state->max_freq = policy->cpuinfo.max_freq;
>>>> +		data->cpu_state[cpu] = cpu_state;
>>>> +
>>>> +		cpufreq_cpu_put(policy);
>>>> +	}
>>>> +
>>>> +out:
>>>> +	put_online_cpus();
>>>> +	if (ret)
>>>> +		return ret;
>>>> +
>>>> +	/* Update devfreq */
>>>> +	mutex_lock(&devfreq->lock);
>>>> +	ret = update_devfreq(devfreq);
>>>
>>>> +	mutex_unlock(&devfreq->lock);
>>>> +	if (ret)
>>>> +		dev_err(dev, "Couldn't update the frequency.\n");
>>>> +
>>>> +	return ret;
>>>> +}
>>>> +
>>>> +static int cpufreq_passive_unregister(struct devfreq_passive_data **p_data)
>>>
>>> As I commented above, please change the name as follows:
>>> - cpufreq_passive_unregister -> cpufreq_passive_unregister_notifier
>>>
>>>> +{
>>>> +	struct devfreq_passive_data *data = *p_data;
>>>> +	struct devfreq_cpu_state *cpu_state;
>>>> +	int cpu;
>>>> +
>>>> +	if (data->nb.notifier_call)
>>>> +		cpufreq_unregister_notifier(&data->nb,
>>>> +					    CPUFREQ_TRANSITION_NOTIFIER);
>>>> +
>>>> +	for_each_possible_cpu(cpu) {
>>>> +		cpu_state = data->cpu_state[cpu];
>>>> +		if (cpu_state) {
>>>> +			if (cpu_state->opp_table)
>>>> +				dev_pm_opp_put_opp_table(cpu_state->opp_table);
>>>> +			kfree(cpu_state);
>>>> +			cpu_state = NULL;
>>>> +		}
>>>> +	}
>>>> +
>>>> +	return 0;
>>>> +}
>>>> +
>>>> +int register_parent_dev_notifier(struct devfreq_passive_data **p_data)
>>>> +{
>>>> +	struct notifier_block *nb = &(*p_data)->nb;
>>>> +	int ret = 0;
>>>> +
>>>> +	switch ((*p_data)->parent_type) {
>>>> +	case DEVFREQ_PARENT_DEV:
>>>> +		nb->notifier_call = devfreq_passive_notifier_call;
>>>> +		ret = devfreq_register_notifier((struct devfreq *)(*p_data)->parent, nb,
>>>> +						DEVFREQ_TRANSITION_NOTIFIER);
>>>> +		break;
>>>> +	case CPUFREQ_PARENT_DEV:
>>>> +		ret = cpufreq_passive_register(p_data);
>>>> +		break;
>>>> +	default:
>>>> +		ret = -EINVAL;
>>>> +		break;
>>>> +	}
>>>> +	return ret;
>>>> +}
>>>> +
>>>> +int unregister_parent_dev_notifier(struct devfreq_passive_data **p_data)
>>>> +{
>>>> +	int ret = 0;
>>>> +
>>>> +	switch ((*p_data)->parent_type) {
>>>> +	case DEVFREQ_PARENT_DEV:
>>>> +		WARN_ON(devfreq_unregister_notifier((struct devfreq *)(*p_data)->parent,
>>>> +						    &(*p_data)->nb,
>>>> +						    DEVFREQ_TRANSITION_NOTIFIER));
>>>> +		break;
>>>> +	case CPUFREQ_PARENT_DEV:
>>>> +		cpufreq_passive_unregister(p_data);
>>>> +		break;
>>>> +	default:
>>>> +		ret = -EINVAL;
>>>> +		break;
>>>> +	}
>>>> +	return ret;
>>>> +}
>>>
>>> I think that you don't need to define register_parent_dev_notifier
>>> and unregister_parent_dev_notifier as separate functions.
>>>
>>> Instead of separate functions, just add the code
>>> to devfreq_passive_event_handler.
>>>
>>>
>>>> +
>>>>  static int devfreq_passive_event_handler(struct devfreq *devfreq,
>>>>  				unsigned int event, void *data)
>>>>  {
>>>>  	struct devfreq_passive_data *p_data
>>>>  			= (struct devfreq_passive_data *)devfreq->data;
>>>>  	struct devfreq *parent = (struct devfreq *)p_data->parent;
>>>> -	struct notifier_block *nb = &p_data->nb;
>>>>  	int ret = 0;
>>>>  
>>>> -	if (!parent)
>>>> +	if (p_data->parent_type == DEVFREQ_PARENT_DEV && !parent)
>>>>  		return -EPROBE_DEFER;
>>>>  
>>>>  	switch (event) {
>>>> @@ -147,13 +446,11 @@ static int devfreq_passive_event_handler(struct devfreq *devfreq,
>>>>  		if (!p_data->this)
>>>>  			p_data->this = devfreq;
>>>>  
>>>> -		nb->notifier_call = devfreq_passive_notifier_call;
>>>> -		ret = devfreq_register_notifier(parent, nb,
>>>> -					DEVFREQ_TRANSITION_NOTIFIER);
>>>> +		ret = register_parent_dev_notifier(&p_data);
>>>>  		break;
>>>> +
>>>>  	case DEVFREQ_GOV_STOP:
>>>> -		WARN_ON(devfreq_unregister_notifier(parent, nb,
>>>> -					DEVFREQ_TRANSITION_NOTIFIER));
>>>> +		ret = unregister_parent_dev_notifier(&p_data);
>>>>  		break;
>>>>  	default:
>>>>  		break;
>>>> diff --git a/include/linux/devfreq.h b/include/linux/devfreq.h
>>>> index 26ea0850be9b..e0093b7c805c 100644
>>>> --- a/include/linux/devfreq.h
>>>> +++ b/include/linux/devfreq.h
>>>> @@ -280,6 +280,25 @@ struct devfreq_simple_ondemand_data {
>>>>  
>>>>  #if IS_ENABLED(CONFIG_DEVFREQ_GOV_PASSIVE)
>>>>  /**
>>>> + * struct devfreq_cpu_state - holds the per-cpu state
>>>> + * @freq:	the current frequency of the cpu.
>>>> + * @min_freq:	the min frequency of the cpu.
>>>> + * @max_freq:	the max frequency of the cpu.
>>>> + * @first_cpu:	the cpumask of the first cpu of a policy.
>>>> + * @dev:	reference to cpu device.
>>>> + * @opp_table:	reference to cpu opp table.
>>>> + *
>>>> + * This structure stores the required cpu_state of a cpu.
>>>> + * This is auto-populated by the governor.
>>>> + */
>>>> +struct devfreq_cpu_state;
>>>> +
>>>> +enum devfreq_parent_dev_type {
>>>> +	DEVFREQ_PARENT_DEV,
>>>> +	CPUFREQ_PARENT_DEV,
>>>> +};
>>>> +
>>>> +/**
>>>>   * struct devfreq_passive_data - ``void *data`` fed to struct devfreq
>>>>   *	and devfreq_add_device
>>>>   * @parent:	the devfreq instance of parent device.
>>>> @@ -290,13 +309,15 @@ struct devfreq_simple_ondemand_data {
>>>>   *			using governors except for passive governor.
>>>>   *			If the devfreq device has the specific method to decide
>>>>   *			the next frequency, should use this callback.
>>>> + * @parent_type:	parent type of the device
>>>>   * @this:	the devfreq instance of own device.
>>>>   * @nb:		the notifier block for DEVFREQ_TRANSITION_NOTIFIER list
>>>> + * @cpu_state:		the state min/max/current frequency of all online cpu's
>>>>   *
>>>>   * The devfreq_passive_data have to set the devfreq instance of parent
>>>>   * device with governors except for the passive governor. But, don't need to
>>>> - * initialize the 'this' and 'nb' field because the devfreq core will handle
>>>> - * them.
>>>> + * initialize the 'this', 'nb' and 'cpu_state' field because the devfreq core
>>>> + * will handle them.
>>>>   */
>>>>  struct devfreq_passive_data {
>>>>  	/* Should set the devfreq instance of parent device */
>>>> @@ -305,9 +326,13 @@ struct devfreq_passive_data {
>>>>  	/* Optional callback to decide the next frequency of passvice device */
>>>>  	int (*get_target_freq)(struct devfreq *this, unsigned long *freq);
>>>>  
>>>> +	/* Should set the type of parent device */
>>>> +	enum devfreq_parent_dev_type parent_type;
>>>> +
>>>>  	/* For passive governor's internal use. Don't need to set them */
>>>>  	struct devfreq *this;
>>>>  	struct notifier_block nb;
>>>> +	struct devfreq_cpu_state *cpu_state[NR_CPUS];
>>>>  };
>>>>  #endif
>>>>  
>>>>
>>>
>>
> 
> 


-- 
Best Regards,
Chanwoo Choi
Samsung Electronics

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH V8 1/8] PM / devfreq: Add cpu based scaling support to passive_governor
  2021-03-25  8:14     ` Chanwoo Choi
       [not found]       ` <1617177820.15067.1.camel@mtksdaap41>
@ 2021-03-31 10:46       ` Hsin-Yi Wang
  1 sibling, 0 replies; 14+ messages in thread
From: Hsin-Yi Wang @ 2021-03-31 10:46 UTC (permalink / raw)
  To: Chanwoo Choi
  Cc: Andrew-sh.Cheng, MyungJoo Ham, Kyungmin Park, Rob Herring,
	Mark Rutland, Matthias Brugger, Rafael J. Wysocki, Viresh Kumar,
	Nishanth Menon, Stephen Boyd, Liam Girdwood, Mark Brown,
	Linux PM, Devicetree List,
	moderated list:ARM/FREESCALE IMX / MXC ARM ARCHITECTURE,
	moderated list:ARM/Mediatek SoC support, lkml, srv_heupstream,
	Sibi Sankar

On Thu, Mar 25, 2021 at 3:58 PM Chanwoo Choi <cw00.choi@samsung.com> wrote:
>
> Hi,
>
> You did not send these patches to the linux-pm mailing list.
> Please send them to the linux-pm ML.
>
> Also, before receiving this series, I tried to clean up these patches
> on a testing branch[1], so I add my comments along with my clean-up version.
> [1] https://git.kernel.org/pub/scm/linux/kernel/git/chanwoo/linux.git/log/?h=devfreq-testing-passive-gov
>
> And 'Saravana Kannan <skannan@codeaurora.org>' is a wrong email address.
> Please update the address or drop it.
>
>
> On 3/23/21 8:33 PM, Andrew-sh.Cheng wrote:
> > From: Saravana Kannan <skannan@codeaurora.org>
> >
> > Many CPU architectures have caches that can scale independent of the
> > CPUs. Frequency scaling of the caches is necessary to make sure that the
> > cache is not a performance bottleneck that leads to poor performance and
> > power. The same idea applies for RAM/DDR.
> >
> > To achieve this, this patch adds support for cpu based scaling to the
> > passive governor. This is accomplished by taking the current frequency
> > of each CPU frequency domain and then adjust the frequency of the cache
> > (or any devfreq device) based on the frequency of the CPUs. It listens
> > to CPU frequency transition notifiers to keep itself up to date on the
> > current CPU frequency.
> >
> > To decide the frequency of the device, the governor does one of the
> > following:
> > * Derives the optimal devfreq device opp from required-opps property of
> >   the parent cpu opp_table.
> >
> > * Scales the device frequency in proportion to the CPU frequency. So, if
> >   the CPUs are running at their max frequency, the device runs at its
> >   max frequency. If the CPUs are running at their min frequency, the
> >   device runs at its min frequency. It is interpolated for frequencies
> >   in between.
> >
> > Andrew-sh.Cheng change
> > dev_pm_opp_xlate_opp to dev_pm_opp_xlate_required_opp devfreq->max_freq
> > to devfreq->user_min_freq_req.data.freq.qos->min_freq.target_value
> > after kernel-5.7
> > Don't return -EINVAL in devfreq_passive_event_handler()
> > since it doesn't handle DEVFREQ_GOV_SUSPEND DEVFREQ_GOV_RESUME cases.
> >
> > Signed-off-by: Saravana Kannan <skannan@codeaurora.org>
> > [Sibi: Integrated cpu-freqmap governor into passive_governor]
> > Signed-off-by: Sibi Sankar <sibis@codeaurora.org>
> > Signed-off-by: Andrew-sh.Cheng <andrew-sh.cheng@mediatek.com>
> > ---
> >  drivers/devfreq/Kconfig            |   2 +
> >  drivers/devfreq/governor_passive.c | 329 +++++++++++++++++++++++++++++++++++--
> >  include/linux/devfreq.h            |  29 +++-
> >  3 files changed, 342 insertions(+), 18 deletions(-)
> >
> > diff --git a/drivers/devfreq/Kconfig b/drivers/devfreq/Kconfig
> > index 00704efe6398..f56132b0ae64 100644
> > --- a/drivers/devfreq/Kconfig
> > +++ b/drivers/devfreq/Kconfig
> > @@ -73,6 +73,8 @@ config DEVFREQ_GOV_PASSIVE
> >         device. This governor does not change the frequency by itself
> >         through sysfs entries. The passive governor recommends that
> >         devfreq device uses the OPP table to get the frequency/voltage.
> > +       Alternatively the governor can also be chosen to scale based on
> > +       the online CPUs current frequency.
> >
> >  comment "DEVFREQ Drivers"
> >
> > diff --git a/drivers/devfreq/governor_passive.c b/drivers/devfreq/governor_passive.c
> > index b094132bd20b..9cc57b083839 100644
> > --- a/drivers/devfreq/governor_passive.c
> > +++ b/drivers/devfreq/governor_passive.c
> > @@ -8,11 +8,103 @@
> >   */
> >
> >  #include <linux/module.h>
> > +#include <linux/cpu.h>
> > +#include <linux/cpufreq.h>
> > +#include <linux/cpumask.h>
> >  #include <linux/device.h>
> >  #include <linux/devfreq.h>
> > +#include <linux/slab.h>
> >  #include "governor.h"
> >
> > -static int devfreq_passive_get_target_freq(struct devfreq *devfreq,
> > +struct devfreq_cpu_state {
> > +     unsigned int curr_freq;
> > +     unsigned int min_freq;
> > +     unsigned int max_freq;
> > +     unsigned int first_cpu;
> > +     struct device *cpu_dev;
> > +     struct opp_table *opp_table;
> > +};
>
> As far as I know, the previous version had a description of the structure.
> I want to add a description like the one below.
>
> And if you have no objection, I'd like you to order
> the variables as follows and use 'dev' instead of 'cpu_dev',
> because this patch uses 'cpu_state->cpu_dev' at multiple points.
> I think that 'cpu_state->dev' is better than 'cpu_state->cpu_dev'.
> Also, I prefer 'cur_freq' to 'curr_freq'
> because the devfreq subsystem uses 'cur_freq' to express the 'current frequency'.
>
> /**
>  * struct devfreq_cpu_state - Hold the per-cpu data
>  * @dev:        reference to cpu device.
>  * @first_cpu:  the cpumask of the first cpu of a policy.
>  * @opp_table:  reference to cpu opp table.
>  * @cur_freq:   the current frequency of the cpu.
>  * @min_freq:   the min frequency of the cpu.
>  * @max_freq:   the max frequency of the cpu.
>  *
>  * This structure stores the required cpu_data of a cpu.
>  * This is auto-populated by the governor.
>  */
> struct devfreq_cpu_state {
>          struct device *dev;
>          unsigned int first_cpu;
>
>          struct opp_table *opp_table;
>          unsigned int cur_freq;
>          unsigned int min_freq;
>          unsigned int max_freq;
> };
>
>
> > +
> > +static unsigned long xlate_cpufreq_to_devfreq(struct devfreq_passive_data *data,
> > +                                           unsigned int cpu)
> > +{
> > +     unsigned int cpu_min_freq, cpu_max_freq, cpu_curr_freq_khz, cpu_percent;
> > +     unsigned long dev_min_freq, dev_max_freq, dev_max_state;
> > +
> > +     struct devfreq_cpu_state *cpu_state = data->cpu_state[cpu];
> > +     struct devfreq *devfreq = (struct devfreq *)data->this;
> > +     unsigned long *dev_freq_table = devfreq->profile->freq_table;
> > +     struct dev_pm_opp *opp = NULL, *p_opp = NULL;
> > +     unsigned long cpu_curr_freq, freq;
> > +
> > +     if (!cpu_state || cpu_state->first_cpu != cpu ||
> > +         !cpu_state->opp_table || !devfreq->opp_table)
> > +             return 0;
> > +
> > +     cpu_curr_freq = cpu_state->curr_freq * 1000;
> > +     p_opp = devfreq_recommended_opp(cpu_state->cpu_dev, &cpu_curr_freq, 0);
> > +     if (IS_ERR(p_opp))
> > +             return 0;
> > +
> > +     opp = dev_pm_opp_xlate_required_opp(cpu_state->opp_table,
> > +                                         devfreq->opp_table, p_opp);
> > +     dev_pm_opp_put(p_opp);
> > +
> > +     if (!IS_ERR(opp)) {
> > +             freq = dev_pm_opp_get_freq(opp);
> > +             dev_pm_opp_put(opp);
> > +             goto out;
> > +     }
> > +
> > +     /* Use Interpolation if required opps is not available */
> > +     cpu_min_freq = cpu_state->min_freq;
> > +     cpu_max_freq = cpu_state->max_freq;
> > +     cpu_curr_freq_khz = cpu_state->curr_freq;
> > +
> > +     if (dev_freq_table) {
> > +             /* Get minimum frequency according to sorting order */
> > +             dev_max_state = dev_freq_table[devfreq->profile->max_state - 1];
> > +             if (dev_freq_table[0] < dev_max_state) {
> > +                     dev_min_freq = dev_freq_table[0];
> > +                     dev_max_freq = dev_max_state;
> > +             } else {
> > +                     dev_min_freq = dev_max_state;
> > +                     dev_max_freq = dev_freq_table[0];
> > +             }
> > +     } else {
> > +             dev_min_freq = dev_pm_qos_read_value(devfreq->dev.parent,
> > +                                                  DEV_PM_QOS_MIN_FREQUENCY);
> > +             dev_max_freq = dev_pm_qos_read_value(devfreq->dev.parent,
> > +                                                  DEV_PM_QOS_MAX_FREQUENCY);
> > +
> > +             if (dev_max_freq <= dev_min_freq)
> > +                     return 0;
> > +     }
> > +     cpu_percent = ((cpu_curr_freq_khz - cpu_min_freq) * 100) / cpu_max_freq - cpu_min_freq;

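Are the parentheses around the denominator missing? '/' binds tighter
than '-', so as posted cpu_min_freq is subtracted after the division.
I think it should be:
cpu_percent = ((cpu_curr_freq_khz - cpu_min_freq) * 100) /
              (cpu_max_freq - cpu_min_freq);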
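
To make the effect concrete, here is a small userspace sketch comparing the two expressions and feeding the result into the interpolation. The function names are made up for this example, and mult_frac_ul() is a local stand-in written to behave like the kernel's mult_frac() macro. It is only a sketch of the arithmetic, not the governor code.

```c
/*
 * Local stand-in for the kernel's mult_frac(): computes x * numer / denom
 * while avoiding overflow of the intermediate product.
 */
unsigned long mult_frac_ul(unsigned long x, unsigned long numer,
			   unsigned long denom)
{
	unsigned long quot = x / denom;
	unsigned long rem = x % denom;

	return quot * numer + (rem * numer) / denom;
}

/* As posted: parsed as (((cur - min) * 100) / max) - min. */
unsigned int cpu_percent_as_posted(unsigned int cur, unsigned int min,
				   unsigned int max)
{
	return ((cur - min) * 100) / max - min;
}

/* With the denominator parenthesized: position of cur within [min, max]. */
unsigned int cpu_percent_fixed(unsigned int cur, unsigned int min,
			       unsigned int max)
{
	return ((cur - min) * 100) / (max - min);
}

/* Map a percentage onto the device frequency range. */
unsigned long interpolate_dev_freq(unsigned long dev_min,
				   unsigned long dev_max,
				   unsigned int percent)
{
	return dev_min + mult_frac_ul(dev_max - dev_min, percent, 100);
}
```

For example, with a CPU at 1200000 kHz in a 500000..2000000 kHz range, the fixed form gives 46 percent, which maps a 200 MHz..800 MHz device range to 476 MHz. The posted form computes 35 - 500000, which wraps around in unsigned arithmetic to a value far above 100.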


> > +     freq = dev_min_freq + mult_frac(dev_max_freq - dev_min_freq, cpu_percent, 100);
> > +
> > +out:
> > +     return freq;
> > +}
> > +
> > +static int get_target_freq_with_cpufreq(struct devfreq *devfreq,
> > +                                     unsigned long *freq)
> > +{
> > +     struct devfreq_passive_data *p_data =
> > +                             (struct devfreq_passive_data *)devfreq->data;
> > +     unsigned int cpu;
> > +     unsigned long target_freq = 0;
> > +
> > +     for_each_online_cpu(cpu)
> > +             target_freq = max(target_freq,
> > +                               xlate_cpufreq_to_devfreq(p_data, cpu));
> > +
> > +     *freq = target_freq;
> > +
> > +     return 0;
> > +}
>
> As you know, governor_passive.c already uses
> both 'dev_pm_opp_xlate_required_opp' and 'devfreq_recommended_opp'
> to get the target frequency from the OPP. So, I want to make a common
> function like 'get_taget_freq_by_required_opp', as shown below.
> If 'get_taget_freq_by_required_opp' is defined like this,
> it can also be used for get_target_freq_with_devfreq().
> After finishing the review of this patch, I'll send the patch[2].
> [2] https://git.kernel.org/pub/scm/linux/kernel/git/chanwoo/linux.git/commit/?h=devfreq-testing-passive-gov&id=101c5a087586ab2b5cf3370166a7e39227ca83cf
>
> For example (this code is not tested):
> static unsigned long get_taget_freq_by_required_opp(struct device *p_dev,
>                                                 struct opp_table *p_opp_table,
>                                                 struct opp_table *opp_table,
>                                                 unsigned long freq)
> {
>         struct dev_pm_opp *opp = NULL, *p_opp = NULL;
>
>         if (!p_dev || !p_opp_table || !opp_table || !freq)
>                 return 0;
>
>         p_opp = devfreq_recommended_opp(p_dev, &freq, 0);
>         if (IS_ERR(p_opp))
>                 return 0;
>
>         opp = dev_pm_opp_xlate_required_opp(p_opp_table, opp_table, p_opp);
>         dev_pm_opp_put(p_opp);
>
>         if (IS_ERR(opp))
>                 return 0;
>
>         freq = dev_pm_opp_get_freq(opp);
>         dev_pm_opp_put(opp);
>
>         return freq;
> }
>
> static int get_target_freq_with_cpufreq(struct devfreq *devfreq,
>                                         unsigned long *target_freq)
> {
>         struct devfreq_passive_data *p_data =
>                                 (struct devfreq_passive_data *)devfreq->data;
>         struct devfreq_cpu_data *cpu_data;
>         unsigned long cpu, cpu_cur, cpu_min, cpu_max, cpu_percent;
>         unsigned long dev_min, dev_max;
>         unsigned long freq = 0;
>
>         for_each_online_cpu(cpu) {
>                 cpu_data = p_data->cpu_data[cpu];
>                 if (!cpu_data || cpu_data->first_cpu != cpu)
>                         continue;
>
>                 /* Get target freq via required opps */
>                 cpu_cur = cpu_data->cur_freq * HZ_PER_KHZ;
>                 freq = get_taget_freq_by_required_opp(cpu_data->dev,
>                                         cpu_data->opp_table,
>                                         devfreq->opp_table, cpu_cur);
>                 if (freq) {
>                         *target_freq = max(freq, *target_freq);
>                         continue;
>                 }
>
>                 /* Use Interpolation if required opps is not available */
>                 devfreq_get_freq_range(devfreq, &dev_min, &dev_max);
>
>                 cpu_min = cpu_data->min_freq;
>                 cpu_max = cpu_data->max_freq;
>                 cpu_cur = cpu_data->cur_freq;
>
>                 cpu_percent = ((cpu_cur - cpu_min) * 100) / (cpu_max - cpu_min);
>                 freq = dev_min + mult_frac(dev_max - dev_min, cpu_percent, 100);
>
>                 *target_freq = max(freq, *target_freq);
>         }
>
>         return 0;
> }
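
The interpolation fallback in the loop above can be sketched in userspace as follows: each cpufreq policy is evaluated once through its first CPU, and the device follows the most demanding policy. struct cpu_state here is a simplified, hypothetical mirror of the per-cpu data discussed in this thread, the required-opps lookup is left out, and the clamping in scale_to_dev() is an extra safeguard I added for the sketch, not something taken from the patch.

```c
#include <stddef.h>

/* Hypothetical, simplified mirror of the per-cpu state from the thread. */
struct cpu_state {
	unsigned int first_cpu;	/* first CPU of the policy this CPU belongs to */
	unsigned int cur_freq;	/* kHz */
	unsigned int min_freq;	/* kHz */
	unsigned int max_freq;	/* kHz */
};

/* Interpolate one policy's position in its range onto the device range. */
unsigned long scale_to_dev(const struct cpu_state *s,
			   unsigned long dev_min, unsigned long dev_max)
{
	unsigned long percent, span = dev_max - dev_min;

	/* Safeguard added for this sketch: clamp degenerate inputs. */
	if (s->max_freq <= s->min_freq || s->cur_freq <= s->min_freq)
		return dev_min;
	if (s->cur_freq >= s->max_freq)
		return dev_max;

	percent = ((unsigned long)(s->cur_freq - s->min_freq) * 100) /
		  (s->max_freq - s->min_freq);

	/* Same arithmetic as mult_frac(span, percent, 100). */
	return dev_min + (span / 100) * percent + ((span % 100) * percent) / 100;
}

/* Take the highest device frequency requested by any policy. */
unsigned long pick_target_freq(struct cpu_state **states, size_t ncpus,
			       unsigned long dev_min, unsigned long dev_max)
{
	unsigned long target = 0;
	size_t cpu;

	for (cpu = 0; cpu < ncpus; cpu++) {
		const struct cpu_state *s = states[cpu];
		unsigned long freq;

		/* Skip empty slots and CPUs that are not first in their policy. */
		if (!s || s->first_cpu != cpu)
			continue;

		freq = scale_to_dev(s, dev_min, dev_max);
		if (freq > target)
			target = freq;
	}

	return target;
}
```

Taking the maximum means a single busy cluster is enough to raise the device frequency, which matches the max() aggregation in both the posted patch and the suggested rewrite.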
>
> > +
> > +static int get_target_freq_with_devfreq(struct devfreq *devfreq,
> >                                       unsigned long *freq)
> >  {
> >       struct devfreq_passive_data *p_data
> > @@ -23,14 +115,6 @@ static int devfreq_passive_get_target_freq(struct devfreq *devfreq,
> >       int i, count;
> >
> >       /*
> > -      * If the devfreq device with passive governor has the specific method
> > -      * to determine the next frequency, should use the get_target_freq()
> > -      * of struct devfreq_passive_data.
> > -      */
> > -     if (p_data->get_target_freq)
> > -             return p_data->get_target_freq(devfreq, freq);
> > -
> > -     /*
> >        * If the parent and passive devfreq device uses the OPP table,
> >        * get the next frequency by using the OPP table.
> >        */
> > @@ -98,6 +182,37 @@ static int devfreq_passive_get_target_freq(struct devfreq *devfreq,
> >       return 0;
> >  }
> >
> > +static int devfreq_passive_get_target_freq(struct devfreq *devfreq,
> > +                                        unsigned long *freq)
> > +{
> > +     struct devfreq_passive_data *p_data =
> > +             (struct devfreq_passive_data *)devfreq->data;
> > +     int ret;
> > +
> > +     /*
> > +      * If the devfreq device with passive governor has the specific method
> > +      * to determine the next frequency, should use the get_target_freq()
> > +      * of struct devfreq_passive_data.
> > +      */
> > +     if (p_data->get_target_freq)
> > +             return p_data->get_target_freq(devfreq, freq);
> > +
> > +     switch (p_data->parent_type) {
> > +     case DEVFREQ_PARENT_DEV:
> > +             ret = get_target_freq_with_devfreq(devfreq, freq);
> > +             break;
> > +     case CPUFREQ_PARENT_DEV:
> > +             ret = get_target_freq_with_cpufreq(devfreq, freq);
> > +             break;
> > +     default:
> > +             ret = -EINVAL;
> > +             dev_err(&devfreq->dev, "Invalid parent type\n");
> > +             break;
> > +     }
> > +
> > +     return ret;
> > +}
> > +
> >  static int devfreq_passive_notifier_call(struct notifier_block *nb,
> >                               unsigned long event, void *ptr)
> >  {
> > @@ -130,16 +245,200 @@ static int devfreq_passive_notifier_call(struct notifier_block *nb,
> >       return NOTIFY_DONE;
> >  }
> >
> > +static int cpufreq_passive_notifier_call(struct notifier_block *nb,
> > +                                      unsigned long event, void *ptr)
> > +{
> > +     struct devfreq_passive_data *data =
> > +                     container_of(nb, struct devfreq_passive_data, nb);
> > +     struct devfreq *devfreq = (struct devfreq *)data->this;
> > +     struct devfreq_cpu_state *cpu_state;
> > +     struct cpufreq_freqs *cpu_freq = ptr;
>
> Please name this variable 'freqs'. I prefer to use the same variable name
> for both devfreq_freqs and cpufreq_freqs instances.
>
> > +     unsigned int curr_freq;
>
> As I commented above, it is better to use 'cur_freq' instead of 'curr_freq'
> unless there is a special reason.
>
> > +     int ret;
> > +
> > +     if (event != CPUFREQ_POSTCHANGE || !cpu_freq ||
> > +         !data->cpu_state[cpu_freq->policy->cpu])
> > +             return 0;
> > +
> > +     cpu_state = data->cpu_state[cpu_freq->policy->cpu];
> > +     if (cpu_state->curr_freq == cpu_freq->new)
> > +             return 0;
> > +
> > +     /* Backup current freq and pre-update cpu state freq*/
>
> I think that this comment is not critical. So, please drop it.
>
> > +     curr_freq = cpu_state->curr_freq;
> > +     cpu_state->curr_freq = cpu_freq->new;
> > +
> > +     mutex_lock(&devfreq->lock);
> > +     ret = update_devfreq(devfreq);
>
> I recommend using 'devfreq_update_target' instead of 'update_devfreq',
> as follows:
>         devfreq_update_target(devfreq, freqs->new);
>
> > +     mutex_unlock(&devfreq->lock);
> > +     if (ret) {
> > +             cpu_state->curr_freq = curr_freq;
> > +             dev_err(&devfreq->dev, "Couldn't update the frequency.\n");
> > +             return ret;
> > +     }
> > +
> > +     return 0;
> > +}
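The notifier above caches the new CPU frequency before calling the
update and restores the old value if the update fails. A minimal
user-space sketch of that pattern follows; it is not kernel code. The
type cpu_state_model, the callback type update_fn and the two demo
callbacks are hypothetical stand-ins for the kernel objects, and locking
is left out.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the cached per-CPU state in the patch. */
struct cpu_state_model {
	unsigned int cur_freq;	/* last CPU frequency seen, in kHz */
};

/* Stand-in for update_devfreq(): returns 0 on success, -errno on failure. */
typedef int (*update_fn)(void *ctx);

/* Demo callbacks so the sketch can be exercised without a kernel. */
static int update_ok(void *ctx)   { (void)ctx; return 0; }
static int update_fail(void *ctx) { (void)ctx; return -EINVAL; }

static int on_cpu_freq_change(struct cpu_state_model *st,
			      unsigned int new_freq,
			      update_fn update, void *ctx)
{
	unsigned int old_freq;
	int ret;

	if (st->cur_freq == new_freq)
		return 0;		/* nothing changed, skip the update */

	old_freq = st->cur_freq;
	st->cur_freq = new_freq;	/* publish first: the update reads it */

	ret = update(ctx);
	if (ret)
		st->cur_freq = old_freq;	/* roll back on failure */

	return ret;
}
```

The cached value is written before the update because the target
calculation reads it; the rollback keeps the cache consistent with what
the device is actually running at when the update does not go through.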
> > +
> > +static int cpufreq_passive_register(struct devfreq_passive_data **p_data)
>
> To keep the function naming style consistent, please rename the
> function as follows, because devfreq names its own function
> 'devfreq_register_notifier':
> - cpufreq_passive_register -> cpufreq_passive_register_notifier
>
> > +{
> > +     struct devfreq_passive_data *data = *p_data;
> > +     struct devfreq *devfreq = (struct devfreq *)data->this;
> > +     struct device *dev = devfreq->dev.parent;
> > +     struct opp_table *opp_table = NULL;
> > +     struct devfreq_cpu_state *cpu_state;
> > +     struct cpufreq_policy *policy;
> > +     struct device *cpu_dev;
> > +     unsigned int cpu;
> > +     int ret;
> > +
> > +     get_online_cpus();
> > +
> > +     data->nb.notifier_call = cpufreq_passive_notifier_call;
> > +     ret = cpufreq_register_notifier(&data->nb,
> > +                                     CPUFREQ_TRANSITION_NOTIFIER);
> > +     if (ret) {
> > +             dev_err(dev, "Couldn't register cpufreq notifier.\n");
> > +             data->nb.notifier_call = NULL;
> > +             goto out;
> > +     }
> > +
> > +     /* Populate devfreq_cpu_state */
>
> This comment is not needed. Please drop it.
>
> > +     for_each_online_cpu(cpu) {
> > +             if (data->cpu_state[cpu])
> > +                     continue;
> > +
> > +             policy = cpufreq_cpu_get(cpu);
> > +             if (!policy) {
> > +                     ret = -EINVAL;
> > +                     goto out;
> > +             } else if (PTR_ERR(policy) == -EPROBE_DEFER) {
> > +                     ret = -EPROBE_DEFER;
> > +                     goto out;
> > +             } else if (IS_ERR(policy)) {
> > +                     ret = PTR_ERR(policy);
> > +                     dev_err(dev, "Couldn't get the cpufreq_poliy.\n");
> > +                     goto out;
> > +             }
>
> Use the dev_err_probe() function to handle the EPROBE_DEFER case.
> It makes the code simpler.
>
> > +
> > +             cpu_state = kzalloc(sizeof(*cpu_state), GFP_KERNEL);
> > +             if (!cpu_state) {
> > +                     ret = -ENOMEM;
> > +                     goto out;
> > +             }
> > +
> > +             cpu_dev = get_cpu_device(cpu);
> > +             if (!cpu_dev) {
> > +                     dev_err(dev, "Couldn't get cpu device.\n");
> > +                     ret = -ENODEV;
> > +                     goto out;
> > +             }
> > +
> > +             opp_table = dev_pm_opp_get_opp_table(cpu_dev);
> > +             if (IS_ERR(opp_table)) {
> > +                     ret = PTR_ERR(opp_table);
> > +                     goto out;
> > +             }
> > +
> > +             cpu_state->cpu_dev = cpu_dev;
> > +             cpu_state->opp_table = opp_table;
> > +             cpu_state->first_cpu = cpumask_first(policy->related_cpus);
> > +             cpu_state->curr_freq = policy->cur;
> > +             cpu_state->min_freq = policy->cpuinfo.min_freq;
> > +             cpu_state->max_freq = policy->cpuinfo.max_freq;
> > +             data->cpu_state[cpu] = cpu_state;
> > +
> > +             cpufreq_cpu_put(policy);
> > +     }
> > +
> > +out:
> > +     put_online_cpus();
> > +     if (ret)
> > +             return ret;
> > +
> > +     /* Update devfreq */
> > +     mutex_lock(&devfreq->lock);
> > +     ret = update_devfreq(devfreq);
>
> > +     mutex_unlock(&devfreq->lock);
> > +     if (ret)
> > +             dev_err(dev, "Couldn't update the frequency.\n");
> > +
> > +     return ret;
> > +}
> > +
> > +static int cpufreq_passive_unregister(struct devfreq_passive_data **p_data)
>
> As I commented above, please change the name as following:
> - cpufreq_passive_unregister -> cpufreq_passive_unregister_notifier
>
> > +{
> > +     struct devfreq_passive_data *data = *p_data;
> > +     struct devfreq_cpu_state *cpu_state;
> > +     int cpu;
> > +
> > +     if (data->nb.notifier_call)
> > +             cpufreq_unregister_notifier(&data->nb,
> > +                                         CPUFREQ_TRANSITION_NOTIFIER);
> > +
> > +     for_each_possible_cpu(cpu) {
> > +             cpu_state = data->cpu_state[cpu];
> > +             if (cpu_state) {
> > +                     if (cpu_state->opp_table)
> > +                             dev_pm_opp_put_opp_table(cpu_state->opp_table);
> > +                     kfree(cpu_state);
> > +                     cpu_state = NULL;
> > +             }
> > +     }
> > +
> > +     return 0;
> > +}
> > +
> > +int register_parent_dev_notifier(struct devfreq_passive_data **p_data)
> > +{
> > +     struct notifier_block *nb = &(*p_data)->nb;
> > +     int ret = 0;
> > +
> > +     switch ((*p_data)->parent_type) {
> > +     case DEVFREQ_PARENT_DEV:
> > +             nb->notifier_call = devfreq_passive_notifier_call;
> > +             ret = devfreq_register_notifier((struct devfreq *)(*p_data)->parent, nb,
> > +                                             DEVFREQ_TRANSITION_NOTIFIER);
> > +             break;
> > +     case CPUFREQ_PARENT_DEV:
> > +             ret = cpufreq_passive_register(p_data);
> > +             break;
> > +     default:
> > +             ret = -EINVAL;
> > +             break;
> > +     }
> > +     return ret;
> > +}
> > +
> > +int unregister_parent_dev_notifier(struct devfreq_passive_data **p_data)
> > +{
> > +     int ret = 0;
> > +
> > +     switch ((*p_data)->parent_type) {
> > +     case DEVFREQ_PARENT_DEV:
> > +             WARN_ON(devfreq_unregister_notifier((struct devfreq *)(*p_data)->parent,
> > +                                                 &(*p_data)->nb,
> > +                                                 DEVFREQ_TRANSITION_NOTIFIER));
> > +             break;
> > +     case CPUFREQ_PARENT_DEV:
> > +             cpufreq_passive_unregister(p_data);
> > +             break;
> > +     default:
> > +             ret = -EINVAL;
> > +             break;
> > +     }
> > +     return ret;
> > +}
>
> I think that you don't need to define register_parent_dev_notifier
> and unregister_parent_dev_notifier as separate functions.
>
> Instead of separate functions, just add the code
> into devfreq_passive_event_handler.
>
>
> > +
> >  static int devfreq_passive_event_handler(struct devfreq *devfreq,
> >                               unsigned int event, void *data)
> >  {
> >       struct devfreq_passive_data *p_data
> >                       = (struct devfreq_passive_data *)devfreq->data;
> >       struct devfreq *parent = (struct devfreq *)p_data->parent;
> > -     struct notifier_block *nb = &p_data->nb;
> >       int ret = 0;
> >
> > -     if (!parent)
> > +     if (p_data->parent_type == DEVFREQ_PARENT_DEV && !parent)
> >               return -EPROBE_DEFER;
> >
> >       switch (event) {
> > @@ -147,13 +446,11 @@ static int devfreq_passive_event_handler(struct devfreq *devfreq,
> >               if (!p_data->this)
> >                       p_data->this = devfreq;
> >
> > -             nb->notifier_call = devfreq_passive_notifier_call;
> > -             ret = devfreq_register_notifier(parent, nb,
> > -                                     DEVFREQ_TRANSITION_NOTIFIER);
> > +             ret = register_parent_dev_notifier(&p_data);
> >               break;
> > +
> >       case DEVFREQ_GOV_STOP:
> > -             WARN_ON(devfreq_unregister_notifier(parent, nb,
> > -                                     DEVFREQ_TRANSITION_NOTIFIER));
> > +             ret = unregister_parent_dev_notifier(&p_data);
> >               break;
> >       default:
> >               break;
> > diff --git a/include/linux/devfreq.h b/include/linux/devfreq.h
> > index 26ea0850be9b..e0093b7c805c 100644
> > --- a/include/linux/devfreq.h
> > +++ b/include/linux/devfreq.h
> > @@ -280,6 +280,25 @@ struct devfreq_simple_ondemand_data {
> >
> >  #if IS_ENABLED(CONFIG_DEVFREQ_GOV_PASSIVE)
> >  /**
> > + * struct devfreq_cpu_state - holds the per-cpu state
> > + * @freq:    the current frequency of the cpu.
> > + * @min_freq:        the min frequency of the cpu.
> > + * @max_freq:        the max frequency of the cpu.
> > + * @first_cpu:       the cpumask of the first cpu of a policy.
> > + * @dev:     reference to cpu device.
> > + * @opp_table:       reference to cpu opp table.
> > + *
> > + * This structure stores the required cpu_state of a cpu.
> > + * This is auto-populated by the governor.
> > + */
> > +struct devfreq_cpu_state;
> > +
> > +enum devfreq_parent_dev_type {
> > +     DEVFREQ_PARENT_DEV,
> > +     CPUFREQ_PARENT_DEV,
> > +};
> > +
> > +/**
> >   * struct devfreq_passive_data - ``void *data`` fed to struct devfreq
> >   *   and devfreq_add_device
> >   * @parent:  the devfreq instance of parent device.
> > @@ -290,13 +309,15 @@ struct devfreq_simple_ondemand_data {
> >   *                   using governors except for passive governor.
> >   *                   If the devfreq device has the specific method to decide
> >   *                   the next frequency, should use this callback.
> > + * @parent_type:     parent type of the device
> >   * @this:    the devfreq instance of own device.
> >   * @nb:              the notifier block for DEVFREQ_TRANSITION_NOTIFIER list
> > + * @cpu_state:               the state min/max/current frequency of all online cpu's
> >   *
> >   * The devfreq_passive_data have to set the devfreq instance of parent
> >   * device with governors except for the passive governor. But, don't need to
> > - * initialize the 'this' and 'nb' field because the devfreq core will handle
> > - * them.
> > + * initialize the 'this', 'nb' and 'cpu_state' field because the devfreq core
> > + * will handle them.
> >   */
> >  struct devfreq_passive_data {
> >       /* Should set the devfreq instance of parent device */
> > @@ -305,9 +326,13 @@ struct devfreq_passive_data {
> >       /* Optional callback to decide the next frequency of passvice device */
> >       int (*get_target_freq)(struct devfreq *this, unsigned long *freq);
> >
> > +     /* Should set the type of parent device */
> > +     enum devfreq_parent_dev_type parent_type;
> > +
> >       /* For passive governor's internal use. Don't need to set them */
> >       struct devfreq *this;
> >       struct notifier_block nb;
> > +     struct devfreq_cpu_state *cpu_state[NR_CPUS];
> >  };
> >  #endif
> >
> >
>


* Re: [PATCH V8 1/8] PM / devfreq: Add cpu based scaling support to passive_governor
       [not found]             ` <1617195800.18432.3.camel@mtksdaap41>
@ 2021-04-01  0:16               ` Chanwoo Choi
  2021-04-08  2:47                 ` Chanwoo Choi
  0 siblings, 1 reply; 14+ messages in thread
From: Chanwoo Choi @ 2021-04-01  0:16 UTC (permalink / raw)
  To: andrew-sh.cheng
  Cc: MyungJoo Ham, Kyungmin Park, Rob Herring, Mark Rutland,
	Matthias Brugger, Rafael J. Wysocki, Viresh Kumar,
	Nishanth Menon, Stephen Boyd, Liam Girdwood, Mark Brown,
	linux-pm, devicetree, linux-arm-kernel, linux-mediatek,
	linux-kernel, srv_heupstream, Sibi Sankar

On 3/31/21 10:03 PM, andrew-sh.cheng wrote:
> On Wed, 2021-03-31 at 17:35 +0900, Chanwoo Choi wrote:
>> On 3/31/21 5:27 PM, Chanwoo Choi wrote:
>>> Hi,
>>>
>>> On 3/31/21 5:03 PM, andrew-sh.cheng wrote:
>>>> On Thu, 2021-03-25 at 17:14 +0900, Chanwoo Choi wrote:
>>>>> Hi,
>>>>>
>>>>> You did not send these patches to the linux-pm mailing list.
>>>>> Please send them to the linux-pm ML.
>>>>>
>>>>> Also, before receiving this series, I tried to clean up these patches
>>>>> on a testing branch[1], so I am adding my comments along with my cleaned-up version.
>>>>> [1] https://git.kernel.org/pub/scm/linux/kernel/git/chanwoo/linux.git/log/?h=devfreq-testing-passive-gov
>>>>>
>>>>> And 'Saravana Kannan <skannan@codeaurora.org>' is a wrong email address.
>>>>> Please update the address or drop it.
>>>>
>>>> Hi Chanwoo,
>>>>
>>>> Thank you for the advice.
>>>> I will resend the series as v9 (adding the linux-pm ML), remove this patch,
>>>> and note that my patch set is based on
>>>> https://git.kernel.org/pub/scm/linux/kernel/git/chanwoo/linux.git/log/?h=devfreq-testing-passive-gov
>>>
>>> I have not yet tested this patch[1] on the devfreq-testing-passive-gov branch.
>>> So, if possible, I'd like you to test your patches with this patch[1],
>>> and then, if there is no problem, could you send the next patches with patch[1]?
>>>
>>> [1] https://git.kernel.org/pub/scm/linux/kernel/git/chanwoo/linux.git/commit/?h=devfreq-testing-passive-gov&id=39c80d11a8f42dd63ecea1e0df595a0ceb83b454
>>
>>
>> Sorry for the confusion. I made the devfreq-testing-passive-gov[1]
>> branch based on the latest devfreq-next branch.
>> [1] https://git.kernel.org/pub/scm/linux/kernel/git/chanwoo/linux.git/log/?h=devfreq-testing-passive-gov
>>
>> First of all, if possible, I'd like them[1] to be tested together with your patches in this series.
>> Then, if there is no problem, please let me know. After your confirmation,
>> I'll send the patches of the devfreq-testing-passive-gov[1] branch.
>> How about that?
>>
> Hi Chanwoo~
> 
> We will use this on the Google Chrome project.
> Hsin-Yi from Google has tested your patch + my patch set v8 [2~8]:
> 
>     made sure the cci devfreq runs with cpufreq
>     suspend/resume
>     speedometer2 benchmark
> 
> The results are okay.
> 
> Please send the patches of the devfreq-testing-passive-gov[1] branch.
> 
> I will send patch v9 based on yours later.

Thanks for your test. I'll send the patches today.

> 
> 
>>
>>>
>>>>
>>>>
>>>>>
>>>>>
>>>>> On 3/23/21 8:33 PM, Andrew-sh.Cheng wrote:
>>>>>> From: Saravana Kannan <skannan@codeaurora.org>
>>>>>>
>>>>>> Many CPU architectures have caches that can scale independent of the
>>>>>> CPUs. Frequency scaling of the caches is necessary to make sure that the
>>>>>> cache is not a performance bottleneck that leads to poor performance and
>>>>>> power. The same idea applies for RAM/DDR.
>>>>>>
>>>>>> To achieve this, this patch adds support for cpu based scaling to the
>>>>>> passive governor. This is accomplished by taking the current frequency
>>>>>> of each CPU frequency domain and then adjust the frequency of the cache
>>>>>> (or any devfreq device) based on the frequency of the CPUs. It listens
>>>>>> to CPU frequency transition notifiers to keep itself up to date on the
>>>>>> current CPU frequency.
>>>>>>
>>>>>> To decide the frequency of the device, the governor does one of the
>>>>>> following:
>>>>>> * Derives the optimal devfreq device opp from required-opps property of
>>>>>>   the parent cpu opp_table.
>>>>>>
>>>>>> * Scales the device frequency in proportion to the CPU frequency. So, if
>>>>>>   the CPUs are running at their max frequency, the device runs at its
>>>>>>   max frequency. If the CPUs are running at their min frequency, the
>>>>>>   device runs at its min frequency. It is interpolated for frequencies
>>>>>>   in between.
>>>>>>
>>>>>> Andrew-sh.Cheng change
>>>>>> dev_pm_opp_xlate_opp to dev_pm_opp_xlate_required_opp devfreq->max_freq
>>>>>> to devfreq->user_min_freq_req.data.freq.qos->min_freq.target_value
>>>>>> after kernel-5.7
>>>>>> Don't return -EINVAL in devfreq_passive_event_handler()
>>>>>> since it doesn't handle DEVFREQ_GOV_SUSPEND DEVFREQ_GOV_RESUME cases.
>>>>>>
>>>>>> Signed-off-by: Saravana Kannan <skannan@codeaurora.org>
>>>>>> [Sibi: Integrated cpu-freqmap governor into passive_governor]
>>>>>> Signed-off-by: Sibi Sankar <sibis@codeaurora.org>
>>>>>> Signed-off-by: Andrew-sh.Cheng <andrew-sh.cheng@mediatek.com>
>>>>>> ---
>>>>>>  drivers/devfreq/Kconfig            |   2 +
>>>>>>  drivers/devfreq/governor_passive.c | 329 +++++++++++++++++++++++++++++++++++--
>>>>>>  include/linux/devfreq.h            |  29 +++-
>>>>>>  3 files changed, 342 insertions(+), 18 deletions(-)
>>>>>>
>>>>>> diff --git a/drivers/devfreq/Kconfig b/drivers/devfreq/Kconfig
>>>>>> index 00704efe6398..f56132b0ae64 100644
>>>>>> --- a/drivers/devfreq/Kconfig
>>>>>> +++ b/drivers/devfreq/Kconfig
>>>>>> @@ -73,6 +73,8 @@ config DEVFREQ_GOV_PASSIVE
>>>>>>  	  device. This governor does not change the frequency by itself
>>>>>>  	  through sysfs entries. The passive governor recommends that
>>>>>>  	  devfreq device uses the OPP table to get the frequency/voltage.
>>>>>> +	  Alternatively the governor can also be chosen to scale based on
>>>>>> +	  the online CPUs current frequency.
>>>>>>  
>>>>>>  comment "DEVFREQ Drivers"
>>>>>>  
>>>>>> diff --git a/drivers/devfreq/governor_passive.c b/drivers/devfreq/governor_passive.c
>>>>>> index b094132bd20b..9cc57b083839 100644
>>>>>> --- a/drivers/devfreq/governor_passive.c
>>>>>> +++ b/drivers/devfreq/governor_passive.c
>>>>>> @@ -8,11 +8,103 @@
>>>>>>   */
>>>>>>  
>>>>>>  #include <linux/module.h>
>>>>>> +#include <linux/cpu.h>
>>>>>> +#include <linux/cpufreq.h>
>>>>>> +#include <linux/cpumask.h>
>>>>>>  #include <linux/device.h>
>>>>>>  #include <linux/devfreq.h>
>>>>>> +#include <linux/slab.h>
>>>>>>  #include "governor.h"
>>>>>>  
>>>>>> -static int devfreq_passive_get_target_freq(struct devfreq *devfreq,
>>>>>> +struct devfreq_cpu_state {
>>>>>> +	unsigned int curr_freq;
>>>>>> +	unsigned int min_freq;
>>>>>> +	unsigned int max_freq;
>>>>>> +	unsigned int first_cpu;
>>>>>> +	struct device *cpu_dev;
>>>>>> +	struct opp_table *opp_table;
>>>>>> +};
>>>>>
>>>>> As far as I know, the previous version had a description of the structure.
>>>>> I want to add a description like the one below.
>>>>>
>>>>> And if you have no objection, I'd like you to order
>>>>> the variables as follows and use 'dev' instead of 'cpu_dev',
>>>>> because this patch uses 'cpu_state->cpu_dev' at multiple points.
>>>>> I think that 'cpu_state->dev' is better than 'cpu_state->cpu_dev'.
>>>>> Also, I prefer to use 'cur_freq' instead of 'curr_freq'
>>>>> because the devfreq subsystem uses 'cur_freq' for expressing the 'current frequency'.
>>>>>
>>>>> /**                                                                             
>>>>>  * struct devfreq_cpu_state - Hold the per-cpu data                              
>>>>>  * @dev:        reference to cpu device.                                        
>>>>>  * @first_cpu:  the cpumask of the first cpu of a policy.                       
>>>>>  * @opp_table:  reference to cpu opp table.                                     
>>>>>  * @cur_freq:   the current frequency of the cpu.                               
>>>>>  * @min_freq:   the min frequency of the cpu.                                   
>>>>>  * @max_freq:   the max frequency of the cpu.                                   
>>>>>  *                                                                              
>>>>>  * This structure stores the required cpu_data of a cpu.                        
>>>>>  * This is auto-populated by the governor.                                      
>>>>>  */                                                                             
>>>>> struct devfreq_cpu_state {                                                       
>>>>>          struct device *dev;                                                     
>>>>>          unsigned int first_cpu;                                                 
>>>>>
>>>>>          struct opp_table *opp_table;                                            
>>>>>          unsigned int cur_freq;                                                  
>>>>>          unsigned int min_freq;                                                  
>>>>>          unsigned int max_freq;                                                  
>>>>> };               
>>>>>
>>>>>
>>>>>> +
>>>>>> +static unsigned long xlate_cpufreq_to_devfreq(struct devfreq_passive_data *data,
>>>>>> +					      unsigned int cpu)
>>>>>> +{
>>>>>> +	unsigned int cpu_min_freq, cpu_max_freq, cpu_curr_freq_khz, cpu_percent;
>>>>>> +	unsigned long dev_min_freq, dev_max_freq, dev_max_state;
>>>>>> +
>>>>>> +	struct devfreq_cpu_state *cpu_state = data->cpu_state[cpu];
>>>>>> +	struct devfreq *devfreq = (struct devfreq *)data->this;
>>>>>> +	unsigned long *dev_freq_table = devfreq->profile->freq_table;
>>>>>> +	struct dev_pm_opp *opp = NULL, *p_opp = NULL;
>>>>>> +	unsigned long cpu_curr_freq, freq;
>>>>>> +
>>>>>> +	if (!cpu_state || cpu_state->first_cpu != cpu ||
>>>>>> +	    !cpu_state->opp_table || !devfreq->opp_table)
>>>>>> +		return 0;
>>>>>> +
>>>>>> +	cpu_curr_freq = cpu_state->curr_freq * 1000;
>>>>>> +	p_opp = devfreq_recommended_opp(cpu_state->cpu_dev, &cpu_curr_freq, 0);
>>>>>> +	if (IS_ERR(p_opp))
>>>>>> +		return 0;
>>>>>> +
>>>>>> +	opp = dev_pm_opp_xlate_required_opp(cpu_state->opp_table,
>>>>>> +					    devfreq->opp_table, p_opp);
>>>>>> +	dev_pm_opp_put(p_opp);
>>>>>> +
>>>>>> +	if (!IS_ERR(opp)) {
>>>>>> +		freq = dev_pm_opp_get_freq(opp);
>>>>>> +		dev_pm_opp_put(opp);
>>>>>> +		goto out;
>>>>>> +	}
>>>>>> +
>>>>>> +	/* Use Interpolation if required opps is not available */
>>>>>> +	cpu_min_freq = cpu_state->min_freq;
>>>>>> +	cpu_max_freq = cpu_state->max_freq;
>>>>>> +	cpu_curr_freq_khz = cpu_state->curr_freq;
>>>>>> +
>>>>>> +	if (dev_freq_table) {
>>>>>> +		/* Get minimum frequency according to sorting order */
>>>>>> +		dev_max_state = dev_freq_table[devfreq->profile->max_state - 1];
>>>>>> +		if (dev_freq_table[0] < dev_max_state) {
>>>>>> +			dev_min_freq = dev_freq_table[0];
>>>>>> +			dev_max_freq = dev_max_state;
>>>>>> +		} else {
>>>>>> +			dev_min_freq = dev_max_state;
>>>>>> +			dev_max_freq = dev_freq_table[0];
>>>>>> +		}
>>>>>> +	} else {
>>>>>> +		dev_min_freq = dev_pm_qos_read_value(devfreq->dev.parent,
>>>>>> +						     DEV_PM_QOS_MIN_FREQUENCY);
>>>>>> +		dev_max_freq = dev_pm_qos_read_value(devfreq->dev.parent,
>>>>>> +						     DEV_PM_QOS_MAX_FREQUENCY);
>>>>>> +
>>>>>> +		if (dev_max_freq <= dev_min_freq)
>>>>>> +			return 0;
>>>>>> +	}
>>>>>> +	cpu_percent = ((cpu_curr_freq_khz - cpu_min_freq) * 100) / (cpu_max_freq - cpu_min_freq);
>>>>>> +	freq = dev_min_freq + mult_frac(dev_max_freq - dev_min_freq, cpu_percent, 100);
>>>>>> +
>>>>>> +out:
>>>>>> +	return freq;
>>>>>> +}
>>>>>> +
>>>>>> +static int get_target_freq_with_cpufreq(struct devfreq *devfreq,
>>>>>> +					unsigned long *freq)
>>>>>> +{
>>>>>> +	struct devfreq_passive_data *p_data =
>>>>>> +				(struct devfreq_passive_data *)devfreq->data;
>>>>>> +	unsigned int cpu;
>>>>>> +	unsigned long target_freq = 0;
>>>>>> +
>>>>>> +	for_each_online_cpu(cpu)
>>>>>> +		target_freq = max(target_freq,
>>>>>> +				  xlate_cpufreq_to_devfreq(p_data, cpu));
>>>>>> +
>>>>>> +	*freq = target_freq;
>>>>>> +
>>>>>> +	return 0;
>>>>>> +}
>>>>>
>>>>> As you know, governor_passive.c already uses
>>>>> both 'dev_pm_opp_xlate_required_opp' and 'devfreq_recommended_opp'
>>>>> to get the target from the OPP table. So, I want to make a common function
>>>>> like 'get_taget_freq_by_required_opp', as shown below.
>>>>> If 'get_taget_freq_by_required_opp' is defined this way,
>>>>> it can also be used for get_target_freq_with_devfreq().
>>>>> After finishing the review of this patch, I'll send the patch[2].
>>>>> [2] https://git.kernel.org/pub/scm/linux/kernel/git/chanwoo/linux.git/commit/?h=devfreq-testing-passive-gov&id=101c5a087586ab2b5cf3370166a7e39227ca83cf
>>>>>
>>>>> For example (this code is not tested):
>>>>> static unsigned long get_taget_freq_by_required_opp(struct device *p_dev,
>>>>> 						struct opp_table *p_opp_table,
>>>>> 						struct opp_table *opp_table,
>>>>> 						unsigned long freq)
>>>>> {
>>>>> 	struct dev_pm_opp *opp = NULL, *p_opp = NULL;
>>>>>
>>>>> 	if (!p_dev || !p_opp_table || !opp_table || !freq)
>>>>> 		return 0;
>>>>>
>>>>> 	p_opp = devfreq_recommended_opp(p_dev, &freq, 0);
>>>>> 	if (IS_ERR(p_opp))
>>>>> 		return 0;
>>>>>
>>>>> 	opp = dev_pm_opp_xlate_required_opp(p_opp_table, opp_table, p_opp);
>>>>> 	dev_pm_opp_put(p_opp);
>>>>>
>>>>> 	if (IS_ERR(opp))
>>>>> 		return 0;
>>>>>
>>>>> 	freq = dev_pm_opp_get_freq(opp);
>>>>> 	dev_pm_opp_put(opp);
>>>>>
>>>>> 	return freq;
>>>>> }
>>>>>
>>>>> static int get_target_freq_with_cpufreq(struct devfreq *devfreq,
>>>>> 					unsigned long *target_freq)
>>>>> {
>>>>> 	struct devfreq_passive_data *p_data =
>>>>> 				(struct devfreq_passive_data *)devfreq->data;
>>>>> 	struct devfreq_cpu_data *cpu_data;
>>>>> 	unsigned long cpu, cpu_cur, cpu_min, cpu_max, cpu_percent;
>>>>> 	unsigned long dev_min, dev_max;
>>>>> 	unsigned long freq = 0;
>>>>>
>>>>> 	for_each_online_cpu(cpu) {
>>>>> 		cpu_data = p_data->cpu_data[cpu];
>>>>> 		if (!cpu_data || cpu_data->first_cpu != cpu)
>>>>> 			continue;
>>>>>
>>>>> 		/* Get target freq via required opps */
>>>>> 		cpu_cur = cpu_data->cur_freq * HZ_PER_KHZ;
>>>>> 		freq = get_taget_freq_by_required_opp(cpu_data->dev,
>>>>> 					cpu_data->opp_table,
>>>>> 					devfreq->opp_table, cpu_cur);
>>>>> 		if (freq) {
>>>>> 			*target_freq = max(freq, *target_freq);
>>>>> 			continue;
>>>>> 		}
>>>>>
>>>>> 		/* Use Interpolation if required opps is not available */
>>>>> 		devfreq_get_freq_range(devfreq, &dev_min, &dev_max);
>>>>>
>>>>> 		cpu_min = cpu_data->min_freq;
>>>>> 		cpu_max = cpu_data->max_freq;
>>>>> 		cpu_cur = cpu_data->cur_freq;
>>>>>
>>>>> 		cpu_percent = ((cpu_cur - cpu_min) * 100) / (cpu_max - cpu_min);
>>>>> 		freq = dev_min + mult_frac(dev_max - dev_min, cpu_percent, 100);
>>>>>
>>>>> 		*target_freq = max(freq, *target_freq);
>>>>> 	}
>>>>>
>>>>> 	return 0;
>>>>> }
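The loop above visits every online CPU but lets only the first CPU of
each policy contribute, then keeps the highest device frequency any
policy asks for. A minimal user-space sketch of that selection follows;
it is not kernel code. The struct cpu_data_model, its dev_freq field
(standing for the per-policy result of the required-opp lookup or the
interpolation) and pick_target_freq() are hypothetical names, and the
sketch starts its accumulator at zero.

```c
#include <stddef.h>

/* Hypothetical per-CPU record; CPUs of one policy share one record. */
struct cpu_data_model {
	unsigned int first_cpu;	/* first CPU of the owning policy */
	unsigned long dev_freq;	/* device frequency this policy asks for */
};

/*
 * Return the highest device frequency requested by any policy.
 * cpu_data[cpu] may be NULL for CPUs without cached state.
 */
static unsigned long pick_target_freq(struct cpu_data_model *cpu_data[],
				      size_t nr_cpus)
{
	unsigned long target = 0;
	size_t cpu;

	for (cpu = 0; cpu < nr_cpus; cpu++) {
		const struct cpu_data_model *d = cpu_data[cpu];

		/* Only the first CPU of a policy represents the policy. */
		if (!d || d->first_cpu != cpu)
			continue;

		if (d->dev_freq > target)
			target = d->dev_freq;
	}

	return target;
}
```

Taking the maximum means the device is never slower than what the most
demanding CPU cluster needs, at the cost of running faster than the
other clusters require.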
>>>>>
>>>>>> +
>>>>>> +static int get_target_freq_with_devfreq(struct devfreq *devfreq,
>>>>>>  					unsigned long *freq)
>>>>>>  {
>>>>>>  	struct devfreq_passive_data *p_data
>>>>>> @@ -23,14 +115,6 @@ static int devfreq_passive_get_target_freq(struct devfreq *devfreq,
>>>>>>  	int i, count;
>>>>>>  
>>>>>>  	/*
>>>>>> -	 * If the devfreq device with passive governor has the specific method
>>>>>> -	 * to determine the next frequency, should use the get_target_freq()
>>>>>> -	 * of struct devfreq_passive_data.
>>>>>> -	 */
>>>>>> -	if (p_data->get_target_freq)
>>>>>> -		return p_data->get_target_freq(devfreq, freq);
>>>>>> -
>>>>>> -	/*
>>>>>>  	 * If the parent and passive devfreq device uses the OPP table,
>>>>>>  	 * get the next frequency by using the OPP table.
>>>>>>  	 */
>>>>>> @@ -98,6 +182,37 @@ static int devfreq_passive_get_target_freq(struct devfreq *devfreq,
>>>>>>  	return 0;
>>>>>>  }
>>>>>>  
>>>>>> +static int devfreq_passive_get_target_freq(struct devfreq *devfreq,
>>>>>> +					   unsigned long *freq)
>>>>>> +{
>>>>>> +	struct devfreq_passive_data *p_data =
>>>>>> +		(struct devfreq_passive_data *)devfreq->data;
>>>>>> +	int ret;
>>>>>> +
>>>>>> +	/*
>>>>>> +	 * If the devfreq device with passive governor has the specific method
>>>>>> +	 * to determine the next frequency, should use the get_target_freq()
>>>>>> +	 * of struct devfreq_passive_data.
>>>>>> +	 */
>>>>>> +	if (p_data->get_target_freq)
>>>>>> +		return p_data->get_target_freq(devfreq, freq);
>>>>>> +
>>>>>> +	switch (p_data->parent_type) {
>>>>>> +	case DEVFREQ_PARENT_DEV:
>>>>>> +		ret = get_target_freq_with_devfreq(devfreq, freq);
>>>>>> +		break;
>>>>>> +	case CPUFREQ_PARENT_DEV:
>>>>>> +		ret = get_target_freq_with_cpufreq(devfreq, freq);
>>>>>> +		break;
>>>>>> +	default:
>>>>>> +		ret = -EINVAL;
>>>>>> +		dev_err(&devfreq->dev, "Invalid parent type\n");
>>>>>> +		break;
>>>>>> +	}
>>>>>> +
>>>>>> +	return ret;
>>>>>> +}
>>>>>> +
>>>>>>  static int devfreq_passive_notifier_call(struct notifier_block *nb,
>>>>>>  				unsigned long event, void *ptr)
>>>>>>  {
>>>>>> @@ -130,16 +245,200 @@ static int devfreq_passive_notifier_call(struct notifier_block *nb,
>>>>>>  	return NOTIFY_DONE;
>>>>>>  }
>>>>>>  
>>>>>> +static int cpufreq_passive_notifier_call(struct notifier_block *nb,
>>>>>> +					 unsigned long event, void *ptr)
>>>>>> +{
>>>>>> +	struct devfreq_passive_data *data =
>>>>>> +			container_of(nb, struct devfreq_passive_data, nb);
>>>>>> +	struct devfreq *devfreq = (struct devfreq *)data->this;
>>>>>> +	struct devfreq_cpu_state *cpu_state;
>>>>>> +	struct cpufreq_freqs *cpu_freq = ptr;
>>>>>
>>>>> Please name this variable 'freqs'. I prefer to use the same variable name
>>>>> for both devfreq_freqs and cpufreq_freqs instances.
>>>>>
>>>>>> +	unsigned int curr_freq;
>>>>>
>>>>> As I commented above, it is better to use 'cur_freq' instead of 'curr_freq'
>>>>> unless there is a special reason.
>>>>>
>>>>>> +	int ret;
>>>>>> +
>>>>>> +	if (event != CPUFREQ_POSTCHANGE || !cpu_freq ||
>>>>>> +	    !data->cpu_state[cpu_freq->policy->cpu])
>>>>>> +		return 0;
>>>>>> +
>>>>>> +	cpu_state = data->cpu_state[cpu_freq->policy->cpu];
>>>>>> +	if (cpu_state->curr_freq == cpu_freq->new)
>>>>>> +		return 0;
>>>>>> +
>>>>>> +	/* Backup current freq and pre-update cpu state freq*/
>>>>>
>>>>> I think that this comment is not critical. So, please drop it.
>>>>>
>>>>>> +	curr_freq = cpu_state->curr_freq;
>>>>>> +	cpu_state->curr_freq = cpu_freq->new;
>>>>>> +
>>>>>> +	mutex_lock(&devfreq->lock);
>>>>>> +	ret = update_devfreq(devfreq);
>>>>>
>>>>> I recommend using 'devfreq_update_target' instead of 'update_devfreq'
>>>>> as follows:
>>>>> 	devfreq_update_target(devfreq, freqs->new);
>>>>>
>>>>>> +	mutex_unlock(&devfreq->lock);
>>>>>> +	if (ret) {
>>>>>> +		cpu_state->curr_freq = curr_freq;
>>>>>> +		dev_err(&devfreq->dev, "Couldn't update the frequency.\n");
>>>>>> +		return ret;
>>>>>> +	}
>>>>>> +
>>>>>> +	return 0;
>>>>>> +}
>>>>>> +
>>>>>> +static int cpufreq_passive_register(struct devfreq_passive_data **p_data)
>>>>>
>>>>> In order to keep the function naming style consistent,
>>>>> please change the name as follows, because devfreq defines
>>>>> the function name as 'devfreq_register_notifier':
>>>>> - cpufreq_passive_register -> cpufreq_passive_register_notifier
>>>>>
>>>>>> +{
>>>>>> +	struct devfreq_passive_data *data = *p_data;
>>>>>> +	struct devfreq *devfreq = (struct devfreq *)data->this;
>>>>>> +	struct device *dev = devfreq->dev.parent;
>>>>>> +	struct opp_table *opp_table = NULL;
>>>>>> +	struct devfreq_cpu_state *cpu_state;
>>>>>> +	struct cpufreq_policy *policy;
>>>>>> +	struct device *cpu_dev;
>>>>>> +	unsigned int cpu;
>>>>>> +	int ret;
>>>>>> +
>>>>>> +	get_online_cpus();
>>>>>> +
>>>>>> +	data->nb.notifier_call = cpufreq_passive_notifier_call;
>>>>>> +	ret = cpufreq_register_notifier(&data->nb,
>>>>>> +					CPUFREQ_TRANSITION_NOTIFIER);
>>>>>> +	if (ret) {
>>>>>> +		dev_err(dev, "Couldn't register cpufreq notifier.\n");
>>>>>> +		data->nb.notifier_call = NULL;
>>>>>> +		goto out;
>>>>>> +	}
>>>>>> +
>>>>>> +	/* Populate devfreq_cpu_state */
>>>>>
>>>>> Don't need this comment. Please drop it.
>>>>>
>>>>>> +	for_each_online_cpu(cpu) {
>>>>>> +		if (data->cpu_state[cpu])
>>>>>> +			continue;
>>>>>> +
>>>>>> +		policy = cpufreq_cpu_get(cpu);
>>>>>> +		if (!policy) {
>>>>>> +			ret = -EINVAL;
>>>>>> +			goto out;
>>>>>> +		} else if (PTR_ERR(policy) == -EPROBE_DEFER) {
>>>>>> +			ret = -EPROBE_DEFER;
>>>>>> +			goto out;
>>>>>> +		} else if (IS_ERR(policy)) {
>>>>>> +			ret = PTR_ERR(policy);
>>>>>> +			dev_err(dev, "Couldn't get the cpufreq_poliy.\n");
>>>>>> +			goto out;
>>>>>> +		}
>>>>>
>>>>> Use the dev_err_probe() function to handle the EPROBE_DEFER case.
>>>>> It makes the code simpler.
>>>>>
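For readers unfamiliar with the helper suggested above, here is a minimal
userspace sketch of the pattern it enables. It is only an illustration, not
kernel code: in the kernel, dev_err_probe(dev, err, fmt, ...) returns err
and stays quiet at error level when err is -EPROBE_DEFER, so the caller
needs no separate branch for deferral. The names model_err_probe() and
model_errors_logged below are hypothetical stand-ins, and the struct device
argument is omitted so the sketch builds outside the kernel.

```c
#include <assert.h>
#include <errno.h>
#include <stdarg.h>
#include <stdio.h>

#ifndef EPROBE_DEFER
#define EPROBE_DEFER 517	/* kernel-internal errno, absent from userspace headers */
#endif

/* Hypothetical counter so a caller can observe whether an error was logged. */
int model_errors_logged;

/*
 * Userspace stand-in for dev_err_probe(): report the failure unless it is
 * a probe deferral, and always hand back 'err' so the caller can write
 * "return model_err_probe(...)" on a single line.
 */
int model_err_probe(int err, const char *fmt, ...)
{
	va_list args;

	assert(fmt != NULL);

	if (err != -EPROBE_DEFER) {
		fprintf(stderr, "error %d: ", err);
		va_start(args, fmt);
		vfprintf(stderr, fmt, args);
		va_end(args);
		model_errors_logged++;
	}

	return err;
}
```

With such a helper, the three-way if/else-if chain in the quoted hunk could
collapse into a single check followed by one "return helper(...)" statement,
assuming the lookup really can return an error pointer.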
>>>>>> +
>>>>>> +		cpu_state = kzalloc(sizeof(*cpu_state), GFP_KERNEL);
>>>>>> +		if (!cpu_state) {
>>>>>> +			ret = -ENOMEM;
>>>>>> +			goto out;
>>>>>> +		}
>>>>>> +
>>>>>> +		cpu_dev = get_cpu_device(cpu);
>>>>>> +		if (!cpu_dev) {
>>>>>> +			dev_err(dev, "Couldn't get cpu device.\n");
>>>>>> +			ret = -ENODEV;
>>>>>> +			goto out;
>>>>>> +		}
>>>>>> +
>>>>>> +		opp_table = dev_pm_opp_get_opp_table(cpu_dev);
>>>>>> +		if (IS_ERR(devfreq->opp_table)) {
>>>>>> +			ret = PTR_ERR(opp_table);
>>>>>> +			goto out;
>>>>>> +		}
>>>>>> +
>>>>>> +		cpu_state->cpu_dev = cpu_dev;
>>>>>> +		cpu_state->opp_table = opp_table;
>>>>>> +		cpu_state->first_cpu = cpumask_first(policy->related_cpus);
>>>>>> +		cpu_state->curr_freq = policy->cur;
>>>>>> +		cpu_state->min_freq = policy->cpuinfo.min_freq;
>>>>>> +		cpu_state->max_freq = policy->cpuinfo.max_freq;
>>>>>> +		data->cpu_state[cpu] = cpu_state;
>>>>>> +
>>>>>> +		cpufreq_cpu_put(policy);
>>>>>> +	}
>>>>>> +
>>>>>> +out:
>>>>>> +	put_online_cpus();
>>>>>> +	if (ret)
>>>>>> +		return ret;
>>>>>> +
>>>>>> +	/* Update devfreq */
>>>>>> +	mutex_lock(&devfreq->lock);
>>>>>> +	ret = update_devfreq(devfreq);
>>>>>
>>>>>> +	mutex_unlock(&devfreq->lock);
>>>>>> +	if (ret)
>>>>>> +		dev_err(dev, "Couldn't update the frequency.\n");
>>>>>> +
>>>>>> +	return ret;
>>>>>> +}
>>>>>> +
>>>>>> +static int cpufreq_passive_unregister(struct devfreq_passive_data **p_data)
>>>>>
>>>>> As I commented above, please change the name as follows:
>>>>> - cpufreq_passive_unregister -> cpufreq_passive_unregister_notifier
>>>>>
>>>>>> +{
>>>>>> +	struct devfreq_passive_data *data = *p_data;
>>>>>> +	struct devfreq_cpu_state *cpu_state;
>>>>>> +	int cpu;
>>>>>> +
>>>>>> +	if (data->nb.notifier_call)
>>>>>> +		cpufreq_unregister_notifier(&data->nb,
>>>>>> +					    CPUFREQ_TRANSITION_NOTIFIER);
>>>>>> +
>>>>>> +	for_each_possible_cpu(cpu) {
>>>>>> +		cpu_state = data->cpu_state[cpu];
>>>>>> +		if (cpu_state) {
>>>>>> +			if (cpu_state->opp_table)
>>>>>> +				dev_pm_opp_put_opp_table(cpu_state->opp_table);
>>>>>> +			kfree(cpu_state);
>>>>>> +			cpu_state = NULL;
>>>>>> +		}
>>>>>> +	}
>>>>>> +
>>>>>> +	return 0;
>>>>>> +}
>>>>>> +
>>>>>> +int register_parent_dev_notifier(struct devfreq_passive_data **p_data)
>>>>>> +{
>>>>>> +	struct notifier_block *nb = &(*p_data)->nb;
>>>>>> +	int ret = 0;
>>>>>> +
>>>>>> +	switch ((*p_data)->parent_type) {
>>>>>> +	case DEVFREQ_PARENT_DEV:
>>>>>> +		nb->notifier_call = devfreq_passive_notifier_call;
>>>>>> +		ret = devfreq_register_notifier((struct devfreq *)(*p_data)->parent, nb,
>>>>>> +						DEVFREQ_TRANSITION_NOTIFIER);
>>>>>> +		break;
>>>>>> +	case CPUFREQ_PARENT_DEV:
>>>>>> +		ret = cpufreq_passive_register(p_data);
>>>>>> +		break;
>>>>>> +	default:
>>>>>> +		ret = -EINVAL;
>>>>>> +		break;
>>>>>> +	}
>>>>>> +	return ret;
>>>>>> +}
>>>>>> +
>>>>>> +int unregister_parent_dev_notifier(struct devfreq_passive_data **p_data)
>>>>>> +{
>>>>>> +	int ret = 0;
>>>>>> +
>>>>>> +	switch ((*p_data)->parent_type) {
>>>>>> +	case DEVFREQ_PARENT_DEV:
>>>>>> +		WARN_ON(devfreq_unregister_notifier((struct devfreq *)(*p_data)->parent,
>>>>>> +						    &(*p_data)->nb,
>>>>>> +						    DEVFREQ_TRANSITION_NOTIFIER));
>>>>>> +		break;
>>>>>> +	case CPUFREQ_PARENT_DEV:
>>>>>> +		cpufreq_passive_unregister(p_data);
>>>>>> +		break;
>>>>>> +	default:
>>>>>> +		ret = -EINVAL;
>>>>>> +		break;
>>>>>> +	}
>>>>>> +	return ret;
>>>>>> +}
>>>>>
>>>>> I think that you don't need to define register_parent_dev_notifier
>>>>> and unregister_parent_dev_notifier as separate functions.
>>>>>
>>>>> Instead of separate functions, just add the code
>>>>> into devfreq_passive_event_handler.
>>>>>
>>>>>
>>>>>> +
>>>>>>  static int devfreq_passive_event_handler(struct devfreq *devfreq,
>>>>>>  				unsigned int event, void *data)
>>>>>>  {
>>>>>>  	struct devfreq_passive_data *p_data
>>>>>>  			= (struct devfreq_passive_data *)devfreq->data;
>>>>>>  	struct devfreq *parent = (struct devfreq *)p_data->parent;
>>>>>> -	struct notifier_block *nb = &p_data->nb;
>>>>>>  	int ret = 0;
>>>>>>  
>>>>>> -	if (!parent)
>>>>>> +	if (p_data->parent_type == DEVFREQ_PARENT_DEV && !parent)
>>>>>>  		return -EPROBE_DEFER;
>>>>>>  
>>>>>>  	switch (event) {
>>>>>> @@ -147,13 +446,11 @@ static int devfreq_passive_event_handler(struct devfreq *devfreq,
>>>>>>  		if (!p_data->this)
>>>>>>  			p_data->this = devfreq;
>>>>>>  
>>>>>> -		nb->notifier_call = devfreq_passive_notifier_call;
>>>>>> -		ret = devfreq_register_notifier(parent, nb,
>>>>>> -					DEVFREQ_TRANSITION_NOTIFIER);
>>>>>> +		ret = register_parent_dev_notifier(&p_data);
>>>>>>  		break;
>>>>>> +
>>>>>>  	case DEVFREQ_GOV_STOP:
>>>>>> -		WARN_ON(devfreq_unregister_notifier(parent, nb,
>>>>>> -					DEVFREQ_TRANSITION_NOTIFIER));
>>>>>> +		ret = unregister_parent_dev_notifier(&p_data);
>>>>>>  		break;
>>>>>>  	default:
>>>>>>  		break;
>>>>>> diff --git a/include/linux/devfreq.h b/include/linux/devfreq.h
>>>>>> index 26ea0850be9b..e0093b7c805c 100644
>>>>>> --- a/include/linux/devfreq.h
>>>>>> +++ b/include/linux/devfreq.h
>>>>>> @@ -280,6 +280,25 @@ struct devfreq_simple_ondemand_data {
>>>>>>  
>>>>>>  #if IS_ENABLED(CONFIG_DEVFREQ_GOV_PASSIVE)
>>>>>>  /**
>>>>>> + * struct devfreq_cpu_state - holds the per-cpu state
>>>>>> + * @freq:	the current frequency of the cpu.
>>>>>> + * @min_freq:	the min frequency of the cpu.
>>>>>> + * @max_freq:	the max frequency of the cpu.
>>>>>> + * @first_cpu:	the cpumask of the first cpu of a policy.
>>>>>> + * @dev:	reference to cpu device.
>>>>>> + * @opp_table:	reference to cpu opp table.
>>>>>> + *
>>>>>> + * This structure stores the required cpu_state of a cpu.
>>>>>> + * This is auto-populated by the governor.
>>>>>> + */
>>>>>> +struct devfreq_cpu_state;
>>>>>> +
>>>>>> +enum devfreq_parent_dev_type {
>>>>>> +	DEVFREQ_PARENT_DEV,
>>>>>> +	CPUFREQ_PARENT_DEV,
>>>>>> +};
>>>>>> +
>>>>>> +/**
>>>>>>   * struct devfreq_passive_data - ``void *data`` fed to struct devfreq
>>>>>>   *	and devfreq_add_device
>>>>>>   * @parent:	the devfreq instance of parent device.
>>>>>> @@ -290,13 +309,15 @@ struct devfreq_simple_ondemand_data {
>>>>>>   *			using governors except for passive governor.
>>>>>>   *			If the devfreq device has the specific method to decide
>>>>>>   *			the next frequency, should use this callback.
>>>>>> + * @parent_type:	parent type of the device
>>>>>>   * @this:	the devfreq instance of own device.
>>>>>>   * @nb:		the notifier block for DEVFREQ_TRANSITION_NOTIFIER list
>>>>>> + * @cpu_state:		the state min/max/current frequency of all online cpu's
>>>>>>   *
>>>>>>   * The devfreq_passive_data have to set the devfreq instance of parent
>>>>>>   * device with governors except for the passive governor. But, don't need to
>>>>>> - * initialize the 'this' and 'nb' field because the devfreq core will handle
>>>>>> - * them.
>>>>>> + * initialize the 'this', 'nb' and 'cpu_state' field because the devfreq core
>>>>>> + * will handle them.
>>>>>>   */
>>>>>>  struct devfreq_passive_data {
>>>>>>  	/* Should set the devfreq instance of parent device */
>>>>>> @@ -305,9 +326,13 @@ struct devfreq_passive_data {
>>>>>>  	/* Optional callback to decide the next frequency of passvice device */
>>>>>>  	int (*get_target_freq)(struct devfreq *this, unsigned long *freq);
>>>>>>  
>>>>>> +	/* Should set the type of parent device */
>>>>>> +	enum devfreq_parent_dev_type parent_type;
>>>>>> +
>>>>>>  	/* For passive governor's internal use. Don't need to set them */
>>>>>>  	struct devfreq *this;
>>>>>>  	struct notifier_block nb;
>>>>>> +	struct devfreq_cpu_state *cpu_state[NR_CPUS];
>>>>>>  };
>>>>>>  #endif
>>>>>>  
>>>>>>
>>>>>
>>>>
>>>
>>>
>>
>>
> 


-- 
Best Regards,
Chanwoo Choi
Samsung Electronics

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH V8 1/8] PM / devfreq: Add cpu based scaling support to passive_governor
  2021-04-01  0:16               ` Chanwoo Choi
@ 2021-04-08  2:47                 ` Chanwoo Choi
       [not found]                   ` <1621995727.29827.1.camel@mtksdaap41>
  0 siblings, 1 reply; 14+ messages in thread
From: Chanwoo Choi @ 2021-04-08  2:47 UTC (permalink / raw)
  To: andrew-sh.cheng
  Cc: MyungJoo Ham, Kyungmin Park, Rob Herring, Mark Rutland,
	Matthias Brugger, Rafael J. Wysocki, Viresh Kumar,
	Nishanth Menon, Stephen Boyd, Liam Girdwood, Mark Brown,
	linux-pm, devicetree, linux-arm-kernel, linux-mediatek,
	linux-kernel, srv_heupstream, Sibi Sankar

On 4/1/21 9:16 AM, Chanwoo Choi wrote:
> On 3/31/21 10:03 PM, andrew-sh.cheng wrote:
>> On Wed, 2021-03-31 at 17:35 +0900, Chanwoo Choi wrote:
>>> On 3/31/21 5:27 PM, Chanwoo Choi wrote:
>>>> Hi,
>>>>
>>>> On 3/31/21 5:03 PM, andrew-sh.cheng wrote:
>>>>> On Thu, 2021-03-25 at 17:14 +0900, Chanwoo Choi wrote:
>>>>>> Hi,
>>>>>>
>>>>>> You forgot to send these patches to the linux-pm mailing list.
>>>>>> Please send them to the linux-pm ML.
>>>>>>
>>>>>> Also, before receiving this series, I tried to clean up these patches
>>>>>> on a testing branch[1], so I am adding my comments based on that clean-up.
>>>>>> [1] https://git.kernel.org/pub/scm/linux/kernel/git/chanwoo/linux.git/log/?h=devfreq-testing-passive-gov
>>>>>>
>>>>>> And 'Saravana Kannan <skannan@codeaurora.org>' is the wrong email address.
>>>>>> Please update the address or drop it.
>>>>>
>>>>> Hi Chanwoo,
>>>>>
>>>>> Thank you for the advice.
>>>>> I will resend as patch v9 (adding the linux-pm ML), remove this patch, and note
>>>>> that my patch set is based on
>>>>> https://git.kernel.org/pub/scm/linux/kernel/git/chanwoo/linux.git/log/?h=devfreq-testing-passive-gov
>>>>
>>>> I have not yet tested this patch[1] on the devfreq-testing-passive-gov branch.
>>>> So, if possible, I'd like you to test your patches with this patch[1],
>>>> and if there is no problem, could you send the next patches with patch[1]?
>>>>
>>>> [1] https://git.kernel.org/pub/scm/linux/kernel/git/chanwoo/linux.git/commit/?h=devfreq-testing-passive-gov&id=39c80d11a8f42dd63ecea1e0df595a0ceb83b454
>>>
>>>
>>> Sorry for the confusion. I made the devfreq-testing-passive-gov[1]
>>> branch based on the latest devfreq-next branch.
>>> [1] https://git.kernel.org/pub/scm/linux/kernel/git/chanwoo/linux.git/log/?h=devfreq-testing-passive-gov
>>>
>>> First of all, if possible, I'd like them[1] to be tested with your patches in this series.
>>> If there is no problem, please let me know. After your confirmation,
>>> I'll send the patches of the devfreq-testing-passive-gov[1] branch.
>>> How about that?
>>>
>> Hi Chanwoo~
>>
>> We will use this in the Google Chrome project.
>> Hsin-Yi from Google has tested your patch + my patch set v8 [2~8]:
>>
>>     make sure cci devfreqs runs with cpufreq.
>>     suspend resume
>>     speedometer2 benchmark
>> It is okay.
>>
>> Please send the patches of the devfreq-testing-passive-gov[1] branch.
>>
>> I will send patch v9 based on yours later.
> 
> Thanks for your test. I'll send the patches today.

I'm sorry for the delay. When I tested the patches
for the devfreq parent type on Odroid-XU3, I found some problems
related to lazy linking of OPPs, and I'm trying to analyze them.
Unfortunately, we need to postpone these patches to the next Linux
version.


[snip]

-- 
Best Regards,
Chanwoo Choi
Samsung Electronics

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH V8 1/8] PM / devfreq: Add cpu based scaling support to passive_governor
       [not found]                   ` <1621995727.29827.1.camel@mtksdaap41>
@ 2021-05-26  3:08                     ` Chanwoo Choi
       [not found]                       ` <1622431376.14423.5.camel@mtksdaap41>
  0 siblings, 1 reply; 14+ messages in thread
From: Chanwoo Choi @ 2021-05-26  3:08 UTC (permalink / raw)
  To: andrew-sh.cheng
  Cc: MyungJoo Ham, Kyungmin Park, Rob Herring, Mark Rutland,
	Matthias Brugger, Rafael J. Wysocki, Viresh Kumar,
	Nishanth Menon, Stephen Boyd, Liam Girdwood, Mark Brown,
	linux-pm, devicetree, linux-arm-kernel, linux-mediatek,
	linux-kernel, srv_heupstream, Sibi Sankar

Hi,
On 5/26/21 11:22 AM, andrew-sh.cheng wrote:
> On Thu, 2021-04-08 at 11:47 +0900, Chanwoo Choi wrote:
>> [snip]
> Hi Chanwoo Choi~
> 
> I heard that you have been busy with another task recently.
> May I ask what your plan is for this patch?
> Thank you.

Sorry for the late reply. I have a question.
When I tested exynos-bus.c with the 'required-opps' property added
on the Odroid-XU3 board, I got the following failure.

When _set_required_opps() is called, _set_required_opp() always returns
an -EBUSY error because of the following lazy linking case[1].

[1] https://elixir.bootlin.com/linux/v5.13-rc3/source/drivers/opp/core.c#L896

/* required-opps not fully initialized yet */
if (lazy_linking_pending(opp_table))
	return -EBUSY;  


When dev_pm_opp_of_add_table() is called, the lazy_link_required_opp_table() function
is called as well. But there is a constraint[2]: if is_genpd of the opp_table is false,
drivers/opp/of.c cannot resolve the lazy linking issue.

[2]  https://elixir.bootlin.com/linux/v5.13-rc3/source/drivers/opp/of.c#L386

/* Link required OPPs for all OPPs of the newly added OPP table */
static void lazy_link_required_opp_table(struct opp_table *new_table)
{
	struct opp_table *opp_table, *temp, **required_opp_tables;
	struct device_node *required_np, *opp_np, *required_table_np;
	struct dev_pm_opp *opp;
	int i, ret;

	/*
	 * We only support genpd's OPPs in the "required-opps" for now,
	 * as we don't know much about other cases.
	 */
	if (!new_table->is_genpd)
		return;

Even in this case, is there no problem in your test case?
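
To make the failure mode concrete, here is a simplified userspace model of
the two checks quoted above. It is not kernel code: the struct and the
model_* function names are hypothetical, and the single lazy_pending flag
stands in for the OPP core's bookkeeping of tables whose required-opps are
not linked yet. It only illustrates why the waiting table stays pending when
the newly added table is not a genpd table, so the request keeps failing
with -EBUSY.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily reduced stand-in for struct opp_table. */
struct model_opp_table {
	bool is_genpd;		/* table belongs to a generic power domain */
	bool lazy_pending;	/* required-opps of this table are not linked yet */
};

/*
 * Models the early return in lazy_link_required_opp_table(): when the
 * newly added table is not a genpd table, nothing is linked and the
 * waiting table stays pending.
 */
void model_lazy_link(const struct model_opp_table *new_table,
		     struct model_opp_table *waiting)
{
	assert(new_table != NULL && waiting != NULL);

	if (!new_table->is_genpd)
		return;

	waiting->lazy_pending = false;
}

/* Models the pending check: -EBUSY for as long as linking has not happened. */
int model_set_required_opps(const struct model_opp_table *table)
{
	assert(table != NULL);

	if (table->lazy_pending)
		return -EBUSY;

	return 0;
}
```

In this model, a waiting table whose required table has is_genpd == false
never leaves the pending state, which matches the behaviour described for
the Odroid-XU3 test.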

-- 
Best Regards,
Chanwoo Choi
Samsung Electronics

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH V8 1/8] PM / devfreq: Add cpu based scaling support to passive_governor
       [not found]                       ` <1622431376.14423.5.camel@mtksdaap41>
@ 2021-05-31  7:56                         ` Chanwoo Choi
       [not found]                           ` <CACb=7PUkpMkDOJ6dDHXhJ5ep4e9u8ZVYM8M2iC-iwHXn13t3DQ@mail.gmail.com>
  0 siblings, 1 reply; 14+ messages in thread
From: Chanwoo Choi @ 2021-05-31  7:56 UTC (permalink / raw)
  To: andrew-sh.cheng, Hsin-Yi Wang
  Cc: MyungJoo Ham, Kyungmin Park, Rob Herring, Mark Rutland,
	Matthias Brugger, Rafael J. Wysocki, Viresh Kumar,
	Nishanth Menon, Stephen Boyd, Liam Girdwood, Mark Brown,
	linux-pm, devicetree, linux-arm-kernel, linux-mediatek,
	linux-kernel, srv_heupstream, Sibi Sankar

Hi,

On 5/31/21 12:22 PM, andrew-sh.cheng wrote:
> On Wed, 2021-05-26 at 12:08 +0900, Chanwoo Choi wrote:
>> [snip]
> 
> Hi Chanwoo~
> Sorry for the late reply.
> Yes, we met a similar issue.
> Hsin-Yi from Google helped deal with this issue on the Chrome project.
> 
> Patch segment:
> @ /drivers/opp/of.c
> 
> /* Link required OPPs for all OPPs of the newly added OPP table */
> static void lazy_link_required_opp_table(struct opp_table *new_table)
> {
> 	struct opp_table *opp_table, *temp, **required_opp_tables;
> 	struct device_node *required_np, *opp_np, *required_table_np;
> 	struct dev_pm_opp *opp;
> 	int i, ret;
> 
> +	/*
> +	 * We only support genpd's OPPs in the "required-opps" for now,
> +	 * as we don't know much about other cases.
> +	 */
> +	if (!new_table->is_genpd)
> +		return;
> 
> 
> Hsin-Yi replied about this issue in the discussion on the original lazy
> link thread:
> https://patchwork.kernel.org/project/linux-pm/patch/20190717222340.137578-4-saravanak@google.com/#23932203
> 
> Looping in Hsin-Yi here.
> You can discuss with her if you need more detail.
> 
> Thank you both.
> 

Thanks. First of all, we need to discuss and resolve this issue.


-- 
Best Regards,
Chanwoo Choi
Samsung Electronics

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH V8 1/8] PM / devfreq: Add cpu based scaling support to passive_governor
       [not found]                           ` <CACb=7PUkpMkDOJ6dDHXhJ5ep4e9u8ZVYM8M2iC-iwHXn13t3DQ@mail.gmail.com>
@ 2021-05-31  8:13                             ` Chanwoo Choi
  0 siblings, 0 replies; 14+ messages in thread
From: Chanwoo Choi @ 2021-05-31  8:13 UTC (permalink / raw)
  To: Hsin-Yi Wang
  Cc: andrew-sh.cheng, MyungJoo Ham, Kyungmin Park, Rob Herring,
	Mark Rutland, Matthias Brugger, Rafael J. Wysocki, Viresh Kumar,
	Nishanth Menon, Stephen Boyd, Liam Girdwood, Mark Brown,
	linux-pm, devicetree, linux-arm-kernel, linux-mediatek,
	linux-kernel, srv_heupstream, Sibi Sankar

On 5/31/21 4:42 PM, Hsin-Yi Wang wrote:
> 
> 
> On Mon, May 31, 2021 at 3:37 PM Chanwoo Choi <cw00.choi@samsung.com> wrote:
> 
>     Hi,
> 
>     On 5/31/21 12:22 PM, andrew-sh.cheng wrote:
>     > On Wed, 2021-05-26 at 12:08 +0900, Chanwoo Choi wrote:
>     >> Hi,
>     >> On 5/26/21 11:22 AM, andrew-sh.cheng wrote:
>     >>> On Thu, 2021-04-08 at 11:47 +0900, Chanwoo Choi wrote:
>     >>>> On 4/1/21 9:16 AM, Chanwoo Choi wrote:
>     >>>>> On 3/31/21 10:03 PM, andrew-sh.cheng wrote:
>     >>>>>> On Wed, 2021-03-31 at 17:35 +0900, Chanwoo Choi wrote:
>     >>>>>>> On 3/31/21 5:27 PM, Chanwoo Choi wrote:
>     >>>>>>>> Hi,
>     >>>>>>>>
>     >>>>>>>> On 3/31/21 5:03 PM, andrew-sh.cheng wrote:
>     >>>>>>>>> On Thu, 2021-03-25 at 17:14 +0900, Chanwoo Choi wrote:
>     >>>>>>>>>> Hi,
>     >>>>>>>>>>
>     >>>>>>>>>> You are missing to add these patches to linux-pm mailing list.
>     >>>>>>>>>> Need to send them to linu-pm ML.
>     >>>>>>>>>>
>     >>>>>>>>>> Also, before received this series, I tried to clean-up these patches
>     >>>>>>>>>> on testing branch[1]. So that I add my comment with my clean-up case.
>     >>>>>>>>>> [1] https://git.kernel.org/pub/scm/linux/kernel/git/chanwoo/linux.git/log/?h=devfreq-testing-passive-gov
>     >>>>>>>>>>
>     >>>>>>>>>> And 'Saravana Kannan <skannan@codeaurora.org>' is wrong email address.
>     >>>>>>>>>> Please update the email or drop this email.
>     >>>>>>>>>
>     >>>>>>>>> Hi Chanwoo,
>     >>>>>>>>>
>     >>>>>>>>> Thank you for the advices.
>     >>>>>>>>> I will resend patch v9 (add to linux-pm ML), remove this patch, and note
>     >>>>>>>>> that my patch set base on
>     >>>>>>>>> https://git.kernel.org/pub/scm/linux/kernel/git/chanwoo/linux.git/log/?h=devfreq-testing-passive-gov
>     >>>>>>>>
>     >>>>>>>> I has not yet test this patch[1] on devfreq-testing-passive-gov branch.
>     >>>>>>>> So that if possible, I'd like you to test your patches with this patch[1]
>     >>>>>>>> and then if there is no problem, could you send the next patches with patch[1]?
>     >>>>>>>>
>     >>>>>>>> [1] https://git.kernel.org/pub/scm/linux/kernel/git/chanwoo/linux.git/commit/?h=devfreq-testing-passive-gov&id=39c80d11a8f42dd63ecea1e0df595a0ceb83b454
>     >>>>>>>
>     >>>>>>>
>     >>>>>>> Sorry for the confusion. I make the devfreq-testing-passive-gov[1]
>     >>>>>>> branch based on latest devfreq-next branch.
>     >>>>>>> [1] https://git.kernel.org/pub/scm/linux/kernel/git/chanwoo/linux.git/log/?h=devfreq-testing-passive-gov
>     >>>>>>>
>     >>>>>>> First of all, if possible, I want to test them[1] with your patches in this series.
>     >>>>>>> And then if there are no any problem, please let me know. After confirmed from you,
>     >>>>>>> I'll send the patches of devfreq-testing-passive-gov[1] branch.
>     >>>>>>> How about that?
>     >>>>>>>
>     >>>>>> Hi Chanwoo~
>     >>>>>>
>     >>>>>> We will use this on Google Chrome project.
>     >>>>>> Google Hsin-Yi has test your patch + my patch set v8 [2~8]
>     >>>>>>
>     >>>>>>     make sure cci devfreqs runs with cpufreq.
>     >>>>>>     suspend resume
>     >>>>>>     speedometer2 benchmark
>     >>>>>> It is okay.
>     >>>>>>
>     >>>>>> Please send the patches of devfreq-testing-passive-gov[1] branch.
>     >>>>>>
>     >>>>>> I will send patch v9 base on yours latter.
>     >>>>>
>     >>>>> Thanks for your test. I'll send the patches today.
>     >>>>
>     >>>> I'm sorry for the delay. When I tested the patches
>     >>>> for the devfreq parent type on Odroid-xu3, there were some problems
>     >>>> related to lazy linking of OPP, so I'm trying to analyze them.
>     >>>> Unfortunately, we need to postpone these patches to the next Linux
>     >>>> version.
>     >>>>
>     >>> Hi Chanwoo Choi~
>     >>>
>     >>> I heard that you have been busy with another task recently.
>     >>> May I know your plan for this patch?
>     >>> Thank you.
>     >>
>     >> Sorry for the late work. I have a question.
>     >> When I tested exynos-bus.c after adding the 'required-opps' property
>     >> on the odroid-xu3 board, I got a failure.
>     >>
>     >> When calling _set_required_opps(), _set_required_opp() always returns
>     >> the -EBUSY error because of the following lazy linking case[1].
>     >>
>     >> [1] https://elixir.bootlin.com/linux/v5.13-rc3/source/drivers/opp/core.c#L896
>     >>
>     >> /* required-opps not fully initialized yet */
>     >> if (lazy_linking_pending(opp_table))
>     >>      return -EBUSY; 
>     >>
>     >>
>     >> When dev_pm_opp_of_add_table() is called, the lazy_link_required_opp_table()
>     >> function is called. But there is a constraint[2]: if is_genpd of the opp_table
>     >> is false, drivers/opp/of.c cannot resolve the lazy linking.
>     >>
>     >> [2] https://elixir.bootlin.com/linux/v5.13-rc3/source/drivers/opp/of.c#L386
>     >>
>     >> /* Link required OPPs for all OPPs of the newly added OPP table */
>     >> static void lazy_link_required_opp_table(struct opp_table *new_table)
>     >> {
>     >>      struct opp_table *opp_table, *temp, **required_opp_tables;
>     >>      struct device_node *required_np, *opp_np, *required_table_np;
>     >>      struct dev_pm_opp *opp;
>     >>      int i, ret;
>     >>
>     >>      /*
>     >>       * We only support genpd's OPPs in the "required-opps" for now,
>     >>       * as we don't know much about other cases.
>     >>       */
>     >>      if (!new_table->is_genpd)
>     >>              return;
>     >>
>     >> Even in this case, is there no problem in your test case?
>     >>
>     >
>     > Hi Chanwoo~
>     > Sorry for the late reply.
>     > Yes, we met a similar issue.
>     > Hsin-Yi from Google helped deal with this issue in the Chrome project.
>     >
>     > Patch segment:
>     > @ /drivers/opp/of.c
>     >
>     > /* Link required OPPs for all OPPs of the newly added OPP table */
>     > static void lazy_link_required_opp_table(struct opp_table *new_table)
>     > {
>     >       struct opp_table *opp_table, *temp, **required_opp_tables;
>     >       struct device_node *required_np, *opp_np, *required_table_np;
>     >       struct dev_pm_opp *opp;
>     >       int i, ret;
>     >
>     > +     /*
>     > +      * We only support genpd's OPPs in the "required-opps" for now,
>     > +      * as we don't know much about other cases.
>     > +      */
>     > +     if (!new_table->is_genpd)
>     > +             return;
>     >
>     >
>     > Hsin-Yi replied about this issue in the original lazy
>     > link thread:
>     > https://patchwork.kernel.org/project/linux-pm/patch/20190717222340.137578-4-saravanak@google.com/#23932203
>     >
>     > Looping in Hsin-Yi here.
>     > You can discuss with her if you need more detail.
>     >
>     > Thank you both.
>     >
> 
>     Thanks. First of all, we need to resolve and discuss this issue.
> 
> 
> Hi Chanwoo, 
> 
> We think removing the genpd check is sufficient for our use case since we only use the lazy link for opp table translation.

Hi Hsin-Yi,

IMHO, the 'is_genpd' checks should be removed so that devices other than
genpd can also use 'required-opps', as follows:

diff --git a/drivers/opp/of.c b/drivers/opp/of.c
index c582a9ca397b..b54d3a985515 100644
--- a/drivers/opp/of.c
+++ b/drivers/opp/of.c
@@ -201,17 +201,6 @@ static void _opp_table_alloc_required_tables(struct opp_table *opp_table,
                        lazy = true;
                        continue;
                }
-
-               /*
-                * We only support genpd's OPPs in the "required-opps" for now,
-                * as we don't know how much about other cases. Error out if the
-                * required OPP doesn't belong to a genpd.
-                */
-               if (!required_opp_tables[i]->is_genpd) {
-                       dev_err(dev, "required-opp doesn't belong to genpd: %pOF\n",
-                               required_np);
-                       goto free_required_tables;
-               }
        }
 
        /* Let's do the linking later on */
@@ -379,13 +368,6 @@ static void lazy_link_required_opp_table(struct opp_table *new_table)
        struct dev_pm_opp *opp;
        int i, ret;
 
-       /*
-        * We only support genpd's OPPs in the "required-opps" for now,
-        * as we don't know much about other cases.
-        */
-       if (!new_table->is_genpd)
-               return;
-
        mutex_lock(&opp_table_lock);
 
        list_for_each_entry_safe(opp_table, temp, &lazy_opp_tables, lazy) {
@@ -874,7 +856,7 @@ static struct dev_pm_opp *_opp_add_static_v2(struct opp_table *opp_table,
                return ERR_PTR(-ENOMEM);
 
        ret = _read_opp_key(new_opp, opp_table, np, &rate_not_available);
-       if (ret < 0 && !opp_table->is_genpd) {
+       if (ret < 0) {
                dev_err(dev, "%s: opp key field not found\n", __func__);
                goto free_opp;
        }
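
As a rough illustration of why these checks matter, here is a minimal
userspace sketch (not kernel code) of the behaviour discussed above. All
names (model_opp_table, model_lazy_link, model_set_required_opps,
model_demo) are hypothetical and only model the two checks quoted in this
thread: the early return in lazy_link_required_opp_table() and the
lazy_linking_pending() test that yields -EBUSY.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-in for struct opp_table. */
struct model_opp_table {
	bool is_genpd;      /* table belongs to a generic power domain */
	bool lazy_pending;  /* required-opps are not linked yet */
};

/*
 * Models the early return in lazy_link_required_opp_table(): with the
 * genpd-only check enabled, a non-genpd table never links its consumers.
 */
static void model_lazy_link(struct model_opp_table *consumer,
			    const struct model_opp_table *new_table,
			    bool genpd_only)
{
	assert(consumer != NULL && new_table != NULL);

	if (genpd_only && !new_table->is_genpd)
		return;  /* linking skipped, consumer stays pending */

	consumer->lazy_pending = false;
}

/* Models the lazy_linking_pending() check that returns -EBUSY. */
static int model_set_required_opps(const struct model_opp_table *consumer)
{
	return consumer->lazy_pending ? -EBUSY : 0;
}

/* Adds a non-genpd parent table, then tries to set the required OPPs. */
static int model_demo(bool genpd_only)
{
	struct model_opp_table parent = { .is_genpd = false, .lazy_pending = false };
	struct model_opp_table consumer = { .is_genpd = false, .lazy_pending = true };

	model_lazy_link(&consumer, &parent, genpd_only);
	return model_set_required_opps(&consumer);
}
```

In this model, model_demo(true) keeps returning -EBUSY because the
non-genpd parent is never linked, while model_demo(false) returns 0, which
is the effect the diff above is aiming for under these assumptions.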


-- 
Best Regards,
Chanwoo Choi
Samsung Electronics


end of thread, other threads:[~2021-05-31  7:55 UTC | newest]

Thread overview: 14+ messages
     [not found] <1616499241-4906-1-git-send-email-andrew-sh.cheng@mediatek.com>
     [not found] ` <CGME20210323113411epcas1p3b4367563007ca91c30201d7fc225bb67@epcas1p3.samsung.com>
     [not found]   ` <1616499241-4906-2-git-send-email-andrew-sh.cheng@mediatek.com>
2021-03-25  7:42     ` [PATCH V8 1/8] PM / devfreq: Add cpu based scaling support to passive_governor Chanwoo Choi
2021-03-25  8:14     ` Chanwoo Choi
     [not found]       ` <1617177820.15067.1.camel@mtksdaap41>
2021-03-31  8:27         ` Chanwoo Choi
2021-03-31  8:35           ` Chanwoo Choi
     [not found]             ` <1617195800.18432.3.camel@mtksdaap41>
2021-04-01  0:16               ` Chanwoo Choi
2021-04-08  2:47                 ` Chanwoo Choi
     [not found]                   ` <1621995727.29827.1.camel@mtksdaap41>
2021-05-26  3:08                     ` Chanwoo Choi
     [not found]                       ` <1622431376.14423.5.camel@mtksdaap41>
2021-05-31  7:56                         ` Chanwoo Choi
     [not found]                           ` <CACb=7PUkpMkDOJ6dDHXhJ5ep4e9u8ZVYM8M2iC-iwHXn13t3DQ@mail.gmail.com>
2021-05-31  8:13                             ` Chanwoo Choi
2021-03-31 10:46       ` Hsin-Yi Wang
     [not found] ` <CGME20210323113411epcas1p3dcc8649a2e3bed66866e3470d7aab447@epcas1p3.samsung.com>
     [not found]   ` <1616499241-4906-5-git-send-email-andrew-sh.cheng@mediatek.com>
2021-03-25  8:04     ` [PATCH V8 4/8] devfreq: add mediatek cci devfreq Chanwoo Choi
     [not found] ` <CGME20210323113413epcas1p1d3acc9ac2539da96b0757a0159bdcfc7@epcas1p1.samsung.com>
     [not found]   ` <1616499241-4906-8-git-send-email-andrew-sh.cheng@mediatek.com>
2021-03-25  8:11     ` [PATCH V8 7/8] devfreq: mediatek: cci devfreq register opp notification for SVS support Chanwoo Choi
     [not found] ` <1616499241-4906-3-git-send-email-andrew-sh.cheng@mediatek.com>
2021-03-30  4:36   ` [PATCH V8 2/8] cpufreq: mediatek: Enable clock and regulator Viresh Kumar
     [not found]     ` <1617168099.18405.8.camel@mtksdaap41>
2021-03-31  6:17       ` Viresh Kumar
