All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v5 0/6] hwmon: (fam15h_power) Introduce an accumulated power reporting algorithm
@ 2016-03-28  5:32 Huang Rui
  2016-03-28  5:32 ` [PATCH v5 1/6] hwmon: (fam15h_power) Add CPU_SUP_AMD as the dependence Huang Rui
                   ` (5 more replies)
  0 siblings, 6 replies; 15+ messages in thread
From: Huang Rui @ 2016-03-28  5:32 UTC (permalink / raw)
  To: Guenter Roeck, Jean Delvare
  Cc: linux-hwmon, linux-kernel, spg_linux_kernel, Borislav Petkov, Huang Rui

Hi Guenter,

This serial of patches introduces an accumulated power reporting
algorithm. It will calculate the average power consumption for the
processor. The cpu feature flag is CPUID.8000_0007H:EDX[12].

This algorithm is used to test the comparison of processor power
consumption with between MWAITX delay and TSC delay on AMD Carrizo
platforms.

Reference:
https://lkml.kernel.org/r/1438744732-1459-1-git-send-email-ray.huang@amd.com

Commit f96756 at tip ("x86/asm: Add MONITORX/MWAITX instruction support")
Commit b466bd at tip ("x86/asm/delay: Introduce an MWAITX-based delay with a configurable timer")

V1: https://lkml.kernel.org/r/1440662866-28716-1-git-send-email-ray.huang@amd.com
V2: https://lkml.kernel.org/r/1445308109-17970-1-git-send-email-ray.huang@amd.com
V3: https://lkml.kernel.org/r/1446199024-1896-1-git-send-email-ray.huang@amd.com
V4: https://lkml.kernel.org/r/1457662670-3354-1-git-send-email-ray.huang@amd.com

On V5, I used the new x86 topology with compute unit id (cpu_core_id)
and compute unit numbers (x86_max_cores) for AMD processors. The
change and documentation will go to master soon as below link. And
This way is better to use smp_num_siblings which is un-defined
variable out of CONFIG_SMP.

Refer:
https://git.kernel.org/cgit/linux/kernel/git/bp/bp.git/commit/?h=tip-urgent&id=f6ce1851fa2eec2c332255fc25a544658dbfbfe4

Changes from v1 -> v2:
- Move fam15h_power_groups and fam15h_power_group into fam15h_power_data to
  avoid overwrite on multi-CPU system.
- Rename FAM15H_MIN_NUM_ATTRS macro and fix return error code.
- Remove unnecessary warning print.
- Adds do_read_registers_on_cu to do all the read to all MSRs and run it on one
  of the online cores on each compute unit with smp_call_function_many().
- Use power1_average and power1_average_interval standard entry
  instread of power1_acc
- Fix the CPU-hotplug case.

Changes from v2 -> v3:
- As Guenter's suggestion, remove typecast, use &data->groups[0].
- Remove all "fam15_power_*" prefix at data.
- Remove unnecessary ( ).
- Fix the issue that is reported by build test robot, and add
  CPU_SUP_AMD as the dependence of fam15h_power
- Remove the WARN_ON at do_read_registers_on_cu, because it must be
  behind CPUID check. The MSR must be available since
  CPUID.8000_0007H:EDX[12] is set 
- Add get_online_cpus()/put_online_cpus() functions.
- Refine commments and the method which generate cpumask for cu.
- Add the interval scope to make the value suitable for user
  experience
- Remove the useless mutex.

Changes from v3 -> v4:
- Rebase the patches to latest groeck/hwmon-next.
- Use smp_num_siblings instead of cores_per_cu accessor.
- Refine the cpumask method which is inspired by perf solution.
- Fix some typo and errors.

Changes from v4 -> v5:
- Rebase patches to v4.6-rc1.
- Use new x86 topology with compute id (cpu_core_id) and compute unit
  number (x86_max_cores) instead of smp_num_siblings.

A simple example:

ray@hr-ub:~/tip$ sensors
fam15h_power-pci-00c4
Adapter: PCI adapter
power1:       19.58 mW (avg =   2.55 mW, interval =   0.01 s)
                       (crit =  15.00 W)

...

These patches are rebased on v4.6-rc1.

Thanks,
Rui

Huang Rui (6):
  hwmon: (fam15h_power) Add CPU_SUP_AMD as the dependence
  hwmon: (fam15h_power) Add compute unit accumulated power
  hwmon: (fam15h_power) Add ptsc counter value for accumulated power
  hwmon: (fam15h_power) Introduce a cpu accumulated power reporting
    algorithm
  hwmon: (fam15h_power) Add documentation for TDP and accumulated power
    algorithm
  hwmon: (fam15h_power) Add platform check function

 Documentation/hwmon/fam15h_power |  57 ++++++++++-
 drivers/hwmon/Kconfig            |   2 +-
 drivers/hwmon/fam15h_power.c     | 199 ++++++++++++++++++++++++++++++++++++++-
 3 files changed, 252 insertions(+), 6 deletions(-)

-- 
1.9.1


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH v5 1/6] hwmon: (fam15h_power) Add CPU_SUP_AMD as the dependence
  2016-03-28  5:32 [PATCH v5 0/6] hwmon: (fam15h_power) Introduce an accumulated power reporting algorithm Huang Rui
@ 2016-03-28  5:32 ` Huang Rui
  2016-03-28  8:59   ` Borislav Petkov
  2016-03-28  5:32 ` [PATCH v5 2/6] hwmon: (fam15h_power) Add compute unit accumulated power Huang Rui
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 15+ messages in thread
From: Huang Rui @ 2016-03-28  5:32 UTC (permalink / raw)
  To: Guenter Roeck, Jean Delvare
  Cc: linux-hwmon, linux-kernel, spg_linux_kernel, Borislav Petkov, Huang Rui

This patch adds CONFIG_CPU_SUP_AMD as the dependence of fam15h_power
driver. Because the following patch will use the interface from
x86/kernel/cpu/amd.c.

Otherwise, the below error might be encountered:

All errors (new ones prefixed by >>):

   drivers/built-in.o: In function `fam15h_power_probe':
>> fam15h_power.c:(.text+0x26e3a3): undefined reference to
>> `amd_get_cores_per_cu'
   fam15h_power.c:(.text+0x26e41e): undefined reference to
`amd_get_cores_per_cu'

Reported-by: build test robot <lkp@intel.com>
Signed-off-by: Huang Rui <ray.huang@amd.com>
---
 drivers/hwmon/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/hwmon/Kconfig b/drivers/hwmon/Kconfig
index 5c2d13a..4be3792 100644
--- a/drivers/hwmon/Kconfig
+++ b/drivers/hwmon/Kconfig
@@ -288,7 +288,7 @@ config SENSORS_K10TEMP
 
 config SENSORS_FAM15H_POWER
 	tristate "AMD Family 15h processor power"
-	depends on X86 && PCI
+	depends on X86 && PCI && CPU_SUP_AMD
 	help
 	  If you say yes here you get support for processor power
 	  information of your AMD family 15h CPU.
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v5 2/6] hwmon: (fam15h_power) Add compute unit accumulated power
  2016-03-28  5:32 [PATCH v5 0/6] hwmon: (fam15h_power) Introduce an accumulated power reporting algorithm Huang Rui
  2016-03-28  5:32 ` [PATCH v5 1/6] hwmon: (fam15h_power) Add CPU_SUP_AMD as the dependence Huang Rui
@ 2016-03-28  5:32 ` Huang Rui
  2016-03-28  9:29   ` Borislav Petkov
  2016-03-28  5:32 ` [PATCH v5 3/6] hwmon: (fam15h_power) Add ptsc counter value for " Huang Rui
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 15+ messages in thread
From: Huang Rui @ 2016-03-28  5:32 UTC (permalink / raw)
  To: Guenter Roeck, Jean Delvare
  Cc: linux-hwmon, linux-kernel, spg_linux_kernel, Borislav Petkov, Huang Rui

This patch adds a member in fam15h_power_data which specifies the
compute unit accumulated power. It adds do_read_registers_on_cu to do
all the read to all MSRs and run it on one of the online cores on each
compute unit with smp_call_function_many(). This behavior can decrease
IPI numbers.

Suggested-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Huang Rui <ray.huang@amd.com>
---
 drivers/hwmon/fam15h_power.c | 65 +++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 64 insertions(+), 1 deletion(-)

diff --git a/drivers/hwmon/fam15h_power.c b/drivers/hwmon/fam15h_power.c
index 4f695d8..ccbc944 100644
--- a/drivers/hwmon/fam15h_power.c
+++ b/drivers/hwmon/fam15h_power.c
@@ -25,6 +25,8 @@
 #include <linux/module.h>
 #include <linux/pci.h>
 #include <linux/bitops.h>
+#include <linux/cpu.h>
+#include <linux/cpumask.h>
 #include <asm/processor.h>
 #include <asm/msr.h>
 
@@ -44,7 +46,9 @@ MODULE_LICENSE("GPL");
 
 #define FAM15H_MIN_NUM_ATTRS		2
 #define FAM15H_NUM_GROUPS		2
+#define MAX_CUS				8
 
+#define MSR_F15H_CU_PWR_ACCUMULATOR	0xc001007a
 #define MSR_F15H_CU_MAX_PWR_ACCUMULATOR	0xc001007b
 
 #define PCI_DEVICE_ID_AMD_15H_M70H_NB_F4 0x15b4
@@ -59,6 +63,8 @@ struct fam15h_power_data {
 	struct attribute_group group;
 	/* maximum accumulated power of a compute unit */
 	u64 max_cu_acc_power;
+	/* accumulated power of the compute units */
+	u64 cu_acc_power[MAX_CUS];
 };
 
 static ssize_t show_power(struct device *dev,
@@ -125,6 +131,63 @@ static ssize_t show_power_crit(struct device *dev,
 }
 static DEVICE_ATTR(power1_crit, S_IRUGO, show_power_crit, NULL);
 
+static void do_read_registers_on_cu(void *_data)
+{
+	struct fam15h_power_data *data = _data;
+	int cpu, cu;
+
+	cpu = smp_processor_id();
+
+	/*
+	 * With the new x86 topology modelling, cpu core id actually
+	 * is compute unit id.
+	 */
+	cu = cpu_data(cpu).cpu_core_id;
+
+	rdmsrl_safe(MSR_F15H_CU_PWR_ACCUMULATOR, &data->cu_acc_power[cu]);
+}
+
+/*
+ * This function is only able to be called when CPUID
+ * Fn8000_0007:EDX[12] is set.
+ */
+static int read_registers(struct fam15h_power_data *data)
+{
+	int this_cpu, ret, cpu;
+	int target;
+	cpumask_var_t mask;
+
+	ret = zalloc_cpumask_var(&mask, GFP_KERNEL);
+	if (!ret)
+		return -ENOMEM;
+
+	get_online_cpus();
+	this_cpu = get_cpu();
+
+	/*
+	 * Choose the first online core of each compute unit, and then
+	 * read their MSR value of power and ptsc in a single IPI,
+	 * because the MSR value of CPU core represent the compute
+	 * unit's.
+	 */
+	for_each_online_cpu(cpu) {
+		target = cpumask_first(topology_sibling_cpumask(cpu));
+		if (!cpumask_test_cpu(target, mask))
+			cpumask_set_cpu(target, mask);
+	}
+
+	if (cpumask_test_cpu(this_cpu, mask))
+		do_read_registers_on_cu(data);
+
+	smp_call_function_many(mask, do_read_registers_on_cu, data, true);
+	put_cpu();
+	put_online_cpus();
+
+	free_cpumask_var(mask);
+
+	return 0;
+}
+
 static int fam15h_power_init_attrs(struct pci_dev *pdev,
 				   struct fam15h_power_data *data)
 {
@@ -263,7 +326,7 @@ static int fam15h_power_init_data(struct pci_dev *f4,
 
 	data->max_cu_acc_power = tmp;
 
-	return 0;
+	return read_registers(data);
 }
 
 static int fam15h_power_probe(struct pci_dev *pdev,
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v5 3/6] hwmon: (fam15h_power) Add ptsc counter value for accumulated power
  2016-03-28  5:32 [PATCH v5 0/6] hwmon: (fam15h_power) Introduce an accumulated power reporting algorithm Huang Rui
  2016-03-28  5:32 ` [PATCH v5 1/6] hwmon: (fam15h_power) Add CPU_SUP_AMD as the dependence Huang Rui
  2016-03-28  5:32 ` [PATCH v5 2/6] hwmon: (fam15h_power) Add compute unit accumulated power Huang Rui
@ 2016-03-28  5:32 ` Huang Rui
  2016-03-28  5:32 ` [PATCH v5 4/6] hwmon: (fam15h_power) Introduce a cpu accumulated power reporting algorithm Huang Rui
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 15+ messages in thread
From: Huang Rui @ 2016-03-28  5:32 UTC (permalink / raw)
  To: Guenter Roeck, Jean Delvare
  Cc: linux-hwmon, linux-kernel, spg_linux_kernel, Borislav Petkov, Huang Rui

PTSC is the performance timestamp counter value in a cpu core and the
cores in one compute unit have the fixed frequency. So it picks up the
performance timestamp counter value of the first core per compute unit
to measure the interval for average power per compute unit.

Signed-off-by: Huang Rui <ray.huang@amd.com>
Cc: Borislav Petkov <bp@alien8.de>
---
 drivers/hwmon/fam15h_power.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/hwmon/fam15h_power.c b/drivers/hwmon/fam15h_power.c
index ccbc944..de6f52b 100644
--- a/drivers/hwmon/fam15h_power.c
+++ b/drivers/hwmon/fam15h_power.c
@@ -50,6 +50,7 @@ MODULE_LICENSE("GPL");
 
 #define MSR_F15H_CU_PWR_ACCUMULATOR	0xc001007a
 #define MSR_F15H_CU_MAX_PWR_ACCUMULATOR	0xc001007b
+#define MSR_F15H_PTSC			0xc0010280
 
 #define PCI_DEVICE_ID_AMD_15H_M70H_NB_F4 0x15b4
 
@@ -65,6 +66,8 @@ struct fam15h_power_data {
 	u64 max_cu_acc_power;
 	/* accumulated power of the compute units */
 	u64 cu_acc_power[MAX_CUS];
+	/* performance timestamp counter */
+	u64 cpu_sw_pwr_ptsc[MAX_CUS];
 };
 
 static ssize_t show_power(struct device *dev,
@@ -145,6 +148,7 @@ static void do_read_registers_on_cu(void *_data)
 	cu = cpu_data(cpu).cpu_core_id;
 
 	rdmsrl_safe(MSR_F15H_CU_PWR_ACCUMULATOR, &data->cu_acc_power[cu]);
+	rdmsrl_safe(MSR_F15H_PTSC, &data->cpu_sw_pwr_ptsc[cu]);
 }
 
 /*
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v5 4/6] hwmon: (fam15h_power) Introduce a cpu accumulated power reporting algorithm
  2016-03-28  5:32 [PATCH v5 0/6] hwmon: (fam15h_power) Introduce an accumulated power reporting algorithm Huang Rui
                   ` (2 preceding siblings ...)
  2016-03-28  5:32 ` [PATCH v5 3/6] hwmon: (fam15h_power) Add ptsc counter value for " Huang Rui
@ 2016-03-28  5:32 ` Huang Rui
  2016-03-28  9:33   ` Borislav Petkov
  2016-03-28  5:32 ` [PATCH v5 5/6] hwmon: (fam15h_power) Add documentation for TDP and accumulated power algorithm Huang Rui
  2016-03-28  5:32 ` [PATCH v5 6/6] hwmon: (fam15h_power) Add platform check function Huang Rui
  5 siblings, 1 reply; 15+ messages in thread
From: Huang Rui @ 2016-03-28  5:32 UTC (permalink / raw)
  To: Guenter Roeck, Jean Delvare
  Cc: linux-hwmon, linux-kernel, spg_linux_kernel, Borislav Petkov, Huang Rui

This patch introduces an algorithm that computes the average power by
reading a delta value of “core power accumulator” register during
measurement interval, and then dividing delta value by the length of
the time interval.

User is able to use power1_average entry to measure the processor power
consumption and power1_average_interval entry to set the interval.

A simple example:

ray@hr-ub:~/tip$ sensors
fam15h_power-pci-00c4
Adapter: PCI adapter
power1:       19.58 mW (avg =   2.55 mW, interval =   0.01 s)
                       (crit =  15.00 W)

...

The result is current average processor power consumption in 10
millisecond. The unit of the result is uWatt.

Suggested-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Huang Rui <ray.huang@amd.com>
Cc: Borislav Petkov <bp@alien8.de>
---
 drivers/hwmon/fam15h_power.c | 119 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 119 insertions(+)

diff --git a/drivers/hwmon/fam15h_power.c b/drivers/hwmon/fam15h_power.c
index de6f52b..003564b 100644
--- a/drivers/hwmon/fam15h_power.c
+++ b/drivers/hwmon/fam15h_power.c
@@ -27,6 +27,8 @@
 #include <linux/bitops.h>
 #include <linux/cpu.h>
 #include <linux/cpumask.h>
+#include <linux/time.h>
+#include <linux/sched.h>
 #include <asm/processor.h>
 #include <asm/msr.h>
 
@@ -48,6 +50,9 @@ MODULE_LICENSE("GPL");
 #define FAM15H_NUM_GROUPS		2
 #define MAX_CUS				8
 
+/* set maximum interval as 1 second */
+#define MAX_INTERVAL			1000
+
 #define MSR_F15H_CU_PWR_ACCUMULATOR	0xc001007a
 #define MSR_F15H_CU_MAX_PWR_ACCUMULATOR	0xc001007b
 #define MSR_F15H_PTSC			0xc0010280
@@ -68,6 +73,9 @@ struct fam15h_power_data {
 	u64 cu_acc_power[MAX_CUS];
 	/* performance timestamp counter */
 	u64 cpu_sw_pwr_ptsc[MAX_CUS];
+	/* online/offline status of current compute unit */
+	int cu_on[MAX_CUS];
+	unsigned long power_period;
 };
 
 static ssize_t show_power(struct device *dev,
@@ -149,6 +157,8 @@ static void do_read_registers_on_cu(void *_data)
 
 	rdmsrl_safe(MSR_F15H_CU_PWR_ACCUMULATOR, &data->cu_acc_power[cu]);
 	rdmsrl_safe(MSR_F15H_PTSC, &data->cpu_sw_pwr_ptsc[cu]);
+
+	data->cu_on[cu] = 1;
 }
 
 /*
@@ -165,6 +175,8 @@ static int read_registers(struct fam15h_power_data *data)
 	if (!ret)
 		return -ENOMEM;
 
+	memset(data->cu_on, 0, sizeof(int) * MAX_CUS);
+
 	get_online_cpus();
 	this_cpu = get_cpu();
 
@@ -192,18 +204,117 @@ static int read_registers(struct fam15h_power_data *data)
 	return 0;
 }
 
+static ssize_t acc_show_power(struct device *dev,
+			      struct device_attribute *attr,
+			      char *buf)
+{
+	struct fam15h_power_data *data = dev_get_drvdata(dev);
+	u64 prev_cu_acc_power[MAX_CUS], prev_ptsc[MAX_CUS],
+	    jdelta[MAX_CUS];
+	u64 tdelta, avg_acc;
+	int cu, cu_num, ret;
+	signed long leftover;
+
+	/*
+	 * With the new x86 topology modelling, x86_max_cores is the
+	 * compute unit number.
+	 */
+	cu_num = boot_cpu_data.x86_max_cores;
+
+	ret = read_registers(data);
+	if (ret)
+		return 0;
+
+	for (cu = 0; cu < cu_num; cu++) {
+		prev_cu_acc_power[cu] = data->cu_acc_power[cu];
+		prev_ptsc[cu] = data->cpu_sw_pwr_ptsc[cu];
+	}
+
+	leftover = schedule_timeout_interruptible(msecs_to_jiffies(data->power_period));
+	if (leftover)
+		return 0;
+
+	ret = read_registers(data);
+	if (ret)
+		return 0;
+
+	for (cu = 0, avg_acc = 0; cu < cu_num; cu++) {
+		/* check if current compute unit is online */
+		if (data->cu_on[cu] == 0)
+			continue;
+
+		if (data->cu_acc_power[cu] < prev_cu_acc_power[cu]) {
+			jdelta[cu] = data->max_cu_acc_power + data->cu_acc_power[cu];
+			jdelta[cu] -= prev_cu_acc_power[cu];
+		} else {
+			jdelta[cu] = data->cu_acc_power[cu] - prev_cu_acc_power[cu];
+		}
+		tdelta = data->cpu_sw_pwr_ptsc[cu] - prev_ptsc[cu];
+		jdelta[cu] *= data->cpu_pwr_sample_ratio * 1000;
+		do_div(jdelta[cu], tdelta);
+
+		/* the unit is microWatt */
+		avg_acc += jdelta[cu];
+	}
+
+	return sprintf(buf, "%llu\n", (unsigned long long)avg_acc);
+}
+static DEVICE_ATTR(power1_average, S_IRUGO, acc_show_power, NULL);
+
+static ssize_t acc_show_power_period(struct device *dev,
+				     struct device_attribute *attr,
+				     char *buf)
+{
+	struct fam15h_power_data *data = dev_get_drvdata(dev);
+
+	return sprintf(buf, "%lu\n", data->power_period);
+}
+
+static ssize_t acc_set_power_period(struct device *dev,
+				    struct device_attribute *attr,
+				    const char *buf, size_t count)
+{
+	struct fam15h_power_data *data = dev_get_drvdata(dev);
+	unsigned long temp;
+	int ret;
+
+	ret = kstrtoul(buf, 10, &temp);
+	if (ret)
+		return ret;
+
+	if (temp > MAX_INTERVAL)
+		return -EINVAL;
+
+	/* the interval value should be greater than 0 */
+	if (temp <= 0)
+		return -EINVAL;
+
+	data->power_period = temp;
+
+	return count;
+}
+static DEVICE_ATTR(power1_average_interval, S_IRUGO | S_IWUSR,
+		   acc_show_power_period, acc_set_power_period);
+
 static int fam15h_power_init_attrs(struct pci_dev *pdev,
 				   struct fam15h_power_data *data)
 {
 	int n = FAM15H_MIN_NUM_ATTRS;
 	struct attribute **fam15h_power_attrs;
 	struct cpuinfo_x86 *c = &boot_cpu_data;
+	u32 cpuid;
 
 	if (c->x86 == 0x15 &&
 	    (c->x86_model <= 0xf ||
 	     (c->x86_model >= 0x60 && c->x86_model <= 0x7f)))
 		n += 1;
 
+	cpuid = cpuid_edx(0x80000007);
+
+	/* check if processor supports accumulated power */
+	if (cpuid & BIT(12))
+		n += 2;
+
 	fam15h_power_attrs = devm_kcalloc(&pdev->dev, n,
 					  sizeof(*fam15h_power_attrs),
 					  GFP_KERNEL);
@@ -218,6 +329,11 @@ static int fam15h_power_init_attrs(struct pci_dev *pdev,
 	     (c->x86_model >= 0x60 && c->x86_model <= 0x7f)))
 		fam15h_power_attrs[n++] = &dev_attr_power1_input.attr;
 
+	if (cpuid & BIT(12)) {
+		fam15h_power_attrs[n++] = &dev_attr_power1_average.attr;
+		fam15h_power_attrs[n++] = &dev_attr_power1_average_interval.attr;
+	}
+
 	data->group.attrs = fam15h_power_attrs;
 
 	return 0;
@@ -330,6 +446,9 @@ static int fam15h_power_init_data(struct pci_dev *f4,
 
 	data->max_cu_acc_power = tmp;
 
+	/* set default interval as 10 ms */
+	data->power_period = 10;
+
 	return read_registers(data);
 }
 
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v5 5/6] hwmon: (fam15h_power) Add documentation for TDP and accumulated power algorithm
  2016-03-28  5:32 [PATCH v5 0/6] hwmon: (fam15h_power) Introduce an accumulated power reporting algorithm Huang Rui
                   ` (3 preceding siblings ...)
  2016-03-28  5:32 ` [PATCH v5 4/6] hwmon: (fam15h_power) Introduce a cpu accumulated power reporting algorithm Huang Rui
@ 2016-03-28  5:32 ` Huang Rui
  2016-03-28  5:32 ` [PATCH v5 6/6] hwmon: (fam15h_power) Add platform check function Huang Rui
  5 siblings, 0 replies; 15+ messages in thread
From: Huang Rui @ 2016-03-28  5:32 UTC (permalink / raw)
  To: Guenter Roeck, Jean Delvare
  Cc: linux-hwmon, linux-kernel, spg_linux_kernel, Borislav Petkov, Huang Rui

This patch adds the description to explain the TDP reporting mechanism
and accumulated power algorithm.

Signed-off-by: Huang Rui <ray.huang@amd.com>
Cc: Borislav Petkov <bp@alien8.de>
---
 Documentation/hwmon/fam15h_power | 57 +++++++++++++++++++++++++++++++++++++++-
 drivers/hwmon/fam15h_power.c     |  2 +-
 2 files changed, 57 insertions(+), 2 deletions(-)

diff --git a/Documentation/hwmon/fam15h_power b/Documentation/hwmon/fam15h_power
index e2b1b69..2c4fbee 100644
--- a/Documentation/hwmon/fam15h_power
+++ b/Documentation/hwmon/fam15h_power
@@ -10,14 +10,22 @@ Supported chips:
   Datasheets:
   BIOS and Kernel Developer's Guide (BKDG) For AMD Family 15h Processors
   BIOS and Kernel Developer's Guide (BKDG) For AMD Family 16h Processors
+  AMD64 Architecture Programmer's Manual Volume 2: System Programming
 
 Author: Andreas Herrmann <herrmann.der.user@googlemail.com>
 
 Description
 -----------
 
+1) Processor TDP (Thermal design power)
+
+Given a fixed frequency and voltage, the power consumption of a
+processor varies based on the workload being executed. Derated power
+is the power consumed when running a specific application. Thermal
+design power (TDP) is an example of derated power.
+
 This driver permits reading of registers providing power information
-of AMD Family 15h and 16h processors.
+of AMD Family 15h and 16h processors via TDP algorithm.
 
 For AMD Family 15h and 16h processors the following power values can
 be calculated using different processor northbridge function
@@ -37,3 +45,50 @@ This driver provides ProcessorPwrWatts and CurrPwrWatts:
 On multi-node processors the calculated value is for the entire
 package and not for a single node. Thus the driver creates sysfs
 attributes only for internal node0 of a multi-node processor.
+
+2) Accumulated Power Mechanism
+
+This driver also introduces an algorithm that should be used to
+calculate the average power consumed by a processor during a
+measurement interval Tm. The feature of accumulated power mechanism is
+indicated by CPUID Fn8000_0007_EDX[12].
+
+* Tsample: compute unit power accumulator sample period
+* Tref: the PTSC counter period
+* PTSC: performance timestamp counter
+* N: the ratio of compute unit power accumulator sample period to the
+  PTSC period
+* Jmax: max compute unit accumulated power which is indicated by
+  MaxCpuSwPwrAcc MSR C001007b
+* Jx/Jy: compute unit accumulated power which is indicated by
+  CpuSwPwrAcc MSR C001007a
+* Tx/Ty: the value of performance timestamp counter which is indicated
+  by CU_PTSC MSR C0010280
+* PwrCPUave: CPU average power
+
+i. Determine the ratio of Tsample to Tref by executing CPUID Fn8000_0007.
+	N = value of CPUID Fn8000_0007_ECX[CpuPwrSampleTimeRatio[15:0]].
+
+ii. Read the full range of the cumulative energy value from the new
+MSR MaxCpuSwPwrAcc.
+	Jmax = value returned.
+iii. At time x, SW reads CpuSwPwrAcc MSR and samples the PTSC.
+	Jx = value read from CpuSwPwrAcc and Tx = value read from
+PTSC.
+
+iv. At time y, SW reads CpuSwPwrAcc MSR and samples the PTSC.
+	Jy = value read from CpuSwPwrAcc and Ty = value read from
+PTSC.
+
+v. Calculate the average power consumption for a compute unit over
+time period (y-x). Unit of result is uWatt.
+	if (Jy < Jx) // Rollover has occurred
+		Jdelta = (Jy + Jmax) - Jx
+	else
+		Jdelta = Jy - Jx
+	PwrCPUave = N * Jdelta * 1000 / (Ty - Tx)
+
+This driver provides PwrCPUave and interval(default is 10 millisecond
+and maximum is 1 second):
+* power1_average (PwrCPUave)
+* power1_average_interval (Interval)
diff --git a/drivers/hwmon/fam15h_power.c b/drivers/hwmon/fam15h_power.c
index 003564b..c1cad26 100644
--- a/drivers/hwmon/fam15h_power.c
+++ b/drivers/hwmon/fam15h_power.c
@@ -1,7 +1,7 @@
 /*
  * fam15h_power.c - AMD Family 15h processor power monitoring
  *
- * Copyright (c) 2011 Advanced Micro Devices, Inc.
+ * Copyright (c) 2011-2016 Advanced Micro Devices, Inc.
  * Author: Andreas Herrmann <herrmann.der.user@googlemail.com>
  *
  *
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v5 6/6] hwmon: (fam15h_power) Add platform check function
  2016-03-28  5:32 [PATCH v5 0/6] hwmon: (fam15h_power) Introduce an accumulated power reporting algorithm Huang Rui
                   ` (4 preceding siblings ...)
  2016-03-28  5:32 ` [PATCH v5 5/6] hwmon: (fam15h_power) Add documentation for TDP and accumulated power algorithm Huang Rui
@ 2016-03-28  5:32 ` Huang Rui
  5 siblings, 0 replies; 15+ messages in thread
From: Huang Rui @ 2016-03-28  5:32 UTC (permalink / raw)
  To: Guenter Roeck, Jean Delvare
  Cc: linux-hwmon, linux-kernel, spg_linux_kernel, Borislav Petkov, Huang Rui

This patch adds a platform check function to make code more readable.

Signed-off-by: Huang Rui <ray.huang@amd.com>
---
 drivers/hwmon/fam15h_power.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/drivers/hwmon/fam15h_power.c b/drivers/hwmon/fam15h_power.c
index c1cad26..622c646 100644
--- a/drivers/hwmon/fam15h_power.c
+++ b/drivers/hwmon/fam15h_power.c
@@ -78,6 +78,11 @@ struct fam15h_power_data {
 	unsigned long power_period;
 };
 
+static bool is_carrizo_or_later(void)
+{
+	return boot_cpu_data.x86 == 0x15 && boot_cpu_data.x86_model >= 0x60;
+}
+
 static ssize_t show_power(struct device *dev,
 			  struct device_attribute *attr, char *buf)
 {
@@ -94,7 +99,7 @@ static ssize_t show_power(struct device *dev,
 	 * On Carrizo and later platforms, TdpRunAvgAccCap bit field
 	 * is extended to 4:31 from 4:25.
 	 */
-	if (boot_cpu_data.x86 == 0x15 && boot_cpu_data.x86_model >= 0x60) {
+	if (is_carrizo_or_later()) {
 		running_avg_capture = val >> 4;
 		running_avg_capture = sign_extend32(running_avg_capture, 27);
 	} else {
@@ -111,7 +116,7 @@ static ssize_t show_power(struct device *dev,
 	 * On Carrizo and later platforms, ApmTdpLimit bit field
 	 * is extended to 16:31 from 16:28.
 	 */
-	if (boot_cpu_data.x86 == 0x15 && boot_cpu_data.x86_model >= 0x60)
+	if (is_carrizo_or_later())
 		tdp_limit = val >> 16;
 	else
 		tdp_limit = (val >> 16) & 0x1fff;
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH v5 1/6] hwmon: (fam15h_power) Add CPU_SUP_AMD as the dependence
  2016-03-28  5:32 ` [PATCH v5 1/6] hwmon: (fam15h_power) Add CPU_SUP_AMD as the dependence Huang Rui
@ 2016-03-28  8:59   ` Borislav Petkov
  0 siblings, 0 replies; 15+ messages in thread
From: Borislav Petkov @ 2016-03-28  8:59 UTC (permalink / raw)
  To: Huang Rui
  Cc: Guenter Roeck, Jean Delvare, linux-hwmon, linux-kernel, spg_linux_kernel

On Mon, Mar 28, 2016 at 01:32:11PM +0800, Huang Rui wrote:
> This patch adds CONFIG_CPU_SUP_AMD as the dependence of fam15h_power
> driver. Because the following patch will use the interface from
> x86/kernel/cpu/amd.c.
> 
> Otherwise, the below error might be encountered:
> 
> All errors (new ones prefixed by >>):
> 
>    drivers/built-in.o: In function `fam15h_power_probe':
> >> fam15h_power.c:(.text+0x26e3a3): undefined reference to
> >> `amd_get_cores_per_cu'
>    fam15h_power.c:(.text+0x26e41e): undefined reference to
> `amd_get_cores_per_cu'
> 
> Reported-by: build test robot <lkp@intel.com>
> Signed-off-by: Huang Rui <ray.huang@amd.com>

Acked-by: Borislav Petkov <bp@suse.de>

This can go in now, regardless of the rest.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v5 2/6] hwmon: (fam15h_power) Add compute unit accumulated power
  2016-03-28  5:32 ` [PATCH v5 2/6] hwmon: (fam15h_power) Add compute unit accumulated power Huang Rui
@ 2016-03-28  9:29   ` Borislav Petkov
  2016-03-29  3:02     ` Huang Rui
  2016-03-29  7:31     ` Peter Zijlstra
  0 siblings, 2 replies; 15+ messages in thread
From: Borislav Petkov @ 2016-03-28  9:29 UTC (permalink / raw)
  To: Huang Rui, Peter Zijlstra, Thomas Gleixner
  Cc: Guenter Roeck, Jean Delvare, linux-hwmon, linux-kernel, spg_linux_kernel

On Mon, Mar 28, 2016 at 01:32:12PM +0800, Huang Rui wrote:
> This patch adds a member in fam15h_power_data which specifies the
> compute unit accumulated power. It adds do_read_registers_on_cu to do
> all the read to all MSRs and run it on one of the online cores on each
> compute unit with smp_call_function_many(). This behavior can decrease
> IPI numbers.
> 
> Suggested-by: Borislav Petkov <bp@alien8.de>
> Signed-off-by: Huang Rui <ray.huang@amd.com>
> ---
>  drivers/hwmon/fam15h_power.c | 65 +++++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 64 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/hwmon/fam15h_power.c b/drivers/hwmon/fam15h_power.c
> index 4f695d8..ccbc944 100644
> --- a/drivers/hwmon/fam15h_power.c
> +++ b/drivers/hwmon/fam15h_power.c
> @@ -25,6 +25,8 @@
>  #include <linux/module.h>
>  #include <linux/pci.h>
>  #include <linux/bitops.h>
> +#include <linux/cpu.h>
> +#include <linux/cpumask.h>
>  #include <asm/processor.h>
>  #include <asm/msr.h>
>  
> @@ -44,7 +46,9 @@ MODULE_LICENSE("GPL");
>  
>  #define FAM15H_MIN_NUM_ATTRS		2
>  #define FAM15H_NUM_GROUPS		2
> +#define MAX_CUS				8
>  
> +#define MSR_F15H_CU_PWR_ACCUMULATOR	0xc001007a
>  #define MSR_F15H_CU_MAX_PWR_ACCUMULATOR	0xc001007b
>  
>  #define PCI_DEVICE_ID_AMD_15H_M70H_NB_F4 0x15b4
> @@ -59,6 +63,8 @@ struct fam15h_power_data {
>  	struct attribute_group group;
>  	/* maximum accumulated power of a compute unit */
>  	u64 max_cu_acc_power;
> +	/* accumulated power of the compute units */
> +	u64 cu_acc_power[MAX_CUS];
>  };
>  
>  static ssize_t show_power(struct device *dev,
> @@ -125,6 +131,63 @@ static ssize_t show_power_crit(struct device *dev,
>  }
>  static DEVICE_ATTR(power1_crit, S_IRUGO, show_power_crit, NULL);
>  
> +static void do_read_registers_on_cu(void *_data)
> +{
> +	struct fam15h_power_data *data = _data;
> +	int cpu, cu;
> +
> +	cpu = smp_processor_id();
> +
> +	/*
> +	 * With the new x86 topology modelling, cpu core id actually
> +	 * is compute unit id.
> +	 */
> +	cu = cpu_data(cpu).cpu_core_id;
> +
> +	rdmsrl_safe(MSR_F15H_CU_PWR_ACCUMULATOR, &data->cu_acc_power[cu]);
> +}
> +
> +/*
> + * This function is only able to be called when CPUID
> + * Fn8000_0007:EDX[12] is set.
> + */
> +static int read_registers(struct fam15h_power_data *data)
> +{
> +	int this_cpu, ret, cpu;
> +	int target;
> +	cpumask_var_t mask;
> +
> +	ret = zalloc_cpumask_var(&mask, GFP_KERNEL);
> +	if (!ret)
> +		return -ENOMEM;
> +
> +	get_online_cpus();
> +	this_cpu = get_cpu();

What now?

get_online_cpus() is enough.

> +
> +	/*
> +	 * Choose the first online core of each compute unit, and then
> +	 * read their MSR value of power and ptsc in a single IPI,
> +	 * because the MSR value of CPU core represent the compute
> +	 * unit's.
> +	 */
> +	for_each_online_cpu(cpu) {
> +		target = cpumask_first(topology_sibling_cpumask(cpu));
> +		if (!cpumask_test_cpu(target, mask))
> +			cpumask_set_cpu(target, mask);
> +	}

I think you want something like this: iterate over each core and put one
of them into the mask.

	core = -1;

	for_each_online_cpu(cpu) {
		this_core = topology_core_id(cpu);

		if (this_core == core)
			continue;

		core = this_core;

		/* get any CPU on this compute unit */
		cpumask_set_cpu(cpumask_any(topology_sibling_cpumask(cpu)), mask);
	}


Btw, tglx, peterz, do you guys think it would make sense to have a
generic helper:

	for_each_core(cpu)

which would give you any of the threads on the core in the @cpu var when
iterating?

I see only arch/x86/events/intel/cstate.c doing any comparisons with
topology_core_id() now but it might be useful...

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v5 4/6] hwmon: (fam15h_power) Introduce a cpu accumulated power reporting algorithm
  2016-03-28  5:32 ` [PATCH v5 4/6] hwmon: (fam15h_power) Introduce a cpu accumulated power reporting algorithm Huang Rui
@ 2016-03-28  9:33   ` Borislav Petkov
  2016-03-29  3:28     ` Huang Rui
  0 siblings, 1 reply; 15+ messages in thread
From: Borislav Petkov @ 2016-03-28  9:33 UTC (permalink / raw)
  To: Huang Rui
  Cc: Guenter Roeck, Jean Delvare, linux-hwmon, linux-kernel, spg_linux_kernel

On Mon, Mar 28, 2016 at 01:32:14PM +0800, Huang Rui wrote:
> This patch introduces an algorithm that computes the average power by
> reading a delta value of “core power accumulator” register during
> measurement interval, and then dividing delta value by the length of
> the time interval.
> 
> User is able to use power1_average entry to measure the processor power
> consumption and power1_average_interval entry to set the interval.
> 
> A simple example:
> 
> ray@hr-ub:~/tip$ sensors
> fam15h_power-pci-00c4
> Adapter: PCI adapter
> power1:       19.58 mW (avg =   2.55 mW, interval =   0.01 s)
>                        (crit =  15.00 W)
> 
> ...
> 
> The result is current average processor power consumption in 10
> millisecond. The unit of the result is uWatt.
> 
> Suggested-by: Guenter Roeck <linux@roeck-us.net>
> Signed-off-by: Huang Rui <ray.huang@amd.com>
> Cc: Borislav Petkov <bp@alien8.de>
> ---
>  drivers/hwmon/fam15h_power.c | 119 +++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 119 insertions(+)
> 
> diff --git a/drivers/hwmon/fam15h_power.c b/drivers/hwmon/fam15h_power.c
> index de6f52b..003564b 100644
> --- a/drivers/hwmon/fam15h_power.c
> +++ b/drivers/hwmon/fam15h_power.c
> @@ -27,6 +27,8 @@
>  #include <linux/bitops.h>
>  #include <linux/cpu.h>
>  #include <linux/cpumask.h>
> +#include <linux/time.h>
> +#include <linux/sched.h>
>  #include <asm/processor.h>
>  #include <asm/msr.h>
>  
> @@ -48,6 +50,9 @@ MODULE_LICENSE("GPL");
>  #define FAM15H_NUM_GROUPS		2
>  #define MAX_CUS				8
>  
> +/* set maximum interval as 1 second */
> +#define MAX_INTERVAL			1000
> +
>  #define MSR_F15H_CU_PWR_ACCUMULATOR	0xc001007a
>  #define MSR_F15H_CU_MAX_PWR_ACCUMULATOR	0xc001007b
>  #define MSR_F15H_PTSC			0xc0010280
> @@ -68,6 +73,9 @@ struct fam15h_power_data {
>  	u64 cu_acc_power[MAX_CUS];
>  	/* performance timestamp counter */
>  	u64 cpu_sw_pwr_ptsc[MAX_CUS];
> +	/* online/offline status of current compute unit */
> +	int cu_on[MAX_CUS];
> +	unsigned long power_period;
>  };
>  
>  static ssize_t show_power(struct device *dev,
> @@ -149,6 +157,8 @@ static void do_read_registers_on_cu(void *_data)
>  
>  	rdmsrl_safe(MSR_F15H_CU_PWR_ACCUMULATOR, &data->cu_acc_power[cu]);
>  	rdmsrl_safe(MSR_F15H_PTSC, &data->cpu_sw_pwr_ptsc[cu]);
> +
> +	data->cu_on[cu] = 1;
>  }
>  
>  /*
> @@ -165,6 +175,8 @@ static int read_registers(struct fam15h_power_data *data)
>  	if (!ret)
>  		return -ENOMEM;
>  
> +	memset(data->cu_on, 0, sizeof(int) * MAX_CUS);
> +
>  	get_online_cpus();
>  	this_cpu = get_cpu();
>  
> @@ -192,18 +204,117 @@ static int read_registers(struct fam15h_power_data *data)
>  	return 0;
>  }
>  
> +static ssize_t acc_show_power(struct device *dev,
> +			      struct device_attribute *attr,
> +			      char *buf)
> +{
> +	struct fam15h_power_data *data = dev_get_drvdata(dev);
> +	u64 prev_cu_acc_power[MAX_CUS], prev_ptsc[MAX_CUS],
> +	    jdelta[MAX_CUS];
> +	u64 tdelta, avg_acc;
> +	int cu, cu_num, ret;
> +	signed long leftover;
> +
> +	/*
> +	 * With the new x86 topology modelling, x86_max_cores is the
> +	 * compute unit number.
> +	 */
> +	cu_num = boot_cpu_data.x86_max_cores;
> +
> +	ret = read_registers(data);
> +	if (ret)
> +		return 0;
> +
> +	for (cu = 0; cu < cu_num; cu++) {
> +		prev_cu_acc_power[cu] = data->cu_acc_power[cu];
> +		prev_ptsc[cu] = data->cpu_sw_pwr_ptsc[cu];
> +	}
> +
> +	leftover = schedule_timeout_interruptible(msecs_to_jiffies(data->power_period));
> +	if (leftover)
> +		return 0;
> +
> +	ret = read_registers(data);
> +	if (ret)
> +		return 0;
> +
> +	for (cu = 0, avg_acc = 0; cu < cu_num; cu++) {
> +		/* check if current compute unit is online */
> +		if (data->cu_on[cu] == 0)
> +			continue;
> +
> +		if (data->cu_acc_power[cu] < prev_cu_acc_power[cu]) {
> +			jdelta[cu] = data->max_cu_acc_power + data->cu_acc_power[cu];
> +			jdelta[cu] -= prev_cu_acc_power[cu];
> +		} else {
> +			jdelta[cu] = data->cu_acc_power[cu] - prev_cu_acc_power[cu];
> +		}
> +		tdelta = data->cpu_sw_pwr_ptsc[cu] - prev_ptsc[cu];
> +		jdelta[cu] *= data->cpu_pwr_sample_ratio * 1000;
> +		do_div(jdelta[cu], tdelta);
> +
> +		/* the unit is microWatt */
> +		avg_acc += jdelta[cu];
> +	}
> +
> +	return sprintf(buf, "%llu\n", (unsigned long long)avg_acc);
> +}
> +static DEVICE_ATTR(power1_average, S_IRUGO, acc_show_power, NULL);
> +
> +static ssize_t acc_show_power_period(struct device *dev,
> +				     struct device_attribute *attr,
> +				     char *buf)
> +{
> +	struct fam15h_power_data *data = dev_get_drvdata(dev);
> +
> +	return sprintf(buf, "%lu\n", data->power_period);
> +}
> +
> +static ssize_t acc_set_power_period(struct device *dev,
> +				    struct device_attribute *attr,
> +				    const char *buf, size_t count)
> +{
> +	struct fam15h_power_data *data = dev_get_drvdata(dev);
> +	unsigned long temp;
> +	int ret;
> +
> +	ret = kstrtoul(buf, 10, &temp);
> +	if (ret)
> +		return ret;
> +
> +	if (temp > MAX_INTERVAL)
> +		return -EINVAL;
> +
> +	/* the interval value should be greater than 0 */
> +	if (temp <= 0)
> +		return -EINVAL;
> +
> +	data->power_period = temp;
> +
> +	return count;
> +}
> +static DEVICE_ATTR(power1_average_interval, S_IRUGO | S_IWUSR,
> +		   acc_show_power_period, acc_set_power_period);
> +
>  static int fam15h_power_init_attrs(struct pci_dev *pdev,
>  				   struct fam15h_power_data *data)
>  {
>  	int n = FAM15H_MIN_NUM_ATTRS;
>  	struct attribute **fam15h_power_attrs;
>  	struct cpuinfo_x86 *c = &boot_cpu_data;
> +	u32 cpuid;
>  
>  	if (c->x86 == 0x15 &&
>  	    (c->x86_model <= 0xf ||
>  	     (c->x86_model >= 0x60 && c->x86_model <= 0x7f)))
>  		n += 1;
>  
> +	cpuid = cpuid_edx(0x80000007);
> +
> +	/* check if processor supports accumulated power */
> +	if (cpuid & BIT(12))
> +		n += 2;

	if (boot_cpu_has(X86_FEATURE_ACC_POWER))

> +
>  	fam15h_power_attrs = devm_kcalloc(&pdev->dev, n,
>  					  sizeof(*fam15h_power_attrs),
>  					  GFP_KERNEL);
> @@ -218,6 +329,11 @@ static int fam15h_power_init_attrs(struct pci_dev *pdev,
>  	     (c->x86_model >= 0x60 && c->x86_model <= 0x7f)))
>  		fam15h_power_attrs[n++] = &dev_attr_power1_input.attr;
>  
> +	if (cpuid & BIT(12)) {

Ditto.

> +		fam15h_power_attrs[n++] = &dev_attr_power1_average.attr;
> +		fam15h_power_attrs[n++] = &dev_attr_power1_average_interval.attr;
> +	}
> +
>  	data->group.attrs = fam15h_power_attrs;
>  
>  	return 0;
> @@ -330,6 +446,9 @@ static int fam15h_power_init_data(struct pci_dev *f4,
>  
>  	data->max_cu_acc_power = tmp;
>  
> +	/* set default interval as 10 ms */

Because...?

> +	data->power_period = 10;
> +
>  	return read_registers(data);
>  }
>  
> -- 
> 1.9.1
> 
> 

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v5 2/6] hwmon: (fam15h_power) Add compute unit accumulated power
  2016-03-28  9:29   ` Borislav Petkov
@ 2016-03-29  3:02     ` Huang Rui
  2016-03-29  7:31     ` Peter Zijlstra
  1 sibling, 0 replies; 15+ messages in thread
From: Huang Rui @ 2016-03-29  3:02 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Peter Zijlstra, Thomas Gleixner, Guenter Roeck, Jean Delvare,
	linux-hwmon, linux-kernel, spg_linux_kernel

On Mon, Mar 28, 2016 at 11:29:52AM +0200, Borislav Petkov wrote:
> On Mon, Mar 28, 2016 at 01:32:12PM +0800, Huang Rui wrote:
> > +
> > +	get_online_cpus();
> > +	this_cpu = get_cpu();
> 
> What now?
> 
> get_online_cpus() is enough.
> 

Will remove get_cpu().

> > +
> > +	/*
> > +	 * Choose the first online core of each compute unit, and then
> > +	 * read their MSR value of power and ptsc in a single IPI,
> > +	 * because the MSR value of CPU core represent the compute
> > +	 * unit's.
> > +	 */
> > +	for_each_online_cpu(cpu) {
> > +		target = cpumask_first(topology_sibling_cpumask(cpu));
> > +		if (!cpumask_test_cpu(target, mask))
> > +			cpumask_set_cpu(target, mask);
> > +	}
> 
> I think you want something like this: iterate over each core and put one
> of them into the mask.
> 
> 	core = -1;
> 
> 	for_each_online_cpu(cpu) {
> 		this_core = topology_core_id(cpu);
> 
> 		if (this_core == core)
> 			continue;
> 
> 		core = this_core;
> 
> 		/* get any CPU on this compute unit */
> 		cpumask_set_cpu(cpumask_any(topology_sibling_cpumask(cpu)), mask);
> 	}
> 

Yep, with new x86 topology for core on AMD, using this way should be
more clear.

Thanks,
Rui

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v5 4/6] hwmon: (fam15h_power) Introduce a cpu accumulated power reporting algorithm
  2016-03-28  9:33   ` Borislav Petkov
@ 2016-03-29  3:28     ` Huang Rui
  2016-03-29  7:25       ` Borislav Petkov
  0 siblings, 1 reply; 15+ messages in thread
From: Huang Rui @ 2016-03-29  3:28 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Guenter Roeck, Jean Delvare, linux-hwmon, linux-kernel, spg_linux_kernel

On Mon, Mar 28, 2016 at 11:33:27AM +0200, Borislav Petkov wrote:
> On Mon, Mar 28, 2016 at 01:32:14PM +0800, Huang Rui wrote:
> >  
> >  	return 0;
> > @@ -330,6 +446,9 @@ static int fam15h_power_init_data(struct pci_dev *f4,
> >  
> >  	data->max_cu_acc_power = tmp;
> >  
> > +	/* set default interval as 10 ms */
> 
> Because...?
> 

I checked with HW designer, milliseconds is also a reasonable interval of acc power.
And I cannot set too long here, because several seconds will cause the
read function to hang for that period of time.

So I pick 10ms here, and actually, we can update the interval at
/etc/sensors3.conf

chip "fam15h_power-*"
        set power1_average_interval 0.01

Thanks,
Rui

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v5 4/6] hwmon: (fam15h_power) Introduce a cpu accumulated power reporting algorithm
  2016-03-29  3:28     ` Huang Rui
@ 2016-03-29  7:25       ` Borislav Petkov
  0 siblings, 0 replies; 15+ messages in thread
From: Borislav Petkov @ 2016-03-29  7:25 UTC (permalink / raw)
  To: Huang Rui
  Cc: Guenter Roeck, Jean Delvare, linux-hwmon, linux-kernel, spg_linux_kernel

On Tue, Mar 29, 2016 at 11:28:48AM +0800, Huang Rui wrote:
> I checked with HW designer, milliseconds is also a reasonable interval
> of acc power. And I cannot set too long here, because several seconds
> will cause the read function to hang for that period of time.

Exactly this justification should be as a comment in the code above the
setting of the default value.

> So I pick 10ms here, and actually, we can update the interval at
> /etc/sensors3.conf
> 
> chip "fam15h_power-*"
>         set power1_average_interval 0.01

I think this text should be in Documentation/hwmon/fam15h_power for
users to know.

Thanks.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v5 2/6] hwmon: (fam15h_power) Add compute unit accumulated power
  2016-03-28  9:29   ` Borislav Petkov
  2016-03-29  3:02     ` Huang Rui
@ 2016-03-29  7:31     ` Peter Zijlstra
  2016-03-29  7:57       ` Borislav Petkov
  1 sibling, 1 reply; 15+ messages in thread
From: Peter Zijlstra @ 2016-03-29  7:31 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Huang Rui, Thomas Gleixner, Guenter Roeck, Jean Delvare,
	linux-hwmon, linux-kernel, spg_linux_kernel

On Mon, Mar 28, 2016 at 11:29:52AM +0200, Borislav Petkov wrote:
> I think you want something like this: iterate over each core and put one
> of them into the mask.
> 
> 	core = -1;
> 
> 	for_each_online_cpu(cpu) {
> 		this_core = topology_core_id(cpu);
> 
> 		if (this_core == core)
> 			continue;
> 
> 		core = this_core;
> 
> 		/* get any CPU on this compute unit */
> 		cpumask_set_cpu(cpumask_any(topology_sibling_cpumask(cpu)), mask);
> 	}

This will not in fact work for Intel, nor if I manage to one day
randomize our CPU numbers on AMD.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v5 2/6] hwmon: (fam15h_power) Add compute unit accumulated power
  2016-03-29  7:31     ` Peter Zijlstra
@ 2016-03-29  7:57       ` Borislav Petkov
  0 siblings, 0 replies; 15+ messages in thread
From: Borislav Petkov @ 2016-03-29  7:57 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Huang Rui, Thomas Gleixner, Guenter Roeck, Jean Delvare,
	linux-hwmon, linux-kernel, spg_linux_kernel

On Tue, Mar 29, 2016 at 09:31:58AM +0200, Peter Zijlstra wrote:
> This will not in fact work for Intel, nor if I manage to one day
> randomize our CPU numbers on AMD.

Oh, I know why. I have this 64 CPUs box here:

$ grep "core id" /proc/cpuinfo | uniq
core id         : 0
core id         : 8
core id         : 2
core id         : 10
core id         : 1
core id         : 9
core id         : 3
core id         : 11
core id         : 0
core id         : 8
core id         : 2
core id         : 10
core id         : 1
core id         : 9
core id         : 3
core id         : 11

Those core IDs repeat and are almost random too :)

I guess we'll need a mask. Maybe as a future exercise...

That box's topology has other funsies like this:

$ grep -E -B 2 "core id\s+: 0" /proc/cpuinfo
physical id     : 0
siblings        : 16
core id         : 0
--
physical id     : 1
siblings        : 16
core id         : 0
--
physical id     : 2
siblings        : 16
core id         : 0
--
physical id     : 3
siblings        : 16
core id         : 0
--
physical id     : 0
siblings        : 16
core id         : 0
--
physical id     : 1
siblings        : 16
core id         : 0
--
physical id     : 2
siblings        : 16
core id         : 0
--
physical id     : 3
siblings        : 16
core id         : 0

So in order to dig out which HT threads belong together, I need to look
at the (core id, physical id) pair.

I guess this is how we "fix" the schedulers of other OSes - by playing
topology games...

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2016-03-29  7:57 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-03-28  5:32 [PATCH v5 0/6] hwmon: (fam15h_power) Introduce an accumulated power reporting algorithm Huang Rui
2016-03-28  5:32 ` [PATCH v5 1/6] hwmon: (fam15h_power) Add CPU_SUP_AMD as the dependence Huang Rui
2016-03-28  8:59   ` Borislav Petkov
2016-03-28  5:32 ` [PATCH v5 2/6] hwmon: (fam15h_power) Add compute unit accumulated power Huang Rui
2016-03-28  9:29   ` Borislav Petkov
2016-03-29  3:02     ` Huang Rui
2016-03-29  7:31     ` Peter Zijlstra
2016-03-29  7:57       ` Borislav Petkov
2016-03-28  5:32 ` [PATCH v5 3/6] hwmon: (fam15h_power) Add ptsc counter value for " Huang Rui
2016-03-28  5:32 ` [PATCH v5 4/6] hwmon: (fam15h_power) Introduce a cpu accumulated power reporting algorithm Huang Rui
2016-03-28  9:33   ` Borislav Petkov
2016-03-29  3:28     ` Huang Rui
2016-03-29  7:25       ` Borislav Petkov
2016-03-28  5:32 ` [PATCH v5 5/6] hwmon: (fam15h_power) Add documentation for TDP and accumulated power algorithm Huang Rui
2016-03-28  5:32 ` [PATCH v5 6/6] hwmon: (fam15h_power) Add platform check function Huang Rui

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.