linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 00/10] hwmon: (fam15h_power) Introduce an accumulated power reporting algorithm
@ 2015-10-20  2:28 Huang Rui
  2015-10-20  2:28 ` [PATCH v2 01/10] hwmon: (fam15h_power) Refactor attributes for dynamically added Huang Rui
                   ` (9 more replies)
  0 siblings, 10 replies; 39+ messages in thread
From: Huang Rui @ 2015-10-20  2:28 UTC (permalink / raw)
  To: Borislav Petkov, Guenter Roeck, Peter Zijlstra, Jean Delvare,
	Andy Lutomirski, Andreas Herrmann, Thomas Gleixner, Ingo Molnar,
	Rafael J. Wysocki, Len Brown, John Stultz,
	Frédéric Weisbecker
  Cc: lm-sensors, linux-kernel, x86, Andreas Herrmann,
	Aravind Gopalakrishnan, Borislav Petkov, Fengguang Wu, Aaron Lu,
	Tony Li, Huang Rui

Hi all,

This serial of patches introduces an accumulated power reporting
algorithm. It will calculate the average power consumption for the
processor. The cpu feature flag is CPUID.8000_0007H:EDX[12].

This algorithm is used to test the comparison of processor power
consumption with between MWAITX delay and TSC delay on AMD Carrizo
platforms.

Reference:
http://marc.info/?l=linux-kernel&m=143874573111310&w=2

Commit f96756 at tip ("x86/asm: Add MONITORX/MWAITX instruction support")
Commit b466bd at tip ("x86/asm/delay: Introduce an MWAITX-based delay with a configurable timer")

V1: http://marc.info/?l=linux-kernel&m=144066380613299&w=2

Changes from v1 -> v2:
- Move fam15h_power_groups and fam15h_power_group into fam15h_power_data to
  avoid overwrite on multi-CPU system.
- Rename FAM15H_MIN_NUM_ATTRS macro and fix return error code.
- Remove unnecessary warning print.
- Adds do_read_registers_on_cu to do all the read to all MSRs and run it on one
  of the online cores on each compute unit with smp_call_function_many().
- Use power1_average and power1_average_interval standard entry
  instread of power1_acc
- Fix the CPU-hotplug case.

A simple example:

ray@hr-ub:~/tip$ sensors
fam15h_power-pci-00c4
Adapter: PCI adapter
power1:       23.73 mW (avg = 634.63 mW, interval =   0.01 s)
                       (crit =  15.00 W)

...

These patches are rebased on groeck/hwmon-next.

Thanks,
Rui

Huang Rui (10):
  hwmon: (fam15h_power) Refactor attributes for dynamically added
  hwmon: (fam15h_power) Enable power1_input on AMD Carrizo
  hwmon: (fam15h_power) Add max compute unit accumulated power
  x86, amd: add accessor for number of cores per compute unit
  hwmon: (fam15h_power) Add compute unit accumulated power
  hwmon: (fam15h_power) Add ptsc counter value for accumulated power
  hwmon: (fam15h_power) Introduce a cpu accumulated power reporting
    algorithm
  hwmon: (fam15h_power) Add documentation for previous TDP reporting
  hwmon: (fam15h_power) Add documentation for accumulated power
    algorithm
  MAINTAINERS: change the maintainer of fam15h_power driver

 CREDITS                          |   8 ++
 Documentation/hwmon/fam15h_power |  56 +++++++-
 MAINTAINERS                      |   4 +-
 arch/x86/include/asm/msr-index.h |   1 +
 arch/x86/include/asm/processor.h |   1 +
 arch/x86/kernel/cpu/amd.c        |  19 ++-
 drivers/hwmon/fam15h_power.c     | 274 +++++++++++++++++++++++++++++++++++----
 7 files changed, 335 insertions(+), 28 deletions(-)

-- 
1.9.1


^ permalink raw reply	[flat|nested] 39+ messages in thread

* [PATCH v2 01/10] hwmon: (fam15h_power) Refactor attributes for dynamically added
  2015-10-20  2:28 [PATCH v2 00/10] hwmon: (fam15h_power) Introduce an accumulated power reporting algorithm Huang Rui
@ 2015-10-20  2:28 ` Huang Rui
  2015-10-23 13:42   ` Guenter Roeck
  2015-10-20  2:28 ` [PATCH v2 02/10] hwmon: (fam15h_power) Enable power1_input on AMD Carrizo Huang Rui
                   ` (8 subsequent siblings)
  9 siblings, 1 reply; 39+ messages in thread
From: Huang Rui @ 2015-10-20  2:28 UTC (permalink / raw)
  To: Borislav Petkov, Guenter Roeck, Peter Zijlstra, Jean Delvare,
	Andy Lutomirski, Andreas Herrmann, Thomas Gleixner, Ingo Molnar,
	Rafael J. Wysocki, Len Brown, John Stultz,
	Frédéric Weisbecker
  Cc: lm-sensors, linux-kernel, x86, Andreas Herrmann,
	Aravind Gopalakrishnan, Borislav Petkov, Fengguang Wu, Aaron Lu,
	Tony Li, Huang Rui

Attributes depend on the CPU model the driver gets loaded on.
Therefore, add those attributes dynamically at init time. This is more
flexible to control the different attributes on different platforms.

Suggestedy-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Huang Rui <ray.huang@amd.com>
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
---
 drivers/hwmon/fam15h_power.c | 70 ++++++++++++++++++++++++++++----------------
 1 file changed, 45 insertions(+), 25 deletions(-)

diff --git a/drivers/hwmon/fam15h_power.c b/drivers/hwmon/fam15h_power.c
index e80ee23..41d022e 100644
--- a/drivers/hwmon/fam15h_power.c
+++ b/drivers/hwmon/fam15h_power.c
@@ -41,12 +41,17 @@ MODULE_LICENSE("GPL");
 #define REG_TDP_RUNNING_AVERAGE		0xe0
 #define REG_TDP_LIMIT3			0xe8
 
+#define FAM15H_MIN_NUM_ATTRS		2
+#define FAM15H_NUM_GROUPS		2
+
 struct fam15h_power_data {
 	struct pci_dev *pdev;
 	unsigned int tdp_to_watts;
 	unsigned int base_tdp;
 	unsigned int processor_pwr_watts;
 	unsigned int cpu_pwr_sample_ratio;
+	const struct attribute_group *fam15h_power_groups[FAM15H_NUM_GROUPS];
+	struct attribute_group fam15h_power_group;
 };
 
 static ssize_t show_power(struct device *dev,
@@ -105,29 +110,31 @@ static ssize_t show_power_crit(struct device *dev,
 }
 static DEVICE_ATTR(power1_crit, S_IRUGO, show_power_crit, NULL);
 
-static umode_t fam15h_power_is_visible(struct kobject *kobj,
-				       struct attribute *attr,
-				       int index)
+static int fam15h_power_init_attrs(struct pci_dev *pdev,
+				   struct fam15h_power_data *data)
 {
-	/* power1_input is only reported for Fam15h, Models 00h-0fh */
-	if (attr == &dev_attr_power1_input.attr &&
-	   (boot_cpu_data.x86 != 0x15 || boot_cpu_data.x86_model > 0xf))
-		return 0;
+	int n = FAM15H_MIN_NUM_ATTRS;
+	struct attribute **fam15h_power_attrs;
 
-	return attr->mode;
-}
+	if (boot_cpu_data.x86 == 0x15 && boot_cpu_data.x86_model <= 0xf)
+		n += 1;
 
-static struct attribute *fam15h_power_attrs[] = {
-	&dev_attr_power1_input.attr,
-	&dev_attr_power1_crit.attr,
-	NULL
-};
+	fam15h_power_attrs = devm_kcalloc(&pdev->dev, n,
+					  sizeof(*fam15h_power_attrs),
+					  GFP_KERNEL);
 
-static const struct attribute_group fam15h_power_group = {
-	.attrs = fam15h_power_attrs,
-	.is_visible = fam15h_power_is_visible,
-};
-__ATTRIBUTE_GROUPS(fam15h_power);
+	if (!fam15h_power_attrs)
+		return -ENOMEM;
+
+	n = 0;
+	fam15h_power_attrs[n++] = &dev_attr_power1_crit.attr;
+	if (boot_cpu_data.x86 == 0x15 && boot_cpu_data.x86_model <= 0xf)
+		fam15h_power_attrs[n++] = &dev_attr_power1_input.attr;
+
+	data->fam15h_power_group.attrs = fam15h_power_attrs;
+
+	return 0;
+}
 
 static bool should_load_on_this_node(struct pci_dev *f4)
 {
@@ -186,11 +193,12 @@ static int fam15h_power_resume(struct pci_dev *pdev)
 #define fam15h_power_resume NULL
 #endif
 
-static void fam15h_power_init_data(struct pci_dev *f4,
-					     struct fam15h_power_data *data)
+static int fam15h_power_init_data(struct pci_dev *f4,
+				  struct fam15h_power_data *data)
 {
 	u32 val, eax, ebx, ecx, edx;
 	u64 tmp;
+	int ret;
 
 	pci_read_config_dword(f4, REG_PROCESSOR_TDP, &val);
 	data->base_tdp = val >> 16;
@@ -211,11 +219,15 @@ static void fam15h_power_init_data(struct pci_dev *f4,
 	/* convert to microWatt */
 	data->processor_pwr_watts = (tmp * 15625) >> 10;
 
+	ret = fam15h_power_init_attrs(f4, data);
+	if (ret)
+		return ret;
+
 	cpuid(0x80000007, &eax, &ebx, &ecx, &edx);
 
 	/* CPUID Fn8000_0007:EDX[12] indicates to support accumulated power */
 	if (!(edx & BIT(12)))
-		return;
+		return 0;
 
 	/*
 	 * determine the ratio of the compute unit power accumulator
@@ -223,14 +235,17 @@ static void fam15h_power_init_data(struct pci_dev *f4,
 	 * Fn8000_0007:ECX
 	 */
 	data->cpu_pwr_sample_ratio = ecx;
+
+	return 0;
 }
 
 static int fam15h_power_probe(struct pci_dev *pdev,
-					const struct pci_device_id *id)
+			      const struct pci_device_id *id)
 {
 	struct fam15h_power_data *data;
 	struct device *dev = &pdev->dev;
 	struct device *hwmon_dev;
+	int ret;
 
 	/*
 	 * though we ignore every other northbridge, we still have to
@@ -246,12 +261,17 @@ static int fam15h_power_probe(struct pci_dev *pdev,
 	if (!data)
 		return -ENOMEM;
 
-	fam15h_power_init_data(pdev, data);
+	ret = fam15h_power_init_data(pdev, data);
+	if (ret)
+		return ret;
+
 	data->pdev = pdev;
 
+	data->fam15h_power_groups[0] = &data->fam15h_power_group;
+
 	hwmon_dev = devm_hwmon_device_register_with_groups(dev, "fam15h_power",
 							   data,
-							   fam15h_power_groups);
+							   (const struct attribute_group **)&data->fam15h_power_groups);
 	return PTR_ERR_OR_ZERO(hwmon_dev);
 }
 
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH v2 02/10] hwmon: (fam15h_power) Enable power1_input on AMD Carrizo
  2015-10-20  2:28 [PATCH v2 00/10] hwmon: (fam15h_power) Introduce an accumulated power reporting algorithm Huang Rui
  2015-10-20  2:28 ` [PATCH v2 01/10] hwmon: (fam15h_power) Refactor attributes for dynamically added Huang Rui
@ 2015-10-20  2:28 ` Huang Rui
  2015-10-23 13:45   ` Guenter Roeck
  2015-10-20  2:28 ` [PATCH v2 03/10] hwmon: (fam15h_power) Add max compute unit accumulated power Huang Rui
                   ` (7 subsequent siblings)
  9 siblings, 1 reply; 39+ messages in thread
From: Huang Rui @ 2015-10-20  2:28 UTC (permalink / raw)
  To: Borislav Petkov, Guenter Roeck, Peter Zijlstra, Jean Delvare,
	Andy Lutomirski, Andreas Herrmann, Thomas Gleixner, Ingo Molnar,
	Rafael J. Wysocki, Len Brown, John Stultz,
	Frédéric Weisbecker
  Cc: lm-sensors, linux-kernel, x86, Andreas Herrmann,
	Aravind Gopalakrishnan, Borislav Petkov, Fengguang Wu, Aaron Lu,
	Tony Li, Huang Rui

This patch enables power1_input attribute for Carrizo platform.

Signed-off-by: Huang Rui <ray.huang@amd.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
---
 drivers/hwmon/fam15h_power.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/drivers/hwmon/fam15h_power.c b/drivers/hwmon/fam15h_power.c
index 41d022e..a090adf 100644
--- a/drivers/hwmon/fam15h_power.c
+++ b/drivers/hwmon/fam15h_power.c
@@ -115,8 +115,11 @@ static int fam15h_power_init_attrs(struct pci_dev *pdev,
 {
 	int n = FAM15H_MIN_NUM_ATTRS;
 	struct attribute **fam15h_power_attrs;
+	struct cpuinfo_x86 *c = &boot_cpu_data;
 
-	if (boot_cpu_data.x86 == 0x15 && boot_cpu_data.x86_model <= 0xf)
+	if (c->x86 == 0x15 &&
+	    ((c->x86_model <= 0xf) ||
+	     (c->x86_model >= 0x60 && c->x86_model <= 0x6f)))
 		n += 1;
 
 	fam15h_power_attrs = devm_kcalloc(&pdev->dev, n,
@@ -128,7 +131,9 @@ static int fam15h_power_init_attrs(struct pci_dev *pdev,
 
 	n = 0;
 	fam15h_power_attrs[n++] = &dev_attr_power1_crit.attr;
-	if (boot_cpu_data.x86 == 0x15 && boot_cpu_data.x86_model <= 0xf)
+	if (c->x86 == 0x15 &&
+	    ((c->x86_model <= 0xf) ||
+	     (c->x86_model >= 0x60 && c->x86_model <= 0x6f)))
 		fam15h_power_attrs[n++] = &dev_attr_power1_input.attr;
 
 	data->fam15h_power_group.attrs = fam15h_power_attrs;
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH v2 03/10] hwmon: (fam15h_power) Add max compute unit accumulated power
  2015-10-20  2:28 [PATCH v2 00/10] hwmon: (fam15h_power) Introduce an accumulated power reporting algorithm Huang Rui
  2015-10-20  2:28 ` [PATCH v2 01/10] hwmon: (fam15h_power) Refactor attributes for dynamically added Huang Rui
  2015-10-20  2:28 ` [PATCH v2 02/10] hwmon: (fam15h_power) Enable power1_input on AMD Carrizo Huang Rui
@ 2015-10-20  2:28 ` Huang Rui
  2015-10-20  2:28 ` [PATCH v2 04/10] x86, amd: add accessor for number of cores per compute unit Huang Rui
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 39+ messages in thread
From: Huang Rui @ 2015-10-20  2:28 UTC (permalink / raw)
  To: Borislav Petkov, Guenter Roeck, Peter Zijlstra, Jean Delvare,
	Andy Lutomirski, Andreas Herrmann, Thomas Gleixner, Ingo Molnar,
	Rafael J. Wysocki, Len Brown, John Stultz,
	Frédéric Weisbecker
  Cc: lm-sensors, linux-kernel, x86, Andreas Herrmann,
	Aravind Gopalakrishnan, Borislav Petkov, Fengguang Wu, Aaron Lu,
	Tony Li, Huang Rui

This patch adds a member in fam15h_power_data which specifies the
maximum accumulated power in a compute unit.

Signed-off-by: Huang Rui <ray.huang@amd.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
---
 drivers/hwmon/fam15h_power.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/drivers/hwmon/fam15h_power.c b/drivers/hwmon/fam15h_power.c
index a090adf..e2bfab5 100644
--- a/drivers/hwmon/fam15h_power.c
+++ b/drivers/hwmon/fam15h_power.c
@@ -26,6 +26,7 @@
 #include <linux/pci.h>
 #include <linux/bitops.h>
 #include <asm/processor.h>
+#include <asm/msr.h>
 
 MODULE_DESCRIPTION("AMD Family 15h CPU processor power monitor");
 MODULE_AUTHOR("Andreas Herrmann <herrmann.der.user@googlemail.com>");
@@ -44,6 +45,8 @@ MODULE_LICENSE("GPL");
 #define FAM15H_MIN_NUM_ATTRS		2
 #define FAM15H_NUM_GROUPS		2
 
+#define MSR_F15H_CU_MAX_PWR_ACCUMULATOR	0xc001007b
+
 struct fam15h_power_data {
 	struct pci_dev *pdev;
 	unsigned int tdp_to_watts;
@@ -52,6 +55,8 @@ struct fam15h_power_data {
 	unsigned int cpu_pwr_sample_ratio;
 	const struct attribute_group *fam15h_power_groups[FAM15H_NUM_GROUPS];
 	struct attribute_group fam15h_power_group;
+	/* maximum accumulated power of a compute unit */
+	u64 max_cu_acc_power;
 };
 
 static ssize_t show_power(struct device *dev,
@@ -241,6 +246,13 @@ static int fam15h_power_init_data(struct pci_dev *f4,
 	 */
 	data->cpu_pwr_sample_ratio = ecx;
 
+	if (rdmsrl_safe(MSR_F15H_CU_MAX_PWR_ACCUMULATOR, &tmp)) {
+		pr_err("Failed to read max compute unit power accumulator MSR\n");
+		return -ENODEV;
+	}
+
+	data->max_cu_acc_power = tmp;
+
 	return 0;
 }
 
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH v2 04/10] x86, amd: add accessor for number of cores per compute unit
  2015-10-20  2:28 [PATCH v2 00/10] hwmon: (fam15h_power) Introduce an accumulated power reporting algorithm Huang Rui
                   ` (2 preceding siblings ...)
  2015-10-20  2:28 ` [PATCH v2 03/10] hwmon: (fam15h_power) Add max compute unit accumulated power Huang Rui
@ 2015-10-20  2:28 ` Huang Rui
  2015-10-20  2:28 ` [PATCH v2 05/10] hwmon: (fam15h_power) Add compute unit accumulated power Huang Rui
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 39+ messages in thread
From: Huang Rui @ 2015-10-20  2:28 UTC (permalink / raw)
  To: Borislav Petkov, Guenter Roeck, Peter Zijlstra, Jean Delvare,
	Andy Lutomirski, Andreas Herrmann, Thomas Gleixner, Ingo Molnar,
	Rafael J. Wysocki, Len Brown, John Stultz,
	Frédéric Weisbecker
  Cc: lm-sensors, linux-kernel, x86, Andreas Herrmann,
	Aravind Gopalakrishnan, Borislav Petkov, Fengguang Wu, Aaron Lu,
	Tony Li, Huang Rui

Add an accessor function amd_get_cores_per_cu() which returns the
number of cores per compute unit. In multiple CPUs, they always have
the same number of cores per compute unit.

In a subsequent patch, we will use this function in fam15h_power
driver.

Signed-off-by: Huang Rui <ray.huang@amd.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/include/asm/processor.h |  1 +
 arch/x86/kernel/cpu/amd.c        | 19 +++++++++++++++++--
 2 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 19577dd..831ad682 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -810,6 +810,7 @@ static inline int mpx_disable_management(void)
 
 extern u16 amd_get_nb_id(int cpu);
 extern u32 amd_get_nodes_per_socket(void);
+extern u32 amd_get_cores_per_cu(void);
 
 static inline uint32_t hypervisor_cpuid_base(const char *sig, uint32_t leaves)
 {
diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
index 4a70fc6..a914b1b 100644
--- a/arch/x86/kernel/cpu/amd.c
+++ b/arch/x86/kernel/cpu/amd.c
@@ -27,6 +27,9 @@
  */
 static u32 nodes_per_socket = 1;
 
+/* cores_per_cu: stores the number of cores per compute unit */
+static u32 cores_per_cu = 1;
+
 static inline int rdmsrl_amd_safe(unsigned msr, unsigned long long *p)
 {
 	u32 gprs[8] = { 0 };
@@ -299,7 +302,6 @@ static int nearby_node(int apicid)
 #ifdef CONFIG_SMP
 static void amd_get_topology(struct cpuinfo_x86 *c)
 {
-	u32 cores_per_cu = 1;
 	u8 node_id;
 	int cpu = smp_processor_id();
 
@@ -314,7 +316,6 @@ static void amd_get_topology(struct cpuinfo_x86 *c)
 		/* get compute unit information */
 		smp_num_siblings = ((ebx >> 8) & 3) + 1;
 		c->compute_unit_id = ebx & 0xff;
-		cores_per_cu += ((ebx >> 8) & 3);
 	} else if (cpu_has(c, X86_FEATURE_NODEID_MSR)) {
 		u64 value;
 
@@ -380,6 +381,13 @@ u32 amd_get_nodes_per_socket(void)
 }
 EXPORT_SYMBOL_GPL(amd_get_nodes_per_socket);
 
+/* this function returns the number of cores per compute unit */
+u32 amd_get_cores_per_cu(void)
+{
+	return cores_per_cu;
+}
+EXPORT_SYMBOL_GPL(amd_get_cores_per_cu);
+
 static void srat_detect_node(struct cpuinfo_x86 *c)
 {
 #ifdef CONFIG_NUMA
@@ -510,6 +518,13 @@ static void bsp_init_amd(struct cpuinfo_x86 *c)
 
 	if (cpu_has(c, X86_FEATURE_MWAITX))
 		use_mwaitx_delay();
+
+	if (cpu_has_topoext) {
+		u32 cpuid;
+
+		cpuid = cpuid_ebx(0x8000001e);
+		cores_per_cu += ((cpuid >> 8) & 3);
+	}
 }
 
 static void early_init_amd(struct cpuinfo_x86 *c)
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH v2 05/10] hwmon: (fam15h_power) Add compute unit accumulated power
  2015-10-20  2:28 [PATCH v2 00/10] hwmon: (fam15h_power) Introduce an accumulated power reporting algorithm Huang Rui
                   ` (3 preceding siblings ...)
  2015-10-20  2:28 ` [PATCH v2 04/10] x86, amd: add accessor for number of cores per compute unit Huang Rui
@ 2015-10-20  2:28 ` Huang Rui
  2015-10-20  7:24   ` kbuild test robot
  2015-10-23 13:27   ` Borislav Petkov
  2015-10-20  2:28 ` [PATCH v2 06/10] hwmon: (fam15h_power) Add ptsc counter value for " Huang Rui
                   ` (4 subsequent siblings)
  9 siblings, 2 replies; 39+ messages in thread
From: Huang Rui @ 2015-10-20  2:28 UTC (permalink / raw)
  To: Borislav Petkov, Guenter Roeck, Peter Zijlstra, Jean Delvare,
	Andy Lutomirski, Andreas Herrmann, Thomas Gleixner, Ingo Molnar,
	Rafael J. Wysocki, Len Brown, John Stultz,
	Frédéric Weisbecker
  Cc: lm-sensors, linux-kernel, x86, Andreas Herrmann,
	Aravind Gopalakrishnan, Borislav Petkov, Fengguang Wu, Aaron Lu,
	Tony Li, Huang Rui

This patch adds a member in fam15h_power_data which specifies the
compute unit accumulated power. It adds do_read_registers_on_cu to do
all the read to all MSRs and run it on one of the online cores on each
compute unit with smp_call_function_many(). This behavior can decrease
IPI numbers.

Suggested-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Huang Rui <ray.huang@amd.com>
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
---
 drivers/hwmon/fam15h_power.c | 68 +++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 67 insertions(+), 1 deletion(-)

diff --git a/drivers/hwmon/fam15h_power.c b/drivers/hwmon/fam15h_power.c
index e2bfab5..88e4f3e 100644
--- a/drivers/hwmon/fam15h_power.c
+++ b/drivers/hwmon/fam15h_power.c
@@ -25,6 +25,7 @@
 #include <linux/module.h>
 #include <linux/pci.h>
 #include <linux/bitops.h>
+#include <linux/cpumask.h>
 #include <asm/processor.h>
 #include <asm/msr.h>
 
@@ -44,7 +45,9 @@ MODULE_LICENSE("GPL");
 
 #define FAM15H_MIN_NUM_ATTRS		2
 #define FAM15H_NUM_GROUPS		2
+#define MAX_CUS				8
 
+#define MSR_F15H_CU_PWR_ACCUMULATOR	0xc001007a
 #define MSR_F15H_CU_MAX_PWR_ACCUMULATOR	0xc001007b
 
 struct fam15h_power_data {
@@ -57,6 +60,8 @@ struct fam15h_power_data {
 	struct attribute_group fam15h_power_group;
 	/* maximum accumulated power of a compute unit */
 	u64 max_cu_acc_power;
+	/* accumulated power of the compute units */
+	u64 cu_acc_power[MAX_CUS];
 };
 
 static ssize_t show_power(struct device *dev,
@@ -115,6 +120,65 @@ static ssize_t show_power_crit(struct device *dev,
 }
 static DEVICE_ATTR(power1_crit, S_IRUGO, show_power_crit, NULL);
 
+static void do_read_registers_on_cu(void *_data)
+{
+	struct fam15h_power_data *data = _data;
+	int cpu, cu, cores_per_cu;
+
+	cpu = smp_processor_id();
+
+	cores_per_cu = amd_get_cores_per_cu();
+	cu = cpu / cores_per_cu;
+
+	WARN_ON(rdmsrl_safe(MSR_F15H_CU_PWR_ACCUMULATOR,
+			    &data->cu_acc_power[cu]));
+}
+
+static int read_registers(struct fam15h_power_data *data)
+{
+	int this_cpu, ret;
+	int cu_num, cores_per_cu, cpu, cu;
+	cpumask_var_t mask;
+
+	cores_per_cu = amd_get_cores_per_cu();
+	cu_num = boot_cpu_data.x86_max_cores / cores_per_cu;
+
+	WARN_ON_ONCE(cu_num > MAX_CUS);
+
+	ret = zalloc_cpumask_var(&mask, GFP_KERNEL);
+	if (!ret)
+		return -ENOMEM;
+
+	this_cpu = get_cpu();
+
+	/*
+	 * Choose the first online core of each compute unit, and then
+	 * read their MSR value of power and ptsc in one time of IPI,
+	 * because the MSR value of cpu core represent the compute
+	 * unit's. This behavior can decrease IPI numbers between the
+	 * cores.
+	 */
+	cpu = cpumask_first(cpu_online_mask);
+	cu = cpu / cores_per_cu;
+	while (cpu < boot_cpu_data.x86_max_cores) {
+		if (cu <= cpu / cores_per_cu) {
+			cu = cpu / cores_per_cu + 1;
+			cpumask_set_cpu(cpu, mask);
+		}
+		cpu = cpumask_next(cu * cores_per_cu - 1, cpu_online_mask);
+	}
+
+	if (cpumask_test_cpu(this_cpu, mask))
+		do_read_registers_on_cu(data);
+
+	smp_call_function_many(mask, do_read_registers_on_cu, data, true);
+	put_cpu();
+
+	free_cpumask_var(mask);
+
+	return 0;
+}
+
 static int fam15h_power_init_attrs(struct pci_dev *pdev,
 				   struct fam15h_power_data *data)
 {
@@ -253,7 +317,9 @@ static int fam15h_power_init_data(struct pci_dev *f4,
 
 	data->max_cu_acc_power = tmp;
 
-	return 0;
+	ret = read_registers(data);
+
+	return ret;
 }
 
 static int fam15h_power_probe(struct pci_dev *pdev,
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH v2 06/10] hwmon: (fam15h_power) Add ptsc counter value for accumulated power
  2015-10-20  2:28 [PATCH v2 00/10] hwmon: (fam15h_power) Introduce an accumulated power reporting algorithm Huang Rui
                   ` (4 preceding siblings ...)
  2015-10-20  2:28 ` [PATCH v2 05/10] hwmon: (fam15h_power) Add compute unit accumulated power Huang Rui
@ 2015-10-20  2:28 ` Huang Rui
  2015-10-23 13:59   ` Guenter Roeck
  2015-10-20  2:28 ` [PATCH v2 07/10] hwmon: (fam15h_power) Introduce a cpu accumulated power reporting algorithm Huang Rui
                   ` (3 subsequent siblings)
  9 siblings, 1 reply; 39+ messages in thread
From: Huang Rui @ 2015-10-20  2:28 UTC (permalink / raw)
  To: Borislav Petkov, Guenter Roeck, Peter Zijlstra, Jean Delvare,
	Andy Lutomirski, Andreas Herrmann, Thomas Gleixner, Ingo Molnar,
	Rafael J. Wysocki, Len Brown, John Stultz,
	Frédéric Weisbecker
  Cc: lm-sensors, linux-kernel, x86, Andreas Herrmann,
	Aravind Gopalakrishnan, Borislav Petkov, Fengguang Wu, Aaron Lu,
	Tony Li, Huang Rui

PTSC is the performance timestamp counter value in a cpu core and the
cores in one compute unit have the fixed frequency. So it picks up the
performance timestamp counter value of the first core per compute unit
to measure the interval for average power per compute unit.

Signed-off-by: Huang Rui <ray.huang@amd.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/include/asm/msr-index.h | 1 +
 drivers/hwmon/fam15h_power.c     | 5 +++++
 2 files changed, 6 insertions(+)

diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index c1c0a1c..3686eaa 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -313,6 +313,7 @@
 #define MSR_F15H_PERF_CTR		0xc0010201
 #define MSR_F15H_NB_PERF_CTL		0xc0010240
 #define MSR_F15H_NB_PERF_CTR		0xc0010241
+#define MSR_F15H_PTSC			0xc0010280
 
 /* Fam 10h MSRs */
 #define MSR_FAM10H_MMIO_CONF_BASE	0xc0010058
diff --git a/drivers/hwmon/fam15h_power.c b/drivers/hwmon/fam15h_power.c
index 88e4f3e..6321f73 100644
--- a/drivers/hwmon/fam15h_power.c
+++ b/drivers/hwmon/fam15h_power.c
@@ -62,6 +62,8 @@ struct fam15h_power_data {
 	u64 max_cu_acc_power;
 	/* accumulated power of the compute units */
 	u64 cu_acc_power[MAX_CUS];
+	/* performance timestamp counter */
+	u64 cpu_sw_pwr_ptsc[MAX_CUS];
 };
 
 static ssize_t show_power(struct device *dev,
@@ -132,6 +134,9 @@ static void do_read_registers_on_cu(void *_data)
 
 	WARN_ON(rdmsrl_safe(MSR_F15H_CU_PWR_ACCUMULATOR,
 			    &data->cu_acc_power[cu]));
+
+	WARN_ON(rdmsrl_safe(MSR_F15H_PTSC,
+			    &data->cpu_sw_pwr_ptsc[cu]));
 }
 
 static int read_registers(struct fam15h_power_data *data)
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH v2 07/10] hwmon: (fam15h_power) Introduce a cpu accumulated power reporting algorithm
  2015-10-20  2:28 [PATCH v2 00/10] hwmon: (fam15h_power) Introduce an accumulated power reporting algorithm Huang Rui
                   ` (5 preceding siblings ...)
  2015-10-20  2:28 ` [PATCH v2 06/10] hwmon: (fam15h_power) Add ptsc counter value for " Huang Rui
@ 2015-10-20  2:28 ` Huang Rui
  2015-10-23 13:28   ` Borislav Petkov
  2015-10-23 14:20   ` Guenter Roeck
  2015-10-20  2:28 ` [PATCH v2 08/10] hwmon: (fam15h_power) Add documentation for previous TDP reporting Huang Rui
                   ` (2 subsequent siblings)
  9 siblings, 2 replies; 39+ messages in thread
From: Huang Rui @ 2015-10-20  2:28 UTC (permalink / raw)
  To: Borislav Petkov, Guenter Roeck, Peter Zijlstra, Jean Delvare,
	Andy Lutomirski, Andreas Herrmann, Thomas Gleixner, Ingo Molnar,
	Rafael J. Wysocki, Len Brown, John Stultz,
	Frédéric Weisbecker
  Cc: lm-sensors, linux-kernel, x86, Andreas Herrmann,
	Aravind Gopalakrishnan, Borislav Petkov, Fengguang Wu, Aaron Lu,
	Tony Li, Huang Rui

This patch introduces an algorithm that computes the average power by
reading a delta value of “core power accumulator” register during
measurement interval, and then dividing delta value by the length of
the time interval.

User is able to use power1_average entry to measure the processor power
consumption and power1_average_interval entry to set the interval.

A simple example:

ray@hr-ub:~/tip$ sensors
fam15h_power-pci-00c4
Adapter: PCI adapter
power1:       23.73 mW (avg = 634.63 mW, interval =   0.01 s)
                       (crit =  15.00 W)

...

The result is current average processor power consumption in 10
millisecond. The unit of the result is uWatt.

Suggested-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Huang Rui <ray.huang@amd.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
---
 drivers/hwmon/fam15h_power.c | 120 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 120 insertions(+)

diff --git a/drivers/hwmon/fam15h_power.c b/drivers/hwmon/fam15h_power.c
index 6321f73..a5a539e 100644
--- a/drivers/hwmon/fam15h_power.c
+++ b/drivers/hwmon/fam15h_power.c
@@ -26,6 +26,9 @@
 #include <linux/pci.h>
 #include <linux/bitops.h>
 #include <linux/cpumask.h>
+#include <linux/mutex.h>
+#include <linux/time.h>
+#include <linux/sched.h>
 #include <asm/processor.h>
 #include <asm/msr.h>
 
@@ -64,6 +67,10 @@ struct fam15h_power_data {
 	u64 cu_acc_power[MAX_CUS];
 	/* performance timestamp counter */
 	u64 cpu_sw_pwr_ptsc[MAX_CUS];
+	/* online/offline status of current compute unit */
+	int cu_on[MAX_CUS];
+	unsigned long power_period;
+	struct mutex acc_pwr_mutex;
 };
 
 static ssize_t show_power(struct device *dev,
@@ -132,11 +139,15 @@ static void do_read_registers_on_cu(void *_data)
 	cores_per_cu = amd_get_cores_per_cu();
 	cu = cpu / cores_per_cu;
 
+	mutex_lock(&data->acc_pwr_mutex);
 	WARN_ON(rdmsrl_safe(MSR_F15H_CU_PWR_ACCUMULATOR,
 			    &data->cu_acc_power[cu]));
 
 	WARN_ON(rdmsrl_safe(MSR_F15H_PTSC,
 			    &data->cpu_sw_pwr_ptsc[cu]));
+
+	data->cu_on[cu] = 1;
+	mutex_unlock(&data->acc_pwr_mutex);
 }
 
 static int read_registers(struct fam15h_power_data *data)
@@ -148,6 +159,10 @@ static int read_registers(struct fam15h_power_data *data)
 	cores_per_cu = amd_get_cores_per_cu();
 	cu_num = boot_cpu_data.x86_max_cores / cores_per_cu;
 
+	mutex_lock(&data->acc_pwr_mutex);
+	memset(data->cu_on, 0, sizeof(int) * MAX_CUS);
+	mutex_unlock(&data->acc_pwr_mutex);
+
 	WARN_ON_ONCE(cu_num > MAX_CUS);
 
 	ret = zalloc_cpumask_var(&mask, GFP_KERNEL);
@@ -184,18 +199,113 @@ static int read_registers(struct fam15h_power_data *data)
 	return 0;
 }
 
+static ssize_t acc_show_power(struct device *dev,
+			      struct device_attribute *attr,
+			      char *buf)
+{
+	struct fam15h_power_data *data = dev_get_drvdata(dev);
+	u64 prev_cu_acc_power[MAX_CUS], prev_ptsc[MAX_CUS],
+	    jdelta[MAX_CUS];
+	u64 tdelta, avg_acc;
+	int cu, cu_num, cores_per_cu, ret;
+	signed long leftover;
+
+	cores_per_cu = amd_get_cores_per_cu();
+	cu_num = boot_cpu_data.x86_max_cores / cores_per_cu;
+
+	ret = read_registers(data);
+	if (ret)
+		return 0;
+
+	cu = 0;
+	while(cu++ < cu_num) {
+		prev_cu_acc_power[cu] = data->cu_acc_power[cu];
+		prev_ptsc[cu] = data->cpu_sw_pwr_ptsc[cu];
+	}
+
+	leftover = schedule_timeout_interruptible(
+			msecs_to_jiffies(data->power_period)
+		   );
+	if (leftover)
+		return 0;
+
+	ret = read_registers(data);
+	if (ret)
+		return 0;
+
+	for (cu = 0, avg_acc = 0; cu < cu_num; cu++) {
+		/* check if current compute unit is online */
+		if (data->cu_on[cu] == 0)
+			continue;
+
+		if (data->cu_acc_power[cu] < prev_cu_acc_power[cu]) {
+			jdelta[cu] = data->max_cu_acc_power + data->cu_acc_power[cu];
+			jdelta[cu] -= prev_cu_acc_power[cu];
+		} else {
+			jdelta[cu] = data->cu_acc_power[cu] - prev_cu_acc_power[cu];
+		}
+		tdelta = data->cpu_sw_pwr_ptsc[cu] - prev_ptsc[cu];
+		jdelta[cu] *= data->cpu_pwr_sample_ratio * 1000;
+		do_div(jdelta[cu], tdelta);
+
+		/* the unit is microWatt */
+		avg_acc += jdelta[cu];
+	}
+
+	return sprintf(buf, "%llu\n", (unsigned long long)avg_acc);
+}
+static DEVICE_ATTR(power1_average, S_IRUGO, acc_show_power, NULL);
+
+
+static ssize_t acc_show_power_period(struct device *dev,
+				     struct device_attribute *attr,
+				     char *buf)
+{
+	struct fam15h_power_data *data = dev_get_drvdata(dev);
+
+	return sprintf(buf, "%lu\n", data->power_period);
+}
+
+static ssize_t acc_set_power_period(struct device *dev,
+				    struct device_attribute *attr,
+				    const char *buf, size_t count)
+{
+	struct fam15h_power_data *data = dev_get_drvdata(dev);
+	unsigned long temp;
+	int ret;
+
+	ret = kstrtoul(buf, 10, &temp);
+	if (ret)
+		return ret;
+
+	mutex_lock(&data->acc_pwr_mutex);
+	data->power_period = temp;
+	mutex_unlock(&data->acc_pwr_mutex);
+
+	return count;
+}
+static DEVICE_ATTR(power1_average_interval, S_IRUGO | S_IWUSR,
+		   acc_show_power_period, acc_set_power_period);
+
 static int fam15h_power_init_attrs(struct pci_dev *pdev,
 				   struct fam15h_power_data *data)
 {
 	int n = FAM15H_MIN_NUM_ATTRS;
 	struct attribute **fam15h_power_attrs;
 	struct cpuinfo_x86 *c = &boot_cpu_data;
+	u32 cpuid;
 
 	if (c->x86 == 0x15 &&
 	    ((c->x86_model <= 0xf) ||
 	     (c->x86_model >= 0x60 && c->x86_model <= 0x6f)))
 		n += 1;
 
+	cpuid = cpuid_edx(0x80000007);
+
+	/* check if processor supports accumulated power */
+	if (cpuid & BIT(12))
+		n += 2;
+
 	fam15h_power_attrs = devm_kcalloc(&pdev->dev, n,
 					  sizeof(*fam15h_power_attrs),
 					  GFP_KERNEL);
@@ -210,6 +320,11 @@ static int fam15h_power_init_attrs(struct pci_dev *pdev,
 	     (c->x86_model >= 0x60 && c->x86_model <= 0x6f)))
 		fam15h_power_attrs[n++] = &dev_attr_power1_input.attr;
 
+	if (cpuid & BIT(12)) {
+		fam15h_power_attrs[n++] = &dev_attr_power1_average.attr;
+		fam15h_power_attrs[n++] = &dev_attr_power1_average_interval.attr;
+	}
+
 	data->fam15h_power_group.attrs = fam15h_power_attrs;
 
 	return 0;
@@ -322,6 +437,9 @@ static int fam15h_power_init_data(struct pci_dev *f4,
 
 	data->max_cu_acc_power = tmp;
 
+	/* set default interval as 10 ms */
+	data->power_period = 10;
+
 	ret = read_registers(data);
 
 	return ret;
@@ -349,6 +467,8 @@ static int fam15h_power_probe(struct pci_dev *pdev,
 	if (!data)
 		return -ENOMEM;
 
+	mutex_init(&data->acc_pwr_mutex);
+
 	ret = fam15h_power_init_data(pdev, data);
 	if (ret)
 		return ret;
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH v2 08/10] hwmon: (fam15h_power) Add documentation for previous TDP reporting
  2015-10-20  2:28 [PATCH v2 00/10] hwmon: (fam15h_power) Introduce an accumulated power reporting algorithm Huang Rui
                   ` (6 preceding siblings ...)
  2015-10-20  2:28 ` [PATCH v2 07/10] hwmon: (fam15h_power) Introduce a cpu accumulated power reporting algorithm Huang Rui
@ 2015-10-20  2:28 ` Huang Rui
  2015-10-20  2:28 ` [PATCH v2 09/10] hwmon: (fam15h_power) Add documentation for accumulated power algorithm Huang Rui
  2015-10-20  2:28 ` [PATCH v2 10/10] MAINTAINERS: change the maintainer of fam15h_power driver Huang Rui
  9 siblings, 0 replies; 39+ messages in thread
From: Huang Rui @ 2015-10-20  2:28 UTC (permalink / raw)
  To: Borislav Petkov, Guenter Roeck, Peter Zijlstra, Jean Delvare,
	Andy Lutomirski, Andreas Herrmann, Thomas Gleixner, Ingo Molnar,
	Rafael J. Wysocki, Len Brown, John Stultz,
	Frédéric Weisbecker
  Cc: lm-sensors, linux-kernel, x86, Andreas Herrmann,
	Aravind Gopalakrishnan, Borislav Petkov, Fengguang Wu, Aaron Lu,
	Tony Li, Huang Rui

This patch adds description to explain the TDP reporting mechanism of
fam15h_power driver.

Signed-off-by: Huang Rui <ray.huang@amd.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
---
 Documentation/hwmon/fam15h_power | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/Documentation/hwmon/fam15h_power b/Documentation/hwmon/fam15h_power
index e2b1b69..42bf04e 100644
--- a/Documentation/hwmon/fam15h_power
+++ b/Documentation/hwmon/fam15h_power
@@ -10,14 +10,22 @@ Supported chips:
   Datasheets:
   BIOS and Kernel Developer's Guide (BKDG) For AMD Family 15h Processors
   BIOS and Kernel Developer's Guide (BKDG) For AMD Family 16h Processors
+  AMD64 Architecture Programmer's Manual Volume 2: System Programming
 
 Author: Andreas Herrmann <herrmann.der.user@googlemail.com>
 
 Description
 -----------
 
+1) Processor TDP (Thermal design power)
+
+Given a fixed frequency and voltage, the power consumption of a
+processor varies based on the workload being executed. Derated power
+is the power consumed when running a specific application. Thermal
+design power (TDP) is an example of derated power.
+
 This driver permits reading of registers providing power information
-of AMD Family 15h and 16h processors.
+of AMD Family 15h and 16h processors via TDP algorithm.
 
 For AMD Family 15h and 16h processors the following power values can
 be calculated using different processor northbridge function
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH v2 09/10] hwmon: (fam15h_power) Add documentation for accumulated power algorithm
  2015-10-20  2:28 [PATCH v2 00/10] hwmon: (fam15h_power) Introduce an accumulated power reporting algorithm Huang Rui
                   ` (7 preceding siblings ...)
  2015-10-20  2:28 ` [PATCH v2 08/10] hwmon: (fam15h_power) Add documentation for previous TDP reporting Huang Rui
@ 2015-10-20  2:28 ` Huang Rui
  2015-10-23 10:22   ` Borislav Petkov
  2015-10-20  2:28 ` [PATCH v2 10/10] MAINTAINERS: change the maintainer of fam15h_power driver Huang Rui
  9 siblings, 1 reply; 39+ messages in thread
From: Huang Rui @ 2015-10-20  2:28 UTC (permalink / raw)
  To: Borislav Petkov, Guenter Roeck, Peter Zijlstra, Jean Delvare,
	Andy Lutomirski, Andreas Herrmann, Thomas Gleixner, Ingo Molnar,
	Rafael J. Wysocki, Len Brown, John Stultz,
	Frédéric Weisbecker
  Cc: lm-sensors, linux-kernel, x86, Andreas Herrmann,
	Aravind Gopalakrishnan, Borislav Petkov, Fengguang Wu, Aaron Lu,
	Tony Li, Huang Rui

This patch adds the description to explain the accumulated power
algorithm.

Signed-off-by: Huang Rui <ray.huang@amd.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
---
 Documentation/hwmon/fam15h_power | 46 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 46 insertions(+)

diff --git a/Documentation/hwmon/fam15h_power b/Documentation/hwmon/fam15h_power
index 42bf04e..74cfbbf 100644
--- a/Documentation/hwmon/fam15h_power
+++ b/Documentation/hwmon/fam15h_power
@@ -45,3 +45,49 @@ This driver provides ProcessorPwrWatts and CurrPwrWatts:
 On multi-node processors the calculated value is for the entire
 package and not for a single node. Thus the driver creates sysfs
 attributes only for internal node0 of a multi-node processor.
+
+2) Accumulated Power Mechanism
+
+This driver also introduces an algorithm that should be used to
+calculate the average power consumed by a processor during a
+measurement interval Tm. The feature of accumulated power mechanism is
+indicated by CPUID Fn8000_0007_EDX[12].
+
+* Tsample: compute unit power accumulator sample period
+* Tref: the PTSC counter period
+* PTSC: performance timestamp counter
+* N: the ratio of compute unit power accumulator sample period to the
+  PTSC period
+* Jmax: max compute unit accumulated power which is indicated by
+  MaxCpuSwPwrAcc MSR C001007b
+* Jx/Jy: compute unit accumulated power which is indicated by
+  CpuSwPwrAcc MSR C001007a
+* Tx/Ty: the value of performance timestamp counter which is indicated
+  by CU_PTSC MSR C0010280
+* PwrCPUave: CPU average power
+
+i. Determine the ratio of Tsample to Tref by executing CPUID Fn8000_0007.
+	N = value of CPUID Fn8000_0007_ECX[CpuPwrSampleTimeRatio[15:0]].
+
+ii. Read the full range of the cumulative energy value from the new
+MSR MaxCpuSwPwrAcc.
+	Jmax = value returned.
+iii. At time x, SW reads CpuSwPwrAcc MSR and samples the PTSC.
+	Jx = value read from CpuSwPwrAcc and Tx = value read from
+PTSC.
+
+iv. At time y, SW reads CpuSwPwrAcc MSR and samples the PTSC.
+	Jy = value read from CpuSwPwrAcc and Ty = value read from
+PTSC.
+
+v. Calculate the average power consumption for a compute unit over
+time period (y-x). Unit of result is uWatt.
+	if (Jy < Jx) // Rollover has occurred
+		Jdelta = (Jy + Jmax) - Jx
+	else
+		Jdelta = Jy - Jx
+	PwrCPUave = N * Jdelta * 1000 / (Ty - Tx)
+
+This driver provides PwrCPUave and interval(default is 10 millisecond):
+* power1_average (PwrCPUave)
+* power1_average_interval (Interval)
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH v2 10/10] MAINTAINERS: change the maintainer of fam15h_power driver
  2015-10-20  2:28 [PATCH v2 00/10] hwmon: (fam15h_power) Introduce an accumulated power reporting algorithm Huang Rui
                   ` (8 preceding siblings ...)
  2015-10-20  2:28 ` [PATCH v2 09/10] hwmon: (fam15h_power) Add documentation for accumulated power algorithm Huang Rui
@ 2015-10-20  2:28 ` Huang Rui
  9 siblings, 0 replies; 39+ messages in thread
From: Huang Rui @ 2015-10-20  2:28 UTC (permalink / raw)
  To: Borislav Petkov, Guenter Roeck, Peter Zijlstra, Jean Delvare,
	Andy Lutomirski, Andreas Herrmann, Thomas Gleixner, Ingo Molnar,
	Rafael J. Wysocki, Len Brown, John Stultz,
	Frédéric Weisbecker
  Cc: lm-sensors, linux-kernel, x86, Andreas Herrmann,
	Aravind Gopalakrishnan, Borislav Petkov, Fengguang Wu, Aaron Lu,
	Tony Li, Huang Rui

Andreas Herrmann won't take the maintainer of fam15h_power driver. I
will take it and appreciate him for the great contributions on this
driver.

Signed-off-by: Huang Rui <ray.huang@amd.com>
Acked-by: Andreas Herrmann <herrmann.der.user@googlemail.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
---
 CREDITS     | 8 ++++++++
 MAINTAINERS | 4 ++--
 2 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/CREDITS b/CREDITS
index 8207cc6..30bdce8 100644
--- a/CREDITS
+++ b/CREDITS
@@ -1507,6 +1507,14 @@ S: 312/107 Canberra Avenue
 S: Griffith, ACT 2603 
 S: Australia
 
+N: Andreas Herrmann
+E: herrmann.der.user@gmail.com
+E: herrmann.der.user@googlemail.com
+D: Key developer of x86/AMD64
+D: Author of AMD family 15h processor power monintoring driver
+D: Maintainer of AMD Athlon 64 and Opteron processor frequency driver
+S: Germany
+
 N: Sebastian Hetze
 E: she@lunetix.de
 D: German Linux Documentation,
diff --git a/MAINTAINERS b/MAINTAINERS
index 7ba7ab7..639a957 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -608,9 +608,9 @@ F:	drivers/crypto/ccp/
 F:	include/linux/ccp.h
 
 AMD FAM15H PROCESSOR POWER MONITORING DRIVER
-M:	Andreas Herrmann <herrmann.der.user@googlemail.com>
+M:	Huang Rui <ray.huang@amd.com>
 L:	lm-sensors@lm-sensors.org
-S:	Maintained
+S:	Supported
 F:	Documentation/hwmon/fam15h_power
 F:	drivers/hwmon/fam15h_power.c
 
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* Re: [PATCH v2 05/10] hwmon: (fam15h_power) Add compute unit accumulated power
  2015-10-20  2:28 ` [PATCH v2 05/10] hwmon: (fam15h_power) Add compute unit accumulated power Huang Rui
@ 2015-10-20  7:24   ` kbuild test robot
  2015-10-21  1:42     ` Huang Rui
  2015-10-23 13:27   ` Borislav Petkov
  1 sibling, 1 reply; 39+ messages in thread
From: kbuild test robot @ 2015-10-20  7:24 UTC (permalink / raw)
  To: Huang Rui
  Cc: kbuild-all, Borislav Petkov, Guenter Roeck, Peter Zijlstra,
	Jean Delvare, Andy Lutomirski, Andreas Herrmann, Thomas Gleixner,
	Ingo Molnar, Rafael J. Wysocki, Len Brown, John Stultz,
	Frédéric Weisbecker, lm-sensors, linux-kernel, x86,
	Andreas Herrmann, Aravind Gopalakrishnan, Borislav Petkov,
	Fengguang Wu, Aaron Lu, Tony Li, Huang Rui

[-- Attachment #1: Type: text/plain, Size: 865 bytes --]

Hi Huang,

[auto build test ERROR on hwmon/hwmon-next -- if it's inappropriate base, please suggest rules for selecting the more suitable base]

url:    https://github.com/0day-ci/linux/commits/Huang-Rui/hwmon-fam15h_power-Introduce-an-accumulated-power-reporting-algorithm/20151020-110712
config: x86_64-randconfig-s2-10201413 (attached as .config)
reproduce:
        # save the attached .config to linux build tree
        make ARCH=x86_64 

All errors (new ones prefixed by >>):

   drivers/built-in.o: In function `fam15h_power_probe':
>> fam15h_power.c:(.text+0x26e3a3): undefined reference to `amd_get_cores_per_cu'
   fam15h_power.c:(.text+0x26e41e): undefined reference to `amd_get_cores_per_cu'

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/octet-stream, Size: 24958 bytes --]

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v2 05/10] hwmon: (fam15h_power) Add compute unit accumulated power
  2015-10-20  7:24   ` kbuild test robot
@ 2015-10-21  1:42     ` Huang Rui
  2015-10-21  2:15       ` Guenter Roeck
  0 siblings, 1 reply; 39+ messages in thread
From: Huang Rui @ 2015-10-21  1:42 UTC (permalink / raw)
  To: kbuild test robot, Yuanhan Liu, Fengguang Wu
  Cc: kbuild-all, Borislav Petkov, Guenter Roeck, Peter Zijlstra,
	Jean Delvare, Andy Lutomirski, Andreas Herrmann, Thomas Gleixner,
	Ingo Molnar, Rafael J. Wysocki, Len Brown, John Stultz,
	Fr�d�ric Weisbecker, lm-sensors, linux-kernel, x86,
	Andreas Herrmann, Aravind Gopalakrishnan, Borislav Petkov,
	Aaron Lu, Tony Li

On Tue, Oct 20, 2015 at 03:24:09PM +0800, kbuild test robot wrote:
> Hi Huang,
> 
> [auto build test ERROR on hwmon/hwmon-next -- if it's inappropriate base, please suggest rules for selecting the more suitable base]
> 
> url:    https://github.com/0day-ci/linux/commits/Huang-Rui/hwmon-fam15h_power-Introduce-an-accumulated-power-reporting-algorithm/20151020-110712
> config: x86_64-randconfig-s2-10201413 (attached as .config)
> reproduce:
>         # save the attached .config to linux build tree
>         make ARCH=x86_64 
> 
> All errors (new ones prefixed by >>):
> 
>    drivers/built-in.o: In function `fam15h_power_probe':
> >> fam15h_power.c:(.text+0x26e3a3): undefined reference to `amd_get_cores_per_cu'
>    fam15h_power.c:(.text+0x26e41e): undefined reference to `amd_get_cores_per_cu'
> 

Thanks to report this issue. :)
The root cause is that the test config doesn't enable
CONFIG_CPU_SUP_AMD.

How about below fix:

---
diff --git a/drivers/hwmon/Kconfig b/drivers/hwmon/Kconfig
index 796569ee..603eadd 100644
--- a/drivers/hwmon/Kconfig
+++ b/drivers/hwmon/Kconfig
@@ -288,7 +288,7 @@ config SENSORS_K10TEMP
 
 config SENSORS_FAM15H_POWER
 	tristate "AMD Family 15h processor power"
-	depends on X86 && PCI
+	depends on X86 && PCI && CPU_SUP_AMD
 	help
 	  If you say yes here you get support for processor power
 	  information of your AMD family 15h CPU.

---

Thanks,
Rui

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* Re: [PATCH v2 05/10] hwmon: (fam15h_power) Add compute unit accumulated power
  2015-10-21  1:42     ` Huang Rui
@ 2015-10-21  2:15       ` Guenter Roeck
  2015-10-21  2:40         ` Huang Rui
  0 siblings, 1 reply; 39+ messages in thread
From: Guenter Roeck @ 2015-10-21  2:15 UTC (permalink / raw)
  To: Huang Rui, kbuild test robot, Yuanhan Liu, Fengguang Wu
  Cc: kbuild-all, Borislav Petkov, Peter Zijlstra, Jean Delvare,
	Andy Lutomirski, Andreas Herrmann, Thomas Gleixner, Ingo Molnar,
	Rafael J. Wysocki, Len Brown, John Stultz,
	Fr�d�ric Weisbecker, lm-sensors, linux-kernel, x86,
	Andreas Herrmann, Aravind Gopalakrishnan, Borislav Petkov,
	Aaron Lu, Tony Li

On 10/20/2015 06:42 PM, Huang Rui wrote:
> On Tue, Oct 20, 2015 at 03:24:09PM +0800, kbuild test robot wrote:
>> Hi Huang,
>>
>> [auto build test ERROR on hwmon/hwmon-next -- if it's inappropriate base, please suggest rules for selecting the more suitable base]
>>
>> url:    https://github.com/0day-ci/linux/commits/Huang-Rui/hwmon-fam15h_power-Introduce-an-accumulated-power-reporting-algorithm/20151020-110712
>> config: x86_64-randconfig-s2-10201413 (attached as .config)
>> reproduce:
>>          # save the attached .config to linux build tree
>>          make ARCH=x86_64
>>
>> All errors (new ones prefixed by >>):
>>
>>     drivers/built-in.o: In function `fam15h_power_probe':
>>>> fam15h_power.c:(.text+0x26e3a3): undefined reference to `amd_get_cores_per_cu'
>>     fam15h_power.c:(.text+0x26e41e): undefined reference to `amd_get_cores_per_cu'
>>
>
> Thanks to report this issue. :)
> The root cause is that the test config doesn't enable
> CONFIG_CPU_SUP_AMD.
>
> How about below fix:
>

Guess you don't have a choice.

Guenter

> ---
> diff --git a/drivers/hwmon/Kconfig b/drivers/hwmon/Kconfig
> index 796569ee..603eadd 100644
> --- a/drivers/hwmon/Kconfig
> +++ b/drivers/hwmon/Kconfig
> @@ -288,7 +288,7 @@ config SENSORS_K10TEMP
>
>   config SENSORS_FAM15H_POWER
>   	tristate "AMD Family 15h processor power"
> -	depends on X86 && PCI
> +	depends on X86 && PCI && CPU_SUP_AMD
>   	help
>   	  If you say yes here you get support for processor power
>   	  information of your AMD family 15h CPU.
>
> ---
>
> Thanks,
> Rui
>


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v2 05/10] hwmon: (fam15h_power) Add compute unit accumulated power
  2015-10-21  2:15       ` Guenter Roeck
@ 2015-10-21  2:40         ` Huang Rui
  2015-10-21  2:49           ` Guenter Roeck
  0 siblings, 1 reply; 39+ messages in thread
From: Huang Rui @ 2015-10-21  2:40 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: kbuild test robot, Yuanhan Liu, Fengguang Wu, kbuild-all,
	Borislav Petkov, Peter Zijlstra, Jean Delvare, Andy Lutomirski,
	Andreas Herrmann, Thomas Gleixner, Ingo Molnar,
	Rafael J. Wysocki, Len Brown, John Stultz,
	Fr�d�ric Weisbecker, lm-sensors, linux-kernel, x86,
	Andreas Herrmann, Aravind Gopalakrishnan, Borislav Petkov,
	Aaron Lu, Tony Li

On Tue, Oct 20, 2015 at 07:15:25PM -0700, Guenter Roeck wrote:
> On 10/20/2015 06:42 PM, Huang Rui wrote:
> >On Tue, Oct 20, 2015 at 03:24:09PM +0800, kbuild test robot wrote:
> >>Hi Huang,
> >>
> >>[auto build test ERROR on hwmon/hwmon-next -- if it's inappropriate base, please suggest rules for selecting the more suitable base]
> >>
> >>url:    https://github.com/0day-ci/linux/commits/Huang-Rui/hwmon-fam15h_power-Introduce-an-accumulated-power-reporting-algorithm/20151020-110712
> >>config: x86_64-randconfig-s2-10201413 (attached as .config)
> >>reproduce:
> >>         # save the attached .config to linux build tree
> >>         make ARCH=x86_64
> >>
> >>All errors (new ones prefixed by >>):
> >>
> >>    drivers/built-in.o: In function `fam15h_power_probe':
> >>>>fam15h_power.c:(.text+0x26e3a3): undefined reference to `amd_get_cores_per_cu'
> >>    fam15h_power.c:(.text+0x26e41e): undefined reference to `amd_get_cores_per_cu'
> >>
> >
> >Thanks to report this issue. :)
> >The root cause is that the test config doesn't enable
> >CONFIG_CPU_SUP_AMD.
> >
> >How about below fix:
> >
> 
> Guess you don't have a choice.
> 

Yes, if I use test config, fam15h_power isn't chosen. :)

And if I use the "select" flag like below, fam15h_power can be built
successfully.

---
diff --git a/drivers/hwmon/Kconfig b/drivers/hwmon/Kconfig
index 796569ee..50b4fef 100644
--- a/drivers/hwmon/Kconfig
+++ b/drivers/hwmon/Kconfig
@@ -289,6 +289,7 @@ config SENSORS_K10TEMP
 config SENSORS_FAM15H_POWER
 	tristate "AMD Family 15h processor power"
 	depends on X86 && PCI
+	select CPU_SUP_AMD
 	help
 	  If you say yes here you get support for processor power
 	  information of your AMD family 15h CPU.
---

Thanks,
Rui

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* Re: [PATCH v2 05/10] hwmon: (fam15h_power) Add compute unit accumulated power
  2015-10-21  2:40         ` Huang Rui
@ 2015-10-21  2:49           ` Guenter Roeck
  2015-10-21  3:04             ` Huang Rui
  0 siblings, 1 reply; 39+ messages in thread
From: Guenter Roeck @ 2015-10-21  2:49 UTC (permalink / raw)
  To: Huang Rui
  Cc: kbuild test robot, Yuanhan Liu, Fengguang Wu, kbuild-all,
	Borislav Petkov, Peter Zijlstra, Jean Delvare, Andy Lutomirski,
	Andreas Herrmann, Thomas Gleixner, Ingo Molnar,
	Rafael J. Wysocki, Len Brown, John Stultz,
	Fr�d�ric Weisbecker, lm-sensors, linux-kernel, x86,
	Andreas Herrmann, Aravind Gopalakrishnan, Borislav Petkov,
	Aaron Lu, Tony Li

On 10/20/2015 07:40 PM, Huang Rui wrote:
> On Tue, Oct 20, 2015 at 07:15:25PM -0700, Guenter Roeck wrote:
>> On 10/20/2015 06:42 PM, Huang Rui wrote:
>>> On Tue, Oct 20, 2015 at 03:24:09PM +0800, kbuild test robot wrote:
>>>> Hi Huang,
>>>>
>>>> [auto build test ERROR on hwmon/hwmon-next -- if it's inappropriate base, please suggest rules for selecting the more suitable base]
>>>>
>>>> url:    https://github.com/0day-ci/linux/commits/Huang-Rui/hwmon-fam15h_power-Introduce-an-accumulated-power-reporting-algorithm/20151020-110712
>>>> config: x86_64-randconfig-s2-10201413 (attached as .config)
>>>> reproduce:
>>>>          # save the attached .config to linux build tree
>>>>          make ARCH=x86_64
>>>>
>>>> All errors (new ones prefixed by >>):
>>>>
>>>>     drivers/built-in.o: In function `fam15h_power_probe':
>>>>>> fam15h_power.c:(.text+0x26e3a3): undefined reference to `amd_get_cores_per_cu'
>>>>     fam15h_power.c:(.text+0x26e41e): undefined reference to `amd_get_cores_per_cu'
>>>>
>>>
>>> Thanks to report this issue. :)
>>> The root cause is that the test config doesn't enable
>>> CONFIG_CPU_SUP_AMD.
>>>
>>> How about below fix:
>>>
>>
>> Guess you don't have a choice.
>>
>
> Yes, if I use test config, fam15h_power isn't chosen. :)
>
> And if I use the "select" flag like below, fam15h_power can be built
> successfully.
>
That is another possibility, though that isn't how CPU_SUP_AMD
is handled by its other users. Matter of philosophy, I guess.

Guenter

> ---
> diff --git a/drivers/hwmon/Kconfig b/drivers/hwmon/Kconfig
> index 796569ee..50b4fef 100644
> --- a/drivers/hwmon/Kconfig
> +++ b/drivers/hwmon/Kconfig
> @@ -289,6 +289,7 @@ config SENSORS_K10TEMP
>   config SENSORS_FAM15H_POWER
>   	tristate "AMD Family 15h processor power"
>   	depends on X86 && PCI
> +	select CPU_SUP_AMD
>   	help
>   	  If you say yes here you get support for processor power
>   	  information of your AMD family 15h CPU.
> ---
>
> Thanks,
> Rui
>


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v2 05/10] hwmon: (fam15h_power) Add compute unit accumulated power
  2015-10-21  2:49           ` Guenter Roeck
@ 2015-10-21  3:04             ` Huang Rui
  2015-10-21  6:05               ` Jean Delvare
  0 siblings, 1 reply; 39+ messages in thread
From: Huang Rui @ 2015-10-21  3:04 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: kbuild test robot, Yuanhan Liu, Fengguang Wu, kbuild-all,
	Borislav Petkov, Peter Zijlstra, Jean Delvare, Andy Lutomirski,
	Andreas Herrmann, Thomas Gleixner, Ingo Molnar,
	Rafael J. Wysocki, Len Brown, John Stultz,
	Fr�d�ric Weisbecker, lm-sensors, linux-kernel, x86,
	Andreas Herrmann, Aravind Gopalakrishnan, Borislav Petkov,
	Aaron Lu, Tony Li

On Tue, Oct 20, 2015 at 07:49:53PM -0700, Guenter Roeck wrote:
> On 10/20/2015 07:40 PM, Huang Rui wrote:
> >On Tue, Oct 20, 2015 at 07:15:25PM -0700, Guenter Roeck wrote:
> >>On 10/20/2015 06:42 PM, Huang Rui wrote:
> >>>On Tue, Oct 20, 2015 at 03:24:09PM +0800, kbuild test robot wrote:
> >>>>Hi Huang,
> >>>>
> >>>>[auto build test ERROR on hwmon/hwmon-next -- if it's inappropriate base, please suggest rules for selecting the more suitable base]
> >>>>
> >>>>url:    https://github.com/0day-ci/linux/commits/Huang-Rui/hwmon-fam15h_power-Introduce-an-accumulated-power-reporting-algorithm/20151020-110712
> >>>>config: x86_64-randconfig-s2-10201413 (attached as .config)
> >>>>reproduce:
> >>>>         # save the attached .config to linux build tree
> >>>>         make ARCH=x86_64
> >>>>
> >>>>All errors (new ones prefixed by >>):
> >>>>
> >>>>    drivers/built-in.o: In function `fam15h_power_probe':
> >>>>>>fam15h_power.c:(.text+0x26e3a3): undefined reference to `amd_get_cores_per_cu'
> >>>>    fam15h_power.c:(.text+0x26e41e): undefined reference to `amd_get_cores_per_cu'
> >>>>
> >>>
> >>>Thanks to report this issue. :)
> >>>The root cause is that the test config doesn't enable
> >>>CONFIG_CPU_SUP_AMD.
> >>>
> >>>How about below fix:
> >>>
> >>
> >>Guess you don't have a choice.
> >>
> >
> >Yes, if I use test config, fam15h_power isn't chosen. :)
> >
> >And if I use the "select" flag like below, fam15h_power can be built
> >successfully.
> >
> That is another possibility, though that isn't how CPU_SUP_AMD
> is handled by its other users. Matter of philosophy, I guess.
> 

Err, sorry. Could you please point out the other possibility?

Thanks,
Rui

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v2 05/10] hwmon: (fam15h_power) Add compute unit accumulated power
  2015-10-21  3:04             ` Huang Rui
@ 2015-10-21  6:05               ` Jean Delvare
  2015-10-21  6:28                 ` Huang Rui
  0 siblings, 1 reply; 39+ messages in thread
From: Jean Delvare @ 2015-10-21  6:05 UTC (permalink / raw)
  To: Huang Rui
  Cc: Guenter Roeck, kbuild test robot, Yuanhan Liu, Fengguang Wu,
	kbuild-all, Borislav Petkov, Peter Zijlstra, Andy Lutomirski,
	Andreas Herrmann, Thomas Gleixner, Ingo Molnar,
	Rafael J. Wysocki, Len Brown, John Stultz, Frederic Weisbecker,
	lm-sensors, linux-kernel, x86, Andreas Herrmann,
	Aravind Gopalakrishnan, Borislav Petkov, Aaron Lu, Tony Li

Hi Rui,

On Wed, 21 Oct 2015 11:04:01 +0800, Huang Rui wrote:
> On Tue, Oct 20, 2015 at 07:49:53PM -0700, Guenter Roeck wrote:
> > On 10/20/2015 07:40 PM, Huang Rui wrote:
> > >>>Thanks to report this issue. :)
> > >>>The root cause is that the test config doesn't enable
> > >>>CONFIG_CPU_SUP_AMD.
> > >>>
> > >>>How about below fix:
> > >>>
> > >>
> > >>Guess you don't have a choice.
> > >>
> > >
> > >Yes, if I use test config, fam15h_power isn't chosen. :)
> > >
> > >And if I use the "select" flag like below, fam15h_power can be built
> > >successfully.
> > >
> > That is another possibility, though that isn't how CPU_SUP_AMD
> > is handled by its other users. Matter of philosophy, I guess.
> 
> Err, sorry. Could you please point out the other possibility?

Both possibilities came from you. First one is to use "depends", second
one is to use "select".

As pointed out bu Gunter, other drivers (EDAC_DECODE_MCE, MICROCODE,
AMD_NB) use "depends" so you should do the same for consistency.

-- 
Jean Delvare
SUSE L3 Support

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v2 05/10] hwmon: (fam15h_power) Add compute unit accumulated power
  2015-10-21  6:05               ` Jean Delvare
@ 2015-10-21  6:28                 ` Huang Rui
  0 siblings, 0 replies; 39+ messages in thread
From: Huang Rui @ 2015-10-21  6:28 UTC (permalink / raw)
  To: Jean Delvare
  Cc: Guenter Roeck, kbuild test robot, Yuanhan Liu, Fengguang Wu,
	kbuild-all, Borislav Petkov, Peter Zijlstra, Andy Lutomirski,
	Andreas Herrmann, Thomas Gleixner, Ingo Molnar,
	Rafael J. Wysocki, Len Brown, John Stultz, Frederic Weisbecker,
	lm-sensors, linux-kernel, x86, Andreas Herrmann,
	Aravind Gopalakrishnan, Borislav Petkov, Aaron Lu, Tony Li

Hi Jean,

On Wed, Oct 21, 2015 at 08:05:20AM +0200, Jean Delvare wrote:
> Hi Rui,
> 
> On Wed, 21 Oct 2015 11:04:01 +0800, Huang Rui wrote:
> > On Tue, Oct 20, 2015 at 07:49:53PM -0700, Guenter Roeck wrote:
> > > On 10/20/2015 07:40 PM, Huang Rui wrote:
> > > >>>Thanks to report this issue. :)
> > > >>>The root cause is that the test config doesn't enable
> > > >>>CONFIG_CPU_SUP_AMD.
> > > >>>
> > > >>>How about below fix:
> > > >>>
> > > >>
> > > >>Guess you don't have a choice.
> > > >>
> > > >
> > > >Yes, if I use test config, fam15h_power isn't chosen. :)
> > > >
> > > >And if I use the "select" flag like below, fam15h_power can be built
> > > >successfully.
> > > >
> > > That is another possibility, though that isn't how CPU_SUP_AMD
> > > is handled by its other users. Matter of philosophy, I guess.
> > 
> > Err, sorry. Could you please point out the other possibility?
> 
> Both possibilities came from you. First one is to use "depends", second
> one is to use "select".
> 
> As pointed out bu Gunter, other drivers (EDAC_DECODE_MCE, MICROCODE,
> AMD_NB) use "depends" so you should do the same for consistency.
> 

Thanks to clarify it. I will use "depends" at next version.

Thanks,
Rui

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v2 09/10] hwmon: (fam15h_power) Add documentation for accumulated power algorithm
  2015-10-20  2:28 ` [PATCH v2 09/10] hwmon: (fam15h_power) Add documentation for accumulated power algorithm Huang Rui
@ 2015-10-23 10:22   ` Borislav Petkov
  0 siblings, 0 replies; 39+ messages in thread
From: Borislav Petkov @ 2015-10-23 10:22 UTC (permalink / raw)
  To: Huang Rui
  Cc: Guenter Roeck, Peter Zijlstra, Jean Delvare, Andy Lutomirski,
	Andreas Herrmann, Thomas Gleixner, Ingo Molnar,
	Rafael J. Wysocki, Len Brown, John Stultz,
	Frédéric Weisbecker, lm-sensors, linux-kernel, x86,
	Andreas Herrmann, Aravind Gopalakrishnan, Fengguang Wu, Aaron Lu,
	Tony Li

On Tue, Oct 20, 2015 at 10:28:28AM +0800, Huang Rui wrote:
> This patch adds the description to explain the accumulated power
> algorithm.
> 
> Signed-off-by: Huang Rui <ray.huang@amd.com>
> Cc: Borislav Petkov <bp@alien8.de>
> Cc: Guenter Roeck <linux@roeck-us.net>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Ingo Molnar <mingo@kernel.org>
> ---
>  Documentation/hwmon/fam15h_power | 46 ++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 46 insertions(+)

You can merge this one into the previous one - they're both updating
documentation.

-- 
Regards/Gruss,
    Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
-- 

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v2 05/10] hwmon: (fam15h_power) Add compute unit accumulated power
  2015-10-20  2:28 ` [PATCH v2 05/10] hwmon: (fam15h_power) Add compute unit accumulated power Huang Rui
  2015-10-20  7:24   ` kbuild test robot
@ 2015-10-23 13:27   ` Borislav Petkov
  2015-10-23 13:37     ` Guenter Roeck
  2015-10-27  2:53     ` Huang Rui
  1 sibling, 2 replies; 39+ messages in thread
From: Borislav Petkov @ 2015-10-23 13:27 UTC (permalink / raw)
  To: Huang Rui
  Cc: Guenter Roeck, Peter Zijlstra, Jean Delvare, Andy Lutomirski,
	Andreas Herrmann, Thomas Gleixner, Ingo Molnar,
	Rafael J. Wysocki, Len Brown, John Stultz,
	Frédéric Weisbecker, lm-sensors, linux-kernel, x86,
	Andreas Herrmann, Aravind Gopalakrishnan, Fengguang Wu, Aaron Lu,
	Tony Li

On Tue, Oct 20, 2015 at 10:28:24AM +0800, Huang Rui wrote:
> This patch adds a member in fam15h_power_data which specifies the
> compute unit accumulated power. It adds do_read_registers_on_cu to do
> all the read to all MSRs and run it on one of the online cores on each
> compute unit with smp_call_function_many(). This behavior can decrease
> IPI numbers.
> 
> Suggested-by: Borislav Petkov <bp@alien8.de>
> Signed-off-by: Huang Rui <ray.huang@amd.com>
> Cc: Guenter Roeck <linux@roeck-us.net>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Ingo Molnar <mingo@kernel.org>
> ---
>  drivers/hwmon/fam15h_power.c | 68 +++++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 67 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/hwmon/fam15h_power.c b/drivers/hwmon/fam15h_power.c
> index e2bfab5..88e4f3e 100644
> --- a/drivers/hwmon/fam15h_power.c
> +++ b/drivers/hwmon/fam15h_power.c
> @@ -25,6 +25,7 @@
>  #include <linux/module.h>
>  #include <linux/pci.h>
>  #include <linux/bitops.h>
> +#include <linux/cpumask.h>
>  #include <asm/processor.h>
>  #include <asm/msr.h>
>  
> @@ -44,7 +45,9 @@ MODULE_LICENSE("GPL");
>  
>  #define FAM15H_MIN_NUM_ATTRS		2
>  #define FAM15H_NUM_GROUPS		2
> +#define MAX_CUS				8
>  
> +#define MSR_F15H_CU_PWR_ACCUMULATOR	0xc001007a
>  #define MSR_F15H_CU_MAX_PWR_ACCUMULATOR	0xc001007b
>  
>  struct fam15h_power_data {
> @@ -57,6 +60,8 @@ struct fam15h_power_data {
>  	struct attribute_group fam15h_power_group;
>  	/* maximum accumulated power of a compute unit */
>  	u64 max_cu_acc_power;
> +	/* accumulated power of the compute units */
> +	u64 cu_acc_power[MAX_CUS];
>  };
>  
>  static ssize_t show_power(struct device *dev,
> @@ -115,6 +120,65 @@ static ssize_t show_power_crit(struct device *dev,
>  }
>  static DEVICE_ATTR(power1_crit, S_IRUGO, show_power_crit, NULL);
>  
> +static void do_read_registers_on_cu(void *_data)
> +{
> +	struct fam15h_power_data *data = _data;
> +	int cpu, cu, cores_per_cu;
> +
> +	cpu = smp_processor_id();
> +
> +	cores_per_cu = amd_get_cores_per_cu();
> +	cu = cpu / cores_per_cu;
> +
> +	WARN_ON(rdmsrl_safe(MSR_F15H_CU_PWR_ACCUMULATOR,
> +			    &data->cu_acc_power[cu]));

I guess the WARN_ON's here should be WARN_ON_ONCE() - otherwise dmesg is
filling up very quickly.

> +}
> +
> +static int read_registers(struct fam15h_power_data *data)
> +{
> +	int this_cpu, ret;
> +	int cu_num, cores_per_cu, cpu, cu;
> +	cpumask_var_t mask;
> +
> +	cores_per_cu = amd_get_cores_per_cu();
> +	cu_num = boot_cpu_data.x86_max_cores / cores_per_cu;
> +
> +	WARN_ON_ONCE(cu_num > MAX_CUS);
> +
> +	ret = zalloc_cpumask_var(&mask, GFP_KERNEL);
> +	if (!ret)
> +		return -ENOMEM;
> +
> +	this_cpu = get_cpu();

This should be get_online_cpus() and its counterpart below should be
put_online_cpus().

> +
> +	/*
> +	 * Choose the first online core of each compute unit, and then
> +	 * read their MSR value of power and ptsc in one time of IPI,

						in a single IPI.

> +	 * because the MSR value of cpu core represent the compute

s/cpu/CPU/

do that in *all* your text.

> +	 * unit's. This behavior can decrease IPI numbers between the

	  unit's ?

What does that sentence even mean?

> +	 * cores.
> +	 */
> +	cpu = cpumask_first(cpu_online_mask);
> +	cu = cpu / cores_per_cu;
> +	while (cpu < boot_cpu_data.x86_max_cores) {
> +		if (cu <= cpu / cores_per_cu) {
> +			cu = cpu / cores_per_cu + 1;
> +			cpumask_set_cpu(cpu, mask);
> +		}
> +		cpu = cpumask_next(cu * cores_per_cu - 1, cpu_online_mask);
> +	}

This is hard to parse - I *think* you're setting a bit in mask for a
core in each CU...

If so, I think you can simplify it by generating a tmp mask which
contains the cores of CU0, i.e. something like that:

	11_00_00_...

and then do cpumask_and(res, ...) to find the online cores on that CU
and then do cpumask_set_cpu(cpumask_any(res), mask) to select one CPU on
that CU.

And then shift to the next CU:

	cpumask_shift_right(dst, src_mask, cores_per_cu);

I think this should be cleaner and less error prone, without the
conditionals...

> +
> +	if (cpumask_test_cpu(this_cpu, mask))
> +		do_read_registers_on_cu(data);
> +
> +	smp_call_function_many(mask, do_read_registers_on_cu, data, true);
> +	put_cpu();
> +
> +	free_cpumask_var(mask);
> +
> +	return 0;
> +}
> +
>  static int fam15h_power_init_attrs(struct pci_dev *pdev,
>  				   struct fam15h_power_data *data)
>  {
> @@ -253,7 +317,9 @@ static int fam15h_power_init_data(struct pci_dev *f4,
>  
>  	data->max_cu_acc_power = tmp;
>  
> -	return 0;
> +	ret = read_registers(data);
> +
> +	return ret;

Simply:

	return read_registers(data);

>  }
>  
>  static int fam15h_power_probe(struct pci_dev *pdev,
> -- 
> 1.9.1
> 

-- 
Regards/Gruss,
    Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
-- 

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v2 07/10] hwmon: (fam15h_power) Introduce a cpu accumulated power reporting algorithm
  2015-10-20  2:28 ` [PATCH v2 07/10] hwmon: (fam15h_power) Introduce a cpu accumulated power reporting algorithm Huang Rui
@ 2015-10-23 13:28   ` Borislav Petkov
  2015-10-26  3:46     ` Huang Rui
  2015-10-23 14:20   ` Guenter Roeck
  1 sibling, 1 reply; 39+ messages in thread
From: Borislav Petkov @ 2015-10-23 13:28 UTC (permalink / raw)
  To: Huang Rui
  Cc: Guenter Roeck, Peter Zijlstra, Jean Delvare, Andy Lutomirski,
	Andreas Herrmann, Thomas Gleixner, Ingo Molnar,
	Rafael J. Wysocki, Len Brown, John Stultz,
	Frédéric Weisbecker, lm-sensors, linux-kernel, x86,
	Andreas Herrmann, Aravind Gopalakrishnan, Fengguang Wu, Aaron Lu,
	Tony Li

On Tue, Oct 20, 2015 at 10:28:26AM +0800, Huang Rui wrote:
> This patch introduces an algorithm that computes the average power by
> reading a delta value of “core power accumulator” register during
> measurement interval, and then dividing delta value by the length of
> the time interval.
> 
> User is able to use power1_average entry to measure the processor power
> consumption and power1_average_interval entry to set the interval.
> 
> A simple example:
> 
> ray@hr-ub:~/tip$ sensors
> fam15h_power-pci-00c4
> Adapter: PCI adapter
> power1:       23.73 mW (avg = 634.63 mW, interval =   0.01 s)
>                        (crit =  15.00 W)
> 
> ...

I need to play with this more after I get back from KS. Just a partial
review for now.

> 
> The result is current average processor power consumption in 10
> millisecond. The unit of the result is uWatt.
> 
> Suggested-by: Guenter Roeck <linux@roeck-us.net>
> Signed-off-by: Huang Rui <ray.huang@amd.com>
> Cc: Borislav Petkov <bp@alien8.de>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Ingo Molnar <mingo@kernel.org>
> ---
>  drivers/hwmon/fam15h_power.c | 120 +++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 120 insertions(+)
> 
> diff --git a/drivers/hwmon/fam15h_power.c b/drivers/hwmon/fam15h_power.c
> index 6321f73..a5a539e 100644
> --- a/drivers/hwmon/fam15h_power.c
> +++ b/drivers/hwmon/fam15h_power.c
> @@ -26,6 +26,9 @@
>  #include <linux/pci.h>
>  #include <linux/bitops.h>
>  #include <linux/cpumask.h>
> +#include <linux/mutex.h>
> +#include <linux/time.h>
> +#include <linux/sched.h>
>  #include <asm/processor.h>
>  #include <asm/msr.h>
>  
> @@ -64,6 +67,10 @@ struct fam15h_power_data {
>  	u64 cu_acc_power[MAX_CUS];
>  	/* performance timestamp counter */
>  	u64 cpu_sw_pwr_ptsc[MAX_CUS];
> +	/* online/offline status of current compute unit */
> +	int cu_on[MAX_CUS];
> +	unsigned long power_period;
> +	struct mutex acc_pwr_mutex;
>  };
>  
>  static ssize_t show_power(struct device *dev,
> @@ -132,11 +139,15 @@ static void do_read_registers_on_cu(void *_data)
>  	cores_per_cu = amd_get_cores_per_cu();
>  	cu = cpu / cores_per_cu;
>  
> +	mutex_lock(&data->acc_pwr_mutex);
>  	WARN_ON(rdmsrl_safe(MSR_F15H_CU_PWR_ACCUMULATOR,
>  			    &data->cu_acc_power[cu]));
>  
>  	WARN_ON(rdmsrl_safe(MSR_F15H_PTSC,
>  			    &data->cpu_sw_pwr_ptsc[cu]));
> +
> +	data->cu_on[cu] = 1;
> +	mutex_unlock(&data->acc_pwr_mutex);
>  }
>  
>  static int read_registers(struct fam15h_power_data *data)
> @@ -148,6 +159,10 @@ static int read_registers(struct fam15h_power_data *data)
>  	cores_per_cu = amd_get_cores_per_cu();
>  	cu_num = boot_cpu_data.x86_max_cores / cores_per_cu;
>  
> +	mutex_lock(&data->acc_pwr_mutex);
> +	memset(data->cu_on, 0, sizeof(int) * MAX_CUS);
> +	mutex_unlock(&data->acc_pwr_mutex);
> +
>  	WARN_ON_ONCE(cu_num > MAX_CUS);
>  
>  	ret = zalloc_cpumask_var(&mask, GFP_KERNEL);
> @@ -184,18 +199,113 @@ static int read_registers(struct fam15h_power_data *data)
>  	return 0;
>  }
>  
> +static ssize_t acc_show_power(struct device *dev,
> +			      struct device_attribute *attr,
> +			      char *buf)
> +{
> +	struct fam15h_power_data *data = dev_get_drvdata(dev);
> +	u64 prev_cu_acc_power[MAX_CUS], prev_ptsc[MAX_CUS],
> +	    jdelta[MAX_CUS];
> +	u64 tdelta, avg_acc;
> +	int cu, cu_num, cores_per_cu, ret;
> +	signed long leftover;
> +
> +	cores_per_cu = amd_get_cores_per_cu();
> +	cu_num = boot_cpu_data.x86_max_cores / cores_per_cu;
> +
> +	ret = read_registers(data);
> +	if (ret)
> +		return 0;
> +
> +	cu = 0;
> +	while(cu++ < cu_num) {
> +		prev_cu_acc_power[cu] = data->cu_acc_power[cu];
> +		prev_ptsc[cu] = data->cpu_sw_pwr_ptsc[cu];
> +	}

Please integrate checkpatch into your workflow of creating patches - it
can be correct sometimes:

ERROR: space required before the open parenthesis '('
#130: FILE: drivers/hwmon/fam15h_power.c:221:
+       while(cu++ < cu_num) {


> +
> +	leftover = schedule_timeout_interruptible(
> +			msecs_to_jiffies(data->power_period)
> +		   );

This way of writing a function call is reaaally ugly. What's wrong with:

	leftover = schedule_timeout_interruptible(msecs_to_jiffies(data->power_period));

?

And don't tell me 80 columns - that rule is not a hard one.

> +	if (leftover)
> +		return 0;
> +

...

-- 
Regards/Gruss,
    Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
-- 

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v2 05/10] hwmon: (fam15h_power) Add compute unit accumulated power
  2015-10-23 13:27   ` Borislav Petkov
@ 2015-10-23 13:37     ` Guenter Roeck
  2015-10-27  2:53     ` Huang Rui
  1 sibling, 0 replies; 39+ messages in thread
From: Guenter Roeck @ 2015-10-23 13:37 UTC (permalink / raw)
  To: Borislav Petkov, Huang Rui
  Cc: Peter Zijlstra, Jean Delvare, Andy Lutomirski, Andreas Herrmann,
	Thomas Gleixner, Ingo Molnar, Rafael J. Wysocki, Len Brown,
	John Stultz, Frédéric Weisbecker, lm-sensors,
	linux-kernel, x86, Andreas Herrmann, Aravind Gopalakrishnan,
	Fengguang Wu, Aaron Lu, Tony Li

On 10/23/2015 06:27 AM, Borislav Petkov wrote:
> On Tue, Oct 20, 2015 at 10:28:24AM +0800, Huang Rui wrote:
>> This patch adds a member in fam15h_power_data which specifies the
>> compute unit accumulated power. It adds do_read_registers_on_cu to do
>> all the read to all MSRs and run it on one of the online cores on each
>> compute unit with smp_call_function_many(). This behavior can decrease
>> IPI numbers.
>>
>> Suggested-by: Borislav Petkov <bp@alien8.de>
>> Signed-off-by: Huang Rui <ray.huang@amd.com>
>> Cc: Guenter Roeck <linux@roeck-us.net>
>> Cc: Peter Zijlstra <peterz@infradead.org>
>> Cc: Ingo Molnar <mingo@kernel.org>
>> ---
>>   drivers/hwmon/fam15h_power.c | 68 +++++++++++++++++++++++++++++++++++++++++++-
>>   1 file changed, 67 insertions(+), 1 deletion(-)
>>

[ ... ]

>> +static void do_read_registers_on_cu(void *_data)
>> +{
>> +	struct fam15h_power_data *data = _data;
>> +	int cpu, cu, cores_per_cu;
>> +
>> +	cpu = smp_processor_id();
>> +
>> +	cores_per_cu = amd_get_cores_per_cu();
>> +	cu = cpu / cores_per_cu;
>> +
>> +	WARN_ON(rdmsrl_safe(MSR_F15H_CU_PWR_ACCUMULATOR,
>> +			    &data->cu_acc_power[cu]));
>
> I guess the WARN_ON's here should be WARN_ON_ONCE() - otherwise dmesg is
> filling up very quickly.
>
I don't really understand why WARN_ON (or WARN_ON_ONCE)
would be warranted here in the first place.

Guenter


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v2 01/10] hwmon: (fam15h_power) Refactor attributes for dynamically added
  2015-10-20  2:28 ` [PATCH v2 01/10] hwmon: (fam15h_power) Refactor attributes for dynamically added Huang Rui
@ 2015-10-23 13:42   ` Guenter Roeck
  2015-10-26  1:58     ` Huang Rui
  0 siblings, 1 reply; 39+ messages in thread
From: Guenter Roeck @ 2015-10-23 13:42 UTC (permalink / raw)
  To: Huang Rui, Borislav Petkov, Peter Zijlstra, Jean Delvare,
	Andy Lutomirski, Andreas Herrmann, Thomas Gleixner, Ingo Molnar,
	Rafael J. Wysocki, Len Brown, John Stultz,
	Frédéric Weisbecker
  Cc: lm-sensors, linux-kernel, x86, Andreas Herrmann,
	Aravind Gopalakrishnan, Borislav Petkov, Fengguang Wu, Aaron Lu,
	Tony Li

On 10/19/2015 07:28 PM, Huang Rui wrote:
> Attributes depend on the CPU model the driver gets loaded on.
> Therefore, add those attributes dynamically at init time. This is more
> flexible to control the different attributes on different platforms.
>
> Suggestedy-by: Borislav Petkov <bp@alien8.de>
> Signed-off-by: Huang Rui <ray.huang@amd.com>
> Cc: Guenter Roeck <linux@roeck-us.net>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Ingo Molnar <mingo@kernel.org>
> ---
>   drivers/hwmon/fam15h_power.c | 70 ++++++++++++++++++++++++++++----------------
>   1 file changed, 45 insertions(+), 25 deletions(-)
>
> diff --git a/drivers/hwmon/fam15h_power.c b/drivers/hwmon/fam15h_power.c
> index e80ee23..41d022e 100644
> --- a/drivers/hwmon/fam15h_power.c
> +++ b/drivers/hwmon/fam15h_power.c
> @@ -41,12 +41,17 @@ MODULE_LICENSE("GPL");
>   #define REG_TDP_RUNNING_AVERAGE		0xe0
>   #define REG_TDP_LIMIT3			0xe8
>
> +#define FAM15H_MIN_NUM_ATTRS		2
> +#define FAM15H_NUM_GROUPS		2
> +
>   struct fam15h_power_data {
>   	struct pci_dev *pdev;
>   	unsigned int tdp_to_watts;
>   	unsigned int base_tdp;
>   	unsigned int processor_pwr_watts;
>   	unsigned int cpu_pwr_sample_ratio;
> +	const struct attribute_group *fam15h_power_groups[FAM15H_NUM_GROUPS];
> +	struct attribute_group fam15h_power_group;
>   };
>
>   static ssize_t show_power(struct device *dev,
> @@ -105,29 +110,31 @@ static ssize_t show_power_crit(struct device *dev,
>   }
>   static DEVICE_ATTR(power1_crit, S_IRUGO, show_power_crit, NULL);
>
> -static umode_t fam15h_power_is_visible(struct kobject *kobj,
> -				       struct attribute *attr,
> -				       int index)
> +static int fam15h_power_init_attrs(struct pci_dev *pdev,
> +				   struct fam15h_power_data *data)
>   {
> -	/* power1_input is only reported for Fam15h, Models 00h-0fh */
> -	if (attr == &dev_attr_power1_input.attr &&
> -	   (boot_cpu_data.x86 != 0x15 || boot_cpu_data.x86_model > 0xf))
> -		return 0;
> +	int n = FAM15H_MIN_NUM_ATTRS;
> +	struct attribute **fam15h_power_attrs;
>
> -	return attr->mode;
> -}
> +	if (boot_cpu_data.x86 == 0x15 && boot_cpu_data.x86_model <= 0xf)
> +		n += 1;
>
> -static struct attribute *fam15h_power_attrs[] = {
> -	&dev_attr_power1_input.attr,
> -	&dev_attr_power1_crit.attr,
> -	NULL
> -};
> +	fam15h_power_attrs = devm_kcalloc(&pdev->dev, n,
> +					  sizeof(*fam15h_power_attrs),
> +					  GFP_KERNEL);
>
> -static const struct attribute_group fam15h_power_group = {
> -	.attrs = fam15h_power_attrs,
> -	.is_visible = fam15h_power_is_visible,
> -};
> -__ATTRIBUTE_GROUPS(fam15h_power);
> +	if (!fam15h_power_attrs)
> +		return -ENOMEM;
> +
> +	n = 0;
> +	fam15h_power_attrs[n++] = &dev_attr_power1_crit.attr;
> +	if (boot_cpu_data.x86 == 0x15 && boot_cpu_data.x86_model <= 0xf)
> +		fam15h_power_attrs[n++] = &dev_attr_power1_input.attr;
> +
> +	data->fam15h_power_group.attrs = fam15h_power_attrs;
> +
> +	return 0;
> +}
>
>   static bool should_load_on_this_node(struct pci_dev *f4)
>   {
> @@ -186,11 +193,12 @@ static int fam15h_power_resume(struct pci_dev *pdev)
>   #define fam15h_power_resume NULL
>   #endif
>
> -static void fam15h_power_init_data(struct pci_dev *f4,
> -					     struct fam15h_power_data *data)
> +static int fam15h_power_init_data(struct pci_dev *f4,
> +				  struct fam15h_power_data *data)
>   {
>   	u32 val, eax, ebx, ecx, edx;
>   	u64 tmp;
> +	int ret;
>
>   	pci_read_config_dword(f4, REG_PROCESSOR_TDP, &val);
>   	data->base_tdp = val >> 16;
> @@ -211,11 +219,15 @@ static void fam15h_power_init_data(struct pci_dev *f4,
>   	/* convert to microWatt */
>   	data->processor_pwr_watts = (tmp * 15625) >> 10;
>
> +	ret = fam15h_power_init_attrs(f4, data);
> +	if (ret)
> +		return ret;
> +
>   	cpuid(0x80000007, &eax, &ebx, &ecx, &edx);
>
>   	/* CPUID Fn8000_0007:EDX[12] indicates to support accumulated power */
>   	if (!(edx & BIT(12)))
> -		return;
> +		return 0;
>
>   	/*
>   	 * determine the ratio of the compute unit power accumulator
> @@ -223,14 +235,17 @@ static void fam15h_power_init_data(struct pci_dev *f4,
>   	 * Fn8000_0007:ECX
>   	 */
>   	data->cpu_pwr_sample_ratio = ecx;
> +
> +	return 0;
>   }
>
>   static int fam15h_power_probe(struct pci_dev *pdev,
> -					const struct pci_device_id *id)
> +			      const struct pci_device_id *id)
>   {
>   	struct fam15h_power_data *data;
>   	struct device *dev = &pdev->dev;
>   	struct device *hwmon_dev;
> +	int ret;
>
>   	/*
>   	 * though we ignore every other northbridge, we still have to
> @@ -246,12 +261,17 @@ static int fam15h_power_probe(struct pci_dev *pdev,
>   	if (!data)
>   		return -ENOMEM;
>
> -	fam15h_power_init_data(pdev, data);
> +	ret = fam15h_power_init_data(pdev, data);
> +	if (ret)
> +		return ret;
> +
>   	data->pdev = pdev;
>
> +	data->fam15h_power_groups[0] = &data->fam15h_power_group;
> +
>   	hwmon_dev = devm_hwmon_device_register_with_groups(dev, "fam15h_power",
>   							   data,
> -							   fam15h_power_groups);
> +							   (const struct attribute_group **)&data->fam15h_power_groups);

Why this typecast ? It should not be needed.

Guenter


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v2 02/10] hwmon: (fam15h_power) Enable power1_input on AMD Carrizo
  2015-10-20  2:28 ` [PATCH v2 02/10] hwmon: (fam15h_power) Enable power1_input on AMD Carrizo Huang Rui
@ 2015-10-23 13:45   ` Guenter Roeck
  2015-10-26  2:19     ` Huang Rui
  0 siblings, 1 reply; 39+ messages in thread
From: Guenter Roeck @ 2015-10-23 13:45 UTC (permalink / raw)
  To: Huang Rui, Borislav Petkov, Peter Zijlstra, Jean Delvare,
	Andy Lutomirski, Andreas Herrmann, Thomas Gleixner, Ingo Molnar,
	Rafael J. Wysocki, Len Brown, John Stultz,
	Frédéric Weisbecker
  Cc: lm-sensors, linux-kernel, x86, Andreas Herrmann,
	Aravind Gopalakrishnan, Borislav Petkov, Fengguang Wu, Aaron Lu,
	Tony Li

On 10/19/2015 07:28 PM, Huang Rui wrote:
> This patch enables power1_input attribute for Carrizo platform.
>
> Signed-off-by: Huang Rui <ray.huang@amd.com>
> Cc: Borislav Petkov <bp@alien8.de>
> Cc: Guenter Roeck <linux@roeck-us.net>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Ingo Molnar <mingo@kernel.org>
> ---
>   drivers/hwmon/fam15h_power.c | 9 +++++++--
>   1 file changed, 7 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/hwmon/fam15h_power.c b/drivers/hwmon/fam15h_power.c
> index 41d022e..a090adf 100644
> --- a/drivers/hwmon/fam15h_power.c
> +++ b/drivers/hwmon/fam15h_power.c
> @@ -115,8 +115,11 @@ static int fam15h_power_init_attrs(struct pci_dev *pdev,
>   {
>   	int n = FAM15H_MIN_NUM_ATTRS;
>   	struct attribute **fam15h_power_attrs;
> +	struct cpuinfo_x86 *c = &boot_cpu_data;
>
> -	if (boot_cpu_data.x86 == 0x15 && boot_cpu_data.x86_model <= 0xf)
> +	if (c->x86 == 0x15 &&
> +	    ((c->x86_model <= 0xf) ||

Please no unnecessary ( ).

> +	     (c->x86_model >= 0x60 && c->x86_model <= 0x6f)))

Those are acceptable to clarify that the && has precedence on purpose,
but "(c->x86_model <= 0xf)" is really unnecessary (and inconsistent
with the rest of the code).

>   		n += 1;
>
>   	fam15h_power_attrs = devm_kcalloc(&pdev->dev, n,
> @@ -128,7 +131,9 @@ static int fam15h_power_init_attrs(struct pci_dev *pdev,
>
>   	n = 0;
>   	fam15h_power_attrs[n++] = &dev_attr_power1_crit.attr;
> -	if (boot_cpu_data.x86 == 0x15 && boot_cpu_data.x86_model <= 0xf)
> +	if (c->x86 == 0x15 &&
> +	    ((c->x86_model <= 0xf) ||
> +	     (c->x86_model >= 0x60 && c->x86_model <= 0x6f)))

Same here.

>   		fam15h_power_attrs[n++] = &dev_attr_power1_input.attr;
>
>   	data->fam15h_power_group.attrs = fam15h_power_attrs;
>


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v2 06/10] hwmon: (fam15h_power) Add ptsc counter value for accumulated power
  2015-10-20  2:28 ` [PATCH v2 06/10] hwmon: (fam15h_power) Add ptsc counter value for " Huang Rui
@ 2015-10-23 13:59   ` Guenter Roeck
  2015-10-26  3:37     ` Huang Rui
  0 siblings, 1 reply; 39+ messages in thread
From: Guenter Roeck @ 2015-10-23 13:59 UTC (permalink / raw)
  To: Huang Rui, Borislav Petkov, Peter Zijlstra, Jean Delvare,
	Andy Lutomirski, Andreas Herrmann, Thomas Gleixner, Ingo Molnar,
	Rafael J. Wysocki, Len Brown, John Stultz,
	Frédéric Weisbecker
  Cc: lm-sensors, linux-kernel, x86, Andreas Herrmann,
	Aravind Gopalakrishnan, Borislav Petkov, Fengguang Wu, Aaron Lu,
	Tony Li

On 10/19/2015 07:28 PM, Huang Rui wrote:
> PTSC is the performance timestamp counter value in a cpu core and the
> cores in one compute unit have the fixed frequency. So it picks up the
> performance timestamp counter value of the first core per compute unit
> to measure the interval for average power per compute unit.
>
> Signed-off-by: Huang Rui <ray.huang@amd.com>
> Cc: Borislav Petkov <bp@alien8.de>
> Cc: Guenter Roeck <linux@roeck-us.net>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Ingo Molnar <mingo@kernel.org>
> ---
>   arch/x86/include/asm/msr-index.h | 1 +
>   drivers/hwmon/fam15h_power.c     | 5 +++++
>   2 files changed, 6 insertions(+)
>
> diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
> index c1c0a1c..3686eaa 100644
> --- a/arch/x86/include/asm/msr-index.h
> +++ b/arch/x86/include/asm/msr-index.h
> @@ -313,6 +313,7 @@
>   #define MSR_F15H_PERF_CTR		0xc0010201
>   #define MSR_F15H_NB_PERF_CTL		0xc0010240
>   #define MSR_F15H_NB_PERF_CTR		0xc0010241
> +#define MSR_F15H_PTSC			0xc0010280
>
>   /* Fam 10h MSRs */
>   #define MSR_FAM10H_MMIO_CONF_BASE	0xc0010058
> diff --git a/drivers/hwmon/fam15h_power.c b/drivers/hwmon/fam15h_power.c
> index 88e4f3e..6321f73 100644
> --- a/drivers/hwmon/fam15h_power.c
> +++ b/drivers/hwmon/fam15h_power.c
> @@ -62,6 +62,8 @@ struct fam15h_power_data {
>   	u64 max_cu_acc_power;
>   	/* accumulated power of the compute units */
>   	u64 cu_acc_power[MAX_CUS];
> +	/* performance timestamp counter */
> +	u64 cpu_sw_pwr_ptsc[MAX_CUS];
>   };
>
>   static ssize_t show_power(struct device *dev,
> @@ -132,6 +134,9 @@ static void do_read_registers_on_cu(void *_data)
>
>   	WARN_ON(rdmsrl_safe(MSR_F15H_CU_PWR_ACCUMULATOR,
>   			    &data->cu_acc_power[cu]));
> +
> +	WARN_ON(rdmsrl_safe(MSR_F15H_PTSC,
> +			    &data->cpu_sw_pwr_ptsc[cu]));
>   }

I am not really happy with those WARN_ON, or even an error message.
If the error is seen, it may be persistent.

If an error check is really needed here, it might make more sense to store
the read error and return it to user space if the respective sysfs attribute
is read.

Guenter

>
>   static int read_registers(struct fam15h_power_data *data)
>


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v2 07/10] hwmon: (fam15h_power) Introduce a cpu accumulated power reporting algorithm
  2015-10-20  2:28 ` [PATCH v2 07/10] hwmon: (fam15h_power) Introduce a cpu accumulated power reporting algorithm Huang Rui
  2015-10-23 13:28   ` Borislav Petkov
@ 2015-10-23 14:20   ` Guenter Roeck
  2015-10-27  3:36     ` Huang Rui
  1 sibling, 1 reply; 39+ messages in thread
From: Guenter Roeck @ 2015-10-23 14:20 UTC (permalink / raw)
  To: Huang Rui, Borislav Petkov, Peter Zijlstra, Jean Delvare,
	Andy Lutomirski, Andreas Herrmann, Thomas Gleixner, Ingo Molnar,
	Rafael J. Wysocki, Len Brown, John Stultz,
	Frédéric Weisbecker
  Cc: lm-sensors, linux-kernel, x86, Andreas Herrmann,
	Aravind Gopalakrishnan, Borislav Petkov, Fengguang Wu, Aaron Lu,
	Tony Li

On 10/19/2015 07:28 PM, Huang Rui wrote:
> This patch introduces an algorithm that computes the average power by
> reading a delta value of “core power accumulator” register during
> measurement interval, and then dividing delta value by the length of
> the time interval.
>
> User is able to use power1_average entry to measure the processor power
> consumption and power1_average_interval entry to set the interval.
>
> A simple example:
>
> ray@hr-ub:~/tip$ sensors
> fam15h_power-pci-00c4
> Adapter: PCI adapter
> power1:       23.73 mW (avg = 634.63 mW, interval =   0.01 s)
>                         (crit =  15.00 W)
>
> ...
>
> The result is current average processor power consumption in 10
> millisecond. The unit of the result is uWatt.
>
> Suggested-by: Guenter Roeck <linux@roeck-us.net>
> Signed-off-by: Huang Rui <ray.huang@amd.com>
> Cc: Borislav Petkov <bp@alien8.de>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Ingo Molnar <mingo@kernel.org>
> ---
>   drivers/hwmon/fam15h_power.c | 120 +++++++++++++++++++++++++++++++++++++++++++
>   1 file changed, 120 insertions(+)
>
> diff --git a/drivers/hwmon/fam15h_power.c b/drivers/hwmon/fam15h_power.c
> index 6321f73..a5a539e 100644
> --- a/drivers/hwmon/fam15h_power.c
> +++ b/drivers/hwmon/fam15h_power.c
> @@ -26,6 +26,9 @@
>   #include <linux/pci.h>
>   #include <linux/bitops.h>
>   #include <linux/cpumask.h>
> +#include <linux/mutex.h>
> +#include <linux/time.h>
> +#include <linux/sched.h>
>   #include <asm/processor.h>
>   #include <asm/msr.h>
>
> @@ -64,6 +67,10 @@ struct fam15h_power_data {
>   	u64 cu_acc_power[MAX_CUS];
>   	/* performance timestamp counter */
>   	u64 cpu_sw_pwr_ptsc[MAX_CUS];
> +	/* online/offline status of current compute unit */
> +	int cu_on[MAX_CUS];
> +	unsigned long power_period;
> +	struct mutex acc_pwr_mutex;

Can you elaborate a bit about what this mutex is supposed to protect ?
To me it seems that it doesn't really protect anything.

>   };
>
>   static ssize_t show_power(struct device *dev,
> @@ -132,11 +139,15 @@ static void do_read_registers_on_cu(void *_data)
>   	cores_per_cu = amd_get_cores_per_cu();
>   	cu = cpu / cores_per_cu;
>
> +	mutex_lock(&data->acc_pwr_mutex);
>   	WARN_ON(rdmsrl_safe(MSR_F15H_CU_PWR_ACCUMULATOR,
>   			    &data->cu_acc_power[cu]));
>
>   	WARN_ON(rdmsrl_safe(MSR_F15H_PTSC,
>   			    &data->cpu_sw_pwr_ptsc[cu]));
> +
> +	data->cu_on[cu] = 1;
> +	mutex_unlock(&data->acc_pwr_mutex);

... for example, while this protects cu_on[cu],

>   }
>
>   static int read_registers(struct fam15h_power_data *data)
> @@ -148,6 +159,10 @@ static int read_registers(struct fam15h_power_data *data)
>   	cores_per_cu = amd_get_cores_per_cu();
>   	cu_num = boot_cpu_data.x86_max_cores / cores_per_cu;
>
> +	mutex_lock(&data->acc_pwr_mutex);
> +	memset(data->cu_on, 0, sizeof(int) * MAX_CUS);
> +	mutex_unlock(&data->acc_pwr_mutex);

... this code may well overwrite that same value.

> +
>   	WARN_ON_ONCE(cu_num > MAX_CUS);
>
>   	ret = zalloc_cpumask_var(&mask, GFP_KERNEL);
> @@ -184,18 +199,113 @@ static int read_registers(struct fam15h_power_data *data)
>   	return 0;
>   }
>
> +static ssize_t acc_show_power(struct device *dev,
> +			      struct device_attribute *attr,
> +			      char *buf)
> +{
> +	struct fam15h_power_data *data = dev_get_drvdata(dev);
> +	u64 prev_cu_acc_power[MAX_CUS], prev_ptsc[MAX_CUS],
> +	    jdelta[MAX_CUS];
> +	u64 tdelta, avg_acc;
> +	int cu, cu_num, cores_per_cu, ret;
> +	signed long leftover;
> +
> +	cores_per_cu = amd_get_cores_per_cu();
> +	cu_num = boot_cpu_data.x86_max_cores / cores_per_cu;
> +
> +	ret = read_registers(data);
> +	if (ret)
> +		return 0;
> +
> +	cu = 0;
> +	while(cu++ < cu_num) {
> +		prev_cu_acc_power[cu] = data->cu_acc_power[cu];
> +		prev_ptsc[cu] = data->cpu_sw_pwr_ptsc[cu];
> +	}

... and multiple parallel reads on the power attribute must produce
pretty random values, unless I am really missing something.

> +
> +	leftover = schedule_timeout_interruptible(
> +			msecs_to_jiffies(data->power_period)
> +		   );
> +	if (leftover)
> +		return 0;
> +
> +	ret = read_registers(data);
> +	if (ret)
> +		return 0;
> +
With a 10ms period, I wonder how accurate this really is.

> +	for (cu = 0, avg_acc = 0; cu < cu_num; cu++) {
> +		/* check if current compute unit is online */
> +		if (data->cu_on[cu] == 0)
> +			continue;
> +
> +		if (data->cu_acc_power[cu] < prev_cu_acc_power[cu]) {
> +			jdelta[cu] = data->max_cu_acc_power + data->cu_acc_power[cu];
> +			jdelta[cu] -= prev_cu_acc_power[cu];
> +		} else {
> +			jdelta[cu] = data->cu_acc_power[cu] - prev_cu_acc_power[cu];
> +		}
> +		tdelta = data->cpu_sw_pwr_ptsc[cu] - prev_ptsc[cu];
> +		jdelta[cu] *= data->cpu_pwr_sample_ratio * 1000;
> +		do_div(jdelta[cu], tdelta);
> +
> +		/* the unit is microWatt */
> +		avg_acc += jdelta[cu];
> +	}
> +
> +	return sprintf(buf, "%llu\n", (unsigned long long)avg_acc);
> +}
> +static DEVICE_ATTR(power1_average, S_IRUGO, acc_show_power, NULL);
> +
> +
> +static ssize_t acc_show_power_period(struct device *dev,
> +				     struct device_attribute *attr,
> +				     char *buf)
> +{
> +	struct fam15h_power_data *data = dev_get_drvdata(dev);
> +
> +	return sprintf(buf, "%lu\n", data->power_period);
> +}
> +
> +static ssize_t acc_set_power_period(struct device *dev,
> +				    struct device_attribute *attr,
> +				    const char *buf, size_t count)
> +{
> +	struct fam15h_power_data *data = dev_get_drvdata(dev);
> +	unsigned long temp;
> +	int ret;
> +
> +	ret = kstrtoul(buf, 10, &temp);
> +	if (ret)
> +		return ret;
> +
> +	mutex_lock(&data->acc_pwr_mutex);
> +	data->power_period = temp;
> +	mutex_unlock(&data->acc_pwr_mutex);

This doesn't really protect anything either except that power_period
can not be updated while the lock is active. But the code using
power_period does not run under mutex protection, so that seems pretty
pointless.

Also, this accepts an unlimited timeout. If I understand correctly,
setting the timeout to, say, 10 seconds will cause the read function
to hang for that period of time. Setting it to 1 hour will cause the read
function to hang for 1 hour.

Does this really make sense ?

> +
> +	return count;
> +}
> +static DEVICE_ATTR(power1_average_interval, S_IRUGO | S_IWUSR,
> +		   acc_show_power_period, acc_set_power_period);
> +
>   static int fam15h_power_init_attrs(struct pci_dev *pdev,
>   				   struct fam15h_power_data *data)
>   {
>   	int n = FAM15H_MIN_NUM_ATTRS;
>   	struct attribute **fam15h_power_attrs;
>   	struct cpuinfo_x86 *c = &boot_cpu_data;
> +	u32 cpuid;
>
>   	if (c->x86 == 0x15 &&
>   	    ((c->x86_model <= 0xf) ||
>   	     (c->x86_model >= 0x60 && c->x86_model <= 0x6f)))
>   		n += 1;
>
> +	cpuid = cpuid_edx(0x80000007);
> +
> +	/* check if processor supports accumulated power */
> +	if (cpuid & BIT(12))
> +		n += 2;
> +
>   	fam15h_power_attrs = devm_kcalloc(&pdev->dev, n,
>   					  sizeof(*fam15h_power_attrs),
>   					  GFP_KERNEL);
> @@ -210,6 +320,11 @@ static int fam15h_power_init_attrs(struct pci_dev *pdev,
>   	     (c->x86_model >= 0x60 && c->x86_model <= 0x6f)))
>   		fam15h_power_attrs[n++] = &dev_attr_power1_input.attr;
>
> +	if (cpuid & BIT(12)) {
> +		fam15h_power_attrs[n++] = &dev_attr_power1_average.attr;
> +		fam15h_power_attrs[n++] = &dev_attr_power1_average_interval.attr;
> +	}
> +
>   	data->fam15h_power_group.attrs = fam15h_power_attrs;
>
>   	return 0;
> @@ -322,6 +437,9 @@ static int fam15h_power_init_data(struct pci_dev *f4,
>
>   	data->max_cu_acc_power = tmp;
>
> +	/* set default interval as 10 ms */
> +	data->power_period = 10;
> +
>   	ret = read_registers(data);
>
>   	return ret;
> @@ -349,6 +467,8 @@ static int fam15h_power_probe(struct pci_dev *pdev,
>   	if (!data)
>   		return -ENOMEM;
>
> +	mutex_init(&data->acc_pwr_mutex);
> +
>   	ret = fam15h_power_init_data(pdev, data);
>   	if (ret)
>   		return ret;
>


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v2 01/10] hwmon: (fam15h_power) Refactor attributes for dynamically added
  2015-10-23 13:42   ` Guenter Roeck
@ 2015-10-26  1:58     ` Huang Rui
  2015-10-26  2:14       ` Guenter Roeck
  0 siblings, 1 reply; 39+ messages in thread
From: Huang Rui @ 2015-10-26  1:58 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: Borislav Petkov, Peter Zijlstra, Jean Delvare, Andy Lutomirski,
	Andreas Herrmann, Thomas Gleixner, Ingo Molnar,
	Rafael J. Wysocki, Len Brown, John Stultz,
	Frédéric Weisbecker, lm-sensors, linux-kernel, x86,
	Andreas Herrmann, Aravind Gopalakrishnan, Borislav Petkov,
	Fengguang Wu, Aaron Lu, Tony Li

On Fri, Oct 23, 2015 at 06:42:20AM -0700, Guenter Roeck wrote:
> On 10/19/2015 07:28 PM, Huang Rui wrote:
> >Attributes depend on the CPU model the driver gets loaded on.
> >Therefore, add those attributes dynamically at init time. This is more
> >flexible to control the different attributes on different platforms.
> >
> >Suggestedy-by: Borislav Petkov <bp@alien8.de>
> >Signed-off-by: Huang Rui <ray.huang@amd.com>
> >Cc: Guenter Roeck <linux@roeck-us.net>
> >Cc: Peter Zijlstra <peterz@infradead.org>
> >Cc: Ingo Molnar <mingo@kernel.org>
> >---
> >  drivers/hwmon/fam15h_power.c | 70 ++++++++++++++++++++++++++++----------------
> >  1 file changed, 45 insertions(+), 25 deletions(-)
> >
> >diff --git a/drivers/hwmon/fam15h_power.c b/drivers/hwmon/fam15h_power.c
> >index e80ee23..41d022e 100644
> >--- a/drivers/hwmon/fam15h_power.c
> >+++ b/drivers/hwmon/fam15h_power.c
> >@@ -41,12 +41,17 @@ MODULE_LICENSE("GPL");
> >  #define REG_TDP_RUNNING_AVERAGE		0xe0
> >  #define REG_TDP_LIMIT3			0xe8
> >
> >+#define FAM15H_MIN_NUM_ATTRS		2
> >+#define FAM15H_NUM_GROUPS		2
> >+
> >  struct fam15h_power_data {
> >  	struct pci_dev *pdev;
> >  	unsigned int tdp_to_watts;
> >  	unsigned int base_tdp;
> >  	unsigned int processor_pwr_watts;
> >  	unsigned int cpu_pwr_sample_ratio;
> >+	const struct attribute_group *fam15h_power_groups[FAM15H_NUM_GROUPS];
> >+	struct attribute_group fam15h_power_group;
> >  };
> >
> >  static ssize_t show_power(struct device *dev,
> >@@ -105,29 +110,31 @@ static ssize_t show_power_crit(struct device *dev,
> >  }
> >  static DEVICE_ATTR(power1_crit, S_IRUGO, show_power_crit, NULL);
> >
> >-static umode_t fam15h_power_is_visible(struct kobject *kobj,
> >-				       struct attribute *attr,
> >-				       int index)
> >+static int fam15h_power_init_attrs(struct pci_dev *pdev,
> >+				   struct fam15h_power_data *data)
> >  {
> >-	/* power1_input is only reported for Fam15h, Models 00h-0fh */
> >-	if (attr == &dev_attr_power1_input.attr &&
> >-	   (boot_cpu_data.x86 != 0x15 || boot_cpu_data.x86_model > 0xf))
> >-		return 0;
> >+	int n = FAM15H_MIN_NUM_ATTRS;
> >+	struct attribute **fam15h_power_attrs;
> >
> >-	return attr->mode;
> >-}
> >+	if (boot_cpu_data.x86 == 0x15 && boot_cpu_data.x86_model <= 0xf)
> >+		n += 1;
> >
> >-static struct attribute *fam15h_power_attrs[] = {
> >-	&dev_attr_power1_input.attr,
> >-	&dev_attr_power1_crit.attr,
> >-	NULL
> >-};
> >+	fam15h_power_attrs = devm_kcalloc(&pdev->dev, n,
> >+					  sizeof(*fam15h_power_attrs),
> >+					  GFP_KERNEL);
> >
> >-static const struct attribute_group fam15h_power_group = {
> >-	.attrs = fam15h_power_attrs,
> >-	.is_visible = fam15h_power_is_visible,
> >-};
> >-__ATTRIBUTE_GROUPS(fam15h_power);
> >+	if (!fam15h_power_attrs)
> >+		return -ENOMEM;
> >+
> >+	n = 0;
> >+	fam15h_power_attrs[n++] = &dev_attr_power1_crit.attr;
> >+	if (boot_cpu_data.x86 == 0x15 && boot_cpu_data.x86_model <= 0xf)
> >+		fam15h_power_attrs[n++] = &dev_attr_power1_input.attr;
> >+
> >+	data->fam15h_power_group.attrs = fam15h_power_attrs;
> >+
> >+	return 0;
> >+}
> >
> >  static bool should_load_on_this_node(struct pci_dev *f4)
> >  {
> >@@ -186,11 +193,12 @@ static int fam15h_power_resume(struct pci_dev *pdev)
> >  #define fam15h_power_resume NULL
> >  #endif
> >
> >-static void fam15h_power_init_data(struct pci_dev *f4,
> >-					     struct fam15h_power_data *data)
> >+static int fam15h_power_init_data(struct pci_dev *f4,
> >+				  struct fam15h_power_data *data)
> >  {
> >  	u32 val, eax, ebx, ecx, edx;
> >  	u64 tmp;
> >+	int ret;
> >
> >  	pci_read_config_dword(f4, REG_PROCESSOR_TDP, &val);
> >  	data->base_tdp = val >> 16;
> >@@ -211,11 +219,15 @@ static void fam15h_power_init_data(struct pci_dev *f4,
> >  	/* convert to microWatt */
> >  	data->processor_pwr_watts = (tmp * 15625) >> 10;
> >
> >+	ret = fam15h_power_init_attrs(f4, data);
> >+	if (ret)
> >+		return ret;
> >+
> >  	cpuid(0x80000007, &eax, &ebx, &ecx, &edx);
> >
> >  	/* CPUID Fn8000_0007:EDX[12] indicates to support accumulated power */
> >  	if (!(edx & BIT(12)))
> >-		return;
> >+		return 0;
> >
> >  	/*
> >  	 * determine the ratio of the compute unit power accumulator
> >@@ -223,14 +235,17 @@ static void fam15h_power_init_data(struct pci_dev *f4,
> >  	 * Fn8000_0007:ECX
> >  	 */
> >  	data->cpu_pwr_sample_ratio = ecx;
> >+
> >+	return 0;
> >  }
> >
> >  static int fam15h_power_probe(struct pci_dev *pdev,
> >-					const struct pci_device_id *id)
> >+			      const struct pci_device_id *id)
> >  {
> >  	struct fam15h_power_data *data;
> >  	struct device *dev = &pdev->dev;
> >  	struct device *hwmon_dev;
> >+	int ret;
> >
> >  	/*
> >  	 * though we ignore every other northbridge, we still have to
> >@@ -246,12 +261,17 @@ static int fam15h_power_probe(struct pci_dev *pdev,
> >  	if (!data)
> >  		return -ENOMEM;
> >
> >-	fam15h_power_init_data(pdev, data);
> >+	ret = fam15h_power_init_data(pdev, data);
> >+	if (ret)
> >+		return ret;
> >+
> >  	data->pdev = pdev;
> >
> >+	data->fam15h_power_groups[0] = &data->fam15h_power_group;
> >+
> >  	hwmon_dev = devm_hwmon_device_register_with_groups(dev, "fam15h_power",
> >  							   data,
> >-							   fam15h_power_groups);
> >+							   (const struct attribute_group **)&data->fam15h_power_groups);
> 
> Why this typecast ? It should not be needed.
> 

That's because there will be a warning without the typecast.

Expected 'const struct attribute_group **', but current type is 'const
struct attribute_group * (*)[2]'.

---
  CC [M]  /home/ray/tip/drivers/hwmon/fam15h_power.o
/home/ray/tip/drivers/hwmon/fam15h_power.c: In function ‘fam15h_power_probe’:
/home/ray/tip/drivers/hwmon/fam15h_power.c:481:11: warning: passing argument 4 of ‘devm_hwmon_device_register_with_groups’ from incompatible pointer type [enabled by default]
           &data->fam15h_power_groups);
           ^
In file included from /home/ray/tip/drivers/hwmon/fam15h_power.c:22:0:
include/linux/hwmon.h:26:1: note: expected ‘const struct attribute_group **’ but argument is of type ‘const struct attribute_group * (*)[2]’
 devm_hwmon_device_register_with_groups(struct device *dev, const char *name,
 ^
---

Thanks,
Rui

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v2 01/10] hwmon: (fam15h_power) Refactor attributes for dynamically added
  2015-10-26  1:58     ` Huang Rui
@ 2015-10-26  2:14       ` Guenter Roeck
  2015-10-26  2:25         ` Huang Rui
  0 siblings, 1 reply; 39+ messages in thread
From: Guenter Roeck @ 2015-10-26  2:14 UTC (permalink / raw)
  To: Huang Rui
  Cc: Borislav Petkov, Peter Zijlstra, Jean Delvare, Andy Lutomirski,
	Andreas Herrmann, Thomas Gleixner, Ingo Molnar,
	Rafael J. Wysocki, Len Brown, John Stultz,
	Frédéric Weisbecker, lm-sensors, linux-kernel, x86,
	Andreas Herrmann, Aravind Gopalakrishnan, Borislav Petkov,
	Fengguang Wu, Aaron Lu, Tony Li

On Mon, Oct 26, 2015 at 09:58:45AM +0800, Huang Rui wrote:
> On Fri, Oct 23, 2015 at 06:42:20AM -0700, Guenter Roeck wrote:
> > On 10/19/2015 07:28 PM, Huang Rui wrote:
> > >Attributes depend on the CPU model the driver gets loaded on.
> > >Therefore, add those attributes dynamically at init time. This is more
> > >flexible to control the different attributes on different platforms.
> > >
> > >Suggestedy-by: Borislav Petkov <bp@alien8.de>
> > >Signed-off-by: Huang Rui <ray.huang@amd.com>
> > >Cc: Guenter Roeck <linux@roeck-us.net>
> > >Cc: Peter Zijlstra <peterz@infradead.org>
> > >Cc: Ingo Molnar <mingo@kernel.org>
> > >---
> > >  drivers/hwmon/fam15h_power.c | 70 ++++++++++++++++++++++++++++----------------
> > >  1 file changed, 45 insertions(+), 25 deletions(-)
> > >
> > >diff --git a/drivers/hwmon/fam15h_power.c b/drivers/hwmon/fam15h_power.c
> > >index e80ee23..41d022e 100644
> > >--- a/drivers/hwmon/fam15h_power.c
> > >+++ b/drivers/hwmon/fam15h_power.c
> > >@@ -41,12 +41,17 @@ MODULE_LICENSE("GPL");
> > >  #define REG_TDP_RUNNING_AVERAGE		0xe0
> > >  #define REG_TDP_LIMIT3			0xe8
> > >
> > >+#define FAM15H_MIN_NUM_ATTRS		2
> > >+#define FAM15H_NUM_GROUPS		2
> > >+
> > >  struct fam15h_power_data {
> > >  	struct pci_dev *pdev;
> > >  	unsigned int tdp_to_watts;
> > >  	unsigned int base_tdp;
> > >  	unsigned int processor_pwr_watts;
> > >  	unsigned int cpu_pwr_sample_ratio;
> > >+	const struct attribute_group *fam15h_power_groups[FAM15H_NUM_GROUPS];
> > >+	struct attribute_group fam15h_power_group;
> > >  };
> > >
> > >  static ssize_t show_power(struct device *dev,
> > >@@ -105,29 +110,31 @@ static ssize_t show_power_crit(struct device *dev,
> > >  }
> > >  static DEVICE_ATTR(power1_crit, S_IRUGO, show_power_crit, NULL);
> > >
> > >-static umode_t fam15h_power_is_visible(struct kobject *kobj,
> > >-				       struct attribute *attr,
> > >-				       int index)
> > >+static int fam15h_power_init_attrs(struct pci_dev *pdev,
> > >+				   struct fam15h_power_data *data)
> > >  {
> > >-	/* power1_input is only reported for Fam15h, Models 00h-0fh */
> > >-	if (attr == &dev_attr_power1_input.attr &&
> > >-	   (boot_cpu_data.x86 != 0x15 || boot_cpu_data.x86_model > 0xf))
> > >-		return 0;
> > >+	int n = FAM15H_MIN_NUM_ATTRS;
> > >+	struct attribute **fam15h_power_attrs;
> > >
> > >-	return attr->mode;
> > >-}
> > >+	if (boot_cpu_data.x86 == 0x15 && boot_cpu_data.x86_model <= 0xf)
> > >+		n += 1;
> > >
> > >-static struct attribute *fam15h_power_attrs[] = {
> > >-	&dev_attr_power1_input.attr,
> > >-	&dev_attr_power1_crit.attr,
> > >-	NULL
> > >-};
> > >+	fam15h_power_attrs = devm_kcalloc(&pdev->dev, n,
> > >+					  sizeof(*fam15h_power_attrs),
> > >+					  GFP_KERNEL);
> > >
> > >-static const struct attribute_group fam15h_power_group = {
> > >-	.attrs = fam15h_power_attrs,
> > >-	.is_visible = fam15h_power_is_visible,
> > >-};
> > >-__ATTRIBUTE_GROUPS(fam15h_power);
> > >+	if (!fam15h_power_attrs)
> > >+		return -ENOMEM;
> > >+
> > >+	n = 0;
> > >+	fam15h_power_attrs[n++] = &dev_attr_power1_crit.attr;
> > >+	if (boot_cpu_data.x86 == 0x15 && boot_cpu_data.x86_model <= 0xf)
> > >+		fam15h_power_attrs[n++] = &dev_attr_power1_input.attr;
> > >+
> > >+	data->fam15h_power_group.attrs = fam15h_power_attrs;
> > >+
> > >+	return 0;
> > >+}
> > >
> > >  static bool should_load_on_this_node(struct pci_dev *f4)
> > >  {
> > >@@ -186,11 +193,12 @@ static int fam15h_power_resume(struct pci_dev *pdev)
> > >  #define fam15h_power_resume NULL
> > >  #endif
> > >
> > >-static void fam15h_power_init_data(struct pci_dev *f4,
> > >-					     struct fam15h_power_data *data)
> > >+static int fam15h_power_init_data(struct pci_dev *f4,
> > >+				  struct fam15h_power_data *data)
> > >  {
> > >  	u32 val, eax, ebx, ecx, edx;
> > >  	u64 tmp;
> > >+	int ret;
> > >
> > >  	pci_read_config_dword(f4, REG_PROCESSOR_TDP, &val);
> > >  	data->base_tdp = val >> 16;
> > >@@ -211,11 +219,15 @@ static void fam15h_power_init_data(struct pci_dev *f4,
> > >  	/* convert to microWatt */
> > >  	data->processor_pwr_watts = (tmp * 15625) >> 10;
> > >
> > >+	ret = fam15h_power_init_attrs(f4, data);
> > >+	if (ret)
> > >+		return ret;
> > >+
> > >  	cpuid(0x80000007, &eax, &ebx, &ecx, &edx);
> > >
> > >  	/* CPUID Fn8000_0007:EDX[12] indicates to support accumulated power */
> > >  	if (!(edx & BIT(12)))
> > >-		return;
> > >+		return 0;
> > >
> > >  	/*
> > >  	 * determine the ratio of the compute unit power accumulator
> > >@@ -223,14 +235,17 @@ static void fam15h_power_init_data(struct pci_dev *f4,
> > >  	 * Fn8000_0007:ECX
> > >  	 */
> > >  	data->cpu_pwr_sample_ratio = ecx;
> > >+
> > >+	return 0;
> > >  }
> > >
> > >  static int fam15h_power_probe(struct pci_dev *pdev,
> > >-					const struct pci_device_id *id)
> > >+			      const struct pci_device_id *id)
> > >  {
> > >  	struct fam15h_power_data *data;
> > >  	struct device *dev = &pdev->dev;
> > >  	struct device *hwmon_dev;
> > >+	int ret;
> > >
> > >  	/*
> > >  	 * though we ignore every other northbridge, we still have to
> > >@@ -246,12 +261,17 @@ static int fam15h_power_probe(struct pci_dev *pdev,
> > >  	if (!data)
> > >  		return -ENOMEM;
> > >
> > >-	fam15h_power_init_data(pdev, data);
> > >+	ret = fam15h_power_init_data(pdev, data);
> > >+	if (ret)
> > >+		return ret;
> > >+
> > >  	data->pdev = pdev;
> > >
> > >+	data->fam15h_power_groups[0] = &data->fam15h_power_group;
> > >+
> > >  	hwmon_dev = devm_hwmon_device_register_with_groups(dev, "fam15h_power",
> > >  							   data,
> > >-							   fam15h_power_groups);
> > >+							   (const struct attribute_group **)&data->fam15h_power_groups);
> > 
> > Why this typecast ? It should not be needed.
> > 
> 
> That's because there will be a warning without the typecast.
> 
> Expected 'const struct attribute_group **', but current type is 'const
> struct attribute_group * (*)[2]'.
> 

Maybe then either use data->fam15h_power_groups (no &) or
&data->fam15h_power_groups[0].

On a side note, the "fam15h_" prefix in the variable name is really
unnecessary unless there are some other groups in the same data structure.
Even the "power_" prefix doesn't really add any value unless there
are other groups.

Thanks,
Guenter
	
> ---
>   CC [M]  /home/ray/tip/drivers/hwmon/fam15h_power.o
> /home/ray/tip/drivers/hwmon/fam15h_power.c: In function ‘fam15h_power_probe’:
> /home/ray/tip/drivers/hwmon/fam15h_power.c:481:11: warning: passing argument 4 of ‘devm_hwmon_device_register_with_groups’ from incompatible pointer type [enabled by default]
>            &data->fam15h_power_groups);
>            ^
> In file included from /home/ray/tip/drivers/hwmon/fam15h_power.c:22:0:
> include/linux/hwmon.h:26:1: note: expected ‘const struct attribute_group **’ but argument is of type ‘const struct attribute_group * (*)[2]’
>  devm_hwmon_device_register_with_groups(struct device *dev, const char *name,
>  ^
> ---
> 
> Thanks,
> Rui

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v2 02/10] hwmon: (fam15h_power) Enable power1_input on AMD Carrizo
  2015-10-23 13:45   ` Guenter Roeck
@ 2015-10-26  2:19     ` Huang Rui
  0 siblings, 0 replies; 39+ messages in thread
From: Huang Rui @ 2015-10-26  2:19 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: Borislav Petkov, Peter Zijlstra, Jean Delvare, Andy Lutomirski,
	Andreas Herrmann, Thomas Gleixner, Ingo Molnar,
	Rafael J. Wysocki, Len Brown, John Stultz,
	Frédéric Weisbecker, lm-sensors, linux-kernel, x86,
	Andreas Herrmann, Aravind Gopalakrishnan, Borislav Petkov,
	Fengguang Wu, Aaron Lu, Tony Li

On Fri, Oct 23, 2015 at 06:45:59AM -0700, Guenter Roeck wrote:
> On 10/19/2015 07:28 PM, Huang Rui wrote:
> >This patch enables power1_input attribute for Carrizo platform.
> >
> >Signed-off-by: Huang Rui <ray.huang@amd.com>
> >Cc: Borislav Petkov <bp@alien8.de>
> >Cc: Guenter Roeck <linux@roeck-us.net>
> >Cc: Peter Zijlstra <peterz@infradead.org>
> >Cc: Ingo Molnar <mingo@kernel.org>
> >---
> >  drivers/hwmon/fam15h_power.c | 9 +++++++--
> >  1 file changed, 7 insertions(+), 2 deletions(-)
> >
> >diff --git a/drivers/hwmon/fam15h_power.c b/drivers/hwmon/fam15h_power.c
> >index 41d022e..a090adf 100644
> >--- a/drivers/hwmon/fam15h_power.c
> >+++ b/drivers/hwmon/fam15h_power.c
> >@@ -115,8 +115,11 @@ static int fam15h_power_init_attrs(struct pci_dev *pdev,
> >  {
> >  	int n = FAM15H_MIN_NUM_ATTRS;
> >  	struct attribute **fam15h_power_attrs;
> >+	struct cpuinfo_x86 *c = &boot_cpu_data;
> >
> >-	if (boot_cpu_data.x86 == 0x15 && boot_cpu_data.x86_model <= 0xf)
> >+	if (c->x86 == 0x15 &&
> >+	    ((c->x86_model <= 0xf) ||
> 
> Please no unnecessary ( ).
> 
> >+	     (c->x86_model >= 0x60 && c->x86_model <= 0x6f)))
> 
> Those are acceptable to clarify that the && has precedence on purpose,
> but "(c->x86_model <= 0xf)" is really unnecessary (and inconsistent
> with the rest of the code).
> 

OK, I will fixed it on V3. :)

Thanks,
Rui

> >  		n += 1;
> >
> >  	fam15h_power_attrs = devm_kcalloc(&pdev->dev, n,
> >@@ -128,7 +131,9 @@ static int fam15h_power_init_attrs(struct pci_dev *pdev,
> >
> >  	n = 0;
> >  	fam15h_power_attrs[n++] = &dev_attr_power1_crit.attr;
> >-	if (boot_cpu_data.x86 == 0x15 && boot_cpu_data.x86_model <= 0xf)
> >+	if (c->x86 == 0x15 &&
> >+	    ((c->x86_model <= 0xf) ||
> >+	     (c->x86_model >= 0x60 && c->x86_model <= 0x6f)))
> 
> Same here.
> 
> >  		fam15h_power_attrs[n++] = &dev_attr_power1_input.attr;
> >
> >  	data->fam15h_power_group.attrs = fam15h_power_attrs;
> >
> 

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v2 01/10] hwmon: (fam15h_power) Refactor attributes for dynamically added
  2015-10-26  2:14       ` Guenter Roeck
@ 2015-10-26  2:25         ` Huang Rui
  0 siblings, 0 replies; 39+ messages in thread
From: Huang Rui @ 2015-10-26  2:25 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: Borislav Petkov, Peter Zijlstra, Jean Delvare, Andy Lutomirski,
	Andreas Herrmann, Thomas Gleixner, Ingo Molnar,
	Rafael J. Wysocki, Len Brown, John Stultz,
	Fr�d�ric Weisbecker, lm-sensors, linux-kernel, x86,
	Andreas Herrmann, Aravind Gopalakrishnan, Borislav Petkov,
	Fengguang Wu, Aaron Lu, Tony Li

On Sun, Oct 25, 2015 at 07:14:27PM -0700, Guenter Roeck wrote:
> On Mon, Oct 26, 2015 at 09:58:45AM +0800, Huang Rui wrote:
> > On Fri, Oct 23, 2015 at 06:42:20AM -0700, Guenter Roeck wrote:
> > > On 10/19/2015 07:28 PM, Huang Rui wrote:
> > > >Attributes depend on the CPU model the driver gets loaded on.
> > > >Therefore, add those attributes dynamically at init time. This is more
> > > >flexible to control the different attributes on different platforms.
> > > >
> > > >Suggestedy-by: Borislav Petkov <bp@alien8.de>
> > > >Signed-off-by: Huang Rui <ray.huang@amd.com>
> > > >Cc: Guenter Roeck <linux@roeck-us.net>
> > > >Cc: Peter Zijlstra <peterz@infradead.org>
> > > >Cc: Ingo Molnar <mingo@kernel.org>
> > > >---
> > > >  drivers/hwmon/fam15h_power.c | 70 ++++++++++++++++++++++++++++----------------
> > > >  1 file changed, 45 insertions(+), 25 deletions(-)
> > > >
> > > >diff --git a/drivers/hwmon/fam15h_power.c b/drivers/hwmon/fam15h_power.c
> > > >index e80ee23..41d022e 100644
> > > >--- a/drivers/hwmon/fam15h_power.c
> > > >+++ b/drivers/hwmon/fam15h_power.c
> > > >@@ -41,12 +41,17 @@ MODULE_LICENSE("GPL");
> > > >  #define REG_TDP_RUNNING_AVERAGE		0xe0
> > > >  #define REG_TDP_LIMIT3			0xe8
> > > >
> > > >+#define FAM15H_MIN_NUM_ATTRS		2
> > > >+#define FAM15H_NUM_GROUPS		2
> > > >+
> > > >  struct fam15h_power_data {
> > > >  	struct pci_dev *pdev;
> > > >  	unsigned int tdp_to_watts;
> > > >  	unsigned int base_tdp;
> > > >  	unsigned int processor_pwr_watts;
> > > >  	unsigned int cpu_pwr_sample_ratio;
> > > >+	const struct attribute_group *fam15h_power_groups[FAM15H_NUM_GROUPS];
> > > >+	struct attribute_group fam15h_power_group;
> > > >  };
> > > >
> > > >  static ssize_t show_power(struct device *dev,
> > > >@@ -105,29 +110,31 @@ static ssize_t show_power_crit(struct device *dev,
> > > >  }
> > > >  static DEVICE_ATTR(power1_crit, S_IRUGO, show_power_crit, NULL);
> > > >
> > > >-static umode_t fam15h_power_is_visible(struct kobject *kobj,
> > > >-				       struct attribute *attr,
> > > >-				       int index)
> > > >+static int fam15h_power_init_attrs(struct pci_dev *pdev,
> > > >+				   struct fam15h_power_data *data)
> > > >  {
> > > >-	/* power1_input is only reported for Fam15h, Models 00h-0fh */
> > > >-	if (attr == &dev_attr_power1_input.attr &&
> > > >-	   (boot_cpu_data.x86 != 0x15 || boot_cpu_data.x86_model > 0xf))
> > > >-		return 0;
> > > >+	int n = FAM15H_MIN_NUM_ATTRS;
> > > >+	struct attribute **fam15h_power_attrs;
> > > >
> > > >-	return attr->mode;
> > > >-}
> > > >+	if (boot_cpu_data.x86 == 0x15 && boot_cpu_data.x86_model <= 0xf)
> > > >+		n += 1;
> > > >
> > > >-static struct attribute *fam15h_power_attrs[] = {
> > > >-	&dev_attr_power1_input.attr,
> > > >-	&dev_attr_power1_crit.attr,
> > > >-	NULL
> > > >-};
> > > >+	fam15h_power_attrs = devm_kcalloc(&pdev->dev, n,
> > > >+					  sizeof(*fam15h_power_attrs),
> > > >+					  GFP_KERNEL);
> > > >
> > > >-static const struct attribute_group fam15h_power_group = {
> > > >-	.attrs = fam15h_power_attrs,
> > > >-	.is_visible = fam15h_power_is_visible,
> > > >-};
> > > >-__ATTRIBUTE_GROUPS(fam15h_power);
> > > >+	if (!fam15h_power_attrs)
> > > >+		return -ENOMEM;
> > > >+
> > > >+	n = 0;
> > > >+	fam15h_power_attrs[n++] = &dev_attr_power1_crit.attr;
> > > >+	if (boot_cpu_data.x86 == 0x15 && boot_cpu_data.x86_model <= 0xf)
> > > >+		fam15h_power_attrs[n++] = &dev_attr_power1_input.attr;
> > > >+
> > > >+	data->fam15h_power_group.attrs = fam15h_power_attrs;
> > > >+
> > > >+	return 0;
> > > >+}
> > > >
> > > >  static bool should_load_on_this_node(struct pci_dev *f4)
> > > >  {
> > > >@@ -186,11 +193,12 @@ static int fam15h_power_resume(struct pci_dev *pdev)
> > > >  #define fam15h_power_resume NULL
> > > >  #endif
> > > >
> > > >-static void fam15h_power_init_data(struct pci_dev *f4,
> > > >-					     struct fam15h_power_data *data)
> > > >+static int fam15h_power_init_data(struct pci_dev *f4,
> > > >+				  struct fam15h_power_data *data)
> > > >  {
> > > >  	u32 val, eax, ebx, ecx, edx;
> > > >  	u64 tmp;
> > > >+	int ret;
> > > >
> > > >  	pci_read_config_dword(f4, REG_PROCESSOR_TDP, &val);
> > > >  	data->base_tdp = val >> 16;
> > > >@@ -211,11 +219,15 @@ static void fam15h_power_init_data(struct pci_dev *f4,
> > > >  	/* convert to microWatt */
> > > >  	data->processor_pwr_watts = (tmp * 15625) >> 10;
> > > >
> > > >+	ret = fam15h_power_init_attrs(f4, data);
> > > >+	if (ret)
> > > >+		return ret;
> > > >+
> > > >  	cpuid(0x80000007, &eax, &ebx, &ecx, &edx);
> > > >
> > > >  	/* CPUID Fn8000_0007:EDX[12] indicates to support accumulated power */
> > > >  	if (!(edx & BIT(12)))
> > > >-		return;
> > > >+		return 0;
> > > >
> > > >  	/*
> > > >  	 * determine the ratio of the compute unit power accumulator
> > > >@@ -223,14 +235,17 @@ static void fam15h_power_init_data(struct pci_dev *f4,
> > > >  	 * Fn8000_0007:ECX
> > > >  	 */
> > > >  	data->cpu_pwr_sample_ratio = ecx;
> > > >+
> > > >+	return 0;
> > > >  }
> > > >
> > > >  static int fam15h_power_probe(struct pci_dev *pdev,
> > > >-					const struct pci_device_id *id)
> > > >+			      const struct pci_device_id *id)
> > > >  {
> > > >  	struct fam15h_power_data *data;
> > > >  	struct device *dev = &pdev->dev;
> > > >  	struct device *hwmon_dev;
> > > >+	int ret;
> > > >
> > > >  	/*
> > > >  	 * though we ignore every other northbridge, we still have to
> > > >@@ -246,12 +261,17 @@ static int fam15h_power_probe(struct pci_dev *pdev,
> > > >  	if (!data)
> > > >  		return -ENOMEM;
> > > >
> > > >-	fam15h_power_init_data(pdev, data);
> > > >+	ret = fam15h_power_init_data(pdev, data);
> > > >+	if (ret)
> > > >+		return ret;
> > > >+
> > > >  	data->pdev = pdev;
> > > >
> > > >+	data->fam15h_power_groups[0] = &data->fam15h_power_group;
> > > >+
> > > >  	hwmon_dev = devm_hwmon_device_register_with_groups(dev, "fam15h_power",
> > > >  							   data,
> > > >-							   fam15h_power_groups);
> > > >+							   (const struct attribute_group **)&data->fam15h_power_groups);
> > > 
> > > Why this typecast ? It should not be needed.
> > > 
> > 
> > That's because there will be a warning without the typecast.
> > 
> > Expected 'const struct attribute_group **', but current type is 'const
> > struct attribute_group * (*)[2]'.
> > 
> 
> Maybe then either use data->fam15h_power_groups (no &) or
> &data->fam15h_power_groups[0].

Looks better. :)

> 
> On a side note, the "fam15h_" prefix in the variable name is really
> unnecessary unless there are some other groups in the same data structure.
> Even the "power_" prefix doesn't really add any value unless there
> are other groups.
> 

OK, I will update it on V3.

Thanks,
Rui

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v2 06/10] hwmon: (fam15h_power) Add ptsc counter value for accumulated power
  2015-10-23 13:59   ` Guenter Roeck
@ 2015-10-26  3:37     ` Huang Rui
  2015-10-26  4:36       ` Borislav Petkov
  0 siblings, 1 reply; 39+ messages in thread
From: Huang Rui @ 2015-10-26  3:37 UTC (permalink / raw)
  To: Guenter Roeck, Borislav Petkov
  Cc: Borislav Petkov, Peter Zijlstra, Jean Delvare, Andy Lutomirski,
	Andreas Herrmann, Thomas Gleixner, Ingo Molnar,
	Rafael J. Wysocki, Len Brown, John Stultz,
	Fr�d�ric Weisbecker, lm-sensors, linux-kernel, x86,
	Andreas Herrmann, Aravind Gopalakrishnan, Fengguang Wu, Aaron Lu,
	Tony Li

On Fri, Oct 23, 2015 at 06:59:19AM -0700, Guenter Roeck wrote:
> On 10/19/2015 07:28 PM, Huang Rui wrote:
> >PTSC is the performance timestamp counter value in a cpu core and the
> >cores in one compute unit have the fixed frequency. So it picks up the
> >performance timestamp counter value of the first core per compute unit
> >to measure the interval for average power per compute unit.
> >
> >Signed-off-by: Huang Rui <ray.huang@amd.com>
> >Cc: Borislav Petkov <bp@alien8.de>
> >Cc: Guenter Roeck <linux@roeck-us.net>
> >Cc: Peter Zijlstra <peterz@infradead.org>
> >Cc: Ingo Molnar <mingo@kernel.org>
> >---
> >  arch/x86/include/asm/msr-index.h | 1 +
> >  drivers/hwmon/fam15h_power.c     | 5 +++++
> >  2 files changed, 6 insertions(+)
> >
> >diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
> >index c1c0a1c..3686eaa 100644
> >--- a/arch/x86/include/asm/msr-index.h
> >+++ b/arch/x86/include/asm/msr-index.h
> >@@ -313,6 +313,7 @@
> >  #define MSR_F15H_PERF_CTR		0xc0010201
> >  #define MSR_F15H_NB_PERF_CTL		0xc0010240
> >  #define MSR_F15H_NB_PERF_CTR		0xc0010241
> >+#define MSR_F15H_PTSC			0xc0010280
> >
> >  /* Fam 10h MSRs */
> >  #define MSR_FAM10H_MMIO_CONF_BASE	0xc0010058
> >diff --git a/drivers/hwmon/fam15h_power.c b/drivers/hwmon/fam15h_power.c
> >index 88e4f3e..6321f73 100644
> >--- a/drivers/hwmon/fam15h_power.c
> >+++ b/drivers/hwmon/fam15h_power.c
> >@@ -62,6 +62,8 @@ struct fam15h_power_data {
> >  	u64 max_cu_acc_power;
> >  	/* accumulated power of the compute units */
> >  	u64 cu_acc_power[MAX_CUS];
> >+	/* performance timestamp counter */
> >+	u64 cpu_sw_pwr_ptsc[MAX_CUS];
> >  };
> >
> >  static ssize_t show_power(struct device *dev,
> >@@ -132,6 +134,9 @@ static void do_read_registers_on_cu(void *_data)
> >
> >  	WARN_ON(rdmsrl_safe(MSR_F15H_CU_PWR_ACCUMULATOR,
> >  			    &data->cu_acc_power[cu]));
> >+
> >+	WARN_ON(rdmsrl_safe(MSR_F15H_PTSC,
> >+			    &data->cpu_sw_pwr_ptsc[cu]));
> >  }
> 
> I am not really happy with those WARN_ON, or even an error message.
> If the error is seen, it may be persistent.
> 
> If an error check is really needed here, it might make more sense to store
> the read error and return it to user space if the respective sysfs attribute
> is read.
> 

I am OK with removing WARN_ON here. Boris, if you also agree with it,
I will remove it on V3.

Thanks,
Rui

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v2 07/10] hwmon: (fam15h_power) Introduce a cpu accumulated power reporting algorithm
  2015-10-23 13:28   ` Borislav Petkov
@ 2015-10-26  3:46     ` Huang Rui
  2015-10-26  4:41       ` Borislav Petkov
  0 siblings, 1 reply; 39+ messages in thread
From: Huang Rui @ 2015-10-26  3:46 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Guenter Roeck, Peter Zijlstra, Jean Delvare, Andy Lutomirski,
	Andreas Herrmann, Thomas Gleixner, Ingo Molnar,
	Rafael J. Wysocki, Len Brown, John Stultz,
	Frédéric Weisbecker, lm-sensors, linux-kernel, x86,
	Andreas Herrmann, Aravind Gopalakrishnan, Fengguang Wu, Aaron Lu,
	Tony Li

On Fri, Oct 23, 2015 at 03:28:02PM +0200, Borislav Petkov wrote:
> On Tue, Oct 20, 2015 at 10:28:26AM +0800, Huang Rui wrote:
> > This patch introduces an algorithm that computes the average power by
> > reading a delta value of “core power accumulator” register during
> > measurement interval, and then dividing delta value by the length of
> > the time interval.
> > 
> > User is able to use power1_average entry to measure the processor power
> > consumption and power1_average_interval entry to set the interval.
> > 
> > A simple example:
> > 
> > ray@hr-ub:~/tip$ sensors
> > fam15h_power-pci-00c4
> > Adapter: PCI adapter
> > power1:       23.73 mW (avg = 634.63 mW, interval =   0.01 s)
> >                        (crit =  15.00 W)
> > 
> > ...
> 
> I need to play with this more after I get back from KS. Just a partial
> review for now.
> 

Sounds good.

> > 
> > The result is current average processor power consumption in 10
> > millisecond. The unit of the result is uWatt.
> > 
> > Suggested-by: Guenter Roeck <linux@roeck-us.net>
> > Signed-off-by: Huang Rui <ray.huang@amd.com>
> > Cc: Borislav Petkov <bp@alien8.de>
> > Cc: Peter Zijlstra <peterz@infradead.org>
> > Cc: Ingo Molnar <mingo@kernel.org>
> > ---
> >  drivers/hwmon/fam15h_power.c | 120 +++++++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 120 insertions(+)
> > 
> > diff --git a/drivers/hwmon/fam15h_power.c b/drivers/hwmon/fam15h_power.c
> > index 6321f73..a5a539e 100644
> > --- a/drivers/hwmon/fam15h_power.c
> > +++ b/drivers/hwmon/fam15h_power.c
> > @@ -26,6 +26,9 @@
> >  #include <linux/pci.h>
> >  #include <linux/bitops.h>
> >  #include <linux/cpumask.h>
> > +#include <linux/mutex.h>
> > +#include <linux/time.h>
> > +#include <linux/sched.h>
> >  #include <asm/processor.h>
> >  #include <asm/msr.h>
> >  
> > @@ -64,6 +67,10 @@ struct fam15h_power_data {
> >  	u64 cu_acc_power[MAX_CUS];
> >  	/* performance timestamp counter */
> >  	u64 cpu_sw_pwr_ptsc[MAX_CUS];
> > +	/* online/offline status of current compute unit */
> > +	int cu_on[MAX_CUS];
> > +	unsigned long power_period;
> > +	struct mutex acc_pwr_mutex;
> >  };
> >  
> >  static ssize_t show_power(struct device *dev,
> > @@ -132,11 +139,15 @@ static void do_read_registers_on_cu(void *_data)
> >  	cores_per_cu = amd_get_cores_per_cu();
> >  	cu = cpu / cores_per_cu;
> >  
> > +	mutex_lock(&data->acc_pwr_mutex);
> >  	WARN_ON(rdmsrl_safe(MSR_F15H_CU_PWR_ACCUMULATOR,
> >  			    &data->cu_acc_power[cu]));
> >  
> >  	WARN_ON(rdmsrl_safe(MSR_F15H_PTSC,
> >  			    &data->cpu_sw_pwr_ptsc[cu]));
> > +
> > +	data->cu_on[cu] = 1;
> > +	mutex_unlock(&data->acc_pwr_mutex);
> >  }
> >  
> >  static int read_registers(struct fam15h_power_data *data)
> > @@ -148,6 +159,10 @@ static int read_registers(struct fam15h_power_data *data)
> >  	cores_per_cu = amd_get_cores_per_cu();
> >  	cu_num = boot_cpu_data.x86_max_cores / cores_per_cu;
> >  
> > +	mutex_lock(&data->acc_pwr_mutex);
> > +	memset(data->cu_on, 0, sizeof(int) * MAX_CUS);
> > +	mutex_unlock(&data->acc_pwr_mutex);
> > +
> >  	WARN_ON_ONCE(cu_num > MAX_CUS);
> >  
> >  	ret = zalloc_cpumask_var(&mask, GFP_KERNEL);
> > @@ -184,18 +199,113 @@ static int read_registers(struct fam15h_power_data *data)
> >  	return 0;
> >  }
> >  
> > +static ssize_t acc_show_power(struct device *dev,
> > +			      struct device_attribute *attr,
> > +			      char *buf)
> > +{
> > +	struct fam15h_power_data *data = dev_get_drvdata(dev);
> > +	u64 prev_cu_acc_power[MAX_CUS], prev_ptsc[MAX_CUS],
> > +	    jdelta[MAX_CUS];
> > +	u64 tdelta, avg_acc;
> > +	int cu, cu_num, cores_per_cu, ret;
> > +	signed long leftover;
> > +
> > +	cores_per_cu = amd_get_cores_per_cu();
> > +	cu_num = boot_cpu_data.x86_max_cores / cores_per_cu;
> > +
> > +	ret = read_registers(data);
> > +	if (ret)
> > +		return 0;
> > +
> > +	cu = 0;
> > +	while(cu++ < cu_num) {
> > +		prev_cu_acc_power[cu] = data->cu_acc_power[cu];
> > +		prev_ptsc[cu] = data->cpu_sw_pwr_ptsc[cu];
> > +	}
> 
> Please integrate checkpatch into your workflow of creating patches - it
> can be correct sometimes:
> 
> ERROR: space required before the open parenthesis '('
> #130: FILE: drivers/hwmon/fam15h_power.c:221:
> +       while(cu++ < cu_num) {
> 

Oh, I will take care next time.

> 
> > +
> > +	leftover = schedule_timeout_interruptible(
> > +			msecs_to_jiffies(data->power_period)
> > +		   );
> 
> This way of writing a function call is reaaally ugly. What's wrong with:
> 
> 	leftover = schedule_timeout_interruptible(msecs_to_jiffies(data->power_period));
> 
> ?
> 
> And don't tell me 80 columns - that rule is not a hard one.
> 

Actually, yes. I found it too long, so... :)
OK, I will fix it on V3.


Thanks,
Rui

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v2 06/10] hwmon: (fam15h_power) Add ptsc counter value for accumulated power
  2015-10-26  3:37     ` Huang Rui
@ 2015-10-26  4:36       ` Borislav Petkov
  0 siblings, 0 replies; 39+ messages in thread
From: Borislav Petkov @ 2015-10-26  4:36 UTC (permalink / raw)
  To: Huang Rui
  Cc: Guenter Roeck, Peter Zijlstra, Jean Delvare, Andy Lutomirski,
	Andreas Herrmann, Thomas Gleixner, Ingo Molnar,
	Rafael J. Wysocki, Len Brown, John Stultz,
	Fr�d�ric Weisbecker, lm-sensors, linux-kernel, x86,
	Andreas Herrmann, Aravind Gopalakrishnan, Fengguang Wu, Aaron Lu,
	Tony Li

On Mon, Oct 26, 2015 at 11:37:23AM +0800, Huang Rui wrote:
> On Fri, Oct 23, 2015 at 06:59:19AM -0700, Guenter Roeck wrote:
> > On 10/19/2015 07:28 PM, Huang Rui wrote:
> > >PTSC is the performance timestamp counter value in a cpu core and the
> > >cores in one compute unit have the fixed frequency. So it picks up the
> > >performance timestamp counter value of the first core per compute unit
> > >to measure the interval for average power per compute unit.
> > >
> > >Signed-off-by: Huang Rui <ray.huang@amd.com>
> > >Cc: Borislav Petkov <bp@alien8.de>
> > >Cc: Guenter Roeck <linux@roeck-us.net>
> > >Cc: Peter Zijlstra <peterz@infradead.org>
> > >Cc: Ingo Molnar <mingo@kernel.org>
> > >---
> > >  arch/x86/include/asm/msr-index.h | 1 +
> > >  drivers/hwmon/fam15h_power.c     | 5 +++++
> > >  2 files changed, 6 insertions(+)

...

> > >@@ -132,6 +134,9 @@ static void do_read_registers_on_cu(void *_data)
> > >
> > >  	WARN_ON(rdmsrl_safe(MSR_F15H_CU_PWR_ACCUMULATOR,
> > >  			    &data->cu_acc_power[cu]));
> > >+
> > >+	WARN_ON(rdmsrl_safe(MSR_F15H_PTSC,
> > >+			    &data->cpu_sw_pwr_ptsc[cu]));
> > >  }
> > 
> > I am not really happy with those WARN_ON, or even an error message.
> > If the error is seen, it may be persistent.
> > 
> > If an error check is really needed here, it might make more sense to store
> > the read error and return it to user space if the respective sysfs attribute
> > is read.
> > 
> 
> I am OK with removing WARN_ON here. Boris, if you also agree with it,

The real question should be: are those MSR accesses behind a CPUID flag check
which guarantees their existence?

If so, you don't really need WARN_ONs. And the MSR accesses better be
behind a CPUID flag anyway because reading non-existent MSRs is pretty
much pure waste of energy.

-- 
Regards/Gruss,
    Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
--

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v2 07/10] hwmon: (fam15h_power) Introduce a cpu accumulated power reporting algorithm
  2015-10-26  3:46     ` Huang Rui
@ 2015-10-26  4:41       ` Borislav Petkov
  0 siblings, 0 replies; 39+ messages in thread
From: Borislav Petkov @ 2015-10-26  4:41 UTC (permalink / raw)
  To: Huang Rui
  Cc: Guenter Roeck, Peter Zijlstra, Jean Delvare, Andy Lutomirski,
	Andreas Herrmann, Thomas Gleixner, Ingo Molnar,
	Rafael J. Wysocki, Len Brown, John Stultz,
	Frédéric Weisbecker, lm-sensors, linux-kernel, x86,
	Andreas Herrmann, Aravind Gopalakrishnan, Fengguang Wu, Aaron Lu,
	Tony Li

On Mon, Oct 26, 2015 at 11:46:16AM +0800, Huang Rui wrote:
> Actually, yes. I found it too long, so... :)

Yeah, just try to apply some good, old-fashioned human common sense when
looking at longer lines and deciding whether actually letting them stick
out would be better for readability than breaking them.

:-)

-- 
Regards/Gruss,
    Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
--

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v2 05/10] hwmon: (fam15h_power) Add compute unit accumulated power
  2015-10-23 13:27   ` Borislav Petkov
  2015-10-23 13:37     ` Guenter Roeck
@ 2015-10-27  2:53     ` Huang Rui
  2015-10-28  0:34       ` Borislav Petkov
  1 sibling, 1 reply; 39+ messages in thread
From: Huang Rui @ 2015-10-27  2:53 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Guenter Roeck, Peter Zijlstra, Jean Delvare, Andy Lutomirski,
	Andreas Herrmann, Thomas Gleixner, Ingo Molnar,
	Rafael J. Wysocki, Len Brown, John Stultz,
	Frédéric Weisbecker, lm-sensors, linux-kernel, x86,
	Andreas Herrmann, Aravind Gopalakrishnan, Fengguang Wu, Aaron Lu,
	Tony Li

On Fri, Oct 23, 2015 at 03:27:02PM +0200, Borislav Petkov wrote:
> On Tue, Oct 20, 2015 at 10:28:24AM +0800, Huang Rui wrote:
> > This patch adds a member in fam15h_power_data which specifies the
> > compute unit accumulated power. It adds do_read_registers_on_cu to do
> > all the read to all MSRs and run it on one of the online cores on each
> > compute unit with smp_call_function_many(). This behavior can decrease
> > IPI numbers.
> > 
> > Suggested-by: Borislav Petkov <bp@alien8.de>
> > Signed-off-by: Huang Rui <ray.huang@amd.com>
> > Cc: Guenter Roeck <linux@roeck-us.net>
> > Cc: Peter Zijlstra <peterz@infradead.org>
> > Cc: Ingo Molnar <mingo@kernel.org>
> > ---
> >  drivers/hwmon/fam15h_power.c | 68 +++++++++++++++++++++++++++++++++++++++++++-
> >  1 file changed, 67 insertions(+), 1 deletion(-)
> > 

<snip>

> > +
> > +static int read_registers(struct fam15h_power_data *data)
> > +{
> > +	int this_cpu, ret;
> > +	int cu_num, cores_per_cu, cpu, cu;
> > +	cpumask_var_t mask;
> > +
> > +	cores_per_cu = amd_get_cores_per_cu();
> > +	cu_num = boot_cpu_data.x86_max_cores / cores_per_cu;
> > +
> > +	WARN_ON_ONCE(cu_num > MAX_CUS);
> > +
> > +	ret = zalloc_cpumask_var(&mask, GFP_KERNEL);
> > +	if (!ret)
> > +		return -ENOMEM;
> > +
> > +	this_cpu = get_cpu();
> 
> This should be get_online_cpus() and its counterpart below should be
> put_online_cpus().
> 

Preemption must be disabled when calling smp_call_function_many,
get_cpu would did that. Will get_online_cpus have the same behavior
like that?

> > +
> > +	/*
> > +	 * Choose the first online core of each compute unit, and then
> > +	 * read their MSR value of power and ptsc in one time of IPI,
> 
> 						in a single IPI.
> 
> > +	 * because the MSR value of cpu core represent the compute
> 
> s/cpu/CPU/
> 
> do that in *all* your text.
> 
> > +	 * unit's. This behavior can decrease IPI numbers between the
> 
> 	  unit's ?
> 
> What does that sentence even mean?
> 

That means "the value(cu_acc_power) of the compute unit", which does
not represent the value of one CPU core.

> > +	 * cores.
> > +	 */
> > +	cpu = cpumask_first(cpu_online_mask);
> > +	cu = cpu / cores_per_cu;
> > +	while (cpu < boot_cpu_data.x86_max_cores) {
> > +		if (cu <= cpu / cores_per_cu) {
> > +			cu = cpu / cores_per_cu + 1;
> > +			cpumask_set_cpu(cpu, mask);
> > +		}
> > +		cpu = cpumask_next(cu * cores_per_cu - 1, cpu_online_mask);
> > +	}
> 
> This is hard to parse - I *think* you're setting a bit in mask for a
> core in each CU...
> 

Yes, that's right.

My codes' behavior is below:

Assumed cores_per_cu is 4 and cu_number is 6, and the online cpumask
is:

        cu5  cu4  cu3  cu2  cu1  cu0
        1000_1100_0110_1011_0000_1111

After setting bits of the mask:

        1000_0100_0010_0001_0000_0001
        on   on   on   on   off  on

> If so, I think you can simplify it by generating a tmp mask which
> contains the cores of CU0, i.e. something like that:
> 
> 	11_00_00_...
> 
> and then do cpumask_and(res, ...) to find the online cores on that CU
> and then do cpumask_set_cpu(cpumask_any(res), mask) to select one CPU on
> that CU.
> 
> And then shift to the next CU:
> 
> 	cpumask_shift_right(dst, src_mask, cores_per_cu);
> 
> I think this should be cleaner and less error prone, without the
> conditionals...
> 

OK, how about below codes:

---
for (i = 0; i <= cores_per_cu / BITS_PER_LONG; i++) {
	offset = cores_per_cu % BITS_PER_LONG;
	if (i == cores_per_cu / BITS_PER_LONG) {
		cpumask_bits(src_mask)[i] = GENMASK(offset -1, 0);
		break;
	}
	cpumask_bits(src_mask)[i] = GENMASK(BITS_PER_LONG - 1, 0);
}

for (i = 0; i < cu_num; i++) {
	cpumask_shift_left(dst, src_mask, cores_per_cu * i);
	cpumask_and(res, dst, cpu_online_mask);
	cpumask_set_cpu(cpumask_any(res), mask);
}
---

Thanks,
Rui

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v2 07/10] hwmon: (fam15h_power) Introduce a cpu accumulated power reporting algorithm
  2015-10-23 14:20   ` Guenter Roeck
@ 2015-10-27  3:36     ` Huang Rui
  0 siblings, 0 replies; 39+ messages in thread
From: Huang Rui @ 2015-10-27  3:36 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: Borislav Petkov, Peter Zijlstra, Jean Delvare, Andy Lutomirski,
	Andreas Herrmann, Thomas Gleixner, Ingo Molnar,
	Rafael J. Wysocki, Len Brown, John Stultz,
	Frédéric Weisbecker, lm-sensors, linux-kernel, x86,
	Andreas Herrmann, Gopalakrishnan, Aravind, Borislav Petkov,
	Fengguang Wu, Aaron Lu, Li, Tony

On Fri, Oct 23, 2015 at 10:20:59PM +0800, Guenter Roeck wrote:
> On 10/19/2015 07:28 PM, Huang Rui wrote:
> > This patch introduces an algorithm that computes the average power by
> > reading a delta value of “core power accumulator” register during
> > measurement interval, and then dividing delta value by the length of
> > the time interval.
> >
> > User is able to use power1_average entry to measure the processor power
> > consumption and power1_average_interval entry to set the interval.
> >
> > A simple example:
> >
> > ray@hr-ub:~/tip$ sensors
> > fam15h_power-pci-00c4
> > Adapter: PCI adapter
> > power1:       23.73 mW (avg = 634.63 mW, interval =   0.01 s)
> >                         (crit =  15.00 W)
> >
> > ...
> >
> > The result is current average processor power consumption in 10
> > millisecond. The unit of the result is uWatt.
> >
> > Suggested-by: Guenter Roeck <linux@roeck-us.net>
> > Signed-off-by: Huang Rui <ray.huang@amd.com>
> > Cc: Borislav Petkov <bp@alien8.de>
> > Cc: Peter Zijlstra <peterz@infradead.org>
> > Cc: Ingo Molnar <mingo@kernel.org>
> > ---
> >   drivers/hwmon/fam15h_power.c | 120 +++++++++++++++++++++++++++++++++++++++++++
> >   1 file changed, 120 insertions(+)
> >
> > diff --git a/drivers/hwmon/fam15h_power.c b/drivers/hwmon/fam15h_power.c
> > index 6321f73..a5a539e 100644
> > --- a/drivers/hwmon/fam15h_power.c
> > +++ b/drivers/hwmon/fam15h_power.c
> > @@ -26,6 +26,9 @@
> >   #include <linux/pci.h>
> >   #include <linux/bitops.h>
> >   #include <linux/cpumask.h>
> > +#include <linux/mutex.h>
> > +#include <linux/time.h>
> > +#include <linux/sched.h>
> >   #include <asm/processor.h>
> >   #include <asm/msr.h>
> >
> > @@ -64,6 +67,10 @@ struct fam15h_power_data {
> >   	u64 cu_acc_power[MAX_CUS];
> >   	/* performance timestamp counter */
> >   	u64 cpu_sw_pwr_ptsc[MAX_CUS];
> > +	/* online/offline status of current compute unit */
> > +	int cu_on[MAX_CUS];
> > +	unsigned long power_period;
> > +	struct mutex acc_pwr_mutex;
> 
> Can you elaborate a bit about what this mutex is supposed to protect ?
> To me it seems that it doesn't really protect anything.
> 

My orignal intention is to avoid race condition from user space.
Actually you know, show_power_acc, acc_set_power_period will expose to
application layer.

> >   };
> >
> >   static ssize_t show_power(struct device *dev,
> > @@ -132,11 +139,15 @@ static void do_read_registers_on_cu(void *_data)
> >   	cores_per_cu = amd_get_cores_per_cu();
> >   	cu = cpu / cores_per_cu;
> >
> > +	mutex_lock(&data->acc_pwr_mutex);
> >   	WARN_ON(rdmsrl_safe(MSR_F15H_CU_PWR_ACCUMULATOR,
> >   			    &data->cu_acc_power[cu]));
> >
> >   	WARN_ON(rdmsrl_safe(MSR_F15H_PTSC,
> >   			    &data->cpu_sw_pwr_ptsc[cu]));
> > +
> > +	data->cu_on[cu] = 1;
> > +	mutex_unlock(&data->acc_pwr_mutex);
> 
> ... for example, while this protects cu_on[cu],
> 
> >   }
> >
> >   static int read_registers(struct fam15h_power_data *data)
> > @@ -148,6 +159,10 @@ static int read_registers(struct fam15h_power_data *data)
> >   	cores_per_cu = amd_get_cores_per_cu();
> >   	cu_num = boot_cpu_data.x86_max_cores / cores_per_cu;
> >
> > +	mutex_lock(&data->acc_pwr_mutex);
> > +	memset(data->cu_on, 0, sizeof(int) * MAX_CUS);
> > +	mutex_unlock(&data->acc_pwr_mutex);
> 
> ... this code may well overwrite that same value.
> 

Yes, but I only need the compute unit status of the "sencond" time of read_registers.
Then ignore the offline compute unit to avoid hotplug issue.

> > +
> >   	WARN_ON_ONCE(cu_num > MAX_CUS);
> >
> >   	ret = zalloc_cpumask_var(&mask, GFP_KERNEL);
> > @@ -184,18 +199,113 @@ static int read_registers(struct fam15h_power_data *data)
> >   	return 0;
> >   }
> >
> > +static ssize_t acc_show_power(struct device *dev,
> > +			      struct device_attribute *attr,
> > +			      char *buf)
> > +{
> > +	struct fam15h_power_data *data = dev_get_drvdata(dev);
> > +	u64 prev_cu_acc_power[MAX_CUS], prev_ptsc[MAX_CUS],
> > +	    jdelta[MAX_CUS];
> > +	u64 tdelta, avg_acc;
> > +	int cu, cu_num, cores_per_cu, ret;
> > +	signed long leftover;
> > +
> > +	cores_per_cu = amd_get_cores_per_cu();
> > +	cu_num = boot_cpu_data.x86_max_cores / cores_per_cu;
> > +
> > +	ret = read_registers(data);
> > +	if (ret)
> > +		return 0;

The first time of read_registers

> > +
> > +	cu = 0;
> > +	while(cu++ < cu_num) {
> > +		prev_cu_acc_power[cu] = data->cu_acc_power[cu];
> > +		prev_ptsc[cu] = data->cpu_sw_pwr_ptsc[cu];
> > +	}
> 
> ... and multiple parallel reads on the power attribute must produce
> pretty random values, unless I am really missing something.
> 
> > +
> > +	leftover = schedule_timeout_interruptible(
> > +			msecs_to_jiffies(data->power_period)
> > +		   );
> > +	if (leftover)
> > +		return 0;
> > +
> > +	ret = read_registers(data);

The second time of read_registers

> > +	if (ret)
> > +		return 0;
> > +
> With a 10ms period, I wonder how accurate this really is.
> 

According to the HW description, the measurement interval could be on
the order of several milliseconds. That should get a stable average
power value.

> > +	for (cu = 0, avg_acc = 0; cu < cu_num; cu++) {
> > +		/* check if current compute unit is online */
> > +		if (data->cu_on[cu] == 0)
> > +			continue;
> > +
> > +		if (data->cu_acc_power[cu] < prev_cu_acc_power[cu]) {
> > +			jdelta[cu] = data->max_cu_acc_power + data->cu_acc_power[cu];
> > +			jdelta[cu] -= prev_cu_acc_power[cu];
> > +		} else {
> > +			jdelta[cu] = data->cu_acc_power[cu] - prev_cu_acc_power[cu];
> > +		}
> > +		tdelta = data->cpu_sw_pwr_ptsc[cu] - prev_ptsc[cu];
> > +		jdelta[cu] *= data->cpu_pwr_sample_ratio * 1000;
> > +		do_div(jdelta[cu], tdelta);
> > +
> > +		/* the unit is microWatt */
> > +		avg_acc += jdelta[cu];
> > +	}
> > +
> > +	return sprintf(buf, "%llu\n", (unsigned long long)avg_acc);
> > +}
> > +static DEVICE_ATTR(power1_average, S_IRUGO, acc_show_power, NULL);
> > +
> > +
> > +static ssize_t acc_show_power_period(struct device *dev,
> > +				     struct device_attribute *attr,
> > +				     char *buf)
> > +{
> > +	struct fam15h_power_data *data = dev_get_drvdata(dev);
> > +
> > +	return sprintf(buf, "%lu\n", data->power_period);
> > +}
> > +
> > +static ssize_t acc_set_power_period(struct device *dev,
> > +				    struct device_attribute *attr,
> > +				    const char *buf, size_t count)
> > +{
> > +	struct fam15h_power_data *data = dev_get_drvdata(dev);
> > +	unsigned long temp;
> > +	int ret;
> > +
> > +	ret = kstrtoul(buf, 10, &temp);
> > +	if (ret)
> > +		return ret;
> > +
> > +	mutex_lock(&data->acc_pwr_mutex);
> > +	data->power_period = temp;
> > +	mutex_unlock(&data->acc_pwr_mutex);
> 
> This doesn't really protect anything either except that power_period
> can not be updated while the lock is active. But the code using
> power_period does not run under mutex protection, so that seems pretty
> pointless.
> 

I thought about carefully, the mutex here didn't protect the variable. 
Actually, my variables such as data->cu_acc_power[cu],
data->cu_on[cu] are independent for different cpu cores. They cannot
be updated on different threads at the same time. Different threads
just only updates values on different cu(index).

> Also, this accepts an unlimited timeout. If I understand correctly,
> setting the timeout to, say, 10 seconds will cause the read function
> to hang for that period of time. Setting it to 1 hour will cause the read
> function to hang for 1 hour.
> 
> Does this really make sense ?
> 

You're right. If set a long interval, the mutex will be hold a long
time, that causes cpu looks stuck...
So should I set a "power1_average_max"?

Thanks,
Rui

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v2 05/10] hwmon: (fam15h_power) Add compute unit accumulated power
  2015-10-27  2:53     ` Huang Rui
@ 2015-10-28  0:34       ` Borislav Petkov
  2015-10-28  5:06         ` Huang Rui
  0 siblings, 1 reply; 39+ messages in thread
From: Borislav Petkov @ 2015-10-28  0:34 UTC (permalink / raw)
  To: Huang Rui
  Cc: Guenter Roeck, Peter Zijlstra, Jean Delvare, Andy Lutomirski,
	Andreas Herrmann, Thomas Gleixner, Ingo Molnar,
	Rafael J. Wysocki, Len Brown, John Stultz,
	Frédéric Weisbecker, lm-sensors, linux-kernel, x86,
	Andreas Herrmann, Aravind Gopalakrishnan, Fengguang Wu, Aaron Lu,
	Tony Li

On Tue, Oct 27, 2015 at 10:53:40AM +0800, Huang Rui wrote:
> Preemption must be disabled when calling smp_call_function_many,
> get_cpu would did that. Will get_online_cpus have the same behavior
> like that?

Well, get_online_cpus() protects you against CPU hotplug operations in
general. If you want to protect yourself against CPUs going away only,
then I guess get_cpu()/put_cpu() is fine.

But since we're going to work with the masks as below, prohibiting any
changes to cpu_online_mask is probably the better/safe thing to do, i.e.

	get_online_cpus();
	preempt_disable();

	smp_call_function_many( ... );

	preempt_enable();
	put_online_cpus();

> That means "the value(cu_acc_power) of the compute unit", which does
> not represent the value of one CPU core.

No, I mean this: "This behavior can decrease IPI numbers between the
unit's."

I'm wondering whether it is really needed at all ...

> OK, how about below codes:
> 
> ---
> for (i = 0; i <= cores_per_cu / BITS_PER_LONG; i++) {
> 	offset = cores_per_cu % BITS_PER_LONG;
> 	if (i == cores_per_cu / BITS_PER_LONG) {
> 		cpumask_bits(src_mask)[i] = GENMASK(offset -1, 0);
> 		break;
> 	}
> 	cpumask_bits(src_mask)[i] = GENMASK(BITS_PER_LONG - 1, 0);
> }
> 
> for (i = 0; i < cu_num; i++) {
> 	cpumask_shift_left(dst, src_mask, cores_per_cu * i);
> 	cpumask_and(res, dst, cpu_online_mask);
> 	cpumask_set_cpu(cpumask_any(res), mask);
> }

I think you can make it even simpler:

	/* prepare CU temp mask */
	for (i = 0; i < cores_per_cu; i++)
		cpumask_set_cpu(i, tmp_mask);

	for (i = 0; i < cu_num; i++) {
		/* WARN_ON for empty CU masks */
		WARN_ON(!cpumask_and(res_mask, tmp_mask, cpu_online_mask));
		cpumask_set(cpumask_any(res_mask), call_mask);
		cpumask_shift_right(tmp_mask, tmp_mask, cores_per_cu);
	}

	smp_call_function_many(call_mask, .... );

Something like that...

-- 
Regards/Gruss,
    Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
--

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v2 05/10] hwmon: (fam15h_power) Add compute unit accumulated power
  2015-10-28  0:34       ` Borislav Petkov
@ 2015-10-28  5:06         ` Huang Rui
  0 siblings, 0 replies; 39+ messages in thread
From: Huang Rui @ 2015-10-28  5:06 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Guenter Roeck, Peter Zijlstra, Jean Delvare, Andy Lutomirski,
	Andreas Herrmann, Thomas Gleixner, Ingo Molnar,
	Rafael J. Wysocki, Len Brown, John Stultz,
	Frédéric Weisbecker, lm-sensors, linux-kernel, x86,
	Andreas Herrmann, Aravind Gopalakrishnan, Fengguang Wu, Aaron Lu,
	Tony Li

On Wed, Oct 28, 2015 at 01:34:18AM +0100, Borislav Petkov wrote:
> On Tue, Oct 27, 2015 at 10:53:40AM +0800, Huang Rui wrote:
> > Preemption must be disabled when calling smp_call_function_many,
> > get_cpu would did that. Will get_online_cpus have the same behavior
> > like that?
> 
> Well, get_online_cpus() protects you against CPU hotplug operations in
> general. If you want to protect yourself against CPUs going away only,
> then I guess get_cpu()/put_cpu() is fine.
> 
> But since we're going to work with the masks as below, prohibiting any
> changes to cpu_online_mask is probably the better/safe thing to do, i.e.
> 
> 	get_online_cpus();
> 	preempt_disable();
> 
> 	smp_call_function_many( ... );
> 
> 	preempt_enable();
> 	put_online_cpus();
> 
> > That means "the value(cu_acc_power) of the compute unit", which does
> > not represent the value of one CPU core.
> 
> No, I mean this: "This behavior can decrease IPI numbers between the
> unit's."
> 
> I'm wondering whether it is really needed at all ...
> 

OK, The real words are "This behavior can decrease IPI numbers between
the cores." Actually, the meaning also can be reflected from the
codes. So I could remove this sentence.

> > OK, how about below codes:
> > 
> > ---
> > for (i = 0; i <= cores_per_cu / BITS_PER_LONG; i++) {
> > 	offset = cores_per_cu % BITS_PER_LONG;
> > 	if (i == cores_per_cu / BITS_PER_LONG) {
> > 		cpumask_bits(src_mask)[i] = GENMASK(offset -1, 0);
> > 		break;
> > 	}
> > 	cpumask_bits(src_mask)[i] = GENMASK(BITS_PER_LONG - 1, 0);
> > }
> > 
> > for (i = 0; i < cu_num; i++) {
> > 	cpumask_shift_left(dst, src_mask, cores_per_cu * i);
> > 	cpumask_and(res, dst, cpu_online_mask);
> > 	cpumask_set_cpu(cpumask_any(res), mask);
> > }
> 
> I think you can make it even simpler:
> 
> 	/* prepare CU temp mask */
> 	for (i = 0; i < cores_per_cu; i++)
> 		cpumask_set_cpu(i, tmp_mask);
> 
> 	for (i = 0; i < cu_num; i++) {
> 		/* WARN_ON for empty CU masks */
> 		WARN_ON(!cpumask_and(res_mask, tmp_mask, cpu_online_mask));
> 		cpumask_set(cpumask_any(res_mask), call_mask);
> 		cpumask_shift_right(tmp_mask, tmp_mask, cores_per_cu);
> 	}
> 
> 	smp_call_function_many(call_mask, .... );
> 
> Something like that...
> 

Looks better. :)

Thanks,
Rui

^ permalink raw reply	[flat|nested] 39+ messages in thread

end of thread, other threads:[~2015-10-28  5:07 UTC | newest]

Thread overview: 39+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-10-20  2:28 [PATCH v2 00/10] hwmon: (fam15h_power) Introduce an accumulated power reporting algorithm Huang Rui
2015-10-20  2:28 ` [PATCH v2 01/10] hwmon: (fam15h_power) Refactor attributes for dynamically added Huang Rui
2015-10-23 13:42   ` Guenter Roeck
2015-10-26  1:58     ` Huang Rui
2015-10-26  2:14       ` Guenter Roeck
2015-10-26  2:25         ` Huang Rui
2015-10-20  2:28 ` [PATCH v2 02/10] hwmon: (fam15h_power) Enable power1_input on AMD Carrizo Huang Rui
2015-10-23 13:45   ` Guenter Roeck
2015-10-26  2:19     ` Huang Rui
2015-10-20  2:28 ` [PATCH v2 03/10] hwmon: (fam15h_power) Add max compute unit accumulated power Huang Rui
2015-10-20  2:28 ` [PATCH v2 04/10] x86, amd: add accessor for number of cores per compute unit Huang Rui
2015-10-20  2:28 ` [PATCH v2 05/10] hwmon: (fam15h_power) Add compute unit accumulated power Huang Rui
2015-10-20  7:24   ` kbuild test robot
2015-10-21  1:42     ` Huang Rui
2015-10-21  2:15       ` Guenter Roeck
2015-10-21  2:40         ` Huang Rui
2015-10-21  2:49           ` Guenter Roeck
2015-10-21  3:04             ` Huang Rui
2015-10-21  6:05               ` Jean Delvare
2015-10-21  6:28                 ` Huang Rui
2015-10-23 13:27   ` Borislav Petkov
2015-10-23 13:37     ` Guenter Roeck
2015-10-27  2:53     ` Huang Rui
2015-10-28  0:34       ` Borislav Petkov
2015-10-28  5:06         ` Huang Rui
2015-10-20  2:28 ` [PATCH v2 06/10] hwmon: (fam15h_power) Add ptsc counter value for " Huang Rui
2015-10-23 13:59   ` Guenter Roeck
2015-10-26  3:37     ` Huang Rui
2015-10-26  4:36       ` Borislav Petkov
2015-10-20  2:28 ` [PATCH v2 07/10] hwmon: (fam15h_power) Introduce a cpu accumulated power reporting algorithm Huang Rui
2015-10-23 13:28   ` Borislav Petkov
2015-10-26  3:46     ` Huang Rui
2015-10-26  4:41       ` Borislav Petkov
2015-10-23 14:20   ` Guenter Roeck
2015-10-27  3:36     ` Huang Rui
2015-10-20  2:28 ` [PATCH v2 08/10] hwmon: (fam15h_power) Add documentation for previous TDP reporting Huang Rui
2015-10-20  2:28 ` [PATCH v2 09/10] hwmon: (fam15h_power) Add documentation for accumulated power algorithm Huang Rui
2015-10-23 10:22   ` Borislav Petkov
2015-10-20  2:28 ` [PATCH v2 10/10] MAINTAINERS: change the maintainer of fam15h_power driver Huang Rui

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).