linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 1/3] hwmon: (k10temp): Rework the temperature offset calculation
       [not found] <20210827201527.24454-1-mario.limonciello@amd.com>
@ 2021-08-27 20:15 ` Mario Limonciello
  2021-08-27 20:55   ` Guenter Roeck
  2021-08-27 20:15 ` [PATCH v2 2/3] hwmon: (k10temp): Add support for yellow carp Mario Limonciello
  2021-08-27 20:15 ` [PATCH v2 3/3] hwmon: (k10temp): Show errors failing to read Mario Limonciello
  2 siblings, 1 reply; 9+ messages in thread
From: Mario Limonciello @ 2021-08-27 20:15 UTC (permalink / raw)
  To: Clemens Ladisch
  Cc: linux-hwmon, Gabriel Craciunescu, Guenter Roeck, Wei Huang,
	Mario Limonciello, Jean Delvare, open list

Some of the existing assumptions made do not scale properly
to new silicon in upcoming changes.  This commit should cause
no functional changes to existing silicon.

Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
---
 drivers/hwmon/k10temp.c | 18 ++++++++++++------
 1 file changed, 12 insertions(+), 6 deletions(-)

diff --git a/drivers/hwmon/k10temp.c b/drivers/hwmon/k10temp.c
index f6b325b8463e..159dbad73d82 100644
--- a/drivers/hwmon/k10temp.c
+++ b/drivers/hwmon/k10temp.c
@@ -65,10 +65,11 @@ static DEFINE_MUTEX(nb_smu_ind_mutex);
 #define F15H_M60H_HARDWARE_TEMP_CTRL_OFFSET	0xd8200c64
 #define F15H_M60H_REPORTED_TEMP_CTRL_OFFSET	0xd8200ca4
 
-/* Common for Zen CPU families (Family 17h and 18h) */
-#define ZEN_REPORTED_TEMP_CTRL_OFFSET		0x00059800
+/* Common for Zen CPU families (Family 17h and 18h and 19h) */
+#define ZEN_REPORTED_TEMP_CTRL_BASE		0x00059800
 
-#define ZEN_CCD_TEMP(x)				(0x00059954 + ((x) * 4))
+#define ZEN_CCD_TEMP(offset, x)			(ZEN_REPORTED_TEMP_CTRL_BASE + \
+						 (offset) + ((x) * 4))
 #define ZEN_CCD_TEMP_VALID			BIT(11)
 #define ZEN_CCD_TEMP_MASK			GENMASK(10, 0)
 
@@ -103,6 +104,7 @@ struct k10temp_data {
 	u32 temp_adjust_mask;
 	u32 show_temp;
 	bool is_zen;
+	u32 ccd_offset;
 };
 
 #define TCTL_BIT	0
@@ -163,7 +165,7 @@ static void read_tempreg_nb_f15(struct pci_dev *pdev, u32 *regval)
 static void read_tempreg_nb_zen(struct pci_dev *pdev, u32 *regval)
 {
 	amd_smn_read(amd_pci_dev_to_node_id(pdev),
-		     ZEN_REPORTED_TEMP_CTRL_OFFSET, regval);
+		     ZEN_REPORTED_TEMP_CTRL_BASE, regval);
 }
 
 static long get_raw_temp(struct k10temp_data *data)
@@ -226,7 +228,8 @@ static int k10temp_read_temp(struct device *dev, u32 attr, int channel,
 			break;
 		case 2 ... 9:		/* Tccd{1-8} */
 			amd_smn_read(amd_pci_dev_to_node_id(data->pdev),
-				     ZEN_CCD_TEMP(channel - 2), &regval);
+				     ZEN_CCD_TEMP(data->ccd_offset, channel - 2),
+						  &regval);
 			*val = (regval & ZEN_CCD_TEMP_MASK) * 125 - 49000;
 			break;
 		default:
@@ -387,7 +390,7 @@ static void k10temp_get_ccd_support(struct pci_dev *pdev,
 
 	for (i = 0; i < limit; i++) {
 		amd_smn_read(amd_pci_dev_to_node_id(pdev),
-			     ZEN_CCD_TEMP(i), &regval);
+			     ZEN_CCD_TEMP(data->ccd_offset, i), &regval);
 		if (regval & ZEN_CCD_TEMP_VALID)
 			data->show_temp |= BIT(TCCD_BIT(i));
 	}
@@ -433,12 +436,14 @@ static int k10temp_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 		case 0x8:	/* Zen+ */
 		case 0x11:	/* Zen APU */
 		case 0x18:	/* Zen+ APU */
+			data->ccd_offset = 0x154;
 			k10temp_get_ccd_support(pdev, data, 4);
 			break;
 		case 0x31:	/* Zen2 Threadripper */
 		case 0x60:	/* Renoir */
 		case 0x68:	/* Lucienne */
 		case 0x71:	/* Zen2 */
+			data->ccd_offset = 0x154;
 			k10temp_get_ccd_support(pdev, data, 8);
 			break;
 		}
@@ -451,6 +456,7 @@ static int k10temp_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 		case 0x0 ... 0x1:	/* Zen3 SP3/TR */
 		case 0x21:		/* Zen3 Ryzen Desktop */
 		case 0x50 ... 0x5f:	/* Green Sardine */
+			data->ccd_offset = 0x154;
 			k10temp_get_ccd_support(pdev, data, 8);
 			break;
 		}
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH v2 2/3] hwmon: (k10temp): Add support for yellow carp
       [not found] <20210827201527.24454-1-mario.limonciello@amd.com>
  2021-08-27 20:15 ` [PATCH v2 1/3] hwmon: (k10temp): Rework the temperature offset calculation Mario Limonciello
@ 2021-08-27 20:15 ` Mario Limonciello
  2021-08-27 20:20   ` Borislav Petkov
  2021-08-27 20:55   ` Guenter Roeck
  2021-08-27 20:15 ` [PATCH v2 3/3] hwmon: (k10temp): Show errors failing to read Mario Limonciello
  2 siblings, 2 replies; 9+ messages in thread
From: Mario Limonciello @ 2021-08-27 20:15 UTC (permalink / raw)
  To: Clemens Ladisch
  Cc: linux-hwmon, Gabriel Craciunescu, Guenter Roeck, Wei Huang,
	Mario Limonciello, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	H. Peter Anvin, Jean Delvare, Bjorn Helgaas, Yazen Ghannam,
	David Bartley, open list:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	open list:PCI SUBSYSTEM

Yellow carp matches same behavior as green sardine and other Zen3
products, but have different CCD offsets.

Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
---
 arch/x86/kernel/amd_nb.c | 5 +++++
 drivers/hwmon/k10temp.c  | 5 +++++
 include/linux/pci_ids.h  | 1 +
 3 files changed, 11 insertions(+)

diff --git a/arch/x86/kernel/amd_nb.c b/arch/x86/kernel/amd_nb.c
index 23dda362dc0f..c92c9c774c0e 100644
--- a/arch/x86/kernel/amd_nb.c
+++ b/arch/x86/kernel/amd_nb.c
@@ -25,6 +25,8 @@
 #define PCI_DEVICE_ID_AMD_17H_M60H_DF_F4 0x144c
 #define PCI_DEVICE_ID_AMD_17H_M70H_DF_F4 0x1444
 #define PCI_DEVICE_ID_AMD_19H_DF_F4	0x1654
+#define PCI_DEVICE_ID_AMD_19H_M40H_ROOT	0x14b5
+#define PCI_DEVICE_ID_AMD_19H_M40H_DF_F4 0x167d
 #define PCI_DEVICE_ID_AMD_19H_M50H_DF_F4 0x166e
 
 /* Protect the PCI config register pairs used for SMN and DF indirect access. */
@@ -37,6 +39,7 @@ static const struct pci_device_id amd_root_ids[] = {
 	{ PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_17H_M10H_ROOT) },
 	{ PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_17H_M30H_ROOT) },
 	{ PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_17H_M60H_ROOT) },
+	{ PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_19H_M40H_ROOT) },
 	{}
 };
 
@@ -58,6 +61,7 @@ static const struct pci_device_id amd_nb_misc_ids[] = {
 	{ PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_CNB17H_F3) },
 	{ PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_17H_M70H_DF_F3) },
 	{ PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_19H_DF_F3) },
+	{ PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_19H_M40H_DF_F3) },
 	{ PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_19H_M50H_DF_F3) },
 	{}
 };
@@ -74,6 +78,7 @@ static const struct pci_device_id amd_nb_link_ids[] = {
 	{ PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_17H_M60H_DF_F4) },
 	{ PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_17H_M70H_DF_F4) },
 	{ PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_19H_DF_F4) },
+	{ PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_19H_M40H_DF_F4) },
 	{ PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_19H_M50H_DF_F4) },
 	{ PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_CNB17H_F4) },
 	{}
diff --git a/drivers/hwmon/k10temp.c b/drivers/hwmon/k10temp.c
index 159dbad73d82..38bc35ac8135 100644
--- a/drivers/hwmon/k10temp.c
+++ b/drivers/hwmon/k10temp.c
@@ -459,6 +459,10 @@ static int k10temp_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 			data->ccd_offset = 0x154;
 			k10temp_get_ccd_support(pdev, data, 8);
 			break;
+		case 0x40 ... 0x4f:	/* Yellow Carp */
+			data->ccd_offset = 0x300;
+			k10temp_get_ccd_support(pdev, data, 8);
+			break;
 		}
 	} else {
 		data->read_htcreg = read_htcreg_pci;
@@ -499,6 +503,7 @@ static const struct pci_device_id k10temp_id_table[] = {
 	{ PCI_VDEVICE(AMD, PCI_DEVICE_ID_AMD_17H_M60H_DF_F3) },
 	{ PCI_VDEVICE(AMD, PCI_DEVICE_ID_AMD_17H_M70H_DF_F3) },
 	{ PCI_VDEVICE(AMD, PCI_DEVICE_ID_AMD_19H_DF_F3) },
+	{ PCI_VDEVICE(AMD, PCI_DEVICE_ID_AMD_19H_M40H_DF_F3) },
 	{ PCI_VDEVICE(AMD, PCI_DEVICE_ID_AMD_19H_M50H_DF_F3) },
 	{ PCI_VDEVICE(HYGON, PCI_DEVICE_ID_AMD_17H_DF_F3) },
 	{}
diff --git a/include/linux/pci_ids.h b/include/linux/pci_ids.h
index 5356ccf1c275..e77a62fd0036 100644
--- a/include/linux/pci_ids.h
+++ b/include/linux/pci_ids.h
@@ -555,6 +555,7 @@
 #define PCI_DEVICE_ID_AMD_17H_M60H_DF_F3 0x144b
 #define PCI_DEVICE_ID_AMD_17H_M70H_DF_F3 0x1443
 #define PCI_DEVICE_ID_AMD_19H_DF_F3	0x1653
+#define PCI_DEVICE_ID_AMD_19H_M40H_DF_F3 0x167c
 #define PCI_DEVICE_ID_AMD_19H_M50H_DF_F3 0x166d
 #define PCI_DEVICE_ID_AMD_CNB17H_F3	0x1703
 #define PCI_DEVICE_ID_AMD_LANCE		0x2000
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH v2 3/3] hwmon: (k10temp): Show errors failing to read
       [not found] <20210827201527.24454-1-mario.limonciello@amd.com>
  2021-08-27 20:15 ` [PATCH v2 1/3] hwmon: (k10temp): Rework the temperature offset calculation Mario Limonciello
  2021-08-27 20:15 ` [PATCH v2 2/3] hwmon: (k10temp): Add support for yellow carp Mario Limonciello
@ 2021-08-27 20:15 ` Mario Limonciello
  2021-08-27 21:06   ` Guenter Roeck
  2 siblings, 1 reply; 9+ messages in thread
From: Mario Limonciello @ 2021-08-27 20:15 UTC (permalink / raw)
  To: Clemens Ladisch
  Cc: linux-hwmon, Gabriel Craciunescu, Guenter Roeck, Wei Huang,
	Mario Limonciello, Jean Delvare, open list

Enabling Yellow Carp was initially not working "properly"
because extra IDs were needed, but this wasn't obvious because fail values
from `amd_smn_read` were ignored.

Don't discard errors from any functions providing them, instead pass up
to the caller.

Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
---
 drivers/hwmon/k10temp.c | 87 ++++++++++++++++++++++++-----------------
 1 file changed, 52 insertions(+), 35 deletions(-)

diff --git a/drivers/hwmon/k10temp.c b/drivers/hwmon/k10temp.c
index 38bc35ac8135..2edb49d39d22 100644
--- a/drivers/hwmon/k10temp.c
+++ b/drivers/hwmon/k10temp.c
@@ -98,8 +98,8 @@ static DEFINE_MUTEX(nb_smu_ind_mutex);
 
 struct k10temp_data {
 	struct pci_dev *pdev;
-	void (*read_htcreg)(struct pci_dev *pdev, u32 *regval);
-	void (*read_tempreg)(struct pci_dev *pdev, u32 *regval);
+	int (*read_htcreg)(struct pci_dev *pdev, u32 *regval);
+	int (*read_tempreg)(struct pci_dev *pdev, u32 *regval);
 	int temp_offset;
 	u32 temp_adjust_mask;
 	u32 show_temp;
@@ -129,55 +129,65 @@ static const struct tctl_offset tctl_offset_table[] = {
 	{ 0x17, "AMD Ryzen Threadripper 29", 27000 }, /* 29{20,50,70,90}[W]X */
 };
 
-static void read_htcreg_pci(struct pci_dev *pdev, u32 *regval)
+static int read_htcreg_pci(struct pci_dev *pdev, u32 *regval)
 {
-	pci_read_config_dword(pdev, REG_HARDWARE_THERMAL_CONTROL, regval);
+	return pci_read_config_dword(pdev, REG_HARDWARE_THERMAL_CONTROL, regval);
 }
 
-static void read_tempreg_pci(struct pci_dev *pdev, u32 *regval)
+static int read_tempreg_pci(struct pci_dev *pdev, u32 *regval)
 {
-	pci_read_config_dword(pdev, REG_REPORTED_TEMPERATURE, regval);
+	return pci_read_config_dword(pdev, REG_REPORTED_TEMPERATURE, regval);
 }
 
-static void amd_nb_index_read(struct pci_dev *pdev, unsigned int devfn,
+static int amd_nb_index_read(struct pci_dev *pdev, unsigned int devfn,
 			      unsigned int base, int offset, u32 *val)
 {
+	int ret;
+
 	mutex_lock(&nb_smu_ind_mutex);
-	pci_bus_write_config_dword(pdev->bus, devfn,
-				   base, offset);
-	pci_bus_read_config_dword(pdev->bus, devfn,
-				  base + 4, val);
+	ret = pci_bus_write_config_dword(pdev->bus, devfn,
+					 base, offset);
+	if (ret)
+		goto out;
+	ret = pci_bus_read_config_dword(pdev->bus, devfn,
+					base + 4, val);
+out:
 	mutex_unlock(&nb_smu_ind_mutex);
+	return ret;
 }
 
-static void read_htcreg_nb_f15(struct pci_dev *pdev, u32 *regval)
+static int read_htcreg_nb_f15(struct pci_dev *pdev, u32 *regval)
 {
-	amd_nb_index_read(pdev, PCI_DEVFN(0, 0), 0xb8,
-			  F15H_M60H_HARDWARE_TEMP_CTRL_OFFSET, regval);
+	return amd_nb_index_read(pdev, PCI_DEVFN(0, 0), 0xb8,
+				F15H_M60H_HARDWARE_TEMP_CTRL_OFFSET, regval);
 }
 
-static void read_tempreg_nb_f15(struct pci_dev *pdev, u32 *regval)
+static int read_tempreg_nb_f15(struct pci_dev *pdev, u32 *regval)
 {
-	amd_nb_index_read(pdev, PCI_DEVFN(0, 0), 0xb8,
-			  F15H_M60H_REPORTED_TEMP_CTRL_OFFSET, regval);
+	return amd_nb_index_read(pdev, PCI_DEVFN(0, 0), 0xb8,
+				F15H_M60H_REPORTED_TEMP_CTRL_OFFSET, regval);
 }
 
-static void read_tempreg_nb_zen(struct pci_dev *pdev, u32 *regval)
+static int read_tempreg_nb_zen(struct pci_dev *pdev, u32 *regval)
 {
-	amd_smn_read(amd_pci_dev_to_node_id(pdev),
-		     ZEN_REPORTED_TEMP_CTRL_BASE, regval);
+	return amd_smn_read(amd_pci_dev_to_node_id(pdev),
+			    ZEN_REPORTED_TEMP_CTRL_BASE, regval);
 }
 
-static long get_raw_temp(struct k10temp_data *data)
+static int get_raw_temp(struct k10temp_data *data, long *val)
 {
 	u32 regval;
-	long temp;
+	int ret;
 
-	data->read_tempreg(data->pdev, &regval);
-	temp = (regval >> ZEN_CUR_TEMP_SHIFT) * 125;
+	ret = data->read_tempreg(data->pdev, &regval);
+	if (ret)
+		return ret;
+	*val = (regval >> ZEN_CUR_TEMP_SHIFT) * 125;
 	if (regval & data->temp_adjust_mask)
-		temp -= 49000;
-	return temp;
+		*val -= 49000;
+	if (*val < 0)
+		return -EINVAL;
+	return 0;
 }
 
 static const char *k10temp_temp_label[] = {
@@ -212,24 +222,27 @@ static int k10temp_read_temp(struct device *dev, u32 attr, int channel,
 {
 	struct k10temp_data *data = dev_get_drvdata(dev);
 	u32 regval;
+	int ret;
 
 	switch (attr) {
 	case hwmon_temp_input:
 		switch (channel) {
 		case 0:		/* Tctl */
-			*val = get_raw_temp(data);
-			if (*val < 0)
-				*val = 0;
+			ret = get_raw_temp(data, val);
+			if (ret)
+				return ret;
 			break;
 		case 1:		/* Tdie */
-			*val = get_raw_temp(data) - data->temp_offset;
-			if (*val < 0)
-				*val = 0;
+			ret = get_raw_temp(data, val) - data->temp_offset;
+			if (ret)
+				return ret;
 			break;
 		case 2 ... 9:		/* Tccd{1-8} */
-			amd_smn_read(amd_pci_dev_to_node_id(data->pdev),
+			ret = amd_smn_read(amd_pci_dev_to_node_id(data->pdev),
 				     ZEN_CCD_TEMP(data->ccd_offset, channel - 2),
 						  &regval);
+			if (ret)
+				return ret;
 			*val = (regval & ZEN_CCD_TEMP_MASK) * 125 - 49000;
 			break;
 		default:
@@ -240,11 +253,15 @@ static int k10temp_read_temp(struct device *dev, u32 attr, int channel,
 		*val = 70 * 1000;
 		break;
 	case hwmon_temp_crit:
-		data->read_htcreg(data->pdev, &regval);
+		ret = data->read_htcreg(data->pdev, &regval);
+		if (ret)
+			return ret;
 		*val = ((regval >> 16) & 0x7f) * 500 + 52000;
 		break;
 	case hwmon_temp_crit_hyst:
-		data->read_htcreg(data->pdev, &regval);
+		ret = data->read_htcreg(data->pdev, &regval);
+		if (ret)
+			return ret;
 		*val = (((regval >> 16) & 0x7f)
 			- ((regval >> 24) & 0xf)) * 500 + 52000;
 		break;
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH v2 2/3] hwmon: (k10temp): Add support for yellow carp
  2021-08-27 20:15 ` [PATCH v2 2/3] hwmon: (k10temp): Add support for yellow carp Mario Limonciello
@ 2021-08-27 20:20   ` Borislav Petkov
  2021-08-27 20:55   ` Guenter Roeck
  1 sibling, 0 replies; 9+ messages in thread
From: Borislav Petkov @ 2021-08-27 20:20 UTC (permalink / raw)
  To: Mario Limonciello
  Cc: Clemens Ladisch, linux-hwmon, Gabriel Craciunescu, Guenter Roeck,
	Wei Huang, Thomas Gleixner, Ingo Molnar,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	H. Peter Anvin, Jean Delvare, Bjorn Helgaas, Yazen Ghannam,
	David Bartley, open list:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	open list:PCI SUBSYSTEM

On Fri, Aug 27, 2021 at 03:15:26PM -0500, Mario Limonciello wrote:
> Yellow carp matches same behavior as green sardine and other Zen3
> products, but have different CCD offsets.
> 
> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
> ---
>  arch/x86/kernel/amd_nb.c | 5 +++++
>  drivers/hwmon/k10temp.c  | 5 +++++
>  include/linux/pci_ids.h  | 1 +
>  3 files changed, 11 insertions(+)
> 
> diff --git a/arch/x86/kernel/amd_nb.c b/arch/x86/kernel/amd_nb.c
> index 23dda362dc0f..c92c9c774c0e 100644
> --- a/arch/x86/kernel/amd_nb.c
> +++ b/arch/x86/kernel/amd_nb.c
> @@ -25,6 +25,8 @@
>  #define PCI_DEVICE_ID_AMD_17H_M60H_DF_F4 0x144c
>  #define PCI_DEVICE_ID_AMD_17H_M70H_DF_F4 0x1444
>  #define PCI_DEVICE_ID_AMD_19H_DF_F4	0x1654
> +#define PCI_DEVICE_ID_AMD_19H_M40H_ROOT	0x14b5
> +#define PCI_DEVICE_ID_AMD_19H_M40H_DF_F4 0x167d
>  #define PCI_DEVICE_ID_AMD_19H_M50H_DF_F4 0x166e
>  
>  /* Protect the PCI config register pairs used for SMN and DF indirect access. */
> @@ -37,6 +39,7 @@ static const struct pci_device_id amd_root_ids[] = {
>  	{ PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_17H_M10H_ROOT) },
>  	{ PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_17H_M30H_ROOT) },
>  	{ PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_17H_M60H_ROOT) },
> +	{ PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_19H_M40H_ROOT) },
>  	{}
>  };
>  
> @@ -58,6 +61,7 @@ static const struct pci_device_id amd_nb_misc_ids[] = {
>  	{ PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_CNB17H_F3) },
>  	{ PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_17H_M70H_DF_F3) },
>  	{ PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_19H_DF_F3) },
> +	{ PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_19H_M40H_DF_F3) },
>  	{ PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_19H_M50H_DF_F3) },
>  	{}
>  };
> @@ -74,6 +78,7 @@ static const struct pci_device_id amd_nb_link_ids[] = {
>  	{ PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_17H_M60H_DF_F4) },
>  	{ PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_17H_M70H_DF_F4) },
>  	{ PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_19H_DF_F4) },
> +	{ PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_19H_M40H_DF_F4) },
>  	{ PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_19H_M50H_DF_F4) },
>  	{ PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_CNB17H_F4) },
>  	{}

The x86 bits:

Acked-by: Borislav Petkov <bp@suse.de>

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v2 1/3] hwmon: (k10temp): Rework the temperature offset calculation
  2021-08-27 20:15 ` [PATCH v2 1/3] hwmon: (k10temp): Rework the temperature offset calculation Mario Limonciello
@ 2021-08-27 20:55   ` Guenter Roeck
  0 siblings, 0 replies; 9+ messages in thread
From: Guenter Roeck @ 2021-08-27 20:55 UTC (permalink / raw)
  To: Mario Limonciello
  Cc: Clemens Ladisch, linux-hwmon, Gabriel Craciunescu, Wei Huang,
	Jean Delvare, open list

On Fri, Aug 27, 2021 at 03:15:25PM -0500, Mario Limonciello wrote:
> Some of the existing assumptions made do not scale properly
> to new silicon in upcoming changes.  This commit should cause
> no functional changes to existing silicon.
> 
> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>

Applied.

Thanks,
Guenter

> ---
>  drivers/hwmon/k10temp.c | 18 ++++++++++++------
>  1 file changed, 12 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/hwmon/k10temp.c b/drivers/hwmon/k10temp.c
> index f6b325b8463e..159dbad73d82 100644
> --- a/drivers/hwmon/k10temp.c
> +++ b/drivers/hwmon/k10temp.c
> @@ -65,10 +65,11 @@ static DEFINE_MUTEX(nb_smu_ind_mutex);
>  #define F15H_M60H_HARDWARE_TEMP_CTRL_OFFSET	0xd8200c64
>  #define F15H_M60H_REPORTED_TEMP_CTRL_OFFSET	0xd8200ca4
>  
> -/* Common for Zen CPU families (Family 17h and 18h) */
> -#define ZEN_REPORTED_TEMP_CTRL_OFFSET		0x00059800
> +/* Common for Zen CPU families (Family 17h and 18h and 19h) */
> +#define ZEN_REPORTED_TEMP_CTRL_BASE		0x00059800
>  
> -#define ZEN_CCD_TEMP(x)				(0x00059954 + ((x) * 4))
> +#define ZEN_CCD_TEMP(offset, x)			(ZEN_REPORTED_TEMP_CTRL_BASE + \
> +						 (offset) + ((x) * 4))
>  #define ZEN_CCD_TEMP_VALID			BIT(11)
>  #define ZEN_CCD_TEMP_MASK			GENMASK(10, 0)
>  
> @@ -103,6 +104,7 @@ struct k10temp_data {
>  	u32 temp_adjust_mask;
>  	u32 show_temp;
>  	bool is_zen;
> +	u32 ccd_offset;
>  };
>  
>  #define TCTL_BIT	0
> @@ -163,7 +165,7 @@ static void read_tempreg_nb_f15(struct pci_dev *pdev, u32 *regval)
>  static void read_tempreg_nb_zen(struct pci_dev *pdev, u32 *regval)
>  {
>  	amd_smn_read(amd_pci_dev_to_node_id(pdev),
> -		     ZEN_REPORTED_TEMP_CTRL_OFFSET, regval);
> +		     ZEN_REPORTED_TEMP_CTRL_BASE, regval);
>  }
>  
>  static long get_raw_temp(struct k10temp_data *data)
> @@ -226,7 +228,8 @@ static int k10temp_read_temp(struct device *dev, u32 attr, int channel,
>  			break;
>  		case 2 ... 9:		/* Tccd{1-8} */
>  			amd_smn_read(amd_pci_dev_to_node_id(data->pdev),
> -				     ZEN_CCD_TEMP(channel - 2), &regval);
> +				     ZEN_CCD_TEMP(data->ccd_offset, channel - 2),
> +						  &regval);
>  			*val = (regval & ZEN_CCD_TEMP_MASK) * 125 - 49000;
>  			break;
>  		default:
> @@ -387,7 +390,7 @@ static void k10temp_get_ccd_support(struct pci_dev *pdev,
>  
>  	for (i = 0; i < limit; i++) {
>  		amd_smn_read(amd_pci_dev_to_node_id(pdev),
> -			     ZEN_CCD_TEMP(i), &regval);
> +			     ZEN_CCD_TEMP(data->ccd_offset, i), &regval);
>  		if (regval & ZEN_CCD_TEMP_VALID)
>  			data->show_temp |= BIT(TCCD_BIT(i));
>  	}
> @@ -433,12 +436,14 @@ static int k10temp_probe(struct pci_dev *pdev, const struct pci_device_id *id)
>  		case 0x8:	/* Zen+ */
>  		case 0x11:	/* Zen APU */
>  		case 0x18:	/* Zen+ APU */
> +			data->ccd_offset = 0x154;
>  			k10temp_get_ccd_support(pdev, data, 4);
>  			break;
>  		case 0x31:	/* Zen2 Threadripper */
>  		case 0x60:	/* Renoir */
>  		case 0x68:	/* Lucienne */
>  		case 0x71:	/* Zen2 */
> +			data->ccd_offset = 0x154;
>  			k10temp_get_ccd_support(pdev, data, 8);
>  			break;
>  		}
> @@ -451,6 +456,7 @@ static int k10temp_probe(struct pci_dev *pdev, const struct pci_device_id *id)
>  		case 0x0 ... 0x1:	/* Zen3 SP3/TR */
>  		case 0x21:		/* Zen3 Ryzen Desktop */
>  		case 0x50 ... 0x5f:	/* Green Sardine */
> +			data->ccd_offset = 0x154;
>  			k10temp_get_ccd_support(pdev, data, 8);
>  			break;
>  		}

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v2 2/3] hwmon: (k10temp): Add support for yellow carp
  2021-08-27 20:15 ` [PATCH v2 2/3] hwmon: (k10temp): Add support for yellow carp Mario Limonciello
  2021-08-27 20:20   ` Borislav Petkov
@ 2021-08-27 20:55   ` Guenter Roeck
  1 sibling, 0 replies; 9+ messages in thread
From: Guenter Roeck @ 2021-08-27 20:55 UTC (permalink / raw)
  To: Mario Limonciello
  Cc: Clemens Ladisch, linux-hwmon, Gabriel Craciunescu, Wei Huang,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	H. Peter Anvin, Jean Delvare, Bjorn Helgaas, Yazen Ghannam,
	David Bartley, open list:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	open list:PCI SUBSYSTEM

On Fri, Aug 27, 2021 at 03:15:26PM -0500, Mario Limonciello wrote:
> Yellow carp matches same behavior as green sardine and other Zen3
> products, but have different CCD offsets.
> 
> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
> Acked-by: Borislav Petkov <bp@suse.de>

Applied.

Thanks,
Guenter

> ---
>  arch/x86/kernel/amd_nb.c | 5 +++++
>  drivers/hwmon/k10temp.c  | 5 +++++
>  include/linux/pci_ids.h  | 1 +
>  3 files changed, 11 insertions(+)
> 
> diff --git a/arch/x86/kernel/amd_nb.c b/arch/x86/kernel/amd_nb.c
> index 23dda362dc0f..c92c9c774c0e 100644
> --- a/arch/x86/kernel/amd_nb.c
> +++ b/arch/x86/kernel/amd_nb.c
> @@ -25,6 +25,8 @@
>  #define PCI_DEVICE_ID_AMD_17H_M60H_DF_F4 0x144c
>  #define PCI_DEVICE_ID_AMD_17H_M70H_DF_F4 0x1444
>  #define PCI_DEVICE_ID_AMD_19H_DF_F4	0x1654
> +#define PCI_DEVICE_ID_AMD_19H_M40H_ROOT	0x14b5
> +#define PCI_DEVICE_ID_AMD_19H_M40H_DF_F4 0x167d
>  #define PCI_DEVICE_ID_AMD_19H_M50H_DF_F4 0x166e
>  
>  /* Protect the PCI config register pairs used for SMN and DF indirect access. */
> @@ -37,6 +39,7 @@ static const struct pci_device_id amd_root_ids[] = {
>  	{ PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_17H_M10H_ROOT) },
>  	{ PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_17H_M30H_ROOT) },
>  	{ PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_17H_M60H_ROOT) },
> +	{ PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_19H_M40H_ROOT) },
>  	{}
>  };
>  
> @@ -58,6 +61,7 @@ static const struct pci_device_id amd_nb_misc_ids[] = {
>  	{ PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_CNB17H_F3) },
>  	{ PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_17H_M70H_DF_F3) },
>  	{ PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_19H_DF_F3) },
> +	{ PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_19H_M40H_DF_F3) },
>  	{ PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_19H_M50H_DF_F3) },
>  	{}
>  };
> @@ -74,6 +78,7 @@ static const struct pci_device_id amd_nb_link_ids[] = {
>  	{ PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_17H_M60H_DF_F4) },
>  	{ PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_17H_M70H_DF_F4) },
>  	{ PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_19H_DF_F4) },
> +	{ PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_19H_M40H_DF_F4) },
>  	{ PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_19H_M50H_DF_F4) },
>  	{ PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_CNB17H_F4) },
>  	{}
> diff --git a/drivers/hwmon/k10temp.c b/drivers/hwmon/k10temp.c
> index 159dbad73d82..38bc35ac8135 100644
> --- a/drivers/hwmon/k10temp.c
> +++ b/drivers/hwmon/k10temp.c
> @@ -459,6 +459,10 @@ static int k10temp_probe(struct pci_dev *pdev, const struct pci_device_id *id)
>  			data->ccd_offset = 0x154;
>  			k10temp_get_ccd_support(pdev, data, 8);
>  			break;
> +		case 0x40 ... 0x4f:	/* Yellow Carp */
> +			data->ccd_offset = 0x300;
> +			k10temp_get_ccd_support(pdev, data, 8);
> +			break;
>  		}
>  	} else {
>  		data->read_htcreg = read_htcreg_pci;
> @@ -499,6 +503,7 @@ static const struct pci_device_id k10temp_id_table[] = {
>  	{ PCI_VDEVICE(AMD, PCI_DEVICE_ID_AMD_17H_M60H_DF_F3) },
>  	{ PCI_VDEVICE(AMD, PCI_DEVICE_ID_AMD_17H_M70H_DF_F3) },
>  	{ PCI_VDEVICE(AMD, PCI_DEVICE_ID_AMD_19H_DF_F3) },
> +	{ PCI_VDEVICE(AMD, PCI_DEVICE_ID_AMD_19H_M40H_DF_F3) },
>  	{ PCI_VDEVICE(AMD, PCI_DEVICE_ID_AMD_19H_M50H_DF_F3) },
>  	{ PCI_VDEVICE(HYGON, PCI_DEVICE_ID_AMD_17H_DF_F3) },
>  	{}
> diff --git a/include/linux/pci_ids.h b/include/linux/pci_ids.h
> index 5356ccf1c275..e77a62fd0036 100644
> --- a/include/linux/pci_ids.h
> +++ b/include/linux/pci_ids.h
> @@ -555,6 +555,7 @@
>  #define PCI_DEVICE_ID_AMD_17H_M60H_DF_F3 0x144b
>  #define PCI_DEVICE_ID_AMD_17H_M70H_DF_F3 0x1443
>  #define PCI_DEVICE_ID_AMD_19H_DF_F3	0x1653
> +#define PCI_DEVICE_ID_AMD_19H_M40H_DF_F3 0x167c
>  #define PCI_DEVICE_ID_AMD_19H_M50H_DF_F3 0x166d
>  #define PCI_DEVICE_ID_AMD_CNB17H_F3	0x1703
>  #define PCI_DEVICE_ID_AMD_LANCE		0x2000

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v2 3/3] hwmon: (k10temp): Show errors failing to read
  2021-08-27 20:15 ` [PATCH v2 3/3] hwmon: (k10temp): Show errors failing to read Mario Limonciello
@ 2021-08-27 21:06   ` Guenter Roeck
  2021-08-27 21:10     ` Limonciello, Mario
  0 siblings, 1 reply; 9+ messages in thread
From: Guenter Roeck @ 2021-08-27 21:06 UTC (permalink / raw)
  To: Mario Limonciello
  Cc: Clemens Ladisch, linux-hwmon, Gabriel Craciunescu, Wei Huang,
	Jean Delvare, open list

On Fri, Aug 27, 2021 at 03:15:27PM -0500, Mario Limonciello wrote:
> Enabling Yellow Carp was initially not working "properly"
> because extra IDs were needed, but this wasn't obvious because fail values
> from `amd_smn_read` were ignored.
> 
> Don't discard errors from any functions providing them, instead pass up
> to the caller.
> 
> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
> ---
>  drivers/hwmon/k10temp.c | 87 ++++++++++++++++++++++++-----------------
>  1 file changed, 52 insertions(+), 35 deletions(-)
> 
> diff --git a/drivers/hwmon/k10temp.c b/drivers/hwmon/k10temp.c
> index 38bc35ac8135..2edb49d39d22 100644
> --- a/drivers/hwmon/k10temp.c
> +++ b/drivers/hwmon/k10temp.c
> @@ -98,8 +98,8 @@ static DEFINE_MUTEX(nb_smu_ind_mutex);
>  
>  struct k10temp_data {
>  	struct pci_dev *pdev;
> -	void (*read_htcreg)(struct pci_dev *pdev, u32 *regval);
> -	void (*read_tempreg)(struct pci_dev *pdev, u32 *regval);
> +	int (*read_htcreg)(struct pci_dev *pdev, u32 *regval);
> +	int (*read_tempreg)(struct pci_dev *pdev, u32 *regval);
>  	int temp_offset;
>  	u32 temp_adjust_mask;
>  	u32 show_temp;
> @@ -129,55 +129,65 @@ static const struct tctl_offset tctl_offset_table[] = {
>  	{ 0x17, "AMD Ryzen Threadripper 29", 27000 }, /* 29{20,50,70,90}[W]X */
>  };
>  
> -static void read_htcreg_pci(struct pci_dev *pdev, u32 *regval)
> +static int read_htcreg_pci(struct pci_dev *pdev, u32 *regval)
>  {
> -	pci_read_config_dword(pdev, REG_HARDWARE_THERMAL_CONTROL, regval);
> +	return pci_read_config_dword(pdev, REG_HARDWARE_THERMAL_CONTROL, regval);
>  }
>  
> -static void read_tempreg_pci(struct pci_dev *pdev, u32 *regval)
> +static int read_tempreg_pci(struct pci_dev *pdev, u32 *regval)
>  {
> -	pci_read_config_dword(pdev, REG_REPORTED_TEMPERATURE, regval);
> +	return pci_read_config_dword(pdev, REG_REPORTED_TEMPERATURE, regval);
>  }
>  
> -static void amd_nb_index_read(struct pci_dev *pdev, unsigned int devfn,
> +static int amd_nb_index_read(struct pci_dev *pdev, unsigned int devfn,
>  			      unsigned int base, int offset, u32 *val)
>  {
> +	int ret;
> +
>  	mutex_lock(&nb_smu_ind_mutex);
> -	pci_bus_write_config_dword(pdev->bus, devfn,
> -				   base, offset);
> -	pci_bus_read_config_dword(pdev->bus, devfn,
> -				  base + 4, val);
> +	ret = pci_bus_write_config_dword(pdev->bus, devfn,
> +					 base, offset);
> +	if (ret)
> +		goto out;
> +	ret = pci_bus_read_config_dword(pdev->bus, devfn,
> +					base + 4, val);
> +out:
>  	mutex_unlock(&nb_smu_ind_mutex);
> +	return ret;
>  }
>  
> -static void read_htcreg_nb_f15(struct pci_dev *pdev, u32 *regval)
> +static int read_htcreg_nb_f15(struct pci_dev *pdev, u32 *regval)
>  {
> -	amd_nb_index_read(pdev, PCI_DEVFN(0, 0), 0xb8,
> -			  F15H_M60H_HARDWARE_TEMP_CTRL_OFFSET, regval);
> +	return amd_nb_index_read(pdev, PCI_DEVFN(0, 0), 0xb8,
> +				F15H_M60H_HARDWARE_TEMP_CTRL_OFFSET, regval);
>  }
>  
> -static void read_tempreg_nb_f15(struct pci_dev *pdev, u32 *regval)
> +static int read_tempreg_nb_f15(struct pci_dev *pdev, u32 *regval)
>  {
> -	amd_nb_index_read(pdev, PCI_DEVFN(0, 0), 0xb8,
> -			  F15H_M60H_REPORTED_TEMP_CTRL_OFFSET, regval);
> +	return amd_nb_index_read(pdev, PCI_DEVFN(0, 0), 0xb8,
> +				F15H_M60H_REPORTED_TEMP_CTRL_OFFSET, regval);
>  }
>  
> -static void read_tempreg_nb_zen(struct pci_dev *pdev, u32 *regval)
> +static int read_tempreg_nb_zen(struct pci_dev *pdev, u32 *regval)
>  {
> -	amd_smn_read(amd_pci_dev_to_node_id(pdev),
> -		     ZEN_REPORTED_TEMP_CTRL_BASE, regval);
> +	return amd_smn_read(amd_pci_dev_to_node_id(pdev),
> +			    ZEN_REPORTED_TEMP_CTRL_BASE, regval);
>  }
>  
> -static long get_raw_temp(struct k10temp_data *data)
> +static int get_raw_temp(struct k10temp_data *data, long *val)
>  {
>  	u32 regval;
> -	long temp;
> +	int ret;
>  
> -	data->read_tempreg(data->pdev, &regval);
> -	temp = (regval >> ZEN_CUR_TEMP_SHIFT) * 125;
> +	ret = data->read_tempreg(data->pdev, &regval);
> +	if (ret)
> +		return ret;
> +	*val = (regval >> ZEN_CUR_TEMP_SHIFT) * 125;
>  	if (regval & data->temp_adjust_mask)
> -		temp -= 49000;
> -	return temp;
> +		*val -= 49000;
> +	if (*val < 0)
> +		return -EINVAL;

Please don't do that. More on that see below.

> +	return 0;
>  }
>  
>  static const char *k10temp_temp_label[] = {
> @@ -212,24 +222,27 @@ static int k10temp_read_temp(struct device *dev, u32 attr, int channel,
>  {
>  	struct k10temp_data *data = dev_get_drvdata(dev);
>  	u32 regval;
> +	int ret;
>  
>  	switch (attr) {
>  	case hwmon_temp_input:
>  		switch (channel) {
>  		case 0:		/* Tctl */
> -			*val = get_raw_temp(data);
> -			if (*val < 0)
> -				*val = 0;

We have to take the history into account here. A negative value
is not an error per se, but it suggests that the chip returns wrong
data. See commit aef17ca12719 ("hwmon: (k10temp) Only apply temperature
offset if result is positive") for some of the background. I don't really
want to change that into an error return just because we don't know
what the chip is doing. Please retain the above code, either by fixing
the values up here or in get_raw_temp().

Thanks,
Guenter

> +			ret = get_raw_temp(data, val);
> +			if (ret)
> +				return ret;
>  			break;
>  		case 1:		/* Tdie */
> -			*val = get_raw_temp(data) - data->temp_offset;
> -			if (*val < 0)
> -				*val = 0;
> +			ret = get_raw_temp(data, val) - data->temp_offset;
> +			if (ret)
> +				return ret;
>  			break;
>  		case 2 ... 9:		/* Tccd{1-8} */
> -			amd_smn_read(amd_pci_dev_to_node_id(data->pdev),
> +			ret = amd_smn_read(amd_pci_dev_to_node_id(data->pdev),
>  				     ZEN_CCD_TEMP(data->ccd_offset, channel - 2),
>  						  &regval);
> +			if (ret)
> +				return ret;
>  			*val = (regval & ZEN_CCD_TEMP_MASK) * 125 - 49000;
>  			break;
>  		default:
> @@ -240,11 +253,15 @@ static int k10temp_read_temp(struct device *dev, u32 attr, int channel,
>  		*val = 70 * 1000;
>  		break;
>  	case hwmon_temp_crit:
> -		data->read_htcreg(data->pdev, &regval);
> +		ret = data->read_htcreg(data->pdev, &regval);
> +		if (ret)
> +			return ret;
>  		*val = ((regval >> 16) & 0x7f) * 500 + 52000;
>  		break;
>  	case hwmon_temp_crit_hyst:
> -		data->read_htcreg(data->pdev, &regval);
> +		ret = data->read_htcreg(data->pdev, &regval);
> +		if (ret)
> +			return ret;
>  		*val = (((regval >> 16) & 0x7f)
>  			- ((regval >> 24) & 0xf)) * 500 + 52000;
>  		break;

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v2 3/3] hwmon: (k10temp): Show errors failing to read
  2021-08-27 21:06   ` Guenter Roeck
@ 2021-08-27 21:10     ` Limonciello, Mario
  2021-08-27 21:45       ` Guenter Roeck
  0 siblings, 1 reply; 9+ messages in thread
From: Limonciello, Mario @ 2021-08-27 21:10 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: Clemens Ladisch, linux-hwmon, Gabriel Craciunescu, Wei Huang,
	Jean Delvare, open list

On 8/27/2021 16:06, Guenter Roeck wrote:
> On Fri, Aug 27, 2021 at 03:15:27PM -0500, Mario Limonciello wrote:
>> Enabling Yellow Carp was initially not working "properly"
>> because extra IDs were needed, but this wasn't obvious because fail values
>> from `amd_smn_read` were ignored.
>>
>> Don't discard errors from any functions providing them, instead pass up
>> to the caller.
>>
>> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
>> ---
>>   drivers/hwmon/k10temp.c | 87 ++++++++++++++++++++++++-----------------
>>   1 file changed, 52 insertions(+), 35 deletions(-)
>>
>> diff --git a/drivers/hwmon/k10temp.c b/drivers/hwmon/k10temp.c
>> index 38bc35ac8135..2edb49d39d22 100644
>> --- a/drivers/hwmon/k10temp.c
>> +++ b/drivers/hwmon/k10temp.c
>> @@ -98,8 +98,8 @@ static DEFINE_MUTEX(nb_smu_ind_mutex);
>>   
>>   struct k10temp_data {
>>   	struct pci_dev *pdev;
>> -	void (*read_htcreg)(struct pci_dev *pdev, u32 *regval);
>> -	void (*read_tempreg)(struct pci_dev *pdev, u32 *regval);
>> +	int (*read_htcreg)(struct pci_dev *pdev, u32 *regval);
>> +	int (*read_tempreg)(struct pci_dev *pdev, u32 *regval);
>>   	int temp_offset;
>>   	u32 temp_adjust_mask;
>>   	u32 show_temp;
>> @@ -129,55 +129,65 @@ static const struct tctl_offset tctl_offset_table[] = {
>>   	{ 0x17, "AMD Ryzen Threadripper 29", 27000 }, /* 29{20,50,70,90}[W]X */
>>   };
>>   
>> -static void read_htcreg_pci(struct pci_dev *pdev, u32 *regval)
>> +static int read_htcreg_pci(struct pci_dev *pdev, u32 *regval)
>>   {
>> -	pci_read_config_dword(pdev, REG_HARDWARE_THERMAL_CONTROL, regval);
>> +	return pci_read_config_dword(pdev, REG_HARDWARE_THERMAL_CONTROL, regval);
>>   }
>>   
>> -static void read_tempreg_pci(struct pci_dev *pdev, u32 *regval)
>> +static int read_tempreg_pci(struct pci_dev *pdev, u32 *regval)
>>   {
>> -	pci_read_config_dword(pdev, REG_REPORTED_TEMPERATURE, regval);
>> +	return pci_read_config_dword(pdev, REG_REPORTED_TEMPERATURE, regval);
>>   }
>>   
>> -static void amd_nb_index_read(struct pci_dev *pdev, unsigned int devfn,
>> +static int amd_nb_index_read(struct pci_dev *pdev, unsigned int devfn,
>>   			      unsigned int base, int offset, u32 *val)
>>   {
>> +	int ret;
>> +
>>   	mutex_lock(&nb_smu_ind_mutex);
>> -	pci_bus_write_config_dword(pdev->bus, devfn,
>> -				   base, offset);
>> -	pci_bus_read_config_dword(pdev->bus, devfn,
>> -				  base + 4, val);
>> +	ret = pci_bus_write_config_dword(pdev->bus, devfn,
>> +					 base, offset);
>> +	if (ret)
>> +		goto out;
>> +	ret = pci_bus_read_config_dword(pdev->bus, devfn,
>> +					base + 4, val);
>> +out:
>>   	mutex_unlock(&nb_smu_ind_mutex);
>> +	return ret;
>>   }
>>   
>> -static void read_htcreg_nb_f15(struct pci_dev *pdev, u32 *regval)
>> +static int read_htcreg_nb_f15(struct pci_dev *pdev, u32 *regval)
>>   {
>> -	amd_nb_index_read(pdev, PCI_DEVFN(0, 0), 0xb8,
>> -			  F15H_M60H_HARDWARE_TEMP_CTRL_OFFSET, regval);
>> +	return amd_nb_index_read(pdev, PCI_DEVFN(0, 0), 0xb8,
>> +				F15H_M60H_HARDWARE_TEMP_CTRL_OFFSET, regval);
>>   }
>>   
>> -static void read_tempreg_nb_f15(struct pci_dev *pdev, u32 *regval)
>> +static int read_tempreg_nb_f15(struct pci_dev *pdev, u32 *regval)
>>   {
>> -	amd_nb_index_read(pdev, PCI_DEVFN(0, 0), 0xb8,
>> -			  F15H_M60H_REPORTED_TEMP_CTRL_OFFSET, regval);
>> +	return amd_nb_index_read(pdev, PCI_DEVFN(0, 0), 0xb8,
>> +				F15H_M60H_REPORTED_TEMP_CTRL_OFFSET, regval);
>>   }
>>   
>> -static void read_tempreg_nb_zen(struct pci_dev *pdev, u32 *regval)
>> +static int read_tempreg_nb_zen(struct pci_dev *pdev, u32 *regval)
>>   {
>> -	amd_smn_read(amd_pci_dev_to_node_id(pdev),
>> -		     ZEN_REPORTED_TEMP_CTRL_BASE, regval);
>> +	return amd_smn_read(amd_pci_dev_to_node_id(pdev),
>> +			    ZEN_REPORTED_TEMP_CTRL_BASE, regval);
>>   }
>>   
>> -static long get_raw_temp(struct k10temp_data *data)
>> +static int get_raw_temp(struct k10temp_data *data, long *val)
>>   {
>>   	u32 regval;
>> -	long temp;
>> +	int ret;
>>   
>> -	data->read_tempreg(data->pdev, &regval);
>> -	temp = (regval >> ZEN_CUR_TEMP_SHIFT) * 125;
>> +	ret = data->read_tempreg(data->pdev, &regval);
>> +	if (ret)
>> +		return ret;
>> +	*val = (regval >> ZEN_CUR_TEMP_SHIFT) * 125;
>>   	if (regval & data->temp_adjust_mask)
>> -		temp -= 49000;
>> -	return temp;
>> +		*val -= 49000;
>> +	if (*val < 0)
>> +		return -EINVAL;
> 
> Please don't do that. More on that see below.
> 
>> +	return 0;
>>   }
>>   
>>   static const char *k10temp_temp_label[] = {
>> @@ -212,24 +222,27 @@ static int k10temp_read_temp(struct device *dev, u32 attr, int channel,
>>   {
>>   	struct k10temp_data *data = dev_get_drvdata(dev);
>>   	u32 regval;
>> +	int ret;
>>   
>>   	switch (attr) {
>>   	case hwmon_temp_input:
>>   		switch (channel) {
>>   		case 0:		/* Tctl */
>> -			*val = get_raw_temp(data);
>> -			if (*val < 0)
>> -				*val = 0;
> 
> We have to take the history into account here. A negative value
> is not an error per se, but it suggests that the chip returns wrong
> data. See commit aef17ca12719 ("hwmon: (k10temp) Only apply temperature
> offset if result is positive") for some of the background. I don't really
> want to change that into an error return just because we don't know
> what the chip is doing. Please retain the above code, either by fixing
> the values up here or in get_raw_temp().

Actually I thought what I was doing *was* making it a lot less ambiguous.

The caller getting -EINVAL from get_raw_tempt will indicate that the 
data shouldn't be trusted rather than a surely wrong "0".

> 
> Thanks,
> Guenter
> 
>> +			ret = get_raw_temp(data, val);
>> +			if (ret)
>> +				return ret;
>>   			break;
>>   		case 1:		/* Tdie */
>> -			*val = get_raw_temp(data) - data->temp_offset;
>> -			if (*val < 0)
>> -				*val = 0;
>> +			ret = get_raw_temp(data, val) - data->temp_offset;
>> +			if (ret)
>> +				return ret;
>>   			break;
>>   		case 2 ... 9:		/* Tccd{1-8} */
>> -			amd_smn_read(amd_pci_dev_to_node_id(data->pdev),
>> +			ret = amd_smn_read(amd_pci_dev_to_node_id(data->pdev),
>>   				     ZEN_CCD_TEMP(data->ccd_offset, channel - 2),
>>   						  &regval);
>> +			if (ret)
>> +				return ret;
>>   			*val = (regval & ZEN_CCD_TEMP_MASK) * 125 - 49000;
>>   			break;
>>   		default:
>> @@ -240,11 +253,15 @@ static int k10temp_read_temp(struct device *dev, u32 attr, int channel,
>>   		*val = 70 * 1000;
>>   		break;
>>   	case hwmon_temp_crit:
>> -		data->read_htcreg(data->pdev, &regval);
>> +		ret = data->read_htcreg(data->pdev, &regval);
>> +		if (ret)
>> +			return ret;
>>   		*val = ((regval >> 16) & 0x7f) * 500 + 52000;
>>   		break;
>>   	case hwmon_temp_crit_hyst:
>> -		data->read_htcreg(data->pdev, &regval);
>> +		ret = data->read_htcreg(data->pdev, &regval);
>> +		if (ret)
>> +			return ret;
>>   		*val = (((regval >> 16) & 0x7f)
>>   			- ((regval >> 24) & 0xf)) * 500 + 52000;
>>   		break;


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v2 3/3] hwmon: (k10temp): Show errors failing to read
  2021-08-27 21:10     ` Limonciello, Mario
@ 2021-08-27 21:45       ` Guenter Roeck
  0 siblings, 0 replies; 9+ messages in thread
From: Guenter Roeck @ 2021-08-27 21:45 UTC (permalink / raw)
  To: Limonciello, Mario
  Cc: Clemens Ladisch, linux-hwmon, Gabriel Craciunescu, Wei Huang,
	Jean Delvare, open list

On 8/27/21 2:10 PM, Limonciello, Mario wrote:
> On 8/27/2021 16:06, Guenter Roeck wrote:
>> On Fri, Aug 27, 2021 at 03:15:27PM -0500, Mario Limonciello wrote:
>>> Enabling Yellow Carp was initially not working "properly"
>>> because extra IDs were needed, but this wasn't obvious because fail values
>>> from `amd_smn_read` were ignored.
>>>
>>> Don't discard errors from any functions providing them, instead pass up
>>> to the caller.
>>>
>>> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
>>> ---
>>>   drivers/hwmon/k10temp.c | 87 ++++++++++++++++++++++++-----------------
>>>   1 file changed, 52 insertions(+), 35 deletions(-)
>>>
>>> diff --git a/drivers/hwmon/k10temp.c b/drivers/hwmon/k10temp.c
>>> index 38bc35ac8135..2edb49d39d22 100644
>>> --- a/drivers/hwmon/k10temp.c
>>> +++ b/drivers/hwmon/k10temp.c
>>> @@ -98,8 +98,8 @@ static DEFINE_MUTEX(nb_smu_ind_mutex);
>>>   struct k10temp_data {
>>>       struct pci_dev *pdev;
>>> -    void (*read_htcreg)(struct pci_dev *pdev, u32 *regval);
>>> -    void (*read_tempreg)(struct pci_dev *pdev, u32 *regval);
>>> +    int (*read_htcreg)(struct pci_dev *pdev, u32 *regval);
>>> +    int (*read_tempreg)(struct pci_dev *pdev, u32 *regval);
>>>       int temp_offset;
>>>       u32 temp_adjust_mask;
>>>       u32 show_temp;
>>> @@ -129,55 +129,65 @@ static const struct tctl_offset tctl_offset_table[] = {
>>>       { 0x17, "AMD Ryzen Threadripper 29", 27000 }, /* 29{20,50,70,90}[W]X */
>>>   };
>>> -static void read_htcreg_pci(struct pci_dev *pdev, u32 *regval)
>>> +static int read_htcreg_pci(struct pci_dev *pdev, u32 *regval)
>>>   {
>>> -    pci_read_config_dword(pdev, REG_HARDWARE_THERMAL_CONTROL, regval);
>>> +    return pci_read_config_dword(pdev, REG_HARDWARE_THERMAL_CONTROL, regval);
>>>   }
>>> -static void read_tempreg_pci(struct pci_dev *pdev, u32 *regval)
>>> +static int read_tempreg_pci(struct pci_dev *pdev, u32 *regval)
>>>   {
>>> -    pci_read_config_dword(pdev, REG_REPORTED_TEMPERATURE, regval);
>>> +    return pci_read_config_dword(pdev, REG_REPORTED_TEMPERATURE, regval);
>>>   }
>>> -static void amd_nb_index_read(struct pci_dev *pdev, unsigned int devfn,
>>> +static int amd_nb_index_read(struct pci_dev *pdev, unsigned int devfn,
>>>                     unsigned int base, int offset, u32 *val)
>>>   {
>>> +    int ret;
>>> +
>>>       mutex_lock(&nb_smu_ind_mutex);
>>> -    pci_bus_write_config_dword(pdev->bus, devfn,
>>> -                   base, offset);
>>> -    pci_bus_read_config_dword(pdev->bus, devfn,
>>> -                  base + 4, val);
>>> +    ret = pci_bus_write_config_dword(pdev->bus, devfn,
>>> +                     base, offset);
>>> +    if (ret)
>>> +        goto out;
>>> +    ret = pci_bus_read_config_dword(pdev->bus, devfn,
>>> +                    base + 4, val);
>>> +out:
>>>       mutex_unlock(&nb_smu_ind_mutex);
>>> +    return ret;
>>>   }
>>> -static void read_htcreg_nb_f15(struct pci_dev *pdev, u32 *regval)
>>> +static int read_htcreg_nb_f15(struct pci_dev *pdev, u32 *regval)
>>>   {
>>> -    amd_nb_index_read(pdev, PCI_DEVFN(0, 0), 0xb8,
>>> -              F15H_M60H_HARDWARE_TEMP_CTRL_OFFSET, regval);
>>> +    return amd_nb_index_read(pdev, PCI_DEVFN(0, 0), 0xb8,
>>> +                F15H_M60H_HARDWARE_TEMP_CTRL_OFFSET, regval);
>>>   }
>>> -static void read_tempreg_nb_f15(struct pci_dev *pdev, u32 *regval)
>>> +static int read_tempreg_nb_f15(struct pci_dev *pdev, u32 *regval)
>>>   {
>>> -    amd_nb_index_read(pdev, PCI_DEVFN(0, 0), 0xb8,
>>> -              F15H_M60H_REPORTED_TEMP_CTRL_OFFSET, regval);
>>> +    return amd_nb_index_read(pdev, PCI_DEVFN(0, 0), 0xb8,
>>> +                F15H_M60H_REPORTED_TEMP_CTRL_OFFSET, regval);
>>>   }
>>> -static void read_tempreg_nb_zen(struct pci_dev *pdev, u32 *regval)
>>> +static int read_tempreg_nb_zen(struct pci_dev *pdev, u32 *regval)
>>>   {
>>> -    amd_smn_read(amd_pci_dev_to_node_id(pdev),
>>> -             ZEN_REPORTED_TEMP_CTRL_BASE, regval);
>>> +    return amd_smn_read(amd_pci_dev_to_node_id(pdev),
>>> +                ZEN_REPORTED_TEMP_CTRL_BASE, regval);
>>>   }
>>> -static long get_raw_temp(struct k10temp_data *data)
>>> +static int get_raw_temp(struct k10temp_data *data, long *val)
>>>   {
>>>       u32 regval;
>>> -    long temp;
>>> +    int ret;
>>> -    data->read_tempreg(data->pdev, &regval);
>>> -    temp = (regval >> ZEN_CUR_TEMP_SHIFT) * 125;
>>> +    ret = data->read_tempreg(data->pdev, &regval);
>>> +    if (ret)
>>> +        return ret;
>>> +    *val = (regval >> ZEN_CUR_TEMP_SHIFT) * 125;
>>>       if (regval & data->temp_adjust_mask)
>>> -        temp -= 49000;
>>> -    return temp;
>>> +        *val -= 49000;
>>> +    if (*val < 0)
>>> +        return -EINVAL;
>>
>> Please don't do that. More on that see below.
>>
>>> +    return 0;
>>>   }
>>>   static const char *k10temp_temp_label[] = {
>>> @@ -212,24 +222,27 @@ static int k10temp_read_temp(struct device *dev, u32 attr, int channel,
>>>   {
>>>       struct k10temp_data *data = dev_get_drvdata(dev);
>>>       u32 regval;
>>> +    int ret;
>>>       switch (attr) {
>>>       case hwmon_temp_input:
>>>           switch (channel) {
>>>           case 0:        /* Tctl */
>>> -            *val = get_raw_temp(data);
>>> -            if (*val < 0)
>>> -                *val = 0;
>>
>> We have to take the history into account here. A negative value
>> is not an error per se, but it suggests that the chip returns wrong
>> data. See commit aef17ca12719 ("hwmon: (k10temp) Only apply temperature
>> offset if result is positive") for some of the background. I don't really
>> want to change that into an error return just because we don't know
>> what the chip is doing. Please retain the above code, either by fixing
>> the values up here or in get_raw_temp().
> 
> Actually I thought what I was doing *was* making it a lot less ambiguous.
> 
> The caller getting -EINVAL from get_raw_tempt will indicate that the data shouldn't be trusted rather than a surely wrong "0".
> 

First, EINVAL is "Invalid argument" and thus most definitely the wrong
error.

Second, this is a logically different change and violates "one logical
change per patch". You can try to argue this in a separate patch,
but do not try sneak it in with an unrelated change.

Plus, of course, I'd rather get code from AMD that does the right thing
and reports the correct temperature in the observed situation.

Thanks,
Guenter

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2021-08-27 21:45 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <20210827201527.24454-1-mario.limonciello@amd.com>
2021-08-27 20:15 ` [PATCH v2 1/3] hwmon: (k10temp): Rework the temperature offset calculation Mario Limonciello
2021-08-27 20:55   ` Guenter Roeck
2021-08-27 20:15 ` [PATCH v2 2/3] hwmon: (k10temp): Add support for yellow carp Mario Limonciello
2021-08-27 20:20   ` Borislav Petkov
2021-08-27 20:55   ` Guenter Roeck
2021-08-27 20:15 ` [PATCH v2 3/3] hwmon: (k10temp): Show errors failing to read Mario Limonciello
2021-08-27 21:06   ` Guenter Roeck
2021-08-27 21:10     ` Limonciello, Mario
2021-08-27 21:45       ` Guenter Roeck

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).