linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 0/8] PECI device driver introduction
@ 2018-02-21 16:15 Jae Hyun Yoo
  2018-02-21 16:15 ` [PATCH v2 1/8] [PATCH 1/8] drivers/peci: Add support for PECI bus driver core Jae Hyun Yoo
                   ` (8 more replies)
  0 siblings, 9 replies; 46+ messages in thread
From: Jae Hyun Yoo @ 2018-02-21 16:15 UTC (permalink / raw)
  To: joel, andrew, arnd, gregkh, jdelvare, linux, benh, andrew
  Cc: linux-kernel, linux-doc, devicetree, linux-hwmon,
	linux-arm-kernel, openbmc, Jae Hyun Yoo

Introduction of the Platform Environment Control Interface (PECI) bus
device driver. PECI is a one-wire bus interface that provides a
communication channel between Intel processor and chipset components to
external monitoring or control devices. PECI is designed to support the
following sideband functions:

* Processor and DRAM thermal management
  - Processor fan speed control is managed by comparing Digital Thermal
    Sensor (DTS) thermal readings acquired via PECI against the
    processor-specific fan speed control reference point, or TCONTROL.
    Both TCONTROL and DTS thermal readings are accessible via the processor
    PECI client. These variables are referenced to a common temperature,
    the TCC activation point, and are both defined as negative offsets from
    that reference.
  - PECI based access to the processor package configuration space provides
    a means for Baseboard Management Controllers (BMC) or other platform
    management devices to actively manage the processor and memory power
    and thermal features.

* Platform Manageability
  - Platform manageability functions including thermal, power, and error
    monitoring. Note that platform 'power' management includes monitoring
    and control for both the processor and DRAM subsystem to assist with
    data center power limiting.
  - PECI allows read access to certain error registers in the processor MSR
    space and status monitoring registers in the PCI configuration space
    within the processor and downstream devices.
  - PECI permits writes to certain registers in the processor PCI
    configuration space.

* Processor Interface Tuning and Diagnostics
  - Processor interface tuning and diagnostics capabilities
    (Intel(c) Interconnect BIST). The processors Intel(c) Interconnect
    Built In Self Test (Intel(c) IBIST) allows for infield diagnostic
    capabilities in the Intel UPI and memory controller interfaces. PECI
    provides a port to execute these diagnostics via its PCI Configuration
    read and write capabilities.

* Failure Analysis
  - Output the state of the processor after a failure for analysis via
    Crashdump.

PECI uses a single wire for self-clocking and data transfer. The bus
requires no additional control lines. The physical layer is a self-clocked
one-wire bus that begins each bit with a driven, rising edge from an idle
level near zero volts. The duration of the signal driven high depends on
whether the bit value is a logic '0' or logic '1'. PECI also includes
variable data transfer rate established with every message. In this way,
it is highly flexible even though underlying logic is simple.

The interface design was optimized for interfacing to Intel processor and
chipset components in both single processor and multiple processor
environments. The single wire interface provides low board routing
overhead for the multiple load connections in the congested routing area
near the processor and chipset components. Bus speed, error checking, and
low protocol overhead provides adequate link bandwidth and reliability to
transfer critical device operating conditions and configuration
information.

This implementation provides the basic framework to add PECI extensions
to the Linux bus and device models. A hardware specific 'Adapter' driver
can be attached to the PECI bus to provide sideband functions described
above. It is also possible to access all devices on an adapter from
userspace through the /dev interface. A device specific 'Client' driver
also can be attached to the PECI bus so each processor client's features
can be supported by the 'Client' driver through an adapter connection in
the bus. This patch set includes Aspeed 24xx/25xx PECI driver and a generic
PECI hwmon driver as the first implementation for both adapter and client
drivers on the PECI bus framework.

v1 -> v2
- Additionally implemented a core driver to support PECI linux bus driver
  model.
- Modified Aspeed PECI driver to make that to be an adapter driver in PECI
  bus.
- Modified PECI hwmon driver to make that to be a client driver in PECI
  bus.
- Simplified hwmon driver attribute labels and removed redundant strings.
- Removed core_nums from device tree setting of hwmon driver and modified
  core number detection logic to check the resolved_core register in
  client CPU's local PCI configuration area.
- Removed dimm_nums from device tree setting of hwmon driver and added
  populated DIMM detection logic to support dynamic creation.
- Removed indexing gap on core temperature and DIMM temperature attributes.
- Improved hwmon registration and dynamic attribute creation logic.
- Fixed structure definitions in PECI uapi header to make that use __u8,
  __u16 and etc.
- Modified wait_for_completion_interruptible_timeout error handling logic
  in Aspeed PECI driver to deliver errors correctly.
- Removed low-level xfer command from ioctl and kept only high-level PECI
  command suite as ioctls.
- Fixed I/O timeout logic in Aspeed PECI driver using ktime.
- Added a function into hwmon driver to simplify update delay checking.
- Added a function into hwmon driver to convert 10.6 to millidegree.
- Dropped non-standard attributes in hwmon driver.
- Fixed OF table for hwmon to make it indicate as a PECI client of Intel
  CPU target.
- Added a maintainer of PECI subsystem into MAINTAINERS document.

Thanks,

-Jae

Jae Hyun Yoo (8):
  drivers/peci: Add support for PECI bus driver core
  Documentations: dt-bindings: Add a document of PECI adapter driver for
    Aspeed AST24xx/25xx SoCs
  ARM: dts: aspeed: peci: Add PECI node
  drivers/peci: Add a PECI adapter driver for Aspeed AST24xx/AST25xx
  Documentation: dt-bindings: Add a document for PECI hwmon client driver
  Documentation: hwmon: Add a document for a PECI hwmon client driver
  drivers/hwmon: Add a generic PECI hwmon client driver
  Add a maintainer for the PECI subsystem

 .../devicetree/bindings/hwmon/peci-hwmon.txt       |   27 +
 .../devicetree/bindings/peci/peci-aspeed.txt       |   73 ++
 Documentation/hwmon/peci-hwmon                     |   73 ++
 MAINTAINERS                                        |    9 +
 arch/arm/boot/dts/aspeed-g4.dtsi                   |   25 +
 arch/arm/boot/dts/aspeed-g5.dtsi                   |   25 +
 drivers/Kconfig                                    |    2 +
 drivers/Makefile                                   |    1 +
 drivers/hwmon/Kconfig                              |   10 +
 drivers/hwmon/Makefile                             |    1 +
 drivers/hwmon/peci-hwmon.c                         |  928 ++++++++++++++
 drivers/peci/Kconfig                               |   39 +
 drivers/peci/Makefile                              |    9 +
 drivers/peci/peci-aspeed.c                         |  510 ++++++++
 drivers/peci/peci-core.c                           | 1337 ++++++++++++++++
 include/linux/peci.h                               |   97 ++
 include/uapi/linux/peci-ioctl.h                    |  207 +++
 17 files changed, 3373 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/hwmon/peci-hwmon.txt
 create mode 100644 Documentation/devicetree/bindings/peci/peci-aspeed.txt
 create mode 100644 Documentation/hwmon/peci-hwmon
 create mode 100644 drivers/hwmon/peci-hwmon.c
 create mode 100644 drivers/peci/Kconfig
 create mode 100644 drivers/peci/Makefile
 create mode 100644 drivers/peci/peci-aspeed.c
 create mode 100644 drivers/peci/peci-core.c
 create mode 100644 include/linux/peci.h
 create mode 100644 include/uapi/linux/peci-ioctl.h

-- 
2.16.1

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH v2 1/8] [PATCH 1/8] drivers/peci: Add support for PECI bus driver core
  2018-02-21 16:15 [PATCH v2 0/8] PECI device driver introduction Jae Hyun Yoo
@ 2018-02-21 16:15 ` Jae Hyun Yoo
  2018-02-21 17:04   ` Andrew Lunn
                     ` (4 more replies)
  2018-02-21 16:16 ` [PATCH v2 2/8] [PATCH 2/8] Documentations: dt-bindings: Add a document of PECI adapter driver for Aspeed AST24xx/25xx SoCs Jae Hyun Yoo
                   ` (7 subsequent siblings)
  8 siblings, 5 replies; 46+ messages in thread
From: Jae Hyun Yoo @ 2018-02-21 16:15 UTC (permalink / raw)
  To: joel, andrew, arnd, gregkh, jdelvare, linux, benh, andrew
  Cc: linux-kernel, linux-doc, devicetree, linux-hwmon,
	linux-arm-kernel, openbmc, Jae Hyun Yoo

This commit adds driver implementation for PECI bus into linux
driver framework.

Signed-off-by: Jae Hyun Yoo <jae.hyun.yoo@linux.intel.com>
---
 drivers/Kconfig                 |    2 +
 drivers/Makefile                |    1 +
 drivers/peci/Kconfig            |   20 +
 drivers/peci/Makefile           |    6 +
 drivers/peci/peci-core.c        | 1337 +++++++++++++++++++++++++++++++++++++++
 include/linux/peci.h            |   97 +++
 include/uapi/linux/peci-ioctl.h |  207 ++++++
 7 files changed, 1670 insertions(+)
 create mode 100644 drivers/peci/Kconfig
 create mode 100644 drivers/peci/Makefile
 create mode 100644 drivers/peci/peci-core.c
 create mode 100644 include/linux/peci.h
 create mode 100644 include/uapi/linux/peci-ioctl.h

diff --git a/drivers/Kconfig b/drivers/Kconfig
index 879dc0604cba..031bed5bbe7b 100644
--- a/drivers/Kconfig
+++ b/drivers/Kconfig
@@ -219,4 +219,6 @@ source "drivers/siox/Kconfig"
 
 source "drivers/slimbus/Kconfig"
 
+source "drivers/peci/Kconfig"
+
 endmenu
diff --git a/drivers/Makefile b/drivers/Makefile
index 24cd47014657..250fe3d0fa7e 100644
--- a/drivers/Makefile
+++ b/drivers/Makefile
@@ -185,3 +185,4 @@ obj-$(CONFIG_TEE)		+= tee/
 obj-$(CONFIG_MULTIPLEXER)	+= mux/
 obj-$(CONFIG_UNISYS_VISORBUS)	+= visorbus/
 obj-$(CONFIG_SIOX)		+= siox/
+obj-$(CONFIG_PECI)		+= peci/
diff --git a/drivers/peci/Kconfig b/drivers/peci/Kconfig
new file mode 100644
index 000000000000..1cd2cb4b2298
--- /dev/null
+++ b/drivers/peci/Kconfig
@@ -0,0 +1,20 @@
+#
+# Platform Environment Control Interface (PECI) subsystem configuration
+#
+
+menu "PECI support"
+
+config PECI
+	tristate "PECI support"
+	select RT_MUTEXES
+	select CRC8
+	help
+	  The Platform Environment Control Interface (PECI) is a one-wire bus
+	  interface that provides a communication channel between Intel
+	  processor and chipset components to external monitoring or control
+	  devices.
+
+	  This PECI support can also be built as a module.  If so, the module
+	  will be called peci-core.
+
+endmenu
diff --git a/drivers/peci/Makefile b/drivers/peci/Makefile
new file mode 100644
index 000000000000..9e8615e0d3ff
--- /dev/null
+++ b/drivers/peci/Makefile
@@ -0,0 +1,6 @@
+#
+# Makefile for the PECI core and bus drivers.
+#
+
+# Core functionality
+obj-$(CONFIG_PECI)		+= peci-core.o
diff --git a/drivers/peci/peci-core.c b/drivers/peci/peci-core.c
new file mode 100644
index 000000000000..d976c7317801
--- /dev/null
+++ b/drivers/peci/peci-core.c
@@ -0,0 +1,1337 @@
+// SPDX-License-Identifier: GPL-2.0
+// Copyright (c) 2018 Intel Corporation
+
+#include <linux/crc8.h>
+#include <linux/fs.h>
+#include <linux/module.h>
+#include <linux/of_device.h>
+#include <linux/peci.h>
+#include <linux/slab.h>
+#include <linux/uaccess.h>
+
+/* Device Specific Completion Code (CC) Definition */
+#define DEV_PECI_CC_RETRY_ERR_MASK  0xf0
+#define DEV_PECI_CC_SUCCESS         0x40
+#define DEV_PECI_CC_TIMEOUT         0x80
+#define DEV_PECI_CC_OUT_OF_RESOURCE 0x81
+#define DEV_PECI_CC_INVALID_REQ     0x90
+
+/* Skylake EDS says to retry for 250ms */
+#define DEV_PECI_RETRY_TIME_MS     250
+#define DEV_PECI_RETRY_BIT         0x01
+
+#define GET_TEMP_WR_LEN   1
+#define GET_TEMP_RD_LEN   2
+#define GET_TEMP_PECI_CMD 0x01
+
+#define GET_DIB_WR_LEN   1
+#define GET_DIB_RD_LEN   8
+#define GET_DIB_PECI_CMD 0xf7
+
+#define RDPKGCFG_WRITE_LEN     5
+#define RDPKGCFG_READ_LEN_BASE 1
+#define RDPKGCFG_PECI_CMD      0xa1
+
+#define WRPKGCFG_WRITE_LEN_BASE 6
+#define WRPKGCFG_READ_LEN       1
+#define WRPKGCFG_PECI_CMD       0xa5
+
+#define RDIAMSR_WRITE_LEN 5
+#define RDIAMSR_READ_LEN  9
+#define RDIAMSR_PECI_CMD  0xb1
+
+#define WRIAMSR_PECI_CMD  0xb5
+
+#define RDPCICFG_WRITE_LEN 6
+#define RDPCICFG_READ_LEN  5
+#define RDPCICFG_PECI_CMD  0x61
+
+#define WRPCICFG_PECI_CMD  0x65
+
+#define RDPCICFGLOCAL_WRITE_LEN     5
+#define RDPCICFGLOCAL_READ_LEN_BASE 1
+#define RDPCICFGLOCAL_PECI_CMD      0xe1
+
+#define WRPCICFGLOCAL_WRITE_LEN_BASE 6
+#define WRPCICFGLOCAL_READ_LEN       1
+#define WRPCICFGLOCAL_PECI_CMD       0xe5
+
+/* CRC8 table for Assure Write Frame Check */
+#define PECI_CRC8_POLYNOMIAL 0x07
+DECLARE_CRC8_TABLE(peci_crc8_table);
+
+static struct device_type peci_adapter_type;
+static struct device_type peci_client_type;
+
+#define PECI_CDEV_MAX 16
+static dev_t peci_devt;
+static bool is_registered;
+
+static DEFINE_MUTEX(core_lock);
+static DEFINE_IDR(peci_adapter_idr);
+
+static ssize_t name_show(struct device *dev,
+			 struct device_attribute *attr,
+			 char *buf)
+{
+	return sprintf(buf, "%s\n", dev->type == &peci_client_type ?
+		       to_peci_client(dev)->name : to_peci_adapter(dev)->name);
+}
+static DEVICE_ATTR_RO(name);
+
+static void peci_client_dev_release(struct device *dev)
+{
+	kfree(to_peci_client(dev));
+}
+
+static struct attribute *peci_device_attrs[] = {
+	&dev_attr_name.attr,
+	NULL
+};
+ATTRIBUTE_GROUPS(peci_device);
+
+static struct device_type peci_client_type = {
+	.groups		= peci_device_groups,
+	.release	= peci_client_dev_release,
+};
+
+static struct peci_client *peci_verify_client(struct device *dev)
+{
+	return (dev->type == &peci_client_type)
+			? to_peci_client(dev)
+			: NULL;
+}
+
+static void peci_adapter_dev_release(struct device *dev)
+{
+	/* do nothing */
+}
+
+static struct attribute *peci_adapter_attrs[] = {
+	&dev_attr_name.attr,
+	NULL
+};
+ATTRIBUTE_GROUPS(peci_adapter);
+
+static struct device_type peci_adapter_type = {
+	.groups		= peci_adapter_groups,
+	.release	= peci_adapter_dev_release,
+};
+
+static struct peci_adapter *peci_verify_adapter(struct device *dev)
+{
+	return (dev->type == &peci_adapter_type)
+			? to_peci_adapter(dev)
+			: NULL;
+}
+
+static struct peci_adapter *peci_get_adapter(int nr)
+{
+	struct peci_adapter *adapter;
+
+	mutex_lock(&core_lock);
+	adapter = idr_find(&peci_adapter_idr, nr);
+	if (!adapter)
+		goto out_unlock;
+
+	if (try_module_get(adapter->owner))
+		get_device(&adapter->dev);
+	else
+		adapter = NULL;
+
+out_unlock:
+	mutex_unlock(&core_lock);
+	return adapter;
+}
+
+static void peci_put_adapter(struct peci_adapter *adapter)
+{
+	if (!adapter)
+		return;
+
+	put_device(&adapter->dev);
+	module_put(adapter->owner);
+}
+
+static u8 peci_aw_fcs(u8 *data, int len)
+{
+	return crc8(peci_crc8_table, data, (size_t)len, 0);
+}
+
+static int peci_locked_xfer(struct peci_adapter *adapter,
+			    struct peci_xfer_msg *msg,
+			    bool do_retry,
+			    bool has_aw_fcs)
+{
+	ktime_t start, end;
+	s64 elapsed_ms;
+	int rc = 0;
+
+	if (!adapter->xfer) {
+		dev_dbg(&adapter->dev, "PECI level transfers not supported\n");
+		return -ENODEV;
+	}
+
+	if (in_atomic() || irqs_disabled()) {
+		rt_mutex_trylock(&adapter->bus_lock);
+		if (!rc)
+			return -EAGAIN; /* PECI activity is ongoing */
+	} else {
+		rt_mutex_lock(&adapter->bus_lock);
+	}
+
+	if (do_retry)
+		start = ktime_get();
+
+	do {
+		rc = adapter->xfer(adapter, msg);
+
+		if (!do_retry)
+			break;
+
+		/* Per the PECI spec, need to retry commands that return 0x8x */
+		if (!(!rc && ((msg->rx_buf[0] & DEV_PECI_CC_RETRY_ERR_MASK) ==
+			      DEV_PECI_CC_TIMEOUT)))
+			break;
+
+		/* Set the retry bit to indicate a retry attempt */
+		msg->tx_buf[1] |= DEV_PECI_RETRY_BIT;
+
+		/* Recalculate the AW FCS if it has one */
+		if (has_aw_fcs)
+			msg->tx_buf[msg->tx_len - 1] = 0x80 ^
+						peci_aw_fcs((u8 *)msg,
+							    2 + msg->tx_len);
+
+		/* Retry for at least 250ms before returning an error */
+		end = ktime_get();
+		elapsed_ms = ktime_to_ms(ktime_sub(end, start));
+		if (elapsed_ms >= DEV_PECI_RETRY_TIME_MS) {
+			dev_dbg(&adapter->dev, "Timeout retrying xfer!\n");
+			break;
+		}
+	} while (true);
+
+	rt_mutex_unlock(&adapter->bus_lock);
+
+	return rc;
+}
+
+static int peci_xfer(struct peci_adapter *adapter, struct peci_xfer_msg *msg)
+{
+	return peci_locked_xfer(adapter, msg, false, false);
+}
+
+static int peci_xfer_with_retries(struct peci_adapter *adapter,
+				  struct peci_xfer_msg *msg,
+				  bool has_aw_fcs)
+{
+	return peci_locked_xfer(adapter, msg, true, has_aw_fcs);
+}
+
+static int peci_scan_cmd_mask(struct peci_adapter *adapter)
+{
+	struct peci_xfer_msg msg;
+	u32 dib;
+	int rc = 0;
+
+	/* Update command mask just once */
+	if (adapter->cmd_mask & BIT(PECI_CMD_PING))
+		return 0;
+
+	msg.addr      = PECI_BASE_ADDR;
+	msg.tx_len    = GET_DIB_WR_LEN;
+	msg.rx_len    = GET_DIB_RD_LEN;
+	msg.tx_buf[0] = GET_DIB_PECI_CMD;
+
+	rc = peci_xfer(adapter, &msg);
+	if (rc < 0) {
+		dev_dbg(&adapter->dev, "PECI xfer error, rc : %d\n", rc);
+		return rc;
+	}
+
+	dib = msg.rx_buf[0] | (msg.rx_buf[1] << 8) |
+	      (msg.rx_buf[2] << 16) | (msg.rx_buf[3] << 24);
+
+	/* Check special case for Get DIB command */
+	if (dib == 0x00) {
+		dev_dbg(&adapter->dev, "DIB read as 0x00\n");
+		return -1;
+	}
+
+	if (!rc) {
+		/**
+		 * setting up the supporting commands based on minor rev#
+		 * see PECI Spec Table 3-1
+		 */
+		dib = (dib >> 8) & 0xF;
+
+		if (dib >= 0x1) {
+			adapter->cmd_mask |= BIT(PECI_CMD_RD_PKG_CFG);
+			adapter->cmd_mask |= BIT(PECI_CMD_WR_PKG_CFG);
+		}
+
+		if (dib >= 0x2)
+			adapter->cmd_mask |= BIT(PECI_CMD_RD_IA_MSR);
+
+		if (dib >= 0x3) {
+			adapter->cmd_mask |= BIT(PECI_CMD_RD_PCI_CFG_LOCAL);
+			adapter->cmd_mask |= BIT(PECI_CMD_WR_PCI_CFG_LOCAL);
+		}
+
+		if (dib >= 0x4)
+			adapter->cmd_mask |= BIT(PECI_CMD_RD_PCI_CFG);
+
+		if (dib >= 0x5)
+			adapter->cmd_mask |= BIT(PECI_CMD_WR_PCI_CFG);
+
+		if (dib >= 0x6)
+			adapter->cmd_mask |= BIT(PECI_CMD_WR_IA_MSR);
+
+		adapter->cmd_mask |= BIT(PECI_CMD_GET_TEMP);
+		adapter->cmd_mask |= BIT(PECI_CMD_GET_DIB);
+		adapter->cmd_mask |= BIT(PECI_CMD_PING);
+	} else {
+		dev_dbg(&adapter->dev, "Error reading DIB, rc : %d\n", rc);
+	}
+
+	return rc;
+}
+
+static int peci_cmd_support(struct peci_adapter *adapter, enum peci_cmd cmd)
+{
+	if (!(adapter->cmd_mask & BIT(PECI_CMD_PING)) &&
+	    peci_scan_cmd_mask(adapter) < 0) {
+		dev_dbg(&adapter->dev, "Failed to scan command mask\n");
+		return -EIO;
+	}
+
+	if (!(adapter->cmd_mask & BIT(cmd))) {
+		dev_dbg(&adapter->dev, "Command %d is not supported\n", cmd);
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+static int peci_ioctl_ping(struct peci_adapter *adapter, void *vmsg)
+{
+	struct peci_ping_msg *umsg = vmsg;
+	struct peci_xfer_msg msg;
+	int rc;
+
+	rc = peci_cmd_support(adapter, PECI_CMD_PING);
+	if (rc < 0)
+		return rc;
+
+	msg.addr   = umsg->addr;
+	msg.tx_len = 0;
+	msg.rx_len = 0;
+
+	rc = peci_xfer(adapter, &msg);
+	if (rc < 0)
+		return rc;
+
+	return 0;
+}
+
+static int peci_ioctl_get_dib(struct peci_adapter *adapter, void *vmsg)
+{
+	struct peci_get_dib_msg *umsg = vmsg;
+	struct peci_xfer_msg msg;
+	int rc;
+
+	rc = peci_cmd_support(adapter, PECI_CMD_GET_DIB);
+	if (rc < 0)
+		return rc;
+
+	msg.addr      = umsg->addr;
+	msg.tx_len    = GET_DIB_WR_LEN;
+	msg.rx_len    = GET_DIB_RD_LEN;
+	msg.tx_buf[0] = GET_DIB_PECI_CMD;
+
+	rc = peci_xfer(adapter, &msg);
+	if (rc < 0)
+		return rc;
+
+	umsg->dib = msg.rx_buf[0] | (msg.rx_buf[1] << 8) |
+		     (msg.rx_buf[2] << 16) | (msg.rx_buf[3] << 24);
+
+	return 0;
+}
+
+static int peci_ioctl_get_temp(struct peci_adapter *adapter, void *vmsg)
+{
+	struct peci_get_temp_msg *umsg = vmsg;
+	struct peci_xfer_msg msg;
+	int rc;
+
+	rc = peci_cmd_support(adapter, PECI_CMD_GET_TEMP);
+	if (rc < 0)
+		return rc;
+
+	msg.addr      = umsg->addr;
+	msg.tx_len    = GET_TEMP_WR_LEN;
+	msg.rx_len    = GET_TEMP_RD_LEN;
+	msg.tx_buf[0] = GET_TEMP_PECI_CMD;
+
+	rc = peci_xfer(adapter, &msg);
+	if (rc < 0)
+		return rc;
+
+	umsg->temp_raw = msg.rx_buf[0] | (msg.rx_buf[1] << 8);
+
+	return 0;
+}
+
+static int peci_ioctl_rd_pkg_cfg(struct peci_adapter *adapter, void *vmsg)
+{
+	struct peci_rd_pkg_cfg_msg *umsg = vmsg;
+	struct peci_xfer_msg msg;
+	int rc = 0;
+
+	/* Per the PECI spec, the read length must be a byte, word, or dword */
+	if (umsg->rx_len != 1 && umsg->rx_len != 2 && umsg->rx_len != 4) {
+		dev_dbg(&adapter->dev, "Invalid read length, rx_len: %d\n",
+			umsg->rx_len);
+		return -EINVAL;
+	}
+
+	rc = peci_cmd_support(adapter, PECI_CMD_RD_PKG_CFG);
+	if (rc < 0)
+		return rc;
+
+	msg.addr = umsg->addr;
+	msg.tx_len = RDPKGCFG_WRITE_LEN;
+	/* read lengths of 1 and 2 result in an error, so only use 4 for now */
+	msg.rx_len = RDPKGCFG_READ_LEN_BASE + umsg->rx_len;
+	msg.tx_buf[0] = RDPKGCFG_PECI_CMD;
+	msg.tx_buf[1] = 0x00;         /* request byte for Host ID / Retry bit */
+				      /* Host ID is 0 for PECI 3.0 */
+	msg.tx_buf[2] = umsg->index;            /* RdPkgConfig index */
+	msg.tx_buf[3] = (u8)umsg->param;        /* LSB - Config parameter */
+	msg.tx_buf[4] = (u8)(umsg->param >> 8); /* MSB - Config parameter */
+
+	rc = peci_xfer_with_retries(adapter, &msg, false);
+	if (rc || msg.rx_buf[0] != DEV_PECI_CC_SUCCESS) {
+		dev_dbg(&adapter->dev, "xfer error, rc : %d\n", rc);
+		return -EIO;
+	}
+
+	memcpy(umsg->pkg_config, &msg.rx_buf[1], umsg->rx_len);
+
+	return rc;
+}
+
+static int peci_ioctl_wr_pkg_cfg(struct peci_adapter *adapter, void *vmsg)
+{
+	struct peci_wr_pkg_cfg_msg *umsg = vmsg;
+	struct peci_xfer_msg msg;
+	int rc = 0, i;
+
+	/* Per the PECI spec, the write length must be a dword */
+	if (umsg->tx_len != 4) {
+		dev_dbg(&adapter->dev, "Invalid write length, tx_len: %d\n",
+			umsg->tx_len);
+		return -EINVAL;
+	}
+
+	rc = peci_cmd_support(adapter, PECI_CMD_WR_PKG_CFG);
+	if (rc < 0)
+		return rc;
+
+	msg.addr = umsg->addr;
+	msg.tx_len = WRPKGCFG_WRITE_LEN_BASE + umsg->tx_len;
+	/* read lengths of 1 and 2 result in an error, so only use 4 for now */
+	msg.rx_len = WRPKGCFG_READ_LEN;
+	msg.tx_buf[0] = WRPKGCFG_PECI_CMD;
+	msg.tx_buf[1] = 0x00;         /* request byte for Host ID / Retry bit */
+				      /* Host ID is 0 for PECI 3.0 */
+	msg.tx_buf[2] = umsg->index;            /* RdPkgConfig index */
+	msg.tx_buf[3] = (u8)umsg->param;        /* LSB - Config parameter */
+	msg.tx_buf[4] = (u8)(umsg->param >> 8); /* MSB - Config parameter */
+	for (i = 0; i < umsg->tx_len; i++)
+		msg.tx_buf[5 + i] = (u8)(umsg->value >> (i << 3));
+
+	/* Add an Assure Write Frame Check Sequence byte */
+	msg.tx_buf[5 + i] = 0x80 ^
+			    peci_aw_fcs((u8 *)&msg, 8 + umsg->tx_len);
+
+	rc = peci_xfer_with_retries(adapter, &msg, true);
+	if (rc || msg.rx_buf[0] != DEV_PECI_CC_SUCCESS) {
+		dev_dbg(&adapter->dev, "xfer error, rc : %d\n", rc);
+		return -EIO;
+	}
+
+	return rc;
+}
+
+static int peci_ioctl_rd_ia_msr(struct peci_adapter *adapter, void *vmsg)
+{
+	struct peci_rd_ia_msr_msg *umsg = vmsg;
+	struct peci_xfer_msg msg;
+	int rc = 0;
+
+	rc = peci_cmd_support(adapter, PECI_CMD_RD_IA_MSR);
+	if (rc < 0)
+		return rc;
+
+	msg.addr = umsg->addr;
+	msg.tx_len = RDIAMSR_WRITE_LEN;
+	msg.rx_len = RDIAMSR_READ_LEN;
+	msg.tx_buf[0] = RDIAMSR_PECI_CMD;
+	msg.tx_buf[1] = 0x00;
+	msg.tx_buf[2] = umsg->thread_id;
+	msg.tx_buf[3] = (u8)umsg->address;
+	msg.tx_buf[4] = (u8)(umsg->address >> 8);
+
+	rc = peci_xfer_with_retries(adapter, &msg, false);
+	if (rc || msg.rx_buf[0] != DEV_PECI_CC_SUCCESS) {
+		dev_dbg(&adapter->dev, "xfer error, rc : %d\n", rc);
+		return -EIO;
+	}
+
+	memcpy(&umsg->value, &msg.rx_buf[1], sizeof(uint64_t));
+
+	return rc;
+}
+
+static int peci_ioctl_rd_pci_cfg(struct peci_adapter *adapter, void *vmsg)
+{
+	struct peci_rd_pci_cfg_msg *umsg = vmsg;
+	struct peci_xfer_msg msg;
+	u32 address;
+	int rc = 0;
+
+	rc = peci_cmd_support(adapter, PECI_CMD_RD_PCI_CFG);
+	if (rc < 0)
+		return rc;
+
+	address = umsg->reg;                  /* [11:0]  - Register */
+	address |= (u32)umsg->function << 12; /* [14:12] - Function */
+	address |= (u32)umsg->device << 15;   /* [19:15] - Device   */
+	address |= (u32)umsg->bus << 20;      /* [27:20] - Bus      */
+					      /* [31:28] - Reserved */
+	msg.addr = umsg->addr;
+	msg.tx_len = RDPCICFG_WRITE_LEN;
+	msg.rx_len = RDPCICFG_READ_LEN;
+	msg.tx_buf[0] = RDPCICFG_PECI_CMD;
+	msg.tx_buf[1] = 0x00;         /* request byte for Host ID / Retry bit */
+				      /* Host ID is 0 for PECI 3.0 */
+	msg.tx_buf[2] = (u8)address;         /* LSB - PCI Config Address */
+	msg.tx_buf[3] = (u8)(address >> 8);  /* PCI Config Address */
+	msg.tx_buf[4] = (u8)(address >> 16); /* PCI Config Address */
+	msg.tx_buf[5] = (u8)(address >> 24); /* MSB - PCI Config Address */
+
+	rc = peci_xfer_with_retries(adapter, &msg, false);
+	if (rc || msg.rx_buf[0] != DEV_PECI_CC_SUCCESS) {
+		dev_dbg(&adapter->dev, "xfer error, rc : %d\n", rc);
+		return -EIO;
+	}
+
+	memcpy(umsg->pci_config, &msg.rx_buf[1], 4);
+
+	return rc;
+}
+
+static int peci_ioctl_rd_pci_cfg_local(struct peci_adapter *adapter, void *vmsg)
+{
+	struct peci_rd_pci_cfg_local_msg *umsg = vmsg;
+	struct peci_xfer_msg msg;
+	u32 address;
+	int rc = 0;
+
+	/* Per the PECI spec, the read length must be a byte, word, or dword */
+	if (umsg->rx_len != 1 && umsg->rx_len != 2 && umsg->rx_len != 4) {
+		dev_dbg(&adapter->dev, "Invalid read length, rx_len: %d\n",
+			umsg->rx_len);
+		return -EINVAL;
+	}
+
+	rc = peci_cmd_support(adapter, PECI_CMD_RD_PCI_CFG_LOCAL);
+	if (rc < 0)
+		return rc;
+
+	address = umsg->reg;                  /* [11:0]  - Register */
+	address |= (u32)umsg->function << 12; /* [14:12] - Function */
+	address |= (u32)umsg->device << 15;   /* [19:15] - Device   */
+	address |= (u32)umsg->bus << 20;      /* [23:20] - Bus      */
+
+	msg.addr = umsg->addr;
+	msg.tx_len = RDPCICFGLOCAL_WRITE_LEN;
+	msg.rx_len = RDPCICFGLOCAL_READ_LEN_BASE + umsg->rx_len;
+	msg.tx_buf[0] = RDPCICFGLOCAL_PECI_CMD;
+	msg.tx_buf[1] = 0x00;         /* request byte for Host ID / Retry bit */
+				      /* Host ID is 0 for PECI 3.0 */
+	msg.tx_buf[2] = (u8)address;       /* LSB - PCI Configuration Address */
+	msg.tx_buf[3] = (u8)(address >> 8);  /* PCI Configuration Address */
+	msg.tx_buf[4] = (u8)(address >> 16); /* PCI Configuration Address */
+
+	rc = peci_xfer_with_retries(adapter, &msg, false);
+	if (rc || msg.rx_buf[0] != DEV_PECI_CC_SUCCESS) {
+		dev_dbg(&adapter->dev, "xfer error, rc : %d\n", rc);
+		return -EIO;
+	}
+
+	memcpy(umsg->pci_config, &msg.rx_buf[1], umsg->rx_len);
+
+	return rc;
+}
+
+static int peci_ioctl_wr_pci_cfg_local(struct peci_adapter *adapter, void *vmsg)
+{
+	struct peci_wr_pci_cfg_local_msg *umsg = vmsg;
+	struct peci_xfer_msg msg;
+	u32 address;
+	int rc = 0, i;
+
+	/* Per the PECI spec, the write length must be a byte, word, or dword */
+	if (umsg->tx_len != 1 && umsg->tx_len != 2 && umsg->tx_len != 4) {
+		dev_dbg(&adapter->dev, "Invalid write length, tx_len: %d\n",
+			umsg->tx_len);
+		return -EINVAL;
+	}
+
+	rc = peci_cmd_support(adapter, PECI_CMD_RD_PCI_CFG_LOCAL);
+	if (rc < 0)
+		return rc;
+
+	address = umsg->reg;                  /* [11:0]  - Register */
+	address |= (u32)umsg->function << 12; /* [14:12] - Function */
+	address |= (u32)umsg->device << 15;   /* [19:15] - Device   */
+	address |= (u32)umsg->bus << 20;      /* [23:20] - Bus      */
+
+	msg.addr = umsg->addr;
+	msg.tx_len = WRPCICFGLOCAL_WRITE_LEN_BASE + umsg->tx_len;
+	msg.rx_len = WRPCICFGLOCAL_READ_LEN;
+	msg.tx_buf[0] = WRPCICFGLOCAL_PECI_CMD;
+	msg.tx_buf[1] = 0x00;         /* request byte for Host ID / Retry bit */
+				      /* Host ID is 0 for PECI 3.0 */
+	msg.tx_buf[2] = (u8)address;       /* LSB - PCI Configuration Address */
+	msg.tx_buf[3] = (u8)(address >> 8);  /* PCI Configuration Address */
+	msg.tx_buf[4] = (u8)(address >> 16); /* PCI Configuration Address */
+	for (i = 0; i < umsg->tx_len; i++)
+		msg.tx_buf[5 + i] = (u8)(umsg->value >> (i << 3));
+
+	/* Add an Assure Write Frame Check Sequence byte */
+	msg.tx_buf[5 + i] = 0x80 ^
+			    peci_aw_fcs((u8 *)&msg, 8 + umsg->tx_len);
+
+	rc = peci_xfer_with_retries(adapter, &msg, true);
+	if (rc || msg.rx_buf[0] != DEV_PECI_CC_SUCCESS) {
+		dev_dbg(&adapter->dev, "xfer error, rc : %d\n", rc);
+		return -EIO;
+	}
+
+	return rc;
+}
+
+typedef int (*peci_ioctl_fn_type)(struct peci_adapter *, void *);
+
+static peci_ioctl_fn_type peci_ioctl_fn[PECI_CMD_MAX] = {
+	NULL, /* Reserved */
+	peci_ioctl_ping,
+	peci_ioctl_get_dib,
+	peci_ioctl_get_temp,
+	peci_ioctl_rd_pkg_cfg,
+	peci_ioctl_wr_pkg_cfg,
+	peci_ioctl_rd_ia_msr,
+	NULL, /* Reserved */
+	peci_ioctl_rd_pci_cfg,
+	NULL, /* Reserved */
+	peci_ioctl_rd_pci_cfg_local,
+	peci_ioctl_wr_pci_cfg_local,
+};
+
+int peci_command(struct peci_adapter *adapter, enum peci_cmd cmd, void *vmsg)
+{
+	int ret = 0;
+
+	if (cmd >= PECI_CMD_MAX)
+		return -EINVAL;
+
+	dev_dbg(&adapter->dev, "%s, cmd=0x%02x\n", __func__, cmd);
+
+	if (!peci_ioctl_fn[cmd])
+		return -EINVAL;
+
+	ret = peci_ioctl_fn[cmd](adapter, vmsg);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(peci_command);
+
+static long peci_ioctl(struct file *file, unsigned int iocmd, unsigned long arg)
+{
+	struct peci_adapter *adapter = file->private_data;
+	void __user *argp = (void __user *)arg;
+	unsigned int msg_len;
+	enum peci_cmd cmd;
+	u8 *msg;
+	int rc = 0;
+
+	dev_dbg(&adapter->dev, "ioctl, cmd=0x%x, arg=0x%lx\n", iocmd, arg);
+
+	switch (iocmd) {
+	case PECI_IOC_PING:
+	case PECI_IOC_GET_DIB:
+	case PECI_IOC_GET_TEMP:
+	case PECI_IOC_RD_PKG_CFG:
+	case PECI_IOC_WR_PKG_CFG:
+	case PECI_IOC_RD_IA_MSR:
+	case PECI_IOC_RD_PCI_CFG:
+	case PECI_IOC_RD_PCI_CFG_LOCAL:
+	case PECI_IOC_WR_PCI_CFG_LOCAL:
+		cmd = _IOC_TYPE(iocmd) - PECI_IOC_BASE;
+		msg_len = _IOC_SIZE(iocmd);
+		break;
+
+	default:
+		dev_dbg(&adapter->dev, "Invalid ioctl cmd : 0x%x\n", iocmd);
+		return -EINVAL;
+	}
+
+	if (!msg_len)
+		return -EINVAL;
+
+	msg = memdup_user(argp, msg_len);
+	if (IS_ERR(msg))
+		return PTR_ERR(msg);
+
+	rc = peci_command(adapter, cmd, msg);
+
+	if (!rc && copy_to_user(argp, msg, msg_len))
+		rc = -EFAULT;
+
+	kfree(msg);
+	return (long)rc;
+}
+
+static int peci_open(struct inode *inode, struct file *file)
+{
+	unsigned int minor = iminor(inode);
+	struct peci_adapter *adapter;
+
+	adapter = peci_get_adapter(minor);
+	if (!adapter)
+		return -ENODEV;
+
+	file->private_data = adapter;
+
+	return 0;
+}
+
+static int peci_release(struct inode *inode, struct file *file)
+{
+	struct peci_adapter *adapter = file->private_data;
+
+	peci_put_adapter(adapter);
+	file->private_data = NULL;
+
+	return 0;
+}
+
+static const struct file_operations peci_fops = {
+	.owner          = THIS_MODULE,
+	.unlocked_ioctl = peci_ioctl,
+	.open           = peci_open,
+	.release        = peci_release,
+};
+
+static int peci_detect(struct peci_adapter *adapter, u8 addr)
+{
+	struct peci_xfer_msg msg;
+	int rc;
+
+	rc = peci_cmd_support(adapter, PECI_CMD_PING);
+	if (rc < 0)
+		return rc;
+
+	msg.addr   = addr;
+	msg.tx_len = 0;
+	msg.rx_len = 0;
+
+	rc = peci_xfer(adapter, &msg);
+	if (rc < 0)
+		return rc;
+
+	return 0;
+}
+
+#if IS_ENABLED(CONFIG_OF)
+static const struct of_device_id *
+peci_of_match_device(const struct of_device_id *matches,
+		     struct peci_client *client)
+{
+	if (!(client && matches))
+		return NULL;
+
+	return of_match_device(matches, &client->dev);
+}
+#endif
+
+const struct peci_device_id *peci_match_id(const struct peci_device_id *id,
+					   struct peci_client *client)
+{
+	if (!(id && client))
+		return NULL;
+
+	while (id->name[0]) {
+		if (strcmp(client->name, id->name) == 0)
+			return id;
+		id++;
+	}
+
+	return NULL;
+}
+
+static int peci_device_match(struct device *dev, struct device_driver *drv)
+{
+	struct peci_client *client = peci_verify_client(dev);
+	struct peci_driver *driver;
+
+	/* Attempt an OF style match */
+	if (peci_of_match_device(drv->of_match_table, client))
+		return 1;
+
+	driver = to_peci_driver(drv);
+
+	if (peci_match_id(driver->id_table, client))
+		return 1;
+
+	return 0;
+}
+
+static int peci_device_probe(struct device *dev)
+{
+	struct peci_client	*client = peci_verify_client(dev);
+	struct peci_driver	*driver;
+	int status = -EINVAL;
+
+	if (!client)
+		return 0;
+
+	if (!peci_of_match_device(dev->driver->of_match_table, client))
+		return -ENODEV;
+
+	dev_dbg(dev, "%s: name:%s\n", __func__, client->name);
+
+	driver = to_peci_driver(dev->driver);
+	if (driver->probe)
+		status = driver->probe(client);
+
+	return status;
+}
+
+static int peci_device_remove(struct device *dev)
+{
+	struct peci_client	*client = peci_verify_client(dev);
+	struct peci_driver	*driver;
+	int status = 0;
+
+	if (!client || !dev->driver)
+		return 0;
+
+	driver = to_peci_driver(dev->driver);
+	if (driver->remove) {
+		dev_dbg(dev, "%s: name:%s\n", __func__, client->name);
+		status = driver->remove(client);
+	}
+
+	return status;
+}
+
+static void peci_device_shutdown(struct device *dev)
+{
+	struct peci_client *client = peci_verify_client(dev);
+	struct peci_driver *driver;
+
+	if (!client || !dev->driver)
+		return;
+
+	dev_dbg(dev, "%s: name:%s\n", __func__, client->name);
+
+	driver = to_peci_driver(dev->driver);
+	if (driver->shutdown)
+		driver->shutdown(client);
+}
+
+static struct bus_type peci_bus_type = {
+	.name		= "peci",
+	.match		= peci_device_match,
+	.probe		= peci_device_probe,
+	.remove		= peci_device_remove,
+	.shutdown	= peci_device_shutdown,
+};
+
+static void peci_unregister_device(struct peci_client *client)
+{
+	if (client->dev.of_node)
+		of_node_clear_flag(client->dev.of_node, OF_POPULATED);
+
+	device_unregister(&client->dev);
+}
+
+static int peci_check_addr_validity(u8 addr)
+{
+	if (addr < PECI_BASE_ADDR && addr > PECI_BASE_ADDR + PECI_OFFSET_MAX)
+		return -EINVAL;
+
+	return 0;
+}
+
+static int peci_check_addr_busy(struct device *dev, void *addrp)
+{
+	struct peci_client *client = peci_verify_client(dev);
+	u8 addr = *(u8 *)addrp;
+
+	if (client && client->addr == addr)
+		return -EBUSY;
+
+	return 0;
+}
+
+static struct peci_client *peci_new_device(struct peci_adapter *adapter,
+					   struct peci_board_info const *info)
+{
+	struct peci_client *client;
+	int rc;
+
+	client = kzalloc(sizeof(*client), GFP_KERNEL);
+	if (!client)
+		return NULL;
+
+	client->adapter = adapter;
+	client->addr = info->addr;
+	strlcpy(client->name, info->type, sizeof(client->name));
+
+	rc = peci_check_addr_validity(client->addr);
+	if (rc) {
+		dev_err(&adapter->dev, "Invalid PECI CPU address 0x%02hx\n",
+			client->addr);
+		goto err_free_client_silent;
+	}
+
+	/* Check for address business */
+	rc = device_for_each_child(&adapter->dev, &client->addr,
+				   peci_check_addr_busy);
+	if (rc)
+		goto err_free_client;
+
+	/* Check client's online status */
+	rc = peci_detect(adapter, client->addr);
+	if (rc)
+		goto err_free_client;
+
+	client->dev.parent = &client->adapter->dev;
+	client->dev.bus = &peci_bus_type;
+	client->dev.type = &peci_client_type;
+	client->dev.of_node = info->of_node;
+	dev_set_name(&client->dev, "%d-%02x", adapter->nr, client->addr);
+	rc = device_register(&client->dev);
+	if (rc)
+		goto err_free_client;
+
+	dev_dbg(&adapter->dev, "client [%s] registered with bus id %s\n",
+		client->name, dev_name(&client->dev));
+
+	return client;
+
+err_free_client:
+	dev_err(&adapter->dev,
+		"Failed to register peci client %s at 0x%02x (%d)\n",
+		client->name, client->addr, rc);
+err_free_client_silent:
+	kfree(client);
+	return NULL;
+}
+
+#if IS_ENABLED(CONFIG_OF)
+static struct peci_client *peci_of_register_device(struct peci_adapter *adapter,
+						   struct device_node *node)
+{
+	struct peci_client *result;
+	struct peci_board_info info = {};
+	const __be32 *addr_be;
+	u32 addr;
+	int len;
+
+	dev_dbg(&adapter->dev, "register %s\n", node->full_name);
+
+	if (of_modalias_node(node, info.type, sizeof(info.type)) < 0) {
+		dev_err(&adapter->dev, "modalias failure on %s\n",
+			node->full_name);
+		return ERR_PTR(-EINVAL);
+	}
+
+	addr_be = of_get_property(node, "reg", &len);
+	if (!addr_be || len < sizeof(*addr_be)) {
+		dev_err(&adapter->dev, "invalid reg on %s\n",
+			node->full_name);
+		return ERR_PTR(-EINVAL);
+	}
+
+	addr = be32_to_cpup(addr_be);
+
+	if (peci_check_addr_validity(addr)) {
+		dev_err(&adapter->dev, "invalid addr=%x on %s\n",
+			addr, node->full_name);
+		return ERR_PTR(-EINVAL);
+	}
+
+	info.addr = addr;
+	info.of_node = of_node_get(node);
+
+	result = peci_new_device(adapter, &info);
+	if (!result)
+		result = ERR_PTR(-EINVAL);
+
+	of_node_put(node);
+	return result;
+}
+
+static void peci_of_register_devices(struct peci_adapter *adapter)
+{
+	struct device_node *bus, *node;
+	struct peci_client *client;
+
+	/* Only register child devices if the adapter has a node pointer set */
+	if (!adapter->dev.of_node)
+		return;
+
+	bus = of_get_child_by_name(adapter->dev.of_node, "peci-bus");
+	if (!bus)
+		bus = of_node_get(adapter->dev.of_node);
+
+	for_each_available_child_of_node(bus, node) {
+		if (of_node_test_and_set_flag(node, OF_POPULATED))
+			continue;
+
+		client = peci_of_register_device(adapter, node);
+		if (IS_ERR(client)) {
+			dev_warn(&adapter->dev,
+				 "Failed to create PECI device for %s\n",
+				 node->full_name);
+			of_node_clear_flag(node, OF_POPULATED);
+		}
+	}
+
+	of_node_put(bus);
+}
+
+static int peci_of_match_node(struct device *dev, void *data)
+{
+	return dev->of_node == data;
+}
+
+/* must call put_device() when done with returned peci_client device */
+static struct peci_client *peci_of_find_device(struct device_node *node)
+{
+	struct device *dev;
+	struct peci_client *client;
+
+	dev = bus_find_device(&peci_bus_type, NULL, node, peci_of_match_node);
+	if (!dev)
+		return NULL;
+
+	client = peci_verify_client(dev);
+	if (!client)
+		put_device(dev);
+
+	return client;
+}
+
+/* must call put_device() when done with returned peci_adapter device */
+static struct peci_adapter *peci_of_find_adapter(struct device_node *node)
+{
+	struct device *dev;
+	struct peci_adapter *adapter;
+
+	dev = bus_find_device(&peci_bus_type, NULL, node, peci_of_match_node);
+	if (!dev)
+		return NULL;
+
+	adapter = peci_verify_adapter(dev);
+	if (!adapter)
+		put_device(dev);
+
+	return adapter;
+}
+#else
+static void peci_of_register_devices(struct peci_adapter *adapter) { }
+#endif /* CONFIG_OF */
+
+#if IS_ENABLED(CONFIG_OF_DYNAMIC)
+static int peci_of_notify(struct notifier_block *nb,
+			  unsigned long action,
+			  void *arg)
+{
+	struct of_reconfig_data *rd = arg;
+	struct peci_adapter *adapter;
+	struct peci_client *client;
+
+	switch (of_reconfig_get_state_change(action, rd)) {
+	case OF_RECONFIG_CHANGE_ADD:
+		adapter = peci_of_find_adapter(rd->dn->parent);
+		if (!adapter)
+			return NOTIFY_OK;	/* not for us */
+
+		if (of_node_test_and_set_flag(rd->dn, OF_POPULATED)) {
+			put_device(&adapter->dev);
+			return NOTIFY_OK;
+		}
+
+		client = peci_of_register_device(adapter, rd->dn);
+		put_device(&adapter->dev);
+
+		if (IS_ERR(client)) {
+			dev_err(&adapter->dev,
+				"failed to create client for '%s'\n",
+				rd->dn->full_name);
+			of_node_clear_flag(rd->dn, OF_POPULATED);
+			return notifier_from_errno(PTR_ERR(client));
+		}
+		break;
+	case OF_RECONFIG_CHANGE_REMOVE:
+		/* already depopulated? */
+		if (!of_node_check_flag(rd->dn, OF_POPULATED))
+			return NOTIFY_OK;
+
+		/* find our device by node */
+		client = peci_of_find_device(rd->dn);
+		if (!client)
+			return NOTIFY_OK;	/* no? not meant for us */
+
+		/* unregister takes one ref away */
+		peci_unregister_device(client);
+
+		/* and put the reference of the find */
+		put_device(&client->dev);
+		break;
+	}
+
+	return NOTIFY_OK;
+}
+
+static struct notifier_block peci_of_notifier = {
+	.notifier_call = peci_of_notify,
+};
+#else
+extern struct notifier_block peci_of_notifier;
+#endif /* CONFIG_OF_DYNAMIC */
+
+static int peci_register_adapter(struct peci_adapter *adapter)
+{
+	int res = -EINVAL;
+
+	/* Can't register until after driver model init */
+	if (WARN_ON(!is_registered)) {
+		res = -EAGAIN;
+		goto err_free_idr;
+	}
+
+	if (WARN(!adapter->name[0], "peci adapter has no name"))
+		goto err_free_idr;
+
+	rt_mutex_init(&adapter->bus_lock);
+
+	dev_set_name(&adapter->dev, "peci%d", adapter->nr);
+	adapter->dev.bus = &peci_bus_type;
+	adapter->dev.type = &peci_adapter_type;
+	device_initialize(&adapter->dev);
+
+	/* cdev */
+	cdev_init(&adapter->cdev, &peci_fops);
+	adapter->cdev.owner = THIS_MODULE;
+	adapter->cdev.kobj.parent = &adapter->dev.kobj;
+	adapter->dev.devt = MKDEV(MAJOR(peci_devt), adapter->nr);
+	res = cdev_add(&adapter->cdev, adapter->dev.devt, 1);
+	if (res) {
+		pr_err("adapter '%s': can't add cdev (%d)\n",
+		       adapter->name, res);
+		goto err_free_idr;
+	}
+	res = device_add(&adapter->dev);
+	if (res) {
+		pr_err("adapter '%s': can't add device (%d)\n",
+		       adapter->name, res);
+		goto err_del_cdev;
+	}
+
+	dev_dbg(&adapter->dev, "adapter [%s] registered\n", adapter->name);
+
+	/* create pre-declared device nodes */
+	peci_of_register_devices(adapter);
+
+	return 0;
+
+err_del_cdev:
+	cdev_del(&adapter->cdev);
+err_free_idr:
+	mutex_lock(&core_lock);
+	idr_remove(&peci_adapter_idr, adapter->nr);
+	mutex_unlock(&core_lock);
+	return res;
+}
+
+static int peci_add_numbered_adapter(struct peci_adapter *adapter)
+{
+	int id;
+
+	mutex_lock(&core_lock);
+	id = idr_alloc(&peci_adapter_idr, adapter,
+		       adapter->nr, adapter->nr + 1, GFP_KERNEL);
+	mutex_unlock(&core_lock);
+	if (WARN(id < 0, "couldn't get idr"))
+		return id == -ENOSPC ? -EBUSY : id;
+
+	return peci_register_adapter(adapter);
+}
+
+int peci_add_adapter(struct peci_adapter *adapter)
+{
+	struct device *dev = &adapter->dev;
+	int id;
+
+	if (dev->of_node) {
+		id = of_alias_get_id(dev->of_node, "peci");
+		if (id >= 0) {
+			adapter->nr = id;
+			return peci_add_numbered_adapter(adapter);
+		}
+	}
+
+	mutex_lock(&core_lock);
+	id = idr_alloc(&peci_adapter_idr, adapter, 0, 0, GFP_KERNEL);
+	mutex_unlock(&core_lock);
+	if (WARN(id < 0, "couldn't get idr"))
+		return id;
+
+	adapter->nr = id;
+
+	return peci_register_adapter(adapter);
+}
+EXPORT_SYMBOL_GPL(peci_add_adapter);
+
+static int peci_unregister_client(struct device *dev, void *dummy)
+{
+	struct peci_client *client = peci_verify_client(dev);
+
+	if (client)
+		peci_unregister_device(client);
+
+	return 0;
+}
+
+void peci_del_adapter(struct peci_adapter *adapter)
+{
+	struct peci_adapter *found;
+
+	/* First make sure that this adapter was ever added */
+	mutex_lock(&core_lock);
+	found = idr_find(&peci_adapter_idr, adapter->nr);
+	mutex_unlock(&core_lock);
+
+	if (found != adapter)
+		return;
+
+	/**
+	 * Detach any active clients. This can't fail, thus we do not
+	 * check the returned value.
+	 */
+	device_for_each_child(&adapter->dev, NULL, peci_unregister_client);
+
+	/* device name is gone after device_unregister */
+	dev_dbg(&adapter->dev, "adapter [%s] unregistered\n", adapter->name);
+
+	device_unregister(&adapter->dev);
+
+	/* free cdev */
+	cdev_del(&adapter->cdev);
+
+	/* free bus id */
+	mutex_lock(&core_lock);
+	idr_remove(&peci_adapter_idr, adapter->nr);
+	mutex_unlock(&core_lock);
+}
+EXPORT_SYMBOL_GPL(peci_del_adapter);
+
+/**
+ * A peci_driver is used with one or more peci_client (device) nodes to access
+ * peci clients, on a bus instance associated with some peci_adapter.
+ */
+int peci_register_driver(struct module *owner, struct peci_driver *driver)
+{
+	int res;
+
+	/* Can't register until after driver model init */
+	if (WARN_ON(!is_registered))
+		return -EAGAIN;
+
+	/* add the driver to the list of peci drivers in the driver core */
+	driver->driver.owner = owner;
+	driver->driver.bus = &peci_bus_type;
+
+	/**
+	 * When registration returns, the driver core
+	 * will have called probe() for all matching-but-unbound devices.
+	 */
+	res = driver_register(&driver->driver);
+	if (res)
+		return res;
+
+	pr_debug("driver [%s] registered\n", driver->driver.name);
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(peci_register_driver);
+
+void peci_del_driver(struct peci_driver *driver)
+{
+	driver_unregister(&driver->driver);
+	pr_debug("driver [%s] unregistered\n", driver->driver.name);
+}
+EXPORT_SYMBOL_GPL(peci_del_driver);
+
+static int __init peci_init(void)
+{
+	int ret;
+
+	ret = bus_register(&peci_bus_type);
+	if (ret < 0) {
+		pr_err("peci: Failed to register PECI bus type!\n");
+		return ret;
+	}
+
+	ret = alloc_chrdev_region(&peci_devt, 0, PECI_CDEV_MAX, "peci");
+	if (ret < 0) {
+		pr_err("peci: Failed to allocate chr dev region!\n");
+		bus_unregister(&peci_bus_type);
+		return ret;
+	}
+
+	crc8_populate_msb(peci_crc8_table, PECI_CRC8_POLYNOMIAL);
+
+	if (IS_ENABLED(CONFIG_OF_DYNAMIC))
+		WARN_ON(of_reconfig_notifier_register(&peci_of_notifier));
+
+	is_registered = true;
+
+	return 0;
+}
+
+static void __exit peci_exit(void)
+{
+	if (IS_ENABLED(CONFIG_OF_DYNAMIC))
+		WARN_ON(of_reconfig_notifier_unregister(&peci_of_notifier));
+
+	unregister_chrdev_region(peci_devt, PECI_CDEV_MAX);
+	bus_unregister(&peci_bus_type);
+}
+
+postcore_initcall(peci_init);
+module_exit(peci_exit);
+
+MODULE_AUTHOR("Jae Hyun Yoo <jae.hyun.yoo@linux.intel.com>");
+MODULE_DESCRIPTION("PECI bus core module");
+MODULE_LICENSE("GPL v2");
diff --git a/include/linux/peci.h b/include/linux/peci.h
new file mode 100644
index 000000000000..e0cace2701a9
--- /dev/null
+++ b/include/linux/peci.h
@@ -0,0 +1,97 @@
+// SPDX-License-Identifier: GPL-2.0
+// Copyright (c) 2018 Intel Corporation
+
+#ifndef __LINUX_PECI_H
+#define __LINUX_PECI_H
+
+#include <linux/cdev.h>
+#include <linux/device.h>
+#include <linux/peci-ioctl.h>
+#include <linux/rtmutex.h>
+
+#define PECI_BUFFER_SIZE  32
+#define PECI_NAME_SIZE    32
+
+struct peci_xfer_msg {
+	u8	addr;
+	u8	tx_len;
+	u8	rx_len;
+	u8	tx_buf[PECI_BUFFER_SIZE];
+	u8	rx_buf[PECI_BUFFER_SIZE];
+} __attribute__((__packed__));
+
+struct peci_board_info {
+	char			type[PECI_NAME_SIZE];
+	u8			addr;	/* CPU client address */
+	struct device_node	*of_node;
+};
+
+struct peci_adapter {
+	struct module	*owner;
+	struct rt_mutex	bus_lock;
+	struct device	dev;
+	struct cdev	cdev;
+	int		nr;
+	char		name[PECI_NAME_SIZE];
+	int		(*xfer)(struct peci_adapter *adapter,
+				struct peci_xfer_msg *msg);
+	uint		cmd_mask;
+};
+
+#define to_peci_adapter(d) container_of(d, struct peci_adapter, dev)
+
+static inline void *peci_get_adapdata(const struct peci_adapter *adapter)
+{
+	return dev_get_drvdata(&adapter->dev);
+}
+
+static inline void peci_set_adapdata(struct peci_adapter *adapter, void *data)
+{
+	dev_set_drvdata(&adapter->dev, data);
+}
+
+struct peci_client {
+	struct device		dev;		/* the device structure */
+	struct peci_adapter	*adapter;	/* the adapter we sit on */
+	u8			addr;		/* CPU client address */
+	char			name[PECI_NAME_SIZE];
+};
+
+#define to_peci_client(d) container_of(d, struct peci_client, dev)
+
+struct peci_device_id {
+	char		name[PECI_NAME_SIZE];
+	kernel_ulong_t	driver_data;	/* Data private to the driver */
+};
+
+struct peci_driver {
+	int				(*probe)(struct peci_client *client);
+	int				(*remove)(struct peci_client *client);
+	void				(*shutdown)(struct peci_client *client);
+	struct device_driver		driver;
+	const struct peci_device_id	*id_table;
+};
+
+#define to_peci_driver(d) container_of(d, struct peci_driver, driver)
+
+/**
+ * module_peci_driver() - Helper macro for registering a modular PECI driver
+ * @__peci_driver: peci_driver struct
+ *
+ * Helper macro for PECI drivers which do not do anything special in module
+ * init/exit. This eliminates a lot of boilerplate. Each module may only
+ * use this macro once, and calling it replaces module_init() and module_exit()
+ */
+#define module_peci_driver(__peci_driver) \
+	module_driver(__peci_driver, peci_add_driver, peci_del_driver)
+
+/* use a define to avoid include chaining to get THIS_MODULE */
+#define peci_add_driver(driver) peci_register_driver(THIS_MODULE, driver)
+
+int  peci_register_driver(struct module *owner, struct peci_driver *drv);
+void peci_del_driver(struct peci_driver *driver);
+int  peci_add_adapter(struct peci_adapter *adapter);
+void peci_del_adapter(struct peci_adapter *adapter);
+int  peci_command(struct peci_adapter *adpater, enum peci_cmd cmd, void *vmsg);
+
+#endif /* __LINUX_PECI_H */
diff --git a/include/uapi/linux/peci-ioctl.h b/include/uapi/linux/peci-ioctl.h
new file mode 100644
index 000000000000..6132180f39ba
--- /dev/null
+++ b/include/uapi/linux/peci-ioctl.h
@@ -0,0 +1,207 @@
+// SPDX-License-Identifier: GPL-2.0
+// Copyright (c) 2018 Intel Corporation
+
+#ifndef __PECI_IOCTL_H
+#define __PECI_IOCTL_H
+
+#include <linux/ioctl.h>
+#include <linux/types.h>
+
+/* Base Address of 48d */
+#define PECI_BASE_ADDR  0x30  /* The PECI client's default address of 0x30 */
+#define PECI_OFFSET_MAX 8     /* Max numver of CPU clients */
+
+/* PCI Access */
+#define MAX_PCI_READ_LEN 24   /* Number of bytes of the PCI Space read */
+
+#define PCI_BUS0_CPU0      0x00
+#define PCI_BUS0_CPU1      0x80
+#define PCI_CPUBUSNO_BUS   0x00
+#define PCI_CPUBUSNO_DEV   0x08
+#define PCI_CPUBUSNO_FUNC  0x02
+#define PCI_CPUBUSNO       0xcc
+#define PCI_CPUBUSNO_1     0xd0
+#define PCI_CPUBUSNO_VALID 0xd4
+
+/* Package Identifier Read Parameter Value */
+#define PKG_ID_CPU_ID               0x0000  /* CPUID Info */
+#define PKG_ID_PLATFORM_ID          0x0001  /* Platform ID */
+#define PKG_ID_UNCORE_ID            0x0002  /* Uncore Device ID */
+#define PKG_ID_MAX_THREAD_ID        0x0003  /* Max Thread ID */
+#define PKG_ID_MICROCODE_REV        0x0004  /* CPU Microcode Update Revision */
+#define PKG_ID_MACHINE_CHECK_STATUS 0x0005  /* Machine Check Status */
+
+/* RdPkgConfig Index */
+#define MBX_INDEX_CPU_ID            0   /* Package Identifier Read */
+#define MBX_INDEX_VR_DEBUG          1   /* VR Debug */
+#define MBX_INDEX_PKG_TEMP_READ     2   /* Package Temperature Read */
+#define MBX_INDEX_ENERGY_COUNTER    3   /* Energy counter */
+#define MBX_INDEX_ENERGY_STATUS     4   /* DDR Energy Status */
+#define MBX_INDEX_WAKE_MODE_BIT     5   /* "Wake on PECI" Mode bit */
+#define MBX_INDEX_EPI               6   /* Efficient Performance Indication */
+#define MBX_INDEX_PKG_RAPL_PERF     8   /* Pkg RAPL Performance Status Read */
+#define MBX_INDEX_PER_CORE_DTS_TEMP 9   /* Per Core DTS Temperature Read */
+#define MBX_INDEX_DTS_MARGIN        10  /* DTS thermal margin */
+#define MBX_INDEX_SKT_PWR_THRTL_DUR 11  /* Socket Power Throttled Duration */
+#define MBX_INDEX_CFG_TDP_CONTROL   12  /* TDP Config Control */
+#define MBX_INDEX_CFG_TDP_LEVELS    13  /* TDP Config Levels */
+#define MBX_INDEX_DDR_DIMM_TEMP     14  /* DDR DIMM Temperature */
+#define MBX_INDEX_CFG_ICCMAX        15  /* Configurable ICCMAX */
+#define MBX_INDEX_TEMP_TARGET       16  /* Temperature Target Read */
+#define MBX_INDEX_CURR_CFG_LIMIT    17  /* Current Config Limit */
+#define MBX_INDEX_DIMM_TEMP_READ    20  /* Package Thermal Status Read */
+#define MBX_INDEX_DRAM_IMC_TMP_READ 22  /* DRAM IMC Temperature Read */
+#define MBX_INDEX_DDR_CH_THERM_STAT 23  /* DDR Channel Thermal Status */
+#define MBX_INDEX_PKG_POWER_LIMIT1  26  /* Package Power Limit1 */
+#define MBX_INDEX_PKG_POWER_LIMIT2  27  /* Package Power Limit2 */
+#define MBX_INDEX_TDP               28  /* Thermal design power minimum */
+#define MBX_INDEX_TDP_HIGH          29  /* Thermal design power maximum */
+#define MBX_INDEX_TDP_UNITS         30  /* Units for power/energy registers */
+#define MBX_INDEX_RUN_TIME          31  /* Accumulated Run Time */
+#define MBX_INDEX_CONSTRAINED_TIME  32  /* Thermally Constrained Time Read */
+#define MBX_INDEX_TURBO_RATIO       33  /* Turbo Activation Ratio */
+#define MBX_INDEX_DDR_RAPL_PL1      34  /* DDR RAPL PL1 */
+#define MBX_INDEX_DDR_PWR_INFO_HIGH 35  /* DRAM Power Info Read (high) */
+#define MBX_INDEX_DDR_PWR_INFO_LOW  36  /* DRAM Power Info Read (low) */
+#define MBX_INDEX_DDR_RAPL_PL2      37  /* DDR RAPL PL2 */
+#define MBX_INDEX_DDR_RAPL_STATUS   38  /* DDR RAPL Performance Status */
+#define MBX_INDEX_DDR_HOT_ABSOLUTE  43  /* DDR Hottest Dimm Absolute Temp */
+#define MBX_INDEX_DDR_HOT_RELATIVE  44  /* DDR Hottest Dimm Relative Temp */
+#define MBX_INDEX_DDR_THROTTLE_TIME 45  /* DDR Throttle Time */
+#define MBX_INDEX_DDR_THERM_STATUS  46  /* DDR Thermal Status */
+#define MBX_INDEX_TIME_AVG_TEMP     47  /* Package time-averaged temperature */
+#define MBX_INDEX_TURBO_RATIO_LIMIT 49  /* Turbo Ratio Limit Read */
+#define MBX_INDEX_HWP_AUTO_OOB      53  /* HWP Autonomous Out-of-band */
+#define MBX_INDEX_DDR_WARM_BUDGET   55  /* DDR Warm Power Budget */
+#define MBX_INDEX_DDR_HOT_BUDGET    56  /* DDR Hot Power Budget */
+#define MBX_INDEX_PKG_PSYS_PWR_LIM3 57  /* Package/Psys Power Limit3 */
+#define MBX_INDEX_PKG_PSYS_PWR_LIM1 58  /* Package/Psys Power Limit1 */
+#define MBX_INDEX_PKG_PSYS_PWR_LIM2 59  /* Package/Psys Power Limit2 */
+#define MBX_INDEX_PKG_PSYS_PWR_LIM4 60  /* Package/Psys Power Limit4 */
+#define MBX_INDEX_PERF_LIMIT_REASON 65  /* Performance Limit Reasons */
+
+/* WrPkgConfig Index */
+#define MBX_INDEX_DIMM_AMBIENT 19
+#define MBX_INDEX_DIMM_TEMP    24
+
+enum peci_cmd {
+	PECI_CMD_XFER = 0,
+	PECI_CMD_PING,
+	PECI_CMD_GET_DIB,
+	PECI_CMD_GET_TEMP,
+	PECI_CMD_RD_PKG_CFG,
+	PECI_CMD_WR_PKG_CFG,
+	PECI_CMD_RD_IA_MSR,
+	PECI_CMD_WR_IA_MSR,
+	PECI_CMD_RD_PCI_CFG,
+	PECI_CMD_WR_PCI_CFG,
+	PECI_CMD_RD_PCI_CFG_LOCAL,
+	PECI_CMD_WR_PCI_CFG_LOCAL,
+	PECI_CMD_MAX
+};
+
+struct peci_ping_msg {
+	__u8 addr;
+} __attribute__((__packed__));
+
+struct peci_get_dib_msg {
+	__u8  addr;
+	__u32 dib;
+} __attribute__((__packed__));
+
+struct peci_get_temp_msg {
+	__u8  addr;
+	__s16 temp_raw;
+} __attribute__((__packed__));
+
+struct peci_rd_pkg_cfg_msg {
+	__u8  addr;
+	__u8  index;
+	__u16 param;
+	__u8  rx_len;
+	__u8  pkg_config[4];
+} __attribute__((__packed__));
+
+struct peci_wr_pkg_cfg_msg {
+	__u8  addr;
+	__u8  index;
+	__u16 param;
+	__u8  tx_len;
+	__u32 value;
+} __attribute__((__packed__));
+
+struct peci_rd_ia_msr_msg {
+	__u8  addr;
+	__u8  thread_id;
+	__u16 address;
+	__u64 value;
+} __attribute__((__packed__));
+
+struct peci_rd_pci_cfg_msg {
+	__u8  addr;
+	__u8  bus;
+	__u8  device;
+	__u8  function;
+	__u16 reg;
+	__u8  pci_config[4];
+} __attribute__((__packed__));
+
+struct peci_rd_pci_cfg_local_msg {
+	__u8  addr;
+	__u8  bus;
+	__u8  device;
+	__u8  function;
+	__u16 reg;
+	__u8  rx_len;
+	__u8  pci_config[4];
+} __attribute__((__packed__));
+
+struct peci_wr_pci_cfg_local_msg {
+	__u8  addr;
+	__u8  bus;
+	__u8  device;
+	__u8  function;
+	__u16 reg;
+	__u8  tx_len;
+	__u32 value;
+} __attribute__((__packed__));
+
+#define PECI_IOC_BASE  'P'
+
+#define PECI_IOC_PING \
+	_IOWR(PECI_IOC_BASE + PECI_CMD_PING, 0, \
+		struct peci_ping_msg)
+
+#define PECI_IOC_GET_DIB \
+	_IOWR(PECI_IOC_BASE + PECI_CMD_GET_DIB, 0, \
+		struct peci_get_dib_msg)
+
+#define PECI_IOC_GET_TEMP \
+	_IOWR(PECI_IOC_BASE + PECI_CMD_GET_TEMP, 0, \
+		struct peci_get_temp_msg)
+
+#define PECI_IOC_RD_PKG_CFG \
+	_IOWR(PECI_IOC_BASE + PECI_CMD_RD_PKG_CFG, 0, \
+		struct peci_rd_pkg_cfg_msg)
+
+#define PECI_IOC_WR_PKG_CFG \
+	_IOWR(PECI_IOC_BASE + PECI_CMD_WR_PKG_CFG, 0, \
+		struct peci_wr_pkg_cfg_msg)
+
+#define PECI_IOC_RD_IA_MSR \
+	_IOWR(PECI_IOC_BASE + PECI_CMD_RD_IA_MSR, 0, \
+		struct peci_rd_ia_msr_msg)
+
+#define PECI_IOC_RD_PCI_CFG \
+	_IOWR(PECI_IOC_BASE + PECI_CMD_RD_PCI_CFG, 0, \
+		struct peci_rd_pci_cfg_msg)
+
+#define PECI_IOC_RD_PCI_CFG_LOCAL \
+	_IOWR(PECI_IOC_BASE + PECI_CMD_RD_PCI_CFG_LOCAL, 0, \
+		struct peci_rd_pci_cfg_local_msg)
+
+#define PECI_IOC_WR_PCI_CFG_LOCAL \
+	_IOWR(PECI_IOC_BASE + PECI_CMD_WR_PCI_CFG_LOCAL, 0, \
+		struct peci_wr_pci_cfg_local_msg)
+
+#endif /* __PECI_IOCTL_H */
-- 
2.16.1

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH v2 2/8] [PATCH 2/8] Documentations: dt-bindings: Add a document of PECI adapter driver for Aspeed AST24xx/25xx SoCs
  2018-02-21 16:15 [PATCH v2 0/8] PECI device driver introduction Jae Hyun Yoo
  2018-02-21 16:15 ` [PATCH v2 1/8] [PATCH 1/8] drivers/peci: Add support for PECI bus driver core Jae Hyun Yoo
@ 2018-02-21 16:16 ` Jae Hyun Yoo
  2018-02-21 17:13   ` Andrew Lunn
  2018-03-06 12:40   ` Pavel Machek
  2018-02-21 16:16 ` [PATCH v2 3/8] [PATCH 3/8] ARM: dts: aspeed: peci: Add PECI node Jae Hyun Yoo
                   ` (6 subsequent siblings)
  8 siblings, 2 replies; 46+ messages in thread
From: Jae Hyun Yoo @ 2018-02-21 16:16 UTC (permalink / raw)
  To: joel, andrew, arnd, gregkh, jdelvare, linux, benh, andrew
  Cc: linux-kernel, linux-doc, devicetree, linux-hwmon,
	linux-arm-kernel, openbmc, Jae Hyun Yoo

This commit adds a dt-bindings document of PECI adapter driver for Aspeed
AST24xx/25xx SoCs.

Signed-off-by: Jae Hyun Yoo <jae.hyun.yoo@linux.intel.com>
---
 .../devicetree/bindings/peci/peci-aspeed.txt       | 73 ++++++++++++++++++++++
 1 file changed, 73 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/peci/peci-aspeed.txt

diff --git a/Documentation/devicetree/bindings/peci/peci-aspeed.txt b/Documentation/devicetree/bindings/peci/peci-aspeed.txt
new file mode 100644
index 000000000000..8a86f346d550
--- /dev/null
+++ b/Documentation/devicetree/bindings/peci/peci-aspeed.txt
@@ -0,0 +1,73 @@
+Device tree configuration for PECI buses on the AST24XX and AST25XX SoCs.
+
+Required properties:
+- compatible
+	"aspeed,ast2400-peci" or "aspeed,ast2500-peci"
+	- aspeed,ast2400-peci: Aspeed AST2400 family PECI controller
+	- aspeed,ast2500-peci: Aspeed AST2500 family PECI controller
+
+- reg
+	Should contain PECI registers location and length.
+
+- #address-cells
+	Should be <1>.
+
+- #size-cells
+	Should be <0>.
+
+- interrupts
+	Should contain PECI interrupt.
+
+- clocks
+	Should contain clock source for PECI hardware module. Should reference
+	clkin clock.
+
+- clock_frequency
+	Should contain the operation frequency of PECI hardware module.
+	187500 ~ 24000000
+
+Optional properties:
+- msg-timing-nego
+	Message timing negotiation period. This value will determine the period
+	of message timing negotiation to be issued by PECI controller. The unit
+	of the programmed value is four times of PECI clock period.
+	0 ~ 255 (default: 1)
+
+- addr-timing-nego
+	Address timing negotiation period. This value will determine the period
+	of address timing negotiation to be issued by PECI controller. The unit
+	of the programmed value is four times of PECI clock period.
+	0 ~ 255 (default: 1)
+
+- rd-sampling-point
+	Read sampling point selection. The whole period of a bit time will be
+	divided into 16 time frames. This value will determine which time frame
+	this controller will sample PECI signal for data read back. Usually in
+	the middle of a bit time is the best.
+	0 ~ 15 (default: 8)
+
+- cmd_timeout_ms
+	Command timeout in units of ms.
+	1 ~ 60000 (default: 1000)
+
+Example:
+	peci: peci@1e78b000 {
+		compatible = "simple-bus";
+		#address-cells = <1>;
+		#size-cells = <1>;
+		ranges = <0x0 0x1e78b000 0x60>;
+
+		peci0: peci-bus@0 {
+			compatible = "aspeed,ast2500-peci";
+			reg = <0x0 0x60>;
+			#address-cells = <1>;
+			#size-cells = <0>;
+			interrupts = <15>;
+			clocks = <&clk_clkin>;
+			clock-frequency = <24000000>;
+			msg-timing-nego = <1>;
+			addr-timing-nego = <1>;
+			rd-sampling-point = <8>;
+			cmd-timeout-ms = <1000>;
+		};
+	};
\ No newline at end of file
-- 
2.16.1

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH v2 3/8] [PATCH 3/8] ARM: dts: aspeed: peci: Add PECI node
  2018-02-21 16:15 [PATCH v2 0/8] PECI device driver introduction Jae Hyun Yoo
  2018-02-21 16:15 ` [PATCH v2 1/8] [PATCH 1/8] drivers/peci: Add support for PECI bus driver core Jae Hyun Yoo
  2018-02-21 16:16 ` [PATCH v2 2/8] [PATCH 2/8] Documentations: dt-bindings: Add a document of PECI adapter driver for Aspeed AST24xx/25xx SoCs Jae Hyun Yoo
@ 2018-02-21 16:16 ` Jae Hyun Yoo
  2018-02-21 16:16 ` [PATCH v2 4/8] [PATCH 4/8] drivers/peci: Add a PECI adapter driver for Aspeed AST24xx/AST25xx Jae Hyun Yoo
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 46+ messages in thread
From: Jae Hyun Yoo @ 2018-02-21 16:16 UTC (permalink / raw)
  To: joel, andrew, arnd, gregkh, jdelvare, linux, benh, andrew
  Cc: linux-kernel, linux-doc, devicetree, linux-hwmon,
	linux-arm-kernel, openbmc, Jae Hyun Yoo

This commit adds PECI bus/adapter node of AST24xx/AST25xx into
aspeed-g4 and aspeed-g5.

Signed-off-by: Jae Hyun Yoo <jae.hyun.yoo@linux.intel.com>
---
 arch/arm/boot/dts/aspeed-g4.dtsi | 25 +++++++++++++++++++++++++
 arch/arm/boot/dts/aspeed-g5.dtsi | 25 +++++++++++++++++++++++++
 2 files changed, 50 insertions(+)

diff --git a/arch/arm/boot/dts/aspeed-g4.dtsi b/arch/arm/boot/dts/aspeed-g4.dtsi
index b0d8431a3700..077b4d6795b8 100644
--- a/arch/arm/boot/dts/aspeed-g4.dtsi
+++ b/arch/arm/boot/dts/aspeed-g4.dtsi
@@ -29,6 +29,7 @@
 		serial3 = &uart4;
 		serial4 = &uart5;
 		serial5 = &vuart;
+		peci0 = &peci0;
 	};
 
 	cpus {
@@ -250,6 +251,13 @@
 				};
 			};
 
+			peci: peci@1e78b000 {
+				compatible = "simple-bus";
+				#address-cells = <1>;
+				#size-cells = <1>;
+				ranges = <0x0 0x1e78b000 0x60>;
+			};
+
 			uart2: serial@1e78d000 {
 				compatible = "ns16550a";
 				reg = <0x1e78d000 0x20>;
@@ -290,6 +298,23 @@
 	};
 };
 
+&peci {
+	peci0: peci-bus@0 {
+		compatible = "aspeed,ast2400-peci";
+		reg = <0x0 0x60>;
+		#address-cells = <1>;
+		#size-cells = <0>;
+		interrupts = <15>;
+		clocks = <&syscon ASPEED_CLK_GATE_REFCLK>;
+		clock-frequency = <24000000>;
+		msg-timing-nego = <1>;
+		addr-timing-nego = <1>;
+		rd-sampling-point = <8>;
+		cmd-timeout-ms = <1000>;
+		status = "disabled";
+	};
+};
+
 &i2c {
 	i2c_ic: interrupt-controller@0 {
 		#interrupt-cells = <1>;
diff --git a/arch/arm/boot/dts/aspeed-g5.dtsi b/arch/arm/boot/dts/aspeed-g5.dtsi
index 40de3b66c33f..5d3b5e177a32 100644
--- a/arch/arm/boot/dts/aspeed-g5.dtsi
+++ b/arch/arm/boot/dts/aspeed-g5.dtsi
@@ -29,6 +29,7 @@
 		serial3 = &uart4;
 		serial4 = &uart5;
 		serial5 = &vuart;
+		peci0 = &peci0;
 	};
 
 	cpus {
@@ -301,6 +302,13 @@
 				};
 			};
 
+			peci: peci@1e78b000 {
+				compatible = "simple-bus";
+				#address-cells = <1>;
+				#size-cells = <1>;
+				ranges = <0x0 0x1e78b000 0x60>;
+			};
+
 			uart2: serial@1e78d000 {
 				compatible = "ns16550a";
 				reg = <0x1e78d000 0x20>;
@@ -341,6 +349,23 @@
 	};
 };
 
+&peci {
+	peci0: peci-bus@0 {
+		compatible = "aspeed,ast2500-peci";
+		reg = <0x0 0x60>;
+		#address-cells = <1>;
+		#size-cells = <0>;
+		interrupts = <15>;
+		clocks = <&syscon ASPEED_CLK_GATE_REFCLK>;
+		clock-frequency = <24000000>;
+		msg-timing-nego = <1>;
+		addr-timing-nego = <1>;
+		rd-sampling-point = <8>;
+		cmd-timeout-ms = <1000>;
+		status = "disabled";
+	};
+};
+
 &i2c {
 	i2c_ic: interrupt-controller@0 {
 		#interrupt-cells = <1>;
-- 
2.16.1

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH v2 4/8] [PATCH 4/8] drivers/peci: Add a PECI adapter driver for Aspeed AST24xx/AST25xx
  2018-02-21 16:15 [PATCH v2 0/8] PECI device driver introduction Jae Hyun Yoo
                   ` (2 preceding siblings ...)
  2018-02-21 16:16 ` [PATCH v2 3/8] [PATCH 3/8] ARM: dts: aspeed: peci: Add PECI node Jae Hyun Yoo
@ 2018-02-21 16:16 ` Jae Hyun Yoo
  2018-02-21 16:16 ` [PATCH v2 5/8] [PATCH [5/8] Documentation: dt-bindings: Add a document for PECI hwmon client driver Jae Hyun Yoo
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 46+ messages in thread
From: Jae Hyun Yoo @ 2018-02-21 16:16 UTC (permalink / raw)
  To: joel, andrew, arnd, gregkh, jdelvare, linux, benh, andrew
  Cc: linux-kernel, linux-doc, devicetree, linux-hwmon,
	linux-arm-kernel, openbmc, Jae Hyun Yoo

This commit adds PECI adapter driver implementation for Aspeed
AST24xx/AST25xx.

Signed-off-by: Jae Hyun Yoo <jae.hyun.yoo@linux.intel.com>
---
 drivers/peci/Kconfig       |  19 ++
 drivers/peci/Makefile      |   3 +
 drivers/peci/peci-aspeed.c | 510 +++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 532 insertions(+)
 create mode 100644 drivers/peci/peci-aspeed.c

diff --git a/drivers/peci/Kconfig b/drivers/peci/Kconfig
index 1cd2cb4b2298..f9875a0d0bc7 100644
--- a/drivers/peci/Kconfig
+++ b/drivers/peci/Kconfig
@@ -14,7 +14,26 @@ config PECI
 	  processor and chipset components to external monitoring or control
 	  devices.
 
+	  If you want PECI support, you should say Y here and also to the
+	  specific driver for your bus adapter(s) below.
+
 	  This PECI support can also be built as a module.  If so, the module
 	  will be called peci-core.
 
+if PECI
+
+config PECI_ASPEED
+	tristate "Aspeed AST24xx/AST25xx PECI support"
+	select REGMAP_MMIO
+	depends on OF && (ARCH_ASPEED || COMPILE_TEST)
+	help
+	  Say Y here if you want support for the Platform Environment Control
+	  Interface (PECI) bus adapter driver on the Aspeed AST24XX and AST25XX
+	  SoCs.
+
+	  This support is also available as a module.  If so, the module
+	  will be called peci-aspeed.
+
+endif # PECI
+
 endmenu
diff --git a/drivers/peci/Makefile b/drivers/peci/Makefile
index 9e8615e0d3ff..886285e69765 100644
--- a/drivers/peci/Makefile
+++ b/drivers/peci/Makefile
@@ -4,3 +4,6 @@
 
 # Core functionality
 obj-$(CONFIG_PECI)		+= peci-core.o
+
+# Hardware specific bus drivers
+obj-$(CONFIG_PECI_ASPEED)	+= peci-aspeed.o
diff --git a/drivers/peci/peci-aspeed.c b/drivers/peci/peci-aspeed.c
new file mode 100644
index 000000000000..2b7800e96805
--- /dev/null
+++ b/drivers/peci/peci-aspeed.c
@@ -0,0 +1,510 @@
+// SPDX-License-Identifier: GPL-2.0
+// Copyright (C) 2012-2020 ASPEED Technology Inc.
+// Copyright (c) 2018 Intel Corporation
+
+#include <linux/clk.h>
+#include <linux/delay.h>
+#include <linux/interrupt.h>
+#include <linux/jiffies.h>
+#include <linux/module.h>
+#include <linux/of.h>
+#include <linux/peci.h>
+#include <linux/platform_device.h>
+#include <linux/regmap.h>
+
+#define DUMP_DEBUG 0
+
+/* Aspeed PECI Registers */
+#define AST_PECI_CTRL     0x00
+#define AST_PECI_TIMING   0x04
+#define AST_PECI_CMD      0x08
+#define AST_PECI_CMD_CTRL 0x0c
+#define AST_PECI_EXP_FCS  0x10
+#define AST_PECI_CAP_FCS  0x14
+#define AST_PECI_INT_CTRL 0x18
+#define AST_PECI_INT_STS  0x1c
+#define AST_PECI_W_DATA0  0x20
+#define AST_PECI_W_DATA1  0x24
+#define AST_PECI_W_DATA2  0x28
+#define AST_PECI_W_DATA3  0x2c
+#define AST_PECI_R_DATA0  0x30
+#define AST_PECI_R_DATA1  0x34
+#define AST_PECI_R_DATA2  0x38
+#define AST_PECI_R_DATA3  0x3c
+#define AST_PECI_W_DATA4  0x40
+#define AST_PECI_W_DATA5  0x44
+#define AST_PECI_W_DATA6  0x48
+#define AST_PECI_W_DATA7  0x4c
+#define AST_PECI_R_DATA4  0x50
+#define AST_PECI_R_DATA5  0x54
+#define AST_PECI_R_DATA6  0x58
+#define AST_PECI_R_DATA7  0x5c
+
+/* AST_PECI_CTRL - 0x00 : Control Register */
+#define PECI_CTRL_SAMPLING_MASK     GENMASK(19, 16)
+#define PECI_CTRL_SAMPLING(x)       (((x) << 16) & PECI_CTRL_SAMPLING_MASK)
+#define PECI_CTRL_SAMPLING_GET(x)   (((x) & PECI_CTRL_SAMPLING_MASK) >> 16)
+#define PECI_CTRL_READ_MODE_MASK    GENMASK(13, 12)
+#define PECI_CTRL_READ_MODE(x)      (((x) << 12) & PECI_CTRL_READ_MODE_MASK)
+#define PECI_CTRL_READ_MODE_GET(x)  (((x) & PECI_CTRL_READ_MODE_MASK) >> 12)
+#define PECI_CTRL_READ_MODE_COUNT   BIT(12)
+#define PECI_CTRL_READ_MODE_DBG     BIT(13)
+#define PECI_CTRL_CLK_SOURCE_MASK   BIT(11)
+#define PECI_CTRL_CLK_SOURCE(x)     (((x) << 11) & PECI_CTRL_CLK_SOURCE_MASK)
+#define PECI_CTRL_CLK_SOURCE_GET(x) (((x) & PECI_CTRL_CLK_SOURCE_MASK) >> 11)
+#define PECI_CTRL_CLK_DIV_MASK      GENMASK(10, 8)
+#define PECI_CTRL_CLK_DIV(x)        (((x) << 8) & PECI_CTRL_CLK_DIV_MASK)
+#define PECI_CTRL_CLK_DIV_GET(x)    (((x) & PECI_CTRL_CLK_DIV_MASK) >> 8)
+#define PECI_CTRL_INVERT_OUT        BIT(7)
+#define PECI_CTRL_INVERT_IN         BIT(6)
+#define PECI_CTRL_BUS_CONTENT_EN    BIT(5)
+#define PECI_CTRL_PECI_EN           BIT(4)
+#define PECI_CTRL_PECI_CLK_EN       BIT(0)
+
+/* AST_PECI_TIMING - 0x04 : Timing Negotiation Register */
+#define PECI_TIMING_MESSAGE_MASK   GENMASK(15, 8)
+#define PECI_TIMING_MESSAGE(x)     (((x) << 8) & PECI_TIMING_MESSAGE_MASK)
+#define PECI_TIMING_MESSAGE_GET(x) (((x) & PECI_TIMING_MESSAGE_MASK) >> 8)
+#define PECI_TIMING_ADDRESS_MASK   GENMASK(7, 0)
+#define PECI_TIMING_ADDRESS(x)     ((x) & PECI_TIMING_ADDRESS_MASK)
+#define PECI_TIMING_ADDRESS_GET(x) ((x) & PECI_TIMING_ADDRESS_MASK)
+
+/* AST_PECI_CMD - 0x08 : Command Register */
+#define PECI_CMD_PIN_MON    BIT(31)
+#define PECI_CMD_STS_MASK   GENMASK(27, 24)
+#define PECI_CMD_STS_GET(x) (((x) & PECI_CMD_STS_MASK) >> 24)
+#define PECI_CMD_FIRE       BIT(0)
+
+/* AST_PECI_LEN - 0x0C : Read/Write Length Register */
+#define PECI_AW_FCS_EN       BIT(31)
+#define PECI_READ_LEN_MASK   GENMASK(23, 16)
+#define PECI_READ_LEN(x)     (((x) << 16) & PECI_READ_LEN_MASK)
+#define PECI_WRITE_LEN_MASK  GENMASK(15, 8)
+#define PECI_WRITE_LEN(x)    (((x) << 8) & PECI_WRITE_LEN_MASK)
+#define PECI_TAGET_ADDR_MASK GENMASK(7, 0)
+#define PECI_TAGET_ADDR(x)   ((x) & PECI_TAGET_ADDR_MASK)
+
+/* AST_PECI_EXP_FCS - 0x10 : Expected FCS Data Register */
+#define PECI_EXPECT_READ_FCS_MASK      GENMASK(23, 16)
+#define PECI_EXPECT_READ_FCS_GET(x)    (((x) & PECI_EXPECT_READ_FCS_MASK) >> 16)
+#define PECI_EXPECT_AW_FCS_AUTO_MASK   GENMASK(15, 8)
+#define PECI_EXPECT_AW_FCS_AUTO_GET(x) (((x) & PECI_EXPECT_AW_FCS_AUTO_MASK) \
+					>> 8)
+#define PECI_EXPECT_WRITE_FCS_MASK     GENMASK(7, 0)
+#define PECI_EXPECT_WRITE_FCS_GET(x)   ((x) & PECI_EXPECT_WRITE_FCS_MASK)
+
+/* AST_PECI_CAP_FCS - 0x14 : Captured FCS Data Register */
+#define PECI_CAPTURE_READ_FCS_MASK    GENMASK(23, 16)
+#define PECI_CAPTURE_READ_FCS_GET(x)  (((x) & PECI_CAPTURE_READ_FCS_MASK) >> 16)
+#define PECI_CAPTURE_WRITE_FCS_MASK   GENMASK(7, 0)
+#define PECI_CAPTURE_WRITE_FCS_GET(x) ((x) & PECI_CAPTURE_WRITE_FCS_MASK)
+
+/* AST_PECI_INT_CTRL/STS - 0x18/0x1c : Interrupt Register */
+#define PECI_INT_TIMING_RESULT_MASK GENMASK(31, 30)
+#define PECI_INT_TIMEOUT            BIT(4)
+#define PECI_INT_CONNECT            BIT(3)
+#define PECI_INT_W_FCS_BAD          BIT(2)
+#define PECI_INT_W_FCS_ABORT        BIT(1)
+#define PECI_INT_CMD_DONE           BIT(0)
+
+struct aspeed_peci {
+	struct peci_adapter	adaper;
+	struct device		*dev;
+	struct regmap		*regmap;
+	int			irq;
+	struct completion	xfer_complete;
+	u32			sts;
+	u32			cmd_timeout_ms;
+};
+
+#define PECI_INT_MASK  (PECI_INT_TIMEOUT | PECI_INT_CONNECT | \
+			PECI_INT_W_FCS_BAD | PECI_INT_W_FCS_ABORT | \
+			PECI_INT_CMD_DONE)
+
+#define PECI_IDLE_CHECK_TIMEOUT_MS      50
+#define PECI_IDLE_CHECK_INTERVAL_MS     10
+
+#define PECI_RD_SAMPLING_POINT_DEFAULT  8
+#define PECI_RD_SAMPLING_POINT_MAX      15
+#define PECI_CLK_DIV_DEFAULT            0
+#define PECI_CLK_DIV_MAX                7
+#define PECI_MSG_TIMING_NEGO_DEFAULT    1
+#define PECI_MSG_TIMING_NEGO_MAX        255
+#define PECI_ADDR_TIMING_NEGO_DEFAULT   1
+#define PECI_ADDR_TIMING_NEGO_MAX       255
+#define PECI_CMD_TIMEOUT_MS_DEFAULT     1000
+#define PECI_CMD_TIMEOUT_MS_MAX         60000
+
+static int aspeed_peci_xfer_native(struct aspeed_peci *priv,
+				   struct peci_xfer_msg *msg)
+{
+	u32 peci_head, peci_state, rx_data, cmd_sts;
+	uint reg;
+	ktime_t start, end;
+	s64 elapsed_ms;
+	long err, timeout = msecs_to_jiffies(priv->cmd_timeout_ms);
+	int i, rc = 0;
+
+	start = ktime_get();
+
+	/* Check command sts and bus idle state */
+	while (!regmap_read(priv->regmap, AST_PECI_CMD, &cmd_sts) &&
+	       (cmd_sts & (PECI_CMD_STS_MASK | PECI_CMD_PIN_MON))) {
+		end = ktime_get();
+		elapsed_ms = ktime_to_ms(ktime_sub(end, start));
+		if (elapsed_ms >= PECI_IDLE_CHECK_TIMEOUT_MS) {
+			dev_dbg(priv->dev, "Timeout waiting for idle state!\n");
+			return -ETIMEDOUT;
+		}
+
+		usleep_range(PECI_IDLE_CHECK_INTERVAL_MS * 1000,
+			     (PECI_IDLE_CHECK_INTERVAL_MS * 1000) + 1000);
+	};
+
+	reinit_completion(&priv->xfer_complete);
+
+	peci_head = PECI_TAGET_ADDR(msg->addr) |
+				    PECI_WRITE_LEN(msg->tx_len) |
+				    PECI_READ_LEN(msg->rx_len);
+
+	rc = regmap_write(priv->regmap, AST_PECI_CMD_CTRL, peci_head);
+	if (rc)
+		return rc;
+
+	for (i = 0; i < msg->tx_len; i += 4) {
+		reg = i < 16 ? AST_PECI_W_DATA0 + i % 16 :
+			       AST_PECI_W_DATA4 + i % 16;
+		rc = regmap_write(priv->regmap, reg,
+				  (msg->tx_buf[i + 3] << 24) |
+				  (msg->tx_buf[i + 2] << 16) |
+				  (msg->tx_buf[i + 1] << 8) |
+				  msg->tx_buf[i + 0]);
+		if (rc)
+			return rc;
+	}
+
+	dev_dbg(priv->dev, "HEAD : 0x%08x\n", peci_head);
+#if DUMP_DEBUG
+	print_hex_dump(KERN_DEBUG, "TX : ", DUMP_PREFIX_NONE, 16, 1,
+		       msg->tx_buf, msg->tx_len, true);
+#endif
+
+	rc = regmap_write(priv->regmap, AST_PECI_CMD, PECI_CMD_FIRE);
+	if (rc)
+		return rc;
+
+	err = wait_for_completion_interruptible_timeout(&priv->xfer_complete,
+							timeout);
+
+	dev_dbg(priv->dev, "INT_STS : 0x%08x\n", priv->sts);
+	if (!regmap_read(priv->regmap, AST_PECI_CMD, &peci_state))
+		dev_dbg(priv->dev, "PECI_STATE : 0x%lx\n",
+			PECI_CMD_STS_GET(peci_state));
+	else
+		dev_dbg(priv->dev, "PECI_STATE : read error\n");
+
+	if (err <= 0 || !(priv->sts & PECI_INT_CMD_DONE)) {
+		if (err < 0) { /* -ERESTARTSYS */
+			return (int)err;
+		} else if (err == 0) {
+			dev_dbg(priv->dev, "Timeout waiting for a response!\n");
+			return -ETIMEDOUT;
+		}
+
+		dev_dbg(priv->dev, "No valid response!\n");
+		return -EFAULT;
+	}
+
+	for (i = 0; i < msg->rx_len; i++) {
+		u8 byte_offset = i % 4;
+
+		if (byte_offset == 0) {
+			reg = i < 16 ? AST_PECI_R_DATA0 + i % 16 :
+				       AST_PECI_R_DATA4 + i % 16;
+			rc = regmap_read(priv->regmap, reg, &rx_data);
+			if (rc)
+				return rc;
+		}
+
+		msg->rx_buf[i] = (u8)(rx_data >> (byte_offset << 3));
+	}
+
+#if DUMP_DEBUG
+	print_hex_dump(KERN_DEBUG, "RX : ", DUMP_PREFIX_NONE, 16, 1,
+		       msg->rx_buf, msg->rx_len, true);
+#endif
+	if (!regmap_read(priv->regmap, AST_PECI_CMD, &peci_state))
+		dev_dbg(priv->dev, "PECI_STATE : 0x%lx\n",
+			PECI_CMD_STS_GET(peci_state));
+	else
+		dev_dbg(priv->dev, "PECI_STATE : read error\n");
+	dev_dbg(priv->dev, "------------------------\n");
+
+	return rc;
+}
+
+static irqreturn_t aspeed_peci_irq_handler(int irq, void *arg)
+{
+	struct aspeed_peci *priv = arg;
+	bool valid_irq = true;
+
+	if (regmap_read(priv->regmap, AST_PECI_INT_STS, &priv->sts))
+		return IRQ_NONE;
+
+	switch (priv->sts & PECI_INT_MASK) {
+	case PECI_INT_TIMEOUT:
+		dev_dbg(priv->dev, "PECI_INT_TIMEOUT\n");
+		if (regmap_write(priv->regmap, AST_PECI_INT_STS,
+				 PECI_INT_TIMEOUT))
+			return IRQ_NONE;
+		break;
+	case PECI_INT_CONNECT:
+		dev_dbg(priv->dev, "PECI_INT_CONNECT\n");
+		if (regmap_write(priv->regmap, AST_PECI_INT_STS,
+				 PECI_INT_CONNECT))
+			return IRQ_NONE;
+		break;
+	case PECI_INT_W_FCS_BAD:
+		dev_dbg(priv->dev, "PECI_INT_W_FCS_BAD\n");
+		if (regmap_write(priv->regmap, AST_PECI_INT_STS,
+				 PECI_INT_W_FCS_BAD))
+			return IRQ_NONE;
+		break;
+	case PECI_INT_W_FCS_ABORT:
+		dev_dbg(priv->dev, "PECI_INT_W_FCS_ABORT\n");
+		if (regmap_write(priv->regmap, AST_PECI_INT_STS,
+				 PECI_INT_W_FCS_ABORT))
+			return IRQ_NONE;
+		break;
+	case PECI_INT_CMD_DONE:
+		dev_dbg(priv->dev, "PECI_INT_CMD_DONE\n");
+		if (regmap_write(priv->regmap, AST_PECI_INT_STS,
+				 PECI_INT_CMD_DONE) ||
+		    regmap_write(priv->regmap, AST_PECI_CMD, 0))
+			return IRQ_NONE;
+		break;
+	default:
+		dev_dbg(priv->dev, "Unknown PECI interrupt : 0x%08x\n",
+			priv->sts);
+		if (regmap_write(priv->regmap, AST_PECI_INT_STS, priv->sts))
+			return IRQ_NONE;
+		valid_irq = false;
+		break;
+	}
+
+	if (valid_irq)
+		complete(&priv->xfer_complete);
+
+	return IRQ_HANDLED;
+}
+
+static int aspeed_peci_init_ctrl(struct aspeed_peci *priv)
+{
+	struct clk *clkin;
+	u32 clk_freq, clk_divisor, clk_div_val = 0;
+	u32 msg_timing_nego, addr_timing_nego, rd_sampling_point;
+	int ret;
+
+	clkin = devm_clk_get(priv->dev, NULL);
+	if (IS_ERR(clkin)) {
+		dev_err(priv->dev, "Failed to get clk source.\n");
+		return PTR_ERR(clkin);
+	}
+
+	ret = of_property_read_u32(priv->dev->of_node, "clock-frequency",
+				   &clk_freq);
+	if (ret < 0) {
+		dev_err(priv->dev,
+			"Could not read clock-frequency property.\n");
+		return ret;
+	}
+
+	clk_divisor = clk_get_rate(clkin) / clk_freq;
+	devm_clk_put(priv->dev, clkin);
+
+	while ((clk_divisor >> 1) && (clk_div_val < PECI_CLK_DIV_MAX))
+		clk_div_val++;
+
+	ret = of_property_read_u32(priv->dev->of_node, "msg-timing-nego",
+				   &msg_timing_nego);
+	if (ret || msg_timing_nego > PECI_MSG_TIMING_NEGO_MAX) {
+		dev_warn(priv->dev,
+			 "Invalid msg-timing-nego : %u, Use default : %u\n",
+			 msg_timing_nego, PECI_MSG_TIMING_NEGO_DEFAULT);
+		msg_timing_nego = PECI_MSG_TIMING_NEGO_DEFAULT;
+	}
+
+	ret = of_property_read_u32(priv->dev->of_node, "addr-timing-nego",
+				   &addr_timing_nego);
+	if (ret || addr_timing_nego > PECI_ADDR_TIMING_NEGO_MAX) {
+		dev_warn(priv->dev,
+			 "Invalid addr-timing-nego : %u, Use default : %u\n",
+			 addr_timing_nego, PECI_ADDR_TIMING_NEGO_DEFAULT);
+		addr_timing_nego = PECI_ADDR_TIMING_NEGO_DEFAULT;
+	}
+
+	ret = of_property_read_u32(priv->dev->of_node, "rd-sampling-point",
+				   &rd_sampling_point);
+	if (ret || rd_sampling_point > PECI_RD_SAMPLING_POINT_MAX) {
+		dev_warn(priv->dev,
+			 "Invalid rd-sampling-point : %u. Use default : %u\n",
+			 rd_sampling_point,
+			 PECI_RD_SAMPLING_POINT_DEFAULT);
+		rd_sampling_point = PECI_RD_SAMPLING_POINT_DEFAULT;
+	}
+
+	ret = of_property_read_u32(priv->dev->of_node, "cmd-timeout-ms",
+				   &priv->cmd_timeout_ms);
+	if (ret || priv->cmd_timeout_ms > PECI_CMD_TIMEOUT_MS_MAX ||
+	    priv->cmd_timeout_ms == 0) {
+		dev_warn(priv->dev,
+			 "Invalid cmd-timeout-ms : %u. Use default : %u\n",
+			 priv->cmd_timeout_ms,
+			 PECI_CMD_TIMEOUT_MS_DEFAULT);
+		priv->cmd_timeout_ms = PECI_CMD_TIMEOUT_MS_DEFAULT;
+	}
+
+	ret = regmap_write(priv->regmap, AST_PECI_CTRL,
+			   PECI_CTRL_CLK_DIV(PECI_CLK_DIV_DEFAULT) |
+			   PECI_CTRL_PECI_CLK_EN);
+	if (ret)
+		return ret;
+
+	usleep_range(1000, 5000);
+
+	/**
+	 * Timing negotiation period setting.
+	 * The unit of the programmed value is 4 times of PECI clock period.
+	 */
+	ret = regmap_write(priv->regmap, AST_PECI_TIMING,
+			   PECI_TIMING_MESSAGE(msg_timing_nego) |
+			   PECI_TIMING_ADDRESS(addr_timing_nego));
+	if (ret)
+		return ret;
+
+	/* Clear interrupts */
+	ret = regmap_write(priv->regmap, AST_PECI_INT_STS, PECI_INT_MASK);
+	if (ret)
+		return ret;
+
+	/* Enable interrupts */
+	ret = regmap_write(priv->regmap, AST_PECI_INT_CTRL, PECI_INT_MASK);
+	if (ret)
+		return ret;
+
+	/* Read sampling point and clock speed setting */
+	ret = regmap_write(priv->regmap, AST_PECI_CTRL,
+			   PECI_CTRL_SAMPLING(rd_sampling_point) |
+			   PECI_CTRL_CLK_DIV(clk_div_val) |
+			   PECI_CTRL_PECI_EN | PECI_CTRL_PECI_CLK_EN);
+	if (ret)
+		return ret;
+
+	return 0;
+}
+
+static const struct regmap_config aspeed_peci_regmap_config = {
+	.reg_bits = 32,
+	.val_bits = 32,
+	.reg_stride = 4,
+	.max_register = AST_PECI_R_DATA7,
+	.val_format_endian = REGMAP_ENDIAN_LITTLE,
+	.fast_io = true,
+};
+
+static int aspeed_peci_xfer(struct peci_adapter *adaper,
+			    struct peci_xfer_msg *msg)
+{
+	struct aspeed_peci *priv = peci_get_adapdata(adaper);
+
+	return aspeed_peci_xfer_native(priv, msg);
+}
+
+static int aspeed_peci_probe(struct platform_device *pdev)
+{
+	struct aspeed_peci *priv;
+	struct resource *res;
+	void __iomem *base;
+	int ret = 0;
+
+	priv = devm_kzalloc(&pdev->dev, sizeof(*priv), GFP_KERNEL);
+	if (!priv)
+		return -ENOMEM;
+
+	dev_set_drvdata(&pdev->dev, priv);
+	priv->dev = &pdev->dev;
+
+	res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
+	base = devm_ioremap_resource(&pdev->dev, res);
+	if (IS_ERR(base))
+		return PTR_ERR(base);
+
+	priv->regmap = devm_regmap_init_mmio(&pdev->dev, base,
+					     &aspeed_peci_regmap_config);
+	if (IS_ERR(priv->regmap))
+		return PTR_ERR(priv->regmap);
+
+	priv->irq = platform_get_irq(pdev, 0);
+	if (!priv->irq)
+		return -ENODEV;
+
+	ret = devm_request_irq(&pdev->dev, priv->irq, aspeed_peci_irq_handler,
+			       IRQF_SHARED,
+			       "peci-aspeed-irq",
+			       priv);
+	if (ret < 0)
+		return ret;
+
+	init_completion(&priv->xfer_complete);
+
+	priv->adaper.dev.parent = priv->dev;
+	priv->adaper.dev.of_node = of_node_get(dev_of_node(priv->dev));
+	strlcpy(priv->adaper.name, pdev->name, sizeof(priv->adaper.name));
+	priv->adaper.xfer = aspeed_peci_xfer;
+	peci_set_adapdata(&priv->adaper, priv);
+
+	ret = aspeed_peci_init_ctrl(priv);
+	if (ret < 0)
+		return ret;
+
+	ret = peci_add_adapter(&priv->adaper);
+	if (ret < 0)
+		return ret;
+
+	dev_info(&pdev->dev, "peci bus %d registered, irq %d\n",
+		 priv->adaper.nr, priv->irq);
+
+	return 0;
+}
+
+static int aspeed_peci_remove(struct platform_device *pdev)
+{
+	struct aspeed_peci *priv = dev_get_drvdata(&pdev->dev);
+
+	peci_del_adapter(&priv->adaper);
+	of_node_put(priv->adaper.dev.of_node);
+
+	return 0;
+}
+
+static const struct of_device_id aspeed_peci_of_table[] = {
+	{ .compatible = "aspeed,ast2400-peci", },
+	{ .compatible = "aspeed,ast2500-peci", },
+	{ }
+};
+MODULE_DEVICE_TABLE(of, aspeed_peci_of_table);
+
+static struct platform_driver aspeed_peci_driver = {
+	.probe  = aspeed_peci_probe,
+	.remove = aspeed_peci_remove,
+	.driver = {
+		.name           = "peci-aspeed",
+		.of_match_table = of_match_ptr(aspeed_peci_of_table),
+	},
+};
+module_platform_driver(aspeed_peci_driver);
+
+MODULE_AUTHOR("Ryan Chen <ryan_chen@aspeedtech.com>");
+MODULE_AUTHOR("Jae Hyun Yoo <jae.hyun.yoo@linux.intel.com>");
+MODULE_DESCRIPTION("Aspeed PECI driver");
+MODULE_LICENSE("GPL v2");
-- 
2.16.1

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH v2 5/8] [PATCH [5/8] Documentation: dt-bindings: Add a document for PECI hwmon client driver
  2018-02-21 16:15 [PATCH v2 0/8] PECI device driver introduction Jae Hyun Yoo
                   ` (3 preceding siblings ...)
  2018-02-21 16:16 ` [PATCH v2 4/8] [PATCH 4/8] drivers/peci: Add a PECI adapter driver for Aspeed AST24xx/AST25xx Jae Hyun Yoo
@ 2018-02-21 16:16 ` Jae Hyun Yoo
  2018-02-21 16:16 ` [PATCH v2 6/8] [PATCH 6/8] Documentation: hwmon: " Jae Hyun Yoo
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 46+ messages in thread
From: Jae Hyun Yoo @ 2018-02-21 16:16 UTC (permalink / raw)
  To: joel, andrew, arnd, gregkh, jdelvare, linux, benh, andrew
  Cc: linux-kernel, linux-doc, devicetree, linux-hwmon,
	linux-arm-kernel, openbmc, Jae Hyun Yoo

This commit adds a dt-bindings document for a generic PECI hwmon client
driver.

Signed-off-by: Jae Hyun Yoo <jae.hyun.yoo@linux.intel.com>
---
 .../devicetree/bindings/hwmon/peci-hwmon.txt       | 27 ++++++++++++++++++++++
 1 file changed, 27 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/hwmon/peci-hwmon.txt

diff --git a/Documentation/devicetree/bindings/hwmon/peci-hwmon.txt b/Documentation/devicetree/bindings/hwmon/peci-hwmon.txt
new file mode 100644
index 000000000000..831813158884
--- /dev/null
+++ b/Documentation/devicetree/bindings/hwmon/peci-hwmon.txt
@@ -0,0 +1,27 @@
+Bindings for Intel PECI (Platform Environment Control Interface) hwmon driver.
+
+Required properties:
+- compatible
+	Should be "intel,peci-hwmon".
+
+- reg
+	Should contain address of a client CPU. Address range of CPU clients is
+	starting from 0x30 based on PECI specification.
+	<0x30> .. <0x37> (depends on the PECI_OFFSET_MAX definition)
+
+Example:
+	peci-bus@0 {
+		#address-cells = <1>;
+		#size-cells = <0>;
+		< more properties >
+
+		peci-hwmon@cpu0 {
+			compatible = "intel,peci-hwmon";
+			reg = <0x30>;
+		};
+
+		peci-hwmon@cpu1 {
+			compatible = "intel,peci-hwmon";
+			reg = <0x31>;
+		};
+	};
-- 
2.16.1

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH v2 6/8] [PATCH 6/8] Documentation: hwmon: Add a document for PECI hwmon client driver
  2018-02-21 16:15 [PATCH v2 0/8] PECI device driver introduction Jae Hyun Yoo
                   ` (4 preceding siblings ...)
  2018-02-21 16:16 ` [PATCH v2 5/8] [PATCH [5/8] Documentation: dt-bindings: Add a document for PECI hwmon client driver Jae Hyun Yoo
@ 2018-02-21 16:16 ` Jae Hyun Yoo
  2018-03-06 20:28   ` Randy Dunlap
  2018-02-21 16:16 ` [PATCH v2 7/8] [PATCH 7/8] drivers/hwmon: Add a generic " Jae Hyun Yoo
                   ` (2 subsequent siblings)
  8 siblings, 1 reply; 46+ messages in thread
From: Jae Hyun Yoo @ 2018-02-21 16:16 UTC (permalink / raw)
  To: joel, andrew, arnd, gregkh, jdelvare, linux, benh, andrew
  Cc: linux-kernel, linux-doc, devicetree, linux-hwmon,
	linux-arm-kernel, openbmc, Jae Hyun Yoo

This commit adds a hwmon document for a generic PECI hwmon client driver.

Signed-off-by: Jae Hyun Yoo <jae.hyun.yoo@linux.intel.com>
---
 Documentation/hwmon/peci-hwmon | 73 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 73 insertions(+)
 create mode 100644 Documentation/hwmon/peci-hwmon

diff --git a/Documentation/hwmon/peci-hwmon b/Documentation/hwmon/peci-hwmon
new file mode 100644
index 000000000000..93e587498536
--- /dev/null
+++ b/Documentation/hwmon/peci-hwmon
@@ -0,0 +1,73 @@
+Kernel driver peci-hwmon
+===============================
+
+Supported chips:
+	Any recent Intel CPU which is connected through a PECI bus.
+	Addresses scanned: PECI client address 0x30 - 0x37
+	Datasheet: Available from http://www.intel.com/design/literature.htm
+
+Author:
+	Jae Hyun Yoo <jae.hyun.yoo@linux.intel.com>
+
+Description
+-----------
+
+This driver implements a generic PECI hwmon feature which provides Digital
+Thermal Sensor (DTS) thermal readings of the CPU package, CPU cores and DIMM
+components that are accessible using the PECI Client Command Suite via the
+processor PECI client.
+
+All temperature values are given in millidegree Celsius and will be measurable
+only when the target CPU is powered on.
+
+sysfs attributes
+----------------
+
+temp1_input		Provides current die temperature of the CPU package.
+temp1_max		Provides thermal control temperature of the CPU package
+			which is also known as Tcontrol.
+temp1_crit		Provides shutdown temperature of the CPU package which
+			is also known as the maximum processor junction
+			temperature, Tjmax or Tprochot.
+temp1_crit_hyst		Provides the hysteresis value from Tcontrol to Tjmax of
+			the CPU package.
+
+temp2_input		Provides current DTS thermal margin to Tcontrol of the
+			CPU package. Value 0 means it reaches to Tcontrol
+			temperature. Sub-zero value means the die temperature
+			goes across Tconrtol to Tjmax.
+temp2_min		Provides the minimum DTS thermal margin to Tcontrol of
+			the CPU package.
+temp2_lcrit		Provides the value when the CPU package temperature
+			reaches to Tjmax.
+
+temp3_input		Provides current Tcontrol temperature of the CPU
+			package which is also known as Fan Temperature target.
+			Indicates the relative value from thermal monitor trip
+			temperature at which fans should be engaged.
+temp3_crit		Provides Tcontrol critical value of the CPU package
+			which is same to Tjmax.
+
+temp4_input		Provides current Tthrottle temperature of the CPU
+			package. Used for throttling temperature. If this value
+			is allowed and lower than Tjmax - the throttle will
+			occur and reported at lower than Tjmax.
+
+temp5_input		Provides the maximum junction temperature, Tjmax of the
+			CPU package.
+
+temp<n>_label		Provides core temperature if this label indicates
+			'Core #'.
+temp[n]_input		Provides current temperature of each core.
+temp[n]_max		Provides thermal control temperature of the core.
+temp[n]_crit		Provides shutdown temperature of the core.
+temp[n]_crit_hyst	Provides the hysteresis value from Tcontrol to Tjmax of
+			the core.
+
+temp<n>_label		Provides DDR DIMM temperature if this label indicates
+			'DIMM #'.
+temp<n>_input		Provides current temperature of the DDR DIMM.
+
+Note:
+	DIMM temperature group will be appeared when the client CPU's BIOS
+	completes memory training and testing.
-- 
2.16.1

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH v2 7/8] [PATCH 7/8] drivers/hwmon: Add a generic PECI hwmon client driver
  2018-02-21 16:15 [PATCH v2 0/8] PECI device driver introduction Jae Hyun Yoo
                   ` (5 preceding siblings ...)
  2018-02-21 16:16 ` [PATCH v2 6/8] [PATCH 6/8] Documentation: hwmon: " Jae Hyun Yoo
@ 2018-02-21 16:16 ` Jae Hyun Yoo
  2018-02-21 18:26   ` Guenter Roeck
  2018-03-13  9:32   ` Stef van Os
  2018-02-21 16:16 ` [PATCH v2 8/8] [PATCH 8/8] Add a maintainer for the PECI subsystem Jae Hyun Yoo
  2018-03-06 12:40 ` [PATCH v2 0/8] PECI device driver introduction Pavel Machek
  8 siblings, 2 replies; 46+ messages in thread
From: Jae Hyun Yoo @ 2018-02-21 16:16 UTC (permalink / raw)
  To: joel, andrew, arnd, gregkh, jdelvare, linux, benh, andrew
  Cc: linux-kernel, linux-doc, devicetree, linux-hwmon,
	linux-arm-kernel, openbmc, Jae Hyun Yoo

This commit adds a generic PECI hwmon client driver implementation.

Signed-off-by: Jae Hyun Yoo <jae.hyun.yoo@linux.intel.com>
---
 drivers/hwmon/Kconfig      |  10 +
 drivers/hwmon/Makefile     |   1 +
 drivers/hwmon/peci-hwmon.c | 928 +++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 939 insertions(+)
 create mode 100644 drivers/hwmon/peci-hwmon.c

diff --git a/drivers/hwmon/Kconfig b/drivers/hwmon/Kconfig
index ef23553ff5cb..f22e0c31f597 100644
--- a/drivers/hwmon/Kconfig
+++ b/drivers/hwmon/Kconfig
@@ -1246,6 +1246,16 @@ config SENSORS_NCT7904
 	  This driver can also be built as a module.  If so, the module
 	  will be called nct7904.
 
+config SENSORS_PECI_HWMON
+	tristate "PECI hwmon support"
+	depends on PECI
+	help
+	  If you say yes here you get support for the generic PECI hwmon
+	  driver.
+
+	  This driver can also be built as a module.  If so, the module
+	  will be called peci-hwmon.
+
 config SENSORS_NSA320
 	tristate "ZyXEL NSA320 and compatible fan speed and temperature sensors"
 	depends on GPIOLIB && OF
diff --git a/drivers/hwmon/Makefile b/drivers/hwmon/Makefile
index f814b4ace138..946f54b168e5 100644
--- a/drivers/hwmon/Makefile
+++ b/drivers/hwmon/Makefile
@@ -135,6 +135,7 @@ obj-$(CONFIG_SENSORS_NCT7802)	+= nct7802.o
 obj-$(CONFIG_SENSORS_NCT7904)	+= nct7904.o
 obj-$(CONFIG_SENSORS_NSA320)	+= nsa320-hwmon.o
 obj-$(CONFIG_SENSORS_NTC_THERMISTOR)	+= ntc_thermistor.o
+obj-$(CONFIG_SENSORS_PECI_HWMON)	+= peci-hwmon.o
 obj-$(CONFIG_SENSORS_PC87360)	+= pc87360.o
 obj-$(CONFIG_SENSORS_PC87427)	+= pc87427.o
 obj-$(CONFIG_SENSORS_PCF8591)	+= pcf8591.o
diff --git a/drivers/hwmon/peci-hwmon.c b/drivers/hwmon/peci-hwmon.c
new file mode 100644
index 000000000000..edd27744adcb
--- /dev/null
+++ b/drivers/hwmon/peci-hwmon.c
@@ -0,0 +1,928 @@
+// SPDX-License-Identifier: GPL-2.0
+// Copyright (c) 2018 Intel Corporation
+
+#include <linux/delay.h>
+#include <linux/hwmon.h>
+#include <linux/hwmon-sysfs.h>
+#include <linux/jiffies.h>
+#include <linux/module.h>
+#include <linux/of_device.h>
+#include <linux/peci.h>
+#include <linux/workqueue.h>
+
+#define DIMM_SLOT_NUMS_MAX    12  /* Max DIMM numbers (channel ranks x 2) */
+#define CORE_NUMS_MAX         28  /* Max core numbers (max on SKX Platinum) */
+#define TEMP_TYPE_PECI        6   /* Sensor type 6: Intel PECI */
+
+#define CORE_TEMP_ATTRS       5
+#define DIMM_TEMP_ATTRS       2
+#define ATTR_NAME_LEN         24
+
+#define DEFAULT_ATTR_GRP_NUMS 5
+
+#define UPDATE_INTERVAL_MIN   HZ
+#define DIMM_MASK_CHECK_DELAY msecs_to_jiffies(5000)
+
+enum sign {
+	POS,
+	NEG
+};
+
+struct temp_data {
+	bool valid;
+	s32  value;
+	unsigned long last_updated;
+};
+
+struct temp_group {
+	struct temp_data tjmax;
+	struct temp_data tcontrol;
+	struct temp_data tthrottle;
+	struct temp_data dts_margin;
+	struct temp_data die;
+	struct temp_data core[CORE_NUMS_MAX];
+	struct temp_data dimm[DIMM_SLOT_NUMS_MAX];
+};
+
+struct core_temp_group {
+	struct sensor_device_attribute sd_attrs[CORE_TEMP_ATTRS];
+	char attr_name[CORE_TEMP_ATTRS][ATTR_NAME_LEN];
+	struct attribute *attrs[CORE_TEMP_ATTRS + 1];
+	struct attribute_group attr_group;
+};
+
+struct dimm_temp_group {
+	struct sensor_device_attribute sd_attrs[DIMM_TEMP_ATTRS];
+	char attr_name[DIMM_TEMP_ATTRS][ATTR_NAME_LEN];
+	struct attribute *attrs[DIMM_TEMP_ATTRS + 1];
+	struct attribute_group attr_group;
+};
+
+struct peci_hwmon {
+	struct peci_client *client;
+	struct device *dev;
+	struct device *hwmon_dev;
+	struct workqueue_struct *work_queue;
+	struct delayed_work work_handler;
+	char name[PECI_NAME_SIZE];
+	struct temp_group temp;
+	u8 addr;
+	uint cpu_no;
+	u32 core_mask;
+	u32 dimm_mask;
+	const struct attribute_group *core_attr_groups[CORE_NUMS_MAX + 1];
+	const struct attribute_group *dimm_attr_groups[DIMM_SLOT_NUMS_MAX + 1];
+	uint global_idx;
+	uint core_idx;
+	uint dimm_idx;
+};
+
+enum label {
+	L_DIE,
+	L_DTS,
+	L_TCONTROL,
+	L_TTHROTTLE,
+	L_TJMAX,
+	L_MAX
+};
+
+static const char *peci_label[L_MAX] = {
+	"Die\n",
+	"DTS margin to Tcontrol\n",
+	"Tcontrol\n",
+	"Tthrottle\n",
+	"Tjmax\n",
+};
+
+static int send_peci_cmd(struct peci_hwmon *priv, enum peci_cmd cmd, void *msg)
+{
+	return peci_command(priv->client->adapter, cmd, msg);
+}
+
+static int need_update(struct temp_data *temp)
+{
+	if (temp->valid &&
+	    time_before(jiffies, temp->last_updated + UPDATE_INTERVAL_MIN))
+		return 0;
+
+	return 1;
+}
+
+static s32 ten_dot_six_to_millidegree(s32 x)
+{
+	return ((((x) ^ 0x8000) - 0x8000) * 1000 / 64);
+}
+
+static int get_tjmax(struct peci_hwmon *priv)
+{
+	struct peci_rd_pkg_cfg_msg msg;
+	int rc;
+
+	if (!priv->temp.tjmax.valid) {
+		msg.addr = priv->addr;
+		msg.index = MBX_INDEX_TEMP_TARGET;
+		msg.param = 0;
+		msg.rx_len = 4;
+
+		rc = send_peci_cmd(priv, PECI_CMD_RD_PKG_CFG, (void *)&msg);
+		if (rc < 0)
+			return rc;
+
+		priv->temp.tjmax.value = (s32)msg.pkg_config[2] * 1000;
+		priv->temp.tjmax.valid = true;
+	}
+
+	return 0;
+}
+
+static int get_tcontrol(struct peci_hwmon *priv)
+{
+	struct peci_rd_pkg_cfg_msg msg;
+	s32 tcontrol_margin;
+	int rc;
+
+	if (!need_update(&priv->temp.tcontrol))
+		return 0;
+
+	rc = get_tjmax(priv);
+	if (rc < 0)
+		return rc;
+
+	msg.addr = priv->addr;
+	msg.index = MBX_INDEX_TEMP_TARGET;
+	msg.param = 0;
+	msg.rx_len = 4;
+
+	rc = send_peci_cmd(priv, PECI_CMD_RD_PKG_CFG, (void *)&msg);
+	if (rc < 0)
+		return rc;
+
+	tcontrol_margin = msg.pkg_config[1];
+	tcontrol_margin = ((tcontrol_margin ^ 0x80) - 0x80) * 1000;
+
+	priv->temp.tcontrol.value = priv->temp.tjmax.value - tcontrol_margin;
+
+	if (!priv->temp.tcontrol.valid) {
+		priv->temp.tcontrol.last_updated = INITIAL_JIFFIES;
+		priv->temp.tcontrol.valid = true;
+	} else {
+		priv->temp.tcontrol.last_updated = jiffies;
+	}
+
+	return 0;
+}
+
+static int get_tthrottle(struct peci_hwmon *priv)
+{
+	struct peci_rd_pkg_cfg_msg msg;
+	s32 tthrottle_offset;
+	int rc;
+
+	if (!need_update(&priv->temp.tthrottle))
+		return 0;
+
+	rc = get_tjmax(priv);
+	if (rc < 0)
+		return rc;
+
+	msg.addr = priv->addr;
+	msg.index = MBX_INDEX_TEMP_TARGET;
+	msg.param = 0;
+	msg.rx_len = 4;
+
+	rc = send_peci_cmd(priv, PECI_CMD_RD_PKG_CFG, (void *)&msg);
+	if (rc < 0)
+		return rc;
+
+	tthrottle_offset = (msg.pkg_config[3] & 0x2f) * 1000;
+	priv->temp.tthrottle.value = priv->temp.tjmax.value - tthrottle_offset;
+
+	if (!priv->temp.tthrottle.valid) {
+		priv->temp.tthrottle.last_updated = INITIAL_JIFFIES;
+		priv->temp.tthrottle.valid = true;
+	} else {
+		priv->temp.tthrottle.last_updated = jiffies;
+	}
+
+	return 0;
+}
+
+static int get_die_temp(struct peci_hwmon *priv)
+{
+	struct peci_get_temp_msg msg;
+	int rc;
+
+	if (!need_update(&priv->temp.die))
+		return 0;
+
+	rc = get_tjmax(priv);
+	if (rc < 0)
+		return rc;
+
+	msg.addr = priv->addr;
+
+	rc = send_peci_cmd(priv, PECI_CMD_GET_TEMP, (void *)&msg);
+	if (rc < 0)
+		return rc;
+
+	priv->temp.die.value = priv->temp.tjmax.value +
+			       ((s32)msg.temp_raw * 1000 / 64);
+
+	if (!priv->temp.die.valid) {
+		priv->temp.die.last_updated = INITIAL_JIFFIES;
+		priv->temp.die.valid = true;
+	} else {
+		priv->temp.die.last_updated = jiffies;
+	}
+
+	return 0;
+}
+
+static int get_dts_margin(struct peci_hwmon *priv)
+{
+	struct peci_rd_pkg_cfg_msg msg;
+	s32 dts_margin;
+	int rc;
+
+	if (!need_update(&priv->temp.dts_margin))
+		return 0;
+
+	msg.addr = priv->addr;
+	msg.index = MBX_INDEX_DTS_MARGIN;
+	msg.param = 0;
+	msg.rx_len = 4;
+
+	rc = send_peci_cmd(priv, PECI_CMD_RD_PKG_CFG, (void *)&msg);
+	if (rc < 0)
+		return rc;
+
+	dts_margin = (msg.pkg_config[1] << 8) | msg.pkg_config[0];
+
+	/**
+	 * Processors return a value of DTS reading in 10.6 format
+	 * (10 bits signed decimal, 6 bits fractional).
+	 * Error codes:
+	 *   0x8000: General sensor error
+	 *   0x8001: Reserved
+	 *   0x8002: Underflow on reading value
+	 *   0x8003-0x81ff: Reserved
+	 */
+	if (dts_margin >= 0x8000 && dts_margin <= 0x81ff)
+		return -1;
+
+	dts_margin = ten_dot_six_to_millidegree(dts_margin);
+
+	priv->temp.dts_margin.value = dts_margin;
+
+	if (!priv->temp.dts_margin.valid) {
+		priv->temp.dts_margin.last_updated = INITIAL_JIFFIES;
+		priv->temp.dts_margin.valid = true;
+	} else {
+		priv->temp.dts_margin.last_updated = jiffies;
+	}
+
+	return 0;
+}
+
+static int get_core_temp(struct peci_hwmon *priv, int core_index)
+{
+	struct peci_rd_pkg_cfg_msg msg;
+	s32 core_dts_margin;
+	int rc;
+
+	if (!need_update(&priv->temp.core[core_index]))
+		return 0;
+
+	rc = get_tjmax(priv);
+	if (rc < 0)
+		return rc;
+
+	msg.addr = priv->addr;
+	msg.index = MBX_INDEX_PER_CORE_DTS_TEMP;
+	msg.param = core_index;
+	msg.rx_len = 4;
+
+	rc = send_peci_cmd(priv, PECI_CMD_RD_PKG_CFG, (void *)&msg);
+	if (rc < 0)
+		return rc;
+
+	core_dts_margin = (msg.pkg_config[1] << 8) | msg.pkg_config[0];
+
+	/**
+	 * Processors return a value of the core DTS reading in 10.6 format
+	 * (10 bits signed decimal, 6 bits fractional).
+	 * Error codes:
+	 *   0x8000: General sensor error
+	 *   0x8001: Reserved
+	 *   0x8002: Underflow on reading value
+	 *   0x8003-0x81ff: Reserved
+	 */
+	if (core_dts_margin >= 0x8000 && core_dts_margin <= 0x81ff)
+		return -1;
+
+	core_dts_margin = ten_dot_six_to_millidegree(core_dts_margin);
+
+	priv->temp.core[core_index].value = priv->temp.tjmax.value +
+					    core_dts_margin;
+
+	if (!priv->temp.core[core_index].valid) {
+		priv->temp.core[core_index].last_updated = INITIAL_JIFFIES;
+		priv->temp.core[core_index].valid = true;
+	} else {
+		priv->temp.core[core_index].last_updated = jiffies;
+	}
+
+	return 0;
+}
+
+static int get_dimm_temp(struct peci_hwmon *priv, int dimm_index)
+{
+	struct peci_rd_pkg_cfg_msg msg;
+	int channel = dimm_index / 2;
+	int dimm_order = dimm_index % 2;
+	int rc;
+
+	if (!need_update(&priv->temp.dimm[dimm_index]))
+		return 0;
+
+	msg.addr = priv->addr;
+	msg.index = MBX_INDEX_DDR_DIMM_TEMP;
+	msg.param = channel;
+	msg.rx_len = 4;
+
+	rc = send_peci_cmd(priv, PECI_CMD_RD_PKG_CFG, (void *)&msg);
+	if (rc < 0)
+		return rc;
+
+	priv->temp.dimm[dimm_index].value = msg.pkg_config[dimm_order] * 1000;
+
+	if (!priv->temp.dimm[dimm_index].valid) {
+		priv->temp.dimm[dimm_index].last_updated = INITIAL_JIFFIES;
+		priv->temp.dimm[dimm_index].valid = true;
+	} else {
+		priv->temp.dimm[dimm_index].last_updated = jiffies;
+	}
+
+	return 0;
+}
+
+static ssize_t show_tcontrol(struct device *dev,
+			     struct device_attribute *attr,
+			     char *buf)
+{
+	struct peci_hwmon *priv = dev_get_drvdata(dev);
+	int rc;
+
+	rc = get_tcontrol(priv);
+	if (rc < 0)
+		return rc;
+
+	return sprintf(buf, "%d\n", priv->temp.tcontrol.value);
+}
+
+static ssize_t show_tcontrol_margin(struct device *dev,
+				    struct device_attribute *attr,
+				    char *buf)
+{
+	struct peci_hwmon *priv = dev_get_drvdata(dev);
+	struct sensor_device_attribute *sensor_attr = to_sensor_dev_attr(attr);
+	int rc;
+
+	rc = get_tcontrol(priv);
+	if (rc < 0)
+		return rc;
+
+	return sprintf(buf, "%d\n", sensor_attr->index == POS ?
+				    priv->temp.tjmax.value -
+				    priv->temp.tcontrol.value :
+				    priv->temp.tcontrol.value -
+				    priv->temp.tjmax.value);
+}
+
+static ssize_t show_tthrottle(struct device *dev,
+			      struct device_attribute *attr,
+			      char *buf)
+{
+	struct peci_hwmon *priv = dev_get_drvdata(dev);
+	int rc;
+
+	rc = get_tthrottle(priv);
+	if (rc < 0)
+		return rc;
+
+	return sprintf(buf, "%d\n", priv->temp.tthrottle.value);
+}
+
+static ssize_t show_tjmax(struct device *dev,
+			  struct device_attribute *attr,
+			  char *buf)
+{
+	struct peci_hwmon *priv = dev_get_drvdata(dev);
+	int rc;
+
+	rc = get_tjmax(priv);
+	if (rc < 0)
+		return rc;
+
+	return sprintf(buf, "%d\n", priv->temp.tjmax.value);
+}
+
+static ssize_t show_die_temp(struct device *dev,
+			     struct device_attribute *attr,
+			     char *buf)
+{
+	struct peci_hwmon *priv = dev_get_drvdata(dev);
+	int rc;
+
+	rc = get_die_temp(priv);
+	if (rc < 0)
+		return rc;
+
+	return sprintf(buf, "%d\n", priv->temp.die.value);
+}
+
+static ssize_t show_dts_margin(struct device *dev,
+			       struct device_attribute *attr,
+			       char *buf)
+{
+	struct peci_hwmon *priv = dev_get_drvdata(dev);
+	int rc;
+
+	rc = get_dts_margin(priv);
+	if (rc < 0)
+		return rc;
+
+	return sprintf(buf, "%d\n", priv->temp.dts_margin.value);
+}
+
+static ssize_t show_core_temp(struct device *dev,
+			      struct device_attribute *attr,
+			      char *buf)
+{
+	struct peci_hwmon *priv = dev_get_drvdata(dev);
+	struct sensor_device_attribute *sensor_attr = to_sensor_dev_attr(attr);
+	int core_index = sensor_attr->index;
+	int rc;
+
+	rc = get_core_temp(priv, core_index);
+	if (rc < 0)
+		return rc;
+
+	return sprintf(buf, "%d\n", priv->temp.core[core_index].value);
+}
+
+static ssize_t show_dimm_temp(struct device *dev,
+			      struct device_attribute *attr,
+			      char *buf)
+{
+	struct peci_hwmon *priv = dev_get_drvdata(dev);
+	struct sensor_device_attribute *sensor_attr = to_sensor_dev_attr(attr);
+	int dimm_index = sensor_attr->index;
+	int rc;
+
+	rc = get_dimm_temp(priv, dimm_index);
+	if (rc < 0)
+		return rc;
+
+	return sprintf(buf, "%d\n", priv->temp.dimm[dimm_index].value);
+}
+
+static ssize_t show_value(struct device *dev,
+			  struct device_attribute *attr,
+			  char *buf)
+{
+	struct sensor_device_attribute *sensor_attr = to_sensor_dev_attr(attr);
+
+	return sprintf(buf, "%d\n", sensor_attr->index);
+}
+
+static ssize_t show_label(struct device *dev,
+			  struct device_attribute *attr,
+			  char *buf)
+{
+	struct sensor_device_attribute *sensor_attr = to_sensor_dev_attr(attr);
+
+	return sprintf(buf, peci_label[sensor_attr->index]);
+}
+
+static ssize_t show_core_label(struct device *dev,
+			       struct device_attribute *attr,
+			       char *buf)
+{
+	struct sensor_device_attribute *sensor_attr = to_sensor_dev_attr(attr);
+
+	return sprintf(buf, "Core %d\n", sensor_attr->index);
+}
+
+static ssize_t show_dimm_label(struct device *dev,
+			       struct device_attribute *attr,
+			       char *buf)
+{
+	struct sensor_device_attribute *sensor_attr = to_sensor_dev_attr(attr);
+
+	char channel = 'A' + (sensor_attr->index / 2);
+	int index = sensor_attr->index % 2;
+
+	return sprintf(buf, "DIMM %d (%c%d)\n",
+		       sensor_attr->index, channel, index);
+}
+
+/* Die temperature */
+static SENSOR_DEVICE_ATTR(temp1_label, 0444, show_label, NULL, L_DIE);
+static SENSOR_DEVICE_ATTR(temp1_input, 0444, show_die_temp, NULL, 0);
+static SENSOR_DEVICE_ATTR(temp1_max, 0444, show_tcontrol, NULL, 0);
+static SENSOR_DEVICE_ATTR(temp1_crit, 0444, show_tjmax, NULL, 0);
+static SENSOR_DEVICE_ATTR(temp1_crit_hyst, 0444, show_tcontrol_margin, NULL,
+			  POS);
+
+static struct attribute *die_temp_attrs[] = {
+	&sensor_dev_attr_temp1_label.dev_attr.attr,
+	&sensor_dev_attr_temp1_input.dev_attr.attr,
+	&sensor_dev_attr_temp1_max.dev_attr.attr,
+	&sensor_dev_attr_temp1_crit.dev_attr.attr,
+	&sensor_dev_attr_temp1_crit_hyst.dev_attr.attr,
+	NULL
+};
+
+static struct attribute_group die_temp_attr_group = {
+	.attrs = die_temp_attrs,
+};
+
+/* DTS margin temperature */
+static SENSOR_DEVICE_ATTR(temp2_label, 0444, show_label, NULL, L_DTS);
+static SENSOR_DEVICE_ATTR(temp2_input, 0444, show_dts_margin, NULL, 0);
+static SENSOR_DEVICE_ATTR(temp2_min, 0444, show_value, NULL, 0);
+static SENSOR_DEVICE_ATTR(temp2_lcrit, 0444, show_tcontrol_margin, NULL, NEG);
+
+static struct attribute *dts_margin_temp_attrs[] = {
+	&sensor_dev_attr_temp2_label.dev_attr.attr,
+	&sensor_dev_attr_temp2_input.dev_attr.attr,
+	&sensor_dev_attr_temp2_min.dev_attr.attr,
+	&sensor_dev_attr_temp2_lcrit.dev_attr.attr,
+	NULL
+};
+
+static struct attribute_group dts_margin_temp_attr_group = {
+	.attrs = dts_margin_temp_attrs,
+};
+
+/* Tcontrol temperature */
+static SENSOR_DEVICE_ATTR(temp3_label, 0444, show_label, NULL, L_TCONTROL);
+static SENSOR_DEVICE_ATTR(temp3_input, 0444, show_tcontrol, NULL, 0);
+static SENSOR_DEVICE_ATTR(temp3_crit, 0444, show_tjmax, NULL, 0);
+
+static struct attribute *tcontrol_temp_attrs[] = {
+	&sensor_dev_attr_temp3_label.dev_attr.attr,
+	&sensor_dev_attr_temp3_input.dev_attr.attr,
+	&sensor_dev_attr_temp3_crit.dev_attr.attr,
+	NULL
+};
+
+static struct attribute_group tcontrol_temp_attr_group = {
+	.attrs = tcontrol_temp_attrs,
+};
+
+/* Tthrottle temperature */
+static SENSOR_DEVICE_ATTR(temp4_label, 0444, show_label, NULL, L_TTHROTTLE);
+static SENSOR_DEVICE_ATTR(temp4_input, 0444, show_tthrottle, NULL, 0);
+
+static struct attribute *tthrottle_temp_attrs[] = {
+	&sensor_dev_attr_temp4_label.dev_attr.attr,
+	&sensor_dev_attr_temp4_input.dev_attr.attr,
+	NULL
+};
+
+static struct attribute_group tthrottle_temp_attr_group = {
+	.attrs = tthrottle_temp_attrs,
+};
+
+/* Tjmax temperature */
+static SENSOR_DEVICE_ATTR(temp5_label, 0444, show_label, NULL, L_TJMAX);
+static SENSOR_DEVICE_ATTR(temp5_input, 0444, show_tjmax, NULL, 0);
+
+static struct attribute *tjmax_temp_attrs[] = {
+	&sensor_dev_attr_temp5_label.dev_attr.attr,
+	&sensor_dev_attr_temp5_input.dev_attr.attr,
+	NULL
+};
+
+static struct attribute_group tjmax_temp_attr_group = {
+	.attrs = tjmax_temp_attrs,
+};
+
+static const struct attribute_group *
+default_attr_groups[DEFAULT_ATTR_GRP_NUMS + 1] = {
+	&die_temp_attr_group,
+	&dts_margin_temp_attr_group,
+	&tcontrol_temp_attr_group,
+	&tthrottle_temp_attr_group,
+	&tjmax_temp_attr_group,
+	NULL
+};
+
+/* Core temperature */
+static ssize_t (*const core_show_fn[CORE_TEMP_ATTRS]) (struct device *dev,
+		struct device_attribute *devattr, char *buf) = {
+	show_core_label,
+	show_core_temp,
+	show_tcontrol,
+	show_tjmax,
+	show_tcontrol_margin,
+};
+
+static const char *const core_suffix[CORE_TEMP_ATTRS] = {
+	"label",
+	"input",
+	"max",
+	"crit",
+	"crit_hyst",
+};
+
+static int check_resolved_cores(struct peci_hwmon *priv)
+{
+	struct peci_rd_pci_cfg_local_msg msg;
+	int rc;
+
+	if (!(priv->client->adapter->cmd_mask & BIT(PECI_CMD_RD_PCI_CFG_LOCAL)))
+		return -EINVAL;
+
+	/* Get the RESOLVED_CORES register value */
+	msg.addr = priv->addr;
+	msg.bus = 1;
+	msg.device = 30;
+	msg.function = 3;
+	msg.reg = 0xB4;
+	msg.rx_len = 4;
+
+	rc = send_peci_cmd(priv, PECI_CMD_RD_PCI_CFG_LOCAL, (void *)&msg);
+	if (rc < 0)
+		return rc;
+
+	priv->core_mask = msg.pci_config[3] << 24 |
+			  msg.pci_config[2] << 16 |
+			  msg.pci_config[1] << 8 |
+			  msg.pci_config[0];
+
+	if (!priv->core_mask)
+		return -EAGAIN;
+
+	dev_dbg(priv->dev, "Scanned resolved cores: 0x%x\n", priv->core_mask);
+	return 0;
+}
+
+static int create_core_temp_group(struct peci_hwmon *priv, int core_no)
+{
+	struct core_temp_group *data;
+	int i;
+
+	data = devm_kzalloc(priv->dev, sizeof(struct core_temp_group),
+			    GFP_KERNEL);
+	if (!data)
+		return -ENOMEM;
+
+	for (i = 0; i < CORE_TEMP_ATTRS; i++) {
+		snprintf(data->attr_name[i], ATTR_NAME_LEN,
+			 "temp%d_%s", priv->global_idx, core_suffix[i]);
+		sysfs_attr_init(&data->sd_attrs[i].dev_attr.attr);
+		data->sd_attrs[i].dev_attr.attr.name = data->attr_name[i];
+		data->sd_attrs[i].dev_attr.attr.mode = 0444;
+		data->sd_attrs[i].dev_attr.show = core_show_fn[i];
+		if (i == 0 || i == 1) /* label or temp */
+			data->sd_attrs[i].index = core_no;
+		data->attrs[i] = &data->sd_attrs[i].dev_attr.attr;
+	}
+
+	data->attr_group.attrs = data->attrs;
+	priv->core_attr_groups[priv->core_idx++] = &data->attr_group;
+	priv->global_idx++;
+
+	return 0;
+}
+
+static int create_core_temp_groups(struct peci_hwmon *priv)
+{
+	int rc, i;
+
+	rc = check_resolved_cores(priv);
+	if (!rc) {
+		for (i = 0; i < CORE_NUMS_MAX; i++) {
+			if (priv->core_mask & BIT(i)) {
+				rc = create_core_temp_group(priv, i);
+				if (rc)
+					return rc;
+			}
+		}
+
+		rc = sysfs_create_groups(&priv->hwmon_dev->kobj,
+					 priv->core_attr_groups);
+	}
+
+	return rc;
+}
+
+/* DIMM temperature */
+static ssize_t (*const dimm_show_fn[DIMM_TEMP_ATTRS]) (struct device *dev,
+		struct device_attribute *devattr, char *buf) = {
+	show_dimm_label,
+	show_dimm_temp,
+};
+
+static const char *const dimm_suffix[DIMM_TEMP_ATTRS] = {
+	"label",
+	"input",
+};
+
+static int check_populated_dimms(struct peci_hwmon *priv)
+{
+	struct peci_rd_pkg_cfg_msg msg;
+	int i, rc, pass = 0;
+
+do_scan:
+	for (i = 0; i < (DIMM_SLOT_NUMS_MAX / 2); i++) {
+		msg.addr = priv->addr;
+		msg.index = MBX_INDEX_DDR_DIMM_TEMP;
+		msg.param = i; /* channel */
+		msg.rx_len = 4;
+
+		rc = send_peci_cmd(priv, PECI_CMD_RD_PKG_CFG, (void *)&msg);
+		if (rc < 0)
+			return rc;
+
+		if (msg.pkg_config[0]) /* DIMM #0 on the channel */
+			priv->dimm_mask |= BIT(i);
+
+		if (msg.pkg_config[1]) /* DIMM #1 on the channel */
+			priv->dimm_mask |= BIT(i + 1);
+	}
+
+	/* Do 2-pass scanning */
+	if (priv->dimm_mask && pass == 0) {
+		pass++;
+		goto do_scan;
+	}
+
+	if (!priv->dimm_mask)
+		return -EAGAIN;
+
+	dev_dbg(priv->dev, "Scanned populated DIMMs: 0x%x\n", priv->dimm_mask);
+	return 0;
+}
+
+static int create_dimm_temp_group(struct peci_hwmon *priv, int dimm_no)
+{
+	struct dimm_temp_group *data;
+	int i;
+
+	data = devm_kzalloc(priv->dev, sizeof(struct dimm_temp_group),
+			    GFP_KERNEL);
+	if (!data)
+		return -ENOMEM;
+
+	for (i = 0; i < DIMM_TEMP_ATTRS; i++) {
+		snprintf(data->attr_name[i], ATTR_NAME_LEN,
+			 "temp%d_%s", priv->global_idx, dimm_suffix[i]);
+		sysfs_attr_init(&data->sd_attrs[i].dev_attr.attr);
+		data->sd_attrs[i].dev_attr.attr.name = data->attr_name[i];
+		data->sd_attrs[i].dev_attr.attr.mode = 0444;
+		data->sd_attrs[i].dev_attr.show = dimm_show_fn[i];
+		data->sd_attrs[i].index = dimm_no;
+		data->attrs[i] = &data->sd_attrs[i].dev_attr.attr;
+	}
+
+	data->attr_group.attrs = data->attrs;
+	priv->dimm_attr_groups[priv->dimm_idx++] = &data->attr_group;
+	priv->global_idx++;
+
+	return 0;
+}
+
+static int create_dimm_temp_groups(struct peci_hwmon *priv)
+{
+	int rc, i;
+
+	rc = check_populated_dimms(priv);
+	if (!rc) {
+		for (i = 0; i < DIMM_SLOT_NUMS_MAX; i++) {
+			if (priv->dimm_mask & BIT(i)) {
+				rc = create_dimm_temp_group(priv, i);
+				if (rc)
+					return rc;
+			}
+		}
+
+		rc = sysfs_create_groups(&priv->hwmon_dev->kobj,
+					 priv->dimm_attr_groups);
+		if (!rc)
+			dev_dbg(priv->dev, "Done DIMM temp group creation\n");
+	} else if (rc == -EAGAIN) {
+		queue_delayed_work(priv->work_queue, &priv->work_handler,
+				   DIMM_MASK_CHECK_DELAY);
+		dev_dbg(priv->dev, "Diferred DIMM temp group creation\n");
+	}
+
+	return rc;
+}
+
+static void create_dimm_temp_groups_delayed(struct work_struct *work)
+{
+	struct delayed_work *dwork = to_delayed_work(work);
+	struct peci_hwmon *priv = container_of(dwork, struct peci_hwmon,
+					       work_handler);
+	int rc;
+
+	rc = create_dimm_temp_groups(priv);
+	if (rc && rc != -EAGAIN)
+		dev_dbg(priv->dev, "Skipped to creat DIMM temp groups\n");
+}
+
+static int peci_hwmon_probe(struct peci_client *client)
+{
+	struct device *dev = &client->dev;
+	struct peci_hwmon *priv;
+	int rc;
+
+	if ((client->adapter->cmd_mask &
+	    (BIT(PECI_CMD_GET_TEMP) | BIT(PECI_CMD_RD_PKG_CFG))) !=
+	    (BIT(PECI_CMD_GET_TEMP) | BIT(PECI_CMD_RD_PKG_CFG))) {
+		dev_err(dev, "Client doesn't support temperature monitoring\n");
+		return -EINVAL;
+	}
+
+	priv = devm_kzalloc(dev, sizeof(*priv), GFP_KERNEL);
+	if (!priv)
+		return -ENOMEM;
+
+	dev_set_drvdata(dev, priv);
+	priv->client = client;
+	priv->dev = dev;
+	priv->addr = client->addr;
+	priv->cpu_no = priv->addr - PECI_BASE_ADDR;
+
+	snprintf(priv->name, PECI_NAME_SIZE, "peci_hwmon.cpu%d", priv->cpu_no);
+
+	priv->work_queue = create_singlethread_workqueue(priv->name);
+	if (!priv->work_queue)
+		return -ENOMEM;
+
+	priv->hwmon_dev = hwmon_device_register_with_groups(priv->dev,
+							    priv->name,
+							    priv,
+							   default_attr_groups);
+
+	rc = PTR_ERR_OR_ZERO(priv->hwmon_dev);
+	if (rc) {
+		dev_err(dev, "Failed to register peci hwmon\n");
+		return rc;
+	}
+
+	priv->global_idx = DEFAULT_ATTR_GRP_NUMS + 1;
+
+	rc = create_core_temp_groups(priv);
+	if (rc) {
+		dev_err(dev, "Failed to create core groups\n");
+		return rc;
+	}
+
+	INIT_DELAYED_WORK(&priv->work_handler, create_dimm_temp_groups_delayed);
+
+	rc = create_dimm_temp_groups(priv);
+	if (rc && rc != -EAGAIN)
+		dev_dbg(dev, "Skipped to creat DIMM temp groups\n");
+
+	dev_dbg(dev, "peci hwmon for CPU at 0x%x registered\n", priv->addr);
+
+	return 0;
+}
+
+static int peci_hwmon_remove(struct peci_client *client)
+{
+	struct peci_hwmon *priv = dev_get_drvdata(&client->dev);
+
+	cancel_delayed_work(&priv->work_handler);
+	destroy_workqueue(priv->work_queue);
+	sysfs_remove_groups(&priv->hwmon_dev->kobj, priv->core_attr_groups);
+	sysfs_remove_groups(&priv->hwmon_dev->kobj, priv->dimm_attr_groups);
+	hwmon_device_unregister(priv->hwmon_dev);
+
+	return 0;
+}
+
+static const struct of_device_id peci_of_table[] = {
+	{ .compatible = "intel,peci-hwmon", },
+	{ }
+};
+MODULE_DEVICE_TABLE(of, peci_of_table);
+
+static struct peci_driver peci_hwmon_driver = {
+	.probe  = peci_hwmon_probe,
+	.remove = peci_hwmon_remove,
+	.driver = {
+		.name           = "peci-hwmon",
+		.of_match_table = of_match_ptr(peci_of_table),
+	},
+};
+module_peci_driver(peci_hwmon_driver);
+
+MODULE_AUTHOR("Jae Hyun Yoo <jae.hyun.yoo@linux.intel.com>");
+MODULE_DESCRIPTION("PECI hwmon driver");
+MODULE_LICENSE("GPL v2");
-- 
2.16.1

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH v2 8/8] [PATCH 8/8] Add a maintainer for the PECI subsystem
  2018-02-21 16:15 [PATCH v2 0/8] PECI device driver introduction Jae Hyun Yoo
                   ` (6 preceding siblings ...)
  2018-02-21 16:16 ` [PATCH v2 7/8] [PATCH 7/8] drivers/hwmon: Add a generic " Jae Hyun Yoo
@ 2018-02-21 16:16 ` Jae Hyun Yoo
  2018-03-06 12:40 ` [PATCH v2 0/8] PECI device driver introduction Pavel Machek
  8 siblings, 0 replies; 46+ messages in thread
From: Jae Hyun Yoo @ 2018-02-21 16:16 UTC (permalink / raw)
  To: joel, andrew, arnd, gregkh, jdelvare, linux, benh, andrew
  Cc: linux-kernel, linux-doc, devicetree, linux-hwmon,
	linux-arm-kernel, openbmc, Jae Hyun Yoo

This commit adds a maintainer information for the PECI subsystem.

Signed-off-by: Jae Hyun Yoo <jae.hyun.yoo@linux.intel.com>
---
 MAINTAINERS | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 93a12af4f180..f9c302cbb76b 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -10830,6 +10830,15 @@ L:	platform-driver-x86@vger.kernel.org
 S:	Maintained
 F:	drivers/platform/x86/peaq-wmi.c
 
+PECI SUBSYSTEM
+M:	Jae Hyun Yoo <jae.hyun.yoo@linux.intel.com>
+S:	Maintained
+F:	Documentation/devicetree/bindings/peci/
+F:	drivers/peci/
+F:	drivers/hwmon/peci-*.c
+F:	include/linux/peci.h
+F:	include/uapi/linux/peci-ioctl.h
+
 PER-CPU MEMORY ALLOCATOR
 M:	Tejun Heo <tj@kernel.org>
 M:	Christoph Lameter <cl@linux.com>
-- 
2.16.1

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* Re: [PATCH v2 1/8] [PATCH 1/8] drivers/peci: Add support for PECI bus driver core
  2018-02-21 16:15 ` [PATCH v2 1/8] [PATCH 1/8] drivers/peci: Add support for PECI bus driver core Jae Hyun Yoo
@ 2018-02-21 17:04   ` Andrew Lunn
  2018-02-21 20:31     ` Jae Hyun Yoo
  2018-02-21 17:58   ` Greg KH
                     ` (3 subsequent siblings)
  4 siblings, 1 reply; 46+ messages in thread
From: Andrew Lunn @ 2018-02-21 17:04 UTC (permalink / raw)
  To: Jae Hyun Yoo
  Cc: joel, andrew, arnd, gregkh, jdelvare, linux, benh, linux-kernel,
	linux-doc, devicetree, linux-hwmon, linux-arm-kernel, openbmc

> +static int peci_locked_xfer(struct peci_adapter *adapter,
> +			    struct peci_xfer_msg *msg,
> +			    bool do_retry,
> +			    bool has_aw_fcs)
> +{
> +	ktime_t start, end;
> +	s64 elapsed_ms;
> +	int rc = 0;
> +
> +	if (!adapter->xfer) {
> +		dev_dbg(&adapter->dev, "PECI level transfers not supported\n");
> +		return -ENODEV;
> +	}
> +
> +	if (in_atomic() || irqs_disabled()) {

Hi Jae

Is there a real need to do transfers in atomic context, or with
interrupts disabled? 

> +		rt_mutex_trylock(&adapter->bus_lock);
> +		if (!rc)
> +			return -EAGAIN; /* PECI activity is ongoing */
> +	} else {
> +		rt_mutex_lock(&adapter->bus_lock);
> +	}
> +
> +	if (do_retry)
> +		start = ktime_get();
> +
> +	do {
> +		rc = adapter->xfer(adapter, msg);
> +
> +		if (!do_retry)
> +			break;
> +
> +		/* Per the PECI spec, need to retry commands that return 0x8x */
> +		if (!(!rc && ((msg->rx_buf[0] & DEV_PECI_CC_RETRY_ERR_MASK) ==
> +			      DEV_PECI_CC_TIMEOUT)))
> +			break;
> +
> +		/* Set the retry bit to indicate a retry attempt */
> +		msg->tx_buf[1] |= DEV_PECI_RETRY_BIT;
> +
> +		/* Recalculate the AW FCS if it has one */
> +		if (has_aw_fcs)
> +			msg->tx_buf[msg->tx_len - 1] = 0x80 ^
> +						peci_aw_fcs((u8 *)msg,
> +							    2 + msg->tx_len);
> +
> +		/* Retry for at least 250ms before returning an error */
> +		end = ktime_get();
> +		elapsed_ms = ktime_to_ms(ktime_sub(end, start));
> +		if (elapsed_ms >= DEV_PECI_RETRY_TIME_MS) {
> +			dev_dbg(&adapter->dev, "Timeout retrying xfer!\n");
> +			break;
> +		}
> +	} while (true);

So you busy loop to 1/4 second? How about putting a sleep in here so
other things can be done between each retry.

And should it not return -ETIMEDOUT after that 1/4 second?

> +static int peci_scan_cmd_mask(struct peci_adapter *adapter)
> +{
> +	struct peci_xfer_msg msg;
> +	u32 dib;
> +	int rc = 0;
> +
> +	/* Update command mask just once */
> +	if (adapter->cmd_mask & BIT(PECI_CMD_PING))
> +		return 0;
> +
> +	msg.addr      = PECI_BASE_ADDR;
> +	msg.tx_len    = GET_DIB_WR_LEN;
> +	msg.rx_len    = GET_DIB_RD_LEN;
> +	msg.tx_buf[0] = GET_DIB_PECI_CMD;
> +
> +	rc = peci_xfer(adapter, &msg);
> +	if (rc < 0) {
> +		dev_dbg(&adapter->dev, "PECI xfer error, rc : %d\n", rc);
> +		return rc;
> +	}
> +
> +	dib = msg.rx_buf[0] | (msg.rx_buf[1] << 8) |
> +	      (msg.rx_buf[2] << 16) | (msg.rx_buf[3] << 24);
> +
> +	/* Check special case for Get DIB command */
> +	if (dib == 0x00) {
> +		dev_dbg(&adapter->dev, "DIB read as 0x00\n");
> +		return -1;
> +	}
> +
> +	if (!rc) {
> +		/**
> +		 * setting up the supporting commands based on minor rev#
> +		 * see PECI Spec Table 3-1
> +		 */
> +		dib = (dib >> 8) & 0xF;
> +
> +		if (dib >= 0x1) {
> +			adapter->cmd_mask |= BIT(PECI_CMD_RD_PKG_CFG);
> +			adapter->cmd_mask |= BIT(PECI_CMD_WR_PKG_CFG);
> +		}
> +
> +		if (dib >= 0x2)
> +			adapter->cmd_mask |= BIT(PECI_CMD_RD_IA_MSR);
> +
> +		if (dib >= 0x3) {
> +			adapter->cmd_mask |= BIT(PECI_CMD_RD_PCI_CFG_LOCAL);
> +			adapter->cmd_mask |= BIT(PECI_CMD_WR_PCI_CFG_LOCAL);
> +		}
> +
> +		if (dib >= 0x4)
> +			adapter->cmd_mask |= BIT(PECI_CMD_RD_PCI_CFG);
> +
> +		if (dib >= 0x5)
> +			adapter->cmd_mask |= BIT(PECI_CMD_WR_PCI_CFG);
> +
> +		if (dib >= 0x6)
> +			adapter->cmd_mask |= BIT(PECI_CMD_WR_IA_MSR);

Lots of magic numbers here. Can they be replaced with #defines.  Also,
it looks like a switch statement could be used, with fall through.

> +
> +		adapter->cmd_mask |= BIT(PECI_CMD_GET_TEMP);
> +		adapter->cmd_mask |= BIT(PECI_CMD_GET_DIB);
> +		adapter->cmd_mask |= BIT(PECI_CMD_PING);
> +	} else {
> +		dev_dbg(&adapter->dev, "Error reading DIB, rc : %d\n", rc);
> +	}
> +
> +	return rc;
> +}
> +

> +static int peci_ioctl_get_temp(struct peci_adapter *adapter, void *vmsg)
> +{
> +	struct peci_get_temp_msg *umsg = vmsg;
> +	struct peci_xfer_msg msg;
> +	int rc;
> +

Is this getting the temperature?

> +	rc = peci_cmd_support(adapter, PECI_CMD_GET_TEMP);
> +	if (rc < 0)
> +		return rc;
> +
> +	msg.addr      = umsg->addr;
> +	msg.tx_len    = GET_TEMP_WR_LEN;
> +	msg.rx_len    = GET_TEMP_RD_LEN;
> +	msg.tx_buf[0] = GET_TEMP_PECI_CMD;
> +
> +	rc = peci_xfer(adapter, &msg);
> +	if (rc < 0)
> +		return rc;
> +
> +	umsg->temp_raw = msg.rx_buf[0] | (msg.rx_buf[1] << 8);
> +
> +	return 0;
> +}



> +static long peci_ioctl(struct file *file, unsigned int iocmd, unsigned long arg)
> +{
> +	struct peci_adapter *adapter = file->private_data;
> +	void __user *argp = (void __user *)arg;
> +	unsigned int msg_len;
> +	enum peci_cmd cmd;
> +	u8 *msg;
> +	int rc = 0;
> +
> +	dev_dbg(&adapter->dev, "ioctl, cmd=0x%x, arg=0x%lx\n", iocmd, arg);
> +
> +	switch (iocmd) {
> +	case PECI_IOC_PING:
> +	case PECI_IOC_GET_DIB:
> +	case PECI_IOC_GET_TEMP:
> +	case PECI_IOC_RD_PKG_CFG:
> +	case PECI_IOC_WR_PKG_CFG:
> +	case PECI_IOC_RD_IA_MSR:
> +	case PECI_IOC_RD_PCI_CFG:
> +	case PECI_IOC_RD_PCI_CFG_LOCAL:
> +	case PECI_IOC_WR_PCI_CFG_LOCAL:
> +		cmd = _IOC_TYPE(iocmd) - PECI_IOC_BASE;
> +		msg_len = _IOC_SIZE(iocmd);
> +		break;

Adding new ioctl calls is pretty frowned up. Can you export this info
via /sysfs?

Also, should there be some permission checks here? Or is any user
allowed to call these ioctls?

	Andrew

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v2 2/8] [PATCH 2/8] Documentations: dt-bindings: Add a document of PECI adapter driver for Aspeed AST24xx/25xx SoCs
  2018-02-21 16:16 ` [PATCH v2 2/8] [PATCH 2/8] Documentations: dt-bindings: Add a document of PECI adapter driver for Aspeed AST24xx/25xx SoCs Jae Hyun Yoo
@ 2018-02-21 17:13   ` Andrew Lunn
  2018-02-21 20:35     ` Jae Hyun Yoo
  2018-03-06 12:40   ` Pavel Machek
  1 sibling, 1 reply; 46+ messages in thread
From: Andrew Lunn @ 2018-02-21 17:13 UTC (permalink / raw)
  To: Jae Hyun Yoo
  Cc: joel, andrew, arnd, gregkh, jdelvare, linux, benh, linux-kernel,
	linux-doc, devicetree, linux-hwmon, linux-arm-kernel, openbmc

On Wed, Feb 21, 2018 at 08:16:00AM -0800, Jae Hyun Yoo wrote:
> This commit adds a dt-bindings document of PECI adapter driver for Aspeed
> AST24xx/25xx SoCs.

Hi Jae

It would be good to separate this into two. One binding document for a
generic adaptor, with a generic PECI bus, and generic client
devices. List all the properties you expect at the generic level.

Then have an aspeed specific binding for those properties which are
specific to the Aspeed adaptor.

	 Andrew
 

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v2 1/8] [PATCH 1/8] drivers/peci: Add support for PECI bus driver core
  2018-02-21 16:15 ` [PATCH v2 1/8] [PATCH 1/8] drivers/peci: Add support for PECI bus driver core Jae Hyun Yoo
  2018-02-21 17:04   ` Andrew Lunn
@ 2018-02-21 17:58   ` Greg KH
  2018-02-21 20:42     ` Jae Hyun Yoo
  2018-02-22  7:01   ` kbuild test robot
                     ` (2 subsequent siblings)
  4 siblings, 1 reply; 46+ messages in thread
From: Greg KH @ 2018-02-21 17:58 UTC (permalink / raw)
  To: Jae Hyun Yoo
  Cc: joel, andrew, arnd, jdelvare, linux, benh, andrew, linux-kernel,
	linux-doc, devicetree, linux-hwmon, linux-arm-kernel, openbmc

On Wed, Feb 21, 2018 at 08:15:59AM -0800, Jae Hyun Yoo wrote:
> This commit adds driver implementation for PECI bus into linux
> driver framework.
> 
> Signed-off-by: Jae Hyun Yoo <jae.hyun.yoo@linux.intel.com>
> ---

Why is there no other Intel developers willing to review and sign off on
this patch?  Please get their review first before asking us to do their
work for them :)

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v2 7/8] [PATCH 7/8] drivers/hwmon: Add a generic PECI hwmon client driver
  2018-02-21 16:16 ` [PATCH v2 7/8] [PATCH 7/8] drivers/hwmon: Add a generic " Jae Hyun Yoo
@ 2018-02-21 18:26   ` Guenter Roeck
  2018-02-21 21:24     ` Jae Hyun Yoo
  2018-03-13  9:32   ` Stef van Os
  1 sibling, 1 reply; 46+ messages in thread
From: Guenter Roeck @ 2018-02-21 18:26 UTC (permalink / raw)
  To: Jae Hyun Yoo
  Cc: joel, andrew, arnd, gregkh, jdelvare, benh, andrew, linux-kernel,
	linux-doc, devicetree, linux-hwmon, linux-arm-kernel, openbmc

On Wed, Feb 21, 2018 at 08:16:05AM -0800, Jae Hyun Yoo wrote:
> This commit adds a generic PECI hwmon client driver implementation.
> 
> Signed-off-by: Jae Hyun Yoo <jae.hyun.yoo@linux.intel.com>
> ---
>  drivers/hwmon/Kconfig      |  10 +
>  drivers/hwmon/Makefile     |   1 +
>  drivers/hwmon/peci-hwmon.c | 928 +++++++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 939 insertions(+)
>  create mode 100644 drivers/hwmon/peci-hwmon.c
> 
> diff --git a/drivers/hwmon/Kconfig b/drivers/hwmon/Kconfig
> index ef23553ff5cb..f22e0c31f597 100644
> --- a/drivers/hwmon/Kconfig
> +++ b/drivers/hwmon/Kconfig
> @@ -1246,6 +1246,16 @@ config SENSORS_NCT7904
>  	  This driver can also be built as a module.  If so, the module
>  	  will be called nct7904.
>  
> +config SENSORS_PECI_HWMON
> +	tristate "PECI hwmon support"
> +	depends on PECI
> +	help
> +	  If you say yes here you get support for the generic PECI hwmon
> +	  driver.
> +
> +	  This driver can also be built as a module.  If so, the module
> +	  will be called peci-hwmon.
> +
>  config SENSORS_NSA320
>  	tristate "ZyXEL NSA320 and compatible fan speed and temperature sensors"
>  	depends on GPIOLIB && OF
> diff --git a/drivers/hwmon/Makefile b/drivers/hwmon/Makefile
> index f814b4ace138..946f54b168e5 100644
> --- a/drivers/hwmon/Makefile
> +++ b/drivers/hwmon/Makefile
> @@ -135,6 +135,7 @@ obj-$(CONFIG_SENSORS_NCT7802)	+= nct7802.o
>  obj-$(CONFIG_SENSORS_NCT7904)	+= nct7904.o
>  obj-$(CONFIG_SENSORS_NSA320)	+= nsa320-hwmon.o
>  obj-$(CONFIG_SENSORS_NTC_THERMISTOR)	+= ntc_thermistor.o
> +obj-$(CONFIG_SENSORS_PECI_HWMON)	+= peci-hwmon.o
>  obj-$(CONFIG_SENSORS_PC87360)	+= pc87360.o
>  obj-$(CONFIG_SENSORS_PC87427)	+= pc87427.o
>  obj-$(CONFIG_SENSORS_PCF8591)	+= pcf8591.o
> diff --git a/drivers/hwmon/peci-hwmon.c b/drivers/hwmon/peci-hwmon.c
> new file mode 100644
> index 000000000000..edd27744adcb
> --- /dev/null
> +++ b/drivers/hwmon/peci-hwmon.c
> @@ -0,0 +1,928 @@
> +// SPDX-License-Identifier: GPL-2.0
> +// Copyright (c) 2018 Intel Corporation
> +
> +#include <linux/delay.h>
> +#include <linux/hwmon.h>
> +#include <linux/hwmon-sysfs.h>
> +#include <linux/jiffies.h>
> +#include <linux/module.h>
> +#include <linux/of_device.h>
> +#include <linux/peci.h>
> +#include <linux/workqueue.h>
> +
> +#define DIMM_SLOT_NUMS_MAX    12  /* Max DIMM numbers (channel ranks x 2) */
> +#define CORE_NUMS_MAX         28  /* Max core numbers (max on SKX Platinum) */
> +#define TEMP_TYPE_PECI        6   /* Sensor type 6: Intel PECI */
> +
> +#define CORE_TEMP_ATTRS       5
> +#define DIMM_TEMP_ATTRS       2
> +#define ATTR_NAME_LEN         24
> +
> +#define DEFAULT_ATTR_GRP_NUMS 5
> +
> +#define UPDATE_INTERVAL_MIN   HZ
> +#define DIMM_MASK_CHECK_DELAY msecs_to_jiffies(5000)
> +
> +enum sign {
> +	POS,
> +	NEG
> +};
> +
> +struct temp_data {
> +	bool valid;
> +	s32  value;
> +	unsigned long last_updated;
> +};
> +
> +struct temp_group {
> +	struct temp_data tjmax;
> +	struct temp_data tcontrol;
> +	struct temp_data tthrottle;
> +	struct temp_data dts_margin;
> +	struct temp_data die;
> +	struct temp_data core[CORE_NUMS_MAX];
> +	struct temp_data dimm[DIMM_SLOT_NUMS_MAX];
> +};
> +
> +struct core_temp_group {
> +	struct sensor_device_attribute sd_attrs[CORE_TEMP_ATTRS];
> +	char attr_name[CORE_TEMP_ATTRS][ATTR_NAME_LEN];
> +	struct attribute *attrs[CORE_TEMP_ATTRS + 1];
> +	struct attribute_group attr_group;
> +};
> +
> +struct dimm_temp_group {
> +	struct sensor_device_attribute sd_attrs[DIMM_TEMP_ATTRS];
> +	char attr_name[DIMM_TEMP_ATTRS][ATTR_NAME_LEN];
> +	struct attribute *attrs[DIMM_TEMP_ATTRS + 1];
> +	struct attribute_group attr_group;
> +};
> +
> +struct peci_hwmon {
> +	struct peci_client *client;
> +	struct device *dev;
> +	struct device *hwmon_dev;
> +	struct workqueue_struct *work_queue;
> +	struct delayed_work work_handler;
> +	char name[PECI_NAME_SIZE];
> +	struct temp_group temp;
> +	u8 addr;
> +	uint cpu_no;
> +	u32 core_mask;
> +	u32 dimm_mask;
> +	const struct attribute_group *core_attr_groups[CORE_NUMS_MAX + 1];
> +	const struct attribute_group *dimm_attr_groups[DIMM_SLOT_NUMS_MAX + 1];
> +	uint global_idx;
> +	uint core_idx;
> +	uint dimm_idx;
> +};
> +
> +enum label {
> +	L_DIE,
> +	L_DTS,
> +	L_TCONTROL,
> +	L_TTHROTTLE,
> +	L_TJMAX,
> +	L_MAX
> +};
> +
> +static const char *peci_label[L_MAX] = {
> +	"Die\n",
> +	"DTS margin to Tcontrol\n",
> +	"Tcontrol\n",
> +	"Tthrottle\n",
> +	"Tjmax\n",
> +};
> +
> +static int send_peci_cmd(struct peci_hwmon *priv, enum peci_cmd cmd, void *msg)
> +{
> +	return peci_command(priv->client->adapter, cmd, msg);
> +}
> +
> +static int need_update(struct temp_data *temp)
> +{
> +	if (temp->valid &&
> +	    time_before(jiffies, temp->last_updated + UPDATE_INTERVAL_MIN))
> +		return 0;
> +
> +	return 1;
> +}
> +
> +static s32 ten_dot_six_to_millidegree(s32 x)
> +{
> +	return ((((x) ^ 0x8000) - 0x8000) * 1000 / 64);
> +}
> +
> +static int get_tjmax(struct peci_hwmon *priv)
> +{
> +	struct peci_rd_pkg_cfg_msg msg;
> +	int rc;
> +
> +	if (!priv->temp.tjmax.valid) {
> +		msg.addr = priv->addr;
> +		msg.index = MBX_INDEX_TEMP_TARGET;
> +		msg.param = 0;
> +		msg.rx_len = 4;
> +
> +		rc = send_peci_cmd(priv, PECI_CMD_RD_PKG_CFG, (void *)&msg);

Is a typecast to a void * necessary ?

> +		if (rc < 0)
> +			return rc;
> +
> +		priv->temp.tjmax.value = (s32)msg.pkg_config[2] * 1000;
> +		priv->temp.tjmax.valid = true;
> +	}
> +
> +	return 0;
> +}
> +
> +static int get_tcontrol(struct peci_hwmon *priv)
> +{
> +	struct peci_rd_pkg_cfg_msg msg;
> +	s32 tcontrol_margin;
> +	int rc;
> +
> +	if (!need_update(&priv->temp.tcontrol))
> +		return 0;
> +
> +	rc = get_tjmax(priv);
> +	if (rc < 0)
> +		return rc;
> +
> +	msg.addr = priv->addr;
> +	msg.index = MBX_INDEX_TEMP_TARGET;
> +	msg.param = 0;
> +	msg.rx_len = 4;
> +
> +	rc = send_peci_cmd(priv, PECI_CMD_RD_PKG_CFG, (void *)&msg);
> +	if (rc < 0)
> +		return rc;
> +
> +	tcontrol_margin = msg.pkg_config[1];
> +	tcontrol_margin = ((tcontrol_margin ^ 0x80) - 0x80) * 1000;
> +
> +	priv->temp.tcontrol.value = priv->temp.tjmax.value - tcontrol_margin;
> +
> +	if (!priv->temp.tcontrol.valid) {
> +		priv->temp.tcontrol.last_updated = INITIAL_JIFFIES;
> +		priv->temp.tcontrol.valid = true;
> +	} else {
> +		priv->temp.tcontrol.last_updated = jiffies;
> +	}
> +
> +	return 0;
> +}
> +
> +static int get_tthrottle(struct peci_hwmon *priv)
> +{
> +	struct peci_rd_pkg_cfg_msg msg;
> +	s32 tthrottle_offset;
> +	int rc;
> +
> +	if (!need_update(&priv->temp.tthrottle))
> +		return 0;
> +
> +	rc = get_tjmax(priv);
> +	if (rc < 0)
> +		return rc;
> +
> +	msg.addr = priv->addr;
> +	msg.index = MBX_INDEX_TEMP_TARGET;
> +	msg.param = 0;
> +	msg.rx_len = 4;
> +
> +	rc = send_peci_cmd(priv, PECI_CMD_RD_PKG_CFG, (void *)&msg);
> +	if (rc < 0)
> +		return rc;
> +
> +	tthrottle_offset = (msg.pkg_config[3] & 0x2f) * 1000;
> +	priv->temp.tthrottle.value = priv->temp.tjmax.value - tthrottle_offset;
> +
> +	if (!priv->temp.tthrottle.valid) {
> +		priv->temp.tthrottle.last_updated = INITIAL_JIFFIES;
> +		priv->temp.tthrottle.valid = true;
> +	} else {
> +		priv->temp.tthrottle.last_updated = jiffies;
> +	}
> +
> +	return 0;
> +}
> +
> +static int get_die_temp(struct peci_hwmon *priv)
> +{
> +	struct peci_get_temp_msg msg;
> +	int rc;
> +
> +	if (!need_update(&priv->temp.die))
> +		return 0;
> +
> +	rc = get_tjmax(priv);
> +	if (rc < 0)
> +		return rc;
> +
> +	msg.addr = priv->addr;
> +
> +	rc = send_peci_cmd(priv, PECI_CMD_GET_TEMP, (void *)&msg);
> +	if (rc < 0)
> +		return rc;
> +
> +	priv->temp.die.value = priv->temp.tjmax.value +
> +			       ((s32)msg.temp_raw * 1000 / 64);
> +
> +	if (!priv->temp.die.valid) {
> +		priv->temp.die.last_updated = INITIAL_JIFFIES;
> +		priv->temp.die.valid = true;
> +	} else {
> +		priv->temp.die.last_updated = jiffies;
> +	}
> +
> +	return 0;
> +}
> +
> +static int get_dts_margin(struct peci_hwmon *priv)
> +{
> +	struct peci_rd_pkg_cfg_msg msg;
> +	s32 dts_margin;
> +	int rc;
> +
> +	if (!need_update(&priv->temp.dts_margin))
> +		return 0;
> +
> +	msg.addr = priv->addr;
> +	msg.index = MBX_INDEX_DTS_MARGIN;
> +	msg.param = 0;
> +	msg.rx_len = 4;
> +
> +	rc = send_peci_cmd(priv, PECI_CMD_RD_PKG_CFG, (void *)&msg);
> +	if (rc < 0)
> +		return rc;
> +
> +	dts_margin = (msg.pkg_config[1] << 8) | msg.pkg_config[0];
> +
> +	/**
> +	 * Processors return a value of DTS reading in 10.6 format
> +	 * (10 bits signed decimal, 6 bits fractional).
> +	 * Error codes:
> +	 *   0x8000: General sensor error
> +	 *   0x8001: Reserved
> +	 *   0x8002: Underflow on reading value
> +	 *   0x8003-0x81ff: Reserved
> +	 */
> +	if (dts_margin >= 0x8000 && dts_margin <= 0x81ff)
> +		return -1;
> +
> +	dts_margin = ten_dot_six_to_millidegree(dts_margin);
> +
> +	priv->temp.dts_margin.value = dts_margin;
> +
> +	if (!priv->temp.dts_margin.valid) {
> +		priv->temp.dts_margin.last_updated = INITIAL_JIFFIES;
> +		priv->temp.dts_margin.valid = true;
> +	} else {
> +		priv->temp.dts_margin.last_updated = jiffies;
> +	}
> +
> +	return 0;
> +}
> +
> +static int get_core_temp(struct peci_hwmon *priv, int core_index)
> +{
> +	struct peci_rd_pkg_cfg_msg msg;
> +	s32 core_dts_margin;
> +	int rc;
> +
> +	if (!need_update(&priv->temp.core[core_index]))
> +		return 0;
> +
> +	rc = get_tjmax(priv);
> +	if (rc < 0)
> +		return rc;
> +
> +	msg.addr = priv->addr;
> +	msg.index = MBX_INDEX_PER_CORE_DTS_TEMP;
> +	msg.param = core_index;
> +	msg.rx_len = 4;
> +
> +	rc = send_peci_cmd(priv, PECI_CMD_RD_PKG_CFG, (void *)&msg);
> +	if (rc < 0)
> +		return rc;
> +
> +	core_dts_margin = (msg.pkg_config[1] << 8) | msg.pkg_config[0];
> +
> +	/**
> +	 * Processors return a value of the core DTS reading in 10.6 format
> +	 * (10 bits signed decimal, 6 bits fractional).
> +	 * Error codes:
> +	 *   0x8000: General sensor error
> +	 *   0x8001: Reserved
> +	 *   0x8002: Underflow on reading value
> +	 *   0x8003-0x81ff: Reserved
> +	 */
> +	if (core_dts_margin >= 0x8000 && core_dts_margin <= 0x81ff)
> +		return -1;

Please use valid error codes. This value is returned to user space,
and I don't think this error should be EPERM.

> +
> +	core_dts_margin = ten_dot_six_to_millidegree(core_dts_margin);
> +
> +	priv->temp.core[core_index].value = priv->temp.tjmax.value +
> +					    core_dts_margin;
> +
> +	if (!priv->temp.core[core_index].valid) {
> +		priv->temp.core[core_index].last_updated = INITIAL_JIFFIES;
> +		priv->temp.core[core_index].valid = true;
> +	} else {
> +		priv->temp.core[core_index].last_updated = jiffies;
> +	}

I don't understand the purpose of this code. Why not just set valid = true
and last_updated = jiffies ? Why set anything to INITIAL_JIFFIES some
arbitrary time after boot ? AFAICS the first read will always be followed
by another immediately afterwards if the user requests two readings in
a row. Maybe that is intentional, but not to me. If this code is on
purpose, it will require a detailed explanation.

> +
> +	return 0;
> +}
> +
> +static int get_dimm_temp(struct peci_hwmon *priv, int dimm_index)
> +{
> +	struct peci_rd_pkg_cfg_msg msg;
> +	int channel = dimm_index / 2;
> +	int dimm_order = dimm_index % 2;
> +	int rc;
> +
> +	if (!need_update(&priv->temp.dimm[dimm_index]))
> +		return 0;
> +
> +	msg.addr = priv->addr;
> +	msg.index = MBX_INDEX_DDR_DIMM_TEMP;
> +	msg.param = channel;
> +	msg.rx_len = 4;
> +
> +	rc = send_peci_cmd(priv, PECI_CMD_RD_PKG_CFG, (void *)&msg);
> +	if (rc < 0)
> +		return rc;
> +
> +	priv->temp.dimm[dimm_index].value = msg.pkg_config[dimm_order] * 1000;
> +
> +	if (!priv->temp.dimm[dimm_index].valid) {
> +		priv->temp.dimm[dimm_index].last_updated = INITIAL_JIFFIES;
> +		priv->temp.dimm[dimm_index].valid = true;
> +	} else {
> +		priv->temp.dimm[dimm_index].last_updated = jiffies;
> +	}
> +
> +	return 0;
> +}
> +
> +static ssize_t show_tcontrol(struct device *dev,
> +			     struct device_attribute *attr,
> +			     char *buf)
> +{
> +	struct peci_hwmon *priv = dev_get_drvdata(dev);
> +	int rc;
> +
> +	rc = get_tcontrol(priv);
> +	if (rc < 0)
> +		return rc;
> +
> +	return sprintf(buf, "%d\n", priv->temp.tcontrol.value);
> +}
> +
> +static ssize_t show_tcontrol_margin(struct device *dev,
> +				    struct device_attribute *attr,
> +				    char *buf)
> +{
> +	struct peci_hwmon *priv = dev_get_drvdata(dev);
> +	struct sensor_device_attribute *sensor_attr = to_sensor_dev_attr(attr);
> +	int rc;
> +
> +	rc = get_tcontrol(priv);
> +	if (rc < 0)
> +		return rc;
> +
> +	return sprintf(buf, "%d\n", sensor_attr->index == POS ?
> +				    priv->temp.tjmax.value -
> +				    priv->temp.tcontrol.value :
> +				    priv->temp.tcontrol.value -
> +				    priv->temp.tjmax.value);
> +}
> +
> +static ssize_t show_tthrottle(struct device *dev,
> +			      struct device_attribute *attr,
> +			      char *buf)
> +{
> +	struct peci_hwmon *priv = dev_get_drvdata(dev);
> +	int rc;
> +
> +	rc = get_tthrottle(priv);
> +	if (rc < 0)
> +		return rc;
> +
> +	return sprintf(buf, "%d\n", priv->temp.tthrottle.value);
> +}
> +
> +static ssize_t show_tjmax(struct device *dev,
> +			  struct device_attribute *attr,
> +			  char *buf)
> +{
> +	struct peci_hwmon *priv = dev_get_drvdata(dev);
> +	int rc;
> +
> +	rc = get_tjmax(priv);
> +	if (rc < 0)
> +		return rc;
> +
> +	return sprintf(buf, "%d\n", priv->temp.tjmax.value);
> +}
> +
> +static ssize_t show_die_temp(struct device *dev,
> +			     struct device_attribute *attr,
> +			     char *buf)
> +{
> +	struct peci_hwmon *priv = dev_get_drvdata(dev);
> +	int rc;
> +
> +	rc = get_die_temp(priv);
> +	if (rc < 0)
> +		return rc;
> +
> +	return sprintf(buf, "%d\n", priv->temp.die.value);
> +}
> +
> +static ssize_t show_dts_margin(struct device *dev,
> +			       struct device_attribute *attr,
> +			       char *buf)
> +{
> +	struct peci_hwmon *priv = dev_get_drvdata(dev);
> +	int rc;
> +
> +	rc = get_dts_margin(priv);
> +	if (rc < 0)
> +		return rc;
> +
> +	return sprintf(buf, "%d\n", priv->temp.dts_margin.value);
> +}
> +
> +static ssize_t show_core_temp(struct device *dev,
> +			      struct device_attribute *attr,
> +			      char *buf)
> +{
> +	struct peci_hwmon *priv = dev_get_drvdata(dev);
> +	struct sensor_device_attribute *sensor_attr = to_sensor_dev_attr(attr);
> +	int core_index = sensor_attr->index;
> +	int rc;
> +
> +	rc = get_core_temp(priv, core_index);
> +	if (rc < 0)
> +		return rc;
> +
> +	return sprintf(buf, "%d\n", priv->temp.core[core_index].value);
> +}
> +
> +static ssize_t show_dimm_temp(struct device *dev,
> +			      struct device_attribute *attr,
> +			      char *buf)
> +{
> +	struct peci_hwmon *priv = dev_get_drvdata(dev);
> +	struct sensor_device_attribute *sensor_attr = to_sensor_dev_attr(attr);
> +	int dimm_index = sensor_attr->index;
> +	int rc;
> +
> +	rc = get_dimm_temp(priv, dimm_index);
> +	if (rc < 0)
> +		return rc;
> +
> +	return sprintf(buf, "%d\n", priv->temp.dimm[dimm_index].value);
> +}
> +
> +static ssize_t show_value(struct device *dev,
> +			  struct device_attribute *attr,
> +			  char *buf)
> +{
> +	struct sensor_device_attribute *sensor_attr = to_sensor_dev_attr(attr);
> +
> +	return sprintf(buf, "%d\n", sensor_attr->index);
> +}
> +
> +static ssize_t show_label(struct device *dev,
> +			  struct device_attribute *attr,
> +			  char *buf)
> +{
> +	struct sensor_device_attribute *sensor_attr = to_sensor_dev_attr(attr);
> +
> +	return sprintf(buf, peci_label[sensor_attr->index]);
> +}
> +
> +static ssize_t show_core_label(struct device *dev,
> +			       struct device_attribute *attr,
> +			       char *buf)
> +{
> +	struct sensor_device_attribute *sensor_attr = to_sensor_dev_attr(attr);
> +
> +	return sprintf(buf, "Core %d\n", sensor_attr->index);
> +}
> +
> +static ssize_t show_dimm_label(struct device *dev,
> +			       struct device_attribute *attr,
> +			       char *buf)
> +{
> +	struct sensor_device_attribute *sensor_attr = to_sensor_dev_attr(attr);
> +
> +	char channel = 'A' + (sensor_attr->index / 2);
> +	int index = sensor_attr->index % 2;
> +
> +	return sprintf(buf, "DIMM %d (%c%d)\n",
> +		       sensor_attr->index, channel, index);
> +}
> +
> +/* Die temperature */
> +static SENSOR_DEVICE_ATTR(temp1_label, 0444, show_label, NULL, L_DIE);
> +static SENSOR_DEVICE_ATTR(temp1_input, 0444, show_die_temp, NULL, 0);
> +static SENSOR_DEVICE_ATTR(temp1_max, 0444, show_tcontrol, NULL, 0);
> +static SENSOR_DEVICE_ATTR(temp1_crit, 0444, show_tjmax, NULL, 0);
> +static SENSOR_DEVICE_ATTR(temp1_crit_hyst, 0444, show_tcontrol_margin, NULL,
> +			  POS);
> +
> +static struct attribute *die_temp_attrs[] = {
> +	&sensor_dev_attr_temp1_label.dev_attr.attr,
> +	&sensor_dev_attr_temp1_input.dev_attr.attr,
> +	&sensor_dev_attr_temp1_max.dev_attr.attr,
> +	&sensor_dev_attr_temp1_crit.dev_attr.attr,
> +	&sensor_dev_attr_temp1_crit_hyst.dev_attr.attr,
> +	NULL
> +};
> +
> +static struct attribute_group die_temp_attr_group = {
> +	.attrs = die_temp_attrs,
> +};
> +
> +/* DTS margin temperature */
> +static SENSOR_DEVICE_ATTR(temp2_label, 0444, show_label, NULL, L_DTS);
> +static SENSOR_DEVICE_ATTR(temp2_input, 0444, show_dts_margin, NULL, 0);
> +static SENSOR_DEVICE_ATTR(temp2_min, 0444, show_value, NULL, 0);
> +static SENSOR_DEVICE_ATTR(temp2_lcrit, 0444, show_tcontrol_margin, NULL, NEG);
> +
> +static struct attribute *dts_margin_temp_attrs[] = {
> +	&sensor_dev_attr_temp2_label.dev_attr.attr,
> +	&sensor_dev_attr_temp2_input.dev_attr.attr,
> +	&sensor_dev_attr_temp2_min.dev_attr.attr,
> +	&sensor_dev_attr_temp2_lcrit.dev_attr.attr,
> +	NULL
> +};
> +
> +static struct attribute_group dts_margin_temp_attr_group = {
> +	.attrs = dts_margin_temp_attrs,
> +};
> +
> +/* Tcontrol temperature */
> +static SENSOR_DEVICE_ATTR(temp3_label, 0444, show_label, NULL, L_TCONTROL);
> +static SENSOR_DEVICE_ATTR(temp3_input, 0444, show_tcontrol, NULL, 0);
> +static SENSOR_DEVICE_ATTR(temp3_crit, 0444, show_tjmax, NULL, 0);
> +
> +static struct attribute *tcontrol_temp_attrs[] = {
> +	&sensor_dev_attr_temp3_label.dev_attr.attr,
> +	&sensor_dev_attr_temp3_input.dev_attr.attr,
> +	&sensor_dev_attr_temp3_crit.dev_attr.attr,
> +	NULL
> +};
> +
> +static struct attribute_group tcontrol_temp_attr_group = {
> +	.attrs = tcontrol_temp_attrs,
> +};
> +
> +/* Tthrottle temperature */
> +static SENSOR_DEVICE_ATTR(temp4_label, 0444, show_label, NULL, L_TTHROTTLE);
> +static SENSOR_DEVICE_ATTR(temp4_input, 0444, show_tthrottle, NULL, 0);
> +
> +static struct attribute *tthrottle_temp_attrs[] = {
> +	&sensor_dev_attr_temp4_label.dev_attr.attr,
> +	&sensor_dev_attr_temp4_input.dev_attr.attr,
> +	NULL
> +};
> +
> +static struct attribute_group tthrottle_temp_attr_group = {
> +	.attrs = tthrottle_temp_attrs,
> +};
> +
> +/* Tjmax temperature */
> +static SENSOR_DEVICE_ATTR(temp5_label, 0444, show_label, NULL, L_TJMAX);
> +static SENSOR_DEVICE_ATTR(temp5_input, 0444, show_tjmax, NULL, 0);
> +
> +static struct attribute *tjmax_temp_attrs[] = {
> +	&sensor_dev_attr_temp5_label.dev_attr.attr,
> +	&sensor_dev_attr_temp5_input.dev_attr.attr,
> +	NULL
> +};
> +
> +static struct attribute_group tjmax_temp_attr_group = {
> +	.attrs = tjmax_temp_attrs,
> +};
> +
> +static const struct attribute_group *
> +default_attr_groups[DEFAULT_ATTR_GRP_NUMS + 1] = {
> +	&die_temp_attr_group,
> +	&dts_margin_temp_attr_group,
> +	&tcontrol_temp_attr_group,
> +	&tthrottle_temp_attr_group,
> +	&tjmax_temp_attr_group,
> +	NULL
> +};
> +
> +/* Core temperature */
> +static ssize_t (*const core_show_fn[CORE_TEMP_ATTRS]) (struct device *dev,
> +		struct device_attribute *devattr, char *buf) = {
> +	show_core_label,
> +	show_core_temp,
> +	show_tcontrol,
> +	show_tjmax,
> +	show_tcontrol_margin,
> +};
> +
> +static const char *const core_suffix[CORE_TEMP_ATTRS] = {
> +	"label",
> +	"input",
> +	"max",
> +	"crit",
> +	"crit_hyst",
> +};
> +
> +static int check_resolved_cores(struct peci_hwmon *priv)
> +{
> +	struct peci_rd_pci_cfg_local_msg msg;
> +	int rc;
> +
> +	if (!(priv->client->adapter->cmd_mask & BIT(PECI_CMD_RD_PCI_CFG_LOCAL)))
> +		return -EINVAL;
> +
> +	/* Get the RESOLVED_CORES register value */
> +	msg.addr = priv->addr;
> +	msg.bus = 1;
> +	msg.device = 30;
> +	msg.function = 3;
> +	msg.reg = 0xB4;
> +	msg.rx_len = 4;
> +
> +	rc = send_peci_cmd(priv, PECI_CMD_RD_PCI_CFG_LOCAL, (void *)&msg);
> +	if (rc < 0)
> +		return rc;
> +
> +	priv->core_mask = msg.pci_config[3] << 24 |
> +			  msg.pci_config[2] << 16 |
> +			  msg.pci_config[1] << 8 |
> +			  msg.pci_config[0];
> +
> +	if (!priv->core_mask)
> +		return -EAGAIN;
> +
> +	dev_dbg(priv->dev, "Scanned resolved cores: 0x%x\n", priv->core_mask);
> +	return 0;
> +}
> +
> +static int create_core_temp_group(struct peci_hwmon *priv, int core_no)
> +{
> +	struct core_temp_group *data;
> +	int i;
> +
> +	data = devm_kzalloc(priv->dev, sizeof(struct core_temp_group),
> +			    GFP_KERNEL);
> +	if (!data)
> +		return -ENOMEM;
> +
> +	for (i = 0; i < CORE_TEMP_ATTRS; i++) {
> +		snprintf(data->attr_name[i], ATTR_NAME_LEN,
> +			 "temp%d_%s", priv->global_idx, core_suffix[i]);
> +		sysfs_attr_init(&data->sd_attrs[i].dev_attr.attr);
> +		data->sd_attrs[i].dev_attr.attr.name = data->attr_name[i];
> +		data->sd_attrs[i].dev_attr.attr.mode = 0444;
> +		data->sd_attrs[i].dev_attr.show = core_show_fn[i];
> +		if (i == 0 || i == 1) /* label or temp */
> +			data->sd_attrs[i].index = core_no;
> +		data->attrs[i] = &data->sd_attrs[i].dev_attr.attr;
> +	}
> +
> +	data->attr_group.attrs = data->attrs;
> +	priv->core_attr_groups[priv->core_idx++] = &data->attr_group;
> +	priv->global_idx++;
> +
> +	return 0;
> +}
> +
> +static int create_core_temp_groups(struct peci_hwmon *priv)
> +{
> +	int rc, i;
> +
> +	rc = check_resolved_cores(priv);
> +	if (!rc) {
> +		for (i = 0; i < CORE_NUMS_MAX; i++) {
> +			if (priv->core_mask & BIT(i)) {
> +				rc = create_core_temp_group(priv, i);
> +				if (rc)
> +					return rc;
> +			}
> +		}
> +
> +		rc = sysfs_create_groups(&priv->hwmon_dev->kobj,
> +					 priv->core_attr_groups);
> +	}
> +
> +	return rc;
> +}
> +
> +/* DIMM temperature */
> +static ssize_t (*const dimm_show_fn[DIMM_TEMP_ATTRS]) (struct device *dev,
> +		struct device_attribute *devattr, char *buf) = {
> +	show_dimm_label,
> +	show_dimm_temp,
> +};
> +
> +static const char *const dimm_suffix[DIMM_TEMP_ATTRS] = {
> +	"label",
> +	"input",
> +};
> +
> +static int check_populated_dimms(struct peci_hwmon *priv)
> +{
> +	struct peci_rd_pkg_cfg_msg msg;
> +	int i, rc, pass = 0;
> +
> +do_scan:
> +	for (i = 0; i < (DIMM_SLOT_NUMS_MAX / 2); i++) {
> +		msg.addr = priv->addr;
> +		msg.index = MBX_INDEX_DDR_DIMM_TEMP;
> +		msg.param = i; /* channel */
> +		msg.rx_len = 4;
> +
> +		rc = send_peci_cmd(priv, PECI_CMD_RD_PKG_CFG, (void *)&msg);
> +		if (rc < 0)
> +			return rc;
> +
> +		if (msg.pkg_config[0]) /* DIMM #0 on the channel */
> +			priv->dimm_mask |= BIT(i);
> +
> +		if (msg.pkg_config[1]) /* DIMM #1 on the channel */
> +			priv->dimm_mask |= BIT(i + 1);

Each loop sets overlapping bits in dimm_mask. The first loop sets
bit 0 and 1, the second sets bit 1 and 2, and so on. I _think_ this 
should probably set bits (i*2) and (i*2+1). If so, I would suggest to
test the code in a system with more than one DIMM in more than one bank.

> +	}
> +
> +	/* Do 2-pass scanning */
> +	if (priv->dimm_mask && pass == 0) {
> +		pass++;
> +		goto do_scan;

This goto is only used to avoid a nested loops. Please don't do that.
If you want to avoid indentation levels, add another function.

Also, this will require an explanation why the loop is executed if
and only if a dimm is found the first time around.

> +	}
> +
> +	if (!priv->dimm_mask)
> +		return -EAGAIN;
> +
> +	dev_dbg(priv->dev, "Scanned populated DIMMs: 0x%x\n", priv->dimm_mask);
> +	return 0;
> +}
> +
> +static int create_dimm_temp_group(struct peci_hwmon *priv, int dimm_no)
> +{
> +	struct dimm_temp_group *data;
> +	int i;
> +
> +	data = devm_kzalloc(priv->dev, sizeof(struct dimm_temp_group),
> +			    GFP_KERNEL);
> +	if (!data)
> +		return -ENOMEM;
> +
> +	for (i = 0; i < DIMM_TEMP_ATTRS; i++) {
> +		snprintf(data->attr_name[i], ATTR_NAME_LEN,
> +			 "temp%d_%s", priv->global_idx, dimm_suffix[i]);
> +		sysfs_attr_init(&data->sd_attrs[i].dev_attr.attr);
> +		data->sd_attrs[i].dev_attr.attr.name = data->attr_name[i];
> +		data->sd_attrs[i].dev_attr.attr.mode = 0444;
> +		data->sd_attrs[i].dev_attr.show = dimm_show_fn[i];
> +		data->sd_attrs[i].index = dimm_no;
> +		data->attrs[i] = &data->sd_attrs[i].dev_attr.attr;
> +	}
> +
> +	data->attr_group.attrs = data->attrs;
> +	priv->dimm_attr_groups[priv->dimm_idx++] = &data->attr_group;
> +	priv->global_idx++;
> +
> +	return 0;
> +}
> +
> +static int create_dimm_temp_groups(struct peci_hwmon *priv)
> +{
> +	int rc, i;
> +
> +	rc = check_populated_dimms(priv);
> +	if (!rc) {
> +		for (i = 0; i < DIMM_SLOT_NUMS_MAX; i++) {
> +			if (priv->dimm_mask & BIT(i)) {
> +				rc = create_dimm_temp_group(priv, i);
> +				if (rc)
> +					return rc;
> +			}
> +		}
> +
> +		rc = sysfs_create_groups(&priv->hwmon_dev->kobj,
> +					 priv->dimm_attr_groups);
> +		if (!rc)
> +			dev_dbg(priv->dev, "Done DIMM temp group creation\n");
> +	} else if (rc == -EAGAIN) {
> +		queue_delayed_work(priv->work_queue, &priv->work_handler,
> +				   DIMM_MASK_CHECK_DELAY);
> +		dev_dbg(priv->dev, "Diferred DIMM temp group creation\n");

s/Diferred/Deferred/

If PECI never reports any DIMMS, this will be repeated forever until
it finds at least one group. Is this intentional ? If so, I would expect
some detailed explanation of the rationale. As it is, the DIMM temperatures
can show up randomly after some hours of runtime, which isn't exactly
deterministic. Maybe that does make sense, but it will need to be explained.

> +	}
> +
> +	return rc;
> +}
> +
> +static void create_dimm_temp_groups_delayed(struct work_struct *work)
> +{
> +	struct delayed_work *dwork = to_delayed_work(work);
> +	struct peci_hwmon *priv = container_of(dwork, struct peci_hwmon,
> +					       work_handler);
> +	int rc;
> +
> +	rc = create_dimm_temp_groups(priv);
> +	if (rc && rc != -EAGAIN)
> +		dev_dbg(priv->dev, "Skipped to creat DIMM temp groups\n");
> +}
> +
> +static int peci_hwmon_probe(struct peci_client *client)
> +{
> +	struct device *dev = &client->dev;
> +	struct peci_hwmon *priv;
> +	int rc;
> +
> +	if ((client->adapter->cmd_mask &
> +	    (BIT(PECI_CMD_GET_TEMP) | BIT(PECI_CMD_RD_PKG_CFG))) !=
> +	    (BIT(PECI_CMD_GET_TEMP) | BIT(PECI_CMD_RD_PKG_CFG))) {
> +		dev_err(dev, "Client doesn't support temperature monitoring\n");
> +		return -EINVAL;
> +	}
> +
> +	priv = devm_kzalloc(dev, sizeof(*priv), GFP_KERNEL);
> +	if (!priv)
> +		return -ENOMEM;
> +
> +	dev_set_drvdata(dev, priv);
> +	priv->client = client;
> +	priv->dev = dev;
> +	priv->addr = client->addr;
> +	priv->cpu_no = priv->addr - PECI_BASE_ADDR;
> +
> +	snprintf(priv->name, PECI_NAME_SIZE, "peci_hwmon.cpu%d", priv->cpu_no);
> +
> +	priv->work_queue = create_singlethread_workqueue(priv->name);
> +	if (!priv->work_queue)
> +		return -ENOMEM;
> +
> +	priv->hwmon_dev = hwmon_device_register_with_groups(priv->dev,
> +							    priv->name,
> +							    priv,
> +							   default_attr_groups);
> + 
I'll expect a detailed explanation why using hwmon_device_register_with_info()
does not work for this driver, and why it would make sense to ever register
the hwmon device before its attributes are available. From my perspective,
the driver should delay registration entirely until all attributes are
available. The hwmon ABI implicitly assumes that all sensors are available
at the time of hwmon device registration. Anything else can result in
unexpected behavior.

> +	rc = PTR_ERR_OR_ZERO(priv->hwmon_dev);
> +	if (rc) {
> +		dev_err(dev, "Failed to register peci hwmon\n");
> +		return rc;
> +	}
> +
> +	priv->global_idx = DEFAULT_ATTR_GRP_NUMS + 1;
> +
> +	rc = create_core_temp_groups(priv);
> +	if (rc) {
> +		dev_err(dev, "Failed to create core groups\n");
> +		return rc;
> +	}

This should be done before registering the hwmon device (or be left
to the hwmon core by using the _info API). And it should definitely
not return an error while keeping the hwmon device around.

> +
> +	INIT_DELAYED_WORK(&priv->work_handler, create_dimm_temp_groups_delayed);
> +
> +	rc = create_dimm_temp_groups(priv);
> +	if (rc && rc != -EAGAIN)
> +		dev_dbg(dev, "Skipped to creat DIMM temp groups\n");
> +
Not that it should be there in the first place, but "creat" is not a word.

> +	dev_dbg(dev, "peci hwmon for CPU at 0x%x registered\n", priv->addr);
> +
> +	return 0;
> +}
> +
> +static int peci_hwmon_remove(struct peci_client *client)
> +{
> +	struct peci_hwmon *priv = dev_get_drvdata(&client->dev);
> +
> +	cancel_delayed_work(&priv->work_handler);
> +	destroy_workqueue(priv->work_queue);
> +	sysfs_remove_groups(&priv->hwmon_dev->kobj, priv->core_attr_groups);
> +	sysfs_remove_groups(&priv->hwmon_dev->kobj, priv->dimm_attr_groups);
> +	hwmon_device_unregister(priv->hwmon_dev);
> +
> +	return 0;
> +}
> +
> +static const struct of_device_id peci_of_table[] = {
> +	{ .compatible = "intel,peci-hwmon", },
> +	{ }
> +};
> +MODULE_DEVICE_TABLE(of, peci_of_table);
> +
> +static struct peci_driver peci_hwmon_driver = {
> +	.probe  = peci_hwmon_probe,
> +	.remove = peci_hwmon_remove,
> +	.driver = {
> +		.name           = "peci-hwmon",
> +		.of_match_table = of_match_ptr(peci_of_table),
> +	},
> +};
> +module_peci_driver(peci_hwmon_driver);
> +
> +MODULE_AUTHOR("Jae Hyun Yoo <jae.hyun.yoo@linux.intel.com>");
> +MODULE_DESCRIPTION("PECI hwmon driver");
> +MODULE_LICENSE("GPL v2");
> -- 
> 2.16.1
> 

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v2 1/8] [PATCH 1/8] drivers/peci: Add support for PECI bus driver core
  2018-02-21 17:04   ` Andrew Lunn
@ 2018-02-21 20:31     ` Jae Hyun Yoo
  2018-02-21 21:51       ` Andrew Lunn
  0 siblings, 1 reply; 46+ messages in thread
From: Jae Hyun Yoo @ 2018-02-21 20:31 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: joel, andrew, arnd, gregkh, jdelvare, linux, benh, linux-kernel,
	linux-doc, devicetree, linux-hwmon, linux-arm-kernel, openbmc

Hi Andrew,

Thanks for sharing your time to review it. Please check my answers inline.

On 2/21/2018 9:04 AM, Andrew Lunn wrote:
>> +static int peci_locked_xfer(struct peci_adapter *adapter,
>> +			    struct peci_xfer_msg *msg,
>> +			    bool do_retry,
>> +			    bool has_aw_fcs)
>> +{
>> +	ktime_t start, end;
>> +	s64 elapsed_ms;
>> +	int rc = 0;
>> +
>> +	if (!adapter->xfer) {
>> +		dev_dbg(&adapter->dev, "PECI level transfers not supported\n");
>> +		return -ENODEV;
>> +	}
>> +
>> +	if (in_atomic() || irqs_disabled()) {
> 
> Hi Jae
> 
> Is there a real need to do transfers in atomic context, or with
> interrupts disabled?
> 

Actually, no. Generally, this function will be called in sleep-able 
context so this code is for an exceptional case handling.

I'll rewrite this code like below:
	if (in_atomic() || irqs_disabled()) {
		dev_dbg(&adapter->dev,
			"xfer in non-sleepable context is not supported\n");
		return -EWOULDBLOCK;
	}

And then, will add a sleep call into the below loop.

I know that in_atomic() call is not recommended in driver code but some 
driver codes still use it since there is no alternative way at this 
time, AFAIK. Please tell me if there is a better solution.

>> +		rt_mutex_trylock(&adapter->bus_lock);
>> +		if (!rc)
>> +			return -EAGAIN; /* PECI activity is ongoing */
>> +	} else {
>> +		rt_mutex_lock(&adapter->bus_lock);
>> +	}
>> +
>> +	if (do_retry)
>> +		start = ktime_get();
>> +
>> +	do {
>> +		rc = adapter->xfer(adapter, msg);
>> +
>> +		if (!do_retry)
>> +			break;
>> +
>> +		/* Per the PECI spec, need to retry commands that return 0x8x */
>> +		if (!(!rc && ((msg->rx_buf[0] & DEV_PECI_CC_RETRY_ERR_MASK) ==
>> +			      DEV_PECI_CC_TIMEOUT)))
>> +			break;
>> +
>> +		/* Set the retry bit to indicate a retry attempt */
>> +		msg->tx_buf[1] |= DEV_PECI_RETRY_BIT;
>> +
>> +		/* Recalculate the AW FCS if it has one */
>> +		if (has_aw_fcs)
>> +			msg->tx_buf[msg->tx_len - 1] = 0x80 ^
>> +						peci_aw_fcs((u8 *)msg,
>> +							    2 + msg->tx_len);
>> +
>> +		/* Retry for at least 250ms before returning an error */
>> +		end = ktime_get();
>> +		elapsed_ms = ktime_to_ms(ktime_sub(end, start));
>> +		if (elapsed_ms >= DEV_PECI_RETRY_TIME_MS) {
>> +			dev_dbg(&adapter->dev, "Timeout retrying xfer!\n");
>> +			break;
>> +		}
>> +	} while (true);
> 
> So you busy loop to 1/4 second? How about putting a sleep in here so
> other things can be done between each retry.
> 
> And should it not return -ETIMEDOUT after that 1/4 second?
> 

Yes, you are right. I'll rewrite this code like below after adding the 
above change:

		/**
		 * Retry for at least 250ms before returning an error.
		 * Retry interval guideline:
		 *   No minimum < Retry Interval < No maximum
		 *                (recommend 10ms)
		 */
		end = ktime_get();
		elapsed_ms = ktime_to_ms(ktime_sub(end, start));
		if (elapsed_ms >= DEV_PECI_RETRY_TIME_MS) {
			dev_dbg(&adapter->dev, "Timeout retrying xfer!\n");
			rc = -ETIMEDOUT;
			break;
		}

		usleep_range(DEV_PECI_RETRY_INTERVAL_MS * 1000,
			     (DEV_PECI_RETRY_INTERVAL_MS * 1000) + 1000);

>> +static int peci_scan_cmd_mask(struct peci_adapter *adapter)
>> +{
>> +	struct peci_xfer_msg msg;
>> +	u32 dib;
>> +	int rc = 0;
>> +
>> +	/* Update command mask just once */
>> +	if (adapter->cmd_mask & BIT(PECI_CMD_PING))
>> +		return 0;
>> +
>> +	msg.addr      = PECI_BASE_ADDR;
>> +	msg.tx_len    = GET_DIB_WR_LEN;
>> +	msg.rx_len    = GET_DIB_RD_LEN;
>> +	msg.tx_buf[0] = GET_DIB_PECI_CMD;
>> +
>> +	rc = peci_xfer(adapter, &msg);
>> +	if (rc < 0) {
>> +		dev_dbg(&adapter->dev, "PECI xfer error, rc : %d\n", rc);
>> +		return rc;
>> +	}
>> +
>> +	dib = msg.rx_buf[0] | (msg.rx_buf[1] << 8) |
>> +	      (msg.rx_buf[2] << 16) | (msg.rx_buf[3] << 24);
>> +
>> +	/* Check special case for Get DIB command */
>> +	if (dib == 0x00) {
>> +		dev_dbg(&adapter->dev, "DIB read as 0x00\n");
>> +		return -1;
>> +	}
>> +
>> +	if (!rc) {
>> +		/**
>> +		 * setting up the supporting commands based on minor rev#
>> +		 * see PECI Spec Table 3-1
>> +		 */
>> +		dib = (dib >> 8) & 0xF;
>> +
>> +		if (dib >= 0x1) {
>> +			adapter->cmd_mask |= BIT(PECI_CMD_RD_PKG_CFG);
>> +			adapter->cmd_mask |= BIT(PECI_CMD_WR_PKG_CFG);
>> +		}
>> +
>> +		if (dib >= 0x2)
>> +			adapter->cmd_mask |= BIT(PECI_CMD_RD_IA_MSR);
>> +
>> +		if (dib >= 0x3) {
>> +			adapter->cmd_mask |= BIT(PECI_CMD_RD_PCI_CFG_LOCAL);
>> +			adapter->cmd_mask |= BIT(PECI_CMD_WR_PCI_CFG_LOCAL);
>> +		}
>> +
>> +		if (dib >= 0x4)
>> +			adapter->cmd_mask |= BIT(PECI_CMD_RD_PCI_CFG);
>> +
>> +		if (dib >= 0x5)
>> +			adapter->cmd_mask |= BIT(PECI_CMD_WR_PCI_CFG);
>> +
>> +		if (dib >= 0x6)
>> +			adapter->cmd_mask |= BIT(PECI_CMD_WR_IA_MSR);
> 
> Lots of magic numbers here. Can they be replaced with #defines.  Also,
> it looks like a switch statement could be used, with fall through.
> 

I agree. Will rewrite it.

>> +
>> +		adapter->cmd_mask |= BIT(PECI_CMD_GET_TEMP);
>> +		adapter->cmd_mask |= BIT(PECI_CMD_GET_DIB);
>> +		adapter->cmd_mask |= BIT(PECI_CMD_PING);
>> +	} else {
>> +		dev_dbg(&adapter->dev, "Error reading DIB, rc : %d\n", rc);
>> +	}
>> +
>> +	return rc;
>> +}
>> +
> 
>> +static int peci_ioctl_get_temp(struct peci_adapter *adapter, void *vmsg)
>> +{
>> +	struct peci_get_temp_msg *umsg = vmsg;
>> +	struct peci_xfer_msg msg;
>> +	int rc;
>> +
> 
> Is this getting the temperature?
> 

Yes, this is getting the 'die' temperature of a processor package.

>> +	rc = peci_cmd_support(adapter, PECI_CMD_GET_TEMP);
>> +	if (rc < 0)
>> +		return rc;
>> +
>> +	msg.addr      = umsg->addr;
>> +	msg.tx_len    = GET_TEMP_WR_LEN;
>> +	msg.rx_len    = GET_TEMP_RD_LEN;
>> +	msg.tx_buf[0] = GET_TEMP_PECI_CMD;
>> +
>> +	rc = peci_xfer(adapter, &msg);
>> +	if (rc < 0)
>> +		return rc;
>> +
>> +	umsg->temp_raw = msg.rx_buf[0] | (msg.rx_buf[1] << 8);
>> +
>> +	return 0;
>> +}
> 
> 
> 
>> +static long peci_ioctl(struct file *file, unsigned int iocmd, unsigned long arg)
>> +{
>> +	struct peci_adapter *adapter = file->private_data;
>> +	void __user *argp = (void __user *)arg;
>> +	unsigned int msg_len;
>> +	enum peci_cmd cmd;
>> +	u8 *msg;
>> +	int rc = 0;
>> +
>> +	dev_dbg(&adapter->dev, "ioctl, cmd=0x%x, arg=0x%lx\n", iocmd, arg);
>> +
>> +	switch (iocmd) {
>> +	case PECI_IOC_PING:
>> +	case PECI_IOC_GET_DIB:
>> +	case PECI_IOC_GET_TEMP:
>> +	case PECI_IOC_RD_PKG_CFG:
>> +	case PECI_IOC_WR_PKG_CFG:
>> +	case PECI_IOC_RD_IA_MSR:
>> +	case PECI_IOC_RD_PCI_CFG:
>> +	case PECI_IOC_RD_PCI_CFG_LOCAL:
>> +	case PECI_IOC_WR_PCI_CFG_LOCAL:
>> +		cmd = _IOC_TYPE(iocmd) - PECI_IOC_BASE;
>> +		msg_len = _IOC_SIZE(iocmd);
>> +		break;
> 
> Adding new ioctl calls is pretty frowned up. Can you export this info
> via /sysfs?
> 

Most of these are not simple IOs so ioctl is better suited, I think.

> Also, should there be some permission checks here? Or is any user
> allowed to call these ioctls?
> 

I agree. I will add some permission checks here.

> 	Andrew
> 

Thanks a lot,
Jae

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v2 2/8] [PATCH 2/8] Documentations: dt-bindings: Add a document of PECI adapter driver for Aspeed AST24xx/25xx SoCs
  2018-02-21 17:13   ` Andrew Lunn
@ 2018-02-21 20:35     ` Jae Hyun Yoo
  0 siblings, 0 replies; 46+ messages in thread
From: Jae Hyun Yoo @ 2018-02-21 20:35 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: joel, andrew, arnd, gregkh, jdelvare, linux, benh, linux-kernel,
	linux-doc, devicetree, linux-hwmon, linux-arm-kernel, openbmc

On 2/21/2018 9:13 AM, Andrew Lunn wrote:
> On Wed, Feb 21, 2018 at 08:16:00AM -0800, Jae Hyun Yoo wrote:
>> This commit adds a dt-bindings document of PECI adapter driver for Aspeed
>> AST24xx/25xx SoCs.
> 
> Hi Jae
> 
> It would be good to separate this into two. One binding document for a
> generic adaptor, with a generic PECI bus, and generic client
> devices. List all the properties you expect at the generic level.
> 
> Then have an aspeed specific binding for those properties which are
> specific to the Aspeed adaptor.
> 

That makes sense. I'll add generic PECI bus/adapter/client and Aspeed 
specific documents as separated.

> 	 Andrew
>   
> 

Thanks again for sharing your time to review it. I really appreciate it.

BR,
Jae

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v2 1/8] [PATCH 1/8] drivers/peci: Add support for PECI bus driver core
  2018-02-21 17:58   ` Greg KH
@ 2018-02-21 20:42     ` Jae Hyun Yoo
  2018-02-22  6:54       ` Greg KH
  0 siblings, 1 reply; 46+ messages in thread
From: Jae Hyun Yoo @ 2018-02-21 20:42 UTC (permalink / raw)
  To: Greg KH
  Cc: joel, andrew, arnd, jdelvare, linux, benh, andrew, linux-kernel,
	linux-doc, devicetree, linux-hwmon, linux-arm-kernel, openbmc

On 2/21/2018 9:58 AM, Greg KH wrote:
> On Wed, Feb 21, 2018 at 08:15:59AM -0800, Jae Hyun Yoo wrote:
>> This commit adds driver implementation for PECI bus into linux
>> driver framework.
>>
>> Signed-off-by: Jae Hyun Yoo <jae.hyun.yoo@linux.intel.com>
>> ---
> 
> Why is there no other Intel developers willing to review and sign off on
> this patch?  Please get their review first before asking us to do their
> work for them :)
> 
> thanks,
> 
> greg k-h
> 

Hi Greg,

This patch set got our internal review process. Sorry if it's code 
quality is under your expectation but it's the reason why I'm asking you 
to review the code. Could you please share your time to review it?

Thanks a lot,
Jae

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v2 7/8] [PATCH 7/8] drivers/hwmon: Add a generic PECI hwmon client driver
  2018-02-21 18:26   ` Guenter Roeck
@ 2018-02-21 21:24     ` Jae Hyun Yoo
  2018-02-21 21:48       ` Guenter Roeck
  0 siblings, 1 reply; 46+ messages in thread
From: Jae Hyun Yoo @ 2018-02-21 21:24 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: joel, andrew, arnd, gregkh, jdelvare, benh, andrew, linux-kernel,
	linux-doc, devicetree, linux-hwmon, linux-arm-kernel, openbmc

Hi Guenter,

Thanks for sharing your time to review this code. Please check my 
answers inline.

On 2/21/2018 10:26 AM, Guenter Roeck wrote:
> On Wed, Feb 21, 2018 at 08:16:05AM -0800, Jae Hyun Yoo wrote:
>> This commit adds a generic PECI hwmon client driver implementation.
>>
>> Signed-off-by: Jae Hyun Yoo <jae.hyun.yoo@linux.intel.com>
>> ---
>>   drivers/hwmon/Kconfig      |  10 +
>>   drivers/hwmon/Makefile     |   1 +
>>   drivers/hwmon/peci-hwmon.c | 928 +++++++++++++++++++++++++++++++++++++++++++++
>>   3 files changed, 939 insertions(+)
>>   create mode 100644 drivers/hwmon/peci-hwmon.c
>>
>> diff --git a/drivers/hwmon/Kconfig b/drivers/hwmon/Kconfig
>> index ef23553ff5cb..f22e0c31f597 100644
>> --- a/drivers/hwmon/Kconfig
>> +++ b/drivers/hwmon/Kconfig
>> @@ -1246,6 +1246,16 @@ config SENSORS_NCT7904
>>   	  This driver can also be built as a module.  If so, the module
>>   	  will be called nct7904.
>>   
>> +config SENSORS_PECI_HWMON
>> +	tristate "PECI hwmon support"
>> +	depends on PECI
>> +	help
>> +	  If you say yes here you get support for the generic PECI hwmon
>> +	  driver.
>> +
>> +	  This driver can also be built as a module.  If so, the module
>> +	  will be called peci-hwmon.
>> +
>>   config SENSORS_NSA320
>>   	tristate "ZyXEL NSA320 and compatible fan speed and temperature sensors"
>>   	depends on GPIOLIB && OF
>> diff --git a/drivers/hwmon/Makefile b/drivers/hwmon/Makefile
>> index f814b4ace138..946f54b168e5 100644
>> --- a/drivers/hwmon/Makefile
>> +++ b/drivers/hwmon/Makefile
>> @@ -135,6 +135,7 @@ obj-$(CONFIG_SENSORS_NCT7802)	+= nct7802.o
>>   obj-$(CONFIG_SENSORS_NCT7904)	+= nct7904.o
>>   obj-$(CONFIG_SENSORS_NSA320)	+= nsa320-hwmon.o
>>   obj-$(CONFIG_SENSORS_NTC_THERMISTOR)	+= ntc_thermistor.o
>> +obj-$(CONFIG_SENSORS_PECI_HWMON)	+= peci-hwmon.o
>>   obj-$(CONFIG_SENSORS_PC87360)	+= pc87360.o
>>   obj-$(CONFIG_SENSORS_PC87427)	+= pc87427.o
>>   obj-$(CONFIG_SENSORS_PCF8591)	+= pcf8591.o
>> diff --git a/drivers/hwmon/peci-hwmon.c b/drivers/hwmon/peci-hwmon.c
>> new file mode 100644
>> index 000000000000..edd27744adcb
>> --- /dev/null
>> +++ b/drivers/hwmon/peci-hwmon.c
>> @@ -0,0 +1,928 @@
>> +// SPDX-License-Identifier: GPL-2.0
>> +// Copyright (c) 2018 Intel Corporation
>> +
>> +#include <linux/delay.h>
>> +#include <linux/hwmon.h>
>> +#include <linux/hwmon-sysfs.h>
>> +#include <linux/jiffies.h>
>> +#include <linux/module.h>
>> +#include <linux/of_device.h>
>> +#include <linux/peci.h>
>> +#include <linux/workqueue.h>
>> +
>> +#define DIMM_SLOT_NUMS_MAX    12  /* Max DIMM numbers (channel ranks x 2) */
>> +#define CORE_NUMS_MAX         28  /* Max core numbers (max on SKX Platinum) */
>> +#define TEMP_TYPE_PECI        6   /* Sensor type 6: Intel PECI */
>> +
>> +#define CORE_TEMP_ATTRS       5
>> +#define DIMM_TEMP_ATTRS       2
>> +#define ATTR_NAME_LEN         24
>> +
>> +#define DEFAULT_ATTR_GRP_NUMS 5
>> +
>> +#define UPDATE_INTERVAL_MIN   HZ
>> +#define DIMM_MASK_CHECK_DELAY msecs_to_jiffies(5000)
>> +
>> +enum sign {
>> +	POS,
>> +	NEG
>> +};
>> +
>> +struct temp_data {
>> +	bool valid;
>> +	s32  value;
>> +	unsigned long last_updated;
>> +};
>> +
>> +struct temp_group {
>> +	struct temp_data tjmax;
>> +	struct temp_data tcontrol;
>> +	struct temp_data tthrottle;
>> +	struct temp_data dts_margin;
>> +	struct temp_data die;
>> +	struct temp_data core[CORE_NUMS_MAX];
>> +	struct temp_data dimm[DIMM_SLOT_NUMS_MAX];
>> +};
>> +
>> +struct core_temp_group {
>> +	struct sensor_device_attribute sd_attrs[CORE_TEMP_ATTRS];
>> +	char attr_name[CORE_TEMP_ATTRS][ATTR_NAME_LEN];
>> +	struct attribute *attrs[CORE_TEMP_ATTRS + 1];
>> +	struct attribute_group attr_group;
>> +};
>> +
>> +struct dimm_temp_group {
>> +	struct sensor_device_attribute sd_attrs[DIMM_TEMP_ATTRS];
>> +	char attr_name[DIMM_TEMP_ATTRS][ATTR_NAME_LEN];
>> +	struct attribute *attrs[DIMM_TEMP_ATTRS + 1];
>> +	struct attribute_group attr_group;
>> +};
>> +
>> +struct peci_hwmon {
>> +	struct peci_client *client;
>> +	struct device *dev;
>> +	struct device *hwmon_dev;
>> +	struct workqueue_struct *work_queue;
>> +	struct delayed_work work_handler;
>> +	char name[PECI_NAME_SIZE];
>> +	struct temp_group temp;
>> +	u8 addr;
>> +	uint cpu_no;
>> +	u32 core_mask;
>> +	u32 dimm_mask;
>> +	const struct attribute_group *core_attr_groups[CORE_NUMS_MAX + 1];
>> +	const struct attribute_group *dimm_attr_groups[DIMM_SLOT_NUMS_MAX + 1];
>> +	uint global_idx;
>> +	uint core_idx;
>> +	uint dimm_idx;
>> +};
>> +
>> +enum label {
>> +	L_DIE,
>> +	L_DTS,
>> +	L_TCONTROL,
>> +	L_TTHROTTLE,
>> +	L_TJMAX,
>> +	L_MAX
>> +};
>> +
>> +static const char *peci_label[L_MAX] = {
>> +	"Die\n",
>> +	"DTS margin to Tcontrol\n",
>> +	"Tcontrol\n",
>> +	"Tthrottle\n",
>> +	"Tjmax\n",
>> +};
>> +
>> +static int send_peci_cmd(struct peci_hwmon *priv, enum peci_cmd cmd, void *msg)
>> +{
>> +	return peci_command(priv->client->adapter, cmd, msg);
>> +}
>> +
>> +static int need_update(struct temp_data *temp)
>> +{
>> +	if (temp->valid &&
>> +	    time_before(jiffies, temp->last_updated + UPDATE_INTERVAL_MIN))
>> +		return 0;
>> +
>> +	return 1;
>> +}
>> +
>> +static s32 ten_dot_six_to_millidegree(s32 x)
>> +{
>> +	return ((((x) ^ 0x8000) - 0x8000) * 1000 / 64);
>> +}
>> +
>> +static int get_tjmax(struct peci_hwmon *priv)
>> +{
>> +	struct peci_rd_pkg_cfg_msg msg;
>> +	int rc;
>> +
>> +	if (!priv->temp.tjmax.valid) {
>> +		msg.addr = priv->addr;
>> +		msg.index = MBX_INDEX_TEMP_TARGET;
>> +		msg.param = 0;
>> +		msg.rx_len = 4;
>> +
>> +		rc = send_peci_cmd(priv, PECI_CMD_RD_PKG_CFG, (void *)&msg);
> 
> Is a typecast to a void * necessary ?
> 

No. I'll remove the type cast. Thanks!

>> +		if (rc < 0)
>> +			return rc;
>> +
>> +		priv->temp.tjmax.value = (s32)msg.pkg_config[2] * 1000;
>> +		priv->temp.tjmax.valid = true;
>> +	}
>> +
>> +	return 0;
>> +}
>> +
>> +static int get_tcontrol(struct peci_hwmon *priv)
>> +{
>> +	struct peci_rd_pkg_cfg_msg msg;
>> +	s32 tcontrol_margin;
>> +	int rc;
>> +
>> +	if (!need_update(&priv->temp.tcontrol))
>> +		return 0;
>> +
>> +	rc = get_tjmax(priv);
>> +	if (rc < 0)
>> +		return rc;
>> +
>> +	msg.addr = priv->addr;
>> +	msg.index = MBX_INDEX_TEMP_TARGET;
>> +	msg.param = 0;
>> +	msg.rx_len = 4;
>> +
>> +	rc = send_peci_cmd(priv, PECI_CMD_RD_PKG_CFG, (void *)&msg);
>> +	if (rc < 0)
>> +		return rc;
>> +
>> +	tcontrol_margin = msg.pkg_config[1];
>> +	tcontrol_margin = ((tcontrol_margin ^ 0x80) - 0x80) * 1000;
>> +
>> +	priv->temp.tcontrol.value = priv->temp.tjmax.value - tcontrol_margin;
>> +
>> +	if (!priv->temp.tcontrol.valid) {
>> +		priv->temp.tcontrol.last_updated = INITIAL_JIFFIES;
>> +		priv->temp.tcontrol.valid = true;
>> +	} else {
>> +		priv->temp.tcontrol.last_updated = jiffies;
>> +	}
>> +
>> +	return 0;
>> +}
>> +
>> +static int get_tthrottle(struct peci_hwmon *priv)
>> +{
>> +	struct peci_rd_pkg_cfg_msg msg;
>> +	s32 tthrottle_offset;
>> +	int rc;
>> +
>> +	if (!need_update(&priv->temp.tthrottle))
>> +		return 0;
>> +
>> +	rc = get_tjmax(priv);
>> +	if (rc < 0)
>> +		return rc;
>> +
>> +	msg.addr = priv->addr;
>> +	msg.index = MBX_INDEX_TEMP_TARGET;
>> +	msg.param = 0;
>> +	msg.rx_len = 4;
>> +
>> +	rc = send_peci_cmd(priv, PECI_CMD_RD_PKG_CFG, (void *)&msg);
>> +	if (rc < 0)
>> +		return rc;
>> +
>> +	tthrottle_offset = (msg.pkg_config[3] & 0x2f) * 1000;
>> +	priv->temp.tthrottle.value = priv->temp.tjmax.value - tthrottle_offset;
>> +
>> +	if (!priv->temp.tthrottle.valid) {
>> +		priv->temp.tthrottle.last_updated = INITIAL_JIFFIES;
>> +		priv->temp.tthrottle.valid = true;
>> +	} else {
>> +		priv->temp.tthrottle.last_updated = jiffies;
>> +	}
>> +
>> +	return 0;
>> +}
>> +
>> +static int get_die_temp(struct peci_hwmon *priv)
>> +{
>> +	struct peci_get_temp_msg msg;
>> +	int rc;
>> +
>> +	if (!need_update(&priv->temp.die))
>> +		return 0;
>> +
>> +	rc = get_tjmax(priv);
>> +	if (rc < 0)
>> +		return rc;
>> +
>> +	msg.addr = priv->addr;
>> +
>> +	rc = send_peci_cmd(priv, PECI_CMD_GET_TEMP, (void *)&msg);
>> +	if (rc < 0)
>> +		return rc;
>> +
>> +	priv->temp.die.value = priv->temp.tjmax.value +
>> +			       ((s32)msg.temp_raw * 1000 / 64);
>> +
>> +	if (!priv->temp.die.valid) {
>> +		priv->temp.die.last_updated = INITIAL_JIFFIES;
>> +		priv->temp.die.valid = true;
>> +	} else {
>> +		priv->temp.die.last_updated = jiffies;
>> +	}
>> +
>> +	return 0;
>> +}
>> +
>> +static int get_dts_margin(struct peci_hwmon *priv)
>> +{
>> +	struct peci_rd_pkg_cfg_msg msg;
>> +	s32 dts_margin;
>> +	int rc;
>> +
>> +	if (!need_update(&priv->temp.dts_margin))
>> +		return 0;
>> +
>> +	msg.addr = priv->addr;
>> +	msg.index = MBX_INDEX_DTS_MARGIN;
>> +	msg.param = 0;
>> +	msg.rx_len = 4;
>> +
>> +	rc = send_peci_cmd(priv, PECI_CMD_RD_PKG_CFG, (void *)&msg);
>> +	if (rc < 0)
>> +		return rc;
>> +
>> +	dts_margin = (msg.pkg_config[1] << 8) | msg.pkg_config[0];
>> +
>> +	/**
>> +	 * Processors return a value of DTS reading in 10.6 format
>> +	 * (10 bits signed decimal, 6 bits fractional).
>> +	 * Error codes:
>> +	 *   0x8000: General sensor error
>> +	 *   0x8001: Reserved
>> +	 *   0x8002: Underflow on reading value
>> +	 *   0x8003-0x81ff: Reserved
>> +	 */
>> +	if (dts_margin >= 0x8000 && dts_margin <= 0x81ff)
>> +		return -1;
>> +
>> +	dts_margin = ten_dot_six_to_millidegree(dts_margin);
>> +
>> +	priv->temp.dts_margin.value = dts_margin;
>> +
>> +	if (!priv->temp.dts_margin.valid) {
>> +		priv->temp.dts_margin.last_updated = INITIAL_JIFFIES;
>> +		priv->temp.dts_margin.valid = true;
>> +	} else {
>> +		priv->temp.dts_margin.last_updated = jiffies;
>> +	}
>> +
>> +	return 0;
>> +}
>> +
>> +static int get_core_temp(struct peci_hwmon *priv, int core_index)
>> +{
>> +	struct peci_rd_pkg_cfg_msg msg;
>> +	s32 core_dts_margin;
>> +	int rc;
>> +
>> +	if (!need_update(&priv->temp.core[core_index]))
>> +		return 0;
>> +
>> +	rc = get_tjmax(priv);
>> +	if (rc < 0)
>> +		return rc;
>> +
>> +	msg.addr = priv->addr;
>> +	msg.index = MBX_INDEX_PER_CORE_DTS_TEMP;
>> +	msg.param = core_index;
>> +	msg.rx_len = 4;
>> +
>> +	rc = send_peci_cmd(priv, PECI_CMD_RD_PKG_CFG, (void *)&msg);
>> +	if (rc < 0)
>> +		return rc;
>> +
>> +	core_dts_margin = (msg.pkg_config[1] << 8) | msg.pkg_config[0];
>> +
>> +	/**
>> +	 * Processors return a value of the core DTS reading in 10.6 format
>> +	 * (10 bits signed decimal, 6 bits fractional).
>> +	 * Error codes:
>> +	 *   0x8000: General sensor error
>> +	 *   0x8001: Reserved
>> +	 *   0x8002: Underflow on reading value
>> +	 *   0x8003-0x81ff: Reserved
>> +	 */
>> +	if (core_dts_margin >= 0x8000 && core_dts_margin <= 0x81ff)
>> +		return -1;
> 
> Please use valid error codes. This value is returned to user space,
> and I don't think this error should be EPERM.
> 

Yes, you are right. I'll fix it.

>> +
>> +	core_dts_margin = ten_dot_six_to_millidegree(core_dts_margin);
>> +
>> +	priv->temp.core[core_index].value = priv->temp.tjmax.value +
>> +					    core_dts_margin;
>> +
>> +	if (!priv->temp.core[core_index].valid) {
>> +		priv->temp.core[core_index].last_updated = INITIAL_JIFFIES;
>> +		priv->temp.core[core_index].valid = true;
>> +	} else {
>> +		priv->temp.core[core_index].last_updated = jiffies;
>> +	}
> 
> I don't understand the purpose of this code. Why not just set valid = true
> and last_updated = jiffies ? Why set anything to INITIAL_JIFFIES some
> arbitrary time after boot ? AFAICS the first read will always be followed
> by another immediately afterwards if the user requests two readings in
> a row. Maybe that is intentional, but not to me. If this code is on
> purpose, it will require a detailed explanation.
> 

Agreed. I'll rewrite the code as a function like this:

static void mark_updated(struct temp_data *temp)
{
	if (!temp->valid)
		temp->valid = true;

	temp->last_updated = jiffies;
}

>> +
>> +	return 0;
>> +}
>> +
>> +static int get_dimm_temp(struct peci_hwmon *priv, int dimm_index)
>> +{
>> +	struct peci_rd_pkg_cfg_msg msg;
>> +	int channel = dimm_index / 2;
>> +	int dimm_order = dimm_index % 2;
>> +	int rc;
>> +
>> +	if (!need_update(&priv->temp.dimm[dimm_index]))
>> +		return 0;
>> +
>> +	msg.addr = priv->addr;
>> +	msg.index = MBX_INDEX_DDR_DIMM_TEMP;
>> +	msg.param = channel;
>> +	msg.rx_len = 4;
>> +
>> +	rc = send_peci_cmd(priv, PECI_CMD_RD_PKG_CFG, (void *)&msg);
>> +	if (rc < 0)
>> +		return rc;
>> +
>> +	priv->temp.dimm[dimm_index].value = msg.pkg_config[dimm_order] * 1000;
>> +
>> +	if (!priv->temp.dimm[dimm_index].valid) {
>> +		priv->temp.dimm[dimm_index].last_updated = INITIAL_JIFFIES;
>> +		priv->temp.dimm[dimm_index].valid = true;
>> +	} else {
>> +		priv->temp.dimm[dimm_index].last_updated = jiffies;
>> +	}
>> +
>> +	return 0;
>> +}
>> +
>> +static ssize_t show_tcontrol(struct device *dev,
>> +			     struct device_attribute *attr,
>> +			     char *buf)
>> +{
>> +	struct peci_hwmon *priv = dev_get_drvdata(dev);
>> +	int rc;
>> +
>> +	rc = get_tcontrol(priv);
>> +	if (rc < 0)
>> +		return rc;
>> +
>> +	return sprintf(buf, "%d\n", priv->temp.tcontrol.value);
>> +}
>> +
>> +static ssize_t show_tcontrol_margin(struct device *dev,
>> +				    struct device_attribute *attr,
>> +				    char *buf)
>> +{
>> +	struct peci_hwmon *priv = dev_get_drvdata(dev);
>> +	struct sensor_device_attribute *sensor_attr = to_sensor_dev_attr(attr);
>> +	int rc;
>> +
>> +	rc = get_tcontrol(priv);
>> +	if (rc < 0)
>> +		return rc;
>> +
>> +	return sprintf(buf, "%d\n", sensor_attr->index == POS ?
>> +				    priv->temp.tjmax.value -
>> +				    priv->temp.tcontrol.value :
>> +				    priv->temp.tcontrol.value -
>> +				    priv->temp.tjmax.value);
>> +}
>> +
>> +static ssize_t show_tthrottle(struct device *dev,
>> +			      struct device_attribute *attr,
>> +			      char *buf)
>> +{
>> +	struct peci_hwmon *priv = dev_get_drvdata(dev);
>> +	int rc;
>> +
>> +	rc = get_tthrottle(priv);
>> +	if (rc < 0)
>> +		return rc;
>> +
>> +	return sprintf(buf, "%d\n", priv->temp.tthrottle.value);
>> +}
>> +
>> +static ssize_t show_tjmax(struct device *dev,
>> +			  struct device_attribute *attr,
>> +			  char *buf)
>> +{
>> +	struct peci_hwmon *priv = dev_get_drvdata(dev);
>> +	int rc;
>> +
>> +	rc = get_tjmax(priv);
>> +	if (rc < 0)
>> +		return rc;
>> +
>> +	return sprintf(buf, "%d\n", priv->temp.tjmax.value);
>> +}
>> +
>> +static ssize_t show_die_temp(struct device *dev,
>> +			     struct device_attribute *attr,
>> +			     char *buf)
>> +{
>> +	struct peci_hwmon *priv = dev_get_drvdata(dev);
>> +	int rc;
>> +
>> +	rc = get_die_temp(priv);
>> +	if (rc < 0)
>> +		return rc;
>> +
>> +	return sprintf(buf, "%d\n", priv->temp.die.value);
>> +}
>> +
>> +static ssize_t show_dts_margin(struct device *dev,
>> +			       struct device_attribute *attr,
>> +			       char *buf)
>> +{
>> +	struct peci_hwmon *priv = dev_get_drvdata(dev);
>> +	int rc;
>> +
>> +	rc = get_dts_margin(priv);
>> +	if (rc < 0)
>> +		return rc;
>> +
>> +	return sprintf(buf, "%d\n", priv->temp.dts_margin.value);
>> +}
>> +
>> +static ssize_t show_core_temp(struct device *dev,
>> +			      struct device_attribute *attr,
>> +			      char *buf)
>> +{
>> +	struct peci_hwmon *priv = dev_get_drvdata(dev);
>> +	struct sensor_device_attribute *sensor_attr = to_sensor_dev_attr(attr);
>> +	int core_index = sensor_attr->index;
>> +	int rc;
>> +
>> +	rc = get_core_temp(priv, core_index);
>> +	if (rc < 0)
>> +		return rc;
>> +
>> +	return sprintf(buf, "%d\n", priv->temp.core[core_index].value);
>> +}
>> +
>> +static ssize_t show_dimm_temp(struct device *dev,
>> +			      struct device_attribute *attr,
>> +			      char *buf)
>> +{
>> +	struct peci_hwmon *priv = dev_get_drvdata(dev);
>> +	struct sensor_device_attribute *sensor_attr = to_sensor_dev_attr(attr);
>> +	int dimm_index = sensor_attr->index;
>> +	int rc;
>> +
>> +	rc = get_dimm_temp(priv, dimm_index);
>> +	if (rc < 0)
>> +		return rc;
>> +
>> +	return sprintf(buf, "%d\n", priv->temp.dimm[dimm_index].value);
>> +}
>> +
>> +static ssize_t show_value(struct device *dev,
>> +			  struct device_attribute *attr,
>> +			  char *buf)
>> +{
>> +	struct sensor_device_attribute *sensor_attr = to_sensor_dev_attr(attr);
>> +
>> +	return sprintf(buf, "%d\n", sensor_attr->index);
>> +}
>> +
>> +static ssize_t show_label(struct device *dev,
>> +			  struct device_attribute *attr,
>> +			  char *buf)
>> +{
>> +	struct sensor_device_attribute *sensor_attr = to_sensor_dev_attr(attr);
>> +
>> +	return sprintf(buf, peci_label[sensor_attr->index]);
>> +}
>> +
>> +static ssize_t show_core_label(struct device *dev,
>> +			       struct device_attribute *attr,
>> +			       char *buf)
>> +{
>> +	struct sensor_device_attribute *sensor_attr = to_sensor_dev_attr(attr);
>> +
>> +	return sprintf(buf, "Core %d\n", sensor_attr->index);
>> +}
>> +
>> +static ssize_t show_dimm_label(struct device *dev,
>> +			       struct device_attribute *attr,
>> +			       char *buf)
>> +{
>> +	struct sensor_device_attribute *sensor_attr = to_sensor_dev_attr(attr);
>> +
>> +	char channel = 'A' + (sensor_attr->index / 2);
>> +	int index = sensor_attr->index % 2;
>> +
>> +	return sprintf(buf, "DIMM %d (%c%d)\n",
>> +		       sensor_attr->index, channel, index);
>> +}
>> +
>> +/* Die temperature */
>> +static SENSOR_DEVICE_ATTR(temp1_label, 0444, show_label, NULL, L_DIE);
>> +static SENSOR_DEVICE_ATTR(temp1_input, 0444, show_die_temp, NULL, 0);
>> +static SENSOR_DEVICE_ATTR(temp1_max, 0444, show_tcontrol, NULL, 0);
>> +static SENSOR_DEVICE_ATTR(temp1_crit, 0444, show_tjmax, NULL, 0);
>> +static SENSOR_DEVICE_ATTR(temp1_crit_hyst, 0444, show_tcontrol_margin, NULL,
>> +			  POS);
>> +
>> +static struct attribute *die_temp_attrs[] = {
>> +	&sensor_dev_attr_temp1_label.dev_attr.attr,
>> +	&sensor_dev_attr_temp1_input.dev_attr.attr,
>> +	&sensor_dev_attr_temp1_max.dev_attr.attr,
>> +	&sensor_dev_attr_temp1_crit.dev_attr.attr,
>> +	&sensor_dev_attr_temp1_crit_hyst.dev_attr.attr,
>> +	NULL
>> +};
>> +
>> +static struct attribute_group die_temp_attr_group = {
>> +	.attrs = die_temp_attrs,
>> +};
>> +
>> +/* DTS margin temperature */
>> +static SENSOR_DEVICE_ATTR(temp2_label, 0444, show_label, NULL, L_DTS);
>> +static SENSOR_DEVICE_ATTR(temp2_input, 0444, show_dts_margin, NULL, 0);
>> +static SENSOR_DEVICE_ATTR(temp2_min, 0444, show_value, NULL, 0);
>> +static SENSOR_DEVICE_ATTR(temp2_lcrit, 0444, show_tcontrol_margin, NULL, NEG);
>> +
>> +static struct attribute *dts_margin_temp_attrs[] = {
>> +	&sensor_dev_attr_temp2_label.dev_attr.attr,
>> +	&sensor_dev_attr_temp2_input.dev_attr.attr,
>> +	&sensor_dev_attr_temp2_min.dev_attr.attr,
>> +	&sensor_dev_attr_temp2_lcrit.dev_attr.attr,
>> +	NULL
>> +};
>> +
>> +static struct attribute_group dts_margin_temp_attr_group = {
>> +	.attrs = dts_margin_temp_attrs,
>> +};
>> +
>> +/* Tcontrol temperature */
>> +static SENSOR_DEVICE_ATTR(temp3_label, 0444, show_label, NULL, L_TCONTROL);
>> +static SENSOR_DEVICE_ATTR(temp3_input, 0444, show_tcontrol, NULL, 0);
>> +static SENSOR_DEVICE_ATTR(temp3_crit, 0444, show_tjmax, NULL, 0);
>> +
>> +static struct attribute *tcontrol_temp_attrs[] = {
>> +	&sensor_dev_attr_temp3_label.dev_attr.attr,
>> +	&sensor_dev_attr_temp3_input.dev_attr.attr,
>> +	&sensor_dev_attr_temp3_crit.dev_attr.attr,
>> +	NULL
>> +};
>> +
>> +static struct attribute_group tcontrol_temp_attr_group = {
>> +	.attrs = tcontrol_temp_attrs,
>> +};
>> +
>> +/* Tthrottle temperature */
>> +static SENSOR_DEVICE_ATTR(temp4_label, 0444, show_label, NULL, L_TTHROTTLE);
>> +static SENSOR_DEVICE_ATTR(temp4_input, 0444, show_tthrottle, NULL, 0);
>> +
>> +static struct attribute *tthrottle_temp_attrs[] = {
>> +	&sensor_dev_attr_temp4_label.dev_attr.attr,
>> +	&sensor_dev_attr_temp4_input.dev_attr.attr,
>> +	NULL
>> +};
>> +
>> +static struct attribute_group tthrottle_temp_attr_group = {
>> +	.attrs = tthrottle_temp_attrs,
>> +};
>> +
>> +/* Tjmax temperature */
>> +static SENSOR_DEVICE_ATTR(temp5_label, 0444, show_label, NULL, L_TJMAX);
>> +static SENSOR_DEVICE_ATTR(temp5_input, 0444, show_tjmax, NULL, 0);
>> +
>> +static struct attribute *tjmax_temp_attrs[] = {
>> +	&sensor_dev_attr_temp5_label.dev_attr.attr,
>> +	&sensor_dev_attr_temp5_input.dev_attr.attr,
>> +	NULL
>> +};
>> +
>> +static struct attribute_group tjmax_temp_attr_group = {
>> +	.attrs = tjmax_temp_attrs,
>> +};
>> +
>> +static const struct attribute_group *
>> +default_attr_groups[DEFAULT_ATTR_GRP_NUMS + 1] = {
>> +	&die_temp_attr_group,
>> +	&dts_margin_temp_attr_group,
>> +	&tcontrol_temp_attr_group,
>> +	&tthrottle_temp_attr_group,
>> +	&tjmax_temp_attr_group,
>> +	NULL
>> +};
>> +
>> +/* Core temperature */
>> +static ssize_t (*const core_show_fn[CORE_TEMP_ATTRS]) (struct device *dev,
>> +		struct device_attribute *devattr, char *buf) = {
>> +	show_core_label,
>> +	show_core_temp,
>> +	show_tcontrol,
>> +	show_tjmax,
>> +	show_tcontrol_margin,
>> +};
>> +
>> +static const char *const core_suffix[CORE_TEMP_ATTRS] = {
>> +	"label",
>> +	"input",
>> +	"max",
>> +	"crit",
>> +	"crit_hyst",
>> +};
>> +
>> +static int check_resolved_cores(struct peci_hwmon *priv)
>> +{
>> +	struct peci_rd_pci_cfg_local_msg msg;
>> +	int rc;
>> +
>> +	if (!(priv->client->adapter->cmd_mask & BIT(PECI_CMD_RD_PCI_CFG_LOCAL)))
>> +		return -EINVAL;
>> +
>> +	/* Get the RESOLVED_CORES register value */
>> +	msg.addr = priv->addr;
>> +	msg.bus = 1;
>> +	msg.device = 30;
>> +	msg.function = 3;
>> +	msg.reg = 0xB4;
>> +	msg.rx_len = 4;
>> +
>> +	rc = send_peci_cmd(priv, PECI_CMD_RD_PCI_CFG_LOCAL, (void *)&msg);
>> +	if (rc < 0)
>> +		return rc;
>> +
>> +	priv->core_mask = msg.pci_config[3] << 24 |
>> +			  msg.pci_config[2] << 16 |
>> +			  msg.pci_config[1] << 8 |
>> +			  msg.pci_config[0];
>> +
>> +	if (!priv->core_mask)
>> +		return -EAGAIN;
>> +
>> +	dev_dbg(priv->dev, "Scanned resolved cores: 0x%x\n", priv->core_mask);
>> +	return 0;
>> +}
>> +
>> +static int create_core_temp_group(struct peci_hwmon *priv, int core_no)
>> +{
>> +	struct core_temp_group *data;
>> +	int i;
>> +
>> +	data = devm_kzalloc(priv->dev, sizeof(struct core_temp_group),
>> +			    GFP_KERNEL);
>> +	if (!data)
>> +		return -ENOMEM;
>> +
>> +	for (i = 0; i < CORE_TEMP_ATTRS; i++) {
>> +		snprintf(data->attr_name[i], ATTR_NAME_LEN,
>> +			 "temp%d_%s", priv->global_idx, core_suffix[i]);
>> +		sysfs_attr_init(&data->sd_attrs[i].dev_attr.attr);
>> +		data->sd_attrs[i].dev_attr.attr.name = data->attr_name[i];
>> +		data->sd_attrs[i].dev_attr.attr.mode = 0444;
>> +		data->sd_attrs[i].dev_attr.show = core_show_fn[i];
>> +		if (i == 0 || i == 1) /* label or temp */
>> +			data->sd_attrs[i].index = core_no;
>> +		data->attrs[i] = &data->sd_attrs[i].dev_attr.attr;
>> +	}
>> +
>> +	data->attr_group.attrs = data->attrs;
>> +	priv->core_attr_groups[priv->core_idx++] = &data->attr_group;
>> +	priv->global_idx++;
>> +
>> +	return 0;
>> +}
>> +
>> +static int create_core_temp_groups(struct peci_hwmon *priv)
>> +{
>> +	int rc, i;
>> +
>> +	rc = check_resolved_cores(priv);
>> +	if (!rc) {
>> +		for (i = 0; i < CORE_NUMS_MAX; i++) {
>> +			if (priv->core_mask & BIT(i)) {
>> +				rc = create_core_temp_group(priv, i);
>> +				if (rc)
>> +					return rc;
>> +			}
>> +		}
>> +
>> +		rc = sysfs_create_groups(&priv->hwmon_dev->kobj,
>> +					 priv->core_attr_groups);
>> +	}
>> +
>> +	return rc;
>> +}
>> +
>> +/* DIMM temperature */
>> +static ssize_t (*const dimm_show_fn[DIMM_TEMP_ATTRS]) (struct device *dev,
>> +		struct device_attribute *devattr, char *buf) = {
>> +	show_dimm_label,
>> +	show_dimm_temp,
>> +};
>> +
>> +static const char *const dimm_suffix[DIMM_TEMP_ATTRS] = {
>> +	"label",
>> +	"input",
>> +};
>> +
>> +static int check_populated_dimms(struct peci_hwmon *priv)
>> +{
>> +	struct peci_rd_pkg_cfg_msg msg;
>> +	int i, rc, pass = 0;
>> +
>> +do_scan:
>> +	for (i = 0; i < (DIMM_SLOT_NUMS_MAX / 2); i++) {
>> +		msg.addr = priv->addr;
>> +		msg.index = MBX_INDEX_DDR_DIMM_TEMP;
>> +		msg.param = i; /* channel */
>> +		msg.rx_len = 4;
>> +
>> +		rc = send_peci_cmd(priv, PECI_CMD_RD_PKG_CFG, (void *)&msg);
>> +		if (rc < 0)
>> +			return rc;
>> +
>> +		if (msg.pkg_config[0]) /* DIMM #0 on the channel */
>> +			priv->dimm_mask |= BIT(i);
>> +
>> +		if (msg.pkg_config[1]) /* DIMM #1 on the channel */
>> +			priv->dimm_mask |= BIT(i + 1);
> 
> Each loop sets overlapping bits in dimm_mask. The first loop sets
> bit 0 and 1, the second sets bit 1 and 2, and so on. I _think_ this
> should probably set bits (i*2) and (i*2+1). If so, I would suggest to
> test the code in a system with more than one DIMM in more than one bank.
> 

Thanks for your pointing out my mistake. It has to be changed to (i*2) 
and (i*2+1) as you suggested. I'll fix it with taking enough test on a 
various DIMM setting environment.

>> +	}
>> +
>> +	/* Do 2-pass scanning */
>> +	if (priv->dimm_mask && pass == 0) {
>> +		pass++;
>> +		goto do_scan;
> 
> This goto is only used to avoid a nested loops. Please don't do that.
> If you want to avoid indentation levels, add another function.
> 
> Also, this will require an explanation why the loop is executed if
> and only if a dimm is found the first time around.
> 

The reason why is needs 2-pass scanning is, we function can be called in 
the middle of the timing when cient BIOS is updating the values. I'll 
add this as a comment and rewrite this code without using the goto.

>> +	}
>> +
>> +	if (!priv->dimm_mask)
>> +		return -EAGAIN;
>> +
>> +	dev_dbg(priv->dev, "Scanned populated DIMMs: 0x%x\n", priv->dimm_mask);
>> +	return 0;
>> +}
>> +
>> +static int create_dimm_temp_group(struct peci_hwmon *priv, int dimm_no)
>> +{
>> +	struct dimm_temp_group *data;
>> +	int i;
>> +
>> +	data = devm_kzalloc(priv->dev, sizeof(struct dimm_temp_group),
>> +			    GFP_KERNEL);
>> +	if (!data)
>> +		return -ENOMEM;
>> +
>> +	for (i = 0; i < DIMM_TEMP_ATTRS; i++) {
>> +		snprintf(data->attr_name[i], ATTR_NAME_LEN,
>> +			 "temp%d_%s", priv->global_idx, dimm_suffix[i]);
>> +		sysfs_attr_init(&data->sd_attrs[i].dev_attr.attr);
>> +		data->sd_attrs[i].dev_attr.attr.name = data->attr_name[i];
>> +		data->sd_attrs[i].dev_attr.attr.mode = 0444;
>> +		data->sd_attrs[i].dev_attr.show = dimm_show_fn[i];
>> +		data->sd_attrs[i].index = dimm_no;
>> +		data->attrs[i] = &data->sd_attrs[i].dev_attr.attr;
>> +	}
>> +
>> +	data->attr_group.attrs = data->attrs;
>> +	priv->dimm_attr_groups[priv->dimm_idx++] = &data->attr_group;
>> +	priv->global_idx++;
>> +
>> +	return 0;
>> +}
>> +
>> +static int create_dimm_temp_groups(struct peci_hwmon *priv)
>> +{
>> +	int rc, i;
>> +
>> +	rc = check_populated_dimms(priv);
>> +	if (!rc) {
>> +		for (i = 0; i < DIMM_SLOT_NUMS_MAX; i++) {
>> +			if (priv->dimm_mask & BIT(i)) {
>> +				rc = create_dimm_temp_group(priv, i);
>> +				if (rc)
>> +					return rc;
>> +			}
>> +		}
>> +
>> +		rc = sysfs_create_groups(&priv->hwmon_dev->kobj,
>> +					 priv->dimm_attr_groups);
>> +		if (!rc)
>> +			dev_dbg(priv->dev, "Done DIMM temp group creation\n");
>> +	} else if (rc == -EAGAIN) {
>> +		queue_delayed_work(priv->work_queue, &priv->work_handler,
>> +				   DIMM_MASK_CHECK_DELAY);
>> +		dev_dbg(priv->dev, "Diferred DIMM temp group creation\n");
> 
> s/Diferred/Deferred/
> 

Will fix this typo.

> If PECI never reports any DIMMS, this will be repeated forever until
> it finds at least one group. Is this intentional ? If so, I would expect
> some detailed explanation of the rationale. As it is, the DIMM temperatures
> can show up randomly after some hours of runtime, which isn't exactly
> deterministic. Maybe that does make sense, but it will need to be explained.
> 

In general, a client CPU will report DIMM population info just after the 
client CPU completes memory training and testing at the very beginning 
of BIOS boot. The time varies depends on the client system but it would 
be less than 5 minutes. I'll add a timeout logic.

>> +	}
>> +
>> +	return rc;
>> +}
>> +
>> +static void create_dimm_temp_groups_delayed(struct work_struct *work)
>> +{
>> +	struct delayed_work *dwork = to_delayed_work(work);
>> +	struct peci_hwmon *priv = container_of(dwork, struct peci_hwmon,
>> +					       work_handler);
>> +	int rc;
>> +
>> +	rc = create_dimm_temp_groups(priv);
>> +	if (rc && rc != -EAGAIN)
>> +		dev_dbg(priv->dev, "Skipped to creat DIMM temp groups\n");
>> +}
>> +
>> +static int peci_hwmon_probe(struct peci_client *client)
>> +{
>> +	struct device *dev = &client->dev;
>> +	struct peci_hwmon *priv;
>> +	int rc;
>> +
>> +	if ((client->adapter->cmd_mask &
>> +	    (BIT(PECI_CMD_GET_TEMP) | BIT(PECI_CMD_RD_PKG_CFG))) !=
>> +	    (BIT(PECI_CMD_GET_TEMP) | BIT(PECI_CMD_RD_PKG_CFG))) {
>> +		dev_err(dev, "Client doesn't support temperature monitoring\n");
>> +		return -EINVAL;
>> +	}
>> +
>> +	priv = devm_kzalloc(dev, sizeof(*priv), GFP_KERNEL);
>> +	if (!priv)
>> +		return -ENOMEM;
>> +
>> +	dev_set_drvdata(dev, priv);
>> +	priv->client = client;
>> +	priv->dev = dev;
>> +	priv->addr = client->addr;
>> +	priv->cpu_no = priv->addr - PECI_BASE_ADDR;
>> +
>> +	snprintf(priv->name, PECI_NAME_SIZE, "peci_hwmon.cpu%d", priv->cpu_no);
>> +
>> +	priv->work_queue = create_singlethread_workqueue(priv->name);
>> +	if (!priv->work_queue)
>> +		return -ENOMEM;
>> +
>> +	priv->hwmon_dev = hwmon_device_register_with_groups(priv->dev,
>> +							    priv->name,
>> +							    priv,
>> +							   default_attr_groups);
>> +
> I'll expect a detailed explanation why using hwmon_device_register_with_info()
> does not work for this driver, and why it would make sense to ever register
> the hwmon device before its attributes are available. From my perspective,
> the driver should delay registration entirely until all attributes are
> available. The hwmon ABI implicitly assumes that all sensors are available
> at the time of hwmon device registration. Anything else can result in
> unexpected behavior.
> 

AFAIK, hwmon_device_register_with_info is for adding a non-standard 
attribute so hwmon_device_register_with_group is correct in this case. 
Also, this delayed additional registration case is being used in 
core-temp.c using a similar way.

>> +	rc = PTR_ERR_OR_ZERO(priv->hwmon_dev);
>> +	if (rc) {
>> +		dev_err(dev, "Failed to register peci hwmon\n");
>> +		return rc;
>> +	}
>> +
>> +	priv->global_idx = DEFAULT_ATTR_GRP_NUMS + 1;
>> +
>> +	rc = create_core_temp_groups(priv);
>> +	if (rc) {
>> +		dev_err(dev, "Failed to create core groups\n");
>> +		return rc;
>> +	}
> 
> This should be done before registering the hwmon device (or be left
> to the hwmon core by using the _info API). And it should definitely
> not return an error while keeping the hwmon device around.
> 

As I answers above, hwmon_device_register_with_info is for adding a 
non-standard attribute and this kind of way is being used in core-temp.c 
using a similar way.

>> +
>> +	INIT_DELAYED_WORK(&priv->work_handler, create_dimm_temp_groups_delayed);
>> +
>> +	rc = create_dimm_temp_groups(priv);
>> +	if (rc && rc != -EAGAIN)
>> +		dev_dbg(dev, "Skipped to creat DIMM temp groups\n");
>> +
> Not that it should be there in the first place, but "creat" is not a word.
> 

Please check my above answers. I'll fix the typo.

Again, Thanks a lot for sharing your time to review it. I really 
appreciate it.

BR,
Jae

>> +	dev_dbg(dev, "peci hwmon for CPU at 0x%x registered\n", priv->addr);
>> +
>> +	return 0;
>> +}
>> +
>> +static int peci_hwmon_remove(struct peci_client *client)
>> +{
>> +	struct peci_hwmon *priv = dev_get_drvdata(&client->dev);
>> +
>> +	cancel_delayed_work(&priv->work_handler);
>> +	destroy_workqueue(priv->work_queue);
>> +	sysfs_remove_groups(&priv->hwmon_dev->kobj, priv->core_attr_groups);
>> +	sysfs_remove_groups(&priv->hwmon_dev->kobj, priv->dimm_attr_groups);
>> +	hwmon_device_unregister(priv->hwmon_dev);
>> +
>> +	return 0;
>> +}
>> +
>> +static const struct of_device_id peci_of_table[] = {
>> +	{ .compatible = "intel,peci-hwmon", },
>> +	{ }
>> +};
>> +MODULE_DEVICE_TABLE(of, peci_of_table);
>> +
>> +static struct peci_driver peci_hwmon_driver = {
>> +	.probe  = peci_hwmon_probe,
>> +	.remove = peci_hwmon_remove,
>> +	.driver = {
>> +		.name           = "peci-hwmon",
>> +		.of_match_table = of_match_ptr(peci_of_table),
>> +	},
>> +};
>> +module_peci_driver(peci_hwmon_driver);
>> +
>> +MODULE_AUTHOR("Jae Hyun Yoo <jae.hyun.yoo@linux.intel.com>");
>> +MODULE_DESCRIPTION("PECI hwmon driver");
>> +MODULE_LICENSE("GPL v2");
>> -- 
>> 2.16.1
>>

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v2 7/8] [PATCH 7/8] drivers/hwmon: Add a generic PECI hwmon client driver
  2018-02-21 21:24     ` Jae Hyun Yoo
@ 2018-02-21 21:48       ` Guenter Roeck
  2018-02-21 23:07         ` Jae Hyun Yoo
  0 siblings, 1 reply; 46+ messages in thread
From: Guenter Roeck @ 2018-02-21 21:48 UTC (permalink / raw)
  To: Jae Hyun Yoo
  Cc: joel, andrew, arnd, gregkh, jdelvare, benh, andrew, linux-kernel,
	linux-doc, devicetree, linux-hwmon, linux-arm-kernel, openbmc

On Wed, Feb 21, 2018 at 01:24:48PM -0800, Jae Hyun Yoo wrote:
> Hi Guenter,
> 
> Thanks for sharing your time to review this code. Please check my answers
> inline.
> 
> On 2/21/2018 10:26 AM, Guenter Roeck wrote:
> >On Wed, Feb 21, 2018 at 08:16:05AM -0800, Jae Hyun Yoo wrote:
> >>This commit adds a generic PECI hwmon client driver implementation.
> >>
> >>Signed-off-by: Jae Hyun Yoo <jae.hyun.yoo@linux.intel.com>
> >>---
> >>  drivers/hwmon/Kconfig      |  10 +
> >>  drivers/hwmon/Makefile     |   1 +
> >>  drivers/hwmon/peci-hwmon.c | 928 +++++++++++++++++++++++++++++++++++++++++++++
> >>  3 files changed, 939 insertions(+)
> >>  create mode 100644 drivers/hwmon/peci-hwmon.c
> >>
> >>diff --git a/drivers/hwmon/Kconfig b/drivers/hwmon/Kconfig
> >>index ef23553ff5cb..f22e0c31f597 100644
> >>--- a/drivers/hwmon/Kconfig
> >>+++ b/drivers/hwmon/Kconfig
> >>@@ -1246,6 +1246,16 @@ config SENSORS_NCT7904
> >>  	  This driver can also be built as a module.  If so, the module
> >>  	  will be called nct7904.
> >>+config SENSORS_PECI_HWMON
> >>+	tristate "PECI hwmon support"
> >>+	depends on PECI
> >>+	help
> >>+	  If you say yes here you get support for the generic PECI hwmon
> >>+	  driver.
> >>+
> >>+	  This driver can also be built as a module.  If so, the module
> >>+	  will be called peci-hwmon.
> >>+
> >>  config SENSORS_NSA320
> >>  	tristate "ZyXEL NSA320 and compatible fan speed and temperature sensors"
> >>  	depends on GPIOLIB && OF
> >>diff --git a/drivers/hwmon/Makefile b/drivers/hwmon/Makefile
> >>index f814b4ace138..946f54b168e5 100644
> >>--- a/drivers/hwmon/Makefile
> >>+++ b/drivers/hwmon/Makefile
> >>@@ -135,6 +135,7 @@ obj-$(CONFIG_SENSORS_NCT7802)	+= nct7802.o
> >>  obj-$(CONFIG_SENSORS_NCT7904)	+= nct7904.o
> >>  obj-$(CONFIG_SENSORS_NSA320)	+= nsa320-hwmon.o
> >>  obj-$(CONFIG_SENSORS_NTC_THERMISTOR)	+= ntc_thermistor.o
> >>+obj-$(CONFIG_SENSORS_PECI_HWMON)	+= peci-hwmon.o
> >>  obj-$(CONFIG_SENSORS_PC87360)	+= pc87360.o
> >>  obj-$(CONFIG_SENSORS_PC87427)	+= pc87427.o
> >>  obj-$(CONFIG_SENSORS_PCF8591)	+= pcf8591.o
> >>diff --git a/drivers/hwmon/peci-hwmon.c b/drivers/hwmon/peci-hwmon.c
> >>new file mode 100644
> >>index 000000000000..edd27744adcb
> >>--- /dev/null
> >>+++ b/drivers/hwmon/peci-hwmon.c
> >>@@ -0,0 +1,928 @@
> >>+// SPDX-License-Identifier: GPL-2.0
> >>+// Copyright (c) 2018 Intel Corporation
> >>+
> >>+#include <linux/delay.h>
> >>+#include <linux/hwmon.h>
> >>+#include <linux/hwmon-sysfs.h>
> >>+#include <linux/jiffies.h>
> >>+#include <linux/module.h>
> >>+#include <linux/of_device.h>
> >>+#include <linux/peci.h>
> >>+#include <linux/workqueue.h>
> >>+
> >>+#define DIMM_SLOT_NUMS_MAX    12  /* Max DIMM numbers (channel ranks x 2) */
> >>+#define CORE_NUMS_MAX         28  /* Max core numbers (max on SKX Platinum) */
> >>+#define TEMP_TYPE_PECI        6   /* Sensor type 6: Intel PECI */
> >>+
> >>+#define CORE_TEMP_ATTRS       5
> >>+#define DIMM_TEMP_ATTRS       2
> >>+#define ATTR_NAME_LEN         24
> >>+
> >>+#define DEFAULT_ATTR_GRP_NUMS 5
> >>+
> >>+#define UPDATE_INTERVAL_MIN   HZ
> >>+#define DIMM_MASK_CHECK_DELAY msecs_to_jiffies(5000)
> >>+
> >>+enum sign {
> >>+	POS,
> >>+	NEG
> >>+};
> >>+
> >>+struct temp_data {
> >>+	bool valid;
> >>+	s32  value;
> >>+	unsigned long last_updated;
> >>+};
> >>+
> >>+struct temp_group {
> >>+	struct temp_data tjmax;
> >>+	struct temp_data tcontrol;
> >>+	struct temp_data tthrottle;
> >>+	struct temp_data dts_margin;
> >>+	struct temp_data die;
> >>+	struct temp_data core[CORE_NUMS_MAX];
> >>+	struct temp_data dimm[DIMM_SLOT_NUMS_MAX];
> >>+};
> >>+
> >>+struct core_temp_group {
> >>+	struct sensor_device_attribute sd_attrs[CORE_TEMP_ATTRS];
> >>+	char attr_name[CORE_TEMP_ATTRS][ATTR_NAME_LEN];
> >>+	struct attribute *attrs[CORE_TEMP_ATTRS + 1];
> >>+	struct attribute_group attr_group;
> >>+};
> >>+
> >>+struct dimm_temp_group {
> >>+	struct sensor_device_attribute sd_attrs[DIMM_TEMP_ATTRS];
> >>+	char attr_name[DIMM_TEMP_ATTRS][ATTR_NAME_LEN];
> >>+	struct attribute *attrs[DIMM_TEMP_ATTRS + 1];
> >>+	struct attribute_group attr_group;
> >>+};
> >>+
> >>+struct peci_hwmon {
> >>+	struct peci_client *client;
> >>+	struct device *dev;
> >>+	struct device *hwmon_dev;
> >>+	struct workqueue_struct *work_queue;
> >>+	struct delayed_work work_handler;
> >>+	char name[PECI_NAME_SIZE];
> >>+	struct temp_group temp;
> >>+	u8 addr;
> >>+	uint cpu_no;
> >>+	u32 core_mask;
> >>+	u32 dimm_mask;
> >>+	const struct attribute_group *core_attr_groups[CORE_NUMS_MAX + 1];
> >>+	const struct attribute_group *dimm_attr_groups[DIMM_SLOT_NUMS_MAX + 1];
> >>+	uint global_idx;
> >>+	uint core_idx;
> >>+	uint dimm_idx;
> >>+};
> >>+
> >>+enum label {
> >>+	L_DIE,
> >>+	L_DTS,
> >>+	L_TCONTROL,
> >>+	L_TTHROTTLE,
> >>+	L_TJMAX,
> >>+	L_MAX
> >>+};
> >>+
> >>+static const char *peci_label[L_MAX] = {
> >>+	"Die\n",
> >>+	"DTS margin to Tcontrol\n",
> >>+	"Tcontrol\n",
> >>+	"Tthrottle\n",
> >>+	"Tjmax\n",
> >>+};
> >>+
> >>+static int send_peci_cmd(struct peci_hwmon *priv, enum peci_cmd cmd, void *msg)
> >>+{
> >>+	return peci_command(priv->client->adapter, cmd, msg);
> >>+}
> >>+
> >>+static int need_update(struct temp_data *temp)
> >>+{
> >>+	if (temp->valid &&
> >>+	    time_before(jiffies, temp->last_updated + UPDATE_INTERVAL_MIN))
> >>+		return 0;
> >>+
> >>+	return 1;
> >>+}
> >>+
> >>+static s32 ten_dot_six_to_millidegree(s32 x)
> >>+{
> >>+	return ((((x) ^ 0x8000) - 0x8000) * 1000 / 64);
> >>+}
> >>+
> >>+static int get_tjmax(struct peci_hwmon *priv)
> >>+{
> >>+	struct peci_rd_pkg_cfg_msg msg;
> >>+	int rc;
> >>+
> >>+	if (!priv->temp.tjmax.valid) {
> >>+		msg.addr = priv->addr;
> >>+		msg.index = MBX_INDEX_TEMP_TARGET;
> >>+		msg.param = 0;
> >>+		msg.rx_len = 4;
> >>+
> >>+		rc = send_peci_cmd(priv, PECI_CMD_RD_PKG_CFG, (void *)&msg);
> >
> >Is a typecast to a void * necessary ?
> >
> 
> No. I'll remove the type cast. Thanks!
> 
> >>+		if (rc < 0)
> >>+			return rc;
> >>+
> >>+		priv->temp.tjmax.value = (s32)msg.pkg_config[2] * 1000;
> >>+		priv->temp.tjmax.valid = true;
> >>+	}
> >>+
> >>+	return 0;
> >>+}
> >>+
> >>+static int get_tcontrol(struct peci_hwmon *priv)
> >>+{
> >>+	struct peci_rd_pkg_cfg_msg msg;
> >>+	s32 tcontrol_margin;
> >>+	int rc;
> >>+
> >>+	if (!need_update(&priv->temp.tcontrol))
> >>+		return 0;
> >>+
> >>+	rc = get_tjmax(priv);
> >>+	if (rc < 0)
> >>+		return rc;
> >>+
> >>+	msg.addr = priv->addr;
> >>+	msg.index = MBX_INDEX_TEMP_TARGET;
> >>+	msg.param = 0;
> >>+	msg.rx_len = 4;
> >>+
> >>+	rc = send_peci_cmd(priv, PECI_CMD_RD_PKG_CFG, (void *)&msg);
> >>+	if (rc < 0)
> >>+		return rc;
> >>+
> >>+	tcontrol_margin = msg.pkg_config[1];
> >>+	tcontrol_margin = ((tcontrol_margin ^ 0x80) - 0x80) * 1000;
> >>+
> >>+	priv->temp.tcontrol.value = priv->temp.tjmax.value - tcontrol_margin;
> >>+
> >>+	if (!priv->temp.tcontrol.valid) {
> >>+		priv->temp.tcontrol.last_updated = INITIAL_JIFFIES;
> >>+		priv->temp.tcontrol.valid = true;
> >>+	} else {
> >>+		priv->temp.tcontrol.last_updated = jiffies;
> >>+	}
> >>+
> >>+	return 0;
> >>+}
> >>+
> >>+static int get_tthrottle(struct peci_hwmon *priv)
> >>+{
> >>+	struct peci_rd_pkg_cfg_msg msg;
> >>+	s32 tthrottle_offset;
> >>+	int rc;
> >>+
> >>+	if (!need_update(&priv->temp.tthrottle))
> >>+		return 0;
> >>+
> >>+	rc = get_tjmax(priv);
> >>+	if (rc < 0)
> >>+		return rc;
> >>+
> >>+	msg.addr = priv->addr;
> >>+	msg.index = MBX_INDEX_TEMP_TARGET;
> >>+	msg.param = 0;
> >>+	msg.rx_len = 4;
> >>+
> >>+	rc = send_peci_cmd(priv, PECI_CMD_RD_PKG_CFG, (void *)&msg);
> >>+	if (rc < 0)
> >>+		return rc;
> >>+
> >>+	tthrottle_offset = (msg.pkg_config[3] & 0x2f) * 1000;
> >>+	priv->temp.tthrottle.value = priv->temp.tjmax.value - tthrottle_offset;
> >>+
> >>+	if (!priv->temp.tthrottle.valid) {
> >>+		priv->temp.tthrottle.last_updated = INITIAL_JIFFIES;
> >>+		priv->temp.tthrottle.valid = true;
> >>+	} else {
> >>+		priv->temp.tthrottle.last_updated = jiffies;
> >>+	}
> >>+
> >>+	return 0;
> >>+}
> >>+
> >>+static int get_die_temp(struct peci_hwmon *priv)
> >>+{
> >>+	struct peci_get_temp_msg msg;
> >>+	int rc;
> >>+
> >>+	if (!need_update(&priv->temp.die))
> >>+		return 0;
> >>+
> >>+	rc = get_tjmax(priv);
> >>+	if (rc < 0)
> >>+		return rc;
> >>+
> >>+	msg.addr = priv->addr;
> >>+
> >>+	rc = send_peci_cmd(priv, PECI_CMD_GET_TEMP, (void *)&msg);
> >>+	if (rc < 0)
> >>+		return rc;
> >>+
> >>+	priv->temp.die.value = priv->temp.tjmax.value +
> >>+			       ((s32)msg.temp_raw * 1000 / 64);
> >>+
> >>+	if (!priv->temp.die.valid) {
> >>+		priv->temp.die.last_updated = INITIAL_JIFFIES;
> >>+		priv->temp.die.valid = true;
> >>+	} else {
> >>+		priv->temp.die.last_updated = jiffies;
> >>+	}
> >>+
> >>+	return 0;
> >>+}
> >>+
> >>+static int get_dts_margin(struct peci_hwmon *priv)
> >>+{
> >>+	struct peci_rd_pkg_cfg_msg msg;
> >>+	s32 dts_margin;
> >>+	int rc;
> >>+
> >>+	if (!need_update(&priv->temp.dts_margin))
> >>+		return 0;
> >>+
> >>+	msg.addr = priv->addr;
> >>+	msg.index = MBX_INDEX_DTS_MARGIN;
> >>+	msg.param = 0;
> >>+	msg.rx_len = 4;
> >>+
> >>+	rc = send_peci_cmd(priv, PECI_CMD_RD_PKG_CFG, (void *)&msg);
> >>+	if (rc < 0)
> >>+		return rc;
> >>+
> >>+	dts_margin = (msg.pkg_config[1] << 8) | msg.pkg_config[0];
> >>+
> >>+	/**
> >>+	 * Processors return a value of DTS reading in 10.6 format
> >>+	 * (10 bits signed decimal, 6 bits fractional).
> >>+	 * Error codes:
> >>+	 *   0x8000: General sensor error
> >>+	 *   0x8001: Reserved
> >>+	 *   0x8002: Underflow on reading value
> >>+	 *   0x8003-0x81ff: Reserved
> >>+	 */
> >>+	if (dts_margin >= 0x8000 && dts_margin <= 0x81ff)
> >>+		return -1;
> >>+
> >>+	dts_margin = ten_dot_six_to_millidegree(dts_margin);
> >>+
> >>+	priv->temp.dts_margin.value = dts_margin;
> >>+
> >>+	if (!priv->temp.dts_margin.valid) {
> >>+		priv->temp.dts_margin.last_updated = INITIAL_JIFFIES;
> >>+		priv->temp.dts_margin.valid = true;
> >>+	} else {
> >>+		priv->temp.dts_margin.last_updated = jiffies;
> >>+	}
> >>+
> >>+	return 0;
> >>+}
> >>+
> >>+static int get_core_temp(struct peci_hwmon *priv, int core_index)
> >>+{
> >>+	struct peci_rd_pkg_cfg_msg msg;
> >>+	s32 core_dts_margin;
> >>+	int rc;
> >>+
> >>+	if (!need_update(&priv->temp.core[core_index]))
> >>+		return 0;
> >>+
> >>+	rc = get_tjmax(priv);
> >>+	if (rc < 0)
> >>+		return rc;
> >>+
> >>+	msg.addr = priv->addr;
> >>+	msg.index = MBX_INDEX_PER_CORE_DTS_TEMP;
> >>+	msg.param = core_index;
> >>+	msg.rx_len = 4;
> >>+
> >>+	rc = send_peci_cmd(priv, PECI_CMD_RD_PKG_CFG, (void *)&msg);
> >>+	if (rc < 0)
> >>+		return rc;
> >>+
> >>+	core_dts_margin = (msg.pkg_config[1] << 8) | msg.pkg_config[0];
> >>+
> >>+	/**
> >>+	 * Processors return a value of the core DTS reading in 10.6 format
> >>+	 * (10 bits signed decimal, 6 bits fractional).
> >>+	 * Error codes:
> >>+	 *   0x8000: General sensor error
> >>+	 *   0x8001: Reserved
> >>+	 *   0x8002: Underflow on reading value
> >>+	 *   0x8003-0x81ff: Reserved
> >>+	 */
> >>+	if (core_dts_margin >= 0x8000 && core_dts_margin <= 0x81ff)
> >>+		return -1;
> >
> >Please use valid error codes. This value is returned to user space,
> >and I don't think this error should be EPERM.
> >
> 
> Yes, you are right. I'll fix it.
> 
> >>+
> >>+	core_dts_margin = ten_dot_six_to_millidegree(core_dts_margin);
> >>+
> >>+	priv->temp.core[core_index].value = priv->temp.tjmax.value +
> >>+					    core_dts_margin;
> >>+
> >>+	if (!priv->temp.core[core_index].valid) {
> >>+		priv->temp.core[core_index].last_updated = INITIAL_JIFFIES;
> >>+		priv->temp.core[core_index].valid = true;
> >>+	} else {
> >>+		priv->temp.core[core_index].last_updated = jiffies;
> >>+	}
> >
> >I don't understand the purpose of this code. Why not just set valid = true
> >and last_updated = jiffies ? Why set anything to INITIAL_JIFFIES some
> >arbitrary time after boot ? AFAICS the first read will always be followed
> >by another immediately afterwards if the user requests two readings in
> >a row. Maybe that is intentional, but not to me. If this code is on
> >purpose, it will require a detailed explanation.
> >
> 
> Agreed. I'll rewrite the code as a function like this:
> 
> static void mark_updated(struct temp_data *temp)
> {
> 	if (!temp->valid)
> 		temp->valid = true;

Are you concerned about memory write bandwidth ?

	temp->valid = true;

should be sufficient. I don't see the point of the if statement.

> 
> 	temp->last_updated = jiffies;
> }
> 
> >>+
> >>+	return 0;
> >>+}
> >>+
> >>+static int get_dimm_temp(struct peci_hwmon *priv, int dimm_index)
> >>+{
> >>+	struct peci_rd_pkg_cfg_msg msg;
> >>+	int channel = dimm_index / 2;
> >>+	int dimm_order = dimm_index % 2;
> >>+	int rc;
> >>+
> >>+	if (!need_update(&priv->temp.dimm[dimm_index]))
> >>+		return 0;
> >>+
> >>+	msg.addr = priv->addr;
> >>+	msg.index = MBX_INDEX_DDR_DIMM_TEMP;
> >>+	msg.param = channel;
> >>+	msg.rx_len = 4;
> >>+
> >>+	rc = send_peci_cmd(priv, PECI_CMD_RD_PKG_CFG, (void *)&msg);
> >>+	if (rc < 0)
> >>+		return rc;
> >>+
> >>+	priv->temp.dimm[dimm_index].value = msg.pkg_config[dimm_order] * 1000;
> >>+
> >>+	if (!priv->temp.dimm[dimm_index].valid) {
> >>+		priv->temp.dimm[dimm_index].last_updated = INITIAL_JIFFIES;
> >>+		priv->temp.dimm[dimm_index].valid = true;
> >>+	} else {
> >>+		priv->temp.dimm[dimm_index].last_updated = jiffies;
> >>+	}
> >>+
> >>+	return 0;
> >>+}
> >>+
> >>+static ssize_t show_tcontrol(struct device *dev,
> >>+			     struct device_attribute *attr,
> >>+			     char *buf)
> >>+{
> >>+	struct peci_hwmon *priv = dev_get_drvdata(dev);
> >>+	int rc;
> >>+
> >>+	rc = get_tcontrol(priv);
> >>+	if (rc < 0)
> >>+		return rc;
> >>+
> >>+	return sprintf(buf, "%d\n", priv->temp.tcontrol.value);
> >>+}
> >>+
> >>+static ssize_t show_tcontrol_margin(struct device *dev,
> >>+				    struct device_attribute *attr,
> >>+				    char *buf)
> >>+{
> >>+	struct peci_hwmon *priv = dev_get_drvdata(dev);
> >>+	struct sensor_device_attribute *sensor_attr = to_sensor_dev_attr(attr);
> >>+	int rc;
> >>+
> >>+	rc = get_tcontrol(priv);
> >>+	if (rc < 0)
> >>+		return rc;
> >>+
> >>+	return sprintf(buf, "%d\n", sensor_attr->index == POS ?
> >>+				    priv->temp.tjmax.value -
> >>+				    priv->temp.tcontrol.value :
> >>+				    priv->temp.tcontrol.value -
> >>+				    priv->temp.tjmax.value);
> >>+}
> >>+
> >>+static ssize_t show_tthrottle(struct device *dev,
> >>+			      struct device_attribute *attr,
> >>+			      char *buf)
> >>+{
> >>+	struct peci_hwmon *priv = dev_get_drvdata(dev);
> >>+	int rc;
> >>+
> >>+	rc = get_tthrottle(priv);
> >>+	if (rc < 0)
> >>+		return rc;
> >>+
> >>+	return sprintf(buf, "%d\n", priv->temp.tthrottle.value);
> >>+}
> >>+
> >>+static ssize_t show_tjmax(struct device *dev,
> >>+			  struct device_attribute *attr,
> >>+			  char *buf)
> >>+{
> >>+	struct peci_hwmon *priv = dev_get_drvdata(dev);
> >>+	int rc;
> >>+
> >>+	rc = get_tjmax(priv);
> >>+	if (rc < 0)
> >>+		return rc;
> >>+
> >>+	return sprintf(buf, "%d\n", priv->temp.tjmax.value);
> >>+}
> >>+
> >>+static ssize_t show_die_temp(struct device *dev,
> >>+			     struct device_attribute *attr,
> >>+			     char *buf)
> >>+{
> >>+	struct peci_hwmon *priv = dev_get_drvdata(dev);
> >>+	int rc;
> >>+
> >>+	rc = get_die_temp(priv);
> >>+	if (rc < 0)
> >>+		return rc;
> >>+
> >>+	return sprintf(buf, "%d\n", priv->temp.die.value);
> >>+}
> >>+
> >>+static ssize_t show_dts_margin(struct device *dev,
> >>+			       struct device_attribute *attr,
> >>+			       char *buf)
> >>+{
> >>+	struct peci_hwmon *priv = dev_get_drvdata(dev);
> >>+	int rc;
> >>+
> >>+	rc = get_dts_margin(priv);
> >>+	if (rc < 0)
> >>+		return rc;
> >>+
> >>+	return sprintf(buf, "%d\n", priv->temp.dts_margin.value);
> >>+}
> >>+
> >>+static ssize_t show_core_temp(struct device *dev,
> >>+			      struct device_attribute *attr,
> >>+			      char *buf)
> >>+{
> >>+	struct peci_hwmon *priv = dev_get_drvdata(dev);
> >>+	struct sensor_device_attribute *sensor_attr = to_sensor_dev_attr(attr);
> >>+	int core_index = sensor_attr->index;
> >>+	int rc;
> >>+
> >>+	rc = get_core_temp(priv, core_index);
> >>+	if (rc < 0)
> >>+		return rc;
> >>+
> >>+	return sprintf(buf, "%d\n", priv->temp.core[core_index].value);
> >>+}
> >>+
> >>+static ssize_t show_dimm_temp(struct device *dev,
> >>+			      struct device_attribute *attr,
> >>+			      char *buf)
> >>+{
> >>+	struct peci_hwmon *priv = dev_get_drvdata(dev);
> >>+	struct sensor_device_attribute *sensor_attr = to_sensor_dev_attr(attr);
> >>+	int dimm_index = sensor_attr->index;
> >>+	int rc;
> >>+
> >>+	rc = get_dimm_temp(priv, dimm_index);
> >>+	if (rc < 0)
> >>+		return rc;
> >>+
> >>+	return sprintf(buf, "%d\n", priv->temp.dimm[dimm_index].value);
> >>+}
> >>+
> >>+static ssize_t show_value(struct device *dev,
> >>+			  struct device_attribute *attr,
> >>+			  char *buf)
> >>+{
> >>+	struct sensor_device_attribute *sensor_attr = to_sensor_dev_attr(attr);
> >>+
> >>+	return sprintf(buf, "%d\n", sensor_attr->index);
> >>+}
> >>+
> >>+static ssize_t show_label(struct device *dev,
> >>+			  struct device_attribute *attr,
> >>+			  char *buf)
> >>+{
> >>+	struct sensor_device_attribute *sensor_attr = to_sensor_dev_attr(attr);
> >>+
> >>+	return sprintf(buf, peci_label[sensor_attr->index]);
> >>+}
> >>+
> >>+static ssize_t show_core_label(struct device *dev,
> >>+			       struct device_attribute *attr,
> >>+			       char *buf)
> >>+{
> >>+	struct sensor_device_attribute *sensor_attr = to_sensor_dev_attr(attr);
> >>+
> >>+	return sprintf(buf, "Core %d\n", sensor_attr->index);
> >>+}
> >>+
> >>+static ssize_t show_dimm_label(struct device *dev,
> >>+			       struct device_attribute *attr,
> >>+			       char *buf)
> >>+{
> >>+	struct sensor_device_attribute *sensor_attr = to_sensor_dev_attr(attr);
> >>+
> >>+	char channel = 'A' + (sensor_attr->index / 2);
> >>+	int index = sensor_attr->index % 2;
> >>+
> >>+	return sprintf(buf, "DIMM %d (%c%d)\n",
> >>+		       sensor_attr->index, channel, index);
> >>+}
> >>+
> >>+/* Die temperature */
> >>+static SENSOR_DEVICE_ATTR(temp1_label, 0444, show_label, NULL, L_DIE);
> >>+static SENSOR_DEVICE_ATTR(temp1_input, 0444, show_die_temp, NULL, 0);
> >>+static SENSOR_DEVICE_ATTR(temp1_max, 0444, show_tcontrol, NULL, 0);
> >>+static SENSOR_DEVICE_ATTR(temp1_crit, 0444, show_tjmax, NULL, 0);
> >>+static SENSOR_DEVICE_ATTR(temp1_crit_hyst, 0444, show_tcontrol_margin, NULL,
> >>+			  POS);
> >>+
> >>+static struct attribute *die_temp_attrs[] = {
> >>+	&sensor_dev_attr_temp1_label.dev_attr.attr,
> >>+	&sensor_dev_attr_temp1_input.dev_attr.attr,
> >>+	&sensor_dev_attr_temp1_max.dev_attr.attr,
> >>+	&sensor_dev_attr_temp1_crit.dev_attr.attr,
> >>+	&sensor_dev_attr_temp1_crit_hyst.dev_attr.attr,
> >>+	NULL
> >>+};
> >>+
> >>+static struct attribute_group die_temp_attr_group = {
> >>+	.attrs = die_temp_attrs,
> >>+};
> >>+
> >>+/* DTS margin temperature */
> >>+static SENSOR_DEVICE_ATTR(temp2_label, 0444, show_label, NULL, L_DTS);
> >>+static SENSOR_DEVICE_ATTR(temp2_input, 0444, show_dts_margin, NULL, 0);
> >>+static SENSOR_DEVICE_ATTR(temp2_min, 0444, show_value, NULL, 0);
> >>+static SENSOR_DEVICE_ATTR(temp2_lcrit, 0444, show_tcontrol_margin, NULL, NEG);
> >>+
> >>+static struct attribute *dts_margin_temp_attrs[] = {
> >>+	&sensor_dev_attr_temp2_label.dev_attr.attr,
> >>+	&sensor_dev_attr_temp2_input.dev_attr.attr,
> >>+	&sensor_dev_attr_temp2_min.dev_attr.attr,
> >>+	&sensor_dev_attr_temp2_lcrit.dev_attr.attr,
> >>+	NULL
> >>+};
> >>+
> >>+static struct attribute_group dts_margin_temp_attr_group = {
> >>+	.attrs = dts_margin_temp_attrs,
> >>+};
> >>+
> >>+/* Tcontrol temperature */
> >>+static SENSOR_DEVICE_ATTR(temp3_label, 0444, show_label, NULL, L_TCONTROL);
> >>+static SENSOR_DEVICE_ATTR(temp3_input, 0444, show_tcontrol, NULL, 0);
> >>+static SENSOR_DEVICE_ATTR(temp3_crit, 0444, show_tjmax, NULL, 0);
> >>+
> >>+static struct attribute *tcontrol_temp_attrs[] = {
> >>+	&sensor_dev_attr_temp3_label.dev_attr.attr,
> >>+	&sensor_dev_attr_temp3_input.dev_attr.attr,
> >>+	&sensor_dev_attr_temp3_crit.dev_attr.attr,
> >>+	NULL
> >>+};
> >>+
> >>+static struct attribute_group tcontrol_temp_attr_group = {
> >>+	.attrs = tcontrol_temp_attrs,
> >>+};
> >>+
> >>+/* Tthrottle temperature */
> >>+static SENSOR_DEVICE_ATTR(temp4_label, 0444, show_label, NULL, L_TTHROTTLE);
> >>+static SENSOR_DEVICE_ATTR(temp4_input, 0444, show_tthrottle, NULL, 0);
> >>+
> >>+static struct attribute *tthrottle_temp_attrs[] = {
> >>+	&sensor_dev_attr_temp4_label.dev_attr.attr,
> >>+	&sensor_dev_attr_temp4_input.dev_attr.attr,
> >>+	NULL
> >>+};
> >>+
> >>+static struct attribute_group tthrottle_temp_attr_group = {
> >>+	.attrs = tthrottle_temp_attrs,
> >>+};
> >>+
> >>+/* Tjmax temperature */
> >>+static SENSOR_DEVICE_ATTR(temp5_label, 0444, show_label, NULL, L_TJMAX);
> >>+static SENSOR_DEVICE_ATTR(temp5_input, 0444, show_tjmax, NULL, 0);
> >>+
> >>+static struct attribute *tjmax_temp_attrs[] = {
> >>+	&sensor_dev_attr_temp5_label.dev_attr.attr,
> >>+	&sensor_dev_attr_temp5_input.dev_attr.attr,
> >>+	NULL
> >>+};
> >>+
> >>+static struct attribute_group tjmax_temp_attr_group = {
> >>+	.attrs = tjmax_temp_attrs,
> >>+};
> >>+
> >>+static const struct attribute_group *
> >>+default_attr_groups[DEFAULT_ATTR_GRP_NUMS + 1] = {
> >>+	&die_temp_attr_group,
> >>+	&dts_margin_temp_attr_group,
> >>+	&tcontrol_temp_attr_group,
> >>+	&tthrottle_temp_attr_group,
> >>+	&tjmax_temp_attr_group,
> >>+	NULL
> >>+};
> >>+
> >>+/* Core temperature */
> >>+static ssize_t (*const core_show_fn[CORE_TEMP_ATTRS]) (struct device *dev,
> >>+		struct device_attribute *devattr, char *buf) = {
> >>+	show_core_label,
> >>+	show_core_temp,
> >>+	show_tcontrol,
> >>+	show_tjmax,
> >>+	show_tcontrol_margin,
> >>+};
> >>+
> >>+static const char *const core_suffix[CORE_TEMP_ATTRS] = {
> >>+	"label",
> >>+	"input",
> >>+	"max",
> >>+	"crit",
> >>+	"crit_hyst",
> >>+};
> >>+
> >>+static int check_resolved_cores(struct peci_hwmon *priv)
> >>+{
> >>+	struct peci_rd_pci_cfg_local_msg msg;
> >>+	int rc;
> >>+
> >>+	if (!(priv->client->adapter->cmd_mask & BIT(PECI_CMD_RD_PCI_CFG_LOCAL)))
> >>+		return -EINVAL;
> >>+
> >>+	/* Get the RESOLVED_CORES register value */
> >>+	msg.addr = priv->addr;
> >>+	msg.bus = 1;
> >>+	msg.device = 30;
> >>+	msg.function = 3;
> >>+	msg.reg = 0xB4;
> >>+	msg.rx_len = 4;
> >>+
> >>+	rc = send_peci_cmd(priv, PECI_CMD_RD_PCI_CFG_LOCAL, (void *)&msg);
> >>+	if (rc < 0)
> >>+		return rc;
> >>+
> >>+	priv->core_mask = msg.pci_config[3] << 24 |
> >>+			  msg.pci_config[2] << 16 |
> >>+			  msg.pci_config[1] << 8 |
> >>+			  msg.pci_config[0];
> >>+
> >>+	if (!priv->core_mask)
> >>+		return -EAGAIN;
> >>+
> >>+	dev_dbg(priv->dev, "Scanned resolved cores: 0x%x\n", priv->core_mask);
> >>+	return 0;
> >>+}
> >>+
> >>+static int create_core_temp_group(struct peci_hwmon *priv, int core_no)
> >>+{
> >>+	struct core_temp_group *data;
> >>+	int i;
> >>+
> >>+	data = devm_kzalloc(priv->dev, sizeof(struct core_temp_group),
> >>+			    GFP_KERNEL);
> >>+	if (!data)
> >>+		return -ENOMEM;
> >>+
> >>+	for (i = 0; i < CORE_TEMP_ATTRS; i++) {
> >>+		snprintf(data->attr_name[i], ATTR_NAME_LEN,
> >>+			 "temp%d_%s", priv->global_idx, core_suffix[i]);
> >>+		sysfs_attr_init(&data->sd_attrs[i].dev_attr.attr);
> >>+		data->sd_attrs[i].dev_attr.attr.name = data->attr_name[i];
> >>+		data->sd_attrs[i].dev_attr.attr.mode = 0444;
> >>+		data->sd_attrs[i].dev_attr.show = core_show_fn[i];
> >>+		if (i == 0 || i == 1) /* label or temp */
> >>+			data->sd_attrs[i].index = core_no;
> >>+		data->attrs[i] = &data->sd_attrs[i].dev_attr.attr;
> >>+	}
> >>+
> >>+	data->attr_group.attrs = data->attrs;
> >>+	priv->core_attr_groups[priv->core_idx++] = &data->attr_group;
> >>+	priv->global_idx++;
> >>+
> >>+	return 0;
> >>+}
> >>+
> >>+static int create_core_temp_groups(struct peci_hwmon *priv)
> >>+{
> >>+	int rc, i;
> >>+
> >>+	rc = check_resolved_cores(priv);
> >>+	if (!rc) {
> >>+		for (i = 0; i < CORE_NUMS_MAX; i++) {
> >>+			if (priv->core_mask & BIT(i)) {
> >>+				rc = create_core_temp_group(priv, i);
> >>+				if (rc)
> >>+					return rc;
> >>+			}
> >>+		}
> >>+
> >>+		rc = sysfs_create_groups(&priv->hwmon_dev->kobj,
> >>+					 priv->core_attr_groups);
> >>+	}
> >>+
> >>+	return rc;
> >>+}
> >>+
> >>+/* DIMM temperature */
> >>+static ssize_t (*const dimm_show_fn[DIMM_TEMP_ATTRS]) (struct device *dev,
> >>+		struct device_attribute *devattr, char *buf) = {
> >>+	show_dimm_label,
> >>+	show_dimm_temp,
> >>+};
> >>+
> >>+static const char *const dimm_suffix[DIMM_TEMP_ATTRS] = {
> >>+	"label",
> >>+	"input",
> >>+};
> >>+
> >>+static int check_populated_dimms(struct peci_hwmon *priv)
> >>+{
> >>+	struct peci_rd_pkg_cfg_msg msg;
> >>+	int i, rc, pass = 0;
> >>+
> >>+do_scan:
> >>+	for (i = 0; i < (DIMM_SLOT_NUMS_MAX / 2); i++) {
> >>+		msg.addr = priv->addr;
> >>+		msg.index = MBX_INDEX_DDR_DIMM_TEMP;
> >>+		msg.param = i; /* channel */
> >>+		msg.rx_len = 4;
> >>+
> >>+		rc = send_peci_cmd(priv, PECI_CMD_RD_PKG_CFG, (void *)&msg);
> >>+		if (rc < 0)
> >>+			return rc;
> >>+
> >>+		if (msg.pkg_config[0]) /* DIMM #0 on the channel */
> >>+			priv->dimm_mask |= BIT(i);
> >>+
> >>+		if (msg.pkg_config[1]) /* DIMM #1 on the channel */
> >>+			priv->dimm_mask |= BIT(i + 1);
> >
> >Each loop sets overlapping bits in dimm_mask. The first loop sets
> >bit 0 and 1, the second sets bit 1 and 2, and so on. I _think_ this
> >should probably set bits (i*2) and (i*2+1). If so, I would suggest to
> >test the code in a system with more than one DIMM in more than one bank.
> >
> 
> Thanks for your pointing out my mistake. It has to be changed to (i*2) and
> (i*2+1) as you suggested. I'll fix it with taking enough test on a various
> DIMM setting environment.
> 
> >>+	}
> >>+
> >>+	/* Do 2-pass scanning */
> >>+	if (priv->dimm_mask && pass == 0) {
> >>+		pass++;
> >>+		goto do_scan;
> >
> >This goto is only used to avoid a nested loops. Please don't do that.
> >If you want to avoid indentation levels, add another function.
> >
> >Also, this will require an explanation why the loop is executed if
> >and only if a dimm is found the first time around.
> >
> 
> The reason why is needs 2-pass scanning is, we function can be called in the
> middle of the timing when cient BIOS is updating the values. I'll add this
> as a comment and rewrite this code without using the goto.
> 

How would that be different during the 2nd scan ? If there is concern about
concurrency, the loop would have to be repeated until there are no more changes.
Even then there could be update during the last scan (which the code did not
catch).

> >>+	}
> >>+
> >>+	if (!priv->dimm_mask)
> >>+		return -EAGAIN;
> >>+
> >>+	dev_dbg(priv->dev, "Scanned populated DIMMs: 0x%x\n", priv->dimm_mask);
> >>+	return 0;
> >>+}
> >>+
> >>+static int create_dimm_temp_group(struct peci_hwmon *priv, int dimm_no)
> >>+{
> >>+	struct dimm_temp_group *data;
> >>+	int i;
> >>+
> >>+	data = devm_kzalloc(priv->dev, sizeof(struct dimm_temp_group),
> >>+			    GFP_KERNEL);
> >>+	if (!data)
> >>+		return -ENOMEM;
> >>+
> >>+	for (i = 0; i < DIMM_TEMP_ATTRS; i++) {
> >>+		snprintf(data->attr_name[i], ATTR_NAME_LEN,
> >>+			 "temp%d_%s", priv->global_idx, dimm_suffix[i]);
> >>+		sysfs_attr_init(&data->sd_attrs[i].dev_attr.attr);
> >>+		data->sd_attrs[i].dev_attr.attr.name = data->attr_name[i];
> >>+		data->sd_attrs[i].dev_attr.attr.mode = 0444;
> >>+		data->sd_attrs[i].dev_attr.show = dimm_show_fn[i];
> >>+		data->sd_attrs[i].index = dimm_no;
> >>+		data->attrs[i] = &data->sd_attrs[i].dev_attr.attr;
> >>+	}
> >>+
> >>+	data->attr_group.attrs = data->attrs;
> >>+	priv->dimm_attr_groups[priv->dimm_idx++] = &data->attr_group;
> >>+	priv->global_idx++;
> >>+
> >>+	return 0;
> >>+}
> >>+
> >>+static int create_dimm_temp_groups(struct peci_hwmon *priv)
> >>+{
> >>+	int rc, i;
> >>+
> >>+	rc = check_populated_dimms(priv);
> >>+	if (!rc) {
> >>+		for (i = 0; i < DIMM_SLOT_NUMS_MAX; i++) {
> >>+			if (priv->dimm_mask & BIT(i)) {
> >>+				rc = create_dimm_temp_group(priv, i);
> >>+				if (rc)
> >>+					return rc;
> >>+			}
> >>+		}
> >>+
> >>+		rc = sysfs_create_groups(&priv->hwmon_dev->kobj,
> >>+					 priv->dimm_attr_groups);
> >>+		if (!rc)
> >>+			dev_dbg(priv->dev, "Done DIMM temp group creation\n");
> >>+	} else if (rc == -EAGAIN) {
> >>+		queue_delayed_work(priv->work_queue, &priv->work_handler,
> >>+				   DIMM_MASK_CHECK_DELAY);
> >>+		dev_dbg(priv->dev, "Diferred DIMM temp group creation\n");
> >
> >s/Diferred/Deferred/
> >
> 
> Will fix this typo.
> 
> >If PECI never reports any DIMMS, this will be repeated forever until
> >it finds at least one group. Is this intentional ? If so, I would expect
> >some detailed explanation of the rationale. As it is, the DIMM temperatures
> >can show up randomly after some hours of runtime, which isn't exactly
> >deterministic. Maybe that does make sense, but it will need to be explained.
> >
> 
> In general, a client CPU will report DIMM population info just after the
> client CPU completes memory training and testing at the very beginning of
> BIOS boot. The time varies depends on the client system but it would be less
> than 5 minutes. I'll add a timeout logic.
> 
But it is not complete by the time Linux boots, and there is no "incomplete"
message ?

This sounds racy; how is it guaranteed that any reading is complete and that
no additional DIMMs will show up some arbitrary time after the first DIMM
was reported ?

> >>+	}
> >>+
> >>+	return rc;
> >>+}
> >>+
> >>+static void create_dimm_temp_groups_delayed(struct work_struct *work)
> >>+{
> >>+	struct delayed_work *dwork = to_delayed_work(work);
> >>+	struct peci_hwmon *priv = container_of(dwork, struct peci_hwmon,
> >>+					       work_handler);
> >>+	int rc;
> >>+
> >>+	rc = create_dimm_temp_groups(priv);
> >>+	if (rc && rc != -EAGAIN)
> >>+		dev_dbg(priv->dev, "Skipped to creat DIMM temp groups\n");
> >>+}
> >>+
> >>+static int peci_hwmon_probe(struct peci_client *client)
> >>+{
> >>+	struct device *dev = &client->dev;
> >>+	struct peci_hwmon *priv;
> >>+	int rc;
> >>+
> >>+	if ((client->adapter->cmd_mask &
> >>+	    (BIT(PECI_CMD_GET_TEMP) | BIT(PECI_CMD_RD_PKG_CFG))) !=
> >>+	    (BIT(PECI_CMD_GET_TEMP) | BIT(PECI_CMD_RD_PKG_CFG))) {
> >>+		dev_err(dev, "Client doesn't support temperature monitoring\n");
> >>+		return -EINVAL;
> >>+	}
> >>+
> >>+	priv = devm_kzalloc(dev, sizeof(*priv), GFP_KERNEL);
> >>+	if (!priv)
> >>+		return -ENOMEM;
> >>+
> >>+	dev_set_drvdata(dev, priv);
> >>+	priv->client = client;
> >>+	priv->dev = dev;
> >>+	priv->addr = client->addr;
> >>+	priv->cpu_no = priv->addr - PECI_BASE_ADDR;
> >>+
> >>+	snprintf(priv->name, PECI_NAME_SIZE, "peci_hwmon.cpu%d", priv->cpu_no);
> >>+
> >>+	priv->work_queue = create_singlethread_workqueue(priv->name);
> >>+	if (!priv->work_queue)
> >>+		return -ENOMEM;
> >>+
> >>+	priv->hwmon_dev = hwmon_device_register_with_groups(priv->dev,
> >>+							    priv->name,
> >>+							    priv,
> >>+							   default_attr_groups);
> >>+
> >I'll expect a detailed explanation why using hwmon_device_register_with_info()
> >does not work for this driver, and why it would make sense to ever register
> >the hwmon device before its attributes are available. From my perspective,
> >the driver should delay registration entirely until all attributes are
> >available. The hwmon ABI implicitly assumes that all sensors are available
> >at the time of hwmon device registration. Anything else can result in
> >unexpected behavior.
> >
> 
> AFAIK, hwmon_device_register_with_info is for adding a non-standard

This is wrong.

hwmon_device_register_with_info() has an _option_ to provide a set of
additional non-standard attributes as its last parameter. Its purpose
is to simplify drivers by moving sysfs attribute handling into the
hwmon core. All new drivers should use that API unless there is a
compelling reason not to do so.

> attribute so hwmon_device_register_with_group is correct in this case. Also,

This is wrong. Quoting from the documentation.

"hwmon_device_register_with_info is the most comprehensive and preferred means
to register a hardware monitoring device. It creates the standard sysfs
attributes in the hardware monitoring core, letting the driver focus on reading
from and writing to the chip instead of having to bother with sysfs attributes.
Its parameters are described in more detail below."

> this delayed additional registration case is being used in core-temp.c using
> a similar way.
> 
That doesn't make it better.

> >>+	rc = PTR_ERR_OR_ZERO(priv->hwmon_dev);
> >>+	if (rc) {
> >>+		dev_err(dev, "Failed to register peci hwmon\n");
> >>+		return rc;
> >>+	}
> >>+
> >>+	priv->global_idx = DEFAULT_ATTR_GRP_NUMS + 1;
> >>+
> >>+	rc = create_core_temp_groups(priv);
> >>+	if (rc) {
> >>+		dev_err(dev, "Failed to create core groups\n");
> >>+		return rc;
> >>+	}
> >
> >This should be done before registering the hwmon device (or be left
> >to the hwmon core by using the _info API). And it should definitely
> >not return an error while keeping the hwmon device around.
> >
> 
> As I answers above, hwmon_device_register_with_info is for adding a
> non-standard attribute and this kind of way is being used in core-temp.c

Again, this is wrong.

> using a similar way.

Two wrongs don't make it right. Besides, the coretemp driver handles 
(or tries to handle) dynamic CPU insertion and removal, which is not
the case here.

Guenter

> >>+
> >>+	INIT_DELAYED_WORK(&priv->work_handler, create_dimm_temp_groups_delayed);
> >>+
> >>+	rc = create_dimm_temp_groups(priv);
> >>+	if (rc && rc != -EAGAIN)
> >>+		dev_dbg(dev, "Skipped to creat DIMM temp groups\n");
> >>+
> >Not that it should be there in the first place, but "creat" is not a word.
> >
> 
> Please check my above answers. I'll fix the typo.
> 
> Again, Thanks a lot for sharing your time to review it. I really appreciate
> it.
> 
> BR,
> Jae
> 
> >>+	dev_dbg(dev, "peci hwmon for CPU at 0x%x registered\n", priv->addr);
> >>+
> >>+	return 0;
> >>+}
> >>+
> >>+static int peci_hwmon_remove(struct peci_client *client)
> >>+{
> >>+	struct peci_hwmon *priv = dev_get_drvdata(&client->dev);
> >>+
> >>+	cancel_delayed_work(&priv->work_handler);
> >>+	destroy_workqueue(priv->work_queue);
> >>+	sysfs_remove_groups(&priv->hwmon_dev->kobj, priv->core_attr_groups);
> >>+	sysfs_remove_groups(&priv->hwmon_dev->kobj, priv->dimm_attr_groups);
> >>+	hwmon_device_unregister(priv->hwmon_dev);
> >>+
> >>+	return 0;
> >>+}
> >>+
> >>+static const struct of_device_id peci_of_table[] = {
> >>+	{ .compatible = "intel,peci-hwmon", },
> >>+	{ }
> >>+};
> >>+MODULE_DEVICE_TABLE(of, peci_of_table);
> >>+
> >>+static struct peci_driver peci_hwmon_driver = {
> >>+	.probe  = peci_hwmon_probe,
> >>+	.remove = peci_hwmon_remove,
> >>+	.driver = {
> >>+		.name           = "peci-hwmon",
> >>+		.of_match_table = of_match_ptr(peci_of_table),
> >>+	},
> >>+};
> >>+module_peci_driver(peci_hwmon_driver);
> >>+
> >>+MODULE_AUTHOR("Jae Hyun Yoo <jae.hyun.yoo@linux.intel.com>");
> >>+MODULE_DESCRIPTION("PECI hwmon driver");
> >>+MODULE_LICENSE("GPL v2");
> >>-- 
> >>2.16.1
> >>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-hwmon" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v2 1/8] [PATCH 1/8] drivers/peci: Add support for PECI bus driver core
  2018-02-21 20:31     ` Jae Hyun Yoo
@ 2018-02-21 21:51       ` Andrew Lunn
  2018-02-21 22:03         ` Jae Hyun Yoo
  0 siblings, 1 reply; 46+ messages in thread
From: Andrew Lunn @ 2018-02-21 21:51 UTC (permalink / raw)
  To: Jae Hyun Yoo
  Cc: joel, andrew, arnd, gregkh, jdelvare, linux, benh, linux-kernel,
	linux-doc, devicetree, linux-hwmon, linux-arm-kernel, openbmc

> >Is there a real need to do transfers in atomic context, or with
> >interrupts disabled?
> >
> 
> Actually, no. Generally, this function will be called in sleep-able context
> so this code is for an exceptional case handling.
> 
> I'll rewrite this code like below:
> 	if (in_atomic() || irqs_disabled()) {
> 		dev_dbg(&adapter->dev,
> 			"xfer in non-sleepable context is not supported\n");
> 		return -EWOULDBLOCK;
> 	}

I would not even do that. Just add a call to
might_sleep(). CONFIG_DEBUG_ATOMIC_SLEEP will then find bad calls.

> >>+static int peci_ioctl_get_temp(struct peci_adapter *adapter, void *vmsg)
> >>+{
> >>+	struct peci_get_temp_msg *umsg = vmsg;
> >>+	struct peci_xfer_msg msg;
> >>+	int rc;
> >>+
> >
> >Is this getting the temperature?
> >
> 
> Yes, this is getting the 'die' temperature of a processor package.
 
So the hwmon driver provides this. No need to have both.

> >>+static long peci_ioctl(struct file *file, unsigned int iocmd, unsigned long arg)
> >>+{
> >>+	struct peci_adapter *adapter = file->private_data;
> >>+	void __user *argp = (void __user *)arg;
> >>+	unsigned int msg_len;
> >>+	enum peci_cmd cmd;
> >>+	u8 *msg;
> >>+	int rc = 0;
> >>+
> >>+	dev_dbg(&adapter->dev, "ioctl, cmd=0x%x, arg=0x%lx\n", iocmd, arg);
> >>+
> >>+	switch (iocmd) {
> >>+	case PECI_IOC_PING:
> >>+	case PECI_IOC_GET_DIB:
> >>+	case PECI_IOC_GET_TEMP:
> >>+	case PECI_IOC_RD_PKG_CFG:
> >>+	case PECI_IOC_WR_PKG_CFG:
> >>+	case PECI_IOC_RD_IA_MSR:
> >>+	case PECI_IOC_RD_PCI_CFG:
> >>+	case PECI_IOC_RD_PCI_CFG_LOCAL:
> >>+	case PECI_IOC_WR_PCI_CFG_LOCAL:
> >>+		cmd = _IOC_TYPE(iocmd) - PECI_IOC_BASE;
> >>+		msg_len = _IOC_SIZE(iocmd);
> >>+		break;
> >
> >Adding new ioctl calls is pretty frowned up. Can you export this info
> >via /sysfs?
> >
> 
> Most of these are not simple IOs so ioctl is better suited, I think.

Lets see what other reviewers say, but i think ioctls are
wrong.

     Andrew

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v2 1/8] [PATCH 1/8] drivers/peci: Add support for PECI bus driver core
  2018-02-21 21:51       ` Andrew Lunn
@ 2018-02-21 22:03         ` Jae Hyun Yoo
  0 siblings, 0 replies; 46+ messages in thread
From: Jae Hyun Yoo @ 2018-02-21 22:03 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: joel, andrew, arnd, gregkh, jdelvare, linux, benh, linux-kernel,
	linux-doc, devicetree, linux-hwmon, linux-arm-kernel, openbmc



On 2/21/2018 1:51 PM, Andrew Lunn wrote:
>>> Is there a real need to do transfers in atomic context, or with
>>> interrupts disabled?
>>>
>>
>> Actually, no. Generally, this function will be called in sleep-able context
>> so this code is for an exceptional case handling.
>>
>> I'll rewrite this code like below:
>> 	if (in_atomic() || irqs_disabled()) {
>> 		dev_dbg(&adapter->dev,
>> 			"xfer in non-sleepable context is not supported\n");
>> 		return -EWOULDBLOCK;
>> 	}
> 
> I would not even do that. Just add a call to
> might_sleep(). CONFIG_DEBUG_ATOMIC_SLEEP will then find bad calls.
> 

Thanks for the suggestion. I've learned one thing. :)

>>>> +static int peci_ioctl_get_temp(struct peci_adapter *adapter, void *vmsg)
>>>> +{
>>>> +	struct peci_get_temp_msg *umsg = vmsg;
>>>> +	struct peci_xfer_msg msg;
>>>> +	int rc;
>>>> +
>>>
>>> Is this getting the temperature?
>>>
>>
>> Yes, this is getting the 'die' temperature of a processor package.
>   
> So the hwmon driver provides this. No need to have both.
> 

This this common API in core driver of PECI bus. The hwmon is also uses 
it through peci_command call.

>>>> +static long peci_ioctl(struct file *file, unsigned int iocmd, unsigned long arg)
>>>> +{
>>>> +	struct peci_adapter *adapter = file->private_data;
>>>> +	void __user *argp = (void __user *)arg;
>>>> +	unsigned int msg_len;
>>>> +	enum peci_cmd cmd;
>>>> +	u8 *msg;
>>>> +	int rc = 0;
>>>> +
>>>> +	dev_dbg(&adapter->dev, "ioctl, cmd=0x%x, arg=0x%lx\n", iocmd, arg);
>>>> +
>>>> +	switch (iocmd) {
>>>> +	case PECI_IOC_PING:
>>>> +	case PECI_IOC_GET_DIB:
>>>> +	case PECI_IOC_GET_TEMP:
>>>> +	case PECI_IOC_RD_PKG_CFG:
>>>> +	case PECI_IOC_WR_PKG_CFG:
>>>> +	case PECI_IOC_RD_IA_MSR:
>>>> +	case PECI_IOC_RD_PCI_CFG:
>>>> +	case PECI_IOC_RD_PCI_CFG_LOCAL:
>>>> +	case PECI_IOC_WR_PCI_CFG_LOCAL:
>>>> +		cmd = _IOC_TYPE(iocmd) - PECI_IOC_BASE;
>>>> +		msg_len = _IOC_SIZE(iocmd);
>>>> +		break;
>>>
>>> Adding new ioctl calls is pretty frowned up. Can you export this info
>>> via /sysfs?
>>>
>>
>> Most of these are not simple IOs so ioctl is better suited, I think.
> 
> Lets see what other reviewers say, but i think ioctls are
> wrong.
> 
>       Andrew
> 

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v2 7/8] [PATCH 7/8] drivers/hwmon: Add a generic PECI hwmon client driver
  2018-02-21 21:48       ` Guenter Roeck
@ 2018-02-21 23:07         ` Jae Hyun Yoo
  2018-02-22  0:37           ` Andrew Lunn
  0 siblings, 1 reply; 46+ messages in thread
From: Jae Hyun Yoo @ 2018-02-21 23:07 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: joel, andrew, arnd, gregkh, jdelvare, benh, andrew, linux-kernel,
	linux-doc, devicetree, linux-hwmon, linux-arm-kernel, openbmc



On 2/21/2018 1:48 PM, Guenter Roeck wrote:
> On Wed, Feb 21, 2018 at 01:24:48PM -0800, Jae Hyun Yoo wrote:
>> Hi Guenter,
>>
>> Thanks for sharing your time to review this code. Please check my answers
>> inline.
>>
>> On 2/21/2018 10:26 AM, Guenter Roeck wrote:
>>> On Wed, Feb 21, 2018 at 08:16:05AM -0800, Jae Hyun Yoo wrote:
>>>> This commit adds a generic PECI hwmon client driver implementation.
>>>>
>>>> Signed-off-by: Jae Hyun Yoo <jae.hyun.yoo@linux.intel.com>
>>>> ---
>>>>   drivers/hwmon/Kconfig      |  10 +
>>>>   drivers/hwmon/Makefile     |   1 +
>>>>   drivers/hwmon/peci-hwmon.c | 928 +++++++++++++++++++++++++++++++++++++++++++++
>>>>   3 files changed, 939 insertions(+)
>>>>   create mode 100644 drivers/hwmon/peci-hwmon.c
>>>>
>>>> diff --git a/drivers/hwmon/Kconfig b/drivers/hwmon/Kconfig
>>>> index ef23553ff5cb..f22e0c31f597 100644
>>>> --- a/drivers/hwmon/Kconfig
>>>> +++ b/drivers/hwmon/Kconfig
>>>> @@ -1246,6 +1246,16 @@ config SENSORS_NCT7904
>>>>   	  This driver can also be built as a module.  If so, the module
>>>>   	  will be called nct7904.
>>>> +config SENSORS_PECI_HWMON
>>>> +	tristate "PECI hwmon support"
>>>> +	depends on PECI
>>>> +	help
>>>> +	  If you say yes here you get support for the generic PECI hwmon
>>>> +	  driver.
>>>> +
>>>> +	  This driver can also be built as a module.  If so, the module
>>>> +	  will be called peci-hwmon.
>>>> +
>>>>   config SENSORS_NSA320
>>>>   	tristate "ZyXEL NSA320 and compatible fan speed and temperature sensors"
>>>>   	depends on GPIOLIB && OF
>>>> diff --git a/drivers/hwmon/Makefile b/drivers/hwmon/Makefile
>>>> index f814b4ace138..946f54b168e5 100644
>>>> --- a/drivers/hwmon/Makefile
>>>> +++ b/drivers/hwmon/Makefile
>>>> @@ -135,6 +135,7 @@ obj-$(CONFIG_SENSORS_NCT7802)	+= nct7802.o
>>>>   obj-$(CONFIG_SENSORS_NCT7904)	+= nct7904.o
>>>>   obj-$(CONFIG_SENSORS_NSA320)	+= nsa320-hwmon.o
>>>>   obj-$(CONFIG_SENSORS_NTC_THERMISTOR)	+= ntc_thermistor.o
>>>> +obj-$(CONFIG_SENSORS_PECI_HWMON)	+= peci-hwmon.o
>>>>   obj-$(CONFIG_SENSORS_PC87360)	+= pc87360.o
>>>>   obj-$(CONFIG_SENSORS_PC87427)	+= pc87427.o
>>>>   obj-$(CONFIG_SENSORS_PCF8591)	+= pcf8591.o
>>>> diff --git a/drivers/hwmon/peci-hwmon.c b/drivers/hwmon/peci-hwmon.c
>>>> new file mode 100644
>>>> index 000000000000..edd27744adcb
>>>> --- /dev/null
>>>> +++ b/drivers/hwmon/peci-hwmon.c
>>>> @@ -0,0 +1,928 @@
>>>> +// SPDX-License-Identifier: GPL-2.0
>>>> +// Copyright (c) 2018 Intel Corporation
>>>> +
>>>> +#include <linux/delay.h>
>>>> +#include <linux/hwmon.h>
>>>> +#include <linux/hwmon-sysfs.h>
>>>> +#include <linux/jiffies.h>
>>>> +#include <linux/module.h>
>>>> +#include <linux/of_device.h>
>>>> +#include <linux/peci.h>
>>>> +#include <linux/workqueue.h>
>>>> +
>>>> +#define DIMM_SLOT_NUMS_MAX    12  /* Max DIMM numbers (channel ranks x 2) */
>>>> +#define CORE_NUMS_MAX         28  /* Max core numbers (max on SKX Platinum) */
>>>> +#define TEMP_TYPE_PECI        6   /* Sensor type 6: Intel PECI */
>>>> +
>>>> +#define CORE_TEMP_ATTRS       5
>>>> +#define DIMM_TEMP_ATTRS       2
>>>> +#define ATTR_NAME_LEN         24
>>>> +
>>>> +#define DEFAULT_ATTR_GRP_NUMS 5
>>>> +
>>>> +#define UPDATE_INTERVAL_MIN   HZ
>>>> +#define DIMM_MASK_CHECK_DELAY msecs_to_jiffies(5000)
>>>> +
>>>> +enum sign {
>>>> +	POS,
>>>> +	NEG
>>>> +};
>>>> +
>>>> +struct temp_data {
>>>> +	bool valid;
>>>> +	s32  value;
>>>> +	unsigned long last_updated;
>>>> +};
>>>> +
>>>> +struct temp_group {
>>>> +	struct temp_data tjmax;
>>>> +	struct temp_data tcontrol;
>>>> +	struct temp_data tthrottle;
>>>> +	struct temp_data dts_margin;
>>>> +	struct temp_data die;
>>>> +	struct temp_data core[CORE_NUMS_MAX];
>>>> +	struct temp_data dimm[DIMM_SLOT_NUMS_MAX];
>>>> +};
>>>> +
>>>> +struct core_temp_group {
>>>> +	struct sensor_device_attribute sd_attrs[CORE_TEMP_ATTRS];
>>>> +	char attr_name[CORE_TEMP_ATTRS][ATTR_NAME_LEN];
>>>> +	struct attribute *attrs[CORE_TEMP_ATTRS + 1];
>>>> +	struct attribute_group attr_group;
>>>> +};
>>>> +
>>>> +struct dimm_temp_group {
>>>> +	struct sensor_device_attribute sd_attrs[DIMM_TEMP_ATTRS];
>>>> +	char attr_name[DIMM_TEMP_ATTRS][ATTR_NAME_LEN];
>>>> +	struct attribute *attrs[DIMM_TEMP_ATTRS + 1];
>>>> +	struct attribute_group attr_group;
>>>> +};
>>>> +
>>>> +struct peci_hwmon {
>>>> +	struct peci_client *client;
>>>> +	struct device *dev;
>>>> +	struct device *hwmon_dev;
>>>> +	struct workqueue_struct *work_queue;
>>>> +	struct delayed_work work_handler;
>>>> +	char name[PECI_NAME_SIZE];
>>>> +	struct temp_group temp;
>>>> +	u8 addr;
>>>> +	uint cpu_no;
>>>> +	u32 core_mask;
>>>> +	u32 dimm_mask;
>>>> +	const struct attribute_group *core_attr_groups[CORE_NUMS_MAX + 1];
>>>> +	const struct attribute_group *dimm_attr_groups[DIMM_SLOT_NUMS_MAX + 1];
>>>> +	uint global_idx;
>>>> +	uint core_idx;
>>>> +	uint dimm_idx;
>>>> +};
>>>> +
>>>> +enum label {
>>>> +	L_DIE,
>>>> +	L_DTS,
>>>> +	L_TCONTROL,
>>>> +	L_TTHROTTLE,
>>>> +	L_TJMAX,
>>>> +	L_MAX
>>>> +};
>>>> +
>>>> +static const char *peci_label[L_MAX] = {
>>>> +	"Die\n",
>>>> +	"DTS margin to Tcontrol\n",
>>>> +	"Tcontrol\n",
>>>> +	"Tthrottle\n",
>>>> +	"Tjmax\n",
>>>> +};
>>>> +
>>>> +static int send_peci_cmd(struct peci_hwmon *priv, enum peci_cmd cmd, void *msg)
>>>> +{
>>>> +	return peci_command(priv->client->adapter, cmd, msg);
>>>> +}
>>>> +
>>>> +static int need_update(struct temp_data *temp)
>>>> +{
>>>> +	if (temp->valid &&
>>>> +	    time_before(jiffies, temp->last_updated + UPDATE_INTERVAL_MIN))
>>>> +		return 0;
>>>> +
>>>> +	return 1;
>>>> +}
>>>> +
>>>> +static s32 ten_dot_six_to_millidegree(s32 x)
>>>> +{
>>>> +	return ((((x) ^ 0x8000) - 0x8000) * 1000 / 64);
>>>> +}
>>>> +
>>>> +static int get_tjmax(struct peci_hwmon *priv)
>>>> +{
>>>> +	struct peci_rd_pkg_cfg_msg msg;
>>>> +	int rc;
>>>> +
>>>> +	if (!priv->temp.tjmax.valid) {
>>>> +		msg.addr = priv->addr;
>>>> +		msg.index = MBX_INDEX_TEMP_TARGET;
>>>> +		msg.param = 0;
>>>> +		msg.rx_len = 4;
>>>> +
>>>> +		rc = send_peci_cmd(priv, PECI_CMD_RD_PKG_CFG, (void *)&msg);
>>>
>>> Is a typecast to a void * necessary ?
>>>
>>
>> No. I'll remove the type cast. Thanks!
>>
>>>> +		if (rc < 0)
>>>> +			return rc;
>>>> +
>>>> +		priv->temp.tjmax.value = (s32)msg.pkg_config[2] * 1000;
>>>> +		priv->temp.tjmax.valid = true;
>>>> +	}
>>>> +
>>>> +	return 0;
>>>> +}
>>>> +
>>>> +static int get_tcontrol(struct peci_hwmon *priv)
>>>> +{
>>>> +	struct peci_rd_pkg_cfg_msg msg;
>>>> +	s32 tcontrol_margin;
>>>> +	int rc;
>>>> +
>>>> +	if (!need_update(&priv->temp.tcontrol))
>>>> +		return 0;
>>>> +
>>>> +	rc = get_tjmax(priv);
>>>> +	if (rc < 0)
>>>> +		return rc;
>>>> +
>>>> +	msg.addr = priv->addr;
>>>> +	msg.index = MBX_INDEX_TEMP_TARGET;
>>>> +	msg.param = 0;
>>>> +	msg.rx_len = 4;
>>>> +
>>>> +	rc = send_peci_cmd(priv, PECI_CMD_RD_PKG_CFG, (void *)&msg);
>>>> +	if (rc < 0)
>>>> +		return rc;
>>>> +
>>>> +	tcontrol_margin = msg.pkg_config[1];
>>>> +	tcontrol_margin = ((tcontrol_margin ^ 0x80) - 0x80) * 1000;
>>>> +
>>>> +	priv->temp.tcontrol.value = priv->temp.tjmax.value - tcontrol_margin;
>>>> +
>>>> +	if (!priv->temp.tcontrol.valid) {
>>>> +		priv->temp.tcontrol.last_updated = INITIAL_JIFFIES;
>>>> +		priv->temp.tcontrol.valid = true;
>>>> +	} else {
>>>> +		priv->temp.tcontrol.last_updated = jiffies;
>>>> +	}
>>>> +
>>>> +	return 0;
>>>> +}
>>>> +
>>>> +static int get_tthrottle(struct peci_hwmon *priv)
>>>> +{
>>>> +	struct peci_rd_pkg_cfg_msg msg;
>>>> +	s32 tthrottle_offset;
>>>> +	int rc;
>>>> +
>>>> +	if (!need_update(&priv->temp.tthrottle))
>>>> +		return 0;
>>>> +
>>>> +	rc = get_tjmax(priv);
>>>> +	if (rc < 0)
>>>> +		return rc;
>>>> +
>>>> +	msg.addr = priv->addr;
>>>> +	msg.index = MBX_INDEX_TEMP_TARGET;
>>>> +	msg.param = 0;
>>>> +	msg.rx_len = 4;
>>>> +
>>>> +	rc = send_peci_cmd(priv, PECI_CMD_RD_PKG_CFG, (void *)&msg);
>>>> +	if (rc < 0)
>>>> +		return rc;
>>>> +
>>>> +	tthrottle_offset = (msg.pkg_config[3] & 0x2f) * 1000;
>>>> +	priv->temp.tthrottle.value = priv->temp.tjmax.value - tthrottle_offset;
>>>> +
>>>> +	if (!priv->temp.tthrottle.valid) {
>>>> +		priv->temp.tthrottle.last_updated = INITIAL_JIFFIES;
>>>> +		priv->temp.tthrottle.valid = true;
>>>> +	} else {
>>>> +		priv->temp.tthrottle.last_updated = jiffies;
>>>> +	}
>>>> +
>>>> +	return 0;
>>>> +}
>>>> +
>>>> +static int get_die_temp(struct peci_hwmon *priv)
>>>> +{
>>>> +	struct peci_get_temp_msg msg;
>>>> +	int rc;
>>>> +
>>>> +	if (!need_update(&priv->temp.die))
>>>> +		return 0;
>>>> +
>>>> +	rc = get_tjmax(priv);
>>>> +	if (rc < 0)
>>>> +		return rc;
>>>> +
>>>> +	msg.addr = priv->addr;
>>>> +
>>>> +	rc = send_peci_cmd(priv, PECI_CMD_GET_TEMP, (void *)&msg);
>>>> +	if (rc < 0)
>>>> +		return rc;
>>>> +
>>>> +	priv->temp.die.value = priv->temp.tjmax.value +
>>>> +			       ((s32)msg.temp_raw * 1000 / 64);
>>>> +
>>>> +	if (!priv->temp.die.valid) {
>>>> +		priv->temp.die.last_updated = INITIAL_JIFFIES;
>>>> +		priv->temp.die.valid = true;
>>>> +	} else {
>>>> +		priv->temp.die.last_updated = jiffies;
>>>> +	}
>>>> +
>>>> +	return 0;
>>>> +}
>>>> +
>>>> +static int get_dts_margin(struct peci_hwmon *priv)
>>>> +{
>>>> +	struct peci_rd_pkg_cfg_msg msg;
>>>> +	s32 dts_margin;
>>>> +	int rc;
>>>> +
>>>> +	if (!need_update(&priv->temp.dts_margin))
>>>> +		return 0;
>>>> +
>>>> +	msg.addr = priv->addr;
>>>> +	msg.index = MBX_INDEX_DTS_MARGIN;
>>>> +	msg.param = 0;
>>>> +	msg.rx_len = 4;
>>>> +
>>>> +	rc = send_peci_cmd(priv, PECI_CMD_RD_PKG_CFG, (void *)&msg);
>>>> +	if (rc < 0)
>>>> +		return rc;
>>>> +
>>>> +	dts_margin = (msg.pkg_config[1] << 8) | msg.pkg_config[0];
>>>> +
>>>> +	/**
>>>> +	 * Processors return a value of DTS reading in 10.6 format
>>>> +	 * (10 bits signed decimal, 6 bits fractional).
>>>> +	 * Error codes:
>>>> +	 *   0x8000: General sensor error
>>>> +	 *   0x8001: Reserved
>>>> +	 *   0x8002: Underflow on reading value
>>>> +	 *   0x8003-0x81ff: Reserved
>>>> +	 */
>>>> +	if (dts_margin >= 0x8000 && dts_margin <= 0x81ff)
>>>> +		return -1;
>>>> +
>>>> +	dts_margin = ten_dot_six_to_millidegree(dts_margin);
>>>> +
>>>> +	priv->temp.dts_margin.value = dts_margin;
>>>> +
>>>> +	if (!priv->temp.dts_margin.valid) {
>>>> +		priv->temp.dts_margin.last_updated = INITIAL_JIFFIES;
>>>> +		priv->temp.dts_margin.valid = true;
>>>> +	} else {
>>>> +		priv->temp.dts_margin.last_updated = jiffies;
>>>> +	}
>>>> +
>>>> +	return 0;
>>>> +}
>>>> +
>>>> +static int get_core_temp(struct peci_hwmon *priv, int core_index)
>>>> +{
>>>> +	struct peci_rd_pkg_cfg_msg msg;
>>>> +	s32 core_dts_margin;
>>>> +	int rc;
>>>> +
>>>> +	if (!need_update(&priv->temp.core[core_index]))
>>>> +		return 0;
>>>> +
>>>> +	rc = get_tjmax(priv);
>>>> +	if (rc < 0)
>>>> +		return rc;
>>>> +
>>>> +	msg.addr = priv->addr;
>>>> +	msg.index = MBX_INDEX_PER_CORE_DTS_TEMP;
>>>> +	msg.param = core_index;
>>>> +	msg.rx_len = 4;
>>>> +
>>>> +	rc = send_peci_cmd(priv, PECI_CMD_RD_PKG_CFG, (void *)&msg);
>>>> +	if (rc < 0)
>>>> +		return rc;
>>>> +
>>>> +	core_dts_margin = (msg.pkg_config[1] << 8) | msg.pkg_config[0];
>>>> +
>>>> +	/**
>>>> +	 * Processors return a value of the core DTS reading in 10.6 format
>>>> +	 * (10 bits signed decimal, 6 bits fractional).
>>>> +	 * Error codes:
>>>> +	 *   0x8000: General sensor error
>>>> +	 *   0x8001: Reserved
>>>> +	 *   0x8002: Underflow on reading value
>>>> +	 *   0x8003-0x81ff: Reserved
>>>> +	 */
>>>> +	if (core_dts_margin >= 0x8000 && core_dts_margin <= 0x81ff)
>>>> +		return -1;
>>>
>>> Please use valid error codes. This value is returned to user space,
>>> and I don't think this error should be EPERM.
>>>
>>
>> Yes, you are right. I'll fix it.
>>
>>>> +
>>>> +	core_dts_margin = ten_dot_six_to_millidegree(core_dts_margin);
>>>> +
>>>> +	priv->temp.core[core_index].value = priv->temp.tjmax.value +
>>>> +					    core_dts_margin;
>>>> +
>>>> +	if (!priv->temp.core[core_index].valid) {
>>>> +		priv->temp.core[core_index].last_updated = INITIAL_JIFFIES;
>>>> +		priv->temp.core[core_index].valid = true;
>>>> +	} else {
>>>> +		priv->temp.core[core_index].last_updated = jiffies;
>>>> +	}
>>>
>>> I don't understand the purpose of this code. Why not just set valid = true
>>> and last_updated = jiffies ? Why set anything to INITIAL_JIFFIES some
>>> arbitrary time after boot ? AFAICS the first read will always be followed
>>> by another immediately afterwards if the user requests two readings in
>>> a row. Maybe that is intentional, but not to me. If this code is on
>>> purpose, it will require a detailed explanation.
>>>
>>
>> Agreed. I'll rewrite the code as a function like this:
>>
>> static void mark_updated(struct temp_data *temp)
>> {
>> 	if (!temp->valid)
>> 		temp->valid = true;
> 
> Are you concerned about memory write bandwidth ?
> 
> 	temp->valid = true;
> 
> should be sufficient. I don't see the point of the if statement.
> 

Right. The variable can be simply overwritten without using the if 
statement. I'll drop the if statement.

>>
>> 	temp->last_updated = jiffies;
>> }
>>
>>>> +
>>>> +	return 0;
>>>> +}
>>>> +
>>>> +static int get_dimm_temp(struct peci_hwmon *priv, int dimm_index)
>>>> +{
>>>> +	struct peci_rd_pkg_cfg_msg msg;
>>>> +	int channel = dimm_index / 2;
>>>> +	int dimm_order = dimm_index % 2;
>>>> +	int rc;
>>>> +
>>>> +	if (!need_update(&priv->temp.dimm[dimm_index]))
>>>> +		return 0;
>>>> +
>>>> +	msg.addr = priv->addr;
>>>> +	msg.index = MBX_INDEX_DDR_DIMM_TEMP;
>>>> +	msg.param = channel;
>>>> +	msg.rx_len = 4;
>>>> +
>>>> +	rc = send_peci_cmd(priv, PECI_CMD_RD_PKG_CFG, (void *)&msg);
>>>> +	if (rc < 0)
>>>> +		return rc;
>>>> +
>>>> +	priv->temp.dimm[dimm_index].value = msg.pkg_config[dimm_order] * 1000;
>>>> +
>>>> +	if (!priv->temp.dimm[dimm_index].valid) {
>>>> +		priv->temp.dimm[dimm_index].last_updated = INITIAL_JIFFIES;
>>>> +		priv->temp.dimm[dimm_index].valid = true;
>>>> +	} else {
>>>> +		priv->temp.dimm[dimm_index].last_updated = jiffies;
>>>> +	}
>>>> +
>>>> +	return 0;
>>>> +}
>>>> +
>>>> +static ssize_t show_tcontrol(struct device *dev,
>>>> +			     struct device_attribute *attr,
>>>> +			     char *buf)
>>>> +{
>>>> +	struct peci_hwmon *priv = dev_get_drvdata(dev);
>>>> +	int rc;
>>>> +
>>>> +	rc = get_tcontrol(priv);
>>>> +	if (rc < 0)
>>>> +		return rc;
>>>> +
>>>> +	return sprintf(buf, "%d\n", priv->temp.tcontrol.value);
>>>> +}
>>>> +
>>>> +static ssize_t show_tcontrol_margin(struct device *dev,
>>>> +				    struct device_attribute *attr,
>>>> +				    char *buf)
>>>> +{
>>>> +	struct peci_hwmon *priv = dev_get_drvdata(dev);
>>>> +	struct sensor_device_attribute *sensor_attr = to_sensor_dev_attr(attr);
>>>> +	int rc;
>>>> +
>>>> +	rc = get_tcontrol(priv);
>>>> +	if (rc < 0)
>>>> +		return rc;
>>>> +
>>>> +	return sprintf(buf, "%d\n", sensor_attr->index == POS ?
>>>> +				    priv->temp.tjmax.value -
>>>> +				    priv->temp.tcontrol.value :
>>>> +				    priv->temp.tcontrol.value -
>>>> +				    priv->temp.tjmax.value);
>>>> +}
>>>> +
>>>> +static ssize_t show_tthrottle(struct device *dev,
>>>> +			      struct device_attribute *attr,
>>>> +			      char *buf)
>>>> +{
>>>> +	struct peci_hwmon *priv = dev_get_drvdata(dev);
>>>> +	int rc;
>>>> +
>>>> +	rc = get_tthrottle(priv);
>>>> +	if (rc < 0)
>>>> +		return rc;
>>>> +
>>>> +	return sprintf(buf, "%d\n", priv->temp.tthrottle.value);
>>>> +}
>>>> +
>>>> +static ssize_t show_tjmax(struct device *dev,
>>>> +			  struct device_attribute *attr,
>>>> +			  char *buf)
>>>> +{
>>>> +	struct peci_hwmon *priv = dev_get_drvdata(dev);
>>>> +	int rc;
>>>> +
>>>> +	rc = get_tjmax(priv);
>>>> +	if (rc < 0)
>>>> +		return rc;
>>>> +
>>>> +	return sprintf(buf, "%d\n", priv->temp.tjmax.value);
>>>> +}
>>>> +
>>>> +static ssize_t show_die_temp(struct device *dev,
>>>> +			     struct device_attribute *attr,
>>>> +			     char *buf)
>>>> +{
>>>> +	struct peci_hwmon *priv = dev_get_drvdata(dev);
>>>> +	int rc;
>>>> +
>>>> +	rc = get_die_temp(priv);
>>>> +	if (rc < 0)
>>>> +		return rc;
>>>> +
>>>> +	return sprintf(buf, "%d\n", priv->temp.die.value);
>>>> +}
>>>> +
>>>> +static ssize_t show_dts_margin(struct device *dev,
>>>> +			       struct device_attribute *attr,
>>>> +			       char *buf)
>>>> +{
>>>> +	struct peci_hwmon *priv = dev_get_drvdata(dev);
>>>> +	int rc;
>>>> +
>>>> +	rc = get_dts_margin(priv);
>>>> +	if (rc < 0)
>>>> +		return rc;
>>>> +
>>>> +	return sprintf(buf, "%d\n", priv->temp.dts_margin.value);
>>>> +}
>>>> +
>>>> +static ssize_t show_core_temp(struct device *dev,
>>>> +			      struct device_attribute *attr,
>>>> +			      char *buf)
>>>> +{
>>>> +	struct peci_hwmon *priv = dev_get_drvdata(dev);
>>>> +	struct sensor_device_attribute *sensor_attr = to_sensor_dev_attr(attr);
>>>> +	int core_index = sensor_attr->index;
>>>> +	int rc;
>>>> +
>>>> +	rc = get_core_temp(priv, core_index);
>>>> +	if (rc < 0)
>>>> +		return rc;
>>>> +
>>>> +	return sprintf(buf, "%d\n", priv->temp.core[core_index].value);
>>>> +}
>>>> +
>>>> +static ssize_t show_dimm_temp(struct device *dev,
>>>> +			      struct device_attribute *attr,
>>>> +			      char *buf)
>>>> +{
>>>> +	struct peci_hwmon *priv = dev_get_drvdata(dev);
>>>> +	struct sensor_device_attribute *sensor_attr = to_sensor_dev_attr(attr);
>>>> +	int dimm_index = sensor_attr->index;
>>>> +	int rc;
>>>> +
>>>> +	rc = get_dimm_temp(priv, dimm_index);
>>>> +	if (rc < 0)
>>>> +		return rc;
>>>> +
>>>> +	return sprintf(buf, "%d\n", priv->temp.dimm[dimm_index].value);
>>>> +}
>>>> +
>>>> +static ssize_t show_value(struct device *dev,
>>>> +			  struct device_attribute *attr,
>>>> +			  char *buf)
>>>> +{
>>>> +	struct sensor_device_attribute *sensor_attr = to_sensor_dev_attr(attr);
>>>> +
>>>> +	return sprintf(buf, "%d\n", sensor_attr->index);
>>>> +}
>>>> +
>>>> +static ssize_t show_label(struct device *dev,
>>>> +			  struct device_attribute *attr,
>>>> +			  char *buf)
>>>> +{
>>>> +	struct sensor_device_attribute *sensor_attr = to_sensor_dev_attr(attr);
>>>> +
>>>> +	return sprintf(buf, peci_label[sensor_attr->index]);
>>>> +}
>>>> +
>>>> +static ssize_t show_core_label(struct device *dev,
>>>> +			       struct device_attribute *attr,
>>>> +			       char *buf)
>>>> +{
>>>> +	struct sensor_device_attribute *sensor_attr = to_sensor_dev_attr(attr);
>>>> +
>>>> +	return sprintf(buf, "Core %d\n", sensor_attr->index);
>>>> +}
>>>> +
>>>> +static ssize_t show_dimm_label(struct device *dev,
>>>> +			       struct device_attribute *attr,
>>>> +			       char *buf)
>>>> +{
>>>> +	struct sensor_device_attribute *sensor_attr = to_sensor_dev_attr(attr);
>>>> +
>>>> +	char channel = 'A' + (sensor_attr->index / 2);
>>>> +	int index = sensor_attr->index % 2;
>>>> +
>>>> +	return sprintf(buf, "DIMM %d (%c%d)\n",
>>>> +		       sensor_attr->index, channel, index);
>>>> +}
>>>> +
>>>> +/* Die temperature */
>>>> +static SENSOR_DEVICE_ATTR(temp1_label, 0444, show_label, NULL, L_DIE);
>>>> +static SENSOR_DEVICE_ATTR(temp1_input, 0444, show_die_temp, NULL, 0);
>>>> +static SENSOR_DEVICE_ATTR(temp1_max, 0444, show_tcontrol, NULL, 0);
>>>> +static SENSOR_DEVICE_ATTR(temp1_crit, 0444, show_tjmax, NULL, 0);
>>>> +static SENSOR_DEVICE_ATTR(temp1_crit_hyst, 0444, show_tcontrol_margin, NULL,
>>>> +			  POS);
>>>> +
>>>> +static struct attribute *die_temp_attrs[] = {
>>>> +	&sensor_dev_attr_temp1_label.dev_attr.attr,
>>>> +	&sensor_dev_attr_temp1_input.dev_attr.attr,
>>>> +	&sensor_dev_attr_temp1_max.dev_attr.attr,
>>>> +	&sensor_dev_attr_temp1_crit.dev_attr.attr,
>>>> +	&sensor_dev_attr_temp1_crit_hyst.dev_attr.attr,
>>>> +	NULL
>>>> +};
>>>> +
>>>> +static struct attribute_group die_temp_attr_group = {
>>>> +	.attrs = die_temp_attrs,
>>>> +};
>>>> +
>>>> +/* DTS margin temperature */
>>>> +static SENSOR_DEVICE_ATTR(temp2_label, 0444, show_label, NULL, L_DTS);
>>>> +static SENSOR_DEVICE_ATTR(temp2_input, 0444, show_dts_margin, NULL, 0);
>>>> +static SENSOR_DEVICE_ATTR(temp2_min, 0444, show_value, NULL, 0);
>>>> +static SENSOR_DEVICE_ATTR(temp2_lcrit, 0444, show_tcontrol_margin, NULL, NEG);
>>>> +
>>>> +static struct attribute *dts_margin_temp_attrs[] = {
>>>> +	&sensor_dev_attr_temp2_label.dev_attr.attr,
>>>> +	&sensor_dev_attr_temp2_input.dev_attr.attr,
>>>> +	&sensor_dev_attr_temp2_min.dev_attr.attr,
>>>> +	&sensor_dev_attr_temp2_lcrit.dev_attr.attr,
>>>> +	NULL
>>>> +};
>>>> +
>>>> +static struct attribute_group dts_margin_temp_attr_group = {
>>>> +	.attrs = dts_margin_temp_attrs,
>>>> +};
>>>> +
>>>> +/* Tcontrol temperature */
>>>> +static SENSOR_DEVICE_ATTR(temp3_label, 0444, show_label, NULL, L_TCONTROL);
>>>> +static SENSOR_DEVICE_ATTR(temp3_input, 0444, show_tcontrol, NULL, 0);
>>>> +static SENSOR_DEVICE_ATTR(temp3_crit, 0444, show_tjmax, NULL, 0);
>>>> +
>>>> +static struct attribute *tcontrol_temp_attrs[] = {
>>>> +	&sensor_dev_attr_temp3_label.dev_attr.attr,
>>>> +	&sensor_dev_attr_temp3_input.dev_attr.attr,
>>>> +	&sensor_dev_attr_temp3_crit.dev_attr.attr,
>>>> +	NULL
>>>> +};
>>>> +
>>>> +static struct attribute_group tcontrol_temp_attr_group = {
>>>> +	.attrs = tcontrol_temp_attrs,
>>>> +};
>>>> +
>>>> +/* Tthrottle temperature */
>>>> +static SENSOR_DEVICE_ATTR(temp4_label, 0444, show_label, NULL, L_TTHROTTLE);
>>>> +static SENSOR_DEVICE_ATTR(temp4_input, 0444, show_tthrottle, NULL, 0);
>>>> +
>>>> +static struct attribute *tthrottle_temp_attrs[] = {
>>>> +	&sensor_dev_attr_temp4_label.dev_attr.attr,
>>>> +	&sensor_dev_attr_temp4_input.dev_attr.attr,
>>>> +	NULL
>>>> +};
>>>> +
>>>> +static struct attribute_group tthrottle_temp_attr_group = {
>>>> +	.attrs = tthrottle_temp_attrs,
>>>> +};
>>>> +
>>>> +/* Tjmax temperature */
>>>> +static SENSOR_DEVICE_ATTR(temp5_label, 0444, show_label, NULL, L_TJMAX);
>>>> +static SENSOR_DEVICE_ATTR(temp5_input, 0444, show_tjmax, NULL, 0);
>>>> +
>>>> +static struct attribute *tjmax_temp_attrs[] = {
>>>> +	&sensor_dev_attr_temp5_label.dev_attr.attr,
>>>> +	&sensor_dev_attr_temp5_input.dev_attr.attr,
>>>> +	NULL
>>>> +};
>>>> +
>>>> +static struct attribute_group tjmax_temp_attr_group = {
>>>> +	.attrs = tjmax_temp_attrs,
>>>> +};
>>>> +
>>>> +static const struct attribute_group *
>>>> +default_attr_groups[DEFAULT_ATTR_GRP_NUMS + 1] = {
>>>> +	&die_temp_attr_group,
>>>> +	&dts_margin_temp_attr_group,
>>>> +	&tcontrol_temp_attr_group,
>>>> +	&tthrottle_temp_attr_group,
>>>> +	&tjmax_temp_attr_group,
>>>> +	NULL
>>>> +};
>>>> +
>>>> +/* Core temperature */
>>>> +static ssize_t (*const core_show_fn[CORE_TEMP_ATTRS]) (struct device *dev,
>>>> +		struct device_attribute *devattr, char *buf) = {
>>>> +	show_core_label,
>>>> +	show_core_temp,
>>>> +	show_tcontrol,
>>>> +	show_tjmax,
>>>> +	show_tcontrol_margin,
>>>> +};
>>>> +
>>>> +static const char *const core_suffix[CORE_TEMP_ATTRS] = {
>>>> +	"label",
>>>> +	"input",
>>>> +	"max",
>>>> +	"crit",
>>>> +	"crit_hyst",
>>>> +};
>>>> +
>>>> +static int check_resolved_cores(struct peci_hwmon *priv)
>>>> +{
>>>> +	struct peci_rd_pci_cfg_local_msg msg;
>>>> +	int rc;
>>>> +
>>>> +	if (!(priv->client->adapter->cmd_mask & BIT(PECI_CMD_RD_PCI_CFG_LOCAL)))
>>>> +		return -EINVAL;
>>>> +
>>>> +	/* Get the RESOLVED_CORES register value */
>>>> +	msg.addr = priv->addr;
>>>> +	msg.bus = 1;
>>>> +	msg.device = 30;
>>>> +	msg.function = 3;
>>>> +	msg.reg = 0xB4;
>>>> +	msg.rx_len = 4;
>>>> +
>>>> +	rc = send_peci_cmd(priv, PECI_CMD_RD_PCI_CFG_LOCAL, (void *)&msg);
>>>> +	if (rc < 0)
>>>> +		return rc;
>>>> +
>>>> +	priv->core_mask = msg.pci_config[3] << 24 |
>>>> +			  msg.pci_config[2] << 16 |
>>>> +			  msg.pci_config[1] << 8 |
>>>> +			  msg.pci_config[0];
>>>> +
>>>> +	if (!priv->core_mask)
>>>> +		return -EAGAIN;
>>>> +
>>>> +	dev_dbg(priv->dev, "Scanned resolved cores: 0x%x\n", priv->core_mask);
>>>> +	return 0;
>>>> +}
>>>> +
>>>> +static int create_core_temp_group(struct peci_hwmon *priv, int core_no)
>>>> +{
>>>> +	struct core_temp_group *data;
>>>> +	int i;
>>>> +
>>>> +	data = devm_kzalloc(priv->dev, sizeof(struct core_temp_group),
>>>> +			    GFP_KERNEL);
>>>> +	if (!data)
>>>> +		return -ENOMEM;
>>>> +
>>>> +	for (i = 0; i < CORE_TEMP_ATTRS; i++) {
>>>> +		snprintf(data->attr_name[i], ATTR_NAME_LEN,
>>>> +			 "temp%d_%s", priv->global_idx, core_suffix[i]);
>>>> +		sysfs_attr_init(&data->sd_attrs[i].dev_attr.attr);
>>>> +		data->sd_attrs[i].dev_attr.attr.name = data->attr_name[i];
>>>> +		data->sd_attrs[i].dev_attr.attr.mode = 0444;
>>>> +		data->sd_attrs[i].dev_attr.show = core_show_fn[i];
>>>> +		if (i == 0 || i == 1) /* label or temp */
>>>> +			data->sd_attrs[i].index = core_no;
>>>> +		data->attrs[i] = &data->sd_attrs[i].dev_attr.attr;
>>>> +	}
>>>> +
>>>> +	data->attr_group.attrs = data->attrs;
>>>> +	priv->core_attr_groups[priv->core_idx++] = &data->attr_group;
>>>> +	priv->global_idx++;
>>>> +
>>>> +	return 0;
>>>> +}
>>>> +
>>>> +static int create_core_temp_groups(struct peci_hwmon *priv)
>>>> +{
>>>> +	int rc, i;
>>>> +
>>>> +	rc = check_resolved_cores(priv);
>>>> +	if (!rc) {
>>>> +		for (i = 0; i < CORE_NUMS_MAX; i++) {
>>>> +			if (priv->core_mask & BIT(i)) {
>>>> +				rc = create_core_temp_group(priv, i);
>>>> +				if (rc)
>>>> +					return rc;
>>>> +			}
>>>> +		}
>>>> +
>>>> +		rc = sysfs_create_groups(&priv->hwmon_dev->kobj,
>>>> +					 priv->core_attr_groups);
>>>> +	}
>>>> +
>>>> +	return rc;
>>>> +}
>>>> +
>>>> +/* DIMM temperature */
>>>> +static ssize_t (*const dimm_show_fn[DIMM_TEMP_ATTRS]) (struct device *dev,
>>>> +		struct device_attribute *devattr, char *buf) = {
>>>> +	show_dimm_label,
>>>> +	show_dimm_temp,
>>>> +};
>>>> +
>>>> +static const char *const dimm_suffix[DIMM_TEMP_ATTRS] = {
>>>> +	"label",
>>>> +	"input",
>>>> +};
>>>> +
>>>> +static int check_populated_dimms(struct peci_hwmon *priv)
>>>> +{
>>>> +	struct peci_rd_pkg_cfg_msg msg;
>>>> +	int i, rc, pass = 0;
>>>> +
>>>> +do_scan:
>>>> +	for (i = 0; i < (DIMM_SLOT_NUMS_MAX / 2); i++) {
>>>> +		msg.addr = priv->addr;
>>>> +		msg.index = MBX_INDEX_DDR_DIMM_TEMP;
>>>> +		msg.param = i; /* channel */
>>>> +		msg.rx_len = 4;
>>>> +
>>>> +		rc = send_peci_cmd(priv, PECI_CMD_RD_PKG_CFG, (void *)&msg);
>>>> +		if (rc < 0)
>>>> +			return rc;
>>>> +
>>>> +		if (msg.pkg_config[0]) /* DIMM #0 on the channel */
>>>> +			priv->dimm_mask |= BIT(i);
>>>> +
>>>> +		if (msg.pkg_config[1]) /* DIMM #1 on the channel */
>>>> +			priv->dimm_mask |= BIT(i + 1);
>>>
>>> Each loop sets overlapping bits in dimm_mask. The first loop sets
>>> bit 0 and 1, the second sets bit 1 and 2, and so on. I _think_ this
>>> should probably set bits (i*2) and (i*2+1). If so, I would suggest to
>>> test the code in a system with more than one DIMM in more than one bank.
>>>
>>
>> Thanks for your pointing out my mistake. It has to be changed to (i*2) and
>> (i*2+1) as you suggested. I'll fix it with taking enough test on a various
>> DIMM setting environment.
>>
>>>> +	}
>>>> +
>>>> +	/* Do 2-pass scanning */
>>>> +	if (priv->dimm_mask && pass == 0) {
>>>> +		pass++;
>>>> +		goto do_scan;
>>>
>>> This goto is only used to avoid a nested loops. Please don't do that.
>>> If you want to avoid indentation levels, add another function.
>>>
>>> Also, this will require an explanation why the loop is executed if
>>> and only if a dimm is found the first time around.
>>>
>>
>> The reason why is needs 2-pass scanning is, we function can be called in the
>> middle of the timing when cient BIOS is updating the values. I'll add this
>> as a comment and rewrite this code without using the goto.
>>
> 
> How would that be different during the 2nd scan ? If there is concern about
> concurrency, the loop would have to be repeated until there are no more changes.
> Even then there could be update during the last scan (which the code did not
> catch).
> 

Okay. I'll change it to single scan logic after taking some tests.

>>>> +	}
>>>> +
>>>> +	if (!priv->dimm_mask)
>>>> +		return -EAGAIN;
>>>> +
>>>> +	dev_dbg(priv->dev, "Scanned populated DIMMs: 0x%x\n", priv->dimm_mask);
>>>> +	return 0;
>>>> +}
>>>> +
>>>> +static int create_dimm_temp_group(struct peci_hwmon *priv, int dimm_no)
>>>> +{
>>>> +	struct dimm_temp_group *data;
>>>> +	int i;
>>>> +
>>>> +	data = devm_kzalloc(priv->dev, sizeof(struct dimm_temp_group),
>>>> +			    GFP_KERNEL);
>>>> +	if (!data)
>>>> +		return -ENOMEM;
>>>> +
>>>> +	for (i = 0; i < DIMM_TEMP_ATTRS; i++) {
>>>> +		snprintf(data->attr_name[i], ATTR_NAME_LEN,
>>>> +			 "temp%d_%s", priv->global_idx, dimm_suffix[i]);
>>>> +		sysfs_attr_init(&data->sd_attrs[i].dev_attr.attr);
>>>> +		data->sd_attrs[i].dev_attr.attr.name = data->attr_name[i];
>>>> +		data->sd_attrs[i].dev_attr.attr.mode = 0444;
>>>> +		data->sd_attrs[i].dev_attr.show = dimm_show_fn[i];
>>>> +		data->sd_attrs[i].index = dimm_no;
>>>> +		data->attrs[i] = &data->sd_attrs[i].dev_attr.attr;
>>>> +	}
>>>> +
>>>> +	data->attr_group.attrs = data->attrs;
>>>> +	priv->dimm_attr_groups[priv->dimm_idx++] = &data->attr_group;
>>>> +	priv->global_idx++;
>>>> +
>>>> +	return 0;
>>>> +}
>>>> +
>>>> +static int create_dimm_temp_groups(struct peci_hwmon *priv)
>>>> +{
>>>> +	int rc, i;
>>>> +
>>>> +	rc = check_populated_dimms(priv);
>>>> +	if (!rc) {
>>>> +		for (i = 0; i < DIMM_SLOT_NUMS_MAX; i++) {
>>>> +			if (priv->dimm_mask & BIT(i)) {
>>>> +				rc = create_dimm_temp_group(priv, i);
>>>> +				if (rc)
>>>> +					return rc;
>>>> +			}
>>>> +		}
>>>> +
>>>> +		rc = sysfs_create_groups(&priv->hwmon_dev->kobj,
>>>> +					 priv->dimm_attr_groups);
>>>> +		if (!rc)
>>>> +			dev_dbg(priv->dev, "Done DIMM temp group creation\n");
>>>> +	} else if (rc == -EAGAIN) {
>>>> +		queue_delayed_work(priv->work_queue, &priv->work_handler,
>>>> +				   DIMM_MASK_CHECK_DELAY);
>>>> +		dev_dbg(priv->dev, "Diferred DIMM temp group creation\n");
>>>
>>> s/Diferred/Deferred/
>>>
>>
>> Will fix this typo.
>>
>>> If PECI never reports any DIMMS, this will be repeated forever until
>>> it finds at least one group. Is this intentional ? If so, I would expect
>>> some detailed explanation of the rationale. As it is, the DIMM temperatures
>>> can show up randomly after some hours of runtime, which isn't exactly
>>> deterministic. Maybe that does make sense, but it will need to be explained.
>>>
>>
>> In general, a client CPU will report DIMM population info just after the
>> client CPU completes memory training and testing at the very beginning of
>> BIOS boot. The time varies depends on the client system but it would be less
>> than 5 minutes. I'll add a timeout logic.
>>
> But it is not complete by the time Linux boots, and there is no "incomplete"
> message ?
> 
> This sounds racy; how is it guaranteed that any reading is complete and that
> no additional DIMMs will show up some arbitrary time after the first DIMM
> was reported ?
> 

This hwmon driver is for BMC side kernel which is for monitoring remote 
CPUs, not a local CPU. So it needs to consider the remote CPU's booting 
from the remote server's power on.

>>>> +	}
>>>> +
>>>> +	return rc;
>>>> +}
>>>> +
>>>> +static void create_dimm_temp_groups_delayed(struct work_struct *work)
>>>> +{
>>>> +	struct delayed_work *dwork = to_delayed_work(work);
>>>> +	struct peci_hwmon *priv = container_of(dwork, struct peci_hwmon,
>>>> +					       work_handler);
>>>> +	int rc;
>>>> +
>>>> +	rc = create_dimm_temp_groups(priv);
>>>> +	if (rc && rc != -EAGAIN)
>>>> +		dev_dbg(priv->dev, "Skipped to creat DIMM temp groups\n");
>>>> +}
>>>> +
>>>> +static int peci_hwmon_probe(struct peci_client *client)
>>>> +{
>>>> +	struct device *dev = &client->dev;
>>>> +	struct peci_hwmon *priv;
>>>> +	int rc;
>>>> +
>>>> +	if ((client->adapter->cmd_mask &
>>>> +	    (BIT(PECI_CMD_GET_TEMP) | BIT(PECI_CMD_RD_PKG_CFG))) !=
>>>> +	    (BIT(PECI_CMD_GET_TEMP) | BIT(PECI_CMD_RD_PKG_CFG))) {
>>>> +		dev_err(dev, "Client doesn't support temperature monitoring\n");
>>>> +		return -EINVAL;
>>>> +	}
>>>> +
>>>> +	priv = devm_kzalloc(dev, sizeof(*priv), GFP_KERNEL);
>>>> +	if (!priv)
>>>> +		return -ENOMEM;
>>>> +
>>>> +	dev_set_drvdata(dev, priv);
>>>> +	priv->client = client;
>>>> +	priv->dev = dev;
>>>> +	priv->addr = client->addr;
>>>> +	priv->cpu_no = priv->addr - PECI_BASE_ADDR;
>>>> +
>>>> +	snprintf(priv->name, PECI_NAME_SIZE, "peci_hwmon.cpu%d", priv->cpu_no);
>>>> +
>>>> +	priv->work_queue = create_singlethread_workqueue(priv->name);
>>>> +	if (!priv->work_queue)
>>>> +		return -ENOMEM;
>>>> +
>>>> +	priv->hwmon_dev = hwmon_device_register_with_groups(priv->dev,
>>>> +							    priv->name,
>>>> +							    priv,
>>>> +							   default_attr_groups);
>>>> +
>>> I'll expect a detailed explanation why using hwmon_device_register_with_info()
>>> does not work for this driver, and why it would make sense to ever register
>>> the hwmon device before its attributes are available. From my perspective,
>>> the driver should delay registration entirely until all attributes are
>>> available. The hwmon ABI implicitly assumes that all sensors are available
>>> at the time of hwmon device registration. Anything else can result in
>>> unexpected behavior.
>>>
>>
>> AFAIK, hwmon_device_register_with_info is for adding a non-standard
> 
> This is wrong.
> 
> hwmon_device_register_with_info() has an _option_ to provide a set of
> additional non-standard attributes as its last parameter. Its purpose
> is to simplify drivers by moving sysfs attribute handling into the
> hwmon core. All new drivers should use that API unless there is a
> compelling reason not to do so.
> 
>> attribute so hwmon_device_register_with_group is correct in this case. Also,
> 
> This is wrong. Quoting from the documentation.
> 
> "hwmon_device_register_with_info is the most comprehensive and preferred means
> to register a hardware monitoring device. It creates the standard sysfs
> attributes in the hardware monitoring core, letting the driver focus on reading
> from and writing to the chip instead of having to bother with sysfs attributes.
> Its parameters are described in more detail below."
> 
>> this delayed additional registration case is being used in core-temp.c using
>> a similar way.
>>
> That doesn't make it better.
> 
>>>> +	rc = PTR_ERR_OR_ZERO(priv->hwmon_dev);
>>>> +	if (rc) {
>>>> +		dev_err(dev, "Failed to register peci hwmon\n");
>>>> +		return rc;
>>>> +	}
>>>> +
>>>> +	priv->global_idx = DEFAULT_ATTR_GRP_NUMS + 1;
>>>> +
>>>> +	rc = create_core_temp_groups(priv);
>>>> +	if (rc) {
>>>> +		dev_err(dev, "Failed to create core groups\n");
>>>> +		return rc;
>>>> +	}
>>>
>>> This should be done before registering the hwmon device (or be left
>>> to the hwmon core by using the _info API). And it should definitely
>>> not return an error while keeping the hwmon device around.
>>>
>>
>> As I answers above, hwmon_device_register_with_info is for adding a
>> non-standard attribute and this kind of way is being used in core-temp.c
> 
> Again, this is wrong.
> 
>> using a similar way.
> 
> Two wrongs don't make it right. Besides, the coretemp driver handles
> (or tries to handle) dynamic CPU insertion and removal, which is not
> the case here.
> 
> Guenter
> 

Thanks for your kind explanation. My understanding is, it should use 
hwmon_device_register_with_info with multiple hwmon_channel_info 
parameter as an array for adding standard attributes and don't use the 
groups parameter of the API because the groups parameter of this API is 
for adding a non-standard attribute, right? But even with this change, 
it still needs to use delayed creation because BMC side kernel doesn't 
know how many DIMMs are populated on a remote server before the remote 
server completes its memory training and testing in BIOS, but it needs 
to check the remote server's CPU temperature as immediate as possible to 
make appropriate thermal control based on the remote CPU's temperature 
to avoid any critical thermal issue. What would be a better solution in 
this case?

Thanks a lot,
Jae

>>>> +
>>>> +	INIT_DELAYED_WORK(&priv->work_handler, create_dimm_temp_groups_delayed);
>>>> +
>>>> +	rc = create_dimm_temp_groups(priv);
>>>> +	if (rc && rc != -EAGAIN)
>>>> +		dev_dbg(dev, "Skipped to creat DIMM temp groups\n");
>>>> +
>>> Not that it should be there in the first place, but "creat" is not a word.
>>>
>>
>> Please check my above answers. I'll fix the typo.
>>
>> Again, Thanks a lot for sharing your time to review it. I really appreciate
>> it.
>>
>> BR,
>> Jae
>>
>>>> +	dev_dbg(dev, "peci hwmon for CPU at 0x%x registered\n", priv->addr);
>>>> +
>>>> +	return 0;
>>>> +}
>>>> +
>>>> +static int peci_hwmon_remove(struct peci_client *client)
>>>> +{
>>>> +	struct peci_hwmon *priv = dev_get_drvdata(&client->dev);
>>>> +
>>>> +	cancel_delayed_work(&priv->work_handler);
>>>> +	destroy_workqueue(priv->work_queue);
>>>> +	sysfs_remove_groups(&priv->hwmon_dev->kobj, priv->core_attr_groups);
>>>> +	sysfs_remove_groups(&priv->hwmon_dev->kobj, priv->dimm_attr_groups);
>>>> +	hwmon_device_unregister(priv->hwmon_dev);
>>>> +
>>>> +	return 0;
>>>> +}
>>>> +
>>>> +static const struct of_device_id peci_of_table[] = {
>>>> +	{ .compatible = "intel,peci-hwmon", },
>>>> +	{ }
>>>> +};
>>>> +MODULE_DEVICE_TABLE(of, peci_of_table);
>>>> +
>>>> +static struct peci_driver peci_hwmon_driver = {
>>>> +	.probe  = peci_hwmon_probe,
>>>> +	.remove = peci_hwmon_remove,
>>>> +	.driver = {
>>>> +		.name           = "peci-hwmon",
>>>> +		.of_match_table = of_match_ptr(peci_of_table),
>>>> +	},
>>>> +};
>>>> +module_peci_driver(peci_hwmon_driver);
>>>> +
>>>> +MODULE_AUTHOR("Jae Hyun Yoo <jae.hyun.yoo@linux.intel.com>");
>>>> +MODULE_DESCRIPTION("PECI hwmon driver");
>>>> +MODULE_LICENSE("GPL v2");
>>>> -- 
>>>> 2.16.1
>>>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-hwmon" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v2 7/8] [PATCH 7/8] drivers/hwmon: Add a generic PECI hwmon client driver
  2018-02-21 23:07         ` Jae Hyun Yoo
@ 2018-02-22  0:37           ` Andrew Lunn
  2018-02-22  1:29             ` Jae Hyun Yoo
  0 siblings, 1 reply; 46+ messages in thread
From: Andrew Lunn @ 2018-02-22  0:37 UTC (permalink / raw)
  To: Jae Hyun Yoo
  Cc: Guenter Roeck, joel, andrew, arnd, gregkh, jdelvare, benh,
	linux-kernel, linux-doc, devicetree, linux-hwmon,
	linux-arm-kernel, openbmc

> But even with this change, it still needs to use delayed creation
> because BMC side kernel doesn't know how many DIMMs are populated on
> a remote server before the remote server completes its memory
> training and testing in BIOS, but it needs to check the remote
> server's CPU temperature as immediate as possible to make
> appropriate thermal control based on the remote CPU's temperature to
> avoid any critical thermal issue. What would be a better solution in
> this case?

You could change this driver so that it supports one DIMM.  Move the
'hotplug' part into another driver which creates and destroys
instances of the hwmon DIMM device as the DIMMS come and go.

Also, do you need to handle CPU hotplug? You could split the CPU
temperature part into a separate hwmon driver? And again create and
destroy devices as CPUs come and go?

	Andrew

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v2 7/8] [PATCH 7/8] drivers/hwmon: Add a generic PECI hwmon client driver
  2018-02-22  0:37           ` Andrew Lunn
@ 2018-02-22  1:29             ` Jae Hyun Yoo
  2018-02-24  0:00               ` Miguel Ojeda
  0 siblings, 1 reply; 46+ messages in thread
From: Jae Hyun Yoo @ 2018-02-22  1:29 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Guenter Roeck, joel, andrew, arnd, gregkh, jdelvare, benh,
	linux-kernel, linux-doc, devicetree, linux-hwmon,
	linux-arm-kernel, openbmc

On 2/21/2018 4:37 PM, Andrew Lunn wrote:
>> But even with this change, it still needs to use delayed creation
>> because BMC side kernel doesn't know how many DIMMs are populated on
>> a remote server before the remote server completes its memory
>> training and testing in BIOS, but it needs to check the remote
>> server's CPU temperature as immediate as possible to make
>> appropriate thermal control based on the remote CPU's temperature to
>> avoid any critical thermal issue. What would be a better solution in
>> this case?
> 
> You could change this driver so that it supports one DIMM.  Move the
> 'hotplug' part into another driver which creates and destroys
> instances of the hwmon DIMM device as the DIMMS come and go.
> 
> Also, do you need to handle CPU hotplug? You could split the CPU
> temperature part into a separate hwmon driver? And again create and
> destroy devices as CPUs come and go?
> 
> 	Andrew
> 

That seems like a possible option. I'll rewrite the hwmon driver again 
like that.

Thanks for the good idea. :)

Jae

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v2 1/8] [PATCH 1/8] drivers/peci: Add support for PECI bus driver core
  2018-02-21 20:42     ` Jae Hyun Yoo
@ 2018-02-22  6:54       ` Greg KH
  2018-02-22 17:20         ` Jae Hyun Yoo
  0 siblings, 1 reply; 46+ messages in thread
From: Greg KH @ 2018-02-22  6:54 UTC (permalink / raw)
  To: Jae Hyun Yoo
  Cc: joel, andrew, arnd, jdelvare, linux, benh, andrew, linux-kernel,
	linux-doc, devicetree, linux-hwmon, linux-arm-kernel, openbmc

On Wed, Feb 21, 2018 at 12:42:30PM -0800, Jae Hyun Yoo wrote:
> On 2/21/2018 9:58 AM, Greg KH wrote:
> > On Wed, Feb 21, 2018 at 08:15:59AM -0800, Jae Hyun Yoo wrote:
> > > This commit adds driver implementation for PECI bus into linux
> > > driver framework.
> > > 
> > > Signed-off-by: Jae Hyun Yoo <jae.hyun.yoo@linux.intel.com>
> > > ---
> > 
> > Why is there no other Intel developers willing to review and sign off on
> > this patch?  Please get their review first before asking us to do their
> > work for them :)
> > 
> > thanks,
> > 
> > greg k-h
> > 
> 
> Hi Greg,
> 
> This patch set got our internal review process. Sorry if it's code quality
> is under your expectation but it's the reason why I'm asking you to review
> the code. Could you please share your time to review it?

Nope.  If no other Intel developer thinks it is good enough to put their
name on it as part of their review process, why should I?

Again, please use the resources you have, to fix the obvious problems in
your code, BEFORE asking the community to do that work for you.

greg k-h

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v2 1/8] [PATCH 1/8] drivers/peci: Add support for PECI bus driver core
  2018-02-21 16:15 ` [PATCH v2 1/8] [PATCH 1/8] drivers/peci: Add support for PECI bus driver core Jae Hyun Yoo
  2018-02-21 17:04   ` Andrew Lunn
  2018-02-21 17:58   ` Greg KH
@ 2018-02-22  7:01   ` kbuild test robot
  2018-02-22  7:01   ` [RFC PATCH] drivers/peci: peci_match_id() can be static kbuild test robot
  2018-03-07  3:19   ` [PATCH v2 1/8] [PATCH 1/8] drivers/peci: Add support for PECI bus driver core Julia Cartwright
  4 siblings, 0 replies; 46+ messages in thread
From: kbuild test robot @ 2018-02-22  7:01 UTC (permalink / raw)
  To: Jae Hyun Yoo
  Cc: kbuild-all, joel, andrew, arnd, gregkh, jdelvare, linux, benh,
	andrew, linux-kernel, linux-doc, devicetree, linux-hwmon,
	linux-arm-kernel, openbmc, Jae Hyun Yoo

Hi Jae,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on linus/master]
[also build test WARNING on v4.16-rc2 next-20180221]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Jae-Hyun-Yoo/PECI-device-driver-introduction/20180222-054545
reproduce:
        # apt-get install sparse
        make ARCH=x86_64 allmodconfig
        make C=1 CF=-D__CHECK_ENDIAN__


sparse warnings: (new ones prefixed by >>)

>> drivers/peci/peci-core.c:773:29: sparse: symbol 'peci_match_id' was not declared. Should it be

Please review and possibly fold the followup patch.

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [RFC PATCH] drivers/peci: peci_match_id() can be static
  2018-02-21 16:15 ` [PATCH v2 1/8] [PATCH 1/8] drivers/peci: Add support for PECI bus driver core Jae Hyun Yoo
                     ` (2 preceding siblings ...)
  2018-02-22  7:01   ` kbuild test robot
@ 2018-02-22  7:01   ` kbuild test robot
  2018-02-22 17:25     ` Jae Hyun Yoo
  2018-03-07  3:19   ` [PATCH v2 1/8] [PATCH 1/8] drivers/peci: Add support for PECI bus driver core Julia Cartwright
  4 siblings, 1 reply; 46+ messages in thread
From: kbuild test robot @ 2018-02-22  7:01 UTC (permalink / raw)
  To: Jae Hyun Yoo
  Cc: kbuild-all, joel, andrew, arnd, gregkh, jdelvare, linux, benh,
	andrew, linux-kernel, linux-doc, devicetree, linux-hwmon,
	linux-arm-kernel, openbmc, Jae Hyun Yoo


Fixes: 99f5d2b99ecd ("drivers/peci: Add support for PECI bus driver core")
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
---
 peci-core.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/peci/peci-core.c b/drivers/peci/peci-core.c
index d976c73..4709b8c 100644
--- a/drivers/peci/peci-core.c
+++ b/drivers/peci/peci-core.c
@@ -770,8 +770,8 @@ peci_of_match_device(const struct of_device_id *matches,
 }
 #endif
 
-const struct peci_device_id *peci_match_id(const struct peci_device_id *id,
-					   struct peci_client *client)
+static const struct peci_device_id *peci_match_id(const struct peci_device_id *id,
+						  struct peci_client *client)
 {
 	if (!(id && client))
 		return NULL;

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* Re: [PATCH v2 1/8] [PATCH 1/8] drivers/peci: Add support for PECI bus driver core
  2018-02-22  6:54       ` Greg KH
@ 2018-02-22 17:20         ` Jae Hyun Yoo
  0 siblings, 0 replies; 46+ messages in thread
From: Jae Hyun Yoo @ 2018-02-22 17:20 UTC (permalink / raw)
  To: Greg KH
  Cc: joel, andrew, arnd, jdelvare, linux, benh, andrew, linux-kernel,
	linux-doc, devicetree, linux-hwmon, linux-arm-kernel, openbmc

On 2/21/2018 10:54 PM, Greg KH wrote:
> On Wed, Feb 21, 2018 at 12:42:30PM -0800, Jae Hyun Yoo wrote:
>> On 2/21/2018 9:58 AM, Greg KH wrote:
>>> On Wed, Feb 21, 2018 at 08:15:59AM -0800, Jae Hyun Yoo wrote:
>>>> This commit adds driver implementation for PECI bus into linux
>>>> driver framework.
>>>>
>>>> Signed-off-by: Jae Hyun Yoo <jae.hyun.yoo@linux.intel.com>
>>>> ---
>>>
>>> Why is there no other Intel developers willing to review and sign off on
>>> this patch?  Please get their review first before asking us to do their
>>> work for them :)
>>>
>>> thanks,
>>>
>>> greg k-h
>>>
>>
>> Hi Greg,
>>
>> This patch set got our internal review process. Sorry if it's code quality
>> is under your expectation but it's the reason why I'm asking you to review
>> the code. Could you please share your time to review it?
> 
> Nope.  If no other Intel developer thinks it is good enough to put their
> name on it as part of their review process, why should I?
> 
> Again, please use the resources you have, to fix the obvious problems in
> your code, BEFORE asking the community to do that work for you.
> 
> greg k-h
> 

Okay. I'll take our internal review process again on this patch set and 
collect more credit tags before submitting v3.

Thanks for your advice!

Jae

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [RFC PATCH] drivers/peci: peci_match_id() can be static
  2018-02-22  7:01   ` [RFC PATCH] drivers/peci: peci_match_id() can be static kbuild test robot
@ 2018-02-22 17:25     ` Jae Hyun Yoo
  0 siblings, 0 replies; 46+ messages in thread
From: Jae Hyun Yoo @ 2018-02-22 17:25 UTC (permalink / raw)
  To: fengguang.wu
  Cc: joel, andrew, arnd, gregkh, jdelvare, linux, benh, andrew,
	linux-kernel, linux-doc, devicetree, linux-hwmon,
	linux-arm-kernel, openbmc

On 2/21/2018 11:01 PM, kbuild test robot wrote:
> 
> Fixes: 99f5d2b99ecd ("drivers/peci: Add support for PECI bus driver core")
> Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
> ---
>   peci-core.c |    4 ++--
>   1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/peci/peci-core.c b/drivers/peci/peci-core.c
> index d976c73..4709b8c 100644
> --- a/drivers/peci/peci-core.c
> +++ b/drivers/peci/peci-core.c
> @@ -770,8 +770,8 @@ peci_of_match_device(const struct of_device_id *matches,
>   }
>   #endif
>   
> -const struct peci_device_id *peci_match_id(const struct peci_device_id *id,
> -					   struct peci_client *client)
> +static const struct peci_device_id *peci_match_id(const struct peci_device_id *id,
> +						  struct peci_client *client)
>   {
>   	if (!(id && client))
>   		return NULL;
> 

Hi Fengguang,

Thanks a lot for the fix. I'll merge your patch in v3 submission.

BR,
Jae

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v2 7/8] [PATCH 7/8] drivers/hwmon: Add a generic PECI hwmon client driver
  2018-02-22  1:29             ` Jae Hyun Yoo
@ 2018-02-24  0:00               ` Miguel Ojeda
  2018-02-24  9:32                 ` Jae Hyun Yoo
  0 siblings, 1 reply; 46+ messages in thread
From: Miguel Ojeda @ 2018-02-24  0:00 UTC (permalink / raw)
  To: Jae Hyun Yoo
  Cc: Andrew Lunn, Guenter Roeck, joel, andrew, Arnd Bergmann, Greg KH,
	jdelvare, benh, linux-kernel, linux-doc, devicetree, linux-hwmon,
	linux-arm-kernel, openbmc

On Thu, Feb 22, 2018 at 2:29 AM, Jae Hyun Yoo
<jae.hyun.yoo@linux.intel.com> wrote:
> On 2/21/2018 4:37 PM, Andrew Lunn wrote:
>>>
>>> But even with this change, it still needs to use delayed creation
>>> because BMC side kernel doesn't know how many DIMMs are populated on
>>> a remote server before the remote server completes its memory
>>> training and testing in BIOS, but it needs to check the remote
>>> server's CPU temperature as immediate as possible to make
>>> appropriate thermal control based on the remote CPU's temperature to
>>> avoid any critical thermal issue. What would be a better solution in
>>> this case?
>>
>>
>> You could change this driver so that it supports one DIMM.  Move the
>> 'hotplug' part into another driver which creates and destroys
>> instances of the hwmon DIMM device as the DIMMS come and go.
>>
>> Also, do you need to handle CPU hotplug? You could split the CPU
>> temperature part into a separate hwmon driver? And again create and
>> destroy devices as CPUs come and go?
>>
>>         Andrew
>>
>
> That seems like a possible option. I'll rewrite the hwmon driver again like
> that.
>
> Thanks for the good idea. :)

By the way, in the rewrite, please try to avoid the create*workqueue()
functions (they are deprecated :).

Cheers,
Miguel

>
> Jae

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v2 7/8] [PATCH 7/8] drivers/hwmon: Add a generic PECI hwmon client driver
  2018-02-24  0:00               ` Miguel Ojeda
@ 2018-02-24  9:32                 ` Jae Hyun Yoo
  0 siblings, 0 replies; 46+ messages in thread
From: Jae Hyun Yoo @ 2018-02-24  9:32 UTC (permalink / raw)
  To: Miguel Ojeda
  Cc: Andrew Lunn, Guenter Roeck, joel, andrew, Arnd Bergmann, Greg KH,
	jdelvare, benh, linux-kernel, linux-doc, devicetree, linux-hwmon,
	linux-arm-kernel, openbmc

On 2/23/2018 4:00 PM, Miguel Ojeda wrote:
> On Thu, Feb 22, 2018 at 2:29 AM, Jae Hyun Yoo
> <jae.hyun.yoo@linux.intel.com> wrote:
>> On 2/21/2018 4:37 PM, Andrew Lunn wrote:
>>>>
>>>> But even with this change, it still needs to use delayed creation
>>>> because BMC side kernel doesn't know how many DIMMs are populated on
>>>> a remote server before the remote server completes its memory
>>>> training and testing in BIOS, but it needs to check the remote
>>>> server's CPU temperature as immediate as possible to make
>>>> appropriate thermal control based on the remote CPU's temperature to
>>>> avoid any critical thermal issue. What would be a better solution in
>>>> this case?
>>>
>>>
>>> You could change this driver so that it supports one DIMM.  Move the
>>> 'hotplug' part into another driver which creates and destroys
>>> instances of the hwmon DIMM device as the DIMMS come and go.
>>>
>>> Also, do you need to handle CPU hotplug? You could split the CPU
>>> temperature part into a separate hwmon driver? And again create and
>>> destroy devices as CPUs come and go?
>>>
>>>          Andrew
>>>
>>
>> That seems like a possible option. I'll rewrite the hwmon driver again like
>> that.
>>
>> Thanks for the good idea. :)
> 
> By the way, in the rewrite, please try to avoid the create*workqueue()
> functions (they are deprecated :).
> 
> Cheers,
> Miguel
> 

Hi Miguel,

Thanks for letting me know that. I'll replace that with 
alloc_workqueue(). :)

Regards,
Jae

>>
>> Jae

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v2 2/8] [PATCH 2/8] Documentations: dt-bindings: Add a document of PECI adapter driver for Aspeed AST24xx/25xx SoCs
  2018-02-21 16:16 ` [PATCH v2 2/8] [PATCH 2/8] Documentations: dt-bindings: Add a document of PECI adapter driver for Aspeed AST24xx/25xx SoCs Jae Hyun Yoo
  2018-02-21 17:13   ` Andrew Lunn
@ 2018-03-06 12:40   ` Pavel Machek
  2018-03-06 12:54     ` Andrew Lunn
  2018-03-06 19:05     ` Jae Hyun Yoo
  1 sibling, 2 replies; 46+ messages in thread
From: Pavel Machek @ 2018-03-06 12:40 UTC (permalink / raw)
  To: Jae Hyun Yoo
  Cc: joel, andrew, arnd, gregkh, jdelvare, linux, benh, andrew,
	linux-kernel, linux-doc, devicetree, linux-hwmon,
	linux-arm-kernel, openbmc

[-- Attachment #1: Type: text/plain, Size: 2389 bytes --]

Hi!

> Signed-off-by: Jae Hyun Yoo <jae.hyun.yoo@linux.intel.com>
> ---
>  .../devicetree/bindings/peci/peci-aspeed.txt       | 73 ++++++++++++++++++++++
>  1 file changed, 73 insertions(+)
>  create mode 100644 Documentation/devicetree/bindings/peci/peci-aspeed.txt
> 
> diff --git a/Documentation/devicetree/bindings/peci/peci-aspeed.txt b/Documentation/devicetree/bindings/peci/peci-aspeed.txt
> new file mode 100644
> index 000000000000..8a86f346d550
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/peci/peci-aspeed.txt
> @@ -0,0 +1,73 @@
> +Device tree configuration for PECI buses on the AST24XX and AST25XX SoCs.

Are these SoCs x86-based?

> +Required properties:
> +- compatible
> +	"aspeed,ast2400-peci" or "aspeed,ast2500-peci"
> +	- aspeed,ast2400-peci: Aspeed AST2400 family PECI controller
> +	- aspeed,ast2500-peci: Aspeed AST2500 family PECI controller
> +
> +- reg
> +	Should contain PECI registers location and length.

Other dts documents put it on one line, reg: Should contain ...

> +- clock_frequency
> +	Should contain the operation frequency of PECI hardware module.
> +	187500 ~ 24000000

specify this is Hz?

> +- rd-sampling-point
> +	Read sampling point selection. The whole period of a bit time will be
> +	divided into 16 time frames. This value will determine which time frame
> +	this controller will sample PECI signal for data read back. Usually in
> +	the middle of a bit time is the best.

English? "This value will determine when this controller"?

> +	0 ~ 15 (default: 8)
> +
> +- cmd_timeout_ms
> +	Command timeout in units of ms.
> +	1 ~ 60000 (default: 1000)
> +
> +Example:
> +	peci: peci@1e78b000 {
> +		compatible = "simple-bus";
> +		#address-cells = <1>;
> +		#size-cells = <1>;
> +		ranges = <0x0 0x1e78b000 0x60>;
> +
> +		peci0: peci-bus@0 {
> +			compatible = "aspeed,ast2500-peci";
> +			reg = <0x0 0x60>;
> +			#address-cells = <1>;
> +			#size-cells = <0>;
> +			interrupts = <15>;
> +			clocks = <&clk_clkin>;
> +			clock-frequency = <24000000>;
> +			msg-timing-nego = <1>;
> +			addr-timing-nego = <1>;
> +			rd-sampling-point = <8>;
> +			cmd-timeout-ms = <1000>;
> +		};
> +	};
> \ No newline at end of file

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v2 0/8] PECI device driver introduction
  2018-02-21 16:15 [PATCH v2 0/8] PECI device driver introduction Jae Hyun Yoo
                   ` (7 preceding siblings ...)
  2018-02-21 16:16 ` [PATCH v2 8/8] [PATCH 8/8] Add a maintainer for the PECI subsystem Jae Hyun Yoo
@ 2018-03-06 12:40 ` Pavel Machek
  2018-03-06 19:21   ` Jae Hyun Yoo
  8 siblings, 1 reply; 46+ messages in thread
From: Pavel Machek @ 2018-03-06 12:40 UTC (permalink / raw)
  To: Jae Hyun Yoo
  Cc: joel, andrew, arnd, gregkh, jdelvare, linux, benh, andrew,
	linux-kernel, linux-doc, devicetree, linux-hwmon,
	linux-arm-kernel, openbmc

[-- Attachment #1: Type: text/plain, Size: 4427 bytes --]

Hi!

> Introduction of the Platform Environment Control Interface (PECI) bus
> device driver. PECI is a one-wire bus interface that provides a
> communication channel between Intel processor and chipset components to
> external monitoring or control devices. PECI is designed to support the
> following sideband functions:
> 
> * Processor and DRAM thermal management
>   - Processor fan speed control is managed by comparing Digital Thermal
>     Sensor (DTS) thermal readings acquired via PECI against the
>     processor-specific fan speed control reference point, or TCONTROL.
>     Both TCONTROL and DTS thermal readings are accessible via the processor
>     PECI client. These variables are referenced to a common temperature,
>     the TCC activation point, and are both defined as negative offsets from
>     that reference.
>   - PECI based access to the processor package configuration space provides
>     a means for Baseboard Management Controllers (BMC) or other platform
>     management devices to actively manage the processor and memory power
>     and thermal features.
> 
> * Platform Manageability
>   - Platform manageability functions including thermal, power, and error
>     monitoring. Note that platform 'power' management includes monitoring
>     and control for both the processor and DRAM subsystem to assist with
>     data center power limiting.
>   - PECI allows read access to certain error registers in the processor MSR
>     space and status monitoring registers in the PCI configuration space
>     within the processor and downstream devices.
>   - PECI permits writes to certain registers in the processor PCI
>     configuration space.
> 
> * Processor Interface Tuning and Diagnostics
>   - Processor interface tuning and diagnostics capabilities
>     (Intel(c) Interconnect BIST). The processors Intel(c) Interconnect
>     Built In Self Test (Intel(c) IBIST) allows for infield diagnostic
>     capabilities in the Intel UPI and memory controller interfaces. PECI
>     provides a port to execute these diagnostics via its PCI Configuration
>     read and write capabilities.
> 
> * Failure Analysis
>   - Output the state of the processor after a failure for analysis via
>     Crashdump.
> 
> PECI uses a single wire for self-clocking and data transfer. The bus
> requires no additional control lines. The physical layer is a self-clocked
> one-wire bus that begins each bit with a driven, rising edge from an idle
> level near zero volts. The duration of the signal driven high depends on
> whether the bit value is a logic '0' or logic '1'. PECI also includes
> variable data transfer rate established with every message. In this way,
> it is highly flexible even though underlying logic is simple.
> 
> The interface design was optimized for interfacing to Intel processor and
> chipset components in both single processor and multiple processor
> environments. The single wire interface provides low board routing
> overhead for the multiple load connections in the congested routing area
> near the processor and chipset components. Bus speed, error checking, and
> low protocol overhead provides adequate link bandwidth and reliability to
> transfer critical device operating conditions and configuration
> information.
> 
> This implementation provides the basic framework to add PECI extensions
> to the Linux bus and device models. A hardware specific 'Adapter' driver
> can be attached to the PECI bus to provide sideband functions described
> above. It is also possible to access all devices on an adapter from
> userspace through the /dev interface. A device specific 'Client' driver
> also can be attached to the PECI bus so each processor client's features
> can be supported by the 'Client' driver through an adapter connection in
> the bus. This patch set includes Aspeed 24xx/25xx PECI driver and a generic
> PECI hwmon driver as the first implementation for both adapter and client
> drivers on the PECI bus framework.

Ok, how does this interact with ACPI/SMM BIOS/Secure mode code? Does
Linux _need_ to control the fan? Or is SMM BIOS capable of doing all
the work itself and Linux has just read-only access for monitoring
purposes?

Pavel

-- (english) http://www.livejournal.com/~pavelmachek
(cesky, pictures)
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v2 2/8] [PATCH 2/8] Documentations: dt-bindings: Add a document of PECI adapter driver for Aspeed AST24xx/25xx SoCs
  2018-03-06 12:40   ` Pavel Machek
@ 2018-03-06 12:54     ` Andrew Lunn
  2018-03-06 13:05       ` Pavel Machek
  2018-03-06 19:05     ` Jae Hyun Yoo
  1 sibling, 1 reply; 46+ messages in thread
From: Andrew Lunn @ 2018-03-06 12:54 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Jae Hyun Yoo, joel, andrew, arnd, gregkh, jdelvare, linux, benh,
	linux-kernel, linux-doc, devicetree, linux-hwmon,
	linux-arm-kernel, openbmc

On Tue, Mar 06, 2018 at 01:40:02PM +0100, Pavel Machek wrote:
> Hi!
> 
> > Signed-off-by: Jae Hyun Yoo <jae.hyun.yoo@linux.intel.com>
> > ---
> >  .../devicetree/bindings/peci/peci-aspeed.txt       | 73 ++++++++++++++++++++++
> >  1 file changed, 73 insertions(+)
> >  create mode 100644 Documentation/devicetree/bindings/peci/peci-aspeed.txt
> > 
> > diff --git a/Documentation/devicetree/bindings/peci/peci-aspeed.txt b/Documentation/devicetree/bindings/peci/peci-aspeed.txt
> > new file mode 100644
> > index 000000000000..8a86f346d550
> > --- /dev/null
> > +++ b/Documentation/devicetree/bindings/peci/peci-aspeed.txt
> > @@ -0,0 +1,73 @@
> > +Device tree configuration for PECI buses on the AST24XX and AST25XX SoCs.
> 
> Are these SoCs x86-based?

ARM, as far as i can tell. If i get the architecture correct, these
are BMC, Board Management Controllers, looking after the main x86 CPU,
stopping it overheating, controlling the power supplies, remote
management, etc.

    Andrew

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v2 2/8] [PATCH 2/8] Documentations: dt-bindings: Add a document of PECI adapter driver for Aspeed AST24xx/25xx SoCs
  2018-03-06 12:54     ` Andrew Lunn
@ 2018-03-06 13:05       ` Pavel Machek
  2018-03-06 13:19         ` Arnd Bergmann
  0 siblings, 1 reply; 46+ messages in thread
From: Pavel Machek @ 2018-03-06 13:05 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Jae Hyun Yoo, joel, andrew, arnd, gregkh, jdelvare, linux, benh,
	linux-kernel, linux-doc, devicetree, linux-hwmon,
	linux-arm-kernel, openbmc

[-- Attachment #1: Type: text/plain, Size: 1406 bytes --]

On Tue 2018-03-06 13:54:16, Andrew Lunn wrote:
> On Tue, Mar 06, 2018 at 01:40:02PM +0100, Pavel Machek wrote:
> > Hi!
> > 
> > > Signed-off-by: Jae Hyun Yoo <jae.hyun.yoo@linux.intel.com>
> > > ---
> > >  .../devicetree/bindings/peci/peci-aspeed.txt       | 73 ++++++++++++++++++++++
> > >  1 file changed, 73 insertions(+)
> > >  create mode 100644 Documentation/devicetree/bindings/peci/peci-aspeed.txt
> > > 
> > > diff --git a/Documentation/devicetree/bindings/peci/peci-aspeed.txt b/Documentation/devicetree/bindings/peci/peci-aspeed.txt
> > > new file mode 100644
> > > index 000000000000..8a86f346d550
> > > --- /dev/null
> > > +++ b/Documentation/devicetree/bindings/peci/peci-aspeed.txt
> > > @@ -0,0 +1,73 @@
> > > +Device tree configuration for PECI buses on the AST24XX and AST25XX SoCs.
> > 
> > Are these SoCs x86-based?
> 
> ARM, as far as i can tell. If i get the architecture correct, these
> are BMC, Board Management Controllers, looking after the main x86 CPU,
> stopping it overheating, controlling the power supplies, remote
> management, etc.

Ok, so with x86 machine, I get arm-based one for free. I get it. Is
user able to run his own kernel on the arm system, or is it locked
down, TiVo style?

									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v2 2/8] [PATCH 2/8] Documentations: dt-bindings: Add a document of PECI adapter driver for Aspeed AST24xx/25xx SoCs
  2018-03-06 13:05       ` Pavel Machek
@ 2018-03-06 13:19         ` Arnd Bergmann
  0 siblings, 0 replies; 46+ messages in thread
From: Arnd Bergmann @ 2018-03-06 13:19 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Andrew Lunn, Jae Hyun Yoo, Joel Stanley, Andrew Jeffery, gregkh,
	Jean Delvare, Guenter Roeck, Benjamin Herrenschmidt,
	Linux Kernel Mailing List, open list:DOCUMENTATION, DTML,
	linux-hwmon, Linux ARM, OpenBMC Maillist

On Tue, Mar 6, 2018 at 2:05 PM, Pavel Machek <pavel@ucw.cz> wrote:
> On Tue 2018-03-06 13:54:16, Andrew Lunn wrote:
>> On Tue, Mar 06, 2018 at 01:40:02PM +0100, Pavel Machek wrote:
>> > Hi!
>> >
>> > > Signed-off-by: Jae Hyun Yoo <jae.hyun.yoo@linux.intel.com>
>> > > ---
>> > >  .../devicetree/bindings/peci/peci-aspeed.txt       | 73 ++++++++++++++++++++++
>> > >  1 file changed, 73 insertions(+)
>> > >  create mode 100644 Documentation/devicetree/bindings/peci/peci-aspeed.txt
>> > >
>> > > diff --git a/Documentation/devicetree/bindings/peci/peci-aspeed.txt b/Documentation/devicetree/bindings/peci/peci-aspeed.txt
>> > > new file mode 100644
>> > > index 000000000000..8a86f346d550
>> > > --- /dev/null
>> > > +++ b/Documentation/devicetree/bindings/peci/peci-aspeed.txt
>> > > @@ -0,0 +1,73 @@
>> > > +Device tree configuration for PECI buses on the AST24XX and AST25XX SoCs.
>> >
>> > Are these SoCs x86-based?
>>
>> ARM, as far as i can tell. If i get the architecture correct, these
>> are BMC, Board Management Controllers, looking after the main x86 CPU,
>> stopping it overheating, controlling the power supplies, remote
>> management, etc.
>
> Ok, so with x86 machine, I get arm-based one for free. I get it. Is
> user able to run his own kernel on the arm system, or is it locked
> down, TiVo style?

In the past, they were all locked down, the team submitting those
patches in working on changing that. Have a look for OpenBMC.

       Arnd

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v2 2/8] [PATCH 2/8] Documentations: dt-bindings: Add a document of PECI adapter driver for Aspeed AST24xx/25xx SoCs
  2018-03-06 12:40   ` Pavel Machek
  2018-03-06 12:54     ` Andrew Lunn
@ 2018-03-06 19:05     ` Jae Hyun Yoo
  2018-03-07 22:11       ` Pavel Machek
  2018-03-09 23:41       ` Milton Miller II
  1 sibling, 2 replies; 46+ messages in thread
From: Jae Hyun Yoo @ 2018-03-06 19:05 UTC (permalink / raw)
  To: Pavel Machek
  Cc: joel, andrew, arnd, gregkh, jdelvare, linux, benh, andrew,
	linux-kernel, linux-doc, devicetree, linux-hwmon,
	linux-arm-kernel, openbmc

Hi Pavel,

Thanks for sharing your time on reviewing it. Please see my answers inline.

-Jae

On 3/6/2018 4:40 AM, Pavel Machek wrote:
> Hi!
> 
>> Signed-off-by: Jae Hyun Yoo <jae.hyun.yoo@linux.intel.com>
>> ---
>>   .../devicetree/bindings/peci/peci-aspeed.txt       | 73 ++++++++++++++++++++++
>>   1 file changed, 73 insertions(+)
>>   create mode 100644 Documentation/devicetree/bindings/peci/peci-aspeed.txt
>>
>> diff --git a/Documentation/devicetree/bindings/peci/peci-aspeed.txt b/Documentation/devicetree/bindings/peci/peci-aspeed.txt
>> new file mode 100644
>> index 000000000000..8a86f346d550
>> --- /dev/null
>> +++ b/Documentation/devicetree/bindings/peci/peci-aspeed.txt
>> @@ -0,0 +1,73 @@
>> +Device tree configuration for PECI buses on the AST24XX and AST25XX SoCs.
> 
> Are these SoCs x86-based?
> 

Yes, these are ARM SoCs. Please see Andrew's answer as well.

>> +Required properties:
>> +- compatible
>> +	"aspeed,ast2400-peci" or "aspeed,ast2500-peci"
>> +	- aspeed,ast2400-peci: Aspeed AST2400 family PECI controller
>> +	- aspeed,ast2500-peci: Aspeed AST2500 family PECI controller
>> +
>> +- reg
>> +	Should contain PECI registers location and length.
> 
> Other dts documents put it on one line, reg: Should contain ...
> 
>> +- clock_frequency
>> +	Should contain the operation frequency of PECI hardware module.
>> +	187500 ~ 24000000
> 
> specify this is Hz?
> 

I'll add a description. Thanks!

>> +- rd-sampling-point
>> +	Read sampling point selection. The whole period of a bit time will be
>> +	divided into 16 time frames. This value will determine which time frame
>> +	this controller will sample PECI signal for data read back. Usually in
>> +	the middle of a bit time is the best.
> 
> English? "This value will determine when this controller"?
> 

Could I change it like below?:

"This value will determine in which time frame this controller samples 
PECI signal for data read back"

>> +	0 ~ 15 (default: 8)
>> +
>> +- cmd_timeout_ms
>> +	Command timeout in units of ms.
>> +	1 ~ 60000 (default: 1000)
>> +
>> +Example:
>> +	peci: peci@1e78b000 {
>> +		compatible = "simple-bus";
>> +		#address-cells = <1>;
>> +		#size-cells = <1>;
>> +		ranges = <0x0 0x1e78b000 0x60>;
>> +
>> +		peci0: peci-bus@0 {
>> +			compatible = "aspeed,ast2500-peci";
>> +			reg = <0x0 0x60>;
>> +			#address-cells = <1>;
>> +			#size-cells = <0>;
>> +			interrupts = <15>;
>> +			clocks = <&clk_clkin>;
>> +			clock-frequency = <24000000>;
>> +			msg-timing-nego = <1>;
>> +			addr-timing-nego = <1>;
>> +			rd-sampling-point = <8>;
>> +			cmd-timeout-ms = <1000>;
>> +		};
>> +	};
>> \ No newline at end of file
> 

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v2 0/8] PECI device driver introduction
  2018-03-06 12:40 ` [PATCH v2 0/8] PECI device driver introduction Pavel Machek
@ 2018-03-06 19:21   ` Jae Hyun Yoo
  0 siblings, 0 replies; 46+ messages in thread
From: Jae Hyun Yoo @ 2018-03-06 19:21 UTC (permalink / raw)
  To: Pavel Machek
  Cc: joel, andrew, arnd, gregkh, jdelvare, linux, benh, andrew,
	linux-kernel, linux-doc, devicetree, linux-hwmon,
	linux-arm-kernel, openbmc

Hi Pavel,

Please see my answer inline.

On 3/6/2018 4:40 AM, Pavel Machek wrote:
> Hi!
> 
>> Introduction of the Platform Environment Control Interface (PECI) bus
>> device driver. PECI is a one-wire bus interface that provides a
>> communication channel between Intel processor and chipset components to
>> external monitoring or control devices. PECI is designed to support the
>> following sideband functions:
>>
>> * Processor and DRAM thermal management
>>    - Processor fan speed control is managed by comparing Digital Thermal
>>      Sensor (DTS) thermal readings acquired via PECI against the
>>      processor-specific fan speed control reference point, or TCONTROL.
>>      Both TCONTROL and DTS thermal readings are accessible via the processor
>>      PECI client. These variables are referenced to a common temperature,
>>      the TCC activation point, and are both defined as negative offsets from
>>      that reference.
>>    - PECI based access to the processor package configuration space provides
>>      a means for Baseboard Management Controllers (BMC) or other platform
>>      management devices to actively manage the processor and memory power
>>      and thermal features.
>>
>> * Platform Manageability
>>    - Platform manageability functions including thermal, power, and error
>>      monitoring. Note that platform 'power' management includes monitoring
>>      and control for both the processor and DRAM subsystem to assist with
>>      data center power limiting.
>>    - PECI allows read access to certain error registers in the processor MSR
>>      space and status monitoring registers in the PCI configuration space
>>      within the processor and downstream devices.
>>    - PECI permits writes to certain registers in the processor PCI
>>      configuration space.
>>
>> * Processor Interface Tuning and Diagnostics
>>    - Processor interface tuning and diagnostics capabilities
>>      (Intel(c) Interconnect BIST). The processors Intel(c) Interconnect
>>      Built In Self Test (Intel(c) IBIST) allows for infield diagnostic
>>      capabilities in the Intel UPI and memory controller interfaces. PECI
>>      provides a port to execute these diagnostics via its PCI Configuration
>>      read and write capabilities.
>>
>> * Failure Analysis
>>    - Output the state of the processor after a failure for analysis via
>>      Crashdump.
>>
>> PECI uses a single wire for self-clocking and data transfer. The bus
>> requires no additional control lines. The physical layer is a self-clocked
>> one-wire bus that begins each bit with a driven, rising edge from an idle
>> level near zero volts. The duration of the signal driven high depends on
>> whether the bit value is a logic '0' or logic '1'. PECI also includes
>> variable data transfer rate established with every message. In this way,
>> it is highly flexible even though underlying logic is simple.
>>
>> The interface design was optimized for interfacing to Intel processor and
>> chipset components in both single processor and multiple processor
>> environments. The single wire interface provides low board routing
>> overhead for the multiple load connections in the congested routing area
>> near the processor and chipset components. Bus speed, error checking, and
>> low protocol overhead provides adequate link bandwidth and reliability to
>> transfer critical device operating conditions and configuration
>> information.
>>
>> This implementation provides the basic framework to add PECI extensions
>> to the Linux bus and device models. A hardware specific 'Adapter' driver
>> can be attached to the PECI bus to provide sideband functions described
>> above. It is also possible to access all devices on an adapter from
>> userspace through the /dev interface. A device specific 'Client' driver
>> also can be attached to the PECI bus so each processor client's features
>> can be supported by the 'Client' driver through an adapter connection in
>> the bus. This patch set includes Aspeed 24xx/25xx PECI driver and a generic
>> PECI hwmon driver as the first implementation for both adapter and client
>> drivers on the PECI bus framework.
> 
> Ok, how does this interact with ACPI/SMM BIOS/Secure mode code? Does
> Linux _need_ to control the fan? Or is SMM BIOS capable of doing all
> the work itself and Linux has just read-only access for monitoring
> purposes?
> 

This driver is not for local CPUs which this driver is running on. 
Instead, this driver will be running on BMC (Baseboard Management 
Controller) kernel which is separated from the server machine. In this 
implementation, it provides just read-only access for monitoring the 
server's CPU and DIMM temperatures remotely through a PECI connection. 
The BMC can control fans according to the monitoring data if the BMC has 
a fan control interface and feature, but it depends on baseboard 
hardware and software designs.

Thanks,
Jae

> Pavel
> 
> -- (english) http://www.livejournal.com/~pavelmachek
> (cesky, pictures)
> http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
> 

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v2 6/8] [PATCH 6/8] Documentation: hwmon: Add a document for PECI hwmon client driver
  2018-02-21 16:16 ` [PATCH v2 6/8] [PATCH 6/8] Documentation: hwmon: " Jae Hyun Yoo
@ 2018-03-06 20:28   ` Randy Dunlap
  2018-03-06 21:08     ` Jae Hyun Yoo
  0 siblings, 1 reply; 46+ messages in thread
From: Randy Dunlap @ 2018-03-06 20:28 UTC (permalink / raw)
  To: Jae Hyun Yoo, joel, andrew, arnd, gregkh, jdelvare, linux, benh, andrew
  Cc: linux-kernel, linux-doc, devicetree, linux-hwmon,
	linux-arm-kernel, openbmc

Hi,

On 02/21/2018 08:16 AM, Jae Hyun Yoo wrote:

> +temp<n>_label		Provides DDR DIMM temperature if this label indicates
> +			'DIMM #'.
> +temp<n>_input		Provides current temperature of the DDR DIMM.
> +
> +Note:
> +	DIMM temperature group will be appeared when the client CPU's BIOS

	                       will appear when

> +	completes memory training and testing.
> 


-- 
~Randy

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v2 6/8] [PATCH 6/8] Documentation: hwmon: Add a document for PECI hwmon client driver
  2018-03-06 20:28   ` Randy Dunlap
@ 2018-03-06 21:08     ` Jae Hyun Yoo
  0 siblings, 0 replies; 46+ messages in thread
From: Jae Hyun Yoo @ 2018-03-06 21:08 UTC (permalink / raw)
  To: Randy Dunlap, joel, andrew, arnd, gregkh, jdelvare, linux, benh, andrew
  Cc: linux-kernel, linux-doc, devicetree, linux-hwmon,
	linux-arm-kernel, openbmc

Hi Randy,

On 3/6/2018 12:28 PM, Randy Dunlap wrote:
> Hi,
> 
> On 02/21/2018 08:16 AM, Jae Hyun Yoo wrote:
> 
>> +temp<n>_label		Provides DDR DIMM temperature if this label indicates
>> +			'DIMM #'.
>> +temp<n>_input		Provides current temperature of the DDR DIMM.
>> +
>> +Note:
>> +	DIMM temperature group will be appeared when the client CPU's BIOS
> 
> 	                       will appear when
> 

I'll fix this description as you suggested. Thanks a lot!

Jae

>> +	completes memory training and testing.
>>
> 
> 

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v2 1/8] [PATCH 1/8] drivers/peci: Add support for PECI bus driver core
  2018-02-21 16:15 ` [PATCH v2 1/8] [PATCH 1/8] drivers/peci: Add support for PECI bus driver core Jae Hyun Yoo
                     ` (3 preceding siblings ...)
  2018-02-22  7:01   ` [RFC PATCH] drivers/peci: peci_match_id() can be static kbuild test robot
@ 2018-03-07  3:19   ` Julia Cartwright
  2018-03-07 19:03     ` Jae Hyun Yoo
  4 siblings, 1 reply; 46+ messages in thread
From: Julia Cartwright @ 2018-03-07  3:19 UTC (permalink / raw)
  To: Jae Hyun Yoo
  Cc: joel, andrew, arnd, gregkh, jdelvare, linux, benh, andrew,
	linux-kernel, linux-doc, devicetree, linux-hwmon,
	linux-arm-kernel, openbmc

On Wed, Feb 21, 2018 at 08:15:59AM -0800, Jae Hyun Yoo wrote:
> This commit adds driver implementation for PECI bus into linux
> driver framework.
> 
> Signed-off-by: Jae Hyun Yoo <jae.hyun.yoo@linux.intel.com>
> ---
[..]
> +static int peci_locked_xfer(struct peci_adapter *adapter,
> +			    struct peci_xfer_msg *msg,
> +			    bool do_retry,
> +			    bool has_aw_fcs)

_locked generally means that this function is invoked with some critical
lock held, what lock does the caller need to acquire before invoking
this function?

> +{
> +	ktime_t start, end;
> +	s64 elapsed_ms;
> +	int rc = 0;
> +
> +	if (!adapter->xfer) {

Is this really an optional feature of an adapter?  If this is not
optional, then this check should be in place when the adapter is
registered, not here.  (And it should WARN_ON(), because it's a driver
developer error).

> +		dev_dbg(&adapter->dev, "PECI level transfers not supported\n");
> +		return -ENODEV;
> +	}
> +
> +	if (in_atomic() || irqs_disabled()) {

As Andrew mentioned, this is broken.

You don't even need a might_sleep().  The locking functions you use here
will already include a might_sleep() w/ CONFIG_DEBUG_ATOMIC_SLEEP.

> +		rt_mutex_trylock(&adapter->bus_lock);
> +		if (!rc)
> +			return -EAGAIN; /* PECI activity is ongoing */
> +	} else {
> +		rt_mutex_lock(&adapter->bus_lock);
> +	}
> +
> +	if (do_retry)
> +		start = ktime_get();
> +
> +	do {
> +		rc = adapter->xfer(adapter, msg);
> +
> +		if (!do_retry)
> +			break;
> +
> +		/* Per the PECI spec, need to retry commands that return 0x8x */
> +		if (!(!rc && ((msg->rx_buf[0] & DEV_PECI_CC_RETRY_ERR_MASK) ==
> +			      DEV_PECI_CC_TIMEOUT)))
> +			break;

This is pretty difficult to parse.  Can you split it into two different
conditions?

> +
> +		/* Set the retry bit to indicate a retry attempt */
> +		msg->tx_buf[1] |= DEV_PECI_RETRY_BIT;

Are you sure this bit is to be set in the _second_ byte of tx_buf?

> +
> +		/* Recalculate the AW FCS if it has one */
> +		if (has_aw_fcs)
> +			msg->tx_buf[msg->tx_len - 1] = 0x80 ^
> +						peci_aw_fcs((u8 *)msg,
> +							    2 + msg->tx_len);
> +
> +		/* Retry for at least 250ms before returning an error */
> +		end = ktime_get();
> +		elapsed_ms = ktime_to_ms(ktime_sub(end, start));
> +		if (elapsed_ms >= DEV_PECI_RETRY_TIME_MS) {
> +			dev_dbg(&adapter->dev, "Timeout retrying xfer!\n");
> +			break;
> +		}
> +	} while (true);
> +
> +	rt_mutex_unlock(&adapter->bus_lock);
> +
> +	return rc;
> +}
> +
> +static int peci_xfer(struct peci_adapter *adapter, struct peci_xfer_msg *msg)
> +{
> +	return peci_locked_xfer(adapter, msg, false, false);
> +}
> +
> +static int peci_xfer_with_retries(struct peci_adapter *adapter,
> +				  struct peci_xfer_msg *msg,
> +				  bool has_aw_fcs)
> +{
> +	return peci_locked_xfer(adapter, msg, true, has_aw_fcs);
> +}
> +
> +static int peci_scan_cmd_mask(struct peci_adapter *adapter)
> +{
> +	struct peci_xfer_msg msg;
> +	u32 dib;
> +	int rc = 0;
> +
> +	/* Update command mask just once */
> +	if (adapter->cmd_mask & BIT(PECI_CMD_PING))
> +		return 0;
> +
> +	msg.addr      = PECI_BASE_ADDR;
> +	msg.tx_len    = GET_DIB_WR_LEN;
> +	msg.rx_len    = GET_DIB_RD_LEN;
> +	msg.tx_buf[0] = GET_DIB_PECI_CMD;
> +
> +	rc = peci_xfer(adapter, &msg);
> +	if (rc < 0) {
> +		dev_dbg(&adapter->dev, "PECI xfer error, rc : %d\n", rc);
> +		return rc;
> +	}
> +
> +	dib = msg.rx_buf[0] | (msg.rx_buf[1] << 8) |
> +	      (msg.rx_buf[2] << 16) | (msg.rx_buf[3] << 24);
> +
> +	/* Check special case for Get DIB command */
> +	if (dib == 0x00) {
> +		dev_dbg(&adapter->dev, "DIB read as 0x00\n");
> +		return -1;
> +	}
> +
> +	if (!rc) {

You should change this to:

	if (rc) {
		dev_dbg(&adapter->dev, "Error reading DIB, rc : %d\n", rc);
		return rc;
	}

And then leave the happy path below unindented.

> +		/**
> +		 * setting up the supporting commands based on minor rev#
> +		 * see PECI Spec Table 3-1
> +		 */
> +		dib = (dib >> 8) & 0xF;
> +
> +		if (dib >= 0x1) {
> +			adapter->cmd_mask |= BIT(PECI_CMD_RD_PKG_CFG);
> +			adapter->cmd_mask |= BIT(PECI_CMD_WR_PKG_CFG);
> +		}
> +
> +		if (dib >= 0x2)
> +			adapter->cmd_mask |= BIT(PECI_CMD_RD_IA_MSR);
> +
> +		if (dib >= 0x3) {
> +			adapter->cmd_mask |= BIT(PECI_CMD_RD_PCI_CFG_LOCAL);
> +			adapter->cmd_mask |= BIT(PECI_CMD_WR_PCI_CFG_LOCAL);
> +		}
> +
> +		if (dib >= 0x4)
> +			adapter->cmd_mask |= BIT(PECI_CMD_RD_PCI_CFG);
> +
> +		if (dib >= 0x5)
> +			adapter->cmd_mask |= BIT(PECI_CMD_WR_PCI_CFG);
> +
> +		if (dib >= 0x6)
> +			adapter->cmd_mask |= BIT(PECI_CMD_WR_IA_MSR);
> +
> +		adapter->cmd_mask |= BIT(PECI_CMD_GET_TEMP);
> +		adapter->cmd_mask |= BIT(PECI_CMD_GET_DIB);
> +		adapter->cmd_mask |= BIT(PECI_CMD_PING);

These cmd_mask updates are not done with any locking in mind.  Is this
intentional?  Or: is synchronization not necessary because this is
always done during enumeration prior to exposing the adapter to users?

> +	} else {
> +		dev_dbg(&adapter->dev, "Error reading DIB, rc : %d\n", rc);
> +	}
> +
> +	return rc;
> +}
> +
> +static int peci_cmd_support(struct peci_adapter *adapter, enum peci_cmd cmd)
> +{
> +	if (!(adapter->cmd_mask & BIT(PECI_CMD_PING)) &&
> +	    peci_scan_cmd_mask(adapter) < 0) {
> +		dev_dbg(&adapter->dev, "Failed to scan command mask\n");
> +		return -EIO;
> +	}
> +
> +	if (!(adapter->cmd_mask & BIT(cmd))) {
> +		dev_dbg(&adapter->dev, "Command %d is not supported\n", cmd);
> +		return -EINVAL;
> +	}

It would be nicer if you did this check prior to dispatching to the
various subfunctions (peci_ioctl_ping, peci_ioctl_get_dib, etc.).  In
that way, these functions could just assume the adapter supports them.

[..]
> +static int peci_register_adapter(struct peci_adapter *adapter)
> +{
> +	int res = -EINVAL;
> +
> +	/* Can't register until after driver model init */
> +	if (WARN_ON(!is_registered)) {

Is this solving a problem you actually ran into?

[.. skipped review due to fatigue ..]

> +++ b/include/linux/peci.h
> @@ -0,0 +1,97 @@
> +// SPDX-License-Identifier: GPL-2.0
> +// Copyright (c) 2018 Intel Corporation
> +
> +#ifndef __LINUX_PECI_H
> +#define __LINUX_PECI_H
> +
> +#include <linux/cdev.h>
> +#include <linux/device.h>
> +#include <linux/peci-ioctl.h>
> +#include <linux/rtmutex.h>
> +
> +#define PECI_BUFFER_SIZE  32
> +#define PECI_NAME_SIZE    32
> +
> +struct peci_xfer_msg {
> +	u8	addr;
> +	u8	tx_len;
> +	u8	rx_len;
> +	u8	tx_buf[PECI_BUFFER_SIZE];
> +	u8	rx_buf[PECI_BUFFER_SIZE];
> +} __attribute__((__packed__));

The packed attribute has historically caused gcc to emit atrocious code,
as it seems to assume packed implies members might not be naturally
aligned.  Seeing as you're only working with u8s in this case, though,
this shouldn't be a problem.

> +struct peci_board_info {
> +	char			type[PECI_NAME_SIZE];
> +	u8			addr;	/* CPU client address */
> +	struct device_node	*of_node;
> +};
> +
> +struct peci_adapter {
> +	struct module	*owner;
> +	struct rt_mutex	bus_lock;

Why an rt_mutex, instead of a regular mutex.  Do you explicitly need PI
in mainline?

> +	struct device	dev;
> +	struct cdev	cdev;
> +	int		nr;
> +	char		name[PECI_NAME_SIZE];
> +	int		(*xfer)(struct peci_adapter *adapter,
> +				struct peci_xfer_msg *msg);
> +	uint		cmd_mask;
> +};
> +
> +#define to_peci_adapter(d) container_of(d, struct peci_adapter, dev)

You can also do this with a static inline, which provides a marginally
better error when screwed up.

   Julia

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v2 1/8] [PATCH 1/8] drivers/peci: Add support for PECI bus driver core
  2018-03-07  3:19   ` [PATCH v2 1/8] [PATCH 1/8] drivers/peci: Add support for PECI bus driver core Julia Cartwright
@ 2018-03-07 19:03     ` Jae Hyun Yoo
  0 siblings, 0 replies; 46+ messages in thread
From: Jae Hyun Yoo @ 2018-03-07 19:03 UTC (permalink / raw)
  To: Julia Cartwright
  Cc: joel, andrew, arnd, gregkh, jdelvare, linux, benh, andrew,
	linux-kernel, linux-doc, devicetree, linux-hwmon,
	linux-arm-kernel, openbmc

Hi Julia,

Thanks for sharing your time on reviewing it. Please see my inline answers.

Jae

On 3/6/2018 7:19 PM, Julia Cartwright wrote:
> On Wed, Feb 21, 2018 at 08:15:59AM -0800, Jae Hyun Yoo wrote:
>> This commit adds driver implementation for PECI bus into linux
>> driver framework.
>>
>> Signed-off-by: Jae Hyun Yoo <jae.hyun.yoo@linux.intel.com>
>> ---
> [..]
>> +static int peci_locked_xfer(struct peci_adapter *adapter,
>> +			    struct peci_xfer_msg *msg,
>> +			    bool do_retry,
>> +			    bool has_aw_fcs)
> 
> _locked generally means that this function is invoked with some critical
> lock held, what lock does the caller need to acquire before invoking
> this function?
> 

I intended to show that this function has a mutex locking inside for 
serialization of PECI data transactions from multiple callers, but as 
you commented out below, the mutex protection scope should be adjusted 
to make that covers the peci_scan_cmd_mask() function too. I'll rewrite 
the mutex protection scope then this function will be in the locked scope.

>> +{
>> +	ktime_t start, end;
>> +	s64 elapsed_ms;
>> +	int rc = 0;
>> +
>> +	if (!adapter->xfer) {
> 
> Is this really an optional feature of an adapter?  If this is not
> optional, then this check should be in place when the adapter is
> registered, not here.  (And it should WARN_ON(), because it's a driver
> developer error).
> 

I agree with you. I'll move this code into the peci_register_adapter() 
function.

>> +		dev_dbg(&adapter->dev, "PECI level transfers not supported\n");
>> +		return -ENODEV;
>> +	}
>> +
>> +	if (in_atomic() || irqs_disabled()) {
> 
> As Andrew mentioned, this is broken.
> 
> You don't even need a might_sleep().  The locking functions you use here
> will already include a might_sleep() w/ CONFIG_DEBUG_ATOMIC_SLEEP.
> 

Thanks for letting me know that. I'll drop that checking code and 
might_sleep() too.

>> +		rt_mutex_trylock(&adapter->bus_lock);
>> +		if (!rc)
>> +			return -EAGAIN; /* PECI activity is ongoing */
>> +	} else {
>> +		rt_mutex_lock(&adapter->bus_lock);
>> +	}
>> +
>> +	if (do_retry)
>> +		start = ktime_get();
>> +
>> +	do {
>> +		rc = adapter->xfer(adapter, msg);
>> +
>> +		if (!do_retry)
>> +			break;
>> +
>> +		/* Per the PECI spec, need to retry commands that return 0x8x */
>> +		if (!(!rc && ((msg->rx_buf[0] & DEV_PECI_CC_RETRY_ERR_MASK) ==
>> +			      DEV_PECI_CC_TIMEOUT)))
>> +			break;
> 
> This is pretty difficult to parse.  Can you split it into two different
> conditions?
> 

Sure. I'll split it out.

>> +
>> +		/* Set the retry bit to indicate a retry attempt */
>> +		msg->tx_buf[1] |= DEV_PECI_RETRY_BIT;
> 
> Are you sure this bit is to be set in the _second_ byte of tx_buf?
> 

Yes, I'm pretty sure. The first byte contains a PECI command value and 
the second byte contains 'HostID[7:1] & Retry[0]' value.

>> +
>> +		/* Recalculate the AW FCS if it has one */
>> +		if (has_aw_fcs)
>> +			msg->tx_buf[msg->tx_len - 1] = 0x80 ^
>> +						peci_aw_fcs((u8 *)msg,
>> +							    2 + msg->tx_len);
>> +
>> +		/* Retry for at least 250ms before returning an error */
>> +		end = ktime_get();
>> +		elapsed_ms = ktime_to_ms(ktime_sub(end, start));
>> +		if (elapsed_ms >= DEV_PECI_RETRY_TIME_MS) {
>> +			dev_dbg(&adapter->dev, "Timeout retrying xfer!\n");
>> +			break;
>> +		}
>> +	} while (true);
>> +
>> +	rt_mutex_unlock(&adapter->bus_lock);
>> +
>> +	return rc;
>> +}
>> +
>> +static int peci_xfer(struct peci_adapter *adapter, struct peci_xfer_msg *msg)
>> +{
>> +	return peci_locked_xfer(adapter, msg, false, false);
>> +}
>> +
>> +static int peci_xfer_with_retries(struct peci_adapter *adapter,
>> +				  struct peci_xfer_msg *msg,
>> +				  bool has_aw_fcs)
>> +{
>> +	return peci_locked_xfer(adapter, msg, true, has_aw_fcs);
>> +}
>> +
>> +static int peci_scan_cmd_mask(struct peci_adapter *adapter)
>> +{
>> +	struct peci_xfer_msg msg;
>> +	u32 dib;
>> +	int rc = 0;
>> +
>> +	/* Update command mask just once */
>> +	if (adapter->cmd_mask & BIT(PECI_CMD_PING))
>> +		return 0;
>> +
>> +	msg.addr      = PECI_BASE_ADDR;
>> +	msg.tx_len    = GET_DIB_WR_LEN;
>> +	msg.rx_len    = GET_DIB_RD_LEN;
>> +	msg.tx_buf[0] = GET_DIB_PECI_CMD;
>> +
>> +	rc = peci_xfer(adapter, &msg);
>> +	if (rc < 0) {
>> +		dev_dbg(&adapter->dev, "PECI xfer error, rc : %d\n", rc);
>> +		return rc;
>> +	}
>> +
>> +	dib = msg.rx_buf[0] | (msg.rx_buf[1] << 8) |
>> +	      (msg.rx_buf[2] << 16) | (msg.rx_buf[3] << 24);
>> +
>> +	/* Check special case for Get DIB command */
>> +	if (dib == 0x00) {
>> +		dev_dbg(&adapter->dev, "DIB read as 0x00\n");
>> +		return -1;
>> +	}
>> +
>> +	if (!rc) {
> 
> You should change this to:
> 
> 	if (rc) {
> 		dev_dbg(&adapter->dev, "Error reading DIB, rc : %d\n", rc);
> 		return rc;
> 	}
> 
> And then leave the happy path below unindented.
> 

Agreed. That would be neater. Will rewrite it. Thanks!

>> +		/**
>> +		 * setting up the supporting commands based on minor rev#
>> +		 * see PECI Spec Table 3-1
>> +		 */
>> +		dib = (dib >> 8) & 0xF;
>> +
>> +		if (dib >= 0x1) {
>> +			adapter->cmd_mask |= BIT(PECI_CMD_RD_PKG_CFG);
>> +			adapter->cmd_mask |= BIT(PECI_CMD_WR_PKG_CFG);
>> +		}
>> +
>> +		if (dib >= 0x2)
>> +			adapter->cmd_mask |= BIT(PECI_CMD_RD_IA_MSR);
>> +
>> +		if (dib >= 0x3) {
>> +			adapter->cmd_mask |= BIT(PECI_CMD_RD_PCI_CFG_LOCAL);
>> +			adapter->cmd_mask |= BIT(PECI_CMD_WR_PCI_CFG_LOCAL);
>> +		}
>> +
>> +		if (dib >= 0x4)
>> +			adapter->cmd_mask |= BIT(PECI_CMD_RD_PCI_CFG);
>> +
>> +		if (dib >= 0x5)
>> +			adapter->cmd_mask |= BIT(PECI_CMD_WR_PCI_CFG);
>> +
>> +		if (dib >= 0x6)
>> +			adapter->cmd_mask |= BIT(PECI_CMD_WR_IA_MSR);
>> +
>> +		adapter->cmd_mask |= BIT(PECI_CMD_GET_TEMP);
>> +		adapter->cmd_mask |= BIT(PECI_CMD_GET_DIB);
>> +		adapter->cmd_mask |= BIT(PECI_CMD_PING);
> 
> These cmd_mask updates are not done with any locking in mind.  Is this
> intentional?  Or: is synchronization not necessary because this is
> always done during enumeration prior to exposing the adapter to users?
> 

Thanks for the pointing it out. This function should be done in a locked 
scope as you said. I'll adjust mutex protection scope to make that 
covers this function as well.

>> +	} else {
>> +		dev_dbg(&adapter->dev, "Error reading DIB, rc : %d\n", rc);
>> +	}
>> +
>> +	return rc;
>> +}
>> +
>> +static int peci_cmd_support(struct peci_adapter *adapter, enum peci_cmd cmd)
>> +{
>> +	if (!(adapter->cmd_mask & BIT(PECI_CMD_PING)) &&
>> +	    peci_scan_cmd_mask(adapter) < 0) {
>> +		dev_dbg(&adapter->dev, "Failed to scan command mask\n");
>> +		return -EIO;
>> +	}
>> +
>> +	if (!(adapter->cmd_mask & BIT(cmd))) {
>> +		dev_dbg(&adapter->dev, "Command %d is not supported\n", cmd);
>> +		return -EINVAL;
>> +	}
> 
> It would be nicer if you did this check prior to dispatching to the
> various subfunctions (peci_ioctl_ping, peci_ioctl_get_dib, etc.).  In
> that way, these functions could just assume the adapter supports them.
> 

Agreed. I'll drop all individual calls from subfunctions and will call 
it from peci_command().

> [..]
>> +static int peci_register_adapter(struct peci_adapter *adapter)
>> +{
>> +	int res = -EINVAL;
>> +
>> +	/* Can't register until after driver model init */
>> +	if (WARN_ON(!is_registered)) {
> 
> Is this solving a problem you actually ran into?
> 

Generally, an adapter driver registration will be happened after the 
PECI bus registration because peci_init uses postcore_initcall, but in 
case of incorrect implementation of an adapter driver which uses
a preceding postcore_initcall or a core_initcall as its module init, 
then an adapter registration would be prior to bus registration. This 
code is an exceptional case handling for that to warn the incorrect 
adapter driver implementation.

> [.. skipped review due to fatigue ..]
> 
>> +++ b/include/linux/peci.h
>> @@ -0,0 +1,97 @@
>> +// SPDX-License-Identifier: GPL-2.0
>> +// Copyright (c) 2018 Intel Corporation
>> +
>> +#ifndef __LINUX_PECI_H
>> +#define __LINUX_PECI_H
>> +
>> +#include <linux/cdev.h>
>> +#include <linux/device.h>
>> +#include <linux/peci-ioctl.h>
>> +#include <linux/rtmutex.h>
>> +
>> +#define PECI_BUFFER_SIZE  32
>> +#define PECI_NAME_SIZE    32
>> +
>> +struct peci_xfer_msg {
>> +	u8	addr;
>> +	u8	tx_len;
>> +	u8	rx_len;
>> +	u8	tx_buf[PECI_BUFFER_SIZE];
>> +	u8	rx_buf[PECI_BUFFER_SIZE];
>> +} __attribute__((__packed__));
> 
> The packed attribute has historically caused gcc to emit atrocious code,
> as it seems to assume packed implies members might not be naturally
> aligned.  Seeing as you're only working with u8s in this case, though,
> this shouldn't be a problem.
> 

It should be a packed struct because it is also being used for CRC8 
calculation which is treating it as a contiguous byte array.

>> +struct peci_board_info {
>> +	char			type[PECI_NAME_SIZE];
>> +	u8			addr;	/* CPU client address */
>> +	struct device_node	*of_node;
>> +};
>> +
>> +struct peci_adapter {
>> +	struct module	*owner;
>> +	struct rt_mutex	bus_lock;
> 
> Why an rt_mutex, instead of a regular mutex.  Do you explicitly need PI
> in mainline?
> 

Currently this implementation has only a temperature monitoring sideband 
feature but other sideband features such as CPU error detection and 
crash dump will be implemented later, and those additional sideband 
features should have higher priority than the temperature monitoring 
feature so it is the reason why I used an rt_mutex.

>> +	struct device	dev;
>> +	struct cdev	cdev;
>> +	int		nr;
>> +	char		name[PECI_NAME_SIZE];
>> +	int		(*xfer)(struct peci_adapter *adapter,
>> +				struct peci_xfer_msg *msg);
>> +	uint		cmd_mask;
>> +};
>> +
>> +#define to_peci_adapter(d) container_of(d, struct peci_adapter, dev)
> 
> You can also do this with a static inline, which provides a marginally
> better error when screwed up.
> 

Agreed. That would be more helpful for debugging in debug build. I'll 
rewrite the macro to a static inline like below:

static inline struct peci_adapter *to_peci_adapter(void *d)
{
	return container_of(d, struct peci_adapter, dev);
}

>     Julia
> 

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v2 2/8] [PATCH 2/8] Documentations: dt-bindings: Add a document of PECI adapter driver for Aspeed AST24xx/25xx SoCs
  2018-03-06 19:05     ` Jae Hyun Yoo
@ 2018-03-07 22:11       ` Pavel Machek
  2018-03-09 23:41       ` Milton Miller II
  1 sibling, 0 replies; 46+ messages in thread
From: Pavel Machek @ 2018-03-07 22:11 UTC (permalink / raw)
  To: Jae Hyun Yoo
  Cc: joel, andrew, arnd, gregkh, jdelvare, linux, benh, andrew,
	linux-kernel, linux-doc, devicetree, linux-hwmon,
	linux-arm-kernel, openbmc

[-- Attachment #1: Type: text/plain, Size: 888 bytes --]

Hi!

> >Are these SoCs x86-based?
> 
> Yes, these are ARM SoCs. Please see Andrew's answer as well.

Understood, thanks.

> >>+	Read sampling point selection. The whole period of a bit time will be
> >>+	divided into 16 time frames. This value will determine which time frame
> >>+	this controller will sample PECI signal for data read back. Usually in
> >>+	the middle of a bit time is the best.
> >
> >English? "This value will determine when this controller"?
> >
> 
> Could I change it like below?:
> 
> "This value will determine in which time frame this controller samples PECI
> signal for data read back"

I guess... I'm not native speaker, I guess this could be improved some
more.

Best regards,
									Pavel

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v2 2/8] [PATCH 2/8] Documentations: dt-bindings: Add a document of PECI adapter driver for Aspeed AST24xx/25xx SoCs
  2018-03-06 19:05     ` Jae Hyun Yoo
  2018-03-07 22:11       ` Pavel Machek
@ 2018-03-09 23:41       ` Milton Miller II
  2018-03-09 23:47         ` Jae Hyun Yoo
  1 sibling, 1 reply; 46+ messages in thread
From: Milton Miller II @ 2018-03-09 23:41 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Jae Hyun Yoo, linux-hwmon, andrew, jdelvare, arnd, linux-doc,
	andrew, gregkh, openbmc, linux-kernel, devicetree, linux,
	linux-arm-kernel

About  03/07/2018 04:12PM in some time zone, Pavel Machek wrote:
>Subject: Re: [PATCH v2 2/8] [PATCH 2/8] Documentations: dt-bindings:
>Add a document of PECI adapter driver for Aspeed AST24xx/25xx SoCs
>
>Hi!
>
>> >Are these SoCs x86-based?
>> 
>> Yes, these are ARM SoCs. Please see Andrew's answer as well.
>
>Understood, thanks.
>
>> >>+	Read sampling point selection. The whole period of a bit time
>will be
>> >>+	divided into 16 time frames. This value will determine which
>time frame
>> >>+	this controller will sample PECI signal for data read back.
>Usually in
>> >>+	the middle of a bit time is the best.
>> >
>> >English? "This value will determine when this controller"?
>> >
>> 
>> Could I change it like below?:
>> 
>> "This value will determine in which time frame this controller
>samples PECI
>> signal for data read back"
>
>I guess... I'm not native speaker, I guess this could be improved
>some
>more.
>

I agree this wording is still confusing. 

The problem is that the key subject, the time of the sampling, is in the descriptive clause "in which time frame".

"This value will determine the time frame in which the controller will sample"

or perhaps phrase it as saving a specific sample from the over-clock, or a phase of the clock.

>Best regards,
>									Pavel
>
>-- 
>(english) http://www.livejournal.com/~pavelmachek
>(cesky, pictures)
>http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
>

milton
--
Speaking for myself not IBM.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v2 2/8] [PATCH 2/8] Documentations: dt-bindings: Add a document of PECI adapter driver for Aspeed AST24xx/25xx SoCs
  2018-03-09 23:41       ` Milton Miller II
@ 2018-03-09 23:47         ` Jae Hyun Yoo
  0 siblings, 0 replies; 46+ messages in thread
From: Jae Hyun Yoo @ 2018-03-09 23:47 UTC (permalink / raw)
  To: Milton Miller II, Pavel Machek
  Cc: linux-hwmon, andrew, jdelvare, arnd, linux-doc, andrew, gregkh,
	openbmc, linux-kernel, devicetree, linux, linux-arm-kernel

Hi Milton,

Thanks for sharing your time to review this patch. Please see my answer 
inline.

Jae

On 3/9/2018 3:41 PM, Milton Miller II wrote:
> About  03/07/2018 04:12PM in some time zone, Pavel Machek wrote:
>> Subject: Re: [PATCH v2 2/8] [PATCH 2/8] Documentations: dt-bindings:
>> Add a document of PECI adapter driver for Aspeed AST24xx/25xx SoCs
>>
>> Hi!
>>
>>>> Are these SoCs x86-based?
>>>
>>> Yes, these are ARM SoCs. Please see Andrew's answer as well.
>>
>> Understood, thanks.
>>
>>>>> +	Read sampling point selection. The whole period of a bit time
>> will be
>>>>> +	divided into 16 time frames. This value will determine which
>> time frame
>>>>> +	this controller will sample PECI signal for data read back.
>> Usually in
>>>>> +	the middle of a bit time is the best.
>>>>
>>>> English? "This value will determine when this controller"?
>>>>
>>>
>>> Could I change it like below?:
>>>
>>> "This value will determine in which time frame this controller
>> samples PECI
>>> signal for data read back"
>>
>> I guess... I'm not native speaker, I guess this could be improved
>> some
>> more.
>>
> 
> I agree this wording is still confusing.
> 
> The problem is that the key subject, the time of the sampling, is in the descriptive clause "in which time frame".
> 
> "This value will determine the time frame in which the controller will sample"
> 
> or perhaps phrase it as saving a specific sample from the over-clock, or a phase of the clock.
> 

Yes, that looks more better. I'll change the wording as you suggested. 
Thanks a lot!

Jae

>> Best regards,
>> 									Pavel
>>
>> -- 
>> (english) http://www.livejournal.com/~pavelmachek
>> (cesky, pictures)
>> http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
>>
> 
> milton
> --
> Speaking for myself not IBM.
> 

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v2 7/8] [PATCH 7/8] drivers/hwmon: Add a generic PECI hwmon client driver
  2018-02-21 16:16 ` [PATCH v2 7/8] [PATCH 7/8] drivers/hwmon: Add a generic " Jae Hyun Yoo
  2018-02-21 18:26   ` Guenter Roeck
@ 2018-03-13  9:32   ` Stef van Os
  2018-03-13 18:56     ` Jae Hyun Yoo
  1 sibling, 1 reply; 46+ messages in thread
From: Stef van Os @ 2018-03-13  9:32 UTC (permalink / raw)
  To: Jae Hyun Yoo, joel, andrew, arnd, gregkh, jdelvare, linux, benh, andrew
  Cc: linux-hwmon, devicetree, linux-doc, openbmc, linux-kernel,
	linux-arm-kernel

Hi Jae,

I tried version 1 and 2 of your PECI patch on our (AST2500 / Xeon E5 v4) 
system. The V1 patchset works as expected (reading back temperature 0 
until PECI is up), but the hwmon driver probe fails with version 2. It 
communicates with the Xeon and assumes during kernel boot of the Aspeed 
that PECI to the Xeon's is already up and running, but our system 
enables the main Xeon supplies from AST2500 userspace.

If I load the hwmon driver as a module to load later on, the driver does 
not call probe like e.g. a I2C driver on the I2C bus does. Am I using V2 
wrongly?

BR,
Stef

On 02/21/2018 05:16 PM, Jae Hyun Yoo wrote:
> This commit adds a generic PECI hwmon client driver implementation.
> 
> Signed-off-by: Jae Hyun Yoo <jae.hyun.yoo@linux.intel.com>
> ---
>   drivers/hwmon/Kconfig      |  10 +
>   drivers/hwmon/Makefile     |   1 +
>   drivers/hwmon/peci-hwmon.c | 928 +++++++++++++++++++++++++++++++++++++++++++++
>   3 files changed, 939 insertions(+)
>   create mode 100644 drivers/hwmon/peci-hwmon.c
> 
> diff --git a/drivers/hwmon/Kconfig b/drivers/hwmon/Kconfig
> index ef23553ff5cb..f22e0c31f597 100644
> --- a/drivers/hwmon/Kconfig
> +++ b/drivers/hwmon/Kconfig
> @@ -1246,6 +1246,16 @@ config SENSORS_NCT7904
>   	  This driver can also be built as a module.  If so, the module
>   	  will be called nct7904.
>   
> +config SENSORS_PECI_HWMON
> +	tristate "PECI hwmon support"
> +	depends on PECI
> +	help
> +	  If you say yes here you get support for the generic PECI hwmon
> +	  driver.
> +
> +	  This driver can also be built as a module.  If so, the module
> +	  will be called peci-hwmon.
> +
>   config SENSORS_NSA320
>   	tristate "ZyXEL NSA320 and compatible fan speed and temperature sensors"
>   	depends on GPIOLIB && OF
> diff --git a/drivers/hwmon/Makefile b/drivers/hwmon/Makefile
> index f814b4ace138..946f54b168e5 100644
> --- a/drivers/hwmon/Makefile
> +++ b/drivers/hwmon/Makefile
> @@ -135,6 +135,7 @@ obj-$(CONFIG_SENSORS_NCT7802)	+= nct7802.o
>   obj-$(CONFIG_SENSORS_NCT7904)	+= nct7904.o
>   obj-$(CONFIG_SENSORS_NSA320)	+= nsa320-hwmon.o
>   obj-$(CONFIG_SENSORS_NTC_THERMISTOR)	+= ntc_thermistor.o
> +obj-$(CONFIG_SENSORS_PECI_HWMON)	+= peci-hwmon.o
>   obj-$(CONFIG_SENSORS_PC87360)	+= pc87360.o
>   obj-$(CONFIG_SENSORS_PC87427)	+= pc87427.o
>   obj-$(CONFIG_SENSORS_PCF8591)	+= pcf8591.o
> diff --git a/drivers/hwmon/peci-hwmon.c b/drivers/hwmon/peci-hwmon.c
> new file mode 100644
> index 000000000000..edd27744adcb
> --- /dev/null
> +++ b/drivers/hwmon/peci-hwmon.c
> @@ -0,0 +1,928 @@
> +// SPDX-License-Identifier: GPL-2.0
> +// Copyright (c) 2018 Intel Corporation
> +
> +#include <linux/delay.h>
> +#include <linux/hwmon.h>
> +#include <linux/hwmon-sysfs.h>
> +#include <linux/jiffies.h>
> +#include <linux/module.h>
> +#include <linux/of_device.h>
> +#include <linux/peci.h>
> +#include <linux/workqueue.h>
> +
> +#define DIMM_SLOT_NUMS_MAX    12  /* Max DIMM numbers (channel ranks x 2) */
> +#define CORE_NUMS_MAX         28  /* Max core numbers (max on SKX Platinum) */
> +#define TEMP_TYPE_PECI        6   /* Sensor type 6: Intel PECI */
> +
> +#define CORE_TEMP_ATTRS       5
> +#define DIMM_TEMP_ATTRS       2
> +#define ATTR_NAME_LEN         24
> +
> +#define DEFAULT_ATTR_GRP_NUMS 5
> +
> +#define UPDATE_INTERVAL_MIN   HZ
> +#define DIMM_MASK_CHECK_DELAY msecs_to_jiffies(5000)
> +
> +enum sign {
> +	POS,
> +	NEG
> +};
> +
> +struct temp_data {
> +	bool valid;
> +	s32  value;
> +	unsigned long last_updated;
> +};
> +
> +struct temp_group {
> +	struct temp_data tjmax;
> +	struct temp_data tcontrol;
> +	struct temp_data tthrottle;
> +	struct temp_data dts_margin;
> +	struct temp_data die;
> +	struct temp_data core[CORE_NUMS_MAX];
> +	struct temp_data dimm[DIMM_SLOT_NUMS_MAX];
> +};
> +
> +struct core_temp_group {
> +	struct sensor_device_attribute sd_attrs[CORE_TEMP_ATTRS];
> +	char attr_name[CORE_TEMP_ATTRS][ATTR_NAME_LEN];
> +	struct attribute *attrs[CORE_TEMP_ATTRS + 1];
> +	struct attribute_group attr_group;
> +};
> +
> +struct dimm_temp_group {
> +	struct sensor_device_attribute sd_attrs[DIMM_TEMP_ATTRS];
> +	char attr_name[DIMM_TEMP_ATTRS][ATTR_NAME_LEN];
> +	struct attribute *attrs[DIMM_TEMP_ATTRS + 1];
> +	struct attribute_group attr_group;
> +};
> +
> +struct peci_hwmon {
> +	struct peci_client *client;
> +	struct device *dev;
> +	struct device *hwmon_dev;
> +	struct workqueue_struct *work_queue;
> +	struct delayed_work work_handler;
> +	char name[PECI_NAME_SIZE];
> +	struct temp_group temp;
> +	u8 addr;
> +	uint cpu_no;
> +	u32 core_mask;
> +	u32 dimm_mask;
> +	const struct attribute_group *core_attr_groups[CORE_NUMS_MAX + 1];
> +	const struct attribute_group *dimm_attr_groups[DIMM_SLOT_NUMS_MAX + 1];
> +	uint global_idx;
> +	uint core_idx;
> +	uint dimm_idx;
> +};
> +
> +enum label {
> +	L_DIE,
> +	L_DTS,
> +	L_TCONTROL,
> +	L_TTHROTTLE,
> +	L_TJMAX,
> +	L_MAX
> +};
> +
> +static const char *peci_label[L_MAX] = {
> +	"Die\n",
> +	"DTS margin to Tcontrol\n",
> +	"Tcontrol\n",
> +	"Tthrottle\n",
> +	"Tjmax\n",
> +};
> +
> +static int send_peci_cmd(struct peci_hwmon *priv, enum peci_cmd cmd, void *msg)
> +{
> +	return peci_command(priv->client->adapter, cmd, msg);
> +}
> +
> +static int need_update(struct temp_data *temp)
> +{
> +	if (temp->valid &&
> +	    time_before(jiffies, temp->last_updated + UPDATE_INTERVAL_MIN))
> +		return 0;
> +
> +	return 1;
> +}
> +
> +static s32 ten_dot_six_to_millidegree(s32 x)
> +{
> +	return ((((x) ^ 0x8000) - 0x8000) * 1000 / 64);
> +}
> +
> +static int get_tjmax(struct peci_hwmon *priv)
> +{
> +	struct peci_rd_pkg_cfg_msg msg;
> +	int rc;
> +
> +	if (!priv->temp.tjmax.valid) {
> +		msg.addr = priv->addr;
> +		msg.index = MBX_INDEX_TEMP_TARGET;
> +		msg.param = 0;
> +		msg.rx_len = 4;
> +
> +		rc = send_peci_cmd(priv, PECI_CMD_RD_PKG_CFG, (void *)&msg);
> +		if (rc < 0)
> +			return rc;
> +
> +		priv->temp.tjmax.value = (s32)msg.pkg_config[2] * 1000;
> +		priv->temp.tjmax.valid = true;
> +	}
> +
> +	return 0;
> +}
> +
> +static int get_tcontrol(struct peci_hwmon *priv)
> +{
> +	struct peci_rd_pkg_cfg_msg msg;
> +	s32 tcontrol_margin;
> +	int rc;
> +
> +	if (!need_update(&priv->temp.tcontrol))
> +		return 0;
> +
> +	rc = get_tjmax(priv);
> +	if (rc < 0)
> +		return rc;
> +
> +	msg.addr = priv->addr;
> +	msg.index = MBX_INDEX_TEMP_TARGET;
> +	msg.param = 0;
> +	msg.rx_len = 4;
> +
> +	rc = send_peci_cmd(priv, PECI_CMD_RD_PKG_CFG, (void *)&msg);
> +	if (rc < 0)
> +		return rc;
> +
> +	tcontrol_margin = msg.pkg_config[1];
> +	tcontrol_margin = ((tcontrol_margin ^ 0x80) - 0x80) * 1000;
> +
> +	priv->temp.tcontrol.value = priv->temp.tjmax.value - tcontrol_margin;
> +
> +	if (!priv->temp.tcontrol.valid) {
> +		priv->temp.tcontrol.last_updated = INITIAL_JIFFIES;
> +		priv->temp.tcontrol.valid = true;
> +	} else {
> +		priv->temp.tcontrol.last_updated = jiffies;
> +	}
> +
> +	return 0;
> +}
> +
> +static int get_tthrottle(struct peci_hwmon *priv)
> +{
> +	struct peci_rd_pkg_cfg_msg msg;
> +	s32 tthrottle_offset;
> +	int rc;
> +
> +	if (!need_update(&priv->temp.tthrottle))
> +		return 0;
> +
> +	rc = get_tjmax(priv);
> +	if (rc < 0)
> +		return rc;
> +
> +	msg.addr = priv->addr;
> +	msg.index = MBX_INDEX_TEMP_TARGET;
> +	msg.param = 0;
> +	msg.rx_len = 4;
> +
> +	rc = send_peci_cmd(priv, PECI_CMD_RD_PKG_CFG, (void *)&msg);
> +	if (rc < 0)
> +		return rc;
> +
> +	tthrottle_offset = (msg.pkg_config[3] & 0x2f) * 1000;
> +	priv->temp.tthrottle.value = priv->temp.tjmax.value - tthrottle_offset;
> +
> +	if (!priv->temp.tthrottle.valid) {
> +		priv->temp.tthrottle.last_updated = INITIAL_JIFFIES;
> +		priv->temp.tthrottle.valid = true;
> +	} else {
> +		priv->temp.tthrottle.last_updated = jiffies;
> +	}
> +
> +	return 0;
> +}
> +
> +static int get_die_temp(struct peci_hwmon *priv)
> +{
> +	struct peci_get_temp_msg msg;
> +	int rc;
> +
> +	if (!need_update(&priv->temp.die))
> +		return 0;
> +
> +	rc = get_tjmax(priv);
> +	if (rc < 0)
> +		return rc;
> +
> +	msg.addr = priv->addr;
> +
> +	rc = send_peci_cmd(priv, PECI_CMD_GET_TEMP, (void *)&msg);
> +	if (rc < 0)
> +		return rc;
> +
> +	priv->temp.die.value = priv->temp.tjmax.value +
> +			       ((s32)msg.temp_raw * 1000 / 64);
> +
> +	if (!priv->temp.die.valid) {
> +		priv->temp.die.last_updated = INITIAL_JIFFIES;
> +		priv->temp.die.valid = true;
> +	} else {
> +		priv->temp.die.last_updated = jiffies;
> +	}
> +
> +	return 0;
> +}
> +
> +static int get_dts_margin(struct peci_hwmon *priv)
> +{
> +	struct peci_rd_pkg_cfg_msg msg;
> +	s32 dts_margin;
> +	int rc;
> +
> +	if (!need_update(&priv->temp.dts_margin))
> +		return 0;
> +
> +	msg.addr = priv->addr;
> +	msg.index = MBX_INDEX_DTS_MARGIN;
> +	msg.param = 0;
> +	msg.rx_len = 4;
> +
> +	rc = send_peci_cmd(priv, PECI_CMD_RD_PKG_CFG, (void *)&msg);
> +	if (rc < 0)
> +		return rc;
> +
> +	dts_margin = (msg.pkg_config[1] << 8) | msg.pkg_config[0];
> +
> +	/**
> +	 * Processors return a value of DTS reading in 10.6 format
> +	 * (10 bits signed decimal, 6 bits fractional).
> +	 * Error codes:
> +	 *   0x8000: General sensor error
> +	 *   0x8001: Reserved
> +	 *   0x8002: Underflow on reading value
> +	 *   0x8003-0x81ff: Reserved
> +	 */
> +	if (dts_margin >= 0x8000 && dts_margin <= 0x81ff)
> +		return -1;
> +
> +	dts_margin = ten_dot_six_to_millidegree(dts_margin);
> +
> +	priv->temp.dts_margin.value = dts_margin;
> +
> +	if (!priv->temp.dts_margin.valid) {
> +		priv->temp.dts_margin.last_updated = INITIAL_JIFFIES;
> +		priv->temp.dts_margin.valid = true;
> +	} else {
> +		priv->temp.dts_margin.last_updated = jiffies;
> +	}
> +
> +	return 0;
> +}
> +
> +static int get_core_temp(struct peci_hwmon *priv, int core_index)
> +{
> +	struct peci_rd_pkg_cfg_msg msg;
> +	s32 core_dts_margin;
> +	int rc;
> +
> +	if (!need_update(&priv->temp.core[core_index]))
> +		return 0;
> +
> +	rc = get_tjmax(priv);
> +	if (rc < 0)
> +		return rc;
> +
> +	msg.addr = priv->addr;
> +	msg.index = MBX_INDEX_PER_CORE_DTS_TEMP;
> +	msg.param = core_index;
> +	msg.rx_len = 4;
> +
> +	rc = send_peci_cmd(priv, PECI_CMD_RD_PKG_CFG, (void *)&msg);
> +	if (rc < 0)
> +		return rc;
> +
> +	core_dts_margin = (msg.pkg_config[1] << 8) | msg.pkg_config[0];
> +
> +	/**
> +	 * Processors return a value of the core DTS reading in 10.6 format
> +	 * (10 bits signed decimal, 6 bits fractional).
> +	 * Error codes:
> +	 *   0x8000: General sensor error
> +	 *   0x8001: Reserved
> +	 *   0x8002: Underflow on reading value
> +	 *   0x8003-0x81ff: Reserved
> +	 */
> +	if (core_dts_margin >= 0x8000 && core_dts_margin <= 0x81ff)
> +		return -1;
> +
> +	core_dts_margin = ten_dot_six_to_millidegree(core_dts_margin);
> +
> +	priv->temp.core[core_index].value = priv->temp.tjmax.value +
> +					    core_dts_margin;
> +
> +	if (!priv->temp.core[core_index].valid) {
> +		priv->temp.core[core_index].last_updated = INITIAL_JIFFIES;
> +		priv->temp.core[core_index].valid = true;
> +	} else {
> +		priv->temp.core[core_index].last_updated = jiffies;
> +	}
> +
> +	return 0;
> +}
> +
> +static int get_dimm_temp(struct peci_hwmon *priv, int dimm_index)
> +{
> +	struct peci_rd_pkg_cfg_msg msg;
> +	int channel = dimm_index / 2;
> +	int dimm_order = dimm_index % 2;
> +	int rc;
> +
> +	if (!need_update(&priv->temp.dimm[dimm_index]))
> +		return 0;
> +
> +	msg.addr = priv->addr;
> +	msg.index = MBX_INDEX_DDR_DIMM_TEMP;
> +	msg.param = channel;
> +	msg.rx_len = 4;
> +
> +	rc = send_peci_cmd(priv, PECI_CMD_RD_PKG_CFG, (void *)&msg);
> +	if (rc < 0)
> +		return rc;
> +
> +	priv->temp.dimm[dimm_index].value = msg.pkg_config[dimm_order] * 1000;
> +
> +	if (!priv->temp.dimm[dimm_index].valid) {
> +		priv->temp.dimm[dimm_index].last_updated = INITIAL_JIFFIES;
> +		priv->temp.dimm[dimm_index].valid = true;
> +	} else {
> +		priv->temp.dimm[dimm_index].last_updated = jiffies;
> +	}
> +
> +	return 0;
> +}
> +
> +static ssize_t show_tcontrol(struct device *dev,
> +			     struct device_attribute *attr,
> +			     char *buf)
> +{
> +	struct peci_hwmon *priv = dev_get_drvdata(dev);
> +	int rc;
> +
> +	rc = get_tcontrol(priv);
> +	if (rc < 0)
> +		return rc;
> +
> +	return sprintf(buf, "%d\n", priv->temp.tcontrol.value);
> +}
> +
> +static ssize_t show_tcontrol_margin(struct device *dev,
> +				    struct device_attribute *attr,
> +				    char *buf)
> +{
> +	struct peci_hwmon *priv = dev_get_drvdata(dev);
> +	struct sensor_device_attribute *sensor_attr = to_sensor_dev_attr(attr);
> +	int rc;
> +
> +	rc = get_tcontrol(priv);
> +	if (rc < 0)
> +		return rc;
> +
> +	return sprintf(buf, "%d\n", sensor_attr->index == POS ?
> +				    priv->temp.tjmax.value -
> +				    priv->temp.tcontrol.value :
> +				    priv->temp.tcontrol.value -
> +				    priv->temp.tjmax.value);
> +}
> +
> +static ssize_t show_tthrottle(struct device *dev,
> +			      struct device_attribute *attr,
> +			      char *buf)
> +{
> +	struct peci_hwmon *priv = dev_get_drvdata(dev);
> +	int rc;
> +
> +	rc = get_tthrottle(priv);
> +	if (rc < 0)
> +		return rc;
> +
> +	return sprintf(buf, "%d\n", priv->temp.tthrottle.value);
> +}
> +
> +static ssize_t show_tjmax(struct device *dev,
> +			  struct device_attribute *attr,
> +			  char *buf)
> +{
> +	struct peci_hwmon *priv = dev_get_drvdata(dev);
> +	int rc;
> +
> +	rc = get_tjmax(priv);
> +	if (rc < 0)
> +		return rc;
> +
> +	return sprintf(buf, "%d\n", priv->temp.tjmax.value);
> +}
> +
> +static ssize_t show_die_temp(struct device *dev,
> +			     struct device_attribute *attr,
> +			     char *buf)
> +{
> +	struct peci_hwmon *priv = dev_get_drvdata(dev);
> +	int rc;
> +
> +	rc = get_die_temp(priv);
> +	if (rc < 0)
> +		return rc;
> +
> +	return sprintf(buf, "%d\n", priv->temp.die.value);
> +}
> +
> +static ssize_t show_dts_margin(struct device *dev,
> +			       struct device_attribute *attr,
> +			       char *buf)
> +{
> +	struct peci_hwmon *priv = dev_get_drvdata(dev);
> +	int rc;
> +
> +	rc = get_dts_margin(priv);
> +	if (rc < 0)
> +		return rc;
> +
> +	return sprintf(buf, "%d\n", priv->temp.dts_margin.value);
> +}
> +
> +static ssize_t show_core_temp(struct device *dev,
> +			      struct device_attribute *attr,
> +			      char *buf)
> +{
> +	struct peci_hwmon *priv = dev_get_drvdata(dev);
> +	struct sensor_device_attribute *sensor_attr = to_sensor_dev_attr(attr);
> +	int core_index = sensor_attr->index;
> +	int rc;
> +
> +	rc = get_core_temp(priv, core_index);
> +	if (rc < 0)
> +		return rc;
> +
> +	return sprintf(buf, "%d\n", priv->temp.core[core_index].value);
> +}
> +
> +static ssize_t show_dimm_temp(struct device *dev,
> +			      struct device_attribute *attr,
> +			      char *buf)
> +{
> +	struct peci_hwmon *priv = dev_get_drvdata(dev);
> +	struct sensor_device_attribute *sensor_attr = to_sensor_dev_attr(attr);
> +	int dimm_index = sensor_attr->index;
> +	int rc;
> +
> +	rc = get_dimm_temp(priv, dimm_index);
> +	if (rc < 0)
> +		return rc;
> +
> +	return sprintf(buf, "%d\n", priv->temp.dimm[dimm_index].value);
> +}
> +
> +static ssize_t show_value(struct device *dev,
> +			  struct device_attribute *attr,
> +			  char *buf)
> +{
> +	struct sensor_device_attribute *sensor_attr = to_sensor_dev_attr(attr);
> +
> +	return sprintf(buf, "%d\n", sensor_attr->index);
> +}
> +
> +static ssize_t show_label(struct device *dev,
> +			  struct device_attribute *attr,
> +			  char *buf)
> +{
> +	struct sensor_device_attribute *sensor_attr = to_sensor_dev_attr(attr);
> +
> +	return sprintf(buf, peci_label[sensor_attr->index]);
> +}
> +
> +static ssize_t show_core_label(struct device *dev,
> +			       struct device_attribute *attr,
> +			       char *buf)
> +{
> +	struct sensor_device_attribute *sensor_attr = to_sensor_dev_attr(attr);
> +
> +	return sprintf(buf, "Core %d\n", sensor_attr->index);
> +}
> +
> +static ssize_t show_dimm_label(struct device *dev,
> +			       struct device_attribute *attr,
> +			       char *buf)
> +{
> +	struct sensor_device_attribute *sensor_attr = to_sensor_dev_attr(attr);
> +
> +	char channel = 'A' + (sensor_attr->index / 2);
> +	int index = sensor_attr->index % 2;
> +
> +	return sprintf(buf, "DIMM %d (%c%d)\n",
> +		       sensor_attr->index, channel, index);
> +}
> +
> +/* Die temperature */
> +static SENSOR_DEVICE_ATTR(temp1_label, 0444, show_label, NULL, L_DIE);
> +static SENSOR_DEVICE_ATTR(temp1_input, 0444, show_die_temp, NULL, 0);
> +static SENSOR_DEVICE_ATTR(temp1_max, 0444, show_tcontrol, NULL, 0);
> +static SENSOR_DEVICE_ATTR(temp1_crit, 0444, show_tjmax, NULL, 0);
> +static SENSOR_DEVICE_ATTR(temp1_crit_hyst, 0444, show_tcontrol_margin, NULL,
> +			  POS);
> +
> +static struct attribute *die_temp_attrs[] = {
> +	&sensor_dev_attr_temp1_label.dev_attr.attr,
> +	&sensor_dev_attr_temp1_input.dev_attr.attr,
> +	&sensor_dev_attr_temp1_max.dev_attr.attr,
> +	&sensor_dev_attr_temp1_crit.dev_attr.attr,
> +	&sensor_dev_attr_temp1_crit_hyst.dev_attr.attr,
> +	NULL
> +};
> +
> +static struct attribute_group die_temp_attr_group = {
> +	.attrs = die_temp_attrs,
> +};
> +
> +/* DTS margin temperature */
> +static SENSOR_DEVICE_ATTR(temp2_label, 0444, show_label, NULL, L_DTS);
> +static SENSOR_DEVICE_ATTR(temp2_input, 0444, show_dts_margin, NULL, 0);
> +static SENSOR_DEVICE_ATTR(temp2_min, 0444, show_value, NULL, 0);
> +static SENSOR_DEVICE_ATTR(temp2_lcrit, 0444, show_tcontrol_margin, NULL, NEG);
> +
> +static struct attribute *dts_margin_temp_attrs[] = {
> +	&sensor_dev_attr_temp2_label.dev_attr.attr,
> +	&sensor_dev_attr_temp2_input.dev_attr.attr,
> +	&sensor_dev_attr_temp2_min.dev_attr.attr,
> +	&sensor_dev_attr_temp2_lcrit.dev_attr.attr,
> +	NULL
> +};
> +
> +static struct attribute_group dts_margin_temp_attr_group = {
> +	.attrs = dts_margin_temp_attrs,
> +};
> +
> +/* Tcontrol temperature */
> +static SENSOR_DEVICE_ATTR(temp3_label, 0444, show_label, NULL, L_TCONTROL);
> +static SENSOR_DEVICE_ATTR(temp3_input, 0444, show_tcontrol, NULL, 0);
> +static SENSOR_DEVICE_ATTR(temp3_crit, 0444, show_tjmax, NULL, 0);
> +
> +static struct attribute *tcontrol_temp_attrs[] = {
> +	&sensor_dev_attr_temp3_label.dev_attr.attr,
> +	&sensor_dev_attr_temp3_input.dev_attr.attr,
> +	&sensor_dev_attr_temp3_crit.dev_attr.attr,
> +	NULL
> +};
> +
> +static struct attribute_group tcontrol_temp_attr_group = {
> +	.attrs = tcontrol_temp_attrs,
> +};
> +
> +/* Tthrottle temperature */
> +static SENSOR_DEVICE_ATTR(temp4_label, 0444, show_label, NULL, L_TTHROTTLE);
> +static SENSOR_DEVICE_ATTR(temp4_input, 0444, show_tthrottle, NULL, 0);
> +
> +static struct attribute *tthrottle_temp_attrs[] = {
> +	&sensor_dev_attr_temp4_label.dev_attr.attr,
> +	&sensor_dev_attr_temp4_input.dev_attr.attr,
> +	NULL
> +};
> +
> +static struct attribute_group tthrottle_temp_attr_group = {
> +	.attrs = tthrottle_temp_attrs,
> +};
> +
> +/* Tjmax temperature */
> +static SENSOR_DEVICE_ATTR(temp5_label, 0444, show_label, NULL, L_TJMAX);
> +static SENSOR_DEVICE_ATTR(temp5_input, 0444, show_tjmax, NULL, 0);
> +
> +static struct attribute *tjmax_temp_attrs[] = {
> +	&sensor_dev_attr_temp5_label.dev_attr.attr,
> +	&sensor_dev_attr_temp5_input.dev_attr.attr,
> +	NULL
> +};
> +
> +static struct attribute_group tjmax_temp_attr_group = {
> +	.attrs = tjmax_temp_attrs,
> +};
> +
> +static const struct attribute_group *
> +default_attr_groups[DEFAULT_ATTR_GRP_NUMS + 1] = {
> +	&die_temp_attr_group,
> +	&dts_margin_temp_attr_group,
> +	&tcontrol_temp_attr_group,
> +	&tthrottle_temp_attr_group,
> +	&tjmax_temp_attr_group,
> +	NULL
> +};
> +
> +/* Core temperature */
> +static ssize_t (*const core_show_fn[CORE_TEMP_ATTRS]) (struct device *dev,
> +		struct device_attribute *devattr, char *buf) = {
> +	show_core_label,
> +	show_core_temp,
> +	show_tcontrol,
> +	show_tjmax,
> +	show_tcontrol_margin,
> +};
> +
> +static const char *const core_suffix[CORE_TEMP_ATTRS] = {
> +	"label",
> +	"input",
> +	"max",
> +	"crit",
> +	"crit_hyst",
> +};
> +
> +static int check_resolved_cores(struct peci_hwmon *priv)
> +{
> +	struct peci_rd_pci_cfg_local_msg msg;
> +	int rc;
> +
> +	if (!(priv->client->adapter->cmd_mask & BIT(PECI_CMD_RD_PCI_CFG_LOCAL)))
> +		return -EINVAL;
> +
> +	/* Get the RESOLVED_CORES register value */
> +	msg.addr = priv->addr;
> +	msg.bus = 1;
> +	msg.device = 30;
> +	msg.function = 3;
> +	msg.reg = 0xB4;
> +	msg.rx_len = 4;
> +
> +	rc = send_peci_cmd(priv, PECI_CMD_RD_PCI_CFG_LOCAL, (void *)&msg);
> +	if (rc < 0)
> +		return rc;
> +
> +	priv->core_mask = msg.pci_config[3] << 24 |
> +			  msg.pci_config[2] << 16 |
> +			  msg.pci_config[1] << 8 |
> +			  msg.pci_config[0];
> +
> +	if (!priv->core_mask)
> +		return -EAGAIN;
> +
> +	dev_dbg(priv->dev, "Scanned resolved cores: 0x%x\n", priv->core_mask);
> +	return 0;
> +}
> +
> +static int create_core_temp_group(struct peci_hwmon *priv, int core_no)
> +{
> +	struct core_temp_group *data;
> +	int i;
> +
> +	data = devm_kzalloc(priv->dev, sizeof(struct core_temp_group),
> +			    GFP_KERNEL);
> +	if (!data)
> +		return -ENOMEM;
> +
> +	for (i = 0; i < CORE_TEMP_ATTRS; i++) {
> +		snprintf(data->attr_name[i], ATTR_NAME_LEN,
> +			 "temp%d_%s", priv->global_idx, core_suffix[i]);
> +		sysfs_attr_init(&data->sd_attrs[i].dev_attr.attr);
> +		data->sd_attrs[i].dev_attr.attr.name = data->attr_name[i];
> +		data->sd_attrs[i].dev_attr.attr.mode = 0444;
> +		data->sd_attrs[i].dev_attr.show = core_show_fn[i];
> +		if (i == 0 || i == 1) /* label or temp */
> +			data->sd_attrs[i].index = core_no;
> +		data->attrs[i] = &data->sd_attrs[i].dev_attr.attr;
> +	}
> +
> +	data->attr_group.attrs = data->attrs;
> +	priv->core_attr_groups[priv->core_idx++] = &data->attr_group;
> +	priv->global_idx++;
> +
> +	return 0;
> +}
> +
> +static int create_core_temp_groups(struct peci_hwmon *priv)
> +{
> +	int rc, i;
> +
> +	rc = check_resolved_cores(priv);
> +	if (!rc) {
> +		for (i = 0; i < CORE_NUMS_MAX; i++) {
> +			if (priv->core_mask & BIT(i)) {
> +				rc = create_core_temp_group(priv, i);
> +				if (rc)
> +					return rc;
> +			}
> +		}
> +
> +		rc = sysfs_create_groups(&priv->hwmon_dev->kobj,
> +					 priv->core_attr_groups);
> +	}
> +
> +	return rc;
> +}
> +
> +/* DIMM temperature */
> +static ssize_t (*const dimm_show_fn[DIMM_TEMP_ATTRS]) (struct device *dev,
> +		struct device_attribute *devattr, char *buf) = {
> +	show_dimm_label,
> +	show_dimm_temp,
> +};
> +
> +static const char *const dimm_suffix[DIMM_TEMP_ATTRS] = {
> +	"label",
> +	"input",
> +};
> +
> +static int check_populated_dimms(struct peci_hwmon *priv)
> +{
> +	struct peci_rd_pkg_cfg_msg msg;
> +	int i, rc, pass = 0;
> +
> +do_scan:
> +	for (i = 0; i < (DIMM_SLOT_NUMS_MAX / 2); i++) {
> +		msg.addr = priv->addr;
> +		msg.index = MBX_INDEX_DDR_DIMM_TEMP;
> +		msg.param = i; /* channel */
> +		msg.rx_len = 4;
> +
> +		rc = send_peci_cmd(priv, PECI_CMD_RD_PKG_CFG, (void *)&msg);
> +		if (rc < 0)
> +			return rc;
> +
> +		if (msg.pkg_config[0]) /* DIMM #0 on the channel */
> +			priv->dimm_mask |= BIT(i);
> +
> +		if (msg.pkg_config[1]) /* DIMM #1 on the channel */
> +			priv->dimm_mask |= BIT(i + 1);
> +	}
> +
> +	/* Do 2-pass scanning */
> +	if (priv->dimm_mask && pass == 0) {
> +		pass++;
> +		goto do_scan;
> +	}
> +
> +	if (!priv->dimm_mask)
> +		return -EAGAIN;
> +
> +	dev_dbg(priv->dev, "Scanned populated DIMMs: 0x%x\n", priv->dimm_mask);
> +	return 0;
> +}
> +
> +static int create_dimm_temp_group(struct peci_hwmon *priv, int dimm_no)
> +{
> +	struct dimm_temp_group *data;
> +	int i;
> +
> +	data = devm_kzalloc(priv->dev, sizeof(struct dimm_temp_group),
> +			    GFP_KERNEL);
> +	if (!data)
> +		return -ENOMEM;
> +
> +	for (i = 0; i < DIMM_TEMP_ATTRS; i++) {
> +		snprintf(data->attr_name[i], ATTR_NAME_LEN,
> +			 "temp%d_%s", priv->global_idx, dimm_suffix[i]);
> +		sysfs_attr_init(&data->sd_attrs[i].dev_attr.attr);
> +		data->sd_attrs[i].dev_attr.attr.name = data->attr_name[i];
> +		data->sd_attrs[i].dev_attr.attr.mode = 0444;
> +		data->sd_attrs[i].dev_attr.show = dimm_show_fn[i];
> +		data->sd_attrs[i].index = dimm_no;
> +		data->attrs[i] = &data->sd_attrs[i].dev_attr.attr;
> +	}
> +
> +	data->attr_group.attrs = data->attrs;
> +	priv->dimm_attr_groups[priv->dimm_idx++] = &data->attr_group;
> +	priv->global_idx++;
> +
> +	return 0;
> +}
> +
> +static int create_dimm_temp_groups(struct peci_hwmon *priv)
> +{
> +	int rc, i;
> +
> +	rc = check_populated_dimms(priv);
> +	if (!rc) {
> +		for (i = 0; i < DIMM_SLOT_NUMS_MAX; i++) {
> +			if (priv->dimm_mask & BIT(i)) {
> +				rc = create_dimm_temp_group(priv, i);
> +				if (rc)
> +					return rc;
> +			}
> +		}
> +
> +		rc = sysfs_create_groups(&priv->hwmon_dev->kobj,
> +					 priv->dimm_attr_groups);
> +		if (!rc)
> +			dev_dbg(priv->dev, "Done DIMM temp group creation\n");
> +	} else if (rc == -EAGAIN) {
> +		queue_delayed_work(priv->work_queue, &priv->work_handler,
> +				   DIMM_MASK_CHECK_DELAY);
> +		dev_dbg(priv->dev, "Diferred DIMM temp group creation\n");
> +	}
> +
> +	return rc;
> +}
> +
> +static void create_dimm_temp_groups_delayed(struct work_struct *work)
> +{
> +	struct delayed_work *dwork = to_delayed_work(work);
> +	struct peci_hwmon *priv = container_of(dwork, struct peci_hwmon,
> +					       work_handler);
> +	int rc;
> +
> +	rc = create_dimm_temp_groups(priv);
> +	if (rc && rc != -EAGAIN)
> +		dev_dbg(priv->dev, "Skipped to creat DIMM temp groups\n");
> +}
> +
> +static int peci_hwmon_probe(struct peci_client *client)
> +{
> +	struct device *dev = &client->dev;
> +	struct peci_hwmon *priv;
> +	int rc;
> +
> +	if ((client->adapter->cmd_mask &
> +	    (BIT(PECI_CMD_GET_TEMP) | BIT(PECI_CMD_RD_PKG_CFG))) !=
> +	    (BIT(PECI_CMD_GET_TEMP) | BIT(PECI_CMD_RD_PKG_CFG))) {
> +		dev_err(dev, "Client doesn't support temperature monitoring\n");
> +		return -EINVAL;
> +	}
> +
> +	priv = devm_kzalloc(dev, sizeof(*priv), GFP_KERNEL);
> +	if (!priv)
> +		return -ENOMEM;
> +
> +	dev_set_drvdata(dev, priv);
> +	priv->client = client;
> +	priv->dev = dev;
> +	priv->addr = client->addr;
> +	priv->cpu_no = priv->addr - PECI_BASE_ADDR;
> +
> +	snprintf(priv->name, PECI_NAME_SIZE, "peci_hwmon.cpu%d", priv->cpu_no);
> +
> +	priv->work_queue = create_singlethread_workqueue(priv->name);
> +	if (!priv->work_queue)
> +		return -ENOMEM;
> +
> +	priv->hwmon_dev = hwmon_device_register_with_groups(priv->dev,
> +							    priv->name,
> +							    priv,
> +							   default_attr_groups);
> +
> +	rc = PTR_ERR_OR_ZERO(priv->hwmon_dev);
> +	if (rc) {
> +		dev_err(dev, "Failed to register peci hwmon\n");
> +		return rc;
> +	}
> +
> +	priv->global_idx = DEFAULT_ATTR_GRP_NUMS + 1;
> +
> +	rc = create_core_temp_groups(priv);
> +	if (rc) {
> +		dev_err(dev, "Failed to create core groups\n");
> +		return rc;
> +	}
> +
> +	INIT_DELAYED_WORK(&priv->work_handler, create_dimm_temp_groups_delayed);
> +
> +	rc = create_dimm_temp_groups(priv);
> +	if (rc && rc != -EAGAIN)
> +		dev_dbg(dev, "Skipped to creat DIMM temp groups\n");
> +
> +	dev_dbg(dev, "peci hwmon for CPU at 0x%x registered\n", priv->addr);
> +
> +	return 0;
> +}
> +
> +static int peci_hwmon_remove(struct peci_client *client)
> +{
> +	struct peci_hwmon *priv = dev_get_drvdata(&client->dev);
> +
> +	cancel_delayed_work(&priv->work_handler);
> +	destroy_workqueue(priv->work_queue);
> +	sysfs_remove_groups(&priv->hwmon_dev->kobj, priv->core_attr_groups);
> +	sysfs_remove_groups(&priv->hwmon_dev->kobj, priv->dimm_attr_groups);
> +	hwmon_device_unregister(priv->hwmon_dev);
> +
> +	return 0;
> +}
> +
> +static const struct of_device_id peci_of_table[] = {
> +	{ .compatible = "intel,peci-hwmon", },
> +	{ }
> +};
> +MODULE_DEVICE_TABLE(of, peci_of_table);
> +
> +static struct peci_driver peci_hwmon_driver = {
> +	.probe  = peci_hwmon_probe,
> +	.remove = peci_hwmon_remove,
> +	.driver = {
> +		.name           = "peci-hwmon",
> +		.of_match_table = of_match_ptr(peci_of_table),
> +	},
> +};
> +module_peci_driver(peci_hwmon_driver);
> +
> +MODULE_AUTHOR("Jae Hyun Yoo <jae.hyun.yoo@linux.intel.com>");
> +MODULE_DESCRIPTION("PECI hwmon driver");
> +MODULE_LICENSE("GPL v2");
> 

-- 
Stef van Os
Designer
Prodrive Technologies B.V.
Mobile: +31 63 17 76 319
Phone:  +31 40 26 76 200

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v2 7/8] [PATCH 7/8] drivers/hwmon: Add a generic PECI hwmon client driver
  2018-03-13  9:32   ` Stef van Os
@ 2018-03-13 18:56     ` Jae Hyun Yoo
  0 siblings, 0 replies; 46+ messages in thread
From: Jae Hyun Yoo @ 2018-03-13 18:56 UTC (permalink / raw)
  To: Stef van Os, joel, andrew, arnd, gregkh, jdelvare, linux, benh, andrew
  Cc: linux-hwmon, devicetree, linux-doc, openbmc, linux-kernel,
	linux-arm-kernel

Hi Stef,

Thanks for sharing your time to test it.

That is expected result in v2. Previously in v1, it used delayed 
creation on core temperature group so it was okay if hwmon driver is 
registered when client CPU is powered down, but in v2, the driver should 
check resolved cores at probing time to prevent the delayed creation on 
core temperature group and indexing gap which breaks hwmon subsystem's 
common rule, so I added peci_detect() into peci_new_device() in PECI 
core driver to check online status of the client CPU when registering a 
new device.

You may need to use dynamic dtoverlay for loading/unloading a PECI hwmon 
driver according to the current client CPU power state. It means a PECI 
hwmon driver can be registered only when the client CPU is powered on. 
This design will be kept in v3 as well.

Thanks,
Jae

On 3/13/2018 2:32 AM, Stef van Os wrote:
> Hi Jae,
> 
> I tried version 1 and 2 of your PECI patch on our (AST2500 / Xeon E5 v4) 
> system. The V1 patchset works as expected (reading back temperature 0 
> until PECI is up), but the hwmon driver probe fails with version 2. It 
> communicates with the Xeon and assumes during kernel boot of the Aspeed 
> that PECI to the Xeon's is already up and running, but our system 
> enables the main Xeon supplies from AST2500 userspace.
> 
> If I load the hwmon driver as a module to load later on, the driver does 
> not call probe like e.g. a I2C driver on the I2C bus does. Am I using V2 
> wrongly?
> 
> BR,
> Stef
> 
> On 02/21/2018 05:16 PM, Jae Hyun Yoo wrote:
>> This commit adds a generic PECI hwmon client driver implementation.
>>
>> Signed-off-by: Jae Hyun Yoo <jae.hyun.yoo@linux.intel.com>
>> ---
>>   drivers/hwmon/Kconfig      |  10 +
>>   drivers/hwmon/Makefile     |   1 +
>>   drivers/hwmon/peci-hwmon.c | 928 
>> +++++++++++++++++++++++++++++++++++++++++++++
>>   3 files changed, 939 insertions(+)
>>   create mode 100644 drivers/hwmon/peci-hwmon.c
>>
>> diff --git a/drivers/hwmon/Kconfig b/drivers/hwmon/Kconfig
>> index ef23553ff5cb..f22e0c31f597 100644
>> --- a/drivers/hwmon/Kconfig
>> +++ b/drivers/hwmon/Kconfig
>> @@ -1246,6 +1246,16 @@ config SENSORS_NCT7904
>>         This driver can also be built as a module.  If so, the module
>>         will be called nct7904.
>> +config SENSORS_PECI_HWMON
>> +    tristate "PECI hwmon support"
>> +    depends on PECI
>> +    help
>> +      If you say yes here you get support for the generic PECI hwmon
>> +      driver.
>> +
>> +      This driver can also be built as a module.  If so, the module
>> +      will be called peci-hwmon.
>> +
>>   config SENSORS_NSA320
>>       tristate "ZyXEL NSA320 and compatible fan speed and temperature 
>> sensors"
>>       depends on GPIOLIB && OF
>> diff --git a/drivers/hwmon/Makefile b/drivers/hwmon/Makefile
>> index f814b4ace138..946f54b168e5 100644
>> --- a/drivers/hwmon/Makefile
>> +++ b/drivers/hwmon/Makefile
>> @@ -135,6 +135,7 @@ obj-$(CONFIG_SENSORS_NCT7802)    += nct7802.o
>>   obj-$(CONFIG_SENSORS_NCT7904)    += nct7904.o
>>   obj-$(CONFIG_SENSORS_NSA320)    += nsa320-hwmon.o
>>   obj-$(CONFIG_SENSORS_NTC_THERMISTOR)    += ntc_thermistor.o
>> +obj-$(CONFIG_SENSORS_PECI_HWMON)    += peci-hwmon.o
>>   obj-$(CONFIG_SENSORS_PC87360)    += pc87360.o
>>   obj-$(CONFIG_SENSORS_PC87427)    += pc87427.o
>>   obj-$(CONFIG_SENSORS_PCF8591)    += pcf8591.o
>> diff --git a/drivers/hwmon/peci-hwmon.c b/drivers/hwmon/peci-hwmon.c
>> new file mode 100644
>> index 000000000000..edd27744adcb
>> --- /dev/null
>> +++ b/drivers/hwmon/peci-hwmon.c
>> @@ -0,0 +1,928 @@
>> +// SPDX-License-Identifier: GPL-2.0
>> +// Copyright (c) 2018 Intel Corporation
>> +
>> +#include <linux/delay.h>
>> +#include <linux/hwmon.h>
>> +#include <linux/hwmon-sysfs.h>
>> +#include <linux/jiffies.h>
>> +#include <linux/module.h>
>> +#include <linux/of_device.h>
>> +#include <linux/peci.h>
>> +#include <linux/workqueue.h>
>> +
>> +#define DIMM_SLOT_NUMS_MAX    12  /* Max DIMM numbers (channel ranks 
>> x 2) */
>> +#define CORE_NUMS_MAX         28  /* Max core numbers (max on SKX 
>> Platinum) */
>> +#define TEMP_TYPE_PECI        6   /* Sensor type 6: Intel PECI */
>> +
>> +#define CORE_TEMP_ATTRS       5
>> +#define DIMM_TEMP_ATTRS       2
>> +#define ATTR_NAME_LEN         24
>> +
>> +#define DEFAULT_ATTR_GRP_NUMS 5
>> +
>> +#define UPDATE_INTERVAL_MIN   HZ
>> +#define DIMM_MASK_CHECK_DELAY msecs_to_jiffies(5000)
>> +
>> +enum sign {
>> +    POS,
>> +    NEG
>> +};
>> +
>> +struct temp_data {
>> +    bool valid;
>> +    s32  value;
>> +    unsigned long last_updated;
>> +};
>> +
>> +struct temp_group {
>> +    struct temp_data tjmax;
>> +    struct temp_data tcontrol;
>> +    struct temp_data tthrottle;
>> +    struct temp_data dts_margin;
>> +    struct temp_data die;
>> +    struct temp_data core[CORE_NUMS_MAX];
>> +    struct temp_data dimm[DIMM_SLOT_NUMS_MAX];
>> +};
>> +
>> +struct core_temp_group {
>> +    struct sensor_device_attribute sd_attrs[CORE_TEMP_ATTRS];
>> +    char attr_name[CORE_TEMP_ATTRS][ATTR_NAME_LEN];
>> +    struct attribute *attrs[CORE_TEMP_ATTRS + 1];
>> +    struct attribute_group attr_group;
>> +};
>> +
>> +struct dimm_temp_group {
>> +    struct sensor_device_attribute sd_attrs[DIMM_TEMP_ATTRS];
>> +    char attr_name[DIMM_TEMP_ATTRS][ATTR_NAME_LEN];
>> +    struct attribute *attrs[DIMM_TEMP_ATTRS + 1];
>> +    struct attribute_group attr_group;
>> +};
>> +
>> +struct peci_hwmon {
>> +    struct peci_client *client;
>> +    struct device *dev;
>> +    struct device *hwmon_dev;
>> +    struct workqueue_struct *work_queue;
>> +    struct delayed_work work_handler;
>> +    char name[PECI_NAME_SIZE];
>> +    struct temp_group temp;
>> +    u8 addr;
>> +    uint cpu_no;
>> +    u32 core_mask;
>> +    u32 dimm_mask;
>> +    const struct attribute_group *core_attr_groups[CORE_NUMS_MAX + 1];
>> +    const struct attribute_group *dimm_attr_groups[DIMM_SLOT_NUMS_MAX 
>> + 1];
>> +    uint global_idx;
>> +    uint core_idx;
>> +    uint dimm_idx;
>> +};
>> +
>> +enum label {
>> +    L_DIE,
>> +    L_DTS,
>> +    L_TCONTROL,
>> +    L_TTHROTTLE,
>> +    L_TJMAX,
>> +    L_MAX
>> +};
>> +
>> +static const char *peci_label[L_MAX] = {
>> +    "Die\n",
>> +    "DTS margin to Tcontrol\n",
>> +    "Tcontrol\n",
>> +    "Tthrottle\n",
>> +    "Tjmax\n",
>> +};
>> +
>> +static int send_peci_cmd(struct peci_hwmon *priv, enum peci_cmd cmd, 
>> void *msg)
>> +{
>> +    return peci_command(priv->client->adapter, cmd, msg);
>> +}
>> +
>> +static int need_update(struct temp_data *temp)
>> +{
>> +    if (temp->valid &&
>> +        time_before(jiffies, temp->last_updated + UPDATE_INTERVAL_MIN))
>> +        return 0;
>> +
>> +    return 1;
>> +}
>> +
>> +static s32 ten_dot_six_to_millidegree(s32 x)
>> +{
>> +    return ((((x) ^ 0x8000) - 0x8000) * 1000 / 64);
>> +}
>> +
>> +static int get_tjmax(struct peci_hwmon *priv)
>> +{
>> +    struct peci_rd_pkg_cfg_msg msg;
>> +    int rc;
>> +
>> +    if (!priv->temp.tjmax.valid) {
>> +        msg.addr = priv->addr;
>> +        msg.index = MBX_INDEX_TEMP_TARGET;
>> +        msg.param = 0;
>> +        msg.rx_len = 4;
>> +
>> +        rc = send_peci_cmd(priv, PECI_CMD_RD_PKG_CFG, (void *)&msg);
>> +        if (rc < 0)
>> +            return rc;
>> +
>> +        priv->temp.tjmax.value = (s32)msg.pkg_config[2] * 1000;
>> +        priv->temp.tjmax.valid = true;
>> +    }
>> +
>> +    return 0;
>> +}
>> +
>> +static int get_tcontrol(struct peci_hwmon *priv)
>> +{
>> +    struct peci_rd_pkg_cfg_msg msg;
>> +    s32 tcontrol_margin;
>> +    int rc;
>> +
>> +    if (!need_update(&priv->temp.tcontrol))
>> +        return 0;
>> +
>> +    rc = get_tjmax(priv);
>> +    if (rc < 0)
>> +        return rc;
>> +
>> +    msg.addr = priv->addr;
>> +    msg.index = MBX_INDEX_TEMP_TARGET;
>> +    msg.param = 0;
>> +    msg.rx_len = 4;
>> +
>> +    rc = send_peci_cmd(priv, PECI_CMD_RD_PKG_CFG, (void *)&msg);
>> +    if (rc < 0)
>> +        return rc;
>> +
>> +    tcontrol_margin = msg.pkg_config[1];
>> +    tcontrol_margin = ((tcontrol_margin ^ 0x80) - 0x80) * 1000;
>> +
>> +    priv->temp.tcontrol.value = priv->temp.tjmax.value - 
>> tcontrol_margin;
>> +
>> +    if (!priv->temp.tcontrol.valid) {
>> +        priv->temp.tcontrol.last_updated = INITIAL_JIFFIES;
>> +        priv->temp.tcontrol.valid = true;
>> +    } else {
>> +        priv->temp.tcontrol.last_updated = jiffies;
>> +    }
>> +
>> +    return 0;
>> +}
>> +
>> +static int get_tthrottle(struct peci_hwmon *priv)
>> +{
>> +    struct peci_rd_pkg_cfg_msg msg;
>> +    s32 tthrottle_offset;
>> +    int rc;
>> +
>> +    if (!need_update(&priv->temp.tthrottle))
>> +        return 0;
>> +
>> +    rc = get_tjmax(priv);
>> +    if (rc < 0)
>> +        return rc;
>> +
>> +    msg.addr = priv->addr;
>> +    msg.index = MBX_INDEX_TEMP_TARGET;
>> +    msg.param = 0;
>> +    msg.rx_len = 4;
>> +
>> +    rc = send_peci_cmd(priv, PECI_CMD_RD_PKG_CFG, (void *)&msg);
>> +    if (rc < 0)
>> +        return rc;
>> +
>> +    tthrottle_offset = (msg.pkg_config[3] & 0x2f) * 1000;
>> +    priv->temp.tthrottle.value = priv->temp.tjmax.value - 
>> tthrottle_offset;
>> +
>> +    if (!priv->temp.tthrottle.valid) {
>> +        priv->temp.tthrottle.last_updated = INITIAL_JIFFIES;
>> +        priv->temp.tthrottle.valid = true;
>> +    } else {
>> +        priv->temp.tthrottle.last_updated = jiffies;
>> +    }
>> +
>> +    return 0;
>> +}
>> +
>> +static int get_die_temp(struct peci_hwmon *priv)
>> +{
>> +    struct peci_get_temp_msg msg;
>> +    int rc;
>> +
>> +    if (!need_update(&priv->temp.die))
>> +        return 0;
>> +
>> +    rc = get_tjmax(priv);
>> +    if (rc < 0)
>> +        return rc;
>> +
>> +    msg.addr = priv->addr;
>> +
>> +    rc = send_peci_cmd(priv, PECI_CMD_GET_TEMP, (void *)&msg);
>> +    if (rc < 0)
>> +        return rc;
>> +
>> +    priv->temp.die.value = priv->temp.tjmax.value +
>> +                   ((s32)msg.temp_raw * 1000 / 64);
>> +
>> +    if (!priv->temp.die.valid) {
>> +        priv->temp.die.last_updated = INITIAL_JIFFIES;
>> +        priv->temp.die.valid = true;
>> +    } else {
>> +        priv->temp.die.last_updated = jiffies;
>> +    }
>> +
>> +    return 0;
>> +}
>> +
>> +static int get_dts_margin(struct peci_hwmon *priv)
>> +{
>> +    struct peci_rd_pkg_cfg_msg msg;
>> +    s32 dts_margin;
>> +    int rc;
>> +
>> +    if (!need_update(&priv->temp.dts_margin))
>> +        return 0;
>> +
>> +    msg.addr = priv->addr;
>> +    msg.index = MBX_INDEX_DTS_MARGIN;
>> +    msg.param = 0;
>> +    msg.rx_len = 4;
>> +
>> +    rc = send_peci_cmd(priv, PECI_CMD_RD_PKG_CFG, (void *)&msg);
>> +    if (rc < 0)
>> +        return rc;
>> +
>> +    dts_margin = (msg.pkg_config[1] << 8) | msg.pkg_config[0];
>> +
>> +    /**
>> +     * Processors return a value of DTS reading in 10.6 format
>> +     * (10 bits signed decimal, 6 bits fractional).
>> +     * Error codes:
>> +     *   0x8000: General sensor error
>> +     *   0x8001: Reserved
>> +     *   0x8002: Underflow on reading value
>> +     *   0x8003-0x81ff: Reserved
>> +     */
>> +    if (dts_margin >= 0x8000 && dts_margin <= 0x81ff)
>> +        return -1;
>> +
>> +    dts_margin = ten_dot_six_to_millidegree(dts_margin);
>> +
>> +    priv->temp.dts_margin.value = dts_margin;
>> +
>> +    if (!priv->temp.dts_margin.valid) {
>> +        priv->temp.dts_margin.last_updated = INITIAL_JIFFIES;
>> +        priv->temp.dts_margin.valid = true;
>> +    } else {
>> +        priv->temp.dts_margin.last_updated = jiffies;
>> +    }
>> +
>> +    return 0;
>> +}
>> +
>> +static int get_core_temp(struct peci_hwmon *priv, int core_index)
>> +{
>> +    struct peci_rd_pkg_cfg_msg msg;
>> +    s32 core_dts_margin;
>> +    int rc;
>> +
>> +    if (!need_update(&priv->temp.core[core_index]))
>> +        return 0;
>> +
>> +    rc = get_tjmax(priv);
>> +    if (rc < 0)
>> +        return rc;
>> +
>> +    msg.addr = priv->addr;
>> +    msg.index = MBX_INDEX_PER_CORE_DTS_TEMP;
>> +    msg.param = core_index;
>> +    msg.rx_len = 4;
>> +
>> +    rc = send_peci_cmd(priv, PECI_CMD_RD_PKG_CFG, (void *)&msg);
>> +    if (rc < 0)
>> +        return rc;
>> +
>> +    core_dts_margin = (msg.pkg_config[1] << 8) | msg.pkg_config[0];
>> +
>> +    /**
>> +     * Processors return a value of the core DTS reading in 10.6 format
>> +     * (10 bits signed decimal, 6 bits fractional).
>> +     * Error codes:
>> +     *   0x8000: General sensor error
>> +     *   0x8001: Reserved
>> +     *   0x8002: Underflow on reading value
>> +     *   0x8003-0x81ff: Reserved
>> +     */
>> +    if (core_dts_margin >= 0x8000 && core_dts_margin <= 0x81ff)
>> +        return -1;
>> +
>> +    core_dts_margin = ten_dot_six_to_millidegree(core_dts_margin);
>> +
>> +    priv->temp.core[core_index].value = priv->temp.tjmax.value +
>> +                        core_dts_margin;
>> +
>> +    if (!priv->temp.core[core_index].valid) {
>> +        priv->temp.core[core_index].last_updated = INITIAL_JIFFIES;
>> +        priv->temp.core[core_index].valid = true;
>> +    } else {
>> +        priv->temp.core[core_index].last_updated = jiffies;
>> +    }
>> +
>> +    return 0;
>> +}
>> +
>> +static int get_dimm_temp(struct peci_hwmon *priv, int dimm_index)
>> +{
>> +    struct peci_rd_pkg_cfg_msg msg;
>> +    int channel = dimm_index / 2;
>> +    int dimm_order = dimm_index % 2;
>> +    int rc;
>> +
>> +    if (!need_update(&priv->temp.dimm[dimm_index]))
>> +        return 0;
>> +
>> +    msg.addr = priv->addr;
>> +    msg.index = MBX_INDEX_DDR_DIMM_TEMP;
>> +    msg.param = channel;
>> +    msg.rx_len = 4;
>> +
>> +    rc = send_peci_cmd(priv, PECI_CMD_RD_PKG_CFG, (void *)&msg);
>> +    if (rc < 0)
>> +        return rc;
>> +
>> +    priv->temp.dimm[dimm_index].value = msg.pkg_config[dimm_order] * 
>> 1000;
>> +
>> +    if (!priv->temp.dimm[dimm_index].valid) {
>> +        priv->temp.dimm[dimm_index].last_updated = INITIAL_JIFFIES;
>> +        priv->temp.dimm[dimm_index].valid = true;
>> +    } else {
>> +        priv->temp.dimm[dimm_index].last_updated = jiffies;
>> +    }
>> +
>> +    return 0;
>> +}
>> +
>> +static ssize_t show_tcontrol(struct device *dev,
>> +                 struct device_attribute *attr,
>> +                 char *buf)
>> +{
>> +    struct peci_hwmon *priv = dev_get_drvdata(dev);
>> +    int rc;
>> +
>> +    rc = get_tcontrol(priv);
>> +    if (rc < 0)
>> +        return rc;
>> +
>> +    return sprintf(buf, "%d\n", priv->temp.tcontrol.value);
>> +}
>> +
>> +static ssize_t show_tcontrol_margin(struct device *dev,
>> +                    struct device_attribute *attr,
>> +                    char *buf)
>> +{
>> +    struct peci_hwmon *priv = dev_get_drvdata(dev);
>> +    struct sensor_device_attribute *sensor_attr = 
>> to_sensor_dev_attr(attr);
>> +    int rc;
>> +
>> +    rc = get_tcontrol(priv);
>> +    if (rc < 0)
>> +        return rc;
>> +
>> +    return sprintf(buf, "%d\n", sensor_attr->index == POS ?
>> +                    priv->temp.tjmax.value -
>> +                    priv->temp.tcontrol.value :
>> +                    priv->temp.tcontrol.value -
>> +                    priv->temp.tjmax.value);
>> +}
>> +
>> +static ssize_t show_tthrottle(struct device *dev,
>> +                  struct device_attribute *attr,
>> +                  char *buf)
>> +{
>> +    struct peci_hwmon *priv = dev_get_drvdata(dev);
>> +    int rc;
>> +
>> +    rc = get_tthrottle(priv);
>> +    if (rc < 0)
>> +        return rc;
>> +
>> +    return sprintf(buf, "%d\n", priv->temp.tthrottle.value);
>> +}
>> +
>> +static ssize_t show_tjmax(struct device *dev,
>> +              struct device_attribute *attr,
>> +              char *buf)
>> +{
>> +    struct peci_hwmon *priv = dev_get_drvdata(dev);
>> +    int rc;
>> +
>> +    rc = get_tjmax(priv);
>> +    if (rc < 0)
>> +        return rc;
>> +
>> +    return sprintf(buf, "%d\n", priv->temp.tjmax.value);
>> +}
>> +
>> +static ssize_t show_die_temp(struct device *dev,
>> +                 struct device_attribute *attr,
>> +                 char *buf)
>> +{
>> +    struct peci_hwmon *priv = dev_get_drvdata(dev);
>> +    int rc;
>> +
>> +    rc = get_die_temp(priv);
>> +    if (rc < 0)
>> +        return rc;
>> +
>> +    return sprintf(buf, "%d\n", priv->temp.die.value);
>> +}
>> +
>> +static ssize_t show_dts_margin(struct device *dev,
>> +                   struct device_attribute *attr,
>> +                   char *buf)
>> +{
>> +    struct peci_hwmon *priv = dev_get_drvdata(dev);
>> +    int rc;
>> +
>> +    rc = get_dts_margin(priv);
>> +    if (rc < 0)
>> +        return rc;
>> +
>> +    return sprintf(buf, "%d\n", priv->temp.dts_margin.value);
>> +}
>> +
>> +static ssize_t show_core_temp(struct device *dev,
>> +                  struct device_attribute *attr,
>> +                  char *buf)
>> +{
>> +    struct peci_hwmon *priv = dev_get_drvdata(dev);
>> +    struct sensor_device_attribute *sensor_attr = 
>> to_sensor_dev_attr(attr);
>> +    int core_index = sensor_attr->index;
>> +    int rc;
>> +
>> +    rc = get_core_temp(priv, core_index);
>> +    if (rc < 0)
>> +        return rc;
>> +
>> +    return sprintf(buf, "%d\n", priv->temp.core[core_index].value);
>> +}
>> +
>> +static ssize_t show_dimm_temp(struct device *dev,
>> +                  struct device_attribute *attr,
>> +                  char *buf)
>> +{
>> +    struct peci_hwmon *priv = dev_get_drvdata(dev);
>> +    struct sensor_device_attribute *sensor_attr = 
>> to_sensor_dev_attr(attr);
>> +    int dimm_index = sensor_attr->index;
>> +    int rc;
>> +
>> +    rc = get_dimm_temp(priv, dimm_index);
>> +    if (rc < 0)
>> +        return rc;
>> +
>> +    return sprintf(buf, "%d\n", priv->temp.dimm[dimm_index].value);
>> +}
>> +
>> +static ssize_t show_value(struct device *dev,
>> +              struct device_attribute *attr,
>> +              char *buf)
>> +{
>> +    struct sensor_device_attribute *sensor_attr = 
>> to_sensor_dev_attr(attr);
>> +
>> +    return sprintf(buf, "%d\n", sensor_attr->index);
>> +}
>> +
>> +static ssize_t show_label(struct device *dev,
>> +              struct device_attribute *attr,
>> +              char *buf)
>> +{
>> +    struct sensor_device_attribute *sensor_attr = 
>> to_sensor_dev_attr(attr);
>> +
>> +    return sprintf(buf, peci_label[sensor_attr->index]);
>> +}
>> +
>> +static ssize_t show_core_label(struct device *dev,
>> +                   struct device_attribute *attr,
>> +                   char *buf)
>> +{
>> +    struct sensor_device_attribute *sensor_attr = 
>> to_sensor_dev_attr(attr);
>> +
>> +    return sprintf(buf, "Core %d\n", sensor_attr->index);
>> +}
>> +
>> +static ssize_t show_dimm_label(struct device *dev,
>> +                   struct device_attribute *attr,
>> +                   char *buf)
>> +{
>> +    struct sensor_device_attribute *sensor_attr = 
>> to_sensor_dev_attr(attr);
>> +
>> +    char channel = 'A' + (sensor_attr->index / 2);
>> +    int index = sensor_attr->index % 2;
>> +
>> +    return sprintf(buf, "DIMM %d (%c%d)\n",
>> +               sensor_attr->index, channel, index);
>> +}
>> +
>> +/* Die temperature */
>> +static SENSOR_DEVICE_ATTR(temp1_label, 0444, show_label, NULL, L_DIE);
>> +static SENSOR_DEVICE_ATTR(temp1_input, 0444, show_die_temp, NULL, 0);
>> +static SENSOR_DEVICE_ATTR(temp1_max, 0444, show_tcontrol, NULL, 0);
>> +static SENSOR_DEVICE_ATTR(temp1_crit, 0444, show_tjmax, NULL, 0);
>> +static SENSOR_DEVICE_ATTR(temp1_crit_hyst, 0444, 
>> show_tcontrol_margin, NULL,
>> +              POS);
>> +
>> +static struct attribute *die_temp_attrs[] = {
>> +    &sensor_dev_attr_temp1_label.dev_attr.attr,
>> +    &sensor_dev_attr_temp1_input.dev_attr.attr,
>> +    &sensor_dev_attr_temp1_max.dev_attr.attr,
>> +    &sensor_dev_attr_temp1_crit.dev_attr.attr,
>> +    &sensor_dev_attr_temp1_crit_hyst.dev_attr.attr,
>> +    NULL
>> +};
>> +
>> +static struct attribute_group die_temp_attr_group = {
>> +    .attrs = die_temp_attrs,
>> +};
>> +
>> +/* DTS margin temperature */
>> +static SENSOR_DEVICE_ATTR(temp2_label, 0444, show_label, NULL, L_DTS);
>> +static SENSOR_DEVICE_ATTR(temp2_input, 0444, show_dts_margin, NULL, 0);
>> +static SENSOR_DEVICE_ATTR(temp2_min, 0444, show_value, NULL, 0);
>> +static SENSOR_DEVICE_ATTR(temp2_lcrit, 0444, show_tcontrol_margin, 
>> NULL, NEG);
>> +
>> +static struct attribute *dts_margin_temp_attrs[] = {
>> +    &sensor_dev_attr_temp2_label.dev_attr.attr,
>> +    &sensor_dev_attr_temp2_input.dev_attr.attr,
>> +    &sensor_dev_attr_temp2_min.dev_attr.attr,
>> +    &sensor_dev_attr_temp2_lcrit.dev_attr.attr,
>> +    NULL
>> +};
>> +
>> +static struct attribute_group dts_margin_temp_attr_group = {
>> +    .attrs = dts_margin_temp_attrs,
>> +};
>> +
>> +/* Tcontrol temperature */
>> +static SENSOR_DEVICE_ATTR(temp3_label, 0444, show_label, NULL, 
>> L_TCONTROL);
>> +static SENSOR_DEVICE_ATTR(temp3_input, 0444, show_tcontrol, NULL, 0);
>> +static SENSOR_DEVICE_ATTR(temp3_crit, 0444, show_tjmax, NULL, 0);
>> +
>> +static struct attribute *tcontrol_temp_attrs[] = {
>> +    &sensor_dev_attr_temp3_label.dev_attr.attr,
>> +    &sensor_dev_attr_temp3_input.dev_attr.attr,
>> +    &sensor_dev_attr_temp3_crit.dev_attr.attr,
>> +    NULL
>> +};
>> +
>> +static struct attribute_group tcontrol_temp_attr_group = {
>> +    .attrs = tcontrol_temp_attrs,
>> +};
>> +
>> +/* Tthrottle temperature */
>> +static SENSOR_DEVICE_ATTR(temp4_label, 0444, show_label, NULL, 
>> L_TTHROTTLE);
>> +static SENSOR_DEVICE_ATTR(temp4_input, 0444, show_tthrottle, NULL, 0);
>> +
>> +static struct attribute *tthrottle_temp_attrs[] = {
>> +    &sensor_dev_attr_temp4_label.dev_attr.attr,
>> +    &sensor_dev_attr_temp4_input.dev_attr.attr,
>> +    NULL
>> +};
>> +
>> +static struct attribute_group tthrottle_temp_attr_group = {
>> +    .attrs = tthrottle_temp_attrs,
>> +};
>> +
>> +/* Tjmax temperature */
>> +static SENSOR_DEVICE_ATTR(temp5_label, 0444, show_label, NULL, L_TJMAX);
>> +static SENSOR_DEVICE_ATTR(temp5_input, 0444, show_tjmax, NULL, 0);
>> +
>> +static struct attribute *tjmax_temp_attrs[] = {
>> +    &sensor_dev_attr_temp5_label.dev_attr.attr,
>> +    &sensor_dev_attr_temp5_input.dev_attr.attr,
>> +    NULL
>> +};
>> +
>> +static struct attribute_group tjmax_temp_attr_group = {
>> +    .attrs = tjmax_temp_attrs,
>> +};
>> +
>> +static const struct attribute_group *
>> +default_attr_groups[DEFAULT_ATTR_GRP_NUMS + 1] = {
>> +    &die_temp_attr_group,
>> +    &dts_margin_temp_attr_group,
>> +    &tcontrol_temp_attr_group,
>> +    &tthrottle_temp_attr_group,
>> +    &tjmax_temp_attr_group,
>> +    NULL
>> +};
>> +
>> +/* Core temperature */
>> +static ssize_t (*const core_show_fn[CORE_TEMP_ATTRS]) (struct device 
>> *dev,
>> +        struct device_attribute *devattr, char *buf) = {
>> +    show_core_label,
>> +    show_core_temp,
>> +    show_tcontrol,
>> +    show_tjmax,
>> +    show_tcontrol_margin,
>> +};
>> +
>> +static const char *const core_suffix[CORE_TEMP_ATTRS] = {
>> +    "label",
>> +    "input",
>> +    "max",
>> +    "crit",
>> +    "crit_hyst",
>> +};
>> +
>> +static int check_resolved_cores(struct peci_hwmon *priv)
>> +{
>> +    struct peci_rd_pci_cfg_local_msg msg;
>> +    int rc;
>> +
>> +    if (!(priv->client->adapter->cmd_mask & 
>> BIT(PECI_CMD_RD_PCI_CFG_LOCAL)))
>> +        return -EINVAL;
>> +
>> +    /* Get the RESOLVED_CORES register value */
>> +    msg.addr = priv->addr;
>> +    msg.bus = 1;
>> +    msg.device = 30;
>> +    msg.function = 3;
>> +    msg.reg = 0xB4;
>> +    msg.rx_len = 4;
>> +
>> +    rc = send_peci_cmd(priv, PECI_CMD_RD_PCI_CFG_LOCAL, (void *)&msg);
>> +    if (rc < 0)
>> +        return rc;
>> +
>> +    priv->core_mask = msg.pci_config[3] << 24 |
>> +              msg.pci_config[2] << 16 |
>> +              msg.pci_config[1] << 8 |
>> +              msg.pci_config[0];
>> +
>> +    if (!priv->core_mask)
>> +        return -EAGAIN;
>> +
>> +    dev_dbg(priv->dev, "Scanned resolved cores: 0x%x\n", 
>> priv->core_mask);
>> +    return 0;
>> +}
>> +
>> +static int create_core_temp_group(struct peci_hwmon *priv, int core_no)
>> +{
>> +    struct core_temp_group *data;
>> +    int i;
>> +
>> +    data = devm_kzalloc(priv->dev, sizeof(struct core_temp_group),
>> +                GFP_KERNEL);
>> +    if (!data)
>> +        return -ENOMEM;
>> +
>> +    for (i = 0; i < CORE_TEMP_ATTRS; i++) {
>> +        snprintf(data->attr_name[i], ATTR_NAME_LEN,
>> +             "temp%d_%s", priv->global_idx, core_suffix[i]);
>> +        sysfs_attr_init(&data->sd_attrs[i].dev_attr.attr);
>> +        data->sd_attrs[i].dev_attr.attr.name = data->attr_name[i];
>> +        data->sd_attrs[i].dev_attr.attr.mode = 0444;
>> +        data->sd_attrs[i].dev_attr.show = core_show_fn[i];
>> +        if (i == 0 || i == 1) /* label or temp */
>> +            data->sd_attrs[i].index = core_no;
>> +        data->attrs[i] = &data->sd_attrs[i].dev_attr.attr;
>> +    }
>> +
>> +    data->attr_group.attrs = data->attrs;
>> +    priv->core_attr_groups[priv->core_idx++] = &data->attr_group;
>> +    priv->global_idx++;
>> +
>> +    return 0;
>> +}
>> +
>> +static int create_core_temp_groups(struct peci_hwmon *priv)
>> +{
>> +    int rc, i;
>> +
>> +    rc = check_resolved_cores(priv);
>> +    if (!rc) {
>> +        for (i = 0; i < CORE_NUMS_MAX; i++) {
>> +            if (priv->core_mask & BIT(i)) {
>> +                rc = create_core_temp_group(priv, i);
>> +                if (rc)
>> +                    return rc;
>> +            }
>> +        }
>> +
>> +        rc = sysfs_create_groups(&priv->hwmon_dev->kobj,
>> +                     priv->core_attr_groups);
>> +    }
>> +
>> +    return rc;
>> +}
>> +
>> +/* DIMM temperature */
>> +static ssize_t (*const dimm_show_fn[DIMM_TEMP_ATTRS]) (struct device 
>> *dev,
>> +        struct device_attribute *devattr, char *buf) = {
>> +    show_dimm_label,
>> +    show_dimm_temp,
>> +};
>> +
>> +static const char *const dimm_suffix[DIMM_TEMP_ATTRS] = {
>> +    "label",
>> +    "input",
>> +};
>> +
>> +static int check_populated_dimms(struct peci_hwmon *priv)
>> +{
>> +    struct peci_rd_pkg_cfg_msg msg;
>> +    int i, rc, pass = 0;
>> +
>> +do_scan:
>> +    for (i = 0; i < (DIMM_SLOT_NUMS_MAX / 2); i++) {
>> +        msg.addr = priv->addr;
>> +        msg.index = MBX_INDEX_DDR_DIMM_TEMP;
>> +        msg.param = i; /* channel */
>> +        msg.rx_len = 4;
>> +
>> +        rc = send_peci_cmd(priv, PECI_CMD_RD_PKG_CFG, (void *)&msg);
>> +        if (rc < 0)
>> +            return rc;
>> +
>> +        if (msg.pkg_config[0]) /* DIMM #0 on the channel */
>> +            priv->dimm_mask |= BIT(i);
>> +
>> +        if (msg.pkg_config[1]) /* DIMM #1 on the channel */
>> +            priv->dimm_mask |= BIT(i + 1);
>> +    }
>> +
>> +    /* Do 2-pass scanning */
>> +    if (priv->dimm_mask && pass == 0) {
>> +        pass++;
>> +        goto do_scan;
>> +    }
>> +
>> +    if (!priv->dimm_mask)
>> +        return -EAGAIN;
>> +
>> +    dev_dbg(priv->dev, "Scanned populated DIMMs: 0x%x\n", 
>> priv->dimm_mask);
>> +    return 0;
>> +}
>> +
>> +static int create_dimm_temp_group(struct peci_hwmon *priv, int dimm_no)
>> +{
>> +    struct dimm_temp_group *data;
>> +    int i;
>> +
>> +    data = devm_kzalloc(priv->dev, sizeof(struct dimm_temp_group),
>> +                GFP_KERNEL);
>> +    if (!data)
>> +        return -ENOMEM;
>> +
>> +    for (i = 0; i < DIMM_TEMP_ATTRS; i++) {
>> +        snprintf(data->attr_name[i], ATTR_NAME_LEN,
>> +             "temp%d_%s", priv->global_idx, dimm_suffix[i]);
>> +        sysfs_attr_init(&data->sd_attrs[i].dev_attr.attr);
>> +        data->sd_attrs[i].dev_attr.attr.name = data->attr_name[i];
>> +        data->sd_attrs[i].dev_attr.attr.mode = 0444;
>> +        data->sd_attrs[i].dev_attr.show = dimm_show_fn[i];
>> +        data->sd_attrs[i].index = dimm_no;
>> +        data->attrs[i] = &data->sd_attrs[i].dev_attr.attr;
>> +    }
>> +
>> +    data->attr_group.attrs = data->attrs;
>> +    priv->dimm_attr_groups[priv->dimm_idx++] = &data->attr_group;
>> +    priv->global_idx++;
>> +
>> +    return 0;
>> +}
>> +
>> +static int create_dimm_temp_groups(struct peci_hwmon *priv)
>> +{
>> +    int rc, i;
>> +
>> +    rc = check_populated_dimms(priv);
>> +    if (!rc) {
>> +        for (i = 0; i < DIMM_SLOT_NUMS_MAX; i++) {
>> +            if (priv->dimm_mask & BIT(i)) {
>> +                rc = create_dimm_temp_group(priv, i);
>> +                if (rc)
>> +                    return rc;
>> +            }
>> +        }
>> +
>> +        rc = sysfs_create_groups(&priv->hwmon_dev->kobj,
>> +                     priv->dimm_attr_groups);
>> +        if (!rc)
>> +            dev_dbg(priv->dev, "Done DIMM temp group creation\n");
>> +    } else if (rc == -EAGAIN) {
>> +        queue_delayed_work(priv->work_queue, &priv->work_handler,
>> +                   DIMM_MASK_CHECK_DELAY);
>> +        dev_dbg(priv->dev, "Diferred DIMM temp group creation\n");
>> +    }
>> +
>> +    return rc;
>> +}
>> +
>> +static void create_dimm_temp_groups_delayed(struct work_struct *work)
>> +{
>> +    struct delayed_work *dwork = to_delayed_work(work);
>> +    struct peci_hwmon *priv = container_of(dwork, struct peci_hwmon,
>> +                           work_handler);
>> +    int rc;
>> +
>> +    rc = create_dimm_temp_groups(priv);
>> +    if (rc && rc != -EAGAIN)
>> +        dev_dbg(priv->dev, "Skipped to creat DIMM temp groups\n");
>> +}
>> +
>> +static int peci_hwmon_probe(struct peci_client *client)
>> +{
>> +    struct device *dev = &client->dev;
>> +    struct peci_hwmon *priv;
>> +    int rc;
>> +
>> +    if ((client->adapter->cmd_mask &
>> +        (BIT(PECI_CMD_GET_TEMP) | BIT(PECI_CMD_RD_PKG_CFG))) !=
>> +        (BIT(PECI_CMD_GET_TEMP) | BIT(PECI_CMD_RD_PKG_CFG))) {
>> +        dev_err(dev, "Client doesn't support temperature monitoring\n");
>> +        return -EINVAL;
>> +    }
>> +
>> +    priv = devm_kzalloc(dev, sizeof(*priv), GFP_KERNEL);
>> +    if (!priv)
>> +        return -ENOMEM;
>> +
>> +    dev_set_drvdata(dev, priv);
>> +    priv->client = client;
>> +    priv->dev = dev;
>> +    priv->addr = client->addr;
>> +    priv->cpu_no = priv->addr - PECI_BASE_ADDR;
>> +
>> +    snprintf(priv->name, PECI_NAME_SIZE, "peci_hwmon.cpu%d", 
>> priv->cpu_no);
>> +
>> +    priv->work_queue = create_singlethread_workqueue(priv->name);
>> +    if (!priv->work_queue)
>> +        return -ENOMEM;
>> +
>> +    priv->hwmon_dev = hwmon_device_register_with_groups(priv->dev,
>> +                                priv->name,
>> +                                priv,
>> +                               default_attr_groups);
>> +
>> +    rc = PTR_ERR_OR_ZERO(priv->hwmon_dev);
>> +    if (rc) {
>> +        dev_err(dev, "Failed to register peci hwmon\n");
>> +        return rc;
>> +    }
>> +
>> +    priv->global_idx = DEFAULT_ATTR_GRP_NUMS + 1;
>> +
>> +    rc = create_core_temp_groups(priv);
>> +    if (rc) {
>> +        dev_err(dev, "Failed to create core groups\n");
>> +        return rc;
>> +    }
>> +
>> +    INIT_DELAYED_WORK(&priv->work_handler, 
>> create_dimm_temp_groups_delayed);
>> +
>> +    rc = create_dimm_temp_groups(priv);
>> +    if (rc && rc != -EAGAIN)
>> +        dev_dbg(dev, "Skipped to creat DIMM temp groups\n");
>> +
>> +    dev_dbg(dev, "peci hwmon for CPU at 0x%x registered\n", priv->addr);
>> +
>> +    return 0;
>> +}
>> +
>> +static int peci_hwmon_remove(struct peci_client *client)
>> +{
>> +    struct peci_hwmon *priv = dev_get_drvdata(&client->dev);
>> +
>> +    cancel_delayed_work(&priv->work_handler);
>> +    destroy_workqueue(priv->work_queue);
>> +    sysfs_remove_groups(&priv->hwmon_dev->kobj, priv->core_attr_groups);
>> +    sysfs_remove_groups(&priv->hwmon_dev->kobj, priv->dimm_attr_groups);
>> +    hwmon_device_unregister(priv->hwmon_dev);
>> +
>> +    return 0;
>> +}
>> +
>> +static const struct of_device_id peci_of_table[] = {
>> +    { .compatible = "intel,peci-hwmon", },
>> +    { }
>> +};
>> +MODULE_DEVICE_TABLE(of, peci_of_table);
>> +
>> +static struct peci_driver peci_hwmon_driver = {
>> +    .probe  = peci_hwmon_probe,
>> +    .remove = peci_hwmon_remove,
>> +    .driver = {
>> +        .name           = "peci-hwmon",
>> +        .of_match_table = of_match_ptr(peci_of_table),
>> +    },
>> +};
>> +module_peci_driver(peci_hwmon_driver);
>> +
>> +MODULE_AUTHOR("Jae Hyun Yoo <jae.hyun.yoo@linux.intel.com>");
>> +MODULE_DESCRIPTION("PECI hwmon driver");
>> +MODULE_LICENSE("GPL v2");
>>
> 

^ permalink raw reply	[flat|nested] 46+ messages in thread

end of thread, other threads:[~2018-03-13 18:56 UTC | newest]

Thread overview: 46+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-02-21 16:15 [PATCH v2 0/8] PECI device driver introduction Jae Hyun Yoo
2018-02-21 16:15 ` [PATCH v2 1/8] [PATCH 1/8] drivers/peci: Add support for PECI bus driver core Jae Hyun Yoo
2018-02-21 17:04   ` Andrew Lunn
2018-02-21 20:31     ` Jae Hyun Yoo
2018-02-21 21:51       ` Andrew Lunn
2018-02-21 22:03         ` Jae Hyun Yoo
2018-02-21 17:58   ` Greg KH
2018-02-21 20:42     ` Jae Hyun Yoo
2018-02-22  6:54       ` Greg KH
2018-02-22 17:20         ` Jae Hyun Yoo
2018-02-22  7:01   ` kbuild test robot
2018-02-22  7:01   ` [RFC PATCH] drivers/peci: peci_match_id() can be static kbuild test robot
2018-02-22 17:25     ` Jae Hyun Yoo
2018-03-07  3:19   ` [PATCH v2 1/8] [PATCH 1/8] drivers/peci: Add support for PECI bus driver core Julia Cartwright
2018-03-07 19:03     ` Jae Hyun Yoo
2018-02-21 16:16 ` [PATCH v2 2/8] [PATCH 2/8] Documentations: dt-bindings: Add a document of PECI adapter driver for Aspeed AST24xx/25xx SoCs Jae Hyun Yoo
2018-02-21 17:13   ` Andrew Lunn
2018-02-21 20:35     ` Jae Hyun Yoo
2018-03-06 12:40   ` Pavel Machek
2018-03-06 12:54     ` Andrew Lunn
2018-03-06 13:05       ` Pavel Machek
2018-03-06 13:19         ` Arnd Bergmann
2018-03-06 19:05     ` Jae Hyun Yoo
2018-03-07 22:11       ` Pavel Machek
2018-03-09 23:41       ` Milton Miller II
2018-03-09 23:47         ` Jae Hyun Yoo
2018-02-21 16:16 ` [PATCH v2 3/8] [PATCH 3/8] ARM: dts: aspeed: peci: Add PECI node Jae Hyun Yoo
2018-02-21 16:16 ` [PATCH v2 4/8] [PATCH 4/8] drivers/peci: Add a PECI adapter driver for Aspeed AST24xx/AST25xx Jae Hyun Yoo
2018-02-21 16:16 ` [PATCH v2 5/8] [PATCH [5/8] Documentation: dt-bindings: Add a document for PECI hwmon client driver Jae Hyun Yoo
2018-02-21 16:16 ` [PATCH v2 6/8] [PATCH 6/8] Documentation: hwmon: " Jae Hyun Yoo
2018-03-06 20:28   ` Randy Dunlap
2018-03-06 21:08     ` Jae Hyun Yoo
2018-02-21 16:16 ` [PATCH v2 7/8] [PATCH 7/8] drivers/hwmon: Add a generic " Jae Hyun Yoo
2018-02-21 18:26   ` Guenter Roeck
2018-02-21 21:24     ` Jae Hyun Yoo
2018-02-21 21:48       ` Guenter Roeck
2018-02-21 23:07         ` Jae Hyun Yoo
2018-02-22  0:37           ` Andrew Lunn
2018-02-22  1:29             ` Jae Hyun Yoo
2018-02-24  0:00               ` Miguel Ojeda
2018-02-24  9:32                 ` Jae Hyun Yoo
2018-03-13  9:32   ` Stef van Os
2018-03-13 18:56     ` Jae Hyun Yoo
2018-02-21 16:16 ` [PATCH v2 8/8] [PATCH 8/8] Add a maintainer for the PECI subsystem Jae Hyun Yoo
2018-03-06 12:40 ` [PATCH v2 0/8] PECI device driver introduction Pavel Machek
2018-03-06 19:21   ` Jae Hyun Yoo

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).