* [PATCH v2 0/4] nvme: add thermal zone devices
From: Akinobu Mita @ 2019-05-21 16:04 UTC
  To: linux-nvme, linux-pm
  Cc: Akinobu Mita, Zhang Rui, Eduardo Valentin, Daniel Lezcano,
	Keith Busch, Jens Axboe, Christoph Hellwig, Sagi Grimberg,
	Minwoo Im, Kenneth Heitke, Chaitanya Kulkarni

The NVMe controller reports up to nine temperature values in the SMART /
Health log page (the composite temperature and temperature sensor 1 through
temperature sensor 8).
The temperature threshold feature (Feature Identifier 04h) configures the
asynchronous event request command to complete when a temperature crosses
its corresponding temperature threshold.

This series exposes these temperatures and thresholds via thermal zone
devices.
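
As a rough illustration of how the threshold feature's Dword 11 is composed,
the sketch below packs a sensor select and a Kelvin threshold the same way the
patch does (sensor shifted left by 16, threshold in the low 16 bits). The
helper names temp_thresh_pack() and temp_thresh_unpack() are hypothetical and
exist only for this example; the two constants mirror the enum added to
include/linux/nvme.h in this series.

```c
#include <assert.h>
#include <stdint.h>

/* Values mirror the enum this series adds to include/linux/nvme.h. */
enum {
	TEMP_THRESH_MASK         = 0xffff, /* threshold in Kelvin, low 16 bits */
	TEMP_THRESH_SELECT_SHIFT = 16,     /* sensor select starts at bit 16 */
};

/* Hypothetical helper: sensor 0 is the composite temperature, 1-8 the sensors. */
static uint32_t temp_thresh_pack(unsigned int sensor, unsigned int kelvin)
{
	assert(sensor <= 8);
	assert(kelvin <= TEMP_THRESH_MASK);

	return ((uint32_t)sensor << TEMP_THRESH_SELECT_SHIFT) | kelvin;
}

/* Hypothetical helper: extract the threshold from a Get Features result. */
static unsigned int temp_thresh_unpack(uint32_t result)
{
	return result & TEMP_THRESH_MASK;
}
```

With these helpers, a 358 K threshold for temperature sensor 2 packs to
0x20166, and unpacking that value yields 358 again.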

* v2
- s/correspoinding/corresponding/ typo in commit log
- Borrowed nvme_get_features() from Keith's patch
- Temperature threshold notification is split into a separate patch
- Change the data type of 'sensor' to unsigned
- Add BUILD_BUG_ON for the array size of tzdev member in nvme_ctrl
- Add WARN_ON_ONCE for paranoid checks
- Fix off-by-one error in nvme_get_temp
- Validate 'sensor' where the value is actually used
- Define and utilize two enums related to the temperature threshold feature
- Remove hysteresis value for this trip point and don't utilize the under
  temperature threshold
- Print error message for thermal_zone_device_register() failure
- Add function comments for nvme_thermal_zones_{,un}register
- Suppress non-fatal errors from nvme_thermal_zones_register()
- Add comment about implemented temperature sensors 
- Instead of creating a new 'thermal_work', append async smart event's
  action to the existing async_event_work
- Add comment for tzdev member in nvme_ctrl
- Call nvme_thermal_zones_unregister() earlier than the last reference
  release

Akinobu Mita (3):
  nvme: add thermal zone infrastructure
  nvme: notify thermal framework when temperature threshold events occur
  nvme-pci: support thermal zone

Keith Busch (1):
  nvme: Export get and set features

 drivers/nvme/host/core.c | 317 ++++++++++++++++++++++++++++++++++++++++++++++-
 drivers/nvme/host/nvme.h |  31 +++++
 drivers/nvme/host/pci.c  |   5 +
 include/linux/nvme.h     |  12 ++
 4 files changed, 362 insertions(+), 3 deletions(-)

Cc: Zhang Rui <rui.zhang@intel.com>
Cc: Eduardo Valentin <edubezval@gmail.com>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Jens Axboe <axboe@fb.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: Minwoo Im <minwoo.im.dev@gmail.com>
Cc: Kenneth Heitke <kenneth.heitke@intel.com>
Cc: Chaitanya Kulkarni <Chaitanya.Kulkarni@wdc.com>
-- 
2.7.4


* [PATCH v2 1/4] nvme: Export get and set features
From: Akinobu Mita @ 2019-05-21 16:04 UTC
  To: linux-nvme, linux-pm; +Cc: Keith Busch

From: Keith Busch <keith.busch@intel.com>

Upcoming users need the Get Features and Set Features commands, so export
these functions. Since their implementations are identical except for the
opcode, add a common helper that implements both.
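
The shape of that refactoring can be sketched outside the kernel as follows.
This is only an illustration under stated assumptions: struct mock_cmd,
last_cmd and the mock_* functions are hypothetical stand-ins, and nothing is
actually submitted to a controller. The two opcode values are the admin
opcodes for Set Features and Get Features.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Admin opcodes for Set Features and Get Features. */
enum {
	ADMIN_SET_FEATURES = 0x09,
	ADMIN_GET_FEATURES = 0x0a,
};

/* Hypothetical, trimmed-down command layout for illustration only. */
struct mock_cmd {
	uint8_t opcode;
	uint32_t fid;
	uint32_t dword11;
};

/* Stand-in for command submission: remembers the last command built. */
static struct mock_cmd last_cmd;

/* Common helper: everything except the opcode is shared. */
static int mock_features(uint8_t op, unsigned int fid, unsigned int dword11)
{
	struct mock_cmd c;

	memset(&c, 0, sizeof(c));
	c.opcode = op;
	c.fid = fid;
	c.dword11 = dword11;
	last_cmd = c;

	return 0;
}

/* Thin wrappers that differ only in the opcode they pass down. */
static int mock_set_features(unsigned int fid, unsigned int dword11)
{
	return mock_features(ADMIN_SET_FEATURES, fid, dword11);
}

static int mock_get_features(unsigned int fid, unsigned int dword11)
{
	return mock_features(ADMIN_GET_FEATURES, fid, dword11);
}
```

Keeping the shared body in one place means a later change to the command
setup only has to be made once.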

Signed-off-by: Keith Busch <keith.busch@intel.com>
---
 drivers/nvme/host/core.c | 22 +++++++++++++++++++---
 drivers/nvme/host/nvme.h |  4 ++++
 2 files changed, 23 insertions(+), 3 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index d352145..c04df80 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -1113,15 +1113,15 @@ static struct nvme_id_ns *nvme_identify_ns(struct nvme_ctrl *ctrl,
 	return id;
 }
 
-static int nvme_set_features(struct nvme_ctrl *dev, unsigned fid, unsigned dword11,
-		      void *buffer, size_t buflen, u32 *result)
+static int nvme_features(struct nvme_ctrl *dev, u8 op, unsigned fid,
+		unsigned dword11, void *buffer, size_t buflen, u32 *result)
 {
 	struct nvme_command c;
 	union nvme_result res;
 	int ret;
 
 	memset(&c, 0, sizeof(c));
-	c.features.opcode = nvme_admin_set_features;
+	c.features.opcode = op;
 	c.features.fid = cpu_to_le32(fid);
 	c.features.dword11 = cpu_to_le32(dword11);
 
@@ -1132,6 +1132,22 @@ static int nvme_set_features(struct nvme_ctrl *dev, unsigned fid, unsigned dword
 	return ret;
 }
 
+int nvme_set_features(struct nvme_ctrl *dev, unsigned fid, unsigned dword11,
+		      void *buffer, size_t buflen, u32 *result)
+{
+	return nvme_features(dev, nvme_admin_set_features, fid, dword11, buffer,
+			     buflen, result);
+}
+EXPORT_SYMBOL_GPL(nvme_set_features);
+
+int nvme_get_features(struct nvme_ctrl *dev, unsigned fid, unsigned dword11,
+		      void *buffer, size_t buflen, u32 *result)
+{
+	return nvme_features(dev, nvme_admin_get_features, fid, dword11, buffer,
+			     buflen, result);
+}
+EXPORT_SYMBOL_GPL(nvme_get_features);
+
 int nvme_set_queue_count(struct nvme_ctrl *ctrl, int *count)
 {
 	u32 q_count = (*count - 1) | ((*count - 1) << 16);
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index 56bba7a..bb673b8 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -459,6 +459,10 @@ int __nvme_submit_sync_cmd(struct request_queue *q, struct nvme_command *cmd,
 		union nvme_result *result, void *buffer, unsigned bufflen,
 		unsigned timeout, int qid, int at_head,
 		blk_mq_req_flags_t flags, bool poll);
+int nvme_set_features(struct nvme_ctrl *dev, unsigned fid, unsigned dword11,
+		      void *buffer, size_t buflen, u32 *result);
+int nvme_get_features(struct nvme_ctrl *dev, unsigned fid, unsigned dword11,
+		      void *buffer, size_t buflen, u32 *result);
 int nvme_set_queue_count(struct nvme_ctrl *ctrl, int *count);
 void nvme_stop_keep_alive(struct nvme_ctrl *ctrl);
 int nvme_reset_ctrl(struct nvme_ctrl *ctrl);
-- 
2.7.4


* [PATCH v2 2/4] nvme: add thermal zone infrastructure
From: Akinobu Mita @ 2019-05-21 16:04 UTC
  To: linux-nvme, linux-pm
  Cc: Akinobu Mita, Zhang Rui, Eduardo Valentin, Daniel Lezcano,
	Keith Busch, Jens Axboe, Christoph Hellwig, Sagi Grimberg,
	Minwoo Im, Kenneth Heitke, Chaitanya Kulkarni

The NVMe controller reports up to nine temperature values in the SMART /
Health log page (the composite temperature and temperature sensor 1 through
temperature sensor 8).
The temperature threshold feature (Feature Identifier 04h) configures the
asynchronous event request command to complete when a temperature crosses
its corresponding temperature threshold.

This adds infrastructure to provide these temperatures and thresholds via
thermal zone devices.

nvme_thermal_zones_register() creates up to nine thermal zone devices: one
for the composite temperature and one for each implemented temperature
sensor.
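
The selection rule can be sketched in plain C as below. It follows the loop
in this patch, where the composite temperature always gets a zone and sensor
N gets one only if its field in the SMART log is non-zero. The function name
implemented_zones() and the bitmask return value are hypothetical, chosen
only to make the example checkable.

```c
#include <assert.h>
#include <stdint.h>

#define NR_TEMP_SENSORS 8

/*
 * Hypothetical helper: bit 0 stands for the composite temperature,
 * bit N for temperature sensor N (1-8). Values are already in CPU order.
 */
static unsigned int implemented_zones(const uint16_t *temp_sensor)
{
	unsigned int mask = 1u; /* the composite temperature always gets a zone */
	unsigned int i;

	assert(temp_sensor);

	for (i = 1; i <= NR_TEMP_SENSORS; i++) {
		/* An implemented sensor reports a non-zero value. */
		if (temp_sensor[i - 1])
			mask |= 1u << i;
	}

	return mask;
}
```

For a log in which only sensors 1 and 3 are non-zero, the result is 0xb,
that is, zones for the composite temperature, sensor 1 and sensor 3.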

/sys/class/thermal/thermal_zone[0-*]:
    |---temp: Temperature
    |---trip_point_0_temp: Over temperature threshold

Each thermal_zone[0-*] directory contains a "device" symlink to the
corresponding nvme device. Conversely, the following symlinks to the thermal
zone devices are created in the nvme device sysfs directory.

- nvme_temp0: Composite temperature
- nvme_temp1: Temperature sensor 1
...
- nvme_temp8: Temperature sensor 8
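
The mapping between these names and the sensor index can be sketched as a
round trip through snprintf() and sscanf(), which is how the patch derives
the sensor from the thermal zone type. The function names sensor_to_type()
and type_to_sensor() are hypothetical; UINT_MAX as the "no match" value
follows the patch.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>

/* Hypothetical helper: format the zone type for sensor 0 (composite) to 8. */
static int sensor_to_type(unsigned int sensor, char *buf, size_t len)
{
	assert(buf && len > 0);

	return snprintf(buf, len, "nvme_temp%u", sensor);
}

/* Hypothetical helper: parse the index back, UINT_MAX if the name differs. */
static unsigned int type_to_sensor(const char *type)
{
	unsigned int sensor;

	if (sscanf(type, "nvme_temp%u", &sensor) != 1)
		return UINT_MAX;

	return sensor;
}
```

A name that does not start with "nvme_temp" yields UINT_MAX, which the range
checks on the sensor index then reject.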

nvme_thermal_zones_unregister() removes the registered thermal zone devices
and symlinks.
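
NVMe reports temperatures in whole Kelvin while the thermal sysfs files use
millidegrees Celsius, so the zone callbacks convert in both directions. The
sketch below shows one way to do that conversion. It assumes an offset of
2732 deci-Kelvin for 0 degrees Celsius, which is believed to match the
deci-Kelvin macros the patch builds on but is an assumption here; the
function names are hypothetical.

```c
#include <assert.h>

/* Assumed offset: 0 degrees Celsius taken as 273.2 K, in deci-Kelvin. */
#define DECI_KELVIN_OFFSET 2732

/* Hypothetical helper: whole Kelvin to millidegrees Celsius. */
static int kelvin_to_millicelsius(int kelvin)
{
	return (kelvin * 10 - DECI_KELVIN_OFFSET) * 100;
}

/* Hypothetical helper: millidegrees Celsius to Kelvin, rounded to nearest. */
static int millicelsius_to_kelvin(int millicelsius)
{
	int deci_kelvin = millicelsius / 100 + DECI_KELVIN_OFFSET;

	return (deci_kelvin + 5) / 10;
}
```

Under that assumption 300 K maps to 26800 millidegrees Celsius, and a trip
point written as 85000 is stored as 358 K, which reads back as 84800. The
round trip therefore loses precision below one Kelvin.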

Cc: Zhang Rui <rui.zhang@intel.com>
Cc: Eduardo Valentin <edubezval@gmail.com>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Jens Axboe <axboe@fb.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: Minwoo Im <minwoo.im.dev@gmail.com>
Cc: Kenneth Heitke <kenneth.heitke@intel.com>
Cc: Chaitanya Kulkarni <Chaitanya.Kulkarni@wdc.com>
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
---
* v2
- s/correspoinding/corresponding/ typo in commit log
- Borrowed nvme_get_features() from Keith's patch
- Temperature threshold notification is split into a separate patch
- Change the data type of 'sensor' to unsigned
- Add BUILD_BUG_ON for the array size of tzdev member in nvme_ctrl
- Add WARN_ON_ONCE for paranoid checks
- Fix off-by-one error in nvme_get_temp
- Validate 'sensor' where the value is actually used
- Define and utilize two enums related to the temperature threshold feature
- Remove hysteresis value for this trip point and don't utilize the under
  temperature threshold
- Print error message for thermal_zone_device_register() failure
- Add function comments for nvme_thermal_zones_{,un}register
- Suppress non-fatal errors from nvme_thermal_zones_register()
- Add comment about implemented temperature sensors 
- Instead of creating a new 'thermal_work', append async smart event's
  action to the existing async_event_work
- Add comment for tzdev member in nvme_ctrl

 drivers/nvme/host/core.c | 265 +++++++++++++++++++++++++++++++++++++++++++++++
 drivers/nvme/host/nvme.h |  27 +++++
 include/linux/nvme.h     |   5 +
 3 files changed, 297 insertions(+)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index c04df80..0ec303c 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -2179,6 +2179,271 @@ static void nvme_set_latency_tolerance(struct device *dev, s32 val)
 	}
 }
 
+#ifdef CONFIG_THERMAL
+
+static int nvme_get_temp(struct nvme_ctrl *ctrl, unsigned int sensor, int *temp)
+{
+	struct nvme_smart_log *log;
+	int ret;
+
+	BUILD_BUG_ON(ARRAY_SIZE(log->temp_sensor) + 1 !=
+		     ARRAY_SIZE(ctrl->tzdev));
+
+	if (WARN_ON_ONCE(sensor > ARRAY_SIZE(log->temp_sensor)))
+		return -EINVAL;
+
+	log = kzalloc(sizeof(*log), GFP_KERNEL);
+	if (!log)
+		return -ENOMEM;
+
+	ret = nvme_get_log(ctrl, NVME_NSID_ALL, NVME_LOG_SMART, 0,
+			   log, sizeof(*log), 0);
+	if (ret) {
+		ret = ret > 0 ? -EINVAL : ret;
+		goto free_log;
+	}
+
+	if (sensor)
+		*temp = le16_to_cpu(log->temp_sensor[sensor - 1]);
+	else
+		*temp = get_unaligned_le16(log->temperature);
+
+	if (!*temp)
+		ret = -EINVAL;
+
+free_log:
+	kfree(log);
+
+	return ret;
+}
+
+static unsigned int nvme_tz_type_to_sensor(const char *type)
+{
+	unsigned int sensor;
+
+	if (sscanf(type, "nvme_temp%u", &sensor) != 1)
+		return UINT_MAX;
+
+	return sensor;
+}
+
+#define KELVIN_TO_MILLICELSIUS(t) DECI_KELVIN_TO_MILLICELSIUS((t) * 10)
+#define MILLICELSIUS_TO_KELVIN(t) ((MILLICELSIUS_TO_DECI_KELVIN(t) + 5) / 10)
+
+static int nvme_tz_get_temp(struct thermal_zone_device *tzdev,
+			    int *temp)
+{
+	unsigned int sensor = nvme_tz_type_to_sensor(tzdev->type);
+	struct nvme_ctrl *ctrl = tzdev->devdata;
+	int ret;
+
+	ret = nvme_get_temp(ctrl, sensor, temp);
+	if (!ret)
+		*temp = KELVIN_TO_MILLICELSIUS(*temp);
+
+	return ret;
+}
+
+static int nvme_tz_get_trip_type(struct thermal_zone_device *tzdev,
+				 int trip, enum thermal_trip_type *type)
+{
+	*type = THERMAL_TRIP_ACTIVE;
+
+	return 0;
+}
+
+static int nvme_get_over_temp_thresh(struct nvme_ctrl *ctrl,
+				     unsigned int sensor, int *temp)
+{
+	unsigned int threshold = sensor << NVME_TEMP_THRESH_SELECT_SHIFT;
+	int status;
+	int ret;
+
+	if (WARN_ON_ONCE(sensor >= ARRAY_SIZE(ctrl->tzdev)))
+		return -EINVAL;
+
+	ret = nvme_get_features(ctrl, NVME_FEAT_TEMP_THRESH, threshold, NULL, 0,
+				&status);
+	if (!ret)
+		*temp = status & NVME_TEMP_THRESH_MASK;
+
+	return ret > 0 ? -EINVAL : ret;
+}
+
+static int nvme_set_over_temp_thresh(struct nvme_ctrl *ctrl,
+				     unsigned int sensor, int temp)
+{
+	unsigned int threshold = sensor << NVME_TEMP_THRESH_SELECT_SHIFT;
+	int status;
+	int ret;
+
+	if (WARN_ON_ONCE(sensor >= ARRAY_SIZE(ctrl->tzdev)))
+		return -EINVAL;
+
+	if (temp > NVME_TEMP_THRESH_MASK)
+		return -EINVAL;
+
+	threshold |= temp & NVME_TEMP_THRESH_MASK;
+
+	ret = nvme_set_features(ctrl, NVME_FEAT_TEMP_THRESH, threshold, NULL, 0,
+				&status);
+
+	return ret > 0 ? -EINVAL : ret;
+}
+
+static int nvme_tz_get_trip_temp(struct thermal_zone_device *tzdev,
+				 int trip, int *temp)
+{
+	unsigned int sensor = nvme_tz_type_to_sensor(tzdev->type);
+	struct nvme_ctrl *ctrl = tzdev->devdata;
+	int ret;
+
+	ret = nvme_get_over_temp_thresh(ctrl, sensor, temp);
+	if (!ret)
+		*temp = KELVIN_TO_MILLICELSIUS(*temp);
+
+	return ret;
+}
+
+static int nvme_tz_set_trip_temp(struct thermal_zone_device *tzdev,
+				 int trip, int temp)
+{
+	unsigned int sensor = nvme_tz_type_to_sensor(tzdev->type);
+	struct nvme_ctrl *ctrl = tzdev->devdata;
+
+	temp = MILLICELSIUS_TO_KELVIN(temp);
+
+	return nvme_set_over_temp_thresh(ctrl, sensor, temp);
+}
+
+static struct thermal_zone_device_ops nvme_tz_ops = {
+	.get_temp = nvme_tz_get_temp,
+	.get_trip_type = nvme_tz_get_trip_type,
+	.get_trip_temp = nvme_tz_get_trip_temp,
+	.set_trip_temp = nvme_tz_set_trip_temp,
+};
+
+static struct thermal_zone_params nvme_tz_params = {
+	.governor_name = "user_space",
+	.no_hwmon = true,
+};
+
+static struct thermal_zone_device *
+nvme_thermal_zone_register(struct nvme_ctrl *ctrl, unsigned int sensor)
+{
+	struct thermal_zone_device *tzdev;
+	char type[THERMAL_NAME_LENGTH];
+	int ret;
+
+	snprintf(type, sizeof(type), "nvme_temp%d", sensor);
+
+	tzdev = thermal_zone_device_register(type, 1, 1, ctrl, &nvme_tz_ops,
+					     &nvme_tz_params, 0, 0);
+	if (IS_ERR(tzdev)) {
+		dev_err(ctrl->device,
+			"Failed to register thermal zone device: %ld\n",
+			PTR_ERR(tzdev));
+		return tzdev;
+	}
+
+	ret = sysfs_create_link(&ctrl->ctrl_device.kobj,
+				&tzdev->device.kobj, type);
+	if (ret)
+		goto device_unregister;
+
+	ret = sysfs_create_link(&tzdev->device.kobj,
+				&ctrl->ctrl_device.kobj, "device");
+	if (ret)
+		goto remove_link;
+
+	return tzdev;
+
+remove_link:
+	sysfs_remove_link(&ctrl->ctrl_device.kobj, type);
+device_unregister:
+	thermal_zone_device_unregister(tzdev);
+
+	return ERR_PTR(ret);
+}
+
+/**
+ * nvme_thermal_zones_register() - register nvme thermal zone devices
+ * @ctrl: controller instance
+ *
+ * This function creates up to nine thermal zone devices for all implemented
+ * temperature sensors including the composite temperature.
+ * Each thermal zone device provides a single trip point temperature that is
+ * associated with an over temperature threshold.
+ */
+int nvme_thermal_zones_register(struct nvme_ctrl *ctrl)
+{
+	struct nvme_smart_log *log;
+	int ret;
+	int i;
+
+	log = kzalloc(sizeof(*log), GFP_KERNEL);
+	if (!log)
+		return 0; /* non-fatal error */
+
+	ret = nvme_get_log(ctrl, NVME_NSID_ALL, NVME_LOG_SMART, 0,
+			   log, sizeof(*log), 0);
+	if (ret) {
+		dev_err(ctrl->device, "Failed to get SMART log: %d\n", ret);
+		ret = ret > 0 ? -EINVAL : ret;
+		goto free_log;
+	}
+
+	for (i = 0; i < ARRAY_SIZE(ctrl->tzdev); i++) {
+		struct thermal_zone_device *tzdev;
+
+		/*
+		 * All implemented temperature sensors report a non-zero value
+		 * in temperature sensor fields in the smart log page.
+		 */
+		if (i && !le16_to_cpu(log->temp_sensor[i - 1]))
+			continue;
+		if (ctrl->tzdev[i])
+			continue;
+
+		tzdev = nvme_thermal_zone_register(ctrl, i);
+		if (!IS_ERR(tzdev))
+			ctrl->tzdev[i] = tzdev;
+	}
+
+free_log:
+	kfree(log);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(nvme_thermal_zones_register);
+
+/**
+ * nvme_thermal_zones_unregister() - unregister nvme thermal zone devices
+ * @ctrl: controller instance
+ *
+ * This function removes the registered thermal zone devices and symlinks.
+ */
+void nvme_thermal_zones_unregister(struct nvme_ctrl *ctrl)
+{
+	int i;
+
+	for (i = 0; i < ARRAY_SIZE(ctrl->tzdev); i++) {
+		struct thermal_zone_device *tzdev = ctrl->tzdev[i];
+
+		if (!tzdev)
+			continue;
+
+		sysfs_remove_link(&tzdev->device.kobj, "device");
+		sysfs_remove_link(&ctrl->ctrl_device.kobj, tzdev->type);
+		thermal_zone_device_unregister(tzdev);
+
+		ctrl->tzdev[i] = NULL;
+	}
+}
+EXPORT_SYMBOL_GPL(nvme_thermal_zones_unregister);
+
+#endif /* CONFIG_THERMAL */
+
 struct nvme_core_quirk_entry {
 	/*
 	 * NVMe model and firmware strings are padded with spaces.  For
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index bb673b8..0bc4e85 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -15,6 +15,7 @@
 #include <linux/sed-opal.h>
 #include <linux/fault-inject.h>
 #include <linux/rcupdate.h>
+#include <linux/thermal.h>
 
 extern unsigned int nvme_io_timeout;
 #define NVME_IO_TIMEOUT	(nvme_io_timeout * HZ)
@@ -248,6 +249,14 @@ struct nvme_ctrl {
 
 	struct page *discard_page;
 	unsigned long discard_page_busy;
+
+#ifdef CONFIG_THERMAL
+	/*
+	 * tzdev[0]: composite temperature
+	 * tzdev[1-8]: temperature sensor 1 through 8
+	 */
+	struct thermal_zone_device *tzdev[9];
+#endif
 };
 
 enum nvme_iopolicy {
@@ -559,6 +568,24 @@ static inline void nvme_mpath_stop(struct nvme_ctrl *ctrl)
 }
 #endif /* CONFIG_NVME_MULTIPATH */
 
+#ifdef CONFIG_THERMAL
+
+int nvme_thermal_zones_register(struct nvme_ctrl *ctrl);
+void nvme_thermal_zones_unregister(struct nvme_ctrl *ctrl);
+
+#else
+
+static inline int nvme_thermal_zones_register(struct nvme_ctrl *ctrl)
+{
+	return 0;
+}
+
+static inline void nvme_thermal_zones_unregister(struct nvme_ctrl *ctrl)
+{
+}
+
+#endif /* CONFIG_THERMAL */
+
 #ifdef CONFIG_NVM
 int nvme_nvm_register(struct nvme_ns *ns, char *disk_name, int node);
 void nvme_nvm_unregister(struct nvme_ns *ns);
diff --git a/include/linux/nvme.h b/include/linux/nvme.h
index 658ac75..54f0a13 100644
--- a/include/linux/nvme.h
+++ b/include/linux/nvme.h
@@ -780,6 +780,11 @@ struct nvme_write_zeroes_cmd {
 
 /* Features */
 
+enum {
+	NVME_TEMP_THRESH_MASK		= 0xffff,
+	NVME_TEMP_THRESH_SELECT_SHIFT	= 16,
+};
+
 struct nvme_feat_auto_pst {
 	__le64 entries[32];
 };
-- 
2.7.4


diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index bb673b8..0bc4e85 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -15,6 +15,7 @@
 #include <linux/sed-opal.h>
 #include <linux/fault-inject.h>
 #include <linux/rcupdate.h>
+#include <linux/thermal.h>
 
 extern unsigned int nvme_io_timeout;
 #define NVME_IO_TIMEOUT	(nvme_io_timeout * HZ)
@@ -248,6 +249,14 @@ struct nvme_ctrl {
 
 	struct page *discard_page;
 	unsigned long discard_page_busy;
+
+#ifdef CONFIG_THERMAL
+	/*
+	 * tzdev[0]: composite temperature
+	 * tzdev[1-8]: temperature sensor 1 through 8
+	 */
+	struct thermal_zone_device *tzdev[9];
+#endif
 };
 
 enum nvme_iopolicy {
@@ -559,6 +568,24 @@ static inline void nvme_mpath_stop(struct nvme_ctrl *ctrl)
 }
 #endif /* CONFIG_NVME_MULTIPATH */
 
+#ifdef CONFIG_THERMAL
+
+int nvme_thermal_zones_register(struct nvme_ctrl *ctrl);
+void nvme_thermal_zones_unregister(struct nvme_ctrl *ctrl);
+
+#else
+
+static inline int nvme_thermal_zones_register(struct nvme_ctrl *ctrl)
+{
+	return 0;
+}
+
+static inline void nvme_thermal_zones_unregister(struct nvme_ctrl *ctrl)
+{
+}
+
+#endif /* CONFIG_THERMAL */
+
 #ifdef CONFIG_NVM
 int nvme_nvm_register(struct nvme_ns *ns, char *disk_name, int node);
 void nvme_nvm_unregister(struct nvme_ns *ns);
diff --git a/include/linux/nvme.h b/include/linux/nvme.h
index 658ac75..54f0a13 100644
--- a/include/linux/nvme.h
+++ b/include/linux/nvme.h
@@ -780,6 +780,11 @@ struct nvme_write_zeroes_cmd {
 
 /* Features */
 
+enum {
+	NVME_TEMP_THRESH_MASK		= 0xffff,
+	NVME_TEMP_THRESH_SELECT_SHIFT	= 16,
+};
+
 struct nvme_feat_auto_pst {
 	__le64 entries[32];
 };
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v2 3/4] nvme: notify thermal framework when temperature threshold events occur
  2019-05-21 16:04 ` Akinobu Mita
@ 2019-05-21 16:04   ` Akinobu Mita
  -1 siblings, 0 replies; 38+ messages in thread
From: Akinobu Mita @ 2019-05-21 16:04 UTC (permalink / raw)
  To: linux-nvme, linux-pm
  Cc: Akinobu Mita, Zhang Rui, Eduardo Valentin, Daniel Lezcano,
	Keith Busch, Jens Axboe, Christoph Hellwig, Sagi Grimberg,
	Minwoo Im, Kenneth Heitke, Chaitanya Kulkarni

This enables the reporting of asynchronous events from the controller when
the temperature reaches or exceeds a temperature threshold.

In the case of the temperature threshold conditions, this notifies the
thermal framework.
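
For readers following along, the decode step described above can be
sketched in plain userspace C. The constants mirror the enums this patch
adds to include/linux/nvme.h; the unprefixed names and the helper
aer_is_temp_thresh() are hypothetical and exist only for this illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values mirror the enums added by this patch (names shortened here). */
enum {
	AER_TYPE_MASK		= 0x7,	/* event type: bits 2:0 */
	AER_INFO_SHIFT		= 8,	/* event information: bits 15:8 */
	AER_INFO_MASK		= 0xff,
	AER_SMART		= 1,	/* SMART / Health status event type */
	AER_SMART_TEMP_THRESH	= 0x01,	/* temperature threshold info code */
};

/*
 * Hypothetical helper: return true when completion dword 0 of an
 * Asynchronous Event Request reports a temperature threshold event.
 */
static bool aer_is_temp_thresh(uint32_t result)
{
	uint32_t type = result & AER_TYPE_MASK;
	uint32_t info = (result >> AER_INFO_SHIFT) & AER_INFO_MASK;

	return type == AER_SMART && info == AER_SMART_TEMP_THRESH;
}
```

Only the combination of SMART type and temperature threshold information
triggers the notification; any other event falls through to the existing
uevent handling.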

Cc: Zhang Rui <rui.zhang@intel.com>
Cc: Eduardo Valentin <edubezval@gmail.com>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Jens Axboe <axboe@fb.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: Minwoo Im <minwoo.im.dev@gmail.com>
Cc: Kenneth Heitke <kenneth.heitke@intel.com>
Cc: Chaitanya Kulkarni <Chaitanya.Kulkarni@wdc.com>
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
---
* v2
- New patch since v2
- Extracted from 'add thermal zone infrastructure' patch

 drivers/nvme/host/core.c | 30 ++++++++++++++++++++++++++++++
 include/linux/nvme.h     |  7 +++++++
 2 files changed, 37 insertions(+)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 0ec303c..a86f9f4 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -1184,6 +1184,9 @@ static void nvme_enable_aen(struct nvme_ctrl *ctrl)
 	u32 result, supported_aens = ctrl->oaes & NVME_AEN_SUPPORTED;
 	int status;
 
+	if (IS_ENABLED(CONFIG_THERMAL))
+		supported_aens |= NVME_SMART_CRIT_TEMPERATURE;
+
 	if (!supported_aens)
 		return;
 
@@ -2442,6 +2445,22 @@ void nvme_thermal_zones_unregister(struct nvme_ctrl *ctrl)
 }
 EXPORT_SYMBOL_GPL(nvme_thermal_zones_unregister);
 
+static void nvme_thermal_notify_framework(struct nvme_ctrl *ctrl)
+{
+	int i;
+
+	for (i = 0; i < ARRAY_SIZE(ctrl->tzdev); i++) {
+		if (ctrl->tzdev[i])
+			thermal_notify_framework(ctrl->tzdev[i], 0);
+	}
+}
+
+#else
+
+static void nvme_thermal_notify_framework(struct nvme_ctrl *ctrl)
+{
+}
+
 #endif /* CONFIG_THERMAL */
 
 struct nvme_core_quirk_entry {
@@ -3857,6 +3876,16 @@ void nvme_remove_namespaces(struct nvme_ctrl *ctrl)
 }
 EXPORT_SYMBOL_GPL(nvme_remove_namespaces);
 
+static void nvme_handle_aen_smart(struct nvme_ctrl *ctrl, u32 result)
+{
+	u32 aer_type = result & NVME_AER_TYPE_MASK;
+	u32 aer_info = (result >> NVME_AER_INFO_SHIFT) & NVME_AER_INFO_MASK;
+
+	if (aer_type == NVME_AER_SMART &&
+	    aer_info == NVME_AER_SMART_TEMP_THRESH)
+		nvme_thermal_notify_framework(ctrl);
+}
+
 static void nvme_aen_uevent(struct nvme_ctrl *ctrl)
 {
 	char *envp[2] = { NULL, NULL };
@@ -3878,6 +3907,7 @@ static void nvme_async_event_work(struct work_struct *work)
 	struct nvme_ctrl *ctrl =
 		container_of(work, struct nvme_ctrl, async_event_work);
 
+	nvme_handle_aen_smart(ctrl, ctrl->aen_result);
 	nvme_aen_uevent(ctrl);
 	ctrl->ops->submit_async_event(ctrl);
 }
diff --git a/include/linux/nvme.h b/include/linux/nvme.h
index 54f0a13..8e7d599 100644
--- a/include/linux/nvme.h
+++ b/include/linux/nvme.h
@@ -507,6 +507,7 @@ enum {
 };
 
 enum {
+	NVME_AER_TYPE_MASK		= 0x7,
 	NVME_AER_ERROR			= 0,
 	NVME_AER_SMART			= 1,
 	NVME_AER_NOTICE			= 2,
@@ -515,6 +516,12 @@ enum {
 };
 
 enum {
+	NVME_AER_INFO_SHIFT		= 8,
+	NVME_AER_INFO_MASK		= 0xff,
+	NVME_AER_SMART_TEMP_THRESH	= 0x01,
+};
+
+enum {
 	NVME_AER_NOTICE_NS_CHANGED	= 0x00,
 	NVME_AER_NOTICE_FW_ACT_STARTING = 0x01,
 	NVME_AER_NOTICE_ANA		= 0x03,
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v2 4/4] nvme-pci: support thermal zone
  2019-05-21 16:04 ` Akinobu Mita
@ 2019-05-21 16:04   ` Akinobu Mita
  -1 siblings, 0 replies; 38+ messages in thread
From: Akinobu Mita @ 2019-05-21 16:04 UTC (permalink / raw)
  To: linux-nvme, linux-pm
  Cc: Akinobu Mita, Zhang Rui, Eduardo Valentin, Daniel Lezcano,
	Keith Busch, Jens Axboe, Christoph Hellwig, Sagi Grimberg,
	Minwoo Im, Kenneth Heitke, Chaitanya Kulkarni

This enables the use of thermal zone interfaces for NVMe
temperature sensors.
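
One detail worth keeping in mind when using these interfaces: thermal zones
report millidegrees Celsius, while the NVMe SMART log and the temperature
threshold feature use Kelvin. A minimal sketch of the conversion, assuming
integer arithmetic with truncation; both helper names are hypothetical and
are not the macros used by the series.

```c
/*
 * Hypothetical helpers. 273150 is 273.15 K expressed in millikelvin,
 * so 0 degrees Celsius corresponds to 273.15 K.
 */
static int kelvin_to_millicelsius(int kelvin)
{
	return kelvin * 1000 - 273150;
}

/* Truncates toward zero for non-negative results; NVMe stores whole Kelvin. */
static int millicelsius_to_kelvin(int millicelsius)
{
	return (millicelsius + 273150) / 1000;
}
```

A trip point written as 85000 through the thermal zone interface would
therefore be programmed as 358 K under these assumptions.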

Cc: Zhang Rui <rui.zhang@intel.com>
Cc: Eduardo Valentin <edubezval@gmail.com>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Jens Axboe <axboe@fb.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: Minwoo Im <minwoo.im.dev@gmail.com>
Cc: Kenneth Heitke <kenneth.heitke@intel.com>
Cc: Chaitanya Kulkarni <Chaitanya.Kulkarni@wdc.com>
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
---
* v2
- Call nvme_thermal_zones_unregister() earlier than the last reference
  release

 drivers/nvme/host/pci.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 04084b9..108b022 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -2555,6 +2555,10 @@ static void nvme_reset_work(struct work_struct *work)
 		dev->ctrl.opal_dev = NULL;
 	}
 
+	result = nvme_thermal_zones_register(&dev->ctrl);
+	if (result < 0)
+		goto out;
+
 	if (dev->ctrl.oacs & NVME_CTRL_OACS_DBBUF_SUPP) {
 		result = nvme_dbbuf_dma_alloc(dev);
 		if (result)
@@ -2833,6 +2837,7 @@ static void nvme_remove(struct pci_dev *pdev)
 	flush_work(&dev->ctrl.reset_work);
 	nvme_stop_ctrl(&dev->ctrl);
 	nvme_remove_namespaces(&dev->ctrl);
+	nvme_thermal_zones_unregister(&dev->ctrl);
 	nvme_dev_disable(dev, true, false);
 	nvme_release_cmb(dev);
 	nvme_free_host_mem(dev);
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* Re: [PATCH v2 2/4] nvme: add thermal zone infrastructure
  2019-05-21 16:04   ` Akinobu Mita
@ 2019-05-21 16:15     ` Keith Busch
  -1 siblings, 0 replies; 38+ messages in thread
From: Keith Busch @ 2019-05-21 16:15 UTC (permalink / raw)
  To: Akinobu Mita
  Cc: linux-nvme, linux-pm, Keith Busch, Sagi Grimberg,
	Chaitanya Kulkarni, Jens Axboe, Kenneth Heitke, Daniel Lezcano,
	Eduardo Valentin, Minwoo Im, Zhang Rui, Christoph Hellwig

On Wed, May 22, 2019 at 01:04:07AM +0900, Akinobu Mita wrote:
> +int nvme_thermal_zones_register(struct nvme_ctrl *ctrl)
> +{
> +	struct nvme_smart_log *log;
> +	int ret;
> +	int i;
> +
> +	log = kzalloc(sizeof(*log), GFP_KERNEL);
> +	if (!log)
> +		return 0; /* non-fatal error */
> +
> +	ret = nvme_get_log(ctrl, NVME_NSID_ALL, NVME_LOG_SMART, 0,
> +			   log, sizeof(*log), 0);
> +	if (ret) {
> +		dev_err(ctrl->device, "Failed to get SMART log: %d\n", ret);
> +		ret = ret > 0 ? -EINVAL : ret;

A ret > 0 means the device provided a response, so don't return a
negative for that condition, please. That's just going to break
controllers that don't provide smart data, like qemu.
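
A minimal userspace sketch of the return-value handling suggested above,
assuming the usual convention that a negative value is a kernel errno and a
positive value is an NVMe status code returned by the device. The helper
name classify_get_log_ret() and its 'skip' output are hypothetical.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical helper: decide what a registration path should propagate
 * after reading the SMART log.
 *   ret <  0: the command could not be submitted or completed (errno)
 *   ret >  0: the controller responded with an NVMe status code
 *   ret == 0: success, the log contents are valid
 * '*skip' is set when no thermal zones should be registered.
 */
static int classify_get_log_ret(int ret, bool *skip)
{
	*skip = ret != 0;

	/* The device answered: not fatal, just nothing to register. */
	return ret < 0 ? ret : 0;
}
```

With this mapping a controller that rejects the SMART log still finishes
its reset sequence; only transport-level failures are reported to the
caller.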

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v2 1/4] nvme: Export get and set features
  2019-05-21 16:04   ` Akinobu Mita
@ 2019-05-21 17:23     ` Chaitanya Kulkarni
  -1 siblings, 0 replies; 38+ messages in thread
From: Chaitanya Kulkarni @ 2019-05-21 17:23 UTC (permalink / raw)
  To: Akinobu Mita, linux-nvme, linux-pm; +Cc: Keith Busch

On 5/21/19 9:05 AM, Akinobu Mita wrote:
> From: Keith Busch <keith.busch@intel.com>
>
> Future use intends to make use of features, so export these functions. And
> since their implementation is identical except for the opcode, provide
> a new convenience function that implement each.
>
> Signed-off-by: Keith Busch <keith.busch@intel.com>
> ---
>  drivers/nvme/host/core.c | 22 +++++++++++++++++++---
>  drivers/nvme/host/nvme.h |  4 ++++
>  2 files changed, 23 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index d352145..c04df80 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -1113,15 +1113,15 @@ static struct nvme_id_ns *nvme_identify_ns(struct nvme_ctrl *ctrl,
>  	return id;
>  }
>  
> -static int nvme_set_features(struct nvme_ctrl *dev, unsigned fid, unsigned dword11,
> -		      void *buffer, size_t buflen, u32 *result)
> +static int nvme_features(struct nvme_ctrl *dev, u8 op, unsigned fid,
> +		unsigned dword11, void *buffer, size_t buflen, u32 *result)

Your patch generates checkpatch warnings; can we please avoid them?

WARNING: Prefer 'unsigned int' to bare use of 'unsigned'
#28: FILE: drivers/nvme/host/core.c:1116:
+static int nvme_features(struct nvme_ctrl *dev, u8 op, unsigned fid,

WARNING: Prefer 'unsigned int' to bare use of 'unsigned'
#29: FILE: drivers/nvme/host/core.c:1117:
+        unsigned dword11, void *buffer, size_t buflen, u32 *result)

WARNING: Prefer 'unsigned int' to bare use of 'unsigned'
#45: FILE: drivers/nvme/host/core.c:1135:
+int nvme_set_features(struct nvme_ctrl *dev, unsigned fid, unsigned
dword11,

WARNING: Prefer 'unsigned int' to bare use of 'unsigned'
#45: FILE: drivers/nvme/host/core.c:1135:
+int nvme_set_features(struct nvme_ctrl *dev, unsigned fid, unsigned
dword11,

WARNING: Prefer 'unsigned int' to bare use of 'unsigned'
#53: FILE: drivers/nvme/host/core.c:1143:
+int nvme_get_features(struct nvme_ctrl *dev, unsigned fid, unsigned
dword11,

WARNING: Prefer 'unsigned int' to bare use of 'unsigned'
#53: FILE: drivers/nvme/host/core.c:1143:
+int nvme_get_features(struct nvme_ctrl *dev, unsigned fid, unsigned
dword11,

WARNING: Prefer 'unsigned int' to bare use of 'unsigned'
#72: FILE: drivers/nvme/host/nvme.h:462:
+int nvme_set_features(struct nvme_ctrl *dev, unsigned fid, unsigned
dword11,

WARNING: Prefer 'unsigned int' to bare use of 'unsigned'
#72: FILE: drivers/nvme/host/nvme.h:462:
+int nvme_set_features(struct nvme_ctrl *dev, unsigned fid, unsigned
dword11,

WARNING: Prefer 'unsigned int' to bare use of 'unsigned'
#74: FILE: drivers/nvme/host/nvme.h:464:
+int nvme_get_features(struct nvme_ctrl *dev, unsigned fid, unsigned
dword11,

WARNING: Prefer 'unsigned int' to bare use of 'unsigned'
#74: FILE: drivers/nvme/host/nvme.h:464:
+int nvme_get_features(struct nvme_ctrl *dev, unsigned fid, unsigned
dword11,

total: 0 errors, 10 warnings, 50 lines checked

NOTE: For some of the reported defects, checkpatch may be able to
      mechanically convert to the typical style using --fix or
--fix-inplace.

0001-nvme-fix-memory-leak-for-power-latency-tolerance.patch has style
problems, please review.

NOTE: If any of the errors are false positives, please report
      them to the maintainer, see CHECKPATCH in MAINTAINERS.

>  {
>  	struct nvme_command c;
>  	union nvme_result res;
>  	int ret;
>  
>  	memset(&c, 0, sizeof(c));
> -	c.features.opcode = nvme_admin_set_features;
> +	c.features.opcode = op;
>  	c.features.fid = cpu_to_le32(fid);
>  	c.features.dword11 = cpu_to_le32(dword11);
>  
> @@ -1132,6 +1132,22 @@ static int nvme_set_features(struct nvme_ctrl *dev, unsigned fid, unsigned dword
>  	return ret;
>  }
>  
> +int nvme_set_features(struct nvme_ctrl *dev, unsigned fid, unsigned dword11,
> +		      void *buffer, size_t buflen, u32 *result)
> +{
> +	return nvme_features(dev, nvme_admin_set_features, fid, dword11, buffer,
> +			     buflen, result);
> +}
> +EXPORT_SYMBOL_GPL(nvme_set_features);
> +
> +int nvme_get_features(struct nvme_ctrl *dev, unsigned fid, unsigned dword11,
> +		      void *buffer, size_t buflen, u32 *result)
> +{
> +	return nvme_features(dev, nvme_admin_get_features, fid, dword11, buffer,
> +			     buflen, result);
> +}
> +EXPORT_SYMBOL_GPL(nvme_get_features);
> +
>  int nvme_set_queue_count(struct nvme_ctrl *ctrl, int *count)
>  {
>  	u32 q_count = (*count - 1) | ((*count - 1) << 16);
> diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
> index 56bba7a..bb673b8 100644
> --- a/drivers/nvme/host/nvme.h
> +++ b/drivers/nvme/host/nvme.h
> @@ -459,6 +459,10 @@ int __nvme_submit_sync_cmd(struct request_queue *q, struct nvme_command *cmd,
>  		union nvme_result *result, void *buffer, unsigned bufflen,
>  		unsigned timeout, int qid, int at_head,
>  		blk_mq_req_flags_t flags, bool poll);
> +int nvme_set_features(struct nvme_ctrl *dev, unsigned fid, unsigned dword11,
> +		      void *buffer, size_t buflen, u32 *result);
> +int nvme_get_features(struct nvme_ctrl *dev, unsigned fid, unsigned dword11,
> +		      void *buffer, size_t buflen, u32 *result);
>  int nvme_set_queue_count(struct nvme_ctrl *ctrl, int *count);
>  void nvme_stop_keep_alive(struct nvme_ctrl *ctrl);
>  int nvme_reset_ctrl(struct nvme_ctrl *ctrl);
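
For illustration, here is a userspace sketch of the same wrapper pattern
with the parameters spelled as 'unsigned int', which is what the checkpatch
warnings above ask for. Everything prefixed with mock_ is a hypothetical
stand-in for the driver types; the two opcode values are the Set Features
and Get Features admin opcodes from the NVMe specification.

```c
#include <stddef.h>
#include <stdint.h>

/* Admin opcodes from the NVMe specification. */
enum {
	ADMIN_SET_FEATURES = 0x09,
	ADMIN_GET_FEATURES = 0x0a,
};

/* Hypothetical stand-in for the controller; records the last command. */
struct mock_ctrl {
	uint8_t opcode;
	uint32_t fid;
	uint32_t dword11;
};

/* Common helper: the two public wrappers differ only in the opcode. */
static int mock_features(struct mock_ctrl *dev, uint8_t op, unsigned int fid,
			 unsigned int dword11, uint32_t *result)
{
	dev->opcode = op;
	dev->fid = fid;
	dev->dword11 = dword11;

	/* The result pointer is optional, as in the quoted patch. */
	if (result)
		*result = 0;
	return 0;
}

static int mock_set_features(struct mock_ctrl *dev, unsigned int fid,
			     unsigned int dword11, uint32_t *result)
{
	return mock_features(dev, ADMIN_SET_FEATURES, fid, dword11, result);
}

static int mock_get_features(struct mock_ctrl *dev, unsigned int fid,
			     unsigned int dword11, uint32_t *result)
{
	return mock_features(dev, ADMIN_GET_FEATURES, fid, dword11, result);
}
```

The type spelling does not change behavior; it only brings the prototypes
in line with the kernel coding style that checkpatch enforces.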



^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v2 2/4] nvme: add thermal zone infrastructure
  2019-05-21 16:04   ` Akinobu Mita
@ 2019-05-21 21:05     ` Heitke, Kenneth
  -1 siblings, 0 replies; 38+ messages in thread
From: Heitke, Kenneth @ 2019-05-21 21:05 UTC (permalink / raw)
  To: Akinobu Mita, linux-nvme, linux-pm
  Cc: Keith Busch, Sagi Grimberg, Chaitanya Kulkarni, Jens Axboe,
	Daniel Lezcano, Eduardo Valentin, Minwoo Im, Zhang Rui,
	Christoph Hellwig



On 5/21/2019 10:04 AM, Akinobu Mita wrote:
> +static int nvme_set_over_temp_thresh(struct nvme_ctrl *ctrl,
> +				     unsigned int sensor, int temp)
> +{
> +	unsigned int threshold = sensor << NVME_TEMP_THRESH_SELECT_SHIFT;
> +	int status;
> +	int ret;
> +
> +	if (WARN_ON_ONCE(sensor >= ARRAY_SIZE(ctrl->tzdev)))
> +		return -EINVAL;
> +
> +	if (temp > NVME_TEMP_THRESH_MASK)
> +		return -EINVAL;
> +
> +	threshold |= temp & NVME_TEMP_THRESH_MASK;
> +
> +	ret = nvme_set_features(ctrl, NVME_FEAT_TEMP_THRESH, threshold, NULL, 0,
> +				&status);

I'm not sure about best practices here but since 'status' is unused, you
can pass in a NULL pointer. The called function deals with it correctly.

> +
> +	return ret > 0 ? -EINVAL : ret;
> +}
> +
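
A userspace sketch of the quoted function with the unused 'status' dropped
and NULL passed instead, as suggested above. mock_set_features() is a
hypothetical stand-in for nvme_set_features() that tolerates a NULL result
pointer, and the negative-temperature check is an extra assumption that is
not in the quoted patch.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

enum {
	TEMP_THRESH_MASK		= 0xffff,	/* threshold in Kelvin, bits 15:0 */
	TEMP_THRESH_SELECT_SHIFT	= 16,		/* sensor select field */
	NR_SENSORS			= 9,		/* composite + sensors 1..8 */
};

/* Last dword11 value handed to the mock, kept for inspection. */
static uint32_t last_dword11;

/* Hypothetical stand-in for nvme_set_features(); 'result' may be NULL. */
static int mock_set_features(uint32_t dword11, uint32_t *result)
{
	last_dword11 = dword11;
	if (result)
		*result = 0;
	return 0;
}

static int set_over_temp_thresh(unsigned int sensor, int temp_kelvin)
{
	uint32_t threshold;

	if (sensor >= NR_SENSORS)
		return -EINVAL;

	/* Assumption: also reject negative values, which the mask would hide. */
	if (temp_kelvin < 0 || temp_kelvin > TEMP_THRESH_MASK)
		return -EINVAL;

	threshold = ((uint32_t)sensor << TEMP_THRESH_SELECT_SHIFT) |
		    (uint32_t)temp_kelvin;

	/* No caller needs the completion result, so pass NULL. */
	return mock_set_features(threshold, NULL);
}
```

Sensor 0 selects the composite temperature and 1 through 8 select the
individual sensors, matching the tzdev[] layout in the patch.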

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v2 2/4] nvme: add thermal zone infrastructure
  2019-05-21 21:05     ` Heitke, Kenneth
@ 2019-05-22 15:23       ` Akinobu Mita
  -1 siblings, 0 replies; 38+ messages in thread
From: Akinobu Mita @ 2019-05-22 15:23 UTC (permalink / raw)
  To: Heitke, Kenneth
  Cc: linux-nvme, linux-pm, Keith Busch, Sagi Grimberg,
	Chaitanya Kulkarni, Jens Axboe, Daniel Lezcano, Eduardo Valentin,
	Minwoo Im, Zhang Rui, Christoph Hellwig

On Wed, May 22, 2019 at 6:05 Heitke, Kenneth <kenneth.heitke@intel.com> wrote:
>
>
>
> On 5/21/2019 10:04 AM, Akinobu Mita wrote:
> > +static int nvme_set_over_temp_thresh(struct nvme_ctrl *ctrl,
> > +                                  unsigned int sensor, int temp)
> > +{
> > +     unsigned int threshold = sensor << NVME_TEMP_THRESH_SELECT_SHIFT;
> > +     int status;
> > +     int ret;
> > +
> > +     if (WARN_ON_ONCE(sensor >= ARRAY_SIZE(ctrl->tzdev)))
> > +             return -EINVAL;
> > +
> > +     if (temp > NVME_TEMP_THRESH_MASK)
> > +             return -EINVAL;
> > +
> > +     threshold |= temp & NVME_TEMP_THRESH_MASK;
> > +
> > +     ret = nvme_set_features(ctrl, NVME_FEAT_TEMP_THRESH, threshold, NULL, 0,
> > +                             &status);
>
> I'm not sure about best practices here but since 'status' is unused, you
> can pass in a NULL pointer. The called function deals with it correctly.

OK.  Makes sense.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v2 1/4] nvme: Export get and set features
  2019-05-21 17:23     ` Chaitanya Kulkarni
@ 2019-05-22 15:24       ` Akinobu Mita
  -1 siblings, 0 replies; 38+ messages in thread
From: Akinobu Mita @ 2019-05-22 15:24 UTC (permalink / raw)
  To: Chaitanya Kulkarni; +Cc: linux-nvme, linux-pm, Keith Busch

On Wed, May 22, 2019 at 2:23, Chaitanya Kulkarni <Chaitanya.Kulkarni@wdc.com> wrote:
>
> On 5/21/19 9:05 AM, Akinobu Mita wrote:
> > From: Keith Busch <keith.busch@intel.com>
> >
> > Future use intends to make use of features, so export these functions. And
> > since their implementation is identical except for the opcode, provide
> > a new convenience function that implements each.
> >
> > Signed-off-by: Keith Busch <keith.busch@intel.com>
> > ---
> >  drivers/nvme/host/core.c | 22 +++++++++++++++++++---
> >  drivers/nvme/host/nvme.h |  4 ++++
> >  2 files changed, 23 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> > index d352145..c04df80 100644
> > --- a/drivers/nvme/host/core.c
> > +++ b/drivers/nvme/host/core.c
> > @@ -1113,15 +1113,15 @@ static struct nvme_id_ns *nvme_identify_ns(struct nvme_ctrl *ctrl,
> >       return id;
> >  }
> >
> > -static int nvme_set_features(struct nvme_ctrl *dev, unsigned fid, unsigned dword11,
> > -                   void *buffer, size_t buflen, u32 *result)
> > +static int nvme_features(struct nvme_ctrl *dev, u8 op, unsigned fid,
> > +             unsigned dword11, void *buffer, size_t buflen, u32 *result)
>
> Your patch is generating warnings, can we please avoid these warnings ?
>
> WARNING: Prefer 'unsigned int' to bare use of 'unsigned'
> #28: FILE: drivers/nvme/host/core.c:1116:
> +static int nvme_features(struct nvme_ctrl *dev, u8 op, unsigned fid,

OK.  I'll convert all 'unsigned' to 'unsigned int' in this patch.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v2 2/4] nvme: add thermal zone infrastructure
  2019-05-21 16:15     ` Keith Busch
@ 2019-05-22 15:44       ` Akinobu Mita
  -1 siblings, 0 replies; 38+ messages in thread
From: Akinobu Mita @ 2019-05-22 15:44 UTC (permalink / raw)
  To: Keith Busch
  Cc: linux-nvme, linux-pm, Keith Busch, Sagi Grimberg,
	Chaitanya Kulkarni, Jens Axboe, Kenneth Heitke, Daniel Lezcano,
	Eduardo Valentin, Minwoo Im, Zhang Rui, Christoph Hellwig

On Wed, May 22, 2019 at 1:20, Keith Busch <kbusch@kernel.org> wrote:
>
> On Wed, May 22, 2019 at 01:04:07AM +0900, Akinobu Mita wrote:
> > +int nvme_thermal_zones_register(struct nvme_ctrl *ctrl)
> > +{
> > +     struct nvme_smart_log *log;
> > +     int ret;
> > +     int i;
> > +
> > +     log = kzalloc(sizeof(*log), GFP_KERNEL);
> > +     if (!log)
> > +             return 0; /* non-fatal error */
> > +
> > +     ret = nvme_get_log(ctrl, NVME_NSID_ALL, NVME_LOG_SMART, 0,
> > +                        log, sizeof(*log), 0);
> > +     if (ret) {
> > +             dev_err(ctrl->device, "Failed to get SMART log: %d\n", ret);
> > +             ret = ret > 0 ? -EINVAL : ret;
>
> A ret > 0 means the device provided a response, so don't return a
> negative for that condition, please. That's just going to break
> controllers that don't provide smart data, like qemu.

After looking at __nvme_submit_sync_cmd(), it returns -EINTR if the device
doesn't respond.  So, should this return a negative only when nvme_get_log()
returns -EINTR?

        ret = nvme_get_log();
        if (ret) {
                dev_err(...);
                if (ret != -EINTR)
                        ret = 0;
                goto free_log;
        }

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v2 2/4] nvme: add thermal zone infrastructure
  2019-05-22 15:44       ` Akinobu Mita
@ 2019-05-22 15:44         ` Keith Busch
  -1 siblings, 0 replies; 38+ messages in thread
From: Keith Busch @ 2019-05-22 15:44 UTC (permalink / raw)
  To: Akinobu Mita
  Cc: linux-nvme, linux-pm, Keith Busch, Sagi Grimberg,
	Chaitanya Kulkarni, Jens Axboe, Kenneth Heitke, Daniel Lezcano,
	Eduardo Valentin, Minwoo Im, Zhang Rui, Christoph Hellwig

On Thu, May 23, 2019 at 12:44:04AM +0900, Akinobu Mita wrote:
> On Wed, May 22, 2019 at 1:20, Keith Busch <kbusch@kernel.org> wrote:
> >
> > On Wed, May 22, 2019 at 01:04:07AM +0900, Akinobu Mita wrote:
> > > +int nvme_thermal_zones_register(struct nvme_ctrl *ctrl)
> > > +{
> > > +     struct nvme_smart_log *log;
> > > +     int ret;
> > > +     int i;
> > > +
> > > +     log = kzalloc(sizeof(*log), GFP_KERNEL);
> > > +     if (!log)
> > > +             return 0; /* non-fatal error */
> > > +
> > > +     ret = nvme_get_log(ctrl, NVME_NSID_ALL, NVME_LOG_SMART, 0,
> > > +                        log, sizeof(*log), 0);
> > > +     if (ret) {
> > > +             dev_err(ctrl->device, "Failed to get SMART log: %d\n", ret);
> > > +             ret = ret > 0 ? -EINVAL : ret;
> >
> > A ret > 0 means the device provided a response, so don't return a
> > negative for that condition, please. That's just going to break
> > controllers that don't provide smart data, like qemu.
> 
> After looking at __nvme_submit_sync_cmd(), it returns -EINTR if the device
> doesn't respond.  So, should this return a negative only when nvme_get_log()
> returns -EINTR?
> 
>         ret = nvme_get_log();
>         if (ret) {
>                 dev_err(...);
>                 if (ret != -EINTR)
>                         ret = 0;
>                 goto free_log;
>         }

We return a different negative error if we can't allocate a request,
like what happens if the controller is dead, like a surprise hot remove.

There's a simpler way to look at this: if ret >= 0, we may proceed,
otherwise we're done with this controller. Don't make it any more
complicated than that.
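
The rule above can be sketched in plain C as follows. This is only an
illustration under stated assumptions: thermal_register_result() is a
hypothetical helper, not a function in the driver, and the integer passed in
stands for the return value of a synchronous command such as nvme_get_log().

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical helper classifying the return value of a synchronous
 * NVMe command:
 *   ret <  0: the command could not be issued or completed (for example
 *             -EINTR, or an allocation failure on a dead controller)
 *   ret == 0: success, the log buffer is valid
 *   ret >  0: NVMe status code; the device answered but rejected the
 *             command, e.g. a controller without SMART data
 */
static int thermal_register_result(int ret, bool *have_log)
{
	*have_log = (ret == 0);

	if (ret < 0)
		return ret;	/* done with this controller */

	return 0;		/* ret >= 0: proceed, possibly without zones */
}
```

With this shape a positive status only means "no thermal zones for this
device", while every negative value is passed back to the caller unchanged.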

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v2 2/4] nvme: add thermal zone infrastructure
  2019-05-22 15:44         ` Keith Busch
@ 2019-05-22 15:52           ` Akinobu Mita
  -1 siblings, 0 replies; 38+ messages in thread
From: Akinobu Mita @ 2019-05-22 15:52 UTC (permalink / raw)
  To: Keith Busch
  Cc: linux-nvme, linux-pm, Keith Busch, Sagi Grimberg,
	Chaitanya Kulkarni, Jens Axboe, Kenneth Heitke, Daniel Lezcano,
	Eduardo Valentin, Minwoo Im, Zhang Rui, Christoph Hellwig

On Thu, May 23, 2019 at 0:49, Keith Busch <kbusch@kernel.org> wrote:
>
> On Thu, May 23, 2019 at 12:44:04AM +0900, Akinobu Mita wrote:
> > On Wed, May 22, 2019 at 1:20, Keith Busch <kbusch@kernel.org> wrote:
> > >
> > > On Wed, May 22, 2019 at 01:04:07AM +0900, Akinobu Mita wrote:
> > > > +int nvme_thermal_zones_register(struct nvme_ctrl *ctrl)
> > > > +{
> > > > +     struct nvme_smart_log *log;
> > > > +     int ret;
> > > > +     int i;
> > > > +
> > > > +     log = kzalloc(sizeof(*log), GFP_KERNEL);
> > > > +     if (!log)
> > > > +             return 0; /* non-fatal error */
> > > > +
> > > > +     ret = nvme_get_log(ctrl, NVME_NSID_ALL, NVME_LOG_SMART, 0,
> > > > +                        log, sizeof(*log), 0);
> > > > +     if (ret) {
> > > > +             dev_err(ctrl->device, "Failed to get SMART log: %d\n", ret);
> > > > +             ret = ret > 0 ? -EINVAL : ret;
> > >
> > > A ret > 0 means the device provided a response, so don't return a
> > > negative for that condition, please. That's just going to break
> > > controllers that don't provide smart data, like qemu.
> >
> > After looking at __nvme_submit_sync_cmd(), it returns -EINTR if the device
> > doesn't respond.  So, should this return a negative only when nvme_get_log()
> > returns -EINTR?
> >
> >         ret = nvme_get_log();
> >         if (ret) {
> >                 dev_err(...);
> >                 if (ret != -EINTR)
> >                         ret = 0;
> >                 goto free_log;
> >         }
>
> We return a different negative error if we can't allocate a request,
> like what happens if the controller is dead, like a surprise hot remove.
>
> There's a simpler way to look at this: if ret >= 0, we may proceed,
> otherwise we're done with this controller. Don't make it any more
> complicated than that.

OK.  Sounds good.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v2 4/4] nvme-pci: support thermal zone
  2019-05-21 16:04   ` Akinobu Mita
@ 2019-05-22 17:46     ` Christoph Hellwig
  -1 siblings, 0 replies; 38+ messages in thread
From: Christoph Hellwig @ 2019-05-22 17:46 UTC (permalink / raw)
  To: Akinobu Mita
  Cc: linux-nvme, linux-pm, Keith Busch, Sagi Grimberg,
	Chaitanya Kulkarni, Jens Axboe, Kenneth Heitke, Daniel Lezcano,
	Eduardo Valentin, Minwoo Im, Zhang Rui, Christoph Hellwig

Is there any good reason why we need to call this from the PCIe driver
instead of handling it all in the core?

Sure non-PCIe devices are usually external, but so are some PCIe
devices, so if we really care about that we need some sort of flag
anyway.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v2 4/4] nvme-pci: support thermal zone
  2019-05-22 17:46     ` Christoph Hellwig
@ 2019-05-23 14:21       ` Akinobu Mita
  -1 siblings, 0 replies; 38+ messages in thread
From: Akinobu Mita @ 2019-05-23 14:21 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: linux-nvme, linux-pm, Keith Busch, Sagi Grimberg,
	Chaitanya Kulkarni, Jens Axboe, Kenneth Heitke, Daniel Lezcano,
	Eduardo Valentin, Minwoo Im, Zhang Rui, Christoph Hellwig

On Thu, May 23, 2019 at 2:46, Christoph Hellwig <hch@infradead.org> wrote:
>
> Is there any good reason why we need to call this from the PCIe driver
> instead of handling it all in the core?

OK. I'll move the thermal zone registration and unregistration into the
core module.

That is, call nvme_thermal_zones_register() in nvme_init_identify(), and call
nvme_thermal_zones_unregister() in nvme_stop_ctrl().

> Sure non-PCIe devices are usually external, but so are some PCIe
> devices, so if we really care about that we need some sort of flag
> anyway.

I'm not going to use a flag in the next version.
If there is demand, we can add a 'use_tz' or 'no_tz' flag to nvme_ctrl.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v2 2/4] nvme: add thermal zone infrastructure
  2019-05-21 16:04   ` Akinobu Mita
@ 2019-05-24  2:35     ` Eduardo Valentin
  -1 siblings, 0 replies; 38+ messages in thread
From: Eduardo Valentin @ 2019-05-24  2:35 UTC (permalink / raw)
  To: Akinobu Mita
  Cc: linux-nvme, linux-pm, Zhang Rui, Daniel Lezcano, Keith Busch,
	Jens Axboe, Christoph Hellwig, Sagi Grimberg, Minwoo Im,
	Kenneth Heitke, Chaitanya Kulkarni

Hello Mita,

On Wed, May 22, 2019 at 01:04:07AM +0900, Akinobu Mita wrote:
> The NVMe controller reports up to nine temperature values in the SMART /
> Health log page (the composite temperature and temperature sensor 1 through
> temperature sensor 8).

Is this a fixed number, or should we be more flexible about the number of
sensors?

> The temperature threshold feature (Feature Identifier 04h) configures the
> asynchronous event request command to complete when a temperature
> crosses its corresponding temperature threshold.
> 
> This adds infrastructure to provide these temperatures and thresholds via
> thermal zone devices.
> 
> The nvme_thermal_zones_register() creates up to nine thermal zone devices
> for all implemented temperature sensors including the composite
> temperature.

great!

> 
> /sys/class/thermal/thermal_zone[0-*]:
>     |---temp: Temperature
>     |---trip_point_0_temp: Over temperature threshold
> 
> The thermal_zone[0-*] contains a symlink to the corresponding nvme device.
> On the other hand, the following symlinks to the thermal zone devices are
> created in the nvme device sysfs directory.
> 
> - nvme_temp0: Composite temperature
> - nvme_temp1: Temperature sensor 1
> ...
> - nvme_temp8: Temperature sensor 8
> 
> The nvme_thermal_zones_unregister() removes the registered thermal zone
> devices and symlinks.
> 
> Cc: Zhang Rui <rui.zhang@intel.com>
> Cc: Eduardo Valentin <edubezval@gmail.com>
> Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
> Cc: Keith Busch <keith.busch@intel.com>
> Cc: Jens Axboe <axboe@fb.com>
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Sagi Grimberg <sagi@grimberg.me>
> Cc: Minwoo Im <minwoo.im.dev@gmail.com>
> Cc: Kenneth Heitke <kenneth.heitke@intel.com>
> Cc: Chaitanya Kulkarni <Chaitanya.Kulkarni@wdc.com>
> Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
> ---
> * v2
> - s/correspoinding/corresponding/ typo in commit log
> - Borrowed nvme_get_features() from Keith's patch
> - Temperature threshold notification is split into another patch
> - Change the data type of 'sensor' to unsigned
> - Add BUILD_BUG_ON for the array size of tzdev member in nvme_ctrl
> - Add WARN_ON_ONCE for paranoid checks
> - Fix off-by-one error in nvme_get_temp
> - Validate 'sensor' where the value is actually used
> - Define and utilize two enums related to the temperature threshold feature
> - Remove hysteresis value for this trip point and don't utilize the under
>   temperature threshold
> - Print error message for thermal_zone_device_register() failure
> - Add function comments for nvme_thermal_zones_{,un}register
> - Suppress non-fatal errors from nvme_thermal_zones_register()
> - Add comment about implemented temperature sensors 
> - Instead of creating a new 'thermal_work', append async smart event's
>   action to the existing async_event_work
> - Add comment for tzdev member in nvme_ctrl
> 
>  drivers/nvme/host/core.c | 265 +++++++++++++++++++++++++++++++++++++++++++++++
>  drivers/nvme/host/nvme.h |  27 +++++
>  include/linux/nvme.h     |   5 +
>  3 files changed, 297 insertions(+)
> 
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index c04df80..0ec303c 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -2179,6 +2179,271 @@ static void nvme_set_latency_tolerance(struct device *dev, s32 val)
>  	}
>  }
>  
> +#ifdef CONFIG_THERMAL
> +
> +static int nvme_get_temp(struct nvme_ctrl *ctrl, unsigned int sensor, int *temp)
> +{
> +	struct nvme_smart_log *log;
> +	int ret;
> +
> +	BUILD_BUG_ON(ARRAY_SIZE(log->temp_sensor) + 1 !=
> +		     ARRAY_SIZE(ctrl->tzdev));

When would this be triggered?
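
For reference, BUILD_BUG_ON() is evaluated at compile time, so it can only fire
when someone changes the size of one array without the other. A userspace
analogue using C11 _Static_assert, as a sketch only: smart_log_sketch,
ctrl_sketch and tz_index_to_sensor_slot() are hypothetical, trimmed-down
stand-ins and not the kernel definitions.

```c
#include <stddef.h>
#include <stdint.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical, trimmed-down stand-ins for the two structures. */
struct smart_log_sketch {
	uint16_t temp_sensor[8];	/* temperature sensor 1 through 8 */
};

struct ctrl_sketch {
	void *tzdev[9];			/* [0] composite, [1-8] sensors */
};

/*
 * Compilation fails if the two array sizes ever stop differing by
 * exactly one; nothing is checked at run time.
 */
_Static_assert(ARRAY_SIZE(((struct smart_log_sketch *)0)->temp_sensor) + 1 ==
	       ARRAY_SIZE(((struct ctrl_sketch *)0)->tzdev),
	       "tzdev[] must hold the composite temperature plus every sensor");

/* Map a thermal zone index to a temp_sensor[] slot, or -1 for composite. */
static int tz_index_to_sensor_slot(unsigned int tz_index)
{
	return tz_index ? (int)tz_index - 1 : -1;
}
```

Changing tzdev[9] to tzdev[8] in the sketch makes the build fail, which is the
situation the check in nvme_get_temp() guards against.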

> +
> +	if (WARN_ON_ONCE(sensor > ARRAY_SIZE(log->temp_sensor)))
> +		return -EINVAL;
> +
> +	log = kzalloc(sizeof(*log), GFP_KERNEL);

Do we really need to allocate memory every time we want to read the
temperature? Is this struct too large to fit on the stack?

> +	if (!log)
> +		return -ENOMEM;
> +
> +	ret = nvme_get_log(ctrl, NVME_NSID_ALL, NVME_LOG_SMART, 0,
> +			   log, sizeof(*log), 0);
> +	if (ret) {
> +		ret = ret > 0 ? -EINVAL : ret;
> +		goto free_log;
> +	}
> +
> +	if (sensor)
> +		*temp = le16_to_cpu(log->temp_sensor[sensor - 1]);
> +	else
> +		*temp = get_unaligned_le16(log->temperature);
> +
> +	if (!*temp)
> +		ret = -EINVAL;
> +
> +free_log:
> +	kfree(log);
> +
> +	return ret;
> +}
> +
> +static unsigned int nvme_tz_type_to_sensor(const char *type)
> +{
> +	unsigned int sensor;
> +
> +	if (sscanf(type, "nvme_temp%u", &sensor) != 1)
> +		return UINT_MAX;
> +
> +	return sensor;
> +}
> +
> +#define KELVIN_TO_MILLICELSIUS(t) DECI_KELVIN_TO_MILLICELSIUS((t) * 10)
> +#define MILLICELSIUS_TO_KELVIN(t) ((MILLICELSIUS_TO_DECI_KELVIN(t) + 5) / 10)
> +
> +static int nvme_tz_get_temp(struct thermal_zone_device *tzdev,
> +			    int *temp)
> +{
> +	unsigned int sensor = nvme_tz_type_to_sensor(tzdev->type);
> +	struct nvme_ctrl *ctrl = tzdev->devdata;
> +	int ret;
> +
> +	ret = nvme_get_temp(ctrl, sensor, temp);
> +	if (!ret)
> +		*temp = KELVIN_TO_MILLICELSIUS(*temp);
> +
> +	return ret;
> +}
> +
> +static int nvme_tz_get_trip_type(struct thermal_zone_device *tzdev,
> +				 int trip, enum thermal_trip_type *type)
> +{
> +	*type = THERMAL_TRIP_ACTIVE;
> +
> +	return 0;
> +}
> +
> +static int nvme_get_over_temp_thresh(struct nvme_ctrl *ctrl,
> +				     unsigned int sensor, int *temp)
> +{
> +	unsigned int threshold = sensor << NVME_TEMP_THRESH_SELECT_SHIFT;
> +	int status;
> +	int ret;
> +
> +	if (WARN_ON_ONCE(sensor >= ARRAY_SIZE(ctrl->tzdev)))
> +		return -EINVAL;
> +
> +	ret = nvme_get_features(ctrl, NVME_FEAT_TEMP_THRESH, threshold, NULL, 0,
> +				&status);
> +	if (!ret)
> +		*temp = status & NVME_TEMP_THRESH_MASK;
> +
> +	return ret > 0 ? -EINVAL : ret;
> +}
> +
> +static int nvme_set_over_temp_thresh(struct nvme_ctrl *ctrl,
> +				     unsigned int sensor, int temp)
> +{
> +	unsigned int threshold = sensor << NVME_TEMP_THRESH_SELECT_SHIFT;
> +	int status;
> +	int ret;
> +
> +	if (WARN_ON_ONCE(sensor >= ARRAY_SIZE(ctrl->tzdev)))
> +		return -EINVAL;
> +
> +	if (temp > NVME_TEMP_THRESH_MASK)
> +		return -EINVAL;
> +
> +	threshold |= temp & NVME_TEMP_THRESH_MASK;
> +
> +	ret = nvme_set_features(ctrl, NVME_FEAT_TEMP_THRESH, threshold, NULL, 0,
> +				&status);
> +
> +	return ret > 0 ? -EINVAL : ret;
> +}
> +
> +static int nvme_tz_get_trip_temp(struct thermal_zone_device *tzdev,
> +				 int trip, int *temp)
> +{
> +	unsigned int sensor = nvme_tz_type_to_sensor(tzdev->type);
> +	struct nvme_ctrl *ctrl = tzdev->devdata;
> +	int ret;
> +
> +	ret = nvme_get_over_temp_thresh(ctrl, sensor, temp);
> +	if (!ret)
> +		*temp = KELVIN_TO_MILLICELSIUS(*temp);
> +
> +	return ret;
> +}
> +
> +static int nvme_tz_set_trip_temp(struct thermal_zone_device *tzdev,
> +				 int trip, int temp)
> +{
> +	unsigned int sensor = nvme_tz_type_to_sensor(tzdev->type);
> +	struct nvme_ctrl *ctrl = tzdev->devdata;
> +
> +	temp = MILLICELSIUS_TO_KELVIN(temp);
> +
> +	return nvme_set_over_temp_thresh(ctrl, sensor, temp);
> +}
> +
> +static struct thermal_zone_device_ops nvme_tz_ops = {
> +	.get_temp = nvme_tz_get_temp,
> +	.get_trip_type = nvme_tz_get_trip_type,
> +	.get_trip_temp = nvme_tz_get_trip_temp,
> +	.set_trip_temp = nvme_tz_set_trip_temp,
> +};
> +
> +static struct thermal_zone_params nvme_tz_params = {
> +	.governor_name = "user_space",
> +	.no_hwmon = true,
> +};
> +
> +static struct thermal_zone_device *
> +nvme_thermal_zone_register(struct nvme_ctrl *ctrl, unsigned int sensor)
> +{
> +	struct thermal_zone_device *tzdev;
> +	char type[THERMAL_NAME_LENGTH];
> +	int ret;
> +
> +	snprintf(type, sizeof(type), "nvme_temp%d", sensor);
> +

Do we have something more meaningful or descriptive here? A more
interesting type would be a string that could remind of the sensor
location. Unless nvme_temp0 is enough to understand where this
temperature is coming from, I would ask to get something more
descriptive.
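
One thing to keep in mind when renaming: in this patch the type string is also
how the callbacks recover the sensor index (nvme_tz_type_to_sensor() parses
tzdev->type). A small userspace sketch of that round trip, where
sensor_to_tz_type() and tz_type_to_sensor() are hypothetical names and the
value 20 for THERMAL_NAME_LENGTH is an assumption about linux/thermal.h:

```c
#include <limits.h>
#include <stdio.h>

/* Assumed value of the kernel constant; stated here as an assumption. */
#define THERMAL_NAME_LENGTH 20

/* Build the zone type the way nvme_thermal_zone_register() does. */
static void sensor_to_tz_type(unsigned int sensor, char *type, size_t len)
{
	snprintf(type, len, "nvme_temp%u", sensor);
}

/* Recover the sensor index; UINT_MAX means "not an nvme zone type". */
static unsigned int tz_type_to_sensor(const char *type)
{
	unsigned int sensor;

	if (sscanf(type, "nvme_temp%u", &sensor) != 1)
		return UINT_MAX;

	return sensor;
}
```

A more descriptive type string would therefore need either a matching change
in the parser or another place to store the sensor index.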

> +	tzdev = thermal_zone_device_register(type, 1, 1, ctrl, &nvme_tz_ops,
> +					     &nvme_tz_params, 0, 0);

Have you considered whether there is a use case for of-thermal here?

> +	if (IS_ERR(tzdev)) {
> +		dev_err(ctrl->device,
> +			"Failed to register thermal zone device: %ld\n",
> +			PTR_ERR(tzdev));
> +		return tzdev;
> +	}
> +
> +	ret = sysfs_create_link(&ctrl->ctrl_device.kobj,
> +				&tzdev->device.kobj, type);
> +	if (ret)
> +		goto device_unregister;
> +
> +	ret = sysfs_create_link(&tzdev->device.kobj,
> +				&ctrl->ctrl_device.kobj, "device");
> +	if (ret)
> +		goto remove_link;
> +
> +	return tzdev;
> +
> +remove_link:
> +	sysfs_remove_link(&ctrl->ctrl_device.kobj, type);
> +device_unregister:
> +	thermal_zone_device_unregister(tzdev);
> +
> +	return ERR_PTR(ret);
> +}
> +
> +/**
> + * nvme_thermal_zones_register() - register nvme thermal zone devices
> + * @ctrl: controller instance
> + *
> + * This function creates up to nine thermal zone devices for all implemented
> + * temperature sensors including the composite temperature.
> + * Each thermal zone device provides a single trip point temperature that is
> + * associated with an over temperature threshold.
> + */
> +int nvme_thermal_zones_register(struct nvme_ctrl *ctrl)
> +{
> +	struct nvme_smart_log *log;
> +	int ret;
> +	int i;
> +
> +	log = kzalloc(sizeof(*log), GFP_KERNEL);
> +	if (!log)
> +		return 0; /* non-fatal error */
> +
> +	ret = nvme_get_log(ctrl, NVME_NSID_ALL, NVME_LOG_SMART, 0,
> +			   log, sizeof(*log), 0);
> +	if (ret) {
> +		dev_err(ctrl->device, "Failed to get SMART log: %d\n", ret);
> +		ret = ret > 0 ? -EINVAL : ret;
> +		goto free_log;
> +	}
> +
> +	for (i = 0; i < ARRAY_SIZE(ctrl->tzdev); i++) {
> +		struct thermal_zone_device *tzdev;
> +
> +		/*
> +		 * All implemented temperature sensors report a non-zero value
> +		 * in temperature sensor fields in the smart log page.
> +		 */
> +		if (i && !le16_to_cpu(log->temp_sensor[i - 1]))
> +			continue;
> +		if (ctrl->tzdev[i])
> +			continue;
> +
> +		tzdev = nvme_thermal_zone_register(ctrl, i);
> +		if (!IS_ERR(tzdev))
> +			ctrl->tzdev[i] = tzdev;
> +	}
> +
> +free_log:
> +	kfree(log);
> +
> +	return ret;
> +}
> +EXPORT_SYMBOL_GPL(nvme_thermal_zones_register);
> +
> +/**
> + * nvme_thermal_zones_unregister() - unregister nvme thermal zone devices
> + * @ctrl: controller instance
> + *
> + * This function removes the registered thermal zone devices and symlinks.
> + */
> +void nvme_thermal_zones_unregister(struct nvme_ctrl *ctrl)
> +{
> +	int i;
> +
> +	for (i = 0; i < ARRAY_SIZE(ctrl->tzdev); i++) {
> +		struct thermal_zone_device *tzdev = ctrl->tzdev[i];
> +
> +		if (!tzdev)
> +			continue;
> +
> +		sysfs_remove_link(&tzdev->device.kobj, "device");
> +		sysfs_remove_link(&ctrl->ctrl_device.kobj, tzdev->type);
> +		thermal_zone_device_unregister(tzdev);
> +
> +		ctrl->tzdev[i] = NULL;
> +	}
> +}
> +EXPORT_SYMBOL_GPL(nvme_thermal_zones_unregister);
> +
> +#endif /* CONFIG_THERMAL */
> +
>  struct nvme_core_quirk_entry {
>  	/*
>  	 * NVMe model and firmware strings are padded with spaces.  For
> diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
> index bb673b8..0bc4e85 100644
> --- a/drivers/nvme/host/nvme.h
> +++ b/drivers/nvme/host/nvme.h
> @@ -15,6 +15,7 @@
>  #include <linux/sed-opal.h>
>  #include <linux/fault-inject.h>
>  #include <linux/rcupdate.h>
> +#include <linux/thermal.h>
>  
>  extern unsigned int nvme_io_timeout;
>  #define NVME_IO_TIMEOUT	(nvme_io_timeout * HZ)
> @@ -248,6 +249,14 @@ struct nvme_ctrl {
>  
>  	struct page *discard_page;
>  	unsigned long discard_page_busy;
> +
> +#ifdef CONFIG_THERMAL
> +	/*
> +	 * tzdev[0]: composite temperature
> +	 * tzdev[1-8]: temperature sensor 1 through 8
> +	 */
> +	struct thermal_zone_device *tzdev[9];
> +#endif
>  };
>  
>  enum nvme_iopolicy {
> @@ -559,6 +568,24 @@ static inline void nvme_mpath_stop(struct nvme_ctrl *ctrl)
>  }
>  #endif /* CONFIG_NVME_MULTIPATH */
>  
> +#ifdef CONFIG_THERMAL
> +
> +int nvme_thermal_zones_register(struct nvme_ctrl *ctrl);
> +void nvme_thermal_zones_unregister(struct nvme_ctrl *ctrl);
> +
> +#else
> +
> +static inline int nvme_thermal_zones_register(struct nvme_ctrl *ctrl)
> +{
> +	return 0;
> +}
> +
> +static inline void nvme_thermal_zones_unregister(struct nvme_ctrl *ctrl)
> +{
> +}
> +
> +#endif /* CONFIG_THERMAL */
> +
>  #ifdef CONFIG_NVM
>  int nvme_nvm_register(struct nvme_ns *ns, char *disk_name, int node);
>  void nvme_nvm_unregister(struct nvme_ns *ns);
> diff --git a/include/linux/nvme.h b/include/linux/nvme.h
> index 658ac75..54f0a13 100644
> --- a/include/linux/nvme.h
> +++ b/include/linux/nvme.h
> @@ -780,6 +780,11 @@ struct nvme_write_zeroes_cmd {
>  
>  /* Features */
>  
> +enum {
> +	NVME_TEMP_THRESH_MASK		= 0xffff,
> +	NVME_TEMP_THRESH_SELECT_SHIFT	= 16,
> +};
> +
>  struct nvme_feat_auto_pst {
>  	__le64 entries[32];
>  };
> -- 
> 2.7.4
> 


* [PATCH v2 2/4] nvme: add thermal zone infrastructure
@ 2019-05-24  2:35     ` Eduardo Valentin
  0 siblings, 0 replies; 38+ messages in thread
From: Eduardo Valentin @ 2019-05-24  2:35 UTC (permalink / raw)


Hello Mita,

On Wed, May 22, 2019 at 01:04:07AM +0900, Akinobu Mita wrote:
> The NVMe controller reports up to nine temperature values in the SMART /
> Health log page (the composite temperature and temperature sensor 1 through
> temperature sensor 8).

Is this a fixed number, or should we be more flexible about the number of
sensors?

> The temperature threshold feature (Feature Identifier 04h) configures the
> asynchronous event request command to complete when the temperature is
> crossed its corresponding temperature threshold.
> 
> This adds infrastructure to provide these temperatures and thresholds via
> thermal zone devices.
> 
> The nvme_thermal_zones_register() creates up to nine thermal zone devices
> for all implemented temperature sensors including the composite
> temperature.

great!

> 
> /sys/class/thermal/thermal_zone[0-*]:
>     |---temp: Temperature
>     |---trip_point_0_temp: Over temperature threshold
> 
> The thermal_zone[0-*] contains a symlink to the corresponding nvme device.
> On the other hand, the following symlinks to the thermal zone devices are
> created in the nvme device sysfs directory.
> 
> - nvme_temp0: Composite temperature
> - nvme_temp1: Temperature sensor 1
> ...
> - nvme_temp8: Temperature sensor 8
> 
> The nvme_thermal_zones_unregister() removes the registered thermal zone
> devices and symlinks.
> 
> Cc: Zhang Rui <rui.zhang at intel.com>
> Cc: Eduardo Valentin <edubezval at gmail.com>
> Cc: Daniel Lezcano <daniel.lezcano at linaro.org>
> Cc: Keith Busch <keith.busch at intel.com>
> Cc: Jens Axboe <axboe at fb.com>
> Cc: Christoph Hellwig <hch at lst.de>
> Cc: Sagi Grimberg <sagi at grimberg.me>
> Cc: Minwoo Im <minwoo.im.dev at gmail.com>
> Cc: Kenneth Heitke <kenneth.heitke at intel.com>
> Cc: Chaitanya Kulkarni <Chaitanya.Kulkarni at wdc.com>
> Signed-off-by: Akinobu Mita <akinobu.mita at gmail.com>
> ---
> * v2
> - s/correspoinding/corresponding/ typo in commit log
> - Borrowed nvme_get_features() from Keith's patch
> - Temperature threshold notification is splitted into another patch
> - Change the data type of 'sensor' to unsigned
> - Add BUILD_BUG_ON for the array size of tzdev member in nvme_ctrl
> - Add WARN_ON_ONCE for paranoid checks
> - Fix off-by-one error in nvme_get_temp
> - Validate 'sensor' where the value is actually used
> - Define and utilize two enums related to the temperature threshold feature
> - Remove hysteresis value for this trip point and don't utilize the under
>   temperature threshold
> - Print error message for thermal_zone_device_register() failure
> - Add function comments for nvme_thermal_zones_{,un}register
> - Suppress non-fatal errors from nvme_thermal_zones_register()
> - Add comment about implemented temperature sensors 
> - Instead of creating a new 'thermal_work', append async smart event's
>   action to the existing async_event_work
> - Add comment for tzdev member in nvme_ctrl
> 
>  drivers/nvme/host/core.c | 265 +++++++++++++++++++++++++++++++++++++++++++++++
>  drivers/nvme/host/nvme.h |  27 +++++
>  include/linux/nvme.h     |   5 +
>  3 files changed, 297 insertions(+)
> 
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index c04df80..0ec303c 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -2179,6 +2179,271 @@ static void nvme_set_latency_tolerance(struct device *dev, s32 val)
>  	}
>  }
>  
> +#ifdef CONFIG_THERMAL
> +
> +static int nvme_get_temp(struct nvme_ctrl *ctrl, unsigned int sensor, int *temp)
> +{
> +	struct nvme_smart_log *log;
> +	int ret;
> +
> +	BUILD_BUG_ON(ARRAY_SIZE(log->temp_sensor) + 1 !=
> +		     ARRAY_SIZE(ctrl->tzdev));

When would this be triggered?

> +
> +	if (WARN_ON_ONCE(sensor > ARRAY_SIZE(log->temp_sensor)))
> +		return -EINVAL;
> +
> +	log = kzalloc(sizeof(*log), GFP_KERNEL);

Do we really need to allocate memory every time we want to read the
temperature? Is this struct too large to fit on the stack?

> +	if (!log)
> +		return -ENOMEM;
> +
> +	ret = nvme_get_log(ctrl, NVME_NSID_ALL, NVME_LOG_SMART, 0,
> +			   log, sizeof(*log), 0);
> +	if (ret) {
> +		ret = ret > 0 ? -EINVAL : ret;
> +		goto free_log;
> +	}
> +
> +	if (sensor)
> +		*temp = le16_to_cpu(log->temp_sensor[sensor - 1]);
> +	else
> +		*temp = get_unaligned_le16(log->temperature);
> +
> +	if (!*temp)
> +		ret = -EINVAL;
> +
> +free_log:
> +	kfree(log);
> +
> +	return ret;
> +}
> +
> +static unsigned int nvme_tz_type_to_sensor(const char *type)
> +{
> +	unsigned int sensor;
> +
> +	if (sscanf(type, "nvme_temp%u", &sensor) != 1)
> +		return UINT_MAX;
> +
> +	return sensor;
> +}
> +
> +#define KELVIN_TO_MILLICELSIUS(t) DECI_KELVIN_TO_MILLICELSIUS((t) * 10)
> +#define MILLICELSIUS_TO_KELVIN(t) ((MILLICELSIUS_TO_DECI_KELVIN(t) + 5) / 10)
> +
> +static int nvme_tz_get_temp(struct thermal_zone_device *tzdev,
> +			    int *temp)
> +{
> +	unsigned int sensor = nvme_tz_type_to_sensor(tzdev->type);
> +	struct nvme_ctrl *ctrl = tzdev->devdata;
> +	int ret;
> +
> +	ret = nvme_get_temp(ctrl, sensor, temp);
> +	if (!ret)
> +		*temp = KELVIN_TO_MILLICELSIUS(*temp);
> +
> +	return ret;
> +}
> +
> +static int nvme_tz_get_trip_type(struct thermal_zone_device *tzdev,
> +				 int trip, enum thermal_trip_type *type)
> +{
> +	*type = THERMAL_TRIP_ACTIVE;
> +
> +	return 0;
> +}
> +
> +static int nvme_get_over_temp_thresh(struct nvme_ctrl *ctrl,
> +				     unsigned int sensor, int *temp)
> +{
> +	unsigned int threshold = sensor << NVME_TEMP_THRESH_SELECT_SHIFT;
> +	int status;
> +	int ret;
> +
> +	if (WARN_ON_ONCE(sensor >= ARRAY_SIZE(ctrl->tzdev)))
> +		return -EINVAL;
> +
> +	ret = nvme_get_features(ctrl, NVME_FEAT_TEMP_THRESH, threshold, NULL, 0,
> +				&status);
> +	if (!ret)
> +		*temp = status & NVME_TEMP_THRESH_MASK;
> +
> +	return ret > 0 ? -EINVAL : ret;
> +}
> +
> +static int nvme_set_over_temp_thresh(struct nvme_ctrl *ctrl,
> +				     unsigned int sensor, int temp)
> +{
> +	unsigned int threshold = sensor << NVME_TEMP_THRESH_SELECT_SHIFT;
> +	int status;
> +	int ret;
> +
> +	if (WARN_ON_ONCE(sensor >= ARRAY_SIZE(ctrl->tzdev)))
> +		return -EINVAL;
> +
> +	if (temp > NVME_TEMP_THRESH_MASK)
> +		return -EINVAL;
> +
> +	threshold |= temp & NVME_TEMP_THRESH_MASK;
> +
> +	ret = nvme_set_features(ctrl, NVME_FEAT_TEMP_THRESH, threshold, NULL, 0,
> +				&status);
> +
> +	return ret > 0 ? -EINVAL : ret;
> +}
> +
> +static int nvme_tz_get_trip_temp(struct thermal_zone_device *tzdev,
> +				 int trip, int *temp)
> +{
> +	unsigned int sensor = nvme_tz_type_to_sensor(tzdev->type);
> +	struct nvme_ctrl *ctrl = tzdev->devdata;
> +	int ret;
> +
> +	ret = nvme_get_over_temp_thresh(ctrl, sensor, temp);
> +	if (!ret)
> +		*temp = KELVIN_TO_MILLICELSIUS(*temp);
> +
> +	return ret;
> +}
> +
> +static int nvme_tz_set_trip_temp(struct thermal_zone_device *tzdev,
> +				 int trip, int temp)
> +{
> +	unsigned int sensor = nvme_tz_type_to_sensor(tzdev->type);
> +	struct nvme_ctrl *ctrl = tzdev->devdata;
> +
> +	temp = MILLICELSIUS_TO_KELVIN(temp);
> +
> +	return nvme_set_over_temp_thresh(ctrl, sensor, temp);
> +}
> +
> +static struct thermal_zone_device_ops nvme_tz_ops = {
> +	.get_temp = nvme_tz_get_temp,
> +	.get_trip_type = nvme_tz_get_trip_type,
> +	.get_trip_temp = nvme_tz_get_trip_temp,
> +	.set_trip_temp = nvme_tz_set_trip_temp,
> +};
> +
> +static struct thermal_zone_params nvme_tz_params = {
> +	.governor_name = "user_space",
> +	.no_hwmon = true,
> +};
> +
> +static struct thermal_zone_device *
> +nvme_thermal_zone_register(struct nvme_ctrl *ctrl, unsigned int sensor)
> +{
> +	struct thermal_zone_device *tzdev;
> +	char type[THERMAL_NAME_LENGTH];
> +	int ret;
> +
> +	snprintf(type, sizeof(type), "nvme_temp%d", sensor);
> +

Do we have something more meaningful or descriptive here? A more
interesting type would be a string that could remind of the sensor
location. Unless nvme_temp0 is enough to understand where this
temperature is coming from, I would ask to get something more
descriptive.

> +	tzdev = thermal_zone_device_register(type, 1, 1, ctrl, &nvme_tz_ops,
> +					     &nvme_tz_params, 0, 0);

Have you considered if there is a use case for using of-thermal here?

> +	if (IS_ERR(tzdev)) {
> +		dev_err(ctrl->device,
> +			"Failed to register thermal zone device: %ld\n",
> +			PTR_ERR(tzdev));
> +		return tzdev;
> +	}
> +
> +	ret = sysfs_create_link(&ctrl->ctrl_device.kobj,
> +				&tzdev->device.kobj, type);
> +	if (ret)
> +		goto device_unregister;
> +
> +	ret = sysfs_create_link(&tzdev->device.kobj,
> +				&ctrl->ctrl_device.kobj, "device");
> +	if (ret)
> +		goto remove_link;
> +
> +	return tzdev;
> +
> +remove_link:
> +	sysfs_remove_link(&ctrl->ctrl_device.kobj, type);
> +device_unregister:
> +	thermal_zone_device_unregister(tzdev);
> +
> +	return ERR_PTR(ret);
> +}
> +
> +/**
> + * nvme_thermal_zones_register() - register nvme thermal zone devices
> + * @ctrl: controller instance
> + *
> + * This function creates up to nine thermal zone devices for all implemented
> + * temperature sensors including the composite temperature.
> + * Each thermal zone device provides a single trip point temperature that is
> + * associated with an over temperature threshold.
> + */
> +int nvme_thermal_zones_register(struct nvme_ctrl *ctrl)
> +{
> +	struct nvme_smart_log *log;
> +	int ret;
> +	int i;
> +
> +	log = kzalloc(sizeof(*log), GFP_KERNEL);
> +	if (!log)
> +		return 0; /* non-fatal error */
> +
> +	ret = nvme_get_log(ctrl, NVME_NSID_ALL, NVME_LOG_SMART, 0,
> +			   log, sizeof(*log), 0);
> +	if (ret) {
> +		dev_err(ctrl->device, "Failed to get SMART log: %d\n", ret);
> +		ret = ret > 0 ? -EINVAL : ret;
> +		goto free_log;
> +	}
> +
> +	for (i = 0; i < ARRAY_SIZE(ctrl->tzdev); i++) {
> +		struct thermal_zone_device *tzdev;
> +
> +		/*
> +		 * All implemented temperature sensors report a non-zero value
> +		 * in temperature sensor fields in the smart log page.
> +		 */
> +		if (i && !le16_to_cpu(log->temp_sensor[i - 1]))
> +			continue;
> +		if (ctrl->tzdev[i])
> +			continue;
> +
> +		tzdev = nvme_thermal_zone_register(ctrl, i);
> +		if (!IS_ERR(tzdev))
> +			ctrl->tzdev[i] = tzdev;
> +	}
> +
> +free_log:
> +	kfree(log);
> +
> +	return ret;
> +}
> +EXPORT_SYMBOL_GPL(nvme_thermal_zones_register);
> +
> +/**
> + * nvme_thermal_zones_unregister() - unregister nvme thermal zone devices
> + * @ctrl: controller instance
> + *
> + * This function removes the registered thermal zone devices and symlinks.
> + */
> +void nvme_thermal_zones_unregister(struct nvme_ctrl *ctrl)
> +{
> +	int i;
> +
> +	for (i = 0; i < ARRAY_SIZE(ctrl->tzdev); i++) {
> +		struct thermal_zone_device *tzdev = ctrl->tzdev[i];
> +
> +		if (!tzdev)
> +			continue;
> +
> +		sysfs_remove_link(&tzdev->device.kobj, "device");
> +		sysfs_remove_link(&ctrl->ctrl_device.kobj, tzdev->type);
> +		thermal_zone_device_unregister(tzdev);
> +
> +		ctrl->tzdev[i] = NULL;
> +	}
> +}
> +EXPORT_SYMBOL_GPL(nvme_thermal_zones_unregister);
> +
> +#endif /* CONFIG_THERMAL */
> +
>  struct nvme_core_quirk_entry {
>  	/*
>  	 * NVMe model and firmware strings are padded with spaces.  For
> diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
> index bb673b8..0bc4e85 100644
> --- a/drivers/nvme/host/nvme.h
> +++ b/drivers/nvme/host/nvme.h
> @@ -15,6 +15,7 @@
>  #include <linux/sed-opal.h>
>  #include <linux/fault-inject.h>
>  #include <linux/rcupdate.h>
> +#include <linux/thermal.h>
>  
>  extern unsigned int nvme_io_timeout;
>  #define NVME_IO_TIMEOUT	(nvme_io_timeout * HZ)
> @@ -248,6 +249,14 @@ struct nvme_ctrl {
>  
>  	struct page *discard_page;
>  	unsigned long discard_page_busy;
> +
> +#ifdef CONFIG_THERMAL
> +	/*
> +	 * tzdev[0]: composite temperature
> +	 * tzdev[1-8]: temperature sensor 1 through 8
> +	 */
> +	struct thermal_zone_device *tzdev[9];
> +#endif
>  };
>  
>  enum nvme_iopolicy {
> @@ -559,6 +568,24 @@ static inline void nvme_mpath_stop(struct nvme_ctrl *ctrl)
>  }
>  #endif /* CONFIG_NVME_MULTIPATH */
>  
> +#ifdef CONFIG_THERMAL
> +
> +int nvme_thermal_zones_register(struct nvme_ctrl *ctrl);
> +void nvme_thermal_zones_unregister(struct nvme_ctrl *ctrl);
> +
> +#else
> +
> +static inline int nvme_thermal_zones_register(struct nvme_ctrl *ctrl)
> +{
> +	return 0;
> +}
> +
> +static inline void nvme_thermal_zones_unregister(struct nvme_ctrl *ctrl)
> +{
> +}
> +
> +#endif /* CONFIG_THERMAL */
> +
>  #ifdef CONFIG_NVM
>  int nvme_nvm_register(struct nvme_ns *ns, char *disk_name, int node);
>  void nvme_nvm_unregister(struct nvme_ns *ns);
> diff --git a/include/linux/nvme.h b/include/linux/nvme.h
> index 658ac75..54f0a13 100644
> --- a/include/linux/nvme.h
> +++ b/include/linux/nvme.h
> @@ -780,6 +780,11 @@ struct nvme_write_zeroes_cmd {
>  
>  /* Features */
>  
> +enum {
> +	NVME_TEMP_THRESH_MASK		= 0xffff,
> +	NVME_TEMP_THRESH_SELECT_SHIFT	= 16,
> +};
> +
>  struct nvme_feat_auto_pst {
>  	__le64 entries[32];
>  };
> -- 
> 2.7.4
> 


* Re: [PATCH v2 2/4] nvme: add thermal zone infrastructure
  2019-05-24  2:35     ` Eduardo Valentin
@ 2019-05-24 13:57       ` Akinobu Mita
  -1 siblings, 0 replies; 38+ messages in thread
From: Akinobu Mita @ 2019-05-24 13:57 UTC (permalink / raw)
  To: Eduardo Valentin
  Cc: linux-nvme, linux-pm, Zhang Rui, Daniel Lezcano, Keith Busch,
	Jens Axboe, Christoph Hellwig, Sagi Grimberg, Minwoo Im,
	Kenneth Heitke, Chaitanya Kulkarni

On Fri, May 24, 2019 at 11:35 Eduardo Valentin <edubezval@gmail.com> wrote:
>
> Hello Mita,
>
> On Wed, May 22, 2019 at 01:04:07AM +0900, Akinobu Mita wrote:
> > The NVMe controller reports up to nine temperature values in the SMART /
> > Health log page (the composite temperature and temperature sensor 1 through
> > temperature sensor 8).
>
> Is this a fixed number or we should be more flexible on the amount of
> sensors?

The maximum number is fixed.  In NVMe spec revision 1.3, a controller
reports up to nine temperature values in the SMART / Health information log.

It may grow beyond nine in a future revision, but we can handle that then.
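
To make the indexing concrete, here is a minimal userspace sketch of the
"composite plus sensors 1 through 8" layout.  It is not the patch code:
struct temps, temp_by_index() and count_zones() are hypothetical names, and
the values are assumed to be already converted to host byte order (Kelvin).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_TEMP_SENSORS 8                   /* temperature sensor 1..8 */
#define NUM_TEMPS (NUM_TEMP_SENSORS + 1)     /* plus the composite temperature */

/* Hypothetical host-order view of the temperature fields, in Kelvin. */
struct temps {
	uint16_t composite;
	uint16_t sensor[NUM_TEMP_SENSORS];
};

/*
 * Index 0 selects the composite temperature, 1..8 select sensor N.
 * Returns 0 for an out-of-range index or an unimplemented sensor.
 */
static uint16_t temp_by_index(const struct temps *t, unsigned int idx)
{
	assert(t != NULL);
	if (idx == 0)
		return t->composite;
	if (idx > NUM_TEMP_SENSORS)
		return 0;
	return t->sensor[idx - 1];
}

/* Count the zones that would exist: composite always, sensors only if non-zero. */
static unsigned int count_zones(const struct temps *t)
{
	unsigned int idx, n = 1;

	for (idx = 1; idx < NUM_TEMPS; idx++)
		if (temp_by_index(t, idx))
			n++;
	return n;
}
```

With a composite reading and two non-zero sensors, count_zones() gives 3,
which matches the idea that unimplemented sensors (reported as zero) get no
thermal zone.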

> > The temperature threshold feature (Feature Identifier 04h) configures the
> > asynchronous event request command to complete when the temperature is
> > crossed its corresponding temperature threshold.
> >
> > This adds infrastructure to provide these temperatures and thresholds via
> > thermal zone devices.
> >
> > The nvme_thermal_zones_register() creates up to nine thermal zone devices
> > for all implemented temperature sensors including the composite
> > temperature.
>
> great!
>
> >
> > /sys/class/thermal/thermal_zone[0-*]:
> >     |---temp: Temperature
> >     |---trip_point_0_temp: Over temperature threshold
> >
> > The thermal_zone[0-*] contains a symlink to the corresponding nvme device.
> > On the other hand, the following symlinks to the thermal zone devices are
> > created in the nvme device sysfs directory.
> >
> > - nvme_temp0: Composite temperature
> > - nvme_temp1: Temperature sensor 1
> > ...
> > - nvme_temp8: Temperature sensor 8
> >
> > The nvme_thermal_zones_unregister() removes the registered thermal zone
> > devices and symlinks.
> >
> > Cc: Zhang Rui <rui.zhang@intel.com>
> > Cc: Eduardo Valentin <edubezval@gmail.com>
> > Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
> > Cc: Keith Busch <keith.busch@intel.com>
> > Cc: Jens Axboe <axboe@fb.com>
> > Cc: Christoph Hellwig <hch@lst.de>
> > Cc: Sagi Grimberg <sagi@grimberg.me>
> > Cc: Minwoo Im <minwoo.im.dev@gmail.com>
> > Cc: Kenneth Heitke <kenneth.heitke@intel.com>
> > Cc: Chaitanya Kulkarni <Chaitanya.Kulkarni@wdc.com>
> > Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
> > ---
> > * v2
> > - s/correspoinding/corresponding/ typo in commit log
> > - Borrowed nvme_get_features() from Keith's patch
> > - Temperature threshold notification is splitted into another patch
> > - Change the data type of 'sensor' to unsigned
> > - Add BUILD_BUG_ON for the array size of tzdev member in nvme_ctrl
> > - Add WARN_ON_ONCE for paranoid checks
> > - Fix off-by-one error in nvme_get_temp
> > - Validate 'sensor' where the value is actually used
> > - Define and utilize two enums related to the temperature threshold feature
> > - Remove hysteresis value for this trip point and don't utilize the under
> >   temperature threshold
> > - Print error message for thermal_zone_device_register() failure
> > - Add function comments for nvme_thermal_zones_{,un}register
> > - Suppress non-fatal errors from nvme_thermal_zones_register()
> > - Add comment about implemented temperature sensors
> > - Instead of creating a new 'thermal_work', append async smart event's
> >   action to the existing async_event_work
> > - Add comment for tzdev member in nvme_ctrl
> >
> >  drivers/nvme/host/core.c | 265 +++++++++++++++++++++++++++++++++++++++++++++++
> >  drivers/nvme/host/nvme.h |  27 +++++
> >  include/linux/nvme.h     |   5 +
> >  3 files changed, 297 insertions(+)
> >
> > diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> > index c04df80..0ec303c 100644
> > --- a/drivers/nvme/host/core.c
> > +++ b/drivers/nvme/host/core.c
> > @@ -2179,6 +2179,271 @@ static void nvme_set_latency_tolerance(struct device *dev, s32 val)
> >       }
> >  }
> >
> > +#ifdef CONFIG_THERMAL
> > +
> > +static int nvme_get_temp(struct nvme_ctrl *ctrl, unsigned int sensor, int *temp)
> > +{
> > +     struct nvme_smart_log *log;
> > +     int ret;
> > +
> > +     BUILD_BUG_ON(ARRAY_SIZE(log->temp_sensor) + 1 !=
> > +                  ARRAY_SIZE(ctrl->tzdev));
>
> When would this be triggered?

It triggers at build time if the number of temperature sensor fields in the
SMART log page structure and the size of the tzdev array in nvme_ctrl get
out of sync, i.e. if one of them is changed without the other.
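
A userspace sketch of the same kind of check, using C11 static_assert in
place of the kernel's BUILD_BUG_ON.  The two structs are hypothetical
stand-ins for the SMART log structure and nvme_ctrl, and zone_count() exists
only for illustration.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>
#include <stdint.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical stand-in for the sensor fields of the SMART log page. */
struct smart_log_temps {
	uint16_t temp_sensor[8];
};

/* Hypothetical stand-in for the controller structure. */
struct ctrl_zones {
	void *tzdev[9];
};

/* The build fails if one array is resized without the other. */
static_assert(ARRAY_SIZE(((struct smart_log_temps *)0)->temp_sensor) + 1 ==
	      ARRAY_SIZE(((struct ctrl_zones *)0)->tzdev),
	      "tzdev[] must hold the composite temperature plus every sensor");

/* Number of zone slots, derived from the array rather than hard-coded. */
static size_t zone_count(void)
{
	return ARRAY_SIZE(((struct ctrl_zones *)0)->tzdev);
}
```

Changing temp_sensor[8] to temp_sensor[9] above, without touching tzdev[9],
makes the compile fail, so the check costs nothing at run time.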

> > +
> > +     if (WARN_ON_ONCE(sensor > ARRAY_SIZE(log->temp_sensor)))
> > +             return -EINVAL;
> > +
> > +     log = kzalloc(sizeof(*log), GFP_KERNEL);
>
> Do we really need to allocate memory every time we want to read
> temperature? Is this struct too large to fit stack?

I think 512 bytes is too large for the kernel stack.
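
For reference, a userspace sketch of that pattern: the 512-byte log buffer
lives on the heap only for the duration of one read.  The struct is a
simplified mirror of the log page layout as I understand it (the ten 16-byte
counters are collapsed into one array), and fake_get_log() is a made-up
stand-in for the real device query.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Simplified mirror of the 512-byte SMART / Health log page (assumption). */
struct smart_log {
	uint8_t  critical_warning;
	uint8_t  temperature[2];      /* composite, little-endian Kelvin */
	uint8_t  avail_spare;
	uint8_t  spare_thresh;
	uint8_t  percent_used;
	uint8_t  rsvd6[26];
	uint8_t  counters[10][16];    /* ten 128-bit counters, collapsed */
	uint32_t warning_temp_time;
	uint32_t critical_comp_time;
	uint16_t temp_sensor[8];
	uint8_t  rsvd216[296];
};
static_assert(sizeof(struct smart_log) == 512, "log page must be 512 bytes");

/* Hypothetical stand-in for the device query: reports 0x0137 = 311 K. */
static int fake_get_log(struct smart_log *log)
{
	log->temperature[0] = 0x37;
	log->temperature[1] = 0x01;
	return 0;
}

/* Allocate the buffer, read it, decode the composite temperature, free it. */
static int read_composite_kelvin(int *kelvin)
{
	struct smart_log *log = calloc(1, sizeof(*log));
	int ret;

	if (!log)
		return -1;

	ret = fake_get_log(log);
	if (!ret)
		*kelvin = log->temperature[0] | (log->temperature[1] << 8);

	free(log);
	return ret;
}
```

The allocation is small and short-lived, so the cost per temperature read
should be low compared with the log page command itself.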

> > +     if (!log)
> > +             return -ENOMEM;
> > +
> > +     ret = nvme_get_log(ctrl, NVME_NSID_ALL, NVME_LOG_SMART, 0,
> > +                        log, sizeof(*log), 0);
> > +     if (ret) {
> > +             ret = ret > 0 ? -EINVAL : ret;
> > +             goto free_log;
> > +     }
> > +
> > +     if (sensor)
> > +             *temp = le16_to_cpu(log->temp_sensor[sensor - 1]);
> > +     else
> > +             *temp = get_unaligned_le16(log->temperature);
> > +
> > +     if (!*temp)
> > +             ret = -EINVAL;
> > +
> > +free_log:
> > +     kfree(log);
> > +
> > +     return ret;
> > +}
> > +
> > +static unsigned int nvme_tz_type_to_sensor(const char *type)
> > +{
> > +     unsigned int sensor;
> > +
> > +     if (sscanf(type, "nvme_temp%u", &sensor) != 1)
> > +             return UINT_MAX;
> > +
> > +     return sensor;
> > +}
> > +
> > +#define KELVIN_TO_MILLICELSIUS(t) DECI_KELVIN_TO_MILLICELSIUS((t) * 10)
> > +#define MILLICELSIUS_TO_KELVIN(t) ((MILLICELSIUS_TO_DECI_KELVIN(t) + 5) / 10)
> > +
> > +static int nvme_tz_get_temp(struct thermal_zone_device *tzdev,
> > +                         int *temp)
> > +{
> > +     unsigned int sensor = nvme_tz_type_to_sensor(tzdev->type);
> > +     struct nvme_ctrl *ctrl = tzdev->devdata;
> > +     int ret;
> > +
> > +     ret = nvme_get_temp(ctrl, sensor, temp);
> > +     if (!ret)
> > +             *temp = KELVIN_TO_MILLICELSIUS(*temp);
> > +
> > +     return ret;
> > +}
> > +
> > +static int nvme_tz_get_trip_type(struct thermal_zone_device *tzdev,
> > +                              int trip, enum thermal_trip_type *type)
> > +{
> > +     *type = THERMAL_TRIP_ACTIVE;
> > +
> > +     return 0;
> > +}
> > +
> > +static int nvme_get_over_temp_thresh(struct nvme_ctrl *ctrl,
> > +                                  unsigned int sensor, int *temp)
> > +{
> > +     unsigned int threshold = sensor << NVME_TEMP_THRESH_SELECT_SHIFT;
> > +     int status;
> > +     int ret;
> > +
> > +     if (WARN_ON_ONCE(sensor >= ARRAY_SIZE(ctrl->tzdev)))
> > +             return -EINVAL;
> > +
> > +     ret = nvme_get_features(ctrl, NVME_FEAT_TEMP_THRESH, threshold, NULL, 0,
> > +                             &status);
> > +     if (!ret)
> > +             *temp = status & NVME_TEMP_THRESH_MASK;
> > +
> > +     return ret > 0 ? -EINVAL : ret;
> > +}
> > +
> > +static int nvme_set_over_temp_thresh(struct nvme_ctrl *ctrl,
> > +                                  unsigned int sensor, int temp)
> > +{
> > +     unsigned int threshold = sensor << NVME_TEMP_THRESH_SELECT_SHIFT;
> > +     int status;
> > +     int ret;
> > +
> > +     if (WARN_ON_ONCE(sensor >= ARRAY_SIZE(ctrl->tzdev)))
> > +             return -EINVAL;
> > +
> > +     if (temp > NVME_TEMP_THRESH_MASK)
> > +             return -EINVAL;
> > +
> > +     threshold |= temp & NVME_TEMP_THRESH_MASK;
> > +
> > +     ret = nvme_set_features(ctrl, NVME_FEAT_TEMP_THRESH, threshold, NULL, 0,
> > +                             &status);
> > +
> > +     return ret > 0 ? -EINVAL : ret;
> > +}
> > +
> > +static int nvme_tz_get_trip_temp(struct thermal_zone_device *tzdev,
> > +                              int trip, int *temp)
> > +{
> > +     unsigned int sensor = nvme_tz_type_to_sensor(tzdev->type);
> > +     struct nvme_ctrl *ctrl = tzdev->devdata;
> > +     int ret;
> > +
> > +     ret = nvme_get_over_temp_thresh(ctrl, sensor, temp);
> > +     if (!ret)
> > +             *temp = KELVIN_TO_MILLICELSIUS(*temp);
> > +
> > +     return ret;
> > +}
> > +
> > +static int nvme_tz_set_trip_temp(struct thermal_zone_device *tzdev,
> > +                              int trip, int temp)
> > +{
> > +     unsigned int sensor = nvme_tz_type_to_sensor(tzdev->type);
> > +     struct nvme_ctrl *ctrl = tzdev->devdata;
> > +
> > +     temp = MILLICELSIUS_TO_KELVIN(temp);
> > +
> > +     return nvme_set_over_temp_thresh(ctrl, sensor, temp);
> > +}
> > +
> > +static struct thermal_zone_device_ops nvme_tz_ops = {
> > +     .get_temp = nvme_tz_get_temp,
> > +     .get_trip_type = nvme_tz_get_trip_type,
> > +     .get_trip_temp = nvme_tz_get_trip_temp,
> > +     .set_trip_temp = nvme_tz_set_trip_temp,
> > +};
> > +
> > +static struct thermal_zone_params nvme_tz_params = {
> > +     .governor_name = "user_space",
> > +     .no_hwmon = true,
> > +};
> > +
> > +static struct thermal_zone_device *
> > +nvme_thermal_zone_register(struct nvme_ctrl *ctrl, unsigned int sensor)
> > +{
> > +     struct thermal_zone_device *tzdev;
> > +     char type[THERMAL_NAME_LENGTH];
> > +     int ret;
> > +
> > +     snprintf(type, sizeof(type), "nvme_temp%d", sensor);
> > +
>
> Do we have something more meaningful or descriptive here? A more
> interesting type would be a string that could remind of the sensor
> location. Unless nvme_temp0 is enough to understand where this
> temperature is coming from, I would ask to get something more
> descriptive.

The SMART log page defines composite temperature and temperature sensor 1
through temperature sensor 8.  So I think nvme_temp1 to nvme_temp8 are
descriptive.  And I personally prefer 'nvme_temp0' rather than
'nvme_composite_temp'.

BTW, if we have more than one controller, we'll have the same type names
in the system.  So I'm going to append the instance number after 'nvme'
(e.g. nvme0_temp0).
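
A rough userspace sketch of what that naming could look like, including the
reverse mapping from type name to sensor index.  format_type() and
type_to_sensor() are hypothetical helpers, and the 20-byte buffer size is an
assumption for illustration; the next version of the patch may differ.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdio.h>

#define TYPE_NAME_LEN 20   /* assumed buffer size for the type string */

/* Build "nvme<instance>_temp<sensor>"; returns -1 if it does not fit. */
static int format_type(char *buf, size_t len, int instance, unsigned int sensor)
{
	int n = snprintf(buf, len, "nvme%d_temp%u", instance, sensor);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Recover the sensor index from a type name; UINT_MAX if it does not parse. */
static unsigned int type_to_sensor(const char *type)
{
	int instance;
	unsigned int sensor;

	assert(type != NULL);
	if (sscanf(type, "nvme%d_temp%u", &instance, &sensor) != 2)
		return UINT_MAX;

	return sensor;
}
```

Note that the old "nvme_temp1" form no longer parses with this format, so
both the formatting and the parsing side would have to change together.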

> > +     tzdev = thermal_zone_device_register(type, 1, 1, ctrl, &nvme_tz_ops,
> > +                                          &nvme_tz_params, 0, 0);
>
> Have you considered if there is a use case for using of-thermal here?

Is it possible to specify device node properties for PCI devices?
If so, of-thermal zone devices would be very useful.

I think normal thermal zone devices and of-thermal zone devices can
co-exist (i.e. add 'tzdev_of[9]' to nvme_ctrl; the operations would be
almost the same as the normal ones).


* [PATCH v2 2/4] nvme: add thermal zone infrastructure
@ 2019-05-24 13:57       ` Akinobu Mita
  0 siblings, 0 replies; 38+ messages in thread
From: Akinobu Mita @ 2019-05-24 13:57 UTC (permalink / raw)


On Fri, May 24, 2019 at 11:35 Eduardo Valentin <edubezval at gmail.com> wrote:
>
> Hello Mita,
>
> On Wed, May 22, 2019 at 01:04:07AM +0900, Akinobu Mita wrote:
> > The NVMe controller reports up to nine temperature values in the SMART /
> > Health log page (the composite temperature and temperature sensor 1 through
> > temperature sensor 8).
>
> Is this a fixed number or we should be more flexible on the amount of
> sensors?

The maximum number is fixed.  In NVMe spec revision 1.3, a controller
reports up to nine temperature values in the SMART / Health information log.

It may grow beyond nine in a future revision, but we can handle that then.

> > The temperature threshold feature (Feature Identifier 04h) configures the
> > asynchronous event request command to complete when the temperature is
> > crossed its corresponding temperature threshold.
> >
> > This adds infrastructure to provide these temperatures and thresholds via
> > thermal zone devices.
> >
> > The nvme_thermal_zones_register() creates up to nine thermal zone devices
> > for all implemented temperature sensors including the composite
> > temperature.
>
> great!
>
> >
> > /sys/class/thermal/thermal_zone[0-*]:
> >     |---temp: Temperature
> >     |---trip_point_0_temp: Over temperature threshold
> >
> > The thermal_zone[0-*] contains a symlink to the corresponding nvme device.
> > On the other hand, the following symlinks to the thermal zone devices are
> > created in the nvme device sysfs directory.
> >
> > - nvme_temp0: Composite temperature
> > - nvme_temp1: Temperature sensor 1
> > ...
> > - nvme_temp8: Temperature sensor 8
> >
> > The nvme_thermal_zones_unregister() removes the registered thermal zone
> > devices and symlinks.
> >
> > Cc: Zhang Rui <rui.zhang at intel.com>
> > Cc: Eduardo Valentin <edubezval at gmail.com>
> > Cc: Daniel Lezcano <daniel.lezcano at linaro.org>
> > Cc: Keith Busch <keith.busch at intel.com>
> > Cc: Jens Axboe <axboe at fb.com>
> > Cc: Christoph Hellwig <hch at lst.de>
> > Cc: Sagi Grimberg <sagi at grimberg.me>
> > Cc: Minwoo Im <minwoo.im.dev at gmail.com>
> > Cc: Kenneth Heitke <kenneth.heitke at intel.com>
> > Cc: Chaitanya Kulkarni <Chaitanya.Kulkarni at wdc.com>
> > Signed-off-by: Akinobu Mita <akinobu.mita at gmail.com>
> > ---
> > * v2
> > - s/correspoinding/corresponding/ typo in commit log
> > - Borrowed nvme_get_features() from Keith's patch
> > - Temperature threshold notification is splitted into another patch
> > - Change the data type of 'sensor' to unsigned
> > - Add BUILD_BUG_ON for the array size of tzdev member in nvme_ctrl
> > - Add WARN_ON_ONCE for paranoid checks
> > - Fix off-by-one error in nvme_get_temp
> > - Validate 'sensor' where the value is actually used
> > - Define and utilize two enums related to the temperature threshold feature
> > - Remove hysteresis value for this trip point and don't utilize the under
> >   temperature threshold
> > - Print error message for thermal_zone_device_register() failure
> > - Add function comments for nvme_thermal_zones_{,un}register
> > - Suppress non-fatal errors from nvme_thermal_zones_register()
> > - Add comment about implemented temperature sensors
> > - Instead of creating a new 'thermal_work', append async smart event's
> >   action to the existing async_event_work
> > - Add comment for tzdev member in nvme_ctrl
> >
> >  drivers/nvme/host/core.c | 265 +++++++++++++++++++++++++++++++++++++++++++++++
> >  drivers/nvme/host/nvme.h |  27 +++++
> >  include/linux/nvme.h     |   5 +
> >  3 files changed, 297 insertions(+)
> >
> > diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> > index c04df80..0ec303c 100644
> > --- a/drivers/nvme/host/core.c
> > +++ b/drivers/nvme/host/core.c
> > @@ -2179,6 +2179,271 @@ static void nvme_set_latency_tolerance(struct device *dev, s32 val)
> >       }
> >  }
> >
> > +#ifdef CONFIG_THERMAL
> > +
> > +static int nvme_get_temp(struct nvme_ctrl *ctrl, unsigned int sensor, int *temp)
> > +{
> > +     struct nvme_smart_log *log;
> > +     int ret;
> > +
> > +     BUILD_BUG_ON(ARRAY_SIZE(log->temp_sensor) + 1 !=
> > +                  ARRAY_SIZE(ctrl->tzdev));
>
> When would this be triggered?

It triggers at build time if the number of temperature sensor fields in the
SMART log page structure and the size of the tzdev array in nvme_ctrl get
out of sync, i.e. if one of them is changed without the other.

> > +
> > +     if (WARN_ON_ONCE(sensor > ARRAY_SIZE(log->temp_sensor)))
> > +             return -EINVAL;
> > +
> > +     log = kzalloc(sizeof(*log), GFP_KERNEL);
>
> Do we really need to allocate memory every time we want to read
> temperature? Is this struct too large to fit stack?

I think 512 bytes is too large for the kernel stack.

> > +     if (!log)
> > +             return -ENOMEM;
> > +
> > +     ret = nvme_get_log(ctrl, NVME_NSID_ALL, NVME_LOG_SMART, 0,
> > +                        log, sizeof(*log), 0);
> > +     if (ret) {
> > +             ret = ret > 0 ? -EINVAL : ret;
> > +             goto free_log;
> > +     }
> > +
> > +     if (sensor)
> > +             *temp = le16_to_cpu(log->temp_sensor[sensor - 1]);
> > +     else
> > +             *temp = get_unaligned_le16(log->temperature);
> > +
> > +     if (!*temp)
> > +             ret = -EINVAL;
> > +
> > +free_log:
> > +     kfree(log);
> > +
> > +     return ret;
> > +}
> > +
> > +static unsigned int nvme_tz_type_to_sensor(const char *type)
> > +{
> > +     unsigned int sensor;
> > +
> > +     if (sscanf(type, "nvme_temp%u", &sensor) != 1)
> > +             return UINT_MAX;
> > +
> > +     return sensor;
> > +}
> > +
> > +#define KELVIN_TO_MILLICELSIUS(t) DECI_KELVIN_TO_MILLICELSIUS((t) * 10)
> > +#define MILLICELSIUS_TO_KELVIN(t) ((MILLICELSIUS_TO_DECI_KELVIN(t) + 5) / 10)
> > +
> > +static int nvme_tz_get_temp(struct thermal_zone_device *tzdev,
> > +                         int *temp)
> > +{
> > +     unsigned int sensor = nvme_tz_type_to_sensor(tzdev->type);
> > +     struct nvme_ctrl *ctrl = tzdev->devdata;
> > +     int ret;
> > +
> > +     ret = nvme_get_temp(ctrl, sensor, temp);
> > +     if (!ret)
> > +             *temp = KELVIN_TO_MILLICELSIUS(*temp);
> > +
> > +     return ret;
> > +}
> > +
> > +static int nvme_tz_get_trip_type(struct thermal_zone_device *tzdev,
> > +                              int trip, enum thermal_trip_type *type)
> > +{
> > +     *type = THERMAL_TRIP_ACTIVE;
> > +
> > +     return 0;
> > +}
> > +
> > +static int nvme_get_over_temp_thresh(struct nvme_ctrl *ctrl,
> > +                                  unsigned int sensor, int *temp)
> > +{
> > +     unsigned int threshold = sensor << NVME_TEMP_THRESH_SELECT_SHIFT;
> > +     int status;
> > +     int ret;
> > +
> > +     if (WARN_ON_ONCE(sensor >= ARRAY_SIZE(ctrl->tzdev)))
> > +             return -EINVAL;
> > +
> > +     ret = nvme_get_features(ctrl, NVME_FEAT_TEMP_THRESH, threshold, NULL, 0,
> > +                             &status);
> > +     if (!ret)
> > +             *temp = status & NVME_TEMP_THRESH_MASK;
> > +
> > +     return ret > 0 ? -EINVAL : ret;
> > +}
> > +
> > +static int nvme_set_over_temp_thresh(struct nvme_ctrl *ctrl,
> > +                                  unsigned int sensor, int temp)
> > +{
> > +     unsigned int threshold = sensor << NVME_TEMP_THRESH_SELECT_SHIFT;
> > +     int status;
> > +     int ret;
> > +
> > +     if (WARN_ON_ONCE(sensor >= ARRAY_SIZE(ctrl->tzdev)))
> > +             return -EINVAL;
> > +
> > +     if (temp > NVME_TEMP_THRESH_MASK)
> > +             return -EINVAL;
> > +
> > +     threshold |= temp & NVME_TEMP_THRESH_MASK;
> > +
> > +     ret = nvme_set_features(ctrl, NVME_FEAT_TEMP_THRESH, threshold, NULL, 0,
> > +                             &status);
> > +
> > +     return ret > 0 ? -EINVAL : ret;
> > +}
> > +
> > +static int nvme_tz_get_trip_temp(struct thermal_zone_device *tzdev,
> > +                              int trip, int *temp)
> > +{
> > +     unsigned int sensor = nvme_tz_type_to_sensor(tzdev->type);
> > +     struct nvme_ctrl *ctrl = tzdev->devdata;
> > +     int ret;
> > +
> > +     ret = nvme_get_over_temp_thresh(ctrl, sensor, temp);
> > +     if (!ret)
> > +             *temp = KELVIN_TO_MILLICELSIUS(*temp);
> > +
> > +     return ret;
> > +}
> > +
> > +static int nvme_tz_set_trip_temp(struct thermal_zone_device *tzdev,
> > +                              int trip, int temp)
> > +{
> > +     unsigned int sensor = nvme_tz_type_to_sensor(tzdev->type);
> > +     struct nvme_ctrl *ctrl = tzdev->devdata;
> > +
> > +     temp = MILLICELSIUS_TO_KELVIN(temp);
> > +
> > +     return nvme_set_over_temp_thresh(ctrl, sensor, temp);
> > +}
> > +
> > +static struct thermal_zone_device_ops nvme_tz_ops = {
> > +     .get_temp = nvme_tz_get_temp,
> > +     .get_trip_type = nvme_tz_get_trip_type,
> > +     .get_trip_temp = nvme_tz_get_trip_temp,
> > +     .set_trip_temp = nvme_tz_set_trip_temp,
> > +};
> > +
> > +static struct thermal_zone_params nvme_tz_params = {
> > +     .governor_name = "user_space",
> > +     .no_hwmon = true,
> > +};
> > +
> > +static struct thermal_zone_device *
> > +nvme_thermal_zone_register(struct nvme_ctrl *ctrl, unsigned int sensor)
> > +{
> > +     struct thermal_zone_device *tzdev;
> > +     char type[THERMAL_NAME_LENGTH];
> > +     int ret;
> > +
> > +     snprintf(type, sizeof(type), "nvme_temp%d", sensor);
> > +
>
> Do we have something more meaningful or descriptive here? A more
> interesting type would be a string that could remind of the sensor
> location. Unless nvme_temp0 is enough to understand where this
> temperature is coming from, I would ask to get something more
> descriptive.

The SMART log page defines composite temperature and temperature sensor 1
through temperature sensor 8.  So I think nvme_temp1 to nvme_temp8 are
descriptive.  And I personally prefer 'nvme_temp0' rather than
'nvme_composite_temp'.

BTW, if we have two or more controllers, we'll have the same type names
in the system.  So I'm going to append the instance number after 'nvme'
(e.g. nvme0_temp0).
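
A rough userspace sketch of what the per-controller naming could look like,
assuming the type string is built as "nvme<instance>_temp<sensor>" and parsed
back the same way.  The helper names and TZ_NAME_LENGTH are hypothetical;
TZ_NAME_LENGTH only stands in for the kernel's THERMAL_NAME_LENGTH, which I
believe is 20.

```c
#include <limits.h>
#include <stdio.h>

/* Assumption: stands in for the kernel's THERMAL_NAME_LENGTH. */
#define TZ_NAME_LENGTH 20

/* Build a type string such as "nvme0_temp0" (hypothetical helper). */
static int tz_format_type(char *buf, size_t len, int instance,
			  unsigned int sensor)
{
	int n = snprintf(buf, len, "nvme%d_temp%u", instance, sensor);

	/* Reject truncated names so two zones never share a clipped type. */
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Recover the sensor index; UINT_MAX when the string does not match. */
static unsigned int tz_type_to_sensor(const char *type)
{
	int instance;
	unsigned int sensor;

	if (sscanf(type, "nvme%d_temp%u", &instance, &sensor) != 2)
		return UINT_MAX;

	return sensor;
}
```

With this scheme the old "nvme_temp0" form no longer parses, so the parser
and the formatter would have to change together.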

> > +     tzdev = thermal_zone_device_register(type, 1, 1, ctrl, &nvme_tz_ops,
> > +                                          &nvme_tz_params, 0, 0);
>
> Have you considered if there is a use case for using of-thermal here?

Is it possible to specify device node properties for PCI devices?
If so, of-thermal zone devices would be very useful.

I think normal thermal zone devices and of-thermal zone devices can
co-exist (i.e. add 'tzdev_of[9]' to nvme_ctrl; the operations are
almost the same as the normal ones).


* Re: [PATCH v2 2/4] nvme: add thermal zone infrastructure
  2019-05-24 13:57       ` Akinobu Mita
@ 2019-06-03  2:18         ` Eduardo Valentin
  -1 siblings, 0 replies; 38+ messages in thread
From: Eduardo Valentin @ 2019-06-03  2:18 UTC (permalink / raw)
  To: Akinobu Mita
  Cc: linux-nvme, linux-pm, Zhang Rui, Daniel Lezcano, Keith Busch,
	Jens Axboe, Christoph Hellwig, Sagi Grimberg, Minwoo Im,
	Kenneth Heitke, Chaitanya Kulkarni

On Fri, May 24, 2019 at 10:57:36PM +0900, Akinobu Mita wrote:
> On Fri, May 24, 2019 at 11:35, Eduardo Valentin <edubezval@gmail.com> wrote:
> >
> > Hello Mita,
> >
> > On Wed, May 22, 2019 at 01:04:07AM +0900, Akinobu Mita wrote:
> > > The NVMe controller reports up to nine temperature values in the SMART /
> > > Health log page (the composite temperature and temperature sensor 1 through
> > > temperature sensor 8).
> >
> > Is this a fixed number or we should be more flexible on the amount of
> > sensors?
> 
> Max number is fixed.  In NVMe spec revision 1.3, a controller reports up
> to nine temperature values in the SMART / Health information log.
> 
> It may change to more than nine in the future, but we can fix then.
> 
> > > The temperature threshold feature (Feature Identifier 04h) configures the
> > > asynchronous event request command to complete when the temperature is
> > > crossed its corresponding temperature threshold.
> > >
> > > This adds infrastructure to provide these temperatures and thresholds via
> > > thermal zone devices.
> > >
> > > The nvme_thermal_zones_register() creates up to nine thermal zone devices
> > > for all implemented temperature sensors including the composite
> > > temperature.
> >
> > great!
> >
> > >
> > > /sys/class/thermal/thermal_zone[0-*]:
> > >     |---temp: Temperature
> > >     |---trip_point_0_temp: Over temperature threshold
> > >
> > > The thermal_zone[0-*] contains a symlink to the corresponding nvme device.
> > > On the other hand, the following symlinks to the thermal zone devices are
> > > created in the nvme device sysfs directory.
> > >
> > > - nvme_temp0: Composite temperature
> > > - nvme_temp1: Temperature sensor 1
> > > ...
> > > - nvme_temp8: Temperature sensor 8
> > >
> > > The nvme_thermal_zones_unregister() removes the registered thermal zone
> > > devices and symlinks.
> > >
> > > Cc: Zhang Rui <rui.zhang@intel.com>
> > > Cc: Eduardo Valentin <edubezval@gmail.com>
> > > Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
> > > Cc: Keith Busch <keith.busch@intel.com>
> > > Cc: Jens Axboe <axboe@fb.com>
> > > Cc: Christoph Hellwig <hch@lst.de>
> > > Cc: Sagi Grimberg <sagi@grimberg.me>
> > > Cc: Minwoo Im <minwoo.im.dev@gmail.com>
> > > Cc: Kenneth Heitke <kenneth.heitke@intel.com>
> > > Cc: Chaitanya Kulkarni <Chaitanya.Kulkarni@wdc.com>
> > > Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
> > > ---
> > > * v2
> > > - s/correspoinding/corresponding/ typo in commit log
> > > - Borrowed nvme_get_features() from Keith's patch
> > > - Temperature threshold notification is splitted into another patch
> > > - Change the data type of 'sensor' to unsigned
> > > - Add BUILD_BUG_ON for the array size of tzdev member in nvme_ctrl
> > > - Add WARN_ON_ONCE for paranoid checks
> > > - Fix off-by-one error in nvme_get_temp
> > > - Validate 'sensor' where the value is actually used
> > > - Define and utilize two enums related to the temperature threshold feature
> > > - Remove hysteresis value for this trip point and don't utilize the under
> > >   temperature threshold
> > > - Print error message for thermal_zone_device_register() failure
> > > - Add function comments for nvme_thermal_zones_{,un}register
> > > - Suppress non-fatal errors from nvme_thermal_zones_register()
> > > - Add comment about implemented temperature sensors
> > > - Instead of creating a new 'thermal_work', append async smart event's
> > >   action to the existing async_event_work
> > > - Add comment for tzdev member in nvme_ctrl
> > >
> > >  drivers/nvme/host/core.c | 265 +++++++++++++++++++++++++++++++++++++++++++++++
> > >  drivers/nvme/host/nvme.h |  27 +++++
> > >  include/linux/nvme.h     |   5 +
> > >  3 files changed, 297 insertions(+)
> > >
> > > diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> > > index c04df80..0ec303c 100644
> > > --- a/drivers/nvme/host/core.c
> > > +++ b/drivers/nvme/host/core.c
> > > @@ -2179,6 +2179,271 @@ static void nvme_set_latency_tolerance(struct device *dev, s32 val)
> > >       }
> > >  }
> > >
> > > +#ifdef CONFIG_THERMAL
> > > +
> > > +static int nvme_get_temp(struct nvme_ctrl *ctrl, unsigned int sensor, int *temp)
> > > +{
> > > +     struct nvme_smart_log *log;
> > > +     int ret;
> > > +
> > > +     BUILD_BUG_ON(ARRAY_SIZE(log->temp_sensor) + 1 !=
> > > +                  ARRAY_SIZE(ctrl->tzdev));
> >
> > When would this be triggered?
> 
> This just ensures that the temperature fields for the SMART log page
> structure and nvme_ctrl are not changed accidentally.
> 

Ok.
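
For readers unfamiliar with the idiom, here is a small standalone sketch of
the same kind of compile-time check, using C11 static_assert in place of the
kernel's BUILD_BUG_ON.  The two structs are simplified stand-ins, not the
real nvme_smart_log and nvme_ctrl definitions, and sketch_zone_count() is a
hypothetical helper.

```c
#include <assert.h>	/* static_assert (C11) */
#include <stdint.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Simplified stand-ins for the two kernel structures (assumption). */
struct smart_log_sketch {
	uint16_t temp_sensor[8];	/* temperature sensor 1..8 */
};

struct ctrl_sketch {
	void *tzdev[9];			/* composite temperature + 8 sensors */
};

/* The build fails if either array is resized without the other. */
static_assert(ARRAY_SIZE(((struct smart_log_sketch *)0)->temp_sensor) + 1 ==
	      ARRAY_SIZE(((struct ctrl_sketch *)0)->tzdev),
	      "tzdev[] must hold the composite temperature plus every sensor");

/* Number of thermal zones one controller can expose in this sketch. */
static unsigned int sketch_zone_count(void)
{
	return ARRAY_SIZE(((struct ctrl_sketch *)0)->tzdev);
}
```

The check costs nothing at run time; it only trips when someone changes one
array size and forgets the other.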

> > > +
> > > +     if (WARN_ON_ONCE(sensor > ARRAY_SIZE(log->temp_sensor)))
> > > +             return -EINVAL;
> > > +
> > > +     log = kzalloc(sizeof(*log), GFP_KERNEL);
> >
> > Do we really need to allocate memory every time we want to read
> > temperature? Is this struct too large to fit stack?
> 
> I think 512 bytes is too large in the kernel stack
> 

I see


<cut> 

> > > +
> >
> > Do we have something more meaningful or descriptive here? A more
> > interesting type would be a string that could remind of the sensor
> > location. Unless nvme_temp0 is enough to understand where this
> > temperature is coming from, I would ask to get something more
> > descriptive.
> 
> The SMART log page defines composite temperature and temperature sensor 1
> through temperature sensor 8.  So I think nvme_temp1 to nvme_temp8 are
> descriptive.  And I personally prefer 'nvme_temp0' rather than
> 'nvme_composite_temp'.

I was leaning towards something even more descriptive. What does
nvme_temp0 mean? Usually we want something more meaningful. Is this a
co-processor? Is this a disk? What exactly does nvme_temp0 represent?


> 
> BTW, if we have more than two controllers, we'll have same type names
> in the system.  So I'm going to append instance number after 'nvme'.
> (e.g. nvme0_temp0).
> 
> > > +     tzdev = thermal_zone_device_register(type, 1, 1, ctrl, &nvme_tz_ops,
> > > +                                          &nvme_tz_params, 0, 0);
> >
> > Have you considered if there is a use case for using of-thermal here?
> 
> Is it possible to specify the device node properties for the pci devices?
> If so, of-thermal zone devices are very useful.
> 

Yeah, I guess that would depend on the PCI device node descriptor in
which the sensor is going to be embedded, not on of-thermal. But I would
expect that DT already has good enough descriptors for PCI devices. Can
you check that?

> I think normal thermal zone devices and of-thermal zone devices can
> co-exist. (i.e. add 'tzdev_of[9]' in nvme_ctrl and the operations are
> almost same with the normal one)

Right, that is usually the case for drivers that have a real need to
support both. Most of the drivers from embedded systems would prefer
to keep only DT probing. But if you have a use case to support non-DT
probing, yes, your driver would need to support both ways.


* Re: [PATCH v2 2/4] nvme: add thermal zone infrastructure
  2019-06-03  2:18         ` Eduardo Valentin
@ 2019-06-03 15:03           ` Akinobu Mita
  -1 siblings, 0 replies; 38+ messages in thread
From: Akinobu Mita @ 2019-06-03 15:03 UTC (permalink / raw)
  To: Eduardo Valentin
  Cc: linux-nvme, linux-pm, Zhang Rui, Daniel Lezcano, Keith Busch,
	Jens Axboe, Christoph Hellwig, Sagi Grimberg, Minwoo Im,
	Kenneth Heitke, Chaitanya Kulkarni

On Mon, Jun 3, 2019 at 11:18, Eduardo Valentin <edubezval@gmail.com> wrote:

> > > Do we have something more meaningful or descriptive here? A more
> > > interesting type would be a string that could remind of the sensor
> > > location. Unless nvme_temp0 is enough to understand where this
> > > temperature is coming from, I would ask to get something more
> > > descriptive.
> >
> > The SMART log page defines composite temperature and temperature sensor 1
> > through temperature sensor 8.  So I think nvme_temp1 to nvme_temp8 are
> > descriptive.  And I personally prefer 'nvme_temp0' rather than
> > 'nvme_composite_temp'.
>
> I was leaning towards something even more descriptive. nvme_temp0 means
> what? Usually we want something more meaningful, Is this a co-processor?
> Is this a disk? what exactly nvme_temp0 really represents?

It's vendor specific. The NVMe spec only says a controller reports the
composite temperature and temperature sensors 1 through 8.
It doesn't define which part of the device (CPUs, DRAM, NAND, or
something else) should implement temperature sensors, or how the
composite temperature is calculated from the implemented sensors.

I have three NVMe devices from different vendors.

Device A provides only the composite temperature.

Device B provides the composite temperature and temperature sensor 1.
Both temperatures are always the same.

Device C provides the composite temperature and temperature sensors 1,
2, and 5.  For example, the smart log reports:

  Composite temperature : 43 C
  Temperature Sensor 1  : 45 C
  Temperature Sensor 2  : 41 C
  Temperature Sensor 5  : 65 C
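
To illustrate how a sparse layout like device C's can be told apart from a
fully populated one, here is a minimal userspace sketch.  It assumes the
sensor fields have already been converted from little-endian to host order,
and it relies on my reading of the NVMe 1.3 SMART log definition that an
unimplemented sensor reports 0h.  The helper names and the whole-degree
K - 273 conversion are assumptions made for this example only.

```c
#include <stdint.h>
#include <stdio.h>

#define NR_TEMP_SENSORS 8

/* Whole-degree conversion (assumption): the log reports Kelvin. */
static int kelvin_to_celsius(uint16_t kelvin)
{
	return (int)kelvin - 273;
}

/*
 * Print the composite temperature and every implemented sensor, and
 * return how many sensors were found.  A raw value of 0 is treated as
 * "sensor not implemented" and skipped.
 */
static int report_sensors(uint16_t composite,
			  const uint16_t sensors[NR_TEMP_SENSORS])
{
	int implemented = 0;

	printf("Composite temperature : %d C\n", kelvin_to_celsius(composite));
	for (int i = 0; i < NR_TEMP_SENSORS; i++) {
		if (sensors[i] == 0)
			continue;
		printf("Temperature Sensor %d  : %d C\n", i + 1,
		       kelvin_to_celsius(sensors[i]));
		implemented++;
	}

	return implemented;
}
```

Feeding it a composite value of 316 and the array {318, 314, 0, 0, 338, 0,
0, 0} corresponds to device C: three sensors are implemented, so only those
would get a thermal zone besides the composite one.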

> > BTW, if we have more than two controllers, we'll have same type names
> > in the system.  So I'm going to append instance number after 'nvme'.
> > (e.g. nvme0_temp0).
> >
> > > > +     tzdev = thermal_zone_device_register(type, 1, 1, ctrl, &nvme_tz_ops,
> > > > +                                          &nvme_tz_params, 0, 0);
> > >
> > > Have you considered if there is a use case for using of-thermal here?
> >
> > Is it possible to specify the device node properties for the pci devices?
> > If so, of-thermal zone devices are very useful.
> >
>
> Yeah, I guess that would depend on the PCI device node descriptor that
> the sensor is going to be embedded, not of-thermal. But I would expect
> that DT has already a good enough DT descriptors for PCI devices, can
> you check that?

I can find examples for the ath9k and ath10k PCIe wireless devices
(Documentation/devicetree/bindings/net/wireless/qca,ath9k.txt and
qcom,ath10k.txt).

> > I think normal thermal zone devices and of-thermal zone devices can
> > co-exist. (i.e. add 'tzdev_of[9]' in nvme_ctrl and the operations are
> > almost same with the normal one)
>
> Right, that is usually the case for drivers that have a real need to
> support both. Most of the drivers from embedded systems would prefer
> to keep only DT probing. But if you have a use case to support non-DT
> probing, yes, your driver would need to support both ways.

Distro kernels for x86 usually disable CONFIG_OF, so we need both.


end of thread, other threads:[~2019-06-03 15:03 UTC | newest]

Thread overview: 38+ messages
2019-05-21 16:04 [PATCH v2 0/4] nvme: add thermal zone devices Akinobu Mita
2019-05-21 16:04 ` Akinobu Mita
2019-05-21 16:04 ` [PATCH v2 1/4] nvme: Export get and set features Akinobu Mita
2019-05-21 16:04   ` Akinobu Mita
2019-05-21 17:23   ` Chaitanya Kulkarni
2019-05-21 17:23     ` Chaitanya Kulkarni
2019-05-22 15:24     ` Akinobu Mita
2019-05-22 15:24       ` Akinobu Mita
2019-05-21 16:04 ` [PATCH v2 2/4] nvme: add thermal zone infrastructure Akinobu Mita
2019-05-21 16:04   ` Akinobu Mita
2019-05-21 16:15   ` Keith Busch
2019-05-21 16:15     ` Keith Busch
2019-05-22 15:44     ` Akinobu Mita
2019-05-22 15:44       ` Akinobu Mita
2019-05-22 15:44       ` Keith Busch
2019-05-22 15:44         ` Keith Busch
2019-05-22 15:52         ` Akinobu Mita
2019-05-22 15:52           ` Akinobu Mita
2019-05-21 21:05   ` Heitke, Kenneth
2019-05-21 21:05     ` Heitke, Kenneth
2019-05-22 15:23     ` Akinobu Mita
2019-05-22 15:23       ` Akinobu Mita
2019-05-24  2:35   ` Eduardo Valentin
2019-05-24  2:35     ` Eduardo Valentin
2019-05-24 13:57     ` Akinobu Mita
2019-05-24 13:57       ` Akinobu Mita
2019-06-03  2:18       ` Eduardo Valentin
2019-06-03  2:18         ` Eduardo Valentin
2019-06-03 15:03         ` Akinobu Mita
2019-06-03 15:03           ` Akinobu Mita
2019-05-21 16:04 ` [PATCH v2 3/4] nvme: notify thermal framework when temperature threshold events occur Akinobu Mita
2019-05-21 16:04   ` Akinobu Mita
2019-05-21 16:04 ` [PATCH v2 4/4] nvme-pci: support thermal zone Akinobu Mita
2019-05-21 16:04   ` Akinobu Mita
2019-05-22 17:46   ` Christoph Hellwig
2019-05-22 17:46     ` Christoph Hellwig
2019-05-23 14:21     ` Akinobu Mita
2019-05-23 14:21       ` Akinobu Mita
