From: Akinobu Mita <akinobu.mita@gmail.com> To: Eduardo Valentin <edubezval@gmail.com> Cc: linux-nvme@lists.infradead.org, linux-pm@vger.kernel.org, Zhang Rui <rui.zhang@intel.com>, Daniel Lezcano <daniel.lezcano@linaro.org>, Keith Busch <keith.busch@intel.com>, Jens Axboe <axboe@fb.com>, Christoph Hellwig <hch@lst.de>, Sagi Grimberg <sagi@grimberg.me>, Minwoo Im <minwoo.im.dev@gmail.com>, Kenneth Heitke <kenneth.heitke@intel.com>, Chaitanya Kulkarni <Chaitanya.Kulkarni@wdc.com> Subject: Re: [PATCH v3 2/3] nvme: add thermal zone devices Date: Tue, 4 Jun 2019 00:20:19 +0900 [thread overview] Message-ID: <CAC5umyj9V0sTD-ZsK6Q684wPdMpJGs434vDtZDm6a8gwoz3D7Q@mail.gmail.com> (raw) In-Reply-To: <20190603023617.GA8567@localhost.localdomain> 2019年6月3日(月) 11:36 Eduardo Valentin <edubezval@gmail.com>: > > Hey Mita, > > On Mon, May 27, 2019 at 01:29:02AM +0900, Akinobu Mita wrote: > > The NVMe controller reports up to nine temperature values in the SMART / > > Health log page (the composite temperature and temperature sensor 1 through > > temperature sensor 8). > > > > This provides these temperatures via thermal zone devices. > > > > Once the controller is identified, the thermal zone devices are created for > > all implemented temperature sensors including the composite temperature. > > > > /sys/class/thermal/thermal_zone[0-*]: > > |---type: 'nvme<instance>-temp<sensor>' > > |---temp: Temperature > > |---trip_point_0_temp: Over temperature threshold > > > > The thermal_zone[0-*] contains a 'device' symlink to the corresponding nvme > > device. > > > > On the other hand, the following symlinks to the thermal zone devices are > > created in the nvme device sysfs directory. > > > > - temp0: Composite temperature > > - temp1: Temperature sensor 1 > > ... > > - temp8: Temperature sensor 8 > > > > These questions on V2 are still unanswered: > a. Can we get a more descriptive string into tz->type? As I said in the other thread, the NVMe spec only says a controller reports the composite temperature and temperature sensor 1 through 8. What temperature sensor N means is vendor specific. > b. Can these APIs support DT probing too? OK. I can try, but I don't have arm or arm64 boards with PCIe for testing with of-thermal. So it'll be untested. > > +static struct thermal_zone_params nvme_tz_params = { > > + .governor_name = "user_space", > > Also, > > Was there any particular reason why defaulting to user_space here? I only tested with the user_space governor. There is no cooling device to bind with this. > > + .no_hwmon = true, > > +}; > > + > > +static struct thermal_zone_device * > > +nvme_thermal_zone_register(struct nvme_ctrl *ctrl, unsigned int sensor) > > +{ > > + struct thermal_zone_device *tzdev; > > + char name[THERMAL_NAME_LENGTH]; > > + int ret; > > + > > + snprintf(name, sizeof(name), "nvme%d_temp%u", ctrl->instance, sensor); > > + > > + tzdev = thermal_zone_device_register(name, 1, 1, ctrl, &nvme_tz_ops, > > + &nvme_tz_params, 0, 0); > > + if (IS_ERR(tzdev)) { > > + dev_err(ctrl->device, > > + "Failed to register thermal zone device: %ld\n", > > + PTR_ERR(tzdev)); > > + return tzdev; > > + } > > + > > + snprintf(name, sizeof(name), "temp%d", sensor); > > + ret = sysfs_create_link(&ctrl->ctrl_device.kobj, &tzdev->device.kobj, > > + name); > > + if (ret) > > + goto device_unregister; > > + > > + ret = sysfs_create_link(&tzdev->device.kobj, > > + &ctrl->ctrl_device.kobj, "device"); > > + if (ret) > > + goto remove_link; > > + > > + return tzdev; > > + > > +remove_link: > > + sysfs_remove_link(&ctrl->ctrl_device.kobj, name); > > +device_unregister: > > + thermal_zone_device_unregister(tzdev); > > + > > + return ERR_PTR(ret); > > +} > > + > > +/** > > + * nvme_thermal_zones_register() - register nvme thermal zone devices > > + * @ctrl: controller instance > > + * > > + * This function creates up to nine thermal zone devices for all implemented > > + * temperature sensors including the composite temperature. > > + * Each thermal zone device provides a single trip point temperature that is > > + * associated with an over temperature threshold. > > + */ > > +static int nvme_thermal_zones_register(struct nvme_ctrl *ctrl) > > +{ > > + struct nvme_smart_log *log; > > + int ret; > > + int i; > > + > > + log = kzalloc(sizeof(*log), GFP_KERNEL); > > + if (!log) > > + return 0; /* non-fatal error */ > > I am not sure about this API design here. I would leave the error > handling and judging if this is fatal or not to the caller. > I mean, if I ask to register a nvme thermal zone and I get > a 0 as response, I would assume the thermal zone exists from > now on, right? This routine is designed to return error code only when we're unable to communicate with the device at all (i.e. nvme_submit_sync_cmd returns a negative value). We don't want to abandon device initialization just due to thermal zone failures. Because thermal zone device isn't mandatory to manage the controllers, and the device like qemu doesn't provide smart log (according to Keith). > > + > > + ret = nvme_get_log(ctrl, NVME_NSID_ALL, NVME_LOG_SMART, 0, > > + log, sizeof(*log), 0); > > + if (ret) { > > + dev_err(ctrl->device, "Failed to get SMART log: %d\n", ret); > > + /* If the device provided a response, then it's non-fatal */ > > + if (ret > 0) > > + ret = 0; > > + goto free_log; > > Once again, hiding errors is never a strategy that scales.. > > > + } > > + > > + for (i = 0; i < ARRAY_SIZE(ctrl->tzdev); i++) { > > + struct thermal_zone_device *tzdev; > > + int temp; > > + > > + if (i) > > + temp = le16_to_cpu(log->temp_sensor[i - 1]); > > + else > > + temp = get_unaligned_le16(log->temperature); > > + > > + /* > > + * All implemented temperature sensors report a non-zero value > > + * in temperature sensor fields in the smart log page. > > + */ > > + if (!temp) > > + continue; > > + if (ctrl->tzdev[i]) > > + continue; > > + > > + tzdev = nvme_thermal_zone_register(ctrl, i); > > + if (!IS_ERR(tzdev)) > > + ctrl->tzdev[i] = tzdev; > > Here again, I would not hide errors, the API should propagate errors > if something goes wrong. > > > + } > > + > > +free_log: > > + kfree(log); > > + > > + return ret; > > +} > > + > > +/** > > + * nvme_thermal_zones_unregister() - unregister nvme thermal zone devices > > + * @ctrl: controller instance > > + * > > + * This function removes the registered thermal zone devices and symlinks. > > + */ > > +static void nvme_thermal_zones_unregister(struct nvme_ctrl *ctrl) > > +{ > > + int i; > > + > > + for (i = 0; i < ARRAY_SIZE(ctrl->tzdev); i++) { > > + struct thermal_zone_device *tzdev = ctrl->tzdev[i]; > > + char name[20]; > > + > > + if (!tzdev) > > + continue; > > + > > + sysfs_remove_link(&tzdev->device.kobj, "device"); > > + > > + snprintf(name, sizeof(name), "temp%d", i); > > + sysfs_remove_link(&ctrl->ctrl_device.kobj, name); > > + > > + thermal_zone_device_unregister(tzdev); > > + > > + ctrl->tzdev[i] = NULL; > > + } > > +} > > + > > +#else > > + > > +static inline int nvme_thermal_zones_register(struct nvme_ctrl *ctrl) > > +{ > > + return 0; > > +} > > + > > +static inline void nvme_thermal_zones_unregister(struct nvme_ctrl *ctrl) > > +{ > > +} > > + > > +#endif /* CONFIG_THERMAL */ > > + > > struct nvme_core_quirk_entry { > > /* > > * NVMe model and firmware strings are padded with spaces. For > > @@ -2754,6 +3037,10 @@ int nvme_init_identify(struct nvme_ctrl *ctrl) > > if (ret < 0) > > return ret; > > > > + ret = nvme_thermal_zones_register(ctrl); > > + if (ret < 0) > > + return ret; > > > I would definitely keep this code the way it is here in > nvme_init_identify(), but if I read this right, your > nvme_thermal_zones_register() will never return < 0, and therefore this > condition is never met. The nvme_get_log() can return a negative value. Only in this case, nvme_thermal_zones_register() returns error and abandon the device initialization.
WARNING: multiple messages have this Message-ID (diff)
From: akinobu.mita@gmail.com (Akinobu Mita) Subject: [PATCH v3 2/3] nvme: add thermal zone devices Date: Tue, 4 Jun 2019 00:20:19 +0900 [thread overview] Message-ID: <CAC5umyj9V0sTD-ZsK6Q684wPdMpJGs434vDtZDm6a8gwoz3D7Q@mail.gmail.com> (raw) In-Reply-To: <20190603023617.GA8567@localhost.localdomain> 2019?6?3?(?) 11:36 Eduardo Valentin <edubezval at gmail.com>: > > Hey Mita, > > On Mon, May 27, 2019@01:29:02AM +0900, Akinobu Mita wrote: > > The NVMe controller reports up to nine temperature values in the SMART / > > Health log page (the composite temperature and temperature sensor 1 through > > temperature sensor 8). > > > > This provides these temperatures via thermal zone devices. > > > > Once the controller is identified, the thermal zone devices are created for > > all implemented temperature sensors including the composite temperature. > > > > /sys/class/thermal/thermal_zone[0-*]: > > |---type: 'nvme<instance>-temp<sensor>' > > |---temp: Temperature > > |---trip_point_0_temp: Over temperature threshold > > > > The thermal_zone[0-*] contains a 'device' symlink to the corresponding nvme > > device. > > > > On the other hand, the following symlinks to the thermal zone devices are > > created in the nvme device sysfs directory. > > > > - temp0: Composite temperature > > - temp1: Temperature sensor 1 > > ... > > - temp8: Temperature sensor 8 > > > > These questions on V2 are still unanswered: > a. Can we get a more descriptive string into tz->type? As I said in the other thread, the NVMe spec only says a controller reports the composite temperature and temperature sensor 1 through 8. What temperature sensor N means is vendor specific. > b. Can these APIs support DT probing too? OK. I can try, but I don't have arm or arm64 boards with PCIe for testing with of-thermal. So it'll be untested. > > +static struct thermal_zone_params nvme_tz_params = { > > + .governor_name = "user_space", > > Also, > > Was there any particular reason why defaulting to user_space here? I only tested with the user_space governor. There is no cooling device to bind with this. > > + .no_hwmon = true, > > +}; > > + > > +static struct thermal_zone_device * > > +nvme_thermal_zone_register(struct nvme_ctrl *ctrl, unsigned int sensor) > > +{ > > + struct thermal_zone_device *tzdev; > > + char name[THERMAL_NAME_LENGTH]; > > + int ret; > > + > > + snprintf(name, sizeof(name), "nvme%d_temp%u", ctrl->instance, sensor); > > + > > + tzdev = thermal_zone_device_register(name, 1, 1, ctrl, &nvme_tz_ops, > > + &nvme_tz_params, 0, 0); > > + if (IS_ERR(tzdev)) { > > + dev_err(ctrl->device, > > + "Failed to register thermal zone device: %ld\n", > > + PTR_ERR(tzdev)); > > + return tzdev; > > + } > > + > > + snprintf(name, sizeof(name), "temp%d", sensor); > > + ret = sysfs_create_link(&ctrl->ctrl_device.kobj, &tzdev->device.kobj, > > + name); > > + if (ret) > > + goto device_unregister; > > + > > + ret = sysfs_create_link(&tzdev->device.kobj, > > + &ctrl->ctrl_device.kobj, "device"); > > + if (ret) > > + goto remove_link; > > + > > + return tzdev; > > + > > +remove_link: > > + sysfs_remove_link(&ctrl->ctrl_device.kobj, name); > > +device_unregister: > > + thermal_zone_device_unregister(tzdev); > > + > > + return ERR_PTR(ret); > > +} > > + > > +/** > > + * nvme_thermal_zones_register() - register nvme thermal zone devices > > + * @ctrl: controller instance > > + * > > + * This function creates up to nine thermal zone devices for all implemented > > + * temperature sensors including the composite temperature. > > + * Each thermal zone device provides a single trip point temperature that is > > + * associated with an over temperature threshold. > > + */ > > +static int nvme_thermal_zones_register(struct nvme_ctrl *ctrl) > > +{ > > + struct nvme_smart_log *log; > > + int ret; > > + int i; > > + > > + log = kzalloc(sizeof(*log), GFP_KERNEL); > > + if (!log) > > + return 0; /* non-fatal error */ > > I am not sure about this API design here. I would leave the error > handling and judging if this is fatal or not to the caller. > I mean, if I ask to register a nvme thermal zone and I get > a 0 as response, I would assume the thermal zone exists from > now on, right? This routine is designed to return error code only when we're unable to communicate with the device at all (i.e. nvme_submit_sync_cmd returns a negative value). We don't want to abandon device initialization just due to thermal zone failures. Because thermal zone device isn't mandatory to manage the controllers, and the device like qemu doesn't provide smart log (according to Keith). > > + > > + ret = nvme_get_log(ctrl, NVME_NSID_ALL, NVME_LOG_SMART, 0, > > + log, sizeof(*log), 0); > > + if (ret) { > > + dev_err(ctrl->device, "Failed to get SMART log: %d\n", ret); > > + /* If the device provided a response, then it's non-fatal */ > > + if (ret > 0) > > + ret = 0; > > + goto free_log; > > Once again, hiding errors is never a strategy that scales.. > > > + } > > + > > + for (i = 0; i < ARRAY_SIZE(ctrl->tzdev); i++) { > > + struct thermal_zone_device *tzdev; > > + int temp; > > + > > + if (i) > > + temp = le16_to_cpu(log->temp_sensor[i - 1]); > > + else > > + temp = get_unaligned_le16(log->temperature); > > + > > + /* > > + * All implemented temperature sensors report a non-zero value > > + * in temperature sensor fields in the smart log page. > > + */ > > + if (!temp) > > + continue; > > + if (ctrl->tzdev[i]) > > + continue; > > + > > + tzdev = nvme_thermal_zone_register(ctrl, i); > > + if (!IS_ERR(tzdev)) > > + ctrl->tzdev[i] = tzdev; > > Here again, I would not hide errors, the API should propagate errors > if something goes wrong. > > > + } > > + > > +free_log: > > + kfree(log); > > + > > + return ret; > > +} > > + > > +/** > > + * nvme_thermal_zones_unregister() - unregister nvme thermal zone devices > > + * @ctrl: controller instance > > + * > > + * This function removes the registered thermal zone devices and symlinks. > > + */ > > +static void nvme_thermal_zones_unregister(struct nvme_ctrl *ctrl) > > +{ > > + int i; > > + > > + for (i = 0; i < ARRAY_SIZE(ctrl->tzdev); i++) { > > + struct thermal_zone_device *tzdev = ctrl->tzdev[i]; > > + char name[20]; > > + > > + if (!tzdev) > > + continue; > > + > > + sysfs_remove_link(&tzdev->device.kobj, "device"); > > + > > + snprintf(name, sizeof(name), "temp%d", i); > > + sysfs_remove_link(&ctrl->ctrl_device.kobj, name); > > + > > + thermal_zone_device_unregister(tzdev); > > + > > + ctrl->tzdev[i] = NULL; > > + } > > +} > > + > > +#else > > + > > +static inline int nvme_thermal_zones_register(struct nvme_ctrl *ctrl) > > +{ > > + return 0; > > +} > > + > > +static inline void nvme_thermal_zones_unregister(struct nvme_ctrl *ctrl) > > +{ > > +} > > + > > +#endif /* CONFIG_THERMAL */ > > + > > struct nvme_core_quirk_entry { > > /* > > * NVMe model and firmware strings are padded with spaces. For > > @@ -2754,6 +3037,10 @@ int nvme_init_identify(struct nvme_ctrl *ctrl) > > if (ret < 0) > > return ret; > > > > + ret = nvme_thermal_zones_register(ctrl); > > + if (ret < 0) > > + return ret; > > > I would definitely keep this code the way it is here in > nvme_init_identify(), but if I read this right, your > nvme_thermal_zones_register() will never return < 0, and therefore this > condition is never met. The nvme_get_log() can return a negative value. Only in this case, nvme_thermal_zones_register() returns error and abandon the device initialization.
next prev parent reply other threads:[~2019-06-03 15:20 UTC|newest] Thread overview: 40+ messages / expand[flat|nested] mbox.gz Atom feed top 2019-05-26 16:29 [PATCH v3 0/3] nvme: add thermal zone devices Akinobu Mita 2019-05-26 16:29 ` Akinobu Mita 2019-05-26 16:29 ` [PATCH v3 1/3] nvme: Export get and set features Akinobu Mita 2019-05-26 16:29 ` Akinobu Mita 2019-05-26 16:45 ` Chaitanya Kulkarni 2019-05-26 16:45 ` Chaitanya Kulkarni 2019-05-29 15:19 ` Minwoo Im 2019-05-29 15:19 ` Minwoo Im 2019-05-26 16:29 ` [PATCH v3 2/3] nvme: add thermal zone devices Akinobu Mita 2019-05-26 16:29 ` Akinobu Mita 2019-05-29 15:15 ` Minwoo Im 2019-05-29 15:15 ` Minwoo Im 2019-05-29 16:47 ` Akinobu Mita 2019-05-29 16:47 ` Akinobu Mita 2019-05-30 10:18 ` Minwoo Im 2019-05-30 10:18 ` Minwoo Im 2019-06-01 9:02 ` Christoph Hellwig 2019-06-01 9:02 ` Christoph Hellwig 2019-06-02 13:19 ` Akinobu Mita 2019-06-02 13:19 ` Akinobu Mita 2019-06-04 7:31 ` Christoph Hellwig 2019-06-04 7:31 ` Christoph Hellwig 2019-06-05 15:42 ` Akinobu Mita 2019-06-05 15:42 ` Akinobu Mita 2019-06-03 2:36 ` Eduardo Valentin 2019-06-03 2:36 ` Eduardo Valentin 2019-06-03 15:20 ` Akinobu Mita [this message] 2019-06-03 15:20 ` Akinobu Mita 2019-06-06 4:05 ` Eduardo Valentin 2019-06-06 4:05 ` Eduardo Valentin 2019-06-07 15:21 ` Akinobu Mita 2019-06-07 15:21 ` Akinobu Mita 2019-05-26 16:29 ` [PATCH v3 3/3] nvme: notify thermal framework when temperature threshold events occur Akinobu Mita 2019-05-26 16:29 ` Akinobu Mita 2019-06-01 9:03 ` Christoph Hellwig 2019-06-01 9:03 ` Christoph Hellwig 2019-06-02 13:46 ` Akinobu Mita 2019-06-02 13:46 ` Akinobu Mita 2019-06-04 7:32 ` Christoph Hellwig 2019-06-04 7:32 ` Christoph Hellwig
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=CAC5umyj9V0sTD-ZsK6Q684wPdMpJGs434vDtZDm6a8gwoz3D7Q@mail.gmail.com \ --to=akinobu.mita@gmail.com \ --cc=Chaitanya.Kulkarni@wdc.com \ --cc=axboe@fb.com \ --cc=daniel.lezcano@linaro.org \ --cc=edubezval@gmail.com \ --cc=hch@lst.de \ --cc=keith.busch@intel.com \ --cc=kenneth.heitke@intel.com \ --cc=linux-nvme@lists.infradead.org \ --cc=linux-pm@vger.kernel.org \ --cc=minwoo.im.dev@gmail.com \ --cc=rui.zhang@intel.com \ --cc=sagi@grimberg.me \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: linkBe sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.