All of lore.kernel.org
 help / color / mirror / Atom feed
From: Akinobu Mita <akinobu.mita@gmail.com>
To: Eduardo Valentin <edubezval@gmail.com>
Cc: linux-nvme@lists.infradead.org, linux-pm@vger.kernel.org,
	Zhang Rui <rui.zhang@intel.com>,
	Daniel Lezcano <daniel.lezcano@linaro.org>,
	Keith Busch <keith.busch@intel.com>, Jens Axboe <axboe@fb.com>,
	Christoph Hellwig <hch@lst.de>, Sagi Grimberg <sagi@grimberg.me>,
	Minwoo Im <minwoo.im.dev@gmail.com>,
	Kenneth Heitke <kenneth.heitke@intel.com>,
	Chaitanya Kulkarni <Chaitanya.Kulkarni@wdc.com>
Subject: Re: [PATCH v3 2/3] nvme: add thermal zone devices
Date: Tue, 4 Jun 2019 00:20:19 +0900	[thread overview]
Message-ID: <CAC5umyj9V0sTD-ZsK6Q684wPdMpJGs434vDtZDm6a8gwoz3D7Q@mail.gmail.com> (raw)
In-Reply-To: <20190603023617.GA8567@localhost.localdomain>

2019年6月3日(月) 11:36 Eduardo Valentin <edubezval@gmail.com>:
>
> Hey Mita,
>
> On Mon, May 27, 2019 at 01:29:02AM +0900, Akinobu Mita wrote:
> > The NVMe controller reports up to nine temperature values in the SMART /
> > Health log page (the composite temperature and temperature sensor 1 through
> > temperature sensor 8).
> >
> > This provides these temperatures via thermal zone devices.
> >
> > Once the controller is identified, the thermal zone devices are created for
> > all implemented temperature sensors including the composite temperature.
> >
> > /sys/class/thermal/thermal_zone[0-*]:
> >     |---type: 'nvme<instance>-temp<sensor>'
> >     |---temp: Temperature
> >     |---trip_point_0_temp: Over temperature threshold
> >
> > The thermal_zone[0-*] contains a 'device' symlink to the corresponding nvme
> > device.
> >
> > On the other hand, the following symlinks to the thermal zone devices are
> > created in the nvme device sysfs directory.
> >
> > - temp0: Composite temperature
> > - temp1: Temperature sensor 1
> > ...
> > - temp8: Temperature sensor 8
> >
>
> These questions on V2 are still unanswered:
> a. Can we get a more descriptive string into tz->type?

As I said in the other thread, the NVMe spec only says a controller
reports the composite temperature and temperature sensor 1 through 8.
What temperature sensor N means is vendor specific.

> b. Can these APIs support DT probing too?

OK. I can try, but I don't have arm or arm64 boards with PCIe for
testing with of-thermal.  So it'll be untested.

> > +static struct thermal_zone_params nvme_tz_params = {
> > +     .governor_name = "user_space",
>
> Also,
>
> Was there any particular reason why defaulting to user_space here?

I only tested with the user_space governor.  There is no cooling device
to bind with this.

> > +     .no_hwmon = true,
> > +};
> > +
> > +static struct thermal_zone_device *
> > +nvme_thermal_zone_register(struct nvme_ctrl *ctrl, unsigned int sensor)
> > +{
> > +     struct thermal_zone_device *tzdev;
> > +     char name[THERMAL_NAME_LENGTH];
> > +     int ret;
> > +
> > +     snprintf(name, sizeof(name), "nvme%d_temp%u", ctrl->instance, sensor);
> > +
> > +     tzdev = thermal_zone_device_register(name, 1, 1, ctrl, &nvme_tz_ops,
> > +                                          &nvme_tz_params, 0, 0);
> > +     if (IS_ERR(tzdev)) {
> > +             dev_err(ctrl->device,
> > +                     "Failed to register thermal zone device: %ld\n",
> > +                     PTR_ERR(tzdev));
> > +             return tzdev;
> > +     }
> > +
> > +     snprintf(name, sizeof(name), "temp%d", sensor);
> > +     ret = sysfs_create_link(&ctrl->ctrl_device.kobj, &tzdev->device.kobj,
> > +                             name);
> > +     if (ret)
> > +             goto device_unregister;
> > +
> > +     ret = sysfs_create_link(&tzdev->device.kobj,
> > +                             &ctrl->ctrl_device.kobj, "device");
> > +     if (ret)
> > +             goto remove_link;
> > +
> > +     return tzdev;
> > +
> > +remove_link:
> > +     sysfs_remove_link(&ctrl->ctrl_device.kobj, name);
> > +device_unregister:
> > +     thermal_zone_device_unregister(tzdev);
> > +
> > +     return ERR_PTR(ret);
> > +}
> > +
> > +/**
> > + * nvme_thermal_zones_register() - register nvme thermal zone devices
> > + * @ctrl: controller instance
> > + *
> > + * This function creates up to nine thermal zone devices for all implemented
> > + * temperature sensors including the composite temperature.
> > + * Each thermal zone device provides a single trip point temperature that is
> > + * associated with an over temperature threshold.
> > + */
> > +static int nvme_thermal_zones_register(struct nvme_ctrl *ctrl)
> > +{
> > +     struct nvme_smart_log *log;
> > +     int ret;
> > +     int i;
> > +
> > +     log = kzalloc(sizeof(*log), GFP_KERNEL);
> > +     if (!log)
> > +             return 0; /* non-fatal error */
>
> I am not sure about this API design here. I would leave the error
> handling and judging if this is fatal or not to the caller.
> I mean, if I ask to register a nvme thermal zone and I get
> a 0 as response, I would assume the thermal zone exists from
> now on, right?

This routine is designed to return error code only when we're unable to
communicate with the device at all (i.e. nvme_submit_sync_cmd returns a
negative value).

We don't want to abandon device initialization just due to thermal zone
failures.  Because thermal zone device isn't mandatory to manage the
controllers, and the device like qemu doesn't provide smart log (according
to Keith).

> > +
> > +     ret = nvme_get_log(ctrl, NVME_NSID_ALL, NVME_LOG_SMART, 0,
> > +                        log, sizeof(*log), 0);
> > +     if (ret) {
> > +             dev_err(ctrl->device, "Failed to get SMART log: %d\n", ret);
> > +             /* If the device provided a response, then it's non-fatal */
> > +             if (ret > 0)
> > +                     ret = 0;
> > +             goto free_log;
>
> Once again, hiding errors is never a strategy that scales..
>
> > +     }
> > +
> > +     for (i = 0; i < ARRAY_SIZE(ctrl->tzdev); i++) {
> > +             struct thermal_zone_device *tzdev;
> > +             int temp;
> > +
> > +             if (i)
> > +                     temp = le16_to_cpu(log->temp_sensor[i - 1]);
> > +             else
> > +                     temp = get_unaligned_le16(log->temperature);
> > +
> > +             /*
> > +              * All implemented temperature sensors report a non-zero value
> > +              * in temperature sensor fields in the smart log page.
> > +              */
> > +             if (!temp)
> > +                     continue;
> > +             if (ctrl->tzdev[i])
> > +                     continue;
> > +
> > +             tzdev = nvme_thermal_zone_register(ctrl, i);
> > +             if (!IS_ERR(tzdev))
> > +                     ctrl->tzdev[i] = tzdev;
>
> Here again, I would not hide errors, the API should propagate errors
> if something goes wrong.
>
> > +     }
> > +
> > +free_log:
> > +     kfree(log);
> > +
> > +     return ret;
> > +}
> > +
> > +/**
> > + * nvme_thermal_zones_unregister() - unregister nvme thermal zone devices
> > + * @ctrl: controller instance
> > + *
> > + * This function removes the registered thermal zone devices and symlinks.
> > + */
> > +static void nvme_thermal_zones_unregister(struct nvme_ctrl *ctrl)
> > +{
> > +     int i;
> > +
> > +     for (i = 0; i < ARRAY_SIZE(ctrl->tzdev); i++) {
> > +             struct thermal_zone_device *tzdev = ctrl->tzdev[i];
> > +             char name[20];
> > +
> > +             if (!tzdev)
> > +                     continue;
> > +
> > +             sysfs_remove_link(&tzdev->device.kobj, "device");
> > +
> > +             snprintf(name, sizeof(name), "temp%d", i);
> > +             sysfs_remove_link(&ctrl->ctrl_device.kobj, name);
> > +
> > +             thermal_zone_device_unregister(tzdev);
> > +
> > +             ctrl->tzdev[i] = NULL;
> > +     }
> > +}
> > +
> > +#else
> > +
> > +static inline int nvme_thermal_zones_register(struct nvme_ctrl *ctrl)
> > +{
> > +     return 0;
> > +}
> > +
> > +static inline void nvme_thermal_zones_unregister(struct nvme_ctrl *ctrl)
> > +{
> > +}
> > +
> > +#endif /* CONFIG_THERMAL */
> > +
> >  struct nvme_core_quirk_entry {
> >       /*
> >        * NVMe model and firmware strings are padded with spaces.  For
> > @@ -2754,6 +3037,10 @@ int nvme_init_identify(struct nvme_ctrl *ctrl)
> >       if (ret < 0)
> >               return ret;
> >
> > +     ret = nvme_thermal_zones_register(ctrl);
> > +     if (ret < 0)
> > +             return ret;
>
>
> I would definitely keep this code the way it is here in
> nvme_init_identify(), but if I read this right, your
> nvme_thermal_zones_register() will never return < 0, and therefore this
> condition is never met.

The nvme_get_log() can return a negative value.  Only in this case,
nvme_thermal_zones_register() returns error and abandon the device
initialization.

WARNING: multiple messages have this Message-ID (diff)
From: akinobu.mita@gmail.com (Akinobu Mita)
Subject: [PATCH v3 2/3] nvme: add thermal zone devices
Date: Tue, 4 Jun 2019 00:20:19 +0900	[thread overview]
Message-ID: <CAC5umyj9V0sTD-ZsK6Q684wPdMpJGs434vDtZDm6a8gwoz3D7Q@mail.gmail.com> (raw)
In-Reply-To: <20190603023617.GA8567@localhost.localdomain>

2019?6?3?(?) 11:36 Eduardo Valentin <edubezval at gmail.com>:
>
> Hey Mita,
>
> On Mon, May 27, 2019@01:29:02AM +0900, Akinobu Mita wrote:
> > The NVMe controller reports up to nine temperature values in the SMART /
> > Health log page (the composite temperature and temperature sensor 1 through
> > temperature sensor 8).
> >
> > This provides these temperatures via thermal zone devices.
> >
> > Once the controller is identified, the thermal zone devices are created for
> > all implemented temperature sensors including the composite temperature.
> >
> > /sys/class/thermal/thermal_zone[0-*]:
> >     |---type: 'nvme<instance>-temp<sensor>'
> >     |---temp: Temperature
> >     |---trip_point_0_temp: Over temperature threshold
> >
> > The thermal_zone[0-*] contains a 'device' symlink to the corresponding nvme
> > device.
> >
> > On the other hand, the following symlinks to the thermal zone devices are
> > created in the nvme device sysfs directory.
> >
> > - temp0: Composite temperature
> > - temp1: Temperature sensor 1
> > ...
> > - temp8: Temperature sensor 8
> >
>
> These questions on V2 are still unanswered:
> a. Can we get a more descriptive string into tz->type?

As I said in the other thread, the NVMe spec only says a controller
reports the composite temperature and temperature sensor 1 through 8.
What temperature sensor N means is vendor specific.

> b. Can these APIs support DT probing too?

OK. I can try, but I don't have arm or arm64 boards with PCIe for
testing with of-thermal.  So it'll be untested.

> > +static struct thermal_zone_params nvme_tz_params = {
> > +     .governor_name = "user_space",
>
> Also,
>
> Was there any particular reason why defaulting to user_space here?

I only tested with the user_space governor.  There is no cooling device
to bind with this.

> > +     .no_hwmon = true,
> > +};
> > +
> > +static struct thermal_zone_device *
> > +nvme_thermal_zone_register(struct nvme_ctrl *ctrl, unsigned int sensor)
> > +{
> > +     struct thermal_zone_device *tzdev;
> > +     char name[THERMAL_NAME_LENGTH];
> > +     int ret;
> > +
> > +     snprintf(name, sizeof(name), "nvme%d_temp%u", ctrl->instance, sensor);
> > +
> > +     tzdev = thermal_zone_device_register(name, 1, 1, ctrl, &nvme_tz_ops,
> > +                                          &nvme_tz_params, 0, 0);
> > +     if (IS_ERR(tzdev)) {
> > +             dev_err(ctrl->device,
> > +                     "Failed to register thermal zone device: %ld\n",
> > +                     PTR_ERR(tzdev));
> > +             return tzdev;
> > +     }
> > +
> > +     snprintf(name, sizeof(name), "temp%d", sensor);
> > +     ret = sysfs_create_link(&ctrl->ctrl_device.kobj, &tzdev->device.kobj,
> > +                             name);
> > +     if (ret)
> > +             goto device_unregister;
> > +
> > +     ret = sysfs_create_link(&tzdev->device.kobj,
> > +                             &ctrl->ctrl_device.kobj, "device");
> > +     if (ret)
> > +             goto remove_link;
> > +
> > +     return tzdev;
> > +
> > +remove_link:
> > +     sysfs_remove_link(&ctrl->ctrl_device.kobj, name);
> > +device_unregister:
> > +     thermal_zone_device_unregister(tzdev);
> > +
> > +     return ERR_PTR(ret);
> > +}
> > +
> > +/**
> > + * nvme_thermal_zones_register() - register nvme thermal zone devices
> > + * @ctrl: controller instance
> > + *
> > + * This function creates up to nine thermal zone devices for all implemented
> > + * temperature sensors including the composite temperature.
> > + * Each thermal zone device provides a single trip point temperature that is
> > + * associated with an over temperature threshold.
> > + */
> > +static int nvme_thermal_zones_register(struct nvme_ctrl *ctrl)
> > +{
> > +     struct nvme_smart_log *log;
> > +     int ret;
> > +     int i;
> > +
> > +     log = kzalloc(sizeof(*log), GFP_KERNEL);
> > +     if (!log)
> > +             return 0; /* non-fatal error */
>
> I am not sure about this API design here. I would leave the error
> handling and judging if this is fatal or not to the caller.
> I mean, if I ask to register a nvme thermal zone and I get
> a 0 as response, I would assume the thermal zone exists from
> now on, right?

This routine is designed to return error code only when we're unable to
communicate with the device at all (i.e. nvme_submit_sync_cmd returns a
negative value).

We don't want to abandon device initialization just due to thermal zone
failures.  Because thermal zone device isn't mandatory to manage the
controllers, and the device like qemu doesn't provide smart log (according
to Keith).

> > +
> > +     ret = nvme_get_log(ctrl, NVME_NSID_ALL, NVME_LOG_SMART, 0,
> > +                        log, sizeof(*log), 0);
> > +     if (ret) {
> > +             dev_err(ctrl->device, "Failed to get SMART log: %d\n", ret);
> > +             /* If the device provided a response, then it's non-fatal */
> > +             if (ret > 0)
> > +                     ret = 0;
> > +             goto free_log;
>
> Once again, hiding errors is never a strategy that scales..
>
> > +     }
> > +
> > +     for (i = 0; i < ARRAY_SIZE(ctrl->tzdev); i++) {
> > +             struct thermal_zone_device *tzdev;
> > +             int temp;
> > +
> > +             if (i)
> > +                     temp = le16_to_cpu(log->temp_sensor[i - 1]);
> > +             else
> > +                     temp = get_unaligned_le16(log->temperature);
> > +
> > +             /*
> > +              * All implemented temperature sensors report a non-zero value
> > +              * in temperature sensor fields in the smart log page.
> > +              */
> > +             if (!temp)
> > +                     continue;
> > +             if (ctrl->tzdev[i])
> > +                     continue;
> > +
> > +             tzdev = nvme_thermal_zone_register(ctrl, i);
> > +             if (!IS_ERR(tzdev))
> > +                     ctrl->tzdev[i] = tzdev;
>
> Here again, I would not hide errors, the API should propagate errors
> if something goes wrong.
>
> > +     }
> > +
> > +free_log:
> > +     kfree(log);
> > +
> > +     return ret;
> > +}
> > +
> > +/**
> > + * nvme_thermal_zones_unregister() - unregister nvme thermal zone devices
> > + * @ctrl: controller instance
> > + *
> > + * This function removes the registered thermal zone devices and symlinks.
> > + */
> > +static void nvme_thermal_zones_unregister(struct nvme_ctrl *ctrl)
> > +{
> > +     int i;
> > +
> > +     for (i = 0; i < ARRAY_SIZE(ctrl->tzdev); i++) {
> > +             struct thermal_zone_device *tzdev = ctrl->tzdev[i];
> > +             char name[20];
> > +
> > +             if (!tzdev)
> > +                     continue;
> > +
> > +             sysfs_remove_link(&tzdev->device.kobj, "device");
> > +
> > +             snprintf(name, sizeof(name), "temp%d", i);
> > +             sysfs_remove_link(&ctrl->ctrl_device.kobj, name);
> > +
> > +             thermal_zone_device_unregister(tzdev);
> > +
> > +             ctrl->tzdev[i] = NULL;
> > +     }
> > +}
> > +
> > +#else
> > +
> > +static inline int nvme_thermal_zones_register(struct nvme_ctrl *ctrl)
> > +{
> > +     return 0;
> > +}
> > +
> > +static inline void nvme_thermal_zones_unregister(struct nvme_ctrl *ctrl)
> > +{
> > +}
> > +
> > +#endif /* CONFIG_THERMAL */
> > +
> >  struct nvme_core_quirk_entry {
> >       /*
> >        * NVMe model and firmware strings are padded with spaces.  For
> > @@ -2754,6 +3037,10 @@ int nvme_init_identify(struct nvme_ctrl *ctrl)
> >       if (ret < 0)
> >               return ret;
> >
> > +     ret = nvme_thermal_zones_register(ctrl);
> > +     if (ret < 0)
> > +             return ret;
>
>
> I would definitely keep this code the way it is here in
> nvme_init_identify(), but if I read this right, your
> nvme_thermal_zones_register() will never return < 0, and therefore this
> condition is never met.

The nvme_get_log() can return a negative value.  Only in this case,
nvme_thermal_zones_register() returns error and abandon the device
initialization.

  reply	other threads:[~2019-06-03 15:20 UTC|newest]

Thread overview: 40+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-05-26 16:29 [PATCH v3 0/3] nvme: add thermal zone devices Akinobu Mita
2019-05-26 16:29 ` Akinobu Mita
2019-05-26 16:29 ` [PATCH v3 1/3] nvme: Export get and set features Akinobu Mita
2019-05-26 16:29   ` Akinobu Mita
2019-05-26 16:45   ` Chaitanya Kulkarni
2019-05-26 16:45     ` Chaitanya Kulkarni
2019-05-29 15:19   ` Minwoo Im
2019-05-29 15:19     ` Minwoo Im
2019-05-26 16:29 ` [PATCH v3 2/3] nvme: add thermal zone devices Akinobu Mita
2019-05-26 16:29   ` Akinobu Mita
2019-05-29 15:15   ` Minwoo Im
2019-05-29 15:15     ` Minwoo Im
2019-05-29 16:47     ` Akinobu Mita
2019-05-29 16:47       ` Akinobu Mita
2019-05-30 10:18       ` Minwoo Im
2019-05-30 10:18         ` Minwoo Im
2019-06-01  9:02   ` Christoph Hellwig
2019-06-01  9:02     ` Christoph Hellwig
2019-06-02 13:19     ` Akinobu Mita
2019-06-02 13:19       ` Akinobu Mita
2019-06-04  7:31       ` Christoph Hellwig
2019-06-04  7:31         ` Christoph Hellwig
2019-06-05 15:42         ` Akinobu Mita
2019-06-05 15:42           ` Akinobu Mita
2019-06-03  2:36   ` Eduardo Valentin
2019-06-03  2:36     ` Eduardo Valentin
2019-06-03 15:20     ` Akinobu Mita [this message]
2019-06-03 15:20       ` Akinobu Mita
2019-06-06  4:05       ` Eduardo Valentin
2019-06-06  4:05         ` Eduardo Valentin
2019-06-07 15:21         ` Akinobu Mita
2019-06-07 15:21           ` Akinobu Mita
2019-05-26 16:29 ` [PATCH v3 3/3] nvme: notify thermal framework when temperature threshold events occur Akinobu Mita
2019-05-26 16:29   ` Akinobu Mita
2019-06-01  9:03   ` Christoph Hellwig
2019-06-01  9:03     ` Christoph Hellwig
2019-06-02 13:46     ` Akinobu Mita
2019-06-02 13:46       ` Akinobu Mita
2019-06-04  7:32       ` Christoph Hellwig
2019-06-04  7:32         ` Christoph Hellwig

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAC5umyj9V0sTD-ZsK6Q684wPdMpJGs434vDtZDm6a8gwoz3D7Q@mail.gmail.com \
    --to=akinobu.mita@gmail.com \
    --cc=Chaitanya.Kulkarni@wdc.com \
    --cc=axboe@fb.com \
    --cc=daniel.lezcano@linaro.org \
    --cc=edubezval@gmail.com \
    --cc=hch@lst.de \
    --cc=keith.busch@intel.com \
    --cc=kenneth.heitke@intel.com \
    --cc=linux-nvme@lists.infradead.org \
    --cc=linux-pm@vger.kernel.org \
    --cc=minwoo.im.dev@gmail.com \
    --cc=rui.zhang@intel.com \
    --cc=sagi@grimberg.me \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.