All of lore.kernel.org
 help / color / mirror / Atom feed
From: Avri Altman <Avri.Altman@wdc.com>
To: Guenter Roeck <linux@roeck-us.net>,
	"James E . J . Bottomley" <jejb@linux.vnet.ibm.com>,
	"Martin K . Petersen" <martin.petersen@oracle.com>
Cc: "linux-scsi@vger.kernel.org" <linux-scsi@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Bart Van Assche <bvanassche@acm.org>,
	Adrian Hunter <adrian.hunter@intel.com>,
	Bean Huo <beanhuo@micron.com>
Subject: RE: [PATCH v3 1/2] scsi: ufs: Probe for temperature notification support
Date: Mon, 13 Sep 2021 07:06:36 +0000	[thread overview]
Message-ID: <DM6PR04MB65754D1CF6B4769E6CECDB5DFCD99@DM6PR04MB6575.namprd04.prod.outlook.com> (raw)
In-Reply-To: <8abe6364-9240-bcaf-c17f-1703243170cb@roeck-us.net>

> > +config SCSI_UFS_HWMON
> > +     bool "UFS  Temperature Notification"
> > +     depends on SCSI_UFSHCD && HWMON
> > +     help
> > +       This provides support for UFS hardware monitoring. If enabled,
> > +       a hardware monitoring device will be created for the UFS device.
> > +
> > +       If unsure, say N.
> > +
> 
> git complains about blank line at EOF.
Done.

> 
> > diff --git a/drivers/scsi/ufs/Makefile b/drivers/scsi/ufs/Makefile
> > index c407da9b5171..966048875b50 100644
> > --- a/drivers/scsi/ufs/Makefile
> > +++ b/drivers/scsi/ufs/Makefile
> > @@ -10,6 +10,7 @@ ufshcd-core-$(CONFIG_SCSI_UFS_BSG)  += ufs_bsg.o
> >   ufshcd-core-$(CONFIG_SCSI_UFS_CRYPTO)       += ufshcd-crypto.o
> >   ufshcd-core-$(CONFIG_SCSI_UFS_HPB)  += ufshpb.o
> >   ufshcd-core-$(CONFIG_SCSI_UFS_FAULT_INJECTION) +=
> > ufs-fault-injection.o
> > +ufshcd-core-$(CONFIG_SCSI_UFS_HWMON) += ufs-hwmon.o
> >
> >   obj-$(CONFIG_SCSI_UFS_DWC_TC_PCI) += tc-dwc-g210-pci.o ufshcd-dwc.o
> tc-dwc-g210.o
> >   obj-$(CONFIG_SCSI_UFS_DWC_TC_PLATFORM) += tc-dwc-g210-pltfrm.o
> > ufshcd-dwc.o tc-dwc-g210.o diff --git a/drivers/scsi/ufs/ufs-hwmon.c
> > b/drivers/scsi/ufs/ufs-hwmon.c new file mode 100644 index
> > 000000000000..a50e83f645f4
> > --- /dev/null
> > +++ b/drivers/scsi/ufs/ufs-hwmon.c
> > @@ -0,0 +1,179 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +/*
> > + * UFS hardware monitoring support
> > + * Copyright (c) 2021, Western Digital Corporation  */
> > +
> > +#include <linux/hwmon.h>
> > +
> > +#include "ufshcd.h"
> > +
> > +struct ufs_hwmon_data {
> > +     struct ufs_hba *hba;
> > +     u8 mask;
> > +};
> > +
> > +static bool ufs_temp_enabled(struct ufs_hba *hba, u8 mask) {
> > +     u32 ee_mask;
> > +
> > +     if (ufshcd_query_attr(hba, UPIU_QUERY_OPCODE_READ_ATTR,
> > +                           QUERY_ATTR_IDN_EE_CONTROL, 0, 0, &ee_mask))
> > +             return false;
> > +
> > +     return (mask & ee_mask & MASK_EE_TOO_HIGH_TEMP) ||
> > +             (mask & ee_mask & MASK_EE_TOO_LOW_TEMP); }
> > +
> > +static bool ufs_temp_valid(struct ufs_hba *hba, u8 mask,
> > +                        enum attr_idn idn, u32 value) {
> > +     return (idn == QUERY_ATTR_IDN_CASE_ROUGH_TEMP && value >= 1
> &&
> > +             value <= 250 && ufs_temp_enabled(hba, mask)) ||
> > +           (idn == QUERY_ATTR_IDN_HIGH_TEMP_BOUND && value >= 100 &&
> > +            value <= 250) ||
> > +           (idn == QUERY_ATTR_IDN_LOW_TEMP_BOUND && value >= 1 &&
> > +            value <= 80);
> > +}
> > +
> The value ranges checed above suggest that the temperature is reported in
> degrees C (or maybe degrees C with an offset). 
Yes.  No offset.

>The hwmon API expects
> temperatures to be reported in milli-degrees C, and I don't see a conversion in
> the actual read functions. What does the "sensors" command report ?
I missed that (Although it is well documented) - sorry about that.
I wasn't aware of the sensors command.  I don't have it in my arm64 android platform image (galaxy s21).
Will try to get it.
I was reading the temperature using hwmon sysfs entries, which indicate the correct temperature.
e.g
t2s:/ # ls -la /sys/class/hwmon/
total 0
drwxr-xr-x   2 root root 0 2020-12-20 10:16 .
drwxr-xr-x 104 root root 0 2020-12-19 19:05 ..
lrwxrwxrwx   1 root root 0 2020-12-20 10:16 hwmon0 -> ../../devices/platform/13100000.ufs/hwmon/hwmon0
.....

t2s:/ # cat /sys/class/hwmon/hwmon0/temp1_input
25

Will fix it.  Thanks.

> 
> > +static int ufs_get_temp(struct ufs_hba *hba, u8 mask, enum attr_idn
> > +idn) {
> > +     u32 value;
> > +
> > +     if (ufshcd_query_attr(hba, UPIU_QUERY_OPCODE_READ_ATTR, idn, 0, 0,
> > +         &value))
> 
> checkpatch states that alignment is off, and I am quite sure this fits into one
> line anyway (with the 100-column limit). There are more instances with bad
> alignment according to checkpatch.
I wasn't aware that the Linux Kernel deprecates the 80 Character Line Coding Style.
Will try to make it full 100-characters lines.
I didn't get any alignment complaints from checkpatch.

> 
> Also, ufshcd_query_attr() returns a valid Linux error code. That should be
> returned to the caller and not be replaced. More on that below.
> 
> > +             return 0;
> > +
> > +     if (ufs_temp_valid(hba, mask, idn, value))
> > +             return value - 80;
> > +
> 
> This again suggests that the temperature is not milli-degrees C.
> 
> Is there reason to believe that this validation is necessary ?
> Note that this reports an "error" if the returned temperature value happens to
> have a value of 80. Again, more on that below.
Data->mask holds the temperature related bits in the ufs features register: TOO_LOW_TEMPERATURE and TOO_HIGH_TEMPERATURE.
This is set for the device by the flash vendor and can't be changed by the OEMs.
If the device doesn't support any of that, then hwmon_probe is not even called, see ufshcd_temp_notif_probe.
So data->mask is not 0, and never changes.

When the device returns a 0 temperature value, it means that it is not valid.
The spec say about the Device’s rough package case surface temperature:
"
This value shall be valid when (TOO_HIGH_TEMPERATURE is supported and TOO_HIGH_TEMP_EN is enabled) or 
( TOO_LOW_TEMPERATURE is supported and TOO_LOW_TEMP_EN is enabled ).
0 : Unknown Temperature , 1~250 : ( this value – 80 ) degrees in Celsius. ( i.e., -79 ºC ~ 170 ºC )
Others: Reserved
"
data->mask is not 0, but the temperature exception bits: TOO_HIGH_TEMP_EN and TOO_LOW_TEMP_EN are of type read/volatile,
Meaning it can be written many times, e.g. by debugfs or ufs-utils.

To sum up:
 - yes, checking the temperature against the spec boundaries is useless.
   The device will return 0 if it is not valid.
   ufs_temp_valid() can be removed, and just need to check that the temperature is not 0.

 - The return value of querry_attr is of less interest.
    if it failed or temp == 0, then the temperature is invalid and the proper return value should be -EINVAL.

> 
> > +     return 0;
> > +}
> > +
> > +static int ufs_hwmon_read(struct device *dev, enum hwmon_sensor_types
> type,
> > +                        u32 attr, int channel, long *val) {
> > +     struct ufs_hwmon_data *data = dev_get_drvdata(dev);
> > +     struct ufs_hba *hba = data->hba;
> > +     u8 mask = data->mask;
> > +     int err = 0;
> > +     bool get_temp = true;
> > +
> > +     if (type != hwmon_temp)
> > +             return 0;
> > +
> > +     down(&hba->host_sem);
> > +
> > +     if (!ufshcd_is_user_access_allowed(hba)) {
> > +             up(&hba->host_sem);
> > +             return -EBUSY;
> > +     }
> > +
> > +     ufshcd_rpm_get_sync(hba);
> > +
> > +     switch (attr) {
> > +     case hwmon_temp_enable:
> > +             *val = ufs_temp_enabled(hba, mask);
> > +             get_temp = false;
> > +
> 
> This seems to be read-only, and the mask only affects the limit registers as far
> as I con see. If so, this is wrong: The mask should be used to enable or hide the
> limit attributes as needed. If the mask is 0, and if this means that the current
> temperature is not reported either, the driver should not instantiate at all.
> 
> The "enable" attribute only makes sense if it can be used to actually enable or
> disable a specific sensor, and is not tied to limit attributes but to the actual
> sensor values.
See explanation above.
 Will make it writable as well.

> 
> > +             break;
> > +     case hwmon_temp_max_alarm:
> > +             *val = ufs_get_temp(hba, mask,
> > + QUERY_ATTR_IDN_HIGH_TEMP_BOUND);
> > +
> > +             break;
> > +     case hwmon_temp_min_alarm:
> > +             *val = ufs_get_temp(hba, mask,
> > + QUERY_ATTR_IDN_LOW_TEMP_BOUND);
> > +
> > +             break;
> > +     case hwmon_temp_input:
> > +             *val = ufs_get_temp(hba, mask,
> > + QUERY_ATTR_IDN_CASE_ROUGH_TEMP);
> > +
> If an enable attribute exists and is 0 (disabled), this should return -ENODATA.
> In this case, that would imply that the driver should not be instantiated in the
> first place since it has nothing to report.
See explanation above.
Will fix it so the error value will make more sense.

> 
> > +             break;
> > +     default:
> > +             err = -EOPNOTSUPP;
> > +
> > +             break;
> > +     }
> > +
> > +     ufshcd_rpm_put_sync(hba);
> > +
> > +     up(&hba->host_sem);
> > +
> > +     if (get_temp && !err && *val == 0)
> > +             err = -EINVAL;
> > +
> That is an odd way of detection errors. If it was in the hwmon subsystem, I'd ask
> for the error handling to be moved into the case statements. On top of that,
> interpreting a return value of "0" as error seems wrong.
> ufs_get_temp() returns 0 if the measured temperature or the reported limit
> happens to have a value of 80, and that is perfectly valid. If ufs_get_temp()
> reports an error, it should report that as error.
> 
> Also, EINVAL is "invalid argument", which I am quite sure does not apply here.
Ditto.
EINVAL implies that the temperature is invalid.

> >
> > +static void ufshcd_temp_notif_probe(struct ufs_hba *hba, u8
> > +*desc_buf) {
> > +     struct ufs_dev_info *dev_info = &hba->dev_info;
> > +     u32 ext_ufs_feature;
> > +     u8 mask = 0;
> > +
> > +     if (!(hba->caps & UFSHCD_CAP_TEMP_NOTIF) ||
> > +         dev_info->wspecversion < 0x300)
> 
> I am quite sure this fits a single line.
Done.

  reply	other threads:[~2021-09-13  7:06 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-09-12 13:19 [PATCH v3 0/2] Add temperature notification support Avri Altman
2021-09-12 13:19 ` [PATCH v3 1/2] scsi: ufs: Probe for " Avri Altman
2021-09-12 15:24   ` Guenter Roeck
2021-09-13  7:06     ` Avri Altman [this message]
2021-09-13  7:41       ` Guenter Roeck
2021-09-13  7:49         ` Avri Altman
2021-09-13 14:24           ` Guenter Roeck
2021-09-13 15:26             ` Avri Altman
2021-09-12 13:19 ` [PATCH v3 2/2] scsi: ufs: Add temperature notification exception handling Avri Altman
2021-09-12 15:43   ` kernel test robot
2021-09-12 15:43     ` kernel test robot
2021-09-12 15:55   ` Guenter Roeck
2021-09-13  6:09     ` Avri Altman
2021-09-12 16:33   ` kernel test robot
2021-09-12 16:33     ` kernel test robot
2021-09-12 16:50   ` kernel test robot
2021-09-12 16:50     ` kernel test robot

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=DM6PR04MB65754D1CF6B4769E6CECDB5DFCD99@DM6PR04MB6575.namprd04.prod.outlook.com \
    --to=avri.altman@wdc.com \
    --cc=adrian.hunter@intel.com \
    --cc=beanhuo@micron.com \
    --cc=bvanassche@acm.org \
    --cc=jejb@linux.vnet.ibm.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-scsi@vger.kernel.org \
    --cc=linux@roeck-us.net \
    --cc=martin.petersen@oracle.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.