Re: [PATCH 2/3] hwmon: Add support for ltc2947

From: Guenter Roeck <linux@roeck-us.net>
To: "Sa, Nuno" <Nuno.Sa@analog.com>
Cc: "jdelvare@suse.com" <jdelvare@suse.com>,
	"corbet@lwn.net" <corbet@lwn.net>,
	"linux-hwmon@vger.kernel.org" <linux-hwmon@vger.kernel.org>,
	"devicetree@vger.kernel.org" <devicetree@vger.kernel.org>,
	"mark.rutland@arm.com" <mark.rutland@arm.com>,
	"linux-doc@vger.kernel.org" <linux-doc@vger.kernel.org>,
	"robh+dt@kernel.org" <robh+dt@kernel.org>
Subject: Re: [PATCH 2/3] hwmon: Add support for ltc2947
Date: Mon, 7 Oct 2019 05:44:53 -0700	[thread overview]
Message-ID: <94cf417f-90fa-50b8-9d4a-d7e4c9dd3d8d@roeck-us.net> (raw)
In-Reply-To: <7d4ca133201f8c75855de6777f6018567701e16a.camel@analog.com>

On 10/7/19 5:25 AM, Sa, Nuno wrote:
> On Fri, 2019-10-04 at 08:06 -0700, Guenter Roeck wrote:
>>
>> On Fri, Oct 04, 2019 at 07:45:07AM +0000, Sa, Nuno wrote:
>> [ ... ]
>>>>> +static int ltc2947_val_read(struct ltc2947_data *st, const u8
>>>>> reg,
>>>>> +			    const u8 page, const size_t size,
>>>>> s64 *val)
>>>>> +{
>>>>> +	int ret;
>>>>> +	u64 __val = 0;
>>>>> +
>>>>> +	mutex_lock(&st->lock);
>>>>> +
>>
>> On a side note, suspend code is supposed to be atomic,
>> If this lock is held, the process or thread holding it
>> will likely stall suspend since it won't run.
>> At the very least, this would have to be a trylock,
>> with suspend failing if the lock can not be acquired.
> 
> Got it. Even more, Now I don't see the point of having the mutex in the
> PM callbacks (though I saw other drivers doing this). As you said, at
> the very least the trylock should be used since a frozen task might
> have the mutex locked. Either way, I will drop both the flag and the
> call to mutex_* functions is suspend()/resume().
> 
>>>>> +	if (st->reset) {
>>>>> +		mutex_unlock(&st->lock);
>>>>> +		return -EPERM;
>>>>
>>>> Not sure what the error here should be, but EPERM is not correct.
>>>>
>>>> Under which conditions would this function be called while in
>>>> suspend ?
>>>
>>> Honestly, this is more like a sanity check. I'm not sure if we can
>>> get
>>> here in suspend mode. Don't userland apps can still run in suspend?
>>> I
>>> guess so but I'm not 100% sure on this. Do you have any
>>> recommendation
>>> for the error here?
>>>
>> Sorry, I won't accept "just in case" code.
>>
>> Having said that, please inform yourself how suspend works. Userland
>> code
>> has to be stopped before any hardware can be disabled. Similar,
>> hardware
>> has to be re-enabled before userland code can resume.
>> Otherwise the kernel would crash all over the place. In many cases,
>> disabling the hardware means that trying to access hardware registers
>> will cause an acess fault.
> 
> Yes, you are right. I did checked the suspend code and all userland
> tasks and kthreads are stopped before the suspend() callback is called
> for the HW devices.
> 
>> [...]
>>
>>>>> +
>>>>> +static struct attribute *ltc2947_attrs[] = {
>>>>> +	&sensor_dev_attr_in0_fault.dev_attr.attr,
>>>>> +	&sensor_dev_attr_in1_fault.dev_attr.attr,
>>>>> +	&sensor_dev_attr_curr1_fault.dev_attr.attr,
>>>>> +	&sensor_dev_attr_temp1_fault.dev_attr.attr,
>>>>> +	&sensor_dev_attr_power1_input.dev_attr.attr,
>>>>> +	&sensor_dev_attr_power1_max.dev_attr.attr,
>>>>> +	&sensor_dev_attr_power1_min.dev_attr.attr,
>>>>> +	&sensor_dev_attr_power1_input_highest.dev_attr.attr,
>>>>> +	&sensor_dev_attr_power1_input_lowest.dev_attr.attr,
>>>>> +	&sensor_dev_attr_power1_fault.dev_attr.attr,
>>>>> +	&sensor_dev_attr_energy1_input.dev_attr.attr,
>>>>> +	&sensor_dev_attr_energy1_max.dev_attr.attr,
>>>>> +	&sensor_dev_attr_energy1_min.dev_attr.attr,
>>>>> +	&sensor_dev_attr_energy1_max_alarm.dev_attr.attr,
>>>>> +	&sensor_dev_attr_energy1_min_alarm.dev_attr.attr,
>>>>> +	&sensor_dev_attr_energy2_input.dev_attr.attr,
>>>>> +	&sensor_dev_attr_energy2_max.dev_attr.attr,
>>>>> +	&sensor_dev_attr_energy2_min.dev_attr.attr,
>>>>> +	&sensor_dev_attr_energy2_max_alarm.dev_attr.attr,
>>>>> +	&sensor_dev_attr_energy2_min_alarm.dev_attr.attr,
>>>>> +	&sensor_dev_attr_energy1_overflow_alarm.dev_attr.attr,
>>>>> +	&sensor_dev_attr_energy2_overflow_alarm.dev_attr.attr,
>>>
>>> These overflow attributes are kind of an alarm for the energy ones.
>>> It
>>> tells that the energy registers are about to overflow. I guess that
>>> some application can easily find out the maximum values supported
>>> on
>>> these registers and implement whatever logic they want in the app
>>> itself. So, if you prefer I can just drop this ones?
>>>
>> I understand the overflow use case. However, the mere presence
>> of min/max attributes for energy attributes suggests that this
>> is not the min/max use case for hwmon attributes. There is no
>> "minimum"
>> or "maximum" energy for an accumulating value. Such attributes
>> only make sense in an application abler to measure available
>> energy (eg a battery controller). I'll have to read the chip
>> specification to understand the intended use case.
> 
> Should I just drop the overflow attributes? I think the part can be
> used to check battery charging efficiency for example (though I guess
> we would need to also support the Charge register's).
> 

If chip is (or can be) used as charger, it should register as such.
Note my question was the energy limit attributes, not the overflow
attributes.

>>>>> +	&sensor_dev_attr_energy1_fault.dev_attr.attr,
>>>>> +	&sensor_dev_attr_energy2_fault.dev_attr.attr,
>>>>
>>>> Some of those are non-standard attributes. You'll have
>>>> to explain each in detail, especially why it makes sense
>>>> to provide such attributes to the user and why you can't
>>>> use standard attributes instead. Also, for the _fault
>>>> attributes, I don't entirely see the point. If the fault bit
>>>> is set, ADC readings are not valid because supply voltage
>>>> is low. This means that ADC register reads will be invalid.
>>>> What is the point of having a non-standard attribute - which
>>>> likely will be ignored - instead of returning an error when
>>>> an attempt is made to read an ADC value ?
>>>
>>> I was also not sure on this *_fault attributes. They are there to
>>> tell
>>> that the readings are invalid. Now that I think about it, I'm not
>>> sure
>>> if it even makes sense to return error if this bit is set. The part
>>> is
>>> in continuous mode so, it might happen that we have the fault bit
>>> set
>>> for a short time but afterwards things go normal and the bit will
>>> still
>>> be set until we read it. So my point is, we might be returning
>>> error
>>> for a conversion that happened way before our current reading. Any
>>> suggestion here? Would you be fine if I just drop this attributes?
>>>    
>>
>> It sounds like "fault" means something like "one of the past readings
>> was wrong, but I don't know which one and I don't know if the wrong
>> value was ever read by user space". Sorry, I fail to see what value
>> those attributes are supposed to have for the user. At best it could
>> mean "please re-read the associated attrribute", but that could as
>> well
>> be accomplished without userspace action if it is really needed.
>> Also, per datasheet, it looks like the fault bit is set of the chip
>> voltage is too low. If that happens, the system has a severe problem
>> which can not be resolved with an attribute visible to userspace.
> 
> I will drop the fault attributes.
> 
>>>> Others, like energy1_input, or most of the power attributes,
>>>> are standard attributes. Please explain the reasoning for
>>>> not using the standard API for those.
>>>
>>> This ones were because we need 64bit variables. For energy, the
>>> part
>>> also supports the alarms, max and min attributes so I included
>>> them.
>>>   
>> I can to some degree the logic for energy attributes, but do you
>> really
>> have an application where the chip is used on a 32-bit system and
>> monitors power larger than 2kW ?
> 
> Hmm, I looked again at the chip specification and unless I'm missing
> something obvious the part can only measure +/- 30A and 0-15V giving us
> +/- 450W which definitely fits in a long variable. The only thing that
> will be truncated is the min/max values. The part, by default, has this
> value to 0x7fff and 0x8000 which times 200000uW (part scale) will be
> truncated. Now, we can argue that this max/min values are not real and
> the user is expected to write this attributes with some meaningful
> values? How do you suggest to proceed? Should I just use standard
> attributes for power?
>   
How about detecting the overflow on read and just report the maximum
supported value ? Or, alternatively, initialize the register with the
maximum supported value.

Guenter