linux-hwmon.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Questions about adt7470 driver
@ 2020-05-26  9:22 Jean Delvare
  2020-05-27 22:42 ` Guenter Roeck
  0 siblings, 1 reply; 9+ messages in thread
From: Jean Delvare @ 2020-05-26  9:22 UTC (permalink / raw)
  To: linux-hwmon; +Cc: Joshua Scott, Axel Lin, Darrick J. Wong

Hi all,

In the context of bug #207771, I got to look into the adt7470 driver.
I'm slowing understanding the logic of the background temperature
registers update thread, still there are 2 things I do not understand:

1* Function adt7470_read_temperatures() sets data->num_temp_sensors,
however this value seems to be only used to limit the wait time of
future calls to the same function. In the general update function we
still read ALL temperature sensors regardless of its value:

		for (i = 0; i < ADT7470_TEMP_COUNT; i++)
			data->temp[i] = i2c_smbus_read_byte_data(client,
						ADT7470_TEMP_REG(i));

Shouldn't this loop be bounded with data->num_temp_sensors instead of
ADT7470_TEMP_COUNT?

2* Function adt7470_read_temperatures() also sets
data->temperatures_probed to 1, and this boolean is then used to skip
further calls to that function. But do we really need a separate
variable for this, given that num_temp_sensors >= 0 matches the same
condition as far as I can see?

Thanks,
-- 
Jean Delvare
SUSE L3 Support

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Questions about adt7470 driver
  2020-05-26  9:22 Questions about adt7470 driver Jean Delvare
@ 2020-05-27 22:42 ` Guenter Roeck
  2020-05-27 23:33   ` Darrick J. Wong
  0 siblings, 1 reply; 9+ messages in thread
From: Guenter Roeck @ 2020-05-27 22:42 UTC (permalink / raw)
  To: Jean Delvare; +Cc: linux-hwmon, Joshua Scott, Axel Lin, Darrick J. Wong

On Tue, May 26, 2020 at 11:22:59AM +0200, Jean Delvare wrote:
> Hi all,
> 
> In the context of bug #207771, I got to look into the adt7470 driver.
> I'm slowing understanding the logic of the background temperature
> registers update thread, still there are 2 things I do not understand:
> 
> 1* Function adt7470_read_temperatures() sets data->num_temp_sensors,
> however this value seems to be only used to limit the wait time of
> future calls to the same function. In the general update function we
> still read ALL temperature sensors regardless of its value:
> 
> 		for (i = 0; i < ADT7470_TEMP_COUNT; i++)
> 			data->temp[i] = i2c_smbus_read_byte_data(client,
> 						ADT7470_TEMP_REG(i));
> 
> Shouldn't this loop be bounded with data->num_temp_sensors instead of
> ADT7470_TEMP_COUNT?
> 
Guess so.

> 2* Function adt7470_read_temperatures() also sets
> data->temperatures_probed to 1, and this boolean is then used to skip
> further calls to that function. But do we really need a separate
> variable for this, given that num_temp_sensors >= 0 matches the same
> condition as far as I can see?
> 
Agreed. On the other side, those are optimizations. I am not really sure
if that driver is worth spending time on, given the age of the chip.

Guenter

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Questions about adt7470 driver
  2020-05-27 22:42 ` Guenter Roeck
@ 2020-05-27 23:33   ` Darrick J. Wong
  2020-05-28 10:02     ` Jean Delvare
  0 siblings, 1 reply; 9+ messages in thread
From: Darrick J. Wong @ 2020-05-27 23:33 UTC (permalink / raw)
  To: Guenter Roeck; +Cc: Jean Delvare, linux-hwmon, Joshua Scott, Axel Lin

On Wed, May 27, 2020 at 03:42:52PM -0700, Guenter Roeck wrote:
> On Tue, May 26, 2020 at 11:22:59AM +0200, Jean Delvare wrote:
> > Hi all,
> > 
> > In the context of bug #207771, I got to look into the adt7470 driver.
> > I'm slowing understanding the logic of the background temperature
> > registers update thread, still there are 2 things I do not understand:
> > 
> > 1* Function adt7470_read_temperatures() sets data->num_temp_sensors,
> > however this value seems to be only used to limit the wait time of
> > future calls to the same function. In the general update function we
> > still read ALL temperature sensors regardless of its value:
> > 
> > 		for (i = 0; i < ADT7470_TEMP_COUNT; i++)
> > 			data->temp[i] = i2c_smbus_read_byte_data(client,
> > 						ADT7470_TEMP_REG(i));
> > 
> > Shouldn't this loop be bounded with data->num_temp_sensors instead of
> > ADT7470_TEMP_COUNT?
> > 
> Guess so.
> 
> > 2* Function adt7470_read_temperatures() also sets
> > data->temperatures_probed to 1, and this boolean is then used to skip
> > further calls to that function. But do we really need a separate
> > variable for this, given that num_temp_sensors >= 0 matches the same
> > condition as far as I can see?
> > 
> Agreed. On the other side, those are optimizations. I am not really sure
> if that driver is worth spending time on, given the age of the chip.

I /think/ the answer to both questions is yes, but I don't have the
hardware anymore so I have no way to QA that... :/

--D

> Guenter

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Questions about adt7470 driver
  2020-05-27 23:33   ` Darrick J. Wong
@ 2020-05-28 10:02     ` Jean Delvare
  2020-05-28 13:52       ` Guenter Roeck
  0 siblings, 1 reply; 9+ messages in thread
From: Jean Delvare @ 2020-05-28 10:02 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Guenter Roeck, linux-hwmon, Joshua Scott, Axel Lin

On Wed, 27 May 2020 16:33:34 -0700, Darrick J. Wong wrote:
> On Wed, May 27, 2020 at 03:42:52PM -0700, Guenter Roeck wrote:
> > On Tue, May 26, 2020 at 11:22:59AM +0200, Jean Delvare wrote:  
> > > Hi all,
> > > 
> > > In the context of bug #207771, I got to look into the adt7470 driver.
> > > I'm slowing understanding the logic of the background temperature
> > > registers update thread, still there are 2 things I do not understand:
> > > 
> > > 1* Function adt7470_read_temperatures() sets data->num_temp_sensors,
> > > however this value seems to be only used to limit the wait time of
> > > future calls to the same function. In the general update function we
> > > still read ALL temperature sensors regardless of its value:
> > > 
> > > 		for (i = 0; i < ADT7470_TEMP_COUNT; i++)
> > > 			data->temp[i] = i2c_smbus_read_byte_data(client,
> > > 						ADT7470_TEMP_REG(i));
> > > 
> > > Shouldn't this loop be bounded with data->num_temp_sensors instead of
> > > ADT7470_TEMP_COUNT?
> > >   
> > Guess so.
> >   
> > > 2* Function adt7470_read_temperatures() also sets
> > > data->temperatures_probed to 1, and this boolean is then used to skip
> > > further calls to that function. But do we really need a separate
> > > variable for this, given that num_temp_sensors >= 0 matches the same
> > > condition as far as I can see?
> >
> > Agreed. On the other side, those are optimizations. I am not really sure
> > if that driver is worth spending time on, given the age of the chip.

Well it's still in use and apparently at least one user cares enough to
report a bug and propose a candidate fix.

> I /think/ the answer to both questions is yes, but I don't have the
> hardware anymore so I have no way to QA that... :/

Thanks for both of you for your answers.

Darrick, I have 3 more questions for you if you remember...

3* I understand that the temperatures need to be read periodically in
order for automatic fan speed control to work. What I do not understand
is why you bother switching PWM outputs to manual mode each time? What
bad would happen if we did not do that? I see nothing in the datasheet
justifying it.

4* Why are you calling msleep_interruptible() in
adt7470_read_temperatures() to wait for the temperature conversions? We
return -EAGAIN if that happens, but then ignore that error code, and we
log a cryptic error message. Do I understand correctly that the only
case where this should happen is when the user unloads the kernel
driver, in which case we do not care about having been interrupted? I
can't actually get the error message to be logged when rmmod'ing the
module so I don't know what it would take to trigger it.

5* Is there any reason why the update thread is being started
unconditionally? As I understand it, it is only needed if at least one
PWM output is configured in automatic mode, which (I think) is not the
default. It is odd that the bug reporter hits a problem with the
polling thread when they are not using automatic fan speed control in
the first place so they do not need the polling thread to run. I was
wondering if it would be possible to start and stop the polling thread
depending on whether automatic PWM is enabled or not.

Thanks again,
-- 
Jean Delvare
SUSE L3 Support

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Questions about adt7470 driver
  2020-05-28 10:02     ` Jean Delvare
@ 2020-05-28 13:52       ` Guenter Roeck
  2020-05-29  0:18         ` Darrick J. Wong
  2020-05-29 13:29         ` Jean Delvare
  0 siblings, 2 replies; 9+ messages in thread
From: Guenter Roeck @ 2020-05-28 13:52 UTC (permalink / raw)
  To: Jean Delvare, Darrick J. Wong; +Cc: linux-hwmon, Joshua Scott, Axel Lin

On 5/28/20 3:02 AM, Jean Delvare wrote:
> On Wed, 27 May 2020 16:33:34 -0700, Darrick J. Wong wrote:
>> On Wed, May 27, 2020 at 03:42:52PM -0700, Guenter Roeck wrote:
>>> On Tue, May 26, 2020 at 11:22:59AM +0200, Jean Delvare wrote:  
>>>> Hi all,
>>>>
>>>> In the context of bug #207771, I got to look into the adt7470 driver.
>>>> I'm slowing understanding the logic of the background temperature
>>>> registers update thread, still there are 2 things I do not understand:
>>>>
>>>> 1* Function adt7470_read_temperatures() sets data->num_temp_sensors,
>>>> however this value seems to be only used to limit the wait time of
>>>> future calls to the same function. In the general update function we
>>>> still read ALL temperature sensors regardless of its value:
>>>>
>>>> 		for (i = 0; i < ADT7470_TEMP_COUNT; i++)
>>>> 			data->temp[i] = i2c_smbus_read_byte_data(client,
>>>> 						ADT7470_TEMP_REG(i));
>>>>
>>>> Shouldn't this loop be bounded with data->num_temp_sensors instead of
>>>> ADT7470_TEMP_COUNT?
>>>>   
>>> Guess so.
>>>   
>>>> 2* Function adt7470_read_temperatures() also sets
>>>> data->temperatures_probed to 1, and this boolean is then used to skip
>>>> further calls to that function. But do we really need a separate
>>>> variable for this, given that num_temp_sensors >= 0 matches the same
>>>> condition as far as I can see?
>>>
>>> Agreed. On the other side, those are optimizations. I am not really sure
>>> if that driver is worth spending time on, given the age of the chip.
> 
> Well it's still in use and apparently at least one user cares enough to
> report a bug and propose a candidate fix.
> 
... but at the same time that user doesn't have any temperature sensors
installed and won't be able to test any changes.

>> I /think/ the answer to both questions is yes, but I don't have the
>> hardware anymore so I have no way to QA that... :/
> 
> Thanks for both of you for your answers.
> 
> Darrick, I have 3 more questions for you if you remember...
> 
> 3* I understand that the temperatures need to be read periodically in
> order for automatic fan speed control to work. What I do not understand
> is why you bother switching PWM outputs to manual mode each time? What
> bad would happen if we did not do that? I see nothing in the datasheet
> justifying it.
> 
> 4* Why are you calling msleep_interruptible() in
> adt7470_read_temperatures() to wait for the temperature conversions? We
> return -EAGAIN if that happens, but then ignore that error code, and we
> log a cryptic error message. Do I understand correctly that the only
> case where this should happen is when the user unloads the kernel
> driver, in which case we do not care about having been interrupted? I
> can't actually get the error message to be logged when rmmod'ing the
> module so I don't know what it would take to trigger it.
> 

Not sure if rmmod generates a signal. In theory you could possibly
keep writing -1 into the num_temp_sensors attribute, execute the
sensors command (or just read a temperature), and interrupt the
sequence.

> 5* Is there any reason why the update thread is being started
> unconditionally? As I understand it, it is only needed if at least one
> PWM output is configured in automatic mode, which (I think) is not the
> default. It is odd that the bug reporter hits a problem with the
> polling thread when they are not using automatic fan speed control in
> the first place so they do not need the polling thread to run. I was
> wondering if it would be possible to start and stop the polling thread
> depending on whether automatic PWM is enabled or not.
>


The datasheet says nothing about the need to run such a thread for
automatic mode either. As I understand it, the chip is supposed to
repeat temperature measurements automatically once configuration
register 1 bit 7 is set. Of course that is difficult if not
impossible to confirm without access to the chip.

Guenter

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Questions about adt7470 driver
  2020-05-28 13:52       ` Guenter Roeck
@ 2020-05-29  0:18         ` Darrick J. Wong
  2020-05-29 13:41           ` Jean Delvare
  2020-05-29 13:29         ` Jean Delvare
  1 sibling, 1 reply; 9+ messages in thread
From: Darrick J. Wong @ 2020-05-29  0:18 UTC (permalink / raw)
  To: Guenter Roeck; +Cc: Jean Delvare, linux-hwmon, Joshua Scott, Axel Lin

On Thu, May 28, 2020 at 06:52:37AM -0700, Guenter Roeck wrote:
> On 5/28/20 3:02 AM, Jean Delvare wrote:
> > On Wed, 27 May 2020 16:33:34 -0700, Darrick J. Wong wrote:
> >> On Wed, May 27, 2020 at 03:42:52PM -0700, Guenter Roeck wrote:
> >>> On Tue, May 26, 2020 at 11:22:59AM +0200, Jean Delvare wrote:  
> >>>> Hi all,
> >>>>
> >>>> In the context of bug #207771, I got to look into the adt7470 driver.
> >>>> I'm slowing understanding the logic of the background temperature
> >>>> registers update thread, still there are 2 things I do not understand:
> >>>>
> >>>> 1* Function adt7470_read_temperatures() sets data->num_temp_sensors,
> >>>> however this value seems to be only used to limit the wait time of
> >>>> future calls to the same function. In the general update function we
> >>>> still read ALL temperature sensors regardless of its value:
> >>>>
> >>>> 		for (i = 0; i < ADT7470_TEMP_COUNT; i++)
> >>>> 			data->temp[i] = i2c_smbus_read_byte_data(client,
> >>>> 						ADT7470_TEMP_REG(i));
> >>>>
> >>>> Shouldn't this loop be bounded with data->num_temp_sensors instead of
> >>>> ADT7470_TEMP_COUNT?
> >>>>   
> >>> Guess so.
> >>>   
> >>>> 2* Function adt7470_read_temperatures() also sets
> >>>> data->temperatures_probed to 1, and this boolean is then used to skip
> >>>> further calls to that function. But do we really need a separate
> >>>> variable for this, given that num_temp_sensors >= 0 matches the same
> >>>> condition as far as I can see?
> >>>
> >>> Agreed. On the other side, those are optimizations. I am not really sure
> >>> if that driver is worth spending time on, given the age of the chip.
> > 
> > Well it's still in use and apparently at least one user cares enough to
> > report a bug and propose a candidate fix.
> > 
> ... but at the same time that user doesn't have any temperature sensors
> installed and won't be able to test any changes.
> 
> >> I /think/ the answer to both questions is yes, but I don't have the
> >> hardware anymore so I have no way to QA that... :/
> > 
> > Thanks for both of you for your answers.
> > 
> > Darrick, I have 3 more questions for you if you remember...
> > 
> > 3* I understand that the temperatures need to be read periodically in
> > order for automatic fan speed control to work. What I do not understand
> > is why you bother switching PWM outputs to manual mode each time? What
> > bad would happen if we did not do that? I see nothing in the datasheet
> > justifying it.

I don't recall the specifics very well, sorry in advance. :/

I vaguely remember that the adt7470 temperature inputs were connected to
the CPU, and the PWM outputs were connected to the CPU heatsink fans.
The BIOS appeared to set up the adt7470 for automatic thermal management
(i.e. when you cranked all four cores of the machine to maximum) it
would gradually raise the CPU fan speed, like you'd expect.

The reality (again, vaguely remembered) was that the chip wouldn't run
its pwm control loop unless *something* poked it to reread the
temperature sensors.  A different model of the same machine had a BMC
which would talk to the adt7470 over i2c and take care of that.

The other problem was that /some/ of the machines for whatever reason
would adjust the pwm value that you could read out over i2c, but
wouldn't actually change the fan speed unless you whacked the adt into
manual modem.

Neither of those two behaviors were listed in the datasheet, and we
(IBM) could never get an answer out of either Analog or our own hardware
group about whether or not this was the expected behavior.  I
disassembled the BMC code to figure out what the other model computer
was doing, and (clumsily) wrote that into the driver.  For all I know we
got a bad batch of adt7470s and all these weird gymnastics aren't
supposed to be necessary.

The next generation switched to a totally different chip and supplier,
so I surmise they weren't happy with the results either.  Those machines
tended to overheat if you were in Windows.

> > 4* Why are you calling msleep_interruptible() in
> > adt7470_read_temperatures() to wait for the temperature conversions? We
> > return -EAGAIN if that happens, but then ignore that error code, and we
> > log a cryptic error message. Do I understand correctly that the only
> > case where this should happen is when the user unloads the kernel
> > driver, in which case we do not care about having been interrupted? I
> > can't actually get the error message to be logged when rmmod'ing the
> > module so I don't know what it would take to trigger it.

Urrk, what a doof who wrote that.  /me smacks 2009-era djwong. :P

kthread_stop blocks until the thread exits... but strangely we don't
even try to interrupt the msleep_interruptible call.  That's fine,
though device removal will take longer than it needs to.  We also don't
care about the return value of msleep_interruptible at all since one
cannot interrupt the kthread.

I probably picked interruptible sleep to avoid triggering the hangcheck
timer.

> Not sure if rmmod generates a signal. In theory you could possibly
> keep writing -1 into the num_temp_sensors attribute, execute the
> sensors command (or just read a temperature), and interrupt the
> sequence.
> 
> > 5* Is there any reason why the update thread is being started
> > unconditionally? As I understand it, it is only needed if at least one
> > PWM output is configured in automatic mode, which (I think) is not the
> > default. It is odd that the bug reporter hits a problem with the

Yes, the driver should only start the kthread loop if someone wants
automatic temp control.

> > polling thread when they are not using automatic fan speed control in
> > the first place so they do not need the polling thread to run. I was
> > wondering if it would be possible to start and stop the polling thread
> > depending on whether automatic PWM is enabled or not.
> >
> 
> 
> The datasheet says nothing about the need to run such a thread for
> automatic mode either. As I understand it, the chip is supposed to
> repeat temperature measurements automatically once configuration
> register 1 bit 7 is set. Of course that is difficult if not
> impossible to confirm without access to the chip.

IIRC my experiences with that chip were that the one I was using didn't
conform to the datasheet, i.e. the automatic temp management bit /was/
set by the BIOS, but the chip didn't do adjust the pwm control until
something came along and poked the adt7470 to re-query the temperature
sensors.  Unfortunately it was being used to control the CPU fan on a
socket 771 Xeon, which meant that we really /did/ need it to cycle up
the fans when the CPU went to full load.

Needless to say I don't have that Xeon 5080 system anymore. :/

--D

> Guenter

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Questions about adt7470 driver
  2020-05-28 13:52       ` Guenter Roeck
  2020-05-29  0:18         ` Darrick J. Wong
@ 2020-05-29 13:29         ` Jean Delvare
  1 sibling, 0 replies; 9+ messages in thread
From: Jean Delvare @ 2020-05-29 13:29 UTC (permalink / raw)
  To: Guenter Roeck; +Cc: Darrick J. Wong, linux-hwmon, Joshua Scott, Axel Lin

On Thu, 28 May 2020 06:52:37 -0700, Guenter Roeck wrote:
> On 5/28/20 3:02 AM, Jean Delvare wrote:
> > Well it's still in use and apparently at least one user cares enough to
> > report a bug and propose a candidate fix.
>
> ... but at the same time that user doesn't have any temperature sensors
> installed and won't be able to test any changes.

They would be able to test the rest of the driver still. Plus I have a
register dump which I am feeding i2c-stub with, and that lets me test
the driver to some extent. God bless Mark M. Hoffman!

What I can't test are timing issues, and hardware misbehavior as
described by Darrick.

> > (...)
> > 4* Why are you calling msleep_interruptible() in
> > adt7470_read_temperatures() to wait for the temperature conversions? We
> > return -EAGAIN if that happens, but then ignore that error code, and we
> > log a cryptic error message. Do I understand correctly that the only
> > case where this should happen is when the user unloads the kernel
> > driver, in which case we do not care about having been interrupted? I
> > can't actually get the error message to be logged when rmmod'ing the
> > module so I don't know what it would take to trigger it.
> 
> Not sure if rmmod generates a signal.

I tested and it doesn't seem so. I expected rmmod to call
adt7470_remove(), in turn calling kthread_stop() and I thought this
would interrupt the thread. But the interrupt message never gets logged.

"modprobe adt7470 && rmmod adt7470" takes 2 seconds, so I assume that
rmmod gives the thread all the time it wants to exit on its own
(through kthread_should_stop()).

> In theory you could possibly
> keep writing -1 into the num_temp_sensors attribute, execute the
> sensors command (or just read a temperature), and interrupt the
> sequence.

Interrupt how? I tried Ctrl+C while "sensors" is waiting for the driver
to update the values, but it waits for completion before actually
exiting.

So far I couldn't find any way to get this msleep_interruptible() to
actually get interrupted.

> (...)
> The datasheet says nothing about the need to run such a thread for
> automatic mode either.

But that at least makes some sense due to the external nature of the
thermal sensors. The data needs to be fetched from the slaves somehow
before you can read it from the ADT7470's registers.

On the other hand, having to change PWM configuration registers for
temperature readings to be correct (/if/ that's what is happening
here... not sure) is highly unexpected.

> As I understand it, the chip is supposed to
> repeat temperature measurements automatically once configuration
> register 1 bit 7 is set. Of course that is difficult if not
> impossible to confirm without access to the chip.

Well there's clearly a huge design mistake for that chip, sharing a
pin between FULL_SPEED and TMP_START makes it pretty much impossible
for automatic temperature monitoring to be implemented without a
software loop. And a hardware monitor device that can't run on its own
is just asking for trouble. Oh well.

-- 
Jean Delvare
SUSE L3 Support

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Questions about adt7470 driver
  2020-05-29  0:18         ` Darrick J. Wong
@ 2020-05-29 13:41           ` Jean Delvare
  2020-06-02 18:36             ` Darrick J. Wong
  0 siblings, 1 reply; 9+ messages in thread
From: Jean Delvare @ 2020-05-29 13:41 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Guenter Roeck, linux-hwmon, Joshua Scott, Axel Lin

On Thu, 28 May 2020 17:18:58 -0700, Darrick J. Wong wrote:
> I vaguely remember that the adt7470 temperature inputs were connected to
> the CPU, and the PWM outputs were connected to the CPU heatsink fans.
> The BIOS appeared to set up the adt7470 for automatic thermal management
> (i.e. when you cranked all four cores of the machine to maximum) it
> would gradually raise the CPU fan speed, like you'd expect.
> 
> The reality (again, vaguely remembered) was that the chip wouldn't run
> its pwm control loop unless *something* poked it to reread the
> temperature sensors.  A different model of the same machine had a BMC
> which would talk to the adt7470 over i2c and take care of that.

That I understand, and while it is poor design in my opinion, it makes
sense to some degree.

> The other problem was that /some/ of the machines for whatever reason
> would adjust the pwm value that you could read out over i2c, but
> wouldn't actually change the fan speed unless you whacked the adt into
> manual modem.

Ah. That would be the reason for the extra code. Automatic fan speed
control that needs to be refreshed manually. Oh my.

> Neither of those two behaviors were listed in the datasheet, and we
> (IBM) could never get an answer out of either Analog or our own hardware
> group about whether or not this was the expected behavior.  I
> disassembled the BMC code to figure out what the other model computer
> was doing, and (clumsily) wrote that into the driver.  For all I know we
> got a bad batch of adt7470s and all these weird gymnastics aren't
> supposed to be necessary.
> 
> The next generation switched to a totally different chip and supplier,
> so I surmise they weren't happy with the results either.  Those machines
> tended to overheat if you were in Windows.
> 
> > > 4* Why are you calling msleep_interruptible() in
> > > adt7470_read_temperatures() to wait for the temperature conversions? We
> > > return -EAGAIN if that happens, but then ignore that error code, and we
> > > log a cryptic error message. Do I understand correctly that the only
> > > case where this should happen is when the user unloads the kernel
> > > driver, in which case we do not care about having been interrupted? I
> > > can't actually get the error message to be logged when rmmod'ing the
> > > module so I don't know what it would take to trigger it.  
> 
> Urrk, what a doof who wrote that.  /me smacks 2009-era djwong. :P
> 
> kthread_stop blocks until the thread exits...

My experiments seem to confirm this.

> but strangely we don't
> even try to interrupt the msleep_interruptible call.

How would we do that if we wanted to? Later you say this is not
possible?

> That's fine,
> though device removal will take longer than it needs to.

Yes, up to 2 seconds in my tests. Not pleasant, but also not
necessarily something to worry about, as rmmod is usually not needed.

> We also don't
> care about the return value of msleep_interruptible at all since one
> cannot interrupt the kthread.
> 
> I probably picked interruptible sleep to avoid triggering the hangcheck
> timer.

I don't understand that part. Is a 2 second uninteruptible sleep in a
kthread considered bad somehow?

> > > 5* Is there any reason why the update thread is being started
> > > unconditionally? As I understand it, it is only needed if at least one
> > > PWM output is configured in automatic mode, which (I think) is not the
> > > default. It is odd that the bug reporter hits a problem with the  
> 
> Yes, the driver should only start the kthread loop if someone wants
> automatic temp control.

OK, I'll give it a try. I don't want to add too much complexity though.

Thanks,
-- 
Jean Delvare
SUSE L3 Support

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Questions about adt7470 driver
  2020-05-29 13:41           ` Jean Delvare
@ 2020-06-02 18:36             ` Darrick J. Wong
  0 siblings, 0 replies; 9+ messages in thread
From: Darrick J. Wong @ 2020-06-02 18:36 UTC (permalink / raw)
  To: Jean Delvare; +Cc: Guenter Roeck, linux-hwmon, Joshua Scott, Axel Lin

On Fri, May 29, 2020 at 03:41:57PM +0200, Jean Delvare wrote:
> On Thu, 28 May 2020 17:18:58 -0700, Darrick J. Wong wrote:
> > I vaguely remember that the adt7470 temperature inputs were connected to
> > the CPU, and the PWM outputs were connected to the CPU heatsink fans.
> > The BIOS appeared to set up the adt7470 for automatic thermal management
> > (i.e. when you cranked all four cores of the machine to maximum) it
> > would gradually raise the CPU fan speed, like you'd expect.
> > 
> > The reality (again, vaguely remembered) was that the chip wouldn't run
> > its pwm control loop unless *something* poked it to reread the
> > temperature sensors.  A different model of the same machine had a BMC
> > which would talk to the adt7470 over i2c and take care of that.
> 
> That I understand, and while it is poor design in my opinion, it makes
> sense to some degree.
> 
> > The other problem was that /some/ of the machines for whatever reason
> > would adjust the pwm value that you could read out over i2c, but
> > wouldn't actually change the fan speed unless you whacked the adt into
> > manual modem.
> 
> Ah. That would be the reason for the extra code. Automatic fan speed
> control that needs to be refreshed manually. Oh my.
> 
> > Neither of those two behaviors were listed in the datasheet, and we
> > (IBM) could never get an answer out of either Analog or our own hardware
> > group about whether or not this was the expected behavior.  I
> > disassembled the BMC code to figure out what the other model computer
> > was doing, and (clumsily) wrote that into the driver.  For all I know we
> > got a bad batch of adt7470s and all these weird gymnastics aren't
> > supposed to be necessary.
> > 
> > The next generation switched to a totally different chip and supplier,
> > so I surmise they weren't happy with the results either.  Those machines
> > tended to overheat if you were in Windows.
> > 
> > > > 4* Why are you calling msleep_interruptible() in
> > > > adt7470_read_temperatures() to wait for the temperature conversions? We
> > > > return -EAGAIN if that happens, but then ignore that error code, and we
> > > > log a cryptic error message. Do I understand correctly that the only
> > > > case where this should happen is when the user unloads the kernel
> > > > driver, in which case we do not care about having been interrupted? I
> > > > can't actually get the error message to be logged when rmmod'ing the
> > > > module so I don't know what it would take to trigger it.  
> > 
> > Urrk, what a doof who wrote that.  /me smacks 2009-era djwong. :P
> > 
> > kthread_stop blocks until the thread exits...
> 
> My experiments seem to confirm this.
> 
> > but strangely we don't
> > even try to interrupt the msleep_interruptible call.
> 
> How would we do that if we wanted to? Later you say this is not
> possible?

You /could/ theoretically send the kthread a signal to interrupt the
sleep, though I can't remember if kthreads are sufficiently special that
signals don't work...

> > That's fine,
> > though device removal will take longer than it needs to.
> 
> Yes, up to 2 seconds in my tests. Not pleasant, but also not
> necessarily something to worry about, as rmmod is usually not needed.

...but probably not necessary since nobody's complained about the 2s
yet.

> > We also don't
> > care about the return value of msleep_interruptible at all since one
> > cannot interrupt the kthread.
> > 
> > I probably picked interruptible sleep to avoid triggering the hangcheck
> > timer.
> 
> I don't understand that part. Is a 2 second uninteruptible sleep in a
> kthread considered bad somehow?

Not really, but the sysadmin can (probably ill-advisedly) set the
hangcheck timer to go off after 1 second.

> > > > 5* Is there any reason why the update thread is being started
> > > > unconditionally? As I understand it, it is only needed if at least one
> > > > PWM output is configured in automatic mode, which (I think) is not the
> > > > default. It is odd that the bug reporter hits a problem with the  
> > 
> > Yes, the driver should only start the kthread loop if someone wants
> > automatic temp control.
> 
> OK, I'll give it a try. I don't want to add too much complexity though.

<nod>

--D

> Thanks,
> -- 
> Jean Delvare
> SUSE L3 Support

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2020-06-02 18:37 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-05-26  9:22 Questions about adt7470 driver Jean Delvare
2020-05-27 22:42 ` Guenter Roeck
2020-05-27 23:33   ` Darrick J. Wong
2020-05-28 10:02     ` Jean Delvare
2020-05-28 13:52       ` Guenter Roeck
2020-05-29  0:18         ` Darrick J. Wong
2020-05-29 13:41           ` Jean Delvare
2020-06-02 18:36             ` Darrick J. Wong
2020-05-29 13:29         ` Jean Delvare

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).