regressions.lists.linux.dev archive mirror
* Regression: w1_therm: sysfs w1_slave sometimes report 85 degrees Celsius
@ 2023-04-26 13:39 Stefan Wahren
  2023-04-26 14:01 ` Greg Kroah-Hartman
  2023-05-22 10:44 ` Linux regression tracking #update (Thorsten Leemhuis)
  0 siblings, 2 replies; 4+ messages in thread
From: Stefan Wahren @ 2023-04-26 13:39 UTC
  To: Krzysztof Kozlowski, Akira Shimahara
  Cc: Linux Kernel Mailing List, Greg Kroah-Hartman, Stefan Wahren,
	regressions

Hi,

We recently switched our Tarragon board (i.MX6ULL) to Linux 6.1 and 
noticed that the connected 1-wire temperature sensors 
(w1_therm.w1_strong_pull=0) sometimes (~ 1 of 20 times) report 85 
degrees Celsius, which is AFAIK the only way to report errors to the 
1-wire master:

sys/bus/w1/devices/28-04168158faff# cat w1_slave
50 05 4b 46 7f ff 0c 10 1c : crc=1c YES
50 05 4b 46 7f ff 0c 10 1c t=85000
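
For reference, the bytes above can be checked by hand. The following is a
minimal userspace sketch, not code from w1_therm; all helper names are
hypothetical. It assumes the DS18B20 scratchpad layout (temperature LSB
first, 1/16 degree per bit at 12-bit resolution, Dallas/Maxim CRC-8 in the
last byte). The bytes 50 05 decode to 0x0550, which is the sensor's
power-on default of +85 degrees Celsius, and the CRC matches, so the bytes
themselves were transferred intact.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SCRATCHPAD_LEN 9

/* Dallas/Maxim CRC-8 (x^8 + x^5 + x^4 + 1), processed LSB first. */
static uint8_t dallas_crc8(const uint8_t *data, size_t len)
{
	uint8_t crc = 0;

	for (size_t i = 0; i < len; i++) {
		crc ^= data[i];
		for (int bit = 0; bit < 8; bit++)
			crc = (crc & 1) ? (uint8_t)((crc >> 1) ^ 0x8C)
					: (uint8_t)(crc >> 1);
	}
	return crc;
}

/* True if the CRC over bytes 0..7 equals the CRC byte at index 8. */
static bool scratchpad_crc_ok(const uint8_t *scratchpad)
{
	assert(scratchpad != NULL);
	return dallas_crc8(scratchpad, SCRATCHPAD_LEN - 1) ==
	       scratchpad[SCRATCHPAD_LEN - 1];
}

/*
 * Convert the two temperature bytes to millidegrees Celsius.
 * Assumption: 12-bit resolution, i.e. 1/16 degree per bit.
 */
static int32_t scratchpad_to_millicelsius(const uint8_t *scratchpad)
{
	assert(scratchpad != NULL);
	int16_t raw = (int16_t)((uint16_t)scratchpad[1] << 8 | scratchpad[0]);

	return (int32_t)raw * 1000 / 16;
}

/* A valid CRC combined with exactly 85000 matches the power-on default. */
static bool looks_like_power_on_value(const uint8_t *scratchpad)
{
	return scratchpad_crc_ok(scratchpad) &&
	       scratchpad_to_millicelsius(scratchpad) == 85000;
}
```

Feeding the nine bytes from the dump above into looks_like_power_on_value()
returns true. A real reading of exactly 85 degrees would look the same, so
this check can only flag a value as suspect.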

I wasn't able to reproduce this issue with the old kernel 4.9.
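
To compare kernels, the failure rate can be counted from repeated reads of
w1_slave. The following is a rough sketch of the parsing side only, under
the assumption that each sample is the full text of one w1_slave read as
shown above; the struct and function names are hypothetical and not part
of any kernel or library API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* One parsed w1_slave read (hypothetical type for this sketch). */
struct w1_reading {
	bool crc_ok;        /* first line ended in "YES" */
	long millicelsius;  /* value after "t=" */
};

/* Parse the text of one w1_slave read; returns false if no "t=" field. */
static bool parse_w1_slave(const char *text, struct w1_reading *out)
{
	const char *t;
	char *end;

	if (!text || !out)
		return false;

	t = strstr(text, "t=");
	if (!t)
		return false;

	out->millicelsius = strtol(t + 2, &end, 10);
	if (end == t + 2)
		return false; /* "t=" was not followed by a number */

	out->crc_ok = strstr(text, "YES") != NULL;
	return true;
}

/* A CRC-valid reading of exactly 85000 is treated as suspect. */
static bool is_suspect(const struct w1_reading *r)
{
	return r->crc_ok && r->millicelsius == 85000;
}

/* Count suspect readings in a batch of captured samples. */
static size_t count_suspect(const char *const *samples, size_t n)
{
	size_t hits = 0;

	assert(samples != NULL);
	for (size_t i = 0; i < n; i++) {
		struct w1_reading r;

		if (parse_w1_slave(samples[i], &r) && is_suspect(&r))
			hits++;
	}
	return hits;
}
```

Dividing the result of count_suspect() by the number of samples gives a
rate that can be compared against the roughly 1 in 20 seen on 6.1.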

After that I successfully bisected the issue to this commit:
67b392f7b8ed ("w1_therm: optimizing temperature read timings")

Unfortunately this commit contains a lot of independent changes, which 
makes it hard to figure out the cause of this issue. So I tried to 
split this patch into seven independent changes [1]. Now I was able to 
bisect the cause further to this change [2], which seems to rework the 
pullup handling within read_therm().

Looking closer at the code change, and verifying this with some debug 
messages, I found that the change inverted the locking behavior 
(before: no pullup -> keep lock, after: no pullup -> release lock 
during sleep).

Before:
	if (external_power) {
		mutex_unlock(&dev_master->bus_mutex);

		sleep_rem = msleep_interruptible(tm);
		if (sleep_rem != 0) {
			ret = -EINTR;
			goto dec_refcnt;
		}

		ret = mutex_lock_interruptible(&dev_master->bus_mutex);
		if (ret != 0)
			goto dec_refcnt;
	} else if (!w1_strong_pullup) {
		sleep_rem = msleep_interruptible(tm);
		if (sleep_rem != 0) {
			ret = -EINTR;
			goto mt_unlock;
		}
	}

After:
	if (strong_pullup) { /*some device need pullup */
		sleep_rem = msleep_interruptible(tm);
		if (sleep_rem != 0) {
			ret = -EINTR;
			goto mt_unlock;
		}
	} else { /*no device need pullup */
		mutex_unlock(&dev_master->bus_mutex);

		sleep_rem = msleep_interruptible(tm);
		if (sleep_rem != 0) {
			ret = -EINTR;
			goto dec_refcnt;
		}

		ret = mutex_lock_interruptible(&dev_master->bus_mutex);
		if (ret != 0)
			goto dec_refcnt;
	}

I don't believe this is intended. After inverting the strong_pullup 
check, the issue was no longer reproducible on our platform. But I'm 
not sure this is a clean fix.
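
The effect of the inversion can be written down as a small decision model.
This is only an illustrative userspace sketch of the "After" snippet and of
the experiment with the inverted check; the enum and function names are
hypothetical, and it is not the patch itself. It assumes strong_pullup is
false in the reported setup (w1_therm.w1_strong_pull=0).

```c
#include <assert.h>
#include <stdbool.h>

/* How the conversion wait treats the bus mutex in this model. */
enum wait_mode {
	WAIT_LOCK_HELD,     /* sleep while bus_mutex stays locked */
	WAIT_LOCK_RELEASED, /* unlock, sleep, then lock again */
};

/* Mirrors the "After" snippet: pullup -> sleep with the lock held. */
static enum wait_mode wait_mode_after(bool strong_pullup)
{
	return strong_pullup ? WAIT_LOCK_HELD : WAIT_LOCK_RELEASED;
}

/* Mirrors the experiment: same code with the strong_pullup check inverted. */
static enum wait_mode wait_mode_inverted(bool strong_pullup)
{
	return !strong_pullup ? WAIT_LOCK_HELD : WAIT_LOCK_RELEASED;
}

/* Self-check for the reported configuration (no strong pullup). */
static void check_reported_setup(void)
{
	/* As bisected: the lock is dropped during the conversion sleep. */
	assert(wait_mode_after(false) == WAIT_LOCK_RELEASED);
	/* With the check inverted: the lock is kept, as described for "before". */
	assert(wait_mode_inverted(false) == WAIT_LOCK_HELD);
}
```

The model says nothing about why dropping the lock leads to the 85 degree
readings; it only makes the flipped branch selection explicit.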

Best regards

#regzbot introduced: 67b392f7b8ed

[1] - https://github.com/chargebyte/linux/commits/v6.1-tarragon_w1
[2] - 
https://github.com/chargebyte/linux/commit/17ca863a32a6a1bdd376959f05c954bef12fc1b5

* Re: Regression: w1_therm: sysfs w1_slave sometimes report 85 degrees Celsius
  2023-04-26 13:39 Regression: w1_therm: sysfs w1_slave sometimes report 85 degrees Celsius Stefan Wahren
@ 2023-04-26 14:01 ` Greg Kroah-Hartman
  2023-04-27 11:47   ` Stefan Wahren
  2023-05-22 10:44 ` Linux regression tracking #update (Thorsten Leemhuis)
  1 sibling, 1 reply; 4+ messages in thread
From: Greg Kroah-Hartman @ 2023-04-26 14:01 UTC
  To: Stefan Wahren
  Cc: Krzysztof Kozlowski, Akira Shimahara, Linux Kernel Mailing List,
	Stefan Wahren, regressions

On Wed, Apr 26, 2023 at 03:39:15PM +0200, Stefan Wahren wrote:
> [...]
> I don't believe this is intended. After inverting the strong_pullup check,
> the issue wasn't reproducible on our platform anymore. But i'm not sure this
> is clean.
> [...]

Can you send a patch that shows the change you wish to have made here so
that you can get credit for fixing the issue?

thanks,

greg k-h

* Re: Regression: w1_therm: sysfs w1_slave sometimes report 85 degrees Celsius
  2023-04-26 14:01 ` Greg Kroah-Hartman
@ 2023-04-27 11:47   ` Stefan Wahren
  0 siblings, 0 replies; 4+ messages in thread
From: Stefan Wahren @ 2023-04-27 11:47 UTC
  To: Greg Kroah-Hartman
  Cc: Krzysztof Kozlowski, Akira Shimahara, Linux Kernel Mailing List,
	Stefan Wahren, regressions

> Can you send a patch that shows the change you wish to have made here so
> that you can get credit for fixing the issue?

Sure, just for reference:

https://lore.kernel.org/lkml/20230427112152.12313-1-stefan.wahren@i2se.com/T/

> 
> thanks,
> 
> greg k-h

* Re: Regression: w1_therm: sysfs w1_slave sometimes report 85 degrees Celsius
  2023-04-26 13:39 Regression: w1_therm: sysfs w1_slave sometimes report 85 degrees Celsius Stefan Wahren
  2023-04-26 14:01 ` Greg Kroah-Hartman
@ 2023-05-22 10:44 ` Linux regression tracking #update (Thorsten Leemhuis)
  1 sibling, 0 replies; 4+ messages in thread
From: Linux regression tracking #update (Thorsten Leemhuis) @ 2023-05-22 10:44 UTC
  To: Stefan Wahren, Krzysztof Kozlowski, Akira Shimahara
  Cc: Linux Kernel Mailing List, Greg Kroah-Hartman, Stefan Wahren,
	regressions

On 26.04.23 15:39, Stefan Wahren wrote:
> recently we switch on our Tarragon board (i.MX6ULL) to Linux 6.1 and
> noticed that the connected 1-wire temperature sensors
> (w1_therm.w1_strong_pull=0) sometimes (~ 1 of 20 times) report 85
> degrees Celsius, which is AFAIK the only way to report errors to the
> 1-wire master:
> [...]
> #regzbot introduced: 67b392f7b8ed

#regzbot fix: w1_therm: optimizing temperature read timings
#regzbot ignore-activity

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
That page also explains what to do if mails like this annoy you.

