All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH net] net: phy: fix a signedness bug in genphy_loopback()
@ 2023-05-26 11:45 Dan Carpenter
  2023-05-26 11:50 ` Russell King (Oracle)
  2023-05-30  4:58 ` Jakub Kicinski
  0 siblings, 2 replies; 17+ messages in thread
From: Dan Carpenter @ 2023-05-26 11:45 UTC (permalink / raw)
  To: Oleksij Rempel
  Cc: Andrew Lunn, Heiner Kallweit, Russell King, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, netdev,
	kernel-janitors

The "val" variable is used to store error codes from phy_read() so
it needs to be signed for the error handling to work as expected.

Fixes: 014068dcb5b1 ("net: phy: genphy_loopback: add link speed configuration")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
---
 drivers/net/phy/phy_device.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
index 2cad9cc3f6b8..d52dd699ae0b 100644
--- a/drivers/net/phy/phy_device.c
+++ b/drivers/net/phy/phy_device.c
@@ -2700,8 +2700,8 @@ EXPORT_SYMBOL(genphy_resume);
 int genphy_loopback(struct phy_device *phydev, bool enable)
 {
 	if (enable) {
-		u16 val, ctl = BMCR_LOOPBACK;
-		int ret;
+		u16 ctl = BMCR_LOOPBACK;
+		int val, ret;
 
 		ctl |= mii_bmcr_encode_fixed(phydev->speed, phydev->duplex);
 
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH net] net: phy: fix a signedness bug in genphy_loopback()
  2023-05-26 11:45 [PATCH net] net: phy: fix a signedness bug in genphy_loopback() Dan Carpenter
@ 2023-05-26 11:50 ` Russell King (Oracle)
  2023-05-30  4:58 ` Jakub Kicinski
  1 sibling, 0 replies; 17+ messages in thread
From: Russell King (Oracle) @ 2023-05-26 11:50 UTC (permalink / raw)
  To: Dan Carpenter
  Cc: Oleksij Rempel, Andrew Lunn, Heiner Kallweit, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, netdev,
	kernel-janitors

On Fri, May 26, 2023 at 02:45:54PM +0300, Dan Carpenter wrote:
> The "val" variable is used to store error codes from phy_read() so
> it needs to be signed for the error handling to work as expected.
> 
> Fixes: 014068dcb5b1 ("net: phy: genphy_loopback: add link speed configuration")

LGTM.

Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>

Thanks!

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH net] net: phy: fix a signedness bug in genphy_loopback()
  2023-05-26 11:45 [PATCH net] net: phy: fix a signedness bug in genphy_loopback() Dan Carpenter
  2023-05-26 11:50 ` Russell King (Oracle)
@ 2023-05-30  4:58 ` Jakub Kicinski
  2023-05-30  9:01   ` Russell King (Oracle)
                     ` (2 more replies)
  1 sibling, 3 replies; 17+ messages in thread
From: Jakub Kicinski @ 2023-05-30  4:58 UTC (permalink / raw)
  To: Andrew Lunn, Russell King
  Cc: Dan Carpenter, Oleksij Rempel, Heiner Kallweit, David S. Miller,
	Eric Dumazet, Paolo Abeni, netdev, kernel-janitors

On Fri, 26 May 2023 14:45:54 +0300 Dan Carpenter wrote:
> The "val" variable is used to store error codes from phy_read() so
> it needs to be signed for the error handling to work as expected.
> 
> Fixes: 014068dcb5b1 ("net: phy: genphy_loopback: add link speed configuration")
> Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>

Is it going to be obvious to PHY-savvy folks that the val passed to
phy_read_poll_timeout() must be an int? Is it a very common pattern?
My outsider intuition is that since regs are 16b, u16 is reasonable,
and more people may make the same mistake. Therefore we should try to
fix phy_read_poll_timeout() instead to use a local variable like it
does for __ret. 

Weaker version would be to add a compile time check to ensure val 
is signed (assert(typeof(val)~0ULL < 0) or such?).

Opinions?

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH net] net: phy: fix a signedness bug in genphy_loopback()
  2023-05-30  4:58 ` Jakub Kicinski
@ 2023-05-30  9:01   ` Russell King (Oracle)
  2023-05-30  9:06   ` Paolo Abeni
  2023-05-30 12:39   ` Andrew Lunn
  2 siblings, 0 replies; 17+ messages in thread
From: Russell King (Oracle) @ 2023-05-30  9:01 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Andrew Lunn, Dan Carpenter, Oleksij Rempel, Heiner Kallweit,
	David S. Miller, Eric Dumazet, Paolo Abeni, netdev,
	kernel-janitors

On Mon, May 29, 2023 at 09:58:02PM -0700, Jakub Kicinski wrote:
> On Fri, 26 May 2023 14:45:54 +0300 Dan Carpenter wrote:
> > The "val" variable is used to store error codes from phy_read() so
> > it needs to be signed for the error handling to work as expected.
> > 
> > Fixes: 014068dcb5b1 ("net: phy: genphy_loopback: add link speed configuration")
> > Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
> 
> Is it going to be obvious to PHY-savvy folks that the val passed to
> phy_read_poll_timeout() must be an int? Is it a very common pattern?
> My outsider intuition is that since regs are 16b, u16 is reasonable,
> and more people may make the same mistake. Therefore we should try to
> fix phy_read_poll_timeout() instead to use a local variable like it
> does for __ret. 
> 
> Weaker version would be to add a compile time check to ensure val 
> is signed (assert(typeof(val)~0ULL < 0) or such?).
> 
> Opinions?

Yes, I think that would be a saner approach, since
phy_read_poll_timeout() returns the error value, rather than using
the variable passed in. Andrew?

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH net] net: phy: fix a signedness bug in genphy_loopback()
  2023-05-30  4:58 ` Jakub Kicinski
  2023-05-30  9:01   ` Russell King (Oracle)
@ 2023-05-30  9:06   ` Paolo Abeni
  2023-05-30  9:23     ` Dan Carpenter
  2023-05-30 12:39   ` Andrew Lunn
  2 siblings, 1 reply; 17+ messages in thread
From: Paolo Abeni @ 2023-05-30  9:06 UTC (permalink / raw)
  To: Jakub Kicinski, Andrew Lunn, Russell King
  Cc: Dan Carpenter, Oleksij Rempel, Heiner Kallweit, David S. Miller,
	Eric Dumazet, netdev, kernel-janitors

On Mon, 2023-05-29 at 21:58 -0700, Jakub Kicinski wrote:
> On Fri, 26 May 2023 14:45:54 +0300 Dan Carpenter wrote:
> > The "val" variable is used to store error codes from phy_read() so
> > it needs to be signed for the error handling to work as expected.
> > 
> > Fixes: 014068dcb5b1 ("net: phy: genphy_loopback: add link speed configuration")
> > Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
> 
> Is it going to be obvious to PHY-savvy folks that the val passed to
> phy_read_poll_timeout() must be an int? Is it a very common pattern?
> My outsider intuition is that since regs are 16b, u16 is reasonable,
> and more people may make the same mistake. Therefore we should try to
> fix phy_read_poll_timeout() instead to use a local variable like it
> does for __ret. 
> 
> Weaker version would be to add a compile time check to ensure val 
> is signed (assert(typeof(val)~0ULL < 0) or such?).

FTR, a BUILD_BUG_ON() the above check spots issues in several places
(e.g. r8169_main.c, drivers/net/phy/phy_device.c, ...)

I think it should be better resort to a signed local variable in
phy_read_poll_timeout()

Thanks,

Paolo


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH net] net: phy: fix a signedness bug in genphy_loopback()
  2023-05-30  9:06   ` Paolo Abeni
@ 2023-05-30  9:23     ` Dan Carpenter
  2023-05-30  9:40       ` Paolo Abeni
  2023-05-30  9:49       ` Russell King (Oracle)
  0 siblings, 2 replies; 17+ messages in thread
From: Dan Carpenter @ 2023-05-30  9:23 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: Jakub Kicinski, Andrew Lunn, Russell King, Oleksij Rempel,
	Heiner Kallweit, David S. Miller, Eric Dumazet, netdev,
	kernel-janitors

On Tue, May 30, 2023 at 11:06:55AM +0200, Paolo Abeni wrote:
> On Mon, 2023-05-29 at 21:58 -0700, Jakub Kicinski wrote:
> > On Fri, 26 May 2023 14:45:54 +0300 Dan Carpenter wrote:
> > > The "val" variable is used to store error codes from phy_read() so
> > > it needs to be signed for the error handling to work as expected.
> > > 
> > > Fixes: 014068dcb5b1 ("net: phy: genphy_loopback: add link speed configuration")
> > > Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
> > 
> > Is it going to be obvious to PHY-savvy folks that the val passed to
> > phy_read_poll_timeout() must be an int? Is it a very common pattern?
> > My outsider intuition is that since regs are 16b, u16 is reasonable,
> > and more people may make the same mistake. Therefore we should try to
> > fix phy_read_poll_timeout() instead to use a local variable like it
> > does for __ret. 
> > 
> > Weaker version would be to add a compile time check to ensure val 
> > is signed (assert(typeof(val)~0ULL < 0) or such?).
> 
> FTR, a BUILD_BUG_ON() the above check spots issues in several places
> (e.g. r8169_main.c, drivers/net/phy/phy_device.c, ...)
> 

I don't see an issue in r8169_main.c and in drivers/net/phy/phy_device.c
then I only find the bug from this patch.

regards,
dan carpenter

diff --git a/include/linux/phy.h b/include/linux/phy.h
index 6478838405a08..f05fc25b77583 100644
--- a/include/linux/phy.h
+++ b/include/linux/phy.h
@@ -1173,6 +1173,7 @@ static inline int phy_read(struct phy_device *phydev, u32 regnum)
 ({ \
 	int __ret = read_poll_timeout(phy_read, val, val < 0 || (cond), \
 		sleep_us, timeout_us, sleep_before_read, phydev, regnum); \
+	BUILD_BUG_ON((typeof(val))~0ULL > 0);				\
 	if (val < 0) \
 		__ret = val; \
 	if (__ret) \

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH net] net: phy: fix a signedness bug in genphy_loopback()
  2023-05-30  9:23     ` Dan Carpenter
@ 2023-05-30  9:40       ` Paolo Abeni
  2023-05-30  9:49       ` Russell King (Oracle)
  1 sibling, 0 replies; 17+ messages in thread
From: Paolo Abeni @ 2023-05-30  9:40 UTC (permalink / raw)
  To: Dan Carpenter
  Cc: Jakub Kicinski, Andrew Lunn, Russell King, Oleksij Rempel,
	Heiner Kallweit, David S. Miller, Eric Dumazet, netdev,
	kernel-janitors

On Tue, 2023-05-30 at 12:23 +0300, Dan Carpenter wrote:
> On Tue, May 30, 2023 at 11:06:55AM +0200, Paolo Abeni wrote:
> > On Mon, 2023-05-29 at 21:58 -0700, Jakub Kicinski wrote:
> > > On Fri, 26 May 2023 14:45:54 +0300 Dan Carpenter wrote:
> > > > The "val" variable is used to store error codes from phy_read() so
> > > > it needs to be signed for the error handling to work as expected.
> > > > 
> > > > Fixes: 014068dcb5b1 ("net: phy: genphy_loopback: add link speed configuration")
> > > > Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
> > > 
> > > Is it going to be obvious to PHY-savvy folks that the val passed to
> > > phy_read_poll_timeout() must be an int? Is it a very common pattern?
> > > My outsider intuition is that since regs are 16b, u16 is reasonable,
> > > and more people may make the same mistake. Therefore we should try to
> > > fix phy_read_poll_timeout() instead to use a local variable like it
> > > does for __ret. 
> > > 
> > > Weaker version would be to add a compile time check to ensure val 
> > > is signed (assert(typeof(val)~0ULL < 0) or such?).
> > 
> > FTR, a BUILD_BUG_ON() the above check spots issues in several places
> > (e.g. r8169_main.c, drivers/net/phy/phy_device.c, ...)
> > 
> 
> I don't see an issue in r8169_main.c and in drivers/net/phy/phy_device.c
> then I only find the bug from this patch.
> 
> regards,
> dan carpenter
> 
> diff --git a/include/linux/phy.h b/include/linux/phy.h
> index 6478838405a08..f05fc25b77583 100644
> --- a/include/linux/phy.h
> +++ b/include/linux/phy.h
> @@ -1173,6 +1173,7 @@ static inline int phy_read(struct phy_device *phydev, u32 regnum)
>  ({ \
>  	int __ret = read_poll_timeout(phy_read, val, val < 0 || (cond), \
>  		sleep_us, timeout_us, sleep_before_read, phydev, regnum); \
> +	BUILD_BUG_ON((typeof(val))~0ULL > 0);				\
>  	if (val < 0) \
>  		__ret = val; \
>  	if (__ret) \
> 

Uhm... I have no idea what happened to my build host. I did see more
build errors in previous attempt, but now I only observe the one you
address with this patch. I guess some PEBKAC hit me here.

/P


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH net] net: phy: fix a signedness bug in genphy_loopback()
  2023-05-30  9:23     ` Dan Carpenter
  2023-05-30  9:40       ` Paolo Abeni
@ 2023-05-30  9:49       ` Russell King (Oracle)
  2023-05-30 10:06         ` Russell King (Oracle)
  1 sibling, 1 reply; 17+ messages in thread
From: Russell King (Oracle) @ 2023-05-30  9:49 UTC (permalink / raw)
  To: Dan Carpenter
  Cc: Paolo Abeni, Jakub Kicinski, Andrew Lunn, Oleksij Rempel,
	Heiner Kallweit, David S. Miller, Eric Dumazet, netdev,
	kernel-janitors

On Tue, May 30, 2023 at 12:23:32PM +0300, Dan Carpenter wrote:
> I don't see an issue in r8169_main.c and in drivers/net/phy/phy_device.c
> then I only find the bug from this patch.

I agree - inspecting the code reveals that "val" would be of type "int".

> +	BUILD_BUG_ON((typeof(val))~0ULL > 0);				\

I've just thrown this in to my builds, and building for arm64 using
debian stable's gcc, I don't see any errors with genphy_loopback()
suitably hacked, even with r8169 included in the build.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH net] net: phy: fix a signedness bug in genphy_loopback()
  2023-05-30  9:49       ` Russell King (Oracle)
@ 2023-05-30 10:06         ` Russell King (Oracle)
  0 siblings, 0 replies; 17+ messages in thread
From: Russell King (Oracle) @ 2023-05-30 10:06 UTC (permalink / raw)
  To: Dan Carpenter
  Cc: Paolo Abeni, Jakub Kicinski, Andrew Lunn, Oleksij Rempel,
	Heiner Kallweit, David S. Miller, Eric Dumazet, netdev,
	kernel-janitors

On Tue, May 30, 2023 at 10:49:22AM +0100, Russell King (Oracle) wrote:
> On Tue, May 30, 2023 at 12:23:32PM +0300, Dan Carpenter wrote:
> > I don't see an issue in r8169_main.c and in drivers/net/phy/phy_device.c
> > then I only find the bug from this patch.
> 
> I agree - inspecting the code reveals that "val" would be of type "int".
> 
> > +	BUILD_BUG_ON((typeof(val))~0ULL > 0);				\
> 
> I've just thrown this in to my builds, and building for arm64 using
> debian stable's gcc, I don't see any errors with genphy_loopback()
> suitably hacked, even with r8169 included in the build.

Also successfully built with 32-bit ARM gcc.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH net] net: phy: fix a signedness bug in genphy_loopback()
  2023-05-30  4:58 ` Jakub Kicinski
  2023-05-30  9:01   ` Russell King (Oracle)
  2023-05-30  9:06   ` Paolo Abeni
@ 2023-05-30 12:39   ` Andrew Lunn
  2023-05-30 14:40     ` Dan Carpenter
  2023-05-30 19:19     ` Jakub Kicinski
  2 siblings, 2 replies; 17+ messages in thread
From: Andrew Lunn @ 2023-05-30 12:39 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Russell King, Dan Carpenter, Oleksij Rempel, Heiner Kallweit,
	David S. Miller, Eric Dumazet, Paolo Abeni, netdev,
	kernel-janitors

On Mon, May 29, 2023 at 09:58:02PM -0700, Jakub Kicinski wrote:
> On Fri, 26 May 2023 14:45:54 +0300 Dan Carpenter wrote:
> > The "val" variable is used to store error codes from phy_read() so
> > it needs to be signed for the error handling to work as expected.
> > 
> > Fixes: 014068dcb5b1 ("net: phy: genphy_loopback: add link speed configuration")
> > Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
> 
> Is it going to be obvious to PHY-savvy folks that the val passed to
> phy_read_poll_timeout() must be an int? Is it a very common pattern?
> My outsider intuition is that since regs are 16b, u16 is reasonable,
> and more people may make the same mistake.

It is common to get this wrong in general with PHY drivers. Dan
regularly posts fixes like this soon after a PHY driver patch it
merged. I really wish we could somehow get the compiler to warn when
the result from phy_read() is stored into a unsigned type. It would
save Dan a lot of work.

> Therefore we should try to fix phy_read_poll_timeout() instead to
> use a local variable like it does for __ret.

The problem with that is val is supposed to be available to the
caller. I don't know if it is every actually used, but if it is, using
an internal signed variable and then throwing away the sign bit on
return is going to result in similar bugs.
 
> Weaker version would be to add a compile time check to ensure val 
> is signed (assert(typeof(val)~0ULL < 0) or such?).

I think the BUILD BUG is a better solution, since it catches all
problems. And at developer compile time, rather than at Dan runs his
static checker time.

	Andrew

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH net] net: phy: fix a signedness bug in genphy_loopback()
  2023-05-30 12:39   ` Andrew Lunn
@ 2023-05-30 14:40     ` Dan Carpenter
  2023-05-30 17:06       ` Andrew Lunn
  2023-05-30 19:19     ` Jakub Kicinski
  1 sibling, 1 reply; 17+ messages in thread
From: Dan Carpenter @ 2023-05-30 14:40 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Jakub Kicinski, Russell King, Oleksij Rempel, Heiner Kallweit,
	David S. Miller, Eric Dumazet, Paolo Abeni, netdev,
	kernel-janitors

On Tue, May 30, 2023 at 02:39:53PM +0200, Andrew Lunn wrote:
> On Mon, May 29, 2023 at 09:58:02PM -0700, Jakub Kicinski wrote:
> > On Fri, 26 May 2023 14:45:54 +0300 Dan Carpenter wrote:
> > > The "val" variable is used to store error codes from phy_read() so
> > > it needs to be signed for the error handling to work as expected.
> > > 
> > > Fixes: 014068dcb5b1 ("net: phy: genphy_loopback: add link speed configuration")
> > > Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
> > 
> > Is it going to be obvious to PHY-savvy folks that the val passed to
> > phy_read_poll_timeout() must be an int? Is it a very common pattern?
> > My outsider intuition is that since regs are 16b, u16 is reasonable,
> > and more people may make the same mistake.
> 
> It is common to get this wrong in general with PHY drivers. Dan
> regularly posts fixes like this soon after a PHY driver patch it
> merged. I really wish we could somehow get the compiler to warn when
> the result from phy_read() is stored into a unsigned type. It would
> save Dan a lot of work.

I don't see these as much as I used.  It's maybe once per month.  I'm
not sure why, maybe kbuild emails everyone before I see it?  GCC will
warn about this with -Wtype-limits.  Clang will also trigger a warning.

The Smatch check for this had a bug where it only warned about if
(x < 0) { if x was u32 or larger.  I fixed that bug which is why I was
looking at this code.  I will push the fix for that in a couple days.

regards,
dan carpenter


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH net] net: phy: fix a signedness bug in genphy_loopback()
  2023-05-30 14:40     ` Dan Carpenter
@ 2023-05-30 17:06       ` Andrew Lunn
  0 siblings, 0 replies; 17+ messages in thread
From: Andrew Lunn @ 2023-05-30 17:06 UTC (permalink / raw)
  To: Dan Carpenter
  Cc: Jakub Kicinski, Russell King, Oleksij Rempel, Heiner Kallweit,
	David S. Miller, Eric Dumazet, Paolo Abeni, netdev,
	kernel-janitors

> I don't see these as much as I used.  It's maybe once per month.  I'm
> not sure why, maybe kbuild emails everyone before I see it?  GCC will
> warn about this with -Wtype-limits.  Clang will also trigger a warning.

That is interesting. Maybe we should enable that in driver/net/net and
driver/net/pcs.

	Andrew

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH net] net: phy: fix a signedness bug in genphy_loopback()
  2023-05-30 12:39   ` Andrew Lunn
  2023-05-30 14:40     ` Dan Carpenter
@ 2023-05-30 19:19     ` Jakub Kicinski
  2023-05-30 19:39       ` Russell King (Oracle)
  1 sibling, 1 reply; 17+ messages in thread
From: Jakub Kicinski @ 2023-05-30 19:19 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Russell King, Dan Carpenter, Oleksij Rempel, Heiner Kallweit,
	David S. Miller, Eric Dumazet, Paolo Abeni, netdev,
	kernel-janitors

On Tue, 30 May 2023 14:39:53 +0200 Andrew Lunn wrote:
> > Therefore we should try to fix phy_read_poll_timeout() instead to
> > use a local variable like it does for __ret.  
> 
> The problem with that is val is supposed to be available to the
> caller. I don't know if it is every actually used, but if it is, using
> an internal signed variable and then throwing away the sign bit on
> return is going to result in similar bugs.

This is what I meant FWIW:

diff --git a/include/linux/phy.h b/include/linux/phy.h
index 7addde5d14c0..829bd57b8794 100644
--- a/include/linux/phy.h
+++ b/include/linux/phy.h
@@ -1206,10 +1206,13 @@ static inline int phy_read(struct phy_device *phydev, u32 regnum)
 #define phy_read_poll_timeout(phydev, regnum, val, cond, sleep_us, \
 				timeout_us, sleep_before_read) \
 ({ \
-	int __ret = read_poll_timeout(phy_read, val, val < 0 || (cond), \
+	int __ret, __val;						\
+									\
+	__ret = read_poll_timeout(phy_read, __val, __val < 0 || (cond),	\
 		sleep_us, timeout_us, sleep_before_read, phydev, regnum); \
-	if (val < 0) \
-		__ret = val; \
+	val = __val;
+	if (__val < 0) \
+		__ret = __val; \
 	if (__ret) \
 		phydev_err(phydev, "%s failed: %d\n", __func__, __ret); \
 	__ret; \


I tried enabling -Wtype-limits but it's _very_ noisy :(

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH net] net: phy: fix a signedness bug in genphy_loopback()
  2023-05-30 19:19     ` Jakub Kicinski
@ 2023-05-30 19:39       ` Russell King (Oracle)
  2023-05-30 20:04         ` Andrew Lunn
  0 siblings, 1 reply; 17+ messages in thread
From: Russell King (Oracle) @ 2023-05-30 19:39 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Andrew Lunn, Dan Carpenter, Oleksij Rempel, Heiner Kallweit,
	David S. Miller, Eric Dumazet, Paolo Abeni, netdev,
	kernel-janitors

On Tue, May 30, 2023 at 12:19:10PM -0700, Jakub Kicinski wrote:
> On Tue, 30 May 2023 14:39:53 +0200 Andrew Lunn wrote:
> > > Therefore we should try to fix phy_read_poll_timeout() instead to
> > > use a local variable like it does for __ret.  
> > 
> > The problem with that is val is supposed to be available to the
> > caller. I don't know if it is every actually used, but if it is, using
> > an internal signed variable and then throwing away the sign bit on
> > return is going to result in similar bugs.
> 
> This is what I meant FWIW:
> 
> diff --git a/include/linux/phy.h b/include/linux/phy.h
> index 7addde5d14c0..829bd57b8794 100644
> --- a/include/linux/phy.h
> +++ b/include/linux/phy.h
> @@ -1206,10 +1206,13 @@ static inline int phy_read(struct phy_device *phydev, u32 regnum)
>  #define phy_read_poll_timeout(phydev, regnum, val, cond, sleep_us, \
>  				timeout_us, sleep_before_read) \
>  ({ \
> -	int __ret = read_poll_timeout(phy_read, val, val < 0 || (cond), \
> +	int __ret, __val;						\
> +									\
> +	__ret = read_poll_timeout(phy_read, __val, __val < 0 || (cond),	\
>  		sleep_us, timeout_us, sleep_before_read, phydev, regnum); \
> -	if (val < 0) \
> -		__ret = val; \
> +	val = __val;
> +	if (__val < 0) \
> +		__ret = __val; \
>  	if (__ret) \
>  		phydev_err(phydev, "%s failed: %d\n", __func__, __ret); \
>  	__ret; \
> 
> 
> I tried enabling -Wtype-limits but it's _very_ noisy :(

Yes, looks good, that's what I thought you were meaning, and I totally
agree with it. Thanks!

Whatever we decide for this will also need to be applied to
phy_read_mmd_poll_timeout() as well.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH net] net: phy: fix a signedness bug in genphy_loopback()
  2023-05-30 19:39       ` Russell King (Oracle)
@ 2023-05-30 20:04         ` Andrew Lunn
  2023-05-30 21:09           ` Russell King (Oracle)
  0 siblings, 1 reply; 17+ messages in thread
From: Andrew Lunn @ 2023-05-30 20:04 UTC (permalink / raw)
  To: Russell King (Oracle)
  Cc: Jakub Kicinski, Dan Carpenter, Oleksij Rempel, Heiner Kallweit,
	David S. Miller, Eric Dumazet, Paolo Abeni, netdev,
	kernel-janitors

> > This is what I meant FWIW:
> > 
> > diff --git a/include/linux/phy.h b/include/linux/phy.h
> > index 7addde5d14c0..829bd57b8794 100644
> > --- a/include/linux/phy.h
> > +++ b/include/linux/phy.h
> > @@ -1206,10 +1206,13 @@ static inline int phy_read(struct phy_device *phydev, u32 regnum)
> >  #define phy_read_poll_timeout(phydev, regnum, val, cond, sleep_us, \
> >  				timeout_us, sleep_before_read) \
> >  ({ \
> > -	int __ret = read_poll_timeout(phy_read, val, val < 0 || (cond), \
> > +	int __ret, __val;						\
> > +									\
> > +	__ret = read_poll_timeout(phy_read, __val, __val < 0 || (cond),	\
> >  		sleep_us, timeout_us, sleep_before_read, phydev, regnum); \
> > -	if (val < 0) \
> > -		__ret = val; \
> > +	val = __val;

This results in the sign being discarded if val is unsigned. Yes, the
test is remove, which i assume will stop Smatch complaining, but it is
still broken.

> > +	if (__val < 0) \
> > +		__ret = __val; \
> >  	if (__ret) \
> >  		phydev_err(phydev, "%s failed: %d\n", __func__, __ret); \
> >  	__ret; \

> > I tried enabling -Wtype-limits but it's _very_ noisy :(

This is a no go until GENMASK gets fixed :-(

However, if that is fixed, we might be able to turn it on. But it will
then trigger with this fix.

So i still think a BUILD_BUG_ON is a better fix. Help developers get
the code correct, rather than work around them getting it wrong.

I also wonder if this needs to go down a level. Dan, how often do you
see similar problems with the lower level read_poll_timeout() and
friends?

    Andrew

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH net] net: phy: fix a signedness bug in genphy_loopback()
  2023-05-30 20:04         ` Andrew Lunn
@ 2023-05-30 21:09           ` Russell King (Oracle)
  2023-05-30 21:28             ` Russell King (Oracle)
  0 siblings, 1 reply; 17+ messages in thread
From: Russell King (Oracle) @ 2023-05-30 21:09 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Jakub Kicinski, Dan Carpenter, Oleksij Rempel, Heiner Kallweit,
	David S. Miller, Eric Dumazet, Paolo Abeni, netdev,
	kernel-janitors

On Tue, May 30, 2023 at 10:04:52PM +0200, Andrew Lunn wrote:
> > > This is what I meant FWIW:
> > > 
> > > diff --git a/include/linux/phy.h b/include/linux/phy.h
> > > index 7addde5d14c0..829bd57b8794 100644
> > > --- a/include/linux/phy.h
> > > +++ b/include/linux/phy.h
> > > @@ -1206,10 +1206,13 @@ static inline int phy_read(struct phy_device *phydev, u32 regnum)
> > >  #define phy_read_poll_timeout(phydev, regnum, val, cond, sleep_us, \
> > >  				timeout_us, sleep_before_read) \
> > >  ({ \
> > > -	int __ret = read_poll_timeout(phy_read, val, val < 0 || (cond), \
> > > +	int __ret, __val;						\
> > > +									\
> > > +	__ret = read_poll_timeout(phy_read, __val, __val < 0 || (cond),	\
> > >  		sleep_us, timeout_us, sleep_before_read, phydev, regnum); \
> > > -	if (val < 0) \
> > > -		__ret = val; \
> > > +	val = __val;
> 
> This results in the sign being discarded if val is unsigned. Yes, the
> test is remove, which i assume will stop Smatch complaining, but it is
> still broken.

I was going to ask you to explain that, but having thought about
this more, there's much bigger problems with the proposal.

First, if I'm understanding you correctly, your point doesn't seem
relevant, because if val is unsigned, we have an implicit cast from a
signed int to an unsigned int _at_ _some_ _point_. With the existing
code, that implicit cast is buried inside read_poll_timeout(), here
to be exact:

	(val) = op(args);

because "op" will be one of the phy_read*() functions that returns an
"int", but "val" is unsigned - which means there's an implicit cast
here. Jakub's patch moves that cast after read_poll_timeout().

The elephant in the room has nothing to do with this, but everything
to do with "cond". "cond" is an expression to be evaluated inside the
loop, which must have access to the value read from the phy_read*()
function, and that value is referenced via whatever variable was
provided via "val". So changing "val" immediately breaks "cond".


Having thought about this, the best I can come up with is this, which
I think gives us everything we want without needing BUILD_BUG_ONs:

#define phy_read_poll_timeout(phydev, regnum, val, cond, sleep_us, \
                                timeout_us, sleep_before_read) \
({ \
        int __ret, __val;
	__ret = read_poll_timeout(__val = phy_read, val, __val < 0 || (cond), \
                sleep_us, timeout_us, sleep_before_read, phydev, regnum); \
        if (__val < 0) \
                __ret = __val; \
        if (__ret) \
                phydev_err(phydev, "%s failed: %d\n", __func__, __ret); \
        __ret; \
})

This looks rather horrid, but what it essentially does is:

                (val) = op(args); \
                if (cond) \
                        break; \

expands to:

		(val) = __val = phy_read(args);
		if (__val < 0 || (cond))
			break;

As phy_read() returns an int, there is no cast or loss assigning it
to __val, since that is also an int. The conversion from int to
something else happens at the same point it always has.

Hmm?

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH net] net: phy: fix a signedness bug in genphy_loopback()
  2023-05-30 21:09           ` Russell King (Oracle)
@ 2023-05-30 21:28             ` Russell King (Oracle)
  0 siblings, 0 replies; 17+ messages in thread
From: Russell King (Oracle) @ 2023-05-30 21:28 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Jakub Kicinski, Dan Carpenter, Oleksij Rempel, Heiner Kallweit,
	David S. Miller, Eric Dumazet, Paolo Abeni, netdev,
	kernel-janitors

On Tue, May 30, 2023 at 10:09:24PM +0100, Russell King (Oracle) wrote:
> Having thought about this, the best I can come up with is this, which
> I think gives us everything we want without needing BUILD_BUG_ONs:
> 
> #define phy_read_poll_timeout(phydev, regnum, val, cond, sleep_us, \
>                                 timeout_us, sleep_before_read) \
> ({ \
>         int __ret, __val;
> 	__ret = read_poll_timeout(__val = phy_read, val, __val < 0 || (cond), \
>                 sleep_us, timeout_us, sleep_before_read, phydev, regnum); \
>         if (__val < 0) \
>                 __ret = __val; \
>         if (__ret) \
>                 phydev_err(phydev, "%s failed: %d\n", __func__, __ret); \
>         __ret; \
> })
> 
> This looks rather horrid, but what it essentially does is:
> 
>                 (val) = op(args); \
>                 if (cond) \
>                         break; \
> 
> expands to:
> 
> 		(val) = __val = phy_read(args);
> 		if (__val < 0 || (cond))
> 			break;
> 
> As phy_read() returns an int, there is no cast or loss assigning it
> to __val, since that is also an int. The conversion from int to
> something else happens at the same point it always has.

... and actually produces nicer code on 32-bit ARM:

Old (with the u16 val changed to an int val):

 2f8:   ebfffffe        bl      0 <mdiobus_read>
 2fc:   e7e03150        ubfx    r3, r0, #2, #1		extract bit 2 into r3
 300:   e1a04000        mov     r4, r0			save return value
 304:   e2002004        and     r2, r0, #4		extract bit 2 again
 308:   e1933fa0        orrs    r3, r3, r0, lsr #31	grab sign bit
 30c:   1a00000d        bne     348 <genphy_loopback+0xd8>
		breaks out of loop if r3 is nonzero
	... rest of loop ...
...
 348:   e3520000        cmp     r2, #0
 34c:   0a00000b        beq     380 <genphy_loopback+0x110>
		basically tests whether bit 2 was zero, and jumps if it
		was. Basically (cond) is false.

 350:   e3540000        cmp     r4, #0
 354:   a3a04000        movge   r4, #0
 358:   ba00000a        blt     388 <genphy_loopback+0x118>
		tests whether a phy_read returned an error and jumps
		if it did. r4 is basically __ret.
...

 380:   e3540000        cmp     r4, #0
 384:   a3e0406d        mvnge   r4, #109        ; 0x6d
		if r4 (__ret) was >= 0, sets an error code (-ETIMEDOUT).
 388:   e1a03004        mov     r3, r4
 ... dev_err() bit.

The new generated code is:

 2f8:   ebfffffe        bl      0 <mdiobus_read>
                        2f8: R_ARM_CALL mdiobus_read
 2fc:   e2504000        subs    r4, r0, #0		__val assignment
 300:   ba000014        blt     358 <genphy_loopback+0xe8>
		if <0, go direct to dev_err code
 304:   e3140004        tst     r4, #4			cond test within loop
 308:   1a00000d        bne     344 <genphy_loopback+0xd4>
	... rest of loop ...

 344:   e6ff4074        uxth    r4, r4			cast to 16-bit uint
 348:   e3140004        tst     r4, #4			test
 34c:   13a04000        movne   r4, #0			__ret is zero if bit set
 350:   1a000007        bne     374 <genphy_loopback+0x104> basically returns
 354:   e3e0406d        mvn     r4, #109        ; 0x6d
	... otherwise sets __ret to -ETIMEDOUT
	... dev_err() code

Is there a reason why it was written (cond) || val < 0 rather than
val < 0 || (cond) ? Note that the order of these tests makes no
difference in this situation, but I'm wondering whether it was
intentional?

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2023-05-30 21:28 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-05-26 11:45 [PATCH net] net: phy: fix a signedness bug in genphy_loopback() Dan Carpenter
2023-05-26 11:50 ` Russell King (Oracle)
2023-05-30  4:58 ` Jakub Kicinski
2023-05-30  9:01   ` Russell King (Oracle)
2023-05-30  9:06   ` Paolo Abeni
2023-05-30  9:23     ` Dan Carpenter
2023-05-30  9:40       ` Paolo Abeni
2023-05-30  9:49       ` Russell King (Oracle)
2023-05-30 10:06         ` Russell King (Oracle)
2023-05-30 12:39   ` Andrew Lunn
2023-05-30 14:40     ` Dan Carpenter
2023-05-30 17:06       ` Andrew Lunn
2023-05-30 19:19     ` Jakub Kicinski
2023-05-30 19:39       ` Russell King (Oracle)
2023-05-30 20:04         ` Andrew Lunn
2023-05-30 21:09           ` Russell King (Oracle)
2023-05-30 21:28             ` Russell King (Oracle)

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.