linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH net-next v1] net: phy: nxp-tja11xx: log critical health state
@ 2021-08-10 12:56 Oleksij Rempel
  2021-08-10 15:05 ` Andrew Lunn
  0 siblings, 1 reply; 4+ messages in thread
From: Oleksij Rempel @ 2021-08-10 12:56 UTC (permalink / raw)
  To: Andrew Lunn, Heiner Kallweit, David S. Miller, Jakub Kicinski,
	Jean Delvare, Guenter Roeck
  Cc: Oleksij Rempel, Pengutronix Kernel Team, linux-kernel, netdev,
	Marek Vasut, David Jander, linux-hwmon

TJA1102 provides interrupt notification for the critical health states
like overtemperature and undervoltage.

The overtemperature bit is set if package temperature is beyond 155C°.
This functionality was tested by heating the package up to 200C°

The undervoltage bit is set if supply voltage drops beyond some critical
threshold. Currently not tested.

In a typical use case, both of this events should be logged and stored
(or send to some remote system) for further investigations.

To make testing and troubleshooting more easier, other bits are exported
over the ethtool phy stats interface:
- level of overtemperature. There is no access to thermal sensor.
- power supply line where undervoltage was detected

Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
---
 drivers/net/phy/nxp-tja11xx.c | 19 +++++++++++++++++--
 1 file changed, 17 insertions(+), 2 deletions(-)

diff --git a/drivers/net/phy/nxp-tja11xx.c b/drivers/net/phy/nxp-tja11xx.c
index afd7afa1f498..4c37b427a53b 100644
--- a/drivers/net/phy/nxp-tja11xx.c
+++ b/drivers/net/phy/nxp-tja11xx.c
@@ -47,12 +47,14 @@
 #define MII_INTSRC_LINK_FAIL		BIT(10)
 #define MII_INTSRC_LINK_UP		BIT(9)
 #define MII_INTSRC_MASK			(MII_INTSRC_LINK_FAIL | MII_INTSRC_LINK_UP)
-#define MII_INTSRC_TEMP_ERR		BIT(1)
 #define MII_INTSRC_UV_ERR		BIT(3)
+#define MII_INTSRC_TEMP_ERR		BIT(1)
 
 #define MII_INTEN			22
 #define MII_INTEN_LINK_FAIL		BIT(10)
 #define MII_INTEN_LINK_UP		BIT(9)
+#define MII_INTEN_UV_ERR		BIT(3)
+#define MII_INTEN_TEMP_ERR		BIT(1)
 
 #define MII_COMMSTAT			23
 #define MII_COMMSTAT_LINK_UP		BIT(15)
@@ -89,6 +91,12 @@ static struct tja11xx_phy_stats tja11xx_hw_stats[] = {
 	{ "phy_polarity_detect", 25, 6, BIT(6) },
 	{ "phy_open_detect", 25, 7, BIT(7) },
 	{ "phy_short_detect", 25, 8, BIT(8) },
+	{ "phy_temp_warn (temp > 155C°)", 25, 9, BIT(9) },
+	{ "phy_temp_high (temp > 180C°)", 25, 10, BIT(10) },
+	{ "phy_uv_vddio", 25, 11, BIT(11) },
+	{ "phy_uv_vddd_1v8", 25, 13, BIT(13) },
+	{ "phy_uv_vdda_3v3", 25, 14, BIT(14) },
+	{ "phy_uv_vddd_3v3", 25, 15, BIT(15) },
 	{ "phy_rem_rcvr_count", 26, 0, GENMASK(7, 0) },
 	{ "phy_loc_rcvr_count", 26, 8, GENMASK(15, 8) },
 };
@@ -607,7 +615,8 @@ static int tja11xx_config_intr(struct phy_device *phydev)
 		if (err)
 			return err;
 
-		value = MII_INTEN_LINK_FAIL | MII_INTEN_LINK_UP;
+		value = MII_INTEN_LINK_FAIL | MII_INTEN_LINK_UP |
+			MII_INTEN_UV_ERR | MII_INTEN_TEMP_ERR;
 		err = phy_write(phydev, MII_INTEN, value);
 	} else {
 		err = phy_write(phydev, MII_INTEN, value);
@@ -622,6 +631,7 @@ static int tja11xx_config_intr(struct phy_device *phydev)
 
 static irqreturn_t tja11xx_handle_interrupt(struct phy_device *phydev)
 {
+	struct device *dev = &phydev->mdio.dev;
 	int irq_status;
 
 	irq_status = phy_read(phydev, MII_INTSRC);
@@ -630,6 +640,11 @@ static irqreturn_t tja11xx_handle_interrupt(struct phy_device *phydev)
 		return IRQ_NONE;
 	}
 
+	if (irq_status & MII_INTSRC_TEMP_ERR)
+		dev_err(dev, "Overtemperature error detected (temp > 155C°).\n");
+	if (irq_status & MII_INTSRC_UV_ERR)
+		dev_err(dev, "Undervoltage error detected.\n");
+
 	if (!(irq_status & MII_INTSRC_MASK))
 		return IRQ_NONE;
 
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH net-next v1] net: phy: nxp-tja11xx: log critical health state
  2021-08-10 12:56 [PATCH net-next v1] net: phy: nxp-tja11xx: log critical health state Oleksij Rempel
@ 2021-08-10 15:05 ` Andrew Lunn
  2021-08-10 18:18   ` Guenter Roeck
  0 siblings, 1 reply; 4+ messages in thread
From: Andrew Lunn @ 2021-08-10 15:05 UTC (permalink / raw)
  To: Oleksij Rempel
  Cc: Heiner Kallweit, David S. Miller, Jakub Kicinski, Jean Delvare,
	Guenter Roeck, Pengutronix Kernel Team, linux-kernel, netdev,
	Marek Vasut, David Jander, linux-hwmon

Hi Oleksij

> @@ -89,6 +91,12 @@ static struct tja11xx_phy_stats tja11xx_hw_stats[] = {
>  	{ "phy_polarity_detect", 25, 6, BIT(6) },
>  	{ "phy_open_detect", 25, 7, BIT(7) },
>  	{ "phy_short_detect", 25, 8, BIT(8) },
> +	{ "phy_temp_warn (temp > 155C°)", 25, 9, BIT(9) },
> +	{ "phy_temp_high (temp > 180C°)", 25, 10, BIT(10) },
> +	{ "phy_uv_vddio", 25, 11, BIT(11) },
> +	{ "phy_uv_vddd_1v8", 25, 13, BIT(13) },
> +	{ "phy_uv_vdda_3v3", 25, 14, BIT(14) },
> +	{ "phy_uv_vddd_3v3", 25, 15, BIT(15) },
>  	{ "phy_rem_rcvr_count", 26, 0, GENMASK(7, 0) },
>  	{ "phy_loc_rcvr_count", 26, 8, GENMASK(15, 8) },

I'm not so happy abusing the statistic counters like this. Especially
when we have a better API for temperature and voltage: hwmon.

phy_temp_warn maps to hwmon_temp_max_alarm. phy_temp_high maps to
either hwmon_temp_crit_alarm or hwmon_temp_emergency_alarm.

The under voltage maps to hwmon_in_lcrit_alarm.

> @@ -630,6 +640,11 @@ static irqreturn_t tja11xx_handle_interrupt(struct phy_device *phydev)
>  		return IRQ_NONE;
>  	}
>  
> +	if (irq_status & MII_INTSRC_TEMP_ERR)
> +		dev_err(dev, "Overtemperature error detected (temp > 155C°).\n");
> +	if (irq_status & MII_INTSRC_UV_ERR)
> +		dev_err(dev, "Undervoltage error detected.\n");
> +

These are not actual errors, in the linux sense. So dev_warn() or
maybe dev_info().

      Andrew

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH net-next v1] net: phy: nxp-tja11xx: log critical health state
  2021-08-10 15:05 ` Andrew Lunn
@ 2021-08-10 18:18   ` Guenter Roeck
  2021-08-10 18:42     ` Andrew Lunn
  0 siblings, 1 reply; 4+ messages in thread
From: Guenter Roeck @ 2021-08-10 18:18 UTC (permalink / raw)
  To: Andrew Lunn, Oleksij Rempel
  Cc: Heiner Kallweit, David S. Miller, Jakub Kicinski, Jean Delvare,
	Pengutronix Kernel Team, linux-kernel, netdev, Marek Vasut,
	David Jander, linux-hwmon

On 8/10/21 8:05 AM, Andrew Lunn wrote:
> Hi Oleksij
> 
>> @@ -89,6 +91,12 @@ static struct tja11xx_phy_stats tja11xx_hw_stats[] = {
>>   	{ "phy_polarity_detect", 25, 6, BIT(6) },
>>   	{ "phy_open_detect", 25, 7, BIT(7) },
>>   	{ "phy_short_detect", 25, 8, BIT(8) },
>> +	{ "phy_temp_warn (temp > 155C°)", 25, 9, BIT(9) },
>> +	{ "phy_temp_high (temp > 180C°)", 25, 10, BIT(10) },
>> +	{ "phy_uv_vddio", 25, 11, BIT(11) },
>> +	{ "phy_uv_vddd_1v8", 25, 13, BIT(13) },
>> +	{ "phy_uv_vdda_3v3", 25, 14, BIT(14) },
>> +	{ "phy_uv_vddd_3v3", 25, 15, BIT(15) },
>>   	{ "phy_rem_rcvr_count", 26, 0, GENMASK(7, 0) },
>>   	{ "phy_loc_rcvr_count", 26, 8, GENMASK(15, 8) },
> 
> I'm not so happy abusing the statistic counters like this. Especially
> when we have a better API for temperature and voltage: hwmon.
> 
> phy_temp_warn maps to hwmon_temp_max_alarm. phy_temp_high maps to
> either hwmon_temp_crit_alarm or hwmon_temp_emergency_alarm.
> 
> The under voltage maps to hwmon_in_lcrit_alarm.
> 

FWIW, the statistics counters in this driver are already abused
(phy_polarity_detect, phy_open_detect, phy_short_detect), so
I am not sure if adding more abuse makes a difference (and/or
if such abuse is common for phy drivers in general).

Guenter

>> @@ -630,6 +640,11 @@ static irqreturn_t tja11xx_handle_interrupt(struct phy_device *phydev)
>>   		return IRQ_NONE;
>>   	}
>>   
>> +	if (irq_status & MII_INTSRC_TEMP_ERR)
>> +		dev_err(dev, "Overtemperature error detected (temp > 155C°).\n");
>> +	if (irq_status & MII_INTSRC_UV_ERR)
>> +		dev_err(dev, "Undervoltage error detected.\n");
>> +
> 
> These are not actual errors, in the linux sense. So dev_warn() or
> maybe dev_info().
> 
>        Andrew
> 


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH net-next v1] net: phy: nxp-tja11xx: log critical health state
  2021-08-10 18:18   ` Guenter Roeck
@ 2021-08-10 18:42     ` Andrew Lunn
  0 siblings, 0 replies; 4+ messages in thread
From: Andrew Lunn @ 2021-08-10 18:42 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: Oleksij Rempel, Heiner Kallweit, David S. Miller, Jakub Kicinski,
	Jean Delvare, Pengutronix Kernel Team, linux-kernel, netdev,
	Marek Vasut, David Jander, linux-hwmon

> > I'm not so happy abusing the statistic counters like this. Especially
> > when we have a better API for temperature and voltage: hwmon.
> > 
> > phy_temp_warn maps to hwmon_temp_max_alarm. phy_temp_high maps to
> > either hwmon_temp_crit_alarm or hwmon_temp_emergency_alarm.
> > 
> > The under voltage maps to hwmon_in_lcrit_alarm.
> > 
> 
> FWIW, the statistics counters in this driver are already abused
> (phy_polarity_detect, phy_open_detect, phy_short_detect), so
> I am not sure if adding more abuse makes a difference (and/or
> if such abuse is common for phy drivers in general).

Hi Guenter

Abuse is not common in general. I think this is the only driver
abusing stats to return flags.  At the time those where added, we did
not have phy cable test support. Now we do, i would also suggest that
the driver makes use of that infrastructure to issue a cable test
report. These 'stats' need to stay, since they are ABI, but we should
not add more.

That is also why i said "Especially when we have a better API".

     Andrew


^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2021-08-10 18:43 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-08-10 12:56 [PATCH net-next v1] net: phy: nxp-tja11xx: log critical health state Oleksij Rempel
2021-08-10 15:05 ` Andrew Lunn
2021-08-10 18:18   ` Guenter Roeck
2021-08-10 18:42     ` Andrew Lunn

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).