* Driver SFC: Possible bug in LM87 temperature XFP detection code @ 2009-04-28 9:36 Jesper Dangaard Brouer 2009-04-28 13:36 ` Ben Hutchings 0 siblings, 1 reply; 11+ messages in thread From: Jesper Dangaard Brouer @ 2009-04-28 9:36 UTC (permalink / raw) To: Ben Hutchings; +Cc: netdev Hi Ben, I have borrowed some SMC10GPCIe-XFP NICs directly from SMC for evaluation. The NICs uses a Solarflare Chip and the SFC driver. If unpluging the fiber cable I start getting these errors: +-------- sfc 0000:12:00.0: ERR: eth88 LM87 detected a hardware failure (status 30:00) INTERNAL EXTERNAL sfc 0000:12:00.0: ERR: eth88 Board sensor reported fault; shutting down PHY sfc 0000:12:00.0: ERR: eth88 LM87 detected a hardware failure (status 30:00) INTERNAL EXTERNAL sfc 0000:12:00.0: ERR: eth88 Board sensor reported fault; shutting down PHY sfc 0000:12:00.0: ERR: eth88 LM87 detected a hardware failure (status 10:00) INTERNAL sfc 0000:12:00.0: ERR: eth88 Board sensor reported fault; shutting down PHY +--------- Reading through the driver code (drivers/net/sfc/boards.c), this problem is related to temperature. The real issues is that I cannot get the device up and running again after lowering the temperature. Only if I unload and load the sfc driver, then I can get the device running again. I'm thinking perhaps there is missing a PHY power up again, after the temperature alarm has gone? I'm using kernel 2.6.30-rc1-net-next-00664-gd93fe1a. To Ben; do you have anything you want me to try. Do you want to fix this you self, or can you give me some code hints or patches to try out? I'm wondering what chip the SMC NIC is using? From lspci is says SFC4000, but does that corrospond to EFX_BOARD_SFE4001 or EFX_BOARD_SFE4002 ? (Additional tech info below signature) -- Med venlig hilsen / Best regards Jesper Brouer ComX Networks A/S Linux Network developer Cand. Scient Datalog / MSc. Author of http://adsl-optimizer.dk LinkedIn: http://www.linkedin.com/in/brouer * XFP modules used ------------------ I have tested with two different XFP modules: (1) Finisar FTLX1412D3BCL 10GBASE-LR/LW (2) Sumitomo SXP3101LX * lspci info ------------ lspci -xvs 12:00.0 +--------- 12:00.0 Ethernet controller: Solarflare Communications SFC4000 rev B [Solarstorm] (rev 02) Subsystem: Standard Microsystems Corp [SMC] SMC10GPCIe-XFP (A1) [TigerCard] Flags: bus master, fast devsel, latency 0, IRQ 19 I/O ports at e800 [size=256] Memory at f8000000 (64-bit, non-prefetchable) [size=64M] Expansion ROM at feaf0000 [disabled] [size=64K] Capabilities: [40] Power Management version 3 Capabilities: [50] Message Signalled Interrupts: Mask- 64bit+ Queue=0/3 Enable- Capabilities: [60] Express Endpoint, MSI 00 Capabilities: [90] MSI-X: Enable+ Mask- TabSize=64 Capabilities: [b0] Vital Product Data <?> Capabilities: [100] Advanced Error Reporting <?> Capabilities: [140] Device Serial Number 00-0f-53-ff-ff-2c-b2-1a Kernel driver in use: sfc Kernel modules: sfc 00: 24 19 10 07 07 05 10 00 02 00 00 02 10 00 00 00 10: 01 e8 00 00 00 00 00 00 04 00 00 f8 00 00 00 00 20: 00 00 00 00 00 00 00 00 00 00 00 00 b8 10 01 02 30: 00 00 af fe 40 00 00 00 00 00 00 00 0a 01 00 00 +------- * Relevant kernel CONFIG_xxx ---------------------------- CONFIG_SFC_MTD not set CONFIG_SENSORS_LM87=m CONFIG_SFC=m * ethtool status ---------------- ethtool eth88 +------- Settings for eth88: Supported ports: [ FIBRE ] Supported link modes: Supports auto-negotiation: No Advertised link modes: Not reported Advertised auto-negotiation: No Speed: 10000Mb/s Duplex: Full Port: FIBRE PHYAD: 2 Transceiver: internal Auto-negotiation: off Link detected: no +------- * SMC links ----------- SMC10GPCIe-XFP TigerCard™ 10G http://www.smc.com/index.cfm?event=viewProduct&cid=9&scid=51&pid=1648 ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Driver SFC: Possible bug in LM87 temperature XFP detection code 2009-04-28 9:36 Driver SFC: Possible bug in LM87 temperature XFP detection code Jesper Dangaard Brouer @ 2009-04-28 13:36 ` Ben Hutchings 2009-04-28 14:44 ` Jesper Dangaard Brouer 2009-04-28 17:04 ` Driver SFC: Possible bug in LM87 temperature XFP detection code Ben Hutchings 0 siblings, 2 replies; 11+ messages in thread From: Ben Hutchings @ 2009-04-28 13:36 UTC (permalink / raw) To: Jesper Dangaard Brouer; +Cc: netdev On Tue, 2009-04-28 at 11:36 +0200, Jesper Dangaard Brouer wrote: > Hi Ben, > > I have borrowed some SMC10GPCIe-XFP NICs directly from SMC for > evaluation. The NICs uses a Solarflare Chip and the SFC driver. > > If unpluging the fiber cable I start getting these errors: > > +-------- > sfc 0000:12:00.0: ERR: eth88 LM87 detected a hardware failure (status 30:00) INTERNAL EXTERNAL > sfc 0000:12:00.0: ERR: eth88 Board sensor reported fault; shutting down PHY > > sfc 0000:12:00.0: ERR: eth88 LM87 detected a hardware failure (status 30:00) INTERNAL EXTERNAL > sfc 0000:12:00.0: ERR: eth88 Board sensor reported fault; shutting down PHY > > sfc 0000:12:00.0: ERR: eth88 LM87 detected a hardware failure (status 10:00) INTERNAL > sfc 0000:12:00.0: ERR: eth88 Board sensor reported fault; shutting down PHY > +--------- > > Reading through the driver code (drivers/net/sfc/boards.c), this problem > is related to temperature. Right. And the sensors are not polled while the link is up, on the assumption that a temperature or voltage fault will cause the link to go down, and because bit-banged I2C will reduce throughput slightly. > The real issues is that I cannot get the device up and running again > after lowering the temperature. Only if I unload and load the sfc > driver, then I can get the device running again. > > I'm thinking perhaps there is missing a PHY power up again, after the > temperature alarm has gone? We considered it most important to shut down the board to prevent or mitigate damage, and did not implement any recovery beyond that. > I'm using kernel 2.6.30-rc1-net-next-00664-gd93fe1a. > > > To Ben; do you have anything you want me to try. Do you want to fix this > you self, or can you give me some code hints or patches to try out? I don't intend to fix this myself. If you want to try implementing this then you should start by looking at efx_monitor() in efx.c. However, I think your time might be better spent in fixing the air flow in the computer before the board is permanently damaged. > I'm wondering what chip the SMC NIC is using? From lspci is says > SFC4000, but does that corrospond to EFX_BOARD_SFE4001 or > EFX_BOARD_SFE4002 ? The SMC10GPCIe-XFP is based on SFE4002. Ben. -- Ben Hutchings, Senior Software Engineer, Solarflare Communications Not speaking for my employer; that's the marketing department's job. They asked us to note that Solarflare product names are trademarked. ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Driver SFC: Possible bug in LM87 temperature XFP detection code 2009-04-28 13:36 ` Ben Hutchings @ 2009-04-28 14:44 ` Jesper Dangaard Brouer 2009-04-28 14:48 ` [PATCH] sfc: Make temperature warnings/alarms more explicit Jesper Dangaard Brouer 2009-04-28 17:04 ` Driver SFC: Possible bug in LM87 temperature XFP detection code Ben Hutchings 1 sibling, 1 reply; 11+ messages in thread From: Jesper Dangaard Brouer @ 2009-04-28 14:44 UTC (permalink / raw) To: Ben Hutchings; +Cc: netdev On Tue, 2009-04-28 at 14:36 +0100, Ben Hutchings wrote: > On Tue, 2009-04-28 at 11:36 +0200, Jesper Dangaard Brouer wrote: > > Hi Ben, > > > > I have borrowed some SMC10GPCIe-XFP NICs directly from SMC for > > evaluation. The NICs uses a Solarflare Chip and the SFC driver. > > > > If unpluging the fiber cable I start getting these errors: > > > > +-------- > > sfc 0000:12:00.0: ERR: eth88 LM87 detected a hardware failure (status 30:00) INTERNAL EXTERNAL > > sfc 0000:12:00.0: ERR: eth88 Board sensor reported fault; shutting down PHY > > > > sfc 0000:12:00.0: ERR: eth88 LM87 detected a hardware failure (status 30:00) INTERNAL EXTERNAL > > sfc 0000:12:00.0: ERR: eth88 Board sensor reported fault; shutting down PHY > > > > sfc 0000:12:00.0: ERR: eth88 LM87 detected a hardware failure (status 10:00) INTERNAL > > sfc 0000:12:00.0: ERR: eth88 Board sensor reported fault; shutting down PHY > > +--------- > > > > Reading through the driver code (drivers/net/sfc/boards.c), this problem > > is related to temperature. > > Right. And the sensors are not polled while the link is up, on the > assumption that a temperature or voltage fault will cause the link to go > down, and because bit-banged I2C will reduce throughput slightly. In my situation the link does not go down due the temperature issue. > > The real issues is that I cannot get the device up and running again > > after lowering the temperature. Only if I unload and load the sfc > > driver, then I can get the device running again. > > > > I'm thinking perhaps there is missing a PHY power up again, after the > > temperature alarm has gone? > > We considered it most important to shut down the board to prevent or > mitigate damage, and did not implement any recovery beyond that. Im my case putting the PHY in PHY_MODE_LOW_POWER, does not help lowering the temperature. The errors are continous, until I apply "manual" airflow ;-) > > I'm using kernel 2.6.30-rc1-net-next-00664-gd93fe1a. > > > > > > To Ben; do you have anything you want me to try. Do you want to fix this > > you self, or can you give me some code hints or patches to try out? > > I don't intend to fix this myself. If you want to try implementing this > then you should start by looking at efx_monitor() in efx.c. However, I > think your time might be better spent in fixing the air flow in the > computer before the board is permanently damaged. I see you point, I don't want to damage the board... not sure I want to fix it then... Although in a production environment, I think the driver should support exchanging a failed XFP without rebooting the server. Then I also think that we should make the error message a bit more explicit, in order to warn people before the board is permanently damaged. I'll post a patch proposal as reply to this message... -- Med venlig hilsen / Best regards Jesper Brouer ComX Networks A/S Linux Network developer Cand. Scient Datalog / MSc. Author of http://adsl-optimizer.dk LinkedIn: http://www.linkedin.com/in/brouer ^ permalink raw reply [flat|nested] 11+ messages in thread
* [PATCH] sfc: Make temperature warnings/alarms more explicit. 2009-04-28 14:44 ` Jesper Dangaard Brouer @ 2009-04-28 14:48 ` Jesper Dangaard Brouer 2009-04-30 0:50 ` David Miller 2009-04-30 1:25 ` Ben Hutchings 0 siblings, 2 replies; 11+ messages in thread From: Jesper Dangaard Brouer @ 2009-04-28 14:48 UTC (permalink / raw) To: Ben Hutchings; +Cc: netdev The sfc driver can detect different hardware failures via the LM87 system. One of the failures I have experienced is the temperature alarm, but the error message didn't reveal that this error was temperature related. I had to read the code to discover that. I think that the temperature error should be more explicit, in order to warn people before the board is permanently damaged. Signed-off-by: Jesper Dangaard Brouer <hawk@comx.dk> --- drivers/net/sfc/boards.c | 4 +++- 1 files changed, 3 insertions(+), 1 deletions(-) diff --git a/drivers/net/sfc/boards.c b/drivers/net/sfc/boards.c index 4a4c74c..b1822fe 100644 --- a/drivers/net/sfc/boards.c +++ b/drivers/net/sfc/boards.c @@ -121,8 +121,10 @@ static int efx_check_lm87(struct efx_nic *efx, unsigned mask) if (alarms1 || alarms2) { EFX_ERR(efx, "LM87 detected a hardware failure (status %02x:%02x)" - "%s%s\n", + "%s%s%s\n", alarms1, alarms2, + (alarms1 & (LM87_ALARM_TEMP_INT|LM87_ALARM_TEMP_EXT1)) + ? " high temperature" : "", (alarms1 & LM87_ALARM_TEMP_INT) ? " INTERNAL" : "", (alarms1 & LM87_ALARM_TEMP_EXT1) ? " EXTERNAL" : ""); return -ERANGE; ^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [PATCH] sfc: Make temperature warnings/alarms more explicit. 2009-04-28 14:48 ` [PATCH] sfc: Make temperature warnings/alarms more explicit Jesper Dangaard Brouer @ 2009-04-30 0:50 ` David Miller 2009-04-30 1:25 ` Ben Hutchings 1 sibling, 0 replies; 11+ messages in thread From: David Miller @ 2009-04-30 0:50 UTC (permalink / raw) To: jdb; +Cc: bhutchings, netdev From: Jesper Dangaard Brouer <jdb@comx.dk> Date: Tue, 28 Apr 2009 16:48:04 +0200 > > The sfc driver can detect different hardware failures via the > LM87 system. One of the failures I have experienced is the > temperature alarm, but the error message didn't reveal that this > error was temperature related. I had to read the code to > discover that. > > I think that the temperature error should be more explicit, in > order to warn people before the board is permanently damaged. > > Signed-off-by: Jesper Dangaard Brouer <hawk@comx.dk> Ben, ACK or something? ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] sfc: Make temperature warnings/alarms more explicit. 2009-04-28 14:48 ` [PATCH] sfc: Make temperature warnings/alarms more explicit Jesper Dangaard Brouer 2009-04-30 0:50 ` David Miller @ 2009-04-30 1:25 ` Ben Hutchings 2009-04-30 8:44 ` Jesper Dangaard Brouer 1 sibling, 1 reply; 11+ messages in thread From: Ben Hutchings @ 2009-04-30 1:25 UTC (permalink / raw) To: Jesper Dangaard Brouer; +Cc: David Miller, netdev On Tue, 2009-04-28 at 16:48 +0200, Jesper Dangaard Brouer wrote: > The sfc driver can detect different hardware failures via the > LM87 system. One of the failures I have experienced is the > temperature alarm, but the error message didn't reveal that this > error was temperature related. I had to read the code to > discover that. > > I think that the temperature error should be more explicit, in > order to warn people before the board is permanently damaged. You are right, but... > diff --git a/drivers/net/sfc/boards.c b/drivers/net/sfc/boards.c > index 4a4c74c..b1822fe 100644 > --- a/drivers/net/sfc/boards.c > +++ b/drivers/net/sfc/boards.c > @@ -121,8 +121,10 @@ static int efx_check_lm87(struct efx_nic *efx, unsigned mask) > if (alarms1 || alarms2) { > EFX_ERR(efx, > "LM87 detected a hardware failure (status %02x:%02x)" > - "%s%s\n", > + "%s%s%s\n", > alarms1, alarms2, > + (alarms1 & (LM87_ALARM_TEMP_INT|LM87_ALARM_TEMP_EXT1)) > + ? " high temperature" : "", > (alarms1 & LM87_ALARM_TEMP_INT) ? " INTERNAL" : "", > (alarms1 & LM87_ALARM_TEMP_EXT1) ? " EXTERNAL" : ""); > return -ERANGE; We could be more explicit still. How about: EFX_ERR(efx, "%s out of range (LM87 status %02x:%02x)\n", (alarms1 & LM87_ALARM_TEMP_INT) ? "Board temperature" : (alarms1 & LM87_ALARM_TEMP_EXT1) ? "Controller temperature : "Voltage", alarms1, alarms2); Ben. -- Ben Hutchings, Senior Software Engineer, Solarflare Communications Not speaking for my employer; that's the marketing department's job. They asked us to note that Solarflare product names are trademarked. ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] sfc: Make temperature warnings/alarms more explicit. 2009-04-30 1:25 ` Ben Hutchings @ 2009-04-30 8:44 ` Jesper Dangaard Brouer 0 siblings, 0 replies; 11+ messages in thread From: Jesper Dangaard Brouer @ 2009-04-30 8:44 UTC (permalink / raw) To: Ben Hutchings; +Cc: David Miller, netdev On Thu, 2009-04-30 at 02:25 +0100, Ben Hutchings wrote: > On Tue, 2009-04-28 at 16:48 +0200, Jesper Dangaard Brouer wrote: > > The sfc driver can detect different hardware failures via the > > LM87 system. One of the failures I have experienced is the > > temperature alarm, but the error message didn't reveal that this > > error was temperature related. I had to read the code to > > discover that. > > > > I think that the temperature error should be more explicit, in > > order to warn people before the board is permanently damaged. > > You are right, but... > > > diff --git a/drivers/net/sfc/boards.c b/drivers/net/sfc/boards.c > > index 4a4c74c..b1822fe 100644 > > --- a/drivers/net/sfc/boards.c > > +++ b/drivers/net/sfc/boards.c > > @@ -121,8 +121,10 @@ static int efx_check_lm87(struct efx_nic *efx, unsigned mask) > > if (alarms1 || alarms2) { > > EFX_ERR(efx, > > "LM87 detected a hardware failure (status %02x:%02x)" > > - "%s%s\n", > > + "%s%s%s\n", > > alarms1, alarms2, > > + (alarms1 & (LM87_ALARM_TEMP_INT|LM87_ALARM_TEMP_EXT1)) > > + ? " high temperature" : "", > > (alarms1 & LM87_ALARM_TEMP_INT) ? " INTERNAL" : "", > > (alarms1 & LM87_ALARM_TEMP_EXT1) ? " EXTERNAL" : ""); > > return -ERANGE; > > We could be more explicit still. How about: Yes, the "INTERNAL" and "EXTERNAL" is not very descriptive. > > EFX_ERR(efx, > "%s out of range (LM87 status %02x:%02x)\n", > (alarms1 & LM87_ALARM_TEMP_INT) ? "Board temperature" : > (alarms1 & LM87_ALARM_TEMP_EXT1) ? "Controller temperature : > "Voltage", Notice that both LM87_ALARM_TEMP_INT and LM87_ALARM_TEMP_EXT1 can be set at the same time. In that case we only print "Board temperature", is that good enough? (I thinks its okay, as the status will provide developers enough info for debugging the event) > alarms1, alarms2); I would like to emphesize that its a hardware alarm, by adding "HW alarm:" see revised patch... -- Med venlig hilsen / Best regards Jesper Brouer ComX Networks A/S Linux Network developer Cand. Scient Datalog / MSc. Author of http://adsl-optimizer.dk LinkedIn: http://www.linkedin.com/in/brouer [PATCH] sfc: Make temperature warnings/alarms more explicit. The sfc driver can detect different hardware failures via the LM87 system. One of the failures I have experienced is the temperature alarm, but the error message didn't reveal that this error was temperature related. I had to read the code to discover that. I think that the temperature error should be more explicit, in order to warn people before the board is permanently damaged. Signed-off-by: Jesper Dangaard Brouer <hawk@comx.dk> --- drivers/net/sfc/boards.c | 10 +++++----- 1 files changed, 5 insertions(+), 5 deletions(-) diff --git a/drivers/net/sfc/boards.c b/drivers/net/sfc/boards.c index 4a4c74c..223866c 100644 --- a/drivers/net/sfc/boards.c +++ b/drivers/net/sfc/boards.c @@ -120,11 +120,11 @@ static int efx_check_lm87(struct efx_nic *efx, unsigned mask) alarms2 &= mask >> 8; if (alarms1 || alarms2) { EFX_ERR(efx, - "LM87 detected a hardware failure (status %02x:%02x)" - "%s%s\n", - alarms1, alarms2, - (alarms1 & LM87_ALARM_TEMP_INT) ? " INTERNAL" : "", - (alarms1 & LM87_ALARM_TEMP_EXT1) ? " EXTERNAL" : ""); + "HW alarm: %s out of range (LM87 status %02x:%02x)\n", + (alarms1 & LM87_ALARM_TEMP_INT) ? "Board temperature" : + (alarms1 & LM87_ALARM_TEMP_EXT1) ? "Controller temperature" : + "Voltage", + alarms1, alarms2); return -ERANGE; } ^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: Driver SFC: Possible bug in LM87 temperature XFP detection code 2009-04-28 13:36 ` Ben Hutchings 2009-04-28 14:44 ` Jesper Dangaard Brouer @ 2009-04-28 17:04 ` Ben Hutchings 2009-04-29 8:52 ` Jesper Dangaard Brouer 1 sibling, 1 reply; 11+ messages in thread From: Ben Hutchings @ 2009-04-28 17:04 UTC (permalink / raw) To: Jesper Dangaard Brouer; +Cc: netdev I wrote: > However, I > think your time might be better spent in fixing the air flow in the > computer before the board is permanently damaged. It may also be worth checking that the sensors are actually reading correctly. You can read them using the "sensors" command from the lm_sensors package or by running: grep . /sys/class/net/eth88/device/i2c-adapter:*/*-002e/temp*_input Divide the numbers by 1000 to get temperatures in degrees celsius. The "internal" temperature sensor (temp1_input) is in the LM87, which is placed at the corner of the board away from the bracket and the edge connector. The "external" temperature sensor (temp2_input) is in the SFC4000. Ben. -- Ben Hutchings, Senior Software Engineer, Solarflare Communications Not speaking for my employer; that's the marketing department's job. They asked us to note that Solarflare product names are trademarked. ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Driver SFC: Possible bug in LM87 temperature XFP detection code 2009-04-28 17:04 ` Driver SFC: Possible bug in LM87 temperature XFP detection code Ben Hutchings @ 2009-04-29 8:52 ` Jesper Dangaard Brouer 2009-04-29 12:11 ` Jesper Dangaard Brouer 2009-04-29 12:47 ` Ben Hutchings 0 siblings, 2 replies; 11+ messages in thread From: Jesper Dangaard Brouer @ 2009-04-29 8:52 UTC (permalink / raw) To: Ben Hutchings; +Cc: Jesper Dangaard Brouer, netdev, Robert Olsson On Tue, 28 Apr 2009, Ben Hutchings wrote: > I wrote: >> However, I >> think your time might be better spent in fixing the air flow in the >> computer before the board is permanently damaged. > > It may also be worth checking that the sensors are actually reading > correctly. You can read them using the "sensors" command from the > lm_sensors package or by running: > > grep . /sys/class/net/eth88/device/i2c-adapter:*/*-002e/temp*_input > > Divide the numbers by 1000 to get temperatures in degrees celsius. /sys/class/net/eth88/device/i2c-adapter:i2c-1/1-002e/temp1_input:61000 /sys/class/net/eth88/device/i2c-adapter:i2c-1/1-002e/temp2_input:70000 Guess the temperature is fairly high... I can touch it with a finger without getting burned. I also have a 10GbE Sun Neptune (niu) in the machine, if I touch the heatsink on that I burn my finger, so that is even hotter... > The "internal" temperature sensor (temp1_input) is in the LM87, which is > placed at the corner of the board away from the bracket and the edge > connector. > > The "external" temperature sensor (temp2_input) is in the SFC4000. I though this was read from the XFP. I was hoping the NIC supported reading i2c stuff from the XFP. Does it support that? (Robert Olsson wanted to play with this stuff) Cheers, Jesper Brouer -- ------------------------------------------------------------------- MSc. Master of Computer Science Dept. of Computer Science, University of Copenhagen Author of http://www.adsl-optimizer.dk ------------------------------------------------------------------- ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Driver SFC: Possible bug in LM87 temperature XFP detection code 2009-04-29 8:52 ` Jesper Dangaard Brouer @ 2009-04-29 12:11 ` Jesper Dangaard Brouer 2009-04-29 12:47 ` Ben Hutchings 1 sibling, 0 replies; 11+ messages in thread From: Jesper Dangaard Brouer @ 2009-04-29 12:11 UTC (permalink / raw) To: Jesper Dangaard Brouer; +Cc: Ben Hutchings, netdev, Robert Olsson On Wed, 2009-04-29 at 10:52 +0200, Jesper Dangaard Brouer wrote: > On Tue, 28 Apr 2009, Ben Hutchings wrote: > > It may also be worth checking that the sensors are actually reading > > correctly. You can read them using the "sensors" command from the > > lm_sensors package or by running: > > > > grep . /sys/class/net/eth88/device/i2c-adapter:*/*-002e/temp*_input > > > > Divide the numbers by 1000 to get temperatures in degrees celsius. > > /sys/class/net/eth88/device/i2c-adapter:i2c-1/1-002e/temp1_input:61000 > /sys/class/net/eth88/device/i2c-adapter:i2c-1/1-002e/temp2_input:70000 > > Guess the temperature is fairly high... I can touch it with a finger > without getting burned. > I also have a 10GbE Sun Neptune (niu) in the > machine, if I touch the heatsink on that I burn my finger, so that is even > hotter... I got my hands on a InfraRed Thermometer. The Sun NIC was 98 degrees celsius!!! > > The "internal" temperature sensor (temp1_input) is in the LM87, which is > > placed at the corner of the board away from the bracket and the edge > > connector. Measuring that point gives the correct measurement using the external infrared thermometer. > > The "external" temperature sensor (temp2_input) is in the SFC4000. It was harder to measure the temperature on the NIC CPU because of its blank surface. I can get a reading of 66 Celcius where the sensor says 68-69 Celcius. The XFP it self of "only" 55 Celcius. Measured by quickly pulling it out and measuring (and taking care of measuring on a non blank surface point). > I though this was read from the XFP. I was hoping the NIC supported > reading i2c stuff from the XFP. Does it support that? (Robert Olsson > wanted to play with this stuff) -- Med venlig hilsen / Best regards Jesper Brouer ComX Networks A/S Linux Network developer Cand. Scient Datalog / MSc. Author of http://adsl-optimizer.dk LinkedIn: http://www.linkedin.com/in/brouer ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Driver SFC: Possible bug in LM87 temperature XFP detection code 2009-04-29 8:52 ` Jesper Dangaard Brouer 2009-04-29 12:11 ` Jesper Dangaard Brouer @ 2009-04-29 12:47 ` Ben Hutchings 1 sibling, 0 replies; 11+ messages in thread From: Ben Hutchings @ 2009-04-29 12:47 UTC (permalink / raw) To: Jesper Dangaard Brouer; +Cc: Jesper Dangaard Brouer, netdev, Robert Olsson On Wed, 2009-04-29 at 10:52 +0200, Jesper Dangaard Brouer wrote: [...] > > The "internal" temperature sensor (temp1_input) is in the LM87, which is > > placed at the corner of the board away from the bracket and the edge > > connector. > > > > The "external" temperature sensor (temp2_input) is in the SFC4000. > > I though this was read from the XFP. No. > I was hoping the NIC supported > reading i2c stuff from the XFP. Does it support that? (Robert Olsson > wanted to play with this stuff) That's connected to the PHY and not the controller. According to the datasheet for the QT2022, you can read out the EEPROM from MDIO registers 1.32775-1.33030 (ignoring bits 8-15). Use the MII ioctl interface and set phy_id to 0x441 (the formula is 0x400 | (prtad << 5) | devad). Ben. -- Ben Hutchings, Senior Software Engineer, Solarflare Communications Not speaking for my employer; that's the marketing department's job. They asked us to note that Solarflare product names are trademarked. ^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2009-04-30 8:45 UTC | newest] Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2009-04-28 9:36 Driver SFC: Possible bug in LM87 temperature XFP detection code Jesper Dangaard Brouer 2009-04-28 13:36 ` Ben Hutchings 2009-04-28 14:44 ` Jesper Dangaard Brouer 2009-04-28 14:48 ` [PATCH] sfc: Make temperature warnings/alarms more explicit Jesper Dangaard Brouer 2009-04-30 0:50 ` David Miller 2009-04-30 1:25 ` Ben Hutchings 2009-04-30 8:44 ` Jesper Dangaard Brouer 2009-04-28 17:04 ` Driver SFC: Possible bug in LM87 temperature XFP detection code Ben Hutchings 2009-04-29 8:52 ` Jesper Dangaard Brouer 2009-04-29 12:11 ` Jesper Dangaard Brouer 2009-04-29 12:47 ` Ben Hutchings
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.