* Network cooling device and how to control NIC speed on thermal condition

From: Waldemar Rymarkiewicz @ 2017-04-25 8:36 UTC
To: netdev; +Cc: linux-kernel

Hi,

I am not too familiar with the Linux networking architecture, so I'd like to ask first before I start digging into the code. Any feedback is appreciated.

I am looking at the Linux thermal framework and at how to cool down the system effectively when it hits a thermal condition. The existing cooling methods cpu_cooling and clock_cooling are good. However, I wanted to go further and also dynamically control a switch port's speed based on the thermal condition. Lower speed means less power, and less power means a lower temperature.

Is there any in-kernel interface to configure a switch port/NIC from another driver?

Is there any mechanism to save power when a port/interface is not really used (little or no data traffic), embedded in the networking stack, or is that a task for the NIC driver itself?

I was thinking of creating a net_cooling device, similar to the cpu_cooling device, which cools down the system by scaling down the CPU frequency. net_cooling could lower the interface speed (or tune more parameters). Do you think this could work from a networking stack perspective?

Any pointers to the code or documentation are highly appreciated.

Thanks,
/Waldek
* Re: Network cooling device and how to control NIC speed on thermal condition

From: Andrew Lunn @ 2017-04-25 13:17 UTC
To: Waldemar Rymarkiewicz; +Cc: netdev, linux-kernel

On Tue, Apr 25, 2017 at 10:36:28AM +0200, Waldemar Rymarkiewicz wrote:
> I am looking at the Linux thermal framework and at how to cool down the
> system effectively when it hits a thermal condition. [...]
>
> Is there any in-kernel interface to configure a switch port/NIC from
> another driver?

Hi Waldemar

Linux models switch ports as network interfaces, so mostly there is little difference between a NIC and a switch port. What you define for one should work for the other. Mostly.

However, I don't think you need to be too worried about the NIC level of the stack. You can mostly do this higher up in the stack. I would expect there is a relationship between packets per second and generated heat. You might want the NIC to give you some sort of heating coefficient, e.g. 1 PPS is ~10 uC. Given that, you want to throttle the PPS in the generic queuing layers. This sounds like a TC filter: you have userspace install a TC filter, which is a net_cooling device.

This does not directly work for so-called 'fastpath' traffic in switches, however. Frames which ingress one switch port and egress another switch port are mostly never seen by Linux, so a software TC filter will not affect them. However, there is infrastructure in place to accelerate TC filters by pushing them down into the hardware. So the same basic concept can be used for switch fastpath traffic, but it requires a bit more work.

Andrew
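Andrew's idea of throttling packets per second in the generic queuing layers is essentially rate limiting. A minimal userspace sketch of the same token-bucket concept (the `PpsThrottle` name and interface are illustrative assumptions, not kernel code; the real mechanism would be a TC policing filter):

```python
import time

class PpsThrottle:
    """Token bucket: allow at most max_pps packets per second.

    A userspace illustration of the rate-limiting idea behind a TC
    policing filter; the kernel implementation lives in the qdisc
    and classifier layers, not here.
    """
    def __init__(self, max_pps, now=time.monotonic):
        self.max_pps = max_pps
        self.tokens = float(max_pps)  # start with a full bucket
        self.now = now
        self.last = now()

    def allow(self):
        t = self.now()
        # Refill tokens proportionally to elapsed time, capped at one
        # second's worth of packets.
        self.tokens = min(self.max_pps,
                          self.tokens + (t - self.last) * self.max_pps)
        self.last = t
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True   # forward the packet
        return False      # drop or defer the packet
```

A thermal policy could then lower `max_pps` as the temperature rises, which is the "net_cooling as a TC filter" idea in miniature.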
* Re: Network cooling device and how to control NIC speed on thermal condition

From: Alan Cox @ 2017-04-25 13:45 UTC
To: Waldemar Rymarkiewicz; +Cc: netdev, linux-kernel

> Is there any in-kernel interface to configure a switch port/NIC from
> another driver?

No, but you can always hook that kind of functionality into the thermal daemon:

https://github.com/01org/thermal_daemon

However, I'd be careful with your assumptions. Lower speed also means more time active.

For example, if you run a big encoding job on an Atom instead of an Intel i7, the Atom will often not only take way longer but actually use more total power than the i7 did.

Thus it would often be far more efficient to time-synchronize your systems, batch up data on the collecting end, have the processing node wake up on an alarm, collect data from the other node and then actually go back into suspend. Modern processors are generally very good in the idle state (less so, sometimes, the platform around them), so trying to lower speeds may actually be the wrong thing to do, versus trying to batch up activity so that you handle a burst and then sleep the entire platform.

It also makes sense to keep policy like that mostly in user space, because what you do is going to be very device specific: e.g. things like dimming the screen, lowering the wifi power, pausing some system services, pausing battery charge, etc.

Now, at platform design time there are some interesting trade-offs between 100Mbit and 1Gbit Ethernet, although less so than there used to be 8)

Alan
* Re: Network cooling device and how to control NIC speed on thermal condition

From: Waldemar Rymarkiewicz @ 2017-04-28 8:04 UTC
To: Alan Cox, Andrew Lunn, Florian Fainelli; +Cc: netdev, linux-kernel

On 25 April 2017 at 15:45, Alan Cox <gnomes@lxorguk.ukuu.org.uk> wrote:
> No, but you can always hook that kind of functionality into the thermal
> daemon. However, I'd be careful with your assumptions. Lower speed also
> means more time active.
>
> https://github.com/01org/thermal_daemon

This is one of the options indeed, and I will consider it as well. However, I would prefer a generic (and of course configurable) solution in the kernel, as every network device can generate more heat at a higher link speed.

> For example, if you run a big encoding job on an Atom instead of an Intel
> i7, the Atom will often not only take way longer but actually use more
> total power than the i7 did.
>
> Thus it would often be far more efficient to time-synchronize your
> systems, batch up data on the collecting end, have the processing node
> wake up on an alarm, collect data from the other node and then actually
> go back into suspend.

Yes, that's true under normal thermal conditions. However, if the platform reaches the max temperature trip we don't really care about performance and time efficiency; we just try to avoid the critical trip and a system shutdown by cooling the system, e.g. lowering the CPU frequency, limiting the USB PHY speed, or the network link speed.

I did a quick test to show what I mean. I collect the SoC temperature every few seconds. Meanwhile, I use "ethtool -s ethX speed <speed>" to manipulate the link speed and see how it impacts the SoC temperature. My 4 PHYs and the switch are integrated into the SoC, and I always change the link speed for all PHYs; there is no traffic on the link for this test. Starting with 1 Gb/s and then scaling down to 100 Mb/s and then to 10 Mb/s, I see a significant ~10 °C drop in temperature while the link is set to 10 Mb/s.

So, throttling the link speed can really help to dissipate heat significantly when the platform is under threat. Renegotiating the link speed costs something, I agree, and it also impacts user experience, but such a thermal condition will not occur often, I believe.

/Waldek
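The policy Waldek tests by hand (step the link speed down as the SoC heats up) can be expressed as a small trip-point table. A Python sketch with illustrative threshold values (the temperatures are made-up examples, not measurements from the thread); the chosen speed would then be applied with "ethtool -s ethX speed <speed>":

```python
# Hypothetical trip-point table: (min_temp_C, max_speed_mbps).
# Entries are checked from hottest to coolest; values are examples only.
TRIPS = [
    (95, 10),    # near-critical: drop to 10 Mb/s
    (85, 100),   # hot: cap at 100 Mb/s
    (0, 1000),   # normal: full 1 Gb/s
]

def speed_for_temp(temp_c):
    """Pick the highest link speed (Mb/s) allowed at a given SoC temperature."""
    for threshold, speed in TRIPS:
        if temp_c >= threshold:
            return speed
    return 1000  # below all thresholds: no throttling
```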
* Re: Network cooling device and how to control NIC speed on thermal condition

From: Andrew Lunn @ 2017-04-28 11:56 UTC
To: Waldemar Rymarkiewicz; +Cc: Alan Cox, Florian Fainelli, netdev, linux-kernel

> I collect the SoC temperature every few seconds. Meanwhile, I use
> "ethtool -s ethX speed <speed>" to manipulate the link speed and see how
> it impacts the SoC temperature. My 4 PHYs and the switch are integrated
> into the SoC, and I always change the link speed for all PHYs; there is
> no traffic on the link for this test. Starting with 1 Gb/s and then
> scaling down to 100 Mb/s and then to 10 Mb/s, I see a significant
> ~10 °C drop in temperature while the link is set to 10 Mb/s.

Is that a realistic test? No traffic over the network? If you are hitting your thermal limit, to me that means one of two things:

1) The device is under very heavy load, consuming a lot of power to do what it needs to do.

2) Your device is idle, no packets are flowing, but your thermal design is wrong, so that it cannot dissipate enough heat.

It seems to me you are more interested in 1), but your quick test is more about 2). I would be more interested in quick tests of switching 8 Gbps, 4 Gbps, 2 Gbps, 1 Gbps, 512 Mbps, 256 Mbps, ... What effect does this have on temperature?

> So, throttling the link speed can really help to dissipate heat
> significantly when the platform is under threat.
>
> Renegotiating the link speed costs something, I agree, and it also
> impacts user experience, but such a thermal condition will not occur
> often, I believe.

It is a heavy-handed approach, and you have to be careful. There are some devices which don't work properly; e.g. if you try to negotiate 1000 half duplex, you might find the link just breaks.

Doing this via packet filtering, dropping packets, gives you much finer-grained control and is a lot less disruptive. But it assumes that handling packets is what is causing your heat problems, not the links themselves.

Andrew
* Re: Network cooling device and how to control NIC speed on thermal condition

From: Waldemar Rymarkiewicz @ 2017-05-08 8:08 UTC
To: Andrew Lunn; +Cc: Alan Cox, Florian Fainelli, netdev, linux-kernel

On 28 April 2017 at 13:56, Andrew Lunn <andrew@lunn.ch> wrote:
> Is that a realistic test? No traffic over the network? If you are
> hitting your thermal limit, to me that means one of two things:
>
> 1) The device is under very heavy load, consuming a lot of power to do
> what it needs to do.
>
> 2) Your device is idle, no packets are flowing, but your thermal
> design is wrong, so that it cannot dissipate enough heat.
>
> It seems to me you are more interested in 1), but your quick test is
> more about 2).

The test was not realistic indeed, but it was rather about showing how link speed correlates with temperature. In the test I was not under any thermal condition. But we can achieve the same temperature gain when we hit the hot temperature trip point, and it does not matter how heavy the network traffic is. The source of heat is not necessarily heavy network traffic; there can be several sources of heat on a SoC. However, the fact is that PHYs with an active 1 Gb/s link generate much more heat than with a 100 Mb/s link, independently of network traffic.

> I would be more interested in quick tests of switching 8 Gbps,
> 4 Gbps, 2 Gbps, 1 Gbps, 512 Mbps, 256 Mbps, ... What effect does this
> have on temperature?
>
> It is a heavy-handed approach, and you have to be careful. There are
> some devices which don't work properly; e.g. if you try to negotiate
> 1000 half duplex, you might find the link just breaks.

That is a valuable remark. I definitely need to run some interoperability tests.

> Doing this via packet filtering, dropping packets, gives you much
> finer-grained control and is a lot less disruptive. But it assumes
> that handling packets is what is causing your heat problems, not the
> links themselves.

I consider link speed manipulation as one of the cooling methods, a way to maintain the temperature alongside cpufreq, the fan, etc. The heat is not necessarily caused by heavy network traffic itself, so packet filtering is not what I am interested in.

All cooling methods impact the host only, but "net cooling" also impacts the remote side, which seems to me to be a problem sometimes. Also, at the moment of link renegotiation, rx/tx is blocked for the upper layers, so the user sees a pause when streaming a video, for example. However, if a system is under a thermal condition, does it really matter?

/Waldek
* Re: Network cooling device and how to control NIC speed on thermal condition

From: Andrew Lunn @ 2017-05-08 14:02 UTC
To: Waldemar Rymarkiewicz; +Cc: Alan Cox, Florian Fainelli, netdev, linux-kernel

> However, the fact is that PHYs with an active 1 Gb/s link generate much
> more heat than with a 100 Mb/s link, independently of network traffic.

Yes, this is true. I got an off-list email suggesting this power difference is very significant, more so than actually processing packets.

> All cooling methods impact the host only, but "net cooling" also impacts
> the remote side, which seems to me to be a problem sometimes. Also, at
> the moment of link renegotiation, rx/tx is blocked for the upper layers,
> so the user sees a pause when streaming a video, for example. However,
> if a system is under a thermal condition, does it really matter?

I don't know the cooling subsystem too well. Can you express a 'cost' for making a change, as well as the likely result of making the change? You might want to make the cost high, so it is used as a last resort if other methods cannot give enough cooling.

Andrew
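Andrew's question about expressing a 'cost' maps onto the thermal framework's cooling-state model, where a cooling device exposes a maximum state and the governor walks through states as trips are crossed. A minimal Python sketch of what a net_cooling state table might look like (the class, state count and speed values are illustrative assumptions, not code from the thread; a real implementation would be a kernel driver implementing thermal_cooling_device_ops):

```python
# State 0 = no throttling; higher states = more aggressive cooling.
# The speeds per state are example values only.
COOLING_STATES = [1000, 100, 10]  # allowed link speed in Mb/s per state

class NetCooling:
    """Userspace mock of a net_cooling device's state interface."""

    def __init__(self, states=COOLING_STATES):
        self.states = list(states)
        self.cur = 0

    def get_max_state(self):
        # Highest (most throttled) state index, as the governor sees it.
        return len(self.states) - 1

    def set_cur_state(self, state):
        # The governor requests a state; a real driver would then
        # reprogram the PHY link speed accordingly.
        if not 0 <= state <= self.get_max_state():
            raise ValueError("state out of range")
        self.cur = state
        return self.states[state]
```

Making the jump from state 0 to state 1 "expensive" in the governor's terms is what keeps link renegotiation as a last resort, as Andrew suggests.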
* Re: Network cooling device and how to control NIC speed on thermal condition

From: Waldemar Rymarkiewicz @ 2017-05-15 14:14 UTC
To: Andrew Lunn; +Cc: Alan Cox, Florian Fainelli, netdev, linux-kernel

On 8 May 2017 at 16:02, Andrew Lunn <andrew@lunn.ch> wrote:
> Yes, this is true. I got an off-list email suggesting this power
> difference is very significant, more so than actually processing
> packets.

This is the reason I've started discussing this topic. PHYs consume a lot of power, so from a thermal perspective they are a good candidate for a cooling device.

> I don't know the cooling subsystem too well. Can you express a 'cost'
> for making a change, as well as the likely result of making the change?
> You might want to make the cost high, so it is used as a last resort if
> other methods cannot give enough cooling.

Because the cost is relatively high (user experience impact, and the risk that we break the link with devices that cannot handle link renegotiation properly), it should definitely be a last-resort cooling method before system shutdown. The thermal framework by default shuts down the system when it reaches the critical trip point; before that we have a hot trip point. Normally, when you have a thermal zone defined in the system, you also define several trip points (a struct of temperature, hysteresis and type) and you map each trip point to a cooling device (cpu, clock, devfreq, fan or whatever you implement). The thermal governor will then activate cooling devices, based on the system temperature and the trip<->cooling-device map, to maintain the system temperature at the lowest possible level.

I also did more tests and actually implemented a prototype net_cooling device, registered by the Ethernet driver. In my setup (a switch and 2 PCs, running an iperf test and streaming video) it all works pretty well (the link is renegotiated and the transfer continues), but I came to the conclusion that instead of manipulating the link speed directly, I can modify the advertised link modes, excluding the highest speeds, and let the PHY layer renegotiate the link. That is much safer.

/Waldek
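The "exclude the highest advertised modes" approach amounts to masking bits in the interface's advertising word. A sketch using the legacy ADVERTISED_* bit values from include/uapi/linux/ethtool.h (the bit definitions are real kernel UAPI constants; the helper function itself is a hypothetical illustration):

```python
# Legacy advertising bits from include/uapi/linux/ethtool.h.
ADVERTISED_10baseT_Half   = 1 << 0
ADVERTISED_10baseT_Full   = 1 << 1
ADVERTISED_100baseT_Half  = 1 << 2
ADVERTISED_100baseT_Full  = 1 << 3
ADVERTISED_1000baseT_Half = 1 << 4
ADVERTISED_1000baseT_Full = 1 << 5

GIGABIT_BITS = ADVERTISED_1000baseT_Half | ADVERTISED_1000baseT_Full

def cap_advertised(advertising, allow_gigabit):
    """Mask out gigabit modes so autonegotiation settles at <= 100 Mb/s.

    Unlike forcing a fixed speed, this leaves autonegotiation enabled,
    which is the safer behaviour Waldek describes.
    """
    if allow_gigabit:
        return advertising
    return advertising & ~GIGABIT_BITS
```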
* Re: Network cooling device and how to control NIC speed on thermal condition

From: Florian Fainelli @ 2017-04-25 16:23 UTC
To: Waldemar Rymarkiewicz, netdev; +Cc: linux-kernel

Hello,

On 04/25/2017 01:36 AM, Waldemar Rymarkiewicz wrote:
> Is there any in-kernel interface to configure a switch port/NIC from
> another driver?

Well, there is, mostly in the form of notifiers though. For instance, there are lots of devices that do converged FCoE/RoCE/Ethernet and have a two-headed set of drivers: one for normal Ethernet, and another one for RDMA/IB. To some extent, stacked devices (VLAN, bond, team, etc.) also call back down into their lower device, but in an abstracted way, at the net_device level of course (layering).

> Is there any mechanism to save power when a port/interface is not
> really used (little or no data traffic), embedded in the networking
> stack, or is that a task for the NIC driver itself?

The thing we did (currently out of tree) in the Starfighter 2 switch driver (drivers/net/dsa/bcm_sf2.c) is that any time a port is brought up/down (a port = a network device) we recalculate the switch core clock, and we also resize the buffers, and that yields a little bit of power savings here and there. I don't recall the numbers off the top of my head, but it was significant enough that our HW designers convinced me into doing it ;)

> I was thinking of creating a net_cooling device, similar to the
> cpu_cooling device, which cools down the system by scaling down the CPU
> frequency. net_cooling could lower the interface speed (or tune more
> parameters). Do you think this could work from a networking stack
> perspective?

This sounds like a good idea, but it could be very tricky to get right, because even if you can somehow throttle your transmit activity (since the host is in control), you can't do that without being disruptive to the receive path (or not as effectively). Unlike with any kind of host-driven activity (CPU run queue, block devices, USB, and SPI, I2C and so on when not using slave-driven interrupts), you cannot simply apply a "duty cycle" pattern where you turn on your HW just enough time for you to set it up for transfer, signal transfer completion and go back to sleep. Networking needs to be able to asynchronously receive packets in a way that is usually not predictable, although it could be for very specific workloads.

Another thing is that there is still a fair amount of energy that needs to be spent on maintaining the link, and the HW design may be entirely clocked based on the link speed. Depending on the HW architecture (store-and-forward, cut-through, etc.) there would still be a cost associated with keeping RAMs in a state where they are operational, and so on.

You could imagine writing a queuing discipline driver that would throttle transmission based on temperature sensors present in your NIC; you could definitely do this in a way that is completely device-driver agnostic by using the Linux thermal framework's trip point and temperature notifications. For reception, if you are okay with dropping some packets, you could implement something similar, but chances are that your NIC would still need to receive packets and be able to fully process them before SW drops them, at which point you have a myriad of solutions for how not to process incoming traffic.

Hope this helps.

--
Florian
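Florian's suggestion of a temperature-driven queuing discipline could, for instance, scale the allowed transmit rate linearly between two trip points. A rough userspace sketch of that governor logic (the trip temperatures and rates are made-up example values, not from the thread):

```python
def tx_rate_limit(temp_c, trip_low=75.0, trip_hot=95.0,
                  full_rate=1_000_000_000, min_rate=10_000_000):
    """Linearly scale the allowed TX rate (bits/s) between two trip points.

    Below trip_low: full rate. Above trip_hot: minimum rate. In between,
    interpolate linearly. All threshold values are illustrative.
    """
    if temp_c <= trip_low:
        return full_rate
    if temp_c >= trip_hot:
        return min_rate
    frac = (temp_c - trip_low) / (trip_hot - trip_low)
    return int(full_rate - frac * (full_rate - min_rate))
```

A qdisc (or a userspace daemon reprogramming one) could re-evaluate this on every thermal-framework temperature notification.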