* Net device byte statistics
From: Fredrik Tolf @ 2003-07-24 23:56 UTC
To: linux-kernel

I have set up a network statistics gathering script, which is based on the byte statistics from /proc/net/dev, but it has always been reporting too little. Yesterday, I discovered that the cause was that these statistics are defined as unsigned longs in include/linux/netdevice.h. Surely, this must be strange? They overflow at least once a day for me.

On the other hand, I cannot imagine that no one would have thought of it. What is the reason for this? Is there another interface that I should use instead of /proc/net/dev to gather byte statistics for interfaces? Shouldn't they be changed to unsigned long longs in any case?

Fredrik Tolf
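For reference, a script like Fredrik's has to pull the byte columns out of /proc/net/dev. Below is a minimal C sketch of that step. It assumes the usual column layout, where the first number after the colon is received bytes and the ninth is transmitted bytes. `parse_netdev_line` is a hypothetical helper, not a libc or kernel API.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: parse one interface line of /proc/net/dev, e.g.
 *   "  eth0:4294000000 3100000 0 0 0 0 0 0 12345678 98765 0 0 0 0 0 0"
 * Assumes the first number after the colon is received bytes and the
 * ninth is transmitted bytes.
 * Returns 0 on success, -1 for header lines or malformed input. */
static int parse_netdev_line(const char *line, char *name, size_t name_len,
                             unsigned long long *rx_bytes,
                             unsigned long long *tx_bytes)
{
    const char *colon = strchr(line, ':');
    if (colon == NULL)
        return -1;                      /* header lines have no colon */

    const char *start = line;
    while (*start == ' ')               /* names are padded with spaces */
        start++;
    size_t len = (size_t)(colon - start);
    if (len == 0 || len >= name_len)
        return -1;
    memcpy(name, start, len);
    name[len] = '\0';

    /* Large values can run straight into the colon, so scan from colon + 1. */
    unsigned long long f[9];
    if (sscanf(colon + 1, "%llu %llu %llu %llu %llu %llu %llu %llu %llu",
               &f[0], &f[1], &f[2], &f[3], &f[4], &f[5], &f[6], &f[7],
               &f[8]) != 9)
        return -1;
    *rx_bytes = f[0];                   /* first receive column */
    *tx_bytes = f[8];                   /* first transmit column */
    return 0;
}
```

Reading the values into `unsigned long long` keeps the parser independent of the counter width the kernel exports. The wrap itself still has to be detected by comparing successive samples.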
* Re: Net device byte statistics
From: Bernd Eckenfels @ 2003-07-25 0:22 UTC
To: linux-kernel

In article <200307250156.47108.fredrik@dolda2000.cjb.net> you wrote:
> On the other hand, I cannot imagine that no one would have thought of it. What
> is the reason for this? Is there another interface that I should use instead
> of /proc/net/dev to gather byte statistics for interfaces?

It is for performance reasons. You can

a) collect your numbers more often and assume a wrap/reboot if the numbers decrease
b) use iptables counters instead

BTW: it is a very often discussed topic. Personally (as net-tools maintainer) I would love to see 64-bit counters here, but you would still have to sample often enough so that you do not lose numbers on a crash.

Greetings
Bernd
-- 
eckes privat - http://www.eckes.org/
Project Freefire - http://www.freefire.org/
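Bernd's option (a) works as long as the counter wraps at most once between two samples. The minimal C sketch below shows the idea. Unsigned 32-bit subtraction already yields the correct delta across a single wrap, and a 64-bit running total is built from those deltas. The struct and function names are illustrative assumptions.

```c
#include <stdint.h>

/* Bytes counted between two samples of a 32-bit counter. Arithmetic
 * modulo 2^32 gives the right answer across one wrap; it cannot tell a
 * wrap apart from a reboot that reset the counter. */
static uint32_t counter_delta32(uint32_t prev, uint32_t cur)
{
    return (uint32_t)(cur - prev);
}

/* Hypothetical userspace accumulator: a 64-bit running total fed with
 * periodic 32-bit samples (e.g. read from /proc/net/dev). */
struct byte_total {
    uint64_t total;
    uint32_t last;
    int primed;       /* 0 until the first sample has been seen */
};

static void byte_total_sample(struct byte_total *t, uint32_t cur)
{
    if (t->primed)
        t->total += counter_delta32(t->last, cur);
    t->last = cur;
    t->primed = 1;
}
```

The sampling interval has to be shorter than the time the counter needs to wrap at the link's peak rate. Otherwise two wraps look like one and 4 GiB silently disappear.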
* Re: Net device byte statistics
From: Fredrik Tolf @ 2003-07-25 2:37 UTC
To: Bernd Eckenfels, linux-kernel

On Friday 25 July 2003 02.22, Bernd Eckenfels wrote:
> it is for performance reasons. You can

I almost thought that would be it. I do understand that that code needs to be really clean, but correct me if I'm wrong: isn't GCC's long long implementation efficient enough to add only minimal overhead there? On IA32, it shouldn't take more than one or two extra instructions (per counter), and it seems to me that net_device_stats should still be small enough to avoid any additional cache misses. I'm no expert, of course, so if I'm wrong, please tell me.

> a) collect your numbers more often and assume wrap/reboot if numbers
> decrease
> b) use iptables counters instead

Currently, I'm sampling once a day. Sampling more often could, of course, solve the problem; I just don't think it should be necessary. Do the iptables counters take the whole packet into account, or do they ignore the Ethernet header?

> BTW: it is a very often discussed topic, personally (as net tools
> maintainer) I would love to see 64bit counters here, but this still means
> you have to sample often enough, so you do not lose numbers on crash.

While that is true in theory, I'm just using it to estimate my home network usage, and my router hasn't crashed so far, so I'm not very worried about that.

Thank you very much for your input. For now, I'm just going to implement 64-bit counters in my kernel.

Fredrik Tolf
* Re: Net device byte statistics
From: Bernd Eckenfels @ 2003-07-25 3:26 UTC
To: linux-kernel

In article <200307250437.50928.fredrik@dolda2000.cjb.net> you wrote:
> I almost thought that would be it. I do understand that that code needs to be
> really clean, but, correct me if I'm wrong, but isn't GCC's long long
> implementation efficient enough to only add minimal overhead to that?

I think the main issue is atomic increments. I am not sure whether the counter can be incremented concurrently or whether the code path is serialized, but there is always the reading side, which may need to retry a read. Besides that, the counter is in the fast path, so it will add some delay to packet handling.

I guess a 64-bit implementation will need to be a per-CPU solution.

Greetings
Bernd
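Bernd's closing suggestion can be sketched as a userspace C model of a per-CPU counter. Each CPU adds only into its own cache-line-sized slot, so the fast path needs no lock. A reader sums all slots. `NR_CPUS_SKETCH`, the 64-byte line size and the function names are assumptions for illustration, not kernel interfaces.

```c
#include <stdint.h>

#define NR_CPUS_SKETCH 4   /* hypothetical CPU count */
#define CACHE_LINE 64      /* assumed cache-line size in bytes */

/* One slot per CPU, padded so CPUs updating their own slot do not
 * bounce a shared cache line between them. */
struct percpu_bytes {
    struct {
        uint64_t bytes;
        char pad[CACHE_LINE - sizeof(uint64_t)];
    } cpu[NR_CPUS_SKETCH];
};

/* Fast path: only the owning CPU writes its slot, so no lock or atomic
 * read-modify-write is shared between CPUs. (Real kernel code would also
 * have to keep the task from migrating while it updates.) */
static void percpu_bytes_add(struct percpu_bytes *s, int cpu, uint32_t len)
{
    s->cpu[cpu].bytes += len;
}

/* Slow path (statistics read): sum every slot. On a 32-bit CPU each
 * 64-bit load can still be torn, which is the reading-side problem. */
static uint64_t percpu_bytes_read(const struct percpu_bytes *s)
{
    uint64_t sum = 0;
    for (int i = 0; i < NR_CPUS_SKETCH; i++)
        sum += s->cpu[i].bytes;
    return sum;
}
```

This moves the cost from every packet to every statistics read, which is usually the rarer operation.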
* Re: Net device byte statistics
From: Jeff Sipek @ 2003-07-25 3:49 UTC
To: Bernd Eckenfels, linux-kernel

On Thursday 24 July 2003 23:26, Bernd Eckenfels wrote:
> I think there is mainly an issue with atomic increments. I am not sure if
> the counter can be incremented concurrently, or if the code path would be
> serialized, but there is always the reading side, which may need to retry
> a read. Besides that, the counter is in the fast path, so it will add some
> delay to packet handling.
>
> I guess a 64bit implementation will need to be a per-cpu solution.

I am actually working on it.

Jeff.

-- 
You measure democracy by the freedom it gives its dissidents, not the
freedom it gives its assimilated conformists.
		- Abbie Hoffman
* Re: Net device byte statistics
From: Jeff Sipek @ 2003-07-25 3:54 UTC
To: Fredrik Tolf, Bernd Eckenfels, linux-kernel

On Thursday 24 July 2003 22:37, Fredrik Tolf wrote:
> On Friday 25 July 2003 02.22, Bernd Eckenfels wrote:
> > it is for performance reasons. You can
>
> I almost thought that would be it. I do understand that that code needs to
> be really clean, but, correct me if I'm wrong, but isn't GCC's long long
> implementation efficient enough to only add minimal overhead to that? On
> IA32, it shouldn't take more than one or two more instructions (per
> counter),

That is the problem. Nobody can tell what is going to happen between those extra instructions. The worst-case scenario would be statistics off by 4GB.

> and it seems to me that net_device_stats should still be small
> enough to avoid any more cache misses.
> I'm no expert, of course, so if I'm wrong, please tell me.
>
> > a) collect your numbers more often and assume wrap/reboot if numbers
> > decrease
> > b) use iptables counters instead
>
> Currently, I'm sampling once a day, and although sampling more often could,
> of course, solve the problem, it's just that I don't think that it should
> be necessary.

There needs to be an implementation that is very performance-friendly.

> Do the iptables counters take the whole packet into account, or do they
> ignore the ethernet header?

I have no idea.

> > BTW: it is a very often discussed topic, personally (as net tools
> > maintainer) I would love to see 64bit counters here, but this still means
> > you have to sample often enough, so you do not lose numbers on crash.
>
> While that is true in theory, I'm just using it to estimate my home net
> usage, and my router hasn't crashed this far, so I'm not very worried about
> that.
>
> Thank you very much for your input. For now, I'm just going to implement 64
> bit counters in my kernel.

May I ask how, and against which kernel version?

Jeff.

-- 
A lot of my debugging happens in the shower.
		- Zwane Mwaikambo
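Jeff's worst case can be reproduced deterministically. The sketch below uses hypothetical names and stores a 64-bit counter as two 32-bit words, the way a 32-bit CPU does. It performs the increment in two steps, like an IA32 add/adc pair, and lets an unsynchronised reader run between them. This single-threaded simulation stands in for a real concurrent interleaving.

```c
#include <stdint.h>

/* A 64-bit counter as a 32-bit machine stores it: two separate words. */
struct split64 {
    uint32_t lo;
    uint32_t hi;
};

/* Step 1 of an increment: update the low word (like "add").
 * Returns the carry that still has to go into the high word. */
static uint32_t split64_add_lo(struct split64 *c, uint32_t n)
{
    uint32_t old = c->lo;
    c->lo = old + n;
    return c->lo < old;       /* 1 if the low word wrapped */
}

/* Step 2: propagate the carry (like "adc" on the high word). */
static void split64_add_hi(struct split64 *c, uint32_t carry)
{
    c->hi += carry;
}

/* An unsynchronised reader combines whatever the two words hold. */
static uint64_t split64_read(const struct split64 *c)
{
    return ((uint64_t)c->hi << 32) | c->lo;
}
```

If the reader runs after step 1 but before step 2, it sees the wrapped low word with the old high word. The result is exactly 2^32 bytes (4 GiB) short, which is the window the thread is worried about.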
* Re: Net device byte statistics
From: Denis Vlasenko @ 2003-07-25 7:03 UTC
To: Bernd Eckenfels, linux-kernel

On 25 July 2003 03:22, Bernd Eckenfels wrote:
> it is for performance reasons. You can
>
> a) collect your numbers more often and assume wrap/reboot if numbers
> decrease
> b) use iptables counters instead
>
> BTW: it is a very often discussed topic, personally (as net tools
> maintainer) I would love to see 64bit counters here, but this still means
> you have to sample often enough, so you do not lose numbers on crash.

I sample the data every minute. I will need to do it much more often on 10ge ifaces, when those appear at my home ;)

Or we will need 64-bit counters then.
--
vda
* Re: Net device byte statistics
From: Andre Hedrick @ 2003-07-25 7:01 UTC
To: Denis Vlasenko
Cc: Bernd Eckenfels, linux-kernel

Denis,

@ $7K per card you will not have to worry for a while :-O

-a

On Fri, 25 Jul 2003, Denis Vlasenko wrote:
> I sample the data every minute. Will need to do it much more often
> on 10ge ifaces, when those will appear at my home ;)
>
> Or we will need 64bit counters then.
* Re: Net device byte statistics
From: Jeff Sipek @ 2003-07-25 16:23 UTC
To: vda, Bernd Eckenfels, linux-kernel

On Friday 25 July 2003 03:03, Denis Vlasenko wrote:
> I sample the data every minute. Will need to do it much more often
> on 10ge ifaces, when those will appear at my home ;)

Speed          Time for one overflow (32-bit byte counter)

10 Gbit/s  =>    3.436 seconds
 1 Gbit/s  =>   34.36 seconds
100 Mbit/s =>  343.6 seconds

> Or we will need 64bit counters then.

For anything up to (and including) 1 Gbit/s it is easily possible to do in userspace, but at 10 Gbit/s we are getting into an area where a program would have to check the files every 3 seconds (and a bit of load could delay it long enough for an overflow to happen).

Jeff.

-- 
FORTUNE PROVIDES QUESTIONS FOR THE GREAT ANSWERS: #19
A: To be or not to be.
Q: What is the square root of 4b^2?
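The times in Jeff's table follow from 2^32 bytes × 8 bits/byte ÷ line rate. A small C sketch of that arithmetic is below; `wrap_seconds` is a hypothetical helper. It assumes the link runs flat out at the nominal rate, so real counters wrap no faster than this.

```c
#include <stdint.h>

/* Seconds until a byte counter `counter_bits` wide wraps when traffic
 * flows continuously at `bits_per_sec`: 2^counter_bits bytes * 8 / rate. */
static double wrap_seconds(unsigned counter_bits, double bits_per_sec)
{
    double bytes = 1.0;
    for (unsigned i = 0; i < counter_bits; i++)
        bytes *= 2.0;          /* exact in a double for these widths */
    return bytes * 8.0 / bits_per_sec;
}
```

With `counter_bits = 32` this gives about 3.436 s at 10 Gbit/s, 34.36 s at 1 Gbit/s and 343.6 s at 100 Mbit/s, matching the table. With 64 bits at 10 Gbit/s it is roughly 1.48e10 s, or about 467 years.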
* Re: Net device byte statistics
From: Randy.Dunlap @ 2003-07-25 17:20 UTC
To: Jeff Sipek
Cc: vda, ecki-lkm, linux-kernel

On Fri, 25 Jul 2003 12:23:37 -0400 Jeff Sipek <jeffpc@optonline.net> wrote:

| On Friday 25 July 2003 03:03, Denis Vlasenko wrote:
| > I sample the data every minute. Will need to do it much more often
| > on 10ge ifaces, when those will appear at my home ;)
|
| Speed Time for one overflow
|
| 10Gbits/s => 3.436 seconds
| 1Gbit/s => 34.36 seconds
| 100Mbits/s => 343.6 seconds
|
| > Or we will need 64bit counters then.
|
| For anything up to (and including) 1GBit/s it is possible to do easily in
| userspace, but then we are getting into an area where a program would have
| to check the files every 3 seconds (and a bit of load could delay it long
| enough for an overflow to happen.)

Yes, a common solution for this is to use some SNMP agent that does 64-bit counter accumulation.

IETF expects that some high-speed interfaces will have 64-bit counters. From RFC 2233 (Interfaces Group MIB using SMIv2):

<quote>
For interfaces that operate at 20,000,000 (20 million) bits per
second or less, 32-bit byte and packet counters MUST be used.
For interfaces that operate faster than 20,000,000 bits/second,
and slower than 650,000,000 bits/second, 32-bit packet counters
MUST be used and 64-bit octet counters MUST be used.  For
interfaces that operate at 650,000,000 bits/second or faster,
64-bit packet counters AND 64-bit octet counters MUST be used.
</quote>

However, this is a MIB spec. It does not require a Linux (/proc) interface to support 64-bit counters.

--
~Randy
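The RFC 2233 rule quoted above reduces to three speed bands. The C sketch below encodes those bands; the struct and function names are hypothetical, and the thresholds come straight from the quoted text.

```c
#include <stdint.h>

/* Counter widths required by the quoted RFC 2233 rule. */
struct counter_widths {
    int octet_bits;   /* byte (octet) counters */
    int packet_bits;  /* packet counters */
};

static struct counter_widths rfc2233_widths(uint64_t if_speed_bps)
{
    struct counter_widths w;
    if (if_speed_bps <= 20000000ULL) {          /* <= 20 Mbit/s */
        w.octet_bits = 32;
        w.packet_bits = 32;
    } else if (if_speed_bps < 650000000ULL) {   /* > 20 and < 650 Mbit/s */
        w.octet_bits = 64;
        w.packet_bits = 32;
    } else {                                    /* >= 650 Mbit/s */
        w.octet_bits = 64;
        w.packet_bits = 64;
    }
    return w;
}
```

By this rule, 100 Mbit/s Ethernet already falls in the middle band and needs 64-bit octet counters. Gigabit Ethernet needs 64-bit packet counters as well.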
* Re: Net device byte statistics
From: Jeff Sipek @ 2003-07-25 17:55 UTC
To: Randy.Dunlap
Cc: vda, ecki-lkm, linux-kernel

On Friday 25 July 2003 13:20, Randy.Dunlap wrote:
> Yes, a common solution for this is to use some SNMP agent that does
> 64-bit counter accumulation.

Interesting... I haven't thought of SNMP.

> IETF expects that some high-speed interfaces will have 64-bit
> counters. From RFC 2233 (Interfaces Group MIB using SMIv2):
>
> <quote>
> For interfaces that operate at 20,000,000 (20 million) bits per
> second or less, 32-bit byte and packet counters MUST be used.
> For interfaces that operate faster than 20,000,000 bits/second,
> and slower than 650,000,000 bits/second, 32-bit packet counters
> MUST be used and 64-bit octet counters MUST be used.  For
> interfaces that operate at 650,000,000 bits/second or faster,
> 64-bit packet counters AND 64-bit octet counters MUST be used.
> </quote>

It is just easier to make everything 64-bit.

> However, this is a MIB spec. It does not require a Linux
> (/proc) interface to support 64-bit counters.

Agreed; however, if we are going to change some counters, we should do it for all of them. (BTW, /proc is not the only place where users can get stats... there is also /sys and something else... I can't remember now.)

Jeff.

-- 
Research, n.:
Consider Columbus: He didn't know where he was going. When he got there
he didn't know where he was. When he got back he didn't know where he
had been. And he did it all on someone else's money.
* Re: Net device byte statistics
From: Randy.Dunlap @ 2003-07-25 17:58 UTC
To: Jeff Sipek
Cc: vda, ecki-lkm, linux-kernel

On Fri, 25 Jul 2003 13:55:14 -0400 Jeff Sipek <jeffpc@optonline.net> wrote:

| On Friday 25 July 2003 13:20, Randy.Dunlap wrote:
| > Yes, a common solution for this is to use some SNMP agent that does
| > 64-bit counter accumulation.
|
| Interesting...I haven't thought of SNMP.
|
| > IETF expects that some high-speed interfaces will have 64-bit
| > counters. From RFC 2233 (Interfaces Group MIB using SMIv2):
| >
| > <quote>
| > For interfaces that operate at 20,000,000 (20 million) bits per
| > second or less, 32-bit byte and packet counters MUST be used.
| > For interfaces that operate faster than 20,000,000 bits/second,
| > and slower than 650,000,000 bits/second, 32-bit packet counters
| > MUST be used and 64-bit octet counters MUST be used.  For
| > interfaces that operate at 650,000,000 bits/second or faster,
| > 64-bit packet counters AND 64-bit octet counters MUST be used.
| > </quote>
|
| It is just easier to have everything 64-bits.

I think the counterpoint is that if it were easy & safe, it would already be in the kernel.

| > However, this is a MIB spec. It does not require a Linux
| > (/proc) interface to support 64-bit counters.
|
| Agreed, however if we are going to change some counters, we should do it for
| all of them. (Btw, /proc is not the only point where users can get stats....
| there is also /sys and something else...I can't remember now...)

Right, I was just saying that the kernel interface doesn't have to support 64-bit counters in lots of cases. That can often be done in userspace.

--
~Randy
* Re: Net device byte statistics
From: jw schultz @ 2003-07-25 21:55 UTC
To: linux-kernel

On Fri, Jul 25, 2003 at 10:58:18AM -0700, Randy.Dunlap wrote:
> Right, I was just saying that the kernel interface doesn't have
> to support 64-bit counters in lots of cases. That can often be
> done in userspace.

I've been watching this discussion for several months. If I may, let me summarise what I see as the salient points.

1. Uptime is such that many 32-bit counters wrap.

2. Userspace can easily detect wrapping when measuring deltas, provided it only wraps once.

3. Some counters can wrap at intervals so small that userspace cannot accurately detect the wrap without the monitoring tool becoming a significant system load.

4. 64-bit counters would be sufficient, at least for most of these counters.

5. Without atomicity the counters will have windows where they report garbage. And if the code paths writing the counter aren't otherwise protected, they can likewise corrupt the counter.

6. The locking overhead needed for atomicity of 64-bit counters on 32-bit architectures is excessive for fast paths.

It seems to me that what is needed is an in-kernel component that can intermediate between internal 32-bit counters and userspace-visible 64-bit (or larger) counters. This component would need to be active often enough that the counters don't wrap without detection and that userspace sees sufficiently accurate numbers.

My thought would be to use 96 bits for each counter. In-kernel code would run periodically doing something like this:

        curval = counter.in_kernel;
                /* get it in a register for atomicity */
        if (curval < counter.user_low)  /* went backwards: it wrapped */
                ++counter.user_high;
        counter.user_low = curval;

This code would run every N jiffies or be in a high-priority kernel thread. As an in-kernel service it could loop over a set of counters that have been registered with it. If needed, you could even make user_high larger than 32 bits.

It could even be possible to make the code accessing the userspace counter fall back to the kernel one if the 64-bit counter is zero. That way registration could potentially be userspace-triggered.

This is just the acorn of an idea. It does mean that userspace-visible counters will not have instantaneous resolution, but it seems to me that HZ should be more than tight enough. There are certainly other ways to achieve this, and the implementation should take cache effects into account.

-- 
J.W. Schultz            Pegasystems Technologies
email address:          jw@pegasys.ws

                Remember Cernan and Schmitt
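jw's periodic sampler can be modelled in a few lines of single-threaded C. This is only a sketch: the struct and function names are hypothetical, and "every N jiffies" is represented by explicit calls. A wrap is detected when the new 32-bit reading is smaller than the previous one.

```c
#include <stddef.h>
#include <stdint.h>

/* One registered counter: the fast path only increments in_kernel; the
 * sampler maintains the user-visible value as user_high:user_low. */
struct wide_counter {
    volatile uint32_t in_kernel;  /* incremented by the fast path */
    uint32_t user_low;            /* last value seen by the sampler */
    uint32_t user_high;           /* number of wraps observed */
};

/* Called periodically; the period must be short enough that no counter
 * can wrap twice between two calls. */
static void sample_counters(struct wide_counter *c, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        uint32_t curval = c[i].in_kernel;  /* single 32-bit load */
        if (curval < c[i].user_low)        /* went backwards: it wrapped */
            c[i].user_high++;
        c[i].user_low = curval;
    }
}

/* Value userspace would see; resolution is one sampling period. */
static uint64_t wide_counter_value(const struct wide_counter *c)
{
    return ((uint64_t)c->user_high << 32) | c->user_low;
}
```

A reader on a 32-bit CPU could still catch `user_high` and `user_low` mid-update, so a real implementation would need some consistent-read scheme for those two words as well.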
* Re: Net device byte statistics
From: Jeff Sipek @ 2003-07-25 22:51 UTC
To: jw schultz, linux-kernel

On Friday 25 July 2003 17:55, jw schultz wrote:
> I've been watching this discussion for several months. If I
> may, let me summarise what I see as the salient points.
>
> 1. Uptime is such that many 32bit counters wrap.
>
> 2. Userspace can easily detect wrapping when
> measuring deltas. Provided it only wraps once.
>
> 3. Some counters can wrap at intervals so small that
> userspace cannot accurately detect the wrap without
> the monitoring tool becoming a significant system
> load.

Exactly. This is why I think that we should make the counters 64-bit right now, so that we don't have to worry about them later, when they will have to be 64 bits long.

> 4. 64bit counters would be sufficient. At least for
> most of these counters.
>
> 5. Without atomicity the counters will have windows
> where they report garbage. And if the code paths
> writing the counter aren't otherwise protected they
> can likewise corrupt the counter.
>
> 6. The locking overhead needed for atomicity of
> 64bit counters on 32bit architectures is excessive
> for fast-paths.

Per-CPU variables with a global overflow seem to be the way to go (at least for the network statistics).

> It seems to me that what is needed is an in-kernel component
> that can intermediate between internal 32bit counters and
> userspace-visible 64bit (or larger) counters. This
> component would need to be active often enough that the
> counters don't wrap without detection and so that userspace
> will see sufficiently accurate numbers.

Very interesting: the same thing that "was supposed to be done" in userspace, but modular and in the kernel itself... I am impressed.

> My thought would be to use 96bits for each counter. In-kernel
> code would run periodically doing something like this:
>
>         curval = counter.in_kernel;
>                 /* get it in a register for atomicity */
>         if (curval < counter.user_low)
>                 ++counter.user_high;
>         counter.user_low = curval;
>
> This code would run every N jiffies or be in a high priority
> kernel thread. As an in-kernel service it could loop over a
> set of counters that have been registered with it. If
> needed you could even have user_high be larger than 32 bits.
>
> It could even be possible to make the code accessing the
> userspace counter fall-back to the kernel one if the 64bit
> counter is zero. That way registration could potentially be
> userspace triggered.
>
> This is just the acorn of an idea. It does mean that
> userspace visible counters will not have instantaneous
> resolution but it seems to me that HZ should be more than
> tight enough. There are certainly other ways to achieve
> this and implementation should take into account cache
> effects.

Overall, great idea! We basically have a choice:

- 32-bit counters with overflows every 4GB and instantaneous stats
- 64-bit counters with overflows every 16EB and the possibility of stats being off a bit

Jeff.

-- 
*NOTE: This message is ROT-13 encrypted twice for extra protection*
* Re: Net device byte statistics
From: Ben Greear @ 2003-07-26 0:08 UTC
To: Jeff Sipek
Cc: jw schultz, linux-kernel

Jeff Sipek wrote:
> On Friday 25 July 2003 17:55, jw schultz wrote:
>
>> My thought would be to use 96bits for each counter. In-kernel
>> code would run periodically doing something like this:
>>
>>         curval = counter.in_kernel;
>>                 /* get it in a register for atomicity */
>>         if (curval < counter.user_low)
>>                 ++counter.user_high;
>>         counter.user_low = curval;

What about this: every 30 seconds or so, detect wraps and bump the 'high' counter if it wrapped. (Check more often if it can wrap more than once in 30 seconds.)

Then, upon read by userspace (or whatever needs 64-bit counters):

1) check for a wrap
2) grab the low bits and OR them with the high bits
3) check for a wrap again. If a wrap happened, try again.

The assumption is that it can never wrap more than once during the time you are checking.

I think this could give us very low overhead and extremely precise 64-bit reads. And I think it would not need locks in the fast path... but I could also be missing something :)

-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com
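Ben's read procedure can be sketched in C as follows. The names are hypothetical. The sketch assumes loads and stores become visible in program order; on real SMP hardware that would need explicit memory barriers, which are omitted here. The checker publishes `last_low` before bumping `high`. The reader adds any wrap that has not yet been folded in and retries if `high` changed while it was reading.

```c
#include <stdint.h>

/* Hypothetical state: only `low` is touched by the fast path; `high`
 * and `last_low` belong to the periodic wrap checker. */
struct wrap_counter {
    volatile uint32_t low;       /* 32-bit fast-path counter */
    volatile uint32_t high;      /* wraps folded in so far */
    volatile uint32_t last_low;  /* value of `low` at the last check */
};

/* Periodic checker (e.g. every 30 s): record the new low value first,
 * then bump `high` if the counter went backwards. */
static void wrap_check(struct wrap_counter *c)
{
    uint32_t cur = c->low;
    int wrapped = cur < c->last_low;
    c->last_low = cur;
    if (wrapped)
        c->high++;
}

/* Reader following steps 1-3: a wrap that happened since the last check
 * is added locally; if the checker bumped `high` meanwhile, retry. */
static uint64_t wrap_read(const struct wrap_counter *c)
{
    uint32_t hi_before, hi, last, lo;
    do {
        hi_before = c->high;             /* 1) check wrap state */
        last = c->last_low;              /* must be read before `low` */
        lo = c->low;                     /* 2) grab the low bits */
        hi = hi_before + (lo < last);    /* wrap not yet folded in */
    } while (c->high != hi_before);      /* 3) checker ran: try again */
    return ((uint64_t)hi << 32) | lo;
}
```

The load order matters here. If a check with no wrap ran between reading `low` and `last_low`, a fresh `last_low` could exceed the older `low` and look like a wrap. Reading `last_low` first avoids that.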
* Re: Net device byte statistics
From: jw schultz @ 2003-07-26 0:44 UTC
To: linux-kernel

On Fri, Jul 25, 2003 at 05:08:46PM -0700, Ben Greear wrote:
> What about every 30 seconds or so, detect wraps, and bump the 'high' counter
> if it wraps. (Check more often if you can wrap more than once in 30 secs).

Yes, how often the component needs to run will depend on the fastest counter.

> Then, upon read by user-space (or whatever needs 64-bit counters):
>
> 1) check wrap
> 2) grab low bits and OR them with the high bits.
> 3) check wrap again. If wrap happened, try again. Assumption is it could
> never wrap more than once during the time you are checking.

If you need userspace to get instantaneous values, it would be more efficient to have userspace run the update_64bit_counter code for just its counter than to have multiple wrap checks.

> I think this could give us very low overhead, and extremely precise 64-bit
> reads. And, I think it would not need locks in the fast path..but I could
> also be missing something :)

Per-CPU counters. If this is done, a variant of it for per-CPU counters would be helpful. Per-CPU counters have the advantage of reducing cache-line bouncing.

I don't think per-CPU counters should be used as a band-aid (elasto-plast) for counter wrapping. Besides, how many 12GHz 4-way hyperthreaded (shared cache) CPUs do you need? And if you only have one, should you have per-CPU counters? I don't think so.