linux-kernel.vger.kernel.org archive mirror
* Net device byte statistics
@ 2003-07-24 23:56 Fredrik Tolf
  2003-07-25  0:22 ` Bernd Eckenfels
  0 siblings, 1 reply; 16+ messages in thread
From: Fredrik Tolf @ 2003-07-24 23:56 UTC (permalink / raw)
  To: linux-kernel

I have set up a network statistics gathering script, which is based on the 
byte statistics from /proc/net/dev, but it has always been reporting too 
little.
Yesterday, I discovered that the cause was that these statistics are defined 
as unsigned longs in include/linux/netdevice.h. Surely, this must be strange? 
They overflow at least once a day for me.
On the other hand, I cannot imagine that no one would have thought of it. What 
is the reason for this? Is there another interface that I should use instead 
of /proc/net/dev to gather byte statistics for interfaces?
Shouldn't they be changed to unsigned long longs in any case?
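
For reference, a minimal sketch of pulling the two byte counters out of one
/proc/net/dev line in C. The helper name parse_netdev_line is made up for
illustration, and the field positions are an assumption based on the usual
layout, in which the receive byte count is the first number after the colon
and the transmit byte count is the ninth.

```c
#include <stdio.h>

/* Hypothetical helper: parse one interface line of /proc/net/dev.
 * Assumed layout: "name: rx_bytes <7 more rx fields> tx_bytes ...".
 * Returns 1 on success, 0 for header lines or malformed input. */
int parse_netdev_line(const char *line, char name[32],
                      unsigned long long *rx_bytes,
                      unsigned long long *tx_bytes)
{
    /* " %31[^:]" skips leading blanks and reads the name up to the colon;
     * the seven "%*s" skip the remaining receive-side fields. */
    int n = sscanf(line, " %31[^:]: %llu %*s %*s %*s %*s %*s %*s %*s %llu",
                   name, rx_bytes, tx_bytes);
    return n == 3;
}
```

The values are read into unsigned long long on the userspace side, so any
truncation to 32 bits happens in the kernel counter itself, not in the parser.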

Fredrik Tolf



* Re: Net device byte statistics
  2003-07-24 23:56 Net device byte statistics Fredrik Tolf
@ 2003-07-25  0:22 ` Bernd Eckenfels
  2003-07-25  2:37   ` Fredrik Tolf
  2003-07-25  7:03   ` Denis Vlasenko
  0 siblings, 2 replies; 16+ messages in thread
From: Bernd Eckenfels @ 2003-07-25  0:22 UTC (permalink / raw)
  To: linux-kernel

In article <200307250156.47108.fredrik@dolda2000.cjb.net> you wrote:
> On the other hand, I cannot imagine that no one would have thought of it. What 
> is the reason for this? Is there another interface that I should use instead 
> of /proc/net/dev to gather byte statistics for interfaces?

it is for performance reasons. You can

a) collect your numbers more often and assume wrap/reboot if numbers
decrease
b) use iptables counters instead

BTW: it is a very often discussed topic, personally (as net tools
maintainer) I would love to see 64bit counters here, but this still means
you have to sample often enough, so you do not lose numbers on crash.
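
A minimal sketch of option (a) in C, assuming the kernel counter is 32 bits
wide and wraps at most once between two samples. The names accum and
accum_sample are made up for illustration. Unsigned subtraction modulo 2^32
yields the right delta across a single wrap; a reboot (counter reset to zero)
cannot be told apart from a wrap this way and would need separate handling,
for example by also watching the uptime.

```c
#include <stdint.h>

/* Userspace 64-bit accumulator fed from a wrapping 32-bit counter. */
struct accum {
    uint32_t last;    /* previous sample */
    uint64_t total;   /* bytes accumulated since the first sample */
    int      primed;  /* 0 until the first sample has been taken */
};

void accum_sample(struct accum *a, uint32_t now)
{
    if (a->primed)
        /* Difference modulo 2^32: correct if at most one wrap occurred. */
        a->total += (uint32_t)(now - a->last);
    a->last = now;
    a->primed = 1;
}
```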

Greetings
Bernd
-- 
eckes privat - http://www.eckes.org/
Project Freefire - http://www.freefire.org/


* Re: Net device byte statistics
  2003-07-25  0:22 ` Bernd Eckenfels
@ 2003-07-25  2:37   ` Fredrik Tolf
  2003-07-25  3:26     ` Bernd Eckenfels
  2003-07-25  3:54     ` Jeff Sipek
  2003-07-25  7:03   ` Denis Vlasenko
  1 sibling, 2 replies; 16+ messages in thread
From: Fredrik Tolf @ 2003-07-25  2:37 UTC (permalink / raw)
  To: Bernd Eckenfels, linux-kernel

On Friday 25 July 2003 02.22, Bernd Eckenfels wrote:
> it is for performance reasons. You can

I almost thought that would be it. I do understand that that code needs to be 
really clean, but, correct me if I'm wrong, but isn't GCC's long long 
implementation efficient enough to only add minimal overhead to that? On 
IA32, it shouldn't take more than one or two more instructions (per counter), 
and it seems to me that net_device_stats should still be small enough to 
avoid any more cache misses.
I'm no expert, of course, so if I'm wrong, please tell me.

> a) collect your numbers more often and assume wrap/reboot if numbers
> decrease
> b) use iptables counters instead

Currently, I'm sampling once a day, and although sampling more often could, of 
course, solve the problem, it's just that I don't think that it should be 
necessary.
Do the iptables counters take the whole packet into account, or do they ignore 
the ethernet header?

> BTW: it is a very often discussed topic, personally (as net tools
> maintainer) I would love to see 64bit counters here, but this still means
> you have to sample often enough, so you do not lose numbers on crash.

While that is true in theory, I'm just using it to estimate my home net usage, 
and my router hasn't crashed this far, so I'm not very worried about that.

Thank you very much for your input. For now, I'm just going to implement 64 
bit counters in my kernel.

Fredrik Tolf



* Re: Net device byte statistics
  2003-07-25  2:37   ` Fredrik Tolf
@ 2003-07-25  3:26     ` Bernd Eckenfels
  2003-07-25  3:49       ` Jeff Sipek
  2003-07-25  3:54     ` Jeff Sipek
  1 sibling, 1 reply; 16+ messages in thread
From: Bernd Eckenfels @ 2003-07-25  3:26 UTC (permalink / raw)
  To: linux-kernel

In article <200307250437.50928.fredrik@dolda2000.cjb.net> you wrote:
> I almost thought that would be it. I do understand that that code needs to be 
> really clean, but, correct me if I'm wrong, but isn't GCC's long long 
> implementation efficient enough to only add minimal overhead to that?

I think there is mainly an issue with atomic increments. I am not sure if the
counter can be incremented concurrently, or if the code path would be
serialized, but there is always the reading side, which may need to retry a
read. Besides that, the counter is in the fast path, so it will add some
delay to packet handling.

I guess a 64bit implementation will need to be a per-cpu solution.

Greetings
Bernd
-- 
eckes privat - http://www.eckes.org/
Project Freefire - http://www.freefire.org/


* Re: Net device byte statistics
  2003-07-25  3:26     ` Bernd Eckenfels
@ 2003-07-25  3:49       ` Jeff Sipek
  0 siblings, 0 replies; 16+ messages in thread
From: Jeff Sipek @ 2003-07-25  3:49 UTC (permalink / raw)
  To: Bernd Eckenfels, linux-kernel

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Thursday 24 July 2003 23:26, Bernd Eckenfels wrote:
> In article <200307250437.50928.fredrik@dolda2000.cjb.net> you wrote:
> > I almost thought that would be it. I do understand that that code needs
> > to be really clean, but, correct me if I'm wrong, but isn't GCC's long
> > long implementation efficient enough to only add minimal overhead to
> > that?
>
> I think there is mainly an issue with atomic increments. I am not sure if
> the counter can be incremented concurrently, or if the code path would be
> serialized, but there is always the reading side, which may need to retry
> a read. Besides that, the counter is in the fast path, so it will add some
> delay to packet handling.
>
> I guess a 64bit implementation will need to be a per-cpu solution.

I am actually working on it.

Jeff.

- -- 
You measure democracy by the freedom it gives its dissidents, not the
freedom it gives its assimilated conformists.
		- Abbie Hoffman
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.2 (GNU/Linux)

iD8DBQE/IKjYwFP0+seVj/4RAi30AKCKSue0MzoXXggx0BJriERW5DXpIQCg1ECY
ZDgy8Dra96jzj4zJz/pGAW0=
=wuhA
-----END PGP SIGNATURE-----



* Re: Net device byte statistics
  2003-07-25  2:37   ` Fredrik Tolf
  2003-07-25  3:26     ` Bernd Eckenfels
@ 2003-07-25  3:54     ` Jeff Sipek
  1 sibling, 0 replies; 16+ messages in thread
From: Jeff Sipek @ 2003-07-25  3:54 UTC (permalink / raw)
  To: Fredrik Tolf, Bernd Eckenfels, linux-kernel

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Thursday 24 July 2003 22:37, Fredrik Tolf wrote:
> On Friday 25 July 2003 02.22, Bernd Eckenfels wrote:
> > it is for performance reasons. You can
>
> I almost thought that would be it. I do understand that that code needs to
> be really clean, but, correct me if I'm wrong, but isn't GCC's long long
> implementation efficient enough to only add minimal overhead to that? On
> IA32, it shouldn't take more than one or two more instructions (per
> counter),

That is the problem. Nobody can tell what is going to happen between those 
extra instructions. The worst case scenario would be statistics off by 4GB.
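
To illustrate the 4GB figure, here is a small C model, not real kernel code:
the 64-bit counter is represented as two 32-bit words that are updated in two
separate steps, roughly the way an add followed by an add-with-carry would do
it on a 32-bit machine. The type and function names are made up. A reader
that samples the value between the two steps sees the wrapped low word with
the old high word, which is exactly 2^32 too small.

```c
#include <stdint.h>

/* Model of a 64-bit counter stored as two 32-bit words. */
struct split64 {
    uint32_t lo;
    uint32_t hi;
};

/* Step 1: add to the low word; returns 1 if it wrapped (the carry). */
uint32_t add_low(struct split64 *c, uint32_t n)
{
    uint32_t old = c->lo;
    c->lo = old + n;
    return c->lo < old;
}

/* Step 2: propagate the carry into the high word. */
void add_carry(struct split64 *c, uint32_t carry)
{
    c->hi += carry;
}

/* What a reader sees if it combines the two words at this moment. */
uint64_t read64(const struct split64 *c)
{
    return ((uint64_t)c->hi << 32) | c->lo;
}
```

Starting from lo = 0xFFFFFF00 and adding 0x200, a read taken after add_low but
before add_carry returns 0x100, while the completed value is 0x100000100.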

> and it seems to me that net_device_stats should still be small
> enough to avoid any more cache misses.
> I'm no expert, of course, so if I'm wrong, please tell me.
>
> > a) collect your numbers more often and assume wrap/reboot if numbers
> > decrease
> > b) use iptables counters instead
>
> Currently, I'm sampling once a day, and although sampling more often could,
> of course, solve the problem, it's just that I don't think that it should
> be necessary.

What is needed is an implementation that does not hurt performance.

> Do the iptables counters take the whole packet into account, or do they
> ignore the ethernet header?

I have no idea.

> > BTW: it is a very often discussed topic, personally (as net tools
> > maintainer) I would love to see 64bit counters here, but this still means
> > you have to sample often enough, so you do not lose numbers on crash.
>
> While that is true in theory, I'm just using it to estimate my home net
> usage, and my router hasn't crashed this far, so I'm not very worried about
> that.
>
> Thank you very much for your input. For now, I'm just going to implement 64
> bit counters in my kernel.

May I ask how and which kernel version?

Jeff.

- -- 
A lot of my debugging happens in the shower.
		- Zwane Mwaikambo
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.2 (GNU/Linux)

iD8DBQE/IKnowFP0+seVj/4RAp5xAJ48QIdsoo2uzZMoARh3pXeLa3ZgoACgnKSG
X9bSWvQu0u3s1jWYdN7+Dxk=
=PNeu
-----END PGP SIGNATURE-----



* Re: Net device byte statistics
  2003-07-25  7:03   ` Denis Vlasenko
@ 2003-07-25  7:01     ` Andre Hedrick
  2003-07-25 16:23     ` Jeff Sipek
  1 sibling, 0 replies; 16+ messages in thread
From: Andre Hedrick @ 2003-07-25  7:01 UTC (permalink / raw)
  To: Denis Vlasenko; +Cc: Bernd Eckenfels, linux-kernel


Denis,

@ $7K per card you will not have to worry for a while :-O

-a

On Fri, 25 Jul 2003, Denis Vlasenko wrote:

> On 25 July 2003 03:22, Bernd Eckenfels wrote:
> > In article <200307250156.47108.fredrik@dolda2000.cjb.net> you wrote:
> > > On the other hand, I cannot imagine that no one would have thought of it. What 
> > > is the reason for this? Is there another interface that I should use instead 
> > > of /proc/net/dev to gather byte statistics for interfaces?
> > 
> > it is for performance reasons. You can
> > 
> > a) collect your numbers more often and assume wrap/reboot if numbers
> > decrease
> > b) use iptables counters instead
> > 
> > BTW: it is a very often discussed topic, personally (as net tools
> > maintainer) I would love to see 64bit counters here, but this still means
> > you have to sample often enough, so you do not lose numbers on crash.
> 
> I sample the data every minute. Will need to do it much more often
> on 10ge ifaces, when those will appear at my home ;)
> 
> Or we will need 64bit counters then.
> --
> vda



* Re: Net device byte statistics
  2003-07-25  0:22 ` Bernd Eckenfels
  2003-07-25  2:37   ` Fredrik Tolf
@ 2003-07-25  7:03   ` Denis Vlasenko
  2003-07-25  7:01     ` Andre Hedrick
  2003-07-25 16:23     ` Jeff Sipek
  1 sibling, 2 replies; 16+ messages in thread
From: Denis Vlasenko @ 2003-07-25  7:03 UTC (permalink / raw)
  To: Bernd Eckenfels, linux-kernel

On 25 July 2003 03:22, Bernd Eckenfels wrote:
> In article <200307250156.47108.fredrik@dolda2000.cjb.net> you wrote:
> > On the other hand, I cannot imagine that no one would have thought of it. What 
> > is the reason for this? Is there another interface that I should use instead 
> > of /proc/net/dev to gather byte statistics for interfaces?
> 
> it is for performance reasons. You can
> 
> a) collect your numbers more often and assume wrap/reboot if numbers
> decrease
> b) use iptables counters instead
> 
> BTW: it is a very often discussed topic, personally (as net tools
> maintainer) I would love to see 64bit counters here, but this still means
> you have to sample often enough, so you do not lose numbers on crash.

I sample the data every minute. Will need to do it much more often
on 10ge ifaces, when those will appear at my home ;)

Or we will need 64bit counters then.
--
vda


* Re: Net device byte statistics
  2003-07-25  7:03   ` Denis Vlasenko
  2003-07-25  7:01     ` Andre Hedrick
@ 2003-07-25 16:23     ` Jeff Sipek
  2003-07-25 17:20       ` Randy.Dunlap
  1 sibling, 1 reply; 16+ messages in thread
From: Jeff Sipek @ 2003-07-25 16:23 UTC (permalink / raw)
  To: vda, Bernd Eckenfels, linux-kernel

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Friday 25 July 2003 03:03, Denis Vlasenko wrote:
> I sample the data every minute. Will need to do it much more often
> on 10ge ifaces, when those will appear at my home ;)

Speed			Time for one overflow

10Gbits/s	=> 3.436 seconds
1Gbit/s		=> 34.36 seconds
100Mbits/s	=> 343.6 seconds
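
The figures above follow from a 32-bit byte counter holding 2^32 bytes, i.e.
2^32 * 8 bits, divided by the line rate. A small C sketch of that arithmetic
(the function name wrap_seconds is made up, and the link is assumed to be
fully saturated in one direction):

```c
/* Seconds until a 32-bit byte counter wraps on a saturated link. */
double wrap_seconds(double bits_per_second)
{
    const double counter_bits = 4294967296.0 * 8.0; /* 2^32 bytes, in bits */
    return counter_bits / bits_per_second;
}
```

For example, wrap_seconds(1e9) is 2^35 / 10^9, about 34.36 seconds.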

> Or we will need 64bit counters then.

For anything up to (and including) 1Gbit/s it is possible to do it easily in 
userspace, but then we are getting into an area where a program would have 
to check the files every 3 seconds (and a bit of load could delay it long 
enough for an overflow to happen).

Jeff.

- -- 
FORTUNE PROVIDES QUESTIONS FOR THE GREAT ANSWERS: #19
A:      To be or not to be.
Q:      What is the square root of 4b^2?
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.2 (GNU/Linux)

iD8DBQE/IVmNwFP0+seVj/4RAioPAJ0Y9+lsU/pcwubJeyt8sIogOJt7/ACgoNhT
o1qluqX84CNqU2du7WXG4Eo=
=IlX4
-----END PGP SIGNATURE-----



* Re: Net device byte statistics
  2003-07-25 16:23     ` Jeff Sipek
@ 2003-07-25 17:20       ` Randy.Dunlap
  2003-07-25 17:55         ` Jeff Sipek
  0 siblings, 1 reply; 16+ messages in thread
From: Randy.Dunlap @ 2003-07-25 17:20 UTC (permalink / raw)
  To: Jeff Sipek; +Cc: vda, ecki-lkm, linux-kernel

On Fri, 25 Jul 2003 12:23:37 -0400 Jeff Sipek <jeffpc@optonline.net> wrote:

| -----BEGIN PGP SIGNED MESSAGE-----
| Hash: SHA1
| 
| On Friday 25 July 2003 03:03, Denis Vlasenko wrote:
| > I sample the data every minute. Will need to do it much more often
| > on 10ge ifaces, when those will appear at my home ;)
| 
| Speed			Time for one overflow
| 
| 10Gbits/s	=> 3.436 seconds
| 1Gbit/s		=> 34.36 seconds
| 100Mbits/s	=> 343.6 seconds
| 
| > Or we will need 64bit counters then.
| 
| For anything up to (and including) 1Gbit/s it is possible to do it easily in 
| userspace, but then we are getting into an area where a program would have 
| to check the files every 3 seconds (and a bit of load could delay it long 
| enough for an overflow to happen).

Yes, a common solution for this is to use some SNMP agent that does
64-bit counter accumulation.

IETF expects that some high-speed interfaces will have 64-bit
counters.  From RFC 2233 (Interfaces Group MIB using SMIv2):

<quote>
For interfaces that operate at 20,000,000 (20 million) bits per
second or less, 32-bit byte and packet counters MUST be used.
For interfaces that operate faster than 20,000,000 bits/second,
and slower than 650,000,000 bits/second, 32-bit packet counters
MUST be used and 64-bit octet counters MUST be used. For
interfaces that operate at 650,000,000 bits/second or faster,
64-bit packet counters AND 64-bit octet counters MUST be used.
</quote>

However, this is a MIB spec.  It does not require a Linux
(/proc) interface to support 64-bit counters.
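
As an illustration of the quoted rule only, the thresholds can be written as a
small C function. The struct and function names are made up; the two
thresholds are taken directly from the quote above.

```c
/* Counter widths (in bits) required by the quoted RFC 2233 text. */
struct counter_widths {
    int octet_bits;
    int packet_bits;
};

struct counter_widths rfc2233_widths(unsigned long long bits_per_second)
{
    struct counter_widths w = { 32, 32 };

    if (bits_per_second > 20000000ULL)    /* faster than 20 Mbit/s */
        w.octet_bits = 64;
    if (bits_per_second >= 650000000ULL)  /* 650 Mbit/s or faster */
        w.packet_bits = 64;
    return w;
}
```

By this rule a 100Mbit/s interface needs 64-bit octet counters but may keep
32-bit packet counters, and a 1Gbit/s interface needs 64 bits for both.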

--
~Randy


* Re: Net device byte statistics
  2003-07-25 17:20       ` Randy.Dunlap
@ 2003-07-25 17:55         ` Jeff Sipek
  2003-07-25 17:58           ` Randy.Dunlap
  0 siblings, 1 reply; 16+ messages in thread
From: Jeff Sipek @ 2003-07-25 17:55 UTC (permalink / raw)
  To: Randy.Dunlap; +Cc: vda, ecki-lkm, linux-kernel

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Friday 25 July 2003 13:20, Randy.Dunlap wrote:
> Yes, a common solution for this is to use some SNMP agent that does
> 64-bit counter accumulation.

Interesting...I haven't thought of SNMP.

> IETF expects that some high-speed interfaces will have 64-bit
> counters.  From RFC 2233 (Interfaces Group MIB using SMIv2):
>
> <quote>
> For interfaces that operate at 20,000,000 (20 million) bits per
> second or less, 32-bit byte and packet counters MUST be used.
> For interfaces that operate faster than 20,000,000 bits/second,
> and slower than 650,000,000 bits/second, 32-bit packet counters
> MUST be used and 64-bit octet counters MUST be used. For
> interfaces that operate at 650,000,000 bits/second or faster,
> 64-bit packet counters AND 64-bit octet counters MUST be used.
> </quote>

It is just easier to have everything 64-bits.

> However, this is a MIB spec.  It does not require a Linux
> (/proc) interface to support 64-bit counters.

Agreed, however if we are going to change some counters, we should do it for 
all of them. (Btw, /proc is not the only point where users can get stats.... 
there is also /sys and something else...I can't remember now...)

Jeff.

- -- 
Research, n.:
  Consider Columbus:
    He didn't know where he was going.
    When he got there he didn't know where he was.
    When he got back he didn't know where he had been.
    And he did it all on someone else's money.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.2 (GNU/Linux)

iD8DBQE/IW8GwFP0+seVj/4RAhf3AKDAtCkm8UdL4T1ZQzqttEG7XyVW9ACeIT6m
RKO8c2UnpSuJwyvwHd5PS8c=
=4vls
-----END PGP SIGNATURE-----



* Re: Net device byte statistics
  2003-07-25 17:55         ` Jeff Sipek
@ 2003-07-25 17:58           ` Randy.Dunlap
  2003-07-25 21:55             ` jw schultz
  0 siblings, 1 reply; 16+ messages in thread
From: Randy.Dunlap @ 2003-07-25 17:58 UTC (permalink / raw)
  To: Jeff Sipek; +Cc: vda, ecki-lkm, linux-kernel

On Fri, 25 Jul 2003 13:55:14 -0400 Jeff Sipek <jeffpc@optonline.net> wrote:

| -----BEGIN PGP SIGNED MESSAGE-----
| Hash: SHA1
| 
| On Friday 25 July 2003 13:20, Randy.Dunlap wrote:
| > Yes, a common solution for this is to use some SNMP agent that does
| > 64-bit counter accumulation.
| 
| Interesting...I haven't thought of SNMP.
| 
| > IETF expects that some high-speed interfaces will have 64-bit
| > counters.  From RFC 2233 (Interfaces Group MIB using SMIv2):
| >
| > <quote>
| > For interfaces that operate at 20,000,000 (20 million) bits per
| > second or less, 32-bit byte and packet counters MUST be used.
| > For interfaces that operate faster than 20,000,000 bits/second,
| > and slower than 650,000,000 bits/second, 32-bit packet counters
| > MUST be used and 64-bit octet counters MUST be used. For
| > interfaces that operate at 650,000,000 bits/second or faster,
| > 64-bit packet counters AND 64-bit octet counters MUST be used.
| > </quote>
| 
| It is just easier to have everything 64-bits.

I think the counterpoint is that if it were easy & safe, it would
already be in the kernel.

| > However, this is a MIB spec.  It does not require a Linux
| > (/proc) interface to support 64-bit counters.
| 
| Agreed, however if we are going to change some counters, we should do it for 
| all of them. (Btw, /proc is not the only point where users can get stats.... 
| there is also /sys and something else...I can't remember now...)

Right, I was just saying that the kernel interface doesn't have
to support 64-bit counters in lots of cases.  That can often be
done in userspace.

--
~Randy
| http://developer.osdl.org/rddunlap/ | http://www.xenotime.net/linux/ |
For Linux-2.6:
http://www.codemonkey.org.uk/post-halloween-2.5.txt
  or http://lwn.net/Articles/39901/
http://www.kernel.org/pub/linux/kernel/people/rusty/modules/


* Re: Net device byte statistics
  2003-07-25 17:58           ` Randy.Dunlap
@ 2003-07-25 21:55             ` jw schultz
  2003-07-25 22:51               ` Jeff Sipek
  0 siblings, 1 reply; 16+ messages in thread
From: jw schultz @ 2003-07-25 21:55 UTC (permalink / raw)
  To: linux-kernel

On Fri, Jul 25, 2003 at 10:58:18AM -0700, Randy.Dunlap wrote:
> On Fri, 25 Jul 2003 13:55:14 -0400 Jeff Sipek <jeffpc@optonline.net> wrote:
> 
> | -----BEGIN PGP SIGNED MESSAGE-----
> | Hash: SHA1
> | 
> | On Friday 25 July 2003 13:20, Randy.Dunlap wrote:
> | > Yes, a common solution for this is to use some SNMP agent that does
> | > 64-bit counter accumulation.
> | 
> | Interesting...I haven't thought of SNMP.
> | 
> | > IETF expects that some high-speed interfaces will have 64-bit
> | > counters.  From RFC 2233 (Interfaces Group MIB using SMIv2):
> | >
> | > <quote>
> | > For interfaces that operate at 20,000,000 (20 million) bits per
> | > second or less, 32-bit byte and packet counters MUST be used.
> | > For interfaces that operate faster than 20,000,000 bits/second,
> | > and slower than 650,000,000 bits/second, 32-bit packet counters
> | > MUST be used and 64-bit octet counters MUST be used. For
> | > interfaces that operate at 650,000,000 bits/second or faster,
> | > 64-bit packet counters AND 64-bit octet counters MUST be used.
> | > </quote>
> | 
> | It is just easier to have everything 64-bits.
> 
> I think the counterpoint is that if it were easy & safe, it would
> already be in the kernel.
> 
> | > However, this is a MIB spec.  It does not require a Linux
> | > (/proc) interface to support 64-bit counters.
> | 
> | Agreed, however if we are going to change some counters, we should do it for 
> | all of them. (Btw, /proc is not the only point where users can get stats.... 
> | there is also /sys and something else...I can't remember now...)
> 
> Right, I was just saying that the kernel interface doesn't have
> to support 64-bit counters in lots of cases.  That can often be
> done in userspace.

I've been watching this discussion for several months.  If i
may, let me summarise what i see as the salient points.

	1. Uptime is such that many 32bit counters wrap.

	2. Userspace can easily detect wrapping when
	measuring deltas.  Provided it only wraps once.

	3. Some counters can wrap at intervals so small that
	userspace cannot accurately detect the wrap without
	the monitoring tool becoming a significant system
	load.

	4. 64bit counters would be sufficient.  At least for
	most of these counters.

	5. Without atomicity the counters will have windows
	where they report garbage.  And if the code paths
	writing the counter aren't otherwise protected they
	can likewise corrupt the counter.

	6. The locking overhead needed for atomicity of
	64bit counters on 32bit architectures is excessive
	for fast-paths.

It seems to me that what is needed is an in-kernel component
that can intermediate between internal 32bit counters and
userspace-visible 64bit (or larger) counters.  This
component would need to be active often enough that the
counters don't wrap without detection and so that userspace
will see sufficiently accurate numbers.

My thought would be to use 96bits for each counter.  In-kernel
code would run periodically doing something like this:

	curval = counter.in_kernel;
			/* get it in a register for atomicity */
	if (curval < counter.user_low)	/* value went backwards: it wrapped */
		++counter.user_high;
	counter.user_low = curval;

This code would run every N jiffies or be in a high priority
kernel thread.  As an in-kernel service it could loop over a
set of counters that have been registered with it.  If
needed you could even have user_high be larger than 32 bits.
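
A self-contained C sketch of this periodic fold, under the assumption that the
fold runs at least once per wrap of the 32-bit counter. The struct layout
follows the 96-bit idea (three 32-bit words); the names wide_counter,
fold_counter and user_value are made up for illustration. A wrap is detected
when the current value is smaller than the last one seen.

```c
#include <stdint.h>

struct wide_counter {
    uint32_t in_kernel;  /* fast-path counter, free to wrap */
    uint32_t user_low;   /* value seen by the most recent fold */
    uint32_t user_high;  /* number of wraps observed so far */
};

/* Run periodically, e.g. from a timer: one 32-bit load, then bookkeeping. */
void fold_counter(struct wide_counter *c)
{
    uint32_t curval = c->in_kernel;

    if (curval < c->user_low)   /* went backwards, so it wrapped */
        ++c->user_high;
    c->user_low = curval;
}

/* The 64-bit value exported to userspace, as of the last fold. */
uint64_t user_value(const struct wide_counter *c)
{
    return ((uint64_t)c->user_high << 32) | c->user_low;
}
```

The fast path only ever touches in_kernel, so it stays a plain 32-bit
increment. Reading user_high and user_low consistently from another context
is not addressed by this sketch.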

It could even be possible to make the code accessing the
userspace counter fall-back to the kernel one if the 64bit
counter is zero.  That way registration could potentially be
userspace triggered.

This is just the acorn of an idea.  It does mean that
userspace visible counters will not have instantaneous
resolution but it seems to me that HZ should be more than
tight enough.  There are certainly other ways to achieve
this and implementation should take into account cache
effects.


-- 
________________________________________________________________
	J.W. Schultz            Pegasystems Technologies
	email address:		jw@pegasys.ws

		Remember Cernan and Schmitt


* Re: Net device byte statistics
  2003-07-25 21:55             ` jw schultz
@ 2003-07-25 22:51               ` Jeff Sipek
  2003-07-26  0:08                 ` Ben Greear
  0 siblings, 1 reply; 16+ messages in thread
From: Jeff Sipek @ 2003-07-25 22:51 UTC (permalink / raw)
  To: jw schultz, linux-kernel

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Friday 25 July 2003 17:55, jw schultz wrote:
> I've been watching this discussion for several months.  If i
> may, let me summarise what i see as the salient points.
>
> 	1. Uptime is such that many 32bit counters wrap.
>
> 	2. Userspace can easily detect wrapping when
> 	measuring deltas.  Provided it only wraps once.
>
> 	3. Some counters can wrap at intervals so small that
> 	userspace cannot accurately detect the wrap without
> 	the monitoring tool becoming a significant system
> 	load.

Exactly, this is why I think that we should make the counters 64-bits right 
now, so that we don't have to worry about them later - when it will be 
required to have them 64-bits long.

> 	4. 64bit counters would be sufficient.  At least for
> 	most of these counters.
>
> 	5. Without atomicity the counters will have windows
> 	where they report garbage.  And if the code paths
> 	writing the counter aren't otherwise protected they
> 	can likewise corrupt the counter.
>
> 	6. The locking overhead needed for atomicity of
> 	64bit counters on 32bit architectures is excessive
> 	for fast-paths.

Per cpu variables with global overflow seem to be the way to go (at least for 
the network statistics.)
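
One possible reading of "per cpu variables with global overflow", sketched as
a single-threaded C model rather than kernel code: each CPU increments its own
32-bit word, and only when that word wraps does it touch a shared overflow
count. NCPUS and all the names are assumptions made for the example; in a real
kernel the shared increment would have to be atomic and the reader would still
race with writers.

```c
#include <stdint.h>

#define NCPUS 4  /* assumed CPU count for the model */

struct pcpu_stat {
    uint32_t low[NCPUS];  /* each word written only by its own CPU */
    uint32_t overflows;   /* shared: total number of low-word wraps */
};

/* Fast path: one 32-bit add, plus a rare shared update on wrap. */
void pcpu_add(struct pcpu_stat *s, int cpu, uint32_t n)
{
    uint32_t old = s->low[cpu];

    s->low[cpu] = old + n;
    if (s->low[cpu] < old)
        s->overflows++;   /* would need to be atomic on real SMP */
}

/* Slow path: combine the wraps and the per-CPU words into 64 bits. */
uint64_t pcpu_total(const struct pcpu_stat *s)
{
    uint64_t sum = (uint64_t)s->overflows << 32;
    int cpu;

    for (cpu = 0; cpu < NCPUS; cpu++)
        sum += s->low[cpu];
    return sum;
}
```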

> It seems to me that what is needed is an in-kernel component
> that can intermediate between internal 32bit counters and
> userspace-visible 64bit (or larger) counters.  This
> component would need to be active often enough that the
> counters don't wrap without detection and so that userspace
> will see sufficiently accurate numbers.

Very interesting, the same thing that "was supposed to be done" in user space, 
but modular and in the kernel itself...I am impressed.

> My thought would be to use 96bits for each counter.  In-kernel
> code would run periodically doing something like this:
>
> 	curval = counter.in_kernel;
> 			/* get it in a register for atomicity */
> 	if (curval < counter.user_low)
> 		++counter.user_high;
> 	counter.user_low = curval;
>
> This code would run every N jiffies or be in a high priority
> kernel thread.  As an in-kernel service it could loop over a
> set of counters that have been registered with it.  If
> needed you could even have user_high be larger than 32 bits.
>
> It could even be possible to make the code accessing the
> userspace counter fall-back to the kernel one if the 64bit
> counter is zero.  That way registration could potentially be
> userspace triggered.
>
> This is just the acorn of an idea.  It does mean that
> userspace visible counters will not have instantaneous
> resolution but it seems to me that HZ should be more than
> tight enough.  There are certainly other ways to achieve
> this and implementation should take into account cache
> effects.

Overall, great idea! 

We basically have a choice:

- - 32-bit counters with overflows every 4GB and instantaneous stats
- - 64-bit counters with overflows every 16EB (2^64 bytes) and possibility of
stats being off a bit

Jeff.

- -- 
*NOTE: This message is ROT-13 encrypted twice for extra protection*
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.2 (GNU/Linux)

iD8DBQE/IbSPwFP0+seVj/4RAgpoAKCZm4eswdJ+iPJZdsvlhUGXyfJZYACfVwyl
4dIfHzaufhuGSMFt2ZDd5Vg=
=iVm4
-----END PGP SIGNATURE-----



* Re: Net device byte statistics
  2003-07-25 22:51               ` Jeff Sipek
@ 2003-07-26  0:08                 ` Ben Greear
  2003-07-26  0:44                   ` jw schultz
  0 siblings, 1 reply; 16+ messages in thread
From: Ben Greear @ 2003-07-26  0:08 UTC (permalink / raw)
  To: Jeff Sipek; +Cc: jw schultz, linux-kernel

Jeff Sipek wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
> 
> On Friday 25 July 2003 17:55, jw schultz wrote:
> 

>>My thought would be to use 96bits for each counter.  In-kernel
>>code would run periodically doing something like this:
>>
>>	curval = counter.in_kernel;
>>			/* get it in a register for atomicity */
>>	if (curval < counter.user_low)
>>		++counter.user_high;
>>	counter.user_low = curval;

What about every 30 seconds or so, detect wraps, and bump the 'high' counter
if it wraps.  (Check more often if you can wrap more than once in 30 secs).

Then, upon read by user-space (or whatever needs 64-bit counters):

1) check wrap
2) grab low bits and OR them with the high bits.
3) check wrap again.  If wrap happened, try again.  Assumption is it could never wrap
    more than once during the time you are checking.

I think this could give us very low overhead, and extremely precise 64-bit
reads.  And, I think it would not need locks in the fast path..but I could
also be missing something :)
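
The three read steps can be sketched in C roughly as follows. This is a
single-threaded model with made-up names, assuming (as stated above) that the
counter cannot wrap more than once while a read is in progress. It leaves out
the serialization that concurrent readers and the periodic checker would need
between themselves.

```c
#include <stdint.h>

struct wide_counter {
    volatile uint32_t low;  /* fast-path counter, free to wrap */
    uint32_t high;          /* number of wraps detected so far */
    uint32_t last_low;      /* low value at the previous wrap check */
};

/* Returns 1 and bumps 'high' if 'low' went backwards since the last check. */
int check_wrap(struct wide_counter *c)
{
    uint32_t cur = c->low;
    int wrapped = cur < c->last_low;

    if (wrapped)
        c->high++;
    c->last_low = cur;
    return wrapped;
}

uint64_t read_wide(struct wide_counter *c)
{
    uint64_t v;

    do {
        check_wrap(c);                           /* 1) check wrap */
        v = ((uint64_t)c->high << 32) | c->low;  /* 2) combine high and low */
    } while (check_wrap(c));                     /* 3) wrapped meanwhile: retry */
    return v;
}
```

The writer still only increments the 32-bit low word; all extra work lands on
the reader and the periodic checker.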




-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com




* Re: Net device byte statistics
  2003-07-26  0:08                 ` Ben Greear
@ 2003-07-26  0:44                   ` jw schultz
  0 siblings, 0 replies; 16+ messages in thread
From: jw schultz @ 2003-07-26  0:44 UTC (permalink / raw)
  To: linux-kernel

On Fri, Jul 25, 2003 at 05:08:46PM -0700, Ben Greear wrote:
> Jeff Sipek wrote:
> >-----BEGIN PGP SIGNED MESSAGE-----
> >Hash: SHA1
> >
> >On Friday 25 July 2003 17:55, jw schultz wrote:
> >
> 
> >>My thought would be to use 96bits for each counter.  In-kernel
> >>code would run periodically doing something like this:
> >>
> >>	curval = counter.in_kernel;
> >>			/* get it in a register for atomicity */
> >>	if (curval < counter.user_low)
> >>		++counter.user_high;
> >>	counter.user_low = curval;
> 
> What about every 30 seconds or so, detect wraps, and bump the 'high' counter
> if it wraps.  (Check more often if you can wrap more than once in 30 secs).

Yes, how often the component needs to run will depend on the
fastest counter.

> Then, upon read by user-space (or whatever needs 64-bit counters):
> 
> 1) check wrap
> 2) grab low bits and OR them with the high bits.
> 3) check wrap again.  If wrap happened, try again.  Assumption is it could 
> never wrap
>    more than once during the time you are checking.

If you need to have userspace get instantaneous values it
would be more efficient to have userspace do the
update_64bit_counter code for just its counter than to have
multiple wrap checks.

> I think this could give us very low overhead, and extremely precise 64-bit
> reads.  And, I think it would not need locks in the fast path..but I could
> also be missing something :)

Per-cpu counters.  If this is done a variant of this for
per-cpu counters would be helpful.  Per-cpu counters have
the advantage of reducing cache-line bouncing.  I don't
think per-cpu counters should be used as a band-aid
(elasto-plast) for counter wrapping.  Besides, how many
12GHz 4-way hyperthreaded (shared cache) CPUs do you need?
And if you only have one should you have per-cpu counters?
I don't think so.  

-- 
________________________________________________________________
	J.W. Schultz            Pegasystems Technologies
	email address:		jw@pegasys.ws

		Remember Cernan and Schmitt

