* Workaround for wrapping loadaverage
@ 2004-11-08  0:19 Patrick Mau
  2004-11-08  9:27 ` Andrew Morton
  0 siblings, 1 reply; 11+ messages in thread
From: Patrick Mau @ 2004-11-08  0:19 UTC (permalink / raw)
  To: Linux Kernel

Hello everyone,

in a previous message archived at

http://www.ussg.iu.edu/hypermail/linux/kernel/0410.2/2950.html

I described a problem with a wrapping load average on my SMP system.
The following small userspace load simulation exactly matches the
numbers I am seeing.

We can only account for 1024 runnable processes, since we have 22 bits of
precision. I would like to suggest a patch to calc_load in kernel/timer.c
that would limit the number of active tasks:


if (active_tasks > 1024 * FIXED_1)
	active_tasks = 1024 * FIXED_1;


I am aware that this is not a real fix. The wrapping happens when using
threaded applications (Java/JBoss in my case). Below you'll find a small
userspace simulation.

I would really like to provide a real fix, but I couldn't figure out
what went wrong.

Thanks for any feedback,
Patrick


/* Sample code copied from include/linux/sched.h and kernel/timer.c */

#include <stdio.h>

#define FSHIFT	11		/* 11 bit precision */
#define FIXED_1	(1 << FSHIFT)	/* 1.0 as fixed-point */

#define EXP_1	1884	/* 1/exp(5sec/1min)  */
#define EXP_5	2014	/* 1/exp(5sec/5min)  */
#define EXP_15	2037	/* 1/exp(5sec/15min) */

#define CALC_LOAD(load, exp, n) \
		load *= exp; \
		load += n*(FIXED_1-exp); \
		load >>= FSHIFT;

static unsigned long avenrun[3];

/* normal load spike and one bogus ~0 sample to trigger the wrap */
static unsigned long tasks[8] = {
	0, 1, 0, ~0UL, 0, 0, 0, 0
};

static void calc_load(unsigned long nr_active)
{
	/* convert the task count to fixed point, as the kernel does */
	nr_active <<= FSHIFT;

	CALC_LOAD(avenrun[0], EXP_1, nr_active);
	CALC_LOAD(avenrun[1], EXP_5, nr_active);
	CALC_LOAD(avenrun[2], EXP_15, nr_active);
}

int main(int argc, char **argv)
{
	int i, j;

	for (i = 0; i < 8; i++) {	/* index for tasks[] */
		/* 24 calculations per load change */
		for (j = 0; j < 24; j++) {
			calc_load(tasks[i]);

			printf("%.2f %.2f %.2f\n",
				   (float) avenrun[0] / FIXED_1, 
				   (float) avenrun[1] / FIXED_1,
				   (float) avenrun[2] / FIXED_1);
		}
	}

	return 0;
}
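
To reproduce, save the above as, say, loadsim.c and build it with something
like:

	gcc -Wall -o loadsim loadsim.c && ./loadsim

The ~0 entry in tasks[] is the deliberate overflow; the numbers it produces
match the bogus values I see in /proc/loadavg.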


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Workaround for wrapping loadaverage
  2004-11-08  0:19 Workaround for wrapping loadaverage Patrick Mau
@ 2004-11-08  9:27 ` Andrew Morton
  2004-11-08 10:25   ` Patrick Mau
  0 siblings, 1 reply; 11+ messages in thread
From: Andrew Morton @ 2004-11-08  9:27 UTC (permalink / raw)
  To: Patrick Mau; +Cc: linux-kernel

Patrick Mau <mau@oscar.ping.de> wrote:
>
> In a previous message archived at
> 
>  http://www.ussg.iu.edu/hypermail/linux/kernel/0410.2/2950.html
> 
>  I described a problem with a wrapping load average on my SMP system.
>  The following small userspace load simulation exactly matches the
>  numbers I am seeing.
> 
>  We can only account for 1024 runnable processes, since we have 22 bits of
>  precision. I would like to suggest a patch to calc_load in kernel/timer.c
>  that would limit the number of active tasks:
> 
> 
>  if (active_tasks > 1024 * FIXED_1)
>  	active_tasks = 1024 * FIXED_1;
> 

It's better than wrapping to zero...

Why do we need 11 bits after the binary point?

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Workaround for wrapping loadaverage
  2004-11-08  9:27 ` Andrew Morton
@ 2004-11-08 10:25   ` Patrick Mau
  2004-11-08 23:50     ` Andrew Morton
  0 siblings, 1 reply; 11+ messages in thread
From: Patrick Mau @ 2004-11-08 10:25 UTC (permalink / raw)
  To: Linux Kernel; +Cc: Patrick Mau

On Mon, Nov 08, 2004 at 01:27:07AM -0800, Andrew Morton wrote:
> Patrick Mau <mau@oscar.ping.de> wrote:
> >
> >  We can only account for 1024 runnable processes, since we have 22 bits of
> >  precision. I would like to suggest a patch to calc_load in kernel/timer.c
> 
> It's better than wrapping to zero...
> 
> Why do we need 11 bits after the binary point?

I tried various other combinations; the most interesting alternative was
8 bits of precision. The exponential values would be:

1 / e^(5/60)  * 256 = 235.53
1 / e^(5/300) * 256 = 251.76
1 / e^(5/900) * 256 = 254.58

If you were to use 236, 252 and 255, the last two load calculations would
get optimized into register shifts. The precision would be worse, but I
personally don't mind losing the fraction.
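
In other words, something like this (untested, only to illustrate what I
mean; the rounded constants are the only real change from sched.h):

#define FSHIFT	8		/* 8 bit precision */
#define FIXED_1	(1 << FSHIFT)	/* 1.0 == 256 */

#define EXP_1	236	/* exp(-5/60)  * 256 == 235.53, rounded up */
#define EXP_5	252	/* exp(-5/300) * 256 == 251.76, rounded up */
#define EXP_15	255	/* exp(-5/900) * 256 == 254.58, rounded up */

/*
 * With these values the compiler can turn the multiplies in CALC_LOAD
 * into shifts and subtractions, e.g.:
 *
 *	load * 252 == (load << 8) - (load << 2)
 *	load * 255 == (load << 8) -  load
 */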

Best regards,
Patrick

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Workaround for wrapping loadaverage
  2004-11-08 10:25   ` Patrick Mau
@ 2004-11-08 23:50     ` Andrew Morton
  2004-11-09  0:43       ` Patrick Mau
  0 siblings, 1 reply; 11+ messages in thread
From: Andrew Morton @ 2004-11-08 23:50 UTC (permalink / raw)
  To: Patrick Mau; +Cc: linux-kernel


(Please don't remove people from Cc:.  Just do reply-to-all).

Patrick Mau <mau@oscar.ping.de> wrote:
>
> On Mon, Nov 08, 2004 at 01:27:07AM -0800, Andrew Morton wrote:
> > Patrick Mau <mau@oscar.ping.de> wrote:
> > >
> > >  We can only account for 1024 runnable processes, since we have 22 bits of
> > >  precision. I would like to suggest a patch to calc_load in kernel/timer.c
> > 
> > It's better than wrapping to zero...
> > 
> > Why do we need 11 bits after the binary point?
> 
> I tried various other combinations; the most interesting alternative was
> 8 bits of precision. The exponential values would be:
> 
> 1 / e^(5/60)  * 256 = 235.53
> 1 / e^(5/300) * 256 = 251.76
> 1 / e^(5/900) * 256 = 254.58
> 
> If you were to use 236, 252 and 255, the last two load calculations would
> get optimized into register shifts. The precision would be worse, but I
> personally don't mind losing the fraction.

What would be the impact on the precision if we were to use 8 bits of
fraction?

An upper limit of 1024 tasks sounds a bit squeezy.  Even 8192 is a bit
uncomfortable.  Maybe we should just reimplement the whole thing, perhaps
in terms of tuples of 32-bit values: 32 bits each side of the binary point?
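
Something like this, perhaps (a completely untested sketch with made-up
names and userspace types, just to illustrate the arithmetic):

#include <stdint.h>

#define FSHIFT64	32
#define FIXED64_1	(1ULL << FSHIFT64)

/* exp(-5sec/1min) etc., scaled by 2^32 instead of 2^11 */
#define EXP64_1		((uint64_t)(0.92004441 * 4294967296.0))
#define EXP64_5		((uint64_t)(0.98347145 * 4294967296.0))
#define EXP64_15	((uint64_t)(0.99445985 * 4294967296.0))

/*
 * new_load = (load * exp + (nr_active << 32) * (FIXED64_1 - exp)) >> 32,
 * split up so the intermediates fit in 64 bits for any sane task count.
 */
static uint64_t calc_load64(uint64_t load, uint64_t exp, uint64_t nr_active)
{
	uint64_t hi = load >> 32;		/* integer part */
	uint64_t lo = load & 0xffffffffULL;	/* fractional part */

	/* (load * exp) >> 32, without needing a 96-bit intermediate */
	load = hi * exp + ((lo * exp) >> 32);
	/* ((nr_active << 32) * (FIXED64_1 - exp)) >> 32 */
	load += nr_active * (FIXED64_1 - exp);
	return load;
}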

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Workaround for wrapping loadaverage
  2004-11-08 23:50     ` Andrew Morton
@ 2004-11-09  0:43       ` Patrick Mau
  2004-11-09 18:51         ` Herbert Poetzl
  0 siblings, 1 reply; 11+ messages in thread
From: Patrick Mau @ 2004-11-09  0:43 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Linux Kernel

On Mon, Nov 08, 2004 at 03:50:51PM -0800, Andrew Morton wrote:
> 
> (Please don't remove people from Cc:.  Just do reply-to-all).

Hi Andrew,

sorry, I usually remove people from CC if they're subscribed.

> Patrick Mau <mau@oscar.ping.de> wrote:
> >
> > If you were to use 236, 252 and 255, the last two load calculations would
> > get optimized into register shifts. The precision would be worse, but I
> > personally don't mind losing the fraction.
> 
> What would be the impact on the precision if we were to use 8 bits of
> fraction?

I didn't have time to check again, but I think I ended up with a load of 0.97
using one runnable process because of rounding errors.
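
Something like this quick hack should show exactly where the average
settles for one runnable task at a given precision (rough standalone
sketch, not the kernel code):

#include <stdio.h>

static unsigned long settle(unsigned int shift, unsigned long exp)
{
	unsigned long fixed_1 = 1UL << shift;
	unsigned long load = 0, n = 1 * fixed_1;	/* one runnable task */
	int i;

	/* far more 5-second intervals than needed to converge */
	for (i = 0; i < 1000; i++)
		load = (load * exp + n * (fixed_1 - exp)) >> shift;

	return load;
}

int main(void)
{
	/* 11 bits with EXP_1 = 1884 (current) vs. 8 bits with EXP_1 = 236 */
	printf("11-bit: %.3f\n", (double)settle(11, 1884) / 2048.0);
	printf(" 8-bit: %.3f\n", (double)settle(8, 236) / 256.0);
	return 0;
}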

> An upper limit of 1024 tasks sounds a bit squeezy.  Even 8192 is a bit
> uncomfortable.  Maybe we should just reimplement the whole thing, perhaps
> in terms of tuples of 32-bit values: 32 bits each side of the binary point?

We re-calculate the load every 5 seconds. I think it would be OK to
use more bits/registers; it's not called that frequently.

It's 1:30 AM and I had a rough working day; maybe I'll prepare a little patch
tomorrow. I think that having 8192 _runnable_ processes seems a bit unusual,
but we also account for uninterruptible processes. Maybe there was some
swap/IO storm that triggered the initial overflow; I'll have to check that
first.

Best regards,
Patrick

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Workaround for wrapping loadaverage
  2004-11-09  0:43       ` Patrick Mau
@ 2004-11-09 18:51         ` Herbert Poetzl
  2004-11-09 21:49           ` Con Kolivas
  2004-11-10  7:07           ` Nick Piggin
  0 siblings, 2 replies; 11+ messages in thread
From: Herbert Poetzl @ 2004-11-09 18:51 UTC (permalink / raw)
  To: Patrick Mau; +Cc: Andrew Morton, Linux Kernel

On Tue, Nov 09, 2004 at 01:43:35AM +0100, Patrick Mau wrote:
> On Mon, Nov 08, 2004 at 03:50:51PM -0800, Andrew Morton wrote:
> > 
> > (Please don't remove people from Cc:.  Just do reply-to-all).
> 
> Hi Andrew,
> 
> sorry, I usually remove people from CC if they're subscribed.
> 
> > Patrick Mau <mau@oscar.ping.de> wrote:
> > >
> > > If you were to use 236, 252 and 255, the last two load calculations would
> > > get optimized into register shifts. The precision would be worse, but I
> > > personally don't mind losing the fraction.
> > 
> > What would be the impact on the precision if we were to use 8 bits of
> > fraction?
> 
> I didn't have time to check again, but I think I ended up with a load of 0.97
> using one runnable process because of rounding errors.
> 
> > An upper limit of 1024 tasks sounds a bit squeezy.  Even 8192 is a bit
> > uncomfortable.  Maybe we should just reimplement the whole thing, perhaps
> > in terms of tuples of 32-bit values: 32 bits each side of the binary point?
> 
> We re-calculate the load every 5 seconds. I think it would be OK to
> use more bits/registers; it's not called that frequently.

hmm ...

	do_timer() -> update_times() -> calc_load()

so not exactly every 5 seconds ...

but I agree that a higher resolution would be a good
idea ... also doing the calculation when the number
of running/uninterruptible processes has changed would
be a good idea ...

(I implemented something similar for linux-vserver, 
 if there is interest, I could adapt it for mainline)

> It's 1:30 AM and I had a rough working day; maybe I'll prepare a little patch
> tomorrow. I think that having 8192 _runnable_ processes seems a bit unusual,
> but we also account for uninterruptible processes. Maybe there was some
> swap/IO storm that triggered the initial overflow; I'll have to check that
> first.

best,
Herbert

> Best regards,
> Patrick

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Workaround for wrapping loadaverage
  2004-11-09 18:51         ` Herbert Poetzl
@ 2004-11-09 21:49           ` Con Kolivas
  2004-11-10  6:20             ` Herbert Poetzl
  2004-11-10  7:07           ` Nick Piggin
  1 sibling, 1 reply; 11+ messages in thread
From: Con Kolivas @ 2004-11-09 21:49 UTC (permalink / raw)
  To: Herbert Poetzl; +Cc: Patrick Mau, Andrew Morton, Linux Kernel

Herbert Poetzl wrote:
> but I agree that a higher resolution would be a good
> idea ... also doing the calculation when the number
> of running/uninterruptible processes has changed would
> be a good idea ...

This could get very expensive. A modern CPU can do about 700,000 context
switches per second with a real task under the current Linux kernel, so I'd
suggest not doing this.

Cheers,
Con


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Workaround for wrapping loadaverage
  2004-11-09 21:49           ` Con Kolivas
@ 2004-11-10  6:20             ` Herbert Poetzl
  2004-11-10  9:57               ` Con Kolivas
  0 siblings, 1 reply; 11+ messages in thread
From: Herbert Poetzl @ 2004-11-10  6:20 UTC (permalink / raw)
  To: Con Kolivas; +Cc: Patrick Mau, Andrew Morton, Linux Kernel

On Wed, Nov 10, 2004 at 08:49:41AM +1100, Con Kolivas wrote:
> Herbert Poetzl wrote:
> >but I agree that a higher resolution would be a good
> >idea ... also doing the calculation when the number
> >of running/uninterruptible processes has changed would
> >be a good idea ...
> 
> This could get very expensive. A modern CPU can do about 700,000 context
> switches per second with a real task under the current Linux kernel, so I'd
> suggest not doing this.

hmm, right, it can. Do you have any stats about the
'typical' workload behaviour?

do you know the average time between changes of 
nr_running and nr_uninterruptible?

TIA,
Herbert

> Cheers,
> Con



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Workaround for wrapping loadaverage
  2004-11-09 18:51         ` Herbert Poetzl
  2004-11-09 21:49           ` Con Kolivas
@ 2004-11-10  7:07           ` Nick Piggin
  2004-11-10 23:31             ` Herbert Poetzl
  1 sibling, 1 reply; 11+ messages in thread
From: Nick Piggin @ 2004-11-10  7:07 UTC (permalink / raw)
  To: Herbert Poetzl; +Cc: Patrick Mau, Andrew Morton, Linux Kernel

Herbert Poetzl wrote:
> On Tue, Nov 09, 2004 at 01:43:35AM +0100, Patrick Mau wrote:
> 

>>We re-calculate the load every 5 seconds. I think it would be OK to
>>use more bits/registers; it's not called that frequently.
> 
> 
> hmm ...
> 
> 	do_timer() -> update_times() -> calc_load()
> 
> so not exactly every 5 seconds ...
> 

calc_load() -> messing with LOAD_FREQ -> once every 5 seconds, no?
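
From memory, kernel/timer.c does roughly this (paraphrased, so take the
details with a grain of salt):

	static inline void calc_load(unsigned long ticks)
	{
		unsigned long active_tasks;	/* fixed-point */
		static int count = LOAD_FREQ;	/* LOAD_FREQ == 5*HZ */

		count -= ticks;
		if (count < 0) {
			count += LOAD_FREQ;
			active_tasks = count_active_tasks();
			CALC_LOAD(avenrun[0], EXP_1, active_tasks);
			CALC_LOAD(avenrun[1], EXP_5, active_tasks);
			CALC_LOAD(avenrun[2], EXP_15, active_tasks);
		}
	}

So it gets called on every tick, but only recomputes the averages once the
5-second counter has run out.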

I think doing 32/32 bit calculations would be fine.

> but I agree that a higher resolution would be a good
> idea ... also doing the calculation when the number
> of running/uninterruptible processes has changed would
> be a good idea ...
> 

Apart from the problem Con pointed out, you'd need a fancier algorithm
to calculate load because your interval isn't going to be fixed, so you
need to factor that in when calculating the area under the 'curve'
(loadavg).

I think the good ol' 5 seconds should be alright.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Workaround for wrapping loadaverage
  2004-11-10  6:20             ` Herbert Poetzl
@ 2004-11-10  9:57               ` Con Kolivas
  0 siblings, 0 replies; 11+ messages in thread
From: Con Kolivas @ 2004-11-10  9:57 UTC (permalink / raw)
  To: Herbert Poetzl; +Cc: Patrick Mau, Andrew Morton, Linux Kernel

Herbert Poetzl wrote:
> On Wed, Nov 10, 2004 at 08:49:41AM +1100, Con Kolivas wrote:
> 
>>Herbert Poetzl wrote:
>>
>>>but I agree that a higher resolution would be a good
>>>idea ... also doing the calculation when the number
>>>of running/uninterruptible processes has changed would
>>>be a good idea ...
>>
>>This could get very expensive. A modern CPU can do about 700,000 context
>>switches per second with a real task under the current Linux kernel, so I'd
>>suggest not doing this.
> 
> 
> hmm, right, it can. Do you have any stats about the
> 'typical' workload behaviour?

How long is a piece of string? It depends entirely on your workload. On
a desktop machine just switching applications pushes it to 10,000.
Basically you end up making it an O(n) calculation: its (albeit small)
overhead grows proportionately to the context-switch load, which is usually
significantly higher than the system load.

> do you know the average time between changes of 
> nr_running and nr_uninterruptible?

Same answer. It depends entirely on the workload and on whether your
running tasks sleep at all or not (hint - most do). While it will be a
lower number than the number of context switches, it can potentially be
just as high with the right sort of threads (think server, network type
stuff).

> TIA,
> Herbert

Cheers,
Con


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Workaround for wrapping loadaverage
  2004-11-10  7:07           ` Nick Piggin
@ 2004-11-10 23:31             ` Herbert Poetzl
  0 siblings, 0 replies; 11+ messages in thread
From: Herbert Poetzl @ 2004-11-10 23:31 UTC (permalink / raw)
  To: Nick Piggin; +Cc: Patrick Mau, Andrew Morton, Linux Kernel

On Wed, Nov 10, 2004 at 06:07:25PM +1100, Nick Piggin wrote:
> Herbert Poetzl wrote:
> >On Tue, Nov 09, 2004 at 01:43:35AM +0100, Patrick Mau wrote:
> >
> 
> >>We re-calculate the load every 5 seconds. I think it would be OK to
> >>use more bits/registers; it's not called that frequently.
> >
> >
> >hmm ...
> >
> >	do_timer() -> update_times() -> calc_load()
> >
> >so not exactly every 5 seconds ...
> 
> calc_load() -> messing with LOAD_FREQ -> once every 5 seconds, no?

usually yes ...

> I think doing 32/32 bit calculations would be fine.

agreed ...

> >but I agree that a higher resolution would be a good
> >idea ... also doing the calculation when the number
> >of running/uninterruptible processes has changed would
> >be a good idea ...
> >
> 
> Apart from the problem Con pointed out, you'd need a fancier algorithm
> to calculate load because your interval isn't going to be fixed, so you
> need to factor that in when calculating the area under the 'curve'
> (loadavg).

yes, something like this:

  static uint32_t
  update_loadavg(uint32_t load, int wsize, int delta, int n)
  {
	unsigned long long calc;

	/* the whole window has elapsed: the average is just the count */
	if (delta >= wsize)
		return (n << FSHIFT);

	/* weight the current count and the old average by elapsed time */
	calc = ((unsigned long long)delta * n) << FSHIFT;
	calc += (unsigned long long)(wsize - delta) * load;
	do_div(calc, wsize);
	return calc;
  }

> I think the good ol' 5 seconds should be alright.

probably sufficient, yes ...

best,
Herbert


^ permalink raw reply	[flat|nested] 11+ messages in thread

Thread overview: 11+ messages
2004-11-08  0:19 Workaround for wrapping loadaverage Patrick Mau
2004-11-08  9:27 ` Andrew Morton
2004-11-08 10:25   ` Patrick Mau
2004-11-08 23:50     ` Andrew Morton
2004-11-09  0:43       ` Patrick Mau
2004-11-09 18:51         ` Herbert Poetzl
2004-11-09 21:49           ` Con Kolivas
2004-11-10  6:20             ` Herbert Poetzl
2004-11-10  9:57               ` Con Kolivas
2004-11-10  7:07           ` Nick Piggin
2004-11-10 23:31             ` Herbert Poetzl
