linux-kernel.vger.kernel.org archive mirror
* [PATCH] procfs: export context switch counts in /proc/*/stat
@ 2006-12-18 23:50 David Wragg
  2006-12-19  6:39 ` Benjamin LaHaise
  0 siblings, 1 reply; 12+ messages in thread
From: David Wragg @ 2006-12-18 23:50 UTC
  To: linux-kernel

The kernel already maintains context switch counts for each task and
exposes them through getrusage(2).  These counters could also be used
more generally to track which processes on the system are active
(i.e. getting scheduled to run), but getrusage is too constrained for
that: it only reports on the calling process and its terminated
children, not on arbitrary processes.
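
For reference, here is a minimal sketch of the existing interface. The helper name self_ctxsw is illustrative. getrusage(2) takes only a `who` selector and no pid, so a monitoring tool cannot point it at another process:

```c
#include <sys/resource.h>

/*
 * Read the calling process's own context switch counts.
 * getrusage() has no pid argument: RUSAGE_SELF covers the caller and
 * RUSAGE_CHILDREN covers its terminated, waited-for children, so this
 * cannot be aimed at an arbitrary process on the system.
 */
static int self_ctxsw(long *voluntary, long *involuntary)
{
    struct rusage ru;

    if (getrusage(RUSAGE_SELF, &ru) != 0)
        return -1;
    *voluntary = ru.ru_nvcsw;     /* gave up the CPU, e.g. to block */
    *involuntary = ru.ru_nivcsw;  /* preempted by the scheduler */
    return 0;
}
```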

This patch (against 2.6.19/2.6.19.1) adds the four context switch
values (voluntary context switches, involuntary context switches, and
the same values accumulated from terminated child processes) to the
end of /proc/*/stat, similarly to min_flt, maj_flt and the time used
values.
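
The following sketch shows how a user-space tool might read the new fields, assuming a kernel with this patch applied. Counting from 1, the patched format string places nvcsw, cnvcsw, nivcsw and cnivcsw in fields 43-46, directly after delayacct_blkio_ticks (field 42). On kernels without the patch, those positions may be absent or hold other values. The struct and function names are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* Fields 43-46 of /proc/<pid>/stat in the layout this patch produces. */
struct stat_ctxsw {
    unsigned long nvcsw;    /* field 43: voluntary switches */
    unsigned long cnvcsw;   /* field 44: voluntary, reaped children */
    unsigned long nivcsw;   /* field 45: involuntary switches */
    unsigned long cnivcsw;  /* field 46: involuntary, reaped children */
};

/*
 * Parse one line of a patched /proc/<pid>/stat.
 * Returns 0 on success, -1 if the line is malformed or too short.
 */
static int parse_stat_ctxsw(const char *line, struct stat_ctxsw *out)
{
    /* comm (field 2) may contain spaces or ')', so resume after the
     * LAST ')' on the line; the next token is field 3 (state). */
    const char *p = strrchr(line, ')');
    unsigned long v[4];
    int field, got = 0;

    if (p == NULL)
        return -1;
    p++;

    for (field = 3; got < 4; field++) {
        while (*p == ' ')
            p++;
        if (*p == '\0' || *p == '\n')
            return -1;                  /* ran out of fields */
        if (field >= 43) {
            char *end;

            v[got++] = strtoul(p, &end, 10);
            if (end == p)
                return -1;              /* field is not a number */
            p = end;
        } else {
            while (*p != '\0' && *p != ' ' && *p != '\n')
                p++;                    /* skip fields 3..42 */
        }
    }
    out->nvcsw = v[0];
    out->cnvcsw = v[1];
    out->nivcsw = v[2];
    out->cnivcsw = v[3];
    return 0;
}
```

Scanning from the last ')' matters because the command name is printed verbatim inside parentheses and may itself contain spaces or parentheses.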

Signed-off-by: David Wragg <david@wragg.org>

diff -uprN --exclude='*.o' --exclude='*~' --exclude='.*' linux-2.6.19.1/fs/proc/array.c linux-2.6.19.1.build/fs/proc/array.c
--- linux-2.6.19.1/fs/proc/array.c	2006-12-18 14:35:36.000000000 +0000
+++ linux-2.6.19.1.build/fs/proc/array.c	2006-12-18 14:43:21.000000000 +0000
@@ -327,6 +327,8 @@ static int do_task_stat(struct task_stru
 	unsigned long cmin_flt = 0, cmaj_flt = 0;
 	unsigned long  min_flt = 0,  maj_flt = 0;
 	cputime_t cutime, cstime, utime, stime;
+	unsigned long cnvcsw = 0, cnivcsw = 0;
+	unsigned long  nvcsw = 0,  nivcsw = 0;
 	unsigned long rsslim = 0;
 	char tcomm[sizeof(task->comm)];
 	unsigned long flags;
@@ -369,6 +371,8 @@ static int do_task_stat(struct task_stru
 		cmaj_flt = sig->cmaj_flt;
 		cutime = sig->cutime;
 		cstime = sig->cstime;
+		cnvcsw = sig->cnvcsw;
+		cnivcsw = sig->cnivcsw;
 		rsslim = sig->rlim[RLIMIT_RSS].rlim_cur;
 
 		/* add up live thread stats at the group level */
@@ -379,6 +383,8 @@ static int do_task_stat(struct task_stru
 				maj_flt += t->maj_flt;
 				utime = cputime_add(utime, t->utime);
 				stime = cputime_add(stime, t->stime);
+				nvcsw += t->nvcsw;
+				nivcsw += t->nivcsw;
 				t = next_thread(t);
 			} while (t != task);
 
@@ -386,6 +392,8 @@ static int do_task_stat(struct task_stru
 			maj_flt += sig->maj_flt;
 			utime = cputime_add(utime, sig->utime);
 			stime = cputime_add(stime, sig->stime);
+			nvcsw += sig->nvcsw;
+			nivcsw += sig->nivcsw;
 		}
 
 		sid = sig->session;
@@ -404,6 +412,8 @@ static int do_task_stat(struct task_stru
 		maj_flt = task->maj_flt;
 		utime = task->utime;
 		stime = task->stime;
+		nvcsw = task->nvcsw;
+		nivcsw = task->nivcsw;
 	}
 
 	/* scale priority and nice values from timeslices to -20..20 */
@@ -420,7 +430,7 @@ static int do_task_stat(struct task_stru
 
 	res = sprintf(buffer,"%d (%s) %c %d %d %d %d %d %lu %lu \
 %lu %lu %lu %lu %lu %ld %ld %ld %ld %d 0 %llu %lu %ld %lu %lu %lu %lu %lu \
-%lu %lu %lu %lu %lu %lu %lu %lu %d %d %lu %lu %llu\n",
+%lu %lu %lu %lu %lu %lu %lu %lu %d %d %lu %lu %llu %lu %lu %lu %lu\n",
 		task->pid,
 		tcomm,
 		state,
@@ -465,7 +475,12 @@ static int do_task_stat(struct task_stru
 		task_cpu(task),
 		task->rt_priority,
 		task->policy,
-		(unsigned long long)delayacct_blkio_ticks(task));
+		(unsigned long long)delayacct_blkio_ticks(task),
+		nvcsw,
+		cnvcsw,
+		nivcsw,
+		cnivcsw);
+
 	if(mm)
 		mmput(mm);
 	return res;
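
The new hunks follow the existing pattern for min_flt and maj_flt. Below is a rough user-space model of the thread-group case. The struct and function names are hypothetical stand-ins, not kernel code. The children's totals are copied from the signal struct, the live threads' counters are summed, and the totals left behind by threads that have already exited are added on top. The other branch (the hunk adding `nvcsw = task->nvcsw;`) simply copies the single task's own counters.

```c
#include <stddef.h>

/* Hypothetical stand-in for the per-thread counters (task->nvcsw, ...). */
struct model_thread {
    unsigned long nvcsw, nivcsw;
};

/* Hypothetical stand-in for the signal_struct counters the patch reads. */
struct model_signal {
    unsigned long nvcsw, nivcsw;    /* from threads that have exited */
    unsigned long cnvcsw, cnivcsw;  /* from reaped child processes */
};

struct group_ctxsw {
    unsigned long nvcsw, nivcsw, cnvcsw, cnivcsw;
};

/* Model of the thread-group branch of do_task_stat() after the patch. */
static struct group_ctxsw sum_group_ctxsw(const struct model_thread *threads,
                                          size_t nthreads,
                                          const struct model_signal *sig)
{
    /* Children's totals are copied as-is (cnvcsw = sig->cnvcsw). */
    struct group_ctxsw r = { 0, 0, sig->cnvcsw, sig->cnivcsw };
    size_t i;

    /* Sum the live threads (the do { ... } while (t != task) loop). */
    for (i = 0; i < nthreads; i++) {
        r.nvcsw += threads[i].nvcsw;
        r.nivcsw += threads[i].nivcsw;
    }
    /* Add what exited threads left behind (nvcsw += sig->nvcsw). */
    r.nvcsw += sig->nvcsw;
    r.nivcsw += sig->nivcsw;
    return r;
}
```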



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] procfs: export context switch counts in /proc/*/stat
  2006-12-18 23:50 [PATCH] procfs: export context switch counts in /proc/*/stat David Wragg
@ 2006-12-19  6:39 ` Benjamin LaHaise
  2006-12-19 11:47   ` David Wragg
  0 siblings, 1 reply; 12+ messages in thread
From: Benjamin LaHaise @ 2006-12-19  6:39 UTC
  To: David Wragg; +Cc: linux-kernel

On Mon, Dec 18, 2006 at 11:50:08PM +0000, David Wragg wrote:
> This patch (against 2.6.19/2.6.19.1) adds the four context switch
> values (voluntary context switches, involuntary context switches, and
> the same values accumulated from terminated child processes) to the
> end of /proc/*/stat, similarly to min_flt, maj_flt and the time used
> values.

Please put these into new files, as the stat files in /proc are 
horribly overloaded and have always been somewhat problematic 
when it comes to changing how things are reported due to internal 
changes to the kernel.  Cheers,

		-ben
-- 
"Time is of no importance, Mr. President, only life is important."
Don't Email: <dont@kvack.org>.


* Re: [PATCH] procfs: export context switch counts in /proc/*/stat
  2006-12-19  6:39 ` Benjamin LaHaise
@ 2006-12-19 11:47   ` David Wragg
  0 siblings, 0 replies; 12+ messages in thread
From: David Wragg @ 2006-12-19 11:47 UTC
  To: Benjamin LaHaise, linux-kernel

Benjamin LaHaise <bcrl@kvack.org> writes:
> On Mon, Dec 18, 2006 at 11:50:08PM +0000, David Wragg wrote:
>> This patch (against 2.6.19/2.6.19.1) adds the four context switch
>> values (voluntary context switches, involuntary context switches, and
>> the same values accumulated from terminated child processes) to the
>> end of /proc/*/stat, similarly to min_flt, maj_flt and the time used
>> values.
>
> Please put these into new files, as the stat files in /proc are 
> horribly overloaded and have always been somewhat problematic 
> when it comes to changing how things are reported due to internal 
> changes to the kernel.  Cheers,

The delay accounting value was added to the end of /proc/pid/stat back
in July without discussion, so I assumed this approach was still
considered satisfactory.

Putting just these four values into a new file would seem a little
odd, since they have a lot in common with the other getrusage values
that are already in /proc/pid/stat.  One possibility is to add
/proc/pid/rusage, mirroring the full struct rusage in text form, since
struct rusage is already part of the kernel ABI (though Linux doesn't
fill in half of the values).
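
To make the /proc/pid/rusage idea concrete, here is a sketch that renders every field of struct rusage as one "name value" pair per line. The file name, line format and helper are assumptions for illustration. This patch does not define any such file.

```c
#include <stdio.h>
#include <sys/resource.h>

/*
 * Hypothetical text rendering of a whole struct rusage, one
 * "name value" pair per line. Returns snprintf()'s result: the
 * length needed, or a negative value on error.
 */
static int format_rusage(char *buf, size_t len, const struct rusage *ru)
{
    return snprintf(buf, len,
                    "utime %ld.%06ld\n"
                    "stime %ld.%06ld\n"
                    "maxrss %ld\n"
                    "ixrss %ld\n"
                    "idrss %ld\n"
                    "isrss %ld\n"
                    "minflt %ld\n"
                    "majflt %ld\n"
                    "nswap %ld\n"
                    "inblock %ld\n"
                    "oublock %ld\n"
                    "msgsnd %ld\n"
                    "msgrcv %ld\n"
                    "nsignals %ld\n"
                    "nvcsw %ld\n"
                    "nivcsw %ld\n",
                    (long)ru->ru_utime.tv_sec, (long)ru->ru_utime.tv_usec,
                    (long)ru->ru_stime.tv_sec, (long)ru->ru_stime.tv_usec,
                    ru->ru_maxrss, ru->ru_ixrss, ru->ru_idrss, ru->ru_isrss,
                    ru->ru_minflt, ru->ru_majflt, ru->ru_nswap,
                    ru->ru_inblock, ru->ru_oublock, ru->ru_msgsnd,
                    ru->ru_msgrcv, ru->ru_nsignals,
                    ru->ru_nvcsw, ru->ru_nivcsw);
}
```

Fields that Linux leaves unfilled (such as ixrss, idrss and isrss) would simply print as 0 in a layout like this.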

Or perhaps it makes sense to reorganize all the values from
/proc/pid/stat and its siblings into a sysfs-like one-value-per-file
structure, though that might introduce atomicity and efficiency issues
(calculating some of the values involves iterating over the threads in
the process; with everything in one file, these loops are folded
together).

Any thoughts?


David


* Re: [PATCH] procfs: export context switch counts in /proc/*/stat
  2006-12-20 17:36   ` Albert Cahalan
@ 2006-12-24  1:40     ` David Wragg
  0 siblings, 0 replies; 12+ messages in thread
From: David Wragg @ 2006-12-24  1:40 UTC
  To: Albert Cahalan; +Cc: linux-kernel, bcrl

"Albert Cahalan" <acahalan@gmail.com> writes:
> The cumulative ones are still not justified though, and I fear they
> may be 64-bit even on i386.

All the context switch counts are unsigned long.

> It turns out that an i386 procps spends
> much of its time doing 64-bit division to parse the damn ASCII crap.
> I suppose I could just skip those fields, but generating them isn't
> too cheap and probably I'd get stuck parsing them for some other
> reason -- having them separate is probably a good idea.

I can't think of a compelling justification for the cumulative context
switch counts.  But I suggest that if the cost of exposing these
values is low enough, they should be exposed anyway, just for the sake
of uniformity (these would be the only two getrusage values not
present in /proc/pid/stat).

If the decimal representation of values in /proc/pid/stat has such
unpleasant overheads, then perhaps that is worth fixing whether or not
the context switch counts are added.  It occurs to me that it would be
easy to add support for a hex version of /proc/pid/stat with very
little additional code, by using an alternate sprintf format string in
fs/proc/array.c:do_task_stat().  I assume that procps could be adapted
quite easily to take advantage of this?
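
The following sketch illustrates the alternate-format-string idea. The hex variant is only a suggestion in this thread, not an existing interface, and both helper names are hypothetical. The producer differs only in its conversion specifiers, and a consumer can fold each hex digit in with a shift and an OR:

```c
#include <stdio.h>

/* Render the four counters in decimal (as the patch does) or in hex. */
static int format_ctxsw(char *buf, size_t len, int hex,
                        unsigned long nvcsw, unsigned long cnvcsw,
                        unsigned long nivcsw, unsigned long cnivcsw)
{
    return snprintf(buf, len,
                    hex ? "%lx %lx %lx %lx\n" : "%lu %lu %lu %lu\n",
                    nvcsw, cnvcsw, nivcsw, cnivcsw);
}

/*
 * Parse one lowercase hex field: each digit costs a shift and an OR.
 * Returns a pointer just past the field, or NULL if no digits were
 * found. No overflow check; this is only a sketch.
 */
static const char *parse_hex_field(const char *p, unsigned long *out)
{
    unsigned long v = 0;
    int digits = 0;

    while (*p == ' ')
        p++;
    for (;; p++, digits++) {
        unsigned int d;

        if (*p >= '0' && *p <= '9')
            d = (unsigned int)(*p - '0');
        else if (*p >= 'a' && *p <= 'f')
            d = (unsigned int)(*p - 'a' + 10);
        else
            break;
        v = (v << 4) | d;
    }
    if (digits == 0)
        return NULL;
    *out = v;
    return p;
}
```

Whether this would measurably cut procps's parsing cost is untested here. The sketch only shows the mechanics.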


David


* Re: [PATCH] procfs: export context switch counts in /proc/*/stat
@ 2006-12-21  6:02 Al Boldi
  0 siblings, 0 replies; 12+ messages in thread
From: Al Boldi @ 2006-12-21  6:02 UTC
  To: linux-kernel

Albert Cahalan wrote:
> On 12/20/06, David Wragg <david@wragg.org> wrote:
> > "Albert Cahalan" <acahalan@gmail.com> writes:
> > > On Mon, Dec 18, 2006 at 11:50:08PM +0000, David Wragg wrote:
> > >> This patch (against 2.6.19/2.6.19.1) adds the four context
> > >> switch values (voluntary context switches, involuntary
> > >> context switches, and the same values accumulated from
> > >> terminated child processes) to the end of /proc/*/stat,
> > >> similarly to min_flt, maj_flt and the time used values.
> > >
> > > Hmmm, OK, do people have a use for these values?
> >
> > My reason for writing the patch was to track which processes are
> > active (i.e. got scheduled to run) by polling these context switch
> > values.  The time used values are not a reliable way to detect process
> > activity on fast machines.  So for example, when sorting by %CPU, top
> > often shows many processes using 0% CPU, despite the fact that these
> > processes are running occasionally.  If top sorted by (%CPU, context
> > switch count delta), it might give a more useful display of which
> > processes are active on the system.
>
> Oh, that'd be great.

It may be great, but it's really only a workaround.  The real fix is in 
changing the current probed proc-timing to an inlined one.

> The cumulative ones are still not justified though, and I fear they
> may be 64-bit even on i386. It turns out that an i386 procps spends
> much of its time doing 64-bit division to parse the damn ASCII crap.
> I suppose I could just skip those fields, but generating them isn't
> too cheap and probably I'd get stuck parsing them for some other
> reason -- having them separate is probably a good idea.

Agreed.  It may also be advisable to add a top3 line to /proc/stat, to
avoid parsing every /proc/*/stat file when only checking which processes
are using the most CPU.


Thanks!

--
Al



* Re: [PATCH] procfs: export context switch counts in /proc/*/stat
  2006-12-20 13:20 ` David Wragg
  2006-12-20 13:48   ` Arjan van de Ven
@ 2006-12-20 17:36   ` Albert Cahalan
  2006-12-24  1:40     ` David Wragg
  1 sibling, 1 reply; 12+ messages in thread
From: Albert Cahalan @ 2006-12-20 17:36 UTC
  To: David Wragg; +Cc: linux-kernel, bcrl

On 12/20/06, David Wragg <david@wragg.org> wrote:
> "Albert Cahalan" <acahalan@gmail.com> writes:
> > On Mon, Dec 18, 2006 at 11:50:08PM +0000, David Wragg wrote:
> >> This patch (against 2.6.19/2.6.19.1) adds the four context
> >> switch values (voluntary context switches, involuntary
> >> context switches, and the same values accumulated from
> >> terminated child processes) to the end of /proc/*/stat,
> >> similarly to min_flt, maj_flt and the time used values.
> >
> > Hmmm, OK, do people have a use for these values?
>
> My reason for writing the patch was to track which processes are
> active (i.e. got scheduled to run) by polling these context switch
> values.  The time used values are not a reliable way to detect process
> activity on fast machines.  So for example, when sorting by %CPU, top
> often shows many processes using 0% CPU, despite the fact that these
> processes are running occasionally.  If top sorted by (%CPU, context
> switch count delta), it might give a more useful display of which
> processes are active on the system.

Oh, that'd be great.

The cumulative ones are still not justified though, and I fear they
may be 64-bit even on i386. It turns out that an i386 procps spends
much of its time doing 64-bit division to parse the damn ASCII crap.
I suppose I could just skip those fields, but generating them isn't
too cheap and probably I'd get stuck parsing them for some other
reason -- having them separate is probably a good idea.


* Re: [PATCH] procfs: export context switch counts in /proc/*/stat
  2006-12-20 14:51       ` Arjan van de Ven
@ 2006-12-20 15:13         ` David Wragg
  0 siblings, 0 replies; 12+ messages in thread
From: David Wragg @ 2006-12-20 15:13 UTC
  To: Arjan van de Ven; +Cc: linux-kernel

Arjan van de Ven <arjan@infradead.org> writes:
> On Wed, 2006-12-20 at 14:38 +0000, David Wragg wrote:
>> (When I try the script, stap complains about the lack of the kernel
>> debuginfo package, which of course doesn't exist for my self-built
>> kernel.  After hunting around on the web for 10 minutes, I'm still no
>> closer to resolving this.  But I look forward to playing with
>> systemtap once I get past that problem.)
>
> what worked for me is copying the "vmlinux" file to /boot as
> /boot/vmlinux-`uname -r`

Thanks, that's got it working.


* Re: [PATCH] procfs: export context switch counts in /proc/*/stat
  2006-12-20 14:38     ` David Wragg
@ 2006-12-20 14:51       ` Arjan van de Ven
  2006-12-20 15:13         ` David Wragg
  0 siblings, 1 reply; 12+ messages in thread
From: Arjan van de Ven @ 2006-12-20 14:51 UTC
  To: David Wragg; +Cc: Albert Cahalan, linux-kernel, bcrl

On Wed, 2006-12-20 at 14:38 +0000, David Wragg wrote:
> Arjan van de Ven <arjan@infradead.org> writes:
> > if all you care about is the number of context switches, you can use
> > the following SystemTap script as well:
> >
> > http://www.fenrus.org/cstop.stp
> 
> Thanks, something similar to that might well have solved my original
> problem.  
> 
> (When I try the script, stap complains about the lack of the kernel
> debuginfo package, which of course doesn't exist for my self-built
> kernel.  After hunting around on the web for 10 minutes, I'm still no
> closer to resolving this.  But I look forward to playing with
> systemtap once I get past that problem.)

what worked for me is copying the "vmlinux" file to /boot as
/boot/vmlinux-`uname -r`

(strace the stap program to see what it tries to load)



-- 
if you want to mail me at work (you don't), use arjan (at) linux.intel.com
Test the interaction between Linux and your BIOS via http://www.linuxfirmwarekit.org



* Re: [PATCH] procfs: export context switch counts in /proc/*/stat
  2006-12-20 13:48   ` Arjan van de Ven
@ 2006-12-20 14:38     ` David Wragg
  2006-12-20 14:51       ` Arjan van de Ven
  0 siblings, 1 reply; 12+ messages in thread
From: David Wragg @ 2006-12-20 14:38 UTC
  To: Arjan van de Ven; +Cc: Albert Cahalan, linux-kernel, bcrl

Arjan van de Ven <arjan@infradead.org> writes:
> if all you care about is the number of context switches, you can use
> the following SystemTap script as well:
>
> http://www.fenrus.org/cstop.stp

Thanks, something similar to that might well have solved my original
problem.  

(When I try the script, stap complains about the lack of the kernel
debuginfo package, which of course doesn't exist for my self-built
kernel.  After hunting around on the web for 10 minutes, I'm still no
closer to resolving this.  But I look forward to playing with
systemtap once I get past that problem.)

Nonetheless, while systemtap might be an argument against adding
per-task context switch counters to the kernel, it doesn't answer the
question: since we already have these counters, why not expose them
in the normal way?


David


* Re: [PATCH] procfs: export context switch counts in /proc/*/stat
  2006-12-20 13:20 ` David Wragg
@ 2006-12-20 13:48   ` Arjan van de Ven
  2006-12-20 14:38     ` David Wragg
  2006-12-20 17:36   ` Albert Cahalan
  1 sibling, 1 reply; 12+ messages in thread
From: Arjan van de Ven @ 2006-12-20 13:48 UTC
  To: David Wragg; +Cc: Albert Cahalan, linux-kernel, bcrl

On Wed, 2006-12-20 at 13:20 +0000, David Wragg wrote:
> "Albert Cahalan" <acahalan@gmail.com> writes:
> > On Mon, Dec 18, 2006 at 11:50:08PM +0000, David Wragg wrote:
> >> This patch (against 2.6.19/2.6.19.1) adds the four context
> >> switch values (voluntary context switches, involuntary
> >> context switches, and the same values accumulated from
> >> terminated child processes) to the end of /proc/*/stat,
> >> similarly to min_flt, maj_flt and the time used values.
> >
> > Hmmm, OK, do people have a use for these values?
> 
> My reason for writing the patch was to track which processes are
> active (i.e. got scheduled to run) by polling these context switch
> values.  The time used values are not a reliable way to detect process
> activity on fast machines.  So for example, when sorting by %CPU, top
> often shows many processes using 0% CPU, despite the fact that these
> processes are running occasionally.  If top sorted by (%CPU, context
> switch count delta), it might give a more useful display of which
> processes are active on the system.


if all you care about is the number of context switches, you can use
the following SystemTap script as well:

http://www.fenrus.org/cstop.stp


-- 
if you want to mail me at work (you don't), use arjan (at) linux.intel.com
Test the interaction between Linux and your BIOS via http://www.linuxfirmwarekit.org



* Re: [PATCH] procfs: export context switch counts in /proc/*/stat
  2006-12-20  5:40 Albert Cahalan
@ 2006-12-20 13:20 ` David Wragg
  2006-12-20 13:48   ` Arjan van de Ven
  2006-12-20 17:36   ` Albert Cahalan
  0 siblings, 2 replies; 12+ messages in thread
From: David Wragg @ 2006-12-20 13:20 UTC
  To: Albert Cahalan; +Cc: david, linux-kernel, bcrl

"Albert Cahalan" <acahalan@gmail.com> writes:
> On Mon, Dec 18, 2006 at 11:50:08PM +0000, David Wragg wrote:
>> This patch (against 2.6.19/2.6.19.1) adds the four context
>> switch values (voluntary context switches, involuntary
>> context switches, and the same values accumulated from
>> terminated child processes) to the end of /proc/*/stat,
>> similarly to min_flt, maj_flt and the time used values.
>
> Hmmm, OK, do people have a use for these values?

My reason for writing the patch was to track which processes are
active (i.e. got scheduled to run) by polling these context switch
values.  The time used values are not a reliable way to detect process
activity on fast machines.  So for example, when sorting by %CPU, top
often shows many processes using 0% CPU, despite the fact that these
processes are running occasionally.  If top sorted by (%CPU, context
switch count delta), it might give a more useful display of which
processes are active on the system.
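
Here is a minimal sketch of the proposed (%CPU, context switch delta) ordering. It assumes the caller has already sampled each process twice and computed per-process deltas. The struct and function names are hypothetical and not part of procps.

```c
#include <stdlib.h>

/* Hypothetical per-process sample: counter deltas between two polls. */
struct proc_activity {
    int pid;
    unsigned long cpu_ticks;  /* delta of utime + stime */
    unsigned long ctxsw;      /* delta of nvcsw + nivcsw */
};

/* Unsigned subtraction stays correct if the counter wrapped once
 * between the two samples. */
static unsigned long counter_delta(unsigned long now, unsigned long before)
{
    return now - before;
}

/* Descending by CPU delta, then by context switch delta, so a process
 * that ran briefly outranks one that never ran at all. */
static int cmp_activity(const void *a, const void *b)
{
    const struct proc_activity *x = a;
    const struct proc_activity *y = b;

    if (x->cpu_ticks != y->cpu_ticks)
        return x->cpu_ticks < y->cpu_ticks ? 1 : -1;
    if (x->ctxsw != y->ctxsw)
        return x->ctxsw < y->ctxsw ? 1 : -1;
    return 0;
}

static void sort_by_activity(struct proc_activity *v, size_t n)
{
    qsort(v, n, sizeof *v, cmp_activity);
}
```

With CPU time as the only key, two processes that both show 0 ticks would tie even if one of them was scheduled several times in the interval. The secondary key separates them.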

More generally, it seems perverse to track these context switch values
but only expose them through the constrained getrusage interface.  If
they are worth having, why aren't they worth exposing in the same way
as all other process info?

> [...]
>> Putting just these four values into a new file would seem a little
>> odd, since they have a lot in common with the other getrusage values
>> that are already in /proc/pid/stat.  One possibility is to add
>> /proc/pid/rusage, mirroring the full struct rusage in text form, since
>> struct rusage is already part of the kernel ABI (though Linux doesn't
>> fill in half of the values).
>
> Since we already have a struct defined and all...
>
> sys_get_rusage(int pid)

That would be a much more useful system call than getrusage.  But why
have two ways of retrieving process info, /proc and a sys_get_rusage,
exposing differing subsets of process information?


David


* Re: [PATCH] procfs: export context switch counts in /proc/*/stat
@ 2006-12-20  5:40 Albert Cahalan
  2006-12-20 13:20 ` David Wragg
  0 siblings, 1 reply; 12+ messages in thread
From: Albert Cahalan @ 2006-12-20  5:40 UTC
  To: david, linux-kernel, bcrl

David Wragg writes:
> Benjamin LaHaise <bcrl@kvack.org> writes:
>> On Mon, Dec 18, 2006 at 11:50:08PM +0000, David Wragg wrote:

>>> This patch (against 2.6.19/2.6.19.1) adds the four context
>>> switch values (voluntary context switches, involuntary
>>> context switches, and the same values accumulated from
>>> terminated child processes) to the end of /proc/*/stat,
>>> similarly to min_flt, maj_flt and the time used values.

Hmmm, OK, do people have a use for these values?

>> Please put these into new files, as the stat files in /proc are
>> horribly overloaded and have always been somewhat problematic
>> when it comes to changing how things are reported due to internal
>> changes to the kernel.  Cheers,

No thanks. Yours truly, the maintainer of "ps", "top", "vmstat", etc.

> The delay accounting value was added to the end of /proc/pid/stat back
> in July without discussion, so I assumed this approach was still
> considered satisfactory.

/proc/*/stat is the very best place in /proc for any per-process
data that will be commonly needed. Unlike /proc/*/status, few
people are tempted to screw with the formatting and/or spelling.
Unlike the /sys crap, it doesn't take 3 syscalls PER VALUE to
get at the data.

The things to ask are of course: will this really be used, and
does it really belong in /proc at all?

> Putting just these four values into a new file would seem a little
> odd, since they have a lot in common with the other getrusage values
> that are already in /proc/pid/stat.  One possibility is to add
> /proc/pid/rusage, mirroring the full struct rusage in text form, since
> struct rusage is already part of the kernel ABI (though Linux doesn't
> fill in half of the values).

Since we already have a struct defined and all...

sys_get_rusage(int pid)

> Or perhaps it makes sense to reorganize all the values from
> /proc/pid/stat and its siblings into a sysfs-like one-value-per-file
> structure, though that might introduce atomicity and efficiency issues
> (calculating some of the values involves iterating over the threads in
> the process; with everything in one file, these loops are folded
> together).

Yeah, big time. Things are quite bad in /proc, but /sys is a joke.
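
To make the cost argument concrete, here is a sketch of reading one value from a sysfs-style one-value-per-file layout. The helper name is hypothetical. Each value costs its own open(), read() and close(), whereas a single read of /proc/<pid>/stat returns every field. Values read from separate files also need not form a consistent snapshot, which is the atomicity concern quoted above.

```c
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Read a single unsigned decimal value from a one-value-per-file
 * interface. Returns 0 on success, -1 on error.
 */
static int read_one_value(const char *path, unsigned long *out)
{
    char buf[32];
    char *end;
    ssize_t n;
    int fd;

    fd = open(path, O_RDONLY);           /* system call 1 */
    if (fd < 0)
        return -1;
    n = read(fd, buf, sizeof buf - 1);   /* system call 2 */
    close(fd);                           /* system call 3 */
    if (n <= 0)
        return -1;
    buf[n] = '\0';
    *out = strtoul(buf, &end, 10);
    return end == buf ? -1 : 0;
}
```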


end of thread, other threads:[~2006-12-24  1:42 UTC | newest]

Thread overview: 12+ messages
2006-12-18 23:50 [PATCH] procfs: export context switch counts in /proc/*/stat David Wragg
2006-12-19  6:39 ` Benjamin LaHaise
2006-12-19 11:47   ` David Wragg
2006-12-20  5:40 Albert Cahalan
2006-12-20 13:20 ` David Wragg
2006-12-20 13:48   ` Arjan van de Ven
2006-12-20 14:38     ` David Wragg
2006-12-20 14:51       ` Arjan van de Ven
2006-12-20 15:13         ` David Wragg
2006-12-20 17:36   ` Albert Cahalan
2006-12-24  1:40     ` David Wragg
2006-12-21  6:02 Al Boldi
