linux-kernel.vger.kernel.org archive mirror
* Re: ps performance sucks
@ 2002-11-05 21:39 Albert D. Cahalan
  2002-11-05 22:46 ` Rik van Riel
                   ` (2 more replies)
  0 siblings, 3 replies; 14+ messages in thread
From: Albert D. Cahalan @ 2002-11-05 21:39 UTC (permalink / raw)
  To: linux-kernel; +Cc: mbligh, jw, wa, rml, andersen, woofwoof


First of all, sorry to break the threading. I didn't get
a Cc: and the web archives drop most email headers. I'm
going to respond to everyone in a big blob w/o attributions.

> Clearly ps could do with a cleanup. There is no reason to
> read environ if it wasn't asked for. Deciding which files
> are needed based on the command line options would be a

Done. You should be using procps-3.0.5 now. If you're not,
an upgrade is called for. http://procps.sf.net/

(tough luck if you're using some other ps)

Nothing that parses the crap in /proc will ever be fast though.
There's a patch for Linux 2.4.0 that some people might like:

http://www.uwsg.iu.edu/hypermail/linux/kernel/0104.2/1720.html

> Strace it - IIRC it does 5 opens per PID. Vomit.

Nope, it does 2. Perhaps you're not running procps 3 yet?
http://procps.sf.net/

Of course if you do something like "ps ev" you need all 5.

> I'm thinking that ps, top and company are good reasons to
> make an exception of one value per file in proc. Clearly
> open+read+close of 3-5 "files" each extracting data from
> task_struct isn't more efficient than one "file" that
> generates the needed data one field per line.

There are several ways to attack this.

First of all, implement an open_read_close() syscall. Duh.
I expect Hans Reiser would be delighted too. Maybe return
a file descriptor if the file was too big or it blocked.
Maybe provide some basic stat data atomically with the call.
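
To make that concrete: there is no such syscall today, so the sketch
below is just the three-step dance a single trap could replace
(userspace only, nothing kernel-side):

/* Userspace sketch only: there is no open_read_close() syscall.
 * This is the open/read/close sequence a single trap could replace.
 */
#include <fcntl.h>
#include <unistd.h>

static ssize_t open_read_close(const char *path, void *buf, size_t len)
{
        int fd = open(path, O_RDONLY);
        ssize_t n;

        if (fd < 0)
                return -1;
        n = read(fd, buf, len); /* proc files are small; one read will do */
        close(fd);              /* read-only, so close() errors don't matter */
        return n;               /* bytes read, or -1 with errno set */
}

Typical use would be open_read_close("/proc/12345/stat", buf, sizeof buf).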

For per-task proc files, one file per kernel lock seems sane.
I haven't looked at how many that would be, and of course it
varies by kernel. So maybe it ends up not being exact; that's OK.

> I think it's pretty trivial to make /proc/<pid>/psinfo, which
> dumps the garbage from all five files in one place. Which makes
> it 5 times better, but it still sucks.

Well, not all the garbage! It'd be nice to have the popular
stuff in a file similar to /proc/*/stat. That would be what ps
needs to support these options: -f -l -F l u v j -j -ly -lc
plus "top". (not counting the process name or args though)

> You could take a more radical approach. Since the goal of such
> a psinfo file would be to accelerate access to information
> that's already available elsewhere, you can do away with many
> of the niceties of procfs, e.g.
>
>  - no need to be human-readable (e.g. binary or hex dump may
>    make sense in this case)

As long as you expand everything to the biggest data type that
could ever be used, binary is wonderful. Make the ABI be 64-bit
for almost everything, with proper alignment of course. Somebody
slap the person who put a 32-bit ino_t in the latest stat syscall.
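
To illustrate (a made-up record, not a real or proposed ABI),
"expand everything" only means something like:

#include <stdint.h>

/* Illustration only, not a real or proposed kernel ABI: every field
 * is widened to 64 bits and naturally aligned, so the layout never
 * has to change when a kernel type grows.
 */
struct psinfo_record {
        uint64_t pid;
        uint64_t ppid;
        uint64_t uid;
        uint64_t ino;           /* yes, even inode numbers */
        uint64_t start_time;
        uint64_t vsize;
};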

> First write says "pid,comm". Internally, this gets translated
> to 0x8c+0x04, 0x2ee+0x10 (offset+length). Next read returns
> "pid 4,comm 16" (include the name, so you can indicate fields
> the kernel doesn't recognize). Then, kmalloc 20*tasks bytes,
> lock, copy the fields from struct task_struct, unlock, let the
> stuff be read by user space, kfree. Adjacent fields can be
> optimized to single byte strings at setup time.

If you're going to do that, then specify stuff via the filename:
/proc/12345/hack/80basic,20pids,20uids,40argv,4tty,4stat

Not that I care for dealing with the above!
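
For what it's worth, the userspace side of that write-then-read scheme
would be something like the sketch below. /proc/psinfo, the query
string, and the reply format are all made up here; nothing like this
exists in any kernel:

/* Hypothetical client for the proposed query interface.  Neither
 * /proc/psinfo nor the "pid,comm" query language exists; this only
 * illustrates the open/write/read dance the proposal implies.
 */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
        char reply[256];
        ssize_t n;
        int fd = open("/proc/psinfo", O_RDWR);  /* hypothetical file */

        if (fd < 0)
                return 1;

        /* The first write names the fields we want... */
        if (write(fd, "pid,comm", 8) != 8)
                return 1;

        /* ...and the first read echoes each field with its size,
         * e.g. "pid 4,comm 16", so userspace learns the record layout.
         */
        n = read(fd, reply, sizeof reply - 1);
        if (n <= 0)
                return 1;
        reply[n] = '\0';
        printf("record layout: %s\n", reply);

        /* Subsequent reads would return the packed per-task records. */
        close(fd);
        return 0;
}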

>> sgid country
>> * real killer: you think Albert would fail to produce equally
>> crappy code and equally crappy behaviour? Yeah, right.
>
> Well I think Rik and I can handle it in our tree :)

You guys can't even get BSD process selection right.

If necessary I could fix a few spots needed for setgid usage.
I'd rather not need to do so, because then yet another chunk
of non-kernel code is making security decisions.

> * device is not network-transparent - even in principle

ROTFL. What a fantasy. You damn well know /proc isn't either.
If you can hack /proc to be exportable, you can damn well do
the same for a device file. You won't be using NFS for this.
I think Mosix already has a shared /proc anyway; an ioctl() is
a simple matter of writing a little ugly code.

> And i'd still keep environ seperate. I'm inclined to think
> ps should never have presented it in the first place.
> This is the direction i (for what it's worth) favor.

Yeah, well that's BSD compatibility for you. Printing the
environment might actually be useful if you could pick just
the fields you wanted:  ps -eo pid,stat,.DISPLAY,comm

Useful? Like that notation?
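
Picking one variable out of environ is already cheap from userspace,
which is about all the ".DISPLAY" notation would need; a rough sketch
(no pid checks, fixed buffer):

#include <stdio.h>
#include <string.h>

/* Print one variable from another process's environment by scanning
 * /proc/<pid>/environ, a NUL-separated list of NAME=value strings.
 * Sketch only: fixed-size buffer, so huge environments get truncated.
 */
static void print_env_var(const char *pid, const char *name)
{
        char path[64], buf[8192], *p;
        size_t nlen = strlen(name), got;
        FILE *f;

        snprintf(path, sizeof path, "/proc/%s/environ", pid);
        f = fopen(path, "r");
        if (!f)
                return;
        got = fread(buf, 1, sizeof buf - 1, f);
        fclose(f);
        buf[got] = '\0';

        for (p = buf; p < buf + got; p += strlen(p) + 1)
                if (strncmp(p, name, nlen) == 0 && p[nlen] == '=')
                        printf("%s=%s\n", name, p + nlen + 1);
}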

> Well if we want to be gross and efficient, we could just compile
> a kmem-diving dynamic library with every kernel compile and stick
> it in /boot or somewhere. Mildly less extreme is a flat index file
> for the data you need a la System.map. Then just open /dev/kmem
> and grab what you want. Walking the tasklist with no locking would
> be an interesting challenge, but probably not insurmountable.
> That's how things like ps always used to work IIRC.

Yep, that's gross and efficient for sure. The dynamic library idea
fixes a major problem; BSD "top" is always breaking due to kernel
differences on Solaris and FreeBSD.


* Re: ps performance sucks
  2002-11-05 21:39 ps performance sucks Albert D. Cahalan
@ 2002-11-05 22:46 ` Rik van Riel
  2002-11-05 22:48   ` Robert Love
  2002-11-05 23:37   ` Albert D. Cahalan
  2002-11-05 23:37 ` Werner Almesberger
  2002-11-07 17:19 ` Bill Davidsen
  2 siblings, 2 replies; 14+ messages in thread
From: Rik van Riel @ 2002-11-05 22:46 UTC (permalink / raw)
  To: Albert D. Cahalan; +Cc: linux-kernel, mbligh, jw, wa, rml, andersen, woofwoof

On Tue, 5 Nov 2002, Albert D. Cahalan wrote:

> (tough luck if you're using some other ps)

Why do your procps mails always contain more references to
procps 2 than to your own version ?

What is your obsession with procps 2 ?

Rik
-- 
Bravely reimplemented by the knights who say "NIH".
http://www.surriel.com/		http://distro.conectiva.com/
Current spamtrap:  october@surriel.com



* Re: ps performance sucks
  2002-11-05 22:46 ` Rik van Riel
@ 2002-11-05 22:48   ` Robert Love
  2002-11-05 23:23     ` Miquel van Smoorenburg
  2002-11-05 23:37   ` Albert D. Cahalan
  1 sibling, 1 reply; 14+ messages in thread
From: Robert Love @ 2002-11-05 22:48 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Albert D. Cahalan, linux-kernel, mbligh, jw, wa, andersen, woofwoof

On Tue, 2002-11-05 at 17:46, Rik van Riel wrote:

> On Tue, 5 Nov 2002, Albert D. Cahalan wrote:
> 
> > (tough luck if you're using some other ps)
> 
> Why do your procps mails always contain more references to
> procps 2 than to your own version ?
> 
> What is your obsession with procps 2 ?

Because he forked procps and cannot get over it.

	Robert Love



* Re: ps performance sucks
  2002-11-05 22:48   ` Robert Love
@ 2002-11-05 23:23     ` Miquel van Smoorenburg
  0 siblings, 0 replies; 14+ messages in thread
From: Miquel van Smoorenburg @ 2002-11-05 23:23 UTC (permalink / raw)
  To: linux-kernel

In article <1036536496.777.57.camel@phantasy>,
Robert Love  <rml@tech9.net> wrote:
>On Tue, 2002-11-05 at 17:46, Rik van Riel wrote:
>
>> On Tue, 5 Nov 2002, Albert D. Cahalan wrote:
>> 
>> > (tough luck if you're using some other ps)
>> 
>> Why do your procps mails always contain more references to
>> procps 2 than to your own version ?
>> 
>> What is your obsession with procps 2 ?
>
>Because he forked procps and cannot get over it.

For the record, I'm using Albert's procps (via Debian). I have
no idea what the code looks like, but it has a very high
usability factor. No idea why everybody keeps flaming Albert
for doing a good job.

Mike.



* Re: ps performance sucks
  2002-11-05 22:46 ` Rik van Riel
  2002-11-05 22:48   ` Robert Love
@ 2002-11-05 23:37   ` Albert D. Cahalan
  1 sibling, 0 replies; 14+ messages in thread
From: Albert D. Cahalan @ 2002-11-05 23:37 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Albert D. Cahalan, linux-kernel, mbligh, jw, wa, rml, andersen, woofwoof

Rik van Riel writes:
> On Tue, 5 Nov 2002, Albert D. Cahalan wrote:

>> (tough luck if you're using some other ps)
>
> Why do your procps mails always contain more references to
> procps 2 than to your own version ?
>
> What is your obsession with procps 2 ?

I'm rather sick of being blamed for problems that are not
seen in procps 3. Somebody posts about procps needing to
read 5 files per process, then somebody else makes a rude
comment about me... never minding that the procps 3 code
doesn't have the behavior that was being complained about.

I also have to make the differences clear. Really, I hate
doing that. I've learned a harsh lesson though; failure to
advertise leads to forks. It also leads to people using
obsolete code. Some poor soul even started hacking on top,
not realizing that it was already rewritten and is improving
quickly.

Do realize that you _started_ with buggy old code. I really
wish you'd just let it die. There wasn't any need to start
hacking on that buggy old code; I take patches, even from you.




* Re: ps performance sucks
  2002-11-05 21:39 ps performance sucks Albert D. Cahalan
  2002-11-05 22:46 ` Rik van Riel
@ 2002-11-05 23:37 ` Werner Almesberger
  2002-11-06  0:10   ` Albert D. Cahalan
  2002-11-07 17:19 ` Bill Davidsen
  2 siblings, 1 reply; 14+ messages in thread
From: Werner Almesberger @ 2002-11-05 23:37 UTC (permalink / raw)
  To: Albert D. Cahalan; +Cc: linux-kernel, mbligh, jw, rml, andersen, woofwoof

Albert D. Cahalan wrote:
> If you're going to do that, then specify stuff via the filename:
> /proc/12345/hack/80basic,20pids,20uids,40argv,4tty,4stat

Well, you'd get the numbers (sizes) from the kernel, as a
response. Of course, you could define the interface such that
the query (after all, that's what it is) contains the full
field name plus size information, and the kernel just says
"EINVAL" if it doesn't like it, but then you lose some
flexibility. Might not be a big deal, though.

Yeah, perhaps it's actually better to avoid being overly
clever. How frequently are ps and friends hit by the removal
of fields or size changes anyway ?

Oh, BTW, it would be more like /proc/hack/<query>, so you do
all PIDs in one sweep.

> Not that I care for dealing with the above!

Well, that's what programs are for :-)

- Werner

-- 
  _________________________________________________________________________
 / Werner Almesberger, Buenos Aires, Argentina         wa@almesberger.net /
/_http://www.almesberger.net/____________________________________________/


* Re: ps performance sucks
  2002-11-05 23:37 ` Werner Almesberger
@ 2002-11-06  0:10   ` Albert D. Cahalan
  2002-11-06  1:29     ` Werner Almesberger
  0 siblings, 1 reply; 14+ messages in thread
From: Albert D. Cahalan @ 2002-11-06  0:10 UTC (permalink / raw)
  To: Werner Almesberger
  Cc: Albert D. Cahalan, linux-kernel, mbligh, jw, rml, andersen, woofwoof

Werner Almesberger writes:
> Albert D. Cahalan wrote:

>> If you're going to do that, then specify stuff via the filename:
>> /proc/12345/hack/80basic,20pids,20uids,40argv,4tty,4stat
>
> Well, you'd get the numbers (sizes) from the kernel, as a
> response. Of course, you could define the interface such that
> the query (after all, that's what it is) contains the full
> field name plus size information, and the kernel just says
> "EINVAL" if it doesn't like it, but then you lose some
> flexibility. Might not be a big deal, though.

I was thinking "80basic" would ask for the first 0x80 words
of basic info. If there's less, zero-fill. If there's more,
truncate the struct. Then "20pids" asks for the first 0x20
words of pid info (pid, ppid, sess, pgid...) and so on.

It's saying "give me 0x80 words of struct basic, followed
by 0x20 words of struct pids..." so that there isn't too
much version trouble.
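
Kernel-side, that fill rule is nothing more than a copy like this
(pure sketch, not taken from any kernel tree):

#include <stdint.h>
#include <string.h>

/* The fill rule from above as a copy helper: give the caller exactly
 * n_req 64-bit words, zero-filling if the kernel's struct is shorter
 * than the request and truncating if it is longer.
 */
static void copy_words(uint64_t *dst, size_t n_req,
                       const uint64_t *src, size_t n_have)
{
        size_t n = n_req < n_have ? n_req : n_have;

        memcpy(dst, src, n * sizeof *dst);      /* what both sides know */
        if (n_req > n)
                memset(dst + n, 0, (n_req - n) * sizeof *dst);  /* zero-fill */
}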

Note: not expressing either approval or condemnation for
the general idea or for any specific implementation

> Yeah, perhaps it's actually better to avoid being overly
> clever. How frequently are ps and friends hit by the removal
> of fields or size changes anyway ?

Removal is a killer. It hit back in the Linux 1.3.xx days
when /proc/meminfo briefly had the current format. It hit
again just recently, when data was removed from /proc/stat
without even a transition period.

Size changes usually don't hurt, because most people are
satisfied with the old limits. If there is to be a binary
kernel interface, it damn well better use 64-bit values
for most everything.

Name changes are nasty, and are the reason I hate the
status file. Is it "SigCgt" or "SigCat" in that file?
The answer depends on your kernel version...
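
So a parser ends up carrying warts like this one (sketch of the
workaround only, not actual procps code):

#include <stdio.h>
#include <string.h>

/* Accept either spelling of the caught-signal line in
 * /proc/<pid>/status, since (as noted above) the name varied with
 * kernel version.  Returns 1 if the line matched, 0 otherwise.
 */
static int parse_sig_caught(const char *line, unsigned long long *mask)
{
        if (strncmp(line, "SigCgt:", 7) != 0 &&
            strncmp(line, "SigCat:", 7) != 0)
                return 0;
        return sscanf(line + 7, "%llx", mask) == 1;
}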

> Oh, BTW, it would be more like /proc/hack/<query>, so you do
> all PIDs in one sweep.

That's nice, until you exceed the amount of memory available.
Right now, a "ps" without sorting can work even if there isn't
enough physical memory or address space for ps to hold info about
every process. Using a snapshot interface would cause ps to fail
under some heavy load conditions that it currently survives.

Hey, if reiserfs can have a database query syscall... >:-)
open("/proc/SELECT PID,TTY,TIME,CMD FROM PS WHERE RUID=42",O_RDONLY)
Somebody check if Al Viro needs a defibrillator. On second thought...




* Re: ps performance sucks
  2002-11-06  0:10   ` Albert D. Cahalan
@ 2002-11-06  1:29     ` Werner Almesberger
  0 siblings, 0 replies; 14+ messages in thread
From: Werner Almesberger @ 2002-11-06  1:29 UTC (permalink / raw)
  To: Albert D. Cahalan; +Cc: linux-kernel, mbligh, jw, rml, andersen, woofwoof

Albert D. Cahalan wrote:
> I was thinking "80basic" would ask for the first 0x80 words
> of basic info. If there's less, zero-fill. If there's more,
> truncate the struct. Then "20pids" asks for the first 0x20
> words of pid info (pid, ppid, sess, pgid...) and so on.

Argl, this has "silent failure" written all over it. No, I think
single-field granularity wouldn't incur excessive overhead: at
run time, you can trivially handle adjacent fields with a single
copy, and I don't think there are so many practically useful
fields that the setup cost (CPU or memory) would be terrible.

[ Various change horrors ]

Hmm yes, about as bad as I remember it from my psmisc days :-(

> That's nice, until you exceed the amount of memory available.

That would be the least of my concerns. If you really run out
of memory, you can always fall back to an iterative process.

> Hey, if reiserfs can have a database query syscall... >:-)
> open("/proc/SELECT PID,TTY,TIME,CMD FROM PS WHERE RUID=42",O_RDONLY)

Cute ;-) But it might be faster just to dump the whole data,
and let user space worry about picking the right entries.

- Werner

-- 
  _________________________________________________________________________
 / Werner Almesberger, Buenos Aires, Argentina         wa@almesberger.net /
/_http://www.almesberger.net/____________________________________________/


* Re: ps performance sucks
  2002-11-05 21:39 ps performance sucks Albert D. Cahalan
  2002-11-05 22:46 ` Rik van Riel
  2002-11-05 23:37 ` Werner Almesberger
@ 2002-11-07 17:19 ` Bill Davidsen
  2002-11-07 20:42   ` Albert D. Cahalan
  2 siblings, 1 reply; 14+ messages in thread
From: Bill Davidsen @ 2002-11-07 17:19 UTC (permalink / raw)
  To: Albert D. Cahalan; +Cc: linux-kernel, mbligh, jw, wa, rml, andersen, woofwoof

On Tue, 5 Nov 2002, Albert D. Cahalan wrote:

> > Strace it - IIRC it does 5 opens per PID. Vomit.
> 
> Nope, it does 2. Perhaps you're not running procps 3 yet?
> http://procps.sf.net/
> 
> Of course if you do something like "ps ev" you need all 5.

  Well, since you're doing all this stuff to push your version, how about
an option to do a fast ps for most processes and only do the hard work for
processes owned by a given user? Or not owned by that user, so that, for
example, everything not owned by root would be shown in detail. What about
showing or hiding threads, or showing minimal detail (fast) for threads?

  There is a lot of room for options if you want to see everything but
only detail for some.

  I wish the competing procps trees could be merged; I feel it's something
that shouldn't require the time of top kernel developers. If you are willing
to add features suggested by others, and they are willing to push a feature
list to you, maybe that could happen.

-- 
bill davidsen <davidsen@tmr.com>
  CTO, TMR Associates, Inc
Doing interesting things with little computers since 1979.



* Re: ps performance sucks
  2002-11-07 17:19 ` Bill Davidsen
@ 2002-11-07 20:42   ` Albert D. Cahalan
  2002-11-07 21:05     ` Andrew Morton
  2002-11-08 21:05     ` Bill Davidsen
  0 siblings, 2 replies; 14+ messages in thread
From: Albert D. Cahalan @ 2002-11-07 20:42 UTC (permalink / raw)
  To: Bill Davidsen
  Cc: Albert D. Cahalan, linux-kernel, mbligh, jw, wa, andersen, woofwoof

Bill Davidsen writes:
> On Tue, 5 Nov 2002, Albert D. Cahalan wrote:

>>> Strace it - IIRC it does 5 opens per PID. Vomit.
>>
>> Nope, it does 2. Perhaps you're not running procps 3 yet?
>> http://procps.sf.net/
>>
>> Of course if you do something like "ps ev" you need all 5.
>
>   Well, since you're doing all this stuff to push your version, how about
> an option to do a fast ps for most processes and only do the hard work for
> processes owned by a given user? Or not owned by that user, so that, for
> example, everything not owned by root would be shown in detail. What about
> showing or hiding threads, or showing minimal detail (fast) for threads?
>
>   There is a lot of room for options if you want to see everything but
> only detail for some.

Would people use it? I risk burying users in options.
The closest things to this that I've considered are:

1. select every process that I can signal (including by TTY)
2. expand the selection with all ancestor processes up to init
3. expand the selection with all descendant processes
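
Item 2, at least, is already doable by chasing ppid through /proc;
roughly like this (error handling trimmed, just a sketch):

#include <stdio.h>
#include <string.h>

/* Walk from a pid up to init by following the ppid field (4th field)
 * of /proc/<pid>/stat.  The comm field may contain ')' and spaces,
 * so parse from the last ')' onward.  Sketch only, minimal checks.
 */
static void print_ancestors(int pid)
{
        while (pid > 1) {
                char path[64], buf[512], *p;
                int ppid = 0;
                FILE *f;

                snprintf(path, sizeof path, "/proc/%d/stat", pid);
                f = fopen(path, "r");
                if (!f)
                        break;
                if (!fgets(buf, sizeof buf, f)) {
                        fclose(f);
                        break;
                }
                fclose(f);

                p = strrchr(buf, ')');  /* skip "pid (comm)" safely */
                if (!p || sscanf(p + 1, " %*c %d", &ppid) != 1)
                        break;
                printf("%d\n", ppid);
                pid = ppid;             /* continue toward init */
        }
}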

As for threads, support will come when the kernel makes it work
sanely. Right now I could make ps crudely guess what is a thread
and what is not, but that is slow and it suffers from both false
positives and false negatives. I'd be in business if the kernel
would do the following:

1. group related (same memory context) tasks in the /proc output
2. supply a "more tasks follow" flag
3. supply a way to identify a task's primary memory context

Note that #3 has to be immune to UML, Wine, and Bochs playing
tricks with segment registers and alternate memory contexts.

>   I wish the competing procps trees could be merged; I feel it's something
> that shouldn't require the time of top kernel developers. If you are willing
> to add features suggested by others, and they are willing to push a feature
> list to you, maybe that could happen.

I have difficulty understanding why somebody would want to start
hacking on code that hasn't been maintained for ages. I'm certainly
not about to throw away years worth of bug fixes. I suspect there was
a failure to realize how much Craig and I had done over the years.
Then Jim Warner (new top author) and I (ps, skill, snice, half of
libproc, and now much of free and vmstat) were blown off for reasons
I can't figure out. In spite of this, I would gladly consider patches
from Rik van Riel and Robert M. Love, neither of whom had even
touched the procps source code until just recently. I try to keep
this civil while making it clear who has the continuously maintained
source tree with original authors still actively participating.

Oh well.

Are you a vmstat user? Suggestions are needed; it's getting a rewrite.
I may even change the default format, assuming people don't all
have scripts that parse the output. How do you like this?

procs ------------memory----------- ---swap-- ----io--- --system-- ----cpu----
 r  b swpd free buff cache act !act   si   so   bi   bo   in    cs us sy id wa
 0  0 304k  14m 2.5m  27m  16m  23m    0    0    0    0   33     4  0  0 90  9
 0  0 304k  14m 2.5m  27m  16m  23m    0    0    0    0  114    12  1  0 88 11
 0  0 304k  14m 2.5m  27m  16m  23m    0    0    0    0  104     6  0  1 91  8

Let me know if any of that is junk, or if there is something you'd add.
Adding stuff means removing stuff, since blowing past 80 columns isn't
OK for most users. For ideas, see: /proc/vmstat, /proc/meminfo, /proc/stat
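
Everything in that header comes from simple name/value files, so a
throwaway reader like the sketch below (not a real procps helper) is
enough to experiment with which columns earn their space:

#include <stdio.h>
#include <string.h>

/* Look one counter up in a name/value /proc file such as /proc/vmstat
 * ("pgfault 12345") or /proc/meminfo ("MemTotal:  1551840 kB").
 * Sketch only; returns 0 and fills *val on success, -1 otherwise.
 */
static int proc_value(const char *file, const char *name,
                      unsigned long long *val)
{
        char line[256];
        size_t nlen = strlen(name);
        FILE *f = fopen(file, "r");

        if (!f)
                return -1;
        while (fgets(line, sizeof line, f)) {
                if (strncmp(line, name, nlen) == 0 &&
                    (line[nlen] == ' ' || line[nlen] == ':')) {
                        int ok = sscanf(line + nlen + 1, "%llu", val) == 1;
                        fclose(f);
                        return ok ? 0 : -1;
                }
        }
        fclose(f);
        return -1;
}

For example, proc_value("/proc/vmstat", "pgfault", &v) or
proc_value("/proc/meminfo", "MemTotal", &v).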

In case you happen to know where they are, I'm looking for these:

pages reclaimed
minor faults
COW faults
zero-page faults
anticipated short-term memory shortfall
pages freed
pages scanned by page-replacement algorithm
clock cycles by page replacement algorithm
number of system calls
number of forks (fork, vfork, & clone) and execs

This would be easy if every OS used the same terminology,
had the same stats, and had proper documentation.


* Re: ps performance sucks
  2002-11-07 20:42   ` Albert D. Cahalan
@ 2002-11-07 21:05     ` Andrew Morton
  2002-11-07 22:02       ` Albert D. Cahalan
  2002-11-08 21:05     ` Bill Davidsen
  1 sibling, 1 reply; 14+ messages in thread
From: Andrew Morton @ 2002-11-07 21:05 UTC (permalink / raw)
  To: Albert D. Cahalan
  Cc: Bill Davidsen, linux-kernel, mbligh, jw, wa, andersen, woofwoof

"Albert D. Cahalan" wrote:
> 
> In case you happen to know where they are, I'm looking for these:
> 
> pages reclaimed

/proc/vmstat:pgsteal

> minor faults

/proc/vmstat:pgfault - /proc/vmstat:pgmajfault

> COW faults
> zero-page faults

These are not available separately

> anticipated short-term memory shortfall

hm.  tricky.

> pages freed

/proc/vmstat:pgfree

This is a little broken in 2.5.46.  pgfree is accumulated
_before_ the per-cpu LIFO queues and pgalloc is accumulated _after_
the per-cpu queues (or vice versa) so they're out of whack.  

> pages scanned by page-replacement algorithm

/proc/vmstat:pgscan

> clock cycles by page replacement algorithm

Not available.  Could sum up the CPU across all kswapd instances,
which is a bit lame.

> number of system calls

Not available

> number of forks (fork, vfork, & clone) and execs

/proc/stat: processes


* Re: ps performance sucks
  2002-11-07 21:05     ` Andrew Morton
@ 2002-11-07 22:02       ` Albert D. Cahalan
  2002-11-07 22:21         ` Andrew Morton
  0 siblings, 1 reply; 14+ messages in thread
From: Albert D. Cahalan @ 2002-11-07 22:02 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Albert D. Cahalan, Bill Davidsen, linux-kernel, mbligh, jw, wa,
	andersen, woofwoof

Andrew Morton writes:
> "Albert D. Cahalan" wrote:

>> In case you happen to know where they are, I'm looking for these:
>>
>> pages reclaimed
>
> /proc/vmstat:pgsteal

That's a funny name for it. Sure about that? Longer description
of what I'm looking for:

    reattaches from reclaim list
        Number of pages that have been faulted while on the inactive list

To me, "pgsteal" sounds like pages grabbed from a clean list to
be used for some new purpose.

>> minor faults
>
> /proc/vmstat:pgfault - /proc/vmstat:pgmajfault
>
>> COW faults
>> zero-page faults
>
> These are not available separately

They count as minor faults?

>> anticipated short-term memory shortfall
>
> hm.  tricky.

How about these then? (and would you want them?)

a. urgency level for the need to free up memory
b. amount (or %) by which the system is overcommitted

>> pages freed
>
> /proc/vmstat:pgfree
>
> This is a little broken in 2.5.46.  pgfree is accumulated
> _before_ the per-cpu LIFO queues and pgalloc is accumulated _after_
> the per-cpu queues (or vice versa) so they're out of whack.

Can I assume it will be fixed soon? Is this a value you'd like?

>> pages scanned by page-replacement algorithm
>
> /proc/vmstat:pgscan
>
>> clock cycles by page replacement algorithm
>
> Not available.  Could sum up the CPU across all kswapd instances,
> which is a bit lame.

I suspect that it's cycles of the page aging "clock" hand,
not CPU cycles. So that would be pages scanned divided by
the average number of pages in a full scan.

>> number of system calls
>
> Not available

I thought so. Bummer. I guess this is due to overhead.

>> number of forks (fork, vfork, & clone) and execs
>
> /proc/stat: processes

That's fork/vfork/clone all together, w/o execs?
(good for "vmstat -f", but poor for "vmstat -s")

Got one more:

      wired pages
          Total number of pages that are currently in use
          and cannot be used for paging

Thanks for all the help. BTW, you didn't say if you liked the
proposed changes, so I'm assuming they don't matter to you.


* Re: ps performance sucks
  2002-11-07 22:02       ` Albert D. Cahalan
@ 2002-11-07 22:21         ` Andrew Morton
  0 siblings, 0 replies; 14+ messages in thread
From: Andrew Morton @ 2002-11-07 22:21 UTC (permalink / raw)
  To: Albert D. Cahalan
  Cc: Bill Davidsen, linux-kernel, mbligh, jw, wa, andersen, woofwoof

"Albert D. Cahalan" wrote:
> 
> Andrew Morton writes:
> > "Albert D. Cahalan" wrote:
> 
> >> In case you happen to know where they are, I'm looking for these:
> >>
> >> pages reclaimed
> >
> > /proc/vmstat:pgsteal
> 
> That's a funny name for it. Sure about that? Longer description
> of what I'm looking for:
> 
>     reattaches from reclaim list
>         Number of pages that have been faulted while on the inactive list

Ah.  No, we don't account that.  That would be "minor faults
against pagecache".

> To me, "pgsteal" sounds like pages grabbed from a clean list to
> be used for some new purpose.

Yes, it is.  "page reclaim"
 
> >> minor faults
> >
> > /proc/vmstat:pgfault - /proc/vmstat:pgmajfault
> >
> >> COW faults
> >> zero-page faults
> >
> > These are not available separately
> 
> They count as minor faults?

Yes.
 
> >> anticipated short-term memory shortfall
> >
> > hm.  tricky.
> 
> How about these then? (and would you want them?)
> 
> a. urgency level for the need to free up memory

It's a bit hard to put one's finger on what this means
really.  We have the free-pages info in /proc/meminfo,
and the page-stealing rates in /proc/vmstat.

If we see that kswapd is stealing pages like mad then we
know there's a lot of replacement pressure, but that certainly
doesn't mean that the system is under any sort of difficulty.

I guess one should step back and ask "what are we trying to
report here"?

> b. amount (or %) by which the system is overcommitted

That's approximately /proc/meminfo:Committed_AS / total memory.
 
> >> pages freed
> >
> > /proc/vmstat:pgfree
> >
> > This is a little broken in 2.5.46.  pgfree is accumulated
> > _before_ the per-cpu LIFO queues and pgalloc is accumulated _after_
> > the per-cpu queues (or vice versa) so they're out of whack.
> 
> Can I assume it will be fixed soon? Is this a value you'd like?

Yes, I have a fix for that queued.

pgfree will include freeings from programs exiting, munmapping,
truncating, etc.  I think it's not a very interesting metric
for system behaviour.

/proc/vmstat:pgsteal is more interesting.  It shows the rate
at which the kernel is reclaiming cache (pagecache and swapcache)
to satisfy its memory demands.

> >> pages scanned by page-replacement algorithm
> >
> > /proc/vmstat:pgscan
> >
> >> clock cycles by page replacement algorithm
> >
> > Not available.  Could sum up the CPU across all kswapd instances,
> > which is a bit lame.
> 
> I suspect that it's cycles of the page aging "clock" hand,
> not CPU cycles. So that would be pages scanned divided by
> the average number of pages in a full scan.

OK.  /proc/vmstat:pgscan is incremented when the VM considers
a page for replacement.  You can divide this by Active+Inactive
from meminfo to determine the scanning rate.

Also, pgsteal/pgscan is a metric of the efficiency of page
reclaim.  If it's 1.00 then every page coming off the tail of
the inactive list is being reaped.  If it gets much below 0.3
or so then the VM is having quite some difficulty.
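
In code, those two ratios are just the following (deltas between two
samples; reading /proc/vmstat and /proc/meminfo is left out):

/* The two ratios described above, computed from deltas between two
 * samples of /proc/vmstat (pgscan, pgsteal) and the Active/Inactive
 * values in /proc/meminfo.  Sketch only; sampling is not shown.
 */
static double scan_rate(unsigned long d_pgscan,
                        unsigned long active, unsigned long inactive)
{
        unsigned long pages = active + inactive;

        return pages ? (double)d_pgscan / (double)pages : 0.0;
}

static double reclaim_efficiency(unsigned long d_pgsteal,
                                 unsigned long d_pgscan)
{
        /* ~1.0: every scanned page reaped; well below ~0.3: trouble */
        return d_pgscan ? (double)d_pgsteal / (double)d_pgscan : 0.0;
}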
 
> >> number of system calls
> >
> > Not available
> 
> I thought so. Bummer. I guess this is due to overhead.
> 
> >> number of forks (fork, vfork, & clone) and execs
> >
> > /proc/stat: processes
> 
> That's fork/vfork/clone all together, w/o execs?

Looks like it, yes.

> (good for "vmstat -f", but poor for "vmstat -s")
> 
> Got one more:
> 
>       wired pages
>           Total number of pages that are currently in use
>           and cannot be used for paging

I guess this should include:

- Pages in use by the kernel (kmalloc, kernel stacks etc).
  (These are mostly accounted for in /proc/meminfo:Slab)

- Pages which are mlocked (there is no accounting of these at all)

- PageReserved pages

- Not much else, really.  Maybe pages which are under direct-IO.

PageReserved accounting is simple enough, but it looks like the
problem which needed that metric can be solved by other means.

mlock accounting is tricky, and without that the value of this is a 
bit questionable.

> Thanks for all the help. BTW, you didn't say if you liked the
> proposed changes, so I'm assuming they don't matter to you.

I like anything which improves the observability of kernel
behaviour ;)


* Re: ps performance sucks
  2002-11-07 20:42   ` Albert D. Cahalan
  2002-11-07 21:05     ` Andrew Morton
@ 2002-11-08 21:05     ` Bill Davidsen
  1 sibling, 0 replies; 14+ messages in thread
From: Bill Davidsen @ 2002-11-08 21:05 UTC (permalink / raw)
  To: Albert D. Cahalan; +Cc: linux-kernel, mbligh, jw, wa, andersen, woofwoof

[-- Attachment #1: Type: TEXT/PLAIN, Size: 1569 bytes --]

On Thu, 7 Nov 2002, Albert D. Cahalan wrote:

> Are you a vmstat user? Suggestions are needed; it's getting a rewrite.
> I may even change the default format, assuming people don't all
> have scripts that parse the output. How do you like this?
> 
> procs ------------memory----------- ---swap-- ----io--- --system-- ----cpu----
>  r  b swpd free buff cache act !act   si   so   bi   bo   in    cs us sy id wa
>  0  0 304k  14m 2.5m  27m  16m  23m    0    0    0    0   33     4  0  0 90  9
>  0  0 304k  14m 2.5m  27m  16m  23m    0    0    0    0  114    12  1  0 88 11
>  0  0 304k  14m 2.5m  27m  16m  23m    0    0    0    0  104     6  0  1 91  8

The reason I maintain vmstat2 (NOT based on any of your code AFAIK) is
that I want to see data rates on the non-loopback NICs. I also have a
timestamp-every-line option, and once 2.5 settles a bit and I get the time
to find where things live, I want optional stats by individual NIC, and by
individual drive and partition if I can find them. That data isn't in
partitions any more, and the file I was told to use doesn't exist. Maybe
it's in devicefs? Oh, and -M starts the output with memory sizes, for a
package that generates usage graphs with a line at physical memory size.

I'd also like an option to flush the buffers after each line, even when
writing to a pipe.

Line length: the w option in ps doesn't worry about it, so why should vmstat?
If the user tells you to show more, do it.

vmstat2 output attached (it's wide).

-- 
bill davidsen <davidsen@tmr.com>
  CTO, TMR Associates, Inc
Doing interesting things with little computers since 1979.

[-- Attachment #2: Type: TEXT/PLAIN, Size: 842 bytes --]

Script started on Fri Nov  8 15:49:56 2002
newscon02:earthquake$ vmstat2 -tkfM 10 5
MemTotal:      1551840 kB
SwapTotal:     2048248 kB
time   load free buffs swap pgin pgou dk0 dk1 dk2 dk3 ipkt opkt  int  ctx   usr sys idl  i_netK  o_netK
15.839 8.25  5.1  1412 49.0 3861 8588  10 243   0   0 6092 6264 8216 3408    35  37  28  5722.2  5785.1
15.842 9.26  7.2  1409 49.0 4272 3764  11 171   0   0 7707 8124 9772 3585    43  43  14  7369.6  7748.6
15.844 8.21  5.5  1412 49.0 3942 7688  11 232   0   0 6351 6281 8554 3289    30  42  28  6412.7  5618.9
15.847 7.77  6.7  1410 49.0 4932 5029  15 202   0   0 7813 7886 10072 3469    42  44  14  7639.9  7342.0
15.850 7.99  7.0  1410 49.0 4367 5976   7 211   0   0 6907 6953 9045 3467    37  38  25  6721.5  6456.4
newscon02:earthquake$ exit

Script done on Fri Nov  8 15:56:08 2002

