* Re: [proc.txt] Fix /proc/pid/statm documentation
@ 2004-08-06  1:11 Albert Cahalan
  2004-08-06  3:48 ` William Lee Irwin III
  2004-08-06  9:40 ` Roger Luethi
  0 siblings, 2 replies; 46+ messages in thread
From: Albert Cahalan @ 2004-08-06  1:11 UTC (permalink / raw)
  To: linux-kernel mailing list; +Cc: linux-mm, rl, wli

Roger Luethi writes:

> I really wanted /proc/pid/statm to die [1] and I still believe the
> reasoning is valid. As it doesn't look like that is going to happen,

It would be awful to lose statm, especially since WLI has fixes
for some of the problems. Just why do you want to kill statm?

Now quoting from your patch...

+ size     total program size (pages)  (same as VmSize in status)
+ resident size of memory portions (pages) (same as VmRSS in status)

There was a distinction here that has been lost. One of these
included memory-mapped hardware. You could see this with the
X server video memory.

For "top" running on a 2.2.xx or 2.4.xx kernel, the statm values
are better. Jim Warner determined this after careful examination,
and I have no desire to re-analyse the matter. Remember that user
tools are expected to run on both old and new kernels, while the
kernel is expected to support old apps. We call this an ABI...

+ shared   number of pages that are shared (i.e. backed by a file)

This isn't in the status file. It's shown in top's default output.
Since top must read this value from statm, it might as well use    
other parts of statm as well.                                    
                                       
+ trs      number of pages that are 'code' (not including libs; broken,
+       includes data segment)

Perhaps this works OK with the NX bit or on an Alpha? On a regular
i386 box, code and read-only data are pretty much the same.

Note: trs means "text RESIDENT set".

+ lrs      number of pages of library  (always 0 on 2.6)

This worked for a.out executables. (that 0x60000000 value is an
a.out constant) Oh well, trs will do.

+ drs      number of pages of data/stack  (including libs; broken,
+       includes library text)

Note: drs means "data RESIDENT set".

+ dt       number of dirty pages   (always 0 on 2.6)

This one would be useful.
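
The seven columns discussed above carry no labels in the file itself, so a reader has to attach them in userspace. The following is a minimal Python sketch of that step, not code from procps or the kernel. The helper name `parse_statm` and the dictionary keys are assumptions chosen to match the names in the quoted patch, and the 4096-byte default page size is also an assumption.

```python
# Column names in the order used by the quoted proc.txt patch.
STATM_FIELDS = ("size", "resident", "shared", "trs", "lrs", "drs", "dt")


def parse_statm(line, page_size=4096):
    """Return {field: kilobytes} for one line of /proc/<pid>/statm.

    page_size is an assumed default; on a live system it could be
    taken from os.sysconf("SC_PAGE_SIZE") instead.
    """
    values = [int(tok) for tok in line.split()]
    if len(values) != len(STATM_FIELDS):
        raise ValueError(
            "expected %d fields, got %d" % (len(STATM_FIELDS), len(values))
        )
    # statm counts pages; convert each count to kilobytes.
    return {
        name: pages * page_size // 1024
        for name, pages in zip(STATM_FIELDS, values)
    }
```

The sketch only attaches names and converts units. It cannot tell whether a 0 in lrs or dt means "zero pages" or "not computed by this kernel".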

These would be really useful too:
1. swap space used
2. swap space that would be used if fully paged out

For the pmap command, it would be nice to have per-mapping
values in the /proc/*/maps files. (resident, locked,
dirty, C-O-W, swapped...) 



^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [proc.txt] Fix /proc/pid/statm documentation
  2004-08-06  1:11 [proc.txt] Fix /proc/pid/statm documentation Albert Cahalan
@ 2004-08-06  3:48 ` William Lee Irwin III
  2004-08-06  9:40 ` Roger Luethi
  1 sibling, 0 replies; 46+ messages in thread
From: William Lee Irwin III @ 2004-08-06  3:48 UTC (permalink / raw)
  To: Albert Cahalan; +Cc: linux-kernel mailing list, linux-mm, rl

Roger Luethi writes:
>> I really wanted /proc/pid/statm to die [1] and I still believe the
>> reasoning is valid. As it doesn't look like that is going to happen,

On Thu, Aug 05, 2004 at 09:11:52PM -0400, Albert Cahalan wrote:
> It would be awful to lose statm, especially since WLI has fixes
> for some of the problems. Just why do you want to kill statm?
> Now quoting from your patch...

Just to cite my own sources, the fixes I had were forward ports of RHAS
patches for statm accounting at the time of pte modification.


-- wli

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [proc.txt] Fix /proc/pid/statm documentation
  2004-08-06  1:11 [proc.txt] Fix /proc/pid/statm documentation Albert Cahalan
  2004-08-06  3:48 ` William Lee Irwin III
@ 2004-08-06  9:40 ` Roger Luethi
  2004-08-06 10:46   ` William Lee Irwin III
  2004-08-06 12:58     ` Albert Cahalan
  1 sibling, 2 replies; 46+ messages in thread
From: Roger Luethi @ 2004-08-06  9:40 UTC (permalink / raw)
  To: Albert Cahalan; +Cc: linux-kernel mailing list, linux-mm, wli

On Thu, 05 Aug 2004 21:11:52 -0400, Albert Cahalan wrote:
> Roger Luethi writes:
> 
> > I really wanted /proc/pid/statm to die [1] and I still believe the
> > reasoning is valid. As it doesn't look like that is going to happen,
> 
> It would be awful to lose statm,

Hardly. All I was asking this time was to have a documentation fix
merged, though.

> especially since WLI has fixes for some of the problems.

I discussed this very issue with wli on linux-mm about a year ago. proc
file and documentation are still broken. So what's wrong with doing
something about it?

> Just why do you want to kill statm?

* Almost everything in there is redundant. IMO the kernel should provide
  information once and leave the rest to userspace. To make things worse,
  statm does not simply mirror information from somewhere else in the
  proc tree, it has its own (broken) routine to calculate redundant
  information.

* statm is broken. It was broken in 2.4 as well, but _differently_. Every
  application that relies on statm forwards wrong information, or at
  the very least needs special casing because the information provided
  in various fields differs between kernel versions.

* Nobody can really tell exactly how broken statm is because there is
  no canonical documentation of what it is supposed to do. That implies
  that it is kinda hard to properly fix statm.

* I hate the format. I like my proc files human readable. An important
  reason that statm could linger around in a broken state for so long
  is the lack of labels. It's hard to find bugs if there's nothing to
  indicate what the values are supposed to be. (and yes, /proc/pid/stat
  is awful, too, but it has the excuse of providing valuable information)

  (others may disagree, but you asked me why _I_ want to kill statm)

The only reason I could see for keeping statm around is that it
is cheaper than status for parsers in top & Co. Having written one
of them myself, I have spent quite some time thinking about better
alternatives. If you want to talk about that, count me in.

> Now quoting from your patch...
> 
> + size     total program size (pages)  (same as VmSize in status)
> + resident size of memory portions (pages) (same as VmRSS in status)
> 
> There was a distinction here that has been lost. One of these
> included memory-mapped hardware. You could see this with the
> X server video memory.

You can definitely not rely on that distinction being there. Feel free to
add a comment "may or may not include memory-mapped hardware, depending
on the kernel". This makes statm even worse, because even the seemingly
well-defined, redundant fields aren't.

If the memory-mapped hardware is valuable information, I suggest you
add a properly labeled field to /proc/pid/status.

> For "top" running on a 2.2.xx or 2.4.xx kernel, the statm values
> are better. Jim Warner determined this after careful examination,
> and I have no desire to re-analyse the matter. Remember that user

I didn't ask you to re-analyse anything. I didn't ask to change anything
about 2.2 or 2.4, either. But I found 2.4 statm (partially) broken a
year ago.

> tools are expected to run on both old and new kernels, while the
> kernel is expected to support old apps. We call this an ABI...

Newsflash: Your "ABI" was broken a long time ago. statm output is
not what it used to be. If statm is so important, how come its behavior
is not documented anywhere? The code does what it does, but it fails to
explain what it's meant to calculate. The proc.txt documentation has
been broken forever (fields switched!) and nobody noticed.

Besides, as you very well know it's not unheard of that contents of
files in proc change.

> + shared   number of pages that are shared (i.e. backed by a file)
> 
> This isn't in the status file. It's shown in top's default output.
> Since top must read this value from statm, it might as well use    
> other parts of statm as well.                                    

I agree that it's not in the status file. I agree that it would be
useful.

Too bad that column in statm does not really contain the amount of
shared memory, either. So you got a field labeled "shared" in top which
contains some other data.

Again, I suggest you add a field to status and make sure the calculation
is correct.

> + trs      number of pages that are 'code' (not including libs; broken,
> +       includes data segment)
> 
> Perhaps this works OK with the NX bit or on an Alpha? On a regular
> i386 box, code and read-only data are pretty much the same.

I didn't say read-only data. Let me illustrate:

$ cat /proc/23357/maps
08048000-0804c000 r-xp 00000000 03:42 9875928    /bin/cat (RL: 16 KB)
0804c000-0804d000 rw-p 00003000 03:42 9875928    /bin/cat (RL:  4 KB)
0804d000-0806e000 rw-p 0804d000 00:00 0 
40000000-40001000 rw-p 40000000 00:00 0 
40001000-40201000 r--p 00000000 03:42 9461381    /usr/lib/locale/locale-archive
422c4000-422d7000 r-xp 00000000 03:42 9290970    /lib/ld-2.3.3.so
422d7000-422d8000 rw-p 00012000 03:42 9290970    /lib/ld-2.3.3.so
422da000-423e4000 r-xp 00000000 03:42 9290974    /lib/libc-2.3.3.so
423e4000-423e8000 rw-p 00109000 03:42 9290974    /lib/libc-2.3.3.so
423e8000-423ea000 rw-p 423e8000 00:00 0 
bfffe000-c0000000 rw-p bfffe000 00:00 0 
ffffe000-fffff000 ---p 00000000 00:00 0

$ cat /proc/23357/status|grep VmExe
VmExe:        16 kB

$ cat /proc/23357/statm
845 105 807 5 0 840 0
            ^---- 5 pages == 20 KB

In other words: In this case, the trs/text field in statm counts one
page too many (4 pages text + 1 page data).
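
The arithmetic behind this comparison can be checked mechanically. The sketch below is illustrative Python, not kernel code. `mapping_kb` is a hypothetical helper, and the 4 KB page size is assumed from the i386 example above.

```python
PAGE_KB = 4  # assumed i386 page size, in KB


def mapping_kb(maps_line):
    """Size in KB of one /proc/<pid>/maps entry, from its address range."""
    addr_range = maps_line.split()[0]
    start, end = (int(x, 16) for x in addr_range.split("-"))
    return (end - start) // 1024


# The two /bin/cat mappings from the maps output above.
text = mapping_kb("08048000-0804c000 r-xp 00000000 03:42 9875928    /bin/cat")
data = mapping_kb("0804c000-0804d000 rw-p 00003000 03:42 9875928    /bin/cat")
statm_trs_pages = 5  # fourth column of the statm line above

print(text, data, statm_trs_pages * PAGE_KB)  # prints: 16 4 20
```

The r-xp mapping accounts for 16 KB and the file-backed rw-p mapping for 4 KB. Their sum matches the 20 KB that statm reports, which is consistent with the reading that the field counts the data page as text.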

> Note: trs means "text RESIDENT set".

Your point being?

That name is only mentioned in proc.txt, it's not used anywhere in the
code (it's called "text" there). If you want to replace trs with a
better fitting name, that's great.

> + lrs      number of pages of library  (always 0 on 2.6)
> 
> This worked for a.out executables. (that 0x60000000 value is an
> a.out constant) Oh well, trs will do.
> 
> + drs      number of pages of data/stack  (including libs; broken,
> +       includes library text)
> 
> Note: drs means "data RESIDENT set".
> 
> + dt       number of dirty pages   (always 0 on 2.6)
> 
> This one would be useful.

Agreed. It would be nice to have it somewhere else.

> These would be really useful too:
> 1. swap space used
> 2. swap space that would be used if fully paged out

There are many values that could be interesting or useful. But that
has nothing to do with the abomination that is statm.

> For the pmap command, it would be nice to have per-mapping
> values in the /proc/*/maps files. (resident, locked,
> dirty, C-O-W, swapped...) 

Hey, I am all _for_ improving proc. But rather than adding more values,
I'd like to address some design problems first: For example, I'd
like to have a reserved value for N/A (currently, kernels just set
obsolete fields to 0 and parsers must guess whether it's truly 0 or not
available). And then there is the trade-off between human readable and
easy to parse. ISTR there have been occasional discussions, but maybe
it's time to revisit the issue because the current mess is a problem.

Roger

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [proc.txt] Fix /proc/pid/statm documentation
  2004-08-06  9:40 ` Roger Luethi
@ 2004-08-06 10:46   ` William Lee Irwin III
  2004-08-06 12:01       ` Roger Luethi
  2004-08-06 12:58     ` Albert Cahalan
  1 sibling, 1 reply; 46+ messages in thread
From: William Lee Irwin III @ 2004-08-06 10:46 UTC (permalink / raw)
  To: Roger Luethi; +Cc: Albert Cahalan, linux-kernel mailing list, linux-mm

On Thu, 05 Aug 2004 21:11:52 -0400, Albert Cahalan wrote:
>> especially since WLI has fixes for some of the problems.

On Fri, Aug 06, 2004 at 11:40:37AM +0200, Roger Luethi wrote:
> I discussed this very issue with wli on linux-mm about a year ago. proc
> file and documentation are still broken. So what's wrong with doing
> something about it?

So now what, you want me to do yet another forward port of
linux-2.4.9-statm-B1.diff?


-- wli

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [proc.txt] Fix /proc/pid/statm documentation
  2004-08-06 10:46   ` William Lee Irwin III
@ 2004-08-06 12:01       ` Roger Luethi
  0 siblings, 0 replies; 46+ messages in thread
From: Roger Luethi @ 2004-08-06 12:01 UTC (permalink / raw)
  To: William Lee Irwin III, Albert Cahalan, linux-kernel mailing list,
	linux-mm

[ fixed linux-mm address ]

On Fri, 06 Aug 2004 03:46:30 -0700, William Lee Irwin III wrote:
> On Fri, Aug 06, 2004 at 11:40:37AM +0200, Roger Luethi wrote:
> > I discussed this very issue with wli on linux-mm about a year ago. proc
> > file and documentation are still broken. So what's wrong with doing
> > something about it?
> 
> So now what, you want me to do yet another forward port of
> linux-2.4.9-statm-B1.diff?

Your call, obviously -- do you think it's worthwhile? I didn't CC you
on my initial posting because I wanted to avoid the impression that I am
trying to make this your problem somehow. Priorities as I see them are:

- Document statm content somewhere. I posted a patch to document
  the current state. It could be complemented with a description of
  what it is supposed to do.

- Come to some agreement on what the proper values should be and
  change kernels accordingly. I'm inclined to favor keeping the first two
  (albeit redundant) fields and setting the rest to 0, simply because for
  them too many different de-facto semantics live in existing kernels.

  A year ago, the first field was broken in 2.4 as well (not sure if/when
  it got fixed), but I can see why it is useful to keep around until top
  has found a better source. Same for the second field, the only one that
  has always been correct AFAIK.

- Provide additional information in proc files other than statm.

  The problems with undocumented records are evident, but
  /proc/pid/status may be getting too heavy for frequent parsing. It's
  not realistic to redesign proc at this point, but it would be nice
  to have some documented understanding about the direction of proc
  evolution.

Roger

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [proc.txt] Fix /proc/pid/statm documentation
  2004-08-06 12:01       ` Roger Luethi
@ 2004-08-06 12:11         ` William Lee Irwin III
  -1 siblings, 0 replies; 46+ messages in thread
From: William Lee Irwin III @ 2004-08-06 12:11 UTC (permalink / raw)
  To: Roger Luethi; +Cc: Albert Cahalan, linux-kernel mailing list, linux-mm

On Fri, Aug 06, 2004 at 02:01:23PM +0200, Roger Luethi wrote:
> Your call, obviously -- do you think it's worthwhile? I didn't CC you
> on my initial posting because I wanted to avoid the impression that I am
> trying to make this your problem somehow. Priorities as I see them are:
> - Document statm content somewhere. I posted a patch to document
>   the current state. It could be complemented with a description of
>   what it is supposed to do.
> - Come to some agreement on what the proper values should be and
>   change kernels accordingly. I'm inclined to favor keeping the first two
>   (albeit redundant) fields and setting the rest to 0, simply because for
>   them too many different de-facto semantics live in existing kernels.
>   A year ago, the first field was broken in 2.4 as well (not sure if/when
>   it got fixed), but I can see why it is useful to keep around until top
>   has found a better source. Same for the second field, the only one that
>   has always been correct AFAIK.

Some of the 2.4 semantics just don't make sense. I would not find it
difficult to explain what I believe correct semantics to be in a written
document.

The largest barrier is that the accounting has a large code impact.


On Fri, Aug 06, 2004 at 02:01:23PM +0200, Roger Luethi wrote:
> - Provide additional information in proc files other than statm.
>   The problems with undocumented records are evident, but
>   /proc/pid/status may be getting too heavy for frequent parsing. It's
>   not realistic to redesign proc at this point, but it would be nice
>   to have some documented understanding about the direction of proc
>   evolution.

It will likely be easier to merge improvements of /proc/$PID/status as
the operations there are far less frequent and the accounting less
invasive.


-- wli

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [proc.txt] Fix /proc/pid/statm documentation
  2004-08-06  9:40 ` Roger Luethi
@ 2004-08-06 12:58     ` Albert Cahalan
  2004-08-06 12:58     ` Albert Cahalan
  1 sibling, 0 replies; 46+ messages in thread
From: Albert Cahalan @ 2004-08-06 12:58 UTC (permalink / raw)
  To: Roger Luethi; +Cc: linux-kernel mailing list, linux-mm, wli

On Fri, 2004-08-06 at 05:40, Roger Luethi wrote:
> On Thu, 05 Aug 2004 21:11:52 -0400, Albert Cahalan wrote:
>> Roger Luethi writes:
>>
>>> I really wanted /proc/pid/statm to die [1] and I still believe the
>>> reasoning is valid. As it doesn't look like that is going to happen,
>>
>> It would be awful to lose statm,
>
> Hardly. All I was asking this time was to have a documentation fix
> merged, though.

Just delete the documentation. I certainly never use it.
Since you need the kernel source to get the documentation
anyway, you might as well examine the fs/proc/*.c files.

>> Just why do you want to kill statm?
>
> * Almost everything in there is redundant. IMO the kernel should provide
>   information once and leave the rest to userspace. To make things worse,
>   statm does not simply mirror information from somewhere else in the
>   proc tree, it has its own (broken) routine to calculate redundant
>   information.
>
> * statm is broken. It was broken in 2.4 as well, but _differently_. Every
>   application that relies on statm forwards wrong information, or at
>   the very least needs special casing because the information provided
>   in various fields differs between kernel versions.

The kernel has multiple stat() syscalls. At times, they have been
broken when dealing with UID values that overflow. Should these
system calls have been eliminated? If not, how is this different?

> * Nobody can really tell exactly how broken statm is because there is
>   no canonical documentation of what it is supposed to do. That implies
>   that it is kinda hard to properly fix statm.

Nah. Just look at the 2.2.xx and 2.4.xx kernels.

> * I hate the format. I like my proc files human readable. An important
>   reason that statm could linger around in a broken state for so long
>   is the lack of labels. It's hard to find bugs if there's nothing to
>   indicate what the values are supposed to be. (and yes, /proc/pid/stat
>   is awful, too, but it has the excuse of providing valuable information)

Nobody has been screwing with the statm formatting. There is
no temptation. The same can not be said of the "readable" files.

Is it SigCgt or SigCat? That would depend on kernel version.
What about /proc/cpuinfo? An old file gets parsed on whitespace.
A recent one has ':' characters that you must use.

> The only reason I could see for keeping statm around is that it
> is cheaper than status for parsers in top & Co. Having written one
> of them myself, I have spent quite some time thinking about better
> alternatives. If you want to talk about that, count me in.

The statm format rules, assuming you don't go binary.

>> Now quoting from your patch...
>>
>> + size     total program size (pages)  (same as VmSize in status)
>> + resident size of memory portions (pages) (same as VmRSS in status)
>>
>> There was a distinction here that has been lost. One of these
>> included memory-mapped hardware. You could see this with the
>> X server video memory.
>
> You can definitely not rely on that distinction being there. Feel free to
> add a comment "may or may not include memory-mapped hardware, depending
> on the kernel". This makes statm even worse, because even the seemingly
> well-defined, redundant fields aren't.

This is merely a kernel bug. Hey, bugs happen.

>> tools are expected to run on both old and new kernels, while the
>> kernel is expected to support old apps. We call this an ABI...
>
> Newsflash: Your "ABI" was broken a long time ago. statm output is
> not what it used to be. If statm is so important, how come its behavior
> is not documented anywhere? The code does what it does, but it fails to
> explain what it's meant to calculate. The proc.txt documentation has
> been broken forever (fields switched!) and nobody noticed.

Nobody uses proc.txt, right? The source is documentation.
Old source code is available.

>> + shared   number of pages that are shared (i.e. backed by a file)
>>
>> This isn't in the status file. It's shown in top's default output.
>> Since top must read this value from statm, it might as well use    
>> other parts of statm as well.                                    
>
> I agree that it's not in the status file. I agree that it would be
> useful.
>
> Too bad that column in statm does not really contain the amount of
> shared memory, either. So you got a field labeled "shared" in top which
> contains some other data.
>
> Again, I suggest you add a field to status and make sure the calculation
> is correct.

Why? If statm is broken, it should be fixed. Putting the statm
data into the status file was dumb, but it's too late now.

>> Note: trs means "text RESIDENT set".
>
> Your point being?
>
> That name is only mentioned in proc.txt, it's not used anywhere in the
> code (it's called "text" there). If you want to replace trs with a
> better fitting name, that's great.

The name is correct, though the code might not be. The name is common
to other UNIX-like systems.

On AIX:  ps -eo trs
On BSD:  ps axo trss

Text size is "tsiz". We have that in the stat file, as the difference
between end_code and start_code. We don't need a second copy of tsiz.
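
A sketch of that subtraction in Python follows. It assumes the field numbering given in the proc(5) manual page, where start_code and end_code are fields 26 and 27 of /proc/<pid>/stat. `text_size_bytes` is a hypothetical helper name. The command name in field 2 can itself contain spaces and parentheses, so the line is split after the last ')'.

```python
def text_size_bytes(stat_line):
    """Return end_code - start_code from one /proc/<pid>/stat line."""
    # Everything after the last ')' begins at field 3 (state),
    # so field N sits at index N - 3 of the split remainder.
    rest = stat_line.rsplit(")", 1)[1].split()
    start_code = int(rest[26 - 3])
    end_code = int(rest[27 - 3])
    return end_code - start_code
```

On some kernels these two fields may read as 0 for processes the caller is not permitted to inspect, so a result of 0 is not necessarily a real text size.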

>> + dt       number of dirty pages   (always 0 on 2.6)
>>
>> This one would be useful.
>
> Agreed. It would be nice to have it somewhere else.

No, it's not nice to go moving things around. How about you go
renumber all the syscalls? The x86-64 arch ordered them to avoid
cache misses. That would be great for i386 too, hmmm?

>> These would be really useful too:
>> 1. swap space used
>> 2. swap space that would be used if fully paged out
>
> There are many values that could be interesting or useful. But that
> has nothing to do with the abomination that is statm.

These values belong in statm.

>> For the pmap command, it would be nice to have per-mapping
>> values in the /proc/*/maps files. (resident, locked,
>> dirty, C-O-W, swapped...) 
>
> Hey, I am all _for_ improving proc. But rather than adding more values,
> I'd like to address some design problems first: For example, I'd
> like to have a reserved value for N/A (currently, kernels just set
> obsolete fields to 0 and parsers must guess whether it's truly 0 or not
> available).

Don't even think of changing this.

> And then there is the trade-off between human readable and
> easy to parse. ISTR there have been occasional discussions, but maybe
> it's time to revisit the issue because the current mess is a problem.

The current bugs are a problem.

Quoting your other email now:

> [ fixed linux-mm address ]

This should have been on linux-kernel in the first place.
The linux-mm list is kind of obscure, and doubly so because
it isn't on vger.kernel.org.

> - Document statm content somewhere. I posted a patch to document
>   the current state. It could be complemented with a description of
>   what it is supposed to do.

Put this in the code as comments if you like.
The proc.txt file isn't used.

> - Come to some agreement on what the proper values should be and
>   change kernels accordingly. I'm inclined to favor keeping the first two
>   (albeit redundant) fields and setting the rest to 0, simply because for
>   them too many different de-facto semantics live in existing kernels.
>
>   A year ago, the first field was broken in 2.4 as well (not sure if/when
>   it got fixed), but I can see why it is useful to keep around until top
>   has found a better source. Same for the second field, the only one that
>   has always been correct AFAIK.
>
> - Provide additional information in proc files other than statm.

No, statm is the proper and only place for this data.
I certainly don't claim that statm is bug-free code.
That's not a reason to discard the whole statm concept.

IMHO, the status file should never have been introduced.
It's redundant, wordy, slow to parse, and too tempting to
people who want to rename the keywords. In spite of this,
I don't suggest ripping out the status file. It's in the
ABI now.

>   The problems with undocumented records are evident, but
>   /proc/pid/status may be getting too heavy for frequent parsing. It's
>   not realistic to redesign proc at this point, but it would be nice
>   to have some documented understanding about the direction of proc
>   evolution.

The status file was too heavy from the beginning. I'm now using
a pre-computed hash table to parse the damn thing.
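
For comparison, parsing the labeled format looks roughly like the sketch below. It is plain Python using an ordinary set and dict lookup, not the pre-computed hash table mentioned above. `parse_status`, the default key list, and the sample text are all illustrative assumptions.

```python
def parse_status(text, wanted=("VmSize", "VmRSS", "VmExe")):
    """Pick selected 'Key:<whitespace>value' lines out of status-style text."""
    wanted = set(wanted)
    found = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        # Every line must be split and its key looked up, wanted or not.
        if sep and key in wanted:
            found[key] = value.strip()
    return found


# Made-up sample in the style of /proc/<pid>/status, for illustration only.
sample = "Name:\tcat\nVmSize:\t    3380 kB\nVmRSS:\t     420 kB\nVmExe:\t      16 kB\n"
print(parse_status(sample))
```

The lookup depends on exact key spelling, so a renamed keyword such as SigCgt versus SigCat silently drops out of the result. A positional format like statm avoids that cost but gives the reader nothing to check the values against.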

Say, how's this?

$ cat "/proc/SELECT pid,tty,time,cmd FROM proc WHERE user='rl'"
 PID TTY          TIME CMD
5139 ttypf    00:00:00 bash
5158 ttypf    00:00:00 cat
$

It's a standard. :-)




^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [proc.txt] Fix /proc/pid/statm documentation
@ 2004-08-06 12:58     ` Albert Cahalan
  0 siblings, 0 replies; 46+ messages in thread
From: Albert Cahalan @ 2004-08-06 12:58 UTC (permalink / raw)
  To: Roger Luethi; +Cc: linux-kernel mailing list, linux-mm, wli

On Fri, 2004-08-06 at 05:40, Roger Luethi wrote:
> On Thu, 05 Aug 2004 21:11:52 -0400, Albert Cahalan wrote:
>> Roger Luethi writes:
>>
>>> I really wanted /proc/pid/statm to die [1] and I still believe the
>>> reasoning is valid. As it doesn't look like that is going to happen,
>>
>> It would be awful to lose statm,
>
> Hardly. All I was asking this time was to have a documentation fix
> merged, though.

Just delete the documentation. I certainly never use it.
Since you need the kernel source to get the documentation
anyway, you might as well examine the fs/proc/*.c files.

>> Just why do you want to kill statm?
>
> * Almost everything in there is redundant. IMO the kernel should provide
>   information once and leave the rest to userspace. To make things worse,
>   statm does not simply mirror information from somewhere else in the
>   proc tree, it has its own (broken) routine to calculate redundant
>   information.
>
> * statm is broken. It was broken in 2.4 as well, but _differently_. Every
>   application that relies on statm forwards wrong information, or at
>   the very least needs special casing because the information provided
>   in various fields differs between kernel versions.

The kernel has multiple stat() syscalls. At times, they have been
broken when dealing with UID values that overflow. Should these
system calls have been eliminated? If not, how is this different?

> * Nobody can really tell exactly how broken statm is because there is
>   no canonical documentation of what it is supposed to do. That implies
>   that it is kinda hard to properly fix statm.

Nah. Just look at the 2.2.xx and 2.4.xx kernels.

> * I hate the format. I like my proc files human readable. An important
>   reason that statm could linger around in a broken state for so long
>   is the lack of labels. It's hard to find bugs if there's nothing to
>   indicate what the values are supposed to be. (and yes, /proc/pid/stat
>   is awful, too, but it has the excuse of providing valuable information)

Nobody has been screwing with the statm formatting. There is
no temptation. The same can not be said of the "readable" files.

Is it SigCgt or SigCat? That would depend on kernel version.
What about /proc/cpuinfo? An old file gets parsed on whitespace.
A recent one has ':' characters that you must use.

> The only reason I could see for keeping statm around is that it
> is cheaper than status for parsers in top & Co. Having written one
> of them myself, I have spent quite some time thinking about better
> alternatives. If you want to talk about that, count me in.

The statm format rules, assuming you don't go binary.

>> Now quoting from your patch...
>>
>> + size     total program size (pages)  (same as VmSize in status)
>> + resident size of memory portions (pages) (same as VmRSS in status)
>>
>> There was a distinction here that has been lost. One of these
>> included memory-mapped hardware. You could see this with the
>> X server video memory.
>
> You can definitely not rely on that distinction being there. Feel free to
> add a comment "may or may not include memory-mapped hardware, depending
> on the kernel". This makes statm even worse, because even the seemingly
> well-defined, redundant fields aren't.

This is merely a kernel bug. Hey, bugs happen.

>> tools are expected to run on both old and new kernels, while the
>> kernel is expected to support old apps. We call this an ABI...
>
> Newsflash: Your "ABI" has been broken a long time ago. statm output is
> not what it used to be. If statm is so important, how come its behavior
> is nowhere documented? The code does what it does, but it fails to
> explain what it's meant to calculate. The proc.txt documentation has
> been broken forever (fields switched!) and nobody noticed.

Nobody uses proc.txt, right? The source is documentation.
Old source code is available.

>> + shared   number of pages that are shared (i.e. backed by a file)
>>
>> This isn't in the status file. It's shown in top's default output.
>> Since top must read this value from statm, it might as well use    
>> other parts of statm as well.                                    
>
> I agree that it's not in the status file. I agree that it would be
> useful.
>
> Too bad that column in statm does not really contain the amount of
> shared memory, either. So you got a field labeled "shared" in top which
> contains some other data.
>
> Again, I suggest you add a field to status and make sure the calculation
> is correct.

Why? If statm is broken, it should be fixed. Putting the statm
data into the status file was dumb, but it's too late now.

>> Note: trs means "text RESIDENT set".
>
> Your point being?
>
> That name is only mentioned in proc.txt, it's not used anywhere in the
> code (it's called "text" there). If you want to replace trs with a
> better fitting name, that's great.

The name is correct, though the code might not be. The name is common
to other UNIX-like systems.

On AIX:  ps -eo trs
On BSD:  ps axo trss

Text size is "tsiz". We have that in the stat file, as the difference
between end_code and start_code. We don't need a second copy of tsiz.
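
As a rough illustration of the tsiz calculation described above, here is a
minimal Python sketch. It assumes start_code and end_code are fields 26 and
27 of /proc/<pid>/stat, as listed in the proc(5) manual page; the positions
should be checked against the kernel actually in use. The function name is
made up for this example.

```python
def text_size_from_stat(stat_line):
    """Return end_code - start_code (in bytes) from one /proc/<pid>/stat line."""
    # Field 2 (the command name) is wrapped in parentheses and may itself
    # contain spaces or ')', so split only after the LAST ')'.
    rest = stat_line[stat_line.rindex(")") + 1:].split()
    # rest[0] is field 3 (state), so field N is found at rest[N - 3].
    # Assumption: start_code is field 26 and end_code is field 27.
    start_code = int(rest[26 - 3])
    end_code = int(rest[27 - 3])
    return end_code - start_code
```

On a Linux box this could be fed the contents of /proc/self/stat. Note that
this is the text size, not the resident portion of it.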

>> + dt       number of dirty pages   (always 0 on 2.6)
>>
>> This one would be useful.
>
> Agreed. It would be nice to have it somewhere else.

No, it's not nice to go moving things around. How about you go
renumber all the syscalls? The x86-64 arch ordered them to avoid
cache misses. That would be great for i386 too, hmmm?

>> These would be really useful too:
>> 1. swap space used
>> 2. swap space that would be used if fully paged out
>
> There are many values that could be interesting or useful. But that
> has nothing to do with the abomination that is statm.

These values belong in statm.

>> For the pmap command, it would be nice to have per-mapping
>> values in the /proc/*/maps files. (resident, locked,
>> dirty, C-O-W, swapped...) 
>
> Hey, I am all _for_ improving proc. But rather than adding more values,
> I'd like to address some design problems first: For example, I'd
> like to have a reserved value for N/A (currently, kernels just set
> obsolete fields to 0 and parsers must guess whether it's truly 0 or not
> available).

Don't even think of changing this.

> And then there is the trade-off between human readable and
> easy to parse. ISTR there have been occasional discussions, but maybe
> it's time to revisit the issue because the current mess is a problem.

The current bugs are a problem.

Quoting your other email now:

> [ fixed linux-mm address ]

This should have been on linux-kernel in the first place.
The linux-mm list is kind of obscure, and doubly so because
it isn't on vger.kernel.org.

> - Document statm content somewhere. I posted a patch to document
>   the current state. It could be complemented with a description of
>   what it is supposed to do.

Put this in the code as comments if you like.
The proc.txt file isn't used.

> - Come to some agreement on what the proper values should be and
>   change kernels accordingly. I'm inclined to favor keeping the first two
>   (albeit redundant) fields and setting the rest to 0, simply because for
>   them too many different de-facto semantics live in existing kernels.
>
>   A year ago, the first field was broken in 2.4 as well (not sure if/when
>   it got fixed), but I can see why it is useful to keep around until top
>   has found a better source. Same for the second field, the only one that
>   has always been correct AFAIK.
>
> - Provide additional information in proc files other than statm.

No, statm is the proper and only place for this data.
I certainly don't claim that statm is bug-free code.
That's not a reason to discard the whole statm concept.

IMHO, the status file should never have been introduced.
It's redundant, wordy, slow to parse, and too tempting to
people who want to rename the keywords. In spite of this,
I don't suggest ripping out the status file. It's in the
ABI now.

>   The problems with undocumented records are evident, but
>   /proc/pid/status may be getting too heavy for frequent parsing. It's
>   not realistic to redesign proc at this point, but it would be nice
>   to have some documented understanding about the direction of proc
>   evolution.

The status file was too heavy from the beginning. I'm now using
a pre-computed hash table to parse the damn thing.
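
For readers unfamiliar with the approach, the sketch below shows the general
idea of table-driven parsing of a status file: look each keyword up in a
table built ahead of time and convert only the values the tool needs. It is
written in Python with a plain dict standing in for the hash table; it is
not the procps code, and the set of keys and the helper names are
assumptions chosen for illustration.

```python
def _kb(value):
    # "3380 kB" -> 3380
    return int(value.split()[0])

# Built once: keyword -> converter for the values this example tool wants.
WANTED = {
    "Name": str,
    "Pid": int,
    "VmSize": _kb,
    "VmRSS": _kb,
}

def parse_status(text):
    """Extract the wanted keys from the text of a /proc/<pid>/status file."""
    out = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        convert = WANTED.get(key)
        # Lines without a ':' and keywords we don't know are skipped.
        if sep and convert is not None:
            out[key] = convert(value.strip())
    return out
```

Even with a constant-time lookup, every line of the file still has to be
read and split, which is the cost being complained about here.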

Say, how's this?

$ cat "/proc/SELECT pid,tty,time,cmd FROM proc WHERE user='rl'"
 PID TTY          TIME CMD
5139 ttypf    00:00:00 bash
5158 ttypf    00:00:00 cat
$

It's a standard. :-)



--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"aart@kvack.org"> aart@kvack.org </a>

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [proc.txt] Fix /proc/pid/statm documentation
  2004-08-06 12:11         ` William Lee Irwin III
@ 2004-08-06 13:57           ` Roger Luethi
  -1 siblings, 0 replies; 46+ messages in thread
From: Roger Luethi @ 2004-08-06 13:57 UTC (permalink / raw)
  To: William Lee Irwin III, Albert Cahalan, linux-kernel mailing list,
	linux-mm

On Fri, 06 Aug 2004 05:11:18 -0700, William Lee Irwin III wrote:
> Some of the 2.4 semantics just don't make sense. I would not find it
> difficult to explain what I believe correct semantics to be in a written
> document.

IMO this is a must for such files (and be it only some comments above
the code implementing them). I'm afraid that statm is carrying too much
historical baggage, though -- you would add yet another interpretation
of those 7 fields.

Tools reading statm would have to be updated anyway, so I'd rather
think about what could be done with a new (or just different) file.

For sysfs we have guidelines (e.g. sysfs.txt: "Attributes should be ASCII
text files, preferably with only one value per file. It is noted that it
may not be efficient to contain only one value per file, so it is socially
acceptable to express an array of values of the same type.").

I'm not aware of anything comparable for proc, so it's hard to say
what a good solution would look like. Files like /proc/pid/status
are human-readable and maintenance-friendly (the parser can recognize
unknown values and gets a free label along with it; obsolete fields can
be removed). The downside is the performance aspect you pointed out:
Reading that file for every process just to grep for one or two values
is slow, and some of the unused data items might be expensive for the
kernel to produce in the first place.

It seems that most new information of interest is being added to
/proc/pid/status and friends these days. Are there any plans to
accommodate tool authors who are interested in additional information
but are wary of the increasing costs of these files?

A light-weight interface for tools could work like this (ugly):

$ cat /proc/pid.provided
Name SleepAVG Pid Tgid PPid VmSize VmLck VmData [...]
$ cat /proc/10235/VmSize.VmData
3380 144

Or use netlink maybe? It sure would be nice to monitor all processes
with lower overhead, and to have tools that can deal with new data
items without an update.

Roger

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [proc.txt] Fix /proc/pid/statm documentation
  2004-08-06 12:01       ` Roger Luethi
@ 2004-08-06 14:02         ` Albert Cahalan
  -1 siblings, 0 replies; 46+ messages in thread
From: Albert Cahalan @ 2004-08-06 14:02 UTC (permalink / raw)
  To: Roger Luethi; +Cc: William Lee Irwin III, linux-kernel mailing list, linux-mm

Roger Luethi writes:
> On Fri, 06 Aug 2004 05:11:18 -0700, William Lee Irwin III wrote:

>> Some of the 2.4 semantics just don't make sense. I would not find it
>> difficult to explain what I believe correct semantics to be in a written
>> document.
>
> IMO this is a must for such files (and be it only some comments above
> the code implementing them). I'm afraid that statm is carrying too much
> historical baggage, though -- you would add yet another interpretation
> of those 7 fields.
>
> Tools reading statm would have to be updated anyway, so I'd rather
> think about what could be done with a new (or just different) file.

Even if the existing fields are indeed mostly junk, you can always
add new fields to the end.

> For sysfs we have guidelines (e.g. sysfs.txt: "Attributes should be ASCII
> text files, preferably with only one value per file. It is noted that it
> may not be efficient to contain only one value per file, so it is socially
> acceptable to express an array of values of the same type.").

This is being lost. PCI ROM data isn't ASCII unless you use hex.

> I'm not aware of anything comparable for proc, so it's hard to say
> what a good solution would look like. Files like /proc/pid/status
> are human-readable and maintenance-friendly (the parser can recognize
> unknown values and gets a free label along with it; obsolete fields can
> be removed).

If you're just spewing the values with a perl script, sure.
I'm not sure this matters.

Normal C programs don't work that way. Unknown values are useless.
What am I supposed to do with an unknown value? I can't even tell
what data type it is. Maybe 12345 is really a string. I'm going
to rely on the values I need, so you can't freely delete things.
If I didn't need the values, I wouldn't read the file at all.

> The downside is the performance aspect you pointed out:
> Reading that file for every process just to grep for one or two values
> is slow, and some of the unused data items might be expensive for the
> kernel to produce in the first place.

You're using grep??? That's a script then. You can tolerate
getting your info from "ps" output. It's not a performance
issue for you. For ps, performance is a problem. Thus ps must
get priority in the design of /proc files.

You can do this:

ps -eo pid= -o comm= | grep '[f]oo' | ...

Heck, it's even portable!
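
A script that prefers to consume this output programmatically rather than
through grep could do something like the following. This is a minimal
Python sketch that assumes the two-column "pid comm" output produced by the
ps invocation above (the empty headers suppress the header line); the
function name is made up for this example.

```python
def parse_ps_pid_comm(output):
    """Turn the output of `ps -eo pid= -o comm=` into (pid, comm) tuples."""
    procs = []
    for line in output.splitlines():
        # The pid column may be left-padded; split once so that a command
        # name containing spaces stays in one piece.
        parts = line.split(None, 1)
        if len(parts) == 2:
            procs.append((int(parts[0]), parts[1].strip()))
    return procs
```

The text itself can be obtained with, for example,
subprocess.run(["ps", "-eo", "pid=", "-o", "comm="], capture_output=True,
text=True).stdout.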

> It seems that most new information of interest is being added to
> /proc/pid/status and friends these days. Are there any plans to
> accommodate tool authors who are interested in additional information
> but are wary of the increasing costs of these files?

> A light-weight interface for tools could work like this (ugly):
>
> $ cat /proc/pid.provided
> Name SleepAVG Pid Tgid PPid VmSize VmLck VmData [...]
> $ cat /proc/10235/VmSize.VmData
> 3380 144

It's hard to imagine parsing that. I suppose I'm expected to
dynamically create a sscanf format using the numbered-parameter
notation? Maybe I have to fill a table with pointers to... Ugh.

If it's going to be this dynamic, then just give me DWARF2 debug
info and the raw data. Like this:

/proc/DWARF2
/proc/1000/mm_struct
/proc/1000/signal_struct
/proc/1000/sighand_struct
/proc/1000/task/1024/thread_info
/proc/1000/task/1024/task_struct
/proc/1000/task/1024/fs_struct

> Or use netlink maybe? It sure would be nice to monitor all processes
> with lower overhead, and to have tools that can deal with new data
> items without an update.

I've been thinking netlink might be good.

> I am also interested in a related problem -- finding a better way for
> tools to access process information. Preferably a generic way so we
> don't need to keep tools and kernel in sync forever. I have some ideas,
> but I don't know if they are acceptable as solutions (and if the problem
> actually exists as I see it).

Look at other systems. FreeBSD, AIX, and Solaris all have
superior ways of getting process data. Being compatible, at
least for the basic info, would be good.

FreeBSD: binary sysctl data with built-in process selection
AIX:     dedicated syscall, somewhat resembling directory reads
Solaris: binary /proc, including arrays for per-thread data

Somebody can research Tru64, HP-UX, MacOS X, and IRIX.

> Most of the current problems with proc are related to tools: They don't
> like changes and some of them are very sensitive to resource usage
> (because they may make hundreds of calls per second on typical systems).

Make that 2000 /proc reads per second or more. This is too slow.
I need to read about 1 million /proc files per second.

> If we want to facilitate the use of additional information in tools,
> I see two possible strategies:
>
> - Design a new solution that enables tools to discover the fields
>   that are available and to ask for a subset (as I sketched out in my
>   previous post). This would remove the need for inflexible solutions
>   like statm.

That's useless.

If I didn't need the data, I wouldn't be trying to read it.
If I haven't written code to use new data, I sure won't be
caring to know the name of the new data.

> - Split proc information by new criteria: Slow, expensive items should
>   not be in the same file as information that tools typically
>   and frequently read. For instance, you could have status_basic,
>   status_exotic, and status_slow. Even status_basic could have a format
>   similar to /proc/pid/status, but would be shorter and contain only
>   the most frequently used values (like statm today -- with all the
>   problems that come with such a pre-made selection).

Split by:
1. locking
2. security.




^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [proc.txt] Fix /proc/pid/statm documentation
  2004-08-06 13:57           ` Roger Luethi
@ 2004-08-06 14:07             ` William Lee Irwin III
  -1 siblings, 0 replies; 46+ messages in thread
From: William Lee Irwin III @ 2004-08-06 14:07 UTC (permalink / raw)
  To: Roger Luethi; +Cc: Albert Cahalan, linux-kernel mailing list, linux-mm

On Fri, 06 Aug 2004 05:11:18 -0700, William Lee Irwin III wrote:
>> Some of the 2.4 semantics just don't make sense. I would not find it
>> difficult to explain what I believe correct semantics to be in a written
>> document.

On Fri, Aug 06, 2004 at 03:57:56PM +0200, Roger Luethi wrote:
> IMO this is a must for such files (and be it only some comments above
> the code implementing them). I'm afraid that statm is carrying too much
> historical baggage, though -- you would add yet another interpretation
> of those 7 fields.
> Tools reading statm would have to be updated anyway, so I'd rather
> think about what could be done with a new (or just different) file.

Okay, could you write up a "specification" for what you want reported,
then I can cook up a new file or some such for you?

Thanks.


-- wli

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [proc.txt] Fix /proc/pid/statm documentation
  2004-08-06 15:48       ` William Lee Irwin III
@ 2004-08-06 14:14         ` Albert Cahalan
  -1 siblings, 0 replies; 46+ messages in thread
From: Albert Cahalan @ 2004-08-06 14:14 UTC (permalink / raw)
  To: William Lee Irwin III; +Cc: Roger Luethi, linux-kernel mailing list, linux-mm

On Fri, 2004-08-06 at 11:48, William Lee Irwin III wrote:
> On Fri, 2004-08-06 at 05:40, Roger Luethi wrote:
> >> And then there is the trade-off between human readable and
> >> easy to parse. ISTR there have been occasional discussions, but maybe
> >> it's time to revisit the issue because the current mess is a problem.
> 
> On Fri, Aug 06, 2004 at 08:58:43AM -0400, Albert Cahalan wrote:
> > The current bugs are a problem.
> > Quoting your other email now:
> 
> Could you describe those in isolation from other issues?

Whatever Roger found, plus:

1. trs == text RESIDENT size

2. drs == data RESIDENT size

3. memory-mapped devices should be counted for only 1 file
   (use an old Linux box running X to see)

I'm not terribly concerned right now. I just don't think
it's OK to go ripping out statm over a few bugs.

If we ripped out every buggy piece of kernel code, we'd
have a 0-byte kernel.

There are far bigger issues elsewhere, like %CPU.



^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [proc.txt] Fix /proc/pid/statm documentation
  2004-08-06 16:34       ` Roger Luethi
@ 2004-08-06 14:51         ` Albert Cahalan
  -1 siblings, 0 replies; 46+ messages in thread
From: Albert Cahalan @ 2004-08-06 14:51 UTC (permalink / raw)
  To: Roger Luethi; +Cc: linux-kernel mailing list, linux-mm, wli

On Fri, 2004-08-06 at 12:34, Roger Luethi wrote:
> On Fri, 06 Aug 2004 08:58:43 -0400, Albert Cahalan wrote:
> > > Hardly. All I was asking this time was to have a documentation fix
> > > merged, though.
> > 
> > Just delete the documentation. I certainly never use it.
> 
> It wasn't written for you.

OK, but the statm file was. (well, for the maintainer
of procps a decade ago)

Everybody else can parse ps output.

> > Since you need the kernel source to get the documentation
> > anyway, you might as well examine the fs/proc/*.c files.
> 
> Some users may prefer written documentation over reading the kernel
> source. In addition, in the case of statm, there is nothing to document
> the expected behavior in the source, either. Which is precisely why
> statm has been utterly broken forever.

A correct proc.txt would not have avoided this.
The source code needs a few comments.

> > > * statm is broken. It was broken in 2.4 as well, but _differently_. Every
> > >   application that relies on statm forwards wrong information, or at
> > >   the very least needs special casing because the information provided
> > >   in various fields differs between kernel versions.
> > 
> > The kernel has multiple stat() syscalls. At times, they have been
> > broken when dealing with UID values that overflow. Should these
> > system calls have been eliminated? If not, how is this different?
> 
> stat is a well-defined POSIX call.

Sure, and the current version with wide UID values was
working just fine. POSIX only defines the library interface
anyway. This made the old stat syscalls be non-POSIX crud.

They were part of the ABI. They were fixed.

> > Why? If statm is broken, it should be fixed. Putting the statm
> > data into the status file was dumb, but it's too late now.
> 
> It was not dumb. Some people actually prefer human-readable output when
> working with proc.

These people shouldn't be working in /proc. It's easier and
more portable to use "ps" for their scripts. You can select
which fields you want, get a header if you like, have the
processes filtered for you, and so on. Look:

ps -U root -u root -o pid= -o ppid= -o args

What's not to like about that? It's portable even.

> > On AIX:  ps -eo trs
> > On BSD:  ps axo trss
> 
> I trust they take that information from /proc/pid/statm, too?

The point is that the name "trs" has a specific meaning.
The statm file was created to support ps. It wouldn't exist
if ps didn't need to display a TRS column. So the proper
behavior of ps is what defines the meaning. Run these two
commands to see where TRS should be used:

ps v
CMD_ENV=old ps m

Right now, TSIZ is being substituted. That's wrong.
(this is part of the reason why "ps m" was changed)

The top command still tries to display the real trs.

> > >> + dt       number of dirty pages   (always 0 on 2.6)
> > >>
> > >> This one would be useful.
> > >
> > > Agreed. It would be nice to have it somewhere else.
> > 
> > No, it's not nice to go moving things around. How about you go
> 
> This field is 0 on 2.6. Zero. Always. I am suggesting to have the
> information available somewhere. That sure ought to count as an
> improvement.

Sure. That "somewhere" should be where it was before.

> > >> These would be really useful too:
> > >> 1. swap space used
> > >> 2. swap space that would be used if fully paged out
> > >
> > > There are many values that could be interesting or useful. But that
> > > has nothing to do with the abomination that is statm.
> > 
> > These values belong in statm.
> 
> I thought there was no screwing around with the statm format!?

Adding on to the end is always allowed.

You can't change a value to hex, insert a field in the middle,
delete a field, add comments, change the units, and so on.
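
A minimal sketch of what the append-only rule means for a parser, assuming
the seven-field order used in proc.txt (size, resident, shared, trs, lrs,
drs, dt). The helper name parse_statm and the "extra" key are hypothetical,
chosen only for illustration:

```python
# Field order as assumed from proc.txt; all values are page counts.
STATM_FIELDS = ("size", "resident", "shared", "trs", "lrs", "drs", "dt")


def parse_statm(line):
    """Parse one statm line into a dict of integers.

    Values are whitespace-separated decimal numbers. Anything after the
    seven known fields is kept under "extra" instead of being rejected,
    so a kernel that appends fields does not break this parser.
    """
    values = [int(tok) for tok in line.split()]
    if len(values) < len(STATM_FIELDS):
        raise ValueError("statm line has fewer than 7 fields")
    parsed = dict(zip(STATM_FIELDS, values))
    parsed["extra"] = values[len(STATM_FIELDS):]
    return parsed
```

A parser written this way would break on hex values, inserted fields, or
comments, which is why only appending counts as a compatible change here.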

> > > Hey, I am all _for_ improving proc. But rather than adding more values,
> > > I'd like to address some design problems first: For example, I'd
> > > like to have a reserved value for N/A (currently, kernels just set
> > > obsolete fields to 0 and parsers must guess whether it's truly 0 or not
> > > available).
> > 
> > Don't even think of changing this.
> 
> Why not? Got a better solution?

Old tools need a value that will best make them work. (zero)
New tools can examine the kernel version number.
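
A rough sketch of the "examine the kernel version" approach for a new
tool, assuming the release string looks like the one returned by
os.uname().release. The function names are hypothetical, and the rule
"dt is unusable on 2.6" is taken from this thread, not from any
authoritative table of kernel versions:

```python
import re


def kernel_version(release):
    """Extract (major, minor) from a release string such as "2.6.8-rc3"."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError("unrecognized release string: %r" % release)
    return int(match.group(1)), int(match.group(2))


def dirty_pages(dt_value, release):
    """Return the dt field, or None where it is known to always read 0."""
    if kernel_version(release) == (2, 6):
        return None  # assumption from this thread: dt is always 0 on 2.6
    return dt_value
```

An old tool never runs this check and simply sees 0; a new tool can
report "not available" instead of a misleading zero.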

> > > [ fixed linux-mm address ]
> > 
> > This should have been on linux-kernel in the first place.
> > The linux-mm list is kind of obscure, and doubly so because
> > it isn't on vger.kernel.org.
> 
> This _was_ on linux-kernel in the first place. _You_ added the wrong
> linux-mm address. I don't get your humor.

I'm referring to your original post. I added back linux-mm
because that's where you first brought this up.

> > No, statm is the proper and only place for this data.
> > I certainly don't claim that statm is bug-free code.
> > That's not a reason to discard the whole statm concept.
> 
> The current state of statm code clearly demonstrates the level of
> interest in this concept.

It demonstrates that misleading data is hard to spot.
It demonstrates that people hacking on the kernel are
often unconcerned with providing correct stats for others.
For example, the LRS field should have been fixed when
the ELF binary format support was introduced.



^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [proc.txt] Fix /proc/pid/statm documentation
  2004-08-06 14:07             ` William Lee Irwin III
@ 2004-08-06 15:02               ` Roger Luethi
  -1 siblings, 0 replies; 46+ messages in thread
From: Roger Luethi @ 2004-08-06 15:02 UTC (permalink / raw)
  To: William Lee Irwin III, Albert Cahalan, linux-kernel mailing list,
	linux-mm

On Fri, 06 Aug 2004 07:07:14 -0700, William Lee Irwin III wrote:
> On Fri, 06 Aug 2004 05:11:18 -0700, William Lee Irwin III wrote:
> >> Some of the 2.4 semantics just don't make sense. I would not find it
> >> difficult to explain what I believe correct semantics to be in a written
> >> document.
> 
> On Fri, Aug 06, 2004 at 03:57:56PM +0200, Roger Luethi wrote:
> > IMO this is a must for such files (and be it only some comments above
> > the code implementing them). I'm afraid that statm is carrying too much
> > historical baggage, though -- you would add yet another interpretation
> > of those 7 fields.
> > Tools reading statm would have to be updated anyway, so I'd rather
> > think about what could be done with a new (or just different) file.
> 
> Okay, could you write up a "specification" for what you want reported,
> then I can cook up a new file or some such for you?

Thanks for your offer. I really suck at communicating, it seems. I don't
mind implementing my suggestion or writing documentation if there is a
general agreement that this is the way to go. Currently, I am looking
for suggestions and comments.

My suggestion for /proc/pid/statm would be

Field 0 := /proc/pid/status:VmSize
Field 1 := /proc/pid/status:VmRSS
Fields 2-6: 0

That's really just cleaning up cruft and is trivial to implement.
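
To make the mapping concrete, here is a user-space sketch (not kernel
code) that derives such a statm line from the text of /proc/pid/status.
It assumes VmSize and VmRSS are reported in kB and that the page size is
4 kB; the function name and the page_kb parameter are hypothetical:

```python
def statm_from_status(status_text, page_kb=4):
    """Build the proposed 7-field statm line from status text.

    Fields 0 and 1 come from VmSize and VmRSS, converted from kB to
    pages; fields 2-6 are fixed at 0.
    """
    kb = {}
    for line in status_text.splitlines():
        label, sep, rest = line.partition(":")
        if sep and label in ("VmSize", "VmRSS"):
            kb[label] = int(rest.split()[0])  # the number precedes "kB"
    fields = [kb["VmSize"] // page_kb, kb["VmRSS"] // page_kb] + [0] * 5
    return " ".join(str(f) for f in fields)
```

The sketch raises KeyError if either Vm line is missing, which a real
tool would have to handle.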

============ Warning. Switching subject.
Maybe I should have started a separate thread for that. Sorry about
the confusion.

I am also interested in a related problem -- finding a better way for
tools to access process information. Preferably a generic way so we
don't need to keep tools and kernel in sync forever. I have some ideas,
but I don't know if they are acceptable as solutions (and if the problem
actually exists as I see it).

Most of the current problems with proc are related to tools: They don't
like changes and some of them are very sensitive to resource usage
(because they may make hundreds of calls per second on typical systems).

If we want to facilitate the use of additional information in tools,
I see two possible strategies:

- Design a new solution that enables tools to discover the fields
  that are available and to ask for a subset (as I sketched out in my
  previous post). This would remove the need for inflexible solutions
  like statm.

- Split proc information by new criteria: Slow, expensive items should
  not be in the same file as information that tools typically
  and frequently read. For instance, you could have status_basic,
  status_exotic, and status_slow. Even status_basic could have a format
  similar to /proc/pid/status, but would be shorter and contain only
  the most frequently used values (like statm today -- with all the
  problems that come with such a pre-made selection).
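
As an illustration of the first strategy from the tool's side, the
sketch below parses a labeled, status-style file and returns only the
fields a caller asks for. It assumes the "Label: value" layout of
/proc/pid/status; the function name, the wanted parameter, and the
NewThing label in the test data are hypothetical:

```python
def read_fields(status_text, wanted=None):
    """Parse "Label: value" lines into a dict.

    With wanted=None every field found is returned, including labels
    the tool has never seen. Otherwise only the requested names are
    returned, with None for fields the kernel did not provide.
    """
    fields = {}
    for line in status_text.splitlines():
        label, sep, value = line.partition(":")
        if not sep:
            continue  # skip lines that carry no label
        fields[label.strip()] = value.strip()
    if wanted is None:
        return fields
    return {name: fields.get(name) for name in wanted}
```

Because a missing field comes back as None rather than 0, the caller can
tell "not available" apart from a genuine zero.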

Roger

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [proc.txt] Fix /proc/pid/statm documentation
  2004-08-06 17:08           ` Roger Luethi
@ 2004-08-06 15:14             ` Albert Cahalan
  -1 siblings, 0 replies; 46+ messages in thread
From: Albert Cahalan @ 2004-08-06 15:14 UTC (permalink / raw)
  To: Roger Luethi; +Cc: William Lee Irwin III, linux-kernel mailing list, linux-mm

On Fri, 2004-08-06 at 13:08, Roger Luethi wrote:
> On Fri, 06 Aug 2004 10:02:28 -0400, Albert Cahalan wrote:

> > > what a good solution would look like. Files like /proc/pid/status
> > > are human-readable and maintenance-friendly (the parser can recognize
> > > unknown values and gets a free label along with it; obsolete fields can
> > > be removed).
> > 
> > If you're just spewing the values with a perl script, sure.
> > I'm not sure this matters.
> 
> It matters to me. I like to have tools that don't need updates to
> cope with new fields. Having to wait for tool authors to catch up with
> kernels is annoying.

Not many people want raw data, so the tool authors
will need to put out new releases anyway.

It doesn't take more than a week generally.

> > If it's going to be this dynamic, then just give me DWARF2 debug
> > info and the raw data. Like this:
> > 
> > /proc/DWARF2
> > /proc/1000/mm_struct
> > /proc/1000/signal_struct
> > /proc/1000/sighand_struct
> > /proc/1000/task/1024/thread_info
> > /proc/1000/task/1024/task_struct
> > /proc/1000/task/1024/fs_struct
> 
> That's different. The overhead would be prohibitive. Also, this exposes
> internal kernel structures.

The overhead? I'm not seeing much, other than the multiple
files and the very fact that field locations are movable.

As long as I can fall back to the old /proc files when truly
radical kernel changes happen, exposure of kernel internals
isn't a serious problem.

If I had the DWARF2 data alone, /dev/mem might be enough.
(sadly, "top" would require some major work before I'd trust it)

> > > Or use netlink maybe? It sure would be nice to monitor all processes
> > > with lower overhead, and to have tools that can deal with new data
> > > items without an update.
> > 
> > I've been thinking netlink might be good.
> 
> Alright. Maybe we can move our discussion into this direction?

I'll need to track down some netlink documentation.
Last time I looked, there wasn't any.
 
> > > - Split proc information by new criteria: Slow, expensive items should
> > >   not be in the same file as information that tools typically
> > >   and frequently read. For instance, you could have status_basic,
> > >   status_exotic, and status_slow. Even status_basic could have a format
> > >   similar to /proc/pid/status, but would be shorter and contain only
> > >   the most frequently used values (like statm today -- with all the
> > >   problems that come with such a pre-made selection).
> > 
> > Split by:
> > 1. locking
> > 2. security.
> 
> Hmmm... How does this translate to a netlink interface? Can you elaborate?

I don't think it does.

For the existing files though:

Some SE Linux policies block all access to /proc. Some security
feature patches zero out things that would reveal addresses.
(start_code, end_code, wchan...)



^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [proc.txt] Fix /proc/pid/statm documentation
  2004-08-06 12:58     ` Albert Cahalan
@ 2004-08-06 15:48       ` William Lee Irwin III
  -1 siblings, 0 replies; 46+ messages in thread
From: William Lee Irwin III @ 2004-08-06 15:48 UTC (permalink / raw)
  To: Albert Cahalan; +Cc: Roger Luethi, linux-kernel mailing list, linux-mm

On Fri, 2004-08-06 at 05:40, Roger Luethi wrote:
>> And then there is the trade-off between human readable and
>> easy to parse. ISTR there have been occasional discussions, but maybe
>> it's time to revisit the issue because the current mess is a problem.

On Fri, Aug 06, 2004 at 08:58:43AM -0400, Albert Cahalan wrote:
> The current bugs are a problem.
> Quoting your other email now:

Could you describe those in isolation from other issues?

Thanks.


-- wli

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [proc.txt] Fix /proc/pid/statm documentation
  2004-08-06 12:58     ` Albert Cahalan
@ 2004-08-06 16:34       ` Roger Luethi
  -1 siblings, 0 replies; 46+ messages in thread
From: Roger Luethi @ 2004-08-06 16:34 UTC (permalink / raw)
  To: Albert Cahalan; +Cc: linux-kernel mailing list, linux-mm, wli

On Fri, 06 Aug 2004 08:58:43 -0400, Albert Cahalan wrote:
> > Hardly. All I was asking this time was to have a documentation fix
> > merged, though.
> 
> Just delete the documentation. I certainly never use it.

It wasn't written for you.

> Since you need the kernel source to get the documentation
> anyway, you might as well examine the fs/proc/*.c files.

Some users may prefer written documentation over reading the kernel
source. In addition, in the case of statm, there is nothing to document
the expected behavior in the source, either. Which is precisely why
statm has been utterly broken forever.

> > * statm is broken. It was broken in 2.4 as well, but _differently_. Every
> >   application that relies on statm forwards wrong information, or at
> >   the very least needs special casing because the information provided
> >   in various fields differs between kernel versions.
> 
> The kernel has multiple stat() syscalls. At times, they have been
> broken when dealing with UID values that overflow. Should these
> system calls have been eliminated? If not, how is this different?

stat is a well-defined POSIX call.

> > * Nobody can really tell exactly how broken statm is because there is
> >   no canonical documentation of what it is supposed to do. That implies
> >   that it is kinda hard to properly fix statm.
> 
> Nah. Just look at the 2.2.xx and 2.4.xx kernels.

2.4 was (is?) broken as well.

> > * I hate the format. I like my proc files human readable. An important
> >   reason that statm could linger around in a broken state for so long
> >   is the lack of labels. It's hard to find bugs if there's nothing to
> >   indicate what the values are supposed to be. (and yes, /proc/pid/stat
> >   is awful, too, but it has the excuse of providing valuable information)
> 
> Nobody has been screwing with the statm formatting. There is
> no temptation. The same can not be said of the "readable" files.

Agreed, and I'd be interested in solutions. OTOH, it is harder to
discover if the _content_ is broken in statm.

> Is it SigCgt or SigCat? That would depend on kernel version.
> What about /proc/cpuinfo? An old file gets parsed on whitespace.
> A recent one has ':' characters that you must use.

I wish there were some written guidelines to prevent things like that in
the future. I'd be willing to write them up if there was some agreement
on those rules.

> > The only reason I could see for keeping statm around is that it
> > is cheaper than status for parsers in top & Co. Having written one
> > of them myself, I have spent quite some time thinking about better
> > alternatives. If you want to talk about that, count me in.
> 
> The statm format rules, assuming you don't go binary.

The statm format could only work if there was a clear understanding
of what the fields mean. But there isn't.

> >> + size     total program size (pages)  (same as VmSize in status)
> >> + resident size of memory portions (pages) (same as VmRSS in status)
> >>
> >> There was a distinction here that has been lost. One of these
> >> included memory-mapped hardware. You could see this with the
> >> X server video memory.
> >
> > You can definitely not rely on that distinction being there. Feel free to
> > add a comment "may or may not include memory-mapped hardware, depending
> > on the kernel". This makes statm even worse, because even the seemingly
> > well-defined, redundant fields aren't.
> 
> This is merely a kernel bug. Hey, bugs happen.

How can you tell it's a bug? It looks correct to me.

> Why? If statm is broken, it should be fixed. Putting the statm
> data into the status file was dumb, but it's too late now.

It was not dumb. Some people actually prefer human-readable output when
working with proc.

> On AIX:  ps -eo trs
> On BSD:  ps axo trss

I trust they take that information from /proc/pid/statm, too?

> Text size is "tsiz". We have that in the stat file, as the difference
> between end_code and start_code. We don't need second copy of tsiz.
> 
> >> + dt       number of dirty pages   (always 0 on 2.6)
> >>
> >> This one would be useful.
> >
> > Agreed. It would be nice to have it somewhere else.
> 
> No, it's not nice to go moving things around. How about you go

This field is 0 on 2.6. Zero. Always. I am suggesting to have the
information available somewhere. That sure ought to count as an
improvement.

> >> These would be really useful too:
> >> 1. swap space used
> >> 2. swap space that would be used if fully paged out
> >
> > There are many values that could be interesting or useful. But that
> > has nothing to do with the abomination that is statm.
> 
> These values belong in statm.

I thought there was no screwing around with the statm format!?

> > Hey, I am all _for_ improving proc. But rather than adding more values,
> > I'd like to address some design problems first: For example, I'd
> > like to have a reserved value for N/A (currently, kernels just set
> > obsolete fields to 0 and parsers must guess whether it's truly 0 or not
> > available).
> 
> Don't even think of changing this.

Why not? Got a better solution?

> > [ fixed linux-mm address ]
> 
> This should have been on linux-kernel in the first place.
> The linux-mm list is kind of obscure, and doubly so because
> it isn't on vger.kernel.org.

This _was_ on linux-kernel in the first place. _You_ added the wrong
linux-mm address. I don't get your humor.

> No, statm is the proper and only place for this data.
> I certainly don't claim that statm is bug-free code.
> That's not a reason to discard the whole statm concept.

The current state of statm code clearly demonstrates the level of
interest in this concept.

Roger

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [proc.txt] Fix /proc/pid/statm documentation
@ 2004-08-06 16:34       ` Roger Luethi
  0 siblings, 0 replies; 46+ messages in thread
From: Roger Luethi @ 2004-08-06 16:34 UTC (permalink / raw)
  To: Albert Cahalan; +Cc: linux-kernel mailing list, linux-mm, wli

On Fri, 06 Aug 2004 08:58:43 -0400, Albert Cahalan wrote:
> > Hardly. All I was asking this time was to have a documentation fix
> > merged, though.
> 
> Just delete the documentation. I certainly never use it.

It wasn't written for you.

> Since you need the kernel source to get the documentation
> anyway, you might as well examine the fs/proc/*.c files.

Some users may prefer written documentation over reading the kernel
source. In addition, in the case of statm, there is nothing to document
the expected behavior in the source, either. Which is precisely why
statm has been utterly broken forever.

> > * statm is broken. It was broken in 2.4 as well, but _differently_. Every
> >   application that relies on statm forwards wrong information, or at
> >   the very least needs special casing because the information provided
> >   in various fields differs between kernel versions.
> 
> The kernel has multiple stat() syscalls. At times, they have been
> broken when dealing with UID values that overflow. Should these
> system calls have been eliminated? If not, how is this different?

stat is a well-defined POSIX call.

> > * Nobody can really tell exactly how broken statm is because there is
> >   no canonical documentation of what it is supposed to do. That implies
> >   that it is kinda hard to properly fix statm.
> 
> Nah. Just look at the 2.2.xx and 2.4.xx kernels.

2.4 was (is?) broken as well.

> > * I hate the format. I like my proc files human readable. An important
> >   reason that statm could linger around in a broken state for so long
> >   is the lack of labels. It's hard to find bugs if there's nothing to
> >   indicate what the values are supposed to be. (and yes, /proc/pid/stat
> >   is awful, too, but it has the excuse of providing valuable information)
> 
> Nobody has been screwing with the statm formatting. There is
> no temptation. The same can not be said of the "readable" files.

Agreed, and I'd be interested in solutions. OTOH, it is harder to
discover if the _content_ is broken in statm.

> Is it SigCgt or SigCat? That would depend on kernel version.
> What about /proc/cpuinfo? An old file gets parsed on whitespace.
> A recent one has ':' characters that you must use.

I wish there were some written guidelines to prevent things like that in
the future. I'd be willing to write them up if there was some agreement
on those rules.

> > The only reason I could see for keeping statm around is that it
> > is cheaper than status for parsers in top & Co. Having written one
> > of them myself, I have spent quite some time thinking about better
> > alternatives. If you want to talk about that, count me in.
> 
> The statm format rules, assuming you don't go binary.

The statm format could only work if there was a clear understanding
of what the fields mean. But there isn't.
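
To make the point concrete, here is a minimal Python sketch of what a
statm consumer has to do. The field order (size, resident, shared, trs,
lrs, drs, dt) follows the proc.txt patch under discussion; the name
parse_statm and the sample numbers are made up for illustration. Note
that the labels live only in the parser: nothing in the file itself says
what a value is supposed to mean.

```python
# Field order as listed in the proc.txt patch discussed in this thread.
# What several of these fields actually measure is what the thread disputes.
STATM_FIELDS = ("size", "resident", "shared", "trs", "lrs", "drs", "dt")


def parse_statm(line):
    """Turn one line of /proc/<pid>/statm into a dict of page counts."""
    values = [int(tok) for tok in line.split()]
    if len(values) < len(STATM_FIELDS):
        raise ValueError(
            "expected %d fields, got %d" % (len(STATM_FIELDS), len(values))
        )
    # zip() stops at the shorter input, so extra trailing fields are ignored.
    return dict(zip(STATM_FIELDS, values))


sample = "845 123 98 12 0 150 0"  # made-up numbers, not real kernel output
print(parse_statm(sample))
```

A parser like this keeps working if fields are appended, but it cannot
notice when the meaning of an existing column changes between kernels.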

> >> + size     total program size (pages)  (same as VmSize in status)
> >> + resident size of memory portions (pages) (same as VmRSS in status)
> >>
> >> There was a distinction here that has been lost. One of these
> >> included memory-mapped hardware. You could see this with the
> >> X server video memory.
> >
> > You can definitely not rely on that distinction being there. Feel free to
> > add a comment "may or may not include memory-mapped hardware, depending
> > on the kernel". This makes statm even worse, because even the seemingly
> > well-defined, redundant fields aren't.
> 
> This is merely a kernel bug. Hey, bugs happen.

How can you tell it's a bug? It looks correct to me.

> Why? If statm is broken, it should be fixed. Putting the statm
> data into the status file was dumb, but it's too late now.

It was not dumb. Some people actually prefer human-readable output when
working with proc.

> On AIX:  ps -eo trs
> On BSD:  ps axo trss

I trust they take that information from /proc/pid/statm, too?

> Text size is "tsiz". We have that in the stat file, as the difference
> between end_code and start_code. We don't need a second copy of tsiz.
> 
> >> + dt       number of dirty pages   (always 0 on 2.6)
> >>
> >> This one would be useful.
> >
> > Agreed. It would be nice to have it somewhere else.
> 
> No, it's not nice to go moving things around. How about you go

This field is 0 on 2.6. Zero. Always. I am suggesting to have the
information available somewhere. That sure ought to count as an
improvement.

> >> These would be really useful too:
> >> 1. swap space used
> >> 2. swap space that would be used if fully paged out
> >
> > There are many values that could be interesting or useful. But that
> > has nothing to do with the abomination that is statm.
> 
> These values belong in statm.

I thought there was no screwing around with the statm format!?

> > Hey, I am all _for_ improving proc. But rather than adding more values,
> > I'd like to address some design problems first: For example, I'd
> > like to have a reserved value for N/A (currently, kernels just set
> > obsolete fields to 0 and parsers must guess whether it's truly 0 or not
> > available).
> 
> Don't even think of changing this.

Why not? Got a better solution?
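
A sketch of what a reserved N/A value would buy a parser, in Python. The
token "-" is purely hypothetical; the thread never settles on one, and
no kernel emits it. The point is only that "not available" and a real 0
become distinguishable without consulting the kernel version.

```python
NA_TOKEN = "-"  # hypothetical sentinel, chosen here for illustration only


def parse_field(token):
    """Return an int, or None when the field is marked as not available."""
    return None if token == NA_TOKEN else int(token)


def parse_fields(line):
    """Parse a whitespace-separated line of counters and N/A markers."""
    return [parse_field(tok) for tok in line.split()]


# Made-up line in which the fifth and seventh fields are not provided.
print(parse_fields("845 123 98 12 - 150 -"))
```

The cost Albert points at is real, though: an old tool that calls a
plain integer conversion on every token would fail on the sentinel,
which is why this could only apply to new fields or new files.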

> > [ fixed linux-mm address ]
> 
> This should have been on linux-kernel in the first place.
> The linux-mm list is kind of obscure, and doubly so because
> it isn't on vger.kernel.org.

This _was_ on linux-kernel in the first place. _You_ added the wrong
linux-mm address. I don't get your humor.

> No, statm is the proper and only place for this data.
> I certainly don't claim that statm is bug-free code.
> That's not a reason to discard the whole statm concept.

The current state of statm code clearly demonstrates the level of
interest in this concept.

Roger
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"aart@kvack.org"> aart@kvack.org </a>

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [proc.txt] Fix /proc/pid/statm documentation
  2004-08-06 14:02         ` Albert Cahalan
  (?)
@ 2004-08-06 16:48         ` William Lee Irwin III
  -1 siblings, 0 replies; 46+ messages in thread
From: William Lee Irwin III @ 2004-08-06 16:48 UTC (permalink / raw)
  To: Albert Cahalan; +Cc: Roger Luethi, linux-kernel mailing list, linux-mm

Roger Luethi writes:
>> Most of the current problems with proc are related to tools: They don't
>> like changes and some of them are very sensitive to resource usage
>> (because they may make hundreds of calls per second on typical systems).

On Fri, Aug 06, 2004 at 10:02:28AM -0400, Albert Cahalan wrote:
> Make that 2000 /proc reads per second or more. This is too slow.
> I need to read about 1 million /proc files per second.

This is a truly terrifying prospect. The VFS overhead of manipulating
that much metadata is unthinkably enormous, not to mention the very
real tasklist_lock starvation issues that are killing boxen dead now.

By any chance could a rate-limited incremental algorithm be used,
at least for top(1)?
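
One possible reading of "rate-limited incremental", sketched in Python:
instead of reading every process on every refresh, a tool reads at most
a fixed budget of processes per tick and resumes where it stopped. The
function name, the budget parameter, and the round-robin policy are
assumptions made for this sketch; the message does not specify an
algorithm.

```python
def incremental_scan(pids, budget, start=0):
    """Pick at most `budget` pids to refresh, resuming round-robin.

    Returns (batch, next_start) so the caller can continue on the next tick.
    """
    if not pids or budget <= 0:
        return [], 0
    start %= len(pids)
    # Rotate the list so that scanning resumes at the saved position.
    rotated = pids[start:] + pids[:start]
    batch = rotated[:budget]
    return batch, (start + len(batch)) % len(pids)


pids = [1, 2, 3, 4, 5]
position = 0
for tick in range(3):
    batch, position = incremental_scan(pids, budget=2, start=position)
    print(tick, batch)
```

The trade-off is staleness: with a budget of 2 and 5 processes, any one
entry is refreshed only every third tick.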


-- wli

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [proc.txt] Fix /proc/pid/statm documentation
  2004-08-06 14:14         ` Albert Cahalan
@ 2004-08-06 16:49           ` William Lee Irwin III
  -1 siblings, 0 replies; 46+ messages in thread
From: William Lee Irwin III @ 2004-08-06 16:49 UTC (permalink / raw)
  To: Albert Cahalan; +Cc: Roger Luethi, linux-kernel mailing list, linux-mm

On Fri, 2004-08-06 at 11:48, William Lee Irwin III wrote:
>> Could you describe those in isolation from other issues?

On Fri, Aug 06, 2004 at 10:14:43AM -0400, Albert Cahalan wrote:
> Whatever Roger found, plus:
> 1. trs == text RESIDENT size
> 2. drs == data RESIDENT size
> 3. memory-mapped devices should be counted for only 1 file
>    (use an old Linux box running X to see)
> I'm not terribly concerned right now. I just don't think
> it's OK to go ripping out statm over a few bugs.
> If we ripped out every buggy piece of kernel code, we'd
> have a 0-byte kernel.
> There are far bigger issues elsewhere, like %CPU.

Okay, can you give precise definitions of trs and drs?


-- wli

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [proc.txt] Fix /proc/pid/statm documentation
  2004-08-06 14:02         ` Albert Cahalan
@ 2004-08-06 17:08           ` Roger Luethi
  -1 siblings, 0 replies; 46+ messages in thread
From: Roger Luethi @ 2004-08-06 17:08 UTC (permalink / raw)
  To: Albert Cahalan; +Cc: William Lee Irwin III, linux-kernel mailing list, linux-mm

On Fri, 06 Aug 2004 10:02:28 -0400, Albert Cahalan wrote:
> > Tools reading statm would have to be updated anyway, so I'd rather
> > think about what could be done with a new (or just different) file.
> 
> Even if the existing fields are indeed mostly junk, you can always
> add new fields to the end.

I don't like it, but it is a possible solution. It only works for tools
reading proc, though. Humans don't parse such files well.

> > what a good solution would look like. Files like /proc/pid/status
> > are human-readable and maintenance-friendly (the parser can recognize
> > unknown values and gets a free label along with it; obsolete fields can
> > be removed).
> 
> If you're just spewing the values with a perl script, sure.
> I'm not sure this matters.

It matters to me. I like to have tools that don't need updates to
cope with new fields. Having to wait for tool authors to catch up with
kernels is annoying.
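
A minimal Python sketch of the property described here, assuming the
"Label: value" layout of /proc/pid/status. The sample text and the label
NewThing are invented for illustration; parse_status is not taken from
any existing tool.

```python
def parse_status(text):
    """Parse 'Label:<whitespace>value' lines into a dict, keeping unknown labels."""
    fields = {}
    for line in text.splitlines():
        label, sep, value = line.partition(":")
        if not sep:
            continue  # not a labelled line; skip it
        fields[label.strip()] = value.strip()
    return fields


# Invented sample; "NewThing" stands in for a field added by a later kernel.
sample = "Name:\tbash\nPid:\t10235\nVmSize:\t    3380 kB\nNewThing:\t42\n"
status = parse_status(sample)
print(status["VmSize"])
print(sorted(set(status) - {"Name", "Pid", "VmSize"}))
```

The parser needs no update when a label appears or disappears; whether
the tool can do anything useful with an unknown label is the objection
raised in the quoted text that follows.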

> Normal C programs don't work that way. Unknown values are useless.
> What am I supposed to do with an unknown value? I can't even tell
> what data type it is. Maybe 12345 is really a string. I'm going

You could e.g. restrict automatic fields to long. There are other
solutions possible.

> to rely on the values I need, so you can't freely delete things.
> If I didn't need the values, I wouldn't read the file at all.

Not all programs work like that.

> > The downside is the performance aspect you pointed out:
> > Reading that file for every process just to grep for one or two values
> > is slow, and some of the unused data items might be expensive for the
> > kernel to produce in the first place.
> 
> You're using grep??? That's a script then. You can tolerate

No. s/grep/look for/

> getting your info from "ps" output. It's not a performance
> issue for you. For ps, performance is a problem. Thus ps must
> get priority in the design of /proc files.

ps can get priority in statm for all I care. I am interested in other
files and mechanisms.

> > A light-weight interface for tools could work like this (ugly):
> >
> > $ cat /proc/pid.provided
> > Name SleepAVG Pid Tgid PPid VmSize VmLck VmData [...]
> > $ cat /proc/10235/VmSize.VmData
> > 3380 144
> 
> It's hard to imagine parsing that. I suppose I'm expected to
> dynamically create a sscanf format using the numbered-parameter
> notation? Maybe I have to fill a table with pointers to... Ugh.

The interface was just to illustrate the kind of functionality I'm
considering. It's ugly, but it's not that hard to use, either.
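
For what it's worth, consuming such a file need not involve a generated
sscanf format. The Python sketch below pairs the names encoded in the
requested file name with the values read back. Everything here is
hypothetical: the VmSize.VmData file is only the illustration from the
quoted proposal and exists in no kernel, and read_subset is an invented
helper.

```python
def read_subset(requested_name, contents):
    """Pair the field names in a 'A.B.C' file name with the values read from it."""
    names = requested_name.split(".")
    values = [int(tok) for tok in contents.split()]
    if len(names) != len(values):
        raise ValueError("asked for %d fields, got %d" % (len(names), len(values)))
    return dict(zip(names, values))


# Names and numbers taken from the example in the quoted proposal.
print(read_subset("VmSize.VmData", "3380 144"))
```

The caller already knows which fields it asked for and in which order,
so the reply can stay as terse as statm while the request carries the
labels.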

> If it's going to be this dynamic, then just give me DWARF2 debug
> info and the raw data. Like this:
> 
> /proc/DWARF2
> /proc/1000/mm_struct
> /proc/1000/signal_struct
> /proc/1000/sighand_struct
> /proc/1000/task/1024/thread_info
> /proc/1000/task/1024/task_struct
> /proc/1000/task/1024/fs_struct

That's different. The overhead would be prohibitive. Also, this exposes
internal kernel structures.

> > Or use netlink maybe? It sure would be nice to monitor all processes
> > with lower overhead, and to have tools that can deal with new data
> > items without an update.
> 
> I've been thinking netlink might be good.

Alright. Maybe we can move our discussion in this direction?

> > I am also interested in a related problem -- finding a better way for
> > tools to access process information. Preferably a generic way so we
> > don't need to keep tools and kernel in sync forever. I have some ideas,
> > but I don't know if they are acceptable as solutions (and if the problem
> > actually exists as I see it).
> 
> Look at other systems. FreeBSD, AIX, and Solaris all have
> superior ways of getting process data. Being compatible, at
> least for the basic info, would be good.

Quite frankly, in this area I care more about good than about compatible.

> FreeBSD: binary sysctl data with built-in process selection
> AIX:     dedicated syscall, somewhat resembling directory reads
> Solaris: binary /proc, including arrays for per-thread data
> 
> Somebody can research Tru64, HP-UX, MacOS X, and IRIX.
> 

> > Most of the current problems with proc are related to tools: They don't
> > like changes and some of them are very sensitive to resource usage
> > (because they may make hundreds of calls per second on typical systems).
> 
> Make that 2000 /proc reads per second or more. This is too slow.
> I need to read about 1 million /proc files per second.

Depends on your definition of typical. Obviously, it grows with the
number of processes and time resolution.

> > If we want to facilitate the use of additional information in tools,
> > I see two possible strategies:
> >
> > - Design a new solution that enables tools to discover the fields
> >   that are available and to ask for a subset (as I sketched out in my
> >   previous post). This would remove the need for inflexible solutions
> >   like statm.
> 
> That's useless.

Your opinion has been duly noted.

> If I didn't need the data, I wouldn't be trying to read it.
> If I haven't written code to use new data, I sure won't be
> caring to know the name of the new data.
> 
> > - Split proc information by new criteria: Slow, expensive items should
> >   not be in the same file as information that tools typically
> >   and frequently read. For instance, you could have status_basic,
> >   status_exotic, and status_slow. Even status_basic could have a format
> >   similar to /proc/pid/status, but would be shorter and contain only
> >   the most frequently used values (like statm today -- with all the
> >   problems that come with such a pre-made selection).
> 
> Split by:
> 1. locking
> 2. security.

Hmmm... How does this translate to a netlink interface? Can you elaborate?

Roger

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [proc.txt] Fix /proc/pid/statm documentation
  2004-08-06 14:51         ` Albert Cahalan
@ 2004-08-06 17:28           ` Martin J. Bligh
  -1 siblings, 0 replies; 46+ messages in thread
From: Martin J. Bligh @ 2004-08-06 17:28 UTC (permalink / raw)
  To: Albert Cahalan, Roger Luethi; +Cc: linux-kernel mailing list, linux-mm, wli



--On Friday, August 06, 2004 10:51:24 -0400 Albert Cahalan <albert@users.sourceforge.net> wrote:

> On Fri, 2004-08-06 at 12:34, Roger Luethi wrote:
>> On Fri, 06 Aug 2004 08:58:43 -0400, Albert Cahalan wrote:
>> > > Hardly. All I was asking this time was to have a documentation fix
>> > > merged, though.
>> > 
>> > Just delete the documentation. I certainly never use it.
>> 
>> It wasn't written for you.
> 
> OK, but the statm file was. (well, for the maintainer
> of procps a decade ago)
> 
> Everybody else can parse ps output.

I don't think that's necessarily a good idea - access to lower level
data would be nice. What's the harm in fixing the docs, anyway?

M.


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [proc.txt] Fix /proc/pid/statm documentation
  2004-08-06 14:51         ` Albert Cahalan
@ 2004-08-06 18:21           ` Roger Luethi
  -1 siblings, 0 replies; 46+ messages in thread
From: Roger Luethi @ 2004-08-06 18:21 UTC (permalink / raw)
  To: Albert Cahalan; +Cc: linux-kernel mailing list, linux-mm, wli

On Fri, 06 Aug 2004 10:51:24 -0400, Albert Cahalan wrote:
> Everybody else can parse ps output.

Not everybody wants to. And ps doesn't provide all the process
information I can get via proc anyway.

> > Some users may prefer written documentation over reading the kernel
> > source. In addition, in the case of statm, there is nothing to document
> > the expected behavior in the source, either. Which is precisely why
> > statm has been utterly broken forever.
> 
> A correct proc.txt would not have avoided this.

It would have helped some of us confirm our suspicions.

> The source code needs a few comments.

Works for me. I'm looking forward to seeing that fixed.

> > It was not dumb. Some people actually prefer human-readable output when
> > working with proc.
> 
> These people shouldn't be working in /proc. It's easier and

Let me be the one to decide when I use a proc file.

> more portable to use "ps" for their scripts. You can select
> which fields you want, get a header if you like, have the
> processes filtered for you, and so on. Look:
> 
> ps -U root -u root -o pid= -o ppid= -o args
> 
> What's not to like about that? It's portable even.

I wouldn't like ps to be the gatekeeper to proc information. Plus
there's non-portable information in proc that I care about.

> > > On AIX:  ps -eo trs
> > > On BSD:  ps axo trss
> > 
> > I trust they take that information from /proc/pid/statm, too?
> 
> The point is that the name "trs" has a specific meaning.
> The statm file was created to support ps. It wouldn't exist
> if ps didn't need to display a TRS column. So the proper
> behavior of ps is what defines the meaning. Run these two

Fair enough. I think proc.txt should note this relation between
statm and ps.

> > > >> + dt       number of dirty pages   (always 0 on 2.6)
> > > >>
> > > >> This one would be useful.
> > > >
> > > > Agreed. It would be nice to have it somewhere else.
> > > 
> > > No, it's not nice to go moving things around. How about you go
> > 
> > This field is 0 on 2.6. Zero. Always. I am suggesting to have the
> > information available somewhere. That sure ought to count as an
> > improvement.
> 
> Sure. That "somewhere" should be where it was before.

I wouldn't mind seeing it in /proc/pid/status if the accounting gets
merged.

> > > > Hey, I am all _for_ improving proc. But rather than adding more values,
> > > > I'd like to address some design problems first: For example, I'd
> > > > like to have a reserved value for N/A (currently, kernels just set
> > > > obsolete fields to 0 and parsers must guess whether it's truly 0 or not
> > > > available).
> > > 
> > > Don't even think of changing this.
> > 
> > Why not? Got a better solution?
> 
> Old tools need a value that will best make them work. (zero)

True for old fields. New fields are a different matter.

> New tools can examine the kernel version number.

Right. And every user space tool reading proc needs a database to
remember which field is active in which kernel version.
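
The kind of table this implies might look like the Python sketch below.
The two entries are the only ones this thread states outright (lrs and
dt are always 0 on 2.6); the structure and the helper name are
assumptions, and a real tool would need many more rows, which is the
objection.

```python
# Kernel series in which a statm field is, according to this thread,
# a constant 0 rather than a real measurement.
ALWAYS_ZERO = {
    "lrs": {(2, 6)},
    "dt": {(2, 6)},
}


def field_is_meaningful(field, release):
    """Check a field against the table for a release string such as '2.6.7'."""
    major, minor = (int(part) for part in release.split(".")[:2])
    return (major, minor) not in ALWAYS_ZERO.get(field, set())


for release in ("2.4.26", "2.6.7"):
    print(release, field_is_meaningful("dt", release))
```

Every tool has to carry and maintain its own copy of such a table, and a
field missing from it is silently treated as valid.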

> > The current state of statm code clearly demonstrates the level of
> > interest in this concept.
> 
> It demonstrates that misleading data is hard to spot.
> It demonstrates that people hacking on the kernel are
> often unconcerned with providing correct stats for others.

I'd agree to some extent, but statm was pretty much impossible to fix
for even the most concerned kernel hacker.

Roger

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [proc.txt] Fix /proc/pid/statm documentation
  2004-08-06 20:49               ` Martin J. Bligh
@ 2004-08-06 18:38                 ` Albert Cahalan
  -1 siblings, 0 replies; 46+ messages in thread
From: Albert Cahalan @ 2004-08-06 18:38 UTC (permalink / raw)
  To: Martin J. Bligh
  Cc: Albert Cahalan, Roger Luethi, William Lee Irwin III,
	linux-kernel mailing list, linux-mm

On Fri, 2004-08-06 at 16:49, Martin J. Bligh wrote:

> > As long as I can fall back to the old /proc files when truly
> > radical kernel changes happen, exposure of kernel internals
> > isn't a serious problem.
> > 
> > If I had the DWARF2 data alone, /dev/mem might be enough.
> > (sadly, "top" would require some major work before I'd trust it)
> 
> We did that on PTX ... walking tasklists lockless is a bitch.

It's fast. Lockless tasklist walking looks easy enough.
Find the process, grab the data, then find the process
again. If the process went away, discard the data.
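
In outline, the find/grab/find-again scheme looks like the Python sketch
below. It is only a model: a dict stands in for the task list, and the
names snapshot, lookup, and read are invented. The identity check also
rejects an entry that was replaced between the two lookups, which goes
slightly beyond "went away" and is an assumption of this sketch.

```python
def snapshot(lookup, read, pid):
    """Optimistically read a process entry, then re-validate it.

    Returns the data, or None if the entry vanished or changed mid-read.
    """
    first = lookup(pid)
    if first is None:
        return None  # no such process to begin with
    data = read(first)
    second = lookup(pid)
    if second is not first:
        return None  # gone or replaced while we were reading: discard
    return data


# Stand-in task table with made-up contents.
tasks = {1000: {"pid": 1000, "vm_size": 3380}}
print(snapshot(tasks.get, lambda task: task["vm_size"], 1000))
```

What the model cannot show is the hard part on a real kernel: the read
itself may follow pointers into memory that has already been freed or
reused, so the data can be garbage before the second lookup ever runs.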

I guess I'd like to have a /dev/ram-only device, for protection
against touching device memory (including AGP mem) by mistake.
It's odd that there doesn't seem to be such a device already.
Without this, I'd need to re-verify much more often.

Any problem I'm not seeing?



^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [proc.txt] Fix /proc/pid/statm documentation
  2004-08-06 15:14             ` Albert Cahalan
@ 2004-08-06 20:49               ` Martin J. Bligh
  -1 siblings, 0 replies; 46+ messages in thread
From: Martin J. Bligh @ 2004-08-06 20:49 UTC (permalink / raw)
  To: Albert Cahalan, Roger Luethi
  Cc: William Lee Irwin III, linux-kernel mailing list, linux-mm

>> > If it's going to be this dynamic, then just give me DWARF2 debug
>> > info and the raw data. Like this:
>> > 
>> > /proc/DWARF2
>> > /proc/1000/mm_struct
>> > /proc/1000/signal_struct
>> > /proc/1000/sighand_struct
>> > /proc/1000/task/1024/thread_info
>> > /proc/1000/task/1024/task_struct
>> > /proc/1000/task/1024/fs_struct
>> 
>> That's different. The overhead would be prohibitive. Also, this exposes
>> internal kernel structures.
> 
> The overhead? I'm not seeing much, other than the multiple
> files and the very fact that field locations are movable.
> 
> As long as I can fall back to the old /proc files when truly
> radical kernel changes happen, exposure of kernel internals
> isn't a serious problem.
> 
> If I had the DWARF2 data alone, /dev/mem might be enough.
> (sadly, "top" would require some major work before I'd trust it)

We did that on PTX ... walking tasklists lockless is a bitch.

M.



* Re: [proc.txt] Fix /proc/pid/statm documentation
  2004-08-06 18:38                 ` Albert Cahalan
@ 2004-08-06 21:15                   ` Martin J. Bligh
  -1 siblings, 0 replies; 46+ messages in thread
From: Martin J. Bligh @ 2004-08-06 21:15 UTC (permalink / raw)
  To: Albert Cahalan
  Cc: Roger Luethi, William Lee Irwin III, linux-kernel mailing list, linux-mm

--On Friday, August 06, 2004 14:38:54 -0400 Albert Cahalan <albert@users.sourceforge.net> wrote:

> On Fri, 2004-08-06 at 16:49, Martin J. Bligh wrote:
> 
>> > As long as I can fall back to the old /proc files when truly
>> > radical kernel changes happen, exposure of kernel internals
>> > isn't a serious problem.
>> > 
>> > If I had the DWARF2 data alone, /dev/mem might be enough.
>> > (sadly, "top" would require some major work before I'd trust it)
>> 
>> We did that on PTX ... walking tasklists lockless is a bitch.
> 
> It's fast. Lockless tasklist walking looks easy enough.
> Find the process, grab the data, then find the process
> again. If the process went away, discard the data.

Oh, I know it's fast ... and probably the right thing to do. Just hard ;-)
Either that, or we come up with some intermediate abstraction that's faster
than /proc.
 
> I guess I'd like to have a /dev/ram-only device, for protection
> against touching device memory (including AGP mem) by mistake.
> It's odd that there doesn't seem to be such a device already.
> Without this, I'd need to re-verify much more often.

I'll make you one if you need it, but I'd think it shouldn't be a problem,
as you're just following pointers, which should all be valid ...

M.


* Re: [proc.txt] Fix /proc/pid/statm documentation
  2004-08-06 14:02         ` Albert Cahalan
@ 2004-08-07 17:37           ` Paul Jackson
  -1 siblings, 0 replies; 46+ messages in thread
From: Paul Jackson @ 2004-08-07 17:37 UTC (permalink / raw)
  To: Albert Cahalan; +Cc: rl, wli, linux-kernel, linux-mm

Albert wrote:
> Somebody can research ... IRIX.

The Irix /proc documentation can be found at:

  http://www.mcsr.olemiss.edu/cgi-bin/man-cgi?proc+4
  UNIX man pages : proc (4)

Based on a quick scan, this is the same page as, or close to, the one on
my late-model Irix box.

-- 
                          I won't rest till it's the best ...
                          Programmer, Linux Scalability
                          Paul Jackson <pj@sgi.com> 1.650.933.1373


* [proc.txt] Fix /proc/pid/statm documentation
@ 2004-08-05 17:10 Roger Luethi
  0 siblings, 0 replies; 46+ messages in thread
From: Roger Luethi @ 2004-08-05 17:10 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel

I really wanted /proc/pid/statm to die [1] and I still believe the
reasoning is valid. As it doesn't look like that is going to happen,
though, I offer this fix for the respective documentation.
Note: lrs/drs fields are switched.

Roger

[1] http://marc.theaimsgroup.com/?l=linux-mm&m=106059260315203

Signed-off-by: Roger Luethi <rl@hellgate.ch>

--- 2.6-mm/Documentation/filesystems/proc.txt.orig	2004-08-05 16:06:47.000000000 +0200
+++ 2.6-mm/Documentation/filesystems/proc.txt	2004-08-05 19:01:50.943888417 +0200
@@ -169,16 +169,18 @@ information. The  statm  file  contains 
 process memory usage. Its seven fields are explained in Table 1-2.
 
 
-Table 1-2: Contents of the statm files 
+Table 1-2: Contents of the statm files (as of 2.6.8-rc3)
 ..............................................................................
- File     Content                         
- size     total program size              
- resident size of memory portions         
- shared   number of pages that are shared 
- trs      number of pages that are 'code' 
- drs      number of pages of data/stack   
- lrs      number of pages of library      
- dt       number of dirty pages           
+ Field    Content
+ size     total program size (pages)		(same as VmSize in status)
+ resident size of memory portions (pages)	(same as VmRSS in status)
+ shared   number of pages that are shared	(i.e. backed by a file)
+ trs      number of pages that are 'code'	(not including libs; broken,
+							includes data segment)
+ lrs      number of pages of library		(always 0 on 2.6)
+ drs      number of pages of data/stack		(including libs; broken,
+							includes library text)
+ dt       number of dirty pages			(always 0 on 2.6)
 ..............................................................................
 
 1.2 Kernel data
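
For readers who want to try the field order from the patched table, here
is a small sketch that splits one statm line into the seven names used
above (size, resident, shared, trs, lrs, drs, dt). The function name
`parse_statm` is made up for this example, and the sketch takes the
lrs-before-drs order from the patch rather than verifying it against any
particular kernel.

```python
from typing import Dict

# Field order as given in the patched Table 1-2 (lrs comes before drs).
STATM_FIELDS = ("size", "resident", "shared", "trs", "lrs", "drs", "dt")


def parse_statm(text: str) -> Dict[str, int]:
    """Map the seven whitespace-separated statm values to field names.

    All values are counts of pages, not bytes.
    """
    values = text.split()
    if len(values) != len(STATM_FIELDS):
        raise ValueError(
            f"expected {len(STATM_FIELDS)} fields, got {len(values)}")
    return dict(zip(STATM_FIELDS, (int(v) for v in values)))
```

On a Linux box this could be fed the contents of /proc/self/statm, and
the page counts turned into bytes by multiplying with
os.sysconf("SC_PAGE_SIZE").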


end of thread, other threads:[~2004-08-07 17:38 UTC | newest]

Thread overview: 46+ messages
2004-08-06  1:11 [proc.txt] Fix /proc/pid/statm documentation Albert Cahalan
2004-08-06  3:48 ` William Lee Irwin III
2004-08-06  9:40 ` Roger Luethi
2004-08-06 10:46   ` William Lee Irwin III
2004-08-06 12:01     ` Roger Luethi
2004-08-06 12:01       ` Roger Luethi
2004-08-06 12:11       ` William Lee Irwin III
2004-08-06 12:11         ` William Lee Irwin III
2004-08-06 13:57         ` Roger Luethi
2004-08-06 13:57           ` Roger Luethi
2004-08-06 14:07           ` William Lee Irwin III
2004-08-06 14:07             ` William Lee Irwin III
2004-08-06 15:02             ` Roger Luethi
2004-08-06 15:02               ` Roger Luethi
2004-08-06 14:02       ` Albert Cahalan
2004-08-06 14:02         ` Albert Cahalan
2004-08-06 16:48         ` William Lee Irwin III
2004-08-06 17:08         ` Roger Luethi
2004-08-06 17:08           ` Roger Luethi
2004-08-06 15:14           ` Albert Cahalan
2004-08-06 15:14             ` Albert Cahalan
2004-08-06 20:49             ` Martin J. Bligh
2004-08-06 20:49               ` Martin J. Bligh
2004-08-06 18:38               ` Albert Cahalan
2004-08-06 18:38                 ` Albert Cahalan
2004-08-06 21:15                 ` Martin J. Bligh
2004-08-06 21:15                   ` Martin J. Bligh
2004-08-07 17:37         ` Paul Jackson
2004-08-07 17:37           ` Paul Jackson
2004-08-06 12:58   ` Albert Cahalan
2004-08-06 12:58     ` Albert Cahalan
2004-08-06 15:48     ` William Lee Irwin III
2004-08-06 15:48       ` William Lee Irwin III
2004-08-06 14:14       ` Albert Cahalan
2004-08-06 14:14         ` Albert Cahalan
2004-08-06 16:49         ` William Lee Irwin III
2004-08-06 16:49           ` William Lee Irwin III
2004-08-06 16:34     ` Roger Luethi
2004-08-06 16:34       ` Roger Luethi
2004-08-06 14:51       ` Albert Cahalan
2004-08-06 14:51         ` Albert Cahalan
2004-08-06 17:28         ` Martin J. Bligh
2004-08-06 17:28           ` Martin J. Bligh
2004-08-06 18:21         ` Roger Luethi
2004-08-06 18:21           ` Roger Luethi
  -- strict thread matches above, loose matches on Subject: below --
2004-08-05 17:10 Roger Luethi
