linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [patch] proc.5: tell how to parse /proc/*/stat correctly
@ 2022-12-22 16:42 Alexey Dobriyan
  2022-12-22 22:03 ` [oss-security] " Dominique Martinet
  0 siblings, 1 reply; 21+ messages in thread
From: Alexey Dobriyan @ 2022-12-22 16:42 UTC (permalink / raw)
  To: Alejandro Colomar, Michael Kerrisk; +Cc: linux-kernel, linux-man, oss-security

/proc/*/stat can't be parsed with split() or split(" ") or split(' ')
or sscanf("%d (%s) ...") or equivalents because "comm" can contain
whitespace and parenthesis and is not escaped by the kernel.

BTW escaping would not help with naive split() anyway.

Mention strrchr(')') so people can at least stop adding new bugs.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
---

 man5/proc.5 |    5 +++++
 1 file changed, 5 insertions(+)

--- a/man5/proc.5
+++ b/man5/proc.5
@@ -2092,6 +2092,11 @@ Strings longer than
 .B TASK_COMM_LEN
 (16) characters (including the terminating null byte) are silently truncated.
 This is visible whether or not the executable is swapped out.
+
+Note that \fIcomm\fP can contain space and closing parenthesis characters. 
+Parsing /proc/${pid}/stat with split() or equivalent, or scanf(3) isn't
+reliable. The correct way is to locate closing parenthesis with strrchr(')')
+from the end of the buffer and parse integers from there.
 .TP
 (3) \fIstate\fP \ %c
 One of the following characters, indicating process state:

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [oss-security] [patch] proc.5: tell how to parse /proc/*/stat correctly
  2022-12-22 16:42 [patch] proc.5: tell how to parse /proc/*/stat correctly Alexey Dobriyan
@ 2022-12-22 22:03 ` Dominique Martinet
  2022-12-22 23:21   ` Solar Designer
                     ` (2 more replies)
  0 siblings, 3 replies; 21+ messages in thread
From: Dominique Martinet @ 2022-12-22 22:03 UTC (permalink / raw)
  To: oss-security; +Cc: Alejandro Colomar, Michael Kerrisk, linux-kernel, linux-man

Alexey Dobriyan wrote on Thu, Dec 22, 2022 at 07:42:53PM +0300:
> --- a/man5/proc.5
> +++ b/man5/proc.5
> @@ -2092,6 +2092,11 @@ Strings longer than
>  .B TASK_COMM_LEN
>  (16) characters (including the terminating null byte) are silently truncated.
>  This is visible whether or not the executable is swapped out.
> +
> +Note that \fIcomm\fP can contain space and closing parenthesis characters. 
> +Parsing /proc/${pid}/stat with split() or equivalent, or scanf(3) isn't
> +reliable. The correct way is to locate closing parenthesis with strrchr(')')
> +from the end of the buffer and parse integers from there.

That's still not enough unless new lines are escaped, which they aren't:

$ echo -n 'test) 0 0 0
' > /proc/$$/comm
$ cat /proc/$$/stat
71076 (test) 0 0 0
) S 71075 71076 71076 34840 71192 4194304 6623 6824 0 0 10 3 2 7 20 0 1 0 36396573 15208448 2888 18446744073709551615 94173281726464 94173282650929 140734972513568 0 0 0 65536 3686404 1266761467 1 0 0 17 1 0 0 0 0 0 94173282892592 94173282940880 94173287231488 140734972522071 140734972522076 140734972522076 140734972526574 0

The silver lining here is that comm length is rather small (16) so we
cannot emulate full lines and a very careful process could notice that
there are not enough fields after the last parenthesis... So just look
for the last closing parenthesis in the next line and try again?

But, really, I just don't see how this can practically be said to be parsable...

-- 
Dominique Martinet | Asmadeus

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [oss-security] [patch] proc.5: tell how to parse /proc/*/stat correctly
  2022-12-22 22:03 ` [oss-security] " Dominique Martinet
@ 2022-12-22 23:21   ` Solar Designer
  2022-12-23  0:15     ` Dominique Martinet
  2022-12-23  0:21   ` Jan Engelhardt
  2022-12-28  0:44   ` Lyndon Nerenberg (VE7TFX/VE6BBM)
  2 siblings, 1 reply; 21+ messages in thread
From: Solar Designer @ 2022-12-22 23:21 UTC (permalink / raw)
  To: Dominique Martinet
  Cc: oss-security, Alejandro Colomar, Michael Kerrisk, linux-kernel,
	linux-man

On Fri, Dec 23, 2022 at 07:03:17AM +0900, Dominique Martinet wrote:
> Alexey Dobriyan wrote on Thu, Dec 22, 2022 at 07:42:53PM +0300:
> > --- a/man5/proc.5
> > +++ b/man5/proc.5
> > @@ -2092,6 +2092,11 @@ Strings longer than
> >  .B TASK_COMM_LEN
> >  (16) characters (including the terminating null byte) are silently truncated.
> >  This is visible whether or not the executable is swapped out.
> > +
> > +Note that \fIcomm\fP can contain space and closing parenthesis characters. 
> > +Parsing /proc/${pid}/stat with split() or equivalent, or scanf(3) isn't
> > +reliable. The correct way is to locate closing parenthesis with strrchr(')')
> > +from the end of the buffer and parse integers from there.
> 
> That's still not enough unless new lines are escaped, which they aren't:
> 
> $ echo -n 'test) 0 0 0
> ' > /proc/$$/comm
> $ cat /proc/$$/stat
> 71076 (test) 0 0 0
> ) S 71075 71076 71076 34840 71192 4194304 6623 6824 0 0 10 3 2 7 20 0 1 0 36396573 15208448 2888 18446744073709551615 94173281726464 94173282650929 140734972513568 0 0 0 65536 3686404 1266761467 1 0 0 17 1 0 0 0 0 0 94173282892592 94173282940880 94173287231488 140734972522071 140734972522076 140734972522076 140734972526574 0
> 
> The silver lining here is that comm length is rather small (16) so we
> cannot emulate full lines and a very careful process could notice that
> there are not enough fields after the last parenthesis... So just look
> for the last closing parenthesis in the next line and try again?

No, just don't treat this file's content as a line (nor as several
lines) - treat it as a string that might contain new line characters.

The ps command from procps-ng seems to manage, e.g. for your test "ps c"
prints:

29394 pts/3    S      0:00 test) 0 0 0?

where the question mark is what it substitutes for the non-printable
character (the new line character).  I didn't check whether the process
name it prints comes from /proc/$$/stat or /proc/$$/status, though (per
strace, it reads both).

> But, really, I just don't see how this can practically be said to be parsable...

This format certainly makes it easier to get a parser wrong than to get
it right.

I agree the above man page edit is not enough, and should also mention
the caveat that this shouldn't be read in nor parsed as a line.

Also, the Linux kernel does have problems with new lines in the comm
field elsewhere, at least in the log messages it produces:

https://github.com/lkrg-org/lkrg/issues/165

Here I looked into this in context of LKRG development, but with the
kernel itself also producing messages with comm in them the point of
only fixing LKRG's messages is moot.

Alexander

P.S. While this thread goes well so far, please note that in general
CC'ing other lists on postings to oss-security (or vice versa) is
discouraged.  With such CC's, possible follow-ups from members of those
other lists can be off-topic for oss-security - e.g., they might focus
on non-security technicalities.  Probably not this time when only a man
page is to be patched, but proposed patches to the Linux kernel often
result in lengthy discussions and multiple versions of the patch.  In
those cases, I think it's better to have separate threads and only post
summary follow-up(s) to oss-security (e.g., one message stating that a
patch was proposed and linking to the thread, and another after the
final version is merged).

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [oss-security] [patch] proc.5: tell how to parse /proc/*/stat correctly
  2022-12-22 23:21   ` Solar Designer
@ 2022-12-23  0:15     ` Dominique Martinet
  0 siblings, 0 replies; 21+ messages in thread
From: Dominique Martinet @ 2022-12-23  0:15 UTC (permalink / raw)
  To: Solar Designer
  Cc: oss-security, Alejandro Colomar, Michael Kerrisk, linux-kernel,
	linux-man

Solar Designer wrote on Fri, Dec 23, 2022 at 12:21:12AM +0100:
> On Fri, Dec 23, 2022 at 07:03:17AM +0900, Dominique Martinet wrote:
> > Alexey Dobriyan wrote on Thu, Dec 22, 2022 at 07:42:53PM +0300:
> > > --- a/man5/proc.5
> > > +++ b/man5/proc.5
> > > @@ -2092,6 +2092,11 @@ Strings longer than
> > >  .B TASK_COMM_LEN
> > >  (16) characters (including the terminating null byte) are silently truncated.
> > >  This is visible whether or not the executable is swapped out.
> > > +
> > > +Note that \fIcomm\fP can contain space and closing parenthesis characters. 
> > > +Parsing /proc/${pid}/stat with split() or equivalent, or scanf(3) isn't
> > > +reliable. The correct way is to locate closing parenthesis with strrchr(')')
> > > +from the end of the buffer and parse integers from there.
> > 
> > That's still not enough unless new lines are escaped, which they aren't:
> > 
> > $ echo -n 'test) 0 0 0
> > ' > /proc/$$/comm
> > $ cat /proc/$$/stat
> > 71076 (test) 0 0 0
> > ) S 71075 71076 71076 34840 71192 4194304 6623 6824 0 0 10 3 2 7 20 0 1 0 36396573 15208448 2888 18446744073709551615 94173281726464 94173282650929 140734972513568 0 0 0 65536 3686404 1266761467 1 0 0 17 1 0 0 0 0 0 94173282892592 94173282940880 94173287231488 140734972522071 140734972522076 140734972522076 140734972526574 0
> > 
> > The silver lining here is that comm length is rather small (16) so we
> > cannot emulate full lines and a very careful process could notice that
> > there are not enough fields after the last parenthesis... So just look
> > for the last closing parenthesis in the next line and try again?
> 
> No, just don't treat this file's content as a line (nor as several
> lines) - treat it as a string that might contain new line characters.

Ah, this came just after the /proc/net/unix discussion in another
thread[1] pointing to [2] with one line per entry, and I was still in
that mode.

For /proc/pid/stat with a single entry I agree treating it as a buffer
and looking for the last closing parenthesis should be correct as per
the man page suggestion -- sorry for the noise.

[1] https://www.openwall.com/lists/oss-security/2022/12/21/8
[2] https://lore.kernel.org/all/8a87957e-4d33-9351-ae74-243441cb03cd@opteya.com/

-- 
Dominique Martinet | Asmadeus

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [oss-security] [patch] proc.5: tell how to parse /proc/*/stat correctly
  2022-12-22 22:03 ` [oss-security] " Dominique Martinet
  2022-12-22 23:21   ` Solar Designer
@ 2022-12-23  0:21   ` Jan Engelhardt
  2022-12-28  0:44   ` Lyndon Nerenberg (VE7TFX/VE6BBM)
  2 siblings, 0 replies; 21+ messages in thread
From: Jan Engelhardt @ 2022-12-23  0:21 UTC (permalink / raw)
  To: oss-security; +Cc: Alejandro Colomar, Michael Kerrisk, linux-kernel, linux-man


On Thursday 2022-12-22 23:03, Dominique Martinet wrote:
>> +
>> +Note that \fIcomm\fP can contain space and closing parenthesis characters. 
>> +Parsing /proc/${pid}/stat with split() or equivalent, or scanf(3) isn't
>> +reliable. The correct way is to locate closing parenthesis with strrchr(')')
>> +from the end of the buffer and parse integers from there.
>
>That's still not enough unless new lines are escaped, which they aren't:

strrchr does not concern itself with "lines".
If your input buffer contains the complete content of /proc/X/stat (and not
just a "line" thereof), the strrchr approach appears quite workable.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [oss-security] [patch] proc.5: tell how to parse /proc/*/stat correctly
  2022-12-22 22:03 ` [oss-security] " Dominique Martinet
  2022-12-22 23:21   ` Solar Designer
  2022-12-23  0:21   ` Jan Engelhardt
@ 2022-12-28  0:44   ` Lyndon Nerenberg (VE7TFX/VE6BBM)
  2022-12-28  1:50     ` Tavis Ormandy
  2022-12-28 15:24     ` Shawn Webb
  2 siblings, 2 replies; 21+ messages in thread
From: Lyndon Nerenberg (VE7TFX/VE6BBM) @ 2022-12-28  0:44 UTC (permalink / raw)
  To: oss-security; +Cc: Alejandro Colomar, Michael Kerrisk, linux-kernel, linux-man

Dominique Martinet writes:

> But, really, I just don't see how this can practically be said to be parsable...

In its current form it never will be.  The solution is to place
this variable-length field last.  Then you can "cut -d ' ' -f 51-"
to get the command+args part (assuming I counted all those fields
correctly ...)

Of course, this breaks backwards compatability.

--lyndon

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [oss-security] [patch] proc.5: tell how to parse /proc/*/stat correctly
  2022-12-28  0:44   ` Lyndon Nerenberg (VE7TFX/VE6BBM)
@ 2022-12-28  1:50     ` Tavis Ormandy
  2022-12-30 20:15       ` Jakub Wilk
  2022-12-28 15:24     ` Shawn Webb
  1 sibling, 1 reply; 21+ messages in thread
From: Tavis Ormandy @ 2022-12-28  1:50 UTC (permalink / raw)
  To: linux-kernel; +Cc: oss-security, linux-man

On 2022-12-28, Lyndon Nerenberg (VE7TFX/VE6BBM) wrote:
> Dominique Martinet writes:
>
>> But, really, I just don't see how this can practically be said to be parsable...
>
> In its current form it never will be.  The solution is to place
> this variable-length field last.  Then you can "cut -d ' ' -f 51-"
> to get the command+args part (assuming I counted all those fields
> correctly ...)
>
> Of course, this breaks backwards compatability.
>

I think that cut command doesn't handle newlines, you probably need cut
-z, which looks a bit hacky to me. There already is 'ps -q $$ -o comm=',
of course, and that works fine - as well as libprocps.

I don't know, I think just adding the strrchr note seems acceptable.

Tavis.


-- 
 _o)            $ lynx lock.cmpxchg8b.com
 /\\  _o)  _o)  $ finger taviso@sdf.org
_\_V _( ) _( )  @taviso


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [oss-security] [patch] proc.5: tell how to parse /proc/*/stat correctly
  2022-12-28  0:44   ` Lyndon Nerenberg (VE7TFX/VE6BBM)
  2022-12-28  1:50     ` Tavis Ormandy
@ 2022-12-28 15:24     ` Shawn Webb
  2022-12-28 15:31       ` Shawn Webb
                         ` (2 more replies)
  1 sibling, 3 replies; 21+ messages in thread
From: Shawn Webb @ 2022-12-28 15:24 UTC (permalink / raw)
  To: oss-security; +Cc: Alejandro Colomar, Michael Kerrisk, linux-kernel, linux-man

[-- Attachment #1: Type: text/plain, Size: 1013 bytes --]

On Tue, Dec 27, 2022 at 04:44:49PM -0800, Lyndon Nerenberg (VE7TFX/VE6BBM) wrote:
> Dominique Martinet writes:
> 
> > But, really, I just don't see how this can practically be said to be parsable...
> 
> In its current form it never will be.  The solution is to place
> this variable-length field last.  Then you can "cut -d ' ' -f 51-"
> to get the command+args part (assuming I counted all those fields
> correctly ...)
> 
> Of course, this breaks backwards compatability.

It would also break forwards compatibility in the case new fields
needed to be added.

The only solution would be a libxo-style feature wherein a
machine-parseable format is exposed by virtue of a file extension.

Examples:

1. /proc/pid/stats.json
2. /proc/pid/stats.xml
3. /proc/pid/stats.yaml_shouldnt_be_a_thing

Thanks,

-- 
Shawn Webb
Cofounder / Security Engineer
HardenedBSD

https://git.hardenedbsd.org/hardenedbsd/pubkeys/-/raw/master/Shawn_Webb/03A4CBEBB82EA5A67D9F3853FF2E67A277F8E1FA.pub.asc

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [oss-security] [patch] proc.5: tell how to parse /proc/*/stat correctly
  2022-12-28 15:24     ` Shawn Webb
@ 2022-12-28 15:31       ` Shawn Webb
  2022-12-28 16:47       ` Demi Marie Obenour
  2022-12-31 16:31       ` David Laight
  2 siblings, 0 replies; 21+ messages in thread
From: Shawn Webb @ 2022-12-28 15:31 UTC (permalink / raw)
  To: oss-security; +Cc: Alejandro Colomar, Michael Kerrisk, linux-kernel, linux-man

[-- Attachment #1: Type: text/plain, Size: 1448 bytes --]

On Wed, Dec 28, 2022 at 10:24:58AM -0500, Shawn Webb wrote:
> On Tue, Dec 27, 2022 at 04:44:49PM -0800, Lyndon Nerenberg (VE7TFX/VE6BBM) wrote:
> > Dominique Martinet writes:
> > 
> > > But, really, I just don't see how this can practically be said to be parsable...
> > 
> > In its current form it never will be.  The solution is to place
> > this variable-length field last.  Then you can "cut -d ' ' -f 51-"
> > to get the command+args part (assuming I counted all those fields
> > correctly ...)
> > 
> > Of course, this breaks backwards compatability.
> 
> It would also break forwards compatibility in the case new fields
> needed to be added.
> 
> The only solution would be a libxo-style feature wherein a
> machine-parseable format is exposed by virtue of a file extension.
> 
> Examples:
> 
> 1. /proc/pid/stats.json
> 2. /proc/pid/stats.xml
> 3. /proc/pid/stats.yaml_shouldnt_be_a_thing

To expand upon this idea, lets define an example json file:

{
	"schemaver": "20221228001",
	"name": "cat",
	"state": {
		"raw": "R",
		"intval": 1,
		"Pretty": "(Running)",
	},
	"tgid": 5452,
	"pid": 5452,
	"ppid": 743,
	"uid": {
		"real": 501,
		"effective": 501,
		"saved_set": 501,
		"fs": 501
	}
}

And so on.

-- 
Shawn Webb
Cofounder / Security Engineer
HardenedBSD

https://git.hardenedbsd.org/hardenedbsd/pubkeys/-/raw/master/Shawn_Webb/03A4CBEBB82EA5A67D9F3853FF2E67A277F8E1FA.pub.asc

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [oss-security] [patch] proc.5: tell how to parse /proc/*/stat correctly
  2022-12-28 15:24     ` Shawn Webb
  2022-12-28 15:31       ` Shawn Webb
@ 2022-12-28 16:47       ` Demi Marie Obenour
  2022-12-28 17:09         ` Jan Engelhardt
  2022-12-28 17:25         ` Shawn Webb
  2022-12-31 16:31       ` David Laight
  2 siblings, 2 replies; 21+ messages in thread
From: Demi Marie Obenour @ 2022-12-28 16:47 UTC (permalink / raw)
  To: oss-security; +Cc: Alejandro Colomar, Michael Kerrisk, linux-kernel, linux-man

[-- Attachment #1: Type: text/plain, Size: 1060 bytes --]

On Wed, Dec 28, 2022 at 10:24:58AM -0500, Shawn Webb wrote:
> On Tue, Dec 27, 2022 at 04:44:49PM -0800, Lyndon Nerenberg (VE7TFX/VE6BBM) wrote:
> > Dominique Martinet writes:
> > 
> > > But, really, I just don't see how this can practically be said to be parsable...
> > 
> > In its current form it never will be.  The solution is to place
> > this variable-length field last.  Then you can "cut -d ' ' -f 51-"
> > to get the command+args part (assuming I counted all those fields
> > correctly ...)
> > 
> > Of course, this breaks backwards compatability.
> 
> It would also break forwards compatibility in the case new fields
> needed to be added.
> 
> The only solution would be a libxo-style feature wherein a
> machine-parseable format is exposed by virtue of a file extension.
> 
> Examples:
> 
> 1. /proc/pid/stats.json
> 2. /proc/pid/stats.xml
> 3. /proc/pid/stats.yaml_shouldnt_be_a_thing

A binary format would be even better.  No risk of ambiguity.
-- 
Sincerely,
Demi Marie Obenour (she/her/hers)
Invisible Things Lab

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [oss-security] [patch] proc.5: tell how to parse /proc/*/stat correctly
  2022-12-28 16:47       ` Demi Marie Obenour
@ 2022-12-28 17:09         ` Jan Engelhardt
  2022-12-28 17:25         ` Shawn Webb
  1 sibling, 0 replies; 21+ messages in thread
From: Jan Engelhardt @ 2022-12-28 17:09 UTC (permalink / raw)
  To: oss-security; +Cc: Alejandro Colomar, Michael Kerrisk, linux-kernel, linux-man


On Wednesday 2022-12-28 17:47, Demi Marie Obenour wrote:
>> Examples:
>> 
>> 1. /proc/pid/stats.json
>> 2. /proc/pid/stats.xml
>> 3. /proc/pid/stats.yaml_shouldnt_be_a_thing
>
>A binary format would be even better.  No risk of ambiguity.

So like EBML?

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [oss-security] [patch] proc.5: tell how to parse /proc/*/stat correctly
  2022-12-28 16:47       ` Demi Marie Obenour
  2022-12-28 17:09         ` Jan Engelhardt
@ 2022-12-28 17:25         ` Shawn Webb
  2022-12-28 18:02           ` Demi Marie Obenour
  1 sibling, 1 reply; 21+ messages in thread
From: Shawn Webb @ 2022-12-28 17:25 UTC (permalink / raw)
  To: oss-security; +Cc: Alejandro Colomar, Michael Kerrisk, linux-kernel, linux-man

[-- Attachment #1: Type: text/plain, Size: 1440 bytes --]

On Wed, Dec 28, 2022 at 11:47:25AM -0500, Demi Marie Obenour wrote:
> On Wed, Dec 28, 2022 at 10:24:58AM -0500, Shawn Webb wrote:
> > On Tue, Dec 27, 2022 at 04:44:49PM -0800, Lyndon Nerenberg (VE7TFX/VE6BBM) wrote:
> > > Dominique Martinet writes:
> > > 
> > > > But, really, I just don't see how this can practically be said to be parsable...
> > > 
> > > In its current form it never will be.  The solution is to place
> > > this variable-length field last.  Then you can "cut -d ' ' -f 51-"
> > > to get the command+args part (assuming I counted all those fields
> > > correctly ...)
> > > 
> > > Of course, this breaks backwards compatability.
> > 
> > It would also break forwards compatibility in the case new fields
> > needed to be added.
> > 
> > The only solution would be a libxo-style feature wherein a
> > machine-parseable format is exposed by virtue of a file extension.
> > 
> > Examples:
> > 
> > 1. /proc/pid/stats.json
> > 2. /proc/pid/stats.xml
> > 3. /proc/pid/stats.yaml_shouldnt_be_a_thing
> 
> A binary format would be even better.  No risk of ambiguity.

I think the argument I'm trying to make is to be flexible in
implementation, allowing for future needs and wants--that is "future
proofing".

-- 
Shawn Webb
Cofounder / Security Engineer
HardenedBSD

https://git.hardenedbsd.org/hardenedbsd/pubkeys/-/raw/master/Shawn_Webb/03A4CBEBB82EA5A67D9F3853FF2E67A277F8E1FA.pub.asc

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [oss-security] [patch] proc.5: tell how to parse /proc/*/stat correctly
  2022-12-28 17:25         ` Shawn Webb
@ 2022-12-28 18:02           ` Demi Marie Obenour
  2022-12-28 18:36             ` John Helmert III
                               ` (2 more replies)
  0 siblings, 3 replies; 21+ messages in thread
From: Demi Marie Obenour @ 2022-12-28 18:02 UTC (permalink / raw)
  To: oss-security; +Cc: Alejandro Colomar, Michael Kerrisk, linux-kernel, linux-man

[-- Attachment #1: Type: text/plain, Size: 1588 bytes --]

On Wed, Dec 28, 2022 at 12:25:17PM -0500, Shawn Webb wrote:
> On Wed, Dec 28, 2022 at 11:47:25AM -0500, Demi Marie Obenour wrote:
> > On Wed, Dec 28, 2022 at 10:24:58AM -0500, Shawn Webb wrote:
> > > On Tue, Dec 27, 2022 at 04:44:49PM -0800, Lyndon Nerenberg (VE7TFX/VE6BBM) wrote:
> > > > Dominique Martinet writes:
> > > > 
> > > > > But, really, I just don't see how this can practically be said to be parsable...
> > > > 
> > > > In its current form it never will be.  The solution is to place
> > > > this variable-length field last.  Then you can "cut -d ' ' -f 51-"
> > > > to get the command+args part (assuming I counted all those fields
> > > > correctly ...)
> > > > 
> > > > Of course, this breaks backwards compatability.
> > > 
> > > It would also break forwards compatibility in the case new fields
> > > needed to be added.
> > > 
> > > The only solution would be a libxo-style feature wherein a
> > > machine-parseable format is exposed by virtue of a file extension.
> > > 
> > > Examples:
> > > 
> > > 1. /proc/pid/stats.json
> > > 2. /proc/pid/stats.xml
> > > 3. /proc/pid/stats.yaml_shouldnt_be_a_thing
> > 
> > A binary format would be even better.  No risk of ambiguity.
> 
> I think the argument I'm trying to make is to be flexible in
> implementation, allowing for future needs and wants--that is "future
> proofing".

Linux should not have an XML, JSON, or YAML serializer.  Linux already
does way too much; let’s not add one more thing to the list.
-- 
Sincerely,
Demi Marie Obenour (she/her/hers)
Invisible Things Lab

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [oss-security] [patch] proc.5: tell how to parse /proc/*/stat correctly
  2022-12-28 18:02           ` Demi Marie Obenour
@ 2022-12-28 18:36             ` John Helmert III
  2022-12-28 19:24             ` Shawn Webb
  2022-12-28 22:14             ` Theodore Ts'o
  2 siblings, 0 replies; 21+ messages in thread
From: John Helmert III @ 2022-12-28 18:36 UTC (permalink / raw)
  To: oss-security; +Cc: Alejandro Colomar, Michael Kerrisk, linux-kernel, linux-man

[-- Attachment #1: Type: text/plain, Size: 1804 bytes --]

On Wed, Dec 28, 2022 at 01:02:35PM -0500, Demi Marie Obenour wrote:
> On Wed, Dec 28, 2022 at 12:25:17PM -0500, Shawn Webb wrote:
> > On Wed, Dec 28, 2022 at 11:47:25AM -0500, Demi Marie Obenour wrote:
> > > On Wed, Dec 28, 2022 at 10:24:58AM -0500, Shawn Webb wrote:
> > > > On Tue, Dec 27, 2022 at 04:44:49PM -0800, Lyndon Nerenberg (VE7TFX/VE6BBM) wrote:
> > > > > Dominique Martinet writes:
> > > > > 
> > > > > > But, really, I just don't see how this can practically be said to be parsable...
> > > > > 
> > > > > In its current form it never will be.  The solution is to place
> > > > > this variable-length field last.  Then you can "cut -d ' ' -f 51-"
> > > > > to get the command+args part (assuming I counted all those fields
> > > > > correctly ...)
> > > > > 
> > > > > Of course, this breaks backwards compatability.
> > > > 
> > > > It would also break forwards compatibility in the case new fields
> > > > needed to be added.
> > > > 
> > > > The only solution would be a libxo-style feature wherein a
> > > > machine-parseable format is exposed by virtue of a file extension.
> > > > 
> > > > Examples:
> > > > 
> > > > 1. /proc/pid/stats.json
> > > > 2. /proc/pid/stats.xml
> > > > 3. /proc/pid/stats.yaml_shouldnt_be_a_thing
> > > 
> > > A binary format would be even better.  No risk of ambiguity.
> > 
> > I think the argument I'm trying to make is to be flexible in
> > implementation, allowing for future needs and wants--that is "future
> > proofing".
> 
> Linux should not have an XML, JSON, or YAML serializer.  Linux already
> does way too much; let’s not add one more thing to the list.

Handling a new binary format is not 'one more thing' added?

> -- 
> Sincerely,
> Demi Marie Obenour (she/her/hers)
> Invisible Things Lab



[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [oss-security] [patch] proc.5: tell how to parse /proc/*/stat correctly
  2022-12-28 18:02           ` Demi Marie Obenour
  2022-12-28 18:36             ` John Helmert III
@ 2022-12-28 19:24             ` Shawn Webb
  2022-12-28 19:57               ` Alejandro Colomar
  2022-12-28 22:14             ` Theodore Ts'o
  2 siblings, 1 reply; 21+ messages in thread
From: Shawn Webb @ 2022-12-28 19:24 UTC (permalink / raw)
  To: oss-security; +Cc: Alejandro Colomar, Michael Kerrisk, linux-kernel, linux-man

[-- Attachment #1: Type: text/plain, Size: 2291 bytes --]

On Wed, Dec 28, 2022 at 01:02:35PM -0500, Demi Marie Obenour wrote:
> On Wed, Dec 28, 2022 at 12:25:17PM -0500, Shawn Webb wrote:
> > On Wed, Dec 28, 2022 at 11:47:25AM -0500, Demi Marie Obenour wrote:
> > > On Wed, Dec 28, 2022 at 10:24:58AM -0500, Shawn Webb wrote:
> > > > On Tue, Dec 27, 2022 at 04:44:49PM -0800, Lyndon Nerenberg (VE7TFX/VE6BBM) wrote:
> > > > > Dominique Martinet writes:
> > > > > 
> > > > > > But, really, I just don't see how this can practically be said to be parsable...
> > > > > 
> > > > > In its current form it never will be.  The solution is to place
> > > > > this variable-length field last.  Then you can "cut -d ' ' -f 51-"
> > > > > to get the command+args part (assuming I counted all those fields
> > > > > correctly ...)
> > > > > 
> > > > > Of course, this breaks backwards compatability.
> > > > 
> > > > It would also break forwards compatibility in the case new fields
> > > > needed to be added.
> > > > 
> > > > The only solution would be a libxo-style feature wherein a
> > > > machine-parseable format is exposed by virtue of a file extension.
> > > > 
> > > > Examples:
> > > > 
> > > > 1. /proc/pid/stats.json
> > > > 2. /proc/pid/stats.xml
> > > > 3. /proc/pid/stats.yaml_shouldnt_be_a_thing
> > > 
> > > A binary format would be even better.  No risk of ambiguity.
> > 
> > I think the argument I'm trying to make is to be flexible in
> > implementation, allowing for future needs and wants--that is "future
> > proofing".
> 
> Linux should not have an XML, JSON, or YAML serializer.  Linux already
> does way too much; let’s not add one more thing to the list.

Somewhat agreed. I think formats like JSON provide a good balance
between machine parseable and human readable.

As I described earlier, though, when it comes to concepts like procfs
and sysfs, I have a bias towards abandoning them in favor of sysctl.
If sysctl nodes were to be used, no new serialization formats would
need to be implemented--and developers would also use a safter method
of system and process inspection and manipulation.

-- 
Shawn Webb
Cofounder / Security Engineer
HardenedBSD

https://git.hardenedbsd.org/hardenedbsd/pubkeys/-/raw/master/Shawn_Webb/03A4CBEBB82EA5A67D9F3853FF2E67A277F8E1FA.pub.asc

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [oss-security] [patch] proc.5: tell how to parse /proc/*/stat correctly
  2022-12-28 19:24             ` Shawn Webb
@ 2022-12-28 19:57               ` Alejandro Colomar
  0 siblings, 0 replies; 21+ messages in thread
From: Alejandro Colomar @ 2022-12-28 19:57 UTC (permalink / raw)
  To: Shawn Webb, oss-security, John Helmert III, Demi Marie Obenour,
	Jan Engelhardt, Lyndon Nerenberg (VE7TFX/VE6BBM)
  Cc: Michael Kerrisk, linux-kernel, linux-man


[-- Attachment #1.1: Type: text/plain, Size: 2760 bytes --]

Hi all,

On 12/28/22 20:24, Shawn Webb wrote:
> On Wed, Dec 28, 2022 at 01:02:35PM -0500, Demi Marie Obenour wrote:
>> On Wed, Dec 28, 2022 at 12:25:17PM -0500, Shawn Webb wrote:
>>> On Wed, Dec 28, 2022 at 11:47:25AM -0500, Demi Marie Obenour wrote:
>>>> On Wed, Dec 28, 2022 at 10:24:58AM -0500, Shawn Webb wrote:
>>>>> On Tue, Dec 27, 2022 at 04:44:49PM -0800, Lyndon Nerenberg (VE7TFX/VE6BBM) wrote:
>>>>>> Dominique Martinet writes:
>>>>>>
>>>>>>> But, really, I just don't see how this can practically be said to be parsable...
>>>>>>
>>>>>> In its current form it never will be.  The solution is to place
>>>>>> this variable-length field last.  Then you can "cut -d ' ' -f 51-"
>>>>>> to get the command+args part (assuming I counted all those fields
>>>>>> correctly ...)
>>>>>>
>>>>>> Of course, this breaks backwards compatability.
>>>>>
>>>>> It would also break forwards compatibility in the case new fields
>>>>> needed to be added.
>>>>>
>>>>> The only solution would be a libxo-style feature wherein a
>>>>> machine-parseable format is exposed by virtue of a file extension.
>>>>>
>>>>> Examples:
>>>>>
>>>>> 1. /proc/pid/stats.json
>>>>> 2. /proc/pid/stats.xml
>>>>> 3. /proc/pid/stats.yaml_shouldnt_be_a_thing
>>>>
>>>> A binary format would be even better.  No risk of ambiguity.
>>>
>>> I think the argument I'm trying to make is to be flexible in
>>> implementation, allowing for future needs and wants--that is "future
>>> proofing".
>>
>> Linux should not have an XML, JSON, or YAML serializer.  Linux already
>> does way too much; let’s not add one more thing to the list.
> 
> Somewhat agreed. I think formats like JSON provide a good balance
> between machine parseable and human readable.
> a
> As I described earlier, though, when it comes to concepts like procfs
> and sysfs, I have a bias towards abandoning them in favor of sysctl.
> If sysctl nodes were to be used, no new serialization formats would
> need to be implemented--and developers would also use a safter method
> of system and process inspection and manipulation.
> 

Just a comment as someone who is reading without much understanding of the 
contents of /prod/pid/stat:

If organization of the data in the file is a problem, and the format starts to 
matter, maybe it's a hint that there are too many different contents, and could 
be split into different files, each one with its own formatting rules.  I'll 
suggest that maybe a set of files, maybe contained in a common directory 
stats.d, is what you're looking for?

Binary format is not of my preference, since most user-space tools work with the 
standard interface, that is, text.

Cheers,

Alex

-- 
<http://www.alejandro-colomar.es/>

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [oss-security] [patch] proc.5: tell how to parse /proc/*/stat correctly
  2022-12-28 18:02           ` Demi Marie Obenour
  2022-12-28 18:36             ` John Helmert III
  2022-12-28 19:24             ` Shawn Webb
@ 2022-12-28 22:14             ` Theodore Ts'o
  2022-12-29  0:33               ` Demi Marie Obenour
  2 siblings, 1 reply; 21+ messages in thread
From: Theodore Ts'o @ 2022-12-28 22:14 UTC (permalink / raw)
  To: Demi Marie Obenour
  Cc: oss-security, Alejandro Colomar, Michael Kerrisk, linux-kernel,
	linux-man

On Wed, Dec 28, 2022 at 01:02:35PM -0500, Demi Marie Obenour wrote:
> > I think the argument I'm trying to make is to be flexible in
> > implementation, allowing for future needs and wants--that is "future
> > proofing".
> 
> Linux should not have an XML, JSON, or YAML serializer.  Linux already
> does way too much; let’s not add one more thing to the list.

There's always Protobufs[1]!  :-)  And all of these are better than
ASN.1, for which Google already has a limited parser (for x.509
certificates).   :-)   :-)   :-)

						- Ted

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [oss-security] [patch] proc.5: tell how to parse /proc/*/stat correctly
  2022-12-28 22:14             ` Theodore Ts'o
@ 2022-12-29  0:33               ` Demi Marie Obenour
  0 siblings, 0 replies; 21+ messages in thread
From: Demi Marie Obenour @ 2022-12-29  0:33 UTC (permalink / raw)
  To: Theodore Ts'o
  Cc: oss-security, Alejandro Colomar, Michael Kerrisk, linux-kernel,
	linux-man

[-- Attachment #1: Type: text/plain, Size: 764 bytes --]

On Wed, Dec 28, 2022 at 05:14:42PM -0500, Theodore Ts'o wrote:
> On Wed, Dec 28, 2022 at 01:02:35PM -0500, Demi Marie Obenour wrote:
> > > I think the argument I'm trying to make is to be flexible in
> > > implementation, allowing for future needs and wants--that is "future
> > > proofing".
> > 
> > Linux should not have an XML, JSON, or YAML serializer.  Linux already
> > does way too much; let’s not add one more thing to the list.
> 
> There's always Protobufs[1]!  :-)  And all of these are better than
> ASN.1, for which Google already has a limited parser (for x.509
> certificates).   :-)   :-)   :-)
> 
> 						- Ted

Cap’n Proto is better than Protobufs :-)
-- 
Sincerely,
Demi Marie Obenour (she/her/hers)
Invisible Things Lab

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [oss-security] [patch] proc.5: tell how to parse /proc/*/stat correctly
  2022-12-28  1:50     ` Tavis Ormandy
@ 2022-12-30 20:15       ` Jakub Wilk
  0 siblings, 0 replies; 21+ messages in thread
From: Jakub Wilk @ 2022-12-30 20:15 UTC (permalink / raw)
  To: oss-security; +Cc: linux-man, linux-kernel

* Tavis Ormandy <taviso@gmail.com>, 2022-12-28 01:50:
>>>But, really, I just don't see how this can practically be said to be 
>>>parsable...
>>
>>In its current form it never will be.  The solution is to place this 
>>variable-length field last.  Then you can "cut -d ' ' -f 51-" to get 
>>the command+args part (assuming I counted all those fields correctly 
>>...)
>>
>>Of course, this breaks backwards compatability.
>
>I think that cut command doesn't handle newlines,

Indeed.

>There already is 'ps -q $$ -o >comm='

FWIW, "ps ... -o comm=" doesn't just print the raw comm value: it 
replaces non-printable chars with punctuation characters, and it may 
append " <defunct>" if the process is a zombie.

The easiest way to get unmangled comm is to read it from 
/proc/$PID/comm, then strip the trailing newline.

(But I suspect most /proc/*/stat parsers don't care about the comm field 
at all; they just want to skip over it to get their hands on the 
following fields.)

-- 
Jakub Wilk

^ permalink raw reply	[flat|nested] 21+ messages in thread

* RE: [oss-security] [patch] proc.5: tell how to parse /proc/*/stat correctly
  2022-12-28 15:24     ` Shawn Webb
  2022-12-28 15:31       ` Shawn Webb
  2022-12-28 16:47       ` Demi Marie Obenour
@ 2022-12-31 16:31       ` David Laight
  2022-12-31 17:27         ` Solar Designer
  2 siblings, 1 reply; 21+ messages in thread
From: David Laight @ 2022-12-31 16:31 UTC (permalink / raw)
  To: 'Shawn Webb', oss-security
  Cc: Alejandro Colomar, Michael Kerrisk, linux-kernel, linux-man

From: Shawn Webb
> Sent: 28 December 2022 15:25
> 
> On Tue, Dec 27, 2022 at 04:44:49PM -0800, Lyndon Nerenberg (VE7TFX/VE6BBM) wrote:
> > Dominique Martinet writes:
> >
> > > But, really, I just don't see how this can practically be said to be parsable...
> >
> > In its current form it never will be.  The solution is to place
> > this variable-length field last.  Then you can "cut -d ' ' -f 51-"
> > to get the command+args part (assuming I counted all those fields
> > correctly ...)
> >
> > Of course, this breaks backwards compatability.
> 
> It would also break forwards compatibility in the case new fields
> needed to be added.
> 
> The only solution would be a libxo-style feature wherein a
> machine-parseable format is exposed by virtue of a file extension.
> 
> Examples:
> 
> 1. /proc/pid/stats.json
> 2. /proc/pid/stats.xml
> 3. /proc/pid/stats.yaml_shouldnt_be_a_thing

None of those are of any real use if you are trying to parse the
data in something like a shell script.
Multiple lines formatted as "tag:value" are probably the best bet.
Provided something sane is done with embedded \n (and maybe \r).

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [oss-security] [patch] proc.5: tell how to parse /proc/*/stat correctly
  2022-12-31 16:31       ` David Laight
@ 2022-12-31 17:27         ` Solar Designer
  0 siblings, 0 replies; 21+ messages in thread
From: Solar Designer @ 2022-12-31 17:27 UTC (permalink / raw)
  To: David Laight
  Cc: 'Shawn Webb',
	oss-security, Alejandro Colomar, Michael Kerrisk, linux-kernel,
	linux-man

Hi all,

Let's wind this oss-security thread down as it relates to brainstorming
and commenting on totally new designs - no more of that, please.

Many things were said, but realistically the interface isn't _that_
broken (this can be parsed correctly, and procps-ng manages to) and is
(hopefully) not going to change much (in my opinion, and I know I'm not
alone in this, most of the proposals would make things worse overall).

Somewhat realistically, one possible change is replacing the most risky
characters, such as braces and anything <= ASCII 32, perhaps with '?'
to match what procps-ng is doing.  Perhaps do this either on all updates
of "comm" or in all places where "comm" is reported to userspace
(including procfs and kernel messages, by calling a common function).
"comm" isn't the full process name anyway - it's often truncated - so it
can reasonably be made safer in other ways as well.  As an option, the
replacing of whitespace (ASCII 32) and braces could be limited to the
"stat" file, but the control characters are (even more) problematic with
other interfaces where "comm" is exposed, so replacing them should
probably be global.

Happy New Year!

Alexander

^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2022-12-31 17:28 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-12-22 16:42 [patch] proc.5: tell how to parse /proc/*/stat correctly Alexey Dobriyan
2022-12-22 22:03 ` [oss-security] " Dominique Martinet
2022-12-22 23:21   ` Solar Designer
2022-12-23  0:15     ` Dominique Martinet
2022-12-23  0:21   ` Jan Engelhardt
2022-12-28  0:44   ` Lyndon Nerenberg (VE7TFX/VE6BBM)
2022-12-28  1:50     ` Tavis Ormandy
2022-12-30 20:15       ` Jakub Wilk
2022-12-28 15:24     ` Shawn Webb
2022-12-28 15:31       ` Shawn Webb
2022-12-28 16:47       ` Demi Marie Obenour
2022-12-28 17:09         ` Jan Engelhardt
2022-12-28 17:25         ` Shawn Webb
2022-12-28 18:02           ` Demi Marie Obenour
2022-12-28 18:36             ` John Helmert III
2022-12-28 19:24             ` Shawn Webb
2022-12-28 19:57               ` Alejandro Colomar
2022-12-28 22:14             ` Theodore Ts'o
2022-12-29  0:33               ` Demi Marie Obenour
2022-12-31 16:31       ` David Laight
2022-12-31 17:27         ` Solar Designer

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).