All of lore.kernel.org
 help / color / mirror / Atom feed
* [BUG 212887] [PATCH] getopt.3: Clarify behaviour
@ 2021-04-30 20:00 Alejandro Colomar
  2021-04-30 20:37 ` Alejandro Colomar (man-pages)
  0 siblings, 1 reply; 4+ messages in thread
From: Alejandro Colomar @ 2021-04-30 20:00 UTC (permalink / raw)
  To: linux-man; +Cc: James O. D. Hunt, alx.manpages

From: "James O. D. Hunt" <jamesodhunt@gmail.com>

Improved the `getopt(3)` man page in the following ways:

1) Defined the existing term "legitimate option character".
2) Added an additional NOTE stressing that arguments are parsed in strict
   order and the implications of this when numeric options are utilised.
3) Added a new WARNINGS section that alerts the reader to the fact they
   should:
   - Validate all option argument.
   - Take care if mixing numeric options and arguments accepting numeric
     values.

Signed-off-by: James O. D. Hunt <jamesodhunt@gmail.com>
Bugzilla: <https://bugzilla.kernel.org/show_bug.cgi?id=212887>
---

I'm only forwarding the patch to the list to better discuss it.

 man3/getopt.3 | 59 ++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 58 insertions(+), 1 deletion(-)

diff --git a/man3/getopt.3 b/man3/getopt.3
index 921e747f8..ec3a5640f 100644
--- a/man3/getopt.3
+++ b/man3/getopt.3
@@ -125,7 +125,13 @@ Then \fIoptind\fP is the index in \fIargv\fP of the first
 \fIargv\fP-element that is not an option.
 .PP
 .I optstring
-is a string containing the legitimate option characters.
+is a string containing the legitimate option characters. A legitimate
+option character is any visible one byte
+.BR ascii (7)
+character (for which
+.BR isgraph (3)
+would return nonzero) that is not the null byte (\(aq\e0\(aq),
+dash (\(aq-\(aq) or colon (\(aq:\(aq).
 If such a
 character is followed by a colon, the option requires an argument, so
 .BR getopt ()
@@ -402,6 +408,22 @@ routine that rechecks
 .B POSIXLY_CORRECT
 and checks for GNU extensions in
 .IR optstring .)
+
+Command-line arguments are parsed in strict order meaning that an option requiring
+an argument will consume the next argument, regardless of whether that
+argument is the correctly specified option argument or simply the next option
+(in the scenario the user mis-specifies the command-line). For example, if
+.IR optstring
+is specified as "1n:"
+and the user incorrectly specifies the command-line arguments as
+\(aqprog\ -n\ -1\(aq, the
+.IR \-n
+option will be given the
+.BR optarg
+value \(aq\-1\(aq, and the
+.IR \-1
+option will be considered to have not been specified.
+
 .SH EXAMPLES
 .SS getopt()
 The following trivial example program uses
@@ -542,6 +564,41 @@ main(int argc, char **argv)
     exit(EXIT_SUCCESS);
 }
 .EE
+
+.SH WARNINGS
+Since
+.BR getopt ()
+allows users to provide values to the program, every care should be taken to
+validate every option value specified by the user calling the program.
+.BR getopt ()
+itself provides no validation so the programmer should perform boundary value
+checks on
+.ft I
+every
+.ft
+argument to minimise the risk of bad input data being accepted by the program.
+String values should be checked to ensure they are not empty (unless
+permitted), sanitized appropriately and that internal buffers used to
+store the string values returned in \fIoptarg\fP are large enough to hold
+pathologically long values. Numeric values should be verified to ensure they
+are within the expected permissible range of values.
+
+Further, since
+.BR getopt ()
+can handle numeric options (such as \(aq-1\(aq or \(aq-2 foo\(aq), care should be
+taken when writing  a program that accepts both a numeric flag option and an option
+accepting a numeric argument. Specifically, the program should sanity check the numeric
+\fIoptarg\fP value carefully to protect against the case where a user
+mis-specifies the command-line which chould result in a numeric option flag
+being specified as the \fIoptarg\fP value for the numeric option by mistake.
+For example, if
+.IR optstring
+is specified as "1n:" and the \(aqn\(aq option accepts a numeric value, if the
+command-line is specified accidentally as \(aqprog\ -n\ -1\(aq, care needs to
+be taken to ensure the program does not try to convert the \(aq-1\(aq passed
+to the \(aqn\(aq option into an unsigned numeric value since that would result
+in it being set to the largest possible integer value for the type used to
+encode it.
 .SH SEE ALSO
 .BR getopt (1),
 .BR getsubopt (3)
-- 
2.31.0


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [BUG 212887] [PATCH] getopt.3: Clarify behaviour
  2021-04-30 20:00 [BUG 212887] [PATCH] getopt.3: Clarify behaviour Alejandro Colomar
@ 2021-04-30 20:37 ` Alejandro Colomar (man-pages)
  2021-05-01 19:41   ` [BUG 212887] [PATCH v2] " Alejandro Colomar
  0 siblings, 1 reply; 4+ messages in thread
From: Alejandro Colomar (man-pages) @ 2021-04-30 20:37 UTC (permalink / raw)
  To: James O. D. Hunt; +Cc: bugzilla-daemon, linux-man

Hi James,

Thanks for documenting that!
Please see some comments below.

Thanks,

Alex

On 4/30/21 10:00 PM, Alejandro Colomar wrote:
> From: "James O. D. Hunt" <jamesodhunt@gmail.com>
> 
> Improved the `getopt(3)` man page in the following ways:
> 
> 1) Defined the existing term "legitimate option character".
> 2) Added an additional NOTE stressing that arguments are parsed in strict
>    order and the implications of this when numeric options are utilised.
> 3) Added a new WARNINGS section that alerts the reader to the fact they
>    should:
>    - Validate all option argument.
>    - Take care if mixing numeric options and arguments accepting numeric
>      values.
> 
> Signed-off-by: James O. D. Hunt <jamesodhunt@gmail.com>
> Bugzilla: <https://bugzilla.kernel.org/show_bug.cgi?id=212887>
> ---
> 
> I'm only forwarding the patch to the list to better discuss it.
> 
>  man3/getopt.3 | 59 ++++++++++++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 58 insertions(+), 1 deletion(-)
> 
> diff --git a/man3/getopt.3 b/man3/getopt.3
> index 921e747f8..ec3a5640f 100644
> --- a/man3/getopt.3
> +++ b/man3/getopt.3
> @@ -125,7 +125,13 @@ Then \fIoptind\fP is the index in \fIargv\fP of the first
>  \fIargv\fP-element that is not an option.
>  .PP
>  .I optstring
> -is a string containing the legitimate option characters.
> +is a string containing the legitimate option characters. A legitimate
> +option character is any visible one byte
> +.BR ascii (7)
> +character (for which
> +.BR isgraph (3)
> +would return nonzero) that is not the null byte (\(aq\e0\(aq),

Is it necessary to mention the null byte ('\0') here?  It is already not
in the 'visible character' set.  I don't think isgraph(3) would return
true for it, right?

> +dash (\(aq-\(aq) or colon (\(aq:\(aq).

So the only visible characters that are invalid are '-' and ':', right?

BTW, you should escape the dash: (\(aq\-\(aq)

See:

$ man 7 man-pages | sed -n '/Generating optimal glyphs/,+19p';
   Generating optimal glyphs
       Where a real minus character is required (e.g., for numbers
       such as -1, for man page cross references such as utf-8(7),
       or  when  writing options that have a leading dash, such as
       in ls -l), use the following form in the man page source:

           \-

       This guideline applies also to code examples.

       The use of real minus signs serves the following purposes:

       *  To provide better renderings on  various  targets  other
          than  ASCII  terminals,  notably  in  PDF  and  on  Uni‐
          code/UTF-8‐capable terminals.

       *  To generate glyphs that when copied from rendered  pages
          will  produce real minus signs when pasted into a termi‐
          nal.


>  If such a
>  character is followed by a colon, the option requires an argument, so
>  .BR getopt ()
> @@ -402,6 +408,22 @@ routine that rechecks
>  .B POSIXLY_CORRECT
>  and checks for GNU extensions in
>  .IR optstring .)
> +

$ man 7 man-pages | sed -n '/Formatting conventions (general)/,/^$/p';
   Formatting conventions (general)
       Paragraphs should be separated by suitable markers (usually
       either .PP or .IP).  Do not separate paragraphs using blank
       lines, as this results in poor  rendering  in  some  output
       formats (such as PostScript and PDF).



> +Command-line arguments are parsed in strict order meaning that an option requiring
> +an argument will consume the next argument, regardless of whether that
> +argument is the correctly specified option argument or simply the next option
> +(in the scenario the user mis-specifies the command-line). For example, if

$ man 7 man-pages | sed -n '/Use semantic newlines/,/^$/p';
   Use semantic newlines
       In the source of a manual page,  new  sentences  should  be
       started  on new lines, and long sentences should split into
       lines at clause breaks (commas, semicolons, colons, and  so
       on).   This  convention,  sometimes known as "semantic new‐
       lines", makes it easier to see the effect of patches, which
       often  operate at the level of individual sentences or sen‐
       tence clauses.



> +.IR optstring
> +is specified as "1n:"
> +and the user incorrectly specifies the command-line arguments as
> +\(aqprog\ -n\ -1\(aq, the
> +.IR \-n

Use .I instead of .IR here.

$ man 7 groff_man | sed -n '/^ *\.IR italic/,+5p';
       .IR italic‐text roman‐text ...
              Set each argument in italics and roman, alternately.

                     This is the first command of the
                     .IR prologue .

$ man 7 groff_man | sed -n '/^ *\.I \[text]/,+16p';
       .I [text]
              Set text in italics.  If the macro is given no argu‐
              ments,  the  text  of  the next input line is set in
              italics.

              Use italics for file and path names, for environment
              variables, for enumeration or preprocessor constants
              in C, for  variable  (user‐determined)  portions  of
              syntax  synopses, for the first occurrence only of a
              technical concept being  introduced,  for  names  of
              works of software (including commands and functions,
              but excluding names of operating  systems  or  their
              kernels),  and  anywhere  a  parameter requiring re‐
              placement by the user is encountered.  An  exception
              involves  variable text in a context that is already
              marked up in italics, such as  file  or  path  names
              with  variable components; in such cases, follow the


> +option will be given the
> +.BR optarg

The same as with .I.  Use .B here.  See groff_man(7).

> +value \(aq\-1\(aq, and the
> +.IR \-1
> +option will be considered to have not been specified.
> +
>  .SH EXAMPLES
>  .SS getopt()
>  The following trivial example program uses
> @@ -542,6 +564,41 @@ main(int argc, char **argv)
>      exit(EXIT_SUCCESS);
>  }
>  .EE
> +
> +.SH WARNINGS
> +Since
> +.BR getopt ()
> +allows users to provide values to the program, every care should be taken to
> +validate every option value specified by the user calling the program.
> +.BR getopt ()
> +itself provides no validation so the programmer should perform boundary value
> +checks on
> +.ft I
> +every
> +.ft

Please use:

.I every

> +argument to minimise the risk of bad input data being accepted by the program.
> +String values should be checked to ensure they are not empty (unless
> +permitted), sanitized appropriately and that internal buffers used to
> +store the string values returned in \fIoptarg\fP are large enough to hold

Please avoid embedding formatted text in normal text. Use a separate line:

.I optarg

> +pathologically long values. Numeric values should be verified to ensure they
> +are within the expected permissible range of values.
> +
> +Further, since
> +.BR getopt ()
> +can handle numeric options (such as \(aq-1\(aq or \(aq-2 foo\(aq), care should be
> +taken when writing  a program that accepts both a numeric flag option and an option
> +accepting a numeric argument. Specifically, the program should sanity check the numeric
> +\fIoptarg\fP value carefully to protect against the case where a user
> +mis-specifies the command-line which chould result in a numeric option flag
> +being specified as the \fIoptarg\fP value for the numeric option by mistake.
> +For example, if
> +.IR optstring
> +is specified as "1n:" and the \(aqn\(aq option accepts a numeric value, if the
> +command-line is specified accidentally as \(aqprog\ -n\ -1\(aq, care needs to
> +be taken to ensure the program does not try to convert the \(aq-1\(aq passed
> +to the \(aqn\(aq option into an unsigned numeric value since that would result
> +in it being set to the largest possible integer value for the type used to
> +encode it.
>  .SH SEE ALSO
>  .BR getopt (1),
>  .BR getsubopt (3)
> 

-- 
Alejandro Colomar
Linux man-pages comaintainer; https://www.kernel.org/doc/man-pages/
http://www.alejandro-colomar.es/

^ permalink raw reply	[flat|nested] 4+ messages in thread

* [BUG 212887] [PATCH v2] getopt.3: Clarify behaviour
  2021-04-30 20:37 ` Alejandro Colomar (man-pages)
@ 2021-05-01 19:41   ` Alejandro Colomar
  2021-05-01 20:53     ` Alejandro Colomar (man-pages)
  0 siblings, 1 reply; 4+ messages in thread
From: Alejandro Colomar @ 2021-05-01 19:41 UTC (permalink / raw)
  To: linux-man; +Cc: James O. D. Hunt, alx.manpages

From: "James O. D. Hunt" <jamesodhunt@gmail.com>

Improved the `getopt(3)` man page in the following ways:

1) Defined the existing term "legitimate option character".
2) Added an additional NOTE stressing that arguments are parsed in strict
   order and the implications of this when numeric options are utilised.
3) Added a new WARNINGS section that alerts the reader to the fact they
   should:
   - Validate all option argument.
   - Take care if mixing numeric options and arguments accepting numeric
     values.

Signed-off-by: James O. D. Hunt <jamesodhunt@gmail.com>
Bugzilla: <https://bugzilla.kernel.org/show_bug.cgi?id=212887>
---

Forward patch v2 from bugzilla to linux-man@.

 man3/getopt.3 | 61 +++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 61 insertions(+)

diff --git a/man3/getopt.3 b/man3/getopt.3
index 921e747f8..810298505 100644
--- a/man3/getopt.3
+++ b/man3/getopt.3
@@ -126,6 +126,11 @@ Then \fIoptind\fP is the index in \fIargv\fP of the first
 .PP
 .I optstring
 is a string containing the legitimate option characters.
+A legitimate option character is any visible one byte
+.BR ascii (7)
+character (for which
+.BR isgraph (3)
+would return nonzero) that is not dash (\(aq\-\(aq) or colon (\(aq:\(aq).
 If such a
 character is followed by a colon, the option requires an argument, so
 .BR getopt ()
@@ -402,6 +407,23 @@ routine that rechecks
 .B POSIXLY_CORRECT
 and checks for GNU extensions in
 .IR optstring .)
+.PP
+Command-line arguments are parsed in strict order meaning that an option
+requiring an argument will consume the next argument, regardless of whether
+that argument is the correctly specified option argument or simply the next
+option (in the scenario the user mis-specifies the command line).
+For example, if
+.IR optstring
+is specified as "1n:"
+and the user incorrectly specifies the command line arguments as
+\(aqprog\ \-n\ \-1\(aq, the
+.I \-n
+option will be given the
+.B optarg
+value \(aq\-1\(aq, and the
+.I \-1
+option will be considered to have not been specified.
+.PP
 .SH EXAMPLES
 .SS getopt()
 The following trivial example program uses
@@ -542,6 +564,45 @@ main(int argc, char **argv)
     exit(EXIT_SUCCESS);
 }
 .EE
+.PP
+.SH WARNINGS
+Since
+.BR getopt ()
+allows users to provide values to the program, every care should be taken to
+validate every option value specified by the user calling the program.
+.BR getopt ()
+itself provides no validation so the programmer should perform boundary value
+checks on
+.I every
+argument to minimise the risk of bad input data being accepted by the program.
+String values should be checked to ensure they are not empty (unless
+permitted), sanitized appropriately and that internal buffers used to store
+the string values returned in
+.I optarg
+are large enough to hold pathologically long values.
+Numeric values should be verified to ensure they are within the expected
+permissible range of values.
+.PP
+Further, since
+.BR getopt ()
+can handle numeric options (such as \(aq\-1\(aq or \(aq\-2\ foo\(aq), care should
+be taken when writing  a program that accepts both a numeric flag option and
+an option accepting a numeric argument.
+Specifically, the program should sanity check the numeric
+.I optarg
+value carefully to protect against the case where a user mis-specifies the
+command line which chould result in a numeric option flag being specified as
+the
+.I optarg
+value for the numeric option by mistake.
+For example, if
+.IR optstring
+is specified as "1n:" and the \(aqn\(aq option accepts a numeric value, if the
+command line is specified accidentally as \(aqprog\ \-n\ \-1\(aq, care needs to
+be taken to ensure the program does not try to convert the \(aq\-1\(aq passed
+to the \(aqn\(aq option into an unsigned numeric value since that would result
+in it being set to the largest possible integer value for the type used to
+encode it.
 .SH SEE ALSO
 .BR getopt (1),
 .BR getsubopt (3)
-- 
2.31.0


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [BUG 212887] [PATCH v2] getopt.3: Clarify behaviour
  2021-05-01 19:41   ` [BUG 212887] [PATCH v2] " Alejandro Colomar
@ 2021-05-01 20:53     ` Alejandro Colomar (man-pages)
  0 siblings, 0 replies; 4+ messages in thread
From: Alejandro Colomar (man-pages) @ 2021-05-01 20:53 UTC (permalink / raw)
  To: James O. D. Hunt; +Cc: linux-man, bugzilla-daemon

Hi James,

See some more comments below.

Thanks,

Alex

On 5/1/21 9:41 PM, Alejandro Colomar wrote:
> From: "James O. D. Hunt" <jamesodhunt@gmail.com>
> 
> Improved the `getopt(3)` man page in the following ways:
> 
> 1) Defined the existing term "legitimate option character".
> 2) Added an additional NOTE stressing that arguments are parsed in strict
>    order and the implications of this when numeric options are utilised.
> 3) Added a new WARNINGS section that alerts the reader to the fact they
>    should:
>    - Validate all option argument.
>    - Take care if mixing numeric options and arguments accepting numeric
>      values.

Could you please separate this into 2 patches?  1 & 2 seem to be very
related, but 3 is quite different.

> 
> Signed-off-by: James O. D. Hunt <jamesodhunt@gmail.com>
> Bugzilla: <https://bugzilla.kernel.org/show_bug.cgi?id=212887>
> ---
> 
> Forward patch v2 from bugzilla to linux-man@.
> 
>  man3/getopt.3 | 61 +++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 61 insertions(+)
> 
> diff --git a/man3/getopt.3 b/man3/getopt.3
> index 921e747f8..810298505 100644
> --- a/man3/getopt.3
> +++ b/man3/getopt.3
> @@ -126,6 +126,11 @@ Then \fIoptind\fP is the index in \fIargv\fP of the first
>  .PP
>  .I optstring
>  is a string containing the legitimate option characters.
> +A legitimate option character is any visible one byte
> +.BR ascii (7)
> +character (for which
> +.BR isgraph (3)
> +would return nonzero) that is not dash (\(aq\-\(aq) or colon (\(aq:\(aq).
>  If such a
>  character is followed by a colon, the option requires an argument, so
>  .BR getopt ()
> @@ -402,6 +407,23 @@ routine that rechecks
>  .B POSIXLY_CORRECT
>  and checks for GNU extensions in
>  .IR optstring .)
> +.PP

Some more "semantic newline" cuts (//):

> +Command-line arguments are parsed in strict order // meaning that an option
> +requiring an argument will consume the next argument, // regardless of whether
> +that argument is the correctly specified option argument // or simply the next
> +option // (in the scenario the user mis-specifies the command line).
> +For example, if
> +.IR optstring
> +is specified as "1n:"
> +and the user incorrectly specifies the command line arguments as
> +\(aqprog\ \-n\ \-1\(aq, the

Please replace the quotes by italics (.IR or .I) (and possibly
non-breaking spaces).

$ man 7 man-pages | sed -n '/Complete commands/,+11p';
       Complete commands should, if long, be written as  an  in‐
       dented  line  on  their own, with a blank line before and
       after the command, for example

           man 7 man-pages

       If the command is short, then it can be  included  inline
       in  the  text,  in italic format, for example, man 7 man‐
       pages.  In this case, it may be worth  using  nonbreaking
       spaces ("\ ") at suitable places in the command.  Command
       options should be written in italics (e.g., -l).

> +.I \-n
> +option will be given the
> +.B optarg
> +value \(aq\-1\(aq, and the

Given that in the case above -1 is a C string, double quotes would
probably be more appropriate.

> +.I \-1
> +option will be considered to have not been specified.
> +.PP
>  .SH EXAMPLES
>  .SS getopt()
>  The following trivial example program uses
> @@ -542,6 +564,45 @@ main(int argc, char **argv)
>      exit(EXIT_SUCCESS);
>  }
>  .EE
> +.PP
> +.SH WARNINGS

This should probably be a subsection in NOTES.

$ man 7 man-pages | sed -n '/^ *Where.*traditional/,/^$/p';
       Where  a  traditional heading would apply, please use it;
       this kind of consistency can make the information  easier
       to  understand.   If  you  must,  you can create your own
       headings if they make things easier to  understand  (this
       can  be especially useful for pages in Sections 4 and 5).
       However, before doing this, consider  whether  you  could
       use the traditional headings, with some subsections (.SS)
       within those sections.

> +Since
> +.BR getopt ()

Some more "semantic newline" cuts (//):

> +allows users to provide values to the program, // every care should be taken to
> +validate // every option value specified by the user calling the program.
> +.BR getopt ()
> +itself provides no validation so // the programmer should perform boundary value
> +checks on
> +.I every
> +argument to minimise the risk of bad input data being accepted by the program.
> +String values should be checked to // ensure they are not empty (unless
> +permitted), // sanitized appropriately and // that internal buffers used to store

Review wording (s/that internal/check that internal/?)

> +the string values returned in
> +.I optarg
> +are large enough to hold pathologically long values.

I'm not sure if this is extending these notes too much.  I see it
obvious that any user input, especially strings, should be checked for
every corner case in paranoid mode.  But I checked scanf(3) and didn't
see any NOTES about that.

> +Numeric values should be verified to ensure they are within the expected
> +permissible range of values.
> +.PP
> +Further, since
> +.BR getopt ()
> +can handle numeric options (such as \(aq\-1\(aq or \(aq\-2\ foo\(aq), care should
> +be taken when writing  a program that accepts both a numeric flag option and
> +an option accepting a numeric argument.
> +Specifically, the program should sanity check the numeric
> +.I optarg
> +value carefully to protect against the case where a user mis-specifies the
> +command line which chould result in a numeric option flag being specified as
> +the
> +.I optarg
> +value for the numeric option by mistake.
> +For example, if
> +.IR optstring
> +is specified as "1n:" and the \(aqn\(aq option accepts a numeric value, if the
> +command line is specified accidentally as \(aqprog\ \-n\ \-1\(aq, care needs to
> +be taken to ensure the program does not try to convert the \(aq\-1\(aq passed
> +to the \(aqn\(aq option into an unsigned numeric value since that would result
> +in it being set to the largest possible integer value for the type used to
> +encode it.

I don't think we should warn about this.  If the user inputs a wrong
command line, he can only expect undefined behavior.  For the program,
as long as it doesn't have any security problems, it doesn't need to
care if the user doesn't provide a valid input.  Normal checks should be
done.

>  .SH SEE ALSO
>  .BR getopt (1),
>  .BR getsubopt (3)
> 

-- 
Alejandro Colomar
Linux man-pages comaintainer; https://www.kernel.org/doc/man-pages/
http://www.alejandro-colomar.es/

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2021-05-01 20:53 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-04-30 20:00 [BUG 212887] [PATCH] getopt.3: Clarify behaviour Alejandro Colomar
2021-04-30 20:37 ` Alejandro Colomar (man-pages)
2021-05-01 19:41   ` [BUG 212887] [PATCH v2] " Alejandro Colomar
2021-05-01 20:53     ` Alejandro Colomar (man-pages)

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.