linux-kernel.vger.kernel.org archive mirror
* [PATCH v5] close_range.2: new page documenting close_range(2)
@ 2020-12-21 19:46 Stephen Kitt
  2020-12-22 20:36 ` Michael Kerrisk (man-pages)
  0 siblings, 1 reply; 3+ messages in thread
From: Stephen Kitt @ 2020-12-21 19:46 UTC (permalink / raw)
  To: linux-man, Alejandro Colomar, Michael Kerrisk
  Cc: Christian Brauner, Giuseppe Scrivano, linux-kernel, Stephen Kitt

This documents close_range(2) based on information in
278a5fbaed89dacd04e9d052f4594ffd0e0585de,
60997c3d45d9a67daf01c56d805ae4fec37e0bd8, and
582f1fb6b721facf04848d2ca57f34468da1813e.

Signed-off-by: Stephen Kitt <steve@sk2.org>
---
V5: clarification of the open/close_range/execve sequence

V4: sort flags alphabetically
    move commit references inside the corresponding section
    more semantic newlines
    unformat numeric constants
    more formatting for function references
    escape C backslashes
    C99 loop indices

V3: fix synopsis overflow
    copy notes from membarrier.2 re the lack of wrapper
    semantic newlines
    drop non-standard "USE CASES" section heading
    add code example

V2: unsigned int to match the kernel declarations
    groff and grammar tweaks
    CLOSE_RANGE_UNSHARE unshares *and* closes
    Explain that EMFILE and ENOMEM can occur with C_R_U
    "Conforming to" phrasing
    Detailed explanation of CLOSE_RANGE_UNSHARE
    Reading /proc isn't common

 man2/close_range.2 | 267 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 267 insertions(+)
 create mode 100644 man2/close_range.2

diff --git a/man2/close_range.2 b/man2/close_range.2
new file mode 100644
index 000000000..0677a9bf9
--- /dev/null
+++ b/man2/close_range.2
@@ -0,0 +1,267 @@
+.\" Copyright (c) 2020 Stephen Kitt <steve@sk2.org>
+.\"
+.\" %%%LICENSE_START(VERBATIM)
+.\" Permission is granted to make and distribute verbatim copies of this
+.\" manual provided the copyright notice and this permission notice are
+.\" preserved on all copies.
+.\"
+.\" Permission is granted to copy and distribute modified versions of this
+.\" manual under the conditions for verbatim copying, provided that the
+.\" entire resulting derived work is distributed under the terms of a
+.\" permission notice identical to this one.
+.\"
+.\" Since the Linux kernel and libraries are constantly changing, this
+.\" manual page may be incorrect or out-of-date.  The author(s) assume no
+.\" responsibility for errors or omissions, or for damages resulting from
+.\" the use of the information contained herein.  The author(s) may not
+.\" have taken the same level of care in the production of this manual,
+.\" which is licensed free of charge, as they might when working
+.\" professionally.
+.\"
+.\" Formatted or processed versions of this manual, if unaccompanied by
+.\" the source, must acknowledge the copyright and authors of this work.
+.\" %%%LICENSE_END
+.\"
+.TH CLOSE_RANGE 2 2020-12-08 "Linux" "Linux Programmer's Manual"
+.SH NAME
+close_range \- close all file descriptors in a given range
+.SH SYNOPSIS
+.nf
+.B #include <linux/close_range.h>
+.PP
+.BI "int close_range(unsigned int " first ", unsigned int " last ,
+.BI "                unsigned int " flags );
+.fi
+.PP
+.IR Note :
+There is no glibc wrapper for this system call; see NOTES.
+.SH DESCRIPTION
+The
+.BR close_range ()
+system call closes all open file descriptors from
+.I first
+to
+.I last
+(included).
+.PP
+Errors closing a given file descriptor are currently ignored.
+.PP
+.I flags
+can be 0 or set to one or both of the following:
+.TP
+.BR CLOSE_RANGE_CLOEXEC " (since Linux 5.10)"
+sets the close-on-exec bit instead of
+immediately closing the file descriptors.
+.TP
+.B CLOSE_RANGE_UNSHARE
+unshares the range of file descriptors from any other processes,
+before closing them,
+avoiding races with other threads sharing the file descriptor table.
+.SH RETURN VALUE
+On success,
+.BR close_range ()
+returns 0.
+On error, \-1 is returned and
+.I errno
+is set to indicate the cause of the error.
+.SH ERRORS
+.TP
+.B EINVAL
+.I flags
+is not valid, or
+.I first
+is greater than
+.IR last .
+.PP
+The following can occur with
+.B CLOSE_RANGE_UNSHARE
+(when constructing the new descriptor table):
+.TP
+.B EMFILE
+The per-process limit on the number of open file descriptors has been reached
+(see the description of
+.B RLIMIT_NOFILE
+in
+.BR getrlimit (2)).
+.TP
+.B ENOMEM
+Insufficient kernel memory was available.
+.SH VERSIONS
+.BR close_range ()
+first appeared in Linux 5.9.
+.SH CONFORMING TO
+.BR close_range ()
+is a nonstandard function that is also present on FreeBSD.
+.SH NOTES
+Glibc does not provide a wrapper for this system call; call it using
+.BR syscall (2).
+.SS Closing all open file descriptors
+.\" 278a5fbaed89dacd04e9d052f4594ffd0e0585de
+To avoid blindly closing file descriptors
+in the range of possible file descriptors,
+this is sometimes implemented (on Linux)
+by listing open file descriptors in
+.I /proc/self/fd/
+and calling
+.BR close (2)
+on each one.
+.BR close_range ()
+can take care of this without requiring
+.I /proc
+and within a single system call,
+which provides significant performance benefits.
+.SS Closing file descriptors before exec
+.\" 60997c3d45d9a67daf01c56d805ae4fec37e0bd8
+File descriptors can be closed safely using
+.PP
+.in +4n
+.EX
+/* we don't want anything past stderr here */
+close_range(3, ~0U, CLOSE_RANGE_UNSHARE);
+execve(....);
+.EE
+.in
+.PP
+.B CLOSE_RANGE_UNSHARE
+is conceptually equivalent to
+.PP
+.in +4n
+.EX
+unshare(CLONE_FILES);
+close_range(first, last, 0);
+.EE
+.in
+.PP
+but can be more efficient:
+if the unshared range extends past
+the current maximum number of file descriptors allocated
+in the caller's file descriptor table
+(the common case when
+.I last
+is ~0U),
+the kernel will unshare a new file descriptor table for the caller up to
+.IR first .
+This avoids subsequent close calls entirely;
+the whole operation is complete once the table is unshared.
+.SS Closing files on \fBexec\fP
+.\" 582f1fb6b721facf04848d2ca57f34468da1813e
+This is particularly useful in cases where multiple
+.RB pre- exec
+setup steps risk conflicting with each other.
+For example, setting up a
+.BR seccomp (2)
+profile can conflict with a
+.BR close_range ()
+call:
+if the file descriptors are closed before the
+.BR seccomp (2)
+profile is set up,
+the profile setup can't use them itself,
+or control their closure;
+if the file descriptors are closed afterwards,
+the seccomp profile can't block the
+.BR close_range ()
+call or any fallbacks.
+Using
+.B CLOSE_RANGE_CLOEXEC
+avoids this:
+the descriptors can be marked before the
+.BR seccomp (2)
+profile is set up,
+and the profile can control access to
+.BR close_range ()
+without affecting the calling process.
+.SH EXAMPLES
+The following program is designed to be execed by the second program
+below.
+It lists its open file descriptors:
+.PP
+.in +4n
+.EX
+/* listopen.c */
+
+#include <stdio.h>
+#include <sys/stat.h>
+
+int
+main(int argc, char *argv[])
+{
+    struct stat buf;
+
+    for (int i = 0; i < 100; i++) {
+        if (!fstat(i, &buf))
+            printf("FD %d is open.\en", i);
+    }
+
+    exit(EXIT_SUCCESS);
+}
+.EE
+.in
+.PP
+This program executes the command given on its command-line,
+after opening the files listed after the command
+and then using
+.BR close_range ()
+to close them:
+.PP
+.in +4n
+.EX
+/* close_range.c */
+
+#include <fcntl.h>
+#include <linux/close_range.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <sys/stat.h>
+#include <sys/syscall.h>
+#include <sys/types.h>
+#include <unistd.h>
+
+int
+main(int argc, char *argv[])
+{
+    char *newargv[] = { NULL };
+    char *newenviron[] = { NULL };
+
+    if (argc < 3) {
+        fprintf(stderr, "Usage: %s <command-to-run> <files-to-open>\en", argv[0]);
+        exit(EXIT_FAILURE);
+    }
+
+    for (int i = 2; i < argc; i++) {
+        if (open(argv[i], O_RDONLY) == -1) {
+            perror(argv[i]);
+            exit(EXIT_FAILURE);
+        }
+    }
+
+    if (syscall(__NR_close_range, 3, ~0U, CLOSE_RANGE_UNSHARE) == -1) {
+        perror("close_range");
+        exit(EXIT_FAILURE);
+    }
+
+    execve(argv[1], newargv, newenviron);
+    perror("execve");
+    exit(EXIT_FAILURE);
+}
+.EE
+.in
+.PP
+We can use the second program to exec the first as follows:
+.PP
+.in +4n
+.EX
+.RB "$" " make listopen close_range"
+.RB "$" " ./close_range ./listopen /dev/null /dev/zero"
+FD 0 is open.
+FD 1 is open.
+FD 2 is open.
+.EE
+.in
+.PP
+Removing the call to
+.BR close_range ()
+will show different output,
+with the file descriptors for the named files still open.
+.SH SEE ALSO
+.BR close (2)

base-commit: b5dae3959625f5ff378e9edf9139057d1c06bb55
-- 
2.20.1



* Re: [PATCH v5] close_range.2: new page documenting close_range(2)
  2020-12-21 19:46 [PATCH v5] close_range.2: new page documenting close_range(2) Stephen Kitt
@ 2020-12-22 20:36 ` Michael Kerrisk (man-pages)
  2020-12-23 19:33   ` Stephen Kitt
  0 siblings, 1 reply; 3+ messages in thread
From: Michael Kerrisk (man-pages) @ 2020-12-22 20:36 UTC (permalink / raw)
  To: Stephen Kitt, linux-man, Alejandro Colomar
  Cc: mtk.manpages, Christian Brauner, Giuseppe Scrivano, linux-kernel

Hello Stephen,

Thank you for your revisions! I still have a few comments.

On 12/21/20 8:46 PM, Stephen Kitt wrote:
> This documents close_range(2) based on information in
> 278a5fbaed89dacd04e9d052f4594ffd0e0585de,
> 60997c3d45d9a67daf01c56d805ae4fec37e0bd8, and
> 582f1fb6b721facf04848d2ca57f34468da1813e.
> 
> Signed-off-by: Stephen Kitt <steve@sk2.org>
> ---
> V5: clarification of the open/close_range/execve sequence
> 
> V4: sort flags alphabetically
>     move commit references inside the corresponding section
>     more semantic newlines
>     unformat numeric constants
>     more formatting for function references
>     escape C backslashes
>     C99 loop indices
> 
> V3: fix synopsis overflow
>     copy notes from membarrier.2 re the lack of wrapper
>     semantic newlines
>     drop non-standard "USE CASES" section heading
>     add code example
> 
> V2: unsigned int to match the kernel declarations
>     groff and grammar tweaks
>     CLOSE_RANGE_UNSHARE unshares *and* closes
>     Explain that EMFILE and ENOMEM can occur with C_R_U
>     "Conforming to" phrasing
>     Detailed explanation of CLOSE_RANGE_UNSHARE
>     Reading /proc isn't common
> 
>  man2/close_range.2 | 267 +++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 267 insertions(+)
>  create mode 100644 man2/close_range.2
> 
> diff --git a/man2/close_range.2 b/man2/close_range.2
> new file mode 100644
> index 000000000..0677a9bf9
> --- /dev/null
> +++ b/man2/close_range.2
> @@ -0,0 +1,267 @@
> +.\" Copyright (c) 2020 Stephen Kitt <steve@sk2.org>
> +.\"
> +.\" %%%LICENSE_START(VERBATIM)
> +.\" Permission is granted to make and distribute verbatim copies of this
> +.\" manual provided the copyright notice and this permission notice are
> +.\" preserved on all copies.
> +.\"
> +.\" Permission is granted to copy and distribute modified versions of this
> +.\" manual under the conditions for verbatim copying, provided that the
> +.\" entire resulting derived work is distributed under the terms of a
> +.\" permission notice identical to this one.
> +.\"
> +.\" Since the Linux kernel and libraries are constantly changing, this
> +.\" manual page may be incorrect or out-of-date.  The author(s) assume no
> +.\" responsibility for errors or omissions, or for damages resulting from
> +.\" the use of the information contained herein.  The author(s) may not
> +.\" have taken the same level of care in the production of this manual,
> +.\" which is licensed free of charge, as they might when working
> +.\" professionally.
> +.\"
> +.\" Formatted or processed versions of this manual, if unaccompanied by
> +.\" the source, must acknowledge the copyright and authors of this work.
> +.\" %%%LICENSE_END
> +.\"
> +.TH CLOSE_RANGE 2 2020-12-08 "Linux" "Linux Programmer's Manual"
> +.SH NAME
> +close_range \- close all file descriptors in a given range
> +.SH SYNOPSIS
> +.nf
> +.B #include <linux/close_range.h>
> +.PP
> +.BI "int close_range(unsigned int " first ", unsigned int " last ,
> +.BI "                unsigned int " flags );
> +.fi
> +.PP
> +.IR Note :
> +There is no glibc wrapper for this system call; see NOTES.
> +.SH DESCRIPTION
> +The
> +.BR close_range ()
> +system call closes all open file descriptors from
> +.I first
> +to
> +.I last
> +(included).
> +.PP
> +Errors closing a given file descriptor are currently ignored.
> +.PP
> +.I flags
> +can be 0 or set to one or both of the following:

Better, I think:
"flags is a bit mask containing 0 or more of the following:"

> +.TP
> +.BR CLOSE_RANGE_CLOEXEC " (since Linux 5.10)"

s/5.10/5.11/ ?

> +sets the close-on-exec bit instead of

s/close-on-exec bit/file descriptor's close-on-exec flag/

> +immediately closing the file descriptors.
> +.TP
> +.B CLOSE_RANGE_UNSHARE
> +unshares the range of file descriptors from any other processes,
> +before closing them,
> +avoiding races with other threads sharing the file descriptor table.
> +.SH RETURN VALUE
> +On success,
> +.BR close_range ()
> +returns 0.
> +On error, \-1 is returned and
> +.I errno
> +is set to indicate the cause of the error.
> +.SH ERRORS
> +.TP
> +.B EINVAL
> +.I flags
> +is not valid, or
> +.I first
> +is greater than
> +.IR last .
> +.PP
> +The following can occur with
> +.B CLOSE_RANGE_UNSHARE
> +(when constructing the new descriptor table):
> +.TP
> +.B EMFILE
> +The per-process limit on the number of open file descriptors has been reached
> +(see the description of
> +.B RLIMIT_NOFILE
> +in
> +.BR getrlimit (2)).
> +.TP
> +.B ENOMEM
> +Insufficient kernel memory was available.
> +.SH VERSIONS
> +.BR close_range ()
> +first appeared in Linux 5.9.
> +.SH CONFORMING TO
> +.BR close_range ()
> +is a nonstandard function that is also present on FreeBSD.
> +.SH NOTES
> +Glibc does not provide a wrapper for this system call; call it using
> +.BR syscall (2).
> +.SS Closing all open file descriptors
> +.\" 278a5fbaed89dacd04e9d052f4594ffd0e0585de
> +To avoid blindly closing file descriptors
> +in the range of possible file descriptors,
> +this is sometimes implemented (on Linux)
> +by listing open file descriptors in
> +.I /proc/self/fd/
> +and calling
> +.BR close (2)
> +on each one.
> +.BR close_range ()
> +can take care of this without requiring
> +.I /proc
> +and within a single system call,
> +which provides significant performance benefits.
> +.SS Closing file descriptors before exec
> +.\" 60997c3d45d9a67daf01c56d805ae4fec37e0bd8
> +File descriptors can be closed safely using
> +.PP
> +.in +4n
> +.EX
> +/* we don't want anything past stderr here */
> +close_range(3, ~0U, CLOSE_RANGE_UNSHARE);
> +execve(....);
> +.EE
> +.in
> +.PP
> +.B CLOSE_RANGE_UNSHARE
> +is conceptually equivalent to
> +.PP
> +.in +4n
> +.EX
> +unshare(CLONE_FILES);
> +close_range(first, last, 0);
> +.EE
> +.in
> +.PP
> +but can be more efficient:
> +if the unshared range extends past
> +the current maximum number of file descriptors allocated
> +in the caller's file descriptor table
> +(the common case when
> +.I last
> +is ~0U),
> +the kernel will unshare a new file descriptor table for the caller up to
> +.IR first .
> +This avoids subsequent close calls entirely;

s/close/.BR close (2)/

> +the whole operation is complete once the table is unshared.
> +.SS Closing files on \fBexec\fP
> +.\" 582f1fb6b721facf04848d2ca57f34468da1813e
> +This is particularly useful in cases where multiple
> +.RB pre- exec
> +setup steps risk conflicting with each other.
> +For example, setting up a
> +.BR seccomp (2)
> +profile can conflict with a
> +.BR close_range ()
> +call:
> +if the file descriptors are closed before the
> +.BR seccomp (2)
> +profile is set up,
> +the profile setup can't use them itself,
> +or control their closure;
> +if the file descriptors are closed afterwards,
> +the seccomp profile can't block the
> +.BR close_range ()
> +call or any fallbacks.
> +Using
> +.B CLOSE_RANGE_CLOEXEC
> +avoids this:
> +the descriptors can be marked before the
> +.BR seccomp (2)
> +profile is set up,
> +and the profile can control access to
> +.BR close_range ()
> +without affecting the calling process.
> +.SH EXAMPLES
> +The following program is designed to be execed by the second program
> +below.

I have some specific comments below, but a more general comment
to start with: why use two programs here? It seems to add complexity
without demonstrating anything that couldn't also be demonstrated
with a simpler single program, or have I missed something?
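
As a rough sketch of what a single-program version could look like
(an illustration only, not taken from the patch: the helper names
list_open() and close_range_demo() are made up here, and the fallback
syscall number 436 is an assumption that holds for x86-64 and the
generic syscall table when the installed headers predate Linux 5.9):

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Assumption: 436 is the close_range syscall number on x86-64 and on
   architectures using the generic syscall table. Only used when the
   installed headers do not define it. */
#ifndef __NR_close_range
#define __NR_close_range 436
#endif

/* Hypothetical helper: print and count the open descriptors in
   the inclusive range [lo, hi]. */
int
list_open(int lo, int hi)
{
    struct stat buf;
    int count = 0;

    for (int i = lo; i <= hi; i++) {
        if (fstat(i, &buf) == 0) {
            printf("FD %d is open.\n", i);
            count++;
        }
    }
    return count;
}

/* Hypothetical demo: open /dev/null twice, then close both descriptors
   (and any descriptor numbered between them) with one close_range()
   call. Returns the number of descriptors still open in that range
   afterwards, or -1 with errno set if open() or close_range() fails. */
int
close_range_demo(void)
{
    int saved;
    int first = open("/dev/null", O_RDONLY);

    if (first == -1)
        return -1;

    /* open() returns the lowest free descriptor, so last > first. */
    int last = open("/dev/null", O_RDONLY);

    if (last == -1) {
        saved = errno;
        close(first);
        errno = saved;
        return -1;
    }

    list_open(first, last);

    if (syscall(__NR_close_range, first, last, 0U) == -1) {
        saved = errno;
        close(first);
        close(last);
        errno = saved;
        return -1;
    }

    return list_open(first, last);
}
```

Calling close_range_demo() from main() should list the two new
descriptors before the call and report none of them afterwards;
this variant has no execve(2) step, so it would not show that the
descriptors are absent in the executed program.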

> +It lists its open file descriptors:
> +.PP
> +.in +4n
> +.EX
> +/* listopen.c */
> +
> +#include <stdio.h>
> +#include <sys/stat.h>
> +
> +int
> +main(int argc, char *argv[])
> +{
> +    struct stat buf;
> +
> +    for (int i = 0; i < 100; i++) {
> +        if (!fstat(i, &buf))

I kind of prefer "fstat(...) == 0"

> +            printf("FD %d is open.\en", i);
> +    }
> +
> +    exit(EXIT_SUCCESS);
> +}
> +.EE
> +.in
> +.PP
> +This program executes the command given on its command-line,
> +after opening the files listed after the command
> +and then using
> +.BR close_range ()
> +to close them:
> +.PP
> +.in +4n
> +.EX
> +/* close_range.c */
> +
> +#include <fcntl.h>
> +#include <linux/close_range.h>
> +#include <stdio.h>
> +#include <stdlib.h>
> +#include <sys/stat.h>
> +#include <sys/syscall.h>
> +#include <sys/types.h>
> +#include <unistd.h>
> +
> +int
> +main(int argc, char *argv[])
> +{
> +    char *newargv[] = { NULL };
> +    char *newenviron[] = { NULL };
> +
> +    if (argc < 3) {
> +        fprintf(stderr, "Usage: %s <command-to-run> <files-to-open>\en", argv[0]);

Line too long. Please break it up so that it renders well on
an 80-column terminal.

Or, alternatively: 

        fprintf(stderr, "Usage: %s <command> <file>...\en", argv[0]);

> +        exit(EXIT_FAILURE);
> +    }
> +
> +    for (int i = 2; i < argc; i++) {
> +        if (open(argv[i], O_RDONLY) == -1) {
> +            perror(argv[i]);
> +            exit(EXIT_FAILURE);
> +        }
> +    }
> +
> +    if (syscall(__NR_close_range, 3, ~0U, CLOSE_RANGE_UNSHARE) == -1) {

Line too long.

Alternatively, what about s/CLOSE_RANGE_UNSHARE/0/? Or is it
considered best practice to always use CLOSE_RANGE_UNSHARE?

> +        perror("close_range");
> +        exit(EXIT_FAILURE);
> +    }
> +
> +    execve(argv[1], newargv, newenviron);
> +    perror("execve");
> +    exit(EXIT_FAILURE);
> +}
> +.EE
> +.in
> +.PP
> +We can use the second program to exec the first as follows:
> +.PP
> +.in +4n
> +.EX
> +.RB "$" " make listopen close_range"

Perhaps we don't really need the preceding line?

> +.RB "$" " ./close_range ./listopen /dev/null /dev/zero"
> +FD 0 is open.
> +FD 1 is open.
> +FD 2 is open.
> +.EE
> +.in
> +.PP
> +Removing the call to
> +.BR close_range ()
> +will show different output,
> +with the file descriptors for the named files still open.
> +.SH SEE ALSO
> +.BR close (2)
> 
> base-commit: b5dae3959625f5ff378e9edf9139057d1c06bb55

Thanks,

Michael


-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/


* Re: [PATCH v5] close_range.2: new page documenting close_range(2)
  2020-12-22 20:36 ` Michael Kerrisk (man-pages)
@ 2020-12-23 19:33   ` Stephen Kitt
  0 siblings, 0 replies; 3+ messages in thread
From: Stephen Kitt @ 2020-12-23 19:33 UTC (permalink / raw)
  To: Michael Kerrisk (man-pages)
  Cc: linux-man, Alejandro Colomar, Christian Brauner,
	Giuseppe Scrivano, linux-kernel


Hi Michael,

On Tue, 22 Dec 2020 21:36:28 +0100, "Michael Kerrisk (man-pages)"
<mtk.manpages@gmail.com> wrote:
> On 12/21/20 8:46 PM, Stephen Kitt wrote:
[...]
> > +Errors closing a given file descriptor are currently ignored.
> > +.PP
> > +.I flags
> > +can be 0 or set to one or both of the following:  
> 
> Better, I think:
> "flags is a bit mask containing 0 or more of the following:"

Indeed, thanks!

> > +.TP
> > +.BR CLOSE_RANGE_CLOEXEC " (since Linux 5.10)"  
> 
> s/5.10/5.11/ ?

Oops, yes, 5.11.

> > +sets the close-on-exec bit instead of  
> 
> s/close-on-exec bit/file descriptor's close-on-exec flag/

Noted.
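
For what it's worth, the effect on the file descriptor's close-on-exec
flag can be checked with fcntl(2). The following is only a sketch: the
function name cloexec_demo() is hypothetical, the flag value is copied
from <linux/close_range.h> in case older headers lack it, and the
fallback syscall number 436 is an assumption (x86-64 and the generic
syscall table).

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Assumption: 436 is the close_range syscall number (x86-64 and the
   generic syscall table); only used if the headers do not define it. */
#ifndef __NR_close_range
#define __NR_close_range 436
#endif

/* Same value as in <linux/close_range.h>; defined here so that the
   sketch also builds against older kernel headers. */
#ifndef CLOSE_RANGE_CLOEXEC
#define CLOSE_RANGE_CLOEXEC (1U << 2)
#endif

/* Hypothetical demo: open a descriptor without O_CLOEXEC, mark it with
   CLOSE_RANGE_CLOEXEC, and report whether FD_CLOEXEC is now set.
   Returns 1 if the flag is set, 0 if it is not, or -1 with errno set
   on failure (for example EINVAL on kernels without the flag). */
int
cloexec_demo(void)
{
    int flags, saved;
    int fd = open("/dev/null", O_RDONLY);

    if (fd == -1)
        return -1;

    if (syscall(__NR_close_range, fd, fd, CLOSE_RANGE_CLOEXEC) == -1) {
        saved = errno;
        close(fd);
        errno = saved;
        return -1;
    }

    /* The descriptor is still open; only its flag has changed. */
    flags = fcntl(fd, F_GETFD);
    saved = errno;
    close(fd);
    errno = saved;

    if (flags == -1)
        return -1;
    return (flags & FD_CLOEXEC) != 0;
}
```

The point of the check is that the descriptor stays usable in the
calling process after the call and is only closed by a later
execve(2).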

> > +immediately closing the file descriptors.
> > +.TP
> > +.B CLOSE_RANGE_UNSHARE
> > +unshares the range of file descriptors from any other processes,
> > +before closing them,
> > +avoiding races with other threads sharing the file descriptor table.
> > +.SH RETURN VALUE
> > +On success,
> > +.BR close_range ()
> > +returns 0.
> > +On error, \-1 is returned and
> > +.I errno
> > +is set to indicate the cause of the error.
> > +.SH ERRORS
> > +.TP
> > +.B EINVAL
> > +.I flags
> > +is not valid, or
> > +.I first
> > +is greater than
> > +.IR last .
> > +.PP
> > +The following can occur with
> > +.B CLOSE_RANGE_UNSHARE
> > +(when constructing the new descriptor table):
> > +.TP
> > +.B EMFILE
> > +The per-process limit on the number of open file descriptors has been reached
> > +(see the description of
> > +.B RLIMIT_NOFILE
> > +in
> > +.BR getrlimit (2)).
> > +.TP
> > +.B ENOMEM
> > +Insufficient kernel memory was available.
> > +.SH VERSIONS
> > +.BR close_range ()
> > +first appeared in Linux 5.9.
> > +.SH CONFORMING TO
> > +.BR close_range ()
> > +is a nonstandard function that is also present on FreeBSD.
> > +.SH NOTES
> > +Glibc does not provide a wrapper for this system call; call it using
> > +.BR syscall (2).
> > +.SS Closing all open file descriptors
> > +.\" 278a5fbaed89dacd04e9d052f4594ffd0e0585de
> > +To avoid blindly closing file descriptors
> > +in the range of possible file descriptors,
> > +this is sometimes implemented (on Linux)
> > +by listing open file descriptors in
> > +.I /proc/self/fd/
> > +and calling
> > +.BR close (2)
> > +on each one.
> > +.BR close_range ()
> > +can take care of this without requiring
> > +.I /proc
> > +and within a single system call,
> > +which provides significant performance benefits.
> > +.SS Closing file descriptors before exec
> > +.\" 60997c3d45d9a67daf01c56d805ae4fec37e0bd8
> > +File descriptors can be closed safely using
> > +.PP
> > +.in +4n
> > +.EX
> > +/* we don't want anything past stderr here */
> > +close_range(3, ~0U, CLOSE_RANGE_UNSHARE);
> > +execve(....);
> > +.EE
> > +.in
> > +.PP
> > +.B CLOSE_RANGE_UNSHARE
> > +is conceptually equivalent to
> > +.PP
> > +.in +4n
> > +.EX
> > +unshare(CLONE_FILES);
> > +close_range(first, last, 0);
> > +.EE
> > +.in
> > +.PP
> > +but can be more efficient:
> > +if the unshared range extends past
> > +the current maximum number of file descriptors allocated
> > +in the caller's file descriptor table
> > +(the common case when
> > +.I last
> > +is ~0U),
> > +the kernel will unshare a new file descriptor table for the caller up to
> > +.IR first .
> > +This avoids subsequent close calls entirely;  
> 
> s/close/.BR close (2)/

Noted.

> > +the whole operation is complete once the table is unshared.
> > +.SS Closing files on \fBexec\fP
> > +.\" 582f1fb6b721facf04848d2ca57f34468da1813e
> > +This is particularly useful in cases where multiple
> > +.RB pre- exec
> > +setup steps risk conflicting with each other.
> > +For example, setting up a
> > +.BR seccomp (2)
> > +profile can conflict with a
> > +.BR close_range ()
> > +call:
> > +if the file descriptors are closed before the
> > +.BR seccomp (2)
> > +profile is set up,
> > +the profile setup can't use them itself,
> > +or control their closure;
> > +if the file descriptors are closed afterwards,
> > +the seccomp profile can't block the
> > +.BR close_range ()
> > +call or any fallbacks.
> > +Using
> > +.B CLOSE_RANGE_CLOEXEC
> > +avoids this:
> > +the descriptors can be marked before the
> > +.BR seccomp (2)
> > +profile is set up,
> > +and the profile can control access to
> > +.BR close_range ()
> > +without affecting the calling process.
> > +.SH EXAMPLES
> > +The following program is designed to be execed by the second program
> > +below.  
> 
> I have some specific comments below, but a more general comment
> to start with: why use two programs here? It seems to add complexity
> without demonstrating anything that couldn't also be demonstrated
> with a simpler single program, or have I missed something?

I based the example on the test code in the kernel and the examples from
execve(2), since close_range(2) is mostly useful in preparation for an
execve(2) call.
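
Since the page tells callers to go through syscall(2), code preparing
for an execve(2) on kernels older than 5.9 would presumably want a
fallback. A minimal sketch of such a wrapper follows; the name
close_range_compat(), the 1024 cap used when sysconf(3) cannot report
a limit, and the fallback syscall number 436 (x86-64 and the generic
syscall table) are all assumptions made for illustration:

```c
#include <errno.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Assumption: 436 is the close_range syscall number (x86-64 and the
   generic syscall table); only used if the headers do not define it. */
#ifndef __NR_close_range
#define __NR_close_range 436
#endif

/* Hypothetical wrapper: close descriptors first..last (inclusive),
   using close_range() where the kernel provides it and a close(2)
   loop otherwise. Returns 0 on success, -1 with errno set on error. */
int
close_range_compat(unsigned int first, unsigned int last)
{
    long open_max;
    unsigned long end;

    if (first > last) {
        errno = EINVAL;
        return -1;
    }

    if (syscall(__NR_close_range, first, last, 0U) == 0)
        return 0;
    if (errno != ENOSYS)
        return -1;

    /* Fallback for kernels without close_range(): no descriptor can
       be numbered at or above the open-file limit, so stop there. */
    open_max = sysconf(_SC_OPEN_MAX);
    if (open_max <= 0)
        open_max = 1024;    /* assumption: arbitrary cap */

    end = last;
    if (end >= (unsigned long) open_max)
        end = (unsigned long) open_max - 1;

    for (unsigned long fd = first; fd <= end; fd++)
        close((int) fd);    /* errors ignored, as close_range() does */

    return 0;
}
```

A caller would then use close_range_compat(3, ~0U) just before
execve(2). The fallback loop is the "blind" close the NOTES section
describes, so it can be slow when the descriptor limit is large.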

> > +It lists its open file descriptors:
> > +.PP
> > +.in +4n
> > +.EX
> > +/* listopen.c */
> > +
> > +#include <stdio.h>
> > +#include <sys/stat.h>
> > +
> > +int
> > +main(int argc, char *argv[])
> > +{
> > +    struct stat buf;
> > +
> > +    for (int i = 0; i < 100; i++) {
> > +        if (!fstat(i, &buf))  
> 
> I kind of prefer "fstat(...) == 0"

Ah yes, that makes sense.

> > +            printf("FD %d is open.\en", i);
> > +    }
> > +
> > +    exit(EXIT_SUCCESS);
> > +}
> > +.EE
> > +.in
> > +.PP
> > +This program executes the command given on its command-line,
> > +after opening the files listed after the command
> > +and then using
> > +.BR close_range ()
> > +to close them:
> > +.PP
> > +.in +4n
> > +.EX
> > +/* close_range.c */
> > +
> > +#include <fcntl.h>
> > +#include <linux/close_range.h>
> > +#include <stdio.h>
> > +#include <stdlib.h>
> > +#include <sys/stat.h>
> > +#include <sys/syscall.h>
> > +#include <sys/types.h>
> > +#include <unistd.h>
> > +
> > +int
> > +main(int argc, char *argv[])
> > +{
> > +    char *newargv[] = { NULL };
> > +    char *newenviron[] = { NULL };
> > +
> > +    if (argc < 3) {
> > +        fprintf(stderr, "Usage: %s <command-to-run> <files-to-open>\en", argv[0]);
> 
> Line too long. Please break it up so that it renders well on
> an 80-column terminal.
> 
> Or, alternatively: 
> 
>         fprintf(stderr, "Usage: %s <command> <file>...\en", argv[0]);

Noted.

> > +        exit(EXIT_FAILURE);
> > +    }
> > +
> > +    for (int i = 2; i < argc; i++) {
> > +        if (open(argv[i], O_RDONLY) == -1) {
> > +            perror(argv[i]);
> > +            exit(EXIT_FAILURE);
> > +        }
> > +    }
> > +
> > +    if (syscall(__NR_close_range, 3, ~0U, CLOSE_RANGE_UNSHARE) == -1) {  
> 
> Line too long.
> 
> Alternatively, what about s/CLOSE_RANGE_UNSHARE/0/? Or is it
> considered best practice to always use CLOSE_RANGE_UNSHARE?

In this particular context it’s not required, but I would argue that it’s
better to use _UNSHARE in general — it avoids the risk of forgetting to add
it in long-lived code which ends up using threads at some point...
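
To illustrate the difference, here is a sketch that uses a clone(2)
child sharing the file descriptor table as a stand-in for a thread.
The names closer() and fd_survives() are hypothetical, the flag value
is copied from <linux/close_range.h>, and the fallback syscall number
436 is an assumption (x86-64 and the generic syscall table):

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/syscall.h>
#include <sys/wait.h>
#include <unistd.h>

/* Assumption: 436 is the close_range syscall number (x86-64 and the
   generic syscall table); only used if the headers do not define it. */
#ifndef __NR_close_range
#define __NR_close_range 436
#endif

/* Same value as in <linux/close_range.h>. */
#ifndef CLOSE_RANGE_UNSHARE
#define CLOSE_RANGE_UNSHARE (1U << 1)
#endif

struct closer_args {
    int fd;
    unsigned int flags;
};

/* Runs in a child that shares the parent's file descriptor table.
   The return value becomes the child's exit status: 0 on success. */
int
closer(void *arg)
{
    struct closer_args *a = arg;

    return syscall(__NR_close_range, a->fd, a->fd, a->flags) == -1;
}

/* Hypothetical demo: returns 1 if the parent's descriptor is still
   open after the child closed it with the given flags, 0 if it was
   closed in the parent too, or -1 on any failure. */
int
fd_survives(unsigned int flags)
{
    const size_t stack_size = 64 * 1024;
    struct closer_args args;
    char *stack;
    int status, open_after;
    pid_t pid;

    args.flags = flags;
    args.fd = open("/dev/null", O_RDONLY);
    if (args.fd == -1)
        return -1;

    stack = malloc(stack_size);
    if (stack == NULL) {
        close(args.fd);
        return -1;
    }

    /* CLONE_FILES without CLONE_VM: shared descriptor table, separate
       memory. Pass the top of the stack, since the stack grows
       downward on most architectures. */
    pid = clone(closer, stack + stack_size, CLONE_FILES | SIGCHLD, &args);
    if (pid == -1 || waitpid(pid, &status, 0) == -1 ||
            !WIFEXITED(status) || WEXITSTATUS(status) != 0) {
        close(args.fd);
        free(stack);
        return -1;
    }

    open_after = fcntl(args.fd, F_GETFD) != -1;
    if (open_after)
        close(args.fd);
    free(stack);
    return open_after;
}
```

On a kernel with close_range(), fd_survives(CLOSE_RANGE_UNSHARE) is
expected to return 1 because the child closes the descriptor in its
own private copy of the table, whereas fd_survives(0) is expected to
return 0 because the close happens in the shared table.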

> > +        perror("close_range");
> > +        exit(EXIT_FAILURE);
> > +    }
> > +
> > +    execve(argv[1], newargv, newenviron);
> > +    perror("execve");
> > +    exit(EXIT_FAILURE);
> > +}
> > +.EE
> > +.in
> > +.PP
> > +We can use the second program to exec the first as follows:
> > +.PP
> > +.in +4n
> > +.EX
> > +.RB "$" " make listopen close_range"  
> 
> Perhaps we don't really need the preceding line?

I was following the examples in execve(2) which show how to build the
programs.

> > +.RB "$" " ./close_range ./listopen /dev/null /dev/zero"
> > +FD 0 is open.
> > +FD 1 is open.
> > +FD 2 is open.
> > +.EE
> > +.in
> > +.PP
> > +Removing the call to
> > +.BR close_range ()
> > +will show different output,
> > +with the file descriptors for the named files still open.
> > +.SH SEE ALSO
> > +.BR close (2)
> > 
> > base-commit: b5dae3959625f5ff378e9edf9139057d1c06bb55  

Thanks,

Stephen
