linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [patch] close_range.2: new page documenting close_range(2)
@ 2020-12-08 21:51 Stephen Kitt
  2020-12-09  8:50 ` Michael Kerrisk (man-pages)
  2020-12-09  9:58 ` Christian Brauner
  0 siblings, 2 replies; 14+ messages in thread
From: Stephen Kitt @ 2020-12-08 21:51 UTC (permalink / raw)
  To: linux-man, Alejandro Colomar, Michael Kerrisk
  Cc: Christian Brauner, linux-kernel, Stephen Kitt

This documents close_range(2) based on information in
278a5fbaed89dacd04e9d052f4594ffd0e0585de and
60997c3d45d9a67daf01c56d805ae4fec37e0bd8.

Signed-off-by: Stephen Kitt <steve@sk2.org>
---
 man2/close_range.2 | 112 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 112 insertions(+)
 create mode 100644 man2/close_range.2

diff --git a/man2/close_range.2 b/man2/close_range.2
new file mode 100644
index 000000000..62167d9b0
--- /dev/null
+++ b/man2/close_range.2
@@ -0,0 +1,112 @@
+.\" Copyright (c) 2020 Stephen Kitt <steve@sk2.org>
+.\"
+.\" %%%LICENSE_START(VERBATIM)
+.\" Permission is granted to make and distribute verbatim copies of this
+.\" manual provided the copyright notice and this permission notice are
+.\" preserved on all copies.
+.\"
+.\" Permission is granted to copy and distribute modified versions of this
+.\" manual under the conditions for verbatim copying, provided that the
+.\" entire resulting derived work is distributed under the terms of a
+.\" permission notice identical to this one.
+.\"
+.\" Since the Linux kernel and libraries are constantly changing, this
+.\" manual page may be incorrect or out-of-date.  The author(s) assume no
+.\" responsibility for errors or omissions, or for damages resulting from
+.\" the use of the information contained herein.  The author(s) may not
+.\" have taken the same level of care in the production of this manual,
+.\" which is licensed free of charge, as they might when working
+.\" professionally.
+.\"
+.\" Formatted or processed versions of this manual, if unaccompanied by
+.\" the source, must acknowledge the copyright and authors of this work.
+.\" %%%LICENSE_END
+.\"
+.TH CLOSE_RANGE 2 2020-12-08 "Linux" "Linux Programmer's Manual"
+.SH NAME
+close_range \- close all file descriptors in a given range
+.SH SYNOPSIS
+.nf
+.B #include <linux/close_range.h>
+.PP
+.BI "int close_range(int " first ", int " last ", unsigned int " flags );
+.fi
+.SH DESCRIPTION
+The
+.BR close_range ()
+system call closes all open file descriptors from
+.I first
+to
+.IR last
+(included).
+.PP
+Errors closing a given file descriptor are currently ignored.
+.PP
+.I flags
+can be set to
+.B CLOSE_RANGE_UNSHARE
+to unshare the range of file descriptors from any other processes,
+.I instead
+of closing them.
+.SH RETURN VALUE
+On success,
+.BR close_range ()
+return 0.
+On error, \-1 is returned and
+.I errno
+is set to indicate the cause of the error.
+.SH ERRORS
+.TP
+.B EINVAL
+.I flags
+is not valid, or
+.I first
+is greater than
+.IR last .
+.TP
+.B EMFILE
+The per-process limit on the number of open file descriptors has been reached
+(see the description of
+.BR RLIMIT_NOFILE
+in
+.BR getrlimit (2)).
+.TP
+.B ENOMEM
+Insufficient kernel memory was available.
+.SH VERSIONS
+.BR close_range ()
+first appeared in Linux 5.9.
+.SH CONFORMING TO
+.BR close_range ()
+is available on Linux and FreeBSD.
+.SH NOTES
+Currently, there is no glibc wrapper for this system call; call it using
+.BR syscall (2).
+.SH USE CASES
+.\" 278a5fbaed89dacd04e9d052f4594ffd0e0585de
+.\" 60997c3d45d9a67daf01c56d805ae4fec37e0bd8
+.SS Closing file descriptors before exec
+File descriptors can be closed safely using
+.PP
+.in +4n
+.EX
+/* we don't want anything past stderr here */
+close_range(3, ~0U, CLOSE_RANGE_UNSHARE);
+execve(....);
+.EE
+.in
+.PP
+.SS Closing all open file descriptors
+This is commonly implemented (on Linux) by listing open file
+descriptors in
+.B /proc/self/fd/
+and calling
+.BR close (2)
+on each one.
+.BR close_range ()
+can take care of this without requiring
+.B /proc
+and with a single system call, which provides significant performance
+benefits.
+.SH SEE ALSO
+.BR close (2)

base-commit: b5dae3959625f5ff378e9edf9139057d1c06bb55
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [patch] close_range.2: new page documenting close_range(2)
  2020-12-08 21:51 [patch] close_range.2: new page documenting close_range(2) Stephen Kitt
@ 2020-12-09  8:50 ` Michael Kerrisk (man-pages)
  2020-12-09  9:40   ` Christian Brauner
  2020-12-09  9:47   ` Alejandro Colomar (man-pages)
  2020-12-09  9:58 ` Christian Brauner
  1 sibling, 2 replies; 14+ messages in thread
From: Michael Kerrisk (man-pages) @ 2020-12-09  8:50 UTC (permalink / raw)
  To: Stephen Kitt; +Cc: linux-man, Alejandro Colomar, Christian Brauner, lkml

Hello Stephen

Thank you for writing this page! Some comments/questions below.

On Tue, 8 Dec 2020 at 22:51, Stephen Kitt <steve@sk2.org> wrote:
>
> This documents close_range(2) based on information in
> 278a5fbaed89dacd04e9d052f4594ffd0e0585de and
> 60997c3d45d9a67daf01c56d805ae4fec37e0bd8.

(Thanks for noting these commit IDs.)

> Signed-off-by: Stephen Kitt <steve@sk2.org>
> ---
>  man2/close_range.2 | 112 +++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 112 insertions(+)
>  create mode 100644 man2/close_range.2
>
> diff --git a/man2/close_range.2 b/man2/close_range.2
> new file mode 100644
> index 000000000..62167d9b0
> --- /dev/null
> +++ b/man2/close_range.2
> @@ -0,0 +1,112 @@
> +.\" Copyright (c) 2020 Stephen Kitt <steve@sk2.org>
> +.\"
> +.\" %%%LICENSE_START(VERBATIM)
> +.\" Permission is granted to make and distribute verbatim copies of this
> +.\" manual provided the copyright notice and this permission notice are
> +.\" preserved on all copies.
> +.\"
> +.\" Permission is granted to copy and distribute modified versions of this
> +.\" manual under the conditions for verbatim copying, provided that the
> +.\" entire resulting derived work is distributed under the terms of a
> +.\" permission notice identical to this one.
> +.\"
> +.\" Since the Linux kernel and libraries are constantly changing, this
> +.\" manual page may be incorrect or out-of-date.  The author(s) assume no
> +.\" responsibility for errors or omissions, or for damages resulting from
> +.\" the use of the information contained herein.  The author(s) may not
> +.\" have taken the same level of care in the production of this manual,
> +.\" which is licensed free of charge, as they might when working
> +.\" professionally.
> +.\"
> +.\" Formatted or processed versions of this manual, if unaccompanied by
> +.\" the source, must acknowledge the copyright and authors of this work.
> +.\" %%%LICENSE_END
> +.\"
> +.TH CLOSE_RANGE 2 2020-12-08 "Linux" "Linux Programmer's Manual"
> +.SH NAME
> +close_range \- close all file descriptors in a given range
> +.SH SYNOPSIS
> +.nf
> +.B #include <linux/close_range.h>
> +.PP
> +.BI "int close_range(int " first ", int " last ", unsigned int " flags );
> +.fi
> +.SH DESCRIPTION
> +The
> +.BR close_range ()
> +system call closes all open file descriptors from
> +.I first
> +to
> +.IR last
> +(included).
> +.PP
> +Errors closing a given file descriptor are currently ignored.
> +.PP
> +.I flags
> +can be set to
> +.B CLOSE_RANGE_UNSHARE
> +to unshare the range of file descriptors from any other processes,
> +.I instead
> +of closing them.

Really "instead of closing them"? I had supposed that rather that this
should be "before closing them". That's also how the kernel code reads
to me, from a quick glance.

> +.SH RETURN VALUE
> +On success,
> +.BR close_range ()
> +return 0.

s/return/returns/

> +On error, \-1 is returned and
> +.I errno
> +is set to indicate the cause of the error.
> +.SH ERRORS
> +.TP
> +.B EINVAL
> +.I flags
> +is not valid, or
> +.I first
> +is greater than
> +.IR last .
> +.TP
> +.B EMFILE
> +The per-process limit on the number of open file descriptors has been reached
> +(see the description of
> +.BR RLIMIT_NOFILE
> +in
> +.BR getrlimit (2)).

Given that we are simply closing FDs, how can EMFILE occur?

> +.TP
> +.B ENOMEM
> +Insufficient kernel memory was available.
> +.SH VERSIONS
> +.BR close_range ()
> +first appeared in Linux 5.9.
> +.SH CONFORMING TO
> +.BR close_range ()
> +is available on Linux and FreeBSD.

Here, I think it would be better to write:

close_range()
is a nonstandard function that is also present on FreeBSD.

> +.SH NOTES
> +Currently, there is no glibc wrapper for this system call; call it using
> +.BR syscall (2).
> +.SH USE CASES
> +.\" 278a5fbaed89dacd04e9d052f4594ffd0e0585de
> +.\" 60997c3d45d9a67daf01c56d805ae4fec37e0bd8
> +.SS Closing file descriptors before exec
> +File descriptors can be closed safely using
> +.PP
> +.in +4n
> +.EX
> +/* we don't want anything past stderr here */
> +close_range(3, ~0U, CLOSE_RANGE_UNSHARE);
> +execve(....);
> +.EE
> +.in
> +.PP

.PP is not necessary before a new subsection (.SS).

> +.SS Closing all open file descriptors
> +This is commonly implemented (on Linux) by listing open file

Is it really true that this is common? I suspect not. It's slow, and
relies on /proc being present. I would have thought that more common
is something like:

        int maxfd = sysconf(_SC_OPEN_MAX);
        if (maxfd == -1)                /* Limit is indeterminate... */
            maxfd = 16384;           /* so take a guess */

        for (fd = 0; fd < maxfd; fd++)
            close(fd);

I think it's fine to mention the use of a /proc as an (inferior and)
alternative way of doing this. I'm just not sure that "commonly" is
correct.

> +descriptors in
> +.B /proc/self/fd/
> +and calling
> +.BR close (2)
> +on each one.
> +.BR close_range ()
> +can take care of this without requiring
> +.B /proc
> +and with a single system call, which provides significant performance
> +benefits.
> +.SH SEE ALSO
> +.BR close (2)
>
> base-commit: b5dae3959625f5ff378e9edf9139057d1c06bb55
> --
> 2.20.1

Thanks,

Michael


-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [patch] close_range.2: new page documenting close_range(2)
  2020-12-09  8:50 ` Michael Kerrisk (man-pages)
@ 2020-12-09  9:40   ` Christian Brauner
  2020-12-09  9:43     ` Stephen Kitt
  2020-12-09  9:47   ` Alejandro Colomar (man-pages)
  1 sibling, 1 reply; 14+ messages in thread
From: Christian Brauner @ 2020-12-09  9:40 UTC (permalink / raw)
  To: Michael Kerrisk (man-pages)
  Cc: Stephen Kitt, linux-man, Alejandro Colomar, lkml

On Wed, Dec 09, 2020 at 09:50:38AM +0100, Michael Kerrisk (man-pages) wrote:
> Hello Stephen
> 
> Thank you for writing this page! Some comments/questions below.
> 
> On Tue, 8 Dec 2020 at 22:51, Stephen Kitt <steve@sk2.org> wrote:
> >
> > This documents close_range(2) based on information in
> > 278a5fbaed89dacd04e9d052f4594ffd0e0585de and
> > 60997c3d45d9a67daf01c56d805ae4fec37e0bd8.
> 
> (Thanks for noting these commit IDs.)
> 
> > Signed-off-by: Stephen Kitt <steve@sk2.org>
> > ---
> >  man2/close_range.2 | 112 +++++++++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 112 insertions(+)
> >  create mode 100644 man2/close_range.2
> >
> > diff --git a/man2/close_range.2 b/man2/close_range.2
> > new file mode 100644
> > index 000000000..62167d9b0
> > --- /dev/null
> > +++ b/man2/close_range.2
> > @@ -0,0 +1,112 @@
> > +.\" Copyright (c) 2020 Stephen Kitt <steve@sk2.org>
> > +.\"
> > +.\" %%%LICENSE_START(VERBATIM)
> > +.\" Permission is granted to make and distribute verbatim copies of this
> > +.\" manual provided the copyright notice and this permission notice are
> > +.\" preserved on all copies.
> > +.\"
> > +.\" Permission is granted to copy and distribute modified versions of this
> > +.\" manual under the conditions for verbatim copying, provided that the
> > +.\" entire resulting derived work is distributed under the terms of a
> > +.\" permission notice identical to this one.
> > +.\"
> > +.\" Since the Linux kernel and libraries are constantly changing, this
> > +.\" manual page may be incorrect or out-of-date.  The author(s) assume no
> > +.\" responsibility for errors or omissions, or for damages resulting from
> > +.\" the use of the information contained herein.  The author(s) may not
> > +.\" have taken the same level of care in the production of this manual,
> > +.\" which is licensed free of charge, as they might when working
> > +.\" professionally.
> > +.\"
> > +.\" Formatted or processed versions of this manual, if unaccompanied by
> > +.\" the source, must acknowledge the copyright and authors of this work.
> > +.\" %%%LICENSE_END
> > +.\"
> > +.TH CLOSE_RANGE 2 2020-12-08 "Linux" "Linux Programmer's Manual"
> > +.SH NAME
> > +close_range \- close all file descriptors in a given range
> > +.SH SYNOPSIS
> > +.nf
> > +.B #include <linux/close_range.h>
> > +.PP
> > +.BI "int close_range(int " first ", int " last ", unsigned int " flags );
> > +.fi
> > +.SH DESCRIPTION
> > +The
> > +.BR close_range ()
> > +system call closes all open file descriptors from
> > +.I first
> > +to
> > +.IR last
> > +(included).
> > +.PP
> > +Errors closing a given file descriptor are currently ignored.
> > +.PP
> > +.I flags
> > +can be set to
> > +.B CLOSE_RANGE_UNSHARE
> > +to unshare the range of file descriptors from any other processes,
> > +.I instead
> > +of closing them.
> 
> Really "instead of closing them"? I had supposed that rather that this
> should be "before closing them". That's also how the kernel code reads
> to me, from a quick glance.

It's also mentioned in the commit message. Basically setting
CLOSE_RANGE_UNSHARE is equivalent to:

unshare(CLONE_FILES);
close_range(<start>, <end>);

Christian

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [patch] close_range.2: new page documenting close_range(2)
  2020-12-09  9:40   ` Christian Brauner
@ 2020-12-09  9:43     ` Stephen Kitt
  0 siblings, 0 replies; 14+ messages in thread
From: Stephen Kitt @ 2020-12-09  9:43 UTC (permalink / raw)
  To: Michael Kerrisk (man-pages), Christian Brauner
  Cc: linux-man, Alejandro Colomar, lkml

Le 09/12/2020 10:40, Christian Brauner a écrit :
> On Wed, Dec 09, 2020 at 09:50:38AM +0100, Michael Kerrisk (man-pages) 
> wrote:
>> > +.PP
>> > +.I flags
>> > +can be set to
>> > +.B CLOSE_RANGE_UNSHARE
>> > +to unshare the range of file descriptors from any other processes,
>> > +.I instead
>> > +of closing them.
>> 
>> Really "instead of closing them"? I had supposed that rather that this
>> should be "before closing them". That's also how the kernel code reads
>> to me, from a quick glance.
> 
> It's also mentioned in the commit message. Basically setting
> CLOSE_RANGE_UNSHARE is equivalent to:
> 
> unshare(CLONE_FILES);
> close_range(<start>, <end>);

Yes, I got that mixed up, thanks for the clarification! I'll send a v2 
addressing the review comments later today.

Regards,

Stephen

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [patch] close_range.2: new page documenting close_range(2)
  2020-12-09  8:50 ` Michael Kerrisk (man-pages)
  2020-12-09  9:40   ` Christian Brauner
@ 2020-12-09  9:47   ` Alejandro Colomar (man-pages)
  2020-12-10 22:40     ` Michael Kerrisk (man-pages)
  1 sibling, 1 reply; 14+ messages in thread
From: Alejandro Colomar (man-pages) @ 2020-12-09  9:47 UTC (permalink / raw)
  To: mtk.manpages, Stephen Kitt; +Cc: linux-man, Christian Brauner, lkml

Hello Stephen and Michael,

Thanks for the page!

A few more comments below.

Thanks,

Alex

On 12/9/20 9:50 AM, Michael Kerrisk (man-pages) wrote:
> Hello Stephen
> 
> Thank you for writing this page! Some comments/questions below.
> 
> On Tue, 8 Dec 2020 at 22:51, Stephen Kitt <steve@sk2.org> wrote:
>>
>> This documents close_range(2) based on information in
>> 278a5fbaed89dacd04e9d052f4594ffd0e0585de and
>> 60997c3d45d9a67daf01c56d805ae4fec37e0bd8.
> 
> (Thanks for noting these commit IDs.)
> 
>> Signed-off-by: Stephen Kitt <steve@sk2.org>
>> ---
>>  man2/close_range.2 | 112 +++++++++++++++++++++++++++++++++++++++++++++
>>  1 file changed, 112 insertions(+)
>>  create mode 100644 man2/close_range.2
>>
>> diff --git a/man2/close_range.2 b/man2/close_range.2
>> new file mode 100644
>> index 000000000..62167d9b0
>> --- /dev/null
>> +++ b/man2/close_range.2
>> @@ -0,0 +1,112 @@
>> +.\" Copyright (c) 2020 Stephen Kitt <steve@sk2.org>
>> +.\"
>> +.\" %%%LICENSE_START(VERBATIM)
>> +.\" Permission is granted to make and distribute verbatim copies of this
>> +.\" manual provided the copyright notice and this permission notice are
>> +.\" preserved on all copies.
>> +.\"
>> +.\" Permission is granted to copy and distribute modified versions of this
>> +.\" manual under the conditions for verbatim copying, provided that the
>> +.\" entire resulting derived work is distributed under the terms of a
>> +.\" permission notice identical to this one.
>> +.\"
>> +.\" Since the Linux kernel and libraries are constantly changing, this
>> +.\" manual page may be incorrect or out-of-date.  The author(s) assume no
>> +.\" responsibility for errors or omissions, or for damages resulting from
>> +.\" the use of the information contained herein.  The author(s) may not
>> +.\" have taken the same level of care in the production of this manual,
>> +.\" which is licensed free of charge, as they might when working
>> +.\" professionally.
>> +.\"
>> +.\" Formatted or processed versions of this manual, if unaccompanied by
>> +.\" the source, must acknowledge the copyright and authors of this work.
>> +.\" %%%LICENSE_END
>> +.\"
>> +.TH CLOSE_RANGE 2 2020-12-08 "Linux" "Linux Programmer's Manual"
>> +.SH NAME
>> +close_range \- close all file descriptors in a given range
>> +.SH SYNOPSIS
>> +.nf
>> +.B #include <linux/close_range.h>
>> +.PP
>> +.BI "int close_range(int " first ", int " last ", unsigned int " flags );

The actual interface of close_range() uses 'unsigned int'
for the file descriptors:

$ sed -n '/SYSCALL_DEFINE.(close_range,/,/^{/p' linux/fs/open.c
SYSCALL_DEFINE3(close_range, unsigned int, fd, unsigned int, max_fd,
		unsigned int, flags)
{


>> +.fi
>> +.SH DESCRIPTION
>> +The
>> +.BR close_range ()
>> +system call closes all open file descriptors from
>> +.I first
>> +to
>> +.IR last

s/IR/I/

>> +(included).
>> +.PP
>> +Errors closing a given file descriptor are currently ignored.
>> +.PP
>> +.I flags
>> +can be set to
>> +.B CLOSE_RANGE_UNSHARE
>> +to unshare the range of file descriptors from any other processes,
>> +.I instead
>> +of closing them.
> 
> Really "instead of closing them"? I had supposed that rather that this
> should be "before closing them". That's also how the kernel code reads
> to me, from a quick glance.
> 
>> +.SH RETURN VALUE
>> +On success,
>> +.BR close_range ()
>> +return 0.
> 
> s/return/returns/
> 
>> +On error, \-1 is returned and
>> +.I errno
>> +is set to indicate the cause of the error.
>> +.SH ERRORS
>> +.TP
>> +.B EINVAL
>> +.I flags
>> +is not valid, or
>> +.I first
>> +is greater than
>> +.IR last .
>> +.TP
>> +.B EMFILE
>> +The per-process limit on the number of open file descriptors has been reached
>> +(see the description of
>> +.BR RLIMIT_NOFILE
>> +in
>> +.BR getrlimit (2)).
> 
> Given that we are simply closing FDs, how can EMFILE occur?

I was also going to ask this, but after some investigation,
it's possible.  Basically, CLOSE_RANGE_UNSHARE duplicates fds
before closing them.  In case that duplication is not possible,
ENOMEM and EMFILE are the two possible reasons.  It might be interesting
to note that those errors may only occur if CLOSE_RANGE_UNSHARE was set.
See below:

fs/open.c:
close_range() calls __close_range().

fs/file.c:
If CLOSE_RANGE_UNSHARE, __close_range() calls unshare_fd()
with CLONE_FILES flag.

kernel/fork.c:
unshare_fd() calls dup_fd().

fs/file.c:
dup_fd() may fail with -ENOMEM or -EMFILE.

> 
>> +.TP
>> +.B ENOMEM
>> +Insufficient kernel memory was available.
>> +.SH VERSIONS
>> +.BR close_range ()
>> +first appeared in Linux 5.9.
>> +.SH CONFORMING TO
>> +.BR close_range ()
>> +is available on Linux and FreeBSD.
> 
> Here, I think it would be better to write:
> 
> close_range()
> is a nonstandard function that is also present on FreeBSD.
> 
>> +.SH NOTES
>> +Currently, there is no glibc wrapper for this system call; call it using
>> +.BR syscall (2).
>> +.SH USE CASES
>> +.\" 278a5fbaed89dacd04e9d052f4594ffd0e0585de
>> +.\" 60997c3d45d9a67daf01c56d805ae4fec37e0bd8
>> +.SS Closing file descriptors before exec
>> +File descriptors can be closed safely using
>> +.PP
>> +.in +4n
>> +.EX
>> +/* we don't want anything past stderr here */
>> +close_range(3, ~0U, CLOSE_RANGE_UNSHARE);
>> +execve(....);
>> +.EE
>> +.in
>> +.PP
> 
> .PP is not necessary before a new subsection (.SS).
> 
>> +.SS Closing all open file descriptors
>> +This is commonly implemented (on Linux) by listing open file
> 
> Is it really true that this is common? I suspect not. It's slow, and
> relies on /proc being present. I would have thought that more common
> is something like:
> 
>         int maxfd = sysconf(_SC_OPEN_MAX);
>         if (maxfd == -1)                /* Limit is indeterminate... */
>             maxfd = 16384;           /* so take a guess */
> 
>         for (fd = 0; fd < maxfd; fd++)
>             close(fd);
> 
> I think it's fine to mention the use of a /proc as an (inferior and)
> alternative way of doing this. I'm just not sure that "commonly" is
> correct.
> 
>> +descriptors in
>> +.B /proc/self/fd/

By reading proc.5, I think this should s/.B/.I/, right mtk?

>> +and calling
>> +.BR close (2)
>> +on each one.
>> +.BR close_range ()
>> +can take care of this without requiring
>> +.B /proc

By reading proc.5, I think this should s/.B/.I/, right mtk?

>> +and with a single system call, which provides significant performance
>> +benefits.
>> +.SH SEE ALSO
>> +.BR close (2)
>>
>> base-commit: b5dae3959625f5ff378e9edf9139057d1c06bb55
>> --
>> 2.20.1
> 
> Thanks,
> 
> Michael
> 
> 

-- 
Alejandro Colomar
Linux man-pages comaintainer; http://www.kernel.org/doc/man-pages/
http://www.alejandro-colomar.es

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [patch] close_range.2: new page documenting close_range(2)
  2020-12-08 21:51 [patch] close_range.2: new page documenting close_range(2) Stephen Kitt
  2020-12-09  8:50 ` Michael Kerrisk (man-pages)
@ 2020-12-09  9:58 ` Christian Brauner
  2020-12-09 10:44   ` Alejandro Colomar (man-pages)
  1 sibling, 1 reply; 14+ messages in thread
From: Christian Brauner @ 2020-12-09  9:58 UTC (permalink / raw)
  To: Stephen Kitt; +Cc: linux-man, Alejandro Colomar, Michael Kerrisk, linux-kernel

On Tue, Dec 08, 2020 at 10:51:33PM +0100, Stephen Kitt wrote:
> This documents close_range(2) based on information in
> 278a5fbaed89dacd04e9d052f4594ffd0e0585de and
> 60997c3d45d9a67daf01c56d805ae4fec37e0bd8.
> 
> Signed-off-by: Stephen Kitt <steve@sk2.org>
> ---

Hey Stephen,

Thanks for working on this that's an early Christmas present as it gets
an item off my todo list!

>  man2/close_range.2 | 112 +++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 112 insertions(+)
>  create mode 100644 man2/close_range.2
> 
> diff --git a/man2/close_range.2 b/man2/close_range.2
> new file mode 100644
> index 000000000..62167d9b0
> --- /dev/null
> +++ b/man2/close_range.2
> @@ -0,0 +1,112 @@
> +.\" Copyright (c) 2020 Stephen Kitt <steve@sk2.org>
> +.\"
> +.\" %%%LICENSE_START(VERBATIM)
> +.\" Permission is granted to make and distribute verbatim copies of this
> +.\" manual provided the copyright notice and this permission notice are
> +.\" preserved on all copies.
> +.\"
> +.\" Permission is granted to copy and distribute modified versions of this
> +.\" manual under the conditions for verbatim copying, provided that the
> +.\" entire resulting derived work is distributed under the terms of a
> +.\" permission notice identical to this one.
> +.\"
> +.\" Since the Linux kernel and libraries are constantly changing, this
> +.\" manual page may be incorrect or out-of-date.  The author(s) assume no
> +.\" responsibility for errors or omissions, or for damages resulting from
> +.\" the use of the information contained herein.  The author(s) may not
> +.\" have taken the same level of care in the production of this manual,
> +.\" which is licensed free of charge, as they might when working
> +.\" professionally.
> +.\"
> +.\" Formatted or processed versions of this manual, if unaccompanied by
> +.\" the source, must acknowledge the copyright and authors of this work.
> +.\" %%%LICENSE_END
> +.\"
> +.TH CLOSE_RANGE 2 2020-12-08 "Linux" "Linux Programmer's Manual"
> +.SH NAME
> +close_range \- close all file descriptors in a given range
> +.SH SYNOPSIS
> +.nf
> +.B #include <linux/close_range.h>
> +.PP
> +.BI "int close_range(int " first ", int " last ", unsigned int " flags );

Note, the kernel prototype uses unsigned int as the type for file
descriptor arguments. As does the close() syscall itself. Only glibc
wrappers expose file descriptor types (at least in close variants) as
int.
Since this is a manpage about the syscall not the wrapper it might make
sense to note the correct types.

> +.fi
> +.SH DESCRIPTION
> +The
> +.BR close_range ()
> +system call closes all open file descriptors from
> +.I first
> +to
> +.IR last
> +(included).
> +.PP
> +Errors closing a given file descriptor are currently ignored.
> +.PP
> +.I flags
> +can be set to
> +.B CLOSE_RANGE_UNSHARE
> +to unshare the range of file descriptors from any other processes,
> +.I instead
> +of closing them.

As Michael has noted, this needs to be reworded. A few things to note:
- CLOSE_RANGE_UNSHARE will ensure that the calling process will have a
  private file descriptor table. This ensures that other threads opening
  files cannot inject new file descriptors into the caller's file
  descriptor table to e.g. make the caller inherit unwanted file
  descriptors.
- CLOSE_RANGE_UNSHARE is conceptually equivalent to:
  unshare(CLONE_FILES);
  close_range(3, ~0U);
- Whenever the requested range @last is greater than the current maximum
  number of file descriptors allocated in the caller's file descriptor
  table the kernel will only unshare a new file descriptor table for the
  caller up to @first, i.e. the new file descriptor table will be 0 up
  to and including @first not 0 up to and including @last. Which means
  that the kernel will not have to do any costly filp_close() calls at
  all. In essence, the close_range() operation is finished after the
  in-kernel unshare call in such cases.

Christian

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [patch] close_range.2: new page documenting close_range(2)
  2020-12-09  9:58 ` Christian Brauner
@ 2020-12-09 10:44   ` Alejandro Colomar (man-pages)
  2020-12-09 10:56     ` Christian Brauner
  0 siblings, 1 reply; 14+ messages in thread
From: Alejandro Colomar (man-pages) @ 2020-12-09 10:44 UTC (permalink / raw)
  To: Christian Brauner, Stephen Kitt; +Cc: linux-man, Michael Kerrisk, linux-kernel

Hey Christian,

I have a question for you below.

Thanks,

Alex

On 12/9/20 10:58 AM, Christian Brauner wrote:
> On Tue, Dec 08, 2020 at 10:51:33PM +0100, Stephen Kitt wrote:
>> This documents close_range(2) based on information in
>> 278a5fbaed89dacd04e9d052f4594ffd0e0585de and
>> 60997c3d45d9a67daf01c56d805ae4fec37e0bd8.
>>
>> Signed-off-by: Stephen Kitt <steve@sk2.org>
>> ---
> 
> Hey Stephen,
> 
> Thanks for working on this that's an early Christmas present as it gets
> an item off my todo list!
> 
>>  man2/close_range.2 | 112 +++++++++++++++++++++++++++++++++++++++++++++
>>  1 file changed, 112 insertions(+)
>>  create mode 100644 man2/close_range.2
>>
>> diff --git a/man2/close_range.2 b/man2/close_range.2
>> new file mode 100644
>> index 000000000..62167d9b0
>> --- /dev/null
>> +++ b/man2/close_range.2
>> @@ -0,0 +1,112 @@
>> +.\" Copyright (c) 2020 Stephen Kitt <steve@sk2.org>
>> +.\"
>> +.\" %%%LICENSE_START(VERBATIM)
>> +.\" Permission is granted to make and distribute verbatim copies of this
>> +.\" manual provided the copyright notice and this permission notice are
>> +.\" preserved on all copies.
>> +.\"
>> +.\" Permission is granted to copy and distribute modified versions of this
>> +.\" manual under the conditions for verbatim copying, provided that the
>> +.\" entire resulting derived work is distributed under the terms of a
>> +.\" permission notice identical to this one.
>> +.\"
>> +.\" Since the Linux kernel and libraries are constantly changing, this
>> +.\" manual page may be incorrect or out-of-date.  The author(s) assume no
>> +.\" responsibility for errors or omissions, or for damages resulting from
>> +.\" the use of the information contained herein.  The author(s) may not
>> +.\" have taken the same level of care in the production of this manual,
>> +.\" which is licensed free of charge, as they might when working
>> +.\" professionally.
>> +.\"
>> +.\" Formatted or processed versions of this manual, if unaccompanied by
>> +.\" the source, must acknowledge the copyright and authors of this work.
>> +.\" %%%LICENSE_END
>> +.\"
>> +.TH CLOSE_RANGE 2 2020-12-08 "Linux" "Linux Programmer's Manual"
>> +.SH NAME
>> +close_range \- close all file descriptors in a given range
>> +.SH SYNOPSIS
>> +.nf
>> +.B #include <linux/close_range.h>
>> +.PP
>> +.BI "int close_range(int " first ", int " last ", unsigned int " flags );
> 
> Note, the kernel prototype uses unsigned int as the type for file
> descriptor arguments. As does the close() syscall itself. Only glibc
> wrappers expose file descriptor types (at least in close variants) as
> int.
> Since this is a manpage about the syscall not the wrapper it might make
> sense to note the correct types.
> 
>> +.fi
>> +.SH DESCRIPTION
>> +The
>> +.BR close_range ()
>> +system call closes all open file descriptors from
>> +.I first
>> +to
>> +.IR last
>> +(included).
>> +.PP
>> +Errors closing a given file descriptor are currently ignored.
>> +.PP
>> +.I flags
>> +can be set to
>> +.B CLOSE_RANGE_UNSHARE
>> +to unshare the range of file descriptors from any other processes,
>> +.I instead
>> +of closing them.
> 
> As Michael has noted, this needs to be reworded. A few things to note:
> - CLOSE_RANGE_UNSHARE will ensure that the calling process will have a
>   private file descriptor table. This ensures that other threads opening
>   files cannot inject new file descriptors into the caller's file
>   descriptor table to e.g. make the caller inherit unwanted file
>   descriptors.
> - CLOSE_RANGE_UNSHARE is conceptually equivalent to:
>   unshare(CLONE_FILES);
>   close_range(3, ~0U);

AFAICS after reading the code, if the unsharing fails,
it will not close any file descriptors (please correct me if I'm wrong).

Just wanted to be sure that it was the intended behavior with you,
and if so, it would be good to document it in the page.

> - Whenever the requested range @last is greater than the current maximum
>   number of file descriptors allocated in the caller's file descriptor
>   table the kernel will only unshare a new file descriptor table for the
>   caller up to @first, i.e. the new file descriptor table will be 0 up
>   to and including @first not 0 up to and including @last. Which means
>   that the kernel will not have to do any costly filp_close() calls at
>   all. In essence, the close_range() operation is finished after the
>   in-kernel unshare call in such cases.
> 
> Christian
> 

-- 
Alejandro Colomar
Linux man-pages comaintainer; http://www.kernel.org/doc/man-pages/
http://www.alejandro-colomar.es

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [patch] close_range.2: new page documenting close_range(2)
  2020-12-09 10:44   ` Alejandro Colomar (man-pages)
@ 2020-12-09 10:56     ` Christian Brauner
  2020-12-10 14:36       ` Alejandro Colomar (man-pages)
  0 siblings, 1 reply; 14+ messages in thread
From: Christian Brauner @ 2020-12-09 10:56 UTC (permalink / raw)
  To: Alejandro Colomar (man-pages)
  Cc: Stephen Kitt, linux-man, Michael Kerrisk, linux-kernel

On Wed, Dec 09, 2020 at 11:44:22AM +0100, Alejandro Colomar (man-pages) wrote:
> Hey Christian,
> 
> I have a question for you below.
> 
> Thanks,

Hey Alex,

Sure!

> 
> Alex
> 
> On 12/9/20 10:58 AM, Christian Brauner wrote:
> > On Tue, Dec 08, 2020 at 10:51:33PM +0100, Stephen Kitt wrote:
> >> This documents close_range(2) based on information in
> >> 278a5fbaed89dacd04e9d052f4594ffd0e0585de and
> >> 60997c3d45d9a67daf01c56d805ae4fec37e0bd8.
> >>
> >> Signed-off-by: Stephen Kitt <steve@sk2.org>
> >> ---
> > 
> > Hey Stephen,
> > 
> > Thanks for working on this that's an early Christmas present as it gets
> > an item off my todo list!
> > 
> >>  man2/close_range.2 | 112 +++++++++++++++++++++++++++++++++++++++++++++
> >>  1 file changed, 112 insertions(+)
> >>  create mode 100644 man2/close_range.2
> >>
> >> diff --git a/man2/close_range.2 b/man2/close_range.2
> >> new file mode 100644
> >> index 000000000..62167d9b0
> >> --- /dev/null
> >> +++ b/man2/close_range.2
> >> @@ -0,0 +1,112 @@
> >> +.\" Copyright (c) 2020 Stephen Kitt <steve@sk2.org>
> >> +.\"
> >> +.\" %%%LICENSE_START(VERBATIM)
> >> +.\" Permission is granted to make and distribute verbatim copies of this
> >> +.\" manual provided the copyright notice and this permission notice are
> >> +.\" preserved on all copies.
> >> +.\"
> >> +.\" Permission is granted to copy and distribute modified versions of this
> >> +.\" manual under the conditions for verbatim copying, provided that the
> >> +.\" entire resulting derived work is distributed under the terms of a
> >> +.\" permission notice identical to this one.
> >> +.\"
> >> +.\" Since the Linux kernel and libraries are constantly changing, this
> >> +.\" manual page may be incorrect or out-of-date.  The author(s) assume no
> >> +.\" responsibility for errors or omissions, or for damages resulting from
> >> +.\" the use of the information contained herein.  The author(s) may not
> >> +.\" have taken the same level of care in the production of this manual,
> >> +.\" which is licensed free of charge, as they might when working
> >> +.\" professionally.
> >> +.\"
> >> +.\" Formatted or processed versions of this manual, if unaccompanied by
> >> +.\" the source, must acknowledge the copyright and authors of this work.
> >> +.\" %%%LICENSE_END
> >> +.\"
> >> +.TH CLOSE_RANGE 2 2020-12-08 "Linux" "Linux Programmer's Manual"
> >> +.SH NAME
> >> +close_range \- close all file descriptors in a given range
> >> +.SH SYNOPSIS
> >> +.nf
> >> +.B #include <linux/close_range.h>
> >> +.PP
> >> +.BI "int close_range(int " first ", int " last ", unsigned int " flags );
> > 
> > Note, the kernel prototype uses unsigned int as the type for file
> > descriptor arguments. As does the close() syscall itself. Only glibc
> > wrappers expose file descriptor types (at least in close variants) as
> > int.
> > Since this is a manpage about the syscall not the wrapper it might make
> > sense to note the correct types.
> > 
> >> +.fi
> >> +.SH DESCRIPTION
> >> +The
> >> +.BR close_range ()
> >> +system call closes all open file descriptors from
> >> +.I first
> >> +to
> >> +.IR last
> >> +(included).
> >> +.PP
> >> +Errors closing a given file descriptor are currently ignored.
> >> +.PP
> >> +.I flags
> >> +can be set to
> >> +.B CLOSE_RANGE_UNSHARE
> >> +to unshare the range of file descriptors from any other processes,
> >> +.I instead
> >> +of closing them.
> > 
> > As Michael has noted, this needs to be reworded. A few things to note:
> > - CLOSE_RANGE_UNSHARE will ensure that the calling process will have a
> >   private file descriptor table. This ensures that other threads opening
> >   files cannot inject new file descriptors into the caller's file
> >   descriptor table to e.g. make the caller inherit unwanted file
> >   descriptors.
> > - CLOSE_RANGE_UNSHARE is conceptually equivalent to:
> >   unshare(CLONE_FILES);
> >   close_range(3, ~0U);
> 
> AFAICS after reading the code, if the unsharing fails,
> it will not close any file descriptors (please correct me if I'm wrong).
> 
> Just wanted to be sure that it was the intended behavior with you,
> and if so, it would be good to document it in the page.

Yes, this is intended because if the unshare fails we haven't yet
actually started closing anything so we're before the point of no
return where we ignore failures. So we can let userspace decide whether
they want to retry without CLOSE_RANGE_UNSHARE.

Christian

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [patch] close_range.2: new page documenting close_range(2)
  2020-12-09 10:56     ` Christian Brauner
@ 2020-12-10 14:36       ` Alejandro Colomar (man-pages)
  2020-12-12 12:14         ` Christian Brauner
  0 siblings, 1 reply; 14+ messages in thread
From: Alejandro Colomar (man-pages) @ 2020-12-10 14:36 UTC (permalink / raw)
  To: Christian Brauner; +Cc: Stephen Kitt, linux-man, Michael Kerrisk, linux-kernel

Hi Christian,

Thanks for confirming that behavior.  Seems reasonable.

I was wondering...
If this call is equivalent to unshare(2)+{close(2) in a loop},
shouldn't it fail for the same reasons those syscalls can fail?

What about the following errors?:

From unshare(2):

       EPERM  The calling process did not have the  required  privi‐
              leges for this operation.

From close(2):
       EBADF  fd isn't a valid open file descriptor.

OK, this one can't happen with the current code.
Let's say there are fds 1 to 10, and you call 'close_range(20,30,0)'.
It's a no-op (although it will still unshare if the flag is set).
But souldn't it fail with EBADF?

       EINTR  The close() call was interrupted by a signal; see sig‐
              nal(7).

       EIO    An I/O error occurred.

       ENOSPC, EDQUOT
              On NFS, these errors are not normally reported against
              the first write which exceeds  the  available  storage
              space,  but  instead  against  a  subsequent write(2),
              fsync(2), or close().

Thanks,

Alex


On 12/9/20 11:56 AM, Christian Brauner wrote:
> On Wed, Dec 09, 2020 at 11:44:22AM +0100, Alejandro Colomar (man-pages) wrote:
>> Hey Christian,
>>
>> I have a question for you below.
>>
>> Thanks,
> 
> Hey Alex,
> 
> Sure!

[...]

>>
>> AFAICS after reading the code, if the unsharing fails,
>> it will not close any file descriptors (please correct me if I'm wrong).
>>
>> Just wanted to be sure that it was the intended behavior with you,
>> and if so, it would be good to document it in the page.
> 
> Yes, this is intended because if the unshare fails we haven't yet
> actually started closing anything so we're before the point of no
> return where we ignore failures. So we can let userspace decide whether
> they want to retry without CLOSE_RANGE_UNSHARE.
> 
> Christian
> 

-- 
Alejandro Colomar
Linux man-pages comaintainer; https://www.kernel.org/doc/man-pages/
http://www.alejandro-colomar.es

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [patch] close_range.2: new page documenting close_range(2)
  2020-12-09  9:47   ` Alejandro Colomar (man-pages)
@ 2020-12-10 22:40     ` Michael Kerrisk (man-pages)
  0 siblings, 0 replies; 14+ messages in thread
From: Michael Kerrisk (man-pages) @ 2020-12-10 22:40 UTC (permalink / raw)
  To: Alejandro Colomar (man-pages), Stephen Kitt
  Cc: mtk.manpages, linux-man, Christian Brauner, lkml

On 12/9/20 10:47 AM, Alejandro Colomar (man-pages) wrote:

>>> +descriptors in
>>> +.B /proc/self/fd/
> 
> By reading proc.5, I think this should s/.B/.I/, right mtk?
> 
>>> +and calling
>>> +.BR close (2)
>>> +on each one.
>>> +.BR close_range ()
>>> +can take care of this without requiring
>>> +.B /proc
> 
> By reading proc.5, I think this should s/.B/.I/, right mtk?

Yes to both. Pathnames are formatted with .I.


-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [patch] close_range.2: new page documenting close_range(2)
  2020-12-10 14:36       ` Alejandro Colomar (man-pages)
@ 2020-12-12 12:14         ` Christian Brauner
  2020-12-12 17:58           ` Alejandro Colomar (man-pages)
  0 siblings, 1 reply; 14+ messages in thread
From: Christian Brauner @ 2020-12-12 12:14 UTC (permalink / raw)
  To: Alejandro Colomar (man-pages)
  Cc: Stephen Kitt, linux-man, Michael Kerrisk, linux-kernel

On Thu, Dec 10, 2020 at 03:36:42PM +0100, Alejandro Colomar (man-pages) wrote:
> Hi Christian,

Hi Alex,

> 
> Thanks for confirming that behavior.  Seems reasonable.
> 
> I was wondering...
> If this call is equivalent to unshare(2)+{close(2) in a loop},
> shouldn't it fail for the same reasons those syscalls can fail?
> 
> What about the following errors?:
> 
> From unshare(2):
> 
>        EPERM  The calling process did not have the  required  privi‐
>               leges for this operation.

unshare(CLONE_FILES) doesn't require any privileges. Only flags relevant
to kernel/nsproxy.c:unshare_nsproxy_namespaces() require privileges,
i.e.
CLONE_NEWNS
CLONE_NEWUTS
CLONE_NEWIPC
CLONE_NEWNET
CLONE_NEWPID
CLONE_NEWCGROUP
CLONE_NEWTIME
so the permissions are the same.

> 
> From close(2):
>        EBADF  fd isn't a valid open file descriptor.
> 
> OK, this one can't happen with the current code.
> Let's say there are fds 1 to 10, and you call 'close_range(20,30,0)'.
> It's a no-op (although it will still unshare if the flag is set).
> But souldn't it fail with EBADF?

CLOSE_RANGE_UNSHARE should always give you a private file descriptor
table independent of whether or not any file descriptors need to be
closed. That's also how we documented the flag:

/* Unshare the file descriptor table before closing file descriptors. */
#define CLOSE_RANGE_UNSHARE	(1U << 1)

A caller calling unshare(CLONE_FILES) and then an emulated close_range()
or the proper close_range() syscall wants to make sure that all unwanted
file descriptors are closed (if any) and that no new file descriptors
can be injected afterwards. If you skip the unshare(CLONE_FILES) because
there are no fds to be closed you open up a race window. It would also
be annoying for userspace if they _may_ have received a private file
descriptor table but only if any fds needed to be closed.

If people really were extremely keen about skipping the unshare when no
fd needs to be closed then this could become a new flag. But I really
don't think that's necessary and also doesn't make a lot of sense, imho.

> 
>        EINTR  The close() call was interrupted by a signal; see sig‐
>               nal(7).
> 
>        EIO    An I/O error occurred.
> 
>        ENOSPC, EDQUOT
>               On NFS, these errors are not normally reported against
>               the first write which exceeds  the  available  storage
>               space,  but  instead  against  a  subsequent write(2),
>               fsync(2), or close().

None of these will be seen by userspace because close_range() currently
ignores all errors after it has begun closing files.

Christian

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [patch] close_range.2: new page documenting close_range(2)
  2020-12-12 12:14         ` Christian Brauner
@ 2020-12-12 17:58           ` Alejandro Colomar (man-pages)
  2020-12-18 10:12             ` Ping: " Alejandro Colomar (man-pages)
  0 siblings, 1 reply; 14+ messages in thread
From: Alejandro Colomar (man-pages) @ 2020-12-12 17:58 UTC (permalink / raw)
  To: Christian Brauner; +Cc: Stephen Kitt, linux-man, Michael Kerrisk, linux-kernel

Hi Christian,

Makes sense to me.

Thanks,

Alex

On 12/12/20 1:14 PM, Christian Brauner wrote:
> On Thu, Dec 10, 2020 at 03:36:42PM +0100, Alejandro Colomar (man-pages) wrote:
>> Hi Christian,
> 
> Hi Alex,
> 
>>
>> Thanks for confirming that behavior.  Seems reasonable.
>>
>> I was wondering...
>> If this call is equivalent to unshare(2)+{close(2) in a loop},
>> shouldn't it fail for the same reasons those syscalls can fail?
>>
>> What about the following errors?:
>>
>> From unshare(2):
>>
>>        EPERM  The calling process did not have the  required  privi‐
>>               leges for this operation.
> 
> unshare(CLONE_FILES) doesn't require any privileges. Only flags relevant
> to kernel/nsproxy.c:unshare_nsproxy_namespaces() require privileges,
> i.e.
> CLONE_NEWNS
> CLONE_NEWUTS
> CLONE_NEWIPC
> CLONE_NEWNET
> CLONE_NEWPID
> CLONE_NEWCGROUP
> CLONE_NEWTIME
> so the permissions are the same.
> 
>>
>> From close(2):
>>        EBADF  fd isn't a valid open file descriptor.
>>
>> OK, this one can't happen with the current code.
>> Let's say there are fds 1 to 10, and you call 'close_range(20,30,0)'.
>> It's a no-op (although it will still unshare if the flag is set).
>> But souldn't it fail with EBADF?
> 
> CLOSE_RANGE_UNSHARE should always give you a private file descriptor
> table independent of whether or not any file descriptors need to be
> closed. That's also how we documented the flag:
> 
> /* Unshare the file descriptor table before closing file descriptors. */
> #define CLOSE_RANGE_UNSHARE	(1U << 1)
> 
> A caller calling unshare(CLONE_FILES) and then an emulated close_range()
> or the proper close_range() syscall wants to make sure that all unwanted
> file descriptors are closed (if any) and that no new file descriptors
> can be injected afterwards. If you skip the unshare(CLONE_FILES) because
> there are no fds to be closed you open up a race window. It would also
> be annoying for userspace if they _may_ have received a private file
> descriptor table but only if any fds needed to be closed.
> 
> If people really were extremely keen about skipping the unshare when no
> fd needs to be closed then this could become a new flag. But I really
> don't think that's necessary and also doesn't make a lot of sense, imho.
> 
>>
>>        EINTR  The close() call was interrupted by a signal; see sig‐
>>               nal(7).
>>
>>        EIO    An I/O error occurred.
>>
>>        ENOSPC, EDQUOT
>>               On NFS, these errors are not normally reported against
>>               the first write which exceeds  the  available  storage
>>               space,  but  instead  against  a  subsequent write(2),
>>               fsync(2), or close().
> 
> None of these will be seen by userspace because close_range() currently
> ignores all errors after it has begun closing files.
> 
> Christian
> 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Ping: [patch] close_range.2: new page documenting close_range(2)
  2020-12-12 17:58           ` Alejandro Colomar (man-pages)
@ 2020-12-18 10:12             ` Alejandro Colomar (man-pages)
  2020-12-18 10:14               ` Stephen Kitt
  0 siblings, 1 reply; 14+ messages in thread
From: Alejandro Colomar (man-pages) @ 2020-12-18 10:12 UTC (permalink / raw)
  To: Stephen Kitt; +Cc: linux-man, Michael Kerrisk, linux-kernel, Christian Brauner

Hi Stephen,

Linux 5.10 has been recently released.
Do you have any updates for this patch?

Thanks,

Alex

On 12/12/20 6:58 PM, Alejandro Colomar (man-pages) wrote:
> Hi Christian,
> 
> Makes sense to me.
> 
> Thanks,
> 
> Alex
> 
> On 12/12/20 1:14 PM, Christian Brauner wrote:
>> On Thu, Dec 10, 2020 at 03:36:42PM +0100, Alejandro Colomar (man-pages) wrote:
>>> Hi Christian,
>>
>> Hi Alex,
>>
>>>
>>> Thanks for confirming that behavior.  Seems reasonable.
>>>
>>> I was wondering...
>>> If this call is equivalent to unshare(2)+{close(2) in a loop},
>>> shouldn't it fail for the same reasons those syscalls can fail?
>>>
>>> What about the following errors?:
>>>
>>> From unshare(2):
>>>
>>>        EPERM  The calling process did not have the  required  privi‐
>>>               leges for this operation.
>>
>> unshare(CLONE_FILES) doesn't require any privileges. Only flags relevant
>> to kernel/nsproxy.c:unshare_nsproxy_namespaces() require privileges,
>> i.e.
>> CLONE_NEWNS
>> CLONE_NEWUTS
>> CLONE_NEWIPC
>> CLONE_NEWNET
>> CLONE_NEWPID
>> CLONE_NEWCGROUP
>> CLONE_NEWTIME
>> so the permissions are the same.
>>
>>>
>>> From close(2):
>>>        EBADF  fd isn't a valid open file descriptor.
>>>
>>> OK, this one can't happen with the current code.
>>> Let's say there are fds 1 to 10, and you call 'close_range(20,30,0)'.
>>> It's a no-op (although it will still unshare if the flag is set).
>>> But souldn't it fail with EBADF?
>>
>> CLOSE_RANGE_UNSHARE should always give you a private file descriptor
>> table independent of whether or not any file descriptors need to be
>> closed. That's also how we documented the flag:
>>
>> /* Unshare the file descriptor table before closing file descriptors. */
>> #define CLOSE_RANGE_UNSHARE	(1U << 1)
>>
>> A caller calling unshare(CLONE_FILES) and then an emulated close_range()
>> or the proper close_range() syscall wants to make sure that all unwanted
>> file descriptors are closed (if any) and that no new file descriptors
>> can be injected afterwards. If you skip the unshare(CLONE_FILES) because
>> there are no fds to be closed you open up a race window. It would also
>> be annoying for userspace if they _may_ have received a private file
>> descriptor table but only if any fds needed to be closed.
>>
>> If people really were extremely keen about skipping the unshare when no
>> fd needs to be closed then this could become a new flag. But I really
>> don't think that's necessary and also doesn't make a lot of sense, imho.
>>
>>>
>>>        EINTR  The close() call was interrupted by a signal; see sig‐
>>>               nal(7).
>>>
>>>        EIO    An I/O error occurred.
>>>
>>>        ENOSPC, EDQUOT
>>>               On NFS, these errors are not normally reported against
>>>               the first write which exceeds  the  available  storage
>>>               space,  but  instead  against  a  subsequent write(2),
>>>               fsync(2), or close().
>>
>> None of these will be seen by userspace because close_range() currently
>> ignores all errors after it has begun closing files.
>>
>> Christian
>>

-- 
Alejandro Colomar
Linux man-pages comaintainer; https://www.kernel.org/doc/man-pages/
http://www.alejandro-colomar.es/

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Ping: [patch] close_range.2: new page documenting close_range(2)
  2020-12-18 10:12             ` Ping: " Alejandro Colomar (man-pages)
@ 2020-12-18 10:14               ` Stephen Kitt
  0 siblings, 0 replies; 14+ messages in thread
From: Stephen Kitt @ 2020-12-18 10:14 UTC (permalink / raw)
  To: Alejandro Colomar (man-pages)
  Cc: linux-man, Michael Kerrisk, linux-kernel, Christian Brauner

Hi Alex,

Le 18/12/2020 11:12, Alejandro Colomar (man-pages) a écrit :
> Linux 5.10 has been recently released.
> Do you have any updates for this patch?

Yes, I have a v3 in preparation, with _CLOEXEC and a code example. I'll 
wrap it up today.

Regards,

Stephen

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2020-12-18 10:52 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-12-08 21:51 [patch] close_range.2: new page documenting close_range(2) Stephen Kitt
2020-12-09  8:50 ` Michael Kerrisk (man-pages)
2020-12-09  9:40   ` Christian Brauner
2020-12-09  9:43     ` Stephen Kitt
2020-12-09  9:47   ` Alejandro Colomar (man-pages)
2020-12-10 22:40     ` Michael Kerrisk (man-pages)
2020-12-09  9:58 ` Christian Brauner
2020-12-09 10:44   ` Alejandro Colomar (man-pages)
2020-12-09 10:56     ` Christian Brauner
2020-12-10 14:36       ` Alejandro Colomar (man-pages)
2020-12-12 12:14         ` Christian Brauner
2020-12-12 17:58           ` Alejandro Colomar (man-pages)
2020-12-18 10:12             ` Ping: " Alejandro Colomar (man-pages)
2020-12-18 10:14               ` Stephen Kitt

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).