* [PATCH 1/2] clone.2: tfix
@ 2019-11-28 12:46 Adrian Reber
2019-11-28 12:46 ` [PATCH 2/2] clone.2: added clone3() set_tid information Adrian Reber
` (2 more replies)
0 siblings, 3 replies; 10+ messages in thread
From: Adrian Reber @ 2019-11-28 12:46 UTC (permalink / raw)
To: mtk.manpages, Christian Brauner; +Cc: linux-man
added two missing parentheses
Signed-off-by: Adrian Reber <areber@redhat.com>
---
man2/clone.2 | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/man2/clone.2 b/man2/clone.2
index e5ab2a096..076b9258e 100644
--- a/man2/clone.2
+++ b/man2/clone.2
@@ -1349,13 +1349,13 @@ For example, on aarch64,
.I stack
must be a multiple of 16.
.TP
-.BR EINVAL " (" clone3 "() only"
+.BR EINVAL " (" clone3 "() only)"
.B CLONE_DETACHED
was specified in the
.I flags
mask.
.TP
-.BR EINVAL " (" clone "() only"
+.BR EINVAL " (" clone "() only)"
.B CLONE_PIDFD
was specified together with
.B CLONE_DETACHED
base-commit: 445fc03eebeee9b0e9ccaa4e2361f91f30749e8e
--
2.23.0
^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCH 2/2] clone.2: added clone3() set_tid information
2019-11-28 12:46 [PATCH 1/2] clone.2: tfix Adrian Reber
@ 2019-11-28 12:46 ` Adrian Reber
2019-11-28 17:24 ` Christian Brauner
2019-12-01 12:18 ` Michael Kerrisk (man-pages)
2019-11-28 13:30 ` [PATCH 1/2] clone.2: tfix Christian Brauner
2019-12-01 12:16 ` Michael Kerrisk (man-pages)
2 siblings, 2 replies; 10+ messages in thread
From: Adrian Reber @ 2019-11-28 12:46 UTC (permalink / raw)
To: mtk.manpages, Christian Brauner; +Cc: linux-man
Signed-off-by: Adrian Reber <areber@redhat.com>
---
man2/clone.2 | 90 ++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 90 insertions(+)
diff --git a/man2/clone.2 b/man2/clone.2
index 076b9258e..59c13ec35 100644
--- a/man2/clone.2
+++ b/man2/clone.2
@@ -195,6 +195,8 @@ struct clone_args {
u64 stack; /* Pointer to lowest byte of stack */
u64 stack_size; /* Size of stack */
u64 tls; /* Location of new TLS */
+ u64 set_tid; /* Pointer to a \fIpid_t\fP array */
+ u64 set_tid_size; /* Number of elements in \fIset_tid\fP */
};
.EE
.in
@@ -262,6 +264,8 @@ flags & 0xff exit_signal
stack stack
\fP---\fP stack_size
tls tls See CLONE_SETTLS
+\fP---\fP set_tid See below for details
+\fP---\fP set_tid_size
.TE
.RE
.\"
@@ -285,6 +289,74 @@ options when waiting for the child with
If no signal (i.e., zero) is specified, then the parent process is not signaled
when the child terminates.
.\"
+.SS The set_tid array
+.PP
+The
+.I set_tid
+array is used to select a certain PID for the process to be created by
+.BR clone3 ().
+If the PID of the newly created process should only be set for the current
+PID namespace or in the newly created PID namespace (if
+.I flags
+contains
+.BR CLONE_NEWPID )
+then the first element in the
+.I set_tid
+array has to be the desired PID and
+.I set_tid_size
+needs to be 1.
+.PP
+If the PID of the newly created process should have a certain value in
+multiple PID namespaces the
+.I set_tid
+array can have multiple entries. The first entry defines the PID in the most
+nested PID namespace and all following entries contain the PID of the
+corresponding parent PID namespace. The number of PID namespaces in which a PID
+should be set is defined by
+.I set_tid_size
+which cannot be larger than the number of currently nested PID namespaces.
+.PP
+To create a process with the following PIDs:
+.RS
+.TS
+lb lb
+l l .
+PID NS level Requested PID
+0 (host) 31496
+1 42
+2 7
+.TE
+.RE
+.PP
+The
+.I set_tid
+array would need to be filled with:
+.PP
+.EX
+ set_tid[0] = 7;
+ set_tid[1] = 42;
+ set_tid[2] = 31496;
+ set_tid_size = 3;
+.EE
+.PP
+If only the PID of the two innermost PID namespaces
+should be defined it needs to be set like this:
+.PP
+.EX
+ set_tid[0] = 7;
+ set_tid[1] = 42;
+ set_tid_size = 2;
+.EE
+.PP
+The PID in the PID namespaces outside the two innermost PID namespaces
+is then selected the same way as any other PID is selected.
+.PP
+Only a privileged process
+.RB ( CAP_SYS_ADMIN )
+can set
+.I set_tid
+to select a PID for the process to be created.
+.\"
.SS The flags mask
.PP
Both
@@ -1379,6 +1451,16 @@ in the
.I flags
mask.
.TP
+.BR EINVAL " (" clone3 "() only)"
+.I set_tid_size
+larger than current number of nested PID namespaces or maximum number of
+nested PID namespaces was specified.
+.TP
+.BR EINVAL " (" clone3 "() only)"
+If one of the PIDs specified in
+.I set_tid
+was an invalid PID.
+.TP
.B ENOMEM
Cannot allocate sufficient memory to allocate a task structure for the
child, or to copy those parts of the caller's context that need to be
@@ -1450,6 +1532,14 @@ mask and the caller is in a chroot environment
(i.e., the caller's root directory does not match the root directory
of the mount namespace in which it resides).
.TP
+.BR EPERM " (" clone3 "() only)"
+If
+.I set_tid
+with
+.I set_tid_size
+larger than 0 was specified by an unprivileged process (process without
+\fBCAP_SYS_ADMIN\fP).
+.TP
.BR ERESTARTNOINTR " (since Linux 2.6.17)"
.\" commit 4a2c7a7837da1b91468e50426066d988050e4d56
System call was interrupted by a signal and will be restarted.
--
2.23.0
^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH 1/2] clone.2: tfix
2019-11-28 12:46 [PATCH 1/2] clone.2: tfix Adrian Reber
2019-11-28 12:46 ` [PATCH 2/2] clone.2: added clone3() set_tid information Adrian Reber
@ 2019-11-28 13:30 ` Christian Brauner
2019-12-01 12:16 ` Michael Kerrisk (man-pages)
2 siblings, 0 replies; 10+ messages in thread
From: Christian Brauner @ 2019-11-28 13:30 UTC (permalink / raw)
To: Adrian Reber; +Cc: mtk.manpages, linux-man
On Thu, Nov 28, 2019 at 01:46:49PM +0100, Adrian Reber wrote:
> added two missing parentheses
>
> Signed-off-by: Adrian Reber <areber@redhat.com>
Seems reasonable to me.
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 2/2] clone.2: added clone3() set_tid information
2019-11-28 12:46 ` [PATCH 2/2] clone.2: added clone3() set_tid information Adrian Reber
@ 2019-11-28 17:24 ` Christian Brauner
2019-11-29 13:56 ` Adrian Reber
2019-12-01 12:18 ` Michael Kerrisk (man-pages)
2019-12-01 12:18 ` Michael Kerrisk (man-pages)
1 sibling, 2 replies; 10+ messages in thread
From: Christian Brauner @ 2019-11-28 17:24 UTC (permalink / raw)
To: Adrian Reber; +Cc: mtk.manpages, linux-man
On Thu, Nov 28, 2019 at 01:46:50PM +0100, Adrian Reber wrote:
> Signed-off-by: Adrian Reber <areber@redhat.com>
> ---
> man2/clone.2 | 90 ++++++++++++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 90 insertions(+)
>
> diff --git a/man2/clone.2 b/man2/clone.2
> index 076b9258e..59c13ec35 100644
> --- a/man2/clone.2
> +++ b/man2/clone.2
> @@ -195,6 +195,8 @@ struct clone_args {
> u64 stack; /* Pointer to lowest byte of stack */
> u64 stack_size; /* Size of stack */
> u64 tls; /* Location of new TLS */
> + u64 set_tid; /* Pointer to a \fIpid_t\fP array */
> + u64 set_tid_size; /* Number of elements in \fIset_tid\fP */
> };
> .EE
> .in
> @@ -262,6 +264,8 @@ flags & 0xff exit_signal
> stack stack
> \fP---\fP stack_size
> tls tls See CLONE_SETTLS
> +\fP---\fP set_tid See below for details
> +\fP---\fP set_tid_size
> .TE
> .RE
> .\"
> @@ -285,6 +289,74 @@ options when waiting for the child with
> If no signal (i.e., zero) is specified, then the parent process is not signaled
> when the child terminates.
> .\"
> +.SS The set_tid array
> +.PP
> +The
> +.I set_tid
> +array is used to select a certain PID for the process to be created by
> +.BR clone3 ().
> +If the PID of the newly created process should only be set for the current
> +PID namespace or in the newly created PID namespace (if
> +.I flags
> +contains
> +.BR CLONE_NEWPID )
> +then the first element in the
> +.I set_tid
> +array has to be the desired PID and
> +.I set_tid_size
> +needs to be 1.
> +.PP
> +If the PID of the newly created process should have a certain value in
> +multiple PID namespaces the
> +.I set_tid
> +array can have multiple entries. The first entry defines the PID in the most
> +nested PID namespace and all following entries contain the PID of the
> +corresponding parent PID namespace. The number of PID namespaces in which a PID
> +should be set is defined by
> +.I set_tid_size
> +which cannot be larger than the number of currently nested PID namespaces.
"It's upper cap is the kernel-enforced general nesting limit."
or sm like that
> +.PP
> +To create a process with the following PIDs:
in a pid namespace hierarchy
> +.RS
> +.TS
> +lb lb
> +l l .
> +PID NS level Requested PID
> +0 (host) 31496
> +1 42
> +2 7
> +.TE
> +.RE
> +.PP
Set the array to:
> +.PP
> +.EX
> + set_tid[0] = 7;
> + set_tid[1] = 42;
> + set_tid[2] = 31496;
> + set_tid_size = 3;
> +.EE
> +.PP
> +If only the PID of the two innermost PID namespaces
need to be specified set the array to:
> +.PP
> +.EX
> + set_tid[0] = 7;
> + set_tid[1] = 42;
> + set_tid_size = 2;
> +.EE
> +.PP
> +The PID in the PID namespaces outside the two innermost PID namespaces
> +is then selected the same way as any other PID is selected.
s/is then/will be/
> +.PP
> +Only a privileged process
> +.RB ( CAP_SYS_ADMIN )
Maybe more specific:
The set_tid feature requires CAP_SYS_ADMIN in all owning user namespaces
of the target pid namespaces.
> +can set
> +.I set_tid
> +to select a PID for the process to be created.
> +.\"
> .SS The flags mask
> .PP
> Both
> @@ -1379,6 +1451,16 @@ in the
> .I flags
> mask.
> .TP
> +.BR EINVAL " (" clone3 "() only)"
> +.I set_tid_size
> +larger than current number of nested PID namespaces or maximum number of
> +nested PID namespaces was specified.
> +.TP
> +.BR EINVAL " (" clone3 "() only)"
> +If one of the PIDs specified in
> +.I set_tid
> +was an invalid PID.
> +.TP
> .B ENOMEM
> Cannot allocate sufficient memory to allocate a task structure for the
> child, or to copy those parts of the caller's context that need to be
> @@ -1450,6 +1532,14 @@ mask and the caller is in a chroot environment
> (i.e., the caller's root directory does not match the root directory
> of the mount namespace in which it resides).
> .TP
> +.BR EPERM " (" clone3 "() only)"
> +If
> +.I set_tid
> +with
> +.I set_tid_size
> +larger than 0 was specified by an unprivileged process (process without
> +\fBCAP_SYS_ADMIN\fP).
without CAP_SYS_ADMIN in _one_ of the owning user namespaces of the
target pid namespace
maybe?
Where's EEXIST?
Christian
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 2/2] clone.2: added clone3() set_tid information
2019-11-28 17:24 ` Christian Brauner
@ 2019-11-29 13:56 ` Adrian Reber
2019-12-01 12:17 ` Michael Kerrisk (man-pages)
2019-12-01 12:18 ` Michael Kerrisk (man-pages)
1 sibling, 1 reply; 10+ messages in thread
From: Adrian Reber @ 2019-11-29 13:56 UTC (permalink / raw)
To: Christian Brauner; +Cc: mtk.manpages, linux-man
On Thu, Nov 28, 2019 at 06:24:05PM +0100, Christian Brauner wrote:
> On Thu, Nov 28, 2019 at 01:46:50PM +0100, Adrian Reber wrote:
> > Signed-off-by: Adrian Reber <areber@redhat.com>
> > ---
> > man2/clone.2 | 90 ++++++++++++++++++++++++++++++++++++++++++++++++++++
> > 1 file changed, 90 insertions(+)
> >
> > diff --git a/man2/clone.2 b/man2/clone.2
> > index 076b9258e..59c13ec35 100644
> > --- a/man2/clone.2
> > +++ b/man2/clone.2
> > @@ -195,6 +195,8 @@ struct clone_args {
> > u64 stack; /* Pointer to lowest byte of stack */
> > u64 stack_size; /* Size of stack */
> > u64 tls; /* Location of new TLS */
> > + u64 set_tid; /* Pointer to a \fIpid_t\fP array */
> > + u64 set_tid_size; /* Number of elements in \fIset_tid\fP */
> > };
> > .EE
> > .in
> > @@ -262,6 +264,8 @@ flags & 0xff exit_signal
> > stack stack
> > \fP---\fP stack_size
> > tls tls See CLONE_SETTLS
> > +\fP---\fP set_tid See below for details
> > +\fP---\fP set_tid_size
> > .TE
> > .RE
> > .\"
> > @@ -285,6 +289,74 @@ options when waiting for the child with
> > If no signal (i.e., zero) is specified, then the parent process is not signaled
> > when the child terminates.
> > .\"
> > +.SS The set_tid array
> > +.PP
> > +The
> > +.I set_tid
> > +array is used to select a certain PID for the process to be created by
> > +.BR clone3 ().
> > +If the PID of the newly created process should only be set for the current
> > +PID namespace or in the newly created PID namespace (if
> > +.I flags
> > +contains
> > +.BR CLONE_NEWPID )
> > +then the first element in the
> > +.I set_tid
> > +array has to be the desired PID and
> > +.I set_tid_size
> > +needs to be 1.
> > +.PP
> > +If the PID of the newly created process should have a certain value in
> > +multiple PID namespaces the
> > +.I set_tid
> > +array can have multiple entries. The first entry defines the PID in the most
> > +nested PID namespace and all following entries contain the PID of the
> > +corresponding parent PID namespace. The number of PID namespaces in which a PID
> > +should be set is defined by
> > +.I set_tid_size
> > +which cannot be larger than the number of currently nested PID namespaces.
>
> "It's upper cap is the kernel-enforced general nesting limit."
> or sm like that
Is that an addition to my sentence or a replacement. I think at this
point it is more important to point out that it cannot be larger than
the number of currently nested PID namespaces. Later (at EPERM) I am
also mentioning that it cannot be larger than the maximum number of
nested PID namespaces. The code does indeed check if set_tid_size is
larger than the maximum number of possible nested PID namespaces for
the user, I think, when calling clone3(), it is more relevant that
set_tid_size is not larger than the number of currently nested PID
namespaces. The maximum number of possible nested PID namespaces is more
likely enforced during unshare() or CLONE_NEWPID (which could be
happening at the same point in time as set_tid_size larger than maximum
number of nested PID namespace).
This definitely feels like too much discussion for a single sentence ;)
I can add a sentence about the maximum number of nested PID namespaces
here in addition to the one at EPERM.
I do not think it is relevant for the user at this point in time.
Adrian
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 1/2] clone.2: tfix
2019-11-28 12:46 [PATCH 1/2] clone.2: tfix Adrian Reber
2019-11-28 12:46 ` [PATCH 2/2] clone.2: added clone3() set_tid information Adrian Reber
2019-11-28 13:30 ` [PATCH 1/2] clone.2: tfix Christian Brauner
@ 2019-12-01 12:16 ` Michael Kerrisk (man-pages)
2 siblings, 0 replies; 10+ messages in thread
From: Michael Kerrisk (man-pages) @ 2019-12-01 12:16 UTC (permalink / raw)
To: Adrian Reber, Christian Brauner; +Cc: linux-man
Hi Adrian,
On 11/28/19 1:46 PM, Adrian Reber wrote:
> added two missing parentheses
>
> Signed-off-by: Adrian Reber <areber@redhat.com>
> ---
> man2/clone.2 | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/man2/clone.2 b/man2/clone.2
> index e5ab2a096..076b9258e 100644
> --- a/man2/clone.2
> +++ b/man2/clone.2
> @@ -1349,13 +1349,13 @@ For example, on aarch64,
> .I stack
> must be a multiple of 16.
> .TP
> -.BR EINVAL " (" clone3 "() only"
> +.BR EINVAL " (" clone3 "() only)"
> .B CLONE_DETACHED
> was specified in the
> .I flags
> mask.
> .TP
> -.BR EINVAL " (" clone "() only"
> +.BR EINVAL " (" clone "() only)"
> .B CLONE_PIDFD
> was specified together with
> .B CLONE_DETACHED
>
> base-commit: 445fc03eebeee9b0e9ccaa4e2361f91f30749e8e
Thanks. Applied.
Cheers,
Michael
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 2/2] clone.2: added clone3() set_tid information
2019-11-29 13:56 ` Adrian Reber
@ 2019-12-01 12:17 ` Michael Kerrisk (man-pages)
0 siblings, 0 replies; 10+ messages in thread
From: Michael Kerrisk (man-pages) @ 2019-12-01 12:17 UTC (permalink / raw)
To: Adrian Reber, Christian Brauner; +Cc: linux-man
Hi Adrian,
Thank you for the patch. I have some review comments below,
and also further comments in my reply to Christian's notes.
On 11/29/19 2:56 PM, Adrian Reber wrote:
> On Thu, Nov 28, 2019 at 06:24:05PM +0100, Christian Brauner wrote:
>> On Thu, Nov 28, 2019 at 01:46:50PM +0100, Adrian Reber wrote:
>>> Signed-off-by: Adrian Reber <areber@redhat.com>
>>> ---
>>> man2/clone.2 | 90 ++++++++++++++++++++++++++++++++++++++++++++++++++++
>>> 1 file changed, 90 insertions(+)
>>>
>>> diff --git a/man2/clone.2 b/man2/clone.2
>>> index 076b9258e..59c13ec35 100644
>>> --- a/man2/clone.2
>>> +++ b/man2/clone.2
>>> @@ -195,6 +195,8 @@ struct clone_args {
>>> u64 stack; /* Pointer to lowest byte of stack */
>>> u64 stack_size; /* Size of stack */
>>> u64 tls; /* Location of new TLS */
>>> + u64 set_tid; /* Pointer to a \fIpid_t\fP array */
>>> + u64 set_tid_size; /* Number of elements in \fIset_tid\fP */
>>> };
>>> .EE
>>> .in
>>> @@ -262,6 +264,8 @@ flags & 0xff exit_signal
>>> stack stack
>>> \fP---\fP stack_size
>>> tls tls See CLONE_SETTLS
>>> +\fP---\fP set_tid See below for details
>>> +\fP---\fP set_tid_size
>>> .TE
>>> .RE
>>> .\"
>>> @@ -285,6 +289,74 @@ options when waiting for the child with
>>> If no signal (i.e., zero) is specified, then the parent process is not signaled
>>> when the child terminates.
>>> .\"
>>> +.SS The set_tid array
>>> +.PP
>>> +The
>>> +.I set_tid
>>> +array is used to select a certain PID for the process to be created by
>>> +.BR clone3 ().
>>> +If the PID of the newly created process should only be set for the current
>>> +PID namespace or in the newly created PID namespace (if
>>> +.I flags
>>> +contains
>>> +.BR CLONE_NEWPID )
>>> +then the first element in the
>>> +.I set_tid
>>> +array has to be the desired PID and
>>> +.I set_tid_size
>>> +needs to be 1.
>>> +.PP
>>> +If the PID of the newly created process should have a certain value in
>>> +multiple PID namespaces the
>>> +.I set_tid
>>> +array can have multiple entries. The first entry defines the PID in the most
>>> +nested PID namespace and all following entries contain the PID of the
>>> +corresponding parent PID namespace. The number of PID namespaces in which a PID
>>> +should be set is defined by
>>> +.I set_tid_size
>>> +which cannot be larger than the number of currently nested PID namespaces.
>>
>> "It's upper cap is the kernel-enforced general nesting limit."
>> or sm like that
>
> Is that an addition to my sentence or a replacement. I think at this
> point it is more important to point out that it cannot be larger than
> the number of currently nested PID namespaces. Later (at EPERM) I am
> also mentioning that it cannot be larger than the maximum number of
> nested PID namespaces. The code does indeed check if set_tid_size is
> larger than the maximum number of possible nested PID namespaces for
> the user, I think, when calling clone3(), it is more relevant that
> set_tid_size is not larger than the number of currently nested PID
> namespaces. The maximum number of possible nested PID namespaces is more
> likely enforced during unshare() or CLONE_NEWPID (which could be
> happening at the same point in time as set_tid_size larger than maximum
> number of nested PID namespace).
>
> This definitely feels like too much discussion for a single sentence ;)
>
> I can add a sentence about the maximum number of nested PID namespaces
> here in addition to the one at EPERM.
> I do not think it is relevant for the user at this point in time.
I'm inclined to agree with you. The maximum nesting depth is a general
detail of PID NSs and is already covered in pid_namespaces((7). I think
there's no need to repeat or allude to that detial here the detail here.
Thanks,
Michael
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 2/2] clone.2: added clone3() set_tid information
2019-11-28 12:46 ` [PATCH 2/2] clone.2: added clone3() set_tid information Adrian Reber
2019-11-28 17:24 ` Christian Brauner
@ 2019-12-01 12:18 ` Michael Kerrisk (man-pages)
1 sibling, 0 replies; 10+ messages in thread
From: Michael Kerrisk (man-pages) @ 2019-12-01 12:18 UTC (permalink / raw)
To: Adrian Reber, Christian Brauner; +Cc: linux-man
Hello Adrian,
On 11/28/19 1:46 PM, Adrian Reber wrote:
> Signed-off-by: Adrian Reber <areber@redhat.com>
> ---
> man2/clone.2 | 90 ++++++++++++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 90 insertions(+)
>
> diff --git a/man2/clone.2 b/man2/clone.2
> index 076b9258e..59c13ec35 100644
> --- a/man2/clone.2
> +++ b/man2/clone.2
> @@ -195,6 +195,8 @@ struct clone_args {
> u64 stack; /* Pointer to lowest byte of stack */
> u64 stack_size; /* Size of stack */
> u64 tls; /* Location of new TLS */
> + u64 set_tid; /* Pointer to a \fIpid_t\fP array */
> + u64 set_tid_size; /* Number of elements in \fIset_tid\fP */
> };
> .EE
> .in
> @@ -262,6 +264,8 @@ flags & 0xff exit_signal
> stack stack
> \fP---\fP stack_size
> tls tls See CLONE_SETTLS
> +\fP---\fP set_tid See below for details
> +\fP---\fP set_tid_size
> .TE
> .RE
> .\"
> @@ -285,6 +289,74 @@ options when waiting for the child with
> If no signal (i.e., zero) is specified, then the parent process is not signaled
> when the child terminates.
> .\"
> +.SS The set_tid array
> +.PP
> +The
> +.I set_tid
> +array is used to select a certain PID for the process to be created by
s/is used/may be used/
Because it's not required to use this array, right? I mean, the
default is that the kernel chooses the PIDs. Perhaps this needs to be
more clearly stated at the start of this subsection.
How about:
[[
By default, the kernel chooses the next sequential PID for the new
process in each of the PID namespaces where it is present.
When creating a process with
.BR clone3 (),
the
.I set_tid
array can be used to select specific PIDs for the process in some
or all of the PID namespaces where it is present.
]]
?
> +.BR clone3 ().
> +If the PID of the newly created process should only be set for the current
> +PID namespace or in the newly created PID namespace (if
> +.I flags
> +contains
> +.BR CLONE_NEWPID )
> +then the first element in the
> +.I set_tid
> +array has to be the desired PID and
> +.I set_tid_size
> +needs to be 1.
> +.PP
> +If the PID of the newly created process should have a certain value in
> +multiple PID namespaces the
> +.I set_tid
> +array can have multiple entries. The first entry defines the PID in the most
most *deeply* nested
> +nested PID namespace and all following entries contain the PID of the
> +corresponding parent PID namespace. The number of PID namespaces in which a PID
> +should be set is defined by
> +.I set_tid_size
> +which cannot be larger than the number of currently nested PID namespaces.
> +.PP
> +To create a process with the following PIDs:
> +.RS
> +.TS
> +lb lb
> +l l .
> +PID NS level Requested PID
> +0 (host) 31496
> +1 42
> +2 7
> +.TE
> +.RE
> +.PP
> +The
> +.I set_tid
> +array would need to be filled with:
> +.PP
> +.EX
> + set_tid[0] = 7;
> + set_tid[1] = 42;
> + set_tid[2] = 31496;
> + set_tid_size = 3;
> +.EE
> +.PP
> +If only the PID of the two innermost PID namespaces
> +should be defined it needs to be set like this:
> +.PP
> +.EX
> + set_tid[0] = 7;
> + set_tid[1] = 42;
> + set_tid_size = 2;
> +.EE
> +.PP
> +The PID in the PID namespaces outside the two innermost PID namespaces
> +is then selected the same way as any other PID is selected.
> +.PP
> +Only a privileged process
> +.RB ( CAP_SYS_ADMIN )
> +can set
> +.I set_tid
> +to select a PID for the process to be created.
> +.\"
> .SS The flags mask
> .PP
> Both
> @@ -1379,6 +1451,16 @@ in the
> .I flags
> mask.
> .TP
> +.BR EINVAL " (" clone3 "() only)"
> +.I set_tid_size
> +larger than current number of nested PID namespaces or maximum number of
> +nested PID namespaces was specified.
> +.TP
> +.BR EINVAL " (" clone3 "() only)"
> +If one of the PIDs specified in
> +.I set_tid
> +was an invalid PID.
> +.TP
> .B ENOMEM
> Cannot allocate sufficient memory to allocate a task structure for the
> child, or to copy those parts of the caller's context that need to be
> @@ -1450,6 +1532,14 @@ mask and the caller is in a chroot environment
> (i.e., the caller's root directory does not match the root directory
> of the mount namespace in which it resides).
> .TP
> +.BR EPERM " (" clone3 "() only)"
> +If
> +.I set_tid
> +with
> +.I set_tid_size
> +larger than 0 was specified by an unprivileged process (process without
> +\fBCAP_SYS_ADMIN\fP).
> +.TP
> .BR ERESTARTNOINTR " (since Linux 2.6.17)"
> .\" commit 4a2c7a7837da1b91468e50426066d988050e4d56
> System call was interrupted by a signal and will be restarted.
Thanks,
Michael
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 2/2] clone.2: added clone3() set_tid information
2019-11-28 17:24 ` Christian Brauner
2019-11-29 13:56 ` Adrian Reber
@ 2019-12-01 12:18 ` Michael Kerrisk (man-pages)
2019-12-02 14:25 ` Adrian Reber
1 sibling, 1 reply; 10+ messages in thread
From: Michael Kerrisk (man-pages) @ 2019-12-01 12:18 UTC (permalink / raw)
To: Christian Brauner, Adrian Reber; +Cc: linux-man
Hello Christian, and Adrian,
Christian, thanks for the review!
On 11/28/19 6:24 PM, Christian Brauner wrote:
> On Thu, Nov 28, 2019 at 01:46:50PM +0100, Adrian Reber wrote:
>> Signed-off-by: Adrian Reber <areber@redhat.com>
>> ---
>> man2/clone.2 | 90 ++++++++++++++++++++++++++++++++++++++++++++++++++++
>> 1 file changed, 90 insertions(+)
>>
>> diff --git a/man2/clone.2 b/man2/clone.2
>> index 076b9258e..59c13ec35 100644
>> --- a/man2/clone.2
>> +++ b/man2/clone.2
>> @@ -195,6 +195,8 @@ struct clone_args {
>> u64 stack; /* Pointer to lowest byte of stack */
>> u64 stack_size; /* Size of stack */
>> u64 tls; /* Location of new TLS */
>> + u64 set_tid; /* Pointer to a \fIpid_t\fP array */
>> + u64 set_tid_size; /* Number of elements in \fIset_tid\fP */
>> };
>> .EE
>> .in
>> @@ -262,6 +264,8 @@ flags & 0xff exit_signal
>> stack stack
>> \fP---\fP stack_size
>> tls tls See CLONE_SETTLS
>> +\fP---\fP set_tid See below for details
>> +\fP---\fP set_tid_size
>> .TE
>> .RE
>> .\"
>> @@ -285,6 +289,74 @@ options when waiting for the child with
>> If no signal (i.e., zero) is specified, then the parent process is not signaled
>> when the child terminates.
>> .\"
>> +.SS The set_tid array
>> +.PP
>> +The
>> +.I set_tid
>> +array is used to select a certain PID for the process to be created by
>> +.BR clone3 ().
>> +If the PID of the newly created process should only be set for the current
>> +PID namespace or in the newly created PID namespace (if
>> +.I flags
>> +contains
>> +.BR CLONE_NEWPID )
>> +then the first element in the
>> +.I set_tid
>> +array has to be the desired PID and
>> +.I set_tid_size
>> +needs to be 1.
>> +.PP
>> +If the PID of the newly created process should have a certain value in
>> +multiple PID namespaces the
>> +.I set_tid
>> +array can have multiple entries. The first entry defines the PID in the most
>> +nested PID namespace and all following entries contain the PID of the
>> +corresponding parent PID namespace. The number of PID namespaces in which a PID
>> +should be set is defined by
>> +.I set_tid_size
>> +which cannot be larger than the number of currently nested PID namespaces.
>
> "It's upper cap is the kernel-enforced general nesting limit."
> or sm like that
See my comments in a separate mail where Adrian replied to the above.
>> +.PP
>> +To create a process with the following PIDs:
>
> in a pid namespace hierarchy
(okay)
>> +.RS
>> +.TS
>> +lb lb
>> +l l .
>> +PID NS level Requested PID
>> +0 (host) 31496
>> +1 42
>> +2 7
>> +.TE
>> +.RE
>> +.PP
>
Christian, you trim away the piece:
[[
+The
+.I set_tid
+array would need to be filled with:
]]
that you suggest replacing with:
> Set the array to:
I think the suggested change is fine, but when you trim the text
in this way (as you do a few times below, it makes it overly
difficult to review your suggestions, since I must separately
refer back to Adrian's original mail to see what is being
replaced (which is not comfortable on my 13" laptop
11km above Belgium :-)).
>> +.PP
>> +.EX
>> + set_tid[0] = 7;
>> + set_tid[1] = 42;
>> + set_tid[2] = 31496;
>> + set_tid_size = 3;
>> +.EE
>> +.PP
>> +If only the PID of the two innermost PID namespaces
s/PID of/PIDs in/
>
> need to be specified set the array to:
(okay; but see my comment about trimming above.)
>> +.PP
>> +.EX
>> + set_tid[0] = 7;
>> + set_tid[1] = 42;
>> + set_tid_size = 2;
>> +.EE
>> +.PP
>> +The PID in the PID namespaces outside the two innermost PID namespaces
>> +is then selected the same way as any other PID is selected.
>
> s/is then/will be/
(okay)
>> +.PP
>> +Only a privileged process
>> +.RB ( CAP_SYS_ADMIN )
>
> Maybe more specific:
>
> The set_tid feature requires CAP_SYS_ADMIN in all owning user namespaces
> of the target pid namespaces.
Excellent addition.
>> +can set
>> +.I set_tid
>> +to select a PID for the process to be created.
>> +.\"
>> .SS The flags mask
>> .PP
>> Both
>> @@ -1379,6 +1451,16 @@ in the
>> .I flags
>> mask.
>> .TP
>> +.BR EINVAL " (" clone3 "() only)"
>> +.I set_tid_size
>> +larger than current number of nested PID namespaces or maximum number of
>> +nested PID namespaces was specified.
The last piece of that sentence feels redundant to me, and it
could/should be removed, I think. The current number of nested PID
namespaces can't exceed the maximum number of nested namespaces, so
mentioning the latter as a limit seems unnecessary (and even a little
confusing, since it leaves the reader wondering whether "current"
could exceed "maximum".)
Or have I misunderstood what you intend to convey, Adrian?
>> +.TP
>> +.BR EINVAL " (" clone3 "() only)"
>> +If one of the PIDs specified in
>> +.I set_tid
>> +was an invalid PID.
>> +.TP
>> .B ENOMEM
>> Cannot allocate sufficient memory to allocate a task structure for the
>> child, or to copy those parts of the caller's context that need to be
>> @@ -1450,6 +1532,14 @@ mask and the caller is in a chroot environment
>> (i.e., the caller's root directory does not match the root directory
>> of the mount namespace in which it resides).
>> .TP
>> +.BR EPERM " (" clone3 "() only)"
>> +If
>> +.I set_tid
>> +with
>> +.I set_tid_size
>> +larger than 0 was specified by an unprivileged process (process without
>> +\fBCAP_SYS_ADMIN\fP).
>
> without CAP_SYS_ADMIN in _one_ of the owning user namespaces of the
> target pid namespace
>
> maybe?
Adrian, I think there's no need to mention 'set_tid'. That's implied
by 'set_tid_size' being greater than 0.
How about:
[[
.BR EPERM " (" clone3 "() only)"
.I set_tid_size
was greater than zero, and the caller lacks the
.B CAP_SYS_ADMIN
capability in one or more of the user namespaces that own the
corresponding PID namespaces.
]]
?
> Where's EEXIST?
Indeed!
I speculate (without having looked at the kernel code):
[[
.BR EPERM " (" clone3 "() only)"
One or more of the PIDs specified in
.I set_tid
already exists in the corresponding namespace.
]]
Thanks,
Michael
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 2/2] clone.2: added clone3() set_tid information
2019-12-01 12:18 ` Michael Kerrisk (man-pages)
@ 2019-12-02 14:25 ` Adrian Reber
0 siblings, 0 replies; 10+ messages in thread
From: Adrian Reber @ 2019-12-02 14:25 UTC (permalink / raw)
To: Michael Kerrisk (man-pages); +Cc: Christian Brauner, linux-man
Christian and Michael,
thanks for the reviews. I tried to apply all suggested changes in v2
which I will send out in a few minutes.
Adrian
^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2019-12-02 14:25 UTC | newest]
Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-11-28 12:46 [PATCH 1/2] clone.2: tfix Adrian Reber
2019-11-28 12:46 ` [PATCH 2/2] clone.2: added clone3() set_tid information Adrian Reber
2019-11-28 17:24 ` Christian Brauner
2019-11-29 13:56 ` Adrian Reber
2019-12-01 12:17 ` Michael Kerrisk (man-pages)
2019-12-01 12:18 ` Michael Kerrisk (man-pages)
2019-12-02 14:25 ` Adrian Reber
2019-12-01 12:18 ` Michael Kerrisk (man-pages)
2019-11-28 13:30 ` [PATCH 1/2] clone.2: tfix Christian Brauner
2019-12-01 12:16 ` Michael Kerrisk (man-pages)
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).