* [PATCH 1/2] clone.2: tfix @ 2019-11-28 12:46 Adrian Reber 2019-11-28 12:46 ` [PATCH 2/2] clone.2: added clone3() set_tid information Adrian Reber ` (2 more replies) 0 siblings, 3 replies; 10+ messages in thread From: Adrian Reber @ 2019-11-28 12:46 UTC (permalink / raw) To: mtk.manpages, Christian Brauner; +Cc: linux-man added two missing parentheses Signed-off-by: Adrian Reber <areber@redhat.com> --- man2/clone.2 | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/man2/clone.2 b/man2/clone.2 index e5ab2a096..076b9258e 100644 --- a/man2/clone.2 +++ b/man2/clone.2 @@ -1349,13 +1349,13 @@ For example, on aarch64, .I stack must be a multiple of 16. .TP -.BR EINVAL " (" clone3 "() only" +.BR EINVAL " (" clone3 "() only)" .B CLONE_DETACHED was specified in the .I flags mask. .TP -.BR EINVAL " (" clone "() only" +.BR EINVAL " (" clone "() only)" .B CLONE_PIDFD was specified together with .B CLONE_DETACHED base-commit: 445fc03eebeee9b0e9ccaa4e2361f91f30749e8e -- 2.23.0 ^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCH 2/2] clone.2: added clone3() set_tid information 2019-11-28 12:46 [PATCH 1/2] clone.2: tfix Adrian Reber @ 2019-11-28 12:46 ` Adrian Reber 2019-11-28 17:24 ` Christian Brauner 2019-12-01 12:18 ` Michael Kerrisk (man-pages) 2019-11-28 13:30 ` [PATCH 1/2] clone.2: tfix Christian Brauner 2019-12-01 12:16 ` Michael Kerrisk (man-pages) 2 siblings, 2 replies; 10+ messages in thread From: Adrian Reber @ 2019-11-28 12:46 UTC (permalink / raw) To: mtk.manpages, Christian Brauner; +Cc: linux-man Signed-off-by: Adrian Reber <areber@redhat.com> --- man2/clone.2 | 90 ++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 90 insertions(+) diff --git a/man2/clone.2 b/man2/clone.2 index 076b9258e..59c13ec35 100644 --- a/man2/clone.2 +++ b/man2/clone.2 @@ -195,6 +195,8 @@ struct clone_args { u64 stack; /* Pointer to lowest byte of stack */ u64 stack_size; /* Size of stack */ u64 tls; /* Location of new TLS */ + u64 set_tid; /* Pointer to a \fIpid_t\fP array */ + u64 set_tid_size; /* Number of elements in \fIset_tid\fP */ }; .EE .in @@ -262,6 +264,8 @@ flags & 0xff exit_signal stack stack \fP---\fP stack_size tls tls See CLONE_SETTLS +\fP---\fP set_tid See below for details +\fP---\fP set_tid_size .TE .RE .\" @@ -285,6 +289,74 @@ options when waiting for the child with If no signal (i.e., zero) is specified, then the parent process is not signaled when the child terminates. .\" +.SS The set_tid array +.PP +The +.I set_tid +array is used to select a certain PID for the process to be created by +.BR clone3 (). +If the PID of the newly created process should only be set for the current +PID namespace or in the newly created PID namespace (if +.I flags +contains +.BR CLONE_NEWPID ) +then the first element in the +.I set_tid +array has to be the desired PID and +.I set_tid_size +needs to be 1. +.PP +If the PID of the newly created process should have a certain value in +multiple PID namespaces the +.I set_tid +array can have multiple entries. The first entry defines the PID in the most +nested PID namespace and all following entries contain the PID of the +corresponding parent PID namespace. The number of PID namespaces in which a PID +should be set is defined by +.I set_tid_size +which cannot be larger than the number of currently nested PID namespaces. +.PP +To create a process with the following PIDs: +.RS +.TS +lb lb +l l . +PID NS level Requested PID +0 (host) 31496 +1 42 +2 7 +.TE +.RE +.PP +The +.I set_tid +array would need to be filled with: +.PP +.EX + set_tid[0] = 7; + set_tid[1] = 42; + set_tid[2] = 31496; + set_tid_size = 3; +.EE +.PP +If only the PID of the two innermost PID namespaces +should be defined it needs to be set like this: +.PP +.EX + set_tid[0] = 7; + set_tid[1] = 42; + set_tid_size = 2; +.EE +.PP +The PID in the PID namespaces outside the two innermost PID namespaces +is then selected the same way as any other PID is selected. +.PP +Only a privileged process +.RB ( CAP_SYS_ADMIN ) +can set +.I set_tid +to select a PID for the process to be created. +.\" .SS The flags mask .PP Both @@ -1379,6 +1451,16 @@ in the .I flags mask. .TP +.BR EINVAL " (" clone3 "() only)" +.I set_tid_size +larger than current number of nested PID namespaces or maximum number of +nested PID namespaces was specified. +.TP +.BR EINVAL " (" clone3 "() only)" +If one of the PIDs specified in +.I set_tid +was an invalid PID. +.TP .B ENOMEM Cannot allocate sufficient memory to allocate a task structure for the child, or to copy those parts of the caller's context that need to be @@ -1450,6 +1532,14 @@ mask and the caller is in a chroot environment (i.e., the caller's root directory does not match the root directory of the mount namespace in which it resides). .TP +.BR EPERM " (" clone3 "() only)" +If +.I set_tid +with +.I set_tid_size +larger than 0 was specified by an unprivileged process (process without +\fBCAP_SYS_ADMIN\fP). +.TP .BR ERESTARTNOINTR " (since Linux 2.6.17)" .\" commit 4a2c7a7837da1b91468e50426066d988050e4d56 System call was interrupted by a signal and will be restarted. -- 2.23.0 ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH 2/2] clone.2: added clone3() set_tid information 2019-11-28 12:46 ` [PATCH 2/2] clone.2: added clone3() set_tid information Adrian Reber @ 2019-11-28 17:24 ` Christian Brauner 2019-11-29 13:56 ` Adrian Reber 2019-12-01 12:18 ` Michael Kerrisk (man-pages) 2019-12-01 12:18 ` Michael Kerrisk (man-pages) 1 sibling, 2 replies; 10+ messages in thread From: Christian Brauner @ 2019-11-28 17:24 UTC (permalink / raw) To: Adrian Reber; +Cc: mtk.manpages, linux-man On Thu, Nov 28, 2019 at 01:46:50PM +0100, Adrian Reber wrote: > Signed-off-by: Adrian Reber <areber@redhat.com> > --- > man2/clone.2 | 90 ++++++++++++++++++++++++++++++++++++++++++++++++++++ > 1 file changed, 90 insertions(+) > > diff --git a/man2/clone.2 b/man2/clone.2 > index 076b9258e..59c13ec35 100644 > --- a/man2/clone.2 > +++ b/man2/clone.2 > @@ -195,6 +195,8 @@ struct clone_args { > u64 stack; /* Pointer to lowest byte of stack */ > u64 stack_size; /* Size of stack */ > u64 tls; /* Location of new TLS */ > + u64 set_tid; /* Pointer to a \fIpid_t\fP array */ > + u64 set_tid_size; /* Number of elements in \fIset_tid\fP */ > }; > .EE > .in > @@ -262,6 +264,8 @@ flags & 0xff exit_signal > stack stack > \fP---\fP stack_size > tls tls See CLONE_SETTLS > +\fP---\fP set_tid See below for details > +\fP---\fP set_tid_size > .TE > .RE > .\" > @@ -285,6 +289,74 @@ options when waiting for the child with > If no signal (i.e., zero) is specified, then the parent process is not signaled > when the child terminates. > .\" > +.SS The set_tid array > +.PP > +The > +.I set_tid > +array is used to select a certain PID for the process to be created by > +.BR clone3 (). > +If the PID of the newly created process should only be set for the current > +PID namespace or in the newly created PID namespace (if > +.I flags > +contains > +.BR CLONE_NEWPID ) > +then the first element in the > +.I set_tid > +array has to be the desired PID and > +.I set_tid_size > +needs to be 1. > +.PP > +If the PID of the newly created process should have a certain value in > +multiple PID namespaces the > +.I set_tid > +array can have multiple entries. The first entry defines the PID in the most > +nested PID namespace and all following entries contain the PID of the > +corresponding parent PID namespace. The number of PID namespaces in which a PID > +should be set is defined by > +.I set_tid_size > +which cannot be larger than the number of currently nested PID namespaces. "It's upper cap is the kernel-enforced general nesting limit." or sm like that > +.PP > +To create a process with the following PIDs: in a pid namespace hierarchy > +.RS > +.TS > +lb lb > +l l . > +PID NS level Requested PID > +0 (host) 31496 > +1 42 > +2 7 > +.TE > +.RE > +.PP Set the array to: > +.PP > +.EX > + set_tid[0] = 7; > + set_tid[1] = 42; > + set_tid[2] = 31496; > + set_tid_size = 3; > +.EE > +.PP > +If only the PID of the two innermost PID namespaces need to be specified set the array to: > +.PP > +.EX > + set_tid[0] = 7; > + set_tid[1] = 42; > + set_tid_size = 2; > +.EE > +.PP > +The PID in the PID namespaces outside the two innermost PID namespaces > +is then selected the same way as any other PID is selected. s/is then/will be/ > +.PP > +Only a privileged process > +.RB ( CAP_SYS_ADMIN ) Maybe more specific: The set_tid feature requires CAP_SYS_ADMIN in all owning user namespaces of the target pid namespaces. > +can set > +.I set_tid > +to select a PID for the process to be created. > +.\" > .SS The flags mask > .PP > Both > @@ -1379,6 +1451,16 @@ in the > .I flags > mask. > .TP > +.BR EINVAL " (" clone3 "() only)" > +.I set_tid_size > +larger than current number of nested PID namespaces or maximum number of > +nested PID namespaces was specified. > +.TP > +.BR EINVAL " (" clone3 "() only)" > +If one of the PIDs specified in > +.I set_tid > +was an invalid PID. > +.TP > .B ENOMEM > Cannot allocate sufficient memory to allocate a task structure for the > child, or to copy those parts of the caller's context that need to be > @@ -1450,6 +1532,14 @@ mask and the caller is in a chroot environment > (i.e., the caller's root directory does not match the root directory > of the mount namespace in which it resides). > .TP > +.BR EPERM " (" clone3 "() only)" > +If > +.I set_tid > +with > +.I set_tid_size > +larger than 0 was specified by an unprivileged process (process without > +\fBCAP_SYS_ADMIN\fP). without CAP_SYS_ADMIN in _one_ of the owning user namespaces of the target pid namespace maybe? Where's EEXIST? Christian ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 2/2] clone.2: added clone3() set_tid information 2019-11-28 17:24 ` Christian Brauner @ 2019-11-29 13:56 ` Adrian Reber 2019-12-01 12:17 ` Michael Kerrisk (man-pages) 2019-12-01 12:18 ` Michael Kerrisk (man-pages) 1 sibling, 1 reply; 10+ messages in thread From: Adrian Reber @ 2019-11-29 13:56 UTC (permalink / raw) To: Christian Brauner; +Cc: mtk.manpages, linux-man On Thu, Nov 28, 2019 at 06:24:05PM +0100, Christian Brauner wrote: > On Thu, Nov 28, 2019 at 01:46:50PM +0100, Adrian Reber wrote: > > Signed-off-by: Adrian Reber <areber@redhat.com> > > --- > > man2/clone.2 | 90 ++++++++++++++++++++++++++++++++++++++++++++++++++++ > > 1 file changed, 90 insertions(+) > > > > diff --git a/man2/clone.2 b/man2/clone.2 > > index 076b9258e..59c13ec35 100644 > > --- a/man2/clone.2 > > +++ b/man2/clone.2 > > @@ -195,6 +195,8 @@ struct clone_args { > > u64 stack; /* Pointer to lowest byte of stack */ > > u64 stack_size; /* Size of stack */ > > u64 tls; /* Location of new TLS */ > > + u64 set_tid; /* Pointer to a \fIpid_t\fP array */ > > + u64 set_tid_size; /* Number of elements in \fIset_tid\fP */ > > }; > > .EE > > .in > > @@ -262,6 +264,8 @@ flags & 0xff exit_signal > > stack stack > > \fP---\fP stack_size > > tls tls See CLONE_SETTLS > > +\fP---\fP set_tid See below for details > > +\fP---\fP set_tid_size > > .TE > > .RE > > .\" > > @@ -285,6 +289,74 @@ options when waiting for the child with > > If no signal (i.e., zero) is specified, then the parent process is not signaled > > when the child terminates. > > .\" > > +.SS The set_tid array > > +.PP > > +The > > +.I set_tid > > +array is used to select a certain PID for the process to be created by > > +.BR clone3 (). > > +If the PID of the newly created process should only be set for the current > > +PID namespace or in the newly created PID namespace (if > > +.I flags > > +contains > > +.BR CLONE_NEWPID ) > > +then the first element in the > > +.I set_tid > > +array has to be the desired PID and > > +.I set_tid_size > > +needs to be 1. > > +.PP > > +If the PID of the newly created process should have a certain value in > > +multiple PID namespaces the > > +.I set_tid > > +array can have multiple entries. The first entry defines the PID in the most > > +nested PID namespace and all following entries contain the PID of the > > +corresponding parent PID namespace. The number of PID namespaces in which a PID > > +should be set is defined by > > +.I set_tid_size > > +which cannot be larger than the number of currently nested PID namespaces. > > "It's upper cap is the kernel-enforced general nesting limit." > or sm like that Is that an addition to my sentence or a replacement. I think at this point it is more important to point out that it cannot be larger than the number of currently nested PID namespaces. Later (at EPERM) I am also mentioning that it cannot be larger than the maximum number of nested PID namespaces. The code does indeed check if set_tid_size is larger than the maximum number of possible nested PID namespaces for the user, I think, when calling clone3(), it is more relevant that set_tid_size is not larger than the number of currently nested PID namespaces. The maximum number of possible nested PID namespaces is more likely enforced during unshare() or CLONE_NEWPID (which could be happening at the same point in time as set_tid_size larger than maximum number of nested PID namespace). This definitely feels like too much discussion for a single sentence ;) I can add a sentence about the maximum number of nested PID namespaces here in addition to the one at EPERM. I do not think it is relevant for the user at this point in time. Adrian ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 2/2] clone.2: added clone3() set_tid information 2019-11-29 13:56 ` Adrian Reber @ 2019-12-01 12:17 ` Michael Kerrisk (man-pages) 0 siblings, 0 replies; 10+ messages in thread From: Michael Kerrisk (man-pages) @ 2019-12-01 12:17 UTC (permalink / raw) To: Adrian Reber, Christian Brauner; +Cc: linux-man Hi Adrian, Thank you for the patch. I have some review comments below, and also further comments in my reply to Christian's notes. On 11/29/19 2:56 PM, Adrian Reber wrote: > On Thu, Nov 28, 2019 at 06:24:05PM +0100, Christian Brauner wrote: >> On Thu, Nov 28, 2019 at 01:46:50PM +0100, Adrian Reber wrote: >>> Signed-off-by: Adrian Reber <areber@redhat.com> >>> --- >>> man2/clone.2 | 90 ++++++++++++++++++++++++++++++++++++++++++++++++++++ >>> 1 file changed, 90 insertions(+) >>> >>> diff --git a/man2/clone.2 b/man2/clone.2 >>> index 076b9258e..59c13ec35 100644 >>> --- a/man2/clone.2 >>> +++ b/man2/clone.2 >>> @@ -195,6 +195,8 @@ struct clone_args { >>> u64 stack; /* Pointer to lowest byte of stack */ >>> u64 stack_size; /* Size of stack */ >>> u64 tls; /* Location of new TLS */ >>> + u64 set_tid; /* Pointer to a \fIpid_t\fP array */ >>> + u64 set_tid_size; /* Number of elements in \fIset_tid\fP */ >>> }; >>> .EE >>> .in >>> @@ -262,6 +264,8 @@ flags & 0xff exit_signal >>> stack stack >>> \fP---\fP stack_size >>> tls tls See CLONE_SETTLS >>> +\fP---\fP set_tid See below for details >>> +\fP---\fP set_tid_size >>> .TE >>> .RE >>> .\" >>> @@ -285,6 +289,74 @@ options when waiting for the child with >>> If no signal (i.e., zero) is specified, then the parent process is not signaled >>> when the child terminates. >>> .\" >>> +.SS The set_tid array >>> +.PP >>> +The >>> +.I set_tid >>> +array is used to select a certain PID for the process to be created by >>> +.BR clone3 (). >>> +If the PID of the newly created process should only be set for the current >>> +PID namespace or in the newly created PID namespace (if >>> +.I flags >>> +contains >>> +.BR CLONE_NEWPID ) >>> +then the first element in the >>> +.I set_tid >>> +array has to be the desired PID and >>> +.I set_tid_size >>> +needs to be 1. >>> +.PP >>> +If the PID of the newly created process should have a certain value in >>> +multiple PID namespaces the >>> +.I set_tid >>> +array can have multiple entries. The first entry defines the PID in the most >>> +nested PID namespace and all following entries contain the PID of the >>> +corresponding parent PID namespace. The number of PID namespaces in which a PID >>> +should be set is defined by >>> +.I set_tid_size >>> +which cannot be larger than the number of currently nested PID namespaces. >> >> "It's upper cap is the kernel-enforced general nesting limit." >> or sm like that > > Is that an addition to my sentence or a replacement. I think at this > point it is more important to point out that it cannot be larger than > the number of currently nested PID namespaces. Later (at EPERM) I am > also mentioning that it cannot be larger than the maximum number of > nested PID namespaces. The code does indeed check if set_tid_size is > larger than the maximum number of possible nested PID namespaces for > the user, I think, when calling clone3(), it is more relevant that > set_tid_size is not larger than the number of currently nested PID > namespaces. The maximum number of possible nested PID namespaces is more > likely enforced during unshare() or CLONE_NEWPID (which could be > happening at the same point in time as set_tid_size larger than maximum > number of nested PID namespace). > > This definitely feels like too much discussion for a single sentence ;) > > I can add a sentence about the maximum number of nested PID namespaces > here in addition to the one at EPERM. > I do not think it is relevant for the user at this point in time. I'm inclined to agree with you. The maximum nesting depth is a general detail of PID NSs and is already covered in pid_namespaces((7). I think there's no need to repeat or allude to that detial here the detail here. Thanks, Michael ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 2/2] clone.2: added clone3() set_tid information 2019-11-28 17:24 ` Christian Brauner 2019-11-29 13:56 ` Adrian Reber @ 2019-12-01 12:18 ` Michael Kerrisk (man-pages) 2019-12-02 14:25 ` Adrian Reber 1 sibling, 1 reply; 10+ messages in thread From: Michael Kerrisk (man-pages) @ 2019-12-01 12:18 UTC (permalink / raw) To: Christian Brauner, Adrian Reber; +Cc: linux-man Hello Christian, and Adrian, Christian, thanks for the review! On 11/28/19 6:24 PM, Christian Brauner wrote: > On Thu, Nov 28, 2019 at 01:46:50PM +0100, Adrian Reber wrote: >> Signed-off-by: Adrian Reber <areber@redhat.com> >> --- >> man2/clone.2 | 90 ++++++++++++++++++++++++++++++++++++++++++++++++++++ >> 1 file changed, 90 insertions(+) >> >> diff --git a/man2/clone.2 b/man2/clone.2 >> index 076b9258e..59c13ec35 100644 >> --- a/man2/clone.2 >> +++ b/man2/clone.2 >> @@ -195,6 +195,8 @@ struct clone_args { >> u64 stack; /* Pointer to lowest byte of stack */ >> u64 stack_size; /* Size of stack */ >> u64 tls; /* Location of new TLS */ >> + u64 set_tid; /* Pointer to a \fIpid_t\fP array */ >> + u64 set_tid_size; /* Number of elements in \fIset_tid\fP */ >> }; >> .EE >> .in >> @@ -262,6 +264,8 @@ flags & 0xff exit_signal >> stack stack >> \fP---\fP stack_size >> tls tls See CLONE_SETTLS >> +\fP---\fP set_tid See below for details >> +\fP---\fP set_tid_size >> .TE >> .RE >> .\" >> @@ -285,6 +289,74 @@ options when waiting for the child with >> If no signal (i.e., zero) is specified, then the parent process is not signaled >> when the child terminates. >> .\" >> +.SS The set_tid array >> +.PP >> +The >> +.I set_tid >> +array is used to select a certain PID for the process to be created by >> +.BR clone3 (). >> +If the PID of the newly created process should only be set for the current >> +PID namespace or in the newly created PID namespace (if >> +.I flags >> +contains >> +.BR CLONE_NEWPID ) >> +then the first element in the >> +.I set_tid >> +array has to be the desired PID and >> +.I set_tid_size >> +needs to be 1. >> +.PP >> +If the PID of the newly created process should have a certain value in >> +multiple PID namespaces the >> +.I set_tid >> +array can have multiple entries. The first entry defines the PID in the most >> +nested PID namespace and all following entries contain the PID of the >> +corresponding parent PID namespace. The number of PID namespaces in which a PID >> +should be set is defined by >> +.I set_tid_size >> +which cannot be larger than the number of currently nested PID namespaces. > > "It's upper cap is the kernel-enforced general nesting limit." > or sm like that See my comments in a separate mail where Adrian replied to the above. >> +.PP >> +To create a process with the following PIDs: > > in a pid namespace hierarchy (okay) >> +.RS >> +.TS >> +lb lb >> +l l . >> +PID NS level Requested PID >> +0 (host) 31496 >> +1 42 >> +2 7 >> +.TE >> +.RE >> +.PP > Christian, you trim away the piece: [[ +The +.I set_tid +array would need to be filled with: ]] that you suggest replacing with: > Set the array to: I think the suggested change is fine, but when you trim the text in this way (as you do a few times below, it makes it overly difficult to review your suggestions, since I must separately refer back to Adrian's original mail to see what is being replaced (which is not comfortable on my 13" laptop 11km above Belgium :-)). >> +.PP >> +.EX >> + set_tid[0] = 7; >> + set_tid[1] = 42; >> + set_tid[2] = 31496; >> + set_tid_size = 3; >> +.EE >> +.PP >> +If only the PID of the two innermost PID namespaces s/PID of/PIDs in/ > > need to be specified set the array to: (okay; but see my comment about trimming above.) >> +.PP >> +.EX >> + set_tid[0] = 7; >> + set_tid[1] = 42; >> + set_tid_size = 2; >> +.EE >> +.PP >> +The PID in the PID namespaces outside the two innermost PID namespaces >> +is then selected the same way as any other PID is selected. > > s/is then/will be/ (okay) >> +.PP >> +Only a privileged process >> +.RB ( CAP_SYS_ADMIN ) > > Maybe more specific: > > The set_tid feature requires CAP_SYS_ADMIN in all owning user namespaces > of the target pid namespaces. Excellent addition. >> +can set >> +.I set_tid >> +to select a PID for the process to be created. >> +.\" >> .SS The flags mask >> .PP >> Both >> @@ -1379,6 +1451,16 @@ in the >> .I flags >> mask. >> .TP >> +.BR EINVAL " (" clone3 "() only)" >> +.I set_tid_size >> +larger than current number of nested PID namespaces or maximum number of >> +nested PID namespaces was specified. The last piece of that sentence feels redundant to me, and it could/should be removed, I think. The current number of nested PID namespaces can't exceed the maximum number of nested namespaces, so mentioning the latter as a limit seems unnecessary (and even a little confusing, since it leaves the reader wondering whether "current" could exceed "maximum".) Or have I misunderstood what you intend to convey, Adrian? >> +.TP >> +.BR EINVAL " (" clone3 "() only)" >> +If one of the PIDs specified in >> +.I set_tid >> +was an invalid PID. >> +.TP >> .B ENOMEM >> Cannot allocate sufficient memory to allocate a task structure for the >> child, or to copy those parts of the caller's context that need to be >> @@ -1450,6 +1532,14 @@ mask and the caller is in a chroot environment >> (i.e., the caller's root directory does not match the root directory >> of the mount namespace in which it resides). >> .TP >> +.BR EPERM " (" clone3 "() only)" >> +If >> +.I set_tid >> +with >> +.I set_tid_size >> +larger than 0 was specified by an unprivileged process (process without >> +\fBCAP_SYS_ADMIN\fP). > > without CAP_SYS_ADMIN in _one_ of the owning user namespaces of the > target pid namespace > > maybe? Adrian, I think there's no need to mention 'set_tid'. That's implied by 'set_tid_size' being greater than 0. How about: [[ .BR EPERM " (" clone3 "() only)" .I set_tid_size was greater than zero, and the caller lacks the .B CAP_SYS_ADMIN capability in one or more of the user namespaces that own the corresponding PID namespaces. ]] ? > Where's EEXIST? Indeed! I speculate (without having looked at the kernel code): [[ .BR EPERM " (" clone3 "() only)" One or more of the PIDs specified in .I set_tid already exists in the corresponding namespace. ]] Thanks, Michael ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 2/2] clone.2: added clone3() set_tid information 2019-12-01 12:18 ` Michael Kerrisk (man-pages) @ 2019-12-02 14:25 ` Adrian Reber 0 siblings, 0 replies; 10+ messages in thread From: Adrian Reber @ 2019-12-02 14:25 UTC (permalink / raw) To: Michael Kerrisk (man-pages); +Cc: Christian Brauner, linux-man Christian and Michael, thanks for the reviews. I tried to apply all suggested changes in v2 which I will send out in a few minutes. Adrian ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 2/2] clone.2: added clone3() set_tid information 2019-11-28 12:46 ` [PATCH 2/2] clone.2: added clone3() set_tid information Adrian Reber 2019-11-28 17:24 ` Christian Brauner @ 2019-12-01 12:18 ` Michael Kerrisk (man-pages) 1 sibling, 0 replies; 10+ messages in thread From: Michael Kerrisk (man-pages) @ 2019-12-01 12:18 UTC (permalink / raw) To: Adrian Reber, Christian Brauner; +Cc: linux-man Hello Adrian, On 11/28/19 1:46 PM, Adrian Reber wrote: > Signed-off-by: Adrian Reber <areber@redhat.com> > --- > man2/clone.2 | 90 ++++++++++++++++++++++++++++++++++++++++++++++++++++ > 1 file changed, 90 insertions(+) > > diff --git a/man2/clone.2 b/man2/clone.2 > index 076b9258e..59c13ec35 100644 > --- a/man2/clone.2 > +++ b/man2/clone.2 > @@ -195,6 +195,8 @@ struct clone_args { > u64 stack; /* Pointer to lowest byte of stack */ > u64 stack_size; /* Size of stack */ > u64 tls; /* Location of new TLS */ > + u64 set_tid; /* Pointer to a \fIpid_t\fP array */ > + u64 set_tid_size; /* Number of elements in \fIset_tid\fP */ > }; > .EE > .in > @@ -262,6 +264,8 @@ flags & 0xff exit_signal > stack stack > \fP---\fP stack_size > tls tls See CLONE_SETTLS > +\fP---\fP set_tid See below for details > +\fP---\fP set_tid_size > .TE > .RE > .\" > @@ -285,6 +289,74 @@ options when waiting for the child with > If no signal (i.e., zero) is specified, then the parent process is not signaled > when the child terminates. > .\" > +.SS The set_tid array > +.PP > +The > +.I set_tid > +array is used to select a certain PID for the process to be created by s/is used/may be used/ Because it's not required to use this array, right? I mean, the default is that the kernel chooses the PIDs. Perhaps this needs to be more clearly stated at the start of this subsection. How about: [[ By default, the kernel chooses the next sequential PID for the new process in each of the PID namespaces where it is present. When creating a process with .BR clone3 (), the .I set_tid array can be used to select specific PIDs for the process in some or all of the PID namespaces where it is present. ]] ? > +.BR clone3 (). > +If the PID of the newly created process should only be set for the current > +PID namespace or in the newly created PID namespace (if > +.I flags > +contains > +.BR CLONE_NEWPID ) > +then the first element in the > +.I set_tid > +array has to be the desired PID and > +.I set_tid_size > +needs to be 1. > +.PP > +If the PID of the newly created process should have a certain value in > +multiple PID namespaces the > +.I set_tid > +array can have multiple entries. The first entry defines the PID in the most most *deeply* nested > +nested PID namespace and all following entries contain the PID of the > +corresponding parent PID namespace. The number of PID namespaces in which a PID > +should be set is defined by > +.I set_tid_size > +which cannot be larger than the number of currently nested PID namespaces. > +.PP > +To create a process with the following PIDs: > +.RS > +.TS > +lb lb > +l l . > +PID NS level Requested PID > +0 (host) 31496 > +1 42 > +2 7 > +.TE > +.RE > +.PP > +The > +.I set_tid > +array would need to be filled with: > +.PP > +.EX > + set_tid[0] = 7; > + set_tid[1] = 42; > + set_tid[2] = 31496; > + set_tid_size = 3; > +.EE > +.PP > +If only the PID of the two innermost PID namespaces > +should be defined it needs to be set like this: > +.PP > +.EX > + set_tid[0] = 7; > + set_tid[1] = 42; > + set_tid_size = 2; > +.EE > +.PP > +The PID in the PID namespaces outside the two innermost PID namespaces > +is then selected the same way as any other PID is selected. > +.PP > +Only a privileged process > +.RB ( CAP_SYS_ADMIN ) > +can set > +.I set_tid > +to select a PID for the process to be created. > +.\" > .SS The flags mask > .PP > Both > @@ -1379,6 +1451,16 @@ in the > .I flags > mask. > .TP > +.BR EINVAL " (" clone3 "() only)" > +.I set_tid_size > +larger than current number of nested PID namespaces or maximum number of > +nested PID namespaces was specified. > +.TP > +.BR EINVAL " (" clone3 "() only)" > +If one of the PIDs specified in > +.I set_tid > +was an invalid PID. > +.TP > .B ENOMEM > Cannot allocate sufficient memory to allocate a task structure for the > child, or to copy those parts of the caller's context that need to be > @@ -1450,6 +1532,14 @@ mask and the caller is in a chroot environment > (i.e., the caller's root directory does not match the root directory > of the mount namespace in which it resides). > .TP > +.BR EPERM " (" clone3 "() only)" > +If > +.I set_tid > +with > +.I set_tid_size > +larger than 0 was specified by an unprivileged process (process without > +\fBCAP_SYS_ADMIN\fP). > +.TP > .BR ERESTARTNOINTR " (since Linux 2.6.17)" > .\" commit 4a2c7a7837da1b91468e50426066d988050e4d56 > System call was interrupted by a signal and will be restarted. Thanks, Michael ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 1/2] clone.2: tfix 2019-11-28 12:46 [PATCH 1/2] clone.2: tfix Adrian Reber 2019-11-28 12:46 ` [PATCH 2/2] clone.2: added clone3() set_tid information Adrian Reber @ 2019-11-28 13:30 ` Christian Brauner 2019-12-01 12:16 ` Michael Kerrisk (man-pages) 2 siblings, 0 replies; 10+ messages in thread From: Christian Brauner @ 2019-11-28 13:30 UTC (permalink / raw) To: Adrian Reber; +Cc: mtk.manpages, linux-man On Thu, Nov 28, 2019 at 01:46:49PM +0100, Adrian Reber wrote: > added two missing parentheses > > Signed-off-by: Adrian Reber <areber@redhat.com> Seems reasonable to me. Acked-by: Christian Brauner <christian.brauner@ubuntu.com> ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 1/2] clone.2: tfix 2019-11-28 12:46 [PATCH 1/2] clone.2: tfix Adrian Reber 2019-11-28 12:46 ` [PATCH 2/2] clone.2: added clone3() set_tid information Adrian Reber 2019-11-28 13:30 ` [PATCH 1/2] clone.2: tfix Christian Brauner @ 2019-12-01 12:16 ` Michael Kerrisk (man-pages) 2 siblings, 0 replies; 10+ messages in thread From: Michael Kerrisk (man-pages) @ 2019-12-01 12:16 UTC (permalink / raw) To: Adrian Reber, Christian Brauner; +Cc: linux-man Hi Adrian, On 11/28/19 1:46 PM, Adrian Reber wrote: > added two missing parentheses > > Signed-off-by: Adrian Reber <areber@redhat.com> > --- > man2/clone.2 | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/man2/clone.2 b/man2/clone.2 > index e5ab2a096..076b9258e 100644 > --- a/man2/clone.2 > +++ b/man2/clone.2 > @@ -1349,13 +1349,13 @@ For example, on aarch64, > .I stack > must be a multiple of 16. > .TP > -.BR EINVAL " (" clone3 "() only" > +.BR EINVAL " (" clone3 "() only)" > .B CLONE_DETACHED > was specified in the > .I flags > mask. > .TP > -.BR EINVAL " (" clone "() only" > +.BR EINVAL " (" clone "() only)" > .B CLONE_PIDFD > was specified together with > .B CLONE_DETACHED > > base-commit: 445fc03eebeee9b0e9ccaa4e2361f91f30749e8e Thanks. Applied. Cheers, Michael ^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2019-12-02 14:25 UTC | newest] Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2019-11-28 12:46 [PATCH 1/2] clone.2: tfix Adrian Reber 2019-11-28 12:46 ` [PATCH 2/2] clone.2: added clone3() set_tid information Adrian Reber 2019-11-28 17:24 ` Christian Brauner 2019-11-29 13:56 ` Adrian Reber 2019-12-01 12:17 ` Michael Kerrisk (man-pages) 2019-12-01 12:18 ` Michael Kerrisk (man-pages) 2019-12-02 14:25 ` Adrian Reber 2019-12-01 12:18 ` Michael Kerrisk (man-pages) 2019-11-28 13:30 ` [PATCH 1/2] clone.2: tfix Christian Brauner 2019-12-01 12:16 ` Michael Kerrisk (man-pages)
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).