linux-kernel.vger.kernel.org archive mirror
* [PATCH 0/2] Update man pages for prctl and kcmp syscall
@ 2012-02-29 12:23 Cyrill Gorcunov
  2012-02-29 12:23 ` [PATCH 1/2] prctl: Add PR_SET_MM option description Cyrill Gorcunov
  2012-02-29 12:23 ` [PATCH 2/2] Add kcmp.2 manpage Cyrill Gorcunov
  0 siblings, 2 replies; 15+ messages in thread
From: Cyrill Gorcunov @ 2012-02-29 12:23 UTC (permalink / raw)
  To: Michael Kerrisk; +Cc: Andrew Morton, Pavel Emelyanov, linux-man, LKML

Hi,

here is a first draft describing the prctl extension and the new
kcmp syscall. I would _really_ appreciate any feedback.

Cyrill



* [PATCH 1/2] prctl: Add PR_SET_MM option description
  2012-02-29 12:23 [PATCH 0/2] Update man pages for prctl and kcmp syscall Cyrill Gorcunov
@ 2012-02-29 12:23 ` Cyrill Gorcunov
  2012-03-06 18:00   ` Michael Kerrisk (man-pages)
  2012-04-15  3:48   ` Michael Kerrisk (man-pages)
  2012-02-29 12:23 ` [PATCH 2/2] Add kcmp.2 manpage Cyrill Gorcunov
  1 sibling, 2 replies; 15+ messages in thread
From: Cyrill Gorcunov @ 2012-02-29 12:23 UTC (permalink / raw)
  To: Michael Kerrisk
  Cc: Andrew Morton, Pavel Emelyanov, linux-man, LKML, Cyrill Gorcunov,
	Tejun Heo

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
CC: Tejun Heo <tj@kernel.org>
CC: Pavel Emelyanov <xemul@parallels.com>
---
 man2/prctl.2 |  104 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 104 insertions(+), 0 deletions(-)

diff --git a/man2/prctl.2 b/man2/prctl.2
index effad2a..4d6244f 100644
--- a/man2/prctl.2
+++ b/man2/prctl.2
@@ -378,6 +378,110 @@ Return the current per-process machine check kill policy.
 All unused
 .BR prctl ()
 arguments must be zero.
+.TP
+.BR PR_SET_MM " (since Linux 3.3)"
+Allows a user to modify certain kernel memory map descriptor fields
+of the calling process.
+Usually these fields are set by the kernel and dynamic loader (see
+.BR ld.so (8)
+for more information) and a regular application should not use this feature.
+Still there are cases such as self-modifying programs, where a program might
+find it useful to change its own memory map.
+The kernel must be built with
+.BR CONFIG_CHECKPOINT_RESTORE
+option turned on, otherwise this feature will not be accessible
+from a user space level.
+The calling process must have
+.BR CAP_SYS_ADMIN
+(see
+.BR capabilities (7)
+for details) capability granted.
+The value in
+.I arg2
+is one of the options below, while
+.I arg3
+provides a new value for this option.
+
+.BR PR_SET_MM_START_CODE
+to set the address above which program text can run.
+The corresponding memory area must be readable and executable,
+but not writable or shareable (see
+.BR mprotect (2)
+and
+.BR mmap (2)
+for more information).
+
+.BR PR_SET_MM_END_CODE
+to set the address below which program text can run.
+The corresponding memory area must be readable and executable,
+but not writable or shareable.
+
+.BR PR_SET_MM_START_DATA
+to set the address above which program data+bss is placed.
+The corresponding memory area must be readable and writable,
+but not executable or shareable.
+
+.B PR_SET_MM_END_DATA
+to set the address below which program data+bss is placed.
+The corresponding memory area must be readable and writable,
+but not executable or shareable.
+
+.BR PR_SET_MM_START_STACK
+to set the start address of the stack.
+The corresponding memory area must be readable and writable.
+
+.BR PR_SET_MM_START_BRK
+to set the address above which program heap can be expanded with
+.BR brk (2)
+call.
+The address must be greater than the ending address of
+the current program data segment, and it must stay within the
+resource limit for data (see
+.BR setrlimit (2)
+for more information).
+
+.BR PR_SET_MM_BRK
+to set the current
+.BR brk (2)
+value.
+The requirements for address are the same as for
+.BR PR_SET_MM_START_BRK
+option.
+
+.BR PR_SET_MM_ARG_START
+to set the address above which program command line is placed.
+
+.BR PR_SET_MM_ARG_END
+to set the address below which program command line is placed.
+
+.BR PR_SET_MM_ENV_START
+to set the address above which program environment is placed.
+
+.BR PR_SET_MM_ENV_END
+to set the address below which program environment is placed.
+
+The address passed with
+.BR PR_SET_MM_ARG_START ,
+.BR PR_SET_MM_ARG_END ,
+.BR PR_SET_MM_ENV_START ,
+.BR PR_SET_MM_ENV_END ,
+should belong to a process stack area, thus corresponding memory area
+must be readable, writable and (depending on the kernel
+configuration) have the
+.BR MAP_GROWSDOWN
+attribute set (see
+.BR mmap (2)
+for details).
+
+.BR PR_SET_MM_AUXV
+to set a new auxiliary vector.
+The
+.I arg3
+argument should provide the address of the vector.
+The
+.I arg4
+is the size of the vector.
+.\"
 .SH "RETURN VALUE"
 On success,
 .BR PR_GET_DUMPABLE ,
-- 
1.7.7.6



* [PATCH 2/2] Add kcmp.2 manpage
  2012-02-29 12:23 [PATCH 0/2] Update man pages for prctl and kcmp syscall Cyrill Gorcunov
  2012-02-29 12:23 ` [PATCH 1/2] prctl: Add PR_SET_MM option description Cyrill Gorcunov
@ 2012-02-29 12:23 ` Cyrill Gorcunov
  2012-02-29 12:34   ` Cyrill Gorcunov
  1 sibling, 1 reply; 15+ messages in thread
From: Cyrill Gorcunov @ 2012-02-29 12:23 UTC (permalink / raw)
  To: Michael Kerrisk
  Cc: Andrew Morton, Pavel Emelyanov, linux-man, LKML, Cyrill Gorcunov,
	Eric W. Biederman, H. Peter Anvin

NAME
       kcmp - compare if two processes do share a particular kernel resource


SYNOPSIS
       #define _GNU_SOURCE         /* See feature_test_macros(7) */
       #include <unistd.h>
       #include <linux/kcmp.h>
       #include <sys/syscall.h>   /* For SYS_xxx definitions */

       int syscall(__NR_kcmp, pid1, pid2, type, idx1, idx2);


DESCRIPTION
       kcmp() allows to find out if two processes identified by pid1
       and pid2 do share kernel resources such as virtual memory, file
       descriptors, file system etc.

       The comparison type is one of the following

       KCMP_FILE to compare two file descriptors specified by idx1 and idx2

       KCMP_VM to compare whether processes do share virtual memory

       KCMP_FILES to compare whether processes do share the file descriptor table

       KCMP_FS to compare whether processes do share the file system information

       KCMP_SIGHAND to compare whether processes do share a signal handlers table

       KCMP_IO to compare whether processes do share I/O context

       KCMP_SYSVSEM to compare whether processes do share a single list of System V
       semaphore undo values


RETURN VALUE
       kcmp was designed to return values suitable for sorting.  This is particularly
       handy when one has to compare a large number of file descriptors.

       The return value is merely a result of simple arithmetic comparison of kernel
       pointers (when kernel compares resources, it uses their memory addresses).

       The  easiest way to explain is to consider an example.  Lets say v1 and v2
       are the addresses of appropriate resources, then the return value is one
       of the following

       0 - v1 is equal to v2 , in other words we have a shared resource here

       1 - v1 is greater than v2

       2 - v1 is less than v2

       3 - v1 is not equal to v2, but ordering information is unavailable.

       On error, -1 is returned, and errno is set appropriately.


CONFORMING TO
       kcmp() is Linux specific and should not be used in programs intended to
       be portable.

SEE ALSO
       clone(2)

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
CC: "Eric W. Biederman" <ebiederm@xmission.com>
CC: "H. Peter Anvin" <hpa@zytor.com>
CC: Pavel Emelyanov <xemul@parallels.com>
---
 man2/kcmp.2 |  105 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 105 insertions(+), 0 deletions(-)
 create mode 100644 man2/kcmp.2

diff --git a/man2/kcmp.2 b/man2/kcmp.2
new file mode 100644
index 0000000..de07367
--- /dev/null
+++ b/man2/kcmp.2
@@ -0,0 +1,105 @@
+.TH KCMP 2 2012-02-01 "Linux" "Linux Programmer's Manual"
+
+.SH NAME
+kcmp \- compare if two processes do share a particular kernel resource
+
+.SH SYNOPSIS
+.nf
+.BR "#define _GNU_SOURCE" "         /* See feature_test_macros(7) */"
+.B #include <unistd.h>
+.B #include <linux/kcmp.h>
+.BR "#include <sys/syscall.h>   "  "/* For SYS_xxx definitions */"
+
+.BI "int syscall(__NR_kcmp, pid1, pid2, type, idx1, idx2);"
+.fi
+
+.SH DESCRIPTION
+
+.BR kcmp ()
+allows to find out if two processes identified by
+.I pid1
+and
+.I pid2
+do share kernel resources such as virtual memory, file descriptors,
+file system etc.
+
+The comparison
+.I type
+is one of the following
+
+.BR KCMP_FILE
+to compare two file descriptors specified by
+.I idx1
+and
+.I idx2
+
+.BR KCMP_VM
+to compare whether processes do share virtual memory
+
+.BR KCMP_FILES
+to compare whether processes do share the file descriptor table
+
+.BR KCMP_FS
+to compare whether processes do share the file system information
+
+.BR KCMP_SIGHAND
+to compare whether processes do share a signal handlers table
+
+.BR KCMP_IO
+to compare whether processes do share I/O context
+
+.BR KCMP_SYSVSEM
+to compare whether processes do share a single list of
+System V semaphore undo values
+
+.SH "RETURN VALUE"
+.B kcmp
+was designed to return values suitable for sorting.
+This is particularly handy when one has to compare
+a large number of file descriptors.
+
+The return value is merely a result of simple arithmetic comparison
+of kernel pointers (when kernel compares resources, it uses their
+memory addresses).
+
+The easiest way to explain is to consider an example.
+Lets say
+.I v1
+and
+.I v2
+are the addresses of appropriate resources, then the return value
+is one of the following
+
+.B 0
+\-
+.I v1
+is equal to
+.I v2
+, in other words we have a shared resource here
+
+.B 1
+\-
+.I v1
+is greater than
+.I v2
+
+.B 2
+\-
+.I v1
+is less than
+.I v2
+
+.B 3
+\-
+.I v1
+is not equal to
+.I v2
+, but ordering information is unavailable.
+
+On error, \-1 is returned, and errno is set appropriately.
+
+.SH "CONFORMING TO"
+.BR kcmp ()
+is Linux specific and should not be used in programs intended to be portable.
+.SH "SEE ALSO"
+.BR clone (2)
-- 
1.7.7.6



* Re: [PATCH 2/2] Add kcmp.2 manpage
  2012-02-29 12:23 ` [PATCH 2/2] Add kcmp.2 manpage Cyrill Gorcunov
@ 2012-02-29 12:34   ` Cyrill Gorcunov
  2012-02-29 12:41     ` Cyrill Gorcunov
  0 siblings, 1 reply; 15+ messages in thread
From: Cyrill Gorcunov @ 2012-02-29 12:34 UTC (permalink / raw)
  To: Michael Kerrisk, Andrew Morton, Pavel Emelyanov, linux-man, LKML,
	Eric W. Biederman, H. Peter Anvin

On Wed, Feb 29, 2012 at 04:23:17PM +0400, Cyrill Gorcunov wrote:
> 
>        The  easiest way to explain is to consider an example.  Lets say v1 and v2
>        are the addresses of appropriate resources, then the return value is one
>        of the following
> 
>        0 - v1 is equal to v2 , in other words we have a shared resource here
> 
>        1 - v1 is greater than v2
> 
>        2 - v1 is less than v2

1 and 2 should be swapped here; I'll update (this nit grew out of shuffling
the text around, so don't pay attention to it).

	Cyrill


* Re: [PATCH 2/2] Add kcmp.2 manpage
  2012-02-29 12:34   ` Cyrill Gorcunov
@ 2012-02-29 12:41     ` Cyrill Gorcunov
  0 siblings, 0 replies; 15+ messages in thread
From: Cyrill Gorcunov @ 2012-02-29 12:41 UTC (permalink / raw)
  To: Michael Kerrisk, Andrew Morton, Pavel Emelyanov, linux-man, LKML,
	Eric W. Biederman, H. Peter Anvin

On Wed, Feb 29, 2012 at 04:34:08PM +0400, Cyrill Gorcunov wrote:
> 1 and 2 should be swapped here, I'll update (this nit grow from text tossing,
> so don't pay attention on it).
> 

Updated version below.

	Cyrill
---
From: Cyrill Gorcunov <gorcunov@openvz.org>
Date: Wed, 29 Feb 2012 16:40:13 +0400
Subject: [PATCH 2/2] Add kcmp.2 manpage

NAME
       kcmp - compare if two processes do share a particular kernel resource


SYNOPSIS
       #define _GNU_SOURCE         /* See feature_test_macros(7) */
       #include <unistd.h>
       #include <linux/kcmp.h>
       #include <sys/syscall.h>   /* For SYS_xxx definitions */

       int syscall(__NR_kcmp, pid1, pid2, type, idx1, idx2);


DESCRIPTION
       kcmp() allows to find out if two processes identified by pid1
       and pid2 do share kernel resources such as virtual memory, file
       descriptors, file system etc.

       The comparison type is one of the following

       KCMP_FILE to compare two file descriptors specified by idx1 and idx2

       KCMP_VM to compare whether processes do share virtual memory

       KCMP_FILES to compare whether processes do share the file descriptor table

       KCMP_FS to compare whether processes do share the file system information

       KCMP_SIGHAND to compare whether processes do share a signal handlers table

       KCMP_IO to compare whether processes do share I/O context

       KCMP_SYSVSEM to compare whether processes do share a single list of System V
       semaphore undo values


RETURN VALUE
       kcmp was designed to return values suitable for sorting.  This is particularly
       handy when one has to compare a large number of file descriptors.

       The return value is merely a result of simple arithmetic comparison of kernel
       pointers (when kernel compares resources, it uses their memory addresses).

       The  easiest way to explain is to consider an example.  Lets say v1 and v2
       are the addresses of appropriate resources, then the return value is one
       of the following

       0 - v1 is equal to v2 , in other words we have a shared resource here

       1 - v1 is less than v2

       2 - v1 is greater than v2

       3 - v1 is not equal to v2, but ordering information is unavailable.

       On error, -1 is returned, and errno is set appropriately.


CONFORMING TO
       kcmp() is Linux specific and should not be used in programs intended to
       be portable.

SEE ALSO
       clone(2)
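
The description above can be sketched as a small user-space wrapper. This is
only an illustration under stated assumptions, not part of the proposed page:
the names kcmp_call(), compare_own_fds() and describe_kcmp() are hypothetical,
the headers are assumed to provide SYS_kcmp and KCMP_FILE, and the return
value mapping follows the updated list above (1 is "less than", 2 is "greater
than"). The call can also fail, for example when the kernel lacks the syscall
or a sandbox forbids it, so -1 is handled as well.

```c
#define _GNU_SOURCE
#include <linux/kcmp.h>     /* KCMP_FILE and the other comparison types */
#include <string.h>
#include <sys/syscall.h>    /* SYS_kcmp */
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical thin wrapper around the raw syscall. */
long kcmp_call(pid_t pid1, pid_t pid2, int type,
               unsigned long idx1, unsigned long idx2)
{
    return syscall(SYS_kcmp, pid1, pid2, type, idx1, idx2);
}

/*
 * Hypothetical helper: compare two descriptors of the calling process.
 * Returns 0 if both refer to the same open file, 1 or 2 for an ordering,
 * 3 if no ordering is available, and -1 on error (errno is set).
 */
int compare_own_fds(int fd1, int fd2)
{
    pid_t self = getpid();

    return (int)kcmp_call(self, self, KCMP_FILE,
                          (unsigned long)fd1, (unsigned long)fd2);
}

/* Map a kcmp() result to the wording used in RETURN VALUE above. */
const char *describe_kcmp(int ret)
{
    switch (ret) {
    case 0:  return "shared";
    case 1:  return "v1 is less than v2";
    case 2:  return "v1 is greater than v2";
    case 3:  return "not equal, ordering unavailable";
    default: return "error";
    }
}
```

A caller could, for instance, dup(2) a descriptor and pass both numbers to
compare_own_fds(); on a kernel that supports the call the expected result
is 0, because a duplicated descriptor refers to the same open file.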

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
CC: "Eric W. Biederman" <ebiederm@xmission.com>
CC: "H. Peter Anvin" <hpa@zytor.com>
CC: Pavel Emelyanov <xemul@parallels.com>
---
 man2/kcmp.2 |  105 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 105 insertions(+), 0 deletions(-)
 create mode 100644 man2/kcmp.2

diff --git a/man2/kcmp.2 b/man2/kcmp.2
new file mode 100644
index 0000000..de68109
--- /dev/null
+++ b/man2/kcmp.2
@@ -0,0 +1,105 @@
+.TH KCMP 2 2012-02-01 "Linux" "Linux Programmer's Manual"
+
+.SH NAME
+kcmp \- compare if two processes do share a particular kernel resource
+
+.SH SYNOPSIS
+.nf
+.BR "#define _GNU_SOURCE" "         /* See feature_test_macros(7) */"
+.B #include <unistd.h>
+.B #include <linux/kcmp.h>
+.BR "#include <sys/syscall.h>   "  "/* For SYS_xxx definitions */"
+
+.BI "int syscall(__NR_kcmp, pid1, pid2, type, idx1, idx2);"
+.fi
+
+.SH DESCRIPTION
+
+.BR kcmp ()
+allows to find out if two processes identified by
+.I pid1
+and
+.I pid2
+do share kernel resources such as virtual memory, file descriptors,
+file system etc.
+
+The comparison
+.I type
+is one of the following
+
+.BR KCMP_FILE
+to compare two file descriptors specified by
+.I idx1
+and
+.I idx2
+
+.BR KCMP_VM
+to compare whether processes do share virtual memory
+
+.BR KCMP_FILES
+to compare whether processes do share the file descriptor table
+
+.BR KCMP_FS
+to compare whether processes do share the file system information
+
+.BR KCMP_SIGHAND
+to compare whether processes do share a signal handlers table
+
+.BR KCMP_IO
+to compare whether processes do share I/O context
+
+.BR KCMP_SYSVSEM
+to compare whether processes do share a single list of
+System V semaphore undo values
+
+.SH "RETURN VALUE"
+.B kcmp
+was designed to return values suitable for sorting.
+This is particularly handy when one has to compare
+a large number of file descriptors.
+
+The return value is merely a result of simple arithmetic comparison
+of kernel pointers (when kernel compares resources, it uses their
+memory addresses).
+
+The easiest way to explain is to consider an example.
+Lets say
+.I v1
+and
+.I v2
+are the addresses of appropriate resources, then the return value
+is one of the following
+
+.B 0
+\-
+.I v1
+is equal to
+.I v2
+, in other words we have a shared resource here
+
+.B 1
+\-
+.I v1
+is less than
+.I v2
+
+.B 2
+\-
+.I v1
+is greater than
+.I v2
+
+.B 3
+\-
+.I v1
+is not equal to
+.I v2
+, but ordering information is unavailable.
+
+On error, \-1 is returned, and errno is set appropriately.
+
+.SH "CONFORMING TO"
+.BR kcmp ()
+is Linux specific and should not be used in programs intended to be portable.
+.SH "SEE ALSO"
+.BR clone (2)
-- 
1.7.7.6



* Re: [PATCH 1/2] prctl: Add PR_SET_MM option description
  2012-02-29 12:23 ` [PATCH 1/2] prctl: Add PR_SET_MM option description Cyrill Gorcunov
@ 2012-03-06 18:00   ` Michael Kerrisk (man-pages)
  2012-03-06 18:22     ` Cyrill Gorcunov
  2012-04-15  3:48   ` Michael Kerrisk (man-pages)
  1 sibling, 1 reply; 15+ messages in thread
From: Michael Kerrisk (man-pages) @ 2012-03-06 18:00 UTC (permalink / raw)
  To: Cyrill Gorcunov
  Cc: Andrew Morton, Pavel Emelyanov, linux-man, LKML, Tejun Heo

Hi Cyrill,

Just a couple of comments for the moment.

On Thu, Mar 1, 2012 at 1:23 AM, Cyrill Gorcunov <gorcunov@openvz.org> wrote:
> Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
> CC: Tejun Heo <tj@kernel.org>
> CC: Pavel Emelyanov <xemul@parallels.com>
> ---
>  man2/prctl.2 |  104 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 files changed, 104 insertions(+), 0 deletions(-)
>
> diff --git a/man2/prctl.2 b/man2/prctl.2
> index effad2a..4d6244f 100644
> --- a/man2/prctl.2
> +++ b/man2/prctl.2
> @@ -378,6 +378,110 @@ Return the current per-process machine check kill policy.
>  All unused
>  .BR prctl ()
>  arguments must be zero.
> +.TP
> +.BR PR_SET_MM " (since Linux 3.3)"
> +Allows a user to modify certain kernel memory map descriptor fields
> +of the calling process.
> +Usually these fields are set by the kernel and dynamic loader (see
> +.BR ld.so (8)
> +for more information) and a regular application should not use this feature.
> +Still there are cases such as self-modifying programs, where a program might
> +find it useful to change its own memory map.

By the way, do you have a *simple* program that demonstrates some
usage of PR_SET_MM?

> +The kernel must be built with
> +.BR CONFIG_CHECKPOINT_RESTORE
> +option turned on, otherwise this feature will not be accessible
> +from a user space level.
> +The calling process must have
> +.BR CAP_SYS_ADMIN
> +(see
> +.BR capabilities (7)
> +for details) capability granted.

As we discussed earlier (offlist), there are probably better choices
than the hugely overloaded CAP_SYS_ADMIN (see
http://man7.org/linux/man-pages/man7/capabilities.7.html). And if the
capability governing PR_SET_MM is to change, then it would be good to
do so before 3.3 is released. What are the plans on this point?

Cheers,

Michael


> +The value in
> +.I arg2
> +is one of the options below, while
> +.I arg3
> +provides a new value for this option.
> +
> +.BR PR_SET_MM_START_CODE
> +to set the address above which program text can run.
> +The corresponding memory area must be readable and executable,
> +but not writable or shareable (see
> +.BR mprotect (2)
> +and
> +.BR mmap (2)
> +for more information).
> +
> +.BR PR_SET_MM_END_CODE
> +to set the address below which program text can run.
> +The corresponding memory area must be readable and executable,
> +but not writable or shareable.
> +
> +.BR PR_SET_MM_START_DATA
> +to set the address above which program data+bss is placed.
> +The corresponding memory area must be readable and writable,
> +but not executable or shareable.
> +
> +.B PR_SET_MM_END_DATA
> +to set the address below which program data+bss is placed.
> +The corresponding memory area must be readable and writable,
> +but not executable or shareable.
> +
> +.BR PR_SET_MM_START_STACK
> +to set the start address of the stack.
> +The corresponding memory area must be readable and writable.
> +
> +.BR PR_SET_MM_START_BRK
> +to set the address above which program heap can be expanded with
> +.BR brk (2)
> +call.
> +The address must not be greater than ending address of
> +the current program data segment, neither it may exceed
> +resource limit for data (see
> +.BR setrlimit (2)
> +for more information).
> +
> +.BR PR_SET_MM_BRK
> +to set the current
> +.BR brk (2)
> +value.
> +The requirements for address are the same as for
> +.BR PR_SET_MM_START_BRK
> +option.
> +
> +.BR PR_SET_MM_ARG_START
> +to set the address above which program command line is placed.
> +
> +.BR PR_SET_MM_ARG_END
> +to set the address below which program command line is placed.
> +
> +.BR PR_SET_MM_ENV_START
> +to set the address above which program environment is placed.
> +
> +.BR PR_SET_MM_ENV_END
> +to set the address below which program environment is placed.
> +
> +The address passed with
> +.BR PR_SET_MM_ARG_START ,
> +.BR PR_SET_MM_ARG_END ,
> +.BR PR_SET_MM_ENV_START ,
> +.BR PR_SET_MM_ENV_END ,
> +should belong to a process stack area, thus corresponding memory area
> +must be readable, writable and (depending on the kernel
> +configuration) has
> +.BR MAP_GROWSDOWN
> +attribute set (see
> +.BR mmap (2)
> +for details).
> +
> +.BR PR_SET_MM_AUXV
> +to set a new auxiliary vector.
> +The
> +.I arg3
> +argument should provide the address of the vector.
> +The
> +.I arg4
> +is the size of the vector.
> +.\"
>  .SH "RETURN VALUE"
>  On success,
>  .BR PR_GET_DUMPABLE ,
> --
> 1.7.7.6
>



-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Author of "The Linux Programming Interface"; http://man7.org/tlpi/


* Re: [PATCH 1/2] prctl: Add PR_SET_MM option description
  2012-03-06 18:00   ` Michael Kerrisk (man-pages)
@ 2012-03-06 18:22     ` Cyrill Gorcunov
  2012-03-06 19:52       ` Michael Kerrisk (man-pages)
  0 siblings, 1 reply; 15+ messages in thread
From: Cyrill Gorcunov @ 2012-03-06 18:22 UTC (permalink / raw)
  To: Michael Kerrisk (man-pages)
  Cc: Andrew Morton, Pavel Emelyanov, linux-man, LKML, Tejun Heo

On Wed, Mar 07, 2012 at 07:00:14AM +1300, Michael Kerrisk (man-pages) wrote:
> Hi Cyrill,
> 
> Just a couple of comments for the moment.
> 
> On Thu, Mar 1, 2012 at 1:23 AM, Cyrill Gorcunov <gorcunov@openvz.org> wrote:
> > Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
> > CC: Tejun Heo <tj@kernel.org>
> > CC: Pavel Emelyanov <xemul@parallels.com>
> > ---
> >  man2/prctl.2 |  104 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >  1 files changed, 104 insertions(+), 0 deletions(-)
> >
> > diff --git a/man2/prctl.2 b/man2/prctl.2
> > index effad2a..4d6244f 100644
> > --- a/man2/prctl.2
> > +++ b/man2/prctl.2
> > @@ -378,6 +378,110 @@ Return the current per-process machine check kill policy.
> >  All unused
> >  .BR prctl ()
> >  arguments must be zero.
> > +.TP
> > +.BR PR_SET_MM " (since Linux 3.3)"
> > +Allows a user to modify certain kernel memory map descriptor fields
> > +of the calling process.
> > +Usually these fields are set by the kernel and dynamic loader (see
> > +.BR ld.so (8)
> > +for more information) and a regular application should not use this feature.
> > +Still there are cases such as self-modifying programs, where a program might
> > +find it useful to change its own memory map.
> 
> By the way, do you have a *simple* program that demonstrates some
> usage of PR_SET_MM?

Hi Michael,

well, at the moment only crtools itself uses this facility,
so if we need a complete standalone example I need to think about it.
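
For illustration only, a minimal standalone sketch could look like the
following. It is not taken from crtools. The helper names
set_brk_to_current() and report_set_brk() are hypothetical, the fallback
constants assume the values from <linux/prctl.h>, and the call is expected to
fail on kernels without the feature or for callers lacking the required
capability. The sketch re-sets the program break to the value it already has,
so a successful call changes nothing observable.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/prctl.h>
#include <unistd.h>

/* Fallbacks for older userspace headers (assumed values from <linux/prctl.h>). */
#ifndef PR_SET_MM
#define PR_SET_MM 35
#endif
#ifndef PR_SET_MM_BRK
#define PR_SET_MM_BRK 7
#endif

/*
 * Hypothetical helper: ask the kernel to set the brk value to what it
 * already is. Returns 0 on success, -1 on failure with errno set.
 */
int set_brk_to_current(void)
{
    void *cur = sbrk(0);    /* current program break */

    if (cur == (void *)-1)
        return -1;

    /* arg2 selects the field, arg3 is the new value, arg4 and arg5 are zero */
    return prctl(PR_SET_MM, PR_SET_MM_BRK,
                 (unsigned long)(uintptr_t)cur, 0UL, 0UL);
}

/* Hypothetical helper: run the call and report the outcome on stdout. */
int report_set_brk(void)
{
    int ret = set_brk_to_current();

    if (ret == 0)
        printf("PR_SET_MM_BRK accepted\n");
    else
        printf("PR_SET_MM_BRK failed: %s\n", strerror(errno));
    return ret;
}
```

Calling report_set_brk() from main() is enough to try it; an unprivileged run
would be expected to print a failure message rather than change anything.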

> 
> > +The kernel must be built with
> > +.BR CONFIG_CHECKPOINT_RESTORE
> > +option turned on, otherwise this feature will not be accessible
> > +from a user space level.
> > +The calling process must have
> > +.BR CAP_SYS_ADMIN
> > +(see
> > +.BR capabilities (7)
> > +for details) capability granted.
> 
> As we discussed earlier (offlist), there are probably better choices
> than the hugely overloaded CAP_SYS_ADMIN (see
> http://man7.org/linux/man-pages/man7/capabilities.7.html). And if the
> capability governing PR_SET_MM is to change, then it would be good to
> do so before 3.3 is released. What are the plans on this point?
> 

Yeah, I thought about changing it to CAP_SYS_RESOURCE here.
And I'll post a patch. The problem at the moment is that there are another
two snippets needed for prctl -- the ability to set a new /proc/pid/exe
symlink and to obtain clear-tid-address. There is a discussion going on
now about the symlink change. Once we finish with it, I'll post
an update for the capability.

If you prefer to have it done earlier -- no problem, I'll cook
a patch today instead on top of everything we've already
merged into linux-next. What would you prefer?


	Cyrill


* Re: [PATCH 1/2] prctl: Add PR_SET_MM option description
  2012-03-06 18:22     ` Cyrill Gorcunov
@ 2012-03-06 19:52       ` Michael Kerrisk (man-pages)
  2012-03-06 20:01         ` Cyrill Gorcunov
  0 siblings, 1 reply; 15+ messages in thread
From: Michael Kerrisk (man-pages) @ 2012-03-06 19:52 UTC (permalink / raw)
  To: Cyrill Gorcunov
  Cc: Andrew Morton, Pavel Emelyanov, linux-man, LKML, Tejun Heo

On Wed, Mar 7, 2012 at 7:22 AM, Cyrill Gorcunov <gorcunov@openvz.org> wrote:
> On Wed, Mar 07, 2012 at 07:00:14AM +1300, Michael Kerrisk (man-pages) wrote:
>> Hi Cyrill,
>>
>> Just a couple of comments for the moment.
>>
>> On Thu, Mar 1, 2012 at 1:23 AM, Cyrill Gorcunov <gorcunov@openvz.org> wrote:
>> > Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
>> > CC: Tejun Heo <tj@kernel.org>
>> > CC: Pavel Emelyanov <xemul@parallels.com>
>> > ---
>> >  man2/prctl.2 |  104 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> >  1 files changed, 104 insertions(+), 0 deletions(-)
>> >
>> > diff --git a/man2/prctl.2 b/man2/prctl.2
>> > index effad2a..4d6244f 100644
>> > --- a/man2/prctl.2
>> > +++ b/man2/prctl.2
>> > @@ -378,6 +378,110 @@ Return the current per-process machine check kill policy.
>> >  All unused
>> >  .BR prctl ()
>> >  arguments must be zero.
>> > +.TP
>> > +.BR PR_SET_MM " (since Linux 3.3)"
>> > +Allows a user to modify certain kernel memory map descriptor fields
>> > +of the calling process.
>> > +Usually these fields are set by the kernel and dynamic loader (see
>> > +.BR ld.so (8)
>> > +for more information) and a regular application should not use this feature.
>> > +Still there are cases such as self-modifying programs, where a program might
>> > +find it useful to change its own memory map.
>>
>> By the way, do you have a *simple* program that demonstrates some
>> usage of PR_SET_MM?
>
> Hi Michael,
>
> well, at moment we've only crtools itself which uses this facility,
> so if we need complete standalone example I need to think about it.
>
>>
>> > +The kernel must be built with
>> > +.BR CONFIG_CHECKPOINT_RESTORE
>> > +option turned on, otherwise this feature will not be accessible
>> > +from a user space level.
>> > +The calling process must have
>> > +.BR CAP_SYS_ADMIN
>> > +(see
>> > +.BR capabilities (7)
>> > +for details) capability granted.
>>
>> As we discussed earlier (offlist), there are probably better choices
>> than the hugely overloaded CAP_SYS_ADMIN (see
>> http://man7.org/linux/man-pages/man7/capabilities.7.html). And if the
>> capability governing PR_SET_MM is to change, then it would be good to
>> do so before 3.3 is released. What are the plans on this point?
>>
>
> Yeah, I thought about changing it to CAP_SYS_RESOURCE here.
> And I'll post a patch. The problem at moment that there another
> two snippets needed for prctl -- ability to set new /proc/pid/exe
> symlink and to obtaine clear-tid-address. So there is a discussion
> now about symlink change. Once we finish with it -- i'll post
> update for capability.
>
> If you prefer to have it done earlier -- no problem, I'll cook
> a patch today instead on top of everything we've already
> merged into linux-next. What would you prefer?

It would make sense if the capability requirements were finalized
before 3.3 is released. Changing them after 3.3 creates (at least a
little) pain for userspace.

Thanks,

Michael


-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Author of "The Linux Programming Interface"; http://man7.org/tlpi/


* Re: [PATCH 1/2] prctl: Add PR_SET_MM option description
  2012-03-06 19:52       ` Michael Kerrisk (man-pages)
@ 2012-03-06 20:01         ` Cyrill Gorcunov
  2012-03-06 20:07           ` Michael Kerrisk (man-pages)
  0 siblings, 1 reply; 15+ messages in thread
From: Cyrill Gorcunov @ 2012-03-06 20:01 UTC (permalink / raw)
  To: Michael Kerrisk (man-pages)
  Cc: Andrew Morton, Pavel Emelyanov, linux-man, LKML, Tejun Heo

On Wed, Mar 07, 2012 at 08:52:30AM +1300, Michael Kerrisk (man-pages) wrote:
...
> >
> > If you prefer to have it done earlier -- no problem, I'll cook
> > a patch today instead on top of everything we've already
> > merged into linux-next. What would you prefer?
> 
> It would make sense if the capability requirements were finalized
> before 3.3 is released. Changing them after 3.3 creates (at least a
> little) pain for userspace.
> 

OK. I'll update and send a patch out.

	Cyrill


* Re: [PATCH 1/2] prctl: Add PR_SET_MM option description
  2012-03-06 20:01         ` Cyrill Gorcunov
@ 2012-03-06 20:07           ` Michael Kerrisk (man-pages)
  2012-03-06 20:16             ` Cyrill Gorcunov
  0 siblings, 1 reply; 15+ messages in thread
From: Michael Kerrisk (man-pages) @ 2012-03-06 20:07 UTC (permalink / raw)
  To: Cyrill Gorcunov
  Cc: Andrew Morton, Pavel Emelyanov, linux-man, LKML, Tejun Heo

On Wed, Mar 7, 2012 at 9:01 AM, Cyrill Gorcunov <gorcunov@openvz.org> wrote:
> On Wed, Mar 07, 2012 at 08:52:30AM +1300, Michael Kerrisk (man-pages) wrote:
> ...
>> >
>> > If you prefer to have it done earlier -- no problem, I'll cook
>> > a patch today instead on top of everything we've already
>> > merged into linux-next. What would you prefer?
>>
>> It would make sense if the capability requirements were finalized
>> before 3.3 is released. Changing them after 3.3 creates (at least a
>> little) pain for userspace.
>>
>
> OK. I'll update and send a patch out.

Take a look at http://man7.org/linux/man-pages/man7/capabilities.7.html

The two most obvious alternatives are CAP_SYS_RESOURCE and
CAP_SYS_NICE. Maybe CAP_SYS_NICE is better? I say this because of the
(slight) similarity to existing operations in the CAP_SYS_NICE list.

Cheers,

Michael


-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Author of "The Linux Programming Interface"; http://man7.org/tlpi/


* Re: [PATCH 1/2] prctl: Add PR_SET_MM option description
  2012-03-06 20:07           ` Michael Kerrisk (man-pages)
@ 2012-03-06 20:16             ` Cyrill Gorcunov
  0 siblings, 0 replies; 15+ messages in thread
From: Cyrill Gorcunov @ 2012-03-06 20:16 UTC (permalink / raw)
  To: Michael Kerrisk (man-pages)
  Cc: Andrew Morton, Pavel Emelyanov, linux-man, LKML, Tejun Heo

On Wed, Mar 07, 2012 at 09:07:38AM +1300, Michael Kerrisk (man-pages) wrote:
> >>
> >> It would make sense if the capability requirements were finalized
> >> before 3.3 is released. Changing them after 3.3 creates (at least a
> >> little) pain for userspace.
> >>
> >
> > OK. I'll update and send a patch out.
> 
> Take a look at http://man7.org/linux/man-pages/man7/capabilities.7.html
> 
> The two most obvious alternatives are CAP_SYS_RESOURCE and
> CAP_SYS_NICE. Maybe CAP_SYS_NICE is better? I say this because of the
> (slight) similarity to existing operations in the CAP_SYS_NICE list.
> 

Well, dunno Michael, CAP_SYS_RESOURCE looks a bit better to me since
the process is modifying its own 'resources' (in terms of what it owns).
Maybe Andrew or Tejun have something to say?

	Cyrill


* Re: [PATCH 1/2] prctl: Add PR_SET_MM option description
  2012-02-29 12:23 ` [PATCH 1/2] prctl: Add PR_SET_MM option description Cyrill Gorcunov
  2012-03-06 18:00   ` Michael Kerrisk (man-pages)
@ 2012-04-15  3:48   ` Michael Kerrisk (man-pages)
  2012-04-15  6:54     ` Cyrill Gorcunov
  1 sibling, 1 reply; 15+ messages in thread
From: Michael Kerrisk (man-pages) @ 2012-04-15  3:48 UTC (permalink / raw)
  To: Cyrill Gorcunov; +Cc: Pavel Emelyanov, linux-man, LKML, Tejun Heo

Cyrill,

While reviewing your patch to the prctl() manual page, I noticed the
following code in kernel/sys.c::prctl_set_mm():

        if (opt != PR_SET_MM_START_BRK && opt != PR_SET_MM_BRK) {
                /* It must be existing VMA */
                if (!vma || vma->vm_start > addr)
                        goto out;
        }

At this point, the code causes an exit with error set to zero (i.e.,
success). This looks unintended to me. Is the code correct? I suspect
a return of -EFAULT or -ENOMEM is warranted.

Cheers,

Michael
-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Author of "The Linux Programming Interface"; http://man7.org/tlpi/


* Re: [PATCH 1/2] prctl: Add PR_SET_MM option description
  2012-04-15  3:48   ` Michael Kerrisk (man-pages)
@ 2012-04-15  6:54     ` Cyrill Gorcunov
  2012-04-15 10:13       ` Michael Kerrisk (man-pages)
  0 siblings, 1 reply; 15+ messages in thread
From: Cyrill Gorcunov @ 2012-04-15  6:54 UTC (permalink / raw)
  To: Michael Kerrisk (man-pages); +Cc: Pavel Emelyanov, linux-man, LKML, Tejun Heo

On Sun, Apr 15, 2012 at 03:48:18PM +1200, Michael Kerrisk (man-pages) wrote:
> Cyrill,
> 
> While reviewing your patch to the prctl() manual page, I noticed the
> following code in kernel/sys.c::prctl_set_mm():
> 
>         if (opt != PR_SET_MM_START_BRK && opt != PR_SET_MM_BRK) {
>                 /* It must be existing VMA */
>                 if (!vma || vma->vm_start > addr)
>                         goto out;
>         }
> 
> At this point, the code causes an exit with error set to zero (i.e.,
> success). This looks unintended to me. Is the code correct? I suspect
> a return of -EFAULT or -ENOMEM is warranted.

Hi Michael, yup, -EINVAL escaped (I think EFAULT or ENOMEM is not really
good here). I'll fix it and send an update. Thanks!
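
The escaped error code can be modeled in userspace. The sketch below is not the kernel code: struct fake_vma and both function names are hypothetical, and it assumes only what the quoted snippet and Michael's description state, namely that error is still zero when the VMA check jumps to out. The second variant defaults to -EINVAL (the value proposed above, still under discussion in this thread) and clears it only on success.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a VMA: only the start address matters here. */
struct fake_vma {
	unsigned long vm_start;
};

/* Buggy shape: error is still 0 when the VMA check bails out. */
int set_field_buggy(const struct fake_vma *vma, unsigned long addr,
		    unsigned long *field)
{
	int error = 0;

	assert(field != NULL);

	/* It must be an existing VMA */
	if (!vma || vma->vm_start > addr)
		goto out;	/* returns 0 although nothing was set */

	*field = addr;
out:
	return error;
}

/* Fixed shape: start from failure, report success only after the update. */
int set_field_fixed(const struct fake_vma *vma, unsigned long addr,
		    unsigned long *field)
{
	int error = -EINVAL;	/* assumed errno; the choice is debated above */

	assert(field != NULL);

	if (!vma || vma->vm_start > addr)
		goto out;

	*field = addr;
	error = 0;
out:
	return error;
}
```

With a NULL vma, set_field_buggy() returns 0 and leaves *field untouched, while set_field_fixed() returns -EINVAL for the same arguments.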

	Cyrill


* Re: [PATCH 1/2] prctl: Add PR_SET_MM option description
  2012-04-15  6:54     ` Cyrill Gorcunov
@ 2012-04-15 10:13       ` Michael Kerrisk (man-pages)
  2012-04-15 22:10         ` Cyrill Gorcunov
  0 siblings, 1 reply; 15+ messages in thread
From: Michael Kerrisk (man-pages) @ 2012-04-15 10:13 UTC (permalink / raw)
  To: Cyrill Gorcunov; +Cc: Pavel Emelyanov, linux-man, LKML, Tejun Heo

On Sun, Apr 15, 2012 at 6:54 PM, Cyrill Gorcunov <gorcunov@openvz.org> wrote:
> On Sun, Apr 15, 2012 at 03:48:18PM +1200, Michael Kerrisk (man-pages) wrote:
>> Cyrill,
>>
>> While reviewing your patch to the prctl() manual page, I noticed the
>> following code in kernel/sys.c::prctl_set_mm():
>>
>>         if (opt != PR_SET_MM_START_BRK && opt != PR_SET_MM_BRK) {
>>                 /* It must be existing VMA */
>>                 if (!vma || vma->vm_start > addr)
>>                         goto out;
>>         }
>>
>> At this point, the code causes an exit with error set to zero (i.e.,
>> success). This looks unintended to me. Is the code correct? I suspect
>> a return of -EFAULT or -ENOMEM is warranted.
>
> Hi Michael, yup, -EINVAL escaped (I think EFAULT or ENOMEM is not really
> good here). I'll fix and send update. Thanks!

For what it's worth (I am no expert), it looks to me as though EFAULT
or ENOMEM is more usual after a failed find_vma(). Furthermore, EINVAL
is already heavily used, so not very informative as an error.

Cheers,

Michael


-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Author of "The Linux Programming Interface"; http://man7.org/tlpi/


* Re: [PATCH 1/2] prctl: Add PR_SET_MM option description
  2012-04-15 10:13       ` Michael Kerrisk (man-pages)
@ 2012-04-15 22:10         ` Cyrill Gorcunov
  0 siblings, 0 replies; 15+ messages in thread
From: Cyrill Gorcunov @ 2012-04-15 22:10 UTC (permalink / raw)
  To: Michael Kerrisk (man-pages)
  Cc: Pavel Emelyanov, linux-man, LKML, Tejun Heo, Andrew Morton

On Sun, Apr 15, 2012 at 10:13:51PM +1200, Michael Kerrisk (man-pages) wrote:
> On Sun, Apr 15, 2012 at 6:54 PM, Cyrill Gorcunov <gorcunov@openvz.org> wrote:
> > On Sun, Apr 15, 2012 at 03:48:18PM +1200, Michael Kerrisk (man-pages) wrote:
> >> Cyrill,
> >>
> >> While reviewing your patch to the prctl() manual page, I noticed the
> >> following code in kernel/sys.c::prctl_set_mm():
> >>
> >>         if (opt != PR_SET_MM_START_BRK && opt != PR_SET_MM_BRK) {
> >>                 /* It must be existing VMA */
> >>                 if (!vma || vma->vm_start > addr)
> >>                         goto out;
> >>         }
> >>
> >> At this point, the code causes an exit with error set to zero (i.e.,
> >> success). This looks unintended to me. Is the code correct? I suspect
> >> a return of -EFAULT or -ENOMEM is warranted.
> >
> > Hi Michael, yup, -EINVAL escaped (I think EFAULT or ENOMEM is not really
> > good here). I'll fix and send update. Thanks!
> 
> For what it's worth (I am no expert), it looks to me as though EFAULT
> or ENOMEM is more usual after a failed find_vma(). Furthermore, EINVAL
> is already heavily used, so not very informative as an error.

Wouldn't ENOMEM be decoded by glibc as "no memory", which is usually
associated with a lack of free memory?
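
A quick way to see what userspace would be told for each candidate is to ask the C library. This is a minimal sketch: errno_text() and print_candidates() are hypothetical helper names, and only the standard strerror() is assumed.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Copy the C library's message for err into buf and return buf. */
char *errno_text(int err, char *buf, size_t len)
{
	assert(buf != NULL && len > 0);
	snprintf(buf, len, "%s", strerror(err));
	return buf;
}

/* Print the three error codes discussed in this thread with their text. */
void print_candidates(FILE *out)
{
	static const struct {
		int code;
		const char *name;
	} candidates[] = {
		{ EINVAL, "EINVAL" },
		{ EFAULT, "EFAULT" },
		{ ENOMEM, "ENOMEM" },
	};
	char buf[128];
	size_t i;

	for (i = 0; i < sizeof(candidates) / sizeof(candidates[0]); i++)
		fprintf(out, "%s: %s\n", candidates[i].name,
			errno_text(candidates[i].code, buf, sizeof(buf)));
}
```

Calling print_candidates(stdout) on a glibc system shows ENOMEM rendered as an allocation failure, which is the concern raised above.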

You know, I'm starting to think these checks for existing VMAs might be
completely redundant. I tried to make the prctl codes resemble the ELF
loading procedure, where start|end_code/data correspond to the VMAs the
kernel sets up while parsing PT_LOAD segments. Now I think this is not
needed: start|end_code/data do not change after the file is loaded, but
by the time we checkpoint (and then restore) the program, its memory map
might have changed substantially (the program may unmap the original
areas, allocate new VMAs, put code/data there, or whatever). Thus there
might be no corresponding VMA at all when we set up these addresses for
the memory map (unless I'm missing something). So I guess I could drop
the "existing VMAs" requirement. Need to think more :)

	Cyrill


Thread overview: 15+ messages
2012-02-29 12:23 [PATCH 0/2] Update man pages for prctl and kcmp syscall Cyrill Gorcunov
2012-02-29 12:23 ` [PATCH 1/2] prctl: Add PR_SET_MM option description Cyrill Gorcunov
2012-03-06 18:00   ` Michael Kerrisk (man-pages)
2012-03-06 18:22     ` Cyrill Gorcunov
2012-03-06 19:52       ` Michael Kerrisk (man-pages)
2012-03-06 20:01         ` Cyrill Gorcunov
2012-03-06 20:07           ` Michael Kerrisk (man-pages)
2012-03-06 20:16             ` Cyrill Gorcunov
2012-04-15  3:48   ` Michael Kerrisk (man-pages)
2012-04-15  6:54     ` Cyrill Gorcunov
2012-04-15 10:13       ` Michael Kerrisk (man-pages)
2012-04-15 22:10         ` Cyrill Gorcunov
2012-02-29 12:23 ` [PATCH 2/2] Add kcmp.2 manpage Cyrill Gorcunov
2012-02-29 12:34   ` Cyrill Gorcunov
2012-02-29 12:41     ` Cyrill Gorcunov
