All of lore.kernel.org
 help / color / mirror / Atom feed
* Re: prctl(PR_SET_MM)
       [not found] <20130222142603.987c6e3c.akpm@linux-foundation.org>
@ 2013-02-24  6:24 ` Amnon Shiloh
  2013-02-24  6:28 ` prctl(PR_SET_MM) Amnon Shiloh
  1 sibling, 0 replies; 20+ messages in thread
From: Amnon Shiloh @ 2013-02-24  6:24 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Steven, Rostedt, rostedt, Oleg, Nesterov, oleg, Pedro, Alves,
	palves, Denys, Vlasenko, dvlasenk, Jan, Kratochvil,
	jan.kratochvil, Pavel, Emelyanov, xemul, Frederic, Weisbecker,
	fweisbec, Ingo, Molnar, mingo, Peter, Zijlstra, a.p.zijlstra,
	linux-kernel

[-- Attachment #1: Type: text/plain, Size: 3560 bytes --]

Dear Andrew,

Andrew Morton <akpm@linux-foundation.org> Wrote:
> Well OK.  Put all that on top of a patch, add suitable signoffs and
> cc's and send it along?

The purpose of this patch is to allow privileged processes to set
their own per-memory memory-region fields:

      start_code, end_code, start_data, end_data, start_brk, brk,
      start_stack, arg_start, arg_end, env_start, env_end.

This functionality is needed by any application or package that
needs to reconstruct Linux processes, that is, to start them in
any way other than by means of an "execve()" from an executable
file.  This includes:

1. Restoring processes from a checkpoint-file (by all potential
   user-level checkpointing packages, not only CRIU's).
2. Restarting processes on another node after process migration.
3. Starting duplicated copies of a running process (for reliability
   and high-availablity).
4. Starting a process from an executable format that is not supported
   by Linux, thus requiring a "manual execve" by a user-level utility.
5. Similarly, starting a process from a networked and/or crypted
   executable that, for confidentiality, licensing or other reasons,
   may not be written to the local file-systems.

The code that does that was already included in the Linux kernel by
the CRIU group, in the form of "prctl(PR_SET_MM)", but prior to this
was enclosed within their private "#ifdef CONFIG_CHECKPOINT_RESTORE",
which is normally disabled.

It was not clear from your answer, Andrew, whether you prefer to
remove the "#ifdef CONFIG_CHECKPOINT_RESTORE" altogether from the
said code, or to enclose it in a new configuration option that is
enabled by default.   I therefore attach two alternative patches
to choose from: the first removes the #ifdef altogether while the
second introduces a new option.

Signed-off-by: Amnon Shiloh.

Best Regards,
Amnon.


> On Fri, 22 Feb 2013 12:18:01 +1100 (EST)
> u3557@miso.sublimeip.com (Amnon Shiloh) wrote:
> 
> > The code in "kernel/sys.c" that is currently within
> > CONFIG_CHECKPOINT_RESTORE is in fact, as I explain below,
> > one possible solution to a general issue, required by a wide
> > class of applications.  It just so happened that the CRIU group
> > were the first to place this, or an equivalent code, in the kernel,
> > that allows a privileged process to set its 11 per-process memory-region
> > fields:
> >      start_code, end_code, start_data, end_data, start_brk, brk,
> >      start_stack, arg_start, arg_end, env_start, env_end.
> > 
> > 
> > Contrary to the rest of the CHECKPOINT_RESTORE code, which is specific
> > to the CRIU package, the code in "kernel/sys.c" (or its equivalent) is
> > needed by ANY application or package that needs to reconstruct Linux
> > processes, that means, starting them from the middle rather than from
> > an executable file.
> > 
> > That includes user-level checkpointing (any, not just CRIU's),
> > process-migration (to other computers, as my own package does)
> > and process duplication (for high-availability/reliability) -
> > in fact even for starting a process from an executable format
> > that is not supported by Linux, thus requiring a "manual execve"
> > by a user-level utility.
> > 
> > My first preference is to remove that "#ifdef CONFIG_CHECKPOINT_RESTORE"
> > altogether.  Note that there are no security issues because this code
> > is already restricted to "capable(CAP_SYS_RESOURCE)".
> > Short of that is the proposed patch.
> 
> Well OK.  Put all that on top of a patch, add suitable signoffs and
> cc's and send it along?
> 

[-- Attachment #2: unified diff output, ASCII text --]
[-- Type: text/plain, Size: 2270 bytes --]

diff -Naur linux-3.8/init/Kconfig option2/init/Kconfig
--- linux-3.8/init/Kconfig	2013-02-19 10:28:34.000000000 +1030
+++ option2/init/Kconfig	2013-02-24 13:57:02.000000000 +1030
@@ -991,6 +991,7 @@
 config CHECKPOINT_RESTORE
 	bool "Checkpoint/restore support" if EXPERT
 	default n
+	select MM_FIELDS_SETTING
 	help
 	  Enables additional kernel features in a sake of checkpoint/restore.
 	  In particular it adds auxiliary prctl codes to setup process text,
@@ -999,6 +1000,22 @@
 
 	  If unsure, say N here.
 
+config MM_FIELDS_SETTING
+	bool "Allow modifying per-process memory-region fields"
+	default y
+	help
+	   Support "prctl(PR_SET_MM)" which allows applications to modify
+	   the following in their "mm_struct":
+
+	      start_code, end_code, start_data, end_data, start_brk, brk,
+	      start_stack, arg_start, arg_end, env_start, env_end.
+
+	    Also to modify their executable file ("/proc/self/exe").
+
+	    This option is needed for reconstructing processes (such as when
+	    restoring a process from a checkpoint; duplicating a process;
+	    or migrating it to another computer).
+
 menuconfig NAMESPACES
 	bool "Namespaces support" if EXPERT
 	default !EXPERT
diff -Naur linux-3.8/kernel/sys.c option2/kernel/sys.c
--- linux-3.8/kernel/sys.c	2013-02-19 10:28:34.000000000 +1030
+++ option2/kernel/sys.c	2013-02-24 10:37:08.000000000 +1030
@@ -1788,7 +1788,7 @@
 	return mask;
 }
 
-#ifdef CONFIG_CHECKPOINT_RESTORE
+#ifdef CONFIG_MM_FIELDS_SETTING
 static int prctl_set_mm_exe_file(struct mm_struct *mm, unsigned int fd)
 {
 	struct fd exe;
@@ -1981,18 +1981,22 @@
 	up_read(&mm->mmap_sem);
 	return error;
 }
+#else /* CONFIG_MM_FIELDS_SETTING */
 
-static int prctl_get_tid_address(struct task_struct *me, int __user **tid_addr)
-{
-	return put_user(me->clear_child_tid, tid_addr);
-}
-
-#else /* CONFIG_CHECKPOINT_RESTORE */
 static int prctl_set_mm(int opt, unsigned long addr,
 			unsigned long arg4, unsigned long arg5)
 {
 	return -EINVAL;
 }
+#endif
+
+#ifdef CONFIG_CHECKPOINT_RESTORE
+static int prctl_get_tid_address(struct task_struct *me, int __user **tid_addr)
+{
+	return put_user(me->clear_child_tid, tid_addr);
+}
+
+#else
 static int prctl_get_tid_address(struct task_struct *me, int __user **tid_addr)
 {
 	return -EINVAL;

[-- Attachment #3: unified diff output, ASCII text --]
[-- Type: text/plain, Size: 836 bytes --]

diff -Naur linux-3.8/kernel/sys.c option1/kernel/sys.c
--- linux-3.8/kernel/sys.c	2013-02-19 10:28:34.000000000 +1030
+++ option1/kernel/sys.c	2013-02-24 10:47:45.000000000 +1030
@@ -1788,7 +1788,6 @@
 	return mask;
 }
 
-#ifdef CONFIG_CHECKPOINT_RESTORE
 static int prctl_set_mm_exe_file(struct mm_struct *mm, unsigned int fd)
 {
 	struct fd exe;
@@ -1982,17 +1981,12 @@
 	return error;
 }
 
+#ifdef CONFIG_CHECKPOINT_RESTORE
 static int prctl_get_tid_address(struct task_struct *me, int __user **tid_addr)
 {
 	return put_user(me->clear_child_tid, tid_addr);
 }
-
-#else /* CONFIG_CHECKPOINT_RESTORE */
-static int prctl_set_mm(int opt, unsigned long addr,
-			unsigned long arg4, unsigned long arg5)
-{
-	return -EINVAL;
-}
+#else
 static int prctl_get_tid_address(struct task_struct *me, int __user **tid_addr)
 {
 	return -EINVAL;

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: prctl(PR_SET_MM)
       [not found] <20130222142603.987c6e3c.akpm@linux-foundation.org>
  2013-02-24  6:24 ` prctl(PR_SET_MM) Amnon Shiloh
@ 2013-02-24  6:28 ` Amnon Shiloh
  1 sibling, 0 replies; 20+ messages in thread
From: Amnon Shiloh @ 2013-02-24  6:28 UTC (permalink / raw)
  To: Andrew Morton
  Cc: rostedt, oleg, palves, oleg, palves, dvlasenk, jan.kratochvil,
	xemul, fweisbec, mingo, a.p.zijlstra, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 3560 bytes --]

Dear Andrew,

Andrew Morton <akpm@linux-foundation.org> Wrote:
> Well OK.  Put all that on top of a patch, add suitable signoffs and
> cc's and send it along?

The purpose of this patch is to allow privileged processes to set
their own per-memory memory-region fields:

      start_code, end_code, start_data, end_data, start_brk, brk,
      start_stack, arg_start, arg_end, env_start, env_end.

This functionality is needed by any application or package that
needs to reconstruct Linux processes, that is, to start them in
any way other than by means of an "execve()" from an executable
file.  This includes:

1. Restoring processes from a checkpoint-file (by all potential
   user-level checkpointing packages, not only CRIU's).
2. Restarting processes on another node after process migration.
3. Starting duplicated copies of a running process (for reliability
   and high-availablity).
4. Starting a process from an executable format that is not supported
   by Linux, thus requiring a "manual execve" by a user-level utility.
5. Similarly, starting a process from a networked and/or crypted
   executable that, for confidentiality, licensing or other reasons,
   may not be written to the local file-systems.

The code that does that was already included in the Linux kernel by
the CRIU group, in the form of "prctl(PR_SET_MM)", but prior to this
was enclosed within their private "#ifdef CONFIG_CHECKPOINT_RESTORE",
which is normally disabled.

It was not clear from your answer, Andrew, whether you prefer to
remove the "#ifdef CONFIG_CHECKPOINT_RESTORE" altogether from the
said code, or to enclose it in a new configuration option that is
enabled by default.   I therefore attach two alternative patches
to choose from: the first removes the #ifdef altogether while the
second introduces a new option.

Signed-off-by: Amnon Shiloh.

Best Regards,
Amnon.


> On Fri, 22 Feb 2013 12:18:01 +1100 (EST)
> u3557@miso.sublimeip.com (Amnon Shiloh) wrote:
> 
> > The code in "kernel/sys.c" that is currently within
> > CONFIG_CHECKPOINT_RESTORE is in fact, as I explain below,
> > one possible solution to a general issue, required by a wide
> > class of applications.  It just so happened that the CRIU group
> > were the first to place this, or an equivalent code, in the kernel,
> > that allows a privileged process to set its 11 per-process memory-region
> > fields:
> >      start_code, end_code, start_data, end_data, start_brk, brk,
> >      start_stack, arg_start, arg_end, env_start, env_end.
> > 
> > 
> > Contrary to the rest of the CHECKPOINT_RESTORE code, which is specific
> > to the CRIU package, the code in "kernel/sys.c" (or its equivalent) is
> > needed by ANY application or package that needs to reconstruct Linux
> > processes, that means, starting them from the middle rather than from
> > an executable file.
> > 
> > That includes user-level checkpointing (any, not just CRIU's),
> > process-migration (to other computers, as my own package does)
> > and process duplication (for high-availability/reliability) -
> > in fact even for starting a process from an executable format
> > that is not supported by Linux, thus requiring a "manual execve"
> > by a user-level utility.
> > 
> > My first preference is to remove that "#ifdef CONFIG_CHECKPOINT_RESTORE"
> > altogether.  Note that there are no security issues because this code
> > is already restricted to "capable(CAP_SYS_RESOURCE)".
> > Short of that is the proposed patch.
> 
> Well OK.  Put all that on top of a patch, add suitable signoffs and
> cc's and send it along?
> 

[-- Attachment #2: unified diff output, ASCII text --]
[-- Type: text/plain, Size: 2270 bytes --]

diff -Naur linux-3.8/init/Kconfig option2/init/Kconfig
--- linux-3.8/init/Kconfig	2013-02-19 10:28:34.000000000 +1030
+++ option2/init/Kconfig	2013-02-24 13:57:02.000000000 +1030
@@ -991,6 +991,7 @@
 config CHECKPOINT_RESTORE
 	bool "Checkpoint/restore support" if EXPERT
 	default n
+	select MM_FIELDS_SETTING
 	help
 	  Enables additional kernel features in a sake of checkpoint/restore.
 	  In particular it adds auxiliary prctl codes to setup process text,
@@ -999,6 +1000,22 @@
 
 	  If unsure, say N here.
 
+config MM_FIELDS_SETTING
+	bool "Allow modifying per-process memory-region fields"
+	default y
+	help
+	   Support "prctl(PR_SET_MM)" which allows applications to modify
+	   the following in their "mm_struct":
+
+	      start_code, end_code, start_data, end_data, start_brk, brk,
+	      start_stack, arg_start, arg_end, env_start, env_end.
+
+	    Also to modify their executable file ("/proc/self/exe").
+
+	    This option is needed for reconstructing processes (such as when
+	    restoring a process from a checkpoint; duplicating a process;
+	    or migrating it to another computer).
+
 menuconfig NAMESPACES
 	bool "Namespaces support" if EXPERT
 	default !EXPERT
diff -Naur linux-3.8/kernel/sys.c option2/kernel/sys.c
--- linux-3.8/kernel/sys.c	2013-02-19 10:28:34.000000000 +1030
+++ option2/kernel/sys.c	2013-02-24 10:37:08.000000000 +1030
@@ -1788,7 +1788,7 @@
 	return mask;
 }
 
-#ifdef CONFIG_CHECKPOINT_RESTORE
+#ifdef CONFIG_MM_FIELDS_SETTING
 static int prctl_set_mm_exe_file(struct mm_struct *mm, unsigned int fd)
 {
 	struct fd exe;
@@ -1981,18 +1981,22 @@
 	up_read(&mm->mmap_sem);
 	return error;
 }
+#else /* CONFIG_MM_FIELDS_SETTING */
 
-static int prctl_get_tid_address(struct task_struct *me, int __user **tid_addr)
-{
-	return put_user(me->clear_child_tid, tid_addr);
-}
-
-#else /* CONFIG_CHECKPOINT_RESTORE */
 static int prctl_set_mm(int opt, unsigned long addr,
 			unsigned long arg4, unsigned long arg5)
 {
 	return -EINVAL;
 }
+#endif
+
+#ifdef CONFIG_CHECKPOINT_RESTORE
+static int prctl_get_tid_address(struct task_struct *me, int __user **tid_addr)
+{
+	return put_user(me->clear_child_tid, tid_addr);
+}
+
+#else
 static int prctl_get_tid_address(struct task_struct *me, int __user **tid_addr)
 {
 	return -EINVAL;

[-- Attachment #3: unified diff output, ASCII text --]
[-- Type: text/plain, Size: 836 bytes --]

diff -Naur linux-3.8/kernel/sys.c option1/kernel/sys.c
--- linux-3.8/kernel/sys.c	2013-02-19 10:28:34.000000000 +1030
+++ option1/kernel/sys.c	2013-02-24 10:47:45.000000000 +1030
@@ -1788,7 +1788,6 @@
 	return mask;
 }
 
-#ifdef CONFIG_CHECKPOINT_RESTORE
 static int prctl_set_mm_exe_file(struct mm_struct *mm, unsigned int fd)
 {
 	struct fd exe;
@@ -1982,17 +1981,12 @@
 	return error;
 }
 
+#ifdef CONFIG_CHECKPOINT_RESTORE
 static int prctl_get_tid_address(struct task_struct *me, int __user **tid_addr)
 {
 	return put_user(me->clear_child_tid, tid_addr);
 }
-
-#else /* CONFIG_CHECKPOINT_RESTORE */
-static int prctl_set_mm(int opt, unsigned long addr,
-			unsigned long arg4, unsigned long arg5)
-{
-	return -EINVAL;
-}
+#else
 static int prctl_get_tid_address(struct task_struct *me, int __user **tid_addr)
 {
 	return -EINVAL;

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: prctl(PR_SET_MM)
  2013-02-19  6:25         ` prctl(PR_SET_MM) Amnon Shiloh
  2013-02-20  8:39           ` prctl(PR_SET_MM) Cyrill Gorcunov
@ 2013-02-22 14:23           ` Denys Vlasenko
  1 sibling, 0 replies; 20+ messages in thread
From: Denys Vlasenko @ 2013-02-22 14:23 UTC (permalink / raw)
  To: u3557
  Cc: Amnon Shiloh, Steven Rostedt, Oleg Nesterov, Pedro Alves,
	Jan Kratochvil, Cyrill Gorcunov, Pavel Emelyanov,
	Frederic Weisbecker, Ingo Molnar, Peter Zijlstra, linux-kernel

On 02/19/2013 07:25 AM, Amnon Shiloh wrote:
> Steven Rostedt wrote:
> 
>> If only you, or a few people are using it (ie. distros don't see a
>> need), then it will be up to you to make the changes.
> 
> I believe that this functionality is of a general nature and is needed
> by many, not only by myself and by the CRIU group, but by all user-level
> software packages, past present and future, that provide some form or
> another of reconstructing a Linux process.

That's what "RESTORE" part of CONFIG_CHECKPOINT_RESTORE refers to:
the ability to restore (reconstruct) a process.

If you want to be able to restore a process, you need RESTORE
feature. It's that simple.

Why do you want yet another config option for it?
What's the problem if you simply enable CONFIG_CHECKPOINT_RESTORE?

The only problem I can imagine is "CONFIG_CHECKPOINT_RESTORE
enables too many things I don't need".

Frankly, I find it not very likely, unless you are planning
on working on resource-constrained machines (like mobile phone).

-- 
vda



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: prctl(PR_SET_MM)
  2013-02-21 22:18                     ` prctl(PR_SET_MM) Andrew Morton
  2013-02-21 22:42                       ` prctl(PR_SET_MM) Cyrill Gorcunov
@ 2013-02-22  1:18                       ` Amnon Shiloh
  1 sibling, 0 replies; 20+ messages in thread
From: Amnon Shiloh @ 2013-02-22  1:18 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Cyrill Gorcunov, Steven Rostedt, u3557, Oleg Nesterov,
	Pedro Alves, Denys Vlasenko, Jan Kratochvil, Pavel Emelyanov,
	Frederic Weisbecker, Ingo Molnar, Peter Zijlstra, linux-kernel

Dear Andrew,

The code in "kernel/sys.c" that is currently within
CONFIG_CHECKPOINT_RESTORE is in fact, as I explain below,
one possible solution to a general issue, required by a wide
class of applications.  It just so happened that the CRIU group
were the first to place this, or an equivalent code, in the kernel,
that allows a privileged process to set its 11 per-process memory-region
fields:
     start_code, end_code, start_data, end_data, start_brk, brk,
     start_stack, arg_start, arg_end, env_start, env_end.


Contrary to the rest of the CHECKPOINT_RESTORE code, which is specific
to the CRIU package, the code in "kernel/sys.c" (or its equivalent) is
needed by ANY application or package that needs to reconstruct Linux
processes, that means, starting them from the middle rather than from
an executable file.

That includes user-level checkpointing (any, not just CRIU's),
process-migration (to other computers, as my own package does)
and process duplication (for high-availability/reliability) -
in fact even for starting a process from an executable format
that is not supported by Linux, thus requiring a "manual execve"
by a user-level utility.

My first preference is to remove that "#ifdef CONFIG_CHECKPOINT_RESTORE"
altogether.  Note that there are no security issues because this code
is already restricted to "capable(CAP_SYS_RESOURCE)".
Short of that is the proposed patch.

Best Regards,
Amnon.


Andrew Morton wrote:
> 
> On Thu, 21 Feb 2013 12:00:28 +0400
> Cyrill Gorcunov <gorcunov@openvz.org> wrote:
> 
> > From: Amnon Shiloh <u3557@miso.sublimeip.com>
> > Subject: prctl: Make PR_SET_MM being depend on own CONFIG_MM_FIELDS_SETTING
> > 
> > ...
> > 
> > Signed-off-by: Amnon Shiloh <u3557@miso.sublimeip.com>
> 
> The "..." makes me sad.
> 
> If/when this patch gets sent for real, please explain the reasons? 
> AFAICT it permits the enabling of prctl(PR_SET_MM) in
> CONFIG_CHECKPOINT_RESTORE=n kernels.  If that was indeed the intent,
> why?
> 
> 


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: prctl(PR_SET_MM)
  2013-02-21 22:18                     ` prctl(PR_SET_MM) Andrew Morton
@ 2013-02-21 22:42                       ` Cyrill Gorcunov
  2013-02-22  1:18                       ` prctl(PR_SET_MM) Amnon Shiloh
  1 sibling, 0 replies; 20+ messages in thread
From: Cyrill Gorcunov @ 2013-02-21 22:42 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Amnon Shiloh, Steven Rostedt, u3557, Oleg Nesterov, Pedro Alves,
	Denys Vlasenko, Jan Kratochvil, Pavel Emelyanov,
	Frederic Weisbecker, Ingo Molnar, Peter Zijlstra, linux-kernel

On Thu, Feb 21, 2013 at 02:18:41PM -0800, Andrew Morton wrote:
> On Thu, 21 Feb 2013 12:00:28 +0400
> Cyrill Gorcunov <gorcunov@openvz.org> wrote:
> 
> > From: Amnon Shiloh <u3557@miso.sublimeip.com>
> > Subject: prctl: Make PR_SET_MM being depend on own CONFIG_MM_FIELDS_SETTING
> > 
> > ...
> > 
> > Signed-off-by: Amnon Shiloh <u3557@miso.sublimeip.com>
> 
> The "..." makes me sad.
> 
> If/when this patch gets sent for real, please explain the reasons? 
> AFAICT it permits the enabling of prctl(PR_SET_MM) in
> CONFIG_CHECKPOINT_RESTORE=n kernels.  If that was indeed the intent,
> why?

Sorry for this "...", it was a draft version for Amnon, not for inclusion.
As far as I understand Amnon needs these prctl opcodes to be enabled by
default (but still turnable off in Kconfig if needed) for his minimal
c/r software, he do not need the whole c/r functionality (procfs map-files,
get-tid-address,kcmp and such). That is the idea if I understand correctly.

Quoting Amnon
|
| Correct, this is an important feature that is useful for a whole 
| general class of applications, not only those needing CHECKPOINT_RESTORE. 
|
| Had this not been done as part of the CHECKPOINT_RESTORE project, it 
| would have certainly been done, sooner or later, by some other developers: 
| it just so happened that the CHECKPOINT_RESTORE people were the first to 
| (publically) fill this gap, but in fact this code in "kernel/sys.c" should 
| be general kernel code, not part of CHECKPOINT_RESTORE.
|

I personally don't mind if this code become y by default (it requires
cap-sys-resource capability granted anyway), but for normal c/r this
prctl opcodes only is not enough and CHECKPOINT_RESTORE should be set.
Thus, if people agree with enabling prctl extension by default I certainly
won't object.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: prctl(PR_SET_MM)
  2013-02-21  8:00                   ` prctl(PR_SET_MM) Cyrill Gorcunov
  2013-02-21  8:03                     ` prctl(PR_SET_MM) Amnon Shiloh
@ 2013-02-21 22:18                     ` Andrew Morton
  2013-02-21 22:42                       ` prctl(PR_SET_MM) Cyrill Gorcunov
  2013-02-22  1:18                       ` prctl(PR_SET_MM) Amnon Shiloh
  1 sibling, 2 replies; 20+ messages in thread
From: Andrew Morton @ 2013-02-21 22:18 UTC (permalink / raw)
  To: Cyrill Gorcunov
  Cc: Amnon Shiloh, Steven Rostedt, u3557, Oleg Nesterov, Pedro Alves,
	Denys Vlasenko, Jan Kratochvil, Pavel Emelyanov,
	Frederic Weisbecker, Ingo Molnar, Peter Zijlstra, linux-kernel

On Thu, 21 Feb 2013 12:00:28 +0400
Cyrill Gorcunov <gorcunov@openvz.org> wrote:

> From: Amnon Shiloh <u3557@miso.sublimeip.com>
> Subject: prctl: Make PR_SET_MM being depend on own CONFIG_MM_FIELDS_SETTING
> 
> ...
> 
> Signed-off-by: Amnon Shiloh <u3557@miso.sublimeip.com>

The "..." makes me sad.

If/when this patch gets sent for real, please explain the reasons? 
AFAICT it permits the enabling of prctl(PR_SET_MM) in
CONFIG_CHECKPOINT_RESTORE=n kernels.  If that was indeed the intent,
why?


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: prctl(PR_SET_MM)
  2013-02-21  8:03                     ` prctl(PR_SET_MM) Amnon Shiloh
@ 2013-02-21  8:09                       ` Cyrill Gorcunov
  0 siblings, 0 replies; 20+ messages in thread
From: Cyrill Gorcunov @ 2013-02-21  8:09 UTC (permalink / raw)
  To: Amnon Shiloh
  Cc: Steven Rostedt, u3557, Oleg Nesterov, Pedro Alves,
	Denys Vlasenko, Jan Kratochvil, Pavel Emelyanov,
	Frederic Weisbecker, Ingo Molnar, Peter Zijlstra, linux-kernel,
	Andrew Morton

On Thu, Feb 21, 2013 at 07:03:58PM +1100, Amnon Shiloh wrote:
> Hi,
> 
> Cyrill Gorcunov wrote:
> 
> > Wouldn't the below do the same trick but eliminate OR in preproc code?
> 
> Yes it would.  I don't mind having it either way.

OK, lets wait for opinions and see if this approach is acceptable.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: prctl(PR_SET_MM)
  2013-02-21  8:00                   ` prctl(PR_SET_MM) Cyrill Gorcunov
@ 2013-02-21  8:03                     ` Amnon Shiloh
  2013-02-21  8:09                       ` prctl(PR_SET_MM) Cyrill Gorcunov
  2013-02-21 22:18                     ` prctl(PR_SET_MM) Andrew Morton
  1 sibling, 1 reply; 20+ messages in thread
From: Amnon Shiloh @ 2013-02-21  8:03 UTC (permalink / raw)
  To: Cyrill Gorcunov
  Cc: Steven Rostedt, u3557, Oleg Nesterov, Pedro Alves,
	Denys Vlasenko, Jan Kratochvil, Pavel Emelyanov,
	Frederic Weisbecker, Ingo Molnar, Peter Zijlstra, linux-kernel,
	Andrew Morton

Hi,

Cyrill Gorcunov wrote:

> Wouldn't the below do the same trick but eliminate OR in preproc code?

Yes it would.  I don't mind having it either way.

Best Regards,
Amnon.

> ---
> From: Amnon Shiloh <u3557@miso.sublimeip.com>
> Subject: prctl: Make PR_SET_MM being depend on own CONFIG_MM_FIELDS_SETTING
> 
> ...
> 
> Signed-off-by: Amnon Shiloh <u3557@miso.sublimeip.com>
> ---
>  init/Kconfig |   17 +++++++++++++++++
>  kernel/sys.c |   16 ++++++++--------
>  2 files changed, 25 insertions(+), 8 deletions(-)
> 
> Index: linux-2.6.git/init/Kconfig
> ===================================================================
> --- linux-2.6.git.orig/init/Kconfig
> +++ linux-2.6.git/init/Kconfig
> @@ -991,6 +991,7 @@ endif # CGROUPS
>  config CHECKPOINT_RESTORE
>  	bool "Checkpoint/restore support" if EXPERT
>  	default n
> +	select MM_FIELDS_SETTING
>  	help
>  	  Enables additional kernel features in a sake of checkpoint/restore.
>  	  In particular it adds auxiliary prctl codes to setup process text,
> @@ -999,6 +1000,22 @@ config CHECKPOINT_RESTORE
>  
>  	  If unsure, say N here.
>  
> +config MM_FIELDS_SETTING
> +	bool "Allow modifying per-process memory-region fields"
> +	default y
> +	help
> +	   Support "prctl(PR_SET_MM)" which allows applications to modify
> +	   the following in their "mm_struct":
> +
> +	      start_code, end_code, start_data, end_data, start_brk, brk,
> +	      start_stack, arg_start, arg_end, env_start, env_end.
> +
> +	    Also to modify their executable file ("/proc/self/exe").
> +
> +	    This option is needed for reconstructing processes (such as when
> +	    restoring a process from a checkpoint; duplicating a process;
> +	    or migrating it to another computer).
> +
>  menuconfig NAMESPACES
>  	bool "Namespaces support" if EXPERT
>  	default !EXPERT
> Index: linux-2.6.git/kernel/sys.c
> ===================================================================
> --- linux-2.6.git.orig/kernel/sys.c
> +++ linux-2.6.git/kernel/sys.c
> @@ -1788,7 +1788,7 @@ SYSCALL_DEFINE1(umask, int, mask)
>  	return mask;
>  }
>  
> -#ifdef CONFIG_CHECKPOINT_RESTORE
> +#ifdef CONFIG_MM_FIELDS_SETTING
>  static int prctl_set_mm_exe_file(struct mm_struct *mm, unsigned int fd)
>  {
>  	struct fd exe;
> @@ -1981,23 +1981,23 @@ out:
>  	up_read(&mm->mmap_sem);
>  	return error;
>  }
> +#else /* CONFIG_MM_FIELDS_SETTING */
>  
> -static int prctl_get_tid_address(struct task_struct *me, int __user **tid_addr)
> -{
> -	return put_user(me->clear_child_tid, tid_addr);
> -}
> -
> -#else /* CONFIG_CHECKPOINT_RESTORE */
>  static int prctl_set_mm(int opt, unsigned long addr,
>  			unsigned long arg4, unsigned long arg5)
>  {
>  	return -EINVAL;
>  }
> +#endif
> +
>  static int prctl_get_tid_address(struct task_struct *me, int __user **tid_addr)
>  {
> +#ifdef CONFIG_CHECKPOINT_RESTORE
> +	return put_user(me->clear_child_tid, tid_addr);
> +#else
>  	return -EINVAL;
> -}
>  #endif
> +}
>  
>  SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
>  		unsigned long, arg4, unsigned long, arg5)
> 


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: prctl(PR_SET_MM)
  2013-02-21  7:46                 ` prctl(PR_SET_MM) Amnon Shiloh
@ 2013-02-21  8:00                   ` Cyrill Gorcunov
  2013-02-21  8:03                     ` prctl(PR_SET_MM) Amnon Shiloh
  2013-02-21 22:18                     ` prctl(PR_SET_MM) Andrew Morton
  0 siblings, 2 replies; 20+ messages in thread
From: Cyrill Gorcunov @ 2013-02-21  8:00 UTC (permalink / raw)
  To: Amnon Shiloh
  Cc: Steven Rostedt, u3557, Oleg Nesterov, Pedro Alves,
	Denys Vlasenko, Jan Kratochvil, Pavel Emelyanov,
	Frederic Weisbecker, Ingo Molnar, Peter Zijlstra, linux-kernel,
	Andrew Morton

On Thu, Feb 21, 2013 at 06:46:39PM +1100, Amnon Shiloh wrote:
> Cyrill Gorcunov wrote:
> 
> >> Another possibility is to have a dual #if:
> >>
> >> #if defined(CONFIG_CHECKPOINT_RESTORE) || defined(CONFIG_MM_FIELDS_SETTING)
> >
> > Thus this approach looks preferred. And MM_FIELDS_SETTING will be y by default.
> > Mind to cook a patch and lets see if community accept it? Don't forget to
> > CC Andrew Morton.
> 
> Very well, patch attached.

Wouldn't the below do the same trick but eliminate OR in preproc code?
---
From: Amnon Shiloh <u3557@miso.sublimeip.com>
Subject: prctl: Make PR_SET_MM being depend on own CONFIG_MM_FIELDS_SETTING

...

Signed-off-by: Amnon Shiloh <u3557@miso.sublimeip.com>
---
 init/Kconfig |   17 +++++++++++++++++
 kernel/sys.c |   16 ++++++++--------
 2 files changed, 25 insertions(+), 8 deletions(-)

Index: linux-2.6.git/init/Kconfig
===================================================================
--- linux-2.6.git.orig/init/Kconfig
+++ linux-2.6.git/init/Kconfig
@@ -991,6 +991,7 @@ endif # CGROUPS
 config CHECKPOINT_RESTORE
 	bool "Checkpoint/restore support" if EXPERT
 	default n
+	select MM_FIELDS_SETTING
 	help
 	  Enables additional kernel features in a sake of checkpoint/restore.
 	  In particular it adds auxiliary prctl codes to setup process text,
@@ -999,6 +1000,22 @@ config CHECKPOINT_RESTORE
 
 	  If unsure, say N here.
 
+config MM_FIELDS_SETTING
+	bool "Allow modifying per-process memory-region fields"
+	default y
+	help
+	   Support "prctl(PR_SET_MM)" which allows applications to modify
+	   the following in their "mm_struct":
+
+	      start_code, end_code, start_data, end_data, start_brk, brk,
+	      start_stack, arg_start, arg_end, env_start, env_end.
+
+	    Also to modify their executable file ("/proc/self/exe").
+
+	    This option is needed for reconstructing processes (such as when
+	    restoring a process from a checkpoint; duplicating a process;
+	    or migrating it to another computer).
+
 menuconfig NAMESPACES
 	bool "Namespaces support" if EXPERT
 	default !EXPERT
Index: linux-2.6.git/kernel/sys.c
===================================================================
--- linux-2.6.git.orig/kernel/sys.c
+++ linux-2.6.git/kernel/sys.c
@@ -1788,7 +1788,7 @@ SYSCALL_DEFINE1(umask, int, mask)
 	return mask;
 }
 
-#ifdef CONFIG_CHECKPOINT_RESTORE
+#ifdef CONFIG_MM_FIELDS_SETTING
 static int prctl_set_mm_exe_file(struct mm_struct *mm, unsigned int fd)
 {
 	struct fd exe;
@@ -1981,23 +1981,23 @@ out:
 	up_read(&mm->mmap_sem);
 	return error;
 }
+#else /* CONFIG_MM_FIELDS_SETTING */
 
-static int prctl_get_tid_address(struct task_struct *me, int __user **tid_addr)
-{
-	return put_user(me->clear_child_tid, tid_addr);
-}
-
-#else /* CONFIG_CHECKPOINT_RESTORE */
 static int prctl_set_mm(int opt, unsigned long addr,
 			unsigned long arg4, unsigned long arg5)
 {
 	return -EINVAL;
 }
+#endif
+
 static int prctl_get_tid_address(struct task_struct *me, int __user **tid_addr)
 {
+#ifdef CONFIG_CHECKPOINT_RESTORE
+	return put_user(me->clear_child_tid, tid_addr);
+#else
 	return -EINVAL;
-}
 #endif
+}
 
 SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
 		unsigned long, arg4, unsigned long, arg5)

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: prctl(PR_SET_MM)
  2013-02-20 10:51               ` prctl(PR_SET_MM) Cyrill Gorcunov
  2013-02-20 11:16                 ` prctl(PR_SET_MM) Amnon Shiloh
@ 2013-02-21  7:46                 ` Amnon Shiloh
  2013-02-21  8:00                   ` prctl(PR_SET_MM) Cyrill Gorcunov
  1 sibling, 1 reply; 20+ messages in thread
From: Amnon Shiloh @ 2013-02-21  7:46 UTC (permalink / raw)
  To: Cyrill Gorcunov
  Cc: Steven Rostedt, u3557, Oleg Nesterov, Pedro Alves,
	Denys Vlasenko, Jan Kratochvil, Pavel Emelyanov,
	Frederic Weisbecker, Ingo Molnar, Peter Zijlstra, linux-kernel,
	Andrew Morton

[-- Attachment #1: Type: text/plain, Size: 368 bytes --]

Cyrill Gorcunov wrote:

>> Another possibility is to have a dual #if:
>>
>> #if defined(CONFIG_CHECKPOINT_RESTORE) || defined(CONFIG_MM_FIELDS_SETTING)
>
> Thus this approach looks preferred. And MM_FIELDS_SETTING will be y by default.
> Mind to cook a patch and lets see if community accept it? Don't forget to
> CC Andrew Morton.

Very well, patch attached.

Amnon.

[-- Attachment #2: unified diff output, ASCII text --]
[-- Type: text/plain, Size: 2040 bytes --]

diff -Naur before/init/Kconfig after/init/Kconfig
--- before/init/Kconfig	2013-02-19 10:28:34.000000000 +1030
+++ after/init/Kconfig	2013-02-21 18:03:48.000000000 +1030
@@ -999,6 +999,22 @@
 
 	  If unsure, say N here.
 
+config MM_FIELDS_SETTING
+	bool "Allow modifying per-process memory-region fields"
+	default y
+	help
+	   Support "prctl(PR_SET_MM)" which allows applications to modify
+	   the following in their "mm_struct":
+
+	      start_code, end_code, start_data, end_data, start_brk, brk,
+	      start_stack, arg_start, arg_end, env_start, env_end.
+
+	    Also to modify their executable file ("/proc/self/exe").
+
+	    This option is needed for reconstructing processes (such as when
+	    restoring a process from a checkpoint; duplicating a process;
+	    or migrating it to another computer).
+
 menuconfig NAMESPACES
 	bool "Namespaces support" if EXPERT
 	default !EXPERT
diff -Naur before/kernel/sys.c after/kernel/sys.c
--- before/kernel/sys.c	2013-02-19 10:28:34.000000000 +1030
+++ after/kernel/sys.c	2013-02-21 17:19:10.000000000 +1030
@@ -1788,7 +1788,7 @@
 	return mask;
 }
 
-#ifdef CONFIG_CHECKPOINT_RESTORE
+#if defined(CONFIG_CHECKPOINT_RESTORE) || defined(CONFIG_MM_FIELDS_SETTING)
 static int prctl_set_mm_exe_file(struct mm_struct *mm, unsigned int fd)
 {
 	struct fd exe;
@@ -1981,18 +1981,22 @@
 	up_read(&mm->mmap_sem);
 	return error;
 }
+#else /* CONFIG_CHECKPOINT_RESTORE || CONFIG_MM_FIELDS_SETTING */
 
-static int prctl_get_tid_address(struct task_struct *me, int __user **tid_addr)
-{
-	return put_user(me->clear_child_tid, tid_addr);
-}
-
-#else /* CONFIG_CHECKPOINT_RESTORE */
 static int prctl_set_mm(int opt, unsigned long addr,
 			unsigned long arg4, unsigned long arg5)
 {
 	return -EINVAL;
 }
+#endif
+
+#ifdef CONFIG_CHECKPOINT_RESTORE
+static int prctl_get_tid_address(struct task_struct *me, int __user **tid_addr)
+{
+	return put_user(me->clear_child_tid, tid_addr);
+}
+
+#else
 static int prctl_get_tid_address(struct task_struct *me, int __user **tid_addr)
 {
 	return -EINVAL;

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: prctl(PR_SET_MM)
  2013-02-20 10:51               ` prctl(PR_SET_MM) Cyrill Gorcunov
@ 2013-02-20 11:16                 ` Amnon Shiloh
  2013-02-21  7:46                 ` prctl(PR_SET_MM) Amnon Shiloh
  1 sibling, 0 replies; 20+ messages in thread
From: Amnon Shiloh @ 2013-02-20 11:16 UTC (permalink / raw)
  To: Cyrill Gorcunov
  Cc: Steven Rostedt, u3557, Oleg Nesterov, Pedro Alves,
	Denys Vlasenko, Jan Kratochvil, Pavel Emelyanov,
	Frederic Weisbecker, Ingo Molnar, Peter Zijlstra, linux-kernel

Hi Cyrill,

Cyrill Gorcunov wrote:

> I see. Thanks for explanation! Thus we need some new config option which would
> enable this prctl opcodes (y by default), in turn config-checkpoint-restore
> kconfig option need to select this feature if set. Sounds reasonable?

Yes, This would be one possible way to do it.
Another possibility is to not have an #ifdef at all around this code.
Another possibility is to have a dual #if:

#if defined(CONFIG_CHECKPOINT_RESTORE) || defined(CONFIG_MM_FIELDS_SETTING)

Best Regards,
Amnon.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: prctl(PR_SET_MM)
  2013-02-20  9:38             ` prctl(PR_SET_MM) Amnon Shiloh
@ 2013-02-20 10:51               ` Cyrill Gorcunov
  2013-02-20 11:16                 ` prctl(PR_SET_MM) Amnon Shiloh
  2013-02-21  7:46                 ` prctl(PR_SET_MM) Amnon Shiloh
  0 siblings, 2 replies; 20+ messages in thread
From: Cyrill Gorcunov @ 2013-02-20 10:51 UTC (permalink / raw)
  To: Amnon Shiloh
  Cc: Steven Rostedt, u3557, Oleg Nesterov, Pedro Alves,
	Denys Vlasenko, Jan Kratochvil, Pavel Emelyanov,
	Frederic Weisbecker, Ingo Molnar, Peter Zijlstra, linux-kernel

On Wed, Feb 20, 2013 at 08:38:01PM +1100, Amnon Shiloh wrote:
> > I personally don't mind if this come become y by default, because it will
> > work for us.
> 
> I don't mind either, to say the least, to have CONFIG_CHECKPOINT_RESTORE
> have a "default y" in "init/Kconfig" - that would solve my problem and make
> me happy, as well as your group and others, but we are told here that Linus
> has a policy vetoing such changes and I don't believe either of us can make
> him change his mind.

That's perfectly fine for all new features :-)

> As such, we must look at other options, such as having the code in
> "kernel/sys.c" out in the open, not enclosed by any #ifdef's altogether
> - surely you would like that!
> 
> > Still I guess if you need to reconstruct Linux process(es)
> > plain prctl extension is not enough, you still need a functionality which
> > is enclosed in config-checkpoint-restore (say /proc/pid/map_files, kcmp),
> > no?
> 
> My process-migration package only reconstructs Linux processes partially:
> as it's a bit different than any classical checkpoint-restore, the critical
> code in "kernel/sys.c" is all I need at the moment (from within
> CONFIG_CHECKPOINT_RESTORE).  I do appreciate that anyone attempting to
> perform complete, classical, checkpoint-restore operations, needs more
> than that.

I see. Thanks for explanation! Thus we need some new config option which would
enable this prctl opcodes (y by default), in turn config-checkpoint-restore
kconfig option need to select this feature if set. Sounds reasonable?

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: prctl(PR_SET_MM)
  2013-02-20  8:39           ` prctl(PR_SET_MM) Cyrill Gorcunov
@ 2013-02-20  9:38             ` Amnon Shiloh
  2013-02-20 10:51               ` prctl(PR_SET_MM) Cyrill Gorcunov
  0 siblings, 1 reply; 20+ messages in thread
From: Amnon Shiloh @ 2013-02-20  9:38 UTC (permalink / raw)
  To: Cyrill Gorcunov
  Cc: Steven Rostedt, u3557, Oleg Nesterov, Pedro Alves,
	Denys Vlasenko, Jan Kratochvil, Pavel Emelyanov,
	Frederic Weisbecker, Ingo Molnar, Peter Zijlstra, linux-kernel

Hi Cyrill,

Cyrill Gorcunov wrote:

> On Tue, Feb 19, 2013 at 05:25:31PM +1100, Amnon Shiloh wrote:
> > 
> > To reconstruct Linux process(es), one must be able to restore all their
> > memory contents, states and registers to their original and consistent
> > values.
> 
> I personally don't mind if this come become y by default, because it will
> work for us.

I don't mind either, to say the least, to have CONFIG_CHECKPOINT_RESTORE
have a "default y" in "init/Kconfig" - that would solve my problem and make
me happy, as well as your group and others, but we are told here that Linus
has a policy vetoing such changes and I don't believe either of us can make
him change his mind.

As such, we must look at other options, such as having the code in
"kernel/sys.c" out in the open, not enclosed by any #ifdef's altogether
- surely you would like that!

> Still I guess if you need to reconstruct Linux process(es)
> plain prctl extension is not enough, you still need a functionality which
> is enclosed in config-checkpoint-restore (say /proc/pid/map_files, kcmp),
> no?

My process-migration package only reconstructs Linux processes partially:
as it's a bit different than any classical checkpoint-restore, the critical
code in "kernel/sys.c" is all I need at the moment (from within
CONFIG_CHECKPOINT_RESTORE).  I do appreciate that anyone attempting to
perform complete, classical, checkpoint-restore operations, needs more
than that.

Best Regards,
Amnon.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: prctl(PR_SET_MM)
  2013-02-19  6:25         ` prctl(PR_SET_MM) Amnon Shiloh
@ 2013-02-20  8:39           ` Cyrill Gorcunov
  2013-02-20  9:38             ` prctl(PR_SET_MM) Amnon Shiloh
  2013-02-22 14:23           ` prctl(PR_SET_MM) Denys Vlasenko
  1 sibling, 1 reply; 20+ messages in thread
From: Cyrill Gorcunov @ 2013-02-20  8:39 UTC (permalink / raw)
  To: Amnon Shiloh
  Cc: Steven Rostedt, u3557, Oleg Nesterov, Pedro Alves,
	Denys Vlasenko, Jan Kratochvil, Pavel Emelyanov,
	Frederic Weisbecker, Ingo Molnar, Peter Zijlstra, linux-kernel

On Tue, Feb 19, 2013 at 05:25:31PM +1100, Amnon Shiloh wrote:
> 
> To reconstruct Linux process(es), one must be able to restore all their
> memory contents, states and registers to their original and consistent
> values.

I personally don't mind if this come become y by default, because it will
work for us. Still I guess if you need to reconstruct Linux process(es)
plain prctl extension is not enough, you still need a functionality which
is enclosed in config-checkpoint-restore (say /proc/pid/map_files, kcmp),
no?

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: prctl(PR_SET_MM)
  2013-02-18 19:49       ` prctl(PR_SET_MM) Steven Rostedt
@ 2013-02-19  6:25         ` Amnon Shiloh
  2013-02-20  8:39           ` prctl(PR_SET_MM) Cyrill Gorcunov
  2013-02-22 14:23           ` prctl(PR_SET_MM) Denys Vlasenko
  0 siblings, 2 replies; 20+ messages in thread
From: Amnon Shiloh @ 2013-02-19  6:25 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: u3557, Oleg Nesterov, Pedro Alves, Denys Vlasenko,
	Jan Kratochvil, Cyrill Gorcunov, Pavel Emelyanov,
	Frederic Weisbecker, Ingo Molnar, Peter Zijlstra, linux-kernel

Steven Rostedt wrote:

> If only you, or a few people are using it (ie. distros don't see a
> need), then it will be up to you to make the changes.

I believe that this functionality is of a general nature and is needed
by many, not only by myself and by the CRIU group, but by all user-level
software packages, past present and future, that provide some form or
another of reconstructing a Linux process.

For example:
1) Checkpoint-restore
   - the ability to resume a computation after computer crashes.
2) Process-migration
   - the ability to have a running computation change computers.
     (for numerous reasons, including planned shutdown; overheating;
     load-balancing; increased memory demands; the original computer
     being required by higher-priority users, etc. etc.)
3) Process-duplication
   - the ability to have a running computation continue in parallel from
     a point in time on two or more computers, so that if one of them
     fails, at least one copy will still be running (in time-critical
     missions, a periodic checkpointing may not be sufficient).

To reconstruct Linux process(es), one must be able to restore all their
memory contents, states and registers to their original and consistent
values.    

Other than the code currently enclosed in "CONFIG_CHECKPOINT_RESTORE",
nothing else in the kernel provides the ability to set those 11 fields:

     start_code, end_code, start_data, end_data, start_brk, brk,
     start_stack, arg_start, arg_end, env_start, env_end.

There are many possible ways to implement this functionality and if you
believe that it would be a good idea, I could instead send a simple,
independent kernel patch that sets those 11 fields.  It would be shorter
than the current and perhaps implemented as a new "ptrace" function,
but given that the CRIU group already wrote this code and that it is
already in the kernel, I am concerned that trying to duplicate it might
interfere with their work, so I first ask for the relevant code in
"kernel/sys.c" to no longer be enclosed by an #ifdef.  What do you think?

Best Regards,
Amnon.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: prctl(PR_SET_MM)
  2013-02-18 16:33     ` prctl(PR_SET_MM) Amnon Shiloh
@ 2013-02-18 19:49       ` Steven Rostedt
  2013-02-19  6:25         ` prctl(PR_SET_MM) Amnon Shiloh
  0 siblings, 1 reply; 20+ messages in thread
From: Steven Rostedt @ 2013-02-18 19:49 UTC (permalink / raw)
  To: u3557
  Cc: Oleg Nesterov, Pedro Alves, Denys Vlasenko, Jan Kratochvil,
	Cyrill Gorcunov, Pavel Emelyanov, Frederic Weisbecker,
	Ingo Molnar, Peter Zijlstra, linux-kernel

On Tue, 2013-02-19 at 03:33 +1100, Amnon Shiloh wrote:

> Yes, Randy Dunlap already raised this point, but I have no dealings with
> any particular Linux distribution or the right connections to chase them
> all, one by one - I develop generic software for the general Linux community,
> that is intended to work distribution-independently.  Even if I had access
> to all distributions, it may be hard to convince them to configure
> CONFIG_CHECKPOINT_RESTORE as a whole since it contains so much other code.

If only you, or a few people are using it (ie. distros don't see a
need), then it will be up to you to make the changes.

> 
> BTW, Can anyone explain this policy of "have things default n"?
> When I go over "init/Kconfig" or most other Kconfig's, I can
> actually see lots of "default y".

Linus stated that he doesn't want any new feature default on, unless it
gives better performance to the general populace. Because someone's old
config should not add new features when they use it for a new kernel.
Basically, if something is default on, then it probably shouldn't have a
config for it.

Some are on because they were always on (although several were changed
to 'n'). Also it's fine to have a default y that depends on something
that is default n, if it makes sense.

For example, the function graph tracer is default y, but requires the
function tracer to be enabled which is default n. This is because I
assumed that you would want the function graph tracer if you enabled
function tracing, as the graph tracer gives much more information.

> 
> > 
> > > 2) Releasing this code from the "#ifdef CONFIG_CHECK_RESTORE"; or
> > > 3) Placing this code within a different kernel-configuration option
> > >    (say "CONFIG_BASIC_CHECKPOINTING") that is enabled by default; or
> > > 4) Placing this code under a dual #if, so instead of:
> > >    #ifdef CONFIG_CHECKPOINT_RESTORE
> > > 	   have:
> > >    #if defined(CONFIG_CHECKPOINT_RESTORE) || defined(CONFIG_BASIC_CHECKPOINTING)
> > 
> > One of the above 3 can probably be worked out.
> > 
> > -- Steve
> 
> Great!
> 
> Naturally I prefer option 2 (but the other two will do as well).

I have no problems with this out of line.

-- Steve



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: prctl(PR_SET_MM)
  2013-02-18 15:21   ` prctl(PR_SET_MM) Steven Rostedt
@ 2013-02-18 16:33     ` Amnon Shiloh
  2013-02-18 19:49       ` prctl(PR_SET_MM) Steven Rostedt
  0 siblings, 1 reply; 20+ messages in thread
From: Amnon Shiloh @ 2013-02-18 16:33 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: u3557, Oleg Nesterov, Pedro Alves, Denys Vlasenko,
	Jan Kratochvil, Cyrill Gorcunov, Pavel Emelyanov,
	Frederic Weisbecker, Ingo Molnar, Peter Zijlstra, linux-kernel

Steven Rostedt wrote:

> On Mon, 2013-02-18 at 12:39 +1100, Amnon Shiloh wrote:
> > Hello,
> > 
> > The code in "kernel/sys.c" provides the "prctl(PR_SET_MM)" function,
> > which is the only way a process can set or modify the following 11
> > per-process fields:
> > 
> >  	start_code, end_code, start_data, end_data, start_brk, brk,
> >  	start_stack, arg_start, arg_end, env_start, env_end.
> > 
> > Being able to set those fields is important, even crucial,
> > for any conceivable user-level checkpointing software, as
> > well as for migrating processes between different computers.
> 
> You're saying that this is useful for code not needing a kernel with
> CHECKPOINT_RESTORE enabled. Correct?

Correct, this is an important feature that is useful for a whole
general class of applications, not only those needing CHECKPOINT_RESTORE.

Had this not been done as part of the CHECKPOINT_RESTORE project, it
would have certainly been done, sooner or later, by some other developers:
it just so happened that the CHECKPOINT_RESTORE people were the first to
(publically) fill this gap, but in fact this code in "kernel/sys.c" should
be general kernel code, not part of CHECKPOINT_RESTORE.

> 
> > 
> > Unfortunately, this code (essentially "prctl_set_mm()") is presently
> > enclosed in "#ifdef CONFIG_CHECKPOINT_RESTORE" which is configured
> > as "default n" in "init/Kconfig".  Many system-administrators who
> > may like to have a checkpoint/restore or process-migration facility,
> > but use standard pre-packaged kernels, find the requirement to
> > configure and compile their own non-standard kernel difficult or
> > too prohibitive.
> > 
> > Would it be possible to have this code enabled by default?
> > 
> > This could be done in one of 4 ways:
> > 1) Having CONFIG_CHECKPOINT_RESTORE enabled by default; or
> 
> Nope, that wont due. Kernel policy is to have things default n. Have an
> issue with a config, talk with the distribution you are dealing with.
> They set the policy of what configs get set for their kernels.

Yes, Randy Dunlap already raised this point, but I have no dealings with
any particular Linux distribution or the right connections to chase them
all, one by one - I develop generic software for the general Linux community,
that is intended to work distribution-independently.  Even if I had access
to all distributions, it may be hard to convince them to configure
CONFIG_CHECKPOINT_RESTORE as a whole since it contains so much other code.

BTW, Can anyone explain this policy of "have things default n"?
When I go over "init/Kconfig" or most other Kconfig's, I can
actually see lots of "default y".

> 
> > 2) Releasing this code from the "#ifdef CONFIG_CHECK_RESTORE"; or
> > 3) Placing this code within a different kernel-configuration option
> >    (say "CONFIG_BASIC_CHECKPOINTING") that is enabled by default; or
> > 4) Placing this code under a dual #if, so instead of:
> >    #ifdef CONFIG_CHECKPOINT_RESTORE
> > 	   have:
> >    #if defined(CONFIG_CHECKPOINT_RESTORE) || defined(CONFIG_BASIC_CHECKPOINTING)
> 
> One of the above 3 can probably be worked out.
> 
> -- Steve

Great!

Naturally I prefer option 2 (but the other two will do as well).

Thanks,
Amnon.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: prctl(PR_SET_MM)
  2013-02-18  1:39 ` prctl(PR_SET_MM) Amnon Shiloh
  2013-02-18  5:44   ` prctl(PR_SET_MM) Randy Dunlap
@ 2013-02-18 15:21   ` Steven Rostedt
  2013-02-18 16:33     ` prctl(PR_SET_MM) Amnon Shiloh
  1 sibling, 1 reply; 20+ messages in thread
From: Steven Rostedt @ 2013-02-18 15:21 UTC (permalink / raw)
  To: u3557
  Cc: Oleg Nesterov, Pedro Alves, Denys Vlasenko, Jan Kratochvil,
	Cyrill Gorcunov, Pavel Emelyanov, Frederic Weisbecker,
	Ingo Molnar, Peter Zijlstra, linux-kernel

On Mon, 2013-02-18 at 12:39 +1100, Amnon Shiloh wrote:
> Hello,
> 
> The code in "kernel/sys.c" provides the "prctl(PR_SET_MM)" function,
> which is the only way a process can set or modify the following 11
> per-process fields:
> 
>  	start_code, end_code, start_data, end_data, start_brk, brk,
>  	start_stack, arg_start, arg_end, env_start, env_end.
> 
> Being able to set those fields is important, even crucial,
> for any conceivable user-level checkpointing software, as
> well as for migrating processes between different computers.

You're saying that this is useful for code not needing a kernel with
CHECKPOINT_RESTORE enabled. Correct?

> 
> Unfortunately, this code (essentially "prctl_set_mm()") is presently
> enclosed in "#ifdef CONFIG_CHECKPOINT_RESTORE" which is configured
> as "default n" in "init/Kconfig".  Many system-administrators who
> may like to have a checkpoint/restore or process-migration facility,
> but use standard pre-packaged kernels, find the requirement to
> configure and compile their own non-standard kernel difficult or
> too prohibitive.
> 
> Would it be possible to have this code enabled by default?
> 
> This could be done in one of 4 ways:
> 1) Having CONFIG_CHECKPOINT_RESTORE enabled by default; or

Nope, that wont due. Kernel policy is to have things default n. Have an
issue with a config, talk with the distribution you are dealing with.
They set the policy of what configs get set for their kernels.

> 2) Releasing this code from the "#ifdef CONFIG_CHECK_RESTORE"; or
> 3) Placing this code within a different kernel-configuration option
>    (say "CONFIG_BASIC_CHECKPOINTING") that is enabled by default; or
> 4) Placing this code under a dual #if, so instead of:
>    #ifdef CONFIG_CHECKPOINT_RESTORE
> 	   have:
>    #if defined(CONFIG_CHECKPOINT_RESTORE) || defined(CONFIG_BASIC_CHECKPOINTING)

One of the above 3 can probably be worked out.

-- Steve



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: prctl(PR_SET_MM)
  2013-02-18  1:39 ` prctl(PR_SET_MM) Amnon Shiloh
@ 2013-02-18  5:44   ` Randy Dunlap
  2013-02-18 15:21   ` prctl(PR_SET_MM) Steven Rostedt
  1 sibling, 0 replies; 20+ messages in thread
From: Randy Dunlap @ 2013-02-18  5:44 UTC (permalink / raw)
  To: u3557
  Cc: Amnon Shiloh, Oleg Nesterov, Pedro Alves, Denys Vlasenko,
	Jan Kratochvil, Cyrill Gorcunov, Pavel Emelyanov, Steven Rostedt,
	Frederic Weisbecker, Ingo Molnar, Peter Zijlstra, linux-kernel

On 02/17/13 17:39, Amnon Shiloh wrote:
> Hello,
> 
> The code in "kernel/sys.c" provides the "prctl(PR_SET_MM)" function,
> which is the only way a process can set or modify the following 11
> per-process fields:
> 
>  	start_code, end_code, start_data, end_data, start_brk, brk,
>  	start_stack, arg_start, arg_end, env_start, env_end.
> 
> Being able to set those fields is important, even crucial,
> for any conceivable user-level checkpointing software, as
> well as for migrating processes between different computers.
> 
> Unfortunately, this code (essentially "prctl_set_mm()") is presently
> enclosed in "#ifdef CONFIG_CHECKPOINT_RESTORE" which is configured
> as "default n" in "init/Kconfig".  Many system-administrators who
> may like to have a checkpoint/restore or process-migration facility,
> but use standard pre-packaged kernels, find the requirement to
> configure and compile their own non-standard kernel difficult or
> too prohibitive.
> 
> Would it be possible to have this code enabled by default?
> 
> This could be done in one of 4 ways:
> 1) Having CONFIG_CHECKPOINT_RESTORE enabled by default; or
> 2) Releasing this code from the "#ifdef CONFIG_CHECK_RESTORE"; or
> 3) Placing this code within a different kernel-configuration option
>    (say "CONFIG_BASIC_CHECKPOINTING") that is enabled by default; or
> 4) Placing this code under a dual #if, so instead of:
>    #ifdef CONFIG_CHECKPOINT_RESTORE
> 	   have:
>    #if defined(CONFIG_CHECKPOINT_RESTORE) || defined(CONFIG_BASIC_CHECKPOINTING)


This is basically a distro issue.  Distros can choose to enable
this code by default, but the Linux kernel that Linus maintains does
not need to enable it.


-- 
~Randy

^ permalink raw reply	[flat|nested] 20+ messages in thread

* prctl(PR_SET_MM)
  2013-01-14 16:01 PTRACE_SYSCALL && vsyscall (Was: arch_check_bp_in_kernelspace: fix the range check) Oleg Nesterov
@ 2013-02-18  1:39 ` Amnon Shiloh
  2013-02-18  5:44   ` prctl(PR_SET_MM) Randy Dunlap
  2013-02-18 15:21   ` prctl(PR_SET_MM) Steven Rostedt
  0 siblings, 2 replies; 20+ messages in thread
From: Amnon Shiloh @ 2013-02-18  1:39 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Pedro Alves, Denys Vlasenko, Jan Kratochvil, Cyrill Gorcunov,
	Pavel Emelyanov, Steven Rostedt, Frederic Weisbecker,
	Ingo Molnar, Peter Zijlstra, linux-kernel

Hello,

The code in "kernel/sys.c" provides the "prctl(PR_SET_MM)" function,
which is the only way a process can set or modify the following 11
per-process fields:

 	start_code, end_code, start_data, end_data, start_brk, brk,
 	start_stack, arg_start, arg_end, env_start, env_end.

Being able to set those fields is important, even crucial,
for any conceivable user-level checkpointing software, as
well as for migrating processes between different computers.

Unfortunately, this code (essentially "prctl_set_mm()") is presently
enclosed in "#ifdef CONFIG_CHECKPOINT_RESTORE" which is configured
as "default n" in "init/Kconfig".  Many system-administrators who
may like to have a checkpoint/restore or process-migration facility,
but use standard pre-packaged kernels, find the requirement to
configure and compile their own non-standard kernel difficult or
too prohibitive.

Would it be possible to have this code enabled by default?

This could be done in one of 4 ways:
1) Having CONFIG_CHECKPOINT_RESTORE enabled by default; or
2) Releasing this code from the "#ifdef CONFIG_CHECK_RESTORE"; or
3) Placing this code within a different kernel-configuration option
   (say "CONFIG_BASIC_CHECKPOINTING") that is enabled by default; or
4) Placing this code under a dual #if, so instead of:
   #ifdef CONFIG_CHECKPOINT_RESTORE
	   have:
   #if defined(CONFIG_CHECKPOINT_RESTORE) || defined(CONFIG_BASIC_CHECKPOINTING)


Thank you,
Amnon.

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2013-02-24  6:28 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <20130222142603.987c6e3c.akpm@linux-foundation.org>
2013-02-24  6:24 ` prctl(PR_SET_MM) Amnon Shiloh
2013-02-24  6:28 ` prctl(PR_SET_MM) Amnon Shiloh
2013-01-14 16:01 PTRACE_SYSCALL && vsyscall (Was: arch_check_bp_in_kernelspace: fix the range check) Oleg Nesterov
2013-02-18  1:39 ` prctl(PR_SET_MM) Amnon Shiloh
2013-02-18  5:44   ` prctl(PR_SET_MM) Randy Dunlap
2013-02-18 15:21   ` prctl(PR_SET_MM) Steven Rostedt
2013-02-18 16:33     ` prctl(PR_SET_MM) Amnon Shiloh
2013-02-18 19:49       ` prctl(PR_SET_MM) Steven Rostedt
2013-02-19  6:25         ` prctl(PR_SET_MM) Amnon Shiloh
2013-02-20  8:39           ` prctl(PR_SET_MM) Cyrill Gorcunov
2013-02-20  9:38             ` prctl(PR_SET_MM) Amnon Shiloh
2013-02-20 10:51               ` prctl(PR_SET_MM) Cyrill Gorcunov
2013-02-20 11:16                 ` prctl(PR_SET_MM) Amnon Shiloh
2013-02-21  7:46                 ` prctl(PR_SET_MM) Amnon Shiloh
2013-02-21  8:00                   ` prctl(PR_SET_MM) Cyrill Gorcunov
2013-02-21  8:03                     ` prctl(PR_SET_MM) Amnon Shiloh
2013-02-21  8:09                       ` prctl(PR_SET_MM) Cyrill Gorcunov
2013-02-21 22:18                     ` prctl(PR_SET_MM) Andrew Morton
2013-02-21 22:42                       ` prctl(PR_SET_MM) Cyrill Gorcunov
2013-02-22  1:18                       ` prctl(PR_SET_MM) Amnon Shiloh
2013-02-22 14:23           ` prctl(PR_SET_MM) Denys Vlasenko

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.