From: u3557@miso.sublimeip.com (Amnon Shiloh)
To: akpm@linux-foundation.org (Andrew Morton)
Cc: rostedt@goodmis.org, oleg@redhat.com, palves@redhat.com,
	dvlasenk@redhat.com,
	jan.kratochvil@redhat.com, xemul@parallels.com,
	fweisbec@gmail.com, mingo@redhat.com, a.p.zijlstra@chello.nl,
	linux-kernel@vger.kernel.org
Subject: Re: prctl(PR_SET_MM)
Date: Sun, 24 Feb 2013 17:28:33 +1100 (EST)
Message-ID: <20130224062833.EFCE259205C@miso.sublimeip.com>
In-Reply-To: <20130222142603.987c6e3c.akpm@linux-foundation.org>

[-- Attachment #1: Type: text/plain, Size: 3560 bytes --]

Dear Andrew,

Andrew Morton <akpm@linux-foundation.org> wrote:
> Well OK.  Put all that on top of a patch, add suitable signoffs and
> cc's and send it along?

The purpose of this patch is to allow privileged processes to set
their own per-process memory-region fields:

      start_code, end_code, start_data, end_data, start_brk, brk,
      start_stack, arg_start, arg_end, env_start, env_end.

This functionality is needed by any application or package that
needs to reconstruct Linux processes, that is, to start them in
any way other than by means of an "execve()" from an executable
file.  This includes:

1. Restoring processes from a checkpoint-file (by all potential
   user-level checkpointing packages, not only CRIU's).
2. Restarting processes on another node after process migration.
3. Starting duplicate copies of a running process (for reliability
   and high availability).
4. Starting a process from an executable format that is not supported
   by Linux, thus requiring a "manual execve" by a user-level utility.
5. Similarly, starting a process from a networked and/or encrypted
   executable that, for confidentiality, licensing or other reasons,
   may not be written to the local file-systems.

The code that does this was already added to the Linux kernel by
the CRIU group, in the form of "prctl(PR_SET_MM)", but until now it
has been enclosed within their "#ifdef CONFIG_CHECKPOINT_RESTORE",
which is normally disabled.
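
For illustration only (this is not part of either patch), here is a
minimal user-space sketch of how a restorer might use the call.  The
helper names mm_field_option() and set_mm_field() are hypothetical;
the PR_SET_MM_* option codes are the ones exported through
<sys/prctl.h>.  The sketch assumes the caller has CAP_SYS_RESOURCE
and passes an address that the kernel accepts; otherwise prctl()
fails and sets errno.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/prctl.h>

/* Hypothetical lookup table: field name -> PR_SET_MM_* option code. */
struct mm_field {
	const char *name;
	int option;
};

static const struct mm_field mm_fields[] = {
	{ "start_code",  PR_SET_MM_START_CODE  },
	{ "end_code",    PR_SET_MM_END_CODE    },
	{ "start_data",  PR_SET_MM_START_DATA  },
	{ "end_data",    PR_SET_MM_END_DATA    },
	{ "start_brk",   PR_SET_MM_START_BRK   },
	{ "brk",         PR_SET_MM_BRK         },
	{ "start_stack", PR_SET_MM_START_STACK },
	{ "arg_start",   PR_SET_MM_ARG_START   },
	{ "arg_end",     PR_SET_MM_ARG_END     },
	{ "env_start",   PR_SET_MM_ENV_START   },
	{ "env_end",     PR_SET_MM_ENV_END     },
};

/* Return the option code for a field name, or -1 if it is unknown. */
int mm_field_option(const char *name)
{
	size_t i;

	assert(name != NULL);
	for (i = 0; i < sizeof(mm_fields) / sizeof(mm_fields[0]); i++) {
		if (strcmp(mm_fields[i].name, name) == 0)
			return mm_fields[i].option;
	}
	return -1;
}

/*
 * Set one memory-region field of the calling process.
 * Returns 0 on success, -1 with errno set on failure (for example
 * EPERM without CAP_SYS_RESOURCE, or EINVAL for an unknown field
 * name or a kernel that rejects the request).
 */
int set_mm_field(const char *name, unsigned long addr)
{
	int option = mm_field_option(name);

	if (option < 0) {
		errno = EINVAL;
		return -1;
	}
	return prctl(PR_SET_MM, option, addr, 0, 0);
}
```

A restorer would call, say, set_mm_field("arg_start", addr) once for
each field after it has rebuilt the address space, and check errno
on failure.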

It was not clear from your answer, Andrew, whether you prefer to
remove the "#ifdef CONFIG_CHECKPOINT_RESTORE" altogether from the
said code, or to enclose it in a new configuration option that is
enabled by default.  I therefore attach two alternative patches
to choose from: option 1 (attachment #3) removes the #ifdef
altogether, while option 2 (attachment #2) introduces a new option,
CONFIG_MM_FIELDS_SETTING.

Signed-off-by: Amnon Shiloh <u3557@miso.sublimeip.com>

Best Regards,
Amnon.


> On Fri, 22 Feb 2013 12:18:01 +1100 (EST)
> u3557@miso.sublimeip.com (Amnon Shiloh) wrote:
> 
> > The code in "kernel/sys.c" that is currently within
> > CONFIG_CHECKPOINT_RESTORE is in fact, as I explain below,
> > one possible solution to a general issue, required by a wide
> > class of applications.  It just so happened that the CRIU group
> > were the first to place this, or an equivalent code, in the kernel,
> > that allows a privileged process to set its 11 per-process memory-region
> > fields:
> >      start_code, end_code, start_data, end_data, start_brk, brk,
> >      start_stack, arg_start, arg_end, env_start, env_end.
> > 
> > 
> > Contrary to the rest of the CHECKPOINT_RESTORE code, which is specific
> > to the CRIU package, the code in "kernel/sys.c" (or its equivalent) is
> > needed by ANY application or package that needs to reconstruct Linux
> > processes, that means, starting them from the middle rather than from
> > an executable file.
> > 
> > That includes user-level checkpointing (any, not just CRIU's),
> > process-migration (to other computers, as my own package does)
> > and process duplication (for high-availability/reliability) -
> > in fact even for starting a process from an executable format
> > that is not supported by Linux, thus requiring a "manual execve"
> > by a user-level utility.
> > 
> > My first preference is to remove that "#ifdef CONFIG_CHECKPOINT_RESTORE"
> > altogether.  Note that there are no security issues because this code
> > is already restricted to "capable(CAP_SYS_RESOURCE)".
> > Short of that is the proposed patch.
> 
> Well OK.  Put all that on top of a patch, add suitable signoffs and
> cc's and send it along?
> 

[-- Attachment #2: unified diff output, ASCII text --]
[-- Type: text/plain, Size: 2270 bytes --]

diff -Naur linux-3.8/init/Kconfig option2/init/Kconfig
--- linux-3.8/init/Kconfig	2013-02-19 10:28:34.000000000 +1030
+++ option2/init/Kconfig	2013-02-24 13:57:02.000000000 +1030
@@ -991,6 +991,7 @@
 config CHECKPOINT_RESTORE
 	bool "Checkpoint/restore support" if EXPERT
 	default n
+	select MM_FIELDS_SETTING
 	help
 	  Enables additional kernel features in a sake of checkpoint/restore.
 	  In particular it adds auxiliary prctl codes to setup process text,
@@ -999,6 +1000,22 @@
 
 	  If unsure, say N here.
 
+config MM_FIELDS_SETTING
+	bool "Allow modifying per-process memory-region fields"
+	default y
+	help
+	  Support "prctl(PR_SET_MM)", which allows applications to modify
+	  the following fields in their "mm_struct":
+
+	    start_code, end_code, start_data, end_data, start_brk, brk,
+	    start_stack, arg_start, arg_end, env_start, env_end.
+
+	  It also allows changing the executable file ("/proc/self/exe").
+
+	  This option is needed for reconstructing processes (such as when
+	  restoring a process from a checkpoint, duplicating a process,
+	  or migrating it to another computer).
+
 menuconfig NAMESPACES
 	bool "Namespaces support" if EXPERT
 	default !EXPERT
diff -Naur linux-3.8/kernel/sys.c option2/kernel/sys.c
--- linux-3.8/kernel/sys.c	2013-02-19 10:28:34.000000000 +1030
+++ option2/kernel/sys.c	2013-02-24 10:37:08.000000000 +1030
@@ -1788,7 +1788,7 @@
 	return mask;
 }
 
-#ifdef CONFIG_CHECKPOINT_RESTORE
+#ifdef CONFIG_MM_FIELDS_SETTING
 static int prctl_set_mm_exe_file(struct mm_struct *mm, unsigned int fd)
 {
 	struct fd exe;
@@ -1981,18 +1981,22 @@
 	up_read(&mm->mmap_sem);
 	return error;
 }
+#else /* CONFIG_MM_FIELDS_SETTING */
 
-static int prctl_get_tid_address(struct task_struct *me, int __user **tid_addr)
-{
-	return put_user(me->clear_child_tid, tid_addr);
-}
-
-#else /* CONFIG_CHECKPOINT_RESTORE */
 static int prctl_set_mm(int opt, unsigned long addr,
 			unsigned long arg4, unsigned long arg5)
 {
 	return -EINVAL;
 }
+#endif
+
+#ifdef CONFIG_CHECKPOINT_RESTORE
+static int prctl_get_tid_address(struct task_struct *me, int __user **tid_addr)
+{
+	return put_user(me->clear_child_tid, tid_addr);
+}
+
+#else
 static int prctl_get_tid_address(struct task_struct *me, int __user **tid_addr)
 {
 	return -EINVAL;

[-- Attachment #3: unified diff output, ASCII text --]
[-- Type: text/plain, Size: 836 bytes --]

diff -Naur linux-3.8/kernel/sys.c option1/kernel/sys.c
--- linux-3.8/kernel/sys.c	2013-02-19 10:28:34.000000000 +1030
+++ option1/kernel/sys.c	2013-02-24 10:47:45.000000000 +1030
@@ -1788,7 +1788,6 @@
 	return mask;
 }
 
-#ifdef CONFIG_CHECKPOINT_RESTORE
 static int prctl_set_mm_exe_file(struct mm_struct *mm, unsigned int fd)
 {
 	struct fd exe;
@@ -1982,17 +1981,12 @@
 	return error;
 }
 
+#ifdef CONFIG_CHECKPOINT_RESTORE
 static int prctl_get_tid_address(struct task_struct *me, int __user **tid_addr)
 {
 	return put_user(me->clear_child_tid, tid_addr);
 }
-
-#else /* CONFIG_CHECKPOINT_RESTORE */
-static int prctl_set_mm(int opt, unsigned long addr,
-			unsigned long arg4, unsigned long arg5)
-{
-	return -EINVAL;
-}
+#else
 static int prctl_get_tid_address(struct task_struct *me, int __user **tid_addr)
 {
 	return -EINVAL;
