* [PATCH 0/3] Make core_pattern support namespace
@ 2017-08-02  6:37 Cao Shufeng
  2017-08-02  6:37 ` [PATCH_v4.1_1/3] Make call_usermodehelper_exec possible to set namespaces Cao Shufeng
                   ` (3 more replies)
  0 siblings, 4 replies; 24+ messages in thread
From: Cao Shufeng @ 2017-08-02  6:37 UTC (permalink / raw)
  To: linux-kernel
  Cc: containers, ebiederm, mguzik, kamezawa.hiroyu, stgraber, avagin,
	zhaolei, mashimiao.fnst, caosf.fnst

This patchset covers the following function points:
1: Allow the usermodehelper functions to set the pid namespace
   done by: [PATCH_v4.1_1/3] Make call_usermodehelper_exec possible
   to set namespaces
2: Make a pipe_type core_pattern write the dump into the container's rootfs
   done by: [PATCH_v4.1_2/3] Limit dump_pipe program's permission to
   init for container
3: Give each container its own core_pattern setting
   done by: [PATCH_v4.1_3/3] Make core_pattern support namespace
4: Compatibility with the current system
   also included in: [PATCH_v4.1_3/3] Make core_pattern support namespace
   If a container has not changed its core_pattern setting, it keeps
   the same setting as the host.
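
As a rough userspace illustration of point 4 (this is not the code from
patch 3/3; the struct, the size limit and the function names are made up
for the example), the fallback can be pictured as a walk from the
container's namespace towards the host until a namespace that has set a
pattern is found:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_PATTERN_MAX 128	/* assumed size limit, for the sketch only */

/* Hypothetical stand-in for a pid namespace; not the kernel's struct. */
struct demo_pid_ns {
	struct demo_pid_ns *parent;		/* NULL for the host namespace */
	char core_pattern[DEMO_PATTERN_MAX];	/* "" means "never set here" */
};

/*
 * Return the pattern that applies to @ns: its own if it has set one,
 * otherwise the one of the nearest ancestor (ending at the host).
 */
static const char *demo_effective_core_pattern(const struct demo_pid_ns *ns)
{
	assert(ns != NULL);
	while (ns->parent && ns->core_pattern[0] == '\0')
		ns = ns->parent;
	return ns->core_pattern;
}

/* Record a namespace-local setting; returns 0 on success, -1 if too long. */
static int demo_set_core_pattern(struct demo_pid_ns *ns, const char *pattern)
{
	if (strlen(pattern) >= DEMO_PATTERN_MAX)
		return -1;
	strcpy(ns->core_pattern, pattern);
	return 0;
}
```

With this model, a container that never writes its own pattern sees
the host's value, and a write inside the container does not change
what the host sees.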

Test:
1: Passes a test script covering each function of this patchset
   ## TEST IN HOST ##
   [root@kerneldev dumptest]# ./test_host
   Set file core_pattern: OK
   ./test_host: line 41:  2366 Segmentation fault      (core dumped) "$SCRIPT_BASE_DIR"/make_dump
   Checking dumpfile: OK
   Set file core_pattern: OK
   ./test_host: line 41:  2369 Segmentation fault      (core dumped) "$SCRIPT_BASE_DIR"/make_dump
   Checking dump_pipe triggered: OK
   Checking rootfs: OK
   Checking dumpfile: OK
   Checking namespace: OK
   Checking process list: OK
   Checking capabilities: OK

   ## TEST IN GUEST ##
   # ./test
   Segmentation fault (core dumped)
   Checking dump_pipe triggered: OK
   Checking rootfs: OK
   Checking dumpfile: OK
   Checking namespace: OK
   Checking process list: OK
   Checking cg pids: OK
   Checking capabilities: OK
   [   64.940734] make_dump[2432]: segfault at 0 ip 000000000040049d sp 00007ffc4af025f0 error 6 in make_dump[400000+a6000]
   #
2: Passes other tests (which are not easy to script) when run by hand.

Changelog v3.1-v4:
1: Remove extra fork, pointed out by:
   Andrei Vagin <avagin@gmail.com>
2: Rebase on top of v4.9-rc8.
3: Rebase on top of v4.12.

Changelog v3-v3.1:
1: Switch "pwd" of pipe program to container's root fs.
2: Rebase on top of v4.9-rc1.

Changelog v2->v3:
1: Fix problem of setting pid namespace, pointed out by:
   Andrei Vagin <avagin@gmail.com>

Changelog v1(RFC)->v2:
1: Add [PATCH 2/2], which was a TODO in [RFC v1].
2: Pass a test script for each function.
3: Rebase on top of v4.7.

Suggested-by: Eric W. Biederman <ebiederm@xmission.com>
Suggested-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Cao Shufeng <caosf.fnst@cn.fujitsu.com>

Cao Shufeng (3):
  Make call_usermodehelper_exec possible to set namespaces
  Limit dump_pipe program's permission to init for container
  Make core_pattern support namespace

 fs/coredump.c                 | 150 +++++++++++++++++++++++++++++++++++++++---
 include/linux/binfmts.h       |   2 +
 include/linux/kmod.h          |   5 ++
 include/linux/pid_namespace.h |   3 +
 init/do_mounts_initrd.c       |   3 +-
 kernel/kmod.c                 |  56 +++++++++++++---
 kernel/pid.c                  |   2 +
 kernel/pid_namespace.c        |   2 +
 kernel/sysctl.c               |  50 ++++++++++++--
 lib/kobject_uevent.c          |   3 +-
 security/keys/request_key.c   |   4 +-
 11 files changed, 253 insertions(+), 27 deletions(-)

-- 
2.9.3

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH_v4.1_1/3] Make call_usermodehelper_exec possible to set namespaces
       [not found] ` <1501655849-9149-1-git-send-email-caosf.fnst-BthXqXjhjHXQFUHtdCDX3A@public.gmane.org>
@ 2017-08-02  6:37   ` Cao Shufeng
  2017-08-02  6:37   ` [PATCH_v4.1_2/3] Limit dump_pipe program's permission to init for container Cao Shufeng
                     ` (3 subsequent siblings)
  4 siblings, 0 replies; 24+ messages in thread
From: Cao Shufeng @ 2017-08-02  6:37 UTC (permalink / raw)
  To: linux-kernel
  Cc: containers, caosf.fnst, mashimiao.fnst, ebiederm

Currently call_usermodehelper_exec_work() cannot set namespaces for
the executed program.

This patch adds that ability to call_usermodehelper_exec_work().
The init_intermediate callback is introduced for initialization work
that must be done before fork(), which gives us a way to set namespaces
for children. The cleanup_intermediate callback is introduced to undo
what was done in init_intermediate, such as switching the namespace
back.

This is helpful for coredump, which can then run the pipe program in a
specific container environment.
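
The ordering of the callbacks can be sketched in userspace as follows.
This is only an analogy under stated assumptions: struct demo_helper
and all demo_* functions are hypothetical, fork() stands in for
kernel_thread(), and an environment variable stands in for the
namespace that the child inherits from its parent.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical userspace analogue of struct subprocess_info. */
struct demo_helper {
	void (*init_intermediate)(struct demo_helper *h);	/* parent, before fork */
	void (*cleanup_intermediate)(struct demo_helper *h);	/* parent, after fork */
	int (*init)(struct demo_helper *h);			/* child, after fork */
	void *data;
};

/*
 * Run the hooks in the order described above and return the child's
 * exit status, or -1 if fork() or waitpid() failed.
 */
static int demo_run_helper(struct demo_helper *h)
{
	int status;
	pid_t pid;

	assert(h != NULL);
	if (h->init_intermediate)
		h->init_intermediate(h);

	pid = fork();
	if (pid == 0) {
		/* Child: inherits whatever the parent set up above. */
		_exit(h->init ? h->init(h) : 0);
	}

	/* Parent: undo the temporary state whether or not fork() worked. */
	if (h->cleanup_intermediate)
		h->cleanup_intermediate(h);

	if (pid < 0 || waitpid(pid, &status, 0) < 0)
		return -1;
	return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}

/* Example hooks: the DEMO_NS variable plays the role of a namespace. */
static void demo_enter_env(struct demo_helper *h)
{
	setenv("DEMO_NS", (const char *)h->data, 1);
}

static void demo_leave_env(struct demo_helper *h)
{
	(void)h;
	unsetenv("DEMO_NS");
}

static int demo_child_init(struct demo_helper *h)
{
	const char *ns = getenv("DEMO_NS");

	/* Exit status 0 only if the child saw the parent's setup. */
	return (ns && strcmp(ns, (const char *)h->data) == 0) ? 0 : 1;
}
```

The point of the split is that the parent's state is changed only
for the duration of the fork: the child keeps the inherited state,
while the parent (the workqueue thread in the patch) is restored
before it handles its next work item.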

Signed-off-by: Cao Shufeng <caosf.fnst@cn.fujitsu.com>
---
 fs/coredump.c               |  3 ++-
 include/linux/kmod.h        |  5 ++++
 init/do_mounts_initrd.c     |  3 ++-
 kernel/kmod.c               | 56 +++++++++++++++++++++++++++++++++++++--------
 lib/kobject_uevent.c        |  3 ++-
 security/keys/request_key.c |  4 ++--
 6 files changed, 59 insertions(+), 15 deletions(-)

diff --git a/fs/coredump.c b/fs/coredump.c
index 5926837..802f434 100644
--- a/fs/coredump.c
+++ b/fs/coredump.c
@@ -646,7 +646,8 @@ void do_coredump(const siginfo_t *siginfo)
 		retval = -ENOMEM;
 		sub_info = call_usermodehelper_setup(helper_argv[0],
 						helper_argv, NULL, GFP_KERNEL,
-						umh_pipe_setup, NULL, &cprm);
+						NULL, NULL, umh_pipe_setup,
+						NULL, &cprm);
 		if (sub_info)
 			retval = call_usermodehelper_exec(sub_info,
 							  UMH_WAIT_EXEC);
diff --git a/include/linux/kmod.h b/include/linux/kmod.h
index c4e441e..bb4e1a6 100644
--- a/include/linux/kmod.h
+++ b/include/linux/kmod.h
@@ -61,6 +61,9 @@ struct subprocess_info {
 	char **envp;
 	int wait;
 	int retval;
+	bool cleaned;
+	void (*init_intermediate)(struct subprocess_info *info);
+	void (*cleanup_intermediate)(struct subprocess_info *info);
 	int (*init)(struct subprocess_info *info, struct cred *new);
 	void (*cleanup)(struct subprocess_info *info);
 	void *data;
@@ -72,6 +75,8 @@ call_usermodehelper(const char *path, char **argv, char **envp, int wait);
 extern struct subprocess_info *
 call_usermodehelper_setup(const char *path, char **argv, char **envp,
 			  gfp_t gfp_mask,
+			  void (*init_intermediate)(struct subprocess_info *info),
+			  void (*cleanup_intermediate)(struct subprocess_info *info),
 			  int (*init)(struct subprocess_info *info, struct cred *new),
 			  void (*cleanup)(struct subprocess_info *), void *data);
 
diff --git a/init/do_mounts_initrd.c b/init/do_mounts_initrd.c
index a1000ca..59d11c9 100644
--- a/init/do_mounts_initrd.c
+++ b/init/do_mounts_initrd.c
@@ -72,7 +72,8 @@ static void __init handle_initrd(void)
 	current->flags |= PF_FREEZER_SKIP;
 
 	info = call_usermodehelper_setup("/linuxrc", argv, envp_init,
-					 GFP_KERNEL, init_linuxrc, NULL, NULL);
+					 GFP_KERNEL, NULL, NULL, init_linuxrc,
+					 NULL, NULL);
 	if (!info)
 		return;
 	call_usermodehelper_exec(info, UMH_WAIT_PROC);
diff --git a/kernel/kmod.c b/kernel/kmod.c
index 563f97e..f75725b 100644
--- a/kernel/kmod.c
+++ b/kernel/kmod.c
@@ -41,6 +41,7 @@
 #include <linux/rwsem.h>
 #include <linux/ptrace.h>
 #include <linux/async.h>
+#include <linux/delay.h>
 #include <linux/uaccess.h>
 
 #include <trace/events/module.h>
@@ -93,7 +94,8 @@ static int call_modprobe(char *module_name, int wait)
 	argv[4] = NULL;
 
 	info = call_usermodehelper_setup(modprobe_path, argv, envp, GFP_KERNEL,
-					 NULL, free_modprobe_argv, NULL);
+			                 NULL, NULL, NULL, free_modprobe_argv,
+					 NULL);
 	if (!info)
 		goto free_module_name;
 
@@ -207,8 +209,15 @@ static void umh_complete(struct subprocess_info *sub_info)
 	 */
 	if (comp)
 		complete(comp);
-	else
+	else {
+		for(;;) {
+			if (sub_info->cleaned == false)
+				udelay(20);
+			else
+				break;
+		}
 		call_usermodehelper_freeinfo(sub_info);
+	}
 }
 
 /*
@@ -302,7 +311,10 @@ static void call_usermodehelper_exec_sync(struct subprocess_info *sub_info)
 
 	/* Restore default kernel sig handler */
 	kernel_sigaction(SIGCHLD, SIG_IGN);
-
+	if(sub_info->cleanup_intermediate) {
+		sub_info->cleanup_intermediate(sub_info);
+	}
+	sub_info->cleaned = true;
 	umh_complete(sub_info);
 }
 
@@ -324,6 +336,9 @@ static void call_usermodehelper_exec_work(struct work_struct *work)
 {
 	struct subprocess_info *sub_info =
 		container_of(work, struct subprocess_info, work);
+	if(sub_info->init_intermediate) {
+		sub_info->init_intermediate(sub_info);
+	}
 
 	if (sub_info->wait & UMH_WAIT_PROC) {
 		call_usermodehelper_exec_sync(sub_info);
@@ -336,6 +351,11 @@ static void call_usermodehelper_exec_work(struct work_struct *work)
 		 */
 		pid = kernel_thread(call_usermodehelper_exec_async, sub_info,
 				    CLONE_PARENT | SIGCHLD);
+
+		if(sub_info->cleanup_intermediate) {
+			sub_info->cleanup_intermediate(sub_info);
+		}
+		sub_info->cleaned = true;
 		if (pid < 0) {
 			sub_info->retval = pid;
 			umh_complete(sub_info);
@@ -501,25 +521,38 @@ static void helper_unlock(void)
  * @argv: arg vector for process
  * @envp: environment for process
  * @gfp_mask: gfp mask for memory allocation
- * @cleanup: a cleanup function
+ * @init_intermediate: init function which is called in parent task
+ * @cleanup_intermediate: clean function which is called in parent task
  * @init: an init function
+ * @cleanup: a cleanup function
  * @data: arbitrary context sensitive data
  *
  * Returns either %NULL on allocation failure, or a subprocess_info
  * structure.  This should be passed to call_usermodehelper_exec to
  * exec the process and free the structure.
  *
- * The init function is used to customize the helper process prior to
- * exec.  A non-zero return code causes the process to error out, exit,
- * and return the failure to the calling process
+ * The init_intermediate is called in the parent task of user mode
+ * helper. It's designed for init works which must be done in
+ * parent task, like switching the pid_ns_for_children.
+ *
+ * The cleanup_intermediate is used when we want to cleanup what
+ * we have done in init_intermediate, it is also called in parent
+ * task.
  *
- * The cleanup function is just before ethe subprocess_info is about to
+ * The init function is called after fork. It is used to customize the
+ * helper process prior to exec.  A non-zero return code causes the
+ * process to error out, exit, and return the failure to the
+ * calling process.
+ *
+ * The cleanup function is just before the subprocess_info is about to
  * be freed.  This can be used for freeing the argv and envp.  The
  * Function must be runnable in either a process context or the
  * context in which call_usermodehelper_exec is called.
  */
 struct subprocess_info *call_usermodehelper_setup(const char *path, char **argv,
 		char **envp, gfp_t gfp_mask,
+		void (*init_intermediate)(struct subprocess_info *info),
+		void (*cleanup_intermediate)(struct subprocess_info *info),
 		int (*init)(struct subprocess_info *info, struct cred *new),
 		void (*cleanup)(struct subprocess_info *info),
 		void *data)
@@ -539,8 +572,11 @@ struct subprocess_info *call_usermodehelper_setup(const char *path, char **argv,
 	sub_info->argv = argv;
 	sub_info->envp = envp;
 
-	sub_info->cleanup = cleanup;
+	sub_info->init_intermediate = init_intermediate;
+	sub_info->cleaned = false;
+	sub_info->cleanup_intermediate = cleanup_intermediate;
 	sub_info->init = init;
+	sub_info->cleanup = cleanup;
 	sub_info->data = data;
   out:
 	return sub_info;
@@ -635,7 +671,7 @@ int call_usermodehelper(const char *path, char **argv, char **envp, int wait)
 	gfp_t gfp_mask = (wait == UMH_NO_WAIT) ? GFP_ATOMIC : GFP_KERNEL;
 
 	info = call_usermodehelper_setup(path, argv, envp, gfp_mask,
-					 NULL, NULL, NULL);
+					 NULL, NULL, NULL, NULL, NULL);
 	if (info == NULL)
 		return -ENOMEM;
 
diff --git a/lib/kobject_uevent.c b/lib/kobject_uevent.c
index 719c155..b63e927 100644
--- a/lib/kobject_uevent.c
+++ b/lib/kobject_uevent.c
@@ -486,7 +486,8 @@ int kobject_uevent_env(struct kobject *kobj, enum kobject_action action,
 		retval = -ENOMEM;
 		info = call_usermodehelper_setup(env->argv[0], env->argv,
 						 env->envp, GFP_KERNEL,
-						 NULL, cleanup_uevent_env, env);
+						 NULL, NULL, NULL,
+						 cleanup_uevent_env, env);
 		if (info) {
 			retval = call_usermodehelper_exec(info, UMH_NO_WAIT);
 			env = NULL;	/* freed by cleanup_uevent_env */
diff --git a/security/keys/request_key.c b/security/keys/request_key.c
index 63e63a4..3f628ce 100644
--- a/security/keys/request_key.c
+++ b/security/keys/request_key.c
@@ -78,8 +78,8 @@ static int call_usermodehelper_keys(const char *path, char **argv, char **envp,
 	struct subprocess_info *info;
 
 	info = call_usermodehelper_setup(path, argv, envp, GFP_KERNEL,
-					  umh_keys_init, umh_keys_cleanup,
-					  session_keyring);
+					 NULL, NULL, umh_keys_init,
+					 umh_keys_cleanup, session_keyring);
 	if (!info)
 		return -ENOMEM;
 
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH_v4.1_2/3] Limit dump_pipe program's permission to init for container
       [not found] ` <1501655849-9149-1-git-send-email-caosf.fnst-BthXqXjhjHXQFUHtdCDX3A@public.gmane.org>
  2017-08-02  6:37   ` Cao Shufeng
@ 2017-08-02  6:37   ` Cao Shufeng
  2017-08-02  6:37   ` [PATCH_v4.1_3/3] Make core_pattern support namespace Cao Shufeng
                     ` (2 subsequent siblings)
  4 siblings, 0 replies; 24+ messages in thread
From: Cao Shufeng @ 2017-08-02  6:37 UTC (permalink / raw)
  To: linux-kernel
  Cc: containers, caosf.fnst, mashimiao.fnst, ebiederm

Currently, when core_pattern is set to a pipe, the pipe program is
forked by a kthread running with root's permissions, and it writes the
dump file into the host's filesystem.
The same thing happens for a container: the dumper and the dump file are
also on the host (not in the container).

This has the following problems:
1: Not consistent with file_type core_pattern
   When core_pattern is set to a file, the container writes the dump
   into the container's filesystem instead of the host's.
2: Not safe for privileged containers
   In a privileged container, a user can destroy the host system with
   the following commands:
   # # In a container
   # echo "|/bin/dd of=/boot/vmlinuz" >/proc/sys/kernel/core_pattern
   # make_dump
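
For reference, the two pattern types are distinguished by the first
character: a leading '|' means the rest of the string is a command
line for a dumper program, anything else is treated as a file name
template. A minimal userspace sketch of that check (the enum and the
function name are made up for illustration; this is not the code in
fs/coredump.c):

```c
#include <assert.h>
#include <stddef.h>

enum demo_core_kind { DEMO_CORE_FILE, DEMO_CORE_PIPE };

/*
 * Classify @pattern. If @rest is not NULL, it is set to the file name
 * template (file type) or to the dumper command line (pipe type).
 */
static enum demo_core_kind demo_classify_core_pattern(const char *pattern,
						      const char **rest)
{
	assert(pattern != NULL);
	if (pattern[0] == '|') {
		if (rest)
			*rest = pattern + 1;	/* skip the '|' marker */
		return DEMO_CORE_PIPE;
	}
	if (rest)
		*rest = pattern;
	return DEMO_CORE_FILE;
}
```

In the example above, "|/bin/dd of=/boot/vmlinuz" is therefore a
pipe-type pattern whose command is run outside the container.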

This patch switches the dumper program's environment to that of the init
task. For a container, the dumper program therefore has the same
environment as the container's init task, so the dumper program is
taken from the container's filesystem and writes the core dump into
the container's filesystem.
The dumper's permissions are also limited to a subset of those of the
container's init process.
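
The namespace part of this switch is done in the workqueue thread by
umh_ns_setup() and umh_ns_cleanup() in the diff below. The reference
counting there can be modelled with a toy userspace sketch. All demo_*
names are hypothetical, and the model assumes that installing a new
namespace set on a task drops that task's reference to the old set,
as switch_task_namespaces() does:

```c
#include <assert.h>
#include <stddef.h>

struct demo_ns { int refs; };		/* stand-in for struct nsproxy */
struct demo_task { struct demo_ns *ns; };

static void demo_ns_get(struct demo_ns *ns)
{
	ns->refs++;
}

static void demo_ns_put(struct demo_ns *ns)
{
	assert(ns->refs > 0);
	ns->refs--;
}

/* Install @new on @t (consuming one reference) and drop the old set. */
static void demo_switch_ns(struct demo_task *t, struct demo_ns *new)
{
	struct demo_ns *old = t->ns;

	t->ns = new;
	if (old)
		demo_ns_put(old);
}

/* Enter the namespaces of @base; returns the set to restore later. */
static struct demo_ns *demo_ns_enter(struct demo_task *worker,
				     struct demo_task *base)
{
	struct demo_ns *saved = worker->ns;

	demo_ns_get(saved);	/* keep the worker's own set alive */
	demo_ns_get(base->ns);	/* reference the worker will hold */
	demo_switch_ns(worker, base->ns);
	return saved;
}

/* Restore @saved, reusing the reference taken in demo_ns_enter(). */
static void demo_ns_leave(struct demo_task *worker, struct demo_ns *saved)
{
	demo_switch_ns(worker, saved);
}
```

Without the extra reference on the saved set, the switch would drop
the worker's only reference to its original namespaces, which could
then be freed before they are restored.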

Suggested-by: Eric W. Biederman <ebiederm@xmission.com>
Suggested-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>

Signed-off-by: Cao ShuFeng <caosf.fnst@cn.fujitsu.com>
---
 fs/coredump.c           | 126 +++++++++++++++++++++++++++++++++++++++++++++++-
 include/linux/binfmts.h |   2 +
 2 files changed, 126 insertions(+), 2 deletions(-)

diff --git a/fs/coredump.c b/fs/coredump.c
index 802f434..745c757 100644
--- a/fs/coredump.c
+++ b/fs/coredump.c
@@ -507,6 +507,45 @@ static void wait_for_dump_helpers(struct file *file)
 }
 
 /*
+ * umh_ns_setup
+ * set the namespaces to the base task of a container.
+ * we need to switch back to the original namespaces
+ * so that the thread of workqueue is not influenced.
+ *
+ * this method runs in workqueue kernel thread.
+ */
+static void umh_ns_setup(struct subprocess_info *info)
+{
+	struct coredump_params *cp = (struct coredump_params *)info->data;
+	struct task_struct *base_task = cp->base_task;
+
+	if (base_task) {
+		cp->current_task_nsproxy = current->nsproxy;
+		//prevent current namespace from being freed
+		get_nsproxy(current->nsproxy);
+		/* Set namespaces to base_task */
+		get_nsproxy(base_task->nsproxy);
+		switch_task_namespaces(current, base_task->nsproxy);
+	}
+}
+
+/*
+ * umh_ns_cleanup
+ * cleanup what we have done in umh_ns_setup.
+ *
+ * this method runs in workqueue kernel thread.
+ */
+static void umh_ns_cleanup(struct subprocess_info *info)
+{
+	struct coredump_params *cp = (struct coredump_params *)info->data;
+	struct nsproxy *current_task_nsproxy = cp->current_task_nsproxy;
+	if (current_task_nsproxy) {
+		/* switch workqueue's original namespace back */
+		switch_task_namespaces(current, current_task_nsproxy);
+	}
+}
+
+/*
  * umh_pipe_setup
  * helper function to customize the process used
  * to collect the core in userspace.  Specifically
@@ -521,6 +560,8 @@ static int umh_pipe_setup(struct subprocess_info *info, struct cred *new)
 {
 	struct file *files[2];
 	struct coredump_params *cp = (struct coredump_params *)info->data;
+	struct task_struct *base_task;
+
 	int err = create_pipe_files(files, 0);
 	if (err)
 		return err;
@@ -529,10 +570,76 @@ static int umh_pipe_setup(struct subprocess_info *info, struct cred *new)
 
 	err = replace_fd(0, files[0], 0);
 	fput(files[0]);
+	if (err)
+		return err;
+
 	/* and disallow core files too */
 	current->signal->rlim[RLIMIT_CORE] = (struct rlimit){1, 1};
 
-	return err;
+	base_task = cp->base_task;
+	if (base_task) {
+		const struct cred *base_cred;
+
+		/* Set fs_root to base_task */
+		spin_lock(&base_task->fs->lock);
+		set_fs_root(current->fs, &base_task->fs->root);
+		set_fs_pwd(current->fs, &base_task->fs->pwd);
+		spin_unlock(&base_task->fs->lock);
+
+		/* Set cgroup to base_task */
+		current->flags &= ~PF_NO_SETAFFINITY;
+		err = cgroup_attach_task_all(base_task, current);
+		if (err < 0)
+			return err;
+
+		/* Set cred to base_task */
+		base_cred = get_task_cred(base_task);
+
+		new->uid   = base_cred->uid;
+		new->gid   = base_cred->gid;
+		new->suid  = base_cred->suid;
+		new->sgid  = base_cred->sgid;
+		new->euid  = base_cred->euid;
+		new->egid  = base_cred->egid;
+		new->fsuid = base_cred->fsuid;
+		new->fsgid = base_cred->fsgid;
+
+		new->securebits = base_cred->securebits;
+
+		new->cap_inheritable = base_cred->cap_inheritable;
+		new->cap_permitted   = base_cred->cap_permitted;
+		new->cap_effective   = base_cred->cap_effective;
+		new->cap_bset        = base_cred->cap_bset;
+		new->cap_ambient     = base_cred->cap_ambient;
+
+		security_cred_free(new);
+#ifdef CONFIG_SECURITY
+		new->security = NULL;
+#endif
+		err = security_prepare_creds(new, base_cred, GFP_KERNEL);
+		if (err < 0) {
+			put_cred(base_cred);
+			return err;
+		}
+
+		free_uid(new->user);
+		new->user = base_cred->user;
+		get_uid(new->user);
+
+		put_user_ns(new->user_ns);
+		new->user_ns = base_cred->user_ns;
+		get_user_ns(new->user_ns);
+
+		put_group_info(new->group_info);
+		new->group_info = base_cred->group_info;
+		get_group_info(new->group_info);
+
+		put_cred(base_cred);
+
+		validate_creds(new);
+	}
+
+	return 0;
 }
 
 void do_coredump(const siginfo_t *siginfo)
@@ -595,6 +702,7 @@ void do_coredump(const siginfo_t *siginfo)
 
 	if (ispipe) {
 		int dump_count;
+                struct task_struct *vinit_task;
 		char **helper_argv;
 		struct subprocess_info *sub_info;
 
@@ -636,6 +744,15 @@ void do_coredump(const siginfo_t *siginfo)
 			goto fail_dropcount;
 		}
 
+		rcu_read_lock();
+		vinit_task = find_task_by_vpid(1);
+		rcu_read_unlock();
+		if (!vinit_task) {
+			printk(KERN_WARNING "failed getting init task info, skipping core dump\n");
+			goto fail_dropcount;
+		}
+
+
 		helper_argv = argv_split(GFP_KERNEL, cn.corename, NULL);
 		if (!helper_argv) {
 			printk(KERN_WARNING "%s failed to allocate memory\n",
@@ -643,15 +760,20 @@ void do_coredump(const siginfo_t *siginfo)
 			goto fail_dropcount;
 		}
 
+		get_task_struct(vinit_task);
+
+		cprm.base_task = vinit_task;
+
 		retval = -ENOMEM;
 		sub_info = call_usermodehelper_setup(helper_argv[0],
 						helper_argv, NULL, GFP_KERNEL,
-						NULL, NULL, umh_pipe_setup,
+						umh_ns_setup, umh_ns_cleanup, umh_pipe_setup,
 						NULL, &cprm);
 		if (sub_info)
 			retval = call_usermodehelper_exec(sub_info,
 							  UMH_WAIT_EXEC);
 
+		put_task_struct(vinit_task);
 		argv_free(helper_argv);
 		if (retval) {
 			printk(KERN_INFO "Core dump to |%s pipe failed\n",
diff --git a/include/linux/binfmts.h b/include/linux/binfmts.h
index 05488da..fa13104 100644
--- a/include/linux/binfmts.h
+++ b/include/linux/binfmts.h
@@ -61,6 +61,8 @@ struct linux_binprm {
 
 /* Function parameter for binfmt->coredump */
 struct coredump_params {
+        struct task_struct *base_task;
+        struct nsproxy *current_task_nsproxy;
 	const siginfo_t *siginfo;
 	struct pt_regs *regs;
 	struct file *file;
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH_v4.1_2/3] Limit dump_pipe program's permission to init for container
  2017-08-02  6:37 [PATCH 0/3] Make core_pattern support namespace Cao Shufeng
  2017-08-02  6:37 ` [PATCH_v4.1_1/3] Make call_usermodehelper_exec possible to set namespaces Cao Shufeng
       [not found] ` <1501655849-9149-1-git-send-email-caosf.fnst-BthXqXjhjHXQFUHtdCDX3A@public.gmane.org>
@ 2017-08-02  6:37 ` Cao Shufeng
  2017-08-02  6:37 ` [PATCH_v4.1_3/3] Make core_pattern support namespace Cao Shufeng
  3 siblings, 0 replies; 24+ messages in thread
From: Cao Shufeng @ 2017-08-02  6:37 UTC (permalink / raw)
  To: linux-kernel
  Cc: containers, ebiederm, mguzik, kamezawa.hiroyu, stgraber, avagin,
	zhaolei, mashimiao.fnst, caosf.fnst

Currently, when core_pattern is set to a pipe, the pipe program is
forked by a kthread running with root's permissions, and it writes the
dumpfile into the host's filesystem.
The same happens for a container: the dumper and the dumpfile are both
in the host (not in the container).

This has the following problems:
1: Not consistent with file_type core_pattern
   When core_pattern is set to a file, the container writes the dump
   into the container's filesystem instead of the host's.
2: Not safe for privileged containers
   In a privileged container, a user can destroy the host system with
   the following commands:
   # # In a container
   # echo "|/bin/dd of=/boot/vmlinuz" >/proc/sys/kernel/core_pattern
   # make_dump

This patch switches the dumper program's environment to that of the
init task. For a container, the dumper program then has the same
environment as the container's init task, so the dumper program is
looked up in the container's filesystem and writes the coredump into
the container's filesystem.
The dumper's permissions are also limited to a subset of those of the
container's init process.
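
As a rough illustration of the namespace switch described above, the
save/switch/restore sequence and its reference counting can be modelled
in user space. This is only a sketch under stated assumptions: the
demo_* types and helpers are hypothetical stand-ins, not kernel APIs,
and demo_switch() merely imitates the way switch_task_namespaces()
drops the task's reference to the namespace set it replaces.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct nsproxy: just a counted object. */
struct demo_nsproxy {
	int refcount;
	const char *name;
};

/* Hypothetical stand-in for struct task_struct. */
struct demo_task {
	struct demo_nsproxy *nsproxy;
};

static void demo_get(struct demo_nsproxy *ns) { ns->refcount++; }
static void demo_put(struct demo_nsproxy *ns) { ns->refcount--; }

/* Install a new namespace set and drop the task's reference to the old one. */
static void demo_switch(struct demo_task *t, struct demo_nsproxy *new_ns)
{
	struct demo_nsproxy *old = t->nsproxy;

	t->nsproxy = new_ns;
	if (old)
		demo_put(old);
}

/*
 * Move the worker into the base task's namespaces.
 * Returns the worker's original set, with one extra reference held so
 * that it cannot go away while the worker is switched out of it.
 */
static struct demo_nsproxy *demo_ns_setup(struct demo_task *worker,
					  struct demo_task *base)
{
	struct demo_nsproxy *saved;

	assert(worker != NULL && base != NULL);
	saved = worker->nsproxy;
	demo_get(saved);          /* keep the original set alive */
	demo_get(base->nsproxy);  /* reference now owned by the worker */
	demo_switch(worker, base->nsproxy);
	return saved;
}

/* Put the worker back; the switch releases the base task's set. */
static void demo_ns_cleanup(struct demo_task *worker,
			    struct demo_nsproxy *saved)
{
	demo_switch(worker, saved);
}
```

Starting from one reference on each set, a demo_ns_setup() and
demo_ns_cleanup() pair leaves both counts at one again and the worker
back on its original set.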

Suggested-by: Eric W. Biederman <ebiederm@xmission.com>
Suggested-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>

Signed-off-by: Cao ShuFeng <caosf.fnst@cn.fujitsu.com>
---
 fs/coredump.c           | 126 +++++++++++++++++++++++++++++++++++++++++++++++-
 include/linux/binfmts.h |   2 +
 2 files changed, 126 insertions(+), 2 deletions(-)

diff --git a/fs/coredump.c b/fs/coredump.c
index 802f434..745c757 100644
--- a/fs/coredump.c
+++ b/fs/coredump.c
@@ -507,6 +507,45 @@ static void wait_for_dump_helpers(struct file *file)
 }
 
 /*
+ * umh_ns_setup
+ * set the namespaces to those of the base task of a container.
+ * we need to switch back to the original namespaces
+ * so that the workqueue thread is not influenced.
+ *
+ * this method runs in workqueue kernel thread.
+ */
+static void umh_ns_setup(struct subprocess_info *info)
+{
+	struct coredump_params *cp = (struct coredump_params *)info->data;
+	struct task_struct *base_task = cp->base_task;
+
+	if (base_task) {
+		cp->current_task_nsproxy = current->nsproxy;
+		/* prevent the current namespaces from being freed */
+		get_nsproxy(current->nsproxy);
+		/* Set namespaces to base_task */
+		get_nsproxy(base_task->nsproxy);
+		switch_task_namespaces(current, base_task->nsproxy);
+	}
+}
+
+/*
+ * umh_ns_cleanup
+ * cleanup what we have done in umh_ns_setup.
+ *
+ * this method runs in workqueue kernel thread.
+ */
+static void umh_ns_cleanup(struct subprocess_info *info)
+{
+	struct coredump_params *cp = (struct coredump_params *)info->data;
+	struct nsproxy *current_task_nsproxy = cp->current_task_nsproxy;
+	if (current_task_nsproxy) {
+		/* switch workqueue's original namespace back */
+		switch_task_namespaces(current, current_task_nsproxy);
+	}
+}
+
+/*
  * umh_pipe_setup
  * helper function to customize the process used
  * to collect the core in userspace.  Specifically
@@ -521,6 +560,8 @@ static int umh_pipe_setup(struct subprocess_info *info, struct cred *new)
 {
 	struct file *files[2];
 	struct coredump_params *cp = (struct coredump_params *)info->data;
+	struct task_struct *base_task;
+
 	int err = create_pipe_files(files, 0);
 	if (err)
 		return err;
@@ -529,10 +570,76 @@ static int umh_pipe_setup(struct subprocess_info *info, struct cred *new)
 
 	err = replace_fd(0, files[0], 0);
 	fput(files[0]);
+	if (err)
+		return err;
+
 	/* and disallow core files too */
 	current->signal->rlim[RLIMIT_CORE] = (struct rlimit){1, 1};
 
-	return err;
+	base_task = cp->base_task;
+	if (base_task) {
+		const struct cred *base_cred;
+
+		/* Set fs_root to base_task */
+		spin_lock(&base_task->fs->lock);
+		set_fs_root(current->fs, &base_task->fs->root);
+		set_fs_pwd(current->fs, &base_task->fs->pwd);
+		spin_unlock(&base_task->fs->lock);
+
+		/* Set cgroup to base_task */
+		current->flags &= ~PF_NO_SETAFFINITY;
+		err = cgroup_attach_task_all(base_task, current);
+		if (err < 0)
+			return err;
+
+		/* Set cred to base_task */
+		base_cred = get_task_cred(base_task);
+
+		new->uid   = base_cred->uid;
+		new->gid   = base_cred->gid;
+		new->suid  = base_cred->suid;
+		new->sgid  = base_cred->sgid;
+		new->euid  = base_cred->euid;
+		new->egid  = base_cred->egid;
+		new->fsuid = base_cred->fsuid;
+		new->fsgid = base_cred->fsgid;
+
+		new->securebits = base_cred->securebits;
+
+		new->cap_inheritable = base_cred->cap_inheritable;
+		new->cap_permitted   = base_cred->cap_permitted;
+		new->cap_effective   = base_cred->cap_effective;
+		new->cap_bset        = base_cred->cap_bset;
+		new->cap_ambient     = base_cred->cap_ambient;
+
+		security_cred_free(new);
+#ifdef CONFIG_SECURITY
+		new->security = NULL;
+#endif
+		err = security_prepare_creds(new, base_cred, GFP_KERNEL);
+		if (err < 0) {
+			put_cred(base_cred);
+			return err;
+		}
+
+		free_uid(new->user);
+		new->user = base_cred->user;
+		get_uid(new->user);
+
+		put_user_ns(new->user_ns);
+		new->user_ns = base_cred->user_ns;
+		get_user_ns(new->user_ns);
+
+		put_group_info(new->group_info);
+		new->group_info = base_cred->group_info;
+		get_group_info(new->group_info);
+
+		put_cred(base_cred);
+
+		validate_creds(new);
+	}
+
+	return 0;
 }
 
 void do_coredump(const siginfo_t *siginfo)
@@ -595,6 +702,7 @@ void do_coredump(const siginfo_t *siginfo)
 
 	if (ispipe) {
 		int dump_count;
+		struct task_struct *vinit_task;
 		char **helper_argv;
 		struct subprocess_info *sub_info;
 
@@ -636,6 +744,15 @@ void do_coredump(const siginfo_t *siginfo)
 			goto fail_dropcount;
 		}
 
+		rcu_read_lock();
+		vinit_task = find_task_by_vpid(1);
+		rcu_read_unlock();
+		if (!vinit_task) {
+			printk(KERN_WARNING "failed getting init task info, skipping core dump\n");
+			goto fail_dropcount;
+		}
+
+
 		helper_argv = argv_split(GFP_KERNEL, cn.corename, NULL);
 		if (!helper_argv) {
 			printk(KERN_WARNING "%s failed to allocate memory\n",
@@ -643,15 +760,20 @@ void do_coredump(const siginfo_t *siginfo)
 			goto fail_dropcount;
 		}
 
+		get_task_struct(vinit_task);
+
+		cprm.base_task = vinit_task;
+
 		retval = -ENOMEM;
 		sub_info = call_usermodehelper_setup(helper_argv[0],
 						helper_argv, NULL, GFP_KERNEL,
-						NULL, NULL, umh_pipe_setup,
+						umh_ns_setup, umh_ns_cleanup, umh_pipe_setup,
 						NULL, &cprm);
 		if (sub_info)
 			retval = call_usermodehelper_exec(sub_info,
 							  UMH_WAIT_EXEC);
 
+		put_task_struct(vinit_task);
 		argv_free(helper_argv);
 		if (retval) {
 			printk(KERN_INFO "Core dump to |%s pipe failed\n",
diff --git a/include/linux/binfmts.h b/include/linux/binfmts.h
index 05488da..fa13104 100644
--- a/include/linux/binfmts.h
+++ b/include/linux/binfmts.h
@@ -61,6 +61,8 @@ struct linux_binprm {
 
 /* Function parameter for binfmt->coredump */
 struct coredump_params {
+	struct task_struct *base_task;
+	struct nsproxy *current_task_nsproxy;
 	const siginfo_t *siginfo;
 	struct pt_regs *regs;
 	struct file *file;
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH_v4.1_3/3] Make core_pattern support namespace
       [not found] ` <1501655849-9149-1-git-send-email-caosf.fnst-BthXqXjhjHXQFUHtdCDX3A@public.gmane.org>
  2017-08-02  6:37   ` Cao Shufeng
  2017-08-02  6:37   ` [PATCH_v4.1_2/3] Limit dump_pipe program's permission to init for container Cao Shufeng
@ 2017-08-02  6:37   ` Cao Shufeng
  2017-11-02  5:41     ` 曹树烽
  2017-11-22  3:24     ` Cao Shufeng
  4 siblings, 0 replies; 24+ messages in thread
From: Cao Shufeng @ 2017-08-02  6:37 UTC (permalink / raw)
  To: linux-kernel-u79uwXL29TY76Z2rM5mHXA
  Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	caosf.fnst-BthXqXjhjHXQFUHtdCDX3A,
	mashimiao.fnst-BthXqXjhjHXQFUHtdCDX3A,
	ebiederm-aS9lmoZGLiVWk0Htik3J/w

Currently, all containers share a single copy of the coredump setting
with the host system. If the host changes the setting, every running
container is affected.
The same happens when a container changes core_pattern: both the host
and the other containers are affected.

For containers based on the namespace design, it is good to allow each
container to keep its own coredump setting.

This brings the following benefits:
1: Each container can change its own coredump setting by writing to
   /proc/sys/kernel/core_pattern.
2: A coredump setting changed in the host does not affect running
   containers that have set their own core_pattern.
3: Both "putting the coredump in the guest" and "putting the coredump
   in the host" are supported.

Any namespace-based software (lxc, docker, ...) can use this feature to
customize its dump setting.

This feature also makes each container work as a separate system,
which fits the design goal of namespaces.
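
The inheritance rule can be sketched in user space as follows. This is
a minimal model, assuming (as in the diff below) that an empty
core_pattern in a pid namespace means "not set here" and that the
lookup then falls back to the parent namespace. struct demo_pid_ns and
the demo_* functions are hypothetical names used only for illustration,
and locking is omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CORENAME_MAX_SIZE 128

/* Hypothetical user-space stand-in for struct pid_namespace. */
struct demo_pid_ns {
	struct demo_pid_ns *parent;            /* NULL for the initial namespace */
	char core_pattern[CORENAME_MAX_SIZE];  /* "" means "not set here" */
};

/*
 * Walk from ns toward the root and return the first namespace that has
 * its own pattern. The root is returned if no namespace on the way set one.
 */
static const struct demo_pid_ns *demo_effective_ns(const struct demo_pid_ns *ns)
{
	assert(ns != NULL);
	while (ns->parent && ns->core_pattern[0] == '\0')
		ns = ns->parent;
	return ns;
}

/* Copy the effective pattern for ns into out (CORENAME_MAX_SIZE bytes). */
static void demo_get_core_pattern(const struct demo_pid_ns *ns, char *out)
{
	const struct demo_pid_ns *eff = demo_effective_ns(ns);

	strncpy(out, eff->core_pattern, CORENAME_MAX_SIZE - 1);
	out[CORENAME_MAX_SIZE - 1] = '\0';
}
```

In this model a freshly created container reads back the host's
pattern until it writes its own, which matches the test session below.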

Test(in lxc):
 # In the host
 # ----------------
 # echo host_core >/proc/sys/kernel/core_pattern
 # cat /proc/sys/kernel/core_pattern
 host_core
 # ulimit -c 1024000
 # ./make_dump
 Segmentation fault (core dumped)
 # ls -l
 -rw------- 1 root root 331776 Feb  4 18:02 host_core.2175
 -rwxr-xr-x 1 root root 759731 Feb  4 18:01 make_dump
 #

 # In the container
 # ----------------
 # cat /proc/sys/kernel/core_pattern
 host_core
 # echo container_core >/proc/sys/kernel/core_pattern
 # ./make_dump
 Segmentation fault (core dumped)
 # ls -l
 -rwxr-xr-x    1 root     root       759731 Feb  4 10:45 make_dump
 -rw-------    1 root     root       331776 Feb  4 10:45 container_core.16
 #

 # Return to host
 # ----------------
 # cat /proc/sys/kernel/core_pattern
 host_core
 # ls
 host_core.2175  make_dump  make_dump.c
 # rm -f host_core.2175
 # ./make_dump
 Segmentation fault (core dumped)
 # ls -l
 -rw------- 1 root root 331776 Feb  4 18:49 host_core.2351
 -rwxr-xr-x 1 root root 759731 Feb  4 18:01 make_dump
 #
---
 fs/coredump.c                 | 25 ++++++++++++++++------
 include/linux/pid_namespace.h |  3 +++
 kernel/pid.c                  |  2 ++
 kernel/pid_namespace.c        |  2 ++
 kernel/sysctl.c               | 50 ++++++++++++++++++++++++++++++++++++++-----
 5 files changed, 70 insertions(+), 12 deletions(-)

diff --git a/fs/coredump.c b/fs/coredump.c
index 745c757..b0ab533 100644
--- a/fs/coredump.c
+++ b/fs/coredump.c
@@ -52,7 +52,6 @@
 
 int core_uses_pid;
 unsigned int core_pipe_limit;
-char core_pattern[CORENAME_MAX_SIZE] = "core";
 static int core_name_size = CORENAME_MAX_SIZE;
 
 struct core_name {
@@ -60,8 +59,6 @@ struct core_name {
 	int used, size;
 };
 
-/* The maximal length of core_pattern is also specified in sysctl.c */
-
 static int expand_corename(struct core_name *cn, int size)
 {
 	char *corename = krealloc(cn->corename, size, GFP_KERNEL);
@@ -186,10 +183,10 @@ static int cn_print_exe_file(struct core_name *cn)
  * name into corename, which must have space for at least
  * CORENAME_MAX_SIZE bytes plus one byte for the zero terminator.
  */
-static int format_corename(struct core_name *cn, struct coredump_params *cprm)
+static int format_corename(struct core_name *cn, const char *pat_ptr,
+			   struct coredump_params *cprm)
 {
 	const struct cred *cred = current_cred();
-	const char *pat_ptr = core_pattern;
 	int ispipe = (*pat_ptr == '|');
 	int pid_in_pattern = 0;
 	int err = 0;
@@ -668,6 +665,8 @@ void do_coredump(const siginfo_t *siginfo)
 		 */
 		.mm_flags = mm->flags,
 	};
+	struct pid_namespace *pid_ns;
+	char core_pattern[CORENAME_MAX_SIZE];
 
 	audit_core_dumps(siginfo->si_signo);
 
@@ -677,6 +676,18 @@ void do_coredump(const siginfo_t *siginfo)
 	if (!__get_dumpable(cprm.mm_flags))
 		goto fail;
 
+	pid_ns = task_active_pid_ns(current);
+	spin_lock(&pid_ns->core_pattern_lock);
+	while (pid_ns != &init_pid_ns) {
+		if (pid_ns->core_pattern[0])
+			break;
+		spin_unlock(&pid_ns->core_pattern_lock);
+		pid_ns = pid_ns->parent;
+		spin_lock(&pid_ns->core_pattern_lock);
+	}
+	strcpy(core_pattern, pid_ns->core_pattern);
+	spin_unlock(&pid_ns->core_pattern_lock);
+
 	cred = prepare_creds();
 	if (!cred)
 		goto fail;
@@ -698,7 +709,7 @@ void do_coredump(const siginfo_t *siginfo)
 
 	old_cred = override_creds(cred);
 
-	ispipe = format_corename(&cn, &cprm);
+	ispipe = format_corename(&cn, core_pattern, &cprm);
 
 	if (ispipe) {
 		int dump_count;
@@ -745,7 +756,7 @@ void do_coredump(const siginfo_t *siginfo)
 		}
 
 		rcu_read_lock();
-		vinit_task = find_task_by_vpid(1);
+		vinit_task = find_task_by_pid_ns(1, pid_ns);
 		rcu_read_unlock();
 		if (!vinit_task) {
 			printk(KERN_WARNING "failed getting init task info, skipping core dump\n");
diff --git a/include/linux/pid_namespace.h b/include/linux/pid_namespace.h
index c2a989d..67f70de 100644
--- a/include/linux/pid_namespace.h
+++ b/include/linux/pid_namespace.h
@@ -9,6 +9,7 @@
 #include <linux/nsproxy.h>
 #include <linux/kref.h>
 #include <linux/ns_common.h>
+#include <linux/binfmts.h>
 
 struct pidmap {
        atomic_t nr_free;
@@ -52,6 +53,8 @@ struct pid_namespace {
 	int hide_pid;
 	int reboot;	/* group exit code if this pidns was rebooted */
 	struct ns_common ns;
+	spinlock_t core_pattern_lock;
+	char core_pattern[CORENAME_MAX_SIZE];
 };
 
 extern struct pid_namespace init_pid_ns;
diff --git a/kernel/pid.c b/kernel/pid.c
index 731c4e5..c8cc65d 100644
--- a/kernel/pid.c
+++ b/kernel/pid.c
@@ -82,6 +82,8 @@ struct pid_namespace init_pid_ns = {
 #ifdef CONFIG_PID_NS
 	.ns.ops = &pidns_operations,
 #endif
+	.core_pattern_lock = __SPIN_LOCK_UNLOCKED(init_pid_ns.core_pattern_lock),
+	.core_pattern = "core",
 };
 EXPORT_SYMBOL_GPL(init_pid_ns);
 
diff --git a/kernel/pid_namespace.c b/kernel/pid_namespace.c
index 74a5a72..c6540c6 100644
--- a/kernel/pid_namespace.c
+++ b/kernel/pid_namespace.c
@@ -140,6 +140,8 @@ static struct pid_namespace *create_pid_namespace(struct user_namespace *user_ns
 	for (i = 1; i < PIDMAP_ENTRIES; i++)
 		atomic_set(&ns->pidmap[i].nr_free, BITS_PER_PAGE);
 
+	spin_lock_init(&ns->core_pattern_lock);
+
 	return ns;
 
 out_free_map:
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 4dfba1a..c841d5d 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -478,7 +478,7 @@ static struct ctl_table kern_table[] = {
 	},
 	{
 		.procname	= "core_pattern",
-		.data		= core_pattern,
+		.data		= NULL,
 		.maxlen		= CORENAME_MAX_SIZE,
 		.mode		= 0644,
 		.proc_handler	= proc_dostring_coredump,
@@ -2393,6 +2393,12 @@ int proc_dointvec_minmax(struct ctl_table *table, int write,
 static void validate_coredump_safety(void)
 {
 #ifdef CONFIG_COREDUMP
+	struct pid_namespace *pid_ns = task_active_pid_ns(current);
+	const char *core_pattern;
+
+	spin_lock(&pid_ns->core_pattern_lock);
+	core_pattern = pid_ns->core_pattern;
+
 	if (suid_dumpable == SUID_DUMP_ROOT &&
 	    core_pattern[0] != '/' && core_pattern[0] != '|') {
 		printk(KERN_WARNING
@@ -2401,6 +2407,8 @@ static void validate_coredump_safety(void)
 "Set kernel.core_pattern before fs.suid_dumpable.\n"
 		);
 	}
+
+	spin_unlock(&pid_ns->core_pattern_lock);
 #endif
 }
 
@@ -2417,10 +2425,42 @@ static int proc_dointvec_minmax_coredump(struct ctl_table *table, int write,
 static int proc_dostring_coredump(struct ctl_table *table, int write,
 		  void __user *buffer, size_t *lenp, loff_t *ppos)
 {
-	int error = proc_dostring(table, write, buffer, lenp, ppos);
-	if (!error)
-		validate_coredump_safety();
-	return error;
+	int ret;
+	char core_pattern[CORENAME_MAX_SIZE];
+	struct pid_namespace *pid_ns = task_active_pid_ns(current);
+
+	if (write) {
+		if (*ppos && sysctl_writes_strict == SYSCTL_WRITES_WARN)
+			warn_sysctl_write(table);
+
+		ret = _proc_do_string(core_pattern, table->maxlen, write,
+				      (char __user *)buffer, lenp, ppos);
+		if (ret)
+			return ret;
+
+		spin_lock(&pid_ns->core_pattern_lock);
+		strcpy(pid_ns->core_pattern, core_pattern);
+		spin_unlock(&pid_ns->core_pattern_lock);
+	} else {
+		spin_lock(&pid_ns->core_pattern_lock);
+		while (pid_ns != &init_pid_ns) {
+			if (pid_ns->core_pattern[0])
+				break;
+			spin_unlock(&pid_ns->core_pattern_lock);
+			pid_ns = pid_ns->parent;
+			spin_lock(&pid_ns->core_pattern_lock);
+		}
+		strcpy(core_pattern, pid_ns->core_pattern);
+		spin_unlock(&pid_ns->core_pattern_lock);
+
+		ret = _proc_do_string(core_pattern, table->maxlen, write,
+				      (char __user *)buffer, lenp, ppos);
+		if (ret)
+			return ret;
+	}
+
+	validate_coredump_safety();
+	return 0;
 }
 #endif
 
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [PATCH_v4.1_3/3] Make core_pattern support namespace
       [not found]   ` <1501655849-9149-4-git-send-email-caosf.fnst-BthXqXjhjHXQFUHtdCDX3A@public.gmane.org>
@ 2017-08-02  7:07     ` Aleksa Sarai
  0 siblings, 0 replies; 24+ messages in thread
From: Aleksa Sarai @ 2017-08-02  7:07 UTC (permalink / raw)
  To: Cao Shufeng, linux-kernel-u79uwXL29TY76Z2rM5mHXA
  Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	ebiederm-aS9lmoZGLiVWk0Htik3J/w,
	mashimiao.fnst-BthXqXjhjHXQFUHtdCDX3A

> Currently, each container shared one copy of coredump setting
> with the host system, if host system changed the setting, each
> running containers will be affected.
> Same story happened when container changed core_pattern, both
> host and other container will be affected.
> 
> For container based on namespace design, it is good to allow
> each container keeping their own coredump setting.

From what I can see, this is basically setting a per-pidns core_pattern
(which is hierarchically applied). I'm not sure this actually solves the 
more general problem (that usermode helper settings aren't generally 
namespace-aware) -- and what happens if you have processes in the same 
pidns that have different mount namespaces?

If we _had_ to do it like this I would think it makes more sense to pin 
it to mountns, but I was under the impression that someone was working 
on making usermode helpers play nicer with namespaces.

Just my $0.02.

-- 
Aleksa Sarai
Software Engineer (Containers)
SUSE Linux GmbH
https://www.cyphar.com/

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 0/3] Make core_pattern support namespace
  2017-08-02  6:37 [PATCH 0/3] Make core_pattern support namespace Cao Shufeng
@ 2017-11-02  5:41     ` 曹树烽
       [not found] ` <1501655849-9149-1-git-send-email-caosf.fnst-BthXqXjhjHXQFUHtdCDX3A@public.gmane.org>
                       ` (2 subsequent siblings)
  3 siblings, 0 replies; 24+ messages in thread
From: 曹树烽 @ 2017-11-02  5:41 UTC (permalink / raw)
  To: linux-kernel-u79uwXL29TY76Z2rM5mHXA
  Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	mashimiao.fnst-BthXqXjhjHXQFUHtdCDX3A,
	ebiederm-aS9lmoZGLiVWk0Htik3J/w

ping

On 2017-08-02 14:37, Cao Shufeng wrote:
> This patchset includes following function points:
> 1: Let usermodehelper function possible to set pid namespace
>     done by: [PATCH_v4.1_1/3] Make call_usermodehelper_exec possible
>     to set namespaces
> 2: Let pipe_type core_pattern write dump into container's rootfs
>     done by: [PATCH_v4.1_2/3] Limit dump_pipe program's permission to
>     init for container
> 3: Make separate core_pattern setting for each container
>     done by: [PATCH_v4.1_3/3] Make core_pattern support namespace
> 4: Compatibility with current system
>     also included in: [PATCH_v4.1_3/3] Make core_pattern support namespace
>     If container hadn't change core_pattern setting, it will keep
>     same setting with host.
>
> Test:
> 1: Pass a test script for each function of this patchset
>     ## TEST IN HOST ##
>     [root@kerneldev dumptest]# ./test_host
>     Set file core_pattern: OK
>     ./test_host: line 41:  2366 Segmentation fault      (core dumped) "$SCRIPT_BASE_DIR"/make_dump
>     Checking dumpfile: OK
>     Set file core_pattern: OK
>     ./test_host: line 41:  2369 Segmentation fault      (core dumped) "$SCRIPT_BASE_DIR"/make_dump
>     Checking dump_pipe triggered: OK
>     Checking rootfs: OK
>     Checking dumpfile: OK
>     Checking namespace: OK
>     Checking process list: OK
>     Checking capabilities: OK
>
>     ## TEST IN GUEST ##
>     # ./test
>     Segmentation fault (core dumped)
>     Checking dump_pipe triggered: OK
>     Checking rootfs: OK
>     Checking dumpfile: OK
>     Checking namespace: OK
>     Checking process list: OK
>     Checking cg pids: OK
>     Checking capabilities: OK
>     [   64.940734] make_dump[2432]: segfault at 0 ip 000000000040049d sp 00007ffc4af025f0 error 6 in make_dump[400000+a6000]
>     #
> 2: Pass other test(which is not easy to do in script) by hand.
>
> Changelog v3.1-v4:
> 1. remove extra fork pointed out by:
>     Andrei Vagin <avagin@gmail.com>
> 2: Rebase on top of v4.9-rc8.
> 3: Rebase on top of v4.12.
>
> Changelog v3-v3.1:
> 1. Switch "pwd" of pipe program to container's root fs.
> 2. Rebase on top of v4.9-rc1.
>
> Changelog v2->v3:
> 1: Fix problem of setting pid namespace, pointed out by:
>     Andrei Vagin <avagin@gmail.com>
>
> Changelog v1(RFC)->v2:
> 1: Add [PATCH 2/2] which was todo in [RFC v1].
> 2: Pass a test script for each function.
> 3: Rebase on top of v4.7.
>
> Suggested-by: Eric W. Biederman <ebiederm@xmission.com>
> Suggested-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> Signed-off-by: Cao Shufeng <caosf.fnst@cn.fujitsu.com>
>
> Cao Shufeng (3):
>    Make call_usermodehelper_exec possible to set namespaces
>    Limit dump_pipe program's permission to init for container
>    Make core_pattern support namespace
>
>   fs/coredump.c                 | 150 +++++++++++++++++++++++++++++++++++++++---
>   include/linux/binfmts.h       |   2 +
>   include/linux/kmod.h          |   5 ++
>   include/linux/pid_namespace.h |   3 +
>   init/do_mounts_initrd.c       |   3 +-
>   kernel/kmod.c                 |  56 +++++++++++++---
>   kernel/pid.c                  |   2 +
>   kernel/pid_namespace.c        |   2 +
>   kernel/sysctl.c               |  50 ++++++++++++--
>   lib/kobject_uevent.c          |   3 +-
>   security/keys/request_key.c   |   4 +-
>   11 files changed, 253 insertions(+), 27 deletions(-)
>


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 0/3] Make core_pattern support namespace
@ 2017-11-02  5:41     ` 曹树烽
  0 siblings, 0 replies; 24+ messages in thread
From: 曹树烽 @ 2017-11-02  5:41 UTC (permalink / raw)
  To: linux-kernel
  Cc: containers, ebiederm, mguzik, kamezawa.hiroyu, stgraber, avagin,
	mashimiao.fnst

ping

On 2017-08-02 14:37, Cao Shufeng wrote:
> This patchset includes following function points:
> 1: Let usermodehelper function possible to set pid namespace
>     done by: [PATCH_v4.1_1/3] Make call_usermodehelper_exec possible
>     to set namespaces
> 2: Let pipe_type core_pattern write dump into container's rootfs
>     done by: [PATCH_v4.1_2/3] Limit dump_pipe program's permission to
>     init for container
> 3: Make separate core_pattern setting for each container
>     done by: [PATCH_v4.1_3/3] Make core_pattern support namespace
> 4: Compatibility with current system
>     also included in: [PATCH_v4.1_3/3] Make core_pattern support namespace
>     If container hadn't change core_pattern setting, it will keep
>     same setting with host.
>
> Test:
> 1: Pass a test script for each function of this patchset
>     ## TEST IN HOST ##
>     [root@kerneldev dumptest]# ./test_host
>     Set file core_pattern: OK
>     ./test_host: line 41:  2366 Segmentation fault      (core dumped) "$SCRIPT_BASE_DIR"/make_dump
>     Checking dumpfile: OK
>     Set file core_pattern: OK
>     ./test_host: line 41:  2369 Segmentation fault      (core dumped) "$SCRIPT_BASE_DIR"/make_dump
>     Checking dump_pipe triggered: OK
>     Checking rootfs: OK
>     Checking dumpfile: OK
>     Checking namespace: OK
>     Checking process list: OK
>     Checking capabilities: OK
>
>     ## TEST IN GUEST ##
>     # ./test
>     Segmentation fault (core dumped)
>     Checking dump_pipe triggered: OK
>     Checking rootfs: OK
>     Checking dumpfile: OK
>     Checking namespace: OK
>     Checking process list: OK
>     Checking cg pids: OK
>     Checking capabilities: OK
>     [   64.940734] make_dump[2432]: segfault at 0 ip 000000000040049d sp 00007ffc4af025f0 error 6 in make_dump[400000+a6000]
>     #
> 2: Pass other test(which is not easy to do in script) by hand.
>
> Changelog v3.1-v4:
> 1. remove extra fork pointed out by:
>     Andrei Vagin <avagin@gmail.com>
> 2: Rebase on top of v4.9-rc8.
> 3: Rebase on top of v4.12.
>
> Changelog v3-v3.1:
> 1. Switch "pwd" of pipe program to container's root fs.
> 2. Rebase on top of v4.9-rc1.
>
> Changelog v2->v3:
> 1: Fix problem of setting pid namespace, pointed out by:
>     Andrei Vagin <avagin@gmail.com>
>
> Changelog v1(RFC)->v2:
> 1: Add [PATCH 2/2] which was todo in [RFC v1].
> 2: Pass a test script for each function.
> 3: Rebase on top of v4.7.
>
> Suggested-by: Eric W. Biederman <ebiederm@xmission.com>
> Suggested-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> Signed-off-by: Cao Shufeng <caosf.fnst@cn.fujitsu.com>
>
> Cao Shufeng (3):
>    Make call_usermodehelper_exec possible to set namespaces
>    Limit dump_pipe program's permission to init for container
>    Make core_pattern support namespace
>
>   fs/coredump.c                 | 150 +++++++++++++++++++++++++++++++++++++++---
>   include/linux/binfmts.h       |   2 +
>   include/linux/kmod.h          |   5 ++
>   include/linux/pid_namespace.h |   3 +
>   init/do_mounts_initrd.c       |   3 +-
>   kernel/kmod.c                 |  56 +++++++++++++---
>   kernel/pid.c                  |   2 +
>   kernel/pid_namespace.c        |   2 +
>   kernel/sysctl.c               |  50 ++++++++++++--
>   lib/kobject_uevent.c          |   3 +-
>   security/keys/request_key.c   |   4 +-
>   11 files changed, 253 insertions(+), 27 deletions(-)
>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH_v4.1_3/3] Make core_pattern support namespace
  2017-08-02  7:07   ` Aleksa Sarai
@ 2017-11-22  3:07         ` 曹树烽
  0 siblings, 0 replies; 24+ messages in thread
From: 曹树烽 @ 2017-11-22  3:07 UTC (permalink / raw)
  To: Aleksa Sarai, linux-kernel-u79uwXL29TY76Z2rM5mHXA
  Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	ebiederm-aS9lmoZGLiVWk0Htik3J/w,
	mashimiao.fnst-BthXqXjhjHXQFUHtdCDX3A

Hi, Aleksa Sarai:
Sorry for the late reply.

> what happens if you have processes in the same pidns that have
> different mount namespaces?
We support this. The coredump file will be saved in the same mount
namespace as the processes. This is implemented by the patch
"Limit dump_pipe program's permission to init for container".

 > Just my $0.02.
Thanks.

Best Regards
Cao ShuFeng

On 2017-08-02 15:07, Aleksa Sarai wrote:
>> Currently, each container shared one copy of coredump setting
>> with the host system, if host system changed the setting, each
>> running containers will be affected.
>> Same story happened when container changed core_pattern, both
>> host and other container will be affected.
>>
>> For container based on namespace design, it is good to allow
>> each container keeping their own coredump setting.
>
> From what I can see, this is basically setting a per-pidns 
> core_pattern (which is hierarchically applied). I'm not sure this 
> actually solves the more general problem (that usermode helper 
> settings aren't generally namespace-aware) -- and what happens if you 
> have processes in the same pidns that have different mount namespaces?
>
> If we _had_ to do it like this I would think it makes more sense to 
> pin it to mountns, but I was under the impression that someone was 
> working on making usermode helpers play nicer with namespaces.
>
> Just my $0.02.
>




^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH_v4.1_3/3] Make core_pattern support namespace
@ 2017-11-22  3:07         ` 曹树烽
  0 siblings, 0 replies; 24+ messages in thread
From: 曹树烽 @ 2017-11-22  3:07 UTC (permalink / raw)
  To: Aleksa Sarai, linux-kernel; +Cc: containers, mashimiao.fnst, ebiederm

Hi, Aleksa Sarai:
Sorry for the late reply.

> what happens if you have processes in the same pidns that have
> different mount namespaces?
We support this. The coredump file will be saved in the same mount
namespace as the processes. This is implemented by the patch
"Limit dump_pipe program's permission to init for container".

 > Just my $0.02.
Thanks.

Best Regards
Cao ShuFeng

On 2017-08-02 15:07, Aleksa Sarai wrote:
>> Currently, each container shared one copy of coredump setting
>> with the host system, if host system changed the setting, each
>> running containers will be affected.
>> Same story happened when container changed core_pattern, both
>> host and other container will be affected.
>>
>> For container based on namespace design, it is good to allow
>> each container keeping their own coredump setting.
>
> From what I can see, this is basically setting a per-pidns 
> core_pattern (which is hierarchically applied). I'm not sure this 
> actually solves the more general problem (that usermode helper 
> settings aren't generally namespace-aware) -- and what happens if you 
> have processes in the same pidns that have different mount namespaces?
>
> If we _had_ to do it like this I would think it makes more sense to 
> pin it to mountns, but I was under the impression that someone was 
> working on making usermode helpers play nicer with namespaces.
>
> Just my $0.02.
>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH_v4.1 0/3] Make core_pattern support namespace
  2017-08-02  6:37 [PATCH 0/3] Make core_pattern support namespace Cao Shufeng
@ 2017-11-22  3:24     ` Cao Shufeng
       [not found] ` <1501655849-9149-1-git-send-email-caosf.fnst-BthXqXjhjHXQFUHtdCDX3A@public.gmane.org>
                       ` (2 subsequent siblings)
  3 siblings, 0 replies; 24+ messages in thread
From: Cao Shufeng @ 2017-11-22  3:24 UTC (permalink / raw)
  To: linux-kernel-u79uwXL29TY76Z2rM5mHXA
  Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	caosf.fnst-BthXqXjhjHXQFUHtdCDX3A,
	mashimiao.fnst-BthXqXjhjHXQFUHtdCDX3A,
	fnstml-container-BthXqXjhjHXQFUHtdCDX3A,
	ebiederm-aS9lmoZGLiVWk0Htik3J/w

This patchset provides the following:
1: Allow the usermodehelper code to set the pid namespace of the helper
   done by: [PATCH_v4.1 1/3] Make call_usermodehelper_exec possible
   to set namespaces
2: Make a pipe-type core_pattern write the dump into the container's rootfs
   done by: [PATCH_v4.1 2/3] Limit dump_pipe program's permission to
   init for container
3: Give each container its own core_pattern setting
   done by: [PATCH_v4.1 3/3] Make core_pattern support namespace
4: Compatibility with the current behaviour
   also included in: [PATCH_v4.1 3/3] Make core_pattern support namespace
   If a container has not changed its core_pattern setting, it keeps
   the same setting as the host.
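
To illustrate point 4, here is a minimal userspace sketch of the fallback
behaviour, assuming the pattern is stored per pid namespace and an unset
pattern is inherited from the parent namespace. The names ns_model,
PATTERN_MAX, effective_core_pattern() and set_core_pattern() are made up
for this illustration and are not taken from the patches.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PATTERN_MAX 128	/* hypothetical size limit for this sketch */

/* Hypothetical userspace stand-in for a pid namespace; not the kernel struct. */
struct ns_model {
	struct ns_model *parent;	/* NULL for the host (initial) namespace */
	char core_pattern[PATTERN_MAX];	/* empty string means "never set here" */
};

/* Walk towards the host until a namespace with its own pattern is found. */
static const char *effective_core_pattern(const struct ns_model *ns)
{
	assert(ns != NULL);
	while (ns->core_pattern[0] == '\0' && ns->parent != NULL)
		ns = ns->parent;
	return ns->core_pattern;
}

/* Set a namespace-local pattern; returns 0 on success, -1 if it does not fit. */
static int set_core_pattern(struct ns_model *ns, const char *pattern)
{
	assert(ns != NULL && pattern != NULL);
	if (strlen(pattern) >= sizeof(ns->core_pattern))
		return -1;
	strcpy(ns->core_pattern, pattern);
	return 0;
}
```

In this model a container that never wrote a pattern resolves to the
host's value, and a pattern set inside the container does not change
what the host sees.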

Changelog v3.1-v4:
1. remove extra fork pointed out by:
   Andrei Vagin <avagin-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2: Rebase on top of v4.9-rc8.
3: Rebase on top of v4.12.
4: Rebase on top of v4.14.

Changelog v3-v3.1:
1. Switch "pwd" of pipe program to container's root fs.
2. Rebase on top of v4.9-rc1.

Changelog v2->v3:
1: Fix problem of setting pid namespace, pointed out by:
   Andrei Vagin <avagin-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>

Changelog v1(RFC)->v2:
1: Add [PATCH 2/2] which was todo in [RFC v1].
2: Pass a test script for each function.
3: Rebase on top of v4.7.

Suggested-by: Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
Suggested-by: KOSAKI Motohiro <kosaki.motohiro-+CUm20s59erQFUHtdCDX3A@public.gmane.org>
Signed-off-by: Cao Shufeng <caosf.fnst-BthXqXjhjHXQFUHtdCDX3A@public.gmane.org>

Cao Shufeng (3):
  Make call_usermodehelper_exec possible to set namespaces
  Limit dump_pipe program's permission to init for container
  Make core_pattern support namespace

 fs/coredump.c                 | 150 +++++++++++++++++++++++++++++++++++++++---
 include/linux/binfmts.h       |   2 +
 include/linux/pid_namespace.h |   3 +
 include/linux/umh.h           |   5 ++
 init/do_mounts_initrd.c       |   3 +-
 kernel/kmod.c                 |   3 +-
 kernel/pid.c                  |   2 +
 kernel/pid_namespace.c        |   2 +
 kernel/sysctl.c               |  50 ++++++++++++--
 kernel/umh.c                  |  51 +++++++++++---
 security/keys/request_key.c   |   4 +-
 11 files changed, 250 insertions(+), 25 deletions(-)

-- 
2.1.0

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH_v4.1 0/3] Make core_pattern support namespace
@ 2017-11-22  3:24     ` Cao Shufeng
  0 siblings, 0 replies; 24+ messages in thread
From: Cao Shufeng @ 2017-11-22  3:24 UTC (permalink / raw)
  To: linux-kernel
  Cc: containers, ebiederm, mguzik, fnstml-container, kamezawa.hiroyu,
	stgraber, avagin, zhaolei, mashimiao.fnst, caosf.fnst

This patchset provides the following:
1: Allow the usermodehelper code to set the pid namespace of the helper
   done by: [PATCH_v4.1 1/3] Make call_usermodehelper_exec possible
   to set namespaces
2: Make a pipe-type core_pattern write the dump into the container's rootfs
   done by: [PATCH_v4.1 2/3] Limit dump_pipe program's permission to
   init for container
3: Give each container its own core_pattern setting
   done by: [PATCH_v4.1 3/3] Make core_pattern support namespace
4: Compatibility with the current behaviour
   also included in: [PATCH_v4.1 3/3] Make core_pattern support namespace
   If a container has not changed its core_pattern setting, it keeps
   the same setting as the host.

Changelog v3.1-v4:
1. remove extra fork pointed out by:
   Andrei Vagin <avagin@gmail.com>
2: Rebase on top of v4.9-rc8.
3: Rebase on top of v4.12.
4: Rebase on top of v4.14.

Changelog v3-v3.1:
1. Switch "pwd" of pipe program to container's root fs.
2. Rebase on top of v4.9-rc1.

Changelog v2->v3:
1: Fix problem of setting pid namespace, pointed out by:
   Andrei Vagin <avagin@gmail.com>

Changelog v1(RFC)->v2:
1: Add [PATCH 2/2] which was todo in [RFC v1].
2: Pass a test script for each function.
3: Rebase on top of v4.7.

Suggested-by: Eric W. Biederman <ebiederm@xmission.com>
Suggested-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Cao Shufeng <caosf.fnst@cn.fujitsu.com>

Cao Shufeng (3):
  Make call_usermodehelper_exec possible to set namespaces
  Limit dump_pipe program's permission to init for container
  Make core_pattern support namespace

 fs/coredump.c                 | 150 +++++++++++++++++++++++++++++++++++++++---
 include/linux/binfmts.h       |   2 +
 include/linux/pid_namespace.h |   3 +
 include/linux/umh.h           |   5 ++
 init/do_mounts_initrd.c       |   3 +-
 kernel/kmod.c                 |   3 +-
 kernel/pid.c                  |   2 +
 kernel/pid_namespace.c        |   2 +
 kernel/sysctl.c               |  50 ++++++++++++--
 kernel/umh.c                  |  51 +++++++++++---
 security/keys/request_key.c   |   4 +-
 11 files changed, 250 insertions(+), 25 deletions(-)

-- 
2.1.0

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH_v4.1 1/3] Make call_usermodehelper_exec possible to set namespaces
       [not found]     ` <1511321058-6089-1-git-send-email-caosf.fnst-BthXqXjhjHXQFUHtdCDX3A@public.gmane.org>
@ 2017-11-22  3:24       ` Cao Shufeng
  2017-11-22  3:24       ` [PATCH_v4.1 2/3] Limit dump_pipe program's permission to init for container Cao Shufeng
  2017-11-22  3:24       ` [PATCH_v4.1 3/3] Make core_pattern support namespace Cao Shufeng
  2 siblings, 0 replies; 24+ messages in thread
From: Cao Shufeng @ 2017-11-22  3:24 UTC (permalink / raw)
  To: linux-kernel-u79uwXL29TY76Z2rM5mHXA
  Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	caosf.fnst-BthXqXjhjHXQFUHtdCDX3A,
	mashimiao.fnst-BthXqXjhjHXQFUHtdCDX3A,
	fnstml-container-BthXqXjhjHXQFUHtdCDX3A,
	ebiederm-aS9lmoZGLiVWk0Htik3J/w

Currently call_usermodehelper_exec_work() cannot set namespaces for
the executed program.

This patch adds that ability to call_usermodehelper_exec_work().
The init_intermediate hook is introduced for initialization work that
must be done before fork(), which gives us a way to set namespaces
for the child. The cleanup_intermediate hook is introduced to undo
what init_intermediate did, such as switching back to the original
namespace.

This is useful for coredump, which needs to run the pipe program in
a specific container environment.
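
The intended ordering of the two hooks can be sketched in userspace as
follows. This is only a model under assumptions: helper_model, run_helper(),
enter_ns(), leave_ns() and fake_spawn() are hypothetical names, and
fake_spawn() merely stands in for the point where the kernel would create
the child task.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical userspace model; field names mirror the patch, nothing else does. */
struct helper_model {
	void (*init_intermediate)(struct helper_model *h);
	void (*cleanup_intermediate)(struct helper_model *h);
	int (*spawn)(struct helper_model *h);	/* stand-in for creating the child */
	char trace[64];				/* records the order of the calls */
};

/* Append one step name to the trace, checking that it still fits. */
static void trace_add(struct helper_model *h, const char *step)
{
	assert(strlen(h->trace) + strlen(step) < sizeof(h->trace));
	strcat(h->trace, step);
}

/* Parent-side sequence: optional init, spawn the child, optional cleanup. */
static int run_helper(struct helper_model *h)
{
	int pid;

	if (h->init_intermediate)
		h->init_intermediate(h);
	pid = h->spawn(h);
	if (h->cleanup_intermediate)
		h->cleanup_intermediate(h);
	return pid;
}

static void enter_ns(struct helper_model *h) { trace_add(h, "enter;"); }
static void leave_ns(struct helper_model *h) { trace_add(h, "leave;"); }
static int fake_spawn(struct helper_model *h) { trace_add(h, "spawn;"); return 1; }
```

The point of the model is that the child is created while the parent is
still in the switched state, and that callers passing NULL for both
hooks see the same sequence as before.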

Signed-off-by: Cao Shufeng <caosf.fnst-BthXqXjhjHXQFUHtdCDX3A@public.gmane.org>
---
 fs/coredump.c               |  3 ++-
 include/linux/umh.h         |  5 +++++
 init/do_mounts_initrd.c     |  3 ++-
 kernel/kmod.c               |  3 ++-
 kernel/umh.c                | 51 ++++++++++++++++++++++++++++++++++++++-------
 security/keys/request_key.c |  4 ++--
 6 files changed, 56 insertions(+), 13 deletions(-)

diff --git a/fs/coredump.c b/fs/coredump.c
index 52c63d6..84c2b8a 100644
--- a/fs/coredump.c
+++ b/fs/coredump.c
@@ -647,7 +647,8 @@ void do_coredump(const siginfo_t *siginfo)
 		retval = -ENOMEM;
 		sub_info = call_usermodehelper_setup(helper_argv[0],
 						helper_argv, NULL, GFP_KERNEL,
-						umh_pipe_setup, NULL, &cprm);
+						NULL, NULL, umh_pipe_setup,
+						NULL, &cprm);
 		if (sub_info)
 			retval = call_usermodehelper_exec(sub_info,
 							  UMH_WAIT_EXEC);
diff --git a/include/linux/umh.h b/include/linux/umh.h
index 244aff6..832ff5d 100644
--- a/include/linux/umh.h
+++ b/include/linux/umh.h
@@ -24,6 +24,9 @@ struct subprocess_info {
 	char **envp;
 	int wait;
 	int retval;
+	bool cleaned;
+	void (*init_intermediate)(struct subprocess_info *info);
+	void (*cleanup_intermediate)(struct subprocess_info *info);
 	int (*init)(struct subprocess_info *info, struct cred *new);
 	void (*cleanup)(struct subprocess_info *info);
 	void *data;
@@ -35,6 +38,8 @@ call_usermodehelper(const char *path, char **argv, char **envp, int wait);
 extern struct subprocess_info *
 call_usermodehelper_setup(const char *path, char **argv, char **envp,
 			  gfp_t gfp_mask,
+			  void (*init_intermediate)(struct subprocess_info *info),
+			  void (*cleanup_intermediate)(struct subprocess_info *info),
 			  int (*init)(struct subprocess_info *info, struct cred *new),
 			  void (*cleanup)(struct subprocess_info *), void *data);
 
diff --git a/init/do_mounts_initrd.c b/init/do_mounts_initrd.c
index 53d4f0f..8bb34c0 100644
--- a/init/do_mounts_initrd.c
+++ b/init/do_mounts_initrd.c
@@ -73,7 +73,8 @@ static void __init handle_initrd(void)
 	current->flags |= PF_FREEZER_SKIP;
 
 	info = call_usermodehelper_setup("/linuxrc", argv, envp_init,
-					 GFP_KERNEL, init_linuxrc, NULL, NULL);
+					 GFP_KERNEL, NULL, NULL, init_linuxrc,
+					 NULL, NULL);
 	if (!info)
 		return;
 	call_usermodehelper_exec(info, UMH_WAIT_PROC);
diff --git a/kernel/kmod.c b/kernel/kmod.c
index bc6addd..41df494 100644
--- a/kernel/kmod.c
+++ b/kernel/kmod.c
@@ -92,7 +92,8 @@ static int call_modprobe(char *module_name, int wait)
 	argv[4] = NULL;
 
 	info = call_usermodehelper_setup(modprobe_path, argv, envp, GFP_KERNEL,
-					 NULL, free_modprobe_argv, NULL);
+					 NULL, NULL, NULL, free_modprobe_argv,
+					 NULL);
 	if (!info)
 		goto free_module_name;
 
diff --git a/kernel/umh.c b/kernel/umh.c
index 6ff9905..97e9bd8 100644
--- a/kernel/umh.c
+++ b/kernel/umh.c
@@ -25,6 +25,7 @@
 #include <linux/ptrace.h>
 #include <linux/async.h>
 #include <linux/uaccess.h>
+#include <linux/delay.h>
 
 #include <trace/events/module.h>
 
@@ -53,8 +54,15 @@ static void umh_complete(struct subprocess_info *sub_info)
 	 */
 	if (comp)
 		complete(comp);
-	else
+	else {
+		for(;;) {
+			if (sub_info->cleaned == false)
+				udelay(20);
+			else
+				break;
+		}
 		call_usermodehelper_freeinfo(sub_info);
+	}
 }
 
 /*
@@ -120,6 +128,9 @@ static void call_usermodehelper_exec_sync(struct subprocess_info *sub_info)
 
 	/* If SIGCLD is ignored sys_wait4 won't populate the status. */
 	kernel_sigaction(SIGCHLD, SIG_DFL);
+	if(sub_info->cleanup_intermediate) {
+		sub_info->cleanup_intermediate(sub_info);
+	}
 	pid = kernel_thread(call_usermodehelper_exec_async, sub_info, SIGCHLD);
 	if (pid < 0) {
 		sub_info->retval = pid;
@@ -170,6 +181,9 @@ static void call_usermodehelper_exec_work(struct work_struct *work)
 {
 	struct subprocess_info *sub_info =
 		container_of(work, struct subprocess_info, work);
+	if(sub_info->init_intermediate) {
+		sub_info->init_intermediate(sub_info);
+	}
 
 	if (sub_info->wait & UMH_WAIT_PROC) {
 		call_usermodehelper_exec_sync(sub_info);
@@ -182,6 +196,11 @@ static void call_usermodehelper_exec_work(struct work_struct *work)
 		 */
 		pid = kernel_thread(call_usermodehelper_exec_async, sub_info,
 				    CLONE_PARENT | SIGCHLD);
+
+		if(sub_info->cleanup_intermediate) {
+			sub_info->cleanup_intermediate(sub_info);
+		}
+		sub_info->cleaned = true;
 		if (pid < 0) {
 			sub_info->retval = pid;
 			umh_complete(sub_info);
@@ -347,25 +366,38 @@ static void helper_unlock(void)
  * @argv: arg vector for process
  * @envp: environment for process
  * @gfp_mask: gfp mask for memory allocation
- * @cleanup: a cleanup function
+ * @init_intermediate: init function which is called in parent task
+ * @cleanup_intermediate: clean function which is called in parent task
  * @init: an init function
+ * @cleanup: a cleanup function
  * @data: arbitrary context sensitive data
  *
  * Returns either %NULL on allocation failure, or a subprocess_info
  * structure.  This should be passed to call_usermodehelper_exec to
  * exec the process and free the structure.
  *
- * The init function is used to customize the helper process prior to
- * exec.  A non-zero return code causes the process to error out, exit,
- * and return the failure to the calling process
+ * The init_intermediate is called in the parent task of user mode
+ * helper. It's designed for init works which must be done in
+ * parent task, like switching the pid_ns_for_children.
+ *
+ * The cleanup_intermediate is used when we want to cleanup what
+ * we have done in init_intermediate, it is also called in parent
+ * task.
  *
- * The cleanup function is just before ethe subprocess_info is about to
+ * The init function is called after fork. It is used to customize the
+ * helper process prior to exec.  A non-zero return code causes the
+ * process to error out, exit, and return the failure to the
+ * calling process.
+ *
+ * The cleanup function is just before the subprocess_info is about to
  * be freed.  This can be used for freeing the argv and envp.  The
  * Function must be runnable in either a process context or the
  * context in which call_usermodehelper_exec is called.
  */
 struct subprocess_info *call_usermodehelper_setup(const char *path, char **argv,
 		char **envp, gfp_t gfp_mask,
+		void (*init_intermediate)(struct subprocess_info *info),
+		void (*cleanup_intermediate)(struct subprocess_info *info),
 		int (*init)(struct subprocess_info *info, struct cred *new),
 		void (*cleanup)(struct subprocess_info *info),
 		void *data)
@@ -385,8 +417,11 @@ struct subprocess_info *call_usermodehelper_setup(const char *path, char **argv,
 	sub_info->argv = argv;
 	sub_info->envp = envp;
 
-	sub_info->cleanup = cleanup;
+	sub_info->init_intermediate = init_intermediate;
+	sub_info->cleaned = false;
+	sub_info->cleanup_intermediate = cleanup_intermediate;
 	sub_info->init = init;
+	sub_info->cleanup = cleanup;
 	sub_info->data = data;
   out:
 	return sub_info;
@@ -481,7 +516,7 @@ int call_usermodehelper(const char *path, char **argv, char **envp, int wait)
 	gfp_t gfp_mask = (wait == UMH_NO_WAIT) ? GFP_ATOMIC : GFP_KERNEL;
 
 	info = call_usermodehelper_setup(path, argv, envp, gfp_mask,
-					 NULL, NULL, NULL);
+					 NULL, NULL, NULL, NULL, NULL);
 	if (info == NULL)
 		return -ENOMEM;
 
diff --git a/security/keys/request_key.c b/security/keys/request_key.c
index e8036cd..ae1025c 100644
--- a/security/keys/request_key.c
+++ b/security/keys/request_key.c
@@ -78,8 +78,8 @@ static int call_usermodehelper_keys(const char *path, char **argv, char **envp,
 	struct subprocess_info *info;
 
 	info = call_usermodehelper_setup(path, argv, envp, GFP_KERNEL,
-					  umh_keys_init, umh_keys_cleanup,
-					  session_keyring);
+					  NULL, NULL, umh_keys_init,
+					  umh_keys_cleanup, session_keyring);
 	if (!info)
 		return -ENOMEM;
 
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH_v4.1 1/3] Make call_usermodehelper_exec possible to set namespaces
  2017-11-22  3:24     ` Cao Shufeng
  (?)
@ 2017-11-22  3:24     ` Cao Shufeng
  -1 siblings, 0 replies; 24+ messages in thread
From: Cao Shufeng @ 2017-11-22  3:24 UTC (permalink / raw)
  To: linux-kernel
  Cc: containers, ebiederm, mguzik, fnstml-container, kamezawa.hiroyu,
	stgraber, avagin, zhaolei, mashimiao.fnst, caosf.fnst

Currently call_usermodehelper_exec_work() cannot set namespaces for
the executed program.

This patch adds that ability to call_usermodehelper_exec_work().
The init_intermediate hook is introduced for initialization work that
must be done before fork(), which gives us a way to set namespaces
for the child. The cleanup_intermediate hook is introduced to undo
what init_intermediate did, such as switching back to the original
namespace.

This is useful for coredump, which needs to run the pipe program in
a specific container environment.

Signed-off-by: Cao Shufeng <caosf.fnst@cn.fujitsu.com>
---
 fs/coredump.c               |  3 ++-
 include/linux/umh.h         |  5 +++++
 init/do_mounts_initrd.c     |  3 ++-
 kernel/kmod.c               |  3 ++-
 kernel/umh.c                | 51 ++++++++++++++++++++++++++++++++++++++-------
 security/keys/request_key.c |  4 ++--
 6 files changed, 56 insertions(+), 13 deletions(-)

diff --git a/fs/coredump.c b/fs/coredump.c
index 52c63d6..84c2b8a 100644
--- a/fs/coredump.c
+++ b/fs/coredump.c
@@ -647,7 +647,8 @@ void do_coredump(const siginfo_t *siginfo)
 		retval = -ENOMEM;
 		sub_info = call_usermodehelper_setup(helper_argv[0],
 						helper_argv, NULL, GFP_KERNEL,
-						umh_pipe_setup, NULL, &cprm);
+						NULL, NULL, umh_pipe_setup,
+						NULL, &cprm);
 		if (sub_info)
 			retval = call_usermodehelper_exec(sub_info,
 							  UMH_WAIT_EXEC);
diff --git a/include/linux/umh.h b/include/linux/umh.h
index 244aff6..832ff5d 100644
--- a/include/linux/umh.h
+++ b/include/linux/umh.h
@@ -24,6 +24,9 @@ struct subprocess_info {
 	char **envp;
 	int wait;
 	int retval;
+	bool cleaned;
+	void (*init_intermediate)(struct subprocess_info *info);
+	void (*cleanup_intermediate)(struct subprocess_info *info);
 	int (*init)(struct subprocess_info *info, struct cred *new);
 	void (*cleanup)(struct subprocess_info *info);
 	void *data;
@@ -35,6 +38,8 @@ call_usermodehelper(const char *path, char **argv, char **envp, int wait);
 extern struct subprocess_info *
 call_usermodehelper_setup(const char *path, char **argv, char **envp,
 			  gfp_t gfp_mask,
+			  void (*init_intermediate)(struct subprocess_info *info),
+			  void (*cleanup_intermediate)(struct subprocess_info *info),
 			  int (*init)(struct subprocess_info *info, struct cred *new),
 			  void (*cleanup)(struct subprocess_info *), void *data);
 
diff --git a/init/do_mounts_initrd.c b/init/do_mounts_initrd.c
index 53d4f0f..8bb34c0 100644
--- a/init/do_mounts_initrd.c
+++ b/init/do_mounts_initrd.c
@@ -73,7 +73,8 @@ static void __init handle_initrd(void)
 	current->flags |= PF_FREEZER_SKIP;
 
 	info = call_usermodehelper_setup("/linuxrc", argv, envp_init,
-					 GFP_KERNEL, init_linuxrc, NULL, NULL);
+					 GFP_KERNEL, NULL, NULL, init_linuxrc,
+					 NULL, NULL);
 	if (!info)
 		return;
 	call_usermodehelper_exec(info, UMH_WAIT_PROC);
diff --git a/kernel/kmod.c b/kernel/kmod.c
index bc6addd..41df494 100644
--- a/kernel/kmod.c
+++ b/kernel/kmod.c
@@ -92,7 +92,8 @@ static int call_modprobe(char *module_name, int wait)
 	argv[4] = NULL;
 
 	info = call_usermodehelper_setup(modprobe_path, argv, envp, GFP_KERNEL,
-					 NULL, free_modprobe_argv, NULL);
+					 NULL, NULL, NULL, free_modprobe_argv,
+					 NULL);
 	if (!info)
 		goto free_module_name;
 
diff --git a/kernel/umh.c b/kernel/umh.c
index 6ff9905..97e9bd8 100644
--- a/kernel/umh.c
+++ b/kernel/umh.c
@@ -25,6 +25,7 @@
 #include <linux/ptrace.h>
 #include <linux/async.h>
 #include <linux/uaccess.h>
+#include <linux/delay.h>
 
 #include <trace/events/module.h>
 
@@ -53,8 +54,15 @@ static void umh_complete(struct subprocess_info *sub_info)
 	 */
 	if (comp)
 		complete(comp);
-	else
+	else {
+		for(;;) {
+			if (sub_info->cleaned == false)
+				udelay(20);
+			else
+				break;
+		}
 		call_usermodehelper_freeinfo(sub_info);
+	}
 }
 
 /*
@@ -120,6 +128,9 @@ static void call_usermodehelper_exec_sync(struct subprocess_info *sub_info)
 
 	/* If SIGCLD is ignored sys_wait4 won't populate the status. */
 	kernel_sigaction(SIGCHLD, SIG_DFL);
+	if(sub_info->cleanup_intermediate) {
+		sub_info->cleanup_intermediate(sub_info);
+	}
 	pid = kernel_thread(call_usermodehelper_exec_async, sub_info, SIGCHLD);
 	if (pid < 0) {
 		sub_info->retval = pid;
@@ -170,6 +181,9 @@ static void call_usermodehelper_exec_work(struct work_struct *work)
 {
 	struct subprocess_info *sub_info =
 		container_of(work, struct subprocess_info, work);
+	if(sub_info->init_intermediate) {
+		sub_info->init_intermediate(sub_info);
+	}
 
 	if (sub_info->wait & UMH_WAIT_PROC) {
 		call_usermodehelper_exec_sync(sub_info);
@@ -182,6 +196,11 @@ static void call_usermodehelper_exec_work(struct work_struct *work)
 		 */
 		pid = kernel_thread(call_usermodehelper_exec_async, sub_info,
 				    CLONE_PARENT | SIGCHLD);
+
+		if(sub_info->cleanup_intermediate) {
+			sub_info->cleanup_intermediate(sub_info);
+		}
+		sub_info->cleaned = true;
 		if (pid < 0) {
 			sub_info->retval = pid;
 			umh_complete(sub_info);
@@ -347,25 +366,38 @@ static void helper_unlock(void)
  * @argv: arg vector for process
  * @envp: environment for process
  * @gfp_mask: gfp mask for memory allocation
- * @cleanup: a cleanup function
+ * @init_intermediate: init function which is called in parent task
+ * @cleanup_intermediate: clean function which is called in parent task
  * @init: an init function
+ * @cleanup: a cleanup function
  * @data: arbitrary context sensitive data
  *
  * Returns either %NULL on allocation failure, or a subprocess_info
  * structure.  This should be passed to call_usermodehelper_exec to
  * exec the process and free the structure.
  *
- * The init function is used to customize the helper process prior to
- * exec.  A non-zero return code causes the process to error out, exit,
- * and return the failure to the calling process
+ * The init_intermediate is called in the parent task of user mode
+ * helper. It's designed for init works which must be done in
+ * parent task, like switching the pid_ns_for_children.
+ *
+ * The cleanup_intermediate is used when we want to cleanup what
+ * we have done in init_intermediate, it is also called in parent
+ * task.
  *
- * The cleanup function is just before ethe subprocess_info is about to
+ * The init function is called after fork. It is used to customize the
+ * helper process prior to exec.  A non-zero return code causes the
+ * process to error out, exit, and return the failure to the
+ * calling process.
+ *
+ * The cleanup function is just before the subprocess_info is about to
  * be freed.  This can be used for freeing the argv and envp.  The
  * Function must be runnable in either a process context or the
  * context in which call_usermodehelper_exec is called.
  */
 struct subprocess_info *call_usermodehelper_setup(const char *path, char **argv,
 		char **envp, gfp_t gfp_mask,
+		void (*init_intermediate)(struct subprocess_info *info),
+		void (*cleanup_intermediate)(struct subprocess_info *info),
 		int (*init)(struct subprocess_info *info, struct cred *new),
 		void (*cleanup)(struct subprocess_info *info),
 		void *data)
@@ -385,8 +417,11 @@ struct subprocess_info *call_usermodehelper_setup(const char *path, char **argv,
 	sub_info->argv = argv;
 	sub_info->envp = envp;
 
-	sub_info->cleanup = cleanup;
+	sub_info->init_intermediate = init_intermediate;
+	sub_info->cleaned = false;
+	sub_info->cleanup_intermediate = cleanup_intermediate;
 	sub_info->init = init;
+	sub_info->cleanup = cleanup;
 	sub_info->data = data;
   out:
 	return sub_info;
@@ -481,7 +516,7 @@ int call_usermodehelper(const char *path, char **argv, char **envp, int wait)
 	gfp_t gfp_mask = (wait == UMH_NO_WAIT) ? GFP_ATOMIC : GFP_KERNEL;
 
 	info = call_usermodehelper_setup(path, argv, envp, gfp_mask,
-					 NULL, NULL, NULL);
+					 NULL, NULL, NULL, NULL, NULL);
 	if (info == NULL)
 		return -ENOMEM;
 
diff --git a/security/keys/request_key.c b/security/keys/request_key.c
index e8036cd..ae1025c 100644
--- a/security/keys/request_key.c
+++ b/security/keys/request_key.c
@@ -78,8 +78,8 @@ static int call_usermodehelper_keys(const char *path, char **argv, char **envp,
 	struct subprocess_info *info;
 
 	info = call_usermodehelper_setup(path, argv, envp, GFP_KERNEL,
-					  umh_keys_init, umh_keys_cleanup,
-					  session_keyring);
+					  NULL, NULL, umh_keys_init,
+					  umh_keys_cleanup, session_keyring);
 	if (!info)
 		return -ENOMEM;
 
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH_v4.1 2/3] Limit dump_pipe program's permission to init for container
       [not found]     ` <1511321058-6089-1-git-send-email-caosf.fnst-BthXqXjhjHXQFUHtdCDX3A@public.gmane.org>
  2017-11-22  3:24       ` Cao Shufeng
@ 2017-11-22  3:24       ` Cao Shufeng
  2017-11-22  3:24       ` [PATCH_v4.1 3/3] Make core_pattern support namespace Cao Shufeng
  2 siblings, 0 replies; 24+ messages in thread
From: Cao Shufeng @ 2017-11-22  3:24 UTC (permalink / raw)
  To: linux-kernel-u79uwXL29TY76Z2rM5mHXA
  Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	caosf.fnst-BthXqXjhjHXQFUHtdCDX3A,
	mashimiao.fnst-BthXqXjhjHXQFUHtdCDX3A,
	fnstml-container-BthXqXjhjHXQFUHtdCDX3A,
	ebiederm-aS9lmoZGLiVWk0Htik3J/w

Currently, when core_pattern is set to a pipe, the pipe program is
forked by a kthread running with root's permissions, and it writes the
dump file into the host's filesystem.
The same happens for a container: the dumper and the dump file are
also on the host (not in the container).

This has the following problems:
1: Not consistent with file-type core_pattern
   When core_pattern is set to a file, the container writes the dump
   into the container's filesystem instead of the host's.
2: Not safe for privileged containers
   In a privileged container, a user can destroy the host system with
   the following commands:
   # # In a container
   # echo "|/bin/dd of=/boot/vmlinuz" >/proc/sys/kernel/core_pattern
   # make_dump

This patch switches the dumper program's environment to that of the
init task. For a container, the dumper program therefore has the same
environment as the container's init task, so the dumper program is
looked up in the container's filesystem and writes the core dump into
the container's filesystem.
The dumper's permissions are also limited to no more than those of the
container's init process.
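
For illustration only (this is not part of the patch): a pipe-type
helper receives the core image on its standard input, because
umh_pipe_setup() installs the read end of the pipe as fd 0. Where the
dump ends up is therefore decided by the helper's own root filesystem
and credentials. Below is a minimal user-space sketch of the copy loop
such a helper might run. The name copy_core() is hypothetical, and
opening the output file is left to the caller.

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Hypothetical helper routine: copy everything readable from in_fd
 * (for a real helper, the pipe on stdin) to out_fd.
 * Returns the number of bytes copied, or -1 on error.
 */
long copy_core(int in_fd, int out_fd)
{
	char buf[4096];
	long total = 0;

	for (;;) {
		ssize_t n = read(in_fd, buf, sizeof(buf));
		ssize_t off = 0;

		if (n == 0)
			break;		/* EOF: the writer closed its end */
		if (n < 0) {
			if (errno == EINTR)
				continue;
			return -1;
		}

		/* write() may be short, so loop until the chunk is out */
		while (off < n) {
			ssize_t w = write(out_fd, buf + off, (size_t)(n - off));

			if (w < 0) {
				if (errno == EINTR)
					continue;
				return -1;
			}
			off += w;
		}
		total += n;
	}
	return total;
}
```

A real helper would call copy_core(STDIN_FILENO, fd) with fd opened
under whatever root the kernel gave it; with this patch that root is
the container's.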

Suggested-by: Eric W. Biederman <ebiederm@xmission.com>
Suggested-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>

Signed-off-by: Cao ShuFeng <caosf.fnst@cn.fujitsu.com>
---
 fs/coredump.c           | 126 +++++++++++++++++++++++++++++++++++++++++++++++-
 include/linux/binfmts.h |   2 +
 2 files changed, 126 insertions(+), 2 deletions(-)

diff --git a/fs/coredump.c b/fs/coredump.c
index 84c2b8a..41448bd 100644
--- a/fs/coredump.c
+++ b/fs/coredump.c
@@ -508,6 +508,45 @@ static void wait_for_dump_helpers(struct file *file)
 }
 
 /*
+ * umh_ns_setup
+ * set the namespaces to those of the base task of a container.
+ * we need to switch back to the original namespaces
+ * so that the workqueue thread is not influenced.
+ *
+ * this method runs in workqueue kernel thread.
+ */
+static void umh_ns_setup(struct subprocess_info *info)
+{
+	struct coredump_params *cp = (struct coredump_params *)info->data;
+	struct task_struct *base_task = cp->base_task;
+
+	if (base_task) {
+		cp->current_task_nsproxy = current->nsproxy;
+		/* prevent current namespace from being freed */
+		get_nsproxy(current->nsproxy);
+		/* Set namespaces to base_task */
+		get_nsproxy(base_task->nsproxy);
+		switch_task_namespaces(current, base_task->nsproxy);
+	}
+}
+
+/*
+ * umh_ns_cleanup
+ * cleanup what we have done in umh_ns_setup.
+ *
+ * this method runs in workqueue kernel thread.
+ */
+static void umh_ns_cleanup(struct subprocess_info *info)
+{
+	struct coredump_params *cp = (struct coredump_params *)info->data;
+	struct nsproxy *current_task_nsproxy = cp->current_task_nsproxy;
+	if (current_task_nsproxy) {
+		/* switch workqueue's original namespace back */
+		switch_task_namespaces(current, current_task_nsproxy);
+	}
+}
+
+/*
  * umh_pipe_setup
  * helper function to customize the process used
  * to collect the core in userspace.  Specifically
@@ -522,6 +561,8 @@ static int umh_pipe_setup(struct subprocess_info *info, struct cred *new)
 {
 	struct file *files[2];
 	struct coredump_params *cp = (struct coredump_params *)info->data;
+	struct task_struct *base_task;
+
 	int err = create_pipe_files(files, 0);
 	if (err)
 		return err;
@@ -530,10 +571,76 @@ static int umh_pipe_setup(struct subprocess_info *info, struct cred *new)
 
 	err = replace_fd(0, files[0], 0);
 	fput(files[0]);
+	if (err)
+		return err;
+
 	/* and disallow core files too */
 	current->signal->rlim[RLIMIT_CORE] = (struct rlimit){1, 1};
 
-	return err;
+	base_task = cp->base_task;
+	if (base_task) {
+		const struct cred *base_cred;
+
+		/* Set fs_root to base_task */
+		spin_lock(&base_task->fs->lock);
+		set_fs_root(current->fs, &base_task->fs->root);
+		set_fs_pwd(current->fs, &base_task->fs->pwd);
+		spin_unlock(&base_task->fs->lock);
+
+		/* Set cgroup to base_task */
+		current->flags &= ~PF_NO_SETAFFINITY;
+		err = cgroup_attach_task_all(base_task, current);
+		if (err < 0)
+			return err;
+
+		/* Set cred to base_task */
+		base_cred = get_task_cred(base_task);
+
+		new->uid   = base_cred->uid;
+		new->gid   = base_cred->gid;
+		new->suid  = base_cred->suid;
+		new->sgid  = base_cred->sgid;
+		new->euid  = base_cred->euid;
+		new->egid  = base_cred->egid;
+		new->fsuid = base_cred->fsuid;
+		new->fsgid = base_cred->fsgid;
+
+		new->securebits = base_cred->securebits;
+
+		new->cap_inheritable = base_cred->cap_inheritable;
+		new->cap_permitted   = base_cred->cap_permitted;
+		new->cap_effective   = base_cred->cap_effective;
+		new->cap_bset        = base_cred->cap_bset;
+		new->cap_ambient     = base_cred->cap_ambient;
+
+		security_cred_free(new);
+#ifdef CONFIG_SECURITY
+		new->security = NULL;
+#endif
+		err = security_prepare_creds(new, base_cred, GFP_KERNEL);
+		if (err < 0) {
+			put_cred(base_cred);
+			return err;
+		}
+
+		free_uid(new->user);
+		new->user = base_cred->user;
+		get_uid(new->user);
+
+		put_user_ns(new->user_ns);
+		new->user_ns = base_cred->user_ns;
+		get_user_ns(new->user_ns);
+
+		put_group_info(new->group_info);
+		new->group_info = base_cred->group_info;
+		get_group_info(new->group_info);
+
+		put_cred(base_cred);
+
+		validate_creds(new);
+	}
+
+	return 0;
 }
 
 void do_coredump(const siginfo_t *siginfo)
@@ -596,6 +703,7 @@ void do_coredump(const siginfo_t *siginfo)
 
 	if (ispipe) {
 		int dump_count;
+		struct task_struct *vinit_task;
 		char **helper_argv;
 		struct subprocess_info *sub_info;
 
@@ -637,6 +745,15 @@ void do_coredump(const siginfo_t *siginfo)
 			goto fail_dropcount;
 		}
 
+		rcu_read_lock();
+		vinit_task = find_task_by_vpid(1);
+		rcu_read_unlock();
+		if (!vinit_task) {
+			printk(KERN_WARNING "failed getting init task info, skipping core dump\n");
+			goto fail_dropcount;
+		}
+
+
 		helper_argv = argv_split(GFP_KERNEL, cn.corename, NULL);
 		if (!helper_argv) {
 			printk(KERN_WARNING "%s failed to allocate memory\n",
@@ -644,15 +761,20 @@ void do_coredump(const siginfo_t *siginfo)
 			goto fail_dropcount;
 		}
 
+		get_task_struct(vinit_task);
+
+		cprm.base_task = vinit_task;
+
 		retval = -ENOMEM;
 		sub_info = call_usermodehelper_setup(helper_argv[0],
 						helper_argv, NULL, GFP_KERNEL,
-						NULL, NULL, umh_pipe_setup,
+						umh_ns_setup, umh_ns_cleanup, umh_pipe_setup,
 						NULL, &cprm);
 		if (sub_info)
 			retval = call_usermodehelper_exec(sub_info,
 							  UMH_WAIT_EXEC);
 
+		put_task_struct(vinit_task);
 		argv_free(helper_argv);
 		if (retval) {
 			printk(KERN_INFO "Core dump to |%s pipe failed\n",
diff --git a/include/linux/binfmts.h b/include/linux/binfmts.h
index b0abe21..e69d5e5 100644
--- a/include/linux/binfmts.h
+++ b/include/linux/binfmts.h
@@ -76,6 +76,8 @@ struct linux_binprm {
 
 /* Function parameter for binfmt->coredump */
 struct coredump_params {
+	struct task_struct *base_task;
+	struct nsproxy *current_task_nsproxy;
 	const siginfo_t *siginfo;
 	struct pt_regs *regs;
 	struct file *file;
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH_v4.1 3/3] Make core_pattern support namespace
       [not found]     ` <1511321058-6089-1-git-send-email-caosf.fnst-BthXqXjhjHXQFUHtdCDX3A@public.gmane.org>
  2017-11-22  3:24       ` Cao Shufeng
  2017-11-22  3:24       ` [PATCH_v4.1 2/3] Limit dump_pipe program's permission to init for container Cao Shufeng
@ 2017-11-22  3:24       ` Cao Shufeng
  2 siblings, 0 replies; 24+ messages in thread
From: Cao Shufeng @ 2017-11-22  3:24 UTC (permalink / raw)
  To: linux-kernel
  Cc: containers, caosf.fnst, mashimiao.fnst, fnstml-container, ebiederm

Currently, every container shares a single copy of the coredump
setting with the host system. If the host changes the setting, all
running containers are affected.
Likewise, when a container changes core_pattern, both the host and
the other containers are affected.

For containers built on namespaces, it is better to let each
container keep its own coredump setting.

This brings the following benefits:
1: Each container can change its own coredump setting by writing
   to /proc/sys/kernel/core_pattern.
2: Changing the coredump setting on the host does not affect
   running containers.
3: Both "putting the coredump in the guest" and
   "putting the coredump in the host" are supported.

Any namespace-based software (lxc, docker, ...) can use this feature
to customize its dump setting.

This feature also lets each container work as a separate system,
which fits the design goal of namespaces.

Test(in lxc):
 # In the host
 # ----------------
 # echo host_core >/proc/sys/kernel/core_pattern
 # cat /proc/sys/kernel/core_pattern
 host_core
 # ulimit -c 1024000
 # ./make_dump
 Segmentation fault (core dumped)
 # ls -l
 -rw------- 1 root root 331776 Feb  4 18:02 host_core.2175
 -rwxr-xr-x 1 root root 759731 Feb  4 18:01 make_dump
 #

 # In the container
 # ----------------
 # cat /proc/sys/kernel/core_pattern
 host_core
 # echo container_core >/proc/sys/kernel/core_pattern
 # ./make_dump
 Segmentation fault (core dumped)
 # ls -l
 -rwxr-xr-x    1 root     root       759731 Feb  4 10:45 make_dump
 -rw-------    1 root     root       331776 Feb  4 10:45 container_core.16
 #

 # Return to host
 # ----------------
 # cat /proc/sys/kernel/core_pattern
 host_core
 # ls
 host_core.2175  make_dump  make_dump.c
 # rm -f host_core.2175
 # ./make_dump
 Segmentation fault (core dumped)
 # ls -l
 -rw------- 1 root root 331776 Feb  4 18:49 host_core.2351
 -rwxr-xr-x 1 root root 759731 Feb  4 18:01 make_dump
 #
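
For illustration only: the lookup order behind the transcript above
(the container reads host_core until it writes a value of its own) can
be modeled in user space. This is a simplified sketch, not the kernel
code. struct model_pid_ns and resolve_core_pattern() are hypothetical
names, the buffer size is an assumed value, and the per-namespace
locking done by the patch is left out.

```c
#include <stddef.h>
#include <string.h>

#define CORENAME_MAX_SIZE 128	/* assumed size for this model */

/* Simplified stand-in for struct pid_namespace (user-space model only). */
struct model_pid_ns {
	struct model_pid_ns *parent;	/* NULL for the initial namespace */
	char core_pattern[CORENAME_MAX_SIZE];	/* "" means "not set here" */
};

/*
 * Walk from ns towards the root and copy the first non-empty pattern
 * into out (which must hold CORENAME_MAX_SIZE bytes). The root's
 * pattern is the last resort, even if it is empty.
 */
void resolve_core_pattern(const struct model_pid_ns *ns, char *out)
{
	while (ns->parent && ns->core_pattern[0] == '\0')
		ns = ns->parent;
	strncpy(out, ns->core_pattern, CORENAME_MAX_SIZE - 1);
	out[CORENAME_MAX_SIZE - 1] = '\0';
}
```

A namespace that never wrote its own pattern resolves to its nearest
ancestor's value; once it writes one, its own value wins and the
ancestors are left unchanged.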
---
 fs/coredump.c                 | 25 ++++++++++++++++------
 include/linux/pid_namespace.h |  3 +++
 kernel/pid.c                  |  2 ++
 kernel/pid_namespace.c        |  2 ++
 kernel/sysctl.c               | 50 ++++++++++++++++++++++++++++++++++++++-----
 5 files changed, 70 insertions(+), 12 deletions(-)

diff --git a/fs/coredump.c b/fs/coredump.c
index 41448bd..cf08c65 100644
--- a/fs/coredump.c
+++ b/fs/coredump.c
@@ -53,7 +53,6 @@
 
 int core_uses_pid;
 unsigned int core_pipe_limit;
-char core_pattern[CORENAME_MAX_SIZE] = "core";
 static int core_name_size = CORENAME_MAX_SIZE;
 
 struct core_name {
@@ -61,8 +60,6 @@ struct core_name {
 	int used, size;
 };
 
-/* The maximal length of core_pattern is also specified in sysctl.c */
-
 static int expand_corename(struct core_name *cn, int size)
 {
 	char *corename = krealloc(cn->corename, size, GFP_KERNEL);
@@ -187,10 +184,10 @@ static int cn_print_exe_file(struct core_name *cn)
  * name into corename, which must have space for at least
  * CORENAME_MAX_SIZE bytes plus one byte for the zero terminator.
  */
-static int format_corename(struct core_name *cn, struct coredump_params *cprm)
+static int format_corename(struct core_name *cn, const char *pat_ptr,
+			   struct coredump_params *cprm)
 {
 	const struct cred *cred = current_cred();
-	const char *pat_ptr = core_pattern;
 	int ispipe = (*pat_ptr == '|');
 	int pid_in_pattern = 0;
 	int err = 0;
@@ -669,6 +666,8 @@ void do_coredump(const siginfo_t *siginfo)
 		 */
 		.mm_flags = mm->flags,
 	};
+	struct pid_namespace *pid_ns;
+	char core_pattern[CORENAME_MAX_SIZE];
 
 	audit_core_dumps(siginfo->si_signo);
 
@@ -678,6 +677,18 @@ void do_coredump(const siginfo_t *siginfo)
 	if (!__get_dumpable(cprm.mm_flags))
 		goto fail;
 
+	pid_ns = task_active_pid_ns(current);
+	spin_lock(&pid_ns->core_pattern_lock);
+	while (pid_ns != &init_pid_ns) {
+		if (pid_ns->core_pattern[0])
+			break;
+		spin_unlock(&pid_ns->core_pattern_lock);
+		pid_ns = pid_ns->parent;
+		spin_lock(&pid_ns->core_pattern_lock);
+	}
+	strcpy(core_pattern, pid_ns->core_pattern);
+	spin_unlock(&pid_ns->core_pattern_lock);
+
 	cred = prepare_creds();
 	if (!cred)
 		goto fail;
@@ -699,7 +710,7 @@ void do_coredump(const siginfo_t *siginfo)
 
 	old_cred = override_creds(cred);
 
-	ispipe = format_corename(&cn, &cprm);
+	ispipe = format_corename(&cn, core_pattern, &cprm);
 
 	if (ispipe) {
 		int dump_count;
@@ -746,7 +757,7 @@ void do_coredump(const siginfo_t *siginfo)
 		}
 
 		rcu_read_lock();
-		vinit_task = find_task_by_vpid(1);
+		vinit_task = find_task_by_pid_ns(1, pid_ns);
 		rcu_read_unlock();
 		if (!vinit_task) {
 			printk(KERN_WARNING "failed getting init task info, skipping core dump\n");
diff --git a/include/linux/pid_namespace.h b/include/linux/pid_namespace.h
index c78af60..a384b4a 100644
--- a/include/linux/pid_namespace.h
+++ b/include/linux/pid_namespace.h
@@ -10,6 +10,7 @@
 #include <linux/nsproxy.h>
 #include <linux/kref.h>
 #include <linux/ns_common.h>
+#include <linux/binfmts.h>
 
 struct pidmap {
        atomic_t nr_free;
@@ -53,6 +54,8 @@ struct pid_namespace {
 	int hide_pid;
 	int reboot;	/* group exit code if this pidns was rebooted */
 	struct ns_common ns;
+	spinlock_t core_pattern_lock;
+	char core_pattern[CORENAME_MAX_SIZE];
 } __randomize_layout;
 
 extern struct pid_namespace init_pid_ns;
diff --git a/kernel/pid.c b/kernel/pid.c
index 020dedb..32e1cff 100644
--- a/kernel/pid.c
+++ b/kernel/pid.c
@@ -82,6 +82,8 @@ struct pid_namespace init_pid_ns = {
 #ifdef CONFIG_PID_NS
 	.ns.ops = &pidns_operations,
 #endif
+	.core_pattern_lock = __SPIN_LOCK_UNLOCKED(init_pid_ns.core_pattern_lock),
+	.core_pattern = "core",
 };
 EXPORT_SYMBOL_GPL(init_pid_ns);
 
diff --git a/kernel/pid_namespace.c b/kernel/pid_namespace.c
index 4918314..a3d18c2 100644
--- a/kernel/pid_namespace.c
+++ b/kernel/pid_namespace.c
@@ -144,6 +144,8 @@ static struct pid_namespace *create_pid_namespace(struct user_namespace *user_ns
 	for (i = 1; i < PIDMAP_ENTRIES; i++)
 		atomic_set(&ns->pidmap[i].nr_free, BITS_PER_PAGE);
 
+	spin_lock_init(&ns->core_pattern_lock);
+
 	return ns;
 
 out_free_map:
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 9576bd5..d091212 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -500,7 +500,7 @@ static struct ctl_table kern_table[] = {
 	},
 	{
 		.procname	= "core_pattern",
-		.data		= core_pattern,
+		.data		= NULL,
 		.maxlen		= CORENAME_MAX_SIZE,
 		.mode		= 0644,
 		.proc_handler	= proc_dostring_coredump,
@@ -2624,6 +2624,12 @@ int proc_douintvec_minmax(struct ctl_table *table, int write,
 static void validate_coredump_safety(void)
 {
 #ifdef CONFIG_COREDUMP
+	struct pid_namespace *pid_ns = task_active_pid_ns(current);
+	const char *core_pattern;
+
+	spin_lock(&pid_ns->core_pattern_lock);
+	core_pattern = pid_ns->core_pattern;
+
 	if (suid_dumpable == SUID_DUMP_ROOT &&
 	    core_pattern[0] != '/' && core_pattern[0] != '|') {
 		printk(KERN_WARNING
@@ -2632,6 +2638,8 @@ static void validate_coredump_safety(void)
 "Set kernel.core_pattern before fs.suid_dumpable.\n"
 		);
 	}
+
+	spin_unlock(&pid_ns->core_pattern_lock);
 #endif
 }
 
@@ -2648,10 +2656,42 @@ static int proc_dointvec_minmax_coredump(struct ctl_table *table, int write,
 static int proc_dostring_coredump(struct ctl_table *table, int write,
 		  void __user *buffer, size_t *lenp, loff_t *ppos)
 {
-	int error = proc_dostring(table, write, buffer, lenp, ppos);
-	if (!error)
-		validate_coredump_safety();
-	return error;
+	int ret;
+	char core_pattern[CORENAME_MAX_SIZE];
+	struct pid_namespace *pid_ns = task_active_pid_ns(current);
+
+	if (write) {
+		if (*ppos && sysctl_writes_strict == SYSCTL_WRITES_WARN)
+			warn_sysctl_write(table);
+
+		ret = _proc_do_string(core_pattern, table->maxlen, write,
+				      (char __user *)buffer, lenp, ppos);
+		if (ret)
+			return ret;
+
+		spin_lock(&pid_ns->core_pattern_lock);
+		strcpy(pid_ns->core_pattern, core_pattern);
+		spin_unlock(&pid_ns->core_pattern_lock);
+	} else {
+		spin_lock(&pid_ns->core_pattern_lock);
+		while (pid_ns != &init_pid_ns) {
+			if (pid_ns->core_pattern[0])
+				break;
+			spin_unlock(&pid_ns->core_pattern_lock);
+			pid_ns = pid_ns->parent;
+			spin_lock(&pid_ns->core_pattern_lock);
+		}
+		strcpy(core_pattern, pid_ns->core_pattern);
+		spin_unlock(&pid_ns->core_pattern_lock);
+
+		ret = _proc_do_string(core_pattern, table->maxlen, write,
+				      (char __user *)buffer, lenp, ppos);
+		if (ret)
+			return ret;
+	}
+
+	validate_coredump_safety();
+	return 0;
 }
 #endif
 
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH_v4.1 3/3] Make core_pattern support namespace
  2017-11-22  3:24     ` Cao Shufeng
                       ` (3 preceding siblings ...)
  (?)
@ 2017-11-22  3:24     ` Cao Shufeng
  -1 siblings, 0 replies; 24+ messages in thread
From: Cao Shufeng @ 2017-11-22  3:24 UTC (permalink / raw)
  To: linux-kernel
  Cc: containers, ebiederm, mguzik, fnstml-container, kamezawa.hiroyu,
	stgraber, avagin, zhaolei, mashimiao.fnst, caosf.fnst

Currently, each container shared one copy of coredump setting
with the host system, if host system changed the setting, each
running containers will be affected.
Same story happened when container changed core_pattern, both
host and other container will be affected.

For container based on namespace design, it is good to allow
each container keeping their own coredump setting.

It will bring us following benefit:
1: Each container can change their own coredump setting
   based on operation on /proc/sys/kernel/core_pattern
2: Coredump setting changed in host will not affect
   running containers.
3: Support both case of "putting coredump in guest" and
   "putting curedump in host".

Each namespace-based software(lxc, docker, ..) can use this function
to custom their dump setting.

And this function makes each continer working as separate system,
it fit for design goal of namespace.

Test(in lxc):
 # In the host
 # ----------------
 # echo host_core >/proc/sys/kernel/core_pattern
 # cat /proc/sys/kernel/core_pattern
 host_core
 # ulimit -c 1024000
 # ./make_dump
 Segmentation fault (core dumped)
 # ls -l
 -rw------- 1 root root 331776 Feb  4 18:02 host_core.2175
 -rwxr-xr-x 1 root root 759731 Feb  4 18:01 make_dump
 #

 # In the container
 # ----------------
 # cat /proc/sys/kernel/core_pattern
 host_core
 # echo container_core >/proc/sys/kernel/core_pattern
 # ./make_dump
 Segmentation fault (core dumped)
 # ls -l
 -rwxr-xr-x    1 root     root       759731 Feb  4 10:45 make_dump
 -rw-------    1 root     root       331776 Feb  4 10:45 container_core.16
 #

 # Return to host
 # ----------------
 # cat /proc/sys/kernel/core_pattern
 host_core
 # ls
 host_core.2175  make_dump  make_dump.c
 # rm -f host_core.2175
 # ./make_dump
 Segmentation fault (core dumped)
 # ls -l
 -rw------- 1 root root 331776 Feb  4 18:49 host_core.2351
 -rwxr-xr-x 1 root root 759731 Feb  4 18:01 make_dump
 #
---
 fs/coredump.c                 | 25 ++++++++++++++++------
 include/linux/pid_namespace.h |  3 +++
 kernel/pid.c                  |  2 ++
 kernel/pid_namespace.c        |  2 ++
 kernel/sysctl.c               | 50 ++++++++++++++++++++++++++++++++++++++-----
 5 files changed, 70 insertions(+), 12 deletions(-)

diff --git a/fs/coredump.c b/fs/coredump.c
index 41448bd..cf08c65 100644
--- a/fs/coredump.c
+++ b/fs/coredump.c
@@ -53,7 +53,6 @@
 
 int core_uses_pid;
 unsigned int core_pipe_limit;
-char core_pattern[CORENAME_MAX_SIZE] = "core";
 static int core_name_size = CORENAME_MAX_SIZE;
 
 struct core_name {
@@ -61,8 +60,6 @@ struct core_name {
 	int used, size;
 };
 
-/* The maximal length of core_pattern is also specified in sysctl.c */
-
 static int expand_corename(struct core_name *cn, int size)
 {
 	char *corename = krealloc(cn->corename, size, GFP_KERNEL);
@@ -187,10 +184,10 @@ static int cn_print_exe_file(struct core_name *cn)
  * name into corename, which must have space for at least
  * CORENAME_MAX_SIZE bytes plus one byte for the zero terminator.
  */
-static int format_corename(struct core_name *cn, struct coredump_params *cprm)
+static int format_corename(struct core_name *cn, const char *pat_ptr,
+			   struct coredump_params *cprm)
 {
 	const struct cred *cred = current_cred();
-	const char *pat_ptr = core_pattern;
 	int ispipe = (*pat_ptr == '|');
 	int pid_in_pattern = 0;
 	int err = 0;
@@ -669,6 +666,8 @@ void do_coredump(const siginfo_t *siginfo)
 		 */
 		.mm_flags = mm->flags,
 	};
+	struct pid_namespace *pid_ns;
+	char core_pattern[CORENAME_MAX_SIZE];
 
 	audit_core_dumps(siginfo->si_signo);
 
@@ -678,6 +677,18 @@ void do_coredump(const siginfo_t *siginfo)
 	if (!__get_dumpable(cprm.mm_flags))
 		goto fail;
 
+	pid_ns = task_active_pid_ns(current);
+	spin_lock(&pid_ns->core_pattern_lock);
+	while (pid_ns != &init_pid_ns) {
+		if (pid_ns->core_pattern[0])
+			break;
+		spin_unlock(&pid_ns->core_pattern_lock);
+		pid_ns = pid_ns->parent,
+		spin_lock(&pid_ns->core_pattern_lock);
+	}
+	strcpy(core_pattern, pid_ns->core_pattern);
+	spin_unlock(&pid_ns->core_pattern_lock);
+
 	cred = prepare_creds();
 	if (!cred)
 		goto fail;
@@ -699,7 +710,7 @@ void do_coredump(const siginfo_t *siginfo)
 
 	old_cred = override_creds(cred);
 
-	ispipe = format_corename(&cn, &cprm);
+	ispipe = format_corename(&cn, core_pattern, &cprm);
 
 	if (ispipe) {
 		int dump_count;
@@ -746,7 +757,7 @@ void do_coredump(const siginfo_t *siginfo)
 		}
 
 		rcu_read_lock();
-		vinit_task = find_task_by_vpid(1);
+		vinit_task = find_task_by_pid_ns(1, pid_ns);
 		rcu_read_unlock();
 		if (!vinit_task) {
 			printk(KERN_WARNING "failed getting init task info, skipping core dump\n");
diff --git a/include/linux/pid_namespace.h b/include/linux/pid_namespace.h
index c78af60..a384b4a 100644
--- a/include/linux/pid_namespace.h
+++ b/include/linux/pid_namespace.h
@@ -10,6 +10,7 @@
 #include <linux/nsproxy.h>
 #include <linux/kref.h>
 #include <linux/ns_common.h>
+#include <linux/binfmts.h>
 
 struct pidmap {
        atomic_t nr_free;
@@ -53,6 +54,8 @@ struct pid_namespace {
 	int hide_pid;
 	int reboot;	/* group exit code if this pidns was rebooted */
 	struct ns_common ns;
+	spinlock_t core_pattern_lock;
+	char core_pattern[CORENAME_MAX_SIZE];
 } __randomize_layout;
 
 extern struct pid_namespace init_pid_ns;
diff --git a/kernel/pid.c b/kernel/pid.c
index 020dedb..32e1cff 100644
--- a/kernel/pid.c
+++ b/kernel/pid.c
@@ -82,6 +82,8 @@ struct pid_namespace init_pid_ns = {
 #ifdef CONFIG_PID_NS
 	.ns.ops = &pidns_operations,
 #endif
+	.core_pattern_lock = __SPIN_LOCK_UNLOCKED(init_pid_ns.core_pattern_lock),
+	.core_pattern = "core",
 };
 EXPORT_SYMBOL_GPL(init_pid_ns);
 
diff --git a/kernel/pid_namespace.c b/kernel/pid_namespace.c
index 4918314..a3d18c2 100644
--- a/kernel/pid_namespace.c
+++ b/kernel/pid_namespace.c
@@ -144,6 +144,8 @@ static struct pid_namespace *create_pid_namespace(struct user_namespace *user_ns
 	for (i = 1; i < PIDMAP_ENTRIES; i++)
 		atomic_set(&ns->pidmap[i].nr_free, BITS_PER_PAGE);
 
+	spin_lock_init(&ns->core_pattern_lock);
+
 	return ns;
 
 out_free_map:
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 9576bd5..d091212 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -500,7 +500,7 @@ static struct ctl_table kern_table[] = {
 	},
 	{
 		.procname	= "core_pattern",
-		.data		= core_pattern,
+		.data		= NULL,
 		.maxlen		= CORENAME_MAX_SIZE,
 		.mode		= 0644,
 		.proc_handler	= proc_dostring_coredump,
@@ -2624,6 +2624,12 @@ int proc_douintvec_minmax(struct ctl_table *table, int write,
 static void validate_coredump_safety(void)
 {
 #ifdef CONFIG_COREDUMP
+	struct pid_namespace *pid_ns = task_active_pid_ns(current);
+	const char *core_pattern;
+
+	spin_lock(&pid_ns->core_pattern_lock);
+	core_pattern = pid_ns->core_pattern;
+
 	if (suid_dumpable == SUID_DUMP_ROOT &&
 	    core_pattern[0] != '/' && core_pattern[0] != '|') {
 		printk(KERN_WARNING
@@ -2632,6 +2638,8 @@ static void validate_coredump_safety(void)
 "Set kernel.core_pattern before fs.suid_dumpable.\n"
 		);
 	}
+
+	spin_unlock(&pid_ns->core_pattern_lock);
 #endif
 }
 
@@ -2648,10 +2656,42 @@ static int proc_dointvec_minmax_coredump(struct ctl_table *table, int write,
 static int proc_dostring_coredump(struct ctl_table *table, int write,
 		  void __user *buffer, size_t *lenp, loff_t *ppos)
 {
-	int error = proc_dostring(table, write, buffer, lenp, ppos);
-	if (!error)
-		validate_coredump_safety();
-	return error;
+	int ret;
+	char core_pattern[CORENAME_MAX_SIZE];
+	struct pid_namespace *pid_ns = task_active_pid_ns(current);
+
+	if (write) {
+		if (*ppos && sysctl_writes_strict == SYSCTL_WRITES_WARN)
+			warn_sysctl_write(table);
+
+		ret = _proc_do_string(core_pattern, table->maxlen, write,
+				      (char __user *)buffer, lenp, ppos);
+		if (ret)
+			return ret;
+
+		spin_lock(&pid_ns->core_pattern_lock);
+		strcpy(pid_ns->core_pattern, core_pattern);
+		spin_unlock(&pid_ns->core_pattern_lock);
+	} else {
+		spin_lock(&pid_ns->core_pattern_lock);
+		while (pid_ns != &init_pid_ns) {
+			if (pid_ns->core_pattern[0])
+				break;
+			spin_unlock(&pid_ns->core_pattern_lock);
+			pid_ns = pid_ns->parent;
+			spin_lock(&pid_ns->core_pattern_lock);
+		}
+		strcpy(core_pattern, pid_ns->core_pattern);
+		spin_unlock(&pid_ns->core_pattern_lock);
+
+		ret = _proc_do_string(core_pattern, table->maxlen, write,
+				      (char __user *)buffer, lenp, ppos);
+		if (ret)
+			return ret;
+	}
+
+	validate_coredump_safety();
+	return 0;
 }
 #endif
 
-- 
2.1.0

* [PATCH 0/3] Make core_pattern support namespace
@ 2017-08-02  6:37 Cao Shufeng
From: Cao Shufeng @ 2017-08-02  6:37 UTC
  To: linux-kernel
  Cc: containers, caosf.fnst, mashimiao.fnst, ebiederm

This patchset covers the following points:
1: Allow the usermodehelper functions to set the pid namespace
   done by: [PATCH_v4.1_1/3] Make call_usermodehelper_exec possible
   to set namespaces
2: Let a pipe-type core_pattern write the dump into the container's rootfs
   done by: [PATCH_v4.1_2/3] Limit dump_pipe program's permission to
   init for container
3: Give each container its own core_pattern setting
   done by: [PATCH_v4.1_3/3] Make core_pattern support namespace
4: Stay compatible with the current system
   also included in: [PATCH_v4.1_3/3] Make core_pattern support namespace
   If a container has not changed its core_pattern setting, it keeps
   the same setting as the host.
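
The compatibility rule in point 4 can be pictured with a small userspace
model. This is only a sketch, not the kernel code: struct ns_model and
effective_core_pattern() are hypothetical names invented for illustration,
and locking is left out. It assumes, as the read path in the kernel/sysctl.c
hunk of patch 3/3 does, that an empty pattern means "not set here" and that
the lookup walks up the parent chain until it finds a non-empty pattern or
reaches the initial namespace.

```c
#include <stddef.h>

#define CORENAME_MAX_SIZE 128

/* Hypothetical userspace stand-in for the per-pid-namespace state. */
struct ns_model {
	struct ns_model *parent;	/* NULL for the initial (host) namespace */
	char core_pattern[CORENAME_MAX_SIZE];	/* "" means "not set here" */
};

/*
 * Return the pattern that applies to @ns: the first non-empty pattern
 * found while walking towards the root. The walk stops at the initial
 * namespace even if its pattern is empty.
 */
static const char *effective_core_pattern(const struct ns_model *ns)
{
	while (ns->parent != NULL && ns->core_pattern[0] == '\0')
		ns = ns->parent;
	return ns->core_pattern;
}
```

With a host whose pattern is "core" and a container that never wrote its
own value, effective_core_pattern() on the container yields "core". Once
the container writes a pattern of its own, that value is returned for the
container and for any nested namespace that has not set one.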

Test:
1: Passes a test script covering each function of this patchset
   ## TEST IN HOST ##
   [root@kerneldev dumptest]# ./test_host
   Set file core_pattern: OK
   ./test_host: line 41:  2366 Segmentation fault      (core dumped) "$SCRIPT_BASE_DIR"/make_dump
   Checking dumpfile: OK
   Set file core_pattern: OK
   ./test_host: line 41:  2369 Segmentation fault      (core dumped) "$SCRIPT_BASE_DIR"/make_dump
   Checking dump_pipe triggered: OK
   Checking rootfs: OK
   Checking dumpfile: OK
   Checking namespace: OK
   Checking process list: OK
   Checking capabilities: OK

   ## TEST IN GUEST ##
   # ./test
   Segmentation fault (core dumped)
   Checking dump_pipe triggered: OK
   Checking rootfs: OK
   Checking dumpfile: OK
   Checking namespace: OK
   Checking process list: OK
   Checking cg pids: OK
   Checking capabilities: OK
   [   64.940734] make_dump[2432]: segfault at 0 ip 000000000040049d sp 00007ffc4af025f0 error 6 in make_dump[400000+a6000]
   #
2: Passes other tests (which are not easy to script) by hand.

Changelog v3.1->v4:
1: Remove the extra fork, pointed out by:
   Andrei Vagin <avagin@gmail.com>
2: Rebase on top of v4.9-rc8.
3: Rebase on top of v4.12.

Changelog v3->v3.1:
1: Switch "pwd" of the pipe program to the container's root fs.
2: Rebase on top of v4.9-rc1.

Changelog v2->v3:
1: Fix the problem with setting the pid namespace, pointed out by:
   Andrei Vagin <avagin@gmail.com>

Changelog v1(RFC)->v2:
1: Add [PATCH 2/2], which was a TODO in [RFC v1].
2: Pass a test script for each function.
3: Rebase on top of v4.7.

Suggested-by: Eric W. Biederman <ebiederm@xmission.com>
Suggested-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Cao Shufeng <caosf.fnst@cn.fujitsu.com>

Cao Shufeng (3):
  Make call_usermodehelper_exec possible to set namespaces
  Limit dump_pipe program's permission to init for container
  Make core_pattern support namespace

 fs/coredump.c                 | 150 +++++++++++++++++++++++++++++++++++++++---
 include/linux/binfmts.h       |   2 +
 include/linux/kmod.h          |   5 ++
 include/linux/pid_namespace.h |   3 +
 init/do_mounts_initrd.c       |   3 +-
 kernel/kmod.c                 |  56 +++++++++++++---
 kernel/pid.c                  |   2 +
 kernel/pid_namespace.c        |   2 +
 kernel/sysctl.c               |  50 ++++++++++++--
 lib/kobject_uevent.c          |   3 +-
 security/keys/request_key.c   |   4 +-
 11 files changed, 253 insertions(+), 27 deletions(-)

-- 
2.9.3


* [PATCH 0/3] Make core_pattern support namespace
@ 2016-12-06 11:06 Cao Shufeng
From: Cao Shufeng @ 2016-12-06 11:06 UTC
  To: linux-kernel
  Cc: containers, ebiederm, mguzik, kamezawa.hiroyu, stgraber, avagin,
	zhaolei, mashimiao.fnst, caosf.fnst

This patchset covers the following points:
1: Allow the usermodehelper functions to set the pid namespace
   done by: [PATCH v4 1/3] Make call_usermodehelper_exec possible
   to set pid namespace.
2: Let a pipe-type core_pattern write the dump into the container's rootfs
   done by: [PATCH v4 2/3] Limit dump_pipe program's permission to
   init for container.
3: Give each container its own core_pattern setting
   done by: [PATCH v4 3/3] Make core_pattern support namespace
4: Stay compatible with the current system
   also included in: [PATCH v4 3/3] Make core_pattern support namespace
   If a container has not changed its core_pattern setting, it keeps
   the same setting as the host.
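
For readers unfamiliar with the "pipe_type" wording above: the first
character of core_pattern decides where a dump goes, and the check in
validate_coredump_safety() in kernel/sysctl.c tests exactly that character
('/' or '|'). The sketch below models the distinction in userspace. The
enum and both function names are hypothetical and exist only for
illustration; the sketch assumes nothing beyond that first-character rule.

```c
#include <stdbool.h>

/* Hypothetical classification of a core_pattern by its first character. */
enum core_target {
	CORE_TO_PIPE,		/* "|/path/to/helper ..." : piped to a program */
	CORE_TO_ABS_FILE,	/* "/var/crash/core"      : absolute file path */
	CORE_TO_REL_FILE	/* "core"                 : relative to the cwd */
};

static enum core_target classify_core_pattern(const char *pattern)
{
	if (pattern[0] == '|')
		return CORE_TO_PIPE;
	if (pattern[0] == '/')
		return CORE_TO_ABS_FILE;
	return CORE_TO_REL_FILE;	/* also covers the empty string */
}

/*
 * Mirrors the condition in validate_coredump_safety(): when suid dumps
 * are written as root, only a pipe or an absolute path is considered safe.
 */
static bool pattern_safe_for_suid_dump(const char *pattern)
{
	return classify_core_pattern(pattern) != CORE_TO_REL_FILE;
}
```

Note that an empty per-namespace pattern classifies as relative here, so a
caller that wants the inherited host value has to resolve it before running
this check.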

Test:
1: Passes a test script covering each function of this patchset
   ## TEST IN HOST ##
   [root@kerneldev dumptest]# ./test_host
   Set file core_pattern: OK
   ./test_host: line 41:  2366 Segmentation fault      (core dumped) "$SCRIPT_BASE_DIR"/make_dump
   Checking dumpfile: OK
   Set file core_pattern: OK
   ./test_host: line 41:  2369 Segmentation fault      (core dumped) "$SCRIPT_BASE_DIR"/make_dump
   Checking dump_pipe triggered: OK
   Checking rootfs: OK
   Checking dumpfile: OK
   Checking namespace: OK
   Checking process list: OK
   Checking capabilities: OK

   ## TEST IN GUEST ##
   # ./test
   Segmentation fault (core dumped)
   Checking dump_pipe triggered: OK
   Checking rootfs: OK
   Checking dumpfile: OK
   Checking namespace: OK
   Checking process list: OK
   Checking cg pids: OK
   Checking capabilities: OK
   [   64.940734] make_dump[2432]: segfault at 0 ip 000000000040049d sp 00007ffc4af025f0 error 6 in make_dump[400000+a6000]
   #
2: Passes other tests (which are not easy to script) by hand.

Changelog v3.1->v4:
1: Remove the extra fork, pointed out by:
   Andrei Vagin <avagin@gmail.com>
2: Rebase on top of v4.9-rc8.

Changelog v3->v3.1:
1: Switch "pwd" of the pipe program to the container's root fs.
2: Rebase on top of v4.9-rc1.

Changelog v2->v3:
1: Fix the problem with setting the pid namespace, pointed out by:
   Andrei Vagin <avagin@gmail.com>

Changelog v1(RFC)->v2:
1: Add [PATCH 2/2], which was a TODO in [RFC v1].
2: Pass a test script for each function.
3: Rebase on top of v4.7.

Suggested-by: Eric W. Biederman <ebiederm@xmission.com>
Suggested-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
Signed-off-by: Cao Shufeng <caosf.fnst@cn.fujitsu.com>

Cao Shufeng (2):
  Make call_usermodehelper_exec possible to set namespaces
  Limit dump_pipe program's permission to init for container

Zhao Lei (1):
  Make core_pattern support namespace

 fs/coredump.c                 | 150 +++++++++++++++++++++++++++++++++++++++---
 include/linux/binfmts.h       |   2 +
 include/linux/kmod.h          |   4 ++
 include/linux/pid_namespace.h |   3 +
 init/do_mounts_initrd.c       |   3 +-
 kernel/kmod.c                 |  43 +++++++++---
 kernel/pid.c                  |   2 +
 kernel/pid_namespace.c        |   2 +
 kernel/sysctl.c               |  50 ++++++++++++--
 lib/kobject_uevent.c          |   3 +-
 security/keys/request_key.c   |   4 +-
 11 files changed, 241 insertions(+), 25 deletions(-)

-- 
2.7.4


Thread overview: 24+ messages
2017-08-02  6:37 [PATCH 0/3] Make core_pattern support namespace Cao Shufeng
2017-08-02  6:37 ` [PATCH_v4.1_1/3] Make call_usermodehelper_exec possible to set namespaces Cao Shufeng
     [not found] ` <1501655849-9149-1-git-send-email-caosf.fnst-BthXqXjhjHXQFUHtdCDX3A@public.gmane.org>
2017-08-02  6:37   ` Cao Shufeng
2017-08-02  6:37   ` [PATCH_v4.1_2/3] Limit dump_pipe program's permission to init for container Cao Shufeng
2017-08-02  6:37   ` [PATCH_v4.1_3/3] Make core_pattern support namespace Cao Shufeng
2017-11-02  5:41   ` [PATCH 0/3] " 曹树烽
2017-11-02  5:41     ` 曹树烽
2017-11-22  3:24   ` [PATCH_v4.1 " Cao Shufeng
2017-11-22  3:24     ` Cao Shufeng
2017-11-22  3:24     ` [PATCH_v4.1 1/3] Make call_usermodehelper_exec possible to set namespaces Cao Shufeng
     [not found]     ` <1511321058-6089-1-git-send-email-caosf.fnst-BthXqXjhjHXQFUHtdCDX3A@public.gmane.org>
2017-11-22  3:24       ` Cao Shufeng
2017-11-22  3:24       ` [PATCH_v4.1 2/3] Limit dump_pipe program's permission to init for container Cao Shufeng
2017-11-22  3:24       ` [PATCH_v4.1 3/3] Make core_pattern support namespace Cao Shufeng
2017-11-22  3:24     ` [PATCH_v4.1 2/3] Limit dump_pipe program's permission to init for container Cao Shufeng
2017-11-22  3:24     ` [PATCH_v4.1 3/3] Make core_pattern support namespace Cao Shufeng
2017-08-02  6:37 ` [PATCH_v4.1_2/3] Limit dump_pipe program's permission to init for container Cao Shufeng
2017-08-02  6:37 ` [PATCH_v4.1_3/3] Make core_pattern support namespace Cao Shufeng
     [not found]   ` <1501655849-9149-4-git-send-email-caosf.fnst-BthXqXjhjHXQFUHtdCDX3A@public.gmane.org>
2017-08-02  7:07     ` Aleksa Sarai
2017-08-02  7:07   ` Aleksa Sarai
     [not found]     ` <8bb63f0a-d0b7-edf7-6dca-4d12641074b4-l3A5Bk7waGM@public.gmane.org>
2017-11-22  3:07       ` 曹树烽
2017-11-22  3:07         ` 曹树烽
  -- strict thread matches above, loose matches on Subject: below --
2017-08-02  6:37 [PATCH 0/3] " Cao Shufeng
2016-12-06 11:06 Cao Shufeng
2016-12-06 11:06 ` Cao Shufeng
