All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/4] coredump: core dump masking support v4
@ 2007-03-02  4:41 Kawai, Hidehiro
  2007-03-02  4:47 ` [PATCH 1/4] coredump: add an interface to control the core dump routine Kawai, Hidehiro
                   ` (4 more replies)
  0 siblings, 5 replies; 20+ messages in thread
From: Kawai, Hidehiro @ 2007-03-02  4:41 UTC (permalink / raw)
  To: Andrew Morton, kernel list
  Cc: Pavel Machek, Robin Holt, David Howells, Alan Cox,
	Masami Hiramatsu, sugita, Satoshi OSHIMA, Hideo AOKI

Hi,

This patch series is version 4 of the core dump masking feature,
which provides a per-process flag not to dump anonymous shared
memory segments.

In the previous version, the flag value was passed around the core
dump functions as an argument to use the same setting while dumping.
In this version, instead of doing that, a r/w semaphore prevents the
setting from being changed while dumping.

This patch series can be applied against 2.6.20-mm2.
The supported core file formats are ELF and ELF-FDPIC. ELF has been
tested, but ELF-FDPIC has not been built and tested because I don't
have the test environment.


Background:
Some software programs share huge memory among hundreds of
processes. If a failure occurs on one of these processes, they can
be signaled by a monitoring process to generate core files and
restart the service. However, it can develop into a system-wide
failure such as system slow down for a long time and disk space
shortage because the total size of the core files is very huge!

To avoid the above situation we can limit the core file size by
setrlimit(2) or ulimit(1). But this method can lose important data
such as stack because core dumping is terminated halfway.
So I suggest keeping shared memory segments from being dumped for
particular processes. Because the shared memory attached to processes
is common in them, we don't need to dump the shared memory every time.


Usage:
Get all shared memory segments of pid 1234 not to dump:

  $ echo 1 > /proc/1234/coredump_omit_anonymous_shared

When a new process is created, the process inherits the flag status
from its parent. It is useful to set the core dump flags before the
program runs. For example:

  $ echo 1 > /proc/self/coredump_omit_anonymous_shared
  $ ./some_program


ChangeLog:
v4:
  - in maydump(), retrieve the core dump setting from mm_struct
    directly, instead of its additional argument
  - writing to /proc/<pid>/coredump_omit_anonymous_shared returns
    EBUSY while core dumping.

v3:
http://groups.google.com/group/linux.kernel/browse_frm/thread/706d2ae41c1cb2de/
  - remove `/proc/<pid>/core_flags' proc entry
  - add `/proc/<pid>/coredump_anonymous_shared' as a named flag
  - remove kernel.core_flags_enable sysctl parameter

v2:
http://groups.google.com/group/linux.kernel/browse_frm/thread/cb254465971d4a42/
http://groups.google.com/group/linux.kernel/browse_frm/thread/da78f2702e06fa11/
  - rename `coremask' to `core_flags'
  - change `core_flags' member in mm_struct to a bit field
    next to `dumpable'
  - introduce a global spin lock to protect adjacent two bit fields
    (core_flags and dumpable) from race condition
  - fix a bug that the generated core file can be corrupted when
    core dumping and updating core_flags occur concurrently
  - add kernel.core_flags_enable sysctl parameter to enable/disable
    flags in /proc/<pid>/core_flags
  - support ELF-FDPIC binary format, but not tested

v1:
http://groups.google.com/group/linux.kernel/browse_frm/thread/1381fc54d716e3e6/

-- 
Hidehiro Kawai
Hitachi, Ltd., Systems Development Laboratory
E-mail: hidehiro.kawai.ez@hitachi.com


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH 1/4] coredump: add an interface to control the core dump routine
  2007-03-02  4:41 [PATCH 0/4] coredump: core dump masking support v4 Kawai, Hidehiro
@ 2007-03-02  4:47 ` Kawai, Hidehiro
  2007-03-02  9:34   ` Pavel Machek
  2007-03-02  4:49 ` [PATCH 2/4] coredump: ELF: enable to omit anonymous shared memory Kawai, Hidehiro
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 20+ messages in thread
From: Kawai, Hidehiro @ 2007-03-02  4:47 UTC (permalink / raw)
  To: Andrew Morton, kernel list
  Cc: Kawai, Hidehiro, Pavel Machek, Robin Holt, David Howells,
	Alan Cox, Masami Hiramatsu, sugita, Satoshi OSHIMA, Hideo AOKI

This patch adds an interface to set/reset a flag which determines
anonymous shared memory segments should be dumped or not when a core
file is generated.

/proc/<pid>/coredump_omit_anonymous_shared file is provided to access
the flag. You can change the flag status for a particular process by
writing to or reading from the file.

The flag status is inherited to the child process when it is created.

The flag is stored into coredump_omit_anon_shared member of mm_struct,
which shares bytes with dumpable member because these two are adjacent
bit fields. In order to avoid write-write race between the two, we use
a global spin lock.

smp_wmb() at updating dumpable is removed because set_dumpable()
includes a pair of spin lock and unlock which has the effect of
memory barrier.

Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
---
 fs/exec.c               |   12 ++--
 fs/proc/base.c          |  103 ++++++++++++++++++++++++++++++++++++++
 include/linux/binfmts.h |    4 +
 include/linux/sched.h   |   33 ++++++++++++
 kernel/fork.c           |    3 +
 kernel/sys.c            |   62 +++++++---------------
 security/commoncap.c    |    2 
 security/dummy.c        |    2 
 8 files changed, 174 insertions(+), 47 deletions(-)

Index: linux-2.6.20-mm2/fs/proc/base.c
===================================================================
--- linux-2.6.20-mm2.orig/fs/proc/base.c
+++ linux-2.6.20-mm2/fs/proc/base.c
@@ -74,6 +74,7 @@
 #include <linux/poll.h>
 #include <linux/nsproxy.h>
 #include <linux/oom.h>
+#include <linux/elf.h>
 #include "internal.h"
 
 /* NOTE:
@@ -1753,6 +1754,104 @@ static const struct inode_operations pro
 
 #endif
 
+#if defined(USE_ELF_CORE_DUMP) && defined(CONFIG_ELF_CORE)
+static ssize_t proc_coredump_omit_anon_shared_read(struct file *file,
+						   char __user *buf,
+						   size_t count,
+						   loff_t *ppos)
+{
+	struct task_struct *task = get_proc_task(file->f_dentry->d_inode);
+	struct mm_struct *mm;
+	char buffer[PROC_NUMBUF];
+	size_t len;
+	loff_t __ppos = *ppos;
+	int ret;
+
+	ret = -ESRCH;
+	if (!task)
+		goto out_no_task;
+
+	ret = 0;
+	mm = get_task_mm(task);
+	if (!mm)
+		goto out_no_mm;
+
+	len = snprintf(buffer, sizeof(buffer), "%u\n",
+		       mm->coredump_omit_anon_shared);
+	if (__ppos >= len)
+		goto out;
+	if (count > len - __ppos)
+		count = len - __ppos;
+
+	ret = -EFAULT;
+	if (copy_to_user(buf, buffer + __ppos, count))
+		goto out;
+
+	ret = count;
+	*ppos = __ppos + count;
+
+ out:
+	mmput(mm);
+ out_no_mm:
+	put_task_struct(task);
+ out_no_task:
+	return ret;
+}
+
+static ssize_t proc_coredump_omit_anon_shared_write(struct file *file,
+						    const char __user *buf,
+						    size_t count,
+						    loff_t *ppos)
+{
+	struct task_struct *task;
+	struct mm_struct *mm;
+	char buffer[PROC_NUMBUF], *end;
+	unsigned int val;
+	int ret;
+
+	ret = -EFAULT;
+	memset(buffer, 0, sizeof(buffer));
+	if (count > sizeof(buffer) - 1)
+		count = sizeof(buffer) - 1;
+	if (copy_from_user(buffer, buf, count))
+		goto out_no_task;
+
+	ret = -EINVAL;
+	val = (unsigned int)simple_strtoul(buffer, &end, 0);
+	if (*end == '\n')
+		end++;
+	if (end - buffer == 0)
+		goto out_no_task;
+
+	ret = -ESRCH;
+	task = get_proc_task(file->f_dentry->d_inode);
+	if (!task)
+		goto out_no_task;
+
+	ret = end - buffer;
+	mm = get_task_mm(task);
+	if (!mm)
+		goto out_no_mm;
+
+	if (down_write_trylock(&coredump_settings_sem)) {
+		set_coredump_omit_anon_shared(mm, (val != 0));
+		up_write(&coredump_settings_sem);
+	} else
+		ret = -EBUSY;
+
+	mmput(mm);
+ out_no_mm:
+	put_task_struct(task);
+ out_no_task:
+	return ret;
+}
+
+static struct file_operations proc_coredump_omit_anon_shared_operations = {
+	.read		= proc_coredump_omit_anon_shared_read,
+	.write		= proc_coredump_omit_anon_shared_write,
+};
+#endif
+
 /*
  * /proc/self:
  */
@@ -1972,6 +2071,10 @@ static struct pid_entry tgid_base_stuff[
 #ifdef CONFIG_FAULT_INJECTION
 	REG("make-it-fail", S_IRUGO|S_IWUSR, fault_inject),
 #endif
+#if defined(USE_ELF_CORE_DUMP) && defined(CONFIG_ELF_CORE)
+	REG("coredump_omit_anonymous_shared", S_IRUGO|S_IWUSR,
+	    coredump_omit_anon_shared),
+#endif
 #ifdef CONFIG_TASK_IO_ACCOUNTING
 	INF("io",	S_IRUGO, pid_io_accounting),
 #endif
Index: linux-2.6.20-mm2/include/linux/sched.h
===================================================================
--- linux-2.6.20-mm2.orig/include/linux/sched.h
+++ linux-2.6.20-mm2/include/linux/sched.h
@@ -366,7 +366,13 @@ struct mm_struct {
 	unsigned int token_priority;
 	unsigned int last_interval;
 
+	/*
+	 * Writing to these bit fields can cause race condition.  To avoid
+	 * the race, use dump_bits_lock.  You may also use set_dumpable() or
+	 * set_coredump_*() macros to set a value.
+	 */
 	unsigned char dumpable:2;
+	unsigned char coredump_omit_anon_shared:1;  /* don't dump anon shm */
 
 	/* coredumping support */
 	int core_waiters;
@@ -1727,6 +1733,33 @@ static inline void inc_syscw(struct task
 }
 #endif
 
+#include <linux/elf.h>
+/*
+ * These macros are used to protect dumpable and coredump_omit_anon_shared bit
+ * fields in mm_struct from write race between the two.
+ */
+extern spinlock_t dump_bits_lock;
+#define __set_dump_bits(dest, val) \
+	do {					\
+		spin_lock(&dump_bits_lock);	\
+		(dest) = (val);			\
+		spin_unlock(&dump_bits_lock);	\
+	} while (0)
+
+#if defined(USE_ELF_CORE_DUMP) && defined(CONFIG_ELF_CORE)
+# define set_dumpable(mm, val) \
+	__set_dump_bits((mm)->dumpable, val)
+# define set_coredump_omit_anon_shared(mm, val) \
+	__set_dump_bits((mm)->coredump_omit_anon_shared, val)
+#else
+# define set_dumpable(mm, val) \
+	do {				\
+		(mm)->dumpable = (val);	\
+		smp_wmb();		\
+	} while (0)
+# define set_coredump_omit_anon_shared(mm, val)
+#endif
+
 #endif /* __KERNEL__ */
 
 #endif
Index: linux-2.6.20-mm2/fs/exec.c
===================================================================
--- linux-2.6.20-mm2.orig/fs/exec.c
+++ linux-2.6.20-mm2/fs/exec.c
@@ -61,6 +61,8 @@
 int core_uses_pid;
 char core_pattern[128] = "core";
 int suid_dumpable = 0;
+DEFINE_SPINLOCK(dump_bits_lock);
+DECLARE_RWSEM(coredump_settings_sem);
 
 EXPORT_SYMBOL(suid_dumpable);
 /* The maximal length of core_pattern is also specified in sysctl.c */
@@ -853,9 +855,9 @@ int flush_old_exec(struct linux_binprm *
 	current->sas_ss_sp = current->sas_ss_size = 0;
 
 	if (current->euid == current->uid && current->egid == current->gid)
-		current->mm->dumpable = 1;
+		set_dumpable(current->mm, 1);
 	else
-		current->mm->dumpable = suid_dumpable;
+		set_dumpable(current->mm, suid_dumpable);
 
 	name = bprm->filename;
 
@@ -883,7 +885,7 @@ int flush_old_exec(struct linux_binprm *
 	    file_permission(bprm->file, MAY_READ) ||
 	    (bprm->interp_flags & BINPRM_FLAGS_ENFORCE_NONDUMP)) {
 		suid_keys(current);
-		current->mm->dumpable = suid_dumpable;
+		set_dumpable(current->mm, suid_dumpable);
 	}
 
 	/* An exec changes our domain. We are no longer part of the thread
@@ -1477,7 +1479,7 @@ int do_coredump(long signr, int exit_cod
 		flag = O_EXCL;		/* Stop rewrite attacks */
 		current->fsuid = 0;	/* Dump root private */
 	}
-	mm->dumpable = 0;
+	set_dumpable(mm, 0);
 
 	retval = coredump_wait(exit_code);
 	if (retval < 0)
@@ -1530,7 +1532,9 @@ int do_coredump(long signr, int exit_cod
 	if (!ispipe && do_truncate(file->f_path.dentry, 0, 0, file) != 0)
 		goto close_fail;
 
+	down_read(&coredump_settings_sem);
 	retval = binfmt->core_dump(signr, regs, file);
+	up_read(&coredump_settings_sem);
 
 	if (retval)
 		current->signal->group_exit_code |= 0x80;
Index: linux-2.6.20-mm2/kernel/fork.c
===================================================================
--- linux-2.6.20-mm2.orig/kernel/fork.c
+++ linux-2.6.20-mm2/kernel/fork.c
@@ -334,6 +334,9 @@ static struct mm_struct * mm_init(struct
 	atomic_set(&mm->mm_count, 1);
 	init_rwsem(&mm->mmap_sem);
 	INIT_LIST_HEAD(&mm->mmlist);
+	/* don't need to use set_coredump_omit_anon_shared() */
+	mm->coredump_omit_anon_shared =
+		(current->mm) ?	current->mm->coredump_omit_anon_shared : 0;
 	mm->core_waiters = 0;
 	mm->nr_ptes = 0;
 	set_mm_counter(mm, file_rss, 0);
Index: linux-2.6.20-mm2/kernel/sys.c
===================================================================
--- linux-2.6.20-mm2.orig/kernel/sys.c
+++ linux-2.6.20-mm2/kernel/sys.c
@@ -1022,10 +1022,8 @@ asmlinkage long sys_setregid(gid_t rgid,
 		else
 			return -EPERM;
 	}
-	if (new_egid != old_egid) {
-		current->mm->dumpable = suid_dumpable;
-		smp_wmb();
-	}
+	if (new_egid != old_egid)
+		set_dumpable(current->mm, suid_dumpable);
 	if (rgid != (gid_t) -1 ||
 	    (egid != (gid_t) -1 && egid != old_rgid))
 		current->sgid = new_egid;
@@ -1052,16 +1050,12 @@ asmlinkage long sys_setgid(gid_t gid)
 		return retval;
 
 	if (capable(CAP_SETGID)) {
-		if (old_egid != gid) {
-			current->mm->dumpable = suid_dumpable;
-			smp_wmb();
-		}
+		if (old_egid != gid)
+			set_dumpable(current->mm, suid_dumpable);
 		current->gid = current->egid = current->sgid = current->fsgid = gid;
 	} else if ((gid == current->gid) || (gid == current->sgid)) {
-		if (old_egid != gid) {
-			current->mm->dumpable = suid_dumpable;
-			smp_wmb();
-		}
+		if (old_egid != gid)
+			set_dumpable(current->mm, suid_dumpable);
 		current->egid = current->fsgid = gid;
 	}
 	else
@@ -1089,10 +1083,8 @@ static int set_user(uid_t new_ruid, int 
 
 	switch_uid(new_user);
 
-	if (dumpclear) {
-		current->mm->dumpable = suid_dumpable;
-		smp_wmb();
-	}
+	if (dumpclear)
+		set_dumpable(current->mm, suid_dumpable);
 	current->uid = new_ruid;
 	return 0;
 }
@@ -1145,10 +1137,8 @@ asmlinkage long sys_setreuid(uid_t ruid,
 	if (new_ruid != old_ruid && set_user(new_ruid, new_euid != old_euid) < 0)
 		return -EAGAIN;
 
-	if (new_euid != old_euid) {
-		current->mm->dumpable = suid_dumpable;
-		smp_wmb();
-	}
+	if (new_euid != old_euid)
+		set_dumpable(current->mm, suid_dumpable);
 	current->fsuid = current->euid = new_euid;
 	if (ruid != (uid_t) -1 ||
 	    (euid != (uid_t) -1 && euid != old_ruid))
@@ -1195,10 +1185,8 @@ asmlinkage long sys_setuid(uid_t uid)
 	} else if ((uid != current->uid) && (uid != new_suid))
 		return -EPERM;
 
-	if (old_euid != uid) {
-		current->mm->dumpable = suid_dumpable;
-		smp_wmb();
-	}
+	if (old_euid != uid)
+		set_dumpable(current->mm, suid_dumpable);
 	current->fsuid = current->euid = uid;
 	current->suid = new_suid;
 
@@ -1240,10 +1228,8 @@ asmlinkage long sys_setresuid(uid_t ruid
 			return -EAGAIN;
 	}
 	if (euid != (uid_t) -1) {
-		if (euid != current->euid) {
-			current->mm->dumpable = suid_dumpable;
-			smp_wmb();
-		}
+		if (euid != current->euid)
+			set_dumpable(current->mm, suid_dumpable);
 		current->euid = euid;
 	}
 	current->fsuid = current->euid;
@@ -1290,10 +1276,8 @@ asmlinkage long sys_setresgid(gid_t rgid
 			return -EPERM;
 	}
 	if (egid != (gid_t) -1) {
-		if (egid != current->egid) {
-			current->mm->dumpable = suid_dumpable;
-			smp_wmb();
-		}
+		if (egid != current->egid)
+			set_dumpable(current->mm, suid_dumpable);
 		current->egid = egid;
 	}
 	current->fsgid = current->egid;
@@ -1336,10 +1320,8 @@ asmlinkage long sys_setfsuid(uid_t uid)
 	if (uid == current->uid || uid == current->euid ||
 	    uid == current->suid || uid == current->fsuid || 
 	    capable(CAP_SETUID)) {
-		if (uid != old_fsuid) {
-			current->mm->dumpable = suid_dumpable;
-			smp_wmb();
-		}
+		if (uid != old_fsuid)
+			set_dumpable(current->mm, suid_dumpable);
 		current->fsuid = uid;
 	}
 
@@ -1365,10 +1347,8 @@ asmlinkage long sys_setfsgid(gid_t gid)
 	if (gid == current->gid || gid == current->egid ||
 	    gid == current->sgid || gid == current->fsgid || 
 	    capable(CAP_SETGID)) {
-		if (gid != old_fsgid) {
-			current->mm->dumpable = suid_dumpable;
-			smp_wmb();
-		}
+		if (gid != old_fsgid)
+			set_dumpable(current->mm, suid_dumpable);
 		current->fsgid = gid;
 		key_fsgid_changed(current);
 		proc_id_connector(current, PROC_EVENT_GID);
@@ -2163,7 +2143,7 @@ asmlinkage long sys_prctl(int option, un
 				error = -EINVAL;
 				break;
 			}
-			current->mm->dumpable = arg2;
+			set_dumpable(current->mm, arg2);
 			break;
 
 		case PR_SET_UNALIGN:
Index: linux-2.6.20-mm2/security/commoncap.c
===================================================================
--- linux-2.6.20-mm2.orig/security/commoncap.c
+++ linux-2.6.20-mm2/security/commoncap.c
@@ -244,7 +244,7 @@ void cap_bprm_apply_creds (struct linux_
 
 	if (bprm->e_uid != current->uid || bprm->e_gid != current->gid ||
 	    !cap_issubset (new_permitted, current->cap_permitted)) {
-		current->mm->dumpable = suid_dumpable;
+		set_dumpable(current->mm, suid_dumpable);
 
 		if (unsafe & ~LSM_UNSAFE_PTRACE_CAP) {
 			if (!capable(CAP_SETUID)) {
Index: linux-2.6.20-mm2/security/dummy.c
===================================================================
--- linux-2.6.20-mm2.orig/security/dummy.c
+++ linux-2.6.20-mm2/security/dummy.c
@@ -130,7 +130,7 @@ static void dummy_bprm_free_security (st
 static void dummy_bprm_apply_creds (struct linux_binprm *bprm, int unsafe)
 {
 	if (bprm->e_uid != current->uid || bprm->e_gid != current->gid) {
-		current->mm->dumpable = suid_dumpable;
+		set_dumpable(current->mm, suid_dumpable);
 
 		if ((unsafe & ~LSM_UNSAFE_PTRACE_CAP) && !capable(CAP_SETUID)) {
 			bprm->e_uid = current->uid;
Index: linux-2.6.20-mm2/include/linux/binfmts.h
===================================================================
--- linux-2.6.20-mm2.orig/include/linux/binfmts.h
+++ linux-2.6.20-mm2/include/linux/binfmts.h
@@ -17,6 +17,8 @@ struct pt_regs;
 
 #ifdef __KERNEL__
 
+#include <linux/rwsem.h>
+
 /*
  * This structure is used to hold the arguments that are used when loading binaries.
  */
@@ -75,6 +77,8 @@ extern int suid_dumpable;
 #define SUID_DUMP_USER		1	/* Dump as user of process */
 #define SUID_DUMP_ROOT		2	/* Dump as root */
 
+extern struct rw_semaphore coredump_settings_sem;
+
 /* Stack area protections */
 #define EXSTACK_DEFAULT   0	/* Whatever the arch defaults to */
 #define EXSTACK_DISABLE_X 1	/* Disable executable stacks */



^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH 2/4] coredump: ELF: enable to omit anonymous shared memory
  2007-03-02  4:41 [PATCH 0/4] coredump: core dump masking support v4 Kawai, Hidehiro
  2007-03-02  4:47 ` [PATCH 1/4] coredump: add an interface to control the core dump routine Kawai, Hidehiro
@ 2007-03-02  4:49 ` Kawai, Hidehiro
  2007-03-02  4:50 ` [PATCH 3/4] coredump: ELF-FDPIC: " Kawai, Hidehiro
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 20+ messages in thread
From: Kawai, Hidehiro @ 2007-03-02  4:49 UTC (permalink / raw)
  To: Andrew Morton, kernel list
  Cc: Kawai, Hidehiro, Pavel Machek, Robin Holt, David Howells,
	Alan Cox, Masami Hiramatsu, sugita, Satoshi OSHIMA, Hideo AOKI

This patch enables to omit anonymous shared memory from an ELF
formatted core file when it is generated.

Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
---
 fs/binfmt_elf.c |   12 +++++++++---
 1 files changed, 9 insertions(+), 3 deletions(-)

Index: linux-2.6.20-mm2/fs/binfmt_elf.c
===================================================================
--- linux-2.6.20-mm2.orig/fs/binfmt_elf.c
+++ linux-2.6.20-mm2/fs/binfmt_elf.c
@@ -1191,9 +1191,15 @@ static int maydump(struct vm_area_struct
 	if (vma->vm_flags & (VM_IO | VM_RESERVED))
 		return 0;
 
-	/* Dump shared memory only if mapped from an anonymous file. */
-	if (vma->vm_flags & VM_SHARED)
-		return vma->vm_file->f_path.dentry->d_inode->i_nlink == 0;
+	/*
+	 * Dump shared memory only if mapped from an anonymous file and
+	 * /proc/<pid>/coredump_omit_anonymous_shared flag is not set.
+	 */
+	if (vma->vm_flags & VM_SHARED) {
+		if (vma->vm_file->f_path.dentry->d_inode->i_nlink)
+			return 0;
+		return vma->vm_mm->coredump_omit_anon_shared == 0;
+	}
 
 	/* If it hasn't been written to, don't write it out */
 	if (!vma->anon_vma)



^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH 3/4] coredump: ELF-FDPIC: enable to omit anonymous shared memory
  2007-03-02  4:41 [PATCH 0/4] coredump: core dump masking support v4 Kawai, Hidehiro
  2007-03-02  4:47 ` [PATCH 1/4] coredump: add an interface to control the core dump routine Kawai, Hidehiro
  2007-03-02  4:49 ` [PATCH 2/4] coredump: ELF: enable to omit anonymous shared memory Kawai, Hidehiro
@ 2007-03-02  4:50 ` Kawai, Hidehiro
  2007-03-02  4:51 ` [PATCH 4/4] coredump: documentation for proc entry Kawai, Hidehiro
  2007-03-15 20:37 ` [PATCH 0/4] coredump: core dump masking support v4 Andrew Morton
  4 siblings, 0 replies; 20+ messages in thread
From: Kawai, Hidehiro @ 2007-03-02  4:50 UTC (permalink / raw)
  To: Andrew Morton, kernel list
  Cc: Kawai, Hidehiro, Pavel Machek, Robin Holt, David Howells,
	Alan Cox, Masami Hiramatsu, sugita, Satoshi OSHIMA, Hideo AOKI

This patch enables to omit anonymous shared memory from an ELF-FDPIC
formatted core file when it is generated.

The debug messages from maydump() in fs/binfmt_elf_fdpic.c are changed
appropriately so that we can know what kind of memory segments are
dumped or not.

Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
---
 fs/binfmt_elf_fdpic.c |   25 ++++++++++++++++---------
 1 files changed, 16 insertions(+), 9 deletions(-)

Index: linux-2.6.20-mm2/fs/binfmt_elf_fdpic.c
===================================================================
--- linux-2.6.20-mm2.orig/fs/binfmt_elf_fdpic.c
+++ linux-2.6.20-mm2/fs/binfmt_elf_fdpic.c
@@ -1168,7 +1168,7 @@ static int dump_seek(struct file *file, 
  *
  * I think we should skip something. But I am not sure how. H.J.
  */
-static int maydump(struct vm_area_struct *vma)
+static int maydump(struct vm_area_struct *vma, struct mm_struct *mm)
 {
 	/* Do not dump I/O mapped devices or special mappings */
 	if (vma->vm_flags & (VM_IO | VM_RESERVED)) {
@@ -1184,15 +1184,22 @@ static int maydump(struct vm_area_struct
 		return 0;
 	}
 
-	/* Dump shared memory only if mapped from an anonymous file. */
+	/*
+	 * Dump shared memory only if mapped from an anonymous file and
+	 * /proc/<pid>/coredump_omit_anonymous_shared flag is not set.
+	 */
 	if (vma->vm_flags & VM_SHARED) {
-		if (vma->vm_file->f_path.dentry->d_inode->i_nlink == 0) {
+		if (vma->vm_file->f_path.dentry->d_inode->i_nlink) {
 			kdcore("%08lx: %08lx: no (share)", vma->vm_start, vma->vm_flags);
+			return 0;
+		}
+		if (mm->coredump_omit_anon_shared) {
+			kdcore("%08lx: %08lx: no (anon-share)", vma->vm_start, vma->vm_flags);
+			return 0;
+		} else {
+			kdcore("%08lx: %08lx: yes (anon-share)", vma->vm_start, vma->vm_flags);
 			return 1;
 		}
-
-		kdcore("%08lx: %08lx: no (share)", vma->vm_start, vma->vm_flags);
-		return 0;
 	}
 
 #ifdef CONFIG_MMU
@@ -1451,7 +1458,7 @@ static int elf_fdpic_dump_segments(struc
 	for (vma = current->mm->mmap; vma; vma = vma->vm_next) {
 		unsigned long addr;
 
-		if (!maydump(vma))
+		if (!maydump(vma, mm))
 			continue;
 
 		for (addr = vma->vm_start;
@@ -1506,7 +1513,7 @@ static int elf_fdpic_dump_segments(struc
 	for (vml = current->mm->context.vmlist; vml; vml = vml->next) {
 	struct vm_area_struct *vma = vml->vma;
 
-		if (!maydump(vma))
+		if (!maydump(vma, mm))
 			continue;
 
 		if ((*size += PAGE_SIZE) > *limit)
@@ -1715,7 +1722,7 @@ static int elf_fdpic_core_dump(long sign
 		phdr.p_offset = offset;
 		phdr.p_vaddr = vma->vm_start;
 		phdr.p_paddr = 0;
-		phdr.p_filesz = maydump(vma) ? sz : 0;
+		phdr.p_filesz = maydump(vma, current->mm) ? sz : 0;
 		phdr.p_memsz = sz;
 		offset += phdr.p_filesz;
 		phdr.p_flags = vma->vm_flags & VM_READ ? PF_R : 0;



^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH 4/4] coredump: documentation for proc entry
  2007-03-02  4:41 [PATCH 0/4] coredump: core dump masking support v4 Kawai, Hidehiro
                   ` (2 preceding siblings ...)
  2007-03-02  4:50 ` [PATCH 3/4] coredump: ELF-FDPIC: " Kawai, Hidehiro
@ 2007-03-02  4:51 ` Kawai, Hidehiro
  2007-03-02  9:35   ` Pavel Machek
  2007-03-15 20:37 ` [PATCH 0/4] coredump: core dump masking support v4 Andrew Morton
  4 siblings, 1 reply; 20+ messages in thread
From: Kawai, Hidehiro @ 2007-03-02  4:51 UTC (permalink / raw)
  To: Andrew Morton, kernel list
  Cc: Kawai, Hidehiro, Pavel Machek, Robin Holt, David Howells,
	Alan Cox, Masami Hiramatsu, sugita, Satoshi OSHIMA, Hideo AOKI

This patch adds the documentation for
/proc/<pid>/coredump_omit_anonymous_shared.

Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
---
 Documentation/filesystems/proc.txt |   38 +++++++++++++++++++++++++++
 1 files changed, 38 insertions(+)

Index: linux-2.6.20-mm2/Documentation/filesystems/proc.txt
===================================================================
--- linux-2.6.20-mm2.orig/Documentation/filesystems/proc.txt
+++ linux-2.6.20-mm2/Documentation/filesystems/proc.txt
@@ -41,6 +41,7 @@ Table of Contents
   2.11	/proc/sys/fs/mqueue - POSIX message queues filesystem
   2.12	/proc/<pid>/oom_adj - Adjust the oom-killer score
   2.13	/proc/<pid>/oom_score - Display current oom-killer score
+  2.14	/proc/<pid>/coredump_omit_anonymous_shared - Core dump coordinator
 
 ------------------------------------------------------------------------------
 Preface
@@ -1982,6 +1983,43 @@ This file can be used to check the curre
 any given <pid>. Use it together with /proc/<pid>/oom_adj to tune which
 process should be killed in an out-of-memory situation.
 
+2.14 /proc/<pid>/coredump_omit_anonymous_shared - Core dump coordinator
+---------------------------------------------------------------------
+When a process is dumped, all anonymous memory is written to a core file as
+long as the size of the core file isn't limited. But sometimes we don't want
+to dump some memory segments, for example, huge shared memory.
+
+The /proc/<pid>/coredump_omit_anonymous_shared is a flag which enables you to
+omit anonymous shared memory segments from a core file when it is generated.
+When the <pid> process is dumped, the core dump routine decides whether a
+given memory segment should be dumped into a core file or not based on the
+type of the memory segment and the flag.
+
+If you have written a non-zero value to this proc file, anonymous shared
+memory segments are not dumped. There are three types of anonymous shared
+memory:
+
+  - IPC shared memory
+  - the memory segments created by mmap(2) with MAP_ANONYMOUS and MAP_SHARED
+    flags
+  - the memory segments created by mmap(2) with MAP_SHARED flag, and the
+    mapped file has already been unlinked
+
+Because current core dump routine doesn't distinguish these segments, you can
+only choose either dumping all anonymous shared memory segments or not.
+
+If you don't want to dump all shared memory segments attached to pid 1234,
+write 0 to the process's proc file.
+
+  $ echo 1 > /proc/1234/coredump_omit_anonymous_shared
+
+When a new process is created, the process inherits the flag status from its
+parent. It is useful to set the flag before the program runs.
+For example:
+
+  $ echo 1 > /proc/self/coredump_omit_anonymous_shared
+  $ ./some_program
+
 ------------------------------------------------------------------------------
 Summary
 ------------------------------------------------------------------------------



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/4] coredump: add an interface to control the core dump routine
  2007-03-02  4:47 ` [PATCH 1/4] coredump: add an interface to control the core dump routine Kawai, Hidehiro
@ 2007-03-02  9:34   ` Pavel Machek
  2007-03-26 13:02     ` Kawai, Hidehiro
  0 siblings, 1 reply; 20+ messages in thread
From: Pavel Machek @ 2007-03-02  9:34 UTC (permalink / raw)
  To: Kawai, Hidehiro
  Cc: Andrew Morton, kernel list, Robin Holt, David Howells, Alan Cox,
	Masami Hiramatsu, sugita, Satoshi OSHIMA, Hideo AOKI

Hi!

> This patch adds an interface to set/reset a flag which determines
> anonymous shared memory segments should be dumped or not when a core
> file is generated.
> 
> /proc/<pid>/coredump_omit_anonymous_shared file is provided to access
> the flag. You can change the flag status for a particular process by
> writing to or reading from the file.

Yes. So, you used very different interface interface from ulimit,
which means locking is hard.

Plus, what you are doing can be done in userspace using google
coredumper.

> +	if (down_write_trylock(&coredump_settings_sem)) {
> +		set_coredump_omit_anon_shared(mm, (val != 0));
> +		up_write(&coredump_settings_sem);
> +	} else
> +		ret = -EBUSY;

Now this is an ugly interface. "If user tries to write to /proc file
while it is locked, return him spurious error.


> @@ -75,6 +77,8 @@ extern int suid_dumpable;
>  #define SUID_DUMP_USER		1	/* Dump as user of process */
>  #define SUID_DUMP_ROOT		2	/* Dump as root */
>  
> +extern struct rw_semaphore coredump_settings_sem;
> +
>  /* Stack area protections */
>  #define EXSTACK_DEFAULT   0	/* Whatever the arch defaults to */
>  #define EXSTACK_DISABLE_X 1	/* Disable executable stacks */

Yep, very nice, if you used interface suited for the task (ulimit),
you'd not have to invent new locking like this.

Anyway, I don't care, and you don't listen. You got the design wrong,
and your code can be done in userspace, anyway. This is "NAK" from me,
but unfortunately I'm not in power to decide this. So at least drop me
from Cc in future submissions.

								Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 4/4] coredump: documentation for proc entry
  2007-03-02  4:51 ` [PATCH 4/4] coredump: documentation for proc entry Kawai, Hidehiro
@ 2007-03-02  9:35   ` Pavel Machek
  2007-03-20 11:11     ` Kawai, Hidehiro
  0 siblings, 1 reply; 20+ messages in thread
From: Pavel Machek @ 2007-03-02  9:35 UTC (permalink / raw)
  To: Kawai, Hidehiro
  Cc: Andrew Morton, kernel list, Robin Holt, David Howells, Alan Cox,
	Masami Hiramatsu, sugita, Satoshi OSHIMA, Hideo AOKI

Hi!

> +If you don't want to dump all shared memory segments attached to pid 1234,
> +write 0 to the process's proc file.
> +
> +  $ echo 1 > /proc/1234/coredump_omit_anonymous_shared

Write 0?

> +When a new process is created, the process inherits the flag status from its
> +parent. It is useful to set the flag before the program runs.
> +For example:
> +
> +  $ echo 1 > /proc/self/coredump_omit_anonymous_shared
> +  $ ./some_program
> +

Notice that this docs is wrong. You have to retry until kernel stops
producing spurious errors.
									Pavel

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 0/4] coredump: core dump masking support v4
  2007-03-02  4:41 [PATCH 0/4] coredump: core dump masking support v4 Kawai, Hidehiro
                   ` (3 preceding siblings ...)
  2007-03-02  4:51 ` [PATCH 4/4] coredump: documentation for proc entry Kawai, Hidehiro
@ 2007-03-15 20:37 ` Andrew Morton
  2007-03-23 13:13   ` Kawai, Hidehiro
  4 siblings, 1 reply; 20+ messages in thread
From: Andrew Morton @ 2007-03-15 20:37 UTC (permalink / raw)
  To: Kawai, Hidehiro
  Cc: kernel list, Pavel Machek, Robin Holt, David Howells, Alan Cox,
	Masami Hiramatsu, sugita, Satoshi OSHIMA, Hideo AOKI

On Fri, 02 Mar 2007 13:41:30 +0900
"Kawai, Hidehiro" <hidehiro.kawai.ez@hitachi.com> wrote:

> This patch series is version 4 of the core dump masking feature,
> which provides a per-process flag not to dump anonymous shared
> memory segments.

First up, please convince us that this problem cannot be solved in
userspace.  Note that we now support dumping core over a pipe to a
userspace application which can perform filtering such as this (see
Documentation/sysctl/kernel.txt:core_pattern).


Assuming that your argument is successful...

- The unpleasing trylock in proc_coredump_omit_anon_shared_write() is
  there, I believe, to handle the case where a coredump is presently in
  progress.  We don't want to change the filtering rule while the dump is
  happening.

  What I suggest you do instead is to take a copy of
  mm->coredump_omit_anon_shared into a local variable with one single read
  per coredump.  Pass that local down into all the callees which need to
  see it.  That way, no locking is needed.

- These games we're playing with the atomicity of the bitfields in the
  mm_struct need to go away.

  First up, please prepare a standalone patch which removes
  mm_struct.dumpable and adds `unsigned long mm_struct.flags'.  Include a
  comment telling people that they must use atomic bitops (set_bit, clear_bit) on
  mm_struct.flags.

  Reimplement the current three-value dumpable silliness using two or
  three separate flags in mm_struct.flags.  Of course, this design means
  that there will be tiny timing windows where the value of these two or
  three flags have intermediate, invalid states.  Please take care of those
  little windows and document how you did so.  I expect a suitable approach
  would be to set and clear the flags in a suitable order, so that if a
  race _does_ happen, the results are benign.

- Once that is done, you're ready to think about your new functionality. 
  Start out with 

	#define MM_FLAG_COREDUMP_OMIT_ANON_SHARED	(1 << 3)

  or whatever, and it all becomes easy.

- Finally, the code as you have it here is very specific to your specific
  requirement: don't dump shared memory segments.

  But if we're going to implement in-kernel core-dump filtering of this
  nature, we should design it extensibly, even if we don't actually
  implement those extensions at this time.

  Because other people might (reasonably) wish to omit anonymous memory,
  or private mappings, or file-backed VMAs, or whatever.

  So maybe /proc/pid/coredump_omit_anon_shared should become
  /proc/pid/core_dumpfilter, which is a carefully documented bitmask.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 4/4] coredump: documentation for proc entry
  2007-03-02  9:35   ` Pavel Machek
@ 2007-03-20 11:11     ` Kawai, Hidehiro
  0 siblings, 0 replies; 20+ messages in thread
From: Kawai, Hidehiro @ 2007-03-20 11:11 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Andrew Morton, kernel list, Robin Holt, David Howells, Alan Cox,
	Masami Hiramatsu, sugita, Satoshi OSHIMA, Hideo AOKI

Hi Pavel,
I'm sorry for my late reply.

Pavel Machek wrote:

> Hi!
> 
>>+If you don't want to dump all shared memory segments attached to pid 1234,
>>+write 0 to the process's proc file.
>>+
>>+  $ echo 1 > /proc/1234/coredump_omit_anonymous_shared
> 
> Write 0?

Thank you for pointing out. 
It seems I mistook when I changed the documents.
`write 1' is correct.


>>+When a new process is created, the process inherits the flag status from its
>>+parent. It is useful to set the flag before the program runs.
>>+For example:
>>+
>>+  $ echo 1 > /proc/self/coredump_omit_anonymous_shared
>>+  $ ./some_program
>>+
> 
> Notice that this docs is wrong. You have to retry until kernel stops
> producing spurious errors.
> 									Pavel
 
I'll fix the patchset so that kernel doesn't produce the spurious error.

For answers to your another mail, please wait a few days.
I'm still considering the answer partly. 

Thanks,
-- 
Hidehiro Kawai
Hitachi, Ltd., Systems Development Laboratory



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 0/4] coredump: core dump masking support v4
  2007-03-15 20:37 ` [PATCH 0/4] coredump: core dump masking support v4 Andrew Morton
@ 2007-03-23 13:13   ` Kawai, Hidehiro
  2007-03-28 12:37     ` Kawai, Hidehiro
  0 siblings, 1 reply; 20+ messages in thread
From: Kawai, Hidehiro @ 2007-03-23 13:13 UTC (permalink / raw)
  To: Andrew Morton
  Cc: kernel list, Pavel Machek, Robin Holt, David Howells, Alan Cox,
	Masami Hiramatsu, sugita, Satoshi OSHIMA, Hideo AOKI

Hi,
Thank you for your kind comments.

I'm still discussing the answer with my senior colleagues, so please
wait a few days.  I think I can reply at the beginning of next week.

Best regards,
-- 
Hidehiro Kawai
Hitachi, Ltd., Systems Development Laboratory


Andrew Morton wrote:

> On Fri, 02 Mar 2007 13:41:30 +0900
> "Kawai, Hidehiro" <hidehiro.kawai.ez@hitachi.com> wrote:
> 
> 
>>This patch series is version 4 of the core dump masking feature,
>>which provides a per-process flag not to dump anonymous shared
>>memory segments.
> 
> 
> First up, please convince us that this problem cannot be solved in
> userspace.  Note that we now support dumping core over a pipe to a
> userspace application which can perform filtering such as this (see
> Documentation/sysctl/kernel.txt:core_pattern).
> 
> 
> Assuming that your argument is successful...
> 
> - The unpleasing trylock in proc_coredump_omit_anon_shared_write() is
>   there, I believe, to handle the case where a coredump is presently in
>   progress.  We don't want to change the filtering rule while the dump is
>   happening.
> 
>   What I suggest you do instead is to take a copy of
>   mm->coredump_omit_anon_shared into a local variable with one single read
>   per coredump.  Pass that local down into all the callees which need to
>   see it.  That way, no locking is needed.
> 
> - These games we're playing with the atomicity of the bitfields in the
>   mm_struct need to go away.
> 
>   First up, please prepare a standalone patch which removes
>   mm_struct.dumpable and adds `unsigned long mm_struct.flags'.  Include a
>   comment telling people that they must use atomic bitops (set_bit, clear_bit) on
>   mm_struct.flags.
> 
>   Reimplement the current three-value dumpable silliness using two or
>   three separate flags in mm_struct.flags.  Of course, this design means
>   that there will be tiny timing windows where the value of these two or
>   three flags have intermediate, invalid states.  Please take care of those
>   little windows and document how you did so.  I expect a suitable approach
>   would be to set and clear the flags in a suitable order, so that if a
>   race _does_ happen, the results are benign.
> 
> - Once that is done, you're ready to think about your new functionality. 
>   Start out with 
> 
> 	#define MM_FLAG_COREDUMP_OMIT_ANON_SHARED	(1 << 3)
> 
>   or whatever, and it all becomes easy.
> 
> - Finally, the code as you have it here is very specific to your specific
>   requirement: don't dump shared memory segments.
> 
>   But if we're going to implement in-kernel core-dump filtering of this
>   nature, we should design it extensibly, even if we don't actually
>   implement those extensions at this time.
> 
>   Because other people might (reasonably) wish to omit anonymous memory,
>   or private mappings, or file-backed VMAs, or whatever.
> 
>   So maybe /proc/pid/coredump_omit_anon_shared should become
>   /proc/pid/core_dumpfilter, which is a carefully documented bitmask.
> 
> 



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/4] coredump: add an interface to control the core dump routine
  2007-03-02  9:34   ` Pavel Machek
@ 2007-03-26 13:02     ` Kawai, Hidehiro
  2007-03-29 10:49       ` Pavel Machek
  2007-03-29 19:16       ` David Howells
  0 siblings, 2 replies; 20+ messages in thread
From: Kawai, Hidehiro @ 2007-03-26 13:02 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Andrew Morton, kernel list, Robin Holt, David Howells, Alan Cox,
	Masami Hiramatsu, sugita, Satoshi OSHIMA, Hideo AOKI

Hi Pavel, 
Thank you for your reply.
I'm sorry for my late reply.

I have discussed with my colleagues why you say "ugly" against my
procfs interface, then I noticed I may have misunderstood what you said.
Is the reason for saying "ugly" two interfaces, i.e. preexisting ulimit
(get/setrlimit) and my proc entry, exist to control core file size? 
If so, I'm sorry for taking your precious time by proceeding to the
discussion without enough understanding.

Assuming my presumption is true, I don't think it's so ugly because
there are other parameters to control core dumping such as dumpable
(controled by prctl(2)) and suid_dumpable (controled via
fs.suid_dumpable sysctl).
What would you think about that?

Anyway, here are the answers to what you pointed out.


Pavel Machek wrote:

>>This patch adds an interface to set/reset a flag which determines
>>anonymous shared memory segments should be dumped or not when a core
>>file is generated.
>>
>>/proc/<pid>/coredump_omit_anonymous_shared file is provided to access
>>the flag. You can change the flag status for a particular process by
>>writing to or reading from the file.
> 
> Yes. So, you used very different interface interface from ulimit,
> which means locking is hard.

I'd like to allow users to change the flag from other process.  So I
have to do locking even if it is hard.  You said this flexibility was
not an advantage before, but in some cases, it is needed.

Please assume the case where a process forks many children and they
share a huge shared memory.  Sometimes an end user wants to set 
coredump_omit_anon_shared flag to those processes except for a
particular child process.  With ulimit (setrlimit) interface,
the user can't do such setup without modifying the application
program.  But normal end user will not be able to modify the program.


> Plus, what you are doing can be done in userspace using google
> coredumper.

I think that the needs differ between userland core dumper user and
in-kernel core dumper user.  Pros and cons also differ.

Some of people (such as system admins, distro vendors, etc) need
highly reliable core dumper because they don't want to experience
same failures again and they don't hope that another failure is
caused by core dumping.  Userland core dumper is useful because
it is relatively easy to be customized, but its reliability highly
depends on the application programs.

If the stack for signal handlers is not set up carefully, if the
data used by userland core dumper has been destroyed, if
coredump_omit_anon_shared flag has been overwritten by bad data,
or if the address of functions have been destroyed, the userland
core dumper may fail to dump.  So in-kernel solutoin is required
by enterprise users.


>>+	if (down_write_trylock(&coredump_settings_sem)) {
>>+		set_coredump_omit_anon_shared(mm, (val != 0));
>>+		up_write(&coredump_settings_sem);
>>+	} else
>>+		ret = -EBUSY;
> 
> Now this is an ugly interface. "If user tries to write to /proc file
> while it is locked, return him spurious error.

I'm considering using the previous argument passing approach (preserves
the setting value into a local variable and then passes it to core dump
routines) or another approach which introduce a per-process flag to
indicate that core dump is ongoing.  Both of these approach never
produce spurious errors.

 
>>@@ -75,6 +77,8 @@ extern int suid_dumpable;
>> #define SUID_DUMP_USER		1	/* Dump as user of process */
>> #define SUID_DUMP_ROOT		2	/* Dump as root */
>> 
>>+extern struct rw_semaphore coredump_settings_sem;
>>+
>> /* Stack area protections */
>> #define EXSTACK_DEFAULT   0	/* Whatever the arch defaults to */
>> #define EXSTACK_DISABLE_X 1	/* Disable executable stacks */
> 
> Yep, very nice, if you used interface suited for the task (ulimit),
> you'd not have to invent new locking like this.

Using the above-stated approach, this semaphore becomes unnecessary.


Thanks,
-- 
Hidehiro Kawai
Hitachi, Ltd., Systems Development Laboratory


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 0/4] coredump: core dump masking support v4
  2007-03-23 13:13   ` Kawai, Hidehiro
@ 2007-03-28 12:37     ` Kawai, Hidehiro
  2007-03-28 17:32       ` Andrew Morton
  0 siblings, 1 reply; 20+ messages in thread
From: Kawai, Hidehiro @ 2007-03-28 12:37 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Kawai, Hidehiro, kernel list, Pavel Machek, Robin Holt,
	David Howells, Alan Cox, Masami Hiramatsu, sugita,
	Satoshi OSHIMA, Hideo AOKI

Hi,
Thank you for your kind comments.
I'm sorry for my late reply. 

Andrew Morton wrote:

> On Fri, 02 Mar 2007 13:41:30 +0900
> "Kawai, Hidehiro" <hidehiro.kawai.ez@hitachi.com> wrote:
> 
>>This patch series is version 4 of the core dump masking feature,
>>which provides a per-process flag not to dump anonymous shared
>>memory segments.
> 
> First up, please convince us that this problem cannot be solved in
> userspace. 
> Note that we now support dumping core over a pipe to a
> userspace application which can perform filtering such as this (see
> Documentation/sysctl/kernel.txt:core_pattern).

I understand.  Thank you for your suggestion.  I'll reply about it in
another mail, but it may take a few days.

 
> Assuming that your argument is successful...
> 
> - The unpleasing trylock in proc_coredump_omit_anon_shared_write() is
>   there, I believe, to handle the case where a coredump is presently in
>   progress.  We don't want to change the filtering rule while the dump is
>   happening.
> 
>   What I suggest you do instead is to take a copy of
>   mm->coredump_omit_anon_shared into a local variable with one single read
>   per coredump.  Pass that local down into all the callees which need to
>   see it.  That way, no locking is needed.

Previous v3 patchset does what you suggest, and here are links to the
patches:

[PATCH 2/4] coredump: ELF: enable to omit anonymous shared memory
http://lkml.org/lkml/2007/2/16/156

[PATCH 3/4] coredump: ELF-FDPIC: enable to omit anonymous shared memory
http://lkml.org/lkml/2007/2/16/157

However, there was an opposite opinion.  To pass the flag status, I
added omit_anon_shared argument to elf_fdpic_dump_segments().  Then,
David pointed that the argument was unncecessary, because the function
also receives mm_struct *mm which includes coredump_omit_anon_shared.
But mm->coredump_omit_anon_shared can be changed while core dumping, and
it may causes the core file to be corrupted.  So in v4 patchset I used
r/w semaphore to prevent mm->coredump_omit_anon_shared from being changed.

If I add an addtional argument to elf_fdpic_dump_segments() again, I
have to explain it to David.  I'll tell him that removing mm argument
from the function will be a solution since it refers current->mm directly
and the mm argument is never used.

 
> - These games we're playing with the atomicity of the bitfields in the
>   mm_struct need to go away.
> 
>   First up, please prepare a standalone patch which removes
>   mm_struct.dumpable and adds `unsigned long mm_struct.flags'.  Include a
>   comment telling people that they must use atomic bitops (set_bit, clear_bit) on
>   mm_struct.flags.

OK.  I'll do it in the next version.


> - Finally, the code as you have it here is very specific to your specific
>   requirement: don't dump shared memory segments.
> 
>   But if we're going to implement in-kernel core-dump filtering of this
>   nature, we should design it extensibly, even if we don't actually
>   implement those extensions at this time.

I understood. Since I had done so initially, I'll revert it to.

 
>   Because other people might (reasonably) wish to omit anonymous memory,
>   or private mappings, or file-backed VMAs, or whatever.
> 
>   So maybe /proc/pid/coredump_omit_anon_shared should become
>   /proc/pid/core_dumpfilter, which is a carefully documented bitmask.

There are people who wish to dump VMAs which are not dumped by default.
Taking this into account, some bits of core_dumpfilter will be set by
default.  This means users have to be aware of the default bitmask
when they change the bitmask.  Perhaps changing the bitmask requires
3 steps:

  1. read the default bitmask
  2. change bits of the mask
  3. write it to the proc entry

So I think it is better if we provide /proc/pid/core_flags (default:
all bits are 0) instead of core_dumpfilter.  With this interface,
users who use only one bit of the bitmask (this will be a common case) 
just have to write 2^n to the proc entry.  It takes only one step:

 1. write a value to the proc entry

If we can implement at the same cost, core_flags will be better
because it is useful for users.  What would you think about that?


By the way, Robin Holt wrote as follows:

> Can you make this a little more transparent?  Having a magic bitmask does
> not seem like the best way to do stuff.  Could you maybe make a core_flags
> directory with a seperate file for each flag.  It could still map to a
> single field in the mm, but be broken out for the proc filesystem.

Do you think Robin's suggestion is acceptable?

Best regards,
-- 
Hidehiro Kawai
Hitachi, Ltd., Systems Development Laboratory


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 0/4] coredump: core dump masking support v4
  2007-03-28 12:37     ` Kawai, Hidehiro
@ 2007-03-28 17:32       ` Andrew Morton
  0 siblings, 0 replies; 20+ messages in thread
From: Andrew Morton @ 2007-03-28 17:32 UTC (permalink / raw)
  To: Kawai, Hidehiro
  Cc: kernel list, Pavel Machek, Robin Holt, David Howells, Alan Cox,
	Masami Hiramatsu, sugita, Satoshi OSHIMA, Hideo AOKI

On Wed, 28 Mar 2007 21:37:07 +0900 "Kawai, Hidehiro" <hidehiro.kawai.ez@hitachi.com> wrote:

> >   Because other people might (reasonably) wish to omit anonymous memory,
> >   or private mappings, or file-backed VMAs, or whatever.
> > 
> >   So maybe /proc/pid/coredump_omit_anon_shared should become
> >   /proc/pid/core_dumpfilter, which is a carefully documented bitmask.
> 
> There are people who wish to dump VMAs which are not dumped by default.
> Taking this into account, some bits of core_dumpfilter will be set by
> default.  This means users have to be aware of the default bitmask
> when they change the bitmask.  Perhaps changing the bitmask requires
> 3 steps:
> 
>   1. read the default bitmask
>   2. change bits of the mask
>   3. write it to the proc entry
> 
> So I think it is better if we provide /proc/pid/core_flags (default:
> all bits are 0) instead of core_dumpfilter.  With this interface,
> users who use only one bit of the bitmask (this will be a common case) 
> just have to write 2^n to the proc entry.  It takes only one step:
> 
>  1. write a value to the proc entry
> 
> If we can implement at the same cost, core_flags will be better
> because it is useful for users.  What would you think about that?
> 

It sounds unnecessarily complex, and unnecessarily different from our
normal expectations of /proc files.  And the value we read differs from the
value we wrote...  I think having a non-zero default will be fine.

> 
> By the way, Robin Holt wrote as follows:
> 
> > Can you make this a little more transparent?  Having a magic bitmask does
> > not seem like the best way to do stuff.  Could you maybe make a core_flags
> > directory with a seperate file for each flag.  It could still map to a
> > single field in the mm, but be broken out for the proc filesystem.
> 
> Do you think Robin's suggestion is acceptable?

Marginal, I think.  This is not likely to be a field which a lot of people
modify a lot of times.  Those few people who need to work with this can
afford to look the values up in the documentation while writing their
script.

And it requires a distressingly large amount of code to implement a /proc
file.  Perhaps in this situation the code can be shared.

otoh, why is it a /proc thing at all?

unsigned long sys_set_corefile_filter(unsigned long enable_mask);
unsigned long sys_clear_corefile_filter(unsigned long enable_mask);

would be better?

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/4] coredump: add an interface to control the core dump routine
  2007-03-26 13:02     ` Kawai, Hidehiro
@ 2007-03-29 10:49       ` Pavel Machek
  2007-03-29 19:16       ` David Howells
  1 sibling, 0 replies; 20+ messages in thread
From: Pavel Machek @ 2007-03-29 10:49 UTC (permalink / raw)
  To: Kawai, Hidehiro
  Cc: Andrew Morton, kernel list, Robin Holt, David Howells, Alan Cox,
	Masami Hiramatsu, sugita, Satoshi OSHIMA, Hideo AOKI

Hi!

> I have discussed with my colleagues why you say "ugly" against my
> procfs interface, then I noticed I may have misunderstood what you said.
> Is the reason for saying "ugly" two interfaces, i.e. preexisting ulimit
> (get/setrlimit) and my proc entry, exist to control core file size? 

Yes.

> > Plus, what you are doing can be done in userspace using google
> > coredumper.
> 
> I think that the needs differ between userland core dumper user and
> in-kernel core dumper user.  Pros and cons also differ.
> 
> Some of people (such as system admins, distro vendors, etc) need
> highly reliable core dumper because they don't want to experience
> same failures again and they don't hope that another failure is
> caused by core dumping.  Userland core dumper is useful because
> it is relatively easy to be customized, but its reliability highly
> depends on the application programs.

Fix userland core dumper to be reliable, then.

> If the stack for signal handlers is not set up carefully, if the
> data used by userland core dumper has been destroyed, if
> coredump_omit_anon_shared flag has been overwritten by bad data,
> or if the address of functions have been destroyed, the userland
> core dumper may fail to dump.  So in-kernel solutoin is required
> by enterprise users.

It should be possible to dump from separate process.

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/4] coredump: add an interface to control the core dump routine
  2007-03-26 13:02     ` Kawai, Hidehiro
  2007-03-29 10:49       ` Pavel Machek
@ 2007-03-29 19:16       ` David Howells
  2007-03-29 21:17         ` Andrew Morton
  1 sibling, 1 reply; 20+ messages in thread
From: David Howells @ 2007-03-29 19:16 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Kawai, Hidehiro, Andrew Morton, kernel list, Robin Holt,
	Alan Cox, Masami Hiramatsu, sugita, Satoshi OSHIMA, Hideo AOKI,
	dhowells

Pavel Machek <pavel@ucw.cz> wrote:

> > Userland core dumper is useful because it is relatively easy to be
> > customized, but its reliability highly depends on the application
> > programs.
> 
> Fix userland core dumper to be reliable, then.

I don't think it's that easy.  The userland core dumper, as I understand it,
has to work *within* an application program (it's a library), thus the
application program my scotch the core dumper in a couple of ways:

 (1) by trying to claim the same services (such as signal handlers).

 (2) by destroying the coredumper should the application run amok and splat it
     (by munmapping it, mmapping over it, writing over it, etc.).

Plus there are cases that an in-application userspace coredumper can't catch -
namely double/triple faults.

David

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/4] coredump: add an interface to control the core dump routine
  2007-03-29 19:16       ` David Howells
@ 2007-03-29 21:17         ` Andrew Morton
  2007-03-30 10:29           ` Kawai, Hidehiro
  0 siblings, 1 reply; 20+ messages in thread
From: Andrew Morton @ 2007-03-29 21:17 UTC (permalink / raw)
  To: David Howells
  Cc: Pavel Machek, Kawai, Hidehiro, kernel list, Robin Holt, Alan Cox,
	Masami Hiramatsu, sugita, Satoshi OSHIMA, Hideo AOKI

On Thu, 29 Mar 2007 20:16:59 +0100
David Howells <dhowells@redhat.com> wrote:

> Pavel Machek <pavel@ucw.cz> wrote:
> 
> > > Userland core dumper is useful because it is relatively easy to be
> > > customized, but its reliability highly depends on the application
> > > programs.
> > 
> > Fix userland core dumper to be reliable, then.
> 
> I don't think it's that easy.  The userland core dumper, as I understand it,
> has to work *within* an application program (it's a library), thus the
> application program my scotch the core dumper in a couple of ways:

That's no longer necessarily true with the recently-added
dump-to-an-application feature:

core_pattern:
  ...
. If the first character of the pattern is a '|', the kernel will treat
  the rest of the pattern as a command to run.  The core dump will be
  written to the standard input of that program instead of to a file.


That's new in 2.6.20 (maybe .19?) so people probably don't know about it.


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/4] coredump: add an interface to control the core dump routine
  2007-03-29 21:17         ` Andrew Morton
@ 2007-03-30 10:29           ` Kawai, Hidehiro
  2007-03-30 16:10             ` Andrew Morton
  2007-03-31 13:03             ` David Howells
  0 siblings, 2 replies; 20+ messages in thread
From: Kawai, Hidehiro @ 2007-03-30 10:29 UTC (permalink / raw)
  To: Andrew Morton
  Cc: David Howells, Pavel Machek, kernel list, Robin Holt, Alan Cox,
	Masami Hiramatsu, sugita, Satoshi OSHIMA, Hideo AOKI

Hi,

Andrew Morton wrote:

> On Thu, 29 Mar 2007 20:16:59 +0100
> David Howells <dhowells@redhat.com> wrote:
> 
>>Pavel Machek <pavel@ucw.cz> wrote:
>>
>>>>Userland core dumper is useful because it is relatively easy to be
>>>>customized, but its reliability highly depends on the application
>>>>programs.
>>>
>>>Fix userland core dumper to be reliable, then.
>>
>>I don't think it's that easy.  The userland core dumper, as I understand it,
>>has to work *within* an application program (it's a library), thus the
>>application program my scotch the core dumper in a couple of ways:
> 
> That's no longer necessarily true with the recently-added
> dump-to-an-application feature:
> 
> core_pattern:
>   ...
> . If the first character of the pattern is a '|', the kernel will treat
>   the rest of the pattern as a command to run.  The core dump will be
>   written to the standard input of that program instead of to a file.

I think dumping core over a pipe is almost good.  Filtering and writing
out a core by a separete userspace program can be reliable because it is
independent of the failed user process.  But I have one concern; data
transfer over a pipe is slow.  It took 7 seconds to transfer 2GB
anonymous shared memory (detailed at the last of this mail).

In the case of dumping hundreds processes which share giga bytes memory,
it will take a few minutes even if the huge shared memory is not written
to a disk.  If a user wants to restart the failed application as soon
as possible to keep downtime to a minimum, this extra transfer time will
be a barrier.  So I think in-kernel filtering is still needed.


I performed a simple experiment to confirm the transfer speed:
  1. run a program which allocates 2GB anonymous shared memory
  2. the program receives SIGSEGV
  3. core image is generated and passed to tp_pipe(*) over a pipe
  4. tp_pipe reads the pipe and revokes the received data while
     data arrives
  5. tp_pipe logs time of completing the data transfer
  6. repeat 1 to 5 ten times and calculate the average and standard
     deviation

  (*) test program which I prepared for this experiment

Here is the result:
             Average: 7.041 sec
  Standard deviation: 0.903


And cpuinfo and meminfo of my box:

$ cat /proc/cpuinfo
processor	: 0
vendor_id	: GenuineIntel
cpu family	: 15
model		: 4
model name	: Intel(R) Xeon(TM) CPU 3.20GHz
stepping	: 1
cpu MHz		: 3200.343
cache size	: 1024 KB
fdiv_bug	: no
hlt_bug		: no
f00f_bug	: no
coma_bug	: no
fpu		: yes
fpu_exception	: yes
cpuid level	: 5
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm constant_tsc pni monitor ds_cpl cid cx16 xtpr
bogomips	: 6404.82
clflush size	: 64

processor	: 1
vendor_id	: GenuineIntel
cpu family	: 15
model		: 4
model name	: Intel(R) Xeon(TM) CPU 3.20GHz
stepping	: 1
cpu MHz		: 3200.343
cache size	: 1024 KB
fdiv_bug	: no
hlt_bug		: no
f00f_bug	: no
coma_bug	: no
fpu		: yes
fpu_exception	: yes
cpuid level	: 5
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm constant_tsc pni monitor ds_cpl cid cx16 xtpr
bogomips	: 6400.75
clflush size	: 64
$ cat /proc/meminfo
MemTotal:      8309240 kB
MemFree:       8216036 kB
Buffers:         11888 kB
Cached:          45200 kB
SwapCached:          0 kB
Active:          42324 kB
Inactive:        26008 kB
HighTotal:     7470896 kB
HighFree:      7405504 kB
LowTotal:       838344 kB
LowFree:        810532 kB
SwapTotal:    16386260 kB
SwapFree:     16386260 kB
Dirty:               4 kB
Writeback:           0 kB
AnonPages:       11232 kB
Mapped:           5828 kB
Slab:            11916 kB
SReclaimable:     5604 kB
SUnreclaim:       6312 kB
PageTables:        896 kB
NFS_Unstable:        0 kB
Bounce:              0 kB
CommitLimit:  20540880 kB
Committed_AS:    33540 kB
VmallocTotal:   118776 kB
VmallocUsed:      8324 kB
VmallocChunk:   110408 kB
HugePages_Total:     0
HugePages_Free:      0
HugePages_Rsvd:      0
Hugepagesize:     2048 kB

Thanks,
-- 
Hidehiro Kawai
Hitachi, Ltd., Systems Development Laboratory


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/4] coredump: add an interface to control the core dump routine
  2007-03-30 10:29           ` Kawai, Hidehiro
@ 2007-03-30 16:10             ` Andrew Morton
  2007-03-31 13:03             ` David Howells
  1 sibling, 0 replies; 20+ messages in thread
From: Andrew Morton @ 2007-03-30 16:10 UTC (permalink / raw)
  To: Kawai, Hidehiro
  Cc: David Howells, Pavel Machek, kernel list, Robin Holt, Alan Cox,
	Masami Hiramatsu, sugita, Satoshi OSHIMA, Hideo AOKI

On Fri, 30 Mar 2007 19:29:23 +0900 "Kawai, Hidehiro" <hidehiro.kawai.ez@hitachi.com> wrote:

> > core_pattern:
> >   ...
> > . If the first character of the pattern is a '|', the kernel will treat
> >   the rest of the pattern as a command to run.  The core dump will be
> >   written to the standard input of that program instead of to a file.
> 
> I think dumping core over a pipe is almost good.  Filtering and writing
> out a core by a separete userspace program can be reliable because it is
> independent of the failed user process.  But I have one concern; data
> transfer over a pipe is slow.  It took 7 seconds to transfer 2GB
> anonymous shared memory (detailed at the last of this mail).
> 
> In the case of dumping hundreds processes which share giga bytes memory,
> it will take a few minutes even if the huge shared memory is not written
> to a disk.  If a user wants to restart the failed application as soon
> as possible to keep downtime to a minimum, this extra transfer time will
> be a barrier.  So I think in-kernel filtering is still needed.

Yes, I agree - I don't think we presently have a way of avoiding having
to send all of that uninteresting data down the pipe.

One may, however, be able to play tricks with /proc/<pid>/mem from within
the corefile-generating program: select the vmas which are to be dumped,
read only those ones.  I don't know how practical that would be.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/4] coredump: add an interface to control the core dump routine
  2007-03-30 10:29           ` Kawai, Hidehiro
  2007-03-30 16:10             ` Andrew Morton
@ 2007-03-31 13:03             ` David Howells
  1 sibling, 0 replies; 20+ messages in thread
From: David Howells @ 2007-03-31 13:03 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Kawai, Hidehiro, Pavel Machek, kernel list, Robin Holt, Alan Cox,
	Masami Hiramatsu, sugita, Satoshi OSHIMA, Hideo AOKI

Andrew Morton <akpm@linux-foundation.org> wrote:

> Yes, I agree - I don't think we presently have a way of avoiding having
> to send all of that uninteresting data down the pipe.

Have the kernel fork and exec a debug program "just in time" with the dying
process ptrace-attached in advance.  That program could then use ptrace() or
whatever to do the coredump.  I think Windows does something like that.

David

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH 1/4] coredump: add an interface to control the core dump routine
  2007-02-16 13:34 [PATCH 0/4] coredump: core dump masking support v3 Kawai, Hidehiro
@ 2007-02-16 13:39 ` Kawai, Hidehiro
  0 siblings, 0 replies; 20+ messages in thread
From: Kawai, Hidehiro @ 2007-02-16 13:39 UTC (permalink / raw)
  To: Andrew Morton, kernel list
  Cc: Kawai, Hidehiro, Pavel Machek, Robin Holt, dhowells, Alan Cox,
	Masami Hiramatsu, sugita, Satoshi OSHIMA, Hideo AOKI@redhat

This patch adds an interface to set/reset a flag which determines
anonymous shared memory segments should be dumped or not when a core
file is generated.

/proc/<pid>/coredump_omit_anonymous_shared file is provided to access
the flag. You can change the flag status for a particular process by
writing to or reading from the file.

The flag status is inherited to the child process when it is created.

The flag is stored into coredump_omit_anon_shared member of mm_struct,
which shares bytes with dumpable member because these two are adjacent
bit fields. In order to avoid write-write race between the two, we use
a global spin lock.

smp_wmb() at updating dumpable is removed because set_dumpable()
includes a pair of spin lock and unlock which has the effect of
memory barrier.

Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
---
 fs/exec.c             |   10 ++--
 fs/proc/base.c        |   99 ++++++++++++++++++++++++++++++++++++++++
 include/linux/sched.h |   33 +++++++++++++
 kernel/fork.c         |    3 +
 kernel/sys.c          |   62 ++++++++-----------------
 security/commoncap.c  |    2 
 security/dummy.c      |    2 
 7 files changed, 164 insertions(+), 47 deletions(-)

Index: linux-2.6.20-mm1/fs/proc/base.c
===================================================================
--- linux-2.6.20-mm1.orig/fs/proc/base.c
+++ linux-2.6.20-mm1/fs/proc/base.c
@@ -74,6 +74,7 @@
 #include <linux/poll.h>
 #include <linux/nsproxy.h>
 #include <linux/oom.h>
+#include <linux/elf.h>
 #include "internal.h"
 
 /* NOTE:
@@ -1753,6 +1754,100 @@ static const struct inode_operations pro
 
 #endif
 
+#if defined(USE_ELF_CORE_DUMP) && defined(CONFIG_ELF_CORE)
+static ssize_t proc_coredump_omit_anon_shared_read(struct file *file,
+						   char __user *buf,
+						   size_t count,
+						   loff_t *ppos)
+{
+	struct task_struct *task = get_proc_task(file->f_dentry->d_inode);
+	struct mm_struct *mm;
+	char buffer[PROC_NUMBUF];
+	size_t len;
+	loff_t __ppos = *ppos;
+	int ret;
+
+	ret = -ESRCH;
+	if (!task)
+		goto out_no_task;
+
+	ret = 0;
+	mm = get_task_mm(task);
+	if (!mm)
+		goto out_no_mm;
+
+	len = snprintf(buffer, sizeof(buffer), "%u\n",
+		       mm->coredump_omit_anon_shared);
+	if (__ppos >= len)
+		goto out;
+	if (count > len - __ppos)
+		count = len - __ppos;
+
+	ret = -EFAULT;
+	if (copy_to_user(buf, buffer + __ppos, count))
+		goto out;
+
+	ret = count;
+	*ppos = __ppos + count;
+
+ out:
+	mmput(mm);
+ out_no_mm:
+	put_task_struct(task);
+ out_no_task:
+	return ret;
+}
+
+static ssize_t proc_coredump_omit_anon_shared_write(struct file *file,
+						    const char __user *buf,
+						    size_t count,
+						    loff_t *ppos)
+{
+	struct task_struct *task;
+	struct mm_struct *mm;
+	char buffer[PROC_NUMBUF], *end;
+	unsigned int val;
+	int ret;
+
+	ret = -EFAULT;
+	memset(buffer, 0, sizeof(buffer));
+	if (count > sizeof(buffer) - 1)
+		count = sizeof(buffer) - 1;
+	if (copy_from_user(buffer, buf, count))
+		goto out_no_task;
+
+	ret = -EINVAL;
+	val = (unsigned int)simple_strtoul(buffer, &end, 0);
+	if (*end == '\n')
+		end++;
+	if (end - buffer == 0)
+		goto out_no_task;
+
+	ret = -ESRCH;
+	task = get_proc_task(file->f_dentry->d_inode);
+	if (!task)
+		goto out_no_task;
+
+	ret = end - buffer;
+	mm = get_task_mm(task);
+	if (!mm)
+		goto out_no_mm;
+
+	set_coredump_omit_anon_shared(mm, (val != 0));
+
+	mmput(mm);
+ out_no_mm:
+	put_task_struct(task);
+ out_no_task:
+	return ret;
+}
+
+static struct file_operations proc_coredump_omit_anon_shared_operations = {
+	.read		= proc_coredump_omit_anon_shared_read,
+	.write		= proc_coredump_omit_anon_shared_write,
+};
+#endif
+
 /*
  * /proc/self:
  */
@@ -1972,6 +2067,10 @@ static struct pid_entry tgid_base_stuff[
 #ifdef CONFIG_FAULT_INJECTION
 	REG("make-it-fail", S_IRUGO|S_IWUSR, fault_inject),
 #endif
+#if defined(USE_ELF_CORE_DUMP) && defined(CONFIG_ELF_CORE)
+	REG("coredump_omit_anonymous_shared", S_IRUGO|S_IWUSR,
+	    coredump_omit_anon_shared),
+#endif
 #ifdef CONFIG_TASK_IO_ACCOUNTING
 	INF("io",	S_IRUGO, pid_io_accounting),
 #endif
Index: linux-2.6.20-mm1/include/linux/sched.h
===================================================================
--- linux-2.6.20-mm1.orig/include/linux/sched.h
+++ linux-2.6.20-mm1/include/linux/sched.h
@@ -366,7 +366,13 @@ struct mm_struct {
 	unsigned int token_priority;
 	unsigned int last_interval;
 
+	/*
+	 * Writing to these bit fields can cause race condition.  To avoid
+	 * the race, use dump_bits_lock.  You may also use set_dumpable() or
+	 * set_coredump_*() macros to set a value.
+	 */
 	unsigned char dumpable:2;
+	unsigned char coredump_omit_anon_shared:1;  /* don't dump anon shm */
 
 	/* coredumping support */
 	int core_waiters;
@@ -1721,6 +1727,33 @@ static inline void inc_syscw(struct task
 }
 #endif
 
+#include <linux/elf.h>
+/*
+ * These macros are used to protect dumpable and coredump_omit_anon_shared bit
+ * fields in mm_struct from write race between the two.
+ */
+extern spinlock_t dump_bits_lock;
+#define __set_dump_bits(dest, val) \
+	do {					\
+		spin_lock(&dump_bits_lock);	\
+		(dest) = (val);			\
+		spin_unlock(&dump_bits_lock);	\
+	} while (0)
+
+#if defined(USE_ELF_CORE_DUMP) && defined(CONFIG_ELF_CORE)
+# define set_dumpable(mm, val) \
+	__set_dump_bits((mm)->dumpable, val)
+# define set_coredump_omit_anon_shared(mm, val) \
+	__set_dump_bits((mm)->coredump_omit_anon_shared, val)
+#else
+# define set_dumpable(mm, val) \
+	do {				\
+		(mm)->dumpable = (val);	\
+		smp_wmb();		\
+	} while (0)
+# define set_coredump_omit_anon_shared(mm, val)
+#endif
+
 #endif /* __KERNEL__ */
 
 #endif
Index: linux-2.6.20-mm1/fs/exec.c
===================================================================
--- linux-2.6.20-mm1.orig/fs/exec.c
+++ linux-2.6.20-mm1/fs/exec.c
@@ -62,6 +62,8 @@ int core_uses_pid;
 char core_pattern[128] = "core";
 int suid_dumpable = 0;
 
+DEFINE_SPINLOCK(dump_bits_lock);
+
 EXPORT_SYMBOL(suid_dumpable);
 /* The maximal length of core_pattern is also specified in sysctl.c */
 
@@ -853,9 +855,9 @@ int flush_old_exec(struct linux_binprm *
 	current->sas_ss_sp = current->sas_ss_size = 0;
 
 	if (current->euid == current->uid && current->egid == current->gid)
-		current->mm->dumpable = 1;
+		set_dumpable(current->mm, 1);
 	else
-		current->mm->dumpable = suid_dumpable;
+		set_dumpable(current->mm, suid_dumpable);
 
 	name = bprm->filename;
 
@@ -883,7 +885,7 @@ int flush_old_exec(struct linux_binprm *
 	    file_permission(bprm->file, MAY_READ) ||
 	    (bprm->interp_flags & BINPRM_FLAGS_ENFORCE_NONDUMP)) {
 		suid_keys(current);
-		current->mm->dumpable = suid_dumpable;
+		set_dumpable(current->mm, suid_dumpable);
 	}
 
 	/* An exec changes our domain. We are no longer part of the thread
@@ -1477,7 +1479,7 @@ int do_coredump(long signr, int exit_cod
 		flag = O_EXCL;		/* Stop rewrite attacks */
 		current->fsuid = 0;	/* Dump root private */
 	}
-	mm->dumpable = 0;
+	set_dumpable(mm, 0);
 
 	retval = coredump_wait(exit_code);
 	if (retval < 0)
Index: linux-2.6.20-mm1/kernel/fork.c
===================================================================
--- linux-2.6.20-mm1.orig/kernel/fork.c
+++ linux-2.6.20-mm1/kernel/fork.c
@@ -333,6 +333,9 @@ static struct mm_struct * mm_init(struct
 	atomic_set(&mm->mm_count, 1);
 	init_rwsem(&mm->mmap_sem);
 	INIT_LIST_HEAD(&mm->mmlist);
+	/* don't need to use set_coredump_omit_anon_shared() */
+	mm->coredump_omit_anon_shared =
+		(current->mm) ?	current->mm->coredump_omit_anon_shared : 0;
 	mm->core_waiters = 0;
 	mm->nr_ptes = 0;
 	set_mm_counter(mm, file_rss, 0);
Index: linux-2.6.20-mm1/kernel/sys.c
===================================================================
--- linux-2.6.20-mm1.orig/kernel/sys.c
+++ linux-2.6.20-mm1/kernel/sys.c
@@ -1022,10 +1022,8 @@ asmlinkage long sys_setregid(gid_t rgid,
 		else
 			return -EPERM;
 	}
-	if (new_egid != old_egid) {
-		current->mm->dumpable = suid_dumpable;
-		smp_wmb();
-	}
+	if (new_egid != old_egid)
+		set_dumpable(current->mm, suid_dumpable);
 	if (rgid != (gid_t) -1 ||
 	    (egid != (gid_t) -1 && egid != old_rgid))
 		current->sgid = new_egid;
@@ -1052,16 +1050,12 @@ asmlinkage long sys_setgid(gid_t gid)
 		return retval;
 
 	if (capable(CAP_SETGID)) {
-		if (old_egid != gid) {
-			current->mm->dumpable = suid_dumpable;
-			smp_wmb();
-		}
+		if (old_egid != gid)
+			set_dumpable(current->mm, suid_dumpable);
 		current->gid = current->egid = current->sgid = current->fsgid = gid;
 	} else if ((gid == current->gid) || (gid == current->sgid)) {
-		if (old_egid != gid) {
-			current->mm->dumpable = suid_dumpable;
-			smp_wmb();
-		}
+		if (old_egid != gid)
+			set_dumpable(current->mm, suid_dumpable);
 		current->egid = current->fsgid = gid;
 	}
 	else
@@ -1089,10 +1083,8 @@ static int set_user(uid_t new_ruid, int 
 
 	switch_uid(new_user);
 
-	if (dumpclear) {
-		current->mm->dumpable = suid_dumpable;
-		smp_wmb();
-	}
+	if (dumpclear)
+		set_dumpable(current->mm, suid_dumpable);
 	current->uid = new_ruid;
 	return 0;
 }
@@ -1145,10 +1137,8 @@ asmlinkage long sys_setreuid(uid_t ruid,
 	if (new_ruid != old_ruid && set_user(new_ruid, new_euid != old_euid) < 0)
 		return -EAGAIN;
 
-	if (new_euid != old_euid) {
-		current->mm->dumpable = suid_dumpable;
-		smp_wmb();
-	}
+	if (new_euid != old_euid)
+		set_dumpable(current->mm, suid_dumpable);
 	current->fsuid = current->euid = new_euid;
 	if (ruid != (uid_t) -1 ||
 	    (euid != (uid_t) -1 && euid != old_ruid))
@@ -1195,10 +1185,8 @@ asmlinkage long sys_setuid(uid_t uid)
 	} else if ((uid != current->uid) && (uid != new_suid))
 		return -EPERM;
 
-	if (old_euid != uid) {
-		current->mm->dumpable = suid_dumpable;
-		smp_wmb();
-	}
+	if (old_euid != uid)
+		set_dumpable(current->mm, suid_dumpable);
 	current->fsuid = current->euid = uid;
 	current->suid = new_suid;
 
@@ -1240,10 +1228,8 @@ asmlinkage long sys_setresuid(uid_t ruid
 			return -EAGAIN;
 	}
 	if (euid != (uid_t) -1) {
-		if (euid != current->euid) {
-			current->mm->dumpable = suid_dumpable;
-			smp_wmb();
-		}
+		if (euid != current->euid)
+			set_dumpable(current->mm, suid_dumpable);
 		current->euid = euid;
 	}
 	current->fsuid = current->euid;
@@ -1290,10 +1276,8 @@ asmlinkage long sys_setresgid(gid_t rgid
 			return -EPERM;
 	}
 	if (egid != (gid_t) -1) {
-		if (egid != current->egid) {
-			current->mm->dumpable = suid_dumpable;
-			smp_wmb();
-		}
+		if (egid != current->egid)
+			set_dumpable(current->mm, suid_dumpable);
 		current->egid = egid;
 	}
 	current->fsgid = current->egid;
@@ -1336,10 +1320,8 @@ asmlinkage long sys_setfsuid(uid_t uid)
 	if (uid == current->uid || uid == current->euid ||
 	    uid == current->suid || uid == current->fsuid || 
 	    capable(CAP_SETUID)) {
-		if (uid != old_fsuid) {
-			current->mm->dumpable = suid_dumpable;
-			smp_wmb();
-		}
+		if (uid != old_fsuid)
+			set_dumpable(current->mm, suid_dumpable);
 		current->fsuid = uid;
 	}
 
@@ -1365,10 +1347,8 @@ asmlinkage long sys_setfsgid(gid_t gid)
 	if (gid == current->gid || gid == current->egid ||
 	    gid == current->sgid || gid == current->fsgid || 
 	    capable(CAP_SETGID)) {
-		if (gid != old_fsgid) {
-			current->mm->dumpable = suid_dumpable;
-			smp_wmb();
-		}
+		if (gid != old_fsgid)
+			set_dumpable(current->mm, suid_dumpable);
 		current->fsgid = gid;
 		key_fsgid_changed(current);
 		proc_id_connector(current, PROC_EVENT_GID);
@@ -2163,7 +2143,7 @@ asmlinkage long sys_prctl(int option, un
 				error = -EINVAL;
 				break;
 			}
-			current->mm->dumpable = arg2;
+			set_dumpable(current->mm, arg2);
 			break;
 
 		case PR_SET_UNALIGN:
Index: linux-2.6.20-mm1/security/commoncap.c
===================================================================
--- linux-2.6.20-mm1.orig/security/commoncap.c
+++ linux-2.6.20-mm1/security/commoncap.c
@@ -244,7 +244,7 @@ void cap_bprm_apply_creds (struct linux_
 
 	if (bprm->e_uid != current->uid || bprm->e_gid != current->gid ||
 	    !cap_issubset (new_permitted, current->cap_permitted)) {
-		current->mm->dumpable = suid_dumpable;
+		set_dumpable(current->mm, suid_dumpable);
 
 		if (unsafe & ~LSM_UNSAFE_PTRACE_CAP) {
 			if (!capable(CAP_SETUID)) {
Index: linux-2.6.20-mm1/security/dummy.c
===================================================================
--- linux-2.6.20-mm1.orig/security/dummy.c
+++ linux-2.6.20-mm1/security/dummy.c
@@ -130,7 +130,7 @@ static void dummy_bprm_free_security (st
 static void dummy_bprm_apply_creds (struct linux_binprm *bprm, int unsafe)
 {
 	if (bprm->e_uid != current->uid || bprm->e_gid != current->gid) {
-		current->mm->dumpable = suid_dumpable;
+		set_dumpable(current->mm, suid_dumpable);
 
 		if ((unsafe & ~LSM_UNSAFE_PTRACE_CAP) && !capable(CAP_SETUID)) {
 			bprm->e_uid = current->uid;



^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2007-03-31 13:03 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2007-03-02  4:41 [PATCH 0/4] coredump: core dump masking support v4 Kawai, Hidehiro
2007-03-02  4:47 ` [PATCH 1/4] coredump: add an interface to control the core dump routine Kawai, Hidehiro
2007-03-02  9:34   ` Pavel Machek
2007-03-26 13:02     ` Kawai, Hidehiro
2007-03-29 10:49       ` Pavel Machek
2007-03-29 19:16       ` David Howells
2007-03-29 21:17         ` Andrew Morton
2007-03-30 10:29           ` Kawai, Hidehiro
2007-03-30 16:10             ` Andrew Morton
2007-03-31 13:03             ` David Howells
2007-03-02  4:49 ` [PATCH 2/4] coredump: ELF: enable to omit anonymous shared memory Kawai, Hidehiro
2007-03-02  4:50 ` [PATCH 3/4] coredump: ELF-FDPIC: " Kawai, Hidehiro
2007-03-02  4:51 ` [PATCH 4/4] coredump: documentation for proc entry Kawai, Hidehiro
2007-03-02  9:35   ` Pavel Machek
2007-03-20 11:11     ` Kawai, Hidehiro
2007-03-15 20:37 ` [PATCH 0/4] coredump: core dump masking support v4 Andrew Morton
2007-03-23 13:13   ` Kawai, Hidehiro
2007-03-28 12:37     ` Kawai, Hidehiro
2007-03-28 17:32       ` Andrew Morton
  -- strict thread matches above, loose matches on Subject: below --
2007-02-16 13:34 [PATCH 0/4] coredump: core dump masking support v3 Kawai, Hidehiro
2007-02-16 13:39 ` [PATCH 1/4] coredump: add an interface to control the core dump routine Kawai, Hidehiro

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.