linux-kernel.vger.kernel.org archive mirror
* [PATCH] multithreaded coredumps for elf executables
@ 2002-03-15 11:37 Vamsi Krishna S .
  2002-03-19 15:29 ` Pavel Machek
  0 siblings, 1 reply; 18+ messages in thread
From: Vamsi Krishna S . @ 2002-03-15 11:37 UTC (permalink / raw)
  To: linux-kernel
  Cc: alan, marcelo, dan, tachino, jefreyr, mgross, vamsi_krishna,
	richardj_moore, hanharat, bsuparna, bharata, asit.k.mallick,
	david.p.howell, tony.luck, sunil.saxena


Here is a kernel patch to support multithreaded coredumps being worked on
by Mark Gross (Intel, mgross@unix-os.sc.intel.com) and 
Vamsi Krishna (IBM, vamsi_krishna@in.ibm.com).

Multi-threaded core dump patch for 2.4.17:
- Multithreaded core dump functionality is enabled by a new sysctl,
  core_dumps_threads (0 = off, 1 = on).

- The core dump is started by the first thread that gets the signal.

- Threads are located by walking the entire task list looking for tasks
  with a matching mm, as that seems to be the only reliable way to locate 
  the other threads of a given task. In fact, IMO this is the only way 
  until all user-space libraries migrate to using the thread groups 
  provided by the Linux kernel (CLONE_THREAD).

- Other threads are prevented from executing while the core dump is in 
  progress, to improve the accuracy of the dumps. This is done without 
  changing the state of the task: we set cpus_allowed in the task struct 
  to 0 to stop a task from being scheduled, and reset it to -1 to
  resume execution. This has the advantage of not depending on user
  space at all for correct functioning. IMO sending SIGSTOP to stop 
  the other threads does not work if the process is being run under a 
  debugger. The only possible issue with using cpus_allowed is that 
  we could lose task affinities once a core dump is taken. However, 
  this is not a big deal, as the task is going to die fairly soon 
  anyway, which is why the dump was taken in the first place.

- Support for SSE registers in the core dump.

- Code cleanups/reorg - breakup into smaller functions. The main function
  elf_core_dump() was reorganized/cleaned up by moving the filling of the
  elfhdr, prstatus, psinfo and notes into separate functions, to make this 
  very long function a little more readable and to share some code with 
  the function that dumps the status of the other threads.

- Easy to port to other architectures. A port just needs
  ELF_CORE_COPY_TASK_REGS - to copy task-specific registers
  ELF_CORE_COPY_FPREGS - to copy floating point registers and
  ELF_CORE_COPY_XFPREGS - to copy extended FP registers (SSE), if present
  ELF_CORE_SYNC - to sync up the FPU state of other processors on SMP 
                  systems, if needed by a particular architecture. 
                  Read the patch for more details.

- We started with the tcore patches by John Jones and Jason Villarreal 
  as a base, which were then heavily reworked. This patch is now entirely 
  different from theirs in pretty much all aspects.

Current TODO list:
- Maybe remove reschedule_other_cpus() from suspend_other_threads() and do 
  this as part of ELF_CORE_SYNC. Rescheduling the other CPUs the way 
  the current patch works may be overkill for accurate core files.  
  Any thoughts?
- Port to 2.5.x, especially the logic that stops other threads from
  executing while dumping is in progress.
- Make the loop that looks for other threads a little shorter by counting
  the number of tasks found and breaking out of the for_each_task loop
  when it equals current->mm->mm_users.

Some usage notes on this patch:
GDB 5.1 works with the core files produced, but only on Red Hat 7.2, and
only if the /lib/i686/libpthread.so library is hidden.  It turns out that
on IA32 Red Hat there exist two libpthread.so files.  If
/lib/i686/libpthread.so is loaded, then gdb post-mortem debugging will not
work.  We don't understand what's going on here, but it's real.  Hide
/lib/i686/libpthread.so so that /lib/libpthread.so gets loaded at debug
time, and then the debugger will work with the core file.  Any insights
into this are very much welcome.  This behavior is very mysterious to us.

Thanks to Bharata B Rao (IBM) for helping with capturing FPU registers and
with testing, and to Suparna Bhattacharya (IBM) for design discussions.

Thanks to Tony Luck (Intel) and Jun Nakajima (Intel) for helping with the
review and design of the suspend_other_threads implementation.

This is currently i386-only; it has been unit tested on 1P-P4, 2P-P4,
and 4P-PIII systems. I haven't seen any failures so far, YMMV.

The patch is against kernel version 2.4.17. We will port it to the latest
kernel versions if there is any interest.

Regards.. Vamsi.

Vamsi Krishna S.
Linux Technology Center,
IBM Software Lab, Bangalore.
Ph: +91 80 5044959
Internet: vamsi_krishna@in.ibm.com

-- patch here--

diff -urN -X /home/vamsi/dontdiff /usr/src/2417-pure/arch/i386/kernel/i387.c 2417-tcore/arch/i386/kernel/i387.c
--- /usr/src/2417-pure/arch/i386/kernel/i387.c	Fri Feb 23 23:39:08 2001
+++ 2417-tcore/arch/i386/kernel/i387.c	Fri Mar 15 11:52:28 2002
@@ -520,3 +520,42 @@
 
 	return fpvalid;
 }
+
+int dump_task_fpu( struct task_struct *tsk, struct user_i387_struct *fpu )
+{
+        int fpvalid;
+
+        fpvalid = tsk->used_math;
+        if ( fpvalid ) {
+                if (tsk == current) unlazy_fpu( tsk );
+                if ( cpu_has_fxsr ) {
+                        copy_fpu_fxsave( tsk, fpu );
+                } else {
+                        copy_fpu_fsave( tsk, fpu );
+                }
+        }
+
+        return fpvalid;
+}
+
+int dump_task_extended_fpu( struct task_struct *tsk, struct user_fxsr_struct *fpu )
+{
+        int fpvalid;
+
+        fpvalid = tsk->used_math && cpu_has_fxsr;
+        if ( fpvalid ) {
+                if (tsk == current) unlazy_fpu( tsk );
+                memcpy( fpu, &tsk->thread.i387.fxsave,
+                        sizeof(struct user_fxsr_struct) );
+        }
+
+        return fpvalid;
+}
+
+#ifdef CONFIG_SMP
+void dump_smp_unlazy_fpu(void)
+{
+	unlazy_fpu(current);
+	return;
+}
+#endif
diff -urN -X /home/vamsi/dontdiff /usr/src/2417-pure/arch/i386/kernel/process.c 2417-tcore/arch/i386/kernel/process.c
--- /usr/src/2417-pure/arch/i386/kernel/process.c	Fri Oct  5 07:12:54 2001
+++ 2417-tcore/arch/i386/kernel/process.c	Fri Mar 15 11:52:28 2002
@@ -642,6 +642,19 @@
 	dump->u_fpvalid = dump_fpu (regs, &dump->i387);
 }
 
+/* 
+ * Capture the user space registers if the task is not running (in user space)
+ */
+int dump_task_regs(struct task_struct *tsk, struct pt_regs *regs)
+{
+	*regs = *(struct pt_regs *)((unsigned long)tsk + THREAD_SIZE - sizeof(struct pt_regs));
+	regs->xcs &= 0xffff;
+	regs->xds &= 0xffff;
+	regs->xes &= 0xffff;
+	regs->xss &= 0xffff;
+	return 1;
+}
+
 /*
  * This special macro can be used to load a debugging register
  */
diff -urN -X /home/vamsi/dontdiff /usr/src/2417-pure/fs/binfmt_elf.c 2417-tcore/fs/binfmt_elf.c
--- /usr/src/2417-pure/fs/binfmt_elf.c	Fri Dec 21 23:11:55 2001
+++ 2417-tcore/fs/binfmt_elf.c	Fri Mar 15 11:54:49 2002
@@ -31,6 +31,7 @@
 #include <linux/init.h>
 #include <linux/highuid.h>
 #include <linux/smp_lock.h>
+#include <linux/smp.h>
 #include <linux/compiler.h>
 #include <linux/highmem.h>
 
@@ -960,7 +961,7 @@
 /* #define DEBUG */
 
 #ifdef DEBUG
-static void dump_regs(const char *str, elf_greg_t *r)
+static void dump_regs(const char *str, elf_gregset_t *r)
 {
 	int i;
 	static const char *regs[] = { "ebx", "ecx", "edx", "esi", "edi", "ebp",
@@ -1008,6 +1009,255 @@
 #define DUMP_SEEK(off)	\
 	if (!dump_seek(file, (off))) \
 		goto end_coredump;
+
+static inline void fill_elf_header(struct elfhdr *elf, int segs)
+{
+	memcpy(elf->e_ident, ELFMAG, SELFMAG);
+	elf->e_ident[EI_CLASS] = ELF_CLASS;
+	elf->e_ident[EI_DATA] = ELF_DATA;
+	elf->e_ident[EI_VERSION] = EV_CURRENT;
+	memset(elf->e_ident+EI_PAD, 0, EI_NIDENT-EI_PAD);
+
+	elf->e_type = ET_CORE;
+	elf->e_machine = ELF_ARCH;
+	elf->e_version = EV_CURRENT;
+	elf->e_entry = 0;
+	elf->e_phoff = sizeof(struct elfhdr);
+	elf->e_shoff = 0;
+	elf->e_flags = 0;
+	elf->e_ehsize = sizeof(struct elfhdr);
+	elf->e_phentsize = sizeof(struct elf_phdr);
+	elf->e_phnum = segs;
+	elf->e_shentsize = 0;
+	elf->e_shnum = 0;
+	elf->e_shstrndx = 0;
+	return;
+}
+
+static inline void fill_elf_note_phdr(struct elf_phdr *phdr, int sz, off_t offset)
+{
+	phdr->p_type = PT_NOTE;
+	phdr->p_offset = offset;
+	phdr->p_vaddr = 0;
+	phdr->p_paddr = 0;
+	phdr->p_filesz = sz;
+	phdr->p_memsz = 0;
+	phdr->p_flags = 0;
+	phdr->p_align = 0;
+	return;
+}
+
+static inline void fill_note(struct memelfnote *note, const char *name, int type, 
+		unsigned int sz, void *data)
+{
+	note->name = name;
+	note->type = type;
+	note->datasz = sz;
+	note->data = data;
+	return;
+}
+
+/*
+ * fill up all the fields in prstatus from the given task struct, except registers
+ * which need to be filled up separately.
+ */
+static inline void fill_prstatus(struct elf_prstatus *prstatus, struct task_struct *p, long signr) 
+{
+	prstatus->pr_info.si_signo = prstatus->pr_cursig = signr;
+	prstatus->pr_sigpend = p->pending.signal.sig[0];
+	prstatus->pr_sighold = p->blocked.sig[0];
+	prstatus->pr_pid = p->pid;
+	prstatus->pr_ppid = p->p_pptr->pid;
+	prstatus->pr_pgrp = p->pgrp;
+	prstatus->pr_sid = p->session;
+	prstatus->pr_utime.tv_sec = CT_TO_SECS(p->times.tms_utime);
+	prstatus->pr_utime.tv_usec = CT_TO_USECS(p->times.tms_utime);
+	prstatus->pr_stime.tv_sec = CT_TO_SECS(p->times.tms_stime);
+	prstatus->pr_stime.tv_usec = CT_TO_USECS(p->times.tms_stime);
+	prstatus->pr_cutime.tv_sec = CT_TO_SECS(p->times.tms_cutime);
+	prstatus->pr_cutime.tv_usec = CT_TO_USECS(p->times.tms_cutime);
+	prstatus->pr_cstime.tv_sec = CT_TO_SECS(p->times.tms_cstime);
+	prstatus->pr_cstime.tv_usec = CT_TO_USECS(p->times.tms_cstime);
+	return;
+}
+
+static inline void fill_psinfo(struct elf_prpsinfo *psinfo, struct task_struct *p)
+{
+	int i;
+	
+	psinfo->pr_pid = p->pid;
+	psinfo->pr_ppid = p->p_pptr->pid;
+	psinfo->pr_pgrp = p->pgrp;
+	psinfo->pr_sid = p->session;
+
+	i = p->state ? ffz(~p->state) + 1 : 0;
+	psinfo->pr_state = i;
+	psinfo->pr_sname = (i < 0 || i > 5) ? '.' : "RSDZTD"[i];
+	psinfo->pr_zomb = psinfo->pr_sname == 'Z';
+	psinfo->pr_nice = p->nice;
+	psinfo->pr_flag = p->flags;
+	psinfo->pr_uid = NEW_TO_OLD_UID(p->uid);
+	psinfo->pr_gid = NEW_TO_OLD_GID(p->gid);
+	strncpy(psinfo->pr_fname, p->comm, sizeof(psinfo->pr_fname));
+	return;
+}
+
+/*
+ * This is the variable that can be set in proc to determine if we want to
+ * dump a multithreaded core or not. A value of 1 means yes while any
+ * other value means no.
+ *
+ * It is located at /proc/sys/kernel/core_dumps_threads
+ */
+
+int core_dumps_threads = 0;
+
+/* Here is the structure in which status of each thread is captured. */
+struct elf_thread_status
+{
+	struct list_head list;
+	struct elf_prstatus prstatus;	/* NT_PRSTATUS */
+	elf_fpregset_t fpu;		/* NT_PRFPREG */
+	elf_fpxregset_t xfpu;		/* NT_PRXFPREG */
+	struct memelfnote notes[3];
+	int num_notes;
+};
+
+#ifdef CONFIG_SMP
+/*
+ * trivial function used for SMP CPU synchronization.
+ * It doesn't do anything.
+ */
+void do_nothing(void *var)
+{
+	return;
+}
+#endif
+
+/*
+ * Suspend execution of other threads belonging to the same multithreaded process 
+ * of current, ASAP.
+ *
+ * Sets the current->cpu_mask to the current cpu to avoid cpu migration during the dump.
+ * This cpu will also be the only cpu the other threads will be allowed to run after 
+ * coredump is completed. This seems to be needed to fix some SMP races.  This still
+ * needs some more thought though this solution works.
+ *
+ * TODO: Rethink the logic used to find other threads.
+ */
+static unsigned long suspend_other_threads(void)
+{
+	struct task_struct *p;
+
+	/*
+	 * brute force method uses the runqueue_lock contention.  Grab this lock, and
+	 * force a schedule call on all the other CPU's to get them spinning.
+	 */
+	read_lock(&tasklist_lock);
+	spin_lock(&runqueue_lock);
+
+	task_lock(current);
+	current->cpus_allowed = current->cpus_runnable; /* prevent cpu migration */
+	task_unlock(current);
+			
+	reschedule_other_cpus();
+		
+	for_each_task(p)
+		if (current->mm == p->mm && current != p) {
+			task_lock(p);
+			/* 
+			 * force yield and keep waking processes from getting scheduled
+			 * in. The following will result in these processes getting swapped out and
+			 * not swapped in by the scheduler if they have been sleeping.
+			 */
+			p->cpus_allowed = 0UL;
+			task_unlock(p);
+		}
+		
+	spin_unlock(&runqueue_lock);
+	
+	/* let them all run again.. */
+	read_unlock(&tasklist_lock);
+
+	/* 
+	 * now we synchronize on all the CPUs to make sure
+	 * none of the other thread processes are running in 
+	 * user space before we proceed.
+	 *
+	 * We have a race from the time the runqueue_lock is released and the 
+	 * time __switch_to gets called that can result in bogus FPU/XFPU register 
+	 * data in the core file, so we use ELF_CORE_SYNC with smp_call_function
+	 * which on SMP evaluates to a call which grabs the FPU state.
+	 */
+	smp_call_function(ELF_CORE_SYNC, NULL, 1,1);
+
+	return current->cpus_allowed;
+}
+
+/*
+ * resume execution of other threads on the cpu given the cpu_mask.
+ */
+static void resume_other_threads(unsigned long current_cpu_mask)
+{
+	struct task_struct *p;
+
+	if(current_cpu_mask != current->cpus_runnable)
+		printk(KERN_WARNING "tcore: multithread core dump CPU affinity assumption violated"); /* BUG would be too harsh */
+
+	read_lock(&tasklist_lock);
+	for_each_task(p)
+		if (current->mm == p->mm && current != p) {
+			task_lock(p);			
+			p->cpus_allowed = current_cpu_mask;
+			task_unlock(p);
+		}
+	read_unlock(&tasklist_lock);
+
+	return;
+}
+
+/*
+ * In order to add the specific thread information for the elf file format,
+ * we need to keep a linked list of every threads pr_status and then
+ * create a single section for them in the final core file.
+ */
+static int elf_dump_thread_status(long signr, struct task_struct * p, struct list_head * thread_list)
+{
+
+	struct elf_thread_status *t;
+	int sz = 0;
+
+	t = kmalloc(sizeof(*t), GFP_KERNEL);
+	if (!t) {
+		printk(KERN_WARNING "Cannot allocate memory for thread status.\n");
+		return 0;
+	}
+
+	INIT_LIST_HEAD(&t->list);
+	t->num_notes = 0;
+
+	fill_prstatus(&t->prstatus, p, signr);
+	elf_core_copy_task_regs(p, &t->prstatus.pr_reg);	
+	fill_note(&t->notes[0], "CORE", NT_PRSTATUS, sizeof(t->prstatus), &(t->prstatus));
+	t->num_notes++;
+	sz += notesize(&t->notes[0]);
+
+	if ((t->prstatus.pr_fpvalid = elf_core_copy_task_fpregs(p, &t->fpu))) {
+		fill_note(&t->notes[1], "CORE", NT_PRFPREG, sizeof(t->fpu), &(t->fpu));
+		t->num_notes++;
+		sz += notesize(&t->notes[1]);
+	}
+
+	if (elf_core_copy_task_xfpregs(p, &t->xfpu)) {
+		fill_note(&t->notes[2], "LINUX", NT_PRXFPREG, sizeof(t->xfpu), &(t->xfpu));
+		t->num_notes++;
+		sz += notesize(&t->notes[2]);
+	}
+
+	list_add(&t->list, thread_list);
+	return sz;
+}
+
 /*
  * Actual dumper
  *
@@ -1026,12 +1276,32 @@
 	struct elfhdr elf;
 	off_t offset = 0, dataoff;
 	unsigned long limit = current->rlim[RLIMIT_CORE].rlim_cur;
-	int numnote = 4;
-	struct memelfnote notes[4];
+	int numnote = 5;
+	struct memelfnote notes[5];
 	struct elf_prstatus prstatus;	/* NT_PRSTATUS */
-	elf_fpregset_t fpu;		/* NT_PRFPREG */
 	struct elf_prpsinfo psinfo;	/* NT_PRPSINFO */
+ 	struct task_struct *p;
+ 	LIST_HEAD(thread_list);
+ 	struct list_head *t;
+	unsigned long cpu_mask = 0xFFFFFFFF;
+	elf_fpregset_t fpu;
+	elf_fpxregset_t xfpu;
+	int dump_threads = 0;
+	int thread_status_size = 0;
+	
+	/* now stop all vm operations */
+	down_write(&current->mm->mmap_sem);
+	segs = current->mm->map_count;
+
+ 	if (atomic_read(&current->mm->mm_users) != 1) {
+		dump_threads = core_dumps_threads;
+	}
 
+	/* First pause all related threaded processes */
+	if (dump_threads) {
+		cpu_mask = suspend_other_threads();
+	}
+		
 	/* first copy the parameters from user space */
 	memset(&psinfo, 0, sizeof(psinfo));
 	{
@@ -1049,34 +1319,30 @@
 
 	}
 
-	/* now stop all vm operations */
-	down_write(&current->mm->mmap_sem);
-	segs = current->mm->map_count;
+	if (dump_threads) {
+		/* capture the status of all other threads */
+		if (signr) {
+			read_lock(&tasklist_lock);
+			for_each_task(p)
+				if (current->mm == p->mm && current != p) {
+					int sz = elf_dump_thread_status(signr, p, &thread_list);
+					if (!sz) {
+						read_unlock(&tasklist_lock);
+						goto cleanup;
+					}
+					else
+						thread_status_size += sz;
+				}
+			read_unlock(&tasklist_lock);
+		}
+	} /* End if(dump_threads) */
 
 #ifdef DEBUG
 	printk("elf_core_dump: %d segs %lu limit\n", segs, limit);
 #endif
 
 	/* Set up header */
-	memcpy(elf.e_ident, ELFMAG, SELFMAG);
-	elf.e_ident[EI_CLASS] = ELF_CLASS;
-	elf.e_ident[EI_DATA] = ELF_DATA;
-	elf.e_ident[EI_VERSION] = EV_CURRENT;
-	memset(elf.e_ident+EI_PAD, 0, EI_NIDENT-EI_PAD);
-
-	elf.e_type = ET_CORE;
-	elf.e_machine = ELF_ARCH;
-	elf.e_version = EV_CURRENT;
-	elf.e_entry = 0;
-	elf.e_phoff = sizeof(elf);
-	elf.e_shoff = 0;
-	elf.e_flags = 0;
-	elf.e_ehsize = sizeof(elf);
-	elf.e_phentsize = sizeof(struct elf_phdr);
-	elf.e_phnum = segs+1;		/* Include notes */
-	elf.e_shentsize = 0;
-	elf.e_shnum = 0;
-	elf.e_shstrndx = 0;
+	fill_elf_header(&elf, segs+1); /* including notes section*/
 
 	fs = get_fs();
 	set_fs(KERNEL_DS);
@@ -1093,79 +1359,35 @@
 	 * with info from their /proc.
 	 */
 	memset(&prstatus, 0, sizeof(prstatus));
-
-	notes[0].name = "CORE";
-	notes[0].type = NT_PRSTATUS;
-	notes[0].datasz = sizeof(prstatus);
-	notes[0].data = &prstatus;
-	prstatus.pr_info.si_signo = prstatus.pr_cursig = signr;
-	prstatus.pr_sigpend = current->pending.signal.sig[0];
-	prstatus.pr_sighold = current->blocked.sig[0];
-	psinfo.pr_pid = prstatus.pr_pid = current->pid;
-	psinfo.pr_ppid = prstatus.pr_ppid = current->p_pptr->pid;
-	psinfo.pr_pgrp = prstatus.pr_pgrp = current->pgrp;
-	psinfo.pr_sid = prstatus.pr_sid = current->session;
-	prstatus.pr_utime.tv_sec = CT_TO_SECS(current->times.tms_utime);
-	prstatus.pr_utime.tv_usec = CT_TO_USECS(current->times.tms_utime);
-	prstatus.pr_stime.tv_sec = CT_TO_SECS(current->times.tms_stime);
-	prstatus.pr_stime.tv_usec = CT_TO_USECS(current->times.tms_stime);
-	prstatus.pr_cutime.tv_sec = CT_TO_SECS(current->times.tms_cutime);
-	prstatus.pr_cutime.tv_usec = CT_TO_USECS(current->times.tms_cutime);
-	prstatus.pr_cstime.tv_sec = CT_TO_SECS(current->times.tms_cstime);
-	prstatus.pr_cstime.tv_usec = CT_TO_USECS(current->times.tms_cstime);
+	fill_prstatus(&prstatus, current, signr);
+	fill_note(&notes[0], "CORE", NT_PRSTATUS, sizeof(prstatus), &prstatus);
 
 	/*
 	 * This transfers the registers from regs into the standard
 	 * coredump arrangement, whatever that is.
 	 */
-#ifdef ELF_CORE_COPY_REGS
-	ELF_CORE_COPY_REGS(prstatus.pr_reg, regs)
-#else
-	if (sizeof(elf_gregset_t) != sizeof(struct pt_regs))
-	{
-		printk("sizeof(elf_gregset_t) (%ld) != sizeof(struct pt_regs) (%ld)\n",
-			(long)sizeof(elf_gregset_t), (long)sizeof(struct pt_regs));
-	}
-	else
-		*(struct pt_regs *)&prstatus.pr_reg = *regs;
-#endif
+	elf_core_copy_regs(&prstatus.pr_reg, regs);
 
 #ifdef DEBUG
 	dump_regs("Passed in regs", (elf_greg_t *)regs);
 	dump_regs("prstatus regs", (elf_greg_t *)&prstatus.pr_reg);
 #endif
 
-	notes[1].name = "CORE";
-	notes[1].type = NT_PRPSINFO;
-	notes[1].datasz = sizeof(psinfo);
-	notes[1].data = &psinfo;
-	i = current->state ? ffz(~current->state) + 1 : 0;
-	psinfo.pr_state = i;
-	psinfo.pr_sname = (i < 0 || i > 5) ? '.' : "RSDZTD"[i];
-	psinfo.pr_zomb = psinfo.pr_sname == 'Z';
-	psinfo.pr_nice = current->nice;
-	psinfo.pr_flag = current->flags;
-	psinfo.pr_uid = NEW_TO_OLD_UID(current->uid);
-	psinfo.pr_gid = NEW_TO_OLD_GID(current->gid);
-	strncpy(psinfo.pr_fname, current->comm, sizeof(psinfo.pr_fname));
-
-	notes[2].name = "CORE";
-	notes[2].type = NT_TASKSTRUCT;
-	notes[2].datasz = sizeof(*current);
-	notes[2].data = current;
+	fill_psinfo(&psinfo, current);
+	fill_note(&notes[1], "CORE", NT_PRPSINFO, sizeof(psinfo), &psinfo);
+	
+	fill_note(&notes[2], "CORE", NT_TASKSTRUCT, sizeof(*current), current);
 
 	/* Try to dump the FPU. */
-	prstatus.pr_fpvalid = dump_fpu (regs, &fpu);
-	if (!prstatus.pr_fpvalid)
-	{
-		numnote--;
-	}
-	else
-	{
-		notes[3].name = "CORE";
-		notes[3].type = NT_PRFPREG;
-		notes[3].datasz = sizeof(fpu);
-		notes[3].data = &fpu;
+	if ((prstatus.pr_fpvalid = elf_core_copy_task_fpregs(current, &fpu))) {
+		fill_note(&notes[3], "CORE", NT_PRFPREG, sizeof(fpu), &fpu);
+	} else {
+		--numnote;
+	}
+	if (elf_core_copy_task_xfpregs(current, &xfpu)) {
+		fill_note(&notes[4], "LINUX", NT_PRXFPREG, sizeof(xfpu), &xfpu);
+	} else {
+		--numnote;
 	}
 	
 	/* Write notes phdr entry */
@@ -1175,17 +1397,12 @@
 
 		for(i = 0; i < numnote; i++)
 			sz += notesize(&notes[i]);
+		
+		if (dump_threads)
+			sz += thread_status_size;
 
-		phdr.p_type = PT_NOTE;
-		phdr.p_offset = offset;
-		phdr.p_vaddr = 0;
-		phdr.p_paddr = 0;
-		phdr.p_filesz = sz;
-		phdr.p_memsz = 0;
-		phdr.p_flags = 0;
-		phdr.p_align = 0;
-
-		offset += phdr.p_filesz;
+		fill_elf_note_phdr(&phdr, sz, offset);
+		offset += sz;
 		DUMP_WRITE(&phdr, sizeof(phdr));
 	}
 
@@ -1214,10 +1431,21 @@
 		DUMP_WRITE(&phdr, sizeof(phdr));
 	}
 
+ 	/* write out the notes section */
 	for(i = 0; i < numnote; i++)
 		if (!writenote(&notes[i], file))
 			goto end_coredump;
 
+	/* write out the thread status notes section */
+ 	if (dump_threads)  {
+		list_for_each(t, &thread_list) {
+			struct elf_thread_status *tmp = list_entry(t, struct elf_thread_status, list);
+			for (i = 0; i < tmp->num_notes; i++)
+				if (!writenote(&tmp->notes[i], file))
+					goto end_coredump;
+		}
+ 	}
+ 
 	DUMP_SEEK(dataoff);
 
 	for(vma = current->mm->mmap; vma != NULL; vma = vma->vm_next) {
@@ -1259,8 +1487,20 @@
 		       (off_t) file->f_pos, offset);
 	}
 
- end_coredump:
+end_coredump:
 	set_fs(fs);
+
+cleanup:
+	if (dump_threads)  {
+		while(!list_empty(&thread_list)) {
+			struct list_head *tmp = thread_list.next;
+			list_del(tmp);
+			kfree(list_entry(tmp, struct elf_thread_status, list));
+		}
+
+		resume_other_threads(cpu_mask);
+	}
+
 	up_write(&current->mm->mmap_sem);
 	return has_dumped;
 }
diff -urN -X /home/vamsi/dontdiff /usr/src/2417-pure/include/asm-i386/elf.h 2417-tcore/include/asm-i386/elf.h
--- /usr/src/2417-pure/include/asm-i386/elf.h	Fri Nov 23 01:18:29 2001
+++ 2417-tcore/include/asm-i386/elf.h	Fri Mar 15 11:52:28 2002
@@ -99,6 +99,18 @@
 
 #ifdef __KERNEL__
 #define SET_PERSONALITY(ex, ibcs2) set_personality((ibcs2)?PER_SVR4:PER_LINUX)
+
+extern int dump_task_regs (struct task_struct *, struct pt_regs *);
+extern int dump_task_fpu (struct task_struct *, struct user_i387_struct *);
+extern int dump_task_extended_fpu (struct task_struct *, struct user_fxsr_struct *);
+
+#define ELF_CORE_COPY_TASK_REGS(tsk, pt_regs) dump_task_regs(tsk, pt_regs)
+#define ELF_CORE_COPY_FPREGS(tsk, elf_fpregs) dump_task_fpu(tsk, elf_fpregs)
+#define ELF_CORE_COPY_XFPREGS(tsk, elf_xfpregs) dump_task_extended_fpu(tsk, elf_xfpregs)
+#ifdef CONFIG_SMP
+extern void dump_smp_unlazy_fpu(void);
+#define ELF_CORE_SYNC dump_smp_unlazy_fpu
 #endif
+#endif /* __KERNEL__ */
 
 #endif
diff -urN -X /home/vamsi/dontdiff /usr/src/2417-pure/include/linux/elf.h 2417-tcore/include/linux/elf.h
--- /usr/src/2417-pure/include/linux/elf.h	Fri Nov 23 01:18:29 2001
+++ 2417-tcore/include/linux/elf.h	Fri Mar 15 11:52:28 2002
@@ -576,6 +576,8 @@
 #define NT_PRPSINFO	3
 #define NT_TASKSTRUCT	4
 #define NT_PRFPXREG	20
+#define NT_PRXFPREG     0x46e62b7f	/* note name must be "LINUX" as per GDB */
+					/* from gdb5.1/include/elf/common.h */
 
 /* Note header in a PT_NOTE section */
 typedef struct elf32_note {
diff -urN -X /home/vamsi/dontdiff /usr/src/2417-pure/include/linux/elfcore.h 2417-tcore/include/linux/elfcore.h
--- /usr/src/2417-pure/include/linux/elfcore.h	Fri Nov 23 01:19:02 2001
+++ 2417-tcore/include/linux/elfcore.h	Fri Mar 15 11:52:28 2002
@@ -86,4 +86,56 @@
 #define PRARGSZ ELF_PRARGSZ 
 #endif
 
+#ifdef __KERNEL__
+static inline void elf_core_copy_regs(elf_gregset_t *elfregs, struct pt_regs *regs)
+{
+#ifdef ELF_CORE_COPY_REGS
+	ELF_CORE_COPY_REGS((*elfregs), regs)
+#else
+	if (sizeof(elf_gregset_t) != sizeof(struct pt_regs)) {
+		printk("sizeof(elf_gregset_t) (%ld) != sizeof(struct pt_regs) (%ld)\n",
+			(long)sizeof(elf_gregset_t), (long)sizeof(struct pt_regs));
+	} else
+		*(struct pt_regs *)elfregs = *regs;
+#endif
+}
+
+static inline int elf_core_copy_task_regs(struct task_struct *t, elf_gregset_t *elfregs)
+{
+	struct pt_regs regs;
+#ifdef ELF_CORE_COPY_TASK_REGS
+	if (ELF_CORE_COPY_TASK_REGS(t, &regs)) {
+		elf_core_copy_regs(elfregs, &regs);
+		return 1;
+	}
+#endif
+	return 0;
+}
+
+static inline int elf_core_copy_task_fpregs(struct task_struct *t, elf_fpregset_t *fpu)
+{
+#ifdef ELF_CORE_COPY_FPREGS
+	return ELF_CORE_COPY_FPREGS(t, fpu);
+#else
+	return dump_fpu(NULL, fpu);
+#endif
+}
+
+static inline int elf_core_copy_task_xfpregs(struct task_struct *t, elf_fpxregset_t *xfpu)
+{
+#ifdef ELF_CORE_COPY_XFPREGS
+	return ELF_CORE_COPY_XFPREGS(t, xfpu);
+#else
+	return 0;
+#endif
+}
+
+#ifdef CONFIG_SMP
+#ifndef ELF_CORE_SYNC
+#define ELF_CORE_SYNC do_nothing
+#endif
+#endif
+
+#endif /* __KERNEL__ */
+
 #endif /* _LINUX_ELFCORE_H */
diff -urN -X /home/vamsi/dontdiff /usr/src/2417-pure/include/linux/sched.h 2417-tcore/include/linux/sched.h
--- /usr/src/2417-pure/include/linux/sched.h	Fri Dec 21 23:12:03 2001
+++ 2417-tcore/include/linux/sched.h	Fri Mar 15 11:52:28 2002
@@ -160,6 +160,10 @@
 extern int start_context_thread(void);
 extern int current_is_keventd(void);
 
+extern void reschedule_other_cpus(void);
+// forces all cpu's other than current to reschedule.  Needed for accurate core dumps.
+
+
 /*
  * The default fd array needs to be at least BITS_PER_LONG,
  * as this is the granularity returned by copy_fdset().
diff -urN -X /home/vamsi/dontdiff /usr/src/2417-pure/include/linux/sysctl.h 2417-tcore/include/linux/sysctl.h
--- /usr/src/2417-pure/include/linux/sysctl.h	Mon Nov 26 18:59:17 2001
+++ 2417-tcore/include/linux/sysctl.h	Fri Mar 15 11:52:28 2002
@@ -87,6 +87,7 @@
 	KERN_CAP_BSET=14,	/* int: capability bounding set */
 	KERN_PANIC=15,		/* int: panic timeout */
 	KERN_REALROOTDEV=16,	/* real root device to mount after initrd */
+	KERN_CORE_DUMPS_THREADS=17, /* int: include status of other threads in dump */
 
 	KERN_SPARC_REBOOT=21,	/* reboot command on Sparc */
 	KERN_CTLALTDEL=22,	/* int: allow ctl-alt-del to reboot */
diff -urN -X /home/vamsi/dontdiff /usr/src/2417-pure/kernel/sched.c 2417-tcore/kernel/sched.c
--- /usr/src/2417-pure/kernel/sched.c	Fri Dec 21 23:12:04 2001
+++ 2417-tcore/kernel/sched.c	Fri Mar 15 11:52:28 2002
@@ -121,7 +121,7 @@
 #else
 
 #define idle_task(cpu) (&init_task)
-#define can_schedule(p,cpu) (1)
+#define can_schedule(p,cpu) ((p)->cpus_allowed)
 
 #endif
 
@@ -704,6 +704,28 @@
 	return;
 }
 
+/*
+ * needed for accurate core dumps of multi-threaded applications.
+ * see binfmt_elf.c for more information.
+ */
+void reschedule_other_cpus(void)
+{
+#ifdef CONFIG_SMP
+	int i, cpu;
+	struct task_struct *p;
+
+	for(i=0; i< smp_num_cpus; i++) {
+		cpu = cpu_logical_map(i);
+		p = cpu_curr(cpu);
+		if (p->processor != smp_processor_id()) {
+			p->need_resched = 1;
+			smp_send_reschedule(p->processor);
+		}
+	}
+#endif	
+	return;
+}
+
 /*
  * The core wakeup function.  Non-exclusive wakeups (nr_exclusive == 0) just wake everything
  * up.  If it's an exclusive wakeup (nr_exclusive == small +ve number) then we wake all the
diff -urN -X /home/vamsi/dontdiff /usr/src/2417-pure/kernel/sysctl.c 2417-tcore/kernel/sysctl.c
--- /usr/src/2417-pure/kernel/sysctl.c	Fri Dec 21 23:12:04 2001
+++ 2417-tcore/kernel/sysctl.c	Fri Mar 15 11:52:28 2002
@@ -49,6 +49,7 @@
 extern int max_queued_signals;
 extern int sysrq_enabled;
 extern int core_uses_pid;
+extern int core_dumps_threads;
 extern int cad_pid;
 
 /* this is needed for the proc_dointvec_minmax for [fs_]overflow UID and GID */
@@ -169,6 +170,8 @@
 	 0644, NULL, &proc_doutsstring, &sysctl_string},
 	{KERN_PANIC, "panic", &panic_timeout, sizeof(int),
 	 0644, NULL, &proc_dointvec},
+	{KERN_CORE_DUMPS_THREADS, "core_dumps_threads", &core_dumps_threads, sizeof(int),
+	 0644, NULL, &proc_dointvec},
 	{KERN_CORE_USES_PID, "core_uses_pid", &core_uses_pid, sizeof(int),
 	 0644, NULL, &proc_dointvec},
 	{KERN_TAINTED, "tainted", &tainted, sizeof(int),

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] multithreaded coredumps for elf executables
  2002-03-15 11:37 [PATCH] multithreaded coredumps for elf executables Vamsi Krishna S .
@ 2002-03-19 15:29 ` Pavel Machek
  2002-03-19 18:49   ` Mark Gross
  0 siblings, 1 reply; 18+ messages in thread
From: Pavel Machek @ 2002-03-19 15:29 UTC (permalink / raw)
  To: Vamsi Krishna S .
  Cc: linux-kernel, alan, marcelo, dan, tachino, jefreyr, mgross,
	vamsi_krishna, richardj_moore, hanharat, bsuparna, bharata,
	asit.k.mallick, david.p.howell, tony.luck, sunil.saxena

Hi!

> - Other threads are prevented from executing while core dump is in 
>   progress to improve the accuracy of the dumps. This is done without 
>   changing the state of the task. We set cpus_allowed in task struct 
>   to be 0 to stop a task from being scheduled and reset it to -1 for
>   resume execution. This has the advantage to not depending on user
>   space at all for correct functioning. IMO sending SIGSTOP to stop 
>   other threads does not work if the process is being run under a 
>   debugger. The only possible issue with using cpus_allowed is that 

In the swsusp patch, I had exactly the same problem. I created refrigerator(),
and halfway sent a signal.... 

> +/*
> + * Suspend execution of other threads belonging to the same multithreaded process 
> + * of current, ASAP.
> + *
> + * Sets the current->cpu_mask to the current cpu to avoid cpu migration durring the dump.
> + * This cpu will also be the only cpu the other threads will be allowed to run after 
> + * coredump is completed. This seems to be needed to fix some SMP races.  This still
> + * needs some more thought though this solution works.

What about

an app has 5 threads. The 1st dumps core, and starts setting the cpus_allowed
mask on thread 2. Meanwhile the 3rd thread resets the mask back.

								Pavel

-- 
Philips Velo 1: 1"x4"x8", 300gram, 60, 12MB, 40bogomips, linux, mutt,
details at http://atrey.karlin.mff.cuni.cz/~pavel/velo/index.html.



* Re: [PATCH] multithreaded coredumps for elf executables
  2002-03-19 15:29 ` Pavel Machek
@ 2002-03-19 18:49   ` Mark Gross
  2002-03-20  6:06     ` Vamsi Krishna S .
  0 siblings, 1 reply; 18+ messages in thread
From: Mark Gross @ 2002-03-19 18:49 UTC (permalink / raw)
  To: Pavel Machek, Vamsi Krishna S .
  Cc: linux-kernel, alan, marcelo, dan, tachino, jefreyr,
	vamsi_krishna, richardj_moore, hanharat, bsuparna, bharata,
	asit.k.mallick, david.p.howell, tony.luck, sunil.saxena

On Tuesday 19 March 2002 10:29 am, Pavel Machek wrote:
> > + *
> > + * Sets the current->cpu_mask to the current cpu to avoid cpu migration
> > durring the dump. + * This cpu will also be the only cpu the other
> > threads will be allowed to run after + * coredump is completed. This
> > seems to be needed to fix some SMP races.  This still + * needs some more
> > thought though this solution works.
>
> What about
>
> app has 5 threads. 1st dumps core, and starts setting cpus_allowed mask to
> thread 2. Meanwhile 3nd thread resets the mask back.
>
This patch was intended to prevent this from happening.  I hope I didn't miss 
something.

The dumping thread doesn't proceed until the other CPUs have gotten into 
kernel mode and handled 2 IPIs: one to reschedule the other CPUs and one 
to synchronize before exiting suspend_other_threads.

The way the IPIs are sent out by this patch, the other CPUs get 2 IPIs and 
execute at least one IRET, and hence at least one call to schedule, before 
the dumping process continues.  This one call to schedule on each of the 
other CPUs is what's needed to get all possible related thread processes 
swapped out for the duration of the dump.

Unless the IPIs and associated IRETs get dropped by the system, that 3rd 
thread will not get a chance to touch the cpu_masks before the dumping 
process has finished taking its dump and resume_other_threads gets called, 
because it will have been scheduled out.

--mgross

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] multithreaded coredumps for elf exeecutables
  2002-03-19 18:49   ` Mark Gross
@ 2002-03-20  6:06     ` Vamsi Krishna S .
  2002-03-20 18:37       ` Daniel Jacobowitz
  0 siblings, 1 reply; 18+ messages in thread
From: Vamsi Krishna S . @ 2002-03-20  6:06 UTC (permalink / raw)
  To: Mark Gross
  Cc: Pavel Machek, linux-kernel, alan, marcelo, dan, tachino, jefreyr,
	vamsi_krishna, richardj_moore, hanharat, bsuparna, bharata,
	asit.k.mallick, david.p.howell, tony.luck, sunil.saxena

There is serialization at a higher level. We take a write lock
on current->mm->mmap_sem at the beginning of the elf_core_dump
function and release it just before leaving the function.
So, if one thread enters elf_core_dump and starts dumping core,
no other thread (sharing the same mm) of the same process can
start dumping.

static int elf_core_dump(long signr, struct pt_regs * regs, struct file * file)
{
	...
	...
        /* now stop all vm operations */
        down_write(&current->mm->mmap_sem);
	...
	...
	...
        up_write(&current->mm->mmap_sem);
        return has_dumped;
}

Vamsi.
-- 
Vamsi Krishna S.
Linux Technology Center,
IBM Software Lab, Bangalore.
Ph: +91 80 5262355 Extn: 3959
Internet: vamsi@in.ibm.com

On Tue, Mar 19, 2002 at 01:49:58PM -0500, Mark Gross wrote:
> On Tuesday 19 March 2002 10:29 am, Pavel Machek wrote:
> > > + *
> > > + * Sets the current->cpu_mask to the current cpu to avoid cpu migration
> > > durring the dump. + * This cpu will also be the only cpu the other
> > > threads will be allowed to run after + * coredump is completed. This
> > > seems to be needed to fix some SMP races.  This still + * needs some more
> > > thought though this solution works.
> >
> > What about
> >
> > app has 5 threads. 1st dumps core, and starts setting cpus_allowed mask to
> > thread 2. Meanwhile the 3rd thread resets the mask back.
> >
> This patch was intended to prevent this from happening.  I hope I didn't miss 
> something.
> 
> The dumping thread doesn't proceed until the other CPU's have gotten into 
> kernel mode and done 2 IPI's.  One to reschedule the other cpu's and one to 
> synchronize before exiting suspend_other_threads.  
> 
> The way the IPI's are sent out by this patch, the other CPUs get 2 IPI's and 
> execute at least one IRET, and hence at least one call to schedule, before 
> the dumping process continues.  This one call to schedule on each of the 
> other cpu's is what's needed to get all possible related thread processes 
> swapped out for the duration of the dump.
> 
> Unless the IPI's and associated IRET's get dropped by the system, that 3rd 
> thread will not get a chance to touch the cpu_masks before the dumping 
> process is finished taking its dump and resume_other_threads gets called.   
> Because it will have been scheduled out.  
> 
> --mgross

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] multithreaded coredumps for elf exeecutables
  2002-03-20 18:37       ` Daniel Jacobowitz
@ 2002-03-20 16:14         ` Mark Gross
  2002-03-21 10:03           ` Vamsi Krishna S .
  2002-03-21 10:16         ` Vamsi Krishna S .
  1 sibling, 1 reply; 18+ messages in thread
From: Mark Gross @ 2002-03-20 16:14 UTC (permalink / raw)
  To: Daniel Jacobowitz, Vamsi Krishna S .
  Cc: Pavel Machek, linux-kernel, alan, marcelo, tachino, jefreyr,
	vamsi_krishna, richardj_moore, hanharat, bsuparna, bharata,
	asit.k.mallick, david.p.howell, tony.luck, sunil.saxena

I've only JUST started on the Itanium version of this patch.  In my initial 
testing, after hacking around some of the compilation issues, I do see a 
kind of process freeze when attempting this.  It could be this bug.

Thanks for the tip ;)

--mgross



On Wednesday 20 March 2002 01:37 pm, Daniel Jacobowitz wrote:
> On Wed, Mar 20, 2002 at 11:36:30AM +0530, Vamsi Krishna S . wrote:
> > There is serialization at higher level. We take a write lock
> > on current->mm->mmap_sem at the beginning of elf_core_dump
> > function which is released just before leaving the function.
> > So, if one thread enters elf_core_dump and starts dumping core,
> > no other thread (same mm) of the same process can start
> > dumping.
> > 
> > static int elf_core_dump(long signr, struct pt_regs * regs, struct file *
> > file) {
> >       ...
> >       ...
> >         /* now stop all vm operations */
> >         down_write(&current->mm->mmap_sem);
> >       ...
> >       ...
> >       ...
> >         up_write(&current->mm->mmap_sem);
> >         return has_dumped;
> > }
>
> That's not a feature, it's a bug.  You can't take the mmap_sem before
> collecting thread status; it will cause a deadlock on at least ia64,
> where some registers are collected from user memory.
>
> (Thanks to Manfred Spraul for explaining that to me.)

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] multithreaded coredumps for elf exeecutables
  2002-03-20  6:06     ` Vamsi Krishna S .
@ 2002-03-20 18:37       ` Daniel Jacobowitz
  2002-03-20 16:14         ` Mark Gross
  2002-03-21 10:16         ` Vamsi Krishna S .
  0 siblings, 2 replies; 18+ messages in thread
From: Daniel Jacobowitz @ 2002-03-20 18:37 UTC (permalink / raw)
  To: Vamsi Krishna S .
  Cc: Mark Gross, Pavel Machek, linux-kernel, alan, marcelo, tachino,
	jefreyr, vamsi_krishna, richardj_moore, hanharat, bsuparna,
	bharata, asit.k.mallick, david.p.howell, tony.luck, sunil.saxena

On Wed, Mar 20, 2002 at 11:36:30AM +0530, Vamsi Krishna S . wrote:
> There is serialization at higher level. We take a write lock
> on current->mm->mmap_sem at the beginning of elf_core_dump
> function which is released just before leaving the function.
> So, if one thread enters elf_core_dump and starts dumping core,
> no other thread (same mm) of the same process can start
> dumping.
> 
> static int elf_core_dump(long signr, struct pt_regs * regs, struct file * file)
> {
> 	...
> 	...
>         /* now stop all vm operations */
>         down_write(&current->mm->mmap_sem);
> 	...
> 	...
> 	...
>         up_write(&current->mm->mmap_sem);
>         return has_dumped;
> }

That's not a feature, it's a bug.  You can't take the mmap_sem before
collecting thread status; it will cause a deadlock on at least ia64,
where some registers are collected from user memory.

(Thanks to Manfred Spraul for explaining that to me.)

-- 
Daniel Jacobowitz                           Carnegie Mellon University
MontaVista Software                         Debian GNU/Linux Developer

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] multithreaded coredumps for elf exeecutables
  2002-03-20 16:14         ` Mark Gross
@ 2002-03-21 10:03           ` Vamsi Krishna S .
  2002-03-22 16:19             ` Mark Gross
  0 siblings, 1 reply; 18+ messages in thread
From: Vamsi Krishna S . @ 2002-03-21 10:03 UTC (permalink / raw)
  To: Mark Gross
  Cc: Daniel Jacobowitz, Pavel Machek, linux-kernel, alan, marcelo,
	tachino, jefreyr, vamsi_krishna, richardj_moore, hanharat,
	bsuparna, bharata, asit.k.mallick, david.p.howell, tony.luck,
	sunil.saxena

Mark,

Does moving the down_write() to be after the registers of all 
threads are collected help? (This patch on top of our previous
one)
--
--- 2417-tcore/fs/binfmt_elf.c.ori	Thu Mar 21 15:30:08 2002
+++ 2417-tcore/fs/binfmt_elf.c	Thu Mar 21 15:27:29 2002
@@ -1289,10 +1289,6 @@
 	int dump_threads = 0;
 	int thread_status_size = 0;
 	
-	/* now stop all vm operations */
-	down_write(&current->mm->mmap_sem);
-	segs = current->mm->map_count;
-
  	if (atomic_read(&current->mm->mm_users) != 1) {
 		dump_threads = core_dumps_threads;
 	}
@@ -1337,6 +1333,19 @@
 		}
 	} /* End if(dump_threads) */
 
+	/*
+	 * This transfers the registers from regs into the standard
+	 * coredump arrangement, whatever that is. We need to do this
+	 * before acquiring mmap_sem as on some architectures (IA64)
+	 * we may need to access user pages to get register state.
+	 */
+	memset(&prstatus, 0, sizeof(prstatus));
+	elf_core_copy_regs(&prstatus.pr_reg, regs);
+
+	/* now stop all vm operations */
+	down_write(&current->mm->mmap_sem);
+	segs = current->mm->map_count;
+
 #ifdef DEBUG
 	printk("elf_core_dump: %d segs %lu limit\n", segs, limit);
 #endif
@@ -1358,16 +1367,9 @@
 	 * Set up the notes in similar form to SVR4 core dumps made
 	 * with info from their /proc.
 	 */
-	memset(&prstatus, 0, sizeof(prstatus));
 	fill_prstatus(&prstatus, current, signr);
 	fill_note(&notes[0], "CORE", NT_PRSTATUS, sizeof(prstatus), &prstatus);
 
-	/*
-	 * This transfers the registers from regs into the standard
-	 * coredump arrangement, whatever that is.
-	 */
-	elf_core_copy_regs(&prstatus.pr_reg, regs);
-
 #ifdef DEBUG
 	dump_regs("Passed in regs", (elf_greg_t *)regs);
 	dump_regs("prstatus regs", (elf_greg_t *)&prstatus.pr_reg);


-- 
Vamsi Krishna S.
Linux Technology Center,
IBM Software Lab, Bangalore.
Ph: +91 80 5262355 Extn: 3959
Internet: vamsi@in.ibm.com

On Wed, Mar 20, 2002 at 11:14:56AM -0500, Mark Gross wrote:
> I've only JUST started on the Itanium version of this patch.  In my initial 
> testing, after hacking around some of the compilation issues,  I do get a 
> type of process freezing when attempting this.  Could be this bug.  
> 
> Thanks for the tip ;)
> 
> --mgross
> 
> 
> 
> On Wednesday 20 March 2002 01:37 pm, Daniel Jacobowitz wrote:
> > On Wed, Mar 20, 2002 at 11:36:30AM +0530, Vamsi Krishna S . wrote:
> > > There is serialization at higher level. We take a write lock
> > > on current->mm->mmap_sem at the beginning of elf_core_dump
> > > function which is released just before leaving the function.
> > > So, if one thread enters elf_core_dump and starts dumping core,
> > > no other thread (same mm) of the same process can start
> > > dumping.
> > > <snip>
> >
> > That's not a feature, it's a bug.  You can't take the mmap_sem before
> > collecting thread status; it will cause a deadlock on at least ia64,
> > where some registers are collected from user memory.
> >
> > (Thanks to Manfred Spraul for explaining that to me.)

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] multithreaded coredumps for elf exeecutables
  2002-03-20 18:37       ` Daniel Jacobowitz
  2002-03-20 16:14         ` Mark Gross
@ 2002-03-21 10:16         ` Vamsi Krishna S .
  2002-03-21 16:27           ` Daniel Jacobowitz
  1 sibling, 1 reply; 18+ messages in thread
From: Vamsi Krishna S . @ 2002-03-21 10:16 UTC (permalink / raw)
  To: dan
  Cc: Mark Gross, Pavel Machek, linux-kernel, alan, marcelo, tachino,
	jefreyr, vamsi_krishna, richardj_moore, hanharat, bsuparna,
	bharata, asit.k.mallick, david.p.howell, tony.luck, sunil.saxena

Dan,

Thanks for pointing this out. I see that this change has now gone into
2.4.18 as well as 2.5.4. We will ensure that the down_write happens
only after the registers of all threads are collected.

Coming back to the original point raised by Pavel: indeed, there is 
nothing preventing external code (any other kernel module) from modifying
the cpus_allowed field from under us. This could get worse in 2.5.x,
where a user could change CPU affinity (through /proc or a syscall, 
though I don't think the patches providing this have been accepted yet).

Vamsi.

On Wed, Mar 20, 2002 at 01:37:09PM -0500, Daniel Jacobowitz wrote:
> On Wed, Mar 20, 2002 at 11:36:30AM +0530, Vamsi Krishna S . wrote:
> > There is serialization at higher level. We take a write lock
> > on current->mm->mmap_sem at the beginning of elf_core_dump
> > function which is released just before leaving the function.
> > So, if one thread enters elf_core_dump and starts dumping core,
> > no other thread (same mm) of the same process can start
> > dumping.
> > <snip>
> 
> That's not a feature, it's a bug.  You can't take the mmap_sem before
> collecting thread status; it will cause a deadlock on at least ia64,
> where some registers are collected from user memory.
> 
> (Thanks to Manfred Spraul for explaining that to me.)
> 
> -- 
> Daniel Jacobowitz                           Carnegie Mellon University
> MontaVista Software                         Debian GNU/Linux Developer

-- 
Vamsi Krishna S.
Linux Technology Center,
IBM Software Lab, Bangalore.
Ph: +91 80 5262355 Extn: 3959
Internet: vamsi@in.ibm.com

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] multithreaded coredumps for elf exeecutables
  2002-03-21 16:52             ` Alan Cox
@ 2002-03-21 14:10               ` Mark Gross
  2002-03-21 17:34                 ` Alan Cox
  2002-03-21 20:25                 ` Pavel Machek
  0 siblings, 2 replies; 18+ messages in thread
From: Mark Gross @ 2002-03-21 14:10 UTC (permalink / raw)
  To: Alan Cox, Daniel Jacobowitz
  Cc: Vamsi Krishna S .,
	Pavel Machek, linux-kernel, alan, marcelo, tachino, jefreyr,
	vamsi_krishna, richardj_moore, hanharat, bsuparna, bharata,
	asit.k.mallick, david.p.howell, tony.luck, sunil.saxena

On Thursday 21 March 2002 11:52 am, Alan Cox wrote:
> You need interrupts to handle this, even if you don't wrap it in the top
> layer of signals it will be able to use much of the code I agree. The nasty
> case is the "currently running on another cpu" one. Especially since you
> can't just "trap it" - if you IPI that processor it might have moved by the
> time the IPI arrives 8)

This is why I grabbed all those locks and did the two sets of IPIs in the 
tcore patch.  Once the runqueue lock is grabbed, even if that process on the 
other CPU tries to migrate, it won't get swapped in or looked at by the 
scheduler until its cpus_allowed member has been marked.   After cpus_allowed 
has been marked, it won't run. 

I don't think there is any faster way of getting the other CPUs into 
schedule, and a specific running process swapped out, than what was done 
here.

The only risk with this type of code is if other code or drivers attempt 
similar maneuvers at the same time.  Having a standard mechanism or API for 
this in the scheduler would be a "good thing".

--mgross
ps.
I've just started considering how to do this with the 2.5 O(1) scheduler, and 
I'm not sure yet how to accomplish this process "pausing" behavior.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] multithreaded coredumps for elf exeecutables
  2002-03-21 17:34                 ` Alan Cox
@ 2002-03-21 14:59                   ` Mark Gross
  0 siblings, 0 replies; 18+ messages in thread
From: Mark Gross @ 2002-03-21 14:59 UTC (permalink / raw)
  To: Alan Cox
  Cc: Alan Cox, Daniel Jacobowitz, Vamsi Krishna S .,
	Pavel Machek, linux-kernel, marcelo, tachino, jefreyr,
	vamsi_krishna, richardj_moore, hanharat, bsuparna, bharata,
	asit.k.mallick, david.p.howell, tony.luck, sunil.saxena

On Thursday 21 March 2002 12:34 pm, Alan Cox wrote:
> > This why I grabbed all those locks, and did the two sets of IPI's in the
> > tcore patch.  Once the runqueue lock is grabbed, even if that process on
> > the
>
> If you IPI holding a lock whats going to happen if while the IPI is going
> across the cpus the other processor tries to grab the runqueue lock and
> is spinning on it with interrupts off ?

Then at least two CPUs would quickly become deadlocked on the 
synchronization IPI this patch sends at the end of the suspend_other_threads 
function call.

Interrupts shouldn't be turned off when grabbing the runqueue lock.  It's also 
a bad thing if they happen to be off while calling into schedule.  

I think schedule was designed to be called only while interrupts are turned 
on.  It BUGs if "in_interrupt" to enforce this.

--mgross


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] multithreaded coredumps for elf exeecutables
  2002-03-21 10:16         ` Vamsi Krishna S .
@ 2002-03-21 16:27           ` Daniel Jacobowitz
  2002-03-21 16:52             ` Alan Cox
  0 siblings, 1 reply; 18+ messages in thread
From: Daniel Jacobowitz @ 2002-03-21 16:27 UTC (permalink / raw)
  To: Vamsi Krishna S .
  Cc: Mark Gross, Pavel Machek, linux-kernel, alan, marcelo, tachino,
	jefreyr, vamsi_krishna, richardj_moore, hanharat, bsuparna,
	bharata, asit.k.mallick, david.p.howell, tony.luck, sunil.saxena

On Thu, Mar 21, 2002 at 03:46:50PM +0530, Vamsi Krishna S . wrote:
> Dan,
> 
> Thanks for pointing this out. I see that this change has now gone into
> 2.4.18 as well as 2.5.4. We would ensure that the down_write happens
> only after the registers of all threads are collected.

Yes, your other patch for this looks OK.

> Coming back to the original point raised by Pavel, indeed there is 
> nothing preventing external code (any other kernel modules) modifying
> the cpus_allowed field from under us. This could get worse in 2.5.x
> where a user could change cpu affinity (through proc or a syscall, 
> though I don't think the patches providing this are accepted as yet).

We really need a non-signal-based way to tell the scheduler that a task
cannot be scheduled.  A lot of the machinery is already there, but private to
sched.c; the rest is pretty straightforward.

-- 
Daniel Jacobowitz                           Carnegie Mellon University
MontaVista Software                         Debian GNU/Linux Developer

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] multithreaded coredumps for elf exeecutables
  2002-03-21 16:27           ` Daniel Jacobowitz
@ 2002-03-21 16:52             ` Alan Cox
  2002-03-21 14:10               ` Mark Gross
  0 siblings, 1 reply; 18+ messages in thread
From: Alan Cox @ 2002-03-21 16:52 UTC (permalink / raw)
  To: Daniel Jacobowitz
  Cc: Vamsi Krishna S .,
	Mark Gross, Pavel Machek, linux-kernel, alan, marcelo, tachino,
	jefreyr, vamsi_krishna, richardj_moore, hanharat, bsuparna,
	bharata, asit.k.mallick, david.p.howell, tony.luck, sunil.saxena

> We really need a non-signal-based way to tell the scheduler that a task
> can not be scheduled.  A lot of the machinery is all there, but private to
> sched.c; the rest is pretty straightforward.

You need interrupts to handle this; even if you don't wrap it in the top
layer of signals, it will be able to use much of the code, I agree. The nasty
case is the "currently running on another cpu" one, especially since you 
can't just "trap it" - if you IPI that processor the task might have moved
by the time the IPI arrives 8)


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] multithreaded coredumps for elf exeecutables
  2002-03-21 14:10               ` Mark Gross
@ 2002-03-21 17:34                 ` Alan Cox
  2002-03-21 14:59                   ` Mark Gross
  2002-03-21 20:25                 ` Pavel Machek
  1 sibling, 1 reply; 18+ messages in thread
From: Alan Cox @ 2002-03-21 17:34 UTC (permalink / raw)
  To: mgross
  Cc: Alan Cox, Daniel Jacobowitz, Vamsi Krishna S .,
	Pavel Machek, linux-kernel, marcelo, tachino, jefreyr,
	vamsi_krishna, richardj_moore, hanharat, bsuparna, bharata,
	asit.k.mallick, david.p.howell, tony.luck, sunil.saxena

> This why I grabbed all those locks, and did the two sets of IPI's in the 
> tcore patch.  Once the runqueue lock is grabbed, even if that process on the 

If you IPI while holding a lock, what's going to happen if, while the IPI is
going across the CPUs, the other processor tries to grab the runqueue lock
and is spinning on it with interrupts off?

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] multithreaded coredumps for elf exeecutables
  2002-03-21 14:10               ` Mark Gross
  2002-03-21 17:34                 ` Alan Cox
@ 2002-03-21 20:25                 ` Pavel Machek
  1 sibling, 0 replies; 18+ messages in thread
From: Pavel Machek @ 2002-03-21 20:25 UTC (permalink / raw)
  To: Mark Gross
  Cc: Alan Cox, Daniel Jacobowitz, Vamsi Krishna S .,
	Pavel Machek, linux-kernel, marcelo, tachino, jefreyr,
	vamsi_krishna, richardj_moore, hanharat, bsuparna, bharata,
	asit.k.mallick, david.p.howell, tony.luck, sunil.saxena

Hi!

> > You need interrupts to handle this, even if you don't wrap it in the top
> > layer of signals it will be able to use much of the code I agree. The nasty
> > case is the "currently running on another cpu" one. Especially since you
> > can't just "trap it" - if you IPI that processor it might have moved by the
> > time the IPI arrives 8)
> 
> This why I grabbed all those locks, and did the two sets of IPI's in the 
> tcore patch.  Once the runqueue lock is grabbed, even if that process on the 
> other CPU tries to migrate, it won't get swapped in or looked at by the 
> scheduler until its cpus_allowed member has been marked.   After cpus_allowed 
> has been marked it won't run. 

BTW it would be very nice to put "task freezing" in some generic
place. I have my own version of task freezing (with refrigerator), and
it would be good to be able to share that...

> The only risk with this type of code is if other code or drivers attempt 
> similar maneuvers at the same time.  Having a standard mechanism or API for 
> this in the scheduler would be a "good thing".

Ahha, so you know it, too.

> I've just started considering how to do this with the 2.5 O(1) scheduler, and 
> I'm not sure yet how I can accomplish this process "pausing" behavior just 
> yet.

I'm doing this in my freezer, and it should be safe even on
2.5.X. The most interesting part is in suspend.c...
								Pavel

--- clean.2.4/arch/i386/kernel/apm.c	Thu Feb 28 11:18:05 2002
+++ linux-swsusp.24/arch/i386/kernel/apm.c	Fri Mar  1 12:44:18 2002
@@ -1664,6 +1664,7 @@
 	daemonize();
 
 	strcpy(current->comm, "kapmd");
+	current->flags |= PF_IOTHREAD;
 	sigfillset(&current->blocked);
 
 	if (apm_info.connection_version == 0) {
--- clean.2.4/arch/i386/kernel/signal.c	Thu Feb 28 11:18:05 2002
+++ linux-swsusp.24/arch/i386/kernel/signal.c	Thu Mar  7 23:17:18 2002
@@ -20,6 +20,7 @@
 #include <linux/stddef.h>
 #include <linux/tty.h>
 #include <linux/personality.h>
+#include <linux/suspend.h>
 #include <asm/ucontext.h>
 #include <asm/uaccess.h>
 #include <asm/i387.h>
@@ -595,6 +596,11 @@
 	if ((regs->xcs & 3) != 3)
 		return 1;
 
+	if (current->flags & PF_FREEZE) {
+		refrigerator(0);
+		goto no_signal;
+	}
+
 	if (!oldset)
 		oldset = &current->blocked;
 
@@ -705,6 +711,7 @@
 		return 1;
 	}
 
+ no_signal:
 	/* Did we come from a system call? */
 	if (regs->orig_eax >= 0) {
 		/* Restart the system call - no handlers present */
--- clean.2.4/drivers/usb/storage/usb.c	Thu Feb 28 11:18:20 2002
+++ linux-swsusp.24/drivers/usb/storage/usb.c	Fri Mar  1 12:43:11 2002
@@ -316,6 +316,7 @@
 	 */
 	exit_files(current);
 	current->files = init_task.files;
+	current->flags |= PF_IOTHREAD;
 	atomic_inc(&current->files->count);
 	daemonize();
 
--- clean.2.4/fs/buffer.c	Thu Feb 28 11:18:21 2002
+++ linux-swsusp.24/fs/buffer.c	Thu Mar  7 22:51:11 2002
@@ -129,6 +129,8 @@
 		wake_up(&bh->b_wait);
 }
 
+DECLARE_TASK_QUEUE(tq_bdflush);
+
 /*
  * Rewrote the wait-routines to use the "new" wait-queue functionality,
  * and getting rid of the cli-sti pairs. The wait-queue routines still
@@ -2981,12 +2986,14 @@
 	spin_unlock_irq(&tsk->sigmask_lock);
 
 	complete((struct completion *)startup);
-
+	current->flags |= PF_KERNTHREAD;
 	for (;;) {
 		wait_for_some_buffers(NODEV);
 
 		/* update interval */
 		interval = bdf_prm.b_un.interval;
+		if (current->flags & PF_FREEZE)
+			refrigerator(PF_IOTHREAD);
 		if (interval) {
 			tsk->state = TASK_INTERRUPTIBLE;
 			schedule_timeout(interval);
--- clean.2.4/fs/jbd/journal.c	Thu Feb 28 11:18:22 2002
+++ linux-swsusp.24/fs/jbd/journal.c	Thu Mar  7 23:13:25 2002
@@ -34,6 +34,7 @@
 #include <linux/init.h>
 #include <linux/mm.h>
 #include <linux/slab.h>
+#include <linux/suspend.h>
 #include <asm/uaccess.h>
 #include <linux/proc_fs.h>
 
@@ -226,6 +227,7 @@
 			journal->j_commit_interval / HZ);
 	list_add(&journal->j_all_journals, &all_journals);
 
+	current->flags |= PF_KERNTHREAD;
 	/* And now, wait forever for commit wakeup events. */
 	while (1) {
 		if (journal->j_flags & JFS_UNMOUNT)
@@ -246,7 +248,15 @@
 		}
 
 		wake_up(&journal->j_wait_done_commit);
-		interruptible_sleep_on(&journal->j_wait_commit);
+		if (current->flags & PF_FREEZE) { /* The simpler the better. Flushing journal isn't a
+						     good idea, because that depends on threads that
+						     may be already stopped. */
+			jbd_debug(1, "Now suspending kjournald\n");
+			refrigerator(PF_IOTHREAD);
+			jbd_debug(1, "Resuming kjournald\n");						
+		} else		/* we assume on resume that commits are already there,
+				   so we don't sleep */
+			interruptible_sleep_on(&journal->j_wait_commit);
 
 		jbd_debug(1, "kjournald wakes\n");
 
--- clean.2.4/include/linux/sched.h	Tue Dec 25 22:39:30 2001
+++ linux-swsusp.24/include/linux/sched.h	Thu Mar  7 23:09:25 2002
@@ -427,6 +427,10 @@
 #define PF_MEMDIE	0x00001000	/* Killed for out-of-memory */
 #define PF_FREE_PAGES	0x00002000	/* per process page freeing */
 #define PF_NOIO		0x00004000	/* avoid generating further I/O */
+#define PF_FROZEN	0x00008000	/* frozen for system suspend */
+#define PF_FREEZE	0x00010000	/* this task should be frozen for suspend */
+#define PF_IOTHREAD	0x00020000	/* this thread is needed for doing I/O to swap */
+#define PF_KERNTHREAD	0x00040000	/* this thread is a kernel thread that cannot be sent signals to */
 
 #define PF_USEDFPU	0x00100000	/* task used FPU this quantum (SMP) */
 
--- clean.2.4/kernel/context.c	Thu Oct 11 20:17:22 2001
+++ linux-swsusp.24/kernel/context.c	Tue Feb 19 20:33:23 2002
@@ -72,6 +72,7 @@
 
 	daemonize();
 	strcpy(curtask->comm, "keventd");
+	current->flags |= PF_IOTHREAD;
 	keventd_running = 1;
 	keventd_task = curtask;
 
--- clean.2.4/kernel/signal.c	Wed Dec  5 23:46:07 2001
+++ linux-swsusp.24/kernel/signal.c	Tue Feb 19 20:33:23 2002
@@ -463,7 +463,7 @@
  * No need to set need_resched since signal event passing
  * goes through ->blocked
  */
-static inline void signal_wake_up(struct task_struct *t)
+inline void signal_wake_up(struct task_struct *t)
 {
 	t->sigpending = 1;
 
--- clean.2.4/kernel/softirq.c	Wed Oct 31 19:26:02 2001
+++ linux-swsusp.24/kernel/softirq.c	Tue Feb 19 20:33:23 2002
@@ -366,6 +366,7 @@
 
 	daemonize();
 	current->nice = 19;
+	current->flags |= PF_IOTHREAD;
 	sigfillset(&current->blocked);
 
 	/* Migrate to the right CPU */
--- clean.2.4/kernel/suspend.c	Sun Nov 11 20:26:28 2001
+++ linux-swsusp.24/kernel/suspend.c	Tue Mar 19 13:22:14 2002
@@ -0,0 +1,1373 @@
...
+/*
+ * Refrigerator and related stuff
+ */
+
+#define INTERESTING(p) \
+			/* We don't want to touch kernel_threads..*/ \
+			if (p->flags & PF_IOTHREAD) \
+				continue; \
+			if (p == current) \
+				continue; \
+			if (p->state == TASK_ZOMBIE) \
+				continue;
+
+/* Refrigerator is place where frozen processes are stored :-). */
+void refrigerator(unsigned long flag)
+{
+	/* You need correct to work with real-time processes.
+	   OTOH, this way one process may see (via /proc/) some other
+	   process in stopped state (and thereby discovered we were
+	   suspended. We probably do not care. 
+	 */
+	long save;
+	save = current->state;
+	current->state = TASK_STOPPED;
+//	PRINTK("%s entered refrigerator\n", current->comm);
+	printk(":");
+	current->flags &= ~PF_FREEZE;
+	if (flag)
+		flush_signals(current); /* We have signaled a kernel thread, which isn't normal behaviour
+					   and that may lead to 100%CPU sucking because those threads
+					   just don't manage signals. */
+	current->flags |= PF_FROZEN;
+	while (current->flags & PF_FROZEN)
+		schedule();
+//	PRINTK("%s left refrigerator\n", current->comm);
+	printk(":");
+	current->state = save;
+}
+
+/* 0 = success, else # of processes that we failed to stop */
+static int freeze_processes(void)
+{
+	int todo, start_time;
+	struct task_struct *p;
+	
+	PRINTS( "Waiting for tasks to stop... " );
+	
+	start_time = jiffies;
+	do {
+		todo = 0;
+		read_lock(&tasklist_lock);
+		for_each_task(p) {
+			unsigned long flags;
+			INTERESTING(p);
+			if (p->flags & PF_FROZEN)
+				continue;
+
+			/* FIXME: smp problem here: we may not access other process' flags
+			   without locking */
+			p->flags |= PF_FREEZE;
+			spin_lock_irqsave(&p->sigmask_lock, flags);
+			signal_wake_up(p);
+			spin_unlock_irqrestore(&p->sigmask_lock, flags);
+			todo++;
+		}
+		read_unlock(&tasklist_lock);
+		sys_sched_yield();
+		schedule();
+		if (time_after(jiffies, start_time + TIMEOUT)) {
+			PRINTK( "\n" );
+			printk(KERN_ERR " stopping tasks failed (%d tasks remaining)\n", todo );
+			return todo;
+		}
+	} while(todo);
+	
+	PRINTK( " ok\n" );
+	return 0;
+}
+
+static void thaw_processes(void)
+{
+	struct task_struct *p;
+
+	PRINTR( "Restarting tasks..." );
+	read_lock(&tasklist_lock);
+	for_each_task(p) {
+		INTERESTING(p);
+		
+		if (p->flags & PF_FROZEN) p->flags &= ~PF_FROZEN;
+		else
+			printk(KERN_INFO " Strange, %s not stopped\n", p->comm );
+		wake_up_process(p);
+	}
+	read_unlock(&tasklist_lock);
+	PRINTK( " done\n" );
+	MDELAY(500);
+}
--- clean.2.4/mm/vmscan.c	Thu Feb 28 11:18:26 2002
+++ linux-swsusp.24/mm/vmscan.c	Thu Mar  7 22:55:49 2002
@@ -723,18 +723,22 @@
 	 * us from recursively trying to free more memory as we're
 	 * trying to free the first piece of memory in the first place).
 	 */
-	tsk->flags |= PF_MEMALLOC;
+	tsk->flags |= PF_MEMALLOC | PF_KERNTHREAD;
 
 	/*
 	 * Kswapd main loop.
 	 */
 	for (;;) {
+		if (current->flags & PF_FREEZE)
+			refrigerator(PF_IOTHREAD);
 		__set_current_state(TASK_INTERRUPTIBLE);
 		add_wait_queue(&kswapd_wait, &wait);
 
 		mb();
-		if (kswapd_can_sleep())
+		if (kswapd_can_sleep()) {
 			schedule();
+		}
+		
 
 		__set_current_state(TASK_RUNNING);
 		remove_wait_queue(&kswapd_wait, &wait);
-- 
Casualities in World Trade Center: ~3k dead inside the building,
cryptography in U.S.A. and free speech in Czech Republic.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] multithreaded coredumps for elf exeecutables
  2002-03-21 10:03           ` Vamsi Krishna S .
@ 2002-03-22 16:19             ` Mark Gross
  0 siblings, 0 replies; 18+ messages in thread
From: Mark Gross @ 2002-03-22 16:19 UTC (permalink / raw)
  To: vamsi
  Cc: Daniel Jacobowitz, Pavel Machek, linux-kernel, alan, marcelo,
	tachino, jefreyr, vamsi_krishna, richardj_moore, hanharat,
	bsuparna, bharata, asit.k.mallick, david.p.howell, tony.luck,
	sunil.saxena

On Thursday 21 March 2002 05:03 am, Vamsi Krishna S . wrote:
> Mark,
>
> Does moving the down_write() to be after the registers of all
> threads are collected help? (This patch on top of our previous
> one)

Yes, moving the down_write to after the grabbing of the registers fixes the 
semi-lockups.

I need to move my Big Sur to RH7.2 to continue my validation.  It's running 
the 7.1 libs, and gdb / libpthreads.so aren't as happy at debug time as they 
are for 7.2 on ia32.

Thanks.

--mgross

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] multithreaded coredumps for elf exeecutables
  2002-03-29  5:43 ` Jeff Jenkins
@ 2002-03-29 12:50   ` Mark Gross
  0 siblings, 0 replies; 18+ messages in thread
From: Mark Gross @ 2002-03-29 12:50 UTC (permalink / raw)
  To: Jeff Jenkins, Suparna Bhattacharya
  Cc: Alan Cox, Alan Cox, asit.k.mallick, bharata, Daniel Jacobowitz,
	david.p.howell, hanharat, linux-kernel, marcelo, Pavel Machek,
	Richard_J_Moore/UK/IBM%IBMGB, S Vamsikrishna, sunil.saxena,
	tachino, tony.luck, vamsi

Yes.  

Patch the 2.4.17 kernel with the patch that Vamsi sent out and you're good to 
go on ia32.  I've had very good luck with this patch unit tested  on 1,2, and 
4 way ia32 systems without any failures.

However, I'm currently working on a bug fix to my process-pausing 
implementation, and on Itanium support for it, as a patch off the 2.4.17 base 
kernel.  The bug showed up on Itanium, and could bite ia32 users even though 
we haven't seen it on ia32 yet.

I should have the bug fix patch and the Itanium patch posted next week.

Take the current 2.4.17 patch and give it a try.

--mgross

On Friday 29 March 2002 12:43 am, Jeff Jenkins wrote:
> So, after all this discussion, is there a set of sources that I can use to
> build a kernel that will dump ALL threads to a core file?
>
> I recall that Vamsi initially sent out the diffs that were to be used as a
> patch.  This sparked the issue raised by Daniel.
>
> Vamsi:  do you have a set of patches that differ from the original patch
> you sent?
>
> Thanks!
>
> -- jrj
>
> -----Original Message-----
> From: Suparna Bhattacharya [mailto:bsuparna@in.ibm.com]
> Sent: Thursday, March 21, 2002 10:06 PM
> To: mgross@unix-os.sc.intel.com
> Cc: Alan Cox; Alan Cox; asit.k.mallick@intel.com; bharata@linux.ibm.com;
> Daniel Jacobowitz; david.p.howell@intel.com; hanharat@us.ibm.com;
> jefreyr@pacbell.net; linux-kernel@vger.kernel.org;
> marcelo@conectiva.com.br; Pavel Machek; Richard_J_Moore/UK/IBM%IBMGB; S
> Vamsikrishna; sunil.saxena@intel.com; tachino@jp.fujitsu.com;
> tony.luck@intel.com; vamsi@linux.ibm.com
> Subject: Re: [PATCH] multithreaded coredumps for elf exeecutables
> Importance: High
>
>
>
> IIRC there was an observation that spin_lock_irq seems to first disable
> interrupts and then start spinning on the lock, which is why such a
> situation could arise (even though the code in schedule doesn't appear to
> explicitly disable interrupts).
>
> However, in Mark's implementation, it's only the first IPI that happens
> under the runqueue lock, and that actually doesn't wait for the other CPUs
> to receive the IPI. (The purpose of the first IPI was more a matter of
> trying to improve accuracy by notifying the other threads as soon as
> possible). So there shouldn't be a deadlock. The synchronization/wait
> happens in the case of the second IPI (i.e. the smp_call_function), and by
> that time the runqueue lock has been released, and cpus_allowed has been
> updated.
>
> Regards
> Suparna
>
>   Suparna Bhattacharya
>   Linux Technology Center
>   IBM Software Lab, India
>   E-mail : bsuparna@in.ibm.com
>   Phone :  91-80-5044961
>
>
>
>
> From: Mark Gross <mgross@unix-os.sc.intel.com> (please respond to mgross)
> Date: 03/21/02 08:29 PM
> To: Alan Cox <alan@lxorguk.ukuu.org.uk>
> Cc: alan@lxorguk.ukuu.org.uk (Alan Cox), dan@debian.org (Daniel Jacobowitz),
>     vamsi@linux.ibm.com, pavel@suse.cz (Pavel Machek),
>     linux-kernel@vger.kernel.org, marcelo@conectiva.com.br,
>     tachino@jp.fujitsu.com, jefreyr@pacbell.net,
>     S Vamsikrishna/India/IBM@IBMIN, Richard J Moore/UK/IBM@IBMGB,
>     hanharat@us.ibm.com, Suparna Bhattacharya/India/IBM@IBMIN,
>     bharata@linux.ibm.com, asit.k.mallick@intel.com,
>     david.p.howell@intel.com, tony.luck@intel.com, sunil.saxena@intel.com
> Subject: Re: [PATCH] multithreaded coredumps for elf exeecutables
>
> On Thursday 21 March 2002 12:34 pm, Alan Cox wrote:
> > > This is why I grabbed all those locks, and did the two sets of IPI's in the
> > > tcore patch.  Once the runqueue lock is grabbed, even if that process on the
> >
> > If you IPI holding a lock, what's going to happen if, while the IPI is going
> > across the cpus, the other processor tries to grab the runqueue lock and
> > is spinning on it with interrupts off?
>
> Then at least 2 CPUs would quickly become deadlocked on the
> synchronization IPI this patch sends at the end of the
> suspend_other_threads function call.
>
> Interrupts shouldn't be turned off when grabbing the runqueue lock.  It's
> also a bad thing if they happen to be off while calling into schedule.
>
> I think schedule was designed to be called only while interrupts are turned
> on.  It BUGs if "in_interrupt" to enforce this.
>
> --mgross
>
>
>
>
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

^ permalink raw reply	[flat|nested] 18+ messages in thread

* RE: [PATCH] multithreaded coredumps for elf exeecutables
  2002-03-22  6:06 Suparna Bhattacharya
@ 2002-03-29  5:43 ` Jeff Jenkins
  2002-03-29 12:50   ` Mark Gross
  0 siblings, 1 reply; 18+ messages in thread
From: Jeff Jenkins @ 2002-03-29  5:43 UTC (permalink / raw)
  To: Suparna Bhattacharya, mgross
  Cc: Alan Cox, Alan Cox, asit.k.mallick, bharata, Daniel Jacobowitz,
	david.p.howell, hanharat, linux-kernel, marcelo, Pavel Machek,
	Richard_J_Moore/UK/IBM%IBMGB, S Vamsikrishna, sunil.saxena,
	tachino, tony.luck, vamsi

So, after all this discussion, is there a set of sources that I can use to
build a kernel that will dump ALL threads to a core file?

I recall that Vamsi initially sent out the diffs that were to be used as a
patch.  This sparked the issue raised by Daniel.

Vamsi:  do you have a set of patches that differ from the original patch you
sent?

Thanks!

-- jrj

-----Original Message-----
From: Suparna Bhattacharya [mailto:bsuparna@in.ibm.com]
Sent: Thursday, March 21, 2002 10:06 PM
To: mgross@unix-os.sc.intel.com
Cc: Alan Cox; Alan Cox; asit.k.mallick@intel.com; bharata@linux.ibm.com;
Daniel Jacobowitz; david.p.howell@intel.com; hanharat@us.ibm.com;
jefreyr@pacbell.net; linux-kernel@vger.kernel.org;
marcelo@conectiva.com.br; Pavel Machek; Richard_J_Moore/UK/IBM%IBMGB; S
Vamsikrishna; sunil.saxena@intel.com; tachino@jp.fujitsu.com;
tony.luck@intel.com; vamsi@linux.ibm.com
Subject: Re: [PATCH] multithreaded coredumps for elf exeecutables
Importance: High



IIRC there was an observation that spin_lock_irq seems to first disable
interrupts and then start spinning on the lock, which is why such a
situation could arise (even though the code in schedule doesn't appear to
explicitly disable interrupts).

However, in Mark's implementation, it's only the first IPI that happens
under the runqueue lock, and that actually doesn't wait for the other CPUs
to receive the IPI. (The purpose of the first IPI was more a matter of
trying to improve accuracy by notifying the other threads as soon as
possible). So there shouldn't be a deadlock. The synchronization/wait
happens in the case of the second IPI (i.e. the smp_call_function), and by
that time the runqueue lock has been released, and cpus_allowed has been
updated.

Regards
Suparna

  Suparna Bhattacharya
  Linux Technology Center
  IBM Software Lab, India
  E-mail : bsuparna@in.ibm.com
  Phone :  91-80-5044961




From: Mark Gross <mgross@unix-os.sc.intel.com> (please respond to mgross)
Date: 03/21/02 08:29 PM
To: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: alan@lxorguk.ukuu.org.uk (Alan Cox), dan@debian.org (Daniel Jacobowitz),
    vamsi@linux.ibm.com, pavel@suse.cz (Pavel Machek),
    linux-kernel@vger.kernel.org, marcelo@conectiva.com.br,
    tachino@jp.fujitsu.com, jefreyr@pacbell.net,
    S Vamsikrishna/India/IBM@IBMIN, Richard J Moore/UK/IBM@IBMGB,
    hanharat@us.ibm.com, Suparna Bhattacharya/India/IBM@IBMIN,
    bharata@linux.ibm.com, asit.k.mallick@intel.com,
    david.p.howell@intel.com, tony.luck@intel.com, sunil.saxena@intel.com
Subject: Re: [PATCH] multithreaded coredumps for elf exeecutables






On Thursday 21 March 2002 12:34 pm, Alan Cox wrote:
> > This is why I grabbed all those locks, and did the two sets of IPI's in the
> > tcore patch.  Once the runqueue lock is grabbed, even if that process on the
>
> If you IPI holding a lock, what's going to happen if, while the IPI is going
> across the cpus, the other processor tries to grab the runqueue lock and
> is spinning on it with interrupts off?

Then at least 2 CPUs would quickly become deadlocked on the
synchronization IPI this patch sends at the end of the
suspend_other_threads function call.

Interrupts shouldn't be turned off when grabbing the runqueue lock.  It's
also a bad thing if they happen to be off while calling into schedule.

I think schedule was designed to be called only while interrupts are turned
on.  It BUGs if "in_interrupt" to enforce this.

--mgross






^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] multithreaded coredumps for elf exeecutables
@ 2002-03-22  6:06 Suparna Bhattacharya
  2002-03-29  5:43 ` Jeff Jenkins
  0 siblings, 1 reply; 18+ messages in thread
From: Suparna Bhattacharya @ 2002-03-22  6:06 UTC (permalink / raw)
  To: mgross
  Cc: Alan Cox, Alan Cox, asit.k.mallick, bharata, Daniel Jacobowitz,
	david.p.howell, hanharat, jefreyr, linux-kernel, marcelo,
	Pavel Machek, Richard_J_Moore/UK/IBM%IBMGB, S Vamsikrishna,
	sunil.saxena, tachino, tony.luck, vamsi


IIRC there was an observation that spin_lock_irq seems to first disable
interrupts and then start spinning on the lock, which is why such a
situation could arise (even though the code in schedule doesn't appear to
explicitly disable interrupts).

However, in Mark's implementation, it's only the first IPI that happens
under the runqueue lock, and that actually doesn't wait for the other CPUs
to receive the IPI. (The purpose of the first IPI was more a matter of
trying to improve accuracy by notifying the other threads as soon as
possible). So there shouldn't be a deadlock. The synchronization/wait
happens in the case of the second IPI (i.e. the smp_call_function), and by
that time the runqueue lock has been released, and cpus_allowed has been
updated.
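
To make that ordering concrete, here is a minimal sketch against a 2.4-era
kernel.  Apart from suspend_other_threads (named elsewhere in this thread),
every name and detail below is an illustration of the sequence described
above, not code from the actual tcore patch:

```c
/* Runs on each CPU via the second IPI; doing nothing is the point --
 * smp_call_function() waiting for it to complete everywhere is the
 * synchronization we want. */
static void tcore_sync(void *unused)
{
}

static void suspend_other_threads(struct mm_struct *mm)
{
	struct task_struct *p;

	read_lock(&tasklist_lock);
	spin_lock_irq(&runqueue_lock);
	for_each_task(p) {
		if (p->mm == mm && p != current) {
			p->cpus_allowed = 0;		   /* keep it off every CPU */
			smp_send_reschedule(p->processor); /* first IPI: async nudge */
		}
	}
	spin_unlock_irq(&runqueue_lock);
	read_unlock(&tasklist_lock);

	/* Second IPI: smp_call_function() waits for the other CPUs to
	 * answer, so it must run only after runqueue_lock is dropped --
	 * a CPU spinning on that lock with interrupts disabled could
	 * never take the IPI, and we would deadlock. */
	smp_call_function(tcore_sync, NULL, 1, 1);
}
```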

Regards
Suparna

  Suparna Bhattacharya
  Linux Technology Center
  IBM Software Lab, India
  E-mail : bsuparna@in.ibm.com
  Phone :  91-80-5044961



                                                                                            
From: Mark Gross <mgross@unix-os.sc.intel.com> (please respond to mgross)
Date: 03/21/02 08:29 PM
To: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: alan@lxorguk.ukuu.org.uk (Alan Cox), dan@debian.org (Daniel Jacobowitz),
    vamsi@linux.ibm.com, pavel@suse.cz (Pavel Machek),
    linux-kernel@vger.kernel.org, marcelo@conectiva.com.br,
    tachino@jp.fujitsu.com, jefreyr@pacbell.net,
    S Vamsikrishna/India/IBM@IBMIN, Richard J Moore/UK/IBM@IBMGB,
    hanharat@us.ibm.com, Suparna Bhattacharya/India/IBM@IBMIN,
    bharata@linux.ibm.com, asit.k.mallick@intel.com,
    david.p.howell@intel.com, tony.luck@intel.com, sunil.saxena@intel.com
Subject: Re: [PATCH] multithreaded coredumps for elf exeecutables



On Thursday 21 March 2002 12:34 pm, Alan Cox wrote:
> > This is why I grabbed all those locks, and did the two sets of IPI's in the
> > tcore patch.  Once the runqueue lock is grabbed, even if that process on the
>
> If you IPI holding a lock, what's going to happen if, while the IPI is going
> across the cpus, the other processor tries to grab the runqueue lock and
> is spinning on it with interrupts off?

Then at least 2 CPUs would quickly become deadlocked on the
synchronization IPI this patch sends at the end of the
suspend_other_threads function call.

Interrupts shouldn't be turned off when grabbing the runqueue lock.  It's
also a bad thing if they happen to be off while calling into schedule.

I think schedule was designed to be called only while interrupts are turned
on.  It BUGs if "in_interrupt" to enforce this.
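
The check Mark refers to sits near the top of schedule() in 2.4's
kernel/sched.c.  This is a paraphrase from memory, so verify against your
own tree:

```c
/* Paraphrase of the guard in 2.4's schedule(): calling schedule()
 * from interrupt context is treated as a fatal bug. */
asmlinkage void schedule(void)
{
	/* ... */
	if (in_interrupt())
		goto scheduling_in_interrupt;
	/* ... normal scheduling path ... */
	return;

scheduling_in_interrupt:
	printk("Scheduling in interrupt\n");
	BUG();
}
```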

--mgross






^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2002-03-29 15:48 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2002-03-15 11:37 [PATCH] multithreaded coredumps for elf exeecutables Vamsi Krishna S .
2002-03-19 15:29 ` Pavel Machek
2002-03-19 18:49   ` Mark Gross
2002-03-20  6:06     ` Vamsi Krishna S .
2002-03-20 18:37       ` Daniel Jacobowitz
2002-03-20 16:14         ` Mark Gross
2002-03-21 10:03           ` Vamsi Krishna S .
2002-03-22 16:19             ` Mark Gross
2002-03-21 10:16         ` Vamsi Krishna S .
2002-03-21 16:27           ` Daniel Jacobowitz
2002-03-21 16:52             ` Alan Cox
2002-03-21 14:10               ` Mark Gross
2002-03-21 17:34                 ` Alan Cox
2002-03-21 14:59                   ` Mark Gross
2002-03-21 20:25                 ` Pavel Machek
2002-03-22  6:06 Suparna Bhattacharya
2002-03-29  5:43 ` Jeff Jenkins
2002-03-29 12:50   ` Mark Gross

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).