linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [BUG] mm_struct leak on cpu hotplug (s390/ppc64)
@ 2005-01-04 13:11 Heiko Carstens
  2005-01-05  2:41 ` Nathan Lynch
  0 siblings, 1 reply; 8+ messages in thread
From: Heiko Carstens @ 2005-01-04 13:11 UTC (permalink / raw)
  To: rusty, paulus, nathanl; +Cc: linux-kernel

Hi,

there is an mm_struct memory leak when using cpu hotplug. Appearently
start_secondary in smp.c initializes active_mm of the cpu's idle task
and increases init_mm's mm_count. But on cpu_die the idle task's
active_mm doesn't get dropped and therefore on the next cpu_up event
(->start_secondary) it gets overwritten and the result is a forgotten
reference count to whatever mm_struct was active when the cpu
was taken down previously.

The patch below should fix this for s390 (at least it works fine for
me), but I'm not sure if it's ok to call mmdrop from __cpu_die.

Also this very same leak exists for ppc64 as well.

Any opinions?

Thanks,
Heiko

diff -urN linux-2.6.10/arch/s390/kernel/smp.c linux-2.6.10-patched/arch/s390/kernel/smp.c
--- linux-2.6.10/arch/s390/kernel/smp.c	2004-12-24 22:35:50.000000000 +0100
+++ linux-2.6.10-patched/arch/s390/kernel/smp.c	2005-01-04 13:42:14.000000000 +0100
@@ -728,9 +728,14 @@
 void
 __cpu_die(unsigned int cpu)
 {
+	struct task_struct *p;
+
 	/* Wait until target cpu is down */
 	while (!cpu_stopped(cpu));
 	printk("Processor %d spun down\n", cpu);
+	p = current_set[cpu];
+	mmdrop(p->active_mm);
+	p->active_mm = NULL;
 }
 
 void

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [BUG] mm_struct leak on cpu hotplug (s390/ppc64)
  2005-01-04 13:11 [BUG] mm_struct leak on cpu hotplug (s390/ppc64) Heiko Carstens
@ 2005-01-05  2:41 ` Nathan Lynch
  2005-01-05 11:08   ` Ingo Molnar
  0 siblings, 1 reply; 8+ messages in thread
From: Nathan Lynch @ 2005-01-05  2:41 UTC (permalink / raw)
  To: Heiko Carstens; +Cc: rusty, paulus, linux-kernel

On Tue, 2005-01-04 at 14:11 +0100, Heiko Carstens wrote:
> Hi,
> 
> there is an mm_struct memory leak when using cpu hotplug. Appearently
> start_secondary in smp.c initializes active_mm of the cpu's idle task
> and increases init_mm's mm_count. But on cpu_die the idle task's
> active_mm doesn't get dropped and therefore on the next cpu_up event
> (->start_secondary) it gets overwritten and the result is a forgotten
> reference count to whatever mm_struct was active when the cpu
> was taken down previously.
> 
> The patch below should fix this for s390 (at least it works fine for
> me), but I'm not sure if it's ok to call mmdrop from __cpu_die.
> 
> Also this very same leak exists for ppc64 as well.
> 
> Any opinions?

Wouldn't it be better to fix this in generic code instead of duplicating
it in each architecture?  It looks like the same thing would occur on
ia64 also.

What about something like this?  Tested on ppc64.


Index: 2.6.10/kernel/sched.c
===================================================================
--- 2.6.10.orig/kernel/sched.c	2004-12-24 21:35:24.000000000 +0000
+++ 2.6.10/kernel/sched.c	2005-01-05 01:48:47.520250232 +0000
@@ -4088,6 +4088,9 @@
 		migrate_nr_uninterruptible(rq);
 		BUG_ON(rq->nr_running != 0);
 
+		/* Must manually drop reference to avoid leaking mm_structs. */
+		mmdrop(rq->idle->active_mm);
+
 		/* No need to migrate the tasks: it was best-effort if
 		 * they didn't do lock_cpu_hotplug().  Just wake up
 		 * the requestors. */



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [BUG] mm_struct leak on cpu hotplug (s390/ppc64)
  2005-01-05  2:41 ` Nathan Lynch
@ 2005-01-05 11:08   ` Ingo Molnar
  2005-01-05 14:22     ` Heiko Carstens
  2005-01-05 15:44     ` Nathan Lynch
  0 siblings, 2 replies; 8+ messages in thread
From: Ingo Molnar @ 2005-01-05 11:08 UTC (permalink / raw)
  To: Nathan Lynch; +Cc: Heiko Carstens, rusty, paulus, linux-kernel


* Nathan Lynch <nathanl@austin.ibm.com> wrote:

> What about something like this?  Tested on ppc64.

>  		migrate_nr_uninterruptible(rq);
>  		BUG_ON(rq->nr_running != 0);
>  
> +		/* Must manually drop reference to avoid leaking mm_structs. */
> +		mmdrop(rq->idle->active_mm);
> +
>  		/* No need to migrate the tasks: it was best-effort if
>  		 * they didn't do lock_cpu_hotplug().  Just wake up
>  		 * the requestors. */

this doesnt look correct to me, because we might end up pulling the rug
(the pagetables) from under the idle task on that CPU. This can happen
in two ways: 1) there's no direct synchronization between a dead CPU
having called into cpu_die() and the downing CPU doing the mmdrop(), so
we might end up dropping it before the idle has entered the final loop
and is still executing kernel code, 2) even when the dead idle task is
already in its final loop there's no generic guarantee that an mmdrop()
can be done - e.g. on x86 the kernel pagetables are mixed up with the
user pagetables and an mmdrop() in case of lazy-TLB might end up zapping
the idle task's pagetables which might break in subtle ways.

the correct solution i think would be to call back into the scheduler
from cpu_die():

void cpu_die(void)
{
        if (ppc_md.cpu_die)
                ppc_md.cpu_die();
+	idle_task_exit();
        local_irq_disable();
        for (;;);
}

and then in idle_task_exit(), do something like:

void idle_task_exit(void)
{
	struct mm_struct *mm = current->active_mm;

	if (mm != &init_mm)
		switch_mm(mm, &init_mm, current);
	mmdrop(mm);
}

(completely untested.) This makes sure that the idle task uses the
init_mm (which always has valid pagetables), and also ensures correct
reference-counting. Hm?

	Ingo

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [BUG] mm_struct leak on cpu hotplug (s390/ppc64)
  2005-01-05 11:08   ` Ingo Molnar
@ 2005-01-05 14:22     ` Heiko Carstens
  2005-01-05 15:44     ` Nathan Lynch
  1 sibling, 0 replies; 8+ messages in thread
From: Heiko Carstens @ 2005-01-05 14:22 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: linux-kernel, Nathan Lynch, paulus, rusty

> the correct solution i think would be to call back into the scheduler
> from cpu_die():
> 
> void cpu_die(void)
> {
>         if (ppc_md.cpu_die)
>                 ppc_md.cpu_die();
> +   idle_task_exit();
>         local_irq_disable();
>         for (;;);
> }
> 
> and then in idle_task_exit(), do something like:
> 
> void idle_task_exit(void)
> {
>    struct mm_struct *mm = current->active_mm;
> 
>    if (mm != &init_mm)
>       switch_mm(mm, &init_mm, current);
>    mmdrop(mm);
> }
> 
> (completely untested.) This makes sure that the idle task uses the
> init_mm (which always has valid pagetables), and also ensures correct
> reference-counting. Hm?

Looks good and works for me.

Thanks,
Heiko


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [BUG] mm_struct leak on cpu hotplug (s390/ppc64)
  2005-01-05 11:08   ` Ingo Molnar
  2005-01-05 14:22     ` Heiko Carstens
@ 2005-01-05 15:44     ` Nathan Lynch
  2005-01-07 11:43       ` Ingo Molnar
  1 sibling, 1 reply; 8+ messages in thread
From: Nathan Lynch @ 2005-01-05 15:44 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Heiko Carstens, rusty, paulus, linux-kernel

On Wed, 2005-01-05 at 12:08 +0100, Ingo Molnar wrote:
> the correct solution i think would be to call back into the scheduler
> from cpu_die():
> 
> void cpu_die(void)
> {
>         if (ppc_md.cpu_die)
>                 ppc_md.cpu_die();
> +	idle_task_exit();
>         local_irq_disable();
>         for (;;);
> }
> 
> and then in idle_task_exit(), do something like:
> 
> void idle_task_exit(void)
> {
> 	struct mm_struct *mm = current->active_mm;
> 
> 	if (mm != &init_mm)
> 		switch_mm(mm, &init_mm, current);
> 	mmdrop(mm);
> }
> 
> (completely untested.) This makes sure that the idle task uses the
> init_mm (which always has valid pagetables), and also ensures correct
> reference-counting. Hm?

OK, how's this?  I'll submit the sched and ppc64 bits separately if
there are no objections.  I assume Heiko can take care of s390.

Note that in the ppc64 cpu_die function we must call idle_task_exit
before calling ppc_md.cpu_die, because the latter does not return.

Nathan


Index: 2.6.10/include/linux/sched.h
===================================================================
--- 2.6.10.orig/include/linux/sched.h	2004-12-24 21:33:59.000000000 +0000
+++ 2.6.10/include/linux/sched.h	2005-01-05 14:30:39.000000000 +0000
@@ -735,6 +735,7 @@
 #endif
 
 extern void sched_idle_next(void);
+extern void idle_task_exit(void);
 extern void set_user_nice(task_t *p, long nice);
 extern int task_prio(const task_t *p);
 extern int task_nice(const task_t *p);
Index: 2.6.10/kernel/sched.c
===================================================================
--- 2.6.10.orig/kernel/sched.c	2004-12-24 21:35:24.000000000 +0000
+++ 2.6.10/kernel/sched.c	2005-01-05 14:36:40.000000000 +0000
@@ -3995,6 +3995,20 @@
 	spin_unlock_irqrestore(&rq->lock, flags);
 }
 
+/* Ensures that the idle task is using init_mm right before its cpu goes
+ * offline.
+ */
+void idle_task_exit(void)
+{
+	struct mm_struct *mm = current->active_mm;
+
+	BUG_ON(cpu_online(smp_processor_id()));
+
+	if (mm != &init_mm)
+		switch_mm(mm, &init_mm, current);
+	mmdrop(mm);
+}
+
 static void migrate_dead(unsigned int dead_cpu, task_t *tsk)
 {
 	struct runqueue *rq = cpu_rq(dead_cpu);
Index: 2.6.10/arch/ppc64/kernel/setup.c
===================================================================
--- 2.6.10.orig/arch/ppc64/kernel/setup.c	2004-12-24 21:34:44.000000000 +0000
+++ 2.6.10/arch/ppc64/kernel/setup.c	2005-01-05 14:39:43.000000000 +0000
@@ -1337,6 +1337,7 @@
 
 void cpu_die(void)
 {
+	idle_task_exit();
 	if (ppc_md.cpu_die)
 		ppc_md.cpu_die();
 	local_irq_disable();



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [BUG] mm_struct leak on cpu hotplug (s390/ppc64)
  2005-01-05 15:44     ` Nathan Lynch
@ 2005-01-07 11:43       ` Ingo Molnar
  2005-01-07 21:43         ` [PATCH] introduce idle_task_exit Nathan Lynch
  0 siblings, 1 reply; 8+ messages in thread
From: Ingo Molnar @ 2005-01-07 11:43 UTC (permalink / raw)
  To: Nathan Lynch; +Cc: Heiko Carstens, rusty, paulus, linux-kernel, Andrew Morton


* Nathan Lynch <nathanl@austin.ibm.com> wrote:

> OK, how's this?  I'll submit the sched and ppc64 bits separately if
> there are no objections.  I assume Heiko can take care of s390.
> 
> Note that in the ppc64 cpu_die function we must call idle_task_exit
> before calling ppc_md.cpu_die, because the latter does not return.

Signed-off-by: Ingo Molnar <mingo@elte.hu>

	Ingo

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH] introduce idle_task_exit
  2005-01-07 11:43       ` Ingo Molnar
@ 2005-01-07 21:43         ` Nathan Lynch
  2005-01-07 21:44           ` [PATCH] ppc64: call idle_task_exit from cpu_die Nathan Lynch
  0 siblings, 1 reply; 8+ messages in thread
From: Nathan Lynch @ 2005-01-07 21:43 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Heiko Carstens, Rusty Russell, paulus, lkml, Ingo Molnar

On Fri, 2005-01-07 at 05:43, Ingo Molnar wrote:
> * Nathan Lynch <nathanl@austin.ibm.com> wrote:
> 
> > OK, how's this?  I'll submit the sched and ppc64 bits separately if
> > there are no objections.  I assume Heiko can take care of s390.
> > 
> > Note that in the ppc64 cpu_die function we must call idle_task_exit
> > before calling ppc_md.cpu_die, because the latter does not return.
> 
> Signed-off-by: Ingo Molnar <mingo@elte.hu>


Heiko Carstens figured out that offlining a cpu can leak mm_structs
because the dying cpu's idle task fails to switch to init_mm and
mmdrop its active_mm before the cpu is down.  This patch introduces
idle_task_exit, which allows the idle task to do this as Ingo
suggested.

I will follow this up with a patch for ppc64 which calls
idle_task_exit from cpu_die.

Signed-off-by: Nathan Lynch <nathanl@austin.ibm.com>


---


diff -puN kernel/sched.c~sched-idle_task_exit kernel/sched.c
--- linux-2.6.10-bk10/kernel/sched.c~sched-idle_task_exit	2005-01-07 15:23:35.000000000 -0600
+++ linux-2.6.10-bk10-nathanl/kernel/sched.c	2005-01-07 15:23:35.000000000 -0600
@@ -3996,6 +3996,20 @@ void sched_idle_next(void)
 	spin_unlock_irqrestore(&rq->lock, flags);
 }
 
+/* Ensures that the idle task is using init_mm right before its cpu goes
+ * offline.
+ */
+void idle_task_exit(void)
+{
+	struct mm_struct *mm = current->active_mm;
+
+	BUG_ON(cpu_online(smp_processor_id()));
+
+	if (mm != &init_mm)
+		switch_mm(mm, &init_mm, current);
+	mmdrop(mm);
+}
+
 static void migrate_dead(unsigned int dead_cpu, task_t *tsk)
 {
 	struct runqueue *rq = cpu_rq(dead_cpu);
diff -puN include/linux/sched.h~sched-idle_task_exit include/linux/sched.h
--- linux-2.6.10-bk10/include/linux/sched.h~sched-idle_task_exit	2005-01-07 15:23:35.000000000 -0600
+++ linux-2.6.10-bk10-nathanl/include/linux/sched.h	2005-01-07 15:23:35.000000000 -0600
@@ -757,6 +757,7 @@ extern void sched_exec(void);
 #endif
 
 extern void sched_idle_next(void);
+extern void idle_task_exit(void);
 extern void set_user_nice(task_t *p, long nice);
 extern int task_prio(const task_t *p);
 extern int task_nice(const task_t *p);

_



^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH] ppc64: call idle_task_exit from cpu_die
  2005-01-07 21:43         ` [PATCH] introduce idle_task_exit Nathan Lynch
@ 2005-01-07 21:44           ` Nathan Lynch
  0 siblings, 0 replies; 8+ messages in thread
From: Nathan Lynch @ 2005-01-07 21:44 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Heiko Carstens, Rusty Russell, paulus, lkml, Ingo Molnar

Call idle_task_exit from cpu_die to avoid mm_struct leak.

Signed-off-by: Nathan Lynch <nathanl@austin.ibm.com>


---


diff -puN arch/ppc64/kernel/setup.c~ppc64-call-idle_task_exit-from-cpu_die arch/ppc64/kernel/setup.c
--- linux-2.6.10-bk10/arch/ppc64/kernel/setup.c~ppc64-call-idle_task_exit-from-cpu_die	2005-01-07 15:33:02.000000000 -0600
+++ linux-2.6.10-bk10-nathanl/arch/ppc64/kernel/setup.c	2005-01-07 15:33:23.000000000 -0600
@@ -1345,6 +1345,7 @@ early_param("xmon", early_xmon);
 
 void cpu_die(void)
 {
+	idle_task_exit();
 	if (ppc_md.cpu_die)
 		ppc_md.cpu_die();
 	local_irq_disable();

_



^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2005-01-07 21:49 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2005-01-04 13:11 [BUG] mm_struct leak on cpu hotplug (s390/ppc64) Heiko Carstens
2005-01-05  2:41 ` Nathan Lynch
2005-01-05 11:08   ` Ingo Molnar
2005-01-05 14:22     ` Heiko Carstens
2005-01-05 15:44     ` Nathan Lynch
2005-01-07 11:43       ` Ingo Molnar
2005-01-07 21:43         ` [PATCH] introduce idle_task_exit Nathan Lynch
2005-01-07 21:44           ` [PATCH] ppc64: call idle_task_exit from cpu_die Nathan Lynch

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).