linux-kernel.vger.kernel.org archive mirror
* [PATCH -mm] mm/oom_kill: change oom_kill_process() to bump ->mm_count rather than ->mm_users
@ 2015-10-05 16:34 Oleg Nesterov
  2015-10-05 17:36 ` Oleg Nesterov
  2015-10-06 16:28 ` [PATCH -mm] mmoom-fix-potentially-killing-unrelated-process-fix Oleg Nesterov
  0 siblings, 2 replies; 6+ messages in thread
From: Oleg Nesterov @ 2015-10-05 16:34 UTC (permalink / raw)
  To: Andrew Morton
  Cc: David Rientjes, Kyle Walker, Michal Hocko, Stanislav Kozina,
	Tetsuo Handa, linux-kernel

oom_kill_process() does atomic_inc(&mm->mm_users) to ensure that
this ->mm can't go away. This is wrong; change it to rely on
->mm_count and mmdrop().

Firstly, we do not want to delay exit_mmap/etc if the victim exits
before we do mmput(), but this is minor.

More importantly, we simply cannot do mmput() in oom_kill_process();
this can deadlock. For example, suppose that access_process_vm(tsk)
triggers OOM and oom-killer decides to kill this "tsk". If it exits
and does mmput() before us, ksm_exit() called by us may want to
take the same mmap_sem for writing.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
 mm/oom_kill.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 034d219..52abb78 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -571,7 +571,7 @@ void oom_kill_process(struct oom_control *oc, struct task_struct *p,
 
 	/* Get a reference to safely compare mm after task_unlock(victim) */
 	mm = victim->mm;
-	atomic_inc(&mm->mm_users);
+	atomic_inc(&mm->mm_count);
 	/*
 	 * We should send SIGKILL before setting TIF_MEMDIE in order to prevent
 	 * the OOM victim from depleting the memory reserves from the user
@@ -609,7 +609,7 @@ void oom_kill_process(struct oom_control *oc, struct task_struct *p,
 	}
 	rcu_read_unlock();
 
-	mmput(mm);
+	mmdrop(mm);
 	put_task_struct(victim);
 }
 #undef K
-- 
2.4.3




* Re: [PATCH -mm] mm/oom_kill: change oom_kill_process() to bump ->mm_count rather than ->mm_users
  2015-10-05 16:34 [PATCH -mm] mm/oom_kill: change oom_kill_process() to bump ->mm_count rather than ->mm_users Oleg Nesterov
@ 2015-10-05 17:36 ` Oleg Nesterov
  2015-10-06 16:26   ` Oleg Nesterov
  2015-10-06 16:28 ` [PATCH -mm] mmoom-fix-potentially-killing-unrelated-process-fix Oleg Nesterov
  1 sibling, 1 reply; 6+ messages in thread
From: Oleg Nesterov @ 2015-10-05 17:36 UTC (permalink / raw)
  To: Andrew Morton
  Cc: David Rientjes, Kyle Walker, Michal Hocko, Stanislav Kozina,
	Tetsuo Handa, linux-kernel

On 10/05, Oleg Nesterov wrote:
>
> oom_kill_process() does atomic_inc(&mm->mm_users) to ensure that
> this ->mm can't go away and this is wrong, change it to rely on
> ->mm_count and mmdrop().
>
> Firstly, we do not want to delay exit_mmap/etc if the victim exits
> before we do mmput(), but this is minor.
>
> More importantly, we simply can not do mmput() in oom_kill_process(),
> this can deadlock. For example, suppose that access_process_vm(tsk)
> triggers OOM and oom-killer decides to kill this "tsk". If it exits
> and does mmput() before us, ksm_exit() called by us may want to
> take the same mmap_sem for writing.

Self nack to the changelog ;)

I still think the patch is fine, I'll resend it. But the changelog
is wrong, in the case above access_process_vm() adds another reference
to ->mm_users, so mmput() can never lead to mm_users == 0.
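
The correction can be checked with a toy userspace model of the two counters. This is a sketch, not kernel code: the toy_* names are made up for illustration, and only the mm_users/mm_count field names are borrowed from the kernel. It replays the scenario above under the old code and reports how many mm_users references remain after the OOM killer's mmput().

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: a stand-in for struct mm_struct, not the kernel's type. */
struct toy_mm {
	int mm_users;	/* users of the address space */
	int mm_count;	/* references to the struct; all mm_users share one */
	bool unmapped;	/* set where the kernel would run exit_mmap() */
};

/* Model of mmput(): only the last user tears the address space down. */
static void toy_mmput(struct toy_mm *mm)
{
	if (--mm->mm_users == 0) {
		mm->unmapped = true;
		mm->mm_count--;	/* drop the reference shared by all users */
	}
}

/* Returns the mm_users left after the OOM killer's mmput(). */
static int toy_access_process_vm_case(struct toy_mm *mm)
{
	mm->mm_users = 1;	/* the victim task itself */
	mm->mm_count = 1;
	mm->unmapped = false;

	mm->mm_users++;		/* access_process_vm() takes its own reference */
	mm->mm_users++;		/* old code: atomic_inc(&mm->mm_users) */

	toy_mmput(mm);		/* the victim exits first */
	toy_mmput(mm);		/* then the OOM killer drops its reference */
	return mm->mm_users;
}
```

In this model the function returns 1 and the address space is never torn down, because the reference taken on behalf of access_process_vm() is still outstanding.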


> Signed-off-by: Oleg Nesterov <oleg@redhat.com>
> ---
>  mm/oom_kill.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> index 034d219..52abb78 100644
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -571,7 +571,7 @@ void oom_kill_process(struct oom_control *oc, struct task_struct *p,
>  
>  	/* Get a reference to safely compare mm after task_unlock(victim) */
>  	mm = victim->mm;
> -	atomic_inc(&mm->mm_users);
> +	atomic_inc(&mm->mm_count);
>  	/*
>  	 * We should send SIGKILL before setting TIF_MEMDIE in order to prevent
>  	 * the OOM victim from depleting the memory reserves from the user
> @@ -609,7 +609,7 @@ void oom_kill_process(struct oom_control *oc, struct task_struct *p,
>  	}
>  	rcu_read_unlock();
>  
> -	mmput(mm);
> +	mmdrop(mm);
>  	put_task_struct(victim);
>  }
>  #undef K
> -- 
> 2.4.3
> 



* Re: [PATCH -mm] mm/oom_kill: change oom_kill_process() to bump ->mm_count rather than ->mm_users
  2015-10-05 17:36 ` Oleg Nesterov
@ 2015-10-06 16:26   ` Oleg Nesterov
  0 siblings, 0 replies; 6+ messages in thread
From: Oleg Nesterov @ 2015-10-06 16:26 UTC (permalink / raw)
  To: Andrew Morton
  Cc: David Rientjes, Kyle Walker, Michal Hocko, Stanislav Kozina,
	Tetsuo Handa, linux-kernel

On 10/05, Oleg Nesterov wrote:
>
> Self nack to the changelog ;)
>
> I still think the patch is fine, I'll resend it. But the changelog
> is wrong, in the case above access_process_vm() adds another reference
> to ->mm_users, so mmput() can never lead to mm_users == 0.

Please see the v2 with an updated changelog that I am going to send. However,
I somehow forgot that "mm/oom_kill.c: fix potentially killing unrelated process"
still sits in -mm, so I guess it would be better to fold this change into
mmoom-fix-potentially-killing-unrelated-process.patch.

Oleg.



* [PATCH -mm] mmoom-fix-potentially-killing-unrelated-process-fix
  2015-10-05 16:34 [PATCH -mm] mm/oom_kill: change oom_kill_process() to bump ->mm_count rather than ->mm_users Oleg Nesterov
  2015-10-05 17:36 ` Oleg Nesterov
@ 2015-10-06 16:28 ` Oleg Nesterov
  2015-10-06 16:56   ` Michal Hocko
  1 sibling, 1 reply; 6+ messages in thread
From: Oleg Nesterov @ 2015-10-06 16:28 UTC (permalink / raw)
  To: Andrew Morton
  Cc: David Rientjes, Kyle Walker, Michal Hocko, Stanislav Kozina,
	Tetsuo Handa, linux-kernel

oom_kill_process() does atomic_inc(&mm->mm_users) to ensure that
this ->mm can't go away. This is wrong; change it to rely on
->mm_count and mmdrop().

Firstly, we do not want to delay exit_mmap/etc if the victim exits
before we do mmput(), but this is minor.

More importantly, we simply cannot do mmput() in oom_kill_process();
this can deadlock if (for example) the caller holds i_mmap_rwsem and
mmput() actually leads to exit_mmap(). The victim can have this file
mmapped, and in this case the unmap_vmas/free_pgtables paths will take
the same lock for writing. And at least huge_pmd_share() does pmd_alloc()
under i_mmap_rwsem because VM_HUGETLB memory is not reclaimable.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
 mm/oom_kill.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 034d219..52abb78 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -571,7 +571,7 @@ void oom_kill_process(struct oom_control *oc, struct task_struct *p,
 
 	/* Get a reference to safely compare mm after task_unlock(victim) */
 	mm = victim->mm;
-	atomic_inc(&mm->mm_users);
+	atomic_inc(&mm->mm_count);
 	/*
 	 * We should send SIGKILL before setting TIF_MEMDIE in order to prevent
 	 * the OOM victim from depleting the memory reserves from the user
@@ -609,7 +609,7 @@ void oom_kill_process(struct oom_control *oc, struct task_struct *p,
 	}
 	rcu_read_unlock();
 
-	mmput(mm);
+	mmdrop(mm);
 	put_task_struct(victim);
 }
 #undef K
-- 
2.4.3




* Re: [PATCH -mm] mmoom-fix-potentially-killing-unrelated-process-fix
  2015-10-06 16:28 ` [PATCH -mm] mmoom-fix-potentially-killing-unrelated-process-fix Oleg Nesterov
@ 2015-10-06 16:56   ` Michal Hocko
  2015-10-06 22:49     ` Hugh Dickins
  0 siblings, 1 reply; 6+ messages in thread
From: Michal Hocko @ 2015-10-06 16:56 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Andrew Morton, David Rientjes, Kyle Walker, Stanislav Kozina,
	Tetsuo Handa, linux-kernel

On Tue 06-10-15 18:28:04, Oleg Nesterov wrote:
> oom_kill_process() does atomic_inc(&mm->mm_users) to ensure that
> this ->mm can't go away and this is wrong, change it to rely on
> ->mm_count and mmdrop().
> 
> Firstly, we do not want to delay exit_mmap/etc if the victim exits
> before we do mmput(), but this is minor.
> 
> More importantly, we simply can not do mmput() in oom_kill_process(),
> this can deadlock if (for example) the caller holds i_mmap_rwsem and
> mmput() actually leads to exit_mmap(); the victim can have this file
> mmaped and in this case unmap_vmas/free_pgtables paths will take the
> same lock for writing. And at least huge_pmd_share() does pmd_alloc()
> under i_mmap_rwsem because VM_HUGETLB memory is not reclaimable.

Ouch, I have completely missed this during review! Thanks for catching
this. On second thought, it is clear now: we really want to pin the
mm_struct, not the address space.
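
The difference between pinning the struct and pinning the address space can be sketched with a toy userspace model. This is an illustration under stated assumptions, not kernel code: the toy_* names are hypothetical, and only the mm_users/mm_count field names mirror the kernel. With the patch applied, the victim's exit can tear the address space down immediately while the structure itself stays valid until mmdrop().

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: a stand-in for struct mm_struct, not the kernel's type. */
struct toy_mm {
	int mm_users;	/* users of the address space */
	int mm_count;	/* references to the struct; all mm_users share one */
	bool unmapped;	/* set where the kernel would run exit_mmap() */
	bool freed;	/* set where the kernel would free the struct */
};

/* Model of mmdrop(): the last reference frees the struct itself. */
static void toy_mmdrop(struct toy_mm *mm)
{
	if (--mm->mm_count == 0)
		mm->freed = true;
}

/* Model of mmput(): the last user tears down the address space. */
static void toy_mmput(struct toy_mm *mm)
{
	if (--mm->mm_users == 0) {
		mm->unmapped = true;
		toy_mmdrop(mm);
	}
}

/* Patched flow: records the state seen while the OOM killer holds its pin. */
static void toy_patched_oom_kill(struct toy_mm *mm, bool *unmapped_while_pinned,
				 bool *freed_while_pinned)
{
	mm->mm_users = 1;	/* the victim task */
	mm->mm_count = 1;
	mm->unmapped = false;
	mm->freed = false;

	mm->mm_count++;		/* atomic_inc(&mm->mm_count) */
	toy_mmput(mm);		/* victim exits; teardown is not delayed */
	*unmapped_while_pinned = mm->unmapped;
	*freed_while_pinned = mm->freed;
	toy_mmdrop(mm);		/* the OOM killer is done comparing mm pointers */
}
```

In this model the address space is already gone while the pin is held, yet the structure is only released by the final toy_mmdrop(), which is enough for comparing mm pointers safely.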

> Signed-off-by: Oleg Nesterov <oleg@redhat.com>

Acked-by: Michal Hocko <mhocko@suse.com>

> ---
>  mm/oom_kill.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> index 034d219..52abb78 100644
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -571,7 +571,7 @@ void oom_kill_process(struct oom_control *oc, struct task_struct *p,
>  
>  	/* Get a reference to safely compare mm after task_unlock(victim) */
>  	mm = victim->mm;
> -	atomic_inc(&mm->mm_users);
> +	atomic_inc(&mm->mm_count);
>  	/*
>  	 * We should send SIGKILL before setting TIF_MEMDIE in order to prevent
>  	 * the OOM victim from depleting the memory reserves from the user
> @@ -609,7 +609,7 @@ void oom_kill_process(struct oom_control *oc, struct task_struct *p,
>  	}
>  	rcu_read_unlock();
>  
> -	mmput(mm);
> +	mmdrop(mm);
>  	put_task_struct(victim);
>  }
>  #undef K
> -- 
> 2.4.3
> 

-- 
Michal Hocko
SUSE Labs


* Re: [PATCH -mm] mmoom-fix-potentially-killing-unrelated-process-fix
  2015-10-06 16:56   ` Michal Hocko
@ 2015-10-06 22:49     ` Hugh Dickins
  0 siblings, 0 replies; 6+ messages in thread
From: Hugh Dickins @ 2015-10-06 22:49 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Oleg Nesterov, Sasha Levin, Andrew Morton, David Rientjes,
	Kyle Walker, Stanislav Kozina, Tetsuo Handa, linux-kernel

On Tue, 6 Oct 2015, Michal Hocko wrote:
> On Tue 06-10-15 18:28:04, Oleg Nesterov wrote:
> > oom_kill_process() does atomic_inc(&mm->mm_users) to ensure that
> > this ->mm can't go away and this is wrong, change it to rely on
> > ->mm_count and mmdrop().
> > 
> > Firstly, we do not want to delay exit_mmap/etc if the victim exits
> > before we do mmput(), but this is minor.
> > 
> > More importantly, we simply can not do mmput() in oom_kill_process(),
> > this can deadlock if (for example) the caller holds i_mmap_rwsem and
> > mmput() actually leads to exit_mmap(); the victim can have this file
> > mmaped and in this case unmap_vmas/free_pgtables paths will take the
> > same lock for writing. And at least huge_pmd_share() does pmd_alloc()
> > under i_mmap_rwsem because VM_HUGETLB memory is not reclaimable.
> 
> Ouch, I have completely missed this during review! Thanks for catching
> this. On the second thought it is clear now. We really want to pin the
> mm_struct not the address space.
> 
> > Signed-off-by: Oleg Nesterov <oleg@redhat.com>
> 
> Acked-by: Michal Hocko <mhocko@suse.com>

Acked-by: Hugh Dickins <hughd@google.com>

Thanks: looks like this is what was behind recent trinity/KSM deadlock,
https://lkml.org/lkml/2015/10/1/563
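
The i_mmap_rwsem case from the changelog can be illustrated with a toy userspace model. It is a sketch under heavy simplifying assumptions, not kernel code: all toy_* names are hypothetical, the lock is reduced to a single exclusive flag, and a failed trylock stands in for "the real code would sleep forever on a lock its own caller holds".

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a sleeping lock such as i_mmap_rwsem. */
struct toy_rwsem {
	bool held;
};

static bool toy_trylock(struct toy_rwsem *sem)
{
	if (sem->held)
		return false;	/* real code would block here */
	sem->held = true;
	return true;
}

static void toy_unlock(struct toy_rwsem *sem)
{
	sem->held = false;
}

struct toy_mm {
	int mm_users;
	int mm_count;
	struct toy_rwsem *file_lock;	/* lock that teardown must take */
};

/* Model of mmput(): returns false where the final put would deadlock. */
static bool toy_mmput(struct toy_mm *mm)
{
	if (--mm->mm_users > 0)
		return true;
	if (!toy_trylock(mm->file_lock))
		return false;	/* teardown needs the lock for writing */
	toy_unlock(mm->file_lock);
	mm->mm_count--;
	return true;
}

/* Model of mmdrop(): never touches the address space or its locks. */
static bool toy_mmdrop(struct toy_mm *mm)
{
	mm->mm_count--;
	return true;
}

/* An allocation under the lock triggers the OOM killer; can its put finish? */
static bool toy_oom_drop_completes(bool use_mm_count)
{
	struct toy_rwsem i_mmap = { .held = false };
	struct toy_mm mm = { .mm_users = 1, .mm_count = 1, .file_lock = &i_mmap };
	bool completed;

	toy_trylock(&i_mmap);	/* caller allocates while holding the lock */

	if (use_mm_count) {
		mm.mm_count++;	/* patched: pin the struct only */
		completed = toy_mmdrop(&mm);
	} else {
		mm.mm_users++;	/* old: pin the address space */
		toy_mmput(&mm);	/* victim exits first: 2 -> 1 */
		completed = toy_mmput(&mm);	/* killer is the last user */
	}

	toy_unlock(&i_mmap);
	return completed;
}
```

In this model toy_oom_drop_completes(true) succeeds, while toy_oom_drop_completes(false) reports that the final put could not make progress.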

> 
> > ---
> >  mm/oom_kill.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> > 
> > diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> > index 034d219..52abb78 100644
> > --- a/mm/oom_kill.c
> > +++ b/mm/oom_kill.c
> > @@ -571,7 +571,7 @@ void oom_kill_process(struct oom_control *oc, struct task_struct *p,
> >  
> >  	/* Get a reference to safely compare mm after task_unlock(victim) */
> >  	mm = victim->mm;
> > -	atomic_inc(&mm->mm_users);
> > +	atomic_inc(&mm->mm_count);
> >  	/*
> >  	 * We should send SIGKILL before setting TIF_MEMDIE in order to prevent
> >  	 * the OOM victim from depleting the memory reserves from the user
> > @@ -609,7 +609,7 @@ void oom_kill_process(struct oom_control *oc, struct task_struct *p,
> >  	}
> >  	rcu_read_unlock();
> >  
> > -	mmput(mm);
> > +	mmdrop(mm);
> >  	put_task_struct(victim);
> >  }
> >  #undef K
> > -- 
> > 2.4.3
> > 
> 
> -- 
> Michal Hocko
> SUSE Labs

