* [PATCH -mm] mm/oom_kill: change oom_kill_process() to bump ->mm_count rather than ->mm_users
@ 2015-10-05 16:34 Oleg Nesterov
2015-10-05 17:36 ` Oleg Nesterov
2015-10-06 16:28 ` [PATCH -mm] mmoom-fix-potentially-killing-unrelated-process-fix Oleg Nesterov
0 siblings, 2 replies; 6+ messages in thread
From: Oleg Nesterov @ 2015-10-05 16:34 UTC (permalink / raw)
To: Andrew Morton
Cc: David Rientjes, Kyle Walker, Michal Hocko, Stanislav Kozina,
Tetsuo Handa, linux-kernel
oom_kill_process() does atomic_inc(&mm->mm_users) to ensure that
this ->mm can't go away and this is wrong, change it to rely on
->mm_count and mmdrop().

Firstly, we do not want to delay exit_mmap/etc if the victim exits
before we do mmput(), but this is minor.

More importantly, we simply cannot do mmput() in oom_kill_process(),
this can deadlock. For example, suppose that access_process_vm(tsk)
triggers OOM and oom-killer decides to kill this "tsk". If it exits
and does mmput() before us, ksm_exit() called by us may want to
take the same mmap_sem for writing.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
mm/oom_kill.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 034d219..52abb78 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -571,7 +571,7 @@ void oom_kill_process(struct oom_control *oc, struct task_struct *p,
 
 	/* Get a reference to safely compare mm after task_unlock(victim) */
 	mm = victim->mm;
-	atomic_inc(&mm->mm_users);
+	atomic_inc(&mm->mm_count);
 	/*
 	 * We should send SIGKILL before setting TIF_MEMDIE in order to prevent
 	 * the OOM victim from depleting the memory reserves from the user
@@ -609,7 +609,7 @@ void oom_kill_process(struct oom_control *oc, struct task_struct *p,
 	}
 	rcu_read_unlock();
 
-	mmput(mm);
+	mmdrop(mm);
 	put_task_struct(victim);
 }
 #undef K
--
2.4.3
* Re: [PATCH -mm] mm/oom_kill: change oom_kill_process() to bump ->mm_count rather than ->mm_users
2015-10-05 16:34 [PATCH -mm] mm/oom_kill: change oom_kill_process() to bump ->mm_count rather than ->mm_users Oleg Nesterov
@ 2015-10-05 17:36 ` Oleg Nesterov
2015-10-06 16:26 ` Oleg Nesterov
2015-10-06 16:28 ` [PATCH -mm] mmoom-fix-potentially-killing-unrelated-process-fix Oleg Nesterov
1 sibling, 1 reply; 6+ messages in thread
From: Oleg Nesterov @ 2015-10-05 17:36 UTC (permalink / raw)
To: Andrew Morton
Cc: David Rientjes, Kyle Walker, Michal Hocko, Stanislav Kozina,
Tetsuo Handa, linux-kernel
On 10/05, Oleg Nesterov wrote:
>
> oom_kill_process() does atomic_inc(&mm->mm_users) to ensure that
> this ->mm can't go away and this is wrong, change it to rely on
> ->mm_count and mmdrop().
>
> Firstly, we do not want to delay exit_mmap/etc if the victim exits
> before we do mmput(), but this is minor.
>
> More importantly, we simply can not do mmput() in oom_kill_process(),
> this can deadlock. For example, suppose that access_process_vm(tsk)
> triggers OOM and oom-killer decides to kill this "tsk". If it exits
> and does mmput() before us, ksm_exit() called by us may want to
> take the same mmap_sem for writing.

Self nack to the changelog ;)

I still think the patch is fine, I'll resend it. But the changelog
is wrong: in the case above access_process_vm() adds another reference
to ->mm_users, so our mmput() can never lead to mm_users == 0.

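
To make the counting in that scenario concrete, here is a toy userspace sketch. It is not kernel code: struct toy_users and both helpers are hypothetical names invented for illustration, and the only thing modelled is the ->mm_users counter. It assumes the victim starts with a single user reference and that the caller of access_process_vm() holds its extra reference across the whole OOM path, as described above.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy counter, NOT kernel code: models only ->mm_users. */
struct toy_users {
	int mm_users;
};

/* Returns true when this put was the final one, i.e. where exit_mmap() would run. */
static bool toy_put_users(struct toy_users *mm)
{
	assert(mm->mm_users > 0);
	return --mm->mm_users == 0;
}

/*
 * Hypothetical helper: replays the scenario from the first changelog and
 * reports whether the put done by the oom killer was the final one.
 */
static bool toy_oom_put_is_final(void)
{
	struct toy_users mm = { .mm_users = 1 };	/* the victim itself */
	bool victim_final, oom_final;

	mm.mm_users++;	/* reference added by access_process_vm() (per the reply above) */
	mm.mm_users++;	/* the old atomic_inc(&mm->mm_users) in oom_kill_process() */

	victim_final = toy_put_users(&mm);	/* victim exits first */
	oom_final = toy_put_users(&mm);		/* then the oom killer's mmput() */

	assert(!victim_final);
	return oom_final;
}
```

In this model the oom killer's put leaves one reference behind (the one held for access_process_vm()), so it is never the put that triggers teardown in this particular scenario.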
> Signed-off-by: Oleg Nesterov <oleg@redhat.com>
> ---
> mm/oom_kill.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> index 034d219..52abb78 100644
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -571,7 +571,7 @@ void oom_kill_process(struct oom_control *oc, struct task_struct *p,
>
> /* Get a reference to safely compare mm after task_unlock(victim) */
> mm = victim->mm;
> - atomic_inc(&mm->mm_users);
> + atomic_inc(&mm->mm_count);
> /*
> * We should send SIGKILL before setting TIF_MEMDIE in order to prevent
> * the OOM victim from depleting the memory reserves from the user
> @@ -609,7 +609,7 @@ void oom_kill_process(struct oom_control *oc, struct task_struct *p,
> }
> rcu_read_unlock();
>
> - mmput(mm);
> + mmdrop(mm);
> put_task_struct(victim);
> }
> #undef K
> --
> 2.4.3
>
* Re: [PATCH -mm] mm/oom_kill: change oom_kill_process() to bump ->mm_count rather than ->mm_users
2015-10-05 17:36 ` Oleg Nesterov
@ 2015-10-06 16:26 ` Oleg Nesterov
0 siblings, 0 replies; 6+ messages in thread
From: Oleg Nesterov @ 2015-10-06 16:26 UTC (permalink / raw)
To: Andrew Morton
Cc: David Rientjes, Kyle Walker, Michal Hocko, Stanislav Kozina,
Tetsuo Handa, linux-kernel
On 10/05, Oleg Nesterov wrote:
>
> Self nack to the changelog ;)
>
> I still think the patch is fine, I'll resend it. But the changelog
> is wrong, in the case above access_process_vm() adds another reference
> to ->mm_users, so mmput() can never lead to mm_users == 0.
Please see v2 with updated changelog I am going to send. However, somehow
I forgot that "mm/oom_kill.c: fix potentially killing unrelated process"
still sits in -mm, so I guess it would be better to fold this change into
mmoom-fix-potentially-killing-unrelated-process.patch.
Oleg.
* [PATCH -mm] mmoom-fix-potentially-killing-unrelated-process-fix
2015-10-05 16:34 [PATCH -mm] mm/oom_kill: change oom_kill_process() to bump ->mm_count rather than ->mm_users Oleg Nesterov
2015-10-05 17:36 ` Oleg Nesterov
@ 2015-10-06 16:28 ` Oleg Nesterov
2015-10-06 16:56 ` Michal Hocko
1 sibling, 1 reply; 6+ messages in thread
From: Oleg Nesterov @ 2015-10-06 16:28 UTC (permalink / raw)
To: Andrew Morton
Cc: David Rientjes, Kyle Walker, Michal Hocko, Stanislav Kozina,
Tetsuo Handa, linux-kernel
oom_kill_process() does atomic_inc(&mm->mm_users) to ensure that
this ->mm can't go away and this is wrong, change it to rely on
->mm_count and mmdrop().

Firstly, we do not want to delay exit_mmap/etc if the victim exits
before we do mmput(), but this is minor.

More importantly, we simply cannot do mmput() in oom_kill_process(),
this can deadlock if (for example) the caller holds i_mmap_rwsem and
mmput() actually leads to exit_mmap(); the victim can have this file
mmapped and in this case unmap_vmas/free_pgtables paths will take the
same lock for writing. And at least huge_pmd_share() does pmd_alloc()
under i_mmap_rwsem because VM_HUGETLB memory is not reclaimable.

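
The difference between the two counters can be sketched with a toy userspace model. This is not kernel code: struct toy_mm and the toy_* helpers are hypothetical names that only mirror the kernel's, there is no locking or atomicity, and exit_mmap() is reduced to setting a flag. The one property it tries to capture is that all ->mm_users together hold a single ->mm_count reference, so dropping ->mm_count can free the structure but never tears down the mappings.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Toy userspace model, NOT kernel code. */
struct toy_mm {
	int mm_users;		/* users of the address space */
	int mm_count;		/* references to the struct; all users share one */
	bool mmap_torn_down;	/* set where exit_mmap() would have run */
};

static int toy_mm_freed;	/* number of structs actually freed */

static struct toy_mm *toy_mm_alloc(void)
{
	struct toy_mm *mm = calloc(1, sizeof(*mm));

	assert(mm);
	mm->mm_users = 1;
	mm->mm_count = 1;
	return mm;
}

/* Drop a struct reference: frees memory only, never touches the mappings. */
static void toy_mmdrop(struct toy_mm *mm)
{
	assert(mm->mm_count > 0);
	if (--mm->mm_count == 0) {
		toy_mm_freed++;
		free(mm);
	}
}

/* Drop a user reference: the last user tears down the address space. */
static void toy_mmput(struct toy_mm *mm)
{
	assert(mm->mm_users > 0);
	if (--mm->mm_users == 0) {
		/* Stands in for exit_mmap() and every lock it may take. */
		mm->mmap_torn_down = true;
		toy_mmdrop(mm);
	}
}
```

With this model, pinning via mm->mm_count++ and later calling toy_mmdrop() keeps the structure valid for the pointer comparison, while the teardown always runs in whoever does the last toy_mmput(), which is the victim rather than the oom killer.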
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
mm/oom_kill.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 034d219..52abb78 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -571,7 +571,7 @@ void oom_kill_process(struct oom_control *oc, struct task_struct *p,
 
 	/* Get a reference to safely compare mm after task_unlock(victim) */
 	mm = victim->mm;
-	atomic_inc(&mm->mm_users);
+	atomic_inc(&mm->mm_count);
 	/*
 	 * We should send SIGKILL before setting TIF_MEMDIE in order to prevent
 	 * the OOM victim from depleting the memory reserves from the user
@@ -609,7 +609,7 @@ void oom_kill_process(struct oom_control *oc, struct task_struct *p,
 	}
 	rcu_read_unlock();
 
-	mmput(mm);
+	mmdrop(mm);
 	put_task_struct(victim);
 }
 #undef K
--
2.4.3
* Re: [PATCH -mm] mmoom-fix-potentially-killing-unrelated-process-fix
2015-10-06 16:28 ` [PATCH -mm] mmoom-fix-potentially-killing-unrelated-process-fix Oleg Nesterov
@ 2015-10-06 16:56 ` Michal Hocko
2015-10-06 22:49 ` Hugh Dickins
0 siblings, 1 reply; 6+ messages in thread
From: Michal Hocko @ 2015-10-06 16:56 UTC (permalink / raw)
To: Oleg Nesterov
Cc: Andrew Morton, David Rientjes, Kyle Walker, Stanislav Kozina,
Tetsuo Handa, linux-kernel
On Tue 06-10-15 18:28:04, Oleg Nesterov wrote:
> oom_kill_process() does atomic_inc(&mm->mm_users) to ensure that
> this ->mm can't go away and this is wrong, change it to rely on
> ->mm_count and mmdrop().
>
> Firstly, we do not want to delay exit_mmap/etc if the victim exits
> before we do mmput(), but this is minor.
>
> More importantly, we simply can not do mmput() in oom_kill_process(),
> this can deadlock if (for example) the caller holds i_mmap_rwsem and
> mmput() actually leads to exit_mmap(); the victim can have this file
> mmaped and in this case unmap_vmas/free_pgtables paths will take the
> same lock for writing. And at least huge_pmd_share() does pmd_alloc()
> under i_mmap_rwsem because VM_HUGETLB memory is not reclaimable.

Ouch, I completely missed this during review! Thanks for catching
this. On second thought it is clear now: we really want to pin the
mm_struct, not the address space.

> Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.com>
> ---
> mm/oom_kill.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> index 034d219..52abb78 100644
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -571,7 +571,7 @@ void oom_kill_process(struct oom_control *oc, struct task_struct *p,
>
> /* Get a reference to safely compare mm after task_unlock(victim) */
> mm = victim->mm;
> - atomic_inc(&mm->mm_users);
> + atomic_inc(&mm->mm_count);
> /*
> * We should send SIGKILL before setting TIF_MEMDIE in order to prevent
> * the OOM victim from depleting the memory reserves from the user
> @@ -609,7 +609,7 @@ void oom_kill_process(struct oom_control *oc, struct task_struct *p,
> }
> rcu_read_unlock();
>
> - mmput(mm);
> + mmdrop(mm);
> put_task_struct(victim);
> }
> #undef K
> --
> 2.4.3
>
--
Michal Hocko
SUSE Labs
* Re: [PATCH -mm] mmoom-fix-potentially-killing-unrelated-process-fix
2015-10-06 16:56 ` Michal Hocko
@ 2015-10-06 22:49 ` Hugh Dickins
0 siblings, 0 replies; 6+ messages in thread
From: Hugh Dickins @ 2015-10-06 22:49 UTC (permalink / raw)
To: Michal Hocko
Cc: Oleg Nesterov, Sasha Levin, Andrew Morton, David Rientjes,
Kyle Walker, Stanislav Kozina, Tetsuo Handa, linux-kernel
On Tue, 6 Oct 2015, Michal Hocko wrote:
> On Tue 06-10-15 18:28:04, Oleg Nesterov wrote:
> > oom_kill_process() does atomic_inc(&mm->mm_users) to ensure that
> > this ->mm can't go away and this is wrong, change it to rely on
> > ->mm_count and mmdrop().
> >
> > Firstly, we do not want to delay exit_mmap/etc if the victim exits
> > before we do mmput(), but this is minor.
> >
> > More importantly, we simply can not do mmput() in oom_kill_process(),
> > this can deadlock if (for example) the caller holds i_mmap_rwsem and
> > mmput() actually leads to exit_mmap(); the victim can have this file
> > mmaped and in this case unmap_vmas/free_pgtables paths will take the
> > same lock for writing. And at least huge_pmd_share() does pmd_alloc()
> > under i_mmap_rwsem because VM_HUGETLB memory is not reclaimable.
>
> Ouch, I have completely missed this during review! Thanks for catching
> this. On the second thought it is clear now. We really want to pin the
> mm_struct not the address space.
>
> > Signed-off-by: Oleg Nesterov <oleg@redhat.com>
>
> Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Hugh Dickins <hughd@google.com>
Thanks: looks like this is what was behind recent trinity/KSM deadlock,
https://lkml.org/lkml/2015/10/1/563
>
> > ---
> > mm/oom_kill.c | 4 ++--
> > 1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> > index 034d219..52abb78 100644
> > --- a/mm/oom_kill.c
> > +++ b/mm/oom_kill.c
> > @@ -571,7 +571,7 @@ void oom_kill_process(struct oom_control *oc, struct task_struct *p,
> >
> > /* Get a reference to safely compare mm after task_unlock(victim) */
> > mm = victim->mm;
> > - atomic_inc(&mm->mm_users);
> > + atomic_inc(&mm->mm_count);
> > /*
> > * We should send SIGKILL before setting TIF_MEMDIE in order to prevent
> > * the OOM victim from depleting the memory reserves from the user
> > @@ -609,7 +609,7 @@ void oom_kill_process(struct oom_control *oc, struct task_struct *p,
> > }
> > rcu_read_unlock();
> >
> > - mmput(mm);
> > + mmdrop(mm);
> > put_task_struct(victim);
> > }
> > #undef K
> > --
> > 2.4.3
> >
>
> --
> Michal Hocko
> SUSE Labs