From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
To: mhocko@kernel.org
Cc: rientjes@google.com, akpm@linux-foundation.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [patch] mm, oom: prevent additional oom kills before memory is freed
Date: Thu, 15 Jun 2017 20:32:39 +0900
Message-ID: <201706152032.BFE21313.MSHQOtLVFFJOOF@I-love.SAKURA.ne.jp>
In-Reply-To: <20170615110119.GI1486@dhcp22.suse.cz>

Michal Hocko wrote:
> On Thu 15-06-17 19:53:24, Tetsuo Handa wrote:
> > Michal Hocko wrote:
> > > On Wed 14-06-17 16:43:03, David Rientjes wrote:
> > > > If mm->mm_users is not incremented because it is already zero by the oom
> > > > reaper, meaning the final refcount has been dropped, do not set
> > > > MMF_OOM_SKIP prematurely.
> > > >
> > > > __mmput() may not have had a chance to do exit_mmap() yet, so memory from
> > > > a previous oom victim is still mapped.
> > >
> > > True, and do we have a _guarantee_ it will do it? E.g. can somebody block
> > > exit_aio from completing? Or can somebody hold mmap_sem and thus block
> > > ksm_exit resp. khugepaged_exit from completing? The reason why I was
> > > conservative and set such an mm as MMF_OOM_SKIP was because I couldn't
> > > give a definitive answer to those questions. And we really _want_ to
> > > have a guarantee of forward progress here. Killing an additional
> > > process is a price to pay, and if that doesn't trigger normally it sounds
> > > like a reasonable compromise to me.
> >
> > Right. If you want this patch, __oom_reap_task_mm() must not return true without
> > setting MMF_OOM_SKIP (in other words, return false if __oom_reap_task_mm()
> > does not set MMF_OOM_SKIP). The most important role of the OOM reaper is to
> > guarantee that the OOM killer is re-enabled within finite time, for __mmput()
> > cannot guarantee that MMF_OOM_SKIP is set within finite time.
>
> An alternative would be to allow reaping and exit_mmap to race.
> The unmap
> part should just work, I guess. We just have to be careful to not race
> with free_pgtables, and that shouldn't be too hard to implement (e.g.
> (ab)use mmap_sem for write there). I haven't thought that through
> completely though, so I might miss something of course.

I think the one below is simpler.

 mm/oom_kill.c | 31 ++++++++++++-------------------
 1 file changed, 12 insertions(+), 19 deletions(-)

diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 0e2c925..c63f514 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -466,11 +466,10 @@ bool process_shares_mm(struct task_struct *p, struct mm_struct *mm)
 static struct task_struct *oom_reaper_list;
 static DEFINE_SPINLOCK(oom_reaper_lock);
 
-static bool __oom_reap_task_mm(struct task_struct *tsk, struct mm_struct *mm)
+static void __oom_reap_task_mm(struct task_struct *tsk, struct mm_struct *mm)
 {
 	struct mmu_gather tlb;
 	struct vm_area_struct *vma;
-	bool ret = true;
 
 	/*
 	 * We have to make sure to not race with the victim exit path
@@ -488,10 +487,8 @@ static bool __oom_reap_task_mm(struct task_struct *tsk, struct mm_struct *mm)
 	 */
 	mutex_lock(&oom_lock);
 
-	if (!down_read_trylock(&mm->mmap_sem)) {
-		ret = false;
+	if (!down_read_trylock(&mm->mmap_sem))
 		goto unlock_oom;
-	}
 
 	/*
 	 * increase mm_users only after we know we will reap something so
@@ -531,6 +528,7 @@ static bool __oom_reap_task_mm(struct task_struct *tsk, struct mm_struct *mm)
 					 NULL);
 	}
 	tlb_finish_mmu(&tlb, 0, -1);
+	set_bit(MMF_OOM_SKIP, &mm->flags);
 	pr_info("oom_reaper: reaped process %d (%s), now anon-rss:%lukB, file-rss:%lukB, shmem-rss:%lukB\n",
 			task_pid_nr(tsk), tsk->comm,
 			K(get_mm_counter(mm, MM_ANONPAGES)),
@@ -546,7 +544,6 @@ static bool __oom_reap_task_mm(struct task_struct *tsk, struct mm_struct *mm)
 	mmput_async(mm);
 unlock_oom:
 	mutex_unlock(&oom_lock);
-	return ret;
 }
 
 #define MAX_OOM_REAP_RETRIES 10
@@ -556,25 +553,21 @@ static void oom_reap_task(struct task_struct *tsk)
 	struct mm_struct *mm = tsk->signal->oom_mm;
 
 	/* Retry the down_read_trylock(mmap_sem) a few times */
-	while (attempts++ < MAX_OOM_REAP_RETRIES && !__oom_reap_task_mm(tsk, mm))
+	while (__oom_reap_task_mm(tsk, mm), !test_bit(MMF_OOM_SKIP, &mm->flags)
+	       && attempts++ < MAX_OOM_REAP_RETRIES)
 		schedule_timeout_idle(HZ/10);
 
-	if (attempts <= MAX_OOM_REAP_RETRIES)
-		goto done;
-
-
-	pr_info("oom_reaper: unable to reap pid:%d (%s)\n",
-		task_pid_nr(tsk), tsk->comm);
-	debug_show_all_locks();
-
-done:
-	tsk->oom_reaper_list = NULL;
-
 	/*
 	 * Hide this mm from OOM killer because it has been either reaped or
 	 * somebody can't call up_write(mmap_sem).
 	 */
-	set_bit(MMF_OOM_SKIP, &mm->flags);
+	if (!test_and_set_bit(MMF_OOM_SKIP, &mm->flags)) {
+		pr_info("oom_reaper: unable to reap pid:%d (%s)\n",
+			task_pid_nr(tsk), tsk->comm);
+		debug_show_all_locks();
+	}
+
+	tsk->oom_reaper_list = NULL;
 
 	/* Drop a reference taken by wake_oom_reaper */
 	put_task_struct(tsk);
Thread overview: 70+ messages (duplicate index entries removed)

2017-06-14 23:43 [patch] mm, oom: prevent additional oom kills before memory is freed  David Rientjes
2017-06-15 10:39 ` Michal Hocko
2017-06-15 10:53 ` Tetsuo Handa
2017-06-15 11:01 ` Michal Hocko
2017-06-15 11:32 ` Tetsuo Handa [this message]
2017-06-15 12:03 ` Michal Hocko
2017-06-15 12:13 ` Michal Hocko
2017-06-15 13:01 ` Tetsuo Handa
2017-06-15 13:22 ` Michal Hocko
2017-06-15 21:43 ` Tetsuo Handa
2017-06-15 21:37 ` David Rientjes
2017-06-15 12:20 ` Michal Hocko
2017-06-15 21:26 ` David Rientjes
2017-06-15 21:41 ` Michal Hocko
2017-06-15 22:03 ` David Rientjes
2017-06-15 22:12 ` Michal Hocko
2017-06-15 22:42 ` David Rientjes
2017-06-16  8:06 ` Michal Hocko
2017-06-16  0:54 ` Tetsuo Handa
2017-06-16  4:00 ` Tetsuo Handa
2017-06-16  8:39 ` Michal Hocko
2017-06-16 10:27 ` Tetsuo Handa
2017-06-16 11:02 ` Michal Hocko
2017-06-16 14:26 ` Re: [patch] mm, oom: prevent additional oom kills before memory is freed  Tetsuo Handa
2017-06-16 14:42 ` Michal Hocko
2017-06-17 13:30 ` Tetsuo Handa
2017-06-23 12:38 ` Michal Hocko
2017-06-16 12:22 ` Tetsuo Handa
2017-06-16 14:12 ` Michal Hocko
2017-06-17  5:17 ` [PATCH] mm,oom_kill: Close race window of needlessly selecting new victims  Tetsuo Handa
2017-06-20 22:12 ` David Rientjes
2017-06-21  2:17 ` Tetsuo Handa
2017-06-21 20:31 ` David Rientjes
2017-06-22  0:53 ` Tetsuo Handa
2017-06-23 12:45 ` Michal Hocko
2017-06-21 13:18 ` Michal Hocko