From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
To: mhocko@kernel.org
Cc: akpm@linux-foundation.org, hannes@cmpxchg.org, guro@fb.com,
    vdavydov.dev@gmail.com, linux-mm@kvack.org,
    linux-kernel@vger.kernel.org
Subject: Re: [RFC PATCH 2/2] mm, oom: do not trigger out_of_memory from the #PF
Date: Sat, 20 May 2017 00:22:30 +0900
Message-ID: <201705200022.BFJ12428.JFOSMLFOtFHOVQ@I-love.SAKURA.ne.jp>
In-Reply-To: <20170519132209.GG29839@dhcp22.suse.cz>

Michal Hocko wrote:
> On Fri 19-05-17 22:02:44, Tetsuo Handa wrote:
> > Michal Hocko wrote:
> > > Any allocation failure during the #PF path will return with VM_FAULT_OOM
> > > which in turn results in pagefault_out_of_memory. This can happen for
> > > 2 different reasons. a) Memcg is out of memory and we rely on
> > > mem_cgroup_oom_synchronize to perform the memcg OOM handling or b)
> > > normal allocation fails.
> > >
> > > The later is quite problematic because allocation paths already trigger
> > > out_of_memory and the page allocator tries really hard to not fail
> >
> > We made many memory allocation requests from page fault path (e.g. XFS)
> > __GFP_FS some time ago, didn't we? But if I recall correctly (I couldn't
> > find the message), there are some allocation requests from page fault path
> > which cannot use __GFP_FS. Then, not all allocation requests can call
> > oom_kill_process() and reaching pagefault_out_of_memory() will be
> > inevitable.
>
> Even if such an allocation fail without the OOM killer then we simply
> retry the PF and will do that the same way how we keep retrying the
> allocation inside the page allocator. So how is this any different?

You are trying to remove out_of_memory() from pagefault_out_of_memory()
with this patch. But you also want to make !__GFP_FS allocations stop
retrying inside the page allocator in future kernels, don't you?
Then, a thread which needs to allocate memory from the page fault path
but cannot call oom_kill_process() will spin forever (unless somebody
else calls oom_kill_process() via a __GFP_FS allocation request).
I consider introducing such a possibility to be a problem.

>
> > > allocations. Anyway, if the OOM killer has been already invoked there
> > > is no reason to invoke it again from the #PF path. Especially when the
> > > OOM condition might be gone by that time and we have no way to find out
> > > other than allocate.
> > >
> > > Moreover if the allocation failed and the OOM killer hasn't been
> > > invoked then we are unlikely to do the right thing from the #PF context
> > > because we have already lost the allocation context and restictions and
> > > therefore might oom kill a task from a different NUMA domain.
> >
> > If we carry a flag via task_struct that indicates whether it is an memory
> > allocation request from page fault and allocation failure is not acceptable,
> > we can call out_of_memory() from page allocator path.
>
> I do not understand

We need to allocate memory from the page fault path in order to avoid
spinning forever (unless somebody else calls oom_kill_process() via a
__GFP_FS allocation request), don't we? Then, memory allocation requests
from the page fault path can pass flags like __GFP_NOFAIL | __GFP_KILLABLE,
because retrying the page fault without allocating memory is pointless.
That is what I meant by "carry a flag via task_struct".

> > By the way, can page fault occur after reaching do_exit()? When a thread
> > reached do_exit(), fatal_signal_pending(current) becomes false, doesn't it?
>
> yes fatal_signal_pending will be false at the time and I believe we can
> perform a page fault past that moment and go via allocation path which would
> trigger the OOM or give this task access to reserves but it is more
> likely that the oom reaper will push to kill another task by that time
> if the situation didn't get resolved. Or did I miss your concern?

How does checking fatal_signal_pending() here help? It only suppresses
printk(). If the current thread needs to allocate memory because not all
allocation requests can call oom_kill_process(), doing printk() is not
the right thing to do. Allocating memory by some means (e.g.
__GFP_NOFAIL | __GFP_KILLABLE) will be the right thing to do.
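
The spinning concern raised in this message can be illustrated with a
small userspace model. This is only a sketch under stated assumptions,
not kernel code: struct sim, alloc_nofs() and handle_fault() are
hypothetical names invented for illustration, and max_retries merely
stands in for "forever". It assumes a page fault whose allocation cannot
use __GFP_FS, fails without invoking the OOM killer, and is then retried.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
struct sim {
    bool memory_available;     /* true once some OOM kill has freed memory */
    bool fault_path_may_kill;  /* does the #PF path still invoke the OOM killer? */
    int  other_kill_at;        /* round in which another task's __GFP_FS
                                  allocation invokes the OOM killer; -1 = never */
};

/* Models an allocation that cannot use __GFP_FS: it either succeeds
 * or fails quietly, without invoking the OOM killer itself. */
static bool alloc_nofs(const struct sim *s)
{
    return s->memory_available;
}

/* Models the fault/retry loop. Returns the round in which the fault was
 * finally served, or -1 if max_retries rounds pass without progress
 * (the stand-in for spinning forever). */
static int handle_fault(struct sim *s, int max_retries)
{
    assert(max_retries > 0);

    for (int round = 0; round < max_retries; round++) {
        /* Somebody else may rescue us via a __GFP_FS allocation. */
        if (round == s->other_kill_at)
            s->memory_available = true;

        if (alloc_nofs(s))
            return round;

        /* Allocation failed: only the old behaviour kills from here. */
        if (s->fault_path_may_kill)
            s->memory_available = true;
    }
    return -1;
}
```

In this model, with fault_path_may_kill set the fault is served on the
first retry. With it cleared, progress depends entirely on other_kill_at,
which is the dependency on "somebody else" that the message objects to.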