From: Michal Hocko <mhocko@kernel.org>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: <linux-mm@kvack.org>, LKML <linux-kernel@vger.kernel.org>,
	Michal Hocko <mhocko@suse.com>, Roman Gushchin <guro@fb.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Vladimir Davydov <vdavydov.dev@gmail.com>,
	David Rientjes <rientjes@google.com>, Tejun Heo <tj@kernel.org>
Subject: [PATCH] mm,page_alloc: PF_WQ_WORKER threads must sleep at should_reclaim_retry().
Date: Mon, 27 Aug 2018 15:51:01 +0200
Message-ID: <20180827135101.15700-1-mhocko@kernel.org>

From: Michal Hocko <mhocko@suse.com>

Tetsuo Handa has reported that it is possible to bypass the short sleep
for PF_WQ_WORKER threads which was introduced by commit 373ccbe5927034b5
("mm, vmstat: allow WQ concurrency to discover memory reclaim doesn't
make any progress") and lock up the system under OOM.

The primary reason is that WQ_MEM_RECLAIM WQs are not guaranteed to
run even when they have a rescuer available. Those workers might be
essential for reclaim to make forward progress, however. If we are
too unlucky, all the allocation requests can get stuck waiting for a
WQ_MEM_RECLAIM work item and the system is essentially stuck in an OOM
condition without much hope to move on. Tetsuo has seen the reclaim
stuck on drain_local_pages_wq or xlog_cil_push_work (xfs). There might
be others.

Since should_reclaim_retry() should be a natural reschedule point,
let's do the short sleep for PF_WQ_WORKER threads unconditionally in
order to guarantee that other pending work items are started. This will
work around this problem and it is less fragile than hunting down when
the sleep is missed. E.g. we used to have a sleeping point in the oom
path but this has been removed recently because it caused other issues.
Having a single sleeping point is more robust.

Reported-and-debugged-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Michal Hocko <mhocko@suse.com>
Cc: Roman Gushchin <guro@fb.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Tejun Heo <tj@kernel.org>
---

Hi Andrew,
this has been previously posted [1] but it took quite some time to
finally understand the issue [2]. Can we push this to mmotm and
linux-next? I wouldn't hurry to merge this but the longer we have a
wider testing exposure the better.

Thanks!

[1] http://lkml.kernel.org/r/ca3da8b8-1bb5-c302-b190-fa6cebab58ca@I-love.SAKURA.ne.jp
[2] http://lkml.kernel.org/r/20180730145425.GE1206094@devbig004.ftw2.facebook.com

 mm/page_alloc.c | 34 ++++++++++++++++++----------------
 1 file changed, 18 insertions(+), 16 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index e75865d58ba7..5fc5e500b5d0 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3923,6 +3923,7 @@ should_reclaim_retry(gfp_t gfp_mask, unsigned order,
 {
 	struct zone *zone;
 	struct zoneref *z;
+	bool ret = false;
 
 	/*
 	 * Costly allocations might have made a progress but this doesn't mean
@@ -3986,25 +3987,26 @@ should_reclaim_retry(gfp_t gfp_mask, unsigned order,
 				}
 			}
 
-			/*
-			 * Memory allocation/reclaim might be called from a WQ
-			 * context and the current implementation of the WQ
-			 * concurrency control doesn't recognize that
-			 * a particular WQ is congested if the worker thread is
-			 * looping without ever sleeping. Therefore we have to
-			 * do a short sleep here rather than calling
-			 * cond_resched().
-			 */
-			if (current->flags & PF_WQ_WORKER)
-				schedule_timeout_uninterruptible(1);
-			else
-				cond_resched();
-
-			return true;
+			ret = true;
+			goto out;
 		}
 	}
 
-	return false;
+out:
+	/*
+	 * Memory allocation/reclaim might be called from a WQ
+	 * context and the current implementation of the WQ
+	 * concurrency control doesn't recognize that
+	 * a particular WQ is congested if the worker thread is
+	 * looping without ever sleeping. Therefore we have to
+	 * do a short sleep here rather than calling
+	 * cond_resched().
+	 */
+	if (current->flags & PF_WQ_WORKER)
+		schedule_timeout_uninterruptible(1);
+	else
+		cond_resched();
+	return ret;
 }
 
 static inline bool
-- 
2.18.0
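The control-flow change in the patch can be illustrated outside the kernel. The following is only a userspace sketch, not kernel code: yield_point(), retry_old(), retry_new(), yields_during() and the two counters are hypothetical stand-ins invented for this illustration, and the zone walk of should_reclaim_retry() is reduced to a single boolean. It models the old shape, where only the "retry" path reaches the sleep, against the patched shape, where every return path goes through one exit label.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical counters standing in for the two kernel yield primitives. */
static int short_sleeps;   /* models schedule_timeout_uninterruptible(1) */
static int resched_calls;  /* models cond_resched() */

/* Models the PF_WQ_WORKER check: workqueue workers sleep, others resched. */
static void yield_point(bool is_wq_worker)
{
	if (is_wq_worker)
		short_sleeps++;
	else
		resched_calls++;
}

/* Old shape: the yield point is reached only when a retry is worthwhile. */
static bool retry_old(bool is_wq_worker, bool can_retry)
{
	if (can_retry) {
		yield_point(is_wq_worker);
		return true;
	}
	return false;	/* leaves without sleeping */
}

/* Patched shape: one exit label, so both outcomes pass the yield point. */
static bool retry_new(bool is_wq_worker, bool can_retry)
{
	bool ret = false;

	if (can_retry) {
		ret = true;
		goto out;
	}
out:
	yield_point(is_wq_worker);
	return ret;
}

/* Reports how many times one call to fn reached the yield point. */
static int yields_during(bool (*fn)(bool, bool), bool is_wq_worker,
			 bool can_retry)
{
	int before = short_sleeps + resched_calls;

	(void)fn(is_wq_worker, can_retry);
	assert(short_sleeps + resched_calls >= before);
	return short_sleeps + resched_calls - before;
}
```

Calling yields_during(retry_old, true, false) yields 0 in this model, while yields_during(retry_new, true, false) yields 1: the "no retry" outcome is the one the old shape let through without a sleep.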