All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] mm,oom: Disable preemption inside the OOM killer.
@ 2018-03-22 11:04 Tetsuo Handa
  2018-03-22 11:40 ` Michal Hocko
  0 siblings, 1 reply; 2+ messages in thread
From: Tetsuo Handa @ 2018-03-22 11:04 UTC (permalink / raw)
  To: linux-mm
  Cc: Tetsuo Handa, David Rientjes, Johannes Weiner, Michal Hocko,
	Roman Gushchin, Tejun Heo, Vladimir Davydov

cond_resched() from printk() or CONFIG_PREEMPT=y can allow other
contending allocating paths to disturb the owner of oom_lock.
They can break

  /*
   * Acquire the oom lock.  If that fails, somebody else is
   * making progress for us.
   */

assumption in __alloc_pages_may_oom().

If we use mutex_lock_killable() instead of mutex_trylock(), we can
guarantee that noone forever continues wasting CPU resource and disturbs
the owner of oom_lock. But when I proposed such change at [1], Michal
responded that it is worse because it significantly delays the OOM reaper
 from reclaiming memory. [2] is an alternative which will not delay the
OOM reaper, but [2] was already rejected.

Therefore, I proposed further steps at [3] and [4]. But Michal still does
not like it because it does not address preemption problem. I don't
consider preemption as a problem because [1] will eventually stop
disturbing the owner of oom_lock by stop wasting CPU resource.

It will be nice if we can make the OOM context not preemptible. But it is
not easy because printk() can be very slow which might not fit for
disabling the preemption. Since the printk() is responsible for printing
dying messages, we need to be careful not to deprive printk() of CPU
resources. From that aspect, [3] is safer direction than making the OOM
context not preemptible. Of course, if we could get rid of direct reclaim,
we won't need [3] from the beginning, for [3] is the last defense against
forever disturbing the owner of oom_lock by wasting CPU resource for
direct reclaim without any progress.

Nonetheless, this patch disables preemption inside the OOM killer as much
as possible, for this is the direction Michal wants to go.

[1] http://lkml.kernel.org/r/201802202232.IEC26597.FOQtMFOFJHOSVL@I-love.SAKURA.ne.jp
[2] http://lkml.kernel.org/r/1481020439-5867-1-git-send-email-penguin-kernel@I-love.SAKURA.ne.jp
[3] http://lkml.kernel.org/r/201802241700.JJB51016.FQOLFJHFOOSVMt@I-love.SAKURA.ne.jp
[4] http://lkml.kernel.org/r/201803022010.BJE26043.LtSOOVFQOMJFHF@I-love.SAKURA.ne.jp

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Roman Gushchin <guro@fb.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Tejun Heo <tj@kernel.org>
---
 mm/oom_kill.c | 14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index dcdb642..614d1a2 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -1068,7 +1068,7 @@ int unregister_oom_notifier(struct notifier_block *nb)
  * OR try to be smart about which process to kill. Note that we
  * don't have to be perfect here, we just have to be good.
  */
-bool out_of_memory(struct oom_control *oc)
+static bool __out_of_memory(struct oom_control *oc)
 {
 	unsigned long freed = 0;
 	enum oom_constraint constraint = CONSTRAINT_NONE;
@@ -1077,7 +1077,9 @@ bool out_of_memory(struct oom_control *oc)
 		return false;
 
 	if (!is_memcg_oom(oc)) {
+		preempt_enable();
 		blocking_notifier_call_chain(&oom_notify_list, 0, &freed);
+		preempt_disable();
 		if (freed > 0)
 			/* Got some memory back in the last second. */
 			return true;
@@ -1138,6 +1140,16 @@ bool out_of_memory(struct oom_control *oc)
 	return !!oc->chosen_task;
 }
 
+bool out_of_memory(struct oom_control *oc)
+{
+	bool ret;
+
+	preempt_disable();
+	ret = __out_of_memory(oc);
+	preempt_enable();
+	return ret;
+}
+
 /*
  * The pagefault handler calls here because it is out of memory, so kill a
  * memory-hogging task. If oom_lock is held by somebody else, a parallel oom
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 2+ messages in thread

* Re: [PATCH] mm,oom: Disable preemption inside the OOM killer.
  2018-03-22 11:04 [PATCH] mm,oom: Disable preemption inside the OOM killer Tetsuo Handa
@ 2018-03-22 11:40 ` Michal Hocko
  0 siblings, 0 replies; 2+ messages in thread
From: Michal Hocko @ 2018-03-22 11:40 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: linux-mm, David Rientjes, Johannes Weiner, Roman Gushchin,
	Tejun Heo, Vladimir Davydov

On Thu 22-03-18 20:04:12, Tetsuo Handa wrote:
> cond_resched() from printk() or CONFIG_PREEMPT=y can allow other
> contending allocating paths to disturb the owner of oom_lock.
> They can break
> 
>   /*
>    * Acquire the oom lock.  If that fails, somebody else is
>    * making progress for us.
>    */
> 
> assumption in __alloc_pages_may_oom().
> 
> If we use mutex_lock_killable() instead of mutex_trylock(), we can
> guarantee that noone forever continues wasting CPU resource and disturbs
> the owner of oom_lock.

Wrong! _Any_ non allocating task could still preempt the lock holder.

> But when I proposed such change at [1], Michal
> responded that it is worse because it significantly delays the OOM reaper
>  from reclaiming memory. [2] is an alternative which will not delay the
> OOM reaper, but [2] was already rejected.
> 
> Therefore, I proposed further steps at [3] and [4]. But Michal still does
> not like it because it does not address preemption problem. I don't
> consider preemption as a problem because [1] will eventually stop
> disturbing the owner of oom_lock by stop wasting CPU resource.
> 
> It will be nice if we can make the OOM context not preemptible. But it is
> not easy because printk() can be very slow which might not fit for
> disabling the preemption. Since the printk() is responsible for printing
> dying messages, we need to be careful not to deprive printk() of CPU
> resources. From that aspect, [3] is safer direction than making the OOM
> context not preemptible. Of course, if we could get rid of direct reclaim,
> we won't need [3] from the beginning, for [3] is the last defense against
> forever disturbing the owner of oom_lock by wasting CPU resource for
> direct reclaim without any progress.

Can you pretty please try to come up with a reasonable changelog that
doesn't refer to 4 different links and state the problem and the way how
the patch addresses it? Whoever is interested in the history of the
change can look into mailing list archives.

> Nonetheless, this patch disables preemption inside the OOM killer as much
> as possible, for this is the direction Michal wants to go.

And we are doing some pretty heavy lifting in the oom path so disabling
the whole preemption is a no-go. You are likely to introduce soft
lockups on large machines.

Look, this has been explained to you already but you keep ignoring that.
We are not going to add a code that risks negative side effects just
because of an artificial workload of yours. It would be great to handle
it as well but that is way far from straightforward. Large machines with
zilions of tasks are real, on the other hand. So please try to think out
of your bubble finally!

Nacked-by: Michal Hocko <mhocko@suse.com>

> 
> [1] http://lkml.kernel.org/r/201802202232.IEC26597.FOQtMFOFJHOSVL@I-love.SAKURA.ne.jp
> [2] http://lkml.kernel.org/r/1481020439-5867-1-git-send-email-penguin-kernel@I-love.SAKURA.ne.jp
> [3] http://lkml.kernel.org/r/201802241700.JJB51016.FQOLFJHFOOSVMt@I-love.SAKURA.ne.jp
> [4] http://lkml.kernel.org/r/201803022010.BJE26043.LtSOOVFQOMJFHF@I-love.SAKURA.ne.jp
> 
> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> Cc: Roman Gushchin <guro@fb.com>
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: Johannes Weiner <hannes@cmpxchg.org>
> Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
> Cc: David Rientjes <rientjes@google.com>
> Cc: Tejun Heo <tj@kernel.org>
> ---
>  mm/oom_kill.c | 14 +++++++++++++-
>  1 file changed, 13 insertions(+), 1 deletion(-)
> 
> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> index dcdb642..614d1a2 100644
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -1068,7 +1068,7 @@ int unregister_oom_notifier(struct notifier_block *nb)
>   * OR try to be smart about which process to kill. Note that we
>   * don't have to be perfect here, we just have to be good.
>   */
> -bool out_of_memory(struct oom_control *oc)
> +static bool __out_of_memory(struct oom_control *oc)
>  {
>  	unsigned long freed = 0;
>  	enum oom_constraint constraint = CONSTRAINT_NONE;
> @@ -1077,7 +1077,9 @@ bool out_of_memory(struct oom_control *oc)
>  		return false;
>  
>  	if (!is_memcg_oom(oc)) {
> +		preempt_enable();
>  		blocking_notifier_call_chain(&oom_notify_list, 0, &freed);
> +		preempt_disable();
>  		if (freed > 0)
>  			/* Got some memory back in the last second. */
>  			return true;
> @@ -1138,6 +1140,16 @@ bool out_of_memory(struct oom_control *oc)
>  	return !!oc->chosen_task;
>  }
>  
> +bool out_of_memory(struct oom_control *oc)
> +{
> +	bool ret;
> +
> +	preempt_disable();
> +	ret = __out_of_memory(oc);
> +	preempt_enable();
> +	return ret;
> +}
> +
>  /*
>   * The pagefault handler calls here because it is out of memory, so kill a
>   * memory-hogging task. If oom_lock is held by somebody else, a parallel oom
> -- 
> 1.8.3.1
> 

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2018-03-22 11:40 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-03-22 11:04 [PATCH] mm,oom: Disable preemption inside the OOM killer Tetsuo Handa
2018-03-22 11:40 ` Michal Hocko

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.