All of lore.kernel.org
 help / color / mirror / Atom feed
* Re: [PATCH] [PATCH v3] mm, oom: task_will_free_mem(current) should ignore MMF_OOM_SKIP for once.
       [not found] <1506070646-4549-1-git-send-email-penguin-kernel@I-love.SAKURA.ne.jp>
@ 2017-09-25 14:30 ` Michal Hocko
  2017-09-25 15:09   ` Tetsuo Handa
  2017-09-26 11:27   ` [PATCH] mm,oom: Warn on racing with MMF_OOM_SKIP at task_will_free_mem(current) Tetsuo Handa
  0 siblings, 2 replies; 5+ messages in thread
From: Michal Hocko @ 2017-09-25 14:30 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: akpm, linux-mm, David Rientjes, Manish Jaggi, Oleg Nesterov,
	Vladimir Davydov

On Fri 22-09-17 17:57:26, Tetsuo Handa wrote:
[...]
> Michal Hocko has nacked this patch [3], and he suggested an alternative
> patch [4]. But he himself is not ready to clarify all the concerns with
> the alternative patch [5]. In addition to that, nobody is interested in
> either patch; we can not make progress here. Let's choose this patch for
> now, for this patch has smaller impact than the alternative patch.

My Nack stands and it is really annoying you are sending a patch for
inclusion regardless of that fact. An alternative approach has been
proposed and the mere fact that I do not have time to pursue this
direction is not reason to go with a incomplete solution. This is not an
issue many people would be facing to scream for a quick and dirty
workarounds AFAIK (there have been 0 reports from non-artificial
workloads).

> [1] http://lkml.kernel.org/r/e6c83a26-1d59-4afd-55cf-04e58bdde188@caviumnetworks.com
> [2] http://lkml.kernel.org/r/201708090835.ICI69305.VFFOLMHOStJOQF@I-love.SAKURA.ne.jp
> [3] http://lkml.kernel.org/r/20170821084307.GB25956@dhcp22.suse.cz
> [4] http://lkml.kernel.org/r/1503577106-9196-2-git-send-email-penguin-kernel@I-love.SAKURA.ne.jp
> [5] http://lkml.kernel.org/r/20170918064353.v35prpp6bkkbgqr6@dhcp22.suse.cz
> 
> Fixes: 696453e66630ad45 ("mm, oom: task_will_free_mem should skip oom_reaped tasks")
> Reported-by: Manish Jaggi <mjaggi@caviumnetworks.com>
> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: Oleg Nesterov <oleg@redhat.com>
> Cc: Vladimir Davydov <vdavydov@virtuozzo.com>
> Cc: David Rientjes <rientjes@google.com>
-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] [PATCH v3] mm, oom: task_will_free_mem(current) should ignore MMF_OOM_SKIP for once.
  2017-09-25 14:30 ` [PATCH] [PATCH v3] mm, oom: task_will_free_mem(current) should ignore MMF_OOM_SKIP for once Michal Hocko
@ 2017-09-25 15:09   ` Tetsuo Handa
  2017-09-26 11:27   ` [PATCH] mm,oom: Warn on racing with MMF_OOM_SKIP at task_will_free_mem(current) Tetsuo Handa
  1 sibling, 0 replies; 5+ messages in thread
From: Tetsuo Handa @ 2017-09-25 15:09 UTC (permalink / raw)
  To: mhocko; +Cc: akpm, linux-mm, rientjes, mjaggi, oleg, vdavydov

Michal Hocko wrote:
> On Fri 22-09-17 17:57:26, Tetsuo Handa wrote:
> [...]
> > Michal Hocko has nacked this patch [3], and he suggested an alternative
> > patch [4]. But he himself is not ready to clarify all the concerns with
> > the alternative patch [5]. In addition to that, nobody is interested in
> > either patch; we can not make progress here. Let's choose this patch for
> > now, for this patch has smaller impact than the alternative patch.
> 
> My Nack stands and it is really annoying you are sending a patch for
> inclusion regardless of that fact. An alternative approach has been
> proposed and the mere fact that I do not have time to pursue this
> direction is not reason to go with a incomplete solution. This is not an
> issue many people would be facing to scream for a quick and dirty
> workarounds AFAIK (there have been 0 reports from non-artificial
> workloads).
> 
But the alternative approach is also an incomplete solution because of below
limitations.

  (1) Since we cannot use direct reclaim for this allocation attempt due to
      oom_lock already held, an OOM victim will be prematurely killed which
      could have been avoided if direct reclaim with oom_lock released was
      used.

  (2) Since we call panic() before calling oom_kill_process() when there is
      no killable process, panic() will be prematurely called which could
      have been avoided if this patch is used. For example, if a multithreaded
      application running with a dedicated CPUs/memory was OOM-killed, we
      can wait until ALLOC_OOM allocation fails to solve OOM situation.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCH] mm,oom: Warn on racing with MMF_OOM_SKIP at task_will_free_mem(current).
  2017-09-25 14:30 ` [PATCH] [PATCH v3] mm, oom: task_will_free_mem(current) should ignore MMF_OOM_SKIP for once Michal Hocko
  2017-09-25 15:09   ` Tetsuo Handa
@ 2017-09-26 11:27   ` Tetsuo Handa
  2017-09-26 11:39     ` Michal Hocko
  1 sibling, 1 reply; 5+ messages in thread
From: Tetsuo Handa @ 2017-09-26 11:27 UTC (permalink / raw)
  To: mhocko; +Cc: akpm, linux-mm, rientjes, mjaggi, oleg, vdavydov.dev, torvalds

Michal Hocko wrote:
> On Fri 22-09-17 17:57:26, Tetsuo Handa wrote:
> [...]
> > Michal Hocko has nacked this patch [3], and he suggested an alternative
> > patch [4]. But he himself is not ready to clarify all the concerns with
> > the alternative patch [5]. In addition to that, nobody is interested in
> > either patch; we can not make progress here. Let's choose this patch for
> > now, for this patch has smaller impact than the alternative patch.
> 
> My Nack stands and it is really annoying you are sending a patch for
> inclusion regardless of that fact. An alternative approach has been
> proposed and the mere fact that I do not have time to pursue this
> direction is not reason to go with a incomplete solution. This is not an
> issue many people would be facing to scream for a quick and dirty
> workarounds AFAIK (there have been 0 reports from non-artificial
> workloads).
> 

You again said there is no report, without providing a mean to tell
whether they actually hit it. Then, this patch must get merged now.
----------------------------------------

>From b67f6482db0f973ae7ecaa1d9873ccfd6dd151b7 Mon Sep 17 00:00:00 2001
From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Date: Tue, 26 Sep 2017 20:09:36 +0900
Subject: [PATCH] mm,oom: Warn on racing with MMF_OOM_SKIP at
 task_will_free_mem(current).

There still is a race window where next OOM victim is selected needlessly,
but we are intentionally leaving that window open because Michal Hocko has
never heard about a report from non-artificial workloads. However, it is
too difficult for normal users to tell whether they actually hit that race.
Thus, add a WARN_ON() to task_will_free_mem(current) if they hit that race
in order to encourage them to report it. This patch will tell us whether we
need to care about that race.

[   83.504172] Out of memory: Kill process 2899 (a.out) score 930 or sacrifice child
[   83.506794] Killed process 2899 (a.out) total-vm:16781904kB, anon-rss:88kB, file-rss:0kB, shmem-rss:3519864kB
[   83.513499] oom_reaper: reaped process 2899 (a.out), now anon-rss:0kB, file-rss:0kB, shmem-rss:3520204kB
[   83.516494] Racing OOM victim selection. Please report to linux-mm@kvack.org if you saw this warning from non-artificial workloads.
[   83.519793] ------------[ cut here ]------------
[   83.522008] WARNING: CPU: 0 PID: 2934 at mm/oom_kill.c:798 task_will_free_mem+0x11a/0x130
[   83.524881] Modules linked in: coretemp pcspkr sg i2c_piix4 vmw_vmci shpchp sd_mod ata_generic pata_acpi serio_raw vmwgfx drm_kms_helper mptspi syscopyarea scsi_transport_spi sysfillrect mptscsih ahci libahci mptbase sysimgblt fb_sys_fops ttm e1000 ata_piix drm i2c_core libata ipv6
[   83.532023] CPU: 0 PID: 2934 Comm: a.out Not tainted 4.14.0-rc2-next-20170926+ #672

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
---
 mm/oom_kill.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index dee0f75..ac3c63d 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -794,8 +794,10 @@ static bool task_will_free_mem(struct task_struct *task)
 	 * This task has already been drained by the oom reaper so there are
 	 * only small chances it will free some more
 	 */
-	if (test_bit(MMF_OOM_SKIP, &mm->flags))
+	if (test_bit(MMF_OOM_SKIP, &mm->flags)) {
+		WARN(1, "Racing OOM victim selection. Please report to linux-mm@kvack.org if you saw this warning from non-artificial workloads.\n");
 		return false;
+	}
 
 	if (atomic_read(&mm->mm_users) <= 1)
 		return true;
-- 
1.8.3.1

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH] mm,oom: Warn on racing with MMF_OOM_SKIP at task_will_free_mem(current).
  2017-09-26 11:27   ` [PATCH] mm,oom: Warn on racing with MMF_OOM_SKIP at task_will_free_mem(current) Tetsuo Handa
@ 2017-09-26 11:39     ` Michal Hocko
  2017-09-27  5:00       ` Tetsuo Handa
  0 siblings, 1 reply; 5+ messages in thread
From: Michal Hocko @ 2017-09-26 11:39 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: akpm, linux-mm, rientjes, mjaggi, oleg, vdavydov.dev, torvalds

On Tue 26-09-17 20:27:40, Tetsuo Handa wrote:
[...]
> @@ -794,8 +794,10 @@ static bool task_will_free_mem(struct task_struct *task)
>  	 * This task has already been drained by the oom reaper so there are
>  	 * only small chances it will free some more
>  	 */
> -	if (test_bit(MMF_OOM_SKIP, &mm->flags))
> +	if (test_bit(MMF_OOM_SKIP, &mm->flags)) {
> +		WARN(1, "Racing OOM victim selection. Please report to linux-mm@kvack.org if you saw this warning from non-artificial workloads.\n");
>  		return false;
> +	}

This can easily happen even without a race. Just consider that OOM
memory reserves got depleted. I think that the existing oom report will
tell us that the race happened by checking the mm counters.
-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] mm,oom: Warn on racing with MMF_OOM_SKIP at task_will_free_mem(current).
  2017-09-26 11:39     ` Michal Hocko
@ 2017-09-27  5:00       ` Tetsuo Handa
  0 siblings, 0 replies; 5+ messages in thread
From: Tetsuo Handa @ 2017-09-27  5:00 UTC (permalink / raw)
  To: mhocko, akpm; +Cc: linux-mm, rientjes, mjaggi, oleg, vdavydov.dev, torvalds

Michal Hocko wrote:
> On Tue 26-09-17 20:27:40, Tetsuo Handa wrote:
> [...]
> > @@ -794,8 +794,10 @@ static bool task_will_free_mem(struct task_struct *task)
> >  	 * This task has already been drained by the oom reaper so there are
> >  	 * only small chances it will free some more
> >  	 */
> > -	if (test_bit(MMF_OOM_SKIP, &mm->flags))
> > +	if (test_bit(MMF_OOM_SKIP, &mm->flags)) {
> > +		WARN(1, "Racing OOM victim selection. Please report to linux-mm@kvack.org if you saw this warning from non-artificial workloads.\n");
> >  		return false;
> > +	}
> 
> This can easily happen even without a race. Just consider that OOM
> memory reserves got depleted.

What!? You said test_bit(MMF_OOM_SKIP, &mm->flags) == T can easily happen?
I was assuming that you believe that test_bit(MMF_OOM_SKIP, &mm->flags) == T
can't easily happen.

ALLOC_OOM was introduced in order to prevent OOM memory reserves from getting
completely depleted. I assume that you meant that OOM memory reserves got low
enough to fail ALLOC_OOM allocation attempt. But at the same time it means that
there is possibility that OOM memory reserves are not low enough to fail
ALLOC_OOM allocation attempt (but !ALLOC_OOM allocation attempt fails) when
this happens. Then, we are sure that we are already killing next OOM victims
needlessly because there is possibility that ALLOC_OOM allocation attempt can
succeed if we force it by "mm, oom:task_will_free_mem(current) should ignore
MMF_OOM_SKIP for once." patch. You prove that there is no reason we defer that
patch. We can revert that patch when we find better implementation in the future.

>                               I think that the existing oom report will
> tell us that the race happened by checking the mm counters.

I don't think so. Normal users won't dare to post their OOM reports in
order to ask us to judge whether the race happened. We won't be able to
judge whether the race happened unless all OOM reports are unconditionally
posted to ML. What a horrible idea...

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2017-09-27  5:40 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <1506070646-4549-1-git-send-email-penguin-kernel@I-love.SAKURA.ne.jp>
2017-09-25 14:30 ` [PATCH] [PATCH v3] mm, oom: task_will_free_mem(current) should ignore MMF_OOM_SKIP for once Michal Hocko
2017-09-25 15:09   ` Tetsuo Handa
2017-09-26 11:27   ` [PATCH] mm,oom: Warn on racing with MMF_OOM_SKIP at task_will_free_mem(current) Tetsuo Handa
2017-09-26 11:39     ` Michal Hocko
2017-09-27  5:00       ` Tetsuo Handa

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.