* [PATCH] vmalloc: back off only when the current task is OOM killed
@ 2017-10-10 10:58 Tetsuo Handa
  2017-10-10 11:54   ` Michal Hocko
  2017-10-10 12:47   ` Johannes Weiner
From: Tetsuo Handa @ 2017-10-10 10:58 UTC
  To: hannes, akpm
  Cc: alan, hch, mhocko, linux-mm, linux-kernel, kernel-team, Tetsuo Handa

Commit 5d17a73a2ebeb8d1 ("vmalloc: back off when the current task is
killed") revealed two bugs [1] [2] in callers that were not prepared for
vmalloc() to fail upon SIGKILL. But since the intent of that commit was to
avoid unlimited access to memory reserves, we should have checked
tsk_is_oom_victim() rather than fatal_signal_pending().

Note that even with commit cd04ae1e2dc8e365 ("mm, oom: do not rely on
TIF_MEMDIE for memory reserves access"), it is possible to trigger
"complete depletion of memory reserves" and "extra OOM kills due to
depletion of memory reserves" by doing a large vmalloc() request if commit
5d17a73a2ebeb8d1 is reverted. Thus, let's check tsk_is_oom_victim()
instead of simply removing the fatal_signal_pending() check.

  [1] http://lkml.kernel.org/r/42eb5d53-5ceb-a9ce-791a-9469af30810c@I-love.SAKURA.ne.jp
  [2] http://lkml.kernel.org/r/20171003225504.GA966@cmpxchg.org

Fixes: 5d17a73a2ebeb8d1 ("vmalloc: back off when the current task is killed")
Cc: stable # 4.11+
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
---
 mm/vmalloc.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 8a43db6..6add29d 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -31,6 +31,7 @@
 #include <linux/compiler.h>
 #include <linux/llist.h>
 #include <linux/bitops.h>
+#include <linux/oom.h>
 
 #include <linux/uaccess.h>
 #include <asm/tlbflush.h>
@@ -1695,7 +1696,7 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
 	for (i = 0; i < area->nr_pages; i++) {
 		struct page *page;
 
-		if (fatal_signal_pending(current)) {
+		if (tsk_is_oom_victim(current)) {
 			area->nr_pages = i;
 			goto fail_no_warn;
 		}
-- 
1.8.3.1

* Re: [PATCH] vmalloc: back off only when the current task is OOM killed
  2017-10-10 10:58 [PATCH] vmalloc: back off only when the current task is OOM killed Tetsuo Handa
@ 2017-10-10 11:54   ` Michal Hocko
  2017-10-10 12:47   ` Johannes Weiner
From: Michal Hocko @ 2017-10-10 11:54 UTC
  To: Tetsuo Handa; +Cc: hannes, akpm, alan, hch, linux-mm, linux-kernel, kernel-team

On Tue 10-10-17 19:58:53, Tetsuo Handa wrote:
> Commit 5d17a73a2ebeb8d1 ("vmalloc: back off when the current task is
> killed") revealed two bugs [1] [2] that were not ready to fail vmalloc()
> upon SIGKILL. But since the intent of that commit was to avoid unlimited
> access to memory reserves, we should have checked tsk_is_oom_victim()
> rather than fatal_signal_pending().
> 
> Note that even with commit cd04ae1e2dc8e365 ("mm, oom: do not rely on
> TIF_MEMDIE for memory reserves access"), it is possible to trigger
> "complete depletion of memory reserves"

How would that be possible? OOM victims are not allowed to consume the
whole reserves, and the vmalloc context would have to do something
utterly wrong, like setting PF_MEMALLOC, to make this happen. Protecting
against such code is simply pointless.

> and "extra OOM kills due to depletion of memory reserves"

and this is simply the case for most vmalloc allocations, because they
are not reflected in the OOM victim selection. If there is a massive
vmalloc consumer, it is very likely that we will kill a large part of
userspace before hitting the user context on whose behalf the vmalloc
allocation is performed.

I have tried to explain before that this is not really needed, but you
keep insisting, which is highly annoying. The patch as is is not harmful,
but it is simply _pointless_ IMHO.
-- 
Michal Hocko
SUSE Labs

* Re: [PATCH] vmalloc: back off only when the current task is OOM killed
  2017-10-10 11:54   ` Michal Hocko
@ 2017-10-10 12:47     ` Tetsuo Handa
From: Tetsuo Handa @ 2017-10-10 12:47 UTC
  To: mhocko; +Cc: hannes, akpm, alan, hch, linux-mm, linux-kernel, kernel-team

Michal Hocko wrote:
> On Tue 10-10-17 19:58:53, Tetsuo Handa wrote:
> > Commit 5d17a73a2ebeb8d1 ("vmalloc: back off when the current task is
> > killed") revealed two bugs [1] [2] that were not ready to fail vmalloc()
> > upon SIGKILL. But since the intent of that commit was to avoid unlimited
> > access to memory reserves, we should have checked tsk_is_oom_victim()
> > rather than fatal_signal_pending().
> > 
> > Note that even with commit cd04ae1e2dc8e365 ("mm, oom: do not rely on
> > TIF_MEMDIE for memory reserves access"), it is possible to trigger
> > "complete depletion of memory reserves"
> 
> How would that be possible? OOM victims are not allowed to consume whole
> reserves and the vmalloc context would have to do something utterly
> wrong like PF_MEMALLOC to make this happen. Protecting from such a code
> is simply pointless.

Oops. I was confused when writing that part.
Indeed, "complete" was demonstrated without commit cd04ae1e2dc8e365.

> 
> > and "extra OOM kills due to depletion of memory reserves"
> 
> and this is simply the case for the most vmalloc allocations because
> they are not reflected in the oom selection so if there is a massive
> vmalloc consumer it is very likely that we will kill a large part the
> userspace before hitting the user context on behalf which the vmalloc
> allocation is performed.

If there is a massive alloc_page() loop, it is just as likely that we
will kill a large part of userspace before hitting the user context on
whose behalf the alloc_page() allocation is performed.

I think that massive vmalloc() consumers (as well as massive alloc_page()
consumers) should take care that they are chosen as the first OOM victim,
because vmalloc() does not abort as soon as an OOM occurs. That is why I
used set_current_oom_origin()/clear_current_oom_origin() when I
demonstrated "complete" depletion.

> 
> I have tried to explain this is not really needed before but you keep
> insisting which is highly annoying. The patch as is is not harmful but
> it is simply _pointless_ IMHO.

Then, how can massive vmalloc() consumers be careful? By explicitly
using __vmalloc() and passing __GFP_NOMEMALLOC? If so, what about adding
a comment like "Never try to allocate large memory using plain vmalloc().
Use __vmalloc() with __GFP_NOMEMALLOC."?

* Re: [PATCH] vmalloc: back off only when the current task is OOM killed
  2017-10-10 10:58 [PATCH] vmalloc: back off only when the current task is OOM killed Tetsuo Handa
@ 2017-10-10 12:47   ` Johannes Weiner
From: Johannes Weiner @ 2017-10-10 12:47 UTC
  To: Tetsuo Handa; +Cc: akpm, alan, hch, mhocko, linux-mm, linux-kernel, kernel-team

On Tue, Oct 10, 2017 at 07:58:53PM +0900, Tetsuo Handa wrote:
> Commit 5d17a73a2ebeb8d1 ("vmalloc: back off when the current task is
> killed") revealed two bugs [1] [2] that were not ready to fail vmalloc()
> upon SIGKILL. But since the intent of that commit was to avoid unlimited
> access to memory reserves, we should have checked tsk_is_oom_victim()
> rather than fatal_signal_pending().
> 
> Note that even with commit cd04ae1e2dc8e365 ("mm, oom: do not rely on
> TIF_MEMDIE for memory reserves access"), it is possible to trigger
> "complete depletion of memory reserves" and "extra OOM kills due to
> depletion of memory reserves" by doing a large vmalloc() request if commit
> 5d17a73a2ebeb8d1 is reverted. Thus, let's keep checking tsk_is_oom_victim()
> rather than removing fatal_signal_pending().

Nothing has changed since the last time you proposed this.

Who is doing large vmallocs, and why shouldn't we annotate what's
special instead of littering generic code with checks for unlikely
events?

* Re: [PATCH] vmalloc: back off only when the current task is OOM killed
  2017-10-10 12:47     ` Tetsuo Handa
@ 2017-10-10 13:49       ` Michal Hocko
From: Michal Hocko @ 2017-10-10 13:49 UTC
  To: Tetsuo Handa; +Cc: hannes, akpm, alan, hch, linux-mm, linux-kernel, kernel-team

On Tue 10-10-17 21:47:02, Tetsuo Handa wrote:
> Michal Hocko wrote:
> > On Tue 10-10-17 19:58:53, Tetsuo Handa wrote:
> > > Commit 5d17a73a2ebeb8d1 ("vmalloc: back off when the current task is
> > > killed") revealed two bugs [1] [2] that were not ready to fail vmalloc()
> > > upon SIGKILL. But since the intent of that commit was to avoid unlimited
> > > access to memory reserves, we should have checked tsk_is_oom_victim()
> > > rather than fatal_signal_pending().
> > > 
> > > Note that even with commit cd04ae1e2dc8e365 ("mm, oom: do not rely on
> > > TIF_MEMDIE for memory reserves access"), it is possible to trigger
> > > "complete depletion of memory reserves"
> > 
> > How would that be possible? OOM victims are not allowed to consume whole
> > reserves and the vmalloc context would have to do something utterly
> > wrong like PF_MEMALLOC to make this happen. Protecting from such a code
> > is simply pointless.
> 
> Oops. I was confused when writing that part.
> Indeed, "complete" was demonstrated without commit cd04ae1e2dc8e365.
> 
> > 
> > > and "extra OOM kills due to depletion of memory reserves"
> > 
> > and this is simply the case for the most vmalloc allocations because
> > they are not reflected in the oom selection so if there is a massive
> > vmalloc consumer it is very likely that we will kill a large part the
> > userspace before hitting the user context on behalf which the vmalloc
> > allocation is performed.
> 
> If there is a massive alloc_page() loop it is as well very likely that
> we will kill a large part the userspace before hitting the user context
> on behalf which the alloc_page() allocation is performed.

exactly!

> I think that massive vmalloc() consumers should be (as well as massive
> alloc_page() consumers) careful such that they will be chosen as first OOM
> victim, for vmalloc() does not abort as soon as an OOM occurs.

No. This would require spreading those checks all over the place. That
is why we have that logic inside the allocator, which fails the
allocation at a certain point. Large, unbounded, or user-controlled-size
allocations from the kernel are always a bug, and a really hard one to
protect against. It is simply impossible to know the intention.

> Thus, I used
> set_current_oom_origin()/clear_current_oom_origin() when I demonstrated
> "complete" depletion.

which was a completely artificial example as already mentioned.

> > I have tried to explain this is not really needed before but you keep
> > insisting which is highly annoying. The patch as is is not harmful but
> > it is simply _pointless_ IMHO.
> 
> Then, how can massive vmalloc() consumers become careful?
> Explicitly use __vmalloc() and pass __GFP_NOMEMALLOC ?
> Then, what about adding some comment like "Never try to allocate large
> memory using plain vmalloc(). Use __vmalloc() with __GFP_NOMEMALLOC." ?

Come on! Seriously, we do expect some competence from code running in
kernel space. We do not really need to add a comment saying that you
shouldn't shoot yourself in the head because it might hurt. Please try
to focus on real issues. There are many of them to chase after...

-- 
Michal Hocko
SUSE Labs

* Re: [PATCH] vmalloc: back off only when the current task is OOM killed
  2017-10-10 13:49       ` Michal Hocko
@ 2017-10-10 14:13         ` Tetsuo Handa
From: Tetsuo Handa @ 2017-10-10 14:13 UTC
  To: mhocko; +Cc: hannes, akpm, alan, hch, linux-mm, linux-kernel, kernel-team

Michal Hocko wrote:
> On Tue 10-10-17 21:47:02, Tetsuo Handa wrote:
> > I think that massive vmalloc() consumers should be (as well as massive
> > alloc_page() consumers) careful such that they will be chosen as first OOM
> > victim, for vmalloc() does not abort as soon as an OOM occurs.
> 
> No. This would require to spread those checks all over the place. That
> is why we have that logic inside the allocator which fails the
> allocation at certain point in time. Large/unbound/user controlled sized
> allocations from the kernel are always a bug and really hard one to
> protect from. It is simply impossible to know the intention.
> 
> > Thus, I used
> > set_current_oom_origin()/clear_current_oom_origin() when I demonstrated
> > "complete" depletion.
> 
> which was a completely artificial example as already mentioned.
> 
> > > I have tried to explain this is not really needed before but you keep
> > > insisting which is highly annoying. The patch as is is not harmful but
> > > it is simply _pointless_ IMHO.
> > 
> > Then, how can massive vmalloc() consumers become careful?
> > Explicitly use __vmalloc() and pass __GFP_NOMEMALLOC ?
> > Then, what about adding some comment like "Never try to allocate large
> > memory using plain vmalloc(). Use __vmalloc() with __GFP_NOMEMALLOC." ?
> 
> Come on! Seriously we do expect some competence from the code running in
> the kernel space. We do not really need to add a comment that you
> shouldn't shoot your head because it might hurt. Please try to focus on
> real issues. There are many of them to chase after...
> 

My understanding is that vmalloc() is provided for allocating large memory
where kmalloc() is difficult to satisfy. If we say "do not allocate large
memory with vmalloc() because large allocations from the kernel are always
a bug", it sounds like a denial of vmalloc()'s raison d'être. Strange...

But anyway, vmalloc() is not what bothers me; the warn_alloc() lockup is.
Please go ahead with removing fatal_signal_pending() from vmalloc().

* Re: [PATCH] vmalloc: back off only when the current task is OOM killed
  2017-10-10 14:13         ` Tetsuo Handa
@ 2017-10-10 14:17           ` Michal Hocko
From: Michal Hocko @ 2017-10-10 14:17 UTC
  To: Tetsuo Handa; +Cc: hannes, akpm, alan, hch, linux-mm, linux-kernel, kernel-team

On Tue 10-10-17 23:13:21, Tetsuo Handa wrote:
> Michal Hocko wrote:
> > On Tue 10-10-17 21:47:02, Tetsuo Handa wrote:
> > > I think that massive vmalloc() consumers should be (as well as massive
> > > alloc_page() consumers) careful such that they will be chosen as first OOM
> > > victim, for vmalloc() does not abort as soon as an OOM occurs.
> > 
> > No. This would require to spread those checks all over the place. That
> > is why we have that logic inside the allocator which fails the
> > allocation at certain point in time. Large/unbound/user controlled sized
> > allocations from the kernel are always a bug and really hard one to
> > protect from. It is simply impossible to know the intention.
> > 
> > > Thus, I used
> > > set_current_oom_origin()/clear_current_oom_origin() when I demonstrated
> > > "complete" depletion.
> > 
> > which was a completely artificial example as already mentioned.
> > 
> > > > I have tried to explain this is not really needed before but you keep
> > > > insisting which is highly annoying. The patch as is is not harmful but
> > > > it is simply _pointless_ IMHO.
> > > 
> > > Then, how can massive vmalloc() consumers become careful?
> > > Explicitly use __vmalloc() and pass __GFP_NOMEMALLOC ?
> > > Then, what about adding some comment like "Never try to allocate large
> > > memory using plain vmalloc(). Use __vmalloc() with __GFP_NOMEMALLOC." ?
> > 
> > Come on! Seriously we do expect some competence from the code running in
> > the kernel space. We do not really need to add a comment that you
> > shouldn't shoot your head because it might hurt. Please try to focus on
> > real issues. There are many of them to chase after...
> > 
> My understanding is that vmalloc() is provided for allocating large memory
> where kmalloc() is difficult to satisfy. If we say "do not allocate large
> memory with vmalloc() because large allocations from the kernel are always
> a bug", it sounds like denial of raison d'etre of vmalloc(). Strange...

Try to find some middle ground between following the wording literally
and common sense. In the kernel, anything larger than order-3 is a large
allocation. The "large" we are arguing about here is MBs of memory.

-- 
Michal Hocko
SUSE Labs

Thread overview:
2017-10-10 10:58 [PATCH] vmalloc: back off only when the current task is OOM killed Tetsuo Handa
2017-10-10 11:54 ` Michal Hocko
2017-10-10 12:47   ` Tetsuo Handa
2017-10-10 13:49     ` Michal Hocko
2017-10-10 14:13       ` Tetsuo Handa
2017-10-10 14:17         ` Michal Hocko
2017-10-10 12:47 ` Johannes Weiner
