linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2] mm/page_alloc: bail out on fatal signal during reclaim/compaction retry attempt
@ 2021-05-19 19:23 Aaron Tomlin
  2021-05-19 19:32 ` Matthew Wilcox
  0 siblings, 1 reply; 15+ messages in thread
From: Aaron Tomlin @ 2021-05-19 19:23 UTC (permalink / raw)
  To: linux-mm; +Cc: akpm, vbabka, mhocko, linux-kernel

It does not make sense to retry compaction when a fatal signal is
pending.

In the context of try_to_compact_pages(), indeed COMPACT_SKIPPED can be
returned; albeit, not every zone, on the zone list, would be considered
in the case a fatal signal is found to be pending.
Yet, in should_compact_retry(), given the last known compaction result,
each zone, on the zone list, can be considered/or checked
(see compaction_zonelist_suitable()). For example, if a zone was found
to succeed, then reclaim/compaction would be tried again
(notwithstanding the above).

This patch ensures that compaction is not needlessly retried
irrespective of the last known compaction result e.g. if it was skipped,
in the unlikely case a fatal signal is found pending.
So, OOM is at least attempted.

Signed-off-by: Aaron Tomlin <atomlin@redhat.com>
---
 mm/page_alloc.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index aaa1655cf682..49f416ffb54f 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4252,6 +4252,9 @@ should_compact_retry(struct alloc_context *ac, int order, int alloc_flags,
 	if (!order)
 		return false;
 
+	if (fatal_signal_pending(current))
+		goto out;
+
 	if (compaction_made_progress(compact_result))
 		(*compaction_retries)++;
 
-- 
2.26.3


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH v2] mm/page_alloc: bail out on fatal signal during reclaim/compaction retry attempt
  2021-05-19 19:23 [PATCH v2] mm/page_alloc: bail out on fatal signal during reclaim/compaction retry attempt Aaron Tomlin
@ 2021-05-19 19:32 ` Matthew Wilcox
  2021-05-19 19:48   ` Aaron Tomlin
  2021-05-19 20:17   ` [PATCH v3] " Aaron Tomlin
  0 siblings, 2 replies; 15+ messages in thread
From: Matthew Wilcox @ 2021-05-19 19:32 UTC (permalink / raw)
  To: Aaron Tomlin; +Cc: linux-mm, akpm, vbabka, mhocko, linux-kernel

On Wed, May 19, 2021 at 08:23:21PM +0100, Aaron Tomlin wrote:
> +++ b/mm/page_alloc.c
> @@ -4252,6 +4252,9 @@ should_compact_retry(struct alloc_context *ac, int order, int alloc_flags,
>  	if (!order)
>  		return false;
>  
> +	if (fatal_signal_pending(current))
> +		goto out;

I think 'goto out' will be confusing.  It'll output a tracepoint, which
isn't going to record that a fatal signal is pending, so it'll cause
some head scratching for someone looking through the traces.  I
think we should just return false here and skip the tracepoint.

But I'd defer to someone like Vlastimil or Michal who know this code far
better than I do.


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2] mm/page_alloc: bail out on fatal signal during reclaim/compaction retry attempt
  2021-05-19 19:32 ` Matthew Wilcox
@ 2021-05-19 19:48   ` Aaron Tomlin
  2021-05-19 20:17   ` [PATCH v3] " Aaron Tomlin
  1 sibling, 0 replies; 15+ messages in thread
From: Aaron Tomlin @ 2021-05-19 19:48 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: linux-mm, akpm, vbabka, mhocko, linux-kernel

On Wed 2021-05-19 20:32 +0100, Matthew Wilcox wrote:
> On Wed, May 19, 2021 at 08:23:21PM +0100, Aaron Tomlin wrote:
> > +++ b/mm/page_alloc.c
> > @@ -4252,6 +4252,9 @@ should_compact_retry(struct alloc_context *ac, int order, int alloc_flags,
> >  	if (!order)
> >  		return false;
> >  
> > +	if (fatal_signal_pending(current))
> > +		goto out;
> 
> I think 'goto out' will be confusing.  It'll output a tracepoint, which
> isn't going to record that a fatal signal is pending, so it'll cause
> some head scratching for someone looking through the traces.  I
> think we should just return false here and skip the tracepoint.

I agree. Having said this, I do plan to send a patch to illustrate why
compaction should not retry to avoid possible confusion. Nevertheless, I am
happy to modify as per your request.


Kind regards,

-- 
Aaron Tomlin


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH v3] mm/page_alloc: bail out on fatal signal during reclaim/compaction retry attempt
  2021-05-19 19:32 ` Matthew Wilcox
  2021-05-19 19:48   ` Aaron Tomlin
@ 2021-05-19 20:17   ` Aaron Tomlin
  2021-05-20  4:34     ` Andrew Morton
  1 sibling, 1 reply; 15+ messages in thread
From: Aaron Tomlin @ 2021-05-19 20:17 UTC (permalink / raw)
  To: linux-mm; +Cc: akpm, vbabka, mhocko, willy, linux-kernel

It does not make sense to retry compaction when a fatal signal is
pending.

In the context of try_to_compact_pages(), indeed COMPACT_SKIPPED can be
returned; albeit, not every zone, on the zone list, would be considered
in the case a fatal signal is found to be pending.
Yet, in should_compact_retry(), given the last known compaction result,
each zone, on the zone list, can be considered/or checked
(see compaction_zonelist_suitable()). For example, if a zone was found
to succeed, then reclaim/compaction would be tried again
(notwithstanding the above).

This patch ensures that compaction is not needlessly retried
irrespective of the last known compaction result e.g. if it was skipped,
in the unlikely case a fatal signal is found pending.
So, OOM is at least attempted.

Signed-off-by: Aaron Tomlin <atomlin@redhat.com>
---
 mm/page_alloc.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index aaa1655cf682..b317057ac186 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4252,6 +4252,9 @@ should_compact_retry(struct alloc_context *ac, int order, int alloc_flags,
 	if (!order)
 		return false;
 
+	if (fatal_signal_pending(current))
+		return false;
+
 	if (compaction_made_progress(compact_result))
 		(*compaction_retries)++;
 
-- 
2.26.3


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH v3] mm/page_alloc: bail out on fatal signal during reclaim/compaction retry attempt
  2021-05-19 20:17   ` [PATCH v3] " Aaron Tomlin
@ 2021-05-20  4:34     ` Andrew Morton
  2021-05-20 10:20       ` Vlastimil Babka
  2021-05-20 11:09       ` [PATCH v3] " Matthew Wilcox
  0 siblings, 2 replies; 15+ messages in thread
From: Andrew Morton @ 2021-05-20  4:34 UTC (permalink / raw)
  To: Aaron Tomlin; +Cc: linux-mm, vbabka, mhocko, willy, linux-kernel

On Wed, 19 May 2021 21:17:43 +0100 Aaron Tomlin <atomlin@redhat.com> wrote:

> It does not make sense to retry compaction when a fatal signal is
> pending.

Well, it might make sense.  Presumably it is beneficial to other tasks.

> In the context of try_to_compact_pages(), indeed COMPACT_SKIPPED can be
> returned; albeit, not every zone, on the zone list, would be considered
> in the case a fatal signal is found to be pending.
> Yet, in should_compact_retry(), given the last known compaction result,
> each zone, on the zone list, can be considered/or checked
> (see compaction_zonelist_suitable()). For example, if a zone was found
> to succeed, then reclaim/compaction would be tried again
> (notwithstanding the above).
> 
> This patch ensures that compaction is not needlessly retried
> irrespective of the last known compaction result e.g. if it was skipped,
> in the unlikely case a fatal signal is found pending.
> So, OOM is at least attempted.

What observed problems motivated this change?

What were the observed runtime effects of this change?

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v3] mm/page_alloc: bail out on fatal signal during reclaim/compaction retry attempt
  2021-05-20  4:34     ` Andrew Morton
@ 2021-05-20 10:20       ` Vlastimil Babka
  2021-05-20 11:42         ` Aaron Tomlin
  2021-05-20 11:09       ` [PATCH v3] " Matthew Wilcox
  1 sibling, 1 reply; 15+ messages in thread
From: Vlastimil Babka @ 2021-05-20 10:20 UTC (permalink / raw)
  To: Andrew Morton, Aaron Tomlin; +Cc: linux-mm, mhocko, willy, linux-kernel

On 5/20/21 6:34 AM, Andrew Morton wrote:
> On Wed, 19 May 2021 21:17:43 +0100 Aaron Tomlin <atomlin@redhat.com> wrote:
> 
>> It does not make sense to retry compaction when a fatal signal is
>> pending.
> 
> Well, it might make sense.  Presumably it is beneficial to other tasks.

Yeah but the compaction won't happen. compact_zone() will immediately detect it
via __compact_finished() and bail out. So in that sense it does not make sense
to retry :)

>> In the context of try_to_compact_pages(), indeed COMPACT_SKIPPED can be
>> returned; albeit, not every zone, on the zone list, would be considered
>> in the case a fatal signal is found to be pending.
>> Yet, in should_compact_retry(), given the last known compaction result,
>> each zone, on the zone list, can be considered/or checked
>> (see compaction_zonelist_suitable()). For example, if a zone was found
>> to succeed, then reclaim/compaction would be tried again
>> (notwithstanding the above).
>> 
>> This patch ensures that compaction is not needlessly retried
>> irrespective of the last known compaction result e.g. if it was skipped,
>> in the unlikely case a fatal signal is found pending.
>> So, OOM is at least attempted.
> 
> What observed problems motivated this change?
> 
> What were the observed runtime effects of this change?

Yep those details from the previous thread should be included here.


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v3] mm/page_alloc: bail out on fatal signal during reclaim/compaction retry attempt
  2021-05-20  4:34     ` Andrew Morton
  2021-05-20 10:20       ` Vlastimil Babka
@ 2021-05-20 11:09       ` Matthew Wilcox
  1 sibling, 0 replies; 15+ messages in thread
From: Matthew Wilcox @ 2021-05-20 11:09 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Aaron Tomlin, linux-mm, vbabka, mhocko, linux-kernel

On Wed, May 19, 2021 at 09:34:55PM -0700, Andrew Morton wrote:
> On Wed, 19 May 2021 21:17:43 +0100 Aaron Tomlin <atomlin@redhat.com> wrote:
> 
> > It does not make sense to retry compaction when a fatal signal is
> > pending.
> 
> Well, it might make sense.  Presumably it is beneficial to other tasks.

Apart from Vlastimil's point, if I hit ^C, I want the task to die,
as soon as possible.  I don't want it to do things which are beneficial
to other tasks, I want my shell prompt back, not spending seconds
trying to compact memory.  Some other task can do that if _it_ needs
large contiguous chunks.


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v3] mm/page_alloc: bail out on fatal signal during reclaim/compaction retry attempt
  2021-05-20 10:20       ` Vlastimil Babka
@ 2021-05-20 11:42         ` Aaron Tomlin
  2021-05-20 11:56           ` Matthew Wilcox
  0 siblings, 1 reply; 15+ messages in thread
From: Aaron Tomlin @ 2021-05-20 11:42 UTC (permalink / raw)
  To: Vlastimil Babka; +Cc: Andrew Morton, linux-mm, mhocko, willy, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 2515 bytes --]

On Thu 2021-05-20 12:20 +0200, Vlastimil Babka wrote:
> On 5/20/21 6:34 AM, Andrew Morton wrote:
> > 
> > What observed problems motivated this change?
> > 
> > What were the observed runtime effects of this change?
> 
> Yep those details from the previous thread should be included here.

Fair enough.

During kernel crash dump/or vmcore analysis: I discovered in the context of
__alloc_pages_slowpath() the value stored in the no_progress_loops variable
was found to be 31,611,688 i.e. well above MAX_RECLAIM_RETRIES; and a fatal
signal was pending against current.


     #6 [ffff00002e78f7c0] do_try_to_free_pages+0xe4 at ffff00001028bd24
     #7 [ffff00002e78f840] try_to_free_pages+0xe4 at ffff00001028c0f4
     #8 [ffff00002e78f900] __alloc_pages_nodemask+0x500 at ffff0000102cd130

                                                             //      w28 = *(sp + 148)      /* no_progress_loops */
     0xffff0000102cd1e0 <__alloc_pages_nodemask+0x5b0>:      ldr     w0, [sp,#148]
                                                             //      w0 = w0 + 0x1
     0xffff0000102cd1e4 <__alloc_pages_nodemask+0x5b4>:      add     w0, w0, #0x1
                                                             //      *(sp + 148) = w0
     0xffff0000102cd1e8 <__alloc_pages_nodemask+0x5b8>:      str     w0, [sp,#148]
                                                             //      if (w0 >= 0x10)
                                                             //          goto __alloc_pages_nodemask+0x904
     0xffff0000102cd1ec <__alloc_pages_nodemask+0x5bc>:      cmp     w0, #0x10
     0xffff0000102cd1f0 <__alloc_pages_nodemask+0x5c0>:      b.gt    0xffff0000102cd534

 - The stack pointer was 0xffff00002e78f900

     crash> p *(int *)(0xffff00002e78f900+148)
     $1 = 31611688

     crash> ps 521171
        PID    PPID  CPU       TASK        ST  %MEM     VSZ    RSS  COMM
     > 521171      1  36  ffff8080e2128800  RU   0.0 34789440  18624  special

     crash> p &((struct task_struct *)0xffff8080e2128800)->signal.shared_pending
     $2 = (struct sigpending *) 0xffff80809a416e40

     crash> p ((struct sigpending *)0xffff80809a416e40)->signal.sig[0]
     $3 = 0x804100

     crash> sig -s 0x804100
     SIGKILL SIGTERM SIGXCPU

     crash> p ((struct sigpending *)0xffff80809a416e40)->signal.sig[0] & 1U << (9 - 1)
     $4 = 0x100


Unfortunately, this incident was not reproduced, to date.





Kind regards,

-- 
Aaron Tomlin

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v3] mm/page_alloc: bail out on fatal signal during reclaim/compaction retry attempt
  2021-05-20 11:42         ` Aaron Tomlin
@ 2021-05-20 11:56           ` Matthew Wilcox
  2021-05-20 13:30             ` Aaron Tomlin
  2021-05-20 14:29             ` [PATCH v4] " Aaron Tomlin
  0 siblings, 2 replies; 15+ messages in thread
From: Matthew Wilcox @ 2021-05-20 11:56 UTC (permalink / raw)
  To: Aaron Tomlin
  Cc: Vlastimil Babka, Andrew Morton, linux-mm, mhocko, linux-kernel

On Thu, May 20, 2021 at 12:42:57PM +0100, Aaron Tomlin wrote:
> On Thu 2021-05-20 12:20 +0200, Vlastimil Babka wrote:
> > On 5/20/21 6:34 AM, Andrew Morton wrote:
> > > 
> > > What observed problems motivated this change?
> > > 
> > > What were the observed runtime effects of this change?
> > 
> > Yep those details from the previous thread should be included here.
> 
> Fair enough.
> 
> During kernel crash dump/or vmcore analysis: I discovered in the context of
> __alloc_pages_slowpath() the value stored in the no_progress_loops variable
> was found to be 31,611,688 i.e. well above MAX_RECLAIM_RETRIES; and a fatal
> signal was pending against current.

While this is true, it's not really answering Andrew's question.
What we want as part of the commit message is something like:

"A customer experienced a low memory situation and sent their task a
fatal signal.  Instead of dying promptly, it looped in the page
allocator failing to make progress because ..."

> 
>      #6 [ffff00002e78f7c0] do_try_to_free_pages+0xe4 at ffff00001028bd24
>      #7 [ffff00002e78f840] try_to_free_pages+0xe4 at ffff00001028c0f4
>      #8 [ffff00002e78f900] __alloc_pages_nodemask+0x500 at ffff0000102cd130
> 
>                                                              //      w28 = *(sp + 148)      /* no_progress_loops */
>      0xffff0000102cd1e0 <__alloc_pages_nodemask+0x5b0>:      ldr     w0, [sp,#148]
>                                                              //      w0 = w0 + 0x1
>      0xffff0000102cd1e4 <__alloc_pages_nodemask+0x5b4>:      add     w0, w0, #0x1
>                                                              //      *(sp + 148) = w0
>      0xffff0000102cd1e8 <__alloc_pages_nodemask+0x5b8>:      str     w0, [sp,#148]
>                                                              //      if (w0 >= 0x10)
>                                                              //          goto __alloc_pages_nodemask+0x904
>      0xffff0000102cd1ec <__alloc_pages_nodemask+0x5bc>:      cmp     w0, #0x10
>      0xffff0000102cd1f0 <__alloc_pages_nodemask+0x5c0>:      b.gt    0xffff0000102cd534
> 
>  - The stack pointer was 0xffff00002e78f900
> 
>      crash> p *(int *)(0xffff00002e78f900+148)
>      $1 = 31611688
> 
>      crash> ps 521171
>         PID    PPID  CPU       TASK        ST  %MEM     VSZ    RSS  COMM
>      > 521171      1  36  ffff8080e2128800  RU   0.0 34789440  18624  special
> 
>      crash> p &((struct task_struct *)0xffff8080e2128800)->signal.shared_pending
>      $2 = (struct sigpending *) 0xffff80809a416e40
> 
>      crash> p ((struct sigpending *)0xffff80809a416e40)->signal.sig[0]
>      $3 = 0x804100
> 
>      crash> sig -s 0x804100
>      SIGKILL SIGTERM SIGXCPU
> 
>      crash> p ((struct sigpending *)0xffff80809a416e40)->signal.sig[0] & 1U << (9 - 1)
>      $4 = 0x100
> 
> 
> Unfortunately, this incident was not reproduced, to date.
> 
> 
> 
> 
> 
> Kind regards,
> 
> -- 
> Aaron Tomlin



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v3] mm/page_alloc: bail out on fatal signal during reclaim/compaction retry attempt
  2021-05-20 11:56           ` Matthew Wilcox
@ 2021-05-20 13:30             ` Aaron Tomlin
  2021-05-20 14:29             ` [PATCH v4] " Aaron Tomlin
  1 sibling, 0 replies; 15+ messages in thread
From: Aaron Tomlin @ 2021-05-20 13:30 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Vlastimil Babka, Andrew Morton, linux-mm, mhocko, linux-kernel

Matthew,

On Thu 2021-05-20 12:56 +0100, Matthew Wilcox wrote:
> While this is true, it's not really answering Andrew's question.
> What we want as part of the commit message is something like:
> 
> "A customer experienced a low memory situation and sent their task a
> fatal signal.  Instead of dying promptly, it looped in the page
> allocator failing to make progress because ..."

Apologies, I did not understand - I will follow up with a v4.


Kind regards,

-- 
Aaron Tomlin


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH v4] mm/page_alloc: bail out on fatal signal during reclaim/compaction retry attempt
  2021-05-20 11:56           ` Matthew Wilcox
  2021-05-20 13:30             ` Aaron Tomlin
@ 2021-05-20 14:29             ` Aaron Tomlin
  2021-05-28 12:53               ` Vlastimil Babka
  2021-05-31 11:33               ` Michal Hocko
  1 sibling, 2 replies; 15+ messages in thread
From: Aaron Tomlin @ 2021-05-20 14:29 UTC (permalink / raw)
  To: linux-mm; +Cc: akpm, vbabka, mhocko, willy, linux-kernel

A customer experienced a low-memory situation and decided to issue a
SIGKILL (i.e. a fatal signal). Instead of promptly terminating as one
would expect, the aforementioned task remained unresponsive.

Further investigation indicated that the task was "stuck" in the
reclaim/compaction retry loop. Now, it does not make sense to retry
compaction when a fatal signal is pending.

In the context of try_to_compact_pages(), indeed COMPACT_SKIPPED can be
returned; albeit, not every zone, on the zone list, would be considered
in the case a fatal signal is found to be pending.
Yet, in should_compact_retry(), given the last known compaction result,
each zone, on the zone list, can be considered/or checked
(see compaction_zonelist_suitable()). For example, if a zone was found
to succeed, then reclaim/compaction would be tried again
(notwithstanding the above).

This patch ensures that compaction is not needlessly retried
irrespective of the last known compaction result e.g. if it was skipped,
in the unlikely case a fatal signal is found pending.
So, OOM is at least attempted.

Signed-off-by: Aaron Tomlin <atomlin@redhat.com>
---
 mm/page_alloc.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index aaa1655cf682..b317057ac186 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4252,6 +4252,9 @@ should_compact_retry(struct alloc_context *ac, int order, int alloc_flags,
 	if (!order)
 		return false;
 
+	if (fatal_signal_pending(current))
+		return false;
+
 	if (compaction_made_progress(compact_result))
 		(*compaction_retries)++;
 
-- 
2.26.3


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH v4] mm/page_alloc: bail out on fatal signal during reclaim/compaction retry attempt
  2021-05-20 14:29             ` [PATCH v4] " Aaron Tomlin
@ 2021-05-28 12:53               ` Vlastimil Babka
  2021-05-31 11:33               ` Michal Hocko
  1 sibling, 0 replies; 15+ messages in thread
From: Vlastimil Babka @ 2021-05-28 12:53 UTC (permalink / raw)
  To: Aaron Tomlin, linux-mm; +Cc: akpm, mhocko, willy, linux-kernel

On 5/20/21 4:29 PM, Aaron Tomlin wrote:
> A customer experienced a low-memory situation and decided to issue a
> SIGKILL (i.e. a fatal signal). Instead of promptly terminating as one
> would expect, the aforementioned task remained unresponsive.
> 
> Further investigation indicated that the task was "stuck" in the
> reclaim/compaction retry loop. Now, it does not make sense to retry
> compaction when a fatal signal is pending.
> 
> In the context of try_to_compact_pages(), indeed COMPACT_SKIPPED can be
> returned; albeit, not every zone, on the zone list, would be considered
> in the case a fatal signal is found to be pending.
> Yet, in should_compact_retry(), given the last known compaction result,
> each zone, on the zone list, can be considered/or checked
> (see compaction_zonelist_suitable()). For example, if a zone was found
> to succeed, then reclaim/compaction would be tried again
> (notwithstanding the above).
> 
> This patch ensures that compaction is not needlessly retried
> irrespective of the last known compaction result e.g. if it was skipped,
> in the unlikely case a fatal signal is found pending.
> So, OOM is at least attempted.
> 
> Signed-off-by: Aaron Tomlin <atomlin@redhat.com>

Reviewed-by: Vlastimil Babka <vbabka@suse.cz>

> ---
>  mm/page_alloc.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index aaa1655cf682..b317057ac186 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -4252,6 +4252,9 @@ should_compact_retry(struct alloc_context *ac, int order, int alloc_flags,
>  	if (!order)
>  		return false;
>  
> +	if (fatal_signal_pending(current))
> +		return false;
> +
>  	if (compaction_made_progress(compact_result))
>  		(*compaction_retries)++;
>  
> 


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v4] mm/page_alloc: bail out on fatal signal during reclaim/compaction retry attempt
  2021-05-20 14:29             ` [PATCH v4] " Aaron Tomlin
  2021-05-28 12:53               ` Vlastimil Babka
@ 2021-05-31 11:33               ` Michal Hocko
  2021-05-31 11:35                 ` Vlastimil Babka
  1 sibling, 1 reply; 15+ messages in thread
From: Michal Hocko @ 2021-05-31 11:33 UTC (permalink / raw)
  To: Aaron Tomlin; +Cc: linux-mm, akpm, vbabka, willy, linux-kernel

On Thu 20-05-21 15:29:01, Aaron Tomlin wrote:
> A customer experienced a low-memory situation and decided to issue a
> SIGKILL (i.e. a fatal signal). Instead of promptly terminating as one
> would expect, the aforementioned task remained unresponsive.
> 
> Further investigation indicated that the task was "stuck" in the
> reclaim/compaction retry loop. Now, it does not make sense to retry
> compaction when a fatal signal is pending.

Is this really true in general? The memory reclaim is retried even when
fatal signals are pending. Why should be compaction different? I do
agree that retrying way too much is bad but is there any reason why this
special case doesn't follow the max retry logic?
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v4] mm/page_alloc: bail out on fatal signal during reclaim/compaction retry attempt
  2021-05-31 11:33               ` Michal Hocko
@ 2021-05-31 11:35                 ` Vlastimil Babka
  2021-05-31 13:21                   ` Michal Hocko
  0 siblings, 1 reply; 15+ messages in thread
From: Vlastimil Babka @ 2021-05-31 11:35 UTC (permalink / raw)
  To: Michal Hocko, Aaron Tomlin; +Cc: linux-mm, akpm, willy, linux-kernel

On 5/31/21 1:33 PM, Michal Hocko wrote:
> On Thu 20-05-21 15:29:01, Aaron Tomlin wrote:
>> A customer experienced a low-memory situation and decided to issue a
>> SIGKILL (i.e. a fatal signal). Instead of promptly terminating as one
>> would expect, the aforementioned task remained unresponsive.
>> 
>> Further investigation indicated that the task was "stuck" in the
>> reclaim/compaction retry loop. Now, it does not make sense to retry
>> compaction when a fatal signal is pending.
> 
> Is this really true in general? The memory reclaim is retried even when
> fatal signals are pending. Why should be compaction different? I do
> agree that retrying way too much is bad but is there any reason why this
> special case doesn't follow the max retry logic?

Compaction doesn't do anything if fatal signal is pending, it bails out
immediately and the checks are rather frequent. So why retry?

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v4] mm/page_alloc: bail out on fatal signal during reclaim/compaction retry attempt
  2021-05-31 11:35                 ` Vlastimil Babka
@ 2021-05-31 13:21                   ` Michal Hocko
  0 siblings, 0 replies; 15+ messages in thread
From: Michal Hocko @ 2021-05-31 13:21 UTC (permalink / raw)
  To: Vlastimil Babka; +Cc: Aaron Tomlin, linux-mm, akpm, willy, linux-kernel

On Mon 31-05-21 13:35:31, Vlastimil Babka wrote:
> On 5/31/21 1:33 PM, Michal Hocko wrote:
> > On Thu 20-05-21 15:29:01, Aaron Tomlin wrote:
> >> A customer experienced a low-memory situation and decided to issue a
> >> SIGKILL (i.e. a fatal signal). Instead of promptly terminating as one
> >> would expect, the aforementioned task remained unresponsive.
> >> 
> >> Further investigation indicated that the task was "stuck" in the
> >> reclaim/compaction retry loop. Now, it does not make sense to retry
> >> compaction when a fatal signal is pending.
> > 
> > Is this really true in general? The memory reclaim is retried even when
> > fatal signals are pending. Why should be compaction different? I do
> > agree that retrying way too much is bad but is there any reason why this
> > special case doesn't follow the max retry logic?
> 
> Compaction doesn't do anything if fatal signal is pending, it bails out
> immediately and the checks are rather frequent. So why retry?

OK, I was not aware of that and it would be helpful to have that
mentioned in the changelog.

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2021-05-31 13:29 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-05-19 19:23 [PATCH v2] mm/page_alloc: bail out on fatal signal during reclaim/compaction retry attempt Aaron Tomlin
2021-05-19 19:32 ` Matthew Wilcox
2021-05-19 19:48   ` Aaron Tomlin
2021-05-19 20:17   ` [PATCH v3] " Aaron Tomlin
2021-05-20  4:34     ` Andrew Morton
2021-05-20 10:20       ` Vlastimil Babka
2021-05-20 11:42         ` Aaron Tomlin
2021-05-20 11:56           ` Matthew Wilcox
2021-05-20 13:30             ` Aaron Tomlin
2021-05-20 14:29             ` [PATCH v4] " Aaron Tomlin
2021-05-28 12:53               ` Vlastimil Babka
2021-05-31 11:33               ` Michal Hocko
2021-05-31 11:35                 ` Vlastimil Babka
2021-05-31 13:21                   ` Michal Hocko
2021-05-20 11:09       ` [PATCH v3] " Matthew Wilcox

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).