All of lore.kernel.org
 help / color / mirror / Atom feed
From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
To: mhocko@kernel.org
Cc: akpm@linux-foundation.org, vbabka@suse.cz, hannes@cmpxchg.org,
	mgorman@suse.de, rientjes@google.com, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH 2/2] mm, oom: do not enfore OOM killer for __GFP_NOFAIL automatically
Date: Tue, 6 Dec 2016 19:38:38 +0900	[thread overview]
Message-ID: <201612061938.DDD73970.QFHOFJStFOLVOM@I-love.SAKURA.ne.jp> (raw)
In-Reply-To: <20161205141009.GJ30758@dhcp22.suse.cz>

Michal Hocko wrote:
> On Mon 05-12-16 22:45:19, Tetsuo Handa wrote:
> > Michal Hocko wrote:
> > > __alloc_pages_may_oom makes sure to skip the OOM killer depending on
> > > the allocation request. This includes lowmem requests, costly high
> > > order requests and others. For a long time __GFP_NOFAIL acted as an
> > > override for all those rules. This is not documented and it can be quite
> > > surprising as well. E.g. GFP_NOFS requests are not invoking the OOM
> > > killer but GFP_NOFS|__GFP_NOFAIL does so if we try to convert some of
> > > the existing open coded loops around allocator to nofail request (and we
> > > have done that in the past) then such a change would have a non trivial
> > > side effect which is not obvious. Note that the primary motivation for
> > > skipping the OOM killer is to prevent from pre-mature invocation.
> > > 
> > > The exception has been added by 82553a937f12 ("oom: invoke oom killer
> > > for __GFP_NOFAIL"). The changelog points out that the oom killer has to
> > > be invoked otherwise the request would be looping for ever. But this
> > > argument is rather weak because the OOM killer doesn't really guarantee
> > > any forward progress for those exceptional cases - e.g. it will hardly
> > > help to form costly order - I believe we certainly do not want to kill
> > > all processes and eventually panic the system just because there is a
> > > nasty driver asking for order-9 page with GFP_NOFAIL not realizing all
> > > the consequences - it is much better this request would loop for ever
> > > than the massive system disruption, lowmem is also highly unlikely to be
> > > freed during OOM killer and GFP_NOFS request could trigger while there
> > > is still a lot of memory pinned by filesystems.
> > 
> > I disagree. I believe that panic caused by OOM killer is much much better
> > than a locked up system. I hate to add new locations that can lockup inside
> > page allocator. This is __GFP_NOFAIL and reclaim has failed.
> 
> As a matter of fact any __GFP_NOFAIL can lockup inside the allocator.

You are trying to increase possible locations of lockups by changing
default behavior of __GFP_NOFAIL.

> Full stop. There is no guaranteed way to make a forward progress with
> the current page allocator implementation.

Then, will you accept kmallocwd until page allocator implementation
can provide a guaranteed way to make a forward progress?

> 
> So we are somewhere in the middle between pre-mature and pointless
> system disruption (GFP_NOFS with a lots of metadata or lowmem request)
> where the OOM killer even might not help and potential lockup which is
> inevitable with the current design. Dunno about you but I would rather
> go with the first option. To be honest I really fail to understand your
> line of argumentation. We have this
> 	do {
> 		cond_resched();
> 	} while (!(page = alloc_page(GFP_NOFS)));
> vs.
> 	page = alloc_page(GFP_NOFS | __GFP_NOFAIL);
> 
> the first one doesn't invoke OOM killer while the later does. This
> discrepancy just cannot make any sense... The same is true for
> 
> 	alloc_page(GFP_DMA) vs alloc_page(GFP_DMA|__GFP_NOFAIL)
> 
> Now we can discuss whether it is a _good_ idea to not invoke OOM killer
> for those exceptions but whatever we do __GFP_NOFAIL is not a way to
> give such a subtle side effect. Or do you disagree even with that?

"[PATCH 1/2] mm: consolidate GFP_NOFAIL checks in the allocator slowpath"
silently changes __GFP_NOFAIL vs. __GFP_NORETRY priority.

Currently, __GFP_NORETRY is stronger than __GFP_NOFAIL; __GFP_NOFAIL
allocation requests fail without invoking the OOM killer when both
__GFP_NORETRY and __GFP_NOFAIL are given.

With [PATCH 1/2], __GFP_NOFAIL becomes stronger than __GFP_NORETRY;
__GFP_NOFAIL allocation requests will loop forever without invoking
the OOM killer when both __GFP_NORETRY and __GFP_NOFAIL are given.

Those callers which prefer lockup over panic can specify both
__GFP_NORETRY and __GFP_NOFAIL. You are trying to change behavior of
__GFP_NOFAIL without asking whether existing __GFP_NOFAIL users
want to invoke the OOM killer.

And the story is not specific to existing __GFP_NOFAIL users;
it applies to existing GFP_NOFS users as well.

Quoting from http://lkml.kernel.org/r/20161125131806.GB24353@dhcp22.suse.cz :
> > Will you look at http://marc.info/?t=120716967100004&r=1&w=2 which lead to
> > commit a02fe13297af26c1 ("selinux: prevent rentry into the FS") and commit
> > 869ab5147e1eead8 ("SELinux: more GFP_NOFS fixups to prevent selinux from
> > re-entering the fs code") ? My understanding is that mkdir() system call
> > caused memory allocation for inode creation and that memory allocation
> > caused memory reclaim which had to be !__GFP_FS.
> 
> I will have a look later, thanks for the points.

What is your answer to this problem? For those who prefer panic over lockup,
please provide a mean to invoke the OOM killer (e.g. __GFP_WANT_OOM_KILLER).

If you provide __GFP_WANT_OOM_KILLER, you can change

-#define GFP_KERNEL      (__GFP_RECLAIM | __GFP_IO | __GFP_FS)
+#define GFP_KERNEL      (__GFP_RECLAIM | __GFP_IO | __GFP_FS | __GFP_WANT_OOM_KILLER)

in gfp.h and add __GFP_WANT_OOM_KILLER to any existing __GFP_NOFAIL/GFP_NOFS
users who prefer panic over lockup. Then, I think I'm fine with

-	if (oc->gfp_mask && !(oc->gfp_mask & (__GFP_FS|__GFP_NOFAIL)))
-		return true;
+	if (oc->gfp_mask && !(oc->gfp_mask & __GFP_WANT_OOM_KILLER))
+		return false;

in out_of_memory().

As you know, quite few people are responding to discussions regarding
almost OOM situation. Changing default behavior without asking existing
users is annoying.

WARNING: multiple messages have this Message-ID (diff)
From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
To: mhocko@kernel.org
Cc: akpm@linux-foundation.org, vbabka@suse.cz, hannes@cmpxchg.org,
	mgorman@suse.de, rientjes@google.com, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH 2/2] mm, oom: do not enfore OOM killer for __GFP_NOFAIL automatically
Date: Tue, 6 Dec 2016 19:38:38 +0900	[thread overview]
Message-ID: <201612061938.DDD73970.QFHOFJStFOLVOM@I-love.SAKURA.ne.jp> (raw)
In-Reply-To: <20161205141009.GJ30758@dhcp22.suse.cz>

Michal Hocko wrote:
> On Mon 05-12-16 22:45:19, Tetsuo Handa wrote:
> > Michal Hocko wrote:
> > > __alloc_pages_may_oom makes sure to skip the OOM killer depending on
> > > the allocation request. This includes lowmem requests, costly high
> > > order requests and others. For a long time __GFP_NOFAIL acted as an
> > > override for all those rules. This is not documented and it can be quite
> > > surprising as well. E.g. GFP_NOFS requests are not invoking the OOM
> > > killer but GFP_NOFS|__GFP_NOFAIL does so if we try to convert some of
> > > the existing open coded loops around allocator to nofail request (and we
> > > have done that in the past) then such a change would have a non trivial
> > > side effect which is not obvious. Note that the primary motivation for
> > > skipping the OOM killer is to prevent from pre-mature invocation.
> > > 
> > > The exception has been added by 82553a937f12 ("oom: invoke oom killer
> > > for __GFP_NOFAIL"). The changelog points out that the oom killer has to
> > > be invoked otherwise the request would be looping for ever. But this
> > > argument is rather weak because the OOM killer doesn't really guarantee
> > > any forward progress for those exceptional cases - e.g. it will hardly
> > > help to form costly order - I believe we certainly do not want to kill
> > > all processes and eventually panic the system just because there is a
> > > nasty driver asking for order-9 page with GFP_NOFAIL not realizing all
> > > the consequences - it is much better this request would loop for ever
> > > than the massive system disruption, lowmem is also highly unlikely to be
> > > freed during OOM killer and GFP_NOFS request could trigger while there
> > > is still a lot of memory pinned by filesystems.
> > 
> > I disagree. I believe that panic caused by OOM killer is much much better
> > than a locked up system. I hate to add new locations that can lockup inside
> > page allocator. This is __GFP_NOFAIL and reclaim has failed.
> 
> As a matter of fact any __GFP_NOFAIL can lockup inside the allocator.

You are trying to increase possible locations of lockups by changing
default behavior of __GFP_NOFAIL.

> Full stop. There is no guaranteed way to make a forward progress with
> the current page allocator implementation.

Then, will you accept kmallocwd until page allocator implementation
can provide a guaranteed way to make a forward progress?

> 
> So we are somewhere in the middle between pre-mature and pointless
> system disruption (GFP_NOFS with a lots of metadata or lowmem request)
> where the OOM killer even might not help and potential lockup which is
> inevitable with the current design. Dunno about you but I would rather
> go with the first option. To be honest I really fail to understand your
> line of argumentation. We have this
> 	do {
> 		cond_resched();
> 	} while (!(page = alloc_page(GFP_NOFS)));
> vs.
> 	page = alloc_page(GFP_NOFS | __GFP_NOFAIL);
> 
> the first one doesn't invoke OOM killer while the later does. This
> discrepancy just cannot make any sense... The same is true for
> 
> 	alloc_page(GFP_DMA) vs alloc_page(GFP_DMA|__GFP_NOFAIL)
> 
> Now we can discuss whether it is a _good_ idea to not invoke OOM killer
> for those exceptions but whatever we do __GFP_NOFAIL is not a way to
> give such a subtle side effect. Or do you disagree even with that?

"[PATCH 1/2] mm: consolidate GFP_NOFAIL checks in the allocator slowpath"
silently changes __GFP_NOFAIL vs. __GFP_NORETRY priority.

Currently, __GFP_NORETRY is stronger than __GFP_NOFAIL; __GFP_NOFAIL
allocation requests fail without invoking the OOM killer when both
__GFP_NORETRY and __GFP_NOFAIL are given.

With [PATCH 1/2], __GFP_NOFAIL becomes stronger than __GFP_NORETRY;
__GFP_NOFAIL allocation requests will loop forever without invoking
the OOM killer when both __GFP_NORETRY and __GFP_NOFAIL are given.

Those callers which prefer lockup over panic can specify both
__GFP_NORETRY and __GFP_NOFAIL. You are trying to change behavior of
__GFP_NOFAIL without asking whether existing __GFP_NOFAIL users
want to invoke the OOM killer.

And the story is not specific to existing __GFP_NOFAIL users;
it applies to existing GFP_NOFS users as well.

Quoting from http://lkml.kernel.org/r/20161125131806.GB24353@dhcp22.suse.cz :
> > Will you look at http://marc.info/?t=120716967100004&r=1&w=2 which lead to
> > commit a02fe13297af26c1 ("selinux: prevent rentry into the FS") and commit
> > 869ab5147e1eead8 ("SELinux: more GFP_NOFS fixups to prevent selinux from
> > re-entering the fs code") ? My understanding is that mkdir() system call
> > caused memory allocation for inode creation and that memory allocation
> > caused memory reclaim which had to be !__GFP_FS.
> 
> I will have a look later, thanks for the points.

What is your answer to this problem? For those who prefer panic over lockup,
please provide a mean to invoke the OOM killer (e.g. __GFP_WANT_OOM_KILLER).

If you provide __GFP_WANT_OOM_KILLER, you can change

-#define GFP_KERNEL      (__GFP_RECLAIM | __GFP_IO | __GFP_FS)
+#define GFP_KERNEL      (__GFP_RECLAIM | __GFP_IO | __GFP_FS | __GFP_WANT_OOM_KILLER)

in gfp.h and add __GFP_WANT_OOM_KILLER to any existing __GFP_NOFAIL/GFP_NOFS
users who prefer panic over lockup. Then, I think I'm fine with

-	if (oc->gfp_mask && !(oc->gfp_mask & (__GFP_FS|__GFP_NOFAIL)))
-		return true;
+	if (oc->gfp_mask && !(oc->gfp_mask & __GFP_WANT_OOM_KILLER))
+		return false;

in out_of_memory().

As you know, quite few people are responding to discussions regarding
almost OOM situation. Changing default behavior without asking existing
users is annoying.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  parent reply	other threads:[~2016-12-06 10:38 UTC|newest]

Thread overview: 46+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-12-01 15:25 [PATCH 0/2] GFP_NOFAIL cleanups Michal Hocko
2016-12-01 15:25 ` Michal Hocko
2016-12-01 15:25 ` [PATCH 1/2] mm: consolidate GFP_NOFAIL checks in the allocator slowpath Michal Hocko
2016-12-01 15:25   ` Michal Hocko
2016-12-01 15:25 ` [PATCH 2/2] mm, oom: do not enfore OOM killer for __GFP_NOFAIL automatically Michal Hocko
2016-12-01 15:25   ` Michal Hocko
2016-12-02  7:23   ` Vlastimil Babka
2016-12-02  7:23     ` Vlastimil Babka
2016-12-05 13:45   ` Tetsuo Handa
2016-12-05 13:45     ` Tetsuo Handa
2016-12-05 14:10     ` Michal Hocko
2016-12-05 14:10       ` Michal Hocko
2016-12-06  8:27       ` Michal Hocko
2016-12-06  8:27         ` Michal Hocko
2016-12-06 10:38       ` Tetsuo Handa [this message]
2016-12-06 10:38         ` Tetsuo Handa
2016-12-06 11:03         ` Vlastimil Babka
2016-12-06 11:03           ` Vlastimil Babka
2016-12-06 19:25           ` Michal Hocko
2016-12-06 19:25             ` Michal Hocko
2016-12-06 19:22         ` Michal Hocko
2016-12-06 19:22           ` Michal Hocko
2016-12-08 12:53           ` Tetsuo Handa
2016-12-08 12:53             ` Tetsuo Handa
2016-12-08 13:47             ` Michal Hocko
2016-12-08 13:47               ` Michal Hocko
2016-12-11 11:23               ` Tetsuo Handa
2016-12-11 11:23                 ` Tetsuo Handa
2016-12-11 13:53                 ` Tetsuo Handa
2016-12-11 13:53                   ` Tetsuo Handa
2016-12-12  8:52                   ` Michal Hocko
2016-12-12  8:52                     ` Michal Hocko
2016-12-12  8:48                 ` Michal Hocko
2016-12-12  8:48                   ` Michal Hocko
2016-12-14 10:34                   ` Michal Hocko
2016-12-14 10:34                     ` Michal Hocko
2016-12-16  7:39 OOM: Better, but still there on 4.9 Michal Hocko
2016-12-16 15:58 ` OOM: Better, but still there on Michal Hocko
2016-12-16 15:58   ` [PATCH 2/2] mm, oom: do not enfore OOM killer for __GFP_NOFAIL automatically Michal Hocko
2016-12-16 15:58     ` Michal Hocko
2016-12-16 17:31     ` Johannes Weiner
2016-12-16 17:31       ` Johannes Weiner
2016-12-16 22:12       ` Michal Hocko
2016-12-16 22:12         ` Michal Hocko
2016-12-17 11:17         ` Tetsuo Handa
2016-12-17 11:17           ` Tetsuo Handa
2016-12-18 16:37           ` Michal Hocko
2016-12-18 16:37             ` Michal Hocko

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=201612061938.DDD73970.QFHOFJStFOLVOM@I-love.SAKURA.ne.jp \
    --to=penguin-kernel@i-love.sakura.ne.jp \
    --cc=akpm@linux-foundation.org \
    --cc=hannes@cmpxchg.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mgorman@suse.de \
    --cc=mhocko@kernel.org \
    --cc=rientjes@google.com \
    --cc=vbabka@suse.cz \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.