All of lore.kernel.org
 help / color / mirror / Atom feed
From: Michal Hocko <mhocko@suse.com>
To: NeilBrown <neilb@suse.de>
Cc: Mel Gorman <mgorman@suse.de>,
	Andrew Morton <akpm@linux-foundation.org>,
	Theodore Ts'o <tytso@mit.edu>,
	Andreas Dilger <adilger.kernel@dilger.ca>,
	"Darrick J. Wong" <djwong@kernel.org>, Jan Kara <jack@suse.cz>,
	Matthew Wilcox <willy@infradead.org>,
	linux-xfs@vger.kernel.org, linux-ext4@vger.kernel.org,
	linux-fsdevel@vger.kernel.org, linux-nfs@vger.kernel.org,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 3/6] EXT4: Remove ENOMEM/congestion_wait() loops.
Date: Wed, 15 Sep 2021 14:06:49 +0200	[thread overview]
Message-ID: <YUHh2ddnJEDGI8YG@dhcp22.suse.cz> (raw)
In-Reply-To: <163165609100.3992.1570739756456048657@noble.neil.brown.name>

On Wed 15-09-21 07:48:11, Neil Brown wrote:
> On Wed, 15 Sep 2021, Mel Gorman wrote:
> > On Tue, Sep 14, 2021 at 10:13:04AM +1000, NeilBrown wrote:
> > > Indefinite loops waiting for memory allocation are discouraged by
> > > documentation in gfp.h which says the use of __GFP_NOFAIL that it
> > > 
> > >  is definitely preferable to use the flag rather than opencode endless
> > >  loop around allocator.
> > > 
> > > Such loops that use congestion_wait() are particularly unwise as
> > > congestion_wait() is indistinguishable from
> > > schedule_timeout_uninterruptible() in practice - and should be
> > > deprecated.
> > > 
> > > So this patch changes the two loops in ext4_ext_truncate() to use
> > > __GFP_NOFAIL instead of looping.
> > > 
> > > As the allocation is multiple layers deeper in the call stack, this
> > > requires passing the EXT4_EX_NOFAIL flag down and handling it in various
> > > places.
> > > 
> > > Of particular interest is the ext4_journal_start family of calls which
> > > can now have EXT4_EX_NOFAIL 'or'ed in to the 'type'.  This could be seen
> > > as a blurring of types.  However 'type' is 8 bits, and EXT4_EX_NOFAIL is
> > > a high bit, so it is safe in practice.
> > > 
> > > jbd2__journal_start() is enhanced so that the gfp_t flags passed are
> > > used for *all* allocations.
> > > 
> > > Signed-off-by: NeilBrown <neilb@suse.de>
> > 
> > I'm not a fan. GFP_NOFAIL allows access to emergency reserves increasing
> > the risk of a livelock if memory is completely depleted where as some
> > callers can afford to wait.
> 
> Maybe we should wind back and focus on the documentation patches.
> As quoted above, mm.h says:
> 
> > >  is definitely preferable to use the flag rather than opencode endless
> > >  loop around allocator.
> 
> but you seem to be saying that is wrong.  I'd certainly like to get the
> documentation right before changing any code.
> 
> Why does __GFP_NOFAIL access the reserves? Why not require that the
> relevant "Try harder" flag (__GFP_ATOMIC or __GFP_MEMALLOC) be included
> with __GFP_NOFAIL if that is justified?

Does 5020e285856c ("mm, oom: give __GFP_NOFAIL allocations access to
memory reserves") help?

I would be worried to make the semantic even more complex than already
is. Access to memory reserves is an implementation detail that the page
allocator does currently. Callers shouldn't really be worried about
that. I do not ever remember any actual NOFAIL triggered memory
exhaustion. I have seen that to happen for unrestricted access to memory
reserves by OOM victim though. Hence cd04ae1e2dc8 ("mm, oom: do not rely
on TIF_MEMDIE for memory reserves access"). We can consider something
similar if NOFAIL allocation really tend to show a similar problem. We
do not want callers to care about OOM sitauations for this kind of
requests.

__GFP_NOFAIL | __GFP_HIGH is certainly something that is a valid usage
but I wouldn't base OOM behavior based on that.

> There are over 100 __GFP_NOFAIL allocation sites.  I don't feel like
> reviewing them all and seeing if any really need a try-harder flag.
> Can we rename __GFP_NOFAIL to __GFP_NEVERFAIL and then
> #define __GFP_NOFAIL (__GFP_NEVERFAIL | __GFP_ATOMIC)
> and encourage the use of __GFP_NEVERFAIL in future?

Doesn't this add even more complexity?

> When __GFP_NOFAIL loops, it calls congestion_wait() internally.  That
> certainly needs to be fixed and the ideas you present below are
> certainly worth considering when trying to understand how to address
> that.  I'd rather fix it once there in page_alloc.c rather then export a
> waiting API like congestion_wait().  That would provide more
> flexibility.  e.g.  a newly freed page could be handed directly back to
> the waiter.

Completely agreed here. We really do not want people to open code NOFAIL
unless they can do something really subsystem specific that would help
to make a forward progress.

-- 
Michal Hocko
SUSE Labs

  reply	other threads:[~2021-09-15 12:06 UTC|newest]

Thread overview: 34+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-09-14  0:13 [PATCH 0/6] congestion_wait() and GFP_NOFAIL NeilBrown
2021-09-14  0:13 ` [PATCH 2/6] MM: annotate congestion_wait() and wait_iff_congested() as ineffective NeilBrown
2021-09-15 11:56   ` Michal Hocko
2021-09-16 22:13     ` NeilBrown
2021-09-14  0:13 ` [PATCH 5/6] XFS: remove congestion_wait() loop from kmem_alloc() NeilBrown
2021-09-14  1:31   ` Dave Chinner
2021-09-14  3:27     ` NeilBrown
2021-09-14  6:05       ` Dave Chinner
2021-09-14  0:13 ` [PATCH 3/6] EXT4: Remove ENOMEM/congestion_wait() loops NeilBrown
2021-09-14 16:34   ` Mel Gorman
2021-09-14 21:48     ` NeilBrown
2021-09-15 12:06       ` Michal Hocko [this message]
2021-09-15 22:35         ` NeilBrown
2021-09-16  0:37           ` Dave Chinner
2021-09-16  6:52           ` Michal Hocko
2021-09-14 23:55     ` Dave Chinner
2021-09-15  8:59       ` Mel Gorman
2021-09-15 12:20         ` Michal Hocko
2021-09-15 14:35         ` Mel Gorman
2021-09-15 22:38           ` Dave Chinner
2021-09-16  9:00             ` Mel Gorman
2021-09-15  0:28   ` Theodore Ts'o
2021-09-15  5:25     ` NeilBrown
2021-09-15 17:02       ` Theodore Ts'o
2021-09-14  0:13 ` [PATCH 1/6] MM: improve documentation for __GFP_NOFAIL NeilBrown
2021-09-15 11:51   ` Michal Hocko
2021-09-14  0:13 ` [PATCH 6/6] XFS: remove congestion_wait() loop from xfs_buf_alloc_pages() NeilBrown
2021-09-14  2:08   ` Dave Chinner
2021-09-14  2:35     ` NeilBrown
2021-09-14  5:33       ` Dave Chinner
2021-09-14 16:45       ` Mel Gorman
2021-09-14 21:13         ` NeilBrown
2021-09-14  0:13 ` [PATCH 4/6] EXT4: remove congestion_wait from ext4_bio_write_page, and simplify NeilBrown
2021-09-17  2:56 [PATCH 0/6 v2] congestion_wait() and GFP_NOFAIL NeilBrown
2021-09-17  2:56 ` [PATCH 3/6] EXT4: Remove ENOMEM/congestion_wait() loops NeilBrown

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YUHh2ddnJEDGI8YG@dhcp22.suse.cz \
    --to=mhocko@suse.com \
    --cc=adilger.kernel@dilger.ca \
    --cc=akpm@linux-foundation.org \
    --cc=djwong@kernel.org \
    --cc=jack@suse.cz \
    --cc=linux-ext4@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-nfs@vger.kernel.org \
    --cc=linux-xfs@vger.kernel.org \
    --cc=mgorman@suse.de \
    --cc=neilb@suse.de \
    --cc=tytso@mit.edu \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.