linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Dave Chinner <david@fromorbit.com>
To: Vlastimil Babka <vbabka@suse.cz>
Cc: Michal Hocko <mhocko@suse.com>, NeilBrown <neilb@suse.de>,
	Andrew Morton <akpm@linux-foundation.org>,
	Theodore Ts'o <tytso@mit.edu>,
	Andreas Dilger <adilger.kernel@dilger.ca>,
	"Darrick J. Wong" <djwong@kernel.org>,
	Matthew Wilcox <willy@infradead.org>,
	Mel Gorman <mgorman@suse.de>, Jonathan Corbet <corbet@lwn.net>,
	linux-xfs@vger.kernel.org, linux-ext4@vger.kernel.org,
	linux-fsdevel@vger.kernel.org, linux-nfs@vger.kernel.org,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	linux-doc@vger.kernel.org
Subject: Re: [PATCH 2/6] MM: improve documentation for __GFP_NOFAIL
Date: Thu, 7 Oct 2021 10:14:52 +1100	[thread overview]
Message-ID: <20211006231452.GF54211@dread.disaster.area> (raw)
In-Reply-To: <eba04a07-99da-771a-ab6b-36de41f9f120@suse.cz>

On Tue, Oct 05, 2021 at 02:27:45PM +0200, Vlastimil Babka wrote:
> On 10/5/21 13:09, Michal Hocko wrote:
> > On Tue 05-10-21 11:20:51, Vlastimil Babka wrote:
> > [...]
> >> > --- a/include/linux/gfp.h
> >> > +++ b/include/linux/gfp.h
> >> > @@ -209,7 +209,11 @@ struct vm_area_struct;
> >> >   * used only when there is no reasonable failure policy) but it is
> >> >   * definitely preferable to use the flag rather than opencode endless
> >> >   * loop around allocator.
> >> > - * Using this flag for costly allocations is _highly_ discouraged.
> >> > + * Use of this flag may lead to deadlocks if locks are held which would
> >> > + * be needed for memory reclaim, write-back, or the timely exit of a
> >> > + * process killed by the OOM-killer.  Dropping any locks not absolutely
> >> > + * needed is advisable before requesting a %__GFP_NOFAIL allocate.
> >> > + * Using this flag for costly allocations (order>1) is _highly_ discouraged.
> >> 
> >> We define costly as 3, not 1. But sure it's best to avoid even order>0 for
> >> __GFP_NOFAIL. Advising order>1 seems arbitrary though?
> > 
> > This is not completely arbitrary. We have a warning for any higher order
> > allocation.
> > rmqueue:
> > 	WARN_ON_ONCE((gfp_flags & __GFP_NOFAIL) && (order > 1));
> 
> Oh, I missed that.
> 
> > I do agree that "Using this flag for higher order allocations is
> > _highly_ discouraged.
> 
> Well, with the warning in place this is effectively forbidden, not just
> discouraged.

Yup, especially as it doesn't obey __GFP_NOWARN.

See commit de2860f46362 ("mm: Add kvrealloc()") as a direct result
of unwittingly tripping over this warning when adding __GFP_NOFAIL
annotations to replace open coded high-order kmalloc loops that have
been in place for a couple of decades without issues.

Personally I think that the way __GFP_NOFAIL is first of all
recommended over open coded loops and then only later found to be
effectively forbidden and needing to be replaced with open coded
loops to be a complete mess.

Not to mention on the impossibility of using __GFP_NOFAIL with
kvmalloc() calls. Just what do we expect kmalloc_node(__GFP_NORETRY
| __GFP_NOFAIL) to do, exactly?

So, effectively, we have to open-code around kvmalloc() in
situations where failure is not an option. Even if we pass
__GFP_NOFAIL to __vmalloc(), it isn't guaranteed to succeed because
of the "we won't honor gfp flags passed to __vmalloc" semantics it
has.

Even the API constaints of kvmalloc() w.r.t. only doing the vmalloc
fallback if the gfp context is GFP_KERNEL - we already do GFP_NOFS
kvmalloc via memalloc_nofs_save/restore(), so this behavioural
restriction w.r.t. gfp flags just makes no sense at all.

That leads to us having to go back to writing extremely custom open
coded loops to avoid awful high-order kmalloc direct reclaim
behaviour and still fall back to vmalloc and to still handle NOFAIL
semantics we need:

https://lore.kernel.org/linux-xfs/20210902095927.911100-8-david@fromorbit.com/

So, really, the problems are much deeper here than just badly
documented, catch-22 rules for __GFP_NOFAIL - we can't even use
__GFP_NOFAIL consistently across the allocation APIs because it
changes allocation behaviours in unusable, self-defeating ways....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

  reply	other threads:[~2021-10-06 23:15 UTC|newest]

Thread overview: 29+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-09-17  2:56 [PATCH 0/6 v2] congestion_wait() and GFP_NOFAIL NeilBrown
2021-09-17  2:56 ` [PATCH 6/6] XFS: remove congestion_wait() loop from xfs_buf_alloc_pages() NeilBrown
2021-09-17  2:56 ` [PATCH 5/6] XFS: remove congestion_wait() loop from kmem_alloc() NeilBrown
2021-09-17 21:45   ` Dave Chinner
2021-09-17  2:56 ` [PATCH 1/6] MM: Support __GFP_NOFAIL in alloc_pages_bulk_*() and improve doco NeilBrown
2021-09-17 14:42   ` Mel Gorman
2021-09-20 23:48     ` NeilBrown
2021-10-05  9:16     ` Vlastimil Babka
2021-09-17  2:56 ` [PATCH 3/6] EXT4: Remove ENOMEM/congestion_wait() loops NeilBrown
2021-09-17  2:56 ` [PATCH 4/6] EXT4: remove congestion_wait from ext4_bio_write_page, and simplify NeilBrown
2021-09-17  2:56 ` [PATCH 2/6] MM: improve documentation for __GFP_NOFAIL NeilBrown
2021-10-05  9:20   ` Vlastimil Babka
2021-10-05 11:09     ` Michal Hocko
2021-10-05 12:27       ` Vlastimil Babka
2021-10-06 23:14         ` Dave Chinner [this message]
2021-10-07 10:07           ` Michal Hocko
2021-10-07 23:15             ` NeilBrown
2021-10-08  7:48               ` Michal Hocko
2021-10-08 22:36                 ` Dave Chinner
2021-10-11 11:57                   ` Michal Hocko
2021-10-11 21:49                     ` NeilBrown
2021-10-18 10:23                       ` Michal Hocko
2021-10-19  4:32                         ` NeilBrown
2021-10-19 13:59                           ` Michal Hocko
2021-10-22  0:09                             ` NeilBrown
2021-10-13  2:32                     ` Dave Chinner
2021-10-13  8:26                       ` Michal Hocko
2021-10-14 11:32                         ` David Sterba
2021-10-14 11:46                           ` Michal Hocko

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20211006231452.GF54211@dread.disaster.area \
    --to=david@fromorbit.com \
    --cc=adilger.kernel@dilger.ca \
    --cc=akpm@linux-foundation.org \
    --cc=corbet@lwn.net \
    --cc=djwong@kernel.org \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-ext4@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-nfs@vger.kernel.org \
    --cc=linux-xfs@vger.kernel.org \
    --cc=mgorman@suse.de \
    --cc=mhocko@suse.com \
    --cc=neilb@suse.de \
    --cc=tytso@mit.edu \
    --cc=vbabka@suse.cz \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).