From: Michal Hocko <mhocko@suse.com>
To: Dave Chinner <david@fromorbit.com>
Cc: NeilBrown <neilb@suse.de>, Vlastimil Babka <vbabka@suse.cz>,
Andrew Morton <akpm@linux-foundation.org>,
Theodore Ts'o <tytso@mit.edu>,
Andreas Dilger <adilger.kernel@dilger.ca>,
"Darrick J. Wong" <djwong@kernel.org>,
Matthew Wilcox <willy@infradead.org>,
Mel Gorman <mgorman@suse.de>, Jonathan Corbet <corbet@lwn.net>,
linux-xfs@vger.kernel.org, linux-ext4@vger.kernel.org,
linux-fsdevel@vger.kernel.org, linux-nfs@vger.kernel.org,
linux-mm@kvack.org, linux-kernel@vger.kernel.org,
linux-doc@vger.kernel.org
Subject: Re: [PATCH 2/6] MM: improve documentation for __GFP_NOFAIL
Date: Wed, 13 Oct 2021 10:26:58 +0200 [thread overview]
Message-ID: <YWaYUsXgXS6GXM+M@dhcp22.suse.cz> (raw)
In-Reply-To: <20211013023231.GV2361455@dread.disaster.area>
On Wed 13-10-21 13:32:31, Dave Chinner wrote:
> On Mon, Oct 11, 2021 at 01:57:36PM +0200, Michal Hocko wrote:
> > On Sat 09-10-21 09:36:49, Dave Chinner wrote:
> > > On Fri, Oct 08, 2021 at 09:48:39AM +0200, Michal Hocko wrote:
> > > > > > > Even the API constaints of kvmalloc() w.r.t. only doing the vmalloc
> > > > > > > fallback if the gfp context is GFP_KERNEL - we already do GFP_NOFS
> > > > > > > kvmalloc via memalloc_nofs_save/restore(), so this behavioural
> > > > > > > restriction w.r.t. gfp flags just makes no sense at all.
> > > > > >
> > > > > > GFP_NOFS (without using the scope API) has the same problem as NOFAIL in
> > > > > > the vmalloc. Hence it is not supported. If you use the scope API then
> > > > > > you can GFP_KERNEL for kvmalloc. This is clumsy but I am not sure how to
> > > > > > define these conditions in a more sensible way. Special case NOFS if the
> > > > > > scope api is in use? Why do you want an explicit NOFS then?
> > >
> > > Exactly my point - this is clumsy and a total mess. I'm not asking
> > > for an explicit GFP_NOFS, just pointing out that the documented
> > > restrictions that "vmalloc can only do GFP_KERNEL allocations" is
> > > completely wrong.
> > >
> > > vmalloc()
> > > {
> > > if (!(gfp_flags & __GFP_FS))
> > > memalloc_nofs_save();
> > > p = __vmalloc(gfp_flags | GFP_KERNEL)
> > > if (!(gfp_flags & __GFP_FS))
> > > memalloc_nofs_restore();
> > > }
> > >
> > > Yup, that's how simple it is to support GFP_NOFS support in
> > > vmalloc().
> >
> > Yes, this would work from the functionality POV but it defeats the
> > philosophy behind the scope API. Why would you even need this if the
> > scope was defined by the caller of the allocator?
>
> Who actually cares that vmalloc might be using the scoped API
> internally to implement GFP_NOFS or GFP_NOIO? Nobody at all.
> It is far more useful (and self documenting!) for one-off allocations
> to pass a GFP_NOFS flag than it is to use a scope API...
I would agree with you if the explicit GFP_NOFS usage was consistent
and actually justified in the majority cases. My experience tells me
otherwise though. Many filesystems use the flag just because that is
easier. That leads to a huge overuse of the flag that leads to practical
problems.
I was hoping that if we offer an API that would define problematic
reclaim recursion scopes then it would reduce the abuse. I haven't
expected this to happen overnight but it is few years and it seems
it will not happen soon either.
[...]
> > > It also points out that the scope API is highly deficient.
> > > We can do GFP_NOFS via the scope API, but we can't
> > > do anything else because *there is no scope API for other GFP
> > > flags*.
> > >
> > > Why don't we have a GFP_NOFAIL/__GFP_RETRY_FOREVER scope API?
> >
> > NO{FS,IO} where first flags to start this approach. And I have to admit
> > the experiment was much less successful then I hoped for. There are
> > still thousands of direct NOFS users so for some reason defining scopes
> > is not an easy thing to do.
> >
> > I am not against NOFAIL scopes in principle but seeing the nofs
> > "success" I am worried this will not go really well either and it is
> > much more tricky as NOFAIL has much stronger requirements than NOFS.
> > Just imagine how tricky this can be if you just call a library code
> > that is not under your control within a NOFAIL scope. What if that
> > library code decides to allocate (e.g. printk that would attempt to do
> > an optimistic NOWAIT allocation).
>
> I already asked you that _exact_ question earlier in the thread
> w.r.t. kvmalloc(GFP_NOFAIL) using optimistic NOWAIT kmalloc
> allocation. I asked you as a MM expert to define *and document* the
> behaviour that should result, not turn around and use the fact that
> it is undefined behaviour as a "this is too hard" excuse for not
> changing anything.
Dave, you have "thrown" a lot of complains in previous emails and it is
hard to tell rants from features requests apart. I am sorry but I
believe it would be much more productive to continue this discussion if
you could mild your tone.
Can I ask you to break down your feature requests into separate emails
so that we can discuss and track them separately rather in this quite a
long thread which has IMHO diverghed from the initial topic. Thanks!
> THe fact is that the scope APIs are only really useful for certain
> contexts where restrictions are set by higher level functionality.
> For one-off allocation constraints the API sucks and we end up with
Could you be more specific about these one-off allocation constrains?
What would be the reason to define one-off NO{FS,IO} allocation
constrain? Or did you have your NOFAIL example in mind?
> crap like this (found in btrfs):
>
> /*
> * We're holding a transaction handle, so use a NOFS memory
> * allocation context to avoid deadlock if reclaim happens.
> */
> nofs_flag = memalloc_nofs_save();
> value = kmalloc(size, GFP_KERNEL);
> memalloc_nofs_restore(nofs_flag);
Yes this looks wrong indeed! If I were to review such a code I would ask
why the scope cannot match the transaction handle context. IIRC jbd does
that.
I am aware of these patterns. I was pulled in some discussions in the
past and in some it turned out that the constrain is not needed at all
and in some cases that has led to a proper scope definition. As you
point out in your other examples it just happens that it is easier to
go an easy path and define scopes ad-hoc to work around allocation
API limitations.
[...]
> IOWs, a large number of the users of the scope API simply make
> [k]vmalloc() provide GFP_NOFS behaviour. ceph_kvmalloc() is pretty
> much a wrapper that indicates how all vmalloc functions should
> behave. Honour GFP_NOFS and GFP_NOIO by using the scope API
> internally.
I was discouraging from this behavior at vmalloc level to push people
to use scopes properly - aka at the level where the reclaim recursion is
really a problem. If that is infeasible in practice then we can
re-evaluate of course. I was really hoping we can get rid of cargo cult
GFP_NOFS usage this way but the reality often disagrees with hopes.
All that being said, let's discuss [k]vmalloc constrains and usecases
that need changes in a separate email thread.
Thanks!
--
Michal Hocko
SUSE Labs
next prev parent reply other threads:[~2021-10-13 8:27 UTC|newest]
Thread overview: 29+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-09-17 2:56 [PATCH 0/6 v2] congestion_wait() and GFP_NOFAIL NeilBrown
2021-09-17 2:56 ` [PATCH 6/6] XFS: remove congestion_wait() loop from xfs_buf_alloc_pages() NeilBrown
2021-09-17 2:56 ` [PATCH 5/6] XFS: remove congestion_wait() loop from kmem_alloc() NeilBrown
2021-09-17 21:45 ` Dave Chinner
2021-09-17 2:56 ` [PATCH 1/6] MM: Support __GFP_NOFAIL in alloc_pages_bulk_*() and improve doco NeilBrown
2021-09-17 14:42 ` Mel Gorman
2021-09-20 23:48 ` NeilBrown
2021-10-05 9:16 ` Vlastimil Babka
2021-09-17 2:56 ` [PATCH 3/6] EXT4: Remove ENOMEM/congestion_wait() loops NeilBrown
2021-09-17 2:56 ` [PATCH 4/6] EXT4: remove congestion_wait from ext4_bio_write_page, and simplify NeilBrown
2021-09-17 2:56 ` [PATCH 2/6] MM: improve documentation for __GFP_NOFAIL NeilBrown
2021-10-05 9:20 ` Vlastimil Babka
2021-10-05 11:09 ` Michal Hocko
2021-10-05 12:27 ` Vlastimil Babka
2021-10-06 23:14 ` Dave Chinner
2021-10-07 10:07 ` Michal Hocko
2021-10-07 23:15 ` NeilBrown
2021-10-08 7:48 ` Michal Hocko
2021-10-08 22:36 ` Dave Chinner
2021-10-11 11:57 ` Michal Hocko
2021-10-11 21:49 ` NeilBrown
2021-10-18 10:23 ` Michal Hocko
2021-10-19 4:32 ` NeilBrown
2021-10-19 13:59 ` Michal Hocko
2021-10-22 0:09 ` NeilBrown
2021-10-13 2:32 ` Dave Chinner
2021-10-13 8:26 ` Michal Hocko [this message]
2021-10-14 11:32 ` David Sterba
2021-10-14 11:46 ` Michal Hocko
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=YWaYUsXgXS6GXM+M@dhcp22.suse.cz \
--to=mhocko@suse.com \
--cc=adilger.kernel@dilger.ca \
--cc=akpm@linux-foundation.org \
--cc=corbet@lwn.net \
--cc=david@fromorbit.com \
--cc=djwong@kernel.org \
--cc=linux-doc@vger.kernel.org \
--cc=linux-ext4@vger.kernel.org \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=linux-nfs@vger.kernel.org \
--cc=linux-xfs@vger.kernel.org \
--cc=mgorman@suse.de \
--cc=neilb@suse.de \
--cc=tytso@mit.edu \
--cc=vbabka@suse.cz \
--cc=willy@infradead.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).