All of lore.kernel.org
 help / color / mirror / Atom feed
From: Michal Hocko <mhocko@suse.com>
To: Uladzislau Rezki <urezki@gmail.com>
Cc: Mel Gorman <mgorman@techsingularity.net>,
	Vlastimil Babka <vbabka@suse.cz>,
	LKML <linux-kernel@vger.kernel.org>, RCU <rcu@vger.kernel.org>,
	linux-mm@kvack.org, Andrew Morton <akpm@linux-foundation.org>,
	"Paul E . McKenney" <paulmck@kernel.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	"Theodore Y . Ts'o" <tytso@mit.edu>,
	Joel Fernandes <joel@joelfernandes.org>,
	Sebastian Andrzej Siewior <bigeasy@linutronix.de>,
	Oleksiy Avramchenko <oleksiy.avramchenko@sonymobile.com>
Subject: Re: [RFC-PATCH 2/4] mm: Add __rcu_alloc_page_lockless() func.
Date: Wed, 7 Oct 2020 12:02:34 +0200	[thread overview]
Message-ID: <20201007100234.GI29020@dhcp22.suse.cz> (raw)
In-Reply-To: <20201006222529.GA23612@pc636>

On Wed 07-10-20 00:25:29, Uladzislau Rezki wrote:
> On Mon, Oct 05, 2020 at 05:41:00PM +0200, Michal Hocko wrote:
> > On Mon 05-10-20 17:08:01, Uladzislau Rezki wrote:
> > > On Fri, Oct 02, 2020 at 11:05:07AM +0200, Michal Hocko wrote:
> > > > On Fri 02-10-20 09:50:14, Mel Gorman wrote:
> > > > > On Fri, Oct 02, 2020 at 09:11:23AM +0200, Michal Hocko wrote:
> > > > > > On Thu 01-10-20 21:26:26, Uladzislau Rezki wrote:
> > > > > > > > 
> > > > > > > > No, I meant going back to idea of new gfp flag, but adjust the implementation in
> > > > > > > > the allocator (different from what you posted in previous version) so that it
> > > > > > > > only looks at the flag after it tries to allocate from pcplist and finds out
> > > > > > > > it's empty. So, no inventing of new page allocator entry points or checks such
> > > > > > > > as the one you wrote above, but adding the new gfp flag in a way that it doesn't
> > > > > > > > affect existing fast paths.
> > > > > > > >
> > > > > > > OK. Now i see. Please have a look below at the patch, so we fully understand
> > > > > > > each other. If that is something that is close to your view or not:
> > > > > > > 
> > > > > > > <snip>
> > > > > > > t a/include/linux/gfp.h b/include/linux/gfp.h
> > > > > > > index c603237e006c..7e613560a502 100644
> > > > > > > --- a/include/linux/gfp.h
> > > > > > > +++ b/include/linux/gfp.h
> > > > > > > @@ -39,8 +39,9 @@ struct vm_area_struct;
> > > > > > >  #define ___GFP_HARDWALL                0x100000u
> > > > > > >  #define ___GFP_THISNODE                0x200000u
> > > > > > >  #define ___GFP_ACCOUNT         0x400000u
> > > > > > > +#define ___GFP_NO_LOCKS                0x800000u
> > > > > > 
> > > > > > Even if a new gfp flag gains a sufficient traction and support I am
> > > > > > _strongly_ opposed against consuming another flag for that. Bit space is
> > > > > > limited. 
> > > > > 
> > > > > That is definitely true. I'm not happy with the GFP flag at all, the
> > > > > comment is at best a damage limiting move. It still would be better for
> > > > > a memory pool to be reserved and sized for critical allocations.
> > > > 
> > > > Completely agreed. The only existing usecase is so special cased that a
> > > > dedicated pool is not only easier to maintain but it should be also much
> > > > better tuned for the specific workload. Something not really feasible
> > > > with the allocator.
> > > > 
> > > > > > Besides that we certainly do not want to allow craziness like
> > > > > > __GFP_NO_LOCK | __GFP_RECLAIM (and similar), do we?
> > > > > 
> > > > > That would deserve to be taken to a dumpster and set on fire. The flag
> > > > > combination could be checked in the allocator but the allocator path fast
> > > > > paths are bad enough already.
> > > > 
> > > > If a new allocation/gfp mode is absolutely necessary then I believe that
> > > > the most reasoanble way forward would be
> > > > #define GFP_NO_LOCK	((__force gfp_t)0)
> > > > 
> > > Agree. Even though i see that some code should be adjusted for it. There are
> > > a few users of the __get_free_page(0); So, need to double check it:
> > 
> > Yes, I believe I have pointed that out in the previous discussion.
> > 
> OK. I spent more time on it. A passed gfp_mask can be adjusted on the entry,
> that adjustment depends on the gfp_allowed_mask. It can be changed in run-time.
> 
> For example during early boot it excludes: __GFP_RECLAIM|__GFP_IO|__GFP_FS flags,
> what is GFP_KERNEL. So, GFP_KERNEL is adjusted on entry and becomes 0 during early
> boot phase.

Honestly I am not sure how much is GFP_BOOT_MASK still needed. The
remaining user of gfp_allowed_mask mask should be only hibernation and I
believe this should be removed in long term. Not as trivial because
scope API cannot be used for that as it needs a global flag but this is
a gross hack that should be implemented differently. It is waiting on my
todo list but never got around to that.

> How to distinguish it:
> 
> <snip>
> +       /*
> +        * gfp_mask can become zero because gfp_allowed_mask changes in run-time.
> +        */
> +       if (!gfp_mask)
> +               alloc_flags |= ALLOC_NO_LOCKS;
> +
>         gfp_mask &= gfp_allowed_mask;
>         alloc_mask = gfp_mask;
>         if (!prepare_alloc_pages(gfp_mask, order, preferred_nid, nodemask, &ac, &alloc_mask, &alloc_flags))
> <snip>
> 
> > > 
> > > Apart of that. There is a post_alloc_hook(), that gets called from the prep_new_page().
> > > If "debug page alloc enabled", it maps a page for debug purposes invoking kernel_map_pages().
> > > __kernel_map_pages() is ARCH specific. For example, powerpc variant uses sleep-able locks
> > > what can be easily converted to raw variant. 
> > 
> > Yes, there are likely more surprises like that. I am not sure about
> > kasan, page owner (which depens on the stack unwinder) and others which
> > hook into this path.
> >
> I have checked kasan_alloc_pages(), kernel_poison_pages() both are OK,
> at least i did not find any locking there. As for set_page_owner(), it
> requires more attention, since it uses arch specific unwind logic. Though,
> i spent some time on it and so far have not noticed anything.

stack depod depends on a lock IIRC. Anyway, this is just showing how
this is going to grow in complexity and make future additions harder.
A niche usecase is inducing an additional complexity for future
maintenance.

-- 
Michal Hocko
SUSE Labs

  reply	other threads:[~2020-10-07 10:02 UTC|newest]

Thread overview: 78+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-09-18 19:48 [PATCH 0/4] kvfree_rcu() and _LOCK_NESTING/_PREEMPT_RT Uladzislau Rezki (Sony)
2020-09-18 19:48 ` [PATCH 1/4] rcu/tree: Add a work to allocate pages from regular context Uladzislau Rezki (Sony)
2020-09-18 19:48 ` [RFC-PATCH 2/4] mm: Add __rcu_alloc_page_lockless() func Uladzislau Rezki (Sony)
2020-09-21  7:47   ` Michal Hocko
2020-09-21 15:45     ` Paul E. McKenney
2020-09-21 16:03       ` Michal Hocko
2020-09-21 19:48         ` Uladzislau Rezki
2020-09-22  7:50           ` Michal Hocko
2020-09-22 13:12             ` Uladzislau Rezki
2020-09-22 15:35               ` Michal Hocko
2020-09-23 10:37               ` Mel Gorman
2020-09-23 15:41                 ` Paul E. McKenney
2020-09-23 23:22                   ` Mel Gorman
2020-09-24  8:16                     ` Uladzislau Rezki
2020-09-24 11:16                       ` Peter Zijlstra
2020-09-24 15:16                         ` Uladzislau Rezki
2020-09-24 11:19                       ` Peter Zijlstra
2020-09-24 15:21                         ` Uladzislau Rezki
2020-09-25  8:15                           ` Peter Zijlstra
2020-09-25 10:25                             ` Uladzislau Rezki
2020-09-24 15:38                         ` Paul E. McKenney
2020-09-25  8:26                           ` Peter Zijlstra
2020-09-26 14:37                             ` Paul E. McKenney
2020-09-25  8:05                       ` Michal Hocko
2020-09-25 15:31                         ` Uladzislau Rezki
2020-09-25 15:47                           ` Michal Hocko
2020-09-29 16:25                             ` Uladzislau Rezki
2020-09-30  9:27                               ` Michal Hocko
2020-09-30 12:35                                 ` Uladzislau Rezki
2020-09-30 12:44                                   ` Michal Hocko
2020-09-30 13:39                                     ` Uladzislau Rezki
2020-09-30 16:46                                       ` Michal Hocko
2020-09-30 20:36                                         ` Uladzislau Rezki
2020-09-30 15:25                             ` Joel Fernandes
2020-09-30 16:48                               ` Michal Hocko
2020-09-30 17:03                                 ` Joel Fernandes
2020-09-30 17:03                                   ` Joel Fernandes
2020-09-30 17:22                                   ` Michal Hocko
2020-09-30 17:48                                     ` Joel Fernandes
2020-09-30 17:48                                       ` Joel Fernandes
2020-09-25 16:17                           ` Mel Gorman
2020-09-25 17:57                             ` Uladzislau Rezki
2020-09-22 15:49             ` Paul E. McKenney
2020-09-22  3:35         ` Paul E. McKenney
2020-09-22  8:03           ` Michal Hocko
2020-09-22 15:46             ` Paul E. McKenney
2020-09-23 11:27               ` Uladzislau Rezki
2020-09-29 10:15   ` Vlastimil Babka
2020-09-29 22:07     ` Uladzislau Rezki
2020-09-30 10:35       ` Michal Hocko
2020-10-01 19:32         ` Uladzislau Rezki
2020-09-30 14:39       ` Vlastimil Babka
2020-09-30 15:37         ` Joel Fernandes
2020-10-01 19:26         ` Uladzislau Rezki
2020-10-02  7:11           ` Michal Hocko
2020-10-02  8:50             ` Mel Gorman
2020-10-02  9:05               ` Michal Hocko
2020-10-05 15:08                 ` Uladzislau Rezki
2020-10-05 15:41                   ` Michal Hocko
2020-10-06 22:25                     ` Uladzislau Rezki
2020-10-07 10:02                       ` Michal Hocko [this message]
2020-10-07 11:02                         ` Uladzislau Rezki
2020-10-02  9:07               ` Peter Zijlstra
2020-10-02  9:45                 ` Mel Gorman
2020-10-02  9:58                   ` Peter Zijlstra
2020-10-02 10:19                     ` Mel Gorman
2020-10-02 14:41                       ` Paul E. McKenney
2020-10-06 10:03                         ` Mel Gorman
2020-10-06 15:41                           ` Paul E. McKenney
2020-10-05 13:58             ` Uladzislau Rezki
2020-10-02  8:06           ` Mel Gorman
2020-10-05 14:12             ` Uladzislau Rezki
2020-09-18 19:48 ` [PATCH 3/4] rcu/tree: use " Uladzislau Rezki (Sony)
2020-09-18 19:48 ` [PATCH 4/4] rcu/tree: Use schedule_delayed_work() instead of WQ_HIGHPRI queue Uladzislau Rezki (Sony)
2020-09-20 15:06   ` Paul E. McKenney
2020-09-21 13:27     ` Uladzislau Rezki
2020-09-18 22:15 ` [PATCH 0/4] kvfree_rcu() and _LOCK_NESTING/_PREEMPT_RT Paul E. McKenney
2020-09-30 15:52 ` Joel Fernandes

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20201007100234.GI29020@dhcp22.suse.cz \
    --to=mhocko@suse.com \
    --cc=akpm@linux-foundation.org \
    --cc=bigeasy@linutronix.de \
    --cc=joel@joelfernandes.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mgorman@techsingularity.net \
    --cc=oleksiy.avramchenko@sonymobile.com \
    --cc=paulmck@kernel.org \
    --cc=peterz@infradead.org \
    --cc=rcu@vger.kernel.org \
    --cc=tglx@linutronix.de \
    --cc=tytso@mit.edu \
    --cc=urezki@gmail.com \
    --cc=vbabka@suse.cz \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.