linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Mel Gorman <mgorman@techsingularity.net>
To: Uladzislau Rezki <urezki@gmail.com>
Cc: Michal Hocko <mhocko@suse.com>,
	"Paul E. McKenney" <paulmck@kernel.org>,
	LKML <linux-kernel@vger.kernel.org>, RCU <rcu@vger.kernel.org>,
	linux-mm@kvack.org, Andrew Morton <akpm@linux-foundation.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Vlastimil Babka <vbabka@suse.cz>,
	Thomas Gleixner <tglx@linutronix.de>,
	"Theodore Y . Ts'o" <tytso@mit.edu>,
	Joel Fernandes <joel@joelfernandes.org>,
	Sebastian Andrzej Siewior <bigeasy@linutronix.de>,
	Oleksiy Avramchenko <oleksiy.avramchenko@sonymobile.com>,
	Mel Gorman <mgorman@suse.de>
Subject: Re: [RFC-PATCH 2/4] mm: Add __rcu_alloc_page_lockless() func.
Date: Wed, 23 Sep 2020 11:37:06 +0100	[thread overview]
Message-ID: <20200923103706.GJ3179@techsingularity.net> (raw)
In-Reply-To: <20200922131257.GA29241@pc636>

On Tue, Sep 22, 2020 at 03:12:57PM +0200, Uladzislau Rezki wrote:
> > > > > Yes, I do well remember that you are unhappy with this approach.
> > > > > Unfortunately, thus far, there is no solution that makes all developers
> > > > > happy.  You might be glad to hear that we are also looking into other
> > > > > solutions, each of which makes some other developers unhappy.  So we
> > > > > are at least not picking on you alone.  :-/
> > > > 
> > > > No worries I do not feel like a whipping boy here. But do expect me to
> > > > argue against the approach. I would also appreciate it if there was some
> > > > more information on other attempts, why they have failed. E.g. why
> > > > pre-allocation is not an option that works well enough in most
> > > > reasonable workloads.
> > > Pre-allocating has some drawbacks:
> > > 
> > > a) It is impossible to predict how many pages will be required to
> > >    cover a demand that is controlled by different workloads on
> > >    various systems.
> > 
> > Yes, this is not trivial but not a rocket science either. Remember that
> > you are relying on a very dumb watermark based pcp pool from the
> > allocator.
> >
> We rely on it, indeed. If the pcp-cache is depleted our special work is
> triggered to charge our local cache(few pages) such way will also initiate
> the process of pre-featching pages from the buddy allocator populating
> the depleted pcp-cache. I do not have any concern here.
> 

It can interfere with ATOMIC allocations in critical paths in extreme
circumstances as it potentially puts increased pressure on the emergency
reserve as watermarks are bypassed. That adds to the risk of a functional
failuure if reclaim fails to make progress.  The number of pages are likely
to be limited and unpredictable. As it uses any PCP type, it potentially
causes fragmention issues. For the last point, the allocations may be
transient in the RCU case now but not guaranteed forever. As the API is
in gfp.h, it's open to abuse so the next guy that comes along and thinks
"I am critical no matter what the name says" will cause problems. While
you could argue that would be caught in review, plenty of GFP flag abuses
made it through review.

Fundamentally, this is simply shifting the problem from RCU to the page
allocator because of the locking arrangements and hazard of acquiring zone
lock is a raw spinlock is held on RT. It does not even make the timing
predictable as an empty PCU list (for example, a full drain in low memory
situations) may mean the emergency path is hit anyway. About all it changes
is the timing of when the emergency path is hit in some circumstances --
it's not fixing the problem, it's simply changing the shape.

> >
> > Mimicing a similar implementation shouldn't be all that hard
> > and you will get your own pool which doesn't affect other page allocator
> > users as much as a bonus.
> > 
> I see your point Michal. As i mentioned before, it is important to avoid of
> having such own pools, because the aim is not to waste memory resources. A
> page will be returned back to "page allocator" as soon as a scheduler place  
> our reclaim thread on a CPU and grace period is passed. So, the resource
> can be used for other needs. What is important.
> 

As the emergency path and synchronising can be hit no matter what, why
not increase the pool temporarily after the emergency path is hit and
shrink it again later if necessary?

> Otherwise a memory footprint is increased what is bad for low memory
> conditions when OOM is involved.

OOM would only be a major factor if the size of the pools meant the
machine could not even operate or at least was severely degraded. However,
depleting the PCPU lists for RCU may slow kswapd making reclaim progress
and cause an OOM in itself, or at least an intervention by a userspace
monitor that kills non-critical applications in the background when memory
pressure exists.

> > > As for memory overhead, it is important to reduce it because of
> > > embedded devices like phones, where a low memory condition is a
> > > big issue. In that sense pre-allocating is something that we strongly
> > > would like to avoid.
> > 
> > How big "machines" are we talking about here? I would expect that really
> > tiny machines would have hard times to really fill up thousands of pages
> > with pointers to free...
> > 
> I mentioned above. We can not rely on static model. We would like to
> have a mechanism that gives back ASAP used pages to page allocator
> for other needs.
> 

After an emergency, temporarily increase the size of the pool to avoid
hitting the emergency path again in the near future.

> >
> > Would a similar scaling as the page allocator feasible. Really I mostly
> > do care about shared nature of the pcp allocator list that one user can
> > easily monopolize with this API.
> > 
> I see your concern. pcplist can be monopolized by already existing API:
> 
>     while (i < 100)
>         __get_free_page(GFP_NOWAIT | __GFP_NOWARN);
> 

That's not the same class of abuse as it can go to the buddy lists to
refill the correct PCP lists, avoid fragmentation issues, obeys watermarks
and wakes kswapd if it's not awake already.

-- 
Mel Gorman
SUSE Labs


  parent reply	other threads:[~2020-09-23 10:37 UTC|newest]

Thread overview: 76+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-09-18 19:48 [PATCH 0/4] kvfree_rcu() and _LOCK_NESTING/_PREEMPT_RT Uladzislau Rezki (Sony)
2020-09-18 19:48 ` [PATCH 1/4] rcu/tree: Add a work to allocate pages from regular context Uladzislau Rezki (Sony)
2020-09-18 19:48 ` [RFC-PATCH 2/4] mm: Add __rcu_alloc_page_lockless() func Uladzislau Rezki (Sony)
2020-09-21  7:47   ` Michal Hocko
2020-09-21 15:45     ` Paul E. McKenney
2020-09-21 16:03       ` Michal Hocko
2020-09-21 19:48         ` Uladzislau Rezki
2020-09-22  7:50           ` Michal Hocko
2020-09-22 13:12             ` Uladzislau Rezki
2020-09-22 15:35               ` Michal Hocko
2020-09-23 10:37               ` Mel Gorman [this message]
2020-09-23 15:41                 ` Paul E. McKenney
2020-09-23 23:22                   ` Mel Gorman
2020-09-24  8:16                     ` Uladzislau Rezki
2020-09-24 11:16                       ` Peter Zijlstra
2020-09-24 15:16                         ` Uladzislau Rezki
2020-09-24 11:19                       ` Peter Zijlstra
2020-09-24 15:21                         ` Uladzislau Rezki
2020-09-25  8:15                           ` Peter Zijlstra
2020-09-25 10:25                             ` Uladzislau Rezki
2020-09-24 15:38                         ` Paul E. McKenney
2020-09-25  8:26                           ` Peter Zijlstra
2020-09-26 14:37                             ` Paul E. McKenney
2020-09-25  8:05                       ` Michal Hocko
2020-09-25 15:31                         ` Uladzislau Rezki
2020-09-25 15:47                           ` Michal Hocko
2020-09-29 16:25                             ` Uladzislau Rezki
2020-09-30  9:27                               ` Michal Hocko
2020-09-30 12:35                                 ` Uladzislau Rezki
2020-09-30 12:44                                   ` Michal Hocko
2020-09-30 13:39                                     ` Uladzislau Rezki
2020-09-30 16:46                                       ` Michal Hocko
2020-09-30 20:36                                         ` Uladzislau Rezki
2020-09-30 15:25                             ` Joel Fernandes
2020-09-30 16:48                               ` Michal Hocko
2020-09-30 17:03                                 ` Joel Fernandes
2020-09-30 17:22                                   ` Michal Hocko
2020-09-30 17:48                                     ` Joel Fernandes
2020-09-25 16:17                           ` Mel Gorman
2020-09-25 17:57                             ` Uladzislau Rezki
2020-09-22 15:49             ` Paul E. McKenney
2020-09-22  3:35         ` Paul E. McKenney
2020-09-22  8:03           ` Michal Hocko
2020-09-22 15:46             ` Paul E. McKenney
2020-09-23 11:27               ` Uladzislau Rezki
2020-09-29 10:15   ` Vlastimil Babka
2020-09-29 22:07     ` Uladzislau Rezki
2020-09-30 10:35       ` Michal Hocko
2020-10-01 19:32         ` Uladzislau Rezki
2020-09-30 14:39       ` Vlastimil Babka
2020-09-30 15:37         ` Joel Fernandes
2020-10-01 19:26         ` Uladzislau Rezki
2020-10-02  7:11           ` Michal Hocko
2020-10-02  8:50             ` Mel Gorman
2020-10-02  9:05               ` Michal Hocko
2020-10-05 15:08                 ` Uladzislau Rezki
2020-10-05 15:41                   ` Michal Hocko
2020-10-06 22:25                     ` Uladzislau Rezki
2020-10-07 10:02                       ` Michal Hocko
2020-10-07 11:02                         ` Uladzislau Rezki
2020-10-02  9:07               ` Peter Zijlstra
2020-10-02  9:45                 ` Mel Gorman
2020-10-02  9:58                   ` Peter Zijlstra
2020-10-02 10:19                     ` Mel Gorman
2020-10-02 14:41                       ` Paul E. McKenney
2020-10-06 10:03                         ` Mel Gorman
2020-10-06 15:41                           ` Paul E. McKenney
2020-10-05 13:58             ` Uladzislau Rezki
2020-10-02  8:06           ` Mel Gorman
2020-10-05 14:12             ` Uladzislau Rezki
2020-09-18 19:48 ` [PATCH 3/4] rcu/tree: use " Uladzislau Rezki (Sony)
2020-09-18 19:48 ` [PATCH 4/4] rcu/tree: Use schedule_delayed_work() instead of WQ_HIGHPRI queue Uladzislau Rezki (Sony)
2020-09-20 15:06   ` Paul E. McKenney
2020-09-21 13:27     ` Uladzislau Rezki
2020-09-18 22:15 ` [PATCH 0/4] kvfree_rcu() and _LOCK_NESTING/_PREEMPT_RT Paul E. McKenney
2020-09-30 15:52 ` Joel Fernandes

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200923103706.GJ3179@techsingularity.net \
    --to=mgorman@techsingularity.net \
    --cc=akpm@linux-foundation.org \
    --cc=bigeasy@linutronix.de \
    --cc=joel@joelfernandes.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mgorman@suse.de \
    --cc=mhocko@suse.com \
    --cc=oleksiy.avramchenko@sonymobile.com \
    --cc=paulmck@kernel.org \
    --cc=peterz@infradead.org \
    --cc=rcu@vger.kernel.org \
    --cc=tglx@linutronix.de \
    --cc=tytso@mit.edu \
    --cc=urezki@gmail.com \
    --cc=vbabka@suse.cz \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).