linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Shakeel Butt <shakeelb@google.com>
To: Michal Hocko <mhocko@kernel.org>
Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>,
	Linux MM <linux-mm@kvack.org>,
	LKML <linux-kernel@vger.kernel.org>,
	Vladimir Davydov <vdavydov.dev@gmail.com>,
	Johannes Weiner <hannes@cmpxchg.org>, Tejun Heo <tj@kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Mel Gorman <mgorman@techsingularity.net>,
	Roman Gushchin <guro@fb.com>,
	linux-api@vger.kernel.org
Subject: Re: [PATCH RFC] mm/madvise: implement MADV_STOCKPILE (kswapd from user space)
Date: Tue, 28 May 2019 07:56:00 -0700	[thread overview]
Message-ID: <CALvZod4fZeQiARaMrw8eaw=9Tynb4x4quZx13nen22EwoC5epQ@mail.gmail.com> (raw)
In-Reply-To: <20190528084243.GT1658@dhcp22.suse.cz>

On Tue, May 28, 2019 at 1:42 AM Michal Hocko <mhocko@kernel.org> wrote:
>
> On Tue 28-05-19 11:04:46, Konstantin Khlebnikov wrote:
> > On 28.05.2019 10:38, Michal Hocko wrote:
> [...]
> > > Could you define the exact semantic? Ideally something for the manual
> > > page please?
> > >
> >
> > Like kswapd which works with thresholds of free memory this one reclaims
> > until 'free' (i.e. memory which could be allocated without invoking
> > direct recliam of any kind) is lower than passed 'size' argument.
>
> s@lower@higher@ I guess
>
> > Thus right after madvise(NULL, size, MADV_STOCKPILE) 'size' bytes
> > could be allocated in this memory cgroup without extra latency from
> > reclaimer if there is no other memory consumers.
> >
> > Reclaimed memory is simply put into free lists in common buddy allocator,
> > there is no reserves for particular task or cgroup.
> >
> > If overall memory allocation rate is smooth without rough spikes then
> > calling MADV_STOCKPILE in loop periodically provides enough room for
> > allocations and eliminates direct reclaim from all other tasks.
> > As a result this eliminates unpredictable delays caused by
> > direct reclaim in random places.
>
> OK, this makes it more clear to me. Thanks for the clarification!
> I have clearly misunderstood and misinterpreted target as the reclaim
> target rather than free memory target.  Sorry about the confusion.
> I sill think that this looks like an abuse of the madvise but if there
> is a wider consensus this is acceptable I will not stand in the way.
>
>

I agree with Michal that madvise does not seem like a right API for
this use-case, a 'proactive reclaim'.

This is conflating memcg and global proactive reclaim. There are
use-cases which would prefer to have centralized control on the system
wide proactive reclaim because system level memory overcommit is
controlled by the admin. Decoupling global and per-memcg proactive
reclaim will allow mechanism to implement both use-cases (yours and
this one).

The madvise() is requiring that the proactive reclaim process should
be in the target memcg.  I think a memcg interface instead of madvise
is better as it will allow the job owner to control cpu resources of
the proactive reclaim. With madvise, the proactive reclaim has to
share cpu with the target sub-task of the job (or do some tricks with
the hierarchy).

The current implementation is polling-based. I think a reactive
approach based on some watermarks would be better. Polling may be fine
for servers but for power restricted devices, reactive approach is
preferable.

The current implementation is bypassing PSI for global reclaim.
However I am not sure how should PSI interact with proactive reclaim
in general.

thanks,
Shakeel

      parent reply	other threads:[~2019-05-28 14:56 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-05-27 10:05 [PATCH RFC] mm/madvise: implement MADV_STOCKPILE (kswapd from user space) Konstantin Khlebnikov
2019-05-27 14:12 ` Michal Hocko
2019-05-27 14:21   ` Michal Hocko
2019-05-27 14:30     ` Konstantin Khlebnikov
2019-05-27 14:39     ` Michal Hocko
2019-05-28  6:25       ` Konstantin Khlebnikov
2019-05-28  6:51         ` Michal Hocko
2019-05-28  7:30           ` Konstantin Khlebnikov
2019-05-28  7:38             ` Michal Hocko
2019-05-28  8:04               ` Konstantin Khlebnikov
2019-05-28  8:42                 ` Michal Hocko
2019-05-28  8:58                   ` Konstantin Khlebnikov
2019-05-28 14:56                   ` Shakeel Butt [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CALvZod4fZeQiARaMrw8eaw=9Tynb4x4quZx13nen22EwoC5epQ@mail.gmail.com' \
    --to=shakeelb@google.com \
    --cc=akpm@linux-foundation.org \
    --cc=guro@fb.com \
    --cc=hannes@cmpxchg.org \
    --cc=khlebnikov@yandex-team.ru \
    --cc=linux-api@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mgorman@techsingularity.net \
    --cc=mhocko@kernel.org \
    --cc=tj@kernel.org \
    --cc=vdavydov.dev@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).