linux-kernel.vger.kernel.org archive mirror
From: Oleksandr Natalenko <oleksandr@redhat.com>
To: Pavel Machek <pavel@ucw.cz>
Cc: Minchan Kim <minchan@kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-mm <linux-mm@kvack.org>,
	LKML <linux-kernel@vger.kernel.org>,
	linux-api@vger.kernel.org, Michal Hocko <mhocko@suse.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Tim Murray <timmurray@google.com>,
	Joel Fernandes <joel@joelfernandes.org>,
	Suren Baghdasaryan <surenb@google.com>,
	Daniel Colascione <dancol@google.com>,
	Shakeel Butt <shakeelb@google.com>,
	Sonny Rao <sonnyrao@google.com>,
	Brian Geffon <bgeffon@google.com>,
	jannh@google.com, oleg@redhat.com, christian@brauner.io,
	hdanton@sina.com, lizeb@google.com
Subject: Re: [PATCH v2 0/5] Introduce MADV_COLD and MADV_PAGEOUT
Date: Wed, 12 Jun 2019 13:19:20 +0200
Message-ID: <20190612111920.evedpmre63ivnxkz@butterfly.localdomain>
In-Reply-To: <20190612105945.GA16442@amd>

On Wed, Jun 12, 2019 at 12:59:45PM +0200, Pavel Machek wrote:
> > - Problem
> > 
> > Naturally, cached apps were dominant consumers of memory on the system.
> > However, they were not significant consumers of swap even though they are
> > good candidates for swap. Under investigation, swapping out only begins
> > once the low zone watermark is hit and kswapd wakes up, but the overall
> > allocation rate in the system might trip lmkd thresholds and cause a cached
> > process to be killed (we measured performance swapping out vs. zapping the
> > memory by killing a process. Unsurprisingly, zapping is 10x faster
> > even though we use zram which is much faster than real storage) so kill
> > from lmkd will often satisfy the high zone watermark, resulting in very
> > few pages actually being moved to swap.
> 
> Is it still faster to swap-in the application than to restart it?

It's the same type of question I was addressing earlier in the remote
KSM discussion: whether to make applications aware of all the memory
management details or to delegate the decision to some supervising task.

In this case, we cannot rewrite every application to handle an imaginary
SIGRESTART (or whatever you invent to handle restarts gracefully). Handling
SIGTERM may require more memory to finish things up without losing your data
(and I guess you don't want to lose your data, right?), and SIGKILL is
outright destructive.

Offloading proactive memory management to a process that knows how to do
it makes it possible to handle not only throwaway containers/microservices,
but also the usual desktop/mobile workflows.

> > This approach is similar in spirit to madvise(MADV_DONTNEED), but the
> > information required to make the reclaim decision is not known to the app.
> > Instead, it is known to a centralized userspace daemon, and that daemon
> > must be able to initiate reclaim on its own without any app involvement.
> > To solve the concern, this patch introduces a new syscall:
> > 
> >     struct pr_madvise_param {
> >             int size;               /* the size of this structure */
> >             int cookie;             /* reserved to support atomicity */
> >             int nr_elem;            /* count of below array fields */
> >             int __user *hints;      /* hints for each range */
> >             /* to store result of each operation */
> >             const struct iovec __user *results;
> >             /* input address ranges */
> >             const struct iovec __user *ranges;
> >     };
> >     
> >     int process_madvise(int pidfd, struct pr_madvise_param *u_param,
> >                             unsigned long flags);
> 
> That's quite a complex interface.
> 
> Could we simply have feel_free_to_swap_out(int pid) syscall? :-).

I wonder how long we'll go on adding new syscalls each time we need
some amendment to existing interfaces. Yes, clone6(), I'm looking at
you :(.

In the case of process_madvise(), keep in mind that it is meant to cover not
only MADV_COLD but, potentially, other MADV_ flags as well. I can hardly
imagine we'll add one syscall per flag.
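
To illustrate the point, here is a rough userspace sketch of how a
supervising daemon might fill the parameter block quoted above so that a
single call carries different hints for different ranges. This is only my
reading of the proposed struct, not tested against the patches: the
SKETCH_MADV_* values are placeholders rather than the real uapi numbers,
sketch_fill_param() and sketch_demo() are hypothetical helpers, and the
syscall itself is deliberately not invoked.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/uio.h>

/* Placeholder hint values for illustration only; the real numbers are
 * whatever the patch series assigns in the uapi headers. */
enum { SKETCH_MADV_COLD = 1, SKETCH_MADV_PAGEOUT = 2 };

/* Userspace mirror of the struct quoted above; the kernel-only __user
 * annotations are dropped. */
struct pr_madvise_param {
	int size;                     /* the size of this structure */
	int cookie;                   /* reserved to support atomicity */
	int nr_elem;                  /* count of the array fields below */
	int *hints;                   /* one hint per range */
	const struct iovec *results;  /* per-range results, may be NULL here */
	const struct iovec *ranges;   /* input address ranges */
};

/* Fill *p for nr ranges, each with its own hint.
 * Returns 0 on success, -1 on invalid arguments. */
int sketch_fill_param(struct pr_madvise_param *p, int *hints,
		      const struct iovec *ranges,
		      const struct iovec *results, int nr)
{
	if (!p || !hints || !ranges || nr <= 0)
		return -1;

	p->size = (int)sizeof(*p);
	p->cookie = 0;
	p->nr_elem = nr;
	p->hints = hints;
	p->results = results;
	p->ranges = ranges;
	return 0;
}

/* Usage: one parameter block carrying two different hints. */
void sketch_demo(void)
{
	static char a[4096], b[8192];
	int hints[2] = { SKETCH_MADV_COLD, SKETCH_MADV_PAGEOUT };
	struct iovec ranges[2] = { { a, sizeof(a) }, { b, sizeof(b) } };
	struct pr_madvise_param p;

	assert(sketch_fill_param(&p, hints, ranges, NULL, 2) == 0);
	assert(p.nr_elem == 2 && p.hints[1] == SKETCH_MADV_PAGEOUT);
}
```

With a per-range hint vector like this, supporting another MADV_ flag
would mean a new hint value rather than a new syscall.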

-- 
  Best regards,
    Oleksandr Natalenko (post-factum)
    Senior Software Maintenance Engineer

Thread overview: 25+ messages
2019-06-10 11:12 [PATCH v2 0/5] Introduce MADV_COLD and MADV_PAGEOUT Minchan Kim
2019-06-10 11:12 ` [PATCH v2 1/5] mm: introduce MADV_COLD Minchan Kim
2019-06-19 12:56   ` Michal Hocko
2019-06-20  0:06     ` Minchan Kim
2019-06-20  7:08       ` Michal Hocko
2019-06-20  8:44         ` Minchan Kim
2019-06-10 11:12 ` [PATCH v2 2/5] mm: change PAGEREF_RECLAIM_CLEAN with PAGE_REFRECLAIM Minchan Kim
2019-06-19 13:09   ` Michal Hocko
2019-06-10 11:12 ` [PATCH v2 3/5] mm: account nr_isolated_xxx in [isolate|putback]_lru_page Minchan Kim
2019-06-10 11:12 ` [PATCH v2 4/5] mm: introduce MADV_PAGEOUT Minchan Kim
2019-06-19 13:24   ` Michal Hocko
2019-06-20  4:16     ` Minchan Kim
2019-06-20  7:04       ` Michal Hocko
2019-06-20  8:40         ` Minchan Kim
2019-06-20  9:22           ` Michal Hocko
2019-06-20 10:32             ` Minchan Kim
2019-06-20 10:55               ` Michal Hocko
2019-06-10 11:12 ` [PATCH v2 5/5] mm: factor out pmd young/dirty bit handling and THP split Minchan Kim
2019-06-10 18:03 ` [PATCH v2 0/5] Introduce MADV_COLD and MADV_PAGEOUT Dave Hansen
2019-06-13  4:51   ` Minchan Kim
2019-06-12 10:59 ` Pavel Machek
2019-06-12 11:19   ` Oleksandr Natalenko [this message]
2019-06-12 11:37     ` Pavel Machek
2019-06-19 12:27 ` Michal Hocko
2019-06-19 23:42   ` Minchan Kim
