* [PATCH RFC 0/5] mm/ksm, proc: introduce remote madvise
@ 2019-05-16  9:42 Oleksandr Natalenko
From: Oleksandr Natalenko @ 2019-05-16  9:42 UTC
  To: linux-kernel
  Cc: Kirill Tkhai, Hugh Dickins, Alexey Dobriyan, Vlastimil Babka,
	Michal Hocko, Matthew Wilcox, Pavel Tatashin, Greg KH,
	Suren Baghdasaryan, Minchan Kim, Timofey Titovets, Aaron Tomlin,
	Grzegorz Halat, linux-mm, linux-api

It all began with the fact that KSM works only on memory that is marked
by madvise(). And the only way to get around that is to either:

  * use LD_PRELOAD; or
  * patch the kernel with something like UKSM or PKSM.

(I intentionally skip the ptrace can of worms here.)

To overcome this restriction, let's implement a per-process /proc knob
that allows calling madvise remotely. It can be used manually on the
task in question, or by a small userspace helper daemon that does the
auto-KSM job for us.
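
As a rough illustration of what such a helper daemon could do, here is a
minimal sketch of a single sweep. It assumes the knob ends up at
/proc/<pid>/madvise and accepts "merge", as described later in this
letter. The names ksm_sweep and PROC_ROOT are hypothetical and exist only
for this example; matching tasks by their comm name is just one possible
selection policy.

```shell
# Hypothetical override so the sweep can be tried against a scratch tree.
PROC_ROOT="${PROC_ROOT:-/proc}"

# ksm_sweep NAME: write "merge" to the proposed knob of every task whose
# comm equals NAME, then print how many tasks were marked.
ksm_sweep() {
    name="$1"
    count=0
    for dir in "$PROC_ROOT"/[0-9]*; do
        # Skip tasks we cannot inspect, or kernels without the knob.
        [ -r "$dir/comm" ] || continue
        [ -w "$dir/madvise" ] || continue
        comm="$(cat "$dir/comm" 2>/dev/null)" || continue
        [ "$comm" = "$name" ] || continue
        # The task may exit between the checks and the write; ignore that.
        if { echo merge > "$dir/madvise"; } 2>/dev/null; then
            count=$((count + 1))
        fi
    done
    echo "$count"
}
```

A daemon would simply repeat the sweep, for instance with
"while :; do ksm_sweep firefox; sleep 60; done", so that newly started
tasks are picked up as well.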

Also, following the discussions from the previous submissions [2] and
[3], make the interface more generic, so that it can be used for other
madvise hints in the future. At this point, I'd like Android people to
speak up, for instance, and clarify in which form they need page
granularity or other things I've missed or have never heard about.

I see three major consumers of this interface:

  * hosts that run containers, especially similar ones and especially in
    a trusted environment, sharing the same runtime like Node.js;

  * heavy applications that can be run in multiple instances, not
    limited to open-source ones like Firefox, but also those that cannot
    be modified since they are binary-only and, maybe, statically linked;

  * Android environments that want to do tricks with
    MADV_WILLNEED/DONTNEED or something similar.

On to the actual implementation. The per-process knob is named "madvise",
and it is write-only. It accepts a madvise hint name to be executed.
Currently, only KSM hints are implemented:

* to mark all the eligible VMAs as mergeable, use:

   # echo merge > /proc/<pid>/madvise

* to unmerge all the VMAs, use:

   # echo unmerge > /proc/<pid>/madvise
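
For scripted use, the two commands above can be wrapped in a small
function that rejects hints the knob does not implement. This is only a
sketch under the assumptions of this RFC (knob path and hint names as
above). remote_madvise and PROC_ROOT are hypothetical names introduced
for the example and are not part of the series.

```shell
# Hypothetical override so the function can be tried against a scratch tree.
PROC_ROOT="${PROC_ROOT:-/proc}"

# remote_madvise PID HINT: write HINT ("merge" or "unmerge") to the
# proposed per-process knob of PID.
remote_madvise() {
    pid="$1"
    hint="$2"
    case "$hint" in
        merge|unmerge) ;;
        *)
            echo "remote_madvise: unsupported hint: $hint" >&2
            return 2
            ;;
    esac
    knob="$PROC_ROOT/$pid/madvise"
    if [ ! -e "$knob" ]; then
        # Either the task is gone or the kernel lacks this series.
        echo "remote_madvise: $knob not found" >&2
        return 1
    fi
    printf '%s\n' "$hint" > "$knob"
}
```

With that in place, "remote_madvise 1234 merge" is equivalent to the
first echo command above for PID 1234.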

I've intentionally implemented address-space-level granularity instead
of VMA/page granularity, for simplicity. If the discussion goes in
another direction, this can be re-implemented to act on a specific VMA
(via map_files?) or page-wise.

Speaking of statistics, more numbers can be found in the very first
submission related to this one [1]. For my current setup with two
Firefox instances, I get 100 to 200 MiB saved for the second instance,
depending on the number of tabs.

1 FF instance with 15 tabs:

   $ echo "$(cat /sys/kernel/mm/ksm/pages_sharing) * 4 / 1024" | bc
   410

2 FF instances, second one has 12 tabs (all the tabs are different):

   $ echo "$(cat /sys/kernel/mm/ksm/pages_sharing) * 4 / 1024" | bc
   592
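
The "* 4 / 1024" in the commands above assumes 4 KiB pages. A slightly
more portable sketch of the same arithmetic takes the page size from
getconf instead. ksm_saved_mib and KSM_SYSFS are hypothetical names used
only for this example; both arguments are optional and default to the
live values.

```shell
# Hypothetical override for the KSM sysfs directory.
KSM_SYSFS="${KSM_SYSFS:-/sys/kernel/mm/ksm}"

# ksm_saved_mib [PAGES] [PAGE_SIZE]: convert a pages_sharing count into
# MiB (integer division), using the system page size unless one is given.
ksm_saved_mib() {
    pages="${1:-$(cat "$KSM_SYSFS/pages_sharing")}"
    page_size="${2:-$(getconf PAGESIZE)}"
    echo $(( pages * page_size / 1048576 ))
}
```

For example, "ksm_saved_mib 104960 4096" prints 410, which corresponds
to the single-instance figure above, and the difference between the two
measurements (592 - 410 = 182 MiB) is the saving attributed to the
second instance.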

At the moment I do not have specific numbers for containerised
workloads, but those should be comparable when the containers share a
similar or identical runtime.

The history of this patchset:

  * [2] was based on Timofey's submission [1], but it didn't use a
    dedicated kthread to walk through the list of tasks/VMAs. Instead,
    do_anonymous_page() was amended to implement a fully automatic mode.
    That approach was incorrect due to improper locking, and undesirable
    due to excessive complexity and being KSM-specific;
  * [3] implemented KSM-specific madvise hints via sysfs, leaving
    traversal of /proc to userspace if needed. That approach was
    undesirable because sysfs shouldn't implement any per-process API.
    Also, the interface was not generic enough to be extended for
    other users.

I've dropped all the "Reviewed-by" tags from previous submissions
because of code changes and because the objective of this series is now
somewhat different.

Please comment!

Thanks.

[1] https://lore.kernel.org/patchwork/patch/1012142/
[2] http://lkml.iu.edu/hypermail/linux/kernel/1905.1/02417.html
[3] http://lkml.iu.edu/hypermail/linux/kernel/1905.1/05076.html

Oleksandr Natalenko (5):
  proc: introduce madvise placeholder
  mm/ksm: introduce ksm_madvise_merge() helper
  mm/ksm: introduce ksm_madvise_unmerge() helper
  mm/ksm, proc: introduce remote merge
  mm/ksm, proc: add remote madvise documentation

 Documentation/filesystems/proc.txt | 13 +++++
 fs/proc/base.c                     | 70 +++++++++++++++++++++++
 include/linux/ksm.h                |  4 ++
 mm/ksm.c                           | 92 +++++++++++++++++++-----------
 4 files changed, 145 insertions(+), 34 deletions(-)

-- 
2.21.0

