From: SeongJae Park <sjpark@amazon.com>
To: <akpm@linux-foundation.org>
Cc: SeongJae Park <sjpark@amazon.de>, <Jonathan.Cameron@Huawei.com>,
	<aarcange@redhat.com>, <acme@kernel.org>,
	<alexander.shishkin@linux.intel.com>, <amit@kernel.org>,
	<benh@kernel.crashing.org>, <brendan.d.gregg@gmail.com>,
	<brendanhiggins@google.com>, <cai@lca.pw>,
	<colin.king@canonical.com>, <corbet@lwn.net>, <david@redhat.com>,
	<dwmw@amazon.com>, <elver@google.com>, <fan.du@intel.com>,
	<foersleo@amazon.de>, <gthelen@google.com>, <irogers@google.com>,
	<jolsa@redhat.com>, <kirill@shutemov.name>,
	<mark.rutland@arm.com>, <mgorman@suse.de>, <minchan@kernel.org>,
	<mingo@redhat.com>, <namhyung@kernel.org>, <peterz@infradead.org>,
	<rdunlap@infradead.org>, <riel@surriel.com>,
	<rientjes@google.com>, <rostedt@goodmis.org>, <rppt@kernel.org>,
	<sblbir@amazon.com>, <shakeelb@google.com>, <shuah@kernel.org>,
	<sj38.park@gmail.com>, <snu@amazon.de>, <vbabka@suse.cz>,
	<vdavydov.dev@gmail.com>, <yang.shi@linux.alibaba.com>,
	<ying.huang@intel.com>, <zgf574564920@gmail.com>,
	<linux-damon@amazon.com>, <linux-mm@kvack.org>,
	<linux-doc@vger.kernel.org>, <linux-kernel@vger.kernel.org>
Subject: Plans around DAMON: perf integration and a new page reclaim mechanism
Date: Wed, 2 Dec 2020 09:27:31 +0100
Message-ID: <20201202082731.24828-1-sjpark@amazon.com>

Hello,


This mail describes what DAMON is, what I am trying to do with it, where the
project is now, and what I will do next.  I hope to hear some comments that
help refine the plans.

What DAMON is
-------------

DAMON[1] is a scalable kernel framework for data access monitoring.  For
scalability, it keeps the monitoring overhead under an upper bound that users
can set, while providing best-effort accuracy.  Kernel programmers can hence
easily write various data access monitoring-based subsystems in the kernel
space using DAMON.  Some of those subsystems would export an interface to user
space so that users can also benefit from them.

[1] https://damonitor.github.io

What I am trying to do
----------------------

DAMON is actually a part of my project called Data Access-aware Operating
System (DAOS).  As the name implies, I want to improve the performance and
efficiency of systems using fine-grained data access patterns.  The
optimizations are for both kernel and user spaces.  We will therefore modify or
create kernel mechanisms, export some of those to user space, and implement
user space libraries and tools.  The figure below shows the layers and
components of the project.

---------------------------------------------------------------------------
Primitives:	PTE Accessed bit, PG_idle, rmap, (Intel CMT), ...
Framework:	DAMON
Features:	DAMOS, virtual addr, physical addr, ...
Applications:	DAMON-debugfs, (DARC), ...
^^^^^^^^^^^^^^^^^^^^^^^    KERNEL SPACE    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Raw Interface:	debugfs, (sysfs), (damonfs), tracepoints, (sys_damon), ...

vvvvvvvvvvvvvvvvvvvvvvv    USER SPACE      vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
Library:	(libdamon), ...
Tools:		DAMO, (perf), ...
---------------------------------------------------------------------------

The components in parentheses are not implemented yet but are in our future
plan.  IOW, those are the TODO tasks of the DAOS project.  DAMOS, DARC and DAMO
will be explained in the following sections.

Where the project is and how it arrived there
---------------------------------------------

The project was motivated by the increase of memory-intensive systems.  Working
set sizes are continuously growing, while the DRAM capacity of a single system
cannot keep pace.  Fortunately, new memory devices like NVRAM are evolving.
This trend triggered a number of data access pattern-aware system optimization
works.  Most of those works showed impressive results, but they share a common
problem: many of their access pattern extraction schemes are impractical or
incur high overhead.

Therefore, I started designing a way to extract the fine-grained information in
an efficient and scalable way.  It is named DAMON.  It has shown its low
overhead and accuracy in many environments, including realistic benchmarks[1]
and a real, huge production system[2].

To allow rough but effective re-implementation of the previous works on top of
DAMON without writing code, I implemented a feature called DAMON-based
Operation Schemes (DAMOS).  Using this, I implemented two well-known
access-aware memory management schemes (access-aware THP[3] and proactive
reclamation[4]) in 3 lines of configuration[1] and achieved an impressive
memory footprint reduction while preserving most of the performance.

The results were presented at several venues including KernelSummit'19[5],
MIDDLEWARE Industry'19[6], LWN[7], a Google internal event, and
KernelSummit'20[8].

The patches have been posted to LKML since January and have received many
reviews.  As of now, the 22nd version of the DAMON patchset[9], the 15th
version of the DAMOS patchset[10], and the 8th version of a patchset[11] for a
few more works are available.

[1]  https://damonitor.github.io/doc/html/next/vm/damon/eval.html
[2]  https://lore.kernel.org/linux-mm/20201117143021.11883-1-sjpark@amazon.com/
[3]  https://www.usenix.org/system/files/conference/osdi16/osdi16-kwon.pdf
[4]  https://research.google/pubs/pub48551/
[5]  https://linuxplumbersconf.org/event/4/contributions/548/
[6]  https://dl.acm.org/citation.cfm?id=3368125
[7]  https://lwn.net/Articles/812707/
[8]  https://www.linuxplumbersconf.org/event/7/contributions/659/
[9]  https://lore.kernel.org/linux-mm/20201020085940.13875-1-sjpark@amazon.com/
[10] https://lore.kernel.org/linux-mm/20201006123931.5847-1-sjpark@amazon.com/
[11] https://lore.kernel.org/linux-mm/20200831104730.28970-1-sjpark@amazon.com/

What I will do next
-------------------

In the long term, I will continue the work mentioned in the 'What I am trying
to do' section.  IOW, I will implement the parentheses-wrapped components in
the above figure.  In the short term, I'd like to start with the two things
below.

1. Integration of DAMON user space tool in perf

The DAMON patchset introduces a kernel space DAMON application called
damon-dbgfs as a static kernel module.  It exposes the DAMON interface to user
space via debugfs and provides a feature for recording monitoring results, so
that users can use DAMON as a profiler or as a data access-aware optimization
framework (using the DAMOS feature).  For easier use of the debugfs interface,
the patchset also introduces a user space tool named DAMON Operator (DAMO).  It
wraps the debugfs interface with a human-friendly interface and provides a few
useful features for visualizing monitoring results.

Since DAMON was presented, many people have asked whether it is integrated in
perf or can be controlled via perf.  As perf is a must-have tool for system
admins, integrating DAMON with perf would give a much better user experience.
For that reason, I want to integrate DAMO into perf as yet another subcommand.
For example, users would be able to use DAMON in the way below:

    # perf damon start $(pidof $my_workload)	/* Starts monitoring */
    # perf record -e damon:damon_aggregated	/* DAMON's tracepoint */
    # perf damon record $(pidof $my_workload)	/* shortcut for above two */
    # perf damon report

2. DAMON-based Page Reclamation

Page reclamation has been considered harmful, but the trend mentioned above in
the motivation part implies that the situation is changing.  The simplest yet
reasonable choice under this trend is to configure fast swap devices such as
NVRAM or zram.  Pseudo-LRU, the current page replacement algorithm of the
kernel, has worked well in many real-world production systems, but its overhead
will become more visible in systems that reclaim frequently.  I have also
observed this before[1].  After all, concerns about the algorithm have long
existed[2].

I'd like to propose another algorithm, Data Access-aware ReClamation (DARC),
which can be implemented on the DAMON framework.  The design is not fixed yet,
but the abstract idea is as follows.  Once memory pressure is recognized, it
monitors the memory access pattern of the system and selects eviction targets
based on both access frequency and recency.  In detail, it would track the age
of each region based on access frequency; the age gradually increases but is
reset to zero if a big change in the access frequency of the region is
detected.  Then, it selects pages in the regions that have had the lowest
access frequency for the longest time as the first eviction candidates.
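
The aging idea above can be sketched roughly as follows.  This is only an
illustration under assumptions, not the DARC design, which is not fixed yet:
the names Region, update_age, eviction_order, and the threshold parameter are
hypothetical, and ordering by frequency first and age second is just one
possible reading of "lowest access frequency for longest time".

```python
from dataclasses import dataclass


@dataclass
class Region:
    """Hypothetical monitored address range with its access statistics."""
    start: int
    end: int
    nr_accesses: int = 0  # access frequency seen in the last interval
    age: int = 0          # intervals since the last big frequency change


def update_age(region, new_nr_accesses, threshold):
    """Apply one interval's result: reset age on a big change, else grow it."""
    if abs(new_nr_accesses - region.nr_accesses) >= threshold:
        region.age = 0
    else:
        region.age += 1
    region.nr_accesses = new_nr_accesses


def eviction_order(regions):
    """Coldest first: lowest frequency, then longest time at that frequency."""
    return sorted(regions, key=lambda r: (r.nr_accesses, -r.age))
```

With such a sketch, a region that has stayed idle for many intervals sorts
ahead of a region that only recently became idle, so both frequency and
recency influence the choice of eviction candidates.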

Rather than just replacing the pseudo-LRU based reclamation, I'd like to
implement it as an optional proactive reclamation feature.  In detail, it
will have three watermarks for each zone, which are tunable via sysfs.  The
lowest watermark will be higher than the high watermark of the original reclaim
logic.  DARC will start if the available memory falls below the middle
watermark, and stop if the available memory rises above the highest watermark
or falls below the lowest watermark.  In the latter case, the original reclaim
logic will do the work.
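
The start/stop rule above can be written as a small state function.  This is a
sketch under assumptions only: darc_should_run and its parameters are
hypothetical names, and keeping the previous state between the middle and
highest watermarks is an assumption derived from the "start below middle, stop
above highest" wording.

```python
def darc_should_run(free, low, mid, high, running):
    """Return whether DARC should be active for one zone.

    free:    currently available memory of the zone
    low:     lowest DARC watermark (assumed above the original reclaim's
             high watermark)
    mid:     middle DARC watermark (start threshold)
    high:    highest DARC watermark (stop threshold)
    running: whether DARC is active right now
    """
    if free > high or free < low:
        # Enough memory again, or so little that original reclaim takes over.
        return False
    if free < mid:
        return True
    # Between the middle and highest watermarks: keep the current state.
    return running
```

For example, with watermarks (100, 500, 800), DARC would start once the
available memory drops to 400, keep running while it recovers to 600, and stop
when it exceeds 800 or drops below 100.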

After this, we will be able to get some feedback from brave users and carefully
adjust how aggressively it works.

Again, the design is still not fixed.  I will post a draft of it as soon as
possible.  I'd like to discuss further details from there, but I just wanted to
share the idea at the conceptual level here.

[1] https://linuxplumbersconf.org/event/4/contributions/548/
[2] https://linux-mm.org/AdvancedPageReplacement

Conclusion
----------

I shared my goals, the current status, and the short-term TODO items above.  I
hope this helps you understand the project, avoid unnecessary conflicts, and
provide more comments.  If you have any questions, concerns, or opinions,
please feel free to let me know.


Thanks,
SeongJae Park

