Linux-mm Archive on lore.kernel.org
 help / color / Atom feed
* [PATCH v4 00/11] Introduce Data Access MONitor (DAMON)
@ 2020-02-10 14:48 sjpark
  2020-02-10 14:48 ` [PATCH v4 01/11] mm: " sjpark
                   ` (10 more replies)
  0 siblings, 11 replies; 19+ messages in thread
From: sjpark @ 2020-02-10 14:48 UTC (permalink / raw)
  To: akpm
  Cc: SeongJae Park, acme, alexander.shishkin, amit, brendan.d.gregg,
	brendanhiggins, cai, colin.king, corbet, dwmw, jolsa, kirill,
	mark.rutland, mgorman, minchan, mingo, namhyung, peterz, rdunlap,
	rostedt, sj38.park, vdavydov.dev, linux-mm, linux-doc,
	linux-kernel

From: SeongJae Park <sjpark@amazon.de>

Introduction
============

Memory management systems can improve their performance and efficiency by
utilizing finer informations about data access However, because such finer
informations usually comes with higher overhead, most systems including Linux
forgives the potential improvement and rely on coarse informations and/or some
light-weight heuristics.  The psuedo-LRU and the aggressive THP promotions are
examples.

A number of experimental data access pattern awared memory management
optimizations (refer to 'Appendix A' for more details) say the sacrifices are
huge.  However, none of those has successfully adopted to Linux kernel mainly
due to the absence of a scalable and efficient data access monitoring
mechanism.  Refer to 'Appendix B' to see the limitations of existing memory
monitoring mechanisms.

DAMON is a data access monitoring solution for the problem.  It is 1) accurate
enough to be used for the DRAM level memory management (a straightforward
DAMON-based optimization achieved up to 2.55x speedup), 2) light-weight enough
to be applied online (compared to a straightforward access monitoring scheme,
DAMON is up to 94.242.42x lighter) and 3) keeps predefined upper-bound overhead
regardless of the size of target workloads (thus scalable).  Refer to 'Appendix
C' if you interested in how it is possible.

DAMON has mainly designed for the kernel's memory management mechanisms.
However, because it is implemented as a standalone kernel module and provides
several interfaces, it can be used by a wide range of users including kernel
space programs, user space programs, programmers, and administrators.  DAMON
is now supporting the monitoring only, but it will also provide simple and
convenient data access pattern awared memory managements by itself.  Refer to
'Appendix D' for more detailed expected usages of DAMON.


Frequently Asked Questions
==========================

Q: Why DAMON is not integrated with perf?
A: From the perspective of perf like profilers, DAMON can be thought of as a
data source in kernel, like the tracepoints, the pressure stall information
(psi), or the idle page tracking.  Thus, it is easy to integrate DAMON with the
profilers.  However, this patchset doesn't provide a fancy perf integration
because current step of DAMON development is focused on its core logic only.
That said, DAMON already provides two interfaces for user space programs, which
based on debugfs and tracepoint, respectively.  Using the tracepoint interface,
you can use DAMON with perf.  This patchset also provides a debugfs interface
based user space tool for DAMON.  It can be used to record, visualize, and
analyze data access patterns of target processes in a convenient way.

Q: Why a new module, instead of extending perf or other tools?
A: First, DAMON aims to be used by other programs including the kernel.
Therefore, having dependency to specific tools like perf is not desirable.
Second, because it need to be lightweight as much as possible so that it can be
used online, any unnecessary overhead such as kernel - user space context
switching cost should be avoided.  These are the two most biggest reasons why
DAMON is implemented in the kernel space.  The idle page tracking subsystem
would be the kernel module that most seems similar to DAMON.  However, its own
interface is not compatible with DAMON.  Also, the internal implementation of
it has no common part to be reused by DAMON.

Q: Can 'perf mem' provide the data required for DAMON?
A: On the systems supporting 'perf mem', yes.  DAMON is using the PTE Accessed
bits in low level.  Other H/W or S/W features that can be used for the purpose
could be used.  However, as explained with above question, DAMON need to be
implemented in the kernel space.


Evaluations
===========

A prototype of DAMON has evaluated on an Intel Xeon E7-8837 machine using 20
benchmarks that picked from SPEC CPU 2006, NAS, Tensorflow Benchmark,
SPLASH-2X, and PARSEC 3 benchmark suite.  Nonethless, this section provides
only summary of the results.  For more detail, please refer to the slides used
for the introduction of DAMON at the Linux Plumbers Conference 2019[1] or the
MIDDLEWARE'19 industrial track paper[2].


Quality
-------

We first traced and visualized the data access pattern of each workload.  We
were able to confirm that the visualized results are reasonably accurate by
manually comparing those with the source code of the workloads.

To see the usefulness of the monitoring, we optimized 9 memory intensive
workloads among them for memory pressure situations using the DAMON outputs.
In detail, we identified frequently accessed memory regions in each workload
based on the DAMON results and protected them with ``mlock()`` system calls.
The optimized versions consistently show speedup (2.55x in best case, 1.65x in
average) under memory pressure.


Overhead
--------

We also measured the overhead of DAMON.  It was not only under the upperbound
we set, but was much lower (0.6 percent of the bound in best case, 13.288
percent of the bound in average).  This reduction of the overhead is mainly
resulted from its core mechanism called adaptive regions adjustment.  Refer to
'Appendix D' for more detail about the mechanism.  We also compared the
overhead of DAMON with that of a straightforward periodic access check-based
monitoring.  DAMON's overhead was smaller than it by 94,242.42x in best case,
3,159.61x in average.


References
==========

Prototypes of DAMON have introduced by an LPC kernel summit track talk[1] and
two academic papers[2,3].  Please refer to those for more detailed information,
especially the evaluations.

[1] SeongJae Park, Tracing Data Access Pattern with Bounded Overhead and
    Best-effort Accuracy. In The Linux Kernel Summit, September 2019.
    https://linuxplumbersconf.org/event/4/contributions/548/
[2] SeongJae Park, Yunjae Lee, Heon Y. Yeom, Profiling Dynamic Data Access
    Patterns with Controlled Overhead and Quality. In 20th ACM/IFIP
    International Middleware Conference Industry, December 2019.
    https://dl.acm.org/doi/10.1145/3366626.3368125
[3] SeongJae Park, Yunjae Lee, Yunhee Kim, Heon Y. Yeom, Profiling Dynamic Data
    Access Patterns with Bounded Overhead and Accuracy. In IEEE International
    Workshop on Foundations and Applications of Self- Systems (FAS 2019), June
    2019.


Sequence Of Patches
===================

The patches are organized in the following sequence.  The first patch
introduces DAMON module and it's small common functions.  Following three
patches (2nd to 4th) implement the core logics of DAMON, namely regions based
sampling, adaptive regions adjustment, and dynamic memory mapping chage
adoption, one by one.

Next three patches (5th to 7th) adds interfaces of DAMON.  Each of those adds
an api for other kernel code, a debugfs interface for privileged users and a
tracepoint for other tracepoint supporting tracers such as perf.

To provide a minimal reference to the debugfs interface and for more convenient
use/tests of the DAMON, the next patch (8th) implements an user space tool.
The 9th patch adds a document for administrators of DAMON, and the 10th patch
provides DAMON's kunit tests.  Finally, the last patch (11th) updates the
MAINTAINERS file.

The patches are based on the v5.5.  You can also clone the complete git
tree:

    $ git clone git://github.com/sjp38/linux -b damon/patches/v4

The web is also available:
https://github.com/sjp38/linux/releases/tag/damon/patches/v4


Patch History
=============

Changes from v3
(https://lore.kernel.org/linux-mm/20200204062312.19913-1-sj38.park@gmail.com/)
 - Fix i386 build issue
   Reported-by: kbuild test robot <lkp@intel.com>
 - Increase the default size of the monitoring result buffer to 1 MiB
 - Fix misc bugs in debugfs interface

Changes from v2
(https://lore.kernel.org/linux-mm/20200128085742.14566-1-sjpark@amazon.com/)
 - Move MAINTAINERS changes to last commit (Brendan Higgins)
 - Add descriptions for kunittest: why not only entire mappings and what the 4
   input sets are trying to test (Brendan Higgins)
 - Remove 'kdamond_need_stop()' test (Brendan Higgins)
 - Discuss about the 'perf mem' and DAMON (Peter Zijlstra)
 - Make CV clearly say what it actually does (Peter Zijlstra)
 - Answer why new module (Qian Cai)
 - Diable DAMON by default (Randy Dunlap)
 - Change the interface: Seperate recording attributes
   (attrs, record, rules) and allow multiple kdamond instances
 - Implement kernel API interface

Changes from v1
(https://lore.kernel.org/linux-mm/20200120162757.32375-1-sjpark@amazon.com/)
 - Rebase on v5.5
 - Add a tracepoint for integration with other tracers (Kirill A. Shutemov)
 - document: Add more description for the user space tool (Brendan Higgins)
 - unittest: Improve readability (Brendan Higgins)
 - unittest: Use consistent name and helpers function (Brendan Higgins)
 - Update PG_Young to avoid reclaim logic interference (Yunjae Lee)

Changes from RFC
(https://lore.kernel.org/linux-mm/20200110131522.29964-1-sjpark@amazon.com/)
 - Specify an ambiguous plan of access pattern based mm optimizations
 - Support loadable module build
 - Cleanup code

SeongJae Park (11):
  mm: Introduce Data Access MONitor (DAMON)
  mm/damon: Implement region based sampling
  mm/damon: Adaptively adjust regions
  mm/damon: Apply dynamic memory mapping changes
  mm/damon: Implement kernel space API
  mm/damon: Add debugfs interface
  mm/damon: Add a tracepoint for result writing
  tools: Add a minimal user-space tool for DAMON
  Documentation/admin-guide/mm: Add a document for DAMON
  mm/damon: Add kunit tests
  MAINTAINERS: Update for DAMON

 .../admin-guide/mm/data_access_monitor.rst    |  414 +++++
 Documentation/admin-guide/mm/index.rst        |    1 +
 MAINTAINERS                                   |   11 +
 include/linux/damon.h                         |   71 +
 include/trace/events/damon.h                  |   32 +
 mm/Kconfig                                    |   23 +
 mm/Makefile                                   |    1 +
 mm/damon-test.h                               |  604 +++++++
 mm/damon.c                                    | 1413 +++++++++++++++++
 tools/damon/.gitignore                        |    1 +
 tools/damon/_dist.py                          |   35 +
 tools/damon/bin2txt.py                        |   64 +
 tools/damon/damo                              |   37 +
 tools/damon/heats.py                          |  358 +++++
 tools/damon/nr_regions.py                     |   88 +
 tools/damon/record.py                         |  219 +++
 tools/damon/report.py                         |   45 +
 tools/damon/wss.py                            |   94 ++
 18 files changed, 3511 insertions(+)
 create mode 100644 Documentation/admin-guide/mm/data_access_monitor.rst
 create mode 100644 include/linux/damon.h
 create mode 100644 include/trace/events/damon.h
 create mode 100644 mm/damon-test.h
 create mode 100644 mm/damon.c
 create mode 100644 tools/damon/.gitignore
 create mode 100644 tools/damon/_dist.py
 create mode 100644 tools/damon/bin2txt.py
 create mode 100755 tools/damon/damo
 create mode 100644 tools/damon/heats.py
 create mode 100644 tools/damon/nr_regions.py
 create mode 100644 tools/damon/record.py
 create mode 100644 tools/damon/report.py
 create mode 100644 tools/damon/wss.py


base-commit: d5226fa6dbae0569ee43ecfc08bdcd6770fc4755
-- 
2.17.1

================================ >8 ===========================================

Appendix A: Related Works
=========================

There are a number of researches[1,2,3,4,5,6] optimizing memory management
mechanisms based on the actual memory access patterns that shows impressive
results.  However, most of those has no deep consideration about the monitoring
of the accesses itself.  Some of those focused on the overhead of the
monitoring, but does not consider the accuracy scalability[6] or has additional
dependencies[7].  Indeed, one recent research[5] about the proactive
reclamation has also proposed[8] to the kernel community but the monitoring
overhead was considered a main problem.

[1] Subramanya R Dulloor, Amitabha Roy, Zheguang Zhao, Narayanan Sundaram,
    Nadathur Satish, Rajesh Sankaran, Jeff Jackson, and Karsten Schwan. 2016.
    Data tiering in heterogeneous memory systems. In Proceedings of the 11th
    European Conference on Computer Systems (EuroSys). ACM, 15.
[2] Youngjin Kwon, Hangchen Yu, Simon Peter, Christopher J Rossbach, and Emmett
    Witchel. 2016. Coordinated and efficient huge page management with ingens.
    In 12th USENIX Symposium on Operating Systems Design and Implementation
    (OSDI).  705–721.
[3] Harald Servat, Antonio J Peña, Germán Llort, Estanislao Mercadal,
    HansChristian Hoppe, and Jesús Labarta. 2017. Automating the application
    data placement in hybrid memory systems. In 2017 IEEE International
    Conference on Cluster Computing (CLUSTER). IEEE, 126–136.
[4] Vlad Nitu, Boris Teabe, Alain Tchana, Canturk Isci, and Daniel Hagimont.
    2018. Welcome to zombieland: practical and energy-efficient memory
    disaggregation in a datacenter. In Proceedings of the 13th European
    Conference on Computer Systems (EuroSys). ACM, 16.
[5] Andres Lagar-Cavilla, Junwhan Ahn, Suleiman Souhlal, Neha Agarwal, Radoslaw
    Burny, Shakeel Butt, Jichuan Chang, Ashwin Chaugule, Nan Deng, Junaid
    Shahid, Greg Thelen, Kamil Adam Yurtsever, Yu Zhao, and Parthasarathy
    Ranganathan.  2019. Software-Defined Far Memory in Warehouse-Scale
    Computers.  In Proceedings of the 24th International Conference on
    Architectural Support for Programming Languages and Operating Systems
    (ASPLOS).  ACM, New York, NY, USA, 317–330.
    DOI:https://doi.org/10.1145/3297858.3304053
[6] Carl Waldspurger, Trausti Saemundsson, Irfan Ahmad, and Nohhyun Park.
    2017. Cache Modeling and Optimization using Miniature Simulations. In 2017
    USENIX Annual Technical Conference (ATC). USENIX Association, Santa
    Clara, CA, 487–498.
    https://www.usenix.org/conference/atc17/technical-sessions/
[7] Haojie Wang, Jidong Zhai, Xiongchao Tang, Bowen Yu, Xiaosong Ma, and
    Wenguang Chen. 2018. Spindle: Informed Memory Access Monitoring. In 2018
    USENIX Annual Technical Conference (ATC). USENIX Association, Boston, MA,
    561–574.  https://www.usenix.org/conference/atc18/presentation/wang-haojie
[8] Jonathan Corbet. 2019. Proactively reclaiming idle memory. (2019).
    https://lwn.net/Articles/787611/.


Appendix B: Limitations of Other Access Monitoring Techniques
=============================================================

The memory access instrumentation techniques which are applied to
many tools such as Intel PIN is essential for correctness required cases such
as memory access bug detections or cache level optimizations.  However, those
usually incur exceptionally high overhead which is unacceptable.

Periodic access checks based on access counting features (e.g., PTE Accessed
bits or PG_Idle flags) can reduce the overhead.  It sacrifies some of the
quality but it's still ok to many of this domain.  However, the overhead
arbitrarily increase as the size of the target workload grows.  Miniature-like
static region based sampling can set the upperbound of the overhead, but it
will now decrease the quality of the output as the size of the workload grows.

DAMON is another solution that overcomes the limitations.  It is 1) accurate
enough for this domain, 2) light-weight so that it can be applied online, and
3) allow users to set the upper-bound of the overhead, regardless of the size
of target workloads.  It is implemented as a simple and small kernel module to
support various users in both of the user space and the kernel space.  Refer to
'Evaluations' section below for detailed performance of DAMON.

For the goals, DAMON utilizes its two core mechanisms, which allows lightweight
overhead and high quality of output, repectively.  To show how DAMON promises
those, refer to 'Mechanisms of DAMON' section below.


Appendix C: Mechanisms of DAMON
===============================


Basic Access Check
------------------

DAMON basically reports what pages are how frequently accessed.  The report is
passed to users in binary format via a ``result file`` which users can set it's
path.  Note that the frequency is not an absolute number of accesses, but a
relative frequency among the pages of the target workloads.

Users can also control the resolution of the reports by setting two time
intervals, ``sampling interval`` and ``aggregation interval``.  In detail,
DAMON checks access to each page per ``sampling interval``, aggregates the
results (counts the number of the accesses to each page), and reports the
aggregated results per ``aggregation interval``.  For the access check of each
page, DAMON uses the Accessed bits of PTEs.

This is thus similar to the previously mentioned periodic access checks based
mechanisms, which overhead is increasing as the size of the target process
grows.


Region Based Sampling
---------------------

To avoid the unbounded increase of the overhead, DAMON groups a number of
adjacent pages that assumed to have same access frequencies into a region.  As
long as the assumption (pages in a region have same access frequencies) is
kept, only one page in the region is required to be checked.  Thus, for each
``sampling interval``, DAMON randomly picks one page in each region and clears
its Accessed bit.  After one more ``sampling interval``, DAMON reads the
Accessed bit of the page and increases the access frequency of the region if
the bit has set meanwhile.  Therefore, the monitoring overhead is controllable
by setting the number of regions.  DAMON allows users to set the minimal and
maximum number of regions for the trade-off.

Except the assumption, this is almost same with the above-mentioned
miniature-like static region based sampling.  In other words, this scheme
cannot preserve the quality of the output if the assumption is not guaranteed.


Adaptive Regions Adjustment
---------------------------

At the beginning of the monitoring, DAMON constructs the initial regions by
evenly splitting the memory mapped address space of the process into the
user-specified minimal number of regions.  In this initial state, the
assumption is normally not kept and thus the quality could be low.  To keep the
assumption as much as possible, DAMON adaptively merges and splits each region.
For each ``aggregation interval``, it compares the access frequencies of
adjacent regions and merges those if the frequency difference is small.  Then,
after it reports and clears the aggregated access frequency of each region, it
splits each region into two regions if the total number of regions is smaller
than the half of the user-specified maximum number of regions.

In this way, DAMON provides its best-effort quality and minimal overhead while
keeping the bounds users set for their trade-off.


Applying Dynamic Memory Mappings
--------------------------------

Only a number of small parts in the super-huge virtual address space of the
processes is mapped to physical memory and accessed.  Thus, tracking the
unmapped address regions is just wasteful.  However, tracking every memory
mapping change might incur an overhead.  For the reason, DAMON applies the
dynamic memory mapping changes to the tracking regions only for each of an
user-specified time interval (``regions update interval``).


Appendix D: Expected Use-cases
==============================

A straightforward usecase of DAMON would be the program behavior analysis.
With the DAMON output, users can confirm whether the program is running as
intended or not.  This will be useful for debuggings and tests of design
points.

The monitored results can also be useful for counting the dynamic working set
size of workloads.  For the administration of memory overcommitted systems or
selection of the environments (e.g., containers providing different amount of
memory) for your workloads, this will be useful.

If you are a programmer, you can optimize your program by managing the memory
based on the actual data access pattern.  For example, you can identify the
dynamic hotness of your data using DAMON and call ``mlock()`` to keep your hot
data in DRAM, or call ``madvise()`` with ``MADV_PAGEOUT`` to proactively
reclaim cold data.  Even though your program is guaranteed to not encounter
memory pressure, you can still improve the performance by applying the DAMON
outputs for call of ``MADV_HUGEPAGE`` and ``MADV_NOHUGEPAGE``.  More creative
optimizations would be possible.  Our evaluations of DAMON includes a
straightforward optimization using the ``mlock()``.  Please refer to the below
Evaluation section for more detail.

As DAMON incurs very low overhead, such optimizations can be applied not only
offline, but also online.  Also, there is no reason to limit such optimizations
to the user space.  Several parts of the kernel's memory management mechanisms
could be also optimized using DAMON. The reclamation, the THP (de)promotion
decisions, and the compaction would be such a candidates.  DAMON will continue
its development to be highly optimized for the online/in-kernel uses.


A Future Plan: Data Access Based Optimizations Support
------------------------------------------------------

As described in the above section, DAMON could be helpful for actual access
based memory management optimizations.  Nevertheless, users who want to do such
optimizations should run DAMON, read the traced data (either online or
offline), analyze it, plan a new memory management scheme, and apply the new
scheme by themselves.  It must be easier than the past, but could still require
some level of efforts.  In its next development stage, DAMON will reduce some
of such efforts by allowing users to specify some access based memory
management rules for their specific processes.

Because this is just a plan, the specific interface is not fixed yet, but for
example, users will be allowed to write their desired memory management rules
to a special file in a DAMON specific format.  The rules will be something like
'if a memory region of size in a range is keeping a range of hotness for more
than a duration, apply specific memory management rule using madvise() or
mlock() to the region'.  For example, we can imagine rules like below:

    # format is: <min/max size> <min/max frequency (0-99)> <duration> <action>

    # if a region of a size keeps a very high access frequency for more than
    # 100ms, lock the region in the main memory (call mlock()). But, if the
    # region is larger than 500 MiB, skip it. The exception might be helpful
    # if the system has only, say, 600 MiB of DRAM, a region of size larger
    # than 600 MiB cannot be locked in the DRAM at all.
    na 500M 90 99 100ms mlock

    # if a region keeps a high access frequency for more than 100ms, put the
    # region on the head of the LRU list (call madvise() with MADV_WILLNEED).
    na na 80 90 100ms madv_willneed

    # if a region keeps a low access frequency for more than 100ms, put the
    # region on the tail of the LRU list (call madvise() with MADV_COLD).
    na na 10 20 100ms madv_cold

    # if a region keeps a very low access frequency for more than 100ms, swap
    # out the region immediately (call madvise() with MADV_PAGEOUT).
    na na 0 10 100ms madv_pageout

    # if a region of a size bigger than 2MB keeps a very high access frequency
    # for more than 100ms, let the region to use huge pages (call madvise()
    # with MADV_HUGEPAGE).
    2M na 90 99 100ms madv_hugepage

    # If a regions of a size bigger than > 2MB keeps no high access frequency
    # for more than 100ms, avoid the region from using huge pages (call
    # madvise() with MADV_NOHUGEPAGE).
    2M na 0 25 100ms madv_nohugepage


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH v4 01/11] mm: Introduce Data Access MONitor (DAMON)
  2020-02-10 14:48 [PATCH v4 00/11] Introduce Data Access MONitor (DAMON) sjpark
@ 2020-02-10 14:48 ` " sjpark
  2020-02-10 14:48 ` [PATCH v4 02/11] mm/damon: Implement region based sampling sjpark
                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 19+ messages in thread
From: sjpark @ 2020-02-10 14:48 UTC (permalink / raw)
  To: akpm
  Cc: SeongJae Park, acme, alexander.shishkin, amit, brendan.d.gregg,
	brendanhiggins, cai, colin.king, corbet, dwmw, jolsa, kirill,
	mark.rutland, mgorman, minchan, mingo, namhyung, peterz, rdunlap,
	rostedt, sj38.park, vdavydov.dev, linux-mm, linux-doc,
	linux-kernel

From: SeongJae Park <sjpark@amazon.de>

This commit introduces a kernel module named DAMON.  Note that this
commit is implementing only the stub for the module load/unload, basic
data structures, and simple manipulation functions of the structures to
keep the size of commit small.  The core mechanisms of DAMON will be
implemented one by one by following commits.

Brief Introduction
==================

Memory management decisions can normally be more efficient if finer data
access information is available.  However, because finer information
usually comes with higher overhead, most systems including Linux made a
tradeoff: Forgive some wise decisions and use coarse information and/or
light-weight heuristics.

A number of experimental data access pattern awared memory management
optimizations say the sacrifices are huge.  However, none of those has
successfully adopted to Linux kernel mainly due to the absence of a
scalable and efficient data access monitoring mechanism.

DAMON is a data access monitoring solution for the problem.  It is 1)
accurate enough for the DRAM level memory management, 2) light-weight
enough to be applied online, and 3) keeps predefined upper-bound
overhead regardless of the size of target workloads (thus scalable).

DAMON is implemented as a standalone kernel module and provides several
simple interfaces.  Owing to that, though it has mainly designed for the
kernel's memory management mechanisms, it can be also used for a wide
range of user space programs and people.

Frequently Asked Questions
==========================

Q: Why not integrated with perf?
A: From the perspective of perf like profilers, DAMON can be thought of
as a data source in kernel, like tracepoints, pressure stall information
(psi), or idle page tracking.  Thus, it can be easily integrated with
those.  However, this patchset doesn't provide a fancy perf integration
because current step of DAMON development is focused on its core logic
only.  That said, DAMON already provides two interfaces for user space
programs, which based on debugfs and tracepoint, respectively.  Using
the tracepoint interface, you can use DAMON with perf.  This patchset
also provides the debugfs interface based user space tool for DAMON.  It
can be used to record, visualize, and analyze data access pattern of
target processes in a convenient way.

Q: Why a new module, instead of extending perf or other tools?
A: First, DAMON aims to be used by other programs including the kernel.
Therefore, having dependency to specific tools like perf is not
desirable.  Second, because it need to be lightweight as much as
possible so that it can be used online, any unnecessary overhead such as
kernel - user space context switching cost should be avoided.  These are
the two most biggest reasons why DAMON is implemented in the kernel
space.  The idle page tracking subsystem would be the kernel module that
most seems similar to DAMON.  However, it's own interface is not
compatible with DAMON.  Also, the internal implementation of it has no
common part to be reused by DAMON.

Q: Can 'perf mem' provide the data required for DAMON?
A: On the systems supporting 'perf mem', yes.  DAMON is using the PTE
Accessed bits in low level.  Other H/W or S/W features that can be used
for the purpose could be used.  However, as explained with above
question, DAMON need to be implemented in the kernel space.

Signed-off-by: SeongJae Park <sjpark@amazon.de>
---
 mm/Kconfig  |  12 +++
 mm/Makefile |   1 +
 mm/damon.c  | 226 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 239 insertions(+)
 create mode 100644 mm/damon.c

diff --git a/mm/Kconfig b/mm/Kconfig
index ab80933be65f..387d469f40ec 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -739,4 +739,16 @@ config ARCH_HAS_HUGEPD
 config MAPPING_DIRTY_HELPERS
         bool
 
+config DAMON
+	tristate "Data Access Monitor"
+	depends on MMU
+	default n
+	help
+	  Provides data access monitoring.
+
+	  DAMON is a kernel module that allows users to monitor the actual
+	  memory access pattern of specific user-space processes.  It aims to
+	  be 1) accurate enough to be useful for performance-centric domains,
+	  and 2) sufficiently light-weight so that it can be applied online.
+
 endmenu
diff --git a/mm/Makefile b/mm/Makefile
index 1937cc251883..2911b3832c90 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -108,3 +108,4 @@ obj-$(CONFIG_ZONE_DEVICE) += memremap.o
 obj-$(CONFIG_HMM_MIRROR) += hmm.o
 obj-$(CONFIG_MEMFD_CREATE) += memfd.o
 obj-$(CONFIG_MAPPING_DIRTY_HELPERS) += mapping_dirty_helpers.o
+obj-$(CONFIG_DAMON) += damon.o
diff --git a/mm/damon.c b/mm/damon.c
new file mode 100644
index 000000000000..0687d2b83bb6
--- /dev/null
+++ b/mm/damon.c
@@ -0,0 +1,226 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Data Access Monitor
+ *
+ * Copyright 2019 Amazon.com, Inc. or its affiliates.  All rights reserved.
+ *
+ * Author: SeongJae Park <sjpark@amazon.de>
+ */
+
+#define pr_fmt(fmt) "damon: " fmt
+
+#include <linux/mm.h>
+#include <linux/module.h>
+#include <linux/random.h>
+#include <linux/slab.h>
+
+#define damon_get_task_struct(t) \
+	(get_pid_task(find_vpid(t->pid), PIDTYPE_PID))
+
+#define damon_next_region(r) \
+	(container_of(r->list.next, struct damon_region, list))
+
+#define damon_prev_region(r) \
+	(container_of(r->list.prev, struct damon_region, list))
+
+#define damon_for_each_region(r, t) \
+	list_for_each_entry(r, &t->regions_list, list)
+
+#define damon_for_each_region_safe(r, next, t) \
+	list_for_each_entry_safe(r, next, &t->regions_list, list)
+
+#define damon_for_each_task(ctx, t) \
+	list_for_each_entry(t, &(ctx)->tasks_list, list)
+
+#define damon_for_each_task_safe(ctx, t, next) \
+	list_for_each_entry_safe(t, next, &(ctx)->tasks_list, list)
+
+/* Represents a monitoring target region on the virtual address space */
+struct damon_region {
+	unsigned long vm_start;
+	unsigned long vm_end;
+	unsigned long sampling_addr;
+	unsigned int nr_accesses;
+	struct list_head list;
+};
+
+/* Represents a monitoring target task */
+struct damon_task {
+	unsigned long pid;
+	struct list_head regions_list;
+	struct list_head list;
+};
+
+struct damon_ctx {
+	struct rnd_state rndseed;
+
+	struct list_head tasks_list;	/* 'damon_task' objects */
+};
+
+#define LEN_RES_FILE_PATH	256
+
+/* Get a random number in [l, r) */
+#define damon_rand(ctx, l, r) (l + prandom_u32_state(&ctx->rndseed) % (r - l))
+
+/*
+ * Construct a damon_region struct
+ *
+ * Returns the pointer to the new struct if success, or NULL otherwise
+ */
+static struct damon_region *damon_new_region(struct damon_ctx *ctx,
+				unsigned long vm_start, unsigned long vm_end)
+{
+	struct damon_region *ret;
+
+	ret = kmalloc(sizeof(struct damon_region), GFP_KERNEL);
+	if (!ret)
+		return NULL;
+	ret->vm_start = vm_start;
+	ret->vm_end = vm_end;
+	ret->nr_accesses = 0;
+	ret->sampling_addr = damon_rand(ctx, vm_start, vm_end);
+	INIT_LIST_HEAD(&ret->list);
+
+	return ret;
+}
+
+/*
+ * Add a region between two other regions
+ */
+static inline void damon_add_region(struct damon_region *r,
+		struct damon_region *prev, struct damon_region *next)
+{
+	__list_add(&r->list, &prev->list, &next->list);
+}
+
+/*
+ * Append a region to a task's list of regions
+ */
+static void damon_add_region_tail(struct damon_region *r, struct damon_task *t)
+{
+	list_add_tail(&r->list, &t->regions_list);
+}
+
+/*
+ * Delete a region from its list
+ */
+static void damon_del_region(struct damon_region *r)
+{
+	list_del(&r->list);
+}
+
+/*
+ * De-allocate a region
+ */
+static void damon_free_region(struct damon_region *r)
+{
+	kfree(r);
+}
+
+static void damon_destroy_region(struct damon_region *r)
+{
+	damon_del_region(r);
+	damon_free_region(r);
+}
+
+/*
+ * Construct a damon_task struct
+ *
+ * Returns the pointer to the new struct if success, or NULL otherwise
+ */
+static struct damon_task *damon_new_task(unsigned long pid)
+{
+	struct damon_task *t;
+
+	t = kmalloc(sizeof(struct damon_task), GFP_KERNEL);
+	if (!t)
+		return NULL;
+	t->pid = pid;
+	INIT_LIST_HEAD(&t->regions_list);
+
+	return t;
+}
+
+/* Returns n-th damon_region of the given task */
+struct damon_region *damon_nth_region_of(struct damon_task *t, unsigned int n)
+{
+	struct damon_region *r;
+	unsigned int i;
+
+	i = 0;
+	damon_for_each_region(r, t) {
+		if (i++ == n)
+			return r;
+	}
+	return NULL;
+}
+
+static void damon_add_task_tail(struct damon_ctx *ctx, struct damon_task *t)
+{
+	list_add_tail(&t->list, &ctx->tasks_list);
+}
+
+static void damon_del_task(struct damon_task *t)
+{
+	list_del(&t->list);
+}
+
+static void damon_free_task(struct damon_task *t)
+{
+	struct damon_region *r, *next;
+
+	damon_for_each_region_safe(r, next, t)
+		damon_free_region(r);
+	kfree(t);
+}
+
+static void damon_destroy_task(struct damon_task *t)
+{
+	damon_del_task(t);
+	damon_free_task(t);
+}
+
+/*
+ * Returns number of monitoring target tasks
+ */
+static unsigned int nr_damon_tasks(struct damon_ctx *ctx)
+{
+	struct damon_task *t;
+	unsigned int ret = 0;
+
+	damon_for_each_task(ctx, t)
+		ret++;
+	return ret;
+}
+
+/*
+ * Returns the number of target regions for a given target task
+ */
+static unsigned int nr_damon_regions(struct damon_task *t)
+{
+	struct damon_region *r;
+	unsigned int ret = 0;
+
+	damon_for_each_region(r, t)
+		ret++;
+	return ret;
+}
+
+static int __init damon_init(void)
+{
+	pr_info("init\n");
+
+	return 0;
+}
+
+static void __exit damon_exit(void)
+{
+	pr_info("exit\n");
+}
+
+module_init(damon_init);
+module_exit(damon_exit);
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("SeongJae Park <sjpark@amazon.de>");
+MODULE_DESCRIPTION("DAMON: Data Access MONitor");
-- 
2.17.1



^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH v4 02/11] mm/damon: Implement region based sampling
  2020-02-10 14:48 [PATCH v4 00/11] Introduce Data Access MONitor (DAMON) sjpark
  2020-02-10 14:48 ` [PATCH v4 01/11] mm: " sjpark
@ 2020-02-10 14:48 ` sjpark
  2020-02-10 14:48 ` [PATCH v4 03/11] mm/damon: Adaptively adjust regions sjpark
                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 19+ messages in thread
From: sjpark @ 2020-02-10 14:48 UTC (permalink / raw)
  To: akpm
  Cc: SeongJae Park, acme, alexander.shishkin, amit, brendan.d.gregg,
	brendanhiggins, cai, colin.king, corbet, dwmw, jolsa, kirill,
	mark.rutland, mgorman, minchan, mingo, namhyung, peterz, rdunlap,
	rostedt, sj38.park, vdavydov.dev, linux-mm, linux-doc,
	linux-kernel

From: SeongJae Park <sjpark@amazon.de>

This commit implements DAMON's basic access check and region based
sampling mechanisms.  This change would seems make no sense, mainly
because it is only a part of the DAMON's logics.  Following two commits
will make more sense.

Basic Access Check
------------------

DAMON basically reports what pages are how frequently accessed.  Note
that the frequency is not an absolute number of accesses, but a relative
frequency among the pages of the target workloads.

Users can control the resolution of the reports by setting two time
intervals, ``sampling interval`` and ``aggregation interval``.  In
detail, DAMON checks access to each page per ``sampling interval``,
aggregates the results (counts the number of the accesses to each page),
and reports the aggregated results per ``aggregation interval``.  For
the access check of each page, DAMON uses the Accessed bits of PTEs.

This is thus similar to common periodic access checks based access
tracking mechanisms, which overhead is increasing as the size of the
target process grows.

Region Based Sampling
---------------------

To avoid the unbounded increase of the overhead, DAMON groups a number
of adjacent pages that assumed to have same access frequencies into a
region.  As long as the assumption (pages in a region have same access
frequencies) is kept, only one page in the region is required to be
checked.  Thus, for each ``sampling interval``, DAMON randomly picks one
page in each region and clears its Accessed bit.  After one more
``sampling interval``, DAMON reads the Accessed bit of the page and
increases the access frequency of the region if the bit has set
meanwhile.  Therefore, the monitoring overhead is controllable by
setting the number of regions.

Nonetheless, this scheme cannot preserve the quality of the output if
the assumption is not kept.  Following commit will introduce how we can
make the guarantee with best effort.

Signed-off-by: SeongJae Park <sjpark@amazon.de>
---
 mm/damon.c | 642 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 642 insertions(+)

diff --git a/mm/damon.c b/mm/damon.c
index 0687d2b83bb6..03fd9b1b931b 100644
--- a/mm/damon.c
+++ b/mm/damon.c
@@ -9,9 +9,14 @@
 
 #define pr_fmt(fmt) "damon: " fmt
 
+#include <linux/delay.h>
+#include <linux/kthread.h>
 #include <linux/mm.h>
 #include <linux/module.h>
+#include <linux/page_idle.h>
 #include <linux/random.h>
+#include <linux/sched/mm.h>
+#include <linux/sched/task.h>
 #include <linux/slab.h>
 
 #define damon_get_task_struct(t) \
@@ -51,7 +56,30 @@ struct damon_task {
 	struct list_head list;
 };
 
+/*
+ * For each 'sample_interval', DAMON checks whether each region is accessed or
+ * not.  It aggregates and keeps the access information (number of accesses to
+ * each region) for 'aggr_interval' and then flushes it to the result buffer if
+ * an 'aggr_interval' surpassed.
+ *
+ * All time intervals are in micro-seconds.
+ */
 struct damon_ctx {
+	unsigned long sample_interval;
+	unsigned long aggr_interval;
+	unsigned long min_nr_regions;
+
+	struct timespec64 last_aggregation;
+
+	unsigned char *rbuf;
+	unsigned int rbuf_len;
+	unsigned int rbuf_offset;
+	char *rfile_path;
+
+	struct task_struct *kdamond;
+	bool kdamond_stop;
+	spinlock_t kdamond_lock;
+
 	struct rnd_state rndseed;
 
 	struct list_head tasks_list;	/* 'damon_task' objects */
@@ -206,6 +234,620 @@ static unsigned int nr_damon_regions(struct damon_task *t)
 	return ret;
 }
 
+/*
+ * Get the mm_struct of the given task
+ *
+ * Callser should put the mm_struct after use, unless it is NULL.
+ *
+ * Returns the mm_struct of the task on success, NULL on failure
+ */
+static struct mm_struct *damon_get_mm(struct damon_task *t)
+{
+	struct task_struct *task;
+	struct mm_struct *mm;
+
+	task = damon_get_task_struct(t);
+	if (!task)
+		return NULL;
+
+	mm = get_task_mm(task);
+	put_task_struct(task);
+	return mm;
+}
+
+/*
+ * Size-evenly split a region into 'nr_pieces' small regions
+ *
+ * Returns 0 on success, or negative error code otherwise.
+ */
+static int damon_split_region_evenly(struct damon_ctx *ctx,
+		struct damon_region *r, unsigned int nr_pieces)
+{
+	unsigned long sz_orig, sz_piece, orig_end;
+	struct damon_region *piece = NULL, *next;
+	unsigned long start;
+
+	if (!r || !nr_pieces)
+		return -EINVAL;
+
+	orig_end = r->vm_end;
+	sz_orig = r->vm_end - r->vm_start;
+	sz_piece = sz_orig / nr_pieces;
+
+	if (!sz_piece)
+		return -EINVAL;
+
+	r->vm_end = r->vm_start + sz_piece;
+	next = damon_next_region(r);
+	for (start = r->vm_end; start + sz_piece <= orig_end;
+			start += sz_piece) {
+		piece = damon_new_region(ctx, start, start + sz_piece);
+		damon_add_region(piece, r, next);
+		r = piece;
+	}
+	if (piece)
+		piece->vm_end = orig_end;
+	return 0;
+}
+
+struct region {
+	unsigned long start;
+	unsigned long end;
+};
+
+static unsigned long sz_region(struct region *r)
+{
+	return r->end - r->start;
+}
+
+static void swap_regions(struct region *r1, struct region *r2)
+{
+	struct region tmp;
+
+	tmp = *r1;
+	*r1 = *r2;
+	*r2 = tmp;
+}
+
+/*
+ * Find the three regions in an address space
+ *
+ * vma		the head vma of the target address space
+ * regions	an array of three 'struct region's that results will be saved
+ *
+ * This function receives an address space and finds three regions in it which
+ * separated by the two biggest unmapped regions in the space.  Please refer to
+ * below comments of 'damon_init_regions_of()' function to know why this is
+ * necessary.
+ *
+ * Returns 0 if success, or negative error code otherwise.
+ */
+static int damon_three_regions_in_vmas(struct vm_area_struct *vma,
+		struct region regions[3])
+{
+	struct region gap = {0,}, first_gap = {0,}, second_gap = {0,};
+	struct vm_area_struct *last_vma = NULL;
+	unsigned long start = 0;
+
+	/* Find two biggest gaps so that first_gap > second_gap > others */
+	for (; vma; vma = vma->vm_next) {
+		if (!last_vma) {
+			start = vma->vm_start;
+			last_vma = vma;
+			continue;
+		}
+		gap.start = last_vma->vm_end;
+		gap.end = vma->vm_start;
+		if (sz_region(&gap) > sz_region(&second_gap)) {
+			swap_regions(&gap, &second_gap);
+			if (sz_region(&second_gap) > sz_region(&first_gap))
+				swap_regions(&second_gap, &first_gap);
+		}
+		last_vma = vma;
+	}
+
+	if (!sz_region(&second_gap) || !sz_region(&first_gap))
+		return -EINVAL;
+
+	/* Sort the two biggest gaps by address */
+	if (first_gap.start > second_gap.start)
+		swap_regions(&first_gap, &second_gap);
+
+	/* Store the result */
+	regions[0].start = start;
+	regions[0].end = first_gap.start;
+	regions[1].start = first_gap.end;
+	regions[1].end = second_gap.start;
+	regions[2].start = second_gap.end;
+	regions[2].end = last_vma->vm_end;
+
+	return 0;
+}
+
+/*
+ * Get the three regions in the given task
+ *
+ * Returns 0 on success, negative error code otherwise.
+ */
+static int damon_three_regions_of(struct damon_task *t,
+				struct region regions[3])
+{
+	struct mm_struct *mm;
+	int ret;
+
+	mm = damon_get_mm(t);
+	if (!mm)
+		return -EINVAL;
+
+	down_read(&mm->mmap_sem);
+	ret = damon_three_regions_in_vmas(mm->mmap, regions);
+	up_read(&mm->mmap_sem);
+
+	mmput(mm);
+	return ret;
+}
+
+/*
+ * Initialize the monitoring target regions for the given task
+ *
+ * t	the given target task
+ *
+ * Because only a number of small portions of the entire address space
+ * is acutally mapped to the memory and accessed, monitoring the unmapped
+ * regions is wasteful.  That said, because we can deal with small noises,
+ * tracking every mapping is not strictly required but could even incur a high
+ * overhead if the mapping frequently changes or the number of mappings is
+ * high.  Nonetheless, this may seems very weird.  DAMON's dynamic regions
+ * adjustment mechanism, which will be implemented with following commit will
+ * make this more sense.
+ *
+ * For the reason, we convert the complex mappings to three distinct regions
+ * that cover every mapped areas of the address space.  Also the two gaps
+ * between the three regions are the two biggest unmapped areas in the given
+ * address space.  In detail, this function first identifies the start and the
+ * end of the mappings and the two biggest unmapped areas of the address space.
+ * Then, it constructs the three regions as below:
+ *
+ *     [mappings[0]->start, big_two_unmapped_areas[0]->start)
+ *     [big_two_unmapped_areas[0]->end, big_two_unmapped_areas[1]->start)
+ *     [big_two_unmapped_areas[1]->end, mappings[nr_mappings - 1]->end)
+ *
+ * As usual memory map of processes is as below, the gap between the heap and
+ * the uppermost mmap()-ed region, and the gap between the lowermost mmap()-ed
+ * region and the stack will be two biggest unmapped regions.  Because these
+ * gaps are exceptionally huge areas in usual address space, excluding these
+ * two biggest unmapped regions will be sufficient to make a trade-off.
+ *
+ *   <heap>
+ *   <BIG UNMAPPED REGION 1>
+ *   <uppermost mmap()-ed region>
+ *   (other mmap()-ed regions and small unmapped regions)
+ *   <lowermost mmap()-ed region>
+ *   <BIG UNMAPPED REGION 2>
+ *   <stack>
+ */
+static void damon_init_regions_of(struct damon_ctx *c, struct damon_task *t)
+{
+	struct damon_region *r;
+	struct region regions[3];
+	int i;
+
+	if (damon_three_regions_of(t, regions)) {
+		pr_err("Failed to get three regions of task %lu\n", t->pid);
+		return;
+	}
+
+	/* Set the initial three regions of the task */
+	for (i = 0; i < 3; i++) {
+		r = damon_new_region(c, regions[i].start, regions[i].end);
+		damon_add_region_tail(r, t);
+	}
+
+	/* Split the middle region into 'min_nr_regions - 2' regions */
+	r = damon_nth_region_of(t, 1);
+	if (damon_split_region_evenly(c, r, c->min_nr_regions - 2))
+		pr_warn("Init middle region failed to be split\n");
+}
+
+/* Initialize '->regions_list' of every task */
+static void kdamond_init_regions(struct damon_ctx *ctx)
+{
+	struct damon_task *t;
+
+	damon_for_each_task(ctx, t)
+		damon_init_regions_of(ctx, t);
+}
+
+/*
+ * Check whether the given region has accessed since the last check
+ *
+ * mm	'mm_struct' for the given virtual address space
+ * r	the region to be checked
+ */
+static void kdamond_check_access(struct damon_ctx *ctx,
+			struct mm_struct *mm, struct damon_region *r)
+{
+	pte_t *pte = NULL;
+	pmd_t *pmd = NULL;
+	spinlock_t *ptl;
+
+	if (follow_pte_pmd(mm, r->sampling_addr, NULL, &pte, &pmd, &ptl))
+		goto mkold;
+
+	/* Read the page table access bit of the page */
+	if (pte && pte_young(*pte))
+		r->nr_accesses++;
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+	else if (pmd && pmd_young(*pmd))
+		r->nr_accesses++;
+#endif	/* CONFIG_TRANSPARENT_HUGEPAGE */
+
+	spin_unlock(ptl);
+
+mkold:
+	/* mkold next target */
+	r->sampling_addr = damon_rand(ctx, r->vm_start, r->vm_end);
+
+	if (follow_pte_pmd(mm, r->sampling_addr, NULL, &pte, &pmd, &ptl))
+		return;
+
+	if (pte) {
+		if (pte_young(*pte)) {
+			clear_page_idle(pte_page(*pte));
+			set_page_young(pte_page(*pte));
+		}
+		*pte = pte_mkold(*pte);
+	}
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+	else if (pmd) {
+		if (pmd_young(*pmd)) {
+			clear_page_idle(pmd_page(*pmd));
+			set_page_young(pte_page(*pte));
+		}
+		*pmd = pmd_mkold(*pmd);
+	}
+#endif
+
+	spin_unlock(ptl);
+}
+
+/*
+ * Check whether a time interval is elapsed
+ *
+ * baseline	the time to check whether the interval has elapsed since
+ * interval	the time interval (microseconds)
+ *
+ * See whether the given time interval has passed since the given baseline
+ * time.  If so, it also updates the baseline to current time for next check.
+ *
+ * Returns true if the time interval has passed, or false otherwise.
+ */
+static bool damon_check_reset_time_interval(struct timespec64 *baseline,
+		unsigned long interval)
+{
+	struct timespec64 now;
+
+	ktime_get_coarse_ts64(&now);
+	if ((timespec64_to_ns(&now) - timespec64_to_ns(baseline)) <
+			interval * 1000)
+		return false;
+	*baseline = now;
+	return true;
+}
+
+/*
+ * Check whether it is time to flush the aggregated information
+ */
+static bool kdamond_aggregate_interval_passed(struct damon_ctx *ctx)
+{
+	return damon_check_reset_time_interval(&ctx->last_aggregation,
+			ctx->aggr_interval);
+}
+
+/*
+ * Flush the content in the result buffer to the result file
+ */
+static void damon_flush_rbuffer(struct damon_ctx *ctx)
+{
+	ssize_t sz;
+	loff_t pos;
+	struct file *rfile;
+
+	while (ctx->rbuf_offset) {
+		pos = 0;
+		rfile = filp_open(ctx->rfile_path, O_CREAT | O_RDWR | O_APPEND,
+				0644);
+		if (IS_ERR(rfile)) {
+			pr_err("Cannot open the result file %s\n",
+					ctx->rfile_path);
+			return;
+		}
+
+		sz = kernel_write(rfile, ctx->rbuf, ctx->rbuf_offset, &pos);
+		filp_close(rfile, NULL);
+
+		ctx->rbuf_offset -= sz;
+	}
+}
+
+/*
+ * Write a data into the result buffer
+ */
+static void damon_write_rbuf(struct damon_ctx *ctx, void *data, ssize_t size)
+{
+	if (!ctx->rbuf_len || !ctx->rbuf)
+		return;
+	if (ctx->rbuf_offset + size > ctx->rbuf_len)
+		damon_flush_rbuffer(ctx);
+
+	memcpy(&ctx->rbuf[ctx->rbuf_offset], data, size);
+	ctx->rbuf_offset += size;
+}
+
+/*
+ * Flush the aggregated monitoring results to the result buffer
+ *
+ * Stores current tracking results to the result buffer and reset 'nr_accesses'
+ * of each regions.  The format for the result buffer is as below:
+ *
+ *   <time> <number of tasks> <array of task infos>
+ *
+ *   task info: <pid> <number of regions> <array of region infos>
+ *   region info: <start address> <end address> <nr_accesses>
+ */
+static void kdamond_flush_aggregated(struct damon_ctx *c)
+{
+	struct damon_task *t;
+	struct timespec64 now;
+	unsigned int nr;
+
+	ktime_get_coarse_ts64(&now);
+
+	damon_write_rbuf(c, &now, sizeof(struct timespec64));
+	nr = nr_damon_tasks(c);
+	damon_write_rbuf(c, &nr, sizeof(nr));
+
+	damon_for_each_task(c, t) {
+		struct damon_region *r;
+
+		damon_write_rbuf(c, &t->pid, sizeof(t->pid));
+		nr = nr_damon_regions(t);
+		damon_write_rbuf(c, &nr, sizeof(nr));
+		damon_for_each_region(r, t) {
+			damon_write_rbuf(c, &r->vm_start, sizeof(r->vm_start));
+			damon_write_rbuf(c, &r->vm_end, sizeof(r->vm_end));
+			damon_write_rbuf(c, &r->nr_accesses,
+					sizeof(r->nr_accesses));
+			r->nr_accesses = 0;
+		}
+	}
+}
+
+/*
+ * Check whether current monitoring should be stopped
+ *
+ * If users asked to stop, need stop.  Even though no user has asked to stop,
+ * need stop if every target task has dead.
+ *
+ * Returns true if need to stop current monitoring.
+ */
+static bool kdamond_need_stop(struct damon_ctx *ctx)
+{
+	struct damon_task *t;
+	struct task_struct *task;
+	bool stop;
+
+	spin_lock(&ctx->kdamond_lock);
+	stop = ctx->kdamond_stop;
+	spin_unlock(&ctx->kdamond_lock);
+	if (stop)
+		return true;
+
+	damon_for_each_task(ctx, t) {
+		task = damon_get_task_struct(t);
+		if (task) {
+			put_task_struct(task);
+			return false;
+		}
+	}
+
+	return true;
+}
+
+/*
+ * The monitoring daemon that runs as a kernel thread
+ */
+static int kdamond_fn(void *data)
+{
+	struct damon_ctx *ctx = (struct damon_ctx *)data;
+	struct damon_task *t;
+	struct damon_region *r, *next;
+	struct mm_struct *mm;
+
+	pr_info("kdamond (%d) starts\n", ctx->kdamond->pid);
+	kdamond_init_regions(ctx);
+	while (!kdamond_need_stop(ctx)) {
+		damon_for_each_task(ctx, t) {
+			mm = damon_get_mm(t);
+			if (!mm)
+				continue;
+			damon_for_each_region(r, t)
+				kdamond_check_access(ctx, mm, r);
+			mmput(mm);
+		}
+
+		if (kdamond_aggregate_interval_passed(ctx))
+			kdamond_flush_aggregated(ctx);
+
+		usleep_range(ctx->sample_interval, ctx->sample_interval + 1);
+	}
+	damon_flush_rbuffer(ctx);
+	damon_for_each_task(ctx, t) {
+		damon_for_each_region_safe(r, next, t)
+			damon_destroy_region(r);
+	}
+	pr_info("kdamond (%d) finishes\n", ctx->kdamond->pid);
+	spin_lock(&ctx->kdamond_lock);
+	ctx->kdamond = NULL;
+	spin_unlock(&ctx->kdamond_lock);
+	return 0;
+}
+
+/*
+ * Controller functions
+ */
+
+/*
+ * Start or stop the kdamond
+ *
+ * Returns 0 if success, negative error code otherwise.
+ */
+static int damon_turn_kdamond(struct damon_ctx *ctx, bool on)
+{
+	spin_lock(&ctx->kdamond_lock);
+	ctx->kdamond_stop = !on;
+	if (!ctx->kdamond && on) {
+		ctx->kdamond = kthread_run(kdamond_fn, ctx, "kdamond");
+		if (!ctx->kdamond)
+			goto fail;
+		goto success;
+	}
+	if (ctx->kdamond && !on) {
+		spin_unlock(&ctx->kdamond_lock);
+		while (true) {
+			spin_lock(&ctx->kdamond_lock);
+			if (!ctx->kdamond)
+				goto success;
+			spin_unlock(&ctx->kdamond_lock);
+
+			usleep_range(ctx->sample_interval,
+					ctx->sample_interval * 2);
+		}
+	}
+
+	/* tried to turn on while turned on, or turn off while turned off */
+
+fail:
+	spin_unlock(&ctx->kdamond_lock);
+	return -EINVAL;
+
+success:
+	spin_unlock(&ctx->kdamond_lock);
+	return 0;
+}
+
+static inline bool damon_is_target_pid(struct damon_ctx *c, unsigned long pid)
+{
+	struct damon_task *t;
+
+	damon_for_each_task(c, t) {
+		if (t->pid == pid)
+			return true;
+	}
+	return false;
+}
+
+/*
+ * This function should not be called while the kdamond is running.
+ */
+static int damon_set_pids(struct damon_ctx *ctx,
+			unsigned long *pids, ssize_t nr_pids)
+{
+	ssize_t i;
+	struct damon_task *t, *next;
+
+	/* Remove unselected tasks */
+	damon_for_each_task_safe(ctx, t, next) {
+		for (i = 0; i < nr_pids; i++) {
+			if (pids[i] == t->pid)
+				break;
+		}
+		if (i != nr_pids)
+			continue;
+		damon_destroy_task(t);
+	}
+
+	/* Add new tasks */
+	for (i = 0; i < nr_pids; i++) {
+		if (damon_is_target_pid(ctx, pids[i]))
+			continue;
+		t = damon_new_task(pids[i]);
+		if (!t) {
+			pr_err("Failed to alloc damon_task\n");
+			return -ENOMEM;
+		}
+		damon_add_task_tail(ctx, t);
+	}
+
+	return 0;
+}
+
+/*
+ * Set attributes for the recording
+ *
+ * path_to_rfile	path to the monitor result files
+ * This function should not be called while the kdamond is running.
+ * Every time interval is in micro-seconds.
+ *
+ * Returns 0 on success, negative error code otherwise.
+ */
+static int damon_set_recording(struct damon_ctx *ctx,
+				unsigned int rbuf_len, char *path_to_rfile)
+{
+	size_t rfile_path_len;
+
+	if (rbuf_len > 4 * 1024 * 1024) {
+		pr_err("too long (>%d) result buffer length\n",
+				4 * 1024 * 1024);
+		return -EINVAL;
+	}
+	rfile_path_len = strnlen(path_to_rfile, LEN_RES_FILE_PATH);
+	if (rfile_path_len >= LEN_RES_FILE_PATH) {
+		pr_err("too long (>%d) result file path %s\n",
+				LEN_RES_FILE_PATH, path_to_rfile);
+		return -EINVAL;
+	}
+	ctx->rbuf_len = rbuf_len;
+	kfree(ctx->rbuf);
+	if (rbuf_len) {
+		ctx->rbuf = kvmalloc(rbuf_len, GFP_KERNEL);
+		if (!ctx->rbuf)
+			return -ENOMEM;
+	}
+	ctx->rfile_path = kmalloc(rfile_path_len + 1, GFP_KERNEL);
+	if (!ctx->rfile_path)
+		return -ENOMEM;
+	strncpy(ctx->rfile_path, path_to_rfile, rfile_path_len);
+	return 0;
+}
+
+/*
+ * Set attributes for the monitoring
+ *
+ * sample_int		time interval between samplings
+ * aggr_int		time interval between aggregations
+ * min_nr_reg		minimal number of regions
+ *
+ * This function should not be called while the kdamond is running.
+ * Every time interval is in micro-seconds.
+ *
+ * Returns 0 on success, negative error code otherwise.
+ */
+static int damon_set_attrs(struct damon_ctx *ctx, unsigned long sample_int,
+		unsigned long aggr_int, unsigned long min_nr_reg)
+{
+	if (min_nr_reg < 3) {
+		pr_err("min_nr_regions (%lu) should be bigger than 2\n",
+				min_nr_reg);
+		return -EINVAL;
+	}
+
+	ctx->sample_interval = sample_int;
+	ctx->aggr_interval = aggr_int;
+	ctx->min_nr_regions = min_nr_reg;
+	return 0;
+}
+
 static int __init damon_init(void)
 {
 	pr_info("init\n");
-- 
2.17.1



^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH v4 03/11] mm/damon: Adaptively adjust regions
  2020-02-10 14:48 [PATCH v4 00/11] Introduce Data Access MONitor (DAMON) sjpark
  2020-02-10 14:48 ` [PATCH v4 01/11] mm: " sjpark
  2020-02-10 14:48 ` [PATCH v4 02/11] mm/damon: Implement region based sampling sjpark
@ 2020-02-10 14:48 ` sjpark
  2020-02-10 14:48 ` [PATCH v4 04/11] mm/damon: Apply dynamic memory mapping changes sjpark
                   ` (7 subsequent siblings)
  10 siblings, 0 replies; 19+ messages in thread
From: sjpark @ 2020-02-10 14:48 UTC (permalink / raw)
  To: akpm
  Cc: SeongJae Park, acme, alexander.shishkin, amit, brendan.d.gregg,
	brendanhiggins, cai, colin.king, corbet, dwmw, jolsa, kirill,
	mark.rutland, mgorman, minchan, mingo, namhyung, peterz, rdunlap,
	rostedt, sj38.park, vdavydov.dev, linux-mm, linux-doc,
	linux-kernel

From: SeongJae Park <sjpark@amazon.de>

At the beginning of the monitoring, DAMON constructs the initial regions
by evenly splitting the memory mapped address space of the process into
the user-specified minimal number of regions.  In this initial state,
the assumption of the regions (pages in same region have similar access
frequencies) is normally not kept and thus the monitoring quality could
be low.  To keep the assumption as much as possible, DAMON adaptively
merges and splits each region.

For each ``aggregation interval``, it compares the access frequencies of
adjacent regions and merges those if the frequency difference is small.
Then, after it reports and clears the aggregated access frequency of
each region, it splits each region into two regions if the total number
of regions is smaller than the half of the user-specified maximum number
of regions.

In this way, DAMON provides its best-effort quality and minimal overhead
while keeping the bounds users set for their trade-off.

Signed-off-by: SeongJae Park <sjpark@amazon.de>
---
 mm/damon.c | 146 +++++++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 141 insertions(+), 5 deletions(-)

diff --git a/mm/damon.c b/mm/damon.c
index 03fd9b1b931b..0639064527a4 100644
--- a/mm/damon.c
+++ b/mm/damon.c
@@ -68,6 +68,7 @@ struct damon_ctx {
 	unsigned long sample_interval;
 	unsigned long aggr_interval;
 	unsigned long min_nr_regions;
+	unsigned long max_nr_regions;
 
 	struct timespec64 last_aggregation;
 
@@ -397,9 +398,12 @@ static int damon_three_regions_of(struct damon_task *t,
  * regions is wasteful.  That said, because we can deal with small noises,
  * tracking every mapping is not strictly required but could even incur a high
  * overhead if the mapping frequently changes or the number of mappings is
- * high.  Nonetheless, this may seems very weird.  DAMON's dynamic regions
- * adjustment mechanism, which will be implemented with following commit will
- * make this more sense.
+ * high.  The adaptive regions adjustment mechanism will further help to deal
+ * with the noises by simply identifying the unmapped areas as a region that
+ * has no access.  Moreover, applying the real mappings that would have many
+ * unmapped areas inside will make the adaptive mechanism quite complex.  That
+ * said, too huge unmapped areas inside the monitoring target should be removed
+ * to not take the time for the adaptive mechanism.
  *
  * For the reason, we convert the complex mappings to three distinct regions
  * that cover every mapped areas of the address space.  Also the two gaps
@@ -623,6 +627,123 @@ static void kdamond_flush_aggregated(struct damon_ctx *c)
 	}
 }
 
+#define sz_damon_region(r) (r->vm_end - r->vm_start)
+
+/*
+ * Merge two adjacent regions into one region
+ */
+static void damon_merge_two_regions(struct damon_region *l,
+				struct damon_region *r)
+{
+	l->nr_accesses = (l->nr_accesses * sz_damon_region(l) +
+			r->nr_accesses * sz_damon_region(r)) /
+			(sz_damon_region(l) + sz_damon_region(r));
+	l->vm_end = r->vm_end;
+	damon_destroy_region(r);
+}
+
+#define diff_of(a, b) (a > b ? a - b : b - a)
+
+/*
+ * Merge adjacent regions having similar access frequencies
+ *
+ * t		task that merge operation will make change
+ * thres	merge regions having '->nr_accesses' diff smaller than this
+ */
+static void damon_merge_regions_of(struct damon_task *t, unsigned int thres)
+{
+	struct damon_region *r, *prev = NULL, *next;
+
+	damon_for_each_region_safe(r, next, t) {
+		if (!prev || prev->vm_end != r->vm_start)
+			goto next;
+		if (diff_of(prev->nr_accesses, r->nr_accesses) > thres)
+			goto next;
+		damon_merge_two_regions(prev, r);
+		continue;
+next:
+		prev = r;
+	}
+}
+
+/*
+ * Merge adjacent regions having similar access frequencies
+ *
+ * threshold	merge regions havind nr_accesses diff larger than this
+ *
+ * This function merges monitoring target regions which are adjacent and their
+ * access frequencies are similar.  This is for minimizing the monitoring
+ * overhead under the dynamically changeable access pattern.  If a merge was
+ * unnecessarily made, later 'kdamond_split_regions()' will revert it.
+ */
+static void kdamond_merge_regions(struct damon_ctx *c, unsigned int threshold)
+{
+	struct damon_task *t;
+
+	damon_for_each_task(c, t)
+		damon_merge_regions_of(t, threshold);
+}
+
+/*
+ * Split a region into two small regions
+ *
+ * r		the region to be split
+ * sz_r		size of the first sub-region that will be made
+ */
+static void damon_split_region_at(struct damon_ctx *ctx,
+		struct damon_region *r, unsigned long sz_r)
+{
+	struct damon_region *new;
+
+	new = damon_new_region(ctx, r->vm_start + sz_r, r->vm_end);
+	r->vm_end = new->vm_start;
+
+	damon_add_region(new, r, damon_next_region(r));
+}
+
+static void damon_split_regions_of(struct damon_ctx *ctx, struct damon_task *t)
+{
+	struct damon_region *r, *next;
+	unsigned long sz_left_region;
+
+	damon_for_each_region_safe(r, next, t) {
+		/*
+		 * Randomly select size of left sub-region to be at least
+		 * 10 percent and at most 90% of original region
+		 */
+		sz_left_region = (prandom_u32_state(&ctx->rndseed) % 9 + 1) *
+			(r->vm_end - r->vm_start) / 10;
+		/* Do not allow blank region */
+		if (sz_left_region == 0)
+			continue;
+		damon_split_region_at(ctx, r, sz_left_region);
+	}
+}
+
+/*
+ * splits every target regions into two randomly-sized regions
+ *
+ * This function splits every target regions into two random-sized regions if
+ * current total number of the regions is smaller than the half of the
+ * user-specified maximum number of regions.  This is for maximizing the
+ * monitoring accuracy under the dynamically changeable access patterns.  If a
+ * split was unnecessarily made, later 'kdamond_merge_regions()' will revert
+ * it.
+ */
+static void kdamond_split_regions(struct damon_ctx *ctx)
+{
+	struct damon_task *t;
+	unsigned int nr_regions = 0;
+
+	damon_for_each_task(ctx, t)
+		nr_regions += nr_damon_regions(t);
+	if (nr_regions > ctx->max_nr_regions / 2)
+		return;
+
+	damon_for_each_task(ctx, t)
+		damon_split_regions_of(ctx, t);
+}
+
 /*
  * Check whether current monitoring should be stopped
  *
@@ -663,21 +784,29 @@ static int kdamond_fn(void *data)
 	struct damon_task *t;
 	struct damon_region *r, *next;
 	struct mm_struct *mm;
+	unsigned long max_nr_accesses;
 
 	pr_info("kdamond (%d) starts\n", ctx->kdamond->pid);
 	kdamond_init_regions(ctx);
 	while (!kdamond_need_stop(ctx)) {
+		max_nr_accesses = 0;
 		damon_for_each_task(ctx, t) {
 			mm = damon_get_mm(t);
 			if (!mm)
 				continue;
-			damon_for_each_region(r, t)
+			damon_for_each_region(r, t) {
 				kdamond_check_access(ctx, mm, r);
+				if (r->nr_accesses > max_nr_accesses)
+					max_nr_accesses = r->nr_accesses;
+			}
 			mmput(mm);
 		}
 
-		if (kdamond_aggregate_interval_passed(ctx))
+		if (kdamond_aggregate_interval_passed(ctx)) {
+			kdamond_merge_regions(ctx, max_nr_accesses / 10);
 			kdamond_flush_aggregated(ctx);
+			kdamond_split_regions(ctx);
+		}
 
 		usleep_range(ctx->sample_interval, ctx->sample_interval + 1);
 	}
@@ -827,6 +956,7 @@ static int damon_set_recording(struct damon_ctx *ctx,
  * sample_int		time interval between samplings
  * aggr_int		time interval between aggregations
  * min_nr_reg		minimal number of regions
+ * max_nr_reg		maximum number of regions
  *
  * This function should not be called while the kdamond is running.
  * Every time interval is in micro-seconds.
@@ -841,10 +971,16 @@ static int damon_set_attrs(struct damon_ctx *ctx, unsigned long sample_int,
 				min_nr_reg);
 		return -EINVAL;
 	}
+	if (min_nr_reg >= ctx->max_nr_regions) {
+		pr_err("invalid nr_regions.  min (%lu) >= max (%lu)\n",
+				min_nr_reg, max_nr_reg);
+		return -EINVAL;
+	}
 
 	ctx->sample_interval = sample_int;
 	ctx->aggr_interval = aggr_int;
 	ctx->min_nr_regions = min_nr_reg;
+	ctx->max_nr_regions = max_nr_reg;
 	return 0;
 }
 
-- 
2.17.1



^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH v4 04/11] mm/damon: Apply dynamic memory mapping changes
  2020-02-10 14:48 [PATCH v4 00/11] Introduce Data Access MONitor (DAMON) sjpark
                   ` (2 preceding siblings ...)
  2020-02-10 14:48 ` [PATCH v4 03/11] mm/damon: Adaptively adjust regions sjpark
@ 2020-02-10 14:48 ` sjpark
  2020-02-10 14:50 ` [PATCH v4 05/11] mm/damon: Implement kernel space API sjpark
                   ` (6 subsequent siblings)
  10 siblings, 0 replies; 19+ messages in thread
From: sjpark @ 2020-02-10 14:48 UTC (permalink / raw)
  To: akpm
  Cc: SeongJae Park, acme, alexander.shishkin, amit, brendan.d.gregg,
	brendanhiggins, cai, colin.king, corbet, dwmw, jolsa, kirill,
	mark.rutland, mgorman, minchan, mingo, namhyung, peterz, rdunlap,
	rostedt, sj38.park, vdavydov.dev, linux-mm, linux-doc,
	linux-kernel

From: SeongJae Park <sjpark@amazon.de>

Only a number of parts in the virtual address space of the processes is
mapped to physical memory and accessed.  Thus, tracking the unmapped
address regions is just wasteful.  However, tracking every memory
mapping change might incur an overhead.  For the reason, DAMON applies
the dynamic memory mapping changes to the tracking regions only for each
of a user-specified time interval (``regions update interval``).

Signed-off-by: SeongJae Park <sjpark@amazon.de>
---
 mm/damon.c | 94 ++++++++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 92 insertions(+), 2 deletions(-)

diff --git a/mm/damon.c b/mm/damon.c
index 0639064527a4..c2c40e003580 100644
--- a/mm/damon.c
+++ b/mm/damon.c
@@ -60,13 +60,16 @@ struct damon_task {
  * For each 'sample_interval', DAMON checks whether each region is accessed or
  * not.  It aggregates and keeps the access information (number of accesses to
  * each region) for 'aggr_interval' and then flushes it to the result buffer if
- * an 'aggr_interval' surpassed.
+ * an 'aggr_interval' surpassed.  And for each 'regions_update_interval', damon
+ * checks whether the memory mapping of the target tasks has changed (e.g., by
+ * mmap() calls from the applications) and applies the changes.
  *
  * All time intervals are in micro-seconds.
  */
 struct damon_ctx {
 	unsigned long sample_interval;
 	unsigned long aggr_interval;
+	unsigned long regions_update_interval;
 	unsigned long min_nr_regions;
 	unsigned long max_nr_regions;
 
@@ -744,6 +747,87 @@ static void kdamond_split_regions(struct damon_ctx *ctx)
 		damon_split_regions_of(ctx, t);
 }
 
+/*
+ * Check whether it is time to check and apply the dynamic mmap changes
+ *
+ * Returns true if it is.
+ */
+static bool kdamond_need_update_regions(struct damon_ctx *ctx)
+{
+	return damon_check_reset_time_interval(&ctx->last_regions_update,
+			ctx->regions_update_interval);
+}
+
+static bool damon_intersect(struct damon_region *r, struct region *re)
+{
+	return !(r->vm_end <= re->start || re->end <= r->vm_start);
+}
+
+/*
+ * Update damon regions for the three big regions of the given task
+ *
+ * t		the given task
+ * bregions	the three big regions of the task
+ */
+static void damon_apply_three_regions(struct damon_ctx *ctx,
+		struct damon_task *t, struct region bregions[3])
+{
+	struct damon_region *r, *next;
+	unsigned int i = 0;
+
+	/* Remove regions which isn't in the three big regions now */
+	damon_for_each_region_safe(r, next, t) {
+		for (i = 0; i < 3; i++) {
+			if (damon_intersect(r, &bregions[i]))
+				break;
+		}
+		if (i == 3)
+			damon_destroy_region(r);
+	}
+
+	/* Adjust intersecting regions to fit with the threee big regions */
+	for (i = 0; i < 3; i++) {
+		struct damon_region *first = NULL, *last;
+		struct damon_region *newr;
+		struct region *br;
+
+		br = &bregions[i];
+		/* Get the first and last regions which intersects with br */
+		damon_for_each_region(r, t) {
+			if (damon_intersect(r, br)) {
+				if (!first)
+					first = r;
+				last = r;
+			}
+			if (r->vm_start >= br->end)
+				break;
+		}
+		if (!first) {
+			/* no damon_region intersects with this big region */
+			newr = damon_new_region(ctx, br->start, br->end);
+			damon_add_region(newr, damon_prev_region(r), r);
+		} else {
+			first->vm_start = br->start;
+			last->vm_end = br->end;
+		}
+	}
+}
+
+/*
+ * Update regions for current memory mappings
+ */
+static void kdamond_update_regions(struct damon_ctx *ctx)
+{
+	struct region three_regions[3];
+	struct damon_task *t;
+
+	damon_for_each_task(ctx, t) {
+		if (damon_three_regions_of(t, three_regions))
+			continue;
+		damon_apply_three_regions(ctx, t, three_regions);
+	}
+}
+
 /*
  * Check whether current monitoring should be stopped
  *
@@ -808,6 +892,9 @@ static int kdamond_fn(void *data)
 			kdamond_split_regions(ctx);
 		}
 
+		if (kdamond_need_update_regions(ctx))
+			kdamond_update_regions(ctx);
+
 		usleep_range(ctx->sample_interval, ctx->sample_interval + 1);
 	}
 	damon_flush_rbuffer(ctx);
@@ -955,6 +1042,7 @@ static int damon_set_recording(struct damon_ctx *ctx,
  *
  * sample_int		time interval between samplings
  * aggr_int		time interval between aggregations
+ * regions_update_int	time interval between vma update checks
  * min_nr_reg		minimal number of regions
  * max_nr_reg		maximum number of regions
  *
@@ -964,7 +1052,8 @@ static int damon_set_recording(struct damon_ctx *ctx,
  * Returns 0 on success, negative error code otherwise.
  */
 static int damon_set_attrs(struct damon_ctx *ctx, unsigned long sample_int,
-		unsigned long aggr_int, unsigned long min_nr_reg)
+		unsigned long aggr_int, unsigned long regions_update_int,
+		unsigned long min_nr_reg, unsigned long max_nr_reg)
 {
 	if (min_nr_reg < 3) {
 		pr_err("min_nr_regions (%lu) should be bigger than 2\n",
@@ -979,6 +1068,7 @@ static int damon_set_attrs(struct damon_ctx *ctx, unsigned long sample_int,
 
 	ctx->sample_interval = sample_int;
 	ctx->aggr_interval = aggr_int;
+	ctx->regions_update_interval = regions_update_int;
 	ctx->min_nr_regions = min_nr_reg;
 	ctx->max_nr_regions = max_nr_reg;
 	return 0;
-- 
2.17.1



^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH v4 05/11] mm/damon: Implement kernel space API
  2020-02-10 14:48 [PATCH v4 00/11] Introduce Data Access MONitor (DAMON) sjpark
                   ` (3 preceding siblings ...)
  2020-02-10 14:48 ` [PATCH v4 04/11] mm/damon: Apply dynamic memory mapping changes sjpark
@ 2020-02-10 14:50 ` sjpark
  2020-02-12 23:21   ` kbuild test robot
  2020-02-10 14:51 ` [PATCH v4 06/11] mm/damon: Add debugfs interface sjpark
                   ` (5 subsequent siblings)
  10 siblings, 1 reply; 19+ messages in thread
From: sjpark @ 2020-02-10 14:50 UTC (permalink / raw)
  To: akpm
  Cc: SeongJae Park, acme, alexander.shishkin, amit, brendan.d.gregg,
	brendanhiggins, cai, colin.king, corbet, dwmw, jolsa, kirill,
	mark.rutland, mgorman, minchan, mingo, namhyung, peterz, rdunlap,
	rostedt, sj38.park, vdavydov.dev, linux-mm, linux-doc,
	linux-kernel

From: SeongJae Park <sjpark@amazon.de>

This commit implements the DAMON api for the kernel.  Other kernel code
can use DAMON by calling damon_start() and damon_stop() with their own
'struct damon_ctx'.

Signed-off-by: SeongJae Park <sjpark@amazon.de>
---
 include/linux/damon.h | 71 +++++++++++++++++++++++++++++++++++++++++++
 mm/damon.c            | 70 +++++++++++-------------------------------
 2 files changed, 89 insertions(+), 52 deletions(-)
 create mode 100644 include/linux/damon.h

diff --git a/include/linux/damon.h b/include/linux/damon.h
new file mode 100644
index 000000000000..78785cb88d42
--- /dev/null
+++ b/include/linux/damon.h
@@ -0,0 +1,71 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * DAMON api
+ *
+ * Copyright 2019 Amazon.com, Inc. or its affiliates.  All rights reserved.
+ *
+ * Author: SeongJae Park <sjpark@amazon.de>
+ */
+
+#ifndef _DAMON_H_
+#define _DAMON_H_
+
+#include <linux/random.h>
+#include <linux/spinlock_types.h>
+#include <linux/time64.h>
+#include <linux/types.h>
+
+/* Represents a monitoring target region on the virtual address space */
+struct damon_region {
+	unsigned long vm_start;
+	unsigned long vm_end;
+	unsigned long sampling_addr;
+	unsigned int nr_accesses;
+	struct list_head list;
+};
+
+/* Represents a monitoring target task */
+struct damon_task {
+	unsigned long pid;
+	struct list_head regions_list;
+	struct list_head list;
+};
+
+struct damon_ctx {
+	unsigned long sample_interval;
+	unsigned long aggr_interval;
+	unsigned long regions_update_interval;
+	unsigned long min_nr_regions;
+	unsigned long max_nr_regions;
+
+	struct timespec64 last_aggregation;
+	struct timespec64 last_regions_update;
+
+	unsigned char *rbuf;
+	unsigned int rbuf_len;
+	unsigned int rbuf_offset;
+	char *rfile_path;
+
+	struct task_struct *kdamond;
+	bool kdamond_stop;
+	spinlock_t kdamond_lock;
+
+	struct rnd_state rndseed;
+
+	struct list_head tasks_list;	/* 'damon_task' objects */
+
+	/* callbacks */
+	void (*sample_cb)(struct damon_ctx *context);
+	void (*aggregate_cb)(struct damon_ctx *context);
+};
+
+int damon_set_pids(struct damon_ctx *ctx,
+			unsigned long *pids, ssize_t nr_pids);
+int damon_set_recording(struct damon_ctx *ctx,
+			unsigned int rbuf_len, char *rfile_path);
+int damon_set_attrs(struct damon_ctx *ctx, unsigned long s, unsigned long a,
+			unsigned long r, unsigned long min, unsigned long max);
+int damon_start(struct damon_ctx *ctx);
+int damon_stop(struct damon_ctx *ctx);
+
+#endif
diff --git a/mm/damon.c b/mm/damon.c
index c2c40e003580..450b85bef120 100644
--- a/mm/damon.c
+++ b/mm/damon.c
@@ -9,6 +9,7 @@
 
 #define pr_fmt(fmt) "damon: " fmt
 
+#include <linux/damon.h>
 #include <linux/delay.h>
 #include <linux/kthread.h>
 #include <linux/mm.h>
@@ -40,55 +41,6 @@
 #define damon_for_each_task_safe(ctx, t, next) \
 	list_for_each_entry_safe(t, next, &(ctx)->tasks_list, list)
 
-/* Represents a monitoring target region on the virtual address space */
-struct damon_region {
-	unsigned long vm_start;
-	unsigned long vm_end;
-	unsigned long sampling_addr;
-	unsigned int nr_accesses;
-	struct list_head list;
-};
-
-/* Represents a monitoring target task */
-struct damon_task {
-	unsigned long pid;
-	struct list_head regions_list;
-	struct list_head list;
-};
-
-/*
- * For each 'sample_interval', DAMON checks whether each region is accessed or
- * not.  It aggregates and keeps the access information (number of accesses to
- * each region) for 'aggr_interval' and then flushes it to the result buffer if
- * an 'aggr_interval' surpassed.  And for each 'regions_update_interval', damon
- * checks whether the memory mapping of the target tasks has changed (e.g., by
- * mmap() calls from the applications) and applies the changes.
- *
- * All time intervals are in micro-seconds.
- */
-struct damon_ctx {
-	unsigned long sample_interval;
-	unsigned long aggr_interval;
-	unsigned long regions_update_interval;
-	unsigned long min_nr_regions;
-	unsigned long max_nr_regions;
-
-	struct timespec64 last_aggregation;
-
-	unsigned char *rbuf;
-	unsigned int rbuf_len;
-	unsigned int rbuf_offset;
-	char *rfile_path;
-
-	struct task_struct *kdamond;
-	bool kdamond_stop;
-	spinlock_t kdamond_lock;
-
-	struct rnd_state rndseed;
-
-	struct list_head tasks_list;	/* 'damon_task' objects */
-};
-
 #define LEN_RES_FILE_PATH	256
 
 /* Get a random number in [l, r) */
@@ -885,11 +837,15 @@ static int kdamond_fn(void *data)
 			}
 			mmput(mm);
 		}
+		if (ctx->sample_cb)
+			ctx->sample_cb(ctx);
 
 		if (kdamond_aggregate_interval_passed(ctx)) {
 			kdamond_merge_regions(ctx, max_nr_accesses / 10);
 			kdamond_flush_aggregated(ctx);
 			kdamond_split_regions(ctx);
+			if (ctx->aggregate_cb)
+				ctx->aggregate_cb(ctx);
 		}
 
 		if (kdamond_need_update_regions(ctx))
@@ -952,6 +908,16 @@ static int damon_turn_kdamond(struct damon_ctx *ctx, bool on)
 	return 0;
 }
 
+int damon_start(struct damon_ctx *ctx)
+{
+	return damon_turn_kdamond(ctx, true);
+}
+
+int damon_stop(struct damon_ctx *ctx)
+{
+	return damon_turn_kdamond(ctx, false);
+}
+
 static inline bool damon_is_target_pid(struct damon_ctx *c, unsigned long pid)
 {
 	struct damon_task *t;
@@ -966,7 +932,7 @@ static inline bool damon_is_target_pid(struct damon_ctx *c, unsigned long pid)
 /*
  * This function should not be called while the kdamond is running.
  */
-static int damon_set_pids(struct damon_ctx *ctx,
+int damon_set_pids(struct damon_ctx *ctx,
 			unsigned long *pids, ssize_t nr_pids)
 {
 	ssize_t i;
@@ -1007,7 +973,7 @@ static int damon_set_pids(struct damon_ctx *ctx,
  *
  * Returns 0 on success, negative error code otherwise.
  */
-static int damon_set_recording(struct damon_ctx *ctx,
+int damon_set_recording(struct damon_ctx *ctx,
 				unsigned int rbuf_len, char *path_to_rfile)
 {
 	size_t rfile_path_len;
@@ -1051,7 +1017,7 @@ static int damon_set_recording(struct damon_ctx *ctx,
  *
  * Returns 0 on success, negative error code otherwise.
  */
-static int damon_set_attrs(struct damon_ctx *ctx, unsigned long sample_int,
+int damon_set_attrs(struct damon_ctx *ctx, unsigned long sample_int,
 		unsigned long aggr_int, unsigned long regions_update_int,
 		unsigned long min_nr_reg, unsigned long max_nr_reg)
 {
-- 
2.17.1



^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH v4 06/11] mm/damon: Add debugfs interface
  2020-02-10 14:48 [PATCH v4 00/11] Introduce Data Access MONitor (DAMON) sjpark
                   ` (4 preceding siblings ...)
  2020-02-10 14:50 ` [PATCH v4 05/11] mm/damon: Implement kernel space API sjpark
@ 2020-02-10 14:51 ` sjpark
  2020-02-10 14:52 ` [PATCH v4 07/11] mm/damon: Add a tracepoint for result writing sjpark
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 19+ messages in thread
From: sjpark @ 2020-02-10 14:51 UTC (permalink / raw)
  To: akpm
  Cc: SeongJae Park, acme, alexander.shishkin, amit, brendan.d.gregg,
	brendanhiggins, cai, colin.king, corbet, dwmw, jolsa, kirill,
	mark.rutland, mgorman, minchan, mingo, namhyung, peterz, rdunlap,
	rostedt, sj38.park, vdavydov.dev, linux-mm, linux-doc,
	linux-kernel

From: SeongJae Park <sjpark@amazon.de>

This commit adds a debugfs interface for DAMON.

DAMON exports four files, ``attrs``, ``pids``, ``record``, and
``monitor_on`` under its debugfs directory, ``<debugfs>/damon/``.

Attributes
----------

Users can read and write the ``sampling interval``, ``aggregation
interval``, ``regions update interval``, and min/max number of
monitoring target regions by reading from and writing to the ``attrs``
file.  For example, below commands set those values to 5 ms, 100 ms,
1,000 ms, 10, 1000 and check it again::

    # cd <debugfs>/damon
    # echo 5000 100000 1000000 10 1000 > attrs
    # cat attrs
    5000 100000 1000000 10 1000

Target PIDs
-----------

Users can read and write the pids of current monitoring target processes
by reading from and writing to the ``pids`` file.  For example, below
commands set processes having pids 42 and 4242 as the processes to be
monitored and check it again::

    # cd <debugfs>/damon
    # echo 42 4242 > pids
    # cat pids
    42 4242

Note that setting the pids doesn't starts the monitoring.

Record
------

DAMON support direct monitoring result record feature.  The recorded
results are first written to a buffer and flushed to a file in batch.
Users can set the size of the buffer and the path to the result file by
reading from and writing to the ``record`` file.  For example, below
commands set the buffer to be 4 KiB and the result to be saved in
'/damon.data'.

    # cd <debugfs>/damon
    # echo 4096 /damon.data > pids
    # cat record
    4096 /damon.data

Turning On/Off
--------------

You can check current status, start and stop the monitoring by reading
from and writing to the ``monitor_on`` file.  Writing ``on`` to the file
starts DAMON to monitor the target processes with the attributes.
Writing ``off`` to the file stops DAMON.  DAMON also stops if every
target processes is be terminated.  Below example commands turn on, off,
and check status of DAMON::

    # cd <debugfs>/damon
    # echo on > monitor_on
    # echo off > monitor_on
    # cat monitor_on
    off

Please note that you cannot write to the ``attrs`` and ``pids`` files
while the monitoring is turned on.  If you write to the files while
DAMON is running, ``-EINVAL`` will be returned.

Signed-off-by: SeongJae Park <sjpark@amazon.de>
---
 mm/damon.c | 349 ++++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 348 insertions(+), 1 deletion(-)

diff --git a/mm/damon.c b/mm/damon.c
index 450b85bef120..748cd8537fee 100644
--- a/mm/damon.c
+++ b/mm/damon.c
@@ -10,6 +10,7 @@
 #define pr_fmt(fmt) "damon: " fmt
 
 #include <linux/damon.h>
+#include <linux/debugfs.h>
 #include <linux/delay.h>
 #include <linux/kthread.h>
 #include <linux/mm.h>
@@ -41,6 +42,24 @@
 #define damon_for_each_task_safe(ctx, t, next) \
 	list_for_each_entry_safe(t, next, &(ctx)->tasks_list, list)
 
+/*
+ * For each 'sample_interval', DAMON checks whether each region is accessed or
+ * not.  It aggregates and keeps the access information (number of accesses to
+ * each region) for 'aggr_interval' and then flushes it to the result buffer if
+ * an 'aggr_interval' surpassed.  And for each 'regions_update_interval', damon
+ * checks whether the memory mapping of the target tasks has changed (e.g., by
+ * mmap() calls from the applications) and applies the changes.
+ *
+ * All time intervals are in micro-seconds.
+ */
+static struct damon_ctx damon_user_ctx = {
+	.sample_interval = 5 * 1000,
+	.aggr_interval = 100 * 1000,
+	.regions_update_interval = 1000 * 1000,
+	.min_nr_regions = 10,
+	.max_nr_regions = 1000,
+};
+
 #define LEN_RES_FILE_PATH	256
 
 /* Get a random number in [l, r) */
@@ -1040,15 +1059,343 @@ int damon_set_attrs(struct damon_ctx *ctx, unsigned long sample_int,
 	return 0;
 }
 
+/*
+ * debugfs functions
+ */
+
+static ssize_t debugfs_monitor_on_read(struct file *file,
+		char __user *buf, size_t count, loff_t *ppos)
+{
+	struct damon_ctx *ctx = &damon_user_ctx;
+	char monitor_on_buf[5];
+	bool monitor_on;
+	int ret;
+
+	spin_lock(&ctx->kdamond_lock);
+	monitor_on = ctx->kdamond != NULL;
+	spin_unlock(&ctx->kdamond_lock);
+
+	ret = snprintf(monitor_on_buf, 5, monitor_on ? "on\n" : "off\n");
+
+	return simple_read_from_buffer(buf, count, ppos, monitor_on_buf, ret);
+}
+
+static ssize_t debugfs_monitor_on_write(struct file *file,
+		const char __user *buf, size_t count, loff_t *ppos)
+{
+	struct damon_ctx *ctx = &damon_user_ctx;
+	ssize_t ret;
+	bool on = false;
+	char cmdbuf[5];
+
+	ret = simple_write_to_buffer(cmdbuf, 5, ppos, buf, count);
+	if (ret < 0)
+		return ret;
+
+	if (sscanf(cmdbuf, "%s", cmdbuf) != 1)
+		return -EINVAL;
+	if (!strncmp(cmdbuf, "on", 5))
+		on = true;
+	else if (!strncmp(cmdbuf, "off", 5))
+		on = false;
+	else
+		return -EINVAL;
+
+	if (damon_turn_kdamond(ctx, on))
+		return -EINVAL;
+
+	return ret;
+}
+
+static ssize_t damon_sprint_pids(struct damon_ctx *ctx, char *buf, ssize_t len)
+{
+	char *cursor = buf;
+	struct damon_task *t;
+	int ret;
+
+	damon_for_each_task(ctx, t) {
+		ret = snprintf(cursor, len, "%lu ", t->pid);
+		cursor += ret;
+	}
+	if (cursor != buf)
+		cursor--;
+	cursor += snprintf(cursor, len, "\n");
+	return cursor - buf;
+}
+
+static ssize_t debugfs_pids_read(struct file *file,
+		char __user *buf, size_t count, loff_t *ppos)
+{
+	struct damon_ctx *ctx = &damon_user_ctx;
+	ssize_t len;
+	char pids_buf[512];
+
+	len = damon_sprint_pids(ctx, pids_buf, 512);
+
+	return simple_read_from_buffer(buf, count, ppos, pids_buf, len);
+}
+
+/*
+ * Converts a string into an array of unsigned long integers
+ *
+ * Returns an array of unsigned long integers that converted, or NULL if the
+ * input is wrong.
+ */
+static unsigned long *str_to_pids(const char *str, ssize_t len,
+				ssize_t *nr_pids)
+{
+	unsigned long *pids;
+	unsigned long pid;
+	int pos = 0, parsed, ret;
+
+	*nr_pids = 0;
+	pids = kmalloc_array(256, sizeof(unsigned long), GFP_KERNEL);
+	while (*nr_pids < 256 && pos < len) {
+		ret = sscanf(&str[pos], "%lu%n", &pid, &parsed);
+		pos += parsed;
+		if (ret != 1)
+			break;
+		pids[*nr_pids] = pid;
+		*nr_pids += 1;
+	}
+	if (*nr_pids == 0) {
+		kfree(pids);
+		pids = NULL;
+	}
+
+	return pids;
+}
+
+static ssize_t debugfs_pids_write(struct file *file,
+		const char __user *buf, size_t count, loff_t *ppos)
+{
+	struct damon_ctx *ctx = &damon_user_ctx;
+	ssize_t ret;
+	unsigned long *targets;
+	ssize_t nr_targets;
+	char pids_buf[512];
+
+	ret = simple_write_to_buffer(pids_buf, 512, ppos, buf, count);
+	if (ret < 0)
+		return ret;
+
+	targets = str_to_pids(pids_buf, ret, &nr_targets);
+
+	spin_lock(&ctx->kdamond_lock);
+	if (ctx->kdamond)
+		goto monitor_running;
+
+	damon_set_pids(ctx, targets, nr_targets);
+	spin_unlock(&ctx->kdamond_lock);
+	kfree(targets);
+
+	return ret;
+
+monitor_running:
+	spin_unlock(&ctx->kdamond_lock);
+	pr_err("%s: kdamond is running. Turn it off first.\n", __func__);
+	return -EINVAL;
+}
+
+static ssize_t debugfs_record_read(struct file *file,
+		char __user *buf, size_t count, loff_t *ppos)
+{
+	struct damon_ctx *ctx = &damon_user_ctx;
+	char record_buf[512];
+	int ret;
+
+	ret = snprintf(record_buf, 512, "%u %s\n",
+			ctx->rbuf_len, ctx->rfile_path);
+	return simple_read_from_buffer(buf, count, ppos, record_buf, ret);
+}
+
+static ssize_t debugfs_record_write(struct file *file,
+		const char __user *buf, size_t count, loff_t *ppos)
+{
+	struct damon_ctx *ctx = &damon_user_ctx;
+	char record_buf[512];
+	unsigned int rbuf_len;
+	char res_file_path[LEN_RES_FILE_PATH];
+	ssize_t ret;
+
+	if (count > 512) {
+		pr_err("record debugfs input is too large: %s\n", buf);
+		return -ENOMEM;
+	}
+
+	ret = simple_write_to_buffer(record_buf, 512, ppos, buf, count);
+	if (ret < 0)
+		return ret;
+	if (sscanf(record_buf, "%u %s",
+				&rbuf_len, res_file_path) != 2)
+		return -EINVAL;
+
+	spin_lock(&ctx->kdamond_lock);
+	if (ctx->kdamond)
+		goto monitor_running;
+
+	damon_set_recording(ctx, rbuf_len, res_file_path);
+	spin_unlock(&ctx->kdamond_lock);
+
+	return ret;
+
+monitor_running:
+	spin_unlock(&ctx->kdamond_lock);
+	pr_err("%s: kdamond is running. Turn it off first.\n", __func__);
+	return -EINVAL;
+}
+
+
+static ssize_t debugfs_attrs_read(struct file *file,
+		char __user *buf, size_t count, loff_t *ppos)
+{
+	struct damon_ctx *ctx = &damon_user_ctx;
+	char attrs_buf[256];
+	int ret;
+
+	ret = snprintf(attrs_buf, 256, "%lu %lu %lu %lu %lu\n",
+			ctx->sample_interval, ctx->aggr_interval,
+			ctx->regions_update_interval, ctx->min_nr_regions,
+			ctx->max_nr_regions);
+
+	return simple_read_from_buffer(buf, count, ppos, attrs_buf, ret);
+}
+
+static ssize_t debugfs_attrs_write(struct file *file,
+		const char __user *buf, size_t count, loff_t *ppos)
+{
+	struct damon_ctx *ctx = &damon_user_ctx;
+	unsigned long s, a, r, minr, maxr;
+	char attrs_buf[256];
+	ssize_t ret;
+
+	if (count > 256) {
+		pr_err("attributes stream is too large: %s\n", buf);
+		return -ENOMEM;
+	}
+
+	ret = simple_write_to_buffer(attrs_buf, 256, ppos, buf, count);
+	if (ret < 0)
+		return ret;
+
+	if (sscanf(attrs_buf, "%lu %lu %lu %lu %lu",
+				&s, &a, &r, &minr, &maxr) != 5)
+		return -EINVAL;
+
+	spin_lock(&ctx->kdamond_lock);
+	if (ctx->kdamond)
+		goto monitor_running;
+
+	damon_set_attrs(ctx, s, a, r, minr, maxr);
+	spin_unlock(&ctx->kdamond_lock);
+
+	return ret;
+
+monitor_running:
+	spin_unlock(&ctx->kdamond_lock);
+	pr_err("%s: kdamond is running. Turn it off first.\n", __func__);
+	return -EINVAL;
+}
+
+static const struct file_operations monitor_on_fops = {
+	.owner = THIS_MODULE,
+	.read = debugfs_monitor_on_read,
+	.write = debugfs_monitor_on_write,
+};
+
+static const struct file_operations pids_fops = {
+	.owner = THIS_MODULE,
+	.read = debugfs_pids_read,
+	.write = debugfs_pids_write,
+};
+
+static const struct file_operations record_fops = {
+	.owner = THIS_MODULE,
+	.read = debugfs_record_read,
+	.write = debugfs_record_write,
+};
+
+static const struct file_operations attrs_fops = {
+	.owner = THIS_MODULE,
+	.read = debugfs_attrs_read,
+	.write = debugfs_attrs_write,
+};
+
+static struct dentry *debugfs_root;
+
+static int __init debugfs_init(void)
+{
+	const char * const file_names[] = {"attrs", "record",
+		"pids", "monitor_on"};
+	const struct file_operations *fops[] = {&attrs_fops, &record_fops,
+		&pids_fops, &monitor_on_fops};
+	int i;
+
+	debugfs_root = debugfs_create_dir("damon", NULL);
+	if (!debugfs_root) {
+		pr_err("failed to create the debugfs dir\n");
+		return -ENOMEM;
+	}
+
+	for (i = 0; i < ARRAY_SIZE(file_names); i++) {
+		if (!debugfs_create_file(file_names[i], 0600, debugfs_root,
+					NULL, fops[i])) {
+			pr_err("failed to create %s file\n", file_names[i]);
+			return -ENOMEM;
+		}
+	}
+
+	return 0;
+}
+
+static int __init damon_init_user_ctx(void)
+{
+	int rc;
+
+	struct damon_ctx *ctx = &damon_user_ctx;
+
+	ktime_get_coarse_ts64(&ctx->last_aggregation);
+	ctx->last_regions_update = ctx->last_aggregation;
+
+	ctx->rbuf_offset = 0;
+	rc = damon_set_recording(ctx, 1024 * 1024, "/damon.data");
+	if (rc)
+		return rc;
+
+	ctx->kdamond = NULL;
+	ctx->kdamond_stop = false;
+	spin_lock_init(&ctx->kdamond_lock);
+
+	prandom_seed_state(&ctx->rndseed, 42);
+	INIT_LIST_HEAD(&ctx->tasks_list);
+
+	ctx->sample_cb = NULL;
+	ctx->aggregate_cb = NULL;
+
+	return 0;
+}
+
 static int __init damon_init(void)
 {
+	int rc;
+
 	pr_info("init\n");
 
-	return 0;
+	rc = damon_init_user_ctx();
+	if (rc)
+		return rc;
+
+	return debugfs_init();
 }
 
 static void __exit damon_exit(void)
 {
+	damon_turn_kdamond(&damon_user_ctx, false);
+	debugfs_remove_recursive(debugfs_root);
+
+	kfree(damon_user_ctx.rbuf);
+	kfree(damon_user_ctx.rfile_path);
+
 	pr_info("exit\n");
 }
 
-- 
2.17.1



^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH v4 07/11] mm/damon: Add a tracepoint for result writing
  2020-02-10 14:48 [PATCH v4 00/11] Introduce Data Access MONitor (DAMON) sjpark
                   ` (5 preceding siblings ...)
  2020-02-10 14:51 ` [PATCH v4 06/11] mm/damon: Add debugfs interface sjpark
@ 2020-02-10 14:52 ` sjpark
  2020-02-10 14:52 ` [PATCH v4 08/11] tools: Add a minimal user-space tool for DAMON sjpark
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 19+ messages in thread
From: sjpark @ 2020-02-10 14:52 UTC (permalink / raw)
  To: akpm
  Cc: SeongJae Park, acme, alexander.shishkin, amit, brendan.d.gregg,
	brendanhiggins, cai, colin.king, corbet, dwmw, jolsa, kirill,
	mark.rutland, mgorman, minchan, mingo, namhyung, peterz, rdunlap,
	rostedt, sj38.park, vdavydov.dev, linux-mm, linux-doc,
	linux-kernel

From: SeongJae Park <sjpark@amazon.de>

This commit adds a tracepoint for DAMON's result buffer writing.  It is
called for each writing of the DAMON results and print the result data.
Therefore, it would be used to easily integrated with other tracepoint
supporting tracers such as perf.

Signed-off-by: SeongJae Park <sjpark@amazon.de>
---
 include/trace/events/damon.h | 32 ++++++++++++++++++++++++++++++++
 mm/damon.c                   |  4 ++++
 2 files changed, 36 insertions(+)
 create mode 100644 include/trace/events/damon.h

diff --git a/include/trace/events/damon.h b/include/trace/events/damon.h
new file mode 100644
index 000000000000..fb33993620ce
--- /dev/null
+++ b/include/trace/events/damon.h
@@ -0,0 +1,32 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM damon
+
+#if !defined(_TRACE_DAMON_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_DAMON_H
+
+#include <linux/types.h>
+#include <linux/tracepoint.h>
+
+TRACE_EVENT(damon_write_rbuf,
+
+	TP_PROTO(void *buf, const ssize_t sz),
+
+	TP_ARGS(buf, sz),
+
+	TP_STRUCT__entry(
+		__dynamic_array(char, buf, sz)
+	),
+
+	TP_fast_assign(
+		memcpy(__get_dynamic_array(buf), buf, sz);
+	),
+
+	TP_printk("dat=%s", __print_hex(__get_dynamic_array(buf),
+			__get_dynamic_array_len(buf)))
+);
+
+#endif /* _TRACE_DAMON_H */
+
+/* This part must be outside protection */
+#include <trace/define_trace.h>
diff --git a/mm/damon.c b/mm/damon.c
index 748cd8537fee..02bfa12940ea 100644
--- a/mm/damon.c
+++ b/mm/damon.c
@@ -9,6 +9,8 @@
 
 #define pr_fmt(fmt) "damon: " fmt
 
+#define CREATE_TRACE_POINTS
+
 #include <linux/damon.h>
 #include <linux/debugfs.h>
 #include <linux/delay.h>
@@ -20,6 +22,7 @@
 #include <linux/sched/mm.h>
 #include <linux/sched/task.h>
 #include <linux/slab.h>
+#include <trace/events/damon.h>
 
 #define damon_get_task_struct(t) \
 	(get_pid_task(find_vpid(t->pid), PIDTYPE_PID))
@@ -553,6 +556,7 @@ static void damon_flush_rbuffer(struct damon_ctx *ctx)
  */
 static void damon_write_rbuf(struct damon_ctx *ctx, void *data, ssize_t size)
 {
+	trace_damon_write_rbuf(data, size);
 	if (!ctx->rbuf_len || !ctx->rbuf)
 		return;
 	if (ctx->rbuf_offset + size > ctx->rbuf_len)
-- 
2.17.1



^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH v4 08/11] tools: Add a minimal user-space tool for DAMON
  2020-02-10 14:48 [PATCH v4 00/11] Introduce Data Access MONitor (DAMON) sjpark
                   ` (6 preceding siblings ...)
  2020-02-10 14:52 ` [PATCH v4 07/11] mm/damon: Add a tracepoint for result writing sjpark
@ 2020-02-10 14:52 ` sjpark
  2020-02-10 14:53 ` [PATCH v4 09/11] Documentation/admin-guide/mm: Add a document " sjpark
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 19+ messages in thread
From: sjpark @ 2020-02-10 14:52 UTC (permalink / raw)
  To: akpm
  Cc: SeongJae Park, acme, alexander.shishkin, amit, brendan.d.gregg,
	brendanhiggins, cai, colin.king, corbet, dwmw, jolsa, kirill,
	mark.rutland, mgorman, minchan, mingo, namhyung, peterz, rdunlap,
	rostedt, sj38.park, vdavydov.dev, linux-mm, linux-doc,
	linux-kernel

From: SeongJae Park <sjpark@amazon.de>

This commit adds a shallow wrapper python script, ``/tools/damon/damo``
that provides more convenient interface.  Note that it is only aimed to
be used for minimal reference of the DAMON's debugfs interfaces and for
debugging of the DAMON itself.

Signed-off-by: SeongJae Park <sjpark@amazon.de>
---
 tools/damon/.gitignore    |   1 +
 tools/damon/_dist.py      |  35 ++++
 tools/damon/bin2txt.py    |  64 +++++++
 tools/damon/damo          |  37 ++++
 tools/damon/heats.py      | 358 ++++++++++++++++++++++++++++++++++++++
 tools/damon/nr_regions.py |  88 ++++++++++
 tools/damon/record.py     | 219 +++++++++++++++++++++++
 tools/damon/report.py     |  45 +++++
 tools/damon/wss.py        |  94 ++++++++++
 9 files changed, 941 insertions(+)
 create mode 100644 tools/damon/.gitignore
 create mode 100644 tools/damon/_dist.py
 create mode 100644 tools/damon/bin2txt.py
 create mode 100755 tools/damon/damo
 create mode 100644 tools/damon/heats.py
 create mode 100644 tools/damon/nr_regions.py
 create mode 100644 tools/damon/record.py
 create mode 100644 tools/damon/report.py
 create mode 100644 tools/damon/wss.py

diff --git a/tools/damon/.gitignore b/tools/damon/.gitignore
new file mode 100644
index 000000000000..96403d36ff93
--- /dev/null
+++ b/tools/damon/.gitignore
@@ -0,0 +1 @@
+__pycache__/*
diff --git a/tools/damon/_dist.py b/tools/damon/_dist.py
new file mode 100644
index 000000000000..f26409cf9232
--- /dev/null
+++ b/tools/damon/_dist.py
@@ -0,0 +1,35 @@
+#!/usr/bin/env python3
+# SPDX-License-Identifier: GPL-2.0
+
+import os
+import struct
+import subprocess
+
+def access_patterns(f):
+    nr_regions = struct.unpack('I', f.read(4))[0]
+
+    patterns = []
+    for r in range(nr_regions):
+        saddr = struct.unpack('L', f.read(8))[0]
+        eaddr = struct.unpack('L', f.read(8))[0]
+        nr_accesses = struct.unpack('I', f.read(4))[0]
+        patterns.append([eaddr - saddr, nr_accesses])
+    return patterns
+
+def plot_dist(data_file, output_file, xlabel):
+    terminal = output_file.split('.')[-1]
+    if not terminal in ['pdf', 'jpeg', 'png', 'svg']:
+        os.remove(data_file)
+        print("Unsupported plot output type.")
+        exit(-1)
+
+    gnuplot_cmd = """
+    set term %s;
+    set output '%s';
+    set key off;
+    set ylabel 'working set size (bytes)';
+    set xlabel '%s';
+    plot '%s' with linespoints;""" % (terminal, output_file, xlabel, data_file)
+    subprocess.call(['gnuplot', '-e', gnuplot_cmd])
+    os.remove(data_file)
+
diff --git a/tools/damon/bin2txt.py b/tools/damon/bin2txt.py
new file mode 100644
index 000000000000..d5ffac60e02c
--- /dev/null
+++ b/tools/damon/bin2txt.py
@@ -0,0 +1,64 @@
+#!/usr/bin/env python3
+# SPDX-License-Identifier: GPL-2.0
+
+import argparse
+import os
+import struct
+import sys
+
+def parse_time(bindat):
+    "bindat should be 16 bytes"
+    sec = struct.unpack('l', bindat[0:8])[0]
+    nsec = struct.unpack('l', bindat[8:16])[0]
+    return sec * 1000000000 + nsec;
+
+def pr_region(f):
+    saddr = struct.unpack('L', f.read(8))[0]
+    eaddr = struct.unpack('L', f.read(8))[0]
+    nr_accesses = struct.unpack('I', f.read(4))[0]
+    print("%012x-%012x(%10d):\t%d" %
+            (saddr, eaddr, eaddr - saddr, nr_accesses))
+
+def pr_task_info(f):
+    pid = struct.unpack('L', f.read(8))[0]
+    print("pid: ", pid)
+    nr_regions = struct.unpack('I', f.read(4))[0]
+    print("nr_regions: ", nr_regions)
+    for r in range(nr_regions):
+        pr_region(f)
+
+def set_argparser(parser):
+    parser.add_argument('--input', '-i', type=str, metavar='<file>',
+            default='damon.data', help='input file name')
+
+def main(args=None):
+    if not args:
+        parser = argparse.ArgumentParser()
+        set_argparser(parser)
+        args = parser.parse_args()
+
+    file_path = args.input
+
+    if not os.path.isfile(file_path):
+        print('input file (%s) is not exist' % file_path)
+        exit(1)
+
+    with open(file_path, 'rb') as f:
+        start_time = None
+        while True:
+            timebin = f.read(16)
+            if len(timebin) != 16:
+                break
+            time = parse_time(timebin)
+            if not start_time:
+                start_time = time
+                print("start_time: ", start_time)
+            print("rel time: %16d" % (time - start_time))
+            nr_tasks = struct.unpack('I', f.read(4))[0]
+            print("nr_tasks: ", nr_tasks)
+            for t in range(nr_tasks):
+                pr_task_info(f)
+                print("")
+
+if __name__ == '__main__':
+    main()
diff --git a/tools/damon/damo b/tools/damon/damo
new file mode 100755
index 000000000000..58e1099ae5fc
--- /dev/null
+++ b/tools/damon/damo
@@ -0,0 +1,37 @@
+#!/usr/bin/env python3
+# SPDX-License-Identifier: GPL-2.0
+
+import argparse
+
+import record
+import report
+
+class SubCmdHelpFormatter(argparse.RawDescriptionHelpFormatter):
+    def _format_action(self, action):
+        parts = super(argparse.RawDescriptionHelpFormatter,
+                self)._format_action(action)
+        # skip sub parsers help
+        if action.nargs == argparse.PARSER:
+            parts = '\n'.join(parts.split('\n')[1:])
+        return parts
+
+parser = argparse.ArgumentParser(formatter_class=SubCmdHelpFormatter)
+
+subparser = parser.add_subparsers(title='command', dest='command',
+        metavar='<command>')
+subparser.required = True
+
+parser_record = subparser.add_parser('record',
+        help='record data accesses of the given target processes')
+record.set_argparser(parser_record)
+
+parser_report = subparser.add_parser('report',
+        help='report the recorded data accesses in the specified form')
+report.set_argparser(parser_report)
+
+args = parser.parse_args()
+
+if args.command == 'record':
+    record.main(args)
+elif args.command == 'report':
+    report.main(args)
diff --git a/tools/damon/heats.py b/tools/damon/heats.py
new file mode 100644
index 000000000000..48e966c5ca02
--- /dev/null
+++ b/tools/damon/heats.py
@@ -0,0 +1,358 @@
+#!/usr/bin/env python3
+# SPDX-License-Identifier: GPL-2.0
+
+"""
+Transform binary trace data into human readable text that can be used for
+heatmap drawing, or directly plot the data in a heatmap format.
+
+Format of the text is:
+
+    <time> <space> <heat>
+    ...
+
+"""
+
+import argparse
+import os
+import struct
+import subprocess
+import sys
+import tempfile
+
+class HeatSample:
+    space_idx = None
+    sz_time_space = None
+    heat = None
+
+    def __init__(self, space_idx, sz_time_space, heat):
+        if sz_time_space < 0:
+            raise RuntimeError()
+        self.space_idx = space_idx
+        self.sz_time_space = sz_time_space
+        self.heat = heat
+
+    def total_heat(self):
+        return self.heat * self.sz_time_space
+
+    def merge(self, sample):
+        "sample must have a space idx that same to self"
+        heat_sum = self.total_heat() + sample.total_heat()
+        self.heat = heat_sum / (self.sz_time_space + sample.sz_time_space)
+        self.sz_time_space += sample.sz_time_space
+
+def pr_samples(samples, time_idx, time_unit, region_unit):
+    display_time = time_idx * time_unit
+    for idx, sample in enumerate(samples):
+        display_addr = idx * region_unit
+        if not sample:
+            print("%s\t%s\t%s" % (display_time, display_addr, 0.0))
+            continue
+        print("%s\t%s\t%s" % (display_time, display_addr, sample.total_heat() /
+            time_unit / region_unit))
+
+def to_idx(value, min_, unit):
+    return (value - min_) // unit
+
+def read_task_heats(f, pid, aunit, amin, amax):
+    pid_ = struct.unpack('L', f.read(8))[0]
+    nr_regions = struct.unpack('I', f.read(4))[0]
+    if pid_ != pid:
+        f.read(20 * nr_regions)
+        return None
+    samples = []
+    for i in range(nr_regions):
+        saddr = struct.unpack('L', f.read(8))[0]
+        eaddr = struct.unpack('L', f.read(8))[0]
+        eaddr = min(eaddr, amax - 1)
+        heat = struct.unpack('I', f.read(4))[0]
+
+        if eaddr <= amin:
+            continue
+        if saddr >= amax:
+            continue
+        saddr = max(amin, saddr)
+        eaddr = min(amax, eaddr)
+
+        sidx = to_idx(saddr, amin, aunit)
+        eidx = to_idx(eaddr - 1, amin, aunit)
+        for idx in range(sidx, eidx + 1):
+            sa = max(amin + idx * aunit, saddr)
+            ea = min(amin + (idx + 1) * aunit, eaddr)
+            sample = HeatSample(idx, (ea - sa), heat)
+            samples.append(sample)
+    return samples
+
+def parse_time(bindat):
+    sec = struct.unpack('l', bindat[0:8])[0]
+    nsec = struct.unpack('l', bindat[8:16])[0]
+    return sec * 1000000000 + nsec
+
+def apply_samples(target_samples, samples, start_time, end_time, aunit, amin):
+    for s in samples:
+        sample = HeatSample(s.space_idx,
+                s.sz_time_space * (end_time - start_time), s.heat)
+        idx = sample.space_idx
+        if not target_samples[idx]:
+            target_samples[idx] = sample
+        else:
+            target_samples[idx].merge(sample)
+
+def __pr_heats(f, pid, tunit, tmin, tmax, aunit, amin, amax):
+    heat_samples = [None] * ((amax - amin) // aunit)
+
+    start_time = 0
+    end_time = 0
+    last_flushed = -1
+    while True:
+        start_time = end_time
+        timebin = f.read(16)
+        if (len(timebin)) != 16:
+            break
+        end_time = parse_time(timebin)
+        nr_tasks = struct.unpack('I', f.read(4))[0]
+        samples_set = {}
+        for t in range(nr_tasks):
+            samples = read_task_heats(f, pid, aunit, amin, amax)
+            if samples:
+                samples_set[pid] = samples
+        if not pid in samples_set:
+            continue
+        if start_time >= tmax:
+            continue
+        if end_time <= tmin:
+            continue
+        start_time = max(start_time, tmin)
+        end_time = min(end_time, tmax)
+
+        sidx = to_idx(start_time, tmin, tunit)
+        eidx = to_idx(end_time - 1, tmin, tunit)
+        for idx in range(sidx, eidx + 1):
+            if idx != last_flushed:
+                pr_samples(heat_samples, idx, tunit, aunit)
+                heat_samples = [None] * ((amax - amin) // aunit)
+                last_flushed = idx
+            st = max(start_time, tmin + idx * tunit)
+            et = min(end_time, tmin + (idx + 1) * tunit)
+            apply_samples(heat_samples, samples_set[pid], st, et, aunit, amin)
+
+def pr_heats(args):
+    binfile = args.input
+    pid = args.pid
+    tres = args.tres
+    tmin = args.tmin
+    ares = args.ares
+    amin = args.amin
+
+    tunit = (args.tmax - tmin) // tres
+    aunit = (args.amax - amin) // ares
+
+    # Compensate the values so that those fit with the resolution
+    tmax = tmin + tunit * tres
+    amax = amin + aunit * ares
+
+    with open(binfile, 'rb') as f:
+        __pr_heats(f, pid, tunit, tmin, tmax, aunit, amin, amax)
+
+class GuideInfo:
+    pid = None
+    start_time = None
+    end_time = None
+    lowest_addr = None
+    highest_addr = None
+    gaps = None
+
+    def __init__(self, pid, start_time):
+        self.pid = pid
+        self.start_time = start_time
+        self.gaps = []
+
+    def regions(self):
+        regions = []
+        region = [self.lowest_addr]
+        for gap in self.gaps:
+            for idx, point in enumerate(gap):
+                if idx == 0:
+                    region.append(point)
+                    regions.append(region)
+                else:
+                    region = [point]
+        region.append(self.highest_addr)
+        regions.append(region)
+        return regions
+
+    def total_space(self):
+        ret = 0
+        for r in self.regions():
+            ret += r[1] - r[0]
+        return ret
+
+    def __str__(self):
+        lines = ['pid:%d' % self.pid]
+        lines.append('time: %d-%d (%d)' % (self.start_time, self.end_time,
+                    self.end_time - self.start_time))
+        for idx, region in enumerate(self.regions()):
+            lines.append('region\t%2d: %020d-%020d (%d)' %
+                    (idx, region[0], region[1], region[1] - region[0]))
+        return '\n'.join(lines)
+
+def is_overlap(region1, region2):
+    if region1[1] < region2[0]:
+        return False
+    if region2[1] < region1[0]:
+        return False
+    return True
+
+def overlap_region_of(region1, region2):
+    return [max(region1[0], region2[0]), min(region1[1], region2[1])]
+
+def overlapping_regions(regions1, regions2):
+    overlap_regions = []
+    for r1 in regions1:
+        for r2 in regions2:
+            if is_overlap(r1, r2):
+                r1 = overlap_region_of(r1, r2)
+        if r1:
+            overlap_regions.append(r1)
+    return overlap_regions
+
+def get_guide_info(binfile):
+    "Read file, return the set of guide information objects of the data"
+    guides = {}
+    with open(binfile, 'rb') as f:
+        while True:
+            timebin = f.read(16)
+            if len(timebin) != 16:
+                break
+            monitor_time = parse_time(timebin)
+            nr_tasks = struct.unpack('I', f.read(4))[0]
+            for t in range(nr_tasks):
+                pid = struct.unpack('L', f.read(8))[0]
+                nr_regions = struct.unpack('I', f.read(4))[0]
+                if not pid in guides:
+                    guides[pid] = GuideInfo(pid, monitor_time)
+                guide = guides[pid]
+                guide.end_time = monitor_time
+
+                last_addr = None
+                gaps = []
+                for r in range(nr_regions):
+                    saddr = struct.unpack('L', f.read(8))[0]
+                    eaddr = struct.unpack('L', f.read(8))[0]
+                    f.read(4)
+
+                    if not guide.lowest_addr or saddr < guide.lowest_addr:
+                        guide.lowest_addr = saddr
+                    if not guide.highest_addr or eaddr > guide.highest_addr:
+                        guide.highest_addr = eaddr
+
+                    if not last_addr:
+                        last_addr = eaddr
+                        continue
+                    if last_addr != saddr:
+                        gaps.append([last_addr, saddr])
+                    last_addr = eaddr
+
+                if not guide.gaps:
+                    guide.gaps = gaps
+                else:
+                    guide.gaps = overlapping_regions(guide.gaps, gaps)
+    return sorted(list(guides.values()), key=lambda x: x.total_space(),
+                    reverse=True)
+
+def pr_guide(binfile):
+    for guide in get_guide_info(binfile):
+        print(guide)
+
+def region_sort_key(region):
+    return region[1] - region[0]
+
+def set_missed_args(args):
+    if args.pid and args.tmin and args.tmax and args.amin and args.amax:
+        return
+    guides = get_guide_info(args.input)
+    guide = guides[0]
+    if not args.pid:
+        args.pid = guide.pid
+    for g in guides:
+        if g.pid == args.pid:
+            guide = g
+            break
+
+    if not args.tmin:
+        args.tmin = guide.start_time
+    if not args.tmax:
+        args.tmax = guide.end_time
+
+    if not args.amin or not args.amax:
+        region = sorted(guide.regions(), key=lambda x: x[1] - x[0],
+                reverse=True)[0]
+        args.amin = region[0]
+        args.amax = region[1]
+
+def plot_heatmap(data_file, output_file):
+    terminal = output_file.split('.')[-1]
+    if not terminal in ['pdf', 'jpeg', 'png', 'svg']:
+        os.remove(data_file)
+        print("Unsupported plot output type.")
+        exit(-1)
+
+    gnuplot_cmd = """
+    set term %s;
+    set output '%s';
+    set key off;
+    set xrange [0:];
+    set yrange [0:];
+    set xlabel 'Time (ns)';
+    set ylabel 'Virtual Address (bytes)';
+    plot '%s' using 1:2:3 with image;""" % (terminal, output_file, data_file)
+    subprocess.call(['gnuplot', '-e', gnuplot_cmd])
+    os.remove(data_file)
+
+def set_argparser(parser):
+    parser.add_argument('--input', '-i', type=str, metavar='<file>',
+            default='damon.data', help='input file name')
+    parser.add_argument('--pid', metavar='<pid>', type=int,
+            help='pid of target task')
+    parser.add_argument('--tres', metavar='<resolution>', type=int,
+            default=500, help='time resolution of the output')
+    parser.add_argument('--tmin', metavar='<time>', type=lambda x: int(x,0),
+            help='minimal time of the output')
+    parser.add_argument('--tmax', metavar='<time>', type=lambda x: int(x,0),
+            help='maximum time of the output')
+    parser.add_argument('--ares', metavar='<resolution>', type=int, default=500,
+            help='space address resolution of the output')
+    parser.add_argument('--amin', metavar='<address>', type=lambda x: int(x,0),
+            help='minimal space address of the output')
+    parser.add_argument('--amax', metavar='<address>', type=lambda x: int(x,0),
+            help='maximum space address of the output')
+    parser.add_argument('--guide', action='store_true',
+            help='print a guidance for the min/max/resolution settings')
+    parser.add_argument('--heatmap', metavar='<file>', type=str,
+            help='heatmap image file to create')
+
+def main(args=None):
+    if not args:
+        parser = argparse.ArgumentParser()
+        set_argparser(parser)
+        args = parser.parse_args()
+
+    if args.guide:
+        pr_guide(args.input)
+    else:
+        set_missed_args(args)
+        orig_stdout = sys.stdout
+        if args.heatmap:
+            tmp_path = tempfile.mkstemp()[1]
+            tmp_file = open(tmp_path, 'w')
+            sys.stdout = tmp_file
+
+        pr_heats(args)
+
+        if args.heatmap:
+            sys.stdout = orig_stdout
+            tmp_file.flush()
+            tmp_file.close()
+            plot_heatmap(tmp_path, args.heatmap)
+
+if __name__ == '__main__':
+    main()
diff --git a/tools/damon/nr_regions.py b/tools/damon/nr_regions.py
new file mode 100644
index 000000000000..4eb7e824f7b6
--- /dev/null
+++ b/tools/damon/nr_regions.py
@@ -0,0 +1,88 @@
+#!/usr/bin/env python3
+# SPDX-License-Identifier: GPL-2.0
+
+"Print out distribution of the number of regions in the given record"
+
+import argparse
+import struct
+import sys
+import tempfile
+
+import _dist
+
+def set_argparser(parser):
+    parser.add_argument('--input', '-i', type=str, metavar='<file>',
+            default='damon.data', help='input file name')
+    parser.add_argument('--range', '-r', type=int, nargs=3,
+            metavar=('<start>', '<stop>', '<step>'),
+            help='range of percentiles to print')
+    parser.add_argument('--sortby', '-s', choices=['time', 'size'],
+            help='the metric to be used for sorting the number of regions')
+    parser.add_argument('--plot', '-p', type=str, metavar='<file>',
+            help='plot the distribution to an image file')
+
+def main(args=None):
+    if not args:
+        parser = argparse.ArgumentParser()
+        set_argparser(parser)
+        args = parser.parse_args()
+
+    percentiles = [0, 25, 50, 75, 100]
+
+    file_path = args.input
+    if args.range:
+        percentiles = range(args.range[0], args.range[1], args.range[2])
+    nr_regions_sort = True
+    if args.sortby == 'time':
+        nr_regions_sort = False
+
+    pid_pattern_map = {}
+    with open(file_path, 'rb') as f:
+        start_time = None
+        while True:
+            timebin = f.read(16)
+            if len(timebin) != 16:
+                break
+            nr_tasks = struct.unpack('I', f.read(4))[0]
+            for t in range(nr_tasks):
+                pid = struct.unpack('L', f.read(8))[0]
+                if not pid in pid_pattern_map:
+                    pid_pattern_map[pid] = []
+                pid_pattern_map[pid].append(_dist.access_patterns(f))
+
+    orig_stdout = sys.stdout
+    if args.plot:
+        tmp_path = tempfile.mkstemp()[1]
+        tmp_file = open(tmp_path, 'w')
+        sys.stdout = tmp_file
+
+    print('# <percentile> <# regions>')
+    for pid in pid_pattern_map.keys():
+        # Skip firs 20 regions as those would not adaptively adjusted
+        snapshots = pid_pattern_map[pid][20:]
+        nr_regions_dist = []
+        for snapshot in snapshots:
+            nr_regions_dist.append(len(snapshot))
+        if nr_regions_sort:
+            nr_regions_dist.sort(reverse=False)
+
+        print('# pid\t%s' % pid)
+        print('# avr:\t%d' % (sum(nr_regions_dist) / len(nr_regions_dist)))
+        for percentile in percentiles:
+            thres_idx = int(percentile / 100.0 * len(nr_regions_dist))
+            if thres_idx == len(nr_regions_dist):
+                thres_idx -= 1
+            threshold = nr_regions_dist[thres_idx]
+            print('%d\t%d' % (percentile, nr_regions_dist[thres_idx]))
+
+    if args.plot:
+        sys.stdout = orig_stdout
+        tmp_file.flush()
+        tmp_file.close()
+        xlabel = 'runtime (percent)'
+        if nr_regions_sort:
+            xlabel = 'percentile'
+        _dist.plot_dist(tmp_path, args.plot, xlabel)
+
+if __name__ == '__main__':
+    main()
diff --git a/tools/damon/record.py b/tools/damon/record.py
new file mode 100644
index 000000000000..0815434b76b2
--- /dev/null
+++ b/tools/damon/record.py
@@ -0,0 +1,219 @@
+#!/usr/bin/env python3
+# SPDX-License-Identifier: GPL-2.0
+
+"""
+Record data access patterns of the target process.
+"""
+
+import argparse
+import copy
+import os
+import signal
+import subprocess
+import time
+
+debugfs_attrs = None
+debugfs_record = None
+debugfs_pids = None
+debugfs_monitor_on = None
+
+def set_target_pid(pid):
+    return subprocess.call('echo %s > %s' % (pid, debugfs_pids), shell=True,
+            executable='/bin/bash')
+
+def turn_damon(on_off):
+    return subprocess.call("echo %s > %s" % (on_off, debugfs_monitor_on),
+            shell=True, executable="/bin/bash")
+
+def is_damon_running():
+    with open(debugfs_monitor_on, 'r') as f:
+        return f.read().strip() == 'on'
+
+def do_record(target, is_target_cmd, attrs, old_attrs):
+    if os.path.isfile(attrs.rfile_path):
+        os.rename(attrs.rfile_path, attrs.rfile_path + '.old')
+
+    if attrs.apply():
+        print('attributes (%s) failed to be applied' % attrs)
+        cleanup_exit(old_attrs, -1)
+    print('# damon attrs: %s' % attrs)
+    if is_target_cmd:
+        p = subprocess.Popen(target, shell=True, executable='/bin/bash')
+        target = p.pid
+    if set_target_pid(target):
+        print('pid setting (%s) failed' % target)
+        cleanup_exit(old_attrs, -2)
+    if turn_damon('on'):
+        print('could not turn on damon' % target)
+        cleanup_exit(old_attrs, -3)
+    if is_target_cmd:
+        p.wait()
+    while True:
+        # damon will turn it off by itself if the target tasks are terminated.
+        if not is_damon_running():
+            break
+        time.sleep(1)
+
+    cleanup_exit(old_attrs, 0)
+
+class Attrs:
+    sample_interval = None
+    aggr_interval = None
+    regions_update_interval = None
+    min_nr_regions = None
+    max_nr_regions = None
+    rbuf_len = None
+    rfile_path = None
+
+    def __init__(self, s, a, r, n, x, l, f):
+        self.sample_interval = s
+        self.aggr_interval = a
+        self.regions_update_interval = r
+        self.min_nr_regions = n
+        self.max_nr_regions = x
+        self.rbuf_len = l
+        self.rfile_path = f
+
+    def __str__(self):
+        return "%s %s %s %s %s %s %s" % (self.sample_interval, self.aggr_interval,
+                self.regions_update_interval, self.min_nr_regions,
+                self.max_nr_regions, self.rbuf_len, self.rfile_path)
+
+    def attr_str(self):
+        return "%s %s %s %s %s " % (self.sample_interval, self.aggr_interval,
+                self.regions_update_interval, self.min_nr_regions,
+                self.max_nr_regions)
+
+    def record_str(self):
+        return '%s %s ' % (self.rbuf_len, self.rfile_path)
+
+    def apply(self):
+        ret = subprocess.call('echo %s > %s' % (self.attr_str(), debugfs_attrs),
+                shell=True, executable='/bin/bash')
+        if ret:
+            return ret
+        return subprocess.call('echo %s > %s' % (self.record_str(),
+            debugfs_record), shell=True, executable='/bin/bash')
+
+def current_attrs():
+    with open(debugfs_attrs, 'r') as f:
+        attrs = f.read().split()
+    attrs = [int(x) for x in attrs]
+
+    with open(debugfs_record, 'r') as f:
+        rattrs = f.read().split()
+    attrs.append(int(rattrs[0]))
+    attrs.append(rattrs[1])
+    return Attrs(*attrs)
+
+def cmd_args_to_attrs(args):
+    "Generate attributes based on current attributes and command arguments"
+    a = current_attrs()
+    if args.sample:
+        a.sample_interval = args.sample
+    if args.aggr:
+        a.aggr_interval = args.aggr
+    if args.updr:
+        a.regions_update_interval = args.updr
+    if args.minr:
+        a.min_nr_regions = args.minr
+    if args.maxr:
+        a.max_nr_regions = args.maxr
+    if args.rbuf:
+        a.rbuf_len = args.rbuf
+    if args.out:
+        if not os.path.isabs(args.out):
+            args.out = os.path.join(os.getcwd(), args.out)
+        a.rfile_path = args.out
+    return a
+
+def cleanup_exit(orig_attrs, exit_code):
+    if is_damon_running():
+        if turn_damon('off'):
+            print('failed to turn damon off!')
+    if orig_attrs:
+        if orig_attrs.apply():
+            print('original attributes (%s) restoration failed!' % orig_attrs)
+    exit(exit_code)
+
+def sighandler(signum, frame):
+    print('\nsignal %s received' % signum)
+    cleanup_exit(orig_attrs, signum)
+
+def chk_update_debugfs(debugfs):
+    global debugfs_attrs
+    global debugfs_record
+    global debugfs_pids
+    global debugfs_monitor_on
+
+    debugfs_damon = os.path.join(debugfs, 'damon')
+    debugfs_attrs = os.path.join(debugfs_damon, 'attrs')
+    debugfs_record = os.path.join(debugfs_damon, 'record')
+    debugfs_pids = os.path.join(debugfs_damon, 'pids')
+    debugfs_monitor_on = os.path.join(debugfs_damon, 'monitor_on')
+
+    if not os.path.isdir(debugfs_damon):
+        print("damon debugfs dir (%s) not found", debugfs_damon)
+        exit(1)
+
+    for f in [debugfs_attrs, debugfs_record, debugfs_pids, debugfs_monitor_on]:
+        if not os.path.isfile(f):
+            print("damon debugfs file (%s) not found" % f)
+            exit(1)
+
+def chk_permission():
+    if os.geteuid() != 0:
+        print("Run as root")
+        exit(1)
+
+def set_argparser(parser):
+    parser.add_argument('target', type=str, metavar='<target>',
+            help='the target command or the pid to record')
+    parser.add_argument('-s', '--sample', metavar='<interval>', type=int,
+            help='sampling interval')
+    parser.add_argument('-a', '--aggr', metavar='<interval>', type=int,
+            help='aggregate interval')
+    parser.add_argument('-u', '--updr', metavar='<interval>', type=int,
+            help='regions update interval')
+    parser.add_argument('-n', '--minr', metavar='<# regions>', type=int,
+            help='minimal number of regions')
+    parser.add_argument('-m', '--maxr', metavar='<# regions>', type=int,
+            help='maximum number of regions')
+    parser.add_argument('-l', '--rbuf', metavar='<len>', type=int,
+            default=1024*1024, help='length of record result buffer')
+    parser.add_argument('-o', '--out', metavar='<file path>', type=str,
+            default='damon.data', help='output file path')
+    parser.add_argument('-d', '--debugfs', metavar='<debugfs>', type=str,
+            default='/sys/kernel/debug', help='debugfs mounted path')
+
+def main(args=None):
+    global orig_attrs
+    if not args:
+        parser = argparse.ArgumentParser()
+        set_argparser(parser)
+        args = parser.parse_args()
+
+    chk_permission()
+    chk_update_debugfs(args.debugfs)
+
+    signal.signal(signal.SIGINT, sighandler)
+    signal.signal(signal.SIGTERM, sighandler)
+    orig_attrs = current_attrs()
+
+    new_attrs = cmd_args_to_attrs(args)
+    target = args.target
+
+    target_fields = target.split()
+    if not subprocess.call('which %s > /dev/null' % target_fields[0],
+            shell=True, executable='/bin/bash'):
+        do_record(target, True, new_attrs, orig_attrs)
+    else:
+        try:
+            pid = int(target)
+        except:
+            print('target \'%s\' is neither a command, nor a pid' % target)
+            exit(1)
+        do_record(target, False, new_attrs, orig_attrs)
+
+if __name__ == '__main__':
+    main()
diff --git a/tools/damon/report.py b/tools/damon/report.py
new file mode 100644
index 000000000000..c661c7b2f1af
--- /dev/null
+++ b/tools/damon/report.py
@@ -0,0 +1,45 @@
+#!/usr/bin/env python3
+# SPDX-License-Identifier: GPL-2.0
+
+import argparse
+
+import bin2txt
+import heats
+import nr_regions
+import wss
+
+def set_argparser(parser):
+    subparsers = parser.add_subparsers(title='report type', dest='report_type',
+            metavar='<report type>', help='the type of the report to generate')
+    subparsers.required = True
+
+    parser_raw = subparsers.add_parser('raw', help='human readable raw data')
+    bin2txt.set_argparser(parser_raw)
+
+    parser_heats = subparsers.add_parser('heats', help='heats of regions')
+    heats.set_argparser(parser_heats)
+
+    parser_wss = subparsers.add_parser('wss', help='working set size')
+    wss.set_argparser(parser_wss)
+
+    parser_nr_regions = subparsers.add_parser('nr_regions',
+            help='number of regions')
+    nr_regions.set_argparser(parser_nr_regions)
+
+def main(args=None):
+    if not args:
+        parser = argparse.ArgumentParser()
+        set_argparser(parser)
+        args = parser.parse_args()
+
+    if args.report_type == 'raw':
+        bin2txt.main(args)
+    elif args.report_type == 'heats':
+        heats.main(args)
+    elif args.report_type == 'wss':
+        wss.main(args)
+    elif args.report_type == 'nr_regions':
+        nr_regions.main(args)
+
+if __name__ == '__main__':
+    main()
diff --git a/tools/damon/wss.py b/tools/damon/wss.py
new file mode 100644
index 000000000000..fd5a0320070d
--- /dev/null
+++ b/tools/damon/wss.py
@@ -0,0 +1,94 @@
+#!/usr/bin/env python3
+# SPDX-License-Identifier: GPL-2.0
+
+"Print out the distribution of the working set sizes of the given trace"
+
+import argparse
+import struct
+import sys
+import tempfile
+
+import _dist
+
+def set_argparser(parser):
+    parser.add_argument('--input', '-i', type=str, metavar='<file>',
+            default='damon.data', help='input file name')
+    parser.add_argument('--range', '-r', type=int, nargs=3,
+            metavar=('<start>', '<stop>', '<step>'),
+            help='range of wss percentiles to print')
+    parser.add_argument('--sortby', '-s', choices=['time', 'size'],
+            help='the metric to be used for the sort of the working set sizes')
+    parser.add_argument('--plot', '-p', type=str, metavar='<file>',
+            help='plot the distribution to an image file')
+
+def main(args=None):
+    if not args:
+        parser = argparse.ArgumentParser()
+        set_argparser(parser)
+        args = parser.parse_args()
+
+    percentiles = [0, 25, 50, 75, 100]
+
+    file_path = args.input
+    if args.range:
+        percentiles = range(args.range[0], args.range[1], args.range[2])
+    wss_sort = True
+    if args.sortby == 'time':
+        wss_sort = False
+
+    pid_pattern_map = {}
+    with open(file_path, 'rb') as f:
+        start_time = None
+        while True:
+            timebin = f.read(16)
+            if len(timebin) != 16:
+                break
+            nr_tasks = struct.unpack('I', f.read(4))[0]
+            for t in range(nr_tasks):
+                pid = struct.unpack('L', f.read(8))[0]
+                if not pid in pid_pattern_map:
+                    pid_pattern_map[pid] = []
+                pid_pattern_map[pid].append(_dist.access_patterns(f))
+
+    orig_stdout = sys.stdout
+    if args.plot:
+        tmp_path = tempfile.mkstemp()[1]
+        tmp_file = open(tmp_path, 'w')
+        sys.stdout = tmp_file
+
+    print('# <percentile> <wss>')
+    for pid in pid_pattern_map.keys():
+        # Skip first 20 snapshots as regions may not adjusted yet.
+        snapshots = pid_pattern_map[pid][20:]
+        wss_dist = []
+        for snapshot in snapshots:
+            wss = 0
+            for p in snapshot:
+                # Ignore regions not accessed
+                if p[1] <= 0:
+                    continue
+                wss += p[0]
+            wss_dist.append(wss)
+        if wss_sort:
+            wss_dist.sort(reverse=False)
+
+        print('# pid\t%s' % pid)
+        print('# avr:\t%d' % (sum(wss_dist) / len(wss_dist)))
+        for percentile in percentiles:
+            thres_idx = int(percentile / 100.0 * len(wss_dist))
+            if thres_idx == len(wss_dist):
+                thres_idx -= 1
+            threshold = wss_dist[thres_idx]
+            print('%d\t%d' % (percentile, wss_dist[thres_idx]))
+
+    if args.plot:
+        sys.stdout = orig_stdout
+        tmp_file.flush()
+        tmp_file.close()
+        xlabel = 'runtime (percent)'
+        if wss_sort:
+            xlabel = 'percentile'
+        _dist.plot_dist(tmp_path, args.plot, xlabel)
+
+if __name__ == '__main__':
+    main()
-- 
2.17.1



^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH v4 09/11] Documentation/admin-guide/mm: Add a document for DAMON
  2020-02-10 14:48 [PATCH v4 00/11] Introduce Data Access MONitor (DAMON) sjpark
                   ` (7 preceding siblings ...)
  2020-02-10 14:52 ` [PATCH v4 08/11] tools: Add a minimal user-space tool for DAMON sjpark
@ 2020-02-10 14:53 ` " sjpark
  2020-02-10 14:53 ` [PATCH v4 10/11] mm/damon: Add kunit tests sjpark
  2020-02-10 14:54 ` [PATCH v4 11/11] MAINTAINERS: Update for DAMON sjpark
  10 siblings, 0 replies; 19+ messages in thread
From: sjpark @ 2020-02-10 14:53 UTC (permalink / raw)
  To: akpm
  Cc: SeongJae Park, acme, alexander.shishkin, amit, brendan.d.gregg,
	brendanhiggins, cai, colin.king, corbet, dwmw, jolsa, kirill,
	mark.rutland, mgorman, minchan, mingo, namhyung, peterz, rdunlap,
	rostedt, sj38.park, vdavydov.dev, linux-mm, linux-doc,
	linux-kernel

From: SeongJae Park <sjpark@amazon.de>

This commit adds a simple document for DAMON under
`Documentation/admin-guide/mm`.

Signed-off-by: SeongJae Park <sjpark@amazon.de>
---
 .../admin-guide/mm/data_access_monitor.rst    | 414 ++++++++++++++++++
 Documentation/admin-guide/mm/index.rst        |   1 +
 2 files changed, 415 insertions(+)
 create mode 100644 Documentation/admin-guide/mm/data_access_monitor.rst

diff --git a/Documentation/admin-guide/mm/data_access_monitor.rst b/Documentation/admin-guide/mm/data_access_monitor.rst
new file mode 100644
index 000000000000..4d836c3866e2
--- /dev/null
+++ b/Documentation/admin-guide/mm/data_access_monitor.rst
@@ -0,0 +1,414 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+==========================
+DAMON: Data Access MONitor
+==========================
+
+Introduction
+============
+
+Memory management decisions can normally be more efficient if finer data access
+information is available.  However, because finer information usually comes
+with higher overhead, most systems including Linux made a tradeoff: Forgive
+some wise decisions and use coarse information and/or light-weight heuristics.
+
+A number of experimental data access pattern awared memory management
+optimizations say the sacrifices are
+huge (2.55x slowdown).  However, none of those has successfully adopted to
+Linux kernel mainly due to the absence of a scalable and efficient data access
+monitoring mechanism.
+
+DAMON is a data access monitoring solution for the problem.  It is 1) accurate
+enough for the DRAM level memory management, 2) light-weight enough to be
+applied online, and 3) keeps predefined upper-bound overhead regardless of the
+size of target workloads (thus scalable).
+
+DAMON is implemented as a standalone kernel module and provides several simple
+interfaces.  Owing to that, though it has mainly designed for the kernel's
+memory management mechanisms, it can be also used for a wide range of user
+space programs and people.
+
+
+Frequently Asked Questions
+==========================
+
+Q: Why not integrated with perf?
+A: From the perspective of perf like profilers, DAMON can be thought of as a
+data source in kernel, like tracepoints, pressure stall information (psi), or
+idle page tracking.  Thus, it can be easily integrated with those.  However,
+this patchset doesn't provide a fancy perf integration because current step of
+DAMON development is focused on its core logic only.  That said, DAMON already
+provides two interfaces for user space programs, which based on debugfs and
+tracepoint, respectively.  Using the tracepoint interface, you can use DAMON
+with perf.  This patchset also provides the debugfs interface based user space
+tool for DAMON.  It can be used to record, visualize, and analyze data access
+pattern of target processes in a convenient way.
+
+Q: Why a new module, instead of extending perf or other tools?
+A: First, DAMON aims to be used by other programs including the kernel.
+Therefore, having dependency to specific tools like perf is not desirable.
+Second, because it need to be lightweight as much as possible so that it can be
+used online, any unnecessary overhead such as kernel - user space context
+switching cost should be avoided.  These are the two most biggest reasons why
+DAMON is implemented in the kernel space.  The idle page tracking subsystem
+would be the kernel module that most seems similar to DAMON.  However, it's own
+interface is not compatible with DAMON.  Also, the internal implementation of
+it has no common part to be reused by DAMON.
+
+Q: Can 'perf mem' provide the data required for DAMON?
+A: On the systems supporting 'perf mem', yes.  DAMON is using the PTE Accessed
+bits in low level.  Other H/W or S/W features that can be used for the purpose
+could be used.  However, as explained with above question, DAMON need to be
+implemented in the kernel space.
+
+
+Expected Use-cases
+==================
+
+A straightforward usecase of DAMON would be the program behavior analysis.
+With the DAMON output, users can confirm whether the program is running as
+intended or not.  This will be useful for debuggings and tests of design
+points.
+
+The monitored results can also be useful for counting the dynamic working set
+size of workloads.  For the administration of memory overcommitted systems or
+selection of the environments (e.g., containers providing different amount of
+memory) for your workloads, this will be useful.
+
+If you are a programmer, you can optimize your program by managing the memory
+based on the actual data access pattern.  For example, you can identify the
+dynamic hotness of your data using DAMON and call ``mlock()`` to keep your hot
+data in DRAM, or call ``madvise()`` with ``MADV_PAGEOUT`` to proactively
+reclaim cold data.  Even though your program is guaranteed to not encounter
+memory pressure, you can still improve the performance by applying the DAMON
+outputs for call of ``MADV_HUGEPAGE`` and ``MADV_NOHUGEPAGE``.  More creative
+optimizations would be possible.  Our evaluations of DAMON includes a
+straightforward optimization using the ``mlock()``.  Please refer to the below
+Evaluation section for more detail.
+
+As DAMON incurs very low overhead, such optimizations can be applied not only
+offline, but also online.  Also, there is no reason to limit such optimizations
+to the user space.  Several parts of the kernel's memory management mechanisms
+could be also optimized using DAMON. The reclamation, the THP (de)promotion
+decisions, and the compaction would be such a candidates.
+
+
+Mechanisms of DAMON
+===================
+
+
+Basic Access Check
+------------------
+
+DAMON basically reports what pages are how frequently accessed.  The report is
+passed to users in binary format via a ``result file`` which users can set it's
+path.  Note that the frequency is not an absolute number of accesses, but a
+relative frequency among the pages of the target workloads.
+
+Users can also control the resolution of the reports by setting two time
+intervals, ``sampling interval`` and ``aggregation interval``.  In detail,
+DAMON checks access to each page per ``sampling interval``, aggregates the
+results (counts the number of the accesses to each page), and reports the
+aggregated results per ``aggregation interval``.  For the access check of each
+page, DAMON uses the Accessed bits of PTEs.
+
+This is thus similar to the previously mentioned periodic access checks based
+mechanisms, which overhead is increasing as the size of the target process
+grows.
+
+
+Region Based Sampling
+---------------------
+
+To avoid the unbounded increase of the overhead, DAMON groups a number of
+adjacent pages that assumed to have same access frequencies into a region.  As
+long as the assumption (pages in a region have same access frequencies) is
+kept, only one page in the region is required to be checked.  Thus, for each
+``sampling interval``, DAMON randomly picks one page in each region and clears
+its Accessed bit.  After one more ``sampling interval``, DAMON reads the
+Accessed bit of the page and increases the access frequency of the region if
+the bit has set meanwhile.  Therefore, the monitoring overhead is controllable
+by setting the number of regions.  DAMON allows users to set the minimal and
+maximum number of regions for the trade-off.
+
+Except the assumption, this is almost same with the above-mentioned
+miniature-like static region based sampling.  In other words, this scheme
+cannot preserve the quality of the output if the assumption is not guaranteed.
+
+
+Adaptive Regions Adjustment
+---------------------------
+
+At the beginning of the monitoring, DAMON constructs the initial regions by
+evenly splitting the memory mapped address space of the process into the
+user-specified minimal number of regions.  In this initial state, the
+assumption is normally not kept and thus the quality could be low.  To keep the
+assumption as much as possible, DAMON adaptively merges and splits each region.
+For each ``aggregation interval``, it compares the access frequencies of
+adjacent regions and merges those if the frequency difference is small.  Then,
+after it reports and clears the aggregated access frequency of each region, it
+splits each region into two regions if the total number of regions is smaller
+than the half of the user-specified maximum number of regions.
+
+In this way, DAMON provides its best-effort quality and minimal overhead while
+keeping the bounds users set for their trade-off.
+
+
+Applying Dynamic Memory Mappings
+--------------------------------
+
+Only a number of small parts in the super-huge virtual address space of the
+processes is mapped to physical memory and accessed.  Thus, tracking the
+unmapped address regions is just wasteful.  However, tracking every memory
+mapping change might incur an overhead.  For the reason, DAMON applies the
+dynamic memory mapping changes to the tracking regions only for each of an
+user-specified time interval (``regions update interval``).
+
+
+``debugfs`` Interface
+=====================
+
+DAMON exports four files, ``attrs``, ``pids``, ``record``, and ``monitor_on``
+under its debugfs directory, ``<debugfs>/damon/``.
+
+Attributes
+----------
+
+Users can read and write the ``sampling interval``, ``aggregation interval``,
+``regions update interval``, and min/max number of monitoring target regions by
+reading from and writing to the ``attrs`` file.  For example, below commands
+set those values to 5 ms, 100 ms, 1,000 ms, 10, 1000 and check it again::
+
+    # cd <debugfs>/damon
+    # echo 5000 100000 1000000 10 1000 > attrs
+    # cat attrs
+    5000 100000 1000000 10 1000
+
+Target PIDs
+-----------
+
+Users can read and write the pids of current monitoring target processes by
+reading from and writing to the ``pids`` file.  For example, below commands set
+processes having pids 42 and 4242 as the processes to be monitored and check it
+again::
+
+    # cd <debugfs>/damon
+    # echo 42 4242 > pids
+    # cat pids
+    42 4242
+
+Note that setting the pids doesn't starts the monitoring.
+
+Record
+------
+
+DAMON support direct monitoring result record feature.  The recorded results
+are first written to a buffer and flushed to a file in batch.  Users can set
+the size of the buffer and the path to the result file by reading from and
+writing to the ``record`` file.  For example, below commands set the buffer to
+be 4 KiB and the result to be saved in ``/damon.data``.
+
+    # cd <debugfs>/damon
+    # echo "4096 /damon.data" > pids
+    # cat record
+    4096 /damon.data
+
+Turning On/Off
+--------------
+
+You can check current status, start and stop the monitoring by reading from and
+writing to the ``monitor_on`` file.  Writing ``on`` to the file starts DAMON to
+monitor the target processes with the attributes.  Writing ``off`` to the file
+stops DAMON.  DAMON also stops if every target processes is be terminated.
+Below example commands turn on, off, and check status of DAMON::
+
+    # cd <debugfs>/damon
+    # echo on > monitor_on
+    # echo off > monitor_on
+    # cat monitor_on
+    off
+
+Please note that you cannot write to the ``attrs`` and ``pids`` files while the
+monitoring is turned on.  If you write to the files while DAMON is running,
+``-EINVAL`` will be returned.
+
+
+User Space Tool for DAMON
+=========================
+
+There is a user space tool for DAMON, ``/tools/damon/damo``.  It provides
+another user interface which more convenient than the debugfs interface.
+Nevertheless, note that it is only aimed to be used for minimal reference of
+the DAMON's debugfs interfaces and for tests of the DAMON itself.  Based on the
+debugfs interface, you can create another cool and more convenient user space
+tools.
+
+The interface of the tool is basically subcommand based.  You can almost always
+use ``-h`` option to get help of the use of each subcommand.  Currently, it
+supports two subcommands, ``record`` and ``report``.
+
+
+Recording Data Access Pattern
+-----------------------------
+
+The ``record`` subcommand records the data access pattern of target process in
+a file (``./damon.data`` by default) using DAMON.  You can specifies the target
+as either pid or a command for an execution of the process.  Below example
+shows a command target usage::
+
+    # cd <kernel>/tools/damon/
+    # ./damo record "sleep 5"
+
+The tool will execute ``sleep 5`` by itself and record the data access patterns
+of the process.  Below example shows a pid target usage::
+
+    # sleep 5 &
+    # ./damo record `pidof sleep`
+
+You can set more detailed attributes and path to the recorded data file using
+optional arguments to the subcommand.  Use the ``-h`` option for more help.
+
+
+Analyzing Data Access Pattern
+-----------------------------
+
+The ``report`` subcommand reads a data access pattern record file (if not
+explicitly specified, reads ``./damon.data`` file if exists) and generates
+reports of various types.  You can specify what type of report you want using
+sub-subcommand to ``report`` subcommand.  For supported types, pass the ``-h``
+option to ``report`` subcommand.
+
+
+raw
+~~~
+
+``raw`` sub-subcommand simply transforms the record, which is storing the data
+access patterns in binary format to human readable text.  For example::
+
+    $ ./damo report raw
+    start_time:  193485829398
+    rel time:                0
+    nr_tasks:  1
+    pid:  1348
+    nr_regions:  4
+    560189609000-56018abce000(  22827008):  0
+    7fbdff59a000-7fbdffaf1a00(   5601792):  0
+    7fbdffaf1a00-7fbdffbb5000(    800256):  1
+    7ffea0dc0000-7ffea0dfd000(    249856):  0
+
+    rel time:        100000731
+    nr_tasks:  1
+    pid:  1348
+    nr_regions:  6
+    560189609000-56018abce000(  22827008):  0
+    7fbdff59a000-7fbdff8ce933(   3361075):  0
+    7fbdff8ce933-7fbdffaf1a00(   2240717):  1
+    7fbdffaf1a00-7fbdffb66d99(    480153):  0
+    7fbdffb66d99-7fbdffbb5000(    320103):  1
+    7ffea0dc0000-7ffea0dfd000(    249856):  0
+
+The first line shows recording started timestamp (nanosecond).  Records of data
+access patterns are following this.  Each record is sperated by a blank line.
+Each record first specifies the recorded time (``rel time``), number of
+monitored tasks in this record (``nr_tasks``).  Multiple number of records of
+data access pattern for each task continue.  Each data access pattern for each
+task shows first it's pid (``pid``) and number of monitored virtual address
+regions in this access pattern (``nr_regions``).  After that, each line shows
+start/end address, size, and number of monitored accesses to the region for
+each of the regions.
+
+
+heats
+~~~~~
+
+The ``raw`` type shows detailed information but it is exhaustive to manually
+read and analyzed.  For the reason, ``heats`` plots the data in heatmap form,
+using time as x-axis, virtual address as y-axis, and access frequency as
+z-axis.  Also, users set the resolution and start/end point of each axis via
+optional arguments.  For example::
+
+    $ ./damo report heats --tres 3 --ares 3
+    0               0               0.0
+    0               7609002         0.0
+    0               15218004        0.0
+    66112620851     0               0.0
+    66112620851     7609002         0.0
+    66112620851     15218004        0.0
+    132225241702    0               0.0
+    132225241702    7609002         0.0
+    132225241702    15218004        0.0
+
+This command shows the recorded access pattern of the ``sleep`` command using 3
+data points for each of time axis and address axis.  Therefore, it shows 9 data
+points in total.
+
+Users can easily converts this text output into heatmap image or other 3D
+representation using various tools such as 'gnuplot'.  ``raw`` sub-subcommand
+also provides 'gnuplot' based heatmap image creation.  For this, you can use
+``--heatmap`` option.  Also, note that because it uses 'gnuplot' internally, it
+will fail if 'gnuplot' is not installed on your system.  For example::
+
+    $ ./damo report heats --heatmap heatmap.png
+
+Creates ``heatmap.png`` file containing the heatmap image.  It supports
+``pdf``, ``png``, ``jpeg``, and ``svg``.
+
+For proper zoom in / zoom out, you need to see the layout of the record.  For
+that, use '--guide' option.  If the option is given, it will provide useful
+information about the records in the record file.  For example::
+
+    $ ./damo report heats --guide
+    pid:1348
+    time: 193485829398-198337863555 (4852034157)
+    region   0: 00000094564599762944-00000094564622589952 (22827008)
+    region   1: 00000140454009610240-00000140454016012288 (6402048)
+    region   2: 00000140731597193216-00000140731597443072 (249856)
+
+The output shows monitored regions (start and end addresses in byte) and
+monitored time duration (start and end time in nanosecond) of each target task.
+Therefore, it would be wise to plot only each region rather than plotting
+entire address space in one heatmap because the gaps between the regions are so
+huge in this case.
+
+
+wss
+~~~
+
+The ``wss`` type shows the distribution or time-varying working set sizes of
+the recorded workload using the records.  For example::
+
+    $ ./damo report wss
+    # <percentile> <wss>
+    # pid   1348
+    # avr:  66228
+    0       0
+    25      0
+    50      0
+    75      0
+    100     1920615
+
+Without any option, it shows the distribution of the working set sizes as
+above.  Basically it shows 0th, 25th, 50th, 75th, and 100th percentile and
+average of the measured working set sizes in the access pattern records.  In
+this case, the working set size was zero for 75th percentile but 1,920,615
+bytes in max and 66,228 in average.
+
+By setting the sort key of the percentile using '--sortby', you can also see
+how the working set size is chronologically changed.  For example::
+
+    $ ./damo report wss --sortby time
+    # <percentile> <wss>
+    # pid   1348
+    # avr:  66228
+    0       0
+    25      0
+    50      0
+    75      0
+    100     0
+
+The average is still 66,228.  And, because we sorted the working set using
+recorded time and the access is very short, we cannot show when the access
+made.
+
+Users can specify the resolution of the distribution (``--range``).  It also
+supports 'gnuplot' based simple visualization (``--plot``) of the distribution.
diff --git a/Documentation/admin-guide/mm/index.rst b/Documentation/admin-guide/mm/index.rst
index 11db46448354..d3d0ba373eb6 100644
--- a/Documentation/admin-guide/mm/index.rst
+++ b/Documentation/admin-guide/mm/index.rst
@@ -27,6 +27,7 @@ the Linux memory management.
 
    concepts
    cma_debugfs
+   data_access_monitor
    hugetlbpage
    idle_page_tracking
    ksm
-- 
2.17.1



^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH v4 10/11] mm/damon: Add kunit tests
  2020-02-10 14:48 [PATCH v4 00/11] Introduce Data Access MONitor (DAMON) sjpark
                   ` (8 preceding siblings ...)
  2020-02-10 14:53 ` [PATCH v4 09/11] Documentation/admin-guide/mm: Add a document " sjpark
@ 2020-02-10 14:53 ` sjpark
  2020-02-11 22:21   ` Brendan Higgins
                     ` (2 more replies)
  2020-02-10 14:54 ` [PATCH v4 11/11] MAINTAINERS: Update for DAMON sjpark
  10 siblings, 3 replies; 19+ messages in thread
From: sjpark @ 2020-02-10 14:53 UTC (permalink / raw)
  To: akpm
  Cc: SeongJae Park, acme, alexander.shishkin, amit, brendan.d.gregg,
	brendanhiggins, cai, colin.king, corbet, dwmw, jolsa, kirill,
	mark.rutland, mgorman, minchan, mingo, namhyung, peterz, rdunlap,
	rostedt, sj38.park, vdavydov.dev, linux-mm, linux-doc,
	linux-kernel

From: SeongJae Park <sjpark@amazon.de>

This commit adds kunit based unit tests for DAMON.

Signed-off-by: SeongJae Park <sjpark@amazon.de>
---
 mm/Kconfig      |  11 +
 mm/damon-test.h | 604 ++++++++++++++++++++++++++++++++++++++++++++++++
 mm/damon.c      |   2 +
 3 files changed, 617 insertions(+)
 create mode 100644 mm/damon-test.h

diff --git a/mm/Kconfig b/mm/Kconfig
index 387d469f40ec..b279ab9c78d0 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -751,4 +751,15 @@ config DAMON
 	  be 1) accurate enough to be useful for performance-centric domains,
 	  and 2) sufficiently light-weight so that it can be applied online.
 
+config DAMON_KUNIT_TEST
+	bool "Test for damon"
+	depends on DAMON && KUNIT
+	help
+	  This builds the DAMON Kunit test suite.
+
+	  For more information on KUnit and unit tests in general, please refer
+	  to the KUnit documentation.
+
+	  If unsure, say N.
+
 endmenu
diff --git a/mm/damon-test.h b/mm/damon-test.h
new file mode 100644
index 000000000000..c7dc21325c77
--- /dev/null
+++ b/mm/damon-test.h
@@ -0,0 +1,604 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Data Access Monitor Unit Tests
+ *
+ * Copyright 2019 Amazon.com, Inc. or its affiliates.  All rights reserved.
+ *
+ * Author: SeongJae Park <sjpark@amazon.de>
+ */
+
+#ifdef CONFIG_DAMON_KUNIT_TEST
+
+#ifndef _DAMON_TEST_H
+#define _DAMON_TEST_H
+
+#include <kunit/test.h>
+
+static void damon_test_str_to_pids(struct kunit *test)
+{
+	char *question;
+	unsigned long *answers;
+	unsigned long expected[] = {12, 35, 46};
+	ssize_t nr_integers = 0, i;
+
+	question = "123";
+	answers = str_to_pids(question, strnlen(question, 128), &nr_integers);
+	KUNIT_EXPECT_EQ(test, (ssize_t)1, nr_integers);
+	KUNIT_EXPECT_EQ(test, 123ul, answers[0]);
+	kfree(answers);
+
+	question = "123abc";
+	answers = str_to_pids(question, strnlen(question, 128), &nr_integers);
+	KUNIT_EXPECT_EQ(test, (ssize_t)1, nr_integers);
+	KUNIT_EXPECT_EQ(test, 123ul, answers[0]);
+	kfree(answers);
+
+	question = "a123";
+	answers = str_to_pids(question, strnlen(question, 128), &nr_integers);
+	KUNIT_EXPECT_EQ(test, (ssize_t)0, nr_integers);
+	KUNIT_EXPECT_PTR_EQ(test, answers, (unsigned long *)NULL);
+
+	question = "12 35";
+	answers = str_to_pids(question, strnlen(question, 128), &nr_integers);
+	KUNIT_EXPECT_EQ(test, (ssize_t)2, nr_integers);
+	for (i = 0; i < nr_integers; i++)
+		KUNIT_EXPECT_EQ(test, expected[i], answers[i]);
+	kfree(answers);
+
+	question = "12 35 46";
+	answers = str_to_pids(question, strnlen(question, 128), &nr_integers);
+	KUNIT_EXPECT_EQ(test, (ssize_t)3, nr_integers);
+	for (i = 0; i < nr_integers; i++)
+		KUNIT_EXPECT_EQ(test, expected[i], answers[i]);
+	kfree(answers);
+
+	question = "12 35 abc 46";
+	answers = str_to_pids(question, strnlen(question, 128), &nr_integers);
+	KUNIT_EXPECT_EQ(test, (ssize_t)2, nr_integers);
+	for (i = 0; i < 2; i++)
+		KUNIT_EXPECT_EQ(test, expected[i], answers[i]);
+	kfree(answers);
+
+	question = "";
+	answers = str_to_pids(question, strnlen(question, 128), &nr_integers);
+	KUNIT_EXPECT_EQ(test, (ssize_t)0, nr_integers);
+	KUNIT_EXPECT_PTR_EQ(test, (unsigned long *)NULL, answers);
+	kfree(answers);
+
+	question = "\n";
+	answers = str_to_pids(question, strnlen(question, 128), &nr_integers);
+	KUNIT_EXPECT_EQ(test, (ssize_t)0, nr_integers);
+	KUNIT_EXPECT_PTR_EQ(test, (unsigned long *)NULL, answers);
+	kfree(answers);
+}
+
+static void damon_test_regions(struct kunit *test)
+{
+	struct damon_region *r;
+	struct damon_task *t;
+
+	r = damon_new_region(&damon_user_ctx, 1, 2);
+	KUNIT_EXPECT_EQ(test, 1ul, r->vm_start);
+	KUNIT_EXPECT_EQ(test, 2ul, r->vm_end);
+	KUNIT_EXPECT_EQ(test, 0u, r->nr_accesses);
+	KUNIT_EXPECT_TRUE(test, r->sampling_addr >= r->vm_start);
+	KUNIT_EXPECT_TRUE(test, r->sampling_addr < r->vm_end);
+
+	t = damon_new_task(42);
+	KUNIT_EXPECT_EQ(test, 0u, nr_damon_regions(t));
+
+	damon_add_region_tail(r, t);
+	KUNIT_EXPECT_EQ(test, 1u, nr_damon_regions(t));
+
+	damon_del_region(r);
+	KUNIT_EXPECT_EQ(test, 0u, nr_damon_regions(t));
+
+	damon_free_task(t);
+}
+
+static void damon_test_tasks(struct kunit *test)
+{
+	struct damon_ctx *c = &damon_user_ctx;
+	struct damon_task *t;
+
+	t = damon_new_task(42);
+	KUNIT_EXPECT_EQ(test, 42ul, t->pid);
+	KUNIT_EXPECT_EQ(test, 0u, nr_damon_tasks(c));
+
+	damon_add_task_tail(&damon_user_ctx, t);
+	KUNIT_EXPECT_EQ(test, 1u, nr_damon_tasks(c));
+
+	damon_destroy_task(t);
+	KUNIT_EXPECT_EQ(test, 0u, nr_damon_tasks(c));
+}
+
+static void damon_test_set_pids(struct kunit *test)
+{
+	struct damon_ctx *ctx = &damon_user_ctx;
+	unsigned long pids[] = {1, 2, 3};
+	char buf[64];
+
+	damon_set_pids(ctx, pids, 3);
+	damon_sprint_pids(ctx, buf, 64);
+	KUNIT_EXPECT_STREQ(test, (char *)buf, "1 2 3\n");
+
+	damon_set_pids(ctx, NULL, 0);
+	damon_sprint_pids(ctx, buf, 64);
+	KUNIT_EXPECT_STREQ(test, (char *)buf, "\n");
+
+	damon_set_pids(ctx, (unsigned long []){1, 2}, 2);
+	damon_sprint_pids(ctx, buf, 64);
+	KUNIT_EXPECT_STREQ(test, (char *)buf, "1 2\n");
+
+	damon_set_pids(ctx, (unsigned long []){2}, 1);
+	damon_sprint_pids(ctx, buf, 64);
+	KUNIT_EXPECT_STREQ(test, (char *)buf, "2\n");
+
+	damon_set_pids(ctx, NULL, 0);
+	damon_sprint_pids(ctx, buf, 64);
+	KUNIT_EXPECT_STREQ(test, (char *)buf, "\n");
+}
+
+/*
+ * Test damon_three_regions_in_vmas() function
+ *
+ * DAMON converts the complex and dynamic memory mappings of each target task
+ * to three discontiguous regions which cover every mapped areas.  However, the
+ * three regions should not include the two biggest unmapped areas in the
+ * original mapping, because the two biggest areas are normally the areas
+ * between 1) heap and the mmap()-ed regions, and 2) the mmap()-ed regions and
+ * stack.  Because these two unmapped areas are very huge but obviously never
+ * accessed, covering the region is just a waste.
+ *
+ * 'damon_three_regions_in_vmas() receives an address space of a process.  It
+ * first identifies the start of mappings, end of mappings, and the two biggest
+ * unmapped areas.  After that, based on the information, it constructs the
+ * three regions and returns.  For more detail, refer to the comment of
+ * 'damon_init_regions_of()' function definition in 'mm/damon.c' file.
+ *
+ * For example, suppose virtual address ranges of 10-20, 20-25, 200-210,
+ * 210-220, 300-305, and 307-330 (Other comments represent this mappings in
+ * more short form: 10-20-25, 200-210-220, 300-305, 307-330) of a process are
+ * mapped.  To cover every mappings, the three regions should start with 10,
+ * and end with 305.  The process also has three unmapped areas, 25-200,
+ * 220-300, and 305-307.  Among those, 25-200 and 220-300 are the biggest two
+ * unmapped areas, and thus it should be converted to three regions of 10-25,
+ * 200-220, and 300-330.
+ */
+static void damon_test_three_regions_in_vmas(struct kunit *test)
+{
+	struct region regions[3] = {0,};
+	/* 10-20-25, 200-210-220, 300-305, 307-330 */
+	struct vm_area_struct vmas[] = {
+		(struct vm_area_struct) {.vm_start = 10, .vm_end = 20},
+		(struct vm_area_struct) {.vm_start = 20, .vm_end = 25},
+		(struct vm_area_struct) {.vm_start = 200, .vm_end = 210},
+		(struct vm_area_struct) {.vm_start = 210, .vm_end = 220},
+		(struct vm_area_struct) {.vm_start = 300, .vm_end = 305},
+		(struct vm_area_struct) {.vm_start = 307, .vm_end = 330},
+	};
+	vmas[0].vm_next = &vmas[1];
+	vmas[1].vm_next = &vmas[2];
+	vmas[2].vm_next = &vmas[3];
+	vmas[3].vm_next = &vmas[4];
+	vmas[4].vm_next = &vmas[5];
+	vmas[5].vm_next = NULL;
+
+	damon_three_regions_in_vmas(&vmas[0], regions);
+
+	KUNIT_EXPECT_EQ(test, 10ul, regions[0].start);
+	KUNIT_EXPECT_EQ(test, 25ul, regions[0].end);
+	KUNIT_EXPECT_EQ(test, 200ul, regions[1].start);
+	KUNIT_EXPECT_EQ(test, 220ul, regions[1].end);
+	KUNIT_EXPECT_EQ(test, 300ul, regions[2].start);
+	KUNIT_EXPECT_EQ(test, 330ul, regions[2].end);
+}
+
+/* Clean up global state of damon */
+static void damon_cleanup_global_state(void)
+{
+	struct damon_task *t, *next;
+
+	damon_for_each_task_safe(&damon_user_ctx, t, next)
+		damon_destroy_task(t);
+
+	damon_user_ctx.rbuf_offset = 0;
+}
+
+/*
+ * Test kdamond_flush_aggregated()
+ *
+ * DAMON checks access to each region and aggregates this information as the
+ * access frequency of each region.  In detail, it increases '->nr_accesses' of
+ * regions that an access has confirmed.  'kdamond_flush_aggregated()' flushes
+ * the aggregated information ('->nr_accesses' of each regions) to the result
+ * buffer.  As a result of the flushing, the '->nr_accesses' of regions are
+ * initialized to zero.
+ */
+static void damon_test_aggregate(struct kunit *test)
+{
+	struct damon_ctx *ctx = &damon_user_ctx;
+	unsigned long pids[] = {1, 2, 3};
+	unsigned long saddr[][3] = {{10, 20, 30}, {5, 42, 49}, {13, 33, 55} };
+	unsigned long eaddr[][3] = {{15, 27, 40}, {31, 45, 55}, {23, 44, 66} };
+	unsigned long accesses[][3] = {{42, 95, 84}, {10, 20, 30}, {0, 1, 2} };
+	struct damon_task *t;
+	struct damon_region *r;
+	int it, ir;
+	ssize_t sz, sr, sp;
+
+	damon_set_recording(ctx, 256, "damon.data");
+	damon_set_pids(ctx, pids, 3);
+
+	it = 0;
+	damon_for_each_task(ctx, t) {
+		for (ir = 0; ir < 3; ir++) {
+			r = damon_new_region(ctx,
+					saddr[it][ir], eaddr[it][ir]);
+			r->nr_accesses = accesses[it][ir];
+			damon_add_region_tail(r, t);
+		}
+		it++;
+	}
+	kdamond_flush_aggregated(ctx);
+	it = 0;
+	damon_for_each_task(ctx, t) {
+		ir = 0;
+		/* '->nr_accesses' should be zeroed */
+		damon_for_each_region(r, t) {
+			KUNIT_EXPECT_EQ(test, 0u, r->nr_accesses);
+			ir++;
+		}
+		/* regions should be preserved */
+		KUNIT_EXPECT_EQ(test, 3, ir);
+		it++;
+	}
+	/* tasks also should be preserved */
+	KUNIT_EXPECT_EQ(test, 3, it);
+
+	/* The aggregated information should be written in the buffer */
+	sr = sizeof(r->vm_start) + sizeof(r->vm_end) + sizeof(r->nr_accesses);
+	sp = sizeof(t->pid) + sizeof(unsigned int) + 3 * sr;
+	sz = sizeof(struct timespec64) + sizeof(unsigned int) + 3 * sp;
+	KUNIT_EXPECT_EQ(test, (unsigned int)sz, ctx->rbuf_offset);
+
+	damon_set_recording(ctx, 0, "damon.data");
+	damon_cleanup_global_state();
+}
+
+static void damon_test_write_rbuf(struct kunit *test)
+{
+	struct damon_ctx *ctx = &damon_user_ctx;
+	char *data;
+
+	damon_set_recording(&damon_user_ctx, 256, "damon.data");
+
+	data = "hello";
+	damon_write_rbuf(ctx, data, strnlen(data, 256));
+	KUNIT_EXPECT_EQ(test, ctx->rbuf_offset, 5u);
+
+	damon_write_rbuf(ctx, data, 0);
+	KUNIT_EXPECT_EQ(test, ctx->rbuf_offset, 5u);
+
+	KUNIT_EXPECT_STREQ(test, (char *)ctx->rbuf, data);
+	damon_set_recording(&damon_user_ctx, 0, "damon.data");
+}
+
+/*
+ * Test 'damon_apply_three_regions()'
+ *
+ * test			kunit object
+ * regions		an array containing start/end addresses of current
+ *			monitoring target regions
+ * nr_regions		the number of the addresses in 'regions'
+ * three_regions	The three regions that need to be applied now
+ * expected		start/end addresses of monitoring target regions that
+ *			'three_regions' are applied
+ * nr_expected		the number of addresses in 'expected'
+ *
+ * The memory mapping of the target processes changes dynamically.  To follow
+ * the change, DAMON periodically reads the mappings, simplifies it to the
+ * three regions, and updates the monitoring target regions to fit in the three
+ * regions.  The update of current target regions is the role of
+ * 'damon_apply_three_regions()'.
+ *
+ * This test passes the given target regions and the new three regions that
+ * need to be applied to the function and check whether it updates the regions
+ * as expected.
+ */
+static void damon_do_test_apply_three_regions(struct kunit *test,
+				unsigned long *regions, int nr_regions,
+				struct region *three_regions,
+				unsigned long *expected, int nr_expected)
+{
+	struct damon_task *t;
+	struct damon_region *r;
+	int i;
+
+	t = damon_new_task(42);
+	for (i = 0; i < nr_regions / 2; i++) {
+		r = damon_new_region(&damon_user_ctx,
+				regions[i * 2], regions[i * 2 + 1]);
+		damon_add_region_tail(r, t);
+	}
+	damon_add_task_tail(&damon_user_ctx, t);
+
+	damon_apply_three_regions(&damon_user_ctx, t, three_regions);
+
+	for (i = 0; i < nr_expected / 2; i++) {
+		r = damon_nth_region_of(t, i);
+		KUNIT_EXPECT_EQ(test, r->vm_start, expected[i * 2]);
+		KUNIT_EXPECT_EQ(test, r->vm_end, expected[i * 2 + 1]);
+	}
+
+	damon_cleanup_global_state();
+}
+
+/*
+ * This function test most common case where the three big regions are only
+ * slightly changed.  Target regions should adjust their boundary (10-20-30,
+ * 50-55, 70-80, 90-100) to fit with the new big regions or remove target
+ * regions (57-79) that now out of the three regions.
+ */
+static void damon_test_apply_three_regions1(struct kunit *test)
+{
+	/* 10-20-30, 50-55-57-59, 70-80-90-100 */
+	unsigned long regions[] = {10, 20, 20, 30, 50, 55, 55, 57, 57, 59,
+				70, 80, 80, 90, 90, 100};
+	/* 5-27, 45-55, 73-104 */
+	struct region new_three_regions[3] = {
+		(struct region){.start = 5, .end = 27},
+		(struct region){.start = 45, .end = 55},
+		(struct region){.start = 73, .end = 104} };
+	/* 5-20-27, 45-55, 73-80-90-104 */
+	unsigned long expected[] = {5, 20, 20, 27, 45, 55,
+				73, 80, 80, 90, 90, 104};
+
+	damon_do_test_apply_three_regions(test, regions, ARRAY_SIZE(regions),
+			new_three_regions, expected, ARRAY_SIZE(expected));
+}
+
+/*
+ * Test slightly bigger change.  Similar to above, but the second big region
+ * now require two target regions (50-55, 57-59) to be removed.
+ */
+static void damon_test_apply_three_regions2(struct kunit *test)
+{
+	/* 10-20-30, 50-55-57-59, 70-80-90-100 */
+	unsigned long regions[] = {10, 20, 20, 30, 50, 55, 55, 57, 57, 59,
+				70, 80, 80, 90, 90, 100};
+	/* 5-27, 56-57, 65-104 */
+	struct region new_three_regions[3] = {
+		(struct region){.start = 5, .end = 27},
+		(struct region){.start = 56, .end = 57},
+		(struct region){.start = 65, .end = 104} };
+	/* 5-20-27, 56-57, 65-80-90-104 */
+	unsigned long expected[] = {5, 20, 20, 27, 56, 57,
+				65, 80, 80, 90, 90, 104};
+
+	damon_do_test_apply_three_regions(test, regions, ARRAY_SIZE(regions),
+			new_three_regions, expected, ARRAY_SIZE(expected));
+}
+
+/*
+ * Test a big change.  The second big region has totally freed and mapped to
+ * different area (50-59 -> 61-63).  The target regions which were in the old
+ * second big region (50-55-57-59) should be removed and new target region
+ * covering the second big region (61-63) should be created.
+ */
+static void damon_test_apply_three_regions3(struct kunit *test)
+{
+	/* 10-20-30, 50-55-57-59, 70-80-90-100 */
+	unsigned long regions[] = {10, 20, 20, 30, 50, 55, 55, 57, 57, 59,
+				70, 80, 80, 90, 90, 100};
+	/* 5-27, 61-63, 65-104 */
+	struct region new_three_regions[3] = {
+		(struct region){.start = 5, .end = 27},
+		(struct region){.start = 61, .end = 63},
+		(struct region){.start = 65, .end = 104} };
+	/* 5-20-27, 61-63, 65-80-90-104 */
+	unsigned long expected[] = {5, 20, 20, 27, 61, 63,
+				65, 80, 80, 90, 90, 104};
+
+	damon_do_test_apply_three_regions(test, regions, ARRAY_SIZE(regions),
+			new_three_regions, expected, ARRAY_SIZE(expected));
+}
+
+/*
+ * Test another big change.  Both of the second and third big regions (50-59
+ * and 70-100) has totally freed and mapped to different area (30-32 and
+ * 65-68).  The target regions which were in the old second and third big
+ * regions should now be removed and new target regions covering the new second
+ * and third big regions should be crated.
+ */
+static void damon_test_apply_three_regions4(struct kunit *test)
+{
+	/* 10-20-30, 50-55-57-59, 70-80-90-100 */
+	unsigned long regions[] = {10, 20, 20, 30, 50, 55, 55, 57, 57, 59,
+				70, 80, 80, 90, 90, 100};
+	/* 5-7, 30-32, 65-68 */
+	struct region new_three_regions[3] = {
+		(struct region){.start = 5, .end = 7},
+		(struct region){.start = 30, .end = 32},
+		(struct region){.start = 65, .end = 68} };
+	/* expect 5-7, 30-32, 65-68 */
+	unsigned long expected[] = {5, 7, 30, 32, 65, 68};
+
+	damon_do_test_apply_three_regions(test, regions, ARRAY_SIZE(regions),
+			new_three_regions, expected, ARRAY_SIZE(expected));
+}
+
+static void damon_test_split_evenly(struct kunit *test)
+{
+	struct damon_ctx *c = &damon_user_ctx;
+	struct damon_task *t;
+	struct damon_region *r;
+	unsigned long i;
+
+	KUNIT_EXPECT_EQ(test, damon_split_region_evenly(c, NULL, 5), -EINVAL);
+
+	t = damon_new_task(42);
+	r = damon_new_region(&damon_user_ctx, 0, 100);
+	KUNIT_EXPECT_EQ(test, damon_split_region_evenly(c, r, 0), -EINVAL);
+
+	damon_add_region_tail(r, t);
+	KUNIT_EXPECT_EQ(test, damon_split_region_evenly(c, r, 10), 0);
+	KUNIT_EXPECT_EQ(test, nr_damon_regions(t), 10u);
+
+	i = 0;
+	damon_for_each_region(r, t) {
+		KUNIT_EXPECT_EQ(test, r->vm_start, i++ * 10);
+		KUNIT_EXPECT_EQ(test, r->vm_end, i * 10);
+	}
+	damon_free_task(t);
+
+	t = damon_new_task(42);
+	r = damon_new_region(&damon_user_ctx, 5, 59);
+	damon_add_region_tail(r, t);
+	KUNIT_EXPECT_EQ(test, damon_split_region_evenly(c, r, 5), 0);
+	KUNIT_EXPECT_EQ(test, nr_damon_regions(t), 5u);
+
+	i = 0;
+	damon_for_each_region(r, t) {
+		if (i == 4)
+			break;
+		KUNIT_EXPECT_EQ(test, r->vm_start, 5 + 10 * i++);
+		KUNIT_EXPECT_EQ(test, r->vm_end, 5 + 10 * i);
+	}
+	KUNIT_EXPECT_EQ(test, r->vm_start, 5 + 10 * i);
+	KUNIT_EXPECT_EQ(test, r->vm_end, 59ul);
+	damon_free_task(t);
+
+	t = damon_new_task(42);
+	r = damon_new_region(&damon_user_ctx, 5, 6);
+	damon_add_region_tail(r, t);
+	KUNIT_EXPECT_EQ(test, damon_split_region_evenly(c, r, 2), -EINVAL);
+	KUNIT_EXPECT_EQ(test, nr_damon_regions(t), 1u);
+
+	damon_for_each_region(r, t) {
+		KUNIT_EXPECT_EQ(test, r->vm_start, 5ul);
+		KUNIT_EXPECT_EQ(test, r->vm_end, 6ul);
+	}
+	damon_free_task(t);
+}
+
+static void damon_test_split_at(struct kunit *test)
+{
+	struct damon_task *t;
+	struct damon_region *r;
+
+	t = damon_new_task(42);
+	r = damon_new_region(&damon_user_ctx, 0, 100);
+	damon_add_region_tail(r, t);
+	damon_split_region_at(&damon_user_ctx, r, 25);
+	KUNIT_EXPECT_EQ(test, r->vm_start, 0ul);
+	KUNIT_EXPECT_EQ(test, r->vm_end, 25ul);
+
+	r = damon_next_region(r);
+	KUNIT_EXPECT_EQ(test, r->vm_start, 25ul);
+	KUNIT_EXPECT_EQ(test, r->vm_end, 100ul);
+
+	damon_free_task(t);
+}
+
+static void damon_test_merge_two(struct kunit *test)
+{
+	struct damon_task *t;
+	struct damon_region *r, *r2, *r3;
+	int i;
+
+	t = damon_new_task(42);
+	r = damon_new_region(&damon_user_ctx, 0, 100);
+	r->nr_accesses = 10;
+	damon_add_region_tail(r, t);
+	r2 = damon_new_region(&damon_user_ctx, 100, 300);
+	r2->nr_accesses = 20;
+	damon_add_region_tail(r2, t);
+
+	damon_merge_two_regions(r, r2);
+	KUNIT_EXPECT_EQ(test, r->vm_start, 0ul);
+	KUNIT_EXPECT_EQ(test, r->vm_end, 300ul);
+	KUNIT_EXPECT_EQ(test, r->nr_accesses, 16u);
+
+	i = 0;
+	damon_for_each_region(r3, t) {
+		KUNIT_EXPECT_PTR_EQ(test, r, r3);
+		i++;
+	}
+	KUNIT_EXPECT_EQ(test, i, 1);
+
+	damon_free_task(t);
+}
+
+static void damon_test_merge_regions_of(struct kunit *test)
+{
+	struct damon_task *t;
+	struct damon_region *r;
+	unsigned long sa[] = {0, 100, 114, 122, 130, 156, 170, 184};
+	unsigned long ea[] = {100, 112, 122, 130, 156, 170, 184, 230};
+	unsigned int nrs[] = {0, 0, 10, 10, 20, 30, 1, 2};
+
+	unsigned long saddrs[] = {0, 114, 130, 156, 170};
+	unsigned long eaddrs[] = {112, 130, 156, 170, 230};
+	int i;
+
+	t = damon_new_task(42);
+	for (i = 0; i < ARRAY_SIZE(sa); i++) {
+		r = damon_new_region(&damon_user_ctx, sa[i], ea[i]);
+		r->nr_accesses = nrs[i];
+		damon_add_region_tail(r, t);
+	}
+
+	damon_merge_regions_of(t, 9);
+	/* 0-112, 114-130, 130-156, 156-170 */
+	KUNIT_EXPECT_EQ(test, nr_damon_regions(t), 5u);
+	for (i = 0; i < 5; i++) {
+		r = damon_nth_region_of(t, i);
+		KUNIT_EXPECT_EQ(test, r->vm_start, saddrs[i]);
+		KUNIT_EXPECT_EQ(test, r->vm_end, eaddrs[i]);
+	}
+	damon_free_task(t);
+}
+
+static void damon_test_split_regions_of(struct kunit *test)
+{
+	struct damon_task *t;
+	struct damon_region *r;
+
+	t = damon_new_task(42);
+	r = damon_new_region(&damon_user_ctx, 0, 22);
+	damon_add_region_tail(r, t);
+	damon_split_regions_of(&damon_user_ctx, t);
+	KUNIT_EXPECT_EQ(test, nr_damon_regions(t), 2u);
+	damon_free_task(t);
+}
+
+static struct kunit_case damon_test_cases[] = {
+	KUNIT_CASE(damon_test_str_to_pids),
+	KUNIT_CASE(damon_test_tasks),
+	KUNIT_CASE(damon_test_regions),
+	KUNIT_CASE(damon_test_set_pids),
+	KUNIT_CASE(damon_test_three_regions_in_vmas),
+	KUNIT_CASE(damon_test_aggregate),
+	KUNIT_CASE(damon_test_write_rbuf),
+	KUNIT_CASE(damon_test_apply_three_regions1),
+	KUNIT_CASE(damon_test_apply_three_regions2),
+	KUNIT_CASE(damon_test_apply_three_regions3),
+	KUNIT_CASE(damon_test_apply_three_regions4),
+	KUNIT_CASE(damon_test_split_evenly),
+	KUNIT_CASE(damon_test_split_at),
+	KUNIT_CASE(damon_test_merge_two),
+	KUNIT_CASE(damon_test_merge_regions_of),
+	KUNIT_CASE(damon_test_split_regions_of),
+	{},
+};
+
+static struct kunit_suite damon_test_suite = {
+	.name = "damon",
+	.test_cases = damon_test_cases,
+};
+kunit_test_suite(damon_test_suite);
+
+#endif /* _DAMON_TEST_H */
+
+#endif	/* CONFIG_DAMON_KUNIT_TEST */
diff --git a/mm/damon.c b/mm/damon.c
index 02bfa12940ea..bb8eb88edaf3 100644
--- a/mm/damon.c
+++ b/mm/damon.c
@@ -1409,3 +1409,5 @@ module_exit(damon_exit);
 MODULE_LICENSE("GPL");
 MODULE_AUTHOR("SeongJae Park <sjpark@amazon.de>");
 MODULE_DESCRIPTION("DAMON: Data Access MONitor");
+
+#include "damon-test.h"
-- 
2.17.1



^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH v4 11/11] MAINTAINERS: Update for DAMON
  2020-02-10 14:48 [PATCH v4 00/11] Introduce Data Access MONitor (DAMON) sjpark
                   ` (9 preceding siblings ...)
  2020-02-10 14:53 ` [PATCH v4 10/11] mm/damon: Add kunit tests sjpark
@ 2020-02-10 14:54 ` sjpark
  10 siblings, 0 replies; 19+ messages in thread
From: sjpark @ 2020-02-10 14:54 UTC (permalink / raw)
  To: akpm
  Cc: SeongJae Park, acme, alexander.shishkin, amit, brendan.d.gregg,
	brendanhiggins, cai, colin.king, corbet, dwmw, jolsa, kirill,
	mark.rutland, mgorman, minchan, mingo, namhyung, peterz, rdunlap,
	rostedt, sj38.park, vdavydov.dev, linux-mm, linux-doc,
	linux-kernel

From: SeongJae Park <sjpark@amazon.de>

This commit updates MAINTAINERS file for DAMON related files.

Signed-off-by: SeongJae Park <sjpark@amazon.de>
---
 MAINTAINERS | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 56765f542244..bad6ebfe56e5 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -4611,6 +4611,17 @@ F:	net/ax25/ax25_out.c
 F:	net/ax25/ax25_timer.c
 F:	net/ax25/sysctl_net_ax25.c
 
+DATA ACCESS MONITOR
+M:	SeongJae Park <sjpark@amazon.de>
+L:	linux-mm@kvack.org
+S:	Maintained
+F:	Documentation/admin-guide/mm/data_access_monitor.rst
+F:	include/linux/damon.h
+F:	include/trace/events/damon.h
+F:	mm/damon-test.h
+F:	mm/damon.c
+F:	tools/damon/*
+
 DAVICOM FAST ETHERNET (DMFE) NETWORK DRIVER
 L:	netdev@vger.kernel.org
 S:	Orphan
-- 
2.17.1



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v4 10/11] mm/damon: Add kunit tests
  2020-02-10 14:53 ` [PATCH v4 10/11] mm/damon: Add kunit tests sjpark
@ 2020-02-11 22:21   ` Brendan Higgins
  2020-02-13 16:56   ` kbuild test robot
  2020-02-15  4:07   ` Randy Dunlap
  2 siblings, 0 replies; 19+ messages in thread
From: Brendan Higgins @ 2020-02-11 22:21 UTC (permalink / raw)
  To: SeongJae Park
  Cc: Andrew Morton, SeongJae Park, acme, alexander.shishkin, amit,
	brendan.d.gregg, cai, Colin King, Jonathan Corbet, dwmw, jolsa,
	kirill, Mark Rutland, mgorman, minchan, mingo, namhyung,
	Peter Zijlstra, Randy Dunlap, Steven Rostedt, SeongJae Park,
	vdavydov.dev, linux-mm, open list:DOCUMENTATION,
	Linux Kernel Mailing List

On Mon, Feb 10, 2020 at 6:54 AM <sjpark@amazon.com> wrote:
>
> From: SeongJae Park <sjpark@amazon.de>
>
> This commit adds kunit based unit tests for DAMON.
>
> Signed-off-by: SeongJae Park <sjpark@amazon.de>

Reviewed-by: Brendan Higgins <brendanhiggins@google.com>

Cheers!


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v4 05/11] mm/damon: Implement kernel space API
  2020-02-10 14:50 ` [PATCH v4 05/11] mm/damon: Implement kernel space API sjpark
@ 2020-02-12 23:21   ` kbuild test robot
  2020-02-13  9:09     ` SeongJae Park
  0 siblings, 1 reply; 19+ messages in thread
From: kbuild test robot @ 2020-02-12 23:21 UTC (permalink / raw)
  To: sjpark
  Cc: kbuild-all, akpm, SeongJae Park, acme, alexander.shishkin, amit,
	brendan.d.gregg, brendanhiggins, cai, colin.king, corbet, dwmw,
	jolsa, kirill, mark.rutland, mgorman, minchan, mingo, namhyung,
	peterz, rdunlap, rostedt, sj38.park, vdavydov.dev, linux-mm,
	linux-doc, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 937 bytes --]

Hi,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on d5226fa6dbae0569ee43ecfc08bdcd6770fc4755]

url:    https://github.com/0day-ci/linux/commits/sjpark-amazon-com/Introduce-Data-Access-MONitor-DAMON/20200213-003254
base:    d5226fa6dbae0569ee43ecfc08bdcd6770fc4755
config: m68k-allmodconfig (attached as .config)
compiler: m68k-linux-gcc (GCC) 7.5.0
reproduce:
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        GCC_VERSION=7.5.0 make.cross ARCH=m68k 

If you fix the issue, kindly add following tag
Reported-by: kbuild test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

>> ERROR: "lookup_page_ext" [mm/damon.ko] undefined!

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 51870 bytes --]

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Re: [PATCH v4 05/11] mm/damon: Implement kernel space API
  2020-02-12 23:21   ` kbuild test robot
@ 2020-02-13  9:09     ` SeongJae Park
  0 siblings, 0 replies; 19+ messages in thread
From: SeongJae Park @ 2020-02-13  9:09 UTC (permalink / raw)
  To: kbuild test robot
  Cc: sjpark, kbuild-all, akpm, SeongJae Park, acme,
	alexander.shishkin, amit, brendan.d.gregg, brendanhiggins, cai,
	colin.king, corbet, dwmw, jolsa, kirill, mark.rutland, mgorman,
	minchan, mingo, namhyung, peterz, rdunlap, rostedt, sj38.park,
	vdavydov.dev, linux-mm, linux-doc, linux-kernel

On Thu, 13 Feb 2020 07:21:37 +0800 kbuild test robot <lkp@intel.com> wrote:

> [-- Attachment #1: Type: text/plain, Size: 937 bytes --]
> 
> Hi,
> 
> Thank you for the patch! Yet something to improve:
> 
> [auto build test ERROR on d5226fa6dbae0569ee43ecfc08bdcd6770fc4755]
> 
> url:    https://github.com/0day-ci/linux/commits/sjpark-amazon-com/Introduce-Data-Access-MONitor-DAMON/20200213-003254
> base:    d5226fa6dbae0569ee43ecfc08bdcd6770fc4755
> config: m68k-allmodconfig (attached as .config)
> compiler: m68k-linux-gcc (GCC) 7.5.0
> reproduce:
>         wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
>         chmod +x ~/bin/make.cross
>         # save the attached .config to linux build tree
>         GCC_VERSION=7.5.0 make.cross ARCH=m68k 
> 
> If you fix the issue, kindly add following tag
> Reported-by: kbuild test robot <lkp@intel.com>
> 
> All errors (new ones prefixed by >>):
> 
> >> ERROR: "lookup_page_ext" [mm/damon.ko] undefined!

Thank you for finding this problem, kbuild!


This problem comes when `CONFIG_DAMON=m` and `CONFIG_64BIT` unset because
`lookup_page_ext()`, which is used by `page_idle.h` if `CONFIG_64BIT` unset, is
not exported.  Most simple fix would be avoiding module build of DAMON as
below::

    diff --git a/mm/Kconfig b/mm/Kconfig
    index b279ab9c78d0..f24dd670baad 100644
    --- a/mm/Kconfig
    +++ b/mm/Kconfig
    @@ -740,7 +740,7 @@ config MAPPING_DIRTY_HELPERS
             bool
    
     config DAMON
    -       tristate "Data Access Monitor"
    +       bool "Data Access Monitor"
            depends on MMU
            default n
            help

Or, exporting the symbol as below::

    diff --git a/mm/page_ext.c b/mm/page_ext.c
    index 4ade843ff588..e6e6b7e625e4 100644
    --- a/mm/page_ext.c
    +++ b/mm/page_ext.c
    @@ -131,6 +131,7 @@ struct page_ext *lookup_page_ext(const struct page *page)
                                            MAX_ORDER_NR_PAGES);
            return get_entry(base, index);
     }
    +EXPORT_SYMBOL(lookup_page_ext);
    
     static int __init alloc_node_page_ext(int nid)
     {

I of course prefer this fix but unsure whether exporting this symbol is ok or
not.  May I ask your opinions?


Thanks,
SeongJae Park

> 
> ---
> 0-DAY CI Kernel Test Service, Intel Corporation
> https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org
> 
> [-- Attachment #2: .config.gz --]
> [-- Type: application/gzip, Size: 51870 bytes --]


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v4 10/11] mm/damon: Add kunit tests
  2020-02-10 14:53 ` [PATCH v4 10/11] mm/damon: Add kunit tests sjpark
  2020-02-11 22:21   ` Brendan Higgins
@ 2020-02-13 16:56   ` kbuild test robot
  2020-02-14 11:19     ` SeongJae Park
  2020-02-15  4:07   ` Randy Dunlap
  2 siblings, 1 reply; 19+ messages in thread
From: kbuild test robot @ 2020-02-13 16:56 UTC (permalink / raw)
  To: sjpark
  Cc: kbuild-all, akpm, SeongJae Park, acme, alexander.shishkin, amit,
	brendan.d.gregg, brendanhiggins, cai, colin.king, corbet, dwmw,
	jolsa, kirill, mark.rutland, mgorman, minchan, mingo, namhyung,
	peterz, rdunlap, rostedt, sj38.park, vdavydov.dev, linux-mm,
	linux-doc, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 3435 bytes --]

Hi,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on d5226fa6dbae0569ee43ecfc08bdcd6770fc4755]

url:    https://github.com/0day-ci/linux/commits/sjpark-amazon-com/Introduce-Data-Access-MONitor-DAMON/20200213-003254
base:    d5226fa6dbae0569ee43ecfc08bdcd6770fc4755
config: x86_64-allmodconfig (attached as .config)
compiler: gcc-7 (Debian 7.5.0-4) 7.5.0
reproduce:
        # save the attached .config to linux build tree
        make ARCH=x86_64 

If you fix the issue, kindly add following tag
Reported-by: kbuild test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   In file included from mm/damon.c:19:0:
>> include/linux/module.h:131:42: error: redefinition of '__inittest'
     static inline initcall_t __maybe_unused __inittest(void)  \
                                             ^
   include/linux/module.h:124:28: note: in expansion of macro 'module_init'
    #define late_initcall(fn)  module_init(fn)
                               ^~~~~~~~~~~
   include/kunit/test.h:224:2: note: in expansion of macro 'late_initcall'
     late_initcall(kunit_suite_init##suite)
     ^~~~~~~~~~~~~
   mm/damon-test.h:600:1: note: in expansion of macro 'kunit_test_suite'
    kunit_test_suite(damon_test_suite);
    ^~~~~~~~~~~~~~~~
   include/linux/module.h:131:42: note: previous definition of '__inittest' was here
     static inline initcall_t __maybe_unused __inittest(void)  \
                                             ^
   mm/damon.c:1406:1: note: in expansion of macro 'module_init'
    module_init(damon_init);
    ^~~~~~~~~~~
>> include/linux/module.h:133:6: error: redefinition of 'init_module'
     int init_module(void) __copy(initfn) __attribute__((alias(#initfn)));
         ^
   include/linux/module.h:124:28: note: in expansion of macro 'module_init'
    #define late_initcall(fn)  module_init(fn)
                               ^~~~~~~~~~~
   include/kunit/test.h:224:2: note: in expansion of macro 'late_initcall'
     late_initcall(kunit_suite_init##suite)
     ^~~~~~~~~~~~~
   mm/damon-test.h:600:1: note: in expansion of macro 'kunit_test_suite'
    kunit_test_suite(damon_test_suite);
    ^~~~~~~~~~~~~~~~
   include/linux/module.h:133:6: note: previous definition of 'init_module' was here
     int init_module(void) __copy(initfn) __attribute__((alias(#initfn)));
         ^
   mm/damon.c:1406:1: note: in expansion of macro 'module_init'
    module_init(damon_init);
    ^~~~~~~~~~~

vim +/__inittest +131 include/linux/module.h

0fd972a7d91d6e1 Paul Gortmaker 2015-05-01  128  
0fd972a7d91d6e1 Paul Gortmaker 2015-05-01  129  /* Each module must use one module_init(). */
0fd972a7d91d6e1 Paul Gortmaker 2015-05-01  130  #define module_init(initfn)					\
1f318a8bafcfba9 Arnd Bergmann  2017-02-01 @131  	static inline initcall_t __maybe_unused __inittest(void)		\
0fd972a7d91d6e1 Paul Gortmaker 2015-05-01  132  	{ return initfn; }					\
a6e60d84989fa0e Miguel Ojeda   2019-01-19 @133  	int init_module(void) __copy(initfn) __attribute__((alias(#initfn)));
0fd972a7d91d6e1 Paul Gortmaker 2015-05-01  134  

:::::: The code at line 131 was first introduced by commit
:::::: 1f318a8bafcfba9f0d623f4870c4e890fd22e659 modules: mark __inittest/__exittest as __maybe_unused

:::::: TO: Arnd Bergmann <arnd@arndb.de>
:::::: CC: Jessica Yu <jeyu@redhat.com>

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 71807 bytes --]

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Re: [PATCH v4 10/11] mm/damon: Add kunit tests
  2020-02-13 16:56   ` kbuild test robot
@ 2020-02-14 11:19     ` SeongJae Park
  0 siblings, 0 replies; 19+ messages in thread
From: SeongJae Park @ 2020-02-14 11:19 UTC (permalink / raw)
  To: kbuild test robot
  Cc: sjpark, kbuild-all, akpm, SeongJae Park, acme,
	alexander.shishkin, amit, brendan.d.gregg, brendanhiggins, cai,
	colin.king, corbet, dwmw, jolsa, kirill, mark.rutland, mgorman,
	minchan, mingo, namhyung, peterz, rdunlap, rostedt, sj38.park,
	vdavydov.dev, linux-mm, linux-doc, linux-kernel

On Fri, 14 Feb 2020 00:56:50 +0800 kbuild test robot <lkp@intel.com> wrote:

> [-- Attachment #1: Type: text/plain, Size: 3435 bytes --]
> 
> Hi,
> 
> Thank you for the patch! Yet something to improve:
> 
> [auto build test ERROR on d5226fa6dbae0569ee43ecfc08bdcd6770fc4755]
> 
> url:    https://github.com/0day-ci/linux/commits/sjpark-amazon-com/Introduce-Data-Access-MONitor-DAMON/20200213-003254
> base:    d5226fa6dbae0569ee43ecfc08bdcd6770fc4755
> config: x86_64-allmodconfig (attached as .config)
> compiler: gcc-7 (Debian 7.5.0-4) 7.5.0
> reproduce:
>         # save the attached .config to linux build tree
>         make ARCH=x86_64 
> 
> If you fix the issue, kindly add following tag
> Reported-by: kbuild test robot <lkp@intel.com>
> 
> All errors (new ones prefixed by >>):
> 
>    In file included from mm/damon.c:19:0:
> >> include/linux/module.h:131:42: error: redefinition of '__inittest'
>      static inline initcall_t __maybe_unused __inittest(void)  \
>                                              ^
>    include/linux/module.h:124:28: note: in expansion of macro 'module_init'
>     #define late_initcall(fn)  module_init(fn)
>                                ^~~~~~~~~~~
>    include/kunit/test.h:224:2: note: in expansion of macro 'late_initcall'
>      late_initcall(kunit_suite_init##suite)
>      ^~~~~~~~~~~~~
>    mm/damon-test.h:600:1: note: in expansion of macro 'kunit_test_suite'
>     kunit_test_suite(damon_test_suite);
>     ^~~~~~~~~~~~~~~~
>    include/linux/module.h:131:42: note: previous definition of '__inittest' was here
>      static inline initcall_t __maybe_unused __inittest(void)  \
>                                              ^
>    mm/damon.c:1406:1: note: in expansion of macro 'module_init'
>     module_init(damon_init);
>     ^~~~~~~~~~~
> >> include/linux/module.h:133:6: error: redefinition of 'init_module'
>      int init_module(void) __copy(initfn) __attribute__((alias(#initfn)));
>          ^
>    include/linux/module.h:124:28: note: in expansion of macro 'module_init'
>     #define late_initcall(fn)  module_init(fn)
>                                ^~~~~~~~~~~
>    include/kunit/test.h:224:2: note: in expansion of macro 'late_initcall'
>      late_initcall(kunit_suite_init##suite)
>      ^~~~~~~~~~~~~
>    mm/damon-test.h:600:1: note: in expansion of macro 'kunit_test_suite'
>     kunit_test_suite(damon_test_suite);
>     ^~~~~~~~~~~~~~~~
>    include/linux/module.h:133:6: note: previous definition of 'init_module' was here
>      int init_module(void) __copy(initfn) __attribute__((alias(#initfn)));
>          ^
>    mm/damon.c:1406:1: note: in expansion of macro 'module_init'
>     module_init(damon_init);
>     ^~~~~~~~~~~
> 
> vim +/__inittest +131 include/linux/module.h
> 
> 0fd972a7d91d6e1 Paul Gortmaker 2015-05-01  128  
> 0fd972a7d91d6e1 Paul Gortmaker 2015-05-01  129  /* Each module must use one module_init(). */
> 0fd972a7d91d6e1 Paul Gortmaker 2015-05-01  130  #define module_init(initfn)					\
> 1f318a8bafcfba9 Arnd Bergmann  2017-02-01 @131  	static inline initcall_t __maybe_unused __inittest(void)		\
> 0fd972a7d91d6e1 Paul Gortmaker 2015-05-01  132  	{ return initfn; }					\
> a6e60d84989fa0e Miguel Ojeda   2019-01-19 @133  	int init_module(void) __copy(initfn) __attribute__((alias(#initfn)));
> 0fd972a7d91d6e1 Paul Gortmaker 2015-05-01  134  
> 
> :::::: The code at line 131 was first introduced by commit
> :::::: 1f318a8bafcfba9f0d623f4870c4e890fd22e659 modules: mark __inittest/__exittest as __maybe_unused
> 
> :::::: TO: Arnd Bergmann <arnd@arndb.de>
> :::::: CC: Jessica Yu <jeyu@redhat.com>

Thank you for finding yet another problem!  The problem is reproducible if
`CONFIG_DAMON=m` but `CONFIG_DAMON_KUNIT_TEST=y`.  Will fix this problem by
simply adjusting the dependency of the tests as below.  It will avoid the build
of the test code when DAMON is configured to be built as a module::

    diff --git a/mm/Kconfig b/mm/Kconfig
    index b279ab9c78d0..1a745ce0cbcb 100644
    --- a/mm/Kconfig
    +++ b/mm/Kconfig
    @@ -753,7 +753,7 @@ config DAMON
    
     config DAMON_KUNIT_TEST
            bool "Test for damon"
    -       depends on DAMON && KUNIT
    +       depends on DAMON=y && KUNIT
            help
              This builds the DAMON Kunit test suite.

I think this is fair enough as KUNIT is not exporting its main functions to
modules and thus cannot be used by modules anyway.


Thanks,
SeongJae Park

> 
> ---
> 0-DAY CI Kernel Test Service, Intel Corporation
> https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org
> 
> [-- Attachment #2: .config.gz --]
> [-- Type: application/gzip, Size: 71807 bytes --]


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v4 10/11] mm/damon: Add kunit tests
  2020-02-10 14:53 ` [PATCH v4 10/11] mm/damon: Add kunit tests sjpark
  2020-02-11 22:21   ` Brendan Higgins
  2020-02-13 16:56   ` kbuild test robot
@ 2020-02-15  4:07   ` Randy Dunlap
  2020-02-15  6:52     ` SeongJae Park
  2 siblings, 1 reply; 19+ messages in thread
From: Randy Dunlap @ 2020-02-15  4:07 UTC (permalink / raw)
  To: sjpark, akpm
  Cc: SeongJae Park, acme, alexander.shishkin, amit, brendan.d.gregg,
	brendanhiggins, cai, colin.king, corbet, dwmw, jolsa, kirill,
	mark.rutland, mgorman, minchan, mingo, namhyung, peterz, rostedt,
	sj38.park, vdavydov.dev, linux-mm, linux-doc, linux-kernel

On 2/10/20 6:53 AM, sjpark@amazon.com wrote:
> diff --git a/mm/Kconfig b/mm/Kconfig
> index 387d469f40ec..b279ab9c78d0 100644
> --- a/mm/Kconfig
> +++ b/mm/Kconfig
> @@ -751,4 +751,15 @@ config DAMON
>  	  be 1) accurate enough to be useful for performance-centric domains,
>  	  and 2) sufficiently light-weight so that it can be applied online.
>  
> +config DAMON_KUNIT_TEST
> +	bool "Test for damon"

s/bool/tristate/ ?

> +	depends on DAMON && KUNIT
> +	help
> +	  This builds the DAMON Kunit test suite.
> +
> +	  For more information on KUnit and unit tests in general, please refer
> +	  to the KUnit documentation.
> +
> +	  If unsure, say N.
> +
>  endmenu


-- 
~Randy



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Re: [PATCH v4 10/11] mm/damon: Add kunit tests
  2020-02-15  4:07   ` Randy Dunlap
@ 2020-02-15  6:52     ` SeongJae Park
  0 siblings, 0 replies; 19+ messages in thread
From: SeongJae Park @ 2020-02-15  6:52 UTC (permalink / raw)
  To: Randy Dunlap
  Cc: sjpark, akpm, SeongJae Park, acme, alexander.shishkin, amit,
	brendan.d.gregg, brendanhiggins, cai, colin.king, corbet, dwmw,
	jolsa, kirill, mark.rutland, mgorman, minchan, mingo, namhyung,
	peterz, rostedt, sj38.park, vdavydov.dev, linux-mm, linux-doc,
	linux-kernel

On Fri, 14 Feb 2020 20:07:47 -0800 Randy Dunlap <rdunlap@infradead.org> wrote:

> On 2/10/20 6:53 AM, sjpark@amazon.com wrote:
> > diff --git a/mm/Kconfig b/mm/Kconfig
> > index 387d469f40ec..b279ab9c78d0 100644
> > --- a/mm/Kconfig
> > +++ b/mm/Kconfig
> > @@ -751,4 +751,15 @@ config DAMON
> >  	  be 1) accurate enough to be useful for performance-centric domains,
> >  	  and 2) sufficiently light-weight so that it can be applied online.
> >  
> > +config DAMON_KUNIT_TEST
> > +	bool "Test for damon"
> 
> s/bool/tristate/ ?

Thank you for this comment!

It seems Kunit does not support module build, as its core functions are not
exported to modules.  That said, as this might be confusing and even could
cause a build failure with some configuration combinations[1], I will change
this dependency to `DAMON=y && KUNIT` in next spin.

[1] https://lore.kernel.org/linux-mm/20200214111907.7017-1-sjpark@amazon.com/

Thanks,
SeongJae Park

> 
> > +	depends on DAMON && KUNIT
> > +	help
> > +	  This builds the DAMON Kunit test suite.
> > +
> > +	  For more information on KUnit and unit tests in general, please refer
> > +	  to the KUnit documentation.
> > +
> > +	  If unsure, say N.
> > +
> >  endmenu
> 
> 
> -- 
> ~Randy
> 


^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, back to index

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-02-10 14:48 [PATCH v4 00/11] Introduce Data Access MONitor (DAMON) sjpark
2020-02-10 14:48 ` [PATCH v4 01/11] mm: " sjpark
2020-02-10 14:48 ` [PATCH v4 02/11] mm/damon: Implement region based sampling sjpark
2020-02-10 14:48 ` [PATCH v4 03/11] mm/damon: Adaptively adjust regions sjpark
2020-02-10 14:48 ` [PATCH v4 04/11] mm/damon: Apply dynamic memory mapping changes sjpark
2020-02-10 14:50 ` [PATCH v4 05/11] mm/damon: Implement kernel space API sjpark
2020-02-12 23:21   ` kbuild test robot
2020-02-13  9:09     ` SeongJae Park
2020-02-10 14:51 ` [PATCH v4 06/11] mm/damon: Add debugfs interface sjpark
2020-02-10 14:52 ` [PATCH v4 07/11] mm/damon: Add a tracepoint for result writing sjpark
2020-02-10 14:52 ` [PATCH v4 08/11] tools: Add a minimal user-space tool for DAMON sjpark
2020-02-10 14:53 ` [PATCH v4 09/11] Documentation/admin-guide/mm: Add a document " sjpark
2020-02-10 14:53 ` [PATCH v4 10/11] mm/damon: Add kunit tests sjpark
2020-02-11 22:21   ` Brendan Higgins
2020-02-13 16:56   ` kbuild test robot
2020-02-14 11:19     ` SeongJae Park
2020-02-15  4:07   ` Randy Dunlap
2020-02-15  6:52     ` SeongJae Park
2020-02-10 14:54 ` [PATCH v4 11/11] MAINTAINERS: Update for DAMON sjpark

Linux-mm Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-mm/0 linux-mm/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-mm linux-mm/ https://lore.kernel.org/linux-mm \
		linux-mm@kvack.org
	public-inbox-index linux-mm

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kvack.linux-mm


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git