linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* [RFC v10 00/13] DAMON: Support Physical Memory Address Space and Page-granularity Idleness Monitoring
@ 2020-12-16  9:42 SeongJae Park
  2020-12-16  9:42 ` [RFC v10 01/13] damon/dbgfs: Allow users to set initial monitoring target regions SeongJae Park
                   ` (13 more replies)
  0 siblings, 14 replies; 16+ messages in thread
From: SeongJae Park @ 2020-12-16  9:42 UTC (permalink / raw)
  To: akpm
  Cc: SeongJae Park, Jonathan.Cameron, aarcange, acme,
	alexander.shishkin, amit, benh, brendan.d.gregg, brendanhiggins,
	cai, colin.king, corbet, david, dwmw, elver, fan.du, foersleo,
	gthelen, irogers, jolsa, kirill, mark.rutland, mgorman, minchan,
	mingo, namhyung, peterz, rdunlap, riel, rientjes, rostedt, rppt,
	sblbir, shakeelb, shuah, sj38.park, snu, vbabka, vdavydov.dev,
	yang.shi, ying.huang, zgf574564920, linux-damon, linux-mm,
	linux-doc, linux-kernel

From: SeongJae Park <sjpark@amazon.de>

NOTE: This is only an RFC for future features of DAMON patchset[1], which is
not merged in the mainline yet.  The aim of this RFC is to show how DAMON would
be evolved once it is merged in.  So, if you have some interest in this RFC,
please consider reviewing the DAMON patchset, either.

Changes from Previous Version
=============================

- Rebase on v5.10 + DAMON v23[1] + DAMOS v15.1[2]
- Implement primitives for page granularity idleness monitoring

Introduction
============

DAMON[1] programming interface users can extend DAMON for any address space by
configuring the address-space specific low level primitives with appropriate
ones.  However, because only the implementation for the virtual address spaces
is available now, the users should implement their own for other address
spaces.  Worse yet, the user space users who are using the debugfs interface of
'damon-dbgfs' module or the DAMON user space tool, cannot implement their own.
Therefore, this patchset implements reference implementation of the low level
primitives for two other common use cases.

First common use case is the physical memory address space monitoring.  This
patchset, hence, implements monitoring primitives for the physical address
space.  Further, this patchset links the implementation to the debugfs
interface and the user space tool for the user space users.

Another common use case is page-granularity idleness monitoring.  This is
already available via 'Idle Page Tracking', but it requires frequent
kernel-user context change.  DAMON can efficiently support the usecase via its
arbitrary type monitoring target feature.  This patchset, therefore, provides
yet another implementation of the monitoring primitives for the use case.
Nonetheless, this version of the patchset doesn't provide user space friendly
interface for that.  It would be included in a future version of this patchset.

Note that the primitives support only the online user memory, as same to the
Idle Page Tracking.

Baseline and Complete Git Trees
===============================

The patches are based on the v5.10 linux kernel plus DAMON v23 patchset[1] and
DAMOS RFC v15.1 patchset[2].  You can also clone the complete git tree:

    $ git clone git://github.com/sjp38/linux -b cdamon/rfc/v10

The web is also available:
https://github.com/sjp38/linux/releases/tag/cdamon/rfc/v10

[1] https://lore.kernel.org/linux-doc/20201005105522.23841-1-sjpark@amazon.com/
[2] https://lore.kernel.org/linux-mm/20201006123931.5847-1-sjpark@amazon.com/

Sequence of Patches
===================

The sequence of patches is as follow.

The first 5 patches allow the user space users manually set the monitoring
regions.  The 1st and 2nd patches implements the features in the 'damon-dbgfs'
and the user space tool, respectively.  Following two patches update
unittests (the 3rd patch) and selftests (the 4th patch) for the new feature.
Finally, the 5th patch documents this new feature.

Following 5 patches implement the physical memory monitoring.  The 6th patch
makes some primitive functions for the virtual address spaces primitive
reusable.  The 7th patch implements the physical address space monitoring
primitive using it.  The 8th and the 9th patches links the primitives to the
'damon-dbgfs' and the user space tool, respectively.  The 10th patch further
implement a handy NUMA specific memory monitoring feature on the user space
tool.  The 11th patch documents this new features.

The 12th patch again separates some functions of the physical address space
primitives to reusable, and the 13th patch implements the page granularity
idleness monitoring primitives using those.

Patch History
=============

Changes from RFC v9
(https://lore.kernel.org/linux-mm/20201007071409.12174-1-sjpark@amazon.com/)
- Rebase on v5.10 + DAMON v23[1] + DAMOS v15.1[2]
- Implement primitives for page granularity idleness monitoring

Changes from RFC v8
(https://lore.kernel.org/linux-mm/20200831104730.28970-1-sjpark@amazon.com/)
- s/snprintf()/scnprintf() (Marco Elver)
- Place three parts of DAMON (core, primitives, and dbgfs) in different files

Changes from RFC v7
(https://lore.kernel.org/linux-mm/20200818072501.30396-1-sjpark@amazon.com/)
- Add missed 'put_page()' calls
- Support unmapped LRU pages

Changes from RFC v6
(https://lore.kernel.org/linux-mm/20200805065951.18221-1-sjpark@amazon.com/)
- Use 42 as the fake target id for paddr instead of -1
- Fix typo

Changes from RFC v5
(https://lore.kernel.org/linux-mm/20200707144540.21216-1-sjpark@amazon.com/)
- Support nested iomem sections (Du Fan)
- Rebase on v5.8

Changes from RFC v4
(https://lore.kernel.org/linux-mm/20200616140813.17863-1-sjpark@amazon.com/)
 - Support NUMA specific physical memory monitoring

Changes from RFC v3
(https://lore.kernel.org/linux-mm/20200609141941.19184-1-sjpark@amazon.com/)
 - Export rmap functions
 - Reorganize for physical memory monitoring support only
 - Clean up debugfs code

Changes from RFC v2
(https://lore.kernel.org/linux-mm/20200603141135.10575-1-sjpark@amazon.com/)
 - Support the physical memory monitoring with the user space tool
 - Use 'pfn_to_online_page()' (David Hildenbrand)
 - Document more detail on random 'pfn' and its safeness (David Hildenbrand)

Changes from RFC v1
(https://lore.kernel.org/linux-mm/20200409094232.29680-1-sjpark@amazon.com/)
 - Provide the reference primitive implementations for the physical memory
 - Connect the extensions with the debugfs interface

SeongJae Park (13):
  damon/dbgfs: Allow users to set initial monitoring target regions
  tools/damon: Support init target regions specification
  damon/dbgfs-test: Add a unit test case for 'init_regions'
  selftests/damon/_chk_record: Do not check number of gaps
  Docs/admin-guide/mm/damon: Document 'init_regions' feature
  mm/damon/vaddr: Separate commonly usable functions
  mm/damon: Implement primitives for physical address space monitoring
  damon/dbgfs: Support physical memory monitoring
  tools/damon/record: Support physical memory monitoring
  tools/damon/record: Support NUMA specific recording
  Docs/DAMON: Document physical memory monitoring support
  mm/damon/paddr: Separate commonly usable functions
  mm/damon: Implement primitives for page granularity idleness
    monitoring

 Documentation/admin-guide/mm/damon/usage.rst |  77 ++++++-
 Documentation/vm/damon/design.rst            |  29 ++-
 Documentation/vm/damon/faq.rst               |   5 +-
 include/linux/damon.h                        |  37 +++
 include/trace/events/damon.h                 |  19 ++
 mm/damon/Kconfig                             |  20 +-
 mm/damon/Makefile                            |   4 +-
 mm/damon/dbgfs-test.h                        |  55 +++++
 mm/damon/dbgfs.c                             | 163 ++++++++++++-
 mm/damon/paddr.c                             |  98 ++++++++
 mm/damon/pgidle.c                            |  69 ++++++
 mm/damon/prmtv-common.c                      | 226 +++++++++++++++++++
 mm/damon/prmtv-common.h                      |  25 ++
 mm/damon/vaddr.c                             | 108 +--------
 tools/damon/_damon.py                        |  41 ++++
 tools/damon/_paddr_layout.py                 | 147 ++++++++++++
 tools/damon/record.py                        |  57 ++++-
 tools/damon/schemes.py                       |  12 +-
 tools/testing/selftests/damon/_chk_record.py |   6 -
 19 files changed, 1049 insertions(+), 149 deletions(-)
 create mode 100644 mm/damon/paddr.c
 create mode 100644 mm/damon/pgidle.c
 create mode 100644 mm/damon/prmtv-common.c
 create mode 100644 mm/damon/prmtv-common.h
 create mode 100644 tools/damon/_paddr_layout.py

-- 
2.17.1



^ permalink raw reply	[flat|nested] 16+ messages in thread

* [RFC v10 01/13] damon/dbgfs: Allow users to set initial monitoring target regions
  2020-12-16  9:42 [RFC v10 00/13] DAMON: Support Physical Memory Address Space and Page-granularity Idleness Monitoring SeongJae Park
@ 2020-12-16  9:42 ` SeongJae Park
  2021-01-19 18:41   ` SeongJae Park
  2020-12-16  9:42 ` [RFC v10 02/13] tools/damon: Support init target regions specification SeongJae Park
                   ` (12 subsequent siblings)
  13 siblings, 1 reply; 16+ messages in thread
From: SeongJae Park @ 2020-12-16  9:42 UTC (permalink / raw)
  To: akpm
  Cc: SeongJae Park, Jonathan.Cameron, aarcange, acme,
	alexander.shishkin, amit, benh, brendan.d.gregg, brendanhiggins,
	cai, colin.king, corbet, david, dwmw, elver, fan.du, foersleo,
	gthelen, irogers, jolsa, kirill, mark.rutland, mgorman, minchan,
	mingo, namhyung, peterz, rdunlap, riel, rientjes, rostedt, rppt,
	sblbir, shakeelb, shuah, sj38.park, snu, vbabka, vdavydov.dev,
	yang.shi, ying.huang, zgf574564920, linux-damon, linux-mm,
	linux-doc, linux-kernel

From: SeongJae Park <sjpark@amazon.de>

Some 'damon-dbgfs' users would want to monitor only a part of the entire
virtual memory address space.  The framework users in the kernel space
could use '->init_target_regions' callback or even set the regions
inside the context struct as they want, but 'damon-dbgfs' users cannot.

For the reason, this commit introduces a new debugfs file,
'init_region'.  'damon-dbgfs' users can specify which initial monitoring
target address regions they want by writing special input to the file.
The input should describe each region in each line in below form:

    <pid> <start address> <end address>

Note that the regions will be updated to cover entire memory mapped
regions after 'regions update interval'.  If you want the regions to not
be updated after the initial setting, you could set the interval as a
very long time, say, a few decades.

Signed-off-by: SeongJae Park <sjpark@amazon.de>
---
 mm/damon/dbgfs.c | 154 ++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 151 insertions(+), 3 deletions(-)

diff --git a/mm/damon/dbgfs.c b/mm/damon/dbgfs.c
index 06295c986dc3..2f1ec6ebd9f0 100644
--- a/mm/damon/dbgfs.c
+++ b/mm/damon/dbgfs.c
@@ -492,6 +492,147 @@ static ssize_t dbgfs_target_ids_write(struct file *file,
 	return ret;
 }
 
+static ssize_t sprint_init_regions(struct damon_ctx *c, char *buf, ssize_t len)
+{
+	struct damon_target *t;
+	struct damon_region *r;
+	int written = 0;
+	int rc;
+
+	damon_for_each_target(t, c) {
+		damon_for_each_region(r, t) {
+			rc = scnprintf(&buf[written], len - written,
+					"%lu %lu %lu\n",
+					t->id, r->ar.start, r->ar.end);
+			if (!rc)
+				return -ENOMEM;
+			written += rc;
+		}
+	}
+	return written;
+}
+
+static ssize_t dbgfs_init_regions_read(struct file *file, char __user *buf,
+		size_t count, loff_t *ppos)
+{
+	struct damon_ctx *ctx = file->private_data;
+	char *kbuf;
+	ssize_t len;
+
+	kbuf = kmalloc(count, GFP_KERNEL);
+	if (!kbuf)
+		return -ENOMEM;
+
+	mutex_lock(&ctx->kdamond_lock);
+	if (ctx->kdamond) {
+		mutex_unlock(&ctx->kdamond_lock);
+		return -EBUSY;
+	}
+
+	len = sprint_init_regions(ctx, kbuf, count);
+	mutex_unlock(&ctx->kdamond_lock);
+	if (len < 0)
+		goto out;
+	len = simple_read_from_buffer(buf, count, ppos, kbuf, len);
+
+out:
+	kfree(kbuf);
+	return len;
+}
+
+static int add_init_region(struct damon_ctx *c,
+			 unsigned long target_id, struct damon_addr_range *ar)
+{
+	struct damon_target *t;
+	struct damon_region *r, *prev;
+	int rc = -EINVAL;
+
+	if (ar->start >= ar->end)
+		return -EINVAL;
+
+	damon_for_each_target(t, c) {
+		if (t->id == target_id) {
+			r = damon_new_region(ar->start, ar->end);
+			if (!r)
+				return -ENOMEM;
+			damon_add_region(r, t);
+			if (damon_nr_regions(t) > 1) {
+				prev = damon_prev_region(r);
+				if (prev->ar.end > r->ar.start) {
+					damon_destroy_region(r);
+					return -EINVAL;
+				}
+			}
+			rc = 0;
+		}
+	}
+	return rc;
+}
+
+static int set_init_regions(struct damon_ctx *c, const char *str, ssize_t len)
+{
+	struct damon_target *t;
+	struct damon_region *r, *next;
+	int pos = 0, parsed, ret;
+	unsigned long target_id;
+	struct damon_addr_range ar;
+	int err;
+
+	damon_for_each_target(t, c) {
+		damon_for_each_region_safe(r, next, t)
+			damon_destroy_region(r);
+	}
+
+	while (pos < len) {
+		ret = sscanf(&str[pos], "%lu %lu %lu%n",
+				&target_id, &ar.start, &ar.end, &parsed);
+		if (ret != 3)
+			break;
+		err = add_init_region(c, target_id, &ar);
+		if (err)
+			goto fail;
+		pos += parsed;
+	}
+
+	return 0;
+
+fail:
+	damon_for_each_target(t, c) {
+		damon_for_each_region_safe(r, next, t)
+			damon_destroy_region(r);
+	}
+	return err;
+}
+
+static ssize_t dbgfs_init_regions_write(struct file *file,
+					  const char __user *buf, size_t count,
+					  loff_t *ppos)
+{
+	struct damon_ctx *ctx = file->private_data;
+	char *kbuf;
+	ssize_t ret = count;
+	int err;
+
+	kbuf = user_input_str(buf, count, ppos);
+	if (IS_ERR(kbuf))
+		return PTR_ERR(kbuf);
+
+	mutex_lock(&ctx->kdamond_lock);
+	if (ctx->kdamond) {
+		ret = -EBUSY;
+		goto unlock_out;
+	}
+
+	err = set_init_regions(ctx, kbuf, ret);
+	if (err)
+		ret = err;
+
+unlock_out:
+	mutex_unlock(&ctx->kdamond_lock);
+	kfree(kbuf);
+	return ret;
+}
+
 static ssize_t dbgfs_kdamond_pid_read(struct file *file,
 		char __user *buf, size_t count, loff_t *ppos)
 {
@@ -553,6 +694,13 @@ static const struct file_operations target_ids_fops = {
 	.write = dbgfs_target_ids_write,
 };
 
+static const struct file_operations init_regions_fops = {
+	.owner = THIS_MODULE,
+	.open = damon_dbgfs_open,
+	.read = dbgfs_init_regions_read,
+	.write = dbgfs_init_regions_write,
+};
+
 static const struct file_operations kdamond_pid_fops = {
 	.owner = THIS_MODULE,
 	.open = damon_dbgfs_open,
@@ -562,9 +710,9 @@ static const struct file_operations kdamond_pid_fops = {
 static int dbgfs_fill_ctx_dir(struct dentry *dir, struct damon_ctx *ctx)
 {
 	const char * const file_names[] = {"attrs", "record", "schemes",
-		"target_ids", "kdamond_pid"};
-	const struct file_operations *fops[] = {&attrs_fops,
-		&record_fops, &schemes_fops, &target_ids_fops,
+		"target_ids", "init_regions", "kdamond_pid"};
+	const struct file_operations *fops[] = {&attrs_fops, &record_fops,
+		&schemes_fops, &target_ids_fops, &init_regions_fops,
 		&kdamond_pid_fops};
 	int i;
 
-- 
2.17.1



^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [RFC v10 02/13] tools/damon: Support init target regions specification
  2020-12-16  9:42 [RFC v10 00/13] DAMON: Support Physical Memory Address Space and Page-granularity Idleness Monitoring SeongJae Park
  2020-12-16  9:42 ` [RFC v10 01/13] damon/dbgfs: Allow users to set initial monitoring target regions SeongJae Park
@ 2020-12-16  9:42 ` SeongJae Park
  2020-12-16  9:42 ` [RFC v10 03/13] damon/dbgfs-test: Add a unit test case for 'init_regions' SeongJae Park
                   ` (11 subsequent siblings)
  13 siblings, 0 replies; 16+ messages in thread
From: SeongJae Park @ 2020-12-16  9:42 UTC (permalink / raw)
  To: akpm
  Cc: SeongJae Park, Jonathan.Cameron, aarcange, acme,
	alexander.shishkin, amit, benh, brendan.d.gregg, brendanhiggins,
	cai, colin.king, corbet, david, dwmw, elver, fan.du, foersleo,
	gthelen, irogers, jolsa, kirill, mark.rutland, mgorman, minchan,
	mingo, namhyung, peterz, rdunlap, riel, rientjes, rostedt, rppt,
	sblbir, shakeelb, shuah, sj38.park, snu, vbabka, vdavydov.dev,
	yang.shi, ying.huang, zgf574564920, linux-damon, linux-mm,
	linux-doc, linux-kernel

From: SeongJae Park <sjpark@amazon.de>

This commit updates the damon user space tool to support the initial
monitoring target regions specification.

Signed-off-by: SeongJae Park <sjpark@amazon.de>
---
 tools/damon/_damon.py  | 39 +++++++++++++++++++++++++++++++++++++++
 tools/damon/record.py  | 12 +++++++-----
 tools/damon/schemes.py | 12 +++++++-----
 3 files changed, 53 insertions(+), 10 deletions(-)

diff --git a/tools/damon/_damon.py b/tools/damon/_damon.py
index a4f6c03c23e4..a22ec3777c16 100644
--- a/tools/damon/_damon.py
+++ b/tools/damon/_damon.py
@@ -12,12 +12,25 @@ debugfs_attrs = None
 debugfs_record = None
 debugfs_schemes = None
 debugfs_target_ids = None
+debugfs_init_regions = None
 debugfs_monitor_on = None
 
 def set_target_id(tid):
     with open(debugfs_target_ids, 'w') as f:
         f.write('%s\n' % tid)
 
+def set_target(tid, init_regions=[]):
+    rc = set_target_id(tid)
+    if rc:
+        return rc
+
+    if not os.path.exists(debugfs_init_regions):
+        return 0
+
+    string = ' '.join(['%s %d %d' % (tid, r[0], r[1]) for r in init_regions])
+    return subprocess.call('echo "%s" > %s' % (string, debugfs_init_regions),
+            shell=True, executable='/bin/bash')
+
 def turn_damon(on_off):
     return subprocess.call("echo %s > %s" % (on_off, debugfs_monitor_on),
             shell=True, executable="/bin/bash")
@@ -97,6 +110,7 @@ def chk_update_debugfs(debugfs):
     global debugfs_record
     global debugfs_schemes
     global debugfs_target_ids
+    global debugfs_init_regions
     global debugfs_monitor_on
 
     debugfs_damon = os.path.join(debugfs, 'damon')
@@ -104,6 +118,7 @@ def chk_update_debugfs(debugfs):
     debugfs_record = os.path.join(debugfs_damon, 'record')
     debugfs_schemes = os.path.join(debugfs_damon, 'schemes')
     debugfs_target_ids = os.path.join(debugfs_damon, 'target_ids')
+    debugfs_init_regions = os.path.join(debugfs_damon, 'init_regions')
     debugfs_monitor_on = os.path.join(debugfs_damon, 'monitor_on')
 
     if not os.path.isdir(debugfs_damon):
@@ -131,6 +146,26 @@ def cmd_args_to_attrs(args):
     return Attrs(sample_interval, aggr_interval, regions_update_interval,
             min_nr_regions, max_nr_regions, rbuf_len, rfile_path, schemes)
 
+def cmd_args_to_init_regions(args):
+    regions = []
+    for arg in args.regions.split():
+        addrs = arg.split('-')
+        try:
+            if len(addrs) != 2:
+                raise Exception('two addresses not given')
+            start = int(addrs[0])
+            end = int(addrs[1])
+            if start >= end:
+                raise Exception('start >= end')
+            if regions and regions[-1][1] > start:
+                raise Exception('regions overlap')
+        except Exception as e:
+            print('Wrong \'--regions\' argument (%s)' % e)
+            exit(1)
+
+        regions.append([start, end])
+    return regions
+
 def set_attrs_argparser(parser):
     parser.add_argument('-d', '--debugfs', metavar='<debugfs>', type=str,
             default='/sys/kernel/debug', help='debugfs mounted path')
@@ -144,3 +179,7 @@ def set_attrs_argparser(parser):
             default=10, help='minimal number of regions')
     parser.add_argument('-m', '--maxr', metavar='<# regions>', type=int,
             default=1000, help='maximum number of regions')
+
+def set_init_regions_argparser(parser):
+    parser.add_argument('-r', '--regions', metavar='"<start>-<end> ..."',
+            type=str, default='', help='monitoring target address regions')
diff --git a/tools/damon/record.py b/tools/damon/record.py
index 6d1cbe593b94..11fd54001472 100644
--- a/tools/damon/record.py
+++ b/tools/damon/record.py
@@ -24,7 +24,7 @@ def pidfd_open(pid):
 
     return syscall(NR_pidfd_open, pid, 0)
 
-def do_record(target, is_target_cmd, attrs, old_attrs, pidfd):
+def do_record(target, is_target_cmd, init_regions, attrs, old_attrs, pidfd):
     if os.path.isfile(attrs.rfile_path):
         os.rename(attrs.rfile_path, attrs.rfile_path + '.old')
 
@@ -48,8 +48,8 @@ def do_record(target, is_target_cmd, attrs, old_attrs, pidfd):
         # only for reference of the pidfd usage.
         target = 'pidfd %s' % fd
 
-    if _damon.set_target_id(target):
-        print('target id setting (%s) failed' % target)
+    if _damon.set_target(target, init_regions):
+        print('target setting (%s, %s) failed' % (target, init_regions))
         cleanup_exit(old_attrs, -2)
     if _damon.turn_damon('on'):
         print('could not turn on damon' % target)
@@ -91,6 +91,7 @@ def chk_permission():
 
 def set_argparser(parser):
     _damon.set_attrs_argparser(parser)
+    _damon.set_init_regions_argparser(parser)
     parser.add_argument('target', type=str, metavar='<target>',
             help='the target command or the pid to record')
     parser.add_argument('--pidfd', action='store_true',
@@ -117,19 +118,20 @@ def main(args=None):
     args.schemes = ''
     pidfd = args.pidfd
     new_attrs = _damon.cmd_args_to_attrs(args)
+    init_regions = _damon.cmd_args_to_init_regions(args)
     target = args.target
 
     target_fields = target.split()
     if not subprocess.call('which %s &> /dev/null' % target_fields[0],
             shell=True, executable='/bin/bash'):
-        do_record(target, True, new_attrs, orig_attrs, pidfd)
+        do_record(target, True, init_regions, new_attrs, orig_attrs, pidfd)
     else:
         try:
             pid = int(target)
         except:
             print('target \'%s\' is neither a command, nor a pid' % target)
             exit(1)
-        do_record(target, False, new_attrs, orig_attrs, pidfd)
+        do_record(target, False, init_regions, new_attrs, orig_attrs, pidfd)
 
 if __name__ == '__main__':
     main()
diff --git a/tools/damon/schemes.py b/tools/damon/schemes.py
index 9095835f6133..cfec89854a08 100644
--- a/tools/damon/schemes.py
+++ b/tools/damon/schemes.py
@@ -14,7 +14,7 @@ import time
 import _convert_damos
 import _damon
 
-def run_damon(target, is_target_cmd, attrs, old_attrs):
+def run_damon(target, is_target_cmd, init_regions, attrs, old_attrs):
     if os.path.isfile(attrs.rfile_path):
         os.rename(attrs.rfile_path, attrs.rfile_path + '.old')
 
@@ -27,8 +27,8 @@ def run_damon(target, is_target_cmd, attrs, old_attrs):
     if is_target_cmd:
         p = subprocess.Popen(target, shell=True, executable='/bin/bash')
         target = p.pid
-    if _damon.set_target_pid(target):
-        print('pid setting (%s) failed' % target)
+    if _damon.set_target(target, init_regions):
+        print('target setting (%s, %s) failed' % (target, init_regions))
         cleanup_exit(old_attrs, -2)
     if _damon.turn_damon('on'):
         print('could not turn on damon' % target)
@@ -68,6 +68,7 @@ def chk_permission():
 
 def set_argparser(parser):
     _damon.set_attrs_argparser(parser)
+    _damon.set_init_regions_argparser(parser)
     parser.add_argument('target', type=str, metavar='<target>',
             help='the target command or the pid to record')
     parser.add_argument('-c', '--schemes', metavar='<file>', type=str,
@@ -92,19 +93,20 @@ def main(args=None):
     args.out = 'null'
     args.schemes = _convert_damos.convert(args.schemes, args.sample, args.aggr)
     new_attrs = _damon.cmd_args_to_attrs(args)
+    init_regions = _damon.cmd_args_to_init_regions(args)
     target = args.target
 
     target_fields = target.split()
     if not subprocess.call('which %s &> /dev/null' % target_fields[0],
             shell=True, executable='/bin/bash'):
-        run_damon(target, True, new_attrs, orig_attrs)
+        run_damon(target, True, init_regions, new_attrs, orig_attrs)
     else:
         try:
             pid = int(target)
         except:
             print('target \'%s\' is neither a command, nor a pid' % target)
             exit(1)
-        run_damon(target, False, new_attrs, orig_attrs)
+        run_damon(target, False, init_regions, new_attrs, orig_attrs)
 
 if __name__ == '__main__':
     main()
-- 
2.17.1



^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [RFC v10 03/13] damon/dbgfs-test: Add a unit test case for 'init_regions'
  2020-12-16  9:42 [RFC v10 00/13] DAMON: Support Physical Memory Address Space and Page-granularity Idleness Monitoring SeongJae Park
  2020-12-16  9:42 ` [RFC v10 01/13] damon/dbgfs: Allow users to set initial monitoring target regions SeongJae Park
  2020-12-16  9:42 ` [RFC v10 02/13] tools/damon: Support init target regions specification SeongJae Park
@ 2020-12-16  9:42 ` SeongJae Park
  2020-12-16  9:42 ` [RFC v10 04/13] selftests/damon/_chk_record: Do not check number of gaps SeongJae Park
                   ` (10 subsequent siblings)
  13 siblings, 0 replies; 16+ messages in thread
From: SeongJae Park @ 2020-12-16  9:42 UTC (permalink / raw)
  To: akpm
  Cc: SeongJae Park, Jonathan.Cameron, aarcange, acme,
	alexander.shishkin, amit, benh, brendan.d.gregg, brendanhiggins,
	cai, colin.king, corbet, david, dwmw, elver, fan.du, foersleo,
	gthelen, irogers, jolsa, kirill, mark.rutland, mgorman, minchan,
	mingo, namhyung, peterz, rdunlap, riel, rientjes, rostedt, rppt,
	sblbir, shakeelb, shuah, sj38.park, snu, vbabka, vdavydov.dev,
	yang.shi, ying.huang, zgf574564920, linux-damon, linux-mm,
	linux-doc, linux-kernel

From: SeongJae Park <sjpark@amazon.de>

This commit adds another test case for the new feature, 'init_regions'.

Signed-off-by: SeongJae Park <sjpark@amazon.de>
Reviewed-by: Brendan Higgins <brendanhiggins@google.com>
---
 mm/damon/dbgfs-test.h | 55 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 55 insertions(+)

diff --git a/mm/damon/dbgfs-test.h b/mm/damon/dbgfs-test.h
index ce9c6784ad47..91138b53fe2a 100644
--- a/mm/damon/dbgfs-test.h
+++ b/mm/damon/dbgfs-test.h
@@ -189,12 +189,67 @@ static void damon_dbgfs_test_aggregate(struct kunit *test)
 	damon_destroy_ctx(ctx);
 }
 
+
+static void damon_dbgfs_test_set_init_regions(struct kunit *test)
+{
+	struct damon_ctx *ctx = damon_new_ctx(DAMON_ADAPTIVE_TARGET);
+	unsigned long ids[] = {1, 2, 3};
+	/* Each line represents one region in ``<target id> <start> <end>`` */
+	char * const valid_inputs[] = {"2 10 20\n 2   20 30\n2 35 45",
+		"2 10 20\n",
+		"2 10 20\n1 39 59\n1 70 134\n  2  20 25\n",
+		""};
+	/* Reading the file again will show sorted, clean output */
+	char * const valid_expects[] = {"2 10 20\n2 20 30\n2 35 45\n",
+		"2 10 20\n",
+		"1 39 59\n1 70 134\n2 10 20\n2 20 25\n",
+		""};
+	char * const invalid_inputs[] = {"4 10 20\n",	/* target not exists */
+		"2 10 20\n 2 14 26\n",		/* regions overlap */
+		"1 10 20\n2 30 40\n 1 5 8"};	/* not sorted by address */
+	char *input, *expect;
+	int i, rc;
+	char buf[256];
+
+	damon_set_targets(ctx, ids, 3);
+
+	/* Put valid inputs and check the results */
+	for (i = 0; i < ARRAY_SIZE(valid_inputs); i++) {
+		input = valid_inputs[i];
+		expect = valid_expects[i];
+
+		rc = set_init_regions(ctx, input, strnlen(input, 256));
+		KUNIT_EXPECT_EQ(test, rc, 0);
+
+		memset(buf, 0, 256);
+		sprint_init_regions(ctx, buf, 256);
+
+		KUNIT_EXPECT_STREQ(test, (char *)buf, expect);
+	}
+	/* Put invlid inputs and check the return error code */
+	for (i = 0; i < ARRAY_SIZE(invalid_inputs); i++) {
+		input = invalid_inputs[i];
+		pr_info("input: %s\n", input);
+		rc = set_init_regions(ctx, input, strnlen(input, 256));
+		KUNIT_EXPECT_EQ(test, rc, -EINVAL);
+
+		memset(buf, 0, 256);
+		sprint_init_regions(ctx, buf, 256);
+
+		KUNIT_EXPECT_STREQ(test, (char *)buf, "");
+	}
+
+	damon_set_targets(ctx, NULL, 0);
+	damon_destroy_ctx(ctx);
+}
+
 static struct kunit_case damon_test_cases[] = {
 	KUNIT_CASE(damon_dbgfs_test_str_to_target_ids),
 	KUNIT_CASE(damon_dbgfs_test_set_targets),
 	KUNIT_CASE(damon_dbgfs_test_set_recording),
 	KUNIT_CASE(damon_dbgfs_test_write_rbuf),
 	KUNIT_CASE(damon_dbgfs_test_aggregate),
+	KUNIT_CASE(damon_dbgfs_test_set_init_regions),
 	{},
 };
 
-- 
2.17.1



^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [RFC v10 04/13] selftests/damon/_chk_record: Do not check number of gaps
  2020-12-16  9:42 [RFC v10 00/13] DAMON: Support Physical Memory Address Space and Page-granularity Idleness Monitoring SeongJae Park
                   ` (2 preceding siblings ...)
  2020-12-16  9:42 ` [RFC v10 03/13] damon/dbgfs-test: Add a unit test case for 'init_regions' SeongJae Park
@ 2020-12-16  9:42 ` SeongJae Park
  2020-12-16  9:42 ` [RFC v10 05/13] Docs/admin-guide/mm/damon: Document 'init_regions' feature SeongJae Park
                   ` (9 subsequent siblings)
  13 siblings, 0 replies; 16+ messages in thread
From: SeongJae Park @ 2020-12-16  9:42 UTC (permalink / raw)
  To: akpm
  Cc: SeongJae Park, Jonathan.Cameron, aarcange, acme,
	alexander.shishkin, amit, benh, brendan.d.gregg, brendanhiggins,
	cai, colin.king, corbet, david, dwmw, elver, fan.du, foersleo,
	gthelen, irogers, jolsa, kirill, mark.rutland, mgorman, minchan,
	mingo, namhyung, peterz, rdunlap, riel, rientjes, rostedt, rppt,
	sblbir, shakeelb, shuah, sj38.park, snu, vbabka, vdavydov.dev,
	yang.shi, ying.huang, zgf574564920, linux-damon, linux-mm,
	linux-doc, linux-kernel

From: SeongJae Park <sjpark@amazon.de>

Now the regions can be explicitly set as users want.  Therefore checking
the number of gaps doesn't make sense.  Remove the condition.

Signed-off-by: SeongJae Park <sjpark@amazon.de>
---
 tools/testing/selftests/damon/_chk_record.py | 6 ------
 1 file changed, 6 deletions(-)

diff --git a/tools/testing/selftests/damon/_chk_record.py b/tools/testing/selftests/damon/_chk_record.py
index 73e128904319..5f11be64abed 100644
--- a/tools/testing/selftests/damon/_chk_record.py
+++ b/tools/testing/selftests/damon/_chk_record.py
@@ -37,12 +37,9 @@ def chk_task_info(f):
         print('too many regions: %d > %d' % (nr_regions, max_nr_regions))
         exit(1)
 
-    nr_gaps = 0
     eaddr = 0
     for r in range(nr_regions):
         saddr = struct.unpack('L', f.read(8))[0]
-        if eaddr and saddr != eaddr:
-            nr_gaps += 1
         eaddr = struct.unpack('L', f.read(8))[0]
         nr_accesses = struct.unpack('I', f.read(4))[0]
 
@@ -56,9 +53,6 @@ def chk_task_info(f):
                 print('too high nr_access: expected %d but %d' %
                         (max_nr_accesses, nr_accesses))
                 exit(1)
-    if nr_gaps != 2:
-        print('number of gaps are not two but %d' % nr_gaps)
-        exit(1)
 
 def parse_time_us(bindat):
     sec = struct.unpack('l', bindat[0:8])[0]
-- 
2.17.1



^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [RFC v10 05/13] Docs/admin-guide/mm/damon: Document 'init_regions' feature
  2020-12-16  9:42 [RFC v10 00/13] DAMON: Support Physical Memory Address Space and Page-granularity Idleness Monitoring SeongJae Park
                   ` (3 preceding siblings ...)
  2020-12-16  9:42 ` [RFC v10 04/13] selftests/damon/_chk_record: Do not check number of gaps SeongJae Park
@ 2020-12-16  9:42 ` SeongJae Park
  2020-12-16  9:42 ` [RFC v10 06/13] mm/damon/vaddr: Separate commonly usable functions SeongJae Park
                   ` (8 subsequent siblings)
  13 siblings, 0 replies; 16+ messages in thread
From: SeongJae Park @ 2020-12-16  9:42 UTC (permalink / raw)
  To: akpm
  Cc: SeongJae Park, Jonathan.Cameron, aarcange, acme,
	alexander.shishkin, amit, benh, brendan.d.gregg, brendanhiggins,
	cai, colin.king, corbet, david, dwmw, elver, fan.du, foersleo,
	gthelen, irogers, jolsa, kirill, mark.rutland, mgorman, minchan,
	mingo, namhyung, peterz, rdunlap, riel, rientjes, rostedt, rppt,
	sblbir, shakeelb, shuah, sj38.park, snu, vbabka, vdavydov.dev,
	yang.shi, ying.huang, zgf574564920, linux-damon, linux-mm,
	linux-doc, linux-kernel

From: SeongJae Park <sjpark@amazon.de>

This commit adds description of the 'init_regions' feature in the DAMON
usage document.

Signed-off-by: SeongJae Park <sjpark@amazon.de>
---
 Documentation/admin-guide/mm/damon/usage.rst | 41 +++++++++++++++++++-
 1 file changed, 39 insertions(+), 2 deletions(-)

diff --git a/Documentation/admin-guide/mm/damon/usage.rst b/Documentation/admin-guide/mm/damon/usage.rst
index 96278227f925..cf0d44ce0ac9 100644
--- a/Documentation/admin-guide/mm/damon/usage.rst
+++ b/Documentation/admin-guide/mm/damon/usage.rst
@@ -281,8 +281,9 @@ for at least 100 milliseconds using below commands::
 debugfs Interface
 =================
 
-DAMON exports five files, ``attrs``, ``target_ids``, ``record``, ``schemes``
-and ``monitor_on`` under its debugfs directory, ``<debugfs>/damon/``.
+DAMON exports six files, ``attrs``, ``target_ids``, ``init_regions``,
+``record``, ``schemes`` and ``monitor_on`` under its debugfs directory,
+``<debugfs>/damon/``.
 
 
 Attributes
@@ -321,6 +322,42 @@ check it again::
 Note that setting the target ids doesn't start the monitoring.
 
 
+Initial Monitoring Target Regions
+---------------------------------
+
+In case of the debugfs based monitoring, DAMON automatically sets and updates
+the monitoring target regions so that entire memory mappings of target
+processes can be covered. However, users might want to limit the monitoring
+region to specific address ranges, such as the heap, the stack, or specific
+file-mapped area.  Or, some users might know the initial access pattern of
+their workloads and therefore want to set optimal initial regions for the
+'adaptive regions adjustment'.
+
+In such cases, users can explicitly set the initial monitoring target regions
+as they want, by writing proper values to the ``init_regions`` file.  Each line
+of the input should represent one region in below form.::
+
+    <target id> <start address> <end address>
+
+The ``target id`` should already in ``target_ids`` file, and the regions should
+be passed in address order.  For example, below commands will set a couple of
+address ranges, ``1-100`` and ``100-200`` as the initial monitoring target
+region of process 42, and another couple of address ranges, ``20-40`` and
+``50-100`` as that of process 4242.::
+
+    # cd <debugfs>/damon
+    # echo "42   1       100
+            42   100     200
+            4242 20      40
+            4242 50      100" > init_regions
+
+Note that this sets the initial monitoring target regions only.  In case of
+virtual memory monitoring, DAMON will automatically updates the boundary of the
+regions after one ``regions update interval``.  Therefore, users should set the
+``regions update interval`` large enough in this case, if they don't want the
+update.
+
+
 Record
 ------
 
-- 
2.17.1



^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [RFC v10 06/13] mm/damon/vaddr: Separate commonly usable functions
  2020-12-16  9:42 [RFC v10 00/13] DAMON: Support Physical Memory Address Space and Page-granularity Idleness Monitoring SeongJae Park
                   ` (4 preceding siblings ...)
  2020-12-16  9:42 ` [RFC v10 05/13] Docs/admin-guide/mm/damon: Document 'init_regions' feature SeongJae Park
@ 2020-12-16  9:42 ` SeongJae Park
  2020-12-16  9:42 ` [RFC v10 07/13] mm/damon: Implement primitives for physical address space monitoring SeongJae Park
                   ` (7 subsequent siblings)
  13 siblings, 0 replies; 16+ messages in thread
From: SeongJae Park @ 2020-12-16  9:42 UTC (permalink / raw)
  To: akpm
  Cc: SeongJae Park, Jonathan.Cameron, aarcange, acme,
	alexander.shishkin, amit, benh, brendan.d.gregg, brendanhiggins,
	cai, colin.king, corbet, david, dwmw, elver, fan.du, foersleo,
	gthelen, irogers, jolsa, kirill, mark.rutland, mgorman, minchan,
	mingo, namhyung, peterz, rdunlap, riel, rientjes, rostedt, rppt,
	sblbir, shakeelb, shuah, sj38.park, snu, vbabka, vdavydov.dev,
	yang.shi, ying.huang, zgf574564920, linux-damon, linux-mm,
	linux-doc, linux-kernel

From: SeongJae Park <sjpark@amazon.de>

This commit moves functions in the default virtual address spaces
monitoring primitives that commonly usable from other address spaces
like physical address space into a header file.  Those will be reused by
the physical address space monitoring primitives in the following
commit.

Signed-off-by: SeongJae Park <sjpark@amazon.de>
---
 mm/damon/Makefile       |   2 +-
 mm/damon/prmtv-common.c | 104 ++++++++++++++++++++++++++++++++++++++
 mm/damon/prmtv-common.h |  21 ++++++++
 mm/damon/vaddr.c        | 108 +---------------------------------------
 4 files changed, 128 insertions(+), 107 deletions(-)
 create mode 100644 mm/damon/prmtv-common.c
 create mode 100644 mm/damon/prmtv-common.h

diff --git a/mm/damon/Makefile b/mm/damon/Makefile
index fed4be3bace3..99b1bfe01ff5 100644
--- a/mm/damon/Makefile
+++ b/mm/damon/Makefile
@@ -1,5 +1,5 @@
 # SPDX-License-Identifier: GPL-2.0
 
 obj-$(CONFIG_DAMON)		:= core.o
-obj-$(CONFIG_DAMON_VADDR)	+= vaddr.o
+obj-$(CONFIG_DAMON_VADDR)	+= prmtv-common.o vaddr.o
 obj-$(CONFIG_DAMON_DBGFS)	+= dbgfs.o
diff --git a/mm/damon/prmtv-common.c b/mm/damon/prmtv-common.c
new file mode 100644
index 000000000000..6cdb96cbc9ef
--- /dev/null
+++ b/mm/damon/prmtv-common.c
@@ -0,0 +1,104 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Common Primitives for Data Access Monitoring
+ *
+ * Author: SeongJae Park <sjpark@amazon.de>
+ */
+
+#include "prmtv-common.h"
+
+static void damon_ptep_mkold(pte_t *pte, struct mm_struct *mm,
+			     unsigned long addr)
+{
+	bool referenced = false;
+	struct page *page = pte_page(*pte);
+
+	if (pte_young(*pte)) {
+		referenced = true;
+		*pte = pte_mkold(*pte);
+	}
+
+#ifdef CONFIG_MMU_NOTIFIER
+	if (mmu_notifier_clear_young(mm, addr, addr + PAGE_SIZE))
+		referenced = true;
+#endif /* CONFIG_MMU_NOTIFIER */
+
+	if (referenced)
+		set_page_young(page);
+
+	set_page_idle(page);
+}
+
+static void damon_pmdp_mkold(pmd_t *pmd, struct mm_struct *mm,
+			     unsigned long addr)
+{
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+	bool referenced = false;
+	struct page *page = pmd_page(*pmd);
+
+	if (pmd_young(*pmd)) {
+		referenced = true;
+		*pmd = pmd_mkold(*pmd);
+	}
+
+#ifdef CONFIG_MMU_NOTIFIER
+	if (mmu_notifier_clear_young(mm, addr,
+				addr + ((1UL) << HPAGE_PMD_SHIFT)))
+		referenced = true;
+#endif /* CONFIG_MMU_NOTIFIER */
+
+	if (referenced)
+		set_page_young(page);
+
+	set_page_idle(page);
+#endif /* CONFIG_TRANSPARENT_HUGEPAGE */
+}
+
+void damon_va_mkold(struct mm_struct *mm, unsigned long addr)
+{
+	pte_t *pte = NULL;
+	pmd_t *pmd = NULL;
+	spinlock_t *ptl;
+
+	if (follow_pte_pmd(mm, addr, NULL, &pte, &pmd, &ptl))
+		return;
+
+	if (pte) {
+		damon_ptep_mkold(pte, mm, addr);
+		pte_unmap_unlock(pte, ptl);
+	} else {
+		damon_pmdp_mkold(pmd, mm, addr);
+		spin_unlock(ptl);
+	}
+}
+
+bool damon_va_young(struct mm_struct *mm, unsigned long addr,
+			unsigned long *page_sz)
+{
+	pte_t *pte = NULL;
+	pmd_t *pmd = NULL;
+	spinlock_t *ptl;
+	bool young = false;
+
+	if (follow_pte_pmd(mm, addr, NULL, &pte, &pmd, &ptl))
+		return false;
+
+	*page_sz = PAGE_SIZE;
+	if (pte) {
+		young = pte_young(*pte);
+		if (!young)
+			young = !page_is_idle(pte_page(*pte));
+		pte_unmap_unlock(pte, ptl);
+		return young;
+	}
+
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+	young = pmd_young(*pmd);
+	if (!young)
+		young = !page_is_idle(pmd_page(*pmd));
+	spin_unlock(ptl);
+	*page_sz = ((1UL) << HPAGE_PMD_SHIFT);
+#endif	/* CONFIG_TRANSPARENT_HUGEPAGE */
+
+	return young;
+}
diff --git a/mm/damon/prmtv-common.h b/mm/damon/prmtv-common.h
new file mode 100644
index 000000000000..a66a6139b4fc
--- /dev/null
+++ b/mm/damon/prmtv-common.h
@@ -0,0 +1,21 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Common Primitives for Data Access Monitoring
+ *
+ * Author: SeongJae Park <sjpark@amazon.de>
+ */
+
+#include <linux/damon.h>
+#include <linux/mm.h>
+#include <linux/mmu_notifier.h>
+#include <linux/page_idle.h>
+#include <linux/random.h>
+#include <linux/sched/mm.h>
+#include <linux/slab.h>
+
+/* Get a random number in [l, r) */
+#define damon_rand(l, r) (l + prandom_u32_max(r - l))
+
+void damon_va_mkold(struct mm_struct *mm, unsigned long addr);
+bool damon_va_young(struct mm_struct *mm, unsigned long addr,
+			unsigned long *page_sz);
diff --git a/mm/damon/vaddr.c b/mm/damon/vaddr.c
index 2075f07f728b..915b12329c6e 100644
--- a/mm/damon/vaddr.c
+++ b/mm/damon/vaddr.c
@@ -8,22 +8,14 @@
 #define pr_fmt(fmt) "damon-va: " fmt
 
 #include <asm-generic/mman-common.h>
-#include <linux/damon.h>
-#include <linux/mm.h>
-#include <linux/mmu_notifier.h>
-#include <linux/page_idle.h>
-#include <linux/random.h>
-#include <linux/sched/mm.h>
-#include <linux/slab.h>
+
+#include "prmtv-common.h"
 
 #ifdef CONFIG_DAMON_VADDR_KUNIT_TEST
 #undef DAMON_MIN_REGION
 #define DAMON_MIN_REGION 1
 #endif
 
-/* Get a random number in [l, r) */
-#define damon_rand(l, r) (l + prandom_u32_max(r - l))
-
 /*
  * 't->id' should be the pointer to the relevant 'struct pid' having reference
  * count.  Caller must put the returned task, unless it is NULL.
@@ -370,71 +362,6 @@ void damon_va_update_regions(struct damon_ctx *ctx)
 	}
 }
 
-static void damon_ptep_mkold(pte_t *pte, struct mm_struct *mm,
-			     unsigned long addr)
-{
-	bool referenced = false;
-	struct page *page = pte_page(*pte);
-
-	if (pte_young(*pte)) {
-		referenced = true;
-		*pte = pte_mkold(*pte);
-	}
-
-#ifdef CONFIG_MMU_NOTIFIER
-	if (mmu_notifier_clear_young(mm, addr, addr + PAGE_SIZE))
-		referenced = true;
-#endif /* CONFIG_MMU_NOTIFIER */
-
-	if (referenced)
-		set_page_young(page);
-
-	set_page_idle(page);
-}
-
-static void damon_pmdp_mkold(pmd_t *pmd, struct mm_struct *mm,
-			     unsigned long addr)
-{
-#ifdef CONFIG_TRANSPARENT_HUGEPAGE
-	bool referenced = false;
-	struct page *page = pmd_page(*pmd);
-
-	if (pmd_young(*pmd)) {
-		referenced = true;
-		*pmd = pmd_mkold(*pmd);
-	}
-
-#ifdef CONFIG_MMU_NOTIFIER
-	if (mmu_notifier_clear_young(mm, addr,
-				addr + ((1UL) << HPAGE_PMD_SHIFT)))
-		referenced = true;
-#endif /* CONFIG_MMU_NOTIFIER */
-
-	if (referenced)
-		set_page_young(page);
-
-	set_page_idle(page);
-#endif /* CONFIG_TRANSPARENT_HUGEPAGE */
-}
-
-static void damon_va_mkold(struct mm_struct *mm, unsigned long addr)
-{
-	pte_t *pte = NULL;
-	pmd_t *pmd = NULL;
-	spinlock_t *ptl;
-
-	if (follow_pte_pmd(mm, addr, NULL, &pte, &pmd, &ptl))
-		return;
-
-	if (pte) {
-		damon_ptep_mkold(pte, mm, addr);
-		pte_unmap_unlock(pte, ptl);
-	} else {
-		damon_pmdp_mkold(pmd, mm, addr);
-		spin_unlock(ptl);
-	}
-}
-
 /*
  * Functions for the access checking of the regions
  */
@@ -463,37 +390,6 @@ void damon_va_prepare_access_checks(struct damon_ctx *ctx)
 	}
 }
 
-static bool damon_va_young(struct mm_struct *mm, unsigned long addr,
-			unsigned long *page_sz)
-{
-	pte_t *pte = NULL;
-	pmd_t *pmd = NULL;
-	spinlock_t *ptl;
-	bool young = false;
-
-	if (follow_pte_pmd(mm, addr, NULL, &pte, &pmd, &ptl))
-		return false;
-
-	*page_sz = PAGE_SIZE;
-	if (pte) {
-		young = pte_young(*pte);
-		if (!young)
-			young = !page_is_idle(pte_page(*pte));
-		pte_unmap_unlock(pte, ptl);
-		return young;
-	}
-
-#ifdef CONFIG_TRANSPARENT_HUGEPAGE
-	young = pmd_young(*pmd);
-	if (!young)
-		young = !page_is_idle(pmd_page(*pmd));
-	spin_unlock(ptl);
-	*page_sz = ((1UL) << HPAGE_PMD_SHIFT);
-#endif	/* CONFIG_TRANSPARENT_HUGEPAGE */
-
-	return young;
-}
-
 /*
  * Check whether the region was accessed after the last preparation
  *
-- 
2.17.1



^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [RFC v10 07/13] mm/damon: Implement primitives for physical address space monitoring
  2020-12-16  9:42 [RFC v10 00/13] DAMON: Support Physical Memory Address Space and Page-granularity Idleness Monitoring SeongJae Park
                   ` (5 preceding siblings ...)
  2020-12-16  9:42 ` [RFC v10 06/13] mm/damon/vaddr: Separate commonly usable functions SeongJae Park
@ 2020-12-16  9:42 ` SeongJae Park
  2020-12-16  9:42 ` [RFC v10 08/13] damon/dbgfs: Support physical memory monitoring SeongJae Park
                   ` (6 subsequent siblings)
  13 siblings, 0 replies; 16+ messages in thread
From: SeongJae Park @ 2020-12-16  9:42 UTC (permalink / raw)
  To: akpm
  Cc: SeongJae Park, Jonathan.Cameron, aarcange, acme,
	alexander.shishkin, amit, benh, brendan.d.gregg, brendanhiggins,
	cai, colin.king, corbet, david, dwmw, elver, fan.du, foersleo,
	gthelen, irogers, jolsa, kirill, mark.rutland, mgorman, minchan,
	mingo, namhyung, peterz, rdunlap, riel, rientjes, rostedt, rppt,
	sblbir, shakeelb, shuah, sj38.park, snu, vbabka, vdavydov.dev,
	yang.shi, ying.huang, zgf574564920, linux-damon, linux-mm,
	linux-doc, linux-kernel

From: SeongJae Park <sjpark@amazon.de>

This commit implements the primitives for the basic access monitoring of
the physical memory address space.  By using this, users can easily
monitor the accesses to the physical memory.

Internally, it uses the PTE Accessed bit, as similar to that of the
virtual memory support.  Also, it supports only user memory pages, as
idle page tracking also does, for the same reason.  If the monitoring
target physical memory address range contains non-user memory pages,
access check of the pages will do nothing but simply treat the pages as
not accessed.

Users who want to use other access check primitives and/or monitor the
non-user memory regions could implement and use their own callbacks.

Signed-off-by: SeongJae Park <sjpark@amazon.de>
---
 include/linux/damon.h |  10 ++
 mm/damon/Kconfig      |   9 ++
 mm/damon/Makefile     |   1 +
 mm/damon/paddr.c      | 222 ++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 242 insertions(+)
 create mode 100644 mm/damon/paddr.c

diff --git a/include/linux/damon.h b/include/linux/damon.h
index ed7e86207e53..ea2fd054b2ef 100644
--- a/include/linux/damon.h
+++ b/include/linux/damon.h
@@ -376,4 +376,14 @@ void damon_va_set_primitives(struct damon_ctx *ctx);
 
 #endif	/* CONFIG_DAMON_VADDR */
 
+#ifdef CONFIG_DAMON_PADDR
+
+/* Monitoring primitives for the physical memory address space */
+void damon_pa_prepare_access_checks(struct damon_ctx *ctx);
+unsigned int damon_pa_check_accesses(struct damon_ctx *ctx);
+bool damon_pa_target_valid(void *t);
+void damon_pa_set_primitives(struct damon_ctx *ctx);
+
+#endif	/* CONFIG_DAMON_PADDR */
+
 #endif	/* _DAMON_H */
diff --git a/mm/damon/Kconfig b/mm/damon/Kconfig
index 455995152697..89c06ac8c9eb 100644
--- a/mm/damon/Kconfig
+++ b/mm/damon/Kconfig
@@ -33,6 +33,15 @@ config DAMON_VADDR
 	  This builds the default data access monitoring primitives for DAMON
 	  that works for virtual address spaces.
 
+config DAMON_PADDR
+	bool "Data access monitoring primitives for the physical address space"
+	depends on DAMON && MMU
+	select PAGE_EXTENSION if !64BIT
+	select PAGE_IDLE_FLAG
+	help
+	  This builds the default data access monitoring primitives for DAMON
+	  that works for physical address spaces.
+
 config DAMON_VADDR_KUNIT_TEST
 	bool "Test for DAMON primitives" if !KUNIT_ALL_TESTS
 	depends on DAMON_VADDR && KUNIT=y
diff --git a/mm/damon/Makefile b/mm/damon/Makefile
index 99b1bfe01ff5..8d9b0df79702 100644
--- a/mm/damon/Makefile
+++ b/mm/damon/Makefile
@@ -2,4 +2,5 @@
 
 obj-$(CONFIG_DAMON)		:= core.o
 obj-$(CONFIG_DAMON_VADDR)	+= prmtv-common.o vaddr.o
+obj-$(CONFIG_DAMON_PADDR)	+= prmtv-common.o paddr.o
 obj-$(CONFIG_DAMON_DBGFS)	+= dbgfs.o
diff --git a/mm/damon/paddr.c b/mm/damon/paddr.c
new file mode 100644
index 000000000000..b120f672cc57
--- /dev/null
+++ b/mm/damon/paddr.c
@@ -0,0 +1,222 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * DAMON Primitives for The Physical Address Space
+ *
+ * Author: SeongJae Park <sjpark@amazon.de>
+ */
+
+#define pr_fmt(fmt) "damon-pa: " fmt
+
+#include <linux/rmap.h>
+
+#include "prmtv-common.h"
+
+/*
+ * This has no implementations for 'init_target_regions()' and
+ * 'update_target_regions()'.  Users should set the initial regions and update
+ * regions by themselves in the 'before_start' and 'after_aggregation'
+ * callbacks, respectively.  Or, they can implement and use their own version
+ * of the primitives.
+ */
+
+/*
+ * Get a page by pfn if it is in the LRU list.  Otherwise, returns NULL.
+ *
+ * The body of this function is stollen from the 'page_idle_get_page()'.  We
+ * steal rather than reuse it because the code is quite simple.
+ */
+static struct page *damon_pa_get_page(unsigned long pfn)
+{
+	struct page *page = pfn_to_online_page(pfn);
+	pg_data_t *pgdat;
+
+	if (!page || !PageLRU(page) ||
+	    !get_page_unless_zero(page))
+		return NULL;
+
+	pgdat = page_pgdat(page);
+	spin_lock_irq(&pgdat->lru_lock);
+	if (unlikely(!PageLRU(page))) {
+		put_page(page);
+		page = NULL;
+	}
+	spin_unlock_irq(&pgdat->lru_lock);
+	return page;
+}
+
+static bool __damon_pa_mkold(struct page *page, struct vm_area_struct *vma,
+		unsigned long addr, void *arg)
+{
+	damon_va_mkold(vma->vm_mm, addr);
+	return true;
+}
+
+static void damon_pa_mkold(unsigned long paddr)
+{
+	struct page *page = damon_pa_get_page(PHYS_PFN(paddr));
+	struct rmap_walk_control rwc = {
+		.rmap_one = __damon_pa_mkold,
+		.anon_lock = page_lock_anon_vma_read,
+	};
+	bool need_lock;
+
+	if (!page)
+		return;
+
+	if (!page_mapped(page) || !page_rmapping(page)) {
+		set_page_idle(page);
+		put_page(page);
+		return;
+	}
+
+	need_lock = !PageAnon(page) || PageKsm(page);
+	if (need_lock && !trylock_page(page)) {
+		put_page(page);
+		return;
+	}
+
+	rmap_walk(page, &rwc);
+
+	if (need_lock)
+		unlock_page(page);
+	put_page(page);
+}
+
+static void __damon_pa_prepare_access_check(struct damon_ctx *ctx,
+					    struct damon_region *r)
+{
+	r->sampling_addr = damon_rand(r->ar.start, r->ar.end);
+
+	damon_pa_mkold(r->sampling_addr);
+}
+
+void damon_pa_prepare_access_checks(struct damon_ctx *ctx)
+{
+	struct damon_target *t;
+	struct damon_region *r;
+
+	damon_for_each_target(t, ctx) {
+		damon_for_each_region(r, t)
+			__damon_pa_prepare_access_check(ctx, r);
+	}
+}
+
+struct damon_pa_access_chk_result {
+	unsigned long page_sz;
+	bool accessed;
+};
+
+static bool damon_pa_accessed(struct page *page, struct vm_area_struct *vma,
+		unsigned long addr, void *arg)
+{
+	struct damon_pa_access_chk_result *result = arg;
+
+	result->accessed = damon_va_young(vma->vm_mm, addr, &result->page_sz);
+
+	/* If accessed, stop walking */
+	return !result->accessed;
+}
+
+static bool damon_pa_young(unsigned long paddr, unsigned long *page_sz)
+{
+	struct page *page = damon_pa_get_page(PHYS_PFN(paddr));
+	struct damon_pa_access_chk_result result = {
+		.page_sz = PAGE_SIZE,
+		.accessed = false,
+	};
+	struct rmap_walk_control rwc = {
+		.arg = &result,
+		.rmap_one = damon_pa_accessed,
+		.anon_lock = page_lock_anon_vma_read,
+	};
+	bool need_lock;
+
+	if (!page)
+		return false;
+
+	if (!page_mapped(page) || !page_rmapping(page)) {
+		if (page_is_idle(page))
+			result.accessed = false;
+		else
+			result.accessed = true;
+		put_page(page);
+		goto out;
+	}
+
+	need_lock = !PageAnon(page) || PageKsm(page);
+	if (need_lock && !trylock_page(page)) {
+		put_page(page);
+		return NULL;
+	}
+
+	rmap_walk(page, &rwc);
+
+	if (need_lock)
+		unlock_page(page);
+	put_page(page);
+
+out:
+	*page_sz = result.page_sz;
+	return result.accessed;
+}
+
+/*
+ * Check whether the region was accessed after the last preparation
+ *
+ * mm	'mm_struct' for the given virtual address space
+ * r	the region of physical address space that needs to be checked
+ */
+static void __damon_pa_check_access(struct damon_ctx *ctx,
+				    struct damon_region *r)
+{
+	static unsigned long last_addr;
+	static unsigned long last_page_sz = PAGE_SIZE;
+	static bool last_accessed;
+
+	/* If the region is in the last checked page, reuse the result */
+	if (ALIGN_DOWN(last_addr, last_page_sz) ==
+				ALIGN_DOWN(r->sampling_addr, last_page_sz)) {
+		if (last_accessed)
+			r->nr_accesses++;
+		return;
+	}
+
+	last_accessed = damon_pa_young(r->sampling_addr, &last_page_sz);
+	if (last_accessed)
+		r->nr_accesses++;
+
+	last_addr = r->sampling_addr;
+}
+
+unsigned int damon_pa_check_accesses(struct damon_ctx *ctx)
+{
+	struct damon_target *t;
+	struct damon_region *r;
+	unsigned int max_nr_accesses = 0;
+
+	damon_for_each_target(t, ctx) {
+		damon_for_each_region(r, t) {
+			__damon_pa_check_access(ctx, r);
+			max_nr_accesses = max(r->nr_accesses, max_nr_accesses);
+		}
+	}
+
+	return max_nr_accesses;
+}
+
+bool damon_pa_target_valid(void *t)
+{
+	return true;
+}
+
+void damon_pa_set_primitives(struct damon_ctx *ctx)
+{
+	ctx->primitive.init_target_regions = NULL;
+	ctx->primitive.update_target_regions = NULL;
+	ctx->primitive.prepare_access_checks = damon_pa_prepare_access_checks;
+	ctx->primitive.check_accesses = damon_pa_check_accesses;
+	ctx->primitive.reset_aggregated = NULL;
+	ctx->primitive.target_valid = damon_pa_target_valid;
+	ctx->primitive.cleanup = NULL;
+	ctx->primitive.apply_scheme = NULL;
+}
-- 
2.17.1



^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [RFC v10 08/13] damon/dbgfs: Support physical memory monitoring
  2020-12-16  9:42 [RFC v10 00/13] DAMON: Support Physical Memory Address Space and Page-granularity Idleness Monitoring SeongJae Park
                   ` (6 preceding siblings ...)
  2020-12-16  9:42 ` [RFC v10 07/13] mm/damon: Implement primitives for physical address space monitoring SeongJae Park
@ 2020-12-16  9:42 ` SeongJae Park
  2020-12-16  9:42 ` [RFC v10 09/13] tools/damon/record: " SeongJae Park
                   ` (5 subsequent siblings)
  13 siblings, 0 replies; 16+ messages in thread
From: SeongJae Park @ 2020-12-16  9:42 UTC (permalink / raw)
  To: akpm
  Cc: SeongJae Park, Jonathan.Cameron, aarcange, acme,
	alexander.shishkin, amit, benh, brendan.d.gregg, brendanhiggins,
	cai, colin.king, corbet, david, dwmw, elver, fan.du, foersleo,
	gthelen, irogers, jolsa, kirill, mark.rutland, mgorman, minchan,
	mingo, namhyung, peterz, rdunlap, riel, rientjes, rostedt, rppt,
	sblbir, shakeelb, shuah, sj38.park, snu, vbabka, vdavydov.dev,
	yang.shi, ying.huang, zgf574564920, linux-damon, linux-mm,
	linux-doc, linux-kernel

From: SeongJae Park <sjpark@amazon.de>

This commit makes the 'damon-dbgfs' to support the physical memory
monitoring, in addition to the virtual memory monitoring.

Users can do the physical memory monitoring by writing a special
keyword, 'paddr\n' to the 'pids' debugfs file.  Then, DAMON will check
the special keyword and configure the monitoring context to run using
the primitives for physical memory.  This will internally add one fake
monitoring target process, which has target id 42.

Unlike the virtual memory monitoring, the monitoring target region will
not be automatically set.  Therefore, users should also set the
monitoring target address region using the 'init_regions' debugfs file.

Finally, the physical memory monitoring will not automatically
terminated.  The user should explicitly turn off the monitoring by
writing 'off' to the 'monitor_on' debugfs file.

Signed-off-by: SeongJae Park <sjpark@amazon.de>
---
 mm/damon/Kconfig | 2 +-
 mm/damon/dbgfs.c | 9 +++++++++
 2 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/mm/damon/Kconfig b/mm/damon/Kconfig
index 89c06ac8c9eb..38f4cfce72dd 100644
--- a/mm/damon/Kconfig
+++ b/mm/damon/Kconfig
@@ -56,7 +56,7 @@ config DAMON_VADDR_KUNIT_TEST
 
 config DAMON_DBGFS
 	bool "DAMON debugfs interface"
-	depends on DAMON_VADDR && DEBUG_FS
+	depends on DAMON_VADDR && DAMON_PADDR && DEBUG_FS
 	help
 	  This builds the debugfs interface for DAMON.  The user space admins
 	  can use the interface for arbitrary data access monitoring.
diff --git a/mm/damon/dbgfs.c b/mm/damon/dbgfs.c
index 2f1ec6ebd9f0..ea03d7d5b879 100644
--- a/mm/damon/dbgfs.c
+++ b/mm/damon/dbgfs.c
@@ -462,6 +462,15 @@ static ssize_t dbgfs_target_ids_write(struct file *file,
 		return PTR_ERR(kbuf);
 
 	nrs = kbuf;
+	if (!strncmp(kbuf, "paddr\n", count)) {
+		/* Configure the context for physical memory monitoring */
+		damon_pa_set_primitives(ctx);
+		/* target id is meaningless here, but we set it just for fun */
+		scnprintf(kbuf, count, "42    ");
+	} else {
+		/* Configure the context for virtual memory monitoring */
+		damon_va_set_primitives(ctx);
+	}
 
 	targets = str_to_target_ids(nrs, ret, &nr_targets);
 	if (!targets) {
-- 
2.17.1



^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [RFC v10 09/13] tools/damon/record: Support physical memory monitoring
  2020-12-16  9:42 [RFC v10 00/13] DAMON: Support Physical Memory Address Space and Page-granularity Idleness Monitoring SeongJae Park
                   ` (7 preceding siblings ...)
  2020-12-16  9:42 ` [RFC v10 08/13] damon/dbgfs: Support physical memory monitoring SeongJae Park
@ 2020-12-16  9:42 ` SeongJae Park
  2020-12-16  9:42 ` [RFC v10 10/13] tools/damon/record: Support NUMA specific recording SeongJae Park
                   ` (4 subsequent siblings)
  13 siblings, 0 replies; 16+ messages in thread
From: SeongJae Park @ 2020-12-16  9:42 UTC (permalink / raw)
  To: akpm
  Cc: SeongJae Park, Jonathan.Cameron, aarcange, acme,
	alexander.shishkin, amit, benh, brendan.d.gregg, brendanhiggins,
	cai, colin.king, corbet, david, dwmw, elver, fan.du, foersleo,
	gthelen, irogers, jolsa, kirill, mark.rutland, mgorman, minchan,
	mingo, namhyung, peterz, rdunlap, riel, rientjes, rostedt, rppt,
	sblbir, shakeelb, shuah, sj38.park, snu, vbabka, vdavydov.dev,
	yang.shi, ying.huang, zgf574564920, linux-damon, linux-mm,
	linux-doc, linux-kernel

From: SeongJae Park <sjpark@amazon.de>

This commit allows users to record the data accesses on physical memory
address space by passing 'paddr' as target to 'damo-record'.  If the
init regions are given, the regions will be monitored.  Else, it will
monitor biggest conitguous 'System RAM' region in '/proc/iomem' and
monitor the region.

Signed-off-by: SeongJae Park <sjpark@amazon.de>
---
 tools/damon/_damon.py |  2 ++
 tools/damon/record.py | 29 ++++++++++++++++++++++++++++-
 2 files changed, 30 insertions(+), 1 deletion(-)

diff --git a/tools/damon/_damon.py b/tools/damon/_damon.py
index a22ec3777c16..6ff278117e84 100644
--- a/tools/damon/_damon.py
+++ b/tools/damon/_damon.py
@@ -27,6 +27,8 @@ def set_target(tid, init_regions=[]):
     if not os.path.exists(debugfs_init_regions):
         return 0
 
+    if tid == 'paddr':
+        tid = 42
     string = ' '.join(['%s %d %d' % (tid, r[0], r[1]) for r in init_regions])
     return subprocess.call('echo "%s" > %s' % (string, debugfs_init_regions),
             shell=True, executable='/bin/bash')
diff --git a/tools/damon/record.py b/tools/damon/record.py
index 11fd54001472..6fd0b59c73e0 100644
--- a/tools/damon/record.py
+++ b/tools/damon/record.py
@@ -101,6 +101,29 @@ def set_argparser(parser):
     parser.add_argument('-o', '--out', metavar='<file path>', type=str,
             default='damon.data', help='output file path')
 
+def default_paddr_region():
+    "Largest System RAM region becomes the default"
+    ret = []
+    with open('/proc/iomem', 'r') as f:
+        # example of the line: '100000000-42b201fff : System RAM'
+        for line in f:
+            fields = line.split(':')
+            if len(fields) != 2:
+                continue
+            name = fields[1].strip()
+            if name != 'System RAM':
+                continue
+            addrs = fields[0].split('-')
+            if len(addrs) != 2:
+                continue
+            start = int(addrs[0], 16)
+            end = int(addrs[1], 16)
+
+            sz_region = end - start
+            if not ret or sz_region > (ret[1] - ret[0]):
+                ret = [start, end]
+    return ret
+
 def main(args=None):
     global orig_attrs
     if not args:
@@ -122,7 +145,11 @@ def main(args=None):
     target = args.target
 
     target_fields = target.split()
-    if not subprocess.call('which %s &> /dev/null' % target_fields[0],
+    if target == 'paddr':   # physical memory address space
+        if not init_regions:
+            init_regions = [default_paddr_region()]
+        do_record(target, False, init_regions, new_attrs, orig_attrs, pidfd)
+    elif not subprocess.call('which %s &> /dev/null' % target_fields[0],
             shell=True, executable='/bin/bash'):
         do_record(target, True, init_regions, new_attrs, orig_attrs, pidfd)
     else:
-- 
2.17.1



^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [RFC v10 10/13] tools/damon/record: Support NUMA specific recording
  2020-12-16  9:42 [RFC v10 00/13] DAMON: Support Physical Memory Address Space and Page-granularity Idleness Monitoring SeongJae Park
                   ` (8 preceding siblings ...)
  2020-12-16  9:42 ` [RFC v10 09/13] tools/damon/record: " SeongJae Park
@ 2020-12-16  9:42 ` SeongJae Park
  2020-12-16  9:42 ` [RFC v10 11/13] Docs/DAMON: Document physical memory monitoring support SeongJae Park
                   ` (3 subsequent siblings)
  13 siblings, 0 replies; 16+ messages in thread
From: SeongJae Park @ 2020-12-16  9:42 UTC (permalink / raw)
  To: akpm
  Cc: SeongJae Park, Jonathan.Cameron, aarcange, acme,
	alexander.shishkin, amit, benh, brendan.d.gregg, brendanhiggins,
	cai, colin.king, corbet, david, dwmw, elver, fan.du, foersleo,
	gthelen, irogers, jolsa, kirill, mark.rutland, mgorman, minchan,
	mingo, namhyung, peterz, rdunlap, riel, rientjes, rostedt, rppt,
	sblbir, shakeelb, shuah, sj38.park, snu, vbabka, vdavydov.dev,
	yang.shi, ying.huang, zgf574564920, linux-damon, linux-mm,
	linux-doc, linux-kernel

From: SeongJae Park <sjpark@amazon.de>

This commit updates the DAMON user space tool (damo-record) for NUMA
specific physical memory monitoring.  With this change, users can
monitor accesses to physical memory of specific NUMA node.

Signed-off-by: SeongJae Park <sjpark@amazon.de>
---
 tools/damon/_paddr_layout.py | 147 +++++++++++++++++++++++++++++++++++
 tools/damon/record.py        |  18 ++++-
 2 files changed, 164 insertions(+), 1 deletion(-)
 create mode 100644 tools/damon/_paddr_layout.py

diff --git a/tools/damon/_paddr_layout.py b/tools/damon/_paddr_layout.py
new file mode 100644
index 000000000000..561c2b6729f6
--- /dev/null
+++ b/tools/damon/_paddr_layout.py
@@ -0,0 +1,147 @@
+#!/usr/bin/env python3
+# SPDX-License-Identifier: GPL-2.0
+
+import os
+
+class PaddrRange:
+    start = None
+    end = None
+    nid = None
+    state = None
+    name = None
+
+    def __init__(self, start, end, nid, state, name):
+        self.start = start
+        self.end = end
+        self.nid = nid
+        self.state = state
+        self.name = name
+
+    def interleaved(self, prange):
+        if self.end <= prange.start:
+            return None
+        if prange.end <= self.start:
+            return None
+        return [max(self.start, prange.start), min(self.end, prange.end)]
+
+    def __str__(self):
+        return '%x-%x, nid %s, state %s, name %s' % (self.start, self.end,
+                self.nid, self.state, self.name)
+
+class MemBlock:
+    nid = None
+    index = None
+    state = None
+
+    def __init__(self, nid, index, state):
+        self.nid = nid
+        self.index = index
+        self.state = state
+
+    def __str__(self):
+        return '%d (%s)' % (self.index, self.state)
+
+    def __repr__(self):
+        return self.__str__()
+
+def readfile(file_path):
+    with open(file_path, 'r') as f:
+        return f.read()
+
+def collapse_ranges(ranges):
+    ranges = sorted(ranges, key=lambda x: x.start)
+    merged = []
+    for r in ranges:
+        if not merged:
+            merged.append(r)
+            continue
+        last = merged[-1]
+        if last.end != r.start or last.nid != r.nid or last.state != r.state:
+            merged.append(r)
+        else:
+            last.end = r.end
+    return merged
+
+def memblocks_to_ranges(blocks, block_size):
+    ranges = []
+    for b in blocks:
+        ranges.append(PaddrRange(b.index * block_size,
+            (b.index + 1) * block_size, b.nid, b.state, None))
+
+    return collapse_ranges(ranges)
+
+def memblock_ranges():
+    SYSFS='/sys/devices/system/node'
+    sz_block = int(readfile('/sys/devices/system/memory/block_size_bytes'), 16)
+    sys_nodes = [x for x in os.listdir(SYSFS) if x.startswith('node')]
+
+    blocks = []
+    for sys_node in sys_nodes:
+        nid = int(sys_node[4:])
+
+        sys_node_files = os.listdir(os.path.join(SYSFS, sys_node))
+        for f in sys_node_files:
+            if not f.startswith('memory'):
+                continue
+            index = int(f[6:])
+            sys_state = os.path.join(SYSFS, sys_node, f, 'state')
+            state = readfile(sys_state).strip()
+
+            blocks.append(MemBlock(nid, index, state))
+
+    return memblocks_to_ranges(blocks, sz_block)
+
+def iomem_ranges():
+    ranges = []
+
+    with open('/proc/iomem', 'r') as f:
+        # example of the line: '100000000-42b201fff : System RAM'
+        for line in f:
+            fields = line.split(':')
+            if len(fields) < 2:
+                continue
+            name = ':'.join(fields[1:]).strip()
+            addrs = fields[0].split('-')
+            if len(addrs) != 2:
+                continue
+            start = int(addrs[0], 16)
+            end = int(addrs[1], 16) + 1
+            ranges.append(PaddrRange(start, end, None, None, name))
+
+    return ranges
+
+def integrate(memblock_parsed, iomem_parsed):
+    merged = []
+
+    for r in iomem_parsed:
+        for r2 in memblock_parsed:
+            if r2.start <= r.start and r.end <= r2.end:
+                r.nid = r2.nid
+                r.state = r2.state
+                merged.append(r)
+            elif r2.start <= r.start and r.start < r2.end and r2.end < r.end:
+                sub = PaddrRange(r2.end, r.end, None, None, r.name)
+                iomem_parsed.append(sub)
+                r.end = r2.end
+                r.nid = r2.nid
+                r.state = r2.state
+                merged.append(r)
+    merged = sorted(merged, key=lambda x: x.start)
+    return merged
+
+def paddr_ranges():
+    return integrate(memblock_ranges(), iomem_ranges())
+
+def pr_ranges(ranges):
+    print('#%12s %13s\tnode\tstate\tresource\tsize' % ('start', 'end'))
+    for r in ranges:
+        print('%13d %13d\t%s\t%s\t%s\t%d' % (r.start, r.end, r.nid,
+            r.state, r.name, r.end - r.start))
+
+def main():
+    ranges = paddr_ranges()
+
+    pr_ranges(ranges)
+
+if __name__ == '__main__':
+    main()
diff --git a/tools/damon/record.py b/tools/damon/record.py
index 6fd0b59c73e0..e9d6bfc70ead 100644
--- a/tools/damon/record.py
+++ b/tools/damon/record.py
@@ -12,6 +12,7 @@ import subprocess
 import time
 
 import _damon
+import _paddr_layout
 
 def pidfd_open(pid):
     import ctypes
@@ -98,6 +99,8 @@ def set_argparser(parser):
             help='use pidfd type target id')
     parser.add_argument('-l', '--rbuf', metavar='<len>', type=int,
             default=1024*1024, help='length of record result buffer')
+    parser.add_argument('--numa_node', metavar='<node id>', type=int,
+            help='if target is \'paddr\', limit it to the numa node')
     parser.add_argument('-o', '--out', metavar='<file path>', type=str,
             default='damon.data', help='output file path')
 
@@ -124,6 +127,15 @@ def default_paddr_region():
                 ret = [start, end]
     return ret
 
+def paddr_region_of(numa_node):
+    regions = []
+    paddr_ranges = _paddr_layout.paddr_ranges()
+    for r in paddr_ranges:
+        if r.nid == numa_node and r.name == 'System RAM':
+            regions.append([r.start, r.end])
+
+    return regions
+
 def main(args=None):
     global orig_attrs
     if not args:
@@ -142,12 +154,16 @@ def main(args=None):
     pidfd = args.pidfd
     new_attrs = _damon.cmd_args_to_attrs(args)
     init_regions = _damon.cmd_args_to_init_regions(args)
+    numa_node = args.numa_node
     target = args.target
 
     target_fields = target.split()
     if target == 'paddr':   # physical memory address space
         if not init_regions:
-            init_regions = [default_paddr_region()]
+            if numa_node:
+                init_regions = paddr_region_of(numa_node)
+            else:
+                init_regions = [default_paddr_region()]
         do_record(target, False, init_regions, new_attrs, orig_attrs, pidfd)
     elif not subprocess.call('which %s &> /dev/null' % target_fields[0],
             shell=True, executable='/bin/bash'):
-- 
2.17.1



^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [RFC v10 11/13] Docs/DAMON: Document physical memory monitoring support
  2020-12-16  9:42 [RFC v10 00/13] DAMON: Support Physical Memory Address Space and Page-granularity Idleness Monitoring SeongJae Park
                   ` (9 preceding siblings ...)
  2020-12-16  9:42 ` [RFC v10 10/13] tools/damon/record: Support NUMA specific recording SeongJae Park
@ 2020-12-16  9:42 ` SeongJae Park
  2020-12-16  9:42 ` [RFC v10 12/13] mm/damon/paddr: Separate commonly usable functions SeongJae Park
                   ` (2 subsequent siblings)
  13 siblings, 0 replies; 16+ messages in thread
From: SeongJae Park @ 2020-12-16  9:42 UTC (permalink / raw)
  To: akpm
  Cc: SeongJae Park, Jonathan.Cameron, aarcange, acme,
	alexander.shishkin, amit, benh, brendan.d.gregg, brendanhiggins,
	cai, colin.king, corbet, david, dwmw, elver, fan.du, foersleo,
	gthelen, irogers, jolsa, kirill, mark.rutland, mgorman, minchan,
	mingo, namhyung, peterz, rdunlap, riel, rientjes, rostedt, rppt,
	sblbir, shakeelb, shuah, sj38.park, snu, vbabka, vdavydov.dev,
	yang.shi, ying.huang, zgf574564920, linux-damon, linux-mm,
	linux-doc, linux-kernel

From: SeongJae Park <sjpark@amazon.de>

This commit updates the DAMON documents for the physical memory
monitoring support.

Signed-off-by: SeongJae Park <sjpark@amazon.de>
---
 Documentation/admin-guide/mm/damon/usage.rst | 42 ++++++++++++++++----
 Documentation/vm/damon/design.rst            | 29 +++++++++-----
 Documentation/vm/damon/faq.rst               |  5 +--
 3 files changed, 54 insertions(+), 22 deletions(-)

diff --git a/Documentation/admin-guide/mm/damon/usage.rst b/Documentation/admin-guide/mm/damon/usage.rst
index cf0d44ce0ac9..3e2f1519c96a 100644
--- a/Documentation/admin-guide/mm/damon/usage.rst
+++ b/Documentation/admin-guide/mm/damon/usage.rst
@@ -10,15 +10,16 @@ DAMON provides below three interfaces for different users.
   This is for privileged people such as system administrators who want a
   just-working human-friendly interface.  Using this, users can use the DAMON’s
   major features in a human-friendly way.  It may not be highly tuned for
-  special cases, though.  It supports only virtual address spaces monitoring.
+  special cases, though.  It supports both virtual and physical address spaces
+  monitoring.
 - *debugfs interface.*
   This is for privileged user space programmers who want more optimized use of
   DAMON.  Using this, users can use DAMON’s major features by reading
   from and writing to special debugfs files.  Therefore, you can write and use
   your personalized DAMON debugfs wrapper programs that reads/writes the
   debugfs files instead of you.  The DAMON user space tool is also a reference
-  implementation of such programs.  It supports only virtual address spaces
-  monitoring.
+  implementation of such programs.  It supports both virtual and physical
+  address spaces monitoring.
 - *Kernel Space Programming Interface.*
   This is for kernel space programmers.  Using this, users can utilize every
   feature of DAMON most flexibly and efficiently by writing kernel space
@@ -49,8 +50,10 @@ Recording Data Access Pattern
 
 The ``record`` subcommand records the data access pattern of target workloads
 in a file (``./damon.data`` by default).  You can specify the target with 1)
-the command for execution of the monitoring target process, or 2) pid of
-running target process.  Below example shows a command target usage::
+the command for execution of the monitoring target process, 2) pid of running
+target process, or 3) the special keyword, 'paddr', if you want to monitor the
+system's physical memory address space.  Below example shows a command target
+usage::
 
     # cd <kernel>/tools/damon/
     # damo record "sleep 5"
@@ -61,6 +64,15 @@ of the process.  Below example shows a pid target usage::
     # sleep 5 &
     # damo record `pidof sleep`
 
+Finally, below example shows the use of the special keyword, 'paddr'::
+
+    # damo record paddr
+
+In this case, the monitoring target regions defaults to the largetst 'System
+RAM' region specified in '/proc/iomem' file.  Note that the initial monitoring
+target region is maintained rather than dynamically updated like the virtual
+memory address spaces monitoring case.
+
 The location of the recorded file can be explicitly set using ``-o`` option.
 You can further tune this by setting the monitoring attributes.  To know about
 the monitoring attributes in detail, please refer to the
@@ -319,20 +331,34 @@ check it again::
     # cat target_ids
     42 4242
 
+Users can also monitor the physical memory address space of the system by
+writing a special keyword, "``paddr\n``" to the file.  Because physical address
+space monitoring doesn't support multiple targets, reading the file will show a
+fake value, ``42``, as below::
+
+    # cd <debugfs>/damon
+    # echo paddr > target_ids
+    # cat target_ids
+    42
+
 Note that setting the target ids doesn't start the monitoring.
 
 
 Initial Monitoring Target Regions
 ---------------------------------
 
-In case of the debugfs based monitoring, DAMON automatically sets and updates
-the monitoring target regions so that entire memory mappings of target
-processes can be covered. However, users might want to limit the monitoring
+In case of the virtual address space monitoring, DAMON automatically sets and
+updates the monitoring target regions so that entire memory mappings of target
+processes can be covered.  However, users might want to limit the monitoring
 region to specific address ranges, such as the heap, the stack, or specific
 file-mapped area.  Or, some users might know the initial access pattern of
 their workloads and therefore want to set optimal initial regions for the
 'adaptive regions adjustment'.
 
+In contrast, DAMON do not automatically sets and updates the monitoring target
+regions in case of physical memory monitoring.  Therefore, users should set the
+monitoring target regions by themselves.
+
 In such cases, users can explicitly set the initial monitoring target regions
 as they want, by writing proper values to the ``init_regions`` file.  Each line
 of the input should represent one region in below form.::
diff --git a/Documentation/vm/damon/design.rst b/Documentation/vm/damon/design.rst
index 727d72093f8f..0666e19018fd 100644
--- a/Documentation/vm/damon/design.rst
+++ b/Documentation/vm/damon/design.rst
@@ -35,27 +35,34 @@ two parts:
 1. Identification of the monitoring target address range for the address space.
 2. Access check of specific address range in the target space.
 
-DAMON currently provides the implementation of the primitives for only the
-virtual address spaces. Below two subsections describe how it works.
+DAMON currently provides the implementations of the primitives for the physical
+and virtual address spaces. Below two subsections describe how those work.
 
 
 PTE Accessed-bit Based Access Check
 -----------------------------------
 
-The implementation for the virtual address space uses PTE Accessed-bit for
-basic access checks.  It finds the relevant PTE Accessed bit from the address
-by walking the page table for the target task of the address.  In this way, the
-implementation finds and clears the bit for next sampling target address and
-checks whether the bit set again after one sampling period.  This could disturb
-other kernel subsystems using the Accessed bits, namely Idle page tracking and
-the reclaim logic.  To avoid such disturbances, DAMON makes it mutually
-exclusive with Idle page tracking and uses ``PG_idle`` and ``PG_young`` page
-flags to solve the conflict with the reclaim logic, as Idle page tracking does.
+Both of the implementations for physical and virtual address spaces use PTE
+Accessed-bit for basic access checks.  Only one difference is the way of
+finding the relevant PTE Accessed bit(s) from the address.  While the
+implementation for the virtual address walks the page table for the target task
+of the address, the implementation for the physical address walks every page
+table having a mapping to the address.  In this way, the implementations find
+and clear the bit(s) for next sampling target address and checks whether the
+bit(s) set again after one sampling period.  This could disturb other kernel
+subsystems using the Accessed bits, namely Idle page tracking and the reclaim
+logic.  To avoid such disturbances, DAMON makes it mutually exclusive with Idle
+page tracking and uses ``PG_idle`` and ``PG_young`` page flags to solve the
+conflict with the reclaim logic, as Idle page tracking does.
 
 
 VMA-based Target Address Range Construction
 -------------------------------------------
 
+This is only for the virtual address space primitives implementation.  That for
+the physical address space simply asks users to manually set the monitoring
+target address ranges.
+
 Only small parts in the super-huge virtual address space of the processes are
 mapped to the physical memory and accessed.  Thus, tracking the unmapped
 address regions is just wasteful.  However, because DAMON can deal with some
diff --git a/Documentation/vm/damon/faq.rst b/Documentation/vm/damon/faq.rst
index 088128bbf22b..6469d54c480f 100644
--- a/Documentation/vm/damon/faq.rst
+++ b/Documentation/vm/damon/faq.rst
@@ -43,10 +43,9 @@ constructions and actual access checks can be implemented and configured on the
 DAMON core by the users.  In this way, DAMON users can monitor any address
 space with any access check technique.
 
-Nonetheless, DAMON provides vma tracking and PTE Accessed bit check based
+Nonetheless, DAMON provides vma/rmap tracking and PTE Accessed bit check based
 implementations of the address space dependent functions for the virtual memory
-by default, for a reference and convenient use.  In near future, we will
-provide those for physical memory address space.
+and the physical memory by default, for a reference and convenient use.
 
 
 Can I simply monitor page granularity?
-- 
2.17.1



^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [RFC v10 12/13] mm/damon/paddr: Separate commonly usable functions
  2020-12-16  9:42 [RFC v10 00/13] DAMON: Support Physical Memory Address Space and Page-granularity Idleness Monitoring SeongJae Park
                   ` (10 preceding siblings ...)
  2020-12-16  9:42 ` [RFC v10 11/13] Docs/DAMON: Document physical memory monitoring support SeongJae Park
@ 2020-12-16  9:42 ` SeongJae Park
  2020-12-16  9:42 ` [RFC v10 13/13] mm/damon: Implement primitives for page granularity idleness monitoring SeongJae Park
  2021-05-05 11:26 ` DAMON-based Proactive Reclamation for The Physical Address Space SeongJae Park
  13 siblings, 0 replies; 16+ messages in thread
From: SeongJae Park @ 2020-12-16  9:42 UTC (permalink / raw)
  To: akpm
  Cc: SeongJae Park, Jonathan.Cameron, aarcange, acme,
	alexander.shishkin, amit, benh, brendan.d.gregg, brendanhiggins,
	cai, colin.king, corbet, david, dwmw, elver, fan.du, foersleo,
	gthelen, irogers, jolsa, kirill, mark.rutland, mgorman, minchan,
	mingo, namhyung, peterz, rdunlap, riel, rientjes, rostedt, rppt,
	sblbir, shakeelb, shuah, sj38.park, snu, vbabka, vdavydov.dev,
	yang.shi, ying.huang, zgf574564920, linux-damon, linux-mm,
	linux-doc, linux-kernel

From: SeongJae Park <sjpark@amazon.de>

This commit moves functions in the default physical address space
monitoring primitives that commonly usable from other use cases like
page granularity idleness monitoring to prmtv-common.

Signed-off-by: SeongJae Park <sjpark@amazon.de>
---
 mm/damon/paddr.c        | 122 ----------------------------------------
 mm/damon/prmtv-common.c | 122 ++++++++++++++++++++++++++++++++++++++++
 mm/damon/prmtv-common.h |   4 ++
 3 files changed, 126 insertions(+), 122 deletions(-)

diff --git a/mm/damon/paddr.c b/mm/damon/paddr.c
index b120f672cc57..143ddc0e5917 100644
--- a/mm/damon/paddr.c
+++ b/mm/damon/paddr.c
@@ -19,69 +19,6 @@
  * of the primitives.
  */
 
-/*
- * Get a page by pfn if it is in the LRU list.  Otherwise, returns NULL.
- *
- * The body of this function is stollen from the 'page_idle_get_page()'.  We
- * steal rather than reuse it because the code is quite simple.
- */
-static struct page *damon_pa_get_page(unsigned long pfn)
-{
-	struct page *page = pfn_to_online_page(pfn);
-	pg_data_t *pgdat;
-
-	if (!page || !PageLRU(page) ||
-	    !get_page_unless_zero(page))
-		return NULL;
-
-	pgdat = page_pgdat(page);
-	spin_lock_irq(&pgdat->lru_lock);
-	if (unlikely(!PageLRU(page))) {
-		put_page(page);
-		page = NULL;
-	}
-	spin_unlock_irq(&pgdat->lru_lock);
-	return page;
-}
-
-static bool __damon_pa_mkold(struct page *page, struct vm_area_struct *vma,
-		unsigned long addr, void *arg)
-{
-	damon_va_mkold(vma->vm_mm, addr);
-	return true;
-}
-
-static void damon_pa_mkold(unsigned long paddr)
-{
-	struct page *page = damon_pa_get_page(PHYS_PFN(paddr));
-	struct rmap_walk_control rwc = {
-		.rmap_one = __damon_pa_mkold,
-		.anon_lock = page_lock_anon_vma_read,
-	};
-	bool need_lock;
-
-	if (!page)
-		return;
-
-	if (!page_mapped(page) || !page_rmapping(page)) {
-		set_page_idle(page);
-		put_page(page);
-		return;
-	}
-
-	need_lock = !PageAnon(page) || PageKsm(page);
-	if (need_lock && !trylock_page(page)) {
-		put_page(page);
-		return;
-	}
-
-	rmap_walk(page, &rwc);
-
-	if (need_lock)
-		unlock_page(page);
-	put_page(page);
-}
-
 static void __damon_pa_prepare_access_check(struct damon_ctx *ctx,
 					    struct damon_region *r)
 {
@@ -101,65 +38,6 @@ void damon_pa_prepare_access_checks(struct damon_ctx *ctx)
 	}
 }
 
-struct damon_pa_access_chk_result {
-	unsigned long page_sz;
-	bool accessed;
-};
-
-static bool damon_pa_accessed(struct page *page, struct vm_area_struct *vma,
-		unsigned long addr, void *arg)
-{
-	struct damon_pa_access_chk_result *result = arg;
-
-	result->accessed = damon_va_young(vma->vm_mm, addr, &result->page_sz);
-
-	/* If accessed, stop walking */
-	return !result->accessed;
-}
-
-static bool damon_pa_young(unsigned long paddr, unsigned long *page_sz)
-{
-	struct page *page = damon_pa_get_page(PHYS_PFN(paddr));
-	struct damon_pa_access_chk_result result = {
-		.page_sz = PAGE_SIZE,
-		.accessed = false,
-	};
-	struct rmap_walk_control rwc = {
-		.arg = &result,
-		.rmap_one = damon_pa_accessed,
-		.anon_lock = page_lock_anon_vma_read,
-	};
-	bool need_lock;
-
-	if (!page)
-		return false;
-
-	if (!page_mapped(page) || !page_rmapping(page)) {
-		if (page_is_idle(page))
-			result.accessed = false;
-		else
-			result.accessed = true;
-		put_page(page);
-		goto out;
-	}
-
-	need_lock = !PageAnon(page) || PageKsm(page);
-	if (need_lock && !trylock_page(page)) {
-		put_page(page);
-		return NULL;
-	}
-
-	rmap_walk(page, &rwc);
-
-	if (need_lock)
-		unlock_page(page);
-	put_page(page);
-
-out:
-	*page_sz = result.page_sz;
-	return result.accessed;
-}
-
 /*
  * Check whether the region was accessed after the last preparation
  *
diff --git a/mm/damon/prmtv-common.c b/mm/damon/prmtv-common.c
index 6cdb96cbc9ef..6c2e760e086c 100644
--- a/mm/damon/prmtv-common.c
+++ b/mm/damon/prmtv-common.c
@@ -102,3 +102,125 @@ bool damon_va_young(struct mm_struct *mm, unsigned long addr,
 
 	return young;
 }
+
+/*
+ * Get a page by pfn if it is in the LRU list.  Otherwise, returns NULL.
+ *
+ * The body of this function is stollen from the 'page_idle_get_page()'.  We
+ * steal rather than reuse it because the code is quite simple.
+ */
+static struct page *damon_pa_get_page(unsigned long pfn)
+{
+	struct page *page = pfn_to_online_page(pfn);
+	pg_data_t *pgdat;
+
+	if (!page || !PageLRU(page) ||
+	    !get_page_unless_zero(page))
+		return NULL;
+
+	pgdat = page_pgdat(page);
+	spin_lock_irq(&pgdat->lru_lock);
+	if (unlikely(!PageLRU(page))) {
+		put_page(page);
+		page = NULL;
+	}
+	spin_unlock_irq(&pgdat->lru_lock);
+	return page;
+}
+
+static bool __damon_pa_mkold(struct page *page, struct vm_area_struct *vma,
+		unsigned long addr, void *arg)
+{
+	damon_va_mkold(vma->vm_mm, addr);
+	return true;
+}
+
+void damon_pa_mkold(unsigned long paddr)
+{
+	struct page *page = damon_pa_get_page(PHYS_PFN(paddr));
+	struct rmap_walk_control rwc = {
+		.rmap_one = __damon_pa_mkold,
+		.anon_lock = page_lock_anon_vma_read,
+	};
+	bool need_lock;
+
+	if (!page)
+		return;
+
+	if (!page_mapped(page) || !page_rmapping(page)) {
+		set_page_idle(page);
+		put_page(page);
+		return;
+	}
+
+	need_lock = !PageAnon(page) || PageKsm(page);
+	if (need_lock && !trylock_page(page)) {
+		put_page(page);
+		return;
+	}
+
+	rmap_walk(page, &rwc);
+
+	if (need_lock)
+		unlock_page(page);
+	put_page(page);
+}
+
+struct damon_pa_access_chk_result {
+	unsigned long page_sz;
+	bool accessed;
+};
+
+static bool damon_pa_accessed(struct page *page, struct vm_area_struct *vma,
+		unsigned long addr, void *arg)
+{
+	struct damon_pa_access_chk_result *result = arg;
+
+	result->accessed = damon_va_young(vma->vm_mm, addr, &result->page_sz);
+
+	/* If accessed, stop walking */
+	return !result->accessed;
+}
+
+bool damon_pa_young(unsigned long paddr, unsigned long *page_sz)
+{
+	struct page *page = damon_pa_get_page(PHYS_PFN(paddr));
+	struct damon_pa_access_chk_result result = {
+		.page_sz = PAGE_SIZE,
+		.accessed = false,
+	};
+	struct rmap_walk_control rwc = {
+		.arg = &result,
+		.rmap_one = damon_pa_accessed,
+		.anon_lock = page_lock_anon_vma_read,
+	};
+	bool need_lock;
+
+	if (!page)
+		return false;
+
+	if (!page_mapped(page) || !page_rmapping(page)) {
+		if (page_is_idle(page))
+			result.accessed = false;
+		else
+			result.accessed = true;
+		put_page(page);
+		goto out;
+	}
+
+	need_lock = !PageAnon(page) || PageKsm(page);
+	if (need_lock && !trylock_page(page)) {
+		put_page(page);
+		return NULL;
+	}
+
+	rmap_walk(page, &rwc);
+
+	if (need_lock)
+		unlock_page(page);
+	put_page(page);
+
+out:
+	*page_sz = result.page_sz;
+	return result.accessed;
+}
diff --git a/mm/damon/prmtv-common.h b/mm/damon/prmtv-common.h
index a66a6139b4fc..fbe9452bd040 100644
--- a/mm/damon/prmtv-common.h
+++ b/mm/damon/prmtv-common.h
@@ -10,6 +10,7 @@
 #include <linux/mmu_notifier.h>
 #include <linux/page_idle.h>
 #include <linux/random.h>
+#include <linux/rmap.h>
 #include <linux/sched/mm.h>
 #include <linux/slab.h>
 
@@ -19,3 +20,6 @@
 void damon_va_mkold(struct mm_struct *mm, unsigned long addr);
 bool damon_va_young(struct mm_struct *mm, unsigned long addr,
 			unsigned long *page_sz);
+
+void damon_pa_mkold(unsigned long paddr);
+bool damon_pa_young(unsigned long paddr, unsigned long *page_sz);
-- 
2.17.1



^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [RFC v10 13/13] mm/damon: Implement primitives for page granularity idleness monitoring
  2020-12-16  9:42 [RFC v10 00/13] DAMON: Support Physical Memory Address Space and Page-granularity Idleness Monitoring SeongJae Park
                   ` (11 preceding siblings ...)
  2020-12-16  9:42 ` [RFC v10 12/13] mm/damon/paddr: Separate commonly usable functions SeongJae Park
@ 2020-12-16  9:42 ` SeongJae Park
  2021-05-05 11:26 ` DAMON-based Proactive Reclamation for The Physical Address Space SeongJae Park
  13 siblings, 0 replies; 16+ messages in thread
From: SeongJae Park @ 2020-12-16  9:42 UTC (permalink / raw)
  To: akpm
  Cc: SeongJae Park, Jonathan.Cameron, aarcange, acme,
	alexander.shishkin, amit, benh, brendan.d.gregg, brendanhiggins,
	cai, colin.king, corbet, david, dwmw, elver, fan.du, foersleo,
	gthelen, irogers, jolsa, kirill, mark.rutland, mgorman, minchan,
	mingo, namhyung, peterz, rdunlap, riel, rientjes, rostedt, rppt,
	sblbir, shakeelb, shuah, sj38.park, snu, vbabka, vdavydov.dev,
	yang.shi, ying.huang, zgf574564920, linux-damon, linux-mm,
	linux-doc, linux-kernel

From: SeongJae Park <sjpark@amazon.de>

The lightweight and upper-bound limited monitoring overhead of DAMON is
available due to its core mechanisms, namely region-based sampling and
adaptive regions adjustment.  However, there could be some use cases
that don't need such fancy mechanisms.  The page-granularity idleness
monitoring is a good example.  Because the metadata for DAMON's overhead
control mechanism only wastes memory in such cases, DAMON allows users
to eliminate such overhead using arbitrary type monitoring targets.

This commit implements a monitoring primitive supporting the
page-granularity idleness monitoring using the arbitrary type target
feature.  It's almost same to Idle Page Tracking, but incur much less
kernel - user context changes compared to it.

Nevertheless, this patch provides only kernel space API.  This feature
will be exported to the user space via the debugfs interface in near
future.

Signed-off-by: SeongJae Park <sjpark@amazon.de>
---
 include/linux/damon.h        | 27 ++++++++++++++
 include/trace/events/damon.h | 19 ++++++++++
 mm/damon/Kconfig             |  9 +++++
 mm/damon/Makefile            |  1 +
 mm/damon/paddr.c             |  2 --
 mm/damon/pgidle.c            | 69 ++++++++++++++++++++++++++++++++++++
 6 files changed, 125 insertions(+), 2 deletions(-)
 create mode 100644 mm/damon/pgidle.c

diff --git a/include/linux/damon.h b/include/linux/damon.h
index ea2fd054b2ef..220e59299f19 100644
--- a/include/linux/damon.h
+++ b/include/linux/damon.h
@@ -386,4 +386,31 @@ void damon_pa_set_primitives(struct damon_ctx *ctx);
 
 #endif	/* CONFIG_DAMON_PADDR */
 
+#ifdef CONFIG_DAMON_PGIDLE
+
+/*
+ * struct damon_pfns_range - Represents a pfn range of [@start, @end).
+ * @start:	Start pfn of the range (inclusive).
+ * @end:	End pfn of the range (exclusive).
+ *
+ * In case of the page granularity idleness monitoring, an instance of this
+ * struct is pointed by &damon_ctx.arbitrary_target.
+ */
+struct damon_pfns_range {
+	unsigned long start;
+	unsigned long end;
+};
+
+bool damon_pgi_is_idle(unsigned long pfn, unsigned long *pg_size);
+
+/* Monitoring primitives for page granularity idleness monitoring */
+
+void damon_pgi_prepare_access_checks(struct damon_ctx *ctx);
+unsigned int damon_pgi_check_accesses(struct damon_ctx *ctx);
+bool damon_pgi_target_valid(void *t);
+void damon_pgi_set_primitives(struct damon_ctx *ctx);
+
+#endif	/* CONFIG_DAMON_PGIDLE */
+
+
 #endif	/* _DAMON_H */
diff --git a/include/trace/events/damon.h b/include/trace/events/damon.h
index 2f422f4f1fb9..f0c9f1d801cf 100644
--- a/include/trace/events/damon.h
+++ b/include/trace/events/damon.h
@@ -37,6 +37,25 @@ TRACE_EVENT(damon_aggregated,
 			__entry->start, __entry->end, __entry->nr_accesses)
 );
 
+TRACE_EVENT(damon_pgi,
+
+	TP_PROTO(unsigned long pfn, bool accessed),
+
+	TP_ARGS(pfn, accessed),
+
+	TP_STRUCT__entry(
+		__field(unsigned long, pfn)
+		__field(bool, accessed)
+	),
+
+	TP_fast_assign(
+		__entry->pfn = pfn;
+		__entry->accessed = accessed;
+	),
+
+	TP_printk("pfn=%lu accessed=%u", __entry->pfn, __entry->accessed)
+);
+
 #endif /* _TRACE_DAMON_H */
 
 /* This part must be outside protection */
diff --git a/mm/damon/Kconfig b/mm/damon/Kconfig
index 38f4cfce72dd..eeefb5b633b6 100644
--- a/mm/damon/Kconfig
+++ b/mm/damon/Kconfig
@@ -42,6 +42,15 @@ config DAMON_PADDR
 	  This builds the default data access monitoring primitives for DAMON
 	  that works for physical address spaces.
 
+config DAMON_PGIDLE
+	bool "Data access monitoring primitives for page granularity idleness"
+	depends on DAMON && MMU
+	select PAGE_EXTENSION if !64BIT
+	select PAGE_IDLE_FLAG
+	help
+	  This builds the default data access monitoring primitives for DAMON
+	  that works for page granularity idleness monitoring.
+
 config DAMON_VADDR_KUNIT_TEST
 	bool "Test for DAMON primitives" if !KUNIT_ALL_TESTS
 	depends on DAMON_VADDR && KUNIT=y
diff --git a/mm/damon/Makefile b/mm/damon/Makefile
index 8d9b0df79702..017799e5670a 100644
--- a/mm/damon/Makefile
+++ b/mm/damon/Makefile
@@ -3,4 +3,5 @@
 obj-$(CONFIG_DAMON)		:= core.o
 obj-$(CONFIG_DAMON_VADDR)	+= prmtv-common.o vaddr.o
 obj-$(CONFIG_DAMON_PADDR)	+= prmtv-common.o paddr.o
+obj-$(CONFIG_DAMON_PGIDLE)	+= prmtv-common.o pgidle.o
 obj-$(CONFIG_DAMON_DBGFS)	+= dbgfs.o
diff --git a/mm/damon/paddr.c b/mm/damon/paddr.c
index 143ddc0e5917..95d7f3b745a9 100644
--- a/mm/damon/paddr.c
+++ b/mm/damon/paddr.c
@@ -7,8 +7,6 @@
 
 #define pr_fmt(fmt) "damon-pa: " fmt
 
-#include <linux/rmap.h>
-
 #include "prmtv-common.h"
 
 /*
diff --git a/mm/damon/pgidle.c b/mm/damon/pgidle.c
new file mode 100644
index 000000000000..dd8297371eaf
--- /dev/null
+++ b/mm/damon/pgidle.c
@@ -0,0 +1,69 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * DAMON Primitives for Page Granularity Idleness Monitoring
+ *
+ * Author: SeongJae Park <sjpark@amazon.de>
+ */
+
+#define pr_fmt(fmt) "damon-pgi: " fmt
+
+#include <linux/rmap.h>
+
+#include "prmtv-common.h"
+
+#include <trace/events/damon.h>
+
+bool damon_pgi_is_idle(unsigned long pfn, unsigned long *pg_size)
+{
+	return damon_pa_young(PFN_PHYS(pfn), pg_size);
+}
+
+/*
+ * This has no implementations for 'init_target_regions()' and
+ * 'update_target_regions()'.  Users should set the initial regions and update
+ * regions by themselves in the 'before_start' and 'after_aggregation'
+ * callbacks, respectively.  Or, they can implement and use their own version
+ * of the primitives.
+ */
+
+void damon_pgi_prepare_access_checks(struct damon_ctx *ctx)
+{
+	struct damon_pfns_range *target = ctx->arbitrary_target;
+	unsigned long pfn;
+
+	for (pfn = target->start; pfn < target->end; pfn++)
+		damon_pa_mkold(PFN_PHYS(pfn));
+}
+
+unsigned int damon_pgi_check_accesses(struct damon_ctx *ctx)
+{
+	struct damon_pfns_range *target = ctx->arbitrary_target;
+	unsigned long pfn;
+	unsigned long pg_size = 0;
+
+	for (pfn = target->start; pfn < target->end; pfn++) {
+		pg_size = 0;
+		trace_damon_pgi(pfn, damon_pa_young(PFN_PHYS(pfn), &pg_size));
+		if (pg_size > PAGE_SIZE)
+			pfn += pg_size / PAGE_SIZE - 1;
+	}
+
+	return 0;
+}
+
+bool damon_pgi_target_valid(void *target)
+{
+	return true;
+}
+
+void damon_pgi_set_primitives(struct damon_ctx *ctx)
+{
+	ctx->primitive.init_target_regions = NULL;
+	ctx->primitive.update_target_regions = NULL;
+	ctx->primitive.prepare_access_checks = damon_pgi_prepare_access_checks;
+	ctx->primitive.check_accesses = damon_pgi_check_accesses;
+	ctx->primitive.reset_aggregated = NULL;
+	ctx->primitive.target_valid = damon_pgi_target_valid;
+	ctx->primitive.cleanup = NULL;
+	ctx->primitive.apply_scheme = NULL;
+}
-- 
2.17.1



^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [RFC v10 01/13] damon/dbgfs: Allow users to set initial monitoring target regions
  2020-12-16  9:42 ` [RFC v10 01/13] damon/dbgfs: Allow users to set initial monitoring target regions SeongJae Park
@ 2021-01-19 18:41   ` SeongJae Park
  0 siblings, 0 replies; 16+ messages in thread
From: SeongJae Park @ 2021-01-19 18:41 UTC (permalink / raw)
  To: SeongJae Park
  Cc: akpm, Jonathan.Cameron, aarcange, acme, alexander.shishkin, amit,
	benh, brendan.d.gregg, brendanhiggins, cai, colin.king, corbet,
	david, dwmw, elver, fan.du, foersleo, gthelen, irogers, jolsa,
	kirill, mark.rutland, mgorman, minchan, mingo, namhyung, peterz,
	rdunlap, riel, rientjes, rostedt, rppt, sblbir, shakeelb, shuah,
	sj38.park, snu, vbabka, vdavydov.dev, yang.shi, ying.huang,
	zgf574564920, linux-damon, linux-mm, linux-doc, linux-kernel

On Wed, 16 Dec 2020 10:42:09 +0100 SeongJae Park <sjpark@amazon.com> wrote:

> From: SeongJae Park <sjpark@amazon.de>
> 
> Some 'damon-dbgfs' users would want to monitor only a part of the entire
> virtual memory address space.  The framework users in the kernel space
> could use '->init_target_regions' callback or even set the regions
> inside the context struct as they want, but 'damon-dbgfs' users cannot.
> 
> For the reason, this commit introduces a new debugfs file,
> 'init_region'.  'damon-dbgfs' users can specify which initial monitoring
> target address regions they want by writing special input to the file.
> The input should describe each region in each line in below form:
> 
>     <pid> <start address> <end address>
> 
> Note that the regions will be updated to cover entire memory mapped
> regions after 'regions update interval'.  If you want the regions to not
> be updated after the initial setting, you could set the interval as a
> very long time, say, a few decades.
> 
> Signed-off-by: SeongJae Park <sjpark@amazon.de>
> ---
>  mm/damon/dbgfs.c | 154 ++++++++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 151 insertions(+), 3 deletions(-)
> 
> diff --git a/mm/damon/dbgfs.c b/mm/damon/dbgfs.c
> index 06295c986dc3..2f1ec6ebd9f0 100644
> --- a/mm/damon/dbgfs.c
> +++ b/mm/damon/dbgfs.c
[...]
> +
> +static ssize_t dbgfs_init_regions_read(struct file *file, char __user *buf,
> +		size_t count, loff_t *ppos)
> +{
> +	struct damon_ctx *ctx = file->private_data;
> +	char *kbuf;
> +	ssize_t len;
> +
> +	kbuf = kmalloc(count, GFP_KERNEL);
> +	if (!kbuf)
> +		return -ENOMEM;
> +
> +	mutex_lock(&ctx->kdamond_lock);
> +	if (ctx->kdamond) {
> +		mutex_unlock(&ctx->kdamond_lock);
> +		return -EBUSY;

Coverity Static Analysis Security Testing (SAST) by Synopsys, Inc. found that
above return is missing deallocation of 'kbuf'.  I will fix this in the next
version.

> +	}
> +
> +	len = sprint_init_regions(ctx, kbuf, count);
> +	mutex_unlock(&ctx->kdamond_lock);
> +	if (len < 0)
> +		goto out;
> +	len = simple_read_from_buffer(buf, count, ppos, kbuf, len);
> +
> +out:
> +	kfree(kbuf);
> +	return len;
> +}


Thanks,
SeongJae Park
[...]


^ permalink raw reply	[flat|nested] 16+ messages in thread

* DAMON-based Proactive Reclamation for The Physical Address Space
  2020-12-16  9:42 [RFC v10 00/13] DAMON: Support Physical Memory Address Space and Page-granularity Idleness Monitoring SeongJae Park
                   ` (12 preceding siblings ...)
  2020-12-16  9:42 ` [RFC v10 13/13] mm/damon: Implement primitives for page granularity idleness monitoring SeongJae Park
@ 2021-05-05 11:26 ` SeongJae Park
  13 siblings, 0 replies; 16+ messages in thread
From: SeongJae Park @ 2021-05-05 11:26 UTC (permalink / raw)
  To: SeongJae Park
  Cc: akpm, Jonathan.Cameron, aarcange, acme, alexander.shishkin, amit,
	benh, brendan.d.gregg, brendanhiggins, cai, colin.king, corbet,
	david, dwmw, elver, fan.du, foersleo, gthelen, irogers, jolsa,
	kirill, mark.rutland, mgorman, minchan, mingo, namhyung, peterz,
	rdunlap, riel, rientjes, rostedt, rppt, sblbir, shakeelb, shuah,
	sj38.park, snu, vbabka, vdavydov.dev, yang.shi, ying.huang,
	zgf574564920, linux-damon, linux-mm, linux-doc, linux-kernel

From: SeongJae Park <sjpark@amazon.de>

On Wed, 16 Dec 2020 10:42:08 +0100 SeongJae Park <sjpark@amazon.com> wrote:

> From: SeongJae Park <sjpark@amazon.de>
> 
> NOTE: This is only an RFC for future features of DAMON patchset[1], which is
> not merged in the mainline yet.  The aim of this RFC is to show how DAMON would
> be evolved once it is merged in.  So, if you have some interest in this RFC,
> please consider reviewing the DAMON patchset, either.
> 
[...]

TL; DR: I confirmed DAMON's physical address monitoring works effectively by
implementing a proactive reclamation system using DAMON and evaluating it with
24 realistic workloads.

DAMON's overhead control logics, namely 'region-based sampling' and 'adaptive
regions adjustment', are based on an assumption.  That is, there would be a
number of memory regions that pages in each region having similar access
frequency.  In other words, a sort of spatial locality.

This made some people concerned about the accuracy of physical address space
monitoring.  In detail, because any process in the system can make access to
the physical address space, the pattern would be more chaotic and randomic than
virtual address spaces.  As a result, the spatial locality assumption is broken
and DAMON will give only poor quality monitoring results.

I'd argue such case will be very rare in real.  After all, the assumption-based
logics are only optional[1].  I also confirmed the physical address space
monitoring results are accurate enough for basic profiling, with real
production systems[2] and my test workloads.

In the past, I shown the effectiveness of the DAMON's virtual address space
monitoring with the monitoring-based proactive reclamation[3].  I call the
implementation 'prcl'.  To show the effectiveness of the DAMON's physical
address space monitoring and convince some more people, I did same work again,
for the physical address space monitoring.  That is, I implemented a physical
address space monitoring-based version of the proactive reclamation ('pprcl')
and evaluated it's performance with 24 realistic workloads.  The setup is
almost same to the previously shared one[3].

In detail, 'pprcl' finds memory regions in physical address space that didn't
accessed for >=5 seconds and reclaim those.  'prcl' is similar but finds the
regions from the virtual address space of the target workload, and the
threshold time is tuned for each workload, so that it wouldn't incur too high
runtime overhead.


Reduction of Workload's Residential Sets
-----------------------------------------

Below shows the averaged RSS of each workload on the systems.

rss.avg                 orig         prcl         (overhead) pprcl        (overhead)
parsec3/blackscholes    588658.400   255710.400   (-56.56)   291570.800   (-50.47)
parsec3/bodytrack       32286.600    6714.200     (-79.20)   29023.200    (-10.11)
parsec3/canneal         841353.400   841823.600   (0.06)     841721.800   (0.04)
parsec3/dedup           1163860.000  561526.200   (-51.75)   922990.000   (-20.70)
parsec3/facesim         311657.800   191045.600   (-38.70)   188238.200   (-39.60)
parsec3/fluidanimate    531832.000   415361.600   (-21.90)   418925.800   (-21.23)
parsec3/freqmine        552641.400   37270.000    (-93.26)   66849.800    (-87.90)
parsec3/raytrace        885486.400   296335.800   (-66.53)   360111.000   (-59.33)
parsec3/streamcluster   110838.200   109961.000   (-0.79)    108288.600   (-2.30)
parsec3/swaptions       5697.600     3575.200     (-37.25)   1982.600     (-65.20)
parsec3/vips            31849.200    27923.400    (-12.33)   29194.000    (-8.34)
parsec3/x264            81749.800    81936.600    (0.23)     80098.600    (-2.02)
splash2x/barnes         1217412.400  681704.000   (-44.00)   825071.200   (-32.23)
splash2x/fft            10055745.800 8948474.600  (-11.01)   9049028.600  (-10.01)
splash2x/lu_cb          511975.400   338240.000   (-33.93)   343283.200   (-32.95)
splash2x/lu_ncb         511459.000   406830.400   (-20.46)   392444.400   (-23.27)
splash2x/ocean_cp       3384642.800  3413014.800  (0.84)     3377972.000  (-0.20)
splash2x/ocean_ncp      3943689.400  3950712.800  (0.18)     3896549.800  (-1.20)
splash2x/radiosity      1472601.000  96327.400    (-93.46)   245859.800   (-83.30)
splash2x/radix          2419770.000  2467029.400  (1.95)     2416935.600  (-0.12)
splash2x/raytrace       23297.600    5559.200     (-76.14)   12799.000    (-45.06)
splash2x/volrend        44117.400    16930.400    (-61.62)   20800.400    (-52.85)
splash2x/water_nsquared 29403.200    13191.400    (-55.14)   25244.400    (-14.14)
splash2x/water_spatial  663455.600   258882.000   (-60.98)   479496.000   (-27.73)
total                   29415600.000 23426082.000 (-20.36)   24424396.000 (-16.97)
average                 1225650.000  976087.000   (-37.99)   1017680.000  (-28.76)

On average, 'prcl' saved 37.99% of memory, while 'pprcl' saved 28.76%.  The
memory saving of 'pprcl' is smaller than that of 'prcl', though the difference
is not significant.  Note that this machine has about 130 GiB memory, which is
much larger than the RSS of the workloads (only about 1.2 GiB on average).  I
believe this fact made the accuracy of the physical address monitoring worse
than the virtual address monitoring.  Compared to the monitoring scope increase
(about 100x), the accuracy degradation is very small.


System Global Memory Saving
---------------------------

I further measured the amount of free memory of the system to calculate the
system global memory footprint.

memused.avg             orig         prcl         (overhead) pprcl        (overhead)
parsec3/blackscholes    1838734.200  1617375.000  (-12.04)   321902.200   (-82.49)
parsec3/bodytrack       1436094.400  1451703.200  (1.09)     256972.600   (-82.11)
parsec3/canneal         1048424.600  1062165.200  (1.31)     885787.600   (-15.51)
parsec3/dedup           2526629.800  2506042.600  (-0.81)    1777099.400  (-29.67)
parsec3/facesim         546595.800   494834.200   (-9.47)    243344.600   (-55.48)
parsec3/fluidanimate    581078.800   484461.200   (-16.63)   409179.000   (-29.58)
parsec3/freqmine        994034.000   760863.000   (-23.46)   320619.200   (-67.75)
parsec3/raytrace        1753114.800  1565592.600  (-10.70)   703991.600   (-59.84)
parsec3/streamcluster   128533.400   142138.200   (10.58)    100322.200   (-21.95)
parsec3/swaptions       22869.200    40935.000    (79.00)    -11221.800   (-149.07)
parsec3/vips            2992238.000  2948726.000  (-1.45)    479531.000   (-83.97)
parsec3/x264            3250209.000  3273603.400  (0.72)     691699.400   (-78.72)
splash2x/barnes         1220499.800  955857.200   (-21.68)   978864.800   (-19.80)
splash2x/fft            9674473.000  9803918.800  (1.34)     10242764.800 (5.87)
splash2x/lu_cb          521333.400   365105.200   (-29.97)   323198.200   (-38.01)
splash2x/lu_ncb         521936.200   431906.000   (-17.25)   384663.200   (-26.30)
splash2x/ocean_cp       3295293.800  3311071.800  (0.48)     3281148.000  (-0.43)
splash2x/ocean_ncp      3917407.800  3926460.000  (0.23)     3871557.000  (-1.17)
splash2x/radiosity      1472602.400  431091.600   (-70.73)   496768.400   (-66.27)
splash2x/radix          2394703.600  2340372.000  (-2.27)    2494416.400  (4.16)
splash2x/raytrace       52380.400    61028.200    (16.51)    4832.600     (-90.77)
splash2x/volrend        159425.800   167845.600   (5.28)     36449.600    (-77.14)
splash2x/water_nsquared 50912.200    69023.600    (35.57)    12645.200    (-75.16)
splash2x/water_spatial  681121.200   382255.200   (-43.88)   516789.200   (-24.13)
total                   41080500.000 38594500.000 (-6.05)    28823200.000 (-29.84)
average                 1711690.000  1608100.000  (-4.51)    1200970.000  (-48.55)

On average, 'pprcl' (48.55 %) saved about 10 times more memory than 'prcl'
(4.51 %).  I believe this is because 'pprcl' can reclaim any system memory
while 'prcl' can do that for only the memory mapped to the workload.


Runtime Overhead
----------------

I also measured the runtime of each workload, because the proactive reclamation
could make workloads slowed down.  Note that we used 'zram' as a swap device[3]
to minimize the degradation.

runtime                 orig     prcl     (overhead) pprcl    (overhead)
parsec3/blackscholes    138.566  146.351  (5.62)     139.731  (0.84)
parsec3/bodytrack       125.359  141.542  (12.91)    127.269  (1.52)
parsec3/canneal         203.778  216.348  (6.17)     225.055  (10.44)
parsec3/dedup           18.261   20.552   (12.55)    19.662   (7.67)
parsec3/facesim         338.071  367.367  (8.67)     344.212  (1.82)
parsec3/fluidanimate    341.858  341.465  (-0.11)    332.765  (-2.66)
parsec3/freqmine        437.206  449.147  (2.73)     444.311  (1.63)
parsec3/raytrace        185.744  201.641  (8.56)     186.037  (0.16)
parsec3/streamcluster   640.900  680.466  (6.17)     637.582  (-0.52)
parsec3/swaptions       220.612  223.065  (1.11)     221.809  (0.54)
parsec3/vips            87.661   91.613   (4.51)     94.582   (7.89)
parsec3/x264            114.661  125.278  (9.26)     112.389  (-1.98)
splash2x/barnes         128.298  145.497  (13.41)    139.424  (8.67)
splash2x/fft            58.677   64.417   (9.78)     76.932   (31.11)
splash2x/lu_cb          133.660  138.980  (3.98)     133.222  (-0.33)
splash2x/lu_ncb         148.260  151.129  (1.93)     152.448  (2.82)
splash2x/ocean_cp       75.966   76.765   (1.05)     76.880   (1.20)
splash2x/ocean_ncp      153.289  162.596  (6.07)     172.197  (12.33)
splash2x/radiosity      143.191  154.972  (8.23)     148.913  (4.00)
splash2x/radix          51.190   51.030   (-0.31)    61.811   (20.75)
splash2x/raytrace       133.835  147.047  (9.87)     135.699  (1.39)
splash2x/volrend        120.789  129.783  (7.45)     121.455  (0.55)
splash2x/water_nsquared 370.232  424.013  (14.53)    378.424  (2.21)
splash2x/water_spatial  132.444  151.769  (14.59)    146.471  (10.59)
total                   4502.510 4802.850 (6.67)     4629.270 (2.82)
average                 187.605  200.119  (7.03)     192.886  (5.11)

On average, 'pprcl' outperforms 'prcl' again, though the difference is only
small.  'pprcl' incurs 5.11% slowdown to the workload, while 'prcl' incurs
7.03% slowdown.

Nevertheless, because the reclamation threshold (5 seconds) of 'pprcl' is not
tuned for each workload, it sometimes do too aggressive reclamation and
therefore incur high runtime overhead to some workloads, including splash2x/fft
(31.11%) and splash2x/radix (20.75%).  In contrast, the worst-case runtime
overhead of 'prcl' is only 14.59% (splash2x/water_spatial) because it uses
different tuned thresholds that tuned for each workload.


Conclusion
----------

Based on the above results, I argue that DAMON's overhead control mechanism can
be effective enough for the physical address space.

Nonetheless, note that DAMON is a framework for general access monitoring of
any address space, and the overhead control logic is only optional.  You can
always disable it if it doesn't make sense for your specific use case.

If this results make you interested, please consider reviewing the DAMON
patchset[2].


[1] https://lore.kernel.org/linux-mm/20201216094221.11898-14-sjpark@amazon.com/
[2] https://lore.kernel.org/linux-mm/20210413142904.556-1-sj38.park@gmail.com/
[3] https://damonitor.github.io/doc/html/latest/vm/damon/eval.html#proactive-reclamation


Thanks,
SeongJae Park


^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2021-05-05 11:27 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-12-16  9:42 [RFC v10 00/13] DAMON: Support Physical Memory Address Space and Page-granularity Idleness Monitoring SeongJae Park
2020-12-16  9:42 ` [RFC v10 01/13] damon/dbgfs: Allow users to set initial monitoring target regions SeongJae Park
2021-01-19 18:41   ` SeongJae Park
2020-12-16  9:42 ` [RFC v10 02/13] tools/damon: Support init target regions specification SeongJae Park
2020-12-16  9:42 ` [RFC v10 03/13] damon/dbgfs-test: Add a unit test case for 'init_regions' SeongJae Park
2020-12-16  9:42 ` [RFC v10 04/13] selftests/damon/_chk_record: Do not check number of gaps SeongJae Park
2020-12-16  9:42 ` [RFC v10 05/13] Docs/admin-guide/mm/damon: Document 'init_regions' feature SeongJae Park
2020-12-16  9:42 ` [RFC v10 06/13] mm/damon/vaddr: Separate commonly usable functions SeongJae Park
2020-12-16  9:42 ` [RFC v10 07/13] mm/damon: Implement primitives for physical address space monitoring SeongJae Park
2020-12-16  9:42 ` [RFC v10 08/13] damon/dbgfs: Support physical memory monitoring SeongJae Park
2020-12-16  9:42 ` [RFC v10 09/13] tools/damon/record: " SeongJae Park
2020-12-16  9:42 ` [RFC v10 10/13] tools/damon/record: Support NUMA specific recording SeongJae Park
2020-12-16  9:42 ` [RFC v10 11/13] Docs/DAMON: Document physical memory monitoring support SeongJae Park
2020-12-16  9:42 ` [RFC v10 12/13] mm/damon/paddr: Separate commonly usable functions SeongJae Park
2020-12-16  9:42 ` [RFC v10 13/13] mm/damon: Implement primitives for page granularity idleness monitoring SeongJae Park
2021-05-05 11:26 ` DAMON-based Proactive Reclamation for The Physical Address Space SeongJae Park

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).