* [RFC PATCH 0/5] DAMON: Make coexistable with Idle Page Tracking
From: SeongJae Park @ 2020-09-28 10:35 UTC
  To: akpm
  Cc: SeongJae Park, Jonathan.Cameron, aarcange, acme,
	alexander.shishkin, amit, benh, brendan.d.gregg, brendanhiggins,
	cai, colin.king, corbet, david, dwmw, elver, fan.du, foersleo,
	gthelen, irogers, jolsa, kirill, mark.rutland, mgorman, minchan,
	mingo, namhyung, peterz, rdunlap, riel, rientjes, rostedt, rppt,
	sblbir, shakeelb, shuah, sj38.park, snu, vbabka, vdavydov.dev,
	yang.shi, ying.huang, zgf574564920, linux-damon, linux-mm,
	linux-doc, linux-kernel

From: SeongJae Park <sjpark@amazon.de>

NOTE: This is an RFC for a future change of the DAMON patchsets[1,2,3], which
are not merged in the mainline yet.  The aim of this RFC is to show how the
patchsets would be changed in their next versions.  So, if you have some
interest in this RFC, please consider reviewing the DAMON patchsets as well.

Currently, DAMON is configured to be exclusive with Idle Page Tracking,
because both subsystems use the PG_idle flag and there is no way to
synchronize with Idle Page Tracking.  Though there are many use cases in
which DAMON could do better than Idle Page Tracking, DAMON cannot fully
replace Idle Page Tracking, since

- DAMON doesn't support all features of Idle Page Tracking from the beginning
  (e.g., the physical address space is supported only from the third DAMON
  patchset[3]), and
- there are some use cases in which Idle Page Tracking could be more
  efficient (e.g., page-granularity working set size calculation).

Therefore, this patchset makes DAMON able to coexist with Idle Page Tracking.
As the initial decision to make DAMON exclusive was not a good idea, this
change will be folded into the next versions of the original
patchsets[1,2,3].  You could therefore skip the details of these changes and
simply wait for the postings of the next versions, except for the 4th patch.

The changes significantly refactor the code, especially 'damon.c' and
'damon-test.h'.  Though the refactoring changes are straightforward, if you
gave a 'Reviewed-by:' before and want to drop it due to these changes, please
let me know.

[1] https://lore.kernel.org/linux-mm/20200817105137.19296-1-sjpark@amazon.com/
[2] https://lore.kernel.org/linux-mm/20200804142430.15384-1-sjpark@amazon.com/
[3] https://lore.kernel.org/linux-mm/20200831104730.28970-1-sjpark@amazon.com/

Sequence of Patches
===================

The 1st patch splits the DAMON components, which were unnecessarily
implemented in one source file and depended on one config option
(CONFIG_DAMON), into multiple files, and applies fine-grained dependencies.
As a result, the core framework part of DAMON becomes able to coexist with
Idle Page Tracking.

The following two patches further refactor the code for a cleaner boundary
between the components.

The 4th patch implements a synchronization infrastructure for PG_idle flag
users.  We implement it to be eventually used by DAMON, but the change is
independent of DAMON and is also required for Idle Page Tracking itself.
This patch could therefore be picked up before the DAMON patchsets are
merged.

Finally, the 5th patch updates DAMON to use the PG_idle synchronization
infrastructure, making it fully coexist with Idle Page Tracking.

Baseline and Complete Git Trees
===============================

The patches are based on v5.8 plus the DAMON v20 patchset[1], RFC v14 of the
DAMOS patchset[2], RFC v8 of the physical address space support patchset[3],
RFC v1 of the user space improvement patchset[4], and some more trivial fixes
(s/snprintf/scnprintf/).  You can also clone the complete git tree:

    $ git clone git://github.com/sjp38/linux -b damon-usi/rfc/v1

A web view is also available:
https://github.com/sjp38/linux/releases/tag/damon-usi/rfc/v1


[1] https://lore.kernel.org/linux-mm/20200817105137.19296-1-sjpark@amazon.com/
[2] https://lore.kernel.org/linux-mm/20200804142430.15384-1-sjpark@amazon.com/
[3] https://lore.kernel.org/linux-mm/20200831104730.28970-1-sjpark@amazon.com/
[4] https://lore.kernel.org/linux-mm/20200915180807.18812-1-sjpark@amazon.com/

SeongJae Park (5):
  mm/damon: Separate components and apply fine-grained dependencies
  mm/damon: Separate DAMON schemes application to primitives
  mm/damon: Move recording feature from core to dbgfs
  mm/page_idle: Avoid interferences from concurrent users
  mm/damon/primitives: Make coexistable with Idle Page Tracking

 .../admin-guide/mm/idle_page_tracking.rst     |   22 +-
 MAINTAINERS                                   |    3 +-
 include/linux/damon.h                         |  109 +-
 include/linux/page_idle.h                     |    2 +
 mm/Kconfig                                    |   25 +-
 mm/Makefile                                   |    2 +-
 mm/damon-test.h                               |  724 -----
 mm/damon.c                                    | 2754 -----------------
 mm/damon/Kconfig                              |   68 +
 mm/damon/Makefile                             |    5 +
 mm/damon/core-test.h                          |  253 ++
 mm/damon/core.c                               |  860 +++++
 mm/damon/damon.h                              |    7 +
 mm/damon/dbgfs-test.h                         |  264 ++
 mm/damon/dbgfs.c                              | 1158 +++++++
 mm/damon/primitives-test.h                    |  328 ++
 mm/damon/primitives.c                         |  896 ++++++
 mm/page_idle.c                                |   40 +
 18 files changed, 3982 insertions(+), 3538 deletions(-)
 delete mode 100644 mm/damon-test.h
 delete mode 100644 mm/damon.c
 create mode 100644 mm/damon/Kconfig
 create mode 100644 mm/damon/Makefile
 create mode 100644 mm/damon/core-test.h
 create mode 100644 mm/damon/core.c
 create mode 100644 mm/damon/damon.h
 create mode 100644 mm/damon/dbgfs-test.h
 create mode 100644 mm/damon/dbgfs.c
 create mode 100644 mm/damon/primitives-test.h
 create mode 100644 mm/damon/primitives.c

-- 
2.17.1



* [RFC PATCH 2/5] mm/damon: Separate DAMON schemes application to primitives
From: SeongJae Park @ 2020-09-28 10:35 UTC
  To: akpm
  Cc: SeongJae Park, Jonathan.Cameron, aarcange, acme,
	alexander.shishkin, amit, benh, brendan.d.gregg, brendanhiggins,
	cai, colin.king, corbet, david, dwmw, elver, fan.du, foersleo,
	gthelen, irogers, jolsa, kirill, mark.rutland, mgorman, minchan,
	mingo, namhyung, peterz, rdunlap, riel, rientjes, rostedt, rppt,
	sblbir, shakeelb, shuah, sj38.park, snu, vbabka, vdavydov.dev,
	yang.shi, ying.huang, zgf574564920, linux-damon, linux-mm,
	linux-doc, linux-kernel

From: SeongJae Park <sjpark@amazon.de>

The DAMON-based operation schemes feature is implemented inside the DAMON
'core'.  Though the access-pattern-based target region tracking part of the
schemes makes sense to reside in the 'core', applying the scheme's action
would better reside in the 'primitives', as that work highly depends on the
type of the target region.

For this reason, this commit moves the part to the 'primitives' by adding one
more context callback, 'apply_scheme', and implementing it in the reference
primitives implementation for virtual address spaces.  Note that this doesn't
add an implementation for the physical address space, as one didn't exist
before.  Nonetheless, such an extension can easily be made in the same way in
the future.
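
For illustration, a monitoring primitives implementation would wire up the
new callback roughly as below.  This is a minimal sketch using only the
interfaces added in this patch; 'my_apply_scheme' and 'my_set_primitives' are
hypothetical names:

    /* Hypothetical primitives implementation of the new callback */
    static int my_apply_scheme(struct damon_ctx *ctx, struct damon_target *t,
                    struct damon_region *r, struct damos *scheme)
    {
            /* apply 'scheme->action' to [r->ar.start, r->ar.end) of 't' */
            return 0;
    }

    static void my_set_primitives(struct damon_ctx *ctx)
    {
            /* set the other callbacks (init_target_regions, ...) here */
            ctx->apply_scheme = my_apply_scheme;
    }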

Signed-off-by: SeongJae Park <sjpark@amazon.de>
---
 include/linux/damon.h |  8 +++++
 mm/damon/core.c       | 65 ++------------------------------------
 mm/damon/damon.h      | 28 -----------------
 mm/damon/primitives.c | 73 ++++++++++++++++++++++++++++++++++++++++++-
 4 files changed, 82 insertions(+), 92 deletions(-)

diff --git a/include/linux/damon.h b/include/linux/damon.h
index 264958a62c02..505e6261cefa 100644
--- a/include/linux/damon.h
+++ b/include/linux/damon.h
@@ -170,6 +170,7 @@ struct damos {
  * @check_accesses:		Checks the access of target regions.
  * @target_valid:		Determine if the target is valid.
  * @cleanup:			Cleans up the context.
+ * @apply_scheme:		Apply a DAMON-based operation scheme.
  * @sample_cb:			Called for each sampling interval.
  * @aggregate_cb:		Called for each aggregation interval.
  *
@@ -193,6 +194,9 @@ struct damos {
  * monitoring.
  * @cleanup is called from @kdamond just before its termination.  After this
  * call, only @kdamond_lock and @kdamond will be touched.
+ * @apply_scheme is called from @kdamond when a region for a user-provided
+ * DAMON-based operation scheme is found.  It should apply the scheme's action
+ * to the region.
  *
  * @sample_cb and @aggregate_cb are called from @kdamond for each of the
  * sampling intervals and aggregation intervals, respectively.  Therefore,
@@ -229,6 +233,8 @@ struct damon_ctx {
 	unsigned int (*check_accesses)(struct damon_ctx *context);
 	bool (*target_valid)(struct damon_target *target);
 	void (*cleanup)(struct damon_ctx *context);
+	int (*apply_scheme)(struct damon_ctx *context, struct damon_target *t,
+			struct damon_region *r, struct damos *scheme);
 	void (*sample_cb)(struct damon_ctx *context);
 	void (*aggregate_cb)(struct damon_ctx *context);
 };
@@ -312,6 +318,8 @@ void kdamond_prepare_vm_access_checks(struct damon_ctx *ctx);
 unsigned int kdamond_check_vm_accesses(struct damon_ctx *ctx);
 bool kdamond_vm_target_valid(struct damon_target *t);
 void kdamond_vm_cleanup(struct damon_ctx *ctx);
+int kdamond_vm_apply_scheme(struct damon_ctx *context, struct damon_target *t,
+		struct damon_region *r, struct damos *scheme);
 void damon_set_vaddr_primitives(struct damon_ctx *ctx);
 
 /* Reference callback implementations for physical memory */
diff --git a/mm/damon/core.c b/mm/damon/core.c
index d85ade7b5e23..ba52421a2673 100644
--- a/mm/damon/core.c
+++ b/mm/damon/core.c
@@ -701,68 +701,6 @@ static void kdamond_reset_aggregated(struct damon_ctx *c)
 	}
 }
 
-#ifndef CONFIG_ADVISE_SYSCALLS
-static int damos_madvise(struct damon_target *target, struct damon_region *r,
-			int behavior)
-{
-	return -EINVAL;
-}
-#else
-static int damos_madvise(struct damon_target *target, struct damon_region *r,
-			int behavior)
-{
-	struct task_struct *t;
-	struct mm_struct *mm;
-	int ret = -ENOMEM;
-
-	t = damon_get_task_struct(target);
-	if (!t)
-		goto out;
-	mm = damon_get_mm(target);
-	if (!mm)
-		goto put_task_out;
-
-	ret = do_madvise(t, mm, PAGE_ALIGN(r->ar.start),
-			PAGE_ALIGN(r->ar.end - r->ar.start), behavior);
-	mmput(mm);
-put_task_out:
-	put_task_struct(t);
-out:
-	return ret;
-}
-#endif	/* CONFIG_ADVISE_SYSCALLS */
-
-static int damos_do_action(struct damon_target *target, struct damon_region *r,
-			enum damos_action action)
-{
-	int madv_action;
-
-	switch (action) {
-	case DAMOS_WILLNEED:
-		madv_action = MADV_WILLNEED;
-		break;
-	case DAMOS_COLD:
-		madv_action = MADV_COLD;
-		break;
-	case DAMOS_PAGEOUT:
-		madv_action = MADV_PAGEOUT;
-		break;
-	case DAMOS_HUGEPAGE:
-		madv_action = MADV_HUGEPAGE;
-		break;
-	case DAMOS_NOHUGEPAGE:
-		madv_action = MADV_NOHUGEPAGE;
-		break;
-	case DAMOS_STAT:
-		return 0;
-	default:
-		pr_warn("Wrong action %d\n", action);
-		return -EINVAL;
-	}
-
-	return damos_madvise(target, r, madv_action);
-}
-
 static void damon_do_apply_schemes(struct damon_ctx *c,
 				   struct damon_target *t,
 				   struct damon_region *r)
@@ -781,7 +719,8 @@ static void damon_do_apply_schemes(struct damon_ctx *c,
 			continue;
 		s->stat_count++;
 		s->stat_sz += sz;
-		damos_do_action(t, r, s->action);
+		if (c->apply_scheme)
+			c->apply_scheme(c, t, r, s);
 		if (s->action != DAMOS_STAT)
 			r->age = 0;
 	}
diff --git a/mm/damon/damon.h b/mm/damon/damon.h
index fc565fff4953..4315dadcca8a 100644
--- a/mm/damon/damon.h
+++ b/mm/damon/damon.h
@@ -5,31 +5,3 @@
 
 /* Get a random number in [l, r) */
 #define damon_rand(l, r) (l + prandom_u32() % (r - l))
-
-/*
- * 't->id' should be the pointer to the relevant 'struct pid' having reference
- * count.  Caller must put the returned task, unless it is NULL.
- */
-#define damon_get_task_struct(t) \
-	(get_pid_task((struct pid *)t->id, PIDTYPE_PID))
-
-/*
- * Get the mm_struct of the given target
- *
- * Caller _must_ put the mm_struct after use, unless it is NULL.
- *
- * Returns the mm_struct of the target on success, NULL on failure
- */
-static inline struct mm_struct *damon_get_mm(struct damon_target *t)
-{
-	struct task_struct *task;
-	struct mm_struct *mm;
-
-	task = damon_get_task_struct(t);
-	if (!task)
-		return NULL;
-
-	mm = get_task_mm(task);
-	put_task_struct(task);
-	return mm;
-}
diff --git a/mm/damon/primitives.c b/mm/damon/primitives.c
index d7796cbffbd8..e762dc8a5f2e 100644
--- a/mm/damon/primitives.c
+++ b/mm/damon/primitives.c
@@ -38,8 +38,11 @@
 #endif
 
 /*
- * Functions for the initial monitoring target regions construction
+ * 't->id' should be the pointer to the relevant 'struct pid' having reference
+ * count.  Caller must put the returned task, unless it is NULL.
  */
+#define damon_get_task_struct(t) \
+	(get_pid_task((struct pid *)t->id, PIDTYPE_PID))
 
 /*
  * Get the mm_struct of the given target
@@ -62,6 +65,10 @@ struct mm_struct *damon_get_mm(struct damon_target *t)
 	return mm;
 }
 
+/*
+ * Functions for the initial monitoring target regions construction
+ */
+
 /*
  * Size-evenly split a region into 'nr_pieces' small regions
  *
@@ -788,6 +795,68 @@ void kdamond_vm_cleanup(struct damon_ctx *ctx)
 	}
 }
 
+#ifndef CONFIG_ADVISE_SYSCALLS
+static int damos_madvise(struct damon_target *target, struct damon_region *r,
+			int behavior)
+{
+	return -EINVAL;
+}
+#else
+static int damos_madvise(struct damon_target *target, struct damon_region *r,
+			int behavior)
+{
+	struct task_struct *t;
+	struct mm_struct *mm;
+	int ret = -ENOMEM;
+
+	t = damon_get_task_struct(target);
+	if (!t)
+		goto out;
+	mm = damon_get_mm(target);
+	if (!mm)
+		goto put_task_out;
+
+	ret = do_madvise(t, mm, PAGE_ALIGN(r->ar.start),
+			PAGE_ALIGN(r->ar.end - r->ar.start), behavior);
+	mmput(mm);
+put_task_out:
+	put_task_struct(t);
+out:
+	return ret;
+}
+#endif	/* CONFIG_ADVISE_SYSCALLS */
+
+int kdamond_vm_apply_scheme(struct damon_ctx *ctx, struct damon_target *t,
+		struct damon_region *r, struct damos *scheme)
+{
+	int madv_action;
+
+	switch (scheme->action) {
+	case DAMOS_WILLNEED:
+		madv_action = MADV_WILLNEED;
+		break;
+	case DAMOS_COLD:
+		madv_action = MADV_COLD;
+		break;
+	case DAMOS_PAGEOUT:
+		madv_action = MADV_PAGEOUT;
+		break;
+	case DAMOS_HUGEPAGE:
+		madv_action = MADV_HUGEPAGE;
+		break;
+	case DAMOS_NOHUGEPAGE:
+		madv_action = MADV_NOHUGEPAGE;
+		break;
+	case DAMOS_STAT:
+		return 0;
+	default:
+		pr_warn("Wrong action %d\n", scheme->action);
+		return -EINVAL;
+	}
+
+	return damos_madvise(t, r, madv_action);
+}
+
 void damon_set_vaddr_primitives(struct damon_ctx *ctx)
 {
 	ctx->init_target_regions = kdamond_init_vm_regions;
@@ -796,6 +865,7 @@ void damon_set_vaddr_primitives(struct damon_ctx *ctx)
 	ctx->check_accesses = kdamond_check_vm_accesses;
 	ctx->target_valid = kdamond_vm_target_valid;
 	ctx->cleanup = kdamond_vm_cleanup;
+	ctx->apply_scheme = kdamond_vm_apply_scheme;
 }
 
 void damon_set_paddr_primitives(struct damon_ctx *ctx)
@@ -806,6 +876,7 @@ void damon_set_paddr_primitives(struct damon_ctx *ctx)
 	ctx->check_accesses = kdamond_check_phys_accesses;
 	ctx->target_valid = NULL;
 	ctx->cleanup = NULL;
+	ctx->apply_scheme = NULL;
 }
 
 #include "primitives-test.h"
-- 
2.17.1



* [RFC PATCH 3/5] mm/damon: Move recording feature from core to dbgfs
From: SeongJae Park @ 2020-09-28 10:35 UTC
  To: akpm
  Cc: SeongJae Park, Jonathan.Cameron, aarcange, acme,
	alexander.shishkin, amit, benh, brendan.d.gregg, brendanhiggins,
	cai, colin.king, corbet, david, dwmw, elver, fan.du, foersleo,
	gthelen, irogers, jolsa, kirill, mark.rutland, mgorman, minchan,
	mingo, namhyung, peterz, rdunlap, riel, rientjes, rostedt, rppt,
	sblbir, shakeelb, shuah, sj38.park, snu, vbabka, vdavydov.dev,
	yang.shi, ying.huang, zgf574564920, linux-damon, linux-mm,
	linux-doc, linux-kernel

From: SeongJae Park <sjpark@amazon.de>

DAMON passes the monitoring results to user space in two ways: 1) a
tracepoint and 2) its recording feature.  The recording feature is for
users who want the simplest possible usage.

However, as the feature is for user space only while the core is
fundamentally a framework for kernel space, keeping the feature in the
core makes no sense.  Therefore, this commit moves the feature to the
debugfs interface of DAMON.
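
For illustration, an interface built on the core could keep its own recording
state in the new '->private' field and consume the aggregated results from
the aggregate callback, roughly as below.  This is a minimal sketch using the
interfaces shown in this patch; 'my_recorder', 'my_aggregate_cb', and
'my_new_ctx' are hypothetical names:

    /* Hypothetical user of '->private' and the aggregate callback */
    struct my_recorder {
            unsigned int nr_aggregations;
    };

    static void my_aggregate_cb(struct damon_ctx *ctx)
    {
            struct my_recorder *rec = ctx->private;

            /* consume the aggregated results (e.g., record them) here */
            rec->nr_aggregations++;
    }

    static struct damon_ctx *my_new_ctx(void)
    {
            struct damon_ctx *ctx = damon_new_ctx();

            if (!ctx)
                    return NULL;
            ctx->private = kzalloc(sizeof(struct my_recorder), GFP_KERNEL);
            if (!ctx->private) {
                    damon_destroy_ctx(ctx);
                    return NULL;
            }
            ctx->aggregate_cb = my_aggregate_cb;
            return ctx;
    }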

Signed-off-by: SeongJae Park <sjpark@amazon.de>
---
 include/linux/damon.h |  23 +---
 mm/damon/core-test.h  |  57 ++-------
 mm/damon/core.c       | 150 +-----------------------
 mm/damon/dbgfs-test.h |  87 +++++++++++++-
 mm/damon/dbgfs.c      | 264 ++++++++++++++++++++++++++++++++++++++++--
 5 files changed, 359 insertions(+), 222 deletions(-)

diff --git a/include/linux/damon.h b/include/linux/damon.h
index 505e6261cefa..606e59f785a2 100644
--- a/include/linux/damon.h
+++ b/include/linux/damon.h
@@ -134,14 +134,6 @@ struct damos {
  * in case of virtual memory monitoring) and applies the changes for each
  * @regions_update_interval.  All time intervals are in micro-seconds.
  *
- * @rbuf: In-memory buffer for monitoring result recording.
- * @rbuf_len: The length of @rbuf.
- * @rbuf_offset: The offset for next write to @rbuf.
- * @rfile_path: Record file path.
- *
- * If @rbuf, @rbuf_len, and @rfile_path are set, the monitored results are
- * automatically stored in @rfile_path file.
- *
  * @kdamond:		Kernel thread who does the monitoring.
  * @kdamond_stop:	Notifies whether kdamond should stop.
  * @kdamond_lock:	Mutex for the synchronizations with @kdamond.
@@ -164,6 +156,8 @@ struct damos {
  * @targets_list:	Head of monitoring targets (&damon_target) list.
  * @schemes_list:	Head of schemes (&damos) list.
  *
+ * @private:		Private user data.
+ *
  * @init_target_regions:	Constructs initial monitoring target regions.
  * @update_target_regions:	Updates monitoring target regions.
  * @prepare_access_checks:	Prepares next access check of target regions.
@@ -214,11 +208,6 @@ struct damon_ctx {
 	struct timespec64 last_aggregation;
 	struct timespec64 last_regions_update;
 
-	unsigned char *rbuf;
-	unsigned int rbuf_len;
-	unsigned int rbuf_offset;
-	char *rfile_path;
-
 	struct task_struct *kdamond;
 	bool kdamond_stop;
 	struct mutex kdamond_lock;
@@ -226,6 +215,8 @@ struct damon_ctx {
 	struct list_head targets_list;	/* 'damon_target' objects */
 	struct list_head schemes_list;	/* 'damos' objects */
 
+	void *private;
+
 	/* callbacks */
 	void (*init_target_regions)(struct damon_ctx *context);
 	void (*update_target_regions)(struct damon_ctx *context);
@@ -241,10 +232,6 @@ struct damon_ctx {
 
 #ifdef CONFIG_DAMON
 
-#define MIN_RECORD_BUFFER_LEN	1024
-#define MAX_RECORD_BUFFER_LEN	(4 * 1024 * 1024)
-#define MAX_RFILE_PATH_LEN	256
-
 #define damon_next_region(r) \
 	(container_of(r->list.next, struct damon_region, list))
 
@@ -298,8 +285,6 @@ int damon_set_attrs(struct damon_ctx *ctx, unsigned long sample_int,
 		unsigned long min_nr_reg, unsigned long max_nr_reg);
 int damon_set_schemes(struct damon_ctx *ctx,
 			struct damos **schemes, ssize_t nr_schemes);
-int damon_set_recording(struct damon_ctx *ctx,
-				unsigned int rbuf_len, char *rfile_path);
 int damon_nr_running_ctxs(void);
 
 int damon_start(struct damon_ctx *ctxs, int nr_ctxs);
diff --git a/mm/damon/core-test.h b/mm/damon/core-test.h
index c916d773397a..b815dfbfb5fd 100644
--- a/mm/damon/core-test.h
+++ b/mm/damon/core-test.h
@@ -36,6 +36,17 @@ static void damon_test_regions(struct kunit *test)
 	damon_free_target(t);
 }
 
+static unsigned int nr_damon_targets(struct damon_ctx *ctx)
+{
+	struct damon_target *t;
+	unsigned int nr_targets = 0;
+
+	damon_for_each_target(t, ctx)
+		nr_targets++;
+
+	return nr_targets;
+}
+
 static void damon_test_target(struct kunit *test)
 {
 	struct damon_ctx *c = damon_new_ctx();
@@ -54,23 +65,6 @@ static void damon_test_target(struct kunit *test)
 	damon_destroy_ctx(c);
 }
 
-static void damon_test_set_recording(struct kunit *test)
-{
-	struct damon_ctx *ctx = damon_new_ctx();
-	int err;
-
-	err = damon_set_recording(ctx, 42, "foo");
-	KUNIT_EXPECT_EQ(test, err, -EINVAL);
-	damon_set_recording(ctx, 4242, "foo.bar");
-	KUNIT_EXPECT_EQ(test, ctx->rbuf_len, 4242u);
-	KUNIT_EXPECT_STREQ(test, ctx->rfile_path, "foo.bar");
-	damon_set_recording(ctx, 424242, "foo");
-	KUNIT_EXPECT_EQ(test, ctx->rbuf_len, 424242u);
-	KUNIT_EXPECT_STREQ(test, ctx->rfile_path, "foo");
-
-	damon_destroy_ctx(ctx);
-}
-
 /*
  * Test kdamond_reset_aggregated()
  *
@@ -91,9 +85,7 @@ static void damon_test_aggregate(struct kunit *test)
 	struct damon_target *t;
 	struct damon_region *r;
 	int it, ir;
-	ssize_t sz, sr, sp;
 
-	damon_set_recording(ctx, 4242, "damon.data");
 	damon_set_targets(ctx, target_ids, 3);
 
 	it = 0;
@@ -121,31 +113,6 @@ static void damon_test_aggregate(struct kunit *test)
 	/* targets also should be preserved */
 	KUNIT_EXPECT_EQ(test, 3, it);
 
-	/* The aggregated information should be written in the buffer */
-	sr = sizeof(r->ar.start) + sizeof(r->ar.end) + sizeof(r->nr_accesses);
-	sp = sizeof(t->id) + sizeof(unsigned int) + 3 * sr;
-	sz = sizeof(struct timespec64) + sizeof(unsigned int) + 3 * sp;
-	KUNIT_EXPECT_EQ(test, (unsigned int)sz, ctx->rbuf_offset);
-
-	damon_destroy_ctx(ctx);
-}
-
-static void damon_test_write_rbuf(struct kunit *test)
-{
-	struct damon_ctx *ctx = damon_new_ctx();
-	char *data;
-
-	damon_set_recording(ctx, 4242, "damon.data");
-
-	data = "hello";
-	damon_write_rbuf(ctx, data, strnlen(data, 256));
-	KUNIT_EXPECT_EQ(test, ctx->rbuf_offset, 5u);
-
-	damon_write_rbuf(ctx, data, 0);
-	KUNIT_EXPECT_EQ(test, ctx->rbuf_offset, 5u);
-
-	KUNIT_EXPECT_STREQ(test, (char *)ctx->rbuf, data);
-
 	damon_destroy_ctx(ctx);
 }
 
@@ -267,9 +234,7 @@ static void damon_test_split_regions_of(struct kunit *test)
 static struct kunit_case damon_test_cases[] = {
 	KUNIT_CASE(damon_test_target),
 	KUNIT_CASE(damon_test_regions),
-	KUNIT_CASE(damon_test_set_recording),
 	KUNIT_CASE(damon_test_aggregate),
-	KUNIT_CASE(damon_test_write_rbuf),
 	KUNIT_CASE(damon_test_split_at),
 	KUNIT_CASE(damon_test_merge_two),
 	KUNIT_CASE(damon_test_merge_regions_of),
diff --git a/mm/damon/core.c b/mm/damon/core.c
index ba52421a2673..ba0035d7a27a 100644
--- a/mm/damon/core.c
+++ b/mm/damon/core.c
@@ -208,11 +208,6 @@ struct damon_ctx *damon_new_ctx(void)
 	ktime_get_coarse_ts64(&ctx->last_aggregation);
 	ctx->last_regions_update = ctx->last_aggregation;
 
-	if (damon_set_recording(ctx, 0, "none")) {
-		kfree(ctx);
-		return NULL;
-	}
-
 	mutex_init(&ctx->kdamond_lock);
 
 	INIT_LIST_HEAD(&ctx->targets_list);
@@ -328,54 +323,6 @@ int damon_set_schemes(struct damon_ctx *ctx, struct damos **schemes,
 	return 0;
 }
 
-/**
- * damon_set_recording() - Set attributes for the recording.
- * @ctx:	target kdamond context
- * @rbuf_len:	length of the result buffer
- * @rfile_path:	path to the monitor result files
- *
- * Setting 'rbuf_len' 0 disables recording.
- *
- * This function should not be called while the kdamond is running.
- *
- * Return: 0 on success, negative error code otherwise.
- */
-int damon_set_recording(struct damon_ctx *ctx,
-			unsigned int rbuf_len, char *rfile_path)
-{
-	size_t rfile_path_len;
-
-	if (rbuf_len && (rbuf_len > MAX_RECORD_BUFFER_LEN ||
-			rbuf_len < MIN_RECORD_BUFFER_LEN)) {
-		pr_err("result buffer size (%u) is out of [%d,%d]\n",
-				rbuf_len, MIN_RECORD_BUFFER_LEN,
-				MAX_RECORD_BUFFER_LEN);
-		return -EINVAL;
-	}
-	rfile_path_len = strnlen(rfile_path, MAX_RFILE_PATH_LEN);
-	if (rfile_path_len >= MAX_RFILE_PATH_LEN) {
-		pr_err("too long (>%d) result file path %s\n",
-				MAX_RFILE_PATH_LEN, rfile_path);
-		return -EINVAL;
-	}
-	ctx->rbuf_len = rbuf_len;
-	kfree(ctx->rbuf);
-	ctx->rbuf = NULL;
-	kfree(ctx->rfile_path);
-	ctx->rfile_path = NULL;
-
-	if (rbuf_len) {
-		ctx->rbuf = kvmalloc(rbuf_len, GFP_KERNEL);
-		if (!ctx->rbuf)
-			return -ENOMEM;
-	}
-	ctx->rfile_path = kmalloc(rfile_path_len + 1, GFP_KERNEL);
-	if (!ctx->rfile_path)
-		return -ENOMEM;
-	strncpy(ctx->rfile_path, rfile_path, rfile_path_len + 1);
-	return 0;
-}
-
 /**
  * damon_nr_running_ctxs() - Return number of currently running contexts.
  */
@@ -390,17 +337,6 @@ int damon_nr_running_ctxs(void)
 	return nr_ctxs;
 }
 
-static unsigned int nr_damon_targets(struct damon_ctx *ctx)
-{
-	struct damon_target *t;
-	unsigned int nr_targets = 0;
-
-	damon_for_each_target(t, ctx)
-		nr_targets++;
-
-	return nr_targets;
-}
-
 /* Returns the size upper limit for each monitoring region */
 static unsigned long damon_region_sz_limit(struct damon_ctx *ctx)
 {
@@ -613,87 +549,18 @@ static bool kdamond_aggregate_interval_passed(struct damon_ctx *ctx)
 }
 
 /*
- * Flush the content in the result buffer to the result file
- */
-static void damon_flush_rbuffer(struct damon_ctx *ctx)
-{
-	ssize_t sz;
-	loff_t pos = 0;
-	struct file *rfile;
-
-	if (!ctx->rbuf_offset)
-		return;
-
-	rfile = filp_open(ctx->rfile_path,
-			O_CREAT | O_RDWR | O_APPEND | O_LARGEFILE, 0644);
-	if (IS_ERR(rfile)) {
-		pr_err("Cannot open the result file %s\n",
-				ctx->rfile_path);
-		return;
-	}
-
-	while (ctx->rbuf_offset) {
-		sz = kernel_write(rfile, ctx->rbuf, ctx->rbuf_offset, &pos);
-		if (sz < 0)
-			break;
-		ctx->rbuf_offset -= sz;
-	}
-	filp_close(rfile, NULL);
-}
-
-/*
- * Write a data into the result buffer
- */
-static void damon_write_rbuf(struct damon_ctx *ctx, void *data, ssize_t size)
-{
-	if (!ctx->rbuf_len || !ctx->rbuf || !ctx->rfile_path)
-		return;
-	if (ctx->rbuf_offset + size > ctx->rbuf_len)
-		damon_flush_rbuffer(ctx);
-	if (ctx->rbuf_offset + size > ctx->rbuf_len) {
-		pr_warn("%s: flush failed, or wrong size given(%u, %zu)\n",
-				__func__, ctx->rbuf_offset, size);
-		return;
-	}
-
-	memcpy(&ctx->rbuf[ctx->rbuf_offset], data, size);
-	ctx->rbuf_offset += size;
-}
-
-/*
- * Flush the aggregated monitoring results to the result buffer
- *
- * Stores current tracking results to the result buffer and reset 'nr_accesses'
- * of each region.  The format for the result buffer is as below:
- *
- *   <time> <number of targets> <array of target infos>
- *
- *   target info: <id> <number of regions> <array of region infos>
- *   region info: <start address> <end address> <nr_accesses>
+ * Reset the aggregated monitoring results ('nr_accesses' of each region).
  */
 static void kdamond_reset_aggregated(struct damon_ctx *c)
 {
 	struct damon_target *t;
-	struct timespec64 now;
 	unsigned int nr;
 
-	ktime_get_coarse_ts64(&now);
-
-	damon_write_rbuf(c, &now, sizeof(now));
-	nr = nr_damon_targets(c);
-	damon_write_rbuf(c, &nr, sizeof(nr));
-
 	damon_for_each_target(t, c) {
 		struct damon_region *r;
 
-		damon_write_rbuf(c, &t->id, sizeof(t->id));
 		nr = damon_nr_regions(t);
-		damon_write_rbuf(c, &nr, sizeof(nr));
 		damon_for_each_region(r, t) {
-			damon_write_rbuf(c, &r->ar.start, sizeof(r->ar.start));
-			damon_write_rbuf(c, &r->ar.end, sizeof(r->ar.end));
-			damon_write_rbuf(c, &r->nr_accesses,
-					sizeof(r->nr_accesses));
 			trace_damon_aggregated(t, r, nr);
 			r->last_nr_accesses = r->nr_accesses;
 			r->nr_accesses = 0;
@@ -927,14 +794,6 @@ static bool kdamond_need_stop(struct damon_ctx *ctx)
 	return true;
 }
 
-static void kdamond_write_record_header(struct damon_ctx *ctx)
-{
-	int recfmt_ver = 2;
-
-	damon_write_rbuf(ctx, "damon_recfmt_ver", 16);
-	damon_write_rbuf(ctx, &recfmt_ver, sizeof(recfmt_ver));
-}
-
 /*
  * The monitoring daemon that runs as a kernel thread
  */
@@ -951,8 +810,6 @@ static int kdamond_fn(void *data)
 		ctx->init_target_regions(ctx);
 	sz_limit = damon_region_sz_limit(ctx);
 
-	kdamond_write_record_header(ctx);
-
 	while (!kdamond_need_stop(ctx)) {
 		if (ctx->prepare_access_checks)
 			ctx->prepare_access_checks(ctx);
@@ -965,10 +822,10 @@ static int kdamond_fn(void *data)
 			max_nr_accesses = ctx->check_accesses(ctx);
 
 		if (kdamond_aggregate_interval_passed(ctx)) {
-			if (ctx->aggregate_cb)
-				ctx->aggregate_cb(ctx);
 			kdamond_merge_regions(ctx, max_nr_accesses / 10,
 					sz_limit);
+			if (ctx->aggregate_cb)
+				ctx->aggregate_cb(ctx);
 			kdamond_apply_schemes(ctx);
 			kdamond_reset_aggregated(ctx);
 			kdamond_split_regions(ctx);
@@ -980,7 +837,6 @@ static int kdamond_fn(void *data)
 			sz_limit = damon_region_sz_limit(ctx);
 		}
 	}
-	damon_flush_rbuffer(ctx);
 	damon_for_each_target(t, ctx) {
 		damon_for_each_region_safe(r, next, t)
 			damon_destroy_region(r);
diff --git a/mm/damon/dbgfs-test.h b/mm/damon/dbgfs-test.h
index dffb9f70e399..426adf5dadc2 100644
--- a/mm/damon/dbgfs-test.h
+++ b/mm/damon/dbgfs-test.h
@@ -78,7 +78,7 @@ static void damon_dbgfs_test_str_to_target_ids(struct kunit *test)
 
 static void damon_dbgfs_test_set_targets(struct kunit *test)
 {
-	struct damon_ctx *ctx = damon_new_ctx();
+	struct damon_ctx *ctx = debugfs_new_ctx();
 	unsigned long ids[] = {1, 2, 3};
 	char buf[64];
 
@@ -105,9 +105,91 @@ static void damon_dbgfs_test_set_targets(struct kunit *test)
 	sprint_target_ids(ctx, buf, 64);
 	KUNIT_EXPECT_STREQ(test, (char *)buf, "\n");
 
+	debugfs_destroy_ctx(ctx);
+}
+
+static void damon_dbgfs_test_set_recording(struct kunit *test)
+{
+	struct damon_ctx *ctx = debugfs_new_ctx();
+	struct debugfs_recorder *rec = ctx->private;
+	int err;
+
+	err = debugfs_set_recording(ctx, 42, "foo");
+	KUNIT_EXPECT_EQ(test, err, -EINVAL);
+	debugfs_set_recording(ctx, 4242, "foo.bar");
+	KUNIT_EXPECT_EQ(test, rec->rbuf_len, 4242u);
+	KUNIT_EXPECT_STREQ(test, rec->rfile_path, "foo.bar");
+	debugfs_set_recording(ctx, 424242, "foo");
+	KUNIT_EXPECT_EQ(test, rec->rbuf_len, 424242u);
+	KUNIT_EXPECT_STREQ(test, rec->rfile_path, "foo");
+
+	debugfs_destroy_ctx(ctx);
+}
+
+static void damon_dbgfs_test_write_rbuf(struct kunit *test)
+{
+	struct damon_ctx *ctx = debugfs_new_ctx();
+	struct debugfs_recorder *rec = ctx->private;
+	char *data;
+
+	debugfs_set_recording(ctx, 4242, "damon.data");
+
+	data = "hello";
+	debugfs_write_rbuf(ctx, data, strnlen(data, 256));
+	KUNIT_EXPECT_EQ(test, rec->rbuf_offset, 5u);
+
+	debugfs_write_rbuf(ctx, data, 0);
+	KUNIT_EXPECT_EQ(test, rec->rbuf_offset, 5u);
+
+	KUNIT_EXPECT_STREQ(test, (char *)rec->rbuf, data);
+
+	debugfs_destroy_ctx(ctx);
+}
+
+/*
+ * Test debugfs_aggregate_cb()
+ *
+ * dbgfs sets debugfs_aggregate_cb() as the aggregate callback.  It stores
+ * the aggregated monitoring information ('->nr_accesses' of each region) to
+ * the result buffer.
+ */
+static void damon_dbgfs_test_aggregate(struct kunit *test)
+{
+	struct damon_ctx *ctx = debugfs_new_ctx();
+	struct debugfs_recorder *rec = ctx->private;
+	unsigned long target_ids[] = {1, 2, 3};
+	unsigned long saddr[][3] = {{10, 20, 30}, {5, 42, 49}, {13, 33, 55} };
+	unsigned long eaddr[][3] = {{15, 27, 40}, {31, 45, 55}, {23, 44, 66} };
+	unsigned long accesses[][3] = {{42, 95, 84}, {10, 20, 30}, {0, 1, 2} };
+	struct damon_target *t;
+	struct damon_region *r;
+	int it, ir;
+	ssize_t sz, sr, sp;
+
+	debugfs_set_recording(ctx, 4242, "damon.data");
+	damon_set_targets(ctx, target_ids, 3);
+
+	it = 0;
+	damon_for_each_target(t, ctx) {
+		for (ir = 0; ir < 3; ir++) {
+			r = damon_new_region(saddr[it][ir], eaddr[it][ir]);
+			r->nr_accesses = accesses[it][ir];
+			damon_add_region(r, t);
+		}
+		it++;
+	}
+	debugfs_aggregate_cb(ctx);
+
+	/* The aggregated information should be written in the buffer */
+	sr = sizeof(r->ar.start) + sizeof(r->ar.end) + sizeof(r->nr_accesses);
+	sp = sizeof(t->id) + sizeof(unsigned int) + 3 * sr;
+	sz = sizeof(struct timespec64) + sizeof(unsigned int) + 3 * sp;
+	KUNIT_EXPECT_EQ(test, (unsigned int)sz, rec->rbuf_offset);
+
 	damon_destroy_ctx(ctx);
 }
 
+
 static void damon_dbgfs_test_set_init_regions(struct kunit *test)
 {
 	struct damon_ctx *ctx = damon_new_ctx();
@@ -164,6 +246,9 @@ static void damon_dbgfs_test_set_init_regions(struct kunit *test)
 static struct kunit_case damon_test_cases[] = {
 	KUNIT_CASE(damon_dbgfs_test_str_to_target_ids),
 	KUNIT_CASE(damon_dbgfs_test_set_targets),
+	KUNIT_CASE(damon_dbgfs_test_set_recording),
+	KUNIT_CASE(damon_dbgfs_test_write_rbuf),
+	KUNIT_CASE(damon_dbgfs_test_aggregate),
 	KUNIT_CASE(damon_dbgfs_test_set_init_regions),
 	{},
 };
diff --git a/mm/damon/dbgfs.c b/mm/damon/dbgfs.c
index 646a492100ff..7a6c279690f8 100644
--- a/mm/damon/dbgfs.c
+++ b/mm/damon/dbgfs.c
@@ -10,15 +10,155 @@
 #include <linux/damon.h>
 #include <linux/debugfs.h>
 #include <linux/file.h>
+#include <linux/mm.h>
 #include <linux/module.h>
 #include <linux/slab.h>
 
+#define MIN_RECORD_BUFFER_LEN	1024
+#define MAX_RECORD_BUFFER_LEN	(4 * 1024 * 1024)
+#define MAX_RFILE_PATH_LEN	256
+
+struct debugfs_recorder {
+	unsigned char *rbuf;
+	unsigned int rbuf_len;
+	unsigned int rbuf_offset;
+	char *rfile_path;
+};
+
 /* Monitoring contexts for debugfs interface users. */
 static struct damon_ctx **debugfs_ctxs;
 static int debugfs_nr_ctxs = 1;
 
 static DEFINE_MUTEX(damon_dbgfs_lock);
 
+static unsigned int nr_damon_targets(struct damon_ctx *ctx)
+{
+	struct damon_target *t;
+	unsigned int nr_targets = 0;
+
+	damon_for_each_target(t, ctx)
+		nr_targets++;
+
+	return nr_targets;
+}
+
+/*
+ * Flush the content in the result buffer to the result file
+ */
+static void debugfs_flush_rbuffer(struct debugfs_recorder *rec)
+{
+	ssize_t sz;
+	loff_t pos = 0;
+	struct file *rfile;
+
+	if (!rec->rbuf_offset)
+		return;
+
+	rfile = filp_open(rec->rfile_path,
+			O_CREAT | O_RDWR | O_APPEND | O_LARGEFILE, 0644);
+	if (IS_ERR(rfile)) {
+		pr_err("Cannot open the result file %s\n",
+				rec->rfile_path);
+		return;
+	}
+
+	while (rec->rbuf_offset) {
+		sz = kernel_write(rfile, rec->rbuf, rec->rbuf_offset, &pos);
+		if (sz < 0)
+			break;
+		rec->rbuf_offset -= sz;
+	}
+	filp_close(rfile, NULL);
+}
+
+/*
+ * Write data into the result buffer
+ */
+static void debugfs_write_rbuf(struct damon_ctx *ctx, void *data, ssize_t size)
+{
+	struct debugfs_recorder *rec = (struct debugfs_recorder *)ctx->private;
+
+	if (!rec->rbuf_len || !rec->rbuf || !rec->rfile_path)
+		return;
+	if (rec->rbuf_offset + size > rec->rbuf_len)
+		debugfs_flush_rbuffer(ctx->private);
+	if (rec->rbuf_offset + size > rec->rbuf_len) {
+		pr_warn("%s: flush failed, or wrong size given(%u, %zu)\n",
+				__func__, rec->rbuf_offset, size);
+		return;
+	}
+
+	memcpy(&rec->rbuf[rec->rbuf_offset], data, size);
+	rec->rbuf_offset += size;
+}
+
+static void debugfs_write_record_header(struct damon_ctx *ctx)
+{
+	int recfmt_ver = 2;
+
+	debugfs_write_rbuf(ctx, "damon_recfmt_ver", 16);
+	debugfs_write_rbuf(ctx, &recfmt_ver, sizeof(recfmt_ver));
+}
+
+static void debugfs_init_vm_regions(struct damon_ctx *ctx)
+{
+	debugfs_write_record_header(ctx);
+	kdamond_init_vm_regions(ctx);
+}
+
+static void debugfs_vm_cleanup(struct damon_ctx *ctx)
+{
+	debugfs_flush_rbuffer(ctx->private);
+	kdamond_vm_cleanup(ctx);
+}
+
+static void debugfs_init_phys_regions(struct damon_ctx *ctx)
+{
+	debugfs_write_record_header(ctx);
+}
+
+static void debugfs_phys_cleanup(struct damon_ctx *ctx)
+{
+	debugfs_flush_rbuffer(ctx->private);
+}
+
+/*
+ * Store the aggregated monitoring results to the result buffer
+ *
+ * The format for the result buffer is as below:
+ *
+ *   <time> <number of targets> <array of target infos>
+ *
+ *   target info: <id> <number of regions> <array of region infos>
+ *   region info: <start address> <end address> <nr_accesses>
+ */
+static void debugfs_aggregate_cb(struct damon_ctx *c)
+{
+	struct damon_target *t;
+	struct timespec64 now;
+	unsigned int nr;
+
+	ktime_get_coarse_ts64(&now);
+
+	debugfs_write_rbuf(c, &now, sizeof(now));
+	nr = nr_damon_targets(c);
+	debugfs_write_rbuf(c, &nr, sizeof(nr));
+
+	damon_for_each_target(t, c) {
+		struct damon_region *r;
+
+		debugfs_write_rbuf(c, &t->id, sizeof(t->id));
+		nr = damon_nr_regions(t);
+		debugfs_write_rbuf(c, &nr, sizeof(nr));
+		damon_for_each_region(r, t) {
+			debugfs_write_rbuf(c, &r->ar.start, sizeof(r->ar.start));
+			debugfs_write_rbuf(c, &r->ar.end, sizeof(r->ar.end));
+			debugfs_write_rbuf(c, &r->nr_accesses,
+					sizeof(r->nr_accesses));
+		}
+	}
+}
+
 static ssize_t debugfs_monitor_on_read(struct file *file,
 		char __user *buf, size_t count, loff_t *ppos)
 {
@@ -330,6 +470,20 @@ static struct pid *damon_get_pidfd_pid(unsigned int pidfd)
 	return pid;
 }
 
+static void debugfs_set_vaddr_primitives(struct damon_ctx *ctx)
+{
+	damon_set_vaddr_primitives(ctx);
+	ctx->init_target_regions = debugfs_init_vm_regions;
+	ctx->cleanup = debugfs_vm_cleanup;
+}
+
+static void debugfs_set_paddr_primitives(struct damon_ctx *ctx)
+{
+	damon_set_paddr_primitives(ctx);
+	ctx->init_target_regions = debugfs_init_phys_regions;
+	ctx->cleanup = debugfs_phys_cleanup;
+}
+
 static ssize_t debugfs_target_ids_write(struct file *file,
 		const char __user *buf, size_t count, loff_t *ppos)
 {
@@ -349,12 +503,13 @@ static ssize_t debugfs_target_ids_write(struct file *file,
 	nrs = kbuf;
 	if (!strncmp(kbuf, "paddr\n", count)) {
 		/* Configure the context for physical memory monitoring */
-		damon_set_paddr_primitives(ctx);
+		debugfs_set_paddr_primitives(ctx);
 		/* target id is meaningless here, but we set it just for fun */
 		scnprintf(kbuf, count, "42    ");
 	} else {
 		/* Configure the context for virtual memory monitoring */
-		damon_set_vaddr_primitives(ctx);
+		debugfs_set_vaddr_primitives(ctx);
+
 		if (!strncmp(kbuf, "pidfd ", 6)) {
 			received_pidfds = true;
 			nrs = &kbuf[6];
@@ -398,16 +553,76 @@ static ssize_t debugfs_record_read(struct file *file,
 		char __user *buf, size_t count, loff_t *ppos)
 {
 	struct damon_ctx *ctx = file->private_data;
+	struct debugfs_recorder *rec = ctx->private;
 	char record_buf[20 + MAX_RFILE_PATH_LEN];
 	int ret;
 
 	mutex_lock(&ctx->kdamond_lock);
 	ret = scnprintf(record_buf, ARRAY_SIZE(record_buf), "%u %s\n",
-			ctx->rbuf_len, ctx->rfile_path);
+			rec->rbuf_len, rec->rfile_path);
 	mutex_unlock(&ctx->kdamond_lock);
 	return simple_read_from_buffer(buf, count, ppos, record_buf, ret);
 }
 
+/*
+ * debugfs_set_recording() - Set attributes for the recording.
+ * @ctx:	target kdamond context
+ * @rbuf_len:	length of the result buffer
+ * @rfile_path:	path to the monitor result files
+ *
+ * Setting 'rbuf_len' 0 disables recording.
+ *
+ * This function should not be called while the kdamond is running.
+ *
+ * Return: 0 on success, negative error code otherwise.
+ */
+static int debugfs_set_recording(struct damon_ctx *ctx,
+			unsigned int rbuf_len, char *rfile_path)
+{
+	struct debugfs_recorder *recorder;
+	size_t rfile_path_len;
+
+	if (rbuf_len && (rbuf_len > MAX_RECORD_BUFFER_LEN ||
+			rbuf_len < MIN_RECORD_BUFFER_LEN)) {
+		pr_err("result buffer size (%u) is out of [%d,%d]\n",
+				rbuf_len, MIN_RECORD_BUFFER_LEN,
+				MAX_RECORD_BUFFER_LEN);
+		return -EINVAL;
+	}
+	rfile_path_len = strnlen(rfile_path, MAX_RFILE_PATH_LEN);
+	if (rfile_path_len >= MAX_RFILE_PATH_LEN) {
+		pr_err("too long (>%d) result file path %s\n",
+				MAX_RFILE_PATH_LEN, rfile_path);
+		return -EINVAL;
+	}
+
+	recorder = ctx->private;
+	if (!recorder) {
+		recorder = kzalloc(sizeof(*recorder), GFP_KERNEL);
+		if (!recorder)
+			return -ENOMEM;
+		ctx->private = recorder;
+	}
+
+	recorder->rbuf_len = rbuf_len;
+	kfree(recorder->rbuf);
+	recorder->rbuf = NULL;
+	kfree(recorder->rfile_path);
+	recorder->rfile_path = NULL;
+
+	if (rbuf_len) {
+		recorder->rbuf = kvmalloc(rbuf_len, GFP_KERNEL);
+		if (!recorder->rbuf)
+			return -ENOMEM;
+	}
+	recorder->rfile_path = kmalloc(rfile_path_len + 1, GFP_KERNEL);
+	if (!recorder->rfile_path)
+		return -ENOMEM;
+	strncpy(recorder->rfile_path, rfile_path, rfile_path_len + 1);
+
+	return 0;
+}
+
 static ssize_t debugfs_record_write(struct file *file,
 		const char __user *buf, size_t count, loff_t *ppos)
 {
@@ -434,7 +649,7 @@ static ssize_t debugfs_record_write(struct file *file,
 		goto unlock_out;
 	}
 
-	err = damon_set_recording(ctx, rbuf_len, rfile_path);
+	err = debugfs_set_recording(ctx, rbuf_len, rfile_path);
 	if (err)
 		ret = err;
 unlock_out:
@@ -654,6 +869,38 @@ static struct dentry **debugfs_dirs;
 
 static int debugfs_fill_ctx_dir(struct dentry *dir, struct damon_ctx *ctx);
 
+static void debugfs_free_recorder(struct debugfs_recorder *recorder)
+{
+	kfree(recorder->rbuf);
+	kfree(recorder->rfile_path);
+	kfree(recorder);
+}
+
+static struct damon_ctx *debugfs_new_ctx(void)
+{
+	struct damon_ctx *ctx;
+
+	ctx = damon_new_ctx();
+	if (!ctx)
+		return NULL;
+
+	if (debugfs_set_recording(ctx, 0, "none")) {
+		damon_destroy_ctx(ctx);
+		return NULL;
+	}
+
+	debugfs_set_vaddr_primitives(ctx);
+	ctx->aggregate_cb = debugfs_aggregate_cb;
+	return ctx;
+}
+
+static void debugfs_destroy_ctx(struct damon_ctx *ctx)
+{
+	debugfs_free_recorder(ctx->private);
+	damon_destroy_ctx(ctx);
+}
+
+
 static ssize_t debugfs_nr_contexts_write(struct file *file,
 		const char __user *buf, size_t count, loff_t *ppos)
 {
@@ -689,7 +936,7 @@ static ssize_t debugfs_nr_contexts_write(struct file *file,
 
 	for (i = nr_contexts; i < debugfs_nr_ctxs; i++) {
 		debugfs_remove(debugfs_dirs[i]);
-		damon_destroy_ctx(debugfs_ctxs[i]);
+		debugfs_destroy_ctx(debugfs_ctxs[i]);
 	}
 
 	new_dirs = kmalloc_array(nr_contexts, sizeof(*new_dirs), GFP_KERNEL);
@@ -729,13 +976,13 @@ static ssize_t debugfs_nr_contexts_write(struct file *file,
 			break;
 		}
 
-		debugfs_ctxs[i] = damon_new_ctx();
+		debugfs_ctxs[i] = debugfs_new_ctx();
 		if (!debugfs_ctxs[i]) {
 			pr_err("ctx for %s creation failed\n", dirname);
+			debugfs_remove(debugfs_dirs[i]);
 			ret = -ENOMEM;
 			break;
 		}
-		damon_set_vaddr_primitives(debugfs_ctxs[i]);
 
 		if (debugfs_fill_ctx_dir(debugfs_dirs[i], debugfs_ctxs[i])) {
 			ret = -ENOMEM;
@@ -865,10 +1112,9 @@ static int __init damon_dbgfs_init(void)
 	int rc;
 
 	debugfs_ctxs = kmalloc(sizeof(*debugfs_ctxs), GFP_KERNEL);
-	debugfs_ctxs[0] = damon_new_ctx();
+	debugfs_ctxs[0] = debugfs_new_ctx();
 	if (!debugfs_ctxs[0])
 		return -ENOMEM;
-	damon_set_vaddr_primitives(debugfs_ctxs[0]);
 
 	rc = damon_debugfs_init();
 	if (rc)
-- 
2.17.1



* [RFC PATCH 4/5] mm/page_idle: Avoid interferences from concurrent users
From: SeongJae Park @ 2020-09-28 10:35 UTC
  To: akpm
  Cc: SeongJae Park, Jonathan.Cameron, aarcange, acme,
	alexander.shishkin, amit, benh, brendan.d.gregg, brendanhiggins,
	cai, colin.king, corbet, david, dwmw, elver, fan.du, foersleo,
	gthelen, irogers, jolsa, kirill, mark.rutland, mgorman, minchan,
	mingo, namhyung, peterz, rdunlap, riel, rientjes, rostedt, rppt,
	sblbir, shakeelb, shuah, sj38.park, snu, vbabka, vdavydov.dev,
	yang.shi, ying.huang, zgf574564920, linux-damon, linux-mm,
	linux-doc, linux-kernel

From: SeongJae Park <sjpark@amazon.de>

Concurrent Idle Page Tracking users can interfere with each other because
the interface doesn't provide a central rule for synchronization between
the users.  Users could implement their own synchronization rules, but
even in that case, applications developed by different users would not
know how to synchronize with one another.  To help with this situation,
this commit introduces a centralized synchronization infrastructure for
Idle Page Tracking.

In detail, this commit introduces a mutex lock for Idle Page Tracking,
called 'page_idle_lock'.  It is exposed to user space via a new bool
sysfs file, '/sys/kernel/mm/page_idle/lock'.  By writing to and reading
from the file, users can hold/release the mutex and read its status.
Writes to the Idle Page Tracking 'bitmap' file fail if the lock is not
held, while reads of the file can be done regardless of the lock status.

Note that users could still interfere with each other if they ignore this
locking rule.  Nevertheless, this change will let them notice the rule.
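
For illustration, a user-space program following the locking rule would look
roughly as below.  This is a minimal sketch of the protocol described above,
with error handling mostly trimmed:

    #include <fcntl.h>
    #include <unistd.h>

    int main(void)
    {
            int fd = open("/sys/kernel/mm/page_idle/lock", O_WRONLY);

            /* acquire; fails (e.g., -EBUSY) if the lock is already held */
            if (fd < 0 || pwrite(fd, "1", 1, 0) != 1)
                    return 1;

            /* ... safely access /sys/kernel/mm/page_idle/bitmap here ... */

            pwrite(fd, "0", 1, 0);	/* release */
            close(fd);
            return 0;
    }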

Signed-off-by: SeongJae Park <sjpark@amazon.de>
---
 .../admin-guide/mm/idle_page_tracking.rst     | 22 +++++++---
 mm/page_idle.c                                | 40 +++++++++++++++++++
 2 files changed, 56 insertions(+), 6 deletions(-)

diff --git a/Documentation/admin-guide/mm/idle_page_tracking.rst b/Documentation/admin-guide/mm/idle_page_tracking.rst
index df9394fb39c2..3f5e7a8b5b78 100644
--- a/Documentation/admin-guide/mm/idle_page_tracking.rst
+++ b/Documentation/admin-guide/mm/idle_page_tracking.rst
@@ -21,13 +21,13 @@ User API
 ========
 
 The idle page tracking API is located at ``/sys/kernel/mm/page_idle``.
-Currently, it consists of the only read-write file,
-``/sys/kernel/mm/page_idle/bitmap``.
+Currently, it consists of two read-write files,
+``/sys/kernel/mm/page_idle/bitmap`` and ``/sys/kernel/mm/page_idle/lock``.
 
-The file implements a bitmap where each bit corresponds to a memory page. The
-bitmap is represented by an array of 8-byte integers, and the page at PFN #i is
-mapped to bit #i%64 of array element #i/64, byte order is native. When a bit is
-set, the corresponding page is idle.
+The ``bitmap`` file implements a bitmap where each bit corresponds to a memory
+page. The bitmap is represented by an array of 8-byte integers, and the page at
+PFN #i is mapped to bit #i%64 of array element #i/64, byte order is native.
+When a bit is set, the corresponding page is idle.
 
 A page is considered idle if it has not been accessed since it was marked idle
 (for more details on what "accessed" actually means see the :ref:`Implementation
@@ -74,6 +74,16 @@ See :ref:`Documentation/admin-guide/mm/pagemap.rst <pagemap>` for more
 information about ``/proc/pid/pagemap``, ``/proc/kpageflags``, and
 ``/proc/kpagecgroup``.
 
+The ``lock`` file is for avoiding interference from concurrent users.  If
+the content of the ``lock`` file is ``1``, it means the ``bitmap`` file is
+currently being used by someone.  While the content of the ``lock`` file is
+``1``, writing ``1`` to the file fails.  Therefore, users should first
+successfully write ``1`` to the ``lock`` file before starting to use the
+``bitmap`` file, and write ``0`` to the ``lock`` file after they have
+finished using the ``bitmap`` file.  If a user writes to the ``bitmap`` file
+while the ``lock`` is ``0``, the write fails.  Meanwhile, reads of the
+``bitmap`` file succeed regardless of the ``lock`` status.
+
 .. _impl_details:
 
 Implementation Details
diff --git a/mm/page_idle.c b/mm/page_idle.c
index 144fb4ed961d..0aa45f848570 100644
--- a/mm/page_idle.c
+++ b/mm/page_idle.c
@@ -16,6 +16,8 @@
 #define BITMAP_CHUNK_SIZE	sizeof(u64)
 #define BITMAP_CHUNK_BITS	(BITMAP_CHUNK_SIZE * BITS_PER_BYTE)
 
+static DEFINE_MUTEX(page_idle_lock);
+
 /*
  * Idle page tracking only considers user memory pages, for other types of
  * pages the idle flag is always unset and an attempt to set it is silently
@@ -169,6 +171,9 @@ static ssize_t page_idle_bitmap_write(struct file *file, struct kobject *kobj,
 	unsigned long pfn, end_pfn;
 	int bit;
 
+	if (!mutex_is_locked(&page_idle_lock))
+		return -EPERM;
+
 	if (pos % BITMAP_CHUNK_SIZE || count % BITMAP_CHUNK_SIZE)
 		return -EINVAL;
 
@@ -197,17 +202,52 @@ static ssize_t page_idle_bitmap_write(struct file *file, struct kobject *kobj,
 	return (char *)in - buf;
 }
 
+static ssize_t page_idle_lock_show(struct kobject *kobj,
+		struct kobj_attribute *attr, char *buf)
+{
+	return sprintf(buf, "%d\n", mutex_is_locked(&page_idle_lock));
+}
+
+static ssize_t page_idle_lock_store(struct kobject *kobj,
+		struct kobj_attribute *attr, const char *buf, size_t count)
+{
+	bool do_lock;
+	int ret;
+
+	ret = kstrtobool(buf, &do_lock);
+	if (ret < 0)
+		return ret;
+
+	if (do_lock) {
+		if (!mutex_trylock(&page_idle_lock))
+			return -EBUSY;
+	} else {
+		mutex_unlock(&page_idle_lock);
+	}
+
+	return count;
+}
+
 static struct bin_attribute page_idle_bitmap_attr =
 		__BIN_ATTR(bitmap, 0600,
 			   page_idle_bitmap_read, page_idle_bitmap_write, 0);
 
+static struct kobj_attribute page_idle_lock_attr =
+		__ATTR(lock, 0600, page_idle_lock_show, page_idle_lock_store);
+
 static struct bin_attribute *page_idle_bin_attrs[] = {
 	&page_idle_bitmap_attr,
 	NULL,
 };
 
+static struct attribute *page_idle_lock_attrs[] = {
+	&page_idle_lock_attr.attr,
+	NULL,
+};
+
 static const struct attribute_group page_idle_attr_group = {
 	.bin_attrs = page_idle_bin_attrs,
+	.attrs = page_idle_lock_attrs,
 	.name = "page_idle",
 };
 
-- 
2.17.1



* [RFC PATCH 5/5] mm/damon/primitives: Make coexistable with Idle Page Tracking
From: SeongJae Park @ 2020-09-28 10:35 UTC
  To: akpm
  Cc: SeongJae Park, Jonathan.Cameron, aarcange, acme,
	alexander.shishkin, amit, benh, brendan.d.gregg, brendanhiggins,
	cai, colin.king, corbet, david, dwmw, elver, fan.du, foersleo,
	gthelen, irogers, jolsa, kirill, mark.rutland, mgorman, minchan,
	mingo, namhyung, peterz, rdunlap, riel, rientjes, rostedt, rppt,
	sblbir, shakeelb, shuah, sj38.park, snu, vbabka, vdavydov.dev,
	yang.shi, ying.huang, zgf574564920, linux-damon, linux-mm,
	linux-doc, linux-kernel

From: SeongJae Park <sjpark@amazon.de>

DAMON's reference 'primitives' internally use the 'PG_idle' flag.  Because
the flag is also used by Idle Page Tracking and there was previously no
way to synchronize with it, the 'primitives' were configured to be
exclusive with Idle Page Tracking.  However, as we can now synchronize
with Idle Page Tracking using 'page_idle_lock', this commit makes the
primitives do the synchronization and thereby coexist with Idle Page
Tracking.

In more detail, the 'primitives' merely require their users to do the
synchronization by themselves.  The real synchronization is done by the
DAMON debugfs interface, which is the only user of the 'primitives' as of
now.
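
For illustration, the synchronization that a user of the 'primitives' should
do looks roughly as below.  This is a minimal sketch modeled after
debugfs_start_ctx_ptrs() in this patch; 'my_start_monitoring' is a
hypothetical name:

    /* Hold 'page_idle_lock' while monitoring; give up if it is taken */
    static int my_start_monitoring(struct damon_ctx **ctxs, int nr_ctxs)
    {
            int rc;

            /* Idle Page Tracking or another user is running */
            if (!mutex_trylock(&page_idle_lock))
                    return -EBUSY;

            rc = damon_start_ctx_ptrs(ctxs, nr_ctxs);
            if (rc)
                    mutex_unlock(&page_idle_lock);
            return rc;
    }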

Signed-off-by: SeongJae Park <sjpark@amazon.de>
---
 include/linux/damon.h     |  1 +
 include/linux/page_idle.h |  2 ++
 mm/damon/Kconfig          |  2 +-
 mm/damon/dbgfs.c          | 32 +++++++++++++++++++++++++++++++-
 mm/damon/primitives.c     | 16 +++++++++++++++-
 mm/page_idle.c            |  2 +-
 6 files changed, 51 insertions(+), 4 deletions(-)

diff --git a/include/linux/damon.h b/include/linux/damon.h
index 606e59f785a2..12200a1171a8 100644
--- a/include/linux/damon.h
+++ b/include/linux/damon.h
@@ -312,6 +312,7 @@ void kdamond_init_phys_regions(struct damon_ctx *ctx);
 void kdamond_update_phys_regions(struct damon_ctx *ctx);
 void kdamond_prepare_phys_access_checks(struct damon_ctx *ctx);
 unsigned int kdamond_check_phys_accesses(struct damon_ctx *ctx);
+bool kdamond_phys_target_valid(struct damon_target *t);
 void damon_set_paddr_primitives(struct damon_ctx *ctx);
 
 #endif	/* CONFIG_DAMON_PRIMITIVES */
diff --git a/include/linux/page_idle.h b/include/linux/page_idle.h
index d8a6aecf99cb..bcbb965b566c 100644
--- a/include/linux/page_idle.h
+++ b/include/linux/page_idle.h
@@ -8,6 +8,8 @@
 
 #ifdef CONFIG_PAGE_IDLE_FLAG
 
+extern struct mutex page_idle_lock;
+
 #ifdef CONFIG_64BIT
 static inline bool page_is_young(struct page *page)
 {
diff --git a/mm/damon/Kconfig b/mm/damon/Kconfig
index 8b3f3dd3bd32..64d69a239408 100644
--- a/mm/damon/Kconfig
+++ b/mm/damon/Kconfig
@@ -26,7 +26,7 @@ config DAMON_KUNIT_TEST
 
 config DAMON_PRIMITIVES
 	bool "DAMON primitives for virtual/physical address spaces monitoring"
-	depends on DAMON && MMU && !IDLE_PAGE_TRACKING
+	depends on DAMON && MMU
 	select PAGE_EXTENSION if !64BIT
 	select PAGE_IDLE_FLAG
 	help
diff --git a/mm/damon/dbgfs.c b/mm/damon/dbgfs.c
index 7a6c279690f8..ce12e92e1667 100644
--- a/mm/damon/dbgfs.c
+++ b/mm/damon/dbgfs.c
@@ -12,6 +12,7 @@
 #include <linux/file.h>
 #include <linux/mm.h>
 #include <linux/module.h>
+#include <linux/page_idle.h>
 #include <linux/slab.h>
 
 #define MIN_RECORD_BUFFER_LEN	1024
@@ -28,6 +29,7 @@ struct debugfs_recorder {
 /* Monitoring contexts for debugfs interface users. */
 static struct damon_ctx **debugfs_ctxs;
 static int debugfs_nr_ctxs = 1;
+static int debugfs_nr_terminated_ctxs;
 
 static DEFINE_MUTEX(damon_dbgfs_lock);
 
@@ -106,9 +108,20 @@ static void debugfs_init_vm_regions(struct damon_ctx *ctx)
 	kdamond_init_vm_regions(ctx);
 }
 
+static void debugfs_unlock_page_idle_lock(void)
+{
+	mutex_lock(&damon_dbgfs_lock);
+	if (++debugfs_nr_terminated_ctxs == debugfs_nr_ctxs) {
+		debugfs_nr_terminated_ctxs = 0;
+		mutex_unlock(&page_idle_lock);
+	}
+	mutex_unlock(&damon_dbgfs_lock);
+}
+
 static void debugfs_vm_cleanup(struct damon_ctx *ctx)
 {
 	debugfs_flush_rbuffer(ctx->private);
+	debugfs_unlock_page_idle_lock();
 	kdamond_vm_cleanup(ctx);
 }
 
@@ -120,6 +133,8 @@ static void debugfs_init_phys_regions(struct damon_ctx *ctx)
 static void debugfs_phys_cleanup(struct damon_ctx *ctx)
 {
 	debugfs_flush_rbuffer(ctx->private);
+	debugfs_unlock_page_idle_lock();
+
 }
 
 /*
@@ -197,6 +212,21 @@ static char *user_input_str(const char __user *buf, size_t count, loff_t *ppos)
 	return kbuf;
 }
 
+static int debugfs_start_ctx_ptrs(struct damon_ctx **ctxs, int nr_ctxs)
+{
+	int rc;
+
+	if (!mutex_trylock(&page_idle_lock))
+		return -EBUSY;
+
+	rc = damon_start_ctx_ptrs(ctxs, nr_ctxs);
+	if (rc)
+		mutex_unlock(&page_idle_lock);
+
+	return rc;
+}
+
+
 static ssize_t debugfs_monitor_on_write(struct file *file,
 		const char __user *buf, size_t count, loff_t *ppos)
 {
@@ -212,7 +242,7 @@ static ssize_t debugfs_monitor_on_write(struct file *file,
 	if (sscanf(kbuf, "%s", kbuf) != 1)
 		return -EINVAL;
 	if (!strncmp(kbuf, "on", count))
-		err = damon_start_ctx_ptrs(debugfs_ctxs, debugfs_nr_ctxs);
+		err = debugfs_start_ctx_ptrs(debugfs_ctxs, debugfs_nr_ctxs);
 	else if (!strncmp(kbuf, "off", count))
 		err = damon_stop_ctx_ptrs(debugfs_ctxs, debugfs_nr_ctxs);
 	else
diff --git a/mm/damon/primitives.c b/mm/damon/primitives.c
index e762dc8a5f2e..442b41b79b82 100644
--- a/mm/damon/primitives.c
+++ b/mm/damon/primitives.c
@@ -30,6 +30,10 @@
 
 #include "damon.h"
 
+#ifndef CONFIG_IDLE_PAGE_TRACKING
+DEFINE_MUTEX(page_idle_lock);
+#endif
+
 /* Minimal region size.  Every damon_region is aligned by this. */
 #ifndef CONFIG_DAMON_KUNIT_TEST
 #define MIN_REGION PAGE_SIZE
@@ -776,6 +780,9 @@ bool kdamond_vm_target_valid(struct damon_target *t)
 {
 	struct task_struct *task;
 
+	if (!mutex_is_locked(&page_idle_lock))
+		return false;
+
 	task = damon_get_task_struct(t);
 	if (task) {
 		put_task_struct(task);
@@ -795,6 +802,13 @@ void kdamond_vm_cleanup(struct damon_ctx *ctx)
 	}
 }
 
+bool kdamond_phys_target_valid(struct damon_target *t)
+{
+	if (!mutex_is_locked(&page_idle_lock))
+		return false;
+	return true;
+}
+
 #ifndef CONFIG_ADVISE_SYSCALLS
 static int damos_madvise(struct damon_target *target, struct damon_region *r,
 			int behavior)
@@ -874,7 +888,7 @@ void damon_set_paddr_primitives(struct damon_ctx *ctx)
 	ctx->update_target_regions = kdamond_update_phys_regions;
 	ctx->prepare_access_checks = kdamond_prepare_phys_access_checks;
 	ctx->check_accesses = kdamond_check_phys_accesses;
-	ctx->target_valid = NULL;
+	ctx->target_valid = kdamond_phys_target_valid;
 	ctx->cleanup = NULL;
 	ctx->apply_scheme = NULL;
 }
diff --git a/mm/page_idle.c b/mm/page_idle.c
index 0aa45f848570..958dcc18f6cd 100644
--- a/mm/page_idle.c
+++ b/mm/page_idle.c
@@ -16,7 +16,7 @@
 #define BITMAP_CHUNK_SIZE	sizeof(u64)
 #define BITMAP_CHUNK_BITS	(BITMAP_CHUNK_SIZE * BITS_PER_BYTE)
 
-static DEFINE_MUTEX(page_idle_lock);
+DEFINE_MUTEX(page_idle_lock);
 
 /*
  * Idle page tracking only considers user memory pages, for other types of
-- 
2.17.1


