Linux-mm Archive on lore.kernel.org
 help / color / Atom feed
From: <sjpark@amazon.com>
To: <akpm@linux-foundation.org>
Cc: SeongJae Park <sjpark@amazon.de>, <sj38.park@gmail.com>,
	<acme@kernel.org>, <amit@kernel.org>, <brendan.d.gregg@gmail.com>,
	<corbet@lwn.net>, <dwmw@amazon.com>, <mgorman@suse.de>,
	<rostedt@goodmis.org>, <kirill@shutemov.name>,
	<brendanhiggins@google.com>, <colin.king@canonical.com>,
	<minchan@kernel.org>, <vdavydov.dev@gmail.com>,
	<vdavydov@parallels.com>, <linux-mm@kvack.org>,
	<linux-doc@vger.kernel.org>, <linux-kernel@vger.kernel.org>
Subject: [PATCH v2 1/9] mm: Introduce Data Access MONitor (DAMON)
Date: Tue, 28 Jan 2020 09:57:34 +0100
Message-ID: <20200128085742.14566-2-sjpark@amazon.com> (raw)
In-Reply-To: <20200128085742.14566-1-sjpark@amazon.com>

From: SeongJae Park <sjpark@amazon.de>

This commit introduces a new kernel module named DAMON.  Note that this
commit is implementing only the stub for the module load/unload, basic
data structures, and simple manipulation functions of the structures to
keep the size of commit small.  The core mechanisms of DAMON will be
implemented one by one by following commits.

Brief Introduction
==================

DAMON is a kernel module that would allow users to monitor the actual
memory access pattern of specific user-space processes.  It aims to be
1) accurate enough to be useful for performance-centric domains, and 2)
sufficiently light-weight so that it can be applied online.

For the goals, DAMON will utilize its two core mechanisms, called
region-based sampling and adaptive regions adjustment.  The region-based
sampling allows users to make their own trade-off between the quality
and the overhead of the monitoring and set the upperbound of the
monitoring overhead.  Further, the adaptive regions adjustment mechanism
makes DAMON to maximize the quality and minimize the overhead with its
best efforts while preserving the users configured trade-off.

Please note that the term 'memory' in this context means 'main memory'.
It also assumes that it would usually utilizes the middle level speed
memory devices such as DRAMs or NVRAMs.  CPU caches or storage devices
are not DAMON's concern, as those are too fast or too slow to be in
DAMON's scope.

Expected Use-cases
==================

A straightforward usecase of DAMON would be the program behavior
analysis.  With the DAMON output, users can confirm whether the program
is running as intended or not.  This will be useful for debuggings and
tests of design points.

The monitored results can also be useful for counting the dynamic
working set size of workloads.  For the administration of memory
overcommitted systems or selection of the environments (e.g., containers
providing different amount of memory) for your workloads, this will be
useful.

If you are a programmer, you can optimize your program by managing the
memory based on the actual data access pattern.  For example, you can
identify the dynamic hotness of your data using DAMON and call
``mlock()`` to keep your hot data in DRAM, or call ``madvise()`` with
``MADV_PAGEOUT`` to proactively reclaim cold data.  Even though your
program is guaranteed to not encounter memory pressure, you can still
improve the performance by applying the DAMON outputs for call of
``MADV_HUGEPAGE`` and ``MADV_NOHUGEPAGE``.  More creative optimizations
would be possible.  Our evaluations of DAMON includes a straightforward
optimization using the ``mlock()``.  Please refer to the below
Evaluation section for more detail.

As DAMON incurs very low overhead, such optimizations can be applied not
only offline, but also online.  Also, there is no reason to limit such
optimizations to the user space.  Several parts of the kernel's memory
management mechanisms could be also optimized using DAMON. The
reclamation, the THP (de)promotion decisions, and the compaction would
be such a candidates.  Nevertheless, current version of DAMON is not
highly optimized for the online/in-kernel uses.

Signed-off-by: SeongJae Park <sjpark@amazon.de>
---
 MAINTAINERS |   6 ++
 mm/Kconfig  |  12 +++
 mm/Makefile |   1 +
 mm/damon.c  | 223 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 242 insertions(+)
 create mode 100644 mm/damon.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 56765f542244..5a4db07cad33 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -4611,6 +4611,12 @@ F:	net/ax25/ax25_out.c
 F:	net/ax25/ax25_timer.c
 F:	net/ax25/sysctl_net_ax25.c
 
+DATA ACCESS MONITOR
+M:	SeongJae Park <sjpark@amazon.de>
+L:	linux-mm@kvack.org
+S:	Maintained
+F:	mm/damon.c
+
 DAVICOM FAST ETHERNET (DMFE) NETWORK DRIVER
 L:	netdev@vger.kernel.org
 S:	Orphan
diff --git a/mm/Kconfig b/mm/Kconfig
index ab80933be65f..144fb916aa8b 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -739,4 +739,16 @@ config ARCH_HAS_HUGEPD
 config MAPPING_DIRTY_HELPERS
         bool
 
+config DAMON
+	tristate "Data Access Monitor"
+	depends on MMU
+	default y
+	help
+	  Provides data access monitoring.
+
+	  DAMON is a kernel module that allows users to monitor the actual
+	  memory access pattern of specific user-space processes.  It aims to
+	  be 1) accurate enough to be useful for performance-centric domains,
+	  and 2) sufficiently light-weight so that it can be applied online.
+
 endmenu
diff --git a/mm/Makefile b/mm/Makefile
index 1937cc251883..2911b3832c90 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -108,3 +108,4 @@ obj-$(CONFIG_ZONE_DEVICE) += memremap.o
 obj-$(CONFIG_HMM_MIRROR) += hmm.o
 obj-$(CONFIG_MEMFD_CREATE) += memfd.o
 obj-$(CONFIG_MAPPING_DIRTY_HELPERS) += mapping_dirty_helpers.o
+obj-$(CONFIG_DAMON) += damon.o
diff --git a/mm/damon.c b/mm/damon.c
new file mode 100644
index 000000000000..064ec1f6ded9
--- /dev/null
+++ b/mm/damon.c
@@ -0,0 +1,223 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Data Access Monitor
+ *
+ * Copyright 2019 Amazon.com, Inc. or its affiliates.  All rights reserved.
+ *
+ * Author: SeongJae Park <sjpark@amazon.de>
+ */
+
+#define pr_fmt(fmt) "damon: " fmt
+
+#include <linux/mm.h>
+#include <linux/module.h>
+#include <linux/random.h>
+#include <linux/slab.h>
+
+#define damon_get_task_struct(t) \
+	(get_pid_task(find_vpid(t->pid), PIDTYPE_PID))
+
+#define damon_next_region(r) \
+	(container_of(r->list.next, struct damon_region, list))
+
+#define damon_prev_region(r) \
+	(container_of(r->list.prev, struct damon_region, list))
+
+#define damon_for_each_region(r, t) \
+	list_for_each_entry(r, &t->regions_list, list)
+
+#define damon_for_each_region_safe(r, next, t) \
+	list_for_each_entry_safe(r, next, &t->regions_list, list)
+
+#define damon_for_each_task(t) \
+	list_for_each_entry(t, &damon_tasks_list, list)
+
+#define damon_for_each_task_safe(t, next) \
+	list_for_each_entry_safe(t, next, &damon_tasks_list, list)
+
+/* Represents a monitoring target region on the virtual address space */
+struct damon_region {
+	unsigned long vm_start;
+	unsigned long vm_end;
+	unsigned long sampling_addr;
+	unsigned int nr_accesses;
+	struct list_head list;
+};
+
+/* Represents a monitoring target task */
+struct damon_task {
+	unsigned long pid;
+	struct list_head regions_list;
+	struct list_head list;
+};
+
+/* List of damon_task objects */
+static LIST_HEAD(damon_tasks_list);
+
+static struct rnd_state rndseed;
+/* Get a random number in [l, r) */
+#define damon_rand(l, r) (l + prandom_u32_state(&rndseed) % (r - l))
+
+/*
+ * Construct a damon_region struct
+ *
+ * Returns the pointer to the new struct if success, or NULL otherwise
+ */
+static struct damon_region *damon_new_region(unsigned long vm_start,
+					unsigned long vm_end)
+{
+	struct damon_region *ret;
+
+	ret = kmalloc(sizeof(struct damon_region), GFP_KERNEL);
+	if (!ret)
+		return NULL;
+	ret->vm_start = vm_start;
+	ret->vm_end = vm_end;
+	ret->nr_accesses = 0;
+	ret->sampling_addr = damon_rand(vm_start, vm_end);
+	INIT_LIST_HEAD(&ret->list);
+
+	return ret;
+}
+
+/*
+ * Add a region between two other regions
+ */
+static inline void damon_add_region(struct damon_region *r,
+		struct damon_region *prev, struct damon_region *next)
+{
+	__list_add(&r->list, &prev->list, &next->list);
+}
+
+/*
+ * Append a region to a task's list of regions
+ */
+static void damon_add_region_tail(struct damon_region *r, struct damon_task *t)
+{
+	list_add_tail(&r->list, &t->regions_list);
+}
+
+/*
+ * Delete a region from its list
+ */
+static void damon_del_region(struct damon_region *r)
+{
+	list_del(&r->list);
+}
+
+/*
+ * De-allocate a region
+ */
+static void damon_free_region(struct damon_region *r)
+{
+	kfree(r);
+}
+
+static void damon_destroy_region(struct damon_region *r)
+{
+	damon_del_region(r);
+	damon_free_region(r);
+}
+
+/*
+ * Construct a damon_task struct
+ *
+ * Returns the pointer to the new struct if success, or NULL otherwise
+ */
+static struct damon_task *damon_new_task(unsigned long pid)
+{
+	struct damon_task *t;
+
+	t = kmalloc(sizeof(struct damon_task), GFP_KERNEL);
+	if (!t)
+		return NULL;
+	t->pid = pid;
+	INIT_LIST_HEAD(&t->regions_list);
+
+	return t;
+}
+
+/* Returns n-th damon_region of the given task */
+struct damon_region *damon_nth_region_of(struct damon_task *t, unsigned int n)
+{
+	struct damon_region *r;
+	unsigned int i;
+
+	i = 0;
+	damon_for_each_region(r, t) {
+		if (i++ == n)
+			return r;
+	}
+	return NULL;
+}
+
+static void damon_add_task_tail(struct damon_task *t)
+{
+	list_add_tail(&t->list, &damon_tasks_list);
+}
+
+static void damon_del_task(struct damon_task *t)
+{
+	list_del(&t->list);
+}
+
+static void damon_free_task(struct damon_task *t)
+{
+	struct damon_region *r, *next;
+
+	damon_for_each_region_safe(r, next, t)
+		damon_free_region(r);
+	kfree(t);
+}
+
+static void damon_destroy_task(struct damon_task *t)
+{
+	damon_del_task(t);
+	damon_free_task(t);
+}
+
+/*
+ * Returns number of monitoring target tasks
+ */
+static unsigned int nr_damon_tasks(void)
+{
+	struct damon_task *t;
+	unsigned int ret = 0;
+
+	damon_for_each_task(t)
+		ret++;
+	return ret;
+}
+
+/*
+ * Returns the number of target regions for a given target task
+ */
+static unsigned int nr_damon_regions(struct damon_task *t)
+{
+	struct damon_region *r;
+	unsigned int ret = 0;
+
+	damon_for_each_region(r, t)
+		ret++;
+	return ret;
+}
+
+static int __init damon_init(void)
+{
+	pr_info("init\n");
+
+	prandom_seed_state(&rndseed, 42);
+	return 0;
+}
+
+static void __exit damon_exit(void)
+{
+	pr_info("exit\n");
+}
+
+module_init(damon_init);
+module_exit(damon_exit);
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("SeongJae Park <sjpark@amazon.de>");
+MODULE_DESCRIPTION("DAMON: Data Access MONitor");
-- 
2.17.1



  reply index

Thread overview: 29+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-01-28  8:57 [PATCH v2 0/9] " sjpark
2020-01-28  8:57 ` sjpark [this message]
2020-01-28 16:06   ` [PATCH v2 1/9] mm: " Randy Dunlap
2020-01-28 16:09     ` sjpark
2020-01-30 23:58   ` Brendan Higgins
2020-01-31  4:38     ` SeongJae Park
2020-01-28  8:57 ` [PATCH v2 2/9] mm/damon: Implement region based sampling sjpark
2020-01-28  8:57 ` [PATCH v2 3/9] mm/damon: Adaptively adjust regions sjpark
2020-01-28  8:57 ` [PATCH v2 4/9] mm/damon: Apply dynamic memory mapping changes sjpark
2020-01-28  8:59 ` [PATCH v2 5/9] mm/damon: Add debugfs interface sjpark
2020-01-28  9:00 ` [PATCH v2 6/9] mm/damon: Add minimal user-space tools sjpark
2020-01-31  0:02   ` Brendan Higgins
2020-01-31  4:44     ` SeongJae Park
2020-02-01  8:52       ` SeongJae Park
2020-01-28  9:00 ` [PATCH v2 7/9] Documentation/admin-guide/mm: Add a document for DAMON sjpark
2020-01-28  9:01 ` [PATCH v2 8/9] mm/damon: Add kunit tests sjpark
2020-01-31  0:14   ` Brendan Higgins
2020-01-31  4:55     ` SeongJae Park
2020-01-28  9:01 ` [PATCH v2 9/9] mm/damon: Add a tracepoint for result buffer writing sjpark
2020-01-28 10:20 ` [PATCH v2 0/9] Introduce Data Access MONitor (DAMON) Qian Cai
2020-01-28 10:49   ` sjpark
2020-01-28 11:20     ` Qian Cai
2020-01-28 12:00       ` sjpark
2020-01-29 12:56         ` Peter Zijlstra
2020-01-29 14:37           ` sjpark
2020-01-29 18:07             ` Peter Zijlstra
2020-01-29 19:06               ` SeongJae Park
2020-01-29 19:36                 ` Peter Zijlstra
2020-01-29 19:59                   ` SeongJae Park

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200128085742.14566-2-sjpark@amazon.com \
    --to=sjpark@amazon.com \
    --cc=acme@kernel.org \
    --cc=akpm@linux-foundation.org \
    --cc=amit@kernel.org \
    --cc=brendan.d.gregg@gmail.com \
    --cc=brendanhiggins@google.com \
    --cc=colin.king@canonical.com \
    --cc=corbet@lwn.net \
    --cc=dwmw@amazon.com \
    --cc=kirill@shutemov.name \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mgorman@suse.de \
    --cc=minchan@kernel.org \
    --cc=rostedt@goodmis.org \
    --cc=sj38.park@gmail.com \
    --cc=sjpark@amazon.de \
    --cc=vdavydov.dev@gmail.com \
    --cc=vdavydov@parallels.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Linux-mm Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-mm/0 linux-mm/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-mm linux-mm/ https://lore.kernel.org/linux-mm \
		linux-mm@kvack.org
	public-inbox-index linux-mm

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kvack.linux-mm


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git