* [RFC v5 1/7] mm/madvise: Export do_madvise() to external GPL modules
2020-03-30 11:50 [RFC v5 0/7] Implement Data Access Monitoring-based Memory Operation Schemes SeongJae Park
@ 2020-03-30 11:50 ` SeongJae Park
2020-03-30 11:50 ` [RFC v5 2/7] mm/damon: Account age of target regions SeongJae Park
` (6 subsequent siblings)
7 siblings, 0 replies; 12+ messages in thread
From: SeongJae Park @ 2020-03-30 11:50 UTC (permalink / raw)
To: akpm
Cc: SeongJae Park, Jonathan.Cameron, aarcange, acme,
alexander.shishkin, amit, brendan.d.gregg, brendanhiggins, cai,
colin.king, corbet, dwmw, jolsa, kirill, mark.rutland, mgorman,
minchan, mingo, namhyung, peterz, rdunlap, riel, rientjes,
rostedt, shakeelb, shuah, sj38.park, vbabka, vdavydov.dev,
yang.shi, ying.huang, linux-mm, linux-doc, linux-kernel
From: SeongJae Park <sjpark@amazon.de>
Signed-off-by: SeongJae Park <sjpark@amazon.de>
---
mm/madvise.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/mm/madvise.c b/mm/madvise.c
index 80f8a1839f70..151aaf285cdd 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -1151,6 +1151,7 @@ int do_madvise(struct task_struct *target_task, struct mm_struct *mm,
return error;
}
+EXPORT_SYMBOL_GPL(do_madvise);
SYSCALL_DEFINE3(madvise, unsigned long, start, size_t, len_in, int, behavior)
{
--
2.17.1
* [RFC v5 2/7] mm/damon: Account age of target regions
2020-03-30 11:50 [RFC v5 0/7] Implement Data Access Monitoring-based Memory Operation Schemes SeongJae Park
2020-03-30 11:50 ` [RFC v5 1/7] mm/madvise: Export do_madvise() to external GPL modules SeongJae Park
@ 2020-03-30 11:50 ` SeongJae Park
2020-03-30 11:50 ` [RFC v5 3/7] mm/damon: Implement data access monitoring-based operation schemes SeongJae Park
` (5 subsequent siblings)
7 siblings, 0 replies; 12+ messages in thread
From: SeongJae Park @ 2020-03-30 11:50 UTC (permalink / raw)
To: akpm
Cc: SeongJae Park, Jonathan.Cameron, aarcange, acme,
alexander.shishkin, amit, brendan.d.gregg, brendanhiggins, cai,
colin.king, corbet, dwmw, jolsa, kirill, mark.rutland, mgorman,
minchan, mingo, namhyung, peterz, rdunlap, riel, rientjes,
rostedt, shakeelb, shuah, sj38.park, vbabka, vdavydov.dev,
yang.shi, ying.huang, linux-mm, linux-doc, linux-kernel
From: SeongJae Park <sjpark@amazon.de>
DAMON can be used as a primitive for data access pattern aware memory
management optimizations. However, users who want such optimizations
should run DAMON, read the monitoring results, analyze them, plan a new
memory management scheme, and apply the new scheme by themselves. It
would not be too hard, but still requires some level of effort. For
complicated optimizations, this effort is inevitable.
That said, in many cases, users would simply want to apply an action to
a memory region of a specific size having a specific access frequency
for a specific time. For example, "page out a memory region larger than
100 MiB that has kept a low access frequency for more than 10 minutes",
or "use THP for a memory region larger than 2 MiB that has kept a high
access frequency for more than 2 seconds".
For such optimizations, users would need to account the age of each
region themselves. To reduce that effort, this commit implements simple
age accounting for each region in DAMON. For each aggregation step,
DAMON compares the access frequency and start/end addresses of each
region with those from the last aggregation, and resets the age of the
region if the change is significant. Otherwise, the age is incremented.
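The accounting rule can be sketched in user-space Python (an illustrative model of the kernel logic, not the kernel API; the dict field names are invented for the example):

```python
def count_age(region, nr_accesses_threshold):
    """Reset the age if the region changed significantly, else increment it.

    `region` is a dict loosely mirroring `struct damon_region`: current
    start/end addresses and nr_accesses, plus the values recorded at the
    last aggregation.
    """
    # A shape change bigger than 20% of the region size is "significant",
    # as is an access frequency change bigger than the given threshold.
    sz_threshold = (region['end'] - region['start']) // 5
    shape_diff = (abs(region['start'] - region['last_start']) +
                  abs(region['end'] - region['last_end']))
    freq_diff = abs(region['nr_accesses'] - region['last_nr_accesses'])
    if shape_diff > sz_threshold or freq_diff > nr_accesses_threshold:
        region['age'] = 0
    else:
        region['age'] += 1

# A region whose shape and access frequency are unchanged keeps aging.
stable = {'start': 0, 'end': 100, 'last_start': 0, 'last_end': 100,
          'nr_accesses': 5, 'last_nr_accesses': 5, 'age': 3}
count_age(stable, nr_accesses_threshold=2)
print(stable['age'])  # 4
```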
Signed-off-by: SeongJae Park <sjpark@amazon.de>
---
include/linux/damon.h | 5 ++
mm/damon.c | 105 ++++++++++++++++++++++++++++++++++++++++--
2 files changed, 106 insertions(+), 4 deletions(-)
diff --git a/include/linux/damon.h b/include/linux/damon.h
index 47fb0ec03030..49205c71c63d 100644
--- a/include/linux/damon.h
+++ b/include/linux/damon.h
@@ -22,6 +22,11 @@ struct damon_region {
unsigned long sampling_addr;
unsigned int nr_accesses;
struct list_head list;
+
+ unsigned int age;
+ unsigned long last_vm_start;
+ unsigned long last_vm_end;
+ unsigned int last_nr_accesses;
};
/* Represents a monitoring target task */
diff --git a/mm/damon.c b/mm/damon.c
index 4ca8a822c30c..3eeb729f3947 100644
--- a/mm/damon.c
+++ b/mm/damon.c
@@ -79,6 +79,10 @@ static struct damon_region *damon_new_region(struct damon_ctx *ctx,
region->sampling_addr = damon_rand(ctx, vm_start, vm_end);
INIT_LIST_HEAD(&region->list);
+ region->age = 0;
+ region->last_vm_start = vm_start;
+ region->last_vm_end = vm_end;
+
return region;
}
@@ -613,11 +617,44 @@ static void kdamond_reset_aggregated(struct damon_ctx *c)
sizeof(r->nr_accesses));
trace_damon_aggregated(t->pid, nr,
r->vm_start, r->vm_end, r->nr_accesses);
+ r->last_nr_accesses = r->nr_accesses;
r->nr_accesses = 0;
}
}
}
+#define diff_of(a, b) (a > b ? a - b : b - a)
+
+/*
+ * Increase or reset the age of the given monitoring target region
+ *
+ * If the area or '->nr_accesses' has changed significantly, reset the '->age'.
+ * Else, increase the age.
+ */
+static void damon_do_count_age(struct damon_region *r, unsigned int threshold)
+{
+ unsigned long sz_threshold = (r->vm_end - r->vm_start) / 5;
+
+ if (diff_of(r->vm_start, r->last_vm_start) +
+ diff_of(r->vm_end, r->last_vm_end) > sz_threshold)
+ r->age = 0;
+ else if (diff_of(r->nr_accesses, r->last_nr_accesses) > threshold)
+ r->age = 0;
+ else
+ r->age++;
+}
+
+static void kdamond_count_age(struct damon_ctx *c, unsigned int threshold)
+{
+ struct damon_task *t;
+ struct damon_region *r;
+
+ damon_for_each_task(c, t) {
+ damon_for_each_region(r, t)
+ damon_do_count_age(r, threshold);
+ }
+}
+
#define sz_damon_region(r) (r->vm_end - r->vm_start)
/*
@@ -626,33 +663,86 @@ static void kdamond_reset_aggregated(struct damon_ctx *c)
static void damon_merge_two_regions(struct damon_region *l,
struct damon_region *r)
{
- l->nr_accesses = (l->nr_accesses * sz_damon_region(l) +
- r->nr_accesses * sz_damon_region(r)) /
- (sz_damon_region(l) + sz_damon_region(r));
+ unsigned long sz_l = sz_damon_region(l), sz_r = sz_damon_region(r);
+
+ l->nr_accesses = (l->nr_accesses * sz_l + r->nr_accesses * sz_r) /
+ (sz_l + sz_r);
+ l->age = (l->age * sz_l + r->age * sz_r) / (sz_l + sz_r);
l->vm_end = r->vm_end;
damon_destroy_region(r);
}
-#define diff_of(a, b) (a > b ? a - b : b - a)
+static inline void set_last_area(struct damon_region *r, struct region *last)
+{
+ r->last_vm_start = last->start;
+ r->last_vm_end = last->end;
+}
+
+static inline void get_last_area(struct damon_region *r, struct region *last)
+{
+ last->start = r->last_vm_start;
+ last->end = r->last_vm_end;
+}
/*
* Merge adjacent regions having similar access frequencies
*
* t task that merge operation will make change
* thres merge regions having '->nr_accesses' diff smaller than this
+ *
+ * After each merge, the biggest mergee region becomes the last shape of the
+ * new region. If two regions split from one region at the end of the
+ * previous aggregation interval are merged into one region, we handle the
+ * two regions as one big mergee, because handling them separately can lead
+ * to an improper last shape record.
+ *
+ * To understand why regions split from one region need special care,
+ * suppose a region of size 10 has split into two regions of size 4 and 6.
+ * The two regions show similar access frequencies for the next aggregation
+ * interval and thus are now merged into one region again. Because the split
+ * is made regardless of the access pattern, DAMON should say the region of
+ * size 10 had no area change during the last aggregation interval.
+ * However, if the two mergees are handled separately, DAMON will say the
+ * merged region has changed its size from 6 to 10.
*/
static void damon_merge_regions_of(struct damon_task *t, unsigned int thres)
{
struct damon_region *r, *prev = NULL, *next;
+ struct region biggest_mergee; /* the biggest region being merged */
+ unsigned long sz_biggest = 0; /* size of the biggest_mergee */
+ unsigned long sz_mergee = 0; /* size of current mergee */
damon_for_each_region_safe(r, next, t) {
if (!prev || prev->vm_end != r->vm_start ||
diff_of(prev->nr_accesses, r->nr_accesses) > thres) {
+ if (sz_biggest)
+ set_last_area(prev, &biggest_mergee);
+
prev = r;
+ sz_biggest = sz_damon_region(prev);
+ get_last_area(prev, &biggest_mergee);
continue;
}
+
+ /* Set size of current mergee and biggest mergee */
+ sz_mergee += sz_damon_region(r);
+ if (sz_mergee > sz_biggest) {
+ sz_biggest = sz_mergee;
+ get_last_area(r, &biggest_mergee);
+ }
+
+ /*
+ * If next region and current region is not originated from
+ * same region, initialize the size of mergee.
+ */
+ if (r->last_vm_start != next->last_vm_start)
+ sz_mergee = 0;
+
damon_merge_two_regions(prev, r);
}
+ if (sz_biggest)
+ set_last_area(prev, &biggest_mergee);
}
/*
@@ -685,6 +775,12 @@ static void damon_split_region_at(struct damon_ctx *ctx,
struct damon_region *new;
new = damon_new_region(ctx, r->vm_start + sz_r, r->vm_end);
+ new->age = r->age;
+ new->last_vm_start = r->vm_start;
+ new->last_nr_accesses = r->last_nr_accesses;
+
+ r->last_vm_start = r->vm_start;
+ r->last_vm_end = r->vm_end;
r->vm_end = new->vm_start;
damon_insert_region(new, r, damon_next_region(r));
@@ -874,6 +970,7 @@ static int kdamond_fn(void *data)
if (kdamond_aggregate_interval_passed(ctx)) {
kdamond_merge_regions(ctx, max_nr_accesses / 10);
+ kdamond_count_age(ctx, max_nr_accesses / 10);
if (ctx->aggregate_cb)
ctx->aggregate_cb(ctx);
kdamond_reset_aggregated(ctx);
--
2.17.1
* [RFC v5 3/7] mm/damon: Implement data access monitoring-based operation schemes
2020-03-30 11:50 [RFC v5 0/7] Implement Data Access Monitoring-based Memory Operation Schemes SeongJae Park
2020-03-30 11:50 ` [RFC v5 1/7] mm/madvise: Export do_madvise() to external GPL modules SeongJae Park
2020-03-30 11:50 ` [RFC v5 2/7] mm/damon: Account age of target regions SeongJae Park
@ 2020-03-30 11:50 ` SeongJae Park
2020-03-30 11:50 ` [RFC v5 4/7] mm/damon/schemes: Implement a debugfs interface SeongJae Park
` (4 subsequent siblings)
7 siblings, 0 replies; 12+ messages in thread
From: SeongJae Park @ 2020-03-30 11:50 UTC (permalink / raw)
To: akpm
Cc: SeongJae Park, Jonathan.Cameron, aarcange, acme,
alexander.shishkin, amit, brendan.d.gregg, brendanhiggins, cai,
colin.king, corbet, dwmw, jolsa, kirill, mark.rutland, mgorman,
minchan, mingo, namhyung, peterz, rdunlap, riel, rientjes,
rostedt, shakeelb, shuah, sj38.park, vbabka, vdavydov.dev,
yang.shi, ying.huang, linux-mm, linux-doc, linux-kernel
From: SeongJae Park <sjpark@amazon.de>
In many cases, users might use DAMON for simple data access aware
memory management optimizations such as applying an operation scheme to
a memory region of a specific size having a specific access frequency
for a specific time. For example, "page out a memory region larger than
100 MiB that has kept a low access frequency for more than 10 minutes",
or "use THP for a memory region larger than 2 MiB that has kept a high
access frequency for more than 2 seconds".
To save users from spending their time implementing such simple data
access monitoring-based operation schemes, this commit makes DAMON
handle such schemes directly. With this commit, users can simply
specify their desired schemes to DAMON.
Each of the schemes is composed of conditions for filtering the target
memory regions and a desired memory management action for the targets.
Specifically, the format is::
<min/max size> <min/max access frequency> <min/max age> <action>
The filtering conditions are the size of the memory region, the number
of accesses to the region as monitored by DAMON, and the age of the
region. The age of a region is incremented periodically, but reset when
its addresses or access frequency has significantly changed or the
action of a scheme has been applied. For the action, the current
implementation supports only a few madvise() hints: ``MADV_WILLNEED``,
``MADV_COLD``, ``MADV_PAGEOUT``, ``MADV_HUGEPAGE``, and
``MADV_NOHUGEPAGE``.
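The region filtering can be modeled in Python (a hypothetical user-space sketch mirroring the in-kernel checks; a zero bound disables the corresponding check, and all names are illustrative):

```python
def scheme_applies(scheme, region):
    """Return True if `region` matches all of `scheme`'s filters.

    A zero bound disables the corresponding check, mirroring the
    `(s->min_x && ...)` tests in damon_do_apply_schemes().
    """
    sz = region['end'] - region['start']
    if scheme['min_sz'] and sz < scheme['min_sz']:
        return False
    if scheme['max_sz'] and sz > scheme['max_sz']:
        return False
    if scheme['min_acc'] and region['nr_accesses'] < scheme['min_acc']:
        return False
    if scheme['max_acc'] and region['nr_accesses'] > scheme['max_acc']:
        return False
    if scheme['min_age'] and region['age'] < scheme['min_age']:
        return False
    if scheme['max_age'] and region['age'] > scheme['max_age']:
        return False
    return True

# "page out regions larger than 100 MiB that kept a low access
#  frequency long enough" (age is counted in aggregation intervals).
pageout = {'min_sz': 100 << 20, 'max_sz': 0, 'min_acc': 0, 'max_acc': 1,
           'min_age': 600, 'max_age': 0}
cold_big = {'start': 0, 'end': 200 << 20, 'nr_accesses': 0, 'age': 700}
print(scheme_applies(pageout, cold_big))  # True
```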
Signed-off-by: SeongJae Park <sjpark@amazon.de>
---
include/linux/damon.h | 24 +++++++
mm/damon.c | 149 ++++++++++++++++++++++++++++++++++++++++++
2 files changed, 173 insertions(+)
diff --git a/include/linux/damon.h b/include/linux/damon.h
index 49205c71c63d..b0fa898ed6d8 100644
--- a/include/linux/damon.h
+++ b/include/linux/damon.h
@@ -36,6 +36,27 @@ struct damon_task {
struct list_head list;
};
+/* Data Access Monitoring-based Operation Scheme */
+enum damos_action {
+ DAMOS_WILLNEED,
+ DAMOS_COLD,
+ DAMOS_PAGEOUT,
+ DAMOS_HUGEPAGE,
+ DAMOS_NOHUGEPAGE,
+ DAMOS_ACTION_LEN,
+};
+
+struct damos {
+ unsigned int min_sz_region;
+ unsigned int max_sz_region;
+ unsigned int min_nr_accesses;
+ unsigned int max_nr_accesses;
+ unsigned int min_age_region;
+ unsigned int max_age_region;
+ enum damos_action action;
+ struct list_head list;
+};
+
/*
* For each 'sample_interval', DAMON checks whether each region is accessed or
* not. It aggregates and keeps the access information (number of accesses to
@@ -66,6 +87,7 @@ struct damon_ctx {
struct rnd_state rndseed;
struct list_head tasks_list; /* 'damon_task' objects */
+ struct list_head schemes_list; /* 'damos' objects */
/* callbacks */
void (*sample_cb)(struct damon_ctx *context);
@@ -76,6 +98,8 @@ int damon_set_pids(struct damon_ctx *ctx, unsigned long *pids, ssize_t nr_pids);
int damon_set_attrs(struct damon_ctx *ctx, unsigned long sample_int,
unsigned long aggr_int, unsigned long regions_update_int,
unsigned long min_nr_reg, unsigned long max_nr_reg);
+int damon_set_schemes(struct damon_ctx *ctx,
+ struct damos **schemes, ssize_t nr_schemes);
int damon_set_recording(struct damon_ctx *ctx,
unsigned int rbuf_len, char *rfile_path);
int damon_start(struct damon_ctx *ctx);
diff --git a/mm/damon.c b/mm/damon.c
index 3eeb729f3947..933d484451d1 100644
--- a/mm/damon.c
+++ b/mm/damon.c
@@ -11,6 +11,7 @@
#define CREATE_TRACE_POINTS
+#include <asm-generic/mman-common.h>
#include <linux/damon.h>
#include <linux/debugfs.h>
#include <linux/delay.h>
@@ -45,6 +46,12 @@
#define damon_for_each_task_safe(ctx, t, next) \
list_for_each_entry_safe(t, next, &(ctx)->tasks_list, list)
+#define damon_for_each_schemes(ctx, r) \
+ list_for_each_entry(r, &(ctx)->schemes_list, list)
+
+#define damon_for_each_schemes_safe(ctx, s, next) \
+ list_for_each_entry_safe(s, next, &(ctx)->schemes_list, list)
+
#define MAX_RFILE_PATH_LEN 256
/* Get a random number in [l, r) */
@@ -174,6 +181,27 @@ static void damon_destroy_task(struct damon_task *t)
damon_free_task(t);
}
+static void damon_add_scheme(struct damon_ctx *ctx, struct damos *s)
+{
+ list_add_tail(&s->list, &ctx->schemes_list);
+}
+
+static void damon_del_scheme(struct damos *s)
+{
+ list_del(&s->list);
+}
+
+static void damon_free_scheme(struct damos *s)
+{
+ kfree(s);
+}
+
+static void damon_destroy_scheme(struct damos *s)
+{
+ damon_del_scheme(s);
+ damon_free_scheme(s);
+}
+
static unsigned int nr_damon_tasks(struct damon_ctx *ctx)
{
struct damon_task *t;
@@ -655,6 +683,101 @@ static void kdamond_count_age(struct damon_ctx *c, unsigned int threshold)
}
}
+#ifndef CONFIG_ADVISE_SYSCALLS
+static int damos_madvise(struct damon_task *task, struct damon_region *r,
+ int behavior)
+{
+ return -EINVAL;
+}
+#else
+static int damos_madvise(struct damon_task *task, struct damon_region *r,
+ int behavior)
+{
+ struct task_struct *t;
+ struct mm_struct *mm;
+ int ret = -ENOMEM;
+
+ t = damon_get_task_struct(task);
+ if (!t)
+ goto out;
+ mm = damon_get_mm(task);
+ if (!mm)
+ goto put_task_out;
+
+ ret = do_madvise(t, mm, PAGE_ALIGN(r->vm_start),
+ PAGE_ALIGN(r->vm_end - r->vm_start), behavior);
+ mmput(mm);
+put_task_out:
+ put_task_struct(t);
+out:
+ return ret;
+}
+#endif /* CONFIG_ADVISE_SYSCALLS */
+
+static int damos_do_action(struct damon_task *task, struct damon_region *r,
+ enum damos_action action)
+{
+ int madv_action;
+
+ switch (action) {
+ case DAMOS_WILLNEED:
+ madv_action = MADV_WILLNEED;
+ break;
+ case DAMOS_COLD:
+ madv_action = MADV_COLD;
+ break;
+ case DAMOS_PAGEOUT:
+ madv_action = MADV_PAGEOUT;
+ break;
+ case DAMOS_HUGEPAGE:
+ madv_action = MADV_HUGEPAGE;
+ break;
+ case DAMOS_NOHUGEPAGE:
+ madv_action = MADV_NOHUGEPAGE;
+ break;
+ default:
+ pr_warn("Wrong action %d\n", action);
+ return -EINVAL;
+ }
+
+ return damos_madvise(task, r, madv_action);
+}
+
+static void damon_do_apply_schemes(struct damon_ctx *c, struct damon_task *t,
+ struct damon_region *r)
+{
+ struct damos *s;
+ unsigned long sz;
+
+ damon_for_each_schemes(c, s) {
+ sz = r->vm_end - r->vm_start;
+ if ((s->min_sz_region && sz < s->min_sz_region) ||
+ (s->max_sz_region && s->max_sz_region < sz))
+ continue;
+ if ((s->min_nr_accesses && r->nr_accesses < s->min_nr_accesses)
+ || (s->max_nr_accesses &&
+ s->max_nr_accesses < r->nr_accesses))
+ continue;
+ if ((s->min_age_region && r->age < s->min_age_region) ||
+ (s->max_age_region &&
+ s->max_age_region < r->age))
+ continue;
+ damos_do_action(t, r, s->action);
+ r->age = 0;
+ }
+}
+
+static void kdamond_apply_schemes(struct damon_ctx *c)
+{
+ struct damon_task *t;
+ struct damon_region *r;
+
+ damon_for_each_task(c, t) {
+ damon_for_each_region(r, t)
+ damon_do_apply_schemes(c, t, r);
+ }
+}
+
#define sz_damon_region(r) (r->vm_end - r->vm_start)
/*
@@ -973,6 +1096,7 @@ static int kdamond_fn(void *data)
kdamond_count_age(ctx, max_nr_accesses / 10);
if (ctx->aggregate_cb)
ctx->aggregate_cb(ctx);
+ kdamond_apply_schemes(ctx);
kdamond_reset_aggregated(ctx);
kdamond_split_regions(ctx);
}
@@ -1060,6 +1184,30 @@ int damon_stop(struct damon_ctx *ctx)
return damon_turn_kdamond(ctx, false);
}
+/*
+ * damon_set_schemes() - Set data access monitoring based operation schemes.
+ * @ctx: monitoring context
+ * @schemes: array of the schemes
+ * @nr_schemes: number of entries in @schemes
+ *
+ * This function should not be called while the kdamond of the context is
+ * running.
+ *
+ * Return: 0 if success, or negative error code otherwise.
+ */
+int damon_set_schemes(struct damon_ctx *ctx, struct damos **schemes,
+ ssize_t nr_schemes)
+{
+ struct damos *s, *next;
+ ssize_t i;
+
+ damon_for_each_schemes_safe(ctx, s, next)
+ damon_destroy_scheme(s);
+ for (i = 0; i < nr_schemes; i++)
+ damon_add_scheme(ctx, schemes[i]);
+ return 0;
+}
+
/*
* damon_set_pids() - Set monitoring target processes.
* @ctx: monitoring context
@@ -1496,6 +1644,7 @@ static int __init damon_init_user_ctx(void)
prandom_seed_state(&ctx->rndseed, 42);
INIT_LIST_HEAD(&ctx->tasks_list);
+ INIT_LIST_HEAD(&ctx->schemes_list);
return 0;
}
--
2.17.1
* [RFC v5 4/7] mm/damon/schemes: Implement a debugfs interface
2020-03-30 11:50 [RFC v5 0/7] Implement Data Access Monitoring-based Memory Operation Schemes SeongJae Park
` (2 preceding siblings ...)
2020-03-30 11:50 ` [RFC v5 3/7] mm/damon: Implement data access monitoring-based operation schemes SeongJae Park
@ 2020-03-30 11:50 ` SeongJae Park
2020-03-30 11:50 ` [RFC v5 5/7] mm/damon-test: Add kunit test case for regions age accounting SeongJae Park
` (3 subsequent siblings)
7 siblings, 0 replies; 12+ messages in thread
From: SeongJae Park @ 2020-03-30 11:50 UTC (permalink / raw)
To: akpm
Cc: SeongJae Park, Jonathan.Cameron, aarcange, acme,
alexander.shishkin, amit, brendan.d.gregg, brendanhiggins, cai,
colin.king, corbet, dwmw, jolsa, kirill, mark.rutland, mgorman,
minchan, mingo, namhyung, peterz, rdunlap, riel, rientjes,
rostedt, shakeelb, shuah, sj38.park, vbabka, vdavydov.dev,
yang.shi, ying.huang, linux-mm, linux-doc, linux-kernel
From: SeongJae Park <sjpark@amazon.de>
This commit implements a debugfs interface for the data access
monitoring-based memory management schemes. It is supposed to be used
by administrators and/or privileged user space programs. Users can
read and update the rules using the ``<debugfs>/damon/schemes`` file.
The format is::
<min/max size> <min/max access frequency> <min/max age> <action>
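Parsing one line of this format can be sketched in Python (illustrative only; the kernel does this with sscanf() in str_to_schemes(), and the field names here are invented):

```python
def parse_scheme_line(line):
    """Parse 'min_sz max_sz min_acc max_acc min_age max_age action'."""
    fields = line.split()
    if len(fields) != 7:
        raise ValueError('expected 7 fields, got %d' % len(fields))
    *limits, action = [int(f) for f in fields]
    if not 0 <= action <= 4:  # DAMOS_WILLNEED .. DAMOS_NOHUGEPAGE
        raise ValueError('wrong action %d' % action)
    keys = ('min_sz', 'max_sz', 'min_acc', 'max_acc', 'min_age', 'max_age')
    return dict(zip(keys, limits), action=action)

print(parse_scheme_line('1 2 3 4 5 6 3'))
# {'min_sz': 1, 'max_sz': 2, 'min_acc': 3, 'max_acc': 4,
#  'min_age': 5, 'max_age': 6, 'action': 3}
```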
Signed-off-by: SeongJae Park <sjpark@amazon.de>
---
mm/damon.c | 174 ++++++++++++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 172 insertions(+), 2 deletions(-)
diff --git a/mm/damon.c b/mm/damon.c
index 933d484451d1..1f4d7e345f4c 100644
--- a/mm/damon.c
+++ b/mm/damon.c
@@ -181,6 +181,29 @@ static void damon_destroy_task(struct damon_task *t)
damon_free_task(t);
}
+static struct damos *damon_new_scheme(
+ unsigned int min_sz_region, unsigned int max_sz_region,
+ unsigned int min_nr_accesses, unsigned int max_nr_accesses,
+ unsigned int min_age_region, unsigned int max_age_region,
+ enum damos_action action)
+{
+ struct damos *scheme;
+
+ scheme = kmalloc(sizeof(*scheme), GFP_KERNEL);
+ if (!scheme)
+ return NULL;
+ scheme->min_sz_region = min_sz_region;
+ scheme->max_sz_region = max_sz_region;
+ scheme->min_nr_accesses = min_nr_accesses;
+ scheme->max_nr_accesses = max_nr_accesses;
+ scheme->min_age_region = min_age_region;
+ scheme->max_age_region = max_age_region;
+ scheme->action = action;
+ INIT_LIST_HEAD(&scheme->list);
+
+ return scheme;
+}
+
static void damon_add_scheme(struct damon_ctx *ctx, struct damos *s)
{
list_add_tail(&s->list, &ctx->schemes_list);
@@ -1362,6 +1385,147 @@ static ssize_t debugfs_monitor_on_write(struct file *file,
return ret;
}
+static ssize_t sprint_schemes(struct damon_ctx *c, char *buf, ssize_t len)
+{
+ struct damos *s;
+ int written = 0;
+ int rc;
+
+ damon_for_each_schemes(c, s) {
+ rc = snprintf(&buf[written], len - written,
+ "%u %u %u %u %u %u %d\n",
+ s->min_sz_region, s->max_sz_region,
+ s->min_nr_accesses, s->max_nr_accesses,
+ s->min_age_region, s->max_age_region,
+ s->action);
+ if (!rc)
+ return -ENOMEM;
+
+ written += rc;
+ }
+ return written;
+}
+
+static ssize_t debugfs_schemes_read(struct file *file, char __user *buf,
+ size_t count, loff_t *ppos)
+{
+ struct damon_ctx *ctx = &damon_user_ctx;
+ char *kbuf;
+ ssize_t len;
+
+ kbuf = kmalloc(count, GFP_KERNEL);
+ if (!kbuf)
+ return -ENOMEM;
+
+ len = sprint_schemes(ctx, kbuf, count);
+ if (len < 0)
+ goto out;
+ len = simple_read_from_buffer(buf, count, ppos, kbuf, len);
+
+out:
+ kfree(kbuf);
+ return len;
+}
+
+static void free_schemes_arr(struct damos **schemes, ssize_t nr_schemes)
+{
+ ssize_t i;
+
+ for (i = 0; i < nr_schemes; i++)
+ kfree(schemes[i]);
+ kfree(schemes);
+}
+
+/*
+ * Converts a string into an array of struct damos pointers
+ *
+ * Returns an array of the converted struct damos pointers if the
+ * conversion succeeds, or NULL otherwise.
+ */
+static struct damos **str_to_schemes(const char *str, ssize_t len,
+ ssize_t *nr_schemes)
+{
+ struct damos *scheme, **schemes;
+ const int max_nr_schemes = 256;
+ int pos = 0, parsed, ret;
+ unsigned int min_sz, max_sz, min_nr_a, max_nr_a, min_age, max_age;
+ int action;
+
+ schemes = kmalloc_array(max_nr_schemes, sizeof(scheme),
+ GFP_KERNEL);
+ if (!schemes)
+ return NULL;
+
+ *nr_schemes = 0;
+ while (pos < len && *nr_schemes < max_nr_schemes) {
+ ret = sscanf(&str[pos], "%u %u %u %u %u %u %d%n",
+ &min_sz, &max_sz, &min_nr_a, &max_nr_a,
+ &min_age, &max_age, &action, &parsed);
+ if (ret != 7)
+ break;
+ if (action >= DAMOS_ACTION_LEN) {
+ pr_err("wrong action %d\n", action);
+ goto fail;
+ }
+
+ pos += parsed;
+ scheme = damon_new_scheme(min_sz, max_sz, min_nr_a, max_nr_a,
+ min_age, max_age, action);
+ if (!scheme)
+ goto fail;
+
+ schemes[*nr_schemes] = scheme;
+ *nr_schemes += 1;
+ }
+ if (!*nr_schemes)
+ goto fail;
+ return schemes;
+fail:
+ free_schemes_arr(schemes, *nr_schemes);
+ return NULL;
+}
+
+static ssize_t debugfs_schemes_write(struct file *file, const char __user *buf,
+ size_t count, loff_t *ppos)
+{
+ struct damon_ctx *ctx = &damon_user_ctx;
+ char *kbuf;
+ struct damos **schemes;
+ ssize_t nr_schemes = 0, ret;
+ int err;
+
+ if (*ppos)
+ return -EINVAL;
+
+ kbuf = kmalloc(count, GFP_KERNEL);
+ if (!kbuf)
+ return -ENOMEM;
+
+ ret = simple_write_to_buffer(kbuf, count, ppos, buf, count);
+ if (ret < 0)
+ goto out;
+
+ schemes = str_to_schemes(kbuf, ret, &nr_schemes);
+
+ mutex_lock(&ctx->kdamond_lock);
+ if (ctx->kdamond) {
+ ret = -EBUSY;
+ goto unlock_out;
+ }
+
+ err = damon_set_schemes(ctx, schemes, nr_schemes);
+ if (err)
+ ret = err;
+ else
+ nr_schemes = 0;
+unlock_out:
+ mutex_unlock(&ctx->kdamond_lock);
+ free_schemes_arr(schemes, nr_schemes);
+out:
+ kfree(kbuf);
+ return ret;
+}
+
static ssize_t damon_sprint_pids(struct damon_ctx *ctx, char *buf, ssize_t len)
{
struct damon_task *t;
@@ -1588,6 +1752,12 @@ static const struct file_operations pids_fops = {
.write = debugfs_pids_write,
};
+static const struct file_operations schemes_fops = {
+ .owner = THIS_MODULE,
+ .read = debugfs_schemes_read,
+ .write = debugfs_schemes_write,
+};
+
static const struct file_operations record_fops = {
.owner = THIS_MODULE,
.read = debugfs_record_read,
@@ -1604,10 +1774,10 @@ static struct dentry *debugfs_root;
static int __init damon_debugfs_init(void)
{
- const char * const file_names[] = {"attrs", "record",
+ const char * const file_names[] = {"attrs", "record", "schemes",
"pids", "monitor_on"};
const struct file_operations *fops[] = {&attrs_fops, &record_fops,
- &pids_fops, &monitor_on_fops};
+ &schemes_fops, &pids_fops, &monitor_on_fops};
int i;
debugfs_root = debugfs_create_dir("damon", NULL);
--
2.17.1
* [RFC v5 5/7] mm/damon-test: Add kunit test case for regions age accounting
2020-03-30 11:50 [RFC v5 0/7] Implement Data Access Monitoring-based Memory Operation Schemes SeongJae Park
` (3 preceding siblings ...)
2020-03-30 11:50 ` [RFC v5 4/7] mm/damon/schemes: Implement a debugfs interface SeongJae Park
@ 2020-03-30 11:50 ` SeongJae Park
2020-03-30 11:50 ` [RFC v5 6/7] mm/damon/selftests: Add 'schemes' debugfs tests SeongJae Park
` (2 subsequent siblings)
7 siblings, 0 replies; 12+ messages in thread
From: SeongJae Park @ 2020-03-30 11:50 UTC (permalink / raw)
To: akpm
Cc: SeongJae Park, Jonathan.Cameron, aarcange, acme,
alexander.shishkin, amit, brendan.d.gregg, brendanhiggins, cai,
colin.king, corbet, dwmw, jolsa, kirill, mark.rutland, mgorman,
minchan, mingo, namhyung, peterz, rdunlap, riel, rientjes,
rostedt, shakeelb, shuah, sj38.park, vbabka, vdavydov.dev,
yang.shi, ying.huang, linux-mm, linux-doc, linux-kernel
From: SeongJae Park <sjpark@amazon.de>
After merges of regions, each region should properly know its last
shape, to measure the changes from the last modification and reset the
age if the changes are significant. This commit adds KUnit test cases
checking whether regions properly know their last shapes after merges
of regions.
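The size-weighted merging that these tests exercise can be sketched in Python (an illustrative model, not the kernel code; the last-shape bookkeeping is omitted for brevity):

```python
def merge(l, r):
    """Merge adjacent region `r` into `l`, size-weighting the
    nr_accesses and age, as damon_merge_two_regions() does."""
    sz_l, sz_r = l['end'] - l['start'], r['end'] - r['start']
    l['nr_accesses'] = (l['nr_accesses'] * sz_l +
                        r['nr_accesses'] * sz_r) // (sz_l + sz_r)
    l['age'] = (l['age'] * sz_l + r['age'] * sz_r) // (sz_l + sz_r)
    l['end'] = r['end']  # the merged region covers both address ranges

a = {'start': 0, 'end': 100, 'nr_accesses': 10, 'age': 4}
b = {'start': 100, 'end': 300, 'nr_accesses': 4, 'age': 1}
merge(a, b)
print(a['nr_accesses'], a['age'])  # 6 2
```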
Signed-off-by: SeongJae Park <sjpark@amazon.de>
Reviewed-by: Brendan Higgins <brendanhiggins@google.com>
---
mm/damon-test.h | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/mm/damon-test.h b/mm/damon-test.h
index 498c637b78ff..133de6c70c37 100644
--- a/mm/damon-test.h
+++ b/mm/damon-test.h
@@ -540,6 +540,8 @@ static void damon_test_merge_regions_of(struct kunit *test)
unsigned long saddrs[] = {0, 114, 130, 156, 170};
unsigned long eaddrs[] = {112, 130, 156, 170, 230};
+ unsigned long lsa[] = {0, 114, 130, 156, 184};
+ unsigned long lea[] = {100, 122, 156, 170, 230};
int i;
t = damon_new_task(42);
@@ -556,6 +558,9 @@ static void damon_test_merge_regions_of(struct kunit *test)
r = damon_nth_region_of(t, i);
KUNIT_EXPECT_EQ(test, r->vm_start, saddrs[i]);
KUNIT_EXPECT_EQ(test, r->vm_end, eaddrs[i]);
+ KUNIT_EXPECT_EQ(test, r->last_vm_start, lsa[i]);
+ KUNIT_EXPECT_EQ(test, r->last_vm_end, lea[i]);
+
}
damon_free_task(t);
}
--
2.17.1
* [RFC v5 6/7] mm/damon/selftests: Add 'schemes' debugfs tests
2020-03-30 11:50 [RFC v5 0/7] Implement Data Access Monitoring-based Memory Operation Schemes SeongJae Park
` (4 preceding siblings ...)
2020-03-30 11:50 ` [RFC v5 5/7] mm/damon-test: Add kunit test case for regions age accounting SeongJae Park
@ 2020-03-30 11:50 ` SeongJae Park
2020-03-30 11:50 ` [RFC v5 7/7] damon/tools: Support more human friendly 'schemes' control SeongJae Park
2020-03-31 15:51 ` [RFC v5 0/7] Implement Data Access Monitoring-based Memory Operation Schemes Jonathan Cameron
7 siblings, 0 replies; 12+ messages in thread
From: SeongJae Park @ 2020-03-30 11:50 UTC (permalink / raw)
To: akpm
Cc: SeongJae Park, Jonathan.Cameron, aarcange, acme,
alexander.shishkin, amit, brendan.d.gregg, brendanhiggins, cai,
colin.king, corbet, dwmw, jolsa, kirill, mark.rutland, mgorman,
minchan, mingo, namhyung, peterz, rdunlap, riel, rientjes,
rostedt, shakeelb, shuah, sj38.park, vbabka, vdavydov.dev,
yang.shi, ying.huang, linux-mm, linux-doc, linux-kernel
From: SeongJae Park <sjpark@amazon.de>
This commit adds simple selftests for the 'schemes' debugfs file of DAMON.
Signed-off-by: SeongJae Park <sjpark@amazon.de>
---
.../testing/selftests/damon/debugfs_attrs.sh | 29 +++++++++++++++++++
1 file changed, 29 insertions(+)
diff --git a/tools/testing/selftests/damon/debugfs_attrs.sh b/tools/testing/selftests/damon/debugfs_attrs.sh
index d5188b0f71b1..82a98c81975b 100755
--- a/tools/testing/selftests/damon/debugfs_attrs.sh
+++ b/tools/testing/selftests/damon/debugfs_attrs.sh
@@ -97,6 +97,35 @@ fi
echo $ORIG_CONTENT > $file
+# Test schemes file
+file="$DBGFS/schemes"
+
+ORIG_CONTENT=$(cat $file)
+echo "1 2 3 4 5 6 3" > $file
+if [ $? -ne 0 ]
+then
+ echo "$file write fail"
+ echo $ORIG_CONTENT > $file
+ exit 1
+fi
+
+echo "1 2
+3 4 5 6 3" > $file
+if [ $? -eq 0 ]
+then
+ echo "$file split write success (expected fail)"
+ echo $ORIG_CONTENT > $file
+ exit 1
+fi
+
+echo > $file
+if [ $? -ne 0 ]
+then
+ echo "$file empty string writing fail"
+ echo $ORIG_CONTENT > $file
+ exit 1
+fi
+
# Test pids file
file="$DBGFS/pids"
--
2.17.1
* [RFC v5 7/7] damon/tools: Support more human friendly 'schemes' control
2020-03-30 11:50 [RFC v5 0/7] Implement Data Access Monitoring-based Memory Operation Schemes SeongJae Park
` (5 preceding siblings ...)
2020-03-30 11:50 ` [RFC v5 6/7] mm/damon/selftests: Add 'schemes' debugfs tests SeongJae Park
@ 2020-03-30 11:50 ` SeongJae Park
2020-03-31 15:51 ` [RFC v5 0/7] Implement Data Access Monitoring-based Memory Operation Schemes Jonathan Cameron
7 siblings, 0 replies; 12+ messages in thread
From: SeongJae Park @ 2020-03-30 11:50 UTC (permalink / raw)
To: akpm
Cc: SeongJae Park, Jonathan.Cameron, aarcange, acme,
alexander.shishkin, amit, brendan.d.gregg, brendanhiggins, cai,
colin.king, corbet, dwmw, jolsa, kirill, mark.rutland, mgorman,
minchan, mingo, namhyung, peterz, rdunlap, riel, rientjes,
rostedt, shakeelb, shuah, sj38.park, vbabka, vdavydov.dev,
yang.shi, ying.huang, linux-mm, linux-doc, linux-kernel
From: SeongJae Park <sjpark@amazon.de>
This commit implements the 'schemes' subcommand of the DAMON userspace
tool. It can be used to describe and apply the data access
monitoring-based operation schemes in a more human-friendly fashion.
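The unit conversions the tool performs can be sketched as follows (an illustrative Python model of `_convert_damos.py`; 'null' maps to zero, which the kernel treats as "no limit"):

```python
# Size suffixes B/K/M/G/T and time suffixes us/ms/s/m/h/d, as accepted
# by the human-readable schemes format.
units_bytes = {'B': 1, 'K': 1 << 10, 'M': 1 << 20, 'G': 1 << 30,
               'T': 1 << 40}
units_usecs = {'us': 1, 'ms': 10**3, 's': 10**6, 'm': 60 * 10**6,
               'h': 3600 * 10**6, 'd': 24 * 3600 * 10**6}

def to_bytes(txt):
    return 0 if txt == 'null' else int(txt[:-1]) * units_bytes[txt[-1]]

def to_usecs(txt):
    if txt == 'null':
        return 0
    # Try the two-letter suffixes ('us', 'ms') before one-letter ones,
    # so '100ms' is not misread as 100 m(inutes) followed by 's'.
    for unit, mult in units_usecs.items():
        if txt.endswith(unit) and txt[:-len(unit)].isdigit():
            return int(txt[:-len(unit)]) * mult
    raise ValueError('cannot parse time: %s' % txt)

print(to_bytes('2M'))     # 2097152
print(to_usecs('100ms'))  # 100000
```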
Signed-off-by: SeongJae Park <sjpark@amazon.de>
---
tools/damon/_convert_damos.py | 125 +++++++++++++++++++++++++++++
tools/damon/_damon.py | 143 ++++++++++++++++++++++++++++++++++
tools/damon/damo | 7 ++
tools/damon/record.py | 135 +++-----------------------------
tools/damon/schemes.py | 105 +++++++++++++++++++++++++
5 files changed, 392 insertions(+), 123 deletions(-)
create mode 100755 tools/damon/_convert_damos.py
create mode 100644 tools/damon/_damon.py
create mode 100644 tools/damon/schemes.py
diff --git a/tools/damon/_convert_damos.py b/tools/damon/_convert_damos.py
new file mode 100755
index 000000000000..0f1e7e3d4ccc
--- /dev/null
+++ b/tools/damon/_convert_damos.py
@@ -0,0 +1,125 @@
+#!/usr/bin/env python3
+
+"""
+Change human readable data access monitoring-based operation schemes to the low
+level input for the '<debugfs>/damon/schemes' file. Below is an example of the
+schemes written in the human readable format:
+
+# format is: <min/max size> <min/max frequency (0-100)> <min/max age> <action>
+# lines starts with '#' or blank are ignored.
+# B/K/M/G/T for Bytes/KiB/MiB/GiB/TiB
+# us/ms/s/m/h/d for micro-seconds/milli-seconds/seconds/minutes/hours/days
+# 'null' means zero, which passes the check
+
+# if a region (regardless of its size) keeps a high access frequency for more
+# than 100ms, put the region on the head of the LRU list (call madvise() with
+# MADV_WILLNEED).
+null null 80 null 100ms null willneed
+
+# if a region keeps a low access frequency for more than 200ms, put the
+# region on the tail of the LRU list (call madvise() with MADV_COLD).
+0B 0B 10 20 200ms 1h cold
+
+# if a region keeps a very low access frequency for more than 100ms, swap
+# out the region immediately (call madvise() with MADV_PAGEOUT).
+0B null 0 10 100ms 2h pageout
+
+# if a region of a size bigger than 2MiB keeps a very high access frequency
+# for more than 100ms, let the region use huge pages (call madvise()
+# with MADV_HUGEPAGE).
+2M null 90 99 100ms 2h hugepage
+
+# if a region of a size bigger than 2MiB keeps a low access frequency
+# for more than 100ms, prevent the region from using huge pages (call
+# madvise() with MADV_NOHUGEPAGE).
+2M null 0 25 100ms 2h nohugepage
+"""
+
+import argparse
+
+unit_to_bytes = {'B': 1, 'K': 1024, 'M': 1024 * 1024, 'G': 1024 * 1024 * 1024,
+ 'T': 1024 * 1024 * 1024 * 1024}
+
+def text_to_bytes(txt):
+ if txt == 'null':
+ return 0
+ unit = txt[-1]
+ number = int(txt[:-1])
+ return number * unit_to_bytes[unit]
+
+unit_to_usecs = {'us': 1, 'ms': 1000, 's': 1000 * 1000, 'm': 60 * 1000 * 1000,
+ 'h': 60 * 60 * 1000 * 1000, 'd': 24 * 60 * 60 * 1000 * 1000}
+
+def text_to_us(txt):
+ if txt == 'null':
+ return 0
+ unit = txt[-2:]
+ if unit in ['us', 'ms']:
+ number = int(txt[:-2])
+ else:
+ unit = txt[-1]
+ number = int(txt[:-1])
+ return number * unit_to_usecs[unit]
+
+damos_action_to_int = {'DAMOS_WILLNEED': 0, 'DAMOS_COLD': 1,
+ 'DAMOS_PAGEOUT': 2, 'DAMOS_HUGEPAGE': 3, 'DAMOS_NOHUGEPAGE': 4}
+
+def text_to_damos_action(txt):
+ return damos_action_to_int['DAMOS_' + txt.upper()]
+
+def text_to_nr_accesses(txt, max_nr_accesses):
+ if txt == 'null':
+ return 0
+ return int(int(txt) * max_nr_accesses / 100)
+
+def debugfs_scheme(line, sample_interval, aggr_interval):
+ fields = line.split()
+ if len(fields) != 7:
+ print('wrong input line: %s' % line)
+ exit(1)
+
+ limit_nr_accesses = aggr_interval / sample_interval
+ try:
+ min_sz = text_to_bytes(fields[0])
+ max_sz = text_to_bytes(fields[1])
+ min_nr_accesses = text_to_nr_accesses(fields[2], limit_nr_accesses)
+ max_nr_accesses = text_to_nr_accesses(fields[3], limit_nr_accesses)
+ min_age = text_to_us(fields[4]) / aggr_interval
+ max_age = text_to_us(fields[5]) / aggr_interval
+ action = text_to_damos_action(fields[6])
+ except:
+ print('wrong input field')
+ raise
+ return '%d\t%d\t%d\t%d\t%d\t%d\t%d' % (min_sz, max_sz, min_nr_accesses,
+ max_nr_accesses, min_age, max_age, action)
+
+def convert(schemes_file, sample_interval, aggr_interval):
+ lines = []
+ with open(schemes_file, 'r') as f:
+ for line in f:
+ if line.startswith('#'):
+ continue
+ line = line.strip()
+ if line == '':
+ continue
+ lines.append(debugfs_scheme(line, sample_interval, aggr_interval))
+ return '\n'.join(lines)
+
+def main():
+ parser = argparse.ArgumentParser()
+ parser.add_argument('input', metavar='<file>',
+ help='input file describing the schemes')
+ parser.add_argument('-s', '--sample', metavar='<interval>', type=int,
+ default=5000, help='sampling interval (us)')
+ parser.add_argument('-a', '--aggr', metavar='<interval>', type=int,
+ default=100000, help='aggregation interval (us)')
+ args = parser.parse_args()
+
+ schemes_file = args.input
+ sample_interval = args.sample
+ aggr_interval = args.aggr
+
+ print(convert(schemes_file, sample_interval, aggr_interval))
+
+if __name__ == '__main__':
+ main()
diff --git a/tools/damon/_damon.py b/tools/damon/_damon.py
new file mode 100644
index 000000000000..0a703ec7471a
--- /dev/null
+++ b/tools/damon/_damon.py
@@ -0,0 +1,143 @@
+#!/usr/bin/env python3
+# SPDX-License-Identifier: GPL-2.0
+
+"""
+Contains core functions for DAMON debugfs control.
+"""
+
+import os
+import subprocess
+
+debugfs_attrs = None
+debugfs_record = None
+debugfs_schemes = None
+debugfs_pids = None
+debugfs_monitor_on = None
+
+def set_target_pid(pid):
+ return subprocess.call('echo %s > %s' % (pid, debugfs_pids), shell=True,
+ executable='/bin/bash')
+
+def turn_damon(on_off):
+ return subprocess.call("echo %s > %s" % (on_off, debugfs_monitor_on),
+ shell=True, executable="/bin/bash")
+
+def is_damon_running():
+ with open(debugfs_monitor_on, 'r') as f:
+ return f.read().strip() == 'on'
+
+class Attrs:
+ sample_interval = None
+ aggr_interval = None
+ regions_update_interval = None
+ min_nr_regions = None
+ max_nr_regions = None
+ rbuf_len = None
+ rfile_path = None
+ schemes = None
+
+ def __init__(self, s, a, r, n, x, l, f, c):
+ self.sample_interval = s
+ self.aggr_interval = a
+ self.regions_update_interval = r
+ self.min_nr_regions = n
+ self.max_nr_regions = x
+ self.rbuf_len = l
+ self.rfile_path = f
+ self.schemes = c
+
+ def __str__(self):
+ return "%s %s %s %s %s %s %s\n%s" % (self.sample_interval,
+ self.aggr_interval, self.regions_update_interval,
+ self.min_nr_regions, self.max_nr_regions, self.rbuf_len,
+ self.rfile_path, self.schemes)
+
+ def attr_str(self):
+ return "%s %s %s %s %s " % (self.sample_interval, self.aggr_interval,
+ self.regions_update_interval, self.min_nr_regions,
+ self.max_nr_regions)
+
+ def record_str(self):
+ return '%s %s ' % (self.rbuf_len, self.rfile_path)
+
+ def apply(self):
+ ret = subprocess.call('echo %s > %s' % (self.attr_str(), debugfs_attrs),
+ shell=True, executable='/bin/bash')
+ if ret:
+ return ret
+ ret = subprocess.call('echo %s > %s' % (self.record_str(),
+ debugfs_record), shell=True, executable='/bin/bash')
+ if ret:
+ return ret
+ return subprocess.call('echo %s > %s' % (
+ self.schemes.replace('\n', ' '), debugfs_schemes), shell=True,
+ executable='/bin/bash')
+
+def current_attrs():
+ with open(debugfs_attrs, 'r') as f:
+ attrs = f.read().split()
+ attrs = [int(x) for x in attrs]
+
+ with open(debugfs_record, 'r') as f:
+ rattrs = f.read().split()
+ attrs.append(int(rattrs[0]))
+ attrs.append(rattrs[1])
+
+ with open(debugfs_schemes, 'r') as f:
+ schemes = f.read()
+ attrs.append(schemes)
+
+ return Attrs(*attrs)
+
+def chk_update_debugfs(debugfs):
+ global debugfs_attrs
+ global debugfs_record
+ global debugfs_schemes
+ global debugfs_pids
+ global debugfs_monitor_on
+
+ debugfs_damon = os.path.join(debugfs, 'damon')
+ debugfs_attrs = os.path.join(debugfs_damon, 'attrs')
+ debugfs_record = os.path.join(debugfs_damon, 'record')
+ debugfs_schemes = os.path.join(debugfs_damon, 'schemes')
+ debugfs_pids = os.path.join(debugfs_damon, 'pids')
+ debugfs_monitor_on = os.path.join(debugfs_damon, 'monitor_on')
+
+ if not os.path.isdir(debugfs_damon):
+ print("damon debugfs dir (%s) not found" % debugfs_damon)
+ exit(1)
+
+ for f in [debugfs_attrs, debugfs_record, debugfs_schemes, debugfs_pids,
+ debugfs_monitor_on]:
+ if not os.path.isfile(f):
+ print("damon debugfs file (%s) not found" % f)
+ exit(1)
+
+def cmd_args_to_attrs(args):
+ "Generate attributes with specified arguments"
+ sample_interval = args.sample
+ aggr_interval = args.aggr
+ regions_update_interval = args.updr
+ min_nr_regions = args.minr
+ max_nr_regions = args.maxr
+ rbuf_len = args.rbuf
+ if not os.path.isabs(args.out):
+ args.out = os.path.join(os.getcwd(), args.out)
+ rfile_path = args.out
+ schemes = args.schemes
+ return Attrs(sample_interval, aggr_interval, regions_update_interval,
+ min_nr_regions, max_nr_regions, rbuf_len, rfile_path, schemes)
+
+def set_attrs_argparser(parser):
+ parser.add_argument('-d', '--debugfs', metavar='<debugfs>', type=str,
+ default='/sys/kernel/debug', help='debugfs mounted path')
+ parser.add_argument('-s', '--sample', metavar='<interval>', type=int,
+ default=5000, help='sampling interval')
+ parser.add_argument('-a', '--aggr', metavar='<interval>', type=int,
+ default=100000, help='aggregate interval')
+ parser.add_argument('-u', '--updr', metavar='<interval>', type=int,
+ default=1000000, help='regions update interval')
+ parser.add_argument('-n', '--minr', metavar='<# regions>', type=int,
+ default=10, help='minimal number of regions')
+ parser.add_argument('-m', '--maxr', metavar='<# regions>', type=int,
+ default=1000, help='maximum number of regions')
diff --git a/tools/damon/damo b/tools/damon/damo
index 58e1099ae5fc..ce7180069bef 100755
--- a/tools/damon/damo
+++ b/tools/damon/damo
@@ -5,6 +5,7 @@ import argparse
import record
import report
+import schemes
class SubCmdHelpFormatter(argparse.RawDescriptionHelpFormatter):
def _format_action(self, action):
@@ -25,6 +26,10 @@ parser_record = subparser.add_parser('record',
help='record data accesses of the given target processes')
record.set_argparser(parser_record)
+parser_schemes = subparser.add_parser('schemes',
+ help='apply operation schemes to the given target process')
+schemes.set_argparser(parser_schemes)
+
parser_report = subparser.add_parser('report',
help='report the recorded data accesses in the specified form')
report.set_argparser(parser_report)
@@ -33,5 +38,7 @@ args = parser.parse_args()
if args.command == 'record':
record.main(args)
+elif args.command == 'schemes':
+ schemes.main(args)
elif args.command == 'report':
report.main(args)
diff --git a/tools/damon/record.py b/tools/damon/record.py
index a547d479a103..3bbf7b8359da 100644
--- a/tools/damon/record.py
+++ b/tools/damon/record.py
@@ -6,28 +6,12 @@ Record data access patterns of the target process.
"""
import argparse
-import copy
import os
import signal
import subprocess
import time
-debugfs_attrs = None
-debugfs_record = None
-debugfs_pids = None
-debugfs_monitor_on = None
-
-def set_target_pid(pid):
- return subprocess.call('echo %s > %s' % (pid, debugfs_pids), shell=True,
- executable='/bin/bash')
-
-def turn_damon(on_off):
- return subprocess.call("echo %s > %s" % (on_off, debugfs_monitor_on),
- shell=True, executable="/bin/bash")
-
-def is_damon_running():
- with open(debugfs_monitor_on, 'r') as f:
- return f.read().strip() == 'on'
+import _damon
def do_record(target, is_target_cmd, attrs, old_attrs):
if os.path.isfile(attrs.rfile_path):
@@ -36,93 +20,29 @@ def do_record(target, is_target_cmd, attrs, old_attrs):
if attrs.apply():
print('attributes (%s) failed to be applied' % attrs)
cleanup_exit(old_attrs, -1)
- print('# damon attrs: %s' % attrs)
+ print('# damon attrs: %s %s' % (attrs.attr_str(), attrs.record_str()))
if is_target_cmd:
p = subprocess.Popen(target, shell=True, executable='/bin/bash')
target = p.pid
- if set_target_pid(target):
+ if _damon.set_target_pid(target):
print('pid setting (%s) failed' % target)
cleanup_exit(old_attrs, -2)
- if turn_damon('on'):
+ if _damon.turn_damon('on'):
print('could not turn on damon' % target)
cleanup_exit(old_attrs, -3)
if is_target_cmd:
p.wait()
while True:
# damon will turn it off by itself if the target tasks are terminated.
- if not is_damon_running():
+ if not _damon.is_damon_running():
break
time.sleep(1)
cleanup_exit(old_attrs, 0)
-class Attrs:
- sample_interval = None
- aggr_interval = None
- regions_update_interval = None
- min_nr_regions = None
- max_nr_regions = None
- rbuf_len = None
- rfile_path = None
-
- def __init__(self, s, a, r, n, x, l, f):
- self.sample_interval = s
- self.aggr_interval = a
- self.regions_update_interval = r
- self.min_nr_regions = n
- self.max_nr_regions = x
- self.rbuf_len = l
- self.rfile_path = f
-
- def __str__(self):
- return "%s %s %s %s %s %s %s" % (self.sample_interval, self.aggr_interval,
- self.regions_update_interval, self.min_nr_regions,
- self.max_nr_regions, self.rbuf_len, self.rfile_path)
-
- def attr_str(self):
- return "%s %s %s %s %s " % (self.sample_interval, self.aggr_interval,
- self.regions_update_interval, self.min_nr_regions,
- self.max_nr_regions)
-
- def record_str(self):
- return '%s %s ' % (self.rbuf_len, self.rfile_path)
-
- def apply(self):
- ret = subprocess.call('echo %s > %s' % (self.attr_str(), debugfs_attrs),
- shell=True, executable='/bin/bash')
- if ret:
- return ret
- return subprocess.call('echo %s > %s' % (self.record_str(),
- debugfs_record), shell=True, executable='/bin/bash')
-
-def current_attrs():
- with open(debugfs_attrs, 'r') as f:
- attrs = f.read().split()
- attrs = [int(x) for x in attrs]
-
- with open(debugfs_record, 'r') as f:
- rattrs = f.read().split()
- attrs.append(int(rattrs[0]))
- attrs.append(rattrs[1])
- return Attrs(*attrs)
-
-def cmd_args_to_attrs(args):
- "Generate attributes with specified arguments"
- sample_interval = args.sample
- aggr_interval = args.aggr
- regions_update_interval = args.updr
- min_nr_regions = args.minr
- max_nr_regions = args.maxr
- rbuf_len = args.rbuf
- if not os.path.isabs(args.out):
- args.out = os.path.join(os.getcwd(), args.out)
- rfile_path = args.out
- return Attrs(sample_interval, aggr_interval, regions_update_interval,
- min_nr_regions, max_nr_regions, rbuf_len, rfile_path)
-
def cleanup_exit(orig_attrs, exit_code):
- if is_damon_running():
- if turn_damon('off'):
+ if _damon.is_damon_running():
+ if _damon.turn_damon('off'):
print('failed to turn damon off!')
if orig_attrs:
if orig_attrs.apply():
@@ -133,51 +53,19 @@ def sighandler(signum, frame):
print('\nsignal %s received' % signum)
cleanup_exit(orig_attrs, signum)
-def chk_update_debugfs(debugfs):
- global debugfs_attrs
- global debugfs_record
- global debugfs_pids
- global debugfs_monitor_on
-
- debugfs_damon = os.path.join(debugfs, 'damon')
- debugfs_attrs = os.path.join(debugfs_damon, 'attrs')
- debugfs_record = os.path.join(debugfs_damon, 'record')
- debugfs_pids = os.path.join(debugfs_damon, 'pids')
- debugfs_monitor_on = os.path.join(debugfs_damon, 'monitor_on')
-
- if not os.path.isdir(debugfs_damon):
- print("damon debugfs dir (%s) not found", debugfs_damon)
- exit(1)
-
- for f in [debugfs_attrs, debugfs_record, debugfs_pids, debugfs_monitor_on]:
- if not os.path.isfile(f):
- print("damon debugfs file (%s) not found" % f)
- exit(1)
-
def chk_permission():
if os.geteuid() != 0:
print("Run as root")
exit(1)
def set_argparser(parser):
+ _damon.set_attrs_argparser(parser)
parser.add_argument('target', type=str, metavar='<target>',
help='the target command or the pid to record')
- parser.add_argument('-s', '--sample', metavar='<interval>', type=int,
- default=5000, help='sampling interval')
- parser.add_argument('-a', '--aggr', metavar='<interval>', type=int,
- default=100000, help='aggregate interval')
- parser.add_argument('-u', '--updr', metavar='<interval>', type=int,
- default=1000000, help='regions update interval')
- parser.add_argument('-n', '--minr', metavar='<# regions>', type=int,
- default=10, help='minimal number of regions')
- parser.add_argument('-m', '--maxr', metavar='<# regions>', type=int,
- default=1000, help='maximum number of regions')
parser.add_argument('-l', '--rbuf', metavar='<len>', type=int,
default=1024*1024, help='length of record result buffer')
parser.add_argument('-o', '--out', metavar='<file path>', type=str,
default='damon.data', help='output file path')
- parser.add_argument('-d', '--debugfs', metavar='<debugfs>', type=str,
- default='/sys/kernel/debug', help='debugfs mounted path')
def main(args=None):
global orig_attrs
@@ -187,13 +75,14 @@ def main(args=None):
args = parser.parse_args()
chk_permission()
- chk_update_debugfs(args.debugfs)
+ _damon.chk_update_debugfs(args.debugfs)
signal.signal(signal.SIGINT, sighandler)
signal.signal(signal.SIGTERM, sighandler)
- orig_attrs = current_attrs()
+ orig_attrs = _damon.current_attrs()
- new_attrs = cmd_args_to_attrs(args)
+ args.schemes = ''
+ new_attrs = _damon.cmd_args_to_attrs(args)
target = args.target
target_fields = target.split()
diff --git a/tools/damon/schemes.py b/tools/damon/schemes.py
new file mode 100644
index 000000000000..408a73813234
--- /dev/null
+++ b/tools/damon/schemes.py
@@ -0,0 +1,105 @@
+#!/usr/bin/env python3
+# SPDX-License-Identifier: GPL-2.0
+
+"""
+Apply given operation schemes to the target process.
+"""
+
+import argparse
+import os
+import signal
+import subprocess
+import time
+
+import _convert_damos
+import _damon
+
+def run_damon(target, is_target_cmd, attrs, old_attrs):
+ if os.path.isfile(attrs.rfile_path):
+ os.rename(attrs.rfile_path, attrs.rfile_path + '.old')
+
+ if attrs.apply():
+ print('attributes (%s) failed to be applied' % attrs)
+ cleanup_exit(old_attrs, -1)
+ print('# damon attrs: %s %s' % (attrs.attr_str(), attrs.record_str()))
+ for line in attrs.schemes.split('\n'):
+ print('# scheme: %s' % line)
+ if is_target_cmd:
+ p = subprocess.Popen(target, shell=True, executable='/bin/bash')
+ target = p.pid
+ if _damon.set_target_pid(target):
+ print('pid setting (%s) failed' % target)
+ cleanup_exit(old_attrs, -2)
+ if _damon.turn_damon('on'):
+ print('could not turn damon on for target %s' % target)
+ cleanup_exit(old_attrs, -3)
+ if is_target_cmd:
+ p.wait()
+ while True:
+ # damon will turn it off by itself if the target tasks are terminated.
+ if not _damon.is_damon_running():
+ break
+ time.sleep(1)
+
+ cleanup_exit(old_attrs, 0)
+
+def cleanup_exit(orig_attrs, exit_code):
+ if _damon.is_damon_running():
+ if _damon.turn_damon('off'):
+ print('failed to turn damon off!')
+ if orig_attrs:
+ if orig_attrs.apply():
+ print('original attributes (%s) restoration failed!' % orig_attrs)
+ exit(exit_code)
+
+def sighandler(signum, frame):
+ print('\nsignal %s received' % signum)
+ cleanup_exit(orig_attrs, signum)
+
+def chk_permission():
+ if os.geteuid() != 0:
+ print("Run as root")
+ exit(1)
+
+def set_argparser(parser):
+ _damon.set_attrs_argparser(parser)
+ parser.add_argument('target', type=str, metavar='<target>',
+ help='the target command or the pid to record')
+ parser.add_argument('-c', '--schemes', metavar='<file>', type=str,
+ default='damon.schemes',
+ help='data access monitoring-based operation schemes')
+
+def main(args=None):
+ global orig_attrs
+ if not args:
+ parser = argparse.ArgumentParser()
+ set_argparser(parser)
+ args = parser.parse_args()
+
+ chk_permission()
+ _damon.chk_update_debugfs(args.debugfs)
+
+ signal.signal(signal.SIGINT, sighandler)
+ signal.signal(signal.SIGTERM, sighandler)
+ orig_attrs = _damon.current_attrs()
+
+ args.rbuf = 0
+ args.out = 'null'
+ args.schemes = _convert_damos.convert(args.schemes, args.sample, args.aggr)
+ new_attrs = _damon.cmd_args_to_attrs(args)
+ target = args.target
+
+ target_fields = target.split()
+ if not subprocess.call('which %s > /dev/null' % target_fields[0],
+ shell=True, executable='/bin/bash'):
+ run_damon(target, True, new_attrs, orig_attrs)
+ else:
+ try:
+ pid = int(target)
+ except:
+ print('target \'%s\' is neither a command, nor a pid' % target)
+ exit(1)
+ run_damon(target, False, new_attrs, orig_attrs)
+
+if __name__ == '__main__':
+ main()
--
2.17.1
^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [RFC v5 0/7] Implement Data Access Monitoring-based Memory Operation Schemes
2020-03-30 11:50 [RFC v5 0/7] Implement Data Access Monitoring-based Memory Operation Schemes SeongJae Park
` (6 preceding siblings ...)
2020-03-30 11:50 ` [RFC v5 7/7] damon/tools: Support more human friendly 'schemes' control SeongJae Park
@ 2020-03-31 15:51 ` Jonathan Cameron
2020-03-31 16:18 ` SeongJae Park
7 siblings, 1 reply; 12+ messages in thread
From: Jonathan Cameron @ 2020-03-31 15:51 UTC (permalink / raw)
To: SeongJae Park, alexander.shishkin, linux-mm
Cc: akpm, SeongJae Park, aarcange, acme, amit, brendan.d.gregg,
brendanhiggins, cai, colin.king, corbet, dwmw, jolsa, kirill,
mark.rutland, mgorman, minchan, mingo, namhyung, peterz, rdunlap,
riel, rientjes, rostedt, shakeelb, shuah, sj38.park, vbabka,
vdavydov.dev, yang.shi, ying.huang, linux-doc, linux-kernel
On Mon, 30 Mar 2020 13:50:35 +0200
SeongJae Park <sjpark@amazon.com> wrote:
> From: SeongJae Park <sjpark@amazon.de>
>
> DAMON[1] can be used as a primitive for data access-aware memory management
> optimizations. That said, users who want such optimizations must run DAMON,
> read the monitoring results, analyze them, plan a new memory management
> scheme, and apply the new scheme by themselves. Such efforts will be
> inevitable for some complicated optimizations.
>
> However, in many other cases, users simply want the system to apply a
> memory management action to a memory region of a specific size that has kept
> a specific access frequency for a specific time. For example, "page out a
> memory region larger than 100 MiB that has kept only rare accesses for more
> than 2 minutes", or "do not use THP for a memory region larger than 2 MiB
> that has been rarely accessed for more than 1 second".
>
> This RFC patchset makes DAMON handle such data access monitoring-based
> operation schemes. With this change, users can apply data access-aware
> optimizations by simply specifying their schemes to DAMON.
Hi SeongJae,
I'm wondering if I'm misreading the results below or a data handling mixup
occurred. See inline.
Thanks,
Jonathan
>
>
> Evaluations
> ===========
>
> Setup
> -----
>
> On my personal QEMU/KVM based virtual machine on an Intel i7 host machine
> running Ubuntu 18.04, I measure runtime and consumed system memory while
> running various realistic workloads with several configurations. I use 13 and
> 12 workloads in the PARSEC3[3] and SPLASH-2X[4] benchmark suites,
> respectively. I personally use my own wrapper scripts[5] for the setup and
> runs of the workloads.
> On top of this patchset, we also applied the DAMON-based operation schemes
> patchset[6] for this evaluation.
>
> Measurement
> ~~~~~~~~~~~
>
> For the measurement of the amount of consumed memory in system global scope, I
> drop caches before starting each of the workloads and monitor 'MemFree' in the
> '/proc/meminfo' file. To make results more stable, I repeat the runs 5 times
> and average results. You can get stdev, min, and max of the numbers among the
> repeated runs in appendix below.
>
> Configurations
> ~~~~~~~~~~~~~~
>
> The configurations I use are as below.
>
> orig: Linux v5.5 with 'madvise' THP policy
> rec: 'orig' plus DAMON running with record feature
> thp: same with 'orig', but use 'always' THP policy
> ethp: 'orig' plus a DAMON operation scheme[6], 'efficient THP'
> prcl: 'orig' plus a DAMON operation scheme, 'proactive reclaim[7]'
>
> I use 'rec' for measurement of DAMON overheads to target workloads and system
> memory. The remaining configs including 'thp', 'ethp', and 'prcl' are for
> measurement of DAMON monitoring accuracy.
>
> 'ethp' and 'prcl' are simple DAMON-based operation schemes developed as
> proofs of concept for DAMON. 'ethp' reduces the memory space waste of THP by
> using DAMON to decide promotions and demotions of huge pages, while 'prcl'
> is similar to the original work. They are implemented as below:
>
> # format: <min/max size> <min/max frequency (0-100)> <min/max age> <action>
> # ethp: Use huge pages if a region >2MB shows >5% access rate, use regular
> # pages if a region >2MB shows <5% access rate for >1 second
> 2M null 5 null null null hugepage
> 2M null null 5 1s null nohugepage
>
> # prcl: If a region >4KB shows <5% access rate for >5 seconds, page out.
> 4K null null 5 5s null pageout
>
> Note that both 'ethp' and 'prcl' are designed with only my straightforward
> intuition, since they serve only as proofs of concept for DAMON and its
> monitoring accuracy. In other words, they are not production-ready. For
> production use, they should be tuned further.
>
>
> [1] "Redis latency problems troubleshooting", https://redis.io/topics/latency
> [2] "Disable Transparent Huge Pages (THP)",
> https://docs.mongodb.com/manual/tutorial/transparent-huge-pages/
> [3] "The PARSEC Benchmark Suite", https://parsec.cs.princeton.edu/index.htm
> [4] "SPLASH-2x", https://parsec.cs.princeton.edu/parsec3-doc.htm#splash2x
> [5] "parsec3_on_ubuntu", https://github.com/sjp38/parsec3_on_ubuntu
> [6] "[RFC v4 0/7] Implement Data Access Monitoring-based Memory Operation
> Schemes",
> https://lore.kernel.org/linux-mm/20200303121406.20954-1-sjpark@amazon.com/
> [7] "Proactively reclaiming idle memory", https://lwn.net/Articles/787611/
>
>
> Results
> -------
>
> Below two tables show the measurement results. The runtimes are in seconds
> while the memory usages are in KiB. Each configurations except 'orig' shows
> its overhead relative to 'orig' in percent within parenthesises.
>
> runtime orig rec (overhead) thp (overhead) ethp (overhead) prcl (overhead)
> parsec3/blackscholes 107.594 107.956 (0.34) 106.750 (-0.78) 107.672 (0.07) 111.916 (4.02)
> parsec3/bodytrack 79.230 79.368 (0.17) 78.908 (-0.41) 79.705 (0.60) 80.423 (1.50)
> parsec3/canneal 142.831 143.810 (0.69) 123.530 (-13.51) 133.778 (-6.34) 144.998 (1.52)
> parsec3/dedup 11.986 11.959 (-0.23) 11.762 (-1.87) 12.028 (0.35) 13.313 (11.07)
> parsec3/facesim 210.125 209.007 (-0.53) 205.226 (-2.33) 207.766 (-1.12) 209.815 (-0.15)
> parsec3/ferret 191.601 191.177 (-0.22) 190.420 (-0.62) 191.775 (0.09) 192.638 (0.54)
> parsec3/fluidanimate 212.735 212.970 (0.11) 209.151 (-1.68) 211.904 (-0.39) 218.573 (2.74)
> parsec3/freqmine 291.225 290.873 (-0.12) 289.258 (-0.68) 289.884 (-0.46) 298.373 (2.45)
> parsec3/raytrace 118.289 119.586 (1.10) 119.045 (0.64) 119.064 (0.66) 137.919 (16.60)
> parsec3/streamcluster 323.565 328.168 (1.42) 279.565 (-13.60) 287.452 (-11.16) 333.244 (2.99)
> parsec3/swaptions 155.140 155.473 (0.21) 153.816 (-0.85) 156.423 (0.83) 156.237 (0.71)
> parsec3/vips 58.979 59.311 (0.56) 58.733 (-0.42) 59.005 (0.04) 61.062 (3.53)
> parsec3/x264 70.539 68.413 (-3.01) 64.760 (-8.19) 67.180 (-4.76) 68.103 (-3.45)
> splash2x/barnes 80.414 81.751 (1.66) 73.585 (-8.49) 80.232 (-0.23) 115.753 (43.95)
> splash2x/fft 33.902 34.111 (0.62) 24.228 (-28.53) 29.926 (-11.73) 44.438 (31.08)
> splash2x/lu_cb 85.556 86.001 (0.52) 84.538 (-1.19) 86.000 (0.52) 91.447 (6.89)
> splash2x/lu_ncb 93.399 93.652 (0.27) 90.463 (-3.14) 94.008 (0.65) 93.901 (0.54)
> splash2x/ocean_cp 45.253 45.191 (-0.14) 43.049 (-4.87) 44.022 (-2.72) 46.588 (2.95)
> splash2x/ocean_ncp 86.927 87.065 (0.16) 50.747 (-41.62) 86.855 (-0.08) 199.553 (129.57)
> splash2x/radiosity 91.433 91.511 (0.09) 90.626 (-0.88) 91.865 (0.47) 104.524 (14.32)
> splash2x/radix 31.923 32.023 (0.31) 25.194 (-21.08) 32.035 (0.35) 39.231 (22.89)
> splash2x/raytrace 84.367 84.677 (0.37) 82.417 (-2.31) 83.505 (-1.02) 84.857 (0.58)
> splash2x/volrend 87.499 87.495 (-0.00) 86.775 (-0.83) 87.311 (-0.21) 87.511 (0.01)
> splash2x/water_nsquared 236.397 236.759 (0.15) 219.902 (-6.98) 224.228 (-5.15) 238.562 (0.92)
> splash2x/water_spatial 89.646 89.767 (0.14) 89.735 (0.10) 90.347 (0.78) 103.585 (15.55)
> total 3020.570 3028.080 (0.25) 2852.190 (-5.57) 2953.960 (-2.21) 3276.550 (8.47)
>
>
> memused.avg orig rec (overhead) thp (overhead) ethp (overhead) prcl (overhead)
> parsec3/blackscholes 1785916.600 1834201.400 (2.70) 1826249.200 (2.26) 1828079.200 (2.36) 1712210.600 (-4.13)
> parsec3/bodytrack 1415049.400 1434317.600 (1.36) 1423715.000 (0.61) 1430392.600 (1.08) 1435136.000 (1.42)
> parsec3/canneal 1043489.800 1058617.600 (1.45) 1040484.600 (-0.29) 1048664.800 (0.50) 1050280.000 (0.65)
> parsec3/dedup 2414453.200 2458493.200 (1.82) 2411379.400 (-0.13) 2400516.000 (-0.58) 2461120.800 (1.93)
> parsec3/facesim 541597.200 550097.400 (1.57) 544364.600 (0.51) 553240.000 (2.15) 552316.400 (1.98)
> parsec3/ferret 317986.600 332346.000 (4.52) 320218.000 (0.70) 331085.000 (4.12) 330895.200 (4.06)
> parsec3/fluidanimate 576183.400 585442.000 (1.61) 577780.200 (0.28) 587703.400 (2.00) 506501.000 (-12.09)
> parsec3/freqmine 990869.200 997817.000 (0.70) 990350.400 (-0.05) 997669.000 (0.69) 763325.800 (-22.96)
> parsec3/raytrace 1748370.800 1757109.200 (0.50) 1746153.800 (-0.13) 1757830.400 (0.54) 1581455.800 (-9.55)
> parsec3/streamcluster 121521.800 140452.400 (15.58) 129725.400 (6.75) 132266.000 (8.84) 130558.200 (7.44)
> parsec3/swaptions 15592.400 29018.800 (86.11) 14765.800 (-5.30) 27260.200 (74.83) 26631.600 (70.80)
> parsec3/vips 2957567.600 2967993.800 (0.35) 2956623.200 (-0.03) 2973062.600 (0.52) 2951402.000 (-0.21)
> parsec3/x264 3169012.400 3175048.800 (0.19) 3190345.400 (0.67) 3189353.000 (0.64) 3172924.200 (0.12)
> splash2x/barnes 1209066.000 1213125.400 (0.34) 1217261.400 (0.68) 1209661.600 (0.05) 921041.800 (-23.82)
> splash2x/fft 9359313.200 9195213.000 (-1.75) 9377562.400 (0.19) 9050957.600 (-3.29) 9517977.000 (1.70)
> splash2x/lu_cb 514966.200 522939.400 (1.55) 520870.400 (1.15) 522635.000 (1.49) 329933.600 (-35.93)
> splash2x/lu_ncb 514180.400 525974.800 (2.29) 521420.200 (1.41) 521063.600 (1.34) 523557.000 (1.82)
> splash2x/ocean_cp 3346493.400 3288078.000 (-1.75) 3382253.800 (1.07) 3289477.600 (-1.70) 3260810.400 (-2.56)
> splash2x/ocean_ncp 3909966.400 3882968.800 (-0.69) 7037196.000 (79.98) 4046363.400 (3.49) 3471452.400 (-11.22)
> splash2x/radiosity 1471119.400 1470626.800 (-0.03) 1482604.200 (0.78) 1472718.400 (0.11) 546893.600 (-62.82)
> splash2x/radix 1748360.800 1729163.400 (-1.10) 1371463.200 (-21.56) 1701993.600 (-2.65) 1817519.600 (3.96)
> splash2x/raytrace 46670.000 60172.200 (28.93) 51901.600 (11.21) 60782.600 (30.24) 52644.800 (12.80)
> splash2x/volrend 150666.600 167444.200 (11.14) 151335.200 (0.44) 163345.000 (8.41) 162760.000 (8.03)
> splash2x/water_nsquared 45720.200 59422.400 (29.97) 46031.000 (0.68) 61801.400 (35.17) 62627.000 (36.98)
> splash2x/water_spatial 663052.200 672855.800 (1.48) 665787.600 (0.41) 674696.200 (1.76) 471052.600 (-28.96)
> total 40077300.000 40108900.000 (0.08) 42997900.000 (7.29) 40032700.000 (-0.11) 37813000.000 (-5.65)
>
>
> DAMON Overheads
> ~~~~~~~~~~~~~~~
>
> In total, the DAMON recording feature incurs 0.25% runtime overhead (up to
> 1.66% in the worst case, with 'splash2x/barnes') and 0.08% memory space
> overhead.
>
> For convenient test runs of 'rec', I use a Python wrapper. The wrapper
> constantly consumes about 10-15 MiB of memory. This becomes a high memory
> overhead if the target workload has a small memory footprint. In detail,
> 16%, 86%, 29%, 11%, and 30% overheads are shown for parsec3/streamcluster
> (125 MiB), parsec3/swaptions (15 MiB), splash2x/raytrace (45 MiB),
> splash2x/volrend (151 MiB), and splash2x/water_nsquared (46 MiB).
> Nonetheless, the overheads come not from DAMON but from the wrapper, and
> thus should be ignored. This spurious memory overhead persists in 'ethp'
> and 'prcl', as those configurations also use the Python wrapper.
>
>
> Efficient THP
> ~~~~~~~~~~~~~
>
> The THP 'always' policy achieves a 5.57% speedup but incurs a 7.29% memory
> overhead. It achieves a 41.62% speedup in the best case, but a 79.98% memory
> overhead in the worst case. Interestingly, both the best and the worst case
> come from 'splash2x/ocean_ncp'.
The results above don't seem to support this any more?
> runtime orig rec (overhead) thp (overhead) ethp (overhead) prcl (overhead)
> splash2x/ocean_ncp 86.927 87.065 (0.16) 50.747 (-41.62) 86.855 (-0.08) 199.553 (129.57)
>
> The two-line implementation of the data access monitoring-based THP version
> ('ethp') shows a 2.21% speedup and a -0.11% memory overhead. In other words,
> 'ethp' removes 100% of the THP memory waste while preserving 39.67% of the
> THP speedup in total.
>
>
> Proactive Reclamation
> ~~~~~~~~~~~~~~~~~~~~
>
> As in the original work, I use a 'zram' swap device for this configuration.
>
> In total, our one-line implementation of proactive reclamation, 'prcl',
> incurs an 8.47% runtime overhead while achieving a 5.65% reduction in system
> memory usage.
>
> Nonetheless, as the memory usage is calculated from 'MemFree' in
> '/proc/meminfo', it includes the SwapCached pages. As swap-cached pages can
> be easily evicted, I also measured the resident set size of the workloads:
>
> rss.avg orig rec (overhead) thp (overhead) ethp (overhead) prcl (overhead)
> parsec3/blackscholes 592502.000 589764.400 (-0.46) 592132.600 (-0.06) 593702.000 (0.20) 406639.400 (-31.37)
> parsec3/bodytrack 32365.400 32195.000 (-0.53) 32210.800 (-0.48) 32114.600 (-0.77) 21537.600 (-33.45)
> parsec3/canneal 839904.200 840292.200 (0.05) 836866.400 (-0.36) 838263.200 (-0.20) 837895.800 (-0.24)
> parsec3/dedup 1208337.200 1218465.600 (0.84) 1233278.600 (2.06) 1200490.200 (-0.65) 882911.400 (-26.93)
> parsec3/facesim 311380.800 311363.600 (-0.01) 315642.600 (1.37) 312573.400 (0.38) 310257.400 (-0.36)
> parsec3/ferret 99514.800 99542.000 (0.03) 100454.200 (0.94) 99879.800 (0.37) 89679.200 (-9.88)
> parsec3/fluidanimate 531760.800 531735.200 (-0.00) 531865.400 (0.02) 531940.800 (0.03) 440781.000 (-17.11)
> parsec3/freqmine 552455.400 552882.600 (0.08) 555793.600 (0.60) 553019.800 (0.10) 58067.000 (-89.49)
> parsec3/raytrace 894798.400 894953.400 (0.02) 892223.400 (-0.29) 893012.400 (-0.20) 315259.800 (-64.77)
> parsec3/streamcluster 110780.400 110856.800 (0.07) 110954.000 (0.16) 111310.800 (0.48) 108066.800 (-2.45)
> parsec3/swaptions 5614.600 5645.600 (0.55) 5553.200 (-1.09) 5552.600 (-1.10) 3251.800 (-42.08)
> parsec3/vips 31942.200 31752.800 (-0.59) 32042.600 (0.31) 32226.600 (0.89) 29012.200 (-9.17)
> parsec3/x264 81770.800 81609.200 (-0.20) 82800.800 (1.26) 82612.200 (1.03) 81805.800 (0.04)
> splash2x/barnes 1216515.600 1217113.800 (0.05) 1225605.600 (0.75) 1217325.000 (0.07) 540108.400 (-55.60)
> splash2x/fft 9668660.600 9751350.800 (0.86) 9773806.400 (1.09) 9613555.400 (-0.57) 7951241.800 (-17.76)
> splash2x/lu_cb 510368.800 510095.800 (-0.05) 514350.600 (0.78) 510276.000 (-0.02) 311584.800 (-38.95)
> splash2x/lu_ncb 509904.800 510001.600 (0.02) 513847.000 (0.77) 510073.400 (0.03) 509905.600 (0.00)
> splash2x/ocean_cp 3389550.600 3404466.000 (0.44) 3443363.600 (1.59) 3410388.000 (0.61) 3330608.600 (-1.74)
> splash2x/ocean_ncp 3923723.200 3911148.200 (-0.32) 7175800.400 (82.88) 4104482.400 (4.61) 2030525.000 (-48.25)
> splash2x/radiosity 1472994.600 1475946.400 (0.20) 1485636.800 (0.86) 1476193.000 (0.22) 262161.400 (-82.20)
> splash2x/radix 1750329.800 1765697.000 (0.88) 1413304.000 (-19.25) 1754154.400 (0.22) 1516142.600 (-13.38)
> splash2x/raytrace 23149.600 23208.000 (0.25) 28574.400 (23.43) 26694.600 (15.31) 16257.800 (-29.77)
> splash2x/volrend 43968.800 43919.000 (-0.11) 44087.600 (0.27) 44224.000 (0.58) 32484.400 (-26.12)
> splash2x/water_nsquared 29348.000 29338.400 (-0.03) 29604.600 (0.87) 29779.400 (1.47) 23644.800 (-19.43)
> splash2x/water_spatial 655263.600 655097.800 (-0.03) 655199.200 (-0.01) 656282.400 (0.16) 379816.800 (-42.04)
> total 28486900.000 28598400.000 (0.39) 31625000.000 (11.02) 28640100.000 (0.54) 20489600.000 (-28.07)
>
> In total, resident set sizes were reduced by 28.07%.
>
> With parsec3/freqmine, 'prcl' reduced system memory usage by 22.96% and the
> resident set size by 89.49% while incurring only 2.45% runtime overhead.
>
>
> Sequence Of Patches
> ===================
>
> The patches are based on v5.6 plus the v7 DAMON patchset[1] and Minchan's
> ``do_madvise()`` patch[2]. Minchan's patch was necessary to reuse the
> ``madvise()`` code in DAMON. You can also clone the complete git tree:
>
> $ git clone git://github.com/sjp38/linux -b damos/rfc/v5
>
> The tree is also browsable on the web:
> https://github.com/sjp38/linux/releases/tag/damos/rfc/v5
>
>
> [1] https://lore.kernel.org/linux-mm/20200318112722.30143-1-sjpark@amazon.com/
> [2] https://lore.kernel.org/linux-mm/20200302193630.68771-2-minchan@kernel.org/
>
> The first patch allows DAMON to reuse the ``madvise()`` code for the actions.
> The second patch accounts the age of each region. The third patch implements
> the handling of the schemes in DAMON and exports a kernel space programming
> interface for it. The fourth patch implements a debugfs interface for
> privileged people and programs. The fifth and sixth patches add kunit tests
> and selftests for these changes, and finally the seventh patch modifies the
> user space tool for DAMON to support describing and applying schemes in a
> human-friendly way.
>
>
> Patch History
> =============
>
> Changes from RFC v4
> (https://lore.kernel.org/linux-mm/20200303121406.20954-1-sjpark@amazon.com/)
> - Handle CONFIG_ADVISE_SYSCALL
> - Clean up code (Jonathan Cameron)
> - Update test results
> - Rebase on v5.6 + DAMON v7
>
> Changes from RFC v3
> (https://lore.kernel.org/linux-mm/20200225102300.23895-1-sjpark@amazon.com/)
> - Add Reviewed-by from Brendan Higgins
> - Code cleanup: Modularize madvise() call
> - Fix a trivial bug in the wrapper python script
> - Add more stable and detailed evaluation results with updated ETHP scheme
>
> Changes from RFC v2
> (https://lore.kernel.org/linux-mm/20200218085309.18346-1-sjpark@amazon.com/)
> - Fix aging mechanism for better 'old region' selection
> - Add more kunittests and kselftests for this patchset
> - Support more human-friendly description and application of 'schemes'
>
> Changes from RFC v1
> (https://lore.kernel.org/linux-mm/20200210150921.32482-1-sjpark@amazon.com/)
> - Properly adjust age accounting related properties after splitting, merging,
> and action applying
> SeongJae Park (7):
> mm/madvise: Export do_madvise() to external GPL modules
> mm/damon: Account age of target regions
> mm/damon: Implement data access monitoring-based operation schemes
> mm/damon/schemes: Implement a debugfs interface
> mm/damon-test: Add kunit test case for regions age accounting
> mm/damon/selftests: Add 'schemes' debugfs tests
> damon/tools: Support more human friendly 'schemes' control
>
> include/linux/damon.h | 29 ++
> mm/damon-test.h | 5 +
> mm/damon.c | 428 +++++++++++++++++-
> mm/madvise.c | 1 +
> tools/damon/_convert_damos.py | 125 +++++
> tools/damon/_damon.py | 143 ++++++
> tools/damon/damo | 7 +
> tools/damon/record.py | 135 +-----
> tools/damon/schemes.py | 105 +++++
> .../testing/selftests/damon/debugfs_attrs.sh | 29 ++
> 10 files changed, 878 insertions(+), 129 deletions(-)
> create mode 100755 tools/damon/_convert_damos.py
> create mode 100644 tools/damon/_damon.py
> create mode 100644 tools/damon/schemes.py
>
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Re: [RFC v5 0/7] Implement Data Access Monitoring-based Memory Operation Schemes
2020-03-31 15:51 ` [RFC v5 0/7] Implement Data Access Monitoring-based Memory Operation Schemes Jonathan Cameron
@ 2020-03-31 16:18 ` SeongJae Park
2020-03-31 16:39 ` Jonathan Cameron
0 siblings, 1 reply; 12+ messages in thread
From: SeongJae Park @ 2020-03-31 16:18 UTC (permalink / raw)
To: Jonathan Cameron
Cc: SeongJae Park, alexander.shishkin, linux-mm, akpm, SeongJae Park,
aarcange, acme, amit, brendan.d.gregg, brendanhiggins, cai,
colin.king, corbet, dwmw, jolsa, kirill, mark.rutland, mgorman,
minchan, mingo, namhyung, peterz, rdunlap, riel, rientjes,
rostedt, shakeelb, shuah, sj38.park, vbabka, vdavydov.dev,
yang.shi, ying.huang, linux-doc, linux-kernel
On Tue, 31 Mar 2020 16:51:55 +0100 Jonathan Cameron <Jonathan.Cameron@Huawei.com> wrote:
> On Mon, 30 Mar 2020 13:50:35 +0200
> SeongJae Park <sjpark@amazon.com> wrote:
>
> > From: SeongJae Park <sjpark@amazon.de>
> >
> > DAMON[1] can be used as a primitive for data access aware memory management
> > optimizations. That said, users who want such optimizations should run
> > DAMON, read the monitoring results, analyze them, plan a new memory
> > management scheme, and apply the new scheme by themselves. Such efforts
> > will be inevitable for some complicated optimizations.
> >
> > However, in many other cases, the users could simply want the system to
> > apply a memory management action to a memory region of a specific size
> > having a specific access frequency for a specific time. For example, "page
> > out a memory region larger than 100 MiB that has kept only rare accesses
> > for more than 2 minutes", or "do not use THP for a memory region larger
> > than 2 MiB rarely accessed for more than 1 second".
> >
> > This RFC patchset makes DAMON handle such data access monitoring-based
> > operation schemes. With this change, users can do data access aware
> > optimizations by simply specifying their schemes to DAMON.
>
>
> Hi SeongJae,
>
> I'm wondering if I'm misreading the results below or a data handling mixup
> occured. See inline.
Thank you for the question, Jonathan!
>
> Thanks,
>
> Jonathan
>
> >
[...]
> > Results
> > -------
> >
> > Below two tables show the measurement results. The runtimes are in seconds
> > while the memory usages are in KiB. Each configurations except 'orig' shows
> > its overhead relative to 'orig' in percent within parenthesises.
> >
> > runtime orig rec (overhead) thp (overhead) ethp (overhead) prcl (overhead)
> > parsec3/blackscholes 107.594 107.956 (0.34) 106.750 (-0.78) 107.672 (0.07) 111.916 (4.02)
> > parsec3/bodytrack 79.230 79.368 (0.17) 78.908 (-0.41) 79.705 (0.60) 80.423 (1.50)
> > parsec3/canneal 142.831 143.810 (0.69) 123.530 (-13.51) 133.778 (-6.34) 144.998 (1.52)
> > parsec3/dedup 11.986 11.959 (-0.23) 11.762 (-1.87) 12.028 (0.35) 13.313 (11.07)
> > parsec3/facesim 210.125 209.007 (-0.53) 205.226 (-2.33) 207.766 (-1.12) 209.815 (-0.15)
> > parsec3/ferret 191.601 191.177 (-0.22) 190.420 (-0.62) 191.775 (0.09) 192.638 (0.54)
> > parsec3/fluidanimate 212.735 212.970 (0.11) 209.151 (-1.68) 211.904 (-0.39) 218.573 (2.74)
> > parsec3/freqmine 291.225 290.873 (-0.12) 289.258 (-0.68) 289.884 (-0.46) 298.373 (2.45)
> > parsec3/raytrace 118.289 119.586 (1.10) 119.045 (0.64) 119.064 (0.66) 137.919 (16.60)
> > parsec3/streamcluster 323.565 328.168 (1.42) 279.565 (-13.60) 287.452 (-11.16) 333.244 (2.99)
> > parsec3/swaptions 155.140 155.473 (0.21) 153.816 (-0.85) 156.423 (0.83) 156.237 (0.71)
> > parsec3/vips 58.979 59.311 (0.56) 58.733 (-0.42) 59.005 (0.04) 61.062 (3.53)
> > parsec3/x264 70.539 68.413 (-3.01) 64.760 (-8.19) 67.180 (-4.76) 68.103 (-3.45)
> > splash2x/barnes 80.414 81.751 (1.66) 73.585 (-8.49) 80.232 (-0.23) 115.753 (43.95)
> > splash2x/fft 33.902 34.111 (0.62) 24.228 (-28.53) 29.926 (-11.73) 44.438 (31.08)
> > splash2x/lu_cb 85.556 86.001 (0.52) 84.538 (-1.19) 86.000 (0.52) 91.447 (6.89)
> > splash2x/lu_ncb 93.399 93.652 (0.27) 90.463 (-3.14) 94.008 (0.65) 93.901 (0.54)
> > splash2x/ocean_cp 45.253 45.191 (-0.14) 43.049 (-4.87) 44.022 (-2.72) 46.588 (2.95)
> > splash2x/ocean_ncp 86.927 87.065 (0.16) 50.747 (-41.62) 86.855 (-0.08) 199.553 (129.57)
> > splash2x/radiosity 91.433 91.511 (0.09) 90.626 (-0.88) 91.865 (0.47) 104.524 (14.32)
> > splash2x/radix 31.923 32.023 (0.31) 25.194 (-21.08) 32.035 (0.35) 39.231 (22.89)
> > splash2x/raytrace 84.367 84.677 (0.37) 82.417 (-2.31) 83.505 (-1.02) 84.857 (0.58)
> > splash2x/volrend 87.499 87.495 (-0.00) 86.775 (-0.83) 87.311 (-0.21) 87.511 (0.01)
> > splash2x/water_nsquared 236.397 236.759 (0.15) 219.902 (-6.98) 224.228 (-5.15) 238.562 (0.92)
> > splash2x/water_spatial 89.646 89.767 (0.14) 89.735 (0.10) 90.347 (0.78) 103.585 (15.55)
> > total 3020.570 3028.080 (0.25) 2852.190 (-5.57) 2953.960 (-2.21) 3276.550 (8.47)
> >
> >
> > memused.avg orig rec (overhead) thp (overhead) ethp (overhead) prcl (overhead)
> > parsec3/blackscholes 1785916.600 1834201.400 (2.70) 1826249.200 (2.26) 1828079.200 (2.36) 1712210.600 (-4.13)
> > parsec3/bodytrack 1415049.400 1434317.600 (1.36) 1423715.000 (0.61) 1430392.600 (1.08) 1435136.000 (1.42)
> > parsec3/canneal 1043489.800 1058617.600 (1.45) 1040484.600 (-0.29) 1048664.800 (0.50) 1050280.000 (0.65)
> > parsec3/dedup 2414453.200 2458493.200 (1.82) 2411379.400 (-0.13) 2400516.000 (-0.58) 2461120.800 (1.93)
> > parsec3/facesim 541597.200 550097.400 (1.57) 544364.600 (0.51) 553240.000 (2.15) 552316.400 (1.98)
> > parsec3/ferret 317986.600 332346.000 (4.52) 320218.000 (0.70) 331085.000 (4.12) 330895.200 (4.06)
> > parsec3/fluidanimate 576183.400 585442.000 (1.61) 577780.200 (0.28) 587703.400 (2.00) 506501.000 (-12.09)
> > parsec3/freqmine 990869.200 997817.000 (0.70) 990350.400 (-0.05) 997669.000 (0.69) 763325.800 (-22.96)
> > parsec3/raytrace 1748370.800 1757109.200 (0.50) 1746153.800 (-0.13) 1757830.400 (0.54) 1581455.800 (-9.55)
> > parsec3/streamcluster 121521.800 140452.400 (15.58) 129725.400 (6.75) 132266.000 (8.84) 130558.200 (7.44)
> > parsec3/swaptions 15592.400 29018.800 (86.11) 14765.800 (-5.30) 27260.200 (74.83) 26631.600 (70.80)
> > parsec3/vips 2957567.600 2967993.800 (0.35) 2956623.200 (-0.03) 2973062.600 (0.52) 2951402.000 (-0.21)
> > parsec3/x264 3169012.400 3175048.800 (0.19) 3190345.400 (0.67) 3189353.000 (0.64) 3172924.200 (0.12)
> > splash2x/barnes 1209066.000 1213125.400 (0.34) 1217261.400 (0.68) 1209661.600 (0.05) 921041.800 (-23.82)
> > splash2x/fft 9359313.200 9195213.000 (-1.75) 9377562.400 (0.19) 9050957.600 (-3.29) 9517977.000 (1.70)
> > splash2x/lu_cb 514966.200 522939.400 (1.55) 520870.400 (1.15) 522635.000 (1.49) 329933.600 (-35.93)
> > splash2x/lu_ncb 514180.400 525974.800 (2.29) 521420.200 (1.41) 521063.600 (1.34) 523557.000 (1.82)
> > splash2x/ocean_cp 3346493.400 3288078.000 (-1.75) 3382253.800 (1.07) 3289477.600 (-1.70) 3260810.400 (-2.56)
> > splash2x/ocean_ncp 3909966.400 3882968.800 (-0.69) 7037196.000 (79.98) 4046363.400 (3.49) 3471452.400 (-11.22)
> > splash2x/radiosity 1471119.400 1470626.800 (-0.03) 1482604.200 (0.78) 1472718.400 (0.11) 546893.600 (-62.82)
> > splash2x/radix 1748360.800 1729163.400 (-1.10) 1371463.200 (-21.56) 1701993.600 (-2.65) 1817519.600 (3.96)
> > splash2x/raytrace 46670.000 60172.200 (28.93) 51901.600 (11.21) 60782.600 (30.24) 52644.800 (12.80)
> > splash2x/volrend 150666.600 167444.200 (11.14) 151335.200 (0.44) 163345.000 (8.41) 162760.000 (8.03)
> > splash2x/water_nsquared 45720.200 59422.400 (29.97) 46031.000 (0.68) 61801.400 (35.17) 62627.000 (36.98)
> > splash2x/water_spatial 663052.200 672855.800 (1.48) 665787.600 (0.41) 674696.200 (1.76) 471052.600 (-28.96)
> > total 40077300.000 40108900.000 (0.08) 42997900.000 (7.29) 40032700.000 (-0.11) 37813000.000 (-5.65)
> >
> >
[...]
> >
> > Efficient THP
> > ~~~~~~~~~~~~~
> >
> > The THP 'always' policy achieves a 5.57% speedup but incurs 7.29% memory
> > overhead. It achieves a 41.62% speedup in the best case, but 79.98% memory
> > overhead in the worst case. Interestingly, both the best and the worst
> > case are with 'splash2x/ocean_ncp'.
>
> The results above don't seem to support this any more?
>
> > runtime orig rec (overhead) thp (overhead) ethp (overhead) prcl (overhead)
> > splash2x/ocean_ncp 86.927 87.065 (0.16) 50.747 (-41.62) 86.855 (-0.08) 199.553 (129.57)
Hmm... but I don't get the point you're making. In the data, the 'thp' column
means the THP 'always' policy, and the following column shows its overhead
compared to 'orig', in percent. Thus, the data says a kernel with the THP
'always' policy takes 50.747 seconds to finish splash2x/ocean_ncp, while the
original kernel with THP disabled takes 86.927 seconds. Thus, the overhead is
``(50.747 - 86.927) / 86.927 = -0.4162``; in other words, a 41.62% speedup.
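The parenthesized overhead columns throughout these tables all follow that same convention. A minimal sketch of the formula (not the actual evaluation script used for this posting):

```python
def overhead_percent(orig: float, measured: float) -> float:
    """Overhead of a configuration relative to 'orig', in percent.

    Negative values mean the configuration did better than 'orig':
    a shorter runtime (i.e. a speedup) or a smaller memory footprint.
    """
    return (measured - orig) / orig * 100

# The splash2x/ocean_ncp runtime row discussed above: THP 'always'
# finishes in 50.747 s vs. 86.927 s for 'orig'.
print(round(overhead_percent(86.927, 50.747), 2))  # -41.62, i.e. a 41.62% speedup
```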
Also, the 5.57% speedup and 7.29% memory overhead are for the _total_, as this
data shows:
> > runtime orig rec (overhead) thp (overhead) ethp (overhead) prcl (overhead)
> > total 3020.570 3028.080 (0.25) 2852.190 (-5.57) 2953.960 (-2.21) 3276.550 (8.47)
Maybe I confused you by stating this ambiguously; sorry if so. If I'm still
misunderstanding your point, please let me know.
Thanks,
SeongJae Park
[...]
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [RFC v5 0/7] Implement Data Access Monitoring-based Memory Operation Schemes
2020-03-31 16:18 ` SeongJae Park
@ 2020-03-31 16:39 ` Jonathan Cameron
2020-04-01 8:21 ` SeongJae Park
0 siblings, 1 reply; 12+ messages in thread
From: Jonathan Cameron @ 2020-03-31 16:39 UTC (permalink / raw)
To: SeongJae Park
Cc: alexander.shishkin, linux-mm, akpm, SeongJae Park, aarcange,
acme, amit, brendan.d.gregg, brendanhiggins, cai, colin.king,
corbet, dwmw, jolsa, kirill, mark.rutland, mgorman, minchan,
mingo, namhyung, peterz, rdunlap, riel, rientjes, rostedt,
shakeelb, shuah, sj38.park, vbabka, vdavydov.dev, yang.shi,
ying.huang, linux-doc, linux-kernel
On Tue, 31 Mar 2020 18:18:19 +0200
SeongJae Park <sjpark@amazon.com> wrote:
> On Tue, 31 Mar 2020 16:51:55 +0100 Jonathan Cameron <Jonathan.Cameron@Huawei.com> wrote:
>
> > On Mon, 30 Mar 2020 13:50:35 +0200
> > SeongJae Park <sjpark@amazon.com> wrote:
> >
> > > From: SeongJae Park <sjpark@amazon.de>
> > >
> > > DAMON[1] can be used as a primitive for data access aware memory management
> > > optimizations. That said, users who want such optimizations should run
> > > DAMON, read the monitoring results, analyze them, plan a new memory
> > > management scheme, and apply the new scheme by themselves. Such efforts
> > > will be inevitable for some complicated optimizations.
> > >
> > > However, in many other cases, the users could simply want the system to
> > > apply a memory management action to a memory region of a specific size
> > > having a specific access frequency for a specific time. For example, "page
> > > out a memory region larger than 100 MiB that has kept only rare accesses
> > > for more than 2 minutes", or "do not use THP for a memory region larger
> > > than 2 MiB rarely accessed for more than 1 second".
> > >
> > > This RFC patchset makes DAMON handle such data access monitoring-based
> > > operation schemes. With this change, users can do data access aware
> > > optimizations by simply specifying their schemes to DAMON.
> >
> >
> > Hi SeongJae,
> >
> > I'm wondering if I'm misreading the results below or a data handling mixup
> > occured. See inline.
>
> Thank you for the question, Jonathan!
>
> >
> > Thanks,
> >
> > Jonathan
> >
> > >
> [...]
> > > Results
> > > -------
> > >
> > > Below two tables show the measurement results. The runtimes are in seconds
> > > while the memory usages are in KiB. Each configurations except 'orig' shows
> > > its overhead relative to 'orig' in percent within parenthesises.
> > >
> > > runtime orig rec (overhead) thp (overhead) ethp (overhead) prcl (overhead)
> > > parsec3/blackscholes 107.594 107.956 (0.34) 106.750 (-0.78) 107.672 (0.07) 111.916 (4.02)
> > > parsec3/bodytrack 79.230 79.368 (0.17) 78.908 (-0.41) 79.705 (0.60) 80.423 (1.50)
> > > parsec3/canneal 142.831 143.810 (0.69) 123.530 (-13.51) 133.778 (-6.34) 144.998 (1.52)
> > > parsec3/dedup 11.986 11.959 (-0.23) 11.762 (-1.87) 12.028 (0.35) 13.313 (11.07)
> > > parsec3/facesim 210.125 209.007 (-0.53) 205.226 (-2.33) 207.766 (-1.12) 209.815 (-0.15)
> > > parsec3/ferret 191.601 191.177 (-0.22) 190.420 (-0.62) 191.775 (0.09) 192.638 (0.54)
> > > parsec3/fluidanimate 212.735 212.970 (0.11) 209.151 (-1.68) 211.904 (-0.39) 218.573 (2.74)
> > > parsec3/freqmine 291.225 290.873 (-0.12) 289.258 (-0.68) 289.884 (-0.46) 298.373 (2.45)
> > > parsec3/raytrace 118.289 119.586 (1.10) 119.045 (0.64) 119.064 (0.66) 137.919 (16.60)
> > > parsec3/streamcluster 323.565 328.168 (1.42) 279.565 (-13.60) 287.452 (-11.16) 333.244 (2.99)
> > > parsec3/swaptions 155.140 155.473 (0.21) 153.816 (-0.85) 156.423 (0.83) 156.237 (0.71)
> > > parsec3/vips 58.979 59.311 (0.56) 58.733 (-0.42) 59.005 (0.04) 61.062 (3.53)
> > > parsec3/x264 70.539 68.413 (-3.01) 64.760 (-8.19) 67.180 (-4.76) 68.103 (-3.45)
> > > splash2x/barnes 80.414 81.751 (1.66) 73.585 (-8.49) 80.232 (-0.23) 115.753 (43.95)
> > > splash2x/fft 33.902 34.111 (0.62) 24.228 (-28.53) 29.926 (-11.73) 44.438 (31.08)
> > > splash2x/lu_cb 85.556 86.001 (0.52) 84.538 (-1.19) 86.000 (0.52) 91.447 (6.89)
> > > splash2x/lu_ncb 93.399 93.652 (0.27) 90.463 (-3.14) 94.008 (0.65) 93.901 (0.54)
> > > splash2x/ocean_cp 45.253 45.191 (-0.14) 43.049 (-4.87) 44.022 (-2.72) 46.588 (2.95)
> > > splash2x/ocean_ncp 86.927 87.065 (0.16) 50.747 (-41.62) 86.855 (-0.08) 199.553 (129.57)
> > > splash2x/radiosity 91.433 91.511 (0.09) 90.626 (-0.88) 91.865 (0.47) 104.524 (14.32)
> > > splash2x/radix 31.923 32.023 (0.31) 25.194 (-21.08) 32.035 (0.35) 39.231 (22.89)
> > > splash2x/raytrace 84.367 84.677 (0.37) 82.417 (-2.31) 83.505 (-1.02) 84.857 (0.58)
> > > splash2x/volrend 87.499 87.495 (-0.00) 86.775 (-0.83) 87.311 (-0.21) 87.511 (0.01)
> > > splash2x/water_nsquared 236.397 236.759 (0.15) 219.902 (-6.98) 224.228 (-5.15) 238.562 (0.92)
> > > splash2x/water_spatial 89.646 89.767 (0.14) 89.735 (0.10) 90.347 (0.78) 103.585 (15.55)
> > > total 3020.570 3028.080 (0.25) 2852.190 (-5.57) 2953.960 (-2.21) 3276.550 (8.47)
> > >
> > >
> > > memused.avg orig rec (overhead) thp (overhead) ethp (overhead) prcl (overhead)
> > > parsec3/blackscholes 1785916.600 1834201.400 (2.70) 1826249.200 (2.26) 1828079.200 (2.36) 1712210.600 (-4.13)
> > > parsec3/bodytrack 1415049.400 1434317.600 (1.36) 1423715.000 (0.61) 1430392.600 (1.08) 1435136.000 (1.42)
> > > parsec3/canneal 1043489.800 1058617.600 (1.45) 1040484.600 (-0.29) 1048664.800 (0.50) 1050280.000 (0.65)
> > > parsec3/dedup 2414453.200 2458493.200 (1.82) 2411379.400 (-0.13) 2400516.000 (-0.58) 2461120.800 (1.93)
> > > parsec3/facesim 541597.200 550097.400 (1.57) 544364.600 (0.51) 553240.000 (2.15) 552316.400 (1.98)
> > > parsec3/ferret 317986.600 332346.000 (4.52) 320218.000 (0.70) 331085.000 (4.12) 330895.200 (4.06)
> > > parsec3/fluidanimate 576183.400 585442.000 (1.61) 577780.200 (0.28) 587703.400 (2.00) 506501.000 (-12.09)
> > > parsec3/freqmine 990869.200 997817.000 (0.70) 990350.400 (-0.05) 997669.000 (0.69) 763325.800 (-22.96)
> > > parsec3/raytrace 1748370.800 1757109.200 (0.50) 1746153.800 (-0.13) 1757830.400 (0.54) 1581455.800 (-9.55)
> > > parsec3/streamcluster 121521.800 140452.400 (15.58) 129725.400 (6.75) 132266.000 (8.84) 130558.200 (7.44)
> > > parsec3/swaptions 15592.400 29018.800 (86.11) 14765.800 (-5.30) 27260.200 (74.83) 26631.600 (70.80)
> > > parsec3/vips 2957567.600 2967993.800 (0.35) 2956623.200 (-0.03) 2973062.600 (0.52) 2951402.000 (-0.21)
> > > parsec3/x264 3169012.400 3175048.800 (0.19) 3190345.400 (0.67) 3189353.000 (0.64) 3172924.200 (0.12)
> > > splash2x/barnes 1209066.000 1213125.400 (0.34) 1217261.400 (0.68) 1209661.600 (0.05) 921041.800 (-23.82)
> > > splash2x/fft 9359313.200 9195213.000 (-1.75) 9377562.400 (0.19) 9050957.600 (-3.29) 9517977.000 (1.70)
> > > splash2x/lu_cb 514966.200 522939.400 (1.55) 520870.400 (1.15) 522635.000 (1.49) 329933.600 (-35.93)
> > > splash2x/lu_ncb 514180.400 525974.800 (2.29) 521420.200 (1.41) 521063.600 (1.34) 523557.000 (1.82)
> > > splash2x/ocean_cp 3346493.400 3288078.000 (-1.75) 3382253.800 (1.07) 3289477.600 (-1.70) 3260810.400 (-2.56)
> > > splash2x/ocean_ncp 3909966.400 3882968.800 (-0.69) 7037196.000 (79.98) 4046363.400 (3.49) 3471452.400 (-11.22)
> > > splash2x/radiosity 1471119.400 1470626.800 (-0.03) 1482604.200 (0.78) 1472718.400 (0.11) 546893.600 (-62.82)
> > > splash2x/radix 1748360.800 1729163.400 (-1.10) 1371463.200 (-21.56) 1701993.600 (-2.65) 1817519.600 (3.96)
> > > splash2x/raytrace 46670.000 60172.200 (28.93) 51901.600 (11.21) 60782.600 (30.24) 52644.800 (12.80)
> > > splash2x/volrend 150666.600 167444.200 (11.14) 151335.200 (0.44) 163345.000 (8.41) 162760.000 (8.03)
> > > splash2x/water_nsquared 45720.200 59422.400 (29.97) 46031.000 (0.68) 61801.400 (35.17) 62627.000 (36.98)
> > > splash2x/water_spatial 663052.200 672855.800 (1.48) 665787.600 (0.41) 674696.200 (1.76) 471052.600 (-28.96)
> > > total 40077300.000 40108900.000 (0.08) 42997900.000 (7.29) 40032700.000 (-0.11) 37813000.000 (-5.65)
> > >
> > >
> [...]
> > >
> > > Efficient THP
> > > ~~~~~~~~~~~~~
> > >
> > > The THP 'always' policy achieves a 5.57% speedup but incurs 7.29% memory
> > > overhead. It achieves a 41.62% speedup in the best case, but 79.98% memory
> > > overhead in the worst case. Interestingly, both the best and the worst
> > > case are with 'splash2x/ocean_ncp'.
> >
> > The results above don't seem to support this any more?
> >
> > > runtime orig rec (overhead) thp (overhead) ethp (overhead) prcl (overhead)
> > > splash2x/ocean_ncp 86.927 87.065 (0.16) 50.747 (-41.62) 86.855 (-0.08) 199.553 (129.57)
>
> Hmm... but I don't get the point you're making. In the data, the 'thp' column
> means the THP 'always' policy, and the following column shows its overhead
> compared to 'orig', in percent. Thus, the data says a kernel with the THP
> 'always' policy takes 50.747 seconds to finish splash2x/ocean_ncp, while the
> original kernel with THP disabled takes 86.927 seconds.
ah. I got myself confused.
However, I was expecting to see a significant performance advantage for ethp
in this particular case, as we saw in the previous version.
In the previous version (posted in reply to v6 of DAMON), ethp showed a
significant gain:
runtime orig rec (overhead) thp (overhead) ethp (overhead) prcl (overhead)
splash2x/ocean_ncp 81.360 81.434 (0.09) 51.157 (-37.12) 66.711 (-18.00) 91.611 (12.60)
So, with ethp we got roughly half the performance back (at the cost of some of
the memory).
That was a result I had been trying to replicate, hence it was at the front of
my mind!
Any idea why that changed so much?
Thanks,
Jonathan
> Thus, the overhead is ``(50.747 - 86.927) / 86.927 = -0.4162``; in other
> words, a 41.62% speedup.
>
> Also, the 5.57% speedup and 7.29% memory overhead are for the _total_, as
> this data shows:
>
> > > runtime orig rec (overhead) thp (overhead) ethp (overhead) prcl (overhead)
> > > total 3020.570 3028.080 (0.25) 2852.190 (-5.57) 2953.960 (-2.21) 3276.550 (8.47)
>
> Maybe I confused you by stating this ambiguously; sorry if so. If I'm still
> misunderstanding your point, please let me know.
>
>
> Thanks,
> SeongJae Park
>
> [...]
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Re: [RFC v5 0/7] Implement Data Access Monitoring-based Memory Operation Schemes
2020-03-31 16:39 ` Jonathan Cameron
@ 2020-04-01 8:21 ` SeongJae Park
0 siblings, 0 replies; 12+ messages in thread
From: SeongJae Park @ 2020-04-01 8:21 UTC (permalink / raw)
To: Jonathan Cameron
Cc: SeongJae Park, alexander.shishkin, linux-mm, akpm, SeongJae Park,
aarcange, acme, amit, brendan.d.gregg, brendanhiggins, cai,
colin.king, corbet, dwmw, jolsa, kirill, mark.rutland, mgorman,
minchan, mingo, namhyung, peterz, rdunlap, riel, rientjes,
rostedt, shakeelb, shuah, sj38.park, vbabka, vdavydov.dev,
yang.shi, ying.huang, linux-doc, linux-kernel
On Tue, 31 Mar 2020 17:39:08 +0100 Jonathan Cameron <Jonathan.Cameron@Huawei.com> wrote:
> On Tue, 31 Mar 2020 18:18:19 +0200
> SeongJae Park <sjpark@amazon.com> wrote:
>
> > On Tue, 31 Mar 2020 16:51:55 +0100 Jonathan Cameron <Jonathan.Cameron@Huawei.com> wrote:
> >
> > > On Mon, 30 Mar 2020 13:50:35 +0200
> > > SeongJae Park <sjpark@amazon.com> wrote:
> > >
> > > > From: SeongJae Park <sjpark@amazon.de>
> > > >
> > > > DAMON[1] can be used as a primitive for data access aware memory management
> > > > optimizations. That said, users who want such optimizations should run
> > > > DAMON, read the monitoring results, analyze them, plan a new memory
> > > > management scheme, and apply the new scheme by themselves. Such efforts
> > > > will be inevitable for some complicated optimizations.
> > > >
> > > > However, in many other cases, the users could simply want the system to
> > > > apply a memory management action to a memory region of a specific size
> > > > having a specific access frequency for a specific time. For example,
> > > > "page out a memory region larger than 100 MiB that has kept only rare
> > > > accesses for more than 2 minutes", or "do not use THP for a memory
> > > > region larger than 2 MiB rarely accessed for more than 1 second".
> > > >
> > > > This RFC patchset makes DAMON handle such data access monitoring-based
> > > > operation schemes. With this change, users can do data access aware
> > > > optimizations by simply specifying their schemes to DAMON.
[...]
> > > >
> > > > Efficient THP
> > > > ~~~~~~~~~~~~~
> > > >
> > > > The THP 'always' policy achieves a 5.57% speedup but incurs 7.29% memory
> > > > overhead. It achieves a 41.62% speedup in the best case, but 79.98%
> > > > memory overhead in the worst case. Interestingly, both the best and the
> > > > worst case are with 'splash2x/ocean_ncp'.
> > >
> > > The results above don't seem to support this any more?
> > >
> > > > runtime orig rec (overhead) thp (overhead) ethp (overhead) prcl (overhead)
> > > > splash2x/ocean_ncp 86.927 87.065 (0.16) 50.747 (-41.62) 86.855 (-0.08) 199.553 (129.57)
> >
> > Hmm... but I don't get the point you're making. In the data, the 'thp'
> > column means the THP 'always' policy, and the following column shows its
> > overhead compared to 'orig', in percent. Thus, the data says a kernel with
> > the THP 'always' policy takes 50.747 seconds to finish splash2x/ocean_ncp,
> > while the original kernel with THP disabled takes 86.927 seconds.
>
> ah. I got myself confused.
>
> However, I was expecting to see a significant performance advantage for ethp
> in this particular case, as we saw in the previous version.
>
> In the previous version (posted in reply to v6 of DAMON), ethp showed a
> significant gain:
>
> runtime orig rec (overhead) thp (overhead) ethp (overhead) prcl (overhead)
> splash2x/ocean_ncp 81.360 81.434 (0.09) 51.157 (-37.12) 66.711 (-18.00) 91.611 (12.60)
>
> So, with ethp we got roughly half the performance back (at the cost of some
> of the memory).
>
> That was a result I had been trying to replicate, hence it was at the front
> of my mind!
>
> Any idea why that changed so much?
Ah, I forgot to explain this change. Thank you for letting me know.
Overall, ETHP in the DAMON-based Operation Schemes RFC v5 shows smaller peak
performance gains.
For example, splash2x/fft shows the best-case speedup with ETHP in both this
version and the previous one; the speedup changed from 19% to 12%. In the case
of splash2x/ocean_ncp, it changed from 18% to only 0.08%.
That said, the total speedup has improved: it was 1.83% before and is 2.21%
now. Several workloads also show better speedups; in the case of
parsec3/canneal, the speedup changed from 3.86% to 6.34%.
Also note that ETHP's memory savings for the workloads showing less speedup
are much improved. For example, ETHP's memory overhead for splash2x/ocean_ncp
was 24.4% before, but is only 3.5% now.
This is because I didn't update the schemes for the updated DAMON. Basically,
the effect of the schemes is access pattern dependent. Because DAMON has
changed so that it might report access patterns different from those of the
previous version, the schemes should also be modified to get the best
performance. However, I didn't update them, as they are only proofs of
concept, not meant for production.
Fortunately, the change in the reported patterns was not huge. I also
confirmed this by eye, comparing the visualized access patterns of the two
versions. The overall trend (better performance, less memory overhead) also
changed only subtly, though some individual workloads saw remarkable changes.
Thanks,
SeongJae Park
>
> Thanks,
>
> Jonathan
>
>
> > Thus, the overhead is ``(50.747 - 86.927) / 86.927 = -0.4162``. In other
> > words, 41.62% speedup.
> >
> > Also, 5.57% speedup and 7.29% memory overhead is for _total_. This data shows
> > it.
> >
> > > > runtime orig rec (overhead) thp (overhead) ethp (overhead) prcl (overhead)
> > > > total 3020.570 3028.080 (0.25) 2852.190 (-5.57) 2953.960 (-2.21) 3276.550 (8.47)
> >
> > Maybe I made you confused by ambiguously saying this. Sorry if so. Or, if I'm
> > still misunderstanding your point, please let me know.
>
>
> >
> >
> > Thanks,
> > SeongJae Park
> >
> > [...]
>
^ permalink raw reply [flat|nested] 12+ messages in thread