From: David Hildenbrand <david@redhat.com>
To: SeongJae Park <sjpark@amazon.com>, akpm@linux-foundation.org
Cc: SeongJae Park <sjpark@amazon.de>,
Jonathan.Cameron@Huawei.com, aarcange@redhat.com,
acme@kernel.org, alexander.shishkin@linux.intel.com,
amit@kernel.org, benh@kernel.crashing.org,
brendan.d.gregg@gmail.com, brendanhiggins@google.com, cai@lca.pw,
colin.king@canonical.com, corbet@lwn.net, dwmw@amazon.com,
foersleo@amazon.de, irogers@google.com, jolsa@redhat.com,
kirill@shutemov.name, mark.rutland@arm.com, mgorman@suse.de,
minchan@kernel.org, mingo@redhat.com, namhyung@kernel.org,
peterz@infradead.org, rdunlap@infradead.org, riel@surriel.com,
rientjes@google.com, rostedt@goodmis.org, sblbir@amazon.com,
shakeelb@google.com, shuah@kernel.org, sj38.park@gmail.com,
snu@amazon.de, vbabka@suse.cz, vdavydov.dev@gmail.com,
yang.shi@linux.alibaba.com, ying.huang@intel.com,
linux-damon@amazon.com, linux-mm@kvack.org,
linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [RFC v11 3/8] mm/damon: Implement data access monitoring-based operation schemes
Date: Tue, 9 Jun 2020 10:47:45 +0200
Message-ID: <ed4b0be0-34ad-511c-7817-e4506ed2f891@redhat.com>
In-Reply-To: <20200609065320.12941-4-sjpark@amazon.com>

On 09.06.20 08:53, SeongJae Park wrote:
> From: SeongJae Park <sjpark@amazon.de>
>
> In many cases, users might use DAMON for simple data access aware
> memory management optimizations such as applying an operation scheme to
> a memory region of a specific size having a specific access frequency
> for a specific time. For example, "page out a memory region larger than
> 100 MiB but having a low access frequency for more than 10 minutes", or "Use
> THP for a memory region larger than 2 MiB having a high access frequency
> for more than 2 seconds".
>
> To spare users from implementing such simple data access
> monitoring-based operation schemes themselves, this commit makes
> DAMON handle such schemes directly. With this commit, users can
> simply specify their desired schemes to DAMON.

What would the alternative be? What would a solution look like in which
these policies are handled by user space (or inside an application)?
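
For comparison, here is a minimal sketch of what a purely user-space policy
might look like, assuming the application tracks per-region access statistics
itself (for example in its own allocator). The struct region_stat type, the
pick_hint() and apply_policy() helpers, and the NO_HINT and HOT_ACCESSES
constants are hypothetical names for illustration; only madvise() and the
MADV_* hints are real interfaces. The thresholds mirror the two examples in
the commit message.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Older libc headers may lack these; the values are from the Linux UAPI. */
#ifndef MADV_HUGEPAGE
#define MADV_HUGEPAGE 14
#endif
#ifndef MADV_PAGEOUT
#define MADV_PAGEOUT 21
#endif

#define NO_HINT (-1)		/* hypothetical: policy has nothing to do */
#define HOT_ACCESSES 10u	/* hypothetical "high frequency" threshold */

/* Hypothetical statistics the application keeps for one of its regions. */
struct region_stat {
	void *addr;			/* page-aligned start of the region */
	size_t len;			/* length in bytes */
	unsigned int nr_accesses;	/* accesses seen in the last interval */
	unsigned int age_secs;		/* how long this pattern has held */
};

/* Map the statistics to a madvise() hint, or NO_HINT. */
int pick_hint(const struct region_stat *rs)
{
	assert(rs != NULL);
	/* > 100 MiB, not accessed, for more than 10 minutes: page it out. */
	if (rs->len > ((size_t)100 << 20) && rs->nr_accesses == 0 &&
	    rs->age_secs > 600)
		return MADV_PAGEOUT;
	/* > 2 MiB, frequently accessed, for more than 2 seconds: use THP. */
	if (rs->len > ((size_t)2 << 20) && rs->nr_accesses >= HOT_ACCESSES &&
	    rs->age_secs > 2)
		return MADV_HUGEPAGE;
	return NO_HINT;
}

/* Apply the policy; returns 0 on success or no-op, -1 with errno set. */
int apply_policy(const struct region_stat *rs)
{
	int hint = pick_hint(rs);

	if (hint == NO_HINT)
		return 0;
	return madvise(rs->addr, rs->len, hint);
}
```

Because the application issues the hints on its own mappings, it stays in
control of regions where a given hint would be harmful.
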
>
> Each of the schemes is composed with conditions for filtering of the
> target memory regions and desired memory management action for the
> target. Specifically, the format is::
>
> <min/max size> <min/max access frequency> <min/max age> <action>
>
> The filtering conditions are size of memory region, number of accesses
> to the region monitored by DAMON, and the age of the region. The age of
> region is incremented periodically but reset when its addresses or
> access frequency has significantly changed or the action of a scheme was
> applied. For the action, current implementation supports only a few of
> madvise() hints, ``MADV_WILLNEED``, ``MADV_COLD``, ``MADV_PAGEOUT``,
> ``MADV_HUGEPAGE``, and ``MADV_NOHUGEPAGE``.

I am missing some important information. Is this specified for *all*
user space processes? If not, how is it configured? What are examples?
E.g., messing with ``MADV_HUGEPAGE`` vs. ``MADV_NOHUGEPAGE`` of random
applications can change the behavior of, or even break, these applications
(e.g., if userfaultfd is in use and the application explicitly sets
``MADV_NOHUGEPAGE``).
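
To make the concern concrete, here is a minimal sketch of an application that
deliberately opts a mapping out of THP, for instance because it wants to
handle faults at base-page granularity. The map_without_thp() helper is a
hypothetical name; mmap() and madvise() are the real interfaces. As far as I
can tell, a later MADV_HUGEPAGE on the same range, issued on the
application's behalf by a scheme, would replace this setting without the
application noticing.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

/* Older libc headers may lack this; the value is from the Linux UAPI. */
#ifndef MADV_NOHUGEPAGE
#define MADV_NOHUGEPAGE 15
#endif

/*
 * Hypothetical helper: map anonymous memory and explicitly disable THP
 * for it. Returns NULL on failure.
 */
void *map_without_thp(size_t len)
{
	void *p;

	assert(len > 0);
	p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return NULL;

	/*
	 * EINVAL is tolerated: a kernel built without THP support rejects
	 * the hint, and then there is nothing to opt out of.
	 */
	if (madvise(p, len, MADV_NOHUGEPAGE) != 0 && errno != EINVAL) {
		munmap(p, len);
		return NULL;
	}
	return p;
}
```
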
>
> Signed-off-by: SeongJae Park <sjpark@amazon.de>
> ---
> include/linux/damon.h | 50 ++++++++++++++
> mm/damon.c | 149 ++++++++++++++++++++++++++++++++++++++++++
> 2 files changed, 199 insertions(+)
>
> diff --git a/include/linux/damon.h b/include/linux/damon.h
> index 6a8ff2c63c2a..842a01e80c6e 100644
> --- a/include/linux/damon.h
> +++ b/include/linux/damon.h
> @@ -55,6 +55,52 @@ struct damon_task {
> struct list_head list;
> };
>
> +/**
> + * enum damos_action - Represents an action of a Data Access Monitoring-based
> + * Operation Scheme.
> + *
> + * @DAMOS_WILLNEED: Call ``madvise()`` for the region with MADV_WILLNEED.
> + * @DAMOS_COLD: Call ``madvise()`` for the region with MADV_COLD.
> + * @DAMOS_PAGEOUT: Call ``madvise()`` for the region with MADV_PAGEOUT.
> + * @DAMOS_HUGEPAGE: Call ``madvise()`` for the region with MADV_HUGEPAGE.
> + * @DAMOS_NOHUGEPAGE: Call ``madvise()`` for the region with MADV_NOHUGEPAGE.
> + * @DAMOS_ACTION_LEN: Number of supported actions.
> + */
> +enum damos_action {
> + DAMOS_WILLNEED,
> + DAMOS_COLD,
> + DAMOS_PAGEOUT,
> + DAMOS_HUGEPAGE,
> + DAMOS_NOHUGEPAGE,
> + DAMOS_ACTION_LEN,
> +};
> +
> +/**
> + * struct damos - Represents a Data Access Monitoring-based Operation Scheme.
> + * @min_sz_region: Minimum size of target regions.
> + * @max_sz_region: Maximum size of target regions.
> + * @min_nr_accesses: Minimum ``->nr_accesses`` of target regions.
> + * @max_nr_accesses: Maximum ``->nr_accesses`` of target regions.
> + * @min_age_region: Minimum age of target regions.
> + * @max_age_region: Maximum age of target regions.
> + * @action: &damos_action to be applied to the target regions.
> + * @list: List head for siblings.
> + *
> + * For each aggregation interval, DAMON applies @action to monitoring target
> + * regions fit in the condition and updates the statistics. Note that both
> + * the minimums and the maximums are inclusive.
> + */
> +struct damos {
> + unsigned int min_sz_region;
> + unsigned int max_sz_region;
> + unsigned int min_nr_accesses;
> + unsigned int max_nr_accesses;
> + unsigned int min_age_region;
> + unsigned int max_age_region;
> + enum damos_action action;
> + struct list_head list;
> +};
> +
> /**
> * struct damon_ctx - Represents a context for each monitoring. This is the
> * main interface that allows users to set the attributes and get the results
> @@ -98,6 +144,7 @@ struct damon_task {
> * @kdamond_lock. Accesses to other fields must be protected by themselves.
> *
> * @tasks_list: Head of monitoring target tasks (&damon_task) list.
> + * @schemes_list: Head of schemes (&damos) list.
> *
> * @sample_cb: Called for each sampling interval.
> * @aggregate_cb: Called for each aggregation interval.
> @@ -128,6 +175,7 @@ struct damon_ctx {
> struct mutex kdamond_lock;
>
> struct list_head tasks_list; /* 'damon_task' objects */
> + struct list_head schemes_list; /* 'damos' objects */
>
> /* callbacks */
> void (*sample_cb)(struct damon_ctx *context);
> @@ -138,6 +186,8 @@ int damon_set_pids(struct damon_ctx *ctx, int *pids, ssize_t nr_pids);
> int damon_set_attrs(struct damon_ctx *ctx, unsigned long sample_int,
> unsigned long aggr_int, unsigned long regions_update_int,
> unsigned long min_nr_reg, unsigned long max_nr_reg);
> +int damon_set_schemes(struct damon_ctx *ctx,
> + struct damos **schemes, ssize_t nr_schemes);
> int damon_set_recording(struct damon_ctx *ctx,
> unsigned int rbuf_len, char *rfile_path);
> int damon_start(struct damon_ctx *ctx);
> diff --git a/mm/damon.c b/mm/damon.c
> index 17ec5fcc1b96..1ec6fa3dd671 100644
> --- a/mm/damon.c
> +++ b/mm/damon.c
> @@ -22,6 +22,7 @@
>
> #define CREATE_TRACE_POINTS
>
> +#include <asm-generic/mman-common.h>
> #include <linux/damon.h>
> #include <linux/debugfs.h>
> #include <linux/delay.h>
> @@ -67,6 +68,12 @@
> #define damon_for_each_task_safe(t, next, ctx) \
> list_for_each_entry_safe(t, next, &(ctx)->tasks_list, list)
>
> +#define damon_for_each_scheme(s, ctx) \
> + list_for_each_entry(s, &(ctx)->schemes_list, list)
> +
> +#define damon_for_each_scheme_safe(s, next, ctx) \
> + list_for_each_entry_safe(s, next, &(ctx)->schemes_list, list)
> +
> #define MAX_RECORD_BUFFER_LEN (4 * 1024 * 1024)
> #define MAX_RFILE_PATH_LEN 256
>
> @@ -181,6 +188,27 @@ static void damon_destroy_task(struct damon_task *t)
> damon_free_task(t);
> }
>
> +static void damon_add_scheme(struct damon_ctx *ctx, struct damos *s)
> +{
> + list_add_tail(&s->list, &ctx->schemes_list);
> +}
> +
> +static void damon_del_scheme(struct damos *s)
> +{
> + list_del(&s->list);
> +}
> +
> +static void damon_free_scheme(struct damos *s)
> +{
> + kfree(s);
> +}
> +
> +static void damon_destroy_scheme(struct damos *s)
> +{
> + damon_del_scheme(s);
> + damon_free_scheme(s);
> +}
> +
> static unsigned int nr_damon_tasks(struct damon_ctx *ctx)
> {
> struct damon_task *t;
> @@ -779,6 +807,101 @@ static void kdamond_reset_aggregated(struct damon_ctx *c)
> }
> }
>
> +#ifndef CONFIG_ADVISE_SYSCALLS
> +static int damos_madvise(struct damon_task *task, struct damon_region *r,
> + int behavior)
> +{
> + return -EINVAL;
> +}
> +#else
> +static int damos_madvise(struct damon_task *task, struct damon_region *r,
> + int behavior)
> +{
> + struct task_struct *t;
> + struct mm_struct *mm;
> + int ret = -ENOMEM;
> +
> + t = damon_get_task_struct(task);
> + if (!t)
> + goto out;
> + mm = damon_get_mm(task);
> + if (!mm)
> + goto put_task_out;
> +
> + ret = do_madvise(t, mm, PAGE_ALIGN(r->vm_start),
> + PAGE_ALIGN(r->vm_end - r->vm_start), behavior);
> + mmput(mm);
> +put_task_out:
> + put_task_struct(t);
> +out:
> + return ret;
> +}
> +#endif /* CONFIG_ADVISE_SYSCALLS */
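
A side note on the arithmetic in damos_madvise(): PAGE_ALIGN() rounds up, and
it is applied separately to the start address and to the length. The
following user-space sketch reproduces that computation, assuming 4 KiB pages;
SKETCH_PAGE_SIZE, SKETCH_PAGE_ALIGN(), struct advise_range and
range_for_region() are hypothetical names, not kernel interfaces. If region
boundaries are always page aligned, both roundings are no-ops. If they are
not, the advised range can start after vm_start and end after vm_end.

```c
#include <assert.h>

/* Assumption: 4 KiB pages. The kernel uses PAGE_SIZE and PAGE_ALIGN(). */
#define SKETCH_PAGE_SIZE 4096UL
#define SKETCH_PAGE_ALIGN(x) \
	(((x) + SKETCH_PAGE_SIZE - 1) & ~(SKETCH_PAGE_SIZE - 1))

/* Hypothetical: the (start, len) pair that would be passed to do_madvise(). */
struct advise_range {
	unsigned long start;
	unsigned long len;
};

/* Mirror the start/length computation from the quoted damos_madvise(). */
struct advise_range range_for_region(unsigned long vm_start,
				     unsigned long vm_end)
{
	struct advise_range ar;

	assert(vm_start <= vm_end);
	ar.start = SKETCH_PAGE_ALIGN(vm_start);		/* rounds up */
	ar.len = SKETCH_PAGE_ALIGN(vm_end - vm_start);	/* rounds up */
	return ar;
}
```

For example, a region [0x1800, 0x3800) yields start 0x2000 and length 0x2000,
so the advised range is [0x2000, 0x4000), which extends past the region end.
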
> +
> +static int damos_do_action(struct damon_task *task, struct damon_region *r,
> + enum damos_action action)
> +{
> + int madv_action;
> +
> + switch (action) {
> + case DAMOS_WILLNEED:
> + madv_action = MADV_WILLNEED;
> + break;
> + case DAMOS_COLD:
> + madv_action = MADV_COLD;
> + break;
> + case DAMOS_PAGEOUT:
> + madv_action = MADV_PAGEOUT;
> + break;
> + case DAMOS_HUGEPAGE:
> + madv_action = MADV_HUGEPAGE;
> + break;
> + case DAMOS_NOHUGEPAGE:
> + madv_action = MADV_NOHUGEPAGE;
> + break;
> + default:
> + pr_warn("Wrong action %d\n", action);
> + return -EINVAL;
> + }
> +
> + return damos_madvise(task, r, madv_action);
> +}
> +
> +static void damon_do_apply_schemes(struct damon_ctx *c, struct damon_task *t,
> + struct damon_region *r)
> +{
> + struct damos *s;
> + unsigned long sz;
> +
> + damon_for_each_scheme(s, c) {
> + sz = r->vm_end - r->vm_start;
> + if ((s->min_sz_region && sz < s->min_sz_region) ||
> + (s->max_sz_region && s->max_sz_region < sz))
> + continue;
> + if ((s->min_nr_accesses && r->nr_accesses < s->min_nr_accesses)
> + || (s->max_nr_accesses &&
> + s->max_nr_accesses < r->nr_accesses))
> + continue;
> + if ((s->min_age_region && r->age < s->min_age_region) ||
> + (s->max_age_region &&
> + s->max_age_region < r->age))
> + continue;
> + damos_do_action(t, r, s->action);
> + r->age = 0;
> + }
> +}
> +
> +static void kdamond_apply_schemes(struct damon_ctx *c)
> +{
> + struct damon_task *t;
> + struct damon_region *r;
> +
> + damon_for_each_task(t, c) {
> + damon_for_each_region(r, t)
> + damon_do_apply_schemes(c, t, r);
> + }
> +}
> +
> #define sz_damon_region(r) (r->vm_end - r->vm_start)
>
> /*
> @@ -1001,6 +1124,7 @@ static int kdamond_fn(void *data)
> kdamond_merge_regions(ctx, max_nr_accesses / 10);
> if (ctx->aggregate_cb)
> ctx->aggregate_cb(ctx);
> + kdamond_apply_schemes(ctx);
> kdamond_reset_aggregated(ctx);
> kdamond_split_regions(ctx);
> }
> @@ -1081,6 +1205,30 @@ int damon_stop(struct damon_ctx *ctx)
> return -EPERM;
> }
>
> +/**
> + * damon_set_schemes() - Set data access monitoring based operation schemes.
> + * @ctx: monitoring context
> + * @schemes: array of the schemes
> + * @nr_schemes: number of entries in @schemes
> + *
> + * This function should not be called while the kdamond of the context is
> + * running.
> + *
> + * Return: 0 if success, or negative error code otherwise.
> + */
> +int damon_set_schemes(struct damon_ctx *ctx, struct damos **schemes,
> + ssize_t nr_schemes)
> +{
> + struct damos *s, *next;
> + ssize_t i;
> +
> + damon_for_each_scheme_safe(s, next, ctx)
> + damon_destroy_scheme(s);
> + for (i = 0; i < nr_schemes; i++)
> + damon_add_scheme(ctx, schemes[i]);
> + return 0;
> +}
> +
> /**
> * damon_set_pids() - Set monitoring target processes.
> * @ctx: monitoring context
> @@ -1525,6 +1673,7 @@ static int __init damon_init_user_ctx(void)
> mutex_init(&ctx->kdamond_lock);
>
> INIT_LIST_HEAD(&ctx->tasks_list);
> + INIT_LIST_HEAD(&ctx->schemes_list);
>
> return 0;
> }
>
--
Thanks,
David / dhildenb
Thread overview: 12+ messages
2020-06-09 6:53 [RFC v11 0/8] Implement Data Access Monitoring-based Memory Operation Schemes SeongJae Park
2020-06-09 6:53 ` [RFC v11 1/8] mm/madvise: Export do_madvise() to external GPL modules SeongJae Park
2020-06-09 6:53 ` [RFC v11 2/8] mm/damon: Account age of target regions SeongJae Park
2020-06-09 6:53 ` [RFC v11 3/8] mm/damon: Implement data access monitoring-based operation schemes SeongJae Park
2020-06-09 8:47 ` David Hildenbrand [this message]
2020-06-09 9:17 ` SeongJae Park
2020-06-09 14:07 ` David Hildenbrand
2020-06-09 6:53 ` [RFC v11 4/8] mm/damon/schemes: Implement a debugfs interface SeongJae Park
2020-06-09 6:53 ` [RFC v11 5/8] mm/damon/schemes: Implement statistics feature SeongJae Park
2020-06-09 6:53 ` [RFC v11 6/8] mm/damon/selftests: Add 'schemes' debugfs tests SeongJae Park
2020-06-09 6:53 ` [RFC v11 7/8] damon/tools: Support more human friendly 'schemes' control SeongJae Park
2020-06-09 6:53 ` [RFC v11 8/8] Documentation/admin-guide/mm: Document DAMON-based operation schemes SeongJae Park