Linux-mm Archive on lore.kernel.org
 help / color / Atom feed
From: <vrd@amazon.com>
To: SeongJae Park <sjpark@amazon.com>, <akpm@linux-foundation.org>
Cc: SeongJae Park <sjpark@amazon.de>, <Jonathan.Cameron@Huawei.com>,
	<aarcange@redhat.com>, <acme@kernel.org>,
	<alexander.shishkin@linux.intel.com>, <amit@kernel.org>,
	<benh@kernel.crashing.org>, <brendan.d.gregg@gmail.com>,
	<brendanhiggins@google.com>, <cai@lca.pw>,
	<colin.king@canonical.com>, <corbet@lwn.net>, <dwmw@amazon.com>,
	<foersleo@amazon.de>, <irogers@google.com>, <jolsa@redhat.com>,
	<kirill@shutemov.name>, <mark.rutland@arm.com>, <mgorman@suse.de>,
	<minchan@kernel.org>, <mingo@redhat.com>, <namhyung@kernel.org>,
	<peterz@infradead.org>, <rdunlap@infradead.org>,
	<riel@surriel.com>, <rientjes@google.com>, <rostedt@goodmis.org>,
	<sblbir@amazon.com>, <shakeelb@google.com>, <shuah@kernel.org>,
	<sj38.park@gmail.com>, <snu@amazon.de>, <vbabka@suse.cz>,
	<vdavydov.dev@gmail.com>, <yang.shi@linux.alibaba.com>,
	<ying.huang@intel.com>, <david@redhat.com>,
	<linux-damon@amazon.com>, <linux-mm@kvack.org>,
	<linux-doc@vger.kernel.org>, <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH v15 03/14] mm/damon: Implement region based sampling
Date: Wed, 10 Jun 2020 22:36:00 +0200
Message-ID: <e9c0655b-0b6c-46b2-275d-22bdcd01c66f@amazon.com> (raw)
In-Reply-To: <20200608114047.26589-4-sjpark@amazon.com>

On 6/8/20 1:40 PM, SeongJae Park wrote:
> From: SeongJae Park <sjpark@amazon.de>
> 
> This commit implements DAMON's basic access check and region based
> sampling mechanisms.  This change would seems make no sense, mainly
> because it is only a part of the DAMON's logics.  Following two commits
> will make more sense.
> 
> Basic Access Check
> ------------------
> 
> DAMON basically reports what pages are how frequently accessed.  Note
> that the frequency is not an absolute number of accesses, but a relative
> frequency among the pages of the target workloads.
> 
> Users can control the resolution of the reports by setting two time
> intervals, ``sampling interval`` and ``aggregation interval``.  In
> detail, DAMON checks access to each page per ``sampling interval``,
> aggregates the results (counts the number of the accesses to each page),
> and reports the aggregated results per ``aggregation interval``.  For
> the access check of each page, DAMON uses the Accessed bits of PTEs.
> 
> This is thus similar to common periodic access checks based access
> tracking mechanisms, which overhead is increasing as the size of the
> target process grows.
> 
> Region Based Sampling
> ---------------------
> 
> To avoid the unbounded increase of the overhead, DAMON groups a number
> of adjacent pages that assumed to have same access frequencies into a
> region.  As long as the assumption (pages in a region have same access
> frequencies) is kept, only one page in the region is required to be
> checked.  Thus, for each ``sampling interval``, DAMON randomly picks one
> page in each region and clears its Accessed bit.  After one more
> ``sampling interval``, DAMON reads the Accessed bit of the page and
> increases the access frequency of the region if the bit has set
> meanwhile.  Therefore, the monitoring overhead is controllable by
> setting the number of regions.
> 
> Nonetheless, this scheme cannot preserve the quality of the output if
> the assumption is not kept.  Following commit will introduce how we can
> make the guarantee with best effort.
> 
> Signed-off-by: SeongJae Park <sjpark@amazon.de>
> Reviewed-by: Leonard Foerster <foersleo@amazon.de>
> ---
>  include/linux/damon.h |  48 +++-
>  mm/damon.c            | 615 +++++++++++++++++++++++++++++++++++++++++-
>  2 files changed, 660 insertions(+), 3 deletions(-)
> 
> diff --git a/include/linux/damon.h b/include/linux/damon.h
> index 135633334929..f0fe4520a4e9 100644
> --- a/include/linux/damon.h
> +++ b/include/linux/damon.h
> @@ -11,6 +11,8 @@
>  #define _DAMON_H_
>  
>  #include <linux/random.h>
> +#include <linux/mutex.h>
> +#include <linux/time64.h>
>  #include <linux/types.h>
>  
>  /**
> @@ -44,11 +46,55 @@ struct damon_task {
>  };
>  
>  /**
> - * struct damon_ctx - Represents a context for each monitoring.
> + * struct damon_ctx - Represents a context for each monitoring.  This is the
> + * main interface that allows users to set the attributes and get the results
> + * of the monitoring.
> + *
> + * For each monitoring request (damon_start()), a kernel thread for the
> + * monitoring is created.  The pointer to the thread is stored in @kdamond.
> + *
> + * @sample_interval:		The time between access samplings.
> + * @aggr_interval:		The time between monitor results aggregations.
> + * @min_nr_regions:		The number of initial monitoring regions.
> + *
> + * For each @sample_interval, DAMON checks whether each region is accessed or
> + * not.  It aggregates and keeps the access information (number of accesses to
> + * each region) for @aggr_interval time.  All time intervals are in
> + * micro-seconds.
> + *
> + * @kdamond:		Kernel thread who does the monitoring.
> + * @kdamond_stop:	Notifies whether kdamond should stop.
> + * @kdamond_lock:	Mutex for the synchronizations with @kdamond.
> + *
> + * The monitoring thread sets @kdamond to NULL when it terminates.  Therefore,
> + * users can know whether the monitoring is ongoing or terminated by reading
> + * @kdamond.  Also, users can ask @kdamond to be terminated by writing non-zero
> + * to @kdamond_stop.  Reads and writes to @kdamond and @kdamond_stop from
> + * outside of the monitoring thread must be protected by @kdamond_lock.
> + *
> + * Note that the monitoring thread protects only @kdamond and @kdamond_stop via
> + * @kdamond_lock.  Accesses to other fields must be protected by themselves.
> + *
>   * @tasks_list:		Head of monitoring target tasks (&damon_task) list.
>   */
>  struct damon_ctx {
> +	unsigned long sample_interval;
> +	unsigned long aggr_interval;
> +	unsigned long min_nr_regions;
> +
> +	struct timespec64 last_aggregation;
> +
> +	struct task_struct *kdamond;
> +	bool kdamond_stop;
> +	struct mutex kdamond_lock;
> +
>  	struct list_head tasks_list;	/* 'damon_task' objects */
>  };
>  
> +int damon_set_pids(struct damon_ctx *ctx, int *pids, ssize_t nr_pids);
> +int damon_set_attrs(struct damon_ctx *ctx, unsigned long sample_int,
> +		unsigned long aggr_int, unsigned long min_nr_reg);
> +int damon_start(struct damon_ctx *ctx);
> +int damon_stop(struct damon_ctx *ctx);
> +
>  #endif
> diff --git a/mm/damon.c b/mm/damon.c
> index 170e8a694dbe..fa14ff7dd31a 100644
> --- a/mm/damon.c
> +++ b/mm/damon.c
> @@ -9,18 +9,29 @@
>   * This file is constructed in below parts.
>   *
>   * - Functions and macros for DAMON data structures
> + * - Functions for the initial monitoring target regions construction
> + * - Functions for the access checking of the regions
> + * - Functions for DAMON core logics and features
> + * - Functions for the DAMON programming interface
>   * - Functions for the module loading/unloading
> - *
> - * The core parts are not implemented yet.
>   */
>  
>  #define pr_fmt(fmt) "damon: " fmt
>  
>  #include <linux/damon.h>
> +#include <linux/delay.h>
> +#include <linux/kthread.h>
>  #include <linux/mm.h>
>  #include <linux/module.h>
> +#include <linux/page_idle.h>
> +#include <linux/random.h>
> +#include <linux/sched/mm.h>
> +#include <linux/sched/task.h>
>  #include <linux/slab.h>
>  
> +/* Minimal region size.  Every damon_region is aligned by this. */
> +#define MIN_REGION PAGE_SIZE
> +
>  /*
>   * Functions and macros for DAMON data structures
>   */
> @@ -167,6 +178,606 @@ static unsigned int nr_damon_regions(struct damon_task *t)
>  	return nr_regions;
>  }
>  
> +/*
> + * Get the mm_struct of the given task
> + *
> + * Caller _must_ put the mm_struct after use, unless it is NULL.
> + *
> + * Returns the mm_struct of the task on success, NULL on failure
> + */
> +static struct mm_struct *damon_get_mm(struct damon_task *t)
> +{
> +	struct task_struct *task;
> +	struct mm_struct *mm;
> +
> +	task = damon_get_task_struct(t);
> +	if (!task)
> +		return NULL;
> +
> +	mm = get_task_mm(task);
> +	put_task_struct(task);
> +	return mm;
> +}
> +
> +/*
> + * Functions for the initial monitoring target regions construction
> + */
> +
> +/*
> + * Size-evenly split a region into 'nr_pieces' small regions
> + *
> + * Returns 0 on success, or negative error code otherwise.
> + */
> +static int damon_split_region_evenly(struct damon_ctx *ctx,
> +		struct damon_region *r, unsigned int nr_pieces)
> +{
> +	unsigned long sz_orig, sz_piece, orig_end;
> +	struct damon_region *n = NULL, *next;
> +	unsigned long start;
> +
> +	if (!r || !nr_pieces)
> +		return -EINVAL;
> +
> +	orig_end = r->vm_end;
> +	sz_orig = r->vm_end - r->vm_start;
> +	sz_piece = ALIGN_DOWN(sz_orig / nr_pieces, MIN_REGION);
> +
> +	if (!sz_piece)
> +		return -EINVAL;
> +
> +	r->vm_end = r->vm_start + sz_piece;
> +	next = damon_next_region(r);
> +	for (start = r->vm_end; start + sz_piece <= orig_end;
> +			start += sz_piece) {
> +		n = damon_new_region(ctx, start, start + sz_piece);
> +		if (!n)
> +			return -ENOMEM;
> +		damon_insert_region(n, r, next);
> +		r = n;
> +	}
> +	/* complement last region for possible rounding error */
> +	if (n)
> +		n->vm_end = orig_end;
> +
> +	return 0;
> +}
> +
> +struct region {
> +	unsigned long start;
> +	unsigned long end;
> +};
> +
> +static unsigned long sz_region(struct region *r)
> +{
> +	return r->end - r->start;
> +}
> +
> +static void swap_regions(struct region *r1, struct region *r2)
> +{
> +	struct region tmp;
> +
> +	tmp = *r1;
> +	*r1 = *r2;
> +	*r2 = tmp;
> +}
> +
> +/*
> + * Find three regions separated by two biggest unmapped regions
> + *
> + * vma		the head vma of the target address space
> + * regions	an array of three 'struct region's that results will be saved
> + *
> + * This function receives an address space and finds three regions in it which
> + * separated by the two biggest unmapped regions in the space.  Please refer to
> + * below comments of 'damon_init_regions_of()' function to know why this is
> + * necessary.
> + *
> + * Returns 0 if success, or negative error code otherwise.
> + */
> +static int damon_three_regions_in_vmas(struct vm_area_struct *vma,
> +		struct region regions[3])
> +{
> +	struct region gap = {0}, first_gap = {0}, second_gap = {0};
> +	struct vm_area_struct *last_vma = NULL;
> +	unsigned long start = 0;
> +
> +	/* Find two biggest gaps so that first_gap > second_gap > others */
> +	for (; vma; vma = vma->vm_next) {

Since vm_area_struct already maintains information about the largest gap below this vma
in the mm_rb rbtree, walking the vma via mm_rb instead of the linked list, and skipping
the ones with don't fit the gap requirement via vma->rb_subtree_gap helps avoid the
extra comparisons in this function.

I measured the following implementation to be considerably faster as the number of
vmas grows for a process damon would attach to:

-static int damon_three_regions_in_vmas(struct vm_area_struct *vma,
+static int damon_three_regions_in_vmas(struct rb_root *root,
 		struct region regions[3])
 {
+	struct rb_node *nd = NULL;
 	struct region gap = {0}, first_gap = {0}, second_gap = {0};
-	struct vm_area_struct *last_vma = NULL;
+	struct vm_area_struct *vma = NULL;
 	unsigned long start = 0;
 
 	/* Find two biggest gaps so that first_gap > second_gap > others */
-	for (; vma; vma = vma->vm_next) {
-		if (!last_vma) {
-			start = vma->vm_start;
-			last_vma = vma;
+	for (nd = rb_first(root); nd; nd = rb_next(nd)) {
+		vma = rb_entry(nd, struct vm_area_struct, vm_rb);
+
+		if (vma->rb_subtree_gap < sz_region(&second_gap)) {
+			/*
+			 * Skip this vma if the largest gap at this vma is still
+			 * smaller than what we have encountered so far.
+			 */
 			continue;
 		}
-		gap.start = last_vma->vm_end;
+		if (!vma->vm_prev) {
+			/* This is the first vma. */
+			start = vma->vm_start;
+			continue;
+		}
+		gap.start = vma->vm_prev->vm_end;
 		gap.end = vma->vm_start;
 		if (sz_region(&gap) > sz_region(&second_gap)) {
 			swap_regions(&gap, &second_gap);
 			if (sz_region(&second_gap) > sz_region(&first_gap))
 				swap_regions(&second_gap, &first_gap);
 		}
-		last_vma = vma;
 	}
 
 	if (!sz_region(&second_gap) || !sz_region(&first_gap))
@@ -35,7 +44,7 @@
 	regions[1].start = ALIGN(first_gap.end, MIN_REGION);
 	regions[1].end = ALIGN(second_gap.start, MIN_REGION);
 	regions[2].start = ALIGN(second_gap.end, MIN_REGION);
-	regions[2].end = ALIGN(last_vma->vm_end, MIN_REGION);
+	regions[2].end = ALIGN(vma->vm_end, MIN_REGION);
 
 	return 0;
 }


> +		if (!last_vma) {
> +			start = vma->vm_start;
> +			last_vma = vma;
> +			continue;
> +		}
> +		gap.start = last_vma->vm_end;
> +		gap.end = vma->vm_start;
> +		if (sz_region(&gap) > sz_region(&second_gap)) {
> +			swap_regions(&gap, &second_gap);
> +			if (sz_region(&second_gap) > sz_region(&first_gap))
> +				swap_regions(&second_gap, &first_gap);
> +		}
> +		last_vma = vma;
> +	}
> +
> +	if (!sz_region(&second_gap) || !sz_region(&first_gap))
> +		return -EINVAL;
> +
> +	/* Sort the two biggest gaps by address */
> +	if (first_gap.start > second_gap.start)
> +		swap_regions(&first_gap, &second_gap);
> +
> +	/* Store the result */
> +	regions[0].start = ALIGN(start, MIN_REGION);
> +	regions[0].end = ALIGN(first_gap.start, MIN_REGION);
> +	regions[1].start = ALIGN(first_gap.end, MIN_REGION);
> +	regions[1].end = ALIGN(second_gap.start, MIN_REGION);
> +	regions[2].start = ALIGN(second_gap.end, MIN_REGION);
> +	regions[2].end = ALIGN(last_vma->vm_end, MIN_REGION);
> +
> +	return 0;
> +}
> +
> +/*
> + * Get the three regions in the given task
> + *
> + * Returns 0 on success, negative error code otherwise.
> + */
> +static int damon_three_regions_of(struct damon_task *t,
> +				struct region regions[3])
> +{
> +	struct mm_struct *mm;
> +	int rc;
> +
> +	mm = damon_get_mm(t);
> +	if (!mm)
> +		return -EINVAL;
> +
> +	down_read(&mm->mmap_sem);
> +	rc = damon_three_regions_in_vmas(mm->mmap, regions);
> +	up_read(&mm->mmap_sem);
> +
> +	mmput(mm);
> +	return rc;
> +}
> +
> +/*
> + * Initialize the monitoring target regions for the given task
> + *
> + * t	the given target task
> + *
> + * Because only a number of small portions of the entire address space
> + * is actually mapped to the memory and accessed, monitoring the unmapped
> + * regions is wasteful.  That said, because we can deal with small noises,
> + * tracking every mapping is not strictly required but could even incur a high
> + * overhead if the mapping frequently changes or the number of mappings is
> + * high.  Nonetheless, this may seems very weird.  DAMON's dynamic regions
> + * adjustment mechanism, which will be implemented with following commit will
> + * make this more sense.
> + *
> + * For the reason, we convert the complex mappings to three distinct regions
> + * that cover every mapped area of the address space.  Also the two gaps
> + * between the three regions are the two biggest unmapped areas in the given
> + * address space.  In detail, this function first identifies the start and the
> + * end of the mappings and the two biggest unmapped areas of the address space.
> + * Then, it constructs the three regions as below:
> + *
> + *     [mappings[0]->start, big_two_unmapped_areas[0]->start)
> + *     [big_two_unmapped_areas[0]->end, big_two_unmapped_areas[1]->start)
> + *     [big_two_unmapped_areas[1]->end, mappings[nr_mappings - 1]->end)
> + *
> + * As usual memory map of processes is as below, the gap between the heap and
> + * the uppermost mmap()-ed region, and the gap between the lowermost mmap()-ed
> + * region and the stack will be two biggest unmapped regions.  Because these
> + * gaps are exceptionally huge areas in usual address space, excluding these
> + * two biggest unmapped regions will be sufficient to make a trade-off.
> + *
> + *   <heap>
> + *   <BIG UNMAPPED REGION 1>
> + *   <uppermost mmap()-ed region>
> + *   (other mmap()-ed regions and small unmapped regions)
> + *   <lowermost mmap()-ed region>
> + *   <BIG UNMAPPED REGION 2>
> + *   <stack>
> + */
> +static void damon_init_regions_of(struct damon_ctx *c, struct damon_task *t)
> +{
> +	struct damon_region *r, *m = NULL;
> +	struct region regions[3];
> +	int i;
> +
> +	if (damon_three_regions_of(t, regions)) {
> +		pr_err("Failed to get three regions of task %d\n", t->pid);
> +		return;
> +	}
> +
> +	/* Set the initial three regions of the task */
> +	for (i = 0; i < 3; i++) {
> +		r = damon_new_region(c, regions[i].start, regions[i].end);
> +		if (!r) {
> +			pr_err("%d'th init region creation failed\n", i);
> +			return;
> +		}
> +		damon_add_region(r, t);
> +		if (i == 1)
> +			m = r;
> +	}
> +
> +	/* Split the middle region into 'min_nr_regions - 2' regions */
> +	if (damon_split_region_evenly(c, m, c->min_nr_regions - 2))
> +		pr_warn("Init middle region failed to be split\n");
> +}
> +
> +/* Initialize '->regions_list' of every task */
> +static void kdamond_init_regions(struct damon_ctx *ctx)
> +{
> +	struct damon_task *t;
> +
> +	damon_for_each_task(t, ctx)
> +		damon_init_regions_of(ctx, t);
> +}
> +
> +/*
> + * Functions for the access checking of the regions
> + */
> +
> +static void damon_mkold(struct mm_struct *mm, unsigned long addr)
> +{
> +	pte_t *pte = NULL;
> +	pmd_t *pmd = NULL;
> +	spinlock_t *ptl;
> +
> +	if (follow_pte_pmd(mm, addr, NULL, &pte, &pmd, &ptl))
> +		return;
> +
> +	if (pte) {
> +		if (pte_young(*pte)) {
> +			clear_page_idle(pte_page(*pte));
> +			set_page_young(pte_page(*pte));
> +		}
> +		*pte = pte_mkold(*pte);
> +		pte_unmap_unlock(pte, ptl);
> +		return;
> +	}
> +
> +#ifdef CONFIG_TRANSPARENT_HUGEPAGE
> +	if (pmd_young(*pmd)) {
> +		clear_page_idle(pmd_page(*pmd));
> +		set_page_young(pmd_page(*pmd));
> +	}
> +	*pmd = pmd_mkold(*pmd);
> +	spin_unlock(ptl);
> +#endif /* CONFIG_TRANSPARENT_HUGEPAGE */
> +}
> +
> +static void damon_prepare_access_check(struct damon_ctx *ctx,
> +			struct mm_struct *mm, struct damon_region *r)
> +{
> +	r->sampling_addr = damon_rand(r->vm_start, r->vm_end);
> +
> +	damon_mkold(mm, r->sampling_addr);
> +}
> +
> +static void kdamond_prepare_access_checks(struct damon_ctx *ctx)
> +{
> +	struct damon_task *t;
> +	struct mm_struct *mm;
> +	struct damon_region *r;
> +
> +	damon_for_each_task(t, ctx) {
> +		mm = damon_get_mm(t);
> +		if (!mm)
> +			continue;
> +		damon_for_each_region(r, t)
> +			damon_prepare_access_check(ctx, mm, r);
> +		mmput(mm);
> +	}
> +}
> +
> +static bool damon_young(struct mm_struct *mm, unsigned long addr,
> +			unsigned long *page_sz)
> +{
> +	pte_t *pte = NULL;
> +	pmd_t *pmd = NULL;
> +	spinlock_t *ptl;
> +	bool young = false;
> +
> +	if (follow_pte_pmd(mm, addr, NULL, &pte, &pmd, &ptl))
> +		return false;
> +
> +	*page_sz = PAGE_SIZE;
> +	if (pte) {
> +		young = pte_young(*pte);
> +		pte_unmap_unlock(pte, ptl);
> +		return young;
> +	}
> +
> +#ifdef CONFIG_TRANSPARENT_HUGEPAGE
> +	young = pmd_young(*pmd);
> +	spin_unlock(ptl);
> +	*page_sz = ((1UL) << HPAGE_PMD_SHIFT);
> +#endif	/* CONFIG_TRANSPARENT_HUGEPAGE */
> +
> +	return young;
> +}
> +
> +/*
> + * Check whether the region was accessed after the last preparation
> + *
> + * mm	'mm_struct' for the given virtual address space
> + * r	the region to be checked
> + */
> +static void damon_check_access(struct damon_ctx *ctx,
> +			       struct mm_struct *mm, struct damon_region *r)
> +{
> +	static struct mm_struct *last_mm;
> +	static unsigned long last_addr;
> +	static unsigned long last_page_sz = PAGE_SIZE;
> +	static bool last_accessed;
> +
> +	/* If the region is in the last checked page, reuse the result */
> +	if (mm == last_mm && (ALIGN_DOWN(last_addr, last_page_sz) ==
> +				ALIGN_DOWN(r->sampling_addr, last_page_sz))) {
> +		if (last_accessed)
> +			r->nr_accesses++;
> +		return;
> +	}
> +
> +	last_accessed = damon_young(mm, r->sampling_addr, &last_page_sz);
> +	if (last_accessed)
> +		r->nr_accesses++;
> +
> +	last_mm = mm;
> +	last_addr = r->sampling_addr;
> +}
> +
> +static void kdamond_check_accesses(struct damon_ctx *ctx)
> +{
> +	struct damon_task *t;
> +	struct mm_struct *mm;
> +	struct damon_region *r;
> +
> +	damon_for_each_task(t, ctx) {
> +		mm = damon_get_mm(t);
> +		if (!mm)
> +			continue;
> +		damon_for_each_region(r, t)
> +			damon_check_access(ctx, mm, r);
> +		mmput(mm);
> +	}
> +}
> +
> +/*
> + * Functions for DAMON core logics and features
> + */
> +
> +/*
> + * damon_check_reset_time_interval() - Check if a time interval is elapsed.
> + * @baseline:	the time to check whether the interval has elapsed since
> + * @interval:	the time interval (microseconds)
> + *
> + * See whether the given time interval has passed since the given baseline
> + * time.  If so, it also updates the baseline to current time for next check.
> + *
> + * Return:	true if the time interval has passed, or false otherwise.
> + */
> +static bool damon_check_reset_time_interval(struct timespec64 *baseline,
> +		unsigned long interval)
> +{
> +	struct timespec64 now;
> +
> +	ktime_get_coarse_ts64(&now);
> +	if ((timespec64_to_ns(&now) - timespec64_to_ns(baseline)) <
> +			interval * 1000)
> +		return false;
> +	*baseline = now;
> +	return true;
> +}
> +
> +/*
> + * Check whether it is time to flush the aggregated information
> + */
> +static bool kdamond_aggregate_interval_passed(struct damon_ctx *ctx)
> +{
> +	return damon_check_reset_time_interval(&ctx->last_aggregation,
> +			ctx->aggr_interval);
> +}
> +
> +/*
> + * Reset the aggregated monitoring results
> + */
> +static void kdamond_reset_aggregated(struct damon_ctx *c)
> +{
> +	struct damon_task *t;
> +	struct damon_region *r;
> +
> +	damon_for_each_task(t, c) {
> +		damon_for_each_region(r, t)
> +			r->nr_accesses = 0;
> +	}
> +}
> +
> +/*
> + * Check whether current monitoring should be stopped
> + *
> + * The monitoring is stopped when either the user requested to stop, or all
> + * monitoring target tasks are dead.
> + *
> + * Returns true if need to stop current monitoring.
> + */
> +static bool kdamond_need_stop(struct damon_ctx *ctx)
> +{
> +	struct damon_task *t;
> +	struct task_struct *task;
> +	bool stop;
> +
> +	mutex_lock(&ctx->kdamond_lock);
> +	stop = ctx->kdamond_stop;
> +	mutex_unlock(&ctx->kdamond_lock);
> +	if (stop)
> +		return true;
> +
> +	damon_for_each_task(t, ctx) {
> +		task = damon_get_task_struct(t);
> +		if (task) {
> +			put_task_struct(task);
> +			return false;
> +		}
> +	}
> +
> +	return true;
> +}
> +
> +/*
> + * The monitoring daemon that runs as a kernel thread
> + */
> +static int kdamond_fn(void *data)
> +{
> +	struct damon_ctx *ctx = (struct damon_ctx *)data;
> +	struct damon_task *t;
> +	struct damon_region *r, *next;
> +
> +	pr_info("kdamond (%d) starts\n", ctx->kdamond->pid);
> +	kdamond_init_regions(ctx);
> +	while (!kdamond_need_stop(ctx)) {
> +		kdamond_prepare_access_checks(ctx);
> +
> +		usleep_range(ctx->sample_interval, ctx->sample_interval + 1);
> +
> +		kdamond_check_accesses(ctx);
> +
> +		if (kdamond_aggregate_interval_passed(ctx))
> +			kdamond_reset_aggregated(ctx);
> +
> +	}
> +	damon_for_each_task(t, ctx) {
> +		damon_for_each_region_safe(r, next, t)
> +			damon_destroy_region(r);
> +	}
> +	pr_debug("kdamond (%d) finishes\n", ctx->kdamond->pid);
> +	mutex_lock(&ctx->kdamond_lock);
> +	ctx->kdamond = NULL;
> +	mutex_unlock(&ctx->kdamond_lock);
> +
> +	do_exit(0);
> +}
> +
> +/*
> + * Functions for the DAMON programming interface
> + */
> +
> +static bool damon_kdamond_running(struct damon_ctx *ctx)
> +{
> +	bool running;
> +
> +	mutex_lock(&ctx->kdamond_lock);
> +	running = ctx->kdamond != NULL;
> +	mutex_unlock(&ctx->kdamond_lock);
> +
> +	return running;
> +}
> +
> +/**
> + * damon_start() - Starts monitoring with given context.
> + * @ctx:	monitoring context
> + *
> + * Return: 0 on success, negative error code otherwise.
> + */
> +int damon_start(struct damon_ctx *ctx)
> +{
> +	int err = -EBUSY;
> +
> +	mutex_lock(&ctx->kdamond_lock);
> +	if (!ctx->kdamond) {
> +		err = 0;
> +		ctx->kdamond_stop = false;
> +		ctx->kdamond = kthread_run(kdamond_fn, ctx, "kdamond");
> +		if (IS_ERR(ctx->kdamond))
> +			err = PTR_ERR(ctx->kdamond);
> +	}
> +	mutex_unlock(&ctx->kdamond_lock);
> +
> +	return err;
> +}
> +
> +/**
> + * damon_stop() - Stops monitoring of given context.
> + * @ctx:	monitoring context
> + *
> + * Return: 0 on success, negative error code otherwise.
> + */
> +int damon_stop(struct damon_ctx *ctx)
> +{
> +	mutex_lock(&ctx->kdamond_lock);
> +	if (ctx->kdamond) {
> +		ctx->kdamond_stop = true;
> +		mutex_unlock(&ctx->kdamond_lock);
> +		while (damon_kdamond_running(ctx))
> +			usleep_range(ctx->sample_interval,
> +					ctx->sample_interval * 2);
> +		return 0;
> +	}
> +	mutex_unlock(&ctx->kdamond_lock);
> +
> +	return -EPERM;
> +}
> +
> +/**
> + * damon_set_pids() - Set monitoring target processes.
> + * @ctx:	monitoring context
> + * @pids:	array of target processes pids
> + * @nr_pids:	number of entries in @pids
> + *
> + * This function should not be called while the kdamond is running.
> + *
> + * Return: 0 on success, negative error code otherwise.
> + */
> +int damon_set_pids(struct damon_ctx *ctx, int *pids, ssize_t nr_pids)
> +{
> +	ssize_t i;
> +	struct damon_task *t, *next;
> +
> +	damon_for_each_task_safe(t, next, ctx)
> +		damon_destroy_task(t);
> +
> +	for (i = 0; i < nr_pids; i++) {
> +		t = damon_new_task(pids[i]);
> +		if (!t) {
> +			pr_err("Failed to alloc damon_task\n");
> +			return -ENOMEM;
> +		}
> +		damon_add_task(ctx, t);
> +	}
> +
> +	return 0;
> +}
> +
> +/**
> + * damon_set_attrs() - Set attributes for the monitoring.
> + * @ctx:		monitoring context
> + * @sample_int:		time interval between samplings
> + * @aggr_int:		time interval between aggregations
> + * @min_nr_reg:		minimal number of regions
> + *
> + * This function should not be called while the kdamond is running.
> + * Every time interval is in micro-seconds.
> + *
> + * Return: 0 on success, negative error code otherwise.
> + */
> +int damon_set_attrs(struct damon_ctx *ctx, unsigned long sample_int,
> +		unsigned long aggr_int, unsigned long min_nr_reg)
> +{
> +	if (min_nr_reg < 3) {
> +		pr_err("min_nr_regions (%lu) must be at least 3\n",
> +				min_nr_reg);
> +		return -EINVAL;
> +	}
> +
> +	ctx->sample_interval = sample_int;
> +	ctx->aggr_interval = aggr_int;
> +	ctx->min_nr_regions = min_nr_reg;
> +
> +	return 0;
> +}
> +
>  /*
>   * Functions for the module loading/unloading
>   */
> 




Amazon Development Center Germany GmbH
Krausenstr. 38
10117 Berlin
Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss
Eingetragen am Amtsgericht Charlottenburg unter HRB 149173 B
Sitz: Berlin
Ust-ID: DE 289 237 879



  reply index

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-06-08 11:40 [PATCH v15 00/14] Introduce Data Access MONitor (DAMON) SeongJae Park
2020-06-08 11:40 ` [PATCH v15 01/14] mm/page_ext: Export lookup_page_ext() to GPL modules SeongJae Park
2020-06-08 11:53   ` David Hildenbrand
2020-06-08 11:56     ` SeongJae Park
2020-06-08 15:49     ` Christoph Hellwig
2020-06-08 17:48       ` SeongJae Park
2020-06-08 18:15       ` David Hildenbrand
2020-06-10 20:13   ` vrd
2020-06-08 11:40 ` [PATCH v15 02/14] mm: Introduce Data Access MONitor (DAMON) SeongJae Park
2020-06-08 11:40 ` [PATCH v15 03/14] mm/damon: Implement region based sampling SeongJae Park
2020-06-10 20:36   ` vrd [this message]
2020-06-11  7:21     ` SeongJae Park
2020-06-08 11:40 ` [PATCH v15 04/14] mm/damon: Adaptively adjust regions SeongJae Park
2020-06-10 10:13   ` SeongJae Park
2020-06-08 11:40 ` [PATCH v15 05/14] mm/damon: Apply dynamic memory mapping changes SeongJae Park
2020-06-08 11:40 ` [PATCH v15 06/14] mm/damon: Implement callbacks SeongJae Park
2020-06-08 11:40 ` [PATCH v15 07/14] mm/damon: Implement access pattern recording SeongJae Park
2020-06-08 11:40 ` [PATCH v15 08/14] mm/damon: Add debugfs interface SeongJae Park
2020-06-08 11:40 ` [PATCH v15 09/14] mm/damon: Add tracepoints SeongJae Park
2020-06-08 11:40 ` [PATCH v15 10/14] tools: Add a minimal user-space tool for DAMON SeongJae Park
2020-06-08 11:40 ` [PATCH v15 11/14] Documentation/admin-guide/mm: Add a document " SeongJae Park
2020-06-08 11:40 ` [PATCH v15 12/14] mm/damon: Add kunit tests SeongJae Park
2020-06-08 11:40 ` [PATCH v15 13/14] mm/damon: Add user space selftests SeongJae Park
2020-06-08 11:40 ` [PATCH v15 14/14] MAINTAINERS: Update for DAMON SeongJae Park

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=e9c0655b-0b6c-46b2-275d-22bdcd01c66f@amazon.com \
    --to=vrd@amazon.com \
    --cc=Jonathan.Cameron@Huawei.com \
    --cc=aarcange@redhat.com \
    --cc=acme@kernel.org \
    --cc=akpm@linux-foundation.org \
    --cc=alexander.shishkin@linux.intel.com \
    --cc=amit@kernel.org \
    --cc=benh@kernel.crashing.org \
    --cc=brendan.d.gregg@gmail.com \
    --cc=brendanhiggins@google.com \
    --cc=cai@lca.pw \
    --cc=colin.king@canonical.com \
    --cc=corbet@lwn.net \
    --cc=david@redhat.com \
    --cc=dwmw@amazon.com \
    --cc=foersleo@amazon.de \
    --cc=irogers@google.com \
    --cc=jolsa@redhat.com \
    --cc=kirill@shutemov.name \
    --cc=linux-damon@amazon.com \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mark.rutland@arm.com \
    --cc=mgorman@suse.de \
    --cc=minchan@kernel.org \
    --cc=mingo@redhat.com \
    --cc=namhyung@kernel.org \
    --cc=peterz@infradead.org \
    --cc=rdunlap@infradead.org \
    --cc=riel@surriel.com \
    --cc=rientjes@google.com \
    --cc=rostedt@goodmis.org \
    --cc=sblbir@amazon.com \
    --cc=shakeelb@google.com \
    --cc=shuah@kernel.org \
    --cc=sj38.park@gmail.com \
    --cc=sjpark@amazon.com \
    --cc=sjpark@amazon.de \
    --cc=snu@amazon.de \
    --cc=vbabka@suse.cz \
    --cc=vdavydov.dev@gmail.com \
    --cc=yang.shi@linux.alibaba.com \
    --cc=ying.huang@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Linux-mm Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-mm/0 linux-mm/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-mm linux-mm/ https://lore.kernel.org/linux-mm \
		linux-mm@kvack.org
	public-inbox-index linux-mm

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kvack.linux-mm


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git