[PATCH v2 0/2] Add a new scheme to support demotion on tiered memory system

* [PATCH v2 0/2] Add a new scheme to support demotion on tiered memory system
@ 2021-12-22 11:14 Baolin Wang
  2021-12-22 11:14 ` [PATCH v2 1/2] mm: Export the demote_page_list() function Baolin Wang
                   ` (3 more replies)
  0 siblings, 4 replies; 18+ messages in thread
From: Baolin Wang @ 2021-12-22 11:14 UTC (permalink / raw)
  To: sj, akpm
  Cc: ying.huang, dave.hansen, ziy, shy828301, zhongjiang-ali, xlpang,
	baolin.wang, linux-mm, linux-kernel

Hi,

On a tiered memory system with different memory types, the reclaim path in
shrink_page_list() already supports demoting pages to a slow memory node
instead of discarding them. However, by the time reclaim runs, free memory
on the fast memory node is already close to its watermarks, so demoting
pages at that point increases memory allocation latency. Proactively
demoting cold pages, driven from user space, would be more helpful.

We can use DAMON from user space to monitor cold memory on the fast memory
node and proactively demote the cold pages to the slow memory node, keeping
the fast memory node in a healthy state.

This patch set introduces a new scheme named DAMOS_DEMOTE to support this
feature, and it works well in my testing. Any comments are welcome. Thanks.

Changes from v1:
 - Reuse the demote_page_list().
 - Fix some comment style issues.
 - Move the DAMOS_DEMOTE definition to the correct place.
 - Rename some functions.
 - Change damos_isolate_page() to return void.
 - Remove an unnecessary PAGE_ALIGN() in damos_demote().
 - Fix the return value of damos_demote().

Baolin Wang (2):
  mm: Export the demote_page_list() function
  mm/damon: Add a new scheme to support demotion on tiered memory system

 include/linux/damon.h |   3 ++
 mm/damon/dbgfs.c      |   1 +
 mm/damon/vaddr.c      | 147 ++++++++++++++++++++++++++++++++++++++++++++++++++
 mm/internal.h         |   2 +
 mm/vmscan.c           |   4 +-
 5 files changed, 155 insertions(+), 2 deletions(-)

-- 
1.8.3.1

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH v2 1/2] mm: Export the demote_page_list() function
  2021-12-22 11:14 [PATCH v2 0/2] Add a new scheme to support demotion on tiered memory system Baolin Wang
@ 2021-12-22 11:14 ` Baolin Wang
  2021-12-22 11:14 ` [PATCH v2 2/2] mm/damon: Add a new scheme to support demotion on tiered memory system Baolin Wang
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 18+ messages in thread
From: Baolin Wang @ 2021-12-22 11:14 UTC (permalink / raw)
  To: sj, akpm
  Cc: ying.huang, dave.hansen, ziy, shy828301, zhongjiang-ali, xlpang,
	baolin.wang, linux-mm, linux-kernel

Make demote_page_list() non-static and declare it in mm/internal.h, in
preparation for supporting page demotion in DAMON.

Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
---
 mm/internal.h | 2 ++
 mm/vmscan.c   | 4 ++--
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/mm/internal.h b/mm/internal.h
index deb9bda..f11e444 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -181,6 +181,8 @@ static inline void set_page_refcounted(struct page *page)
 extern int isolate_lru_page(struct page *page);
 extern void putback_lru_page(struct page *page);
 extern void reclaim_throttle(pg_data_t *pgdat, enum vmscan_throttle_state reason);
+extern unsigned int demote_page_list(struct list_head *demote_pages,
+				     struct pglist_data *pgdat);
 
 /*
  * in mm/rmap.c:
diff --git a/mm/vmscan.c b/mm/vmscan.c
index f3162a5..849dffa 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1480,8 +1480,8 @@ static struct page *alloc_demote_page(struct page *page, unsigned long node)
  * another node.  Pages which are not demoted are left on
  * @demote_pages.
  */
-static unsigned int demote_page_list(struct list_head *demote_pages,
-				     struct pglist_data *pgdat)
+unsigned int demote_page_list(struct list_head *demote_pages,
+			      struct pglist_data *pgdat)
 {
 	int target_nid = next_demotion_node(pgdat->node_id);
 	unsigned int nr_succeeded;
-- 
1.8.3.1


* [PATCH v2 2/2] mm/damon: Add a new scheme to support demotion on tiered memory system
  2021-12-22 11:14 [PATCH v2 0/2] Add a new scheme to support demotion on tiered memory system Baolin Wang
  2021-12-22 11:14 ` [PATCH v2 1/2] mm: Export the demote_page_list() function Baolin Wang
@ 2021-12-22 11:14 ` Baolin Wang
  2021-12-23  0:01 ` [PATCH v2 0/2] " Andrew Morton
  2021-12-23  1:07 ` Huang, Ying
  3 siblings, 0 replies; 18+ messages in thread
From: Baolin Wang @ 2021-12-22 11:14 UTC (permalink / raw)
  To: sj, akpm
  Cc: ying.huang, dave.hansen, ziy, shy828301, zhongjiang-ali, xlpang,
	baolin.wang, linux-mm, linux-kernel

On a tiered memory system, the reclaim path in shrink_page_list() already
supports demoting pages to a slow memory node instead of discarding them.
However, by the time reclaim runs, free memory on the fast memory node is
already close to its watermarks, so demoting pages at that point increases
memory allocation latency.

We can use DAMON from user space to monitor cold memory on the fast memory
node and proactively demote the cold pages to the slow memory node, keeping
the fast memory node in a healthy state. Thus this patch introduces a new
scheme named DAMOS_DEMOTE to support this feature.

Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
---
 include/linux/damon.h |   3 ++
 mm/damon/dbgfs.c      |   1 +
 mm/damon/vaddr.c      | 147 ++++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 151 insertions(+)

diff --git a/include/linux/damon.h b/include/linux/damon.h
index af64838..ec46c7a 100644
--- a/include/linux/damon.h
+++ b/include/linux/damon.h
@@ -87,6 +87,8 @@ struct damon_target {
  * @DAMOS_PAGEOUT:	Call ``madvise()`` for the region with MADV_PAGEOUT.
  * @DAMOS_HUGEPAGE:	Call ``madvise()`` for the region with MADV_HUGEPAGE.
  * @DAMOS_NOHUGEPAGE:	Call ``madvise()`` for the region with MADV_NOHUGEPAGE.
+ * @DAMOS_DEMOTE:	Migrate cold pages from fast memory node (DRAM) to slow
+ *			memory node (persistent memory).
  * @DAMOS_STAT:		Do nothing but count the stat.
  */
 enum damos_action {
@@ -96,6 +98,7 @@ enum damos_action {
 	DAMOS_HUGEPAGE,
 	DAMOS_NOHUGEPAGE,
 	DAMOS_STAT,		/* Do nothing but only record the stat */
+	DAMOS_DEMOTE,
 };
 
 /**
diff --git a/mm/damon/dbgfs.c b/mm/damon/dbgfs.c
index 58dbb96..3bd0b0e 100644
--- a/mm/damon/dbgfs.c
+++ b/mm/damon/dbgfs.c
@@ -169,6 +169,7 @@ static bool damos_action_valid(int action)
 	case DAMOS_HUGEPAGE:
 	case DAMOS_NOHUGEPAGE:
 	case DAMOS_STAT:
+	case DAMOS_DEMOTE:
 		return true;
 	default:
 		return false;
diff --git a/mm/damon/vaddr.c b/mm/damon/vaddr.c
index 9e213a1..bcdc602 100644
--- a/mm/damon/vaddr.c
+++ b/mm/damon/vaddr.c
@@ -14,6 +14,10 @@
 #include <linux/page_idle.h>
 #include <linux/pagewalk.h>
 #include <linux/sched/mm.h>
+#include <linux/migrate.h>
+#include <linux/mm_inline.h>
+#include <linux/swapops.h>
+#include "../internal.h"
 
 #include "prmtv-common.h"
 
@@ -693,6 +697,147 @@ static unsigned long damos_madvise(struct damon_target *target,
 }
 #endif	/* CONFIG_ADVISE_SYSCALLS */
 
+static void damos_isolate_page(struct page *page, struct list_head *demote_list)
+{
+	struct page *head = compound_head(page);
+
+	/* Do not interfere with other mappings of this page */
+	if (page_mapcount(head) != 1)
+		return;
+
+	/* No need to migrate if there is no demotion target node. */
+	if (next_demotion_node(page_to_nid(head)) == NUMA_NO_NODE)
+		return;
+
+	if (isolate_lru_page(head))
+		return;
+
+	list_add_tail(&head->lru, demote_list);
+	mod_node_page_state(page_pgdat(head),
+			    NR_ISOLATED_ANON + page_is_file_lru(head),
+			    thp_nr_pages(head));
+}
+
+static int damos_isolate_pmd_entry(pmd_t *pmd, unsigned long addr,
+				   unsigned long end, struct mm_walk *walk)
+{
+	struct vm_area_struct *vma = walk->vma;
+	struct list_head *demote_list = walk->private;
+	spinlock_t *ptl;
+	struct page *page;
+	pte_t *pte, *mapped_pte;
+
+	if (!vma_migratable(vma))
+		return -EFAULT;
+
+	ptl = pmd_trans_huge_lock(pmd, vma);
+	if (ptl) {
+		/* Bail out if THP migration is not supported. */
+		if (!thp_migration_supported())
+			goto thp_out;
+
+		/* Skip the THP if its PMD is a migration entry. */
+		if (unlikely(is_pmd_migration_entry(*pmd)))
+			goto thp_out;
+
+		page = damon_get_page(pmd_pfn(*pmd));
+		if (!page)
+			goto thp_out;
+
+		damos_isolate_page(page, demote_list);
+
+		put_page(page);
+thp_out:
+		spin_unlock(ptl);
+		return 0;
+	}
+
+	/* regular page handling */
+	if (pmd_none(*pmd) || unlikely(pmd_bad(*pmd)))
+		return -EINVAL;
+
+	mapped_pte = pte = pte_offset_map_lock(walk->mm, pmd, addr, &ptl);
+	for (; addr != end; pte++, addr += PAGE_SIZE) {
+		if (pte_none(*pte) || !pte_present(*pte))
+			continue;
+
+		page = damon_get_page(pte_pfn(*pte));
+		if (!page)
+			continue;
+
+		damos_isolate_page(page, demote_list);
+		put_page(page);
+	}
+	pte_unmap_unlock(mapped_pte, ptl);
+	cond_resched();
+
+	return 0;
+}
+
+static const struct mm_walk_ops damos_isolate_pages_walk_ops = {
+	.pmd_entry              = damos_isolate_pmd_entry,
+};
+
+/*
+ * damos_demote() - demote cold pages from fast memory to slow memory
+ * @target:    the given target
+ * @r:         region of the target
+ *
+ * On a tiered memory system, if DAMON finds cold pages on the fast memory
+ * node (DRAM), demote them proactively to the slow memory node, instead of
+ * letting cold memory accumulate on the fast node until reclaim kicks in.
+ *
+ * Return the number of bytes of the region to which the DAMOS_DEMOTE action
+ * was successfully applied.
+ */
+static unsigned long damos_demote(struct damon_target *target,
+				  struct damon_region *r)
+{
+	struct mm_struct *mm;
+	LIST_HEAD(demote_pages);
+	LIST_HEAD(pagelist);
+	unsigned int nr_succeeded, demoted_pages = 0;
+	struct page *page, *next;
+
+	/* Bail out unless NUMA demotion is enabled */
+	if (!numa_demotion_enabled)
+		return 0;
+
+	mm = damon_get_mm(target);
+	if (!mm)
+		return 0;
+
+	mmap_read_lock(mm);
+	walk_page_range(mm, r->ar.start, r->ar.end,
+			&damos_isolate_pages_walk_ops, &demote_pages);
+	mmap_read_unlock(mm);
+
+	mmput(mm);
+	if (list_empty(&demote_pages))
+		return 0;
+
+	list_for_each_entry_safe(page, next, &demote_pages, lru) {
+		list_add(&page->lru, &pagelist);
+
+		nr_succeeded = demote_page_list(&pagelist, page_pgdat(page));
+		if (!nr_succeeded) {
+			if (!list_empty(&pagelist)) {
+				list_del(&page->lru);
+				mod_node_page_state(page_pgdat(page),
+						    NR_ISOLATED_ANON + page_is_file_lru(page),
+						    -thp_nr_pages(page));
+				putback_lru_page(page);
+			}
+		} else {
+			demoted_pages += nr_succeeded;
+		}
+
+		cond_resched();
+	}
+
+	return demoted_pages * PAGE_SIZE;
+}
+
 static unsigned long damon_va_apply_scheme(struct damon_ctx *ctx,
 		struct damon_target *t, struct damon_region *r,
 		struct damos *scheme)
@@ -717,6 +862,8 @@ static unsigned long damon_va_apply_scheme(struct damon_ctx *ctx,
 		break;
 	case DAMOS_STAT:
 		return 0;
+	case DAMOS_DEMOTE:
+		return damos_demote(t, r);
 	default:
 		return 0;
 	}
-- 
1.8.3.1
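
The enum hunk above appends DAMOS_DEMOTE after DAMOS_STAT, even though the
kernel-doc comment lists it before DAMOS_STAT. A minimal user-space mirror
makes the consequence visible. It assumes that the actions not shown in the
hunk context are DAMOS_WILLNEED and DAMOS_COLD, in that order before
DAMOS_PAGEOUT, as in the upstream header of the time; under that assumption
the existing actions keep their values 0 to 5 and DAMOS_DEMOTE becomes 6,
which is the integer a user would write as the action through the debugfs
schemes file. The validation helper is modeled on the patched
damos_action_valid() and is illustrative only, not kernel code.

```c
#include <stdbool.h>

/*
 * User-space mirror of enum damos_action after this patch. The order of
 * the pre-existing entries is assumed to match include/linux/damon.h.
 */
enum damos_action {
	DAMOS_WILLNEED,
	DAMOS_COLD,
	DAMOS_PAGEOUT,
	DAMOS_HUGEPAGE,
	DAMOS_NOHUGEPAGE,
	DAMOS_STAT,	/* Do nothing but only record the stat */
	DAMOS_DEMOTE,	/* Appended last, so earlier values do not shift */
};

/* Same set of accepted values as the patched dbgfs damos_action_valid(). */
static bool damos_action_valid(int action)
{
	switch (action) {
	case DAMOS_WILLNEED:
	case DAMOS_COLD:
	case DAMOS_PAGEOUT:
	case DAMOS_HUGEPAGE:
	case DAMOS_NOHUGEPAGE:
	case DAMOS_STAT:
	case DAMOS_DEMOTE:
		return true;
	default:
		return false;
	}
}
```

Appending rather than inserting keeps the numbers already used by existing
debugfs users stable; that is presumably the intent, though the series does
not state it.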


* Re: [PATCH v2 0/2] Add a new scheme to support demotion on tiered memory system
  2021-12-22 11:14 [PATCH v2 0/2] Add a new scheme to support demotion on tiered memory system Baolin Wang
  2021-12-22 11:14 ` [PATCH v2 1/2] mm: Export the demote_page_list() function Baolin Wang
  2021-12-22 11:14 ` [PATCH v2 2/2] mm/damon: Add a new scheme to support demotion on tiered memory system Baolin Wang
@ 2021-12-23  0:01 ` Andrew Morton
  2021-12-23  1:01   ` Baolin Wang
  2021-12-23  1:07 ` Huang, Ying
  3 siblings, 1 reply; 18+ messages in thread
From: Andrew Morton @ 2021-12-23  0:01 UTC (permalink / raw)
  To: Baolin Wang
  Cc: sj, ying.huang, dave.hansen, ziy, shy828301, zhongjiang-ali,
	xlpang, linux-mm, linux-kernel

On Wed, 22 Dec 2021 19:14:39 +0800 Baolin Wang <baolin.wang@linux.alibaba.com> wrote:

> On a tiered memory system with different memory types, the reclaim path in
> shrink_page_list() already supports demoting pages to a slow memory node
> instead of discarding them. However, by the time reclaim runs, free memory
> on the fast memory node is already close to its watermarks, so demoting
> pages at that point increases memory allocation latency. Proactively
> demoting cold pages, driven from user space, would be more helpful.
>
> We can use DAMON from user space to monitor cold memory on the fast memory
> node and proactively demote the cold pages to the slow memory node, keeping
> the fast memory node in a healthy state.
>
> This patch set introduces a new scheme named DAMOS_DEMOTE to support this
> feature, and it works well in my testing. Any comments are welcome. Thanks.

This is interesting.

I think it would be helpful if we could have some example scenarios in
this changelog, to help people understand how to use DAMOS_DEMOTE and
what effects it has.

Documentation/admin-guide/mm/damon/usage.rst would like an update?

And the DAMON user space tool?

* Re: [PATCH v2 0/2] Add a new scheme to support demotion on tiered memory system
  2021-12-23  0:01 ` [PATCH v2 0/2] " Andrew Morton
@ 2021-12-23  1:01   ` Baolin Wang
  0 siblings, 0 replies; 18+ messages in thread
From: Baolin Wang @ 2021-12-23  1:01 UTC (permalink / raw)
  To: Andrew Morton
  Cc: sj, ying.huang, dave.hansen, ziy, shy828301, zhongjiang-ali,
	xlpang, linux-mm, linux-kernel



On 12/23/2021 8:01 AM, Andrew Morton wrote:
> On Wed, 22 Dec 2021 19:14:39 +0800 Baolin Wang <baolin.wang@linux.alibaba.com> wrote:
> 
>> On a tiered memory system with different memory types, the reclaim path in
>> shrink_page_list() already supports demoting pages to a slow memory node
>> instead of discarding them. However, by the time reclaim runs, free memory
>> on the fast memory node is already close to its watermarks, so demoting
>> pages at that point increases memory allocation latency. Proactively
>> demoting cold pages, driven from user space, would be more helpful.
>>
>> We can use DAMON from user space to monitor cold memory on the fast memory
>> node and proactively demote the cold pages to the slow memory node, keeping
>> the fast memory node in a healthy state.
>>
>> This patch set introduces a new scheme named DAMOS_DEMOTE to support this
>> feature, and it works well in my testing. Any comments are welcome. Thanks.
> 
> This is interesting.
> 
> I think it would be helpful if we could have some example scenarios in
> this changelog, to help people understand how to use DAMOS_DEMOTE and
> what effects it has.

Sure.

> 
> Documentation/admin-guide/mm/damon/usage.rst would like an update?

Ah, I missed updating the documentation; I will do that in v3.

> And the DAMON user space tool?

Yes. Thanks for your comments.

* Re: [PATCH v2 0/2] Add a new scheme to support demotion on tiered memory system
  2021-12-22 11:14 [PATCH v2 0/2] Add a new scheme to support demotion on tiered memory system Baolin Wang
                   ` (2 preceding siblings ...)
  2021-12-23  0:01 ` [PATCH v2 0/2] " Andrew Morton
@ 2021-12-23  1:07 ` Huang, Ying
  2021-12-23  1:21   ` Baolin Wang
  3 siblings, 1 reply; 18+ messages in thread
From: Huang, Ying @ 2021-12-23  1:07 UTC (permalink / raw)
  To: Baolin Wang
  Cc: sj, akpm, dave.hansen, ziy, shy828301, zhongjiang-ali, xlpang,
	linux-mm, linux-kernel

Baolin Wang <baolin.wang@linux.alibaba.com> writes:

> Hi,
>
> On a tiered memory system with different memory types, the reclaim path in
> shrink_page_list() already supports demoting pages to a slow memory node
> instead of discarding them. However, by the time reclaim runs, free memory
> on the fast memory node is already close to its watermarks, so demoting
> pages at that point increases memory allocation latency. Proactively
> demoting cold pages, driven from user space, would be more helpful.
>
> We can use DAMON from user space to monitor cold memory on the fast memory
> node and proactively demote the cold pages to the slow memory node, keeping
> the fast memory node in a healthy state.
>
> This patch set introduces a new scheme named DAMOS_DEMOTE to support this
> feature, and it works well in my testing. Any comments are welcome. Thanks.

Since this is a performance optimization patch, it would be better to
provide some test results.

Another question: why shouldn't we do this in user space?  With DAMON,
it's possible to export cold memory region information to user space,
and then we can use move_pages() to migrate those pages from DRAM to
PMEM.  What's the problem with that?
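
A minimal sketch of the user-space alternative raised here, assuming the
cold address ranges have already been obtained from DAMON's monitoring
output (not shown) and that the caller knows which node is the slow (PMEM)
node. It calls move_pages(2) through syscall(2) so libnuma is not needed;
range_pages() and demote_range() are hypothetical helper names. With
MPOL_MF_MOVE, only pages mapped by the calling process alone are moved,
which loosely parallels the page_mapcount() != 1 check in
damos_isolate_page().

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef MPOL_MF_MOVE
#define MPOL_MF_MOVE (1 << 1)	/* value from <linux/mempolicy.h> */
#endif

/* Number of pages touched by [addr, addr + len); 0 for an empty range. */
static unsigned long range_pages(uintptr_t addr, size_t len, uintptr_t page_size)
{
	uintptr_t start = addr & ~(page_size - 1);

	if (len == 0)
		return 0;
	return (addr + len - start + page_size - 1) / page_size;
}

/*
 * Ask the kernel to move every page of [addr, addr + len) in the calling
 * process to @target_node. Returns how many pages report @target_node
 * afterwards, or -errno if move_pages(2) itself failed.
 */
static long demote_range(void *addr, size_t len, int target_node)
{
	uintptr_t page_size = (uintptr_t)sysconf(_SC_PAGESIZE);
	uintptr_t base = (uintptr_t)addr & ~(page_size - 1);
	unsigned long count = range_pages((uintptr_t)addr, len, page_size);
	void **pages = NULL;
	int *nodes = NULL, *status = NULL;
	long ret, moved = 0;

	if (count == 0)
		return 0;
	pages = malloc(count * sizeof(*pages));
	nodes = malloc(count * sizeof(*nodes));
	status = malloc(count * sizeof(*status));
	if (!pages || !nodes || !status) {
		ret = -ENOMEM;
		goto out;
	}
	for (unsigned long i = 0; i < count; i++) {
		pages[i] = (void *)(base + i * page_size);
		nodes[i] = target_node;
	}
	/* pid 0 means the calling process. */
	ret = syscall(SYS_move_pages, 0, count, pages, nodes, status, MPOL_MF_MOVE);
	if (ret < 0) {
		ret = -errno;
		goto out;
	}
	/* status[i] holds the page's node on success, or a negative errno. */
	for (unsigned long i = 0; i < count; i++)
		if (status[i] == target_node)
			moved++;
	ret = moved;
out:
	free(pages);
	free(nodes);
	free(status);
	return ret;
}
```

The trade-off under discussion is visible here: user space has to turn
DAMON's region reports into page lists and issue the syscall itself, while a
DAMOS_DEMOTE scheme would do the equivalent inside the kernel.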

Best Regards,
Huang, Ying

> Changes from v1:
>  - Reuse the demote_page_list().
>  - Fix some comments style issues.
>  - Move the DAMOS_DEMOTE definition to the correct place.
>  - Rename some function name.
>  - Change to return void type for damos_isolate_page().
>  - Remove unnecessary PAGE_ALIGN() in damos_demote().
>  - Fix the return value for damos_demote().
>
> Baolin Wang (2):
>   mm: Export the demote_page_list() function
>   mm/damon: Add a new scheme to support demotion on tiered memory system
>
>  include/linux/damon.h |   3 ++
>  mm/damon/dbgfs.c      |   1 +
>  mm/damon/vaddr.c      | 147 ++++++++++++++++++++++++++++++++++++++++++++++++++
>  mm/internal.h         |   2 +
>  mm/vmscan.c           |   4 +-
>  5 files changed, 155 insertions(+), 2 deletions(-)

* Re: [PATCH v2 0/2] Add a new scheme to support demotion on tiered memory system
  2021-12-23  1:07 ` Huang, Ying
@ 2021-12-23  1:21   ` Baolin Wang
  2021-12-23  3:22     ` Huang, Ying
  0 siblings, 1 reply; 18+ messages in thread
From: Baolin Wang @ 2021-12-23  1:21 UTC (permalink / raw)
  To: Huang, Ying
  Cc: sj, akpm, dave.hansen, ziy, shy828301, zhongjiang-ali, xlpang,
	linux-mm, linux-kernel



On 12/23/2021 9:07 AM, Huang, Ying wrote:
> Baolin Wang <baolin.wang@linux.alibaba.com> writes:
> 
>> Hi,
>>
>> On a tiered memory system with different memory types, the reclaim path in
>> shrink_page_list() already supports demoting pages to a slow memory node
>> instead of discarding them. However, by the time reclaim runs, free memory
>> on the fast memory node is already close to its watermarks, so demoting
>> pages at that point increases memory allocation latency. Proactively
>> demoting cold pages, driven from user space, would be more helpful.
>>
>> We can use DAMON from user space to monitor cold memory on the fast memory
>> node and proactively demote the cold pages to the slow memory node, keeping
>> the fast memory node in a healthy state.
>>
>> This patch set introduces a new scheme named DAMOS_DEMOTE to support this
>> feature, and it works well in my testing. Any comments are welcome. Thanks.
> 
> Since this is a performance optimization patch, it would be better to
> provide some test results.

Actually, this is a functional patch that adds a new scheme to DAMON.
And I think it is too early to measure performance on real workloads;
more work needs to be done for DAMON on tiered memory systems (like
supporting a promotion scheme later).

> Another question: why shouldn't we do this in user space?  With DAMON,
> it's possible to export cold memory region information to user space,
> and then we can use move_pages() to migrate those pages from DRAM to
> PMEM.  What's the problem with that?

IMO this is exactly the purpose of introducing schemes to DAMON; see
Documentation/admin-guide/mm/damon/usage.rst for more details:

"
Schemes
-------

For usual DAMON-based data access aware memory management optimizations, users
would simply want the system to apply a memory management action to a memory
region of a specific access pattern.  DAMON receives such formalized operation
schemes from the user and applies those to the target processes.
"

* Re: [PATCH v2 0/2] Add a new scheme to support demotion on tiered memory system
  2021-12-23  1:21   ` Baolin Wang
@ 2021-12-23  3:22     ` Huang, Ying
  2021-12-23  6:35       ` Baolin Wang
  0 siblings, 1 reply; 18+ messages in thread
From: Huang, Ying @ 2021-12-23  3:22 UTC (permalink / raw)
  To: Baolin Wang, sj
  Cc: akpm, dave.hansen, ziy, shy828301, zhongjiang-ali, xlpang,
	linux-mm, linux-kernel

Baolin Wang <baolin.wang@linux.alibaba.com> writes:

> On 12/23/2021 9:07 AM, Huang, Ying wrote:
>> Baolin Wang <baolin.wang@linux.alibaba.com> writes:
>> 
>>> Hi,
>>>
>>> On a tiered memory system with different memory types, the reclaim path in
>>> shrink_page_list() already supports demoting pages to a slow memory node
>>> instead of discarding them. However, by the time reclaim runs, free memory
>>> on the fast memory node is already close to its watermarks, so demoting
>>> pages at that point increases memory allocation latency. Proactively
>>> demoting cold pages, driven from user space, would be more helpful.
>>>
>>> We can use DAMON from user space to monitor cold memory on the fast memory
>>> node and proactively demote the cold pages to the slow memory node, keeping
>>> the fast memory node in a healthy state.
>>>
>>> This patch set introduces a new scheme named DAMOS_DEMOTE to support this
>>> feature, and it works well in my testing. Any comments are welcome. Thanks.
>> Since this is a performance optimization patch, it would be better to
>> provide some test results.
>
> Actually, this is a functional patch that adds a new scheme to DAMON.
> And I think it is too early to measure performance on real workloads;
> more work needs to be done for DAMON on tiered memory systems (like
> supporting a promotion scheme later).

I don't think this patch provides any new functionality beyond its
performance impact.

And I think proactive demotion by itself can already show some performance
benefit, just like the benefit shown in the proactive reclaim patchset as
below.

https://lore.kernel.org/lkml/20211019150731.16699-1-sj@kernel.org/

>> Another question: why shouldn't we do this in user space?  With DAMON,
>> it's possible to export cold memory region information to user space,
>> and then we can use move_pages() to migrate those pages from DRAM to
>> PMEM.  What's the problem with that?
>
> IMO this is exactly the purpose of introducing schemes to DAMON; see
> Documentation/admin-guide/mm/damon/usage.rst for more details:
>
> "
> Schemes
> -------
>
> For usual DAMON-based data access aware memory management optimizations, users
> would simply want the system to apply a memory management action to a memory
> region of a specific access pattern.  DAMON receives such formalized operation
> schemes from the user and applies those to the target processes.
> "

For proactive reclaim, we don't have a user space ABI to reclaim a page
of a process from memory to disk, so it appears necessary to add a
kernel module to do that.

But for proactive demotion, we already have a user space ABI
(move_pages()) to demote a page of a process from DRAM to PMEM.  What
prevents you from doing all of this in user space?

And I found there are MADV_XXX schemes too, for which user space ABIs
are already available.  TBH, I don't know why we need these schemes
given the existing user space ABIs.  Maybe this is a question for
SeongJae too.

Best Regards,
Huang, Ying

* Re: [PATCH v2 0/2] Add a new scheme to support demotion on tiered memory system
  2021-12-23  3:22     ` Huang, Ying
@ 2021-12-23  6:35       ` Baolin Wang
  2021-12-23  7:51         ` Huang, Ying
  0 siblings, 1 reply; 18+ messages in thread
From: Baolin Wang @ 2021-12-23  6:35 UTC (permalink / raw)
  To: Huang, Ying, sj
  Cc: akpm, dave.hansen, ziy, shy828301, zhongjiang-ali, xlpang,
	linux-mm, linux-kernel



On 12/23/2021 11:22 AM, Huang, Ying wrote:
> Baolin Wang <baolin.wang@linux.alibaba.com> writes:
> 
>> On 12/23/2021 9:07 AM, Huang, Ying wrote:
>>> Baolin Wang <baolin.wang@linux.alibaba.com> writes:
>>>
>>>> Hi,
>>>>
>>>> On a tiered memory system with different memory types, the reclaim path in
>>>> shrink_page_list() already supports demoting pages to a slow memory node
>>>> instead of discarding them. However, by the time reclaim runs, free memory
>>>> on the fast memory node is already close to its watermarks, so demoting
>>>> pages at that point increases memory allocation latency. Proactively
>>>> demoting cold pages, driven from user space, would be more helpful.
>>>>
>>>> We can use DAMON from user space to monitor cold memory on the fast memory
>>>> node and proactively demote the cold pages to the slow memory node, keeping
>>>> the fast memory node in a healthy state.
>>>>
>>>> This patch set introduces a new scheme named DAMOS_DEMOTE to support this
>>>> feature, and it works well in my testing. Any comments are welcome. Thanks.
>>> Since this is a performance optimization patch, it would be better to
>>> provide some test results.
>>
>> Actually, this is a functional patch that adds a new scheme to DAMON.
>> And I think it is too early to measure performance on real workloads;
>> more work needs to be done for DAMON on tiered memory systems (like
>> supporting a promotion scheme later).
> 
> I don't think this patch provides any new functionality beyond its
> performance impact.

Fair enough. I meant that it adds new functionality to DAMON.

> And I think proactive demotion by itself can already show some performance
> benefit, just like the benefit shown in the proactive

Yes, I think so too. But I am afraid I cannot get an obvious performance
benefit with the current linux-next branch on a tiered memory system,
since the promotion patches are not there yet (yes, I can backport them
into my local branch to test). Meanwhile, I may need more tuning of the
demote scheme (such as min-size, max-size, min-acc, max-acc, min-age and
max-age) to get better performance on real workloads. For now I have only
taken a small first step by adding demotion support to DAMON, so I do not
expect an obvious performance gain yet (more research is needed). But as
with proactive reclaim, I think this is the right direction for DAMON.

Anyway, other people may also be curious about the benefit, so I will
measure the DAMON demote scheme on MySQL and share the performance
results. Or do you have any other measurement suggestions?
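
The min-size, max-size, min-acc, max-acc, min-age and max-age knobs
mentioned above bound which monitored regions a scheme applies to. A minimal
user-space model of that check is sketched below. The struct layouts and
function names are hypothetical, and the inclusive comparisons are assumed
to mirror the kernel's damos_valid_target() check; the sketch only
illustrates how tuning the bounds changes what a DAMOS_DEMOTE scheme would
target, not how DAMON itself is driven.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space models of a monitored region and a scheme. */
struct region {
	unsigned long start, end;	/* [start, end) in bytes */
	unsigned int nr_accesses;	/* accesses seen in the last aggregation */
	unsigned int age;		/* aggregation intervals the pattern held */
};

struct scheme {
	unsigned long min_sz, max_sz;
	unsigned int min_acc, max_acc;
	unsigned int min_age, max_age;
};

/* Every bound is inclusive, as assumed for damos_valid_target(). */
static bool scheme_matches(const struct scheme *s, const struct region *r)
{
	unsigned long sz = r->end - r->start;

	return s->min_sz <= sz && sz <= s->max_sz &&
	       s->min_acc <= r->nr_accesses && r->nr_accesses <= s->max_acc &&
	       s->min_age <= r->age && r->age <= s->max_age;
}

/* Total bytes a scheme with bounds @s would act on across @n regions. */
static unsigned long demote_candidate_bytes(const struct scheme *s,
					    const struct region *regions,
					    size_t n)
{
	unsigned long total = 0;

	for (size_t i = 0; i < n; i++)
		if (scheme_matches(s, &regions[i]))
			total += regions[i].end - regions[i].start;
	return total;
}
```

For a "cold" demotion scheme one would typically pin max-acc to 0 and raise
min-age; raising min-age further makes demotion more conservative at the
cost of leaving cold pages on DRAM longer.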

> reclaim patchset as below.
> 
> https://lore.kernel.org/lkml/20211019150731.16699-1-sj@kernel.org/
> 
>>> Another question: why shouldn't we do this in user space?  With DAMON,
>>> it's possible to export cold memory region information to user space,
>>> and then we can use move_pages() to migrate those pages from DRAM to
>>> PMEM.  What's the problem with that?
>>
>> IMO this is exactly the purpose of introducing schemes to DAMON; see
>> Documentation/admin-guide/mm/damon/usage.rst for more details:
>>
>> "
>> Schemes
>> -------
>>
>> For usual DAMON-based data access aware memory management optimizations, users
>> would simply want the system to apply a memory management action to a memory
>> region of a specific access pattern.  DAMON receives such formalized operation
>> schemes from the user and applies those to the target processes.
>> "
> 
> For proactive reclaim, we don't have a user space ABI to reclaim a page
> of a process from memory to disk, so it appears necessary to add a
> kernel module to do that.
>
> But for proactive demotion, we already have a user space ABI
> (move_pages()) to demote a page of a process from DRAM to PMEM.  What
> prevents you from doing all of this in user space?
>
> And I found there are MADV_XXX schemes too, for which user space ABIs
> are already available.  TBH, I don't know why we need these schemes
> given the existing user space ABIs.  Maybe this is a question for
> SeongJae too.

As I understand it, schemes simplify user space design, since user space
does not need to implement its own policy for acting on the monitoring
results. There are more details in patch [1]. SeongJae may have more
input on the purpose.

[1] 
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?id=1f366e421c8f69583ed37b56d86e3747331869c3

* Re: [PATCH v2 0/2] Add a new scheme to support demotion on tiered memory system
  2021-12-23  6:35       ` Baolin Wang
@ 2021-12-23  7:51         ` Huang, Ying
  2021-12-23 11:31           ` SeongJae Park
  0 siblings, 1 reply; 18+ messages in thread
From: Huang, Ying @ 2021-12-23  7:51 UTC (permalink / raw)
  To: Baolin Wang
  Cc: sj, akpm, dave.hansen, ziy, shy828301, zhongjiang-ali, xlpang,
	linux-mm, linux-kernel, Minchan Kim

Baolin Wang <baolin.wang@linux.alibaba.com> writes:

> On 12/23/2021 11:22 AM, Huang, Ying wrote:
>> Baolin Wang <baolin.wang@linux.alibaba.com> writes:
>> 
>>> On 12/23/2021 9:07 AM, Huang, Ying wrote:
>>>> Baolin Wang <baolin.wang@linux.alibaba.com> writes:
>>>>
>>>>> Hi,
>>>>>
>>>>> On a tiered memory system with different memory types, the reclaim path in
>>>>> shrink_page_list() already supports demoting pages to a slow memory node
>>>>> instead of discarding them. However, by the time reclaim runs, free memory
>>>>> on the fast memory node is already close to its watermarks, so demoting
>>>>> pages at that point increases memory allocation latency. Proactively
>>>>> demoting cold pages, driven from user space, would be more helpful.
>>>>>
>>>>> We can use DAMON from user space to monitor cold memory on the fast memory
>>>>> node and proactively demote the cold pages to the slow memory node, keeping
>>>>> the fast memory node in a healthy state.
>>>>>
>>>>> This patch set introduces a new scheme named DAMOS_DEMOTE to support this
>>>>> feature, and it works well in my testing. Any comments are welcome. Thanks.
>>>> Since this is a performance optimization patch, it would be better to
>>>> provide some test results.
>>>
>>> Actually, this is a functional patch that adds a new scheme to DAMON.
>>> And I think it is too early to measure performance on real workloads;
>>> more work needs to be done for DAMON on tiered memory systems (like
>>> supporting a promotion scheme later).
>> I don't think this patch provides any new functionality beyond its
>> performance impact.
>
> Fair enough. I meant that it adds new functionality to DAMON.
>
>> And I think proactive demotion by itself can already show some performance
>> benefit, just like the benefit shown in the proactive
>
> Yes, I think so too. But I am afraid I cannot get an obvious performance
> benefit with the current linux-next branch on a tiered memory system,
> since the promotion patches are not there yet (yes, I can backport them
> into my local branch to test). Meanwhile, I may need more tuning of the
> demote scheme (such as min-size, max-size, min-acc, max-acc, min-age and
> max-age) to get better performance on real workloads. For now I have only
> taken a small first step by adding demotion support to DAMON, so I do not
> expect an obvious performance gain yet (more research is needed). But as
> with proactive reclaim, I think this is the right direction for DAMON.
>
> Anyway, other people may also be curious about the benefit, so I will
> measure the DAMON demote scheme on MySQL and share the performance
> results. Or do you have any other measurement suggestions?

For example, you can run two instances of a workload, say, instance A
and instance B, whose combined memory size is larger than the DRAM.
Run both with `numactl -m <DRAM node>`, so that demotion will be
triggered when DRAM is used up.  Start instance A first, then start
instance B after some time, say, several to tens of seconds.  With the
original kernel, demotion will be triggered when instance B runs, which
may cause long latency.  With your patch, proactive demotion should be
triggered earlier to avoid that long latency, at the cost of some
(maybe just a little) performance of instance A.  We can also compare
the DAMON-based and the in-kernel LRU-based cold page identification
algorithms.
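As a rough illustration of the measurement side, the C sketch below could stand in for the latency-sensitive part of instance B.  It is a hypothetical probe, not part of any existing test suite, and it assumes that the slowest first-touch page fault is a reasonable proxy for the "long latency" described above.  It would be run under `numactl -m <DRAM node>` after instance A has filled most of DRAM, once on the original kernel and once with the demote scheme active.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <time.h>
#include <unistd.h>

/* Current time in nanoseconds from the monotonic clock. */
static uint64_t now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/*
 * Hypothetical probe: map @nr_pages anonymous pages, write to each one once,
 * and store the slowest single first touch in @max_ns.  The first write
 * faults the page in, so under memory pressure the measured time includes
 * any reclaim or demotion done on the allocation path.
 * Returns 0 on success, or -1 if mmap() fails (e.g. @nr_pages == 0).
 */
static int worst_first_touch_ns(size_t nr_pages, uint64_t *max_ns)
{
	size_t page_size = (size_t)sysconf(_SC_PAGESIZE);
	size_t len = nr_pages * page_size;
	volatile char *buf;
	size_t i;

	buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (buf == MAP_FAILED)
		return -1;

	*max_ns = 0;
	for (i = 0; i < nr_pages; i++) {
		uint64_t start = now_ns(), delta;

		buf[i * page_size] = 1;	/* first touch: page fault */
		delta = now_ns() - start;
		if (delta > *max_ns)
			*max_ns = delta;
	}
	munmap((void *)buf, len);
	return 0;
}
```

For instance, worst_first_touch_ns(262144, &worst) touches 1 GiB with 4 KiB pages; the page count is only an example value and would be sized to the DRAM node under test.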

>> reclaim patchset as below.
>> https://lore.kernel.org/lkml/20211019150731.16699-1-sj@kernel.org/
>
>>>> Another question is why we shouldn't do this in user space?  With DAMON,
>>>> it's possible to export cold memory regions information to the user
>>>> space, then we can use move_pages() to migrate them from DRAM to PMEM.
>>>> What's the problem of that?
>>>
>>> IMO this is the purpose of introducing scheme for DAMON, and you can
>>> check more in the Documentation/admin-guide/mm/damon/usage.rst.
>>>
>>> "
>>> Schemes
>>> -------
>>>
>>> For usual DAMON-based data access aware memory management
>>> optimizations, users
>>> would simply want the system to apply a memory management action to a memory
>>> region of a specific access pattern.  DAMON receives such formalized
>>> operation
>>> schemes from the user and applies those to the target processes.
>>> "
>> For proactive reclaim, we haven't a user space ABI to reclaim a page
>> of
>> a process from memory to disk.  So it appears necessary to add a kernel
>> module to do that.
>> But for proactive demotion, we already have a user space ABI
>> (move_pages()) to demote a page of a process from DRAM to PMEM.  What
>> prevents you to do all these in the user space?
>> And, I found there are MADV_XXX schemes too.  Where the user space
>> ABIs
>> are available already.  TBH, I don't know why we need these given there
>> are already user space ABIs.  Maybe this is a question for SeongJae too.
>
> From my understanding, schemes will simplify the design for user space
> to avoid implementing their own strategy according to the monitoring 
> results, and more details in patch[1]. SeongJae may have more input
> for the purpose.
>
> [1]
> https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?id=1f366e421c8f69583ed37b56d86e3747331869c3

Thanks a lot for the information.  The commit log is helpful.

It's good to avoid changing the source code of an application to apply
some memory management optimization (for example, using DAMON +
madvise()).  But it's much easier to run a user space daemon that
optimizes for the application (for example, using DAMON + other
information + process_madvise()).

And this kind of per-application optimization is essentially an
application-specific policy.  Such a policy may be too complex and
flexible to be put in the kernel directly.  For example, in addition to
DAMON, other application-specific or system knowledge may be helpful
too, which is why we had process_madvise() for that even before DAMON.
Some applications may also need more complex algorithms.

And such application-specific policies usually need complex
configuration.  It's hard to export all these policy parameters to user
space as kernel ABI.  Currently, DAMON scheme parameters are exported
via debugfs, so they are not considered ABI and may be changed at any
time.  But applications need a stable and well-maintained ABI.

All in all, IMHO, what we need is a user space per-application policy
daemon with the information from DAMON and other sources.

Best Regards,
Huang, Ying

* Re: [PATCH v2 0/2] Add a new scheme to support demotion on tiered memory system
  2021-12-23  7:51         ` Huang, Ying
@ 2021-12-23 11:31           ` SeongJae Park
  2021-12-27  3:09             ` Huang, Ying
  0 siblings, 1 reply; 18+ messages in thread
From: SeongJae Park @ 2021-12-23 11:31 UTC (permalink / raw)
  To: Huang, Ying
  Cc: Baolin Wang, sj, akpm, dave.hansen, ziy, shy828301,
	zhongjiang-ali, xlpang, linux-mm, linux-kernel, Minchan Kim

Hi,

On Thu, 23 Dec 2021 15:51:18 +0800 "Huang, Ying" <ying.huang@intel.com> wrote:

> Baolin Wang <baolin.wang@linux.alibaba.com> writes:
> 
> > On 12/23/2021 11:22 AM, Huang, Ying wrote:
> >> Baolin Wang <baolin.wang@linux.alibaba.com> writes:
> >> 
> >>> On 12/23/2021 9:07 AM, Huang, Ying wrote:
> >>>> Baolin Wang <baolin.wang@linux.alibaba.com> writes:
> >>>>
> >>>>> Hi,
> >>>>>
> >>>>> Now on tiered memory system with different memory types, the reclaim path in
> >>>>> shrink_page_list() already support demoting pages to slow memory node instead
> >>>>> of discarding the pages. However, at that time the fast memory node memory
> >>>>> wartermark is already tense, which will increase the memory allocation latency
> >>>>> during page demotion. So a new method from user space demoting cold pages
> >>>>> proactively will be more helpful.
> >>>>>
> >>>>> We can rely on the DAMON in user space to help to monitor the cold memory on
> >>>>> fast memory node, and demote the cold pages to slow memory node proactively to
> >>>>> keep the fast memory node in a healthy state.
> >>>>>
> >>>>> This patch set introduces a new scheme named DAMOS_DEMOTE to support this feature,
> >>>>> and works well from my testing. Any comments are welcome. Thanks.
> >>>> As a performance optimization patch, it's better to provide some
> >>>> test
> >>>> results.
> >>>
> >>> Actually this is a functional patch, which adds a new scheme for
> >>> DAMON. And I think it is too early to measure the performance for the
> >>> real workload, and more work need to do for DAMON used on tiered
> >>> memory system (like supporting promotion scheme later).
> >> I don't think you provide any new functionality except the
> >> performance
> >> influence.
> >
> > Fair enough. I mean for DAMON.
> >
> >> And I think proactive demotion itself can show some performance benefit
> >> already.  Just like we can find the performance benefit in the proactive
> >
> > Yes, I think so too. But now I am afraid I can not get some obvious
> > performance benefit with current linux-next branch on tiered memory 
> > system, since the promotion patches are not there (yes, I can backport
> > them into my local branch to test), meanwhile I may need more tuning
> > for the demote scheme (such as tuning min-size, max-size, min-acc,
> > max-acc, min-age, max-age to get a better performance) for the real
> > workload. Now I just did a small step to add demotion support for
> > DAMON, so I do not expect some obvious performance gain now (more work
> > need to research). But same as the proactive reclaim, I think this is
> > on the right way for DAMON.
> >
> > Anyway, maybe some other people also curious the benefit, and I will
> > do some measurement with DAMON demote scheme on mysql to show the 
> > performance results. Or do you have any other measurement suggestion?
> 
> For example, you can run 2 instances of workload, say, instance A and
> instance B.  The memory size of instance A + B is larger than the size
> of the DRAM.  And `numactl -m <DRAM node>` is used to run the instance,
> so that demotion will be triggered when DRAM is used up.  Instance A is
> run at first, after some time, say several to tens seconds, instance B
> is run.  With the original kernel, demotion will be triggered when run
> instance B, long latency may be triggered.  With your patch, the
> proactive demotion will be triggered earlier to avoid the long latency
> at the cost of performance of instance A (may be just a little).  We can
> also compare between DAMON based and the in-kernel LRU based cold page
> identification algorithm.

Good suggestion!

Also, there is a performance test for the virtual address space proactive
reclamation scheme in the DAMON performance test suite[1].  It measures memory
saving and runtime slowdown.  Maybe you could start by extending it for the
demote scheme and measuring similar metrics.

[1] https://github.com/awslabs/damon-tests/tree/master/perf
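For the demote scheme, the two metrics could be computed as simple percentages against a baseline run without the scheme.  The helpers below are hypothetical and only show the arithmetic.  They assume "memory saving" is measured on the fast (DRAM) tier footprint, which may differ from how the damon-tests scripts define it.

```c
/*
 * Hypothetical metric helpers comparing a run with a DAMOS scheme against a
 * baseline run.  Both return percentages; a non-positive baseline yields 0
 * to avoid dividing by zero.
 */

/* Memory saving: how much smaller the (e.g. DRAM) footprint became. */
static double memory_saving_pct(double baseline_kb, double scheme_kb)
{
	if (baseline_kb <= 0)
		return 0;
	return (baseline_kb - scheme_kb) * 100.0 / baseline_kb;
}

/* Runtime slowdown: how much longer the workload took with the scheme. */
static double runtime_slowdown_pct(double baseline_sec, double scheme_sec)
{
	if (baseline_sec <= 0)
		return 0;
	return (scheme_sec - baseline_sec) * 100.0 / baseline_sec;
}
```

For example, a DRAM footprint dropping from 200 to 150 units is a 25% saving, and a runtime growing from 100 s to 103 s is a 3% slowdown.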

> 
> >> reclaim patchset as below.
> >> https://lore.kernel.org/lkml/20211019150731.16699-1-sj@kernel.org/
> >
> >>>> Another question is why we shouldn't do this in user space?  With DAMON,
> >>>> it's possible to export cold memory regions information to the user
> >>>> space, then we can use move_pages() to migrate them from DRAM to PMEM.
> >>>> What's the problem of that?
> >>>
> >>> IMO this is the purpose of introducing scheme for DAMON, and you can
> >>> check more in the Documentation/admin-guide/mm/damon/usage.rst.
> >>>
> >>> "
> >>> Schemes
> >>> -------
> >>>
> >>> For usual DAMON-based data access aware memory management
> >>> optimizations, users
> >>> would simply want the system to apply a memory management action to a memory
> >>> region of a specific access pattern.  DAMON receives such formalized
> >>> operation
> >>> schemes from the user and applies those to the target processes.
> >>> "
> >> For proactive reclaim, we haven't a user space ABI to reclaim a page
> >> of
> >> a process from memory to disk.  So it appears necessary to add a kernel
> >> module to do that.
> >> But for proactive demotion, we already have a user space ABI
> >> (move_pages()) to demote a page of a process from DRAM to PMEM.  What
> >> prevents you to do all these in the user space?
> >> And, I found there are MADV_XXX schemes too.  Where the user space
> >> ABIs
> >> are available already.  TBH, I don't know why we need these given there
> >> are already user space ABIs.  Maybe this is a question for SeongJae too.
> >
> > From my understanding, schemes will simplify the design for user space
> > to avoid implementing their own strategy according to the monitoring 
> > results, and more details in patch[1]. SeongJae may have more input
> > for the purpose.
> >
> > [1]
> > https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?id=1f366e421c8f69583ed37b56d86e3747331869c3
> 
> Thanks a lot for your information.  The commit log is helpful.
> 
> It's good to avoid to change the source code of an application to apply
> some memory management optimization (for example, use DAMON +
> madvise()).  But it's much easier to run a user space daemon to optimize
> for the application.  (for example, use DAMON + other information +
> process_madvise()).
> 
> And this kind of per-application optimization is kind of application
> specific policy.  This kind of policy may be too complex and flexible to
> be put in the kernel directly.  For example, in addition to DAMON, some
> other application specific or system knowledge may be helpful too, so we
> have process_madvise() for that before DAMON.  Some more complex
> algorithm may be needed for some applications.
> 
> And this kind of application specific policy usually need complex
> configuration.  It's hard to export all these policy parameters to the
> user space as the kernel ABI.  Now, DAMON schemes parameters are
> exported in debugfs so they are not considered ABI.  So they may be
> changed at any time.  But applications need some stable and
> well-maintained ABI.
> 
> All in all, IMHO, what we need is a user space per-application policy
> daemon with the information from DAMON and other sources.

I basically agree with Ying, as I also noted in the cover letter of the DAMOS
patchset[1]:

    DAMON[1] can be used as a primitive for data access aware memory
    management optimizations.  For that, users who want such optimizations
    should run DAMON, read the monitoring results, analyze it, plan a new
    memory management scheme, and apply the new scheme by themselves.  Such
    efforts will be inevitable for some complicated optimizations.

[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=fda504fade7f124858d7022341dc46ff35b45274

That is, I believe some programs and big companies would definitely have their
own information and want such complicated optimizations.  But such
optimizations would depend on the characteristics of each program and require
investing a certain amount of resources.  Other programs and users wouldn't
have such special information and/or resources to invest in such
optimizations.  For them, some amount of benefit would be helpful enough even
though it's sub-optimal.

I think we should help both groups, and DAMOS could be useful for the second
group.  And I don't think DAMOS is useless for the first group.  They could use
their information-based policy in parallel with DAMOS in some cases.  E.g., if
they have a way to predict the data access pattern of a specific memory region
even without help from DAMON, they can use their own policy for that region and
DAMOS for other regions.
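A setup combining both policies needs to decide, per monitored region, who is in charge.  The sketch below shows one hypothetical way to do it: any region overlapping an address range the user can predict is handled by the user's own policy, and everything else is left to DAMOS.  The types and names are illustrative only.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical address range [start, end). */
struct range {
	unsigned long start, end;
};

enum region_owner {
	OWNER_DAMOS,	/* no prediction available: leave it to the DAMOS scheme */
	OWNER_USER,	/* covered by the user's own access prediction */
};

/* True if the half-open ranges @a and @b share at least one address. */
static bool ranges_overlap(const struct range *a, const struct range *b)
{
	return a->start < b->end && b->start < a->end;
}

/*
 * Decide who manages region @r: the user's policy if any of the @nr
 * predicted ranges overlaps it, otherwise DAMOS.
 */
static enum region_owner pick_owner(const struct range *r,
				    const struct range *predicted, size_t nr)
{
	size_t i;

	for (i = 0; i < nr; i++)
		if (ranges_overlap(r, &predicted[i]))
			return OWNER_USER;
	return OWNER_DAMOS;
}
```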

Someone could then ask why not provide a user-space implementation for the
second group.  First of all, DAMOS is not only for user-space driven virtual
memory management optimization, but also for kernel-space programs and any
DAMOS-supportable address space, including the physical address space.  And,
for user-space driven use cases, another important goal of DAMOS, in addition
to reducing redundant code, is minimizing the user-kernel context switch
overhead of passing monitoring results and memory management action requests.

In summary, I agree that a user space per-application policy daemon will be
useful for specialized ultimate optimizations, but we also need DAMOS for the
other, common group of cases.

If I'm missing something, please feel free to let me know.


Thanks,
SJ


> 
> Best Regards,
> Huang, Ying

* Re: [PATCH v2 0/2] Add a new scheme to support demotion on tiered memory system
  2021-12-23 11:31           ` SeongJae Park
@ 2021-12-27  3:09             ` Huang, Ying
  2021-12-28  8:44               ` SeongJae Park
  0 siblings, 1 reply; 18+ messages in thread
From: Huang, Ying @ 2021-12-27  3:09 UTC (permalink / raw)
  To: SeongJae Park
  Cc: Baolin Wang, akpm, dave.hansen, ziy, shy828301, zhongjiang-ali,
	xlpang, linux-mm, linux-kernel, Minchan Kim

Hi, SeongJae,

SeongJae Park <sj@kernel.org> writes:

> Hi,
>
> On Thu, 23 Dec 2021 15:51:18 +0800 "Huang, Ying" <ying.huang@intel.com> wrote:

[snip]

>> It's good to avoid to change the source code of an application to apply
>> some memory management optimization (for example, use DAMON +
>> madvise()).  But it's much easier to run a user space daemon to optimize
>> for the application.  (for example, use DAMON + other information +
>> process_madvise()).
>> 
>> And this kind of per-application optimization is kind of application
>> specific policy.  This kind of policy may be too complex and flexible to
>> be put in the kernel directly.  For example, in addition to DAMON, some
>> other application specific or system knowledge may be helpful too, so we
>> have process_madvise() for that before DAMON.  Some more complex
>> algorithm may be needed for some applications.
>> 
>> And this kind of application specific policy usually need complex
>> configuration.  It's hard to export all these policy parameters to the
>> user space as the kernel ABI.  Now, DAMON schemes parameters are
>> exported in debugfs so they are not considered ABI.  So they may be
>> changed at any time.  But applications need some stable and
>> well-maintained ABI.
>> 
>> All in all, IMHO, what we need is a user space per-application policy
>> daemon with the information from DAMON and other sources.
>
> I basically agree to Ying, as I also noted in the coverletter of DAMOS
> patchset[1]:
>
>     DAMON[1] can be used as a primitive for data access aware memory
>     management optimizations.  For that, users who want such optimizations
>     should run DAMON, read the monitoring results, analyze it, plan a new
>     memory management scheme, and apply the new scheme by themselves.  Such
>     efforts will be inevitable for some complicated optimizations.
>
> [1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=fda504fade7f124858d7022341dc46ff35b45274
>
> That is, I believe some programs and big companies would definitely have their
> own information and want such kind of complicated optimizations.  But, such
> optimizations would depend on characteristics of each program and require
> investment of some amount of resources.  Some other programs and users wouldn't
> have such special information, and/or resource to invest for such
> optimizations.  For them, some amount of benefit would be helpful enough even
> though its sub-optimal.
>
> I think we should help both groups, and DAMOS could be useful for the second
> group.  And I don't think DAMOS is useless for the first group.  They could use
> their information-based policy in parallel to DAMOS in some cases.  E.g., if
> they have a way to predict the data access pattern of specific memory region
> even without help from DAMON, they can use their own policy for the region but
> DAMOS for other regions.
>
> Someone could ask why not implement a user-space implementation for the second
> group, then.  First of all, DAMOS is not only for the user-space driven virtual
> memory management optimization, but also for kernel-space programs and any
> DAMOS-supportable address spaces including the physical address space.  And,
> another important goal of DAMOS for user space driven use case in addition to
> reducing the redundant code is minimizing the user-kernel context switch
> overhead for passing the monitoring results information and memory management
> action requests.
>
> In summary, I agree the user space per-application policy daemon will be useful
> for the specialized ultimate optimizations, but we also need DAMOS for another
> common group of cases.
>
> If I'm missing something, please feel free to let me know.

I guess that most end users, and quite a few system administrators of
small companies, don't have enough expertise to take advantage of
per-application optimizations.  How would they know the appropriate
number of regions and the proactive reclaim threshold?

So per my understanding, the Linux kernel needs to provide:

1. An in-kernel general policy that is obviously correct and benefits
   almost all users and applications, or at least causes no regression.
   No complex configuration or deep knowledge should be needed to take
   advantage of it.

2. Some way to inspect and control system and application behavior, so
   that advanced, customized user space policy daemons (for example,
   oomd) can be built for advanced users who have enough knowledge of
   their applications and systems.

Best Regards,
Huang, Ying

* Re: [PATCH v2 0/2] Add a new scheme to support demotion on tiered memory system
  2021-12-27  3:09             ` Huang, Ying
@ 2021-12-28  8:44               ` SeongJae Park
  2021-12-29  1:33                 ` Huang, Ying
  2021-12-30  9:31                 ` Baolin Wang
  0 siblings, 2 replies; 18+ messages in thread
From: SeongJae Park @ 2021-12-28  8:44 UTC (permalink / raw)
  To: Huang, Ying
  Cc: SeongJae Park, Baolin Wang, akpm, dave.hansen, ziy, shy828301,
	zhongjiang-ali, xlpang, linux-mm, linux-kernel, Minchan Kim

Hello,

On Mon, 27 Dec 2021 11:09:56 +0800 "Huang, Ying" <ying.huang@intel.com> wrote:

> Hi, SeongJae,
> 
> SeongJae Park <sj@kernel.org> writes:
> 
> [snip]
> 
> I guess that most end-users and quite some system administrators of
> small companies have no enough capability to take advantage of the
> per-application optimizations.  How do they know the appropriate region
> number and proactive reclaim threshold?
> 
> So per my understanding, Linux kernel
> need provide,
> 
> 1. An in-kernel general policy that is obviously correct and benefits
>    almost all users and applications, at least no regression.  No
>    complex configuration or deep knowledge is needed to take advantage
>    of it.
> 
> 2. Some way to inspect and control system and application behavior, so
>    that some advanced and customized user space policy daemons can be
>    built to satisfy some advanced users who have the enough knowledge
>    for the applications and systems, for example, oomd.

Agreed, and I think that's the approach DAMON is currently taking.
Specifically, we provide the DAMON debugfs interface for users who want to
inspect and control their system and application behavior.  Using it, we also
made a PoC-level user space policy daemon[1].

For the in-kernel policies, we are developing DAMON-based kernel components one
by one, for specific usages.  The DAMON-based proactive reclamation module
(DAMON_RECLAIM) is one such example.  Such components will remove the complex
tunables that are necessary for general inspection and control of the system
but unnecessary for their specific purpose (e.g., proactive reclamation), so
that users can use them in a simple manner.  Also, they will use conservative
default configs so as not to incur visible regressions.  For example, by
default DAMON_RECLAIM uses at most 1% of a single CPU's time for reclamation.
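A CPU-time budget like that can be expressed as a quota of work time per reset window; for example, 10 ms in every 1000 ms window is 1% of one CPU.  The sketch below illustrates only that idea.  It is hypothetical and does not reproduce DAMON's actual quota implementation, and the 10 ms / 1000 ms values are an assumed example rather than a statement of DAMON_RECLAIM's defaults.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical time budget: at most quota_ms of work per reset_interval_ms. */
struct time_quota {
	uint64_t quota_ms;
	uint64_t reset_interval_ms;
	uint64_t window_start_ms;	/* start of the current charge window */
	uint64_t charged_ms;		/* time already spent in this window */
};

/* Open a new window (and forget past charges) once the interval elapses. */
static void quota_refresh(struct time_quota *q, uint64_t now_ms)
{
	if (now_ms - q->window_start_ms >= q->reset_interval_ms) {
		q->window_start_ms = now_ms;
		q->charged_ms = 0;
	}
}

/* May another action (e.g. one reclamation pass) run at @now_ms? */
static bool quota_allows(struct time_quota *q, uint64_t now_ms)
{
	quota_refresh(q, now_ms);
	return q->charged_ms < q->quota_ms;
}

/* Account @spent_ms of work done at @now_ms against the current window. */
static void quota_charge(struct time_quota *q, uint64_t now_ms,
			 uint64_t spent_ms)
{
	quota_refresh(q, now_ms);
	q->charged_ms += spent_ms;
}
```

Once the 10 ms budget is used up, further work waits for the next window.  Overall CPU usage therefore stays around quota_ms / reset_interval_ms, plus at most the overshoot of the last action that was allowed to start.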

In short, I think we're on the same page, and adding the DEMOTION scheme action
could be helpful for users who want to efficiently inspect and control
system/application behavior on their tiered memory systems.  It's unclear how
much benefit this could give users, though.  I assume Baolin will come back
with some numbers in the next spin.  Nevertheless, I personally don't think
that's a critical blocker, as this patch essentially just adds a way to use a
pre-existing primitive, namely move_pages(), a little more efficiently for
access pattern-based use cases.

If I'm missing something, please feel free to let me know.

[1] https://github.com/awslabs/damoos


Thanks,
SJ

> 
> Best Regards,
> Huang, Ying

* Re: [PATCH v2 0/2] Add a new scheme to support demotion on tiered memory system
  2021-12-28  8:44               ` SeongJae Park
@ 2021-12-29  1:33                 ` Huang, Ying
  2021-12-29 10:34                   ` SeongJae Park
  2021-12-30  9:31                 ` Baolin Wang
  1 sibling, 1 reply; 18+ messages in thread
From: Huang, Ying @ 2021-12-29  1:33 UTC (permalink / raw)
  To: SeongJae Park
  Cc: Baolin Wang, akpm, dave.hansen, ziy, shy828301, zhongjiang-ali,
	xlpang, linux-mm, linux-kernel, Minchan Kim

SeongJae Park <sj@kernel.org> writes:

> Hello,
>
> On Mon, 27 Dec 2021 11:09:56 +0800 "Huang, Ying" <ying.huang@intel.com> wrote:
>
>> Hi, SeongJae,
>> 
>> SeongJae Park <sj@kernel.org> writes:
>> 
>> [snip]
>> 
>> I guess that most end-users and quite some system administrators of
>> small companies have no enough capability to take advantage of the
>> per-application optimizations.  How do they know the appropriate region
>> number and proactive reclaim threshold?
>> 
>> So per my understanding, Linux kernel
>> need provide,
>> 
>> 1. An in-kernel general policy that is obviously correct and benefits
>>    almost all users and applications, at least no regression.  No
>>    complex configuration or deep knowledge is needed to take advantage
>>    of it.
>> 
>> 2. Some way to inspect and control system and application behavior, so
>>    that some advanced and customized user space policy daemons can be
>>    built to satisfy some advanced users who have the enough knowledge
>>    for the applications and systems, for example, oomd.
>
> Agreed, and I think that's the approach that DAMON is currently taking.  In
> specific, we provide DAMON debugfs interface for users who want to inspect and
> control their system and application behavior.  Using it, we also made a PoC
> level user space policy daemon[1].
>
> For the in-kernel policies, we are developing DAMON-based kernel components one
> by one, for specific usages.  DAMON-based proactive reclamation module
> (DAMON_RECLAIM) is one such example.  Such DAMON-based components will remove
> complex tunables that necessary for the general inspection and control of the
> system but unnecessary for their specific purpose (e.g., proactive reclamation)
> to allow users use it in a simple manner.  Also, those will use conservative
> default configs to not incur visible regression.  For example, DAMON_RECLAIM
> uses only up to 1% of single CPU time for the reclamation by default.

I don't think DAMON schemes are the in-kernel general policy I mentioned
above (1.).  For example, NUMA balancing is a general policy to optimize
performance.  It tries to work for all users with as few tunables as
possible.  If some tunables are needed, they are provided as ABI.

Best Regards,
Huang, Ying

> In short, I think we're on the same page, and adding a DEMOTION scheme action
> could be helpful for the users who want to efficiently inspect and control the
> system/application behavior for their tiered memory systems.  It's unclear how
> much benefit this could give to users, though.  I assume Baolin would come back
> with some sort of numbers in the next spin.  Nevertheless, I personally don't
> think that's a critical blocker, as this patch is essentially just adding a way
> for using the pre-existing primitive, namely move_pages(), in a little bit more
> efficient manner, for the access pattern-based use cases. 
>
> If I'm missing something, please feel free to let me know.
>
> [1] https://github.com/awslabs/damoos
>
>
> Thanks,
> SJ
>
>> 
>> Best Regards,
>> Huang, Ying
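For comparison, the user-space path that the reply above says the proposed action makes more efficient can be sketched in C. That path uses move_pages(2) directly: a daemon takes one region that DAMON reports as cold and asks the kernel to move its pages to a slow-memory node. This is a minimal sketch under stated assumptions, not the patch's implementation (the patch reuses the in-kernel demote_page_list()). The helper names build_demote_request() and demote_region(), the region bounds and the slow node id are hypothetical. The raw syscall is used only to avoid a libnuma dependency.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Value of MPOL_MF_MOVE in <linux/mempolicy.h>: move only pages used
 * exclusively by the target process. */
#ifndef MPOL_MF_MOVE
#define MPOL_MF_MOVE (1 << 1)
#endif

/*
 * Hypothetical helper: fill pages[] with the page-aligned addresses that
 * cover [start, start + len) and nodes[] with target_node.  Returns the
 * number of entries written, or 0 if len is 0 or more than max entries
 * would be needed.
 */
size_t build_demote_request(unsigned long start, unsigned long len,
			    unsigned long page_size, int target_node,
			    void **pages, int *nodes, size_t max)
{
	unsigned long first = start & ~(page_size - 1);
	unsigned long end = start + len;
	size_t n;

	if (len == 0)
		return 0;
	n = (end - first + page_size - 1) / page_size;
	if (n > max)
		return 0;
	for (size_t i = 0; i < n; i++) {
		pages[i] = (void *)(first + i * page_size);
		nodes[i] = target_node;
	}
	return n;
}

/*
 * Hypothetical helper: ask the kernel to move the pages of one region of
 * process pid (0 means the calling process) to target_node, e.g. a slow
 * memory node.  status[] (at least max entries) receives, per page, the
 * node the page now resides on or a negative errno.  Returns the
 * non-negative result of move_pages(2) (0 on success) or -errno.
 */
int demote_region(pid_t pid, unsigned long start, unsigned long len,
		  int target_node, int *status, size_t max)
{
	unsigned long page_size = (unsigned long)sysconf(_SC_PAGESIZE);
	void **pages = calloc(max, sizeof(*pages));
	int *nodes = calloc(max, sizeof(*nodes));
	long ret = -ENOMEM;
	size_t n;

	if (!pages || !nodes)
		goto out;
	n = build_demote_request(start, len, page_size, target_node,
				 pages, nodes, max);
	if (n == 0) {
		ret = -EINVAL;
		goto out;
	}
	ret = syscall(SYS_move_pages, pid, (unsigned long)n, pages, nodes,
		      status, MPOL_MF_MOVE);
	if (ret < 0)
		ret = -errno;
out:
	free(pages);
	free(nodes);
	return (int)ret;
}
```

A daemon would call demote_region() once per cold region, so it pays at least one user-kernel round trip per region. Reducing that overhead is one of the stated goals of doing the same work as a DAMOS action.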

* Re: [PATCH v2 0/2] Add a new scheme to support demotion on tiered memory system
  2021-12-29  1:33                 ` Huang, Ying
@ 2021-12-29 10:34                   ` SeongJae Park
  2021-12-30  3:16                     ` Huang, Ying
  0 siblings, 1 reply; 18+ messages in thread
From: SeongJae Park @ 2021-12-29 10:34 UTC (permalink / raw)
  To: Huang, Ying
  Cc: SeongJae Park, Baolin Wang, akpm, dave.hansen, ziy, shy828301,
	zhongjiang-ali, xlpang, linux-mm, linux-kernel, Minchan Kim

On Wed, 29 Dec 2021 09:33:56 +0800 "Huang, Ying" <ying.huang@intel.com> wrote:

> SeongJae Park <sj@kernel.org> writes:
> 
> > Hello,
> >
> > On Mon, 27 Dec 2021 11:09:56 +0800 "Huang, Ying" <ying.huang@intel.com> wrote:
> >
> >> Hi, SeongJae,
> >> 
> >> SeongJae Park <sj@kernel.org> writes:
> >> 
> >> > Hi,
> >> >
> >> > On Thu, 23 Dec 2021 15:51:18 +0800 "Huang, Ying" <ying.huang@intel.com> wrote:
> >> 
> >> [snip]
> >> 
> >> >> It's good to avoid changing the source code of an application to apply
> >> >> some memory management optimization (for example, use DAMON +
> >> >> madvise()).  But it's much easier to run a user space daemon to optimize
> >> >> for the application.  (for example, use DAMON + other information +
> >> >> process_madvise()).
> >> >> 
> >> >> And this kind of per-application optimization is kind of application
> >> >> specific policy.  This kind of policy may be too complex and flexible to
> >> >> be put in the kernel directly.  For example, in addition to DAMON, some
> >> >> other application specific or system knowledge may be helpful too, so we
> >> >> have process_madvise() for that before DAMON.  Some more complex
> >> >> algorithm may be needed for some applications.
> >> >> 
> >> >> And this kind of application specific policy usually needs complex
> >> >> configuration.  It's hard to export all these policy parameters to the
> >> >> user space as the kernel ABI.  Now, DAMON schemes parameters are
> >> >> exported in debugfs so they are not considered ABI.  So they may be
> >> >> changed at any time.  But applications need some stable and
> >> >> well-maintained ABI.
> >> >> 
> >> >> All in all, IMHO, what we need is a user space per-application policy
> >> >> daemon with the information from DAMON and other sources.
> >> >
> >> > I basically agree with Ying, as I also noted in the cover letter of the DAMOS
> >> > patchset[1]:
> >> >
> >> >     DAMON[1] can be used as a primitive for data access aware memory
> >> >     management optimizations.  For that, users who want such optimizations
> >> >     should run DAMON, read the monitoring results, analyze it, plan a new
> >> >     memory management scheme, and apply the new scheme by themselves.  Such
> >> >     efforts will be inevitable for some complicated optimizations.
> >> >
> >> > [1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=fda504fade7f124858d7022341dc46ff35b45274
> >> >
> >> > That is, I believe some programs and big companies would definitely have their
> >> > own information and want such kind of complicated optimizations.  But, such
> >> > optimizations would depend on characteristics of each program and require
> >> > investment of some amount of resources.  Some other programs and users wouldn't
> >> > have such special information, and/or resource to invest for such
> >> > optimizations.  For them, some amount of benefit would be helpful enough even
> >> > though it's sub-optimal.
> >> >
> >> > I think we should help both groups, and DAMOS could be useful for the second
> >> > group.  And I don't think DAMOS is useless for the first group.  They could use
> >> > their information-based policy in parallel to DAMOS in some cases.  E.g., if
> >> > they have a way to predict the data access pattern of specific memory region
> >> > even without help from DAMON, they can use their own policy for the region but
> >> > DAMOS for other regions.
> >> >
> >> > Someone could ask why not implement a user-space implementation for the second
> >> > group, then.  First of all, DAMOS is not only for the user-space driven virtual
> >> > memory management optimization, but also for kernel-space programs and any
> >> > DAMOS-supportable address spaces including the physical address space.  And,
> >> > another important goal of DAMOS for user space driven use case in addition to
> >> > reducing the redundant code is minimizing the user-kernel context switch
> >> > overhead for passing the monitoring results information and memory management
> >> > action requests.
> >> >
> >> > In summary, I agree the user space per-application policy daemon will be useful
> >> > for the specialized ultimate optimizations, but we also need DAMOS for another
> >> > common group of cases.
> >> >
> >> > If I'm missing something, please feel free to let me know.
> >> 
> >> I guess that most end-users and quite some system administrators of
> >> small companies do not have enough capability to take advantage of the
> >> per-application optimizations.  How do they know the appropriate region
> >> number and proactive reclaim threshold?
> >> 
> >> So, per my understanding, the Linux kernel
> >> needs to provide:
> >> 
> >> 1. An in-kernel general policy that is obviously correct and benefits
> >>    almost all users and applications, at least no regression.  No
> >>    complex configuration or deep knowledge is needed to take advantage
> >>    of it.
> >> 
> >> 2. Some way to inspect and control system and application behavior, so
> >>    that some advanced and customized user space policy daemons can be
> >>    built to satisfy some advanced users who have enough knowledge
> >>    for the applications and systems, for example, oomd.
> >
> > Agreed, and I think that's the approach that DAMON is currently taking.
> > Specifically, we provide the DAMON debugfs interface for users who want to inspect and
> > control their system and application behavior.  Using it, we also made a PoC
> > level user space policy daemon[1].
> >
> > For the in-kernel policies, we are developing DAMON-based kernel components one
> > by one, for specific usages.  DAMON-based proactive reclamation module
> > (DAMON_RECLAIM) is one such example.  Such DAMON-based components will remove
> > complex tunables that are necessary for the general inspection and control of the
> > system but unnecessary for their specific purpose (e.g., proactive reclamation)
> > so that users can use them in a simple manner.  Also, they will use conservative
> > default configs so as not to incur visible regressions.  For example, DAMON_RECLAIM
> > uses only up to 1% of a single CPU's time for the reclamation by default.
> 
> I don't think DAMON schemes are the in-kernel general policy I mentioned
> above (1.).  For example, NUMA balancing is a general policy to optimize
> performance.  It tries to provide a general policy that works for all
> users with as few tunables as possible.  If some tunables are needed,
> they will be provided as ABI.

Exactly.  What I'm saying is, DAMON schemes that are exposed to user space via
the debugfs interface are for inspecting the system and developing user space
daemons (2.).  They require some level of tuning and provide no stable ABI,
only the debugfs interface.  Meanwhile, DAMON-based kernel components like
DAMON_RECLAIM can be used for the in-kernel general policy (1.).  For example,
DAMON_RECLAIM also tries to be beneficial, or at least incur no regression, for
almost all users, provides as few tunables as possible, and provides those
via its ABI (module parameters), not debugfs.


Thanks,
SJ

> 
> Best Regards,
> Huang, Ying
> 
> > In short, I think we're on the same page, and adding a DEMOTION scheme action
> > could be helpful for the users who want to efficiently inspect and control the
> > system/application behavior for their tiered memory systems.  It's unclear how
> > much benefit this could give to users, though.  I assume Baolin would come back
> > with some sort of numbers in the next spin.  Nevertheless, I personally don't
> > think that's a critical blocker, as this patch is essentially just adding a way
> > for using the pre-existing primitive, namely move_pages(), in a little bit more
> > efficient manner, for the access pattern-based use cases. 
> >
> > If I'm missing something, please feel free to let me know.
> >
> > [1] https://github.com/awslabs/damoos
> >
> >
> > Thanks,
> > SJ
> >
> >> 
> >> Best Regards,
> >> Huang, Ying
> 
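The default quoted above can be read as a time quota: DAMON_RECLAIM uses only up to 1% of a single CPU's time, with its few knobs exposed as module parameters. That means a budget of CPU time per reset window. The sketch below assumes the 5.16-era DAMON_RECLAIM defaults quota_ms = 10 and quota_reset_interval_ms = 1000, i.e., 10 ms per 1 s window, or 1%. The struct and function names are hypothetical and this is not DAMON_RECLAIM's actual code.

```c
#include <stdbool.h>

/*
 * Hypothetical time quota: an action may spend at most quota_ms of CPU
 * time per reset_interval_ms window.  Modeled loosely on DAMON_RECLAIM's
 * quota_ms and quota_reset_interval_ms module parameters.
 */
struct time_quota {
	unsigned long quota_ms;          /* budget per window, e.g. 10 */
	unsigned long reset_interval_ms; /* window length, e.g. 1000 */
	unsigned long window_start_ms;   /* when the current window began */
	unsigned long charged_ms;        /* time already spent in it */
};

/* Open a fresh window once the current one has expired. */
static void quota_refresh(struct time_quota *q, unsigned long now_ms)
{
	if (now_ms - q->window_start_ms >= q->reset_interval_ms) {
		q->window_start_ms = now_ms;
		q->charged_ms = 0;
	}
}

/* Return true if the action may run at time now_ms. */
bool quota_allows(struct time_quota *q, unsigned long now_ms)
{
	quota_refresh(q, now_ms);
	return q->charged_ms < q->quota_ms;
}

/* Record that the action consumed spent_ms of CPU time at now_ms. */
void quota_charge(struct time_quota *q, unsigned long now_ms,
		  unsigned long spent_ms)
{
	quota_refresh(q, now_ms);
	q->charged_ms += spent_ms;
}

/* Nominal CPU share in basis points (100 bp = 1%). */
unsigned long quota_share_bp(const struct time_quota *q)
{
	return q->quota_ms * 10000 / q->reset_interval_ms;
}
```

Because time is charged after an action has run, a single long action can overrun its window. In this sketch the 1% figure is therefore a nominal budget rather than a hard cap.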

* Re: [PATCH v2 0/2] Add a new scheme to support demotion on tiered memory system
  2021-12-29 10:34                   ` SeongJae Park
@ 2021-12-30  3:16                     ` Huang, Ying
  2021-12-30  8:03                       ` SeongJae Park
  0 siblings, 1 reply; 18+ messages in thread
From: Huang, Ying @ 2021-12-30  3:16 UTC (permalink / raw)
  To: SeongJae Park
  Cc: Baolin Wang, akpm, dave.hansen, ziy, shy828301, zhongjiang-ali,
	xlpang, linux-mm, linux-kernel, Minchan Kim

SeongJae Park <sj@kernel.org> writes:

> On Wed, 29 Dec 2021 09:33:56 +0800 "Huang, Ying" <ying.huang@intel.com> wrote:
>
>> SeongJae Park <sj@kernel.org> writes:
>> 
>> > Hello,
>> >
>> > On Mon, 27 Dec 2021 11:09:56 +0800 "Huang, Ying" <ying.huang@intel.com> wrote:
>> >
>> >> Hi, SeongJae,
>> >> 
>> >> SeongJae Park <sj@kernel.org> writes:
>> >> 
>> >> > Hi,
>> >> >
>> >> > On Thu, 23 Dec 2021 15:51:18 +0800 "Huang, Ying" <ying.huang@intel.com> wrote:
>> >> 
>> >> [snip]
>> >> 
>> >> >> It's good to avoid changing the source code of an application to apply
>> >> >> some memory management optimization (for example, use DAMON +
>> >> >> madvise()).  But it's much easier to run a user space daemon to optimize
>> >> >> for the application.  (for example, use DAMON + other information +
>> >> >> process_madvise()).
>> >> >> 
>> >> >> And this kind of per-application optimization is kind of application
>> >> >> specific policy.  This kind of policy may be too complex and flexible to
>> >> >> be put in the kernel directly.  For example, in addition to DAMON, some
>> >> >> other application specific or system knowledge may be helpful too, so we
>> >> >> have process_madvise() for that before DAMON.  Some more complex
>> >> >> algorithm may be needed for some applications.
>> >> >> 
>> >> >> And this kind of application specific policy usually needs complex
>> >> >> configuration.  It's hard to export all these policy parameters to the
>> >> >> user space as the kernel ABI.  Now, DAMON schemes parameters are
>> >> >> exported in debugfs so they are not considered ABI.  So they may be
>> >> >> changed at any time.  But applications need some stable and
>> >> >> well-maintained ABI.
>> >> >> 
>> >> >> All in all, IMHO, what we need is a user space per-application policy
>> >> >> daemon with the information from DAMON and other sources.
>> >> >
>> >> > I basically agree with Ying, as I also noted in the cover letter of the DAMOS
>> >> > patchset[1]:
>> >> >
>> >> >     DAMON[1] can be used as a primitive for data access aware memory
>> >> >     management optimizations.  For that, users who want such optimizations
>> >> >     should run DAMON, read the monitoring results, analyze it, plan a new
>> >> >     memory management scheme, and apply the new scheme by themselves.  Such
>> >> >     efforts will be inevitable for some complicated optimizations.
>> >> >
>> >> > [1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=fda504fade7f124858d7022341dc46ff35b45274
>> >> >
>> >> > That is, I believe some programs and big companies would definitely have their
>> >> > own information and want such kind of complicated optimizations.  But, such
>> >> > optimizations would depend on characteristics of each program and require
>> >> > investment of some amount of resources.  Some other programs and users wouldn't
>> >> > have such special information, and/or resource to invest for such
>> >> > optimizations.  For them, some amount of benefit would be helpful enough even
>> >> > though it's sub-optimal.
>> >> >
>> >> > I think we should help both groups, and DAMOS could be useful for the second
>> >> > group.  And I don't think DAMOS is useless for the first group.  They could use
>> >> > their information-based policy in parallel to DAMOS in some cases.  E.g., if
>> >> > they have a way to predict the data access pattern of specific memory region
>> >> > even without help from DAMON, they can use their own policy for the region but
>> >> > DAMOS for other regions.
>> >> >
>> >> > Someone could ask why not implement a user-space implementation for the second
>> >> > group, then.  First of all, DAMOS is not only for the user-space driven virtual
>> >> > memory management optimization, but also for kernel-space programs and any
>> >> > DAMOS-supportable address spaces including the physical address space.  And,
>> >> > another important goal of DAMOS for user space driven use case in addition to
>> >> > reducing the redundant code is minimizing the user-kernel context switch
>> >> > overhead for passing the monitoring results information and memory management
>> >> > action requests.
>> >> >
>> >> > In summary, I agree the user space per-application policy daemon will be useful
>> >> > for the specialized ultimate optimizations, but we also need DAMOS for another
>> >> > common group of cases.
>> >> >
>> >> > If I'm missing something, please feel free to let me know.
>> >> 
>> >> I guess that most end-users and quite some system administrators of
>> >> small companies do not have enough capability to take advantage of the
>> >> per-application optimizations.  How do they know the appropriate region
>> >> number and proactive reclaim threshold?
>> >> 
>> >> So, per my understanding, the Linux kernel
>> >> needs to provide:
>> >> 
>> >> 1. An in-kernel general policy that is obviously correct and benefits
>> >>    almost all users and applications, at least no regression.  No
>> >>    complex configuration or deep knowledge is needed to take advantage
>> >>    of it.
>> >> 
>> >> 2. Some way to inspect and control system and application behavior, so
>> >>    that some advanced and customized user space policy daemons can be
>> >>    built to satisfy some advanced users who have enough knowledge
>> >>    for the applications and systems, for example, oomd.
>> >
>> > Agreed, and I think that's the approach that DAMON is currently taking.
>> > Specifically, we provide the DAMON debugfs interface for users who want to inspect and
>> > control their system and application behavior.  Using it, we also made a PoC
>> > level user space policy daemon[1].
>> >
>> > For the in-kernel policies, we are developing DAMON-based kernel components one
>> > by one, for specific usages.  DAMON-based proactive reclamation module
>> > (DAMON_RECLAIM) is one such example.  Such DAMON-based components will remove
>> > complex tunables that are necessary for the general inspection and control of the
>> > system but unnecessary for their specific purpose (e.g., proactive reclamation)
>> > so that users can use them in a simple manner.  Also, they will use conservative
>> > default configs so as not to incur visible regressions.  For example, DAMON_RECLAIM
>> > uses only up to 1% of a single CPU's time for the reclamation by default.
>> 
>> I don't think DAMON schemes are the in-kernel general policy I mentioned
>> above (1.).  For example, NUMA balancing is a general policy to optimize
>> performance.  It tries to provide a general policy that works for all
>> users with as few tunables as possible.  If some tunables are needed,
>> they will be provided as ABI.
>
> Exactly.  What I'm saying is, DAMON schemes that are exposed to user space via
> the debugfs interface are for inspecting the system and developing user space
> daemons (2.).  They require some level of tuning and provide no stable ABI,
> only the debugfs interface.  Meanwhile, DAMON-based kernel components like
> DAMON_RECLAIM can be used for the in-kernel general policy (1.).  For example,
> DAMON_RECLAIM also tries to be beneficial, or at least incur no regression, for
> almost all users, provides as few tunables as possible, and provides those
> via its ABI (module parameters), not debugfs.

Thanks for your detailed explanation.

Per my understanding, DAMON schemes are a kind of building block for some
kernel features such as DAMON_RECLAIM.  Whether we need a new scheme
depends on whether it's useful as part of some kernel feature.  Do you
agree?

Best Regards,
Huang, Ying

> Thanks,
> SJ
>
>> 
>> Best Regards,
>> Huang, Ying
>> 
>> > In short, I think we're on the same page, and adding a DEMOTION scheme action
>> > could be helpful for the users who want to efficiently inspect and control the
>> > system/application behavior for their tiered memory systems.  It's unclear how
>> > much benefit this could give to users, though.  I assume Baolin would come back
>> > with some sort of numbers in the next spin.  Nevertheless, I personally don't
>> > think that's a critical blocker, as this patch is essentially just adding a way
>> > for using the pre-existing primitive, namely move_pages(), in a little bit more
>> > efficient manner, for the access pattern-based use cases. 
>> >
>> > If I'm missing something, please feel free to let me know.
>> >
>> > [1] https://github.com/awslabs/damoos
>> >
>> >
>> > Thanks,
>> > SJ
>> >
>> >> 
>> >> Best Regards,
>> >> Huang, Ying
>> 
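The "building block" view in the message above can be made concrete. A scheme pairs an access-pattern filter (region size, access frequency, age) with an action. A feature module such as DAMON_RECLAIM fixes that pattern and exposes only a few knobs. The sketch below is a simplified, hypothetical model, not the kernel's struct damos. ACT_DEMOTE stands in for the proposed DAMOS_DEMOTE action. Returning only the first matching scheme is a simplification for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model of a DAMON-based operation scheme. */
enum scheme_action { ACT_NONE, ACT_PAGEOUT, ACT_DEMOTE };

struct region {
	unsigned long start, end;  /* address range [start, end) */
	unsigned int nr_accesses;  /* accesses seen in the last aggregation */
	unsigned int age;          /* aggregations the pattern has held */
};

struct scheme {
	unsigned long min_sz, max_sz;
	unsigned int min_nr_accesses, max_nr_accesses;
	unsigned int min_age, max_age;
	enum scheme_action action;
};

/* True if the region's size, access frequency and age are all in range. */
bool scheme_matches(const struct scheme *s, const struct region *r)
{
	unsigned long sz = r->end - r->start;

	return sz >= s->min_sz && sz <= s->max_sz &&
	       r->nr_accesses >= s->min_nr_accesses &&
	       r->nr_accesses <= s->max_nr_accesses &&
	       r->age >= s->min_age && r->age <= s->max_age;
}

/* Return the action of the first matching scheme, or ACT_NONE. */
enum scheme_action pick_action(const struct scheme *schemes, size_t n,
			       const struct region *r)
{
	for (size_t i = 0; i < n; i++)
		if (scheme_matches(&schemes[i], r))
			return schemes[i].action;
	return ACT_NONE;
}
```

For example, a scheme that demotes regions with no observed accesses for at least 10 aggregations would be { 4096, ~0UL, 0, 0, 10, ~0U, ACT_DEMOTE }. A tiered-memory feature built on it could hide everything except, say, the minimum age, much as DAMON_RECLAIM exposes only a few module parameters.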

* Re: [PATCH v2 0/2] Add a new scheme to support demotion on tiered memory system
  2021-12-30  3:16                     ` Huang, Ying
@ 2021-12-30  8:03                       ` SeongJae Park
  0 siblings, 0 replies; 18+ messages in thread
From: SeongJae Park @ 2021-12-30  8:03 UTC (permalink / raw)
  To: Huang, Ying
  Cc: SeongJae Park, Baolin Wang, akpm, dave.hansen, ziy, shy828301,
	zhongjiang-ali, xlpang, linux-mm, linux-kernel, Minchan Kim

On Thu, 30 Dec 2021 11:16:15 +0800 "Huang, Ying" <ying.huang@intel.com> wrote:

> SeongJae Park <sj@kernel.org> writes:
> 
> > On Wed, 29 Dec 2021 09:33:56 +0800 "Huang, Ying" <ying.huang@intel.com> wrote:
> >
> >> SeongJae Park <sj@kernel.org> writes:
> >> 
> >> > Hello,
> >> >
> >> > On Mon, 27 Dec 2021 11:09:56 +0800 "Huang, Ying" <ying.huang@intel.com> wrote:
> >> >
> >> >> Hi, SeongJae,
> >> >> 
> >> >> SeongJae Park <sj@kernel.org> writes:
> >> >> 
> >> >> > Hi,
> >> >> >
> >> >> > On Thu, 23 Dec 2021 15:51:18 +0800 "Huang, Ying" <ying.huang@intel.com> wrote:
> >> >> 
> >> >> [snip]
> >> >> 
> >> >> >> It's good to avoid changing the source code of an application to apply
> >> >> >> some memory management optimization (for example, use DAMON +
> >> >> >> madvise()).  But it's much easier to run a user space daemon to optimize
> >> >> >> for the application.  (for example, use DAMON + other information +
> >> >> >> process_madvise()).
> >> >> >> 
> >> >> >> And this kind of per-application optimization is kind of application
> >> >> >> specific policy.  This kind of policy may be too complex and flexible to
> >> >> >> be put in the kernel directly.  For example, in addition to DAMON, some
> >> >> >> other application specific or system knowledge may be helpful too, so we
> >> >> >> have process_madvise() for that before DAMON.  Some more complex
> >> >> >> algorithm may be needed for some applications.
> >> >> >> 
> >> >> >> And this kind of application specific policy usually needs complex
> >> >> >> configuration.  It's hard to export all these policy parameters to the
> >> >> >> user space as the kernel ABI.  Now, DAMON schemes parameters are
> >> >> >> exported in debugfs so they are not considered ABI.  So they may be
> >> >> >> changed at any time.  But applications need some stable and
> >> >> >> well-maintained ABI.
> >> >> >> 
> >> >> >> All in all, IMHO, what we need is a user space per-application policy
> >> >> >> daemon with the information from DAMON and other sources.
> >> >> >
> >> >> > I basically agree with Ying, as I also noted in the cover letter of the DAMOS
> >> >> > patchset[1]:
> >> >> >
> >> >> >     DAMON[1] can be used as a primitive for data access aware memory
> >> >> >     management optimizations.  For that, users who want such optimizations
> >> >> >     should run DAMON, read the monitoring results, analyze it, plan a new
> >> >> >     memory management scheme, and apply the new scheme by themselves.  Such
> >> >> >     efforts will be inevitable for some complicated optimizations.
> >> >> >
> >> >> > [1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=fda504fade7f124858d7022341dc46ff35b45274
> >> >> >
> >> >> > That is, I believe some programs and big companies would definitely have their
> >> >> > own information and want such kind of complicated optimizations.  But, such
> >> >> > optimizations would depend on characteristics of each program and require
> >> >> > investment of some amount of resources.  Some other programs and users wouldn't
> >> >> > have such special information, and/or resource to invest for such
> >> >> > optimizations.  For them, some amount of benefit would be helpful enough even
> >> >> > though it's sub-optimal.
> >> >> >
> >> >> > I think we should help both groups, and DAMOS could be useful for the second
> >> >> > group.  And I don't think DAMOS is useless for the first group.  They could use
> >> >> > their information-based policy in parallel to DAMOS in some cases.  E.g., if
> >> >> > they have a way to predict the data access pattern of specific memory region
> >> >> > even without help from DAMON, they can use their own policy for the region but
> >> >> > DAMOS for other regions.
> >> >> >
> >> >> > Someone could ask why not implement a user-space implementation for the second
> >> >> > group, then.  First of all, DAMOS is not only for the user-space driven virtual
> >> >> > memory management optimization, but also for kernel-space programs and any
> >> >> > DAMOS-supportable address spaces including the physical address space.  And,
> >> >> > another important goal of DAMOS for user space driven use case in addition to
> >> >> > reducing the redundant code is minimizing the user-kernel context switch
> >> >> > overhead for passing the monitoring results information and memory management
> >> >> > action requests.
> >> >> >
> >> >> > In summary, I agree the user space per-application policy daemon will be useful
> >> >> > for the specialized ultimate optimizations, but we also need DAMOS for another
> >> >> > common group of cases.
> >> >> >
> >> >> > If I'm missing something, please feel free to let me know.
> >> >> 
> >> >> I guess that most end-users and quite some system administrators of
> >> >> small companies do not have enough capability to take advantage of the
> >> >> per-application optimizations.  How do they know the appropriate region
> >> >> number and proactive reclaim threshold?
> >> >> 
> >> >> So, per my understanding, the Linux kernel
> >> >> needs to provide:
> >> >> 
> >> >> 1. An in-kernel general policy that is obviously correct and benefits
> >> >>    almost all users and applications, at least no regression.  No
> >> >>    complex configuration or deep knowledge is needed to take advantage
> >> >>    of it.
> >> >> 
> >> >> 2. Some way to inspect and control system and application behavior, so
> >> >>    that some advanced and customized user space policy daemons can be
> >> >>    built to satisfy some advanced users who have enough knowledge
> >> >>    for the applications and systems, for example, oomd.
> >> >
> >> > Agreed, and I think that's the approach that DAMON is currently taking.
> >> > Specifically, we provide the DAMON debugfs interface for users who want to inspect and
> >> > control their system and application behavior.  Using it, we also made a PoC
> >> > level user space policy daemon[1].
> >> >
> >> > For the in-kernel policies, we are developing DAMON-based kernel components one
> >> > by one, for specific usages.  DAMON-based proactive reclamation module
> >> > (DAMON_RECLAIM) is one such example.  Such DAMON-based components will remove
> >> > complex tunables that are necessary for the general inspection and control of the
> >> > system but unnecessary for their specific purpose (e.g., proactive reclamation)
> >> > so that users can use them in a simple manner.  Also, they will use conservative
> >> > default configs so as not to incur visible regressions.  For example, DAMON_RECLAIM
> >> > uses only up to 1% of a single CPU's time for the reclamation by default.
> >> 
> >> I don't think DAMON schemes are the in-kernel general policy I mentioned
> >> above (1.).  For example, NUMA balancing is a general policy to optimize
> >> performance.  It tries to provide a general policy that works for all
> >> users with as few tunables as possible.  If some tunables are needed,
> >> they will be provided as ABI.
> >
> > Exactly.  What I'm saying is, DAMON schemes that are exposed to user space via
> > the debugfs interface are for inspecting the system and developing user space
> > daemons (2.).  They require some level of tuning and provide no stable ABI,
> > only the debugfs interface.  Meanwhile, DAMON-based kernel components like
> > DAMON_RECLAIM can be used for the in-kernel general policy (1.).  For example,
> > DAMON_RECLAIM also tries to be beneficial, or at least incur no regression, for
> > almost all users, provides as few tunables as possible, and provides those
> > via its ABI (module parameters), not debugfs.
> 
> Thanks for your detailed explanation.
> 
> Per my understanding, DAMON schemes are a kind of building block for some
> kernel features such as DAMON_RECLAIM.

I'm pretty sure you understand my point perfectly.

> Whether we need a new scheme depends on whether it's useful as part of
> some kernel feature.  Do you agree?

Yes, agreed.


Thanks,
SJ

> 
> Best Regards,
> Huang, Ying
> 
> > Thanks,
> > SJ
> >
> >> 
> >> Best Regards,
> >> Huang, Ying
> >> 
> >> > In short, I think we're on the same page, and adding a DEMOTION scheme action
> >> > could be helpful for the users who want to efficiently inspect and control the
> >> > system/application behavior for their tiered memory systems.  It's unclear how
> >> > much benefit this could give to users, though.  I assume Baolin would come back
> >> > with some sort of numbers in the next spin.  Nevertheless, I personally don't
> >> > think that's a critical blocker, as this patch is essentially just adding a way
> >> > for using the pre-existing primitive, namely move_pages(), in a little bit more
> >> > efficient manner, for the access pattern-based use cases. 
> >> >
> >> > If I'm missing something, please feel free to let me know.
> >> >
> >> > [1] https://github.com/awslabs/damoos
> >> >
> >> >
> >> > Thanks,
> >> > SJ
> >> >
> >> >> 
> >> >> Best Regards,
> >> >> Huang, Ying
> >> 

* Re: [PATCH v2 0/2] Add a new scheme to support demotion on tiered memory system
  2021-12-28  8:44               ` SeongJae Park
  2021-12-29  1:33                 ` Huang, Ying
@ 2021-12-30  9:31                 ` Baolin Wang
  1 sibling, 0 replies; 18+ messages in thread
From: Baolin Wang @ 2021-12-30  9:31 UTC (permalink / raw)
  To: SeongJae Park, Huang, Ying
  Cc: akpm, dave.hansen, ziy, shy828301, zhongjiang-ali, xlpang,
	linux-mm, linux-kernel, Minchan Kim



On 12/28/2021 4:44 PM, SeongJae Park wrote:
> Hello,
> 
> On Mon, 27 Dec 2021 11:09:56 +0800 "Huang, Ying" <ying.huang@intel.com> wrote:
> 
>> Hi, SeongJae,
>>
>> SeongJae Park <sj@kernel.org> writes:
>>
>>> Hi,
>>>
>>> On Thu, 23 Dec 2021 15:51:18 +0800 "Huang, Ying" <ying.huang@intel.com> wrote:
>>
>> [snip]
>>
>>>> It's good to avoid changing the source code of an application to apply
>>>> some memory management optimization (for example, use DAMON +
>>>> madvise()).  But it's much easier to run a user space daemon to optimize
>>>> for the application.  (for example, use DAMON + other information +
>>>> process_madvise()).
>>>>
>>>> And this kind of per-application optimization is kind of application
>>>> specific policy.  This kind of policy may be too complex and flexible to
>>>> be put in the kernel directly.  For example, in addition to DAMON, some
>>>> other application specific or system knowledge may be helpful too, so we
>>>> have process_madvise() for that before DAMON.  Some more complex
>>>> algorithm may be needed for some applications.
>>>>
>>>> And this kind of application specific policy usually needs complex
>>>> configuration.  It's hard to export all these policy parameters to the
>>>> user space as the kernel ABI.  Now, DAMON schemes parameters are
>>>> exported in debugfs so they are not considered ABI.  So they may be
>>>> changed at any time.  But applications need some stable and
>>>> well-maintained ABI.
>>>>
>>>> All in all, IMHO, what we need is a user space per-application policy
>>>> daemon with the information from DAMON and other sources.
>>>
>>> I basically agree with Ying, as I also noted in the cover letter of the DAMOS
>>> patchset[1]:
>>>
>>>      DAMON[1] can be used as a primitive for data access aware memory
>>>      management optimizations.  For that, users who want such optimizations
>>>      should run DAMON, read the monitoring results, analyze it, plan a new
>>>      memory management scheme, and apply the new scheme by themselves.  Such
>>>      efforts will be inevitable for some complicated optimizations.
>>>
>>> [1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=fda504fade7f124858d7022341dc46ff35b45274
>>>
>>> That is, I believe some programs and big companies would definitely have their
>>> own information and want such kind of complicated optimizations.  But, such
>>> optimizations would depend on characteristics of each program and require
>>> investment of some amount of resources.  Some other programs and users wouldn't
>>> have such special information, and/or resource to invest for such
>>> optimizations.  For them, some amount of benefit would be helpful enough even
>>> though it's sub-optimal.
>>>
>>> I think we should help both groups, and DAMOS could be useful for the second
>>> group.  And I don't think DAMOS is useless for the first group.  They could use
>>> their information-based policy in parallel to DAMOS in some cases.  E.g., if
>>> they have a way to predict the data access pattern of specific memory region
>>> even without help from DAMON, they can use their own policy for the region but
>>> DAMOS for other regions.
>>>
>>> Someone could then ask why we don't provide a user-space implementation for the
>>> second group.  First of all, DAMOS is not only for user-space driven virtual
>>> memory management optimization, but also for kernel-space programs and any
>>> DAMOS-supportable address space, including the physical address space.  And,
>>> for the user-space driven use case, another important goal of DAMOS besides
>>> reducing redundant code is minimizing the user-kernel context switch
>>> overhead of passing monitoring results and memory management action
>>> requests.
>>>
>>> In summary, I agree the user space per-application policy daemon will be useful
>>> for the specialized ultimate optimizations, but we also need DAMOS for another
>>> common group of cases.
>>>
>>> If I'm missing something, please feel free to let me know.
>>
>> I guess that most end-users and quite a few system administrators of
>> small companies don't have enough capability to take advantage of
>> per-application optimizations.  How would they know the appropriate
>> number of regions and proactive reclaim threshold?
>>
>> So, per my understanding, the Linux kernel needs to provide:
>>
>> 1. An in-kernel general policy that is obviously correct and benefits
>>     almost all users and applications, or at least causes no regression.
>>     No complex configuration or deep knowledge should be needed to take
>>     advantage of it.
>>
>> 2. Some way to inspect and control system and application behavior, so
>>     that advanced, customized user-space policy daemons (for example,
>>     oomd) can be built for advanced users who have enough knowledge of
>>     their applications and systems.
> 
> Agreed, and I think that's the approach that DAMON is currently taking.
> Specifically, we provide the DAMON debugfs interface for users who want to
> inspect and control their system and application behavior.  Using it, we also
> made a PoC-level user-space policy daemon[1].
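DAMOS schemes, whether written through the debugfs interface or built into a module such as DAMON_RECLAIM, select target regions by ranges of region size, access frequency and age, and then apply an action to the matching regions. A minimal sketch of that matching step, using simplified hypothetical structs rather than the real `struct damon_region` / `struct damos` definitions; `ACTION_DEMOTE` stands in for the DAMOS_DEMOTE action proposed in this series.

```c
#include <stdbool.h>

/* Simplified, hypothetical mirror of a monitored region. */
struct mon_region {
	unsigned long start, end;  /* address range [start, end) */
	unsigned int nr_accesses;  /* access frequency seen in the last aggregation interval */
	unsigned int age;          /* aggregation intervals this access pattern has persisted */
};

enum scheme_action { ACTION_PAGEOUT, ACTION_DEMOTE, ACTION_STAT };

/* Simplified, hypothetical mirror of a scheme: three inclusive ranges plus an action. */
struct scheme {
	unsigned long min_sz, max_sz;
	unsigned int min_nr_accesses, max_nr_accesses;
	unsigned int min_age, max_age;
	enum scheme_action action;
};

/* A region is a target only if its size, access frequency and age all fall in range. */
static bool scheme_matches(const struct scheme *s, const struct mon_region *r)
{
	unsigned long sz = r->end - r->start;

	return sz >= s->min_sz && sz <= s->max_sz &&
	       r->nr_accesses >= s->min_nr_accesses &&
	       r->nr_accesses <= s->max_nr_accesses &&
	       r->age >= s->min_age && r->age <= s->max_age;
}
```

For example, a scheme with nr_accesses in [0, 0] and age of at least 10 selects regions that have stayed idle for 10 or more aggregation intervals, which is roughly the cold-page target this series wants to demote.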
> 
> For the in-kernel policies, we are developing DAMON-based kernel components one
> by one, for specific usages.  The DAMON-based proactive reclamation module
> (DAMON_RECLAIM) is one such example.  Such components will remove the
> complex tunables that are necessary for general inspection and control of the
> system but unnecessary for their specific purpose (e.g., proactive reclamation),
> allowing users to use them in a simple manner.  They will also use conservative
> default configs so as not to incur visible regressions.  For example, DAMON_RECLAIM
> uses only up to 1% of a single CPU's time for reclamation by default.
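The 1% figure is a time quota. Assuming the DAMON_RECLAIM defaults of quota_ms = 10 and quota_reset_interval_ms = 1000 (worth checking against the DAMON_RECLAIM documentation for a given kernel version), the budget is 10 ms of CPU time per 1-second window, i.e. 10 / 1000 = 1% of one CPU. The sketch models that budget with hypothetical names (`struct time_quota`, `quota_charge()`); it is a simplified stand-in, not how DAMON enforces its quotas internally.

```c
/* Hypothetical time-quota bookkeeping: quota_ms of CPU time per reset window. */
struct time_quota {
	unsigned long quota_ms;          /* CPU time allowed per window */
	unsigned long reset_interval_ms; /* window length */
	unsigned long charged_ms;        /* CPU time already spent in the current window */
};

/* Share of one CPU that the quota allows, in percent. */
static double quota_cpu_percent(const struct time_quota *q)
{
	return 100.0 * (double)q->quota_ms / (double)q->reset_interval_ms;
}

/*
 * Return 1 and charge elapsed_ms if budget remains in this window,
 * or 0 if the window's budget is already used up.
 */
static int quota_charge(struct time_quota *q, unsigned long elapsed_ms)
{
	if (q->charged_ms >= q->quota_ms)
		return 0;
	q->charged_ms += elapsed_ms;
	return 1;
}

/* Start a new window once reset_interval_ms has passed. */
static void quota_reset(struct time_quota *q)
{
	q->charged_ms = 0;
}
```

Because the check happens before charging, one long action can overshoot the window's budget; a real implementation has to decide how to bound that.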
> 
> In short, I think we're on the same page, and adding the DEMOTION scheme action
> could be helpful for the users who want to efficiently inspect and control the
> system/application behavior for their tiered memory systems.  It's unclear how

Agreed. It will be easier for us to deploy it in products for common
scenarios.

> much benefit this could give to users, though.  I assume Baolin would come back
> with some sort of numbers in the next spin.  Nevertheless, I personally don't

Yes, I am still trying to set up an effective measurement environment,
and will include performance numbers in the next version.

> think that's a critical blocker, as this patch essentially just adds a way to use
> the pre-existing primitive, namely move_pages(), a little more efficiently for
> access pattern-based use cases.
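For comparison, a user-space daemon that wants to demote a cold region through move_pages(2) has to name every page individually. A minimal sketch, assuming a page-aligned region and a hypothetical helper `region_page_array()`, that builds the per-page address array: a 2 MiB region with 4 KiB pages already needs 512 entries (2097152 / 4096), plus matching target-node and status arrays, whereas a DAMOS action is applied to the whole region from inside the kernel without passing such arrays across the user-kernel boundary.

```c
#include <stdlib.h>

/*
 * Hypothetical helper: build the per-page address array that move_pages(2)
 * expects for the page-aligned range [start, end).  Returns NULL on
 * allocation failure; *count receives the number of pages.
 */
static void **region_page_array(unsigned long start, unsigned long end,
				unsigned long page_size, unsigned long *count)
{
	unsigned long n = (end - start) / page_size;
	void **pages;
	unsigned long i;

	*count = 0;
	pages = malloc(n * sizeof(*pages));
	if (!pages)
		return NULL;
	for (i = 0; i < n; i++)
		pages[i] = (void *)(start + i * page_size);  /* one entry per page */
	*count = n;
	return pages;
}
```

The caller would then fill an `int nodes[count]` with the slow-memory node ID, allocate an `int status[count]`, call `move_pages(pid, count, pages, nodes, status, MPOL_MF_MOVE)` from libnuma's `<numaif.h>`, and finally `free(pages)`.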
> 
> If I'm missing something, please feel free to let me know.
> 
> [1] https://github.com/awslabs/damoos

Thread overview: 18+ messages
2021-12-22 11:14 [PATCH v2 0/2] Add a new scheme to support demotion on tiered memory system Baolin Wang
2021-12-22 11:14 ` [PATCH v2 1/2] mm: Export the demote_page_list() function Baolin Wang
2021-12-22 11:14 ` [PATCH v2 2/2] mm/damon: Add a new scheme to support demotion on tiered memory system Baolin Wang
2021-12-23  0:01 ` [PATCH v2 0/2] " Andrew Morton
2021-12-23  1:01   ` Baolin Wang
2021-12-23  1:07 ` Huang, Ying
2021-12-23  1:21   ` Baolin Wang
2021-12-23  3:22     ` Huang, Ying
2021-12-23  6:35       ` Baolin Wang
2021-12-23  7:51         ` Huang, Ying
2021-12-23 11:31           ` SeongJae Park
2021-12-27  3:09             ` Huang, Ying
2021-12-28  8:44               ` SeongJae Park
2021-12-29  1:33                 ` Huang, Ying
2021-12-29 10:34                   ` SeongJae Park
2021-12-30  3:16                     ` Huang, Ying
2021-12-30  8:03                       ` SeongJae Park
2021-12-30  9:31                 ` Baolin Wang
