All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Huang, Ying" <ying.huang@intel.com>
To: Hasan Al Maruf <hasan3050@gmail.com>
Cc: akpm@linux-foundation.org, dave.hansen@linux.intel.com,
	feng.tang@intel.com, hasanalmaruf@fb.com,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	mgorman@suse.de, mgorman@techsingularity.net, mhocko@suse.com,
	osalvador@suse.de, peterz@infradead.org, riel@surriel.com,
	shakeelb@google.com, shy828301@gmail.com, weixugc@google.com,
	ziy@nvidia.com, Johannes Weiner <hannes@cmpxchg.org>
Subject: Re: [PATCH -V10 RESEND 2/6] NUMA balancing: optimize page placement for memory tiering system
Date: Wed, 08 Dec 2021 11:16:28 +0800	[thread overview]
Message-ID: <87wnkf3hwz.fsf@yhuang6-desk2.ccr.corp.intel.com> (raw)
In-Reply-To: <20211207063639.83762-1-hasanalmaruf@fb.com> (Hasan Al Maruf's message of "Tue, 7 Dec 2021 01:36:39 -0500")

Hasan Al Maruf <hasan3050@gmail.com> writes:

> Hi Huang,
>
>>+void set_numabalancing_state(bool enabled)
>>+{
>>+	if (enabled)
>>+		sysctl_numa_balancing_mode = NUMA_BALANCING_NORMAL;
>>+	else
>>+		sysctl_numa_balancing_mode = NUMA_BALANCING_DISABLED;
>>+	__set_numabalancing_state(enabled);
>>+}
>>+
>
> One of the properties of optimized NUMA Balancing for tiered memory is we
> are not going to scan top-tier nodes as promotion doesn't make sense there
> (implemented in the next patch [3/6]). However, if a system has only
> single memory node with CPU, does it make sense to run
> `NUMA_BALANCING_NORMAL` mode there? What do you think about downgrading to
> `NUMA_BALANCING_MEMORY_TIERING` mode if a user setup NUMA Balancing on
> the default mode of `NUMA_BALANCING_NORMAL` on a single toptier memory
> node?

Consider a system with only 1 NUMA node and no PMEM, should we refuse
NUMA balancing to be enabled at all?

Per my understanding, the philosophy behind is to keep thing as small as
possible instead of as smart as possible.  Do you agree?

>>diff --git a/mm/vmscan.c b/mm/vmscan.c
>>index c266e64d2f7e..5edb5dfa8900 100644
>>--- a/mm/vmscan.c
>>+++ b/mm/vmscan.c
>>@@ -56,6 +56,7 @@
>>
>> #include <linux/swapops.h>
>> #include <linux/balloon_compaction.h>
>>+#include <linux/sched/sysctl.h>
>>
>> #include "internal.h"
>>
>>@@ -3919,6 +3920,12 @@ static bool pgdat_watermark_boosted(pg_data_t *pgdat, int highest_zoneidx)
>> 	return false;
>> }
>>
>>+/*
>>+ * Keep the free pages on fast memory node a little more than the high
>>+ * watermark to accommodate the promoted pages.
>>+ */
>>+#define NUMA_BALANCING_PROMOTE_WATERMARK	(10UL * 1024 * 1024 >> PAGE_SHIFT)
>>+
>> /*
>>  * Returns true if there is an eligible zone balanced for the request order
>>  * and highest_zoneidx
>>@@ -3940,6 +3947,15 @@ static bool pgdat_balanced(pg_data_t *pgdat, int order, int highest_zoneidx)
>> 			continue;
>>
>> 		mark = high_wmark_pages(zone);
>>+		if (sysctl_numa_balancing_mode & NUMA_BALANCING_MEMORY_TIERING &&
>>+		    numa_demotion_enabled &&
>>+		    next_demotion_node(pgdat->node_id) != NUMA_NO_NODE) {
>>+			unsigned long promote_mark;
>>+
>>+			promote_mark = min(NUMA_BALANCING_PROMOTE_WATERMARK,
>>+					   pgdat->node_present_pages >> 6);
>>+			mark += promote_mark;
>>+		}
>> 		if (zone_watermark_ok_safe(zone, order, mark, highest_zoneidx))
>> 			return true;
>> 	}
>
> This can be moved to a different patch. I think, this patch [2/6] can be
> splitted into two basic patches -- 1. NUMA Balancing interface for tiered
> memory and 2. maintaining a headroom for promotion.

Johannes has taught me that, if we introduce a new function, variable,
or interface, it's better to introduce its user together.  So that we
can determine whether it's necessary to do that, whether the definition
is suitable, etc.  I think that makes sense.  So I try to do that in
this patchset too.

As in [2/5] of your patchset below, another possibility is to make
1. NUMA balancing interface for tiered memory and 2. skip scanning top
tier memory in NUMA balancing in one patch.  One concern is that
although this is an optimization, there's almost no measurable
performance difference.  This makes it hard to justify to extend the
user space interface.  Do you have better data to support this?

> Instead of having a static value for `NUMA_BALANCING_PROMOTE_WATERMARK`
> what about decoupling the allocation and reclamation and add a user-space
> interface for controling them?

This means to add a new user space ABI.  Because we may need to support
the new ABI forever, we should have strong justification to add it.
I am not against to add an ABI to adjust promotion watermark in
general.  I think that the path could be,

- Have a simplest solution that works without introducing new ABI, like
  something in this patch, or revised.

- Then try to add a new ABI in a separate patch with enough
  justification, for example, with much improved performance data.

Do you agree?

> Do you think patch [2/5] and [3/5] of this series can be merged to your
> current patchset?
>
> https://lore.kernel.org/all/cover.1637778851.git.hasanalmaruf@fb.com/

Best Regards,
Huang, Ying

  reply	other threads:[~2021-12-08  3:16 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-12-07  2:27 [PATCH -V10 RESEND 0/6] NUMA balancing: optimize memory placement for memory tiering system Huang Ying
2021-12-07  2:27 ` [PATCH -V10 RESEND 1/6] NUMA Balancing: add page promotion counter Huang Ying
2021-12-07  6:05   ` Hasan Al Maruf
2021-12-08  2:16     ` Huang, Ying
2021-12-17  7:25   ` Baolin Wang
2021-12-07  2:27 ` [PATCH -V10 RESEND 2/6] NUMA balancing: optimize page placement for memory tiering system Huang Ying
2021-12-07  6:36   ` Hasan Al Maruf
2021-12-08  3:16     ` Huang, Ying [this message]
2021-12-17  7:35   ` Baolin Wang
2021-12-07  2:27 ` [PATCH -V10 RESEND 3/6] memory tiering: skip to scan fast memory Huang Ying
2021-12-17  7:41   ` Baolin Wang
2021-12-07  2:27 ` [PATCH -V10 RESEND 4/6] memory tiering: hot page selection with hint page fault latency Huang Ying
2021-12-07  2:27 ` [PATCH -V10 RESEND 5/6] memory tiering: rate limit NUMA migration throughput Huang Ying
2021-12-07  2:27 ` [PATCH -V10 RESEND 6/6] memory tiering: adjust hot threshold automatically Huang Ying
2022-01-12 16:10 ` [PATCH -V10 RESEND 0/6] NUMA balancing: optimize memory placement for memory tiering system Peter Zijlstra
2022-01-13  7:19   ` Huang, Ying
2022-01-13  9:49     ` Peter Zijlstra
2022-01-13 12:06       ` Huang, Ying
2022-01-13 13:00         ` Peter Zijlstra
2022-01-13 13:13           ` Huang, Ying
2022-01-13 14:24           ` Huang, Ying
2022-01-14  5:24             ` Huang, Ying

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87wnkf3hwz.fsf@yhuang6-desk2.ccr.corp.intel.com \
    --to=ying.huang@intel.com \
    --cc=akpm@linux-foundation.org \
    --cc=dave.hansen@linux.intel.com \
    --cc=feng.tang@intel.com \
    --cc=hannes@cmpxchg.org \
    --cc=hasan3050@gmail.com \
    --cc=hasanalmaruf@fb.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mgorman@suse.de \
    --cc=mgorman@techsingularity.net \
    --cc=mhocko@suse.com \
    --cc=osalvador@suse.de \
    --cc=peterz@infradead.org \
    --cc=riel@surriel.com \
    --cc=shakeelb@google.com \
    --cc=shy828301@gmail.com \
    --cc=weixugc@google.com \
    --cc=ziy@nvidia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.