* [PATCH v4] mm/compaction: let proactive compaction order configurable
@ 2021-04-28  2:28 chukaiping
  2021-05-10  0:17 ` Andrew Morton
                   ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: chukaiping @ 2021-04-28  2:28 UTC (permalink / raw)
  To: mcgrof, keescook, yzaikin, akpm, vbabka, nigupta, bhe,
	khalid.aziz, iamjoonsoo.kim, mateusznosek0, sh_def
  Cc: linux-kernel, linux-fsdevel, linux-mm

Currently the proactive compaction order is fixed to
COMPACTION_HPAGE_ORDER (9). This is fine on most machines with plenty
of normal 4KB memory, but it is too high for machines with little
normal memory, for example machines where most memory is configured
as 1GB hugetlbfs huge pages. On such machines the maximum order of
free pages is often below 9, and stays below 9 even after forced
compaction, so proactive compaction is triggered very frequently.
On these machines we only care about orders of 3 or 4. This patch
exports the order to proc and makes it configurable by the user;
the default value is still COMPACTION_HPAGE_ORDER.

Signed-off-by: chukaiping <chukaiping@baidu.com>
Reported-by: kernel test robot <lkp@intel.com>
---

Changes in v4:
    - change the sysctl file name to proactive_compaction_order

Changes in v3:
    - change the min value of compaction_order to 1 because the fragmentation
      index of order 0 is always 0
    - move the definition of max_buddy_zone into #ifdef CONFIG_COMPACTION

Changes in v2:
    - fix the compile error in ia64 and powerpc, move the initialization
      of sysctl_compaction_order to kcompactd_init because
      COMPACTION_HPAGE_ORDER is a variable in these architectures
    - change the hard coded max order number from 10 to MAX_ORDER - 1

 include/linux/compaction.h |    1 +
 kernel/sysctl.c            |   10 ++++++++++
 mm/compaction.c            |   12 ++++++++----
 3 files changed, 19 insertions(+), 4 deletions(-)

diff --git a/include/linux/compaction.h b/include/linux/compaction.h
index ed4070e..a0226b1 100644
--- a/include/linux/compaction.h
+++ b/include/linux/compaction.h
@@ -83,6 +83,7 @@ static inline unsigned long compact_gap(unsigned int order)
 #ifdef CONFIG_COMPACTION
 extern int sysctl_compact_memory;
 extern unsigned int sysctl_compaction_proactiveness;
+extern unsigned int sysctl_proactive_compaction_order;
 extern int sysctl_compaction_handler(struct ctl_table *table, int write,
 			void *buffer, size_t *length, loff_t *ppos);
 extern int sysctl_extfrag_threshold;
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 62fbd09..ed9012e 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -196,6 +196,7 @@ enum sysctl_writes_mode {
 #endif /* CONFIG_SCHED_DEBUG */
 
 #ifdef CONFIG_COMPACTION
+static int max_buddy_zone = MAX_ORDER - 1;
 static int min_extfrag_threshold;
 static int max_extfrag_threshold = 1000;
 #endif
@@ -2871,6 +2872,15 @@ int proc_do_static_key(struct ctl_table *table, int write,
 		.extra2		= &one_hundred,
 	},
 	{
+		.procname       = "proactive_compaction_order",
+		.data           = &sysctl_proactive_compaction_order,
+		.maxlen         = sizeof(sysctl_proactive_compaction_order),
+		.mode           = 0644,
+		.proc_handler   = proc_dointvec_minmax,
+		.extra1         = SYSCTL_ONE,
+		.extra2         = &max_buddy_zone,
+	},
+	{
 		.procname	= "extfrag_threshold",
 		.data		= &sysctl_extfrag_threshold,
 		.maxlen		= sizeof(int),
diff --git a/mm/compaction.c b/mm/compaction.c
index e04f447..171436e 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -1925,17 +1925,18 @@ static bool kswapd_is_running(pg_data_t *pgdat)
 
 /*
  * A zone's fragmentation score is the external fragmentation wrt to the
- * COMPACTION_HPAGE_ORDER. It returns a value in the range [0, 100].
+ * sysctl_proactive_compaction_order. It returns a value in the range
+ * [0, 100].
  */
 static unsigned int fragmentation_score_zone(struct zone *zone)
 {
-	return extfrag_for_order(zone, COMPACTION_HPAGE_ORDER);
+	return extfrag_for_order(zone, sysctl_proactive_compaction_order);
 }
 
 /*
  * A weighted zone's fragmentation score is the external fragmentation
- * wrt to the COMPACTION_HPAGE_ORDER scaled by the zone's size. It
- * returns a value in the range [0, 100].
+ * wrt to the sysctl_proactive_compaction_order scaled by the zone's size.
+ * It returns a value in the range [0, 100].
  *
  * The scaling factor ensures that proactive compaction focuses on larger
  * zones like ZONE_NORMAL, rather than smaller, specialized zones like
@@ -2666,6 +2667,7 @@ static void compact_nodes(void)
  * background. It takes values in the range [0, 100].
  */
 unsigned int __read_mostly sysctl_compaction_proactiveness = 20;
+unsigned int __read_mostly sysctl_proactive_compaction_order;
 
 /*
  * This is the entry point for compacting all nodes via
@@ -2958,6 +2960,8 @@ static int __init kcompactd_init(void)
 	int nid;
 	int ret;
 
+	sysctl_proactive_compaction_order = COMPACTION_HPAGE_ORDER;
+
 	ret = cpuhp_setup_state_nocalls(CPUHP_AP_ONLINE_DYN,
 					"mm/compaction:online",
 					kcompactd_cpu_online, NULL);
-- 
1.7.1
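For reference, a minimal userspace sketch of how the proposed tunable would
be set. The sysctl path assumes this patch as posted (it does not exist in
mainline), and the bounds mirror the patch's extra1/extra2 values
(MAX_ORDER - 1 is 10 on a typical x86_64 config):

```python
# Sketch only: set the proposed proactive compaction order from userspace.
# The sysctl path below assumes this patch is applied.

MAX_BUDDY_ORDER = 10  # MAX_ORDER - 1 on a typical x86_64 config

def set_proactive_compaction_order(order,
        path="/proc/sys/vm/proactive_compaction_order"):
    """Write the desired order to the sysctl file (requires root)."""
    if not 1 <= order <= MAX_BUDDY_ORDER:  # bounds match extra1/extra2
        raise ValueError("order out of range")
    with open(path, "w") as f:
        f.write(str(order))
```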


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH v4] mm/compaction: let proactive compaction order configurable
  2021-04-28  2:28 [PATCH v4] mm/compaction: let proactive compaction order configurable chukaiping
@ 2021-05-10  0:17 ` Andrew Morton
  2021-05-10  2:10   ` Re: " Chu,Kaiping
  2021-05-11  4:20     ` David Rientjes
  2021-05-28 17:42 ` Vlastimil Babka
  2021-06-09 10:44 ` David Hildenbrand
  2 siblings, 2 replies; 11+ messages in thread
From: Andrew Morton @ 2021-05-10  0:17 UTC (permalink / raw)
  To: chukaiping
  Cc: mcgrof, keescook, yzaikin, vbabka, nigupta, bhe, khalid.aziz,
	iamjoonsoo.kim, mateusznosek0, sh_def, linux-kernel,
	linux-fsdevel, linux-mm, Mel Gorman, David Rientjes

On Wed, 28 Apr 2021 10:28:21 +0800 chukaiping <chukaiping@baidu.com> wrote:

> Currently the proactive compaction order is fixed to
> COMPACTION_HPAGE_ORDER (9). This is fine on most machines with plenty
> of normal 4KB memory, but it is too high for machines with little
> normal memory, for example machines where most memory is configured
> as 1GB hugetlbfs huge pages. On such machines the maximum order of
> free pages is often below 9, and stays below 9 even after forced
> compaction, so proactive compaction is triggered very frequently.
> On these machines we only care about orders of 3 or 4. This patch
> exports the order to proc and makes it configurable by the user;
> the default value is still COMPACTION_HPAGE_ORDER.

It would be great to do this automatically?  It's quite simple to see
when memory is being handed out to hugetlbfs - so can we tune
proactive_compaction_order in response to this?  That would be far
better than adding a manual tunable.

But from having read Khalid's comments, that does sound quite involved.
Is there some partial solution that we can come up with that will get
most people out of trouble?

That being said, this patch is super-super-simple so perhaps we should
just merge it just to get one person (and hopefully a few more) out of
trouble.  But on the other hand, once we add a /proc tunable we must
maintain that tunable for ever (or at least a very long time) even if
the internal implementations change a lot.


* Re: [PATCH v4] mm/compaction: let proactive compaction order configurable
  2021-05-10  0:17 ` Andrew Morton
@ 2021-05-10  2:10   ` Chu,Kaiping
  2021-05-11  4:20     ` David Rientjes
  1 sibling, 0 replies; 11+ messages in thread
From: Chu,Kaiping @ 2021-05-10  2:10 UTC (permalink / raw)
  To: Andrew Morton
  Cc: mcgrof, keescook, yzaikin, vbabka, nigupta, bhe, khalid.aziz,
	iamjoonsoo.kim, mateusznosek0, sh_def, linux-kernel,
	linux-fsdevel, linux-mm, Mel Gorman, David Rientjes



-----Original Message-----
From: Andrew Morton <akpm@linux-foundation.org> 
Sent: May 10, 2021 8:18
To: Chu,Kaiping <chukaiping@baidu.com>
Cc: mcgrof@kernel.org; keescook@chromium.org; yzaikin@google.com; vbabka@suse.cz; nigupta@nvidia.com; bhe@redhat.com; khalid.aziz@oracle.com; iamjoonsoo.kim@lge.com; mateusznosek0@gmail.com; sh_def@163.com; linux-kernel@vger.kernel.org; linux-fsdevel@vger.kernel.org; linux-mm@kvack.org; Mel Gorman <mgorman@techsingularity.net>; David Rientjes <rientjes@google.com>
Subject: Re: [PATCH v4] mm/compaction: let proactive compaction order configurable

On Wed, 28 Apr 2021 10:28:21 +0800 chukaiping <chukaiping@baidu.com> wrote:

> > Currently the proactive compaction order is fixed to
> > COMPACTION_HPAGE_ORDER (9). This is fine on most machines with plenty
> > of normal 4KB memory, but it is too high for machines with little
> > normal memory, for example machines where most memory is configured
> > as 1GB hugetlbfs huge pages. On such machines the maximum order of
> > free pages is often below 9, and stays below 9 even after forced
> > compaction, so proactive compaction is triggered very frequently.
> > On these machines we only care about orders of 3 or 4. This patch
> > exports the order to proc and makes it configurable by the user;
> > the default value is still COMPACTION_HPAGE_ORDER.

> It would be great to do this automatically?  It's quite simple to see when memory is being handed out to hugetlbfs - so can we tune proactive_compaction_order in response to this?  That would be far better than adding a manual tunable.

> But from having read Khalid's comments, that does sound quite involved.
> Is there some partial solution that we can come up with that will get most people out of trouble?

> That being said, this patch is super-super-simple so perhaps we should just merge it just to get one person (and hopefully a few more) out of trouble.  But on the other hand, once we add a /proc tunable we must maintain that tunable for ever (or at least a very long time) even if the internal implementations change a lot.

Currently the fragmentation index of each zone is per order; there is
no single fragmentation index for the whole system, so we can only use
a user-defined order for proactive compaction. I keep thinking about a
way to calculate an average fragmentation index for the system, but so
far I haven't worked one out. I think we can just use the proc file to
configure the order manually; if we come up with a better solution in
the future, we can keep the proc file but remove the internal
implementation.
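For what it's worth, a size-weighted average is already how
fragmentation_score_node() combines per-zone scores; a rough Python sketch
of that idea (illustrative data only, not read from a running kernel):

```python
# Sketch: combine per-zone fragmentation scores into a single number by
# weighting each zone's score by its size, the same idea used by
# fragmentation_score_node() in mm/compaction.c.

def weighted_fragmentation_score(zones):
    """zones: iterable of (managed_pages, score) pairs, score in [0, 100].
    Returns one size-weighted score in [0, 100]."""
    zones = list(zones)
    total_pages = sum(pages for pages, _ in zones)
    if total_pages == 0:
        return 0
    return sum(pages * score for pages, score in zones) // total_pages
```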


* Re: [PATCH v4] mm/compaction: let proactive compaction order configurable
  2021-05-10  0:17 ` Andrew Morton
@ 2021-05-11  4:20     ` David Rientjes
  2021-05-11  4:20     ` David Rientjes
  1 sibling, 0 replies; 11+ messages in thread
From: David Rientjes @ 2021-05-11  4:20 UTC (permalink / raw)
  To: Andrew Morton
  Cc: chukaiping, mcgrof, keescook, yzaikin, vbabka, nigupta, bhe,
	khalid.aziz, iamjoonsoo.kim, mateusznosek0, sh_def, linux-kernel,
	linux-fsdevel, linux-mm, Mel Gorman

On Sun, 9 May 2021, Andrew Morton wrote:

> > Currently the proactive compaction order is fixed to
> > COMPACTION_HPAGE_ORDER (9). This is fine on most machines with plenty
> > of normal 4KB memory, but it is too high for machines with little
> > normal memory, for example machines where most memory is configured
> > as 1GB hugetlbfs huge pages. On such machines the maximum order of
> > free pages is often below 9, and stays below 9 even after forced
> > compaction, so proactive compaction is triggered very frequently.
> > On these machines we only care about orders of 3 or 4. This patch
> > exports the order to proc and makes it configurable by the user;
> > the default value is still COMPACTION_HPAGE_ORDER.
> 
> It would be great to do this automatically?  It's quite simple to see
> when memory is being handed out to hugetlbfs - so can we tune
> proactive_compaction_order in response to this?  That would be far
> better than adding a manual tunable.
> 
> But from having read Khalid's comments, that does sound quite involved.
> Is there some partial solution that we can come up with that will get
> most people out of trouble?
> 
> That being said, this patch is super-super-simple so perhaps we should
> just merge it just to get one person (and hopefully a few more) out of
> trouble.  But on the other hand, once we add a /proc tunable we must
> maintain that tunable for ever (or at least a very long time) even if
> the internal implementations change a lot.
> 

As mentioned in v3 of the patch, I'm not sure why this belongs in the 
kernel at all.

I understand that the system is largely consumed by 1GB gigantic pages and 
that a small percentage of memory is left for native pages.  Thus, 
fragmentation readily occurs and can affect large order allocations even 
at the levels of order-3 or order-4.

So it seems like the ideal solution would be to monitor the fragmentation 
index at the order you care about (the same order you would use for this 
new tunable) and root userspace would manually trigger compaction when 
necessary.  When this was brought up, it was commented that explicitly 
triggered compaction is too expensive to do all in one iteration.  That's 
fair enough, but shouldn't that be an improvement on explicitly triggered 
compaction through sysfs to provide a shorter term (or weaker form) of 
compaction rather than build additional policy decisions into the kernel?

If done this way, there would be a clear separation between mechanism and 
policy and the kernel would not need to carry these sysctls to tune very 
niche areas.
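A rough sketch of the userspace policy described above, with caveats: it
uses only existing interfaces (/proc/buddyinfo and the vm.compact_memory
sysctl), and the fragmentation computation is a userspace approximation of
the kernel's extfrag_for_order(), not the kernel code itself:

```python
# Sketch: userspace monitor that watches external fragmentation at a
# chosen order and triggers full compaction when it crosses a threshold.

def extfrag_percent(free_counts, order):
    """free_counts[o] = number of free blocks of order o, as listed per
    zone in /proc/buddyinfo. Returns the percentage of free pages that
    cannot satisfy an allocation of the given order."""
    free_pages = sum(n << o for o, n in enumerate(free_counts))
    if free_pages == 0:
        return 0
    # Order-sized chunks available in blocks of order >= order.
    suitable = sum(n << (o - order)
                   for o, n in enumerate(free_counts) if o >= order)
    return (free_pages - (suitable << order)) * 100 // free_pages

def maybe_compact(order=3, threshold=50,
                  buddyinfo="/proc/buddyinfo",
                  trigger="/proc/sys/vm/compact_memory"):
    """Trigger compaction (requires root) if any zone is too fragmented."""
    with open(buddyinfo) as f:
        for line in f:
            counts = [int(n) for n in line.split()[4:]]
            if counts and extfrag_percent(counts, order) > threshold:
                with open(trigger, "w") as t:
                    t.write("1")
                return True
    return False
```

Run periodically from cron or a systemd timer, this keeps the "how often
and how hard to compact" policy entirely in userspace.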


* Re: [PATCH v4] mm/compaction: let proactive compaction order configurable
  2021-04-28  2:28 [PATCH v4] mm/compaction: let proactive compaction order configurable chukaiping
  2021-05-10  0:17 ` Andrew Morton
@ 2021-05-28 17:42 ` Vlastimil Babka
  2021-06-01  1:15   ` Re: " Chu,Kaiping
  2021-06-09 10:44 ` David Hildenbrand
  2 siblings, 1 reply; 11+ messages in thread
From: Vlastimil Babka @ 2021-05-28 17:42 UTC (permalink / raw)
  To: chukaiping, mcgrof, keescook, yzaikin, akpm, nigupta, bhe,
	khalid.aziz, iamjoonsoo.kim, mateusznosek0, sh_def
  Cc: linux-kernel, linux-fsdevel, linux-mm

On 4/28/21 4:28 AM, chukaiping wrote:
> Currently the proactive compaction order is fixed to
> COMPACTION_HPAGE_ORDER (9). This is fine on most machines with plenty
> of normal 4KB memory, but it is too high for machines with little
> normal memory, for example machines where most memory is configured
> as 1GB hugetlbfs huge pages. On such machines the maximum order of
> free pages is often below 9, and stays below 9 even after forced
> compaction, so proactive compaction is triggered very frequently.

Could you be more concrete about "very frequently"? There's a proactive_defer
mechanism that should help here. Normally the proactive compaction attempt
happens each 500ms, but if it fails to improve the fragmentation score, it
defers for 32 seconds. So is 32 seconds still too frequent? Or the score does
improve thus defer doesn't happen, but the cost of that improvement is too high
compared to the amount of the improvement?
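The cadence described here can be sketched as a toy simulation; the 500 ms
check interval and the 64-check (32 s) defer are taken from this
description, not from running kernel code:

```python
# Toy model of proactive compaction scheduling: kcompactd checks every
# 500 ms; after an attempt that fails to improve the fragmentation
# score, the next 64 checks are skipped (64 * 500 ms = 32 s backoff).

CHECK_INTERVAL_MS = 500
DEFER_CHECKS = 64  # yields the 32-second defer described above

def attempt_times_ms(improves, horizon_ms):
    """improves(t_ms) -> bool says whether an attempt at time t improved
    the score. Returns the times (ms) at which attempts actually run."""
    times, defer, t = [], 0, 0
    while t < horizon_ms:
        if defer:
            defer -= 1          # deferred: skip this check
        else:
            times.append(t)
            if not improves(t):
                defer = DEFER_CHECKS
        t += CHECK_INTERVAL_MS
    return times
```

With a score that never improves, attempts land 32.5 s apart; with one that
always improves, an attempt runs every 500 ms.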

> On these machines we only care about orders of 3 or 4.
> This patch exports the order to proc and makes it configurable
> by the user; the default value is still COMPACTION_HPAGE_ORDER.
> 
> Signed-off-by: chukaiping <chukaiping@baidu.com>
> Reported-by: kernel test robot <lkp@intel.com>
> ---
> 
> Changes in v4:
>     - change the sysctl file name to proactive_compaction_order
> 
> Changes in v3:
>     - change the min value of compaction_order to 1 because the fragmentation
>       index of order 0 is always 0
>     - move the definition of max_buddy_zone into #ifdef CONFIG_COMPACTION
> 
> Changes in v2:
>     - fix the compile error in ia64 and powerpc, move the initialization
>       of sysctl_compaction_order to kcompactd_init because
>       COMPACTION_HPAGE_ORDER is a variable in these architectures
>     - change the hard coded max order number from 10 to MAX_ORDER - 1
> 
>  include/linux/compaction.h |    1 +
>  kernel/sysctl.c            |   10 ++++++++++
>  mm/compaction.c            |   12 ++++++++----
>  3 files changed, 19 insertions(+), 4 deletions(-)
> 
> diff --git a/include/linux/compaction.h b/include/linux/compaction.h
> index ed4070e..a0226b1 100644
> --- a/include/linux/compaction.h
> +++ b/include/linux/compaction.h
> @@ -83,6 +83,7 @@ static inline unsigned long compact_gap(unsigned int order)
>  #ifdef CONFIG_COMPACTION
>  extern int sysctl_compact_memory;
>  extern unsigned int sysctl_compaction_proactiveness;
> +extern unsigned int sysctl_proactive_compaction_order;
>  extern int sysctl_compaction_handler(struct ctl_table *table, int write,
>  			void *buffer, size_t *length, loff_t *ppos);
>  extern int sysctl_extfrag_threshold;
> diff --git a/kernel/sysctl.c b/kernel/sysctl.c
> index 62fbd09..ed9012e 100644
> --- a/kernel/sysctl.c
> +++ b/kernel/sysctl.c
> @@ -196,6 +196,7 @@ enum sysctl_writes_mode {
>  #endif /* CONFIG_SCHED_DEBUG */
>  
>  #ifdef CONFIG_COMPACTION
> +static int max_buddy_zone = MAX_ORDER - 1;
>  static int min_extfrag_threshold;
>  static int max_extfrag_threshold = 1000;
>  #endif
> @@ -2871,6 +2872,15 @@ int proc_do_static_key(struct ctl_table *table, int write,
>  		.extra2		= &one_hundred,
>  	},
>  	{
> +		.procname       = "proactive_compaction_order",
> +		.data           = &sysctl_proactive_compaction_order,
> +		.maxlen         = sizeof(sysctl_proactive_compaction_order),
> +		.mode           = 0644,
> +		.proc_handler   = proc_dointvec_minmax,
> +		.extra1         = SYSCTL_ONE,
> +		.extra2         = &max_buddy_zone,
> +	},
> +	{
>  		.procname	= "extfrag_threshold",
>  		.data		= &sysctl_extfrag_threshold,
>  		.maxlen		= sizeof(int),
> diff --git a/mm/compaction.c b/mm/compaction.c
> index e04f447..171436e 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -1925,17 +1925,18 @@ static bool kswapd_is_running(pg_data_t *pgdat)
>  
>  /*
>   * A zone's fragmentation score is the external fragmentation wrt to the
> - * COMPACTION_HPAGE_ORDER. It returns a value in the range [0, 100].
> + * sysctl_proactive_compaction_order. It returns a value in the range
> + * [0, 100].
>   */
>  static unsigned int fragmentation_score_zone(struct zone *zone)
>  {
> -	return extfrag_for_order(zone, COMPACTION_HPAGE_ORDER);
> +	return extfrag_for_order(zone, sysctl_proactive_compaction_order);
>  }
>  
>  /*
>   * A weighted zone's fragmentation score is the external fragmentation
> - * wrt to the COMPACTION_HPAGE_ORDER scaled by the zone's size. It
> - * returns a value in the range [0, 100].
> + * wrt to the sysctl_proactive_compaction_order scaled by the zone's size.
> + * It returns a value in the range [0, 100].
>   *
>   * The scaling factor ensures that proactive compaction focuses on larger
>   * zones like ZONE_NORMAL, rather than smaller, specialized zones like
> @@ -2666,6 +2667,7 @@ static void compact_nodes(void)
>   * background. It takes values in the range [0, 100].
>   */
>  unsigned int __read_mostly sysctl_compaction_proactiveness = 20;
> +unsigned int __read_mostly sysctl_proactive_compaction_order;
>  
>  /*
>   * This is the entry point for compacting all nodes via
> @@ -2958,6 +2960,8 @@ static int __init kcompactd_init(void)
>  	int nid;
>  	int ret;
>  
> +	sysctl_proactive_compaction_order = COMPACTION_HPAGE_ORDER;
> +
>  	ret = cpuhp_setup_state_nocalls(CPUHP_AP_ONLINE_DYN,
>  					"mm/compaction:online",
>  					kcompactd_cpu_online, NULL);
> 



* Re: [PATCH v4] mm/compaction: let proactive compaction order configurable
  2021-05-28 17:42 ` Vlastimil Babka
@ 2021-06-01  1:15   ` Chu,Kaiping
  2021-06-16 13:49     ` Vlastimil Babka
  0 siblings, 1 reply; 11+ messages in thread
From: Chu,Kaiping @ 2021-06-01  1:15 UTC (permalink / raw)
  To: Vlastimil Babka, mcgrof, keescook, yzaikin, akpm, nigupta, bhe,
	khalid.aziz, iamjoonsoo.kim, mateusznosek0, sh_def
  Cc: linux-kernel, linux-fsdevel, linux-mm



> -----Original Message-----
> From: Vlastimil Babka <vbabka@suse.cz>
> Sent: May 29, 2021 1:42
> To: Chu,Kaiping <chukaiping@baidu.com>; mcgrof@kernel.org;
> keescook@chromium.org; yzaikin@google.com; akpm@linux-foundation.org;
> nigupta@nvidia.com; bhe@redhat.com; khalid.aziz@oracle.com;
> iamjoonsoo.kim@lge.com; mateusznosek0@gmail.com; sh_def@163.com
> Cc: linux-kernel@vger.kernel.org; linux-fsdevel@vger.kernel.org;
> linux-mm@kvack.org
> Subject: Re: [PATCH v4] mm/compaction: let proactive compaction order
> configurable
> 
> On 4/28/21 4:28 AM, chukaiping wrote:
> > Currently the proactive compaction order is fixed to
> > COMPACTION_HPAGE_ORDER (9). This is fine on most machines with plenty
> > of normal 4KB memory, but it is too high for machines with little
> > normal memory, for example machines where most memory is configured
> > as 1GB hugetlbfs huge pages. On such machines the maximum order of
> > free pages is often below 9, and stays below 9 even after forced
> > compaction, so proactive compaction is triggered very frequently.
> 
> Could you be more concrete about "very frequently"? There's a
> proactive_defer mechanism that should help here. Normally the proactive
> compaction attempt happens each 500ms, but if it fails to improve the
> fragmentation score, it defers for 32 seconds. So is 32 seconds still too
> frequent? Or the score does improve thus defer doesn't happen, but the cost
> of that improvement is too high compared to the amount of the
> improvement?
I didn't measure the frequency accurately; I only judged it from the
code. A 32-second defer is still very short for us; we want the
proactive compaction period to be on the order of hours.

> 
> > On these machines we only care about orders of 3 or 4.
> > This patch exports the order to proc and makes it configurable by
> > the user; the default value is still COMPACTION_HPAGE_ORDER.
> >
> > Signed-off-by: chukaiping <chukaiping@baidu.com>
> > Reported-by: kernel test robot <lkp@intel.com>
> > ---
> >
> > Changes in v4:
> >     - change the sysctl file name to proactive_compaction_order
> >
> > Changes in v3:
> >     - change the min value of compaction_order to 1 because the
> fragmentation
> >       index of order 0 is always 0
> >     - move the definition of max_buddy_zone into #ifdef
> > CONFIG_COMPACTION
> >
> > Changes in v2:
> >     - fix the compile error in ia64 and powerpc, move the initialization
> >       of sysctl_compaction_order to kcompactd_init because
> >       COMPACTION_HPAGE_ORDER is a variable in these architectures
> >     - change the hard coded max order number from 10 to MAX_ORDER - 1
> >
> >  include/linux/compaction.h |    1 +
> >  kernel/sysctl.c            |   10 ++++++++++
> >  mm/compaction.c            |   12 ++++++++----
> >  3 files changed, 19 insertions(+), 4 deletions(-)
> >
> > diff --git a/include/linux/compaction.h b/include/linux/compaction.h
> > index ed4070e..a0226b1 100644
> > --- a/include/linux/compaction.h
> > +++ b/include/linux/compaction.h
> > @@ -83,6 +83,7 @@ static inline unsigned long compact_gap(unsigned int
> > order)  #ifdef CONFIG_COMPACTION  extern int sysctl_compact_memory;
> > extern unsigned int sysctl_compaction_proactiveness;
> > +extern unsigned int sysctl_proactive_compaction_order;
> >  extern int sysctl_compaction_handler(struct ctl_table *table, int write,
> >  			void *buffer, size_t *length, loff_t *ppos);  extern int
> > sysctl_extfrag_threshold; diff --git a/kernel/sysctl.c
> > b/kernel/sysctl.c index 62fbd09..ed9012e 100644
> > --- a/kernel/sysctl.c
> > +++ b/kernel/sysctl.c
> > @@ -196,6 +196,7 @@ enum sysctl_writes_mode {  #endif /*
> > CONFIG_SCHED_DEBUG */
> >
> >  #ifdef CONFIG_COMPACTION
> > +static int max_buddy_zone = MAX_ORDER - 1;
> >  static int min_extfrag_threshold;
> >  static int max_extfrag_threshold = 1000;  #endif @@ -2871,6 +2872,15
> > @@ int proc_do_static_key(struct ctl_table *table, int write,
> >  		.extra2		= &one_hundred,
> >  	},
> >  	{
> > +		.procname       = "proactive_compaction_order",
> > +		.data           = &sysctl_proactive_compaction_order,
> > +		.maxlen         = sizeof(sysctl_proactive_compaction_order),
> > +		.mode           = 0644,
> > +		.proc_handler   = proc_dointvec_minmax,
> > +		.extra1         = SYSCTL_ONE,
> > +		.extra2         = &max_buddy_zone,
> > +	},
> > +	{
> >  		.procname	= "extfrag_threshold",
> >  		.data		= &sysctl_extfrag_threshold,
> >  		.maxlen		= sizeof(int),
> > diff --git a/mm/compaction.c b/mm/compaction.c index e04f447..171436e
> > 100644
> > --- a/mm/compaction.c
> > +++ b/mm/compaction.c
> > @@ -1925,17 +1925,18 @@ static bool kswapd_is_running(pg_data_t
> > *pgdat)
> >
> >  /*
> >   * A zone's fragmentation score is the external fragmentation wrt to
> > the
> > - * COMPACTION_HPAGE_ORDER. It returns a value in the range [0, 100].
> > + * sysctl_proactive_compaction_order. It returns a value in the range
> > + * [0, 100].
> >   */
> >  static unsigned int fragmentation_score_zone(struct zone *zone)  {
> > -	return extfrag_for_order(zone, COMPACTION_HPAGE_ORDER);
> > +	return extfrag_for_order(zone, sysctl_proactive_compaction_order);
> >  }
> >
> >  /*
> >   * A weighted zone's fragmentation score is the external
> > fragmentation
> > - * wrt to the COMPACTION_HPAGE_ORDER scaled by the zone's size. It
> > - * returns a value in the range [0, 100].
> > + * wrt to the sysctl_proactive_compaction_order scaled by the zone's size.
> > + * It returns a value in the range [0, 100].
> >   *
> >   * The scaling factor ensures that proactive compaction focuses on larger
> >   * zones like ZONE_NORMAL, rather than smaller, specialized zones
> > like @@ -2666,6 +2667,7 @@ static void compact_nodes(void)
> >   * background. It takes values in the range [0, 100].
> >   */
> >  unsigned int __read_mostly sysctl_compaction_proactiveness = 20;
> > +unsigned int __read_mostly sysctl_proactive_compaction_order;
> >
> >  /*
> >   * This is the entry point for compacting all nodes via @@ -2958,6
> > +2960,8 @@ static int __init kcompactd_init(void)
> >  	int nid;
> >  	int ret;
> >
> > +	sysctl_proactive_compaction_order = COMPACTION_HPAGE_ORDER;
> > +
> >  	ret = cpuhp_setup_state_nocalls(CPUHP_AP_ONLINE_DYN,
> >  					"mm/compaction:online",
> >  					kcompactd_cpu_online, NULL);
> >



* Re: [PATCH v4] mm/compaction: let proactive compaction order configurable
  2021-04-28  2:28 [PATCH v4] mm/compaction: let proactive compaction order configurable chukaiping
  2021-05-10  0:17 ` Andrew Morton
  2021-05-28 17:42 ` Vlastimil Babka
@ 2021-06-09 10:44 ` David Hildenbrand
  2021-06-15  1:11   ` Re: " Chu,Kaiping
  2 siblings, 1 reply; 11+ messages in thread
From: David Hildenbrand @ 2021-06-09 10:44 UTC (permalink / raw)
  To: chukaiping, mcgrof, keescook, yzaikin, akpm, vbabka, nigupta,
	bhe, khalid.aziz, iamjoonsoo.kim, mateusznosek0, sh_def
  Cc: linux-kernel, linux-fsdevel, linux-mm

On 28.04.21 04:28, chukaiping wrote:
> Currently the proactive compaction order is fixed to
> COMPACTION_HPAGE_ORDER (9). This is fine on most machines with plenty
> of normal 4KB memory, but it is too high for machines with little
> normal memory, for example machines where most memory is configured
> as 1GB hugetlbfs huge pages. On such machines the maximum order of
> free pages is often below 9, and stays below 9 even after forced
> compaction, so proactive compaction is triggered very frequently.
> On these machines we only care about orders of 3 or 4. This patch
> exports the order to proc and makes it configurable by the user;
> the default value is still COMPACTION_HPAGE_ORDER.
> 
> Signed-off-by: chukaiping <chukaiping@baidu.com>
> Reported-by: kernel test robot <lkp@intel.com>
> ---
> 
> Changes in v4:
>      - change the sysctl file name to proactive_compaction_order
> 
> Changes in v3:
>      - change the min value of compaction_order to 1 because the fragmentation
>        index of order 0 is always 0
>      - move the definition of max_buddy_zone into #ifdef CONFIG_COMPACTION
> 
> Changes in v2:
>      - fix the compile error in ia64 and powerpc, move the initialization
>        of sysctl_compaction_order to kcompactd_init because
>        COMPACTION_HPAGE_ORDER is a variable in these architectures
>      - change the hard coded max order number from 10 to MAX_ORDER - 1
> 
>   include/linux/compaction.h |    1 +
>   kernel/sysctl.c            |   10 ++++++++++
>   mm/compaction.c            |   12 ++++++++----
>   3 files changed, 19 insertions(+), 4 deletions(-)
> 
> diff --git a/include/linux/compaction.h b/include/linux/compaction.h
> index ed4070e..a0226b1 100644
> --- a/include/linux/compaction.h
> +++ b/include/linux/compaction.h
> @@ -83,6 +83,7 @@ static inline unsigned long compact_gap(unsigned int order)
>   #ifdef CONFIG_COMPACTION
>   extern int sysctl_compact_memory;
>   extern unsigned int sysctl_compaction_proactiveness;
> +extern unsigned int sysctl_proactive_compaction_order;
>   extern int sysctl_compaction_handler(struct ctl_table *table, int write,
>   			void *buffer, size_t *length, loff_t *ppos);
>   extern int sysctl_extfrag_threshold;
> diff --git a/kernel/sysctl.c b/kernel/sysctl.c
> index 62fbd09..ed9012e 100644
> --- a/kernel/sysctl.c
> +++ b/kernel/sysctl.c
> @@ -196,6 +196,7 @@ enum sysctl_writes_mode {
>   #endif /* CONFIG_SCHED_DEBUG */
>   
>   #ifdef CONFIG_COMPACTION
> +static int max_buddy_zone = MAX_ORDER - 1;
>   static int min_extfrag_threshold;
>   static int max_extfrag_threshold = 1000;
>   #endif
> @@ -2871,6 +2872,15 @@ int proc_do_static_key(struct ctl_table *table, int write,
>   		.extra2		= &one_hundred,
>   	},
>   	{
> +		.procname       = "proactive_compation_order",
> +		.data           = &sysctl_proactive_compaction_order,
> +		.maxlen         = sizeof(sysctl_proactive_compaction_order),
> +		.mode           = 0644,
> +		.proc_handler   = proc_dointvec_minmax,
> +		.extra1         = SYSCTL_ONE,
> +		.extra2         = &max_buddy_zone,
> +	},
> +	{
>   		.procname	= "extfrag_threshold",
>   		.data		= &sysctl_extfrag_threshold,
>   		.maxlen		= sizeof(int),
> diff --git a/mm/compaction.c b/mm/compaction.c
> index e04f447..171436e 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -1925,17 +1925,18 @@ static bool kswapd_is_running(pg_data_t *pgdat)
>   
>   /*
>    * A zone's fragmentation score is the external fragmentation wrt to the
> - * COMPACTION_HPAGE_ORDER. It returns a value in the range [0, 100].
> + * sysctl_proactive_compaction_order. It returns a value in the range
> + * [0, 100].
>    */
>   static unsigned int fragmentation_score_zone(struct zone *zone)
>   {
> -	return extfrag_for_order(zone, COMPACTION_HPAGE_ORDER);
> +	return extfrag_for_order(zone, sysctl_proactive_compaction_order);
>   }
>   
>   /*
>    * A weighted zone's fragmentation score is the external fragmentation
> - * wrt to the COMPACTION_HPAGE_ORDER scaled by the zone's size. It
> - * returns a value in the range [0, 100].
> + * wrt to the sysctl_proactive_compaction_order scaled by the zone's size.
> + * It returns a value in the range [0, 100].
>    *
>    * The scaling factor ensures that proactive compaction focuses on larger
>    * zones like ZONE_NORMAL, rather than smaller, specialized zones like
> @@ -2666,6 +2667,7 @@ static void compact_nodes(void)
>    * background. It takes values in the range [0, 100].
>    */
>   unsigned int __read_mostly sysctl_compaction_proactiveness = 20;
> +unsigned int __read_mostly sysctl_proactive_compaction_order;
>   
>   /*
>    * This is the entry point for compacting all nodes via
> @@ -2958,6 +2960,8 @@ static int __init kcompactd_init(void)
>   	int nid;
>   	int ret;
>   
> +	sysctl_proactive_compaction_order = COMPACTION_HPAGE_ORDER;
> +
>   	ret = cpuhp_setup_state_nocalls(CPUHP_AP_ONLINE_DYN,
>   					"mm/compaction:online",
>   					kcompactd_cpu_online, NULL);
> 

Hm, do we actually want to put an upper limit to the order a user can 
supply?

-- 
Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 11+ messages in thread
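As context for the score being parameterized above, extfrag_for_order() — the function that both COMPACTION_HPAGE_ORDER and the new sysctl value feed into — can be sketched in miniature. This is a simplified model, not the kernel implementation (the kernel condenses the same accounting into fill_contig_page_info()); it also shows why the v3 changelog notes that the fragmentation index of order 0 is always 0:

```python
def extfrag_for_order(free_counts, order):
    """Percentage of free memory unusable for an allocation of 2**order
    pages. free_counts[o] = number of free buddy blocks of order o."""
    # Total free pages across all buddy orders.
    free_pages = sum(n << o for o, n in enumerate(free_counts))
    # Free pages sitting in blocks large enough to satisfy the request.
    suitable = sum(n << o for o, n in enumerate(free_counts) if o >= order)
    if free_pages == 0:
        return 0
    return (free_pages - suitable) * 100 // free_pages

# Every free block satisfies an order-0 request, so the score is 0.
print(extfrag_for_order([16, 8, 4, 2], 0))  # 0
# For order 3, only the two order-3 blocks (16 of 64 free pages) qualify.
print(extfrag_for_order([16, 8, 4, 2], 3))  # 75
```

A machine whose free pages never aggregate beyond order 3 or 4 will score high against order 9 indefinitely, which matches the commit message's motivation for making the order configurable.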

* Re: [PATCH v4] mm/compaction: let proactive compaction order configurable
  2021-06-09 10:44 ` David Hildenbrand
@ 2021-06-15  1:11   ` Chu,Kaiping
  2021-06-15  8:04     ` David Hildenbrand
  0 siblings, 1 reply; 11+ messages in thread
From: Chu,Kaiping @ 2021-06-15  1:11 UTC (permalink / raw)
  To: David Hildenbrand, mcgrof, keescook, yzaikin, akpm, vbabka,
	nigupta, bhe, khalid.aziz, iamjoonsoo.kim, mateusznosek0, sh_def
  Cc: linux-kernel, linux-fsdevel, linux-mm



> -----Original Message-----
> From: David Hildenbrand <david@redhat.com>
> Sent: June 9, 2021 18:45
> To: Chu,Kaiping <chukaiping@baidu.com>; mcgrof@kernel.org;
> keescook@chromium.org; yzaikin@google.com; akpm@linux-foundation.org;
> vbabka@suse.cz; nigupta@nvidia.com; bhe@redhat.com;
> khalid.aziz@oracle.com; iamjoonsoo.kim@lge.com;
> mateusznosek0@gmail.com; sh_def@163.com
> Cc: linux-kernel@vger.kernel.org; linux-fsdevel@vger.kernel.org;
> linux-mm@kvack.org
> Subject: Re: [PATCH v4] mm/compaction: let proactive compaction order
> configurable
> 
> On 28.04.21 04:28, chukaiping wrote:
> > [...]
> 
> Hm, do we actually want to put an upper limit to the order a user can supply?
No, we should allow the user to configure the order from 1 to MAX_ORDER - 1.
> 
> --
> Thanks,
> 
> David / dhildenb



* Re: Re: [PATCH v4] mm/compaction: let proactive compaction order configurable
  2021-06-15  1:11   ` Re: " Chu,Kaiping
@ 2021-06-15  8:04     ` David Hildenbrand
  0 siblings, 0 replies; 11+ messages in thread
From: David Hildenbrand @ 2021-06-15  8:04 UTC (permalink / raw)
  To: Chu,Kaiping, mcgrof, keescook, yzaikin, akpm, vbabka, nigupta,
	bhe, khalid.aziz, iamjoonsoo.kim, mateusznosek0, sh_def
  Cc: linux-kernel, linux-fsdevel, linux-mm

On 15.06.21 03:11, Chu,Kaiping wrote:
> 
> 
>> -----Original Message-----
>> From: David Hildenbrand <david@redhat.com>
>> Sent: June 9, 2021 18:45
>> To: Chu,Kaiping <chukaiping@baidu.com>; mcgrof@kernel.org;
>> keescook@chromium.org; yzaikin@google.com; akpm@linux-foundation.org;
>> vbabka@suse.cz; nigupta@nvidia.com; bhe@redhat.com;
>> khalid.aziz@oracle.com; iamjoonsoo.kim@lge.com;
>> mateusznosek0@gmail.com; sh_def@163.com
>> Cc: linux-kernel@vger.kernel.org; linux-fsdevel@vger.kernel.org;
>> linux-mm@kvack.org
>> Subject: Re: [PATCH v4] mm/compaction: let proactive compaction order
>> configurable
>>
>> On 28.04.21 04:28, chukaiping wrote:
>>> [...]
>>
>> Hm, do we actually want to put an upper limit to the order a user can supply?
> No, we should allow the user to configure the order from 1 to MAX_ORDER - 1.

Ah, I missed that we enforce an upper limit of "MAX_ORDER - 1" -- thanks.


-- 
Thanks,

David / dhildenb
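The range semantics just confirmed can be sketched as a tiny model. This is an assumption-laden sketch of proc_dointvec_minmax behavior with extra1 = 1 and extra2 = MAX_ORDER - 1; MAX_ORDER = 11 here is the common x86_64 value and is architecture-dependent:

```python
MAX_ORDER = 11  # common x86_64 value; architecture-dependent

class ProactiveCompactionOrder:
    """Model of the sysctl entry: writes outside [1, MAX_ORDER - 1]
    fail with EINVAL and leave the stored value unchanged."""

    def __init__(self, default=9):  # COMPACTION_HPAGE_ORDER on x86_64
        self.value = default

    def write(self, new):
        if not 1 <= new <= MAX_ORDER - 1:
            raise ValueError("EINVAL")  # proc_dointvec_minmax rejects it
        self.value = new
```

In proc-file terms: writing 0 or 11 to the file would fail with EINVAL, while writing 4 would stick.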



* Re: Re: [PATCH v4] mm/compaction: let proactive compaction order configurable
  2021-06-01  1:15   ` Re: " Chu,Kaiping
@ 2021-06-16 13:49     ` Vlastimil Babka
  0 siblings, 0 replies; 11+ messages in thread
From: Vlastimil Babka @ 2021-06-16 13:49 UTC (permalink / raw)
  To: Chu,Kaiping, mcgrof, keescook, yzaikin, akpm, nigupta, bhe,
	khalid.aziz, iamjoonsoo.kim, mateusznosek0, sh_def,
	Charan Teja Kalla, David Rientjes
  Cc: linux-kernel, linux-fsdevel, linux-mm

On 6/1/21 3:15 AM, Chu,Kaiping wrote:
> 
> 
>> -----Original Message-----
>> From: Vlastimil Babka <vbabka@suse.cz>
>> Sent: May 29, 2021 1:42
>> To: Chu,Kaiping <chukaiping@baidu.com>; mcgrof@kernel.org;
>> keescook@chromium.org; yzaikin@google.com; akpm@linux-foundation.org;
>> nigupta@nvidia.com; bhe@redhat.com; khalid.aziz@oracle.com;
>> iamjoonsoo.kim@lge.com; mateusznosek0@gmail.com; sh_def@163.com
>> Cc: linux-kernel@vger.kernel.org; linux-fsdevel@vger.kernel.org;
>> linux-mm@kvack.org
>> Subject: Re: [PATCH v4] mm/compaction: let proactive compaction order
>> configurable
>> 
>> On 4/28/21 4:28 AM, chukaiping wrote:
>> > Currently the proactive compaction order is fixed to
>> > COMPACTION_HPAGE_ORDER(9), it's OK in most machines with lots of
>> > normal 4KB memory, but it's too high for the machines with small
>> > normal memory, for example the machines with most memory configured as
>> > 1GB hugetlbfs huge pages. In these machines the max order of free
>> > pages is often below 9, and it's always below 9 even with hard
>> > compaction. This will lead to proactive compaction being triggered
>> > very frequently.
>> 
>> Could you be more concrete about "very frequently"? There's a
>> proactive_defer mechanism that should help here. Normally the proactive
>> compaction attempt happens every 500ms, but if it fails to improve the
>> fragmentation score, it defers for 32 seconds. So is 32 seconds still
>> too frequent? Or does the score improve, so the defer doesn't happen,
>> but the cost of that improvement is too high compared to the amount of
>> the improvement?
> I didn't measure the frequency accurately; I only judged it from the code. A 32-second defer is still far too short for us; we want the proactive compaction period to be hours.

Hours sounds like a lot, and maybe something that would indeed be easier to
accomplish with userspace proactive compaction triggering [1] than with any
carefully tuned thresholds.

But with that low frequency, doesn't the kswapd+kcompactd non-proactive
compaction actually happen more frequently? That one should react to the order
that the allocation waking up kswapd requested, AFAIK.

[1] https://lore.kernel.org/linux-doc/cover.1622454385.git.charante@codeaurora.org/
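To put numbers on the 500ms cadence and 32-second defer discussed above, here is a small model of the attempt rate. It assumes a 32s defer corresponds to skipping 64 of the 500ms wakeups (simple arithmetic from the figures quoted); the real kcompactd logic differs in detail:

```python
TICK_MS = 500      # proactive compaction attempt cadence (from the thread)
DEFER_TICKS = 64   # 64 * 500ms = the 32-second defer quoted above

def attempts_per(seconds, attempt_succeeds):
    """Count proactive compaction attempts over `seconds`, assuming a
    failed attempt defers the next one by DEFER_TICKS wakeups."""
    defer = 0
    attempts = 0
    for _ in range(int(seconds * 1000 / TICK_MS)):
        if defer:
            defer -= 1
            continue
        attempts += 1
        if not attempt_succeeds():
            defer = DEFER_TICKS
    return attempts
```

Even when every attempt fails (the hugetlbfs scenario from the commit message), this model still makes on the order of a hundred attempts per hour, which is consistent with wanting a period of hours rather than seconds.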

> 
>> 
>> > [...]
> 



end of thread, other threads:[~2021-06-16 13:50 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-04-28  2:28 [PATCH v4] mm/compaction: let proactive compaction order configurable chukaiping
2021-05-10  0:17 ` Andrew Morton
2021-05-10  2:10   ` Re: " Chu,Kaiping
2021-05-11  4:20   ` David Rientjes
2021-05-11  4:20     ` David Rientjes
2021-05-28 17:42 ` Vlastimil Babka
2021-06-01  1:15   ` Re: " Chu,Kaiping
2021-06-16 13:49     ` Vlastimil Babka
2021-06-09 10:44 ` David Hildenbrand
2021-06-15  1:11   ` Re: " Chu,Kaiping
2021-06-15  8:04     ` David Hildenbrand
