[PATCH 0/1] mm, compaction: correct the bounds of __fragmentation

* [PATCH 0/1] mm, compaction: correct the bounds of __fragmentation_index()
@ 2018-02-18 16:47 ` robert.m.harris
  0 siblings, 0 replies; 29+ messages in thread
From: robert.m.harris @ 2018-02-18 16:47 UTC (permalink / raw)
  To: linux-mm, linux-kernel, linux-doc
  Cc: Jonathan Corbet, Andrew Morton, Michal Hocko, Vlastimil Babka,
	Kirill A. Shutemov, Johannes Weiner, Kemi Wang, David Rientjes,
	Yafang Shao, Kangmin Park, Mel Gorman, Yisheng Xie,
	Davidlohr Bueso, Greg Kroah-Hartman, Huang Ying, Vinayak Menon,
	Robert M. Harris

From: "Robert M. Harris" <robert.m.harris@oracle.com>

__fragmentation_index() calculates a value used to determine whether
compaction should be favoured over page reclaim in the event of
allocation failure.  The function purports to return a value between 0
and 1000, representing units of 1/1000.  Barring the case of a
pathological shortfall of memory, the lower bound is instead 500.  This
is significant because it is the default value of
sysctl_extfrag_threshold, i.e. the value below which compaction should
be avoided in favour of page reclaim for costly pages.

Here's an illustration using a zone that I fragmented with selective
calls to __alloc_pages() and __free_pages().  The fragmentation for
order-1 could not be minimised further, yet it is reported as 0.5:

# head -1 /proc/buddyinfo
Node 0, zone      DMA   1983      0      0      0      0      0      0      0      0      0      0 
# head -1 /sys/kernel/debug/extfrag/extfrag_index
Node 0, zone      DMA -1.000 0.500 0.750 0.875 0.937 0.969 0.984 0.992 0.996 0.998 0.999 
# 
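
The figures above can be replayed in userspace.  What follows is a rough
sketch, not kernel code: it assumes a row of 11 orders as in the
buddyinfo line, uses a stand-in for struct contig_page_info, and replaces
div_u64() with ordinary integer division.  The names NR_ORDERS,
fill_info() and old_index_for() are invented for the illustration.

```c
#include <assert.h>
#include <stdint.h>

#define NR_ORDERS 11	/* assumed: orders 0..10, as in the buddyinfo line */

/* Userspace stand-in for the kernel's struct contig_page_info. */
struct contig_page_info {
	unsigned long free_pages;
	unsigned long free_blocks_total;
	unsigned long free_blocks_suitable;
};

/* Summarise one buddyinfo row for a request of the given order. */
static void fill_info(const unsigned long nr_free[NR_ORDERS],
		      unsigned int order, struct contig_page_info *info)
{
	unsigned int o;

	assert(order < NR_ORDERS);
	info->free_pages = 0;
	info->free_blocks_total = 0;
	info->free_blocks_suitable = 0;

	for (o = 0; o < NR_ORDERS; o++) {
		info->free_blocks_total += nr_free[o];
		info->free_pages += nr_free[o] << o;
		if (o >= order)
			info->free_blocks_suitable += nr_free[o] << (o - order);
	}
}

/* The pre-patch expression, with div_u64() replaced by plain division. */
static int old_fragmentation_index(unsigned int order,
				   const struct contig_page_info *info)
{
	uint64_t requested = 1ULL << order;

	if (!info->free_blocks_total)
		return 0;
	if (info->free_blocks_suitable)
		return -1000;

	return 1000 - (int)((1000 + info->free_pages * 1000ULL / requested)
			    / info->free_blocks_total);
}

/* Convenience wrapper: index for one order of a buddyinfo row. */
static int old_index_for(const unsigned long nr_free[NR_ORDERS],
			 unsigned int order)
{
	struct contig_page_info info;

	fill_info(nr_free, order, &info);
	return old_fragmentation_index(order, &info);
}
```

Calling old_index_for() on a row holding 1983 order-0 blocks and nothing
else gives 500 for order 1, which is the 0.500 shown in the output.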

With extreme memory shortage the reported fragmentation index does go
lower.  In fact, it can go below zero:

# head -1 /proc/buddyinfo
Node 0, zone      DMA      1      0      0      0      0      0      0      0      0      0      0 
# head -1 /sys/kernel/debug/extfrag/extfrag_index
Node 0, zone      DMA -1.000 0.-500 0.-250 0.-125 0.-62 0.-31 0.-15 0.-07 0.-03 0.-01 0.000 
# 
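
The odd-looking "0.-500" entries are consistent with a negative index
being printed as a quotient and remainder of a division by 1000.  The
sketch below assumes that format ("%d.%03d" applied to index / 1000 and
index % 1000) and a zone whose only free block is a single base page;
both helper names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Format an index the way the output above suggests it is printed:
 * quotient and remainder of a division by 1000 (assumed format string).
 */
static void format_index(int index, char *buf, size_t len)
{
	assert(buf && len);
	snprintf(buf, len, "%d.%03d", index / 1000, index % 1000);
}

/* Pre-patch expression for a zone whose only free block is one base page. */
static int old_index_single_page(unsigned int order)
{
	unsigned long long requested = 1ULL << order;

	if (order == 0)
		return -1000;	/* a suitable block exists */

	/* free_pages == free_blocks_total == 1 */
	return 1000 - (int)((1000 + 1000ULL / requested) / 1);
}
```

In C, integer division truncates towards zero and the remainder takes the
sign of the dividend, so an index of -500 is split into 0 and -500.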

This patch implements and documents a modified version of the original
expression that returns a value in the range 0 <= index < 1000.  It
amends the default value of sysctl_extfrag_threshold to preserve the
existing behaviour.  With this patch in place, the same two tests yield

# head -1 /proc/buddyinfo
Node 0, zone      DMA   1983      0      0      0      0      0      0      0      0      0      0 
# head -1 /sys/kernel/debug/extfrag/extfrag_index
Node 0, zone      DMA -1.000 0.000 0.500 0.750 0.875 0.937 0.969 0.984 0.992 0.996 0.998 
# 

and

# head -1 /proc/buddyinfo
Node 0, zone      DMA      1      0      0      0      0      0      0      0      0      0      0 
# head -1 /sys/kernel/debug/extfrag/extfrag_index
Node 0, zone      DMA -1.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 
# 
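
For comparison, here is a minimal userspace sketch of the post-patch
expression, specialised to the two test zones above in which every free
block is a base page.  The function name and the order limit of 10 are
assumptions made for the illustration; div_u64() is again replaced by
plain division.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Post-patch expression for a zone in which every free block is a base
 * page, so free_pages == free_blocks_total == nr_base_pages.
 */
static int new_index_base_pages(unsigned int order, uint64_t nr_base_pages)
{
	uint64_t requested = 1ULL << order;
	int result;

	assert(order <= 10);	/* assumed MAX_ORDER - 1 */
	if (!nr_base_pages)
		return 0;
	if (order == 0)
		return -1000;	/* order-0 blocks are already suitable */

	result = 1000 - (int)((1000 + nr_base_pages * 2000ULL / requested)
			      / nr_base_pages);

	/* Clamp the pathological low-memory case to zero. */
	return result < 0 ? 0 : result;
}
```

With 1983 base pages this gives 0 for order 1 and 500 for order 2; with a
single base page every order above 0 is clamped to 0.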

Robert M. Harris (1):
  mm, compaction: correct the bounds of __fragmentation_index()

 Documentation/sysctl/vm.txt |  2 +-
 mm/compaction.c             |  2 +-
 mm/vmstat.c                 | 47 +++++++++++++++++++++++++++++++++++----------
 3 files changed, 39 insertions(+), 12 deletions(-)

-- 
1.8.3.1

* [PATCH 1/1] mm, compaction: correct the bounds of __fragmentation_index()
  2018-02-18 16:47 ` robert.m.harris
@ 2018-02-18 16:47   ` robert.m.harris
  -1 siblings, 0 replies; 29+ messages in thread
From: robert.m.harris @ 2018-02-18 16:47 UTC (permalink / raw)
  To: linux-mm, linux-kernel, linux-doc
  Cc: Jonathan Corbet, Andrew Morton, Michal Hocko, Vlastimil Babka,
	Kirill A. Shutemov, Johannes Weiner, Kemi Wang, David Rientjes,
	Yafang Shao, Kangmin Park, Mel Gorman, Yisheng Xie,
	Davidlohr Bueso, Greg Kroah-Hartman, Huang Ying, Vinayak Menon,
	Robert M. Harris

From: "Robert M. Harris" <robert.m.harris@oracle.com>

__fragmentation_index() calculates a value used to determine whether
compaction should be favoured over page reclaim in the event of allocation
failure.  The calculation itself is opaque and, on inspection, does not
match its existing description.  The function purports to return a value
between 0 and 1000, representing units of 1/1000.  Barring the case of a
pathological shortfall of memory, the lower bound is instead 500.  This is
significant because it is the default value of sysctl_extfrag_threshold,
i.e. the value below which compaction should be avoided in favour of page
reclaim for costly pages.

This patch implements and documents a modified version of the original
expression that returns a value in the range 0 <= index < 1000.  It amends
the default value of sysctl_extfrag_threshold to preserve the existing
behaviour.

Signed-off-by: Robert M. Harris <robert.m.harris@oracle.com>
---
 Documentation/sysctl/vm.txt |  2 +-
 mm/compaction.c             |  2 +-
 mm/vmstat.c                 | 47 +++++++++++++++++++++++++++++++++++----------
 3 files changed, 39 insertions(+), 12 deletions(-)

diff --git a/Documentation/sysctl/vm.txt b/Documentation/sysctl/vm.txt
index 5025ff9..384a78b 100644
--- a/Documentation/sysctl/vm.txt
+++ b/Documentation/sysctl/vm.txt
@@ -237,7 +237,7 @@ of memory, values towards 1000 imply failures are due to fragmentation and -1
 implies that the allocation will succeed as long as watermarks are met.
 
 The kernel will not compact memory in a zone if the
-fragmentation index is <= extfrag_threshold. The default value is 500.
+fragmentation index is <= extfrag_threshold. The default value is 0.
 
 ==============================================================
 
diff --git a/mm/compaction.c b/mm/compaction.c
index 10cd757..9db6ef4 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -1730,7 +1730,7 @@ static enum compact_result compact_zone_order(struct zone *zone, int order,
 	return ret;
 }
 
-int sysctl_extfrag_threshold = 500;
+int sysctl_extfrag_threshold;
 
 /**
  * try_to_compact_pages - Direct compact to satisfy a high-order allocation
diff --git a/mm/vmstat.c b/mm/vmstat.c
index 40b2db6..013f1af 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -1044,15 +1044,22 @@ static void fill_contig_page_info(struct zone *zone,
 }
 
 /*
- * A fragmentation index only makes sense if an allocation of a requested
- * size would fail. If that is true, the fragmentation index indicates
- * whether external fragmentation or a lack of memory was the problem.
- * The value can be used to determine if page reclaim or compaction
- * should be used
+ * If there is no block of at least the requested size, implying that an
+ * allocation would fail, then it might be possible to conjure one by
+ * compaction.  As this is expensive it is reserved for those cases in which
+ * there is a relatively high degree of fragmentation.  For low degrees, page
+ * reclaim is more appropriate since an allocation failure is more likely to be
+ * caused by a lack of memory.
+ *
+ * This function calculates an index in the range 0 to 1, expressed in units of
+ * 1/1000, indicating low and high fragmentation respectively.  The special
+ * value of -1 indicates that free blocks of sufficient size are available and
+ * that an allocation should therefore succeed.
  */
 static int __fragmentation_index(unsigned int order, struct contig_page_info *info)
 {
 	unsigned long requested = 1UL << order;
+	int result;
 
 	if (WARN_ON_ONCE(order >= MAX_ORDER))
 		return 0;
@@ -1060,17 +1067,37 @@ static int __fragmentation_index(unsigned int order, struct contig_page_info *in
 	if (!info->free_blocks_total)
 		return 0;
 
-	/* Fragmentation index only makes sense when a request would fail */
 	if (info->free_blocks_suitable)
 		return -1000;
 
 	/*
-	 * Index is between 0 and 1 so return within 3 decimal places
+	 * If the number of requested-size blocks that could be constructed if
+	 * all free blocks were compacted is
+	 *
+	 *	B = info->free_pages/requested
+	 *
+	 * then, conceptually, the number of fragments into which each
+	 * requested-size block has been split is
+	 *
+	 *	N = info->free_blocks_total/B
 	 *
-	 * 0 => allocation would fail due to lack of memory
-	 * 1 => allocation would fail due to fragmentation
+	 * In the least and most fragmented cases all free memory resides on
+	 * either the order - 1 free list or the base page free list
+	 * respectively, thus the range of this function is given by
+	 * 2 <= N <= requested.  The fragmentation index,
+	 *
+	 *	F = 1 - 2/N,
+	 *
+	 * has the more useful range of 0 < F <= 1.  In order to inhibit
+	 * compaction in the event of a pathological shortfall of memory this
+	 * function truncates and returns
+	 *
+	 *	F - 1/info->free_blocks_total
 	 */
-	return 1000 - div_u64( (1000+(div_u64(info->free_pages * 1000ULL, requested))), info->free_blocks_total);
+	result = 1000 - div_u64((1000 + (div_u64(info->free_pages * 2000ULL,
+			requested))), info->free_blocks_total);
+
+	return (result < 0) ? 0 : result;
 }
 
 /* Same as __fragmentation index but allocs contig_page_info on stack */
-- 
1.8.3.1
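
The link between the integer expression in the patch and the quantities
named in its comment can be worked through as follows.  This is a
reader's derivation, not part of the patch; P, T and r are shorthand
introduced here for info->free_pages, info->free_blocks_total and
requested, and integer truncation is ignored.

```latex
\begin{align*}
B &= \frac{P}{r}, \qquad
N = \frac{T}{B} = \frac{T\,r}{P}, \qquad
F = 1 - \frac{2}{N} = 1 - \frac{2P}{r\,T} \\[4pt]
\frac{\mathit{result}}{1000}
  &= 1 - \frac{1 + 2P/r}{T}
   = 1 - \frac{2P}{r\,T} - \frac{1}{T}
   = F - \frac{1}{T} \\[4pt]
2 \le N \le r
  &\;\Longrightarrow\;
  0 \le F \le 1 - \frac{2}{r} < 1
\end{align*}
```

For the first test zone (P = T = 1983, r = 2) this gives N = 2 and F = 0,
so F - 1/T is slightly negative and the returned value is clamped to 0,
in line with the 0 <= index < 1000 range given in the description.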

* Re: [PATCH 0/1] mm, compaction: correct the bounds of __fragmentation_index()
  2018-02-18 16:47 ` robert.m.harris
@ 2018-02-19  8:24   ` Michal Hocko
  -1 siblings, 0 replies; 29+ messages in thread
From: Michal Hocko @ 2018-02-19  8:24 UTC (permalink / raw)
  To: robert.m.harris
  Cc: linux-mm, linux-kernel, linux-doc, Jonathan Corbet,
	Andrew Morton, Vlastimil Babka, Kirill A. Shutemov,
	Johannes Weiner, Kemi Wang, David Rientjes, Yafang Shao,
	Kangmin Park, Mel Gorman, Yisheng Xie, Davidlohr Bueso,
	Greg Kroah-Hartman, Huang Ying, Vinayak Menon

On Sun 18-02-18 16:47:54, robert.m.harris@oracle.com wrote:
> From: "Robert M. Harris" <robert.m.harris@oracle.com>
> 
> __fragmentation_index() calculates a value used to determine whether
> compaction should be favoured over page reclaim in the event of
> allocation failure.  The function purports to return a value between 0
> and 1000, representing units of 1/1000.  Barring the case of a
> pathological shortfall of memory, the lower bound is instead 500.  This
> is significant because it is the default value of
> sysctl_extfrag_threshold, i.e. the value below which compaction should
> be avoided in favour of page reclaim for costly pages.
> 
> Here's an illustration using a zone that I fragmented with selective
> calls to __alloc_pages() and __free_pages --- the fragmentation for
> order-1 could not be minimised further yet is reported as 0.5:

A cover letter for a single patch is usually overkill. Why not put this
information directly in the patch description?

> # head -1 /proc/buddyinfo
> Node 0, zone      DMA   1983      0      0      0      0      0      0      0      0      0      0 
> # head -1 /sys/kernel/debug/extfrag/extfrag_index
> Node 0, zone      DMA -1.000 0.500 0.750 0.875 0.937 0.969 0.984 0.992 0.996 0.998 0.999 
> # 
> 
> With extreme memory shortage the reported fragmentation index does go
> lower.  In fact, it can go below zero:
> 
> # head -1 /proc/buddyinfo
> Node 0, zone      DMA      1      0      0      0      0      0      0      0      0      0      0 
> # head -1 /sys/kernel/debug/extfrag/extfrag_index
> Node 0, zone      DMA -1.000 0.-500 0.-250 0.-125 0.-62 0.-31 0.-15 0.-07 0.-03 0.-01 0.000 
> # 
> 
> This patch implements and documents a modified version of the original
> expression that returns a value in the range 0 <= index < 1000.  It
> amends the default value of sysctl_extfrag_threshold to preserve the
> existing behaviour.  With this patch in place, the same two tests yield
> 
> # head -1 /proc/buddyinfo
> Node 0, zone      DMA   1983      0      0      0      0      0      0      0      0      0      0 
> # head -1 /sys/kernel/debug/extfrag/extfrag_index
> Node 0, zone      DMA -1.000 0.000 0.500 0.750 0.875 0.937 0.969 0.984 0.992 0.996 0.998 
> # 
> 
> and
> 
> # head -1 /proc/buddyinfo
> Node 0, zone      DMA      1      0      0      0      0      0      0      0      0      0      0 
> # head -1 /sys/kernel/debug/extfrag/extfrag_index
> Node 0, zone      DMA -1.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 
> # 
> 
> Robert M. Harris (1):
>   mm, compaction: correct the bounds of __fragmentation_index()
> 
>  Documentation/sysctl/vm.txt |  2 +-
>  mm/compaction.c             |  2 +-
>  mm/vmstat.c                 | 47 +++++++++++++++++++++++++++++++++++----------
>  3 files changed, 39 insertions(+), 12 deletions(-)
> 
> -- 
> 1.8.3.1
> 

-- 
Michal Hocko
SUSE Labs

* Re: [PATCH 1/1] mm, compaction: correct the bounds of __fragmentation_index()
  2018-02-18 16:47   ` robert.m.harris
@ 2018-02-19  8:26     ` Michal Hocko
  -1 siblings, 0 replies; 29+ messages in thread
From: Michal Hocko @ 2018-02-19  8:26 UTC (permalink / raw)
  To: robert.m.harris
  Cc: linux-mm, linux-kernel, linux-doc, Jonathan Corbet,
	Andrew Morton, Vlastimil Babka, Kirill A. Shutemov,
	Johannes Weiner, Kemi Wang, David Rientjes, Yafang Shao,
	Kangmin Park, Mel Gorman, Yisheng Xie, Davidlohr Bueso,
	Greg Kroah-Hartman, Huang Ying, Vinayak Menon

On Sun 18-02-18 16:47:55, robert.m.harris@oracle.com wrote:
> From: "Robert M. Harris" <robert.m.harris@oracle.com>
> 
> __fragmentation_index() calculates a value used to determine whether
> compaction should be favoured over page reclaim in the event of allocation
> failure.  The calculation itself is opaque and, on inspection, does not
> match its existing description.  The function purports to return a value
> between 0 and 1000, representing units of 1/1000.  Barring the case of a
> pathological shortfall of memory, the lower bound is instead 500.  This is
> significant because it is the default value of sysctl_extfrag_threshold,
> i.e. the value below which compaction should be avoided in favour of page
> reclaim for costly pages.
> 
> This patch implements and documents a modified version of the original
> expression that returns a value in the range 0 <= index < 1000.  It amends
> the default value of sysctl_extfrag_threshold to preserve the existing
> behaviour.

It is not really clear to me what actual problem you are trying to
solve with this patch. Is there a bug, or are you just trying to make
the current implementation more effective?

> Signed-off-by: Robert M. Harris <robert.m.harris@oracle.com>
> ---
>  Documentation/sysctl/vm.txt |  2 +-
>  mm/compaction.c             |  2 +-
>  mm/vmstat.c                 | 47 +++++++++++++++++++++++++++++++++++----------
>  3 files changed, 39 insertions(+), 12 deletions(-)
> 
> diff --git a/Documentation/sysctl/vm.txt b/Documentation/sysctl/vm.txt
> index 5025ff9..384a78b 100644
> --- a/Documentation/sysctl/vm.txt
> +++ b/Documentation/sysctl/vm.txt
> @@ -237,7 +237,7 @@ of memory, values towards 1000 imply failures are due to fragmentation and -1
>  implies that the allocation will succeed as long as watermarks are met.
>  
>  The kernel will not compact memory in a zone if the
> -fragmentation index is <= extfrag_threshold. The default value is 500.
> +fragmentation index is <= extfrag_threshold. The default value is 0.
>  
>  ==============================================================
>  
> diff --git a/mm/compaction.c b/mm/compaction.c
> index 10cd757..9db6ef4 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -1730,7 +1730,7 @@ static enum compact_result compact_zone_order(struct zone *zone, int order,
>  	return ret;
>  }
>  
> -int sysctl_extfrag_threshold = 500;
> +int sysctl_extfrag_threshold;
>  
>  /**
>   * try_to_compact_pages - Direct compact to satisfy a high-order allocation
> diff --git a/mm/vmstat.c b/mm/vmstat.c
> index 40b2db6..013f1af 100644
> --- a/mm/vmstat.c
> +++ b/mm/vmstat.c
> @@ -1044,15 +1044,22 @@ static void fill_contig_page_info(struct zone *zone,
>  }
>  
>  /*
> - * A fragmentation index only makes sense if an allocation of a requested
> - * size would fail. If that is true, the fragmentation index indicates
> - * whether external fragmentation or a lack of memory was the problem.
> - * The value can be used to determine if page reclaim or compaction
> - * should be used
> + * If there is no block of at least the requested size, implying that an
> + * allocation would fail, then it might be possible to conjure one by
> + * compaction.  As this is expensive it is reserved for those cases in which
> + * there is a relatively high degree of fragmentation.  For low degrees, page
> + * reclaim is more appropriate since an allocation failure is more likely to be
> + * caused by a lack of memory.
> + *
> + * This function calculates an index in the range 0 to 1, expressed in units of
> + * 1/1000, indicating low and high fragmentation respectively.  The special
> + * value of -1 indicates that free blocks of sufficient size are available and
> + * that an allocation should therefore succeed.
>   */
>  static int __fragmentation_index(unsigned int order, struct contig_page_info *info)
>  {
>  	unsigned long requested = 1UL << order;
> +	int result;
>  
>  	if (WARN_ON_ONCE(order >= MAX_ORDER))
>  		return 0;
> @@ -1060,17 +1067,37 @@ static int __fragmentation_index(unsigned int order, struct contig_page_info *in
>  	if (!info->free_blocks_total)
>  		return 0;
>  
> -	/* Fragmentation index only makes sense when a request would fail */
>  	if (info->free_blocks_suitable)
>  		return -1000;
>  
>  	/*
> -	 * Index is between 0 and 1 so return within 3 decimal places
> +	 * If the number of requested-size blocks that could be constructed if
> +	 * all free blocks were compacted is
> +	 *
> +	 *	B = info->free_pages/requested
> +	 *
> +	 * then, conceptually, the number of fragments into which each
> +	 * requested-size block has been split is
> +	 *
> +	 *	N = info->free_blocks_total/B
>  	 *
> -	 * 0 => allocation would fail due to lack of memory
> -	 * 1 => allocation would fail due to fragmentation
> +	 * In the least and most fragmented cases all free memory resides on
> +	 * either the order - 1 free list or the base page free list
> +	 * respectively, thus the range of this function is given by
> +	 * 2 <= N <= requested.  The fragmentation index,
> +	 *
> +	 *	F = 1 - 2/N,
> +	 *
> +	 * has the more useful range of 0 < F <= 1.  In order to inhibit
> +	 * compaction in the event of a pathological shortfall of memory this
> +	 * function truncates and returns
> +	 *
> +	 *	F - 1/info->free_blocks_total
>  	 */
> -	return 1000 - div_u64( (1000+(div_u64(info->free_pages * 1000ULL, requested))), info->free_blocks_total);
> +	result = 1000 - div_u64((1000 + (div_u64(info->free_pages * 2000ULL,
> +			requested))), info->free_blocks_total);
> +
> +	return (result < 0) ? 0 : result;
>  }
>  
>  /* Same as __fragmentation index but allocs contig_page_info on stack */
> -- 
> 1.8.3.1
> 

-- 
Michal Hocko
SUSE Labs

* Re: [PATCH 1/1] mm, compaction: correct the bounds of __fragmentation_index()
  2018-02-18 16:47   ` robert.m.harris
@ 2018-02-19  9:47     ` Mel Gorman
  -1 siblings, 0 replies; 29+ messages in thread
From: Mel Gorman @ 2018-02-19  9:47 UTC (permalink / raw)
  To: robert.m.harris
  Cc: linux-mm, linux-kernel, linux-doc, Jonathan Corbet,
	Andrew Morton, Michal Hocko, Vlastimil Babka, Kirill A. Shutemov,
	Johannes Weiner, Kemi Wang, David Rientjes, Yafang Shao,
	Kangmin Park, Yisheng Xie, Davidlohr Bueso, Greg Kroah-Hartman,
	Huang Ying, Vinayak Menon

On Sun, Feb 18, 2018 at 04:47:55PM +0000, robert.m.harris@oracle.com wrote:
> From: "Robert M. Harris" <robert.m.harris@oracle.com>
> 
> __fragmentation_index() calculates a value used to determine whether
> compaction should be favoured over page reclaim in the event of allocation
> failure.  The calculation itself is opaque and, on inspection, does not
> match its existing description.  The function purports to return a value
> between 0 and 1000, representing units of 1/1000.  Barring the case of a
> pathological shortfall of memory, the lower bound is instead 500.  This is
> significant because it is the default value of sysctl_extfrag_threshold,
> i.e. the value below which compaction should be avoided in favour of page
> reclaim for costly pages.
> 
> This patch implements and documents a modified version of the original
> expression that returns a value in the range 0 <= index < 1000.  It amends
> the default value of sysctl_extfrag_threshold to preserve the existing
> behaviour.
> 
> Signed-off-by: Robert M. Harris <robert.m.harris@oracle.com>

You have to update sysctl_extfrag_threshold as well for the new bounds.
It effectively makes it a no-op but it was a no-op already and adjusting
that default should be supported by data indicating it's safe.

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 0/1] mm, compaction: correct the bounds of __fragmentation_index()
  2018-02-19  8:24   ` Michal Hocko
  (?)
@ 2018-02-19 11:40   ` Robert Harris
  -1 siblings, 0 replies; 29+ messages in thread
From: Robert Harris @ 2018-02-19 11:40 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-mm, linux-kernel, linux-doc, Jonathan Corbet,
	Andrew Morton, Vlastimil Babka, Kirill A. Shutemov,
	Johannes Weiner, Kemi Wang, David Rientjes, Yafang Shao,
	Kangmin Park, Mel Gorman, Yisheng Xie, Davidlohr Bueso,
	Greg Kroah-Hartman, Huang Ying, Vinayak Menon

> On 19 Feb 2018, at 08:24, Michal Hocko <mhocko@kernel.org> wrote:
> 
> On Sun 18-02-18 16:47:54, robert.m.harris@oracle.com wrote:
>> From: "Robert M. Harris" <robert.m.harris@oracle.com>
>> 
>> __fragmentation_index() calculates a value used to determine whether
>> compaction should be favoured over page reclaim in the event of
>> allocation failure.  The function purports to return a value between 0
>> and 1000, representing units of 1/1000.  Barring the case of a
>> pathological shortfall of memory, the lower bound is instead 500.  This
>> is significant because it is the default value of
>> sysctl_extfrag_threshold, i.e. the value below which compaction should
>> be avoided in favour of page reclaim for costly pages.
>> 
>> Here's an illustration using a zone that I fragmented with selective
>> calls to __alloc_pages() and __free_pages --- the fragmentation for
>> order-1 could not be minimised further yet is reported as 0.5:
> 
> A cover letter for a single patch is usually overkill. Why not put this
> information directly in the patch description?

This is my first patch and I’m not familiar with all the conventions.
I’ll incorporate those details in the next version of the commit message.

Robert Harris

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 1/1] mm, compaction: correct the bounds of __fragmentation_index()
  2018-02-19  8:26     ` Michal Hocko
@ 2018-02-19 12:14       ` Robert Harris
  -1 siblings, 0 replies; 29+ messages in thread
From: Robert Harris @ 2018-02-19 12:14 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-mm, linux-kernel, linux-doc, Jonathan Corbet,
	Andrew Morton, Vlastimil Babka, Kirill A. Shutemov,
	Johannes Weiner, Kemi Wang, David Rientjes, Yafang Shao,
	Kangmin Park, Mel Gorman, Yisheng Xie, Davidlohr Bueso,
	Greg Kroah-Hartman, Huang Ying, Vinayak Menon

> On 19 Feb 2018, at 08:26, Michal Hocko <mhocko@kernel.org> wrote:
> 
> On Sun 18-02-18 16:47:55, robert.m.harris@oracle.com wrote:
>> From: "Robert M. Harris" <robert.m.harris@oracle.com>
>> 
>> __fragmentation_index() calculates a value used to determine whether
>> compaction should be favoured over page reclaim in the event of allocation
>> failure.  The calculation itself is opaque and, on inspection, does not
>> match its existing description.  The function purports to return a value
>> between 0 and 1000, representing units of 1/1000.  Barring the case of a
>> pathological shortfall of memory, the lower bound is instead 500.  This is
>> significant because it is the default value of sysctl_extfrag_threshold,
>> i.e. the value below which compaction should be avoided in favour of page
>> reclaim for costly pages.
>> 
>> This patch implements and documents a modified version of the original
>> expression that returns a value in the range 0 <= index < 1000.  It amends
>> the default value of sysctl_extfrag_threshold to preserve the existing
>> behaviour.
> 
> It is not really clear to me what is the actual problem you are trying
> to solve by this patch. Is there any bug or are you just trying to
> improve the current implementation to be more effective?

There is not a significant bug.

The first problem is that the mathematical expression in
__fragmentation_index() is opaque, particularly given the lack of
description in the comments or the original commit message.  This patch
provides such a description.

Simply annotating the expression did not make sense since the formula
doesn't work as advertised.  The fragmentation index is described as
being in the range 0 to 1000 but the bounds of the formula are instead
500 to 1000.  This patch changes the formula so that its lower bound is
0.

The fragmentation index is compared to the tuneable
sysctl_extfrag_threshold, which defaults to 500.  If the index is above
this value then compaction is preferred over page reclaim in the event
of allocation failure.  Given the issue above, the index will almost
always exceed the default threshold and compaction will occur even if
there is low fragmentation.  This patch changes the default value of the
tuneable to 0, meaning that the existing behaviour will be unchanged.
Changing sysctl_extfrag_threshold back to something non-zero in a future
patch would effect the behaviour intended by the original code but would
require more comprehensive testing since it would modify the kernel's
performance under memory pressure.

Robert Harris

^ permalink raw reply	[flat|nested] 29+ messages in thread
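
The comparison against the tunable described in the message above can be
sketched as follows. This is a simplified illustration rather than the code in
mm/compaction.c: compaction_skipped() is a hypothetical helper, and it encodes
only the rule documented in vm.txt (no compaction when the index is <=
extfrag_threshold) plus the assumption that the special value -1000 never
counts as a reason to skip.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper: returns true when compaction would be skipped in
 * favour of reclaim. fragindex is in units of 1/1000; -1000 is the special
 * "a suitable block already exists" value.
 */
static bool compaction_skipped(int fragindex, int extfrag_threshold)
{
	assert(fragindex >= -1000 && fragindex <= 1000);
	return fragindex >= 0 && fragindex <= extfrag_threshold;
}
```

Under this sketch, compaction_skipped(500, 500) is true and
compaction_skipped(750, 500) is false, which is the pre-patch situation with
the default threshold. After the patch the corresponding calls are
compaction_skipped(0, 0), still true, and compaction_skipped(500, 0), still
false.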

* Re: [PATCH 1/1] mm, compaction: correct the bounds of __fragmentation_index()
  2018-02-19  9:47     ` Mel Gorman
@ 2018-02-19 12:26       ` Robert Harris
  -1 siblings, 0 replies; 29+ messages in thread
From: Robert Harris @ 2018-02-19 12:26 UTC (permalink / raw)
  To: Mel Gorman
  Cc: linux-mm, linux-kernel, linux-doc, Jonathan Corbet,
	Andrew Morton, Michal Hocko, Vlastimil Babka, Kirill A. Shutemov,
	Johannes Weiner, Kemi Wang, David Rientjes, Yafang Shao,
	Kangmin Park, Yisheng Xie, Davidlohr Bueso, Greg Kroah-Hartman,
	Huang Ying, Vinayak Menon



> On 19 Feb 2018, at 09:47, Mel Gorman <mgorman@suse.de> wrote:
> 
> On Sun, Feb 18, 2018 at 04:47:55PM +0000, robert.m.harris@oracle.com wrote:
>> From: "Robert M. Harris" <robert.m.harris@oracle.com>
>> 
>> __fragmentation_index() calculates a value used to determine whether
>> compaction should be favoured over page reclaim in the event of allocation
>> failure.  The calculation itself is opaque and, on inspection, does not
>> match its existing description.  The function purports to return a value
>> between 0 and 1000, representing units of 1/1000.  Barring the case of a
>> pathological shortfall of memory, the lower bound is instead 500.  This is
>> significant because it is the default value of sysctl_extfrag_threshold,
>> i.e. the value below which compaction should be avoided in favour of page
>> reclaim for costly pages.
>> 
>> This patch implements and documents a modified version of the original
>> expression that returns a value in the range 0 <= index < 1000.  It amends
>> the default value of sysctl_extfrag_threshold to preserve the existing
>> behaviour.
>> 
>> Signed-off-by: Robert M. Harris <robert.m.harris@oracle.com>
> 
> You have to update sysctl_extfrag_threshold as well for the new bounds.

This patch makes its default value zero.

> It effectively makes it a no-op but it was a no-op already and adjusting
> that default should be supported by data indicating it's safe.

Would it be acceptable to demonstrate using tracing that in both the
pre- and post-patch cases

  1. compaction is attempted regardless of fragmentation index,
     excepting that

  2. reclaim is preferred even for non-zero fragmentation during
     an extreme shortage of memory

?

Robert Harris

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 1/1] mm, compaction: correct the bounds of __fragmentation_index()
  2018-02-19 12:14       ` Robert Harris
@ 2018-02-19 12:39         ` Michal Hocko
  -1 siblings, 0 replies; 29+ messages in thread
From: Michal Hocko @ 2018-02-19 12:39 UTC (permalink / raw)
  To: Robert Harris
  Cc: linux-mm, linux-kernel, linux-doc, Jonathan Corbet,
	Andrew Morton, Vlastimil Babka, Kirill A. Shutemov,
	Johannes Weiner, Kemi Wang, David Rientjes, Yafang Shao,
	Kangmin Park, Mel Gorman, Yisheng Xie, Davidlohr Bueso,
	Greg Kroah-Hartman, Huang Ying, Vinayak Menon

On Mon 19-02-18 12:14:26, Robert Harris wrote:
> 
> 
> > On 19 Feb 2018, at 08:26, Michal Hocko <mhocko@kernel.org> wrote:
> > 
> > On Sun 18-02-18 16:47:55, robert.m.harris@oracle.com wrote:
> >> From: "Robert M. Harris" <robert.m.harris@oracle.com>
> >> 
> >> __fragmentation_index() calculates a value used to determine whether
> >> compaction should be favoured over page reclaim in the event of allocation
> >> failure.  The calculation itself is opaque and, on inspection, does not
> >> match its existing description.  The function purports to return a value
> >> between 0 and 1000, representing units of 1/1000.  Barring the case of a
> >> pathological shortfall of memory, the lower bound is instead 500.  This is
> >> significant because it is the default value of sysctl_extfrag_threshold,
> >> i.e. the value below which compaction should be avoided in favour of page
> >> reclaim for costly pages.
> >> 
> >> This patch implements and documents a modified version of the original
> >> expression that returns a value in the range 0 <= index < 1000.  It amends
> >> the default value of sysctl_extfrag_threshold to preserve the existing
> >> behaviour.
> > 
> > It is not really clear to me what is the actual problem you are trying
> > to solve by this patch. Is there any bug or are you just trying to
> > improve the current implementation to be more effective?
> 
> There is not a significant bug.
> 
> The first problem is that the mathematical expression in
> __fragmentation_index() is opaque, particularly given the lack of
> description in the comments or the original commit message.  This patch
> provides such a description.
> 
> Simply annotating the expression did not make sense since the formula
> doesn't work as advertised.  The fragmentation index is described as
> being in the range 0 to 1000 but the bounds of the formula are instead
> 500 to 1000.  This patch changes the formula so that its lower bound is
> 0.

But why do we want to fix that in the first place? Why don't we simply
deprecate the tunable and remove it altogether? Who is relying on tuning
this option? Considering that it doesn't work as advertised and nobody is
complaining, I have the feeling that it is not really used in the wild...
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 1/1] mm, compaction: correct the bounds of __fragmentation_index()
  2018-02-19 12:26       ` Robert Harris
@ 2018-02-19 13:10         ` Mel Gorman
  -1 siblings, 0 replies; 29+ messages in thread
From: Mel Gorman @ 2018-02-19 13:10 UTC (permalink / raw)
  To: Robert Harris
  Cc: linux-mm, linux-kernel, linux-doc, Jonathan Corbet,
	Andrew Morton, Michal Hocko, Vlastimil Babka, Kirill A. Shutemov,
	Johannes Weiner, Kemi Wang, David Rientjes, Yafang Shao,
	Kangmin Park, Yisheng Xie, Davidlohr Bueso, Greg Kroah-Hartman,
	Huang Ying, Vinayak Menon

On Mon, Feb 19, 2018 at 12:26:39PM +0000, Robert Harris wrote:
> 
> 
> > On 19 Feb 2018, at 09:47, Mel Gorman <mgorman@suse.de> wrote:
> > 
> > On Sun, Feb 18, 2018 at 04:47:55PM +0000, robert.m.harris@oracle.com wrote:
> >> From: "Robert M. Harris" <robert.m.harris@oracle.com>
> >> 
> >> __fragmentation_index() calculates a value used to determine whether
> >> compaction should be favoured over page reclaim in the event of allocation
> >> failure.  The calculation itself is opaque and, on inspection, does not
> >> match its existing description.  The function purports to return a value
> >> between 0 and 1000, representing units of 1/1000.  Barring the case of a
> >> pathological shortfall of memory, the lower bound is instead 500.  This is
> >> significant because it is the default value of sysctl_extfrag_threshold,
> >> i.e. the value below which compaction should be avoided in favour of page
> >> reclaim for costly pages.
> >> 
> >> This patch implements and documents a modified version of the original
> >> expression that returns a value in the range 0 <= index < 1000.  It amends
> >> the default value of sysctl_extfrag_threshold to preserve the existing
> >> behaviour.
> >> 
> >> Signed-off-by: Robert M. Harris <robert.m.harris@oracle.com>
> > 
> > You have to update sysctl_extfrag_threshold as well for the new bounds.
> 
> This patch makes its default value zero.
> 

Sorry, I'm clearly blind.

> > It effectively makes it a no-op but it was a no-op already and adjusting
> > that default should be supported by data indicating it's safe.
> 
> Would it be acceptable to demonstrate using tracing that in both the
> pre- and post-patch cases
> 
>   1. compaction is attempted regardless of fragmentation index,
>      excepting that
> 
>   2. reclaim is preferred even for non-zero fragmentation during
>      an extreme shortage of memory
> 

If you can demonstrate that for both reclaim-intensive and
compaction-intensive workloads then yes. Also include the reclaim and
compaction stats from /proc/vmstat and not just tracepoints to demonstrate
that reclaim doesn't get out of control and reclaim the world in
response to failed high-order allocations such as THP.
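
The before/after comparison of /proc/vmstat requested above could be
collected along these lines.  This is only a minimal sketch and was not
posted in the thread: the helper names parse_vmstat and vmstat_delta are
hypothetical, and the prefix list is an assumption about which counters
are of interest here (compaction, reclaim scan/steal, allocation stalls
and THP events).  Exact counter names vary between kernel versions.

```python
# Assumed set of counter-name prefixes relevant to compaction and reclaim.
PREFIXES = ("compact_", "pgscan_", "pgsteal_", "allocstall", "thp_")


def parse_vmstat(text):
    """Parse the 'name value' lines of /proc/vmstat, keeping selected counters."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        name, value = parts
        if name.startswith(PREFIXES):
            counters[name] = int(value)
    return counters


def vmstat_delta(before, after):
    """Return how much each selected counter grew between two snapshots."""
    return {name: after[name] - before.get(name, 0) for name in after}


# Typical use: snapshot, run the workload, snapshot again.
#   with open("/proc/vmstat") as f:
#       before = parse_vmstat(f.read())
#   ... run the reclaim- or compaction-intensive workload ...
#   with open("/proc/vmstat") as f:
#       after = parse_vmstat(f.read())
#   print(vmstat_delta(before, after))
```

Taking the same deltas on the pre-patch and post-patch kernels for each
workload gives numbers that can be compared directly, independently of
any tracepoint output.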

-- 
Mel Gorman
SUSE Labs


* Re: [PATCH 1/1] mm, compaction: correct the bounds of __fragmentation_index()
  2018-02-19 12:39         ` Michal Hocko
@ 2018-02-19 14:30           ` Robert Harris
  -1 siblings, 0 replies; 29+ messages in thread
From: Robert Harris @ 2018-02-19 14:30 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-mm, linux-kernel, linux-doc, Jonathan Corbet,
	Andrew Morton, Vlastimil Babka, Kirill A. Shutemov,
	Johannes Weiner, Kemi Wang, David Rientjes, Yafang Shao,
	Kangmin Park, Mel Gorman, Yisheng Xie, Davidlohr Bueso,
	Greg Kroah-Hartman, Huang Ying, Vinayak Menon



> On 19 Feb 2018, at 12:39, Michal Hocko <mhocko@kernel.org> wrote:
> 
> On Mon 19-02-18 12:14:26, Robert Harris wrote:
>> 
>> 
>>> On 19 Feb 2018, at 08:26, Michal Hocko <mhocko@kernel.org> wrote:
>>> 
>>> On Sun 18-02-18 16:47:55, robert.m.harris@oracle.com wrote:
>>>> From: "Robert M. Harris" <robert.m.harris@oracle.com>
>>>> 
>>>> __fragmentation_index() calculates a value used to determine whether
>>>> compaction should be favoured over page reclaim in the event of allocation
>>>> failure.  The calculation itself is opaque and, on inspection, does not
>>>> match its existing description.  The function purports to return a value
>>>> between 0 and 1000, representing units of 1/1000.  Barring the case of a
>>>> pathological shortfall of memory, the lower bound is instead 500.  This is
>>>> significant because it is the default value of sysctl_extfrag_threshold,
>>>> i.e. the value below which compaction should be avoided in favour of page
>>>> reclaim for costly pages.
>>>> 
>>>> This patch implements and documents a modified version of the original
>>>> expression that returns a value in the range 0 <= index < 1000.  It amends
>>>> the default value of sysctl_extfrag_threshold to preserve the existing
>>>> behaviour.
>>> 
>>> It is not really clear to me what is the actual problem you are trying
>>> to solve by this patch. Is there any bug or are you just trying to
>>> improve the current implementation to be more effective?
>> 
>> There is not a significant bug.
>> 
>> The first problem is that the mathematical expression in
>> __fragmentation_index() is opaque, particularly given the lack of
>> description in the comments or the original commit message.  This patch
>> provides such a description.
>> 
>> Simply annotating the expression did not make sense since the formula
>> doesn't work as advertised.  The fragmentation index is described as
>> being in the range 0 to 1000 but the bounds of the formula are instead
>> 500 to 1000.  This patch changes the formula so that its lower bound is
>> 0.
> 
> But why do we want to fix that in the first place? Why don't we simply
> deprecate the tunable and remove it altogether? Who is relying on tuning
> this option. Considering how it doesn't work as advertised and nobody
> complaining I have that feeling that it is not really used in wild…

I think it's a useful feature.  Ignoring any contrived test case, there
will always be a lower limit on the degree of fragmentation that can be
achieved by compaction.  If someone takes the trouble to measure this
then it is entirely reasonable that he or she should be able to inhibit
compaction for cases when fragmentation falls below some correspondingly
sized threshold.
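
How the threshold inhibits compaction can be sketched as follows.  This
assumes that the check in compaction_suitable() for costly orders has the
form "fragindex >= 0 && fragindex <= sysctl_extfrag_threshold"; the
function name below is hypothetical and the sketch ignores the watermark
checks that come before that comparison.

```python
def compaction_inhibited(fragindex, extfrag_threshold):
    """Model of the assumed threshold test for costly-order allocations.

    fragindex is in units of 1/1000.  A negative value is assumed to mean
    that a suitable free block already exists, so the index does not apply.
    """
    return 0 <= fragindex <= extfrag_threshold


# An index above the threshold lets compaction proceed.
print(compaction_inhibited(750, 500))    # False
# An index at or below the threshold favours reclaim instead.
print(compaction_inhibited(500, 500))    # True
# A negative index never inhibits compaction.
print(compaction_inhibited(-1000, 500))  # False
```

Under this model a measured lower limit on achievable fragmentation
translates directly into a threshold value: indices at or below it are
treated as not worth compacting.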

I hope to improve upon the decision-making strategy in the allocator slow
path but that is not a short term goal.  The current patch is an
improvement for the interim.

Robert Harris


* Re: [PATCH 1/1] mm, compaction: correct the bounds of __fragmentation_index()
  2018-02-19 13:10         ` Mel Gorman
@ 2018-02-19 14:37           ` Robert Harris
  -1 siblings, 0 replies; 29+ messages in thread
From: Robert Harris @ 2018-02-19 14:37 UTC (permalink / raw)
  To: Mel Gorman
  Cc: linux-mm, linux-kernel, linux-doc, Jonathan Corbet,
	Andrew Morton, Michal Hocko, Vlastimil Babka, Kirill A. Shutemov,
	Johannes Weiner, Kemi Wang, David Rientjes, Yafang Shao,
	Kangmin Park, Yisheng Xie, Davidlohr Bueso, Greg Kroah-Hartman,
	Huang Ying, Vinayak Menon



> On 19 Feb 2018, at 13:10, Mel Gorman <mgorman@suse.de> wrote:
> 
> On Mon, Feb 19, 2018 at 12:26:39PM +0000, Robert Harris wrote:
>> 
>> 
>>> On 19 Feb 2018, at 09:47, Mel Gorman <mgorman@suse.de> wrote:
>>> 
>>> On Sun, Feb 18, 2018 at 04:47:55PM +0000, robert.m.harris@oracle.com wrote:
>>>> From: "Robert M. Harris" <robert.m.harris@oracle.com>
>>>> 
>>>> __fragmentation_index() calculates a value used to determine whether
>>>> compaction should be favoured over page reclaim in the event of allocation
>>>> failure.  The calculation itself is opaque and, on inspection, does not
>>>> match its existing description.  The function purports to return a value
>>>> between 0 and 1000, representing units of 1/1000.  Barring the case of a
>>>> pathological shortfall of memory, the lower bound is instead 500.  This is
>>>> significant because it is the default value of sysctl_extfrag_threshold,
>>>> i.e. the value below which compaction should be avoided in favour of page
>>>> reclaim for costly pages.
>>>> 
>>>> This patch implements and documents a modified version of the original
>>>> expression that returns a value in the range 0 <= index < 1000.  It amends
>>>> the default value of sysctl_extfrag_threshold to preserve the existing
>>>> behaviour.
>>>> 
>>>> Signed-off-by: Robert M. Harris <robert.m.harris@oracle.com>
>>> 
>>> You have to update sysctl_extfrag_threshold as well for the new bounds.
>> 
>> This patch makes its default value zero.
>> 
> 
> Sorry, I'm clearly blind.
> 
>>> It effectively makes it a no-op but it was a no-op already and adjusting
>>> that default should be supported by data indicating it's safe.
>> 
>> Would it be acceptable to demonstrate using tracing that in both the
>> pre- and post-patch cases
>> 
>>  1. compaction is attempted regardless of fragmentation index,
>>     excepting that
>> 
>>  2. reclaim is preferred even for non-zero fragmentation during
>>     an extreme shortage of memory
>> 
> 
> If you can demonstrate that for both reclaim-intensive and
> compaction-intensive workloads then yes. Also include the reclaim and
> compaction stats from /proc/vmstat and not just tracepoints to demonstrate
> that reclaim doesn't get out of control and reclaim the world in
> response to failed high-order allocations such as THP.

Understood.  Thanks.

Robert Harris


* Re: [PATCH 1/1] mm, compaction: correct the bounds of __fragmentation_index()
  2018-02-19 14:30           ` Robert Harris
@ 2018-02-23  9:10             ` Michal Hocko
  -1 siblings, 0 replies; 29+ messages in thread
From: Michal Hocko @ 2018-02-23  9:10 UTC (permalink / raw)
  To: Robert Harris
  Cc: linux-mm, linux-kernel, linux-doc, Jonathan Corbet,
	Andrew Morton, Vlastimil Babka, Kirill A. Shutemov,
	Johannes Weiner, Kemi Wang, David Rientjes, Yafang Shao,
	Kangmin Park, Mel Gorman, Yisheng Xie, Davidlohr Bueso,
	Greg Kroah-Hartman, Huang Ying, Vinayak Menon

On Mon 19-02-18 14:30:36, Robert Harris wrote:
> 
> 
> > On 19 Feb 2018, at 12:39, Michal Hocko <mhocko@kernel.org> wrote:
> > 
> > On Mon 19-02-18 12:14:26, Robert Harris wrote:
> >> 
> >> 
> >>> On 19 Feb 2018, at 08:26, Michal Hocko <mhocko@kernel.org> wrote:
> >>> 
> >>> On Sun 18-02-18 16:47:55, robert.m.harris@oracle.com wrote:
> >>>> From: "Robert M. Harris" <robert.m.harris@oracle.com>
> >>>> 
> >>>> __fragmentation_index() calculates a value used to determine whether
> >>>> compaction should be favoured over page reclaim in the event of allocation
> >>>> failure.  The calculation itself is opaque and, on inspection, does not
> >>>> match its existing description.  The function purports to return a value
> >>>> between 0 and 1000, representing units of 1/1000.  Barring the case of a
> >>>> pathological shortfall of memory, the lower bound is instead 500.  This is
> >>>> significant because it is the default value of sysctl_extfrag_threshold,
> >>>> i.e. the value below which compaction should be avoided in favour of page
> >>>> reclaim for costly pages.
> >>>> 
> >>>> This patch implements and documents a modified version of the original
> >>>> expression that returns a value in the range 0 <= index < 1000.  It amends
> >>>> the default value of sysctl_extfrag_threshold to preserve the existing
> >>>> behaviour.
> >>> 
> >>> It is not really clear to me what is the actual problem you are trying
> >>> to solve by this patch. Is there any bug or are you just trying to
> >>> improve the current implementation to be more effective?
> >> 
> >> There is not a significant bug.
> >> 
> >> The first problem is that the mathematical expression in
> >> __fragmentation_index() is opaque, particularly given the lack of
> >> description in the comments or the original commit message.  This patch
> >> provides such a description.
> >> 
> >> Simply annotating the expression did not make sense since the formula
> >> doesn't work as advertised.  The fragmentation index is described as
> >> being in the range 0 to 1000 but the bounds of the formula are instead
> >> 500 to 1000.  This patch changes the formula so that its lower bound is
> >> 0.
> > 
> > But why do we want to fix that in the first place? Why don't we simply
> > deprecate the tunable and remove it altogether? Who is relying on tuning
> > this option. Considering how it doesn't work as advertised and nobody
> > complaining I have that feeling that it is not really used in wild…
> 
> I think it's a useful feature.  Ignoring any contrived test case, there
> will always be a lower limit on the degree of fragmentation that can be
> achieved by compaction.  If someone takes the trouble to measure this
> then it is entirely reasonable that he or she should be able to inhibit
> compaction for cases when fragmentation falls below some correspondingly
> sized threshold.

Do you have any practical examples?
-- 
Michal Hocko
SUSE Labs


* Re: [PATCH 1/1] mm, compaction: correct the bounds of __fragmentation_index()
  2018-02-23  9:10             ` Michal Hocko
@ 2018-02-23 13:40               ` Robert Harris
  -1 siblings, 0 replies; 29+ messages in thread
From: Robert Harris @ 2018-02-23 13:40 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-mm, linux-kernel, linux-doc, Jonathan Corbet,
	Andrew Morton, Vlastimil Babka, Kirill A. Shutemov,
	Johannes Weiner, Kemi Wang, David Rientjes, Yafang Shao,
	Kangmin Park, Mel Gorman, Yisheng Xie, Davidlohr Bueso,
	Greg Kroah-Hartman, Huang Ying, Vinayak Menon



> On 23 Feb 2018, at 09:10, Michal Hocko <mhocko@kernel.org> wrote:
> 
> On Mon 19-02-18 14:30:36, Robert Harris wrote:
>> 
>> 
>>> On 19 Feb 2018, at 12:39, Michal Hocko <mhocko@kernel.org> wrote:
>>> 
>>> On Mon 19-02-18 12:14:26, Robert Harris wrote:
>>>> 
>>>> 
>>>>> On 19 Feb 2018, at 08:26, Michal Hocko <mhocko@kernel.org> wrote:
>>>>> 
>>>>> On Sun 18-02-18 16:47:55, robert.m.harris@oracle.com wrote:
>>>>>> From: "Robert M. Harris" <robert.m.harris@oracle.com>
>>>>>> 
>>>>>> __fragmentation_index() calculates a value used to determine whether
>>>>>> compaction should be favoured over page reclaim in the event of allocation
>>>>>> failure.  The calculation itself is opaque and, on inspection, does not
>>>>>> match its existing description.  The function purports to return a value
>>>>>> between 0 and 1000, representing units of 1/1000.  Barring the case of a
>>>>>> pathological shortfall of memory, the lower bound is instead 500.  This is
>>>>>> significant because it is the default value of sysctl_extfrag_threshold,
>>>>>> i.e. the value below which compaction should be avoided in favour of page
>>>>>> reclaim for costly pages.
>>>>>> 
>>>>>> This patch implements and documents a modified version of the original
>>>>>> expression that returns a value in the range 0 <= index < 1000.  It amends
>>>>>> the default value of sysctl_extfrag_threshold to preserve the existing
>>>>>> behaviour.
>>>>> 
>>>>> It is not really clear to me what is the actual problem you are trying
>>>>> to solve by this patch. Is there any bug or are you just trying to
>>>>> improve the current implementation to be more effective?
>>>> 
>>>> There is not a significant bug.
>>>> 
>>>> The first problem is that the mathematical expression in
>>>> __fragmentation_index() is opaque, particularly given the lack of
>>>> description in the comments or the original commit message.  This patch
>>>> provides such a description.
>>>> 
>>>> Simply annotating the expression did not make sense since the formula
>>>> doesn't work as advertised.  The fragmentation index is described as
>>>> being in the range 0 to 1000 but the bounds of the formula are instead
>>>> 500 to 1000.  This patch changes the formula so that its lower bound is
>>>> 0.
>>> 
>>> But why do we want to fix that in the first place? Why don't we simply
>>> deprecate the tunable and remove it altogether? Who is relying on tuning
>>> this option. Considering how it doesn't work as advertised and nobody
>>> complaining I have that feeling that it is not really used in wild…
>> 
>> I think it's a useful feature.  Ignoring any contrived test case, there
>> will always be a lower limit on the degree of fragmentation that can be
>> achieved by compaction.  If someone takes the trouble to measure this
>> then it is entirely reasonable that he or she should be able to inhibit
>> compaction for cases when fragmentation falls below some correspondingly
>> sized threshold.
> 
> Do you have any practical examples?

Are you looking for proof that the existing feature is useful?

It is possible today to induce compaction, observe a fragmentation index
and then use the same index as a starting point for setting the
tuneable.  The fact that the actual range of reported indices is
500--1000 rather than the documented 0--1000 would have no practical
effect on this approach.  Therefore the fact that the feature doesn't
work precisely as advertised does not mean that it is not useful.
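
The 500--1000 range can be reproduced with a small model of the existing
calculation.  This sketch assumes the pre-patch expression in
mm/vmstat.c is

  1000 - (1000 + free_pages * 1000 / requested) / free_blocks_total

with integer division and requested = 1 << order.  The Python function
name and argument list are hypothetical, and the kernel's early return
when a suitable free block already exists is not modelled.

```python
def fragmentation_index(order, free_pages, free_blocks_total):
    """Model of the assumed pre-patch index, in units of 1/1000."""
    requested = 1 << order
    if free_blocks_total == 0:
        return 0  # nothing free at all: the index is meaningless
    # Both operands of each division are non-negative, so Python's floor
    # division matches the kernel's unsigned integer division here.
    return 1000 - (1000 + free_pages * 1000 // requested) // free_blocks_total


# 1983 free order-0 pages, as in the cover letter's first example.
print(fragmentation_index(1, 1983, 1983))  # 500
print(fragmentation_index(2, 1983, 1983))  # 750
# A single free page, as in the extreme-shortage example.
print(fragmentation_index(1, 1, 1))        # -500
```

With all free memory held as single pages, the order-1 index cannot fall
below 500 unless free memory itself is nearly exhausted, which is the
behaviour described in the cover letter.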

If you are asking me to prove whether modifying the tuneable in the
manner above, thereby preferring compaction for more fragmented systems,
is successful then I can't answer now.  I assume that the onus would
have been on Mel to show this at the time of the original commit.
However, I interpret his last comment on this patch as a request to
verify that changing the preference yields sane results.

Robert Harris


* Re: [PATCH 1/1] mm, compaction: correct the bounds of __fragmentation_index()
@ 2018-02-23 13:40               ` Robert Harris
  0 siblings, 0 replies; 29+ messages in thread
From: Robert Harris @ 2018-02-23 13:40 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-mm, linux-kernel, linux-doc, Jonathan Corbet,
	Andrew Morton, Vlastimil Babka, Kirill A. Shutemov,
	Johannes Weiner, Kemi Wang, David Rientjes, Yafang Shao,
	Kangmin Park, Mel Gorman, Yisheng Xie, Davidlohr Bueso,
	Greg Kroah-Hartman, Huang Ying, Vinayak Menon



> On 23 Feb 2018, at 09:10, Michal Hocko <mhocko@kernel.org> wrote:
> 
> On Mon 19-02-18 14:30:36, Robert Harris wrote:
>> 
>> 
>>> On 19 Feb 2018, at 12:39, Michal Hocko <mhocko@kernel.org> wrote:
>>> 
>>> On Mon 19-02-18 12:14:26, Robert Harris wrote:
>>>> 
>>>> 
>>>>> On 19 Feb 2018, at 08:26, Michal Hocko <mhocko@kernel.org> wrote:
>>>>> 
>>>>> On Sun 18-02-18 16:47:55, robert.m.harris@oracle.com wrote:
>>>>>> From: "Robert M. Harris" <robert.m.harris@oracle.com>
>>>>>> 
>>>>>> __fragmentation_index() calculates a value used to determine whether
>>>>>> compaction should be favoured over page reclaim in the event of allocation
>>>>>> failure.  The calculation itself is opaque and, on inspection, does not
>>>>>> match its existing description.  The function purports to return a value
>>>>>> between 0 and 1000, representing units of 1/1000.  Barring the case of a
>>>>>> pathological shortfall of memory, the lower bound is instead 500.  This is
>>>>>> significant because it is the default value of sysctl_extfrag_threshold,
>>>>>> i.e. the value below which compaction should be avoided in favour of page
>>>>>> reclaim for costly pages.
>>>>>> 
>>>>>> This patch implements and documents a modified version of the original
>>>>>> expression that returns a value in the range 0 <= index < 1000.  It amends
>>>>>> the default value of sysctl_extfrag_threshold to preserve the existing
>>>>>> behaviour.
>>>>> 
>>>>> It is not really clear to me what is the actual problem you are trying
>>>>> to solve by this patch. Is there any bug or are you just trying to
>>>>> improve the current implementation to be more effective?
>>>> 
>>>> There is not a significant bug.
>>>> 
>>>> The first problem is that the mathematical expression in
>>>> __fragmentation_index() is opaque, particularly given the lack of
>>>> description in the comments or the original commit message.  This patch
>>>> provides such a description.
>>>> 
>>>> Simply annotating the expression did not make sense since the formula
>>>> doesn't work as advertised.  The fragmentation index is described as
>>>> being in the range 0 to 1000 but the bounds of the formula are instead
>>>> 500 to 1000.  This patch changes the formula so that its lower bound is
>>>> 0.
>>> 
>>> But why do we want to fix that in the first place? Why don't we simply
>>> deprecate the tunable and remove it altogether? Who is relying on tuning
>>> this option. Considering how it doesn't work as advertised and nobody
>>> complaining I have that feeling that it is not really used in wild…
>> 
>> I think it's a useful feature.  Ignoring any contrived test case, there
>> will always be a lower limit on the degree of fragmentation that can be
>> achieved by compaction.  If someone takes the trouble to measure this
>> then it is entirely reasonable that he or she should be able to inhibit
>> compaction for cases when fragmentation falls below some correspondingly
>> sized threshold.
> 
> Do you have any practical examples?

Are you looking for proof that the existing feature is useful?

It is possible today to induce compaction, observe a fragmentation index
and then use the same index as a starting point for setting the
tuneable.  The fact that the actual range of reported indices is
500--1000 rather than the documented 0--1000 would have no practical
effect on this approach.  Therefore the fact that the feature doesn't
work precisely as advertised does not mean that it is not useful.
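
To make the 500--1000 range concrete, here is a minimal userspace sketch
of the existing calculation as I understand it.  It is not the kernel
source: the function name frag_index_sketch() is hypothetical, and the
parameters free_pages, free_blocks_total and free_blocks_suitable are
assumed stand-ins for the per-zone counts that the kernel gathers before
computing the index.

```c
#include <stdint.h>

/*
 * Sketch of the current fragmentation index, in units of 1/1000.
 * Assumed inputs (not taken from this thread):
 *   free_pages           - total free pages in the zone
 *   free_blocks_total    - number of free blocks of any order
 *   free_blocks_suitable - number of free blocks large enough for 'order'
 */
int frag_index_sketch(unsigned int order, uint64_t free_pages,
		      uint64_t free_blocks_total,
		      uint64_t free_blocks_suitable)
{
	uint64_t requested = (uint64_t)1 << order;
	uint64_t scaled;

	/* No free memory at all: nothing to measure. */
	if (free_blocks_total == 0)
		return 0;

	/* The index is only meaningful when the request would fail. */
	if (free_blocks_suitable != 0)
		return -1000;

	/* Integer division truncates at each step. */
	scaled = 1000 + (free_pages * 1000) / requested;
	return 1000 - (int)(scaled / free_blocks_total);
}
```

With the 1983 free order-0 pages shown in the cover letter, the sketch
gives 500 for order 1 and 750 for order 2, matching the reported 0.500
and 0.750.  With a single free page it gives -500 for order 1, which is
the pathological case mentioned there.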

If you are asking me to prove whether modifying the tuneable in the
manner above, thereby preferring compaction for more fragmented systems,
is successful then I can't answer now.  I assume that the onus would
have been on Mel to show this at the time of the original commit.
However, I interpret his last comment on this patch as a request to
verify that changing the preference yields sane results.

Robert Harris

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

* Re: [PATCH 1/1] mm, compaction: correct the bounds of __fragmentation_index()
  2018-02-23 13:40               ` Robert Harris
@ 2018-02-23 13:52                 ` Michal Hocko
  -1 siblings, 0 replies; 29+ messages in thread
From: Michal Hocko @ 2018-02-23 13:52 UTC (permalink / raw)
  To: Robert Harris
  Cc: linux-mm, linux-kernel, linux-doc, Jonathan Corbet,
	Andrew Morton, Vlastimil Babka, Kirill A. Shutemov,
	Johannes Weiner, Kemi Wang, David Rientjes, Yafang Shao,
	Kangmin Park, Mel Gorman, Yisheng Xie, Davidlohr Bueso,
	Greg Kroah-Hartman, Huang Ying, Vinayak Menon

On Fri 23-02-18 13:40:09, Robert Harris wrote:
> If you are asking me to prove whether modifying the tuneable in the
> manner above, thereby preferring compaction for more fragmented systems,
> is successful then I can't answer now.  I assume that the onus would
> have been on Mel to show this at the time of the original commit.
> However, I interpret his last comment on this patch as a request to
> verify that changing the preference yields sane results.

Yes, this is exactly where I was aiming... This might have been useful
during the initial compaction implementation but I am not aware of any
real users and I am also quite skeptical that it is very useful. I do
realize that this is hand waving because I do not have any numbers at
hand. The bottom line is that users should not have to care, really.
Compaction should be as automatic as possible. We can argue about
tuning for certain allocation orders and making compaction more
proactive to provide lower latencies for those requests, but deciding
whether to reclaim or compact sounds like too low-level a decision for
an admin to make, and a rather unstable interface across kernels, as
the implementation of compaction changes over time.

So I would really prefer to kill the tunable rather than try to "fix" it.
-- 
Michal Hocko
SUSE Labs

Thread overview: 29+ messages
2018-02-18 16:47 [PATCH 0/1] mm, compaction: correct the bounds of __fragmentation_index() robert.m.harris
2018-02-18 16:47 ` robert.m.harris
2018-02-18 16:47 ` [PATCH 1/1] " robert.m.harris
2018-02-18 16:47   ` robert.m.harris
2018-02-19  8:26   ` Michal Hocko
2018-02-19  8:26     ` Michal Hocko
2018-02-19 12:14     ` Robert Harris
2018-02-19 12:14       ` Robert Harris
2018-02-19 12:39       ` Michal Hocko
2018-02-19 12:39         ` Michal Hocko
2018-02-19 14:30         ` Robert Harris
2018-02-19 14:30           ` Robert Harris
2018-02-23  9:10           ` Michal Hocko
2018-02-23  9:10             ` Michal Hocko
2018-02-23 13:40             ` Robert Harris
2018-02-23 13:40               ` Robert Harris
2018-02-23 13:52               ` Michal Hocko
2018-02-23 13:52                 ` Michal Hocko
2018-02-19  9:47   ` Mel Gorman
2018-02-19  9:47     ` Mel Gorman
2018-02-19 12:26     ` Robert Harris
2018-02-19 12:26       ` Robert Harris
2018-02-19 13:10       ` Mel Gorman
2018-02-19 13:10         ` Mel Gorman
2018-02-19 14:37         ` Robert Harris
2018-02-19 14:37           ` Robert Harris
2018-02-19  8:24 ` [PATCH 0/1] " Michal Hocko
2018-02-19  8:24   ` Michal Hocko
2018-02-19 11:40   ` Robert Harris
