* [PATCH] mm: page_alloc: Reset fair zone allocation policy only when batch counts are expired
@ 2014-05-29  9:04 ` Mel Gorman
  0 siblings, 0 replies; 8+ messages in thread
From: Mel Gorman @ 2014-05-29  9:04 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Johannes Weiner, Jan Kara, Hugh Dickins, Rik van Riel,
	Linux Kernel, Linux-MM, Linux-FSDevel

The fair zone allocation policy round-robins allocations between zones on
a node, using a counter to manage the round-robin, to avoid age inversion
problems during reclaim. If the first allocation fails, the batch counts get
reset and the allocation is attempted again before going into the slow path.
There are at least two problems with this:

1. If the eligible zones are below the low watermark we reset the counts
   even though the batches might be fine.
2. We potentially do batch resets even when the right choice is to fall back
   to other nodes.

When resetting batch counts, it was expected that the count would be <= 0,
but the bizarre side-effect is that we are resetting counters that were
initially positive, so (high - low - batch) potentially sets a high positive
batch count to close to 0. This leads to a premature reset in the near
future, more overhead and more ... screwing around.

The user-visible effect depends on zone sizes and a host of other factors;
the obvious one is that single-node machines with multiple zones will see
degraded performance for streaming readers at least. The effect is also
visible on NUMA machines but it may be harder to identify in the midst of
other noise.

Comparison is tiobench with data size 2*RAM on an ext3 filesystem on a
small single-node machine. Baseline kernel is mmotm with the shrinker and
proportional reclaim patches on top.

                                      3.15.0-rc5            3.15.0-rc5
                                  mmotm-20140528         fairzone-v1r1
Mean   SeqRead-MB/sec-1         120.95 (  0.00%)      133.59 ( 10.45%)
Mean   SeqRead-MB/sec-2         100.81 (  0.00%)      113.61 ( 12.70%)
Mean   SeqRead-MB/sec-4          93.75 (  0.00%)      104.75 ( 11.74%)
Mean   SeqRead-MB/sec-8          85.35 (  0.00%)       91.21 (  6.86%)
Mean   SeqRead-MB/sec-16         68.91 (  0.00%)       74.77 (  8.49%)
Mean   RandRead-MB/sec-1          1.08 (  0.00%)        1.07 ( -0.93%)
Mean   RandRead-MB/sec-2          1.28 (  0.00%)        1.25 ( -2.34%)
Mean   RandRead-MB/sec-4          1.54 (  0.00%)        1.51 ( -1.73%)
Mean   RandRead-MB/sec-8          1.67 (  0.00%)        1.70 (  2.20%)
Mean   RandRead-MB/sec-16         1.74 (  0.00%)        1.73 ( -0.19%)
Mean   SeqWrite-MB/sec-1        113.73 (  0.00%)      113.88 (  0.13%)
Mean   SeqWrite-MB/sec-2        103.76 (  0.00%)      104.13 (  0.36%)
Mean   SeqWrite-MB/sec-4         98.45 (  0.00%)       98.44 ( -0.01%)
Mean   SeqWrite-MB/sec-8         93.11 (  0.00%)       92.79 ( -0.34%)
Mean   SeqWrite-MB/sec-16        87.64 (  0.00%)       87.85 (  0.24%)
Mean   RandWrite-MB/sec-1         1.38 (  0.00%)        1.36 ( -1.21%)
Mean   RandWrite-MB/sec-2         1.35 (  0.00%)        1.35 (  0.25%)
Mean   RandWrite-MB/sec-4         1.33 (  0.00%)        1.35 (  1.00%)
Mean   RandWrite-MB/sec-8         1.31 (  0.00%)        1.29 ( -1.53%)
Mean   RandWrite-MB/sec-16        1.27 (  0.00%)        1.28 (  0.79%)

Streaming readers see a huge boost. Random readers, sequential writers
and random writers are all in the noise.

Signed-off-by: Mel Gorman <mgorman@suse.de>
---
 mm/page_alloc.c | 89 +++++++++++++++++++++++++++++++++++++++++++++++------------------------------------------
 1 file changed, 47 insertions(+), 42 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 2c7d394..70d4264 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1919,6 +1919,28 @@ static bool zone_allows_reclaim(struct zone *local_zone, struct zone *zone)
 
 #endif	/* CONFIG_NUMA */
 
+static void reset_alloc_batches(struct zonelist *zonelist,
+				enum zone_type high_zoneidx,
+				struct zone *preferred_zone)
+{
+	struct zoneref *z;
+	struct zone *zone;
+
+	for_each_zone_zonelist(zone, z, zonelist, high_zoneidx) {
+		/*
+		 * Only reset the batches of zones that were actually
+		 * considered in the fairness pass, we don't want to
+		 * trash fairness information for zones that are not
+		 * actually part of this zonelist's round-robin cycle.
+		 */
+		if (!zone_local(preferred_zone, zone))
+			continue;
+		mod_zone_page_state(zone, NR_ALLOC_BATCH,
+			high_wmark_pages(zone) - low_wmark_pages(zone) -
+			atomic_long_read(&zone->vm_stat[NR_ALLOC_BATCH]));
+	}
+}
+
 /*
  * get_page_from_freelist goes through the zonelist trying to allocate
  * a page.
@@ -1936,6 +1958,7 @@ get_page_from_freelist(gfp_t gfp_mask, nodemask_t *nodemask, unsigned int order,
 	int did_zlc_setup = 0;		/* just call zlc_setup() one time */
 	bool consider_zone_dirty = (alloc_flags & ALLOC_WMARK_LOW) &&
 				(gfp_mask & __GFP_WRITE);
+	bool batch_depleted = (alloc_flags & ALLOC_FAIR);
 
 zonelist_scan:
 	/*
@@ -1960,11 +1982,13 @@ zonelist_scan:
 		 * time the page has in memory before being reclaimed.
 		 */
 		if (alloc_flags & ALLOC_FAIR) {
-			if (!zone_local(preferred_zone, zone))
-				continue;
 			if (zone_page_state(zone, NR_ALLOC_BATCH) <= 0)
 				continue;
+			batch_depleted = false;
+			if (!zone_local(preferred_zone, zone))
+				continue;
 		}
+
 		/*
 		 * When allocating a page cache page for writing, we
 		 * want to get it from a zone that is within its dirty
@@ -2075,7 +2099,7 @@ this_zone_full:
 		goto zonelist_scan;
 	}
 
-	if (page)
+	if (page) {
 		/*
 		 * page->pfmemalloc is set when ALLOC_NO_WATERMARKS was
 		 * necessary to allocate the page. The expectation is
@@ -2084,6 +2108,25 @@ this_zone_full:
 		 * for !PFMEMALLOC purposes.
 		 */
 		page->pfmemalloc = !!(alloc_flags & ALLOC_NO_WATERMARKS);
+	} else {
+		/*
+		 * The first pass makes sure allocations are spread
+		 * fairly within the local node.  However, the local
+		 * node might have free pages left after the fairness
+		 * batches are exhausted, and remote zones haven't
+		 * even been considered yet.  Try once more without
+		 * fairness, and include remote zones now, before
+		 * entering the slowpath and waking kswapd: prefer
+		 * spilling to a remote zone over swapping locally.
+		 */
+		if ((alloc_flags & ALLOC_FAIR)) {
+			if (batch_depleted)
+				reset_alloc_batches(zonelist, high_zoneidx,
+					    preferred_zone);
+			alloc_flags &= ~ALLOC_FAIR;
+			goto zonelist_scan;
+		}
+	}
 
 	return page;
 }
@@ -2424,28 +2467,6 @@ __alloc_pages_high_priority(gfp_t gfp_mask, unsigned int order,
 	return page;
 }
 
-static void reset_alloc_batches(struct zonelist *zonelist,
-				enum zone_type high_zoneidx,
-				struct zone *preferred_zone)
-{
-	struct zoneref *z;
-	struct zone *zone;
-
-	for_each_zone_zonelist(zone, z, zonelist, high_zoneidx) {
-		/*
-		 * Only reset the batches of zones that were actually
-		 * considered in the fairness pass, we don't want to
-		 * trash fairness information for zones that are not
-		 * actually part of this zonelist's round-robin cycle.
-		 */
-		if (!zone_local(preferred_zone, zone))
-			continue;
-		mod_zone_page_state(zone, NR_ALLOC_BATCH,
-			high_wmark_pages(zone) - low_wmark_pages(zone) -
-			atomic_long_read(&zone->vm_stat[NR_ALLOC_BATCH]));
-	}
-}
-
 static void wake_all_kswapds(unsigned int order,
 			     struct zonelist *zonelist,
 			     enum zone_type high_zoneidx,
@@ -2783,29 +2804,12 @@ retry_cpuset:
 	if (allocflags_to_migratetype(gfp_mask) == MIGRATE_MOVABLE)
 		alloc_flags |= ALLOC_CMA;
 #endif
-retry:
 	/* First allocation attempt */
 	page = get_page_from_freelist(gfp_mask|__GFP_HARDWALL, nodemask, order,
 			zonelist, high_zoneidx, alloc_flags,
 			preferred_zone, classzone_idx, migratetype);
 	if (unlikely(!page)) {
 		/*
-		 * The first pass makes sure allocations are spread
-		 * fairly within the local node.  However, the local
-		 * node might have free pages left after the fairness
-		 * batches are exhausted, and remote zones haven't
-		 * even been considered yet.  Try once more without
-		 * fairness, and include remote zones now, before
-		 * entering the slowpath and waking kswapd: prefer
-		 * spilling to a remote zone over swapping locally.
-		 */
-		if (alloc_flags & ALLOC_FAIR) {
-			reset_alloc_batches(zonelist, high_zoneidx,
-					    preferred_zone);
-			alloc_flags &= ~ALLOC_FAIR;
-			goto retry;
-		}
-		/*
 		 * Runtime PM, block IO and its error handling path
 		 * can deadlock because I/O on the device might not
 		 * complete.

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH] mm: page_alloc: Reset fair zone allocation policy only when batch counts are expired
  2014-05-29  9:04 ` Mel Gorman
@ 2014-05-29 14:38   ` Johannes Weiner
  -1 siblings, 0 replies; 8+ messages in thread
From: Johannes Weiner @ 2014-05-29 14:38 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Andrew Morton, Jan Kara, Hugh Dickins, Rik van Riel,
	Linux Kernel, Linux-MM, Linux-FSDevel

Hi Mel!

On Thu, May 29, 2014 at 10:04:32AM +0100, Mel Gorman wrote:
> The fair zone allocation policy round-robins allocations between zones on
> a node to avoid age inversion problems during reclaim using a counter to
> manage the round-robin. If the first allocation fails, the batch counts get
> reset and the allocation is attempted again before going into the slow path.
> There are at least two problems with this
> 
> 1. If the eligible zones are below the low watermark we reset the counts
>    even though the batches might be fine.

The idea behind setting the batches to high-low was that they should
be roughly exhausted by the time the low watermark is hit.  And that
misconception must be the crux of this patch, because if they *were*
to exhaust together this patch wouldn't make a difference.
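
(To put made-up numbers on that: if the gap between the low and high
watermarks is 1,000 pages, each reset sets the batch to roughly 1,000,
so draining the whole batch should take a kswapd-refilled zone from
around the high watermark down to around the low one.)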

But once they diverge, we reset the batches prematurely, which means
not everybody is getting their fair share, and that reverts us back to
an imbalance in zone utilization.

So I think the changelog should include why this assumption was wrong.

> 2. We potentially do batch resets even when the right choice is to fallback
>    to other nodes.

We only fall back to other nodes when the fairness cycle is over and
all local zones have been considered fair and square.  Why *not* reset
the batches and start a new fairness cycle at this point?  Remember
that remote nodes are not - cannot be - part of the fairness cycle.

So I think this one is a red herring.

> When resetting batch counts, it was expected that the count would be <=
> 0 but the bizarre side-effect is that we are resetting counters that were
> initially postive so (high - low - batch) potentially sets a high positive
> batch count to close to 0. This leads to a premature reset in the near
> future, more overhead and more ... screwing around.

We're just adding the missing delta between the "should" and "is"
value to the existing batch, so a high batch value means small delta,
and we *add* a value close to 0, we don't *set* the batch close to 0.
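
Worked through with made-up numbers: if high - low is 1,000 and the
current batch is still 900, the delta passed to mod_zone_page_state()
is 1,000 - 900 = 100, and the batch ends up back at exactly 1,000; it
is never knocked down to 100.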

I think this one is a red herring as well.

> The user-visible effect depends on zone sizes and a host of other effects
> the obvious one is that single-node machines with multiple zones will see
> degraded performance for streaming readers at least. The effect is also
> visible on NUMA machines but it may be harder to identify in the midst of
> other noise.
> 
> Comparison is tiobench with data size 2*RAM on ext3 on a small single-node
> machine and on an ext3 filesystem. Baseline kernel is mmotm with the
> shrinker and proportional reclaim patches on top.
> 
>                                       3.15.0-rc5            3.15.0-rc5
>                                   mmotm-20140528         fairzone-v1r1
> Mean   SeqRead-MB/sec-1         120.95 (  0.00%)      133.59 ( 10.45%)
> Mean   SeqRead-MB/sec-2         100.81 (  0.00%)      113.61 ( 12.70%)
> Mean   SeqRead-MB/sec-4          93.75 (  0.00%)      104.75 ( 11.74%)
> Mean   SeqRead-MB/sec-8          85.35 (  0.00%)       91.21 (  6.86%)
> Mean   SeqRead-MB/sec-16         68.91 (  0.00%)       74.77 (  8.49%)
> Mean   RandRead-MB/sec-1          1.08 (  0.00%)        1.07 ( -0.93%)
> Mean   RandRead-MB/sec-2          1.28 (  0.00%)        1.25 ( -2.34%)
> Mean   RandRead-MB/sec-4          1.54 (  0.00%)        1.51 ( -1.73%)
> Mean   RandRead-MB/sec-8          1.67 (  0.00%)        1.70 (  2.20%)
> Mean   RandRead-MB/sec-16         1.74 (  0.00%)        1.73 ( -0.19%)
> Mean   SeqWrite-MB/sec-1        113.73 (  0.00%)      113.88 (  0.13%)
> Mean   SeqWrite-MB/sec-2        103.76 (  0.00%)      104.13 (  0.36%)
> Mean   SeqWrite-MB/sec-4         98.45 (  0.00%)       98.44 ( -0.01%)
> Mean   SeqWrite-MB/sec-8         93.11 (  0.00%)       92.79 ( -0.34%)
> Mean   SeqWrite-MB/sec-16        87.64 (  0.00%)       87.85 (  0.24%)
> Mean   RandWrite-MB/sec-1         1.38 (  0.00%)        1.36 ( -1.21%)
> Mean   RandWrite-MB/sec-2         1.35 (  0.00%)        1.35 (  0.25%)
> Mean   RandWrite-MB/sec-4         1.33 (  0.00%)        1.35 (  1.00%)
> Mean   RandWrite-MB/sec-8         1.31 (  0.00%)        1.29 ( -1.53%)
> Mean   RandWrite-MB/sec-16        1.27 (  0.00%)        1.28 (  0.79%)
> 
> Streaming readers see a huge boost. Random random readers, sequential
> writers and random writers are all in the noise.

Impressive, but I would really like to understand what's going on
there.

Did you record the per-zone allocation numbers by any chance as well,
so we can see the difference in zone utilization?

> Signed-off-by: Mel Gorman <mgorman@suse.de>
> ---
>  mm/page_alloc.c | 89 +++++++++++++++++++++++++++++++++++++++++++++++------------------------------------------
>  1 file changed, 47 insertions(+), 42 deletions(-)
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 2c7d394..70d4264 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -1919,6 +1919,28 @@ static bool zone_allows_reclaim(struct zone *local_zone, struct zone *zone)
>  
>  #endif	/* CONFIG_NUMA */
>  
> +static void reset_alloc_batches(struct zonelist *zonelist,
> +				enum zone_type high_zoneidx,
> +				struct zone *preferred_zone)
> +{
> +	struct zoneref *z;
> +	struct zone *zone;
> +
> +	for_each_zone_zonelist(zone, z, zonelist, high_zoneidx) {
> +		/*
> +		 * Only reset the batches of zones that were actually
> +		 * considered in the fairness pass, we don't want to
> +		 * trash fairness information for zones that are not
> +		 * actually part of this zonelist's round-robin cycle.
> +		 */
> +		if (!zone_local(preferred_zone, zone))
> +			continue;
> +		mod_zone_page_state(zone, NR_ALLOC_BATCH,
> +			high_wmark_pages(zone) - low_wmark_pages(zone) -
> +			atomic_long_read(&zone->vm_stat[NR_ALLOC_BATCH]));
> +	}
> +}
> +
>  /*
>   * get_page_from_freelist goes through the zonelist trying to allocate
>   * a page.
> @@ -1936,6 +1958,7 @@ get_page_from_freelist(gfp_t gfp_mask, nodemask_t *nodemask, unsigned int order,
>  	int did_zlc_setup = 0;		/* just call zlc_setup() one time */
>  	bool consider_zone_dirty = (alloc_flags & ALLOC_WMARK_LOW) &&
>  				(gfp_mask & __GFP_WRITE);
> +	bool batch_depleted = (alloc_flags & ALLOC_FAIR);
>  
>  zonelist_scan:
>  	/*
> @@ -1960,11 +1982,13 @@ zonelist_scan:
>  		 * time the page has in memory before being reclaimed.
>  		 */
>  		if (alloc_flags & ALLOC_FAIR) {
> -			if (!zone_local(preferred_zone, zone))
> -				continue;
>  			if (zone_page_state(zone, NR_ALLOC_BATCH) <= 0)
>  				continue;
> +			batch_depleted = false;
> +			if (!zone_local(preferred_zone, zone))
> +				continue;

This only resets the local batches once the first non-local zone's
batch is exhausted as well.  Which means that once we start spilling,
the fairness pass will never consider local zones again until the
first spill-over target is exhausted too.  But no remote allocs are
allowed during the fairness cycle, so you're creating a pass over the
zonelist where only known-exhausted local zones are considered.

What's going on there?

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] mm: page_alloc: Reset fair zone allocation policy only when batch counts are expired
  2014-05-29 14:38   ` Johannes Weiner
@ 2014-05-29 17:16     ` Mel Gorman
  -1 siblings, 0 replies; 8+ messages in thread
From: Mel Gorman @ 2014-05-29 17:16 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Andrew Morton, Jan Kara, Hugh Dickins, Rik van Riel,
	Linux Kernel, Linux-MM, Linux-FSDevel

On Thu, May 29, 2014 at 10:38:32AM -0400, Johannes Weiner wrote:
> Hi Mel!
> 
> On Thu, May 29, 2014 at 10:04:32AM +0100, Mel Gorman wrote:
> > The fair zone allocation policy round-robins allocations between zones on
> > a node to avoid age inversion problems during reclaim using a counter to
> > manage the round-robin. If the first allocation fails, the batch counts get
> > reset and the allocation is attempted again before going into the slow path.
> > There are at least two problems with this
> > 
> > 1. If the eligible zones are below the low watermark we reset the counts
> >    even though the batches might be fine.
> 
> The idea behind setting the batches to high-low was that they should
> be roughly exhausted by the time the low watermark is hit.  And that
> misconception must be the crux of this patch, because if they *were*
> to exhaust together this patch wouldn't make a difference.
> 
> But once they diverge, we reset the batches prematurely, which means
> not everybody is getting their fair share, and that reverts us back to
> an imbalance in zone utilization.
> 
> So I think the changelog should include why this assumption was wrong.
> 

They won't exhaust together when there are multiple allocation requests
simply on the basis that there is no lock there and there is per-cpu
accounting drift for vmstats. You'd at least expect them to drift by the
per-cpu update threshold.
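
As a purely userspace illustration of the sort of drift I mean (the
threshold and all the numbers below are made up, this is not the kernel
code):

#include <stdio.h>

#define NR_CPUS		4
#define STAT_THRESHOLD	32	/* stand-in for the per-cpu fold threshold */

static long global_batch = 1000;	/* what zone_page_state() would see */
static int  pcp_delta[NR_CPUS];		/* per-cpu deltas not yet folded */

static void mod_batch(int cpu, int delta)
{
	pcp_delta[cpu] += delta;
	if (pcp_delta[cpu] >= STAT_THRESHOLD || pcp_delta[cpu] <= -STAT_THRESHOLD) {
		global_batch += pcp_delta[cpu];
		pcp_delta[cpu] = 0;
	}
}

int main(void)
{
	int cpu, i;

	/* Every CPU consumes just under one threshold's worth of batch. */
	for (cpu = 0; cpu < NR_CPUS; cpu++)
		for (i = 0; i < STAT_THRESHOLD - 1; i++)
			mod_batch(cpu, -1);

	/* The visible counter has not moved although 124 pages were used. */
	printf("visible batch %ld, real batch %ld\n",
	       global_batch, global_batch - (long)NR_CPUS * (STAT_THRESHOLD - 1));
	return 0;
}

Two zones doing this independently can easily disagree by a few
thresholds about how much batch they have left, which is all it takes
for them not to expire together.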

> > 2. We potentially do batch resets even when the right choice is to fallback
> >    to other nodes.
> 
> We only fall back to other nodes when the fairness cycle is over and
> all local zones have been considered fair and square.  Why *not* reset
> the batches and start a new fairness cycle at this point?  Remember
> that remote nodes are not - can not - be part of the fairness cycle.
> 
> So I think this one is a red herring.
> 

Ok. There have been a lot of red herrings while chasing this one down,
unfortunately, as it was not possible to bisect and there were a lot of
candidates :/

> > When resetting batch counts, it was expected that the count would be <=
> > 0 but the bizarre side-effect is that we are resetting counters that were
> > initially postive so (high - low - batch) potentially sets a high positive
> > batch count to close to 0. This leads to a premature reset in the near
> > future, more overhead and more ... screwing around.
> 
> We're just adding the missing delta between the "should" and "is"
> value to the existing batch, so a high batch value means small delta,
> and we *add* a value close to 0, we don't *set* the batch close to 0.
> 
> I think this one is a red herring as well.
> 

There are still boundary issues that result in screwing around and
maybe I should have focused on this one instead. The situation I had in
mind started out as follows:

high zone alloc batch	1000	low watermark not ok
low zone alloc batch	   0	low watermark     ok

during the fairness cycle, no action can take place. The higher zone is not
allowed to allocate below the low watermark and must always enter the
slow path. The lower zone also temporarily cannot be used. At this point, a
reset takes place and the system continues until the low watermark is reached:

high zone alloc batch	1000	low watermark not ok
low zone alloc batch	 100	low watermark not ok

During this window, every ALLOC_FAIR attempt is going to fail due to
watermarks but still do another zone batch reset and recycle every time
before falling into the slow path. It ends up being more zonelist
traversals, which is why I moved the reset check inside
get_page_from_freelist to detect the difference between ALLOC_FAIR batch
failures and watermark failures.

The differences in timing when watermarks are hit may also account for
some of the drift in when the alloc batches get depleted.

> > The user-visible effect depends on zone sizes and a host of other effects
> > the obvious one is that single-node machines with multiple zones will see
> > degraded performance for streaming readers at least. The effect is also
> > visible on NUMA machines but it may be harder to identify in the midst of
> > other noise.
> > 
> > Comparison is tiobench with data size 2*RAM on ext3 on a small single-node
> > machine and on an ext3 filesystem. Baseline kernel is mmotm with the
> > shrinker and proportional reclaim patches on top.
> > 
> >                                       3.15.0-rc5            3.15.0-rc5
> >                                   mmotm-20140528         fairzone-v1r1
> > Mean   SeqRead-MB/sec-1         120.95 (  0.00%)      133.59 ( 10.45%)
> > Mean   SeqRead-MB/sec-2         100.81 (  0.00%)      113.61 ( 12.70%)
> > Mean   SeqRead-MB/sec-4          93.75 (  0.00%)      104.75 ( 11.74%)
> > Mean   SeqRead-MB/sec-8          85.35 (  0.00%)       91.21 (  6.86%)
> > Mean   SeqRead-MB/sec-16         68.91 (  0.00%)       74.77 (  8.49%)
> > Mean   RandRead-MB/sec-1          1.08 (  0.00%)        1.07 ( -0.93%)
> > Mean   RandRead-MB/sec-2          1.28 (  0.00%)        1.25 ( -2.34%)
> > Mean   RandRead-MB/sec-4          1.54 (  0.00%)        1.51 ( -1.73%)
> > Mean   RandRead-MB/sec-8          1.67 (  0.00%)        1.70 (  2.20%)
> > Mean   RandRead-MB/sec-16         1.74 (  0.00%)        1.73 ( -0.19%)
> > Mean   SeqWrite-MB/sec-1        113.73 (  0.00%)      113.88 (  0.13%)
> > Mean   SeqWrite-MB/sec-2        103.76 (  0.00%)      104.13 (  0.36%)
> > Mean   SeqWrite-MB/sec-4         98.45 (  0.00%)       98.44 ( -0.01%)
> > Mean   SeqWrite-MB/sec-8         93.11 (  0.00%)       92.79 ( -0.34%)
> > Mean   SeqWrite-MB/sec-16        87.64 (  0.00%)       87.85 (  0.24%)
> > Mean   RandWrite-MB/sec-1         1.38 (  0.00%)        1.36 ( -1.21%)
> > Mean   RandWrite-MB/sec-2         1.35 (  0.00%)        1.35 (  0.25%)
> > Mean   RandWrite-MB/sec-4         1.33 (  0.00%)        1.35 (  1.00%)
> > Mean   RandWrite-MB/sec-8         1.31 (  0.00%)        1.29 ( -1.53%)
> > Mean   RandWrite-MB/sec-16        1.27 (  0.00%)        1.28 (  0.79%)
> > 
> > Streaming readers see a huge boost. Random random readers, sequential
> > writers and random writers are all in the noise.
> 
> Impressive, but I would really like to understand what's going on
> there.
> 
> Did you record the per-zone allocation numbers by any chance as well,
> so we can see the difference in zone utilization?

No, I didn't record per-zone usage because at the time when the low
watermarks are being hit, it would have been less useful anyway.

> > Signed-off-by: Mel Gorman <mgorman@suse.de>
> > ---
> >  mm/page_alloc.c | 89 +++++++++++++++++++++++++++++++++++++++++++++++------------------------------------------
> >  1 file changed, 47 insertions(+), 42 deletions(-)
> > 
> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index 2c7d394..70d4264 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -1919,6 +1919,28 @@ static bool zone_allows_reclaim(struct zone *local_zone, struct zone *zone)
> >  
> >  #endif	/* CONFIG_NUMA */
> >  
> > +static void reset_alloc_batches(struct zonelist *zonelist,
> > +				enum zone_type high_zoneidx,
> > +				struct zone *preferred_zone)
> > +{
> > +	struct zoneref *z;
> > +	struct zone *zone;
> > +
> > +	for_each_zone_zonelist(zone, z, zonelist, high_zoneidx) {
> > +		/*
> > +		 * Only reset the batches of zones that were actually
> > +		 * considered in the fairness pass, we don't want to
> > +		 * trash fairness information for zones that are not
> > +		 * actually part of this zonelist's round-robin cycle.
> > +		 */
> > +		if (!zone_local(preferred_zone, zone))
> > +			continue;
> > +		mod_zone_page_state(zone, NR_ALLOC_BATCH,
> > +			high_wmark_pages(zone) - low_wmark_pages(zone) -
> > +			atomic_long_read(&zone->vm_stat[NR_ALLOC_BATCH]));
> > +	}
> > +}
> > +
> >  /*
> >   * get_page_from_freelist goes through the zonelist trying to allocate
> >   * a page.
> > @@ -1936,6 +1958,7 @@ get_page_from_freelist(gfp_t gfp_mask, nodemask_t *nodemask, unsigned int order,
> >  	int did_zlc_setup = 0;		/* just call zlc_setup() one time */
> >  	bool consider_zone_dirty = (alloc_flags & ALLOC_WMARK_LOW) &&
> >  				(gfp_mask & __GFP_WRITE);
> > +	bool batch_depleted = (alloc_flags & ALLOC_FAIR);
> >  
> >  zonelist_scan:
> >  	/*
> > @@ -1960,11 +1982,13 @@ zonelist_scan:
> >  		 * time the page has in memory before being reclaimed.
> >  		 */
> >  		if (alloc_flags & ALLOC_FAIR) {
> > -			if (!zone_local(preferred_zone, zone))
> > -				continue;
> >  			if (zone_page_state(zone, NR_ALLOC_BATCH) <= 0)
> >  				continue;
> > +			batch_depleted = false;
> > +			if (!zone_local(preferred_zone, zone))
> > +				continue;
> 
> This only resets the local batches once the first non-local zone's
> batch is exhausted as well.  Which means that once we start spilling,
> the fairness pass will never consider local zones again until the
> first spill-over target is exhausted too. 

Yes, you're right. The intent was that the reset would only take place
after all local zones had used their allocation batch but it got mucked
up along the way.
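
Roughly, the ordering I was aiming for looks like the toy model below
(a standalone illustration only, with made-up zone names and batch
values, not the actual fix):

#include <stdbool.h>
#include <stdio.h>

struct zone {
	const char *name;
	bool local;	/* stand-in for zone_local(preferred_zone, zone) */
	long batch;	/* stand-in for NR_ALLOC_BATCH */
};

int main(void)
{
	struct zone zones[] = {
		{ "local Normal",  true,    0 },
		{ "local DMA32",   true,    0 },
		{ "remote Normal", false, 500 },
	};
	bool batch_depleted = true;
	unsigned int i;

	for (i = 0; i < sizeof(zones) / sizeof(zones[0]); i++) {
		/* Remote zones are never part of the fairness cycle... */
		if (!zones[i].local)
			continue;
		/* ...so only local zones with batch left clear the flag. */
		if (zones[i].batch <= 0)
			continue;
		batch_depleted = false;
	}

	/* Reset only when every *local* batch has been consumed. */
	printf("reset local batches: %s\n", batch_depleted ? "yes" : "no");
	return 0;
}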

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] mm: page_alloc: Reset fair zone allocation policy only when batch counts are expired
  2014-05-29 17:16     ` Mel Gorman
@ 2014-06-04 14:56       ` Johannes Weiner
  -1 siblings, 0 replies; 8+ messages in thread
From: Johannes Weiner @ 2014-06-04 14:56 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Andrew Morton, Jan Kara, Hugh Dickins, Rik van Riel,
	Linux Kernel, Linux-MM, Linux-FSDevel

On Thu, May 29, 2014 at 06:16:08PM +0100, Mel Gorman wrote:
> On Thu, May 29, 2014 at 10:38:32AM -0400, Johannes Weiner wrote:
> > Hi Mel!
> > 
> > On Thu, May 29, 2014 at 10:04:32AM +0100, Mel Gorman wrote:
> > > The fair zone allocation policy round-robins allocations between zones on
> > > a node to avoid age inversion problems during reclaim using a counter to
> > > manage the round-robin. If the first allocation fails, the batch counts get
> > > reset and the allocation is attempted again before going into the slow path.
> > > There are at least two problems with this
> > > 
> > > 1. If the eligible zones are below the low watermark we reset the counts
> > >    even though the batches might be fine.
> > 
> > The idea behind setting the batches to high-low was that they should
> > be roughly exhausted by the time the low watermark is hit.  And that
> > misconception must be the crux of this patch, because if they *were*
> > to exhaust together this patch wouldn't make a difference.
> > 
> > But once they diverge, we reset the batches prematurely, which means
> > not everybody is getting their fair share, and that reverts us back to
> > an imbalance in zone utilization.
> > 
> > So I think the changelog should include why this assumption was wrong.
> > 
> 
> They won't exhaust together when there are multiple allocation requests
> simply on the basis that there is no lock there and there is per-cpu
> accounting drift for vmstats. You'd at least expect them to drift by the
> per-cpu update threshold.

Yeah, that's true.  I just didn't think it would make such a big
difference, and the numbers I gathered on my local machines showed
that allocation distribution was reliably proportional to zone size.
But it might really depend on the machine, and definitely on the
workload, which is why I was curious about the allocation numbers.

> > > When resetting batch counts, it was expected that the count would be <=
> > > 0 but the bizarre side-effect is that we are resetting counters that were
> > > initially postive so (high - low - batch) potentially sets a high positive
> > > batch count to close to 0. This leads to a premature reset in the near
> > > future, more overhead and more ... screwing around.
> > 
> > We're just adding the missing delta between the "should" and "is"
> > value to the existing batch, so a high batch value means small delta,
> > and we *add* a value close to 0, we don't *set* the batch close to 0.
> > 
> > I think this one is a red herring as well.
> > 
> 
> There are still boundary issues that results in screwing around and
> maybe I should have focused on this one instead. The situation I had in
> mind started out as follows
> 
> high zone alloc batch	1000	low watermark not ok
> low zone alloc batch	   0	low watermark     ok
> 
> during the fairness cycle, no action can take place. The higher zone is not
> allowed to allcoate at below the low watermark and must always enter the
> slow path. The lower zone also temporarily cannot be used. At this point, a
> reset takes place and the system continues until the low watermark is reached
> 
> high zone alloc batch	1000	low watermark not ok
> low zone allooc batch	 100	low watermark not ok
> 
> During this window, every ALLOC_FAIR is going to fail to due watermarks but
> still do another zone batch reset and recycle every time before falling
> into the slow path.  It ends up being more zonelist traversals which is
> why I moved the reset check inside get_page_from_freelist to detect the
> difference between ALLOC_FAIL failures and watermarks failures.
> 
> The differences in timing when watermarks are hit may also account for
> some of the drift for when the alloc batches get depleted.

That makes sense, especially in a highly concurrent workload where the
batches might be reset over and over between the first allocator
entering the slowpath and kswapd actually restoring any of the
watermarks.

> > > The user-visible effect depends on zone sizes and a host of other effects
> > > the obvious one is that single-node machines with multiple zones will see
> > > degraded performance for streaming readers at least. The effect is also
> > > visible on NUMA machines but it may be harder to identify in the midst of
> > > other noise.
> > > 
> > > Comparison is tiobench with data size 2*RAM on ext3 on a small single-node
> > > machine and on an ext3 filesystem. Baseline kernel is mmotm with the
> > > shrinker and proportional reclaim patches on top.
> > > 
> > >                                       3.15.0-rc5            3.15.0-rc5
> > >                                   mmotm-20140528         fairzone-v1r1
> > > Mean   SeqRead-MB/sec-1         120.95 (  0.00%)      133.59 ( 10.45%)
> > > Mean   SeqRead-MB/sec-2         100.81 (  0.00%)      113.61 ( 12.70%)
> > > Mean   SeqRead-MB/sec-4          93.75 (  0.00%)      104.75 ( 11.74%)
> > > Mean   SeqRead-MB/sec-8          85.35 (  0.00%)       91.21 (  6.86%)
> > > Mean   SeqRead-MB/sec-16         68.91 (  0.00%)       74.77 (  8.49%)
> > > Mean   RandRead-MB/sec-1          1.08 (  0.00%)        1.07 ( -0.93%)
> > > Mean   RandRead-MB/sec-2          1.28 (  0.00%)        1.25 ( -2.34%)
> > > Mean   RandRead-MB/sec-4          1.54 (  0.00%)        1.51 ( -1.73%)
> > > Mean   RandRead-MB/sec-8          1.67 (  0.00%)        1.70 (  2.20%)
> > > Mean   RandRead-MB/sec-16         1.74 (  0.00%)        1.73 ( -0.19%)
> > > Mean   SeqWrite-MB/sec-1        113.73 (  0.00%)      113.88 (  0.13%)
> > > Mean   SeqWrite-MB/sec-2        103.76 (  0.00%)      104.13 (  0.36%)
> > > Mean   SeqWrite-MB/sec-4         98.45 (  0.00%)       98.44 ( -0.01%)
> > > Mean   SeqWrite-MB/sec-8         93.11 (  0.00%)       92.79 ( -0.34%)
> > > Mean   SeqWrite-MB/sec-16        87.64 (  0.00%)       87.85 (  0.24%)
> > > Mean   RandWrite-MB/sec-1         1.38 (  0.00%)        1.36 ( -1.21%)
> > > Mean   RandWrite-MB/sec-2         1.35 (  0.00%)        1.35 (  0.25%)
> > > Mean   RandWrite-MB/sec-4         1.33 (  0.00%)        1.35 (  1.00%)
> > > Mean   RandWrite-MB/sec-8         1.31 (  0.00%)        1.29 ( -1.53%)
> > > Mean   RandWrite-MB/sec-16        1.27 (  0.00%)        1.28 (  0.79%)
> > > 
> > > Streaming readers see a huge boost. Random readers, sequential
> > > writers and random writers are all in the noise.
> > 
> > Impressive, but I would really like to understand what's going on
> > there.
> > 
> > Did you record the per-zone allocation numbers by any chance as well,
> > so we can see the difference in zone utilization?
> 
> No, I didn't record per-zone usage because at the time when the low
> watermarks are being hit, it would have been less useful anyway.

I just meant the pgalloc_* numbers from /proc/vmstat before and after
the workload, to see whether the distribution really is out of whack and
not in proportion to the zone sizes over the course of the load.
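
For reference, the counters in question can be captured with something as
small as the sketch below, run before and after the workload and diffed by
hand; it assumes nothing beyond the standard pgalloc_* lines in
/proc/vmstat.

#include <stdio.h>
#include <string.h>

int main(void)
{
	char line[256];
	FILE *fp = fopen("/proc/vmstat", "r");

	if (!fp) {
		perror("/proc/vmstat");
		return 1;
	}
	/* print one pgalloc_* counter per zone, e.g. pgalloc_dma32, pgalloc_normal */
	while (fgets(line, sizeof(line), fp))
		if (!strncmp(line, "pgalloc_", 8))
			fputs(line, stdout);
	fclose(fp);
	return 0;
}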

> > > @@ -1960,11 +1982,13 @@ zonelist_scan:
> > >  		 * time the page has in memory before being reclaimed.
> > >  		 */
> > >  		if (alloc_flags & ALLOC_FAIR) {
> > > -			if (!zone_local(preferred_zone, zone))
> > > -				continue;
> > >  			if (zone_page_state(zone, NR_ALLOC_BATCH) <= 0)
> > >  				continue;
> > > +			batch_depleted = false;
> > > +			if (!zone_local(preferred_zone, zone))
> > > +				continue;
> > 
> > This only resets the local batches once the first non-local zone's
> > batch is exhausted as well.  Which means that once we start spilling,
> > the fairness pass will never consider local zones again until the
> > first spill-over target is exhausted too. 
> 
> Yes, you're right. The intent was that the reset would only take place
> after all local zones had used their allocation batch, but it got mucked
> up along the way.
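
For clarity, one way the intended ordering could read, sketched against
the quoted hunk rather than taken from any follow-up patch: the locality
check comes first, so only local zones get to clear batch_depleted
(assumed to start out true for each fair pass), and remote zones neither
trigger nor suppress a reset.

		if (alloc_flags & ALLOC_FAIR) {
			/* Remote zones play no part in the fairness cycle. */
			if (!zone_local(preferred_zone, zone))
				continue;
			/* A local zone with batch left means no reset is due. */
			if (zone_page_state(zone, NR_ALLOC_BATCH) <= 0)
				continue;
			batch_depleted = false;
		}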

I thought this might have been an intentional change as per the NUMA
spilling behavior mentioned in the changelog.  Very well, then :)

^ permalink raw reply	[flat|nested] 8+ messages in thread


end of thread, other threads:[~2014-06-04 14:56 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-05-29  9:04 [PATCH] mm: page_alloc: Reset fair zone allocation policy only when batch counts are expired Mel Gorman
2014-05-29  9:04 ` Mel Gorman
2014-05-29 14:38 ` Johannes Weiner
2014-05-29 14:38   ` Johannes Weiner
2014-05-29 17:16   ` Mel Gorman
2014-05-29 17:16     ` Mel Gorman
2014-06-04 14:56     ` Johannes Weiner
2014-06-04 14:56       ` Johannes Weiner
