linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Xunlei Pang <xlpang@linux.alibaba.com>
To: Andrew Morton <akpm@linux-foundation.org>,
	Alexander Duyck <alexander.h.duyck@linux.intel.com>,
	Mel Gorman <mgorman@techsingularity.net>
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	Xunlei Pang <xlpang@linux.alibaba.com>
Subject: [PATCH 4/4] mm/page_reporting: Fix possible user allocation failure
Date: Fri, 26 Mar 2021 17:44:58 +0800	[thread overview]
Message-ID: <1616751898-58393-5-git-send-email-xlpang@linux.alibaba.com> (raw)
In-Reply-To: <1616751898-58393-1-git-send-email-xlpang@linux.alibaba.com>

We encountered user memory allocation failure(OOM) on our
512MiB tiny instances, it didn't happen after turning off
the page reporting.

After some debugging, it turns out 32*4MB=128MB(order-10)
free pages were isolated during reporting window resulting
in no free available.

Actually this might also happen on large instances when
having a few free memory.

This patch introduces a rule to limit reporting capacity
according to current free memory, and reduce accordingly
for higher orders which could break this rule.

For example,
 100MiB free, sgl capacity for different orders are:
   order-9 : 32
   order-10: 16

Reported-by: Helin Guo <helinguo@linux.alibaba.com>
Tested-by: Helin Guo <helinguo@linux.alibaba.com>
Signed-off-by: Xunlei Pang <xlpang@linux.alibaba.com>
---
 mm/page_reporting.c | 89 +++++++++++++++++++++++++++++++++++++++++++----------
 1 file changed, 72 insertions(+), 17 deletions(-)

diff --git a/mm/page_reporting.c b/mm/page_reporting.c
index 6ffedb8..2ec0ec0 100644
--- a/mm/page_reporting.c
+++ b/mm/page_reporting.c
@@ -129,8 +129,8 @@ void __page_reporting_notify(void)
  */
 static int
 page_reporting_cycle(struct page_reporting_dev_info *prdev, struct zone *zone,
-		     unsigned int order, unsigned int mt,
-		     struct scatterlist *sgl, unsigned int *offset)
+		     unsigned int order, unsigned int mt, struct scatterlist *sgl,
+		     const unsigned int capacity, unsigned int *offset)
 {
 	struct free_area *area = &zone->free_area[order];
 	struct list_head *list = &area->free_list[mt];
@@ -161,10 +161,10 @@ void __page_reporting_notify(void)
 	 * list processed. This should result in us reporting all pages on
 	 * an idle system in about 30 seconds.
 	 *
-	 * The division here should be cheap since PAGE_REPORTING_CAPACITY
-	 * should always be a power of 2.
+	 * The division here should be cheap since capacity should
+	 * always be a power of 2.
 	 */
-	budget = DIV_ROUND_UP(area->nr_free, PAGE_REPORTING_CAPACITY * 16);
+	budget = DIV_ROUND_UP(area->nr_free, capacity * 16);
 
 	/* loop through free list adding unreported pages to sg list */
 	list_for_each_entry_safe(page, next, list, lru) {
@@ -196,7 +196,7 @@ void __page_reporting_notify(void)
 			--(*offset);
 			sg_set_page(&sgl[*offset], page, page_len, 0);
 
-			nr_pages = (PAGE_REPORTING_CAPACITY - *offset) << order;
+			nr_pages = (capacity - *offset) << order;
 			if (zone->reported_pages + nr_pages >= threshold) {
 				err = 1;
 				break;
@@ -217,10 +217,10 @@ void __page_reporting_notify(void)
 		spin_unlock_irq(&zone->lock);
 
 		/* begin processing pages in local list */
-		err = prdev->report(prdev, sgl, PAGE_REPORTING_CAPACITY);
+		err = prdev->report(prdev, sgl, capacity);
 
 		/* reset offset since the full list was reported */
-		*offset = PAGE_REPORTING_CAPACITY;
+		*offset = capacity;
 
 		/* update budget to reflect call to report function */
 		budget--;
@@ -229,7 +229,7 @@ void __page_reporting_notify(void)
 		spin_lock_irq(&zone->lock);
 
 		/* flush reported pages from the sg list */
-		page_reporting_drain(prdev, sgl, zone, PAGE_REPORTING_CAPACITY, !err);
+		page_reporting_drain(prdev, sgl, zone, capacity, !err);
 
 		/*
 		 * Reset next to first entry, the old next isn't valid
@@ -251,12 +251,39 @@ void __page_reporting_notify(void)
 	return err;
 }
 
+/*
+ * For guest with little free memory, we should tune reporting capacity
+ * correctly to avoid reporting too much once, otherwise user allocation
+ * may fail and OOM during reporting window between __isolate_free_page()
+ * and page_reporting_drain().
+ *
+ * Calculate from which order we begin to reduce the scatterlist capacity,
+ * in order not to isolate too many pages to fail the user allocation.
+ */
+static unsigned int calculate_zone_order_threshold(struct zone *z)
+{
+	unsigned int order;
+	long pages_threshold;
+
+	pages_threshold = zone_page_state(z, NR_FREE_PAGES) - low_wmark_pages(z);
+	for (order = PAGE_REPORTING_MIN_ORDER; order < MAX_ORDER; order++) {
+		if ((PAGE_REPORTING_CAPACITY << order) > pages_threshold)
+			break;
+	}
+
+	return order;
+}
+
 static int
 page_reporting_process_zone(struct page_reporting_dev_info *prdev,
 			    struct scatterlist *sgl, struct zone *zone)
 {
-	unsigned int order, mt, leftover, offset = PAGE_REPORTING_CAPACITY;
+	unsigned int order, mt, leftover, offset;
 	unsigned long watermark, threshold;
+	unsigned int capacity = PAGE_REPORTING_CAPACITY;
+	unsigned int capacity_curr;
+	struct scatterlist *sgl_curr;
+	unsigned int order_threshold;
 	int err = 0;
 
 	threshold = atomic_long_read(&zone->managed_pages) * reporting_factor / 100;
@@ -274,15 +301,28 @@ void __page_reporting_notify(void)
 	if (!zone_watermark_ok(zone, 0, watermark, 0, ALLOC_CMA))
 		return err;
 
+	sgl_curr = sgl;
+	capacity_curr = offset = capacity;
+	order_threshold = calculate_zone_order_threshold(zone);
 	/* Process each free list starting from lowest order/mt */
 	for (order = PAGE_REPORTING_MIN_ORDER; order < MAX_ORDER; order++) {
+		/* try to reduce unexpected high order's reporting capacity */
+		if (order >= order_threshold) {
+			capacity_curr = capacity >> (order - order_threshold + 1);
+			if (capacity_curr == 0)
+				capacity_curr = 1;
+			sgl_curr = sgl + capacity - capacity_curr;
+			offset = capacity_curr;
+			sg_init_table(sgl_curr, capacity_curr);
+		}
+
 		for (mt = 0; mt < MIGRATE_TYPES; mt++) {
 			/* We do not pull pages from the isolate free list */
 			if (is_migrate_isolate(mt))
 				continue;
 
 			err = page_reporting_cycle(prdev, zone, order, mt,
-						   sgl, &offset);
+						   sgl_curr, capacity_curr, &offset);
 			/* Exceed threshold go to report leftover */
 			if (err > 0) {
 				err = 0;
@@ -292,18 +332,34 @@ void __page_reporting_notify(void)
 			if (err)
 				return err;
 		}
+
+		/* report the leftover pages for next orders with reduced capacity */
+		leftover = capacity_curr - offset;
+		if (leftover && order + 1 >= order_threshold) {
+			sgl_curr = &sgl_curr[offset];
+			err = prdev->report(prdev, sgl_curr, leftover);
+			offset = capacity_curr;
+
+			/* flush any remaining pages out from the last report */
+			spin_lock_irq(&zone->lock);
+			page_reporting_drain(prdev, sgl_curr, zone, leftover, !err);
+			spin_unlock_irq(&zone->lock);
+
+			if (err)
+				return err;
+		}
 	}
 
 leftover:
 	/* report the leftover pages before going idle */
-	leftover = PAGE_REPORTING_CAPACITY - offset;
+	leftover = capacity_curr - offset;
 	if (leftover) {
-		sgl = &sgl[offset];
-		err = prdev->report(prdev, sgl, leftover);
+		sgl_curr = &sgl_curr[offset];
+		err = prdev->report(prdev, sgl_curr, leftover);
 
 		/* flush any remaining pages out from the last report */
 		spin_lock_irq(&zone->lock);
-		page_reporting_drain(prdev, sgl, zone, leftover, !err);
+		page_reporting_drain(prdev, sgl_curr, zone, leftover, !err);
 		spin_unlock_irq(&zone->lock);
 	}
 
@@ -332,9 +388,8 @@ static void page_reporting_process(struct work_struct *work)
 	if (!sgl)
 		goto err_out;
 
-	sg_init_table(sgl, PAGE_REPORTING_CAPACITY);
-
 	for_each_zone(zone) {
+		sg_init_table(sgl, PAGE_REPORTING_CAPACITY);
 		err = page_reporting_process_zone(prdev, sgl, zone);
 		if (err)
 			break;
-- 
1.8.3.1



  parent reply	other threads:[~2021-03-26  9:45 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-03-26  9:44 [PATCH 0/4] mm/page_reporting: Some knobs and fixes Xunlei Pang
2021-03-26  9:44 ` [PATCH 1/4] mm/page_reporting: Introduce free page reported counters Xunlei Pang
2021-04-02 18:29   ` Alexander Duyck
2021-03-26  9:44 ` [PATCH 2/4] mm/page_reporting: Introduce free page reporting factor Xunlei Pang
2021-04-02 18:56   ` Alexander Duyck
2021-04-06  6:53     ` Xunlei Pang
2021-03-26  9:44 ` [PATCH 3/4] mm/page_reporting: Introduce "page_reporting_factor=" boot parameter Xunlei Pang
2021-03-26  9:44 ` Xunlei Pang [this message]
2021-04-02 19:55   ` [PATCH 4/4] mm/page_reporting: Fix possible user allocation failure Alexander Duyck
2021-04-06  6:55     ` Xunlei Pang
2021-04-02  4:08 ` [PATCH 0/4] mm/page_reporting: Some knobs and fixes Xunlei Pang
2021-04-02 18:17   ` Alexander Duyck

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1616751898-58393-5-git-send-email-xlpang@linux.alibaba.com \
    --to=xlpang@linux.alibaba.com \
    --cc=akpm@linux-foundation.org \
    --cc=alexander.h.duyck@linux.intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mgorman@techsingularity.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).