linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Roman Gushchin <guro@fb.com>
To: Dennis Zhou <dennis@kernel.org>
Cc: Tejun Heo <tj@kernel.org>, Christoph Lameter <cl@linux.com>,
	Andrew Morton <akpm@linux-foundation.org>, <linux-mm@kvack.org>,
	<linux-kernel@vger.kernel.org>, Roman Gushchin <guro@fb.com>
Subject: [PATCH v1 5/5] percpu: implement partial chunk depopulation
Date: Thu, 1 Apr 2021 14:43:01 -0700	[thread overview]
Message-ID: <20210401214301.1689099-6-guro@fb.com> (raw)
In-Reply-To: <20210401214301.1689099-1-guro@fb.com>

This patch implements partial depopulation of percpu chunks.

As now, a chunk can be depopulated only as a part of the final
destruction, if there are no more outstanding allocations. However
to minimize a memory waste it might be useful to depopulate a
partially filed chunk, if a small number of outstanding allocations
prevents the chunk from being fully reclaimed.

This patch implements the following depopulation process: it scans
over the chunk pages, looks for a range of empty and populated pages
and performs the depopulation. To avoid races with new allocations,
the chunk is previously isolated. After the depopulation the chunk is
returned to the original slot (but is appended to the tail of the list
to minimize the chances of population).

Because the pcpu_lock is dropped while calling pcpu_depopulate_chunk(),
the chunk can be concurrently moved to a different slot. To prevent
this, bool chunk->isolated flag is introduced. If set, the chunk can't
be moved to a different slot.

The depopulation is scheduled on the free path. Is the chunk:
  1) has more than 1/8 of total pages free and populated
  2) the system has enough free percpu pages aside of this chunk
  3) isn't the reserved chunk
  4) isn't the first chunk
  5) isn't entirely free
it's a good target for depopulation.

If so, the chunk is moved to a special pcpu_depopulate_list,
chunk->isolate flag is set and the async balancing is scheduled.

The async balancing moves pcpu_depopulate_list to a local list
(because pcpu_depopulate_list can be changed when pcpu_lock is
releases), and then tries to depopulate each chunk. Successfully
or not, at the end all chunks are returned to appropriate slots
and their isolated flags are cleared.

Many thanks to Dennis Zhou for his great ideas and a very constructive
discussion which led to many improvements in this patchset!

Signed-off-by: Roman Gushchin <guro@fb.com>
---
 mm/percpu-internal.h |   1 +
 mm/percpu.c          | 101 ++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 100 insertions(+), 2 deletions(-)

diff --git a/mm/percpu-internal.h b/mm/percpu-internal.h
index 095d7eaa0db4..ff318752915d 100644
--- a/mm/percpu-internal.h
+++ b/mm/percpu-internal.h
@@ -67,6 +67,7 @@ struct pcpu_chunk {
 
 	void			*data;		/* chunk data */
 	bool			immutable;	/* no [de]population allowed */
+	bool			isolated;	/* isolated from chunk slot lists */
 	int			start_offset;	/* the overlap with the previous
 						   region to have a page aligned
 						   base_addr */
diff --git a/mm/percpu.c b/mm/percpu.c
index e20119668c42..dae0b870e10a 100644
--- a/mm/percpu.c
+++ b/mm/percpu.c
@@ -181,6 +181,12 @@ static LIST_HEAD(pcpu_map_extend_chunks);
  */
 int pcpu_nr_empty_pop_pages[PCPU_NR_CHUNK_TYPES];
 
+/*
+ * List of chunks with a lot of free pages. Used to depopulate them
+ * asynchronously.
+ */
+static LIST_HEAD(pcpu_depopulate_list);
+
 /*
  * The number of populated pages in use by the allocator, protected by
  * pcpu_lock.  This number is kept per a unit per chunk (i.e. when a page gets
@@ -542,7 +548,7 @@ static void pcpu_chunk_relocate(struct pcpu_chunk *chunk, int oslot)
 {
 	int nslot = pcpu_chunk_slot(chunk);
 
-	if (oslot != nslot)
+	if (!chunk->isolated && oslot != nslot)
 		__pcpu_chunk_move(chunk, nslot, oslot < nslot);
 }
 
@@ -2048,6 +2054,82 @@ static void pcpu_grow_populated(enum pcpu_chunk_type type, int nr_to_pop)
 	}
 }
 
+/**
+ * pcpu_shrink_populated - scan chunks and release unused pages to the system
+ * @type: chunk type
+ *
+ * Scan over all chunks, find those marked with the depopulate flag and
+ * try to release unused pages to the system. On every attempt clear the
+ * chunk's depopulate flag to avoid wasting CPU by scanning the same
+ * chunk again and again.
+ */
+static void pcpu_shrink_populated(enum pcpu_chunk_type type)
+{
+	struct pcpu_block_md *block;
+	struct pcpu_chunk *chunk, *tmp;
+	LIST_HEAD(to_depopulate);
+	int i, start;
+
+	spin_lock_irq(&pcpu_lock);
+
+	list_splice_init(&pcpu_depopulate_list, &to_depopulate);
+
+	list_for_each_entry_safe(chunk, tmp, &to_depopulate, list) {
+		WARN_ON(chunk->immutable);
+
+		for (i = 0, start = -1; i < chunk->nr_pages; i++) {
+			/*
+			 * If the chunk has no empty pages or
+			 * we're short on empty pages in general,
+			 * just put the chunk back into the original slot.
+			 */
+			if (!chunk->nr_empty_pop_pages ||
+			    pcpu_nr_empty_pop_pages[type] <
+			    PCPU_EMPTY_POP_PAGES_HIGH)
+				break;
+
+			/*
+			 * If the page is empty and populated, start or
+			 * extend the [start, i) range.
+			 */
+			block = chunk->md_blocks + i;
+			if (block->contig_hint == PCPU_BITMAP_BLOCK_BITS &&
+			    test_bit(i, chunk->populated)) {
+				if (start == -1)
+					start = i;
+				continue;
+			}
+
+			/*
+			 * Otherwise check if there is an active range,
+			 * and if yes, depopulate it.
+			 */
+			if (start == -1)
+				continue;
+
+			spin_unlock_irq(&pcpu_lock);
+			pcpu_depopulate_chunk(chunk, start, i);
+			cond_resched();
+			spin_lock_irq(&pcpu_lock);
+
+			pcpu_chunk_depopulated(chunk, start, i);
+
+			/*
+			 * Reset the range and continue.
+			 */
+			start = -1;
+		}
+
+		/*
+		 * Return the chunk to the corresponding slot.
+		 */
+		chunk->isolated = false;
+		pcpu_chunk_relocate(chunk, -1);
+	}
+
+	spin_unlock_irq(&pcpu_lock);
+}
+
 /**
  * pcpu_balance_populated - manage the amount of populated pages
  * @type: chunk type
@@ -2078,6 +2160,8 @@ static void pcpu_balance_populated(enum pcpu_chunk_type type)
 	} else if (pcpu_nr_empty_pop_pages[type] < PCPU_EMPTY_POP_PAGES_HIGH) {
 		nr_to_pop = PCPU_EMPTY_POP_PAGES_HIGH - pcpu_nr_empty_pop_pages[type];
 		pcpu_grow_populated(type, nr_to_pop);
+	} else if (!list_empty(&pcpu_depopulate_list)) {
+		pcpu_shrink_populated(type);
 	}
 }
 
@@ -2135,7 +2219,12 @@ void free_percpu(void __percpu *ptr)
 
 	pcpu_memcg_free_hook(chunk, off, size);
 
-	/* if there are more than one fully free chunks, wake up grim reaper */
+	/*
+	 * If there are more than one fully free chunks, wake up grim reaper.
+	 * Otherwise if at least 1/8 of its pages are empty and there is no
+	 * system-wide shortage of empty pages aside from this chunk, isolate
+	 * the chunk and schedule an async depopulation.
+	 */
 	if (chunk->free_bytes == pcpu_unit_size) {
 		struct pcpu_chunk *pos;
 
@@ -2144,6 +2233,14 @@ void free_percpu(void __percpu *ptr)
 				need_balance = true;
 				break;
 			}
+	} else if (chunk != pcpu_first_chunk && chunk != pcpu_reserved_chunk &&
+		   !chunk->isolated &&
+		   chunk->nr_empty_pop_pages >= chunk->nr_pages / 8 &&
+		   pcpu_nr_empty_pop_pages[pcpu_chunk_type(chunk)] >
+		   PCPU_EMPTY_POP_PAGES_HIGH + chunk->nr_empty_pop_pages) {
+		list_move(&chunk->list, &pcpu_depopulate_list);
+		chunk->isolated = true;
+		need_balance = true;
 	}
 
 	trace_percpu_free_percpu(chunk->base_addr, off, ptr);
-- 
2.30.2


  parent reply	other threads:[~2021-04-01 21:43 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-04-01 21:42 [PATCH v1 0/5] percpu: partial chunk depopulation Roman Gushchin
2021-04-01 21:42 ` [PATCH v1 1/5] percpu: split __pcpu_balance_workfn() Roman Gushchin
2021-04-01 21:42 ` [PATCH v1 2/5] percpu: make pcpu_nr_empty_pop_pages per chunk type Roman Gushchin
2021-04-01 21:42 ` [PATCH v1 3/5] percpu: generalize pcpu_balance_populated() Roman Gushchin
2021-04-01 21:43 ` [PATCH v1 4/5] percpu: fix a comment about the chunks ordering Roman Gushchin
2021-04-01 21:43 ` Roman Gushchin [this message]
2021-04-05 23:02   ` [PATCH v1 5/5] percpu: implement partial chunk depopulation Dennis Zhou
2021-04-07 11:09 ` [PATCH v1 0/5] percpu: " Vlastimil Babka

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210401214301.1689099-6-guro@fb.com \
    --to=guro@fb.com \
    --cc=akpm@linux-foundation.org \
    --cc=cl@linux.com \
    --cc=dennis@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=tj@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).