Linux-mm Archive on lore.kernel.org
 help / color / Atom feed
From: Roman Gushchin <guro@fb.com>
To: Dennis Zhou <dennis@kernel.org>
Cc: Tejun Heo <tj@kernel.org>, Christoph Lameter <cl@linux.com>,
	Andrew Morton <akpm@linux-foundation.org>, <linux-mm@kvack.org>,
	<linux-kernel@vger.kernel.org>, Roman Gushchin <guro@fb.com>
Subject: [PATCH rfc 0/4] percpu: partial chunk depopulation
Date: Wed, 24 Mar 2021 12:06:22 -0700
Message-ID: <20210324190626.564297-1-guro@fb.com> (raw)

In our production experience the percpu memory allocator is sometimes struggling
with returning the memory to the system. A typical example is a creation of
several thousands memory cgroups (each has several chunks of the percpu data
used for vmstats, vmevents, ref counters etc). Deletion and complete releasing
of these cgroups doesn't always lead to a shrinkage of the percpu memory.

The underlying problem is the fragmentation: to release an underlying chunk
all percpu allocations should be released first. The percpu allocator tends
to top up chunks to improve the utilization. It means new small-ish allocations
(e.g. percpu ref counters) are placed onto almost filled old-ish chunks,
effectively pinning them in memory.

This patchset pretends to solve this problem by implementing a partial
depopulation of percpu chunks: chunks with many empty pages are being
asynchronously depopulated and the pages are returned to the system.

To illustrate the problem the following script can be used:

--
#!/bin/bash

cd /sys/fs/cgroup

mkdir percpu_test
echo "+memory" > percpu_test/cgroup.subtree_control

cat /proc/meminfo | grep Percpu

for i in `seq 1 1000`; do
    mkdir percpu_test/cg_"${i}"
    for j in `seq 1 10`; do
	mkdir percpu_test/cg_"${i}"_"${j}"
    done
done

cat /proc/meminfo | grep Percpu

for i in `seq 1 1000`; do
    for j in `seq 1 10`; do
	rmdir percpu_test/cg_"${i}"_"${j}"
    done
done

sleep 10

cat /proc/meminfo | grep Percpu

for i in `seq 1 1000`; do
    rmdir percpu_test/cg_"${i}"
done

rmdir percpu_test
--

It creates 11000 memory cgroups and removes every 10 out of 11.
It prints the initial size of the percpu memory, the size after
creating all cgroups and the size after deleting most of them.

Results:
  vanilla:
    $ ./percpu_test.sh
    Percpu:             7296 kB
    Percpu:           481024 kB
    Percpu:           481024 kB

  with this patchset applied:
    ./percpu_test.sh
    Percpu:             7488 kB
    Percpu:           481152 kB
    Percpu:           153920 kB

So the total size of the percpu memory was reduced by ~3 times.


Roman Gushchin (4):
  percpu: implement partial chunk depopulation
  percpu: split __pcpu_balance_workfn()
  percpu: on demand chunk depopulation
  percpu: fix a comment about the chunks ordering

 mm/percpu-internal.h |   1 +
 mm/percpu.c          | 242 ++++++++++++++++++++++++++++++++++++-------
 2 files changed, 203 insertions(+), 40 deletions(-)

-- 
2.30.2



             reply index

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-03-24 19:06 Roman Gushchin [this message]
2021-03-24 19:06 ` [PATCH rfc 1/4] percpu: implement " Roman Gushchin
2021-03-29 17:20   ` Dennis Zhou
2021-03-29 18:29     ` Roman Gushchin
2021-03-29 19:28       ` Dennis Zhou
2021-03-29 19:40         ` Roman Gushchin
2021-03-24 19:06 ` [PATCH rfc 2/4] percpu: split __pcpu_balance_workfn() Roman Gushchin
2021-03-29 17:28   ` Dennis Zhou
2021-03-29 18:20     ` Roman Gushchin
2021-03-24 19:06 ` [PATCH rfc 3/4] percpu: on demand chunk depopulation Roman Gushchin
2021-03-29  8:37   ` [percpu] 28c9dada65: invoked_oom-killer:gfp_mask=0x kernel test robot
2021-03-29 18:19     ` Roman Gushchin
2021-03-29 19:21   ` [PATCH rfc 3/4] percpu: on demand chunk depopulation Dennis Zhou
2021-03-29 20:10     ` Roman Gushchin
2021-03-29 23:12       ` Dennis Zhou
2021-03-30  1:04         ` Roman Gushchin
2021-03-24 19:06 ` [PATCH rfc 4/4] percpu: fix a comment about the chunks ordering Roman Gushchin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210324190626.564297-1-guro@fb.com \
    --to=guro@fb.com \
    --cc=akpm@linux-foundation.org \
    --cc=cl@linux.com \
    --cc=dennis@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=tj@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Linux-mm Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-mm/0 linux-mm/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-mm linux-mm/ https://lore.kernel.org/linux-mm \
		linux-mm@kvack.org
	public-inbox-index linux-mm

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kvack.linux-mm


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git