Date: Fri, 16 Apr 2021 14:18:10 +0000
From: Dennis Zhou
To: Pratik Sampat
Cc: Roman Gushchin, Tejun Heo, Christoph Lameter, Andrew Morton,
 Vlastimil Babka, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
 pratik.r.sampat@gmail.com
Subject: Re: [PATCH v3 0/6] percpu: partial chunk depopulation
References: <20210408035736.883861-1-guro@fb.com>
 <25c78660-9f4c-34b3-3a05-68c313661a46@linux.ibm.com>
In-Reply-To: <25c78660-9f4c-34b3-3a05-68c313661a46@linux.ibm.com>

Hello,

On Fri, Apr 16, 2021 at 06:26:15PM +0530, Pratik Sampat wrote:
> Hello Roman,
>
> I've tried the v3 patch series on a POWER9 and an x86 KVM setup.
>
> My results of the percpu_test are as follows:
>
> Intel KVM 4CPU:4G
> Vanilla 5.12-rc6
> # ./percpu_test.sh
> Percpu:             1952 kB
> Percpu:           219648 kB
> Percpu:           219648 kB
>
> 5.12-rc6 with the patchset applied
> # ./percpu_test.sh
> Percpu:             2080 kB
> Percpu:           219712 kB
> Percpu:            72672 kB
>
> I'm able to see an improvement comparable to the one you're seeing.
>
> However, on POWERPC I'm unable to reproduce these improvements with
> the patchset in the same configuration.
>
> POWER9 KVM 4CPU:4G
> Vanilla 5.12-rc6
> # ./percpu_test.sh
> Percpu:             5888 kB
> Percpu:           118272 kB
> Percpu:           118272 kB
>
> 5.12-rc6 with the patchset applied
> # ./percpu_test.sh
> Percpu:             6144 kB
> Percpu:           119040 kB
> Percpu:           119040 kB
>
> I'm wondering if there's any architecture-specific code that needs
> plumbing here?
>

There shouldn't be. Can you send me the percpu_stats debug output before
and after?
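(For reference, these stats are generated by mm/percpu-stats.c and are
exposed via debugfs when the kernel is built with CONFIG_PERCPU_STATS=y.
A minimal before/after capture, assuming debugfs is mounted at its usual
path, might look like:

--
# assumes a kernel built with CONFIG_PERCPU_STATS=y
mount -t debugfs none /sys/kernel/debug 2>/dev/null  # no-op if already mounted
cat /sys/kernel/debug/percpu_stats > percpu_stats.before
./percpu_test.sh
cat /sys/kernel/debug/percpu_stats > percpu_stats.after
--
)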
> I will also look through the code to find the reason why POWER isn't
> depopulating pages.
>
> Thank you,
> Pratik
>
> On 08/04/21 9:27 am, Roman Gushchin wrote:
> > In our production experience, the percpu memory allocator sometimes
> > struggles to return memory to the system. A typical example is the
> > creation of several thousand memory cgroups (each has several chunks
> > of percpu data used for vmstats, vmevents, ref counters, etc.).
> > Deleting and completely releasing these cgroups doesn't always lead
> > to a shrinkage of the percpu memory, so sometimes several GBs of
> > memory are wasted.
> >
> > The underlying problem is fragmentation: to release an underlying
> > chunk, all percpu allocations in it must be released first. The
> > percpu allocator tends to top up chunks to improve utilization. This
> > means new small-ish allocations (e.g. percpu ref counters) are placed
> > onto almost-filled old-ish chunks, effectively pinning them in memory.
> >
> > This patchset solves the problem by implementing partial depopulation
> > of percpu chunks: chunks with many empty pages are asynchronously
> > depopulated and the pages are returned to the system.
> >
> > To illustrate the problem, the following script can be used:
> >
> > --
> > #!/bin/bash
> >
> > cd /sys/fs/cgroup
> >
> > mkdir percpu_test
> > echo "+memory" > percpu_test/cgroup.subtree_control
> >
> > cat /proc/meminfo | grep Percpu
> >
> > for i in `seq 1 1000`; do
> >     mkdir percpu_test/cg_"${i}"
> >     for j in `seq 1 10`; do
> >         mkdir percpu_test/cg_"${i}"_"${j}"
> >     done
> > done
> >
> > cat /proc/meminfo | grep Percpu
> >
> > for i in `seq 1 1000`; do
> >     for j in `seq 1 10`; do
> >         rmdir percpu_test/cg_"${i}"_"${j}"
> >     done
> > done
> >
> > sleep 10
> >
> > cat /proc/meminfo | grep Percpu
> >
> > for i in `seq 1 1000`; do
> >     rmdir percpu_test/cg_"${i}"
> > done
> >
> > rmdir percpu_test
> > --
> >
> > It creates 11000 memory cgroups and removes every 10 out of 11.
> > It prints the initial size of the percpu memory, the size after
> > creating all the cgroups, and the size after deleting most of them.
> >
> > Results:
> > vanilla:
> > ./percpu_test.sh
> > Percpu:             7488 kB
> > Percpu:           481152 kB
> > Percpu:           481152 kB
> >
> > with this patchset applied:
> > ./percpu_test.sh
> > Percpu:             7488 kB
> > Percpu:           481408 kB
> > Percpu:           135552 kB
> >
> > So the total size of the percpu memory was reduced by more than
> > 3.5 times.
> >
> > v3:
> >  - introduced pcpu_check_chunk_hint()
> >  - fixed a bug related to the hint check
> >  - minor cosmetic changes
> >  - s/pretends/fixes (cc Vlastimil)
> >
> > v2:
> >  - depopulated chunks are sidelined
> >  - depopulation happens in the reverse order
> >  - depopulate list made per-chunk type
> >  - better results due to better heuristics
> >
> > v1:
> >  - depopulation heuristics changed and optimized
> >  - chunks are put into a separate list, depopulation scans this list
> >  - chunk->isolated is introduced, chunk->depopulate is dropped
> >  - rearranged patches a bit
> >  - fixed a panic discovered by krobot
> >  - made pcpu_nr_empty_pop_pages per chunk type
> >  - minor fixes
> >
> > rfc:
> > https://lwn.net/Articles/850508/
> >
> >
> > Roman Gushchin (6):
> >   percpu: fix a comment about the chunks ordering
> >   percpu: split __pcpu_balance_workfn()
> >   percpu: make pcpu_nr_empty_pop_pages per chunk type
> >   percpu: generalize pcpu_balance_populated()
> >   percpu: factor out pcpu_check_chunk_hint()
> >   percpu: implement partial chunk depopulation
> >
> >  mm/percpu-internal.h |   4 +-
> >  mm/percpu-stats.c    |   9 +-
> >  mm/percpu.c          | 306 +++++++++++++++++++++++++++++++++++------
> >  3 files changed, 261 insertions(+), 58 deletions(-)
>

Roman, sorry for the delay. I'm looking to apply this today to for-5.14.

Thanks,
Dennis
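(For readers unfamiliar with the allocator, the depopulation idea from
the cover letter, find chunks with many empty populated pages and hand
those pages back to the system, scanning in reverse order per the v2
notes, can be modeled in a few lines of C. This is a toy userspace
sketch under simplified assumptions: a fixed-size chunk, a made-up
watermark, invented helper names. It is not the mm/percpu.c
implementation:

--
#include <stdbool.h>
#include <stdio.h>

#define PAGES_PER_CHUNK		8
#define EMPTY_LOW_WATERMARK	4	/* only touch clearly sparse chunks */

struct chunk {
	bool populated[PAGES_PER_CHUNK];	/* page backed by real memory */
	bool used[PAGES_PER_CHUNK];		/* page carries an allocation */
};

/* Populated pages holding no allocations are candidates for release. */
static int nr_empty_pop_pages(const struct chunk *c)
{
	int i, n = 0;

	for (i = 0; i < PAGES_PER_CHUNK; i++)
		n += c->populated[i] && !c->used[i];
	return n;
}

/* Scan from the end of the chunk and release every empty populated page. */
static int depopulate_chunk(struct chunk *c)
{
	int i, freed = 0;

	for (i = PAGES_PER_CHUNK - 1; i >= 0; i--) {
		if (c->populated[i] && !c->used[i]) {
			c->populated[i] = false;	/* stands in for freeing the page */
			freed++;
		}
	}
	return freed;
}

int main(void)
{
	struct chunk c;
	int i;

	/* A mostly-empty chunk: a single allocation on page 0 pins it. */
	for (i = 0; i < PAGES_PER_CHUNK; i++) {
		c.populated[i] = true;
		c.used[i] = false;
	}
	c.used[0] = true;

	if (nr_empty_pop_pages(&c) >= EMPTY_LOW_WATERMARK)
		printf("freed %d of %d pages\n",
		       depopulate_chunk(&c), PAGES_PER_CHUNK);
	return 0;
}
--

In the real patchset the scan runs asynchronously from the balance work,
and isolated chunks are sidelined so they stop receiving new allocations;
the sketch only captures the page-level bookkeeping.)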