Subject: Re: [PATCH v18 00/32] per memcg lru_lock
To: Daniel Jordan, Hugh Dickins
Cc: Andrew Morton, mgorman@techsingularity.net, tj@kernel.org,
    khlebnikov@yandex-team.ru, willy@infradead.org, hannes@cmpxchg.org,
    lkp@intel.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
    cgroups@vger.kernel.org, shakeelb@google.com, iamjoonsoo.kim@lge.com,
    richard.weiyang@gmail.com, kirill@shutemov.name,
    alexander.duyck@gmail.com, rong.a.chen@intel.com, mhocko@suse.com,
    vdavydov.dev@gmail.com, shy828301@gmail.com
References: <1598273705-69124-1-git-send-email-alex.shi@linux.alibaba.com>
    <20200824114204.cc796ca182db95809dd70a47@linux-foundation.org>
    <20200825015627.3c3pnwauqznnp3gc@ca-dmjordan1.us.oracle.com>
From: Alex Shi
Date: Tue, 25 Aug 2020 11:26:58 +0800
In-Reply-To: <20200825015627.3c3pnwauqznnp3gc@ca-dmjordan1.us.oracle.com>

On 2020/8/25 at 9:56 AM,
Daniel Jordan wrote:
> On Mon, Aug 24, 2020 at 01:24:20PM -0700, Hugh Dickins wrote:
>> On Mon, 24 Aug 2020, Andrew Morton wrote:
>>> On Mon, 24 Aug 2020 20:54:33 +0800 Alex Shi wrote:
>> Andrew demurred on version 17 for lack of review.  Alexander Duyck has
>> been doing a lot on that front since then.  I have intended to do so,
>> but it's a mirage that moves away from me as I move towards it: I have
>
> Same, I haven't been able to keep up with the versions or the recent review
> feedback.  I got through about half of v17 last week and hope to have more time
> for the rest this week and beyond.
>
>>>> Following Daniel Jordan's suggestion, I have run 208 'dd' with on 104
>>>> containers on a 2s * 26cores * HT box with a modefied case:
>
> Alex, do you have a pointer to the modified readtwice case?

Sorry, no. My development machine crashed, so I lost my container and the
modified case. I am struggling to get the container back from a repository
that has account problems.

But some of the testing scripts are here. Generally, the original readtwice
case runs one thread on each CPU. The new case runs one container on each
CPU, with just one readtwice thread in each container.

Here are the readtwice case changes (just a reference):

diff --git a/case-lru-file-readtwice b/case-lru-file-readtwice
index 85533b248634..48c6b5f44256 100755
--- a/case-lru-file-readtwice
+++ b/case-lru-file-readtwice
@@ -15,12 +15,9 @@

 . ./hw_vars

-for i in `seq 1 $nr_task`
-do
 	create_sparse_file $SPARSE_FILE-$i $((ROTATE_BYTES / nr_task))
 	timeout --foreground -s INT ${runtime:-600} dd bs=4k if=$SPARSE_FILE-$i of=/dev/null > $TMPFS_MNT/dd-output-1-$i 2>&1 &
 	timeout --foreground -s INT ${runtime:-600} dd bs=4k if=$SPARSE_FILE-$i of=/dev/null > $TMPFS_MNT/dd-output-2-$i 2>&1 &
-done

 wait
 sleep 1
@@ -31,7 +28,7 @@ do
 		echo "dd output file empty: $file" >&2
 	}
 	cat $file
-	rm  $file
+	#rm  $file
 done

 rm `seq -f $SPARSE_FILE-%g 1 $nr_task`

And here is how to run the case:
--------
# Run the case on a 24-CPU machine; lrulockv2 is the container image with the modified case.
for ((i=0; i<24; i++))
do
	# BTW, vm-scalability needs to create 23 loop devices
	docker run --privileged=true --rm lrulockv2 bash -c " sleep 20000" &
done
sleep 15  # wait for all containers to be ready

# Kick off testing
for i in `docker ps | sed '1 d' | awk '{print $1 }'` ;do docker exec --privileged=true -it $i bash -c "cd vm-scalability/; bash -x ./run case-lru-file-readtwice "& done

# Show the testing results for all containers
for i in `docker ps | sed '1 d' | awk '{print $1 }'` ;do echo === $i ===; docker exec $i bash -c 'cat /tmp/vm-scalability-tmp/dd-output-* ' & done
# Print the number of dd result lines and their average MB/s
for i in `docker ps | sed '1 d' | awk '{print $1 }'` ;do echo === $i ===; docker exec $i bash -c 'cat /tmp/vm-scalability-tmp/dd-output-* ' & done | grep MB | awk 'BEGIN {a=0 ;} { a+=$8} END {print NR, a/(NR)}'

>
> Even better would be a description of the problem you're having in production
> with lru_lock.  We might be able to create at least a simulation of it to show
> what the expected improvement of your real workload is.

We are using thousands of memcgs on one machine, but as a simulation, I guess
the above case could be helpful to show the problem.

Thanks a lot!
Alex

>
>>>> https://git.kernel.org/pub/scm/linux/kernel/git/wfg/vm-scalability.git/tree/case-lru-file-readtwice
>>>> With this patchset, the readtwice performance increased about 80%
>>>> in concurrent containers.
>>>
>>> That's rather a slight amount of performance testing for a huge
>>> performance patchset!
>>
>> Indeed.  And I see that clause about readtwice performance increased 80%
>> going back eight months to v6: a lot of fundamental bugs have been fixed
>> in it since then, so I do think it needs refreshing.  It could be faster
>> now: v16 or v17 fixed the last bug I knew of, which had been slowing
>> down reclaim considerably.
>>
>> When I last timed my repetitive swapping loads (not loads anyone sensible
>> would be running with), across only two memcgs, Alex's patchset was
>> slightly faster than without: it really did make a difference.  But
>> I tend to think that for all patchsets, there exists at least one
>> test that shows it faster, and another that shows it slower.

In my testing, case-lru-file-mmap-read is a bit slower, by 10+% on a
96-thread machine, when memcg is enabled but unused. That may be due to the
longer pointer chasing needed to reach the lruvec lock compared with
pgdat->lru_lock, since cgroup_disable=memory fully removes the regression
with the new lock path.

I tried reusing page->private to store the lruvec pointer, which could remove
some of this regression, since private is generally unused on an LRU page.
But the patch is too buggy for now.

BTW, I guess memcg would cause more memory disturbance on a large machine if
it's enabled but unused, wouldn't it?

>>
>>> Is more detailed testing planned?
>>
>> Not by me, performance testing is not something I trust myself with,
>> just get lost in the numbers: Alex, this is what we hoped for months
>> ago, please make a more convincing case, I hope Daniel and others
>> can make more suggestions.  But my own evidence suggests it's good.
>
> I ran a few benchmarks on v17 last week (sysbench oltp readonly, kerndevel from
> mmtests, a memcg-ized version of the readtwice case I cooked up) and then today
> discovered there's a chance I wasn't running the right kernels, so I'm redoing
> them on v18.  Plan to look into what other, more "macro" tests would be
> sensitive to these changes.
>
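
A note on the averaging one-liner in the result-collection step above: it
sums field 8 of every dd summary line, which is only correct when the rate is
the eighth field. Below is a minimal sketch of a more tolerant variant. It
assumes the dd summary lines end in "<number> MB/s" and looks for that token
instead of a fixed field position. The function name avg_dd_mbps is
hypothetical and is not part of vm-scalability; lines reporting kB/s or GB/s
are skipped, much as the original "grep MB" skips them.

```shell
#!/bin/sh
# Hypothetical helper (not part of vm-scalability): read dd summary lines
# on stdin and print "<count> <average MB/s>".
avg_dd_mbps() {
    awk '
        # Find the "MB/s" token and take the number just before it, so the
        # result does not depend on how many fields precede the rate.
        {
            for (f = 2; f <= NF; f++)
                if ($f == "MB/s") { sum += $(f - 1); n++; break }
        }
        # Print 0 0 when no rate lines were seen, to avoid dividing by zero.
        END { if (n) printf "%d %.1f\n", n, sum / n; else print "0 0" }
    '
}
```

It could be used in place of the "grep MB | awk ..." tail of the last loop,
for example: cat /tmp/vm-scalability-tmp/dd-output-* | avg_dd_mbps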