From mboxrd@z Thu Jan  1 00:00:00 1970
Subject: Re: [PATCH v18 00/32] per memcg lru_lock
To: Daniel Jordan <daniel.m.jordan@oracle.com>,
 Hugh Dickins <hughd@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>, mgorman@techsingularity.net,
 tj@kernel.org, khlebnikov@yandex-team.ru, willy@infradead.org,
 hannes@cmpxchg.org, lkp@intel.com, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, shakeelb@google.com,
 iamjoonsoo.kim@lge.com, richard.weiyang@gmail.com, kirill@shutemov.name,
 alexander.duyck@gmail.com, rong.a.chen@intel.com, mhocko@suse.com,
 vdavydov.dev@gmail.com, shy828301@gmail.com
References: <1598273705-69124-1-git-send-email-alex.shi@linux.alibaba.com>
 <20200824114204.cc796ca182db95809dd70a47@linux-foundation.org>
 <alpine.LSU.2.11.2008241231460.1065@eggly.anvils>
 <20200825015627.3c3pnwauqznnp3gc@ca-dmjordan1.us.oracle.com>
From: Alex Shi <alex.shi@linux.alibaba.com>
Message-ID: <ec62a835-f79d-2b8c-99c7-120834703b42@linux.alibaba.com>
Date: Tue, 25 Aug 2020 11:26:58 +0800
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:68.0)
 Gecko/20100101 Thunderbird/68.7.0
MIME-Version: 1.0
In-Reply-To: <20200825015627.3c3pnwauqznnp3gc@ca-dmjordan1.us.oracle.com>
Content-Type: text/plain; charset=gbk
Content-Transfer-Encoding: quoted-printable
X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4
Sender: owner-linux-mm@kvack.org
Precedence: bulk
X-Loop: owner-majordomo@kvack.org
List-ID: <linux-mm.kvack.org>


On 2020/8/25 9:56 AM, Daniel Jordan wrote:
> On Mon, Aug 24, 2020 at 01:24:20PM -0700, Hugh Dickins wrote:
>> On Mon, 24 Aug 2020, Andrew Morton wrote:
>>> On Mon, 24 Aug 2020 20:54:33 +0800 Alex Shi <alex.shi@linux.alibaba.com> wrote:
>> Andrew demurred on version 17 for lack of review.  Alexander Duyck has
>> been doing a lot on that front since then.  I have intended to do so,
>> but it's a mirage that moves away from me as I move towards it: I have
>
> Same, I haven't been able to keep up with the versions or the recent review
> feedback.  I got through about half of v17 last week and hope to have more time
> for the rest this week and beyond.
>
>>>> Following Daniel Jordan's suggestion, I have run 208 'dd' with on 104
>>>> containers on a 2s * 26cores * HT box with a modefied case:
>
> Alex, do you have a pointer to the modified readtwice case?

Sorry, no. My development machine crashed, so I lost both my container and the
modified case. I am still trying to recover the container from a repository
that has an account problem.

But some of the testing scripts are here. Generally, the original readtwice case
runs one thread on each CPU. The new case runs one container per CPU, with just
one readtwice thread in each container.

Here are the changes to the readtwice case (just for reference):
diff --git a/case-lru-file-readtwice b/case-lru-file-readtwice
index 85533b248634..48c6b5f44256 100755
--- a/case-lru-file-readtwice
+++ b/case-lru-file-readtwice
@@ -15,12 +15,9 @@

 . ./hw_vars

-for i in `seq 1 $nr_task`
-do
        create_sparse_file $SPARSE_FILE-$i $((ROTATE_BYTES / nr_task))
        timeout --foreground -s INT ${runtime:-600} dd bs=4k if=$SPARSE_FILE-$i of=/dev/null > $TMPFS_MNT/dd-output-1-$i 2>&1 &
        timeout --foreground -s INT ${runtime:-600} dd bs=4k if=$SPARSE_FILE-$i of=/dev/null > $TMPFS_MNT/dd-output-2-$i 2>&1 &
-done

 wait
 sleep 1
@@ -31,7 +28,7 @@ do
                echo "dd output file empty: $file" >&2
        }
        cat $file
-       rm  $file
+       #rm  $file
 done

 rm `seq -f $SPARSE_FILE-%g 1 $nr_task`

And here is how to run the case:
--------
# run the case on a 24-CPU machine; lrulockv2 is the container image with the modified case.
for ((i=0; i<24; i++))
do
        # btw, vm-scalability needs to create 23 loop devices
        docker run --privileged=true --rm lrulockv2 bash -c " sleep 20000" &
done
sleep 15  # wait for all containers to be ready

# kick off testing
for i in `docker ps | sed '1 d' | awk '{print $1 }'` ;do docker exec --privileged=true -it $i bash -c "cd vm-scalability/; bash -x ./run case-lru-file-readtwice "& done

# show testing results for all containers
for i in `docker ps | sed '1 d' | awk '{print $1 }'` ;do echo === $i ===; docker exec $i bash -c 'cat /tmp/vm-scalability-tmp/dd-output-* '  & done
# same, then report the number of dd result lines and the average of field 8 (the dd rate)
for i in `docker ps | sed '1 d' | awk '{print $1 }'` ;do echo === $i ===; docker exec $i bash -c 'cat /tmp/vm-scalability-tmp/dd-output-* '  & done | grep MB | awk 'BEGIN {a=0;} { a+=$8} END {print NR, a/(NR)}'
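
For reference, a minimal sketch of the averaging step above, not the exact
script that was run. The summary pipeline reads the rate from field $8, which
matches the older dd summary line; newer coreutils add a second size in the
parentheses, which shifts the fields. Taking the next-to-last field avoids
depending on either layout. The helper name avg_dd_rate and the two sample
lines are made up for illustration.

```shell
# Hypothetical helper: average the dd throughput lines read from stdin.
# Assumes every line of interest ends with "<rate> MB/s", as dd prints it.
avg_dd_rate() {
    grep 'MB/s$' | awk '{ sum += $(NF-1) } END { if (NR) print NR, sum / NR }'
}

# Made-up sample lines in the old and the new coreutils dd formats.
printf '%s\n' \
    '1073741824 bytes (1.1 GB) copied, 2.5 s, 400 MB/s' \
    '1073741824 bytes (1.1 GB, 1.0 GiB) copied, 5.0 s, 200 MB/s' |
    avg_dd_rate
```

Like the original pipeline, this only counts results reported in MB/s; a run
that dd reports in GB/s or kB/s would be skipped rather than converted.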


>
> Even better would be a description of the problem you're having in production
> with lru_lock.  We might be able to create at least a simulation of it to show
> what the expected improvement of your real workload is.

We are using thousands of memcgs on a machine, but as a simulation, I guess the
above case could be helpful to show the problem.

Thanks a lot!
Alex

>
>>>> https://git.kernel.org/pub/scm/linux/kernel/git/wfg/vm-scalability.git/tree/case-lru-file-readtwice
>>>> With this patchset, the readtwice performance increased about 80%
>>>> in concurrent containers.
>>>
>>> That's rather a slight amount of performance testing for a huge
>>> performance patchset!
>>
>> Indeed.  And I see that clause about readtwice performance increased 80%
>> going back eight months to v6: a lot of fundamental bugs have been fixed
>> in it since then, so I do think it needs refreshing.  It could be faster
>> now: v16 or v17 fixed the last bug I knew of, which had been slowing
>> down reclaim considerably.
>>
>> When I last timed my repetitive swapping loads (not loads anyone sensible
>> would be running with), across only two memcgs, Alex's patchset was
>> slightly faster than without: it really did make a difference.  But
>> I tend to think that for all patchsets, there exists at least one
>> test that shows it faster, and another that shows it slower.

In my testing, case-lru-file-mmap-read is a bit slower, 10+% on a 96-thread
machine, when memcg is enabled but unused. That may be due to longer pointer
chasing to reach the lruvec lock compared with pgdat->lru_lock, since
cgroup_disable=memory fully removes the regression with the new lock path.

I tried reusing page->private to store the lruvec pointer. That could remove
some of this regression, since private is generally unused on an LRU page. But
the patch is too buggy for now.

BTW, I guess memcg would cause more memory disturbance on a large machine if
it's enabled but unused, wouldn't it?


>>
>>> Is more detailed testing planned?
>>
>> Not by me, performance testing is not something I trust myself with,
>> just get lost in the numbers: Alex, this is what we hoped for months
>> ago, please make a more convincing case, I hope Daniel and others
>> can make more suggestions.  But my own evidence suggests it's good.
>
> I ran a few benchmarks on v17 last week (sysbench oltp readonly, kerndevel from
> mmtests, a memcg-ized version of the readtwice case I cooked up) and then today
> discovered there's a chance I wasn't running the right kernels, so I'm redoing
> them on v18.  Plan to look into what other, more "macro" tests would be
> sensitive to these changes.
>