Date: Thu, 24 Feb 2022 16:37:02 +0800
From: Oliver Sang
To: Hugh Dickins
Cc: 0day robot, Vlastimil Babka, LKML, lkp@lists.01.org, ying.huang@intel.com,
    feng.tang@intel.com, zhengjun.xing@linux.intel.com, fengwei.yin@intel.com,
    Andrew Morton, Michal Hocko, "Kirill A. Shutemov", Matthew Wilcox,
    David Hildenbrand, Alistair Popple, Johannes Weiner, Rik van Riel,
    Suren Baghdasaryan, Yu Zhao, Greg Thelen, Shakeel Butt, Hillf Danton,
    linux-mm@kvack.org
Subject: Re: [mm/munlock] 237b445401: stress-ng.remap.ops_per_sec -62.6% regression
Message-ID: <20220224083702.GA17607@xsang-OptiPlex-9020>
References: <20220218063536.GA4377@xsang-OptiPlex-9020>
 <7d35c86-77f6-8bf-84a-5c23ff610f4@google.com>

hi Hugh,

On Sun, Feb 20, 2022 at 10:32:28PM -0800, Hugh Dickins wrote:
> On Fri, 18 Feb 2022, Hugh Dickins wrote:
> > On Fri, 18 Feb 2022, kernel test robot wrote:
> > >
> > > Greeting,
> > >
> > > FYI, we noticed a -62.6% regression of stress-ng.remap.ops_per_sec due to commit:
> > >
> > > commit: 237b4454014d3759acc6459eb329c5e3d55113ed ("[PATCH v2 07/13] mm/munlock: mlock_pte_range() when mlocking or munlocking")
> > > url: https://github.com/0day-ci/linux/commits/Hugh-Dickins/mm-munlock-rework-of-mlock-munlock-page-handling/20220215-104421
> > > base: https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git ee28855a54493ce83bc2a3fbe30210be61b57bc7
> > > patch link: https://lore.kernel.org/lkml/d39f6e4d-aa4f-731a-68ee-e77cdbf1d7bb@google.com
> > >
> > > in testcase: stress-ng
> > > on test machine: 128 threads 2 sockets Intel(R) Xeon(R) Platinum 8358 CPU @ 2.60GHz with 128G memory
> > > with following parameters:
> > >
> > >         nr_threads: 100%
> > >         testtime: 60s
> > >         class: memory
> > >         test: remap
> > >         cpufreq_governor: performance
> > >         ucode: 0xd000280
> > >
> > > If you fix the issue, kindly add following tag
> > > Reported-by: kernel test robot
> > >
> > > Details are as below:
> > > -------------------------------------------------------------------------------------------------->
> > >
> > > To reproduce:
> > >
> > >         git clone https://github.com/intel/lkp-tests.git
> > >         cd lkp-tests
> > >         sudo bin/lkp install job.yaml           # job file is attached in this email
> > >         bin/lkp split-job --compatible job.yaml # generate the yaml file for lkp run
> > >         sudo bin/lkp run generated-yaml-file
> > >
> > >         # if come across any failure that blocks the test,
> > >         # please remove ~/.lkp and /lkp dir to run from a clean state.
> > >
> > > =========================================================================================
> > > class/compiler/cpufreq_governor/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime/ucode:
> > >   memory/gcc-9/performance/x86_64-rhel-8.3/100%/debian-10.4-x86_64-20200603.cgz/lkp-icl-2sp6/remap/stress-ng/60s/0xd000280
> > >
> > > commit:
> > >   c479426e09 ("mm/munlock: maintain page->mlock_count while unevictable")
> > >   237b445401 ("mm/munlock: mlock_pte_range() when mlocking or munlocking")
> > >
> > > c479426e09c8088d 237b4454014d3759acc6459eb32
> > > ---------------- ---------------------------
> > >          %stddev     %change         %stddev
> > >              \          |                \
> > >     109459           -62.5%      41003 ±  2%  stress-ng.remap.ops
> > >       1823           -62.6%     682.54 ±  2%  stress-ng.remap.ops_per_sec
> > >  2.242e+08           -62.5%   83989157 ±  2%  stress-ng.time.minor_page_faults
> > >      30.00 ±  2%     -61.2%      11.65 ±  4%  stress-ng.time.user_time
> >
> > Thanks a lot for trying it out, I did hope that you would find something.
> >
> > However, IIUC, this by itself is not very interesting:
> > the comparison is between c479426e09 (06/13) as base and 237b445401?
> > 237b445401 is 07/13, "Fill in missing pieces", where the series gets
> > to be correct again, after dropping the old implementation and piecing
> > together the rest of the new implementation.  It's not a surprise that
> > those tests which need what's added back in 07/13 will get much slower
> > at this stage.  And later 10/13 brings in a pagevec to speed it up.
>
> I probably did not understand correctly: I expect you did try the whole
> series at first, found a regression, and then bisected to that commit.

yes, that is kind of what we did.
we applied your patch series as a new branch:

* 173fe0fca5b75 (linux-review/Hugh-Dickins/mm-munlock-rework-of-mlock-munlock-page-handling/20220215-104421) mm/thp: shrink_page_list() avoid splitting VM_LOCKED THP
* b98339dee777f mm/thp: collapse_file() do try_to_unmap(TTU_BATCH_FLUSH)
* cfe816e96f7fe mm/munlock: page migration needs mlock pagevec drained
* 9e15831f5ebd1 mm/munlock: mlock_page() munlock_page() batch by pagevec
* 351471e7c8eaa mm/munlock: delete smp_mb() from __pagevec_lru_add_fn()
* 84b82ccebbb3c mm/migrate: __unmap_and_move() push good newpage to LRU
* 237b4454014d3 mm/munlock: mlock_pte_range() when mlocking or munlocking
* c479426e09c80 mm/munlock: maintain page->mlock_count while unevictable
* 4165ea7156fc5 mm/munlock: replace clear_page_mlock() by final clearance
* 9208010d4509e mm/munlock: rmap call mlock_vma_page() munlock_vma_page()
* 9886e4a451624 mm/munlock: delete munlock_vma_pages_all(), allow oomreap
* 963d7e1d5ec62 mm/munlock: delete FOLL_MLOCK and FOLL_POPULATE
* b99933d467204 mm/munlock: delete page_mlock() and all its works
* ee28855a54493 perf/x86/intel: Increase max number of the fixed counters
* 0144ba0c5bd31 KVM: x86: use the KVM side max supported fixed counter
* 2145e77fecfb3 perf/x86/intel: Enable PEBS format 5
* 58b2ff2c18b1e perf/core: Allow kernel address filter when not filtering the kernel
* e5524bf1047eb perf/x86/intel/pt: Fix address filter config for 32-bit kernel
* d680ff24e9e14 perf/core: Fix address filter parser for multiple filters
* 1fb85d06ad675 x86: Share definition of __is_canonical_address()
* c243cecb58e39 perf/x86/intel/pt: Relax address filter validation
* 26291c54e111f (tag: v5.17-rc2) Linux 5.17-rc2

but we don't actually test the HEAD of this branch directly: we have hourly
kernels which contain this branch, we found the regression there, and the
bisection finally pointed to 237b4454014d3. below are some results on this
branch (stress-ng.remap.ops_per_sec):

173fe0fca5b75 mm/thp: shrink_page_list() avoid splitting VM_LOCKED THP      545.56   547.77   562.23   528.83   547.67   550.75
237b4454014d3 mm/munlock: mlock_pte_range() when mlocking or munlocking     681.04   685.94   704.17   647.75   684.73   691.63
c479426e09c80 mm/munlock: maintain page->mlock_count while unevictable     1842.83  1819.69  1875.71  1811.48  1818.57  1772.87
4165ea7156fc5 mm/munlock: replace clear_page_mlock() by final clearance    1324.93  1303.7   1303.47
9208010d4509e mm/munlock: rmap call mlock_vma_page() munlock_vma_page()    1272.6   1301.53  1354.12
963d7e1d5ec62 mm/munlock: delete FOLL_MLOCK and FOLL_POPULATE             16323.56 15509.96 16488.65
26291c54e111f Linux 5.17-rc2                                                874.04   890.31   849.84

our auto-bisect checks the branch HEAD, and in this case it was still low,
so we sent out the report.

>
> >
> > What would be much more interesting is to treat the series of 13 as one,
> > and compare the baseline before any of it against the end of the series:
> > is that something that the 0day robot can easily do?
>
> That would still be more interesting to me - though probably not
> actionable (see below), so not worth you going to any effort.  But
> I hope the bad result on this test did not curtail further testing.

it is kind of difficult for us to treat a series as one, but we're
considering how to fulfill this. Thanks a lot for the suggestion!
and the bad result is not a problem for us :)

>
> >
> > But I'll look more closely at the numbers you've provided later today,
> > numbers that I've snipped off here: there may still be useful things to
> > learn from them; and maybe I'll try following the instructions you've
> > supplied, though I probably won't do a good job of following them.
>
> I did look more closely, didn't try lkp itself, but did observe
> stress-ng --timeout 60 --times --verify --metrics-brief --remap 128
> in the reproduce file, and followed that up (but with 8 not 128).
> In my config, the series managed about half the ops as the baseline.
>
> Comparison of unevictable_pgs in /proc/vmstat was instructive.
>                              Baseline 5.17-rc:  With mm/munlock series applied:
> unevictable_pgs_culled                      17                         53097984
> unevictable_pgs_rescued                     17                         53097984
> unevictable_pgs_mlocked               97251331                         53097984
> unevictable_pgs_munlocked             97251331                         53097984
>
> I was very surprised by those low culled/rescued baseline numbers,
> but a look at stress-remap-file-pages.c shows that each thread is
> repetitively doing mlock of one page, remap_file_pages on that address
> (with implicit munlock of old page and mlock of new), munlock of new.
> Whereas usually, we don't munlock a range immediately after mlocking it.
>
> The baseline's "if (!isolate_lru_page(page)) putback_lru_page(page);"
> technique works very much in its favour on this test: the munlocking
> isolates fail because mlock puts the page back on a pagevec, nothing
> reaches the unevictable list; whereas the mm/munlock series under test
> fastidiously moves each page to unevictable before bringing it back.
>
> This certainly modifies my view of the role of the pagevec, and I
> think it indicates that we *may* find a regression in some realistic
> workload which demands a smarter approach.  I have experimented with
> munlock_page() "cancelling" an mlock found earlier on its pagevec;
> but I very much doubt that the experimental code is correct yet, it
> worked well but not as well as hoped (there are various lru_add_drain()s
> which are limiting it), and it's just not a complication I'd like to get
> into: unless pushed to do so by a realistic workload.
>
> stress-ng --remap is not that realistic workload (and does
> not pretend to be).  I'm glad that it has highlighted the issue,
> thanks to you, but I don't intend to propose any "fix" for this yet.

Thanks a lot for these details! not sure whether the data on that branch
above is of any help?

>
> Hugh
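
P.S. for reference, the per-thread pattern described above boils down to
mlock / remap_file_pages / munlock of a single page in a tight loop. Below is
a rough userspace sketch of that pattern, not the actual
stress-remap-file-pages.c source; the scratch file name, mapping size and
iteration count are made up for illustration:

/* Rough sketch of the per-thread pattern described above: mlock one page,
 * remap_file_pages() a different file page onto the same address (which
 * implicitly munlocks the old page and mlocks the new one, since the VMA
 * stays VM_LOCKED), then munlock it again. */
#define _GNU_SOURCE
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
	long page = sysconf(_SC_PAGESIZE);
	size_t npages = 4;                       /* made-up mapping size */
	int fd = open("remap-scratch", O_CREAT | O_RDWR, 0600);

	if (fd < 0 || ftruncate(fd, npages * page) < 0)
		return 1;

	char *map = mmap(NULL, npages * page, PROT_READ | PROT_WRITE,
			 MAP_SHARED, fd, 0);
	if (map == MAP_FAILED)
		return 1;

	for (int i = 0; i < 100000; i++) {       /* made-up iteration count */
		size_t pgoff = (i + 1) % npages; /* some other page of the file */

		mlock(map, page);                         /* mlock current page */
		remap_file_pages(map, page, 0, pgoff, 0); /* swap in another page */
		munlock(map, page);                       /* munlock the new page */
	}

	munmap(map, npages * page);
	close(fd);
	return 0;
}

As noted above, on the baseline kernel the munlock in each iteration tends to
find the page still sitting on a pagevec, so the isolate fails and nothing
reaches the unevictable list; with the series applied, every iteration moves
the page to the unevictable list and back.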