Date: Wed, 24 Oct 2012 22:45:52 +0300
From: "Kirill A. Shutemov"
To: Andrew Morton
Cc: "Kirill A. Shutemov", Andrea Arcangeli, linux-mm@kvack.org, Andi Kleen, "H. Peter Anvin", linux-kernel@vger.kernel.org
Subject: Re: [PATCH v4 10/10] thp: implement refcounting for huge zero page
Message-ID: <20121024194552.GA24460@otc-wbsnb-06>
In-Reply-To: <20121024122253.5ecea992.akpm@linux-foundation.org>

On Wed, Oct 24, 2012 at 12:22:53PM -0700, Andrew Morton wrote:
> On Wed, 24 Oct 2012 02:38:01 +0300
> "Kirill A. Shutemov" wrote:
>
> > On Tue, Oct 23, 2012 at 03:59:15PM -0700, Andrew Morton wrote:
> > > On Tue, 23 Oct 2012 10:00:18 +0300
> > > "Kirill A. Shutemov" wrote:
> > >
> > > > > Well, how hard is it to trigger the bad behavior?  One can easily
> > > > > create a situation in which that page's refcount frequently switches
> > > > > from 0 to 1 and back again.  And one can easily create a situation in
> > > > > which the shrinkers are being called frequently.  Run both at the same
> > > > > time and what happens?
> > > >
> > > > If the goal is to trigger bad behavior then:
> > > >
> > > > 1. read from an area where a huge page can be mapped to get huge zero page
> > > >    mapped. hzp is allocated here. refcounter == 2.
> > > > 2. write to the same page. refcounter == 1.
> > > > 3. echo 3 > /proc/sys/vm/drop_caches. refcounter == 0 -> free the hzp.
> > > > 4. goto 1.
> > > >
> > > > But it's unrealistic. /proc/sys/vm/drop_caches is only root-accessible.
> > >
> > > Yes, drop_caches is uninteresting.
> > >
> > > > We can trigger shrinker only under memory pressure. But in this case, most
> > > > likely we will get -ENOMEM on hzp allocation and will go to fallback path
> > > > (4k zero page).
> > >
> > > I disagree.  If, for example, there is a large amount of clean
> > > pagecache being generated then the shrinkers will be called frequently
> > > and memory reclaim will be running at a 100% success rate.  The
> > > hugepage allocation will be successful in such a situation?
> >
> > Yes.
> >
> > Shrinker callbacks are called from shrink_slab() which happens after page
> > cache reclaim, so on the next reclaim round the page cache will be reclaimed
> > first and we will avoid a frequent alloc-free pattern.
>
> I don't understand this.  If reclaim is running continuously (which can
> happen pretty easily: "dd if=/fast-disk/large-file") then the zero page
> will be whipped away very shortly after its refcount has fallen to
> zero.
>
> > One more thing we can do: increase shrinker->seeks to something like
> > DEFAULT_SEEKS * 4. In this case shrink_slab() will call our callback after
> > callbacks with DEFAULT_SEEKS.
>
> It would be useful if you could try to make this scenario happen.  If
> for some reason it doesn't happen then let's understand *why* it
> doesn't happen.
>
> I'm thinking that such a workload would be the above dd in parallel
> with a small app which touches the huge page and then exits, then gets
> executed again.  That "small app" sounds realistic to me.  Obviously
> one could exercise the zero page's refcount at higher frequency with a
> tight map/touch/unmap loop, but that sounds less realistic.  It's worth
> trying that exercise as well though.
>
> Or do something else.  But we should try to probe this code's
> worst-case behaviour, get an understanding of its effects and then
> decide whether any such workload is realistic enough to worry about.

Okay, I'll try a few memory pressure scenarios.

Meanwhile, could you take patches 01-09? Patch 09 implements a simpler
allocation scheme. It would be nice to get all the other code tested.
Or do you see any other blocker?

--
Kirill A. Shutemov
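
For reference, the tight map/touch/unmap loop mentioned in the quoted text could be sketched roughly as below. This is only an illustration under stated assumptions, not a test from the patch series: it assumes Linux with transparent huge pages enabled and a 2 MiB huge page size, and the names `HPAGE_SIZE`, `touch_once` and `run_loop` are made up for the example. Whether the read fault actually maps the huge zero page depends on the kernel and its THP settings.

```python
import mmap

# Assumption: 2 MiB huge pages (the usual x86-64 size); adjust if needed.
HPAGE_SIZE = 2 * 1024 * 1024


def touch_once(length=2 * HPAGE_SIZE):
    """Map anonymous memory, read it, write it, unmap it.

    Returns the byte that was read before the write.
    """
    # Private anonymous mapping, twice the huge page size, so that it
    # contains at least one fully huge-page-aligned range.
    m = mmap.mmap(-1, length, flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS)
    try:
        # Hint that huge pages are wanted. Needs Python 3.8+ on Linux;
        # skipped otherwise, and a refusal by the kernel is ignored.
        if hasattr(m, "madvise") and hasattr(mmap, "MADV_HUGEPAGE"):
            try:
                m.madvise(mmap.MADV_HUGEPAGE)
            except OSError:
                pass
        # Offset HPAGE_SIZE always falls inside an aligned huge-page range
        # of a mapping this large.
        value = m[HPAGE_SIZE]   # read fault: may map the huge zero page
        m[HPAGE_SIZE] = 1       # write fault: replaces the zero mapping
        return value
    finally:
        m.close()               # unmap


def run_loop(iterations):
    """Repeat map/touch/unmap; return how many reads saw a zero byte."""
    zero_reads = 0
    for _ in range(iterations):
        if touch_once() == 0:
            zero_reads += 1
    return zero_reads
```

Calling `run_loop()` with a large iteration count while something like the dd from the quoted text generates clean page cache in parallel would approximate the higher-frequency case; running a single `touch_once()` per process invocation is closer to the "small app" variant.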