Date: Tue, 2 Jun 2020 12:47:26 -0400
From: Johannes Weiner
To: Joonsoo Kim
Cc: Linux Memory Management List, Rik van Riel, Minchan Kim, Michal Hocko, Andrew Morton, Joonsoo Kim, LKML, kernel-team@fb.com
Subject: Re: [PATCH 05/14] mm: workingset: let cache workingset challenge anon
Message-ID: <20200602164726.GA225032@cmpxchg.org>
References: <20200520232525.798933-6-hannes@cmpxchg.org> <20200527134333.GF6781@cmpxchg.org> <20200528170155.GA69521@cmpxchg.org> <20200529151228.GA92892@cmpxchg.org> <20200601155615.GA131075@cmpxchg.org>

On Tue, Jun 02, 2020 at 11:34:17AM +0900, Joonsoo Kim wrote:
> On Tue, Jun 2, 2020 at 12:56 AM, Johannes Weiner wrote:
> > On Mon, Jun 01, 2020 at 03:14:24PM +0900, Joonsoo Kim wrote:
> > > But, I still think that modified refault activation equation isn't
> > > safe. The next problem I found is related to the scan ratio limit
> > > patch ("limit the range of LRU type balancing") on this series.
> > > See the below example.
> > >
> > > anon: Hot (X M)
> > > file: Hot (200 M) / dummy (200 M)
> > > P: 1200 M (3 parts, each one 400 M, P1, P2, P3)
> > > Access Pattern: A -> F(H) -> P1 -> A -> F(H) -> P2 -> ... ->
> > >
> > > Without this patch, A and F(H) are kept on the memory and look like
> > > it's correct.
> > >
> > > With this patch and below fix, refault equation for Pn would be:
> > >
> > > Refault dist of Pn = 1200 (from file non-resident) + 1200 * anon scan
> > > ratio (from anon non-resident)
> > > anon + active file = X + 200
> > > 1200 + 1200 * anon scan ratio (0.5 ~ 2) < X + 200
> >
> > That doesn't look quite right to me. The anon part of the refault
> > distance is driven by X, so the left-hand of this formula contains X
> > as well.
> >
> > 1000 file (1200M reuse distance, 200M in-core size) + F(H) reactivations + X * scan ratio < X + 1000
>
> As I said before, there is no X on left-hand of this formula. To
> access all Pn and re-access P1, we need 1200M file list scan and
> reclaim. More scan isn't needed. With your patch "limit the range of
> LRU type balancing", scan ratio between file/anon list is limited to
> 0.5 ~ 2.0, so, maximum anon scan would be 1200 M * 2.0, that is,
> 2400 M and not bounded by X. That means that file list cannot be
> stable with some X.

Oh, no X on the left because you're talking about the number of pages
scanned until the first refaults, which is fixed - so why are we still
interpreting the refault distance against a variable anon size X?

Well, that's misleading. We compare against anon because part of the
cache is already encoded in the refault distance. What we're really
checking is access distance against the total amount of available RAM.

Consider this. We want to activate pages where

	access_distance <= RAM

and our measure of access distance is:

	access_distance = refault_distance + inactive_file

So the comparison becomes:

	refault_distance + inactive_file < RAM

which, with RAM = active_file + inactive_file + anon, we simplify to:

	refault_distance < active_file + anon

There is a certain threshold for X simply because there is a certain
threshold for RAM beyond which we need to start activating.
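As a sanity check on the simplification above, here is a toy model in
Python (not kernel code). It assumes memory is fully occupied and splits
exactly into active_file + inactive_file + anon, all in the same unit
(say, MB). The function names and the sizes are made up for illustration.

```python
def activate_by_access_distance(refault_distance, inactive_file, ram):
    """Activate if the estimated access distance fits into RAM."""
    access_distance = refault_distance + inactive_file
    return access_distance < ram


def activate_by_workingset(refault_distance, active_file, anon):
    """The same test with inactive_file subtracted from both sides."""
    return refault_distance < active_file + anon


def rules_agree(refault_distance, active_file, inactive_file, anon):
    # Assumption: RAM is exactly the sum of the three pools.
    ram = active_file + inactive_file + anon
    return (activate_by_access_distance(refault_distance, inactive_file, ram)
            == activate_by_workingset(refault_distance, active_file, anon))


if __name__ == "__main__":
    # Hypothetical sizes, chosen only to exercise both outcomes
    # around the threshold active_file + anon = 600.
    for refault_distance in (0, 100, 599, 600, 601, 5000):
        assert rules_agree(refault_distance, active_file=200,
                           inactive_file=400, anon=400)
    print("both forms give the same answer")
```

Under that assumption the two forms are the same inequality, which is
why comparing against active_file + anon is really a comparison against
RAM.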
X cannot be arbitrary; it must be X + cache filling up memory - after
all, we have page reclaim evicting pages.

Again, this isn't new. In the current code, we activate when:

	refault_distance < active_file

which is

	access_distance <= RAM - anon

As you can see, whether things are stable or not always depends on the
existing workingset size. It's just a proxy for how much total RAM we
have potentially available to the refaulting page.

> If my lastly found example is a correct example (your confirm is required),
> it is also related to the correctness issue since cold pages causes
> eviction of the hot pages repeatedly.

I think your example is correct, but it benefits from the VM
arbitrarily making an assumption that has a 50/50 shot of being true.

You and I know which pages are hot and which are cold because you
designed the example.

All the VM sees is this:

- We have an established workingset that has previously shown an
  access distance <= RAM and therefore was activated.

- We now have another set that also appears to have an access distance
  <= RAM.

The only way to know for sure, however, is to sample the established
workingset and compare the relative access frequencies.

Currently, we just assume the incoming pages are colder. Clearly
that's beneficial when it's true. Clearly that can be totally wrong.

We must allow a fair comparison between these two sets.

For cache, that's already the case - that's why I brought up the
cache-only example: if refault distances are 50M and you have 60M of
active cache, we activate all refaults and force an even competition
between the established workingset and the new pages.

Whether we can protect active file when anon needs to shrink first and
can't (the activate/iocost split) is a different question. But I'm
no longer so sure after looking into it further.

First, we would need two different refault distances: either we
consider anon age and need to compare to active_file + anon, or we
don't and compare to active_file only.
We cannot mix willy-nilly, because the metrics wouldn't be comparable.
We don't have the space to store two different eviction timestamps,
nor could we afford to cut the precision in half.

Second, there is the additional page flag required to implement it.

Third, it's somewhat moot because we still have the same behavior when
active_file would need to shrink and can't. There can't be a stable
state as long as refault distances <= active_file.

> In this case, they (without patch, with patch) all have some correctness
> issue so we need to judge which one is better in terms of overall impact.
> I don't have strong opinion about it so it's up to you to decide the way to go.

If my patch was simply changing the default assumption on which pages
are hot and which are cold, I would agree with you - the pros would be
equal to the cons, one way wouldn't be more correct than the other.

But that isn't what my patch is doing. What it does is get rid of the
assumption, to actually sample and compare the access frequencies when
there isn't enough data to make an informed decision.

That's a net improvement.
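For reference, the two activation rules compared in this mail can be
put side by side in a small Python toy model (again, not kernel code).
The function names are hypothetical. The 50M/60M case is the cache-only
example from above; the second case uses made-up sizes to show where
the rules diverge.

```python
def activates_old(refault_distance, active_file):
    # Rule described above as "the current code": anon is ignored.
    return refault_distance < active_file


def activates_new(refault_distance, active_file, anon):
    # Rule discussed in this thread: anon counts toward the workingset.
    return refault_distance < active_file + anon


if __name__ == "__main__":
    cases = [
        # (label, refault_distance, active_file, anon), sizes in MB
        ("cache-only example", 50, 60, 0),
        ("hypothetical, large anon", 300, 200, 400),
    ]
    for label, dist, active_file, anon in cases:
        old = activates_old(dist, active_file)
        new = activates_new(dist, active_file, anon)
        print(f"{label}: old={old} new={new}")
```

In the cache-only case both rules activate the refaulting pages. In the
second case only the new rule does, so the refaulting pages get to
compete with the established workingset instead of being assumed
colder.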