From: Zhongkun He <hezhongkun.hzk@bytedance.com>
Date: Sat, 23 Mar 2024 10:40:38 +0800
Subject: Re: [External] Re: [RFC PATCH] mm: add folio in swapcache if swapin from zswap
To: Yosry Ahmed
Cc: Johannes Weiner, Barry Song <21cnbao@gmail.com>, chengming.zhou@linux.dev,
 nphamcs@gmail.com, akpm@linux-foundation.org, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org
References: <20240322163939.17846-1-chengming.zhou@linux.dev>
 <20240322234826.GA448621@cmpxchg.org> <20240323015543.GB448621@cmpxchg.org>
On Sat, Mar 23, 2024 at 10:03 AM Yosry Ahmed wrote:
>
> On Fri, Mar 22, 2024 at 6:55 PM Johannes Weiner wrote:
> >
> > On Fri, Mar 22, 2024 at 05:14:37PM -0700, Yosry Ahmed wrote:
> > > [..]
> > > > > > I don't think we want to stop doing exclusive loads in zswap due
> > > > > > to this interaction with zram, which shouldn't be common.
> > > > > >
> > > > > > I think we can solve this by just writing the folio back to zswap
> > > > > > upon failure as I mentioned.
> > > > >
> > > > > Instead of storing again, can we avoid invalidating the entry in the
> > > > > first place if the load is not "exclusive"?
> > > > >
> > > > > The reason for exclusive loads is that the ownership is transferred to
> > > > > the swapcache, so there is no point in keeping our copy. With an
> > > > > optimistic read that doesn't transfer ownership, this doesn't
> > > > > apply. And we can easily tell inside zswap_load() if we're dealing
> > > > > with a swapcache read or not by testing the folio.
> > > > >
> > > > > The synchronous read already has to pin the swp_entry_t to be safe,
> > > > > using swapcache_prepare(). That blocks __read_swap_cache_async() which
> > > > > means no other (exclusive) loads and no invalidates can occur.
> > > > >
> > > > > The zswap entry is freed during the regular swap_free() path, which
> > > > > the sync fault calls on success. Otherwise we keep it.
> > > >
> > > > I thought about this, but I was particularly worried about the need to
> > > > bring back the refcount that was removed when we switched to only
> > > > supporting exclusive loads:
> > > > https://lore.kernel.org/lkml/20240201-b4-zswap-invalidate-entry-v2-6-99d4084260a0@bytedance.com/
> > > >
> > > > It seems to me that we don't need it, because swap_free() will free
> > > > the entry as you mentioned before anyone else has the chance to load
> > > > it or invalidate it. Writeback used to grab a reference as well, but
> > > > it removes the entry from the tree anyway and takes full ownership of
> > > > it then frees it, so that should be okay.
> > > >
> > > > It makes me nervous though, to be honest. For example, not long ago
> > > > swap_free() didn't call zswap_invalidate() directly (it used to happen
> > > > in swap slots cache draining).
> > > > Without it, a subsequent load could race
> > > > with writeback without refcount protection, right? We would need to
> > > > make sure to backport 0827a1fb143f ("mm/zswap: invalidate zswap entry
> > > > when swap entry free") with the fix to stable for instance.
> > > >
> > > > I can't find a problem with your diff, but it just makes me nervous to
> > > > have non-exclusive loads without a refcount.
> > > >
> > > > >
> > > > > diff --git a/mm/zswap.c b/mm/zswap.c
> > > > > index 535c907345e0..686364a6dd86 100644
> > > > > --- a/mm/zswap.c
> > > > > +++ b/mm/zswap.c
> > > > > @@ -1622,6 +1622,7 @@ bool zswap_load(struct folio *folio)
> > > > >         swp_entry_t swp = folio->swap;
> > > > >         pgoff_t offset = swp_offset(swp);
> > > > >         struct page *page = &folio->page;
> > > > > +       bool swapcache = folio_test_swapcache(folio);
> > > > >         struct zswap_tree *tree = swap_zswap_tree(swp);
> > > > >         struct zswap_entry *entry;
> > > > >         u8 *dst;
> > > > > @@ -1634,7 +1635,8 @@ bool zswap_load(struct folio *folio)
> > > > >                 spin_unlock(&tree->lock);
> > > > >                 return false;
> > > > >         }
> > > > > -       zswap_rb_erase(&tree->rbroot, entry);
> > > > > +       if (swapcache)
> > > > > +               zswap_rb_erase(&tree->rbroot, entry);
> > >
> > > On second thought, if we don't remove the entry from the tree here,
> > > writeback could free the entry from under us after we drop the lock
> > > here, right?
> >
> > The sync-swapin does swapcache_prepare() and holds SWAP_HAS_CACHE, so
> > racing writeback would loop on the -EEXIST in __read_swap_cache_async().
> > (Or, if writeback wins the race, sync-swapin fails on swapcache_prepare()
> > instead and bails on the fault.)
> >
> > This isn't coincidental. The sync-swapin needs to, and does, serialize
> > against the swap entry moving into swapcache or being invalidated for
> > it to be safe. Which is the same requirement that zswap ops have.
>
> You are right.
> Even if swap_free() isn't called under SWAP_HAS_CACHE's
> protection, a subsequent load will also be protected by SWAP_HAS_CACHE
> (whether it's swapped in with sync swapin or through the swapcache)
> -- so it would be protected against writeback as well. Now it seems
> like we may have been able to drop the refcount even without exclusive
> loads..?
>
> Anyway, I think your fix is sound. Zhongkun, do you mind confirming
> that the diff Johannes sent fixes the problem for you?

OK, I will try it and come back in a few hours. Thanks for the
solution, it sounds great.