From: Yosry Ahmed <yosryahmed@google.com>
Date: Fri, 22 Mar 2024 17:14:37 -0700
Subject: Re: [RFC PATCH] mm: add folio in swapcache if swapin from zswap
To: Johannes Weiner
Cc: Barry Song <21cnbao@gmail.com>, chengming.zhou@linux.dev, nphamcs@gmail.com,
 akpm@linux-foundation.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
 Zhongkun He
References: <20240322163939.17846-1-chengming.zhou@linux.dev>
 <20240322234826.GA448621@cmpxchg.org>
Content-Type: text/plain; charset="UTF-8"

[..]
> > > I don't think we want to stop doing exclusive loads in zswap due to this
> > > interaction with zram, which shouldn't be common.
> > >
> > > I think we can solve this by just writing the folio back to zswap upon
> > > failure as I mentioned.
> >
> > Instead of storing again, can we avoid invalidating the entry in the
> > first place if the load is not "exclusive"?
> >
> > The reason for exclusive loads is that the ownership is transferred to
> > the swapcache, so there is no point in keeping our copy. With an
> > optimistic read that doesn't transfer ownership, this doesn't
> > apply. And we can easily tell inside zswap_load() if we're dealing
> > with a swapcache read or not by testing the folio.
> >
> > The synchronous read already has to pin the swp_entry_t to be safe,
> > using swapcache_prepare(). That blocks __read_swap_cache_async() which
> > means no other (exclusive) loads and no invalidates can occur.
> >
> > The zswap entry is freed during the regular swap_free() path, which
> > the sync fault calls on success. Otherwise we keep it.
>
> I thought about this, but I was particularly worried about the need to
> bring back the refcount that was removed when we switched to only
> supporting exclusive loads:
> https://lore.kernel.org/lkml/20240201-b4-zswap-invalidate-entry-v2-6-99d4084260a0@bytedance.com/
>
> It seems that we don't need it, because swap_free() will free the
> entry, as you mentioned, before anyone else has the chance to load it
> or invalidate it. Writeback used to grab a reference as well, but it
> removes the entry from the tree anyway, takes full ownership of it,
> and then frees it, so that should be okay.
>
> It makes me nervous though, to be honest. For example, not long ago
> swap_free() didn't call zswap_invalidate() directly (it used to happen
> during swap slot cache draining). Without it, a subsequent load could
> race with writeback without refcount protection, right? We would need
> to make sure to backport 0827a1fb143f ("mm/zswap: invalidate zswap
> entry when swap entry free") with the fix to stable, for instance.
>
> I can't find a problem with your diff, but it just makes me nervous to
> have non-exclusive loads without a refcount.
> >
> > diff --git a/mm/zswap.c b/mm/zswap.c
> > index 535c907345e0..686364a6dd86 100644
> > --- a/mm/zswap.c
> > +++ b/mm/zswap.c
> > @@ -1622,6 +1622,7 @@ bool zswap_load(struct folio *folio)
> >         swp_entry_t swp = folio->swap;
> >         pgoff_t offset = swp_offset(swp);
> >         struct page *page = &folio->page;
> > +       bool swapcache = folio_test_swapcache(folio);
> >         struct zswap_tree *tree = swap_zswap_tree(swp);
> >         struct zswap_entry *entry;
> >         u8 *dst;
> > @@ -1634,7 +1635,8 @@ bool zswap_load(struct folio *folio)
> >                 spin_unlock(&tree->lock);
> >                 return false;
> >         }
> > -       zswap_rb_erase(&tree->rbroot, entry);
> > +       if (swapcache)
> > +               zswap_rb_erase(&tree->rbroot, entry);

On second thought, if we don't remove the entry from the tree here,
writeback could free the entry from under us after we drop the lock
here, right?

> >         spin_unlock(&tree->lock);
> >
> >         if (entry->length)
> > @@ -1649,9 +1651,10 @@ bool zswap_load(struct folio *folio)
> >         if (entry->objcg)
> >                 count_objcg_event(entry->objcg, ZSWPIN);
> >
> > -       zswap_entry_free(entry);
> > -
> > -       folio_mark_dirty(folio);
> > +       if (swapcache) {
> > +               zswap_entry_free(entry);
> > +               folio_mark_dirty(folio);
> > +       }
> >
> >         return true;
> >  }
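
To make that concern concrete, here is a rough sketch (not a real patch)
of what a refcount-protected non-exclusive load could look like. It
assumes hypothetical zswap_entry_get()/zswap_entry_put() helpers along
the lines of the refcount that was removed, where the last put frees the
entry; the zero-filled-page path and the stats counters are omitted, and
zswap_decompress() stands in for whatever the decompression call is in
the current code:

/*
 * Sketch only: zswap_entry_get()/zswap_entry_put() are hypothetical
 * pin/unpin helpers, similar to what existed before exclusive loads.
 * The last put is assumed to free the entry.
 */
bool zswap_load(struct folio *folio)
{
        swp_entry_t swp = folio->swap;
        pgoff_t offset = swp_offset(swp);
        struct page *page = &folio->page;
        bool swapcache = folio_test_swapcache(folio);
        struct zswap_tree *tree = swap_zswap_tree(swp);
        struct zswap_entry *entry;

        spin_lock(&tree->lock);
        entry = zswap_rb_search(&tree->rbroot, offset);
        if (!entry) {
                spin_unlock(&tree->lock);
                return false;
        }
        if (swapcache) {
                /*
                 * Exclusive load: take the entry out of the tree; the
                 * tree's reference is transferred to us and dropped
                 * below.
                 */
                zswap_rb_erase(&tree->rbroot, entry);
        } else {
                /*
                 * Non-exclusive load: leave the entry in the tree, but
                 * pin it so concurrent writeback cannot free it while
                 * we decompress without the tree lock held.
                 */
                zswap_entry_get(entry);
        }
        spin_unlock(&tree->lock);

        if (entry->length)
                zswap_decompress(entry, page);

        spin_lock(&tree->lock);
        /*
         * Drop the reference we hold: the final free for the exclusive
         * case, a plain decrement otherwise (the tree still holds its
         * own reference).
         */
        zswap_entry_put(entry);
        spin_unlock(&tree->lock);

        if (swapcache)
                folio_mark_dirty(folio);

        return true;
}

The point of the sketch is only that pinning the entry before dropping
tree->lock closes the window in which writeback could free it, at the
cost of bringing the refcount back.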