linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Yosry Ahmed <yosryahmed@google.com>
To: Johannes Weiner <hannes@cmpxchg.org>
Cc: Barry Song <21cnbao@gmail.com>,
	chengming.zhou@linux.dev, nphamcs@gmail.com,
	 akpm@linux-foundation.org, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org,
	 Zhongkun He <hezhongkun.hzk@bytedance.com>
Subject: Re: [RFC PATCH] mm: add folio in swapcache if swapin from zswap
Date: Fri, 22 Mar 2024 17:14:37 -0700	[thread overview]
Message-ID: <CAJD7tkZqrrXuYTMYOAP+arMLeNayafFeLocWu7bJtDFHCYjwDA@mail.gmail.com> (raw)
In-Reply-To: <CAJD7tkY2y_nGRq9ft80op6q0B3tfJvtyqYhS6t+x=TpyGy+AXg@mail.gmail.com>

[..]
> > > I don't think we want to stop doing exclusive loads in zswap due to this
> > > interaction with zram, which shouldn't be common.
> > >
> > > I think we can solve this by just writing the folio back to zswap upon
> > > failure as I mentioned.
> >
> > Instead of storing again, can we avoid invalidating the entry in the
> > first place if the load is not "exclusive"?
> >
> > The reason for exclusive loads is that the ownership is transferred to
> > the swapcache, so there is no point in keeping our copy. With an
> > optimistic read that doesn't transfer ownership, this doesn't
> > apply. And we can easily tell inside zswap_load() if we're dealing
> > with a swapcache read or not by testing the folio.
> >
> > The synchronous read already has to pin the swp_entry_t to be safe,
> > using swapcache_prepare(). That blocks __read_swap_cache_async() which
> > means no other (exclusive) loads and no invalidates can occur.
> >
> > The zswap entry is freed during the regular swap_free() path, which
> > the sync fault calls on success. Otherwise we keep it.
>
> I thought about this, but I was particularly worried about the need to
> bring back the refcount that was removed when we switched to only
> supporting exclusive loads:
> https://lore.kernel.org/lkml/20240201-b4-zswap-invalidate-entry-v2-6-99d4084260a0@bytedance.com/
>
> It seems to be that we don't need it, because swap_free() will free
> the entry as you mentioned before anyone else has the chance to load
> it or invalidate it. Writeback used to grab a reference as well, but
> it removes the entry from the tree anyway and takes full ownership of
> it then frees it, so that should be okay.
>
> It makes me nervous though to be honest. For example, not long ago
> swap_free() didn't call zswap_invalidate() directly (used to happen to
> swap slots cache draining). Without it, a subsequent load could race
> with writeback without refcount protection, right? We would need to
> make sure to backport 0827a1fb143f ("mm/zswap: invalidate zswap entry
> when swap entry free") with the fix to stable for instance.
>
> I can't find a problem with your diff, but it just makes me nervous to
> have non-exclusive loads without a refcount.
>
> >
> > diff --git a/mm/zswap.c b/mm/zswap.c
> > index 535c907345e0..686364a6dd86 100644
> > --- a/mm/zswap.c
> > +++ b/mm/zswap.c
> > @@ -1622,6 +1622,7 @@ bool zswap_load(struct folio *folio)
> >         swp_entry_t swp = folio->swap;
> >         pgoff_t offset = swp_offset(swp);
> >         struct page *page = &folio->page;
> > +       bool swapcache = folio_test_swapcache(folio);
> >         struct zswap_tree *tree = swap_zswap_tree(swp);
> >         struct zswap_entry *entry;
> >         u8 *dst;
> > @@ -1634,7 +1635,8 @@ bool zswap_load(struct folio *folio)
> >                 spin_unlock(&tree->lock);
> >                 return false;
> >         }
> > -       zswap_rb_erase(&tree->rbroot, entry);
> > +       if (swapcache)
> > +               zswap_rb_erase(&tree->rbroot, entry);

On second thought, if we don't remove the entry from the tree here,
writeback could free the entry from under us after we drop the lock
here, right?

> >         spin_unlock(&tree->lock);
> >
> >         if (entry->length)
> > @@ -1649,9 +1651,10 @@ bool zswap_load(struct folio *folio)
> >         if (entry->objcg)
> >                 count_objcg_event(entry->objcg, ZSWPIN);
> >
> > -       zswap_entry_free(entry);
> > -
> > -       folio_mark_dirty(folio);
> > +       if (swapcache) {
> > +               zswap_entry_free(entry);
> > +               folio_mark_dirty(folio);
> > +       }
> >
> >         return true;
> >  }


  reply	other threads:[~2024-03-23  0:15 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-03-22 16:39 [RFC PATCH] mm: add folio in swapcache if swapin from zswap chengming.zhou
2024-03-22 19:37 ` Yosry Ahmed
2024-03-22 21:41   ` Barry Song
2024-03-22 22:33     ` Yosry Ahmed
2024-03-22 23:48       ` Johannes Weiner
2024-03-23  0:12         ` Yosry Ahmed
2024-03-23  0:14           ` Yosry Ahmed [this message]
2024-03-23  1:55             ` Johannes Weiner
2024-03-23  2:02               ` Yosry Ahmed
2024-03-23  2:40                 ` [External] " Zhongkun He
2024-03-23  8:38         ` Zhongkun He
2024-03-23  2:49   ` Chengming Zhou

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAJD7tkZqrrXuYTMYOAP+arMLeNayafFeLocWu7bJtDFHCYjwDA@mail.gmail.com \
    --to=yosryahmed@google.com \
    --cc=21cnbao@gmail.com \
    --cc=akpm@linux-foundation.org \
    --cc=chengming.zhou@linux.dev \
    --cc=hannes@cmpxchg.org \
    --cc=hezhongkun.hzk@bytedance.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=nphamcs@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).