From: "Huang, Ying" <ying.huang@intel.com>
To: Yosry Ahmed <yosryahmed@google.com>
Cc: Chris Li <chrisl@kernel.org>,
	 lsf-pc@lists.linux-foundation.org,
	Johannes Weiner <hannes@cmpxchg.org>,
	 Linux-MM <linux-mm@kvack.org>, Michal Hocko <mhocko@kernel.org>,
	 Shakeel Butt <shakeelb@google.com>,
	David Rientjes <rientjes@google.com>,
	 Hugh Dickins <hughd@google.com>,
	Seth Jennings <sjenning@redhat.com>,
	 Dan Streetman <ddstreet@ieee.org>,
	Vitaly Wool <vitaly.wool@konsulko.com>,
	 Yang Shi <shy828301@gmail.com>, Peter Xu <peterx@redhat.com>,
	 Minchan Kim <minchan@kernel.org>,
	 Andrew Morton <akpm@linux-foundation.org>,
	 Aneesh Kumar K V <aneesh.kumar@linux.ibm.com>,
	 Michal Hocko <mhocko@suse.com>,  Wei Xu <weixugc@google.com>
Subject: Re: [LSF/MM/BPF TOPIC] Swap Abstraction / Native Zswap
Date: Thu, 06 Apr 2023 09:40:45 +0800
Message-ID: <87pm8hyehu.fsf@yhuang6-desk2.ccr.corp.intel.com>
In-Reply-To: <CAJD7tkaQMkAPmcSh7y44efd+M2Dyx63BEm1VVsbQ9bKbu4Woqw@mail.gmail.com> (Yosry Ahmed's message of "Tue, 4 Apr 2023 01:47:00 -0700")

Yosry Ahmed <yosryahmed@google.com> writes:

> On Tue, Apr 4, 2023 at 1:12 AM Huang, Ying <ying.huang@intel.com> wrote:
>>
>> Yosry Ahmed <yosryahmed@google.com> writes:
>>
>> > On Tue, Mar 28, 2023 at 6:33 PM Huang, Ying <ying.huang@intel.com> wrote:
>> >>
>> >> Yosry Ahmed <yosryahmed@google.com> writes:
>> >>
>> >> > On Tue, Mar 28, 2023 at 2:32 PM Chris Li <chrisl@kernel.org> wrote:
>> >> >>
>> >> >> On Tue, Mar 28, 2023 at 02:01:09PM -0700, Yosry Ahmed wrote:
>> >> >> > On Tue, Mar 28, 2023 at 1:50 PM Chris Li <chrisl@kernel.org> wrote:
>> >> >> > >
>> >> >> > > On Tue, Mar 28, 2023 at 12:59:31AM -0700, Yosry Ahmed wrote:
>> >> >> > > > > > I don't have a problem with this approach, it is not really clean as
>> >> >> > > > > > we still treat zswap as a swapfile and have to deal with a lot of
>> >> >> > > > > > unnecessary code like swap slots handling and whatnot.
>> >> >> > > > >
>> >> >> > > > > These are existing code?
>> >> >> > >
>> >> >> > > Yes. The ghost swap file is existing code that has been used at Google
>> >> >> > > for many years.
>> >> >> > >
>> >> >> > > > I was referring to the fact that today with zswap being tied to
>> >> >> > > > swapfiles we do some necessary work such as searching for swap slots
>> >> >> > > > during swapout. The initial swap_desc approach aimed to avoid that.
>> >> >> > > > With this minimal ghost swapfile approach we retain this unfavorable
>> >> >> > > > behavior.
>> >> >> > >
>> >> >> > > Can you explain how you can avoid the free swap entry search
>> >> >> > > in the swap descriptor world?
>> >> >> >
>> >> >> > For zswap, in the swap descriptor world, you just need to allocate a
>> >> >> > struct zswap_entry and have the swap descriptor point to it. No need
>> >> >> > for swap slot management since we are not tied to a swapfile and pages
>> >> >> > in zswap do not have a specific position.
>> >> >>
>> >> >> Your swap descriptor will still use one swp_entry_t, which you get from
>> >> >> the PTE for the lookup, right? That is the swap entry I am talking about.
>> >> >> You just replace the zswap swap entry with the swap descriptor swap entry.
>> >> >> You still need to allocate from the free swap entry space at least once.
>> >> >
>> >> > Oh, you mean the swap ID space. We just need to find an unused ID, we
>> >> > can simply use an allocating xarray
>> >> > (https://docs.kernel.org/core-api/xarray.html#allocating-xarrays).
>> >> > This is simpler than keeping track of swap slots in a swapfile.
>> >>
>> >> If we want to implement the swap entry management inside the zswap
>> >> implementation (instead of reusing swap_map[]), then the allocating
>> >> xarray can be used too.  Some per-entry data (such as swap count, etc.)
>> >> can be stored there.  I understand that this isn't perfect (one more
>> >> xarray lookup, one more data structure, etc.), but this is an option
>> >> too.
>> >
>> > My main concern here would be having two separate swap counting
>> > implementations -- although it might not be the end of the world.
>>
>> This isn't a big issue for me.  For file systems, there is duplicated
>> functionality across different file system implementations, such as free
>> block space management.  Instead, I hope we can design a better swap
>> implementation in the future.
>>
>> > It would be useful to consider all the options. So far, I think we
>> > have been discussing 3 alternatives:
>> >
>> > (a) The initial swap_desc proposal.
>>
>> My main concern with the initial swap_desc proposal is that, per my
>> understanding, the zswap code is put in the swap core instead of the
>> zswap implementation.  So zswap isn't another swap implementation
>> encapsulated behind a common interface.  Please correct me if my
>> understanding is wrong.
>>
>> If so, the cost is the flexibility of the swap system.  For example,
>> zswap may be always at the highest priority among all swap devices.  We
>> can move the cold page from zswap to some swap device.  But we cannot
>> move the cold page from some swap device to zswap.
>
>
> Not really. In the swap_desc proposal, I intended to have struct
> swap_desc contain either a swap device entry (swp_entry_t) or a
> frontswap entry (a pointer). zswap implementation would not be in the
> swap core, instead, we would have two swap implementations: swap
> devices and frontswap/zswap -- each of which implement a common swap
> API. We can use one of the free bits to distinguish the type of the
> underlying entry (swp_entry_t or pointer to frontswap/zswap entry).
>
> We can start by only supporting moving pages from frontswap/zswap to
> swap devices, but I don't see why the same design would not support
> pages moving in the other direction if the need arises.
>
> The number of free bits in swp_entry_t and pointers is limited (2 bits
> on 32-bit systems, 3 bits on 64-bit systems), so there are only a
> handful of different swap types we can support with the swap_desc
> design, but we only need two to begin with. If in the future we need
> more, we can add an indirection layer then or expand swap_desc -- or
> we can encode the data within the swap device itself (how it compares
> to frontswap/zswap).
>
> In summary, the swap_desc proposal does NOT involve moving zswap code
> to core swap, it involves a generic swap API with two implementations:
> swap devices and frontswap/zswap.

This eliminates my main concerns!  Thanks!
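
As a rough illustration of the tagged-entry idea quoted above (one word
that holds either a swap device entry or a pointer to a zswap entry, with
a free low bit selecting the type), a userspace sketch might look like the
following.  This is not code from any posted patch.  The struct layouts,
the SWAP_DESC_* constants, and every helper name are hypothetical, and the
sketch assumes the low bit of both encodings is available.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for illustration; not the kernel's definitions. */
typedef struct { uintptr_t val; } swp_entry_t;
struct zswap_entry { unsigned int length; };

/* Assumption: bit 0 of the stored word selects the backend type. */
#define SWAP_DESC_TYPE_MASK ((uintptr_t)1)
#define SWAP_DESC_ZSWAP     ((uintptr_t)1)

/* One word: either a shifted swp_entry_t or a tagged zswap_entry pointer. */
struct swap_desc { uintptr_t entry; };

/* Store a swap device entry; shifting left leaves bit 0 clear
 * (the sketch gives up the top bit of the value to make room). */
static struct swap_desc swap_desc_from_dev(swp_entry_t e)
{
	return (struct swap_desc){ .entry = e.val << 1 };
}

/* Store a zswap entry pointer; its alignment must leave bit 0 free. */
static struct swap_desc swap_desc_from_zswap(struct zswap_entry *z)
{
	assert(((uintptr_t)z & SWAP_DESC_TYPE_MASK) == 0);
	return (struct swap_desc){ .entry = (uintptr_t)z | SWAP_DESC_ZSWAP };
}

/* Report which backend the descriptor currently refers to. */
static int swap_desc_is_zswap(struct swap_desc d)
{
	return (d.entry & SWAP_DESC_TYPE_MASK) == SWAP_DESC_ZSWAP;
}

/* Recover the zswap entry pointer by masking off the tag bit. */
static struct zswap_entry *swap_desc_zswap(struct swap_desc d)
{
	return (struct zswap_entry *)(d.entry & ~SWAP_DESC_TYPE_MASK);
}

/* Recover the swap device entry by undoing the shift. */
static swp_entry_t swap_desc_dev(struct swap_desc d)
{
	return (swp_entry_t){ .val = d.entry >> 1 };
}
```

With only one tag bit the sketch distinguishes two backends, which matches
the "we only need two to begin with" point; using more of the free bits
would allow a few more types before an indirection layer is needed.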

> The only problems I see with the swap_desc design are:
> - Extra overhead for users using swapfiles only.
> - A bigger leap from what we have today than other ideas proposed
> (e.g. virtual swap device for zswap).

Yes.

Best Regards,
Huang, Ying

>>
>>
>> Maybe compression is always faster than any other swap devices, so we
>> will never need the flexibility.  Maybe the cost to hide zswap behind a
>> common interface is unacceptable.  I'm open to these.  But please
>> provide the evidence, and maybe data.
>>
>> Best Regards,
>> Huang, Ying
>>
>> > (b) Add an optional indirection layer that can move swap entries
>> > between swap devices and add a virtual swap device for zswap in the
>> > kernel.
>> > (c) Add an optional indirection layer that can move entries between
>> > different swap backends. Swap backends would be zswap & swap devices
>> > for now. Zswap needs to implement swap entry management, swap
>> > counting, etc.
>> >
>> > Does this accurately summarize what we have discussed so far?
>> >

