From: Yosry Ahmed <yosryahmed@google.com>
To: lsf-pc@lists.linux-foundation.org, Johannes Weiner <hannes@cmpxchg.org>
Cc: Linux-MM <linux-mm@kvack.org>, Michal Hocko <mhocko@kernel.org>,
	Shakeel Butt <shakeelb@google.com>,
	David Rientjes <rientjes@google.com>,
	Hugh Dickins <hughd@google.com>,
	Seth Jennings <sjenning@redhat.com>,
	Dan Streetman <ddstreet@ieee.org>,
	Vitaly Wool <vitaly.wool@konsulko.com>,
	Yang Shi <shy828301@gmail.com>, Peter Xu <peterx@redhat.com>,
	Minchan Kim <minchan@kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>
Subject: Re: [LSF/MM/BPF TOPIC] Swap Abstraction / Native Zswap
Date: Thu, 11 May 2023 20:07:36 -0700
Message-ID: <CAJD7tkYb_sGN8mfGVjr2JxdB8Pz8Td=yj9_sBCMrmsKQo56vTg@mail.gmail.com>
In-Reply-To: <CAJD7tkbCnXJ95Qow_aOjNX6NOMU5ovMSHRC+95U4wtW6cM+puw@mail.gmail.com>

On Sat, Feb 18, 2023 at 2:38 PM Yosry Ahmed <yosryahmed@google.com> wrote:
>
> Hello everyone,
>
> I would like to propose a topic for the upcoming LSF/MM/BPF in May
> 2023 about swap & zswap (hope I am not too late).
>
> ==================== Intro ====================
> Currently, using zswap is dependent on swapfiles in an unnecessary
> way. To use zswap, you need a swapfile configured (even if the space
> will not be used) and zswap is restricted by its size. When pages
> reside in zswap, the corresponding swap entry in the swapfile cannot
> be used, and is essentially wasted. We also go through unnecessary
> code paths when using zswap, such as finding and allocating a swap
> entry on the swapout path, or readahead in the swapin path. I am
> proposing a swapping abstraction layer that would allow us to remove
> zswap's dependency on swapfiles. This can be done by introducing a
> data structure between the actual swapping implementation (swapfiles,
> zswap) and the rest of the MM code.
>
> ==================== Objective ====================
> Enabling the use of zswap without a backing swapfile, which makes
> zswap useful for a wider variety of use cases. Also, when zswap is
> used with a swapfile, the pages in zswap do not use up space in the
> swapfile, so the overall swapping capacity increases.
>
> ==================== Idea ====================
> Introduce a data structure, which I currently call a swap_desc, as an
> abstraction layer between the swapping implementation and the rest of
> the MM code. Page tables & page caches would store a swap id (encoded as a
> swp_entry_t) instead of directly storing the swap entry associated
> with the swapfile. This swap id maps to a struct swap_desc, which acts
> as our abstraction layer. All MM code not concerned with swapping
> details would operate in terms of swap descs. The swap_desc can point
> to either a normal swap entry (associated with a swapfile) or a zswap
> entry. It can also include all non-backend specific operations, such
> as the swapcache (which would be a simple pointer in swap_desc), swap
> counting, etc. It creates a clear, nice abstraction layer between MM
> code and the actual swapping implementation.
>
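A rough userspace sketch of the idea above, to make the indirection concrete. Everything in it is an assumption for illustration only: `swap_id_t`, the field names, the fixed-size lookup table, and the helper functions are hypothetical and are not kernel API (the proposal encodes the id as a swp_entry_t and does not say how the id-to-descriptor lookup is implemented).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace model; none of these names exist in the kernel. */

enum swap_backend { SWAP_BACKEND_SWAPFILE, SWAP_BACKEND_ZSWAP };

/* A slot in a swapfile: the classic (type, offset) pair. */
struct swapfile_slot {
    unsigned int type;
    uint64_t offset;
};

/* Stand-in for a compressed zswap entry. */
struct zswap_obj {
    size_t compressed_len;
};

/* The abstraction layer: backend pointer plus backend-independent state. */
struct swap_desc {
    enum swap_backend backend;
    union {
        struct swapfile_slot slot;   /* valid for SWAP_BACKEND_SWAPFILE */
        struct zswap_obj *zswap;     /* valid for SWAP_BACKEND_ZSWAP */
    } u;
    void *swapcache;                 /* cached page, or NULL */
    unsigned int swap_count;         /* number of references to this id */
};

/* What page tables and page caches would store instead of a swap entry. */
typedef struct { uint32_t val; } swap_id_t;

#define SWAP_ID_MAX 1024

/* Toy id -> descriptor table, chosen only to keep the sketch short. */
struct swap_desc *swap_desc_table[SWAP_ID_MAX];

/* Bind an id to a descriptor; fails if the id is out of range or taken. */
int swap_desc_install(swap_id_t id, struct swap_desc *desc)
{
    if (id.val >= SWAP_ID_MAX || swap_desc_table[id.val])
        return -1;
    swap_desc_table[id.val] = desc;
    return 0;
}

/* Resolve an id to its descriptor, or NULL if there is none. */
struct swap_desc *swap_desc_lookup(swap_id_t id)
{
    return id.val < SWAP_ID_MAX ? swap_desc_table[id.val] : NULL;
}

/* Generic MM code can ask where the page lives without a swapfile. */
int swap_desc_in_zswap(const struct swap_desc *desc)
{
    return desc->backend == SWAP_BACKEND_ZSWAP;
}
```

In this model a page can sit in zswap with no swapfile slot allocated at all, which is the dependency the proposal wants to remove.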
> ==================== Benefits ====================
> This work enables using zswap without a backing swapfile and increases
> the swap capacity when zswap is used with a swapfile. It also creates
> a separation that allows us to skip code paths that don't make sense
> in the zswap path (e.g. readahead). We get to drop zswap's rbtree
> which might result in better performance (fewer lookups, less lock
> contention).
>
> The abstraction layer also opens the door for multiple cleanups (e.g.
> removing swapper address spaces, removing swap count continuation
> code, etc). Another nice cleanup that this work enables would be
> separating the overloaded swp_entry_t into two distinct types: one for
> things that are stored in page tables / caches, and one for actual swap
> entries. In the future, we can potentially further optimize how we use
> the bits in the page tables instead of sticking everything into the
> current type/offset format.
>
> Another potential win here can be swapoff, which can be more practical
> by directly scanning all swap_desc's instead of going through page
> tables and shmem page caches.
>
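The swapoff point can be sketched the same way. This is a minimal model, assuming the descriptors are reachable as a flat array and that "bringing a page back" can be represented by flipping a field; the names (`struct desc`, `swapoff_scan`, the `BACKEND_*` values) are hypothetical, and a real implementation would have to read the data back and deal with locking.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model, not kernel code. */
enum backend { BACKEND_SWAPFILE, BACKEND_ZSWAP, BACKEND_MEMORY };

struct desc {
    enum backend backend;
    unsigned int swapfile_type;   /* meaningful only for BACKEND_SWAPFILE */
};

/*
 * Walk the descriptors once and pull every page that lives in swapfile
 * 'type' back into memory (modelled here by changing 'backend').
 * No page table or shmem page cache walk is involved, because the ids
 * stored there keep pointing at the same descriptors.
 * Returns the number of descriptors that were moved.
 */
size_t swapoff_scan(struct desc *descs, size_t n, unsigned int type)
{
    size_t moved = 0;

    for (size_t i = 0; i < n; i++) {
        if (descs[i].backend == BACKEND_SWAPFILE &&
            descs[i].swapfile_type == type) {
            descs[i].backend = BACKEND_MEMORY;
            moved++;
        }
    }
    return moved;
}
```

The cost of such a scan grows with the number of descriptors rather than with the size of every address space that might reference the swapfile.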
> Overall zswap becomes more accessible and available to a wider range
> of use cases.
>
> ==================== Cost ====================
> The obvious downside of this is added memory overhead, specifically
> for users that use swapfiles without zswap. Instead of paying one byte
> (swap_map) for every potential page in the swapfile (+ swap count
> continuation), we pay the size of the swap_desc for every page that is
> actually in the swapfile, which I estimate at around 24 bytes, or
> about 0.6% of swapped out memory with 4 KiB pages. The overhead only
> scales with pages actually swapped out. For zswap users, it should be
> a win (or at least even) because we get to drop a lot of fields from
> struct zswap_entry (e.g. rbtree, index, etc).
>
> Another potential concern is readahead. With this design, we have no
> way to get a swap_desc given a swap entry (type & offset). We would
> need to maintain a reverse mapping, adding a little bit more overhead,
> or search all swapped out pages instead :). A reverse mapping might
> bump the per-swapped-page overhead to ~32 bytes (~0.8% of swapped out
> memory).
>
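The percentages above follow from dividing the per-page metadata size by the page size. A small sketch of that arithmetic, assuming 4 KiB pages and ignoring swap count continuation; the function names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: 4 KiB pages, as on most x86-64 configurations. */
#define ASSUMED_PAGE_SIZE 4096.0

/* Metadata overhead as a percentage of swapped-out memory. */
double per_page_overhead_pct(size_t metadata_bytes)
{
    return 100.0 * (double)metadata_bytes / ASSUMED_PAGE_SIZE;
}

/* Proposed scheme: cost scales with pages actually swapped out. */
size_t swap_desc_total_bytes(size_t swapped_pages, size_t desc_size)
{
    return swapped_pages * desc_size;
}

/* Current scheme: one swap_map byte per slot, whether used or not. */
size_t swap_map_total_bytes(size_t swapfile_slots)
{
    return swapfile_slots;
}
```

With these assumptions, 24 bytes per page is 24 / 4096, about 0.59%, and 32 bytes is about 0.78%. As a worked case, a 16 GiB swapfile has 4,194,304 slots, so swap_map costs 4 MiB regardless of usage, while 1 GiB of swapped-out data (262,144 pages) at 24 bytes each costs 6 MiB of descriptors. The descriptors are cheaper than swap_map only while fewer than roughly 1 in 24 slots are in use.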
> ==================== Bottom Line ====================
> It would be nice to discuss the potential here and the tradeoffs. I
> know that other folks using zswap (or interested in using it) may find
> this very useful. I am sure I am missing some context on why things
> are the way they are, and perhaps some obvious holes in my story.
> Looking forward to discussing this with anyone interested :)
>
> I think Johannes may be interested in attending this discussion, since
> a lot of ideas here are inspired by discussions I had with him :)

For the record, here are the slides that were presented for this
discussion (attached).

[-- Attachment #2: [LSF_MM_BPF 2023] Swap Abstraction _ Native Zswap.pdf --]
[-- Type: application/pdf, Size: 133190 bytes --]
