All of lore.kernel.org
 help / color / mirror / Atom feed
From: Chris Li <chrisl@kernel.org>
To: Yosry Ahmed <yosryahmed@google.com>
Cc: Minchan Kim <minchan@kernel.org>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Sergey Senozhatsky <senozhatsky@chromium.org>,
	lsf-pc@lists.linux-foundation.org, Linux-MM <linux-mm@kvack.org>,
	Michal Hocko <mhocko@kernel.org>,
	Shakeel Butt <shakeelb@google.com>,
	David Rientjes <rientjes@google.com>,
	Hugh Dickins <hughd@google.com>,
	Seth Jennings <sjenning@redhat.com>,
	Dan Streetman <ddstreet@ieee.org>,
	Vitaly Wool <vitaly.wool@konsulko.com>,
	Yang Shi <shy828301@gmail.com>, Peter Xu <peterx@redhat.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Rik van Riel <riel@surriel.com>
Subject: Re: [LSF/MM/BPF TOPIC] Swap Abstraction / Native Zswap
Date: Thu, 2 Mar 2023 09:05:28 -0800	[thread overview]
Message-ID: <ZADXWMtKDDq1xP3s@google.com> (raw)
In-Reply-To: <CAJD7tkYbLG1=YbBCF7bj9MbRcH0FkdjTi4tRZ+vAxE5DUodFAg@mail.gmail.com>


On Wed, Mar 01, 2023 at 04:58:08PM -0800, Yosry Ahmed wrote:
> > The indirection layer would be essential to support it but it would
> > be also great if we don't waste any memory for the user who don't
> > want the feature.
> 
> I can't currently think of a way to eliminate overhead for people only
> using swapfiles, as a lot of the core implementation changes, unless
> we want to maintain considerably more code with a lot of repeated
> functionality implemented differently. Perhaps this will change as I
> implement this, maybe things are better (or worse) than what I think
> they are, I am actively working on a proof-of-concept right now. Maybe
> a discussion in LSF/MM/BPF will help come up with optimizations as
> well :)
> 
> >
> > Just FYI, there was similar discussion long time ago about the
> > indirection layer.
> > https://lore.kernel.org/linux-mm/4DA25039.3020700@redhat.com/
> 
> Yeah Hugh shared this one with me earlier, but there are a few things
> that I don't understand how they would work, at least in today's
> world.

Let's add Rik into the discussion, maybe he can help refresh some details.

Chris

> 
> Firstly, the proposal suggests that we store a radix tree index in the
> page tables, and in the radix tree store the swap entry AND the swap
> count. I am not really sure how they would fit in 8 bytes, especially
> if we need continuation and 1 byte is not enough for the swap count.
> Continuation logic now depends on linking vmalloc'd pages using the
> lru field in struct page/folio. Perhaps we can figure out a split that
> gives enough space for swap count without continuation while also not
> limiting swapfile sizes too much.
> 
> Secondly, IIUC in that proposal once we swap a page in, we free the
> swap entry and add the swapcache page to the radix tree instead. In
> that case, where does the swap count go? IIUC we still need to
> maintain it to be able to tell when all processes mapping the page
> have faulted it back, otherwise the radix tree entry is maintained
> indefinitely. We can maybe stash the swap count somewhere else in this
> case, and bring it back to the radix tree if we swap the page out
> again. Not really sure where, we can have a separate radix tree for
> swap counts when the page is in swapcache, or we can always have it in
> a separate radix tree so that the swap entry fits comfortably in the
> first radix tree.
> 
> To be able to accomodate zswap in this design, I think we always need
> a separate radix tree for swap counts. In that case, one radix tree
> contains swap_entry/zswap_entry/swapcache, and the other radix tree
> contains the swap count. I think this may work, but I am not sure if
> the overhead of always doing a lookup to read the swap count is okay.
> I am also sure there would be some fun synchronization problems
> between both trees (but we already need to synchronize today between
> the swapcache and swap counts?).
> 
> It sounds like it is possible to make it work. I will spend some time
> thinking about it. Having 2 radix trees also solves the 32-bit systems
> problem, but I am not sure if it's a generally better design. Radix
> trees also take up some extra space other than the entry size itself,
> so I am not sure how much memory we would end up actually saving.
> 
> Johannes, I am curious if you have any thoughts about this alternative design?
> 


  parent reply	other threads:[~2023-03-02 17:05 UTC|newest]

Thread overview: 105+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-02-18 22:38 [LSF/MM/BPF TOPIC] Swap Abstraction / Native Zswap Yosry Ahmed
2023-02-19  4:31 ` Matthew Wilcox
2023-02-19  9:34   ` Yosry Ahmed
2023-02-28 23:22   ` Chris Li
2023-03-01  0:08     ` Matthew Wilcox
2023-03-01 23:22       ` Chris Li
2023-02-21 18:39 ` Yang Shi
2023-02-21 18:56   ` Yosry Ahmed
2023-02-21 19:26     ` Yang Shi
2023-02-21 19:46       ` Yosry Ahmed
2023-02-21 23:34         ` Yang Shi
2023-02-21 23:38           ` Yosry Ahmed
2023-02-22 16:57             ` Johannes Weiner
2023-02-22 22:46               ` Yosry Ahmed
2023-02-28  4:29                 ` Kalesh Singh
2023-02-28  8:09                   ` Yosry Ahmed
2023-02-28  4:54 ` Sergey Senozhatsky
2023-02-28  8:12   ` Yosry Ahmed
2023-02-28 23:29     ` Minchan Kim
2023-03-02  0:58       ` Yosry Ahmed
2023-03-02  1:25         ` Yosry Ahmed
2023-03-02 17:05         ` Chris Li [this message]
2023-03-02 17:47         ` Chris Li
2023-03-02 18:15           ` Johannes Weiner
2023-03-02 18:56             ` Chris Li
2023-03-02 18:23           ` Rik van Riel
2023-03-02 21:42             ` Chris Li
2023-03-02 22:36               ` Rik van Riel
2023-03-02 22:55                 ` Yosry Ahmed
2023-03-03  4:05                   ` Chris Li
2023-03-03  0:01                 ` Chris Li
2023-03-02 16:58       ` Chris Li
2023-03-01 10:44     ` Sergey Senozhatsky
2023-03-02  1:01       ` Yosry Ahmed
2023-02-28 23:11 ` Chris Li
2023-03-02  0:30   ` Yosry Ahmed
2023-03-02  1:00     ` Yosry Ahmed
2023-03-02 16:51     ` Chris Li
2023-03-03  0:33     ` Minchan Kim
2023-03-03  0:49       ` Yosry Ahmed
2023-03-03  1:25         ` Minchan Kim
2023-03-03 17:15           ` Yosry Ahmed
2023-03-09 12:48     ` Huang, Ying
2023-03-09 19:58       ` Chris Li
2023-03-09 20:19       ` Yosry Ahmed
2023-03-10  3:06         ` Huang, Ying
2023-03-10 23:14           ` Chris Li
2023-03-13  1:10             ` Huang, Ying
2023-03-15  7:41               ` Yosry Ahmed
2023-03-16  1:42                 ` Huang, Ying
2023-03-11  1:06           ` Yosry Ahmed
2023-03-13  2:12             ` Huang, Ying
2023-03-15  8:01               ` Yosry Ahmed
2023-03-16  7:50                 ` Huang, Ying
2023-03-17 10:19                   ` Yosry Ahmed
2023-03-17 18:19                     ` Chris Li
2023-03-17 18:23                       ` Yosry Ahmed
2023-03-20  2:55                     ` Huang, Ying
2023-03-20  6:25                       ` Chris Li
2023-03-23  0:56                         ` Huang, Ying
2023-03-23  6:46                           ` Chris Li
2023-03-23  6:56                             ` Huang, Ying
2023-03-23 18:28                               ` Chris Li
2023-03-23 18:40                                 ` Yosry Ahmed
2023-03-23 19:49                                   ` Chris Li
2023-03-23 19:54                                     ` Yosry Ahmed
2023-03-23 21:10                                       ` Chris Li
2023-03-24 17:28                                       ` Chris Li
2023-03-22  5:56                       ` Yosry Ahmed
2023-03-23  1:48                         ` Huang, Ying
2023-03-23  2:21                           ` Yosry Ahmed
2023-03-23  3:16                             ` Huang, Ying
2023-03-23  3:27                               ` Yosry Ahmed
2023-03-23  5:37                                 ` Huang, Ying
2023-03-23 15:18                                   ` Yosry Ahmed
2023-03-24  2:37                                     ` Huang, Ying
2023-03-24  7:28                                       ` Yosry Ahmed
2023-03-24 17:23                                         ` Chris Li
2023-03-27  1:23                                           ` Huang, Ying
2023-03-28  5:54                                             ` Yosry Ahmed
2023-03-28  6:20                                               ` Huang, Ying
2023-03-28  6:29                                                 ` Yosry Ahmed
2023-03-28  6:59                                                   ` Huang, Ying
2023-03-28  7:59                                                     ` Yosry Ahmed
2023-03-28 14:14                                                       ` Johannes Weiner
2023-03-28 19:59                                                         ` Yosry Ahmed
2023-03-28 21:22                                                           ` Chris Li
2023-03-28 21:30                                                             ` Yosry Ahmed
2023-03-28 20:50                                                       ` Chris Li
2023-03-28 21:01                                                         ` Yosry Ahmed
2023-03-28 21:32                                                           ` Chris Li
2023-03-28 21:44                                                             ` Yosry Ahmed
2023-03-28 22:01                                                               ` Chris Li
2023-03-28 22:02                                                                 ` Yosry Ahmed
2023-03-29  1:31                                                               ` Huang, Ying
2023-03-29  1:41                                                                 ` Yosry Ahmed
2023-03-29 16:04                                                                   ` Chris Li
2023-04-04  8:24                                                                     ` Huang, Ying
2023-04-04  8:10                                                                   ` Huang, Ying
2023-04-04  8:47                                                                     ` Yosry Ahmed
2023-04-06  1:40                                                                       ` Huang, Ying
2023-03-29 15:22                                                                 ` Chris Li
2023-03-10  2:07 ` Luis Chamberlain
2023-03-10  2:15   ` Yosry Ahmed
2023-05-12  3:07 ` Yosry Ahmed

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZADXWMtKDDq1xP3s@google.com \
    --to=chrisl@kernel.org \
    --cc=akpm@linux-foundation.org \
    --cc=ddstreet@ieee.org \
    --cc=hannes@cmpxchg.org \
    --cc=hughd@google.com \
    --cc=linux-mm@kvack.org \
    --cc=lsf-pc@lists.linux-foundation.org \
    --cc=mhocko@kernel.org \
    --cc=minchan@kernel.org \
    --cc=peterx@redhat.com \
    --cc=riel@surriel.com \
    --cc=rientjes@google.com \
    --cc=senozhatsky@chromium.org \
    --cc=shakeelb@google.com \
    --cc=shy828301@gmail.com \
    --cc=sjenning@redhat.com \
    --cc=vitaly.wool@konsulko.com \
    --cc=yosryahmed@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.