On Sat, Feb 18, 2023 at 2:38 PM Yosry Ahmed wrote:
>
> Hello everyone,
>
> I would like to propose a topic for the upcoming LSF/MM/BPF in May 2023 about swap & zswap (hope I am not too late).
>
> ==================== Intro ====================
> Currently, using zswap depends on swapfiles in an unnecessary way. To use zswap, you need a swapfile configured (even if its space will not be used), and zswap is limited by that swapfile's size. When a page resides in zswap, the corresponding swap entry in the swapfile cannot be used and is essentially wasted. We also go through unnecessary code paths when using zswap, such as finding and allocating a swap entry on the swapout path, or readahead on the swapin path. I am proposing a swapping abstraction layer that would allow us to remove zswap's dependency on swapfiles. This can be done by introducing a data structure between the actual swapping implementations (swapfiles, zswap) and the rest of the MM code.
>
> ==================== Objective ====================
> Enable the use of zswap without a backing swapfile, which makes zswap useful for a wider variety of use cases. Also, when zswap is used with a swapfile, pages in zswap no longer use up space in the swapfile, so the overall swapping capacity increases.
>
> ==================== Idea ====================
> Introduce a data structure, which I currently call a swap_desc, as an abstraction layer between the swapping implementations and the rest of the MM code. Page tables & page caches would store a swap id (encoded as a swp_entry_t) instead of directly storing the swap entry associated with the swapfile. This swap id maps to a struct swap_desc, which acts as our abstraction layer. All MM code not concerned with swapping details would operate in terms of swap descs. The swap_desc can point to either a normal swap entry (associated with a swapfile) or a zswap entry. It can also include all non-backend-specific operations, such as the swapcache (which would become a simple pointer in swap_desc), swap counting, etc. This creates a clear, clean abstraction layer between MM code and the actual swapping implementation.
>
> ==================== Benefits ====================
> This work enables using zswap without a backing swapfile and increases the swap capacity when zswap is used with a swapfile. It also creates a separation that allows us to skip code paths that don't make sense on the zswap path (e.g. readahead). We also get to drop zswap's rbtree, which might result in better performance (fewer lookups, less lock contention).
>
> The abstraction layer also opens the door for multiple cleanups (e.g. removing swapper address spaces, removing the swap count continuation code, etc.). Another nice cleanup that this work enables is separating the overloaded swp_entry_t into two distinct types: one for values stored in page tables / caches, and one for actual swap entries. In the future, we can potentially further optimize how we use the bits in the page tables instead of sticking everything into the current type/offset format.
>
> Another potential win is swapoff, which could become more practical by directly scanning all swap_descs instead of walking page tables and shmem page caches.
>
> Overall, zswap becomes more accessible and available to a wider range of use cases.
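
To make the swap_desc idea above a little more concrete, here is a rough sketch of what such a descriptor could look like. This is purely illustrative: the field names, types, and layout are my guesses, not taken from the proposal or the slides.

        /*
         * Purely illustrative sketch -- field names and layout are guesses,
         * not the proposal's actual definition.
         */
        #include <linux/mm_types.h>     /* swp_entry_t, struct page */

        struct zswap_entry;             /* zswap's per-object metadata */

        struct swap_desc {
                union {
                        swp_entry_t slot;               /* page sits in a swapfile slot... */
                        struct zswap_entry *zswap;      /* ...or in zswap, with no slot reserved */
                };
                struct page *swapcache;         /* swapcache becomes a simple pointer */
                unsigned int swap_count;        /* backend-agnostic swap counting */
                unsigned long flags;            /* e.g. which backend currently holds the page */
        };

A structure along these lines lands in the few-tens-of-bytes range on 64-bit, roughly in line with the ~24-byte per-page estimate in the Cost section below.
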
>
> ==================== Cost ====================
> The obvious downside of this is added memory overhead, specifically for users who use swapfiles without zswap. Instead of paying one byte (swap_map) for every potential page in the swapfile (+ swap count continuation), we pay the size of the swap_desc for every page that is actually in the swapfile. I am estimating that at roughly 24 bytes, i.e. about 0.6% of swapped-out memory (24 bytes per 4 KiB page). The overhead only scales with pages actually swapped out. For zswap users, it should be a win (or at least a wash) because we get to drop a lot of fields from struct zswap_entry (e.g. the rbtree node, index, etc.).
>
> Another potential concern is readahead. With this design, we have no way to get a swap_desc given a swap entry (type & offset). We would need to maintain a reverse mapping, adding a little more overhead, or search all swapped-out pages instead :). A reverse mapping might push the per-swapped-page overhead to ~32 bytes (~0.8% of swapped-out memory).
>
> ==================== Bottom Line ====================
> It would be nice to discuss the potential here and the tradeoffs. I know that other folks using zswap (or interested in using it) may find this very useful. I am sure I am missing some context on why things are the way they are, and there are perhaps some obvious holes in my story. Looking forward to discussing this with anyone interested :)
>
> I think Johannes may be interested in attending this discussion, since a lot of the ideas here are inspired by discussions I had with him :)

For the record, here are the slides that were presented for this discussion (attached).
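
On the readahead / reverse-mapping concern in the Cost section above, here is a minimal sketch of one possible shape for such a lookup, assuming an XArray keyed by the swapfile entry's value and the hypothetical struct swap_desc sketched earlier. The names are illustrative, not from the proposal.

        /*
         * Illustrative only: one possible reverse map from a swapfile entry
         * back to its swap_desc, for paths like readahead that start from a
         * raw (type, offset) swap entry.
         */
        #include <linux/xarray.h>
        #include <linux/mm_types.h>     /* swp_entry_t */

        struct swap_desc;               /* the descriptor sketched earlier */

        static DEFINE_XARRAY(swap_desc_rmap);

        static int swap_desc_rmap_insert(swp_entry_t entry, struct swap_desc *desc)
        {
                /* swp_entry_t is a single unsigned long, so it can key the XArray directly */
                return xa_err(xa_store(&swap_desc_rmap, entry.val, desc, GFP_KERNEL));
        }

        static struct swap_desc *swap_desc_rmap_lookup(swp_entry_t entry)
        {
                return xa_load(&swap_desc_rmap, entry.val);
        }

        static void swap_desc_rmap_erase(swp_entry_t entry)
        {
                xa_erase(&swap_desc_rmap, entry.val);
        }

Whatever index structure is chosen, it is this extra per-page lookup state that accounts for the bump from ~24 to ~32 bytes per swapped-out page mentioned in the Cost section.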