From: Yosry Ahmed <yosryahmed@google.com>
Date: Fri, 10 Mar 2023 17:06:35 -0800
Subject: Re: [LSF/MM/BPF TOPIC] Swap Abstraction / Native Zswap
To: "Huang, Ying"
Cc: Chris Li, lsf-pc@lists.linux-foundation.org, Johannes Weiner, Linux-MM, Michal Hocko, Shakeel Butt, David Rientjes, Hugh Dickins, Seth Jennings, Dan Streetman, Vitaly Wool, Yang Shi, Peter Xu, Minchan Kim, Andrew Morton, Aneesh Kumar K V, Wei Xu
In-Reply-To: <87y1o571aa.fsf@yhuang6-desk2.ccr.corp.intel.com>
References: <87356e850j.fsf@yhuang6-desk2.ccr.corp.intel.com> <87y1o571aa.fsf@yhuang6-desk2.ccr.corp.intel.com>

On Thu, Mar 9, 2023 at 7:07 PM Huang, Ying wrote:
>
> Yosry Ahmed writes:
>
> > On Thu, Mar 9, 2023 at 4:49 AM Huang, Ying wrote:
> >>
> >> Yosry Ahmed writes:
> >>
> >> > On Tue, Feb 28, 2023 at 3:11 PM Chris Li wrote:
> >> >>
> >> >> Hi Yosry,
> >> >>
> >> >> On Sat, Feb 18, 2023 at 02:38:40PM -0800, Yosry Ahmed wrote:
> >> >> > Hello everyone,
> >> >> >
> >> >> > I would like to propose a topic for the upcoming LSF/MM/BPF in May 2023 about swap & zswap (hope I am not too late).
> >> >>
> >> >> I am very interested in participating in this discussion as well.
> >> >
> >> > That's great to hear!
> >> >
> >> >> > ==================== Objective ====================
> >> >> > Enabling the use of zswap without a backing swapfile, which makes zswap useful for a wider variety of use cases. Also, when zswap is used with a swapfile, the pages in zswap do not use up space in the swapfile, so the overall swapping capacity increases.
> >> >>
> >> >> Agree.
> >> >>
> >> >> > ==================== Idea ====================
> >> >> > Introduce a data structure, which I currently call a swap_desc, as an abstraction layer between the swapping implementation and the rest of the MM code. Page tables & page caches would store a swap id (encoded as a swp_entry_t) instead of directly storing the swap entry associated with the swapfile. This swap id maps to a struct swap_desc, which acts
> >> >>
> >> >> Can you provide a bit more detail? I am curious how this swap id maps to the swap_desc. Is the swp_entry_t cast into "struct swap_desc*", or does it go through some lookup table/tree?
> >> >
> >> > The swap id would be an index in a radix tree (aka xarray), which contains a pointer to the swap_desc struct. This lookup should be free with this design, as we also use the swap_desc to directly store the swap cache pointer, so this lookup essentially replaces the swap cache lookup.
> >> >
> >> >> > as our abstraction layer. All MM code not concerned with swapping details would operate in terms of swap descs. The swap_desc can point to either a normal swap entry (associated with a swapfile) or a zswap entry. It can also include all non-backend-specific operations, such as the swapcache (which would be a simple pointer in swap_desc), swap
> >> >>
> >> >> Does the zswap entry still use the swap slot cache and swap_info_struct?
> >> >
> >> > In this design, no, it shouldn't.
> >> >
> >> >> > This work enables using zswap without a backing swapfile and increases the swap capacity when zswap is used with a swapfile. It also creates a separation that allows us to skip code paths that don't make sense in the zswap path (e.g. readahead). We get to drop zswap's rbtree, which might result in better performance (fewer lookups, less lock contention).
> >> >> >
> >> >> > The abstraction layer also opens the door for multiple cleanups (e.g. removing swapper address spaces, removing swap count continuation code, etc). Another nice cleanup that this work enables would be separating the overloaded swp_entry_t into two distinct types: one for things that are stored in page tables / caches, and one for actual swap entries.
> >> >> > In the future, we can potentially further optimize how we use the bits in the page tables instead of sticking everything into the current type/offset format.
> >> >>
> >> >> Looking forward to seeing more details in the upcoming discussion.
> >> >>
> >> >> > ==================== Cost ====================
> >> >> > The obvious downside of this is added memory overhead, specifically for users that use swapfiles without zswap. Instead of paying one byte (swap_map) for every potential page in the swapfile (+ swap count continuation), we pay the size of the swap_desc for every page that is actually in the swapfile, which I estimate to be roughly 24 bytes, so maybe 0.6% of swapped-out memory. The overhead only scales with pages actually swapped out. For zswap users, it should be
> >> >>
> >> >> Is there a way to avoid turning 1 byte into 24 bytes per swapped page? For the users that use swap but no zswap, this is pure overhead.
> >> >
> >> > That's what I could think of at this point. My idea was something like this:
> >> >
> >> > struct swap_desc {
> >> >         union { /* Use one bit to distinguish them */
> >> >                 swp_entry_t swap_entry;
> >> >                 struct zswap_entry *zswap_entry;
> >> >         };
> >> >         struct folio *swapcache;
> >> >         atomic_t swap_count;
> >> >         u32 id;
> >> > };
> >> >
> >> > Having the id in the swap_desc is convenient, as we can directly map the swap_desc to a swp_entry_t to place in the page tables, but I don't think it's necessary. Without it, the struct size is 20 bytes, so I think the extra 4 bytes are okay to use anyway if the slab allocator only allocates in multiples of 8 bytes.
> >> >
> >> > The idea here is to unify the swapcache and swap_count implementations between different swap backends (swapfiles, zswap, etc), which would create a better abstraction and reduce reinventing the wheel.
> >> >
> >> > We can reduce it to only 8 bytes and store only the swap/zswap entry, but we still need the swap cache anyway, so we might as well store the pointer in the struct and have a unified lookup-free swapcache; so really 16 bytes is the minimum.
> >> >
> >> > If we stop at 16 bytes, then we need to handle swap count separately in swapfiles and zswap. This is not the end of the world, but are the 8 bytes worth this?
> >>
> >> If my understanding is correct, in the current implementation we need one swap cache pointer per swapped-out page too. Even after calling __delete_from_swap_cache(), we store the "shadow" entry there. Although it's possible to implement shadow entry reclaiming like that for file cache shadow entries (workingset_shadow_shrinker), we haven't done that yet. And it appears that we can live with that. So, in the current implementation, for each swapped-out page, we use 9 bytes. If so, the memory usage ratio is 24 / 9 = 2.667, still not trivial, but not as horrible as 24 / 1 = 24.
> >
> > Unfortunately it's a little bit more. 24 is the extra overhead.
> >
> > Today we have an xarray entry for each swapped-out page, which has either the swapcache pointer or the shadow entry.
> >
> > With this implementation, we have an xarray entry for each swapped-out page, which has a pointer to the swap_desc.
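To make the shape of this concrete before getting to the numbers: a minimal sketch of the indirection I am describing. swap_desc_tree, swap_desc_alloc(), and swap_desc_lookup() are made-up names and the GFP flags are hand-waved; only the xarray calls are the real <linux/xarray.h> API:

	/* Sketch only: an allocating xarray mapping swap id -> struct swap_desc *. */
	static DEFINE_XARRAY_ALLOC(swap_desc_tree);

	/*
	 * At swapout: allocate a descriptor and an id; the id (not a real
	 * swap entry) is what gets encoded in the page tables.
	 */
	static int swap_desc_alloc(struct swap_desc *desc)
	{
		return xa_alloc(&swap_desc_tree, &desc->id, desc,
				xa_limit_32b, GFP_KERNEL);
	}

	/*
	 * The one lookup the rest of MM does. This is what replaces the
	 * swap cache lookup, since desc->swapcache is right there.
	 */
	static struct swap_desc *swap_desc_lookup(u32 id)
	{
		return xa_load(&swap_desc_tree, id);
	}

The 8 bytes below are that xarray slot; the ~24 bytes are the swap_desc it points to.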
> >
> > Ignoring the overhead of the xarray itself, we have (8 + 24) / (8 + 1) = 3.5556.
>
> OK, I see. We can only hold 8 bytes in each xarray entry. To save memory, we can allocate multiple swap_descs (e.g., 16) for each xarray entry. Then the memory usage of the xarray becomes 1/N.
>
> > For rotating disks, this might be even higher: (8 + 32) / (8 + 1) = 4.444.
> >
> > This is because we need to maintain a reverse mapping between swp_entry_t and the swap_desc to use for cluster readahead. I am assuming we can limit cluster readahead to rotating disks only.
>
> If the reverse mapping cannot be avoided in enough situations, it's better to keep only swap_entry in the swap_desc, and create another xarray indexed by swap_entry that stores swap_cache, swap_count, etc.

My current idea is to have one xarray that stores the swap_descs (which include swap_entry, swapcache, swap_count, etc), and only for rotating disks have an additional xarray that maps swap_entry -> swap_desc for cluster readahead, assuming we can eliminate all other situations requiring a reverse mapping.

I am not sure how having separate xarrays helps. If we have one xarray, we might as well save the other lookups and put everything in the swap_desc. In fact, this should improve the locking we have today, as swapcache / swap_count operations can be lockless or very lightly contended.

If the point is to store the swap_desc directly inside the xarray to save 8 bytes, I am concerned that having multiple xarrays for swapcache, swap_count, etc will use more than that.

> >
> >>
> >> > Keep in mind that the current overhead is 1 byte O(max swap pages), not O(swapped). Also, 1 byte assumes we do not use the swap continuation pages. If we do, it may end up being more. We also allocate continuation in full 4k pages, so even if one swap_map element in a page requires continuation, we will allocate an entire page. What I am trying to say is that to get an actual comparison you need to also factor in the swap utilization and the rate of usage of swap continuation. I don't know how to come up with a formula for this tbh.
> >> >
> >> > Also, like Johannes said, the worst-case overhead (32 bytes if you count the reverse mapping) is 0.8% of swapped memory, aka 8M for every 1G swapped. It doesn't sound *very* bad. I understand that it is pure overhead for people not using zswap, but it is not very awful.
> >> >
> >> >> It seems what you really need is one bit of information to indicate that this page is backed by zswap. Then you can have a separate pointer for the zswap entry.
> >> >
> >> > If you use one bit in swp_entry_t (or one of the available swap types) to indicate whether the page is backed with a swapfile or zswap, it doesn't really work. We lose the indirection layer. How do we move the page from zswap to the swapfile? We would need to go update the page tables and the shmem page cache, similar to swapoff.
> >> >
> >> > Instead, if we store something else in swp_entry_t and use it to look up the swp_entry_t or zswap_entry pointer, then that's essentially what the swap_desc does. It just goes the extra mile of unifying the swapcache as well and storing it directly in the swap_desc instead of in another lookup structure.
> >>
> >> Suppose we choose to make sizeof(struct swap_desc) == 8, that is, store only swap_entry in the swap_desc.
> >> The added indirection appears to be another level of page table with one entry. Then we may use a method similar to how systems with 2-level and 3-level page tables are supported, like the code in include/asm-generic/pgtable-nopmd.h. But I haven't thought about this deeply.
> >
> > Can you expand further on this idea? I am not sure I fully understand.
>
> OK. The goal is to avoid the overhead if the indirection isn't enabled via kconfig.
>
> If the indirection isn't enabled, store swap_entry in the PTE directly. Otherwise, store the index of the swap_desc in the PTE. Different functions (e.g., to get/set swap_entry in the PTE) are implemented based on kconfig.

I thought about this. The problem is that we will have multiple implementations of multiple things. For example, swap_count without the indirection layer lives in the swap_map (with continuation logic). With the indirection layer, it lives in the swap_desc (or somewhere else). Same for the swapcache. Even if we keep the swapcache in an xarray and not inside the swap_desc, it would be indexed by swap_entry if the indirection is disabled, and by swap_desc (or similar) if the indirection is enabled.

I think maintaining separate implementations for when the indirection is enabled/disabled would be adding too much complexity (a strawman sketch of what I mean is at the end of this mail). WDYT?

> >
> >> >>
> >> >> Depending on how much you are going to reuse the swap cache, you might need to have something like a swap_info_struct to keep the locks happy.
> >> >
> >> > My current intention is to reimplement the swapcache completely as a pointer in struct swap_desc. This would eliminate this need and a lot of the locking we do today, if I get things right.
> >> >
> >> >> > Another potential concern is readahead. With this design, we have no
> >> >>
> >> >> Readahead is for spinning disks :-) Even a normal swap file on an SSD could use some modernization.
> >> >
> >> > Yeah, I initially thought we would only need the swp_entry_t -> swap_desc reverse mapping for readahead, and that we could store it for spinning disks only, but I was wrong. We need it for other things as well today: swapoff, and the case where we are trying to find an empty swap slot and start trying to free swap slots used only by the swapcache. However, I think both of these cases can be fixed (I can share more details if you want). If everything goes well, we should only need to maintain the reverse mapping (extra overhead above 24 bytes) for swap files on spinning disks, for readahead.
> >> >
> >> >> Looking forward to your discussion.
>
> Per my understanding, the indirection is to make it easy to move (swapped) pages among swap devices based on hot/cold. This is similar to the target of memory tiering. It appears that we can extend the memory tiering (mm/memory-tiers.c) framework to cover swap devices too? Is it possible for zswap to be faster than some slow memory media?

Agree with Chris that this may require a much larger overhaul. A slow memory tier is still addressable memory, while swap/zswap requires a page fault to read the pages. I think (at least for now) there is a fundamental difference. We want reclaim to eventually treat slow memory & swap as just different tiers with different characteristics in which to place cold memory, but otherwise I think the swapping implementation itself is very different. Am I missing something?

>
> Best Regards,
> Huang, Ying
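P.S. A strawman of the kconfig idea above, following the pgtable-nopmd.h pattern, to show the duplication I am worried about. CONFIG_SWAP_DESC, pte_swap_entry(), pte_to_swap_id(), swap_desc_lookup(), and swap_count_of() are all invented names; pte_to_swp_entry() and __swap_count() are today's helpers:

	#ifdef CONFIG_SWAP_DESC
	/* The PTE holds a swap id; one extra hop to reach the real entry. */
	static inline swp_entry_t pte_swap_entry(pte_t pte)
	{
		struct swap_desc *desc = swap_desc_lookup(pte_to_swap_id(pte));

		return desc->swap_entry;
	}

	/* The swap count lives in the descriptor... */
	static inline int swap_count_of(struct swap_desc *desc)
	{
		return atomic_read(&desc->swap_count);
	}
	#else /* !CONFIG_SWAP_DESC */
	/* The PTE holds the swap entry directly, as today. */
	static inline swp_entry_t pte_swap_entry(pte_t pte)
	{
		return pte_to_swp_entry(pte);
	}

	/*
	 * ...while here it lives in swap_map with continuation logic, so
	 * the two variants do not even take the same argument type.
	 */
	static inline int swap_count_of(swp_entry_t entry)
	{
		return __swap_count(entry);
	}
	#endif

Every helper that touches the swap count or the swapcache would grow two variants like this.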