Date: Thu, 2 Mar 2023 08:51:10 -0800
From: Chris Li
To: Yosry Ahmed
Cc: lsf-pc@lists.linux-foundation.org, Johannes Weiner, Linux-MM,
    Michal Hocko, Shakeel Butt, David Rientjes, Hugh Dickins,
    Seth Jennings, Dan Streetman, Vitaly Wool, Yang Shi, Peter Xu,
    Minchan Kim, Andrew Morton
Subject: Re: [LSF/MM/BPF TOPIC] Swap Abstraction / Native Zswap

Hi Yosry,

On Wed, Mar 01, 2023 at 04:30:22PM -0800, Yosry Ahmed wrote:
> > Can you provide a bit more detail?
> > I am curious how this swap id
> > maps into the swap_desc? Is the swp_entry_t cast into "struct
> > swap_desc*" or going through some lookup table/tree?
>
> swap id would be an index in a radix tree (aka xarray), which contains
> a pointer to the swap_desc struct. This lookup should be free with
> this design as we also use swap_desc to directly store the swap cache
> pointer, so this lookup essentially replaces the swap cache lookup.

Thanks for the additional clarification. If you don't mind, I have some
follow-up questions. Is this radix tree global, or are there multiple
small trees (e.g. one per swap device)?

> > > as our abstraction layer. All MM code not concerned with swapping
> > > details would operate in terms of swap descs. The swap_desc can point
> > > to either a normal swap entry (associated with a swapfile) or a zswap
> > > entry. It can also include all non-backend specific operations, such
> > > as the swapcache (which would be a simple pointer in swap_desc), swap
> >
> > Does the zswap entry still use the swap slot cache and swap_info_struct?
>
> In this design no, it shouldn't.

So the zswap entry only shares the swap cache with the normal swap entry.
That helps me paint a better picture of how you are going to do the
indirection layers.

> That's what I could think of at this point. My idea was something like this:
>
> struct swap_desc {
>     union { /* Use one bit to distinguish them */
>         swp_entry_t swap_entry;
>         struct zswap_entry *zswap_entry;
>     };
>     struct folio *swapcache;
>     atomic_t swap_count;
>     u32 id;
> }
>
> Having the id in the swap_desc is convenient as we can directly map
> the swap_desc to a swp_entry_t to place in the page tables, but I
> don't think it's necessary. Without it, the struct size is 20 bytes,
> so I think the extra 4 bytes are okay to use anyway if the slab
> allocator only allocates multiples of 8 bytes.

The whole complexity of the swap count continuation is about saving a
few bytes when a swap entry has a low count.
This seems much more heavyweight compared to that.

> The idea here is to unify the swapcache and swap_count implementation
> between different swap backends (swapfiles, zswap, etc), which would
> create a better abstraction and reduce reinventing the wheel.

Same goal here. I am just trying to find ways to use less memory for
users who don't use the indirection.

> Keep in mind that the current overhead is 1 byte O(max swap pages) not
> O(swapped). Also, 1 byte is assuming we do not use the swap
> continuation pages. If we do, it may end up being more. We also
> allocate continuation in full 4k pages, so even if one swap_map
> element in a page requires continuation, we will allocate an entire
> page. What I am trying to say is that to get an actual comparison you
> need to also factor in the swap utilization and the rate of usage of
> swap continuation. I don't know how to come up with a formula for this
> tbh.

I would consider two extreme cases of memory usage first:

1) A swap file of size N and no swapping at all.
2) The swap file is full (i.e. the per-page swapping memory overhead).

Your proposal will likely do well on 1) because it is dynamically
allocated, but worse on 2) due to the extra 20 or so bytes per swap_desc.

> Also, like Johannes said, the worst case overhead (32 bytes if you
> count the reverse mapping) is 0.8% of swapped memory, aka 8M for every
> 1G swapped. It doesn't sound *very* bad. I understand that it is pure
> overhead for people not using zswap, but it is not very awful.

I might have an alternative that avoids increasing memory usage when
zswap is not used.

> > It seems what you really need is one bit of information to indicate
> > this page is backed by zswap. Then you can have a separate pointer
> > for the zswap entry.
>
> If you use one bit in swp_entry_t (or one of the available swap types)
> to indicate whether the page is backed with a swapfile or zswap it
> doesn't really work. We lose the indirection layer.
> How do we move the
> page from zswap to swapfile? We need to go update the page tables and
> the shmem page cache, similar to swapoff.

How about I make a proposal and you can help me poke holes in it?
So far, I fail to see why it could not move a page from zswap to a
swapfile.

Chris
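
For reference, the two extreme cases discussed above can be put into
rough numbers. The following is only a userspace sketch, not kernel
code: the typedefs are stand-ins for the kernel types, the helper names
(swap_slots, swap_map_overhead, swap_desc_overhead, swap_desc_size) are
hypothetical, 4 KiB pages are assumed, and swap count continuation
pages are ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096ULL /* assumption: 4 KiB pages */

/* Userspace stand-ins for kernel types (assumptions, not kernel headers). */
typedef struct { unsigned long val; } swp_entry_t;
typedef struct { int counter; } atomic_t;
struct zswap_entry; /* opaque in this sketch */
struct folio;       /* opaque in this sketch */

/* Layout quoted in the thread, with u32 spelled as uint32_t. */
struct swap_desc {
	union { /* one bit would distinguish the two members */
		swp_entry_t swap_entry;
		struct zswap_entry *zswap_entry;
	};
	struct folio *swapcache;
	atomic_t swap_count;
	uint32_t id;
};

/* Size the compiler picks for the stand-in layout on this platform. */
size_t swap_desc_size(void)
{
	return sizeof(struct swap_desc);
}

/* Number of page-sized slots in a swap area of the given size. */
uint64_t swap_slots(uint64_t swap_bytes)
{
	return swap_bytes / SKETCH_PAGE_SIZE;
}

/*
 * Current scheme: 1 byte of swap_map per slot, paid for every slot
 * whether it is in use or not (continuation pages ignored).
 */
uint64_t swap_map_overhead(uint64_t max_slots)
{
	return max_slots;
}

/* Descriptor scheme: one descriptor per slot that is actually in use. */
uint64_t swap_desc_overhead(uint64_t used_slots, uint64_t max_slots,
			    uint64_t bytes_per_desc)
{
	assert(used_slots <= max_slots);
	return used_slots * bytes_per_desc;
}
```

With a 1 GiB swap area (262144 slots), case 1) costs 256 KiB for the
swap_map versus 0 bytes of descriptors, and case 2) costs 256 KiB versus
8 MiB when each descriptor is charged the 32 bytes quoted above.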