Date: Thu, 2 Mar 2023 16:33:28 -0800
From: Minchan Kim
To: Yosry Ahmed
Cc: Chris Li, lsf-pc@lists.linux-foundation.org, Johannes Weiner, Linux-MM, Michal Hocko, Shakeel Butt, David Rientjes, Hugh Dickins, Seth Jennings, Dan Streetman, Vitaly Wool, Yang Shi, Peter Xu, Andrew Morton
Subject: Re: [LSF/MM/BPF TOPIC] Swap Abstraction / Native Zswap

On Wed, Mar 01, 2023 at 04:30:22PM -0800, Yosry Ahmed wrote:
> On Tue, Feb 28, 2023 at 3:11 PM Chris Li wrote:
> >
> > Hi Yosry,
> >
> > On Sat, Feb 18, 2023 at 02:38:40PM -0800, Yosry Ahmed wrote:
> > > Hello everyone,
> > >
> > > I would like to propose a topic for the upcoming LSF/MM/BPF in May
> > > 2023 about swap & zswap (hope I am not too late).
> >
> > I am very interested in participating in this discussion as well.
>
> That's great to hear!
>
> > >
> > > ==================== Objective ====================
> > > Enabling the use of zswap without a backing swapfile, which makes
> > > zswap useful for a wider variety of use cases. Also, when zswap is
> > > used with a swapfile, the pages in zswap do not use up space in the
> > > swapfile, so the overall swapping capacity increases.
> >
> > Agree.
> >
> > >
> > > ==================== Idea ====================
> > > Introduce a data structure, which I currently call a swap_desc, as an
> > > abstraction layer between swapping implementation and the rest of MM
> > > code. Page tables & page caches would store a swap id (encoded as a
> > > swp_entry_t) instead of directly storing the swap entry associated
> > > with the swapfile. This swap id maps to a struct swap_desc, which acts
> >
> > Can you provide a bit more detail? I am curious how this swap id
> > maps into the swap_desc? Is the swp_entry_t cast into "struct
> > swap_desc*" or going through some lookup table/tree?
>
> swap id would be an index in a radix tree (aka xarray), which contains
> a pointer to the swap_desc struct. This lookup should be free with
> this design as we also use swap_desc to directly store the swap cache
> pointer, so this lookup essentially replaces the swap cache lookup.
>
> > >
> > > as our abstraction layer. All MM code not concerned with swapping
> > > details would operate in terms of swap descs. The swap_desc can point
> > > to either a normal swap entry (associated with a swapfile) or a zswap
> > > entry.
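For illustration only, a userspace sketch of the lookup described above: the swap id stored in a page table resolves to a swap_desc in one step, and the desc records whether the page lives in a swapfile slot or in zswap. It assumes a flat array as a stand-in for the xarray (no kernel xarray calls are used), simplified stand-ins for swp_entry_t and struct zswap_entry, and a low-bit tag to tell a slot offset from a zswap pointer. swap_desc_store(), swap_desc_lookup() and the other helpers are hypothetical names, not part of the proposal.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified userspace stand-ins for kernel types (hypothetical). */
typedef struct { unsigned long val; } swp_entry_t;
struct zswap_entry { size_t compressed_len; };
struct folio;

/* Bit 0 of 'backend' picks the backend: set = swapfile slot offset,
 * clear = zswap_entry pointer (these pointers are at least 2-byte
 * aligned, so bit 0 of a valid pointer is always clear). */
#define SWAP_DESC_SLOT_TAG 1UL

struct swap_desc {
	uintptr_t backend;       /* tagged slot offset or zswap_entry pointer */
	struct folio *swapcache; /* swap cache pointer kept in the desc itself */
	int swap_count;
	uint32_t id;
};

/* Flat array standing in for the xarray indexed by swap id. */
#define MAX_SWAP_IDS 64
static struct swap_desc *swap_desc_table[MAX_SWAP_IDS];

/* Install a desc under its id; returns 0 on success, -1 if out of range. */
int swap_desc_store(struct swap_desc *d)
{
	if (d->id >= MAX_SWAP_IDS)
		return -1;
	swap_desc_table[d->id] = d;
	return 0;
}

/* The page table holds the swap id; one lookup yields the desc and,
 * with it, both the swap cache pointer and the backend. */
struct swap_desc *swap_desc_lookup(swp_entry_t id)
{
	if (id.val >= MAX_SWAP_IDS)
		return NULL;
	return swap_desc_table[id.val];
}

void swap_desc_set_slot(struct swap_desc *d, unsigned long offset)
{
	d->backend = ((uintptr_t)offset << 1) | SWAP_DESC_SLOT_TAG;
}

void swap_desc_set_zswap(struct swap_desc *d, struct zswap_entry *z)
{
	assert(((uintptr_t)z & SWAP_DESC_SLOT_TAG) == 0);
	d->backend = (uintptr_t)z;
}

int swap_desc_is_zswap(const struct swap_desc *d)
{
	return (d->backend & SWAP_DESC_SLOT_TAG) == 0;
}

unsigned long swap_desc_slot(const struct swap_desc *d)
{
	assert(!swap_desc_is_zswap(d));
	return (unsigned long)(d->backend >> 1);
}

struct zswap_entry *swap_desc_zswap(const struct swap_desc *d)
{
	assert(swap_desc_is_zswap(d));
	return (struct zswap_entry *)d->backend;
}
```

A caller that resolves the id also gets the swapcache pointer from the same desc, which is what makes the lookup "free" compared with a separate swap cache lookup; a real implementation would of course need the xarray's locking and RCU rules, which this sketch ignores.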
> > > It can also include all non-backend specific operations, such
> > > as the swapcache (which would be a simple pointer in swap_desc), swap
> >
> > Does the zswap entry still use the swap slot cache and swap_info_struct?
>
> In this design no, it shouldn't.
>
> > >
> > > This work enables using zswap without a backing swapfile and increases
> > > the swap capacity when zswap is used with a swapfile. It also creates
> > > a separation that allows us to skip code paths that don't make sense
> > > in the zswap path (e.g. readahead). We get to drop zswap's rbtree
> > > which might result in better performance (fewer lookups, less lock
> > > contention).
> > >
> > > The abstraction layer also opens the door for multiple cleanups (e.g.
> > > removing swapper address spaces, removing swap count continuation
> > > code, etc). Another nice cleanup that this work enables would be
> > > separating the overloaded swp_entry_t into two distinct types: one for
> > > things that are stored in page tables / caches, and one for actual swap
> > > entries. In the future, we can potentially further optimize how we use
> > > the bits in the page tables instead of sticking everything into the
> > > current type/offset format.
> >
> > Looking forward to seeing more details in the upcoming discussion.
> >
> > >
> > > ==================== Cost ====================
> > > The obvious downside of this is added memory overhead, specifically
> > > for users that use swapfiles without zswap. Instead of paying one byte
> > > (swap_map) for every potential page in the swapfile (+ swap count
> > > continuation), we pay the size of the swap_desc for every page that is
> > > actually in the swapfile, which I am estimating can be roughly around
> > > 24 bytes or so, so maybe 0.6% of swapped out memory. The overhead only
> > > scales with pages actually swapped out. For zswap users, it should be
> >
> > Is there a way to avoid turning 1 byte into 24 bytes per swapped page?
> > For the users that use swap but no zswap, this is pure overhead.
>
> That's what I could think of at this point. My idea was something like this:
>
> struct swap_desc {
>         union { /* Use one bit to distinguish them */
>                 swp_entry_t swap_entry;
>                 struct zswap_entry *zswap_entry;
>         };
>         struct folio *swapcache;
>         atomic_t swap_count;
>         u32 id;
> };
>
> Having the id in the swap_desc is convenient as we can directly map
> the swap_desc to a swp_entry_t to place in the page tables, but I
> don't think it's necessary. Without it, the struct size is 20 bytes,
> so I think the extra 4 bytes are okay to use anyway if the slab
> allocator only allocates multiples of 8 bytes.
>
> The idea here is to unify the swapcache and swap_count implementation
> between different swap backends (swapfiles, zswap, etc), which would
> create a better abstraction and reduce reinventing the wheel.
>
> We can reduce it to only 8 bytes and only store the swap/zswap entry, but
> we still need the swap cache anyway so might as well just store the
> pointer in the struct and have a unified lookup-free swapcache, so
> really 16 bytes is the minimum.
>
> If we stop at 16 bytes, then we need to handle swap count separately
> in swapfiles and zswap. This is not the end of the world, but are the
> 8 bytes worth this?
>
> Keep in mind that the current overhead is 1 byte O(max swap pages) not
> O(swapped). Also, 1 byte is assuming we do not use the swap

Just to share some info: Android usually uses the swap space fully most
of the time by compacting background apps, so O(swapped) ~= O(max swap
pages).
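A rough back-of-the-envelope check of the sizes and the overhead trade-off in this thread, assuming an LP64 target (8-byte pointers) and 4 KiB pages, and ignoring swap count continuation. Kernel types are replaced by simplified userspace stand-ins (atomic_t here is a plain int wrapper used only for layout), and swap_map_bytes(), swap_desc_bytes() and swap_desc_is_cheaper() are hypothetical helpers. Note that on such a target the compiler already pads the variant without id from 20 to 24 bytes, because the struct is 8-byte aligned.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified userspace stand-ins for kernel types (hypothetical). */
typedef struct { unsigned long val; } swp_entry_t;
typedef struct { int counter; } atomic_t; /* layout only; not atomic */
struct zswap_entry;
struct folio;

/* The layout proposed in the thread. */
struct swap_desc {
	union { /* one bit would distinguish the two */
		swp_entry_t swap_entry;
		struct zswap_entry *zswap_entry;
	};
	struct folio *swapcache;
	atomic_t swap_count;
	uint32_t id;
};

/* Same layout without id: 20 bytes of fields, padded up to the
 * 8-byte alignment of the union on LP64. */
struct swap_desc_no_id {
	union {
		swp_entry_t swap_entry;
		struct zswap_entry *zswap_entry;
	};
	struct folio *swapcache;
	atomic_t swap_count;
};

/* Current scheme: one swap_map byte per slot the swapfile can hold,
 * whether or not the slot is in use. */
uint64_t swap_map_bytes(uint64_t max_swap_pages)
{
	return max_swap_pages;
}

/* Proposed scheme: one swap_desc per page actually swapped out. */
uint64_t swap_desc_bytes(uint64_t swapped_pages)
{
	return swapped_pages * (uint64_t)sizeof(struct swap_desc);
}

/* Nonzero if the proposed scheme needs less metadata at this usage. */
int swap_desc_is_cheaper(uint64_t swapped_pages, uint64_t max_swap_pages)
{
	assert(swapped_pages <= max_swap_pages);
	return swap_desc_bytes(swapped_pages) < swap_map_bytes(max_swap_pages);
}
```

With 24-byte descs, the per-page cost is 24/4096, about 0.59% of swapped-out memory, and the two schemes break even when roughly 1/24 (about 4.2%) of the slots are in use. For a 4 GiB swapfile (1,048,576 slots of 4 KiB) the swap_map costs 1 MiB, while a fully used swapfile, as in the Android case above, would need 24 MiB of swap_descs.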