From: Yosry Ahmed
Date: Thu, 2 Mar 2023 16:49:01 -0800
Subject: Re: [LSF/MM/BPF TOPIC] Swap Abstraction / Native Zswap
To: Minchan Kim
Cc: Chris Li, lsf-pc@lists.linux-foundation.org, Johannes Weiner, Linux-MM, Michal Hocko, Shakeel Butt, David Rientjes, Hugh Dickins, Seth Jennings, Dan Streetman, Vitaly Wool, Yang Shi, Peter Xu, Andrew Morton

On Thu, Mar 2, 2023 at 4:33 PM Minchan Kim wrote:
>
> On Wed, Mar 01, 2023 at 04:30:22PM -0800, Yosry Ahmed wrote:
> > On Tue, Feb 28, 2023 at 3:11 PM Chris Li wrote:
> > >
> > > Hi Yosry,
> > >
> > > On Sat, Feb 18, 2023 at 02:38:40PM -0800, Yosry Ahmed wrote:
> > > > Hello everyone,
> > > >
> > > > I would like to propose a topic for the upcoming LSF/MM/BPF in May
> > > > 2023 about swap & zswap (hope I am not too late).
> > >
> > > I am very interested in participating in this discussion as well.
> >
> > That's great to hear!
> >
> > >
> > > > ==================== Objective ====================
> > > > Enabling the use of zswap without a backing swapfile, which makes
> > > > zswap useful for a wider variety of use cases. Also, when zswap is
> > > > used with a swapfile, the pages in zswap do not use up space in the
> > > > swapfile, so the overall swapping capacity increases.
> > >
> > > Agree.
> > >
> > > >
> > > > ==================== Idea ====================
> > > > Introduce a data structure, which I currently call a swap_desc, as an
> > > > abstraction layer between swapping implementation and the rest of MM
> > > > code. Page tables & page caches would store a swap id (encoded as a
> > > > swp_entry_t) instead of directly storing the swap entry associated
> > > > with the swapfile. This swap id maps to a struct swap_desc, which acts
> > >
> > > Can you provide a bit more detail? I am curious how this swap id
> > > maps into the swap_desc? Is the swp_entry_t cast into "struct
> > > swap_desc*" or going through some lookup table/tree?
> >
> > swap id would be an index in a radix tree (aka xarray), which contains
> > a pointer to the swap_desc struct. This lookup should be free with
> > this design as we also use swap_desc to directly store the swap cache
> > pointer, so this lookup essentially replaces the swap cache lookup.
> >
> > >
> > > > as our abstraction layer. All MM code not concerned with swapping
> > > > details would operate in terms of swap descs. The swap_desc can point
> > > > to either a normal swap entry (associated with a swapfile) or a zswap
> > > > entry. It can also include all non-backend specific operations, such
> > > > as the swapcache (which would be a simple pointer in swap_desc), swap
> > >
> > > Does the zswap entry still use the swap slot cache and swap_info_struct?
> >
> > In this design no, it shouldn't.
> >
> > >
> > > > This work enables using zswap without a backing swapfile and increases
> > > > the swap capacity when zswap is used with a swapfile. It also creates
> > > > a separation that allows us to skip code paths that don't make sense
> > > > in the zswap path (e.g. readahead). We get to drop zswap's rbtree
> > > > which might result in better performance (less lookups, less lock
> > > > contention).
> > > >
> > > > The abstraction layer also opens the door for multiple cleanups (e.g.
> > > > removing swapper address spaces, removing swap count continuation
> > > > code, etc). Another nice cleanup that this work enables would be
> > > > separating the overloaded swp_entry_t into two distinct types: one for
> > > > things that are stored in page tables / caches, and for actual swap
> > > > entries. In the future, we can potentially further optimize how we use
> > > > the bits in the page tables instead of sticking everything into the
> > > > current type/offset format.
> > >
> > > Looking forward to seeing more details in the upcoming discussion.
> > > >
> > > > ==================== Cost ====================
> > > > The obvious downside of this is added memory overhead, specifically
> > > > for users that use swapfiles without zswap. Instead of paying one byte
> > > > (swap_map) for every potential page in the swapfile (+ swap count
> > > > continuation), we pay the size of the swap_desc for every page that is
> > > > actually in the swapfile, which I am estimating can be roughly around
> > > > 24 bytes or so, so maybe 0.6% of swapped out memory. The overhead only
> > > > scales with pages actually swapped out. For zswap users, it should be
> > >
> > > Is there a way to avoid turning 1 byte into 24 byte per swapped
> > > pages? For the users that use swap but no zswap, this is pure overhead.
> >
> > That's what I could think of at this point.
> > My idea was something like this:
> >
> > struct swap_desc {
> >     union { /* Use one bit to distinguish them */
> >         swp_entry_t swap_entry;
> >         struct zswap_entry *zswap_entry;
> >     };
> >     struct folio *swapcache;
> >     atomic_t swap_count;
> >     u32 id;
> > }
> >
> > Having the id in the swap_desc is convenient as we can directly map
> > the swap_desc to a swp_entry_t to place in the page tables, but I
> > don't think it's necessary. Without it, the struct size is 20 bytes,
> > so I think the extra 4 bytes are okay to use anyway if the slab
> > allocator only allocates multiples of 8 bytes.
> >
> > The idea here is to unify the swapcache and swap_count implementation
> > between different swap backends (swapfiles, zswap, etc), which would
> > create a better abstraction and reduce reinventing the wheel.
> >
> > We can reduce to only 8 bytes and only store the swap/zswap entry, but
> > we still need the swap cache anyway so might as well just store the
> > pointer in the struct and have a unified lookup-free swapcache, so
> > really 16 bytes is the minimum.
> >
> > If we stop at 16 bytes, then we need to handle swap count separately
> > in swapfiles and zswap. This is not the end of the world, but are the
> > 8 bytes worth this?
> >
> > Keep in mind that the current overhead is 1 byte O(max swap pages) not
> > O(swapped). Also, 1 byte is assuming we do not use the swap
>
> Just to share info:
>
> Android usually used swap space fully most of times via Compacting
> background Apps so O(swapped) ~= O(max swap pages).

Thanks for sharing this, that's definitely interesting. What percentage
of memory is usually provisioned as swap in such cases? Would you
consider an extra overhead of ~8M per 1G of swapped memory particularly
unacceptable?
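For reference, the size and overhead figures discussed in this thread can be checked with a small userspace sketch. This is not kernel code: the typedefs below are stand-ins that mirror the layouts of the kernel's swp_entry_t and atomic_t, swap_desc_overhead() and check_swap_desc_layout() are hypothetical helpers, and the numbers assume a 64-bit build with 4 KiB pages. The 32-byte case assumes the 24-byte struct is rounded up to a 32-byte slab object; that is only one possible reading of the ~8M per 1G figure, since the thread does not say how it was derived.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-ins for kernel types (assumption: 64-bit build). */
typedef struct { unsigned long val; } swp_entry_t;
typedef struct { int counter; } atomic_t;
struct zswap_entry;	/* opaque here */
struct folio;		/* opaque here */

/* Same member order as the struct proposed in the thread. */
struct swap_desc {
	union {		/* one bit would distinguish the two */
		swp_entry_t swap_entry;
		struct zswap_entry *zswap_entry;
	};
	struct folio *swapcache;
	atomic_t swap_count;
	uint32_t id;
};

/*
 * Hypothetical helper: bytes of descriptor metadata needed to track
 * swapped_bytes of swapped-out memory, one descriptor per page.
 */
static uint64_t swap_desc_overhead(uint64_t swapped_bytes,
				   uint64_t page_size,
				   uint64_t per_desc_bytes)
{
	return (swapped_bytes / page_size) * per_desc_bytes;
}

/* Hypothetical helper: checks the layout claims on a 64-bit build. */
static void check_swap_desc_layout(void)
{
	/* The members before id occupy 20 bytes... */
	assert(offsetof(struct swap_desc, id) == 20);
	/* ...and the 4-byte id fills what would otherwise be padding. */
	assert(sizeof(struct swap_desc) == 24);
}
```

Under these assumptions, swap_desc_overhead(1ULL << 30, 4096, 24) comes to 6 MiB and the same call with 32 bytes per descriptor comes to 8 MiB. 24 bytes per 4 KiB page is about 0.59% of swapped-out memory, in line with the "maybe 0.6%" estimate quoted above.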