From: Yosry Ahmed
Date: Sat, 18 Feb 2023 14:38:40 -0800
Subject: [LSF/MM/BPF TOPIC] Swap Abstraction / Native Zswap
To: lsf-pc@lists.linux-foundation.org, Johannes Weiner
Cc: Linux-MM, Michal Hocko, Shakeel Butt, David Rientjes, Hugh Dickins, Seth Jennings, Dan Streetman, Vitaly Wool, Yang Shi, Peter Xu, Minchan Kim, Andrew Morton

Hello everyone,

I would like to propose a topic for the upcoming LSF/MM/BPF in May 2023 about swap & zswap (hope I am not too late).

==================== Intro ====================
Currently, using zswap is dependent on swapfiles in an unnecessary way. To use zswap, you need a swapfile configured (even if the space will not be used), and zswap's capacity is restricted by the swapfile's size. When pages reside in zswap, the corresponding swap entry in the swapfile cannot be used, and is essentially wasted.
We also go through unnecessary code paths when using zswap, such as finding and allocating a swap entry on the swapout path, or readahead on the swapin path.

I am proposing a swapping abstraction layer that would allow us to remove zswap's dependency on swapfiles. This can be done by introducing a data structure between the actual swapping implementation (swapfiles, zswap) and the rest of the MM code.

==================== Objective ====================
Enable the use of zswap without a backing swapfile, which makes zswap useful for a wider variety of use cases. Also, when zswap is used with a swapfile, the pages in zswap would not use up space in the swapfile, so the overall swapping capacity increases.

==================== Idea ====================
Introduce a data structure, which I currently call a swap_desc, as an abstraction layer between the swapping implementation and the rest of the MM code. Page tables & page caches would store a swap id (encoded as a swp_entry_t) instead of directly storing the swap entry associated with the swapfile. This swap id maps to a struct swap_desc, which acts as our abstraction layer. All MM code not concerned with swapping details would operate in terms of swap_descs.

The swap_desc can point to either a normal swap entry (associated with a swapfile) or a zswap entry. It can also hold all backend-agnostic state and operations, such as the swapcache (which would be a simple pointer in swap_desc), swap counting, etc. This creates a clean abstraction layer between MM code and the actual swapping implementation.

==================== Benefits ====================
This work enables using zswap without a backing swapfile and increases the swap capacity when zswap is used with a swapfile. It also creates a separation that allows us to skip code paths that don't make sense in the zswap path (e.g. readahead). We get to drop zswap's rbtree, which might result in better performance (fewer lookups, less lock contention).
The abstraction layer also opens the door for multiple cleanups (e.g. removing swapper address spaces, removing swap count continuation code, etc). Another nice cleanup that this work enables would be separating the overloaded swp_entry_t into two distinct types: one for things that are stored in page tables / caches, and one for actual swap entries. In the future, we can potentially further optimize how we use the bits in the page tables instead of sticking everything into the current type/offset format.

Another potential win is swapoff, which could become more practical by directly scanning all swap_descs instead of going through page tables and shmem page caches.

Overall, zswap becomes more accessible and available to a wider range of use cases.

==================== Cost ====================
The obvious downside of this is added memory overhead, specifically for users that use swapfiles without zswap. Instead of paying one byte (swap_map) for every potential page in the swapfile (+ swap count continuation), we pay the size of the swap_desc for every page that is actually in the swapfile. I estimate this at roughly 24 bytes, or about 0.6% of swapped-out memory (24 / 4096, assuming 4 KiB pages). The overhead only scales with pages actually swapped out. For zswap users, it should be a win (or at least break even) because we get to drop a lot of fields from struct zswap_entry (e.g. rbtree, index, etc).

Another potential concern is readahead. With this design, we have no way to get a swap_desc given a swap entry (type & offset). We would need to maintain a reverse mapping, adding a little bit more overhead, or search all swapped-out pages instead :). A reverse mapping might bump the per-swapped-page overhead to ~32 bytes (~0.8% of swapped-out memory).

==================== Bottom Line ====================
It would be nice to discuss the potential here and the tradeoffs. I believe other folks using zswap (or interested in using it) may find this very useful.
I am sure I am missing some context on why things are the way they are, and perhaps some obvious holes in my story. Looking forward to discussing this with anyone interested :)

I think Johannes may be interested in attending this discussion, since a lot of ideas here are inspired by discussions I had with him :)