From: Yosry Ahmed
Date: Tue, 28 Feb 2023 00:12:05 -0800
Subject: Re: [LSF/MM/BPF TOPIC] Swap Abstraction / Native Zswap
To: Sergey Senozhatsky
Cc: lsf-pc@lists.linux-foundation.org, Johannes Weiner, Linux-MM,
 Michal Hocko, Shakeel Butt, David Rientjes, Hugh Dickins,
 Seth Jennings, Dan Streetman, Vitaly Wool, Yang Shi, Peter Xu,
 Minchan Kim, Andrew Morton

On Mon, Feb 27, 2023 at 8:54 PM Sergey Senozhatsky wrote:
>
> On (23/02/18 14:38), Yosry Ahmed wrote:
> [..]
> > ==================== Idea ====================
> > Introduce a data structure, which I currently call a swap_desc, as an
> > abstraction layer between swapping implementation and the rest of MM
> > code. Page tables & page caches would store a swap id (encoded as a
> > swp_entry_t) instead of directly storing the swap entry associated
> > with the swapfile. This swap id maps to a struct swap_desc, which acts
> > as our abstraction layer. All MM code not concerned with swapping
> > details would operate in terms of swap descs. The swap_desc can point
> > to either a normal swap entry (associated with a swapfile) or a zswap
> > entry. It can also include all non-backend specific operations, such
> > as the swapcache (which would be a simple pointer in swap_desc), swap
> > counting, etc. It creates a clear, nice abstraction layer between MM
> > code and the actual swapping implementation.
> >
> > ==================== Benefits ====================
> > This work enables using zswap without a backing swapfile and increases
> > the swap capacity when zswap is used with a swapfile. It also creates
> > a separation that allows us to skip code paths that don't make sense
> > in the zswap path (e.g. readahead). We get to drop zswap's rbtree
> > which might result in better performance (less lookups, less lock
> > contention).
> >
> > The abstraction layer also opens the door for multiple cleanups (e.g.
> > removing swapper address spaces, removing swap count continuation
> > code, etc). Another nice cleanup that this work enables would be
> > separating the overloaded swp_entry_t into two distinct types: one for
> > things that are stored in page tables / caches, and for actual swap
> > entries. In the future, we can potentially further optimize how we use
> > the bits in the page tables instead of sticking everything into the
> > current type/offset format.
> >
> > Another potential win here can be swapoff, which can be more practical
> > by directly scanning all swap_desc's instead of going through page
> > tables and shmem page caches.
> >
> > Overall zswap becomes more accessible and available to a wider range
> > of use cases.
>
> I assume this also brings us closer to a proper writeback LRU handling?

I assume by proper LRU handling you mean:
- A swap writeback LRU that lives outside of the zpool backends (i.e.,
  in zswap itself or even outside zswap).
- A fix for the case where we temporarily skip zswap and write directly
  to the backing swapfile while zswap is full, until it performs some
  writeback in the background.

This work is orthogonal to that, but it is on the list of things that
we would like to do for zswap.

I guess you are mainly eager to move the writeback logic outside of
zsmalloc, or is there a different motivation? :)
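
To make the swap_desc idea quoted above a bit more concrete, here is a
minimal userspace sketch of what such a descriptor might look like. It
is only an illustration under stated assumptions: every name in it
(struct swap_slot, struct zswap_entry_stub, swap_id_table,
swap_id_alloc(), swap_id_lookup(), swap_desc_writeback()) is
hypothetical and not an existing kernel API, the flat array stands in
for whatever id-to-descriptor mapping would really be used, and locking
is ignored entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* All names below are hypothetical; none of this is existing kernel code. */

enum swap_backend {
    SWAP_BACKEND_SWAPFILE, /* data lives in a swapfile slot */
    SWAP_BACKEND_ZSWAP,    /* data lives compressed in memory */
};

/* Stand-in for an actual swap entry in the type/offset format. */
struct swap_slot {
    unsigned int type;
    uint64_t offset;
};

/* Stand-in for a zswap entry. */
struct zswap_entry_stub {
    size_t compressed_len;
};

/* The abstraction layer: backend-independent state plus one backend. */
struct swap_desc {
    enum swap_backend backend;
    union {
        struct swap_slot slot;
        struct zswap_entry_stub *zswap;
    } u;
    void *swapcache;         /* simple pointer to the cached page, or NULL */
    unsigned int swap_count; /* references from page tables / page caches */
};

#define SWAP_ID_MAX 64

/* Assumed mapping from swap id to descriptor; a real one would not be
 * a fixed-size array. */
static struct swap_desc *swap_id_table[SWAP_ID_MAX];

/* Install a descriptor and return its swap id, or -1 if the table is full. */
int swap_id_alloc(struct swap_desc *desc)
{
    for (int id = 0; id < SWAP_ID_MAX; id++) {
        if (!swap_id_table[id]) {
            swap_id_table[id] = desc;
            return id;
        }
    }
    return -1;
}

/* Resolve a swap id, as stored in a page table, to its descriptor. */
struct swap_desc *swap_id_lookup(int id)
{
    if (id < 0 || id >= SWAP_ID_MAX)
        return NULL;
    return swap_id_table[id];
}

/* Move the data from zswap to a swapfile slot. The swap id, and so
 * whatever the page tables store, stays unchanged. */
void swap_desc_writeback(struct swap_desc *desc, struct swap_slot slot)
{
    assert(desc->backend == SWAP_BACKEND_ZSWAP);
    desc->backend = SWAP_BACKEND_SWAPFILE;
    desc->u.slot = slot;
}
```

The point of the sketch is only the indirection: callers hold a swap id
and never see which backend is behind it, so switching a descriptor
from zswap to a swapfile slot does not require touching the id.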