From: Yosry Ahmed
Date: Mon, 27 Mar 2023 23:29:20 -0700
Subject: Re: [LSF/MM/BPF TOPIC] Swap Abstraction / Native Zswap
To: "Huang, Ying"
Cc: Chris Li, lsf-pc@lists.linux-foundation.org, Johannes Weiner, Linux-MM, Michal Hocko, Shakeel Butt, David Rientjes, Hugh Dickins, Seth Jennings, Dan Streetman, Vitaly Wool, Yang Shi, Peter Xu, Minchan Kim, Andrew Morton, Aneesh Kumar K V, Wei Xu
In-Reply-To: <87jzz1pfb3.fsf@yhuang6-desk2.ccr.corp.intel.com>

On Mon, Mar 27, 2023 at 11:22 PM Huang, Ying wrote:
>
> Yosry Ahmed writes:
>
> > On Sun, Mar 26, 2023 at 6:24 PM Huang, Ying wrote:
> >>
> >> Chris Li writes:
> >>
> >> > On Fri, Mar 24, 2023 at 12:28:31AM -0700, Yosry Ahmed wrote:
> >> >> > In fact, I just suggest to use the minimal design on top of the current
> >> >> > implementation as the first step. Then, you can improve it step by
> >> >> > step.
> >> >> >
> >> >> > The first step could be the minimal effort to implement indirection
> >> >> > layer and moving swapped pages between swap implementations. Based on
> >> >> > that, you can build other optimizations, such as pulling swap counting
> >> >> > to the swap core. For each step, we can evaluate the gain and cost with
> >> >> > data.
> >> >>
> >> >> Right, I understand that, but to implement the indirection layer on
> >> >> top of the current implementation, then we will need to support using
> >> >> zswap without a backing swap device. In order to do this without
> >> >
> >> > Agree with Ying on the minimal approach here as well.
> >> >
> >> > There are two ways to approach this.
> >> >
> >> > 1) Forget zswap, make a minimal implementation to move the page between
> >> > two swapfile device. It can be swapfile back to two loop back files.
> >> >
> >> > Any indirect layer you design will need to convert this usage case
> >> > any way.
> >> >
> >> > 2) Make zswap work without a swapfile.
> >> > You can implement the zswap on a fake ghosts swap file.
> >> >
> >> > If you keep the zswap as frontswap, just make zswap can work without
> >> > a real swapfile.
> >> >
> >> > Make that as your first minimal step. Then it does not need to touch
> >> > the swap count changes.
> >> >
> >> > I view make that step is independent of moving pages between swap device.
> >> >
> >> > That patch exists and I consider it has value to some users.
> >>
> >> This sounds like an even smaller approach as the first step. Further
> >> improvement can be built on top of it.
> >
> > I am not sure how this would be a step towards the abstraction goal we
> > have been discussing.
> >
> > We have been discussing starting out with a minimal indirection layer,
> > in the shape of an xarray that maps a swap ID to a swap entry, and
> > that can be disabled with a config option.
> >
> > For such a design to work, we have to implement swap entry management
> > & swap counting in zswap, right? Am I missing something?
>
> Chris suggested to avoid to implement the swap entry management & swap
> counting in zswap via using a "fake ghost swap file". Copied his
> suggestion as below,

Right, we have been using ghost swapfiles at Google for a while. They
are basically sparse files that you can never actually write to; they
exist only so that we can use zswap without a backing swap device.

What I do not understand is how this is a step towards the ultimate
goal of swap abstraction. Is the idea to have the indirection layer
only support moving swapped pages between swapfiles, and have those
"ghost" swapfiles be on a higher tier than normal swapfiles? In this
case, I am guessing we eliminate the writeback logic from zswap itself
and move it to this indirection layer.

I don't have a problem with this approach, but it is not really clean,
as we still treat zswap as a swapfile and have to deal with a lot of
unnecessary code, such as swap slot handling. We also have to
unnecessarily limit the size of zswap to the size of this fake
swapfile. In other words, we retain a lot of the limitations that we
have today.

Keep in mind that supporting ghost swapfiles is something that is
exposed to userspace, so we have to commit to supporting it -- it
can't just be an incremental step that we will change later.

With all that said, it is certainly a much simpler "solution".
I am interested to hear thoughts on this; we can certainly pursue it
if people think it is the right way to move forward.

>
> "
> >> > 2) Make zswap work without a swapfile.
> >> > You can implement the zswap on a fake ghosts swap file.
> >> >
> >> > If you keep the zswap as frontswap, just make zswap can work without
> >> > a real swapfile.
> >> >
> >> > Make that as your first minimal step. Then it does not need to touch
> >> > the swap count changes.
> "
>
> Best Regards,
> Huang, Ying
>
> >>
> >> >> > Anyway, I don't think you can just implement all your final solution in
> >> >> > one step. And, I think the minimal design suggested could be a starting
> >> >> > point.
> >> >>
> >> >> I agree that's a great point, I am just afraid that we will avoid
> >> >> implementing that full final solution and instead do a lot of work
> >> >> inside zswap to make up for the difference (e.g. swap entry
> >> >> management, swap counting). Also, that work in zswap may end up being
> >> >> unacceptable due to the maintenance burden and/or complexity.
> >> >
> >> > If you do either 1) or 2), you can keep these two paths separate.
> >> >
> >> > Even if you want to move the page between zswap and swapfile.
> >> >
> >> > Idea 3)
> >> > You don't have to change the swap count code, you can do a
> >> > minimal change moves the page between zswap and another block
> >> > device. That way you can get two different swap entry with
> >> > existing code.
> >> >
> >> > Chris
> >> >
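
For illustration only, the minimal indirection layer mentioned in this
thread (a map from a swap ID to a swap entry) can be modelled as a toy
in Python. This is a rough sketch under stated assumptions, not kernel
code: a plain dict stands in for the xarray, and the names SwapEntry,
SwapIndirection, store, lookup, move and free are all hypothetical. It
only shows why a stable ID would let a page move between backends
(for example from zswap to a swapfile) without its users noticing.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class SwapEntry:
    """Where a swapped-out page currently lives (hypothetical model)."""
    backend: str  # illustrative label such as "zswap" or "swapfile"
    slot: int     # slot/offset inside that backend


class SwapIndirection:
    """Toy indirection layer: stable swap ID -> current swap entry.

    A dict stands in for the xarray discussed in the thread.
    """

    def __init__(self) -> None:
        self._map: Dict[int, SwapEntry] = {}
        self._next_id = 0

    def store(self, entry: SwapEntry) -> int:
        """Record a newly swapped-out page and return its swap ID."""
        swap_id = self._next_id
        self._next_id += 1
        self._map[swap_id] = entry
        return swap_id

    def lookup(self, swap_id: int) -> Optional[SwapEntry]:
        """Return the current entry for an ID, or None if it was freed."""
        return self._map.get(swap_id)

    def move(self, swap_id: int, new_entry: SwapEntry) -> SwapEntry:
        """Retarget an ID to a new backend slot; return the old entry."""
        old = self._map[swap_id]
        self._map[swap_id] = new_entry
        return old

    def free(self, swap_id: int) -> None:
        """Drop the mapping once the page is no longer swapped out."""
        del self._map[swap_id]


# The ID handed out at store() time stays valid across a move.
layer = SwapIndirection()
sid = layer.store(SwapEntry("zswap", 7))
layer.move(sid, SwapEntry("swapfile", 42))
print(layer.lookup(sid))
```

In this model, whoever holds the ID never sees the backend change;
swap counting and slot management are deliberately left out, since
where they should live is exactly what the thread is debating.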