From: "Huang, Ying"
To: Chris Li
Cc: Yosry Ahmed, lsf-pc@lists.linux-foundation.org, Johannes Weiner,
 Linux-MM, Michal Hocko, Shakeel Butt, David Rientjes, Hugh Dickins,
 Seth Jennings, Dan Streetman, Vitaly Wool, Yang Shi, Peter Xu,
 Minchan Kim, Andrew Morton, Aneesh Kumar K V, Michal Hocko, Wei Xu
Subject: Re: [LSF/MM/BPF TOPIC] Swap Abstraction / Native Zswap
References: <87356e850j.fsf@yhuang6-desk2.ccr.corp.intel.com>
 <87y1o571aa.fsf@yhuang6-desk2.ccr.corp.intel.com>
 <87o7ox762m.fsf@yhuang6-desk2.ccr.corp.intel.com>
 <87bkkt5e4o.fsf@yhuang6-desk2.ccr.corp.intel.com>
 <87y1ns3zeg.fsf@yhuang6-desk2.ccr.corp.intel.com>
Date: Thu, 23 Mar 2023 08:56:43 +0800
In-Reply-To: (Chris Li's message of "Sun, 19 Mar 2023 23:25:54 -0700")
Message-ID: <878rfothdg.fsf@yhuang6-desk2.ccr.corp.intel.com>

Chris Li writes:

> On Mon, Mar 20, 2023 at 10:55:03AM +0800, Huang, Ying wrote:
>> >
>> > How so? With the indirection enabled, the page tables & page cache
>> > have the swap id (or swap_desc index), which can point to a swap entry
>> > or a zswap entry -- which can change when the page is moved between
>> > zswap & swapfiles. How is xarray (a) indexed by the swap entry in this
>> > case? Shouldn't be indexed by the abstract swap id so that the
>> > writeback from zswap is transparent?
>>
>> In my mind,
>>
>> - swap core will define an abstract interface to swap implementations
>>   (zswap, swap device/file, maybe more in the future), like VFS.
>
> I like your idea very much.

Thanks!

>>
>> - zswap will be a special swap implementation (compressing instead of
>>   writing to disk).
>
> Agree.
>
>>
>> - swap core will manage the indirection layer and swap cache.
>
> Agree, those are very good points.
>
>>
>> - swap core can move swap pages between swap implementations (e.g., from
>>   zswap to a swap device, or from one swap device to another swap
>>   device) with the help of the indirection layer.
>
> We need to carefully design the swap cache so that, when moving between
> swap implementations, there will be one shared swap cache. The current
> swap cache belongs to swap devices, so two devices will have the same
> page in two swap caches.

We can remove the page from the swap cache of swap device A, then
insert it into the swap cache of swap device B.  The swap entry will
change too.

>> In this design, the writeback from zswap becomes moving swapped pages
>> from zswap to a swap device.
>
> Ack.
>
>>
>> If my understanding is correct, your suggestion is kind of moving
>> zswap logic to the swap core?  And zswap will always be at a higher
>> layer on top of swap device/file?
>
> It seems that way to me. I will let Yosry confirm that.
>
>> > I am not sure how this works with zswap. Currently swap_map[]
>> > implementation is specific for swapfiles, it does not work for zswap
>> > unless we implement separate swap counting logic for zswap &
>> > swapfiles. Same for the swapcache, it currently supports being indexed
>> > by a swap entry, it would need to support being indexed by a swap id,
>> > or have a separate swap cache for zswap. Having separate
>> > implementation would add complexity, and we would need to perform
>> > handoffs of the swap count/cache when a page is moved from zswap to a
>> > swapfile.
>>
>> We can allocate a swap entry for each swapped page in zswap.
>
> One thing to consider when moving a page from zswap to a swap file: is
> the zswap swap entry the same entry as the swap file entry?

I think that the swap entry will change after the move.  A swap entry is
local to a swap device, while the swap desc ID does not change.  That is
why we need the indirection layer.

>> > I think for this proposal, there are only 2 hardcoded tiers. Zswap is
>> > fast, swapfile is slow. In the future, we can support more dynamic
>> > tiering if the need arises.
>>
>> We can start from a simple implementation.  And I think that it's better
>> to consider the general design too.  Try not to make it impossible now.
>
> In my mind there are a few usage cases:
> 1) using only swap file.
> 2) using only zswap, no swap file.
> 3) Using zswap + swap file (SSD).
>
> The swap core should handle all 3 cases well with minimal memory waste.

Yes.  Agree.

Best Regards,
Huang, Ying
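
A rough userspace sketch of the indirection idea discussed above may make
the "entry changes, desc ID does not" point concrete.  It is not kernel
code and not a proposed implementation.  Every name in it (swap_backend_ops,
swap_desc, swap_desc_alloc, swap_desc_move, swap_desc_lookup, the fixed
table sizes) is hypothetical, and page data, reference counts, locking and
the swap cache are deliberately left out.

```c
#include <assert.h>

/* Hypothetical names throughout; none of this is an existing kernel API. */

enum backend_id { BACKEND_ZSWAP, BACKEND_SWAPFILE, NR_BACKENDS };

#define SLOTS_PER_BACKEND 8
#define NR_DESCS 16

/* A swap entry is local to one backend: (backend, slot). */
struct swap_entry {
	enum backend_id backend;
	long slot;
};

/* VFS-like operations that each swap implementation would provide. */
struct swap_backend_ops {
	const char *name;
	long (*alloc_slot)(void);	/* returns a slot, or -1 if full */
	void (*free_slot)(long slot);
};

static unsigned char slot_used[NR_BACKENDS][SLOTS_PER_BACKEND];

/* First-fit slot allocator shared by both toy backends. */
static long alloc_slot_in(enum backend_id b)
{
	for (long i = 0; i < SLOTS_PER_BACKEND; i++) {
		if (!slot_used[b][i]) {
			slot_used[b][i] = 1;
			return i;
		}
	}
	return -1;
}

static long zswap_alloc(void) { return alloc_slot_in(BACKEND_ZSWAP); }
static void zswap_free(long slot) { slot_used[BACKEND_ZSWAP][slot] = 0; }
static long swapfile_alloc(void) { return alloc_slot_in(BACKEND_SWAPFILE); }
static void swapfile_free(long slot) { slot_used[BACKEND_SWAPFILE][slot] = 0; }

static const struct swap_backend_ops backends[NR_BACKENDS] = {
	[BACKEND_ZSWAP]    = { "zswap",    zswap_alloc,    zswap_free },
	[BACKEND_SWAPFILE] = { "swapfile", swapfile_alloc, swapfile_free },
};

/* Indirection layer: a stable descriptor ID maps to the current entry. */
struct swap_desc {
	int in_use;
	struct swap_entry entry;
};

static struct swap_desc descs[NR_DESCS];

/* Swap a page out to backend b; returns the stable desc ID, or -1. */
long swap_desc_alloc(enum backend_id b)
{
	for (long id = 0; id < NR_DESCS; id++) {
		if (descs[id].in_use)
			continue;
		long slot = backends[b].alloc_slot();
		if (slot < 0)
			return -1;
		descs[id].in_use = 1;
		descs[id].entry = (struct swap_entry){ b, slot };
		return id;
	}
	return -1;
}

/* Move a swapped page to another backend; the desc ID stays the same. */
int swap_desc_move(long id, enum backend_id to)
{
	assert(id >= 0 && id < NR_DESCS && descs[id].in_use);
	struct swap_entry old = descs[id].entry;
	long slot = backends[to].alloc_slot();
	if (slot < 0)
		return -1;
	/* A real implementation would copy or write back the data here. */
	backends[old.backend].free_slot(old.slot);
	descs[id].entry = (struct swap_entry){ to, slot };
	return 0;
}

/* Resolve a desc ID to whatever backend-local entry it points at now. */
struct swap_entry swap_desc_lookup(long id)
{
	assert(id >= 0 && id < NR_DESCS && descs[id].in_use);
	return descs[id].entry;
}
```

In this sketch, writeback from zswap is just swap_desc_move(id,
BACKEND_SWAPFILE): whatever holds the desc ID (page tables, swap cache
index) would not need updating, while swap_desc_lookup(id) returns a
different backend-local entry before and after the move.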