From: "Huang, Ying"
To: Chris Li
Cc: Yosry Ahmed, lsf-pc@lists.linux-foundation.org, Johannes Weiner,
 Linux-MM, Michal Hocko, Shakeel Butt, David Rientjes, Hugh Dickins,
 Seth Jennings, Dan Streetman, Vitaly Wool, Yang Shi, Peter Xu,
 Minchan Kim, Andrew Morton, Aneesh Kumar K V, Michal Hocko, Wei Xu
Subject: Re: [LSF/MM/BPF TOPIC] Swap Abstraction / Native Zswap
References: <87jzz1pfb3.fsf@yhuang6-desk2.ccr.corp.intel.com>
 <87fs9ppdhz.fsf@yhuang6-desk2.ccr.corp.intel.com>
 <87bkkcpckw.fsf@yhuang6-desk2.ccr.corp.intel.com>
Date: Tue, 04 Apr 2023 16:24:58 +0800
In-Reply-To: (Chris Li's message of "Wed, 29 Mar 2023 09:04:20 -0700")
Message-ID: <87ttxwxdet.fsf@yhuang6-desk2.ccr.corp.intel.com>

Chris Li writes:

> On Tue, Mar 28, 2023 at 06:41:54PM -0700, Yosry Ahmed wrote:
>> My main concern here would be having two separate swap counting
>> implementations -- although it might not be the end of the world. It
>> would be useful to consider all the options. So far, I think we have
>
> Agree.
>
>> been discussing 3 alternatives:
>>
>> (a) The initial swap_desc proposal.
>> (b) Add an optional indirection layer that can move swap entries
>> between swap devices and add a virtual swap device for zswap in the
>> kernel.
>
> For completeness' sake, let me add some options that have both pros
> and cons.
>
> (d) There is Google's ghost swap file. I understand it means a bit
> of ABI change. It has the advantage that it allows more than one
> zswap swapfile. Google uses it that way. Another consideration is
> that the ghost swap file is compatible with existing swapon behavior.
> You can see how many swap entries are used from the swapon summary.
> Some applications might depend on that.
>
> We might be able to find some way to break the ABI less.
>
>> (c) Add an optional indirection layer that can move entries between
>> different swap backends. Swap backends would be zswap & swap devices
>> for now. Zswap needs to implement swap entry management, swap
>> counting, etc.
> (f) I have been thinking of variants of (b) without adding a virtual
> swap device for zswap, using the ghost swap file instead.
>
> Also, the indirection is optional per swap entry at run time.
> Some swap devices can have some entries moved to another swap device.
> Only those swap entries pay the price of the indirection layer.
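
To make the "optional per swap entry" indirection in (f) concrete, here
is a rough userspace sketch in Python. It is only an illustration under
stated assumptions, not a description of any kernel code: the
SwapIndirection class, the (device, offset) tuple standing in for a swap
entry, and the ZSWAP/DISK ids are all hypothetical. The only point is
that entries which were never moved do not appear in the table, so they
pay neither memory nor an extra lookup hop.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

# Hypothetical model: a swap entry is (device_id, offset).
# None of these names correspond to real kernel symbols.
SwapEntry = Tuple[int, int]


@dataclass
class SwapIndirection:
    # Only entries that have been moved get a record here.
    moved: Dict[SwapEntry, SwapEntry] = field(default_factory=dict)

    def move(self, old: SwapEntry, new: SwapEntry) -> None:
        """Record that the data behind `old` now lives at `new`.

        Moving the same entry again simply overwrites the record.
        """
        self.moved[old] = new

    def resolve(self, entry: SwapEntry) -> SwapEntry:
        """Return where the data for `entry` actually lives.

        Entries that were never moved resolve to themselves.
        """
        return self.moved.get(entry, entry)


ZSWAP, DISK = 0, 1                    # assumed device ids, illustration only
table = SwapIndirection()
table.move((ZSWAP, 42), (DISK, 7))    # one entry moved to another device
print(table.resolve((ZSWAP, 42)))     # moved entry: takes the extra hop
print(table.resolve((ZSWAP, 43)))     # untouched entry: no record, no cost
```

A real design would also have to deal with swap counts, locking and
freeing the old slot, none of which is modeled here.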
>
> (e) This is the long term goal I have in mind. A VFS-like
> implementation for swap files. Let's call it VSW.
> This allows different swap devices to use different
> swap file system implementations.

I like this too!

> A lot of the difficult trade-offs we have right now:
> a smaller per-entry up-front allocation like swap_map[] for all
> entries vs. only allocating memory for swap entries that have been
> swapped out, but with a larger per-entry allocation.

Yes.

> I believe some of those trade-offs can be addressed by having a
> different swap file system. I do mean a different "mkswap"
> kind of file system.

We may not need that, because the swap on-disk format needn't be
permanent across reboots.

> We can write out some of the swap
> entry metadata to the swap file system as well. It means
> we don't have to pay the larger per swap entry allocation overhead
> for very cold pages. It might take two reads to swap
> in some of the very cold swap entries. But that should be rare.

Sounds like a good idea.  At least it can be investigated further.

> It can offer benefits for swapping out larger folios as well.
> Right now swapping out large folios still needs to go through
> the per-4k-page swap index allocation and breakdown.
>
> Basically, modernize the swap file system.
>
> It should be possible to implement the redirection layer within VSW
> as well.
>
> I know that is a very ambitious plan :-)

Yes.

> We can do that incrementally. The swap file system doesn't need
> much backward compatibility across reboots, so it should be easier
> than a normal file system.

Agree.

Best Regards,
Huang, Ying
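
For reference, the allocation trade-off quoted above (a small per-entry
cost paid up front for every slot, versus a larger per-entry cost paid
only for entries that are actually swapped out) can be put into rough
numbers. The Python sketch below is only back-of-the-envelope
arithmetic; the function names are made up for illustration, and the
sizes used (1 byte per slot up front, 32 bytes per on-demand entry, a
64 GiB swap device, 4 KiB pages) are assumptions, not measurements of
any real implementation.

```python
def upfront_bytes(total_slots: int, bytes_per_slot: int) -> int:
    """Metadata cost when every slot is paid for at swapon time."""
    return total_slots * bytes_per_slot


def on_demand_bytes(used_slots: int, bytes_per_entry: int) -> int:
    """Metadata cost when only swapped-out entries are paid for."""
    return used_slots * bytes_per_entry


def break_even_used_slots(total_slots: int, bytes_per_slot: int,
                          bytes_per_entry: int) -> float:
    """Number of used slots at which both schemes cost the same."""
    return total_slots * bytes_per_slot / bytes_per_entry


PAGE_SIZE = 4096                       # assumed 4 KiB pages
total = (64 << 30) // PAGE_SIZE        # assumed 64 GiB swap device
small, large = 1, 32                   # assumed bytes per slot / per entry

print("slots:", total)
print("up-front bytes:", upfront_bytes(total, small))
print("break-even used slots:", break_even_used_slots(total, small, large))
```

With these assumed sizes the on-demand scheme uses less memory only
while fewer than 1/32 of the slots are in use. That is the case where
pushing metadata for very cold entries out to the swap file itself
could help.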