From: "Huang, Ying"
To: Yosry Ahmed
Cc: Chris Li, lsf-pc@lists.linux-foundation.org, Johannes Weiner,
    Linux-MM, Michal Hocko, Shakeel Butt, David Rientjes, Hugh Dickins,
    Seth Jennings, Dan Streetman, Vitaly Wool, Yang Shi, Peter Xu,
    Minchan Kim, Andrew Morton, Aneesh Kumar K V, Michal Hocko, Wei Xu
Subject: Re: [LSF/MM/BPF TOPIC] Swap Abstraction / Native Zswap
Date: Tue, 04 Apr 2023 16:10:47 +0800
Message-ID: <87y1n8xe2g.fsf@yhuang6-desk2.ccr.corp.intel.com>
In-Reply-To: (Yosry Ahmed's message of "Tue, 28 Mar 2023 18:41:54 -0700")
References: <87edpbq96g.fsf@yhuang6-desk2.ccr.corp.intel.com>
    <87jzz1pfb3.fsf@yhuang6-desk2.ccr.corp.intel.com>
    <87fs9ppdhz.fsf@yhuang6-desk2.ccr.corp.intel.com>
    <87bkkcpckw.fsf@yhuang6-desk2.ccr.corp.intel.com>

Yosry Ahmed writes:

> On Tue, Mar 28, 2023 at 6:33 PM Huang, Ying wrote:
>>
>> Yosry Ahmed writes:
>>
>> > On Tue, Mar 28, 2023 at 2:32 PM Chris Li wrote:
>> >>
>> >> On Tue, Mar 28, 2023 at 02:01:09PM -0700, Yosry Ahmed wrote:
>> >> > On Tue, Mar 28, 2023 at 1:50 PM Chris Li wrote:
>> >> > >
>> >> > > On Tue, Mar 28, 2023 at 12:59:31AM -0700, Yosry Ahmed wrote:
>> >> > > > > > I don't have a problem with this approach, it is not really clean as
>> >> > > > > > we still treat zswap as a swapfile and have to deal with a lot of
>> >> > > > > > unnecessary code like swap slots handling and whatnot.
>> >> > > > >
>> >> > > > > These are existing code?
>> >> > >
>> >> > > Yes. The ghost swap file are existing code used in Google for many years.
>> >> > >
>> >> > > > I was referring to the fact that today with zswap being tied to
>> >> > > > swapfiles we do some necessary work such as searching for swap slots
>> >> > > > during swapout. The initial swap_desc approach aimed to avoid that.
>> >> > > > With this minimal ghost swapfile approach we retain this unfavorable
>> >> > > > behavior.
>> >> > >
>> >> > > Can you explain how you can avoid the free swap entry search
>> >> > > in the swap descriptor world?
>> >> >
>> >> > For zswap, in the swap descriptor world, you just need to allocate a
>> >> > struct zswap_entry and have the swap descriptor point to it. No need
>> >> > for swap slot management since we are not tied to a swapfile and pages
>> >> > in zswap do not have a specific position.
>> >>
>> >> Your swap descriptor will be using one swp_entry_t, which get from the PTE
>> >> to lookup, right? That is the swap entry I am talking about. You just
>> >> substitute zswap swap entry with the swap descriptor swap entry.
>> >> You still need to allocate from the free swap entry space at least once.
>> >
>> > Oh, you mean the swap ID space. We just need to find an unused ID, we
>> > can simply use an allocating xarray
>> > (https://docs.kernel.org/core-api/xarray.html#allocating-xarrays).
>> > This is simpler than keeping track of swap slots in a swapfile.
>>
>> If we want to implement the swap entry management inside the zswap
>> implementation (instead of reusing swap_map[]), then the allocating
>> xarray can be used too. Some per-entry data (such as swap count, etc.)
>> can be stored there. I understanding that this isn't perfect (one more
>> xarray looking up, one more data structure, etc.), but this is a choice
>> too.
>
> My main concern here would be having two separate swap counting
> implementations -- although it might not be the end of the world.

This isn't a big issue for me.  For file systems, functionality such as
free block space management is duplicated across different file system
implementations.  Instead, I hope we can design a better swap
implementation in the future.

> It would be useful to consider all the options. So far, I think we
> have been discussing 3 alternatives:
>
> (a) The initial swap_desc proposal.

My main concern with the initial swap_desc proposal is that, per my
understanding, the zswap code is put in the swap core instead of in the
zswap implementation.  So zswap isn't another swap implementation
encapsulated behind a common interface.  Please correct me if my
understanding is wrong.

If it is right, the cost is the flexibility of the swap system.  For
example, zswap may always be at the highest priority among all swap
devices.  We can move a cold page from zswap to some swap device, but
we cannot move a cold page from some swap device to zswap.

Maybe compression is always faster than any other swap device, so we
will never need the flexibility.  Maybe the cost of hiding zswap behind
a common interface is unacceptable.  I'm open to either conclusion.
But please provide the evidence, and maybe data.

Best Regards,
Huang, Ying

> (b) Add an optional indirection layer that can move swap entries
> between swap devices and add a virtual swap device for zswap in the
> kernel.
> (c) Add an optional indirection layer that can move entries between
> different swap backends. Swap backends would be zswap & swap devices
> for now. Zswap needs to implement swap entry management, swap
> counting, etc.
>
> Does this accurately summarize what we have discussed so far?
>
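
To make the flexibility argument above a bit more concrete, here is a
minimal userspace sketch in C of what "zswap behind a common interface"
could look like.  It is not kernel code and not anyone's proposed patch:
struct backend_ops, struct swap_desc, swap_out(), swap_in() and
swap_move() are all hypothetical names invented for illustration, the
toy backend only copies pages (a real zswap backend would compress),
and a linear scan stands in for an allocating xarray.

```c
#include <assert.h>
#include <string.h>

#define PAGE_SZ  16   /* toy page size, illustration only */
#define NR_SLOTS 8    /* toy capacity of each backend and of the ID table */

/* Hypothetical common interface that every swap backend implements. */
struct backend_ops {
	int  (*store)(void *priv, const char *page);  /* handle, or -1 if full */
	void (*load)(void *priv, int handle, char *page);
	void (*release)(void *priv, int handle);
};

struct backend {
	const struct backend_ops *ops;
	void *priv;
};

/* Toy backend state; a real zswap backend would compress, this only copies. */
struct slot_store {
	char data[NR_SLOTS][PAGE_SZ];
	int  used[NR_SLOTS];
};

static int slot_put(void *priv, const char *page)
{
	struct slot_store *s = priv;

	for (int i = 0; i < NR_SLOTS; i++) {
		if (!s->used[i]) {
			s->used[i] = 1;
			memcpy(s->data[i], page, PAGE_SZ);
			return i;
		}
	}
	return -1;
}

static void slot_get(void *priv, int handle, char *page)
{
	struct slot_store *s = priv;

	assert(s->used[handle]);
	memcpy(page, s->data[handle], PAGE_SZ);
}

static void slot_release(void *priv, int handle)
{
	struct slot_store *s = priv;

	s->used[handle] = 0;
}

static const struct backend_ops slot_ops = { slot_put, slot_get, slot_release };

/* Indirection layer: a swap ID names a descriptor, not a device position. */
struct swap_desc {
	struct backend *be;   /* NULL means the ID is free */
	int handle;
	int swap_count;
};

static struct swap_desc descs[NR_SLOTS];

/* Store a page in backend be; returns a new swap ID, or -1 on failure. */
static int swap_out(struct backend *be, const char *page)
{
	for (int id = 0; id < NR_SLOTS; id++) {
		if (!descs[id].be) {
			int h = be->ops->store(be->priv, page);

			if (h < 0)
				return -1;
			descs[id] = (struct swap_desc){ be, h, 1 };
			return id;
		}
	}
	return -1;
}

/* Read the page behind a swap ID from whichever backend holds it now. */
static void swap_in(int id, char *page)
{
	struct swap_desc *d = &descs[id];

	d->be->ops->load(d->be->priv, d->handle, page);
}

/* Move an entry to dst, in either direction; the swap ID stays the same. */
static int swap_move(int id, struct backend *dst)
{
	struct swap_desc *d = &descs[id];
	char page[PAGE_SZ];
	int h;

	d->be->ops->load(d->be->priv, d->handle, page);
	h = dst->ops->store(dst->priv, page);
	if (h < 0)
		return -1;
	d->be->ops->release(d->be->priv, d->handle);
	d->be = dst;
	d->handle = h;
	return 0;
}
```

Because swap_move() only talks to the ops table, the same call moves an
entry from the "device" instance to the "zswap" instance or back, which
is the two-way movement discussed above.  Whether the extra indirection
is worth its cost in the real kernel is exactly the question that needs
evidence and data.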