Date: Wed, 1 Mar 2023 15:22:21 -0800
From: Chris Li
To: Matthew Wilcox
Cc: Yosry Ahmed, lsf-pc@lists.linux-foundation.org, Johannes Weiner, Linux-MM, Michal Hocko, Shakeel Butt, David Rientjes, Hugh Dickins, Seth Jennings, Dan Streetman, Vitaly Wool, Yang Shi, Peter Xu, Minchan Kim, Andrew Morton, Huang Ying, NeilBrown
Subject: Re: [LSF/MM/BPF TOPIC] Swap Abstraction / Native Zswap

On Wed, Mar 01, 2023 at 12:08:17AM +0000,
Matthew Wilcox wrote:
> > Can you expand a bit on that? I assume you want to see the swap file
> > behavior more like a normal file system and reuse more of the readpage()
> > and writepage() path.
>
> Actually, no, readpage() and writepage() should be reserved for
> page cache. We now have a ->swap_rw(), but it's only implemented by
> nfs so far. Instead of constructing its own BIOs, swap should invoke
> ->swap_rw for every filesystem. I suspect we can do a fairly generic
> block_swap_rw() for the vast majority of filesystems.

The swap_rw() is for the file system backing the swap file, so it sits
closer to the back-end IO side. Zswap can't be implemented as a simple
file system layer, because a vma can only belong to one file system,
while zswap can back some of the pages in a vma but not others. It will
require some support before hitting the swap_rw() paging path.

BTW, in the current code swap_rw() is called from swap_writepage(),
which is part of the writepage() call path as well.

> > When the page fault happens, does the whole folios get swapped in or break
> > into smaller pages?
>
> I think the whole folio should be swapped in. See my proposal for
> determining the correct size folio to use here:
> https://lore.kernel.org/linux-mm/Y%2FU8bQd15aUO97vS@casper.infradead.org/
>
> Assuming something like that gets implemented, for a large folio to
> be swapped out, we've had a selection of page faults on the folio,
> followed by a period of no faults. All of a sudden we have a fault,
> so I think we should bring the whole folio back in. The algorithm I
> outline in that email would then take care of breaking down the folio
> into smaller folios if it turns out they're not used.

One side effect is that the fault might bring in more pages than is
absolutely necessary. We might want to collect some data on that to see
the real impact.

Chris
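
As a rough illustration of the kind of data mentioned above, the
over-fetch for one whole-folio swap-in could be counted as in the sketch
below. This is a userspace model, not kernel code. The function names
and the idea of recording touched pages in a 64-bit mask are assumptions
made only for this example.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Userspace model only, not kernel code. All names are hypothetical.
 *
 * A folio of nr_pages pages is swapped in as a whole on the first
 * fault. Bit i of touched_mask is set if page i of the folio was
 * actually accessed before the folio was reclaimed again.
 */
static unsigned int pages_touched(uint64_t touched_mask, unsigned int nr_pages)
{
	unsigned int count = 0;

	/* The mask can describe at most 64 pages. */
	assert(nr_pages > 0 && nr_pages <= 64);

	/* Only bits below nr_pages belong to this folio. */
	for (unsigned int i = 0; i < nr_pages; i++)
		if (touched_mask & (UINT64_C(1) << i))
			count++;
	return count;
}

/* Pages that were read back from swap but never used afterwards. */
static unsigned int overfetch_pages(uint64_t touched_mask, unsigned int nr_pages)
{
	return nr_pages - pages_touched(touched_mask, nr_pages);
}
```

For a 16-page folio where only the faulting page is ever touched, this
counts 15 pages as over-fetched. Summing that over many swap-ins would
give one possible measure of the real impact.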