Date: Wed, 1 Mar 2023 00:08:17 +0000
From: Matthew Wilcox
To: Chris Li
Cc: Yosry Ahmed, lsf-pc@lists.linux-foundation.org, Johannes Weiner, Linux-MM, Michal Hocko, Shakeel Butt, David Rientjes, Hugh Dickins, Seth Jennings, Dan Streetman, Vitaly Wool, Yang Shi, Peter Xu, Minchan Kim, Andrew Morton, Huang Ying, NeilBrown
Subject: Re: [LSF/MM/BPF TOPIC] Swap Abstraction / Native Zswap

On Tue, Feb 28, 2023 at 03:22:20PM -0800, Chris Li wrote:
> Hi Matthew,
>
> On Sun, Feb 19, 2023 at 04:31:33AM +0000, Matthew Wilcox wrote:
> >
> > I think an overhaul of the swap code is long overdue. I appreciate
> > you're very much focused on zswap, but there are many other problems.
> > For example, swap does not work on zoned devices. Swap readahead is
> > generally physical (ie optimised for spinning discs) rather than logical
> > (more appropriate for SSDs). Swap's management of free space is crude
> > compared to real filesystems. The way that swap bypasses the filesystem
> > when writing to swap files is awful. I haven't even started to look at
>
> Can you expand a bit on that? I assume you want to see the swap file
> behavior more like a normal file system and reuse more of the readpage()
> and writepage() path.

Actually, no, readpage() and writepage() should be reserved for page
cache. We now have a ->swap_rw(), but it's only implemented by nfs so
far. Instead of constructing its own BIOs, swap should invoke ->swap_rw
for every filesystem. I suspect we can do a fairly generic
block_swap_rw() for the vast majority of filesystems.

> > what changes need to be made to swap in order to swap out arbitrary-order
> > folios (instead of PMD-sized + PTE-sized).
>
> When the page fault happens, does the whole folios get swapped in or break
> into smaller pages?

I think the whole folio should be swapped in. See my proposal for
determining the correct size folio to use here:
https://lore.kernel.org/linux-mm/Y%2FU8bQd15aUO97vS@casper.infradead.org/

Assuming something like that gets implemented, for a large folio to
be swapped out, we've had a selection of page faults on the folio,
followed by a period of no faults. All of a sudden we have a fault,
so I think we should bring the whole folio back in. The algorithm I
outline in that email would then take care of breaking down the folio
into smaller folios if it turns out they're not used.
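
To make "bring the whole folio back in" concrete, here is a rough
user-space sketch in C of the arithmetic involved: given the page index
that faulted and the order of the folio that was swapped out, it works
out which pages a whole-folio swap-in would cover. This is not kernel
code. The names swapin_range and whole_folio_range are made up for
illustration, and it assumes the folio was naturally aligned to its
size (2^order pages), which the email above does not state.

```c
/*
 * Toy model only: swapin_range and whole_folio_range() are hypothetical
 * names, not kernel APIs.  Assumes the swapped-out folio covered a
 * naturally aligned block of (1 << order) pages.
 */
struct swapin_range {
	unsigned long first;	/* index of the first page to read back */
	unsigned long nr;	/* number of pages in the folio */
};

static struct swapin_range whole_folio_range(unsigned long fault_index,
					     unsigned int order)
{
	unsigned long nr = 1UL << order;
	struct swapin_range r;

	/* Round the faulting index down to the start of its folio. */
	r.first = fault_index & ~(nr - 1);
	r.nr = nr;
	return r;
}
```

With these assumptions, a fault on page index 0x1235 in an order-4
folio gives first = 0x1230 and nr = 16, so all sixteen pages would be
read back rather than only the one that faulted. An order-0 folio
degenerates to the single faulting page.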