From: Ryan Roberts
To: Andrew Morton, David Hildenbrand, Matthew Wilcox, Huang Ying,
    Gao Xiang, Yu Zhao, Yang Shi, Michal Hocko
Cc: Ryan Roberts, linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: [RFC PATCH v1 0/2] Swap-out small-sized THP without splitting
Date: Tue, 10 Oct 2023 15:21:09 +0100
Message-Id: <20231010142111.3997780-1-ryan.roberts@arm.com>

Hi All,

This is an RFC for a small series to add support for swapping out
small-sized THP without needing to first split the large folio via
__split_huge_page(). It closely follows the approach already used by
PMD-sized THP.

"Small-sized THP" is an upcoming feature that enables performance
improvements by allocating large folios for anonymous memory, where the
large folio size is smaller than the traditional PMD-size. See [1].

In some circumstances I've observed a performance regression (see patch
2 for details), and this series is an attempt to fix the regression in
advance of merging small-sized THP support.
I've done what I thought was the smallest change possible, and as a
result, this approach is only employed when the swap is backed by a
non-rotating block device (just as PMD-sized THP is supported today).
However, I have a few questions on whether we should consider relaxing
those requirements in certain circumstances:


1) block-backed vs file-backed
==============================

The code only attempts to allocate a contiguous set of entries if swap
is backed by a block device (i.e. not file-backed). The original commit,
f0eea189e8e9 ("mm, THP, swap: don't allocate huge cluster for file
backed swap device"), stated "It's hard to write a whole transparent
huge page (THP) to a file backed swap device", but didn't state why.
Does this imply there is a size limit at which it becomes hard? And does
that therefore imply that for "small enough" sizes we should now allow
use with file-backed swap?

This original commit was subsequently fixed with commit 41663430588c
("mm, THP, swap: fix allocating cluster for swapfile by mistake"), which
said the original commit was using the wrong flag to determine if it was
a block device and therefore in some cases was actually doing large
allocations for a file-backed swap device, and this was causing
file-system corruption. But that implies some sort of correctness issue
to me, rather than the performance issue I inferred from the original
commit.

If anyone can offer an explanation, that would be helpful in determining
if we should allow some large sizes for file-backed swap.


2) rotating vs non-rotating
===========================

I notice that the clustered approach is only used for non-rotating swap.
That implies that for rotating media, we will always fail a large
allocation, and fall back to splitting THPs to single pages. Which
implies that the regression I'm fixing here may still be present on
rotating media? Or perhaps rotating disk is so slow that the cost of
writing the data out dominates the cost of splitting?

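The two gating conditions discussed in sections 1 and 2 can be
summarised in a minimal userspace sketch. This is not kernel code: the
SK_* flag names and sk_try_large_alloc() are hypothetical and only
mirror the roles that the kernel's SWP_BLKDEV and SWP_SOLIDSTATE flags
appear to play, not their real values or the real call path.

```c
#include <stdbool.h>

/*
 * Hypothetical flag bits for illustration only. They stand in for the
 * roles of SWP_BLKDEV and SWP_SOLIDSTATE; the values are made up.
 */
enum {
	SK_SWP_BLKDEV     = 1u << 0,	/* swap is backed by a block device */
	SK_SWP_SOLIDSTATE = 1u << 1,	/* backing device is non-rotating */
};

/*
 * Return true if a contiguous multi-entry allocation would be
 * attempted for a folio of the given order, under the assumptions
 * stated above. In every other case the folio is split first.
 */
static bool sk_try_large_alloc(unsigned int flags, unsigned int order)
{
	if (order == 0)
		return false;	/* single page: ordinary per-entry path */
	if (!(flags & SK_SWP_BLKDEV))
		return false;	/* file-backed swap: always split */
	if (!(flags & SK_SWP_SOLIDSTATE))
		return false;	/* rotating media: clustered path unused */
	return true;
}
```

With this model, sk_try_large_alloc(SK_SWP_BLKDEV, 4) is false, which
is the rotating-disk case that section 2 asks about.
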
I considered that potentially the free swap entry search algorithm that
is used in this case could be modified to look for (small) contiguous
runs of entries. Up to ~16 pages (order-4) could be done by doing 2x
64-bit reads from the map instead of a single byte read. I haven't
looked into this idea in detail, but wonder if anybody thinks it is
worth the effort? Or perhaps it would end up causing bad fragmentation.

Finally on testing, I've run the mm selftests and see no regressions,
but I don't think there is anything in there specifically aimed towards
swap? Are there any functional or performance tests that I should run?
It would certainly be good to confirm I haven't regressed PMD-size THP
swap performance.

Thanks,
Ryan

[1] https://lore.kernel.org/linux-mm/15a52c3d-9584-449b-8228-1335e0753b04@arm.com/


Ryan Roberts (2):
  mm: swap: Remove CLUSTER_FLAG_HUGE from swap_cluster_info:flags
  mm: swap: Swap-out small-sized THP without splitting

 include/linux/swap.h |  17 +++----
 mm/huge_memory.c     |   3 --
 mm/swapfile.c        | 105 ++++++++++++++++++++++---------------------
 mm/vmscan.c          |  10 +++--
 4 files changed, 66 insertions(+), 69 deletions(-)

--
2.25.1
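
As a rough illustration of the contiguous-run search floated in section
2, here is a userspace sketch of checking 16 map entries with two 64-bit
loads. It is not kernel code and makes several assumptions: the map
holds one byte per swap entry, a zero byte means "free", and only
naturally aligned runs of 16 entries are considered. The name
sk_find_free_run16() is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SK_RUN_PAGES 16	/* order-4: 16 entries, one byte each */

/*
 * Return the index of the first 16-aligned run of 16 free entries in
 * map, or -1 if there is none. Assumes a zero byte marks a free entry.
 */
static long sk_find_free_run16(const unsigned char *map, size_t nr_entries)
{
	size_t i;

	for (i = 0; i + SK_RUN_PAGES <= nr_entries; i += SK_RUN_PAGES) {
		uint64_t lo, hi;

		/* memcpy sidesteps alignment and strict-aliasing issues. */
		memcpy(&lo, map + i, sizeof(lo));
		memcpy(&hi, map + i + sizeof(lo), sizeof(hi));

		/* All 16 bytes are zero only if both words are zero. */
		if ((lo | hi) == 0)
			return (long)i;
	}
	return -1;
}
```

Restricting the search to aligned runs keeps it to two loads per
candidate, but it will miss free runs that straddle a 16-entry
boundary, which is one way the fragmentation concern could show up.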