Date: Fri, 27 May 2022 04:47:23 +0100
From: Matthew Wilcox
To: Zach O'Keefe
Cc: David Rientjes, "linux-mm@kvack.org"
Subject: Re: mm/khugepaged: collapse file/shmem compound pages

On Thu, May 26, 2022 at 05:54:27PM -0700, Zach O'Keefe wrote:
> On Wed, May 25, 2022 at 8:36 PM Matthew Wilcox wrote:
> > On Wed, May 25, 2022 at 06:23:52PM -0700, Zach O'Keefe wrote:
> > > On Wed, May 25, 2022 at 12:07 PM Matthew Wilcox wrote:
> > > > Anyway, that meaning behind that comment is that the PageTransCompound()
> > > > test is going to be true on any compound page (TransCompound doesn't
> > > > check that the page is necessarily a THP). So that particular test should
> > > > be folio_test_pmd_mappable(), but there are probably other things which
> > > > ought to be changed, including converting the entire file from dealing
> > > > in pages to dealing in folios.
> > >
> > > Right, at this point, the page might be a pmd-mapped THP, or it could
> > > be a pte-mapped compound page (I'm unsure if we can encounter compound
> > > pages outside hugepages).
> >
> > Today, there is a way.
> > We can find a folio with an order between 0 and
> > PMD_ORDER if the underlying filesystem supports large folios and the
> > file is executable and we've enabled CONFIG_READ_ONLY_THP_FOR_FS.
> > In this case, we'll simply skip over it because the code believes that
> > means it's already a PMD.
>
> I think I'm missing something here - sorry. If the folio order is <
> HPAGE_PMD_ORDER, why does the code think it's a pmd?

Because PageTransCompound() does not do what it says on the tin.

	static inline int PageTransCompound(struct page *page)
	{
		return PageCompound(page);
	}

So any compound page is treated as if it's a PMD-sized page.

> > > If we could tell it's already pmd-mapped, we're done :) IIUC,
> > > folio_test_pmd_mappable() is a necessary but not sufficient condition
> > > to determine this.
> >
> > It is necessary, but from khugepaged's point of view, it's sufficient
> > because khugepaged's job is to create PMD-sized folios -- it's not up to
> > khugepaged to ensure that PMD-sized folios are actually mapped using
> > a PMD.
>
> I thought the point / benefit of khugepaged was precisely to try and
> find places where we can collapse many pte entries into a single pmd
> mapping?

Ideally, yes. But if a file is mapped at an address which isn't
PMD-aligned, it can't. Maybe it should just decline to operate in
that case.

> > There may be some other component of the system (eg DAMON?)
> > which has chosen to temporarily map the PMD-sized folio using PTEs
> > in order to track whether the memory is all being used. It may also
> > be the case that (for file-based memory), the VMA is mis-aligned and
> > despite creating a PMD-sized folio, it can't be mapped with a PMD.
>
> AFAIK DAMON doesn't do this pmd splitting to do subpage tracking for
> THPs. Also, I believe retract_page_tables() does make the check to see
> if the address is suitably hugepage aligned/sized.

Maybe not DAMON itself, but it's something that various people are
talking about doing; trying to determine whether THPs are worth using or
whether userspace has made the magic go-faster call without knowing
whether the valuable 2MB page is being entirely used.

> > shmem still expects folios to be of order either 0 or PMD_ORDER.
> > That assumption extends into the swap code and I haven't had the heart
> > to go and fix all those places yet. Plus Neil was doing major surgery
> > to the swap code in the most recent development cycle and I didn't want
> > to get in his way.
> >
> > So I am absolutely fine with khugepaged allocating a PMD-size folio for
> > any inode that claims mapping_large_folio_support(). If any filesystems
> > break, we'll fix them.
>
> Just for clarification, what is the equivalent code today that
> enforces mapping_large_folio_support()? I.e. today, khugepaged can
> successfully collapse file without checking if the inode supports it
> (we only check that it's a regular file not opened for writing).

Yeah, that's a dodgy hack which needs to go away. But we need a lot
more filesystems converted to supporting large folios before we can
delete it. Not your responsibility; I'm doing my best to encourage
fs maintainers to do this part.

> Also, just to check, there isn't anything wrong with following
> collapse_file()'s approach, even for folios of 0 < order <
> HPAGE_PMD_ORDER? I.e this part:
>
>  * Basic scheme is simple, details are more complex:
>  *  - allocate and lock a new huge page;
>  *  - scan page cache replacing old pages with the new one
>  *    + swap/gup in pages if necessary;
>  *    + fill in gaps;
>  *    + keep old pages around in case rollback is required;
>  *  - if replacing succeeds:
>  *    + copy data over;
>  *    + free old pages;
>  *    + unlock huge page;
>  *  - if replacing failed;
>  *    + put all pages back and unfreeze them;
>  *    + restore gaps in the page cache;
>  *    + unlock and free huge page;
>  */

Correct. At least, as far as I know!
Working on folios has been quite the education for me ...
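
For anyone following along, the difference between the two tests discussed
above can be modelled in a few lines of userspace C. This is only a sketch,
not kernel code: struct model_folio, the model_* helpers and
MODEL_HPAGE_PMD_ORDER are hypothetical stand-ins, and the value 9 assumes
4KB base pages with a 2MB PMD. It assumes folio_test_pmd_mappable() amounts
to comparing the folio order against HPAGE_PMD_ORDER.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumption: 2MB PMD with 4KB base pages, so a PMD-sized folio is order 9. */
#define MODEL_HPAGE_PMD_ORDER 9u

/* Hypothetical stand-in for a folio: it covers 2^order base pages. */
struct model_folio {
	unsigned int order;
};

/* Models PageTransCompound(): true for any compound page, whatever its size. */
static bool model_page_trans_compound(const struct model_folio *f)
{
	return f->order > 0;
}

/* Models folio_test_pmd_mappable(): true only at PMD order or above. */
static bool model_folio_test_pmd_mappable(const struct model_folio *f)
{
	return f->order >= MODEL_HPAGE_PMD_ORDER;
}

/*
 * Count the folios that a compound-only test would skip as "already a PMD"
 * even though they are smaller than a PMD.
 */
static size_t model_count_wrongly_skipped(const struct model_folio *folios,
					  size_t n)
{
	size_t i, count = 0;

	assert(folios != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (model_page_trans_compound(&folios[i]) &&
		    !model_folio_test_pmd_mappable(&folios[i]))
			count++;
	}
	return count;
}
```

Given folios of order 0, 2, 4 and 9, the helper counts two (orders 2 and 4):
those are the large-but-not-PMD folios that the compound-only test would
pass over.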