Date: Thu, 17 Dec 2020 13:55:17 +0000
From: Matthew Wilcox
To: David Hildenbrand
Cc: linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 00/25] Page folios
Message-ID: <20201217135517.GF15600@casper.infradead.org>
References: <20201216182335.27227-1-willy@infradead.org> <9e764222-a274-0a99-5e41-7cfa9ea15b86@redhat.com>
In-Reply-To: <9e764222-a274-0a99-5e41-7cfa9ea15b86@redhat.com>

On Thu, Dec 17, 2020 at 01:47:57PM +0100, David Hildenbrand wrote:
> On 16.12.20 19:23, Matthew Wilcox (Oracle) wrote:
> > One of the great things about compound pages is that when you try to
> > do various operations on a tail page, it redirects to the head page and
> > everything Just Works.
> > One of the awful things is how much we pay for
> > that simplicity.  Here's an example, end_page_writeback():
> >
> >         if (PageReclaim(page)) {
> >                 ClearPageReclaim(page);
> >                 rotate_reclaimable_page(page);
> >         }
> >         get_page(page);
> >         if (!test_clear_page_writeback(page))
> >                 BUG();
> >
> >         smp_mb__after_atomic();
> >         wake_up_page(page, PG_writeback);
> >         put_page(page);
> >
> > That all looks very straightforward, but if you dive into the disassembly,
> > you see that there are four calls to compound_head() in this function
> > (PageReclaim(), ClearPageReclaim(), get_page() and put_page()).  It's
> > all for nothing, because if anyone does call this routine with a tail
> > page, wake_up_page() will VM_BUG_ON_PGFLAGS(PageTail(page), page).
> >
> > I'm not really a CPU person, but I imagine there's some kind of dependency
> > here that sucks too:
> >
> >     1fd7:  48 8b 57 08       mov    0x8(%rdi),%rdx
> >     1fdb:  48 8d 42 ff       lea    -0x1(%rdx),%rax
> >     1fdf:  83 e2 01          and    $0x1,%edx
> >     1fe2:  48 0f 44 c7       cmove  %rdi,%rax
> >     1fe6:  f0 80 60 02 fb    lock andb $0xfb,0x2(%rax)
> >
> > Sure, it's going to be cache hot, but that cmove has to execute before
> > the lock andb.
> >
> > I would like to introduce a new concept that I call a Page Folio.
> > Or just struct folio to its friends.  Here it is:
> >
> > struct folio {
> >         struct page page;
> > };
> >
> > A folio is a struct page which is guaranteed not to be a tail page.
> > So it's either a head page or a base (order-0) page.  That means
> > we don't have to call compound_head() on it and we save massively.
> > end_page_writeback() reduces from four calls to compound_head() to just
> > one (at the beginning of the function) and it shrinks from 213 bytes
> > to 126 bytes (using distro kernel config options).  I think even that one
> > can be eliminated, but I'm going slowly at this point and taking the
> > safe route of transforming a random struct page pointer into a struct
> > folio pointer by calling page_folio().
> > By the end of this exercise,
> > end_page_writeback() will become end_folio_writeback().
> >
> > This is going to be a ton of work, and massively disruptive.  It'll touch
> > every filesystem, and a good few device drivers!  But I think it's worth
> > it.  Not every routine benefits as much as end_page_writeback(), but it
> > makes everything a little better.  At 29 bytes per call to lock_page(),
> > unlock_page(), put_page() and get_page(), that's on the order of 60kB of
> > text for allyesconfig.  More when you add on all the PageFoo() calls.
> > With the small amount of work I've done here, mm/filemap.o shrinks its
> > text segment by over a kilobyte from 33687 to 32318 bytes (and also 192
> > bytes of data).
>
> Just wondering, as the primary motivation here is "minimizing CPU work",
> did you run any benchmarks that revealed a visible performance improvement?
>
> Otherwise, we're left with a concept that's hard to grasp first (folio -
> what?!) and "a ton of work, and massively disruptive", saving some kb of
> code - which does not sound too appealing to me.
>
> (I like the idea of abstracting which pages are actually worth looking
> at directly instead of going via a tail page - tail pages act somewhat
> like a proxy for the head page when accessing flags)

My primary motivation here isn't minimising CPU work at all.  It's trying
to document which interfaces are expected to operate on an entire
compound page and which are expected to operate on a PAGE_SIZE page.
Today, we have a horrible mishmash of:

 - This is a head page, I shall operate on 2MB of data
 - This is a tail page, I shall operate on 2MB of data
 - This is not a head page, I shall operate on 4kB of data
 - This is a head page, I shall operate on 4kB of data
 - This is a head|tail page, I shall operate on the size of the compound page

You might say "Well, why not lead with that?", but I don't know which
advantages people are going to find most compelling.
Even if someone doesn't believe in the advantages of using folios in the page cache, looking at the assembler output is, I think, compelling.
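For readers less familiar with the tail-to-head redirect quoted earlier, here is a minimal user-space C sketch of what compound_head() computes, assuming a simplified struct page that holds only a flags word and, in the next word, the head page's address with bit 0 set when the page is a tail page. That load, test bit 0, and select sequence is what the mov/lea/and/cmove instructions in the quoted disassembly perform before the locked flag update. The struct layout, set_compound_head() and the PG_reclaim bit number are illustrative assumptions, not the kernel's definitions, and the kernel uses atomic bit operations where this sketch uses plain ones.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PG_reclaim 18	/* illustrative bit number, not taken from a real config */

/* Hypothetical, heavily simplified stand-in for the kernel's struct page. */
struct page {
	unsigned long flags;		/* page flag bits */
	uintptr_t compound_head;	/* tail page: head address | 1; otherwise 0 */
};

/* Assumed helper: mark @tail as belonging to @head by tagging bit 0. */
static void set_compound_head(struct page *tail, struct page *head)
{
	tail->compound_head = (uintptr_t)head | 1;
}

static bool PageTail(const struct page *page)
{
	return page->compound_head & 1;
}

/*
 * The redirect every page-based flag or refcount helper pays for: load the
 * second word, and if bit 0 is set strip it to find the head page,
 * otherwise use the page itself.
 */
static struct page *compound_head(struct page *page)
{
	uintptr_t head = page->compound_head;

	if (head & 1)
		return (struct page *)(head - 1);
	return page;
}

/* Page-based flag helpers: each call repeats the compound_head() lookup. */
static bool PageReclaim(struct page *page)
{
	return compound_head(page)->flags & (1UL << PG_reclaim);
}

static void ClearPageReclaim(struct page *page)
{
	/* The kernel uses an atomic clear_bit(); a plain store keeps this short. */
	compound_head(page)->flags &= ~(1UL << PG_reclaim);
}
```

Calling PageReclaim() and then ClearPageReclaim() on the same pointer performs the lookup twice, which is the per-call cost the email counts in end_page_writeback().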
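The folio guarantee can be modelled in user space as well.  This is a hypothetical sketch, not the code in this series: page_folio() pays the tail-to-head redirect once, and every helper that takes a struct folio * reads folio->page directly.  It also shows the size question answered one way only, for the whole folio.  The helper names folio_test_reclaim(), folio_clear_reclaim(), folio_nr_pages(), folio_size() and end_reclaim_sketch(), keeping the order on the head page, 4kB base pages, and the PG_reclaim bit number are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE  4096UL	/* assumption: 4kB base pages */
#define PG_reclaim 18		/* illustrative bit number only */

/* Simplified stand-in for struct page (hypothetical layout). */
struct page {
	unsigned long flags;
	uintptr_t compound_head;	/* tail page: head address | 1; otherwise 0 */
	unsigned int order;		/* simplification: order kept on the head page */
};

/* A folio is a struct page that is never a tail page. */
struct folio {
	struct page page;
};

/* The single place where the tail-to-head redirect is paid. */
static struct folio *page_folio(struct page *page)
{
	uintptr_t head = page->compound_head;

	if (head & 1)
		page = (struct page *)(head - 1);
	assert(!(page->compound_head & 1));	/* never hand out a tail page */
	return (struct folio *)page;
}

/* Folio helpers touch folio->page directly: no compound_head() lookup. */
static bool folio_test_reclaim(const struct folio *folio)
{
	return folio->page.flags & (1UL << PG_reclaim);
}

static void folio_clear_reclaim(struct folio *folio)
{
	folio->page.flags &= ~(1UL << PG_reclaim);	/* kernel: atomic */
}

/* Size questions have one answer: the whole (possibly compound) folio. */
static unsigned long folio_nr_pages(const struct folio *folio)
{
	return 1UL << folio->page.order;
}

static size_t folio_size(const struct folio *folio)
{
	return PAGE_SIZE << folio->page.order;
}

/* Shape of a page-to-folio conversion: convert once, then use the folio. */
static void end_reclaim_sketch(struct page *page)
{
	struct folio *folio = page_folio(page);

	if (folio_test_reclaim(folio))
		folio_clear_reclaim(folio);
}
```

The design point is that the type carries the guarantee: a function taking a struct folio * cannot be handed a tail page without going through page_folio(), and folio_size() can only mean the size of the whole folio, never PAGE_SIZE by accident.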