linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Matthew Wilcox <willy@infradead.org>
To: David Hildenbrand <david@redhat.com>
Cc: linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH 00/25] Page folios
Date: Thu, 17 Dec 2020 13:55:17 +0000	[thread overview]
Message-ID: <20201217135517.GF15600@casper.infradead.org> (raw)
In-Reply-To: <9e764222-a274-0a99-5e41-7cfa9ea15b86@redhat.com>

On Thu, Dec 17, 2020 at 01:47:57PM +0100, David Hildenbrand wrote:
> On 16.12.20 19:23, Matthew Wilcox (Oracle) wrote:
> > One of the great things about compound pages is that when you try to
> > do various operations on a tail page, it redirects to the head page and
> > everything Just Works.  One of the awful things is how much we pay for
> > that simplicity.  Here's an example, end_page_writeback():
> > 
> >         if (PageReclaim(page)) {
> >                 ClearPageReclaim(page);
> >                 rotate_reclaimable_page(page);
> >         }
> >         get_page(page);
> >         if (!test_clear_page_writeback(page))
> >                 BUG();
> > 
> >         smp_mb__after_atomic();
> >         wake_up_page(page, PG_writeback);
> >         put_page(page);
> > 
> > That all looks very straightforward, but if you dive into the disassembly,
> > you see that there are four calls to compound_head() in this function
> > (PageReclaim(), ClearPageReclaim(), get_page() and put_page()).  It's
> > all for nothing, because if anyone does call this routine with a tail
> > page, wake_up_page() will VM_BUG_ON_PGFLAGS(PageTail(page), page).
> > 
> > I'm not really a CPU person, but I imagine there's some kind of dependency
> > here that sucks too:
> > 
> >     1fd7:       48 8b 57 08             mov    0x8(%rdi),%rdx
> >     1fdb:       48 8d 42 ff             lea    -0x1(%rdx),%rax
> >     1fdf:       83 e2 01                and    $0x1,%edx
> >     1fe2:       48 0f 44 c7             cmove  %rdi,%rax
> >     1fe6:       f0 80 60 02 fb          lock andb $0xfb,0x2(%rax)
> > 
> > Sure, it's going to be cache hot, but that cmove has to execute before
> > the lock andb.
> > 
> > I would like to introduce a new concept that I call a Page Folio.
> > Or just struct folio to its friends.  Here it is,
> > struct folio {
> >         struct page page;
> > };
> > 
> > A folio is a struct page which is guaranteed not to be a tail page.
> > So it's either a head page or a base (order-0) page.  That means
> > we don't have to call compound_head() on it and we save massively.
> > end_page_writeback() reduces from four calls to compound_head() to just
> > one (at the beginning of the function) and it shrinks from 213 bytes
> > to 126 bytes (using distro kernel config options).  I think even that one
> > can be eliminated, but I'm going slowly at this point and taking the
> > safe route of transforming a random struct page pointer into a struct
> > folio pointer by calling page_folio().  By the end of this exercise,
> > end_page_writeback() will become end_folio_writeback().
> > 
> > This is going to be a ton of work, and massively disruptive.  It'll touch
> > every filesystem, and a good few device drivers!  But I think it's worth
> > it.  Not every routine benefits as much as end_page_writeback(), but it
> > makes everything a little better.  At 29 bytes per call to lock_page(),
> > unlock_page(), put_page() and get_page(), that's on the order of 60kB of
> > text for allyesconfig.  More when you add on all the PageFoo() calls.
> > With the small amount of work I've done here, mm/filemap.o shrinks its
> > text segment by over a kilobyte from 33687 to 32318 bytes (and also 192
> > bytes of data).
> 
> Just wondering, as the primary motivation here is "minimizing CPU work",
> did you run any benchmarks that revealed a visible performance improvement?
> 
> Otherwise, we're left with a concept that's hard to grasp first (folio -
> what?!) and "a ton of work, and massively disruptive", saving some kb of
> code - which does not sound too appealing to me.
> 
> (I like the idea of abstracting which pages are actually worth looking
> at directly instead of going via a tail page - tail pages act somewhat
> like a proxy for the head page when accessing flags)

My primary motivation here isn't minimising CPU work at all.  It's trying
to document which interfaces are expected to operate on an entire
compound page and which are expected to operate on a PAGE_SIZE page.
Today, we have a horrible mishmash of

 - This is a head page, I shall operate on 2MB of data
 - This is a tail page, I shall operate on 2MB of data
 - This is not a head page, I shall operate on 4kB of data
 - This is a head page, I shall operate on 4kB of data
 - This is a head|tail page, I shall operate on the size of the compound page.

You might say "Well, why not lead with that?", but I don't know which
advantages people are going to find most compelling.  Even if someone
doesn't believe in the advantages of using folios in the page cache,
looking at the assembler output is, I think, compelling.

  reply	other threads:[~2020-12-17 13:56 UTC|newest]

Thread overview: 35+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-12-16 18:23 [PATCH 00/25] Page folios Matthew Wilcox (Oracle)
2020-12-16 18:23 ` [PATCH 01/25] mm: Introduce struct folio Matthew Wilcox (Oracle)
2020-12-16 18:23 ` [PATCH 02/25] mm: Add put_folio Matthew Wilcox (Oracle)
2020-12-16 18:23 ` [PATCH 03/25] mm: Add get_folio Matthew Wilcox (Oracle)
2020-12-16 18:23 ` [PATCH 04/25] mm: Create FolioFlags Matthew Wilcox (Oracle)
2020-12-16 18:23 ` [PATCH 05/25] mm: Add unlock_folio Matthew Wilcox (Oracle)
2020-12-16 18:23 ` [PATCH 06/25] mm: Add lock_folio Matthew Wilcox (Oracle)
2020-12-16 18:23 ` [PATCH 07/25] mm: Add lock_folio_killable Matthew Wilcox (Oracle)
2020-12-16 18:23 ` [PATCH 08/25] mm: Add __alloc_folio_node and alloc_folio Matthew Wilcox (Oracle)
2020-12-16 18:23 ` [PATCH 09/25] mm: Convert __page_cache_alloc to return a folio Matthew Wilcox (Oracle)
2020-12-16 18:23 ` [PATCH 10/25] mm/filemap: Convert end_page_writeback to use " Matthew Wilcox (Oracle)
2020-12-16 18:23 ` [PATCH 11/25] mm: Convert mapping_get_entry to return " Matthew Wilcox (Oracle)
2020-12-16 18:23 ` [PATCH 12/25] mm: Add mark_folio_accessed Matthew Wilcox (Oracle)
2020-12-16 18:23 ` [PATCH 13/25] mm: Add filemap_get_folio and find_get_folio Matthew Wilcox (Oracle)
2020-12-16 18:23 ` [PATCH 14/25] mm/filemap: Add folio_add_to_page_cache Matthew Wilcox (Oracle)
2020-12-16 18:23 ` [PATCH 15/25] mm/swap: Convert rotate_reclaimable_page to folio Matthew Wilcox (Oracle)
2020-12-16 18:23 ` [PATCH 16/25] mm: Add folio_mapping Matthew Wilcox (Oracle)
2020-12-16 18:23 ` [PATCH 17/25] mm: Rename THP_SUPPORT to MULTI_PAGE_FOLIOS Matthew Wilcox (Oracle)
2020-12-16 18:23 ` [PATCH 18/25] btrfs: Use readahead_batch_length Matthew Wilcox (Oracle)
2020-12-17  9:15   ` John Hubbard
2020-12-17 12:12     ` Matthew Wilcox
2020-12-17 13:42       ` Matthew Wilcox
2020-12-17 19:36         ` John Hubbard
2020-12-16 18:23 ` [PATCH 19/25] fs: Change page refcount rules for readahead Matthew Wilcox (Oracle)
2020-12-16 18:23 ` [PATCH 20/25] fs: Change readpage to take a folio Matthew Wilcox (Oracle)
2020-12-16 18:23 ` [PATCH 21/25] mm: Convert wait_on_page_bit to wait_on_folio_bit Matthew Wilcox (Oracle)
2020-12-16 18:23 ` [PATCH 22/25] mm: Add wait_on_folio_locked & wait_on_folio_locked_killable Matthew Wilcox (Oracle)
2020-12-16 18:23 ` [PATCH 23/25] mm: Add flush_dcache_folio Matthew Wilcox (Oracle)
2020-12-16 20:59   ` kernel test robot
2020-12-16 22:01     ` Matthew Wilcox
2020-12-16 18:23 ` [PATCH 24/25] mm: Add read_cache_folio and read_mapping_folio Matthew Wilcox (Oracle)
2020-12-16 18:23 ` [PATCH 25/25] fs: Convert vfs_dedupe_file_range_compare to folios Matthew Wilcox (Oracle)
2020-12-17 12:47 ` [PATCH 00/25] Page folios David Hildenbrand
2020-12-17 13:55   ` Matthew Wilcox [this message]
2020-12-17 14:35     ` David Hildenbrand

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20201217135517.GF15600@casper.infradead.org \
    --to=willy@infradead.org \
    --cc=david@redhat.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).