All of lore.kernel.org
 help / color / mirror / Atom feed
From: Matthew Wilcox <willy@infradead.org>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH v4 01/25] mm: Introduce struct folio
Date: Sun, 14 Mar 2021 04:07:31 +0000	[thread overview]
Message-ID: <20210314040731.GO2577561@casper.infradead.org> (raw)
In-Reply-To: <20210313123702.b7724b84d955ab6669d4417a@linux-foundation.org>

On Sat, Mar 13, 2021 at 12:37:02PM -0800, Andrew Morton wrote:
> On Fri,  5 Mar 2021 04:18:37 +0000 "Matthew Wilcox (Oracle)" <willy@infradead.org> wrote:
> 
> > A struct folio refers to an entire (possibly compound) page.  A function
> > which takes a struct folio argument declares that it will operate on the
> > entire compound page, not just PAGE_SIZE bytes.  In return, the caller
> > guarantees that the pointer it is passing does not point to a tail page.
> > 
> > Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> > ---
> >  include/linux/mm.h       | 30 ++++++++++++++++++++++++++++++
> >  include/linux/mm_types.h | 17 +++++++++++++++++
> 
> Perhaps a new folio.h would be neater.

I understand that urge (I'm no fan of the size of mm.h ...), but it ends
up pretty interwoven with mm.h.  For example:

static inline struct zone *folio_zone(const struct folio *folio)
{
        return page_zone(&folio->page);
}

so we need both struct folio defined here and we need page_zone().
page_zone() is defined in mm.h, so we'd have folio.h including mm.h.
But then put_page() calls put_folio(), so we need folio.h included
in mm.h.

The patch series I have does a lot of movement of page cache functionality
from mm.h to filemap.h, so you may end up with a smaller mm.h at the
end of it.  Right now, it's 10 lines longer than it was.

> > +static inline struct folio *next_folio(struct folio *folio)
> > +{
> > +#if defined(CONFIG_SPARSEMEM) && !defined(CONFIG_SPARSEMEM_VMEMMAP)
> > +	return (struct folio *)nth_page(&folio->page, folio_nr_pages(folio));
> > +#else
> > +	return folio + folio_nr_pages(folio);
> > +#endif
> > +}
> 
> It's a shame this isn't called folio_something(), like the rest of the API.
> 
> Unclear what this does.  Some comments would help.

folio_next() it can be.  I'll add some commentary.

> > +static inline unsigned int folio_shift(struct folio *folio)
> > +{
> > +	return PAGE_SHIFT + folio_order(folio);
> > +}
> > +
> > +static inline size_t folio_size(struct folio *folio)
> > +{
> > +	return PAGE_SIZE << folio_order(folio);
> > +}
> 
> Why size_t?  That's pretty rare in this space and I'd have expected
> unsigned long.

I like to use size_t for things which are the number of bytes represented
by an in-memory object.  As opposed to all the other things which we
use unsigned long for.  Maybe that's more common on the filesystem side
of the house.

> > +static inline struct folio *page_folio(struct page *page)
> > +{
> > +	unsigned long head = READ_ONCE(page->compound_head);
> > +
> > +	if (unlikely(head & 1))
> > +		return (struct folio *)(head - 1);
> > +	return (struct folio *)page;
> > +}
> 
> What purpose does the READ_ONCE() serve?

Same purpose as it does in compound_head():

static inline struct page *compound_head(struct page *page)
{
        unsigned long head = READ_ONCE(page->compound_head);

        if (unlikely(head & 1))
                return (struct page *) (head - 1);
        return page;
}

I think Kirill would say that it's to defend against splitting if we
don't have a refcount on the page yet.  So if we do something like walk
the page tables, find a PTE, translate it to a struct page, then try to
get a reference on the head page, we don't end up with an incoherent
answer from 'compound_head()' if it's split in the middle of the call
and the page->compound_head field gets assigned some other value.

It might return the wrong page, so get_user_pages() still has to check
the page is right after it's got the reference, but at least this way
it's guaranteed to return something that was right at one time.

There might be more to it, but that's my understanding of why the code
is currently written this way.

  reply	other threads:[~2021-03-14  4:08 UTC|newest]

Thread overview: 62+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-03-05  4:18 [PATCH v4 00/25] Page folios Matthew Wilcox (Oracle)
2021-03-05  4:18 ` [PATCH v4 01/25] mm: Introduce struct folio Matthew Wilcox (Oracle)
2021-03-13 20:37   ` Andrew Morton
2021-03-14  4:07     ` Matthew Wilcox [this message]
2021-03-17 17:14   ` Christoph Hellwig
2021-03-18 23:56   ` Balbir Singh
2021-03-19  1:25     ` Matthew Wilcox
2021-03-20  2:09       ` Balbir Singh
2021-03-22  2:52       ` Nicholas Piggin
2021-03-22  3:54         ` Matthew Wilcox
2021-03-05  4:18 ` [PATCH v4 02/25] mm: Add folio_pgdat and folio_zone Matthew Wilcox (Oracle)
2021-03-05  4:18 ` [PATCH v4 03/25] mm/vmstat: Add functions to account folio statistics Matthew Wilcox (Oracle)
2021-03-13 20:37   ` Andrew Morton
2021-03-14  4:11     ` Matthew Wilcox
2021-03-14  4:51       ` Andrew Morton
2021-03-17 17:16   ` Christoph Hellwig
2021-03-05  4:18 ` [PATCH v4 04/25] mm/debug: Add VM_BUG_ON_FOLIO and VM_WARN_ON_ONCE_FOLIO Matthew Wilcox (Oracle)
2021-03-05  4:18 ` [PATCH v4 05/25] mm: Add put_folio Matthew Wilcox (Oracle)
2021-03-05  4:18 ` [PATCH v4 06/25] mm: Add get_folio Matthew Wilcox (Oracle)
2021-03-05  4:18 ` [PATCH v4 07/25] mm: Create FolioFlags Matthew Wilcox (Oracle)
2021-03-15  2:24   ` Matthew Wilcox
2021-03-05  4:18 ` [PATCH v4 08/25] mm: Handle per-folio private data Matthew Wilcox (Oracle)
2021-03-17 17:20   ` Christoph Hellwig
2021-03-18 17:57     ` Matthew Wilcox
2021-03-05  4:18 ` [PATCH v4 09/25] mm: Add folio_index, folio_page and folio_contains Matthew Wilcox (Oracle)
2021-03-13 20:37   ` Andrew Morton
2021-03-14  3:45     ` Matthew Wilcox
2021-03-17 17:22     ` Christoph Hellwig
2021-03-05  4:18 ` [PATCH v4 10/25] mm/util: Add folio_mapping and folio_file_mapping Matthew Wilcox (Oracle)
2021-03-17 17:26   ` Christoph Hellwig
2021-03-05  4:18 ` [PATCH v4 11/25] mm/memcg: Add folio wrappers for various memcontrol functions Matthew Wilcox (Oracle)
2021-03-05  4:18 ` [PATCH v4 12/25] mm/filemap: Add unlock_folio Matthew Wilcox (Oracle)
2021-03-05  4:18 ` [PATCH v4 13/25] mm/filemap: Add lock_folio Matthew Wilcox (Oracle)
2021-03-05  4:18 ` [PATCH v4 14/25] mm/filemap: Add lock_folio_killable Matthew Wilcox (Oracle)
2021-03-05  4:18 ` [PATCH v4 15/25] mm/filemap: Convert lock_page_async to lock_folio_async Matthew Wilcox (Oracle)
2021-03-17 17:29   ` Christoph Hellwig
2021-03-05  4:18 ` [PATCH v4 16/25] mm/filemap: Convert end_page_writeback to end_folio_writeback Matthew Wilcox (Oracle)
2021-03-05  4:18 ` [PATCH v4 17/25] mm/filemap: Add wait_on_folio_locked & wait_on_folio_locked_killable Matthew Wilcox (Oracle)
2021-03-05  4:18 ` [PATCH v4 18/25] mm/page-writeback: Add wait_on_folio_writeback Matthew Wilcox (Oracle)
2021-03-05  4:18 ` [PATCH v4 19/25] mm/page-writeback: Add wait_for_stable_folio Matthew Wilcox (Oracle)
2021-03-05  4:18 ` [PATCH v4 20/25] mm/filemap: Convert wait_on_page_bit to wait_on_folio_bit Matthew Wilcox (Oracle)
2021-03-17 17:39   ` Christoph Hellwig
2021-03-05  4:18 ` [PATCH v4 21/25] mm/filemap: Add __lock_folio_or_retry Matthew Wilcox (Oracle)
2021-03-05  4:18 ` [PATCH v4 22/25] mm/filemap: Convert wake_up_page_bit to wake_up_folio_bit Matthew Wilcox (Oracle)
2021-03-05  4:18 ` [PATCH v4 23/25] mm/page-writeback: Convert test_clear_page_writeback to take a folio Matthew Wilcox (Oracle)
2021-03-05  4:19 ` [PATCH v4 24/25] mm/filemap: Convert page wait queues to be folios Matthew Wilcox (Oracle)
2021-03-05  4:19 ` [PATCH v4 25/25] cachefiles: Switch to wait_page_key Matthew Wilcox (Oracle)
2021-03-17 17:45   ` Christoph Hellwig
2021-03-13 20:36 ` [PATCH v4 00/25] Page folios Andrew Morton
2021-03-14  2:30   ` Matthew Wilcox
2021-03-14  3:09   ` Hugh Dickins
2021-03-14  3:09     ` Hugh Dickins
2021-03-15 11:55     ` Kirill A. Shutemov
2021-03-15 12:38       ` Michal Hocko
2021-03-15 19:09         ` Christoph Hellwig
2021-03-15 19:40           ` Matthew Wilcox
2021-03-15 22:27             ` Dave Chinner
2021-03-17 17:48             ` Christoph Hellwig
2021-03-15 21:33         ` Andi Kleen
2021-03-15 21:33           ` Andi Kleen
2021-03-15 13:45       ` Matthew Wilcox
2021-03-19  4:01 ` Matthew Wilcox

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210314040731.GO2577561@casper.infradead.org \
    --to=willy@infradead.org \
    --cc=akpm@linux-foundation.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.