From: Matthew Wilcox <willy@infradead.org>
To: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: linux-mm@kvack.org, Andrew Morton <akpm@linux-foundation.org>,
	Huang Ying <ying.huang@intel.com>
Subject: Re: [PATCH 02/11] mm/memory: Remove page fault assumption of compound page size
Date: Wed, 9 Sep 2020 15:50:35 +0100
Message-ID: <20200909145035.GH6583@casper.infradead.org>
In-Reply-To: <20200909142904.acca6gthbffk3jwq@box>

On Wed, Sep 09, 2020 at 05:29:04PM +0300, Kirill A. Shutemov wrote:
> On Tue, Sep 08, 2020 at 08:55:29PM +0100, Matthew Wilcox (Oracle) wrote:
> > A compound page in the page cache will not necessarily be of PMD size,
> > so check explicitly.
> > 
> > Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> > ---
> >  mm/memory.c | 7 ++++---
> >  1 file changed, 4 insertions(+), 3 deletions(-)
> > 
> > diff --git a/mm/memory.c b/mm/memory.c
> > index 602f4283122f..4b35b4e71e64 100644
> > --- a/mm/memory.c
> > +++ b/mm/memory.c
> > @@ -3562,13 +3562,14 @@ static vm_fault_t do_set_pmd(struct vm_fault *vmf, struct page *page)
> >  	unsigned long haddr = vmf->address & HPAGE_PMD_MASK;
> >  	pmd_t entry;
> >  	int i;
> > -	vm_fault_t ret;
> > +	vm_fault_t ret = VM_FAULT_FALLBACK;
> >  
> >  	if (!transhuge_vma_suitable(vma, haddr))
> > -		return VM_FAULT_FALLBACK;
> > +		return ret;
> >  
> > -	ret = VM_FAULT_FALLBACK;
> >  	page = compound_head(page);
> > +	if (page_order(page) != HPAGE_PMD_ORDER)
> > +		return ret;
> 
> Maybe also VM_BUG_ON_PAGE(page_order(page) > HPAGE_PMD_ORDER, page)?
> Just in case.

In the patch where I actually start creating THPs, I limit the order to
HPAGE_PMD_ORDER, so we're not going to see this today.  At some point
in the future, I can imagine that we allow THPs larger than PMD size,
and what we'd want alloc_set_pte() to look like is:

	if (pud_none(*vmf->pud) && PageTransCompound(page)) {
		ret = do_set_pud(vmf, page);
		if (ret != VM_FAULT_FALLBACK)
			return ret;
	}
	if (pmd_none(*vmf->pmd) && PageTransCompound(page)) {
		ret = do_set_pmd(vmf, page);
		if (ret != VM_FAULT_FALLBACK)
			return ret;
	}

Once we're in that situation, in do_set_pmd(), we'd want to figure out
which sub-page of the >PMD-sized page to insert.  But I don't want to
write code for that now.
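
As a rough user-space illustration (not kernel code) of the sub-page
selection described above, the arithmetic might look like the sketch
below.  It assumes 4KiB base pages and a PMD order of 9, and the helper
name pmd_chunk_start() and its parameters are hypothetical, invented
only for this example.

```c
#include <assert.h>

/* Assumed for illustration: a PMD maps 2^9 = 512 base pages. */
#define HPAGE_PMD_ORDER 9
#define HPAGE_PMD_NR    (1UL << HPAGE_PMD_ORDER)

/*
 * Hypothetical helper, not a kernel function.
 *
 * head_index:  page cache index of the compound page's head page
 * order:       order of the compound page (>= HPAGE_PMD_ORDER here)
 * fault_pgoff: page cache index of the faulting address
 *
 * Returns the offset, in base pages from the head page, of the first
 * sub-page of the PMD-sized chunk that covers the fault.
 */
static unsigned long pmd_chunk_start(unsigned long head_index,
				     unsigned int order,
				     unsigned long fault_pgoff)
{
	unsigned long nr_pages = 1UL << order;
	unsigned long subpage;

	/* The fault must land inside a page at least as large as a PMD. */
	assert(order >= HPAGE_PMD_ORDER);
	assert(fault_pgoff >= head_index);
	assert(fault_pgoff - head_index < nr_pages);

	subpage = fault_pgoff - head_index;

	/* Round down to a PMD-sized boundary within the compound page. */
	return subpage & ~(HPAGE_PMD_NR - 1);
}
```

For an order-10 page whose head sits at index 2048, a fault at index
2748 (sub-page 700) falls in the second PMD-sized chunk, so the helper
returns 512 rather than 0.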

So, what's the right approach if somebody does call alloc_set_pte()
with a >PMD sized page?  It's not exported, so the only two ways to get
it called with a >PMD sized page are to (1) persuade filemap_map_pages()
to call it, which means putting it in the page cache or (2) return it
from vm_ops->fault.  If someone actually does that (an interesting
device driver, perhaps), I don't think hitting it with a BUG is the
right response.  I think it should actually be to map the right PMD-sized
chunk of the page, but we don't even do that today -- we map the first
PMD-sized chunk of the page.

With this patch, we'll simply map the appropriate PAGE_SIZE chunk at the
requested address.  So this would be a bugfix for such a demented driver.
At some point, it'd be nice to handle this with a PMD, but I don't want
to write that code without a test-case.  We could probably simulate
it with the page cache THP code and be super-aggressive about creating
order-10 pages ... but this is feeling more and more out of scope for
this patch set, which today hit 99 patches.
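
To make the behaviour with this patch concrete, here is a simplified
user-space model of the decision, under stated assumptions: it looks
only at the page order (the real do_set_pmd() also checks things such
as transhuge_vma_suitable()), it assumes a PMD order of 9, and the
names map_kind, mapping and choose_mapping() are hypothetical.

```c
#include <assert.h>

/* Assumed for illustration: a PMD maps 2^9 = 512 base pages. */
#define HPAGE_PMD_ORDER 9

enum map_kind { MAP_PTE, MAP_PMD };

struct mapping {
	enum map_kind kind;	/* how the page would be mapped */
	unsigned long subpage;	/* first sub-page mapped, relative to head */
};

/*
 * Hypothetical model of the patched logic: only a page of exactly PMD
 * order is mapped with a PMD.  Any other order falls back to mapping
 * the single PAGE_SIZE sub-page at the requested address.
 */
static struct mapping choose_mapping(unsigned int order,
				     unsigned long head_index,
				     unsigned long fault_pgoff)
{
	struct mapping m;

	/* The fault must land inside the compound page. */
	assert(fault_pgoff >= head_index);
	assert(fault_pgoff - head_index < (1UL << order));

	if (order == HPAGE_PMD_ORDER) {
		m.kind = MAP_PMD;
		m.subpage = 0;		/* the whole page, from its head */
	} else {
		m.kind = MAP_PTE;	/* the VM_FAULT_FALLBACK path */
		m.subpage = fault_pgoff - head_index;
	}
	return m;
}
```

In this model an order-10 page faulted at sub-page 700 is mapped with a
PTE for sub-page 700, where the old code would have mapped the first
PMD-sized chunk.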


