From: Hugh Dickins <hughd@google.com>
To: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Android Kernel Team <kernel-team@android.com>,
	Jan Kara <jack@suse.cz>, Minchan Kim <minchan@kernel.org>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Matthew Wilcox <willy@infradead.org>,
	Linux-MM <linux-mm@kvack.org>,
	Vinayak Menon <vinmenon@codeaurora.org>,
	Linux ARM <linux-arm-kernel@lists.infradead.org>,
	Catalin Marinas <catalin.marinas@arm.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Will Deacon <will@kernel.org>,
	"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Subject: Re: [PATCH 1/2] mm: Allow architectures to request 'old' entries when prefaulting
Date: Wed, 23 Dec 2020 20:04:54 -0800 (PST)
Message-ID: <alpine.LSU.2.11.2012231905300.5723@eggly.anvils>
In-Reply-To: <20201222100047.p5zdb4ghagncq2oe@box>

On Tue, 22 Dec 2020, Kirill A. Shutemov wrote:
> 
> Updated patch is below.
> 
> From 0ec1bc1fe95587350ac4f4c866d6482383740b36 Mon Sep 17 00:00:00 2001
> From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
> Date: Sat, 19 Dec 2020 15:19:23 +0300
> Subject: [PATCH] mm: Cleanup faultaround and finish_fault() codepaths
> 
> alloc_set_pte() has two users with different requirements: in the
> faultaround code, it is called from an atomic context and the PTE page
> table has to be preallocated. finish_fault() can sleep and allocate the
> page table as needed.
> 
> PTL locking rules are also strange, hard to follow and overkill for
> finish_fault().
> 
> Let's untangle the mess. alloc_set_pte() has gone now. All locking is
> explicit.
> 
> The price is some code duplication to handle huge pages in the
> faultaround path, but that should be fine given the overall improvement
> in readability.
> 
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>

It's not ready yet.

I won't pretend to have reviewed it, but I did try applying and running
with it: mostly it seems to work fine, but it turned out to be leaking
huge pages (with vmstat's thp_split_page_failed growing bigger and
bigger as page reclaim cannot get rid of them).

Aside from the actual bug, filemap_map_pmd() seems suboptimal at
present: comments below (plus one comment in do_anonymous_page()).

> diff --git a/mm/filemap.c b/mm/filemap.c
> index 0b2067b3c328..f8fdbe079375 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -2831,10 +2832,74 @@ vm_fault_t filemap_fault(struct vm_fault *vmf)
>  }
>  EXPORT_SYMBOL(filemap_fault);
>  
> +static bool filemap_map_pmd(struct vm_fault *vmf, struct page *page,
> +				  struct xa_state *xas)
> +{
> +	struct vm_area_struct *vma = vmf->vma;
> +	struct address_space *mapping = vma->vm_file->f_mapping;
> +
> +	/* Huge page is mapped? No need to proceed. */
> +	if (pmd_trans_huge(*vmf->pmd))
> +		return true;
> +
> +	if (xa_is_value(page))
> +		goto nohuge;

I think it would be easier to follow if filemap_map_pages() never
passed this an xa_is_value(page): probably just skip them in its
initial xas_next_entry() loop.
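
As a rough illustration of that suggestion, here is a userspace toy, not
kernel code: the caller's loop skips empty slots and value entries, so
whatever it hands on is always a real page pointer.  The only fact borrowed
from the XArray is that value entries carry a set low bit; the names
toy_is_value() and first_mappable_entry() are made up for this sketch, and
the real loop has further checks (retry entries and so on) not modelled here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for xa_is_value(): value entries have the low bit set. */
static int toy_is_value(const void *entry)
{
	return (int)((uintptr_t)entry & 1);
}

/*
 * Hypothetical caller-side loop: return the first entry that is a real
 * pointer, skipping empty slots and value entries, so the callee never
 * has to test for them itself.
 */
static void *first_mappable_entry(void **entries, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (entries[i] == NULL || toy_is_value(entries[i]))
			continue;
		return entries[i];
	}
	return NULL;
}
```

With the filtering done once in the loop, the xa_is_value() test at the top
of the pmd-mapping helper would have nothing left to catch.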

> +
> +	if (!pmd_none(*vmf->pmd))
> +		goto nohuge;

Then at nohuge it unconditionally takes pmd_lock(), finds !pmd_none,
and unlocks again: unnecessary overhead I believe we did not have before.
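
To show the overhead in question, a userspace toy (again not kernel code)
that simply counts lock acquisitions.  struct toy_pmd and both function
names are invented for the sketch; the second variant is only one possible
shape, peeking at the pmd before locking and rechecking under the lock.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy pmd: whether a page table is already installed, plus a lock counter. */
struct toy_pmd {
	bool populated;
	int lock_acquisitions;
};

/* As posted: always take the lock, then discover there is nothing to do. */
static void install_pte_table_always_lock(struct toy_pmd *pmd)
{
	pmd->lock_acquisitions++;		/* models pmd_lock() */
	if (!pmd->populated)
		pmd->populated = true;		/* models pmd_populate() */
	/* models spin_unlock() */
}

/* Alternative: cheap unlocked check first, recheck once the lock is held. */
static void install_pte_table_check_first(struct toy_pmd *pmd)
{
	if (pmd->populated)
		return;				/* common case: no locking at all */
	pmd->lock_acquisitions++;		/* models pmd_lock() */
	if (!pmd->populated)
		pmd->populated = true;		/* models pmd_populate() */
	assert(pmd->populated);
	/* models spin_unlock() */
}
```

When the page table is already there, the first variant still pays for one
lock/unlock pair while the second pays for none.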

> +
> +	if (!PageTransHuge(page) || PageLocked(page))
> +		goto nohuge;

So if PageTransHuge, but someone else temporarily holds PageLocked,
we insert a page table at nohuge, sadly preventing it from being
mapped here later by huge pmd.

> +
> +	if (!page_cache_get_speculative(page))
> +		goto nohuge;
> +
> +	if (page != xas_reload(xas))
> +		goto unref;
> +
> +	if (!PageTransHuge(page))
> +		goto unref;
> +
> +	if (!PageUptodate(page) || PageReadahead(page) || PageHWPoison(page))
> +		goto unref;
> +
> +	if (!trylock_page(page))
> +		goto unref;
> +
> +	if (page->mapping != mapping || !PageUptodate(page))
> +		goto unlock;
> +
> +	if (xas->xa_index >= DIV_ROUND_UP(i_size_read(mapping->host), PAGE_SIZE))
> +		goto unlock;
> +
> +	do_set_pmd(vmf, page);

Here is the source of the huge page leak: do_set_pmd() can fail
(and we would do better to have skipped most of its failure cases long
before getting this far).  It worked without leaking once I patched it:

-	do_set_pmd(vmf, page);
-	unlock_page(page);
-	return true;
+	if (do_set_pmd(vmf, page) == 0) {
+		unlock_page(page);
+		return true;
+	}
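
The leak can be pictured with a small userspace model (not kernel code).
struct toy_page, toy_set_pmd() and the two map_huge_*() functions are
hypothetical stand-ins; the model assumes, as the patch does, that a
successful mapping keeps the reference taken earlier, and it returns false
on failure where the real function would carry on to nohuge.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for struct page: a reference count and a lock bit. */
struct toy_page {
	int refcount;
	bool locked;
};

/* Hypothetical stand-in for do_set_pmd(): 0 on success, nonzero on failure. */
static int toy_set_pmd(struct toy_page *page, bool can_map)
{
	(void)page;
	return can_map ? 0 : 1;
}

/* As posted: result ignored, so a failed mapping keeps the extra reference. */
static bool map_huge_ignoring_result(struct toy_page *page, bool can_map)
{
	page->refcount++;		/* models page_cache_get_speculative() */
	page->locked = true;		/* models trylock_page() */
	toy_set_pmd(page, can_map);	/* failure goes unnoticed */
	page->locked = false;		/* models unlock_page() */
	return true;
}

/* Patched: on failure fall through to the unlock and unref steps. */
static bool map_huge_checking_result(struct toy_page *page, bool can_map)
{
	page->refcount++;		/* models page_cache_get_speculative() */
	page->locked = true;		/* models trylock_page() */
	if (toy_set_pmd(page, can_map) == 0) {
		page->locked = false;	/* models unlock_page() */
		return true;		/* the mapping now holds the reference */
	}
	page->locked = false;		/* unlock: */
	page->refcount--;		/* unref: models put_page() */
	return false;
}
```

Starting from a refcount of 1, a failed map_huge_ignoring_result() leaves it
at 2 with nothing mapped, while map_huge_checking_result() brings it back
to 1.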

> +	unlock_page(page);
> +	return true;
> +unlock:
> +	unlock_page(page);
> +unref:
> +	put_page(page);
> +nohuge:
> +	vmf->ptl = pmd_lock(vma->vm_mm, vmf->pmd);
> +	if (likely(pmd_none(*vmf->pmd))) {
> +		mm_inc_nr_ptes(vma->vm_mm);
> +		pmd_populate(vma->vm_mm, vmf->pmd, vmf->prealloc_pte);
> +		vmf->prealloc_pte = NULL;
> +	}
> +	spin_unlock(vmf->ptl);

I think it's a bit weird to hide this page table insertion inside
filemap_map_pmd() (I guess you're thinking that this function deals
with pmd level, but I'd find it easier to have a filemap_map_huge()
dealing with the huge mapping).  Better to do it on return into
filemap_map_pages(); maybe filemap_map_pmd() or filemap_map_huge()
would then need to return vm_fault_t rather than bool, I didn't try.

> +
> +	/* See comment in handle_pte_fault() */
> +	if (pmd_devmap_trans_unstable(vmf->pmd))
> +		return true;
> +
> +	return false;
> +}
...
> diff --git a/mm/memory.c b/mm/memory.c
> index c48f8df6e502..96d62774096a 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -3490,7 +3490,7 @@ static vm_fault_t do_anonymous_page(struct vm_fault *vmf)
>  	if (pte_alloc(vma->vm_mm, vmf->pmd))
>  		return VM_FAULT_OOM;
>  
> -	/* See the comment in pte_alloc_one_map() */
> +	/* See the comment in map_set_pte() */

No, no such function: probably should be like the others and say
	/* See comment in handle_pte_fault() */

Hugh
