linux-arm-kernel.lists.infradead.org archive mirror
 help / color / mirror / Atom feed
From: Hugh Dickins <hughd@google.com>
To: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Android Kernel Team <kernel-team@android.com>,
	Jan Kara <jack@suse.cz>, Minchan Kim <minchan@kernel.org>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Hugh Dickins <hughd@google.com>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Matthew Wilcox <willy@infradead.org>,
	Linux-MM <linux-mm@kvack.org>,
	Vinayak Menon <vinmenon@codeaurora.org>,
	Linux ARM <linux-arm-kernel@lists.infradead.org>,
	Catalin Marinas <catalin.marinas@arm.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Will Deacon <will@kernel.org>,
	"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Subject: Re: [PATCH 1/2] mm: Allow architectures to request 'old' entries when prefaulting
Date: Mon, 28 Dec 2020 20:35:06 -0800 (PST)	[thread overview]
Message-ID: <alpine.LSU.2.11.2012281925420.1028@eggly.anvils> (raw)
In-Reply-To: <20201228221237.6nu75kgxq7ikxn2a@box>

Got it at last, sorry it's taken so long.

On Tue, 29 Dec 2020, Kirill A. Shutemov wrote:
> On Tue, Dec 29, 2020 at 01:05:48AM +0300, Kirill A. Shutemov wrote:
> > On Mon, Dec 28, 2020 at 10:47:36AM -0800, Linus Torvalds wrote:
> > > On Mon, Dec 28, 2020 at 4:53 AM Kirill A. Shutemov <kirill@shutemov.name> wrote:
> > > >
> > > > So far I only found one more pin leak and always-true check. I don't see
> > > > how can it lead to crash or corruption. Keep looking.

Those mods look good in themselves, but, as you expected,
made no difference to the corruption I was seeing.

> > > 
> > > Well, I noticed that the nommu.c version of filemap_map_pages() needs
> > > fixing, but that's obviously not the case Hugh sees.
> > > 
> > > No,m I think the problem is the
> > > 
> > >         pte_unmap_unlock(vmf->pte, vmf->ptl);
> > > 
> > > at the end of filemap_map_pages().
> > > 
> > > Why?
> > > 
> > > Because we've been updating vmf->pte as we go along:
> > > 
> > >                 vmf->pte += xas.xa_index - last_pgoff;
> > > 
> > > and I think that by the time we get to that "pte_unmap_unlock()",
> > > vmf->pte potentially points to past the edge of the page directory.
> > 
> > Well, if it's true we have bigger problem: we set up an pte entry without
> > relevant PTL.
> > 
> > But I *think* we should be fine here: do_fault_around() limits start_pgoff
> > and end_pgoff to stay within the page table.

Yes, Linus's patch had made no difference,
the map_pages loop is safe in that respect.

> > 
> > It made mw looking at the code around pte_unmap_unlock() and I think that
> > the bug is that we have to reset vmf->address and NULLify vmf->pte once we
> > are done with faultaround:
> > 
> > diff --git a/mm/memory.c b/mm/memory.c
> 
> Ugh.. Wrong place. Need to sleep.
> 
> I'll look into your idea tomorrow.
> 
> diff --git a/mm/filemap.c b/mm/filemap.c
> index 87671284de62..e4daab80ed81 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -2987,6 +2987,8 @@ vm_fault_t filemap_map_pages(struct vm_fault *vmf, unsigned long address,
>  	} while ((head = next_map_page(vmf, &xas, end_pgoff)) != NULL);
>  	pte_unmap_unlock(vmf->pte, vmf->ptl);
>  	rcu_read_unlock();
> +	vmf->address = address;
> +	vmf->pte = NULL;
>  	WRITE_ONCE(file->f_ra.mmap_miss, mmap_miss);
>  
>  	return ret;
> -- 

And that made no (noticeable) difference either.  But at last
I realized, it's absolutely on the right track, but missing the
couple of early returns at the head of filemap_map_pages(): add

--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -3025,14 +3025,12 @@ vm_fault_t filemap_map_pages(struct vm_f
 
 	rcu_read_lock();
 	head = first_map_page(vmf, &xas, end_pgoff);
-	if (!head) {
-		rcu_read_unlock();
-		return 0;
-	}
+	if (!head)
+		goto out;
 
 	if (filemap_map_pmd(vmf, head)) {
-		rcu_read_unlock();
-		return VM_FAULT_NOPAGE;
+		ret = VM_FAULT_NOPAGE;
+		goto out;
 	}
 
 	vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd,
@@ -3066,9 +3064,9 @@ unlock:
 		put_page(head);
 	} while ((head = next_map_page(vmf, &xas, end_pgoff)) != NULL);
 	pte_unmap_unlock(vmf->pte, vmf->ptl);
+out:
 	rcu_read_unlock();
 	vmf->address = address;
-	vmf->pte = NULL;
 	WRITE_ONCE(file->f_ra.mmap_miss, mmap_miss);
 
 	return ret;
--

and then the corruption is fixed.  It seems miraculous that the
machines even booted with that bad vmf->address going to __do_fault():
maybe that tells us what a good job map_pages does most of the time.

You'll see I've tried removing the "vmf->pte = NULL;" there. I did
criticize earlier that vmf->pte was being left set, but was either
thinking back to some earlier era of mm/memory.c, or else confusing
with vmf->prealloc_pte, which is NULLed when consumed: I could not
find anywhere in mm/memory.c which now needs vmf->pte to be cleared,
and I seem to run fine without it (even on i386 HIGHPTE).

So, the mystery is solved; but I don't think any of these patches
should be applied.  Without thinking through Linus's suggestions
re do_set_pte() in particular, I do think this map_pages interface
is too ugly, and given us lots of trouble: please take your time
to go over it all again, and come up with a cleaner patch.

I've grown rather jaded, and questioning the value of the rework:
I don't think I want to look at or test another for a week or so.

Hugh

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

  reply	other threads:[~2020-12-29  4:37 UTC|newest]

Thread overview: 55+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-12-09 16:39 [PATCH 0/2] Create 'old' ptes for faultaround mappings on arm64 with hardware access flag Will Deacon
2020-12-09 16:39 ` [PATCH 1/2] mm: Allow architectures to request 'old' entries when prefaulting Will Deacon
2020-12-09 17:58   ` Linus Torvalds
2020-12-09 18:40     ` Will Deacon
2020-12-09 19:04       ` Linus Torvalds
2020-12-09 20:32         ` Matthew Wilcox
2020-12-09 21:04           ` Linus Torvalds
2020-12-10 15:08         ` Kirill A. Shutemov
2020-12-10 17:23           ` Linus Torvalds
2020-12-14 16:07             ` Kirill A. Shutemov
2020-12-14 17:54               ` Linus Torvalds
2020-12-14 18:56                 ` Matthew Wilcox
2020-12-16 17:07                 ` Kirill A. Shutemov
2020-12-16 18:41                   ` Linus Torvalds
2020-12-17 10:54                     ` Kirill A. Shutemov
2020-12-17 18:22                       ` Linus Torvalds
2020-12-18 11:04                         ` Kirill A. Shutemov
2020-12-18 18:56                           ` Linus Torvalds
2020-12-19 12:41                             ` Kirill A. Shutemov
2020-12-19 20:08                               ` Linus Torvalds
2020-12-19 20:34                               ` Linus Torvalds
2020-12-22 10:00                                 ` Kirill A. Shutemov
2020-12-24  4:04                                   ` Hugh Dickins
2020-12-25 11:31                                     ` Kirill A. Shutemov
2020-12-26 17:57                                       ` Linus Torvalds
2020-12-26 20:43                                         ` Kirill A. Shutemov
2020-12-26 21:03                                           ` Hugh Dickins
2020-12-26 21:16                                             ` Linus Torvalds
2020-12-26 22:40                                               ` Kirill A. Shutemov
2020-12-27  0:45                                                 ` Hugh Dickins
2020-12-27  2:38                                                   ` Hugh Dickins
2020-12-27 19:38                                                     ` Linus Torvalds
2020-12-27 20:32                                                       ` Damian Tometzki
2020-12-27 22:35                                                         ` Hugh Dickins
2020-12-27 23:12                                                           ` Linus Torvalds
2020-12-27 23:40                                                             ` Linus Torvalds
2020-12-27 23:55                                                               ` Kirill A. Shutemov
2020-12-27 23:48                                                       ` Kirill A. Shutemov
2020-12-28  1:54                                                         ` Linus Torvalds
2020-12-28  6:43                                                           ` Hugh Dickins
2020-12-28 12:53                                                             ` Kirill A. Shutemov
2020-12-28 18:47                                                               ` Linus Torvalds
2020-12-28 21:58                                                                 ` Linus Torvalds
2020-12-29 13:28                                                                   ` Kirill A. Shutemov
2020-12-29 15:19                                                                     ` Matthew Wilcox
2020-12-29 20:52                                                                     ` Linus Torvalds
2020-12-28 22:05                                                                 ` Kirill A. Shutemov
2020-12-28 22:12                                                                   ` Kirill A. Shutemov
2020-12-29  4:35                                                                     ` Hugh Dickins [this message]
2020-12-28 23:28                                                                   ` Linus Torvalds
2020-12-26 21:07                                           ` Linus Torvalds
2020-12-26 21:41                                           ` Matthew Wilcox
2020-12-09 16:39 ` [PATCH 2/2] arm64: mm: Implement arch_wants_old_faultaround_pte() Will Deacon
2020-12-09 18:35   ` Catalin Marinas
2020-12-09 18:46     ` Will Deacon

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=alpine.LSU.2.11.2012281925420.1028@eggly.anvils \
    --to=hughd@google.com \
    --cc=akpm@linux-foundation.org \
    --cc=catalin.marinas@arm.com \
    --cc=jack@suse.cz \
    --cc=kernel-team@android.com \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=kirill@shutemov.name \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=minchan@kernel.org \
    --cc=torvalds@linux-foundation.org \
    --cc=vinmenon@codeaurora.org \
    --cc=will@kernel.org \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).