From: Linus Torvalds <torvalds@linux-foundation.org> To: "Kirill A. Shutemov" <kirill@shutemov.name> Cc: Hugh Dickins <hughd@google.com>, Matthew Wilcox <willy@infradead.org>, "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>, Will Deacon <will@kernel.org>, Linux Kernel Mailing List <linux-kernel@vger.kernel.org>, Linux-MM <linux-mm@kvack.org>, Linux ARM <linux-arm-kernel@lists.infradead.org>, Catalin Marinas <catalin.marinas@arm.com>, Jan Kara <jack@suse.cz>, Minchan Kim <minchan@kernel.org>, Andrew Morton <akpm@linux-foundation.org>, Vinayak Menon <vinmenon@codeaurora.org>, Android Kernel Team <kernel-team@android.com> Subject: Re: [PATCH 1/2] mm: Allow architectures to request 'old' entries when prefaulting Date: Mon, 28 Dec 2020 10:47:36 -0800 [thread overview] Message-ID: <CAHk-=wg4bzJ9ugrOp7DBoMjNpHechm4QWb0-HC3A_pN564RU5A@mail.gmail.com> (raw) In-Reply-To: <20201228125352.phnj2x2ci3kwfld5@box> [-- Attachment #1: Type: text/plain, Size: 1983 bytes --] On Mon, Dec 28, 2020 at 4:53 AM Kirill A. Shutemov <kirill@shutemov.name> wrote: > > So far I only found one more pin leak and always-true check. I don't see > how can it lead to crash or corruption. Keep looking. Well, I noticed that the nommu.c version of filemap_map_pages() needs fixing, but that's obviously not the case Hugh sees. No,m I think the problem is the pte_unmap_unlock(vmf->pte, vmf->ptl); at the end of filemap_map_pages(). Why? Because we've been updating vmf->pte as we go along: vmf->pte += xas.xa_index - last_pgoff; and I think that by the time we get to that "pte_unmap_unlock()", vmf->pte potentially points to past the edge of the page directory. I think that is the bug that Hugh sees - simply because we get entirely confused about the page table locking. And it would match the latest change, which was all about moving that unlock from the caller to filemap_map_pages(), and now it's missing the pte fixup.. I personally think it's wrong to update vmf->pte at all. We should just have a local 'ptep' pointer that we update as we walk along. But that requires another change to the calling convention, namely to "do_set_pte()". Also, considering how complicated this patch is getting, I think it might be good to try to split it up a bit. In particular, I think the calling convention change for "filemap_map_pages()" could be done first and independently. And then as a second step, move the VM_FAULT_NOPAGE and "pte_unmap_lock()" from the callers to filemap_map_pages(). And then only as the final step, do that nice re-organization with first_map_page/next_map_page() and moving the locking from alloc_set_pte() into filemap_map_pages().. How does that sound? Anyway, Hugh, if it's about overshooting the pte pointer, maybe this absolutely horrendous and disgusting patch fixes it without the above kinds of more extensive cleanups. Worth testing, perhaps, even if it's too ugly for words? Linus [-- Attachment #2: patch --] [-- Type: application/octet-stream, Size: 1360 bytes --] mm/filemap.c | 14 ++++++++++++++ 1 file changed, 14 insertions(+) diff --git a/mm/filemap.c b/mm/filemap.c index 42d7a58e3a14..74384f9f1776 100644 --- a/mm/filemap.c +++ b/mm/filemap.c @@ -3022,6 +3022,8 @@ vm_fault_t filemap_map_pages(struct vm_fault *vmf, unsigned long address, struct page *head, *page; unsigned int mmap_miss = READ_ONCE(file->f_ra.mmap_miss); vm_fault_t ret = 0; + pte_t *orig_pte; + unsigned long orig_address; rcu_read_lock(); head = first_map_page(vmf, &xas, end_pgoff); @@ -3037,6 +3039,14 @@ vm_fault_t filemap_map_pages(struct vm_fault *vmf, unsigned long address, vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd, vmf->address, &vmf->ptl); + + /* + * Disgusting - we should not update vmf->pte and ->address, + * but do_set_pte() needs it + */ + orig_pte = vmf->pte; + orig_address = vmf->address; + do { page = find_subpage(head, xas.xa_index); if (PageHWPoison(page)) @@ -3066,6 +3076,10 @@ vm_fault_t filemap_map_pages(struct vm_fault *vmf, unsigned long address, unlock_page(head); put_page(head); } while ((head = next_map_page(vmf, &xas, end_pgoff)) != NULL); + + /* Hackety hack - reset the vmf state back */ + vmf->pte = orig_pte; + vmf->address = orig_address; pte_unmap_unlock(vmf->pte, vmf->ptl); rcu_read_unlock(); WRITE_ONCE(file->f_ra.mmap_miss, mmap_miss);
WARNING: multiple messages have this Message-ID (diff)
From: Linus Torvalds <torvalds@linux-foundation.org> To: "Kirill A. Shutemov" <kirill@shutemov.name> Cc: Android Kernel Team <kernel-team@android.com>, Jan Kara <jack@suse.cz>, Minchan Kim <minchan@kernel.org>, Catalin Marinas <catalin.marinas@arm.com>, Hugh Dickins <hughd@google.com>, Linux Kernel Mailing List <linux-kernel@vger.kernel.org>, Matthew Wilcox <willy@infradead.org>, Linux-MM <linux-mm@kvack.org>, Vinayak Menon <vinmenon@codeaurora.org>, Linux ARM <linux-arm-kernel@lists.infradead.org>, Andrew Morton <akpm@linux-foundation.org>, Will Deacon <will@kernel.org>, "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Subject: Re: [PATCH 1/2] mm: Allow architectures to request 'old' entries when prefaulting Date: Mon, 28 Dec 2020 10:47:36 -0800 [thread overview] Message-ID: <CAHk-=wg4bzJ9ugrOp7DBoMjNpHechm4QWb0-HC3A_pN564RU5A@mail.gmail.com> (raw) In-Reply-To: <20201228125352.phnj2x2ci3kwfld5@box> [-- Attachment #1: Type: text/plain, Size: 1983 bytes --] On Mon, Dec 28, 2020 at 4:53 AM Kirill A. Shutemov <kirill@shutemov.name> wrote: > > So far I only found one more pin leak and always-true check. I don't see > how can it lead to crash or corruption. Keep looking. Well, I noticed that the nommu.c version of filemap_map_pages() needs fixing, but that's obviously not the case Hugh sees. No,m I think the problem is the pte_unmap_unlock(vmf->pte, vmf->ptl); at the end of filemap_map_pages(). Why? Because we've been updating vmf->pte as we go along: vmf->pte += xas.xa_index - last_pgoff; and I think that by the time we get to that "pte_unmap_unlock()", vmf->pte potentially points to past the edge of the page directory. I think that is the bug that Hugh sees - simply because we get entirely confused about the page table locking. And it would match the latest change, which was all about moving that unlock from the caller to filemap_map_pages(), and now it's missing the pte fixup.. I personally think it's wrong to update vmf->pte at all. We should just have a local 'ptep' pointer that we update as we walk along. But that requires another change to the calling convention, namely to "do_set_pte()". Also, considering how complicated this patch is getting, I think it might be good to try to split it up a bit. In particular, I think the calling convention change for "filemap_map_pages()" could be done first and independently. And then as a second step, move the VM_FAULT_NOPAGE and "pte_unmap_lock()" from the callers to filemap_map_pages(). And then only as the final step, do that nice re-organization with first_map_page/next_map_page() and moving the locking from alloc_set_pte() into filemap_map_pages().. How does that sound? Anyway, Hugh, if it's about overshooting the pte pointer, maybe this absolutely horrendous and disgusting patch fixes it without the above kinds of more extensive cleanups. Worth testing, perhaps, even if it's too ugly for words? Linus [-- Attachment #2: patch --] [-- Type: application/octet-stream, Size: 1360 bytes --] mm/filemap.c | 14 ++++++++++++++ 1 file changed, 14 insertions(+) diff --git a/mm/filemap.c b/mm/filemap.c index 42d7a58e3a14..74384f9f1776 100644 --- a/mm/filemap.c +++ b/mm/filemap.c @@ -3022,6 +3022,8 @@ vm_fault_t filemap_map_pages(struct vm_fault *vmf, unsigned long address, struct page *head, *page; unsigned int mmap_miss = READ_ONCE(file->f_ra.mmap_miss); vm_fault_t ret = 0; + pte_t *orig_pte; + unsigned long orig_address; rcu_read_lock(); head = first_map_page(vmf, &xas, end_pgoff); @@ -3037,6 +3039,14 @@ vm_fault_t filemap_map_pages(struct vm_fault *vmf, unsigned long address, vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd, vmf->address, &vmf->ptl); + + /* + * Disgusting - we should not update vmf->pte and ->address, + * but do_set_pte() needs it + */ + orig_pte = vmf->pte; + orig_address = vmf->address; + do { page = find_subpage(head, xas.xa_index); if (PageHWPoison(page)) @@ -3066,6 +3076,10 @@ vm_fault_t filemap_map_pages(struct vm_fault *vmf, unsigned long address, unlock_page(head); put_page(head); } while ((head = next_map_page(vmf, &xas, end_pgoff)) != NULL); + + /* Hackety hack - reset the vmf state back */ + vmf->pte = orig_pte; + vmf->address = orig_address; pte_unmap_unlock(vmf->pte, vmf->ptl); rcu_read_unlock(); WRITE_ONCE(file->f_ra.mmap_miss, mmap_miss); [-- Attachment #3: Type: text/plain, Size: 176 bytes --] _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
next prev parent reply other threads:[~2020-12-28 18:48 UTC|newest] Thread overview: 138+ messages / expand[flat|nested] mbox.gz Atom feed top 2020-12-09 16:39 [PATCH 0/2] Create 'old' ptes for faultaround mappings on arm64 with hardware access flag Will Deacon 2020-12-09 16:39 ` Will Deacon 2020-12-09 16:39 ` [PATCH 1/2] mm: Allow architectures to request 'old' entries when prefaulting Will Deacon 2020-12-09 16:39 ` Will Deacon 2020-12-09 17:58 ` Linus Torvalds 2020-12-09 17:58 ` Linus Torvalds 2020-12-09 17:58 ` Linus Torvalds 2020-12-09 18:40 ` Will Deacon 2020-12-09 18:40 ` Will Deacon 2020-12-09 19:04 ` Linus Torvalds 2020-12-09 19:04 ` Linus Torvalds 2020-12-09 19:04 ` Linus Torvalds 2020-12-09 20:32 ` Matthew Wilcox 2020-12-09 20:32 ` Matthew Wilcox 2020-12-09 21:04 ` Linus Torvalds 2020-12-09 21:04 ` Linus Torvalds 2020-12-09 21:04 ` Linus Torvalds 2020-12-10 15:08 ` Kirill A. Shutemov 2020-12-10 15:08 ` Kirill A. Shutemov 2020-12-10 17:23 ` Linus Torvalds 2020-12-10 17:23 ` Linus Torvalds 2020-12-10 17:23 ` Linus Torvalds 2020-12-14 16:07 ` Kirill A. Shutemov 2020-12-14 16:07 ` Kirill A. Shutemov 2020-12-14 17:54 ` Linus Torvalds 2020-12-14 17:54 ` Linus Torvalds 2020-12-14 17:54 ` Linus Torvalds 2020-12-14 18:56 ` Matthew Wilcox 2020-12-14 18:56 ` Matthew Wilcox 2020-12-16 17:07 ` Kirill A. Shutemov 2020-12-16 17:07 ` Kirill A. Shutemov 2020-12-16 18:41 ` Linus Torvalds 2020-12-16 18:41 ` Linus Torvalds 2020-12-16 18:41 ` Linus Torvalds 2020-12-17 10:54 ` Kirill A. Shutemov 2020-12-17 10:54 ` Kirill A. Shutemov 2020-12-17 18:22 ` Linus Torvalds 2020-12-17 18:22 ` Linus Torvalds 2020-12-17 18:22 ` Linus Torvalds 2020-12-18 11:04 ` Kirill A. Shutemov 2020-12-18 11:04 ` Kirill A. Shutemov 2020-12-18 18:56 ` Linus Torvalds 2020-12-18 18:56 ` Linus Torvalds 2020-12-18 18:56 ` Linus Torvalds 2020-12-19 12:41 ` Kirill A. Shutemov 2020-12-19 12:41 ` Kirill A. Shutemov 2020-12-19 20:08 ` Linus Torvalds 2020-12-19 20:08 ` Linus Torvalds 2020-12-19 20:08 ` Linus Torvalds 2020-12-19 20:34 ` Linus Torvalds 2020-12-19 20:34 ` Linus Torvalds 2020-12-19 20:34 ` Linus Torvalds 2020-12-22 10:00 ` Kirill A. Shutemov 2020-12-22 10:00 ` Kirill A. Shutemov 2020-12-24 4:04 ` Hugh Dickins 2020-12-24 4:04 ` Hugh Dickins 2020-12-24 4:04 ` Hugh Dickins 2020-12-25 11:31 ` Kirill A. Shutemov 2020-12-25 11:31 ` Kirill A. Shutemov 2020-12-26 17:57 ` Linus Torvalds 2020-12-26 17:57 ` Linus Torvalds 2020-12-26 17:57 ` Linus Torvalds 2020-12-26 20:43 ` Kirill A. Shutemov 2020-12-26 20:43 ` Kirill A. Shutemov 2020-12-26 21:03 ` Hugh Dickins 2020-12-26 21:03 ` Hugh Dickins 2020-12-26 21:03 ` Hugh Dickins 2020-12-26 21:16 ` Linus Torvalds 2020-12-26 21:16 ` Linus Torvalds 2020-12-26 21:16 ` Linus Torvalds 2020-12-26 22:40 ` Kirill A. Shutemov 2020-12-26 22:40 ` Kirill A. Shutemov 2020-12-27 0:45 ` Hugh Dickins 2020-12-27 0:45 ` Hugh Dickins 2020-12-27 0:45 ` Hugh Dickins 2020-12-27 2:38 ` Hugh Dickins 2020-12-27 2:38 ` Hugh Dickins 2020-12-27 2:38 ` Hugh Dickins 2020-12-27 19:38 ` Linus Torvalds 2020-12-27 19:38 ` Linus Torvalds 2020-12-27 19:38 ` Linus Torvalds 2020-12-27 20:32 ` Damian Tometzki 2020-12-27 20:32 ` Damian Tometzki 2020-12-27 22:35 ` Hugh Dickins 2020-12-27 22:35 ` Hugh Dickins 2020-12-27 22:35 ` Hugh Dickins 2020-12-27 23:12 ` Linus Torvalds 2020-12-27 23:12 ` Linus Torvalds 2020-12-27 23:12 ` Linus Torvalds 2020-12-27 23:40 ` Linus Torvalds 2020-12-27 23:40 ` Linus Torvalds 2020-12-27 23:40 ` Linus Torvalds 2020-12-27 23:55 ` Kirill A. Shutemov 2020-12-27 23:55 ` Kirill A. Shutemov 2020-12-27 23:48 ` Kirill A. Shutemov 2020-12-27 23:48 ` Kirill A. Shutemov 2020-12-28 1:54 ` Linus Torvalds 2020-12-28 1:54 ` Linus Torvalds 2020-12-28 1:54 ` Linus Torvalds 2020-12-28 6:43 ` Hugh Dickins 2020-12-28 6:43 ` Hugh Dickins 2020-12-28 6:43 ` Hugh Dickins 2020-12-28 12:53 ` Kirill A. Shutemov 2020-12-28 12:53 ` Kirill A. Shutemov 2020-12-28 18:47 ` Linus Torvalds [this message] 2020-12-28 18:47 ` Linus Torvalds 2020-12-28 18:47 ` Linus Torvalds 2020-12-28 21:58 ` Linus Torvalds 2020-12-28 21:58 ` Linus Torvalds 2020-12-28 21:58 ` Linus Torvalds 2020-12-29 13:28 ` Kirill A. Shutemov 2020-12-29 13:28 ` Kirill A. Shutemov 2020-12-29 15:19 ` Matthew Wilcox 2020-12-29 15:19 ` Matthew Wilcox 2020-12-29 20:52 ` Linus Torvalds 2020-12-29 20:52 ` Linus Torvalds 2020-12-29 20:52 ` Linus Torvalds 2020-12-28 22:05 ` Kirill A. Shutemov 2020-12-28 22:05 ` Kirill A. Shutemov 2020-12-28 22:12 ` Kirill A. Shutemov 2020-12-28 22:12 ` Kirill A. Shutemov 2020-12-29 4:35 ` Hugh Dickins 2020-12-29 4:35 ` Hugh Dickins 2020-12-29 4:35 ` Hugh Dickins 2020-12-28 23:28 ` Linus Torvalds 2020-12-28 23:28 ` Linus Torvalds 2020-12-28 23:28 ` Linus Torvalds 2020-12-26 21:07 ` Linus Torvalds 2020-12-26 21:07 ` Linus Torvalds 2020-12-26 21:07 ` Linus Torvalds 2020-12-26 21:41 ` Matthew Wilcox 2020-12-26 21:41 ` Matthew Wilcox 2020-12-09 16:39 ` [PATCH 2/2] arm64: mm: Implement arch_wants_old_faultaround_pte() Will Deacon 2020-12-09 16:39 ` Will Deacon 2020-12-09 18:35 ` Catalin Marinas 2020-12-09 18:35 ` Catalin Marinas 2020-12-09 18:46 ` Will Deacon 2020-12-09 18:46 ` Will Deacon
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to='CAHk-=wg4bzJ9ugrOp7DBoMjNpHechm4QWb0-HC3A_pN564RU5A@mail.gmail.com' \ --to=torvalds@linux-foundation.org \ --cc=akpm@linux-foundation.org \ --cc=catalin.marinas@arm.com \ --cc=hughd@google.com \ --cc=jack@suse.cz \ --cc=kernel-team@android.com \ --cc=kirill.shutemov@linux.intel.com \ --cc=kirill@shutemov.name \ --cc=linux-arm-kernel@lists.infradead.org \ --cc=linux-kernel@vger.kernel.org \ --cc=linux-mm@kvack.org \ --cc=minchan@kernel.org \ --cc=vinmenon@codeaurora.org \ --cc=will@kernel.org \ --cc=willy@infradead.org \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: linkBe sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.