linux-parisc.vger.kernel.org archive mirror
* [PATCH v2 00/23] arch: allow pte_offset_map[_lock]() to fail
@ 2023-06-08 19:07 Hugh Dickins
  2023-06-08 19:10 ` [PATCH v2 01/23] arm: " Hugh Dickins
                   ` (22 more replies)
  0 siblings, 23 replies; 36+ messages in thread
From: Hugh Dickins @ 2023-06-08 19:07 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Mike Kravetz, Mike Rapoport, Kirill A. Shutemov, Matthew Wilcox,
	David Hildenbrand, Suren Baghdasaryan, Qi Zheng, Peter Zijlstra,
	Russell King, Catalin Marinas, Will Deacon, Geert Uytterhoeven,
	Greg Ungerer, Michal Simek, Thomas Bogendoerfer, Helge Deller,
	John David Anglin, Aneesh Kumar K.V, Michael Ellerman,
	Alexandre Ghiti, Palmer Dabbelt, Heiko Carstens,
	Christian Borntraeger, Claudio Imbrenda, Alexander Gordeev,
	John Paul Adrian Glaubitz, David S. Miller, Chris Zankel,
	Max Filippov, x86, linux-arm-kernel, linux-ia64, linux-m68k,
	linux-mips, linux-parisc, linuxppc-dev, linux-riscv, linux-s390,
	linux-sh, sparclinux, linux-kernel, linux-mm

Here is the v2 series of patches to various architectures, based on v6.4-rc5:
preparing for v2 of changes following in mm, affecting pte_offset_map()
and pte_offset_map_lock().  There are very few differences from v1:
noted patch by patch below.

v1 was "arch: allow pte_offset_map[_lock]() to fail"
https://lore.kernel.org/linux-mm/77a5d8c-406b-7068-4f17-23b7ac53bc83@google.com/
series of 23 posted on 2023-05-09,
followed by "mm: allow pte_offset_map[_lock]() to fail"
https://lore.kernel.org/linux-mm/68a97fbe-5c1e-7ac6-72c-7b9c6290b370@google.com/
series of 31 posted on 2023-05-21,
followed by "mm: free retracted page table by RCU"
https://lore.kernel.org/linux-mm/35e983f5-7ed3-b310-d949-9ae8b130cdab@google.com/
series of 12 posted on 2023-05-28.

The first two series are "independent": neither depends
for build or correctness on the other, and the arch patches can either be
merged separately via arch trees, or be picked up by akpm; but both series
must be in before the third series is added to make the effective changes
(and that adds just a little more in arm, powerpc, s390 and sparc).

What is it all about?  Some mmap_lock avoidance i.e. latency reduction.
Initially just for the case of collapsing shmem or file pages to THPs;
but likely to be relied upon later in other contexts e.g. freeing of
empty page tables (but that's not work I'm doing).  mmap_write_lock
avoidance when collapsing to anon THPs?  Perhaps, but again that's not
work I've done: a quick attempt was not as easy as the shmem/file case.

I would much prefer not to have to make these small but wide-ranging
changes for such a niche case; but failed to find another way, and
have heard that shmem MADV_COLLAPSE's usefulness is being limited by
that mmap_write_lock it currently requires.

These changes (though of course not these exact patches, and not all
of these architectures!) have been in Google's data centre kernel for
three years now: we do rely upon them.

What are the per-arch changes about?  Generally, two things.

One: the current mmap locking may not be enough to guard against that
tricky transition between pmd entry pointing to page table, and empty
pmd entry, and pmd entry pointing to huge page: pte_offset_map() will
have to validate the pmd entry for itself, returning NULL if no page
table is there.  What to do about that varies: often the nearby error
handling indicates just to skip it; but in some cases a "goto again"
looks appropriate (and if that risks an infinite loop, then there
must have been an oops, or pfn 0 mistaken for page table, before).
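
As a rough illustration of the "just skip it" handling described above,
here is a minimal userspace sketch.  It is not kernel code: the model_*
names, the enum and the struct are all hypothetical, standing in for a
pmd entry that can be empty, point to a page table, or map a huge page.
The lookup validates the entry for itself and returns NULL when no page
table is there; the caller checks before dereferencing.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace model: none of these names are kernel APIs. */
enum model_pmd_kind { MODEL_PMD_NONE, MODEL_PMD_TABLE, MODEL_PMD_HUGE };

struct model_pmd {
	enum model_pmd_kind kind;
	unsigned long *table;	/* meaningful only for MODEL_PMD_TABLE */
};

/* Stand-in for pte_offset_map(): NULL unless the entry points to a table. */
static unsigned long *model_pte_offset_map(struct model_pmd *pmd, size_t index)
{
	if (pmd->kind != MODEL_PMD_TABLE || !pmd->table)
		return NULL;
	return &pmd->table[index];
}

/* Caller in the "just skip it" style: returns 1 if an entry was read, else 0. */
int model_read_entry(struct model_pmd *pmd, size_t index, unsigned long *out)
{
	unsigned long *pte = model_pte_offset_map(pmd, index);

	if (!pte)
		return 0;	/* no page table here: skip, do not dereference */
	*out = *pte;
	return 1;
}
```

Where the real call sites differ is only in what "skip" means locally:
return 0, break out of a loop, or take an existing error path.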

Deeper study of each site might show that most of them here in arch
code could only fail if there's corruption e.g. a transition to THP
would be surprising on an arch without HAVE_ARCH_TRANSPARENT_HUGEPAGE.
But given the likely extension to freeing empty page tables, I have
not limited this set of changes to THP; and it has been easier, and
sets a better example, if each site is given appropriate handling.

Two: pte_offset_map() will need to do an rcu_read_lock(), with the
corresponding rcu_read_unlock() in pte_unmap().  But most architectures
never supported CONFIG_HIGHPTE, so some don't always call pte_unmap()
after pte_offset_map(), or have used userspace pte_offset_map() where
pte_offset_kernel() is more correct.  No problem in the current tree,
but a problem once an rcu_read_unlock() will be needed to keep balance.
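
The balance problem can be modelled in userspace with a plain counter
in place of the RCU read-side nesting.  This is only a sketch under
stated assumptions: every model_* name is hypothetical, and it assumes
that a failed map leaves nothing held, so that pte_unmap() is owed only
after a successful pte_offset_map().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace model: the counter stands in for RCU read-side nesting. */
static int model_rcu_depth;

/* Stand-in for the future pte_offset_map(): takes the "lock" only on success. */
static unsigned long *model_pte_offset_map(unsigned long *table, size_t index)
{
	if (!table)
		return NULL;		/* assumed: nothing is left held on failure */
	model_rcu_depth++;		/* models rcu_read_lock() */
	return &table[index];
}

/* Stand-in for pte_unmap(): drops what model_pte_offset_map() took. */
static void model_pte_unmap(unsigned long *pte)
{
	(void)pte;
	model_rcu_depth--;		/* models rcu_read_unlock() */
}

/* Stand-in for pte_offset_kernel(): no map, so no unmap is owed. */
static unsigned long *model_pte_offset_kernel(unsigned long *table, size_t index)
{
	return &table[index];
}

/* Balanced lookup: unmap after every successful map.  Returns the depth. */
int model_lookup_balanced(unsigned long *table, size_t index, unsigned long *out)
{
	unsigned long *pte = model_pte_offset_map(table, index);

	if (pte) {
		*out = *pte;
		model_pte_unmap(pte);
	}
	return model_rcu_depth;
}

/* Map without unmap: the depth stays raised after the call. */
int model_lookup_unbalanced(unsigned long *table, size_t index, unsigned long *out)
{
	unsigned long *pte = model_pte_offset_map(table, index);

	if (pte)
		*out = *pte;
	return model_rcu_depth;
}

/* Kernel-table lookup: never touches the depth. */
int model_lookup_kernel(unsigned long *table, size_t index, unsigned long *out)
{
	*out = *model_pte_offset_kernel(table, index);
	return model_rcu_depth;
}
```

In this model the balanced and kernel-style lookups leave the depth as
they found it, while the unbalanced one leaves it one higher on each
successful call: harmless while the counter does not exist, a leak once
it does.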

A common special case of that comes in arch/*/mm/hugetlbpage.c, if
the architecture supports hugetlb pages down at the lowest PTE level.
huge_pte_alloc() uses pte_alloc_map(), but generic hugetlb code does
no corresponding pte_unmap(); similarly for huge_pte_offset().
Thanks to Mike Kravetz and Andrew Morton, v6.4-rc1 already provides
pte_alloc_huge() and pte_offset_huge() to help fix up those cases.

This posting is based on v6.4-rc5, but good for any v6.4-rc,
current mm-everything and linux-next.

01/23 arm: allow pte_offset_map[_lock]() to fail
      v2: same as v1
02/23 arm64: allow pte_offset_map() to fail
      v2: add ack from Catalin
03/23 arm64/hugetlb: pte_alloc_huge() pte_offset_huge()
      v2: add ack from Catalin
04/23 ia64/hugetlb: pte_alloc_huge() pte_offset_huge()
      v2: same as v1
05/23 m68k: allow pte_offset_map[_lock]() to fail
      v2: same as v1
06/23 microblaze: allow pte_offset_map() to fail
      v2: same as v1
07/23 mips: update_mmu_cache() can replace __update_tlb()
      v2: same as v1
08/23 parisc: add pte_unmap() to balance get_ptep()
      v2: typo fix from Helge; stronger commit message
09/23 parisc: unmap_uncached_pte() use pte_offset_kernel()
      v2: same as v1
10/23 parisc/hugetlb: pte_alloc_huge() pte_offset_huge()
      v2: same as v1
11/23 powerpc: kvmppc_unmap_free_pmd() pte_offset_kernel()
      v2: same as v1
12/23 powerpc: allow pte_offset_map[_lock]() to fail
      v2: same as v1
13/23 powerpc/hugetlb: pte_alloc_huge()
      v2: same as v1
14/23 riscv/hugetlb: pte_alloc_huge() pte_offset_huge()
      v2: add review from Alex, ack from Palmer
15/23 s390: allow pte_offset_map_lock() to fail
      v2: add comment for Claudio
16/23 s390: gmap use pte_unmap_unlock() not spin_unlock()
      v2: add ack from Alexander
17/23 sh/hugetlb: pte_alloc_huge() pte_offset_huge()
      v2: same as v1
18/23 sparc/hugetlb: pte_alloc_huge() pte_offset_huge()
      v2: same as v1
19/23 sparc: allow pte_offset_map() to fail
      v2: same as v1
20/23 sparc: iounit and iommu use pte_offset_kernel()
      v2: same as v1
21/23 x86: Allow get_locked_pte() to fail
      v2: add WARN_ON_ONCE from PeterZ
22/23 x86: sme_populate_pgd() use pte_offset_kernel()
      v2: same as v1
23/23 xtensa: add pte_unmap() to balance pte_offset_map()
      v2: stronger commit message

 arch/arm/lib/uaccess_with_memcpy.c      |  3 ++
 arch/arm/mm/fault-armv.c                |  5 +++-
 arch/arm/mm/fault.c                     |  3 ++
 arch/arm64/mm/fault.c                   |  3 ++
 arch/arm64/mm/hugetlbpage.c             | 11 ++-----
 arch/ia64/mm/hugetlbpage.c              |  4 +--
 arch/m68k/include/asm/mmu_context.h     |  6 ++--
 arch/m68k/kernel/sys_m68k.c             |  2 ++
 arch/m68k/mm/mcfmmu.c                   | 52 +++++++++++++--------------------
 arch/microblaze/kernel/signal.c         |  5 ++--
 arch/mips/include/asm/pgtable.h         | 15 ++--------
 arch/mips/mm/tlb-r3k.c                  |  5 ++--
 arch/mips/mm/tlb-r4k.c                  |  9 ++----
 arch/parisc/kernel/cache.c              | 26 +++++++++++++----
 arch/parisc/kernel/pci-dma.c            |  2 +-
 arch/parisc/mm/hugetlbpage.c            |  4 +--
 arch/powerpc/kvm/book3s_64_mmu_radix.c  |  2 +-
 arch/powerpc/mm/book3s64/hash_tlb.c     |  4 +++
 arch/powerpc/mm/book3s64/subpage_prot.c |  2 ++
 arch/powerpc/mm/hugetlbpage.c           |  2 +-
 arch/powerpc/xmon/xmon.c                |  5 +++-
 arch/riscv/mm/hugetlbpage.c             |  4 +--
 arch/s390/kernel/uv.c                   |  2 ++
 arch/s390/mm/gmap.c                     | 31 ++++++++++++--------
 arch/s390/mm/pgtable.c                  | 12 ++++++--
 arch/sh/mm/hugetlbpage.c                |  4 +--
 arch/sparc/kernel/signal32.c            |  2 ++
 arch/sparc/mm/fault_64.c                |  3 ++
 arch/sparc/mm/hugetlbpage.c             |  4 +--
 arch/sparc/mm/io-unit.c                 |  2 +-
 arch/sparc/mm/iommu.c                   |  2 +-
 arch/sparc/mm/tlb.c                     |  2 ++
 arch/x86/kernel/ldt.c                   |  6 ++--
 arch/x86/mm/mem_encrypt_identity.c      |  2 +-
 arch/xtensa/mm/tlb.c                    |  5 +++-
 35 files changed, 146 insertions(+), 105 deletions(-)

Hugh


* [PATCH v2 01/23] arm: allow pte_offset_map[_lock]() to fail
  2023-06-08 19:07 [PATCH v2 00/23] arch: allow pte_offset_map[_lock]() to fail Hugh Dickins
@ 2023-06-08 19:10 ` Hugh Dickins
  2023-06-08 19:11 ` [PATCH v2 02/23] arm64: allow pte_offset_map() " Hugh Dickins
                   ` (21 subsequent siblings)
  22 siblings, 0 replies; 36+ messages in thread
From: Hugh Dickins @ 2023-06-08 19:10 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Mike Kravetz, Mike Rapoport, Kirill A. Shutemov, Matthew Wilcox,
	David Hildenbrand, Suren Baghdasaryan, Qi Zheng, Peter Zijlstra,
	Russell King, Catalin Marinas, Will Deacon, Geert Uytterhoeven,
	Greg Ungerer, Michal Simek, Thomas Bogendoerfer, Helge Deller,
	John David Anglin, Aneesh Kumar K.V, Michael Ellerman,
	Alexandre Ghiti, Palmer Dabbelt, Heiko Carstens,
	Christian Borntraeger, Claudio Imbrenda, Alexander Gordeev,
	John Paul Adrian Glaubitz, David S. Miller, Chris Zankel,
	Max Filippov, x86, linux-arm-kernel, linux-ia64, linux-m68k,
	linux-mips, linux-parisc, linuxppc-dev, linux-riscv, linux-s390,
	linux-sh, sparclinux, linux-kernel, linux-mm

In rare transient cases, not yet made possible, pte_offset_map() and
pte_offset_map_lock() may not find a page table: handle appropriately.

Signed-off-by: Hugh Dickins <hughd@google.com>
---
 arch/arm/lib/uaccess_with_memcpy.c | 3 +++
 arch/arm/mm/fault-armv.c           | 5 ++++-
 arch/arm/mm/fault.c                | 3 +++
 3 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/arch/arm/lib/uaccess_with_memcpy.c b/arch/arm/lib/uaccess_with_memcpy.c
index e4c2677cc1e9..2f6163f05e93 100644
--- a/arch/arm/lib/uaccess_with_memcpy.c
+++ b/arch/arm/lib/uaccess_with_memcpy.c
@@ -74,6 +74,9 @@ pin_page_for_write(const void __user *_addr, pte_t **ptep, spinlock_t **ptlp)
 		return 0;
 
 	pte = pte_offset_map_lock(current->mm, pmd, addr, &ptl);
+	if (unlikely(!pte))
+		return 0;
+
 	if (unlikely(!pte_present(*pte) || !pte_young(*pte) ||
 	    !pte_write(*pte) || !pte_dirty(*pte))) {
 		pte_unmap_unlock(pte, ptl);
diff --git a/arch/arm/mm/fault-armv.c b/arch/arm/mm/fault-armv.c
index 0e49154454a6..ca5302b0b7ee 100644
--- a/arch/arm/mm/fault-armv.c
+++ b/arch/arm/mm/fault-armv.c
@@ -117,8 +117,11 @@ static int adjust_pte(struct vm_area_struct *vma, unsigned long address,
 	 * must use the nested version.  This also means we need to
 	 * open-code the spin-locking.
 	 */
-	ptl = pte_lockptr(vma->vm_mm, pmd);
 	pte = pte_offset_map(pmd, address);
+	if (!pte)
+		return 0;
+
+	ptl = pte_lockptr(vma->vm_mm, pmd);
 	do_pte_lock(ptl);
 
 	ret = do_adjust_pte(vma, address, pfn, pte);
diff --git a/arch/arm/mm/fault.c b/arch/arm/mm/fault.c
index 2418f1efabd8..83598649a094 100644
--- a/arch/arm/mm/fault.c
+++ b/arch/arm/mm/fault.c
@@ -85,6 +85,9 @@ void show_pte(const char *lvl, struct mm_struct *mm, unsigned long addr)
 			break;
 
 		pte = pte_offset_map(pmd, addr);
+		if (!pte)
+			break;
+
 		pr_cont(", *pte=%08llx", (long long)pte_val(*pte));
 #ifndef CONFIG_ARM_LPAE
 		pr_cont(", *ppte=%08llx",
-- 
2.35.3



* [PATCH v2 02/23] arm64: allow pte_offset_map() to fail
  2023-06-08 19:07 [PATCH v2 00/23] arch: allow pte_offset_map[_lock]() to fail Hugh Dickins
  2023-06-08 19:10 ` [PATCH v2 01/23] arm: " Hugh Dickins
@ 2023-06-08 19:11 ` Hugh Dickins
  2023-06-08 19:13 ` [PATCH v2 03/23] arm64/hugetlb: pte_alloc_huge() pte_offset_huge() Hugh Dickins
                   ` (20 subsequent siblings)
  22 siblings, 0 replies; 36+ messages in thread
From: Hugh Dickins @ 2023-06-08 19:11 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Mike Kravetz, Mike Rapoport, Kirill A. Shutemov, Matthew Wilcox,
	David Hildenbrand, Suren Baghdasaryan, Qi Zheng, Peter Zijlstra,
	Russell King, Catalin Marinas, Will Deacon, Geert Uytterhoeven,
	Greg Ungerer, Michal Simek, Thomas Bogendoerfer, Helge Deller,
	John David Anglin, Aneesh Kumar K.V, Michael Ellerman,
	Alexandre Ghiti, Palmer Dabbelt, Heiko Carstens,
	Christian Borntraeger, Claudio Imbrenda, Alexander Gordeev,
	John Paul Adrian Glaubitz, David S. Miller, Chris Zankel,
	Max Filippov, x86, linux-arm-kernel, linux-ia64, linux-m68k,
	linux-mips, linux-parisc, linuxppc-dev, linux-riscv, linux-s390,
	linux-sh, sparclinux, linux-kernel, linux-mm

In rare transient cases, not yet made possible, pte_offset_map() and
pte_offset_map_lock() may not find a page table: handle appropriately.

Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
---
 arch/arm64/mm/fault.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
index cb21ccd7940d..f3aaba853547 100644
--- a/arch/arm64/mm/fault.c
+++ b/arch/arm64/mm/fault.c
@@ -177,6 +177,9 @@ static void show_pte(unsigned long addr)
 			break;
 
 		ptep = pte_offset_map(pmdp, addr);
+		if (!ptep)
+			break;
+
 		pte = READ_ONCE(*ptep);
 		pr_cont(", pte=%016llx", pte_val(pte));
 		pte_unmap(ptep);
-- 
2.35.3



* [PATCH v2 03/23] arm64/hugetlb: pte_alloc_huge() pte_offset_huge()
  2023-06-08 19:07 [PATCH v2 00/23] arch: allow pte_offset_map[_lock]() to fail Hugh Dickins
  2023-06-08 19:10 ` [PATCH v2 01/23] arm: " Hugh Dickins
  2023-06-08 19:11 ` [PATCH v2 02/23] arm64: allow pte_offset_map() " Hugh Dickins
@ 2023-06-08 19:13 ` Hugh Dickins
  2023-06-08 19:14 ` [PATCH v2 04/23] ia64/hugetlb: " Hugh Dickins
                   ` (19 subsequent siblings)
  22 siblings, 0 replies; 36+ messages in thread
From: Hugh Dickins @ 2023-06-08 19:13 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Mike Kravetz, Mike Rapoport, Kirill A. Shutemov, Matthew Wilcox,
	David Hildenbrand, Suren Baghdasaryan, Qi Zheng, Peter Zijlstra,
	Russell King, Catalin Marinas, Will Deacon, Geert Uytterhoeven,
	Greg Ungerer, Michal Simek, Thomas Bogendoerfer, Helge Deller,
	John David Anglin, Aneesh Kumar K.V, Michael Ellerman,
	Alexandre Ghiti, Palmer Dabbelt, Heiko Carstens,
	Christian Borntraeger, Claudio Imbrenda, Alexander Gordeev,
	John Paul Adrian Glaubitz, David S. Miller, Chris Zankel,
	Max Filippov, x86, linux-arm-kernel, linux-ia64, linux-m68k,
	linux-mips, linux-parisc, linuxppc-dev, linux-riscv, linux-s390,
	linux-sh, sparclinux, linux-kernel, linux-mm

pte_alloc_map() expects to be followed by pte_unmap(), but hugetlb omits
that: to keep balance in future, use the recently added pte_alloc_huge()
instead; with pte_offset_huge() a better name for pte_offset_kernel().

Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
---
 arch/arm64/mm/hugetlbpage.c | 11 ++---------
 1 file changed, 2 insertions(+), 9 deletions(-)

diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
index 95364e8bdc19..21716c940682 100644
--- a/arch/arm64/mm/hugetlbpage.c
+++ b/arch/arm64/mm/hugetlbpage.c
@@ -307,14 +307,7 @@ pte_t *huge_pte_alloc(struct mm_struct *mm, struct vm_area_struct *vma,
 			return NULL;
 
 		WARN_ON(addr & (sz - 1));
-		/*
-		 * Note that if this code were ever ported to the
-		 * 32-bit arm platform then it will cause trouble in
-		 * the case where CONFIG_HIGHPTE is set, since there
-		 * will be no pte_unmap() to correspond with this
-		 * pte_alloc_map().
-		 */
-		ptep = pte_alloc_map(mm, pmdp, addr);
+		ptep = pte_alloc_huge(mm, pmdp, addr);
 	} else if (sz == PMD_SIZE) {
 		if (want_pmd_share(vma, addr) && pud_none(READ_ONCE(*pudp)))
 			ptep = huge_pmd_share(mm, vma, addr, pudp);
@@ -366,7 +359,7 @@ pte_t *huge_pte_offset(struct mm_struct *mm,
 		return (pte_t *)pmdp;
 
 	if (sz == CONT_PTE_SIZE)
-		return pte_offset_kernel(pmdp, (addr & CONT_PTE_MASK));
+		return pte_offset_huge(pmdp, (addr & CONT_PTE_MASK));
 
 	return NULL;
 }
-- 
2.35.3



* [PATCH v2 04/23] ia64/hugetlb: pte_alloc_huge() pte_offset_huge()
  2023-06-08 19:07 [PATCH v2 00/23] arch: allow pte_offset_map[_lock]() to fail Hugh Dickins
                   ` (2 preceding siblings ...)
  2023-06-08 19:13 ` [PATCH v2 03/23] arm64/hugetlb: pte_alloc_huge() pte_offset_huge() Hugh Dickins
@ 2023-06-08 19:14 ` Hugh Dickins
  2023-06-08 19:15 ` [PATCH v2 05/23] m68k: allow pte_offset_map[_lock]() to fail Hugh Dickins
                   ` (18 subsequent siblings)
  22 siblings, 0 replies; 36+ messages in thread
From: Hugh Dickins @ 2023-06-08 19:14 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Mike Kravetz, Mike Rapoport, Kirill A. Shutemov, Matthew Wilcox,
	David Hildenbrand, Suren Baghdasaryan, Qi Zheng, Peter Zijlstra,
	Russell King, Catalin Marinas, Will Deacon, Geert Uytterhoeven,
	Greg Ungerer, Michal Simek, Thomas Bogendoerfer, Helge Deller,
	John David Anglin, Aneesh Kumar K.V, Michael Ellerman,
	Alexandre Ghiti, Palmer Dabbelt, Heiko Carstens,
	Christian Borntraeger, Claudio Imbrenda, Alexander Gordeev,
	John Paul Adrian Glaubitz, David S. Miller, Chris Zankel,
	Max Filippov, x86, linux-arm-kernel, linux-ia64, linux-m68k,
	linux-mips, linux-parisc, linuxppc-dev, linux-riscv, linux-s390,
	linux-sh, sparclinux, linux-kernel, linux-mm

pte_alloc_map() expects to be followed by pte_unmap(), but hugetlb omits
that: to keep balance in future, use the recently added pte_alloc_huge()
instead; with pte_offset_huge() a better name for pte_offset_kernel().

Signed-off-by: Hugh Dickins <hughd@google.com>
---
 arch/ia64/mm/hugetlbpage.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/ia64/mm/hugetlbpage.c b/arch/ia64/mm/hugetlbpage.c
index 78a02e026164..adc49f2d22e8 100644
--- a/arch/ia64/mm/hugetlbpage.c
+++ b/arch/ia64/mm/hugetlbpage.c
@@ -41,7 +41,7 @@ huge_pte_alloc(struct mm_struct *mm, struct vm_area_struct *vma,
 	if (pud) {
 		pmd = pmd_alloc(mm, pud, taddr);
 		if (pmd)
-			pte = pte_alloc_map(mm, pmd, taddr);
+			pte = pte_alloc_huge(mm, pmd, taddr);
 	}
 	return pte;
 }
@@ -64,7 +64,7 @@ huge_pte_offset (struct mm_struct *mm, unsigned long addr, unsigned long sz)
 			if (pud_present(*pud)) {
 				pmd = pmd_offset(pud, taddr);
 				if (pmd_present(*pmd))
-					pte = pte_offset_map(pmd, taddr);
+					pte = pte_offset_huge(pmd, taddr);
 			}
 		}
 	}
-- 
2.35.3



* [PATCH v2 05/23] m68k: allow pte_offset_map[_lock]() to fail
  2023-06-08 19:07 [PATCH v2 00/23] arch: allow pte_offset_map[_lock]() to fail Hugh Dickins
                   ` (3 preceding siblings ...)
  2023-06-08 19:14 ` [PATCH v2 04/23] ia64/hugetlb: " Hugh Dickins
@ 2023-06-08 19:15 ` Hugh Dickins
  2023-06-08 19:16 ` [PATCH v2 06/23] microblaze: allow pte_offset_map() " Hugh Dickins
                   ` (17 subsequent siblings)
  22 siblings, 0 replies; 36+ messages in thread
From: Hugh Dickins @ 2023-06-08 19:15 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Mike Kravetz, Mike Rapoport, Kirill A. Shutemov, Matthew Wilcox,
	David Hildenbrand, Suren Baghdasaryan, Qi Zheng, Peter Zijlstra,
	Russell King, Catalin Marinas, Will Deacon, Geert Uytterhoeven,
	Greg Ungerer, Michal Simek, Thomas Bogendoerfer, Helge Deller,
	John David Anglin, Aneesh Kumar K.V, Michael Ellerman,
	Alexandre Ghiti, Palmer Dabbelt, Heiko Carstens,
	Christian Borntraeger, Claudio Imbrenda, Alexander Gordeev,
	John Paul Adrian Glaubitz, David S. Miller, Chris Zankel,
	Max Filippov, x86, linux-arm-kernel, linux-ia64, linux-m68k,
	linux-mips, linux-parisc, linuxppc-dev, linux-riscv, linux-s390,
	linux-sh, sparclinux, linux-kernel, linux-mm

In rare transient cases, not yet made possible, pte_offset_map() and
pte_offset_map_lock() may not find a page table: handle appropriately.

Restructure cf_tlb_miss() with a pte_unmap() (previously omitted)
at label out, followed by one local_irq_restore() for all.

Signed-off-by: Hugh Dickins <hughd@google.com>
---
 arch/m68k/include/asm/mmu_context.h |  6 ++--
 arch/m68k/kernel/sys_m68k.c         |  2 ++
 arch/m68k/mm/mcfmmu.c               | 52 ++++++++++++-----------------
 3 files changed, 27 insertions(+), 33 deletions(-)

diff --git a/arch/m68k/include/asm/mmu_context.h b/arch/m68k/include/asm/mmu_context.h
index 8ed6ac14d99f..141bbdfad960 100644
--- a/arch/m68k/include/asm/mmu_context.h
+++ b/arch/m68k/include/asm/mmu_context.h
@@ -99,7 +99,7 @@ static inline void load_ksp_mmu(struct task_struct *task)
 	p4d_t *p4d;
 	pud_t *pud;
 	pmd_t *pmd;
-	pte_t *pte;
+	pte_t *pte = NULL;
 	unsigned long mmuar;
 
 	local_irq_save(flags);
@@ -139,7 +139,7 @@ static inline void load_ksp_mmu(struct task_struct *task)
 
 	pte = (mmuar >= PAGE_OFFSET) ? pte_offset_kernel(pmd, mmuar)
 				     : pte_offset_map(pmd, mmuar);
-	if (pte_none(*pte) || !pte_present(*pte))
+	if (!pte || pte_none(*pte) || !pte_present(*pte))
 		goto bug;
 
 	set_pte(pte, pte_mkyoung(*pte));
@@ -161,6 +161,8 @@ static inline void load_ksp_mmu(struct task_struct *task)
 bug:
 	pr_info("ksp load failed: mm=0x%p ksp=0x08%lx\n", mm, mmuar);
 end:
+	if (pte && mmuar < PAGE_OFFSET)
+		pte_unmap(pte);
 	local_irq_restore(flags);
 }
 
diff --git a/arch/m68k/kernel/sys_m68k.c b/arch/m68k/kernel/sys_m68k.c
index bd0274c7592e..c586034d2a7a 100644
--- a/arch/m68k/kernel/sys_m68k.c
+++ b/arch/m68k/kernel/sys_m68k.c
@@ -488,6 +488,8 @@ sys_atomic_cmpxchg_32(unsigned long newval, int oldval, int d3, int d4, int d5,
 		if (!pmd_present(*pmd))
 			goto bad_access;
 		pte = pte_offset_map_lock(mm, pmd, (unsigned long)mem, &ptl);
+		if (!pte)
+			goto bad_access;
 		if (!pte_present(*pte) || !pte_dirty(*pte)
 		    || !pte_write(*pte)) {
 			pte_unmap_unlock(pte, ptl);
diff --git a/arch/m68k/mm/mcfmmu.c b/arch/m68k/mm/mcfmmu.c
index 70aa0979e027..42f45abea37a 100644
--- a/arch/m68k/mm/mcfmmu.c
+++ b/arch/m68k/mm/mcfmmu.c
@@ -91,7 +91,8 @@ int cf_tlb_miss(struct pt_regs *regs, int write, int dtlb, int extension_word)
 	p4d_t *p4d;
 	pud_t *pud;
 	pmd_t *pmd;
-	pte_t *pte;
+	pte_t *pte = NULL;
+	int ret = -1;
 	int asid;
 
 	local_irq_save(flags);
@@ -100,47 +101,33 @@ int cf_tlb_miss(struct pt_regs *regs, int write, int dtlb, int extension_word)
 		regs->pc + (extension_word * sizeof(long));
 
 	mm = (!user_mode(regs) && KMAPAREA(mmuar)) ? &init_mm : current->mm;
-	if (!mm) {
-		local_irq_restore(flags);
-		return -1;
-	}
+	if (!mm)
+		goto out;
 
 	pgd = pgd_offset(mm, mmuar);
-	if (pgd_none(*pgd))  {
-		local_irq_restore(flags);
-		return -1;
-	}
+	if (pgd_none(*pgd))
+		goto out;
 
 	p4d = p4d_offset(pgd, mmuar);
-	if (p4d_none(*p4d)) {
-		local_irq_restore(flags);
-		return -1;
-	}
+	if (p4d_none(*p4d))
+		goto out;
 
 	pud = pud_offset(p4d, mmuar);
-	if (pud_none(*pud)) {
-		local_irq_restore(flags);
-		return -1;
-	}
+	if (pud_none(*pud))
+		goto out;
 
 	pmd = pmd_offset(pud, mmuar);
-	if (pmd_none(*pmd)) {
-		local_irq_restore(flags);
-		return -1;
-	}
+	if (pmd_none(*pmd))
+		goto out;
 
 	pte = (KMAPAREA(mmuar)) ? pte_offset_kernel(pmd, mmuar)
 				: pte_offset_map(pmd, mmuar);
-	if (pte_none(*pte) || !pte_present(*pte)) {
-		local_irq_restore(flags);
-		return -1;
-	}
+	if (!pte || pte_none(*pte) || !pte_present(*pte))
+		goto out;
 
 	if (write) {
-		if (!pte_write(*pte)) {
-			local_irq_restore(flags);
-			return -1;
-		}
+		if (!pte_write(*pte))
+			goto out;
 		set_pte(pte, pte_mkdirty(*pte));
 	}
 
@@ -161,9 +148,12 @@ int cf_tlb_miss(struct pt_regs *regs, int write, int dtlb, int extension_word)
 		mmu_write(MMUOR, MMUOR_ACC | MMUOR_UAA);
 	else
 		mmu_write(MMUOR, MMUOR_ITLB | MMUOR_ACC | MMUOR_UAA);
-
+	ret = 0;
+out:
+	if (pte && !KMAPAREA(mmuar))
+		pte_unmap(pte);
 	local_irq_restore(flags);
-	return 0;
+	return ret;
 }
 
 void __init cf_bootmem_alloc(void)
-- 
2.35.3



* [PATCH v2 06/23] microblaze: allow pte_offset_map() to fail
  2023-06-08 19:07 [PATCH v2 00/23] arch: allow pte_offset_map[_lock]() to fail Hugh Dickins
                   ` (4 preceding siblings ...)
  2023-06-08 19:15 ` [PATCH v2 05/23] m68k: allow pte_offset_map[_lock]() to fail Hugh Dickins
@ 2023-06-08 19:16 ` Hugh Dickins
  2023-06-08 19:17 ` [PATCH v2 07/23] mips: update_mmu_cache() can replace __update_tlb() Hugh Dickins
                   ` (16 subsequent siblings)
  22 siblings, 0 replies; 36+ messages in thread
From: Hugh Dickins @ 2023-06-08 19:16 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Mike Kravetz, Mike Rapoport, Kirill A. Shutemov, Matthew Wilcox,
	David Hildenbrand, Suren Baghdasaryan, Qi Zheng, Peter Zijlstra,
	Russell King, Catalin Marinas, Will Deacon, Geert Uytterhoeven,
	Greg Ungerer, Michal Simek, Thomas Bogendoerfer, Helge Deller,
	John David Anglin, Aneesh Kumar K.V, Michael Ellerman,
	Alexandre Ghiti, Palmer Dabbelt, Heiko Carstens,
	Christian Borntraeger, Claudio Imbrenda, Alexander Gordeev,
	John Paul Adrian Glaubitz, David S. Miller, Chris Zankel,
	Max Filippov, x86, linux-arm-kernel, linux-ia64, linux-m68k,
	linux-mips, linux-parisc, linuxppc-dev, linux-riscv, linux-s390,
	linux-sh, sparclinux, linux-kernel, linux-mm

In rare transient cases, not yet made possible, pte_offset_map() and
pte_offset_map_lock() may not find a page table: handle appropriately.

Signed-off-by: Hugh Dickins <hughd@google.com>
---
 arch/microblaze/kernel/signal.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/microblaze/kernel/signal.c b/arch/microblaze/kernel/signal.c
index c3aebec71c0c..c78a0ff48066 100644
--- a/arch/microblaze/kernel/signal.c
+++ b/arch/microblaze/kernel/signal.c
@@ -194,7 +194,7 @@ static int setup_rt_frame(struct ksignal *ksig, sigset_t *set,
 
 	preempt_disable();
 	ptep = pte_offset_map(pmdp, address);
-	if (pte_present(*ptep)) {
+	if (ptep && pte_present(*ptep)) {
 		address = (unsigned long) page_address(pte_page(*ptep));
 		/* MS: I need add offset in page */
 		address += ((unsigned long)frame->tramp) & ~PAGE_MASK;
@@ -203,7 +203,8 @@ static int setup_rt_frame(struct ksignal *ksig, sigset_t *set,
 		invalidate_icache_range(address, address + 8);
 		flush_dcache_range(address, address + 8);
 	}
-	pte_unmap(ptep);
+	if (ptep)
+		pte_unmap(ptep);
 	preempt_enable();
 	if (err)
 		return -EFAULT;
-- 
2.35.3



* [PATCH v2 07/23] mips: update_mmu_cache() can replace __update_tlb()
  2023-06-08 19:07 [PATCH v2 00/23] arch: allow pte_offset_map[_lock]() to fail Hugh Dickins
                   ` (5 preceding siblings ...)
  2023-06-08 19:16 ` [PATCH v2 06/23] microblaze: allow pte_offset_map() " Hugh Dickins
@ 2023-06-08 19:17 ` Hugh Dickins
  2023-06-09  8:08   ` [PATCH v2 07/23 fix] mips: update_mmu_cache() can replace __update_tlb(): fix Hugh Dickins
  2023-06-14 23:17   ` [PATCH v2 07/23] mips: update_mmu_cache() can replace __update_tlb() Nathan Chancellor
  2023-06-08 19:18 ` [PATCH v2 08/23] parisc: add pte_unmap() to balance get_ptep() Hugh Dickins
                   ` (15 subsequent siblings)
  22 siblings, 2 replies; 36+ messages in thread
From: Hugh Dickins @ 2023-06-08 19:17 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Mike Kravetz, Mike Rapoport, Kirill A. Shutemov, Matthew Wilcox,
	David Hildenbrand, Suren Baghdasaryan, Qi Zheng, Peter Zijlstra,
	Russell King, Catalin Marinas, Will Deacon, Geert Uytterhoeven,
	Greg Ungerer, Michal Simek, Thomas Bogendoerfer, Helge Deller,
	John David Anglin, Aneesh Kumar K.V, Michael Ellerman,
	Alexandre Ghiti, Palmer Dabbelt, Heiko Carstens,
	Christian Borntraeger, Claudio Imbrenda, Alexander Gordeev,
	John Paul Adrian Glaubitz, David S. Miller, Chris Zankel,
	Max Filippov, x86, linux-arm-kernel, linux-ia64, linux-m68k,
	linux-mips, linux-parisc, linuxppc-dev, linux-riscv, linux-s390,
	linux-sh, sparclinux, linux-kernel, linux-mm

Don't make update_mmu_cache() a wrapper around __update_tlb(): call it
directly, and use the ptep (or pmdp) provided by the caller, instead of
re-calling pte_offset_map() - which would raise a question of whether a
pte_unmap() is needed to balance it.

Check whether the "ptep" provided by the caller is actually the pmdp,
instead of testing pmd_huge(): or test pmd_huge() too and warn if it
disagrees?  This is "hazardous" territory: needs review and testing.

Signed-off-by: Hugh Dickins <hughd@google.com>
---
 arch/mips/include/asm/pgtable.h | 15 +++------------
 arch/mips/mm/tlb-r3k.c          |  5 +++--
 arch/mips/mm/tlb-r4k.c          |  9 +++------
 3 files changed, 9 insertions(+), 20 deletions(-)

diff --git a/arch/mips/include/asm/pgtable.h b/arch/mips/include/asm/pgtable.h
index 574fa14ac8b2..9175dfab08d5 100644
--- a/arch/mips/include/asm/pgtable.h
+++ b/arch/mips/include/asm/pgtable.h
@@ -565,15 +565,8 @@ static inline pte_t pte_swp_clear_exclusive(pte_t pte)
 }
 #endif
 
-extern void __update_tlb(struct vm_area_struct *vma, unsigned long address,
-	pte_t pte);
-
-static inline void update_mmu_cache(struct vm_area_struct *vma,
-	unsigned long address, pte_t *ptep)
-{
-	pte_t pte = *ptep;
-	__update_tlb(vma, address, pte);
-}
+extern void update_mmu_cache(struct vm_area_struct *vma,
+	unsigned long address, pte_t *ptep);
 
 #define	__HAVE_ARCH_UPDATE_MMU_TLB
 #define update_mmu_tlb	update_mmu_cache
@@ -581,9 +574,7 @@ static inline void update_mmu_cache(struct vm_area_struct *vma,
 static inline void update_mmu_cache_pmd(struct vm_area_struct *vma,
 	unsigned long address, pmd_t *pmdp)
 {
-	pte_t pte = *(pte_t *)pmdp;
-
-	__update_tlb(vma, address, pte);
+	update_mmu_cache(vma, address, (pte_t *)pmdp);
 }
 
 /*
diff --git a/arch/mips/mm/tlb-r3k.c b/arch/mips/mm/tlb-r3k.c
index 53dfa2b9316b..e5722cd8dd6d 100644
--- a/arch/mips/mm/tlb-r3k.c
+++ b/arch/mips/mm/tlb-r3k.c
@@ -176,7 +176,8 @@ void local_flush_tlb_page(struct vm_area_struct *vma, unsigned long page)
 	}
 }
 
-void __update_tlb(struct vm_area_struct *vma, unsigned long address, pte_t pte)
+void update_mmu_cache(struct vm_area_struct *vma,
+		      unsigned long address, pte_t *ptep)
 {
 	unsigned long asid_mask = cpu_asid_mask(&current_cpu_data);
 	unsigned long flags;
@@ -203,7 +204,7 @@ void __update_tlb(struct vm_area_struct *vma, unsigned long address, pte_t pte)
 	BARRIER;
 	tlb_probe();
 	idx = read_c0_index();
-	write_c0_entrylo0(pte_val(pte));
+	write_c0_entrylo0(pte_val(*ptep));
 	write_c0_entryhi(address | pid);
 	if (idx < 0) {					/* BARRIER */
 		tlb_write_random();
diff --git a/arch/mips/mm/tlb-r4k.c b/arch/mips/mm/tlb-r4k.c
index 1b939abbe4ca..c96725d17cab 100644
--- a/arch/mips/mm/tlb-r4k.c
+++ b/arch/mips/mm/tlb-r4k.c
@@ -290,14 +290,14 @@ void local_flush_tlb_one(unsigned long page)
  * updates the TLB with the new pte(s), and another which also checks
  * for the R4k "end of page" hardware bug and does the needy.
  */
-void __update_tlb(struct vm_area_struct * vma, unsigned long address, pte_t pte)
+void update_mmu_cache(struct vm_area_struct *vma,
+		      unsigned long address, pte_t *ptep)
 {
 	unsigned long flags;
 	pgd_t *pgdp;
 	p4d_t *p4dp;
 	pud_t *pudp;
 	pmd_t *pmdp;
-	pte_t *ptep;
 	int idx, pid;
 
 	/*
@@ -326,10 +326,9 @@ void __update_tlb(struct vm_area_struct * vma, unsigned long address, pte_t pte)
 	idx = read_c0_index();
 #ifdef CONFIG_MIPS_HUGE_TLB_SUPPORT
 	/* this could be a huge page  */
-	if (pmd_huge(*pmdp)) {
+	if (ptep == (pte_t *)pmdp) {
 		unsigned long lo;
 		write_c0_pagemask(PM_HUGE_MASK);
-		ptep = (pte_t *)pmdp;
 		lo = pte_to_entrylo(pte_val(*ptep));
 		write_c0_entrylo0(lo);
 		write_c0_entrylo1(lo + (HPAGE_SIZE >> 7));
@@ -344,8 +343,6 @@ void __update_tlb(struct vm_area_struct * vma, unsigned long address, pte_t pte)
 	} else
 #endif
 	{
-		ptep = pte_offset_map(pmdp, address);
-
 #if defined(CONFIG_PHYS_ADDR_T_64BIT) && defined(CONFIG_CPU_MIPS32)
 #ifdef CONFIG_XPA
 		write_c0_entrylo0(pte_to_entrylo(ptep->pte_high));
-- 
2.35.3



* [PATCH v2 08/23] parisc: add pte_unmap() to balance get_ptep()
  2023-06-08 19:07 [PATCH v2 00/23] arch: allow pte_offset_map[_lock]() to fail Hugh Dickins
                   ` (6 preceding siblings ...)
  2023-06-08 19:17 ` [PATCH v2 07/23] mips: update_mmu_cache() can replace __update_tlb() Hugh Dickins
@ 2023-06-08 19:18 ` Hugh Dickins
  2023-06-19  3:55   ` Helge Deller
  2023-06-08 19:20 ` [PATCH v2 09/23] parisc: unmap_uncached_pte() use pte_offset_kernel() Hugh Dickins
                   ` (14 subsequent siblings)
  22 siblings, 1 reply; 36+ messages in thread
From: Hugh Dickins @ 2023-06-08 19:18 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Mike Kravetz, Mike Rapoport, Kirill A. Shutemov, Matthew Wilcox,
	David Hildenbrand, Suren Baghdasaryan, Qi Zheng, Peter Zijlstra,
	Russell King, Catalin Marinas, Will Deacon, Geert Uytterhoeven,
	Greg Ungerer, Michal Simek, Thomas Bogendoerfer, Helge Deller,
	John David Anglin, Aneesh Kumar K.V, Michael Ellerman,
	Alexandre Ghiti, Palmer Dabbelt, Heiko Carstens,
	Christian Borntraeger, Claudio Imbrenda, Alexander Gordeev,
	John Paul Adrian Glaubitz, David S. Miller, Chris Zankel,
	Max Filippov, x86, linux-arm-kernel, linux-ia64, linux-m68k,
	linux-mips, linux-parisc, linuxppc-dev, linux-riscv, linux-s390,
	linux-sh, sparclinux, linux-kernel, linux-mm

To keep balance in future, remember to pte_unmap() after a successful
get_ptep().  And act as if flush_cache_pages() really needs a map there,
to read the pfn before "unmapping", to be sure page table is not removed.

Signed-off-by: Hugh Dickins <hughd@google.com>
---
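A minimal userspace sketch of the pattern the commit message describes:
copy everything needed out of the entry while it is still mapped, unmap,
and only then act on the copies.  All model_* names, MODEL_PRESENT and the
12-bit pfn shift are hypothetical stand-ins chosen for illustration; they
are not the parisc definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; none of these are kernel definitions. */
typedef struct { unsigned long val; } model_pte_t;

#define MODEL_PRESENT 0x1UL

static int model_map_depth;	/* outstanding maps; must return to zero */

/* Stand-in for get_ptep(): NULL when no page table is found. */
static model_pte_t *model_get_ptep(model_pte_t *table, size_t nr, size_t idx)
{
	if (!table || idx >= nr)
		return NULL;
	model_map_depth++;
	return &table[idx];
}

/* Stand-in for pte_unmap(): balances one successful model_get_ptep(). */
static void model_pte_unmap(model_pte_t *ptep)
{
	(void)ptep;
	assert(model_map_depth > 0);
	model_map_depth--;
}

/*
 * Snapshot the flush decision and the pfn while mapped, then unmap.
 * *pfn is written only when an entry was found.
 */
static bool model_needs_flush(model_pte_t *table, size_t nr, size_t idx,
			      unsigned long *pfn)
{
	bool needs_flush = false;
	model_pte_t *ptep = model_get_ptep(table, nr, idx);

	if (ptep) {
		needs_flush = (ptep->val & MODEL_PRESENT) != 0;
		*pfn = ptep->val >> 12;
		model_pte_unmap(ptep);
	}
	return needs_flush;
}
```

The caller never dereferences the entry after the unmap, which is why the
pfn read moves up next to the needs_flush read in flush_cache_pages().
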
 arch/parisc/kernel/cache.c | 26 +++++++++++++++++++++-----
 1 file changed, 21 insertions(+), 5 deletions(-)

diff --git a/arch/parisc/kernel/cache.c b/arch/parisc/kernel/cache.c
index ca4a302d4365..501160250bb7 100644
--- a/arch/parisc/kernel/cache.c
+++ b/arch/parisc/kernel/cache.c
@@ -426,10 +426,15 @@ void flush_dcache_page(struct page *page)
 		offset = (pgoff - mpnt->vm_pgoff) << PAGE_SHIFT;
 		addr = mpnt->vm_start + offset;
 		if (parisc_requires_coherency()) {
+			bool needs_flush = false;
 			pte_t *ptep;
 
 			ptep = get_ptep(mpnt->vm_mm, addr);
-			if (ptep && pte_needs_flush(*ptep))
+			if (ptep) {
+				needs_flush = pte_needs_flush(*ptep);
+				pte_unmap(ptep);
+			}
+			if (needs_flush)
 				flush_user_cache_page(mpnt, addr);
 		} else {
 			/*
@@ -561,14 +566,20 @@ EXPORT_SYMBOL(flush_kernel_dcache_page_addr);
 static void flush_cache_page_if_present(struct vm_area_struct *vma,
 	unsigned long vmaddr, unsigned long pfn)
 {
-	pte_t *ptep = get_ptep(vma->vm_mm, vmaddr);
+	bool needs_flush = false;
+	pte_t *ptep;
 
 	/*
 	 * The pte check is racy and sometimes the flush will trigger
 	 * a non-access TLB miss. Hopefully, the page has already been
 	 * flushed.
 	 */
-	if (ptep && pte_needs_flush(*ptep))
+	ptep = get_ptep(vma->vm_mm, vmaddr);
+	if (ptep) {
+		needs_flush = pte_needs_flush(*ptep);
+		pte_unmap(ptep);
+	}
+	if (needs_flush)
 		flush_cache_page(vma, vmaddr, pfn);
 }
 
@@ -635,17 +646,22 @@ static void flush_cache_pages(struct vm_area_struct *vma, unsigned long start, u
 	pte_t *ptep;
 
 	for (addr = start; addr < end; addr += PAGE_SIZE) {
+		bool needs_flush = false;
 		/*
 		 * The vma can contain pages that aren't present. Although
 		 * the pte search is expensive, we need the pte to find the
 		 * page pfn and to check whether the page should be flushed.
 		 */
 		ptep = get_ptep(vma->vm_mm, addr);
-		if (ptep && pte_needs_flush(*ptep)) {
+		if (ptep) {
+			needs_flush = pte_needs_flush(*ptep);
+			pfn = pte_pfn(*ptep);
+			pte_unmap(ptep);
+		}
+		if (needs_flush) {
 			if (parisc_requires_coherency()) {
 				flush_user_cache_page(vma, addr);
 			} else {
-				pfn = pte_pfn(*ptep);
 				if (WARN_ON(!pfn_valid(pfn)))
 					return;
 				__flush_cache_page(vma, addr, PFN_PHYS(pfn));
-- 
2.35.3



* [PATCH v2 09/23] parisc: unmap_uncached_pte() use pte_offset_kernel()
  2023-06-08 19:07 [PATCH v2 00/23] arch: allow pte_offset_map[_lock]() to fail Hugh Dickins
                   ` (7 preceding siblings ...)
  2023-06-08 19:18 ` [PATCH v2 08/23] parisc: add pte_unmap() to balance get_ptep() Hugh Dickins
@ 2023-06-08 19:20 ` Hugh Dickins
  2023-06-08 19:21 ` [PATCH v2 10/23] parisc/hugetlb: pte_alloc_huge() pte_offset_huge() Hugh Dickins
                   ` (13 subsequent siblings)
  22 siblings, 0 replies; 36+ messages in thread
From: Hugh Dickins @ 2023-06-08 19:20 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Mike Kravetz, Mike Rapoport, Kirill A. Shutemov, Matthew Wilcox,
	David Hildenbrand, Suren Baghdasaryan, Qi Zheng, Peter Zijlstra,
	Russell King, Catalin Marinas, Will Deacon, Geert Uytterhoeven,
	Greg Ungerer, Michal Simek, Thomas Bogendoerfer, Helge Deller,
	John David Anglin, Aneesh Kumar K.V, Michael Ellerman,
	Alexandre Ghiti, Palmer Dabbelt, Heiko Carstens,
	Christian Borntraeger, Claudio Imbrenda, Alexander Gordeev,
	John Paul Adrian Glaubitz, David S. Miller, Chris Zankel,
	Max Filippov, x86, linux-arm-kernel, linux-ia64, linux-m68k,
	linux-mips, linux-parisc, linuxppc-dev, linux-riscv, linux-s390,
	linux-sh, sparclinux, linux-kernel, linux-mm

unmap_uncached_pte() is working from pgd_offset_k(vaddr), so it should
use pte_offset_kernel() instead of pte_offset_map(), to avoid the
question of whether a pte_unmap() will be needed to balance.

Signed-off-by: Hugh Dickins <hughd@google.com>
---
 arch/parisc/kernel/pci-dma.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/parisc/kernel/pci-dma.c b/arch/parisc/kernel/pci-dma.c
index 71ed5391f29d..415f12d5bab3 100644
--- a/arch/parisc/kernel/pci-dma.c
+++ b/arch/parisc/kernel/pci-dma.c
@@ -164,7 +164,7 @@ static inline void unmap_uncached_pte(pmd_t * pmd, unsigned long vaddr,
 		pmd_clear(pmd);
 		return;
 	}
-	pte = pte_offset_map(pmd, vaddr);
+	pte = pte_offset_kernel(pmd, vaddr);
 	vaddr &= ~PMD_MASK;
 	end = vaddr + size;
 	if (end > PMD_SIZE)
-- 
2.35.3



* [PATCH v2 10/23] parisc/hugetlb: pte_alloc_huge() pte_offset_huge()
  2023-06-08 19:07 [PATCH v2 00/23] arch: allow pte_offset_map[_lock]() to fail Hugh Dickins
                   ` (8 preceding siblings ...)
  2023-06-08 19:20 ` [PATCH v2 09/23] parisc: unmap_uncached_pte() use pte_offset_kernel() Hugh Dickins
@ 2023-06-08 19:21 ` Hugh Dickins
  2023-06-08 19:22 ` [PATCH v2 11/23] powerpc: kvmppc_unmap_free_pmd() pte_offset_kernel() Hugh Dickins
                   ` (12 subsequent siblings)
  22 siblings, 0 replies; 36+ messages in thread
From: Hugh Dickins @ 2023-06-08 19:21 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Mike Kravetz, Mike Rapoport, Kirill A. Shutemov, Matthew Wilcox,
	David Hildenbrand, Suren Baghdasaryan, Qi Zheng, Peter Zijlstra,
	Russell King, Catalin Marinas, Will Deacon, Geert Uytterhoeven,
	Greg Ungerer, Michal Simek, Thomas Bogendoerfer, Helge Deller,
	John David Anglin, Aneesh Kumar K.V, Michael Ellerman,
	Alexandre Ghiti, Palmer Dabbelt, Heiko Carstens,
	Christian Borntraeger, Claudio Imbrenda, Alexander Gordeev,
	John Paul Adrian Glaubitz, David S. Miller, Chris Zankel,
	Max Filippov, x86, linux-arm-kernel, linux-ia64, linux-m68k,
	linux-mips, linux-parisc, linuxppc-dev, linux-riscv, linux-s390,
	linux-sh, sparclinux, linux-kernel, linux-mm

pte_alloc_map() expects to be followed by pte_unmap(), but hugetlb omits
that: to keep balance in future, use the recently added pte_alloc_huge()
instead; with pte_offset_huge() a better name for pte_offset_kernel().

Signed-off-by: Hugh Dickins <hughd@google.com>
---
 arch/parisc/mm/hugetlbpage.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/parisc/mm/hugetlbpage.c b/arch/parisc/mm/hugetlbpage.c
index d1d3990b83f6..a8a1a7c1e16e 100644
--- a/arch/parisc/mm/hugetlbpage.c
+++ b/arch/parisc/mm/hugetlbpage.c
@@ -66,7 +66,7 @@ pte_t *huge_pte_alloc(struct mm_struct *mm, struct vm_area_struct *vma,
 	if (pud) {
 		pmd = pmd_alloc(mm, pud, addr);
 		if (pmd)
-			pte = pte_alloc_map(mm, pmd, addr);
+			pte = pte_alloc_huge(mm, pmd, addr);
 	}
 	return pte;
 }
@@ -90,7 +90,7 @@ pte_t *huge_pte_offset(struct mm_struct *mm,
 			if (!pud_none(*pud)) {
 				pmd = pmd_offset(pud, addr);
 				if (!pmd_none(*pmd))
-					pte = pte_offset_map(pmd, addr);
+					pte = pte_offset_huge(pmd, addr);
 			}
 		}
 	}
-- 
2.35.3



* [PATCH v2 11/23] powerpc: kvmppc_unmap_free_pmd() pte_offset_kernel()
  2023-06-08 19:07 [PATCH v2 00/23] arch: allow pte_offset_map[_lock]() to fail Hugh Dickins
                   ` (9 preceding siblings ...)
  2023-06-08 19:21 ` [PATCH v2 10/23] parisc/hugetlb: pte_alloc_huge() pte_offset_huge() Hugh Dickins
@ 2023-06-08 19:22 ` Hugh Dickins
  2023-06-08 19:23 ` [PATCH v2 12/23] powerpc: allow pte_offset_map[_lock]() to fail Hugh Dickins
                   ` (11 subsequent siblings)
  22 siblings, 0 replies; 36+ messages in thread
From: Hugh Dickins @ 2023-06-08 19:22 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Mike Kravetz, Mike Rapoport, Kirill A. Shutemov, Matthew Wilcox,
	David Hildenbrand, Suren Baghdasaryan, Qi Zheng, Peter Zijlstra,
	Russell King, Catalin Marinas, Will Deacon, Geert Uytterhoeven,
	Greg Ungerer, Michal Simek, Thomas Bogendoerfer, Helge Deller,
	John David Anglin, Aneesh Kumar K.V, Michael Ellerman,
	Alexandre Ghiti, Palmer Dabbelt, Heiko Carstens,
	Christian Borntraeger, Claudio Imbrenda, Alexander Gordeev,
	John Paul Adrian Glaubitz, David S. Miller, Chris Zankel,
	Max Filippov, x86, linux-arm-kernel, linux-ia64, linux-m68k,
	linux-mips, linux-parisc, linuxppc-dev, linux-riscv, linux-s390,
	linux-sh, sparclinux, linux-kernel, linux-mm

kvmppc_unmap_free_pmd() use pte_offset_kernel(), like everywhere else
in book3s_64_mmu_radix.c: instead of pte_offset_map(), which will come
to need a pte_unmap() to balance it.

But note that this is a more complex case than most: see those -EAGAINs
in kvmppc_create_pte(), which is coping with kvmppc races between page
table and huge entry, of the kind which we are expecting to address
in pte_offset_map() - this might want to be revisited in future.

Signed-off-by: Hugh Dickins <hughd@google.com>
---
 arch/powerpc/kvm/book3s_64_mmu_radix.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/kvm/book3s_64_mmu_radix.c b/arch/powerpc/kvm/book3s_64_mmu_radix.c
index 461307b89c3a..572707858d65 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_radix.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_radix.c
@@ -509,7 +509,7 @@ static void kvmppc_unmap_free_pmd(struct kvm *kvm, pmd_t *pmd, bool full,
 		} else {
 			pte_t *pte;
 
-			pte = pte_offset_map(p, 0);
+			pte = pte_offset_kernel(p, 0);
 			kvmppc_unmap_free_pte(kvm, pte, full, lpid);
 			pmd_clear(p);
 		}
-- 
2.35.3



* [PATCH v2 12/23] powerpc: allow pte_offset_map[_lock]() to fail
  2023-06-08 19:07 [PATCH v2 00/23] arch: allow pte_offset_map[_lock]() to fail Hugh Dickins
                   ` (10 preceding siblings ...)
  2023-06-08 19:22 ` [PATCH v2 11/23] powerpc: kvmppc_unmap_free_pmd() pte_offset_kernel() Hugh Dickins
@ 2023-06-08 19:23 ` Hugh Dickins
  2023-06-08 19:24 ` [PATCH v2 13/23] powerpc/hugetlb: pte_alloc_huge() Hugh Dickins
                   ` (10 subsequent siblings)
  22 siblings, 0 replies; 36+ messages in thread
From: Hugh Dickins @ 2023-06-08 19:23 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Mike Kravetz, Mike Rapoport, Kirill A. Shutemov, Matthew Wilcox,
	David Hildenbrand, Suren Baghdasaryan, Qi Zheng, Peter Zijlstra,
	Russell King, Catalin Marinas, Will Deacon, Geert Uytterhoeven,
	Greg Ungerer, Michal Simek, Thomas Bogendoerfer, Helge Deller,
	John David Anglin, Aneesh Kumar K.V, Michael Ellerman,
	Alexandre Ghiti, Palmer Dabbelt, Heiko Carstens,
	Christian Borntraeger, Claudio Imbrenda, Alexander Gordeev,
	John Paul Adrian Glaubitz, David S. Miller, Chris Zankel,
	Max Filippov, x86, linux-arm-kernel, linux-ia64, linux-m68k,
	linux-mips, linux-parisc, linuxppc-dev, linux-riscv, linux-s390,
	linux-sh, sparclinux, linux-kernel, linux-mm

In rare transient cases, not yet made possible, pte_offset_map() and
pte_offset_map_lock() may not find a page table: handle appropriately.
Balance successful pte_offset_map() with pte_unmap() where omitted.

Signed-off-by: Hugh Dickins <hughd@google.com>
---
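A minimal userspace sketch of the shape used in
flush_hash_table_pmd_range() below: bail out through a label when the map
fails, so that the cleanup surrounding the map still runs, and unmap only
on the success path.  The model_* names, MODEL_PTRS_PER_PTE and
MODEL_HASHPTE are hypothetical stand-ins, not the powerpc definitions.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_PTRS_PER_PTE	8
#define MODEL_HASHPTE		0x1UL

static int model_lazy_depth;	/* models lazy MMU mode enter/leave nesting */
static int model_map_depth;	/* outstanding maps; must return to zero */

/* Stand-in for pte_offset_map(): NULL when no page table is found. */
static unsigned long *model_pte_offset_map(unsigned long *table)
{
	if (!table)
		return NULL;
	model_map_depth++;
	return table;
}

/* Stand-in for pte_unmap(): balances one successful map. */
static void model_pte_unmap(unsigned long *pte)
{
	(void)pte;
	assert(model_map_depth > 0);
	model_map_depth--;
}

/* Returns how many entries carried MODEL_HASHPTE, 0 if no table. */
static int model_flush_range(unsigned long *table)
{
	unsigned long *start_pte, *pte;
	int flushed = 0;

	model_lazy_depth++;
	start_pte = model_pte_offset_map(table);
	if (!start_pte)
		goto out;		/* nothing mapped, nothing to unmap */
	for (pte = start_pte; pte < start_pte + MODEL_PTRS_PER_PTE; pte++) {
		if (*pte & MODEL_HASHPTE)
			flushed++;
	}
	model_pte_unmap(start_pte);
out:
	model_lazy_depth--;		/* runs on both paths */
	return flushed;
}
```

Both counters end at zero whether or not a table was found, which is the
balance the commit message asks for.
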
 arch/powerpc/mm/book3s64/hash_tlb.c     | 4 ++++
 arch/powerpc/mm/book3s64/subpage_prot.c | 2 ++
 arch/powerpc/xmon/xmon.c                | 5 ++++-
 3 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/mm/book3s64/hash_tlb.c b/arch/powerpc/mm/book3s64/hash_tlb.c
index a64ea0a7ef96..21fcad97ae80 100644
--- a/arch/powerpc/mm/book3s64/hash_tlb.c
+++ b/arch/powerpc/mm/book3s64/hash_tlb.c
@@ -239,12 +239,16 @@ void flush_hash_table_pmd_range(struct mm_struct *mm, pmd_t *pmd, unsigned long
 	local_irq_save(flags);
 	arch_enter_lazy_mmu_mode();
 	start_pte = pte_offset_map(pmd, addr);
+	if (!start_pte)
+		goto out;
 	for (pte = start_pte; pte < start_pte + PTRS_PER_PTE; pte++) {
 		unsigned long pteval = pte_val(*pte);
 		if (pteval & H_PAGE_HASHPTE)
 			hpte_need_flush(mm, addr, pte, pteval, 0);
 		addr += PAGE_SIZE;
 	}
+	pte_unmap(start_pte);
+out:
 	arch_leave_lazy_mmu_mode();
 	local_irq_restore(flags);
 }
diff --git a/arch/powerpc/mm/book3s64/subpage_prot.c b/arch/powerpc/mm/book3s64/subpage_prot.c
index b75a9fb99599..0dc85556dec5 100644
--- a/arch/powerpc/mm/book3s64/subpage_prot.c
+++ b/arch/powerpc/mm/book3s64/subpage_prot.c
@@ -71,6 +71,8 @@ static void hpte_flush_range(struct mm_struct *mm, unsigned long addr,
 	if (pmd_none(*pmd))
 		return;
 	pte = pte_offset_map_lock(mm, pmd, addr, &ptl);
+	if (!pte)
+		return;
 	arch_enter_lazy_mmu_mode();
 	for (; npages > 0; --npages) {
 		pte_update(mm, addr, pte, 0, 0, 0);
diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c
index 70c4c59a1a8f..fae747cc57d2 100644
--- a/arch/powerpc/xmon/xmon.c
+++ b/arch/powerpc/xmon/xmon.c
@@ -3376,12 +3376,15 @@ static void show_pte(unsigned long addr)
 	printf("pmdp @ 0x%px = 0x%016lx\n", pmdp, pmd_val(*pmdp));
 
 	ptep = pte_offset_map(pmdp, addr);
-	if (pte_none(*ptep)) {
+	if (!ptep || pte_none(*ptep)) {
+		if (ptep)
+			pte_unmap(ptep);
 		printf("no valid PTE\n");
 		return;
 	}
 
 	format_pte(ptep, pte_val(*ptep));
+	pte_unmap(ptep);
 
 	sync();
 	__delay(200);
-- 
2.35.3



* [PATCH v2 13/23] powerpc/hugetlb: pte_alloc_huge()
  2023-06-08 19:07 [PATCH v2 00/23] arch: allow pte_offset_map[_lock]() to fail Hugh Dickins
                   ` (11 preceding siblings ...)
  2023-06-08 19:23 ` [PATCH v2 12/23] powerpc: allow pte_offset_map[_lock]() to fail Hugh Dickins
@ 2023-06-08 19:24 ` Hugh Dickins
  2023-06-08 19:25 ` [PATCH v2 14/23] riscv/hugetlb: pte_alloc_huge() pte_offset_huge() Hugh Dickins
                   ` (9 subsequent siblings)
  22 siblings, 0 replies; 36+ messages in thread
From: Hugh Dickins @ 2023-06-08 19:24 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Mike Kravetz, Mike Rapoport, Kirill A. Shutemov, Matthew Wilcox,
	David Hildenbrand, Suren Baghdasaryan, Qi Zheng, Peter Zijlstra,
	Russell King, Catalin Marinas, Will Deacon, Geert Uytterhoeven,
	Greg Ungerer, Michal Simek, Thomas Bogendoerfer, Helge Deller,
	John David Anglin, Aneesh Kumar K.V, Michael Ellerman,
	Alexandre Ghiti, Palmer Dabbelt, Heiko Carstens,
	Christian Borntraeger, Claudio Imbrenda, Alexander Gordeev,
	John Paul Adrian Glaubitz, David S. Miller, Chris Zankel,
	Max Filippov, x86, linux-arm-kernel, linux-ia64, linux-m68k,
	linux-mips, linux-parisc, linuxppc-dev, linux-riscv, linux-s390,
	linux-sh, sparclinux, linux-kernel, linux-mm

pte_alloc_map() expects to be followed by pte_unmap(), but hugetlb omits
that: to keep balance in future, use the recently added pte_alloc_huge()
instead.  huge_pte_offset() is using __find_linux_pte(), which is using
pte_offset_kernel() - don't rename that to _huge, it's more complicated.

Signed-off-by: Hugh Dickins <hughd@google.com>
---
 arch/powerpc/mm/hugetlbpage.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/mm/hugetlbpage.c b/arch/powerpc/mm/hugetlbpage.c
index b900933507da..f7c683b672c1 100644
--- a/arch/powerpc/mm/hugetlbpage.c
+++ b/arch/powerpc/mm/hugetlbpage.c
@@ -183,7 +183,7 @@ pte_t *huge_pte_alloc(struct mm_struct *mm, struct vm_area_struct *vma,
 		return NULL;
 
 	if (IS_ENABLED(CONFIG_PPC_8xx) && pshift < PMD_SHIFT)
-		return pte_alloc_map(mm, (pmd_t *)hpdp, addr);
+		return pte_alloc_huge(mm, (pmd_t *)hpdp, addr);
 
 	BUG_ON(!hugepd_none(*hpdp) && !hugepd_ok(*hpdp));
 
-- 
2.35.3



* [PATCH v2 14/23] riscv/hugetlb: pte_alloc_huge() pte_offset_huge()
  2023-06-08 19:07 [PATCH v2 00/23] arch: allow pte_offset_map[_lock]() to fail Hugh Dickins
                   ` (12 preceding siblings ...)
  2023-06-08 19:24 ` [PATCH v2 13/23] powerpc/hugetlb: pte_alloc_huge() Hugh Dickins
@ 2023-06-08 19:25 ` Hugh Dickins
  2023-06-08 19:27 ` [PATCH v2 15/23] s390: allow pte_offset_map_lock() to fail Hugh Dickins
                   ` (8 subsequent siblings)
  22 siblings, 0 replies; 36+ messages in thread
From: Hugh Dickins @ 2023-06-08 19:25 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Mike Kravetz, Mike Rapoport, Kirill A. Shutemov, Matthew Wilcox,
	David Hildenbrand, Suren Baghdasaryan, Qi Zheng, Peter Zijlstra,
	Russell King, Catalin Marinas, Will Deacon, Geert Uytterhoeven,
	Greg Ungerer, Michal Simek, Thomas Bogendoerfer, Helge Deller,
	John David Anglin, Aneesh Kumar K.V, Michael Ellerman,
	Alexandre Ghiti, Palmer Dabbelt, Heiko Carstens,
	Christian Borntraeger, Claudio Imbrenda, Alexander Gordeev,
	John Paul Adrian Glaubitz, David S. Miller, Chris Zankel,
	Max Filippov, x86, linux-arm-kernel, linux-ia64, linux-m68k,
	linux-mips, linux-parisc, linuxppc-dev, linux-riscv, linux-s390,
	linux-sh, sparclinux, linux-kernel, linux-mm

pte_alloc_map() expects to be followed by pte_unmap(), but hugetlb omits
that: to keep balance in future, use the recently added pte_alloc_huge()
instead; with pte_offset_huge() a better name for pte_offset_kernel().

Signed-off-by: Hugh Dickins <hughd@google.com>
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Acked-by: Palmer Dabbelt <palmer@rivosinc.com>
---
 arch/riscv/mm/hugetlbpage.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/riscv/mm/hugetlbpage.c b/arch/riscv/mm/hugetlbpage.c
index e0ef56dc57b9..542883b3b49b 100644
--- a/arch/riscv/mm/hugetlbpage.c
+++ b/arch/riscv/mm/hugetlbpage.c
@@ -67,7 +67,7 @@ pte_t *huge_pte_alloc(struct mm_struct *mm,
 
 	for_each_napot_order(order) {
 		if (napot_cont_size(order) == sz) {
-			pte = pte_alloc_map(mm, pmd, addr & napot_cont_mask(order));
+			pte = pte_alloc_huge(mm, pmd, addr & napot_cont_mask(order));
 			break;
 		}
 	}
@@ -114,7 +114,7 @@ pte_t *huge_pte_offset(struct mm_struct *mm,
 
 	for_each_napot_order(order) {
 		if (napot_cont_size(order) == sz) {
-			pte = pte_offset_kernel(pmd, addr & napot_cont_mask(order));
+			pte = pte_offset_huge(pmd, addr & napot_cont_mask(order));
 			break;
 		}
 	}
-- 
2.35.3



* [PATCH v2 15/23] s390: allow pte_offset_map_lock() to fail
  2023-06-08 19:07 [PATCH v2 00/23] arch: allow pte_offset_map[_lock]() to fail Hugh Dickins
                   ` (13 preceding siblings ...)
  2023-06-08 19:25 ` [PATCH v2 14/23] riscv/hugetlb: pte_alloc_huge() pte_offset_huge() Hugh Dickins
@ 2023-06-08 19:27 ` Hugh Dickins
  2023-06-13 11:45   ` Claudio Imbrenda
  2023-06-08 19:29 ` [PATCH v2 16/23] s390: gmap use pte_unmap_unlock() not spin_unlock() Hugh Dickins
                   ` (7 subsequent siblings)
  22 siblings, 1 reply; 36+ messages in thread
From: Hugh Dickins @ 2023-06-08 19:27 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Mike Kravetz, Mike Rapoport, Kirill A. Shutemov, Matthew Wilcox,
	David Hildenbrand, Suren Baghdasaryan, Qi Zheng, Peter Zijlstra,
	Russell King, Catalin Marinas, Will Deacon, Geert Uytterhoeven,
	Greg Ungerer, Michal Simek, Thomas Bogendoerfer, Helge Deller,
	John David Anglin, Aneesh Kumar K.V, Michael Ellerman,
	Alexandre Ghiti, Palmer Dabbelt, Heiko Carstens,
	Christian Borntraeger, Claudio Imbrenda, Alexander Gordeev,
	John Paul Adrian Glaubitz, David S. Miller, Chris Zankel,
	Max Filippov, x86, linux-arm-kernel, linux-ia64, linux-m68k,
	linux-mips, linux-parisc, linuxppc-dev, linux-riscv, linux-s390,
	linux-sh, sparclinux, linux-kernel, linux-mm

In rare transient cases, not yet made possible, pte_offset_map() and
pte_offset_map_lock() may not find a page table: handle appropriately.

Add comment on mm's contract with s390 above __zap_zero_pages(),
and fix old comment there: must be called after THP was disabled.

Signed-off-by: Hugh Dickins <hughd@google.com>
---
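A minimal userspace sketch of the "again:" retry used in pgtable.c below:
when the locked map fails, go back and re-check the pmd rather than give
up, so a transient failure is retried while a pmd that is really gone
still ends the loop.  struct model_pmd, model_map_lock(), model_set_key()
and the -EFAULT return for the not-present case are assumptions made for
illustration, not the s390 code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical model of one pmd and the single entry behind it. */
struct model_pmd {
	int present;		/* nonzero while a page table is attached */
	int transient_fails;	/* how many upcoming map attempts fail */
	unsigned long pte;
};

/* Stand-in for pte_offset_map_lock(): may fail transiently. */
static unsigned long *model_map_lock(struct model_pmd *pmd)
{
	if (pmd->transient_fails > 0) {
		pmd->transient_fails--;
		return NULL;
	}
	return &pmd->pte;
}

/* Store key in the entry, retrying while the map fails transiently. */
static int model_set_key(struct model_pmd *pmd, unsigned long key,
			 int *attempts)
{
	unsigned long *ptep;

	assert(pmd && attempts);
again:
	if (!pmd->present)
		return -EFAULT;	/* re-checked on every retry */
	(*attempts)++;
	ptep = model_map_lock(pmd);
	if (!ptep)
		goto again;
	*ptep = key;
	return 0;
}
```

__zap_zero_pages() takes the other option and breaks out of its loop
instead, relying on the contract described in the comment added above it.
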
 arch/s390/kernel/uv.c  |  2 ++
 arch/s390/mm/gmap.c    |  9 ++++++++-
 arch/s390/mm/pgtable.c | 12 +++++++++---
 3 files changed, 19 insertions(+), 4 deletions(-)

diff --git a/arch/s390/kernel/uv.c b/arch/s390/kernel/uv.c
index cb2ee06df286..3c62d1b218b1 100644
--- a/arch/s390/kernel/uv.c
+++ b/arch/s390/kernel/uv.c
@@ -294,6 +294,8 @@ int gmap_make_secure(struct gmap *gmap, unsigned long gaddr, void *uvcb)
 
 	rc = -ENXIO;
 	ptep = get_locked_pte(gmap->mm, uaddr, &ptelock);
+	if (!ptep)
+		goto out;
 	if (pte_present(*ptep) && !(pte_val(*ptep) & _PAGE_INVALID) && pte_write(*ptep)) {
 		page = pte_page(*ptep);
 		rc = -EAGAIN;
diff --git a/arch/s390/mm/gmap.c b/arch/s390/mm/gmap.c
index dc90d1eb0d55..3a2a31a15ea8 100644
--- a/arch/s390/mm/gmap.c
+++ b/arch/s390/mm/gmap.c
@@ -2537,7 +2537,12 @@ static inline void thp_split_mm(struct mm_struct *mm)
  * Remove all empty zero pages from the mapping for lazy refaulting
  * - This must be called after mm->context.has_pgste is set, to avoid
  *   future creation of zero pages
- * - This must be called after THP was enabled
+ * - This must be called after THP was disabled.
+ *
+ * mm contracts with s390, that even if mm were to remove a page table,
+ * racing with the loop below and so causing pte_offset_map_lock() to fail,
+ * it will never insert a page table containing empty zero pages once
+ * mm_forbids_zeropage(mm) i.e. mm->context.has_pgste is set.
  */
 static int __zap_zero_pages(pmd_t *pmd, unsigned long start,
 			   unsigned long end, struct mm_walk *walk)
@@ -2549,6 +2554,8 @@ static int __zap_zero_pages(pmd_t *pmd, unsigned long start,
 		spinlock_t *ptl;
 
 		ptep = pte_offset_map_lock(walk->mm, pmd, addr, &ptl);
+		if (!ptep)
+			break;
 		if (is_zero_pfn(pte_pfn(*ptep)))
 			ptep_xchg_direct(walk->mm, addr, ptep, __pte(_PAGE_INVALID));
 		pte_unmap_unlock(ptep, ptl);
diff --git a/arch/s390/mm/pgtable.c b/arch/s390/mm/pgtable.c
index 6effb24de6d9..3bd2ab2a9a34 100644
--- a/arch/s390/mm/pgtable.c
+++ b/arch/s390/mm/pgtable.c
@@ -829,7 +829,7 @@ int set_guest_storage_key(struct mm_struct *mm, unsigned long addr,
 	default:
 		return -EFAULT;
 	}
-
+again:
 	ptl = pmd_lock(mm, pmdp);
 	if (!pmd_present(*pmdp)) {
 		spin_unlock(ptl);
@@ -850,6 +850,8 @@ int set_guest_storage_key(struct mm_struct *mm, unsigned long addr,
 	spin_unlock(ptl);
 
 	ptep = pte_offset_map_lock(mm, pmdp, addr, &ptl);
+	if (!ptep)
+		goto again;
 	new = old = pgste_get_lock(ptep);
 	pgste_val(new) &= ~(PGSTE_GR_BIT | PGSTE_GC_BIT |
 			    PGSTE_ACC_BITS | PGSTE_FP_BIT);
@@ -938,7 +940,7 @@ int reset_guest_reference_bit(struct mm_struct *mm, unsigned long addr)
 	default:
 		return -EFAULT;
 	}
-
+again:
 	ptl = pmd_lock(mm, pmdp);
 	if (!pmd_present(*pmdp)) {
 		spin_unlock(ptl);
@@ -955,6 +957,8 @@ int reset_guest_reference_bit(struct mm_struct *mm, unsigned long addr)
 	spin_unlock(ptl);
 
 	ptep = pte_offset_map_lock(mm, pmdp, addr, &ptl);
+	if (!ptep)
+		goto again;
 	new = old = pgste_get_lock(ptep);
 	/* Reset guest reference bit only */
 	pgste_val(new) &= ~PGSTE_GR_BIT;
@@ -1000,7 +1004,7 @@ int get_guest_storage_key(struct mm_struct *mm, unsigned long addr,
 	default:
 		return -EFAULT;
 	}
-
+again:
 	ptl = pmd_lock(mm, pmdp);
 	if (!pmd_present(*pmdp)) {
 		spin_unlock(ptl);
@@ -1017,6 +1021,8 @@ int get_guest_storage_key(struct mm_struct *mm, unsigned long addr,
 	spin_unlock(ptl);
 
 	ptep = pte_offset_map_lock(mm, pmdp, addr, &ptl);
+	if (!ptep)
+		goto again;
 	pgste = pgste_get_lock(ptep);
 	*key = (pgste_val(pgste) & (PGSTE_ACC_BITS | PGSTE_FP_BIT)) >> 56;
 	paddr = pte_val(*ptep) & PAGE_MASK;
-- 
2.35.3



* [PATCH v2 16/23] s390: gmap use pte_unmap_unlock() not spin_unlock()
  2023-06-08 19:07 [PATCH v2 00/23] arch: allow pte_offset_map[_lock]() to fail Hugh Dickins
                   ` (14 preceding siblings ...)
  2023-06-08 19:27 ` [PATCH v2 15/23] s390: allow pte_offset_map_lock() to fail Hugh Dickins
@ 2023-06-08 19:29 ` Hugh Dickins
  2023-06-08 19:30 ` [PATCH v2 17/23] sh/hugetlb: pte_alloc_huge() pte_offset_huge() Hugh Dickins
                   ` (6 subsequent siblings)
  22 siblings, 0 replies; 36+ messages in thread
From: Hugh Dickins @ 2023-06-08 19:29 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Mike Kravetz, Mike Rapoport, Kirill A. Shutemov, Matthew Wilcox,
	David Hildenbrand, Suren Baghdasaryan, Qi Zheng, Peter Zijlstra,
	Russell King, Catalin Marinas, Will Deacon, Geert Uytterhoeven,
	Greg Ungerer, Michal Simek, Thomas Bogendoerfer, Helge Deller,
	John David Anglin, Aneesh Kumar K.V, Michael Ellerman,
	Alexandre Ghiti, Palmer Dabbelt, Heiko Carstens,
	Christian Borntraeger, Claudio Imbrenda, Alexander Gordeev,
	John Paul Adrian Glaubitz, David S. Miller, Chris Zankel,
	Max Filippov, x86, linux-arm-kernel, linux-ia64, linux-m68k,
	linux-mips, linux-parisc, linuxppc-dev, linux-riscv, linux-s390,
	linux-sh, sparclinux, linux-kernel, linux-mm

pte_alloc_map_lock() expects to be followed by pte_unmap_unlock(): to
keep balance in future, pass ptep as well as ptl to gmap_pte_op_end(),
and use pte_unmap_unlock() instead of direct spin_unlock() (even though
ptep ends up unused inside the macro).

Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Alexander Gordeev <agordeev@linux.ibm.com>
---
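A minimal userspace sketch of why the end-of-operation helper takes both
pointers: if the helper receives the entry as well as the lock, it can
release the mapping and the lock together, and no caller can drop one
without the other.  struct model_lock and the model_* functions are
hypothetical stand-ins, not the s390 gmap code.

```c
#include <assert.h>
#include <stddef.h>

struct model_lock { int held; };

static int model_map_depth;	/* outstanding maps; must return to zero */

/* Stand-in for a locked map: takes the lock and maps the entry. */
static unsigned long *model_map_lock(unsigned long *table, size_t idx,
				     struct model_lock *ptl)
{
	assert(!ptl->held);
	ptl->held = 1;
	model_map_depth++;
	return &table[idx];
}

/* Stand-in for pte_unmap_unlock(): drops mapping and lock together. */
static void model_unmap_unlock(unsigned long *ptep, struct model_lock *ptl)
{
	(void)ptep;
	assert(ptl->held && model_map_depth > 0);
	ptl->held = 0;
	model_map_depth--;
}

/* Stand-in for gmap_pte_op_end(ptep, ptl). */
static void model_op_end(unsigned long *ptep, struct model_lock *ptl)
{
	model_unmap_unlock(ptep, ptl);
}

/* Read one entry under the lock, then release through the helper. */
static unsigned long model_read_entry(unsigned long *table, size_t idx,
				      struct model_lock *ptl)
{
	unsigned long *ptep = model_map_lock(table, idx, ptl);
	unsigned long val = *ptep;

	model_op_end(ptep, ptl);
	return val;
}
```
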
 arch/s390/mm/gmap.c | 22 +++++++++++-----------
 1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/arch/s390/mm/gmap.c b/arch/s390/mm/gmap.c
index 3a2a31a15ea8..f4b6fc746fce 100644
--- a/arch/s390/mm/gmap.c
+++ b/arch/s390/mm/gmap.c
@@ -895,12 +895,12 @@ static int gmap_pte_op_fixup(struct gmap *gmap, unsigned long gaddr,
 
 /**
  * gmap_pte_op_end - release the page table lock
- * @ptl: pointer to the spinlock pointer
+ * @ptep: pointer to the locked pte
+ * @ptl: pointer to the page table spinlock
  */
-static void gmap_pte_op_end(spinlock_t *ptl)
+static void gmap_pte_op_end(pte_t *ptep, spinlock_t *ptl)
 {
-	if (ptl)
-		spin_unlock(ptl);
+	pte_unmap_unlock(ptep, ptl);
 }
 
 /**
@@ -1011,7 +1011,7 @@ static int gmap_protect_pte(struct gmap *gmap, unsigned long gaddr,
 {
 	int rc;
 	pte_t *ptep;
-	spinlock_t *ptl = NULL;
+	spinlock_t *ptl;
 	unsigned long pbits = 0;
 
 	if (pmd_val(*pmdp) & _SEGMENT_ENTRY_INVALID)
@@ -1025,7 +1025,7 @@ static int gmap_protect_pte(struct gmap *gmap, unsigned long gaddr,
 	pbits |= (bits & GMAP_NOTIFY_SHADOW) ? PGSTE_VSIE_BIT : 0;
 	/* Protect and unlock. */
 	rc = ptep_force_prot(gmap->mm, gaddr, ptep, prot, pbits);
-	gmap_pte_op_end(ptl);
+	gmap_pte_op_end(ptep, ptl);
 	return rc;
 }
 
@@ -1154,7 +1154,7 @@ int gmap_read_table(struct gmap *gmap, unsigned long gaddr, unsigned long *val)
 				/* Do *NOT* clear the _PAGE_INVALID bit! */
 				rc = 0;
 			}
-			gmap_pte_op_end(ptl);
+			gmap_pte_op_end(ptep, ptl);
 		}
 		if (!rc)
 			break;
@@ -1248,7 +1248,7 @@ static int gmap_protect_rmap(struct gmap *sg, unsigned long raddr,
 			if (!rc)
 				gmap_insert_rmap(sg, vmaddr, rmap);
 			spin_unlock(&sg->guest_table_lock);
-			gmap_pte_op_end(ptl);
+			gmap_pte_op_end(ptep, ptl);
 		}
 		radix_tree_preload_end();
 		if (rc) {
@@ -2156,7 +2156,7 @@ int gmap_shadow_page(struct gmap *sg, unsigned long saddr, pte_t pte)
 			tptep = (pte_t *) gmap_table_walk(sg, saddr, 0);
 			if (!tptep) {
 				spin_unlock(&sg->guest_table_lock);
-				gmap_pte_op_end(ptl);
+				gmap_pte_op_end(sptep, ptl);
 				radix_tree_preload_end();
 				break;
 			}
@@ -2167,7 +2167,7 @@ int gmap_shadow_page(struct gmap *sg, unsigned long saddr, pte_t pte)
 				rmap = NULL;
 				rc = 0;
 			}
-			gmap_pte_op_end(ptl);
+			gmap_pte_op_end(sptep, ptl);
 			spin_unlock(&sg->guest_table_lock);
 		}
 		radix_tree_preload_end();
@@ -2495,7 +2495,7 @@ void gmap_sync_dirty_log_pmd(struct gmap *gmap, unsigned long bitmap[4],
 				continue;
 			if (ptep_test_and_clear_uc(gmap->mm, vmaddr, ptep))
 				set_bit(i, bitmap);
-			spin_unlock(ptl);
+			pte_unmap_unlock(ptep, ptl);
 		}
 	}
 	gmap_pmd_op_end(gmap, pmdp);
-- 
2.35.3



* [PATCH v2 17/23] sh/hugetlb: pte_alloc_huge() pte_offset_huge()
  2023-06-08 19:07 [PATCH v2 00/23] arch: allow pte_offset_map[_lock]() to fail Hugh Dickins
                   ` (15 preceding siblings ...)
  2023-06-08 19:29 ` [PATCH v2 16/23] s390: gmap use pte_unmap_unlock() not spin_unlock() Hugh Dickins
@ 2023-06-08 19:30 ` Hugh Dickins
  2023-06-08 19:31 ` [PATCH v2 18/23] sparc/hugetlb: " Hugh Dickins
                   ` (5 subsequent siblings)
  22 siblings, 0 replies; 36+ messages in thread
From: Hugh Dickins @ 2023-06-08 19:30 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Mike Kravetz, Mike Rapoport, Kirill A. Shutemov, Matthew Wilcox,
	David Hildenbrand, Suren Baghdasaryan, Qi Zheng, Peter Zijlstra,
	Russell King, Catalin Marinas, Will Deacon, Geert Uytterhoeven,
	Greg Ungerer, Michal Simek, Thomas Bogendoerfer, Helge Deller,
	John David Anglin, Aneesh Kumar K.V, Michael Ellerman,
	Alexandre Ghiti, Palmer Dabbelt, Heiko Carstens,
	Christian Borntraeger, Claudio Imbrenda, Alexander Gordeev,
	John Paul Adrian Glaubitz, David S. Miller, Chris Zankel,
	Max Filippov, x86, linux-arm-kernel, linux-ia64, linux-m68k,
	linux-mips, linux-parisc, linuxppc-dev, linux-riscv, linux-s390,
	linux-sh, sparclinux, linux-kernel, linux-mm

pte_alloc_map() expects to be followed by pte_unmap(), but hugetlb omits
that: to keep balance in future, use the recently added pte_alloc_huge()
instead; with pte_offset_huge() a better name for pte_offset_kernel().

Signed-off-by: Hugh Dickins <hughd@google.com>
---
 arch/sh/mm/hugetlbpage.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/sh/mm/hugetlbpage.c b/arch/sh/mm/hugetlbpage.c
index 999ab5916e69..6cb0ad73dbb9 100644
--- a/arch/sh/mm/hugetlbpage.c
+++ b/arch/sh/mm/hugetlbpage.c
@@ -38,7 +38,7 @@ pte_t *huge_pte_alloc(struct mm_struct *mm, struct vm_area_struct *vma,
 			if (pud) {
 				pmd = pmd_alloc(mm, pud, addr);
 				if (pmd)
-					pte = pte_alloc_map(mm, pmd, addr);
+					pte = pte_alloc_huge(mm, pmd, addr);
 			}
 		}
 	}
@@ -63,7 +63,7 @@ pte_t *huge_pte_offset(struct mm_struct *mm,
 			if (pud) {
 				pmd = pmd_offset(pud, addr);
 				if (pmd)
-					pte = pte_offset_map(pmd, addr);
+					pte = pte_offset_huge(pmd, addr);
 			}
 		}
 	}
-- 
2.35.3



* [PATCH v2 18/23] sparc/hugetlb: pte_alloc_huge() pte_offset_huge()
  2023-06-08 19:07 [PATCH v2 00/23] arch: allow pte_offset_map[_lock]() to fail Hugh Dickins
                   ` (16 preceding siblings ...)
  2023-06-08 19:30 ` [PATCH v2 17/23] sh/hugetlb: pte_alloc_huge() pte_offset_huge() Hugh Dickins
@ 2023-06-08 19:31 ` Hugh Dickins
  2023-06-08 19:32 ` [PATCH v2 19/23] sparc: allow pte_offset_map() to fail Hugh Dickins
                   ` (4 subsequent siblings)
  22 siblings, 0 replies; 36+ messages in thread
From: Hugh Dickins @ 2023-06-08 19:31 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Mike Kravetz, Mike Rapoport, Kirill A. Shutemov, Matthew Wilcox,
	David Hildenbrand, Suren Baghdasaryan, Qi Zheng, Peter Zijlstra,
	Russell King, Catalin Marinas, Will Deacon, Geert Uytterhoeven,
	Greg Ungerer, Michal Simek, Thomas Bogendoerfer, Helge Deller,
	John David Anglin, Aneesh Kumar K.V, Michael Ellerman,
	Alexandre Ghiti, Palmer Dabbelt, Heiko Carstens,
	Christian Borntraeger, Claudio Imbrenda, Alexander Gordeev,
	John Paul Adrian Glaubitz, David S. Miller, Chris Zankel,
	Max Filippov, x86, linux-arm-kernel, linux-ia64, linux-m68k,
	linux-mips, linux-parisc, linuxppc-dev, linux-riscv, linux-s390,
	linux-sh, sparclinux, linux-kernel, linux-mm

pte_alloc_map() expects to be followed by pte_unmap(), but hugetlb omits
that: to keep balance in future, use the recently added pte_alloc_huge()
instead; with pte_offset_huge() a better name for pte_offset_kernel().

Signed-off-by: Hugh Dickins <hughd@google.com>
---
 arch/sparc/mm/hugetlbpage.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/sparc/mm/hugetlbpage.c b/arch/sparc/mm/hugetlbpage.c
index d8e0e3c7038d..d7018823206c 100644
--- a/arch/sparc/mm/hugetlbpage.c
+++ b/arch/sparc/mm/hugetlbpage.c
@@ -298,7 +298,7 @@ pte_t *huge_pte_alloc(struct mm_struct *mm, struct vm_area_struct *vma,
 		return NULL;
 	if (sz >= PMD_SIZE)
 		return (pte_t *)pmd;
-	return pte_alloc_map(mm, pmd, addr);
+	return pte_alloc_huge(mm, pmd, addr);
 }
 
 pte_t *huge_pte_offset(struct mm_struct *mm,
@@ -325,7 +325,7 @@ pte_t *huge_pte_offset(struct mm_struct *mm,
 		return NULL;
 	if (is_hugetlb_pmd(*pmd))
 		return (pte_t *)pmd;
-	return pte_offset_map(pmd, addr);
+	return pte_offset_huge(pmd, addr);
 }
 
 void set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH v2 19/23] sparc: allow pte_offset_map() to fail
  2023-06-08 19:07 [PATCH v2 00/23] arch: allow pte_offset_map[_lock]() to fail Hugh Dickins
                   ` (17 preceding siblings ...)
  2023-06-08 19:31 ` [PATCH v2 18/23] sparc/hugetlb: " Hugh Dickins
@ 2023-06-08 19:32 ` Hugh Dickins
  2023-06-08 19:33 ` [PATCH v2 20/23] sparc: iounit and iommu use pte_offset_kernel() Hugh Dickins
                   ` (3 subsequent siblings)
  22 siblings, 0 replies; 36+ messages in thread
From: Hugh Dickins @ 2023-06-08 19:32 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Mike Kravetz, Mike Rapoport, Kirill A. Shutemov, Matthew Wilcox,
	David Hildenbrand, Suren Baghdasaryan, Qi Zheng, Peter Zijlstra,
	Russell King, Catalin Marinas, Will Deacon, Geert Uytterhoeven,
	Greg Ungerer, Michal Simek, Thomas Bogendoerfer, Helge Deller,
	John David Anglin, Aneesh Kumar K.V, Michael Ellerman,
	Alexandre Ghiti, Palmer Dabbelt, Heiko Carstens,
	Christian Borntraeger, Claudio Imbrenda, Alexander Gordeev,
	John Paul Adrian Glaubitz, David S. Miller, Chris Zankel,
	Max Filippov, x86, linux-arm-kernel, linux-ia64, linux-m68k,
	linux-mips, linux-parisc, linuxppc-dev, linux-riscv, linux-s390,
	linux-sh, sparclinux, linux-kernel, linux-mm

In rare transient cases, not yet made possible, pte_offset_map() and
pte_offset_map_lock() may not find a page table: handle appropriately.

Signed-off-by: Hugh Dickins <hughd@google.com>
---
 arch/sparc/kernel/signal32.c | 2 ++
 arch/sparc/mm/fault_64.c     | 3 +++
 arch/sparc/mm/tlb.c          | 2 ++
 3 files changed, 7 insertions(+)

diff --git a/arch/sparc/kernel/signal32.c b/arch/sparc/kernel/signal32.c
index dad38960d1a8..ca450c7bc53f 100644
--- a/arch/sparc/kernel/signal32.c
+++ b/arch/sparc/kernel/signal32.c
@@ -328,6 +328,8 @@ static void flush_signal_insns(unsigned long address)
 		goto out_irqs_on;
 
 	ptep = pte_offset_map(pmdp, address);
+	if (!ptep)
+		goto out_irqs_on;
 	pte = *ptep;
 	if (!pte_present(pte))
 		goto out_unmap;
diff --git a/arch/sparc/mm/fault_64.c b/arch/sparc/mm/fault_64.c
index d91305de694c..d8a407fbe350 100644
--- a/arch/sparc/mm/fault_64.c
+++ b/arch/sparc/mm/fault_64.c
@@ -99,6 +99,7 @@ static unsigned int get_user_insn(unsigned long tpc)
 	local_irq_disable();
 
 	pmdp = pmd_offset(pudp, tpc);
+again:
 	if (pmd_none(*pmdp) || unlikely(pmd_bad(*pmdp)))
 		goto out_irq_enable;
 
@@ -115,6 +116,8 @@ static unsigned int get_user_insn(unsigned long tpc)
 #endif
 	{
 		ptep = pte_offset_map(pmdp, tpc);
+		if (!ptep)
+			goto again;
 		pte = *ptep;
 		if (pte_present(pte)) {
 			pa  = (pte_pfn(pte) << PAGE_SHIFT);
diff --git a/arch/sparc/mm/tlb.c b/arch/sparc/mm/tlb.c
index 9a725547578e..7ecf8556947a 100644
--- a/arch/sparc/mm/tlb.c
+++ b/arch/sparc/mm/tlb.c
@@ -149,6 +149,8 @@ static void tlb_batch_pmd_scan(struct mm_struct *mm, unsigned long vaddr,
 	pte_t *pte;
 
 	pte = pte_offset_map(&pmd, vaddr);
+	if (!pte)
+		return;
 	end = vaddr + HPAGE_SIZE;
 	while (vaddr < end) {
 		if (pte_val(*pte) & _PAGE_VALID) {
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 36+ messages in thread
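
A hedged userspace sketch of the "give up quietly" handling that patch 19/23 above adds to tlb_batch_pmd_scan() and flush_signal_insns(). All mock_ names, the MOCK_PTE_VALID bit and the returned count are assumptions made for illustration; only the early-return shape comes from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace model, not kernel code. */
typedef unsigned long mock_pte_t;
#define MOCK_PTE_VALID 0x1UL
#define MOCK_PTRS 4

struct mock_pmd {
	mock_pte_t *table;   /* NULL models "page table not found" */
};

/* Models pte_offset_map(): may return NULL when no page table is found. */
static mock_pte_t *mock_pte_offset_map(struct mock_pmd *pmd, size_t idx)
{
	if (!pmd->table)
		return NULL;
	return &pmd->table[idx];
}

/* Models pte_unmap(); nothing to release in this model. */
static void mock_pte_unmap(mock_pte_t *pte)
{
	(void)pte;
}

/*
 * Shaped like tlb_batch_pmd_scan() after the patch: walk the entries,
 * but return early if the page table could not be mapped.
 */
static int mock_scan_valid(struct mock_pmd *pmd)
{
	mock_pte_t *pte = mock_pte_offset_map(pmd, 0);
	int i, valid = 0;

	if (!pte)
		return 0;   /* nothing to scan: treated as "no entries" */
	for (i = 0; i < MOCK_PTRS; i++)
		if (pte[i] & MOCK_PTE_VALID)
			valid++;
	mock_pte_unmap(pte);
	return valid;
}
```

The unmap is only reached on the success path, which matches the patch jumping to out_irqs_on rather than out_unmap when the map fails.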

* [PATCH v2 20/23] sparc: iounit and iommu use pte_offset_kernel()
  2023-06-08 19:07 [PATCH v2 00/23] arch: allow pte_offset_map[_lock]() to fail Hugh Dickins
                   ` (18 preceding siblings ...)
  2023-06-08 19:32 ` [PATCH v2 19/23] sparc: allow pte_offset_map() to fail Hugh Dickins
@ 2023-06-08 19:33 ` Hugh Dickins
  2023-06-08 19:35 ` [PATCH v2 21/23] x86: Allow get_locked_pte() to fail Hugh Dickins
                   ` (2 subsequent siblings)
  22 siblings, 0 replies; 36+ messages in thread
From: Hugh Dickins @ 2023-06-08 19:33 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Mike Kravetz, Mike Rapoport, Kirill A. Shutemov, Matthew Wilcox,
	David Hildenbrand, Suren Baghdasaryan, Qi Zheng, Peter Zijlstra,
	Russell King, Catalin Marinas, Will Deacon, Geert Uytterhoeven,
	Greg Ungerer, Michal Simek, Thomas Bogendoerfer, Helge Deller,
	John David Anglin, Aneesh Kumar K.V, Michael Ellerman,
	Alexandre Ghiti, Palmer Dabbelt, Heiko Carstens,
	Christian Borntraeger, Claudio Imbrenda, Alexander Gordeev,
	John Paul Adrian Glaubitz, David S. Miller, Chris Zankel,
	Max Filippov, x86, linux-arm-kernel, linux-ia64, linux-m68k,
	linux-mips, linux-parisc, linuxppc-dev, linux-riscv, linux-s390,
	linux-sh, sparclinux, linux-kernel, linux-mm

iounit_alloc() and sbus_iommu_alloc() are working from pmd_off_k(),
so should use pte_offset_kernel() instead of pte_offset_map(), to avoid
the question of whether a pte_unmap() will be needed to balance.

Signed-off-by: Hugh Dickins <hughd@google.com>
---
 arch/sparc/mm/io-unit.c | 2 +-
 arch/sparc/mm/iommu.c   | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/sparc/mm/io-unit.c b/arch/sparc/mm/io-unit.c
index bf3e6d2fe5d9..133dd42570d6 100644
--- a/arch/sparc/mm/io-unit.c
+++ b/arch/sparc/mm/io-unit.c
@@ -244,7 +244,7 @@ static void *iounit_alloc(struct device *dev, size_t len,
 			long i;
 
 			pmdp = pmd_off_k(addr);
-			ptep = pte_offset_map(pmdp, addr);
+			ptep = pte_offset_kernel(pmdp, addr);
 
 			set_pte(ptep, mk_pte(virt_to_page(page), dvma_prot));
 
diff --git a/arch/sparc/mm/iommu.c b/arch/sparc/mm/iommu.c
index 9e3f6933ca13..3a6caef68348 100644
--- a/arch/sparc/mm/iommu.c
+++ b/arch/sparc/mm/iommu.c
@@ -358,7 +358,7 @@ static void *sbus_iommu_alloc(struct device *dev, size_t len,
 				__flush_page_to_ram(page);
 
 			pmdp = pmd_off_k(addr);
-			ptep = pte_offset_map(pmdp, addr);
+			ptep = pte_offset_kernel(pmdp, addr);
 
 			set_pte(ptep, mk_pte(virt_to_page(page), dvma_prot));
 		}
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH v2 21/23] x86: Allow get_locked_pte() to fail
  2023-06-08 19:07 [PATCH v2 00/23] arch: allow pte_offset_map[_lock]() to fail Hugh Dickins
                   ` (19 preceding siblings ...)
  2023-06-08 19:33 ` [PATCH v2 20/23] sparc: iounit and iommu use pte_offset_kernel() Hugh Dickins
@ 2023-06-08 19:35 ` Hugh Dickins
  2023-06-08 19:36 ` [PATCH v2 22/23] x86: sme_populate_pgd() use pte_offset_kernel() Hugh Dickins
  2023-06-08 19:37 ` [PATCH v2 23/23] xtensa: add pte_unmap() to balance pte_offset_map() Hugh Dickins
  22 siblings, 0 replies; 36+ messages in thread
From: Hugh Dickins @ 2023-06-08 19:35 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Mike Kravetz, Mike Rapoport, Kirill A. Shutemov, Matthew Wilcox,
	David Hildenbrand, Suren Baghdasaryan, Qi Zheng, Peter Zijlstra,
	Russell King, Catalin Marinas, Will Deacon, Geert Uytterhoeven,
	Greg Ungerer, Michal Simek, Thomas Bogendoerfer, Helge Deller,
	John David Anglin, Aneesh Kumar K.V, Michael Ellerman,
	Alexandre Ghiti, Palmer Dabbelt, Heiko Carstens,
	Christian Borntraeger, Claudio Imbrenda, Alexander Gordeev,
	John Paul Adrian Glaubitz, David S. Miller, Chris Zankel,
	Max Filippov, x86, linux-arm-kernel, linux-ia64, linux-m68k,
	linux-mips, linux-parisc, linuxppc-dev, linux-riscv, linux-s390,
	linux-sh, sparclinux, linux-kernel, linux-mm

In rare transient cases, not yet made possible, pte_offset_map() and
pte_offset_map_lock() may not find a page table: handle appropriately.

Signed-off-by: Hugh Dickins <hughd@google.com>
---
 arch/x86/kernel/ldt.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/ldt.c b/arch/x86/kernel/ldt.c
index 525876e7b9f4..adc67f98819a 100644
--- a/arch/x86/kernel/ldt.c
+++ b/arch/x86/kernel/ldt.c
@@ -367,8 +367,10 @@ static void unmap_ldt_struct(struct mm_struct *mm, struct ldt_struct *ldt)
 
 		va = (unsigned long)ldt_slot_va(ldt->slot) + offset;
 		ptep = get_locked_pte(mm, va, &ptl);
-		pte_clear(mm, va, ptep);
-		pte_unmap_unlock(ptep, ptl);
+		if (!WARN_ON_ONCE(!ptep)) {
+			pte_clear(mm, va, ptep);
+			pte_unmap_unlock(ptep, ptl);
+		}
 	}
 
 	va = (unsigned long)ldt_slot_va(ldt->slot);
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 36+ messages in thread
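
The `if (!WARN_ON_ONCE(!ptep))` idiom in patch 21/23 above relies on WARN_ON_ONCE() evaluating to the truth of its condition. A minimal userspace sketch of that behaviour follows; mock_warn_on_once() is a hypothetical function with an explicit flag, not the kernel macro, and mock_clear_slot() only imitates the shape of the patched loop body.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

static int mock_warnings;   /* how many warnings were actually printed */

/*
 * Hypothetical stand-in for WARN_ON_ONCE(): returns the condition's truth
 * value, and reports at most once per 'warned' flag.
 */
static bool mock_warn_on_once(bool cond, bool *warned)
{
	if (cond && !*warned) {
		*warned = true;
		mock_warnings++;
		fprintf(stderr, "mock WARNING: unexpected NULL pte\n");
	}
	return cond;
}

/* Shaped like the unmap_ldt_struct() loop body after the patch. */
static bool mock_clear_slot(unsigned long *ptep)
{
	static bool warned;

	if (!mock_warn_on_once(!ptep, &warned)) {
		*ptep = 0;      /* models pte_clear() */
		return true;    /* models reaching pte_unmap_unlock() */
	}
	return false;           /* skipped: nothing to clear or unlock */
}
```

A NULL ptep is skipped every time but logged only the first time, so a persistent failure cannot flood the log.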

* [PATCH v2 22/23] x86: sme_populate_pgd() use pte_offset_kernel()
  2023-06-08 19:07 [PATCH v2 00/23] arch: allow pte_offset_map[_lock]() to fail Hugh Dickins
                   ` (20 preceding siblings ...)
  2023-06-08 19:35 ` [PATCH v2 21/23] x86: Allow get_locked_pte() to fail Hugh Dickins
@ 2023-06-08 19:36 ` Hugh Dickins
  2023-06-08 19:37 ` [PATCH v2 23/23] xtensa: add pte_unmap() to balance pte_offset_map() Hugh Dickins
  22 siblings, 0 replies; 36+ messages in thread
From: Hugh Dickins @ 2023-06-08 19:36 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Mike Kravetz, Mike Rapoport, Kirill A. Shutemov, Matthew Wilcox,
	David Hildenbrand, Suren Baghdasaryan, Qi Zheng, Peter Zijlstra,
	Russell King, Catalin Marinas, Will Deacon, Geert Uytterhoeven,
	Greg Ungerer, Michal Simek, Thomas Bogendoerfer, Helge Deller,
	John David Anglin, Aneesh Kumar K.V, Michael Ellerman,
	Alexandre Ghiti, Palmer Dabbelt, Heiko Carstens,
	Christian Borntraeger, Claudio Imbrenda, Alexander Gordeev,
	John Paul Adrian Glaubitz, David S. Miller, Chris Zankel,
	Max Filippov, x86, linux-arm-kernel, linux-ia64, linux-m68k,
	linux-mips, linux-parisc, linuxppc-dev, linux-riscv, linux-s390,
	linux-sh, sparclinux, linux-kernel, linux-mm

sme_populate_pgd() is an __init function for sme_encrypt_kernel():
it should use pte_offset_kernel() instead of pte_offset_map(), to avoid
the question of whether a pte_unmap() will be needed to balance.

Signed-off-by: Hugh Dickins <hughd@google.com>
---
 arch/x86/mm/mem_encrypt_identity.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/mm/mem_encrypt_identity.c b/arch/x86/mm/mem_encrypt_identity.c
index c6efcf559d88..a1ab542bdfd6 100644
--- a/arch/x86/mm/mem_encrypt_identity.c
+++ b/arch/x86/mm/mem_encrypt_identity.c
@@ -188,7 +188,7 @@ static void __init sme_populate_pgd(struct sme_populate_pgd_data *ppd)
 	if (pmd_large(*pmd))
 		return;
 
-	pte = pte_offset_map(pmd, ppd->vaddr);
+	pte = pte_offset_kernel(pmd, ppd->vaddr);
 	if (pte_none(*pte))
 		set_pte(pte, __pte(ppd->paddr | ppd->pte_flags));
 }
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH v2 23/23] xtensa: add pte_unmap() to balance pte_offset_map()
  2023-06-08 19:07 [PATCH v2 00/23] arch: allow pte_offset_map[_lock]() to fail Hugh Dickins
                   ` (21 preceding siblings ...)
  2023-06-08 19:36 ` [PATCH v2 22/23] x86: sme_populate_pgd() use pte_offset_kernel() Hugh Dickins
@ 2023-06-08 19:37 ` Hugh Dickins
  22 siblings, 0 replies; 36+ messages in thread
From: Hugh Dickins @ 2023-06-08 19:37 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Mike Kravetz, Mike Rapoport, Kirill A. Shutemov, Matthew Wilcox,
	David Hildenbrand, Suren Baghdasaryan, Qi Zheng, Peter Zijlstra,
	Russell King, Catalin Marinas, Will Deacon, Geert Uytterhoeven,
	Greg Ungerer, Michal Simek, Thomas Bogendoerfer, Helge Deller,
	John David Anglin, Aneesh Kumar K.V, Michael Ellerman,
	Alexandre Ghiti, Palmer Dabbelt, Heiko Carstens,
	Christian Borntraeger, Claudio Imbrenda, Alexander Gordeev,
	John Paul Adrian Glaubitz, David S. Miller, Chris Zankel,
	Max Filippov, x86, linux-arm-kernel, linux-ia64, linux-m68k,
	linux-mips, linux-parisc, linuxppc-dev, linux-riscv, linux-s390,
	linux-sh, sparclinux, linux-kernel, linux-mm

To keep balance in future, remember to pte_unmap() after a successful
pte_offset_map().  And act as if get_pte_for_vaddr() really needs a map
there: read the pteval before "unmapping", to be sure the page table is
not removed while it is being read.

Signed-off-by: Hugh Dickins <hughd@google.com>
---
 arch/xtensa/mm/tlb.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/xtensa/mm/tlb.c b/arch/xtensa/mm/tlb.c
index 27a477dae232..0a11fc5f185b 100644
--- a/arch/xtensa/mm/tlb.c
+++ b/arch/xtensa/mm/tlb.c
@@ -179,6 +179,7 @@ static unsigned get_pte_for_vaddr(unsigned vaddr)
 	pud_t *pud;
 	pmd_t *pmd;
 	pte_t *pte;
+	unsigned int pteval;
 
 	if (!mm)
 		mm = task->active_mm;
@@ -197,7 +198,9 @@ static unsigned get_pte_for_vaddr(unsigned vaddr)
 	pte = pte_offset_map(pmd, vaddr);
 	if (!pte)
 		return 0;
-	return pte_val(*pte);
+	pteval = pte_val(*pte);
+	pte_unmap(pte);
+	return pteval;
 }
 
 enum {
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 36+ messages in thread
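
A small userspace sketch of the ordering that patch 23/23 above enforces: copy the entry while the map is held, then unmap, then return the copy. struct mock_map and the mock_ functions are hypothetical; the `mapped` flag merely models "the page table cannot go away yet".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model, not kernel code. */
struct mock_map {
	unsigned int pte;   /* the entry's value */
	bool present;       /* false models "no page table found" */
	bool mapped;        /* true between map and unmap */
};

/* Models pte_offset_map(): NULL on failure, otherwise marks the map held. */
static unsigned int *mock_pte_offset_map(struct mock_map *m)
{
	if (!m->present)
		return NULL;
	m->mapped = true;
	return &m->pte;
}

/* Models pte_unmap(): the entry must not be dereferenced after this. */
static void mock_pte_unmap(struct mock_map *m)
{
	m->mapped = false;
}

/* Shaped like get_pte_for_vaddr() after the patch: copy first, then unmap. */
static unsigned int mock_get_pte(struct mock_map *m)
{
	unsigned int *pte = mock_pte_offset_map(m);
	unsigned int pteval;

	if (!pte)
		return 0;
	assert(m->mapped);   /* the read happens while the map is still held */
	pteval = *pte;
	mock_pte_unmap(m);
	return pteval;
}
```

Returning `*pte` directly, as the old code did, would leave no place to call the unmap after the read.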

* [PATCH v2 07/23 fix] mips: update_mmu_cache() can replace __update_tlb(): fix
  2023-06-08 19:17 ` [PATCH v2 07/23] mips: update_mmu_cache() can replace __update_tlb() Hugh Dickins
@ 2023-06-09  8:08   ` Hugh Dickins
  2023-06-14 23:17   ` [PATCH v2 07/23] mips: update_mmu_cache() can replace __update_tlb() Nathan Chancellor
  1 sibling, 0 replies; 36+ messages in thread
From: Hugh Dickins @ 2023-06-09  8:08 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Mike Kravetz, Mike Rapoport, Kirill A. Shutemov, Matthew Wilcox,
	David Hildenbrand, Suren Baghdasaryan, Qi Zheng, Peter Zijlstra,
	Russell King, Catalin Marinas, Will Deacon, Geert Uytterhoeven,
	Greg Ungerer, Michal Simek, Thomas Bogendoerfer, Helge Deller,
	John David Anglin, Aneesh Kumar K.V, Michael Ellerman,
	Alexandre Ghiti, Palmer Dabbelt, Heiko Carstens,
	Christian Borntraeger, Claudio Imbrenda, Alexander Gordeev,
	John Paul Adrian Glaubitz, David S. Miller, Chris Zankel,
	Max Filippov, x86, linux-arm-kernel, linux-ia64, linux-m68k,
	linux-mips, linux-parisc, linuxppc-dev, linux-riscv, linux-s390,
	linux-sh, sparclinux, linux-kernel, linux-mm

I expect this to fix the
arch/mips/mm/tlb-r4k.c:300:16: warning: variable 'pmdp' set but not used
reported by the kernel test robot; but I am uncomfortable rearranging
lines in this tlb_probe_hazard() area, and would be glad for review and
testing by someone familiar with mips - thanks in advance!

Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202306091304.cNVIspK0-lkp@intel.com/
Signed-off-by: Hugh Dickins <hughd@google.com>
---
 arch/mips/mm/tlb-r4k.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/arch/mips/mm/tlb-r4k.c b/arch/mips/mm/tlb-r4k.c
index c96725d17cab..80fc90d8d2f1 100644
--- a/arch/mips/mm/tlb-r4k.c
+++ b/arch/mips/mm/tlb-r4k.c
@@ -293,11 +293,13 @@ void local_flush_tlb_one(unsigned long page)
 void update_mmu_cache(struct vm_area_struct *vma,
 		      unsigned long address, pte_t *ptep)
 {
-	unsigned long flags;
+#ifdef CONFIG_MIPS_HUGE_TLB_SUPPORT
 	pgd_t *pgdp;
 	p4d_t *p4dp;
 	pud_t *pudp;
 	pmd_t *pmdp;
+#endif
+	unsigned long flags;
 	int idx, pid;
 
 	/*
@@ -316,15 +318,15 @@ void update_mmu_cache(struct vm_area_struct *vma,
 		pid = read_c0_entryhi() & cpu_asid_mask(&current_cpu_data);
 		write_c0_entryhi(address | pid);
 	}
-	pgdp = pgd_offset(vma->vm_mm, address);
 	mtc0_tlbw_hazard();
 	tlb_probe();
 	tlb_probe_hazard();
+	idx = read_c0_index();
+#ifdef CONFIG_MIPS_HUGE_TLB_SUPPORT
+	pgdp = pgd_offset(vma->vm_mm, address);
 	p4dp = p4d_offset(pgdp, address);
 	pudp = pud_offset(p4dp, address);
 	pmdp = pmd_offset(pudp, address);
-	idx = read_c0_index();
-#ifdef CONFIG_MIPS_HUGE_TLB_SUPPORT
 	/* this could be a huge page  */
 	if (ptep == (pte_t *)pmdp) {
 		unsigned long lo;
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 36+ messages in thread
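
The fix above silences a "set but not used" warning by declaring and assigning pmdp only under the same #ifdef as its single use. A reduced sketch of that arrangement, under the assumption that MOCK_HUGE_TLB_SUPPORT plays the role of CONFIG_MIPS_HUGE_TLB_SUPPORT; the function, its arguments and the constant 7 are invented for illustration.

```c
#include <assert.h>

/* Hypothetical switch standing in for CONFIG_MIPS_HUGE_TLB_SUPPORT. */
/* #define MOCK_HUGE_TLB_SUPPORT 1 */

static int mock_update(const int *ptep, const int *pmd_slot)
{
#ifdef MOCK_HUGE_TLB_SUPPORT
	const int *pmdp;        /* declared only where it is also used */
#endif
	int idx = 7;            /* models idx = read_c0_index() */

#ifdef MOCK_HUGE_TLB_SUPPORT
	pmdp = pmd_slot;        /* models the pgd/p4d/pud/pmd walk */
	if (ptep == pmdp)
		return -idx;    /* huge-page path */
#else
	(void)pmd_slot;         /* unused without huge TLB support */
#endif
	return idx + *ptep;     /* ordinary pte path */
}
```

With the switch left undefined, no pmdp variable exists at all, so the compiler has nothing to flag as set but unused.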

* Re: [PATCH v2 15/23] s390: allow pte_offset_map_lock() to fail
  2023-06-08 19:27 ` [PATCH v2 15/23] s390: allow pte_offset_map_lock() to fail Hugh Dickins
@ 2023-06-13 11:45   ` Claudio Imbrenda
  0 siblings, 0 replies; 36+ messages in thread
From: Claudio Imbrenda @ 2023-06-13 11:45 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Andrew Morton, Mike Kravetz, Mike Rapoport, Kirill A. Shutemov,
	Matthew Wilcox, David Hildenbrand, Suren Baghdasaryan, Qi Zheng,
	Peter Zijlstra, Russell King, Catalin Marinas, Will Deacon,
	Geert Uytterhoeven, Greg Ungerer, Michal Simek,
	Thomas Bogendoerfer, Helge Deller, John David Anglin,
	Aneesh Kumar K.V, Michael Ellerman, Alexandre Ghiti,
	Palmer Dabbelt, Heiko Carstens, Christian Borntraeger,
	Alexander Gordeev, John Paul Adrian Glaubitz, David S. Miller,
	Chris Zankel, Max Filippov, x86, linux-arm-kernel, linux-ia64,
	linux-m68k, linux-mips, linux-parisc, linuxppc-dev, linux-riscv,
	linux-s390, linux-sh, sparclinux, linux-kernel, linux-mm

On Thu, 8 Jun 2023 12:27:22 -0700 (PDT)
Hugh Dickins <hughd@google.com> wrote:

> In rare transient cases, not yet made possible, pte_offset_map() and
> pte_offset_map_lock() may not find a page table: handle appropriately.
> 
> Add comment on mm's contract with s390 above __zap_zero_pages(),
> and fix old comment there: must be called after THP was disabled.
> 
> Signed-off-by: Hugh Dickins <hughd@google.com>

Acked-by: Claudio Imbrenda <imbrenda@linux.ibm.com>

> ---
>  arch/s390/kernel/uv.c  |  2 ++
>  arch/s390/mm/gmap.c    |  9 ++++++++-
>  arch/s390/mm/pgtable.c | 12 +++++++++---
>  3 files changed, 19 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/s390/kernel/uv.c b/arch/s390/kernel/uv.c
> index cb2ee06df286..3c62d1b218b1 100644
> --- a/arch/s390/kernel/uv.c
> +++ b/arch/s390/kernel/uv.c
> @@ -294,6 +294,8 @@ int gmap_make_secure(struct gmap *gmap, unsigned long gaddr, void *uvcb)
>  
>  	rc = -ENXIO;
>  	ptep = get_locked_pte(gmap->mm, uaddr, &ptelock);
> +	if (!ptep)
> +		goto out;
>  	if (pte_present(*ptep) && !(pte_val(*ptep) & _PAGE_INVALID) && pte_write(*ptep)) {
>  		page = pte_page(*ptep);
>  		rc = -EAGAIN;
> diff --git a/arch/s390/mm/gmap.c b/arch/s390/mm/gmap.c
> index dc90d1eb0d55..3a2a31a15ea8 100644
> --- a/arch/s390/mm/gmap.c
> +++ b/arch/s390/mm/gmap.c
> @@ -2537,7 +2537,12 @@ static inline void thp_split_mm(struct mm_struct *mm)
>   * Remove all empty zero pages from the mapping for lazy refaulting
>   * - This must be called after mm->context.has_pgste is set, to avoid
>   *   future creation of zero pages
> - * - This must be called after THP was enabled
> + * - This must be called after THP was disabled.
> + *
> + * mm contracts with s390, that even if mm were to remove a page table,
> + * racing with the loop below and so causing pte_offset_map_lock() to fail,
> + * it will never insert a page table containing empty zero pages once
> + * mm_forbids_zeropage(mm) i.e. mm->context.has_pgste is set.
>   */
>  static int __zap_zero_pages(pmd_t *pmd, unsigned long start,
>  			   unsigned long end, struct mm_walk *walk)
> @@ -2549,6 +2554,8 @@ static int __zap_zero_pages(pmd_t *pmd, unsigned long start,
>  		spinlock_t *ptl;
>  
>  		ptep = pte_offset_map_lock(walk->mm, pmd, addr, &ptl);
> +		if (!ptep)
> +			break;
>  		if (is_zero_pfn(pte_pfn(*ptep)))
>  			ptep_xchg_direct(walk->mm, addr, ptep, __pte(_PAGE_INVALID));
>  		pte_unmap_unlock(ptep, ptl);
> diff --git a/arch/s390/mm/pgtable.c b/arch/s390/mm/pgtable.c
> index 6effb24de6d9..3bd2ab2a9a34 100644
> --- a/arch/s390/mm/pgtable.c
> +++ b/arch/s390/mm/pgtable.c
> @@ -829,7 +829,7 @@ int set_guest_storage_key(struct mm_struct *mm, unsigned long addr,
>  	default:
>  		return -EFAULT;
>  	}
> -
> +again:
>  	ptl = pmd_lock(mm, pmdp);
>  	if (!pmd_present(*pmdp)) {
>  		spin_unlock(ptl);
> @@ -850,6 +850,8 @@ int set_guest_storage_key(struct mm_struct *mm, unsigned long addr,
>  	spin_unlock(ptl);
>  
>  	ptep = pte_offset_map_lock(mm, pmdp, addr, &ptl);
> +	if (!ptep)
> +		goto again;
>  	new = old = pgste_get_lock(ptep);
>  	pgste_val(new) &= ~(PGSTE_GR_BIT | PGSTE_GC_BIT |
>  			    PGSTE_ACC_BITS | PGSTE_FP_BIT);
> @@ -938,7 +940,7 @@ int reset_guest_reference_bit(struct mm_struct *mm, unsigned long addr)
>  	default:
>  		return -EFAULT;
>  	}
> -
> +again:
>  	ptl = pmd_lock(mm, pmdp);
>  	if (!pmd_present(*pmdp)) {
>  		spin_unlock(ptl);
> @@ -955,6 +957,8 @@ int reset_guest_reference_bit(struct mm_struct *mm, unsigned long addr)
>  	spin_unlock(ptl);
>  
>  	ptep = pte_offset_map_lock(mm, pmdp, addr, &ptl);
> +	if (!ptep)
> +		goto again;
>  	new = old = pgste_get_lock(ptep);
>  	/* Reset guest reference bit only */
>  	pgste_val(new) &= ~PGSTE_GR_BIT;
> @@ -1000,7 +1004,7 @@ int get_guest_storage_key(struct mm_struct *mm, unsigned long addr,
>  	default:
>  		return -EFAULT;
>  	}
> -
> +again:
>  	ptl = pmd_lock(mm, pmdp);
>  	if (!pmd_present(*pmdp)) {
>  		spin_unlock(ptl);
> @@ -1017,6 +1021,8 @@ int get_guest_storage_key(struct mm_struct *mm, unsigned long addr,
>  	spin_unlock(ptl);
>  
>  	ptep = pte_offset_map_lock(mm, pmdp, addr, &ptl);
> +	if (!ptep)
> +		goto again;
>  	pgste = pgste_get_lock(ptep);
>  	*key = (pgste_val(pgste) & (PGSTE_ACC_BITS | PGSTE_FP_BIT)) >> 56;
>  	paddr = pte_val(*ptep) & PAGE_MASK;


^ permalink raw reply	[flat|nested] 36+ messages in thread
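
The "again:" retries in the s390 patch quoted above re-check the pmd before each new attempt to map and lock the page table. A simplified userspace sketch of that loop follows, assuming a mock pmd whose map fails a fixed number of times; locking is omitted, every name is hypothetical, and the plain -EFAULT return is a simplification of what the real functions do.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model, not kernel code. */
struct mock_pmd {
	bool present;           /* models pmd_present() */
	int transient_failures; /* map attempts that fail before one succeeds */
	unsigned long pte;      /* the single entry behind this pmd */
	int attempts;           /* map attempts made, for inspection */
};

/* Models pte_offset_map_lock(): NULL while the table is "in flux". */
static unsigned long *mock_pte_offset_map_lock(struct mock_pmd *pmd)
{
	pmd->attempts++;
	if (pmd->transient_failures > 0) {
		pmd->transient_failures--;
		return NULL;
	}
	return &pmd->pte;
}

/* Shaped like the "again:" loops in arch/s390/mm/pgtable.c after the patch. */
static int mock_read_entry(struct mock_pmd *pmd, unsigned long *out)
{
	unsigned long *ptep;

again:
	if (!pmd->present)
		return -EFAULT;   /* re-evaluated on every retry */
	ptep = mock_pte_offset_map_lock(pmd);
	if (!ptep)
		goto again;       /* transient: re-check the pmd, then retry */
	*out = *ptep;             /* models the work done under the pte lock */
	return 0;
}
```

The loop ends either because the map eventually succeeds or because the re-checked pmd is no longer present; __zap_zero_pages() in the same patch takes the other option and simply breaks out.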

* Re: [PATCH v2 07/23] mips: update_mmu_cache() can replace __update_tlb()
  2023-06-08 19:17 ` [PATCH v2 07/23] mips: update_mmu_cache() can replace __update_tlb() Hugh Dickins
  2023-06-09  8:08   ` [PATCH v2 07/23 fix] mips: update_mmu_cache() can replace __update_tlb(): fix Hugh Dickins
@ 2023-06-14 23:17   ` Nathan Chancellor
  2023-06-15  0:26     ` Hugh Dickins
  2023-06-15 22:07     ` [PATCH v2 07/23] mips: update_mmu_cache() can replace __update_tlb() Yu Zhao
  1 sibling, 2 replies; 36+ messages in thread
From: Nathan Chancellor @ 2023-06-14 23:17 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Andrew Morton, Mike Kravetz, Mike Rapoport, Kirill A. Shutemov,
	Matthew Wilcox, David Hildenbrand, Suren Baghdasaryan, Qi Zheng,
	Peter Zijlstra, Russell King, Catalin Marinas, Will Deacon,
	Geert Uytterhoeven, Greg Ungerer, Michal Simek,
	Thomas Bogendoerfer, Helge Deller, John David Anglin,
	Aneesh Kumar K.V, Michael Ellerman, Alexandre Ghiti,
	Palmer Dabbelt, Heiko Carstens, Christian Borntraeger,
	Claudio Imbrenda, Alexander Gordeev, John Paul Adrian Glaubitz,
	David S. Miller, Chris Zankel, Max Filippov, x86,
	linux-arm-kernel, linux-ia64, linux-m68k, linux-mips,
	linux-parisc, linuxppc-dev, linux-riscv, linux-s390, linux-sh,
	sparclinux, linux-kernel, linux-mm

Hi Hugh,

On Thu, Jun 08, 2023 at 12:17:24PM -0700, Hugh Dickins wrote:
> Don't make update_mmu_cache() a wrapper around __update_tlb(): call it
> directly, and use the ptep (or pmdp) provided by the caller, instead of
> re-calling pte_offset_map() - which would raise a question of whether a
> pte_unmap() is needed to balance it.
> 
> Check whether the "ptep" provided by the caller is actually the pmdp,
> instead of testing pmd_huge(): or test pmd_huge() too and warn if it
> disagrees?  This is "hazardous" territory: needs review and testing.
> 
> Signed-off-by: Hugh Dickins <hughd@google.com>
> ---
>  arch/mips/include/asm/pgtable.h | 15 +++------------
>  arch/mips/mm/tlb-r3k.c          |  5 +++--
>  arch/mips/mm/tlb-r4k.c          |  9 +++------
>  3 files changed, 9 insertions(+), 20 deletions(-)
> 
> diff --git a/arch/mips/include/asm/pgtable.h b/arch/mips/include/asm/pgtable.h
> index 574fa14ac8b2..9175dfab08d5 100644
> --- a/arch/mips/include/asm/pgtable.h
> +++ b/arch/mips/include/asm/pgtable.h
> @@ -565,15 +565,8 @@ static inline pte_t pte_swp_clear_exclusive(pte_t pte)
>  }
>  #endif
>  
> -extern void __update_tlb(struct vm_area_struct *vma, unsigned long address,
> -	pte_t pte);
> -
> -static inline void update_mmu_cache(struct vm_area_struct *vma,
> -	unsigned long address, pte_t *ptep)
> -{
> -	pte_t pte = *ptep;
> -	__update_tlb(vma, address, pte);
> -}
> +extern void update_mmu_cache(struct vm_area_struct *vma,
> +	unsigned long address, pte_t *ptep);
>  
>  #define	__HAVE_ARCH_UPDATE_MMU_TLB
>  #define update_mmu_tlb	update_mmu_cache
> @@ -581,9 +574,7 @@ static inline void update_mmu_cache(struct vm_area_struct *vma,
>  static inline void update_mmu_cache_pmd(struct vm_area_struct *vma,
>  	unsigned long address, pmd_t *pmdp)
>  {
> -	pte_t pte = *(pte_t *)pmdp;
> -
> -	__update_tlb(vma, address, pte);
> +	update_mmu_cache(vma, address, (pte_t *)pmdp);
>  }
>  
>  /*
> diff --git a/arch/mips/mm/tlb-r3k.c b/arch/mips/mm/tlb-r3k.c
> index 53dfa2b9316b..e5722cd8dd6d 100644
> --- a/arch/mips/mm/tlb-r3k.c
> +++ b/arch/mips/mm/tlb-r3k.c
> @@ -176,7 +176,8 @@ void local_flush_tlb_page(struct vm_area_struct *vma, unsigned long page)
>  	}
>  }
>  
> -void __update_tlb(struct vm_area_struct *vma, unsigned long address, pte_t pte)
> +void update_mmu_cache(struct vm_area_struct *vma,
> +		      unsigned long address, pte_t *ptep)
>  {
>  	unsigned long asid_mask = cpu_asid_mask(&current_cpu_data);
>  	unsigned long flags;
> @@ -203,7 +204,7 @@ void __update_tlb(struct vm_area_struct *vma, unsigned long address, pte_t pte)
>  	BARRIER;
>  	tlb_probe();
>  	idx = read_c0_index();
> -	write_c0_entrylo0(pte_val(pte));
> +	write_c0_entrylo0(pte_val(*ptep));
>  	write_c0_entryhi(address | pid);
>  	if (idx < 0) {					/* BARRIER */
>  		tlb_write_random();
> diff --git a/arch/mips/mm/tlb-r4k.c b/arch/mips/mm/tlb-r4k.c
> index 1b939abbe4ca..c96725d17cab 100644
> --- a/arch/mips/mm/tlb-r4k.c
> +++ b/arch/mips/mm/tlb-r4k.c
> @@ -290,14 +290,14 @@ void local_flush_tlb_one(unsigned long page)
>   * updates the TLB with the new pte(s), and another which also checks
>   * for the R4k "end of page" hardware bug and does the needy.
>   */
> -void __update_tlb(struct vm_area_struct * vma, unsigned long address, pte_t pte)
> +void update_mmu_cache(struct vm_area_struct *vma,
> +		      unsigned long address, pte_t *ptep)
>  {
>  	unsigned long flags;
>  	pgd_t *pgdp;
>  	p4d_t *p4dp;
>  	pud_t *pudp;
>  	pmd_t *pmdp;
> -	pte_t *ptep;
>  	int idx, pid;
>  
>  	/*
> @@ -326,10 +326,9 @@ void __update_tlb(struct vm_area_struct * vma, unsigned long address, pte_t pte)
>  	idx = read_c0_index();
>  #ifdef CONFIG_MIPS_HUGE_TLB_SUPPORT
>  	/* this could be a huge page  */
> -	if (pmd_huge(*pmdp)) {
> +	if (ptep == (pte_t *)pmdp) {
>  		unsigned long lo;
>  		write_c0_pagemask(PM_HUGE_MASK);
> -		ptep = (pte_t *)pmdp;
>  		lo = pte_to_entrylo(pte_val(*ptep));
>  		write_c0_entrylo0(lo);
>  		write_c0_entrylo1(lo + (HPAGE_SIZE >> 7));
> @@ -344,8 +343,6 @@ void __update_tlb(struct vm_area_struct * vma, unsigned long address, pte_t pte)
>  	} else
>  #endif
>  	{
> -		ptep = pte_offset_map(pmdp, address);
> -
>  #if defined(CONFIG_PHYS_ADDR_T_64BIT) && defined(CONFIG_CPU_MIPS32)
>  #ifdef CONFIG_XPA
>  		write_c0_entrylo0(pte_to_entrylo(ptep->pte_high));
> -- 
> 2.35.3
> 

I just bisected a crash while powering down a MIPS machine in QEMU to
this change as commit 8044511d3893 ("mips: update_mmu_cache() can
replace __update_tlb()") in linux-next. Unfortunately, I can still
reproduce it with the existing fix you have for this change on the
mailing list, which is present in next-20230614.

I can reproduce it with the GCC 13.1.0 toolchain from kernel.org [1].

  $ make -skj"$(nproc)" ARCH=mips CROSS_COMPILE=mips-linux- mrproper malta_defconfig vmlinux

  $ qemu-system-mipsel \
      -display none \
      -nodefaults \
      -cpu 24Kf \
      -machine malta \
      -kernel vmlinux \
      -initrd rootfs.cpio \
      -m 512m \
      -serial mon:stdio
  ...
  Linux version 6.4.0-rc6-next-20230614 (nathan@dev-arch.thelio-3990X) (mips-linux-gcc (GCC) 13.1.0, GNU ld (GNU Binutils) 2.40) #1 SMP Wed Jun 14 16:13:02 MST 2023
  ...
  Run /init as init process
  process '/bin/busybox' started with executable stack
  do_page_fault(): sending SIGSEGV to init for invalid read access from 0000003c
  epc = 77b893dc in ld-uClibc-1.0.39.so[77b84000+8000]
  ra  = 77b8930c in ld-uClibc-1.0.39.so[77b84000+8000]
  Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
  ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b ]---

The rootfs is available at [2] if it is needed. I am more than happy to
provide additional information or test patches if necessary.

[1]: https://mirrors.edge.kernel.org/pub/tools/crosstool/
[2]: https://github.com/ClangBuiltLinux/boot-utils/releases/download/20230609-194440/mipsel-rootfs.cpio.zst

Cheers,
Nathan

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 07/23] mips: update_mmu_cache() can replace __update_tlb()
  2023-06-14 23:17   ` [PATCH v2 07/23] mips: update_mmu_cache() can replace __update_tlb() Nathan Chancellor
@ 2023-06-15  0:26     ` Hugh Dickins
  2023-06-15  5:43       ` Hugh Dickins
  2023-06-15 22:07     ` [PATCH v2 07/23] mips: update_mmu_cache() can replace __update_tlb() Yu Zhao
  1 sibling, 1 reply; 36+ messages in thread
From: Hugh Dickins @ 2023-06-15  0:26 UTC (permalink / raw)
  To: Nathan Chancellor
  Cc: Hugh Dickins, Andrew Morton, Mike Kravetz, Mike Rapoport,
	Kirill A. Shutemov, Matthew Wilcox, David Hildenbrand,
	Suren Baghdasaryan, Qi Zheng, Peter Zijlstra, Russell King,
	Catalin Marinas, Will Deacon, Geert Uytterhoeven, Greg Ungerer,
	Michal Simek, Thomas Bogendoerfer, Helge Deller,
	John David Anglin, Aneesh Kumar K.V, Michael Ellerman,
	Alexandre Ghiti, Palmer Dabbelt, Heiko Carstens,
	Christian Borntraeger, Claudio Imbrenda, Alexander Gordeev,
	John Paul Adrian Glaubitz, David S. Miller, Chris Zankel,
	Max Filippov, x86, linux-arm-kernel, linux-ia64, linux-m68k,
	linux-mips, linux-parisc, linuxppc-dev, linux-riscv, linux-s390,
	linux-sh, sparclinux, linux-kernel, linux-mm

On Wed, 14 Jun 2023, Nathan Chancellor wrote:

> Hi Hugh,
> 
> On Thu, Jun 08, 2023 at 12:17:24PM -0700, Hugh Dickins wrote:
> > Don't make update_mmu_cache() a wrapper around __update_tlb(): call it
> > directly, and use the ptep (or pmdp) provided by the caller, instead of
> > re-calling pte_offset_map() - which would raise a question of whether a
> > pte_unmap() is needed to balance it.
> > 
> > Check whether the "ptep" provided by the caller is actually the pmdp,
> > instead of testing pmd_huge(): or test pmd_huge() too and warn if it
> > disagrees?  This is "hazardous" territory: needs review and testing.
> > 
> > Signed-off-by: Hugh Dickins <hughd@google.com>
> > ---
> >  arch/mips/include/asm/pgtable.h | 15 +++------------
> >  arch/mips/mm/tlb-r3k.c          |  5 +++--
> >  arch/mips/mm/tlb-r4k.c          |  9 +++------
> >  3 files changed, 9 insertions(+), 20 deletions(-)
> > 
> > diff --git a/arch/mips/include/asm/pgtable.h b/arch/mips/include/asm/pgtable.h
> > index 574fa14ac8b2..9175dfab08d5 100644
> > --- a/arch/mips/include/asm/pgtable.h
> > +++ b/arch/mips/include/asm/pgtable.h
> > @@ -565,15 +565,8 @@ static inline pte_t pte_swp_clear_exclusive(pte_t pte)
> >  }
> >  #endif
> >  
> > -extern void __update_tlb(struct vm_area_struct *vma, unsigned long address,
> > -	pte_t pte);
> > -
> > -static inline void update_mmu_cache(struct vm_area_struct *vma,
> > -	unsigned long address, pte_t *ptep)
> > -{
> > -	pte_t pte = *ptep;
> > -	__update_tlb(vma, address, pte);
> > -}
> > +extern void update_mmu_cache(struct vm_area_struct *vma,
> > +	unsigned long address, pte_t *ptep);
> >  
> >  #define	__HAVE_ARCH_UPDATE_MMU_TLB
> >  #define update_mmu_tlb	update_mmu_cache
> > @@ -581,9 +574,7 @@ static inline void update_mmu_cache(struct vm_area_struct *vma,
> >  static inline void update_mmu_cache_pmd(struct vm_area_struct *vma,
> >  	unsigned long address, pmd_t *pmdp)
> >  {
> > -	pte_t pte = *(pte_t *)pmdp;
> > -
> > -	__update_tlb(vma, address, pte);
> > +	update_mmu_cache(vma, address, (pte_t *)pmdp);
> >  }
> >  
> >  /*
> > diff --git a/arch/mips/mm/tlb-r3k.c b/arch/mips/mm/tlb-r3k.c
> > index 53dfa2b9316b..e5722cd8dd6d 100644
> > --- a/arch/mips/mm/tlb-r3k.c
> > +++ b/arch/mips/mm/tlb-r3k.c
> > @@ -176,7 +176,8 @@ void local_flush_tlb_page(struct vm_area_struct *vma, unsigned long page)
> >  	}
> >  }
> >  
> > -void __update_tlb(struct vm_area_struct *vma, unsigned long address, pte_t pte)
> > +void update_mmu_cache(struct vm_area_struct *vma,
> > +		      unsigned long address, pte_t *ptep)
> >  {
> >  	unsigned long asid_mask = cpu_asid_mask(&current_cpu_data);
> >  	unsigned long flags;
> > @@ -203,7 +204,7 @@ void __update_tlb(struct vm_area_struct *vma, unsigned long address, pte_t pte)
> >  	BARRIER;
> >  	tlb_probe();
> >  	idx = read_c0_index();
> > -	write_c0_entrylo0(pte_val(pte));
> > +	write_c0_entrylo0(pte_val(*ptep));
> >  	write_c0_entryhi(address | pid);
> >  	if (idx < 0) {					/* BARRIER */
> >  		tlb_write_random();
> > diff --git a/arch/mips/mm/tlb-r4k.c b/arch/mips/mm/tlb-r4k.c
> > index 1b939abbe4ca..c96725d17cab 100644
> > --- a/arch/mips/mm/tlb-r4k.c
> > +++ b/arch/mips/mm/tlb-r4k.c
> > @@ -290,14 +290,14 @@ void local_flush_tlb_one(unsigned long page)
> >   * updates the TLB with the new pte(s), and another which also checks
> >   * for the R4k "end of page" hardware bug and does the needy.
> >   */
> > -void __update_tlb(struct vm_area_struct * vma, unsigned long address, pte_t pte)
> > +void update_mmu_cache(struct vm_area_struct *vma,
> > +		      unsigned long address, pte_t *ptep)
> >  {
> >  	unsigned long flags;
> >  	pgd_t *pgdp;
> >  	p4d_t *p4dp;
> >  	pud_t *pudp;
> >  	pmd_t *pmdp;
> > -	pte_t *ptep;
> >  	int idx, pid;
> >  
> >  	/*
> > @@ -326,10 +326,9 @@ void __update_tlb(struct vm_area_struct * vma, unsigned long address, pte_t pte)
> >  	idx = read_c0_index();
> >  #ifdef CONFIG_MIPS_HUGE_TLB_SUPPORT
> >  	/* this could be a huge page  */
> > -	if (pmd_huge(*pmdp)) {
> > +	if (ptep == (pte_t *)pmdp) {
> >  		unsigned long lo;
> >  		write_c0_pagemask(PM_HUGE_MASK);
> > -		ptep = (pte_t *)pmdp;
> >  		lo = pte_to_entrylo(pte_val(*ptep));
> >  		write_c0_entrylo0(lo);
> >  		write_c0_entrylo1(lo + (HPAGE_SIZE >> 7));
> > @@ -344,8 +343,6 @@ void __update_tlb(struct vm_area_struct * vma, unsigned long address, pte_t pte)
> >  	} else
> >  #endif
> >  	{
> > -		ptep = pte_offset_map(pmdp, address);
> > -
> >  #if defined(CONFIG_PHYS_ADDR_T_64BIT) && defined(CONFIG_CPU_MIPS32)
> >  #ifdef CONFIG_XPA
> >  		write_c0_entrylo0(pte_to_entrylo(ptep->pte_high));
> > -- 
> > 2.35.3
> > 
> 
> I just bisected a crash while powering down a MIPS machine in QEMU to
> this change as commit 8044511d3893 ("mips: update_mmu_cache() can
> replace __update_tlb()") in linux-next.

Thank you, Nathan, that's very helpful indeed.  This patch certainly knew
that it wanted testing, and I'm glad to hear that it is now seeing some.

While powering down?  The messages below look like it was just coming up,
but no doubt that's because you were bisecting (or because I'm unfamiliar
with what messages to expect there).  It's probably irrelevant information,
but I wonder whether the (V)machine worked well enough for a while before
you first powered down and spotted the problem, or whether it's never got
much further than trying to run init (busybox)?  I'm trying to get a feel
for whether the problem occurs under common or uncommon conditions.

> Unfortunately, I can still
> reproduce it with the existing fix you have for this change on the
> mailing list, which is present in next-20230614.

Right, that later fix was only for a build warning, nothing functional
(or at least I hoped that it wasn't making any functional difference).

Thanks a lot for the detailed instructions below: unfortunately, those
would draw me into a realm of testing I've never needed to enter before,
so a lot of time spent on setup and learning.  Usually, I just stare at
the source.

What this probably says is that I should revert most of my cleanup there,
and keep as close to the existing code as possible.  But some change is
needed, and I may need to understand (or have a good guess at) what was
going wrong, to decide what kind of retreat will be successful.

Back to the source for a while: I hope I'll find examples in nearby MIPS
kernel source (and git history), which will hint at the right way forward.
Then send you a patch against next-20230614 to try, when I'm reasonably
confident that it's enough to satisfy my purpose, but likely not to waste
your time.

Thanks, until later,
Hugh

> 
> I can reproduce it with the GCC 13.1.0 on kernel.org [1].
> 
>   $ make -skj"$(nproc)" ARCH=mips CROSS_COMPILE=mips-linux- mrproper malta_defconfig vmlinux
> 
>   $ qemu-system-mipsel \
>       -display none \
>       -nodefaults \
>       -cpu 24Kf \
>       -machine malta \
>       -kernel vmlinux \
>       -initrd rootfs.cpio \
>       -m 512m \
>       -serial mon:stdio
>   ...
>   Linux version 6.4.0-rc6-next-20230614 (nathan@dev-arch.thelio-3990X) (mips-linux-gcc (GCC) 13.1.0, GNU ld (GNU Binutils) 2.40) #1 SMP Wed Jun 14 16:13:02 MST 2023
>   ...
>   Run /init as init process
>   process '/bin/busybox' started with executable stack
>   do_page_fault(): sending SIGSEGV to init for invalid read access from 0000003c
>   epc = 77b893dc in ld-uClibc-1.0.39.so[77b84000+8000]
>   ra  = 77b8930c in ld-uClibc-1.0.39.so[77b84000+8000]
>   Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
>   ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b ]---
> 
> The rootfs is available at [2] if it is needed. I am more than happy to
> provide additional information or test patches if necessary.
> 
> [1]: https://mirrors.edge.kernel.org/pub/tools/crosstool/
> [2]: https://github.com/ClangBuiltLinux/boot-utils/releases/download/20230609-194440/mipsel-rootfs.cpio.zst
> 
> Cheers,
> Nathan

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 07/23] mips: update_mmu_cache() can replace __update_tlb()
  2023-06-15  0:26     ` Hugh Dickins
@ 2023-06-15  5:43       ` Hugh Dickins
  2023-06-15 15:50         ` Nathan Chancellor
  0 siblings, 1 reply; 36+ messages in thread
From: Hugh Dickins @ 2023-06-15  5:43 UTC (permalink / raw)
  To: Nathan Chancellor
  Cc: Hugh Dickins, Andrew Morton, Mike Kravetz, Mike Rapoport,
	Kirill A. Shutemov, Matthew Wilcox, David Hildenbrand,
	Suren Baghdasaryan, Qi Zheng, Peter Zijlstra, Russell King,
	Catalin Marinas, Will Deacon, Geert Uytterhoeven, Greg Ungerer,
	Michal Simek, Thomas Bogendoerfer, Helge Deller,
	John David Anglin, Aneesh Kumar K.V, Michael Ellerman,
	Alexandre Ghiti, Palmer Dabbelt, Heiko Carstens,
	Christian Borntraeger, Claudio Imbrenda, Alexander Gordeev,
	John Paul Adrian Glaubitz, David S. Miller, Chris Zankel,
	Max Filippov, x86, linux-arm-kernel, linux-ia64, linux-m68k,
	linux-mips, linux-parisc, linuxppc-dev, linux-riscv, linux-s390,
	linux-sh, sparclinux, linux-kernel, linux-mm

[-- Attachment #1: Type: text/plain, Size: 3563 bytes --]

On Wed, 14 Jun 2023, Hugh Dickins wrote:
> On Wed, 14 Jun 2023, Nathan Chancellor wrote:
> > 
> > I just bisected a crash while powering down a MIPS machine in QEMU to
> > this change as commit 8044511d3893 ("mips: update_mmu_cache() can
> > replace __update_tlb()") in linux-next.
> 
> Thank you, Nathan, that's very helpful indeed.  This patch certainly knew
> that it wanted testing, and I'm glad to hear that it is now seeing some.
> 
> While powering down?  The messages below look like it was just coming up,
> but no doubt that's because you were bisecting (or because I'm unfamiliar
> with what messages to expect there).  It's probably irrelevant information,
> but I wonder whether the (V)machine worked well enough for a while before
> you first powered down and spotted the problem, or whether it's never got
> much further than trying to run init (busybox)?  I'm trying to get a feel
> for whether the problem occurs under common or uncommon conditions.
> 
> > Unfortunately, I can still
> > reproduce it with the existing fix you have for this change on the
> > mailing list, which is present in next-20230614.
> 
> Right, that later fix was only for a build warning, nothing functional
> (or at least I hoped that it wasn't making any functional difference).
> 
> Thanks a lot for the detailed instructions below: unfortunately, those
> would draw me into a realm of testing I've never needed to enter before,
> so a lot of time spent on setup and learning.  Usually, I just stare at
> the source.
> 
> What this probably says is that I should revert most of my cleanup there,
> and keep as close to the existing code as possible.  But some change is
> needed, and I may need to understand (or have a good guess at) what was
> going wrong, to decide what kind of retreat will be successful.
> 
> Back to the source for a while: I hope I'll find examples in nearby MIPS
> kernel source (and git history), which will hint at the right way forward.
> Then send you a patch against next-20230614 to try, when I'm reasonably
> confident that it's enough to satisfy my purpose, but likely not to waste
> your time.

I'm going to take advantage of your good nature by attaching
two alternative patches, either to go on top of next-20230614.

mips1.patch,
 arch/mips/mm/tlb-r4k.c |   12 +-----------
 1 file changed, 1 insertion(+), 11 deletions(-)

is by far my favourite.  I couldn't see anything wrong with what's
already there for mips, but it seems possible that (though I didn't
find it) somewhere calls update_mmu_cache_pmd() on a page table.  So
mips1.patch restores the pmd_huge() check, and cleans up further by
removing the silly pgdp, p4dp, pudp, pmdp stuff: the pointer has now
been passed in by the caller, why walk the tree again?  I should have
done it this way before.

But if that doesn't work, then I'm afraid it will have to be
mips2.patch,
 arch/mips/include/asm/pgtable.h |   15 ++++++++++++---
 arch/mips/mm/tlb-r3k.c          |    5 ++---
 arch/mips/mm/tlb-r4k.c          |   27 ++++++++++++++++++---------
 3 files changed, 32 insertions(+), 15 deletions(-)

which reverts all of the original patch and its build warning fix,
and does a pte_unmap() to balance the silly pte_offset_map() there;
with an apologetic comment for this being about the only place in
the tree where I have no idea what to do if ptep were NULL.

I do hope that you find the first fixes the breakage; but if not, then
I even more fervently hope that the second will, despite my hating it.
Touch wood for the first, fingers crossed for the second, thanks,

Hugh

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: Type: text/x-patch; name=mips1.patch, Size: 900 bytes --]

--- a/arch/mips/mm/tlb-r4k.c
+++ b/arch/mips/mm/tlb-r4k.c
@@ -293,12 +293,6 @@ void local_flush_tlb_one(unsigned long page)
 void update_mmu_cache(struct vm_area_struct *vma,
 		      unsigned long address, pte_t *ptep)
 {
-#ifdef CONFIG_MIPS_HUGE_TLB_SUPPORT
-	pgd_t *pgdp;
-	p4d_t *p4dp;
-	pud_t *pudp;
-	pmd_t *pmdp;
-#endif
 	unsigned long flags;
 	int idx, pid;
 
@@ -323,12 +317,8 @@ void update_mmu_cache(struct vm_area_struct *vma,
 	tlb_probe_hazard();
 	idx = read_c0_index();
 #ifdef CONFIG_MIPS_HUGE_TLB_SUPPORT
-	pgdp = pgd_offset(vma->vm_mm, address);
-	p4dp = p4d_offset(pgdp, address);
-	pudp = pud_offset(p4dp, address);
-	pmdp = pmd_offset(pudp, address);
 	/* this could be a huge page  */
-	if (ptep == (pte_t *)pmdp) {
+	if (pmd_huge(*(pmd_t *)ptep)) {
 		unsigned long lo;
 		write_c0_pagemask(PM_HUGE_MASK);
 		lo = pte_to_entrylo(pte_val(*ptep));

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #3: Type: text/x-patch; name=mips2.patch, Size: 3927 bytes --]

--- a/arch/mips/include/asm/pgtable.h
+++ b/arch/mips/include/asm/pgtable.h
@@ -565,8 +565,15 @@ static inline pte_t pte_swp_clear_exclusive(pte_t pte)
 }
 #endif
 
-extern void update_mmu_cache(struct vm_area_struct *vma,
-	unsigned long address, pte_t *ptep);
+extern void __update_tlb(struct vm_area_struct *vma, unsigned long address,
+	pte_t pte);
+
+static inline void update_mmu_cache(struct vm_area_struct *vma,
+	unsigned long address, pte_t *ptep)
+{
+	pte_t pte = *ptep;
+	__update_tlb(vma, address, pte);
+}
 
 #define	__HAVE_ARCH_UPDATE_MMU_TLB
 #define update_mmu_tlb	update_mmu_cache
@@ -574,7 +581,9 @@ extern void update_mmu_cache(struct vm_area_struct *vma,
 static inline void update_mmu_cache_pmd(struct vm_area_struct *vma,
 	unsigned long address, pmd_t *pmdp)
 {
-	update_mmu_cache(vma, address, (pte_t *)pmdp);
+	pte_t pte = *(pte_t *)pmdp;
+
+	__update_tlb(vma, address, pte);
 }
 
 /*
--- a/arch/mips/mm/tlb-r3k.c
+++ b/arch/mips/mm/tlb-r3k.c
@@ -176,8 +176,7 @@ void local_flush_tlb_page(struct vm_area_struct *vma, unsigned long page)
 	}
 }
 
-void update_mmu_cache(struct vm_area_struct *vma,
-		      unsigned long address, pte_t *ptep)
+void __update_tlb(struct vm_area_struct *vma, unsigned long address, pte_t pte)
 {
 	unsigned long asid_mask = cpu_asid_mask(&current_cpu_data);
 	unsigned long flags;
@@ -204,7 +203,7 @@ void update_mmu_cache(struct vm_area_struct *vma,
 	BARRIER;
 	tlb_probe();
 	idx = read_c0_index();
-	write_c0_entrylo0(pte_val(*ptep));
+	write_c0_entrylo0(pte_val(pte));
 	write_c0_entryhi(address | pid);
 	if (idx < 0) {					/* BARRIER */
 		tlb_write_random();
--- a/arch/mips/mm/tlb-r4k.c
+++ b/arch/mips/mm/tlb-r4k.c
@@ -290,16 +290,14 @@ void local_flush_tlb_one(unsigned long page)
  * updates the TLB with the new pte(s), and another which also checks
  * for the R4k "end of page" hardware bug and does the needy.
  */
-void update_mmu_cache(struct vm_area_struct *vma,
-		      unsigned long address, pte_t *ptep)
+void __update_tlb(struct vm_area_struct * vma, unsigned long address, pte_t pte)
 {
-#ifdef CONFIG_MIPS_HUGE_TLB_SUPPORT
+	unsigned long flags;
 	pgd_t *pgdp;
 	p4d_t *p4dp;
 	pud_t *pudp;
 	pmd_t *pmdp;
-#endif
-	unsigned long flags;
+	pte_t *ptep, *ptemap = NULL;
 	int idx, pid;
 
 	/*
@@ -318,19 +316,20 @@ void update_mmu_cache(struct vm_area_struct *vma,
 		pid = read_c0_entryhi() & cpu_asid_mask(&current_cpu_data);
 		write_c0_entryhi(address | pid);
 	}
+	pgdp = pgd_offset(vma->vm_mm, address);
 	mtc0_tlbw_hazard();
 	tlb_probe();
 	tlb_probe_hazard();
-	idx = read_c0_index();
-#ifdef CONFIG_MIPS_HUGE_TLB_SUPPORT
-	pgdp = pgd_offset(vma->vm_mm, address);
 	p4dp = p4d_offset(pgdp, address);
 	pudp = pud_offset(p4dp, address);
 	pmdp = pmd_offset(pudp, address);
+	idx = read_c0_index();
+#ifdef CONFIG_MIPS_HUGE_TLB_SUPPORT
 	/* this could be a huge page  */
-	if (ptep == (pte_t *)pmdp) {
+	if (pmd_huge(*pmdp)) {
 		unsigned long lo;
 		write_c0_pagemask(PM_HUGE_MASK);
+		ptep = (pte_t *)pmdp;
 		lo = pte_to_entrylo(pte_val(*ptep));
 		write_c0_entrylo0(lo);
 		write_c0_entrylo1(lo + (HPAGE_SIZE >> 7));
@@ -345,6 +344,13 @@ void update_mmu_cache(struct vm_area_struct *vma,
 	} else
 #endif
 	{
+		ptemap = ptep = pte_offset_map(pmdp, address);
+		/*
+		 * update_mmu_cache() is called between pte_offset_map_lock()
+		 * and pte_unmap_unlock(), so we can assume that ptep is not
+		 * NULL here: and what should be done below if it were NULL?
+		 */
+
 #if defined(CONFIG_PHYS_ADDR_T_64BIT) && defined(CONFIG_CPU_MIPS32)
 #ifdef CONFIG_XPA
 		write_c0_entrylo0(pte_to_entrylo(ptep->pte_high));
@@ -372,6 +378,9 @@ void update_mmu_cache(struct vm_area_struct *vma,
 	tlbw_use_hazard();
 	htw_start();
 	flush_micro_tlb_vm(vma);
+
+	if (ptemap)
+		pte_unmap(ptemap);
 	local_irq_restore(flags);
 }
 

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 07/23] mips: update_mmu_cache() can replace __update_tlb()
  2023-06-15  5:43       ` Hugh Dickins
@ 2023-06-15 15:50         ` Nathan Chancellor
  2023-06-15 21:22           ` Hugh Dickins
  0 siblings, 1 reply; 36+ messages in thread
From: Nathan Chancellor @ 2023-06-15 15:50 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Andrew Morton, Mike Kravetz, Mike Rapoport, Kirill A. Shutemov,
	Matthew Wilcox, David Hildenbrand, Suren Baghdasaryan, Qi Zheng,
	Peter Zijlstra, Russell King, Catalin Marinas, Will Deacon,
	Geert Uytterhoeven, Greg Ungerer, Michal Simek,
	Thomas Bogendoerfer, Helge Deller, John David Anglin,
	Aneesh Kumar K.V, Michael Ellerman, Alexandre Ghiti,
	Palmer Dabbelt, Heiko Carstens, Christian Borntraeger,
	Claudio Imbrenda, Alexander Gordeev, John Paul Adrian Glaubitz,
	David S. Miller, Chris Zankel, Max Filippov, x86,
	linux-arm-kernel, linux-ia64, linux-m68k, linux-mips,
	linux-parisc, linuxppc-dev, linux-riscv, linux-s390, linux-sh,
	sparclinux, linux-kernel, linux-mm

On Wed, Jun 14, 2023 at 10:43:30PM -0700, Hugh Dickins wrote:
> On Wed, 14 Jun 2023, Hugh Dickins wrote:
> > On Wed, 14 Jun 2023, Nathan Chancellor wrote:
> > > 
> > > I just bisected a crash while powering down a MIPS machine in QEMU to
> > > this change as commit 8044511d3893 ("mips: update_mmu_cache() can
> > > replace __update_tlb()") in linux-next.
> > 
> > Thank you, Nathan, that's very helpful indeed.  This patch certainly knew
> > that it wanted testing, and I'm glad to hear that it is now seeing some.
> > 
> > While powering down?  The messages below look like it was just coming up,
> > but no doubt that's because you were bisecting (or because I'm unfamiliar
> > with what messages to expect there).  It's probably irrelevant information,
> > but I wonder whether the (V)machine worked well enough for a while before
> > you first powered down and spotted the problem, or whether it's never got
> > much further than trying to run init (busybox)?  I'm trying to get a feel
> > for whether the problem occurs under common or uncommon conditions.

Ugh sorry, I have been looking into too many bugs lately and got my
wires crossed :) this is indeed a problem when running init (which is
busybox, this is a simple Buildroot file system).

> > > Unfortunately, I can still
> > > reproduce it with the existing fix you have for this change on the
> > > mailing list, which is present in next-20230614.
> > 
> > Right, that later fix was only for a build warning, nothing functional
> > (or at least I hoped that it wasn't making any functional difference).
> > 
> > Thanks a lot for the detailed instructions below: unfortunately, those
> > would draw me into a realm of testing I've never needed to enter before,
> > so a lot of time spent on setup and learning.  Usually, I just stare at
> > the source.
> > 
> > What this probably says is that I should revert most of my cleanup there,
> > and keep as close to the existing code as possible.  But some change is
> > needed, and I may need to understand (or have a good guess at) what was
> > going wrong, to decide what kind of retreat will be successful.
> > 
> > Back to the source for a while: I hope I'll find examples in nearby MIPS
> > kernel source (and git history), which will hint at the right way forward.
> > Then send you a patch against next-20230614 to try, when I'm reasonably
> > confident that it's enough to satisfy my purpose, but likely not to waste
> > your time.
> 
> I'm going to take advantage of your good nature by attaching
> two alternative patches, either to go on top of next-20230614.
> 
> mips1.patch,
>  arch/mips/mm/tlb-r4k.c |   12 +-----------
>  1 file changed, 1 insertion(+), 11 deletions(-)
> 
> is by far my favourite.  I couldn't see anything wrong with what's
> already there for mips, but it seems possible that (though I didn't
> find it) somewhere calls update_mmu_cache_pmd() on a page table.  So
> mips1.patch restores the pmd_huge() check, and cleans up further by
> removing the silly pgdp, p4dp, pudp, pmdp stuff: the pointer has now
> been passed in by the caller, why walk the tree again?  I should have
> done it this way before.
> 
> But if that doesn't work, then I'm afraid it will have to be
> mips2.patch,
>  arch/mips/include/asm/pgtable.h |   15 ++++++++++++---
>  arch/mips/mm/tlb-r3k.c          |    5 ++---
>  arch/mips/mm/tlb-r4k.c          |   27 ++++++++++++++++++---------
>  3 files changed, 32 insertions(+), 15 deletions(-)
> 
> which reverts all of the original patch and its build warning fix,
> and does a pte_unmap() to balance the silly pte_offset_map() there;
> with an apologetic comment for this being about the only place in
> the tree where I have no idea what to do if ptep were NULL.
> 
> I do hope that you find the first fixes the breakage; but if not, then

I hate to be the bearer of bad news but the first patch did not fix the
breakage, I see the same issue.

> I even more fervently hope that the second will, despite my hating it.
> Touch wood for the first, fingers crossed for the second, thanks,

Thankfully, the second one does. Thanks for the quick and thoughtful
responses!

Cheers,
Nathan

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 07/23] mips: update_mmu_cache() can replace __update_tlb()
  2023-06-15 15:50         ` Nathan Chancellor
@ 2023-06-15 21:22           ` Hugh Dickins
  2023-06-15 23:02             ` [PATCH v2 07/23 replacement] mips: add pte_unmap() to balance pte_offset_map() Hugh Dickins
  0 siblings, 1 reply; 36+ messages in thread
From: Hugh Dickins @ 2023-06-15 21:22 UTC (permalink / raw)
  To: Nathan Chancellor
  Cc: Hugh Dickins, Andrew Morton, Mike Kravetz, Mike Rapoport,
	Kirill A. Shutemov, Matthew Wilcox, David Hildenbrand,
	Suren Baghdasaryan, Qi Zheng, Peter Zijlstra, Russell King,
	Catalin Marinas, Will Deacon, Geert Uytterhoeven, Greg Ungerer,
	Michal Simek, Thomas Bogendoerfer, Helge Deller,
	John David Anglin, Aneesh Kumar K.V, Michael Ellerman,
	Alexandre Ghiti, Palmer Dabbelt, Heiko Carstens,
	Christian Borntraeger, Claudio Imbrenda, Alexander Gordeev,
	John Paul Adrian Glaubitz, David S. Miller, Chris Zankel,
	Max Filippov, x86, linux-arm-kernel, linux-ia64, linux-m68k,
	linux-mips, linux-parisc, linuxppc-dev, linux-riscv, linux-s390,
	linux-sh, sparclinux, linux-kernel, linux-mm

On Thu, 15 Jun 2023, Nathan Chancellor wrote:
> On Wed, Jun 14, 2023 at 10:43:30PM -0700, Hugh Dickins wrote:
> > 
> > I do hope that you find the first fixes the breakage; but if not, then
> 
> I hate to be the bearer of bad news but the first patch did not fix the
> breakage, I see the same issue.

Boo!

> 
> > I even more fervently hope that the second will, despite my hating it.
> > Touch wood for the first, fingers crossed for the second, thanks,
> 
> Thankfully, the second one does. Thanks for the quick and thoughtful
> responses!

Hurrah!

Thanks a lot, Nathan.  I'll set aside my disappointment and curiosity,
clearly I'm not going to have much of a future as a MIPS programmer.

I must take a break, then rush Andrew the second patch, well, not
exactly that second patch, since most of that is revert: I'll just
send the few lines of replacement patch (with a new Subject line, as
update_mmu_cache() goes back to being separate from __update_tlb()).

Unless you object, I'll include a Tested-by: you.  I realize that
your testing is limited to seeing it running; but that's true of
most of the testing at this stage - it gets to be more interesting
when the patch that adds the rcu_read_lock() and rcu_read_unlock()
is added on top later.

Thanks again,
Hugh

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 07/23] mips: update_mmu_cache() can replace __update_tlb()
  2023-06-14 23:17   ` [PATCH v2 07/23] mips: update_mmu_cache() can replace __update_tlb() Nathan Chancellor
  2023-06-15  0:26     ` Hugh Dickins
@ 2023-06-15 22:07     ` Yu Zhao
  1 sibling, 0 replies; 36+ messages in thread
From: Yu Zhao @ 2023-06-15 22:07 UTC (permalink / raw)
  To: Nathan Chancellor, Hugh Dickins
  Cc: Hugh Dickins, Andrew Morton, Mike Kravetz, Mike Rapoport,
	Kirill A. Shutemov, Matthew Wilcox, David Hildenbrand,
	Suren Baghdasaryan, Qi Zheng, Peter Zijlstra, Russell King,
	Catalin Marinas, Will Deacon, Geert Uytterhoeven, Greg Ungerer,
	Michal Simek, Thomas Bogendoerfer, Helge Deller,
	John David Anglin, Aneesh Kumar K.V, Michael Ellerman,
	Alexandre Ghiti, Palmer Dabbelt, Heiko Carstens,
	Christian Borntraeger, Claudio Imbrenda, Alexander Gordeev,
	John Paul Adrian Glaubitz, David S. Miller, Chris Zankel,
	Max Filippov, x86, linux-arm-kernel, linux-ia64, linux-m68k,
	linux-mips, linux-parisc, linuxppc-dev, linux-riscv, linux-s390,
	linux-sh, sparclinux, linux-kernel, linux-mm

On Wed, Jun 14, 2023 at 04:17:58PM -0700, Nathan Chancellor wrote:
> Hi Hugh,
> 
> On Thu, Jun 08, 2023 at 12:17:24PM -0700, Hugh Dickins wrote:
> > Don't make update_mmu_cache() a wrapper around __update_tlb(): call it
> > directly, and use the ptep (or pmdp) provided by the caller, instead of
> > re-calling pte_offset_map() - which would raise a question of whether a
> > pte_unmap() is needed to balance it.
> > 
> > Check whether the "ptep" provided by the caller is actually the pmdp,
> > instead of testing pmd_huge(): or test pmd_huge() too and warn if it
> > disagrees?  This is "hazardous" territory: needs review and testing.
> > 
> > Signed-off-by: Hugh Dickins <hughd@google.com>
> > ---
> >  arch/mips/include/asm/pgtable.h | 15 +++------------
> >  arch/mips/mm/tlb-r3k.c          |  5 +++--
> >  arch/mips/mm/tlb-r4k.c          |  9 +++------
> >  3 files changed, 9 insertions(+), 20 deletions(-)
> > 
> 
> I just bisected a crash while powering down a MIPS machine in QEMU to
> this change as commit 8044511d3893 ("mips: update_mmu_cache() can
> replace __update_tlb()") in linux-next. Unfortunately, I can still
> reproduce it with the existing fix you have for this change on the
> mailing list, which is present in next-20230614.
> 
> I can reproduce it with the GCC 13.1.0 on kernel.org [1].
> 
>   $ make -skj"$(nproc)" ARCH=mips CROSS_COMPILE=mips-linux- mrproper malta_defconfig vmlinux
> 
>   $ qemu-system-mipsel \
>       -display none \
>       -nodefaults \
>       -cpu 24Kf \
>       -machine malta \
>       -kernel vmlinux \
>       -initrd rootfs.cpio \
>       -m 512m \
>       -serial mon:stdio
>   ...
>   Linux version 6.4.0-rc6-next-20230614 (nathan@dev-arch.thelio-3990X) (mips-linux-gcc (GCC) 13.1.0, GNU ld (GNU Binutils) 2.40) #1 SMP Wed Jun 14 16:13:02 MST 2023
>   ...
>   Run /init as init process
>   process '/bin/busybox' started with executable stack
>   do_page_fault(): sending SIGSEGV to init for invalid read access from 0000003c
>   epc = 77b893dc in ld-uClibc-1.0.39.so[77b84000+8000]
>   ra  = 77b8930c in ld-uClibc-1.0.39.so[77b84000+8000]
>   Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
>   ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b ]---
> 
> The rootfs is available at [2] if it is needed. I am more than happy to
> provide additional information or test patches if necessary.
> 
> [1]: https://mirrors.edge.kernel.org/pub/tools/crosstool/
> [2]: https://github.com/ClangBuiltLinux/boot-utils/releases/download/20230609-194440/mipsel-rootfs.cpio.zst

Seeing this on real h/w as well (just to confirm).

Linux version 6.4.0-rc4-00437-g4bab5c42a698 (root@yuzhao.bld.corp.google.com) (mips64el-linux-gnuabi64-gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40) #3 SMP PREEMPT Thu Jun 15 01:05:20 MDT 2023
Skipping L2 locking due to reduced L2 cache size
CVMSEG size: 2 cache lines (256 bytes)
printk: bootconsole [early0] enabled
CPU0 revision is: 000d9602 (Cavium Octeon III)
FPU revision is: 00739600
Kernel sections are not in the memory maps
Wasting 243712 bytes for tracking 4352 unused pages
Initrd not found or empty - disabling initrd
Using passed Device Tree.
software IO TLB: SWIOTLB bounce buffer size adjusted to 0MB
software IO TLB: area num 1.
software IO TLB: mapped [mem 0x000000000370d000-0x000000000374d000] (0MB)
Primary instruction cache 78kB, virtually tagged, 39 way, 16 sets, linesize 128 bytes.
Primary data cache 32kB, 32-way, 8 sets, linesize 128 bytes.
Zone ranges:
  DMA32    [mem 0x0000000001100000-0x00000000efffffff]
  Normal   empty
Movable zone start for each node
Early memory node ranges
  node   0: [mem 0x0000000001100000-0x0000000003646fff]
  node   0: [mem 0x0000000003700000-0x000000000fafffff]
  node   0: [mem 0x0000000020000000-0x000000004ebfffff]
Initmem setup node 0 [mem 0x0000000001100000-0x000000004ebfffff]
On node 0, zone DMA32: 4352 pages in unavailable ranges
On node 0, zone DMA32: 185 pages in unavailable ranges
On node 0, zone DMA32: 1280 pages in unavailable ranges
On node 0, zone DMA32: 5120 pages in unavailable ranges
percpu: Embedded 15 pages/cpu s24368 r8192 d28880 u61440
pcpu-alloc: s24368 r8192 d28880 u61440 alloc=15*4096
pcpu-alloc: [0] 0 [0] 1 [0] 2 [0] 3 
Kernel command line:  loglevel=8 console=ttyS0,115200
printk: log_buf_len individual max cpu contribution: 4096 bytes
printk: log_buf_len total cpu_extra contributions: 12288 bytes
printk: log_buf_len min size: 16384 bytes
printk: log_buf_len: 32768 bytes
printk: early log buf free: 14184(86%)
Dentry cache hash table entries: 131072 (order: 8, 1048576 bytes, linear)
Inode-cache hash table entries: 65536 (order: 7, 524288 bytes, linear)
Built 1 zonelists, mobility grouping on.  Total pages: 247772
mem auto-init: stack:all(zero), heap alloc:off, heap free:off
Memory: 950032K/1004828K available (8058K kernel code, 575K rwdata, 1880K rodata, 27488K init, 158K bss, 54796K reserved, 0K cma-reserved)
rcu: Preemptible hierarchical RCU implementation.
rcu: 	RCU event tracing is enabled.
rcu: 	RCU restricting CPUs from NR_CPUS=32 to nr_cpu_ids=4.
rcu: RCU calculated value of scheduler-enlistment delay is 10 jiffies.
rcu: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=4
NR_IRQS: 512
CIB interrupt controller probed: 800107000000e000 23
CIB interrupt controller probed: 800107000000e200 12
CIB interrupt controller probed: 800107000000e400 6
CIB interrupt controller probed: 800107000000ec00 15
CIB interrupt controller probed: 800107000000e600 4
CIB interrupt controller probed: 800107000000e800 11
CIB interrupt controller probed: 800107000000e900 11
rcu: srcu_init: Setting srcu_struct sizes based on contention.
clocksource: OCTEON_CVMCOUNT: mask: 0xffffffffffffffff max_cycles: 0x1cd42e4dffb, max_idle_ns: 881590591483 ns
Calibrating delay loop (skipped) preset value.. 2000.00 BogoMIPS (lpj=10000000)
pid_max: default: 32768 minimum: 301
LSM: initializing lsm=capability,integrity
Mount-cache hash table entries: 2048 (order: 2, 16384 bytes, linear)
Mountpoint-cache hash table entries: 2048 (order: 2, 16384 bytes, linear)
rcu: Hierarchical SRCU implementation.
rcu: 	Max phase no-delay instances is 1000.
smp: Bringing up secondary CPUs ...
SMP: Booting CPU01 (CoreId  1)...
CPU1 revision is: 000d9602 (Cavium Octeon III)
FPU revision is: 00739600
SMP: Booting CPU02 (CoreId  2)...
CPU2 revision is: 000d9602 (Cavium Octeon III)
FPU revision is: 00739600
SMP: Booting CPU03 (CoreId  3)...
CPU3 revision is: 000d9602 (Cavium Octeon III)
FPU revision is: 00739600
smp: Brought up 1 node, 4 CPUs
devtmpfs: initialized
clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 19112604462750000 ns
futex hash table entries: 1024 (order: 5, 131072 bytes, linear)
NET: Registered PF_NETLINK/PF_ROUTE protocol family
PCIe: Initializing port 0
PCIe: BIST2 FAILED for port 0 (0x0000000000000003)
PCIe: Link timeout on port 0, probably the slot is empty
PCIe: Initializing port 1
PCIe: BIST FAILED for port 1 (0xffffffffffffffff)
PCIe: Link timeout on port 1, probably the slot is empty
HugeTLB: registered 2.00 MiB page size, pre-allocated 0 pages
HugeTLB: 0 KiB vmemmap can be freed for a 2.00 MiB page
SCSI subsystem initialized
libata version 3.00 loaded.
usbcore: registered new interface driver usbfs
usbcore: registered new interface driver hub
usbcore: registered new device driver usb
pps_core: LinuxPPS API ver. 1 registered
pps_core: Software ver. 5.3.6 - Copyright 2005-2007 Rodolfo Giometti <giometti@linux.it>
PTP clock support registered
EDAC MC: Ver: 3.0.0
PCI host bridge to bus 0000:00
pci_bus 0000:00: root bus resource [mem 0x1000000000000]
pci_bus 0000:00: root bus resource [io  0x0000]
pci_bus 0000:00: No busn resource found for root bus, will use [bus 00-ff]
pci_bus 0000:00: busn_res: [bus 00-ff] end is updated to 00
vgaarb: loaded
clocksource: Switched to clocksource OCTEON_CVMCOUNT
NET: Registered PF_INET protocol family
IP idents hash table entries: 16384 (order: 5, 131072 bytes, linear)
tcp_listen_portaddr_hash hash table entries: 512 (order: 1, 8192 bytes, linear)
Table-perturb hash table entries: 65536 (order: 6, 262144 bytes, linear)
TCP established hash table entries: 8192 (order: 4, 65536 bytes, linear)
TCP bind hash table entries: 8192 (order: 6, 262144 bytes, linear)
TCP: Hash tables configured (established 8192 bind 8192)
UDP hash table entries: 512 (order: 2, 16384 bytes, linear)
UDP-Lite hash table entries: 512 (order: 2, 16384 bytes, linear)
NET: Registered PF_UNIX/PF_LOCAL protocol family
RPC: Registered named UNIX socket transport module.
RPC: Registered udp transport module.
RPC: Registered tcp transport module.
RPC: Registered tcp NFSv4.1 backchannel transport module.
PCI: CLS 0 bytes, default 128
platform 1180068000000.uctl: clocks initialized.
platform 1180069000000.uctl: clocks initialized.
Starting KVM with MIPS VZ extensions
workingset: timestamp_bits=62 max_order=18 bucket_order=0
NFS: Registering the id_resolver key type
Key type id_resolver registered
Key type id_legacy registered
nfs4filelayout_init: NFSv4 File Layout Driver Registering...
nfs4flexfilelayout_init: NFSv4 Flexfile Layout Driver Registering...
io scheduler mq-deadline registered
io scheduler kyber registered
io scheduler bfq registered
gpio gpiochip0: Static allocation of GPIO base is deprecated, use dynamic allocation.
octeon_gpio 1070000000800.gpio-controller: OCTEON GPIO driver probed.
Serial: 8250/16550 driver, 2 ports, IRQ sharing disabled
printk: console [ttyS0] disabled
1180000000800.serial: ttyS0 at MMIO 0x1180000000800 (irq = 34, base_baud = 25000000) is a OCTEON
printk: console [ttyS0] enabled
printk: console [ttyS0] enabled
printk: bootconsole [early0] disabled
printk: bootconsole [early0] disabled
1180000000c00.serial: ttyS1 at MMIO 0x1180000000c00 (irq = 35, base_baud = 25000000) is a OCTEON
loop: module loaded
Driver 'pata_octeon_cf' needs updating - please use bus_type methods
slram: not enough parameters.
spi-octeon 1070000001000.spi: OCTEON SPI bus driver
process '/bin/kmod' started with executable stack
do_page_fault(): sending SIGSEGV to modprobe for invalid read access from 0000000000000298
epc = 000000fff3346470 in ld.so.1[fff3328000+2e000]
ra  = 000000fff33456d0 in ld.so.1[fff3328000+2e000]
do_page_fault(): sending SIGSEGV to modprobe for invalid read access from 0000000000000298
epc = 000000fff3c78470 in ld.so.1[fff3c5a000+2e000]
ra  = 000000fff3c776d0 in ld.so.1[fff3c5a000+2e000]
do_page_fault(): sending SIGSEGV to modprobe for invalid read access from 0000000000021da8
epc = 000000fff35aa2c0 in ld.so.1[fff358d000+2e000]
ra  = 000000fff35aa688 in ld.so.1[fff358d000+2e000]
do_page_fault(): sending SIGSEGV to modprobe for invalid read access from 0000000000000298
epc = 000000fff34cc470 in ld.so.1[fff34ae000+2e000]
ra  = 000000fff34cb6d0 in ld.so.1[fff34ae000+2e000]
mdio_octeon 1180000001800.mdio: Probed
mdio_octeon 1180000001900.mdio: Probed
dwc3 1680000000000.xhci: Configuration mismatch. dr_mode forced to host
dwc3 1690000000000.xhci: Configuration mismatch. dr_mode forced to host
xhci-hcd xhci-hcd.0.auto: xHCI Host Controller
xhci-hcd xhci-hcd.0.auto: new USB bus registered, assigned bus number 1
xhci-hcd xhci-hcd.0.auto: hcc params 0x0220f06d hci version 0x100 quirks 0x0000000002010010
xhci-hcd xhci-hcd.0.auto: irq 25, io mem 0x1680000000000
dwc3 1680000000000.xhci: xhci_plat_probe get usb3phy fail (ret=-6)
xhci-hcd xhci-hcd.0.auto: xHCI Host Controller
xhci-hcd xhci-hcd.0.auto: new USB bus registered, assigned bus number 2
xhci-hcd xhci-hcd.0.auto: Host supports USB 3.0 SuperSpeed
hub 1-0:1.0: USB hub found
hub 1-0:1.0: 1 port detected
usb usb2: We don't know the algorithms for LPM for this host, disabling LPM.
hub 2-0:1.0: USB hub found
hub 2-0:1.0: 1 port detected
xhci-hcd xhci-hcd.1.auto: xHCI Host Controller
xhci-hcd xhci-hcd.1.auto: new USB bus registered, assigned bus number 3
xhci-hcd xhci-hcd.1.auto: hcc params 0x0220f06d hci version 0x100 quirks 0x0000000002010010
xhci-hcd xhci-hcd.1.auto: irq 26, io mem 0x1690000000000
dwc3 1690000000000.xhci: xhci_plat_probe get usb3phy fail (ret=-6)
xhci-hcd xhci-hcd.1.auto: xHCI Host Controller
xhci-hcd xhci-hcd.1.auto: new USB bus registered, assigned bus number 4
xhci-hcd xhci-hcd.1.auto: Host supports USB 3.0 SuperSpeed
hub 3-0:1.0: USB hub found
hub 3-0:1.0: 1 port detected
usb usb4: We don't know the algorithms for LPM for this host, disabling LPM.
hub 4-0:1.0: USB hub found
hub 4-0:1.0: 1 port detected
usbcore: registered new interface driver usb-storage
i2c-octeon 1180000001000.i2c: probed
i2c-octeon 1180000001200.i2c: probed
octeon_wdt: Initial granularity 5 Sec
EDAC DEVICE0: Giving out device to module octeon-cpu controller cache: DEV octeon_pc_edac (INTERRUPT)
EDAC DEVICE1: Giving out device to module octeon-l2c controller octeon_l2c_err: DEV octeon_l2c_edac (POLLED)
octeon_lmc_edac octeon_lmc_edac.0: Disabled (ECC not enabled)
Interface 0 has 4 ports (SGMII)
Interface 1 has 4 ports (SGMII)
Interface 3 has 4 ports (LOOP)
NET: Registered PF_INET6 protocol family
Segment Routing with IPv6
In-situ OAM (IOAM) with IPv6
sit: IPv6, IPv4 and MPLS over IPv4 tunneling driver
NET: Registered PF_PACKET protocol family
Key type dns_resolver registered
OF: fdt: not creating '/sys/firmware/fdt': CRC check failed
Freeing unused kernel image (initmem) memory: 27488K
This architecture does not have kernel memory protection.
Run /init as init process
  with arguments:
    /init
  with environment:
    HOME=/
    TERM=linux
do_page_fault(): sending SIGSEGV to init for invalid read access from 0000000000021da8
epc = 000000fff3a542c0 in ld.so.1[fff3a37000+2e000]
ra  = 000000fff3a54688 in ld.so.1[fff3a37000+2e000]
Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b ]---


* [PATCH v2 07/23 replacement] mips: add pte_unmap() to balance pte_offset_map()
  2023-06-15 21:22           ` Hugh Dickins
@ 2023-06-15 23:02             ` Hugh Dickins
  2023-06-17  3:54               ` Yu Zhao
  0 siblings, 1 reply; 36+ messages in thread
From: Hugh Dickins @ 2023-06-15 23:02 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Nathan Chancellor, Hugh Dickins, Mike Kravetz, Mike Rapoport,
	Kirill A. Shutemov, Matthew Wilcox, David Hildenbrand,
	Suren Baghdasaryan, Qi Zheng, Peter Zijlstra, Russell King,
	Catalin Marinas, Will Deacon, Geert Uytterhoeven, Greg Ungerer,
	Michal Simek, Thomas Bogendoerfer, Helge Deller,
	John David Anglin, Aneesh Kumar K.V, Michael Ellerman,
	Alexandre Ghiti, Palmer Dabbelt, Heiko Carstens,
	Christian Borntraeger, Claudio Imbrenda, Alexander Gordeev,
	John Paul Adrian Glaubitz, David S. Miller, Chris Zankel,
	Max Filippov, Yu Zhao, x86, linux-arm-kernel, linux-ia64,
	linux-m68k, linux-mips, linux-parisc, linuxppc-dev, linux-riscv,
	linux-s390, linux-sh, sparclinux, linux-kernel, linux-mm

To keep balance in future, __update_tlb() must remember to pte_unmap() after
pte_offset_map().  This is an odd case, since the caller has already done
pte_offset_map_lock(), then mips forgets the address and recalculates it;
but my two naive attempts to clean that up did more harm than good.

Tested-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Hugh Dickins <hughd@google.com>
---
Andrew, please replace my mips patch, and its build warning fix patch,
in mm-unstable by this less ambitious but working replacement - thanks.

 arch/mips/mm/tlb-r4k.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/arch/mips/mm/tlb-r4k.c b/arch/mips/mm/tlb-r4k.c
index 1b939abbe4ca..93c2d695588a 100644
--- a/arch/mips/mm/tlb-r4k.c
+++ b/arch/mips/mm/tlb-r4k.c
@@ -297,7 +297,7 @@ void __update_tlb(struct vm_area_struct * vma, unsigned long address, pte_t pte)
 	p4d_t *p4dp;
 	pud_t *pudp;
 	pmd_t *pmdp;
-	pte_t *ptep;
+	pte_t *ptep, *ptemap = NULL;
 	int idx, pid;
 
 	/*
@@ -344,7 +344,12 @@ void __update_tlb(struct vm_area_struct * vma, unsigned long address, pte_t pte)
 	} else
 #endif
 	{
-		ptep = pte_offset_map(pmdp, address);
+		ptemap = ptep = pte_offset_map(pmdp, address);
+		/*
+		 * update_mmu_cache() is called between pte_offset_map_lock()
+		 * and pte_unmap_unlock(), so we can assume that ptep is not
+		 * NULL here: and what should be done below if it were NULL?
+		 */
 
 #if defined(CONFIG_PHYS_ADDR_T_64BIT) && defined(CONFIG_CPU_MIPS32)
 #ifdef CONFIG_XPA
@@ -373,6 +378,9 @@ void __update_tlb(struct vm_area_struct * vma, unsigned long address, pte_t pte)
 	tlbw_use_hazard();
 	htw_start();
 	flush_micro_tlb_vm(vma);
+
+	if (ptemap)
+		pte_unmap(ptemap);
 	local_irq_restore(flags);
 }
 
-- 
2.35.3



* Re: [PATCH v2 07/23 replacement] mips: add pte_unmap() to balance pte_offset_map()
  2023-06-15 23:02             ` [PATCH v2 07/23 replacement] mips: add pte_unmap() to balance pte_offset_map() Hugh Dickins
@ 2023-06-17  3:54               ` Yu Zhao
  2023-06-18 20:57                 ` Yu Zhao
  0 siblings, 1 reply; 36+ messages in thread
From: Yu Zhao @ 2023-06-17  3:54 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Andrew Morton, Nathan Chancellor, Mike Kravetz, Mike Rapoport,
	Kirill A. Shutemov, Matthew Wilcox, David Hildenbrand,
	Suren Baghdasaryan, Qi Zheng, Peter Zijlstra, Russell King,
	Catalin Marinas, Will Deacon, Geert Uytterhoeven, Greg Ungerer,
	Michal Simek, Thomas Bogendoerfer, Helge Deller,
	John David Anglin, Aneesh Kumar K.V, Michael Ellerman,
	Alexandre Ghiti, Palmer Dabbelt, Heiko Carstens,
	Christian Borntraeger, Claudio Imbrenda, Alexander Gordeev,
	John Paul Adrian Glaubitz, David S. Miller, Chris Zankel,
	Max Filippov, x86, linux-arm-kernel, linux-ia64, linux-m68k,
	linux-mips, linux-parisc, linuxppc-dev, linux-riscv, linux-s390,
	linux-sh, sparclinux, linux-kernel, linux-mm

On Thu, Jun 15, 2023 at 04:02:43PM -0700, Hugh Dickins wrote:
> To keep balance in future, __update_tlb() must remember to pte_unmap() after
> pte_offset_map().  This is an odd case, since the caller has already done
> pte_offset_map_lock(), then mips forgets the address and recalculates it;
> but my two naive attempts to clean that up did more harm than good.
> 
> Tested-by: Nathan Chancellor <nathan@kernel.org>
> Signed-off-by: Hugh Dickins <hughd@google.com>

FWIW: Tested-by: Yu Zhao <yuzhao@google.com>

There is another problem, likely caused by khugepaged, that has happened multiple times. But I don't think it's related to your series, just FYI.

  Got mcheck at ffffffff81134ef0
  CPU: 3 PID: 36 Comm: khugepaged Not tainted 6.4.0-rc6-00049-g62d8779610bb-dirty #1
  $ 0   : 0000000000000000 0000000000000014 40000000011ac004 4000000000000000
  $ 4   : c000000000000000 0000000000000045 000000011a80045b 000000011a80045b
  $ 8   : 8000000080188000 ffffffff81b526c0 0000000000000200 0000000000000000
  $12   : 0000000000000028 ffffffff81910cb4 0000000000000000 0000000000000207
  $16   : 000000aaab800000 80000000037ee990 ffffffff81b50200 8000000005066ae0
  $20   : 0000000000000001 ffffffff80000000 ffffffff81c10000 000000aaab800000
  $24   : 0000000000000002 ffffffff812b75f8
  $28   : 8000000002310000 8000000002313b00 ffffffff81b50000 ffffffff81134d88
  Hi    : 000000000000017a
  Lo    : 0000000000000000
  epc   : ffffffff81134ef0 __update_tlb+0x260/0x2a0
  ra    : ffffffff81134d88 __update_tlb+0xf8/0x2a0
  Status: 14309ce2	KX SX UX KERNEL EXL
  Cause : 00800060 (ExcCode 18)
  PrId  : 000d9602 (Cavium Octeon III)
  CPU: 3 PID: 36 Comm: khugepaged Not tainted 6.4.0-rc6-00049-g62d8779610bb-dirty #1
  Stack : 0000000000000001 0000000000000000 0000000000000008 8000000002313768
          8000000002313768 80000000023138f8 0000000000000000 0000000000000000
          a6c8cd76e1667e00 8000000001db4f28 0000000000000001 30302d3663722d30
          643236672d393430 0000000000000010 ffffffff81910cc0 0000000000000000
          8000000001d96bcc 0000000000000000 0000000000000000 ffffffff81a68ed0
          ffffffff81b50000 0000000000000001 ffffffff80000000 ffffffff81c10000
          000000aaab800000 0000000000000002 ffffffff815b78c0 ffffffffa184e710
          00000000000000c0 8000000002310000 8000000002313760 ffffffff81b50000
          ffffffff8111c9cc 0000000000000000 0000000000000000 0000000000000000
          0000000000000000 0000000000000000 ffffffff8111c9ec 0000000000000000
          ...
  Call Trace:
  [<ffffffff8111c9ec>] show_stack+0x64/0x158
  [<ffffffff81920078>] dump_stack_lvl+0x5c/0x7c
  [<ffffffff8111e03c>] do_mcheck+0x2c/0x98
  [<ffffffff81118608>] handle_mcheck_int+0x38/0x50
  
  Index    : 80000000
  PageMask : 1fe000
  EntryHi  : 000000aaab8000bd
  EntryLo0 : 40000000011a8004
  EntryLo1 : 40000000011ac004
  Wired    : 0
  PageGrain: e8000000
  
  Index:  2 pgmask=4kb va=c00000feffff4000 asid=b9
  	[ri=0 xi=0 pa=000022a7000 c=0 d=1 v=1 g=1] [ri=0 xi=0 pa=000022af000 c=0 d=1 v=1 g=1]
  Index:  3 pgmask=4kb va=c00000feffff8000 asid=b9
  	[ri=0 xi=0 pa=00002380000 c=0 d=1 v=1 g=1] [ri=0 xi=0 pa=00002381000 c=0 d=1 v=1 g=1]
  Index:  4 pgmask=4kb va=c00000feffffa000 asid=b9
  	[ri=0 xi=0 pa=000023e9000 c=0 d=1 v=1 g=1] [ri=0 xi=0 pa=000023ea000 c=0 d=1 v=1 g=1]
  Index:  5 pgmask=4kb va=c00000feffffe000 asid=b9
  	[ri=0 xi=0 pa=00002881000 c=0 d=1 v=1 g=1] [ri=0 xi=0 pa=00002882000 c=0 d=1 v=1 g=1]
  Index:  6 pgmask=4kb va=c00000fefffb0000 asid=b9
  	[ri=0 xi=0 pa=00002cc2000 c=0 d=1 v=1 g=1] [ri=0 xi=0 pa=00002cc3000 c=0 d=1 v=1 g=1]
  Index:  7 pgmask=4kb va=c00000feffffc000 asid=b9
  	[ri=0 xi=0 pa=000023eb000 c=0 d=1 v=1 g=1] [ri=0 xi=0 pa=00002880000 c=0 d=1 v=1 g=1]
  Index:  8 pgmask=4kb va=c00000feffff6000 asid=b9
  	[ri=0 xi=0 pa=0000237e000 c=0 d=1 v=1 g=1] [ri=0 xi=0 pa=0000237f000 c=0 d=1 v=1 g=1]
  Index: 14 pgmask=4kb va=c00000fefff62000 asid=8e
  	[ri=0 xi=0 pa=00007477000 c=0 d=1 v=1 g=1] [ri=0 xi=0 pa=0000745e000 c=0 d=1 v=1 g=1]
  Index: 15 pgmask=4kb va=c00000fefff52000 asid=8e
  	[ri=0 xi=0 pa=0000744c000 c=0 d=1 v=1 g=1] [ri=0 xi=0 pa=0000616d000 c=0 d=1 v=1 g=1]
  Index: 16 pgmask=4kb va=c00000fefff42000 asid=8e
  	[ri=0 xi=0 pa=00006334000 c=0 d=1 v=1 g=1] [ri=0 xi=0 pa=0000616b000 c=0 d=1 v=1 g=1]
  Index: 19 pgmask=4kb va=c00000fefffb6000 asid=8e
  	[ri=0 xi=0 pa=00005050000 c=0 d=1 v=1 g=1] [ri=0 xi=0 pa=00005051000 c=0 d=1 v=1 g=1]
  Index: 20 pgmask=4kb va=c00000fefff72000 asid=b9
  	[ri=0 xi=0 pa=00007504000 c=0 d=1 v=1 g=1] [ri=0 xi=0 pa=00007503000 c=0 d=1 v=1 g=1]
  Index: 58 pgmask=4kb va=c00000fefffaa000 asid=8e
  	[ri=0 xi=0 pa=00005126000 c=0 d=1 v=1 g=1] [ri=0 xi=0 pa=00005127000 c=0 d=1 v=1 g=1]
  Index: 59 pgmask=4kb va=c00000fefffba000 asid=8e
  	[ri=0 xi=0 pa=00005129000 c=0 d=1 v=1 g=1] [ri=0 xi=0 pa=0000512a000 c=0 d=1 v=1 g=1]
  Index: 79 pgmask=4kb va=c000000000060000 asid=8e
  	[ri=0 xi=0 pa=0000534b000 c=0 d=1 v=1 g=1] [ri=0 xi=0 pa=000062f9000 c=0 d=1 v=1 g=1]
  Index: 80 pgmask=4kb va=c00000000005e000 asid=8e
  	[ri=0 xi=0 pa=00000000000 c=0 d=0 v=0 g=1] [ri=0 xi=0 pa=00004013000 c=0 d=1 v=1 g=1]
  Index: 81 pgmask=4kb va=c0000000003a0000 asid=8e
  	[ri=0 xi=0 pa=000060c6000 c=0 d=1 v=1 g=1] [ri=0 xi=0 pa=0000340e000 c=0 d=1 v=1 g=1]
  Index: 82 pgmask=4kb va=c00000000039e000 asid=8e
  	[ri=0 xi=0 pa=00000000000 c=0 d=0 v=0 g=1] [ri=0 xi=0 pa=000060c5000 c=0 d=1 v=1 g=1]
  Index: 83 pgmask=4kb va=c00000000003e000 asid=8e
  	[ri=0 xi=0 pa=00002bf3000 c=0 d=1 v=1 g=1] [ri=0 xi=0 pa=00002c42000 c=0 d=1 v=1 g=1]
  Index: 84 pgmask=4kb va=c000000000042000 asid=8e
  	[ri=0 xi=0 pa=00002c45000 c=0 d=1 v=1 g=1] [ri=0 xi=0 pa=00002c46000 c=0 d=1 v=1 g=1]
  Index: 85 pgmask=4kb va=0aaab820000 asid=bd
  	[ri=0 xi=0 pa=00000000000 c=0 d=0 v=0 g=0] [ri=0 xi=0 pa=00000000000 c=0 d=0 v=0 g=0]
  Index: 86 pgmask=4kb va=0aaab748000 asid=bd
  	[ri=0 xi=1 pa=0003c959000 c=0 d=1 v=1 g=0] [ri=0 xi=1 pa=0000f7b6000 c=0 d=0 v=1 g=0]
  Index: 87 pgmask=4kb va=0fff37c4000 asid=bd
  	[ri=0 xi=0 pa=0000bd23000 c=0 d=0 v=1 g=0] [ri=0 xi=0 pa=0000bd24000 c=0 d=0 v=1 g=0]
  Index: 88 pgmask=4kb va=0fff3992000 asid=bd
  	[ri=0 xi=1 pa=0000bfcd000 c=0 d=0 v=1 g=0] [ri=0 xi=1 pa=0002977b000 c=0 d=0 v=1 g=0]
  Index: 89 pgmask=4kb va=0fff3288000 asid=bd
  	[ri=0 xi=0 pa=00032b62000 c=0 d=0 v=1 g=0] [ri=0 xi=0 pa=00032b63000 c=0 d=0 v=1 g=0]
  Index: 90 pgmask=4kb va=0fff3982000 asid=bd
  	[ri=0 xi=1 pa=0002d6a3000 c=0 d=1 v=1 g=0] [ri=0 xi=1 pa=0002a423000 c=0 d=0 v=1 g=0]
  Index: 91 pgmask=4kb va=0fffbb5e000 asid=bd
  	[ri=0 xi=0 pa=00028949000 c=0 d=1 v=1 g=0] [ri=0 xi=0 pa=00035060000 c=0 d=1 v=1 g=0]
  Index: 92 pgmask=4kb va=c00000fefffe2000 asid=8e
  	[ri=0 xi=0 pa=000020f0000 c=0 d=1 v=1 g=1] [ri=0 xi=0 pa=000020ff000 c=0 d=1 v=1 g=1]
  Index: 93 pgmask=4kb va=c00000fefffd2000 asid=8e
  	[ri=0 xi=0 pa=000020b7000 c=0 d=1 v=1 g=1] [ri=0 xi=0 pa=000020fe000 c=0 d=1 v=1 g=1]
  Index: 94 pgmask=4kb va=c00000fefffc2000 asid=8e
  	[ri=0 xi=0 pa=000020b6000 c=0 d=1 v=1 g=1] [ri=0 xi=0 pa=000020fd000 c=0 d=1 v=1 g=1]
  Index: 110 pgmask=4kb va=c00000feffff2000 asid=bc
  	[ri=0 xi=0 pa=000020f1000 c=0 d=1 v=1 g=1] [ri=0 xi=0 pa=00002100000 c=0 d=1 v=1 g=1]
  Index: 125 pgmask=4kb va=c00000fefffbe000 asid=bc
  	[ri=0 xi=0 pa=00005268000 c=0 d=1 v=1 g=1] [ri=0 xi=0 pa=000053dc000 c=0 d=1 v=1 g=1]
  Index: 126 pgmask=4kb va=c00000fefffbc000 asid=bc
  	[ri=0 xi=0 pa=00005266000 c=0 d=1 v=1 g=1] [ri=0 xi=0 pa=00005267000 c=0 d=1 v=1 g=1]
  Index: 188 pgmask=4kb va=c00000fefff76000 asid=bb
  	[ri=0 xi=0 pa=00007576000 c=0 d=1 v=1 g=1] [ri=0 xi=0 pa=00007577000 c=0 d=1 v=1 g=1]
  
  Code: 1000ff92  00601025  00000000 <42000006> 1000ffb8  00000000  00000000  8f820018  00021238
  Kernel panic - not syncing: Caught Machine Check exception - caused by multiple matching entries in the TLB.
  ---[ end Kernel panic - not syncing: Caught Machine Check exception - caused by multiple matching entries in the TLB. ]---


* Re: [PATCH v2 07/23 replacement] mips: add pte_unmap() to balance pte_offset_map()
  2023-06-17  3:54               ` Yu Zhao
@ 2023-06-18 20:57                 ` Yu Zhao
  0 siblings, 0 replies; 36+ messages in thread
From: Yu Zhao @ 2023-06-18 20:57 UTC (permalink / raw)
  To: Hugh Dickins, Nathan Chancellor, Thomas Bogendoerfer
  Cc: Andrew Morton, Mike Kravetz, Mike Rapoport, Kirill A. Shutemov,
	Matthew Wilcox, David Hildenbrand, Suren Baghdasaryan, Qi Zheng,
	Peter Zijlstra, Russell King, Catalin Marinas, Will Deacon,
	Geert Uytterhoeven, Greg Ungerer, Michal Simek, Helge Deller,
	John David Anglin, Aneesh Kumar K.V, Michael Ellerman,
	Alexandre Ghiti, Palmer Dabbelt, Heiko Carstens,
	Christian Borntraeger, Claudio Imbrenda, Alexander Gordeev,
	John Paul Adrian Glaubitz, David S. Miller, Chris Zankel,
	Max Filippov, x86, linux-arm-kernel, linux-ia64, linux-m68k,
	linux-parisc, linuxppc-dev, linux-riscv, linux-s390, linux-sh,
	sparclinux, linux-kernel, linux-mm, linux-mips

On Fri, Jun 16, 2023 at 9:54 PM Yu Zhao <yuzhao@google.com> wrote:
>
> On Thu, Jun 15, 2023 at 04:02:43PM -0700, Hugh Dickins wrote:
> > To keep balance in future, __update_tlb() must remember to pte_unmap() after
> > pte_offset_map().  This is an odd case, since the caller has already done
> > pte_offset_map_lock(), then mips forgets the address and recalculates it;
> > but my two naive attempts to clean that up did more harm than good.
> >
> > Tested-by: Nathan Chancellor <nathan@kernel.org>
> > Signed-off-by: Hugh Dickins <hughd@google.com>
>
> FWIW: Tested-by: Yu Zhao <yuzhao@google.com>
>
> There is another problem, likely caused by khugepaged, that has happened multiple times. But I don't think it's related to your series, just FYI.
>
>   Got mcheck at ffffffff81134ef0
>   CPU: 3 PID: 36 Comm: khugepaged Not tainted 6.4.0-rc6-00049-g62d8779610bb-dirty #1

...

>   Kernel panic - not syncing: Caught Machine Check exception - caused by multiple matching entries in the TLB.

In case anyone plans to try to fix this - the problem goes back to at
least 5.15 stable. My (educated) guess is that nobody complained about
it because all the testing is done in QEMU, which does NOT detect
conflicting TLBs. This means the verification of the fix would need to
be on a real piece of h/w or an updated QEMU.

In target/mips/tcg/sysemu/tlb_helper.c:

static void r4k_fill_tlb(CPUMIPSState *env, int idx)
{
    r4k_tlb_t *tlb;
    uint64_t mask = env->CP0_PageMask >> (TARGET_PAGE_BITS + 1);

    /* XXX: detect conflicting TLBs and raise a MCHECK exception when needed */
...
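
For anyone wanting to prototype that missing check, here is a minimal
user-space sketch of what "detect conflicting TLBs" could mean. It is not
QEMU code and not the kernel's: struct tlb_entry, pair_span(),
tlb_entries_conflict() and tlb_find_conflict() are hypothetical names made
up for illustration, and the matching rule is a simplification (an entry
covers an even/odd page pair; PageMask plus the low 13 address bits are
ignored when comparing; ASIDs are ignored if either entry is global).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified TLB entry: one entry maps an even/odd page pair. */
struct tlb_entry {
    uint64_t va;        /* base virtual address of the page pair */
    uint64_t pagemask;  /* raw PageMask value, 0 for 4 KiB pages */
    uint8_t  asid;      /* address space identifier */
    bool     global;    /* G bit: entry matches regardless of ASID */
};

/* Address bits ignored when matching: PageMask plus the low 13 bits. */
static uint64_t pair_span(const struct tlb_entry *e)
{
    return e->pagemask | 0x1fffULL;
}

/* True if a single lookup could hit both entries at once. */
static bool tlb_entries_conflict(const struct tlb_entry *a,
                                 const struct tlb_entry *b)
{
    /* Compare only the bits that both entries actually look at. */
    uint64_t ignore = pair_span(a) | pair_span(b);

    if (((a->va ^ b->va) & ~ignore) != 0)
        return false;   /* the two address ranges do not overlap */
    return a->global || b->global || a->asid == b->asid;
}

/*
 * Return the index of the first entry that conflicts with new_entry,
 * or -1 if there is none. skip_idx is the slot about to be overwritten
 * (pass n to skip nothing).
 */
static int tlb_find_conflict(const struct tlb_entry *tlb, size_t n,
                             const struct tlb_entry *new_entry,
                             size_t skip_idx)
{
    assert(tlb != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (i != skip_idx && tlb_entries_conflict(&tlb[i], new_entry))
            return (int)i;
    }
    return -1;
}
```

Fed with values from the dump earlier in this thread (new entry
va=0xaaab800000, PageMask=0x1fe000, asid=0xbd against the 4kb entry at
Index 85, va=0x0aaab820000, asid=0xbd), this sketch reports a conflict,
while the entry at Index 86 (va=0x0aaab748000) does not overlap. Whether
Index 85 is the entry that actually triggered the machine check is an
assumption on my part, not something the dump proves.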


* Re: [PATCH v2 08/23] parisc: add pte_unmap() to balance get_ptep()
  2023-06-08 19:18 ` [PATCH v2 08/23] parisc: add pte_unmap() to balance get_ptep() Hugh Dickins
@ 2023-06-19  3:55   ` Helge Deller
  0 siblings, 0 replies; 36+ messages in thread
From: Helge Deller @ 2023-06-19  3:55 UTC (permalink / raw)
  To: Hugh Dickins, Andrew Morton
  Cc: Mike Kravetz, Mike Rapoport, Kirill A. Shutemov, Matthew Wilcox,
	David Hildenbrand, Suren Baghdasaryan, Qi Zheng, Peter Zijlstra,
	Russell King, Catalin Marinas, Will Deacon, Geert Uytterhoeven,
	Greg Ungerer, Michal Simek, Thomas Bogendoerfer,
	John David Anglin, Aneesh Kumar K.V, Michael Ellerman,
	Alexandre Ghiti, Palmer Dabbelt, Heiko Carstens,
	Christian Borntraeger, Claudio Imbrenda, Alexander Gordeev,
	John Paul Adrian Glaubitz, David S. Miller, Chris Zankel,
	Max Filippov, x86, linux-arm-kernel, linux-ia64, linux-m68k,
	linux-mips, linux-parisc, linuxppc-dev, linux-riscv, linux-s390,
	linux-sh, sparclinux, linux-kernel, linux-mm

On 6/8/23 21:18, Hugh Dickins wrote:
> To keep balance in future, remember to pte_unmap() after a successful
> get_ptep().  And act as if flush_cache_pages() really needs a map there,
> to read the pfn before "unmapping", to be sure page table is not removed.
>
> Signed-off-by: Hugh Dickins <hughd@google.com>

For the parisc parts:

Acked-by: Helge Deller <deller@gmx.de> # parisc

Helge


> ---
>   arch/parisc/kernel/cache.c | 26 +++++++++++++++++++++-----
>   1 file changed, 21 insertions(+), 5 deletions(-)
>
> diff --git a/arch/parisc/kernel/cache.c b/arch/parisc/kernel/cache.c
> index ca4a302d4365..501160250bb7 100644
> --- a/arch/parisc/kernel/cache.c
> +++ b/arch/parisc/kernel/cache.c
> @@ -426,10 +426,15 @@ void flush_dcache_page(struct page *page)
>   		offset = (pgoff - mpnt->vm_pgoff) << PAGE_SHIFT;
>   		addr = mpnt->vm_start + offset;
>   		if (parisc_requires_coherency()) {
> +			bool needs_flush = false;
>   			pte_t *ptep;
>
>   			ptep = get_ptep(mpnt->vm_mm, addr);
> -			if (ptep && pte_needs_flush(*ptep))
> +			if (ptep) {
> +				needs_flush = pte_needs_flush(*ptep);
> +				pte_unmap(ptep);
> +			}
> +			if (needs_flush)
>   				flush_user_cache_page(mpnt, addr);
>   		} else {
>   			/*
> @@ -561,14 +566,20 @@ EXPORT_SYMBOL(flush_kernel_dcache_page_addr);
>   static void flush_cache_page_if_present(struct vm_area_struct *vma,
>   	unsigned long vmaddr, unsigned long pfn)
>   {
> -	pte_t *ptep = get_ptep(vma->vm_mm, vmaddr);
> +	bool needs_flush = false;
> +	pte_t *ptep;
>
>   	/*
>   	 * The pte check is racy and sometimes the flush will trigger
>   	 * a non-access TLB miss. Hopefully, the page has already been
>   	 * flushed.
>   	 */
> -	if (ptep && pte_needs_flush(*ptep))
> +	ptep = get_ptep(vma->vm_mm, vmaddr);
> +	if (ptep) {
> +		needs_flush = pte_needs_flush(*ptep);
> +		pte_unmap(ptep);
> +	}
> +	if (needs_flush)
>   		flush_cache_page(vma, vmaddr, pfn);
>   }
>
> @@ -635,17 +646,22 @@ static void flush_cache_pages(struct vm_area_struct *vma, unsigned long start, u
>   	pte_t *ptep;
>
>   	for (addr = start; addr < end; addr += PAGE_SIZE) {
> +		bool needs_flush = false;
>   		/*
>   		 * The vma can contain pages that aren't present. Although
>   		 * the pte search is expensive, we need the pte to find the
>   		 * page pfn and to check whether the page should be flushed.
>   		 */
>   		ptep = get_ptep(vma->vm_mm, addr);
> -		if (ptep && pte_needs_flush(*ptep)) {
> +		if (ptep) {
> +			needs_flush = pte_needs_flush(*ptep);
> +			pfn = pte_pfn(*ptep);
> +			pte_unmap(ptep);
> +		}
> +		if (needs_flush) {
>   			if (parisc_requires_coherency()) {
>   				flush_user_cache_page(vma, addr);
>   			} else {
> -				pfn = pte_pfn(*ptep);
>   				if (WARN_ON(!pfn_valid(pfn)))
>   					return;
>   				__flush_cache_page(vma, addr, PFN_PHYS(pfn));



end of thread, other threads:[~2023-06-19  3:57 UTC | newest]

Thread overview: 36+ messages
2023-06-08 19:07 [PATCH v2 00/23] arch: allow pte_offset_map[_lock]() to fail Hugh Dickins
2023-06-08 19:10 ` [PATCH v2 01/23] arm: " Hugh Dickins
2023-06-08 19:11 ` [PATCH v2 02/23] arm64: allow pte_offset_map() " Hugh Dickins
2023-06-08 19:13 ` [PATCH v2 03/23] arm64/hugetlb: pte_alloc_huge() pte_offset_huge() Hugh Dickins
2023-06-08 19:14 ` [PATCH v2 04/23] ia64/hugetlb: " Hugh Dickins
2023-06-08 19:15 ` [PATCH v2 05/23] m68k: allow pte_offset_map[_lock]() to fail Hugh Dickins
2023-06-08 19:16 ` [PATCH v2 06/23] microblaze: allow pte_offset_map() " Hugh Dickins
2023-06-08 19:17 ` [PATCH v2 07/23] mips: update_mmu_cache() can replace __update_tlb() Hugh Dickins
2023-06-09  8:08   ` [PATCH v2 07/23 fix] mips: update_mmu_cache() can replace __update_tlb(): fix Hugh Dickins
2023-06-14 23:17   ` [PATCH v2 07/23] mips: update_mmu_cache() can replace __update_tlb() Nathan Chancellor
2023-06-15  0:26     ` Hugh Dickins
2023-06-15  5:43       ` Hugh Dickins
2023-06-15 15:50         ` Nathan Chancellor
2023-06-15 21:22           ` Hugh Dickins
2023-06-15 23:02             ` [PATCH v2 07/23 replacement] mips: add pte_unmap() to balance pte_offset_map() Hugh Dickins
2023-06-17  3:54               ` Yu Zhao
2023-06-18 20:57                 ` Yu Zhao
2023-06-15 22:07     ` [PATCH v2 07/23] mips: update_mmu_cache() can replace __update_tlb() Yu Zhao
2023-06-08 19:18 ` [PATCH v2 08/23] parisc: add pte_unmap() to balance get_ptep() Hugh Dickins
2023-06-19  3:55   ` Helge Deller
2023-06-08 19:20 ` [PATCH v2 09/23] parisc: unmap_uncached_pte() use pte_offset_kernel() Hugh Dickins
2023-06-08 19:21 ` [PATCH v2 10/23] parisc/hugetlb: pte_alloc_huge() pte_offset_huge() Hugh Dickins
2023-06-08 19:22 ` [PATCH v2 11/23] powerpc: kvmppc_unmap_free_pmd() pte_offset_kernel() Hugh Dickins
2023-06-08 19:23 ` [PATCH v2 12/23] powerpc: allow pte_offset_map[_lock]() to fail Hugh Dickins
2023-06-08 19:24 ` [PATCH v2 13/23] powerpc/hugetlb: pte_alloc_huge() Hugh Dickins
2023-06-08 19:25 ` [PATCH v2 14/23] riscv/hugetlb: pte_alloc_huge() pte_offset_huge() Hugh Dickins
2023-06-08 19:27 ` [PATCH v2 15/23] s390: allow pte_offset_map_lock() to fail Hugh Dickins
2023-06-13 11:45   ` Claudio Imbrenda
2023-06-08 19:29 ` [PATCH v2 16/23] s390: gmap use pte_unmap_unlock() not spin_unlock() Hugh Dickins
2023-06-08 19:30 ` [PATCH v2 17/23] sh/hugetlb: pte_alloc_huge() pte_offset_huge() Hugh Dickins
2023-06-08 19:31 ` [PATCH v2 18/23] sparc/hugetlb: " Hugh Dickins
2023-06-08 19:32 ` [PATCH v2 19/23] sparc: allow pte_offset_map() to fail Hugh Dickins
2023-06-08 19:33 ` [PATCH v2 20/23] sparc: iounit and iommu use pte_offset_kernel() Hugh Dickins
2023-06-08 19:35 ` [PATCH v2 21/23] x86: Allow get_locked_pte() to fail Hugh Dickins
2023-06-08 19:36 ` [PATCH v2 22/23] x86: sme_populate_pgd() use pte_offset_kernel() Hugh Dickins
2023-06-08 19:37 ` [PATCH v2 23/23] xtensa: add pte_unmap() to balance pte_offset_map() Hugh Dickins
