From: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
To: linux-mm@kvack.org, akpm@linux-foundation.org
Cc: mpe@ellerman.id.au, linuxppc-dev@lists.ozlabs.org, kaleshsingh@google.com,
	npiggin@gmail.com, joel@joelfernandes.org,
	Christophe Leroy <christophe.leroy@csgroup.eu>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	"Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
Subject: [PATCH v6 updated 9/11] mm/mremap: Fix race between mremap and pageout
Date: Mon, 24 May 2021 19:08:56 +0530
Message-ID: <20210524133856.85060-1-aneesh.kumar@linux.ibm.com> (raw)
In-Reply-To: <20210524090114.63446-10-aneesh.kumar@linux.ibm.com>

CPU 1                           CPU 2                           CPU 3

mremap(old_addr, new_addr)      page_shrinker/try_to_unmap_one

                                addr = old_addr
                                lock(pte_ptl)
lock(pmd_ptl)
pmd = *old_pmd
pmd_clear(old_pmd)
flush_tlb_range(old_addr)

*new_pmd = pmd
                                                                *new_addr = 10; and fills
                                                                TLB with new addr and
                                                                old pfn

unlock(pmd_ptl)
                                ptep_get_and_clear()
                                flush_tlb_range(old_addr)
                                old pfn is free.
                                                                Stale TLB entry

Avoid the above race with MOVE_PMD by holding the pte ptl in mremap and
waiting for the parallel page table walk to finish operating on the pte
before updating new_pmd.

With MOVE_PUD, enable MOVE_PUD only if USE_SPLIT_PTE_PTLOCKS is
disabled. In that case both the pte ptl and the pud ptl point to
mm->page_table_lock.

Fixes: c49dd3401802 ("mm: speedup mremap on 1GB or larger regions")
Fixes: 2c91bd4a4e2e ("mm: speed up mremap by 20x on large regions")
Link: https://lore.kernel.org/linux-mm/CAHk-=wgXVR04eBNtxQfevontWnP6FDm+oj5vauQXP3S-huwbPw@mail.gmail.com
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
Changes:
* Check for split PTL before taking the pte ptl lock.
 mm/mremap.c | 26 +++++++++++++++++++++++++-
 1 file changed, 25 insertions(+), 1 deletion(-)

diff --git a/mm/mremap.c b/mm/mremap.c
index 8967a3707332..2fa3e0cb6176 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -224,7 +224,7 @@ static inline void flush_pte_tlb_pwc_range(struct vm_area_struct *vma,
 static bool move_normal_pmd(struct vm_area_struct *vma, unsigned long old_addr,
 		  unsigned long new_addr, pmd_t *old_pmd, pmd_t *new_pmd)
 {
-	spinlock_t *old_ptl, *new_ptl;
+	spinlock_t *pte_ptl, *old_ptl, *new_ptl;
 	struct mm_struct *mm = vma->vm_mm;
 	pmd_t pmd;

@@ -254,6 +254,7 @@ static bool move_normal_pmd(struct vm_area_struct *vma, unsigned long old_addr,
 	if (WARN_ON_ONCE(!pmd_none(*new_pmd)))
 		return false;

+
 	/*
 	 * We don't have to worry about the ordering of src and dst
 	 * ptlocks because exclusive mmap_lock prevents deadlock.
@@ -263,6 +264,10 @@ static bool move_normal_pmd(struct vm_area_struct *vma, unsigned long old_addr,
 	if (new_ptl != old_ptl)
 		spin_lock_nested(new_ptl, SINGLE_DEPTH_NESTING);

+	if (pmd_none(*old_pmd))
+		goto unlock_out;
+
+	pte_ptl = pte_lockptr(mm, old_pmd);
 	/* Clear the pmd */
 	pmd = *old_pmd;
 	pmd_clear(old_pmd);
@@ -270,9 +275,20 @@ static bool move_normal_pmd(struct vm_area_struct *vma, unsigned long old_addr,
 	 * flush the TLB before we move the page table entries.
 	 */
 	flush_pte_tlb_pwc_range(vma, old_addr, old_addr + PMD_SIZE);
+
+	/*
+	 * Take the ptl here so that we wait for the parallel page table
+	 * walk and operations (eg: pageout) using the old addr to finish.
+	 */
+	if (USE_SPLIT_PTE_PTLOCKS)
+		spin_lock(pte_ptl);
+
 	VM_BUG_ON(!pmd_none(*new_pmd));
 	pmd_populate(mm, new_pmd, pmd_pgtable(pmd));
+	if (USE_SPLIT_PTE_PTLOCKS)
+		spin_unlock(pte_ptl);
+unlock_out:
 	if (new_ptl != old_ptl)
 		spin_unlock(new_ptl);
 	spin_unlock(old_ptl);
@@ -296,6 +312,14 @@ static bool move_normal_pud(struct vm_area_struct *vma, unsigned long old_addr,
 	struct mm_struct *mm = vma->vm_mm;
 	pud_t pud;

+	/*
+	 * Disable MOVE_PUD until we get the pageout done with all
+	 * higher level page table locks held. With SPLIT_PTE_PTLOCKS
+	 * we use mm->page_table_lock for both the pte ptl and pud ptl.
+	 */
+	if (USE_SPLIT_PTE_PTLOCKS)
+		return false;
+
 	/*
 	 * The destination pud shouldn't be established, free_pgtables()
 	 * should have released it.
-- 
2.31.1