Linux-mm Archive on lore.kernel.org
 help / color / Atom feed
From: Catalin Marinas <catalin.marinas@arm.com>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Yang Shi <yang.shi@linux.alibaba.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Hillf Danton <hdanton@sina.com>, Hugh Dickins <hughd@google.com>,
	Josef Bacik <josef@toxicpanda.com>,
	"Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>,
	Linux-MM <linux-mm@kvack.org>,
	mm-commits@vger.kernel.org, Will Deacon <will.deacon@arm.com>,
	Matthew Wilcox <willy@infradead.org>,
	xuyu@linux.alibaba.com
Subject: Re: [patch 01/15] mm/memory.c: avoid access flag update TLB flush for retried page fault
Date: Sat, 25 Jul 2020 16:58:43 +0100
Message-ID: <20200725155841.GA14490@gaia> (raw)
In-Reply-To: <CAHk-=wi8ztx7E3fmU3anHSUzxC9g+ac7kdBOop9UWdvdG-jFOg@mail.gmail.com>

On Fri, Jul 24, 2020 at 06:29:43PM -0700, Linus Torvalds wrote:
> On Fri, Jul 24, 2020 at 5:37 PM Yang Shi <yang.shi@linux.alibaba.com> wrote:
> > A follow-up question about your comment in the previous email "The
> > notion of "this is a retry, so let's do nothing" is fundamentally
> > wrong.", do you mean it is not safe?
> 
> I mean it fails my "smell test".
> 
> The patch didn't just avoid the TLB flush, it avoided all the other
> "mark it dirty and young" things too. And that made me go "why would
> RETRY be different in this regard"?

I had a similar concern, couldn't convince myself it's entirely safe.
Even if it is safe now, some mm change in the future may break the
current assumptions.

The arm64 do_page_fault() sets FAULT_FLAG_TRIED if a previous
handle_mm_fault() returns VM_FAULT_RETRY. A quick grep for
VM_FAULT_RETRY shows a few more instances than what Yang listed. Maybe
they are all safe, I just couldn't get my head around it.

> For any architecture that guarantees that a page fault will always
> flush the old TLB entry for this kind of situation, that
> flush_tlb_fix_spurious_fault() thing can be a no-op.
> 
> So that's why on x86, we just do
> 
>   #define flush_tlb_fix_spurious_fault(vma, address) do { } while (0)
> 
> and have no issues.
> 
> Note that it does *not* need to do any cross-CPU flushing or anything
> like that. So it's actually wrong (I think) to have that default
> fallback for
> 
>    #define flush_tlb_fix_spurious_fault(vma, address)
> flush_tlb_page(vma, address)
> 
> because flush_tlb_page() is the serious "do cross CPU etc".
> 
> Does the arm64 flush_tlb_page() perhaps do the whole expensive
> cross-CPU thing rather than the much cheaper "just local invalidate"
> version?

I think it makes sense to have a local-only
flush_tlb_fix_spurious_fault(), but with ptep_set_access_flags() updated
to still issue the full broadcast TLBI. In addition, I added a minor
optimisation to avoid the TLB flush if the old pte was not accessible.
In a read-access fault case (followed by mkyoung), the TLB wouldn't have
cached a non-accessible pte (not sure it makes much difference to Yang's
case). Anyway, from ARMv8.1 onwards, the hardware handles the access
flag automatically.

I'm not sure the first dsb(nshst) below is of much use in this case. If
we got a spurious fault, the write to the pte happened on a different
CPU (IIUC, we shouldn't return to user with updated ptes without a TLB
flush on the same CPU). Anyway, we can refine this if it solves Yang's
performance regression.

-------------8<-----------------------
diff --git a/arch/arm64/include/asm/tlbflush.h b/arch/arm64/include/asm/tlbflush.h
index d493174415db..d1401cbad7d4 100644
--- a/arch/arm64/include/asm/tlbflush.h
+++ b/arch/arm64/include/asm/tlbflush.h
@@ -268,6 +268,20 @@ static inline void flush_tlb_page(struct vm_area_struct *vma,
 	dsb(ish);
 }
 
+static inline void local_flush_tlb_page(struct vm_area_struct *vma,
+					unsigned long uaddr)
+{
+	unsigned long addr = __TLBI_VADDR(uaddr, ASID(vma->vm_mm));
+
+	dsb(nshst);
+	__tlbi(vale1, addr);
+	__tlbi_user(vale1, addr);
+	dsb(nsh);
+}
+
+#define flush_tlb_fix_spurious_fault(vma, address) \
+	local_flush_tlb_page(vma, address)
+
 /*
  * This is meant to avoid soft lock-ups on large TLB flushing ranges and not
  * necessarily a performance improvement.
diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
index 8afb238ff335..0accee714cc2 100644
--- a/arch/arm64/mm/fault.c
+++ b/arch/arm64/mm/fault.c
@@ -218,7 +218,9 @@ int ptep_set_access_flags(struct vm_area_struct *vma,
 		pteval = cmpxchg_relaxed(&pte_val(*ptep), old_pteval, pteval);
 	} while (pteval != old_pteval);
 
-	flush_tlb_fix_spurious_fault(vma, address);
+	if (pte_accessible(vma->vm_mm, pte))
+		flush_tlb_page(vma, address);
+
 	return 1;
 }
 

-- 
Catalin


  reply index

Thread overview: 69+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-07-24  4:14 incoming Andrew Morton
2020-07-24  4:15 ` [patch 01/15] mm/memory.c: avoid access flag update TLB flush for retried page fault Andrew Morton
2020-07-24  4:38   ` Yang Shi
2020-07-24  4:56     ` Andrew Morton
2020-07-24 19:27   ` Linus Torvalds
2020-07-24 20:22     ` Linus Torvalds
2020-07-25  0:36       ` Yang Shi
2020-07-25  1:29         ` Linus Torvalds
2020-07-25 15:58           ` Catalin Marinas [this message]
2020-07-28  9:22             ` Will Deacon
2020-07-28  9:39               ` Catalin Marinas
2020-07-28 10:07                 ` Yu Xu
2020-07-28 11:46                   ` Catalin Marinas
2020-07-28 10:21                 ` Will Deacon
2020-07-28 18:28                 ` Linus Torvalds
2020-07-27 17:52           ` Yang Shi
2020-07-27 18:04             ` Linus Torvalds
2020-07-27 18:42               ` Catalin Marinas
2020-07-27 20:56                 ` Linus Torvalds
2020-07-27 22:34               ` Yang Shi
2020-07-27  7:31       ` Yu Xu
2020-07-27 11:05         ` Catalin Marinas
2020-07-27 17:01           ` Linus Torvalds
2020-07-28 11:19             ` Catalin Marinas
2020-07-27 17:12           ` Yu Xu
2020-07-27 18:04             ` Yang Shi
2020-07-27 18:37               ` Linus Torvalds
2020-07-27 22:43                 ` Yang Shi
2020-07-28  0:38                   ` Linus Torvalds
2020-07-28  0:13                 ` Yu Xu
2020-07-28 10:53                 ` Nicholas Piggin
2020-07-28 19:02                   ` Linus Torvalds
2020-07-28 22:53                     ` Nicholas Piggin
2020-07-29 13:58                       ` Michael Ellerman
2020-07-28  6:41             ` Yu Xu
2020-07-24  4:15 ` [patch 02/15] mm/mmap.c: close race between munmap() and expand_upwards()/downwards() Andrew Morton
2020-07-24  4:15 ` [patch 03/15] vfs/xattr: mm/shmem: kernfs: release simple xattr entry in a right way Andrew Morton
2020-07-24  4:15 ` [patch 04/15] mm: initialize return of vm_insert_pages Andrew Morton
2020-07-24  4:15 ` [patch 05/15] mm/memcontrol: fix OOPS inside mem_cgroup_get_nr_swap_pages() Andrew Morton
2020-07-24  4:15 ` [patch 06/15] mm/memcg: fix refcount error while moving and swapping Andrew Morton
2020-07-24 13:41   ` Alex Shi
2020-07-24  4:15 ` [patch 07/15] mm: memcg/slab: fix memory leak at non-root kmem_cache destroy Andrew Morton
2020-07-24  4:15 ` [patch 08/15] mm/hugetlb: avoid hardcoding while checking if cma is enabled Andrew Morton
2020-07-24  4:15 ` [patch 09/15] khugepaged: fix null-pointer dereference due to race Andrew Morton
2020-07-24  4:15 ` [patch 10/15] mailmap: add entry for Mike Rapoport Andrew Morton
2020-07-24  4:15 ` [patch 11/15] squashfs: fix length field overlap check in metadata reading Andrew Morton
2020-07-24  4:15 ` [patch 12/15] scripts/decode_stacktrace: strip basepath from all paths Andrew Morton
2020-07-24  4:15 ` [patch 13/15] io-mapping: indicate mapping failure Andrew Morton
2020-07-24  4:15 ` [patch 14/15] MAINTAINERS: add KCOV section Andrew Morton
2020-07-24  4:15 ` [patch 15/15] scripts/gdb: fix lx-symbols 'gdb.error' while loading modules Andrew Morton
2020-07-28  1:19 ` mmotm 2020-07-27-18-18 uploaded Andrew Morton
2020-07-28  2:14   ` Stephen Rothwell
2020-07-28  3:22   ` mmotm 2020-07-27-18-18 uploaded (drivers/scsi/ufs/: SCSI_UFS_EXYNOS) Randy Dunlap
2020-07-28  8:23     ` Alim Akhtar
2020-07-28 12:33   ` mmotm 2020-07-27-18-18 uploaded (mm/page_alloc.c) Randy Dunlap
2020-07-28 21:55     ` Andrew Morton
2020-07-28 22:20       ` Stephen Rothwell
2020-07-28 22:31         ` Andrew Morton
2020-07-29 14:18           ` Michael S. Tsirkin
2020-07-29 14:38             ` David Hildenbrand
2020-07-29 16:14               ` David Hildenbrand
2020-07-29 17:29                 ` Randy Dunlap
2020-07-28 22:39       ` Randy Dunlap
2020-07-29  1:43         ` Nathan Chancellor
2020-07-29  1:44         ` Andrew Morton
2020-07-29  2:04           ` Randy Dunlap
2020-07-29 14:09           ` make oldconfig (Re: mmotm 2020-07-27-18-18 uploaded (mm/page_alloc.c)) Alexey Dobriyan
2020-07-31 23:46 ` mmotm 2020-07-31-16-45 uploaded Andrew Morton
2020-08-01  5:24   ` mmotm 2020-07-31-16-45 uploaded (drivers/staging/vc04_services/) Randy Dunlap

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200725155841.GA14490@gaia \
    --to=catalin.marinas@arm.com \
    --cc=akpm@linux-foundation.org \
    --cc=hannes@cmpxchg.org \
    --cc=hdanton@sina.com \
    --cc=hughd@google.com \
    --cc=josef@toxicpanda.com \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=linux-mm@kvack.org \
    --cc=mm-commits@vger.kernel.org \
    --cc=torvalds@linux-foundation.org \
    --cc=will.deacon@arm.com \
    --cc=willy@infradead.org \
    --cc=xuyu@linux.alibaba.com \
    --cc=yang.shi@linux.alibaba.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Linux-mm Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-mm/0 linux-mm/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-mm linux-mm/ https://lore.kernel.org/linux-mm \
		linux-mm@kvack.org
	public-inbox-index linux-mm

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kvack.linux-mm


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git