From: Minchan Kim <minchan@kernel.org>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Nadav Amit <namit@vmware.com>,
linux-mm@kvack.org, nadav.amit@gmail.com,
linux-kernel@vger.kernel.org, akpm@linux-foundation.org,
Ingo Molnar <mingo@redhat.com>,
Russell King <linux@armlinux.org.uk>,
Tony Luck <tony.luck@intel.com>,
Martin Schwidefsky <schwidefsky@de.ibm.com>,
"David S. Miller" <davem@davemloft.net>,
Heiko Carstens <heiko.carstens@de.ibm.com>,
Yoshinori Sato <ysato@users.sourceforge.jp>,
Jeff Dike <jdike@addtoit.com>,
linux-arch@vger.kernel.org
Subject: Re: [PATCH v6 6/7] mm: fix MADV_[FREE|DONTNEED] TLB flush miss problem
Date: Mon, 14 Aug 2017 10:26:17 +0900 [thread overview]
Message-ID: <20170814012617.GB25427@bbox> (raw)
In-Reply-To: <20170811133020.zozuuhbw72lzolj5@hirez.programming.kicks-ass.net>
Hi Peter,
On Fri, Aug 11, 2017 at 03:30:20PM +0200, Peter Zijlstra wrote:
> On Tue, Aug 01, 2017 at 05:08:17PM -0700, Nadav Amit wrote:
> > void tlb_finish_mmu(struct mmu_gather *tlb,
> > unsigned long start, unsigned long end)
> > {
> > - arch_tlb_finish_mmu(tlb, start, end);
> > + /*
> > + * If there are parallel threads are doing PTE changes on same range
> > + * under non-exclusive lock(e.g., mmap_sem read-side) but defer TLB
> > + * flush by batching, a thread has stable TLB entry can fail to flush
> > + * the TLB by observing pte_none|!pte_dirty, for example so flush TLB
> > + * forcefully if we detect parallel PTE batching threads.
> > + */
> > + bool force = mm_tlb_flush_nested(tlb->mm);
> > +
> > + arch_tlb_finish_mmu(tlb, start, end, force);
> > }
>
> I don't understand the comment nor the ordering. What guarantees we see
> the increment if we need to?
How about this about commenting part?
From 05f06fd6aba14447a9ca2df8b810fbcf9a58e14b Mon Sep 17 00:00:00 2001
From: Minchan Kim <minchan@kernel.org>
Date: Mon, 14 Aug 2017 10:16:56 +0900
Subject: [PATCH] mm: add describable comment for TLB batch race
[1] is a rather subtle/complicated bug so that it's hard to
understand it with limited code comment.
This patch adds a sequence diagaram to explain the problem
more easily, I hope.
[1] 99baac21e458, mm: fix MADV_[FREE|DONTNEED] TLB flush miss problem
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Nadav Amit <namit@vmware.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Minchan Kim <minchan@kernel.org>
---
mm/memory.c | 25 +++++++++++++++++++++++++
1 file changed, 25 insertions(+)
diff --git a/mm/memory.c b/mm/memory.c
index bcbe56f52163..f571b0eb9816 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -413,12 +413,37 @@ void tlb_gather_mmu(struct mmu_gather *tlb, struct mm_struct *mm,
void tlb_finish_mmu(struct mmu_gather *tlb,
unsigned long start, unsigned long end)
{
+
+
/*
* If there are parallel threads are doing PTE changes on same range
* under non-exclusive lock(e.g., mmap_sem read-side) but defer TLB
* flush by batching, a thread has stable TLB entry can fail to flush
* the TLB by observing pte_none|!pte_dirty, for example so flush TLB
* forcefully if we detect parallel PTE batching threads.
+ *
+ * Example: MADV_DONTNEED stale TLB problem on same range
+ *
+ * CPU 0 CPU 1
+ * *a = 1;
+ * MADV_DONTNEED
+ * MADV_DONTNEED tlb_gather_mmu
+ * tlb_gather_mmu
+ * down_read(mmap_sem) down_read(mmap_sem)
+ * pte_lock
+ * pte_get_and_clear
+ * tlb_remove_tlb_entry
+ * pte_unlock
+ * pte_lock
+ * found out the pte is none
+ * pte_unlock
+ * tlb_finish_mmu doesn't flush
+ *
+ * Access the address with stale TLB
+ * *a = 2;ie, success without segfault
+ * tlb_finish_mmu flush on range
+ * but it is too late.
+ *
*/
bool force = mm_tlb_flush_nested(tlb->mm);
--
2.7.4
WARNING: multiple messages have this Message-ID (diff)
From: Minchan Kim <minchan@kernel.org>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Nadav Amit <namit@vmware.com>,
linux-mm@kvack.org, nadav.amit@gmail.com,
linux-kernel@vger.kernel.org, akpm@linux-foundation.org,
Ingo Molnar <mingo@redhat.com>,
Russell King <linux@armlinux.org.uk>,
Tony Luck <tony.luck@intel.com>,
Martin Schwidefsky <schwidefsky@de.ibm.com>,
"David S. Miller" <davem@davemloft.net>,
Heiko Carstens <heiko.carstens@de.ibm.com>,
Yoshinori Sato <ysato@users.sourceforge.jp>,
Jeff Dike <jdike@addtoit.com>,
linux-arch@vger.kernel.org
Subject: Re: [PATCH v6 6/7] mm: fix MADV_[FREE|DONTNEED] TLB flush miss problem
Date: Mon, 14 Aug 2017 10:26:17 +0900 [thread overview]
Message-ID: <20170814012617.GB25427@bbox> (raw)
Message-ID: <20170814012617.38ip7KUv-4nQr2LlmL3_9PbWJviNcqGaNk-iZ_7Oc5s@z> (raw)
In-Reply-To: <20170811133020.zozuuhbw72lzolj5@hirez.programming.kicks-ass.net>
Hi Peter,
On Fri, Aug 11, 2017 at 03:30:20PM +0200, Peter Zijlstra wrote:
> On Tue, Aug 01, 2017 at 05:08:17PM -0700, Nadav Amit wrote:
> > void tlb_finish_mmu(struct mmu_gather *tlb,
> > unsigned long start, unsigned long end)
> > {
> > - arch_tlb_finish_mmu(tlb, start, end);
> > + /*
> > + * If there are parallel threads are doing PTE changes on same range
> > + * under non-exclusive lock(e.g., mmap_sem read-side) but defer TLB
> > + * flush by batching, a thread has stable TLB entry can fail to flush
> > + * the TLB by observing pte_none|!pte_dirty, for example so flush TLB
> > + * forcefully if we detect parallel PTE batching threads.
> > + */
> > + bool force = mm_tlb_flush_nested(tlb->mm);
> > +
> > + arch_tlb_finish_mmu(tlb, start, end, force);
> > }
>
> I don't understand the comment nor the ordering. What guarantees we see
> the increment if we need to?
How about this about commenting part?
next prev parent reply other threads:[~2017-08-14 1:26 UTC|newest]
Thread overview: 35+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <20170802000818.4760-1-namit@vmware.com>
2017-08-02 0:08 ` [PATCH v6 4/7] mm: refactoring TLB gathering API Nadav Amit
2017-08-02 0:08 ` Nadav Amit
2017-08-11 9:23 ` Peter Zijlstra
2017-08-11 9:23 ` Peter Zijlstra
2017-08-11 17:12 ` Nadav Amit
2017-08-11 17:12 ` Nadav Amit
2017-08-14 0:49 ` Minchan Kim
2017-08-02 0:08 ` [PATCH v6 6/7] mm: fix MADV_[FREE|DONTNEED] TLB flush miss problem Nadav Amit
2017-08-02 0:08 ` Nadav Amit
2017-08-08 1:19 ` [lkp-robot] [mm] 7674270022: will-it-scale.per_process_ops -19.3% regression kernel test robot
2017-08-08 2:28 ` Minchan Kim
2017-08-08 2:28 ` Minchan Kim
2017-08-08 4:23 ` Nadav Amit
2017-08-08 4:23 ` Nadav Amit
2017-08-08 5:51 ` Nadav Amit
2017-08-08 5:51 ` Nadav Amit
2017-08-08 8:08 ` Minchan Kim
2017-08-08 8:08 ` Minchan Kim
2017-08-08 8:16 ` Nadav Amit
2017-08-09 1:25 ` Ye Xiaolong
2017-08-09 2:59 ` Ye Xiaolong
2017-08-09 2:59 ` Ye Xiaolong
2017-08-10 4:13 ` Minchan Kim
2017-08-10 4:13 ` Minchan Kim
2017-08-10 4:14 ` Nadav Amit
2017-08-10 4:14 ` Nadav Amit
2017-08-10 4:20 ` Minchan Kim
2017-08-11 13:30 ` [PATCH v6 6/7] mm: fix MADV_[FREE|DONTNEED] TLB flush miss problem Peter Zijlstra
2017-08-11 13:30 ` Peter Zijlstra
2017-08-13 6:14 ` Nadav Amit
2017-08-13 6:14 ` Nadav Amit
2017-08-13 12:08 ` Peter Zijlstra
2017-08-13 12:08 ` Peter Zijlstra
2017-08-14 1:26 ` Minchan Kim [this message]
2017-08-14 1:26 ` Minchan Kim
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20170814012617.GB25427@bbox \
--to=minchan@kernel.org \
--cc=akpm@linux-foundation.org \
--cc=davem@davemloft.net \
--cc=heiko.carstens@de.ibm.com \
--cc=jdike@addtoit.com \
--cc=linux-arch@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=linux@armlinux.org.uk \
--cc=mingo@redhat.com \
--cc=nadav.amit@gmail.com \
--cc=namit@vmware.com \
--cc=peterz@infradead.org \
--cc=schwidefsky@de.ibm.com \
--cc=tony.luck@intel.com \
--cc=ysato@users.sourceforge.jp \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).