From: Minchan Kim <minchan@kernel.org>
To: Nadav Amit <nadav.amit@gmail.com>
Cc: Mel Gorman <mgorman@suse.de>, Andy Lutomirski <luto@kernel.org>,
"open list:MEMORY MANAGEMENT" <linux-mm@kvack.org>
Subject: Re: Potential race in TLB flush batching?
Date: Thu, 27 Jul 2017 09:34:34 +0900
Message-ID: <20170727003434.GA537@bbox>
In-Reply-To: <60FF1876-AC4F-49BB-BC36-A144C3B6EA9E@gmail.com>
On Wed, Jul 26, 2017 at 05:09:09PM -0700, Nadav Amit wrote:
> Minchan Kim <minchan@kernel.org> wrote:
>
> > Hello Nadav,
> >
> > On Wed, Jul 26, 2017 at 12:18:37PM -0700, Nadav Amit wrote:
> >> Mel Gorman <mgorman@suse.de> wrote:
> >>
> >>> On Wed, Jul 26, 2017 at 02:43:06PM +0900, Minchan Kim wrote:
> >>>>> I'm relying on the fact you are the madv_free author to determine if
> >>>>> it's really necessary. The race in question is CPU 0 running madv_free
> >>>>> and updating some PTEs while CPU 1 is also running madv_free and looking
> >>>>> at the same PTEs. CPU 1 may have writable TLB entries for a page but fail
> >>>>> the pte_dirty check (because CPU 0 has updated it already) and potentially
> >>>>> fail to flush. Hence, when madv_free on CPU 1 returns, there are still
> >>>>> potentially writable TLB entries and the underlying PTE is still present
> >>>>> so that a subsequent write does not necessarily propagate the dirty bit
> >>>>> to the underlying PTE any more. Reclaim at some unknown time at the future
> >>>>> may then see that the PTE is still clean and discard the page even though
> >>>>> a write has happened in the meantime. I think this is possible but I could
> >>>>> have missed some protection in madv_free that prevents it happening.
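
To make the window concrete, the interleaving reads like this (my sketch;
the exact madv_free steps are paraphrased from the description above):

	CPU 0 (madv_free)                CPU 1 (madv_free)
	-----------------                -----------------
	clears dirty bit, updates PTE
	                                 pte_dirty() sees a clean PTE
	                                 -> skips the TLB flush and returns
	                                    with a writable TLB entry live
	application writes through the stale entry;
	the dirty bit never reaches the PTE
	reclaim later sees a clean PTE and discards the page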
> >>>>
> >>>> Thanks for the detail. You didn't miss anything. It can happen, and then
> >>>> it's a real bug. IOW, if an application writes something after madv_free,
> >>>> it must see the written value, not zero.
> >>>>
> >>>> How about adding [set|clear]_tlb_flush_pending to the TLB batching
> >>>> interface? With it, when tlb_finish_mmu is called, we can know that we
> >>>> skipped the flush but a flush is pending, so we flush forcefully to avoid
> >>>> the madv_dontneed as well as the madv_free scenario.
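
To restate the contract in user terms, a minimal userspace sketch (my
illustration; it assumes MADV_FREE is exposed by <sys/mman.h>, and the
final assert is where the race would show up as data loss):

#include <assert.h>
#include <string.h>
#include <sys/mman.h>

int main(void)
{
	size_t len = 4096;
	char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	assert(p != MAP_FAILED);
	memset(p, 1, len);		/* dirty the page */
	madvise(p, len, MADV_FREE);	/* kernel may discard it lazily */
	p[0] = 2;			/* but a write afterwards must stick */

	/* Must hold even if reclaim runs in between; with the stale-TLB
	 * race above, the page can be discarded and this would read 0. */
	assert(p[0] == 2);
	return 0;
}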
> >>>
> >>> I *think* this is ok as it's simply more expensive on the KSM side in
> >>> the event of a race but no other harmful change is made, assuming that
> >>> KSM is the only race-prone case. The check for mm_tlb_flush_pending also
> >>> happens under the PTL so there should be sufficient protection from the
> >>> mm struct update being visible at the right time.
> >>>
> >>> Check, using the test program from "mm: Always flush VMA ranges affected
> >>> by zap_page_range v2", whether it handles the madvise case as well, as
> >>> that would give some degree of safety. Make sure it's tested against
> >>> 4.13-rc2 instead of mmotm, which already includes the madv_dontneed fix.
> >>> If yours works for both then it supersedes the mmotm patch.
> >>>
> >>> It would also be interesting if Nadav would use his slowdown hack to see
> >>> if he can still force the corruption.
> >>
> >> The proposed fix for the KSM side is likely to work (I will try later), but
> >> on the tlb_finish_mmu() side, I think there is a problem, since if any TLB
> >> flush is performed by tlb_flush_mmu(), flush_tlb_mm_range() will not be
> >> executed. This means that tlb_finish_mmu() may flush one TLB entry, leave
> >> another one stale and not flush it.
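
To spell that out, the flow in question looks roughly like this (a sketch,
not the verbatim mmotm source; flush_tlb_mm_range is the x86 primitive):

void tlb_finish_mmu(struct mmu_gather *tlb,
		unsigned long start, unsigned long end)
{
	bool flushed = tlb_flush_mmu(tlb);	/* flushes tlb->start..tlb->end only */

	/*
	 * If tlb_flush_mmu() flushed anything at all, the fallback below
	 * is skipped, so a second range left pending by another thread is
	 * never flushed and a stale TLB entry can survive.
	 */
	if (!flushed && mm_tlb_flush_pending(tlb->mm))
		flush_tlb_mm_range(tlb->mm, start, end, 0UL);
}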
> >
> > Okay, I will change that part like this to avoid the partial flush problem.
> >
> > diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> > index 1c42d69490e4..87d0ebac6605 100644
> > --- a/include/linux/mm_types.h
> > +++ b/include/linux/mm_types.h
> > @@ -529,10 +529,13 @@ static inline cpumask_t *mm_cpumask(struct mm_struct *mm)
> >   * The barriers below prevent the compiler from re-ordering the instructions
> >   * around the memory barriers that are already present in the code.
> >   */
> > -static inline bool mm_tlb_flush_pending(struct mm_struct *mm)
> > +static inline int mm_tlb_flush_pending(struct mm_struct *mm)
> >  {
> > +	int nr_pending;
> > +
> >  	barrier();
> > -	return atomic_read(&mm->tlb_flush_pending) > 0;
> > +	nr_pending = atomic_read(&mm->tlb_flush_pending);
> > +	return nr_pending;
> >  }
> >
> >  static inline void set_tlb_flush_pending(struct mm_struct *mm)
> >  {
> > diff --git a/mm/memory.c b/mm/memory.c
> > index d5c5e6497c70..b5320e96ec51 100644
> > --- a/mm/memory.c
> > +++ b/mm/memory.c
> > @@ -286,11 +286,15 @@ bool tlb_flush_mmu(struct mmu_gather *tlb)
> >  void tlb_finish_mmu(struct mmu_gather *tlb, unsigned long start, unsigned long end)
> >  {
> >  	struct mmu_gather_batch *batch, *next;
> > -	bool flushed = tlb_flush_mmu(tlb);
> >
> > +	if (!tlb->fullmm && !tlb->need_flush_all &&
> > +			mm_tlb_flush_pending(tlb->mm) > 1) {
>
> I saw you noticed my comment about accessing the flag without a lock. I
> must say it feels strange that a memory barrier would be needed here, but
> that is what I understood from the documentation.
I saw your recent barriers fix patch, too:

    [PATCH v2 2/2] mm: migrate: fix barriers around tlb_flush_pending

As I commented there, I hope to use the call below here without having to
be aware of the complex barrier details; mm_tlb_flush_pending should issue
the right barrier internally:

	mm_tlb_flush_pending(tlb->mm, false:no-pte-locked) > 1
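
Something like the below is what I have in mind (just a sketch; the
pte_locked argument and the barrier placement are assumptions, not a
tested patch):

static inline int mm_tlb_flush_pending(struct mm_struct *mm, bool pte_locked)
{
	/*
	 * When the caller holds the PTL, the lock already orders this
	 * read against the PTE updates; otherwise issue the barrier
	 * here so callers need not reason about ordering themselves.
	 */
	if (!pte_locked)
		smp_mb();

	return atomic_read(&mm->tlb_flush_pending);
}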
>
> > +		tlb->start = min(start, tlb->start);
> > +		tlb->end = max(end, tlb->end);
>
> Err, you open-code mmu_gather, which is arch-specific. It appears that all of
> them have start and end members, but not need_flush_all. Besides, I am not
Looking at tlb_gather_mmu, which is not arch-specific, it initializes
need_flush_all to zero, so it should be harmless even if an architecture
never sets the flag.

Please correct me if I am missing something.
> sure whether they regard start and end the same way.
I understand your worry, but my patch takes the wider range via min/max,
so I cannot see how it would break. While looking at the code, I found
__tlb_adjust_range, so it is better to use that than to open-code it:
diff --git a/mm/memory.c b/mm/memory.c
index b5320e96ec51..b23188daa396 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -288,10 +288,8 @@ void tlb_finish_mmu(struct mmu_gather *tlb, unsigned long start, unsigned long e
 	struct mmu_gather_batch *batch, *next;
 
 	if (!tlb->fullmm && !tlb->need_flush_all &&
-			mm_tlb_flush_pending(tlb->mm) > 1) {
-		tlb->start = min(start, tlb->start);
-		tlb->end = max(end, tlb->end);
-	}
+			mm_tlb_flush_pending(tlb->mm) > 1)
+		__tlb_adjust_range(tlb, start, end - start);
 
 	tlb_flush_mmu(tlb);
 	clear_tlb_flush_pending(tlb->mm);
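
For reference, __tlb_adjust_range in include/asm-generic/tlb.h around
v4.13 looks like the below (quoted from memory, so treat the exact form
as an assumption); note it takes the mmu_gather itself, not the mm:

static inline void __tlb_adjust_range(struct mmu_gather *tlb,
				      unsigned long address,
				      unsigned int range_size)
{
	tlb->start = min(tlb->start, address);
	tlb->end = max(tlb->end, address + range_size);
}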