From: Nadav Amit <nadav.amit@gmail.com>
To: Minchan Kim <minchan@kernel.org>
Cc: Mel Gorman <mgorman@suse.de>, Andy Lutomirski <luto@kernel.org>,
	"open list:MEMORY MANAGEMENT" <linux-mm@kvack.org>
Subject: Re: Potential race in TLB flush batching?
Date: Wed, 26 Jul 2017 17:48:58 -0700
Message-ID: <77AFE0A4-FE3D-4E05-B248-30ADE2F184EF@gmail.com>
In-Reply-To: <20170727003434.GA537@bbox>

Minchan Kim <minchan@kernel.org> wrote:

> On Wed, Jul 26, 2017 at 05:09:09PM -0700, Nadav Amit wrote:
>> Minchan Kim <minchan@kernel.org> wrote:
>> 
>>> Hello Nadav,
>>> 
>>> On Wed, Jul 26, 2017 at 12:18:37PM -0700, Nadav Amit wrote:
>>>> Mel Gorman <mgorman@suse.de> wrote:
>>>> 
>>>>> On Wed, Jul 26, 2017 at 02:43:06PM +0900, Minchan Kim wrote:
>>>>>>> I'm relying on the fact you are the madv_free author to determine if
>>>>>>> it's really necessary. The race in question is CPU 0 running madv_free
>>>>>>> and updating some PTEs while CPU 1 is also running madv_free and looking
>>>>>>> at the same PTEs. CPU 1 may have writable TLB entries for a page but fail
>>>>>>> the pte_dirty check (because CPU 0 has updated it already) and potentially
>>>>>>> fail to flush. Hence, when madv_free on CPU 1 returns, there are still
>>>>>>> potentially writable TLB entries and the underlying PTE is still present
>>>>>>> so that a subsequent write does not necessarily propagate the dirty bit
>>>>>>> to the underlying PTE any more. Reclaim at some unknown time in the future
>>>>>>> may then see that the PTE is still clean and discard the page even though
>>>>>>> a write has happened in the meantime. I think this is possible but I could
>>>>>>> have missed some protection in madv_free that prevents it happening.
>>>>>> 
>>>>>> Thanks for the detail. You didn't miss anything. It can happen, and then
>>>>>> it's really a bug. IOW, if an application writes something after madv_free,
>>>>>> it must see the written value, not zero.
>>>>>> 
>>>>>> How about adding [set|clear]_tlb_flush_pending to the TLB batching interface?
>>>>>> With it, when tlb_finish_mmu is called, we can tell that we skipped the flush
>>>>>> while a flush is still pending, so we flush forcefully to avoid the madv_dontneed
>>>>>> as well as the madv_free scenario.
>>>>> 
>>>>> I *think* this is ok as it's simply more expensive on the KSM side in
>>>>> the event of a race but no other harmful change is made assuming that
>>>>> KSM is the only race-prone. The check for mm_tlb_flush_pending also
>>>>> happens under the PTL so there should be sufficient protection from the
>>>>> mm struct update being visible at the right time.
>>>>> 
>>>>> Check using the test program from "mm: Always flush VMA ranges affected
>>>>> by zap_page_range v2" if it handles the madvise case as well as that
>>>>> would give some degree of safety. Make sure it's tested against 4.13-rc2
>>>>> instead of mmotm which already includes the madv_dontneed fix. If yours
>>>>> works for both then it supersedes the mmotm patch.
>>>>> 
>>>>> It would also be interesting if Nadav would use his slowdown hack to see
>>>>> if he can still force the corruption.
>>>> 
>>>> The proposed fix for the KSM side is likely to work (I will try later), but
>>>> on the tlb_finish_mmu() side, I think there is a problem, since if any TLB
>>>> flush is performed by tlb_flush_mmu(), flush_tlb_mm_range() will not be
>>>> executed. This means that tlb_finish_mmu() may flush one TLB entry, leave
>>>> another one stale and not flush it.
>>> 
>>> Okay, I will change that part like this to avoid partial flush problem.
>>> 
>>> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
>>> index 1c42d69490e4..87d0ebac6605 100644
>>> --- a/include/linux/mm_types.h
>>> +++ b/include/linux/mm_types.h
>>> @@ -529,10 +529,13 @@ static inline cpumask_t *mm_cpumask(struct mm_struct *mm)
>>> * The barriers below prevent the compiler from re-ordering the instructions
>>> * around the memory barriers that are already present in the code.
>>> */
>>> -static inline bool mm_tlb_flush_pending(struct mm_struct *mm)
>>> +static inline int mm_tlb_flush_pending(struct mm_struct *mm)
>>> {
>>> +	int nr_pending;
>>> +
>>> 	barrier();
>>> -	return atomic_read(&mm->tlb_flush_pending) > 0;
>>> +	nr_pending = atomic_read(&mm->tlb_flush_pending);
>>> +	return nr_pending;
>>> }
>>> static inline void set_tlb_flush_pending(struct mm_struct *mm)
>>> {
>>> diff --git a/mm/memory.c b/mm/memory.c
>>> index d5c5e6497c70..b5320e96ec51 100644
>>> --- a/mm/memory.c
>>> +++ b/mm/memory.c
>>> @@ -286,11 +286,15 @@ bool tlb_flush_mmu(struct mmu_gather *tlb)
>>> void tlb_finish_mmu(struct mmu_gather *tlb, unsigned long start, unsigned long end)
>>> {
>>> 	struct mmu_gather_batch *batch, *next;
>>> -	bool flushed = tlb_flush_mmu(tlb);
>>> 
>>> +	if (!tlb->fullmm && !tlb->need_flush_all &&
>>> +			mm_tlb_flush_pending(tlb->mm) > 1) {
>> 
>> I saw you noticed my comment about the access of the flag without a lock. I
>> must say it feels strange that a memory barrier would be needed here, but
>> that is what I understood from the documentation.
> 
> I saw your recent barriers fix patch, too.
> [PATCH v2 2/2] mm: migrate: fix barriers around tlb_flush_pending
> 
> As I commented there, I would like to use something like the line below
> here without having to be aware of the complex barrier details. Instead,
> mm_tlb_flush_pending should call the right barrier internally.
> 
>        mm_tlb_flush_pending(tlb->mm, false:no-pte-locked) > 1

I will address it in v3.
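
To make the request above concrete, here is a minimal user-space sketch of a
pending-count helper that hides the barrier from its callers. It is only an
illustration under stated assumptions, not the v3 patch: struct mm_sketch,
sketch_set_pending() and sketch_flush_pending() are hypothetical names, C11
atomics stand in for the kernel primitives, and the choice of a full fence for
the "PTE lock not held" case is an assumption rather than something the thread
has settled.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical user-space stand-in for the mm_struct field; not kernel code. */
struct mm_sketch {
	atomic_int tlb_flush_pending;	/* number of in-flight batched flushers */
};

/* Models set_tlb_flush_pending(): one more flusher is in flight. */
static void sketch_set_pending(struct mm_sketch *mm)
{
	atomic_fetch_add_explicit(&mm->tlb_flush_pending, 1,
				  memory_order_relaxed);
}

/*
 * Returns the number of pending flushers. The caller only states whether it
 * holds the page-table lock; the helper decides whether a barrier is needed.
 */
static int sketch_flush_pending(struct mm_sketch *mm, bool pte_locked)
{
	int nr_pending;

	/*
	 * Assumption: with the PTE lock held, the lock itself orders this
	 * read; without it, issue an explicit full fence before reading.
	 */
	if (!pte_locked)
		atomic_thread_fence(memory_order_seq_cst);

	nr_pending = atomic_load_explicit(&mm->tlb_flush_pending,
					  memory_order_relaxed);
	assert(nr_pending >= 0);
	return nr_pending;
}
```

A caller in the position of tlb_finish_mmu() would then write
sketch_flush_pending(mm, false) > 1 and never mention barriers itself.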


> 
>>> +		tlb->start = min(start, tlb->start);
>>> +		tlb->end = max(end, tlb->end);
>> 
>> Err… You open-code mmu_gather which is arch-specific. It appears that all of
>> them have start and end members, but not need_flush_all. Besides, I am not
> 
> As far as I can see, tlb_gather_mmu, which is not arch-specific, initializes
> need_flush_all to zero, so it should be harmless even though some
> architectures never set the flag.
> Please correct me if I missed something.

Oh.. my bad. I missed the fact that this code is under “#ifdef
HAVE_GENERIC_MMU_GATHER”. But that means that arch-specific tlb_finish_mmu()
implementations (s390, arm) may need to be modified as well.

>> sure whether they regard start and end the same way.
> 
> I understand your worry, but my patch only widens the range via min/max,
> so I cannot imagine how it would break. While looking at the code, I found
> __tlb_adjust_range, so it is better to use that than to open-code it.
> 
> 
> diff --git a/mm/memory.c b/mm/memory.c
> index b5320e96ec51..b23188daa396 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -288,10 +288,8 @@ void tlb_finish_mmu(struct mmu_gather *tlb, unsigned long start, unsigned long e
> 	struct mmu_gather_batch *batch, *next;
> 
> 	if (!tlb->fullmm && !tlb->need_flush_all &&
> -			mm_tlb_flush_pending(tlb->mm) > 1) {
> -		tlb->start = min(start, tlb->start);
> -		tlb->end = max(end, tlb->end);
> -	}
> +			mm_tlb_flush_pending(tlb->mm) > 1)
> +		__tlb_adjust_range(tlb->mm, start, end - start);
> 
> 	tlb_flush_mmu(tlb);
> 	clear_tlb_flush_pending(tlb->mm);

This one is better, especially as I now understand it is only for the
generic MMU gather (which I missed before).
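
For readers following along, the idea in the hunk above can be modelled in
user space roughly as follows. This is a sketch under assumptions, not the
kernel code: struct mm_model, struct gather_model, gather_begin(),
adjust_range() and gather_finish() are hypothetical stand-ins for mm_struct,
mmu_gather, tlb_gather_mmu(), __tlb_adjust_range() and tlb_finish_mmu(), and
the actual flush is left out. It shows only that a gather widens its range to
the whole [start, end) when another flusher is pending on the same mm.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical user-space stand-ins; not the kernel's definitions. */
struct mm_model {
	atomic_int tlb_flush_pending;	/* in-flight batched flushers */
};

struct gather_model {
	struct mm_model *mm;
	bool fullmm;
	bool need_flush_all;
	unsigned long start;	/* lowest address queued for flushing */
	unsigned long end;	/* one past the highest queued address */
};

/* Start a gather with an empty range and mark a flush as pending. */
static void gather_begin(struct gather_model *tlb, struct mm_model *mm)
{
	tlb->mm = mm;
	tlb->fullmm = false;
	tlb->need_flush_all = false;
	tlb->start = ~0UL;
	tlb->end = 0;
	atomic_fetch_add(&mm->tlb_flush_pending, 1);
}

/* Grow the queued range so that it covers [addr, addr + size). */
static void adjust_range(struct gather_model *tlb, unsigned long addr,
			 unsigned long size)
{
	if (addr < tlb->start)
		tlb->start = addr;
	if (addr + size > tlb->end)
		tlb->end = addr + size;
}

/*
 * Finish a gather. If another flusher is still pending on this mm, its PTE
 * updates may have made us skip entries, so cover the whole [start, end).
 * Returns true when the range was widened for that reason.
 */
static bool gather_finish(struct gather_model *tlb, unsigned long start,
			  unsigned long end)
{
	bool widened = false;

	assert(start <= end);
	if (!tlb->fullmm && !tlb->need_flush_all &&
	    atomic_load(&tlb->mm->tlb_flush_pending) > 1) {
		adjust_range(tlb, start, end - start);
		widened = true;
	}

	/* The real code would flush [tlb->start, tlb->end) at this point. */
	atomic_fetch_sub(&tlb->mm->tlb_flush_pending, 1);
	return widened;
}
```

With two gathers open on the same mm, the first one to finish sees a count of
2 and widens its range; the second sees a count of 1 and flushes only what it
queued itself.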
