From: Nadav Amit
Subject: Re: Potential race in TLB flush batching?
Date: Tue, 11 Jul 2017 15:34:44 -0700
To: Mel Gorman
Cc: Andy Lutomirski, "open list:MEMORY MANAGEMENT"
Message-Id: <54C7B456-17EF-442D-8FAC-C8BE9D160750@gmail.com>
In-Reply-To: <9ECCACFE-6006-4C19-8FC0-C387EB5F3BEE@gmail.com>
References: <20170711064149.bg63nvi54ycynxw4@suse.de> <20170711092935.bogdb4oja6v7kilq@suse.de> <20170711132023.wdfpjxwtbqpi3wp2@suse.de> <20170711155312.637eyzpqeghcgqzp@suse.de> <20170711191823.qthrmdgqcd3rygjk@suse.de> <20170711200923.gyaxfjzz3tpvreuq@suse.de> <20170711215240.tdpmwmgwcuerjj3o@suse.de> <9ECCACFE-6006-4C19-8FC0-C387EB5F3BEE@gmail.com>

Nadav Amit wrote:

> Mel Gorman wrote:
>
>> On Tue, Jul 11, 2017 at 09:09:23PM +0100, Mel Gorman wrote:
>>> On Tue, Jul 11, 2017 at 08:18:23PM +0100, Mel Gorman wrote:
>>>> I don't think we should be particularly clever about this and instead just
>>>> flush the full mm if there is a risk that a parallel batched flush in
>>>> progress could result in a stale TLB entry being used. I think tracking mms
>>>> that are currently batching would end up being costly in terms of memory,
>>>> fairly complex, or both. Something like this?
>>>
>>> mremap and madvise(DONTNEED) would also need to flush. Memory policies are
>>> fine as a move_pages call that hits the race will simply fail to migrate
>>> a page that is being freed and, once migration starts, it'll be flushed, so
>>> a stale access has no further risk. copy_page_range should also be ok as
>>> the old mm is flushed and the new mm cannot have entries yet.
>>
>> Adding those results in
>
> You are way too fast for me.
>
>> --- a/mm/rmap.c
>> +++ b/mm/rmap.c
>> @@ -637,12 +637,34 @@ static bool should_defer_flush(struct mm_struct *mm, enum ttu_flags flags)
>>  		return false;
>>
>>  	/* If remote CPUs need to be flushed then defer batch the flush */
>> -	if (cpumask_any_but(mm_cpumask(mm), get_cpu()) < nr_cpu_ids)
>> +	if (cpumask_any_but(mm_cpumask(mm), get_cpu()) < nr_cpu_ids) {
>>  		should_defer = true;
>> +		mm->tlb_flush_batched = true;
>> +	}
>
> Since mm->tlb_flush_batched is set before the PTE is actually cleared, it
> still seems to leave a short window for a race:
>
> CPU0                                    CPU1
> ----                                    ----
> should_defer_flush
> => mm->tlb_flush_batched = true
>                                         flush_tlb_batched_pending (another PT)
>                                         => flush TLB
>                                         => mm->tlb_flush_batched = false
> ptep_get_and_clear
> ...
>                                         flush_tlb_batched_pending (batched PT)
>                                         use the stale PTE
>                                         ...
> try_to_unmap_flush
>
> IOW it seems that mm->tlb_flush_batched should be set after the PTE is
> cleared (and have some compiler barrier to be on the safe side).

I'm actually not sure about that. Without a lock, the other order may be
racy as well.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: email@kvack.org