From: Nadav Amit
Subject: Re: Potential race in TLB flush batching?
Date: Tue, 11 Jul 2017 15:34:44 -0700
To: Mel Gorman
Cc: Andy Lutomirski, "open list:MEMORY MANAGEMENT"
Message-Id: <54C7B456-17EF-442D-8FAC-C8BE9D160750@gmail.com>
In-Reply-To: <9ECCACFE-6006-4C19-8FC0-C387EB5F3BEE@gmail.com>
References: <20170711064149.bg63nvi54ycynxw4@suse.de> <20170711092935.bogdb4oja6v7kilq@suse.de> <20170711132023.wdfpjxwtbqpi3wp2@suse.de> <20170711155312.637eyzpqeghcgqzp@suse.de> <20170711191823.qthrmdgqcd3rygjk@suse.de> <20170711200923.gyaxfjzz3tpvreuq@suse.de> <20170711215240.tdpmwmgwcuerjj3o@suse.de> <9ECCACFE-6006-4C19-8FC0-C387EB5F3BEE@gmail.com>

Nadav Amit wrote:

> Mel Gorman wrote:
>
>> On Tue, Jul 11, 2017 at 09:09:23PM +0100, Mel Gorman wrote:
>>> On Tue, Jul 11, 2017 at 08:18:23PM +0100, Mel Gorman wrote:
>>>> I don't think we should be particularly clever about this and instead just
>>>> flush the full mm if there is a risk that a parallel batched flush in
>>>> progress could result in a stale TLB entry being used. I think tracking mms
>>>> that are currently batching would end up being costly in terms of memory,
>>>> fairly complex, or both. Something like this?
>>>
>>> mremap and madvise(DONTNEED) would also need to flush. Memory policies are
>>> fine as a move_pages call that hits the race will simply fail to migrate
>>> a page that is being freed and, once migration starts, it'll be flushed, so
>>> a stale access has no further risk. copy_page_range should also be ok as
>>> the old mm is flushed and the new mm cannot have entries yet.
>>
>> Adding those results in
>
> You are way too fast for me.
>
>> --- a/mm/rmap.c
>> +++ b/mm/rmap.c
>> @@ -637,12 +637,34 @@ static bool should_defer_flush(struct mm_struct *mm, enum ttu_flags flags)
>>  		return false;
>>
>>  	/* If remote CPUs need to be flushed then defer batch the flush */
>> -	if (cpumask_any_but(mm_cpumask(mm), get_cpu()) < nr_cpu_ids)
>> +	if (cpumask_any_but(mm_cpumask(mm), get_cpu()) < nr_cpu_ids) {
>>  		should_defer = true;
>> +		mm->tlb_flush_batched = true;
>> +	}
>
> Since mm->tlb_flush_batched is set before the PTE is actually cleared, it
> still seems to leave a short window for a race:
>
> CPU0                                    CPU1
> ----                                    ----
> should_defer_flush
> => mm->tlb_flush_batched = true
>                                         flush_tlb_batched_pending (another PT)
>                                         => flush TLB
>                                         => mm->tlb_flush_batched = false
> ptep_get_and_clear
> ...
>                                         flush_tlb_batched_pending (batched PT)
>                                         use the stale PTE
>                                         ...
> try_to_unmap_flush
>
> IOW it seems that mm->tlb_flush_batched should be set after the PTE is
> cleared (and have some compiler barrier to be on the safe side).

I'm actually not sure about that. Without a lock, the other order may be
racy as well.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: email@kvack.org