Date: Tue, 10 Dec 2019 11:07:25 -0500
From: Jerome Glisse
To: Vlastimil Babka
Cc: Matthew Wilcox, linux-mm@kvack.org, Laurent Dufour, David Rientjes,
    Hugh Dickins, Michel Lespinasse, Davidlohr Bueso
Subject: Re: Splitting the mmap_sem
Message-ID: <20191210160725.GB5257@redhat.com>
References: <20191203222147.GV20752@bombadil.infradead.org>
    <20191205172150.GD5819@redhat.com>
    <16178c54-2884-667b-7ae9-814ff4eeed1b@suse.cz>
In-Reply-To: <16178c54-2884-667b-7ae9-814ff4eeed1b@suse.cz>

On Tue, Dec 10, 2019 at 04:26:40PM +0100, Vlastimil Babka wrote:
> On 12/5/19 6:21 PM, Jerome Glisse wrote:
> >>
> >> So calling mmap() looks like this:
> >>
> >> 1 allocate a new VMA
> >> 2 update pointer(s) in maple tree
> >> 3 sleep until old VMAs have a zero refcount
> >> 4 synchronize_rcu()
> >> 5 free old VMAs
> >> 6 flush caches for affected range
> >> 7 return to userspace
> >>
> >> While one thread is calling mmap(MAP_FIXED), two other threads which are
> >> accessing the same address may see different data from each other and
> >> have different page translations in their respective CPU caches until
> >> the thread calling mmap() returns. I believe this is OK, but would
> >> greatly appreciate hearing from people who know better.
> >
> > I do not believe this is OK, i believe this is wrong (not even considering
> > possible hardware issues that can arise from such aliasing).
>
> But is it true that the races can happen in the above such that multiple CPU's
> have different translations? I think it's impossible to tell from above - there
> are no details about when and which pte modifications happen, where ptl lock is
> taken... perhaps after filling those details, we could be able to see that
> there's no race.
>

My assumption, reading Matthew, was that while step 6 is making progress
(flushing caches, and I assume the TLB too), a CPU that has already been
flushed can take a fault against the new VMA and thus get a new TLB
entry that does not match the entry on a CPU that has not yet been
flushed.

Today this cannot happen because page faults serialize on the mmap_sem
(i.e. until write mode is released when returning to userspace). I
advocate that for MAP_FIXED we keep that behavior, i.e. have page
faults wait until all the CPUs' caches and TLBs are flushed before
servicing a new fault. The rationale is:

  - An application with racing threads accessing an area that is
    undergoing an mmap MAP_FIXED is doing something weird (which might
    be totally warranted), so it should not matter if we penalize it by
    not offering the benefit of fully concurrent page faults for that
    area. We are not making things any worse than they are today.

  - Waiting on a new MAP_FIXED vma is easy to implement and does not
    affect other kinds of page fault (i.e. you still have concurrent
    faults to other areas).

Note, I stress again, this is only for MAP_FIXED replacing an existing
vma. For any other mmap (well, I would need to go over all the flags to
make sure there is nothing else similar to MAP_FIXED) we can have full
concurrency, i.e.:

  - Access to an area that is undergoing munmap will SEGFAULT as soon
    as the vma is no longer valid (i.e. no longer in the maple tree, I
    guess).

  - Access to an area that is just being mmapped will fault using the
    new vma (assuming we publish the vma only once it is ready to take
    faults).

  - An mmap can make progress while another munmap is making progress,
    and can reuse the munmapped area as soon as it is fully flushed
    (caches and TLB).

Cheers,
Jérôme
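
To make the fault-handling rules above concrete, here is a toy userspace
model in C. It is only a sketch under stated assumptions, not kernel
code: every name in it (toy_vma, toy_handle_fault, flush_pending) is
hypothetical, a flat array stands in for the maple tree, and "waiting"
is reduced to a return value the caller would retry on.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: none of these names exist in the kernel. */
enum toy_fault_result {
    TOY_FAULT_SEGV, /* no vma covers the address (e.g. munmap unpublished it) */
    TOY_FAULT_WAIT, /* MAP_FIXED replacement has not finished flushing yet */
    TOY_FAULT_OK    /* vma is published and ready: service the fault */
};

struct toy_vma {
    unsigned long start; /* inclusive */
    unsigned long end;   /* exclusive */
    /* Assumed flag: set only while a MAP_FIXED mapping that replaced an
     * existing vma is still flushing caches and TLBs on all CPUs. */
    bool flush_pending;
};

/* Linear lookup standing in for the maple tree walk. */
static const struct toy_vma *toy_find_vma(const struct toy_vma *vmas,
                                          size_t n, unsigned long addr)
{
    for (size_t i = 0; i < n; i++)
        if (addr >= vmas[i].start && addr < vmas[i].end)
            return &vmas[i];
    return NULL;
}

/* Decide what a page fault at addr should do under the proposed rules. */
static enum toy_fault_result toy_handle_fault(const struct toy_vma *vmas,
                                              size_t n, unsigned long addr)
{
    const struct toy_vma *vma = toy_find_vma(vmas, n, addr);

    if (!vma)
        return TOY_FAULT_SEGV;  /* area already unmapped */
    if (vma->flush_pending)
        return TOY_FAULT_WAIT;  /* only the MAP_FIXED-replace case waits */
    return TOY_FAULT_OK;        /* every other fault stays fully concurrent */
}
```

In this model only faults that hit a vma with flush_pending set are
held back; faults to any other area return immediately, which is the
point of restricting the wait to MAP_FIXED replacing an existing vma.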