Date: Tue, 9 Feb 2021 14:29:41 +0000
From: Matthew Wilcox
To: linux-mm@kvack.org
Cc: "Liam R. Howlett", Laurent Dufour, Paul McKenney
Subject: Re: synchronize_rcu in munmap?
Message-ID: <20210209142941.GY308988@casper.infradead.org>
In-Reply-To: <20210208132643.GP308988@casper.infradead.org>

On Mon, Feb 08, 2021 at 01:26:43PM +0000, Matthew Wilcox wrote:
> Next problem: /proc/$pid/smaps calls walk_page_vma() which starts out by
> saying:
> 	mmap_assert_locked(walk.mm);
> which made me realise that smaps is also going to walk the page tables.
> So the page tables have to be pinned by the existence of the VMA.
> Which means the page tables must be freed by the same RCU callback that
> frees the VMA.
> But doing that means that a task which calls mmap();
> munmap(); mmap(); must avoid allocating the same address for the second
> mmap (until the RCU grace period has elapsed), otherwise threads on
> other CPUs may see the stale PTEs instead of the new ones.
>
> Solution 1: Move the page table freeing into the RCU callback, call
> synchronize_rcu() in munmap().
>
> Solution 2: Refcount the VMA and free the page tables on refcount
> dropping to zero.  This doesn't actually work because the stale PTE
> problem still exists.
>
> Solution 3: When unmapping a VMA, instead of erasing the VMA from the
> maple tree, put a "dead" entry in its place.  Once the RCU freeing and the
> TLB shootdown has happened, erase the entry and it can then be allocated.
> If we do that MAP_FIXED will have to synchronize_rcu() if it overlaps
> a dead entry.

Solution 4: RCU-free the page table pages and teach pagewalk.c to be
RCU-safe.  That means that it will have to use rcu_dereference() or
READ_ONCE() to dereference (e.g.) pmdp, but it also allows GUP-fast to
run under the RCU read lock instead of disabling interrupts.
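To make the Solution 4 idea concrete, here is a minimal userspace sketch of the pattern. It is not kernel code. A single hypothetical PMD slot (pmd_entry) points at a PTE table. The lockless walker loads that slot exactly once inside a read-side section, which is the role rcu_dereference()/READ_ONCE() would play on pmdp. Unmapping clears the slot, waits for a grace period, and only then frees the table. Every name here (pte_table, pmd_entry, toy_rcu_read_lock(), map_one(), walk_lockless(), unmap_and_free()) is made up for illustration. The reader counter is a deliberately crude stand-in for RCU: it gives the same "no reader can still hold the old table" guarantee, but unlike real RCU it makes readers write a shared counter and can starve the writer. In the kernel the free would presumably be deferred to an RCU callback rather than blocking.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical toy model: one PMD slot that points at one PTE table. */
#define PTRS_PER_TABLE 8

typedef struct pte_table {
	uint64_t pte[PTRS_PER_TABLE];
} pte_table;

/* The PMD entry; readers load it once, writers publish/unpublish it. */
static _Atomic(pte_table *) pmd_entry = NULL;

/*
 * Crude stand-in for RCU (an assumption, not the kernel API): a count of
 * readers inside a read-side section.  Waiting for it to reach zero after
 * unpublishing guarantees no reader still holds the old table.
 */
static atomic_int readers = 0;

static void toy_rcu_read_lock(void)   { atomic_fetch_add(&readers, 1); }
static void toy_rcu_read_unlock(void) { atomic_fetch_sub(&readers, 1); }

static void toy_synchronize_rcu(void)
{
	while (atomic_load(&readers) != 0)
		;	/* busy-wait; real RCU does not spin like this */
}

/* Install a fresh PTE table with one populated entry. */
static void map_one(size_t idx, uint64_t pte)
{
	assert(idx < PTRS_PER_TABLE);
	assert(atomic_load(&pmd_entry) == NULL);
	pte_table *t = calloc(1, sizeof(*t));
	assert(t != NULL);
	t->pte[idx] = pte;		/* fill the table before publishing it */
	atomic_store(&pmd_entry, t);	/* seq_cst store orders the fill first */
}

/* Lockless walk, the analogue of GUP-fast under the RCU read lock. */
static bool walk_lockless(size_t idx, uint64_t *out)
{
	bool found = false;

	assert(idx < PTRS_PER_TABLE);
	toy_rcu_read_lock();
	/* Load the PMD exactly once (the READ_ONCE()/rcu_dereference() step). */
	pte_table *t = atomic_load(&pmd_entry);
	if (t != NULL && t->pte[idx] != 0) {
		*out = t->pte[idx];
		found = true;
	}
	toy_rcu_read_unlock();
	return found;
}

/* Unmap: unpublish, wait out a grace period, then free the table. */
static void unmap_and_free(void)
{
	pte_table *old = atomic_exchange(&pmd_entry, NULL);

	toy_synchronize_rcu();	/* every reader that saw 'old' has finished */
	free(old);		/* free(NULL) is a no-op if nothing was mapped */
}
```

A single-threaded run exercises the lifecycle. walk_lockless() first finds nothing. After map_one(3, 0x1234) it returns that PTE, and after unmap_and_free() it again finds an empty slot, so a later map_one() never exposes the freed table. The interesting case is a second thread sitting in walk_lockless() while unmap_and_free() runs. There, the exchange-then-wait ordering is what keeps the walker from dereferencing freed memory.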