Date: Wed, 25 Sep 2019 16:26:54 -0600
From: Yu Zhao
To: Peter Zijlstra
Cc: Andrew Morton, Michal Hocko, "Kirill A. Shutemov", Ingo Molnar,
	Arnaldo Carvalho de Melo, Alexander Shishkin, Jiri Olsa,
	Namhyung Kim, Vlastimil Babka, Hugh Dickins, Jérôme Glisse,
	Andrea Arcangeli, "Aneesh Kumar K.V", David Rientjes,
	Matthew Wilcox, Lance Roy, Ralph Campbell, Jason Gunthorpe,
	Dave Airlie, Thomas Hellstrom, Souptick Joarder, Mel Gorman,
	Jan Kara, Mike Kravetz, Huang Ying, Aaron Lu, Omar Sandoval,
	Thomas Gleixner, Vineeth Remanan Pillai, Daniel Jordan,
	Mike Rapoport, Joel Fernandes, Mark Rutland, Alexander Duyck,
	Pavel Tatashin, David Hildenbrand, Juergen Gross, Anthony Yznaga,
	Johannes Weiner, "Darrick J. Wong", linux-kernel@vger.kernel.org,
	linux-mm@kvack.org
Subject: Re: [PATCH v3 3/4] mm: don't expose non-hugetlb page to fast gup prematurely
Message-ID: <20190925222654.GA180125@google.com>
References: <20190914070518.112954-1-yuzhao@google.com>
	<20190924232459.214097-1-yuzhao@google.com>
	<20190924232459.214097-3-yuzhao@google.com>
	<20190925082530.GD4536@hirez.programming.kicks-ass.net>
In-Reply-To: <20190925082530.GD4536@hirez.programming.kicks-ass.net>

On Wed, Sep 25, 2019 at 10:25:30AM +0200, Peter Zijlstra wrote:
> On Tue, Sep 24, 2019 at 05:24:58PM -0600, Yu Zhao wrote:
> > We don't want to expose a non-hugetlb page to the fast gup running
> > on a remote CPU before all local non-atomic ops on the page flags
> > are visible first.
> >
> > For an anon page that isn't in swap cache, we need to make sure all
> > prior non-atomic ops, especially __SetPageSwapBacked() in
> > page_add_new_anon_rmap(), are ordered before set_pte_at() to prevent
> > the following race:
> >
> >   CPU 1                        CPU 2
> >   set_pte_at()                 get_user_pages_fast()
> >   page_add_new_anon_rmap()     gup_pte_range()
> >   __SetPageSwapBacked()        SetPageReferenced()
> >
> > This demonstrates a non-fatal scenario. Though they haven't been
> > directly observed, fatal ones can exist, e.g., PG_lock set by the
> > fast gup caller and then overwritten by __SetPageSwapBacked().
> >
> > For an anon page that is already in swap cache, or for a file page,
> > we don't need smp_wmb() before set_pte_at() because adding the page
> > to the swap or file cache serves as a valid write barrier. Using
> > non-atomic ops thereafter is a bug, obviously.
> >
> > smp_wmb() is added after 11 of the 12 page_add_new_anon_rmap() call
> > sites, with the only exception being do_huge_pmd_wp_page_fallback(),
> > because of an existing smp_wmb().
>
> I'm thinking this patch makes stuff rather fragile.. Should we instead
> stick the barrier in set_p*d_at() instead? Or rather, make that store a
> store-release?
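(For context, what the patch open-codes at each of those call sites is
roughly the following; this is an illustrative sketch modelled on an
anon fault path like do_anonymous_page(), not the literal diff.)

	page_add_new_anon_rmap(page, vma, vmf->address, false);
	/* ... memcg charge, LRU insertion, etc. elided ... */

	/*
	 * Order the non-atomic page flag updates above (notably
	 * __SetPageSwapBacked()) before the PTE store, so a racing
	 * get_user_pages_fast() on another CPU cannot reach the page
	 * while its flags are still being set up.
	 */
	smp_wmb();

	set_pte_at(vma->vm_mm, vmf->address, vmf->pte, entry);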
I'd prefer putting the barrier in set_p*d_at() too, but I suspect the
majority would be concerned about the performance implications,
especially for the loops calling set_pte_at() in mm/huge_memory.c. If
we have a consensus that smp_wmb() would be better off wrapped together
with set_p*d_at() in a macro, I'd be glad to make the change. As for
smp_store_release(), it seems to me it would have to be applied in each
arch, which might be just as fragile.
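To make the wrapped variant concrete, it could look something along the
lines of the sketch below (the helper name is made up for illustration;
an arch-level smp_store_release() inside set_pte() would be the other
option):

/*
 * Illustrative only: a generic wrapper pairing the barrier with the
 * PTE store so call sites cannot forget it.  The name is made up.
 */
static inline void set_pte_at_ordered(struct mm_struct *mm, unsigned long addr,
				      pte_t *ptep, pte_t pte)
{
	/* order prior non-atomic page flag updates before the PTE store */
	smp_wmb();
	set_pte_at(mm, addr, ptep, pte);
}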