From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-9.8 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 5E36DC433E2 for ; Wed, 16 Sep 2020 18:46:31 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id CBFAA20732 for ; Wed, 16 Sep 2020 18:46:30 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="NeHyOVDE" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org CBFAA20732 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=redhat.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id EC2826B006E; Wed, 16 Sep 2020 14:46:29 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id E73B66B007D; Wed, 16 Sep 2020 14:46:29 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id D50E76B007E; Wed, 16 Sep 2020 14:46:29 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0243.hostedemail.com [216.40.44.243]) by kanga.kvack.org (Postfix) with ESMTP id BD27A6B006E for ; Wed, 16 Sep 2020 14:46:29 -0400 (EDT) Received: from smtpin27.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay03.hostedemail.com (Postfix) with ESMTP id 7443F8249980 for ; Wed, 16 Sep 2020 18:46:29 +0000 (UTC) X-FDA: 77269805298.27.bulb68_3e0f4ce2711c Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin27.hostedemail.com (Postfix) with ESMTP id 47AA33D668 for ; Wed, 16 Sep 2020 18:46:29 +0000 (UTC) X-HE-Tag: bulb68_3e0f4ce2711c X-Filterd-Recvd-Size: 7663 Received: from us-smtp-delivery-1.mimecast.com (us-smtp-1.mimecast.com [205.139.110.61]) by imf48.hostedemail.com (Postfix) with ESMTP for ; Wed, 16 Sep 2020 18:46:28 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1600281988; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=puIJGOJT+MwnnD6TZO9vIKCPtXLQsx8Gz/5DaTlQPWM=; b=NeHyOVDEjdlx5LfmCy9lCSy4coCB4KZlqlwPP+qS4NDNPwVC4qh1ATPZyqSzKfj/TzAnUR f4H0/fiQktcMn7ZwMPHmJD7v9Ty0jfL0SD5wOZmZTIj67TSFagTpvsEMBV8z+MK2MGBQnw D1YoqGFdQl385aDTkuBtun7HnlNZSQI= Received: from mail-qt1-f198.google.com (mail-qt1-f198.google.com [209.85.160.198]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-452-2Z9u5odfPlGxWsn_QdPNmg-1; Wed, 16 Sep 2020 14:46:24 -0400 X-MC-Unique: 2Z9u5odfPlGxWsn_QdPNmg-1 Received: by mail-qt1-f198.google.com with SMTP id g10so6903159qto.1 for ; Wed, 16 Sep 2020 11:46:24 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=puIJGOJT+MwnnD6TZO9vIKCPtXLQsx8Gz/5DaTlQPWM=; b=f7b4wS9CCmfGmTWE2FoxpwN6rSiGaZqx2ZdjwGZMJ3HhtpWvYYtPNQgOZ8rOo8O2rU 7OQkvoWR020szRIR+T0lOINfTffRF6ubzRQWGnqGCodg1ZuoFlLNfOp39d82JKkGQp5n yp95tuVFbA9kSaFfLetrWNX+FavxAJZqhFI8HOEPPu2s/+J7Uh9RkCW/ww2XRS61b/x2 a7+KStNPPE4tX4kY7EU6wSrFAgW9NhtLoTT7v4ondlQQNnT5yQ43TUuVdBhFu3+6l1lw BaiU0VWxt3TL1fM5PZdezpgLhPZT4eyT1fH0uTxNAmiQnXCqIZiMWQu8Ulqhk6sq4GG3 h+mg== X-Gm-Message-State: AOAM533NSIKBh2nV/jtKspicjlFgFiR+sCEydeaXUhIhEff8SmRAePXH UAIz3LbTO6m3J3Yc7inJd0jO/XzD8u9LxwO3Ecgp6n9K6RsyTAThddk0yJXF0SLzIdKxvStpn+E baafKApo4Q5U= X-Received: by 2002:ad4:4b61:: with SMTP id m1mr25199596qvx.11.1600281982357; Wed, 16 Sep 2020 11:46:22 -0700 (PDT) X-Google-Smtp-Source: ABdhPJyo98nfkY+hGwK/tn74LUrnsECrW254lBUhMHvRV6bFVTzuK7i0gQXeUYoi4ivt4+pB84qrSw== X-Received: by 2002:ad4:4b61:: with SMTP id m1mr25199556qvx.11.1600281982062; Wed, 16 Sep 2020 11:46:22 -0700 (PDT) Received: from xz-x1 (bras-vprn-toroon474qw-lp130-11-70-53-122-15.dsl.bell.ca. [70.53.122.15]) by smtp.gmail.com with ESMTPSA id v2sm6833117qkv.26.2020.09.16.11.46.20 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 16 Sep 2020 11:46:21 -0700 (PDT) Date: Wed, 16 Sep 2020 14:46:19 -0400 From: Peter Xu To: Jason Gunthorpe Cc: John Hubbard , Linus Torvalds , Leon Romanovsky , Linux-MM , Linux Kernel Mailing List , "Maya B . Gokhale" , Yang Shi , Marty Mcfadden , Kirill Shutemov , Oleg Nesterov , Jann Horn , Jan Kara , Kirill Tkhai , Andrea Arcangeli , Christoph Hellwig , Andrew Morton Subject: Re: [PATCH 1/4] mm: Trial do_wp_page() simplification Message-ID: <20200916184619.GB40154@xz-x1> References: <20200914232851.GH1221970@ziepe.ca> <20200915145040.GA2949@xz-x1> <20200915160553.GJ1221970@ziepe.ca> <20200915182933.GM1221970@ziepe.ca> <20200915191346.GD2949@xz-x1> <20200915193838.GN1221970@ziepe.ca> <20200915213330.GE2949@xz-x1> <20200915232238.GO1221970@ziepe.ca> <20200916174804.GC8409@ziepe.ca> MIME-Version: 1.0 In-Reply-To: <20200916174804.GC8409@ziepe.ca> Authentication-Results: relay.mimecast.com; auth=pass smtp.auth=CUSA124A263 smtp.mailfrom=peterx@redhat.com X-Mimecast-Spam-Score: 0.002 X-Mimecast-Originator: redhat.com Content-Type: text/plain; charset=utf-8 Content-Disposition: inline X-Rspamd-Queue-Id: 47AA33D668 X-Spamd-Result: default: False [0.00 / 100.00] X-Rspamd-Server: rspam01 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On Wed, Sep 16, 2020 at 02:48:04PM -0300, Jason Gunthorpe wrote: > On Tue, Sep 15, 2020 at 06:50:46PM -0700, John Hubbard wrote: > > > > > > It seems very strange that a physical page exclusively owned by a > > > process can become copied if pin_user_pages() is active and the > > > process did fork() at some point. > > > > > > Could the new pin_user_pages() logic help here? eg the > > > GUP_PIN_COUNTING_BIAS stuff? > > > > > > Could the COW code consider a refcount of GUP_PIN_COUNTING_BIAS + 1 as > > > being owned by the current mm and not needing COW? The DMA pin would > > > be 'invisible' for COW purposes? > > > > > > Please do be careful to use the API, rather than the implementation. The > > FOLL_PIN refcounting system results in being able to get a "maybe > > DMA-pinned", or a "definitely not DMA-pinned", via this API call: > > So, what I'm thinking is something like (untested): > > diff --git a/mm/memory.c b/mm/memory.c > index 469af373ae76e1..77f63183667e52 100644 > --- a/mm/memory.c > +++ b/mm/memory.c > @@ -2889,6 +2889,26 @@ static vm_fault_t wp_page_shared(struct vm_fault *vmf) > return ret; > } > > +static bool cow_needed(struct vm_fault *vmf) > +{ > + int total_map_swapcount; > + > + if (!reuse_swap_page(vmf->page, &total_map_swapcount)) { > + unlock_page(vmf->page); > + return true; > + } > + > + if (total_map_swapcount == 1) { > + /* > + * The page is all ours. Move it to our anon_vma so the rmap > + * code will not search our parent or siblings. Protected > + * against the rmap code by the page lock. > + */ > + page_move_anon_rmap(vmf->page, vmf->vma); > + } > + return false; > +} > + > /* > * This routine handles present pages, when users try to write > * to a shared page. It is done by copying the page to a new address > @@ -2947,8 +2967,21 @@ static vm_fault_t do_wp_page(struct vm_fault *vmf) > if (!trylock_page(page)) > goto copy; > if (PageKsm(page) || page_mapcount(page) != 1 || page_count(page) != 1) { > + bool cow = true; > + > + /* > + * If the page is DMA pinned we can't rely on the > + * above to know if there are other CPU references as > + * page_count() will be elevated by the > + * pin. Needlessly copying the page will cause the DMA > + * pin to break, try harder to avoid that. > + */ > + if (page_maybe_dma_pinned(page)) > + cow = cow_needed(vmf); > + > unlock_page(page); > - goto copy; > + if (cow) > + goto copy; > } > /* > * Ok, we've got the only map reference, and the only > > What do you think Peter? Is this remotely close? My understanding is this may only work for the case when the fork()ed child quitted before we reach here (so we still have mapcount==1 for the page). What if not? Then mapcount will be greater than 1, and cow will still trigger. Is that what we want? Another problem is that, aiui, one of the major change previous patch proposed is to avoid using lock_page() so that we never block in this path. However if we need the page reuse logic to guarantee pinning pages, then it becomes a correctness issue, so I'm afraid we'll start to make all things complicated again. Maybe even more complicated, because "correctness" should be even harder than "best effort reuse" since it can cause data corruption if we didn't do it right... -- Peter Xu