Date: Thu, 17 Sep 2020 15:03:32 -0400
From: Peter Xu
To: Linus Torvalds
Cc: Jason Gunthorpe, John Hubbard, Leon Romanovsky, Linux-MM, Linux Kernel Mailing List, "Maya B . Gokhale", Yang Shi, Marty Mcfadden, Kirill Shutemov, Oleg Nesterov, Jann Horn, Jan Kara, Kirill Tkhai, Andrea Arcangeli, Christoph Hellwig, Andrew Morton
Subject: Re: [PATCH 1/4] mm: Trial do_wp_page() simplification
Message-ID: <20200917190332.GB133226@xz-x1>

On Thu, Sep 17, 2020 at 11:26:01AM -0700, Linus Torvalds wrote:
> On Thu, Sep 17, 2020 at 11:14 AM Peter Xu wrote:
> >
> > In my humble opinion, the real solution is still to use MADV_DONTFORK properly
> > so we should never share the DMA pages with others when we know the fact.
>
> Is this all just because somebody does a fork() after doing page pinning?
>
> If so, I feel this should be trivially fixed in copy_one_pte().
> That's where we currently do
>
>                 /*
>                  * If it's a COW mapping, write protect it both
>                  * in the parent and the child
>                  */
>                 if (is_cow_mapping(vm_flags) && pte_write(pte)) {
>                         ptep_set_wrprotect(src_mm, addr, src_pte);
>                         pte = pte_wrprotect(pte);
>                 }
>
> and I feel that that is where we could just change the code to do a
> COW event for pinned pages (and *not* mark the parent write protected,
> since the parent page now isn't a COW page).
>
> Because if that's the case that Jason is hitting, then I feel that
> really is the correct fix: make sure that the pinning action is
> meaningful.
>
> As mentioned, I really think the whole (and only) point of page
> pinning is that it should keep the page locked in the page tables. And
> by "locked" I mean exactly that: not just present, but writable.
>
> And then the "we never COW a pinned page" comes not from the COW code
> doing magic, but by it simply never becoming non-writable - because
> the page table entry is locked!

Looks reasonable to me.

fork() would be slightly slower, though, since we'd need to copy the data of every DMA buffer for each child process, even when we can be fairly sure the children will never touch those pages. Still, it seems a good approach if we care about potential userspace breakage: the breakage turns into a performance regression instead, and if any userspace notices such a slowdown on fork(), people will probably remember to use MADV_DONTFORK properly, since AFAICT MADV_DONTFORK removes this extra overhead.
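For reference, a minimal userspace sketch of the MADV_DONTFORK approach, assuming a one-page anonymous mapping stands in for a DMA buffer. The helper names mapped_in_child() and alloc_dontfork_buffer() are hypothetical; madvise(), mincore(), fork() and waitpid() are the standard Linux calls. The child uses mincore(), which fails with ENOMEM on an unmapped range, to check whether it inherited the buffer at all.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: forks and reports whether [buf, buf + len) is still
 * mapped in the child. Returns 1 if mapped, 0 if not, -1 on error.
 * len must be a single page here, because vec holds one mincore() entry.
 */
static int mapped_in_child(void *buf, size_t len)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        unsigned char vec[1];
        /* mincore() fails with ENOMEM if any part of the range is unmapped. */
        _exit(mincore(buf, len, vec) == 0 ? 1 : 0);
    }
    int status;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}

/*
 * Hypothetical helper: allocates a one-page buffer (standing in for a DMA
 * buffer) and marks it MADV_DONTFORK, so fork() neither shares nor copies
 * it; the child simply gets no mapping at that address.
 */
static void *alloc_dontfork_buffer(size_t *len_out)
{
    size_t len = (size_t)sysconf(_SC_PAGESIZE);
    void *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED)
        return NULL;
    if (madvise(buf, len, MADV_DONTFORK) != 0) {
        munmap(buf, len);
        return NULL;
    }
    *len_out = len;
    return buf;
}
```

With a buffer from alloc_dontfork_buffer(&len), mapped_in_child(buf, len) should report 0; after madvise(buf, len, MADV_DOFORK) it should report 1 again. The trade-off is that the child has no mapping there at all, so touching that address in the child faults instead of silently sharing the pinned page.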
Another side effect I can think of is that fork() becomes a bit less predictable once it starts relying on page_maybe_dma_pinned(), because that check can give false positives: when hpage_pincount_available() is false, it falls back to comparing the page refcount against GUP_PIN_COUNTING_BIAS. So fork() might trigger some unnecessary COWs if we've got normal (unpinned) pages whose refcount reaches GUP_PIN_COUNTING_BIAS. I assume that's not a big deal since it should be extremely rare, or is it?..

-- 
Peter Xu
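The false-positive case can be modeled in a few lines of userspace C. This is a hypothetical sketch: struct model_page, model_maybe_dma_pinned() and model_fork_pte() are illustrative names, not kernel code. Only the value of GUP_PIN_COUNTING_BIAS (1U << 10) and the two-branch shape of the page_maybe_dma_pinned() check follow the v5.9 kernel. model_fork_pte() encodes the policy Linus proposes for copy_one_pte(): copy a (maybe) pinned page for the child and leave the parent writable; otherwise write-protect both sides as today.

```c
#include <stdbool.h>

/* Same value as the kernel's FOLL_PIN refcount bias (include/linux/mm.h). */
#define GUP_PIN_COUNTING_BIAS (1U << 10)

/* Hypothetical, simplified stand-in for struct page. */
struct model_page {
    bool has_pincount;     /* models hpage_pincount_available() */
    int pincount;          /* models compound_pincount() */
    unsigned int refcount; /* models page_ref_count(), read as unsigned */
};

/*
 * Same shape as page_maybe_dma_pinned(): exact when a separate pin count
 * exists, otherwise a refcount heuristic that can report false positives.
 */
static bool model_maybe_dma_pinned(const struct model_page *p)
{
    if (p->has_pincount)
        return p->pincount > 0;
    return p->refcount >= GUP_PIN_COUNTING_BIAS;
}

enum fork_action {
    SHARE_AS_IS,     /* not a writable COW PTE: leave it alone */
    SHARE_WRPROTECT, /* classic COW: write-protect parent and child */
    COPY_FOR_CHILD,  /* proposed: copy now, parent PTE stays writable */
};

/* Hypothetical model of the proposed per-PTE decision during fork(). */
static enum fork_action model_fork_pte(bool cow_mapping, bool writable,
                                       const struct model_page *p)
{
    if (!cow_mapping || !writable)
        return SHARE_AS_IS;
    if (model_maybe_dma_pinned(p))
        return COPY_FOR_CHILD;
    return SHARE_WRPROTECT;
}
```

A small page that is not pinned but holds GUP_PIN_COUNTING_BIAS or more ordinary references lands in COPY_FOR_CHILD, which is exactly the unnecessary COW described above; the cost is an extra page copy, not incorrect behavior.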