From: Linus Torvalds
Date: Tue, 11 Aug 2020 16:10:57 -0700
Subject: Re: [PATCH v3] mm/gup: Allow real explicit breaking of COW
To: Peter Xu
Cc: Andrea Arcangeli, Linux-MM, Linux Kernel Mailing List, Andrew Morton, Marty Mcfadden, "Maya B . Gokhale", Jann Horn, Christoph Hellwig, Oleg Nesterov, Kirill Shutemov, Jan Kara

On Tue, Aug 11, 2020 at 2:43 PM Peter Xu wrote:
>
> I don't know good enough on the reuse refactoring patch (which at least looks
> functionally correct), but... IMHO we still need the enforced cow logic no
> matter we refactor the page reuse logic or not, am I right?
>
> Example:
>
>   - Process A & B shares private anonymous page P0
>
>   - Process A does READ of get_user_pages() on page P0
>
>   - Process A (e.g., another thread of process A, or as long as process A still
>     holds the page P0 somehow) writes to page P0 which triggers cow, so for
>     process A the page P0 is replaced by P1 with identical content
>
> Then process A still keeps the reference to page P0 that potentially belongs to
> process B or others?
The GUP from process A will indeed keep a reference to page P0 (for whatever nefarious kernel use it did the GUP for). And yes, that page is still mapped into process B.

HOWEVER.

Since the GUP will be incrementing the reference count of said page, the actual problem has gone away. The GUP user won't be modifying the page (it's a read lookup), so as long as process B only reads from the page, we're happily sharing a read-only page.

And if process B now starts writing to it, the "page_count()" check at fault time will see the extra reference held by the GUP, and process B will do a COW copy. So now we'll have three copies of the page: the original one is being kept for the GUP, and both A and B did their COW copies in their page tables. And that's exactly what we wanted - they now all contain different data, after all.

The problem with the *current* code is that we don't actually look at the page count at all, only the mapping count, so the GUP reference is basically invisible.

And the reason we don't look too closely at the page count is that there are a lot of incidental things that can affect it, like the whole KSM reference, the swap cache reference, other GUP users etc etc. So we have historically tried to maximize the amount of sharing we can do.

But that "maximize sharing" is really complicated. That's the big change of that simplification patch - it's basically saying that "whenever _anything_ else has a reference to that page, we'll just copy and not even try to share".

Linus
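To make the difference between the two reuse rules concrete, here is a toy Python model of the scenario above. It is not kernel code: `Page`, `Process`, `gup_read`, `write_fault`, `run` and the `rule` parameter are all hypothetical names for this sketch. `refcount` stands in for page_count() and `mapcount` for the mapping count, and the model ignores the swap cache, KSM, locking and everything else the real fault path has to handle.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(eq=False)  # compare pages by identity, like real struct pages
class Page:
    data: str
    refcount: int = 0  # stands in for page_count(): mappings plus extra pins such as GUP
    mapcount: int = 0  # stands in for the mapping count: page-table mappings only


class Process:
    """Hypothetical toy process owning a single private anonymous page."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.page: Optional[Page] = None

    def map(self, page: Page) -> None:
        """Install a page-table mapping: bumps both counts."""
        self.page = page
        page.refcount += 1
        page.mapcount += 1

    def unmap(self) -> Page:
        """Drop our mapping of the current page and return it."""
        old = self.page
        old.refcount -= 1
        old.mapcount -= 1
        self.page = None
        return old


def gup_read(proc: Process) -> Page:
    """Read-only GUP: pins the page (refcount only, no new mapping)."""
    proc.page.refcount += 1
    return proc.page


def can_reuse(page: Page, rule: str) -> bool:
    # "mapcount": only other mappings prevent reuse, so a GUP pin is invisible.
    # "refcount": any other reference at all (mapping, GUP, ...) forces a copy.
    if rule == "mapcount":
        return page.mapcount == 1
    if rule == "refcount":
        return page.refcount == 1
    raise ValueError(f"unknown rule: {rule}")


def write_fault(proc: Process, data: str, rule: str) -> None:
    """Write to proc's page, either reusing it in place or breaking COW."""
    if can_reuse(proc.page, rule):
        proc.page.data = data  # reuse: write straight into the existing page
        return
    old = proc.unmap()
    proc.map(Page(old.data))  # COW: copy the contents into a fresh page
    proc.page.data = data


def run(rule: str):
    """Replay the scenario from the mail; returns (pinned, A's page, B's page)."""
    p0 = Page("original")
    a, b = Process("A"), Process("B")
    a.map(p0)
    b.map(p0)
    pinned = gup_read(a)              # A takes a read-only GUP reference on P0
    write_fault(a, "A's data", rule)  # A writes: COW gives A a new page P1
    write_fault(b, "B's data", rule)  # B writes: reuse P0 in place, or copy to P2?
    return pinned, a.page, b.page


if __name__ == "__main__":
    for rule in ("mapcount", "refcount"):
        pinned, a_page, b_page = run(rule)
        print(rule, "| pinned sees:", pinned.data, "| pinned is B's page:", pinned is b_page)
```

Under the "mapcount" rule, B's write reuses P0 in place, so the page pinned by A's GUP silently picks up "B's data". Under the "refcount" rule, B copies instead, the pinned page keeps its original contents, and there are three distinct pages, matching the three copies described in the mail. The price of the simpler rule is also visible in `can_reuse`: any extra reference, not just a GUP pin, is enough to force a copy.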