Date: Thu, 20 Aug 2020 17:54:49 -0400
From: Peter Xu
To: Linus Torvalds
Cc: Andrea Arcangeli, Linux-MM, Linux Kernel Mailing List, Andrew Morton, Marty Mcfadden, "Maya B . Gokhale", Jann Horn, Christoph Hellwig, Oleg Nesterov, Kirill Shutemov, Jan Kara
Subject: Re: [PATCH v3] mm/gup: Allow real explicit breaking of COW
Message-ID: <20200820215449.GB358043@xz-x1>
References: <20200811183950.10603-1-peterx@redhat.com> <20200811214255.GE6353@xz-x1>

On Tue, Aug 11, 2020 at 04:10:57PM -0700, Linus Torvalds wrote:
> On Tue, Aug 11, 2020 at 2:43 PM Peter Xu wrote:
> >
> > I don't know good enough on the reuse refactoring patch (which at least looks
> > functionally correct), but... IMHO we still need the enforced cow logic no
> > matter we refactor the page reuse logic or not, am I right?
> >
> > Example:
> >
> > - Process A & B shares private anonymous page P0
> >
> > - Process A does READ of get_user_pages() on page P0
> >
> > - Process A (e.g., another thread of process A, or as long as process A still
> > holds the page P0 somehow) writes to page P0 which triggers cow, so for
> > process A the page P0 is replaced by P1 with identical content
> >
> > Then process A still keeps the reference to page P0 that potentially belongs to
> > process B or others?
>
> The COW from process A will indeed keep a reference to page P0 (for
> whatever nefarious kernel use it did the GUP for). And yes, that page
> is still mapped into process B.
>
> HOWEVER.
>
> Since the GUP will be incrementing the reference count of said page,
> the actual problem has gone away. Because the GUP copy won't be
> modifying the page (it's a read lookup), and as long as process B only
> reads from the page, we're happily sharing a read-only page.
>
> And if process B now starts writing to it, the "page_count()" check at
> fault time will trigger, and process B will do a COW copy.
>
> So now we'll have three copies of the page: the original one is being
> kept for the GUP, and both A and B did their COW copies in their page
> tables.
>
> And that's exactly what we wanted - they are all now containing
> different data, after all.
>
> The problem with the *current* code is that we don't actually look at
> the page count at all, only the mapping count, so the GUP reference
> count is basically invisible.
>
> And the reason we don't look too closely at the page count is that
> there's a lot of incidental things that can affect it, like the whole
> KSM reference, the swap cache reference, other GUP users etc etc. So
> we historically have tried to maximize the amount of sharing we can
> do.
>
> But that "maximize sharing" is really complicated.
>
> That's the big change of that simplification patch - it's basically
> saying that "whenever _anything_ else has a reference to that page,
> we'll just copy and not even try to share".

Sorry for the late reply, and thanks for the explanations. That definitely
helped me understand.

So, which way should we go? I kind of prefer the new suggestion, which removes
code rather than adding new code. I definitely don't know enough about its side
effects, especially performance-wise for either KSM or swap, but... IIUC the
worst case is that we get a perf report later on, and it also seems easy to
revert the patch if we want to.

Thanks,

-- 
Peter Xu
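For readers following the thread, the reuse decision Linus describes can be modelled in a few lines of C. This is a toy sketch under stated assumptions, not kernel code: struct toy_page, toy_write_fault() and gup_view_after_writes() are hypothetical names, the refcount and mapcount fields only stand in for page_count() and page_mapcount(), and KSM, the swap cache and locking are ignored entirely. It replays Peter's example: A and B share P0, A takes a read-only GUP reference, then A and B each write.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy stand-in for a physical page; NOT the kernel's struct page. */
struct toy_page {
    int refcount; /* plays the role of page_count(): PTE maps + GUP refs */
    int mapcount; /* plays the role of page_mapcount(): PTEs mapping it */
    int data;     /* the page contents, reduced to one int */
};

/* Fixed pool so the sketch needs no heap allocation. */
static struct toy_page pool[4];
static int next_free;

static struct toy_page *alloc_page_copy(const struct toy_page *src)
{
    struct toy_page *p = &pool[next_free++];
    p->refcount = 1;
    p->mapcount = 1;
    p->data = src->data;
    return p;
}

/*
 * Hypothetical write fault on a write-protected private page.
 * by_page_count == false: old-style rule, looks only at the mapcount,
 *                         so a GUP reference is invisible.
 * by_page_count == true:  simplified rule, reuse only if nothing else
 *                         holds any reference at all.
 */
static struct toy_page *toy_write_fault(struct toy_page *p, bool by_page_count)
{
    assert(p->refcount >= p->mapcount); /* every mapping holds a reference */
    bool reuse = by_page_count ? p->refcount == 1 : p->mapcount == 1;
    if (reuse)
        return p;                /* write in place */
    struct toy_page *copy = alloc_page_copy(p);
    p->refcount--;               /* this PTE no longer maps p */
    p->mapcount--;
    return copy;                 /* COW: the write goes to the copy */
}

/* Peter's scenario; returns what the GUP holder reads at the end. */
static int gup_view_after_writes(bool by_page_count)
{
    next_free = 0;
    struct toy_page *p0 = &pool[next_free++];
    *p0 = (struct toy_page){ .refcount = 2, .mapcount = 2, .data = 0 };

    struct toy_page *gup = p0;   /* A: read-only GUP takes a reference */
    gup->refcount++;

    struct toy_page *a = toy_write_fault(p0, by_page_count);
    a->data = 1;                 /* A writes: COW gives A a new page P1 */

    struct toy_page *b = toy_write_fault(p0, by_page_count);
    b->data = 2;                 /* B writes: reuse P0 in place, or copy? */

    return gup->data;
}
```

With the mapcount-only rule, B's write fault sees a mapcount of 1, reuses P0 in place, and the GUP holder reads 2, i.e. data written by B. With the page_count() rule, B copies as well, so P0 keeps its original contents (0) for the GUP user and three pages end up in use, matching the "three copies" description in the thread.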