Date: Fri, 18 Sep 2020 21:28:50 -0300
From: Jason Gunthorpe
To: Linus Torvalds
Cc: Peter Xu, John Hubbard, Leon Romanovsky, Linux-MM,
    Linux Kernel Mailing List, "Maya B . Gokhale", Yang Shi,
    Marty Mcfadden, Kirill Shutemov, Oleg Nesterov, Jann Horn, Jan Kara,
    Kirill Tkhai, Andrea Arcangeli, Christoph Hellwig, Andrew Morton
Subject: Re: [PATCH 1/4] mm: Trial do_wp_page() simplification
Message-ID: <20200919002850.GA8409@ziepe.ca>
References: <20200916174804.GC8409@ziepe.ca> <20200916184619.GB40154@xz-x1>
    <20200917112538.GD8409@ziepe.ca> <20200917193824.GL8409@ziepe.ca>
    <20200918164032.GA5962@xz-x1> <20200918173240.GY8409@ziepe.ca>
    <20200918204048.GC5962@xz-x1>

On Fri, Sep 18, 2020 at 01:59:41PM -0700, Linus Torvalds wrote:

> Honestly, if we had a completely *reliable* sign of "this page is
> pinned", then I think the much nicer option would be to just say
> "pinned pages will not be copied at all". Kind of an implicit
> VM_DONTCOPY.

It would be simpler to implement, but it makes the programming model
really sketchy. For instance O_DIRECT is using FOLL_PIN, so imagine
this program:

  CPU0                                      CPU1
  a = malloc(1024);
  b = malloc(1024);
  read(fd, a, 1024); // FD is O_DIRECT
  ...                                       fork()
                                            *b = ...
  read completes

Here a and b got lucky and both come from the same page due to the
allocator.
In this case the fork() child on CPU1 would be very surprised that 'b'
was not mapped in the child. Similarly, CPU0 would have silent data
corruption if the read didn't deposit data into 'a', which is a bug we
have today. In this race the COW break of *b might steal the physical
page for the child, and *a won't see the data.

For this reason, John is right: fork needs to eventually do this for
O_DIRECT as well. The copy on fork nicely fixes all of this weird
oddball stuff.

Jason
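
As a rough illustration of the "same page" condition in the example
above, here is a minimal C sketch of how one could check whether two
buffers share a page. The helpers page_index() and same_page() are
hypothetical names made up for this note, not kernel or libc
interfaces, and the page size is passed in as a parameter rather than
assumed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a kernel or libc API): index of the page
 * that contains addr, for a page size given in bytes.
 */
static uintptr_t page_index(const void *addr, size_t page_size)
{
    assert(page_size > 0);
    return (uintptr_t)addr / page_size;
}

/*
 * True when both addresses fall inside the same page, i.e. the
 * "a and b got lucky" situation from the example program.
 */
static bool same_page(const void *a, const void *b, size_t page_size)
{
    return page_index(a, page_size) == page_index(b, page_size);
}
```

In a real program the page size would typically come from
sysconf(_SC_PAGESIZE). Whether two malloc(1024) results actually
satisfy same_page() depends on the allocator, which is the "got lucky"
part.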