From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-14.8 required=3.0 tests=BAYES_00,DKIMWL_WL_MED, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_SANE_1, USER_IN_DEF_DKIM_WL autolearn=no autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id E5501C433B4 for ; Fri, 9 Apr 2021 03:08:31 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id C1A27610CC for ; Fri, 9 Apr 2021 03:08:31 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233167AbhDIDIm (ORCPT ); Thu, 8 Apr 2021 23:08:42 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48374 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233153AbhDIDIj (ORCPT ); Thu, 8 Apr 2021 23:08:39 -0400 Received: from mail-oi1-x232.google.com (mail-oi1-x232.google.com [IPv6:2607:f8b0:4864:20::232]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 416B6C061760 for ; Thu, 8 Apr 2021 20:08:23 -0700 (PDT) Received: by mail-oi1-x232.google.com with SMTP id n8so4357437oie.10 for ; Thu, 08 Apr 2021 20:08:23 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=date:from:to:cc:subject:in-reply-to:message-id:references :user-agent:mime-version; bh=9uthe1FxOzaDy+lszfFzdCpFcyILmV2zQJPxB5oi21c=; b=DpTO01Wvt3BiLQVzhjRo0O2uCjpzLu70Np53FubkoZ5Wdd4qB2TR4aq3G9EAuD5kxF oKF1DQSbaeH0Wsxv1mpYjNa8XRaKmzmSNp+vwuza6oodi2c2AAqBt3OD47SqSSS0tFI+ K8jBnnpO/rDXkTN3+NWWKrz9bwvrhtEGv5yrBReSjvO+JY9IFv16d59VPPOw02lRkAhs r/nWyOyUH2x3myijAHfpBX7wXT2scNBUXRNVhtTADzi4YG989Y465H+/RBYqs3ljSd4H EPWJuyGBLUvAy+2/j+FPNu1synqnp1lefptHLDgoGHKqKglfmH32ligoKuhKT5My+M2X 2WUA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:in-reply-to:message-id :references:user-agent:mime-version; bh=9uthe1FxOzaDy+lszfFzdCpFcyILmV2zQJPxB5oi21c=; b=DeBN/D2FA+ZSEvTbfb1/PfXz0ImubEbkZY+AUIO7dvrkBFqiLY5PMZ5JAQT70BrqaS EYOHOmMk6U3LWN7j26qNT3mvAnd7/IUo44ykyfYgEpN8K0MEDQkKnmg3bNgagGCwiB/c if26ppjO2v2JpNdcqKo5jpueNzngeP1rp9l4svuQhH86XHqdLw82nLJQodxj5pJZ4zLu YvqhhEG6rAGRSIp5/xkBETWD52rAZO/AYW36AQ/ztRTvGQ/+J2tzDz8hH+PvZzT4mh+c sjaZTdqYajFh8HtCKdFNIqOiDv3sQeK+xhi/vu6BVnaL8vXzTwhCAMlzb4XcHtxPLDw4 pN2Q== X-Gm-Message-State: AOAM532gQaEmru0/gS5tiG7qMpg0DLlBplNv5neeEqjYrPHDtXbFfsK6 aFNpxPK7aZLz992RooMrDpm41Q== X-Google-Smtp-Source: ABdhPJwuyOcM3WhiOtVZEWdmStl8yBE9r5JSQp/xMohf6HKVzzwFaVwovIdg+etJ8w9iw+UpuDnMSw== X-Received: by 2002:aca:db41:: with SMTP id s62mr8559824oig.54.1617937702431; Thu, 08 Apr 2021 20:08:22 -0700 (PDT) Received: from eggly.attlocal.net (172-10-233-147.lightspeed.sntcca.sbcglobal.net. [172.10.233.147]) by smtp.gmail.com with ESMTPSA id f29sm327464ots.22.2021.04.08.20.08.20 (version=TLS1 cipher=ECDHE-ECDSA-AES128-SHA bits=128/128); Thu, 08 Apr 2021 20:08:21 -0700 (PDT) Date: Thu, 8 Apr 2021 20:08:07 -0700 (PDT) From: Hugh Dickins X-X-Sender: hugh@eggly.anvils To: Axel Rasmussen cc: Peter Xu , Alexander Viro , Andrea Arcangeli , Andrew Morton , Hugh Dickins , Jerome Glisse , Joe Perches , Lokesh Gidra , Mike Rapoport , Shaohua Li , Shuah Khan , Stephen Rothwell , Wang Qing , LKML , linux-fsdevel@vger.kernel.org, Linux MM , linux-kselftest@vger.kernel.org, Brian Geffon , Cannon Matthews , "Dr . David Alan Gilbert" , David Rientjes , Michel Lespinasse , Mina Almasry , Oliver Upton Subject: Re: [PATCH v5] userfaultfd/shmem: fix MCOPY_ATOMIC_CONTINUE behavior In-Reply-To: Message-ID: References: <20210405171917.2423068-1-axelrasmussen@google.com> <20210406234949.GN628002@xz-x1> User-Agent: Alpine 2.11 (LSU 23 2013-08-11) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Thu, 8 Apr 2021, Axel Rasmussen wrote: > On Tue, Apr 6, 2021 at 4:49 PM Peter Xu wrote: > > On Mon, Apr 05, 2021 at 10:19:17AM -0700, Axel Rasmussen wrote: ... > > > --- a/mm/userfaultfd.c > > > +++ b/mm/userfaultfd.c ... > > > + > > > + if (is_file_backed) { > > > + /* The shmem MAP_PRIVATE case requires checking the i_size */ > > > + inode = dst_vma->vm_file->f_inode; > > > + offset = linear_page_index(dst_vma, dst_addr); > > > + max_off = DIV_ROUND_UP(i_size_read(inode), PAGE_SIZE); > > > + ret = -EFAULT; > > > + if (unlikely(offset >= max_off)) > > > + goto out_unlock; > > > > Frankly I don't really know why this must be put into pgtable lock.. Since if > > not required then it can be moved into UFFDIO_COPY path, as CONTINUE doesn't > > need it iiuc. Just raise it up as a pure question. > > It's not clear to me either. shmem_getpage_gfp() does check this twice > kinda like we're doing, but it doesn't ever touch the PTL. What it > seems to be worried about is, what happens if a concurrent > FALLOC_FL_PUNCH_HOLE happens somewhere in the middle of whatever > manipulation we're doing? From looking at shmem_fallocate(), I think > the basic point is that truncation happens while "inode_lock(inode)" > is held, but neither shmem_mcopy_atomic_pte() or the new > mcopy_atomic_install_ptes() take that lock. > > I'm a bit hesitant to just remove it, run some tests, and then declare > victory, because it seems plausible it's there to catch some > semi-hard-to-induce race. I'm not sure how to prove that *isn't* > needed, so my inclination is to just keep it? > > I'll send a series addressing the feedback so far this afternoon, and > I'll leave this alone for now - at least, it doesn't seem to hurt > anything. Maybe Hugh or someone else has some more advice about it. If > so, I'm happy to remove it in a follow-up. It takes some thinking about, but the i_size check is required to be under the pagetable lock, for the MAP_PRIVATE UFFDIO_COPY path, where it is inserting an anonymous page into the file-backed vma (skipping actually inserting a page into page cache, as an ordinary fault would). Not because of FALLOC_FL_PUNCH_HOLE (which makes no change to i_size; and it's okay if a race fills in the hole immediately afterwards), but because of truncation (which must remove all beyond i_size). In the MAP_SHARED case, with a locked page inserted into page cache, the page lock is enough to exclude concurrent truncation. But even in that case the second i_size check (I'm looking at 5.12-rc's shmem_mfill_atomic_pte(), rather than recent patches which might differ) is required: because the first i_size check was done before the page became visible in page cache, so a concurrent truncation could miss it). Maybe that first check is redundant, though I'm definitely for doing it; or maybe shmem_add_to_page_cache() would be better if it made that check itself, under xas_lock (I think the reason it does not is historical). The second check, in the MAP_SHARED case, does not need to be under pagetable lock - the page lock on the page cache page is enough - but probably Andrea placed it there to resemble the anonymous case. You might then question, how come there is no i_size check in all of mm/memory.c, where ordinary faulting is handled. I'll answer that the pte_same() check, under pagetable lock in wp_page_copy(), is where the equivalent to userfaultfd's MAP_PRIVATE UFFDIO_COPY check is made: if the page cache page has already been truncated, that pte will have been cleared. Or, if the page cache page is truncated an instant after wp_page_copy() drops page table lock, then the unmap_mapping_range(,,, even_cows = 1) which follows truncation has to clean it up. Er, does that mean that the i_size check I started off insisting is required, actually is not required? Um, maybe, but let's just keep it and say goodnight! Hugh From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-14.8 required=3.0 tests=BAYES_00,DKIMWL_WL_MED, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_SANE_1, USER_IN_DEF_DKIM_WL autolearn=no autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 3F2F5C433ED for ; Fri, 9 Apr 2021 03:08:25 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id BBC39611AF for ; Fri, 9 Apr 2021 03:08:24 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org BBC39611AF Authentication-Results: mail.kernel.org; dmarc=fail (p=reject dis=none) header.from=google.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 4E69D6B006C; Thu, 8 Apr 2021 23:08:24 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 496936B006E; Thu, 8 Apr 2021 23:08:24 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 30FF66B0070; Thu, 8 Apr 2021 23:08:24 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0151.hostedemail.com [216.40.44.151]) by kanga.kvack.org (Postfix) with ESMTP id 16ECB6B006C for ; Thu, 8 Apr 2021 23:08:24 -0400 (EDT) Received: from smtpin21.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay05.hostedemail.com (Postfix) with ESMTP id BF4231801D305 for ; Fri, 9 Apr 2021 03:08:23 +0000 (UTC) X-FDA: 78011345286.21.E18284E Received: from mail-oi1-f182.google.com (mail-oi1-f182.google.com [209.85.167.182]) by imf07.hostedemail.com (Postfix) with ESMTP id 1C6CFA0000FD for ; Fri, 9 Apr 2021 03:08:23 +0000 (UTC) Received: by mail-oi1-f182.google.com with SMTP id k25so4397008oic.4 for ; Thu, 08 Apr 2021 20:08:23 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=date:from:to:cc:subject:in-reply-to:message-id:references :user-agent:mime-version; bh=9uthe1FxOzaDy+lszfFzdCpFcyILmV2zQJPxB5oi21c=; b=DpTO01Wvt3BiLQVzhjRo0O2uCjpzLu70Np53FubkoZ5Wdd4qB2TR4aq3G9EAuD5kxF oKF1DQSbaeH0Wsxv1mpYjNa8XRaKmzmSNp+vwuza6oodi2c2AAqBt3OD47SqSSS0tFI+ K8jBnnpO/rDXkTN3+NWWKrz9bwvrhtEGv5yrBReSjvO+JY9IFv16d59VPPOw02lRkAhs r/nWyOyUH2x3myijAHfpBX7wXT2scNBUXRNVhtTADzi4YG989Y465H+/RBYqs3ljSd4H EPWJuyGBLUvAy+2/j+FPNu1synqnp1lefptHLDgoGHKqKglfmH32ligoKuhKT5My+M2X 2WUA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:in-reply-to:message-id :references:user-agent:mime-version; bh=9uthe1FxOzaDy+lszfFzdCpFcyILmV2zQJPxB5oi21c=; b=deKHjH8KhDBFhbxog9PPys5NVJPted5AkeZacHqbTB3xnHQjrnvpgnNzfAnyHK1SHd qGelUSOU1mwZ2nCYicrfnhzh2b4W+p1hV6nfvtMkCUl0N3MScGRQEsNCY7isa+hlG6oe Sfhzskajmvoqv3NvJS4jKXy3jlY6ibFzTILnu/LseiJiZySCcXLt9Dqp4sPCT/YByWwv r1wAPB9+dYlkFWpq0smiM//X4qJbGwnPgycwdWFzxviuKES5lzSuOtijUVevBgQBq73b Bup1zEDwi7EgxhUVdE/EuOTsj/nddhWFD4l+DXhBxNbSdG9vd9R9o3k+Sqgnwrl4exxG Qscw== X-Gm-Message-State: AOAM533Xw/3nhLTVNz2rS7KBjcLw1WXU77RfU+TA1KT/OEYK6LILEqhU KmbiaCZzCpDsD8Ej4anJYzcusw== X-Google-Smtp-Source: ABdhPJwuyOcM3WhiOtVZEWdmStl8yBE9r5JSQp/xMohf6HKVzzwFaVwovIdg+etJ8w9iw+UpuDnMSw== X-Received: by 2002:aca:db41:: with SMTP id s62mr8559824oig.54.1617937702431; Thu, 08 Apr 2021 20:08:22 -0700 (PDT) Received: from eggly.attlocal.net (172-10-233-147.lightspeed.sntcca.sbcglobal.net. [172.10.233.147]) by smtp.gmail.com with ESMTPSA id f29sm327464ots.22.2021.04.08.20.08.20 (version=TLS1 cipher=ECDHE-ECDSA-AES128-SHA bits=128/128); Thu, 08 Apr 2021 20:08:21 -0700 (PDT) Date: Thu, 8 Apr 2021 20:08:07 -0700 (PDT) From: Hugh Dickins X-X-Sender: hugh@eggly.anvils To: Axel Rasmussen cc: Peter Xu , Alexander Viro , Andrea Arcangeli , Andrew Morton , Hugh Dickins , Jerome Glisse , Joe Perches , Lokesh Gidra , Mike Rapoport , Shaohua Li , Shuah Khan , Stephen Rothwell , Wang Qing , LKML , linux-fsdevel@vger.kernel.org, Linux MM , linux-kselftest@vger.kernel.org, Brian Geffon , Cannon Matthews , "Dr . David Alan Gilbert" , David Rientjes , Michel Lespinasse , Mina Almasry , Oliver Upton Subject: Re: [PATCH v5] userfaultfd/shmem: fix MCOPY_ATOMIC_CONTINUE behavior In-Reply-To: Message-ID: References: <20210405171917.2423068-1-axelrasmussen@google.com> <20210406234949.GN628002@xz-x1> User-Agent: Alpine 2.11 (LSU 23 2013-08-11) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-Rspamd-Server: rspam03 X-Rspamd-Queue-Id: 1C6CFA0000FD X-Stat-Signature: 37o85hfz8pxjbxfbskckbqgqz5s1jy8f Received-SPF: none (google.com>: No applicable sender policy available) receiver=imf07; identity=mailfrom; envelope-from=""; helo=mail-oi1-f182.google.com; client-ip=209.85.167.182 X-HE-DKIM-Result: pass/pass X-HE-Tag: 1617937703-758 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On Thu, 8 Apr 2021, Axel Rasmussen wrote: > On Tue, Apr 6, 2021 at 4:49 PM Peter Xu wrote: > > On Mon, Apr 05, 2021 at 10:19:17AM -0700, Axel Rasmussen wrote: ... > > > --- a/mm/userfaultfd.c > > > +++ b/mm/userfaultfd.c ... > > > + > > > + if (is_file_backed) { > > > + /* The shmem MAP_PRIVATE case requires checking the i_size */ > > > + inode = dst_vma->vm_file->f_inode; > > > + offset = linear_page_index(dst_vma, dst_addr); > > > + max_off = DIV_ROUND_UP(i_size_read(inode), PAGE_SIZE); > > > + ret = -EFAULT; > > > + if (unlikely(offset >= max_off)) > > > + goto out_unlock; > > > > Frankly I don't really know why this must be put into pgtable lock.. Since if > > not required then it can be moved into UFFDIO_COPY path, as CONTINUE doesn't > > need it iiuc. Just raise it up as a pure question. > > It's not clear to me either. shmem_getpage_gfp() does check this twice > kinda like we're doing, but it doesn't ever touch the PTL. What it > seems to be worried about is, what happens if a concurrent > FALLOC_FL_PUNCH_HOLE happens somewhere in the middle of whatever > manipulation we're doing? From looking at shmem_fallocate(), I think > the basic point is that truncation happens while "inode_lock(inode)" > is held, but neither shmem_mcopy_atomic_pte() or the new > mcopy_atomic_install_ptes() take that lock. > > I'm a bit hesitant to just remove it, run some tests, and then declare > victory, because it seems plausible it's there to catch some > semi-hard-to-induce race. I'm not sure how to prove that *isn't* > needed, so my inclination is to just keep it? > > I'll send a series addressing the feedback so far this afternoon, and > I'll leave this alone for now - at least, it doesn't seem to hurt > anything. Maybe Hugh or someone else has some more advice about it. If > so, I'm happy to remove it in a follow-up. It takes some thinking about, but the i_size check is required to be under the pagetable lock, for the MAP_PRIVATE UFFDIO_COPY path, where it is inserting an anonymous page into the file-backed vma (skipping actually inserting a page into page cache, as an ordinary fault would). Not because of FALLOC_FL_PUNCH_HOLE (which makes no change to i_size; and it's okay if a race fills in the hole immediately afterwards), but because of truncation (which must remove all beyond i_size). In the MAP_SHARED case, with a locked page inserted into page cache, the page lock is enough to exclude concurrent truncation. But even in that case the second i_size check (I'm looking at 5.12-rc's shmem_mfill_atomic_pte(), rather than recent patches which might differ) is required: because the first i_size check was done before the page became visible in page cache, so a concurrent truncation could miss it). Maybe that first check is redundant, though I'm definitely for doing it; or maybe shmem_add_to_page_cache() would be better if it made that check itself, under xas_lock (I think the reason it does not is historical). The second check, in the MAP_SHARED case, does not need to be under pagetable lock - the page lock on the page cache page is enough - but probably Andrea placed it there to resemble the anonymous case. You might then question, how come there is no i_size check in all of mm/memory.c, where ordinary faulting is handled. I'll answer that the pte_same() check, under pagetable lock in wp_page_copy(), is where the equivalent to userfaultfd's MAP_PRIVATE UFFDIO_COPY check is made: if the page cache page has already been truncated, that pte will have been cleared. Or, if the page cache page is truncated an instant after wp_page_copy() drops page table lock, then the unmap_mapping_range(,,, even_cows = 1) which follows truncation has to clean it up. Er, does that mean that the i_size check I started off insisting is required, actually is not required? Um, maybe, but let's just keep it and say goodnight! Hugh