Date: Tue, 22 Sep 2020 09:05:05 -0300
From: Jason Gunthorpe
To: Peter Xu
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Linus Torvalds,
	Michal Hocko, Kirill Shutemov, Jann Horn, Oleg Nesterov,
	Kirill Tkhai, Hugh Dickins, Leon Romanovsky, Jan Kara,
	John Hubbard, Christoph Hellwig, Andrew Morton, Andrea Arcangeli
Subject: Re: [PATCH 5/5] mm/thp: Split huge pmds/puds if they're pinned when fork()
Message-ID: <20200922120505.GH8409@ziepe.ca>
References: <20200921211744.24758-1-peterx@redhat.com>
	<20200921212031.25233-1-peterx@redhat.com>
In-Reply-To: <20200921212031.25233-1-peterx@redhat.com>

On Mon, Sep 21, 2020 at 05:20:31PM -0400, Peter Xu wrote:
> Pinned pages shouldn't be write-protected when fork() happens, because follow
> up copy-on-write on these pages could cause the pinned pages to be replaced by
> random newly allocated pages.
>
> For huge PMDs, we split the huge pmd if pinning is detected. So that future
> handling will be done by the PTE level (with our latest changes, each of the
> small pages will be copied). We can achieve this by let copy_huge_pmd() return
> -EAGAIN for pinned pages, so that we'll fallthrough in copy_pmd_range() and
> finally land the next copy_pte_range() call.
>
> Huge PUDs will be even more special - so far it does not support anonymous
> pages. But it can actually be done the same as the huge PMDs even if the split
> huge PUDs means to erase the PUD entries. It'll guarantee the follow up fault
> ins will remap the same pages in either parent/child later.
>
> This might not be the most efficient way, but it should be easy and clean
> enough. It should be fine, since we're tackling with a very rare case just to
> make sure userspaces that pinned some thps will still work even without
> MADV_DONTFORK and after they fork()ed.
>
> Signed-off-by: Peter Xu
> ---
>  mm/huge_memory.c | 26 ++++++++++++++++++++++++++
>  1 file changed, 26 insertions(+)
>
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 7ff29cc3d55c..c40aac0ad87e 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -1074,6 +1074,23 @@ int copy_huge_pmd(struct mm_struct *dst_mm, struct mm_struct *src_mm,
>
>  	src_page = pmd_page(pmd);
>  	VM_BUG_ON_PAGE(!PageHead(src_page), src_page);
> +
> +	/*
> +	 * If this page is a potentially pinned page, split and retry the fault
> +	 * with smaller page size. Normally this should not happen because the
> +	 * userspace should use MADV_DONTFORK upon pinned regions. This is a
> +	 * best effort that the pinned pages won't be replaced by another
> +	 * random page during the coming copy-on-write.
> +	 */
> +	if (unlikely(READ_ONCE(src_mm->has_pinned) &&
> +		     page_maybe_dma_pinned(src_page))) {
> +		pte_free(dst_mm, pgtable);
> +		spin_unlock(src_ptl);
> +		spin_unlock(dst_ptl);
> +		__split_huge_pmd(vma, src_pmd, addr, false, NULL);
> +		return -EAGAIN;
> +	}

Not sure why, but the PMD stuff here is not calling is_cow_mapping()
before doing the write protect. Seems like it might be an existing bug?

In any event, the has_pinned logic shouldn't be used without also
checking is_cow_mapping(), so it should be added to that test.

Same remarks for PUD

Jason
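
A minimal sketch of the combined check being suggested above, assuming the
same copy_huge_pmd() context as the quoted hunk (the helper name
pmd_want_cow_split() is invented for illustration, and has_pinned comes from
this patch series, not the mainline tree of the time):

/*
 * Illustration only: gate the pinned-page heuristic from the quoted
 * patch on is_cow_mapping(), since only private, potentially writable
 * mappings have their entries write-protected at fork() time.
 */
static bool pmd_want_cow_split(struct vm_area_struct *vma,
			       struct mm_struct *src_mm,
			       struct page *src_page)
{
	if (!is_cow_mapping(vma->vm_flags))
		return false;

	/* Same racy-but-conservative heuristic as the quoted hunk. */
	return READ_ONCE(src_mm->has_pinned) &&
	       page_maybe_dma_pinned(src_page);
}

The quoted hunk would then test pmd_want_cow_split(vma, src_mm, src_page)
instead of the raw has_pinned/page_maybe_dma_pinned pair, and the huge-PUD
path would gain the same guard.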
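
For reference, the fallthrough that the quoted commit message relies on looks
roughly like this in copy_pmd_range(): when copy_huge_pmd() returns -EAGAIN,
the huge-PMD branch neither errors out nor continues, so the loop drops down
to the PTE-level copy. This is a simplified sketch of the 5.9-era mm/memory.c
loop, not a verbatim excerpt:

	if (pmd_trans_huge(*src_pmd) || pmd_devmap(*src_pmd)) {
		int err = copy_huge_pmd(dst_mm, src_mm, dst_pmd, src_pmd,
					addr, vma);
		if (err == -ENOMEM)
			return -ENOMEM;
		if (!err)
			continue;	/* whole huge PMD copied */
		/* -EAGAIN: the PMD was just split, copy it as small pages */
	}
	if (copy_pte_range(dst_mm, src_mm, dst_pmd, src_pmd, vma,
			   addr, next))
		return -ENOMEM;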