From: Peter Xu <peterx@redhat.com>
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: peterx@redhat.com, Linus Torvalds, Michal Hocko, Kirill Shutemov, Jann Horn, Oleg Nesterov, Kirill Tkhai, Hugh Dickins, Leon Romanovsky, Jan Kara, John Hubbard, Christoph Hellwig, Andrew Morton, Jason Gunthorpe, Andrea Arcangeli
Subject: [PATCH 5/5] mm/thp: Split huge pmds/puds if they're pinned when fork()
Date: Mon, 21 Sep 2020 17:20:31 -0400
Message-Id: <20200921212031.25233-1-peterx@redhat.com>
In-Reply-To: <20200921211744.24758-1-peterx@redhat.com>
References: <20200921211744.24758-1-peterx@redhat.com>
MIME-Version: 1.0

Pinned pages shouldn't be write-protected when fork() happens, because
follow-up copy-on-write on these pages could cause the pinned pages to be
replaced by random newly
allocated pages.

For huge PMDs, we split the huge pmd if pinning is detected, so that future
handling will be done at the PTE level (with our latest changes, each of the
small pages will be copied).  We can achieve this by letting copy_huge_pmd()
return -EAGAIN for pinned pages, so that we'll fall through in
copy_pmd_range() and finally land in the next copy_pte_range() call.

Huge PUDs are even more special: so far anonymous pages are not supported
for them.  But they can actually be handled the same way as huge PMDs, even
though splitting a huge PUD means erasing the PUD entries.  This guarantees
that follow-up fault-ins will remap the same pages in either parent or
child later.

This might not be the most efficient way, but it should be easy and clean
enough.  It should be fine, since we're handling a very rare case, just to
make sure userspace programs that pinned some THPs will still work even
without MADV_DONTFORK and after they fork()ed.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 mm/huge_memory.c | 26 ++++++++++++++++++++++++++
 1 file changed, 26 insertions(+)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 7ff29cc3d55c..c40aac0ad87e 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1074,6 +1074,23 @@ int copy_huge_pmd(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 
 	src_page = pmd_page(pmd);
 	VM_BUG_ON_PAGE(!PageHead(src_page), src_page);
+
+	/*
+	 * If this page is a potentially pinned page, split and retry the fault
+	 * with smaller page size.  Normally this should not happen because the
+	 * userspace should use MADV_DONTFORK upon pinned regions.  This is a
+	 * best effort that the pinned pages won't be replaced by another
+	 * random page during the coming copy-on-write.
+	 */
+	if (unlikely(READ_ONCE(src_mm->has_pinned) &&
+		     page_maybe_dma_pinned(src_page))) {
+		pte_free(dst_mm, pgtable);
+		spin_unlock(src_ptl);
+		spin_unlock(dst_ptl);
+		__split_huge_pmd(vma, src_pmd, addr, false, NULL);
+		return -EAGAIN;
+	}
+
 	get_page(src_page);
 	page_dup_rmap(src_page, true);
 	add_mm_counter(dst_mm, MM_ANONPAGES, HPAGE_PMD_NR);
@@ -1177,6 +1194,15 @@ int copy_huge_pud(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 		/* No huge zero pud yet */
 	}
 
+	/* Please refer to comments in copy_huge_pmd() */
+	if (unlikely(READ_ONCE(src_mm->has_pinned) &&
+		     page_maybe_dma_pinned(pud_page(pud)))) {
+		spin_unlock(src_ptl);
+		spin_unlock(dst_ptl);
+		__split_huge_pud(vma, src_pud, addr);
+		return -EAGAIN;
+	}
+
 	pudp_set_wrprotect(src_mm, addr, src_pud);
 	pud = pud_mkold(pud_wrprotect(pud));
 	set_pud_at(dst_mm, addr, dst_pud, pud);
-- 
2.26.2