From mboxrd@z Thu Jan  1 00:00:00 1970
Subject: Re: [PATCH] mm/userfaultfd: fix memory corruption due to writeprotect
From: Nadav Amit
Date: Wed, 23 Dec 2020 20:37:33 -0800
To: Yu Zhao
Cc: Andrea Arcangeli, Andy Lutomirski, Andy Lutomirski, Linus Torvalds,
 Peter Xu, linux-mm, lkml, Pavel Emelyanov, Mike Kravetz, Mike Rapoport,
 stable, Minchan Kim, Will Deacon, Peter Zijlstra
References: <3A6A1049-24C6-4B2D-8C59-21B549F742B4@gmail.com>

> On Dec 23, 2020, at 7:34 PM, Yu Zhao wrote:
> 
> On Wed, Dec 23, 2020 at 07:09:10PM -0800, Nadav Amit wrote:
>>> On Dec 23, 2020, at 6:00 PM, Andrea Arcangeli wrote:
>>> 
>>> On Wed, Dec 23, 2020 at 05:21:43PM -0800, Andy Lutomirski wrote:
>>>> I don't love this as a long-term fix. AFAICT we can have
>>>> mm_tlb_flush_pending set for quite a while -- mprotect seems like it
>>>> can wait in IO while splitting a huge page, for example. That gives
>>>> us a window in which every write fault turns into a TLB flush.
>>> 
>>> mprotect can't run concurrently with a page fault in the first place.
>>> 
>>> One other near-zero-cost improvement that would be easy to add, if we
>>> go with this, is "if (vma->vm_flags & (VM_SOFTDIRTY|VM_UFFD_WP))", and
>>> it could also be made conditional on the two config options.
>>> 
>>> Still, I don't mind doing it in some other way; uffd-wp in fact has a
>>> much easier time doing it in another way.
>>> 
>>> Whatever performs better is fine, but queuing up pending invalidate
>>> ranges doesn't look very attractive, since it'd be a fixed cost that
>>> we'd always have to pay even when there's no fault (and there can't
>>> be any fault, at least for mprotect).
>> 
>> I think there are other cases in which Andy's concern is relevant
>> (MADV_PAGEOUT).
> 
> That patch only demonstrates a rough idea, and I should have
> elaborated: if we ever decide to go in that direction, we only need to
> worry about "jumping through hoops", because the final patch (set)
> I have in mind would not only have the build-time optimization Andrea
> suggested but also include runtime optimizations like skipping the
> do_swap_page() path and the (!PageAnon() || page_mapcount > 1) case.
> Rest assured, the performance impact on do_wp_page() from an occasional
> additional TLB flush on top of a page copy is negligible.

I agree with you to a certain extent, since there is anyhow another TLB
flush on this path when the PTE is set after the copy. Yet, I think that
having a combined and efficient central mechanism for pending TLB
flushes is important, even just for robustness: it would prevent the
development of new, independent deferred-flushing schemes.
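For reference, and to keep this thread self-contained, the write-fault-side
check we have been discussing would roughly look like the sketch below. This
is only my illustration under my own assumptions: cow_flush_stale_tlb() is a
made-up helper name, and the real check would have to sit in
do_wp_page()/wp_page_copy() under the page-table lock:

/*
 * Sketch only: flush a possibly stale, still-writable TLB entry before
 * copying the page, but only for VMAs that can have batched
 * write-protection in flight (Andrea's vm_flags filter).  Assumes the
 * context of mm/memory.c, which already pulls in <linux/mm.h> and
 * <asm/tlbflush.h>.
 */
static void cow_flush_stale_tlb(struct vm_fault *vmf)
{
        struct vm_area_struct *vma = vmf->vma;

        if (!(vma->vm_flags & (VM_SOFTDIRTY | VM_UFFD_WP)))
                return;

        if (mm_tlb_flush_pending(vma->vm_mm))
                flush_tlb_page(vma, vmf->address);
}

Whether the vm_flags filter is worthwhile mostly depends on how long
mm_tlb_flush_pending() can realistically stay set, per Andy's concern above.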
I specifically do not like tlb_flush_batched, which confuses me every
time I look at it. For example, the following code completely confuses
me:

void flush_tlb_batched_pending(struct mm_struct *mm)
{
        if (data_race(mm->tlb_flush_batched)) {
                flush_tlb_mm(mm);

                /*
                 * Do not allow the compiler to re-order the clearing of
                 * tlb_flush_batched before the tlb is flushed.
                 */
                barrier();
                mm->tlb_flush_batched = false;
        }
}

... and then I ask myself (with no answer):

1. What prevents a concurrent flush_tlb_batched_pending(), called for
   instance from madvise_free_pte_range(), from clearing a new
   deferred-flush indication that was just set by
   set_tlb_ubc_flush_pending()? Can it cause a missed TLB flush later?

2. Why is the write to tlb_flush_batched not done with WRITE_ONCE()?

3. Should we have smp_wmb() instead of barrier()? (Probably the
   barrier() is not needed at all, since flush_tlb_mm() serializes if a
   flush is needed.) A sketch of what I mean by 2 and 3 follows below.

4. Why do we need two deferred TLB-flushing mechanisms?
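To make questions 2 and 3 concrete, this is roughly the shape I would
have naively expected. It is only an illustration of what I am asking,
not a patch I have tested or even compiled:

void flush_tlb_batched_pending(struct mm_struct *mm)
{
        if (data_race(mm->tlb_flush_batched)) {
                flush_tlb_mm(mm);

                /*
                 * (3) If ordering against other CPUs matters here, would
                 * smp_wmb() be the right barrier?  Or is no barrier
                 * needed at all, because flush_tlb_mm() already
                 * serializes?
                 */
                smp_wmb();

                /* (2) Mark the store the same way the load above is marked. */
                WRITE_ONCE(mm->tlb_flush_batched, false);
        }
}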