From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.3 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_SANE_1 autolearn=no autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 962DFC433DB for ; Tue, 22 Dec 2020 22:02:48 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 2A54D22AAE for ; Tue, 22 Dec 2020 22:02:47 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 2A54D22AAE Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=redhat.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 529668D0003; Tue, 22 Dec 2020 17:02:47 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 4B34C6B00BF; Tue, 22 Dec 2020 17:02:47 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 37ABB8D0003; Tue, 22 Dec 2020 17:02:47 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0196.hostedemail.com [216.40.44.196]) by kanga.kvack.org (Postfix) with ESMTP id 1B0746B00BE for ; Tue, 22 Dec 2020 17:02:47 -0500 (EST) Received: from smtpin04.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay02.hostedemail.com (Postfix) with ESMTP id D17E93632 for ; Tue, 22 Dec 2020 22:02:46 +0000 (UTC) X-FDA: 77622293532.04.self59_2d0ff0827463 Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin04.hostedemail.com (Postfix) with ESMTP id AB104800FFCF for ; Tue, 22 Dec 2020 22:02:46 +0000 (UTC) X-HE-Tag: self59_2d0ff0827463 X-Filterd-Recvd-Size: 6937 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [216.205.24.124]) by imf08.hostedemail.com (Postfix) with ESMTP for ; Tue, 22 Dec 2020 22:02:45 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1608674565; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=urX6b7Ar6vf2kG6DWZuX4Gwil4+VQz0Wx+qVJUk6lNk=; b=CpTMyTP8Ypzqz/lF66NWyQie0EmIrbmkGn6ydR7TvGk1sUcUxHbmWkd88Kner6ZlupnaG3 nqsDuiYWkg/fanlSdj9LAo08zZ8zgJPaNDcsk+q2lb62gRCEMkwQrcPelz8aQEidp2YOvC 0CeHauo5EKWt2Qjitfk9p4FOklA2SRw= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-263-bjp8P9mIOo6Shx9mSuMVew-1; Tue, 22 Dec 2020 17:02:43 -0500 X-MC-Unique: bjp8P9mIOo6Shx9mSuMVew-1 Received: from smtp.corp.redhat.com (int-mx02.intmail.prod.int.phx2.redhat.com [10.5.11.12]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id C5502801817; Tue, 22 Dec 2020 22:02:41 +0000 (UTC) Received: from mail (ovpn-112-5.rdu2.redhat.com [10.10.112.5]) by smtp.corp.redhat.com (Postfix) with ESMTPS id 7C97D60C66; Tue, 22 Dec 2020 22:02:38 +0000 (UTC) Date: Tue, 22 Dec 2020 17:02:37 -0500 From: Andrea Arcangeli To: Yu Zhao Cc: Andy Lutomirski , Linus Torvalds , Peter Xu , Nadav Amit , linux-mm , lkml , Pavel Emelyanov , Mike Kravetz , Mike Rapoport , stable , Minchan Kim , Will Deacon , Peter Zijlstra Subject: Re: [PATCH] mm/userfaultfd: fix memory corruption due to writeprotect Message-ID: References: <9E301C7C-882A-4E0F-8D6D-1170E792065A@gmail.com> <1FCC8F93-FF29-44D3-A73A-DF943D056680@gmail.com> <20201221223041.GL6640@xz-x1> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/2.0.3 (2020-12-04) X-Scanned-By: MIMEDefang 2.79 on 10.5.11.12 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On Tue, Dec 22, 2020 at 02:14:41PM -0700, Yu Zhao wrote: > This works but I don't prefer this option because 1) this is new > way of making pte_wrprotect safe and 2) it's specific to ufd and > can't be applied to clear_soft_dirty() which has no "catcher". No I didn't look into clear_soft_dirty issue, I can look into that. To avoid having to deal with a 3rd solution it will have to use one of the two: 1) avoid deferring tlb flush and enforce a sync flush before dropping the PT lock even if mm_mm_tlb_flush_pending is true -> write_protect_page in KSM 2) add its own new catcher for its own specific marker (for UFFD-WP the marker is _PAGE_UFFD_WP, for change_prot_numa is PROT_NONE on a vma->vm_pgprot not PROT_NONE, for soft dirty it could be _PAGE_SOFT_DIRTY) to send the page fault to a dead end before the pte value is interpreted. > matter how good the documentation about this new way is now, it > will be out of date, speaking from my personal experience. A common catcher for all 3 is not possible because each catcher depends on whatever marker and whatever pte value they set that may lead to a different deterministic path where to put the catcher or multiple paths even. do_numa_page requires a catcher in a different place already. Or well, a common catcher for all 3 is technically possible but it'd perform slower requiring to check things twice. But perhaps the soft_dirty can use the same catcher of uffd-wp given the similarity? > I'd go with what Nadav has -- the memory corruption problem has been > there for a while and nobody has complained except Nadav. This makes > me think people is less likely to notice any performance issues from > holding mmap_lock for write in his patch either. uffd-wp is a fairly new feature, the current users are irrelevant, keeping it optimal is just for the future potential. > But I can't say I have zero concern with the potential performance > impact given that I have been expecting the fix to go to stable, > which I care most. So the next option on my list is to have a Actually stable would be very fine to go with Nadav patch and use the mmap_lock_write unconditionally. The optimal fix is only relevant for the future potential, so it's only relevant for Linus's tree. However the feature is recent enough that it won't require a deep backport so the optimal fix is likely fine for stable as well, generally stable prefers the same fix as in the upstream when there's no major backport rejection issue. The alternative solution for uffd is to do the deferred flush under mmap_lock_write if len is > HPAGE_PMD_SIZE, or to tell change_protection not to defer the flush and to take the mmap_lock_read for <= HPAGE_PMD_SIZE. That will avoid depending on the catcher and then userfaultfd_writeprotect(mode_wp=true) userfaultfd_writeprotect(mode_wp=false) can even run in parallel at all times. The cons is large userfaultfd_writeprotect will block for I/O and those would happen at least in the postcopy live snapshotting use case. The main cons is that it'd require modification to change_protection so it actually looks more intrusive, not less. Overall anything that allows to wrprotect 1 pte with only the mmap_lock_read exactly like KSM write_protect_page, would be enough for uffd-wp. What isn't ok in terms of future potential is unconditional mmap_lock_write as in the original suggested patch in my view. It doesn't mean we can take mmap_lock_write when the operation is large and there is actually more benefit from deferring the flush. > common "catcher" in do_wp_page() which singles out pages that have > page_mapcount equal to one and reuse them instead of trying to I don't think the page_mapcount matters here. If the wp page reuse was more accurate (as it was before) we wouldn't notice this issue, but it still would be a bug that there were stale TLB entries. It worked by luck. Thanks, Andrea