From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,MAILING_LIST_MULTI, SPF_HELO_NONE,SPF_PASS autolearn=no autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id A637DC433E0 for ; Wed, 23 Dec 2020 22:46:06 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 285B022288 for ; Wed, 23 Dec 2020 22:46:06 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 285B022288 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=gmail.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 7C41D8D0056; Wed, 23 Dec 2020 17:46:05 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 775408D0026; Wed, 23 Dec 2020 17:46:05 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 663AA8D0056; Wed, 23 Dec 2020 17:46:05 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0173.hostedemail.com [216.40.44.173]) by kanga.kvack.org (Postfix) with ESMTP id 50AD58D0026 for ; Wed, 23 Dec 2020 17:46:05 -0500 (EST) Received: from smtpin28.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay04.hostedemail.com (Postfix) with ESMTP id 1A8F61EF3 for ; Wed, 23 Dec 2020 22:46:05 +0000 (UTC) X-FDA: 77626031490.28.wheel77_63129952746c Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin28.hostedemail.com (Postfix) with ESMTP id EFB826C33 for ; Wed, 23 Dec 2020 22:46:04 +0000 (UTC) X-HE-Tag: wheel77_63129952746c X-Filterd-Recvd-Size: 7511 Received: from mail-pl1-f174.google.com (mail-pl1-f174.google.com [209.85.214.174]) by imf36.hostedemail.com (Postfix) with ESMTP for ; Wed, 23 Dec 2020 22:46:04 +0000 (UTC) Received: by mail-pl1-f174.google.com with SMTP id v3so378580plz.13 for ; Wed, 23 Dec 2020 14:46:04 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=mime-version:subject:from:in-reply-to:date:cc :content-transfer-encoding:message-id:references:to; bh=0gM77z9qiu4cf/vMjZqQd+73V4+IFV7TM5e+1+zFyXg=; b=JiNrZGyyOlrL/TtQQdx4zb8cxFJhOqwSFGR5bkKgQYmxswgl0plS1h4kxnM7UzZfYc R2aXM4rP+lWe0Im9ueqwS8A7N1WMrKY+0v3ad1Kky9K8W7i9gurhdqbIYQUTkqG7w329 TLnBblDTfyY3PAQbULjAM+cbIlopqLzzn6S8Tt4KGaN/7zLqnUFfIUvy9/O9cNeyTMCn xzn1CqWEJd0eBgMsjVJodQUm2GbzsGs/wajEM7qi2e0JLvbm0XNvxZBYBLULtSelJ4E8 VfJbQmQ4PbB7rBb3GvslMU4vyCB8Vo2HRz2wA/HPntbHx0so5d9xgdEe4glx5lIYs0du UtAg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:subject:from:in-reply-to:date:cc :content-transfer-encoding:message-id:references:to; bh=0gM77z9qiu4cf/vMjZqQd+73V4+IFV7TM5e+1+zFyXg=; b=pf4EmO76Yv+3UHFoGqUiEOA8ibOEOEU+1gXqA1B9X6FCfdG2rnDRgJcpP0t/1FDfgB qjEnL4C+K7lmOwt2Okizmu1ffP0Ta055U/+R/tZsXGXxSbj+SFEcBYZ/3GIzIkV0Q1ar XJNH+l3RUDv6z26jmYIATwD6fPZE+NC9jF21XeQDQIr3lcFPQaFF0152dpS0AGksI6QJ qEK3L4t6hCRZEKcKs3bxOz5D5gV7HjclZqM/78ExvCcTl3SK1saaOiWCvtG59FDFt609 4myda0+SxgioTOJqDuSG124anm6QsQ6S3IqOfdNL2crYc2U0fceOy4SqoWICrdLMqrby M5MA== X-Gm-Message-State: AOAM531DbVJAjoFFplSNUT9lwsQwmzd9HjRHymZXEzyrMyqBwdQyNK2J 0xqvYgWCVTpJ7S5TkkxoZOE= X-Google-Smtp-Source: ABdhPJxxN/by8wSi4sx7VGSfiAMnkSzB+vIfE15e3a+L8kGj7riJDxTJ9H711r1N1k6yacIAMP+79w== X-Received: by 2002:a17:90b:11d7:: with SMTP id gv23mr1619814pjb.2.1608763563202; Wed, 23 Dec 2020 14:46:03 -0800 (PST) Received: from ?IPv6:2601:647:4700:9b2:50a2:5929:401b:705e? ([2601:647:4700:9b2:50a2:5929:401b:705e]) by smtp.gmail.com with ESMTPSA id z7sm25119015pfq.193.2020.12.23.14.46.00 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128); Wed, 23 Dec 2020 14:46:01 -0800 (PST) Content-Type: text/plain; charset=utf-8 Mime-Version: 1.0 (Mac OS X Mail 13.4 \(3608.120.23.2.4\)) Subject: Re: [PATCH] mm/userfaultfd: fix memory corruption due to writeprotect From: Nadav Amit In-Reply-To: Date: Wed, 23 Dec 2020 14:45:59 -0800 Cc: Yu Zhao , Peter Zijlstra , Minchan Kim , Linus Torvalds , Peter Xu , linux-mm , lkml , Pavel Emelyanov , Mike Kravetz , Mike Rapoport , stable , Andy Lutomirski , Will Deacon Content-Transfer-Encoding: quoted-printable Message-Id: References: <20201221172711.GE6640@xz-x1> <76B4F49B-ED61-47EA-9BE4-7F17A26B610D@gmail.com> <9E301C7C-882A-4E0F-8D6D-1170E792065A@gmail.com> To: Andrea Arcangeli X-Mailer: Apple Mail (2.3608.120.23.2.4) X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: > On Dec 23, 2020, at 2:05 PM, Andrea Arcangeli = wrote: >=20 > On Tue, Dec 22, 2020 at 04:40:32AM -0800, Nadav Amit wrote: >>> On Dec 21, 2020, at 1:24 PM, Yu Zhao wrote: >>>=20 >>> On Mon, Dec 21, 2020 at 12:26:22PM -0800, Linus Torvalds wrote: >>>> On Mon, Dec 21, 2020 at 12:23 PM Nadav Amit = wrote: >>>>> Using mmap_write_lock() was my initial fix and there was a strong = pushback >>>>> on this approach due to its potential impact on performance. >>>>=20 >>>> =46rom whom? >>>>=20 >>>> Somebody who doesn't understand that correctness is more important >>>> than performance? And that userfaultfd is not the most important = part >>>> of the system? >>>>=20 >>>> The fact is, userfaultfd is CLEARLY BUGGY. >>>>=20 >>>> Linus >>>=20 >>> Fair enough. >>>=20 >>> Nadav, for your patch (you might want to update the commit message). >>>=20 >>> Reviewed-by: Yu Zhao >>>=20 >>> While we are all here, there is also clear_soft_dirty() that could >>> use a similar fix=E2=80=A6 >>=20 >> Just an update as for why I have still not sent v2: I fixed >> clear_soft_dirty(), created a reproducer, and the reproducer kept = failing. >>=20 >> So after some debugging, it appears that clear_refs_write() does not = flush >> the TLB. It indeed calls tlb_finish_mmu() but since 0758cd830494 >> ("asm-generic/tlb: avoid potential double flush=E2=80=9D), = tlb_finish_mmu() does not >> flush the TLB since there is clear_refs_write() does not call to >> __tlb_adjust_range() (unless there are nested TLBs are pending). >>=20 >> So I have a patch for this issue too: arguably the tlb_gather = interface is >> not the right one for clear_refs_write() that does not clear PTEs but >> changes them. >>=20 >> Yet, sadly, my reproducer keeps falling (less frequently, but still). = So I >> will keep debugging to see what goes wrong. I will send v2 once I = figure out >> what the heck is wrong in the code or my reproducer. >=20 > If you put the page_mapcount check back in do_wp_page instead of > page_count, it'll stop reproducing but the bug is still very much > there... I know. I designed the (re)producer just to be able to hit the bug = without changing the kernel (and test my fix), but I am fully aware that it = would not work on older kernels although the bug is still there. > And then we'll have to enforce uffd-wp cannot be registered if > VM_SOFTDIRTY is set or the other way around so that VM_UFFD* is > mutually exclusive with VM_SOFTDIRTY. So then we can also unify the > bit so they all use the same software bit in the pgtable (that's > something I considered anyway originally since it doesn't make whole > lot of sense to use the two features on the same vma at the same > time). I think it may be reasonable. Just a proposal: At some point we can also ask ourselves whether the =E2=80=9Cartificial" limitation of the number of software bits per PTE = should really limit us, or do we want to hold some additional metadata per-PTE by = either putting it in an adjacent page (holding 64-bits of additional = software-bits per PTE) or by finding some place in the page-struct to link to this metadata (and have the liberty of number of bits per PTE). One of the = PTE software-bits can be repurposed to say whether there is = =E2=80=9Cextra-metadata=E2=80=9D associated with the PTE. I am fully aware that there will be some overhead associated, but it can be limited to less-common use-cases.