From mboxrd@z Thu Jan 1 00:00:00 1970
From: Linus Torvalds
Date: Wed, 23 Dec 2020 15:39:53 -0800
Subject: Re: [PATCH] mm/userfaultfd: fix memory corruption due to writeprotect
To: Andrea Arcangeli
Cc: Yu Zhao, Andy Lutomirski, Peter Xu, Nadav Amit, linux-mm, lkml,
 Pavel Emelyanov, Mike Kravetz, Mike Rapoport, stable, Minchan Kim,
 Will Deacon, Peter Zijlstra
References: <1FCC8F93-FF29-44D3-A73A-DF943D056680@gmail.com> <20201221223041.GL6640@xz-x1>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Dec 23, 2020 at 1:39 PM Andrea Arcangeli wrote:
>
> On Tue, Dec 22, 2020 at 08:36:04PM -0700, Yu Zhao wrote:
> > Thanks for the details.
>
> I hope we can find a way put the page_mapcount back where there's a
> page_count right now.

I really don't think that's ever going to happen - at least if you're
talking about do_wp_page().

I refuse to make the *simple* core VM operations have to jump through
hoops, and the old COW mapcount logic was that. I *much* prefer the
newer COW code, because the rules are more straightforward.

> page_count is far from optimal, but it is a feature it finally allowed
> us to notice that various code (clear_refs_write included apparently
> even after the fix) leaves stale too permissive TLB entries when it
> shouldn't.

I absolutely agree that page_count isn't exactly optimal, but
"mapcount" is just so much worse.

page_count() is at least _logical_, and has a very clear meaning:
"this page has other users". mapcount() means something else, and is
simply not sufficient or relevant wrt COW.

That doesn't mean that page_mapcount() is wrong - it's just that it's
wrong for COW. page_mapcount() is great for rmap, so that we can see
when we need to shoot down a memory mapping of a page that gets
released (truncate being the classic example).

I think that the mapcount games we used to have were horrible. I
absolutely much prefer where we are now wrt COW.

The modern rules for COW handling are:

 - if we got a COW fault and there might be another user, we copy (and
   this is where the page count makes so much logical sense).

 - if somebody needs to pin the page in the VM, we either make sure
   that it is pre-COWed and we

   (a) either never turn it back into a COW page (ie the fork()-time
       stuff we have for pinned pages)

   (b) or there is some explicit marker on the page or in the page
       table (ie the userfaultfd_pte_wp thing).

Those are _so_ much more straightforward than the very complex rules
we used to have that were actively buggy, in addition to requiring the
page lock. So they were buggy and slow.
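
As a rough illustration, the rules above can be modeled in user space
along the following lines. This is a minimal sketch and not kernel
code: struct model_page, struct model_pte, enum wp_action and
wp_fault_action() are hypothetical names invented for the example, and
the two counters only stand in for what page_count() and
page_mapcount() report. It shows why a pinned page (one mapping, but
an extra reference) gets copied under the page-count rule, while an
explicit write-protect marker bypasses COW entirely.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space model; these types do not exist in the kernel. */
struct model_page {
	int refcount;	/* stands in for page_count(): all users of the page */
	int mapcount;	/* stands in for page_mapcount(): page-table mappings only */
};

struct model_pte {
	struct model_page *page;
	bool uffd_wp;	/* stands in for the explicit marker in the page table */
};

enum wp_action {
	WP_HANDLE_USERFAULT,	/* marker set: not a COW fault at all */
	WP_REUSE,		/* we are the only user: keep the page */
	WP_COPY,		/* there might be another user: copy */
};

/* Decide what a write fault on a write-protected entry should do. */
static enum wp_action wp_fault_action(const struct model_pte *pte)
{
	assert(pte->page && pte->page->refcount >= 1);

	/* An explicit marker means the page must not go through COW. */
	if (pte->uffd_wp)
		return WP_HANDLE_USERFAULT;

	/*
	 * Any extra reference (another mapping, a pin, ...) means there
	 * might be another user. mapcount is deliberately not consulted:
	 * it cannot see references that are not page-table mappings.
	 */
	if (pte->page->refcount != 1)
		return WP_COPY;

	return WP_REUSE;
}
```

In this model a page with mapcount 1 and refcount 2 is copied, which
is the case a mapcount-only test would miss.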

And yes, I had forgotten about that userfaultfd_pte_wp() because I was
being myopic and only looking at wp_page_copy(). So using that as a
way to make sure that a page doesn't go through COW is a good way to
avoid the COW race, but I think that thing requires a bit in the page
table which might be a problem on some architectures?

          Linus