Date: Wed, 23 Dec 2020 18:55:11 -0500
From: Andrea Arcangeli
To: Nadav Amit
Cc: Yu Zhao, Peter Zijlstra, Minchan Kim, Linus Torvalds, Peter Xu,
    linux-mm, lkml, Pavel Emelyanov, Mike Kravetz, Mike Rapoport,
    stable, Andy Lutomirski, Will Deacon
Subject: Re: [PATCH] mm/userfaultfd: fix memory corruption due to writeprotect
References: <20201221172711.GE6640@xz-x1>
    <76B4F49B-ED61-47EA-9BE4-7F17A26B610D@gmail.com>
    <9E301C7C-882A-4E0F-8D6D-1170E792065A@gmail.com>

On Wed, Dec 23, 2020 at 02:45:59PM -0800, Nadav Amit wrote:
> I think it may be reasonable.

Whatever solution is used, there will be 2 users of it: uffd-wp will
use whatever technique clear_refs_write uses to avoid the
mmap_write_lock.

My favorite is now Yu's patch, not the group lock anymore. The downside
is that it changes the VM rules (which kind of reminds me of my initial
proposal of adding a spurious TLB flush if mm_tlb_flush_pending is set,
except I didn't correctly specify it'd need to go in the page fault),
but it still appears the simplest.

> Just a proposal: At some point we can also ask ourselves whether the
> “artificial” limitation of the number of software bits per PTE should really
> limit us, or do we want to hold some additional metadata per-PTE by either
> putting it in an adjacent page (holding 64-bits of additional software-bits
> per PTE) or by finding some place in the page-struct to link to this
> metadata (and have the liberty of number of bits per PTE). One of the PTE
> software-bits can be repurposed to say whether there is “extra-metadata”
> associated with the PTE.
>
> I am fully aware that there will be some overhead associated, but it
> can be limited to less-common use-cases.

That's a good point. So far we haven't run out, so it's not an
immediate concern (as opposed to page->flags, where we did run out and
PG_tail went to some LSB).

In general, kicking the can down the road sounds like the best thing to
do for these bit-shortage matters, at least until we can't anymore.
There's no gain to the kernel runtime in doing something generically
good here (again, see where PG_tail rightfully went).

So before spending RAM and CPU, we'd need to find a more compact
encoding with the bits we already have available.

This reminds me again that we could double check whether VM_UFFD_WP
could be made mutually exclusive with VM_SOFTDIRTY. I wasn't sure
whether there is ever a legit way to use both at the same time (CRIU on
an app using uffd-wp for its own internal mm management?).

Theoretically it's already too late for that, but VM_UFFD_WP is
relatively new; if we're sure it cannot ever happen in a legit way, it
would be possible to at least evaluate/discuss it. This is an immediate
matter.

What we'll do if we later run out is, by contrast, not an immediate
matter, because resolving it now won't make our life any simpler.

Thanks,
Andrea