From: Andy Lutomirski
Date: Tue, 22 Dec 2020 12:34:38 -0800
Subject: Re: [PATCH] mm/userfaultfd: fix memory corruption due to writeprotect
To: Nadav Amit
Cc: Andrea Arcangeli, linux-mm, Peter Xu, lkml, Pavel Emelyanov, Mike Kravetz, Mike Rapoport, stable, Minchan Kim, Andy Lutomirski, Yu Zhao, Will Deacon, Peter Zijlstra
References: <20201219043006.2206347-1-namit@vmware.com>
On Sat, Dec 19, 2020 at 2:06 PM Nadav Amit wrote:
>
> > [ I have in mind another solution, such as keeping in each page-table a
> > "table-generation" which is the mm-generation at the time of the change,
> > and only flush if "table-generation" == "mm-generation", but it requires
> > some thought on how to avoid adding new memory barriers. ]
> >
> > IOW: I think the change that you suggest is insufficient, and a proper
> > solution is too intrusive for "stable".
> >
> > As for performance, I can add another patch later to remove the TLB flush
> > that is unnecessarily performed during change_protection_range() that does
> > permission promotion. I know that your concern is about the "protect" case,
> > but I cannot think of a good immediate solution that avoids taking mmap_lock
> > for write.
> >
> > Thoughts?
>
> On second thought (i.e., I don't know what I was thinking), doing so --
> checking mm_tlb_flush_pending() on every potentially dangerous PTE read
> and flushing if needed -- can lead to a huge number of TLB flushes and
> shootdowns, as the counter might be elevated for a considerable amount of
> time.

I've lost track of whether we still think this particular problem is
really a problem, but could we perhaps make the tlb_flush_pending field
be per-ptl instead of per-mm?  Depending on how it gets used, it could
plausibly be done without atomics or expensive barriers by using the PTL
to protect the field.

FWIW, x86 has an mm generation counter, and I don't think it would be
totally crazy to find a way to expose an mm generation to core code.  I
don't think we'd want to expose the specific data structures that x86
uses to track it -- they're very tailored to the oddities of x86 TLB
management.
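[ To illustrate the per-ptl idea: the sketch below is a user-space
simulation, not kernel code -- a pthread mutex stands in for the
page-table spinlock, and all the names (ptl_state, ptl_flush_pending,
etc.) are invented for illustration.  The point is that once the pending
count lives under the PTL, readers and writers that already hold the
lock need no atomics or extra barriers. ]

```c
#include <pthread.h>
#include <stdbool.h>

struct ptl_state {
    pthread_mutex_t lock;   /* stand-in for the per-page-table spinlock */
    int flush_pending;      /* read and written only under 'lock'       */
};

/* A PTE-changing path announces a deferred flush before dropping the PTL. */
static void ptl_inc_flush_pending(struct ptl_state *ptl)
{
    pthread_mutex_lock(&ptl->lock);
    ptl->flush_pending++;
    pthread_mutex_unlock(&ptl->lock);
}

/* ...and clears it once the flush has actually been performed. */
static void ptl_dec_flush_pending(struct ptl_state *ptl)
{
    pthread_mutex_lock(&ptl->lock);
    ptl->flush_pending--;
    pthread_mutex_unlock(&ptl->lock);
}

/* A fault path that already holds this PTL can use a plain load: the lock
 * orders it against the writers above, so no atomic read is needed. */
static bool ptl_flush_is_pending(const struct ptl_state *ptl)
{
    return ptl->flush_pending > 0;   /* caller must hold ptl->lock */
}
```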
x86 also doesn't currently have any global concept of which mm
generation is guaranteed to have been propagated to all CPUs -- we track
the generation in the page tables and, per CPU, the generation that we
know that CPU has seen.

x86 could offer a function "ensure that all CPUs catch up to mm
generation G and don't return until this happens" and its relative "have
all CPUs caught up to mm generation G?", but these would need to look at
data from multiple CPUs and would probably be too expensive on very
large systems to use in normal page faults unless we were to cache the
results somewhere.  Making a nice cache for this is surely doable, but
maybe more complexity than we'd want.

--Andy
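[ For readers of the archive: the "have all CPUs caught up to generation
G?" query above can be sketched as a scan over per-CPU records of the
last generation each CPU has flushed to.  This is a user-space toy, not
the x86 implementation; every name here is invented, and the O(NR_CPUS)
loop is exactly the cost that would make the query too slow for normal
page faults without a cached result. ]

```c
#include <stdbool.h>

#define NR_CPUS 4

/* Per-CPU: the latest mm generation this CPU is known to have seen. */
static unsigned long cpu_seen_gen[NR_CPUS];

/* Called when a CPU finishes flushing up to 'gen'. */
static void cpu_catch_up(int cpu, unsigned long gen)
{
    if (cpu_seen_gen[cpu] < gen)
        cpu_seen_gen[cpu] = gen;
}

/* "Have all CPUs caught up to generation 'gen'?"  The full scan over
 * every CPU's record is what gets expensive on very large systems. */
static bool all_cpus_caught_up(unsigned long gen)
{
    for (int cpu = 0; cpu < NR_CPUS; cpu++)
        if (cpu_seen_gen[cpu] < gen)
            return false;
    return true;
}
```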