From: Andy Lutomirski
Subject: Re: [RFC 15/20] mm: detect deferred TLB flushes in vma granularity
Date: Mon, 1 Feb 2021 16:14:58 -0800
Message-Id: <8F37526F-8189-483A-A16E-E0EB8662AD98@amacapital.net>
To: Nadav Amit
Cc: Linux-MM, LKML, Andy Lutomirski, Andrea Arcangeli, Andrew Morton,
 Dave Hansen, Peter Zijlstra, Thomas Gleixner, Will Deacon, Yu Zhao, X86 ML

> On Feb 1, 2021, at 2:04 PM, Nadav Amit wrote:
>
>> On Jan 30, 2021, at 4:11 PM, Nadav Amit wrote:
>>
>> From: Nadav Amit
>>
>> Currently, deferred TLB flushes are detected in the mm granularity: if
>> there is any deferred TLB flush in the entire address space due to NUMA
>> migration, pte_accessible() in x86 would return true, and
>> ptep_clear_flush() would require a TLB flush. This would happen even if
>> the PTE resides in a completely different vma.
>
> [ snip ]
>
>> +static inline void read_defer_tlb_flush_gen(struct mmu_gather *tlb)
>> +{
>> +	struct mm_struct *mm = tlb->mm;
>> +	u64 mm_gen;
>> +
>> +	/*
>> +	 * Any change of a PTE before calling __track_deferred_tlb_flush()
>> +	 * must be performed using an RMW atomic operation that provides a
>> +	 * memory barrier, such as ptep_modify_prot_start(). The barrier
>> +	 * ensures the PTEs are written before the current generation is
>> +	 * read, synchronizing (implicitly) with flush_tlb_mm_range().
>> +	 */
>> +	smp_mb__after_atomic();
>> +
>> +	mm_gen = atomic64_read(&mm->tlb_gen);
>> +
>> +	/*
>> +	 * This condition checks both for the first deferred TLB flush and
>> +	 * for other TLB flushes pending or executed after the last table
>> +	 * update. In the latter case, we are going to skip a generation,
>> +	 * which would lead to a full TLB flush. This should therefore not
>> +	 * cause correctness issues, and should not induce overhead, since
>> +	 * during TLB storms it is better to perform a full TLB flush anyhow.
>> +	 */
>> +	if (mm_gen != tlb->defer_gen) {
>> +		VM_BUG_ON(mm_gen < tlb->defer_gen);
>> +
>> +		tlb->defer_gen = inc_mm_tlb_gen(mm);
>> +	}
>> +}
>
> Andy's comments managed to make me realize this code is wrong. We must
> call inc_mm_tlb_gen(mm) every time.
>
> Otherwise, consider a CPU that saw the old tlb_gen and updated it in its
> local cpu_tlbstate on a context switch. If the process was not running
> when the TLB flush was issued, no IPI will be sent to that CPU.
> Therefore, a later switch_mm_irqs_off() back to the process will not
> flush the local TLB.
>
> I need to think if there is a better solution.
> Multiple calls to inc_mm_tlb_gen() during deferred flushes would trigger
> a full TLB flush instead of one specific to the ranges, once the flush
> actually takes place. On x86 it's practically a non-issue, since anyhow
> any update of more than 33 entries or so causes a full TLB flush, but
> this is still ugly.

What if we had a per-mm ring buffer of flushes? When starting a flush, we
would stick the range in the ring buffer and, when flushing, we would read
the ring buffer to catch up. This would mostly replace the flush_tlb_info
struct, and it would let us process multiple partial flushes together.
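To make Nadav's correction concrete: if inc_mm_tlb_gen() must be called on
every deferred flush, read_defer_tlb_flush_gen() collapses to something
like the sketch below. This is only a reconstruction from the quoted code,
not a tested patch:

static inline void read_defer_tlb_flush_gen(struct mmu_gather *tlb)
{
	struct mm_struct *mm = tlb->mm;

	/*
	 * As in the quoted comment: the preceding PTE update must be an
	 * RMW atomic, so this barrier orders the PTE write before the
	 * generation update.
	 */
	smp_mb__after_atomic();

	/*
	 * Advance the generation unconditionally. A CPU that cached the
	 * old tlb_gen in its cpu_tlbstate on a context switch, and that
	 * got no IPI because the process was not running, will then
	 * observe a newer generation in switch_mm_irqs_off() and flush
	 * its local TLB.
	 */
	tlb->defer_gen = inc_mm_tlb_gen(mm);
}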
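As a minimal sketch of the ring-buffer idea, assuming a fixed power-of-two
ring indexed by tlb_gen: every name below (struct tlb_flush_ring,
tlb_ring_queue(), the flush_tlb_local_*() callees) is invented for
illustration, and the overflow policy is simply "fall back to a full
flush":

#define TLB_RING_SIZE	64	/* power of two; slot chosen by generation */

struct tlb_flush_entry {
	unsigned long	start;
	unsigned long	end;
	u64		gen;	/* generation this range was queued under */
};

struct tlb_flush_ring {
	struct tlb_flush_entry	entries[TLB_RING_SIZE];
};

/* Producer: record a range under the gen returned by inc_mm_tlb_gen(). */
static void tlb_ring_queue(struct tlb_flush_ring *ring, u64 gen,
			   unsigned long start, unsigned long end)
{
	struct tlb_flush_entry *e = &ring->entries[gen & (TLB_RING_SIZE - 1)];

	e->start = start;
	e->end = end;
	/* Publish gen last: a reader that sees it also sees start/end. */
	smp_store_release(&e->gen, gen);
}

/* Consumer: a CPU whose local_gen lags mm_gen replays the missed ranges. */
static void tlb_ring_catch_up(struct tlb_flush_ring *ring,
			      u64 local_gen, u64 mm_gen)
{
	u64 gen;

	/* Entries more than a ring behind have been overwritten: punt. */
	if (mm_gen - local_gen >= TLB_RING_SIZE)
		goto full_flush;

	for (gen = local_gen + 1; gen <= mm_gen; gen++) {
		struct tlb_flush_entry *e =
			&ring->entries[gen & (TLB_RING_SIZE - 1)];

		/* A writer lapped us while we were walking: punt. */
		if (smp_load_acquire(&e->gen) != gen)
			goto full_flush;

		/*
		 * Racy sketch: a writer could overwrite start/end after the
		 * gen check above; a real implementation would re-check gen
		 * after reading the range and punt on a mismatch.
		 */
		flush_tlb_local_range(e->start, e->end);
	}
	return;

full_flush:
	flush_tlb_local_full();
}

This keeps the "catch up" semantics of the proposal: a lagging CPU walks
all queued ranges at once, so multiple partial flushes coalesce, and any
CPU that falls too far behind degrades to the full flush that x86 would
perform anyway for large batches.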