Subject: Re: [RFC 15/20] mm: detect deferred TLB flushes in vma granularity
From: Nadav Amit
Date: Mon, 1 Feb 2021 14:04:45 -0800
To: Linux-MM, LKML, Andy Lutomirski
Cc: Andrea Arcangeli, Andrew Morton, Dave Hansen, Peter Zijlstra, Thomas Gleixner, Will Deacon, Yu Zhao, X86 ML
In-Reply-To: <20210131001132.3368247-16-namit@vmware.com>
References: <20210131001132.3368247-1-namit@vmware.com> <20210131001132.3368247-16-namit@vmware.com>

> On Jan 30, 2021, at 4:11 PM, Nadav Amit wrote:
>
> From: Nadav Amit
>
> Currently, deferred TLB flushes are detected in the mm granularity: if
> there is any deferred TLB flush in the entire address space due to NUMA
> migration, pte_accessible() in x86 would return true, and
> ptep_clear_flush() would require a TLB flush. This would happen even if
> the PTE resides in a completely different vma.

[ snip ]

> +static inline void read_defer_tlb_flush_gen(struct mmu_gather *tlb)
> +{
> +	struct mm_struct *mm = tlb->mm;
> +	u64 mm_gen;
> +
> +	/*
> +	 * Any change of PTE before calling __track_deferred_tlb_flush() must be
> +	 * performed using RMW atomic operation that provides a memory barriers,
> +	 * such as ptep_modify_prot_start(). The barrier ensure the PTEs are
> +	 * written before the current generation is read, synchronizing
> +	 * (implicitly) with flush_tlb_mm_range().
> +	 */
> +	smp_mb__after_atomic();
> +
> +	mm_gen = atomic64_read(&mm->tlb_gen);
> +
> +	/*
> +	 * This condition checks for both first deferred TLB flush and for other
> +	 * TLB pending or executed TLB flushes after the last table that we
> +	 * updated. In the latter case, we are going to skip a generation, which
> +	 * would lead to a full TLB flush. This should therefore not cause
> +	 * correctness issues, and should not induce overheads, since anyhow in
> +	 * TLB storms it is better to perform full TLB flush.
> +	 */
> +	if (mm_gen != tlb->defer_gen) {
> +		VM_BUG_ON(mm_gen < tlb->defer_gen);
> +
> +		tlb->defer_gen = inc_mm_tlb_gen(mm);
> +	}
> +}

Andy’s comments made me realize this code is wrong. We must call
inc_mm_tlb_gen(mm) every time.

Otherwise, a CPU may have seen the old tlb_gen and recorded it in its
local cpu_tlbstate on a context switch. If the process was not running on
that CPU when the TLB flush was issued, no IPI is sent to it. A later
switch_mm_irqs_off() back to the process will therefore not flush the
local TLB.

I need to think about whether there is a better solution. Multiple calls
to inc_mm_tlb_gen() during deferred flushes would trigger a full TLB flush,
instead of one that is specific to the ranges, once the flush actually
takes place. On x86 it’s practically a non-issue, since any update of more
than 33 entries or so causes a full TLB flush anyway, but this is still
ugly.