From: Andy Lutomirski
Date: Wed, 2 Dec 2020 21:09:37 -0800
Subject: Re: [PATCH 2/8] x86: use exit_lazy_tlb rather than membarrier_mm_sync_core_before_usermode
To: Nicholas Piggin
Cc: Andy Lutomirski, Anton Blanchard, Arnd Bergmann, linux-arch, LKML,
 Linux-MM, linuxppc-dev, Mathieu Desnoyers, Peter Zijlstra, X86 ML
In-Reply-To: <1606876327.dyrhkih2kh.astroid@bobo.none>
References: <20201128160141.1003903-1-npiggin@gmail.com>
 <20201128160141.1003903-3-npiggin@gmail.com>
 <1606876327.dyrhkih2kh.astroid@bobo.none>

On Tue, Dec 1, 2020 at 6:50 PM Nicholas Piggin wrote:
>
> Excerpts from Andy Lutomirski's message of November 29, 2020 3:55 am:
> > On Sat, Nov 28, 2020 at 8:02 AM Nicholas Piggin wrote:
> >>
> >> And get rid of the generic sync_core_before_usermode facility. This is
> >> functionally a no-op in the core scheduler code, but it also catches
> >>
> >> This helper is the wrong way around, I think. The idea that membarrier
> >> state requires a core sync before returning to user is the easy one
> >> that does not need hiding behind membarrier calls. The gap in core
> >> synchronization due to x86's sysret/sysexit and lazy tlb mode is the
> >> tricky detail that is better put in x86 lazy tlb code.
> >>
> >> Consider if an arch did not synchronize core in switch_mm either; then
> >> membarrier_mm_sync_core_before_usermode would be in the wrong place,
> >> but arch-specific mmu context functions would still be the right place.
> >> There is also an exit_lazy_tlb case that is not covered by this call,
> >> which could be a bug (a kthread uses the membarrier process's mm, then
> >> context switches back to the process without switching mm or a lazy mm
> >> switch).
> >>
> >> This makes lazy tlb code a bit more modular.
> >
> > I have a couple of membarrier fixes that I want to send out today or
> > tomorrow, and they might eliminate the need for this patch. Let me
> > think about this a little bit. I'll cc you. The existing code is way
> > too subtle and the comments are far too confusing for me to be quickly
> > confident about any of my conclusions :)
> >
>
> Thanks for the heads-up. I'll have to have a better look through them,
> but I don't know that it eliminates the need for this entirely, although
> it might close some gaps and make this not a bug fix. The problem here
> is that the x86 code wanted something to be called when a lazy mm is
> unlazied, but it missed some spots, and also the core scheduler doesn't
> need to know about those x86 details if it has this generic call that
> annotates the lazy handling better. I'll send v3 tomorrow.

They add more sync_core_before_usermode() callers.

Having looked at your patches a bunch and the membarrier code a bunch,
I don't think I like the approach of pushing this logic out into new
core functions called by arch code. Right now, even with my membarrier
patches applied, understanding how (for example) the x86
switch_mm_irqs_off() plus the scheduler code provides the barriers that
membarrier needs is quite complicated, and it's not clear to me that the
code is even correct. At the very least I'm pretty sure that the x86
comments are misleading. With your patches, someone trying to audit the
code would have to follow core code calling into arch code and back out
into core code to figure out what's going on. I think the result is
worse.

I wrote this incomplete patch which takes the opposite approach (sorry
for whitespace damage):

commit 928b5c60e93f475934892d6e0b357ebf0a2bf87d
Author: Andy Lutomirski
Date:   Wed Dec 2 17:24:02 2020 -0800

    [WIP] x86/mm: Handle unlazying membarrier core sync in the arch code

    The core scheduler isn't a great place for
    membarrier_mm_sync_core_before_usermode() -- the core scheduler
    doesn't actually know whether we are lazy.  With the old code, if a
    CPU is running a membarrier-registered task, goes idle, gets unlazied
    via a TLB shootdown IPI, and switches back to the
    membarrier-registered task, it will do an unnecessary core sync.

    Conveniently, x86 is the only architecture that does anything in this
    hook, so we can just move the code.

    XXX: actually delete the old code.

    Signed-off-by: Andy Lutomirski

diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index 3338a1feccf9..e27300fc865b 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -496,6 +496,8 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next,
 		 * from one thread in a process to another thread in the same
 		 * process. No TLB flush required.
 		 */
+
+		// XXX: why is this okay wrt membarrier?
 		if (!was_lazy)
 			return;
 
@@ -508,12 +510,24 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next,
 		smp_mb();
 		next_tlb_gen = atomic64_read(&next->context.tlb_gen);
 		if (this_cpu_read(cpu_tlbstate.ctxs[prev_asid].tlb_gen) ==
-				next_tlb_gen)
+				next_tlb_gen) {
+			/*
+			 * We're reactivating an mm, and membarrier might
+			 * need to serialize.  Tell membarrier.
+			 */
+
+			// XXX: I can't understand the logic in
+			// membarrier_mm_sync_core_before_usermode().  What's
+			// the mm check for?
+			membarrier_mm_sync_core_before_usermode(next);
 			return;
+		}
 
 		/*
 		 * TLB contents went out of date while we were in lazy
 		 * mode. Fall through to the TLB switching code below.
+		 * No need for an explicit membarrier invocation -- the CR3
+		 * write will serialize.
 		 */
 		new_asid = prev_asid;
 		need_flush = true;
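
For reference, the generic helper the XXX comment above asks about lives
in include/linux/sched/mm.h. The sketch below is a from-memory paraphrase
of roughly what it looks like around v5.10 -- the exact guards and flag
names should be checked against the tree, so treat it as illustrative
rather than authoritative. The "mm check" is the current->mm != mm test:
the core sync is only requested when the CPU will actually return to
userspace in that mm, and only if the process registered for the
SYNC_CORE flavour of membarrier.

static inline void membarrier_mm_sync_core_before_usermode(struct mm_struct *mm)
{
	/* Only relevant if this CPU will return to userspace in @mm. */
	if (current->mm != mm)
		return;

	/* Only processes registered for SYNC_CORE membarrier need it. */
	if (likely(!(atomic_read(&mm->membarrier_state) &
		     MEMBARRIER_STATE_PRIVATE_EXPEDITED_SYNC_CORE)))
		return;

	/* Arch hook; on x86 this forces a core-serializing return to user. */
	sync_core_before_usermode();
}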
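
As background on the userspace guarantee being argued about: the
SYNC_CORE commands of membarrier(2) promise that every thread of the
calling process executes a core-serializing instruction before it next
runs user code, which is what a JIT relies on after rewriting code that
sibling threads may be executing. A minimal userspace illustration
follows (assuming a 4.16+ kernel with CONFIG_MEMBARRIER; error handling
kept trivial, and the code-patching step is only a placeholder comment):

#include <linux/membarrier.h>
#include <stdio.h>
#include <sys/syscall.h>
#include <unistd.h>

/* glibc provides no wrapper for membarrier(2), so invoke it directly. */
static int membarrier(int cmd, unsigned int flags)
{
	return syscall(__NR_membarrier, cmd, flags);
}

int main(void)
{
	/* Register once; afterwards SYNC_CORE barriers may be issued. */
	if (membarrier(MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED_SYNC_CORE, 0)) {
		perror("membarrier register");
		return 1;
	}

	/* ... rewrite code that other threads of this process may run ... */

	/*
	 * Ensure every thread of this process executes a core-serializing
	 * instruction before it next returns to user code -- the guarantee
	 * whose x86 return-to-user and lazy-mm corner cases the thread
	 * above is discussing.
	 */
	if (membarrier(MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE, 0)) {
		perror("membarrier sync-core");
		return 1;
	}

	return 0;
}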