From: Andy Lutomirski
Date: Wed, 2 Dec 2020 21:09:37 -0800
Subject: Re: [PATCH 2/8] x86: use exit_lazy_tlb rather than membarrier_mm_sync_core_before_usermode
To: Nicholas Piggin
Cc: Andy Lutomirski, Anton Blanchard, Arnd Bergmann, linux-arch, LKML,
 Linux-MM, linuxppc-dev, Mathieu Desnoyers, Peter Zijlstra, X86 ML
In-Reply-To: <1606876327.dyrhkih2kh.astroid@bobo.none>
References: <20201128160141.1003903-1-npiggin@gmail.com>
 <20201128160141.1003903-3-npiggin@gmail.com>
 <1606876327.dyrhkih2kh.astroid@bobo.none>

On Tue, Dec 1, 2020 at 6:50 PM Nicholas Piggin wrote:
>
> Excerpts from Andy Lutomirski's message of November 29, 2020 3:55 am:
> > On Sat, Nov 28, 2020 at 8:02 AM Nicholas Piggin wrote:
> >>
> >> And get rid of the generic sync_core_before_usermode facility. This is
> >> functionally a no-op in the core scheduler code, but it also catches
> >>
> >> This helper is the wrong way around, I think. The idea that membarrier
> >> state requires a core sync before returning to user is the easy one
> >> that does not need hiding behind membarrier calls. The gap in core
> >> synchronization due to x86's sysret/sysexit and lazy tlb mode is the
> >> tricky detail that is better put in x86 lazy tlb code.
> >>
> >> Consider if an arch did not synchronize core in switch_mm either; then
> >> membarrier_mm_sync_core_before_usermode would be in the wrong place,
> >> but arch-specific mmu context functions would still be the right place.
> >> There is also an exit_lazy_tlb case that is not covered by this call,
> >> which could be a bug (a kthread uses the membarrier process's mm, then
> >> context switches back to the process without switching mm or a lazy mm
> >> switch).
> >>
> >> This makes lazy tlb code a bit more modular.
> >
> > I have a couple of membarrier fixes that I want to send out today or
> > tomorrow, and they might eliminate the need for this patch. Let me
> > think about this a little bit. I'll cc you. The existing code is way
> > too subtle and the comments are far too confusing for me to be quickly
> > confident about any of my conclusions :)
> >
>
> Thanks for the heads-up. I'll have to have a better look through them,
> but I don't know that it eliminates the need for this entirely, although
> it might close some gaps and make this not a bug fix. The problem here
> is that the x86 code wanted something to be called when a lazy mm is
> unlazied, but it missed some spots, and also the core scheduler doesn't
> need to know about those x86 details if it has this generic call that
> annotates the lazy handling better. I'll send v3 tomorrow.

They add more sync_core_before_usermode() callers.

Having looked at your patches a bunch and the membarrier code a bunch,
I don't think I like the approach of pushing this logic out into new
core functions called by arch code. Right now, even with my membarrier
patches applied, understanding how (for example) the x86
switch_mm_irqs_off() plus the scheduler code provides the barriers that
membarrier needs is quite complicated, and it's not clear to me that the
code is even correct. At the very least I'm pretty sure that the x86
comments are misleading. With your patches, someone trying to audit the
code would have to follow core code calling into arch code and back out
into core code to figure out what's going on. I think the result is
worse.

I wrote this incomplete patch which takes the opposite approach (sorry
for whitespace damage):

commit 928b5c60e93f475934892d6e0b357ebf0a2bf87d
Author: Andy Lutomirski
Date:   Wed Dec 2 17:24:02 2020 -0800

    [WIP] x86/mm: Handle unlazying membarrier core sync in the arch code

    The core scheduler isn't a great place for
    membarrier_mm_sync_core_before_usermode() -- the core scheduler
    doesn't actually know whether we are lazy.  With the old code, if a
    CPU is running a membarrier-registered task, goes idle, gets unlazied
    via a TLB shootdown IPI, and switches back to the
    membarrier-registered task, it will do an unnecessary core sync.

    Conveniently, x86 is the only architecture that does anything in this
    hook, so we can just move the code.

    XXX: actually delete the old code.

    Signed-off-by: Andy Lutomirski

diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index 3338a1feccf9..e27300fc865b 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -496,6 +496,8 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next,
 		 * from one thread in a process to another thread in the same
 		 * process. No TLB flush required.
 		 */
+
+		// XXX: why is this okay wrt membarrier?
 		if (!was_lazy)
 			return;
 
@@ -508,12 +510,24 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next,
 		smp_mb();
 		next_tlb_gen = atomic64_read(&next->context.tlb_gen);
 		if (this_cpu_read(cpu_tlbstate.ctxs[prev_asid].tlb_gen) ==
-				next_tlb_gen)
+				next_tlb_gen) {
+			/*
+			 * We're reactivating an mm, and membarrier might
+			 * need to serialize.  Tell membarrier.
+			 */
+
+			// XXX: I can't understand the logic in
+			// membarrier_mm_sync_core_before_usermode().  What's
+			// the mm check for?
+			membarrier_mm_sync_core_before_usermode(next);
 			return;
+		}
 
 		/*
 		 * TLB contents went out of date while we were in lazy
 		 * mode. Fall through to the TLB switching code below.
+		 * No need for an explicit membarrier invocation -- the CR3
+		 * write will serialize.
 		 */
 		new_asid = prev_asid;
 		need_flush = true;
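
For reference, the generic helper the XXX comment above asks about lives
in include/linux/sched/mm.h. The sketch below is a from-memory paraphrase
of roughly what it looks like around v5.10 -- the exact guards and flag
names should be checked against the tree, so treat it as illustrative
rather than authoritative. The "mm check" is the current->mm != mm test:
the core sync is only requested when the CPU will actually return to
userspace in that mm, and only if the process registered for the
SYNC_CORE flavour of membarrier.

static inline void membarrier_mm_sync_core_before_usermode(struct mm_struct *mm)
{
	/* Only relevant if this CPU will return to userspace in @mm. */
	if (current->mm != mm)
		return;

	/* Only processes registered for SYNC_CORE membarrier need it. */
	if (likely(!(atomic_read(&mm->membarrier_state) &
		     MEMBARRIER_STATE_PRIVATE_EXPEDITED_SYNC_CORE)))
		return;

	/* Arch hook; on x86 this forces a core-serializing return to user. */
	sync_core_before_usermode();
}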
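
As background on the userspace guarantee being argued about: the
SYNC_CORE commands of membarrier(2) promise that every thread of the
calling process executes a core-serializing instruction before it next
runs user code, which is what a JIT relies on after rewriting code that
sibling threads may be executing. A minimal userspace illustration
follows (assuming a 4.16+ kernel with CONFIG_MEMBARRIER; error handling
kept trivial, and the code-patching step is only a placeholder comment):

#include <linux/membarrier.h>
#include <stdio.h>
#include <sys/syscall.h>
#include <unistd.h>

/* glibc provides no wrapper for membarrier(2), so invoke it directly. */
static int membarrier(int cmd, unsigned int flags)
{
	return syscall(__NR_membarrier, cmd, flags);
}

int main(void)
{
	/* Register once; afterwards SYNC_CORE barriers may be issued. */
	if (membarrier(MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED_SYNC_CORE, 0)) {
		perror("membarrier register");
		return 1;
	}

	/* ... rewrite code that other threads of this process may run ... */

	/*
	 * Ensure every thread of this process executes a core-serializing
	 * instruction before it next returns to user code -- the guarantee
	 * whose x86 return-to-user and lazy-mm corner cases the thread
	 * above is discussing.
	 */
	if (membarrier(MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE, 0)) {
		perror("membarrier sync-core");
		return 1;
	}

	return 0;
}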