From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.9 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_PATCH, MAILING_LIST_MULTI,SIGNED_OFF_BY,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 0E927C4727D for ; Thu, 24 Sep 2020 17:25:26 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id AA619238D6 for ; Thu, 24 Sep 2020 17:25:25 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=efficios.com header.i=@efficios.com header.b="mTIP/SB4" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728703AbgIXRZY (ORCPT ); Thu, 24 Sep 2020 13:25:24 -0400 Received: from mail.efficios.com ([167.114.26.124]:53602 "EHLO mail.efficios.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727364AbgIXRZU (ORCPT ); Thu, 24 Sep 2020 13:25:20 -0400 Received: from localhost (localhost [127.0.0.1]) by mail.efficios.com (Postfix) with ESMTP id 8B3E52D6659; Thu, 24 Sep 2020 13:25:18 -0400 (EDT) Received: from mail.efficios.com ([127.0.0.1]) by localhost (mail03.efficios.com [127.0.0.1]) (amavisd-new, port 10032) with ESMTP id znds9xyEdHu2; Thu, 24 Sep 2020 13:25:18 -0400 (EDT) Received: from localhost (localhost [127.0.0.1]) by mail.efficios.com (Postfix) with ESMTP id 3140D2D6989; Thu, 24 Sep 2020 13:25:18 -0400 (EDT) DKIM-Filter: OpenDKIM Filter v2.10.3 mail.efficios.com 3140D2D6989 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=efficios.com; s=default; t=1600968318; bh=/fSFeLFJXNGurJeLhsEr/lOZLQ24yyPsj1h9I13yqlA=; h=From:To:Date:Message-Id; b=mTIP/SB4TqsSwKnIEDDNvQNyStJsjMRzCoOXj0/erDNhVJisiiuacBt2ayZD5jTK5 LSycw0sn1DJd1ouyz7aaFu2Ick1ne90TVmdHZihPFRe6ViHL2Rn6F21gX1/cp9zFxR hnVP+lBiHMQGXLRn67SEUj4kIVH0ungAzf0nmN8iyc7hwbNmyjpNnzdua7nw3Ti1QN j5mogOEKSU4WIDg7ed8zUGSYLnovd+PcvQt8SiWjAxWuIE5xb52qfhqpZM8e5rn290 cEmHsvHzAjbBIIzMP3zZJaEOwF7t4merHULvQ1HarHwVz9gC6slrPc62N5iU1/MTGk PpQ/0KWJ45OsA== X-Virus-Scanned: amavisd-new at efficios.com Received: from mail.efficios.com ([127.0.0.1]) by localhost (mail03.efficios.com [127.0.0.1]) (amavisd-new, port 10026) with ESMTP id d7WN_UTnm1DI; Thu, 24 Sep 2020 13:25:18 -0400 (EDT) Received: from localhost.localdomain (192-222-181-218.qc.cable.ebox.net [192.222.181.218]) by mail.efficios.com (Postfix) with ESMTPSA id 914A22D6658; Thu, 24 Sep 2020 13:25:17 -0400 (EDT) From: Mathieu Desnoyers To: Peter Zijlstra , Boqun Feng Cc: linux-kernel@vger.kernel.org, Mathieu Desnoyers , Will Deacon , "Paul E . McKenney" , Nicholas Piggin , Andy Lutomirski , Andrew Morton , Alan Stern Subject: [RFC PATCH 3/3] sched: membarrier: document memory ordering scenarios Date: Thu, 24 Sep 2020 13:25:08 -0400 Message-Id: <20200924172508.8724-4-mathieu.desnoyers@efficios.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20200924172508.8724-1-mathieu.desnoyers@efficios.com> References: <20200924172508.8724-1-mathieu.desnoyers@efficios.com> Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Document membarrier ordering scenarios in membarrier.c. Thanks to Alan Stern for refreshing my memory. Now that I have those in mind, it seems appropriate to serialize them to comments for posterity. Signed-off-by: Mathieu Desnoyers Cc: Peter Zijlstra (Intel) Cc: Boqun Feng Cc: Will Deacon Cc: Paul E. McKenney Cc: Nicholas Piggin Cc: Andy Lutomirski Cc: Andrew Morton Cc: Alan Stern --- kernel/sched/membarrier.c | 128 ++++++++++++++++++++++++++++++++++++++ 1 file changed, 128 insertions(+) diff --git a/kernel/sched/membarrier.c b/kernel/sched/membarrier.c index e5246580201b..cddc5df01e40 100644 --- a/kernel/sched/membarrier.c +++ b/kernel/sched/membarrier.c @@ -6,6 +6,134 @@ */ #include "sched.h" +/* + * For documentation purposes, here are some membarrier ordering + * scenarios to keep in mind: + * + * A) Userspace thread execution after IPI vs membarrier's memory + * barrier before sending the IPI + * + * Userspace variables: + * + * int x = 0, y = 0; + * + * The memory barrier at the start of membarrier() on CPU0 is necessary in + * order to enforce the guarantee that any writes occurring on CPU0 before + * the membarrier() is executed will be visible to any code executing on + * CPU1 after the IPI-induced memory barrier: + * + * CPU0 CPU1 + * + * x = 1 + * membarrier(): + * a: smp_mb() + * b: send IPI IPI-induced mb + * c: smp_mb() + * r2 = y + * y = 1 + * barrier() + * r1 = x + * + * BUG_ON(r1 == 0 && r2 == 0) + * + * The write to y and load from x by CPU1 are unordered by the hardware, + * so it's possible to have "r1 = x" reordered before "y = 1" at any + * point after (b). If the memory barrier at (a) is omitted, then "x = 1" + * can be reordered after (a) (although not after (c)), so we get r1 == 0 + * and r2 == 0. This violates the guarantee that membarrier() is + * supposed by provide. + * + * The timing of the memory barrier at (a) has to ensure that it executes + * before the IPI-induced memory barrier on CPU1. + * + * B) Userspace thread execution before IPI vs membarrier's memory + * barrier after completing the IPI + * + * Userspace variables: + * + * int x = 0, y = 0; + * + * The memory barrier at the end of membarrier() on CPU0 is necessary in + * order to enforce the guarantee that any writes occurring on CPU1 before + * the membarrier() is executed will be visible to any code executing on + * CPU0 after the membarrier(): + * + * CPU0 CPU1 + * + * x = 1 + * barrier() + * y = 1 + * r2 = y + * membarrier(): + * a: smp_mb() + * b: send IPI IPI-induced mb + * c: smp_mb() + * r1 = x + * BUG_ON(r1 == 0 && r2 == 1) + * + * The writes to x and y are unordered by the hardware, so it's possible to + * have "r2 = 1" even though the write to x doesn't execute until (b). If + * the memory barrier at (c) is omitted then "r1 = x" can be reordered + * before (b) (although not before (a)), so we get "r1 = 0". This violates + * the guarantee that membarrier() is supposed to provide. + * + * The timing of the memory barrier at (c) has to ensure that it executes + * after the IPI-induced memory barrier on CPU1. + * + * C) Scheduling userspace thread -> kthread -> userspace thread vs membarrier + * + * CPU0 CPU1 + * + * membarrier(): + * a: smp_mb() + * d: switch to kthread (includes mb) + * b: read rq->curr->mm == NULL + * e: switch to user (includes mb) + * c: smp_mb() + * + * Using the scenario from (A), we can show that (a) needs to be paired + * with (e). Using the scenario from (B), we can show that (c) needs to + * be paired with (d). + * + * D) exit_mm vs membarrier + * + * Two thread groups are created, A and B. Thread group B is created by + * issuing clone from group A with flag CLONE_VM set, but not CLONE_THREAD. + * Let's assume we have a single thread within each thread group (Thread A + * and Thread B). Thread A runs on CPU0, Thread B runs on CPU1. + * + * CPU0 CPU1 + * + * membarrier(): + * a: smp_mb() + * exit_mm(): + * d: smp_mb() + * e: current->mm = NULL + * b: read rq->curr->mm == NULL + * c: smp_mb() + * + * Using scenario (B), we can show that (c) needs to be paired with (d). + * + * E) kthread_{use,unuse}_mm vs membarrier + * + * CPU0 CPU1 + * + * membarrier(): + * a: smp_mb() + * kthread_unuse_mm() + * d: smp_mb() + * e: current->mm = NULL + * b: read rq->curr->mm == NULL + * kthread_use_mm() + * f: current->mm = mm + * g: smp_mb() + * c: smp_mb() + * + * Using the scenario from (A), we can show that (a) needs to be paired + * with (g). Using the scenario from (B), we can show that (c) needs to + * be paired with (d). + */ + /* * Bitmask made from a "or" of all commands within enum membarrier_cmd, * except MEMBARRIER_CMD_QUERY. -- 2.17.1