From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-kernel-owner@vger.kernel.org>
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1751539AbdG0SM4 (ORCPT <rfc822;w@1wt.eu>);
        Thu, 27 Jul 2017 14:12:56 -0400
Received: from mx0b-001b2d01.pphosted.com ([148.163.158.5]:43347 "EHLO
        mx0a-001b2d01.pphosted.com" rhost-flags-OK-OK-OK-FAIL)
        by vger.kernel.org with ESMTP id S1750981AbdG0SMy (ORCPT
        <rfc822;linux-kernel@vger.kernel.org>);
        Thu, 27 Jul 2017 14:12:54 -0400
Date: Thu, 27 Jul 2017 11:12:50 -0700
From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: avi@scylladb.com, maged.michael@gmail.com, ahh@google.com,
        gromer@google.com
Cc: linux-kernel@vger.kernel.org
Subject: Updated sys_membarrier() speedup patch, FYI
Reply-To: paulmck@linux.vnet.ibm.com
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.5.21 (2010-09-15)
Message-Id: <20170727181250.GA20183@linux.vnet.ibm.com>
Sender: linux-kernel-owner@vger.kernel.org
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

Hello!

Please see below for a prototype sys_membarrier() speedup patch.
Please note that there is some controversy on this subject, so the final
version will probably be quite a bit different from this prototype.

But my main question is whether the throttling shown below is acceptable
for your use cases: only one expedited sys_membarrier() is permitted per
scheduling-clock period (1 millisecond on many platforms), and any excess
calls are silently converted to the non-expedited form.  The throttling
is motivated by concerns about DoS attacks from user code that invokes
this system call in a tight loop.

Thoughts?

							Thanx, Paul

------------------------------------------------------------------------

commit 4cd5253094b6d7f9501e21e13aa4e2e78e8a70cd
Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Date:   Tue Jul 18 13:53:32 2017 -0700

    sys_membarrier: Add expedited option
    
    The sys_membarrier() system call has proven too slow for some use cases,
    which has prompted users to instead rely on TLB shootdown.  Although TLB
    shootdown is much faster, it has the slight disadvantage of not working
    at all on arm and arm64 and also of being vulnerable to reasonable
    optimizations that might skip some IPIs.  However, the Linux kernel
    does not currently provide a reasonable alternative, so it is hard to
    criticize these users for doing what works for them on a given piece
    of hardware at a given time.
    
    This commit therefore adds an expedited option to the sys_membarrier()
    system call, thus providing a faster mechanism that is portable and
    is not subject to death by optimization.  Note that if more than one
    MEMBARRIER_CMD_SHARED_EXPEDITED sys_membarrier() call happens within
    the same jiffy, all but the first will use synchronize_sched() instead
    of synchronize_sched_expedited().
    
    Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    [ paulmck: Fix code style issue pointed out by Boqun Feng. ]
    Tested-by: Avi Kivity <avi@scylladb.com>
    Cc: Maged Michael <maged.michael@gmail.com>
    Cc: Andrew Hunter <ahh@google.com>
    Cc: Geoffrey Romer <gromer@google.com>

diff --git a/include/uapi/linux/membarrier.h b/include/uapi/linux/membarrier.h
index e0b108bd2624..5720386d0904 100644
--- a/include/uapi/linux/membarrier.h
+++ b/include/uapi/linux/membarrier.h
@@ -40,6 +40,16 @@
  *                          (non-running threads are de facto in such a
  *                          state). This covers threads from all processes
  *                          running on the system. This command returns 0.
+ * @MEMBARRIER_CMD_SHARED_EXPEDITED:  Execute a memory barrier on all
+ *			    running threads, but in an expedited fashion.
+ *                          Upon return from system call, the caller thread
+ *                          is ensured that all running threads have passed
+ *                          through a state where all memory accesses to
+ *                          user-space addresses match program order between
+ *                          entry to and return from the system call
+ *                          (non-running threads are de facto in such a
+ *                          state). This covers threads from all processes
+ *                          running on the system. This command returns 0.
  *
  * Command to be passed to the membarrier system call. The commands need to
  * be a single bit each, except for MEMBARRIER_CMD_QUERY which is assigned to
@@ -48,6 +58,7 @@
 enum membarrier_cmd {
 	MEMBARRIER_CMD_QUERY = 0,
 	MEMBARRIER_CMD_SHARED = (1 << 0),
+	MEMBARRIER_CMD_SHARED_EXPEDITED = (1 << 1),
 };
 
 #endif /* _UAPI_LINUX_MEMBARRIER_H */
diff --git a/kernel/membarrier.c b/kernel/membarrier.c
index 9f9284f37f8d..587e3bbfae7e 100644
--- a/kernel/membarrier.c
+++ b/kernel/membarrier.c
@@ -22,7 +22,8 @@
  * Bitmask made from a "or" of all commands within enum membarrier_cmd,
  * except MEMBARRIER_CMD_QUERY.
  */
-#define MEMBARRIER_CMD_BITMASK	(MEMBARRIER_CMD_SHARED)
+#define MEMBARRIER_CMD_BITMASK	(MEMBARRIER_CMD_SHARED |		\
+				 MEMBARRIER_CMD_SHARED_EXPEDITED)
 
 /**
  * sys_membarrier - issue memory barriers on a set of threads
@@ -64,6 +65,20 @@ SYSCALL_DEFINE2(membarrier, int, cmd, int, flags)
 		if (num_online_cpus() > 1)
 			synchronize_sched();
 		return 0;
+	case MEMBARRIER_CMD_SHARED_EXPEDITED:
+		if (num_online_cpus() > 1) {
+			static unsigned long lastexp;
+			unsigned long j;
+
+			j = jiffies;
+			if (READ_ONCE(lastexp) == j) {
+				synchronize_sched();
+			} else {
+				WRITE_ONCE(lastexp, j);
+				synchronize_sched_expedited();
+			}
+		}
+		return 0;
 	default:
 		return -EINVAL;
 	}
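For reference, here is a hedged sketch of how a caller might probe for
the new command and fall back to MEMBARRIER_CMD_SHARED when it is not
available.  It assumes a kernel carrying this prototype, where
MEMBARRIER_CMD_QUERY returns the MEMBARRIER_CMD_BITMASK shown above.
The PROTO_* constants and the helper names are hypothetical and are
defined locally rather than taken from an installed
<linux/membarrier.h>; on other kernels the same bit may carry a
different meaning, so this should not be used as-is.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Hypothetical local copies of the command values proposed in this
 * prototype.  They are not taken from an installed <linux/membarrier.h>.
 */
#define PROTO_MEMBARRIER_CMD_QUERY            0
#define PROTO_MEMBARRIER_CMD_SHARED           (1 << 0)
#define PROTO_MEMBARRIER_CMD_SHARED_EXPEDITED (1 << 1)

/* Raw system call; returns -1 with errno set (e.g. ENOSYS) on failure. */
static int membarrier_call(int cmd, int flags)
{
	assert(flags == 0);  /* no flags are defined */
#ifdef __NR_membarrier
	return (int)syscall(__NR_membarrier, cmd, flags);
#else
	(void)cmd;
	(void)flags;
	errno = ENOSYS;
	return -1;
#endif
}

/*
 * Choose a command from the bitmask returned by the QUERY command,
 * preferring the expedited form.  Returns -1 if neither shared command
 * is supported or if the query itself failed.
 */
static int pick_membarrier_cmd(int supported)
{
	if (supported < 0)
		return -1;
	if (supported & PROTO_MEMBARRIER_CMD_SHARED_EXPEDITED)
		return PROTO_MEMBARRIER_CMD_SHARED_EXPEDITED;
	if (supported & PROTO_MEMBARRIER_CMD_SHARED)
		return PROTO_MEMBARRIER_CMD_SHARED;
	return -1;
}

/* Issue the strongest supported system-wide barrier; 0 on success. */
static int membarrier_best_effort(void)
{
	int cmd = pick_membarrier_cmd(
		membarrier_call(PROTO_MEMBARRIER_CMD_QUERY, 0));

	if (cmd < 0)
		return -1;
	return membarrier_call(cmd, 0);
}
```

Because the kernel silently downgrades excess expedited calls, the
caller does not need any throttling logic of its own; it only needs to
check once whether the expedited command exists.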