In-Reply-To: <1463521395.16945.1503889546934.JavaMail.zimbra@efficios.com>
References: <20170827205035.25620-1-mathieu.desnoyers@efficios.com>
 <1463521395.16945.1503889546934.JavaMail.zimbra@efficios.com>
From: Andy Lutomirski
Date: Tue, 29 Aug 2017 22:01:56 -0700
Subject: Re: [PATCH v2] membarrier: provide register sync core cmd
To: Mathieu Desnoyers
Cc: "Paul E. McKenney", Peter Zijlstra, linux-kernel, Boqun Feng,
 Andrew Hunter, maged michael, gromer, Avi Kivity,
 Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, Dave Watson,
 Andy Lutomirski, Will Deacon, Hans Boehm
X-Mailing-List: linux-kernel@vger.kernel.org

> On Aug 27, 2017, at 8:05 PM, Mathieu Desnoyers wrote:
>
> ----- On Aug 27, 2017, at 3:53 PM, Andy Lutomirski luto@amacapital.net wrote:
>
>>> On Aug 27, 2017, at 1:50 PM, Mathieu Desnoyers
>>> wrote:
>>>
>>> Add a new MEMBARRIER_CMD_REGISTER_SYNC_CORE command to the membarrier
>>> system call. It allows processes to register their intent to have their
>>> threads issue core serializing barriers in addition to memory barriers
>>> whenever a membarrier command is performed.
>>>
>>
>> Why is this stateful? That is, why not just have a new membarrier command to
>> sync every thread's icache?
>
> If we'd do it on every CPU icache, it would be as trivial as you say. The
> concern here is sending IPIs only to CPUs running threads that belong to the
> same process, so we don't disturb unrelated processes.
>
> If we could just grab each CPU's runqueue lock, it would be fairly simple
> to do. But we want to avoid hitting each runqueue with exclusive atomic
> access associated with grabbing the lock. (cache-line bouncing)

Hmm.  Are there really arches where there is no clean implementation
without this hackery?  It seems rather unfortunate that munmap() can be
done efficiently but this barrier can't be.

At the very least, could there be a register command *and* a special
sync command?  I dislike the idea that the sync command does something
different depending on some other state.

Even better (IMO) would be a design where you ask for an isync and, if
the arch can do it efficiently (x86), you get an efficient isync and,
if the arch can't (arm64?) you take all the rq locks?

--Andy
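
[Editorial illustration, not part of the original message.] The "register
command *and* a special sync command" idea can be sketched as a small
userspace model in C. Everything here is an assumption made for
illustration: the sketch_* names, the struct, and the -EPERM return for an
unregistered caller are hypothetical and are neither the kernel's ABI nor
the patch under review. The only point is that the plain barrier command
behaves the same whether or not the process has registered, and only the
explicit sync command consults that state.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical command names for illustration; not the kernel's ABI. */
enum sketch_cmd {
    SKETCH_CMD_SHARED = 0,         /* plain memory barrier, stateless   */
    SKETCH_CMD_REGISTER_SYNC_CORE, /* opt the process in                */
    SKETCH_CMD_SYNC_CORE,          /* explicit core-serializing barrier */
};

/* Hypothetical per-process state; counters stand in for real barriers. */
struct sketch_process {
    bool sync_core_registered;
    unsigned long memory_barriers;
    unsigned long core_syncs;
};

/* Model of the dispatch: SHARED never looks at registration state. */
static int sketch_membarrier(struct sketch_process *p, enum sketch_cmd cmd)
{
    switch (cmd) {
    case SKETCH_CMD_SHARED:
        p->memory_barriers++;
        return 0;
    case SKETCH_CMD_REGISTER_SYNC_CORE:
        p->sync_core_registered = true;
        return 0;
    case SKETCH_CMD_SYNC_CORE:
        /* Assumed policy: refuse rather than silently degrade. */
        if (!p->sync_core_registered)
            return -EPERM;
        p->memory_barriers++;
        p->core_syncs++;
        return 0;
    }
    return -EINVAL;
}
```

In this model a caller that never registers can still issue
SKETCH_CMD_SHARED and gets exactly the same behaviour as a registered one,
so no existing command changes meaning based on hidden state.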