Subject: Re: [PATCH v2 0/9] Remove spin_unlock_wait()
From: Manfred Spraul
To: Alan Stern, Ingo Molnar
Cc: "Paul E. McKenney", Peter Zijlstra, David Laight,
    linux-kernel@vger.kernel.org, netfilter-devel@vger.kernel.org,
    netdev@vger.kernel.org, oleg@redhat.com, akpm@linux-foundation.org,
    mingo@redhat.com, dave@stgolabs.net, tj@kernel.org, arnd@arndb.de,
    linux-arch@vger.kernel.org, will.deacon@arm.com,
    parri.andrea@gmail.com, torvalds@linux-foundation.org
Date: Mon, 10 Jul 2017 19:22:19 +0200

Hi Alan,

On 07/08/2017 06:21 PM, Alan Stern wrote:
> Pardon me for barging in, but I found this whole interchange extremely
> confusing...
>
> On Sat, 8 Jul 2017, Ingo Molnar wrote:
>
>> * Paul E. McKenney wrote:
>>
>>> On Sat, Jul 08, 2017 at 10:35:43AM +0200, Ingo Molnar wrote:
>>>> * Manfred Spraul wrote:
>>>>
>>>>> Hi Ingo,
>>>>>
>>>>> On 07/07/2017 10:31 AM, Ingo Molnar wrote:
>>>>>> There's another, probably just as significant advantage:
>>>>>> queued_spin_unlock_wait() is 'read-only', while
>>>>>> spin_lock()+spin_unlock() dirties the lock cache line. On any
>>>>>> bigger system this should make a very measurable difference - if
>>>>>> spin_unlock_wait() is ever used in a performance critical code path.
>>>>> At least for ipc/sem:
>>>>> Dirtying the cacheline (in the slow path) allows to remove a
>>>>> smp_mb() in the hot path.
>>>>> So for sem_lock(), I either need a primitive that dirties the
>>>>> cacheline or sem_lock() must continue to use
>>>>> spin_lock()/spin_unlock().
> This statement doesn't seem to make sense.  Did Manfred mean to write
> "smp_mb()" instead of "spin_lock()/spin_unlock()"?

Option 1:
    fast path:
        spin_lock(local_lock);
        smp_mb(); [[1]]
        smp_load_acquire(global_flag);
    slow path:
        global_flag = 1;
        smp_mb();

Option 2:
    fast path:
        spin_lock(local_lock);
        smp_load_acquire(global_flag);
    slow path:
        global_flag = 1;
        spin_lock(local_lock); spin_unlock(local_lock);

Rationale:
The ACQUIRE from spin_lock() is at the read of local_lock, not at the
write. That is, without the smp_mb() at [[1]], the CPU can do:

    read local_lock; read global_flag; write local_lock;

For Option 2, the smp_mb() is not required, because fast path and slow
path acquire the same lock.

>>>> Technically you could use spin_trylock()+spin_unlock() and avoid the
>>>> lock acquire spinning on spin_unlock() and get very close to the slow
>>>> path performance of a pure cacheline-dirtying behavior.
> This is even more confusing.  Did Ingo mean to suggest using
> "spin_trylock()+spin_unlock()" in place of "spin_lock()+spin_unlock()"
> could provide the desired ordering guarantee without delaying other
> CPUs that may try to acquire the lock?  That seems highly questionable.

I agree :-)

--
    Manfred
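
For concreteness, here is a minimal C sketch of the two orderings described
above, written in kernel style. The names local_lock, global_flag and the
*_option*() helpers are illustrative assumptions, not the actual ipc/sem
code, and the retry loop that sem_lock() needs is omitted.

/*
 * Illustrative sketch only -- not the actual ipc/sem code.  The names
 * local_lock, global_flag and the helper functions are hypothetical.
 */
#include <linux/spinlock.h>
#include <linux/compiler.h>
#include <linux/types.h>
#include <asm/barrier.h>

static DEFINE_SPINLOCK(local_lock);	/* per-object (fast path) lock */
static int global_flag;			/* "complex operation pending" flag */

/*
 * Option 1: the slow path never takes local_lock, so the fast path needs
 * an explicit smp_mb().  spin_lock() provides ACQUIRE semantics at the
 * read of the lock word, not at the write; without [[1]] the load of
 * global_flag could be reordered before the store that takes local_lock.
 */
static bool fast_path_option1(void)
{
	spin_lock(&local_lock);
	smp_mb();			/* [[1]], pairs with the slow path smp_mb() */
	if (smp_load_acquire(&global_flag)) {
		spin_unlock(&local_lock);
		return false;		/* caller must fall back to the slow path */
	}
	return true;			/* caller now owns local_lock */
}

static void slow_path_option1_enter(void)
{
	WRITE_ONCE(global_flag, 1);
	smp_mb();			/* pairs with [[1]] in the fast path */
	/* ... wait (read-only) until no fast path holds local_lock ... */
}

/*
 * Option 2: the slow path acquires and releases local_lock itself, so
 * both paths serialize on the same lock and the fast path smp_mb() can
 * be dropped -- at the cost of dirtying the lock cache line.
 */
static bool fast_path_option2(void)
{
	spin_lock(&local_lock);
	if (smp_load_acquire(&global_flag)) {
		spin_unlock(&local_lock);
		return false;
	}
	return true;
}

static void slow_path_option2_enter(void)
{
	WRITE_ONCE(global_flag, 1);
	spin_lock(&local_lock);
	spin_unlock(&local_lock);
}

The trade-off in the thread shows up directly here: Option 1 keeps the slow
path read-only with respect to local_lock but pays an smp_mb() in the hot
path, while Option 2 drops the hot-path barrier because the slow path
dirties the lock cache line.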