Date: Tue, 5 Jan 2010 11:28:57 -0800 (PST)
From: Linus Torvalds
To: Christoph Lameter
Cc: Andi Kleen, KAMEZAWA Hiroyuki, Minchan Kim, Peter Zijlstra,
    "Paul E. McKenney", linux-kernel@vger.kernel.org, linux-mm@kvack.org,
    hugh.dickins, Nick Piggin, Ingo Molnar
Subject: Re: [RFC][PATCH 6/8] mm: handle_speculative_fault()

On Tue, 5 Jan 2010, Christoph Lameter wrote:
>
> The wait state is the processor being stopped due to not being able to
> access the cacheline. Not the processor spinning in the xadd loop. That
> only occurs if the critical section is longer than the timeout.

You don't know what you're talking about, do you? Just go and read the
source code.

The process is currently spinning in the spin_lock loop. Here, I'll
quote it to you:

	LOCK_PREFIX "xaddw %w0, %1\n"
	"1:\t"
	"cmpb %h0, %b0\n\t"
	"je 2f\n\t"
	"rep ; nop\n\t"
	"movb %1, %b0\n\t"
	/* don't need lfence here, because loads are in-order */
	"jmp 1b\n"

Note the loop that spins - reading the thing over and over - waiting for
_that_ CPU to be the owner of the xadd ticket. That's the code you have
now, only because x86-64 uses the STUPID FALLBACK CODE for the
rwsemaphores!

In contrast, look at what the non-stupid rwsemaphore code does (which
triggers on x86-32):

	LOCK_PREFIX " incl (%%eax)\n\t"
	/* adds 0x00000001, returns the old value */
	" jns 1f\n"
	" call call_rwsem_down_read_failed\n"

(That's a "down_read()", which happens to be the op we care most about.)

See? That's a single locked "inc" (it avoids the xadd on the read side
because of how we've biased things). In particular, notice how this means
that we do NOT have fifty million CPUs all trying to read the same
location while one writes to it successfully.

Spot the difference?

Here's another way of putting it. Which of these scenarios do you think
should result in less cross-node traffic:

 - multiple CPUs that - one by one - get the cacheline for exclusive
   access.

 - multiple CPUs that - one by one - get the cacheline for exclusive
   access, while other CPUs are all trying to read the same cacheline at
   the same time, over and over again, in a loop.

See the shared part? See the difference? If you look at just a single
lock acquire, it boils down to these two scenarios:

 - one CPU gets the cacheline exclusively

 - one CPU gets the cacheline exclusively while other CPUs are all
   trying to read the old and the new value.

It really is that simple.

		Linus
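[Editorial note: the following is a minimal user-space C sketch of the two
access patterns being contrasted above, not the kernel code Linus quotes.
It assumes C11 atomics, and the names (ticket_lock, fake_rwsem,
fake_down_read_fast) are made up for illustration. The point it tries to
show: a ticket-lock acquire spins re-reading a word that the current owner
keeps writing, while an rwsem-style reader fast path is a single atomic
increment with no read loop on the contended line.]

/* Sketch only - compile with: gcc -std=c11 -c sketch.c */
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Pattern 1: ticket lock.  Every waiter keeps loading the lock word
 * while the owner writes it, so the cacheline bounces between the
 * exclusive owner and all of the spinning readers.
 */
struct ticket_lock {
	atomic_uint next;	/* ticket handed to the next arrival */
	atomic_uint owner;	/* ticket currently being served */
};

static void ticket_lock_acquire(struct ticket_lock *l)
{
	/* like the xadd: grab my ticket with one atomic op ... */
	unsigned int me = atomic_fetch_add(&l->next, 1);

	/*
	 * ... then spin, re-reading the shared word until it is my
	 * turn.  This read loop is what drags the cacheline back to
	 * every waiting CPU each time the owner updates it.
	 */
	while (atomic_load_explicit(&l->owner, memory_order_acquire) != me)
		;	/* cpu_relax() / "rep ; nop" would go here */
}

static void ticket_lock_release(struct ticket_lock *l)
{
	atomic_fetch_add_explicit(&l->owner, 1, memory_order_release);
}

/*
 * Pattern 2: reader fast path in the style of the non-fallback rwsem.
 * The uncontended case is one locked increment; nobody sits in a read
 * loop on the same word, so each acquire is a single exclusive
 * ownership transfer of the cacheline and nothing more.
 */
struct fake_rwsem {
	atomic_int count;	/* >= 0: readers only, < 0: writer-biased */
};

static bool fake_down_read_fast(struct fake_rwsem *sem)
{
	/*
	 * One atomic inc; if the old value was negative we would fall
	 * into a slow path (sleeping, not spinning) instead.
	 */
	return atomic_fetch_add(&sem->count, 1) >= 0;
}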