Date: Sat, 20 Jun 2015 18:30:58 -0700
From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Dave Hansen
Cc: Andi Kleen, dave.hansen@linux.intel.com, akpm@linux-foundation.org,
    jack@suse.cz, viro@zeniv.linux.org.uk, eparis@redhat.com,
    john@johnmccutchan.com, rlove@rlove.org, tim.c.chen@linux.intel.com,
    linux-kernel@vger.kernel.org
Subject: Re: [RFC][PATCH] fs: optimize inotify/fsnotify code for unwatched files
Message-ID: <20150621013058.GH3913@linux.vnet.ibm.com>
In-Reply-To: <5585AAA0.1030305@sr71.net>
References: <20150619215025.4F689817@viggo.jf.intel.com>
 <20150619233306.GT25760@tassilo.jf.intel.com>
 <5584B62F.5080506@sr71.net>
 <20150620022135.GF3913@linux.vnet.ibm.com>
 <5585AAA0.1030305@sr71.net>

On Sat, Jun 20, 2015 at 11:02:08AM -0700, Dave Hansen wrote:
> On 06/19/2015 07:21 PM, Paul E. McKenney wrote:
> >>> > > What is so expensive in it? Just the memory barrier in it?
> >> >
> >> > The profiling doesn't hit on the mfence directly, but I assume that the
> >> > overhead is coming from there.  The "mov 0x8(%rdi),%rcx" is identical
> >> > before and after the barrier, but it appears much more expensive
> >> > _after_.  That makes no sense unless the barrier is the thing causing it.
> > OK, one thing to try is to simply delete the memory barrier.  The
> > resulting code will be unsafe, but will probably run well enough to
> > get benchmark results.  If it is the memory barrier, you should of
> > course get increased throughput.
>
> So I took the smp_mb() out of __srcu_read_lock().  The benchmark didn't
> improve at all.  Looking at the profile, all of the overhead had just
> shifted to __srcu_read_unlock() and its memory barrier!  Removing the
> barrier in __srcu_read_unlock() got essentially the same gains out of
> the benchmark as the original patch in this thread that just avoids RCU.
>
> I think that's fairly conclusive that the source of the overhead is,
> indeed, the memory barriers.
>
> Although I said this test was single threaded, I also had another
> thought.  The benchmark is single-threaded, but 'perf' is sitting doing
> profiling and who knows what else on the other core, and the profiling
> NMIs are certainly writing plenty of data to memory.  So, there might be
> plenty of work for that smp_mb()/mfence to do _despite_ the benchmark
> itself being single threaded.

Well, it is not hard to have an SRCU-like thing that doesn't have
read-side memory barriers, given that older versions of SRCU didn't have
them.  However, the price is increased latency for the analog to
synchronize_srcu().  I am guessing that this would not be a problem for
notification-group destruction, which is presumably rare.
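For reference, the read-side primitives under discussion look roughly
like the sketch below (simplified from memory of that era's
kernel/rcu/srcu.c; the per-CPU field names are approximations, so check
the actual tree).  The two smp_mb() calls are the mfences that dominate
the profile and that the experiment above deleted; a barrier-free
SRCU-like variant would drop them and instead make the update side, the
analog of synchronize_srcu(), pay for the ordering.

/*
 * Approximate sketch of the SRCU read-side, NOT copied from the tree.
 */
int __srcu_read_lock(struct srcu_struct *sp)
{
	int idx;

	idx = READ_ONCE(sp->completed) & 0x1;	 /* pick the current counter pair */
	__this_cpu_inc(sp->per_cpu_ref->c[idx]); /* record this CPU as a reader */
	smp_mb(); /* Keep the critical section from leaking above the increment. */
	__this_cpu_inc(sp->per_cpu_ref->seq[idx]);
	return idx;
}

void __srcu_read_unlock(struct srcu_struct *sp, int idx)
{
	smp_mb(); /* Keep the critical section from leaking below the decrement. */
	this_cpu_dec(sp->per_cpu_ref->c[idx]);
}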
That said, if an empty *_fsnotify_mask is the common case, or if the
overhead of processing notifications overwhelms that of srcu_read_lock(),
your original patch seems a bit simpler.

							Thanx, Paul
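For concreteness, the "original patch" approach referred to above
amounts to an early bail-out in fsnotify() before the SRCU read-side
critical section is ever entered.  The sketch below is only an
approximation of that shape, not the patch itself: the exact fields
tested (i_fsnotify_mask, mnt_fsnotify_mask) are assumptions here, and
the cases the real fsnotify() must still handle (ignored-mask clearing
on modify events, and the mark-list walk itself) are glossed over.

/*
 * Hedged sketch of the "skip SRCU for unwatched files" fast path.
 */
int fsnotify(struct inode *to_tell, __u32 mask, void *data, int data_is,
	     const unsigned char *file_name, u32 cookie)
{
	struct mount *mnt = NULL;
	int idx, ret = 0;

	if (data_is == FSNOTIFY_EVENT_PATH)
		mnt = real_mount(((struct path *)data)->mnt);

	/*
	 * Fast path: if neither the inode nor the vfsmount is being
	 * watched, there is nothing to deliver, so return before ever
	 * reaching srcu_read_lock() and the mfence inside it.
	 */
	if (!to_tell->i_fsnotify_mask && (!mnt || !mnt->mnt_fsnotify_mask))
		return 0;

	idx = srcu_read_lock(&fsnotify_mark_srcu);
	/* ... walk the inode and vfsmount mark lists as before ... */
	srcu_read_unlock(&fsnotify_mark_srcu, idx);
	return ret;
}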