From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1754595AbbFTCVw (ORCPT ); Fri, 19 Jun 2015 22:21:52 -0400
Received: from e35.co.us.ibm.com ([32.97.110.153]:45246 "EHLO
        e35.co.us.ibm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S1753696AbbFTCVo (ORCPT );
        Fri, 19 Jun 2015 22:21:44 -0400
X-Helo: d03dlp02.boulder.ibm.com
X-MailFrom: paulmck@linux.vnet.ibm.com
X-RcptTo: linux-kernel@vger.kernel.org
Date: Fri, 19 Jun 2015 19:21:35 -0700
From: "Paul E. McKenney"
To: Dave Hansen
Cc: Andi Kleen , dave.hansen@linux.intel.com,
        akpm@linux-foundation.org, jack@suse.cz, viro@zeniv.linux.org.uk,
        eparis@redhat.com, john@johnmccutchan.com, rlove@rlove.org,
        tim.c.chen@linux.intel.com, linux-kernel@vger.kernel.org
Subject: Re: [RFC][PATCH] fs: optimize inotify/fsnotify code for unwatched files
Message-ID: <20150620022135.GF3913@linux.vnet.ibm.com>
Reply-To: paulmck@linux.vnet.ibm.com
References: <20150619215025.4F689817@viggo.jf.intel.com>
        <20150619233306.GT25760@tassilo.jf.intel.com>
        <5584B62F.5080506@sr71.net>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
In-Reply-To: <5584B62F.5080506@sr71.net>
User-Agent: Mutt/1.5.21 (2010-09-15)
X-TM-AS-MML: disable
X-Content-Scanned: Fidelis XPS MAILER
x-cbid: 15062002-0013-0000-0000-00000F53E880
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, Jun 19, 2015 at 05:39:11PM -0700, Dave Hansen wrote:
> On 06/19/2015 04:33 PM, Andi Kleen wrote:
> >> > I *think* we can avoid taking the srcu_read_lock() for the
> >> > common case where there are no actual marks on the file
> >> > being modified *or* the vfsmount.
> > What is so expensive in it? Just the memory barrier in it?
>
> The profiling doesn't hit on the mfence directly, but I assume that the
> overhead is coming from there.
> The "mov 0x8(%rdi),%rcx" is identical
> before and after the barrier, but it appears much more expensive
> _after_.  That makes no sense unless the barrier is the thing causing it.

OK, one thing to try is to simply delete the memory barrier.  The
resulting code will be unsafe, but will probably run well enough to
get benchmark results.  If it is the memory barrier, you should of
course get increased throughput.

                                                        Thanx, Paul

> Here's how the annotation mode of 'perf top' breaks it down:
>
> >        │    ffffffff810fb480 :
> >        │      nop
> >        │      mov    (%rdi),%rax
> >   0.58 │      push   %rbp
> >        │      incl   %gs:0x7ef0f488(%rip)
> >   1.73 │      mov    %rsp,%rbp
> >        │      and    $0x1,%eax
> >        │      movslq %eax,%rdx
> >   0.58 │      mov    0x8(%rdi),%rcx
> >        │      incq   %gs:(%rcx,%rdx,8)
> >        │      mfence
> >  69.94 │      add    $0x2,%rdx
> >   7.51 │      mov    0x8(%rdi),%rcx
> >   4.05 │      incq   %gs:(%rcx,%rdx,8)
> >  13.87 │      decl   %gs:0x7ef0f45f(%rip)
> >        │      pop    %rbp
> >   1.73 │    ← retq
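
For anyone who wants a rough feel for the "with barrier" versus "without
barrier" comparison before touching the kernel, here is a minimal
userspace sketch in C.  It only mimics the shape of the annotated code
(pick an index from the low bit, bump one counter, full barrier, bump a
second counter).  Everything in it is an assumption made for
illustration: struct fake_srcu, fake_read_lock(), time_loop() and the
use_barrier flag are hypothetical names, the counters are ordinary
memory rather than per-CPU data, and the C11 fence merely stands in for
the kernel's barrier.

```c
#include <assert.h>
#include <stdatomic.h>
#include <time.h>

/* Hypothetical stand-in for the counters; NOT the kernel's structure. */
struct fake_srcu {
	volatile unsigned long completed;
	volatile unsigned long c[2];
	volatile unsigned long seq[2];
};

/*
 * Mimics the shape of the annotated function: select an index from the
 * low bit, increment one counter, optionally execute a full barrier,
 * then increment a second counter.  Dropping the barrier is unsafe in
 * real code; it is optional here only to measure its cost.
 */
static int fake_read_lock(struct fake_srcu *sp, int use_barrier)
{
	int idx = (int)(sp->completed & 0x1);

	sp->c[idx]++;
	if (use_barrier)
		atomic_thread_fence(memory_order_seq_cst);
	sp->seq[idx]++;
	return idx;
}

/* Returns the CPU time in seconds spent on iters calls. */
static double time_loop(struct fake_srcu *sp, long iters, int use_barrier)
{
	clock_t start;
	long i;

	assert(iters > 0);
	start = clock();
	for (i = 0; i < iters; i++)
		fake_read_lock(sp, use_barrier);
	return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Calling time_loop() once with use_barrier set to 1 and once with 0, on
the same zero-initialized struct and with a large iteration count, gives
the two numbers to compare.  This is single-threaded and leaves out the
%gs-relative per-CPU addressing, so treat it as a sanity check only.
The real answer has to come from the kernel benchmark itself.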