From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-kernel-owner@vger.kernel.org>
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1753732AbbFXQvF (ORCPT <rfc822;w@1wt.eu>);
	Wed, 24 Jun 2015 12:51:05 -0400
Received: from www.sr71.net ([198.145.64.142]:39051 "EHLO blackbird.sr71.net"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1753330AbbFXQuw (ORCPT <rfc822;linux-kernel@vger.kernel.org>);
	Wed, 24 Jun 2015 12:50:52 -0400
Message-ID: <558ADFEA.7020905@sr71.net>
Date: Wed, 24 Jun 2015 09:50:50 -0700
From: Dave Hansen <dave@sr71.net>
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.7.0
MIME-Version: 1.0
To: paulmck@linux.vnet.ibm.com
CC: Peter Zijlstra <peterz@infradead.org>, Andi Kleen <ak@linux.intel.com>,
        dave.hansen@linux.intel.com, akpm@linux-foundation.org, jack@suse.cz,
        viro@zeniv.linux.org.uk, eparis@redhat.com, john@johnmccutchan.com,
        rlove@rlove.org, tim.c.chen@linux.intel.com,
        linux-kernel@vger.kernel.org
Subject: Re: [RFC][PATCH] fs: optimize inotify/fsnotify code for unwatched
 files
References: <20150619215025.4F689817@viggo.jf.intel.com> <20150619233306.GT25760@tassilo.jf.intel.com> <5584B62F.5080506@sr71.net> <20150620022135.GF3913@linux.vnet.ibm.com> <5585AAA0.1030305@sr71.net> <20150621013058.GH3913@linux.vnet.ibm.com> <20150622132821.GB12596@twins.programming.kicks-ass.net> <20150622151121.GK3913@linux.vnet.ibm.com> <5588590A.7080001@sr71.net> <20150623002614.GD3892@linux.vnet.ibm.com>
In-Reply-To: <20150623002614.GD3892@linux.vnet.ibm.com>
Content-Type: text/plain; charset=windows-1252
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On 06/22/2015 05:26 PM, Paul E. McKenney wrote:
> OK, here is an experimental patch that provides a fast-readers variant
> of RCU, forward-ported from v3.3.  Because we didn't have call_srcu()
> and srcu_barrier() back then, it is not a drop-in replacement for SRCU,
> so you need to adapt the code to the API, which means putting an "fr"
> in front of the "srcu" in the API members.
> 
> Understood on the overhead of the memory-barrier instruction showing
> up consistently.  My point was instead that getting rid of this
> memory-barrier instruction does not come for free, as it greatly
> increases the latency of synchronize_frsrcu().  In a real workload,
> it is entirely possible that the savings from eliminating the memory
> barrier are overwhelmed by the increased grace-period latency.
> 
> Anyway, the patch is below.  Very lightly tested.

This gives a performance boost very similar to that of the other
optimization I posted.  I measured this patch boosting writes/second
by 11.0%, while my previous optimization gave 10.8%.

I don't think this workload will see any of the overhead of
synchronize_frsrcu(), but the result does help confirm the source of
the overhead.