From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754829AbbFWMCp (ORCPT ); Tue, 23 Jun 2015 08:02:45 -0400 Received: from cantor2.suse.de ([195.135.220.15]:50011 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754134AbbFWMCa (ORCPT ); Tue, 23 Jun 2015 08:02:30 -0400 Date: Tue, 23 Jun 2015 14:02:25 +0200 From: Jan Kara To: Dave Hansen Cc: jack@suse.cz, viro@zeniv.linux.org.uk, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, paulmck@linux.vnet.ibm.com, tim.c.chen@linux.intel.com, ak@linux.intel.com Subject: Re: [RFC][PATCH 2/2] fs: conditionally do memory barrier in __sb_end_write() Message-ID: <20150623120225.GF2427@quack.suse.cz> References: <20150619223223.10B658AD@viggo.jf.intel.com> <20150619223223.94775FFA@viggo.jf.intel.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20150619223223.94775FFA@viggo.jf.intel.com> User-Agent: Mutt/1.5.21 (2010-09-15) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Fri 19-06-15 15:32:23, Dave Hansen wrote: > If I sit in a loop and do write()s to small tmpfs files, > __sb_end_write() is third-hottest kernel function due to its > smp_mb(). > > The stated purpose for the smp_mb() in __sb_end_write() is to > ensure "s_writers are updated before we wake up waiters". We > only wake up waiters if waitqueue_active(), but we do the > smp_mb() unconditionally. > > It seems like we should be able to avoid it unless we are > actually doing the wake_up(). ... > diff -puN fs/super.c~selectively-do-barriers-in-__sb_end_write fs/super.c > --- a/fs/super.c~selectively-do-barriers-in-__sb_end_write 2015-06-19 15:20:37.953726659 -0700 > +++ b/fs/super.c 2015-06-19 15:20:37.956726794 -0700 > @@ -1147,13 +1147,14 @@ out: > void __sb_end_write(struct super_block *sb, int level) > { > percpu_counter_dec(&sb->s_writers.counter[level-1]); > - /* > - * Make sure s_writers are updated before we wake up waiters in > - * freeze_super(). > - */ > - smp_mb(); > - if (waitqueue_active(&sb->s_writers.wait)) > + if (waitqueue_active(&sb->s_writers.wait)) { > + /* > + * Make sure other CPUs can see our s_writers update > + * before we wake up waiters in freeze_super(). > + */ > + smp_mb(); I think this is actually wrong. The barrier has to be before the waitqueue_active() check. Otherwise that read can be reordered before the percpu counter increment and a race window opens... But we could make things faster by something like: __sb_end_write() rcu_read_lock(); percpu_counter_dec(&sb->s_writers.counter[level-1]); if (unlikely(sb->s_writers.frozen >= level)) wake_up(&sb->s_writers.wait); rcu_read_unlock(); So the synchronize_rcu() calls you've added in the first patch will make sure that all __sb_end_write() calls after we've started the freeze procedure will end up calling wake_up() and so the process waiting in sb_wait_write() will be woken as necessary. But please add a detailed comment about the synchronization because its tricky and uncommon... Honza > wake_up(&sb->s_writers.wait); > + } > rwsem_release(&sb->s_writers.lock_map[level-1], 1, _RET_IP_); > } > EXPORT_SYMBOL(__sb_end_write); > _ -- Jan Kara SUSE Labs, CR