From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1756242AbbFSWcg (ORCPT ); Fri, 19 Jun 2015 18:32:36 -0400 Received: from mga02.intel.com ([134.134.136.20]:52325 "EHLO mga02.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754054AbbFSWc1 (ORCPT ); Fri, 19 Jun 2015 18:32:27 -0400 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.13,646,1427785200"; d="scan'208";a="749873022" Subject: [RFC][PATCH 1/2] fs: use RCU for free_super() vs. __sb_start_write() To: dave@sr71.net Cc: jack@suse.cz, viro@zeniv.linux.org.uk, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, paulmck@linux.vnet.ibm.com, tim.c.chen@linux.intel.com, ak@linux.intel.com From: Dave Hansen Date: Fri, 19 Jun 2015 15:32:23 -0700 Message-Id: <20150619223223.10B658AD@viggo.jf.intel.com> Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Currently, __sb_start_write() and freeze_super() can race with each other. __sb_start_write() uses a smp_mb() to ensure that freeze_super() can see its write to sb->s_writers.counter and that it can see freeze_super()'s update to sb->s_writers.frozen. This all seems to work fine. But, this smp_mb() makes __sb_start_write() the single hottest function in the kernel if I sit in a loop and do tiny write()s to tmpfs over and over. This is on a very small 2-core system, so it will only get worse on larger systems. This _seems_ like an ideal case for RCU. __sb_start_write() is the RCU read-side and is in a very fast, performance-sensitive path. freeze_super() is the RCU writer and is in an extremely rare non-performance-sensitive path. Instead of doing and smp_wmb() in __sb_start_write(), we do rcu_read_lock(). This ensures that a CPU doing freeze_super() can not proceed past its synchronize_rcu() until the grace period has ended and the 's_writers.frozen = SB_FREEZE_WRITE' is visible to __sb_start_write(). One question here: Does the work that __sb_start_write() does in a previous grace period becomes visible to freeze_super() after its call to synchronize_rcu()? It _seems_ like it should, but it seems backwards to me since __sb_start_write() is the "reader" in this case. This patch increases the number of writes/second that I can do by 10.4%. Does anybody see any holes with this? Cc: Jan Kara Cc: Alexander Viro Cc: linux-fsdevel@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: Paul E. McKenney Cc: Tim Chen Cc: Andi Kleen --- b/fs/super.c | 38 +++++++++++++++++++++----------------- 1 file changed, 21 insertions(+), 17 deletions(-) diff -puN fs/super.c~rcu-__sb_start_write fs/super.c --- a/fs/super.c~rcu-__sb_start_write 2015-06-19 14:50:53.081869092 -0700 +++ b/fs/super.c 2015-06-19 15:19:03.000473047 -0700 @@ -1190,27 +1190,25 @@ static void acquire_freeze_lock(struct s */ int __sb_start_write(struct super_block *sb, int level, bool wait) { -retry: - if (unlikely(sb->s_writers.frozen >= level)) { + /* + * RCU keeps freeze_super() from proceeding + * while we are in here. + */ + rcu_read_lock(); + while (unlikely(sb->s_writers.frozen >= level)) { + rcu_read_unlock(); if (!wait) - return 0; + return 0; wait_event(sb->s_writers.wait_unfrozen, sb->s_writers.frozen < level); + rcu_read_lock(); } #ifdef CONFIG_LOCKDEP acquire_freeze_lock(sb, level, !wait, _RET_IP_); #endif percpu_counter_inc(&sb->s_writers.counter[level-1]); - /* - * Make sure counter is updated before we check for frozen. - * freeze_super() first sets frozen and then checks the counter. - */ - smp_mb(); - if (unlikely(sb->s_writers.frozen >= level)) { - __sb_end_write(sb, level); - goto retry; - } + rcu_read_unlock(); return 1; } EXPORT_SYMBOL(__sb_start_write); @@ -1312,7 +1310,13 @@ int freeze_super(struct super_block *sb) /* From now on, no new normal writers can start */ sb->s_writers.frozen = SB_FREEZE_WRITE; - smp_wmb(); + /* + * After we synchronize_rcu(), we have ensured that everyone + * who reads sb->s_writers.frozen under rcu_read_lock() can + * now see our update. This pretty much means that + * __sb_start_write() will not allow any new writers. + */ + synchronize_rcu(); /* Release s_umount to preserve sb_start_write -> s_umount ordering */ up_write(&sb->s_umount); @@ -1322,7 +1326,7 @@ int freeze_super(struct super_block *sb) /* Now we go and block page faults... */ down_write(&sb->s_umount); sb->s_writers.frozen = SB_FREEZE_PAGEFAULT; - smp_wmb(); + synchronize_rcu(); sb_wait_write(sb, SB_FREEZE_PAGEFAULT); @@ -1331,7 +1335,7 @@ int freeze_super(struct super_block *sb) /* Now wait for internal filesystem counter */ sb->s_writers.frozen = SB_FREEZE_FS; - smp_wmb(); + synchronize_rcu(); sb_wait_write(sb, SB_FREEZE_FS); if (sb->s_op->freeze_fs) { @@ -1340,7 +1344,7 @@ int freeze_super(struct super_block *sb) printk(KERN_ERR "VFS:Filesystem freeze failed\n"); sb->s_writers.frozen = SB_UNFROZEN; - smp_wmb(); + synchronize_rcu(); wake_up(&sb->s_writers.wait_unfrozen); deactivate_locked_super(sb); return ret; @@ -1387,7 +1391,7 @@ int thaw_super(struct super_block *sb) out: sb->s_writers.frozen = SB_UNFROZEN; - smp_wmb(); + synchronize_rcu(); wake_up(&sb->s_writers.wait_unfrozen); deactivate_locked_super(sb); _ -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in Please read the FAQ at http://www.tux.org/lkml/ From mboxrd@z Thu Jan 1 00:00:00 1970 From: Dave Hansen Subject: [RFC][PATCH 1/2] fs: use RCU for free_super() vs. __sb_start_write() Date: Fri, 19 Jun 2015 15:32:23 -0700 Message-ID: <20150619223223.10B658AD@viggo.jf.intel.com> Cc: jack@suse.cz, viro@zeniv.linux.org.uk, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, paulmck@linux.vnet.ibm.com, tim.c.chen@linux.intel.com, ak@linux.intel.com To: dave@sr71.net Return-path: Received: from mga02.intel.com ([134.134.136.20]:52325 "EHLO mga02.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754054AbbFSWc1 (ORCPT ); Fri, 19 Jun 2015 18:32:27 -0400 Sender: linux-fsdevel-owner@vger.kernel.org List-ID: Currently, __sb_start_write() and freeze_super() can race with each other. __sb_start_write() uses a smp_mb() to ensure that freeze_super() can see its write to sb->s_writers.counter and that it can see freeze_super()'s update to sb->s_writers.frozen. This all seems to work fine. But, this smp_mb() makes __sb_start_write() the single hottest function in the kernel if I sit in a loop and do tiny write()s to tmpfs over and over. This is on a very small 2-core system, so it will only get worse on larger systems. This _seems_ like an ideal case for RCU. __sb_start_write() is the RCU read-side and is in a very fast, performance-sensitive path. freeze_super() is the RCU writer and is in an extremely rare non-performance-sensitive path. Instead of doing and smp_wmb() in __sb_start_write(), we do rcu_read_lock(). This ensures that a CPU doing freeze_super() can not proceed past its synchronize_rcu() until the grace period has ended and the 's_writers.frozen = SB_FREEZE_WRITE' is visible to __sb_start_write(). One question here: Does the work that __sb_start_write() does in a previous grace period becomes visible to freeze_super() after its call to synchronize_rcu()? It _seems_ like it should, but it seems backwards to me since __sb_start_write() is the "reader" in this case. This patch increases the number of writes/second that I can do by 10.4%. Does anybody see any holes with this? Cc: Jan Kara Cc: Alexander Viro Cc: linux-fsdevel@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: Paul E. McKenney Cc: Tim Chen Cc: Andi Kleen --- b/fs/super.c | 38 +++++++++++++++++++++----------------- 1 file changed, 21 insertions(+), 17 deletions(-) diff -puN fs/super.c~rcu-__sb_start_write fs/super.c --- a/fs/super.c~rcu-__sb_start_write 2015-06-19 14:50:53.081869092 -0700 +++ b/fs/super.c 2015-06-19 15:19:03.000473047 -0700 @@ -1190,27 +1190,25 @@ static void acquire_freeze_lock(struct s */ int __sb_start_write(struct super_block *sb, int level, bool wait) { -retry: - if (unlikely(sb->s_writers.frozen >= level)) { + /* + * RCU keeps freeze_super() from proceeding + * while we are in here. + */ + rcu_read_lock(); + while (unlikely(sb->s_writers.frozen >= level)) { + rcu_read_unlock(); if (!wait) - return 0; + return 0; wait_event(sb->s_writers.wait_unfrozen, sb->s_writers.frozen < level); + rcu_read_lock(); } #ifdef CONFIG_LOCKDEP acquire_freeze_lock(sb, level, !wait, _RET_IP_); #endif percpu_counter_inc(&sb->s_writers.counter[level-1]); - /* - * Make sure counter is updated before we check for frozen. - * freeze_super() first sets frozen and then checks the counter. - */ - smp_mb(); - if (unlikely(sb->s_writers.frozen >= level)) { - __sb_end_write(sb, level); - goto retry; - } + rcu_read_unlock(); return 1; } EXPORT_SYMBOL(__sb_start_write); @@ -1312,7 +1310,13 @@ int freeze_super(struct super_block *sb) /* From now on, no new normal writers can start */ sb->s_writers.frozen = SB_FREEZE_WRITE; - smp_wmb(); + /* + * After we synchronize_rcu(), we have ensured that everyone + * who reads sb->s_writers.frozen under rcu_read_lock() can + * now see our update. This pretty much means that + * __sb_start_write() will not allow any new writers. + */ + synchronize_rcu(); /* Release s_umount to preserve sb_start_write -> s_umount ordering */ up_write(&sb->s_umount); @@ -1322,7 +1326,7 @@ int freeze_super(struct super_block *sb) /* Now we go and block page faults... */ down_write(&sb->s_umount); sb->s_writers.frozen = SB_FREEZE_PAGEFAULT; - smp_wmb(); + synchronize_rcu(); sb_wait_write(sb, SB_FREEZE_PAGEFAULT); @@ -1331,7 +1335,7 @@ int freeze_super(struct super_block *sb) /* Now wait for internal filesystem counter */ sb->s_writers.frozen = SB_FREEZE_FS; - smp_wmb(); + synchronize_rcu(); sb_wait_write(sb, SB_FREEZE_FS); if (sb->s_op->freeze_fs) { @@ -1340,7 +1344,7 @@ int freeze_super(struct super_block *sb) printk(KERN_ERR "VFS:Filesystem freeze failed\n"); sb->s_writers.frozen = SB_UNFROZEN; - smp_wmb(); + synchronize_rcu(); wake_up(&sb->s_writers.wait_unfrozen); deactivate_locked_super(sb); return ret; @@ -1387,7 +1391,7 @@ int thaw_super(struct super_block *sb) out: sb->s_writers.frozen = SB_UNFROZEN; - smp_wmb(); + synchronize_rcu(); wake_up(&sb->s_writers.wait_unfrozen); deactivate_locked_super(sb); _ -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in