* [RFC][PATCH 1/2] fs: use RCU for free_super() vs. __sb_start_write()
@ 2015-06-19 22:32 Dave Hansen
2015-06-19 22:32 ` [RFC][PATCH 2/2] fs: conditionally do memory barrier in __sb_end_write() Dave Hansen
2015-06-23 11:09 ` [RFC][PATCH 1/2] fs: use RCU for free_super() vs. __sb_start_write() Jan Kara
0 siblings, 2 replies; 5+ messages in thread
From: Dave Hansen @ 2015-06-19 22:32 UTC
To: dave; +Cc: jack, viro, linux-fsdevel, linux-kernel, paulmck, tim.c.chen, ak
Currently, __sb_start_write() and freeze_super() can race with
each other. __sb_start_write() uses a smp_mb() to ensure that
freeze_super() can see its write to sb->s_writers.counter and
that it can see freeze_super()'s update to sb->s_writers.frozen.
This all seems to work fine.
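A minimal userspace sketch of the pairing described above, using C11 atomics rather than kernel primitives. The names `writers`, `frozen`, `start_write_nowait()` and `begin_freeze()` are hypothetical stand-ins, not kernel symbols. The sketch uses a full fence on both sides, which is the textbook form of this pattern; the freeze_super() code touched by this patch uses smp_wmb() on its side.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins for sb->s_writers.counter[] and .frozen. */
static atomic_int writers;	/* number of active writers */
static atomic_int frozen;	/* 0 = unfrozen, 1 = freeze in progress */

/* Writer side: publish the count, full barrier, then re-check frozen. */
static bool start_write_nowait(void)
{
	if (atomic_load_explicit(&frozen, memory_order_relaxed))
		return false;
	atomic_fetch_add_explicit(&writers, 1, memory_order_relaxed);
	/* Plays the role of smp_mb() in __sb_start_write(). */
	atomic_thread_fence(memory_order_seq_cst);
	if (atomic_load_explicit(&frozen, memory_order_relaxed)) {
		/* Lost the race with the freezer: back out. */
		atomic_fetch_sub_explicit(&writers, 1, memory_order_relaxed);
		return false;
	}
	return true;
}

/* Freezer side: set frozen, full barrier, then read the count. */
static int begin_freeze(void)
{
	atomic_store_explicit(&frozen, 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);
	return atomic_load_explicit(&writers, memory_order_relaxed);
}
```

With a full fence on both sides, at least one party observes the other: either the writer sees frozen set and backs out, or the freezer sees the writer's increment and knows it has to wait.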
But, this smp_mb() makes __sb_start_write() the single hottest
function in the kernel if I sit in a loop and do tiny write()s to
tmpfs over and over. This is on a very small 2-core system, so
it will only get worse on larger systems.
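For reference, a loop of the kind described above could look roughly like the sketch below. This is not the benchmark used for the numbers in this mail; the function name `tiny_write_loop()` and its parameters are made up for illustration, and the caller is assumed to pass a path on a tmpfs mount.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <time.h>
#include <unistd.h>

/*
 * Issue `iterations` one-byte write()s to `path` and return how many
 * succeeded (-1 if the file could not be opened). The elapsed
 * wall-clock time in seconds is stored in *elapsed.
 */
static long tiny_write_loop(const char *path, long iterations, double *elapsed)
{
	struct timespec t0, t1;
	long done = 0;
	int fd = open(path, O_CREAT | O_WRONLY | O_TRUNC, 0600);

	if (fd < 0)
		return -1;
	clock_gettime(CLOCK_MONOTONIC, &t0);
	for (long i = 0; i < iterations; i++) {
		/* Each write() goes through the sb_start_write() path. */
		if (write(fd, "x", 1) == 1)
			done++;
	}
	clock_gettime(CLOCK_MONOTONIC, &t1);
	close(fd);
	unlink(path);
	*elapsed = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
	return done;
}
```

Dividing the returned count by `*elapsed` gives writes/second, which is the metric quoted in this mail.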
This _seems_ like an ideal case for RCU. __sb_start_write() is
the RCU read-side and is in a very fast, performance-sensitive
path. freeze_super() is the RCU writer and is in an extremely
rare non-performance-sensitive path.
Instead of doing an smp_mb() in __sb_start_write(), we do
rcu_read_lock(). This ensures that a CPU doing freeze_super()
can not proceed past its synchronize_rcu() until the grace
period has ended and the 's_writers.frozen = SB_FREEZE_WRITE'
is visible to __sb_start_write().
One question here: Does the work that __sb_start_write() does in
a previous grace period become visible to freeze_super() after
its call to synchronize_rcu()? It _seems_ like it should, but it
seems backwards to me since __sb_start_write() is the "reader" in
this case.
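To make the question concrete, here is a toy userspace model of the scheme. It is only a sketch: the `toy_*` names are hypothetical, the "grace period" is a spin on a reader count, and the read side uses sequentially consistent atomic operations, which is exactly the cost real RCU avoids. In this toy the reader's increment is visible to the freezer once the wait returns; whether synchronize_rcu() gives the same guarantee is what is being asked.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins; none of these names exist in the kernel. */
static atomic_int toy_readers;	/* threads inside a read-side section */
static atomic_int toy_frozen;	/* stand-in for sb->s_writers.frozen */
static atomic_int toy_writers;	/* stand-in for s_writers.counter[] */

static void toy_read_lock(void)   { atomic_fetch_add(&toy_readers, 1); }
static void toy_read_unlock(void) { atomic_fetch_sub(&toy_readers, 1); }

/* Toy grace period: wait until no read-side section is in flight. */
static void toy_synchronize(void)
{
	while (atomic_load(&toy_readers) != 0)
		;	/* spin; real RCU sleeps instead */
}

/* Shape of the patched __sb_start_write() with wait == false. */
static bool toy_start_write(void)
{
	bool ok;

	toy_read_lock();
	ok = !atomic_load(&toy_frozen);
	if (ok)
		atomic_fetch_add(&toy_writers, 1);
	toy_read_unlock();
	return ok;
}

/* Shape of the first step of the patched freeze_super(). */
static int toy_freeze(void)
{
	atomic_store(&toy_frozen, 1);
	toy_synchronize();
	/* Every section that could have seen frozen == 0 has finished. */
	return atomic_load(&toy_writers);
}
```

Any reader that enters after the freezer saw the reader count drop to zero also sees toy_frozen set, so the writer count returned by toy_freeze() can only go down from there.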
This patch increases the number of writes/second that I can do
by 10.4%.
Does anybody see any holes with this?
Cc: Jan Kara <jack@suse.cz>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
---
b/fs/super.c | 38 +++++++++++++++++++++-----------------
1 file changed, 21 insertions(+), 17 deletions(-)
diff -puN fs/super.c~rcu-__sb_start_write fs/super.c
--- a/fs/super.c~rcu-__sb_start_write 2015-06-19 14:50:53.081869092 -0700
+++ b/fs/super.c 2015-06-19 15:19:03.000473047 -0700
@@ -1190,27 +1190,25 @@ static void acquire_freeze_lock(struct s
*/
int __sb_start_write(struct super_block *sb, int level, bool wait)
{
-retry:
- if (unlikely(sb->s_writers.frozen >= level)) {
+ /*
+ * RCU keeps freeze_super() from proceeding
+ * while we are in here.
+ */
+ rcu_read_lock();
+ while (unlikely(sb->s_writers.frozen >= level)) {
+ rcu_read_unlock();
if (!wait)
- return 0;
+ return 0;
wait_event(sb->s_writers.wait_unfrozen,
sb->s_writers.frozen < level);
+ rcu_read_lock();
}
#ifdef CONFIG_LOCKDEP
acquire_freeze_lock(sb, level, !wait, _RET_IP_);
#endif
percpu_counter_inc(&sb->s_writers.counter[level-1]);
- /*
- * Make sure counter is updated before we check for frozen.
- * freeze_super() first sets frozen and then checks the counter.
- */
- smp_mb();
- if (unlikely(sb->s_writers.frozen >= level)) {
- __sb_end_write(sb, level);
- goto retry;
- }
+ rcu_read_unlock();
return 1;
}
EXPORT_SYMBOL(__sb_start_write);
@@ -1312,7 +1310,13 @@ int freeze_super(struct super_block *sb)
/* From now on, no new normal writers can start */
sb->s_writers.frozen = SB_FREEZE_WRITE;
- smp_wmb();
+ /*
+ * After we synchronize_rcu(), we have ensured that everyone
+ * who reads sb->s_writers.frozen under rcu_read_lock() can
+ * now see our update. This pretty much means that
+ * __sb_start_write() will not allow any new writers.
+ */
+ synchronize_rcu();
/* Release s_umount to preserve sb_start_write -> s_umount ordering */
up_write(&sb->s_umount);
@@ -1322,7 +1326,7 @@ int freeze_super(struct super_block *sb)
/* Now we go and block page faults... */
down_write(&sb->s_umount);
sb->s_writers.frozen = SB_FREEZE_PAGEFAULT;
- smp_wmb();
+ synchronize_rcu();
sb_wait_write(sb, SB_FREEZE_PAGEFAULT);
@@ -1331,7 +1335,7 @@ int freeze_super(struct super_block *sb)
/* Now wait for internal filesystem counter */
sb->s_writers.frozen = SB_FREEZE_FS;
- smp_wmb();
+ synchronize_rcu();
sb_wait_write(sb, SB_FREEZE_FS);
if (sb->s_op->freeze_fs) {
@@ -1340,7 +1344,7 @@ int freeze_super(struct super_block *sb)
printk(KERN_ERR
"VFS:Filesystem freeze failed\n");
sb->s_writers.frozen = SB_UNFROZEN;
- smp_wmb();
+ synchronize_rcu();
wake_up(&sb->s_writers.wait_unfrozen);
deactivate_locked_super(sb);
return ret;
@@ -1387,7 +1391,7 @@ int thaw_super(struct super_block *sb)
out:
sb->s_writers.frozen = SB_UNFROZEN;
- smp_wmb();
+ synchronize_rcu();
wake_up(&sb->s_writers.wait_unfrozen);
deactivate_locked_super(sb);
_
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
Please read the FAQ at http://www.tux.org/lkml/
* [RFC][PATCH 2/2] fs: conditionally do memory barrier in __sb_end_write()
2015-06-19 22:32 [RFC][PATCH 1/2] fs: use RCU for free_super() vs. __sb_start_write() Dave Hansen
@ 2015-06-19 22:32 ` Dave Hansen
2015-06-23 12:02 ` Jan Kara
2015-06-23 11:09 ` [RFC][PATCH 1/2] fs: use RCU for free_super() vs. __sb_start_write() Jan Kara
1 sibling, 1 reply; 5+ messages in thread
From: Dave Hansen @ 2015-06-19 22:32 UTC
To: dave; +Cc: jack, viro, linux-fsdevel, linux-kernel, paulmck, tim.c.chen, ak
If I sit in a loop and do write()s to small tmpfs files,
__sb_end_write() is the third-hottest kernel function due to its
smp_mb().
The stated purpose for the smp_mb() in __sb_end_write() is to
ensure "s_writers are updated before we wake up waiters". We
only wake up waiters if waitqueue_active(), but we do the
smp_mb() unconditionally.
It seems like we should be able to avoid it unless we are
actually doing the wake_up().
Cc: Jan Kara <jack@suse.cz>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
---
b/fs/super.c | 13 +++++++------
1 file changed, 7 insertions(+), 6 deletions(-)
diff -puN fs/super.c~selectively-do-barriers-in-__sb_end_write fs/super.c
--- a/fs/super.c~selectively-do-barriers-in-__sb_end_write 2015-06-19 15:20:37.953726659 -0700
+++ b/fs/super.c 2015-06-19 15:20:37.956726794 -0700
@@ -1147,13 +1147,14 @@ out:
void __sb_end_write(struct super_block *sb, int level)
{
percpu_counter_dec(&sb->s_writers.counter[level-1]);
- /*
- * Make sure s_writers are updated before we wake up waiters in
- * freeze_super().
- */
- smp_mb();
- if (waitqueue_active(&sb->s_writers.wait))
+ if (waitqueue_active(&sb->s_writers.wait)) {
+ /*
+ * Make sure other CPUs can see our s_writers update
+ * before we wake up waiters in freeze_super().
+ */
+ smp_mb();
wake_up(&sb->s_writers.wait);
+ }
rwsem_release(&sb->s_writers.lock_map[level-1], 1, _RET_IP_);
}
EXPORT_SYMBOL(__sb_end_write);
_
* Re: [RFC][PATCH 1/2] fs: use RCU for free_super() vs. __sb_start_write()
2015-06-19 22:32 [RFC][PATCH 1/2] fs: use RCU for free_super() vs. __sb_start_write() Dave Hansen
2015-06-19 22:32 ` [RFC][PATCH 2/2] fs: conditionally do memory barrier in __sb_end_write() Dave Hansen
@ 2015-06-23 11:09 ` Jan Kara
2015-06-24 20:21 ` Dave Hansen
1 sibling, 1 reply; 5+ messages in thread
From: Jan Kara @ 2015-06-23 11:09 UTC
To: Dave Hansen
Cc: jack, viro, linux-fsdevel, linux-kernel, paulmck, tim.c.chen, ak
On Fri 19-06-15 15:32:23, Dave Hansen wrote:
>
> Currently, __sb_start_write() and freeze_super() can race with
> each other. __sb_start_write() uses a smp_mb() to ensure that
> freeze_super() can see its write to sb->s_writers.counter and
> that it can see freeze_super()'s update to sb->s_writers.frozen.
> This all seems to work fine.
>
> But, this smp_mb() makes __sb_start_write() the single hottest
> function in the kernel if I sit in a loop and do tiny write()s to
> tmpfs over and over. This is on a very small 2-core system, so
> it will only get worse on larger systems.
>
> This _seems_ like an ideal case for RCU. __sb_start_write() is
> the RCU read-side and is in a very fast, performance-sensitive
> path. freeze_super() is the RCU writer and is in an extremely
> rare non-performance-sensitive path.
>
> Instead of doing and smp_wmb() in __sb_start_write(), we do
> rcu_read_lock(). This ensures that a CPU doing freeze_super()
> can not proceed past its synchronize_rcu() until the grace
> period has ended and the 's_writers.frozen = SB_FREEZE_WRITE'
> is visible to __sb_start_write().
>
> One question here: Does the work that __sb_start_write() does in
> a previous grace period becomes visible to freeze_super() after
> its call to synchronize_rcu()? It _seems_ like it should, but it
> seems backwards to me since __sb_start_write() is the "reader" in
> this case.
>
> This patch increases the number of writes/second that I can do
> by 10.4%.
>
> Does anybody see any holes with this?
Nice speed up and looks good to me. Just one question below.
> @@ -1340,7 +1344,7 @@ int freeze_super(struct super_block *sb)
> printk(KERN_ERR
> "VFS:Filesystem freeze failed\n");
> sb->s_writers.frozen = SB_UNFROZEN;
> - smp_wmb();
> + synchronize_rcu();
Do we really need synchronize_rcu() here? We just need to make sure write
to sb->s_writers.frozen happens before we start waking processes...
> wake_up(&sb->s_writers.wait_unfrozen);
> deactivate_locked_super(sb);
> return ret;
> @@ -1387,7 +1391,7 @@ int thaw_super(struct super_block *sb)
>
> out:
> sb->s_writers.frozen = SB_UNFROZEN;
> - smp_wmb();
> + synchronize_rcu();
> wake_up(&sb->s_writers.wait_unfrozen);
And here as well...
Honza
--
Jan Kara <jack@suse.cz>
SUSE Labs, CR
* Re: [RFC][PATCH 2/2] fs: conditionally do memory barrier in __sb_end_write()
2015-06-19 22:32 ` [RFC][PATCH 2/2] fs: conditionally do memory barrier in __sb_end_write() Dave Hansen
@ 2015-06-23 12:02 ` Jan Kara
0 siblings, 0 replies; 5+ messages in thread
From: Jan Kara @ 2015-06-23 12:02 UTC
To: Dave Hansen
Cc: jack, viro, linux-fsdevel, linux-kernel, paulmck, tim.c.chen, ak
On Fri 19-06-15 15:32:23, Dave Hansen wrote:
> If I sit in a loop and do write()s to small tmpfs files,
> __sb_end_write() is third-hottest kernel function due to its
> smp_mb().
>
> The stated purpose for the smp_mb() in __sb_end_write() is to
> ensure "s_writers are updated before we wake up waiters". We
> only wake up waiters if waitqueue_active(), but we do the
> smp_mb() unconditionally.
>
> It seems like we should be able to avoid it unless we are
> actually doing the wake_up().
...
> diff -puN fs/super.c~selectively-do-barriers-in-__sb_end_write fs/super.c
> --- a/fs/super.c~selectively-do-barriers-in-__sb_end_write 2015-06-19 15:20:37.953726659 -0700
> +++ b/fs/super.c 2015-06-19 15:20:37.956726794 -0700
> @@ -1147,13 +1147,14 @@ out:
> void __sb_end_write(struct super_block *sb, int level)
> {
> percpu_counter_dec(&sb->s_writers.counter[level-1]);
> - /*
> - * Make sure s_writers are updated before we wake up waiters in
> - * freeze_super().
> - */
> - smp_mb();
> - if (waitqueue_active(&sb->s_writers.wait))
> + if (waitqueue_active(&sb->s_writers.wait)) {
> + /*
> + * Make sure other CPUs can see our s_writers update
> + * before we wake up waiters in freeze_super().
> + */
> + smp_mb();
I think this is actually wrong. The barrier has to be before the
waitqueue_active() check. Otherwise that read can be reordered before the
percpu counter decrement and a race window opens...
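A minimal userspace sketch of that ordering requirement, using C11 atomics. The names `active_writers`, `waiter_queued`, `wakeups`, `end_write()` and `must_sleep()` are hypothetical stand-ins for the counter, waitqueue_active(), wake_up() and the two sides of the handshake; this is not the kernel code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

static atomic_int active_writers;	/* stand-in for s_writers.counter[] */
static atomic_bool waiter_queued;	/* stand-in for waitqueue_active() */
static atomic_int wakeups;		/* counts stand-in wake_up() calls */

/* Waker side: decrement, full barrier, and only then look for waiters. */
static void end_write(void)
{
	atomic_fetch_sub_explicit(&active_writers, 1, memory_order_relaxed);
	/* Must come before the waiter check, not inside the if. */
	atomic_thread_fence(memory_order_seq_cst);
	if (atomic_load_explicit(&waiter_queued, memory_order_relaxed))
		atomic_fetch_add_explicit(&wakeups, 1, memory_order_relaxed);
}

/* Waiter side: queue first, full barrier, then re-read the counter. */
static bool must_sleep(void)
{
	atomic_store_explicit(&waiter_queued, true, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);
	return atomic_load_explicit(&active_writers, memory_order_relaxed) != 0;
}
```

If the fence in end_write() is moved inside the if, the load of waiter_queued may be satisfied before the decrement is visible, so the waker can miss the waiter while the waiter still sees a non-zero count and goes to sleep.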
But we could make things faster by something like:
__sb_end_write()
	rcu_read_lock();
	percpu_counter_dec(&sb->s_writers.counter[level-1]);
	if (unlikely(sb->s_writers.frozen >= level))
		wake_up(&sb->s_writers.wait);
	rcu_read_unlock();
So the synchronize_rcu() calls you've added in the first patch will make
sure that all __sb_end_write() calls after we've started the freeze
procedure will end up calling wake_up() and so the process waiting in
sb_wait_write() will be woken as necessary. But please add a detailed
comment about the synchronization because it's tricky and uncommon...
Honza
> wake_up(&sb->s_writers.wait);
> + }
> rwsem_release(&sb->s_writers.lock_map[level-1], 1, _RET_IP_);
> }
> EXPORT_SYMBOL(__sb_end_write);
> _
--
Jan Kara <jack@suse.cz>
SUSE Labs, CR
* Re: [RFC][PATCH 1/2] fs: use RCU for free_super() vs. __sb_start_write()
2015-06-23 11:09 ` [RFC][PATCH 1/2] fs: use RCU for free_super() vs. __sb_start_write() Jan Kara
@ 2015-06-24 20:21 ` Dave Hansen
0 siblings, 0 replies; 5+ messages in thread
From: Dave Hansen @ 2015-06-24 20:21 UTC
To: Jan Kara; +Cc: viro, linux-fsdevel, linux-kernel, paulmck, tim.c.chen, ak
On 06/23/2015 04:09 AM, Jan Kara wrote:
>> @@ -1340,7 +1344,7 @@ int freeze_super(struct super_block *sb)
>> printk(KERN_ERR
>> "VFS:Filesystem freeze failed\n");
>> sb->s_writers.frozen = SB_UNFROZEN;
>> - smp_wmb();
>> + synchronize_rcu();
>
> Do we really need synchronize_rcu() here? We just need to make sure write
> to sb->s_writers.frozen happens before we start waking processes...
I don't think it is necessary. We only need to be concerned in practice
if someone could be inside a critical section when we are executing
this. I *think* the only case that we have that really matters will be
taken care of by the _first_ synchronize_rcu().
It's definitely worth adding a comment.
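As a rough illustration of the weaker requirement on the unfreeze paths (the store to frozen only has to be ordered before the wakeup), here is a userspace sketch with C11 atomics. The names `frozen_state`, `wake_count`, `toy_thaw()` and `woken_sees_unfrozen()` are hypothetical, and the release fence merely plays the role that smp_wmb() has in the unpatched code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

static atomic_int frozen_state;	/* nonzero = frozen */
static atomic_int wake_count;	/* stand-in for wake_up() on wait_unfrozen */

static void toy_thaw(void)
{
	atomic_store_explicit(&frozen_state, 0, memory_order_relaxed);
	/* Order the store above before the wakeup below. */
	atomic_thread_fence(memory_order_release);
	atomic_fetch_add_explicit(&wake_count, 1, memory_order_relaxed);
}

/* A sleeper that observes a new wakeup must also observe the thaw. */
static bool woken_sees_unfrozen(int last_seen_wake)
{
	if (atomic_load_explicit(&wake_count, memory_order_relaxed) ==
	    last_seen_wake)
		return false;	/* no wakeup yet */
	atomic_thread_fence(memory_order_acquire);
	return atomic_load_explicit(&frozen_state, memory_order_relaxed) == 0;
}
```

No grace period is involved here: the only guarantee is that whoever sees the wakeup also sees the unfrozen state.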