* [PATCH] VFS: use synchronize_rcu_expedited() in namespace_unlock()
@ 2017-10-26  2:26 NeilBrown
  2017-10-26 12:27 ` Paul E. McKenney
  0 siblings, 1 reply; 16+ messages in thread
From: NeilBrown @ 2017-10-26  2:26 UTC (permalink / raw)
  To: Alexander Viro
  Cc: linux-fsdevel, linux-kernel, Paul E. McKenney, Josh Triplett


The synchronize_rcu() in namespace_unlock() is called every time
a filesystem is unmounted.  If a great many filesystems are mounted,
this can cause a noticeable slow-down in, for example, system shutdown.

The sequence:
  mkdir -p /tmp/Mtest/{0..5000}
  time for i in /tmp/Mtest/*; do mount -t tmpfs tmpfs $i ; done
  time umount /tmp/Mtest/*

on a 4-cpu VM can report 8 seconds to mount the tmpfs filesystems, and
100 seconds to unmount them.

Boot the same VM with 1 CPU and it takes 18 seconds to mount the
tmpfs filesystems, but only 36 to unmount.

If we change the synchronize_rcu() to synchronize_rcu_expedited(),
the times on a 4-cpu VM are 8 seconds to mount and 0.6 seconds to
unmount.

I think this 200-fold speed up is worth the slightly higher system
impact of using synchronize_rcu_expedited().
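
For a sense of scale, the timings above can be turned into an average cost per unmount. This is only a rough sketch: the helper name per_umount_ms is made up for illustration, and the count of 5001 assumes that the {0..5000} brace expansion produced one tmpfs mount per directory.

```shell
# Hypothetical helper: average cost per unmount in milliseconds,
# given total wall-clock seconds ($1) and the number of filesystems ($2).
per_umount_ms() {
    awk -v secs="$1" -v n="$2" 'BEGIN { printf "%.2f\n", secs * 1000 / n }'
}

# Unmount timings reported above; {0..5000} gives 5001 mounts (assumed).
per_umount_ms 100 5001   # 4 CPUs, synchronize_rcu()            -> 20.00
per_umount_ms 36 5001    # 1 CPU,  synchronize_rcu()            -> 7.20
per_umount_ms 0.6 5001   # 4 CPUs, synchronize_rcu_expedited()  -> 0.12
```

Read this way, each unmount spends roughly 20 ms waiting on the 4-cpu VM, which drops to about 0.12 ms with the expedited variant.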

Signed-off-by: NeilBrown <neilb@suse.com>
---

Cc'ing Paul and Josh so they can correct me if using _expedited()
is really bad here.

Thanks,
NeilBrown


 fs/namespace.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index 3b601f115b6c..fce91c447fab 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -1420,7 +1420,7 @@ static void namespace_unlock(void)
 	if (likely(hlist_empty(&head)))
 		return;
 
-	synchronize_rcu();
+	synchronize_rcu_expedited();
 
 	group_pin_kill(&head);
 }
-- 
2.14.0.rc0.dirty


* Re: [PATCH] VFS: use synchronize_rcu_expedited() in namespace_unlock()
  2017-10-26  2:26 [PATCH] VFS: use synchronize_rcu_expedited() in namespace_unlock() NeilBrown
@ 2017-10-26 12:27 ` Paul E. McKenney
  2017-10-26 13:50   ` Paul E. McKenney
                     ` (2 more replies)
  0 siblings, 3 replies; 16+ messages in thread
From: Paul E. McKenney @ 2017-10-26 12:27 UTC (permalink / raw)
  To: NeilBrown; +Cc: Alexander Viro, linux-fsdevel, linux-kernel, Josh Triplett

On Thu, Oct 26, 2017 at 01:26:37PM +1100, NeilBrown wrote:
> 
> The synchronize_rcu() in namespace_unlock() is called every time
> a filesystem is unmounted.  If a great many filesystems are mounted,
> this can cause a noticable slow-down in, for example, system shutdown.
> 
> The sequence:
>   mkdir -p /tmp/Mtest/{0..5000}
>   time for i in /tmp/Mtest/*; do mount -t tmpfs tmpfs $i ; done
>   time umount /tmp/Mtest/*
> 
> on a 4-cpu VM can report 8 seconds to mount the tmpfs filesystems, and
> 100 seconds to unmount them.
> 
> Boot the same VM with 1 CPU and it takes 18 seconds to mount the
> tmpfs filesystems, but only 36 to unmount.
> 
> If we change the synchronize_rcu() to synchronize_rcu_expedited()
> the umount time on a 4-cpu VM is 8 seconds to mount and 0.6 to
> unmount.
> 
> I think this 200-fold speed up is worth the slightly higher system
> impact of use synchronize_rcu_expedited().
> 
> Signed-off-by: NeilBrown <neilb@suse.com>
> ---
> 
> Cc: to Paul and Josh in case they'll correct me if using _expedited()
> is really bad here.

I suspect that filesystem unmount is pretty rare in production real-time
workloads, which are the ones that might care.  So I would guess that
this is OK.

If the real-time guys ever do want to do filesystem unmounts while their
real-time applications are running, they might modify this so that it can
use synchronize_rcu() instead for real-time builds of the kernel.

But just for completeness, one way to make this work across the board
might be to instead use call_rcu(), with the callback function kicking
off a workqueue handler to do the rest of the unmount.  Of course,
in saying that, I am ignoring any mutexes that you might be holding
across this whole thing, and also ignoring any problems that might arise
when returning to userspace with some portion of the unmount operation
still pending.  (For example, someone unmounting a filesystem and then
immediately remounting that same filesystem.)

							Thanx, Paul

* Re: [PATCH] VFS: use synchronize_rcu_expedited() in namespace_unlock()
  2017-10-26 12:27 ` Paul E. McKenney
@ 2017-10-26 13:50   ` Paul E. McKenney
  2017-10-27  0:45   ` NeilBrown
  2017-11-27 11:27   ` Florian Weimer
  2 siblings, 0 replies; 16+ messages in thread
From: Paul E. McKenney @ 2017-10-26 13:50 UTC (permalink / raw)
  To: NeilBrown; +Cc: Alexander Viro, linux-fsdevel, linux-kernel, Josh Triplett

On Thu, Oct 26, 2017 at 05:27:43AM -0700, Paul E. McKenney wrote:
> On Thu, Oct 26, 2017 at 01:26:37PM +1100, NeilBrown wrote:
> > 
> > The synchronize_rcu() in namespace_unlock() is called every time
> > a filesystem is unmounted.  If a great many filesystems are mounted,
> > this can cause a noticable slow-down in, for example, system shutdown.
> > 
> > The sequence:
> >   mkdir -p /tmp/Mtest/{0..5000}
> >   time for i in /tmp/Mtest/*; do mount -t tmpfs tmpfs $i ; done
> >   time umount /tmp/Mtest/*
> > 
> > on a 4-cpu VM can report 8 seconds to mount the tmpfs filesystems, and
> > 100 seconds to unmount them.
> > 
> > Boot the same VM with 1 CPU and it takes 18 seconds to mount the
> > tmpfs filesystems, but only 36 to unmount.
> > 
> > If we change the synchronize_rcu() to synchronize_rcu_expedited()
> > the umount time on a 4-cpu VM is 8 seconds to mount and 0.6 to
> > unmount.
> > 
> > I think this 200-fold speed up is worth the slightly higher system
> > impact of use synchronize_rcu_expedited().
> > 
> > Signed-off-by: NeilBrown <neilb@suse.com>
> > ---
> > 
> > Cc: to Paul and Josh in case they'll correct me if using _expedited()
> > is really bad here.
> 
> I suspect that filesystem unmount is pretty rare in production real-time
> workloads, which are the ones that might care.  So I would guess that
> this is OK.
> 
> If the real-time guys ever do want to do filesystem unmounts while their
> real-time applications are running, they might modify this so that it can
> use synchronize_rcu() instead for real-time builds of the kernel.

Which they can already do using the rcupdate.rcu_normal boot parameter.
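
As a rough illustration, one way to see whether a given kernel command line carries that parameter is to scan it word by word. The helper name rcu_normal_from_cmdline is hypothetical; with no argument it falls back to reading /proc/cmdline on the running system.

```shell
# Hypothetical helper: print the value of rcupdate.rcu_normal found on a
# kernel command line, or "unset" if the parameter is absent.
# With no argument, the running kernel's /proc/cmdline is inspected.
rcu_normal_from_cmdline() {
    cmdline="${1:-$(cat /proc/cmdline)}"
    # Unquoted on purpose so the command line splits into words.
    for word in $cmdline; do
        case "$word" in
            rcupdate.rcu_normal=*)
                echo "${word#rcupdate.rcu_normal=}"
                return 0
                ;;
        esac
    done
    echo "unset"
}

# Usage (not run here): rcu_normal_from_cmdline
```

A result of 1 would mean that expedited grace periods are being converted back to normal ones, so the speed-up from this patch would not be seen on that boot.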

							Thanx, Paul

> But just for completeness, one way to make this work across the board
> might be to instead use call_rcu(), with the callback function kicking
> off a workqueue handler to do the rest of the unmount.  Of course,
> in saying that, I am ignoring any mutexes that you might be holding
> across this whole thing, and also ignoring any problems that might arise
> when returning to userspace with some portion of the unmount operation
> still pending.  (For example, someone unmounting a filesystem and then
> immediately remounting that same filesystem.)
> 
> 							Thanx, Paul
> 
* Re: [PATCH] VFS: use synchronize_rcu_expedited() in namespace_unlock()
  2017-10-26 12:27 ` Paul E. McKenney
  2017-10-26 13:50   ` Paul E. McKenney
@ 2017-10-27  0:45   ` NeilBrown
  2017-10-27  1:24     ` Paul E. McKenney
  2017-11-27 11:27   ` Florian Weimer
  2 siblings, 1 reply; 16+ messages in thread
From: NeilBrown @ 2017-10-27  0:45 UTC (permalink / raw)
  To: paulmck; +Cc: Alexander Viro, linux-fsdevel, linux-kernel, Josh Triplett

On Thu, Oct 26 2017, Paul E. McKenney wrote:

> On Thu, Oct 26, 2017 at 01:26:37PM +1100, NeilBrown wrote:
>> 
>> The synchronize_rcu() in namespace_unlock() is called every time
>> a filesystem is unmounted.  If a great many filesystems are mounted,
>> this can cause a noticable slow-down in, for example, system shutdown.
>> 
>> The sequence:
>>   mkdir -p /tmp/Mtest/{0..5000}
>>   time for i in /tmp/Mtest/*; do mount -t tmpfs tmpfs $i ; done
>>   time umount /tmp/Mtest/*
>> 
>> on a 4-cpu VM can report 8 seconds to mount the tmpfs filesystems, and
>> 100 seconds to unmount them.
>> 
>> Boot the same VM with 1 CPU and it takes 18 seconds to mount the
>> tmpfs filesystems, but only 36 to unmount.
>> 
>> If we change the synchronize_rcu() to synchronize_rcu_expedited()
>> the umount time on a 4-cpu VM is 8 seconds to mount and 0.6 to
>> unmount.
>> 
>> I think this 200-fold speed up is worth the slightly higher system
>> impact of use synchronize_rcu_expedited().
>> 
>> Signed-off-by: NeilBrown <neilb@suse.com>
>> ---
>> 
>> Cc: to Paul and Josh in case they'll correct me if using _expedited()
>> is really bad here.
>
> I suspect that filesystem unmount is pretty rare in production real-time
> workloads, which are the ones that might care.  So I would guess that
> this is OK.
>
> If the real-time guys ever do want to do filesystem unmounts while their
> real-time applications are running, they might modify this so that it can
> use synchronize_rcu() instead for real-time builds of the kernel.

Thanks for the confirmation, Paul.

>
> But just for completeness, one way to make this work across the board
> might be to instead use call_rcu(), with the callback function kicking
> off a workqueue handler to do the rest of the unmount.  Of course,
> in saying that, I am ignoring any mutexes that you might be holding
> across this whole thing, and also ignoring any problems that might arise
> when returning to userspace with some portion of the unmount operation
> still pending.  (For example, someone unmounting a filesystem and then
> immediately remounting that same filesystem.)

I had briefly considered that option, but it doesn't work.
The purpose of this synchronize_rcu() is to wait for any filename lookup
which might be locklessly touching the mountpoint to complete.
Only after that does the real meat of the unmount happen: the
filesystem is told that the last reference is gone, and it gets to
flush any saved changes out to disk, etc.
That work really has to happen before the umount syscall returns.

Thanks,
NeilBrown

* Re: [PATCH] VFS: use synchronize_rcu_expedited() in namespace_unlock()
  2017-10-27  0:45   ` NeilBrown
@ 2017-10-27  1:24     ` Paul E. McKenney
  0 siblings, 0 replies; 16+ messages in thread
From: Paul E. McKenney @ 2017-10-27  1:24 UTC (permalink / raw)
  To: NeilBrown; +Cc: Alexander Viro, linux-fsdevel, linux-kernel, Josh Triplett

On Fri, Oct 27, 2017 at 11:45:08AM +1100, NeilBrown wrote:
> On Thu, Oct 26 2017, Paul E. McKenney wrote:
> 
> > On Thu, Oct 26, 2017 at 01:26:37PM +1100, NeilBrown wrote:
> >> 
> >> The synchronize_rcu() in namespace_unlock() is called every time
> >> a filesystem is unmounted.  If a great many filesystems are mounted,
> >> this can cause a noticable slow-down in, for example, system shutdown.
> >> 
> >> The sequence:
> >>   mkdir -p /tmp/Mtest/{0..5000}
> >>   time for i in /tmp/Mtest/*; do mount -t tmpfs tmpfs $i ; done
> >>   time umount /tmp/Mtest/*
> >> 
> >> on a 4-cpu VM can report 8 seconds to mount the tmpfs filesystems, and
> >> 100 seconds to unmount them.
> >> 
> >> Boot the same VM with 1 CPU and it takes 18 seconds to mount the
> >> tmpfs filesystems, but only 36 to unmount.
> >> 
> >> If we change the synchronize_rcu() to synchronize_rcu_expedited()
> >> the umount time on a 4-cpu VM is 8 seconds to mount and 0.6 to
> >> unmount.
> >> 
> >> I think this 200-fold speed up is worth the slightly higher system
> >> impact of use synchronize_rcu_expedited().
> >> 
> >> Signed-off-by: NeilBrown <neilb@suse.com>
> >> ---
> >> 
> >> Cc: to Paul and Josh in case they'll correct me if using _expedited()
> >> is really bad here.
> >
> > I suspect that filesystem unmount is pretty rare in production real-time
> > workloads, which are the ones that might care.  So I would guess that
> > this is OK.
> >
> > If the real-time guys ever do want to do filesystem unmounts while their
> > real-time applications are running, they might modify this so that it can
> > use synchronize_rcu() instead for real-time builds of the kernel.
> 
> Thanks for the confirmation Paul.
> 
> >
> > But just for completeness, one way to make this work across the board
> > might be to instead use call_rcu(), with the callback function kicking
> > off a workqueue handler to do the rest of the unmount.  Of course,
> > in saying that, I am ignoring any mutexes that you might be holding
> > across this whole thing, and also ignoring any problems that might arise
> > when returning to userspace with some portion of the unmount operation
> > still pending.  (For example, someone unmounting a filesystem and then
> > immediately remounting that same filesystem.)
> 
> I had briefly considered that option, but it doesn't work.
> The purpose of this synchronize_rcu() is to wait for any filename lookup
> which might be locklessly touching the mountpoint to complete.
> It is only after that that the real meat of unmount happen - the
> filesystem is told that the last reference is gone, and it gets to
> flush any saved changes out to disk etc.
> That stuff really has to happen before the umount syscall returns.

Hey, I was hoping!  ;-)

						Thanx, Paul

* Re: [PATCH] VFS: use synchronize_rcu_expedited() in namespace_unlock()
  2017-10-26 12:27 ` Paul E. McKenney
  2017-10-26 13:50   ` Paul E. McKenney
  2017-10-27  0:45   ` NeilBrown
@ 2017-11-27 11:27   ` Florian Weimer
  2017-11-27 14:41     ` Paul E. McKenney
  2 siblings, 1 reply; 16+ messages in thread
From: Florian Weimer @ 2017-11-27 11:27 UTC (permalink / raw)
  To: paulmck, NeilBrown
  Cc: Alexander Viro, linux-fsdevel, linux-kernel, Josh Triplett

On 10/26/2017 02:27 PM, Paul E. McKenney wrote:
> But just for completeness, one way to make this work across the board
> might be to instead use call_rcu(), with the callback function kicking
> off a workqueue handler to do the rest of the unmount.  Of course,
> in saying that, I am ignoring any mutexes that you might be holding
> across this whole thing, and also ignoring any problems that might arise
> when returning to userspace with some portion of the unmount operation
> still pending.  (For example, someone unmounting a filesystem and then
> immediately remounting that same filesystem.)

You really need to complete all side effects of deallocating a resource
before returning to user space.  Otherwise, it will never be possible to
allocate and deallocate resources in a tight loop: either you get
spurious failures because too many unaccounted deallocations are stuck
somewhere in the system (and the user can't tell that this is due to a
race), or you get an OOM because the user manages to queue up too much
state.

We already have this problem with RLIMIT_NPROC, where waitpid etc.
return before the process is completely gone.  On some
kernels/configurations, the resulting race is so wide that parallel make
no longer works reliably because it runs into fork failures.

Thanks,
Florian

* Re: [PATCH] VFS: use synchronize_rcu_expedited() in namespace_unlock()
  2017-11-27 11:27   ` Florian Weimer
@ 2017-11-27 14:41     ` Paul E. McKenney
  2017-11-28 22:17       ` NeilBrown
  0 siblings, 1 reply; 16+ messages in thread
From: Paul E. McKenney @ 2017-11-27 14:41 UTC (permalink / raw)
  To: Florian Weimer
  Cc: NeilBrown, Alexander Viro, linux-fsdevel, linux-kernel, Josh Triplett

On Mon, Nov 27, 2017 at 12:27:04PM +0100, Florian Weimer wrote:
> On 10/26/2017 02:27 PM, Paul E. McKenney wrote:
> >But just for completeness, one way to make this work across the board
> >might be to instead use call_rcu(), with the callback function kicking
> >off a workqueue handler to do the rest of the unmount.  Of course,
> >in saying that, I am ignoring any mutexes that you might be holding
> >across this whole thing, and also ignoring any problems that might arise
> >when returning to userspace with some portion of the unmount operation
> >still pending.  (For example, someone unmounting a filesystem and then
> >immediately remounting that same filesystem.)
> 
> You really need to complete all side effects of deallocating a
> resource before returning to user space.  Otherwise, it will never
> be possible to allocate and deallocate resources in a tight loop
> because you either get spurious failures because too many
> unaccounted deallocations are stuck somewhere in the system (and the
> user can't tell that this is due to a race), or you get an OOM
> because the user manages to queue up too much state.
> 
> We already have this problem with RLIMIT_NPROC, where waitpid etc.
> return before the process is completely gone.  On some
> kernels/configurations, the resulting race is so wide that parallel
> make no longer works reliable because it runs into fork failures.

Or alternatively, use rcu_barrier() occasionally to wait for all
preceding deferred deallocations.  And there are quite a few other
ways to take on this problem.

							Thanx, Paul

* Re: [PATCH] VFS: use synchronize_rcu_expedited() in namespace_unlock()
  2017-11-27 14:41     ` Paul E. McKenney
@ 2017-11-28 22:17       ` NeilBrown
  2018-10-05  1:27         ` [PATCH - resend] " NeilBrown
  0 siblings, 1 reply; 16+ messages in thread
From: NeilBrown @ 2017-11-28 22:17 UTC (permalink / raw)
  To: paulmck, Florian Weimer
  Cc: Alexander Viro, linux-fsdevel, linux-kernel, Josh Triplett

On Mon, Nov 27 2017, Paul E. McKenney wrote:

> On Mon, Nov 27, 2017 at 12:27:04PM +0100, Florian Weimer wrote:
>> On 10/26/2017 02:27 PM, Paul E. McKenney wrote:
>> >But just for completeness, one way to make this work across the board
>> >might be to instead use call_rcu(), with the callback function kicking
>> >off a workqueue handler to do the rest of the unmount.  Of course,
>> >in saying that, I am ignoring any mutexes that you might be holding
>> >across this whole thing, and also ignoring any problems that might arise
>> >when returning to userspace with some portion of the unmount operation
>> >still pending.  (For example, someone unmounting a filesystem and then
>> >immediately remounting that same filesystem.)
>> 
>> You really need to complete all side effects of deallocating a
>> resource before returning to user space.  Otherwise, it will never
>> be possible to allocate and deallocate resources in a tight loop
>> because you either get spurious failures because too many
>> unaccounted deallocations are stuck somewhere in the system (and the
>> user can't tell that this is due to a race), or you get an OOM
>> because the user manages to queue up too much state.
>> 
>> We already have this problem with RLIMIT_NPROC, where waitpid etc.
>> return before the process is completely gone.  On some
>> kernels/configurations, the resulting race is so wide that parallel
>> make no longer works reliable because it runs into fork failures.
>
> Or alternatively, use rcu_barrier() occasionally to wait for all
> preceding deferred deallocations.  And there are quite a few other
> ways to take on this problem.

So, supposing we could package up everything that has to happen after
the current synchronize_rcu() and put it in a call_rcu() callback,
then instead of calling synchronize_rcu_expedited() at the end of
namespace_unlock(), we could possibly call call_rcu() there and
rcu_barrier() at the start of namespace_lock()...

That would mean a single unmount would have low impact, but it would
still slow down a sequence of 1000 consecutive unmounts.
Maybe we would only need the rcu_barrier() before selected
namespace_lock() calls.  I would need to study the code closely to
form an opinion.  Interesting idea though.

Hopefully the _expedited() patch will be accepted; I haven't had a
"nak" yet...

thanks,
NeilBrown


* [PATCH - resend] VFS: use synchronize_rcu_expedited() in namespace_unlock()
  2017-11-28 22:17       ` NeilBrown
@ 2018-10-05  1:27         ` NeilBrown
  2018-10-05  1:40           ` Al Viro
  2018-11-06  3:15           ` [PATCH - resend] " NeilBrown
  0 siblings, 2 replies; 16+ messages in thread
From: NeilBrown @ 2018-10-05  1:27 UTC (permalink / raw)
  To: Alexander Viro, Andrew Morton
  Cc: paulmck, Florian Weimer, linux-fsdevel, linux-kernel, Josh Triplett


The synchronize_rcu() in namespace_unlock() is called every time
a filesystem is unmounted.  If a great many filesystems are mounted,
this can cause a noticeable slow-down in, for example, system shutdown.

The sequence:
  mkdir -p /tmp/Mtest/{0..5000}
  time for i in /tmp/Mtest/*; do mount -t tmpfs tmpfs $i ; done
  time umount /tmp/Mtest/*

on a 4-cpu VM can report 8 seconds to mount the tmpfs filesystems, and
100 seconds to unmount them.

Boot the same VM with 1 CPU and it takes 18 seconds to mount the
tmpfs filesystems, but only 36 to unmount.

If we change the synchronize_rcu() to synchronize_rcu_expedited(),
the umount time on a 4-cpu VM drops to 0.6 seconds.

I think this 200-fold speed up is worth the slightly higher system
impact of using synchronize_rcu_expedited().

Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> (from general rcu perspective)
Signed-off-by: NeilBrown <neilb@suse.com>
---

I posted this last October, then again last November (cc:ing Linus).
Paul is happy enough with it, but there has been no other response.
I'm hoping it can get applied this time....

Thanks,
NeilBrown


 fs/namespace.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index 99186556f8d3..02e978b22294 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -1360,7 +1360,7 @@ static void namespace_unlock(void)
 	if (likely(hlist_empty(&head)))
 		return;
 
-	synchronize_rcu();
+	synchronize_rcu_expedited();
 
 	group_pin_kill(&head);
 }
-- 
2.14.0.rc0.dirty


* Re: [PATCH - resend] VFS: use synchronize_rcu_expedited() in namespace_unlock()
  2018-10-05  1:27         ` [PATCH - resend] " NeilBrown
@ 2018-10-05  1:40           ` Al Viro
  2018-10-05  2:53             ` NeilBrown
                               ` (2 more replies)
  2018-11-06  3:15           ` [PATCH - resend] " NeilBrown
  1 sibling, 3 replies; 16+ messages in thread
From: Al Viro @ 2018-10-05  1:40 UTC (permalink / raw)
  To: NeilBrown
  Cc: Andrew Morton, paulmck, Florian Weimer, linux-fsdevel,
	linux-kernel, Josh Triplett

On Fri, Oct 05, 2018 at 11:27:37AM +1000, NeilBrown wrote:
> 
> The synchronize_rcu() in namespace_unlock() is called every time
> a filesystem is unmounted.  If a great many filesystems are mounted,
> this can cause a noticable slow-down in, for example, system shutdown.
> 
> The sequence:
>   mkdir -p /tmp/Mtest/{0..5000}
>   time for i in /tmp/Mtest/*; do mount -t tmpfs tmpfs $i ; done
>   time umount /tmp/Mtest/*
> 
> on a 4-cpu VM can report 8 seconds to mount the tmpfs filesystems, and
> 100 seconds to unmount them.
> 
> Boot the same VM with 1 CPU and it takes 18 seconds to mount the
> tmpfs filesystems, but only 36 to unmount.
> 
> If we change the synchronize_rcu() to synchronize_rcu_expedited()
> the umount time on a 4-cpu VM drop to 0.6 seconds
> 
> I think this 200-fold speed up is worth the slightly high system
> impact of using synchronize_rcu_expedited().
> 
> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> (from general rcu perspective)
> Signed-off-by: NeilBrown <neilb@suse.com>
> ---
> 
> I posted this last October, then again last November (cc:ing Linus)
> Paul is happy enough with it, but no other response.
> I'm hoping it can get applied this time....

Umm...  IIRC, the last one got sidetracked on the other thing in the series...
<checks> that was s_anon stuff.  I can live with this one; FWIW, what kind
of load would trigger the impact of the change?  Paul?

* Re: [PATCH - resend] VFS: use synchronize_rcu_expedited() in namespace_unlock()
  2018-10-05  1:40           ` Al Viro
@ 2018-10-05  2:53             ` NeilBrown
  2018-10-05  4:08             ` Paul E. McKenney
  2018-11-29 23:33             ` [PATCH - resend*2] " NeilBrown
  2 siblings, 0 replies; 16+ messages in thread
From: NeilBrown @ 2018-10-05  2:53 UTC (permalink / raw)
  To: Al Viro
  Cc: Andrew Morton, paulmck, Florian Weimer, linux-fsdevel,
	linux-kernel, Josh Triplett

On Fri, Oct 05 2018, Al Viro wrote:

> On Fri, Oct 05, 2018 at 11:27:37AM +1000, NeilBrown wrote:
>> 
>> The synchronize_rcu() in namespace_unlock() is called every time
>> a filesystem is unmounted.  If a great many filesystems are mounted,
>> this can cause a noticable slow-down in, for example, system shutdown.
>> 
>> The sequence:
>>   mkdir -p /tmp/Mtest/{0..5000}
>>   time for i in /tmp/Mtest/*; do mount -t tmpfs tmpfs $i ; done
>>   time umount /tmp/Mtest/*
>> 
>> on a 4-cpu VM can report 8 seconds to mount the tmpfs filesystems, and
>> 100 seconds to unmount them.
>> 
>> Boot the same VM with 1 CPU and it takes 18 seconds to mount the
>> tmpfs filesystems, but only 36 to unmount.
>> 
>> If we change the synchronize_rcu() to synchronize_rcu_expedited()
>> the umount time on a 4-cpu VM drop to 0.6 seconds
>> 
>> I think this 200-fold speed up is worth the slightly high system
>> impact of using synchronize_rcu_expedited().
>> 
>> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> (from general rcu perspective)
>> Signed-off-by: NeilBrown <neilb@suse.com>
>> ---
>> 
>> I posted this last October, then again last November (cc:ing Linus)
>> Paul is happy enough with it, but no other response.
>> I'm hoping it can get applied this time....
>
> Umm...  IIRC, the last one got sidetracked on the other thing in the series...
> <checks> that was s_anon stuff.  I can live with this one; FWIW, what kind
> of load would trigger the impact of the change?  Paul?

I think you would need a long sequence of umounts to notice anything.
What you would notice is substantially reduced wall-clock time, but
slightly increased CPU time.

The original bug report that led to this patch was a system with "HUGE
direct automount maps (>23k at this point)".
Stopping autofs (during shutdown) took more minutes than seemed
reasonable.

I noticed it again just recently when working on a systemd issue.  If
you mount thousands of filesystems in quick succession (ClearCase can do
this), systemd processes /proc/self/mountinfo constantly and slows down
the whole process.  When I unmount my test filesystems (mount --bind
/etc /MNT/$1) it takes a similar amount of time, but now it isn't
systemd slowing things down (which is odd actually, I wonder why systemd
didn't notice..) but rather the synchronize_rcu() delays.
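
A bind-mount test along those lines might look roughly like the sketch below. The generator name bind_mount_test_cmds, the default base directory and the mount count are all assumptions for illustration; it only prints commands, so nothing is mounted until the output is reviewed and fed to a root shell.

```shell
# Hypothetical generator: print the commands for N bind mounts of /etc
# under a base directory, followed by a timed unmount of all of them.
# It only prints; it does not mount anything itself.
bind_mount_test_cmds() {
    n="$1"
    base="${2:-/MNT}"
    i=0
    while [ "$i" -lt "$n" ]; do
        echo "mkdir -p $base/$i && mount --bind /etc $base/$i"
        i=$((i + 1))
    done
    echo "time umount $base/*"
}

# Usage (as root, after checking the output; "time" needs bash):
#   bind_mount_test_cmds 5000 | bash
```

Comparing the reported unmount time before and after the patch should show whether the synchronize_rcu() delays dominate, as they did in the tmpfs test.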

Thanks,
NeilBrown

* Re: [PATCH - resend] VFS: use synchronize_rcu_expedited() in namespace_unlock()
  2018-10-05  1:40           ` Al Viro
  2018-10-05  2:53             ` NeilBrown
@ 2018-10-05  4:08             ` Paul E. McKenney
  2018-11-29 23:33             ` [PATCH - resend*2] " NeilBrown
  2 siblings, 0 replies; 16+ messages in thread
From: Paul E. McKenney @ 2018-10-05  4:08 UTC (permalink / raw)
  To: Al Viro
  Cc: NeilBrown, Andrew Morton, Florian Weimer, linux-fsdevel,
	linux-kernel, Josh Triplett

On Fri, Oct 05, 2018 at 02:40:02AM +0100, Al Viro wrote:
> On Fri, Oct 05, 2018 at 11:27:37AM +1000, NeilBrown wrote:
> > 
> > The synchronize_rcu() in namespace_unlock() is called every time
> > a filesystem is unmounted.  If a great many filesystems are mounted,
> > this can cause a noticable slow-down in, for example, system shutdown.
> > 
> > The sequence:
> >   mkdir -p /tmp/Mtest/{0..5000}
> >   time for i in /tmp/Mtest/*; do mount -t tmpfs tmpfs $i ; done
> >   time umount /tmp/Mtest/*
> > 
> > on a 4-cpu VM can report 8 seconds to mount the tmpfs filesystems, and
> > 100 seconds to unmount them.
> > 
> > Boot the same VM with 1 CPU and it takes 18 seconds to mount the
> > tmpfs filesystems, but only 36 to unmount.
> > 
> > If we change the synchronize_rcu() to synchronize_rcu_expedited()
> > the umount time on a 4-cpu VM drop to 0.6 seconds
> > 
> > I think this 200-fold speed up is worth the slightly high system
> > impact of using synchronize_rcu_expedited().
> > 
> > Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> (from general rcu perspective)
> > Signed-off-by: NeilBrown <neilb@suse.com>
> > ---
> > 
> > I posted this last October, then again last November (cc:ing Linus)
> > Paul is happy enough with it, but no other response.
> > I'm hoping it can get applied this time....
> 
> Umm...  IIRC, the last one got sidetracked on the other thing in the series...
> <checks> that was s_anon stuff.  I can live with this one; FWIW, what kind
> of load would trigger the impact of the change?  Paul?

You lost me with "what kind of load would trigger the impact of the
change?", but if you are asking about the downside, that would be IPIs
sent from each call to synchronize_rcu_expedited().  But people with
things like real-time workloads that therefore don't like those IPIs
have a number of options:

1.	Boot with rcupdate.rcu_normal=1, which converts all calls to
	synchronize_rcu_expedited() to synchronize_rcu().  This of
	course loses the performance gain, but this can be a good
	tradeoff for real-time workloads.

2.	Build with CONFIG_NO_HZ_FULL=y and boot with nohz_full= to
	cover the CPUs running your real-time workload.  Then
	as long as there is only one runnable usermode task per
	nohz_full CPU, synchronize_rcu_expedited() will avoid sending
	IPIs to any of the nohz_full CPUs.

3.	Don't do unmounts while your real-time application is running.

Probably other options as well, but those are the ones that come
immediately to mind.
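
For option 2, it can help to confirm that the CPU running the real-time task is actually covered by the nohz_full= list. The sketch below is hypothetical: the helper name cpu_in_list is made up, it handles only the simple "a-b,c" form of CPU lists, and the sysfs path in the usage comment is an assumption about where the running kernel reports its nohz_full CPUs.

```shell
# Hypothetical helper: succeed if CPU number $1 appears in CPU list $2,
# where the list uses the simple "a-b,c" range syntax (no strides).
cpu_in_list() {
    cpu="$1"
    old_ifs="$IFS"
    IFS=','
    set -- $2            # split the list on commas into $1, $2, ...
    IFS="$old_ifs"
    for range in "$@"; do
        lo="${range%-*}" # "2-3" -> 2, "6" -> 6
        hi="${range#*-}" # "2-3" -> 3, "6" -> 6
        if [ "$cpu" -ge "$lo" ] && [ "$cpu" -le "$hi" ]; then
            return 0
        fi
    done
    return 1
}

# Usage (path is an assumption, not run here):
#   cpu_in_list 3 "$(cat /sys/devices/system/cpu/nohz_full)" && echo covered
```

A CPU that is not in the list can still receive IPIs from synchronize_rcu_expedited(), so options 1 or 3 would be the fallback there.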

If I missed the point of your question, please help me understand
what you are asking for.

							Thanx, Paul


* Re: [PATCH - resend] VFS: use synchronize_rcu_expedited() in namespace_unlock()
  2018-10-05  1:27         ` [PATCH - resend] " NeilBrown
  2018-10-05  1:40           ` Al Viro
@ 2018-11-06  3:15           ` NeilBrown
  1 sibling, 0 replies; 16+ messages in thread
From: NeilBrown @ 2018-11-06  3:15 UTC (permalink / raw)
  To: Alexander Viro, Andrew Morton
  Cc: paulmck, Florian Weimer, linux-fsdevel, linux-kernel, Josh Triplett


On Fri, Oct 05 2018, NeilBrown wrote:

> The synchronize_rcu() in namespace_unlock() is called every time
> a filesystem is unmounted.  If a great many filesystems are mounted,
> this can cause a noticeable slow-down in, for example, system shutdown.
>
> The sequence:
>   mkdir -p /tmp/Mtest/{0..5000}
>   time for i in /tmp/Mtest/*; do mount -t tmpfs tmpfs $i ; done
>   time umount /tmp/Mtest/*
>
> on a 4-cpu VM can report 8 seconds to mount the tmpfs filesystems, and
> 100 seconds to unmount them.
>
> Boot the same VM with 1 CPU and it takes 18 seconds to mount the
> tmpfs filesystems, but only 36 to unmount.
>
> If we change the synchronize_rcu() to synchronize_rcu_expedited()
> the umount time on a 4-cpu VM drops to 0.6 seconds.
>
> I think this 200-fold speed up is worth the slightly higher system
> impact of using synchronize_rcu_expedited().
>
> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> (from general rcu perspective)
> Signed-off-by: NeilBrown <neilb@suse.com>
> ---
>
> I posted this last October, then again last November (cc:ing Linus)
> Paul is happy enough with it, but no other response.
> I'm hoping it can get applied this time....

Hi Al,
 this isn't in 4.20-rc1.  Are you still waiting for something?

Thanks,
NeilBrown


>
> Thanks,
> NeilBrown
>
>
>  fs/namespace.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/fs/namespace.c b/fs/namespace.c
> index 99186556f8d3..02e978b22294 100644
> --- a/fs/namespace.c
> +++ b/fs/namespace.c
> @@ -1360,7 +1360,7 @@ static void namespace_unlock(void)
>  	if (likely(hlist_empty(&head)))
>  		return;
>  
> -	synchronize_rcu();
> +	synchronize_rcu_expedited();
>  
>  	group_pin_kill(&head);
>  }
> -- 
> 2.14.0.rc0.dirty



* Re: [PATCH - resend*2] VFS: use synchronize_rcu_expedited() in namespace_unlock()
  2018-10-05  1:40           ` Al Viro
  2018-10-05  2:53             ` NeilBrown
  2018-10-05  4:08             ` Paul E. McKenney
@ 2018-11-29 23:33             ` NeilBrown
  2018-11-29 23:52               ` Al Viro
  2 siblings, 1 reply; 16+ messages in thread
From: NeilBrown @ 2018-11-29 23:33 UTC (permalink / raw)
  To: Al Viro, Andrew Morton, Linus Torvalds
  Cc: paulmck, Florian Weimer, linux-fsdevel, linux-kernel, Josh Triplett



The synchronize_rcu() in namespace_unlock() is called every time
a filesystem is unmounted.  If a great many filesystems are mounted,
this can cause a noticeable slow-down in, for example, system shutdown.

The sequence:
  mkdir -p /tmp/Mtest/{0..5000}
  time for i in /tmp/Mtest/*; do mount -t tmpfs tmpfs $i ; done
  time umount /tmp/Mtest/*

on a 4-cpu VM can report 8 seconds to mount the tmpfs filesystems, and
100 seconds to unmount them.

Boot the same VM with 1 CPU and it takes 18 seconds to mount the
tmpfs filesystems, but only 36 to unmount.

If we change the synchronize_rcu() to synchronize_rcu_expedited()
the umount time on a 4-cpu VM drops to 0.6 seconds.

I think this 200-fold speed up is worth the slightly higher system
impact of using synchronize_rcu_expedited().

Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> (from general rcu perspective)
Signed-off-by: NeilBrown <neilb@suse.com>
---

Al Viro says "I can live with this one" but this still hasn't landed.
Maybe someone else could apply it?

Thanks,
NeilBrown

Full quote from Al on 5th Oct:
> Umm...  IIRC, the last one got sidetracked on the other thing in the series...
> <checks> that was s_anon stuff.  I can live with this one; FWIW, what kind
> of load would trigger the impact of the change?  Paul?
which Paul replied to.
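
For reference, the arithmetic behind the timings quoted above can be
sketched as follows. The helper names speedup and per_umount_ms are
hypothetical; the inputs are the figures from the description (100 and
0.6 seconds, and the 5001 mount points created by {0..5000}).

```shell
#!/bin/sh
# speedup BEFORE AFTER: ratio of two wall-clock times, one decimal place.
speedup() {
    awk -v b="$1" -v a="$2" 'BEGIN { printf "%.1f\n", b / a }'
}

# per_umount_ms TOTAL_SECONDS MOUNTS: average cost of one unmount, in ms.
per_umount_ms() {
    awk -v t="$1" -v n="$2" 'BEGIN { printf "%.2f\n", t * 1000 / n }'
}
```

With these, "speedup 100 0.6" prints 166.7, so the measured gain is a
little under the rounded 200-fold figure. "per_umount_ms 100 5001" prints
20.00 and "per_umount_ms 0.6 5001" prints 0.12, which suggests that each
unmount was spending about 20 ms waiting for a normal grace period.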

 fs/namespace.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index a7f91265ea67..43a0d2c7449d 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -1360,7 +1360,7 @@ static void namespace_unlock(void)
 	if (likely(hlist_empty(&head)))
 		return;
 
-	synchronize_rcu();
+	synchronize_rcu_expedited();
 
 	group_pin_kill(&head);
 }
-- 
2.14.0.rc0.dirty



* Re: [PATCH - resend*2] VFS: use synchronize_rcu_expedited() in namespace_unlock()
  2018-11-29 23:33             ` [PATCH - resend*2] " NeilBrown
@ 2018-11-29 23:52               ` Al Viro
  2018-11-30  1:09                 ` NeilBrown
  0 siblings, 1 reply; 16+ messages in thread
From: Al Viro @ 2018-11-29 23:52 UTC (permalink / raw)
  To: NeilBrown
  Cc: Andrew Morton, Linus Torvalds, paulmck, Florian Weimer,
	linux-fsdevel, linux-kernel, Josh Triplett

On Fri, Nov 30, 2018 at 10:33:18AM +1100, NeilBrown wrote:
> 
> The synchronize_rcu() in namespace_unlock() is called every time
> a filesystem is unmounted.  If a great many filesystems are mounted,
> this can cause a noticeable slow-down in, for example, system shutdown.
> 
> The sequence:
>   mkdir -p /tmp/Mtest/{0..5000}
>   time for i in /tmp/Mtest/*; do mount -t tmpfs tmpfs $i ; done
>   time umount /tmp/Mtest/*
> 
> on a 4-cpu VM can report 8 seconds to mount the tmpfs filesystems, and
> 100 seconds to unmount them.
> 
> Boot the same VM with 1 CPU and it takes 18 seconds to mount the
> tmpfs filesystems, but only 36 to unmount.
> 
> If we change the synchronize_rcu() to synchronize_rcu_expedited()
> the umount time on a 4-cpu VM drops to 0.6 seconds.
> 
> I think this 200-fold speed up is worth the slightly higher system
> impact of using synchronize_rcu_expedited().
> 
> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> (from general rcu perspective)
> Signed-off-by: NeilBrown <neilb@suse.com>
> ---
> 
> Al Viro says "I can live with this one" but this still hasn't landed.
> Maybe someone else could apply it?

Applied (in work.misc, once I push it out)


* Re: [PATCH - resend*2] VFS: use synchronize_rcu_expedited() in namespace_unlock()
  2018-11-29 23:52               ` Al Viro
@ 2018-11-30  1:09                 ` NeilBrown
  0 siblings, 0 replies; 16+ messages in thread
From: NeilBrown @ 2018-11-30  1:09 UTC (permalink / raw)
  To: Al Viro
  Cc: Andrew Morton, Linus Torvalds, paulmck, Florian Weimer,
	linux-fsdevel, linux-kernel, Josh Triplett


On Thu, Nov 29 2018, Al Viro wrote:

> On Fri, Nov 30, 2018 at 10:33:18AM +1100, NeilBrown wrote:
>> 
>> The synchronize_rcu() in namespace_unlock() is called every time
>> a filesystem is unmounted.  If a great many filesystems are mounted,
>> this can cause a noticeable slow-down in, for example, system shutdown.
>> 
>> The sequence:
>>   mkdir -p /tmp/Mtest/{0..5000}
>>   time for i in /tmp/Mtest/*; do mount -t tmpfs tmpfs $i ; done
>>   time umount /tmp/Mtest/*
>> 
>> on a 4-cpu VM can report 8 seconds to mount the tmpfs filesystems, and
>> 100 seconds to unmount them.
>> 
>> Boot the same VM with 1 CPU and it takes 18 seconds to mount the
>> tmpfs filesystems, but only 36 to unmount.
>> 
>> If we change the synchronize_rcu() to synchronize_rcu_expedited()
>> the umount time on a 4-cpu VM drops to 0.6 seconds.
>> 
>> I think this 200-fold speed up is worth the slightly higher system
>> impact of using synchronize_rcu_expedited().
>> 
>> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> (from general rcu perspective)
>> Signed-off-by: NeilBrown <neilb@suse.com>
>> ---
>> 
>> Al Viro says "I can live with this one" but this still hasn't landed.
>> Maybe someone else could apply it?
>
> Applied (in work.misc, once I push it out)

Excellent - thanks a lot :-)

NeilBrown



end of thread, other threads:[~2018-11-30  1:09 UTC | newest]

Thread overview: 16+ messages
2017-10-26  2:26 [PATCH] VFS: use synchronize_rcu_expedited() in namespace_unlock() NeilBrown
2017-10-26 12:27 ` Paul E. McKenney
2017-10-26 13:50   ` Paul E. McKenney
2017-10-27  0:45   ` NeilBrown
2017-10-27  1:24     ` Paul E. McKenney
2017-11-27 11:27   ` Florian Weimer
2017-11-27 14:41     ` Paul E. McKenney
2017-11-28 22:17       ` NeilBrown
2018-10-05  1:27         ` [PATCH - resend] " NeilBrown
2018-10-05  1:40           ` Al Viro
2018-10-05  2:53             ` NeilBrown
2018-10-05  4:08             ` Paul E. McKenney
2018-11-29 23:33             ` [PATCH - resend*2] " NeilBrown
2018-11-29 23:52               ` Al Viro
2018-11-30  1:09                 ` NeilBrown
2018-11-06  3:15           ` [PATCH - resend] " NeilBrown
