On Mon, Nov 27 2017, Paul E. McKenney wrote: > On Mon, Nov 27, 2017 at 12:27:04PM +0100, Florian Weimer wrote: >> On 10/26/2017 02:27 PM, Paul E. McKenney wrote: >> >But just for completeness, one way to make this work across the board >> >might be to instead use call_rcu(), with the callback function kicking >> >off a workqueue handler to do the rest of the unmount. Of course, >> >in saying that, I am ignoring any mutexes that you might be holding >> >across this whole thing, and also ignoring any problems that might arise >> >when returning to userspace with some portion of the unmount operation >> >still pending. (For example, someone unmounting a filesystem and then >> >immediately remounting that same filesystem.) >> >> You really need to complete all side effects of deallocating a >> resource before returning to user space. Otherwise, it will never >> be possible to allocate and deallocate resources in a tight loop >> because you either get spurious failures because too many >> unaccounted deallocations are stuck somewhere in the system (and the >> user can't tell that this is due to a race), or you get an OOM >> because the user manages to queue up too much state. >> >> We already have this problem with RLIMIT_NPROC, where waitpid etc. >> return before the process is completely gone. On some >> kernels/configurations, the resulting race is so wide that parallel >> make no longer works reliable because it runs into fork failures. > > Or alternatively, use rcu_barrier() occasionally to wait for all > preceding deferred deallocations. And there are quite a few other > ways to take on this problem. So, supposing we could package up everything that has to happen after the current synchronize_rcu() and put it in an call_rcu() call back, then instead of calling synchronize_rcu_expedited() at the end of namespace_unlock(), we could possibly call call_rcu() there and rcu_barrier() at the start of namespace_lock()..... That would mean a single unmount would have low impact, but it would still slow down a sequence of 1000 consecutive unmounts. Maybe we would only need the rcu_barrier() before select namespace_lock() calls. I would need to study the code closely to form an opinion. Interesting idea though. Hopefully the _expedited() patch will be accepted - I haven't had a "nak" yet... thanks, NeilBrown