From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-fsdevel-owner@vger.kernel.org>
Received: from mx1.redhat.com ([209.132.183.28]:57554 "EHLO mx1.redhat.com"
        rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
        id S1751375AbdK0L1I (ORCPT <rfc822;linux-fsdevel@vger.kernel.org>);
        Mon, 27 Nov 2017 06:27:08 -0500
Subject: Re: [PATCH] VFS: use synchronize_rcu_expedited() in
 namespace_unlock()
To: paulmck@linux.vnet.ibm.com, NeilBrown <neilb@suse.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>,
        linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
        Josh Triplett <josh@joshtriplett.org>
References: <87y3nyd4pu.fsf@notabene.neil.brown.name>
 <20171026122743.GX3659@linux.vnet.ibm.com>
From: Florian Weimer <fweimer@redhat.com>
Message-ID: <b8a1a898-850c-cc7a-2574-1bfd15cc9888@redhat.com>
Date: Mon, 27 Nov 2017 12:27:04 +0100
MIME-Version: 1.0
In-Reply-To: <20171026122743.GX3659@linux.vnet.ibm.com>
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Language: en-US
Content-Transfer-Encoding: 7bit
Sender: linux-fsdevel-owner@vger.kernel.org
List-ID: <linux-fsdevel.vger.kernel.org>

On 10/26/2017 02:27 PM, Paul E. McKenney wrote:
> But just for completeness, one way to make this work across the board
> might be to instead use call_rcu(), with the callback function kicking
> off a workqueue handler to do the rest of the unmount.  Of course,
> in saying that, I am ignoring any mutexes that you might be holding
> across this whole thing, and also ignoring any problems that might arise
> when returning to userspace with some portion of the unmount operation
> still pending.  (For example, someone unmounting a filesystem and then
> immediately remounting that same filesystem.)

You really need to complete all side effects of deallocating a resource 
before returning to user space.  Otherwise, it will never be possible to 
allocate and deallocate resources in a tight loop because you either get 
spurious failures because too many unaccounted deallocations are stuck 
somewhere in the system (and the user can't tell that this is due to a 
race), or you get an OOM because the user manages to queue up too much 
state.

We already have this problem with RLIMIT_NPROC, where waitpid etc. 
return before the process is completely gone.  On some 
kernels/configurations, the resulting race is so wide that parallel make 
no longer works reliably because it runs into fork failures.
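
To make the pattern concrete, here is a minimal sketch of the kind of 
tight loop I mean.  It is only an illustration: the helper name 
fork_wait_loop() is made up, and I am assuming that EAGAIN from fork() 
is the symptom of interest, although EAGAIN can also come from other 
limits (for example a pids cgroup or the system-wide thread limit).

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: fork and reap one child at a time, `iterations`
 * times.  From the caller's point of view at most one child exists at
 * any moment, so an EAGAIN failure from fork() hints that children
 * which were already reaped may still be counted somewhere.
 *
 * Returns the number of fork() calls that failed with EAGAIN, or -1 on
 * any other error.
 */
static int fork_wait_loop(int iterations)
{
        int eagain_failures = 0;

        for (int i = 0; i < iterations; i++) {
                pid_t pid = fork();

                if (pid == 0)
                        _exit(0);       /* child: exit immediately */

                if (pid < 0) {
                        if (errno != EAGAIN)
                                return -1;
                        eagain_failures++;
                        continue;
                }

                /* parent: reap the child before forking again */
                while (waitpid(pid, NULL, 0) < 0) {
                        if (errno != EINTR)
                                return -1;
                }
        }

        return eagain_failures;
}
```

Calling fork_wait_loop() with a large iteration count under a low 
"ulimit -u" would be one way to look for the race.  A nonzero return 
value is only a hint, not proof, because other processes of the same 
user also count against the limit.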

Thanks,
Florian