From mboxrd@z Thu Jan 1 00:00:00 1970
MIME-Version: 1.0
References: <20210104160013.GG2972@work-vm> <20210104184527.GC63879@redhat.com> <20210104185655.GN2972@work-vm>
From: Amir Goldstein
Date: Wed, 6 Jan 2021 11:27:20 +0200
Content-Type: text/plain; charset="UTF-8"
Subject: Re: [Virtio-fs] [fuse-devel] 'FORGET' ordering semantics (vs unlink & NFS)
List-Id: Development discussions about virtio-fs
To: Miklos Szeredi
Cc: fuse-devel, Max Reitz, virtio-fs-list, Vivek Goyal

On Wed, Jan 6, 2021 at 11:16 AM Amir Goldstein wrote:
>
> On Wed, Jan 6, 2021 at 10:02 AM Miklos Szeredi wrote:
> >
> > On Wed, Jan 6, 2021 at 5:29 AM Amir Goldstein wrote:
> > >
> > > On Mon, Jan 4, 2021 at 8:57 PM Dr. David Alan Gilbert
> > > wrote:
> > > >
> > > > * Vivek Goyal (vgoyal@redhat.com) wrote:
> > > > > On Mon, Jan 04, 2021 at 04:00:13PM +0000, Dr. David Alan Gilbert wrote:
> > > > > > Hi,
> > > > > >   On virtio-fs we're hitting a problem with NFS, where
> > > > > > unlinking a file in a directory and then rmdir'ing that
> > > > > > directory fails complaining about the directory not being empty.
> > > > > >
> > > > > > The problem here is that if a file has an open fd, NFS doesn't
> > > > > > actually delete the file on unlink, it just renames it to
> > > > > > a hidden file (e.g. .nfs*******). That open file is there because
> > > > > > the 'FORGET' hasn't completed yet by the time the rmdir is issued.
> > > > > >
> > > > > > Question:
> > > > > >   a) In the FUSE protocol, are requests assumed to complete in order;
> > > > > > i.e. unlink, forget, rmdir is it required that 'forget' completes
> > > > > > before the rmdir is processed?
> > > > > >      (In virtiofs we've been processing requests, in parallel, and
> > > > > > have sent forgets down a separate queue to keep them out of the way).
> > > > > >
> > > > > >   b) 'forget' doesn't send a reply - so the kernel can't wait for the
> > > > > > client to have finished it; do we need a synchronous forget here?
> > > > >
> > > > > Even if we introduce a synchronous forget, will that really fix the
> > > > > issue. For example, this could also happen if file has been unlinked
> > > > > but it is still open and directory is being removed.
> > > > >
> > > > > fd = open(foo/bar.txt)
> > > > > unlink foo/bar.txt
> > > > > rmdir foo
> > > > > close(fd).
> > > > >
> > > > > In this case, final forget should go after fd has been closed. It's
> > > > > not a forget race.
> > > > >
> > > > > I wrote a test case for this and it works on regular file systems.
> > > > >
> > > > > https://github.com/rhvgoyal/misc/blob/master/virtiofs-tests/rmdir.c
> > > > >
> > > > > I suspect it will fail on nfs because I am assuming that temporary
> > > > > file will be there till final close(fd) happens. If that's the
> > > > > case this is a NFS specific issue because its behavior is different
> > > > > from other file systems.
> > > >
> > > > That's true; but that's NFS just being NFS; in our case we're keeping
> > > > an fd open even though the guest has been smart enough not to; so we're
> > > > causing the NFS oddity when it wouldn't normally happen.
> > > >
> > >
> > > Are you sure that you really need this oddity?
> > >
> > > My sense from looking at virtiofsd is that the open O_PATH fd
> > > in InodeData for non-directories are an overkill and even the need
> > > for open fd for all directories is questionable.
> > >
> > > If you store a FileHandle (name_to_handle_at(2)) instead of an open fd
> > > for non-directories, you won't be keeping a reference on the underlying inode
> > > so no unlink issue.
> > >
> > > open_by_handle_at(2) is very cheap for non-directory when underlying inode
> > > is cached and as cheap as it can get even when inode is not in cache, so no
> > > performance penalty is expected.
> >
> > You are perfectly right that using file handles would solve a number
> > of issues, one being too many open file descriptors.
> >
> > The issue with open_by_handle_at(2) is that it needs
> > CAP_DAC_READ_SEARCH in the initial user namespace. That currently
> > makes it impossible to use in containers and such.
>
> Is that a problem for virtiofsd? does it also run inside a container??
>
> Please note that NFS doesn't do "silly rename" for directories,
> so mitigation is mostly needed for non-dir.
>
> An alternative method if daemon is not capable, is to store parent dirfd
> in addition to filehandle and implement open_child_by_handle_at(int
> parent_fd, ...):
> - readdir(parent_fd)
> - search a match for d_ino
> - name_to_handle_at() and verify match to stored filehandle

That would have to be name_to_handle_at() with AT_EMPTY_PATH, after
opening the file by name.

>
> This is essentially what open_by_handle_at(2) does under the covers
> with a "connectable" non-dir filehandle after having resolved the
> parent file handle part.

I meant this is what's done under the covers for the non-cached inode
case of "connectable" non-dir filehandles. Of course we cannot do that
on every lookup.

What I forgot to say is that with this method we can close file fds
if we have too many open files, or on unlink(), and open the file again
with this method if needed.

Thanks,
Amir.
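For illustration, the open_child_by_handle_at() steps listed above (readdir the parent, match d_ino, open by name, then name_to_handle_at() with AT_EMPTY_PATH and compare against the stored handle) could be sketched roughly as follows. This is only a sketch under assumptions, not virtiofsd code: the helper names handle_of_fd() and same_handle(), the exact signature of open_child_by_handle_at(), and the choice to return an O_PATH fd are all hypothetical. Only name_to_handle_at(2), openat(2), fdopendir(3) and readdir(3) are real interfaces. No CAP_DAC_READ_SEARCH is needed, because open_by_handle_at(2) is never called.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <dirent.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: encode a file handle for an already-open fd
 * (an O_PATH fd is sufficient). Returns a malloc'ed handle, or NULL if
 * the filesystem cannot encode handles or allocation fails. */
static struct file_handle *handle_of_fd(int fd)
{
    int mount_id; /* ignored in this sketch; a daemon would likely store it */
    struct file_handle *fh = malloc(sizeof(*fh) + MAX_HANDLE_SZ);

    if (fh == NULL)
        return NULL;
    fh->handle_bytes = MAX_HANDLE_SZ;
    if (name_to_handle_at(fd, "", fh, &mount_id, AT_EMPTY_PATH) < 0) {
        free(fh);
        return NULL;
    }
    return fh;
}

/* Hypothetical helper: two handles match if type, length and bytes match. */
static int same_handle(const struct file_handle *a,
                       const struct file_handle *b)
{
    return a->handle_type == b->handle_type &&
           a->handle_bytes == b->handle_bytes &&
           memcmp(a->f_handle, b->f_handle, a->handle_bytes) == 0;
}

/* Sketch of the method from the mail: find the child of parent_fd whose
 * d_ino equals want_ino and whose file handle equals the stored one.
 * Returns a new O_PATH fd on success, -1 if no child matches. */
static int open_child_by_handle_at(int parent_fd, ino_t want_ino,
                                   const struct file_handle *want)
{
    int result = -1;
    int scan_fd;
    DIR *dir;
    struct dirent *de;

    assert(want != NULL);

    /* Scan through a private fd so the caller's parent_fd keeps its
     * directory offset and is not closed by closedir(). */
    scan_fd = openat(parent_fd, ".", O_RDONLY | O_DIRECTORY | O_CLOEXEC);
    if (scan_fd < 0)
        return -1;
    dir = fdopendir(scan_fd);
    if (dir == NULL) {
        close(scan_fd);
        return -1;
    }

    while (result < 0 && (de = readdir(dir)) != NULL) {
        struct file_handle *fh;
        int fd;

        if (de->d_ino != want_ino)
            continue; /* cheap filter before opening anything */

        fd = openat(dirfd(dir), de->d_name,
                    O_PATH | O_NOFOLLOW | O_CLOEXEC);
        if (fd < 0)
            continue;

        /* Verify on the opened fd, so a rename between readdir() and
         * openat() cannot make us accept the wrong file. */
        fh = handle_of_fd(fd);
        if (fh != NULL && same_handle(fh, want))
            result = fd;
        else
            close(fd);
        free(fh);
    }

    closedir(dir); /* also closes scan_fd */
    return result;
}
```

The directory scan is linear in the number of entries, which is why this would only make sense as a fallback for reopening a file whose fd was closed (too many open files, or after unlink()), not as something done on every lookup.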