From: Hanna Reitz <hreitz@redhat.com>
To: Vivek Goyal <vgoyal@redhat.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>,
	qemu-devel@nongnu.org,
	"Dr . David Alan Gilbert" <dgilbert@redhat.com>,
	virtio-fs@redhat.com, Ioannis Angelakopoulos <jaggel@bu.edu>,
	Max Reitz <mreitz@redhat.com>
Subject: Re: [PATCH v3 09/10] virtiofsd: Optionally fill lo_inode.fhandle
Date: Wed, 11 Aug 2021 08:41:18 +0200
Message-ID: <6e943ee0-dcb3-6812-3a0b-eb2b72b503ad@redhat.com>
In-Reply-To: <YRKh/fbBntF+GfS8@redhat.com>

On 10.08.21 17:57, Vivek Goyal wrote:
> On Tue, Aug 10, 2021 at 05:26:15PM +0200, Hanna Reitz wrote:
>> On 10.08.21 17:23, Vivek Goyal wrote:
>>> On Tue, Aug 10, 2021 at 10:32:55AM +0200, Hanna Reitz wrote:
>>>> On 09.08.21 20:41, Vivek Goyal wrote:
>>>>> On Fri, Jul 30, 2021 at 05:01:33PM +0200, Max Reitz wrote:
>>>>>> When the inode_file_handles option is set, try to generate a file handle
>>>>>> for new inodes instead of opening an O_PATH FD.
>>>>>>
>>>>>> Being able to open these again will require CAP_DAC_READ_SEARCH, so the
>>>>>> description text tells the user they will also need to specify
>>>>>> -o modcaps=+dac_read_search.
>>>>>>
>>>>>> Generating a file handle returns the mount ID it is valid for.  Opening
>>>>>> it will require an FD instead.  We have mount_fds to map an ID to an FD.
>>>>>> get_file_handle() fills the hash map by opening the file we have
>>>>>> generated a handle for.  To verify that the resulting FD indeed
>>>>>> represents the handle's mount ID, we use statx().  Therefore, using file
>>>>>> handles requires statx() support.
>>>>> So opening the file and storing that fd in mount_fds table might be
>>>>> a potential problem with inotify work Ioannis is doing.
>>>>>
>>>>> So say a file foo.txt was opened O_RDONLY and the fd stored in mount_fds. Now
>>>>> say the user unlinks foo.txt. If notifications are enabled, the final
>>>>> notification will not be generated till this mount_fds fd is closed.
>>>>>
>>>>> Now the question is when will this fd be closed? If it is closed at some
>>>>> later point and only then the notification is generated, that will break
>>>>> notifications.
>>>> Currently, it is never closed.
>>>>
>>>>> In fact even the O_PATH fd is delaying notifications for the same reason.
>>>>> But it's not too bad as we close the O_PATH fd pretty quickly after
>>>>> unlinking. And we were hoping that file handle support will get rid
>>>>> of this problem because we will not keep O_PATH fd open.
>>>>>
>>>>> But, IIUC, mount_fds stuff will make it even worse. I did not see
>>>>> the code which removes this fd from mount_fds. So I am not sure what's
>>>>> the life time of this fd.
>>>> The lifetime is forever.  If we wanted to remove it at some point, we’d need
>>>> to track how many file handles we have open for the given mount fd and then
>>>> remove it from the table once the count reaches 0, so it would still be
>>>> delayed.
>>>>
>>>> I think in practice the first thing that is looked up from some mount will
>>>> probably be the root directory, which cannot be deleted before everything
>>>> else on the mount is gone, so that would work.  We would track how many
>>>> handles there are; if the whole mount were to be deleted, I hope all
>>>> lo_inodes would be evicted, the count would go to 0, and we could drop the
>>>> mount fd.
>>> Keeping a reference count on mount_fd object makes sense. So we probably
>>> maintain this hash table and lookup using mount_id (as you are already
>>> doing). All subsequent inodes from same filesystem will use same
>>> object. Once all inodes have been flushed out, then mount_fd object
>>> should go away as well (allowing for unmount on host).
>>>
>>>> I think we can make the assumption that the mount fd is the root directory
>>>> certain by, well, looking into mountinfo...  That would result in us always
>>>> opening the root node of the filesystem, so that first the whole filesystem
>>>> needs to disappear before it can be deleted (and our mount fd closed) –
>>>> which should work, I guess?
>>> This seems more reasonable. And I think that's what man page seems to
>>> suggest.
>>>
>>>          The  mount_id  argument  returns an identifier for the filesystem mount
>>>          that corresponds to pathname.  This corresponds to the first  field  in
>>>          one  of  the  records in /proc/self/mountinfo.  Opening the pathname in
>>>          the fifth field of that record yields a file descriptor for  the  mount
>>>          point;  that  file  descriptor  can  be  used  in  a subsequent call to
>>>          open_by_handle_at().
>>>
>>> The fifth field seems to be the mount point. man proc says:
>>>
>>>                 (5)  mount  point:  the  pathname of the mount point relative to
>>>                      the process's root directory.
>>>
>>> So opening the mount point and saving it as mount_fd (if it is not already
>>> in the hash table) and then taking a per-inode reference count on the
>>> mount_fd object looks like it will solve the lifetime issue of mount_fd as
>>> well as the issue of temporary failures arising because we can't
>>> open a device special file.
>> Well, we’ve had this discussion before, and it’s possible that a filesystem
>> has a device file as its mount point.
> Yes. I think you modified fuse to do some special trickery. Not sure
> where that should be fixed.

I used fuse, but I’m sure a non-fuse filesystem can do the same.  (I 
mean, fuse effectively is a non-fuse filesystem, too.)

I don’t think it needs to be fixed; it just means we need to continue to
stat the mount point to verify it’s a regular file or directory.

> If filesystem is faking, then it can fake a device node as regular
> file and fool us into opening it as well?

Well, of course opening any file can have side effects, on any filesystem.

>> But given the inotify complications, there’s really a good reason we should
>> use mountinfo.
>>
>>>> It’s a bit tricky because our sandboxing prevents easy access to mountinfo,
>>>> but if that’s the only way...
>>> yes. We already have lo->proc_self_fd. Maybe we need to keep
>>> /proc/self/mountinfo open in lo->proc_self_mountinfo. I am assuming
>>> that any mount table changes will still be visible despite the fact
>>> I have fd open (and don't have to open new fd to notice new mount/unmount
>>> changes).
>> Well, yes, that was my idea.  Unfortunately, I wasn’t quite successful yet;
>> when I tried keeping the fd open, reading from it would just return 0
>> bytes.  Perhaps that’s because we bind-mount /proc/self/fd to /proc so that
>> nothing else in /proc is visible. Perhaps we need to bind-mount
>> /proc/self/mountinfo into /proc/self/fd before that...
> Or perhaps open /proc/self/mountinfo and save fd in lo->proc_mountinfo
> before /proc/self/fd is bind mounted on /proc?

Yes, I tried that, and then reading would just return 0 bytes.
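
For reference, the mount ID to mount point lookup that mountinfo would
give us can be sketched as below.  This is only an illustration:
parse_mountinfo_line() and MOUNT_POINT_MAX are hypothetical names, not
part of the patch, and the sketch does nothing about the sandboxing
problem of reads returning 0 bytes.  It extracts the first field (mount
ID) and the fifth field (mount point) from a single record.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define MOUNT_POINT_MAX 4096 /* hypothetical buffer size for this sketch */

/*
 * Hypothetical helper: parse one /proc/self/mountinfo record.
 * Fields: (1) mount ID, (2) parent ID, (3) major:minor, (4) root,
 * (5) mount point.  `mount_point` must have room for MOUNT_POINT_MAX
 * bytes.  Octal escapes such as \040 are left undecoded.
 */
static bool parse_mountinfo_line(const char *line, int *mount_id,
                                 char *mount_point)
{
    assert(line != NULL && mount_id != NULL && mount_point != NULL);

    /* Fields 2-4 are skipped with assignment suppression (%*...). */
    return sscanf(line, "%d %*d %*s %*s %4095s",
                  mount_id, mount_point) == 2;
}
```

A caller would read mountinfo line by line, pick the record whose mount
ID matches the one returned by name_to_handle_at(), and open the mount
point from that record (after stat()ing it, as discussed above).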

Hanna



