linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* Userfaultfd doesn't seem to break out of poll on fd close
@ 2020-04-12 20:10 Brian Geffon
  2020-04-14 21:45 ` Peter Xu
  0 siblings, 1 reply; 11+ messages in thread
From: Brian Geffon @ 2020-04-12 20:10 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: linux-mm, LKML, Sonny Rao, Peter Xu

Hi,
It seems that userfaultfd isn't woken from a poll when the file
descriptor is closed. It seems that it should be from the code in
userfault_ctx_release, but it appears that's not actually called
immediately. I have a simple standalone example that shows this
behavior. It's straight forward: one thread creates a userfaultfd and
then closes it after a second thread has entered a poll syscall, some
abbreviated strace output is below showing this and the code can be
seen here: https://gist.github.com/bgaff/9a8fbbe8af79c0e18502430d416df77e

Given that it's probably very common to have a dedicated thread remain
blocked indefinitely in a poll(2) waiting for faults there must be a
way to break it out early when it's closed. Am I missing something?

Thanks,
Brian

// Open a userfaultfd from Thread 1.
12:55:27.611942 userfaultfd(O_NONBLOCK) = 3
12:55:27.612007 ioctl(3, UFFDIO_API, {api=0xaa, features=0 =>
features=UFFD_FEATURE_EVENT_FORK|UFFD_FEATURE_EVENT_REMAP|UFFD_FEATURE_EVENT_REMOVE|UFFD_FEATURE_MISSING_HUGETLBFS|UFFD_FEATURE_MISSING_SHMEM|UFFD_FEATURE_EVENT_UNMAP|UFFD_FEATURE_SIGBUS|UFFD_FEATURE_THREAD_ID,
ioctls=1<<_UFFDIO_REGISTER|1<<_UFFDIO_UNREGISTER|1<<_UFFDIO_API}) = 0

// Create a second thread (Thread 2) to poll.
12:55:27.612447 clone(strace: Process 72730 attached
child_stack=0x7f30efa9ffb0,
flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID,
parent_tidptr=0x7f30efaa09d0, tls=0x7f30efaa0700,
child_tidptr=0x7f30efaa09d0) = 72730

// Thread 2 will poll on the userfaultfd for up to 2000ms.
[pid 72730] 12:55:27.612676 poll([{fd=3, events=POLLIN}], 1, 2000
<unfinished ...>

// Thread 1 closes the userfaultfd and fcntl confirms it's closed:
[pid 72729] 12:55:28.612945 close(3)    = 0
[pid 72729] 12:55:28.613039 fcntl(3, F_GETFD) = -1 EBADF (Bad file descriptor)

// Poll technically times out and while loop back in do_poll it gets a POLLNVAL.
[pid 72730] 12:55:29.614906 <... poll resumed> ) = 1 ([{fd=3,
revents=POLLNVAL}])


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Userfaultfd doesn't seem to break out of poll on fd close
  2020-04-12 20:10 Userfaultfd doesn't seem to break out of poll on fd close Brian Geffon
@ 2020-04-14 21:45 ` Peter Xu
  2020-04-14 22:34   ` Jason Gunthorpe
  2020-04-15  3:16   ` Hillf Danton
  0 siblings, 2 replies; 11+ messages in thread
From: Peter Xu @ 2020-04-14 21:45 UTC (permalink / raw)
  To: Brian Geffon; +Cc: Andrea Arcangeli, linux-mm, LKML, Sonny Rao

On Sun, Apr 12, 2020 at 01:10:40PM -0700, Brian Geffon wrote:
> Hi,
> It seems that userfaultfd isn't woken from a poll when the file
> descriptor is closed. It seems that it should be from the code in
> userfault_ctx_release, but it appears that's not actually called
> immediately. I have a simple standalone example that shows this
> behavior. It's straight forward: one thread creates a userfaultfd and
> then closes it after a second thread has entered a poll syscall, some
> abbreviated strace output is below showing this and the code can be
> seen here: https://gist.github.com/bgaff/9a8fbbe8af79c0e18502430d416df77e
> 
> Given that it's probably very common to have a dedicated thread remain
> blocked indefinitely in a poll(2) waiting for faults there must be a
> way to break it out early when it's closed. Am I missing something?

Hi, Brian,

I might be wrong below, just to share my understanding...

IMHO a well-behaved userspace should not close() on a file descriptor
if it's still in use within another thread.  In this case, the poll()
thread is still using the userfaultfd handle, so imo it's cleaner that
the main thread should pthread_join() on the poll() thread before it
closes the handle.  It can be easily achieved by attaching another
eventfd to the struct pollfds array, and write to the eventfd when the
main thread wants to quit so that the poll() will return on the write
to the eventfd.

On the other hand I'm thinking whether we can achieve what you said.
IIUC userfaultfd_release() is only called when the file descriptor
destructs itself.  But shouldn't the poll() take a refcount of that
file descriptor too before waiting?  Not sure userfaultfd_release() is
the place to kick then, because if so, close() will only decrease the
fd refcount from 2->1, and I'm not sure userfaultfd_release() will be
triggered.

Thanks,

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Userfaultfd doesn't seem to break out of poll on fd close
  2020-04-14 21:45 ` Peter Xu
@ 2020-04-14 22:34   ` Jason Gunthorpe
  2020-04-15  3:16   ` Hillf Danton
  1 sibling, 0 replies; 11+ messages in thread
From: Jason Gunthorpe @ 2020-04-14 22:34 UTC (permalink / raw)
  To: Peter Xu; +Cc: Brian Geffon, Andrea Arcangeli, linux-mm, LKML, Sonny Rao

On Tue, Apr 14, 2020 at 05:45:16PM -0400, Peter Xu wrote:
> On Sun, Apr 12, 2020 at 01:10:40PM -0700, Brian Geffon wrote:
> > Hi,
> > It seems that userfaultfd isn't woken from a poll when the file
> > descriptor is closed. It seems that it should be from the code in
> > userfault_ctx_release, but it appears that's not actually called
> > immediately. I have a simple standalone example that shows this
> > behavior. It's straight forward: one thread creates a userfaultfd and
> > then closes it after a second thread has entered a poll syscall, some
> > abbreviated strace output is below showing this and the code can be
> > seen here: https://gist.github.com/bgaff/9a8fbbe8af79c0e18502430d416df77e
> > 
> > Given that it's probably very common to have a dedicated thread remain
> > blocked indefinitely in a poll(2) waiting for faults there must be a
> > way to break it out early when it's closed. Am I missing something?
> 
> Hi, Brian,
> 
> I might be wrong below, just to share my understanding...
> 
> IMHO a well-behaved userspace should not close() on a file descriptor
> if it's still in use within another thread.  In this case, the poll()
> thread is still using the userfaultfd handle

I also don't think concurrant close() on a file descriptor that is
under poll() is well defined, or should be relied upon.

> IIUC userfaultfd_release() is only called when the file descriptor
> destructs itself.  But shouldn't the poll() take a refcount of that
> file descriptor too before waiting?  Not sure userfaultfd_release() is
> the place to kick then, because if so, close() will only decrease the
> fd refcount from 2->1, and I'm not sure userfaultfd_release() will be
> triggered.

This is most probably true.

eventfd, epoll and pthread_join is the robust answer to these
problems.

Jason


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Userfaultfd doesn't seem to break out of poll on fd close
  2020-04-14 21:45 ` Peter Xu
  2020-04-14 22:34   ` Jason Gunthorpe
@ 2020-04-15  3:16   ` Hillf Danton
  2020-04-15 14:25     ` Jason Gunthorpe
  1 sibling, 1 reply; 11+ messages in thread
From: Hillf Danton @ 2020-04-15  3:16 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Peter Xu, Brian Geffon, Andrea Arcangeli, linux-mm, LKML, Sonny Rao


On Tue, 14 Apr 2020 19:34:10 -0300 Jason Gunthorpe wrote:
> 
> On Tue, Apr 14, 2020 at 05:45:16PM -0400, Peter Xu wrote:
> > On Sun, Apr 12, 2020 at 01:10:40PM -0700, Brian Geffon wrote:
> > > Hi,
> > > It seems that userfaultfd isn't woken from a poll when the file
> > > descriptor is closed. It seems that it should be from the code in
> > > userfault_ctx_release, but it appears that's not actually called
> > > immediately. I have a simple standalone example that shows this
> > > behavior. It's straight forward: one thread creates a userfaultfd and
> > > then closes it after a second thread has entered a poll syscall, some
> > > abbreviated strace output is below showing this and the code can be
> > > seen here: https://gist.github.com/bgaff/9a8fbbe8af79c0e18502430d416df77e
> > > 
> > > Given that it's probably very common to have a dedicated thread remain
> > > blocked indefinitely in a poll(2) waiting for faults there must be a
> > > way to break it out early when it's closed. Am I missing something?
> > 
> > Hi, Brian,
> > 
> > I might be wrong below, just to share my understanding...
> > 
> > IMHO a well-behaved userspace should not close() on a file descriptor
> > if it's still in use within another thread.  In this case, the poll()
> > thread is still using the userfaultfd handle
> 
> I also don't think concurrant close() on a file descriptor that is
> under poll() is well defined, or should be relied upon.
> 
> > IIUC userfaultfd_release() is only called when the file descriptor
> > destructs itself.  But shouldn't the poll() take a refcount of that
> > file descriptor too before waiting?  Not sure userfaultfd_release() is
> > the place to kick then, because if so, close() will only decrease the
> > fd refcount from 2->1, and I'm not sure userfaultfd_release() will be
> > triggered.
> 
> This is most probably true.
> 
> eventfd, epoll and pthread_join is the robust answer to these
> problems.
> 

See the difference EPOLLHUP makes.

--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -937,7 +937,7 @@ wakeup:
 	/* Flush pending events that may still wait on event_wqh */
 	wake_up_all(&ctx->event_wqh);
 
-	wake_up_poll(&ctx->fd_wqh, EPOLLHUP);
+	wake_up_all(&ctx->fd_wqh);
 	userfaultfd_ctx_put(ctx);
 	return 0;
 }



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Userfaultfd doesn't seem to break out of poll on fd close
  2020-04-15  3:16   ` Hillf Danton
@ 2020-04-15 14:25     ` Jason Gunthorpe
  2020-04-15 15:16       ` Brian Geffon
  2020-04-16  0:02       ` Andrea Arcangeli
  0 siblings, 2 replies; 11+ messages in thread
From: Jason Gunthorpe @ 2020-04-15 14:25 UTC (permalink / raw)
  To: Hillf Danton
  Cc: Peter Xu, Brian Geffon, Andrea Arcangeli, linux-mm, LKML, Sonny Rao

On Wed, Apr 15, 2020 at 11:16:02AM +0800, Hillf Danton wrote:
> 
> On Tue, 14 Apr 2020 19:34:10 -0300 Jason Gunthorpe wrote:
> > 
> > On Tue, Apr 14, 2020 at 05:45:16PM -0400, Peter Xu wrote:
> > > On Sun, Apr 12, 2020 at 01:10:40PM -0700, Brian Geffon wrote:
> > > > Hi,
> > > > It seems that userfaultfd isn't woken from a poll when the file
> > > > descriptor is closed. It seems that it should be from the code in
> > > > userfault_ctx_release, but it appears that's not actually called
> > > > immediately. I have a simple standalone example that shows this
> > > > behavior. It's straight forward: one thread creates a userfaultfd and
> > > > then closes it after a second thread has entered a poll syscall, some
> > > > abbreviated strace output is below showing this and the code can be
> > > > seen here: https://gist.github.com/bgaff/9a8fbbe8af79c0e18502430d416df77e
> > > > 
> > > > Given that it's probably very common to have a dedicated thread remain
> > > > blocked indefinitely in a poll(2) waiting for faults there must be a
> > > > way to break it out early when it's closed. Am I missing something?
> > > 
> > > Hi, Brian,
> > > 
> > > I might be wrong below, just to share my understanding...
> > > 
> > > IMHO a well-behaved userspace should not close() on a file descriptor
> > > if it's still in use within another thread.  In this case, the poll()
> > > thread is still using the userfaultfd handle
> > 
> > I also don't think concurrant close() on a file descriptor that is
> > under poll() is well defined, or should be relied upon.
> > 
> > > IIUC userfaultfd_release() is only called when the file descriptor
> > > destructs itself.  But shouldn't the poll() take a refcount of that
> > > file descriptor too before waiting?  Not sure userfaultfd_release() is
> > > the place to kick then, because if so, close() will only decrease the
> > > fd refcount from 2->1, and I'm not sure userfaultfd_release() will be
> > > triggered.
> > 
> > This is most probably true.
> > 
> > eventfd, epoll and pthread_join is the robust answer to these
> > problems.
> > 
> 
> See the difference EPOLLHUP makes.

The whole idea is completely racey:

          CPU1                            CPU2                  CPU3
 fds[i]->fd = userfaultfd;
 while()
                                       close(userfaultfd)
                                       pthread_join()
                                                            someother_fd = open()
                                                            userfaultfd == someother_fd
     poll(fds)   // <- Still sleeps

The kernel should not be trying to wake poll from fd release, and
userspace should not close a FD that is currently under poll.

Besides, it really does look like poll holds the fget while doing its
work (see poll_freewait), so fops release() won't be called anyhow..

Jason


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Userfaultfd doesn't seem to break out of poll on fd close
  2020-04-15 14:25     ` Jason Gunthorpe
@ 2020-04-15 15:16       ` Brian Geffon
  2020-04-16  0:02       ` Andrea Arcangeli
  1 sibling, 0 replies; 11+ messages in thread
From: Brian Geffon @ 2020-04-15 15:16 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Hillf Danton, Peter Xu, Andrea Arcangeli, linux-mm, LKML, Sonny Rao

Thanks everyone for the insights!




On Wed, Apr 15, 2020 at 7:25 AM Jason Gunthorpe <jgg@ziepe.ca> wrote:
>
> On Wed, Apr 15, 2020 at 11:16:02AM +0800, Hillf Danton wrote:
> >
> > On Tue, 14 Apr 2020 19:34:10 -0300 Jason Gunthorpe wrote:
> > >
> > > On Tue, Apr 14, 2020 at 05:45:16PM -0400, Peter Xu wrote:
> > > > On Sun, Apr 12, 2020 at 01:10:40PM -0700, Brian Geffon wrote:
> > > > > Hi,
> > > > > It seems that userfaultfd isn't woken from a poll when the file
> > > > > descriptor is closed. It seems that it should be from the code in
> > > > > userfault_ctx_release, but it appears that's not actually called
> > > > > immediately. I have a simple standalone example that shows this
> > > > > behavior. It's straight forward: one thread creates a userfaultfd and
> > > > > then closes it after a second thread has entered a poll syscall, some
> > > > > abbreviated strace output is below showing this and the code can be
> > > > > seen here: https://gist.github.com/bgaff/9a8fbbe8af79c0e18502430d416df77e
> > > > >
> > > > > Given that it's probably very common to have a dedicated thread remain
> > > > > blocked indefinitely in a poll(2) waiting for faults there must be a
> > > > > way to break it out early when it's closed. Am I missing something?
> > > >
> > > > Hi, Brian,
> > > >
> > > > I might be wrong below, just to share my understanding...
> > > >
> > > > IMHO a well-behaved userspace should not close() on a file descriptor
> > > > if it's still in use within another thread.  In this case, the poll()
> > > > thread is still using the userfaultfd handle
> > >
> > > I also don't think concurrant close() on a file descriptor that is
> > > under poll() is well defined, or should be relied upon.
> > >
> > > > IIUC userfaultfd_release() is only called when the file descriptor
> > > > destructs itself.  But shouldn't the poll() take a refcount of that
> > > > file descriptor too before waiting?  Not sure userfaultfd_release() is
> > > > the place to kick then, because if so, close() will only decrease the
> > > > fd refcount from 2->1, and I'm not sure userfaultfd_release() will be
> > > > triggered.
> > >
> > > This is most probably true.
> > >
> > > eventfd, epoll and pthread_join is the robust answer to these
> > > problems.
> > >
> >
> > See the difference EPOLLHUP makes.
>
> The whole idea is completely racey:
>
>           CPU1                            CPU2                  CPU3
>  fds[i]->fd = userfaultfd;
>  while()
>                                        close(userfaultfd)
>                                        pthread_join()
>                                                             someother_fd = open()
>                                                             userfaultfd == someother_fd
>      poll(fds)   // <- Still sleeps
>
> The kernel should not be trying to wake poll from fd release, and
> userspace should not close a FD that is currently under poll.
>
> Besides, it really does look like poll holds the fget while doing its
> work (see poll_freewait), so fops release() won't be called anyhow..
>
> Jason


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Userfaultfd doesn't seem to break out of poll on fd close
  2020-04-15 14:25     ` Jason Gunthorpe
  2020-04-15 15:16       ` Brian Geffon
@ 2020-04-16  0:02       ` Andrea Arcangeli
  2020-04-16  1:15         ` Brian Geffon
  1 sibling, 1 reply; 11+ messages in thread
From: Andrea Arcangeli @ 2020-04-16  0:02 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Hillf Danton, Peter Xu, Brian Geffon, linux-mm, LKML, Sonny Rao

Hello everyone,

On Wed, Apr 15, 2020 at 11:25:46AM -0300, Jason Gunthorpe wrote:
>           CPU1                            CPU2                  CPU3
>  fds[i]->fd = userfaultfd;
>  while()
>                                        close(userfaultfd)
>                                        pthread_join()
>                                                             someother_fd = open()
>                                                             userfaultfd == someother_fd
>      poll(fds)   // <- Still sleeps
> 
> The kernel should not be trying to wake poll from fd release, and
> userspace should not close a FD that is currently under poll.
> 
> Besides, it really does look like poll holds the fget while doing its
> work (see poll_freewait), so fops release() won't be called anyhow..

Agreed, poll does fdget (not userfaultfd_poll) so there's no way
->release will be called when the fd is closed in the other thread.

The simple way to fix this is to implement a ->flush operation
(userfaultfd_flush), perhaps something like this would work (untested):

static int userfaultfd_flush(struct file *file, fl_owner_t id)
{
	struct userfaultfd_ctx *ctx = file->private_data;
	wake_up_poll(&ctx->fd_wqh, EPOLLHUP);
}

If eventfd and pipes all behave identical to uffd (they should as they
don't seem to implement flush) I'm not sure if there's good enough
justification to deviate from the default VFS behavior here.

The file flush operation is usually meaningful when the fd represent
data stored remotely, like with nfs, for uffd close() has no special
semantics.

With threads, you can get the wakeup by other means as Peter
suggested. Then you can close the uffd in the parent after poll
returns.

Alternatively if you want to rely on uffd to send the poll wakeup you
could use UFFDIO_WAKE instead of closing the fd, and still close the fd
after poll returns.

Overall the more normal thing to do is to close the uffd after poll
returns, if you can't do that (or if it's less efficient doing that)
it'd be interesting to know why to better evaluate this. By just
looking the testcase there's no way to tell if you gain something
meaningful by closing the fd during poll..

Thanks,
Andrea



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Userfaultfd doesn't seem to break out of poll on fd close
  2020-04-16  0:02       ` Andrea Arcangeli
@ 2020-04-16  1:15         ` Brian Geffon
  2020-04-16  1:37           ` Peter Xu
  2020-04-16 14:49           ` Jason Gunthorpe
  0 siblings, 2 replies; 11+ messages in thread
From: Brian Geffon @ 2020-04-16  1:15 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: Jason Gunthorpe, Hillf Danton, Peter Xu, linux-mm, LKML, Sonny Rao

Hi Andrea,
Thanks for taking the time to reply.

> static int userfaultfd_flush(struct file *file, fl_owner_t id)
> {
>         struct userfaultfd_ctx *ctx = file->private_data;
>         wake_up_poll(&ctx->fd_wqh, EPOLLHUP);
> }
>

Yes, I think that something like this would work for this situation and eventfd.

> If eventfd and pipes all behave identical to uffd (they should as they
> don't seem to implement flush) I'm not sure if there's good enough
> justification to deviate from the default VFS behavior here.

Pipes actually behave a little differently, in the case that you close
the write end of the pipe the read end will break out of the poll with
EPOLLHUP, but I suppose closing the read end while the read end is
being polled would be more analogous to what I'm describing here. And
this is why it felt weird to me, in these situations the kernel
_knows_ that after the close nothing can happen on the file
descriptor, so what's the point of keeping it in a poll? As soon as
the poll breaks any read, write, ioctl, etc on the fd whether it's a
userfaultfd or an eventfd would fail with -EBADF.

And all of that I guess makes sense in the case of a non-blocking fd,
but what about the case of a blocking file descriptor? Both
userfaultfd and eventfd can seemingly be stuck in a read syscall with
no way to break them out when the userfaultfd/eventfd has no further
utility. Here is an example:
https://gist.github.com/bgaff/607302d86d99ac539efca307ce2dd679

For my use case adding an eventfd on poll works well, so thank you for
that suggestion. But the behavior just seemed odd to me which is why I
started this thread.

Thanks,
Brian


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Userfaultfd doesn't seem to break out of poll on fd close
  2020-04-16  1:15         ` Brian Geffon
@ 2020-04-16  1:37           ` Peter Xu
  2020-04-16  4:39             ` Brian Geffon
  2020-04-16 14:49           ` Jason Gunthorpe
  1 sibling, 1 reply; 11+ messages in thread
From: Peter Xu @ 2020-04-16  1:37 UTC (permalink / raw)
  To: Brian Geffon
  Cc: Andrea Arcangeli, Jason Gunthorpe, Hillf Danton, linux-mm, LKML,
	Sonny Rao

On Wed, Apr 15, 2020 at 06:15:26PM -0700, Brian Geffon wrote:
> Hi Andrea,
> Thanks for taking the time to reply.
> 
> > static int userfaultfd_flush(struct file *file, fl_owner_t id)
> > {
> >         struct userfaultfd_ctx *ctx = file->private_data;
> >         wake_up_poll(&ctx->fd_wqh, EPOLLHUP);
> > }
> >
> 
> Yes, I think that something like this would work for this situation and eventfd.
> 
> > If eventfd and pipes all behave identical to uffd (they should as they
> > don't seem to implement flush) I'm not sure if there's good enough
> > justification to deviate from the default VFS behavior here.
> 
> Pipes actually behave a little differently, in the case that you close
> the write end of the pipe the read end will break out of the poll with
> EPOLLHUP, but I suppose closing the read end while the read end is
> being polled would be more analogous to what I'm describing here. And
> this is why it felt weird to me, in these situations the kernel
> _knows_ that after the close nothing can happen on the file
> descriptor, so what's the point of keeping it in a poll? As soon as
> the poll breaks any read, write, ioctl, etc on the fd whether it's a
> userfaultfd or an eventfd would fail with -EBADF.
> 
> And all of that I guess makes sense in the case of a non-blocking fd,
> but what about the case of a blocking file descriptor? Both
> userfaultfd and eventfd can seemingly be stuck in a read syscall with
> no way to break them out when the userfaultfd/eventfd has no further
> utility. Here is an example:
> https://gist.github.com/bgaff/607302d86d99ac539efca307ce2dd679
> 
> For my use case adding an eventfd on poll works well, so thank you for
> that suggestion. But the behavior just seemed odd to me which is why I
> started this thread.

Hi, Brian,

I think I can understand you on the weirdness when comparing to the
pipes.  And IIUC that's majorly what POLLHUP is used for - it tells us
that the channel has closed.  I believe it's the same to a pair of
send/recv sockets when one end closes the port so the other side can
get a POLLHUP.

However IMO userfaultfd is not such a channel like pipes, as you have
already mentioned.  It's not paired ports.  As you've given the other
example on "closing the read pipe when reading the read pipe" - I'm
curious what will happen for that.  I feel like it'll happen the same
way as being blocked, just like what userfaultfd and eventfd are
doing.  My understanding is that the Linux kernel should be thread
safe on all these operations so no matter how we use the syscalls and
in what order the kernel shouldn't break with this.  However IMHO it
does not mean that it'll guarantee things like "close() will kick all
existing fd operations".  I don't know whether there's any restriction
in POSIX or anything for this, but... I won't be too surprised if
someone tells me there's some OS that will directly crash the process
if one fd is close()ed during a read()...

Thanks,

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Userfaultfd doesn't seem to break out of poll on fd close
  2020-04-16  1:37           ` Peter Xu
@ 2020-04-16  4:39             ` Brian Geffon
  0 siblings, 0 replies; 11+ messages in thread
From: Brian Geffon @ 2020-04-16  4:39 UTC (permalink / raw)
  To: Peter Xu
  Cc: Andrea Arcangeli, Jason Gunthorpe, Hillf Danton, linux-mm, LKML,
	Sonny Rao

Thanks Peter,
I see your point. I'm totally fine if we just leave this at: just
don't do it. lol. I appreciate you guys taking the time to talk
through this.

Brian



Brian

On Wed, Apr 15, 2020 at 6:37 PM Peter Xu <peterx@redhat.com> wrote:
>
> On Wed, Apr 15, 2020 at 06:15:26PM -0700, Brian Geffon wrote:
> > Hi Andrea,
> > Thanks for taking the time to reply.
> >
> > > static int userfaultfd_flush(struct file *file, fl_owner_t id)
> > > {
> > >         struct userfaultfd_ctx *ctx = file->private_data;
> > >         wake_up_poll(&ctx->fd_wqh, EPOLLHUP);
> > > }
> > >
> >
> > Yes, I think that something like this would work for this situation and eventfd.
> >
> > > If eventfd and pipes all behave identical to uffd (they should as they
> > > don't seem to implement flush) I'm not sure if there's good enough
> > > justification to deviate from the default VFS behavior here.
> >
> > Pipes actually behave a little differently, in the case that you close
> > the write end of the pipe the read end will break out of the poll with
> > EPOLLHUP, but I suppose closing the read end while the read end is
> > being polled would be more analogous to what I'm describing here. And
> > this is why it felt weird to me, in these situations the kernel
> > _knows_ that after the close nothing can happen on the file
> > descriptor, so what's the point of keeping it in a poll? As soon as
> > the poll breaks any read, write, ioctl, etc on the fd whether it's a
> > userfaultfd or an eventfd would fail with -EBADF.
> >
> > And all of that I guess makes sense in the case of a non-blocking fd,
> > but what about the case of a blocking file descriptor? Both
> > userfaultfd and eventfd can seemingly be stuck in a read syscall with
> > no way to break them out when the userfaultfd/eventfd has no further
> > utility. Here is an example:
> > https://gist.github.com/bgaff/607302d86d99ac539efca307ce2dd679
> >
> > For my use case adding an eventfd on poll works well, so thank you for
> > that suggestion. But the behavior just seemed odd to me which is why I
> > started this thread.
>
> Hi, Brian,
>
> I think I can understand you on the weirdness when comparing to the
> pipes.  And IIUC that's majorly what POLLHUP is used for - it tells us
> that the channel has closed.  I believe it's the same to a pair of
> send/recv sockets when one end closes the port so the other side can
> get a POLLHUP.
>
> However IMO userfaultfd is not such a channel like pipes, as you have
> already mentioned.  It's not paired ports.  As you've given the other
> example on "closing the read pipe when reading the read pipe" - I'm
> curious what will happen for that.  I feel like it'll happen the same
> way as being blocked, just like what userfaultfd and eventfd are
> doing.  My understanding is that the Linux kernel should be thread
> safe on all these operations so no matter how we use the syscalls and
> in what order the kernel shouldn't break with this.  However IMHO it
> does not mean that it'll guarantee things like "close() will kick all
> existing fd operations".  I don't know whether there's any restriction
> in POSIX or anything for this, but... I won't be too surprised if
> someone tells me there's some OS that will directly crash the process
> if one fd is close()ed during a read()...
>
> Thanks,
>
> --
> Peter Xu
>


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Userfaultfd doesn't seem to break out of poll on fd close
  2020-04-16  1:15         ` Brian Geffon
  2020-04-16  1:37           ` Peter Xu
@ 2020-04-16 14:49           ` Jason Gunthorpe
  1 sibling, 0 replies; 11+ messages in thread
From: Jason Gunthorpe @ 2020-04-16 14:49 UTC (permalink / raw)
  To: Brian Geffon
  Cc: Andrea Arcangeli, Hillf Danton, Peter Xu, linux-mm, LKML, Sonny Rao

On Wed, Apr 15, 2020 at 06:15:26PM -0700, Brian Geffon wrote:

> And all of that I guess makes sense in the case of a non-blocking fd,
> but what about the case of a blocking file descriptor? Both
> userfaultfd and eventfd can seemingly be stuck in a read syscall with
> no way to break them out when the userfaultfd/eventfd has no further
> utility. Here is an example:
> https://gist.github.com/bgaff/607302d86d99ac539efca307ce2dd679

If an application wants to terminate blocking calls it has to rely on
signal delivery or pthread_cancel to end it.

This is very complicated so it is generally better to use poll and non
blocking.

Jason


^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2020-04-16 14:49 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-04-12 20:10 Userfaultfd doesn't seem to break out of poll on fd close Brian Geffon
2020-04-14 21:45 ` Peter Xu
2020-04-14 22:34   ` Jason Gunthorpe
2020-04-15  3:16   ` Hillf Danton
2020-04-15 14:25     ` Jason Gunthorpe
2020-04-15 15:16       ` Brian Geffon
2020-04-16  0:02       ` Andrea Arcangeli
2020-04-16  1:15         ` Brian Geffon
2020-04-16  1:37           ` Peter Xu
2020-04-16  4:39             ` Brian Geffon
2020-04-16 14:49           ` Jason Gunthorpe

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).