All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] vfs: move fdput() to right place in ksys_sync_file_range()
@ 2022-05-11 15:45 Chengguang Xu
  2022-05-11 15:51 ` Matthew Wilcox
  0 siblings, 1 reply; 13+ messages in thread
From: Chengguang Xu @ 2022-05-11 15:45 UTC (permalink / raw)
  To: viro; +Cc: linux-fsdevel, Chengguang Xu

Move fdput() to right place in ksys_sync_file_range() to
avoid fdput() after failed fdget().

Signed-off-by: Chengguang Xu <cgxu519@mykernel.net>
---
 fs/sync.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/fs/sync.c b/fs/sync.c
index c7690016453e..b217d908bee8 100644
--- a/fs/sync.c
+++ b/fs/sync.c
@@ -360,10 +360,10 @@ int ksys_sync_file_range(int fd, loff_t offset, loff_t nbytes,
 
 	ret = -EBADF;
 	f = fdget(fd);
-	if (f.file)
+	if (f.file) {
 		ret = sync_file_range(f.file, offset, nbytes, flags);
-
-	fdput(f);
+		fdput(f);
+	}
 	return ret;
 }
 
-- 
2.35.1



^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH] vfs: move fdput() to right place in ksys_sync_file_range()
  2022-05-11 15:45 [PATCH] vfs: move fdput() to right place in ksys_sync_file_range() Chengguang Xu
@ 2022-05-11 15:51 ` Matthew Wilcox
  2022-05-11 19:01   ` Eric Biggers
  0 siblings, 1 reply; 13+ messages in thread
From: Matthew Wilcox @ 2022-05-11 15:51 UTC (permalink / raw)
  To: Chengguang Xu; +Cc: viro, linux-fsdevel

On Wed, May 11, 2022 at 11:45:03AM -0400, Chengguang Xu wrote:
> Move fdput() to right place in ksys_sync_file_range() to
> avoid fdput() after failed fdget().

Why?  fdput() is already conditional on FDPUT_FPUT so you're ...
optimising the failure case?

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] vfs: move fdput() to right place in ksys_sync_file_range()
  2022-05-11 15:51 ` Matthew Wilcox
@ 2022-05-11 19:01   ` Eric Biggers
  2022-05-11 21:43     ` Al Viro
  0 siblings, 1 reply; 13+ messages in thread
From: Eric Biggers @ 2022-05-11 19:01 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: Chengguang Xu, viro, linux-fsdevel

On Wed, May 11, 2022 at 04:51:34PM +0100, Matthew Wilcox wrote:
> On Wed, May 11, 2022 at 11:45:03AM -0400, Chengguang Xu wrote:
> > Move fdput() to right place in ksys_sync_file_range() to
> > avoid fdput() after failed fdget().
> 
> Why?  fdput() is already conditional on FDPUT_FPUT so you're ...
> optimising the failure case?

"fdput() after failed fdget()" has confused people before, so IMO it's worth
cleaning this up.  But the commit message should make clear that it's a cleanup,
not a bug fix.  Also I recommend using an early return:

	f = fdget(fd);
	if (!f.file)
		return -EBADF;
	ret = sync_file_range(f.file, offset, nbytes, flags);
	fdput(f);
	return ret;

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] vfs: move fdput() to right place in ksys_sync_file_range()
  2022-05-11 19:01   ` Eric Biggers
@ 2022-05-11 21:43     ` Al Viro
  2022-05-12  0:28       ` Al Viro
                         ` (2 more replies)
  0 siblings, 3 replies; 13+ messages in thread
From: Al Viro @ 2022-05-11 21:43 UTC (permalink / raw)
  To: Eric Biggers
  Cc: Matthew Wilcox, Chengguang Xu, linux-fsdevel, Xu Kuohai,
	Alexei Starovoitov

[bpf folks Cc'd]

On Wed, May 11, 2022 at 07:01:34PM +0000, Eric Biggers wrote:
> On Wed, May 11, 2022 at 04:51:34PM +0100, Matthew Wilcox wrote:
> > On Wed, May 11, 2022 at 11:45:03AM -0400, Chengguang Xu wrote:
> > > Move fdput() to right place in ksys_sync_file_range() to
> > > avoid fdput() after failed fdget().
> > 
> > Why?  fdput() is already conditional on FDPUT_FPUT so you're ...
> > optimising the failure case?
> 
> "fdput() after failed fdget()" has confused people before, so IMO it's worth
> cleaning this up.  But the commit message should make clear that it's a cleanup,
> not a bug fix.  Also I recommend using an early return:
> 
> 	f = fdget(fd);
> 	if (!f.file)
> 		return -EBADF;
> 	ret = sync_file_range(f.file, offset, nbytes, flags);
> 	fdput(f);
> 	return ret;

FWIW, fdput() after failed fdget() is rare, but there's no fundamental reasons
why it would be wrong.  No objections against that patch, anyway.

Out of curiousity, I've just looked at the existing users.  In mainline we have
203 callers of fdput()/fdput_pos(); all but 7 never get reached with NULL ->file.

1) There's ksys_sync_file_range(), kernel_read_file_from_fd() and ksys_readahead() -
all with similar pattern.  I'm not sure that for readahead(2) "not opened for
read" should yield the same error as "bad descriptor", but since it's been a part
of userland ABI for a while...

2) two callers in perf_event_open(2) are playing silly buggers with explicit
        struct fd group = {NULL, 0};
and rely upon "fdput() is a no-op if we hadn't touched that" (note that if
we try to touch it and get NULL ->file from fdget(), we do not hit those fdput()
at all).

3) ovl_aio_put() is hard to follow (and some of the callers are poking
where they shouldn't), no idea if it's correct.  struct fd is manually
constructed there, anyway.

4) bpf generic_map_update_batch() is really asking for trouble.  The comment in
there is wrong:
        f = fdget(ufd); /* bpf_map_do_batch() guarantees ufd is valid */
*NOTHING* we'd done earlier can guarantee that.  We might have a descriptor
table shared with another thread, and it might have very well done dup2() since
the last time we'd looked things up.  IOW, this fdget() is racy - the function
assumes it refers to the same thing that gave us map back in bpf_map_do_batch(),
but it's not guaranteed at all.

I hadn't put together a reproducer, but that code is very suspicious.  As a general
rule, you should treat descriptor table as shared object, modifiable by other
threads.  It can be explicitly locked and it can be explicitly unshared, but
short of that doing a lookup for the same descriptor twice in a row can yield
different results.

What's going on there?  Do you really want the same struct file you've got back in
bpf_map_do_batch() (i.e. the one you've got the map from)?  What should happen
if the descriptor changes its meaning during (or after) the operation?

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] vfs: move fdput() to right place in ksys_sync_file_range()
  2022-05-11 21:43     ` Al Viro
@ 2022-05-12  0:28       ` Al Viro
  2022-05-12  0:42         ` Jens Axboe
  2022-05-12  2:03       ` Alexei Starovoitov
  2022-05-15  3:30       ` [BUG] double fget() in vhost/net (was Re: [PATCH] vfs: move fdput() to right place in ksys_sync_file_range()) Al Viro
  2 siblings, 1 reply; 13+ messages in thread
From: Al Viro @ 2022-05-12  0:28 UTC (permalink / raw)
  To: Eric Biggers; +Cc: Matthew Wilcox, Chengguang Xu, linux-fsdevel, Jens Axboe

On Wed, May 11, 2022 at 09:43:46PM +0000, Al Viro wrote:

> 3) ovl_aio_put() is hard to follow (and some of the callers are poking
> where they shouldn't), no idea if it's correct.  struct fd is manually
> constructed there, anyway.

Speaking of poking in the internals:

SYSCALL_DEFINE6(io_uring_enter, unsigned int, fd, u32, to_submit,
                u32, min_complete, u32, flags, const void __user *, argp,
                size_t, argsz)
{
...
        struct fd f;
...
        if (flags & IORING_ENTER_REGISTERED_RING) {
                struct io_uring_task *tctx = current->io_uring;

                if (!tctx || fd >= IO_RINGFD_REG_MAX)
                        return -EINVAL;
                fd = array_index_nospec(fd, IO_RINGFD_REG_MAX);
                f.file = tctx->registered_rings[fd];
                if (unlikely(!f.file))
                        return -EBADF;
        } else {
                f = fdget(fd);
                if (unlikely(!f.file))
                        return -EBADF;
        }
...
a bunch of accesses to f.file
...
        if (!(flags & IORING_ENTER_REGISTERED_RING))
                fdput(f);

Note that f.flags is left uninitialized in the first case; it doesn't
break since we have fdput(f) (which does look at f.flags) done only
in the case where we don't have IORING_ENTER_REGISTERED_RING in flags
and since flags remains unchanged since the first if.  But it would
be just as easy to set f.flags to 0 and use fdput() in both cases...

Jens, do you have any objections against the following?  Easier to
follow that way...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
diff --git a/fs/io_uring.c b/fs/io_uring.c
index 91de361ea9ab..b61ae18ef10a 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -10760,14 +10760,14 @@ SYSCALL_DEFINE6(io_uring_enter, unsigned int, fd, u32, to_submit,
 			return -EINVAL;
 		fd = array_index_nospec(fd, IO_RINGFD_REG_MAX);
 		f.file = tctx->registered_rings[fd];
-		if (unlikely(!f.file))
-			return -EBADF;
+		f.flags = 0;
 	} else {
 		f = fdget(fd);
-		if (unlikely(!f.file))
-			return -EBADF;
 	}
 
+	if (unlikely(!f.file))
+		return -EBADF;
+
 	ret = -EOPNOTSUPP;
 	if (unlikely(f.file->f_op != &io_uring_fops))
 		goto out_fput;
@@ -10840,8 +10840,7 @@ SYSCALL_DEFINE6(io_uring_enter, unsigned int, fd, u32, to_submit,
 out:
 	percpu_ref_put(&ctx->refs);
 out_fput:
-	if (!(flags & IORING_ENTER_REGISTERED_RING))
-		fdput(f);
+	fdput(f);
 	return submitted ? submitted : ret;
 }
 

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH] vfs: move fdput() to right place in ksys_sync_file_range()
  2022-05-12  0:28       ` Al Viro
@ 2022-05-12  0:42         ` Jens Axboe
  0 siblings, 0 replies; 13+ messages in thread
From: Jens Axboe @ 2022-05-12  0:42 UTC (permalink / raw)
  To: Al Viro, Eric Biggers; +Cc: Matthew Wilcox, Chengguang Xu, linux-fsdevel

On 5/11/22 6:28 PM, Al Viro wrote:
> On Wed, May 11, 2022 at 09:43:46PM +0000, Al Viro wrote:
> 
>> 3) ovl_aio_put() is hard to follow (and some of the callers are poking
>> where they shouldn't), no idea if it's correct.  struct fd is manually
>> constructed there, anyway.
> 
> Speaking of poking in the internals:
> 
> SYSCALL_DEFINE6(io_uring_enter, unsigned int, fd, u32, to_submit,
>                 u32, min_complete, u32, flags, const void __user *, argp,
>                 size_t, argsz)
> {
> ...
>         struct fd f;
> ...
>         if (flags & IORING_ENTER_REGISTERED_RING) {
>                 struct io_uring_task *tctx = current->io_uring;
> 
>                 if (!tctx || fd >= IO_RINGFD_REG_MAX)
>                         return -EINVAL;
>                 fd = array_index_nospec(fd, IO_RINGFD_REG_MAX);
>                 f.file = tctx->registered_rings[fd];
>                 if (unlikely(!f.file))
>                         return -EBADF;
>         } else {
>                 f = fdget(fd);
>                 if (unlikely(!f.file))
>                         return -EBADF;
>         }
> ...
> a bunch of accesses to f.file
> ...
>         if (!(flags & IORING_ENTER_REGISTERED_RING))
>                 fdput(f);
> 
> Note that f.flags is left uninitialized in the first case; it doesn't
> break since we have fdput(f) (which does look at f.flags) done only
> in the case where we don't have IORING_ENTER_REGISTERED_RING in flags
> and since flags remains unchanged since the first if.  But it would
> be just as easy to set f.flags to 0 and use fdput() in both cases...
> 
> Jens, do you have any objections against the following?  Easier to
> follow that way...

No, I think that looks fine. If you need it for other changes:

Reviewed-by: Jens Axboe <axboe@kernel.dk>

(or let me know if you want me to take it).

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] vfs: move fdput() to right place in ksys_sync_file_range()
  2022-05-11 21:43     ` Al Viro
  2022-05-12  0:28       ` Al Viro
@ 2022-05-12  2:03       ` Alexei Starovoitov
  2022-05-12 12:48         ` Brian Vazquez
  2022-05-15  3:30       ` [BUG] double fget() in vhost/net (was Re: [PATCH] vfs: move fdput() to right place in ksys_sync_file_range()) Al Viro
  2 siblings, 1 reply; 13+ messages in thread
From: Alexei Starovoitov @ 2022-05-12  2:03 UTC (permalink / raw)
  To: Al Viro, bpf, Brian Vazquez, Yonghong Song, Daniel Borkmann,
	Andrii Nakryiko
  Cc: Eric Biggers, Matthew Wilcox, Chengguang Xu, Linux-Fsdevel,
	Xu Kuohai, Alexei Starovoitov

On Wed, May 11, 2022 at 2:43 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> [bpf folks Cc'd]
>
> On Wed, May 11, 2022 at 07:01:34PM +0000, Eric Biggers wrote:
> > On Wed, May 11, 2022 at 04:51:34PM +0100, Matthew Wilcox wrote:
> > > On Wed, May 11, 2022 at 11:45:03AM -0400, Chengguang Xu wrote:
> > > > Move fdput() to right place in ksys_sync_file_range() to
> > > > avoid fdput() after failed fdget().
> > >
> > > Why?  fdput() is already conditional on FDPUT_FPUT so you're ...
> > > optimising the failure case?
> >
> > "fdput() after failed fdget()" has confused people before, so IMO it's worth
> > cleaning this up.  But the commit message should make clear that it's a cleanup,
> > not a bug fix.  Also I recommend using an early return:
> >
> >       f = fdget(fd);
> >       if (!f.file)
> >               return -EBADF;
> >       ret = sync_file_range(f.file, offset, nbytes, flags);
> >       fdput(f);
> >       return ret;
>
> FWIW, fdput() after failed fdget() is rare, but there's no fundamental reasons
> why it would be wrong.  No objections against that patch, anyway.
>
> Out of curiousity, I've just looked at the existing users.  In mainline we have
> 203 callers of fdput()/fdput_pos(); all but 7 never get reached with NULL ->file.
>
> 1) There's ksys_sync_file_range(), kernel_read_file_from_fd() and ksys_readahead() -
> all with similar pattern.  I'm not sure that for readahead(2) "not opened for
> read" should yield the same error as "bad descriptor", but since it's been a part
> of userland ABI for a while...
>
> 2) two callers in perf_event_open(2) are playing silly buggers with explicit
>         struct fd group = {NULL, 0};
> and rely upon "fdput() is a no-op if we hadn't touched that" (note that if
> we try to touch it and get NULL ->file from fdget(), we do not hit those fdput()
> at all).
>
> 3) ovl_aio_put() is hard to follow (and some of the callers are poking
> where they shouldn't), no idea if it's correct.  struct fd is manually
> constructed there, anyway.
>
> 4) bpf generic_map_update_batch() is really asking for trouble.  The comment in
> there is wrong:
>         f = fdget(ufd); /* bpf_map_do_batch() guarantees ufd is valid */
> *NOTHING* we'd done earlier can guarantee that.  We might have a descriptor
> table shared with another thread, and it might have very well done dup2() since
> the last time we'd looked things up.  IOW, this fdget() is racy - the function
> assumes it refers to the same thing that gave us map back in bpf_map_do_batch(),
> but it's not guaranteed at all.
>
> I hadn't put together a reproducer, but that code is very suspicious.  As a general
> rule, you should treat descriptor table as shared object, modifiable by other
> threads.  It can be explicitly locked and it can be explicitly unshared, but
> short of that doing a lookup for the same descriptor twice in a row can yield
> different results.
>
> What's going on there?  Do you really want the same struct file you've got back in
> bpf_map_do_batch() (i.e. the one you've got the map from)?  What should happen
> if the descriptor changes its meaning during (or after) the operation?

Interesting.
If I got this right... in the following:

f = fdget(ufd);
map = __bpf_map_get(f);
if (IS_ERR(map))
   return PTR_ERR(map);
...
f = fdget(ufd);
here there are no guarantees that 'f' is valid and points
to the same map.
Argh. In hindsight that makes sense.

generic_map_update_batch calls bpf_map_update_value.
That could use 'f' for prog_array, fd_array and hash_of_maps
types of maps.
The first two types don't' define .map_update_batch callback.
So BPF_DO_BATCH(map->ops->map_update_batch); will error out
with -ENOTSUPP since that callback is NULL for that map type.

The hash_of_maps does seem to support it, but
that's an odd one to use with batch access.

Anyhow we certainly need to clean this up.

Brian,
do you mind fixing it up, since you've added that
secondary fdget() in the first place?

Thanks!

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] vfs: move fdput() to right place in ksys_sync_file_range()
  2022-05-12  2:03       ` Alexei Starovoitov
@ 2022-05-12 12:48         ` Brian Vazquez
  0 siblings, 0 replies; 13+ messages in thread
From: Brian Vazquez @ 2022-05-12 12:48 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Al Viro, bpf, Yonghong Song, Daniel Borkmann, Andrii Nakryiko,
	Eric Biggers, Matthew Wilcox, Chengguang Xu, Linux-Fsdevel,
	Xu Kuohai, Alexei Starovoitov

Sure, let me take a look.

On Wed, May 11, 2022 at 7:04 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Wed, May 11, 2022 at 2:43 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
> >
> > [bpf folks Cc'd]
> >
> > On Wed, May 11, 2022 at 07:01:34PM +0000, Eric Biggers wrote:
> > > On Wed, May 11, 2022 at 04:51:34PM +0100, Matthew Wilcox wrote:
> > > > On Wed, May 11, 2022 at 11:45:03AM -0400, Chengguang Xu wrote:
> > > > > Move fdput() to right place in ksys_sync_file_range() to
> > > > > avoid fdput() after failed fdget().
> > > >
> > > > Why?  fdput() is already conditional on FDPUT_FPUT so you're ...
> > > > optimising the failure case?
> > >
> > > "fdput() after failed fdget()" has confused people before, so IMO it's worth
> > > cleaning this up.  But the commit message should make clear that it's a cleanup,
> > > not a bug fix.  Also I recommend using an early return:
> > >
> > >       f = fdget(fd);
> > >       if (!f.file)
> > >               return -EBADF;
> > >       ret = sync_file_range(f.file, offset, nbytes, flags);
> > >       fdput(f);
> > >       return ret;
> >
> > FWIW, fdput() after failed fdget() is rare, but there's no fundamental reasons
> > why it would be wrong.  No objections against that patch, anyway.
> >
> > Out of curiousity, I've just looked at the existing users.  In mainline we have
> > 203 callers of fdput()/fdput_pos(); all but 7 never get reached with NULL ->file.
> >
> > 1) There's ksys_sync_file_range(), kernel_read_file_from_fd() and ksys_readahead() -
> > all with similar pattern.  I'm not sure that for readahead(2) "not opened for
> > read" should yield the same error as "bad descriptor", but since it's been a part
> > of userland ABI for a while...
> >
> > 2) two callers in perf_event_open(2) are playing silly buggers with explicit
> >         struct fd group = {NULL, 0};
> > and rely upon "fdput() is a no-op if we hadn't touched that" (note that if
> > we try to touch it and get NULL ->file from fdget(), we do not hit those fdput()
> > at all).
> >
> > 3) ovl_aio_put() is hard to follow (and some of the callers are poking
> > where they shouldn't), no idea if it's correct.  struct fd is manually
> > constructed there, anyway.
> >
> > 4) bpf generic_map_update_batch() is really asking for trouble.  The comment in
> > there is wrong:
> >         f = fdget(ufd); /* bpf_map_do_batch() guarantees ufd is valid */
> > *NOTHING* we'd done earlier can guarantee that.  We might have a descriptor
> > table shared with another thread, and it might have very well done dup2() since
> > the last time we'd looked things up.  IOW, this fdget() is racy - the function
> > assumes it refers to the same thing that gave us map back in bpf_map_do_batch(),
> > but it's not guaranteed at all.
> >
> > I hadn't put together a reproducer, but that code is very suspicious.  As a general
> > rule, you should treat descriptor table as shared object, modifiable by other
> > threads.  It can be explicitly locked and it can be explicitly unshared, but
> > short of that doing a lookup for the same descriptor twice in a row can yield
> > different results.
> >
> > What's going on there?  Do you really want the same struct file you've got back in
> > bpf_map_do_batch() (i.e. the one you've got the map from)?  What should happen
> > if the descriptor changes its meaning during (or after) the operation?
>
> Interesting.
> If I got this right... in the following:
>
> f = fdget(ufd);
> map = __bpf_map_get(f);
> if (IS_ERR(map))
>    return PTR_ERR(map);
> ...
> f = fdget(ufd);
> here there are no guarantees that 'f' is valid and points
> to the same map.
> Argh. In hindsight that makes sense.
>
> generic_map_update_batch calls bpf_map_update_value.
> That could use 'f' for prog_array, fd_array and hash_of_maps
> types of maps.
> The first two types don't' define .map_update_batch callback.
> So BPF_DO_BATCH(map->ops->map_update_batch); will error out
> with -ENOTSUPP since that callback is NULL for that map type.
>
> The hash_of_maps does seem to support it, but
> that's an odd one to use with batch access.
>
> Anyhow we certainly need to clean this up.
>
> Brian,
> do you mind fixing it up, since you've added that
> secondary fdget() in the first place?
>
> Thanks!

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [BUG] double fget() in vhost/net (was Re: [PATCH] vfs: move fdput() to right place in ksys_sync_file_range())
  2022-05-11 21:43     ` Al Viro
  2022-05-12  0:28       ` Al Viro
  2022-05-12  2:03       ` Alexei Starovoitov
@ 2022-05-15  3:30       ` Al Viro
  2022-05-15 16:14         ` Michael S. Tsirkin
  2022-05-16  4:17         ` Jason Wang
  2 siblings, 2 replies; 13+ messages in thread
From: Al Viro @ 2022-05-15  3:30 UTC (permalink / raw)
  To: Eric Biggers
  Cc: linux-fsdevel, Jason Wang, David S. Miller, Michael S. Tsirkin

[tun/tap and vhost folks Cc'd]

here's another piece of code assuming that repeated fget() will yield the
same opened file: in vhost_net_set_backend() we have

        sock = get_socket(fd);
        if (IS_ERR(sock)) {
                r = PTR_ERR(sock);
                goto err_vq;
        }

        /* start polling new socket */
        oldsock = vhost_vq_get_backend(vq);
        if (sock != oldsock) {
...
                vhost_vq_set_backend(vq, sock);
...
                if (index == VHOST_NET_VQ_RX)
                        nvq->rx_ring = get_tap_ptr_ring(fd);

with
static struct socket *get_socket(int fd)
{
        struct socket *sock;

        /* special case to disable backend */
        if (fd == -1)
                return NULL;
        sock = get_raw_socket(fd);
        if (!IS_ERR(sock))
                return sock;
        sock = get_tap_socket(fd);
        if (!IS_ERR(sock))
                return sock;
        return ERR_PTR(-ENOTSOCK);
}
and
static struct ptr_ring *get_tap_ptr_ring(int fd)
{
        struct ptr_ring *ring;
        struct file *file = fget(fd);

        if (!file)
                return NULL;
        ring = tun_get_tx_ring(file);
        if (!IS_ERR(ring))
                goto out;
        ring = tap_get_ptr_ring(file);
        if (!IS_ERR(ring))
                goto out;
        ring = NULL;
out:
        fput(file);
        return ring;
}

Again, there is no promise that fd will resolve to the same thing for
lookups in get_socket() and in get_tap_ptr_ring().  I'm not familiar
enough with the guts of drivers/vhost to tell how easy it is to turn
into attack, but it looks like trouble.  If nothing else, the pointer
returned by tun_get_tx_ring() is not guaranteed to be pinned down by
anything - the reference to sock will _usually_ suffice, but that
doesn't help any if we get a different socket on that second fget().

One possible way to fix it would be the patch below; objections?

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 792ab5f23647..86ea7695241e 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -1450,13 +1450,9 @@ static struct socket *get_raw_socket(int fd)
 	return ERR_PTR(r);
 }
 
-static struct ptr_ring *get_tap_ptr_ring(int fd)
+static struct ptr_ring *get_tap_ptr_ring(struct file *file)
 {
 	struct ptr_ring *ring;
-	struct file *file = fget(fd);
-
-	if (!file)
-		return NULL;
 	ring = tun_get_tx_ring(file);
 	if (!IS_ERR(ring))
 		goto out;
@@ -1465,7 +1461,6 @@ static struct ptr_ring *get_tap_ptr_ring(int fd)
 		goto out;
 	ring = NULL;
 out:
-	fput(file);
 	return ring;
 }
 
@@ -1553,7 +1548,7 @@ static long vhost_net_set_backend(struct vhost_net *n, unsigned index, int fd)
 		if (r)
 			goto err_used;
 		if (index == VHOST_NET_VQ_RX)
-			nvq->rx_ring = get_tap_ptr_ring(fd);
+			nvq->rx_ring = get_tap_ptr_ring(sock->file);
 
 		oldubufs = nvq->ubufs;
 		nvq->ubufs = ubufs;

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [BUG] double fget() in vhost/net (was Re: [PATCH] vfs: move fdput() to right place in ksys_sync_file_range())
  2022-05-15  3:30       ` [BUG] double fget() in vhost/net (was Re: [PATCH] vfs: move fdput() to right place in ksys_sync_file_range()) Al Viro
@ 2022-05-15 16:14         ` Michael S. Tsirkin
  2022-05-16  4:17         ` Jason Wang
  1 sibling, 0 replies; 13+ messages in thread
From: Michael S. Tsirkin @ 2022-05-15 16:14 UTC (permalink / raw)
  To: Al Viro; +Cc: Eric Biggers, linux-fsdevel, Jason Wang, David S. Miller

On Sun, May 15, 2022 at 03:30:23AM +0000, Al Viro wrote:
> [tun/tap and vhost folks Cc'd]
> 
> here's another piece of code assuming that repeated fget() will yield the
> same opened file: in vhost_net_set_backend() we have
> 
>         sock = get_socket(fd);
>         if (IS_ERR(sock)) {
>                 r = PTR_ERR(sock);
>                 goto err_vq;
>         }
> 
>         /* start polling new socket */
>         oldsock = vhost_vq_get_backend(vq);
>         if (sock != oldsock) {
> ...
>                 vhost_vq_set_backend(vq, sock);
> ...
>                 if (index == VHOST_NET_VQ_RX)
>                         nvq->rx_ring = get_tap_ptr_ring(fd);
> 
> with
> static struct socket *get_socket(int fd)
> {
>         struct socket *sock;
> 
>         /* special case to disable backend */
>         if (fd == -1)
>                 return NULL;
>         sock = get_raw_socket(fd);
>         if (!IS_ERR(sock))
>                 return sock;
>         sock = get_tap_socket(fd);
>         if (!IS_ERR(sock))
>                 return sock;
>         return ERR_PTR(-ENOTSOCK);
> }
> and
> static struct ptr_ring *get_tap_ptr_ring(int fd)
> {
>         struct ptr_ring *ring;
>         struct file *file = fget(fd);
> 
>         if (!file)
>                 return NULL;
>         ring = tun_get_tx_ring(file);
>         if (!IS_ERR(ring))
>                 goto out;
>         ring = tap_get_ptr_ring(file);
>         if (!IS_ERR(ring))
>                 goto out;
>         ring = NULL;
> out:
>         fput(file);
>         return ring;
> }
> 
> Again, there is no promise that fd will resolve to the same thing for
> lookups in get_socket() and in get_tap_ptr_ring().  I'm not familiar
> enough with the guts of drivers/vhost to tell how easy it is to turn
> into attack, but it looks like trouble.  If nothing else, the pointer
> returned by tun_get_tx_ring() is not guaranteed to be pinned down by
> anything - the reference to sock will _usually_ suffice, but that
> doesn't help any if we get a different socket on that second fget().
> 
> One possible way to fix it would be the patch below; objections?
> 
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

Suspect you are right, didn't test yet. Jason?

> ---
> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> index 792ab5f23647..86ea7695241e 100644
> --- a/drivers/vhost/net.c
> +++ b/drivers/vhost/net.c
> @@ -1450,13 +1450,9 @@ static struct socket *get_raw_socket(int fd)
>  	return ERR_PTR(r);
>  }
>  
> -static struct ptr_ring *get_tap_ptr_ring(int fd)
> +static struct ptr_ring *get_tap_ptr_ring(struct file *file)
>  {
>  	struct ptr_ring *ring;
> -	struct file *file = fget(fd);
> -
> -	if (!file)
> -		return NULL;
>  	ring = tun_get_tx_ring(file);
>  	if (!IS_ERR(ring))
>  		goto out;
> @@ -1465,7 +1461,6 @@ static struct ptr_ring *get_tap_ptr_ring(int fd)
>  		goto out;
>  	ring = NULL;
>  out:
> -	fput(file);
>  	return ring;
>  }
>  
> @@ -1553,7 +1548,7 @@ static long vhost_net_set_backend(struct vhost_net *n, unsigned index, int fd)
>  		if (r)
>  			goto err_used;
>  		if (index == VHOST_NET_VQ_RX)
> -			nvq->rx_ring = get_tap_ptr_ring(fd);
> +			nvq->rx_ring = get_tap_ptr_ring(sock->file);
>  
>  		oldubufs = nvq->ubufs;
>  		nvq->ubufs = ubufs;


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [BUG] double fget() in vhost/net (was Re: [PATCH] vfs: move fdput() to right place in ksys_sync_file_range())
  2022-05-15  3:30       ` [BUG] double fget() in vhost/net (was Re: [PATCH] vfs: move fdput() to right place in ksys_sync_file_range()) Al Viro
  2022-05-15 16:14         ` Michael S. Tsirkin
@ 2022-05-16  4:17         ` Jason Wang
  2022-05-16  7:54           ` Michael S. Tsirkin
  1 sibling, 1 reply; 13+ messages in thread
From: Jason Wang @ 2022-05-16  4:17 UTC (permalink / raw)
  To: Al Viro, Eric Biggers; +Cc: linux-fsdevel, David S. Miller, Michael S. Tsirkin


在 2022/5/15 11:30, Al Viro 写道:
> [tun/tap and vhost folks Cc'd]
>
> here's another piece of code assuming that repeated fget() will yield the
> same opened file: in vhost_net_set_backend() we have
>
>          sock = get_socket(fd);
>          if (IS_ERR(sock)) {
>                  r = PTR_ERR(sock);
>                  goto err_vq;
>          }
>
>          /* start polling new socket */
>          oldsock = vhost_vq_get_backend(vq);
>          if (sock != oldsock) {
> ...
>                  vhost_vq_set_backend(vq, sock);
> ...
>                  if (index == VHOST_NET_VQ_RX)
>                          nvq->rx_ring = get_tap_ptr_ring(fd);
>
> with
> static struct socket *get_socket(int fd)
> {
>          struct socket *sock;
>
>          /* special case to disable backend */
>          if (fd == -1)
>                  return NULL;
>          sock = get_raw_socket(fd);
>          if (!IS_ERR(sock))
>                  return sock;
>          sock = get_tap_socket(fd);
>          if (!IS_ERR(sock))
>                  return sock;
>          return ERR_PTR(-ENOTSOCK);
> }
> and
> static struct ptr_ring *get_tap_ptr_ring(int fd)
> {
>          struct ptr_ring *ring;
>          struct file *file = fget(fd);
>
>          if (!file)
>                  return NULL;
>          ring = tun_get_tx_ring(file);
>          if (!IS_ERR(ring))
>                  goto out;
>          ring = tap_get_ptr_ring(file);
>          if (!IS_ERR(ring))
>                  goto out;
>          ring = NULL;
> out:
>          fput(file);
>          return ring;
> }
>
> Again, there is no promise that fd will resolve to the same thing for
> lookups in get_socket() and in get_tap_ptr_ring().  I'm not familiar
> enough with the guts of drivers/vhost to tell how easy it is to turn
> into attack, but it looks like trouble.  If nothing else, the pointer
> returned by tun_get_tx_ring() is not guaranteed to be pinned down by
> anything - the reference to sock will _usually_ suffice, but that
> doesn't help any if we get a different socket on that second fget().
>
> One possible way to fix it would be the patch below; objections?
>
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---
> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> index 792ab5f23647..86ea7695241e 100644
> --- a/drivers/vhost/net.c
> +++ b/drivers/vhost/net.c
> @@ -1450,13 +1450,9 @@ static struct socket *get_raw_socket(int fd)
>   	return ERR_PTR(r);
>   }
>   
> -static struct ptr_ring *get_tap_ptr_ring(int fd)
> +static struct ptr_ring *get_tap_ptr_ring(struct file *file)
>   {
>   	struct ptr_ring *ring;
> -	struct file *file = fget(fd);
> -
> -	if (!file)
> -		return NULL;
>   	ring = tun_get_tx_ring(file);
>   	if (!IS_ERR(ring))
>   		goto out;
> @@ -1465,7 +1461,6 @@ static struct ptr_ring *get_tap_ptr_ring(int fd)
>   		goto out;
>   	ring = NULL;
>   out:
> -	fput(file);
>   	return ring;
>   }
>   
> @@ -1553,7 +1548,7 @@ static long vhost_net_set_backend(struct vhost_net *n, unsigned index, int fd)
>   		if (r)
>   			goto err_used;
>   		if (index == VHOST_NET_VQ_RX)
> -			nvq->rx_ring = get_tap_ptr_ring(fd);
> +			nvq->rx_ring = get_tap_ptr_ring(sock->file);


sock could be NULL if we want to stop the vhost-net.

Other looks fine.

Thanks


>   
>   		oldubufs = nvq->ubufs;
>   		nvq->ubufs = ubufs;
>


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [BUG] double fget() in vhost/net (was Re: [PATCH] vfs: move fdput() to right place in ksys_sync_file_range())
  2022-05-16  4:17         ` Jason Wang
@ 2022-05-16  7:54           ` Michael S. Tsirkin
  2022-05-16  8:42             ` Jason Wang
  0 siblings, 1 reply; 13+ messages in thread
From: Michael S. Tsirkin @ 2022-05-16  7:54 UTC (permalink / raw)
  To: Jason Wang; +Cc: Al Viro, Eric Biggers, linux-fsdevel, David S. Miller

On Mon, May 16, 2022 at 12:17:56PM +0800, Jason Wang wrote:
> 
> 在 2022/5/15 11:30, Al Viro 写道:
> > [tun/tap and vhost folks Cc'd]
> > 
> > here's another piece of code assuming that repeated fget() will yield the
> > same opened file: in vhost_net_set_backend() we have
> > 
> >          sock = get_socket(fd);
> >          if (IS_ERR(sock)) {
> >                  r = PTR_ERR(sock);
> >                  goto err_vq;
> >          }
> > 
> >          /* start polling new socket */
> >          oldsock = vhost_vq_get_backend(vq);
> >          if (sock != oldsock) {
> > ...
> >                  vhost_vq_set_backend(vq, sock);
> > ...
> >                  if (index == VHOST_NET_VQ_RX)
> >                          nvq->rx_ring = get_tap_ptr_ring(fd);
> > 
> > with
> > static struct socket *get_socket(int fd)
> > {
> >          struct socket *sock;
> > 
> >          /* special case to disable backend */
> >          if (fd == -1)
> >                  return NULL;
> >          sock = get_raw_socket(fd);
> >          if (!IS_ERR(sock))
> >                  return sock;
> >          sock = get_tap_socket(fd);
> >          if (!IS_ERR(sock))
> >                  return sock;
> >          return ERR_PTR(-ENOTSOCK);
> > }
> > and
> > static struct ptr_ring *get_tap_ptr_ring(int fd)
> > {
> >          struct ptr_ring *ring;
> >          struct file *file = fget(fd);
> > 
> >          if (!file)
> >                  return NULL;
> >          ring = tun_get_tx_ring(file);
> >          if (!IS_ERR(ring))
> >                  goto out;
> >          ring = tap_get_ptr_ring(file);
> >          if (!IS_ERR(ring))
> >                  goto out;
> >          ring = NULL;
> > out:
> >          fput(file);
> >          return ring;
> > }
> > 
> > Again, there is no promise that fd will resolve to the same thing for
> > lookups in get_socket() and in get_tap_ptr_ring().  I'm not familiar
> > enough with the guts of drivers/vhost to tell how easy it is to turn
> > into attack, but it looks like trouble.  If nothing else, the pointer
> > returned by tun_get_tx_ring() is not guaranteed to be pinned down by
> > anything - the reference to sock will _usually_ suffice, but that
> > doesn't help any if we get a different socket on that second fget().
> > 
> > One possible way to fix it would be the patch below; objections?
> > 
> > Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> > ---
> > diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> > index 792ab5f23647..86ea7695241e 100644
> > --- a/drivers/vhost/net.c
> > +++ b/drivers/vhost/net.c
> > @@ -1450,13 +1450,9 @@ static struct socket *get_raw_socket(int fd)
> >   	return ERR_PTR(r);
> >   }
> > -static struct ptr_ring *get_tap_ptr_ring(int fd)
> > +static struct ptr_ring *get_tap_ptr_ring(struct file *file)
> >   {
> >   	struct ptr_ring *ring;
> > -	struct file *file = fget(fd);
> > -
> > -	if (!file)
> > -		return NULL;
> >   	ring = tun_get_tx_ring(file);
> >   	if (!IS_ERR(ring))
> >   		goto out;
> > @@ -1465,7 +1461,6 @@ static struct ptr_ring *get_tap_ptr_ring(int fd)
> >   		goto out;
> >   	ring = NULL;
> >   out:
> > -	fput(file);
> >   	return ring;
> >   }
> > @@ -1553,7 +1548,7 @@ static long vhost_net_set_backend(struct vhost_net *n, unsigned index, int fd)
> >   		if (r)
> >   			goto err_used;
> >   		if (index == VHOST_NET_VQ_RX)
> > -			nvq->rx_ring = get_tap_ptr_ring(fd);
> > +			nvq->rx_ring = get_tap_ptr_ring(sock->file);
> 
> 
> sock could be NULL if we want to stop the vhost-net.

Can you cook up a correct patch then please?

> Other looks fine.
> 
> Thanks
> 
> 
> >   		oldubufs = nvq->ubufs;
> >   		nvq->ubufs = ubufs;
> > 


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [BUG] double fget() in vhost/net (was Re: [PATCH] vfs: move fdput() to right place in ksys_sync_file_range())
  2022-05-16  7:54           ` Michael S. Tsirkin
@ 2022-05-16  8:42             ` Jason Wang
  0 siblings, 0 replies; 13+ messages in thread
From: Jason Wang @ 2022-05-16  8:42 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Al Viro, Eric Biggers, linux-fsdevel, David S. Miller

On Mon, May 16, 2022 at 3:54 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Mon, May 16, 2022 at 12:17:56PM +0800, Jason Wang wrote:
> >
> > 在 2022/5/15 11:30, Al Viro 写道:
> > > [tun/tap and vhost folks Cc'd]
> > >
> > > here's another piece of code assuming that repeated fget() will yield the
> > > same opened file: in vhost_net_set_backend() we have
> > >
> > >          sock = get_socket(fd);
> > >          if (IS_ERR(sock)) {
> > >                  r = PTR_ERR(sock);
> > >                  goto err_vq;
> > >          }
> > >
> > >          /* start polling new socket */
> > >          oldsock = vhost_vq_get_backend(vq);
> > >          if (sock != oldsock) {
> > > ...
> > >                  vhost_vq_set_backend(vq, sock);
> > > ...
> > >                  if (index == VHOST_NET_VQ_RX)
> > >                          nvq->rx_ring = get_tap_ptr_ring(fd);
> > >
> > > with
> > > static struct socket *get_socket(int fd)
> > > {
> > >          struct socket *sock;
> > >
> > >          /* special case to disable backend */
> > >          if (fd == -1)
> > >                  return NULL;
> > >          sock = get_raw_socket(fd);
> > >          if (!IS_ERR(sock))
> > >                  return sock;
> > >          sock = get_tap_socket(fd);
> > >          if (!IS_ERR(sock))
> > >                  return sock;
> > >          return ERR_PTR(-ENOTSOCK);
> > > }
> > > and
> > > static struct ptr_ring *get_tap_ptr_ring(int fd)
> > > {
> > >          struct ptr_ring *ring;
> > >          struct file *file = fget(fd);
> > >
> > >          if (!file)
> > >                  return NULL;
> > >          ring = tun_get_tx_ring(file);
> > >          if (!IS_ERR(ring))
> > >                  goto out;
> > >          ring = tap_get_ptr_ring(file);
> > >          if (!IS_ERR(ring))
> > >                  goto out;
> > >          ring = NULL;
> > > out:
> > >          fput(file);
> > >          return ring;
> > > }
> > >
> > > Again, there is no promise that fd will resolve to the same thing for
> > > lookups in get_socket() and in get_tap_ptr_ring().  I'm not familiar
> > > enough with the guts of drivers/vhost to tell how easy it is to turn
> > > into attack, but it looks like trouble.  If nothing else, the pointer
> > > returned by tun_get_tx_ring() is not guaranteed to be pinned down by
> > > anything - the reference to sock will _usually_ suffice, but that
> > > doesn't help any if we get a different socket on that second fget().
> > >
> > > One possible way to fix it would be the patch below; objections?
> > >
> > > Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> > > ---
> > > diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> > > index 792ab5f23647..86ea7695241e 100644
> > > --- a/drivers/vhost/net.c
> > > +++ b/drivers/vhost/net.c
> > > @@ -1450,13 +1450,9 @@ static struct socket *get_raw_socket(int fd)
> > >     return ERR_PTR(r);
> > >   }
> > > -static struct ptr_ring *get_tap_ptr_ring(int fd)
> > > +static struct ptr_ring *get_tap_ptr_ring(struct file *file)
> > >   {
> > >     struct ptr_ring *ring;
> > > -   struct file *file = fget(fd);
> > > -
> > > -   if (!file)
> > > -           return NULL;
> > >     ring = tun_get_tx_ring(file);
> > >     if (!IS_ERR(ring))
> > >             goto out;
> > > @@ -1465,7 +1461,6 @@ static struct ptr_ring *get_tap_ptr_ring(int fd)
> > >             goto out;
> > >     ring = NULL;
> > >   out:
> > > -   fput(file);
> > >     return ring;
> > >   }
> > > @@ -1553,7 +1548,7 @@ static long vhost_net_set_backend(struct vhost_net *n, unsigned index, int fd)
> > >             if (r)
> > >                     goto err_used;
> > >             if (index == VHOST_NET_VQ_RX)
> > > -                   nvq->rx_ring = get_tap_ptr_ring(fd);
> > > +                   nvq->rx_ring = get_tap_ptr_ring(sock->file);
> >
> >
> > sock could be NULL if we want to stop the vhost-net.
>
> Can you cook up a correct patch then please?

Sent.

Thanks

>
> > Other looks fine.
> >
> > Thanks
> >
> >
> > >             oldubufs = nvq->ubufs;
> > >             nvq->ubufs = ubufs;
> > >
>


^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2022-05-16  8:44 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-05-11 15:45 [PATCH] vfs: move fdput() to right place in ksys_sync_file_range() Chengguang Xu
2022-05-11 15:51 ` Matthew Wilcox
2022-05-11 19:01   ` Eric Biggers
2022-05-11 21:43     ` Al Viro
2022-05-12  0:28       ` Al Viro
2022-05-12  0:42         ` Jens Axboe
2022-05-12  2:03       ` Alexei Starovoitov
2022-05-12 12:48         ` Brian Vazquez
2022-05-15  3:30       ` [BUG] double fget() in vhost/net (was Re: [PATCH] vfs: move fdput() to right place in ksys_sync_file_range()) Al Viro
2022-05-15 16:14         ` Michael S. Tsirkin
2022-05-16  4:17         ` Jason Wang
2022-05-16  7:54           ` Michael S. Tsirkin
2022-05-16  8:42             ` Jason Wang

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.