All of lore.kernel.org
 help / color / mirror / Atom feed
From: Alexei Starovoitov <alexei.starovoitov@gmail.com>
To: Al Viro <viro@zeniv.linux.org.uk>, bpf <bpf@vger.kernel.org>,
	Brian Vazquez <brianvv@google.com>, Yonghong Song <yhs@fb.com>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Andrii Nakryiko <andrii@kernel.org>
Cc: Eric Biggers <ebiggers@kernel.org>,
	Matthew Wilcox <willy@infradead.org>,
	Chengguang Xu <cgxu519@mykernel.net>,
	Linux-Fsdevel <linux-fsdevel@vger.kernel.org>,
	Xu Kuohai <xukuohai@huawei.com>,
	Alexei Starovoitov <ast@kernel.org>
Subject: Re: [PATCH] vfs: move fdput() to right place in ksys_sync_file_range()
Date: Wed, 11 May 2022 19:03:49 -0700	[thread overview]
Message-ID: <CAADnVQ+tcrDQxb769yMYvyzPHcgZXozqYq0uj4QHi+kjzBYTvQ@mail.gmail.com> (raw)
In-Reply-To: <YnwuEt2Xm1iPjW7S@zeniv-ca.linux.org.uk>

On Wed, May 11, 2022 at 2:43 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> [bpf folks Cc'd]
>
> On Wed, May 11, 2022 at 07:01:34PM +0000, Eric Biggers wrote:
> > On Wed, May 11, 2022 at 04:51:34PM +0100, Matthew Wilcox wrote:
> > > On Wed, May 11, 2022 at 11:45:03AM -0400, Chengguang Xu wrote:
> > > > Move fdput() to right place in ksys_sync_file_range() to
> > > > avoid fdput() after failed fdget().
> > >
> > > Why?  fdput() is already conditional on FDPUT_FPUT so you're ...
> > > optimising the failure case?
> >
> > "fdput() after failed fdget()" has confused people before, so IMO it's worth
> > cleaning this up.  But the commit message should make clear that it's a cleanup,
> > not a bug fix.  Also I recommend using an early return:
> >
> >       f = fdget(fd);
> >       if (!f.file)
> >               return -EBADF;
> >       ret = sync_file_range(f.file, offset, nbytes, flags);
> >       fdput(f);
> >       return ret;
>
> FWIW, fdput() after failed fdget() is rare, but there's no fundamental reasons
> why it would be wrong.  No objections against that patch, anyway.
>
> Out of curiousity, I've just looked at the existing users.  In mainline we have
> 203 callers of fdput()/fdput_pos(); all but 7 never get reached with NULL ->file.
>
> 1) There's ksys_sync_file_range(), kernel_read_file_from_fd() and ksys_readahead() -
> all with similar pattern.  I'm not sure that for readahead(2) "not opened for
> read" should yield the same error as "bad descriptor", but since it's been a part
> of userland ABI for a while...
>
> 2) two callers in perf_event_open(2) are playing silly buggers with explicit
>         struct fd group = {NULL, 0};
> and rely upon "fdput() is a no-op if we hadn't touched that" (note that if
> we try to touch it and get NULL ->file from fdget(), we do not hit those fdput()
> at all).
>
> 3) ovl_aio_put() is hard to follow (and some of the callers are poking
> where they shouldn't), no idea if it's correct.  struct fd is manually
> constructed there, anyway.
>
> 4) bpf generic_map_update_batch() is really asking for trouble.  The comment in
> there is wrong:
>         f = fdget(ufd); /* bpf_map_do_batch() guarantees ufd is valid */
> *NOTHING* we'd done earlier can guarantee that.  We might have a descriptor
> table shared with another thread, and it might have very well done dup2() since
> the last time we'd looked things up.  IOW, this fdget() is racy - the function
> assumes it refers to the same thing that gave us map back in bpf_map_do_batch(),
> but it's not guaranteed at all.
>
> I hadn't put together a reproducer, but that code is very suspicious.  As a general
> rule, you should treat descriptor table as shared object, modifiable by other
> threads.  It can be explicitly locked and it can be explicitly unshared, but
> short of that doing a lookup for the same descriptor twice in a row can yield
> different results.
>
> What's going on there?  Do you really want the same struct file you've got back in
> bpf_map_do_batch() (i.e. the one you've got the map from)?  What should happen
> if the descriptor changes its meaning during (or after) the operation?

Interesting.
If I got this right... in the following:

f = fdget(ufd);
map = __bpf_map_get(f);
if (IS_ERR(map))
   return PTR_ERR(map);
...
f = fdget(ufd);
here there are no guarantees that 'f' is valid and points
to the same map.
Argh. In hindsight that makes sense.

generic_map_update_batch calls bpf_map_update_value.
That could use 'f' for prog_array, fd_array and hash_of_maps
types of maps.
The first two types don't' define .map_update_batch callback.
So BPF_DO_BATCH(map->ops->map_update_batch); will error out
with -ENOTSUPP since that callback is NULL for that map type.

The hash_of_maps does seem to support it, but
that's an odd one to use with batch access.

Anyhow we certainly need to clean this up.

Brian,
do you mind fixing it up, since you've added that
secondary fdget() in the first place?

Thanks!

  parent reply	other threads:[~2022-05-12  2:04 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-05-11 15:45 [PATCH] vfs: move fdput() to right place in ksys_sync_file_range() Chengguang Xu
2022-05-11 15:51 ` Matthew Wilcox
2022-05-11 19:01   ` Eric Biggers
2022-05-11 21:43     ` Al Viro
2022-05-12  0:28       ` Al Viro
2022-05-12  0:42         ` Jens Axboe
2022-05-12  2:03       ` Alexei Starovoitov [this message]
2022-05-12 12:48         ` Brian Vazquez
2022-05-15  3:30       ` [BUG] double fget() in vhost/net (was Re: [PATCH] vfs: move fdput() to right place in ksys_sync_file_range()) Al Viro
2022-05-15 16:14         ` Michael S. Tsirkin
2022-05-16  4:17         ` Jason Wang
2022-05-16  7:54           ` Michael S. Tsirkin
2022-05-16  8:42             ` Jason Wang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAADnVQ+tcrDQxb769yMYvyzPHcgZXozqYq0uj4QHi+kjzBYTvQ@mail.gmail.com \
    --to=alexei.starovoitov@gmail.com \
    --cc=andrii@kernel.org \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=brianvv@google.com \
    --cc=cgxu519@mykernel.net \
    --cc=daniel@iogearbox.net \
    --cc=ebiggers@kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=viro@zeniv.linux.org.uk \
    --cc=willy@infradead.org \
    --cc=xukuohai@huawei.com \
    --cc=yhs@fb.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.