netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Andrii Nakryiko <andrii.nakryiko@gmail.com>
To: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: David Ahern <dsahern@gmail.com>, bpf <bpf@vger.kernel.org>,
	Alexei Starovoitov <alexei.starovoitov@gmail.com>,
	Networking <netdev@vger.kernel.org>,
	Daniel Borkmann <borkmann@iogearbox.net>,
	Lorenzo Bianconi <lorenzo@kernel.org>
Subject: Re: [PATCH bpf 1/3] bpf: syscall to start at file-descriptor 1
Date: Mon, 8 Jun 2020 13:05:18 -0700	[thread overview]
Message-ID: <CAEf4Bzah_5jrFcvbk+mLa31KQjY77JdoXw=cPGAYw92X8nN=gw@mail.gmail.com> (raw)
In-Reply-To: <20200608214437.5f7766ab@carbon>

-- Andrii

On Mon, Jun 8, 2020 at 12:45 PM Jesper Dangaard Brouer
<brouer@redhat.com> wrote:
>
> On Mon, 8 Jun 2020 11:36:33 -0700
> Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:
>
> > On Mon, Jun 8, 2020 at 9:51 AM Jesper Dangaard Brouer <brouer@redhat.com> wrote:
> > >
> > > This patch change BPF syscall to avoid returning file descriptor value zero.
> > >
> > > As mentioned in cover letter, it is very impractical when extending kABI
> > > that the file-descriptor value 'zero' is valid, as this requires new fields
> > > must be initialised as minus-1. First step is to change the kernel such that
> > > BPF-syscall simply doesn't return value zero as a FD number.
> > >
> > > This patch achieves this by similar code to anon_inode_getfd(), with the
> > > exception of getting unused FD starting from 1. The kernel already supports
> > > starting from a specific FD value, as this is used by f_dupfd(). It seems
> > > simpler to replicate part of anon_inode_getfd() code and use this start from
> > > offset feature, instead of using f_dupfd() handling afterwards.
> >
> > Wouldn't it be better to just handle that on libbpf side? That way it
> > works on all kernels and doesn't require this duplication of logic
> > inside kernel?
>
> IMHO this is needed on the kernel side, to pair it with the API change.
> I don't see how doing this in libbpf can cover all cases.
>

In practice, it's going to be very rare that fd=0 is not already
allocated for application. So even not doing anything is going to work
in 99.9% of cases.

Think about this, any realistic BPF program will have a map or global
variable associated with it. So in the rare case where we have FD 0
not used, map will get that FD. Even if not, if you load your program
from ELF file, that ELF file will get FD 0. Even if not, modern BPF
programs will have BTF object associated, which will get FD 0. And so
on... Even daemons will probably already have some FD open for
whatever logging/output it needs to do (it doesn't have to be stdout).
So this FD=0 is very hypothetical case, very. You have to essentially
stage it consciously, to actually hit it.

Second of all, this BPF-specific FD allocation logic is just that --
duplication and extra code to maintain. Other folks in kernel will
eventually just be "huh? bpf needs its own anon_file API, why?!..." Do
we really want more of that?

Third, you already missed anon_inode_getfile use in bpf_link_prime(),
and in the future we'll undoubtedly will miss something else, so this
FD >= 1 from kernel will work only sometimes and no one will notice
until it breaks for someone, which won't happen for a while (because
it works as is most of the time, see above).

I'm not even talking about the fact that we are also subverting
standard Linux promise that a user gets the smallest available FD in
the system...

So I think libbpf can be kind to user and do:

int fd = bpf_whatever();
if (fd == 0) {
    fd = dup(0);
    close(0);
}

But even if it doesn't, not a big deal and probably no one ever will
have this problem with FD 0 for BPF program.

> First of all, some users might not be using libbpf.
>
> Second, a userspace application could be using an older version of
> libbpf on a newer kernel. (Notice this is not only due to older
> distros, but also because projects using git submodule libbpf will
> freeze at some point in time that worked).

Theoretically this is a problem, in practice libbpf is always more
up-to-date compared to kernel... so I don't buy this argument,
honestly :)

>
> --
> Best regards,
>   Jesper Dangaard Brouer
>   MSc.CS, Principal Kernel Engineer at Red Hat
>   LinkedIn: http://www.linkedin.com/in/brouer
>

  reply	other threads:[~2020-06-08 20:05 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-06-08 16:51 [PATCH bpf 0/3] bpf: avoid using/returning file descriptor value zero Jesper Dangaard Brouer
2020-06-08 16:51 ` [PATCH bpf 1/3] bpf: syscall to start at file-descriptor 1 Jesper Dangaard Brouer
2020-06-08 18:28   ` Toke Høiland-Jørgensen
2020-06-08 18:36   ` Andrii Nakryiko
2020-06-08 19:44     ` Jesper Dangaard Brouer
2020-06-08 20:05       ` Andrii Nakryiko [this message]
2020-06-08 19:00   ` kernel test robot
2020-06-08 19:55   ` kernel test robot
2020-06-08 19:55   ` [RFC PATCH] bpf: bpf_anon_inode_getfd() can be static kernel test robot
2020-06-08 20:39   ` [PATCH bpf 1/3] bpf: syscall to start at file-descriptor 1 kernel test robot
2020-06-08 20:42   ` kernel test robot
2020-06-08 16:51 ` [PATCH bpf 2/3] bpf: devmap adjust uapi for attach bpf program Jesper Dangaard Brouer
2020-06-08 18:30   ` Toke Høiland-Jørgensen
2020-06-08 16:51 ` [PATCH bpf 3/3] bpf: selftests and tools use struct bpf_devmap_val from uapi Jesper Dangaard Brouer
2020-06-09  1:34 ` [PATCH bpf 0/3] bpf: avoid using/returning file descriptor value zero Alexei Starovoitov
2020-06-09  9:55   ` Jesper Dangaard Brouer
2020-06-09 13:31   ` [PATCH bpf V2 0/2] bpf: adjust uapi for devmap prior to kernel release Jesper Dangaard Brouer
2020-06-09 13:31     ` [PATCH bpf V2 1/2] bpf: devmap adjust uapi for attach bpf program Jesper Dangaard Brouer
2020-06-09 13:47       ` David Ahern
2020-06-09 15:12         ` Jesper Dangaard Brouer
2020-06-09 13:31     ` [PATCH bpf V2 2/2] bpf: selftests and tools use struct bpf_devmap_val from uapi Jesper Dangaard Brouer
2020-06-09 19:03     ` [PATCH bpf V2 0/2] bpf: adjust uapi for devmap prior to kernel release Alexei Starovoitov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAEf4Bzah_5jrFcvbk+mLa31KQjY77JdoXw=cPGAYw92X8nN=gw@mail.gmail.com' \
    --to=andrii.nakryiko@gmail.com \
    --cc=alexei.starovoitov@gmail.com \
    --cc=borkmann@iogearbox.net \
    --cc=bpf@vger.kernel.org \
    --cc=brouer@redhat.com \
    --cc=dsahern@gmail.com \
    --cc=lorenzo@kernel.org \
    --cc=netdev@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).