netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Kui-Feng Lee <sinquersw@gmail.com>
To: Stanislav Fomichev <sdf@google.com>
Cc: Martin KaFai Lau <martin.lau@linux.dev>,
	Eric Dumazet <edumazet@google.com>,
	davem@davemloft.net, linux-kernel@vger.kernel.org,
	netdev@vger.kernel.org, daniel@iogearbox.net,
	Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
	Leon Romanovsky <leon@kernel.org>,
	David Ahern <dsahern@kernel.org>, Arnd Bergmann <arnd@arndb.de>,
	Kees Cook <keescook@chromium.org>,
	Christian Brauner <brauner@kernel.org>,
	Kuniyuki Iwashima <kuniyu@amazon.com>,
	Lennart Poettering <mzxreary@0pointer.de>,
	linux-arch@vger.kernel.org,
	Aleksandr Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>,
	bpf <bpf@vger.kernel.org>
Subject: Re: handling unsupported optlen in cgroup bpf getsockopt: (was [PATCH net-next v4 2/4] net: socket: add sockopts blacklist for BPF cgroup hook)
Date: Tue, 25 Apr 2023 14:28:19 -0700	[thread overview]
Message-ID: <927ddd10-ae5b-886c-6725-3daf04456e52@gmail.com> (raw)
In-Reply-To: <CAKH8qBusi0AWpo_iDaFkLFPUhgZy7-p6JwhimCkpYMhWnToE7g@mail.gmail.com>



On 4/25/23 11:42, Stanislav Fomichev wrote:
> On Tue, Apr 25, 2023 at 10:59 AM Kui-Feng Lee <sinquersw@gmail.com> wrote:
>>
>>
>>
>> On 4/18/23 09:47, Stanislav Fomichev wrote:
>>> On 04/17, Martin KaFai Lau wrote:
>>>> On 4/14/23 6:55 PM, Stanislav Fomichev wrote:
>>>>> On 04/13, Stanislav Fomichev wrote:
>>>>>> On Thu, Apr 13, 2023 at 7:38 AM Aleksandr Mikhalitsyn
>>>>>> <aleksandr.mikhalitsyn@canonical.com> wrote:
>>>>>>>
>>>>>>> On Thu, Apr 13, 2023 at 4:22 PM Eric Dumazet <edumazet@google.com> wrote:
>>>>>>>>
>>>>>>>> On Thu, Apr 13, 2023 at 3:35 PM Alexander Mikhalitsyn
>>>>>>>> <aleksandr.mikhalitsyn@canonical.com> wrote:
>>>>>>>>>
>>>>>>>>> During work on SO_PEERPIDFD, it was discovered (thanks to Christian),
>>>>>>>>> that bpf cgroup hook can cause FD leaks when used with sockopts which
>>>>>>>>> install FDs into the process fdtable.
>>>>>>>>>
>>>>>>>>> After some offlist discussion it was proposed to add a blacklist of
>>>>>>>>
>>>>>>>> We try to replace this word by either denylist or blocklist, even in changelogs.
>>>>>>>
>>>>>>> Hi Eric,
>>>>>>>
>>>>>>> Oh, I'm sorry about that. :( Sure.
>>>>>>>
>>>>>>>>
>>>>>>>>> socket options those can cause troubles when BPF cgroup hook is enabled.
>>>>>>>>>
>>>>>>>>
>>>>>>>> Can we find the appropriate Fixes: tag to help stable teams ?
>>>>>>>
>>>>>>> Sure, I will add next time.
>>>>>>>
>>>>>>> Fixes: 0d01da6afc54 ("bpf: implement getsockopt and setsockopt hooks")
>>>>>>>
>>>>>>> I think it's better to add Stanislav Fomichev to CC.
>>>>>>
>>>>>> Can we use 'struct proto' bpf_bypass_getsockopt instead? We already
>>>>>> use it for tcp zerocopy, I'm assuming it should work in this case as
>>>>>> well?
>>>>>
>>>>> Jakub reminded me of the other things I wanted to ask here bug forgot:
>>>>>
>>>>> - setsockopt is probably not needed, right? setsockopt hook triggers
>>>>>      before the kernel and shouldn't leak anything
>>>>> - for getsockopt, instead of bypassing bpf completely, should we instead
>>>>>      ignore the error from the bpf program? that would still preserve
>>>>>      the observability aspect
>>>>
>>>> stealing this thread to discuss the optlen issue which may make sense to
>>>> bypass also.
>>>>
>>>> There has been issue with optlen. Other than this older post related to
>>>> optlen > PAGE_SIZE:
>>>> https://lore.kernel.org/bpf/5c8b7d59-1f28-2284-f7b9-49d946f2e982@linux.dev/,
>>>> the recent one related to optlen that we have seen is
>>>> NETLINK_LIST_MEMBERSHIPS. The userspace passed in optlen == 0 and the kernel
>>>> put the expected optlen (> 0) and 'return 0;' to userspace. The userspace
>>>> intention is to learn the expected optlen. This makes 'ctx.optlen >
>>>> max_optlen' and __cgroup_bpf_run_filter_getsockopt() ends up returning
>>>> -EFAULT to the userspace even the bpf prog has not changed anything.
>>>
>>> (ignoring -EFAULT issue) this seems like it needs to be
>>>
>>>        if (optval && (ctx.optlen > max_optlen || ctx.optlen < 0)) {
>>>                /* error */
>>>        }
>>>
>>> ?
>>>
>>>> Does it make sense to also bypass the bpf prog when 'ctx.optlen >
>>>> max_optlen' for now (and this can use a separate patch which as usual
>>>> requires a bpf selftests)?
>>>
>>> Yeah, makes sense. Replacing this -EFAULT with WARN_ON_ONCE or something
>>> seems like the way to go. It caused too much trouble already :-(
>>>
>>> Should I prepare a patch or do you want to take a stab at it?
>>>
>>>> In the future, does it make sense to have a specific cgroup-bpf-prog (a
>>>> specific attach type?) that only uses bpf_dynptr kfunc to access the optval
>>>> such that it can enforce read-only for some optname and potentially also
>>>> track if bpf-prog has written a new optval? The bpf-prog can only return 1
>>>> (OK) and only allows using bpf_set_retval() instead. Likely there is still
>>>> holes but could be a seed of thought to continue polishing the idea.
>>>
>>> Ack, let's think about it.
>>>
>>> Maybe we should re-evaluate 'getsockopt-happens-after-the-kernel' idea
>>> as well? If we can have a sleepable hook that can copy_from_user/copy_to_user,
>>> and we have a mostly working bpf_getsockopt (after your refactoring),
>>> I don't see why we need to continue the current scheme of triggering
>>> after the kernel?
>>
>> Since a sleepable hook would cause some restrictions, perhaps, we could
>> introduce something like the promise pattern.  In our case here, BPF
>> program call an async version of copy_from_user()/copy_to_user() to
>> return a promise.
> 
> Having a promise might work. This is essentially what we already do
> with sockets/etc with acquire/release pattern.

Would you mind to give me some context of the socket things?

> 
> What are the sleepable restrictions you're hinting about? I feel like
> with the sleepable bpf, we can also remove all the temporary buffer
> management / extra copies which sounds like a win to me. (we have this
> ugly heuristics with BPF_SOCKOPT_KERN_BUF_SIZE) The program can
> allocate temporary buffers if needed..
> 
>>>>> - or maybe we can even have a per-proto bpf_getsockopt_cleanup call that
>>>>>      gets called whenever bpf returns an error to make sure protocols have
>>>>>      a chance to handle that condition (and free the fd)
>>>>>
>>>>
>>>>

  parent reply	other threads:[~2023-04-25 21:28 UTC|newest]

Thread overview: 27+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-04-13 13:33 [PATCH net-next v4 0/4] Add SCM_PIDFD and SO_PEERPIDFD Alexander Mikhalitsyn
2023-04-13 13:33 ` [PATCH net-next v4 1/4] scm: add SO_PASSPIDFD and SCM_PIDFD Alexander Mikhalitsyn
2023-04-17 15:18   ` Christian Brauner
2023-04-17 16:01     ` Aleksandr Mikhalitsyn
2023-04-17 17:16       ` Christian Brauner
2023-04-17 17:43         ` Eric Dumazet
2023-04-18  8:16           ` Christian Brauner
2023-04-18 13:07   ` Christian Brauner
2023-04-13 13:33 ` [PATCH net-next v4 2/4] net: socket: add sockopts blacklist for BPF cgroup hook Alexander Mikhalitsyn
2023-04-13 14:21   ` Eric Dumazet
2023-04-13 14:38     ` Aleksandr Mikhalitsyn
2023-04-13 16:37       ` Stanislav Fomichev
2023-04-15  1:55         ` Stanislav Fomichev
2023-04-17 14:42           ` Christian Brauner
2023-04-17 18:10             ` Stanislav Fomichev
2023-04-18  1:03           ` handling unsupported optlen in cgroup bpf getsockopt: (was [PATCH net-next v4 2/4] net: socket: add sockopts blacklist for BPF cgroup hook) Martin KaFai Lau
2023-04-18 16:47             ` Stanislav Fomichev
2023-04-25 17:59               ` Kui-Feng Lee
2023-04-25 18:42                 ` Stanislav Fomichev
2023-04-25 21:11                   ` Kui-Feng Lee
2023-04-26 15:50                     ` Stanislav Fomichev
2023-04-25 21:28                   ` Kui-Feng Lee [this message]
2023-04-26 15:50                     ` Stanislav Fomichev
2023-04-15  0:51       ` [PATCH net-next v4 2/4] net: socket: add sockopts blacklist for BPF cgroup hook Jakub Kicinski
2023-04-13 13:33 ` [PATCH net-next v4 3/4] net: core: add getsockopt SO_PEERPIDFD Alexander Mikhalitsyn
2023-04-18 13:13   ` Christian Brauner
2023-04-13 13:33 ` [PATCH net-next v4 4/4] selftests: net: add SCM_PIDFD / SO_PEERPIDFD test Alexander Mikhalitsyn

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=927ddd10-ae5b-886c-6725-3daf04456e52@gmail.com \
    --to=sinquersw@gmail.com \
    --cc=aleksandr.mikhalitsyn@canonical.com \
    --cc=arnd@arndb.de \
    --cc=bpf@vger.kernel.org \
    --cc=brauner@kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=davem@davemloft.net \
    --cc=dsahern@kernel.org \
    --cc=edumazet@google.com \
    --cc=keescook@chromium.org \
    --cc=kuba@kernel.org \
    --cc=kuniyu@amazon.com \
    --cc=leon@kernel.org \
    --cc=linux-arch@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=martin.lau@linux.dev \
    --cc=mzxreary@0pointer.de \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    --cc=sdf@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).