From: Martin KaFai Lau <martin.lau@linux.dev>
To: Xin Long <lucien.xin@gmail.com>, Stefan Metzmacher <metze@samba.org>
Cc: network dev <netdev@vger.kernel.org>,
davem@davemloft.net, kuba@kernel.org,
Eric Dumazet <edumazet@google.com>,
Paolo Abeni <pabeni@redhat.com>,
Steve French <smfrench@gmail.com>,
Namjae Jeon <linkinjeon@kernel.org>,
Chuck Lever III <chuck.lever@oracle.com>,
Jeff Layton <jlayton@kernel.org>,
Sabrina Dubroca <sd@queasysnail.net>,
Tyler Fanelli <tfanelli@redhat.com>,
Pengtao He <hepengtao@xiaomi.com>,
"linux-cifs@vger.kernel.org" <linux-cifs@vger.kernel.org>,
Samba Technical <samba-technical@lists.samba.org>,
bpf <bpf@vger.kernel.org>
Subject: Re: [RFC PATCH net-next 0/5] net: In-kernel QUIC implementation with Userspace handshake
Date: Thu, 25 Apr 2024 21:58:50 -0700
Message-ID: <840ddcb4-acaa-4ce4-ad56-e2d14b447907@linux.dev>
In-Reply-To: <CADvbK_e7i08GAiOenJNTP_m+-MeYjSf7J-vkF+hgRfYGNCjkwQ@mail.gmail.com>
On 4/22/24 1:58 PM, Xin Long wrote:
> On Sun, Apr 21, 2024 at 3:27 PM Stefan Metzmacher <metze@samba.org> wrote:
>>
>> Am 20.04.24 um 21:32 schrieb Xin Long:
>>> On Fri, Apr 19, 2024 at 3:19 PM Xin Long <lucien.xin@gmail.com> wrote:
>>>>
>>>> On Fri, Apr 19, 2024 at 2:51 PM Stefan Metzmacher <metze@samba.org> wrote:
>>>>>
>>>>> Hi Xin Long,
>>>>>
>>>>>>> But I think it's unavoidable for the ALPN and SNI fields on
>>>>>>> the server side. As every service tries to use udp port 443
>>>>>>> and somehow that needs to be shared if multiple services want to
>>>>>>> use it.
>>>>>>>
>>>>>>> I guess on the acceptor side we would need to somehow detach low level
>>>>>>> udp struct sock from the logical listen struct sock.
>>>>>>>
>>>>>>> And quic_do_listen_rcv() would need to find the correct logical listening
>>>>>>> socket and call quic_request_sock_enqueue() on the logical socket
>>>>>>> not the low-level udp socket. The same goes for everything happening after
>>>>>>> quic_request_sock_enqueue() at the end of quic_do_listen_rcv.
>>>>>>>
>>>>>> The implementation allows one low level UDP sock to serve for multiple
>>>>>> QUIC socks.
>>>>>>
>>>>>> Currently, if your 3 quic applications listen to the same address:port
>>>>>> with SO_REUSEPORT socket option set, the incoming connection will choose
>>>>>> one of your applications randomly with hash(client_addr+port) via
>>>>>> reuseport_select_sock() in quic_sock_lookup().
>>>>>>
>>>>>> It should be easy to do a further match on ALPN between these 3 quic
>>>>>> socks that listen on the same address:port to get the right quic sock,
>>>>>> instead of choosing one randomly.
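To make the "further match" concrete, here is a rough userspace sketch, not
code from the patchset. It assumes each listener advertises exactly one ALPN
string; struct listener and select_listener() are hypothetical names. The
hash is spread over only the listeners whose ALPN matches, using the same
multiply-and-shift idea as the kernel's reciprocal_scale().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for one listening QUIC socket in a reuseport group. */
struct listener {
	const char *alpn;	/* single ALPN this listener serves, e.g. "h3" */
};

/* Map a 32-bit hash onto [0, n), same idea as the kernel's reciprocal_scale(). */
static uint32_t scale(uint32_t hash, uint32_t n)
{
	return (uint32_t)(((uint64_t)hash * n) >> 32);
}

/*
 * Pick a listener for an incoming connection: only listeners whose ALPN
 * matches are candidates, and the hash chooses among those candidates.
 * Returns the index into ls[], or -1 if no listener serves this ALPN.
 */
int select_listener(const struct listener *ls, uint32_t n,
		    const char *alpn, uint32_t hash)
{
	uint32_t matches = 0, pick, i;

	assert(ls != NULL && alpn != NULL);

	for (i = 0; i < n; i++)
		if (strcmp(ls[i].alpn, alpn) == 0)
			matches++;
	if (matches == 0)
		return -1;

	/* Index of the chosen listener among the matching ones. */
	pick = scale(hash, matches);
	for (i = 0; i < n; i++) {
		if (strcmp(ls[i].alpn, alpn) != 0)
			continue;
		if (pick-- == 0)
			return (int)i;
	}
	return -1;
}
```

With listeners {"h3", "smb", "smb"}, an "h3" connection always lands on
index 0, while "smb" connections are spread over indexes 1 and 2 by hash.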
>>>>>
>>>>> Ah, that sounds good.
>>>>>
>>>>>> The problem is to parse the TLS Client_Hello message to get the ALPN in
>>>>>> quic_sock_lookup(), which is not a proper thing to do in kernel, and
>>>>>> might be rejected by networking maintainers, I need to check with them.
>>>>>
>>>>> Is the reassembling of CRYPTO frames done in the kernel or
>>>>> userspace? Can you point me to the place in the code?
>>>> In quic_inq_handshake_tail() in the kernel. The Client Initial packet
>>>> is processed when calling accept(); this is the path:
>>>>
>>>> quic_accept()-> quic_accept_sock_init() -> quic_packet_process() ->
>>>> quic_packet_handshake_process() -> quic_frame_process() ->
>>>> quic_frame_crypto_process() -> quic_inq_handshake_tail().
>>>>
>>>> Note that it's with the accept sock, not the listen sock.
>>>>
>>>>>
>>>>> If it's really impossible to do in C code maybe
>>>>> registering a bpf function in order to allow a listener
>>>>> to check the initial quic packet and decide if it wants to serve
>>>>> that connection would be possible as last resort?
>>>> That's a smart idea! man.
>>>> I think the bpf hook in reuseport_select_sock() is meant to do such
>>>> selection.
>>>>
>>>> For the Client initial packet (the only packet you need to handle),
>>>> I doubt you will need to do the reassembling, as the Client Hello TLS
>>>> message is always less than 400 bytes in my env.
>>>>
>>>> But I think you need to do the decryption for the Client initial packet
>>>> before decoding it then parsing the TLS message from its crypto frame.
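As a rough illustration of the parsing step only, the sketch below pulls the
first ALPN protocol name out of a plaintext ClientHello handshake message.
It is ordinary userspace C, not a BPF program and not code from the patchset.
It assumes decryption is already done, that the CRYPTO frame payload has been
located, and that the whole ClientHello sits in one contiguous buffer. The
names struct cur, take(), take_vec() and client_hello_first_alpn() are
hypothetical. The field layout follows RFC 8446 (ClientHello) and RFC 7301
(ALPN, extension type 16).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Bounds-checked read cursor over a byte buffer. */
struct cur {
	const uint8_t *p;
	size_t len;
};

/* Consume n bytes; optionally expose them through out. -1 if too short. */
static int take(struct cur *c, size_t n, struct cur *out)
{
	if (c->len < n)
		return -1;
	if (out) {
		out->p = c->p;
		out->len = n;
	}
	c->p += n;
	c->len -= n;
	return 0;
}

/* Consume a w-byte big-endian length prefix followed by that many bytes. */
static int take_vec(struct cur *c, size_t w, struct cur *out)
{
	struct cur l;
	size_t n = 0, i;

	if (take(c, w, &l))
		return -1;
	for (i = 0; i < w; i++)
		n = (n << 8) | l.p[i];
	return take(c, n, out);
}

/*
 * Copy the first ALPN protocol name of a ClientHello into out as a
 * NUL-terminated string. Returns 0 on success, -1 if the message is
 * malformed, is not a ClientHello, or carries no ALPN extension.
 */
int client_hello_first_alpn(const uint8_t *msg, size_t len,
			    char *out, size_t outlen)
{
	struct cur c = { msg, len }, body, exts, ext_type, ext, list, name;

	assert(msg != NULL && out != NULL);

	if (len < 1 || msg[0] != 0x01)		/* HandshakeType client_hello */
		return -1;
	if (take(&c, 1, NULL) || take_vec(&c, 3, &body))
		return -1;
	if (take(&body, 2 + 32, NULL))		/* legacy_version + random */
		return -1;
	if (take_vec(&body, 1, NULL))		/* legacy_session_id */
		return -1;
	if (take_vec(&body, 2, NULL))		/* cipher_suites */
		return -1;
	if (take_vec(&body, 1, NULL))		/* legacy_compression_methods */
		return -1;
	if (take_vec(&body, 2, &exts))		/* extensions */
		return -1;

	while (exts.len > 0) {
		if (take(&exts, 2, &ext_type) || take_vec(&exts, 2, &ext))
			return -1;
		if (((ext_type.p[0] << 8) | ext_type.p[1]) != 16)
			continue;		/* not the ALPN extension */
		/* ProtocolNameList: 2-byte list length, 1-byte name lengths. */
		if (take_vec(&ext, 2, &list) || take_vec(&list, 1, &name))
			return -1;
		if (name.len == 0 || name.len >= outlen)
			return -1;
		memcpy(out, name.p, name.len);
		out[name.len] = '\0';
		return 0;
	}
	return -1;
}
```

A real bpf selector would have to express the same bounds checks in a form
the verifier accepts, which is likely the harder part.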
>>> I created this patch:
>>>
>>> https://github.com/lxin/quic/commit/aee0b7c77df3f39941f98bb901c73fdc560befb8
>>>
>>> to do this decryption in quic_sock_lookup() before calling
>>> reuseport_select_sock(), so that it provides the bpf selector with
>>> a plain-text QUIC initial packet:
>>>
>>> https://datatracker.ietf.org/doc/html/rfc9000#section-17.2.2
>>>
>>> If it's complex for you to do the decryption for the initial packet in
>>> the bpf selector, I will apply this patch. Please let me know.
>>
>> I guess in addition to quic_server_handshake(), which is called
>> after accept(), there should be quic_server_prepare_listen()
>> (and something similar for in-kernel servers) that sets up the reuseport
>> magic for the socket, so that it's not needed in every application.
> It's done when calling listen(), see quic_inet_listen()->quic_hash()
> where only listening sockets with its sk_reuseport set will be
> added into the reuseport group.
>
> It means SO_REUSEPORT sockopt must be set for every socket
> before calling listen().
>
>>
>> It seems there is only a single ebpf program possible per
>> reuseport group, so there has to be just a single one.
> Yes, a single ebpf program per reuseport group should work.
> See prepare_sk_fds() in the kernel selftests for select_reuseport bpf.
>
>>
>> But is it possible for in-kernel servers to also register an eBPF program?
> Good question. TBH, I don't really know much about eBPF programming.
> I guess the real problem is how you pass the .o file to kernel space?
>
> Another question is, in the selftests:
> tools/testing/selftests/bpf/prog_tests/s
> tools/testing/selftests/bpf/progs/test_select_reuseport_kern.c
>
> it created a global reuseport_array, and then added these sockets
> into this array for the later lookup, but these sockets are all created
> in the same process.
>
> But your case is that the sockets are created in different processes.
> I'm not sure if it's possible to add sockets from different processes
> into the same reuseport_array?
>
> Added Martin who introduced BPF_PROG_TYPE_SK_REUSEPORT,
> I guess he may know the answers.
I didn't read the patchset, so I don't know what is meant to be done.
From the questions in this and the next email:
The reuseport_array is a bpf map. Like any bpf map, it can be shared across
different processes, meaning different processes can add sk to the map.
The bpf prog that selects a sk from the reuseport_array is set by the userspace
through setsockopt(SO_ATTACH_REUSEPORT_EBPF). It is the only way right now, iirc.
If you can summarize what you want to do, it could help to see if there
are ways that work for the use case.
>
> Thanks.
>