* More flexible BPF socket inet_lookup hooking after listening sockets are dispatched
@ 2021-01-20 13:40 Shanti Lombard née Bouchez-Mongardé
  2021-01-20 21:06 ` Alexei Starovoitov
  0 siblings, 1 reply; 5+ messages in thread
From: Shanti Lombard née Bouchez-Mongardé @ 2021-01-20 13:40 UTC (permalink / raw)
  To: linux-kernel

Hello,

I believe this is my first time here, so please excuse any mistakes.
Also, please Cc me on replies.

Background: I am currently investigating how to run network services on a
machine without using network namespaces while still keeping them
isolated. To do that, I allocated separate IP addresses (from 127.0.0.0/8
for IPv4 and from a ULA prefix within fd00::/8 for IPv6), and the services
are forced to listen on their own address only. For some of them, I
enforce this with seccomp, using a small utility I wrote:
<https://github.com/mildred/force-bind-seccomp>. Now, I still want a few
selected services (reverse proxies) to listen on the public address, but
they can't simply listen on INADDR_ANY because other services might
already be listening on the same port on their private IP. It seems
SO_REUSEADDR can be used to work around this on BSD, but not on Linux.
After much research, I found Cloudflare's recent contribution (explained
at <https://blog.cloudflare.com/its-crowded-in-here/>): inet_lookup BPF
programs, which could replace INADDR_ANY listening.
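The Linux side of this conflict can be reproduced from userspace. The sketch below is illustrative only: the function name wildcard_bind_errno is made up, and 127.0.0.1 stands in for a service's private address because it always exists in 127.0.0.0/8. It binds a listening socket to 127.0.0.1 on a kernel-chosen port, then tries to bind a second socket to INADDR_ANY on the same port, with SO_REUSEADDR set on both. On Linux, the second bind is expected to fail with EADDRINUSE, because a listening socket on a specific address conflicts with a wildcard bind on the same port.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Returns the errno of the INADDR_ANY bind (0 if it succeeds),
 * or -1 if the sockets could not be set up. */
static int wildcard_bind_errno(void)
{
    int one = 1, result = -1;
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);

    /* "Private" service: listens on 127.0.0.1 only, kernel-chosen port. */
    int priv = socket(AF_INET, SOCK_STREAM, 0);
    if (priv < 0)
        return -1;
    setsockopt(priv, SOL_SOCKET, SO_REUSEADDR, &one, sizeof(one));
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;
    if (bind(priv, (struct sockaddr *)&addr, sizeof(addr)) == 0 &&
        listen(priv, 1) == 0 &&
        getsockname(priv, (struct sockaddr *)&addr, &len) == 0) {
        assert(addr.sin_port != 0); /* the kernel assigned a port */

        /* "Reverse proxy": same port, wildcard address. */
        int proxy = socket(AF_INET, SOCK_STREAM, 0);
        if (proxy >= 0) {
            setsockopt(proxy, SOL_SOCKET, SO_REUSEADDR, &one, sizeof(one));
            addr.sin_addr.s_addr = htonl(INADDR_ANY);
            result = bind(proxy, (struct sockaddr *)&addr, sizeof(addr)) == 0
                         ? 0 : errno;
            close(proxy);
        }
    }
    close(priv);
    return result;
}
```

SO_REUSEPORT is deliberately not set here, since it follows different sharing rules.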

The inet_lookup BPF programs hook into the socket selection code for
incoming packets. They run after packets belonging to existing connections
have been dispatched to their respective sockets, but before any new
connection is dispatched to a listening socket. This is well explained in
the blog post.

However, I believe that being able to hook in later in the process would
enable useful applications. In its current position, the BPF program can
override any listening socket too easily. It can also surprise
administrators used to the socket API, who may not understand why their
listening socket does not receive any packets.

Socket selection process (in net/ipv4/inet_hashtables.c function 
__inet_lookup_listener):

- A: look for already connected sockets (before __inet_lookup_listener)
- B: run the inet_lookup BPF programs
- C: look for listening sockets bound to the exact address and port
- D: (proposed) another inet_lookup BPF hook
- E: look for sockets listening on INADDR_ANY
- F: (proposed) another inet_lookup BPF hook

At position D, a BPF program could implement listening the way INADDR_ANY
listening works, but without the restriction that the port must not
already be in use on another IP address.

At position F, a BPF program could redirect new connection attempts to a
socket of its choice, intercepting any connection attempt that has not
already been caught by a listening socket.
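The lookup order above, including the proposed hooks at D and F, can be sketched as a toy userspace model. This is not kernel code. Addresses are plain integers, 0 stands for INADDR_ANY, a "program" simply returns a socket to steer to or NULL to pass, and all names (lookup_listener, HOOK_D, steer_public_to_proxy, and so on) are hypothetical. The model shows how the hook position decides who wins. At B, the program overrides a service bound to the public address. At D, it only overrides INADDR_ANY listeners. At F, it only catches connections that no listener claimed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model, not kernel code. addr 0 stands for INADDR_ANY. */
struct listener { const char *name; uint32_t addr; uint16_t port; };

/* Where the single program runs: B exists today, D and F are proposed. */
enum hook_pos { HOOK_B, HOOK_D, HOOK_F };

/* A "program" returns the socket to steer to, or NULL to pass. */
typedef const struct listener *(*sk_lookup_prog)(uint32_t, uint16_t);

struct lookup_table {
    const struct listener *socks;
    size_t nsocks;
    sk_lookup_prog prog;
    enum hook_pos pos;
};

static const struct listener *match(const struct lookup_table *t,
                                    uint32_t addr, uint16_t port)
{
    for (size_t i = 0; i < t->nsocks; i++)
        if (t->socks[i].addr == addr && t->socks[i].port == port)
            return &t->socks[i];
    return NULL;
}

/* Runs the program only if it is attached at this position. */
static const struct listener *run_at(const struct lookup_table *t,
                                     enum hook_pos pos,
                                     uint32_t daddr, uint16_t dport)
{
    return (t->prog && t->pos == pos) ? t->prog(daddr, dport) : NULL;
}

/* Steps B to F for a new connection; step A (existing connections) is omitted. */
static const struct listener *lookup_listener(const struct lookup_table *t,
                                              uint32_t daddr, uint16_t dport)
{
    const struct listener *sk;
    assert(t != NULL);
    if ((sk = run_at(t, HOOK_B, daddr, dport))) return sk; /* B */
    if ((sk = match(t, daddr, dport)))          return sk; /* C */
    if ((sk = run_at(t, HOOK_D, daddr, dport))) return sk; /* D (proposed) */
    if ((sk = match(t, 0, dport)))              return sk; /* E */
    return run_at(t, HOOK_F, daddr, dport);                /* F (proposed) */
}

#define PUBLIC_IP  0xC0000201u /* 192.0.2.1, documentation range */
#define PRIVATE_IP 0x7F000002u /* 127.0.0.2 */

static const struct listener socks[] = {
    { "proxy", PRIVATE_IP, 8443 }, /* reverse proxy on its private IP */
    { "web",   PUBLIC_IP,  443  }, /* service bound to the public IP */
    { "sshd",  0,          22   }, /* service bound to INADDR_ANY */
};

/* Example program: steer every connection to the public IP to the proxy. */
static const struct listener *steer_public_to_proxy(uint32_t daddr,
                                                    uint16_t dport)
{
    (void)dport;
    return daddr == PUBLIC_IP ? &socks[0] : NULL;
}
```

With the program at B, a connection to 192.0.2.1:443 goes to "proxy" even though "web" listens there. At D it goes to "web", while port 22 still goes to "proxy" ahead of "sshd". At F, port 22 goes to "sshd", and only unclaimed ports such as 80 reach "proxy".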

The suggestion above would work for my use case, but there is another way
to enable the same use cases: implement steps C and E above in BPF (or
allow BPF programs to call them) so the BPF program can supplant the
kernel behavior. I find this solution less elegant, and it might not work
well when multiple inet_lookup BPF programs are installed.

With this e-mail I wanted to start a discussion around this and possibly
take on the implementation. I have never done any kernel development
before, but you have to start somewhere, and I believe this is a rather
simple improvement (duplicating the existing hook a little lower in the
function). I might not be able to deliver this very quickly, because I
have limited time and need to learn kernel development, but I'm ready to
take on this task.

Thank you for your time

Shanti




* Re: More flexible BPF socket inet_lookup hooking after listening sockets are dispatched
  2021-01-20 13:40 More flexible BPF socket inet_lookup hooking after listening sockets are dispatched Shanti Lombard née Bouchez-Mongardé
@ 2021-01-20 21:06 ` Alexei Starovoitov
  2021-01-21 11:14   ` Jakub Sitnicki
  0 siblings, 1 reply; 5+ messages in thread
From: Alexei Starovoitov @ 2021-01-20 21:06 UTC (permalink / raw)
  To: Shanti Lombard née Bouchez-Mongardé
  Cc: bpf, Network Development, Jakub Sitnicki, Martin KaFai Lau

cc-ing the right folks

On Wed, Jan 20, 2021 at 12:30 PM Shanti Lombard née Bouchez-Mongardé
<shanti20210120@mildred.fr> wrote:
> [...]


* Re: More flexible BPF socket inet_lookup hooking after listening sockets are dispatched
  2021-01-20 21:06 ` Alexei Starovoitov
@ 2021-01-21 11:14   ` Jakub Sitnicki
  2021-01-21 20:40     ` Shanti Lombard
  0 siblings, 1 reply; 5+ messages in thread
From: Jakub Sitnicki @ 2021-01-21 11:14 UTC (permalink / raw)
  To: Shanti Lombard née Bouchez-Mongardé
  Cc: Alexei Starovoitov, bpf, Network Development, Martin KaFai Lau

On Wed, Jan 20, 2021 at 10:06 PM CET, Alexei Starovoitov wrote:
> cc-ing the right folks
>
> On Wed, Jan 20, 2021 at 12:30 PM Shanti Lombard née Bouchez-Mongardé
> <shanti20210120@mildred.fr> wrote:
>>
>> Hello,
>>
>> I believe this is my first time here, so please excuse me for mistakes.
>> Also, please Cc me on answers.
>>
>> Background : I am currently investigating putting network services on a
>> machine without using network namespace but still keep them isolated. To
>> do that, I allocated a separate IP address (127.0.0.0/8 for IPv4 and ULA
>> prefix below fd00::/8 for IPv6) and those services are forced to listen
>> to this IP address only. For some, I use seccomp with a small utility I
>> wrote at <https://github.com/mildred/force-bind-seccomp>. Now, I still
>> want a few selected services (reverse proxies) to listen on a public
>> address but they can't necessarily listen with INADDR_ANY because some
>> other services might listen on the same port on their private IP. It
>> seems SO_REUSEADDR can be used to circumvent this on BSD but not on
>> Linux. After much research, I found Cloudflare recent contribution
>> (explained here <https://blog.cloudflare.com/its-crowded-in-here/>)
>> about inet_lookup BPF programs that could replace INADDR_ANY listening.

There is also documentation in the kernel:

https://www.kernel.org/doc/html/latest/bpf/prog_sk_lookup.html

>> The inet_lookup BPF programs are hooking up in socket selection code for
>> incoming packets after connected packets are dispatched to their
>> respective sockets but before any new connection is dispatched to a
>> listening socket. This is well explained in the blog post.
>>
>> However, I believe that being able to hook up later in the process could
>> have great use cases. With its current position, the BPF program can
>> override any listening socket too easily. It can also be surprising for
>> administrators used to the socket API not understanding why their
>> listening socket does not receive any packets.
>>
>> Socket selection process (in net/ipv4/inet_hashtables.c function
>> __inet_lookup_listener):
>>
>> - A: look for already connected sockets (before __inet_lookup_listener)
>> - B: look for inet_lookup BPF programs
>> - C: look for listening sockets specifying address and port
>> - D: here, provide another inet_lookup BPF hook
>> - E: look for sockets listening using INADDR_ANY
>> - F: here, provide another inet_lookup BPF hook
>>
>> In position D, a BPF program could implement socket listening like
>> INADDR_ANY listening would do but without the limitation that the port
>> must not be listened on by another IP address
>>
>> In position F, a BPF program could redirect new connection attempts to a
>> socket of its choice, allowing any connection attempt to be intercepted
>> if not caught before by an already listening socket.

Existing hook is placed before regular listening/unconnected socket
lookup to prevent port hijacking on the unprivileged range.

>> The suggestion above would work for my use case, but there is another
>> possibility to make the same use cases possible : implement in BPF (or
>> allow BPF to call) the C and E steps above so the BPF program can
>> supplant the kernel behavior. I find this solution less elegant and it
>> might not work well in case there are multiple inet_lookup BPF programs
>> installed.

Having a BPF helper available to BPF sk_lookup programs that looks up a
socket by packet 4-tuple and netns ID in tcp/udp hashtables sounds
reasonable to me. You gain the flexibility that you describe without
adding code on the hot path.
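One way to picture the helper-based approach is as a userspace model in which the existing hook at B recovers the D and F behaviours itself. This is not kernel code. helper_lookup_listener stands in for a hypothetical BPF helper that performs the kernel's steps C and E. It is not an existing helper, and the real bpf_sk_lookup_tcp/bpf_sk_lookup_udp signatures (which take a struct bpf_sock_tuple and a netns ID) are not reproduced here. The program runs once per lookup and returns NULL (pass) whenever an existing listener should keep the connection.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model, not kernel code. addr 0 stands for INADDR_ANY. */
struct listener { const char *name; uint32_t addr; uint16_t port; };

#define PUBLIC_IP  0xC0000201u /* 192.0.2.1, documentation range */
#define PRIVATE_IP 0x7F000002u /* 127.0.0.2 */

static const struct listener socks[] = {
    { "proxy", PRIVATE_IP, 8443 }, /* reverse proxy on its private IP */
    { "web",   PUBLIC_IP,  443  }, /* service bound to the public IP */
    { "sshd",  0,          22   }, /* service bound to INADDR_ANY */
};
#define NSOCKS (sizeof(socks) / sizeof(socks[0]))

/* HYPOTHETICAL helper: exposes step C (exact address) and, if allow_any
 * is set, step E (INADDR_ANY) to the program. */
static const struct listener *helper_lookup_listener(uint32_t daddr,
                                                     uint16_t dport,
                                                     int allow_any)
{
    for (size_t i = 0; i < NSOCKS; i++)
        if (socks[i].addr == daddr && socks[i].port == dport)
            return &socks[i];
    for (size_t i = 0; allow_any && i < NSOCKS; i++)
        if (socks[i].addr == 0 && socks[i].port == dport)
            return &socks[i];
    return NULL;
}

typedef const struct listener *(*sk_lookup_prog)(uint32_t, uint16_t);

/* Runs at the existing hook B but behaves like a hook at D:
 * an exact-address listener keeps the connection (NULL means "pass"). */
static const struct listener *prog_like_d(uint32_t daddr, uint16_t dport)
{
    if (helper_lookup_listener(daddr, dport, 0))
        return NULL;
    return daddr == PUBLIC_IP ? &socks[0] : NULL;
}

/* Behaves like a hook at F: any existing listener keeps the connection. */
static const struct listener *prog_like_f(uint32_t daddr, uint16_t dport)
{
    if (helper_lookup_listener(daddr, dport, 1))
        return NULL;
    return daddr == PUBLIC_IP ? &socks[0] : NULL;
}

/* One program run at B, then the regular steps C and E. */
static const struct listener *lookup(sk_lookup_prog prog, uint32_t daddr,
                                     uint16_t dport)
{
    assert(prog != NULL);
    const struct listener *sk = prog(daddr, dport);
    return sk ? sk : helper_lookup_listener(daddr, dport, 1);
}
```

The kernel's lookup path is unchanged in this design. The extra cost (the helper's hashtable lookups) is paid only by programs that choose to call it.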

[...]


* Re: More flexible BPF socket inet_lookup hooking after listening sockets are dispatched
  2021-01-21 11:14   ` Jakub Sitnicki
@ 2021-01-21 20:40     ` Shanti Lombard
  2021-01-21 22:08       ` Martin KaFai Lau
  0 siblings, 1 reply; 5+ messages in thread
From: Shanti Lombard @ 2021-01-21 20:40 UTC (permalink / raw)
  To: Jakub Sitnicki
  Cc: Shanti Lombard née Bouchez-Mongardé,
	Alexei Starovoitov, bpf, Network Development, Martin KaFai Lau

On 2021-01-21 12:14, Jakub Sitnicki wrote:
> On Wed, Jan 20, 2021 at 10:06 PM CET, Alexei Starovoitov wrote:
> 
> There is also documentation in the kernel:
> 
> https://www.kernel.org/doc/html/latest/bpf/prog_sk_lookup.html
> 

Thank you, I had seen it. It's well written and explains it all very well.

> 
> Existing hook is placed before regular listening/unconnected socket
> lookup to prevent port hijacking on the unprivileged range.
> 

Yes, from the point of view of the BPF program. However, from the point
of view of a legitimate service listening on a port that the BPF program
might block, it is BPF that is hijacking the port bind.

That being said, if you install the BPF filter, you should know what you 
are doing.

>>> The suggestion above would work for my use case, but there is another
>>> possibility to make the same use cases possible : implement in BPF (or
>>> allow BPF to call) the C and E steps above so the BPF program can
>>> supplant the kernel behavior. I find this solution less elegant and it
>>> might not work well in case there are multiple inet_lookup BPF programs
>>> installed.
> 
> Having a BPF helper available to BPF sk_lookup programs that looks up a
> socket by packet 4-tuple and netns ID in tcp/udp hashtables sounds
> reasonable to me. You gain the flexibility that you describe without
> adding code on the hot path.

True, if you consider that the hot path should not be slowed down, it
makes sense. However, it seems to me that the implementation would be
more difficult.

Looking at the existing BPF helpers
<https://man7.org/linux/man-pages/man7/bpf-helpers.7.html>, I found
bpf_sk_lookup_tcp and bpf_sk_lookup_udp, which should yield a socket
matching a given tuple and netns. If that's true, and they are usable
from within BPF sk_lookup programs, then it's just a matter of
implementing it, and the kernel is already ready for such use cases.

Shanti


* Re: More flexible BPF socket inet_lookup hooking after listening sockets are dispatched
  2021-01-21 20:40     ` Shanti Lombard
@ 2021-01-21 22:08       ` Martin KaFai Lau
  0 siblings, 0 replies; 5+ messages in thread
From: Martin KaFai Lau @ 2021-01-21 22:08 UTC (permalink / raw)
  To: Shanti Lombard
  Cc: Jakub Sitnicki, Shanti Lombard née Bouchez-Mongardé,
	Alexei Starovoitov, bpf, Network Development

On Thu, Jan 21, 2021 at 09:40:19PM +0100, Shanti Lombard wrote:
> On 2021-01-21 12:14, Jakub Sitnicki wrote:
> > On Wed, Jan 20, 2021 at 10:06 PM CET, Alexei Starovoitov wrote:
> > 
> > There is also documentation in the kernel:
> > 
> > https://www.kernel.org/doc/html/latest/bpf/prog_sk_lookup.html
> > 
> 
> Thank you, I saw it, it's well written and very much explains it all.
> 
> > 
> > Existing hook is placed before regular listening/unconnected socket
> > lookup to prevent port hijacking on the unprivileged range.
> > 
> 
> Yes, from the point of view of the BPF program. However from the point of
> view of a legitimate service listening on a port that might be blocked by
> the BPF program, BPF is actually hijacking a port bind.
> 
> That being said, if you install the BPF filter, you should know what you are
> doing.
> 
> > > > The suggestion above would work for my use case, but there is another
> > > > possibility to make the same use cases possible : implement in BPF (or
> > > > allow BPF to call) the C and E steps above so the BPF program can
> > > > supplant the kernel behavior. I find this solution less elegant and it
> > > > might not work well in case there are multiple inet_lookup BPF programs
> > > > installed.
> > 
> > Having a BPF helper available to BPF sk_lookup programs that looks up a
> > socket by packet 4-tuple and netns ID in tcp/udp hashtables sounds
> > reasonable to me. You gain the flexibility that you describe without
> > adding code on the hot path.
I agree that a helper to look up the inet_hash is probably a better way.
There are some existing lookup helper examples, as you also pointed out.

I would avoid adding new hooks that do the same thing:
the same bpf prog would be called multiple times, the bpf run
ctx would have to be initialized multiple times, etc.

> 
> True, if you consider that hot path should not be slowed down. It makes
> sense. However, for me, it seems the implementation would be more difficult.
> 
> Looking at existing BPF helpers <https://man7.org/linux/man-pages/man7/bpf-helpers.7.html>
> I found bpf_sk_lookup_tcp and bpf_sk_lookup_udp that should yield a socket
> from a matching tuple and netns. If that's true and usable from within BPF
> sk_lookup then it's just a matter of implementing it and the kernel is
> already ready for such use cases.
> 
> Shanti


Thread overview: 5+ messages
2021-01-20 13:40 More flexible BPF socket inet_lookup hooking after listening sockets are dispatched Shanti Lombard née Bouchez-Mongardé
2021-01-20 21:06 ` Alexei Starovoitov
2021-01-21 11:14   ` Jakub Sitnicki
2021-01-21 20:40     ` Shanti Lombard
2021-01-21 22:08       ` Martin KaFai Lau
