netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* combining sockmap + ktls
@ 2019-11-15 21:31 Willem de Bruijn
  2019-11-15 22:21 ` John Fastabend
  0 siblings, 1 reply; 5+ messages in thread
From: Willem de Bruijn @ 2019-11-15 21:31 UTC (permalink / raw)
  To: bpf; +Cc: Network Development, John Fastabend, Jakub Kicinski, Daniel Borkmann

I've been playing with sockmap and ktls. They're fantastic tools.
Combining them I did run into a few issues. Would like to understand
whether (a) it's just me, else (b) whether these are known issues and
(c) some feedback on an initial hacky patch.

My test [1] sets up an echo request/response between a client and
server, optionally interposed by an "icept" guard process on each side
and optionally enabling ktls between the icept processes.

Without ktls, most variants of interpositioning {iptables, iptables +
splice(), iptables + sockmap splice, sk_msg to icept tx } work.

Only sk_msg redirection to icept ingress with BPF_F_INGRESS does not
if the destination socket has a verdict program. I *think* this is
intentional, judging from commit 552de9106882 ("bpf: sk_msg, fix
socket data_ready events") explicitly ensuring that the process gets
awoken on new data if a socket has a verdict program and another
socket redirects to it, as opposed to passing it to the program.

For this workload, more interesting is sk_msg directly to icept
egress, anyway. This works without ktls. Support for ktls is added in
commit d3b18ad31f93 ("tls: add bpf support to sk_msg handling"). The
relevant callback function tls_sw_sendpage_locked was not immediately
used and subsequently removed in commit cc1dbdfed023 ("Revert
"net/tls: remove unused function tls_sw_sendpage_locked""). It appears
to work once reverting that change, plus registering the function

        @@ -859,6 +861,7 @@ static int __init tls_register(void)

                tls_sw_proto_ops = inet_stream_ops;
                tls_sw_proto_ops.splice_read = tls_sw_splice_read;
        +       tls_sw_proto_ops.sendpage_locked   = tls_sw_sendpage_locked,

and additionally allowing MSG_NO_SHARED_FRAGS:

         int tls_sw_sendpage_locked(struct sock *sk, struct page *page,
                                    int offset, size_t size, int flags)
         {
               if (flags & ~(MSG_MORE | MSG_DONTWAIT | MSG_NOSIGNAL |
        -                     MSG_SENDPAGE_NOTLAST | MSG_SENDPAGE_NOPOLICY))
        +                     MSG_SENDPAGE_NOTLAST |
MSG_SENDPAGE_NOPOLICY | MSG_NO_SHARED_FRAGS))
                         return -ENOTSUPP;

and not registering parser+verdict programs on the destination socket.
Note that without ktls this mode also works with such programs
attached.

Lastly, sockmap splicing from icept ingress to egress (no sk_msg) also
stops working when I enable ktls on the egress socket. I'm taking a
look at that next. But this email is long enough already ;)

Thanks for having a look!

  Willem

[1] https://github.com/wdebruij/kerneltools/tree/icept.2

probably more readable is the stack of commits, one per feature:

  c86c112 icept: initial client/server test
  727a8ae icept: add iptables interception
  60c34b2 icept: add splice interception
  03a516a icept: add sockmap interception
  c9c6103 icept: run client and server in cgroup
  579bcae icept: add skmsg interception
  e1b0d17 icept: add kTLS

^ permalink raw reply	[flat|nested] 5+ messages in thread

* RE: combining sockmap + ktls
  2019-11-15 21:31 combining sockmap + ktls Willem de Bruijn
@ 2019-11-15 22:21 ` John Fastabend
  2019-11-15 23:12   ` Willem de Bruijn
  0 siblings, 1 reply; 5+ messages in thread
From: John Fastabend @ 2019-11-15 22:21 UTC (permalink / raw)
  To: Willem de Bruijn, bpf
  Cc: Network Development, John Fastabend, Jakub Kicinski, Daniel Borkmann

Willem de Bruijn wrote:
> I've been playing with sockmap and ktls. They're fantastic tools.

Great, glad to get more eyes on it. Thanks.

> Combining them I did run into a few issues. Would like to understand
> whether (a) it's just me, else (b) whether these are known issues and
> (c) some feedback on an initial hacky patch.

There are still a few outstanding things I need to flush out of my
queue listed below.

> 
> My test [1] sets up an echo request/response between a client and
> server, optionally interposed by an "icept" guard process on each side
> and optionally enabling ktls between the icept processes.
> 
> Without ktls, most variants of interpositioning {iptables, iptables +
> splice(), iptables + sockmap splice, sk_msg to icept tx } work.
> 
> Only sk_msg redirection to icept ingress with BPF_F_INGRESS does not
> if the destination socket has a verdict program. I *think* this is
> intentional, judging from commit 552de9106882 ("bpf: sk_msg, fix
> socket data_ready events") explicitly ensuring that the process gets
> awoken on new data if a socket has a verdict program and another
> socket redirects to it, as opposed to passing it to the program.

Right.

> 
> For this workload, more interesting is sk_msg directly to icept
> egress, anyway. This works without ktls. Support for ktls is added in
> commit d3b18ad31f93 ("tls: add bpf support to sk_msg handling"). The
> relevant callback function tls_sw_sendpage_locked was not immediately
> used and subsequently removed in commit cc1dbdfed023 ("Revert
> "net/tls: remove unused function tls_sw_sendpage_locked""). It appears
> to work once reverting that change, plus registering the function

I don't fully understand this. Are you saying a BPF_SK_MSG_VERDICT
program attach to a ktls socket is not being called? Or packets are
being dropped or ...? Or that the program doesn't work even with
just KTLS and no bpf involved. 

> 
>         @@ -859,6 +861,7 @@ static int __init tls_register(void)
> 
>                 tls_sw_proto_ops = inet_stream_ops;
>                 tls_sw_proto_ops.splice_read = tls_sw_splice_read;
>         +       tls_sw_proto_ops.sendpage_locked   = tls_sw_sendpage_locked,
> 
> and additionally allowing MSG_NO_SHARED_FRAGS:
> 
>          int tls_sw_sendpage_locked(struct sock *sk, struct page *page,
>                                     int offset, size_t size, int flags)
>          {
>                if (flags & ~(MSG_MORE | MSG_DONTWAIT | MSG_NOSIGNAL |
>         -                     MSG_SENDPAGE_NOTLAST | MSG_SENDPAGE_NOPOLICY))
>         +                     MSG_SENDPAGE_NOTLAST |
> MSG_SENDPAGE_NOPOLICY | MSG_NO_SHARED_FRAGS))
>                          return -ENOTSUPP;
> 

If you had added MSG_NO_SHARED_FRAGS to the existing tls_sw_sendpage
would that have been sufficient?

> and not registering parser+verdict programs on the destination socket.
> Note that without ktls this mode also works with such programs
> attached.

Right ingress + ktls is known to be broken at the moment. Also I have
plans to cleanup ingress side at some point. The current model is a
bit clumsy IMO. The workqueue adds latency spikes on the 99+
percentiles. At this point it makes the ingress side similar to the
egress side without a workqueue and with verdict+parser done in a
single program.

> 
> Lastly, sockmap splicing from icept ingress to egress (no sk_msg) also
> stops working when I enable ktls on the egress socket. I'm taking a
> look at that next. But this email is long enough already ;)

Yes this is a known bug I've got a set of patches to address this. I've
been trying to get to it for awhile now and just resolved a few other
things on my side so plan to do this Monday/Tuesday next week.

FWIW there is also a bugfix on the way to resolve a case where we
receive a FIN/RST flag after redirecting data causing dropped data.

> 
> Thanks for having a look!
> 
>   Willem
> 
> [1] https://github.com/wdebruij/kerneltools/tree/icept.2
> 
> probably more readable is the stack of commits, one per feature:
> 
>   c86c112 icept: initial client/server test
>   727a8ae icept: add iptables interception
>   60c34b2 icept: add splice interception
>   03a516a icept: add sockmap interception
>   c9c6103 icept: run client and server in cgroup
>   579bcae icept: add skmsg interception
>   e1b0d17 icept: add kTLS



^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: combining sockmap + ktls
  2019-11-15 22:21 ` John Fastabend
@ 2019-11-15 23:12   ` Willem de Bruijn
  2019-11-18  4:49     ` John Fastabend
  0 siblings, 1 reply; 5+ messages in thread
From: Willem de Bruijn @ 2019-11-15 23:12 UTC (permalink / raw)
  To: John Fastabend
  Cc: Willem de Bruijn, bpf, Network Development, Jakub Kicinski,
	Daniel Borkmann

> > For this workload, more interesting is sk_msg directly to icept
> > egress, anyway. This works without ktls. Support for ktls is added in
> > commit d3b18ad31f93 ("tls: add bpf support to sk_msg handling"). The
> > relevant callback function tls_sw_sendpage_locked was not immediately
> > used and subsequently removed in commit cc1dbdfed023 ("Revert
> > "net/tls: remove unused function tls_sw_sendpage_locked""). It appears
> > to work once reverting that change, plus registering the function
>
> I don't fully understand this. Are you saying a BPF_SK_MSG_VERDICT
> program attach to a ktls socket is not being called? Or packets are
> being dropped or ...? Or that the program doesn't work even with
> just KTLS and no bpf involved.

Not the verdict program. The setup is as follows:

  client --> icept.1 -- (optionally ktls) --> icept.2 --> server

I'm trying to redirect on send from the client directly into the send
side of the ktls socket, to avoid waking up the icept.1 process
completely. The ktls enabled socket has no BPF programs associated.

>
> >
> >         @@ -859,6 +861,7 @@ static int __init tls_register(void)
> >
> >                 tls_sw_proto_ops = inet_stream_ops;
> >                 tls_sw_proto_ops.splice_read = tls_sw_splice_read;
> >         +       tls_sw_proto_ops.sendpage_locked   = tls_sw_sendpage_locked,
> >
> > and additionally allowing MSG_NO_SHARED_FRAGS:
> >
> >          int tls_sw_sendpage_locked(struct sock *sk, struct page *page,
> >                                     int offset, size_t size, int flags)
> >          {
> >                if (flags & ~(MSG_MORE | MSG_DONTWAIT | MSG_NOSIGNAL |
> >         -                     MSG_SENDPAGE_NOTLAST | MSG_SENDPAGE_NOPOLICY))
> >         +                     MSG_SENDPAGE_NOTLAST |
> > MSG_SENDPAGE_NOPOLICY | MSG_NO_SHARED_FRAGS))
> >                          return -ENOTSUPP;
> >
>
> If you had added MSG_NO_SHARED_FRAGS to the existing tls_sw_sendpage
> would that have been sufficient?

No, the stack trace I observed is

  tcp_bpf_sendmsg_redir
    tcp_bpf_push_locked
      tcp_bpf_push
        kernel_sendpage_locked
          sock->ops->sendpage_locked

which never tries tls_sw_sendpage. Perhaps the relevant part is the
following in tcp_bpf_push?

                if (has_tx_ulp) {
                        flags |= MSG_SENDPAGE_NOPOLICY;
                        ret = kernel_sendpage_locked(sk,
                                                     page, off, size, flags);
                } else {
                        ret = do_tcp_sendpages(sk, page, off, size, flags);
                }

> > and not registering parser+verdict programs on the destination socket.
> > Note that without ktls this mode also works with such programs
> > attached.
>
> Right ingress + ktls is known to be broken at the moment. Also I have
> plans to cleanup ingress side at some point. The current model is a
> bit clumsy IMO. The workqueue adds latency spikes on the 99+
> percentiles. At this point it makes the ingress side similar to the
> egress side without a workqueue and with verdict+parser done in a
> single program.

Good to know thanks. Then I won't spend too much time on this.

> >
> > Lastly, sockmap splicing from icept ingress to egress (no sk_msg) also
> > stops working when I enable ktls on the egress socket. I'm taking a
> > look at that next. But this email is long enough already ;)
>
> Yes this is a known bug I've got a set of patches to address this.

Same thing. Good to know I'm not crazy :)

> I've
> been trying to get to it for awhile now and just resolved a few other
> things on my side so plan to do this Monday/Tuesday next week.

To be clear, I don't mean to pressure at all. Was just running through
a variety of points in the option space. The sk_msg  plus redirect to
egress of a ktls-enabled socket is the variant I'm most interested in.

Do let me know if there's anything I can help out with. Thanks for
your quick answer!

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: combining sockmap + ktls
  2019-11-15 23:12   ` Willem de Bruijn
@ 2019-11-18  4:49     ` John Fastabend
  2019-11-18 15:45       ` Willem de Bruijn
  0 siblings, 1 reply; 5+ messages in thread
From: John Fastabend @ 2019-11-18  4:49 UTC (permalink / raw)
  To: Willem de Bruijn, John Fastabend
  Cc: Willem de Bruijn, bpf, Network Development, Jakub Kicinski,
	Daniel Borkmann

Willem de Bruijn wrote:
> > > For this workload, more interesting is sk_msg directly to icept
> > > egress, anyway. This works without ktls. Support for ktls is added in
> > > commit d3b18ad31f93 ("tls: add bpf support to sk_msg handling"). The
> > > relevant callback function tls_sw_sendpage_locked was not immediately
> > > used and subsequently removed in commit cc1dbdfed023 ("Revert
> > > "net/tls: remove unused function tls_sw_sendpage_locked""). It appears
> > > to work once reverting that change, plus registering the function
> >
> > I don't fully understand this. Are you saying a BPF_SK_MSG_VERDICT
> > program attach to a ktls socket is not being called? Or packets are
> > being dropped or ...? Or that the program doesn't work even with
> > just KTLS and no bpf involved.
> 
> Not the verdict program. The setup is as follows:
> 
>   client --> icept.1 -- (optionally ktls) --> icept.2 --> server
> 
> I'm trying to redirect on send from the client directly into the send
> side of the ktls socket, to avoid waking up the icept.1 process
> completely. The ktls enabled socket has no BPF programs associated.
> 

Ah ok. This is probably the least optimized case, we've focused mostly
on the ingress sk_msg case so far.

> >
> > >
> > >         @@ -859,6 +861,7 @@ static int __init tls_register(void)
> > >
> > >                 tls_sw_proto_ops = inet_stream_ops;
> > >                 tls_sw_proto_ops.splice_read = tls_sw_splice_read;
> > >         +       tls_sw_proto_ops.sendpage_locked   = tls_sw_sendpage_locked,
> > >
> > > and additionally allowing MSG_NO_SHARED_FRAGS:
> > >
> > >          int tls_sw_sendpage_locked(struct sock *sk, struct page *page,
> > >                                     int offset, size_t size, int flags)
> > >          {
> > >                if (flags & ~(MSG_MORE | MSG_DONTWAIT | MSG_NOSIGNAL |
> > >         -                     MSG_SENDPAGE_NOTLAST | MSG_SENDPAGE_NOPOLICY))
> > >         +                     MSG_SENDPAGE_NOTLAST |
> > > MSG_SENDPAGE_NOPOLICY | MSG_NO_SHARED_FRAGS))
> > >                          return -ENOTSUPP;
> > >
> >
> > If you had added MSG_NO_SHARED_FRAGS to the existing tls_sw_sendpage
> > would that have been sufficient?
> 
> No, the stack trace I observed is
> 
>   tcp_bpf_sendmsg_redir
>     tcp_bpf_push_locked
>       tcp_bpf_push
>         kernel_sendpage_locked
>           sock->ops->sendpage_locked
> 
> which never tries tls_sw_sendpage. Perhaps the relevant part is the
> following in tcp_bpf_push?
> 
>                 if (has_tx_ulp) {
>                         flags |= MSG_SENDPAGE_NOPOLICY;
>                         ret = kernel_sendpage_locked(sk,
>                                                      page, off, size, flags);
>                 } else {
>                         ret = do_tcp_sendpages(sk, page, off, size, flags);
>                 }
> 

Got it, want to submit a fix? Or I can this is a bug.

> > > and not registering parser+verdict programs on the destination socket.
> > > Note that without ktls this mode also works with such programs
> > > attached.
> >
> > Right ingress + ktls is known to be broken at the moment. Also I have
> > plans to cleanup ingress side at some point. The current model is a
> > bit clumsy IMO. The workqueue adds latency spikes on the 99+
> > percentiles. At this point it makes the ingress side similar to the
> > egress side without a workqueue and with verdict+parser done in a
> > single program.
> 
> Good to know thanks. Then I won't spend too much time on this.
> 
> > >
> > > Lastly, sockmap splicing from icept ingress to egress (no sk_msg) also
> > > stops working when I enable ktls on the egress socket. I'm taking a
> > > look at that next. But this email is long enough already ;)
> >
> > Yes this is a known bug I've got a set of patches to address this.
> 
> Same thing. Good to know I'm not crazy :)
> 
> > I've
> > been trying to get to it for awhile now and just resolved a few other
> > things on my side so plan to do this Monday/Tuesday next week.
> 
> To be clear, I don't mean to pressure at all. Was just running through
> a variety of points in the option space. The sk_msg  plus redirect to
> egress of a ktls-enabled socket is the variant I'm most interested in.

No pressure I need to get my queue here out I've been sitting on it
for awhile.

> 
> Do let me know if there's anything I can help out with. Thanks for
> your quick answer!

Can you send out a fix for above sendpage_locked case?

Thanks,
John

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: combining sockmap + ktls
  2019-11-18  4:49     ` John Fastabend
@ 2019-11-18 15:45       ` Willem de Bruijn
  0 siblings, 0 replies; 5+ messages in thread
From: Willem de Bruijn @ 2019-11-18 15:45 UTC (permalink / raw)
  To: John Fastabend
  Cc: Willem de Bruijn, bpf, Network Development, Jakub Kicinski,
	Daniel Borkmann

> > > >         @@ -859,6 +861,7 @@ static int __init tls_register(void)
> > > >
> > > >                 tls_sw_proto_ops = inet_stream_ops;
> > > >                 tls_sw_proto_ops.splice_read = tls_sw_splice_read;
> > > >         +       tls_sw_proto_ops.sendpage_locked   = tls_sw_sendpage_locked,
> > > >
> > > > and additionally allowing MSG_NO_SHARED_FRAGS:
> > > >
> > > >          int tls_sw_sendpage_locked(struct sock *sk, struct page *page,
> > > >                                     int offset, size_t size, int flags)
> > > >          {
> > > >                if (flags & ~(MSG_MORE | MSG_DONTWAIT | MSG_NOSIGNAL |
> > > >         -                     MSG_SENDPAGE_NOTLAST | MSG_SENDPAGE_NOPOLICY))
> > > >         +                     MSG_SENDPAGE_NOTLAST |
> > > > MSG_SENDPAGE_NOPOLICY | MSG_NO_SHARED_FRAGS))
> > > >                          return -ENOTSUPP;
> > > >
> > >
> > > If you had added MSG_NO_SHARED_FRAGS to the existing tls_sw_sendpage
> > > would that have been sufficient?
> >
> > No, the stack trace I observed is
> >
> >   tcp_bpf_sendmsg_redir
> >     tcp_bpf_push_locked
> >       tcp_bpf_push
> >         kernel_sendpage_locked
> >           sock->ops->sendpage_locked
> >
> > which never tries tls_sw_sendpage. Perhaps the relevant part is the
> > following in tcp_bpf_push?
> >
> >                 if (has_tx_ulp) {
> >                         flags |= MSG_SENDPAGE_NOPOLICY;
> >                         ret = kernel_sendpage_locked(sk,
> >                                                      page, off, size, flags);
> >                 } else {
> >                         ret = do_tcp_sendpages(sk, page, off, size, flags);
> >                 }
> >
>
> Got it, want to submit a fix? Or I can this is a bug.
>
> > Do let me know if there's anything I can help out with. Thanks for
> > your quick answer!
>
> Can you send out a fix for above sendpage_locked case?

Done:

 net/tls: enable sk_msg redirect to tls socket egress
 http://patchwork.ozlabs.org/patch/1196825/

It addresses both issues at the same time. This seemed preferable than
two separate patches. MSG_NO_SHARED_FRAGS precedes commit 0608c69c9a80
("bpf: sk_msg, sock{map|hash} redirect through ULP"), so no additional
Fixes tag for that.

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2019-11-18 15:45 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-11-15 21:31 combining sockmap + ktls Willem de Bruijn
2019-11-15 22:21 ` John Fastabend
2019-11-15 23:12   ` Willem de Bruijn
2019-11-18  4:49     ` John Fastabend
2019-11-18 15:45       ` Willem de Bruijn

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).