All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] net: tls: fix tls with sk_redirect using a BPF verdict.
@ 2022-06-28 15:25 Julien Salleyron
  2022-06-28 17:34 ` Jakub Kicinski
  0 siblings, 1 reply; 6+ messages in thread
From: Julien Salleyron @ 2022-06-28 15:25 UTC (permalink / raw)
  To: bpf, netdev; +Cc: Julien Salleyron, Marc Vertes

This patch allows to use KTLS on a socket where we apply sk_redirect using a BPF
verdict program.

Without this patch, we see that the data received after the redirection are
decrypted but with an incorrect offset and length. It seems to us that the
offset and length are correct in the stream-parser data, but finally not applied
in the skb. We have simply applied those values to the skb.

In the case of regular sockets, we saw a big performance improvement from
applying redirect. This is not the case now with KTLS, may be related to the
following point.

It is still necessary to perform a read operation (never triggered) from user
space despite the redirection. It makes no sense, since this read operation is
not necessary on regular sockets without KTLS.

We do not see how to fix this problem without a change of architecture, for
example by performing TLS decrypt directly inside the BPF verdict program.

An example program can be found at
https://github.com/juliens/ktls-bpf_redirect-example/

Co-authored-by: Marc Vertes <mvertes@free.fr>
---
 net/tls/tls_sw.c                           | 6 ++++++
 tools/testing/selftests/bpf/test_sockmap.c | 8 +++-----
 2 files changed, 9 insertions(+), 5 deletions(-)

diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index 0513f82b8537..a409f8a251db 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -1839,8 +1839,14 @@ int tls_sw_recvmsg(struct sock *sk,
 			if (bpf_strp_enabled) {
 				/* BPF may try to queue the skb */
 				__skb_unlink(skb, &ctx->rx_list);
+
 				err = sk_psock_tls_strp_read(psock, skb);
+
 				if (err != __SK_PASS) {
+                    if (err == __SK_REDIRECT) {
+                        skb->data += rxm->offset;
+                        skb->len = rxm->full_len;
+                    }
 					rxm->offset = rxm->offset + rxm->full_len;
 					rxm->full_len = 0;
 					if (err == __SK_DROP)
diff --git a/tools/testing/selftests/bpf/test_sockmap.c b/tools/testing/selftests/bpf/test_sockmap.c
index 0fbaccdc8861..503e0f3d16a7 100644
--- a/tools/testing/selftests/bpf/test_sockmap.c
+++ b/tools/testing/selftests/bpf/test_sockmap.c
@@ -739,13 +739,11 @@ static int sendmsg_test(struct sockmap_options *opt)
 
 	if (ktls) {
 		/* Redirecting into non-TLS socket which sends into a TLS
-		 * socket is not a valid test. So in this case lets not
-		 * enable kTLS but still run the test.
+		 * socket is not a valid test. So in this case just skip
+		 * the test.
 		 */
 		if (!txmsg_redir || txmsg_ingress) {
-			err = sockmap_init_ktls(opt->verbose, rx_fd);
-			if (err)
-				return err;
+			return 0;
 		}
 		err = sockmap_init_ktls(opt->verbose, c1);
 		if (err)
-- 
2.36.1


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH] net: tls: fix tls with sk_redirect using a BPF verdict.
  2022-06-28 15:25 [PATCH] net: tls: fix tls with sk_redirect using a BPF verdict Julien Salleyron
@ 2022-06-28 17:34 ` Jakub Kicinski
  2022-06-29  7:00   ` John Fastabend
  2022-07-05  3:26   ` Ziyang Xuan (William)
  0 siblings, 2 replies; 6+ messages in thread
From: Jakub Kicinski @ 2022-06-28 17:34 UTC (permalink / raw)
  To: Julien Salleyron; +Cc: bpf, netdev, Marc Vertes

On Tue, 28 Jun 2022 17:25:05 +0200 Julien Salleyron wrote:
> This patch allows to use KTLS on a socket where we apply sk_redirect using a BPF
> verdict program.
> 
> Without this patch, we see that the data received after the redirection are
> decrypted but with an incorrect offset and length. It seems to us that the
> offset and length are correct in the stream-parser data, but finally not applied
> in the skb. We have simply applied those values to the skb.
> 
> In the case of regular sockets, we saw a big performance improvement from
> applying redirect. This is not the case now with KTLS, may be related to the
> following point.

It's because kTLS does a very expensive reallocation and copy for the
non-zerocopy case (which currently means all of TLS 1.3). I have
code almost ready to fix that (just needs to be reshuffled into
upstreamable patches). Brings us up from 5.9 Gbps to 8.4 Gbps per CPU
on my test box with 16k records. Probably much more than that with
smaller records.

> It is still necessary to perform a read operation (never triggered) from user
> space despite the redirection. It makes no sense, since this read operation is
> not necessary on regular sockets without KTLS.
> 
> We do not see how to fix this problem without a change of architecture, for
> example by performing TLS decrypt directly inside the BPF verdict program.
> 
> An example program can be found at
> https://github.com/juliens/ktls-bpf_redirect-example/
> 
> Co-authored-by: Marc Vertes <mvertes@free.fr>
> ---
>  net/tls/tls_sw.c                           | 6 ++++++
>  tools/testing/selftests/bpf/test_sockmap.c | 8 +++-----
>  2 files changed, 9 insertions(+), 5 deletions(-)
> 
> diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
> index 0513f82b8537..a409f8a251db 100644
> --- a/net/tls/tls_sw.c
> +++ b/net/tls/tls_sw.c
> @@ -1839,8 +1839,14 @@ int tls_sw_recvmsg(struct sock *sk,
>  			if (bpf_strp_enabled) {
>  				/* BPF may try to queue the skb */
>  				__skb_unlink(skb, &ctx->rx_list);
> +
>  				err = sk_psock_tls_strp_read(psock, skb);
> +
>  				if (err != __SK_PASS) {
> +                    if (err == __SK_REDIRECT) {
> +                        skb->data += rxm->offset;
> +                        skb->len = rxm->full_len;
> +                    }

IDK what this is trying to do but I certainly depends on the fact 
we run skb_cow_data() and is not "generally correct" :S

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] net: tls: fix tls with sk_redirect using a BPF verdict.
  2022-06-28 17:34 ` Jakub Kicinski
@ 2022-06-29  7:00   ` John Fastabend
  2022-06-29  8:58     ` Julien Salleyron
  2022-06-30 15:21     ` Vadim Fedorenko
  2022-07-05  3:26   ` Ziyang Xuan (William)
  1 sibling, 2 replies; 6+ messages in thread
From: John Fastabend @ 2022-06-29  7:00 UTC (permalink / raw)
  To: Jakub Kicinski, Julien Salleyron; +Cc: bpf, netdev, Marc Vertes

Jakub Kicinski wrote:
> On Tue, 28 Jun 2022 17:25:05 +0200 Julien Salleyron wrote:
> > This patch allows to use KTLS on a socket where we apply sk_redirect using a BPF
> > verdict program.
> > 

You'll also need a signed-off-by.

> > Without this patch, we see that the data received after the redirection are
> > decrypted but with an incorrect offset and length. It seems to us that the
> > offset and length are correct in the stream-parser data, but finally not applied
> > in the skb. We have simply applied those values to the skb.
> > 
> > In the case of regular sockets, we saw a big performance improvement from
> > applying redirect. This is not the case now with KTLS, may be related to the
> > following point.
> 
> It's because kTLS does a very expensive reallocation and copy for the
> non-zerocopy case (which currently means all of TLS 1.3). I have
> code almost ready to fix that (just needs to be reshuffled into
> upstreamable patches). Brings us up from 5.9 Gbps to 8.4 Gbps per CPU
> on my test box with 16k records. Probably much more than that with
> smaller records.

Also on my list open-ssl support is lacking ktls support for both
direction in tls1.3 iirc. We have a couple test workloads pinned on
1.2 for example which really isn't great.

> 
> > It is still necessary to perform a read operation (never triggered) from user
> > space despite the redirection. It makes no sense, since this read operation is
> > not necessary on regular sockets without KTLS.
> > 
> > We do not see how to fix this problem without a change of architecture, for
> > example by performing TLS decrypt directly inside the BPF verdict program.
> > 
> > An example program can be found at
> > https://github.com/juliens/ktls-bpf_redirect-example/
> > 
> > Co-authored-by: Marc Vertes <mvertes@free.fr>
> > ---
> >  net/tls/tls_sw.c                           | 6 ++++++
> >  tools/testing/selftests/bpf/test_sockmap.c | 8 +++-----
> >  2 files changed, 9 insertions(+), 5 deletions(-)
> > 
> > diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
> > index 0513f82b8537..a409f8a251db 100644
> > --- a/net/tls/tls_sw.c
> > +++ b/net/tls/tls_sw.c
> > @@ -1839,8 +1839,14 @@ int tls_sw_recvmsg(struct sock *sk,
> >  			if (bpf_strp_enabled) {
> >  				/* BPF may try to queue the skb */
> >  				__skb_unlink(skb, &ctx->rx_list);
> > +
> >  				err = sk_psock_tls_strp_read(psock, skb);
> > +
> >  				if (err != __SK_PASS) {
> > +                    if (err == __SK_REDIRECT) {
> > +                        skb->data += rxm->offset;
> > +                        skb->len = rxm->full_len;
> > +                    }
> 
> IDK what this is trying to do but I certainly depends on the fact 
> we run skb_cow_data() and is not "generally correct" :S

Ah also we are not handling partially consumed correctly either.
Seems we might pop off the skb even when we need to continue;

Maybe look at how skb_copy_datagram_msg() goes below because it
fixes the skb copy up with the rxm->offset. But, also we need to
do this repair before sk_psock_tls_strp_read I think so that
the BPF program reads the correct data in all cases? I guess
your sample program (and selftests for that matter) just did
the redirect without reading the data?

Thanks!

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] net: tls: fix tls with sk_redirect using a BPF verdict.
  2022-06-29  7:00   ` John Fastabend
@ 2022-06-29  8:58     ` Julien Salleyron
  2022-06-30 15:21     ` Vadim Fedorenko
  1 sibling, 0 replies; 6+ messages in thread
From: Julien Salleyron @ 2022-06-29  8:58 UTC (permalink / raw)
  To: John Fastabend, Jakub Kicinski; +Cc: bpf, netdev, Marc Vertes

Thanks for your quick response.

 > It's because kTLS does a very expensive reallocation and copy for the
 > non-zerocopy case (which currently means all of TLS 1.3). I have
 > code almost ready to fix that (just needs to be reshuffled into
 > upstreamable patches). Brings us up from 5.9 Gbps to 8.4 Gbps per CPU
 > on my test box with 16k records. Probably much more than that with
 > smaller records.

Happy to hear that, it seems an impressive improvement!

 > You'll also need a signed-off-by.

We will do it.

 >
 > > IDK what this is trying to do but I certainly depends on the fact
 > > we run skb_cow_data() and is not "generally correct" :S
 >
 > Ah also we are not handling partially consumed correctly either.
 > Seems we might pop off the skb even when we need to continue;
 >
 > Maybe look at how skb_copy_datagram_msg() goes below because it
 > fixes the skb copy up with the rxm->offset. But, also we need to
 > do this repair before sk_psock_tls_strp_read I think so that
 > the BPF program reads the correct data in all cases? I guess
 > your sample program (and selftests for that matter) just did
 > the redirect without reading the data?

Even if our sample program doesn't read data, we can confirm that the 
data in the BPF Program are incorrect (no rxm applied).
We will make a change to handle this.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] net: tls: fix tls with sk_redirect using a BPF verdict.
  2022-06-29  7:00   ` John Fastabend
  2022-06-29  8:58     ` Julien Salleyron
@ 2022-06-30 15:21     ` Vadim Fedorenko
  1 sibling, 0 replies; 6+ messages in thread
From: Vadim Fedorenko @ 2022-06-30 15:21 UTC (permalink / raw)
  To: John Fastabend, Jakub Kicinski, Julien Salleyron; +Cc: bpf, netdev, Marc Vertes

On 29.06.2022 08:00, John Fastabend wrote:
> Jakub Kicinski wrote:
>> On Tue, 28 Jun 2022 17:25:05 +0200 Julien Salleyron wrote:
>>> This patch allows to use KTLS on a socket where we apply sk_redirect using a BPF
>>> verdict program.
>>>
> 
> You'll also need a signed-off-by.
> 
>>> Without this patch, we see that the data received after the redirection are
>>> decrypted but with an incorrect offset and length. It seems to us that the
>>> offset and length are correct in the stream-parser data, but finally not applied
>>> in the skb. We have simply applied those values to the skb.
>>>
>>> In the case of regular sockets, we saw a big performance improvement from
>>> applying redirect. This is not the case now with KTLS, may be related to the
>>> following point.
>>
>> It's because kTLS does a very expensive reallocation and copy for the
>> non-zerocopy case (which currently means all of TLS 1.3). I have
>> code almost ready to fix that (just needs to be reshuffled into
>> upstreamable patches). Brings us up from 5.9 Gbps to 8.4 Gbps per CPU
>> on my test box with 16k records. Probably much more than that with
>> smaller records.
> 
> Also on my list open-ssl support is lacking ktls support for both
> direction in tls1.3 iirc. We have a couple test workloads pinned on
> 1.2 for example which really isn't great.
>

AFAIK in-kernel TLS 1.3 is supported in OpenSSL 3.0, I implemented TX part long 
time ago and was fixing some parts while it was 3.0-alpha. Not sure about RX.
Or are you talking about zero-copy implementation?

>>
>>> It is still necessary to perform a read operation (never triggered) from user
>>> space despite the redirection. It makes no sense, since this read operation is
>>> not necessary on regular sockets without KTLS.
>>>
>>> We do not see how to fix this problem without a change of architecture, for
>>> example by performing TLS decrypt directly inside the BPF verdict program.
>>>
>>> An example program can be found at
>>> https://github.com/juliens/ktls-bpf_redirect-example/
>>>
>>> Co-authored-by: Marc Vertes <mvertes@free.fr>
>>> ---
>>>   net/tls/tls_sw.c                           | 6 ++++++
>>>   tools/testing/selftests/bpf/test_sockmap.c | 8 +++-----
>>>   2 files changed, 9 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
>>> index 0513f82b8537..a409f8a251db 100644
>>> --- a/net/tls/tls_sw.c
>>> +++ b/net/tls/tls_sw.c
>>> @@ -1839,8 +1839,14 @@ int tls_sw_recvmsg(struct sock *sk,
>>>   			if (bpf_strp_enabled) {
>>>   				/* BPF may try to queue the skb */
>>>   				__skb_unlink(skb, &ctx->rx_list);
>>> +
>>>   				err = sk_psock_tls_strp_read(psock, skb);
>>> +
>>>   				if (err != __SK_PASS) {
>>> +                    if (err == __SK_REDIRECT) {
>>> +                        skb->data += rxm->offset;
>>> +                        skb->len = rxm->full_len;
>>> +                    }
>>
>> IDK what this is trying to do but I certainly depends on the fact
>> we run skb_cow_data() and is not "generally correct" :S
> 
> Ah also we are not handling partially consumed correctly either.
> Seems we might pop off the skb even when we need to continue;
> 
> Maybe look at how skb_copy_datagram_msg() goes below because it
> fixes the skb copy up with the rxm->offset. But, also we need to
> do this repair before sk_psock_tls_strp_read I think so that
> the BPF program reads the correct data in all cases? I guess
> your sample program (and selftests for that matter) just did
> the redirect without reading the data?
> 
> Thanks!


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] net: tls: fix tls with sk_redirect using a BPF verdict.
  2022-06-28 17:34 ` Jakub Kicinski
  2022-06-29  7:00   ` John Fastabend
@ 2022-07-05  3:26   ` Ziyang Xuan (William)
  1 sibling, 0 replies; 6+ messages in thread
From: Ziyang Xuan (William) @ 2022-07-05  3:26 UTC (permalink / raw)
  To: Jakub Kicinski, Julien Salleyron; +Cc: bpf, netdev, Marc Vertes

> On Tue, 28 Jun 2022 17:25:05 +0200 Julien Salleyron wrote:
>> This patch allows to use KTLS on a socket where we apply sk_redirect using a BPF
>> verdict program.
>>
>> Without this patch, we see that the data received after the redirection are
>> decrypted but with an incorrect offset and length. It seems to us that the
>> offset and length are correct in the stream-parser data, but finally not applied
>> in the skb. We have simply applied those values to the skb.
>>
>> In the case of regular sockets, we saw a big performance improvement from
>> applying redirect. This is not the case now with KTLS, may be related to the
>> following point.
> 
> It's because kTLS does a very expensive reallocation and copy for the
> non-zerocopy case (which currently means all of TLS 1.3). I have
> code almost ready to fix that (just needs to be reshuffled into
> upstreamable patches). Brings us up from 5.9 Gbps to 8.4 Gbps per CPU
> on my test box with 16k records. Probably much more than that with
> smaller records.
> 
>> It is still necessary to perform a read operation (never triggered) from user
>> space despite the redirection. It makes no sense, since this read operation is
>> not necessary on regular sockets without KTLS.
>>
>> We do not see how to fix this problem without a change of architecture, for
>> example by performing TLS decrypt directly inside the BPF verdict program.
>>
>> An example program can be found at
>> https://github.com/juliens/ktls-bpf_redirect-example/
>>
>> Co-authored-by: Marc Vertes <mvertes@free.fr>
>> ---
>>  net/tls/tls_sw.c                           | 6 ++++++
>>  tools/testing/selftests/bpf/test_sockmap.c | 8 +++-----
>>  2 files changed, 9 insertions(+), 5 deletions(-)
>>
>> diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
>> index 0513f82b8537..a409f8a251db 100644
>> --- a/net/tls/tls_sw.c
>> +++ b/net/tls/tls_sw.c
>> @@ -1839,8 +1839,14 @@ int tls_sw_recvmsg(struct sock *sk,
>>  			if (bpf_strp_enabled) {
>>  				/* BPF may try to queue the skb */
>>  				__skb_unlink(skb, &ctx->rx_list);
>> +
>>  				err = sk_psock_tls_strp_read(psock, skb);
>> +
>>  				if (err != __SK_PASS) {
>> +                    if (err == __SK_REDIRECT) {
>> +                        skb->data += rxm->offset;
>> +                        skb->len = rxm->full_len;
>> +                    }
> 

Read the code with the following ktls + ebpf redirect model,
we may find the problem for the above modfication.

sk_psock_tls_strp_read()
->sk_psock_tls_verdict_apply()
  ->sk_psock_skb_redirect()
    ->skb_queue_tail() // add skb to ingress_skb queue
      schedule_work() [ sk_psock_backlog ] // sk_psock_tls_strp_read return

Walk through sk_psock->ingress_skb queue in sk_psock_backlog(),
after sk_psock_handle_skb(), it will kfree_skb() if not ingress.
So I think the above modfication has race problem about skb.
And I backport the above modfication to linux-5.10 as following:

--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -1882,6 +1882,10 @@ int tls_sw_recvmsg(struct sock *sk,
                        if (bpf_strp_enabled) {
                                err = sk_psock_tls_strp_read(psock, skb);
                                if (err != __SK_PASS) {
+                                       if (err == __SK_REDIRECT) {
+                                               skb->data += rxm->offset;
+                                               skb->len = rxm->full_len;
+                                       }
                                        rxm->offset = rxm->offset + rxm->full_len;

Run the following ktls + ebpf redirect model, occur system panic.

ktls       ktls
sender     proxy_recv         proxy_send        recv
 |            |                                  |
 |          verdict ---------------+             |
 |            |    redirect_egress |             |
 +------------+                    +-------------+

Call trace as following. It's may be not the race problem about skb,
but the above modfication is not OK for the ktls + ebpf redirect model.

[  816.687852] BUG: unable to handle page fault for address: ffff8f057ffdf000
[  816.687998] Oops: 0000 [#1] SMP PTI
[  816.688025] CPU: 2 PID: 619 Comm: kworker/2:3 Kdump: loaded Not tainted 5.10.0-ktls+ #4
[  816.688122] Workqueue: events sk_psock_backlog
[  816.688152] RIP: 0010:memcpy_erms+0x6/0x10
[  816.688263] RSP: 0018:ffff9c60c01cfc58 EFLAGS: 00010286
[  816.688292] RAX: ffff8f054f880000 RBX: 0000000000008000 RCX: 00000000000044bb
[  816.688330] RDX: 0000000000008000 RSI: ffff8f057ffdf000 RDI: ffff8f054f883b45
[  816.688376] RBP: 0000000000008000 R08: 0000000000000001 R09: 0000000000000000
[  816.688412] R10: 0000000000000001 R11: ffff9c60c01cfdd0 R12: ffff9c60c01cfdd0
[  816.688447] R13: ffff8f054f880000 R14: ffff9c60c01cfdb0 R15: 0000000000008000
[  816.688480] FS:  0000000000000000(0000) GS:ffff8f0775d00000(0000) knlGS:0000000000000000
[  816.688524] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  816.688564] CR2: ffff8f057ffdf000 CR3: 0000000120b16005 CR4: 0000000000370ee0
[  816.688600] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  816.688631] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  816.688661] Call Trace:
[  816.688681]  _copy_from_iter_full+0x1ee/0x270
[  816.688725]  tcp_sendmsg_locked+0x5aa/0xdc0
[  816.688749]  tcp_sendmsg+0x28/0x40
[  816.688769]  sock_sendmsg+0x57/0x60
[  816.688790]  ? sendmsg_unlocked+0x20/0x20
[  816.688811]  __skb_send_sock+0x29b/0x2f0
[  816.688847]  ? linear_to_page+0xf0/0xf0
[  816.688868]  sk_psock_backlog+0x85/0x1d0
[  816.688890]  process_one_work+0x1b0/0x350
[  816.688911]  worker_thread+0x49/0x300


In addition, I'm trying to explain performance degradation for the
ktls + ebpf redirect model compare to the following general proxy
model using TLS 1.3 (which are all using non-zerocopy), even though
ktls + ebpf redirect model reduce two cross-kernel data copies.
Now I find a memcpy_erms hot spot for _copy_from_iter_full() in
tcp_sendmsg_locked  within redirect_egress process. I suspect that
copy efficiency for the kvec is not good. But this just my guess, I
need to verify.

ktls           ktls
sender         proxy_recv proxy_send      recv
 |                |           |           |
 +----------------+           +-----------+

If you have some ideas, hope you can give me some advice.

Thand you!

> IDK what this is trying to do but I certainly depends on the fact 
> we run skb_cow_data() and is not "generally correct" :S
> .
> 

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2022-07-05  3:26 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-06-28 15:25 [PATCH] net: tls: fix tls with sk_redirect using a BPF verdict Julien Salleyron
2022-06-28 17:34 ` Jakub Kicinski
2022-06-29  7:00   ` John Fastabend
2022-06-29  8:58     ` Julien Salleyron
2022-06-30 15:21     ` Vadim Fedorenko
2022-07-05  3:26   ` Ziyang Xuan (William)

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.