All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH net] net/tls(TLS_SW): Fix integrity issue with non-blocking sw KTLS request
@ 2020-05-17 16:26 Pooja Trivedi
  2020-05-18 22:50 ` Jakub Kicinski
  0 siblings, 1 reply; 6+ messages in thread
From: Pooja Trivedi @ 2020-05-17 16:26 UTC (permalink / raw)
  To: borisp, aviadye, john.fastabend, daniel, kuba, davem, vakul.garg,
	netdev, linux-crypto, mallesham.jatharkonda, josh.tway,
	pooja.trivedi

In pure sw ktls(AES-NI), -EAGAIN from tcp layer (do_tcp_sendpages for
encrypted record) gets treated as error, subtracts the offset, and
returns to application. Because of this, application sends data from
subtracted offset, which leads to data integrity issue. Since record is
already encrypted, ktls module marks it as partially sent and pushes the
packet to tcp layer in the following iterations (either from bottom half
or when pushing next chunk). So returning success in case of EAGAIN
will fix the issue.

Fixes: a42055e8d2c3 ("net/tls: Add support for async encryption")
Signed-off-by: Pooja Trivedi <pooja.trivedi@stackpath.com>
Reviewed-by: Mallesham Jatharkonda <mallesham.jatharkonda@oneconvergence.com>
Reviewed-by: Josh Tway <josh.tway@stackpath.com>
---
 net/tls/tls_sw.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index e23f94a..d8ebdfc 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -415,10 +415,12 @@ int tls_tx_records(struct sock *sk, int flags)
 	}
 
 tx_err:
-	if (rc < 0 && rc != -EAGAIN)
+	if (rc < 0 && rc != -EAGAIN) {
 		tls_err_abort(sk, EBADMSG);
+		return rc;
+	}
 
-	return rc;
+	return 0;
 }
 
 static void tls_encrypt_done(struct crypto_async_request *req, int err)
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH net] net/tls(TLS_SW): Fix integrity issue with non-blocking sw KTLS request
  2020-05-17 16:26 [PATCH net] net/tls(TLS_SW): Fix integrity issue with non-blocking sw KTLS request Pooja Trivedi
@ 2020-05-18 22:50 ` Jakub Kicinski
  2020-05-19 17:21   ` Pooja Trivedi
  0 siblings, 1 reply; 6+ messages in thread
From: Jakub Kicinski @ 2020-05-18 22:50 UTC (permalink / raw)
  To: Pooja Trivedi
  Cc: borisp, aviadye, john.fastabend, daniel, davem, vakul.garg,
	netdev, linux-crypto, mallesham.jatharkonda, josh.tway,
	pooja.trivedi

On Sun, 17 May 2020 16:26:36 +0000 Pooja Trivedi wrote:
> In pure sw ktls(AES-NI), -EAGAIN from tcp layer (do_tcp_sendpages for
> encrypted record) gets treated as error, subtracts the offset, and
> returns to application. Because of this, application sends data from
> subtracted offset, which leads to data integrity issue. Since record is
> already encrypted, ktls module marks it as partially sent and pushes the
> packet to tcp layer in the following iterations (either from bottom half
> or when pushing next chunk). So returning success in case of EAGAIN
> will fix the issue.
> 
> Fixes: a42055e8d2c3 ("net/tls: Add support for async encryption")
> Signed-off-by: Pooja Trivedi <pooja.trivedi@stackpath.com>
> Reviewed-by: Mallesham Jatharkonda <mallesham.jatharkonda@oneconvergence.com>
> Reviewed-by: Josh Tway <josh.tway@stackpath.com>

This looks reasonable, I think. Next time user space calls if no new
buffer space was made available it will get a -EAGAIN, right?

Two questions - is there any particular application or use case that
runs into this? Seems a bit surprising to see a patch from Vadim and
you guys come at the same time.

Could you also add test for this bug? 
In tools/testing/selftests/net/tls.c

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH net] net/tls(TLS_SW): Fix integrity issue with non-blocking sw KTLS request
  2020-05-18 22:50 ` Jakub Kicinski
@ 2020-05-19 17:21   ` Pooja Trivedi
  2020-05-19 21:42     ` Jakub Kicinski
  0 siblings, 1 reply; 6+ messages in thread
From: Pooja Trivedi @ 2020-05-19 17:21 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: borisp, aviadye, john.fastabend, daniel, davem, vakul.garg,
	netdev, linux-crypto, mallesham.jatharkonda, josh.tway,
	Pooja Trivedi

On Mon, May 18, 2020 at 6:50 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Sun, 17 May 2020 16:26:36 +0000 Pooja Trivedi wrote:
> > In pure sw ktls(AES-NI), -EAGAIN from tcp layer (do_tcp_sendpages for
> > encrypted record) gets treated as error, subtracts the offset, and
> > returns to application. Because of this, application sends data from
> > subtracted offset, which leads to data integrity issue. Since record is
> > already encrypted, ktls module marks it as partially sent and pushes the
> > packet to tcp layer in the following iterations (either from bottom half
> > or when pushing next chunk). So returning success in case of EAGAIN
> > will fix the issue.
> >
> > Fixes: a42055e8d2c3 ("net/tls: Add support for async encryption")
> > Signed-off-by: Pooja Trivedi <pooja.trivedi@stackpath.com>
> > Reviewed-by: Mallesham Jatharkonda <mallesham.jatharkonda@oneconvergence.com>
> > Reviewed-by: Josh Tway <josh.tway@stackpath.com>
>
> This looks reasonable, I think. Next time user space calls if no new
> buffer space was made available it will get a -EAGAIN, right?
>

Yes, this fix should only affect encrypted record. Plain text calls from
user space should be unaffected.

>
> Two questions - is there any particular application or use case that
> runs into this?
>

We are running into this case when we hit our kTLS-enabled homegrown
webserver with a 'pipeline' test tool, also homegrown. The issue basically
happens whenever the send buffer on the server gets full and TCP layer
returns EAGAIN when attempting to TX the encrypted record. In fact, we
are also able to reproduce the issue by using a simple wget with a large
file, if/when sndbuf fills up.

> Seems a bit surprising to see a patch from Vadim and
> you guys come at the same time.
>

Not familiar with Vadim or her/his patch. Could you please point me to it?

>
> Could you also add test for this bug?
> In tools/testing/selftests/net/tls.c
>

Sure, yes. Let me look into this.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH net] net/tls(TLS_SW): Fix integrity issue with non-blocking sw KTLS request
  2020-05-19 17:21   ` Pooja Trivedi
@ 2020-05-19 21:42     ` Jakub Kicinski
  2020-05-20 19:56       ` Pooja Trivedi
  0 siblings, 1 reply; 6+ messages in thread
From: Jakub Kicinski @ 2020-05-19 21:42 UTC (permalink / raw)
  To: Pooja Trivedi
  Cc: borisp, aviadye, john.fastabend, daniel, davem, vakul.garg,
	netdev, linux-crypto, mallesham.jatharkonda, josh.tway,
	Pooja Trivedi

On Tue, 19 May 2020 13:21:56 -0400 Pooja Trivedi wrote:
> On Mon, May 18, 2020 at 6:50 PM Jakub Kicinski <kuba@kernel.org> wrote:
> > On Sun, 17 May 2020 16:26:36 +0000 Pooja Trivedi wrote:  
> > > In pure sw ktls(AES-NI), -EAGAIN from tcp layer (do_tcp_sendpages for
> > > encrypted record) gets treated as error, subtracts the offset, and
> > > returns to application. Because of this, application sends data from
> > > subtracted offset, which leads to data integrity issue. Since record is
> > > already encrypted, ktls module marks it as partially sent and pushes the
> > > packet to tcp layer in the following iterations (either from bottom half
> > > or when pushing next chunk). So returning success in case of EAGAIN
> > > will fix the issue.
> > >
> > > Fixes: a42055e8d2c3 ("net/tls: Add support for async encryption")
> > > Signed-off-by: Pooja Trivedi <pooja.trivedi@stackpath.com>
> > > Reviewed-by: Mallesham Jatharkonda <mallesham.jatharkonda@oneconvergence.com>
> > > Reviewed-by: Josh Tway <josh.tway@stackpath.com>  
> >
> > This looks reasonable, I think. Next time user space calls if no new
> > buffer space was made available it will get a -EAGAIN, right?
> 
> Yes, this fix should only affect encrypted record. Plain text calls from
> user space should be unaffected.

AFAICS if TCP layer is full next call from user space should hit
sk_stream_wait_memory() immediately and if it has MSG_DONTWAIT set 
exit with EAGAIN. Which I believe to be correct behavior.

> > Two questions - is there any particular application or use case that
> > runs into this?
> 
> We are running into this case when we hit our kTLS-enabled homegrown
> webserver with a 'pipeline' test tool, also homegrown. The issue basically
> happens whenever the send buffer on the server gets full and TCP layer
> returns EAGAIN when attempting to TX the encrypted record. In fact, we
> are also able to reproduce the issue by using a simple wget with a large
> file, if/when sndbuf fills up.

I see just a coincidence, then, no worries.

> > Seems a bit surprising to see a patch from Vadim and
> > you guys come at the same time.
> 
> Not familiar with Vadim or her/his patch. Could you please point me to it?

http://patchwork.ozlabs.org/project/netdev/patch/20200517014451.954F05026DE@novek.ru/

> > Could you also add test for this bug?
> > In tools/testing/selftests/net/tls.c
> >  
> 
> Sure, yes. Let me look into this.

Thanks!

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH net] net/tls(TLS_SW): Fix integrity issue with non-blocking sw KTLS request
  2020-05-19 21:42     ` Jakub Kicinski
@ 2020-05-20 19:56       ` Pooja Trivedi
  2020-05-20 20:12         ` Jakub Kicinski
  0 siblings, 1 reply; 6+ messages in thread
From: Pooja Trivedi @ 2020-05-20 19:56 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: borisp, aviadye, john.fastabend, daniel, davem, vakul.garg,
	netdev, linux-crypto, mallesham.jatharkonda, josh.tway,
	Pooja Trivedi

On Tue, May 19, 2020 at 5:43 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Tue, 19 May 2020 13:21:56 -0400 Pooja Trivedi wrote:
> > On Mon, May 18, 2020 at 6:50 PM Jakub Kicinski <kuba@kernel.org> wrote:
> > > On Sun, 17 May 2020 16:26:36 +0000 Pooja Trivedi wrote:
> > > > In pure sw ktls(AES-NI), -EAGAIN from tcp layer (do_tcp_sendpages for
> > > > encrypted record) gets treated as error, subtracts the offset, and
> > > > returns to application. Because of this, application sends data from
> > > > subtracted offset, which leads to data integrity issue. Since record is
> > > > already encrypted, ktls module marks it as partially sent and pushes the
> > > > packet to tcp layer in the following iterations (either from bottom half
> > > > or when pushing next chunk). So returning success in case of EAGAIN
> > > > will fix the issue.
> > > >
> > > > Fixes: a42055e8d2c3 ("net/tls: Add support for async encryption")
> > > > Signed-off-by: Pooja Trivedi <pooja.trivedi@stackpath.com>
> > > > Reviewed-by: Mallesham Jatharkonda <mallesham.jatharkonda@oneconvergence.com>
> > > > Reviewed-by: Josh Tway <josh.tway@stackpath.com>
> > >
> > > This looks reasonable, I think. Next time user space calls if no new
> > > buffer space was made available it will get a -EAGAIN, right?
> >
> > Yes, this fix should only affect encrypted record. Plain text calls from
> > user space should be unaffected.
>
> AFAICS if TCP layer is full next call from user space should hit
> sk_stream_wait_memory() immediately and if it has MSG_DONTWAIT set
> exit with EAGAIN. Which I believe to be correct behavior.
>

The flow is tls_sw_sendmsg/tls_sw_do_sendpage --> bpf_exec_tx_verdict -->
tls_push_record --> tls_tx_records --> tls_push_sg --> do_tcp_sendpages

do_tcp_sendpages() sends partial record, 'retry:' label is exercised wherein
do_tcp_sendpages gets called again and returns -EAGAIN.
tls_push_sg sets partially_sent_record/partially_sent_offset and
returns -EAGAIN. -EAGAIN bubbles up to bpf_exec_tx_verdict.
In bpf_exec_tx_verdict, the following code causes 'copied' variable to
get updated to a negative value and returns -EAGAIN.

                err = tls_push_record(sk, flags, record_type);
                if (err && err != -EINPROGRESS) {
                        *copied -= sk_msg_free(sk, msg);
                        tls_free_open_rec(sk);
                }
                return err;

-EAGAIN returned by bpf_exec_tx_verdict causes
tls_sw_sendmsg/tls_sw_do_sendpage to 'continue' in the while loop and
call sk_stream_wait_memory(). sk_stream_wait_memory returns -EAGAIN
also and control reaches the 'send_end:' label. The following return
statement causes a negative 'copied' variable value to be returned to the
user space.

        return copied ? copied : ret;

User space applies this negative value as offset for the next send, causing
part of the record that was already sent to be pushed again.

Hope this clarifies it.


>
> > > Two questions - is there any particular application or use case that
> > > runs into this?
> >
> > We are running into this case when we hit our kTLS-enabled homegrown
> > webserver with a 'pipeline' test tool, also homegrown. The issue basically
> > happens whenever the send buffer on the server gets full and TCP layer
> > returns EAGAIN when attempting to TX the encrypted record. In fact, we
> > are also able to reproduce the issue by using a simple wget with a large
> > file, if/when sndbuf fills up.
>
> I see just a coincidence, then, no worries.
>
> > > Seems a bit surprising to see a patch from Vadim and
> > > you guys come at the same time.
> >
> > Not familiar with Vadim or her/his patch. Could you please point me to it?
>
> http://patchwork.ozlabs.org/project/netdev/patch/20200517014451.954F05026DE@novek.ru/
>

Ah, looks like Vadim ran into the exact issue!

>
> > > Could you also add test for this bug?
> > > In tools/testing/selftests/net/tls.c
> > >
> >
> > Sure, yes. Let me look into this.
>
> Thanks!

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH net] net/tls(TLS_SW): Fix integrity issue with non-blocking sw KTLS request
  2020-05-20 19:56       ` Pooja Trivedi
@ 2020-05-20 20:12         ` Jakub Kicinski
  0 siblings, 0 replies; 6+ messages in thread
From: Jakub Kicinski @ 2020-05-20 20:12 UTC (permalink / raw)
  To: Pooja Trivedi
  Cc: borisp, aviadye, john.fastabend, daniel, davem, vakul.garg,
	netdev, linux-crypto, mallesham.jatharkonda, josh.tway,
	Pooja Trivedi

On Wed, 20 May 2020 15:56:56 -0400 Pooja Trivedi wrote:
> On Tue, May 19, 2020 at 5:43 PM Jakub Kicinski <kuba@kernel.org> wrote:
> >
> > On Tue, 19 May 2020 13:21:56 -0400 Pooja Trivedi wrote:  
> > > On Mon, May 18, 2020 at 6:50 PM Jakub Kicinski <kuba@kernel.org> wrote:  
> > > > On Sun, 17 May 2020 16:26:36 +0000 Pooja Trivedi wrote:  
> > > > > In pure sw ktls(AES-NI), -EAGAIN from tcp layer (do_tcp_sendpages for
> > > > > encrypted record) gets treated as error, subtracts the offset, and
> > > > > returns to application. Because of this, application sends data from
> > > > > subtracted offset, which leads to data integrity issue. Since record is
> > > > > already encrypted, ktls module marks it as partially sent and pushes the
> > > > > packet to tcp layer in the following iterations (either from bottom half
> > > > > or when pushing next chunk). So returning success in case of EAGAIN
> > > > > will fix the issue.
> > > > >
> > > > > Fixes: a42055e8d2c3 ("net/tls: Add support for async encryption")
> > > > > Signed-off-by: Pooja Trivedi <pooja.trivedi@stackpath.com>
> > > > > Reviewed-by: Mallesham Jatharkonda <mallesham.jatharkonda@oneconvergence.com>
> > > > > Reviewed-by: Josh Tway <josh.tway@stackpath.com>  
> > > >
> > > > This looks reasonable, I think. Next time user space calls if no new
> > > > buffer space was made available it will get a -EAGAIN, right?  
> > >
> > > Yes, this fix should only affect encrypted record. Plain text calls from
> > > user space should be unaffected.  
> >
> > AFAICS if TCP layer is full next call from user space should hit
> > sk_stream_wait_memory() immediately and if it has MSG_DONTWAIT set
> > exit with EAGAIN. Which I believe to be correct behavior.
> >  
> 
> The flow is tls_sw_sendmsg/tls_sw_do_sendpage --> bpf_exec_tx_verdict -->
> tls_push_record --> tls_tx_records --> tls_push_sg --> do_tcp_sendpages
> 
> do_tcp_sendpages() sends partial record, 'retry:' label is exercised wherein
> do_tcp_sendpages gets called again and returns -EAGAIN.
> tls_push_sg sets partially_sent_record/partially_sent_offset and
> returns -EAGAIN. -EAGAIN bubbles up to bpf_exec_tx_verdict.
> In bpf_exec_tx_verdict, the following code causes 'copied' variable to
> get updated to a negative value and returns -EAGAIN.
> 
>                 err = tls_push_record(sk, flags, record_type);
>                 if (err && err != -EINPROGRESS) {
>                         *copied -= sk_msg_free(sk, msg);
>                         tls_free_open_rec(sk);
>                 }
>                 return err;
> 
> -EAGAIN returned by bpf_exec_tx_verdict causes
> tls_sw_sendmsg/tls_sw_do_sendpage to 'continue' in the while loop and
> call sk_stream_wait_memory(). sk_stream_wait_memory returns -EAGAIN
> also and control reaches the 'send_end:' label. The following return
> statement causes a negative 'copied' variable value to be returned to the
> user space.
> 
>         return copied ? copied : ret;
> 
> User space applies this negative value as offset for the next send, causing
> part of the record that was already sent to be pushed again.
> 
> Hope this clarifies it.

Oh yes, sorry I was talking about the behavior _after_ your patch, on
the _next_ sendmsg/sendpage call.

It should now work like this:

	bpf_exec_tx_verdict() returns success, next iteration of the
	sendmsg/sendpage loop hits sk_stream_wait_memory(), we return
	positive copied which is counts the entire record, even though
	some of it is still in partially_sent_record. If user space
	calls sendmsg again we will hit sk_stream_wait_memory() ->
	send_end -> this time copied is 0, so user space will see
	-EAGAIN.

If I'm still not making sense don't worry about it, I think it should
be easy to explain based on the selftest.

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2020-05-20 20:12 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-05-17 16:26 [PATCH net] net/tls(TLS_SW): Fix integrity issue with non-blocking sw KTLS request Pooja Trivedi
2020-05-18 22:50 ` Jakub Kicinski
2020-05-19 17:21   ` Pooja Trivedi
2020-05-19 21:42     ` Jakub Kicinski
2020-05-20 19:56       ` Pooja Trivedi
2020-05-20 20:12         ` Jakub Kicinski

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.