* [RFC net] net/tls: clear SG markings on encryption error
@ 2019-11-22 21:45 Jakub Kicinski
2019-11-22 22:36 ` Jakub Kicinski
2019-11-23 3:25 ` John Fastabend
0 siblings, 2 replies; 6+ messages in thread
From: Jakub Kicinski @ 2019-11-22 21:45 UTC (permalink / raw)
To: borisp, aviadye, john.fastabend, daniel
Cc: netdev, Jakub Kicinski, syzbot+df0d4ec12332661dd1f9
When tls_do_encryption() fails the SG lists are left with the
SG_END and SG_CHAIN marks in place. One could hope that once
encryption fails we will never see the record again, but that
is in fact not true. Commit d3b18ad31f93 ("tls: add bpf support
to sk_msg handling") added special handling to ENOMEM and ENOSPC
errors which mean we may see the same record re-submitted.
In all honesty I don't understand why we need the ENOMEM handling.
Waiting for socket memory without setting SOCK_NOSPACE on any
random memory allocation failure seems slightly ill advised.
Having said that, undoing the SG markings seems wise regardless.
Reported-by: syzbot+df0d4ec12332661dd1f9@syzkaller.appspotmail.com
Fixes: 130b392c6cd6 ("net: tls: Add tls 1.3 support")
Fixes: d3b18ad31f93 ("tls: add bpf support to sk_msg handling")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
---
John, I'm sending this mostly to ask if we can safely remove
the ENOMEM handling? :)
I was going to try the sockmap tests myself, but looks like the current
LLVM 10 build I get from their debs just segfaults when trying to build
selftest :/
Also there's at least one more bug in this piece of code, TLS 1.3
can't assume there's at least one free SG entry.
net/tls/tls_sw.c | 13 +++++++++++++
1 file changed, 13 insertions(+)
diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index 24161750a737..4a0ea87b20cf 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -737,6 +737,19 @@ static int tls_push_record(struct sock *sk, int flags,
if (rc < 0) {
if (rc != -EINPROGRESS) {
tls_err_abort(sk, EBADMSG);
+
+ i = msg_pl->sg.end;
+ if (prot->version == TLS_1_3_VERSION) {
+ sg_mark_end(sk_msg_elem(msg_pl, i));
+ sg_unmark_end(sk_msg_elem(msg_pl, i));
+ }
+ sk_msg_iter_var_prev(i);
+ sg_unmark_end(sk_msg_elem(msg_pl, i));
+
+ i = msg_en->sg.end;
+ sk_msg_iter_var_prev(i);
+ sg_unmark_end(sk_msg_elem(msg_en, i));
+
if (split) {
tls_ctx->pending_open_record_frags = true;
tls_merge_open_record(sk, rec, tmp, orig_end);
--
2.23.0
^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [RFC net] net/tls: clear SG markings on encryption error
2019-11-22 21:45 [RFC net] net/tls: clear SG markings on encryption error Jakub Kicinski
@ 2019-11-22 22:36 ` Jakub Kicinski
2019-11-23 6:56 ` John Fastabend
2019-11-23 3:25 ` John Fastabend
1 sibling, 1 reply; 6+ messages in thread
From: Jakub Kicinski @ 2019-11-22 22:36 UTC (permalink / raw)
To: john.fastabend, daniel
Cc: borisp, aviadye, netdev, syzbot+df0d4ec12332661dd1f9
On Fri, 22 Nov 2019 13:45:53 -0800, Jakub Kicinski wrote:
> Also there's at least one more bug in this piece of code, TLS 1.3
> can't assume there's at least one free SG entry.
And I don't see any place where the front and back of the SG circular
buffer are actually chained :( This:
static inline void sk_msg_init(struct sk_msg *msg)
{
BUILD_BUG_ON(ARRAY_SIZE(msg->sg.data) - 1 != MAX_MSG_FRAGS);
memset(msg, 0, sizeof(*msg));
sg_init_marker(msg->sg.data, MAX_MSG_FRAGS);
}
looks questionable as well, we shouldn't mark MAX_MSG_FRAGS as the end,
we don't know where the end is going to be..
diff --git a/include/linux/skmsg.h b/include/linux/skmsg.h
index 6cb077b646a5..6c6ce6f90e7d 100644
--- a/include/linux/skmsg.h
+++ b/include/linux/skmsg.h
@@ -173,9 +173,8 @@ static inline void sk_msg_clear_meta(struct sk_msg *msg)
static inline void sk_msg_init(struct sk_msg *msg)
{
- BUILD_BUG_ON(ARRAY_SIZE(msg->sg.data) - 1 != MAX_MSG_FRAGS);
memset(msg, 0, sizeof(*msg));
- sg_init_marker(msg->sg.data, MAX_MSG_FRAGS);
+ sg_chain(msg->sg.data, ARRAY_SIZE(msg->sg.data), msg->sg.data);
}
static inline void sk_msg_xfer(struct sk_msg *dst, struct sk_msg *src,
Hm?
^ permalink raw reply related [flat|nested] 6+ messages in thread
* RE: [RFC net] net/tls: clear SG markings on encryption error
2019-11-22 21:45 [RFC net] net/tls: clear SG markings on encryption error Jakub Kicinski
2019-11-22 22:36 ` Jakub Kicinski
@ 2019-11-23 3:25 ` John Fastabend
2019-11-25 19:52 ` Jakub Kicinski
1 sibling, 1 reply; 6+ messages in thread
From: John Fastabend @ 2019-11-23 3:25 UTC (permalink / raw)
To: Jakub Kicinski, borisp, aviadye, john.fastabend, daniel
Cc: netdev, Jakub Kicinski, syzbot+df0d4ec12332661dd1f9
Jakub Kicinski wrote:
> When tls_do_encryption() fails the SG lists are left with the
> SG_END and SG_CHAIN marks in place. One could hope that once
> encryption fails we will never see the record again, but that
> is in fact not true. Commit d3b18ad31f93 ("tls: add bpf support
> to sk_msg handling") added special handling to ENOMEM and ENOSPC
> errors which mean we may see the same record re-submitted.
>
> In all honesty I don't understand why we need the ENOMEM handling.
> Waiting for socket memory without setting SOCK_NOSPACE on any
> random memory allocation failure seems slightly ill advised.
>
> Having said that, undoing the SG markings seems wise regardless.
>
> Reported-by: syzbot+df0d4ec12332661dd1f9@syzkaller.appspotmail.com
> Fixes: 130b392c6cd6 ("net: tls: Add tls 1.3 support")
> Fixes: d3b18ad31f93 ("tls: add bpf support to sk_msg handling")
> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
> ---
> John, I'm sending this mostly to ask if we can safely remove
> the ENOMEM handling? :)
>
What ENOMEM are you asking about here? The return code handling
from bpf_exec_tx_verdict?
ret = bpf_exec_tx_verdict(msg_pl, sk, full_record,
record_type, &copied,
msg->msg_flags);
if (ret) {
if (ret == -EINPROGRESS)
num_async++;
else if (ret == -ENOMEM)
goto wait_for_memory;
else if (ret == -ENOSPC)
goto rollback_iter;
else if (ret != -EAGAIN)
goto send_end;
}
I would want to run it through some of our tests but I don't think
there is anything specific about BPF that needs it to be handled.
I was just trying to handle the error case gracefully and
wait_for_memory seems like the right behavior to me. What is ill
advised here?
> I was going to try the sockmap tests myself, but looks like the current
> LLVM 10 build I get from their debs just segfaults when trying to build
> selftest :/
>
> Also there's at least one more bug in this piece of code, TLS 1.3
> can't assume there's at least one free SG entry.
There should always be one free SG entry at the end of the ring
that is used for chaining.
From sk_msg_sg{}
/* The extra element is used for chaining the front and sections when
* the list becomes partitioned (e.g. end < start). The crypto APIs
* require the chaining.
*/
struct scatterlist data[MAX_MSG_FRAGS + 1];
Can we use that element in that case? Otherwise probably can
add an extra element there if needed, data[MAX_MSG_FRAGS + 2].
>
> net/tls/tls_sw.c | 13 +++++++++++++
> 1 file changed, 13 insertions(+)
>
> diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
> index 24161750a737..4a0ea87b20cf 100644
> --- a/net/tls/tls_sw.c
> +++ b/net/tls/tls_sw.c
> @@ -737,6 +737,19 @@ static int tls_push_record(struct sock *sk, int flags,
> if (rc < 0) {
> if (rc != -EINPROGRESS) {
> tls_err_abort(sk, EBADMSG);
> +
> + i = msg_pl->sg.end;
> + if (prot->version == TLS_1_3_VERSION) {
> + sg_mark_end(sk_msg_elem(msg_pl, i));
> + sg_unmark_end(sk_msg_elem(msg_pl, i));
> + }
> + sk_msg_iter_var_prev(i);
> + sg_unmark_end(sk_msg_elem(msg_pl, i));
> +
> + i = msg_en->sg.end;
> + sk_msg_iter_var_prev(i);
> + sg_unmark_end(sk_msg_elem(msg_en, i));
> +
> if (split) {
> tls_ctx->pending_open_record_frags = true;
> tls_merge_open_record(sk, rec, tmp, orig_end);
> --
> 2.23.0
>
Can you copy the tls_push_record() error handling from BPF side instead of
embedding more into tls_push_record itself?
err = tls_push_record(sk, flags, record_type);
if (err < 0) {
*copied -= sk_msg_free(sk, msg);
tls_free_open_rec(sk);
goto out_err;
}
If the BPF program is not installed I guess you can skip the copied part
because you wont have the 'more_data' case.
So something like (untested/compiled/etc)
diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index 141da093ff04..0469eb73bc88 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -771,8 +771,14 @@ static int bpf_exec_tx_verdict(struct sk_msg *msg, struct sock *sk,
policy = !(flags & MSG_SENDPAGE_NOPOLICY);
psock = sk_psock_get(sk);
- if (!psock || !policy)
- return tls_push_record(sk, flags, record_type);
+ if (!psock || !policy) {
+ err = tls_push_record(sk, flags, record_type);
+ if (err) {
+ sk_msg_free(sk, msg);
+ tls_free_open_rec(sk); // might not be needed in noBPF
+ }
+ return err;
+ }
more_data:
enospc = sk_msg_full(msg);
if (psock->eval == __SK_NONE) {
^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [RFC net] net/tls: clear SG markings on encryption error
2019-11-22 22:36 ` Jakub Kicinski
@ 2019-11-23 6:56 ` John Fastabend
2019-11-25 19:58 ` Jakub Kicinski
0 siblings, 1 reply; 6+ messages in thread
From: John Fastabend @ 2019-11-23 6:56 UTC (permalink / raw)
To: Jakub Kicinski, john.fastabend, daniel
Cc: borisp, aviadye, netdev, syzbot+df0d4ec12332661dd1f9
Jakub Kicinski wrote:
> On Fri, 22 Nov 2019 13:45:53 -0800, Jakub Kicinski wrote:
> > Also there's at least one more bug in this piece of code, TLS 1.3
> > can't assume there's at least one free SG entry.
>
> And I don't see any place where the front and back of the SG circular
> buffer are actually chained :( This:
The easiest way to generate a message that needs to be chained is to
use cork but we haven't yet enabled cork. However, there is one case
with the use of apply, pass, and drop that I think this case could
also be generated. I'll add a test for it and a fix. This case should
only be hit when using with BPF and programs using apply/cork.
I have the patches for cork support on a branch as well so we should
probably just send those out.
>
> static inline void sk_msg_init(struct sk_msg *msg)
> {
> BUILD_BUG_ON(ARRAY_SIZE(msg->sg.data) - 1 != MAX_MSG_FRAGS);
> memset(msg, 0, sizeof(*msg));
> sg_init_marker(msg->sg.data, MAX_MSG_FRAGS);
> }
>
> looks questionable as well, we shouldn't mark MAX_MSG_FRAGS as the end,
> we don't know where the end is going to be..
We use end->MAX_MSG_FRAGS and size==0 to indicate an fresh sk_msg. This
should only ever be called to initialize a msg never afterwards.
>
> diff --git a/include/linux/skmsg.h b/include/linux/skmsg.h
> index 6cb077b646a5..6c6ce6f90e7d 100644
> --- a/include/linux/skmsg.h
> +++ b/include/linux/skmsg.h
> @@ -173,9 +173,8 @@ static inline void sk_msg_clear_meta(struct sk_msg *msg)
>
> static inline void sk_msg_init(struct sk_msg *msg)
> {
> - BUILD_BUG_ON(ARRAY_SIZE(msg->sg.data) - 1 != MAX_MSG_FRAGS);
> memset(msg, 0, sizeof(*msg));
> - sg_init_marker(msg->sg.data, MAX_MSG_FRAGS);
> + sg_chain(msg->sg.data, ARRAY_SIZE(msg->sg.data), msg->sg.data);
> }
>
I don't think we want to chain these here. We could drop the init
marker part but its handy when reading sg values.
> static inline void sk_msg_xfer(struct sk_msg *dst, struct sk_msg *src,
>
> Hm?
Nice catch on the missing chaining we dropped across various revisions
and rebases of the code. Without cork support our test cases don't hit
it now.
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [RFC net] net/tls: clear SG markings on encryption error
2019-11-23 3:25 ` John Fastabend
@ 2019-11-25 19:52 ` Jakub Kicinski
0 siblings, 0 replies; 6+ messages in thread
From: Jakub Kicinski @ 2019-11-25 19:52 UTC (permalink / raw)
To: John Fastabend
Cc: borisp, aviadye, daniel, netdev, syzbot+df0d4ec12332661dd1f9
Sorry for the delay, this code is too tricky for me to handle while
applying patches :) Lemme quickly answer best I can and go back into
digging deeper.
On Fri, 22 Nov 2019 19:25:13 -0800, John Fastabend wrote:
> Jakub Kicinski wrote:
> > When tls_do_encryption() fails the SG lists are left with the
> > SG_END and SG_CHAIN marks in place. One could hope that once
> > encryption fails we will never see the record again, but that
> > is in fact not true. Commit d3b18ad31f93 ("tls: add bpf support
> > to sk_msg handling") added special handling to ENOMEM and ENOSPC
> > errors which mean we may see the same record re-submitted.
> >
> > In all honesty I don't understand why we need the ENOMEM handling.
> > Waiting for socket memory without setting SOCK_NOSPACE on any
> > random memory allocation failure seems slightly ill advised.
> >
> > Having said that, undoing the SG markings seems wise regardless.
> >
> > Reported-by: syzbot+df0d4ec12332661dd1f9@syzkaller.appspotmail.com
> > Fixes: 130b392c6cd6 ("net: tls: Add tls 1.3 support")
> > Fixes: d3b18ad31f93 ("tls: add bpf support to sk_msg handling")
> > Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
> > ---
> > John, I'm sending this mostly to ask if we can safely remove
> > the ENOMEM handling? :)
> >
> What ENOMEM are you asking about here? The return code handling
> from bpf_exec_tx_verdict?
>
>
> ret = bpf_exec_tx_verdict(msg_pl, sk, full_record,
> record_type, &copied,
> msg->msg_flags);
> if (ret) {
> if (ret == -EINPROGRESS)
> num_async++;
> else if (ret == -ENOMEM)
> goto wait_for_memory;
> else if (ret == -ENOSPC)
> goto rollback_iter;
> else if (ret != -EAGAIN)
> goto send_end;
> }
>
> I would want to run it through some of our tests but I don't think
> there is anything specific about BPF that needs it to be handled.
> I was just trying to handle the error case gracefully and
> wait_for_memory seems like the right behavior to me. What is ill
> advised here?
It's probably too strong of an expression :)
I was not seeing a clear relationship between ENOMEM and socket being
out of memory. The ENOMEM is likely because of a slab allocation
failure (quick look doesn't really reveal send_pages returning ENOMEM
if socket is full). So the wait is equivalent to a msleep, no?
I was a little worried about this error handling, it seems to make
assumptions about the reasons for errors. I heard a report of user
seeing ENOSPC coming out of a crypto accelerator, which didn't end well.
> > I was going to try the sockmap tests myself, but looks like the current
> > LLVM 10 build I get from their debs just segfaults when trying to build
> > selftest :/
> >
> > Also there's at least one more bug in this piece of code, TLS 1.3
> > can't assume there's at least one free SG entry.
>
> There should always be one free SG entry at the end of the ring
> that is used for chaining.
>
> From sk_msg_sg{}
>
> /* The extra element is used for chaining the front and sections when
> * the list becomes partitioned (e.g. end < start). The crypto APIs
> * require the chaining.
> */
> struct scatterlist data[MAX_MSG_FRAGS + 1];
>
> Can we use that element in that case? Otherwise probably can
> add an extra element there if needed, data[MAX_MSG_FRAGS + 2].
If sg really is a circular buffer we'd need to shift all entries to
make sure the hole is in the right place. Then we have a chain entry
in the middle of the ring, and that's not good, right now the range
of [0, MAX_MSG_FRAGS) is assumed to contain data pages :S
> > net/tls/tls_sw.c | 13 +++++++++++++
> > 1 file changed, 13 insertions(+)
> >
> > diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
> > index 24161750a737..4a0ea87b20cf 100644
> > --- a/net/tls/tls_sw.c
> > +++ b/net/tls/tls_sw.c
> > @@ -737,6 +737,19 @@ static int tls_push_record(struct sock *sk, int flags,
> > if (rc < 0) {
> > if (rc != -EINPROGRESS) {
> > tls_err_abort(sk, EBADMSG);
> > +
> > + i = msg_pl->sg.end;
> > + if (prot->version == TLS_1_3_VERSION) {
> > + sg_mark_end(sk_msg_elem(msg_pl, i));
> > + sg_unmark_end(sk_msg_elem(msg_pl, i));
> > + }
> > + sk_msg_iter_var_prev(i);
> > + sg_unmark_end(sk_msg_elem(msg_pl, i));
> > +
> > + i = msg_en->sg.end;
> > + sk_msg_iter_var_prev(i);
> > + sg_unmark_end(sk_msg_elem(msg_en, i));
> > +
> > if (split) {
> > tls_ctx->pending_open_record_frags = true;
> > tls_merge_open_record(sk, rec, tmp, orig_end);
>
> Can you copy the tls_push_record() error handling from BPF side instead of
> embedding more into tls_push_record itself?
>
> err = tls_push_record(sk, flags, record_type);
> if (err < 0) {
> *copied -= sk_msg_free(sk, msg);
> tls_free_open_rec(sk);
> goto out_err;
> }
>
> If the BPF program is not installed I guess you can skip the copied part
> because you wont have the 'more_data' case.
>
> So something like (untested/compiled/etc)
>
> diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
> index 141da093ff04..0469eb73bc88 100644
> --- a/net/tls/tls_sw.c
> +++ b/net/tls/tls_sw.c
> @@ -771,8 +771,14 @@ static int bpf_exec_tx_verdict(struct sk_msg *msg, struct sock *sk,
>
> policy = !(flags & MSG_SENDPAGE_NOPOLICY);
> psock = sk_psock_get(sk);
> - if (!psock || !policy)
> - return tls_push_record(sk, flags, record_type);
> + if (!psock || !policy) {
> + err = tls_push_record(sk, flags, record_type);
> + if (err) {
> + sk_msg_free(sk, msg);
> + tls_free_open_rec(sk); // might not be needed in noBPF
> + }
> + return err;
> + }
> more_data:
> enospc = sk_msg_full(msg);
> if (psock->eval == __SK_NONE) {
Ah, that's a great suggestion, I missed the copied count is passed by
reference into bpf_exec_tx_verdict() and thought more surgery would be
required to handle it like that.
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [RFC net] net/tls: clear SG markings on encryption error
2019-11-23 6:56 ` John Fastabend
@ 2019-11-25 19:58 ` Jakub Kicinski
0 siblings, 0 replies; 6+ messages in thread
From: Jakub Kicinski @ 2019-11-25 19:58 UTC (permalink / raw)
To: John Fastabend
Cc: daniel, borisp, aviadye, netdev, syzbot+df0d4ec12332661dd1f9
On Fri, 22 Nov 2019 22:56:05 -0800, John Fastabend wrote:
> Jakub Kicinski wrote:
> > On Fri, 22 Nov 2019 13:45:53 -0800, Jakub Kicinski wrote:
> > > Also there's at least one more bug in this piece of code, TLS 1.3
> > > can't assume there's at least one free SG entry.
> >
> > And I don't see any place where the front and back of the SG circular
> > buffer are actually chained :( This:
>
> The easiest way to generate a message that needs to be chained is to
> use cork but we haven't yet enabled cork. However, there is one case
> with the use of apply, pass, and drop that I think this case could
> also be generated. I'll add a test for it and a fix. This case should
> only be hit when using with BPF and programs using apply/cork.
>
> I have the patches for cork support on a branch as well so we should
> probably just send those out.
>
> >
> > static inline void sk_msg_init(struct sk_msg *msg)
> > {
> > BUILD_BUG_ON(ARRAY_SIZE(msg->sg.data) - 1 != MAX_MSG_FRAGS);
> > memset(msg, 0, sizeof(*msg));
> > sg_init_marker(msg->sg.data, MAX_MSG_FRAGS);
> > }
> >
> > looks questionable as well, we shouldn't mark MAX_MSG_FRAGS as the end,
> > we don't know where the end is going to be..
>
> We use end->MAX_MSG_FRAGS and size==0 to indicate an fresh sk_msg. This
> should only ever be called to initialize a msg never afterwards.
>
> >
> > diff --git a/include/linux/skmsg.h b/include/linux/skmsg.h
> > index 6cb077b646a5..6c6ce6f90e7d 100644
> > --- a/include/linux/skmsg.h
> > +++ b/include/linux/skmsg.h
> > @@ -173,9 +173,8 @@ static inline void sk_msg_clear_meta(struct sk_msg *msg)
> >
> > static inline void sk_msg_init(struct sk_msg *msg)
> > {
> > - BUILD_BUG_ON(ARRAY_SIZE(msg->sg.data) - 1 != MAX_MSG_FRAGS);
> > memset(msg, 0, sizeof(*msg));
> > - sg_init_marker(msg->sg.data, MAX_MSG_FRAGS);
> > + sg_chain(msg->sg.data, ARRAY_SIZE(msg->sg.data), msg->sg.data);
> > }
> >
>
> I don't think we want to chain these here. We could drop the init
> marker part but its handy when reading sg values.
>
> > static inline void sk_msg_xfer(struct sk_msg *dst, struct sk_msg *src,
> >
> > Hm?
>
> Nice catch on the missing chaining we dropped across various revisions
> and rebases of the code. Without cork support our test cases don't hit
> it now.
I see, thanks for the explanation! I'll leave fixing the chaining to
you, then.
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2019-11-25 19:59 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-11-22 21:45 [RFC net] net/tls: clear SG markings on encryption error Jakub Kicinski
2019-11-22 22:36 ` Jakub Kicinski
2019-11-23 6:56 ` John Fastabend
2019-11-25 19:58 ` Jakub Kicinski
2019-11-23 3:25 ` John Fastabend
2019-11-25 19:52 ` Jakub Kicinski
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).