* [RFC net] net/tls: clear SG markings on encryption error @ 2019-11-22 21:45 Jakub Kicinski 2019-11-22 22:36 ` Jakub Kicinski 2019-11-23 3:25 ` John Fastabend 0 siblings, 2 replies; 6+ messages in thread From: Jakub Kicinski @ 2019-11-22 21:45 UTC (permalink / raw) To: borisp, aviadye, john.fastabend, daniel Cc: netdev, Jakub Kicinski, syzbot+df0d4ec12332661dd1f9 When tls_do_encryption() fails the SG lists are left with the SG_END and SG_CHAIN marks in place. One could hope that once encryption fails we will never see the record again, but that is in fact not true. Commit d3b18ad31f93 ("tls: add bpf support to sk_msg handling") added special handling to ENOMEM and ENOSPC errors which mean we may see the same record re-submitted. In all honesty I don't understand why we need the ENOMEM handling. Waiting for socket memory without setting SOCK_NOSPACE on any random memory allocation failure seems slightly ill advised. Having said that, undoing the SG markings seems wise regardless. Reported-by: syzbot+df0d4ec12332661dd1f9@syzkaller.appspotmail.com Fixes: 130b392c6cd6 ("net: tls: Add tls 1.3 support") Fixes: d3b18ad31f93 ("tls: add bpf support to sk_msg handling") Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> --- John, I'm sending this mostly to ask if we can safely remove the ENOMEM handling? :) I was going to try the sockmap tests myself, but looks like the current LLVM 10 build I get from their debs just segfaults when trying to build selftest :/ Also there's at least one more bug in this piece of code, TLS 1.3 can't assume there's at least one free SG entry. net/tls/tls_sw.c | 13 +++++++++++++ 1 file changed, 13 insertions(+) diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c index 24161750a737..4a0ea87b20cf 100644 --- a/net/tls/tls_sw.c +++ b/net/tls/tls_sw.c @@ -737,6 +737,19 @@ static int tls_push_record(struct sock *sk, int flags, if (rc < 0) { if (rc != -EINPROGRESS) { tls_err_abort(sk, EBADMSG); + + i = msg_pl->sg.end; + if (prot->version == TLS_1_3_VERSION) { + sg_mark_end(sk_msg_elem(msg_pl, i)); + sg_unmark_end(sk_msg_elem(msg_pl, i)); + } + sk_msg_iter_var_prev(i); + sg_unmark_end(sk_msg_elem(msg_pl, i)); + + i = msg_en->sg.end; + sk_msg_iter_var_prev(i); + sg_unmark_end(sk_msg_elem(msg_en, i)); + if (split) { tls_ctx->pending_open_record_frags = true; tls_merge_open_record(sk, rec, tmp, orig_end); -- 2.23.0 ^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [RFC net] net/tls: clear SG markings on encryption error 2019-11-22 21:45 [RFC net] net/tls: clear SG markings on encryption error Jakub Kicinski @ 2019-11-22 22:36 ` Jakub Kicinski 2019-11-23 6:56 ` John Fastabend 2019-11-23 3:25 ` John Fastabend 1 sibling, 1 reply; 6+ messages in thread From: Jakub Kicinski @ 2019-11-22 22:36 UTC (permalink / raw) To: john.fastabend, daniel Cc: borisp, aviadye, netdev, syzbot+df0d4ec12332661dd1f9 On Fri, 22 Nov 2019 13:45:53 -0800, Jakub Kicinski wrote: > Also there's at least one more bug in this piece of code, TLS 1.3 > can't assume there's at least one free SG entry. And I don't see any place where the front and back of the SG circular buffer are actually chained :( This: static inline void sk_msg_init(struct sk_msg *msg) { BUILD_BUG_ON(ARRAY_SIZE(msg->sg.data) - 1 != MAX_MSG_FRAGS); memset(msg, 0, sizeof(*msg)); sg_init_marker(msg->sg.data, MAX_MSG_FRAGS); } looks questionable as well, we shouldn't mark MAX_MSG_FRAGS as the end, we don't know where the end is going to be.. diff --git a/include/linux/skmsg.h b/include/linux/skmsg.h index 6cb077b646a5..6c6ce6f90e7d 100644 --- a/include/linux/skmsg.h +++ b/include/linux/skmsg.h @@ -173,9 +173,8 @@ static inline void sk_msg_clear_meta(struct sk_msg *msg) static inline void sk_msg_init(struct sk_msg *msg) { - BUILD_BUG_ON(ARRAY_SIZE(msg->sg.data) - 1 != MAX_MSG_FRAGS); memset(msg, 0, sizeof(*msg)); - sg_init_marker(msg->sg.data, MAX_MSG_FRAGS); + sg_chain(msg->sg.data, ARRAY_SIZE(msg->sg.data), msg->sg.data); } static inline void sk_msg_xfer(struct sk_msg *dst, struct sk_msg *src, Hm? ^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [RFC net] net/tls: clear SG markings on encryption error 2019-11-22 22:36 ` Jakub Kicinski @ 2019-11-23 6:56 ` John Fastabend 2019-11-25 19:58 ` Jakub Kicinski 0 siblings, 1 reply; 6+ messages in thread From: John Fastabend @ 2019-11-23 6:56 UTC (permalink / raw) To: Jakub Kicinski, john.fastabend, daniel Cc: borisp, aviadye, netdev, syzbot+df0d4ec12332661dd1f9 Jakub Kicinski wrote: > On Fri, 22 Nov 2019 13:45:53 -0800, Jakub Kicinski wrote: > > Also there's at least one more bug in this piece of code, TLS 1.3 > > can't assume there's at least one free SG entry. > > And I don't see any place where the front and back of the SG circular > buffer are actually chained :( This: The easiest way to generate a message that needs to be chained is to use cork but we haven't yet enabled cork. However, there is one case with the use of apply, pass, and drop that I think this case could also be generated. I'll add a test for it and a fix. This case should only be hit when using with BPF and programs using apply/cork. I have the patches for cork support on a branch as well so we should probably just send those out. > > static inline void sk_msg_init(struct sk_msg *msg) > { > BUILD_BUG_ON(ARRAY_SIZE(msg->sg.data) - 1 != MAX_MSG_FRAGS); > memset(msg, 0, sizeof(*msg)); > sg_init_marker(msg->sg.data, MAX_MSG_FRAGS); > } > > looks questionable as well, we shouldn't mark MAX_MSG_FRAGS as the end, > we don't know where the end is going to be.. We use end->MAX_MSG_FRAGS and size==0 to indicate an fresh sk_msg. This should only ever be called to initialize a msg never afterwards. > > diff --git a/include/linux/skmsg.h b/include/linux/skmsg.h > index 6cb077b646a5..6c6ce6f90e7d 100644 > --- a/include/linux/skmsg.h > +++ b/include/linux/skmsg.h > @@ -173,9 +173,8 @@ static inline void sk_msg_clear_meta(struct sk_msg *msg) > > static inline void sk_msg_init(struct sk_msg *msg) > { > - BUILD_BUG_ON(ARRAY_SIZE(msg->sg.data) - 1 != MAX_MSG_FRAGS); > memset(msg, 0, sizeof(*msg)); > - sg_init_marker(msg->sg.data, MAX_MSG_FRAGS); > + sg_chain(msg->sg.data, ARRAY_SIZE(msg->sg.data), msg->sg.data); > } > I don't think we want to chain these here. We could drop the init marker part but its handy when reading sg values. > static inline void sk_msg_xfer(struct sk_msg *dst, struct sk_msg *src, > > Hm? Nice catch on the missing chaining we dropped across various revisions and rebases of the code. Without cork support our test cases don't hit it now. ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [RFC net] net/tls: clear SG markings on encryption error 2019-11-23 6:56 ` John Fastabend @ 2019-11-25 19:58 ` Jakub Kicinski 0 siblings, 0 replies; 6+ messages in thread From: Jakub Kicinski @ 2019-11-25 19:58 UTC (permalink / raw) To: John Fastabend Cc: daniel, borisp, aviadye, netdev, syzbot+df0d4ec12332661dd1f9 On Fri, 22 Nov 2019 22:56:05 -0800, John Fastabend wrote: > Jakub Kicinski wrote: > > On Fri, 22 Nov 2019 13:45:53 -0800, Jakub Kicinski wrote: > > > Also there's at least one more bug in this piece of code, TLS 1.3 > > > can't assume there's at least one free SG entry. > > > > And I don't see any place where the front and back of the SG circular > > buffer are actually chained :( This: > > The easiest way to generate a message that needs to be chained is to > use cork but we haven't yet enabled cork. However, there is one case > with the use of apply, pass, and drop that I think this case could > also be generated. I'll add a test for it and a fix. This case should > only be hit when using with BPF and programs using apply/cork. > > I have the patches for cork support on a branch as well so we should > probably just send those out. > > > > > static inline void sk_msg_init(struct sk_msg *msg) > > { > > BUILD_BUG_ON(ARRAY_SIZE(msg->sg.data) - 1 != MAX_MSG_FRAGS); > > memset(msg, 0, sizeof(*msg)); > > sg_init_marker(msg->sg.data, MAX_MSG_FRAGS); > > } > > > > looks questionable as well, we shouldn't mark MAX_MSG_FRAGS as the end, > > we don't know where the end is going to be.. > > We use end->MAX_MSG_FRAGS and size==0 to indicate an fresh sk_msg. This > should only ever be called to initialize a msg never afterwards. > > > > > diff --git a/include/linux/skmsg.h b/include/linux/skmsg.h > > index 6cb077b646a5..6c6ce6f90e7d 100644 > > --- a/include/linux/skmsg.h > > +++ b/include/linux/skmsg.h > > @@ -173,9 +173,8 @@ static inline void sk_msg_clear_meta(struct sk_msg *msg) > > > > static inline void sk_msg_init(struct sk_msg *msg) > > { > > - BUILD_BUG_ON(ARRAY_SIZE(msg->sg.data) - 1 != MAX_MSG_FRAGS); > > memset(msg, 0, sizeof(*msg)); > > - sg_init_marker(msg->sg.data, MAX_MSG_FRAGS); > > + sg_chain(msg->sg.data, ARRAY_SIZE(msg->sg.data), msg->sg.data); > > } > > > > I don't think we want to chain these here. We could drop the init > marker part but its handy when reading sg values. > > > static inline void sk_msg_xfer(struct sk_msg *dst, struct sk_msg *src, > > > > Hm? > > Nice catch on the missing chaining we dropped across various revisions > and rebases of the code. Without cork support our test cases don't hit > it now. I see, thanks for the explanation! I'll leave fixing the chaining to you, then. ^ permalink raw reply [flat|nested] 6+ messages in thread
* RE: [RFC net] net/tls: clear SG markings on encryption error 2019-11-22 21:45 [RFC net] net/tls: clear SG markings on encryption error Jakub Kicinski 2019-11-22 22:36 ` Jakub Kicinski @ 2019-11-23 3:25 ` John Fastabend 2019-11-25 19:52 ` Jakub Kicinski 1 sibling, 1 reply; 6+ messages in thread From: John Fastabend @ 2019-11-23 3:25 UTC (permalink / raw) To: Jakub Kicinski, borisp, aviadye, john.fastabend, daniel Cc: netdev, Jakub Kicinski, syzbot+df0d4ec12332661dd1f9 Jakub Kicinski wrote: > When tls_do_encryption() fails the SG lists are left with the > SG_END and SG_CHAIN marks in place. One could hope that once > encryption fails we will never see the record again, but that > is in fact not true. Commit d3b18ad31f93 ("tls: add bpf support > to sk_msg handling") added special handling to ENOMEM and ENOSPC > errors which mean we may see the same record re-submitted. > > In all honesty I don't understand why we need the ENOMEM handling. > Waiting for socket memory without setting SOCK_NOSPACE on any > random memory allocation failure seems slightly ill advised. > > Having said that, undoing the SG markings seems wise regardless. > > Reported-by: syzbot+df0d4ec12332661dd1f9@syzkaller.appspotmail.com > Fixes: 130b392c6cd6 ("net: tls: Add tls 1.3 support") > Fixes: d3b18ad31f93 ("tls: add bpf support to sk_msg handling") > Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> > --- > John, I'm sending this mostly to ask if we can safely remove > the ENOMEM handling? :) > What ENOMEM are you asking about here? The return code handling from bpf_exec_tx_verdict? ret = bpf_exec_tx_verdict(msg_pl, sk, full_record, record_type, &copied, msg->msg_flags); if (ret) { if (ret == -EINPROGRESS) num_async++; else if (ret == -ENOMEM) goto wait_for_memory; else if (ret == -ENOSPC) goto rollback_iter; else if (ret != -EAGAIN) goto send_end; } I would want to run it through some of our tests but I don't think there is anything specific about BPF that needs it to be handled. I was just trying to handle the error case gracefully and wait_for_memory seems like the right behavior to me. What is ill advised here? > I was going to try the sockmap tests myself, but looks like the current > LLVM 10 build I get from their debs just segfaults when trying to build > selftest :/ > > Also there's at least one more bug in this piece of code, TLS 1.3 > can't assume there's at least one free SG entry. There should always be one free SG entry at the end of the ring that is used for chaining. From sk_msg_sg{} /* The extra element is used for chaining the front and sections when * the list becomes partitioned (e.g. end < start). The crypto APIs * require the chaining. */ struct scatterlist data[MAX_MSG_FRAGS + 1]; Can we use that element in that case? Otherwise probably can add an extra element there if needed, data[MAX_MSG_FRAGS + 2]. > > net/tls/tls_sw.c | 13 +++++++++++++ > 1 file changed, 13 insertions(+) > > diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c > index 24161750a737..4a0ea87b20cf 100644 > --- a/net/tls/tls_sw.c > +++ b/net/tls/tls_sw.c > @@ -737,6 +737,19 @@ static int tls_push_record(struct sock *sk, int flags, > if (rc < 0) { > if (rc != -EINPROGRESS) { > tls_err_abort(sk, EBADMSG); > + > + i = msg_pl->sg.end; > + if (prot->version == TLS_1_3_VERSION) { > + sg_mark_end(sk_msg_elem(msg_pl, i)); > + sg_unmark_end(sk_msg_elem(msg_pl, i)); > + } > + sk_msg_iter_var_prev(i); > + sg_unmark_end(sk_msg_elem(msg_pl, i)); > + > + i = msg_en->sg.end; > + sk_msg_iter_var_prev(i); > + sg_unmark_end(sk_msg_elem(msg_en, i)); > + > if (split) { > tls_ctx->pending_open_record_frags = true; > tls_merge_open_record(sk, rec, tmp, orig_end); > -- > 2.23.0 > Can you copy the tls_push_record() error handling from BPF side instead of embedding more into tls_push_record itself? err = tls_push_record(sk, flags, record_type); if (err < 0) { *copied -= sk_msg_free(sk, msg); tls_free_open_rec(sk); goto out_err; } If the BPF program is not installed I guess you can skip the copied part because you wont have the 'more_data' case. So something like (untested/compiled/etc) diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c index 141da093ff04..0469eb73bc88 100644 --- a/net/tls/tls_sw.c +++ b/net/tls/tls_sw.c @@ -771,8 +771,14 @@ static int bpf_exec_tx_verdict(struct sk_msg *msg, struct sock *sk, policy = !(flags & MSG_SENDPAGE_NOPOLICY); psock = sk_psock_get(sk); - if (!psock || !policy) - return tls_push_record(sk, flags, record_type); + if (!psock || !policy) { + err = tls_push_record(sk, flags, record_type); + if (err) { + sk_msg_free(sk, msg); + tls_free_open_rec(sk); // might not be needed in noBPF + } + return err; + } more_data: enospc = sk_msg_full(msg); if (psock->eval == __SK_NONE) { ^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [RFC net] net/tls: clear SG markings on encryption error 2019-11-23 3:25 ` John Fastabend @ 2019-11-25 19:52 ` Jakub Kicinski 0 siblings, 0 replies; 6+ messages in thread From: Jakub Kicinski @ 2019-11-25 19:52 UTC (permalink / raw) To: John Fastabend Cc: borisp, aviadye, daniel, netdev, syzbot+df0d4ec12332661dd1f9 Sorry for the delay, this code is too tricky for me to handle while applying patches :) Lemme quickly answer best I can and go back into digging deeper. On Fri, 22 Nov 2019 19:25:13 -0800, John Fastabend wrote: > Jakub Kicinski wrote: > > When tls_do_encryption() fails the SG lists are left with the > > SG_END and SG_CHAIN marks in place. One could hope that once > > encryption fails we will never see the record again, but that > > is in fact not true. Commit d3b18ad31f93 ("tls: add bpf support > > to sk_msg handling") added special handling to ENOMEM and ENOSPC > > errors which mean we may see the same record re-submitted. > > > > In all honesty I don't understand why we need the ENOMEM handling. > > Waiting for socket memory without setting SOCK_NOSPACE on any > > random memory allocation failure seems slightly ill advised. > > > > Having said that, undoing the SG markings seems wise regardless. > > > > Reported-by: syzbot+df0d4ec12332661dd1f9@syzkaller.appspotmail.com > > Fixes: 130b392c6cd6 ("net: tls: Add tls 1.3 support") > > Fixes: d3b18ad31f93 ("tls: add bpf support to sk_msg handling") > > Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> > > --- > > John, I'm sending this mostly to ask if we can safely remove > > the ENOMEM handling? :) > > > What ENOMEM are you asking about here? The return code handling > from bpf_exec_tx_verdict? > > > ret = bpf_exec_tx_verdict(msg_pl, sk, full_record, > record_type, &copied, > msg->msg_flags); > if (ret) { > if (ret == -EINPROGRESS) > num_async++; > else if (ret == -ENOMEM) > goto wait_for_memory; > else if (ret == -ENOSPC) > goto rollback_iter; > else if (ret != -EAGAIN) > goto send_end; > } > > I would want to run it through some of our tests but I don't think > there is anything specific about BPF that needs it to be handled. > I was just trying to handle the error case gracefully and > wait_for_memory seems like the right behavior to me. What is ill > advised here? It's probably too strong of an expression :) I was not seeing a clear relationship between ENOMEM and socket being out of memory. The ENOMEM is likely because of a slab allocation failure (quick look doesn't really reveal send_pages returning ENOMEM if socket is full). So the wait is equivalent to a msleep, no? I was a little worried about this error handling, it seems to make assumptions about the reasons for errors. I heard a report of user seeing ENOSPC coming out of a crypto accelerator, which didn't end well. > > I was going to try the sockmap tests myself, but looks like the current > > LLVM 10 build I get from their debs just segfaults when trying to build > > selftest :/ > > > > Also there's at least one more bug in this piece of code, TLS 1.3 > > can't assume there's at least one free SG entry. > > There should always be one free SG entry at the end of the ring > that is used for chaining. > > From sk_msg_sg{} > > /* The extra element is used for chaining the front and sections when > * the list becomes partitioned (e.g. end < start). The crypto APIs > * require the chaining. > */ > struct scatterlist data[MAX_MSG_FRAGS + 1]; > > Can we use that element in that case? Otherwise probably can > add an extra element there if needed, data[MAX_MSG_FRAGS + 2]. If sg really is a circular buffer we'd need to shift all entries to make sure the hole is in the right place. Then we have a chain entry in the middle of the ring, and that's not good, right now the range of [0, MAX_MSG_FRAGS) is assumed to contain data pages :S > > net/tls/tls_sw.c | 13 +++++++++++++ > > 1 file changed, 13 insertions(+) > > > > diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c > > index 24161750a737..4a0ea87b20cf 100644 > > --- a/net/tls/tls_sw.c > > +++ b/net/tls/tls_sw.c > > @@ -737,6 +737,19 @@ static int tls_push_record(struct sock *sk, int flags, > > if (rc < 0) { > > if (rc != -EINPROGRESS) { > > tls_err_abort(sk, EBADMSG); > > + > > + i = msg_pl->sg.end; > > + if (prot->version == TLS_1_3_VERSION) { > > + sg_mark_end(sk_msg_elem(msg_pl, i)); > > + sg_unmark_end(sk_msg_elem(msg_pl, i)); > > + } > > + sk_msg_iter_var_prev(i); > > + sg_unmark_end(sk_msg_elem(msg_pl, i)); > > + > > + i = msg_en->sg.end; > > + sk_msg_iter_var_prev(i); > > + sg_unmark_end(sk_msg_elem(msg_en, i)); > > + > > if (split) { > > tls_ctx->pending_open_record_frags = true; > > tls_merge_open_record(sk, rec, tmp, orig_end); > > Can you copy the tls_push_record() error handling from BPF side instead of > embedding more into tls_push_record itself? > > err = tls_push_record(sk, flags, record_type); > if (err < 0) { > *copied -= sk_msg_free(sk, msg); > tls_free_open_rec(sk); > goto out_err; > } > > If the BPF program is not installed I guess you can skip the copied part > because you wont have the 'more_data' case. > > So something like (untested/compiled/etc) > > diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c > index 141da093ff04..0469eb73bc88 100644 > --- a/net/tls/tls_sw.c > +++ b/net/tls/tls_sw.c > @@ -771,8 +771,14 @@ static int bpf_exec_tx_verdict(struct sk_msg *msg, struct sock *sk, > > policy = !(flags & MSG_SENDPAGE_NOPOLICY); > psock = sk_psock_get(sk); > - if (!psock || !policy) > - return tls_push_record(sk, flags, record_type); > + if (!psock || !policy) { > + err = tls_push_record(sk, flags, record_type); > + if (err) { > + sk_msg_free(sk, msg); > + tls_free_open_rec(sk); // might not be needed in noBPF > + } > + return err; > + } > more_data: > enospc = sk_msg_full(msg); > if (psock->eval == __SK_NONE) { Ah, that's a great suggestion, I missed the copied count is passed by reference into bpf_exec_tx_verdict() and thought more surgery would be required to handle it like that. ^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2019-11-25 19:59 UTC | newest] Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2019-11-22 21:45 [RFC net] net/tls: clear SG markings on encryption error Jakub Kicinski 2019-11-22 22:36 ` Jakub Kicinski 2019-11-23 6:56 ` John Fastabend 2019-11-25 19:58 ` Jakub Kicinski 2019-11-23 3:25 ` John Fastabend 2019-11-25 19:52 ` Jakub Kicinski
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).