* [PATCH v3 net-next 0/6] tls: Add generic NIC offload infrastructure @ 2017-12-18 11:10 Ilya Lesokhin 2017-12-18 11:10 ` [PATCH v3 net-next 1/6] tcp: Add clean acked data hook Ilya Lesokhin ` (6 more replies) 0 siblings, 7 replies; 35+ messages in thread From: Ilya Lesokhin @ 2017-12-18 11:10 UTC (permalink / raw) To: netdev, davem Cc: davejwatson, tom, hannes, borisp, aviadye, liranl, Ilya Lesokhin Changes from v2: - Fix sk use after free and possible netdev use after free - tls device now keeps a refernce on the offloading netdev - tls device registers to the netdev notifer. Upon a NETDEV_DOWN event, offload is stopped and the reference on the netdev is dropped. - SW fallback support for skb->ip_summed != CHECKSUM_PARTIAL - Merged TLS patches are no longer part of this series. Changes from v1: - Remove the binding of the socket to a specific netdev through sk->sk_bound_dev_if. Add a check in validate_xmit_skb to detect route changes and call SW fallback code to do the crypto in software. - tls_get_record now returns the tls record sequence number. This is required to support connections with rcd_sn != iv. - Bug fixes to the TLS code. This patchset adds a generic infrastructure to offload TLS crypto to a network devices. patches 1-2 Export functions that we need patch 3 adds infrastructue for offloaded socket fallback patches 4-5 add new NDOs and capabilities. patch 6 adds the TLS NIC offload infrastructure. Github with mlx5e TLS offload support: https://github.com/Mellanox/tls-offload/tree/tls_device_v3 Paper: https://www.netdevconf.org/1.2/papers/netdevconf-TLS.pdf Ilya Lesokhin (6): tcp: Add clean acked data hook net: Rename and export copy_skb_header net: Add SW fallback infrastructure for offloaded sockets net: Add TLS offload netdev ops net: Add TLS TX offload features tls: Add generic NIC offload infrastructure. include/linux/netdev_features.h | 2 + include/linux/netdevice.h | 23 ++ include/linux/skbuff.h | 1 + include/net/inet_connection_sock.h | 2 + include/net/sock.h | 17 + include/net/tls.h | 62 ++- net/core/dev.c | 4 + net/core/ethtool.c | 1 + net/core/skbuff.c | 9 +- net/ipv4/tcp_input.c | 3 + net/tls/Kconfig | 9 + net/tls/Makefile | 3 + net/tls/tls_device.c | 800 +++++++++++++++++++++++++++++++++++++ net/tls/tls_device_fallback.c | 405 +++++++++++++++++++ net/tls/tls_main.c | 33 +- 15 files changed, 1363 insertions(+), 11 deletions(-) create mode 100644 net/tls/tls_device.c create mode 100644 net/tls/tls_device_fallback.c -- 2.15.0.317.g14c63a9 ^ permalink raw reply [flat|nested] 35+ messages in thread
* [PATCH v3 net-next 1/6] tcp: Add clean acked data hook 2017-12-18 11:10 [PATCH v3 net-next 0/6] tls: Add generic NIC offload infrastructure Ilya Lesokhin @ 2017-12-18 11:10 ` Ilya Lesokhin 2017-12-19 19:13 ` Eric Dumazet 2017-12-18 11:10 ` [PATCH v3 net-next 2/6] net: Rename and export copy_skb_header Ilya Lesokhin ` (5 subsequent siblings) 6 siblings, 1 reply; 35+ messages in thread From: Ilya Lesokhin @ 2017-12-18 11:10 UTC (permalink / raw) To: netdev, davem Cc: davejwatson, tom, hannes, borisp, aviadye, liranl, Ilya Lesokhin Called when a TCP segment is acknowledged. Could be used by application protocols who hold additional metadata associated with the stream data This is required by TLS device offload to release metadata associated with acknowledged TLS records. Signed-off-by: Boris Pismenny <borisp@mellanox.com> Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com> Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com> --- include/net/inet_connection_sock.h | 2 ++ net/ipv4/tcp_input.c | 3 +++ 2 files changed, 5 insertions(+) diff --git a/include/net/inet_connection_sock.h b/include/net/inet_connection_sock.h index 8e1bf9ae4a5e..ec405a667a85 100644 --- a/include/net/inet_connection_sock.h +++ b/include/net/inet_connection_sock.h @@ -77,6 +77,7 @@ struct inet_connection_sock_af_ops { * @icsk_af_ops Operations which are AF_INET{4,6} specific * @icsk_ulp_ops Pluggable ULP control hook * @icsk_ulp_data ULP private data + * @icsk_clean_acked Clean acked data hook * @icsk_listen_portaddr_node hash to the portaddr listener hashtable * @icsk_ca_state: Congestion control state * @icsk_retransmits: Number of unrecovered [RTO] timeouts @@ -102,6 +103,7 @@ struct inet_connection_sock { const struct inet_connection_sock_af_ops *icsk_af_ops; const struct tcp_ulp_ops *icsk_ulp_ops; void *icsk_ulp_data; + void (*icsk_clean_acked)(struct sock *sk); struct hlist_node icsk_listen_portaddr_node; unsigned int (*icsk_sync_mss)(struct sock *sk, u32 pmtu); __u8 icsk_ca_state:6, diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index 4d55c4b338ee..961abc5be84c 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -3592,6 +3592,9 @@ static int tcp_ack(struct sock *sk, const struct sk_buff *skb, int flag) if (!prior_packets) goto no_queue; + if (icsk->icsk_clean_acked) + icsk->icsk_clean_acked(sk); + /* See if we can take anything off of the retransmit queue. */ flag |= tcp_clean_rtx_queue(sk, prior_fack, prior_snd_una, &sack_state); -- 2.15.0.317.g14c63a9 ^ permalink raw reply related [flat|nested] 35+ messages in thread
* Re: [PATCH v3 net-next 1/6] tcp: Add clean acked data hook 2017-12-18 11:10 ` [PATCH v3 net-next 1/6] tcp: Add clean acked data hook Ilya Lesokhin @ 2017-12-19 19:13 ` Eric Dumazet 2017-12-19 19:21 ` Ilya Lesokhin 0 siblings, 1 reply; 35+ messages in thread From: Eric Dumazet @ 2017-12-19 19:13 UTC (permalink / raw) To: Ilya Lesokhin, netdev, davem Cc: davejwatson, tom, hannes, borisp, aviadye, liranl On Mon, 2017-12-18 at 13:10 +0200, Ilya Lesokhin wrote: > Called when a TCP segment is acknowledged. > Could be used by application protocols who hold additional > metadata associated with the stream data > This is required by TLS device offload to release > metadata associated with acknowledged TLS records. > > Signed-off-by: Boris Pismenny <borisp@mellanox.com> > Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com> > Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com> > --- > include/net/inet_connection_sock.h | 2 ++ > net/ipv4/tcp_input.c | 3 +++ > 2 files changed, 5 insertions(+) > > diff --git a/include/net/inet_connection_sock.h b/include/net/inet_connection_sock.h > index 8e1bf9ae4a5e..ec405a667a85 100644 > --- a/include/net/inet_connection_sock.h > +++ b/include/net/inet_connection_sock.h > @@ -77,6 +77,7 @@ struct inet_connection_sock_af_ops { > * @icsk_af_ops Operations which are AF_INET{4,6} specific > * @icsk_ulp_ops Pluggable ULP control hook > * @icsk_ulp_data ULP private data > + * @icsk_clean_acked Clean acked data hook > * @icsk_listen_portaddr_node hash to the portaddr listener hashtable > * @icsk_ca_state: Congestion control state > * @icsk_retransmits: Number of unrecovered [RTO] timeouts > @@ -102,6 +103,7 @@ struct inet_connection_sock { > const struct inet_connection_sock_af_ops *icsk_af_ops; > const struct tcp_ulp_ops *icsk_ulp_ops; > void *icsk_ulp_data; > + void (*icsk_clean_acked)(struct sock *sk); > struct hlist_node icsk_listen_portaddr_node; > unsigned int (*icsk_sync_mss)(struct sock *sk, u32 pmtu); > __u8 icsk_ca_state:6, > diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c > index 4d55c4b338ee..961abc5be84c 100644 > --- a/net/ipv4/tcp_input.c > +++ b/net/ipv4/tcp_input.c > @@ -3592,6 +3592,9 @@ static int tcp_ack(struct sock *sk, const struct sk_buff *skb, int flag) > if (!prior_packets) > goto no_queue; > > + if (icsk->icsk_clean_acked) > + icsk->icsk_clean_acked(sk); > + > /* See if we can take anything off of the retransmit queue. */ > flag |= tcp_clean_rtx_queue(sk, prior_fack, prior_snd_una, &sack_state); > 1) tcp_ack() is already very expensive. 2) Since you do not pass any state here, this looks very suspicious to me. ^ permalink raw reply [flat|nested] 35+ messages in thread
* RE: [PATCH v3 net-next 1/6] tcp: Add clean acked data hook 2017-12-19 19:13 ` Eric Dumazet @ 2017-12-19 19:21 ` Ilya Lesokhin 2017-12-19 19:28 ` Eric Dumazet 0 siblings, 1 reply; 35+ messages in thread From: Ilya Lesokhin @ 2017-12-19 19:21 UTC (permalink / raw) To: Eric Dumazet, netdev, davem Cc: davejwatson, tom, hannes, Boris Pismenny, Aviad Yehezkel, Liran Liss > 1) tcp_ack() is already very expensive. > I'm not sure how what we should do with that comment. We need Some trigger to free TLS records. tcp_ack seemed like a reasonable Trigger. > 2) Since you do not pass any state here, this looks very suspicious to > me. > The state we need is the acknowledged sequence and it located in the socket. https://github.com/Mellanox/tls-offload/blob/tls_device_v3/net/tls/tls_device.c#L157 ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH v3 net-next 1/6] tcp: Add clean acked data hook 2017-12-19 19:21 ` Ilya Lesokhin @ 2017-12-19 19:28 ` Eric Dumazet 2017-12-19 19:43 ` Ilya Lesokhin 0 siblings, 1 reply; 35+ messages in thread From: Eric Dumazet @ 2017-12-19 19:28 UTC (permalink / raw) To: Ilya Lesokhin, netdev, davem Cc: davejwatson, tom, hannes, Boris Pismenny, Aviad Yehezkel, Liran Liss On Tue, 2017-12-19 at 19:21 +0000, Ilya Lesokhin wrote: > > 1) tcp_ack() is already very expensive. > > > > I'm not sure how what we should do with that comment. We need > Some trigger to free TLS records. tcp_ack seemed like a reasonable > Trigger. TLS records should be attached to skbs ? It seems more reasonable to free TLS when skb are freed, and not in general tcp_ack() path. > > > 2) Since you do not pass any state here, this looks very suspicious to > > me. > > > > The state we need is the acknowledged sequence and it located in the socket. > https://github.com/Mellanox/tls-offload/blob/tls_device_v3/net/tls/tls_device.c#L157 So it looks like TCP stack is bleeding all over the places ? So in the future, a change in TCP stack will have to make sure we do not break net/tls/... compilation. Not pretty. ^ permalink raw reply [flat|nested] 35+ messages in thread
* RE: [PATCH v3 net-next 1/6] tcp: Add clean acked data hook 2017-12-19 19:28 ` Eric Dumazet @ 2017-12-19 19:43 ` Ilya Lesokhin 0 siblings, 0 replies; 35+ messages in thread From: Ilya Lesokhin @ 2017-12-19 19:43 UTC (permalink / raw) To: Eric Dumazet, netdev, davem Cc: davejwatson, tom, hannes, Boris Pismenny, Aviad Yehezkel, Liran Liss > > TLS records should be attached to skbs ? > > It seems more reasonable to free TLS when skb are freed, and not in > general tcp_ack() path. We've considered it, but then we would have to touch all the places the TCP stack splits or merges SKBs. Seems more intrusive. > > > > > > 2) Since you do not pass any state here, this looks very suspicious to > > > me. > > > > > > > The state we need is the acknowledged sequence and it located in the > socket. > >> > So it looks like TCP stack is bleeding all over the places ? > > So in the future, a change in TCP stack will have to make sure we do > not break net/tls/... compilation. > > Not pretty. I guess we could pass the ack sequence number in that callback. Would that address the issue? ^ permalink raw reply [flat|nested] 35+ messages in thread
* [PATCH v3 net-next 2/6] net: Rename and export copy_skb_header 2017-12-18 11:10 [PATCH v3 net-next 0/6] tls: Add generic NIC offload infrastructure Ilya Lesokhin 2017-12-18 11:10 ` [PATCH v3 net-next 1/6] tcp: Add clean acked data hook Ilya Lesokhin @ 2017-12-18 11:10 ` Ilya Lesokhin 2017-12-18 11:10 ` [PATCH v3 net-next 3/6] net: Add SW fallback infrastructure for offloaded sockets Ilya Lesokhin ` (4 subsequent siblings) 6 siblings, 0 replies; 35+ messages in thread From: Ilya Lesokhin @ 2017-12-18 11:10 UTC (permalink / raw) To: netdev, davem Cc: davejwatson, tom, hannes, borisp, aviadye, liranl, Ilya Lesokhin copy_skb_header is renamed to skb_copy_header and exported. Exposing this function give more flexibility in copying SKBs. skb_copy and skb_copy_expand do not give enough control over which parts are copied. Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com> Signed-off-by: Boris Pismenny <borisp@mellanox.com> --- include/linux/skbuff.h | 1 + net/core/skbuff.c | 9 +++++---- 2 files changed, 6 insertions(+), 4 deletions(-) diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index b8e0da6c27d6..1857ec77e89b 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -1028,6 +1028,7 @@ static inline struct sk_buff *alloc_skb_fclone(unsigned int size, struct sk_buff *skb_morph(struct sk_buff *dst, struct sk_buff *src); int skb_copy_ubufs(struct sk_buff *skb, gfp_t gfp_mask); struct sk_buff *skb_clone(struct sk_buff *skb, gfp_t priority); +void skb_copy_header(struct sk_buff *new, const struct sk_buff *old); struct sk_buff *skb_copy(const struct sk_buff *skb, gfp_t priority); struct sk_buff *__pskb_copy_fclone(struct sk_buff *skb, int headroom, gfp_t gfp_mask, bool fclone); diff --git a/net/core/skbuff.c b/net/core/skbuff.c index a592ca025fc4..c318cdf67a83 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -1301,7 +1301,7 @@ static void skb_headers_offset_update(struct sk_buff *skb, int off) skb->inner_mac_header += off; } -static void copy_skb_header(struct sk_buff *new, const struct sk_buff *old) +void skb_copy_header(struct sk_buff *new, const struct sk_buff *old) { __copy_skb_header(new, old); @@ -1309,6 +1309,7 @@ static void copy_skb_header(struct sk_buff *new, const struct sk_buff *old) skb_shinfo(new)->gso_segs = skb_shinfo(old)->gso_segs; skb_shinfo(new)->gso_type = skb_shinfo(old)->gso_type; } +EXPORT_SYMBOL(skb_copy_header); static inline int skb_alloc_rx_flag(const struct sk_buff *skb) { @@ -1351,7 +1352,7 @@ struct sk_buff *skb_copy(const struct sk_buff *skb, gfp_t gfp_mask) BUG_ON(skb_copy_bits(skb, -headerlen, n->head, headerlen + skb->len)); - copy_skb_header(n, skb); + skb_copy_header(n, skb); return n; } EXPORT_SYMBOL(skb_copy); @@ -1415,7 +1416,7 @@ struct sk_buff *__pskb_copy_fclone(struct sk_buff *skb, int headroom, skb_clone_fraglist(n); } - copy_skb_header(n, skb); + skb_copy_header(n, skb); out: return n; } @@ -1595,7 +1596,7 @@ struct sk_buff *skb_copy_expand(const struct sk_buff *skb, BUG_ON(skb_copy_bits(skb, -head_copy_len, n->head + head_copy_off, skb->len + head_copy_len)); - copy_skb_header(n, skb); + skb_copy_header(n, skb); skb_headers_offset_update(n, newheadroom - oldheadroom); -- 2.15.0.317.g14c63a9 ^ permalink raw reply related [flat|nested] 35+ messages in thread
* [PATCH v3 net-next 3/6] net: Add SW fallback infrastructure for offloaded sockets 2017-12-18 11:10 [PATCH v3 net-next 0/6] tls: Add generic NIC offload infrastructure Ilya Lesokhin 2017-12-18 11:10 ` [PATCH v3 net-next 1/6] tcp: Add clean acked data hook Ilya Lesokhin 2017-12-18 11:10 ` [PATCH v3 net-next 2/6] net: Rename and export copy_skb_header Ilya Lesokhin @ 2017-12-18 11:10 ` Ilya Lesokhin 2017-12-18 19:18 ` Marcelo Ricardo Leitner 2017-12-19 19:12 ` Eric Dumazet 2017-12-18 11:10 ` [PATCH v3 net-next 4/6] net: Add TLS offload netdev ops Ilya Lesokhin ` (3 subsequent siblings) 6 siblings, 2 replies; 35+ messages in thread From: Ilya Lesokhin @ 2017-12-18 11:10 UTC (permalink / raw) To: netdev, davem Cc: davejwatson, tom, hannes, borisp, aviadye, liranl, Ilya Lesokhin Offloaded sockets rely on the netdev to transform the transmitted packets before sending them over the network. When a packet from an offloaded socket is looped back or rerouted to a different device we need to detect it and do the transformation in software Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com> Signed-off-by: Boris Pismenny <borisp@mellanox.com> --- include/net/sock.h | 17 +++++++++++++++++ net/core/dev.c | 4 ++++ 2 files changed, 21 insertions(+) diff --git a/include/net/sock.h b/include/net/sock.h index 9a9047268d37..5397307603ec 100644 --- a/include/net/sock.h +++ b/include/net/sock.h @@ -479,6 +479,9 @@ struct sock { void (*sk_error_report)(struct sock *sk); int (*sk_backlog_rcv)(struct sock *sk, struct sk_buff *skb); + struct sk_buff* (*sk_offload_check)(struct sock *sk, + struct net_device *dev, + struct sk_buff *skb); void (*sk_destruct)(struct sock *sk); struct sock_reuseport __rcu *sk_reuseport_cb; struct rcu_head sk_rcu; @@ -2324,6 +2327,20 @@ static inline bool sk_fullsock(const struct sock *sk) return (1 << sk->sk_state) & ~(TCPF_TIME_WAIT | TCPF_NEW_SYN_RECV); } +/* Checks if this SKB belongs to an HW offloaded socket + * and whether any SW fallbacks are required based on dev. + */ +static inline struct sk_buff *skb_offload_check(struct sk_buff *skb, + struct net_device *dev) +{ + struct sock *sk = skb->sk; + + if (sk && sk_fullsock(sk) && sk->sk_offload_check) + skb = sk->sk_offload_check(sk, dev, skb); + + return skb; +} + /* This helper checks if a socket is a LISTEN or NEW_SYN_RECV * SYNACK messages can be attached to either ones (depending on SYNCOOKIE) */ diff --git a/net/core/dev.c b/net/core/dev.c index b0eee49a2489..6a78d9046674 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -3051,6 +3051,10 @@ static struct sk_buff *validate_xmit_skb(struct sk_buff *skb, struct net_device if (unlikely(!skb)) goto out_null; + skb = skb_offload_check(skb, dev); + if (!skb) + goto out_null; + if (netif_needs_gso(skb, features)) { struct sk_buff *segs; -- 2.15.0.317.g14c63a9 ^ permalink raw reply related [flat|nested] 35+ messages in thread
* Re: [PATCH v3 net-next 3/6] net: Add SW fallback infrastructure for offloaded sockets 2017-12-18 11:10 ` [PATCH v3 net-next 3/6] net: Add SW fallback infrastructure for offloaded sockets Ilya Lesokhin @ 2017-12-18 19:18 ` Marcelo Ricardo Leitner 2017-12-19 7:51 ` Ilya Lesokhin 2017-12-19 19:12 ` Eric Dumazet 1 sibling, 1 reply; 35+ messages in thread From: Marcelo Ricardo Leitner @ 2017-12-18 19:18 UTC (permalink / raw) To: Ilya Lesokhin Cc: netdev, davem, davejwatson, tom, hannes, borisp, aviadye, liranl On Mon, Dec 18, 2017 at 01:10:30PM +0200, Ilya Lesokhin wrote: > Offloaded sockets rely on the netdev to transform the transmitted > packets before sending them over the network. > When a packet from an offloaded socket is looped back or > rerouted to a different device we need to detect it and > do the transformation in software > > Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com> > Signed-off-by: Boris Pismenny <borisp@mellanox.com> > --- > include/net/sock.h | 17 +++++++++++++++++ > net/core/dev.c | 4 ++++ > 2 files changed, 21 insertions(+) > > diff --git a/include/net/sock.h b/include/net/sock.h > index 9a9047268d37..5397307603ec 100644 > --- a/include/net/sock.h > +++ b/include/net/sock.h > @@ -479,6 +479,9 @@ struct sock { > void (*sk_error_report)(struct sock *sk); > int (*sk_backlog_rcv)(struct sock *sk, > struct sk_buff *skb); > + struct sk_buff* (*sk_offload_check)(struct sock *sk, > + struct net_device *dev, > + struct sk_buff *skb); > void (*sk_destruct)(struct sock *sk); > struct sock_reuseport __rcu *sk_reuseport_cb; > struct rcu_head sk_rcu; > @@ -2324,6 +2327,20 @@ static inline bool sk_fullsock(const struct sock *sk) > return (1 << sk->sk_state) & ~(TCPF_TIME_WAIT | TCPF_NEW_SYN_RECV); > } > > +/* Checks if this SKB belongs to an HW offloaded socket > + * and whether any SW fallbacks are required based on dev. > + */ > +static inline struct sk_buff *skb_offload_check(struct sk_buff *skb, > + struct net_device *dev) > +{ > + struct sock *sk = skb->sk; > + > + if (sk && sk_fullsock(sk) && sk->sk_offload_check) Isn't this going to hurt the fast path, checking for sk fields here? > + skb = sk->sk_offload_check(sk, dev, skb); > + > + return skb; > +} > + > /* This helper checks if a socket is a LISTEN or NEW_SYN_RECV > * SYNACK messages can be attached to either ones (depending on SYNCOOKIE) > */ > diff --git a/net/core/dev.c b/net/core/dev.c > index b0eee49a2489..6a78d9046674 100644 > --- a/net/core/dev.c > +++ b/net/core/dev.c > @@ -3051,6 +3051,10 @@ static struct sk_buff *validate_xmit_skb(struct sk_buff *skb, struct net_device > if (unlikely(!skb)) > goto out_null; > > + skb = skb_offload_check(skb, dev); > + if (!skb) > + goto out_null; > + > if (netif_needs_gso(skb, features)) { > struct sk_buff *segs; > > -- > 2.15.0.317.g14c63a9 > ^ permalink raw reply [flat|nested] 35+ messages in thread
* RE: [PATCH v3 net-next 3/6] net: Add SW fallback infrastructure for offloaded sockets 2017-12-18 19:18 ` Marcelo Ricardo Leitner @ 2017-12-19 7:51 ` Ilya Lesokhin 2017-12-19 15:05 ` Marcelo Ricardo Leitner 0 siblings, 1 reply; 35+ messages in thread From: Ilya Lesokhin @ 2017-12-19 7:51 UTC (permalink / raw) To: Marcelo Ricardo Leitner Cc: netdev, davem, davejwatson, tom, hannes, Boris Pismenny, Aviad Yehezkel, Liran Liss, Steffen Klassert On Monday, December 18, 2017 9:18 PM, Marcelo Ricardo Leitner wrote: > > + > > + if (sk && sk_fullsock(sk) && sk->sk_offload_check) > > Isn't this going to hurt the fast path, checking for sk fields here? > We do add code to the fast path but it seems unavoidable if you want to have SW fallback. The XFRM device offload also does that http://elixir.free-electrons.com/linux/v4.14.7/source/net/core/dev.c#L3058 The check can be optimized but I didn't want to do that before I saw that it's an issue. I'm also not sure what the correct solution is. I don't like that fact that each "stateful protocol" we offload requires its own check. We need to think if we can find a generic way of doing it. Perhaps we can hold the expected netdev somewhere in the SKB and only if we don't Go out of the expected netdev go to a slow path that does a check for each protocol. Thanks, Ilya ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH v3 net-next 3/6] net: Add SW fallback infrastructure for offloaded sockets 2017-12-19 7:51 ` Ilya Lesokhin @ 2017-12-19 15:05 ` Marcelo Ricardo Leitner 0 siblings, 0 replies; 35+ messages in thread From: Marcelo Ricardo Leitner @ 2017-12-19 15:05 UTC (permalink / raw) To: Ilya Lesokhin Cc: netdev, davem, davejwatson, tom, hannes, Boris Pismenny, Aviad Yehezkel, Liran Liss, Steffen Klassert On Tue, Dec 19, 2017 at 07:51:02AM +0000, Ilya Lesokhin wrote: > On Monday, December 18, 2017 9:18 PM, Marcelo Ricardo Leitner wrote: > > > > + > > > + if (sk && sk_fullsock(sk) && sk->sk_offload_check) > > > > Isn't this going to hurt the fast path, checking for sk fields here? > > > > We do add code to the fast path but it seems unavoidable if you want to have SW fallback. > The XFRM device offload also does that > http://elixir.free-electrons.com/linux/v4.14.7/source/net/core/dev.c#L3058 Right, although a bit different. It's accessing skb->sp and not the socket and depending on how compiler is doing things, the check http://elixir.free-electrons.com/linux/v4.14.7/source/net/xfrm/xfrm_device.c#L32 will help in some cases. But more importantly, all the above only exists if CONFIG_XFRM_OFFLOAD is enabled. > > The check can be optimized but I didn't want to do that before I saw that it's an issue. > I'm also not sure what the correct solution is. > I don't like that fact that each "stateful protocol" we offload requires its own check. > We need to think if we can find a generic way of doing it. > > Perhaps we can hold the expected netdev somewhere in the SKB and only if we don't > Go out of the expected netdev go to a slow path that does a check for each protocol. This could be a good switch, yes. Thanks, Marcelo ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH v3 net-next 3/6] net: Add SW fallback infrastructure for offloaded sockets 2017-12-18 11:10 ` [PATCH v3 net-next 3/6] net: Add SW fallback infrastructure for offloaded sockets Ilya Lesokhin 2017-12-18 19:18 ` Marcelo Ricardo Leitner @ 2017-12-19 19:12 ` Eric Dumazet 2017-12-19 19:15 ` Ilya Lesokhin 1 sibling, 1 reply; 35+ messages in thread From: Eric Dumazet @ 2017-12-19 19:12 UTC (permalink / raw) To: Ilya Lesokhin, netdev, davem Cc: davejwatson, tom, hannes, borisp, aviadye, liranl On Mon, 2017-12-18 at 13:10 +0200, Ilya Lesokhin wrote: > Offloaded sockets rely on the netdev to transform the transmitted > packets before sending them over the network. > When a packet from an offloaded socket is looped back or > rerouted to a different device we need to detect it and > do the transformation in software > > Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com> > Signed-off-by: Boris Pismenny <borisp@mellanox.com> > --- > include/net/sock.h | 17 +++++++++++++++++ > net/core/dev.c | 4 ++++ > 2 files changed, 21 insertions(+) > > diff --git a/include/net/sock.h b/include/net/sock.h > index 9a9047268d37..5397307603ec 100644 > --- a/include/net/sock.h > +++ b/include/net/sock.h > @@ -479,6 +479,9 @@ struct sock { > void (*sk_error_report)(struct sock *sk); > int (*sk_backlog_rcv)(struct sock *sk, > struct sk_buff *skb); > + struct sk_buff* (*sk_offload_check)(struct sock *sk, > + struct net_device *dev, > + struct sk_buff *skb); > void (*sk_destruct)(struct sock *sk); > struct sock_reuseport __rcu *sk_reuseport_cb; > struct rcu_head sk_rcu; > @@ -2324,6 +2327,20 @@ static inline bool sk_fullsock(const struct sock *sk) > return (1 << sk->sk_state) & ~(TCPF_TIME_WAIT | TCPF_NEW_SYN_RECV); > } > > +/* Checks if this SKB belongs to an HW offloaded socket > + * and whether any SW fallbacks are required based on dev. > + */ > +static inline struct sk_buff *skb_offload_check(struct sk_buff *skb, > + struct net_device *dev) > +{ > + struct sock *sk = skb->sk; > + > + if (sk && sk_fullsock(sk) && sk->sk_offload_check) > + skb = sk->sk_offload_check(sk, dev, skb); > + > + return skb; > +} > + > /* This helper checks if a socket is a LISTEN or NEW_SYN_RECV > * SYNACK messages can be attached to either ones (depending on SYNCOOKIE) > */ > diff --git a/net/core/dev.c b/net/core/dev.c > index b0eee49a2489..6a78d9046674 100644 > --- a/net/core/dev.c > +++ b/net/core/dev.c > @@ -3051,6 +3051,10 @@ static struct sk_buff *validate_xmit_skb(struct sk_buff *skb, struct net_device > if (unlikely(!skb)) > goto out_null; > > + skb = skb_offload_check(skb, dev); > + if (!skb) > + goto out_null; > + > if (netif_needs_gso(skb, features)) { > struct sk_buff *segs; > How is it going to work with stacked devices (bonding, team, tunnels) ? ^ permalink raw reply [flat|nested] 35+ messages in thread
* RE: [PATCH v3 net-next 3/6] net: Add SW fallback infrastructure for offloaded sockets 2017-12-19 19:12 ` Eric Dumazet @ 2017-12-19 19:15 ` Ilya Lesokhin 0 siblings, 0 replies; 35+ messages in thread From: Ilya Lesokhin @ 2017-12-19 19:15 UTC (permalink / raw) To: Eric Dumazet, netdev, davem Cc: davejwatson, tom, hannes, Boris Pismenny, Aviad Yehezkel, Liran Liss > > diff --git a/net/core/dev.c b/net/core/dev.c > > index b0eee49a2489..6a78d9046674 100644 > > --- a/net/core/dev.c > > +++ b/net/core/dev.c > > @@ -3051,6 +3051,10 @@ static struct sk_buff *validate_xmit_skb(struct > sk_buff *skb, struct net_device > > if (unlikely(!skb)) > > goto out_null; > > > > + skb = skb_offload_check(skb, dev); > > + if (!skb) > > + goto out_null; > > + > > if (netif_needs_gso(skb, features)) { > > struct sk_buff *segs; > > > > How is it going to work with stacked devices (bonding, team, tunnels) ? > Currently we don't support TLS offload on stacked devices. Note however that in that case the TLS layer will refuse to use device offload And we will not go through this software offload path. ^ permalink raw reply [flat|nested] 35+ messages in thread
* [PATCH v3 net-next 4/6] net: Add TLS offload netdev ops 2017-12-18 11:10 [PATCH v3 net-next 0/6] tls: Add generic NIC offload infrastructure Ilya Lesokhin ` (2 preceding siblings ...) 2017-12-18 11:10 ` [PATCH v3 net-next 3/6] net: Add SW fallback infrastructure for offloaded sockets Ilya Lesokhin @ 2017-12-18 11:10 ` Ilya Lesokhin 2017-12-18 11:10 ` [PATCH v3 net-next 5/6] net: Add TLS TX offload features Ilya Lesokhin ` (2 subsequent siblings) 6 siblings, 0 replies; 35+ messages in thread From: Ilya Lesokhin @ 2017-12-18 11:10 UTC (permalink / raw) To: netdev, davem Cc: davejwatson, tom, hannes, borisp, aviadye, liranl, Ilya Lesokhin Add new netdev ops to add and delete tls context Signed-off-by: Boris Pismenny <borisp@mellanox.com> Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com> Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com> --- include/linux/netdevice.h | 23 +++++++++++++++++++++++ 1 file changed, 23 insertions(+) diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index cc4ce7456e38..48f26fe67b82 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -845,6 +845,25 @@ struct xfrmdev_ops { }; #endif +#if IS_ENABLED(CONFIG_TLS_DEVICE) +enum tls_offload_ctx_dir { + TLS_OFFLOAD_CTX_DIR_RX, + TLS_OFFLOAD_CTX_DIR_TX, +}; + +struct tls_crypto_info; +struct tls_context; + +struct tlsdev_ops { + int (*tls_dev_add)(struct net_device *netdev, struct sock *sk, + enum tls_offload_ctx_dir direction, + struct tls_crypto_info *crypto_info); + void (*tls_dev_del)(struct net_device *netdev, + struct tls_context *ctx, + enum tls_offload_ctx_dir direction); +}; +#endif + struct dev_ifalias { struct rcu_head rcuhead; char ifalias[]; @@ -1730,6 +1749,10 @@ struct net_device { const struct xfrmdev_ops *xfrmdev_ops; #endif +#if IS_ENABLED(CONFIG_TLS_DEVICE) + const struct tlsdev_ops *tlsdev_ops; +#endif + const struct header_ops *header_ops; unsigned int flags; -- 2.15.0.317.g14c63a9 ^ permalink raw reply related [flat|nested] 35+ messages in thread
* [PATCH v3 net-next 5/6] net: Add TLS TX offload features 2017-12-18 11:10 [PATCH v3 net-next 0/6] tls: Add generic NIC offload infrastructure Ilya Lesokhin ` (3 preceding siblings ...) 2017-12-18 11:10 ` [PATCH v3 net-next 4/6] net: Add TLS offload netdev ops Ilya Lesokhin @ 2017-12-18 11:10 ` Ilya Lesokhin 2017-12-18 11:10 ` [PATCH v3 net-next 6/6] tls: Add generic NIC offload infrastructure Ilya Lesokhin 2017-12-18 17:10 ` [PATCH v3 net-next 0/6] " Jiri Pirko 6 siblings, 0 replies; 35+ messages in thread From: Ilya Lesokhin @ 2017-12-18 11:10 UTC (permalink / raw) To: netdev, davem Cc: davejwatson, tom, hannes, borisp, aviadye, liranl, Ilya Lesokhin This patch adds a netdev feature to configure TLS TX offloads. Signed-off-by: Boris Pismenny <borisp@mellanox.com> Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com> Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com> --- include/linux/netdev_features.h | 2 ++ net/core/ethtool.c | 1 + 2 files changed, 3 insertions(+) diff --git a/include/linux/netdev_features.h b/include/linux/netdev_features.h index b1b0ca7ccb2b..b219776a1340 100644 --- a/include/linux/netdev_features.h +++ b/include/linux/netdev_features.h @@ -77,6 +77,7 @@ enum { NETIF_F_HW_ESP_BIT, /* Hardware ESP transformation offload */ NETIF_F_HW_ESP_TX_CSUM_BIT, /* ESP with TX checksum offload */ NETIF_F_RX_UDP_TUNNEL_PORT_BIT, /* Offload of RX port for UDP tunnels */ + NETIF_F_HW_TLS_TX_BIT, /* Hardware TLS TX offload */ /* * Add your fresh new feature above and remember to update @@ -142,6 +143,7 @@ enum { #define NETIF_F_HW_ESP __NETIF_F(HW_ESP) #define NETIF_F_HW_ESP_TX_CSUM __NETIF_F(HW_ESP_TX_CSUM) #define NETIF_F_RX_UDP_TUNNEL_PORT __NETIF_F(RX_UDP_TUNNEL_PORT) +#define NETIF_F_HW_TLS_TX __NETIF_F(HW_TLS_TX) #define for_each_netdev_feature(mask_addr, bit) \ for_each_set_bit(bit, (unsigned long *)mask_addr, NETDEV_FEATURE_COUNT) diff --git a/net/core/ethtool.c b/net/core/ethtool.c index f8fcf450a36e..a1138d2dafa6 100644 --- a/net/core/ethtool.c +++ b/net/core/ethtool.c @@ -106,6 +106,7 @@ static const char netdev_features_strings[NETDEV_FEATURE_COUNT][ETH_GSTRING_LEN] [NETIF_F_HW_ESP_BIT] = "esp-hw-offload", [NETIF_F_HW_ESP_TX_CSUM_BIT] = "esp-tx-csum-hw-offload", [NETIF_F_RX_UDP_TUNNEL_PORT_BIT] = "rx-udp_tunnel-port-offload", + [NETIF_F_HW_TLS_TX_BIT] = "tls-hw-tx-offload", }; static const char -- 2.15.0.317.g14c63a9 ^ permalink raw reply related [flat|nested] 35+ messages in thread
* [PATCH v3 net-next 6/6] tls: Add generic NIC offload infrastructure. 2017-12-18 11:10 [PATCH v3 net-next 0/6] tls: Add generic NIC offload infrastructure Ilya Lesokhin ` (4 preceding siblings ...) 2017-12-18 11:10 ` [PATCH v3 net-next 5/6] net: Add TLS TX offload features Ilya Lesokhin @ 2017-12-18 11:10 ` Ilya Lesokhin 2017-12-18 19:53 ` Marcelo Ricardo Leitner ` (4 more replies) 2017-12-18 17:10 ` [PATCH v3 net-next 0/6] " Jiri Pirko 6 siblings, 5 replies; 35+ messages in thread From: Ilya Lesokhin @ 2017-12-18 11:10 UTC (permalink / raw) To: netdev, davem Cc: davejwatson, tom, hannes, borisp, aviadye, liranl, Ilya Lesokhin This patch adds a generic infrastructure to offload TLS crypto to a network devices. It enables the kernel TLS socket to skip encryption and authentication operations on the transmit side of the data path. Leaving those computationally expensive operations to the NIC. The NIC offload infrastructure builds TLS records and pushes them to the TCP layer just like the SW KTLS implementation and using the same API. TCP segmentation is mostly unaffected. Currently the only exception is that we prevent mixed SKBs where only part of the payload requires offload. In the future we are likely to add a similar restriction following a change cipher spec record. The notable differences between SW KTLS and NIC offloaded TLS implementations are as follows: 1. The offloaded implementation builds "plaintext TLS record", those records contain plaintext instead of ciphertext and place holder bytes instead of authentication tags. 2. The offloaded implementation maintains a mapping from TCP sequence number to TLS records. Thus given a TCP SKB sent from a NIC offloaded TLS socket, we can use the tls NIC offload infrastructure to obtain enough context to encrypt the payload of the SKB. A TLS record is released when the last byte of the record is ack'ed, this is done through the new icsk_clean_acked callback. The infrastructure should be extendable to support various NIC offload implementations. However it is currently written with the implementation below in mind: The NIC assumes that packets from each offloaded stream are sent as plaintext and in-order. It keeps track of the TLS records in the TCP stream. When a packet marked for offload is transmitted, the NIC encrypts the payload in-place and puts authentication tags in the relevant place holders. The responsibility for handling out-of-order packets (i.e. TCP retransmission, qdisc drops) falls on the netdev driver. The netdev driver keeps track of the expected TCP SN from the NIC's perspective. If the next packet to transmit matches the expected TCP SN, the driver advances the expected TCP SN, and transmits the packet with TLS offload indication. If the next packet to transmit does not match the expected TCP SN. The driver calls the TLS layer to obtain the TLS record that includes the TCP of the packet for transmission. Using this TLS record, the driver posts a work entry on the transmit queue to reconstruct the NIC TLS state required for the offload of the out-of-order packet. It updates the expected TCP SN accordingly and transmit the now in-order packet. The same queue is used for packet transmission and TLS context reconstruction to avoid the need for flushing the transmit queue before issuing the context reconstruction request. Signed-off-by: Boris Pismenny <borisp@mellanox.com> Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com> Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com> --- include/net/tls.h | 62 +++- net/tls/Kconfig | 9 + net/tls/Makefile | 3 + net/tls/tls_device.c | 800 ++++++++++++++++++++++++++++++++++++++++++ net/tls/tls_device_fallback.c | 405 +++++++++++++++++++++ net/tls/tls_main.c | 33 +- 6 files changed, 1305 insertions(+), 7 deletions(-) create mode 100644 net/tls/tls_device.c create mode 100644 net/tls/tls_device_fallback.c diff --git a/include/net/tls.h b/include/net/tls.h index 936cfc5cab7d..9c1b5d13d9a7 100644 --- a/include/net/tls.h +++ b/include/net/tls.h @@ -75,6 +75,29 @@ struct tls_sw_context { struct scatterlist sg_aead_out[2]; }; +struct tls_record_info { + struct list_head list; + u32 end_seq; + int len; + int num_frags; + skb_frag_t frags[MAX_SKB_FRAGS]; +}; + +struct tls_offload_context { + struct crypto_aead *aead_send; + + struct list_head records_list; + struct scatterlist sg_tx_data[MAX_SKB_FRAGS]; + void (*sk_destruct)(struct sock *sk); + struct tls_record_info *open_record; + struct tls_record_info *retransmit_hint; + u64 hint_record_sn; + u64 unacked_record_sn; + + u32 expected_seq; + spinlock_t lock; /* protects records list */ +}; + enum { TLS_PENDING_CLOSED_RECORD }; @@ -85,6 +108,10 @@ struct tls_context { struct tls12_crypto_info_aes_gcm_128 crypto_send_aes_gcm_128; }; + struct list_head list; + struct net_device *netdev; + refcount_t refcount; + void *priv_ctx; u8 tx_conf:2; @@ -129,9 +156,29 @@ int tls_sw_sendpage(struct sock *sk, struct page *page, void tls_sw_close(struct sock *sk, long timeout); void tls_sw_free_tx_resources(struct sock *sk); -void tls_sk_destruct(struct sock *sk, struct tls_context *ctx); -void tls_icsk_clean_acked(struct sock *sk); +void tls_clear_device_offload(struct sock *sk, struct tls_context *ctx); +int tls_set_device_offload(struct sock *sk, struct tls_context *ctx); +int tls_device_sendmsg(struct sock *sk, struct msghdr *msg, size_t size); +int tls_device_sendpage(struct sock *sk, struct page *page, + int offset, size_t size, int flags); +void tls_device_sk_destruct(struct sock *sk); +void tls_device_init(void); +void tls_device_cleanup(void); +struct tls_record_info *tls_get_record(struct tls_offload_context *context, + u32 seq, u64 *p_record_sn); + +static inline bool tls_record_is_start_marker(struct tls_record_info *rec) +{ + return rec->len == 0; +} + +static inline u32 tls_record_start_seq(struct tls_record_info *rec) +{ + return rec->end_seq - rec->len; +} + +void tls_sk_destruct(struct sock *sk, struct tls_context *ctx); int tls_push_sg(struct sock *sk, struct tls_context *ctx, struct scatterlist *sg, u16 first_offset, int flags); @@ -168,6 +215,13 @@ static inline bool tls_is_pending_open_record(struct tls_context *tls_ctx) return tls_ctx->pending_open_record_frags; } +static inline bool tls_is_sk_tx_device_offloaded(struct sock *sk) +{ + return sk_fullsock(sk) && + /* matches smp_store_release in tls_set_device_offload */ + smp_load_acquire(&sk->sk_destruct) == &tls_device_sk_destruct; +} + static inline void tls_err_abort(struct sock *sk) { sk->sk_err = -EBADMSG; @@ -255,4 +309,8 @@ static inline struct tls_offload_context *tls_offload_ctx( int tls_proccess_cmsg(struct sock *sk, struct msghdr *msg, unsigned char *record_type); +int tls_sw_fallback_init(struct sock *sk, + struct tls_offload_context *offload_ctx, + struct tls_crypto_info *crypto_info); + #endif /* _TLS_OFFLOAD_H */ diff --git a/net/tls/Kconfig b/net/tls/Kconfig index eb583038c67e..1a4ea55c2f09 100644 --- a/net/tls/Kconfig +++ b/net/tls/Kconfig @@ -13,3 +13,12 @@ config TLS encryption handling of the TLS protocol to be done in-kernel. If unsure, say N. + +config TLS_DEVICE + bool "Transport Layer Security HW offload" + depends on TLS + default n + ---help--- + Enable kernel support for HW offload of the TLS protocol. + + If unsure, say N. diff --git a/net/tls/Makefile b/net/tls/Makefile index a930fd1c4f7b..44483cd47b3a 100644 --- a/net/tls/Makefile +++ b/net/tls/Makefile @@ -5,3 +5,6 @@ obj-$(CONFIG_TLS) += tls.o tls-y := tls_main.o tls_sw.o + +tls-$(CONFIG_TLS_DEVICE) += tls_device.o tls_device_fallback.o + diff --git a/net/tls/tls_device.c b/net/tls/tls_device.c new file mode 100644 index 000000000000..5082d693a503 --- /dev/null +++ b/net/tls/tls_device.c @@ -0,0 +1,800 @@ +/* Copyright (c) 2016-2017, Mellanox Technologies All rights reserved. + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * - Neither the name of the Mellanox Technologies nor the + * names of its contributors may be used to endorse or promote + * products derived from this software without specific prior written + * permission. + * + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" + * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, + * THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + * A PARTICULAR PURPOSE ARE DISCLAIMED. + * IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR + * ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR + * SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, + * STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING + * IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE + * POSSIBILITY OF SUCH DAMAGE + */ + +#include <linux/module.h> +#include <net/tcp.h> +#include <net/inet_common.h> +#include <linux/highmem.h> +#include <linux/netdevice.h> + +#include <net/tls.h> +#include <crypto/aead.h> + +/* device_offload_lock is used to synchronize tls_dev_add + * against NETDEV_DOWN notifications. + */ +DEFINE_STATIC_PERCPU_RWSEM(device_offload_lock); + +static void tls_device_gc_task(struct work_struct *work); + +static DECLARE_WORK(tls_device_gc_work, tls_device_gc_task); +static LIST_HEAD(tls_device_gc_list); +static LIST_HEAD(tls_device_list); +static DEFINE_SPINLOCK(tls_device_lock); + +static void tls_device_free_ctx(struct tls_context *ctx) +{ + struct tls_offload_context *offlad_ctx = tls_offload_ctx(ctx); + + kfree(offlad_ctx); + kfree(ctx); +} + +static void tls_device_gc_task(struct work_struct *work) +{ + struct tls_context *ctx, *tmp; + struct list_head gc_list; + unsigned long flags; + + spin_lock_irqsave(&tls_device_lock, flags); + INIT_LIST_HEAD(&gc_list); + list_splice_init(&tls_device_gc_list, &gc_list); + spin_unlock_irqrestore(&tls_device_lock, flags); + + list_for_each_entry_safe(ctx, tmp, &gc_list, list) { + struct net_device *netdev = ctx->netdev; + + if (netdev) { + netdev->tlsdev_ops->tls_dev_del(netdev, ctx, + TLS_OFFLOAD_CTX_DIR_TX); + dev_put(netdev); + } + + list_del(&ctx->list); + tls_device_free_ctx(ctx); + } +} + +static void tls_device_queue_ctx_destruction(struct tls_context *ctx) +{ + unsigned long flags; + + spin_lock_irqsave(&tls_device_lock, flags); + list_move_tail(&ctx->list, &tls_device_gc_list); + + /* schedule_work inside the spinlock + * to make sure tls_device_down waits for that work. + */ + schedule_work(&tls_device_gc_work); + + spin_unlock_irqrestore(&tls_device_lock, flags); +} + +/* We assume that the socket is already connected */ +static struct net_device *get_netdev_for_sock(struct sock *sk) +{ + struct inet_sock *inet = inet_sk(sk); + struct net_device *netdev = NULL; + + netdev = dev_get_by_index(sock_net(sk), inet->cork.fl.flowi_oif); + + return netdev; +} + +static int attach_sock_to_netdev(struct sock *sk, struct net_device *netdev, + struct tls_context *ctx) +{ + int rc; + + rc = netdev->tlsdev_ops->tls_dev_add(netdev, sk, TLS_OFFLOAD_CTX_DIR_TX, + &ctx->crypto_send); + if (rc) { + pr_err("The netdev has refused to offload this socket\n"); + goto out; + } + + rc = 0; +out: + return rc; +} + +static void destroy_record(struct tls_record_info *record) +{ + skb_frag_t *frag; + int nr_frags = record->num_frags; + + while (nr_frags > 0) { + frag = &record->frags[nr_frags - 1]; + __skb_frag_unref(frag); + --nr_frags; + } + kfree(record); +} + +static void delete_all_records(struct tls_offload_context *offload_ctx) +{ + struct tls_record_info *info, *temp; + + list_for_each_entry_safe(info, temp, &offload_ctx->records_list, list) { + list_del(&info->list); + destroy_record(info); + } + + offload_ctx->retransmit_hint = NULL; +} + +static void tls_icsk_clean_acked(struct sock *sk) +{ + struct tls_context *tls_ctx = tls_get_ctx(sk); + struct tls_offload_context *ctx; + struct tcp_sock *tp = tcp_sk(sk); + struct tls_record_info *info, *temp; + unsigned long flags; + u64 deleted_records = 0; + + if (!tls_ctx) + return; + + ctx = tls_offload_ctx(tls_ctx); + + spin_lock_irqsave(&ctx->lock, flags); + info = ctx->retransmit_hint; + if (info && !before(tp->snd_una, info->end_seq)) { + ctx->retransmit_hint = NULL; + list_del(&info->list); + destroy_record(info); + deleted_records++; + } + + list_for_each_entry_safe(info, temp, &ctx->records_list, list) { + if (before(tp->snd_una, info->end_seq)) + break; + list_del(&info->list); + + destroy_record(info); + deleted_records++; + } + + ctx->unacked_record_sn += deleted_records; + spin_unlock_irqrestore(&ctx->lock, flags); +} + +/* At this point, there should be no references on this + * socket and no in-flight SKBs associated with this + * socket, so it is safe to free all the resources. + */ +void tls_device_sk_destruct(struct sock *sk) +{ + struct tls_context *tls_ctx = tls_get_ctx(sk); + struct tls_offload_context *ctx = tls_offload_ctx(tls_ctx); + + if (ctx->open_record) + destroy_record(ctx->open_record); + + delete_all_records(ctx); + crypto_free_aead(ctx->aead_send); + ctx->sk_destruct(sk); + + if (refcount_dec_and_test(&tls_ctx->refcount)) + tls_device_queue_ctx_destruction(tls_ctx); +} +EXPORT_SYMBOL(tls_device_sk_destruct); + +static inline void tls_append_frag(struct tls_record_info *record, + struct page_frag *pfrag, + int size) +{ + skb_frag_t *frag; + + frag = &record->frags[record->num_frags - 1]; + if (frag->page.p == pfrag->page && + frag->page_offset + frag->size == pfrag->offset) { + frag->size += size; + } else { + ++frag; + frag->page.p = pfrag->page; + frag->page_offset = pfrag->offset; + frag->size = size; + ++record->num_frags; + get_page(pfrag->page); + } + + pfrag->offset += size; + record->len += size; +} + +static inline int tls_push_record(struct sock *sk, + struct tls_context *ctx, + struct tls_offload_context *offload_ctx, + struct tls_record_info *record, + struct page_frag *pfrag, + int flags, + unsigned char record_type) +{ + skb_frag_t *frag; + struct tcp_sock *tp = tcp_sk(sk); + struct page_frag fallback_frag; + struct page_frag *tag_pfrag = pfrag; + int i; + + /* fill prepand */ + frag = &record->frags[0]; + tls_fill_prepend(ctx, + skb_frag_address(frag), + record->len - ctx->prepend_size, + record_type); + + if (unlikely(!skb_page_frag_refill(ctx->tag_size, pfrag, GFP_KERNEL))) { + /* HW doesn't care about the data in the tag + * so in case pfrag has no room + * for a tag and we can't allocate a new pfrag + * just use the page in the first frag + * rather then write a complicated fall back code. + */ + tag_pfrag = &fallback_frag; + tag_pfrag->page = skb_frag_page(frag); + tag_pfrag->offset = 0; + } + + tls_append_frag(record, tag_pfrag, ctx->tag_size); + record->end_seq = tp->write_seq + record->len; + spin_lock_irq(&offload_ctx->lock); + list_add_tail(&record->list, &offload_ctx->records_list); + spin_unlock_irq(&offload_ctx->lock); + offload_ctx->open_record = NULL; + set_bit(TLS_PENDING_CLOSED_RECORD, &ctx->flags); + tls_advance_record_sn(sk, ctx); + + for (i = 0; i < record->num_frags; i++) { + frag = &record->frags[i]; + sg_unmark_end(&offload_ctx->sg_tx_data[i]); + sg_set_page(&offload_ctx->sg_tx_data[i], skb_frag_page(frag), + frag->size, frag->page_offset); + sk_mem_charge(sk, frag->size); + get_page(skb_frag_page(frag)); + } + sg_mark_end(&offload_ctx->sg_tx_data[record->num_frags - 1]); + + /* all ready, send */ + return tls_push_sg(sk, ctx, offload_ctx->sg_tx_data, 0, flags); +} + +static inline int tls_create_new_record(struct tls_offload_context *offload_ctx, + struct page_frag *pfrag, + size_t prepend_size) +{ + skb_frag_t *frag; + struct tls_record_info *record; + + record = kmalloc(sizeof(*record), GFP_KERNEL); + if (!record) + return -ENOMEM; + + frag = &record->frags[0]; + __skb_frag_set_page(frag, pfrag->page); + frag->page_offset = pfrag->offset; + skb_frag_size_set(frag, prepend_size); + + get_page(pfrag->page); + pfrag->offset += prepend_size; + + record->num_frags = 1; + record->len = prepend_size; + offload_ctx->open_record = record; + return 0; +} + +static inline int tls_do_allocation(struct sock *sk, + struct tls_offload_context *offload_ctx, + struct page_frag *pfrag, + size_t prepend_size) +{ + int ret; + + if (!offload_ctx->open_record) { + if (unlikely(!skb_page_frag_refill(prepend_size, pfrag, + sk->sk_allocation))) { + sk->sk_prot->enter_memory_pressure(sk); + sk_stream_moderate_sndbuf(sk); + return -ENOMEM; + } + + ret = tls_create_new_record(offload_ctx, pfrag, prepend_size); + if (ret) + return ret; + + if (pfrag->size > pfrag->offset) + return 0; + } + + if (!sk_page_frag_refill(sk, pfrag)) + return -ENOMEM; + + return 0; +} + +static int tls_push_data(struct sock *sk, + struct iov_iter *msg_iter, + size_t size, int flags, + unsigned char record_type) +{ + struct tls_context *tls_ctx = tls_get_ctx(sk); + struct tls_offload_context *ctx = tls_offload_ctx(tls_ctx); + struct tls_record_info *record = ctx->open_record; + struct page_frag *pfrag; + int copy, rc = 0; + size_t orig_size = size; + u32 max_open_record_len; + long timeo; + int more = flags & (MSG_SENDPAGE_NOTLAST | MSG_MORE); + int tls_push_record_flags = flags | MSG_SENDPAGE_NOTLAST; + bool done = false; + + if (flags & + ~(MSG_MORE | MSG_DONTWAIT | MSG_NOSIGNAL | MSG_SENDPAGE_NOTLAST)) + return -ENOTSUPP; + + if (sk->sk_err) + return -sk->sk_err; + + timeo = sock_sndtimeo(sk, flags & MSG_DONTWAIT); + rc = tls_complete_pending_work(sk, tls_ctx, flags, &timeo); + if (rc < 0) + return rc; + + pfrag = sk_page_frag(sk); + + /* KTLS_TLS_HEADER_SIZE is not counted as part of the TLS record, and + * we need to leave room for an authentication tag. + */ + max_open_record_len = TLS_MAX_PAYLOAD_SIZE + + tls_ctx->prepend_size; + do { + if (tls_do_allocation(sk, ctx, pfrag, + tls_ctx->prepend_size)) { + rc = sk_stream_wait_memory(sk, &timeo); + if (!rc) + continue; + + record = ctx->open_record; + if (!record) + break; +handle_error: + if (record_type != TLS_RECORD_TYPE_DATA) { + /* avoid sending partial + * record with type != + * application_data + */ + size = orig_size; + destroy_record(record); + ctx->open_record = NULL; + } else if (record->len > tls_ctx->prepend_size) { + goto last_record; + } + + break; + } + + record = ctx->open_record; + copy = min_t(size_t, size, (pfrag->size - pfrag->offset)); + copy = min_t(size_t, copy, (max_open_record_len - record->len)); + + if (copy_from_iter_nocache(page_address(pfrag->page) + + pfrag->offset, + copy, msg_iter) != copy) { + rc = -EFAULT; + goto handle_error; + } + tls_append_frag(record, pfrag, copy); + + size -= copy; + if (!size) { +last_record: + tls_push_record_flags = flags; + if (more) { + tls_ctx->pending_open_record_frags = + record->num_frags; + break; + } + + done = true; + } + + if ((done) || record->len >= max_open_record_len || + (record->num_frags >= MAX_SKB_FRAGS - 1)) { + rc = tls_push_record(sk, + tls_ctx, + ctx, + record, + pfrag, + tls_push_record_flags, + record_type); + if (rc < 0) + break; + } + } while (!done); + + if (orig_size - size > 0) + rc = orig_size - size; + + return rc; +} + +int tls_device_sendmsg(struct sock *sk, struct msghdr *msg, size_t size) +{ + unsigned char record_type = TLS_RECORD_TYPE_DATA; + int rc = 0; + + lock_sock(sk); + + if (unlikely(msg->msg_controllen)) { + rc = tls_proccess_cmsg(sk, msg, &record_type); + if (rc) + goto out; + } + + rc = tls_push_data(sk, &msg->msg_iter, size, + msg->msg_flags, record_type); + +out: + release_sock(sk); + return rc; +} + +int tls_device_sendpage(struct sock *sk, struct page *page, + int offset, size_t size, int flags) +{ + struct iov_iter msg_iter; + struct kvec iov; + char *kaddr = kmap(page); + int rc = 0; + + if (flags & MSG_SENDPAGE_NOTLAST) + flags |= MSG_MORE; + + lock_sock(sk); + + if (flags & MSG_OOB) { + rc = -ENOTSUPP; + goto out; + } + + iov.iov_base = kaddr + offset; + iov.iov_len = size; + iov_iter_kvec(&msg_iter, WRITE | ITER_KVEC, &iov, 1, size); + rc = tls_push_data(sk, &msg_iter, size, + flags, TLS_RECORD_TYPE_DATA); + kunmap(page); + +out: + release_sock(sk); + return rc; +} + +struct tls_record_info *tls_get_record(struct tls_offload_context *context, + u32 seq, u64 *p_record_sn) +{ + struct tls_record_info *info; + u64 record_sn = context->hint_record_sn; + + info = context->retransmit_hint; + if (!info || + before(seq, info->end_seq - info->len)) { + /* if retransmit_hint is irrelevant start + * from the begging of the list + */ + info = list_first_entry(&context->records_list, + struct tls_record_info, list); + record_sn = context->unacked_record_sn; + } + + list_for_each_entry_from(info, &context->records_list, list) { + if (before(seq, info->end_seq)) { + if (!context->retransmit_hint || + after(info->end_seq, + context->retransmit_hint->end_seq)) { + context->hint_record_sn = record_sn; + context->retransmit_hint = info; + } + *p_record_sn = record_sn; + return info; + } + record_sn++; + } + + return NULL; +} +EXPORT_SYMBOL(tls_get_record); + +static int tls_device_push_pending_record(struct sock *sk, int flags) +{ + struct iov_iter msg_iter; + + iov_iter_kvec(&msg_iter, WRITE | ITER_KVEC, NULL, 0, 0); + return tls_push_data(sk, &msg_iter, 0, flags, TLS_RECORD_TYPE_DATA); +} + +int tls_set_device_offload(struct sock *sk, struct tls_context *ctx) +{ + struct tls_crypto_info *crypto_info; + struct tls_offload_context *offload_ctx; + struct tls_record_info *start_marker_record; + u16 nonece_size, tag_size, iv_size, rec_seq_size; + char *iv, *rec_seq; + int rc; + struct net_device *netdev; + struct sk_buff *skb; + + if (!ctx) { + rc = -EINVAL; + goto out; + } + + if (ctx->priv_ctx) { + rc = -EEXIST; + goto out; + } + + /* We support starting offload on multiple sockets + * concurrently, So we only need a read lock here. + */ + percpu_down_read(&device_offload_lock); + netdev = get_netdev_for_sock(sk); + if (!netdev) { + pr_err("%s: netdev not found\n", __func__); + rc = -EINVAL; + goto release_lock; + } + + if (!(netdev->features & NETIF_F_HW_TLS_TX)) { + rc = -ENOTSUPP; + goto release_netdev; + } + + /* Avoid offloading if the device is down + * We don't want to offload new flows after + * the NETDEV_DOWN event + */ + if (!(netdev->flags & IFF_UP)) { + rc = -EINVAL; + goto release_lock; + } + + crypto_info = &ctx->crypto_send; + switch (crypto_info->cipher_type) { + case TLS_CIPHER_AES_GCM_128: { + nonece_size = TLS_CIPHER_AES_GCM_128_IV_SIZE; + tag_size = TLS_CIPHER_AES_GCM_128_TAG_SIZE; + iv_size = TLS_CIPHER_AES_GCM_128_IV_SIZE; + iv = ((struct tls12_crypto_info_aes_gcm_128 *)crypto_info)->iv; + rec_seq_size = TLS_CIPHER_AES_GCM_128_REC_SEQ_SIZE; + rec_seq = + ((struct tls12_crypto_info_aes_gcm_128 *)crypto_info)->rec_seq; + break; + } + default: + rc = -EINVAL; + goto release_netdev; + } + + start_marker_record = kmalloc(sizeof(*start_marker_record), GFP_KERNEL); + if (!start_marker_record) { + rc = -ENOMEM; + goto release_netdev; + } + + rc = attach_sock_to_netdev(sk, netdev, ctx); + if (rc) + goto free_marker_record; + + ctx->netdev = netdev; + + ctx->prepend_size = TLS_HEADER_SIZE + nonece_size; + ctx->tag_size = tag_size; + ctx->iv_size = iv_size; + ctx->iv = kmalloc(iv_size + TLS_CIPHER_AES_GCM_128_SALT_SIZE, + GFP_KERNEL); + if (!ctx->iv) { + rc = -ENOMEM; + goto detach_sock; + } + + memcpy(ctx->iv + TLS_CIPHER_AES_GCM_128_SALT_SIZE, rec_seq, iv_size); + + ctx->rec_seq_size = rec_seq_size; + ctx->rec_seq = kmalloc(rec_seq_size, GFP_KERNEL); + if (!ctx->rec_seq) { + rc = -ENOMEM; + goto err_iv; + } + memcpy(ctx->rec_seq, rec_seq, rec_seq_size); + + offload_ctx = ctx->priv_ctx; + memcpy(&offload_ctx->unacked_record_sn, rec_seq, + sizeof(offload_ctx->unacked_record_sn)); + + /* start at rec_seq -1 to account for the start marker record */ + offload_ctx->unacked_record_sn = + be64_to_cpu(offload_ctx->unacked_record_sn) - 1; + + rc = tls_sw_fallback_init(sk, offload_ctx, crypto_info); + if (rc) + goto err_iv; + + start_marker_record->end_seq = tcp_sk(sk)->write_seq; + start_marker_record->len = 0; + start_marker_record->num_frags = 0; + + INIT_LIST_HEAD(&offload_ctx->records_list); + list_add_tail(&start_marker_record->list, &offload_ctx->records_list); + spin_lock_init(&offload_ctx->lock); + + inet_csk(sk)->icsk_clean_acked = &tls_icsk_clean_acked; + ctx->push_pending_record = tls_device_push_pending_record; + offload_ctx->sk_destruct = sk->sk_destruct; + + /* TLS offload is greatly simplified if we don't send + * SKBs where only part of the payload needs to be encrypted. + * So mark the last skb in the write queue as end of record. + */ + skb = tcp_write_queue_tail(sk); + if (skb) + TCP_SKB_CB(skb)->eor = 1; + + refcount_set(&ctx->refcount, 1); + spin_lock_irq(&tls_device_lock); + list_add_tail(&ctx->list, &tls_device_list); + spin_unlock_irq(&tls_device_lock); + + /* following this assignment tls_is_sk_tx_device_offloaded + * will return true and the context might be accessed + * by the netdev's xmit function. + */ + smp_store_release(&sk->sk_destruct, + &tls_device_sk_destruct); + goto release_lock; + +err_iv: + kfree(ctx->iv); +detach_sock: + netdev->tlsdev_ops->tls_dev_del(netdev, ctx, TLS_OFFLOAD_CTX_DIR_TX); +free_marker_record: + kfree(start_marker_record); +release_netdev: + dev_put(netdev); +release_lock: + percpu_up_read(&device_offload_lock); +out: + return rc; +} + +static int tls_device_register(struct net_device *dev) +{ + if ((dev->features & NETIF_F_HW_TLS_TX) && !dev->tlsdev_ops) + return NOTIFY_BAD; + + return NOTIFY_DONE; +} + +static int tls_device_unregister(struct net_device *dev) +{ + return NOTIFY_DONE; +} + +static int tls_device_feat_change(struct net_device *dev) +{ + if ((dev->features & NETIF_F_HW_TLS_TX) && !dev->tlsdev_ops) + return NOTIFY_BAD; + + return NOTIFY_DONE; +} + +static int tls_device_down(struct net_device *netdev) +{ + struct tls_context *ctx, *tmp; + struct list_head list; + unsigned long flags; + + if (!(netdev->features & NETIF_F_HW_TLS_TX)) + return NOTIFY_DONE; + + /* Request a write lock to block new offload attempts + */ + percpu_down_write(&device_offload_lock); + + spin_lock_irqsave(&tls_device_lock, flags); + INIT_LIST_HEAD(&list); + + list_for_each_entry_safe(ctx, tmp, &tls_device_list, list) { + if (ctx->netdev != netdev || + !refcount_inc_not_zero(&ctx->refcount)) + continue; + + list_move(&ctx->list, &list); + } + spin_unlock_irqrestore(&tls_device_lock, flags); + + list_for_each_entry_safe(ctx, tmp, &list, list) { + netdev->tlsdev_ops->tls_dev_del(netdev, ctx, + TLS_OFFLOAD_CTX_DIR_TX); + ctx->netdev = NULL; + dev_put(netdev); + list_del_init(&ctx->list); + + if (refcount_dec_and_test(&ctx->refcount)) + tls_device_free_ctx(ctx); + } + + percpu_up_write(&device_offload_lock); + + flush_work(&tls_device_gc_work); + + return NOTIFY_DONE; +} + +static int tls_dev_event(struct notifier_block *this, unsigned long event, + void *ptr) +{ + struct net_device *dev = netdev_notifier_info_to_dev(ptr); + + switch (event) { + case NETDEV_REGISTER: + return tls_device_register(dev); + + case NETDEV_UNREGISTER: + return tls_device_unregister(dev); + + case NETDEV_FEAT_CHANGE: + return tls_device_feat_change(dev); + + case NETDEV_DOWN: + return tls_device_down(dev); + } + return NOTIFY_DONE; +} + +static struct notifier_block tls_dev_notifier = { + .notifier_call = tls_dev_event, +}; + +void __init tls_device_init(void) +{ + register_netdevice_notifier(&tls_dev_notifier); +} + +void __exit tls_device_cleanup(void) +{ + unregister_netdevice_notifier(&tls_dev_notifier); + flush_work(&tls_device_gc_work); +} diff --git a/net/tls/tls_device_fallback.c b/net/tls/tls_device_fallback.c new file mode 100644 index 000000000000..95d04d6ef0e4 --- /dev/null +++ b/net/tls/tls_device_fallback.c @@ -0,0 +1,405 @@ +/* Copyright (c) 2016-2017, Mellanox Technologies All rights reserved. + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * - Neither the name of the Mellanox Technologies nor the + * names of its contributors may be used to endorse or promote + * products derived from this software without specific prior written + * permission. + * + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" + * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, + * THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + * A PARTICULAR PURPOSE ARE DISCLAIMED. + * IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR + * ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR + * SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, + * STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING + * IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE + * POSSIBILITY OF SUCH DAMAGE + */ + +#include <net/tls.h> +#include <crypto/aead.h> +#include <crypto/scatterwalk.h> + +static void chain_to_walk(struct scatterlist *sg, struct scatter_walk *walk) +{ + struct scatterlist *src = walk->sg; + int diff = walk->offset - src->offset; + + sg_set_page(sg, sg_page(src), + src->length - diff, walk->offset); + + scatterwalk_crypto_chain(sg, sg_next(src), 0, 2); +} + +static int tls_enc_record(struct aead_request *aead_req, + struct crypto_aead *aead, char *aad, char *iv, + __be64 rcd_sn, struct scatter_walk *in, + struct scatter_walk *out, int *in_len) +{ + struct scatterlist sg_in[3]; + struct scatterlist sg_out[3]; + unsigned char buf[TLS_HEADER_SIZE + TLS_CIPHER_AES_GCM_128_IV_SIZE]; + u16 len; + int rc; + + len = min_t(int, *in_len, ARRAY_SIZE(buf)); + + scatterwalk_copychunks(buf, in, len, 0); + scatterwalk_copychunks(buf, out, len, 1); + + *in_len -= len; + if (!*in_len) + return 0; + + scatterwalk_pagedone(in, 0, 1); + scatterwalk_pagedone(out, 1, 1); + + len = buf[4] | (buf[3] << 8); + len -= TLS_CIPHER_AES_GCM_128_IV_SIZE; + + tls_make_aad(aad, len - TLS_CIPHER_AES_GCM_128_TAG_SIZE, + (char *)&rcd_sn, sizeof(rcd_sn), buf[0]); + + memcpy(iv + TLS_CIPHER_AES_GCM_128_SALT_SIZE, buf + TLS_HEADER_SIZE, + TLS_CIPHER_AES_GCM_128_IV_SIZE); + + sg_init_table(sg_in, ARRAY_SIZE(sg_in)); + sg_init_table(sg_out, ARRAY_SIZE(sg_out)); + sg_set_buf(sg_in, aad, TLS_AAD_SPACE_SIZE); + sg_set_buf(sg_out, aad, TLS_AAD_SPACE_SIZE); + chain_to_walk(sg_in + 1, in); + chain_to_walk(sg_out + 1, out); + + *in_len -= len; + if (*in_len < 0) { + *in_len += TLS_CIPHER_AES_GCM_128_TAG_SIZE; + if (*in_len < 0) + /* the input buffer doesn't contain the entire record. + * trim len accordingly. The resulting authentication tag + * will contain garbage. but we don't care as we won't + * include any of it in the output skb + * Note that we assume the output buffer length + * is larger then input buffer length + tag size + */ + len += *in_len; + + *in_len = 0; + } + + if (*in_len) { + scatterwalk_copychunks(NULL, in, len, 2); + scatterwalk_pagedone(in, 0, 1); + scatterwalk_copychunks(NULL, out, len, 2); + scatterwalk_pagedone(out, 1, 1); + } + + len -= TLS_CIPHER_AES_GCM_128_TAG_SIZE; + aead_request_set_crypt(aead_req, sg_in, sg_out, len, iv); + + rc = crypto_aead_encrypt(aead_req); + + return rc; +} + +static void tls_init_aead_request(struct aead_request *aead_req, + struct crypto_aead *aead) +{ + aead_request_set_tfm(aead_req, aead); + aead_request_set_ad(aead_req, TLS_AAD_SPACE_SIZE); +} + +static struct aead_request *tls_alloc_aead_request(struct crypto_aead *aead, + gfp_t flags) +{ + unsigned int req_size = sizeof(struct aead_request) + + crypto_aead_reqsize(aead); + struct aead_request *aead_req; + + aead_req = kzalloc(req_size, flags); + if (!aead) + return NULL; + + tls_init_aead_request(aead_req, aead); + return aead_req; +} + +static int tls_enc_records(struct aead_request *aead_req, + struct crypto_aead *aead, struct scatterlist *sg_in, + struct scatterlist *sg_out, char *aad, char *iv, + u64 rcd_sn, int len) +{ + struct scatter_walk in; + struct scatter_walk out; + int rc; + + scatterwalk_start(&in, sg_in); + scatterwalk_start(&out, sg_out); + + do { + rc = tls_enc_record(aead_req, aead, aad, iv, + cpu_to_be64(rcd_sn), &in, &out, &len); + rcd_sn++; + + } while (rc == 0 && len); + + scatterwalk_done(&in, 0, 0); + scatterwalk_done(&out, 1, 0); + + return rc; +} + +static inline void update_chksum(struct sk_buff *skb, int headln) +{ + /* Can't use icsk->icsk_af_ops->send_check here because the ip addresses + * might have been changed by NAT. + */ + + const struct ipv6hdr *ipv6h; + const struct iphdr *iph; + struct tcphdr *th = tcp_hdr(skb); + int datalen = skb->len - headln; + + /* We only changed the payload so if we are using partial we don't + * need to update anything. + */ + if (likely(skb->ip_summed == CHECKSUM_PARTIAL)) + return; + + skb->ip_summed = CHECKSUM_PARTIAL; + skb->csum_start = skb_transport_header(skb) - skb->head; + skb->csum_offset = offsetof(struct tcphdr, check); + + if (skb->sk->sk_family == AF_INET6) { + ipv6h = ipv6_hdr(skb); + th->check = ~csum_ipv6_magic(&ipv6h->saddr, &ipv6h->daddr, + datalen, IPPROTO_TCP, 0); + } else { + iph = ip_hdr(skb); + th->check = ~csum_tcpudp_magic(iph->saddr, iph->daddr, datalen, + IPPROTO_TCP, 0); + } +} + +static void complete_skb(struct sk_buff *nskb, struct sk_buff *skb, int headln) +{ + skb_copy_header(nskb, skb); + + skb_put(nskb, skb->len); + memcpy(nskb->data, skb->data, headln); + update_chksum(nskb, headln); + + nskb->destructor = skb->destructor; + nskb->sk = skb->sk; + skb->destructor = NULL; + skb->sk = NULL; + refcount_add(nskb->truesize - skb->truesize, + &nskb->sk->sk_wmem_alloc); +} + +/* This function may be called after the user socket is already + * closed so make sure we don't use anything freed during + * tls_sk_proto_close here + */ +struct sk_buff *tls_sw_fallback(struct sock *sk, struct sk_buff *skb) +{ + int tcp_header_size = tcp_hdrlen(skb); + int tcp_payload_offset = skb_transport_offset(skb) + tcp_header_size; + int payload_len = skb->len - tcp_payload_offset; + struct tls_context *tls_ctx = tls_get_ctx(sk); + struct tls_offload_context *ctx = tls_offload_ctx(tls_ctx); + int remaining, buf_len, resync_sgs, rc, i = 0; + void *buf, *dummy_buf, *iv, *aad; + struct scatterlist sg_in[2 * (MAX_SKB_FRAGS + 1)]; + struct scatterlist sg_out[3]; + u32 tcp_seq = ntohl(tcp_hdr(skb)->seq); + struct aead_request *aead_req; + struct sk_buff *nskb = NULL; + struct tls_record_info *record; + unsigned long flags; + s32 sync_size; + u64 rcd_sn; + + if (!payload_len) + return skb; + + sg_init_table(sg_in, ARRAY_SIZE(sg_in)); + sg_init_table(sg_out, ARRAY_SIZE(sg_out)); + + spin_lock_irqsave(&ctx->lock, flags); + record = tls_get_record(ctx, tcp_seq, &rcd_sn); + if (!record) { + spin_unlock_irqrestore(&ctx->lock, flags); + WARN(1, "Record not found for seq %u\n", tcp_seq); + goto free_orig; + } + + sync_size = tcp_seq - tls_record_start_seq(record); + if (sync_size < 0) { + int is_start_marker = tls_record_is_start_marker(record); + + spin_unlock_irqrestore(&ctx->lock, flags); + if (!is_start_marker) + /* This should only occur if the relevant record was + * already acked. In that case it should be ok + * to drop the packet and avoid retransmission. + * + * There is a corner case where the packet contains + * both an acked and a non-acked record. + * We currently don't handle that case and rely + * on TCP to retranmit a packet that doesn't contain + * already acked payload. + */ + goto free_orig; + + if (payload_len > -sync_size) { + WARN(1, "Fallback of partially offloaded packets is not supported\n"); + goto free_orig; + } else { + return skb; + } + } + + remaining = sync_size; + while (remaining > 0) { + skb_frag_t *frag = &record->frags[i]; + + __skb_frag_ref(frag); + sg_set_page(sg_in + i, skb_frag_page(frag), + skb_frag_size(frag), frag->page_offset); + + remaining -= skb_frag_size(frag); + + if (remaining < 0) + sg_in[i].length += remaining; + + i++; + } + spin_unlock_irqrestore(&ctx->lock, flags); + resync_sgs = i; + + aead_req = tls_alloc_aead_request(ctx->aead_send, GFP_ATOMIC); + if (!aead_req) + goto put_sg; + + buf_len = TLS_CIPHER_AES_GCM_128_SALT_SIZE + + TLS_CIPHER_AES_GCM_128_IV_SIZE + + TLS_AAD_SPACE_SIZE + + sync_size + + tls_ctx->tag_size; + buf = kmalloc(buf_len, GFP_ATOMIC); + if (!buf) + goto free_req; + + nskb = alloc_skb(skb_headroom(skb) + skb->len, GFP_ATOMIC); + if (!nskb) + goto free_req; + + skb_reserve(nskb, skb_headroom(skb)); + + iv = buf; + + memcpy(iv, tls_ctx->crypto_send_aes_gcm_128.salt, + TLS_CIPHER_AES_GCM_128_SALT_SIZE); + aad = buf + TLS_CIPHER_AES_GCM_128_SALT_SIZE + + TLS_CIPHER_AES_GCM_128_IV_SIZE; + dummy_buf = aad + TLS_AAD_SPACE_SIZE; + + sg_set_buf(&sg_out[0], dummy_buf, sync_size); + sg_set_buf(&sg_out[1], nskb->data + tcp_payload_offset, + payload_len); + /* Add room for authentication tag produced by crypto */ + dummy_buf += sync_size; + sg_set_buf(&sg_out[2], dummy_buf, tls_ctx->tag_size); + rc = skb_to_sgvec(skb, &sg_in[i], tcp_payload_offset, + payload_len); + if (rc < 0) + goto free_nskb; + + rc = tls_enc_records(aead_req, ctx->aead_send, sg_in, sg_out, aad, iv, + rcd_sn, sync_size + payload_len); + if (rc < 0) + goto free_nskb; + + complete_skb(nskb, skb, tcp_payload_offset); + + /* validate_xmit_skb_list assumes that if the skb wasn't segmented + * nskb->prev will point to the skb itself + */ + nskb->prev = nskb; +free_buf: + kfree(buf); +free_req: + kfree(aead_req); +put_sg: + for (i = 0; i < resync_sgs; i++) + put_page(sg_page(&sg_in[i])); +free_orig: + kfree_skb(skb); + return nskb; + +free_nskb: + kfree_skb(nskb); + nskb = NULL; + goto free_buf; +} + +static struct sk_buff * +tls_validate_xmit(struct sock *sk, struct net_device *dev, struct sk_buff *skb) +{ + if (dev == tls_get_ctx(sk)->netdev) + return skb; + + return tls_sw_fallback(sk, skb); +} + +int tls_sw_fallback_init(struct sock *sk, + struct tls_offload_context *offload_ctx, + struct tls_crypto_info *crypto_info) +{ + int rc; + const u8 *key; + + offload_ctx->aead_send = + crypto_alloc_aead("gcm(aes)", 0, CRYPTO_ALG_ASYNC); + if (IS_ERR(offload_ctx->aead_send)) { + pr_err("crypto_alloc_aead failed\n"); + rc = PTR_ERR(offload_ctx->aead_send); + offload_ctx->aead_send = NULL; + goto err_out; + } + + key = ((struct tls12_crypto_info_aes_gcm_128 *)crypto_info)->key; + + rc = crypto_aead_setkey(offload_ctx->aead_send, key, + TLS_CIPHER_AES_GCM_128_KEY_SIZE); + if (rc) + goto free_aead; + + rc = crypto_aead_setauthsize(offload_ctx->aead_send, + TLS_CIPHER_AES_GCM_128_TAG_SIZE); + if (rc) + goto free_aead; + + sk->sk_offload_check = tls_validate_xmit; + return 0; +free_aead: + crypto_free_aead(offload_ctx->aead_send); +err_out: + return rc; +} diff --git a/net/tls/tls_main.c b/net/tls/tls_main.c index e07ee3ae0023..f5a211cc30b1 100644 --- a/net/tls/tls_main.c +++ b/net/tls/tls_main.c @@ -48,6 +48,9 @@ MODULE_LICENSE("Dual BSD/GPL"); enum { TLS_BASE_TX, TLS_SW_TX, +#ifdef CONFIG_TLS_DEVICE + TLS_HW_TX, +#endif TLS_NUM_CONFIG, }; @@ -401,11 +404,19 @@ static int do_tls_setsockopt_tx(struct sock *sk, char __user *optval, goto out; } - /* currently SW is default, we will have ethtool in future */ - rc = tls_set_sw_offload(sk, ctx); - tx_conf = TLS_SW_TX; - if (rc) - goto err_crypto_info; +#ifdef CONFIG_TLS_DEVICE + rc = tls_set_device_offload(sk, ctx); + tx_conf = TLS_HW_TX; + if (rc) { +#else + { +#endif + /* if HW offload fails fallback to SW */ + rc = tls_set_sw_offload(sk, ctx); + tx_conf = TLS_SW_TX; + if (rc) + goto err_crypto_info; + } ctx->tx_conf = tx_conf; update_sk_prot(sk, ctx); @@ -487,12 +498,21 @@ static void build_protos(struct proto *prot, struct proto *base) prot[TLS_SW_TX] = prot[TLS_BASE_TX]; prot[TLS_SW_TX].sendmsg = tls_sw_sendmsg; prot[TLS_SW_TX].sendpage = tls_sw_sendpage; + +#ifdef CONFIG_TLS_DEVICE + prot[TLS_HW_TX] = prot[TLS_SW_TX]; + prot[TLS_HW_TX].sendmsg = tls_device_sendmsg; + prot[TLS_HW_TX].sendpage = tls_device_sendpage; +#endif } static int __init tls_register(void) { build_protos(tls_prots, &tcp_prot); +#ifdef CONFIG_TLS_DEVICE + tls_device_init(); +#endif tcp_register_ulp(&tcp_tls_ulp_ops); return 0; @@ -501,6 +521,9 @@ static int __init tls_register(void) static void __exit tls_unregister(void) { tcp_unregister_ulp(&tcp_tls_ulp_ops); +#ifdef CONFIG_TLS_DEVICE + tls_device_cleanup(); +#endif } module_init(tls_register); -- 2.15.0.317.g14c63a9 ^ permalink raw reply related [flat|nested] 35+ messages in thread
* Re: [PATCH v3 net-next 6/6] tls: Add generic NIC offload infrastructure. 2017-12-18 11:10 ` [PATCH v3 net-next 6/6] tls: Add generic NIC offload infrastructure Ilya Lesokhin @ 2017-12-18 19:53 ` Marcelo Ricardo Leitner 2017-12-19 7:31 ` Ilya Lesokhin 2017-12-19 7:00 ` kbuild test robot ` (3 subsequent siblings) 4 siblings, 1 reply; 35+ messages in thread From: Marcelo Ricardo Leitner @ 2017-12-18 19:53 UTC (permalink / raw) To: Ilya Lesokhin Cc: netdev, davem, davejwatson, tom, hannes, borisp, aviadye, liranl On Mon, Dec 18, 2017 at 01:10:33PM +0200, Ilya Lesokhin wrote: > This patch adds a generic infrastructure to offload TLS crypto to a > network devices. It enables the kernel TLS socket to skip encryption > and authentication operations on the transmit side of the data path. > Leaving those computationally expensive operations to the NIC. I have a hard time understanding why this was named 'tls_device' if no net_device's are registered. > > The NIC offload infrastructure builds TLS records and pushes them to > the TCP layer just like the SW KTLS implementation and using the same API. > TCP segmentation is mostly unaffected. Currently the only exception is > that we prevent mixed SKBs where only part of the payload requires > offload. In the future we are likely to add a similar restriction > following a change cipher spec record. > > The notable differences between SW KTLS and NIC offloaded TLS > implementations are as follows: > 1. The offloaded implementation builds "plaintext TLS record", those > records contain plaintext instead of ciphertext and place holder bytes > instead of authentication tags. > 2. The offloaded implementation maintains a mapping from TCP sequence > number to TLS records. Thus given a TCP SKB sent from a NIC offloaded > TLS socket, we can use the tls NIC offload infrastructure to obtain > enough context to encrypt the payload of the SKB. > A TLS record is released when the last byte of the record is ack'ed, > this is done through the new icsk_clean_acked callback. > > The infrastructure should be extendable to support various NIC offload > implementations. However it is currently written with the > implementation below in mind: > The NIC assumes that packets from each offloaded stream are sent as > plaintext and in-order. It keeps track of the TLS records in the TCP > stream. When a packet marked for offload is transmitted, the NIC > encrypts the payload in-place and puts authentication tags in the > relevant place holders. > > The responsibility for handling out-of-order packets (i.e. TCP > retransmission, qdisc drops) falls on the netdev driver. > > The netdev driver keeps track of the expected TCP SN from the NIC's > perspective. If the next packet to transmit matches the expected TCP > SN, the driver advances the expected TCP SN, and transmits the packet > with TLS offload indication. > > If the next packet to transmit does not match the expected TCP SN. The > driver calls the TLS layer to obtain the TLS record that includes the > TCP of the packet for transmission. Using this TLS record, the driver > posts a work entry on the transmit queue to reconstruct the NIC TLS > state required for the offload of the out-of-order packet. It updates > the expected TCP SN accordingly and transmit the now in-order packet. > The same queue is used for packet transmission and TLS context > reconstruction to avoid the need for flushing the transmit queue before > issuing the context reconstruction request. > > Signed-off-by: Boris Pismenny <borisp@mellanox.com> > Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com> > Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com> > --- > include/net/tls.h | 62 +++- > net/tls/Kconfig | 9 + > net/tls/Makefile | 3 + > net/tls/tls_device.c | 800 ++++++++++++++++++++++++++++++++++++++++++ > net/tls/tls_device_fallback.c | 405 +++++++++++++++++++++ > net/tls/tls_main.c | 33 +- > 6 files changed, 1305 insertions(+), 7 deletions(-) > create mode 100644 net/tls/tls_device.c > create mode 100644 net/tls/tls_device_fallback.c > > diff --git a/include/net/tls.h b/include/net/tls.h > index 936cfc5cab7d..9c1b5d13d9a7 100644 > --- a/include/net/tls.h > +++ b/include/net/tls.h > @@ -75,6 +75,29 @@ struct tls_sw_context { > struct scatterlist sg_aead_out[2]; > }; > > +struct tls_record_info { > + struct list_head list; > + u32 end_seq; > + int len; > + int num_frags; > + skb_frag_t frags[MAX_SKB_FRAGS]; > +}; > + > +struct tls_offload_context { > + struct crypto_aead *aead_send; > + > + struct list_head records_list; > + struct scatterlist sg_tx_data[MAX_SKB_FRAGS]; > + void (*sk_destruct)(struct sock *sk); > + struct tls_record_info *open_record; > + struct tls_record_info *retransmit_hint; > + u64 hint_record_sn; > + u64 unacked_record_sn; > + > + u32 expected_seq; > + spinlock_t lock; /* protects records list */ > +}; > + > enum { > TLS_PENDING_CLOSED_RECORD > }; > @@ -85,6 +108,10 @@ struct tls_context { > struct tls12_crypto_info_aes_gcm_128 crypto_send_aes_gcm_128; > }; > > + struct list_head list; > + struct net_device *netdev; > + refcount_t refcount; > + > void *priv_ctx; > > u8 tx_conf:2; > @@ -129,9 +156,29 @@ int tls_sw_sendpage(struct sock *sk, struct page *page, > void tls_sw_close(struct sock *sk, long timeout); > void tls_sw_free_tx_resources(struct sock *sk); > > -void tls_sk_destruct(struct sock *sk, struct tls_context *ctx); > -void tls_icsk_clean_acked(struct sock *sk); > +void tls_clear_device_offload(struct sock *sk, struct tls_context *ctx); > +int tls_set_device_offload(struct sock *sk, struct tls_context *ctx); > +int tls_device_sendmsg(struct sock *sk, struct msghdr *msg, size_t size); > +int tls_device_sendpage(struct sock *sk, struct page *page, > + int offset, size_t size, int flags); > +void tls_device_sk_destruct(struct sock *sk); > +void tls_device_init(void); > +void tls_device_cleanup(void); > > +struct tls_record_info *tls_get_record(struct tls_offload_context *context, > + u32 seq, u64 *p_record_sn); > + > +static inline bool tls_record_is_start_marker(struct tls_record_info *rec) > +{ > + return rec->len == 0; > +} > + > +static inline u32 tls_record_start_seq(struct tls_record_info *rec) > +{ > + return rec->end_seq - rec->len; > +} > + > +void tls_sk_destruct(struct sock *sk, struct tls_context *ctx); > int tls_push_sg(struct sock *sk, struct tls_context *ctx, > struct scatterlist *sg, u16 first_offset, > int flags); > @@ -168,6 +215,13 @@ static inline bool tls_is_pending_open_record(struct tls_context *tls_ctx) > return tls_ctx->pending_open_record_frags; > } > > +static inline bool tls_is_sk_tx_device_offloaded(struct sock *sk) > +{ > + return sk_fullsock(sk) && > + /* matches smp_store_release in tls_set_device_offload */ > + smp_load_acquire(&sk->sk_destruct) == &tls_device_sk_destruct; > +} > + > static inline void tls_err_abort(struct sock *sk) > { > sk->sk_err = -EBADMSG; > @@ -255,4 +309,8 @@ static inline struct tls_offload_context *tls_offload_ctx( > int tls_proccess_cmsg(struct sock *sk, struct msghdr *msg, > unsigned char *record_type); > > +int tls_sw_fallback_init(struct sock *sk, > + struct tls_offload_context *offload_ctx, > + struct tls_crypto_info *crypto_info); > + > #endif /* _TLS_OFFLOAD_H */ > diff --git a/net/tls/Kconfig b/net/tls/Kconfig > index eb583038c67e..1a4ea55c2f09 100644 > --- a/net/tls/Kconfig > +++ b/net/tls/Kconfig > @@ -13,3 +13,12 @@ config TLS > encryption handling of the TLS protocol to be done in-kernel. > > If unsure, say N. > + > +config TLS_DEVICE > + bool "Transport Layer Security HW offload" > + depends on TLS > + default n > + ---help--- > + Enable kernel support for HW offload of the TLS protocol. > + > + If unsure, say N. > diff --git a/net/tls/Makefile b/net/tls/Makefile > index a930fd1c4f7b..44483cd47b3a 100644 > --- a/net/tls/Makefile > +++ b/net/tls/Makefile > @@ -5,3 +5,6 @@ > obj-$(CONFIG_TLS) += tls.o > > tls-y := tls_main.o tls_sw.o > + > +tls-$(CONFIG_TLS_DEVICE) += tls_device.o tls_device_fallback.o > + > diff --git a/net/tls/tls_device.c b/net/tls/tls_device.c > new file mode 100644 > index 000000000000..5082d693a503 > --- /dev/null > +++ b/net/tls/tls_device.c > @@ -0,0 +1,800 @@ > +/* Copyright (c) 2016-2017, Mellanox Technologies All rights reserved. > + * > + * Redistribution and use in source and binary forms, with or > + * without modification, are permitted provided that the following > + * conditions are met: > + * > + * - Redistributions of source code must retain the above > + * copyright notice, this list of conditions and the following > + * disclaimer. > + * > + * - Redistributions in binary form must reproduce the above > + * copyright notice, this list of conditions and the following > + * disclaimer in the documentation and/or other materials > + * provided with the distribution. > + * > + * - Neither the name of the Mellanox Technologies nor the > + * names of its contributors may be used to endorse or promote > + * products derived from this software without specific prior written > + * permission. > + * > + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" > + * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, > + * THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR > + * A PARTICULAR PURPOSE ARE DISCLAIMED. > + * IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR > + * ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL > + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR > + * SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) > + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, > + * STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING > + * IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE > + * POSSIBILITY OF SUCH DAMAGE > + */ > + > +#include <linux/module.h> > +#include <net/tcp.h> > +#include <net/inet_common.h> > +#include <linux/highmem.h> > +#include <linux/netdevice.h> > + > +#include <net/tls.h> > +#include <crypto/aead.h> > + > +/* device_offload_lock is used to synchronize tls_dev_add > + * against NETDEV_DOWN notifications. > + */ > +DEFINE_STATIC_PERCPU_RWSEM(device_offload_lock); > + > +static void tls_device_gc_task(struct work_struct *work); > + > +static DECLARE_WORK(tls_device_gc_work, tls_device_gc_task); > +static LIST_HEAD(tls_device_gc_list); > +static LIST_HEAD(tls_device_list); > +static DEFINE_SPINLOCK(tls_device_lock); > + > +static void tls_device_free_ctx(struct tls_context *ctx) > +{ > + struct tls_offload_context *offlad_ctx = tls_offload_ctx(ctx); > + > + kfree(offlad_ctx); > + kfree(ctx); > +} > + > +static void tls_device_gc_task(struct work_struct *work) > +{ > + struct tls_context *ctx, *tmp; > + struct list_head gc_list; > + unsigned long flags; > + > + spin_lock_irqsave(&tls_device_lock, flags); > + INIT_LIST_HEAD(&gc_list); > + list_splice_init(&tls_device_gc_list, &gc_list); > + spin_unlock_irqrestore(&tls_device_lock, flags); > + > + list_for_each_entry_safe(ctx, tmp, &gc_list, list) { > + struct net_device *netdev = ctx->netdev; > + > + if (netdev) { > + netdev->tlsdev_ops->tls_dev_del(netdev, ctx, > + TLS_OFFLOAD_CTX_DIR_TX); > + dev_put(netdev); > + } > + > + list_del(&ctx->list); > + tls_device_free_ctx(ctx); > + } > +} > + > +static void tls_device_queue_ctx_destruction(struct tls_context *ctx) > +{ > + unsigned long flags; > + > + spin_lock_irqsave(&tls_device_lock, flags); > + list_move_tail(&ctx->list, &tls_device_gc_list); > + > + /* schedule_work inside the spinlock > + * to make sure tls_device_down waits for that work. > + */ > + schedule_work(&tls_device_gc_work); > + > + spin_unlock_irqrestore(&tls_device_lock, flags); > +} > + > +/* We assume that the socket is already connected */ > +static struct net_device *get_netdev_for_sock(struct sock *sk) > +{ > + struct inet_sock *inet = inet_sk(sk); > + struct net_device *netdev = NULL; > + > + netdev = dev_get_by_index(sock_net(sk), inet->cork.fl.flowi_oif); > + > + return netdev; > +} > + > +static int attach_sock_to_netdev(struct sock *sk, struct net_device *netdev, > + struct tls_context *ctx) > +{ > + int rc; > + > + rc = netdev->tlsdev_ops->tls_dev_add(netdev, sk, TLS_OFFLOAD_CTX_DIR_TX, > + &ctx->crypto_send); > + if (rc) { > + pr_err("The netdev has refused to offload this socket\n"); _ratelimit here is probably welcomed, as this could be triggered by users at will. > + goto out; > + } > + > + rc = 0; > +out: > + return rc; > +} > + > +static void destroy_record(struct tls_record_info *record) > +{ > + skb_frag_t *frag; > + int nr_frags = record->num_frags; > + > + while (nr_frags > 0) { > + frag = &record->frags[nr_frags - 1]; > + __skb_frag_unref(frag); > + --nr_frags; > + } > + kfree(record); > +} > + > +static void delete_all_records(struct tls_offload_context *offload_ctx) > +{ > + struct tls_record_info *info, *temp; > + > + list_for_each_entry_safe(info, temp, &offload_ctx->records_list, list) { > + list_del(&info->list); > + destroy_record(info); > + } > + > + offload_ctx->retransmit_hint = NULL; > +} > + > +static void tls_icsk_clean_acked(struct sock *sk) > +{ > + struct tls_context *tls_ctx = tls_get_ctx(sk); > + struct tls_offload_context *ctx; > + struct tcp_sock *tp = tcp_sk(sk); > + struct tls_record_info *info, *temp; > + unsigned long flags; > + u64 deleted_records = 0; > + > + if (!tls_ctx) > + return; > + > + ctx = tls_offload_ctx(tls_ctx); > + > + spin_lock_irqsave(&ctx->lock, flags); Would be nice if this spinlock could be avoided somehow, as it's being called right from tcp_ack(). > + info = ctx->retransmit_hint; > + if (info && !before(tp->snd_una, info->end_seq)) { > + ctx->retransmit_hint = NULL; > + list_del(&info->list); > + destroy_record(info); > + deleted_records++; > + } > + > + list_for_each_entry_safe(info, temp, &ctx->records_list, list) { > + if (before(tp->snd_una, info->end_seq)) > + break; > + list_del(&info->list); > + > + destroy_record(info); > + deleted_records++; > + } > + > + ctx->unacked_record_sn += deleted_records; > + spin_unlock_irqrestore(&ctx->lock, flags); > +} > + > +/* At this point, there should be no references on this > + * socket and no in-flight SKBs associated with this > + * socket, so it is safe to free all the resources. > + */ > +void tls_device_sk_destruct(struct sock *sk) > +{ > + struct tls_context *tls_ctx = tls_get_ctx(sk); > + struct tls_offload_context *ctx = tls_offload_ctx(tls_ctx); > + > + if (ctx->open_record) > + destroy_record(ctx->open_record); > + > + delete_all_records(ctx); > + crypto_free_aead(ctx->aead_send); > + ctx->sk_destruct(sk); > + > + if (refcount_dec_and_test(&tls_ctx->refcount)) > + tls_device_queue_ctx_destruction(tls_ctx); > +} > +EXPORT_SYMBOL(tls_device_sk_destruct); > + > +static inline void tls_append_frag(struct tls_record_info *record, > + struct page_frag *pfrag, > + int size) > +{ > + skb_frag_t *frag; > + > + frag = &record->frags[record->num_frags - 1]; > + if (frag->page.p == pfrag->page && > + frag->page_offset + frag->size == pfrag->offset) { > + frag->size += size; > + } else { > + ++frag; > + frag->page.p = pfrag->page; > + frag->page_offset = pfrag->offset; > + frag->size = size; > + ++record->num_frags; > + get_page(pfrag->page); > + } > + > + pfrag->offset += size; > + record->len += size; > +} > + > +static inline int tls_push_record(struct sock *sk, > + struct tls_context *ctx, > + struct tls_offload_context *offload_ctx, > + struct tls_record_info *record, > + struct page_frag *pfrag, > + int flags, > + unsigned char record_type) > +{ > + skb_frag_t *frag; > + struct tcp_sock *tp = tcp_sk(sk); > + struct page_frag fallback_frag; > + struct page_frag *tag_pfrag = pfrag; > + int i; > + > + /* fill prepand */ > + frag = &record->frags[0]; > + tls_fill_prepend(ctx, > + skb_frag_address(frag), > + record->len - ctx->prepend_size, > + record_type); > + > + if (unlikely(!skb_page_frag_refill(ctx->tag_size, pfrag, GFP_KERNEL))) { > + /* HW doesn't care about the data in the tag > + * so in case pfrag has no room > + * for a tag and we can't allocate a new pfrag > + * just use the page in the first frag > + * rather then write a complicated fall back code. > + */ > + tag_pfrag = &fallback_frag; > + tag_pfrag->page = skb_frag_page(frag); > + tag_pfrag->offset = 0; > + } > + > + tls_append_frag(record, tag_pfrag, ctx->tag_size); > + record->end_seq = tp->write_seq + record->len; > + spin_lock_irq(&offload_ctx->lock); > + list_add_tail(&record->list, &offload_ctx->records_list); > + spin_unlock_irq(&offload_ctx->lock); > + offload_ctx->open_record = NULL; > + set_bit(TLS_PENDING_CLOSED_RECORD, &ctx->flags); > + tls_advance_record_sn(sk, ctx); > + > + for (i = 0; i < record->num_frags; i++) { > + frag = &record->frags[i]; > + sg_unmark_end(&offload_ctx->sg_tx_data[i]); > + sg_set_page(&offload_ctx->sg_tx_data[i], skb_frag_page(frag), > + frag->size, frag->page_offset); > + sk_mem_charge(sk, frag->size); > + get_page(skb_frag_page(frag)); > + } > + sg_mark_end(&offload_ctx->sg_tx_data[record->num_frags - 1]); > + > + /* all ready, send */ > + return tls_push_sg(sk, ctx, offload_ctx->sg_tx_data, 0, flags); > +} > + > +static inline int tls_create_new_record(struct tls_offload_context *offload_ctx, > + struct page_frag *pfrag, > + size_t prepend_size) > +{ > + skb_frag_t *frag; > + struct tls_record_info *record; > + > + record = kmalloc(sizeof(*record), GFP_KERNEL); > + if (!record) > + return -ENOMEM; > + > + frag = &record->frags[0]; > + __skb_frag_set_page(frag, pfrag->page); > + frag->page_offset = pfrag->offset; > + skb_frag_size_set(frag, prepend_size); > + > + get_page(pfrag->page); > + pfrag->offset += prepend_size; > + > + record->num_frags = 1; > + record->len = prepend_size; > + offload_ctx->open_record = record; > + return 0; > +} > + > +static inline int tls_do_allocation(struct sock *sk, > + struct tls_offload_context *offload_ctx, > + struct page_frag *pfrag, > + size_t prepend_size) > +{ > + int ret; > + > + if (!offload_ctx->open_record) { > + if (unlikely(!skb_page_frag_refill(prepend_size, pfrag, > + sk->sk_allocation))) { > + sk->sk_prot->enter_memory_pressure(sk); > + sk_stream_moderate_sndbuf(sk); > + return -ENOMEM; > + } > + > + ret = tls_create_new_record(offload_ctx, pfrag, prepend_size); > + if (ret) > + return ret; > + > + if (pfrag->size > pfrag->offset) > + return 0; > + } > + > + if (!sk_page_frag_refill(sk, pfrag)) > + return -ENOMEM; > + > + return 0; > +} > + > +static int tls_push_data(struct sock *sk, > + struct iov_iter *msg_iter, > + size_t size, int flags, > + unsigned char record_type) > +{ > + struct tls_context *tls_ctx = tls_get_ctx(sk); > + struct tls_offload_context *ctx = tls_offload_ctx(tls_ctx); > + struct tls_record_info *record = ctx->open_record; > + struct page_frag *pfrag; > + int copy, rc = 0; > + size_t orig_size = size; > + u32 max_open_record_len; > + long timeo; > + int more = flags & (MSG_SENDPAGE_NOTLAST | MSG_MORE); > + int tls_push_record_flags = flags | MSG_SENDPAGE_NOTLAST; > + bool done = false; > + > + if (flags & > + ~(MSG_MORE | MSG_DONTWAIT | MSG_NOSIGNAL | MSG_SENDPAGE_NOTLAST)) > + return -ENOTSUPP; > + > + if (sk->sk_err) > + return -sk->sk_err; > + > + timeo = sock_sndtimeo(sk, flags & MSG_DONTWAIT); > + rc = tls_complete_pending_work(sk, tls_ctx, flags, &timeo); > + if (rc < 0) > + return rc; > + > + pfrag = sk_page_frag(sk); > + > + /* KTLS_TLS_HEADER_SIZE is not counted as part of the TLS record, and > + * we need to leave room for an authentication tag. > + */ > + max_open_record_len = TLS_MAX_PAYLOAD_SIZE + > + tls_ctx->prepend_size; > + do { > + if (tls_do_allocation(sk, ctx, pfrag, > + tls_ctx->prepend_size)) { > + rc = sk_stream_wait_memory(sk, &timeo); > + if (!rc) > + continue; > + > + record = ctx->open_record; > + if (!record) > + break; > +handle_error: > + if (record_type != TLS_RECORD_TYPE_DATA) { > + /* avoid sending partial > + * record with type != > + * application_data > + */ > + size = orig_size; > + destroy_record(record); > + ctx->open_record = NULL; > + } else if (record->len > tls_ctx->prepend_size) { > + goto last_record; > + } > + > + break; > + } > + > + record = ctx->open_record; > + copy = min_t(size_t, size, (pfrag->size - pfrag->offset)); > + copy = min_t(size_t, copy, (max_open_record_len - record->len)); > + > + if (copy_from_iter_nocache(page_address(pfrag->page) + > + pfrag->offset, > + copy, msg_iter) != copy) { > + rc = -EFAULT; > + goto handle_error; > + } > + tls_append_frag(record, pfrag, copy); > + > + size -= copy; > + if (!size) { > +last_record: > + tls_push_record_flags = flags; > + if (more) { > + tls_ctx->pending_open_record_frags = > + record->num_frags; > + break; > + } > + > + done = true; > + } > + > + if ((done) || record->len >= max_open_record_len || > + (record->num_frags >= MAX_SKB_FRAGS - 1)) { > + rc = tls_push_record(sk, > + tls_ctx, > + ctx, > + record, > + pfrag, > + tls_push_record_flags, > + record_type); > + if (rc < 0) > + break; > + } > + } while (!done); > + > + if (orig_size - size > 0) > + rc = orig_size - size; > + > + return rc; > +} > + > +int tls_device_sendmsg(struct sock *sk, struct msghdr *msg, size_t size) > +{ > + unsigned char record_type = TLS_RECORD_TYPE_DATA; > + int rc = 0; > + > + lock_sock(sk); > + > + if (unlikely(msg->msg_controllen)) { > + rc = tls_proccess_cmsg(sk, msg, &record_type); > + if (rc) > + goto out; > + } > + > + rc = tls_push_data(sk, &msg->msg_iter, size, > + msg->msg_flags, record_type); > + > +out: > + release_sock(sk); > + return rc; > +} > + > +int tls_device_sendpage(struct sock *sk, struct page *page, > + int offset, size_t size, int flags) > +{ > + struct iov_iter msg_iter; > + struct kvec iov; > + char *kaddr = kmap(page); > + int rc = 0; > + > + if (flags & MSG_SENDPAGE_NOTLAST) > + flags |= MSG_MORE; > + > + lock_sock(sk); > + > + if (flags & MSG_OOB) { > + rc = -ENOTSUPP; > + goto out; > + } > + > + iov.iov_base = kaddr + offset; > + iov.iov_len = size; > + iov_iter_kvec(&msg_iter, WRITE | ITER_KVEC, &iov, 1, size); > + rc = tls_push_data(sk, &msg_iter, size, > + flags, TLS_RECORD_TYPE_DATA); > + kunmap(page); > + > +out: > + release_sock(sk); > + return rc; > +} > + > +struct tls_record_info *tls_get_record(struct tls_offload_context *context, > + u32 seq, u64 *p_record_sn) > +{ > + struct tls_record_info *info; > + u64 record_sn = context->hint_record_sn; > + > + info = context->retransmit_hint; > + if (!info || > + before(seq, info->end_seq - info->len)) { > + /* if retransmit_hint is irrelevant start > + * from the begging of the list > + */ > + info = list_first_entry(&context->records_list, > + struct tls_record_info, list); > + record_sn = context->unacked_record_sn; > + } > + > + list_for_each_entry_from(info, &context->records_list, list) { > + if (before(seq, info->end_seq)) { > + if (!context->retransmit_hint || > + after(info->end_seq, > + context->retransmit_hint->end_seq)) { > + context->hint_record_sn = record_sn; > + context->retransmit_hint = info; > + } > + *p_record_sn = record_sn; > + return info; > + } > + record_sn++; > + } > + > + return NULL; > +} > +EXPORT_SYMBOL(tls_get_record); > + > +static int tls_device_push_pending_record(struct sock *sk, int flags) > +{ > + struct iov_iter msg_iter; > + > + iov_iter_kvec(&msg_iter, WRITE | ITER_KVEC, NULL, 0, 0); > + return tls_push_data(sk, &msg_iter, 0, flags, TLS_RECORD_TYPE_DATA); > +} > + > +int tls_set_device_offload(struct sock *sk, struct tls_context *ctx) > +{ > + struct tls_crypto_info *crypto_info; > + struct tls_offload_context *offload_ctx; > + struct tls_record_info *start_marker_record; > + u16 nonece_size, tag_size, iv_size, rec_seq_size; > + char *iv, *rec_seq; > + int rc; > + struct net_device *netdev; > + struct sk_buff *skb; > + > + if (!ctx) { > + rc = -EINVAL; > + goto out; > + } > + > + if (ctx->priv_ctx) { > + rc = -EEXIST; > + goto out; > + } > + > + /* We support starting offload on multiple sockets > + * concurrently, So we only need a read lock here. > + */ > + percpu_down_read(&device_offload_lock); > + netdev = get_netdev_for_sock(sk); > + if (!netdev) { > + pr_err("%s: netdev not found\n", __func__); _ratelimit? > + rc = -EINVAL; > + goto release_lock; > + } > + Marcelo ^ permalink raw reply [flat|nested] 35+ messages in thread
* RE: [PATCH v3 net-next 6/6] tls: Add generic NIC offload infrastructure. 2017-12-18 19:53 ` Marcelo Ricardo Leitner @ 2017-12-19 7:31 ` Ilya Lesokhin 2017-12-19 15:11 ` Marcelo Ricardo Leitner 0 siblings, 1 reply; 35+ messages in thread From: Ilya Lesokhin @ 2017-12-19 7:31 UTC (permalink / raw) To: Marcelo Ricardo Leitner Cc: netdev, davem, davejwatson, tom, hannes, Boris Pismenny, Aviad Yehezkel, Liran Liss On Mon, Monday, December 18, 2017 9:54 PM, Marcelo Ricardo Leitner wrote: > On Mon, Dec 18, 2017 at 01:10:33PM +0200, Ilya Lesokhin wrote: > > This patch adds a generic infrastructure to offload TLS crypto to a > > network devices. It enables the kernel TLS socket to skip encryption > > and authentication operations on the transmit side of the data path. > > Leaving those computationally expensive operations to the NIC. > > I have a hard time understanding why this was named 'tls_device' if no > net_device's are registered. > I'm not quite sure what you mean by "no net_device's are registered" Presumably you mean there is no device that implements the NETIF_F_HW_TLS_TX capability yet. I'll just say that the IPSEC device offload infrastructure was also submitted https://github.com/torvalds/linux/commit/d77e38e612a017480157fe6d2c1422f42cb5b7e3 before the first implementation https://github.com/torvalds/linux/commit/bebb23e6cb02d2fc752905e39d09ff6152852c6c And we did provide a link to an implementation https://github.com/Mellanox/tls-offload/tree/tls_device_v3 for people who want to take a look. Unfortunately it is not ready for upstream submission yet > > + percpu_down_read(&device_offload_lock); > > + netdev = get_netdev_for_sock(sk); > > + if (!netdev) { > > + pr_err("%s: netdev not found\n", __func__); > > _ratelimit? > Thanks, we will fix it in the future. ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH v3 net-next 6/6] tls: Add generic NIC offload infrastructure. 2017-12-19 7:31 ` Ilya Lesokhin @ 2017-12-19 15:11 ` Marcelo Ricardo Leitner 2017-12-19 15:38 ` Ilya Lesokhin 0 siblings, 1 reply; 35+ messages in thread From: Marcelo Ricardo Leitner @ 2017-12-19 15:11 UTC (permalink / raw) To: Ilya Lesokhin Cc: netdev, davem, davejwatson, tom, hannes, Boris Pismenny, Aviad Yehezkel, Liran Liss On Tue, Dec 19, 2017 at 07:31:24AM +0000, Ilya Lesokhin wrote: > On Mon, Monday, December 18, 2017 9:54 PM, Marcelo Ricardo Leitner wrote: > > > On Mon, Dec 18, 2017 at 01:10:33PM +0200, Ilya Lesokhin wrote: > > > This patch adds a generic infrastructure to offload TLS crypto to a > > > network devices. It enables the kernel TLS socket to skip encryption > > > and authentication operations on the transmit side of the data path. > > > Leaving those computationally expensive operations to the NIC. > > > > I have a hard time understanding why this was named 'tls_device' if no > > net_device's are registered. > > > I'm not quite sure what you mean by "no net_device's are registered" > Presumably you mean there is no device that implements the > NETIF_F_HW_TLS_TX capability yet. Not really. Let me try again. This patchset is using the expression "tls_device". When I read that, I expect a new interface type, like a tunnel, that would be created on top of another interface that has the offloading capability. That's why I'm confused. IMHO "tls_offload" is a better fit. Makes sense? > I'll just say that the IPSEC device offload infrastructure was also submitted > https://github.com/torvalds/linux/commit/d77e38e612a017480157fe6d2c1422f42cb5b7e3 > before the first implementation > https://github.com/torvalds/linux/commit/bebb23e6cb02d2fc752905e39d09ff6152852c6c > > And we did provide a link to an implementation > https://github.com/Mellanox/tls-offload/tree/tls_device_v3 > for people who want to take a look. > Unfortunately it is not ready for upstream submission yet Yep, although I still have to get there. Thanks, Marcelo ^ permalink raw reply [flat|nested] 35+ messages in thread
* RE: [PATCH v3 net-next 6/6] tls: Add generic NIC offload infrastructure. 2017-12-19 15:11 ` Marcelo Ricardo Leitner @ 2017-12-19 15:38 ` Ilya Lesokhin 2017-12-19 16:18 ` Marcelo Ricardo Leitner 0 siblings, 1 reply; 35+ messages in thread From: Ilya Lesokhin @ 2017-12-19 15:38 UTC (permalink / raw) To: Marcelo Ricardo Leitner Cc: netdev, davem, davejwatson, tom, hannes, Boris Pismenny, Aviad Yehezkel, Liran Liss Tuesday, December 19, 2017 5:12 PM, Marcelo Ricardo Leitner wrote: > > I'm not quite sure what you mean by "no net_device's are registered" > > Presumably you mean there is no device that implements the > > NETIF_F_HW_TLS_TX capability yet. > > Not really. Let me try again. This patchset is using the expression "tls_device". > When I read that, I expect a new interface type, like a tunnel, that would be > created on top of another interface that has the offloading capability. That's > why I'm confused. IMHO "tls_offload" is a better fit. Makes sense? > We don't expose a new interface. An existing netdev does the offload. The xfrm layer also calls the offload layer xfrm_device and It also doesn't need to add another interface to offload ipsec to a netdev. I thought about calling it tls_hw or tls_hw_offload. The problem is that the important distinction here is that the offload is done by a netdev. tls_sw can also use hw offload if you have the required memory to memory crypto engine and crypto_alloc_aead("gcm(aes)", 0, 0); decides on using it. ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH v3 net-next 6/6] tls: Add generic NIC offload infrastructure. 2017-12-19 15:38 ` Ilya Lesokhin @ 2017-12-19 16:18 ` Marcelo Ricardo Leitner 0 siblings, 0 replies; 35+ messages in thread From: Marcelo Ricardo Leitner @ 2017-12-19 16:18 UTC (permalink / raw) To: Ilya Lesokhin Cc: netdev, davem, davejwatson, tom, hannes, Boris Pismenny, Aviad Yehezkel, Liran Liss On Tue, Dec 19, 2017 at 03:38:16PM +0000, Ilya Lesokhin wrote: > Tuesday, December 19, 2017 5:12 PM, Marcelo Ricardo Leitner wrote: > > > > I'm not quite sure what you mean by "no net_device's are registered" > > > Presumably you mean there is no device that implements the > > > NETIF_F_HW_TLS_TX capability yet. > > > > Not really. Let me try again. This patchset is using the expression "tls_device". > > When I read that, I expect a new interface type, like a tunnel, that would be > > created on top of another interface that has the offloading capability. That's > > why I'm confused. IMHO "tls_offload" is a better fit. Makes sense? > > > > We don't expose a new interface. An existing netdev does the offload. > > The xfrm layer also calls the offload layer xfrm_device and It also doesn't need to > add another interface to offload ipsec to a netdev. Hm right, there is xfrm_dev_init() and others, but there is also XFRM_OFFLOAD as the config define and not XFRM_DEVICE. > > I thought about calling it tls_hw or tls_hw_offload. > The problem is that the important distinction here is that the > offload is done by a netdev. > tls_sw can also use hw offload if you have the required > memory to memory crypto engine and crypto_alloc_aead("gcm(aes)", 0, 0); > decides on using it. Now I can see the confusion in both ways, thanks. And now I don't have a preference either. Marcelo ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH v3 net-next 6/6] tls: Add generic NIC offload infrastructure. 2017-12-18 11:10 ` [PATCH v3 net-next 6/6] tls: Add generic NIC offload infrastructure Ilya Lesokhin 2017-12-18 19:53 ` Marcelo Ricardo Leitner @ 2017-12-19 7:00 ` kbuild test robot 2017-12-19 7:01 ` kbuild test robot ` (2 subsequent siblings) 4 siblings, 0 replies; 35+ messages in thread From: kbuild test robot @ 2017-12-19 7:00 UTC (permalink / raw) To: Ilya Lesokhin Cc: kbuild-all, netdev, davem, davejwatson, tom, hannes, borisp, aviadye, liranl, Ilya Lesokhin [-- Attachment #1: Type: text/plain, Size: 5992 bytes --] Hi Ilya, I love your patch! Perhaps something to improve: [auto build test WARNING on net-next/master] url: https://github.com/0day-ci/linux/commits/Ilya-Lesokhin/tls-Add-generic-NIC-offload-infrastructure/20171219-140819 config: xtensa-allmodconfig (attached as .config) compiler: xtensa-linux-gcc (GCC) 7.2.0 reproduce: wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross chmod +x ~/bin/make.cross # save the attached .config to linux build tree make.cross ARCH=xtensa All warnings (new ones prefixed by >>): net//tls/tls_device_fallback.c: In function 'tls_sw_fallback': >> net//tls/tls_device_fallback.c:360:1: warning: the frame size of 1040 bytes is larger than 1024 bytes [-Wframe-larger-than=] } ^ vim +360 net//tls/tls_device_fallback.c 214 215 /* This function may be called after the user socket is already 216 * closed so make sure we don't use anything freed during 217 * tls_sk_proto_close here 218 */ 219 struct sk_buff *tls_sw_fallback(struct sock *sk, struct sk_buff *skb) 220 { 221 int tcp_header_size = tcp_hdrlen(skb); 222 int tcp_payload_offset = skb_transport_offset(skb) + tcp_header_size; 223 int payload_len = skb->len - tcp_payload_offset; 224 struct tls_context *tls_ctx = tls_get_ctx(sk); 225 struct tls_offload_context *ctx = tls_offload_ctx(tls_ctx); 226 int remaining, buf_len, resync_sgs, rc, i = 0; 227 void *buf, *dummy_buf, *iv, *aad; 228 struct scatterlist sg_in[2 * (MAX_SKB_FRAGS + 1)]; 229 struct scatterlist sg_out[3]; 230 u32 tcp_seq = ntohl(tcp_hdr(skb)->seq); 231 struct aead_request *aead_req; 232 struct sk_buff *nskb = NULL; 233 struct tls_record_info *record; 234 unsigned long flags; 235 s32 sync_size; 236 u64 rcd_sn; 237 238 if (!payload_len) 239 return skb; 240 241 sg_init_table(sg_in, ARRAY_SIZE(sg_in)); 242 sg_init_table(sg_out, ARRAY_SIZE(sg_out)); 243 244 spin_lock_irqsave(&ctx->lock, flags); 245 record = tls_get_record(ctx, tcp_seq, &rcd_sn); 246 if (!record) { 247 spin_unlock_irqrestore(&ctx->lock, flags); 248 WARN(1, "Record not found for seq %u\n", tcp_seq); 249 goto free_orig; 250 } 251 252 sync_size = tcp_seq - tls_record_start_seq(record); 253 if (sync_size < 0) { 254 int is_start_marker = tls_record_is_start_marker(record); 255 256 spin_unlock_irqrestore(&ctx->lock, flags); 257 if (!is_start_marker) 258 /* This should only occur if the relevant record was 259 * already acked. In that case it should be ok 260 * to drop the packet and avoid retransmission. 261 * 262 * There is a corner case where the packet contains 263 * both an acked and a non-acked record. 264 * We currently don't handle that case and rely 265 * on TCP to retranmit a packet that doesn't contain 266 * already acked payload. 267 */ 268 goto free_orig; 269 270 if (payload_len > -sync_size) { 271 WARN(1, "Fallback of partially offloaded packets is not supported\n"); 272 goto free_orig; 273 } else { 274 return skb; 275 } 276 } 277 278 remaining = sync_size; 279 while (remaining > 0) { 280 skb_frag_t *frag = &record->frags[i]; 281 282 __skb_frag_ref(frag); 283 sg_set_page(sg_in + i, skb_frag_page(frag), 284 skb_frag_size(frag), frag->page_offset); 285 286 remaining -= skb_frag_size(frag); 287 288 if (remaining < 0) 289 sg_in[i].length += remaining; 290 291 i++; 292 } 293 spin_unlock_irqrestore(&ctx->lock, flags); 294 resync_sgs = i; 295 296 aead_req = tls_alloc_aead_request(ctx->aead_send, GFP_ATOMIC); 297 if (!aead_req) 298 goto put_sg; 299 300 buf_len = TLS_CIPHER_AES_GCM_128_SALT_SIZE + 301 TLS_CIPHER_AES_GCM_128_IV_SIZE + 302 TLS_AAD_SPACE_SIZE + 303 sync_size + 304 tls_ctx->tag_size; 305 buf = kmalloc(buf_len, GFP_ATOMIC); 306 if (!buf) 307 goto free_req; 308 309 nskb = alloc_skb(skb_headroom(skb) + skb->len, GFP_ATOMIC); 310 if (!nskb) 311 goto free_req; 312 313 skb_reserve(nskb, skb_headroom(skb)); 314 315 iv = buf; 316 317 memcpy(iv, tls_ctx->crypto_send_aes_gcm_128.salt, 318 TLS_CIPHER_AES_GCM_128_SALT_SIZE); 319 aad = buf + TLS_CIPHER_AES_GCM_128_SALT_SIZE + 320 TLS_CIPHER_AES_GCM_128_IV_SIZE; 321 dummy_buf = aad + TLS_AAD_SPACE_SIZE; 322 323 sg_set_buf(&sg_out[0], dummy_buf, sync_size); 324 sg_set_buf(&sg_out[1], nskb->data + tcp_payload_offset, 325 payload_len); 326 /* Add room for authentication tag produced by crypto */ 327 dummy_buf += sync_size; 328 sg_set_buf(&sg_out[2], dummy_buf, tls_ctx->tag_size); 329 rc = skb_to_sgvec(skb, &sg_in[i], tcp_payload_offset, 330 payload_len); 331 if (rc < 0) 332 goto free_nskb; 333 334 rc = tls_enc_records(aead_req, ctx->aead_send, sg_in, sg_out, aad, iv, 335 rcd_sn, sync_size + payload_len); 336 if (rc < 0) 337 goto free_nskb; 338 339 complete_skb(nskb, skb, tcp_payload_offset); 340 341 /* validate_xmit_skb_list assumes that if the skb wasn't segmented 342 * nskb->prev will point to the skb itself 343 */ 344 nskb->prev = nskb; 345 free_buf: 346 kfree(buf); 347 free_req: 348 kfree(aead_req); 349 put_sg: 350 for (i = 0; i < resync_sgs; i++) 351 put_page(sg_page(&sg_in[i])); 352 free_orig: 353 kfree_skb(skb); 354 return nskb; 355 356 free_nskb: 357 kfree_skb(nskb); 358 nskb = NULL; 359 goto free_buf; > 360 } 361 --- 0-DAY kernel test infrastructure Open Source Technology Center https://lists.01.org/pipermail/kbuild-all Intel Corporation [-- Attachment #2: .config.gz --] [-- Type: application/gzip, Size: 52638 bytes --] ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH v3 net-next 6/6] tls: Add generic NIC offload infrastructure. 2017-12-18 11:10 ` [PATCH v3 net-next 6/6] tls: Add generic NIC offload infrastructure Ilya Lesokhin 2017-12-18 19:53 ` Marcelo Ricardo Leitner 2017-12-19 7:00 ` kbuild test robot @ 2017-12-19 7:01 ` kbuild test robot 2017-12-19 8:17 ` [RFC PATCH] tls: tls_sw_fallback() can be static kbuild test robot 2017-12-19 8:17 ` [PATCH v3 net-next 6/6] tls: Add generic NIC offload infrastructure kbuild test robot 4 siblings, 0 replies; 35+ messages in thread From: kbuild test robot @ 2017-12-19 7:01 UTC (permalink / raw) To: Ilya Lesokhin Cc: kbuild-all, netdev, davem, davejwatson, tom, hannes, borisp, aviadye, liranl, Ilya Lesokhin [-- Attachment #1: Type: text/plain, Size: 2352 bytes --] Hi Ilya, I love your patch! Yet something to improve: [auto build test ERROR on net-next/master] url: https://github.com/0day-ci/linux/commits/Ilya-Lesokhin/tls-Add-generic-NIC-offload-infrastructure/20171219-140819 config: tile-allmodconfig (attached as .config) compiler: tilegx-linux-gcc (GCC) 7.2.0 reproduce: wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross chmod +x ~/bin/make.cross # save the attached .config to linux build tree make.cross ARCH=tile All errors (new ones prefixed by >>): net/tls/tls_device_fallback.c: In function 'update_chksum': >> net/tls/tls_device_fallback.c:190:16: error: implicit declaration of function 'csum_ipv6_magic'; did you mean 'csum_tcpudp_magic'? [-Werror=implicit-function-declaration] th->check = ~csum_ipv6_magic(&ipv6h->saddr, &ipv6h->daddr, ^~~~~~~~~~~~~~~ csum_tcpudp_magic cc1: some warnings being treated as errors vim +190 net/tls/tls_device_fallback.c 166 167 static inline void update_chksum(struct sk_buff *skb, int headln) 168 { 169 /* Can't use icsk->icsk_af_ops->send_check here because the ip addresses 170 * might have been changed by NAT. 171 */ 172 173 const struct ipv6hdr *ipv6h; 174 const struct iphdr *iph; 175 struct tcphdr *th = tcp_hdr(skb); 176 int datalen = skb->len - headln; 177 178 /* We only changed the payload so if we are using partial we don't 179 * need to update anything. 180 */ 181 if (likely(skb->ip_summed == CHECKSUM_PARTIAL)) 182 return; 183 184 skb->ip_summed = CHECKSUM_PARTIAL; 185 skb->csum_start = skb_transport_header(skb) - skb->head; 186 skb->csum_offset = offsetof(struct tcphdr, check); 187 188 if (skb->sk->sk_family == AF_INET6) { 189 ipv6h = ipv6_hdr(skb); > 190 th->check = ~csum_ipv6_magic(&ipv6h->saddr, &ipv6h->daddr, 191 datalen, IPPROTO_TCP, 0); 192 } else { 193 iph = ip_hdr(skb); 194 th->check = ~csum_tcpudp_magic(iph->saddr, iph->daddr, datalen, 195 IPPROTO_TCP, 0); 196 } 197 } 198 --- 0-DAY kernel test infrastructure Open Source Technology Center https://lists.01.org/pipermail/kbuild-all Intel Corporation [-- Attachment #2: .config.gz --] [-- Type: application/gzip, Size: 51390 bytes --] ^ permalink raw reply [flat|nested] 35+ messages in thread
* [RFC PATCH] tls: tls_sw_fallback() can be static 2017-12-18 11:10 ` [PATCH v3 net-next 6/6] tls: Add generic NIC offload infrastructure Ilya Lesokhin ` (2 preceding siblings ...) 2017-12-19 7:01 ` kbuild test robot @ 2017-12-19 8:17 ` kbuild test robot 2017-12-19 8:17 ` [PATCH v3 net-next 6/6] tls: Add generic NIC offload infrastructure kbuild test robot 4 siblings, 0 replies; 35+ messages in thread From: kbuild test robot @ 2017-12-19 8:17 UTC (permalink / raw) To: Ilya Lesokhin Cc: kbuild-all, netdev, davem, davejwatson, tom, hannes, borisp, aviadye, liranl, Ilya Lesokhin Fixes: 716b5be4af2a ("tls: Add generic NIC offload infrastructure.") Signed-off-by: Fengguang Wu <fengguang.wu@intel.com> --- tls_device_fallback.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/tls/tls_device_fallback.c b/net/tls/tls_device_fallback.c index 95d04d6..d2403d6e 100644 --- a/net/tls/tls_device_fallback.c +++ b/net/tls/tls_device_fallback.c @@ -216,7 +216,7 @@ static void complete_skb(struct sk_buff *nskb, struct sk_buff *skb, int headln) * closed so make sure we don't use anything freed during * tls_sk_proto_close here */ -struct sk_buff *tls_sw_fallback(struct sock *sk, struct sk_buff *skb) +static struct sk_buff *tls_sw_fallback(struct sock *sk, struct sk_buff *skb) { int tcp_header_size = tcp_hdrlen(skb); int tcp_payload_offset = skb_transport_offset(skb) + tcp_header_size; ^ permalink raw reply related [flat|nested] 35+ messages in thread
* Re: [PATCH v3 net-next 6/6] tls: Add generic NIC offload infrastructure. 2017-12-18 11:10 ` [PATCH v3 net-next 6/6] tls: Add generic NIC offload infrastructure Ilya Lesokhin ` (3 preceding siblings ...) 2017-12-19 8:17 ` [RFC PATCH] tls: tls_sw_fallback() can be static kbuild test robot @ 2017-12-19 8:17 ` kbuild test robot 4 siblings, 0 replies; 35+ messages in thread From: kbuild test robot @ 2017-12-19 8:17 UTC (permalink / raw) To: Ilya Lesokhin Cc: kbuild-all, netdev, davem, davejwatson, tom, hannes, borisp, aviadye, liranl, Ilya Lesokhin Hi Ilya, I love your patch! Perhaps something to improve: [auto build test WARNING on net-next/master] url: https://github.com/0day-ci/linux/commits/Ilya-Lesokhin/tls-Add-generic-NIC-offload-infrastructure/20171219-140819 reproduce: # apt-get install sparse make ARCH=x86_64 allmodconfig make C=1 CF=-D__CHECK_ENDIAN__ sparse warnings: (new ones prefixed by >>) Please review and possibly fold the followup patch. vim +649 net/tls/tls_device.c 547 548 int tls_set_device_offload(struct sock *sk, struct tls_context *ctx) 549 { 550 struct tls_crypto_info *crypto_info; 551 struct tls_offload_context *offload_ctx; 552 struct tls_record_info *start_marker_record; 553 u16 nonece_size, tag_size, iv_size, rec_seq_size; 554 char *iv, *rec_seq; 555 int rc; 556 struct net_device *netdev; 557 struct sk_buff *skb; 558 559 if (!ctx) { 560 rc = -EINVAL; 561 goto out; 562 } 563 564 if (ctx->priv_ctx) { 565 rc = -EEXIST; 566 goto out; 567 } 568 569 /* We support starting offload on multiple sockets 570 * concurrently, So we only need a read lock here. 571 */ 572 percpu_down_read(&device_offload_lock); 573 netdev = get_netdev_for_sock(sk); 574 if (!netdev) { 575 pr_err("%s: netdev not found\n", __func__); 576 rc = -EINVAL; 577 goto release_lock; 578 } 579 580 if (!(netdev->features & NETIF_F_HW_TLS_TX)) { 581 rc = -ENOTSUPP; 582 goto release_netdev; 583 } 584 585 /* Avoid offloading if the device is down 586 * We don't want to offload new flows after 587 * the NETDEV_DOWN event 588 */ 589 if (!(netdev->flags & IFF_UP)) { 590 rc = -EINVAL; 591 goto release_lock; 592 } 593 594 crypto_info = &ctx->crypto_send; 595 switch (crypto_info->cipher_type) { 596 case TLS_CIPHER_AES_GCM_128: { 597 nonece_size = TLS_CIPHER_AES_GCM_128_IV_SIZE; 598 tag_size = TLS_CIPHER_AES_GCM_128_TAG_SIZE; 599 iv_size = TLS_CIPHER_AES_GCM_128_IV_SIZE; 600 iv = ((struct tls12_crypto_info_aes_gcm_128 *)crypto_info)->iv; 601 rec_seq_size = TLS_CIPHER_AES_GCM_128_REC_SEQ_SIZE; 602 rec_seq = 603 ((struct tls12_crypto_info_aes_gcm_128 *)crypto_info)->rec_seq; 604 break; 605 } 606 default: 607 rc = -EINVAL; 608 goto release_netdev; 609 } 610 611 start_marker_record = kmalloc(sizeof(*start_marker_record), GFP_KERNEL); 612 if (!start_marker_record) { 613 rc = -ENOMEM; 614 goto release_netdev; 615 } 616 617 rc = attach_sock_to_netdev(sk, netdev, ctx); 618 if (rc) 619 goto free_marker_record; 620 621 ctx->netdev = netdev; 622 623 ctx->prepend_size = TLS_HEADER_SIZE + nonece_size; 624 ctx->tag_size = tag_size; 625 ctx->iv_size = iv_size; 626 ctx->iv = kmalloc(iv_size + TLS_CIPHER_AES_GCM_128_SALT_SIZE, 627 GFP_KERNEL); 628 if (!ctx->iv) { 629 rc = -ENOMEM; 630 goto detach_sock; 631 } 632 633 memcpy(ctx->iv + TLS_CIPHER_AES_GCM_128_SALT_SIZE, rec_seq, iv_size); 634 635 ctx->rec_seq_size = rec_seq_size; 636 ctx->rec_seq = kmalloc(rec_seq_size, GFP_KERNEL); 637 if (!ctx->rec_seq) { 638 rc = -ENOMEM; 639 goto err_iv; 640 } 641 memcpy(ctx->rec_seq, rec_seq, rec_seq_size); 642 643 offload_ctx = ctx->priv_ctx; 644 memcpy(&offload_ctx->unacked_record_sn, rec_seq, 645 sizeof(offload_ctx->unacked_record_sn)); 646 647 /* start at rec_seq -1 to account for the start marker record */ 648 offload_ctx->unacked_record_sn = > 649 be64_to_cpu(offload_ctx->unacked_record_sn) - 1; 650 651 rc = tls_sw_fallback_init(sk, offload_ctx, crypto_info); 652 if (rc) 653 goto err_iv; 654 655 start_marker_record->end_seq = tcp_sk(sk)->write_seq; 656 start_marker_record->len = 0; 657 start_marker_record->num_frags = 0; 658 659 INIT_LIST_HEAD(&offload_ctx->records_list); 660 list_add_tail(&start_marker_record->list, &offload_ctx->records_list); 661 spin_lock_init(&offload_ctx->lock); 662 663 inet_csk(sk)->icsk_clean_acked = &tls_icsk_clean_acked; 664 ctx->push_pending_record = tls_device_push_pending_record; 665 offload_ctx->sk_destruct = sk->sk_destruct; 666 667 /* TLS offload is greatly simplified if we don't send 668 * SKBs where only part of the payload needs to be encrypted. 669 * So mark the last skb in the write queue as end of record. 670 */ 671 skb = tcp_write_queue_tail(sk); 672 if (skb) 673 TCP_SKB_CB(skb)->eor = 1; 674 675 refcount_set(&ctx->refcount, 1); 676 spin_lock_irq(&tls_device_lock); 677 list_add_tail(&ctx->list, &tls_device_list); 678 spin_unlock_irq(&tls_device_lock); 679 680 /* following this assignment tls_is_sk_tx_device_offloaded 681 * will return true and the context might be accessed 682 * by the netdev's xmit function. 683 */ 684 smp_store_release(&sk->sk_destruct, 685 &tls_device_sk_destruct); 686 goto release_lock; 687 688 err_iv: 689 kfree(ctx->iv); 690 detach_sock: 691 netdev->tlsdev_ops->tls_dev_del(netdev, ctx, TLS_OFFLOAD_CTX_DIR_TX); 692 free_marker_record: 693 kfree(start_marker_record); 694 release_netdev: 695 dev_put(netdev); 696 release_lock: 697 percpu_up_read(&device_offload_lock); 698 out: 699 return rc; 700 } 701 --- 0-DAY kernel test infrastructure Open Source Technology Center https://lists.01.org/pipermail/kbuild-all Intel Corporation ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH v3 net-next 0/6] tls: Add generic NIC offload infrastructure 2017-12-18 11:10 [PATCH v3 net-next 0/6] tls: Add generic NIC offload infrastructure Ilya Lesokhin ` (5 preceding siblings ...) 2017-12-18 11:10 ` [PATCH v3 net-next 6/6] tls: Add generic NIC offload infrastructure Ilya Lesokhin @ 2017-12-18 17:10 ` Jiri Pirko 2017-12-19 10:30 ` Jiri Pirko 6 siblings, 1 reply; 35+ messages in thread From: Jiri Pirko @ 2017-12-18 17:10 UTC (permalink / raw) To: Ilya Lesokhin Cc: netdev, davem, davejwatson, tom, hannes, borisp, aviadye, liranl Mon, Dec 18, 2017 at 12:10:27PM CET, ilyal@mellanox.com wrote: >Changes from v2: >- Fix sk use after free and possible netdev use after free >- tls device now keeps a refernce on the offloading netdev >- tls device registers to the netdev notifer. > Upon a NETDEV_DOWN event, offload is stopped and > the reference on the netdev is dropped. >- SW fallback support for skb->ip_summed != CHECKSUM_PARTIAL >- Merged TLS patches are no longer part of this series. > >Changes from v1: >- Remove the binding of the socket to a specific netdev > through sk->sk_bound_dev_if. > Add a check in validate_xmit_skb to detect route changes > and call SW fallback code to do the crypto in software. >- tls_get_record now returns the tls record sequence number. > This is required to support connections with rcd_sn != iv. >- Bug fixes to the TLS code. > >This patchset adds a generic infrastructure to offload TLS crypto to a >network devices. > >patches 1-2 Export functions that we need >patch 3 adds infrastructue for offloaded socket fallback >patches 4-5 add new NDOs and capabilities. >patch 6 adds the TLS NIC offload infrastructure. > >Github with mlx5e TLS offload support: >https://github.com/Mellanox/tls-offload/tree/tls_device_v3 I don't get it. You are pushing infra but not the actual driver part who is consuming the infra? Why? ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH v3 net-next 0/6] tls: Add generic NIC offload infrastructure 2017-12-18 17:10 ` [PATCH v3 net-next 0/6] " Jiri Pirko @ 2017-12-19 10:30 ` Jiri Pirko 2017-12-20 8:28 ` Boris Pismenny 0 siblings, 1 reply; 35+ messages in thread From: Jiri Pirko @ 2017-12-19 10:30 UTC (permalink / raw) To: Ilya Lesokhin Cc: netdev, davem, davejwatson, tom, hannes, borisp, aviadye, liranl Mon, Dec 18, 2017 at 06:10:10PM CET, jiri@resnulli.us wrote: >Mon, Dec 18, 2017 at 12:10:27PM CET, ilyal@mellanox.com wrote: >>Changes from v2: >>- Fix sk use after free and possible netdev use after free >>- tls device now keeps a refernce on the offloading netdev >>- tls device registers to the netdev notifer. >> Upon a NETDEV_DOWN event, offload is stopped and >> the reference on the netdev is dropped. >>- SW fallback support for skb->ip_summed != CHECKSUM_PARTIAL >>- Merged TLS patches are no longer part of this series. >> >>Changes from v1: >>- Remove the binding of the socket to a specific netdev >> through sk->sk_bound_dev_if. >> Add a check in validate_xmit_skb to detect route changes >> and call SW fallback code to do the crypto in software. >>- tls_get_record now returns the tls record sequence number. >> This is required to support connections with rcd_sn != iv. >>- Bug fixes to the TLS code. >> >>This patchset adds a generic infrastructure to offload TLS crypto to a >>network devices. >> >>patches 1-2 Export functions that we need >>patch 3 adds infrastructue for offloaded socket fallback >>patches 4-5 add new NDOs and capabilities. >>patch 6 adds the TLS NIC offload infrastructure. >> >>Github with mlx5e TLS offload support: >>https://github.com/Mellanox/tls-offload/tree/tls_device_v3 > >I don't get it. You are pushing infra but not the actual driver part >who is consuming the infra? Why? Okay. Since the driver that uses the API introduced by this patchset is missing, this patchset should be marked as RFC. Dave, I see that you were about to apply v2. I'm sure you missed this. Thanks. ^ permalink raw reply [flat|nested] 35+ messages in thread
* RE: [PATCH v3 net-next 0/6] tls: Add generic NIC offload infrastructure 2017-12-19 10:30 ` Jiri Pirko @ 2017-12-20 8:28 ` Boris Pismenny 2017-12-20 10:08 ` Jiri Pirko ` (2 more replies) 0 siblings, 3 replies; 35+ messages in thread From: Boris Pismenny @ 2017-12-20 8:28 UTC (permalink / raw) To: Jiri Pirko, Ilya Lesokhin Cc: netdev, davem, davejwatson, tom, hannes, Aviad Yehezkel, Liran Liss > Tue, Dec 19, 2017 at 01:10:10AM CET, jiri@resnulli.us wrote: > > Mon, Dec 18, 2017 at 06:10:10PM CET, jiri@resnulli.us wrote: > >Mon, Dec 18, 2017 at 12:10:27PM CET, ilyal@mellanox.com wrote: > >>Changes from v2: > >>- Fix sk use after free and possible netdev use after free > >>- tls device now keeps a refernce on the offloading netdev > >>- tls device registers to the netdev notifer. > >> Upon a NETDEV_DOWN event, offload is stopped and > >> the reference on the netdev is dropped. > >>- SW fallback support for skb->ip_summed != CHECKSUM_PARTIAL > >>- Merged TLS patches are no longer part of this series. > >> > >>Changes from v1: > >>- Remove the binding of the socket to a specific netdev > >> through sk->sk_bound_dev_if. > >> Add a check in validate_xmit_skb to detect route changes > >> and call SW fallback code to do the crypto in software. > >>- tls_get_record now returns the tls record sequence number. > >> This is required to support connections with rcd_sn != iv. > >>- Bug fixes to the TLS code. > >> > >>This patchset adds a generic infrastructure to offload TLS crypto to a > >>network devices. > >> > >>patches 1-2 Export functions that we need patch 3 adds infrastructue > >>for offloaded socket fallback patches 4-5 add new NDOs and > >>capabilities. > >>patch 6 adds the TLS NIC offload infrastructure. > >> > >>Github with mlx5e TLS offload support: > >>https://emea01.safelinks.protection.outlook.com/?url=https%3A%2F%2F > git > >>hub.com%2FMellanox%2Ftls- > offload%2Ftree%2Ftls_device_v3&data=02%7C01%7 > >>Cborisp%40mellanox.com%7C5aebe81262554f40221908d546cb7c37%7Ca6 > 52971c7d > >>2e4d9ba6a4d149256f461b%7C0%7C0%7C636492762141202894&sdata=gYY > DEmspNfBs > >>aQhefcEojl456L9eWqZnEEI7iPCT0NA%3D&reserved=0 > > > >I don't get it. You are pushing infra but not the actual driver part > >who is consuming the infra? Why? > > Okay. Since the driver that uses the API introduced by this patchset is > missing, this patchset should be marked as RFC. > > Dave, I see that you were about to apply v2. I'm sure you missed this. > Thanks. Isn't this a chicken and egg problem, where something must come first, driver or infra. Unless we combine the infra patches with mlx5 driver code and submit both in a single pull request. Here, we assumed that the infra goes first, and we will submit the driver soon after. We could submit the driver first instead. Dave, would you prefer to get the driver patches that use this infra before the infra? ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH v3 net-next 0/6] tls: Add generic NIC offload infrastructure 2017-12-20 8:28 ` Boris Pismenny @ 2017-12-20 10:08 ` Jiri Pirko 2017-12-20 10:15 ` Or Gerlitz 2017-12-20 16:12 ` David Miller 2 siblings, 0 replies; 35+ messages in thread From: Jiri Pirko @ 2017-12-20 10:08 UTC (permalink / raw) To: Boris Pismenny Cc: Ilya Lesokhin, netdev, davem, davejwatson, tom, hannes, Aviad Yehezkel, Liran Liss Wed, Dec 20, 2017 at 09:28:03AM CET, borisp@mellanox.com wrote: > >> Tue, Dec 19, 2017 at 01:10:10AM CET, jiri@resnulli.us wrote: >> >> Mon, Dec 18, 2017 at 06:10:10PM CET, jiri@resnulli.us wrote: >> >Mon, Dec 18, 2017 at 12:10:27PM CET, ilyal@mellanox.com wrote: >> >>Changes from v2: >> >>- Fix sk use after free and possible netdev use after free >> >>- tls device now keeps a refernce on the offloading netdev >> >>- tls device registers to the netdev notifer. >> >> Upon a NETDEV_DOWN event, offload is stopped and >> >> the reference on the netdev is dropped. >> >>- SW fallback support for skb->ip_summed != CHECKSUM_PARTIAL >> >>- Merged TLS patches are no longer part of this series. >> >> >> >>Changes from v1: >> >>- Remove the binding of the socket to a specific netdev >> >> through sk->sk_bound_dev_if. >> >> Add a check in validate_xmit_skb to detect route changes >> >> and call SW fallback code to do the crypto in software. >> >>- tls_get_record now returns the tls record sequence number. >> >> This is required to support connections with rcd_sn != iv. >> >>- Bug fixes to the TLS code. >> >> >> >>This patchset adds a generic infrastructure to offload TLS crypto to a >> >>network devices. >> >> >> >>patches 1-2 Export functions that we need patch 3 adds infrastructue >> >>for offloaded socket fallback patches 4-5 add new NDOs and >> >>capabilities. >> >>patch 6 adds the TLS NIC offload infrastructure. >> >> >> >>Github with mlx5e TLS offload support: >> >>https://emea01.safelinks.protection.outlook.com/?url=https%3A%2F%2F >> git >> >>hub.com%2FMellanox%2Ftls- >> offload%2Ftree%2Ftls_device_v3&data=02%7C01%7 >> >>Cborisp%40mellanox.com%7C5aebe81262554f40221908d546cb7c37%7Ca6 >> 52971c7d >> >>2e4d9ba6a4d149256f461b%7C0%7C0%7C636492762141202894&sdata=gYY >> DEmspNfBs >> >>aQhefcEojl456L9eWqZnEEI7iPCT0NA%3D&reserved=0 >> > >> >I don't get it. You are pushing infra but not the actual driver part >> >who is consuming the infra? Why? >> >> Okay. Since the driver that uses the API introduced by this patchset is >> missing, this patchset should be marked as RFC. >> >> Dave, I see that you were about to apply v2. I'm sure you missed this. >> Thanks. > >Isn't this a chicken and egg problem, where something must come first, >driver or infra. Unless we combine the infra patches with mlx5 driver >code and submit both in a single pull request. Yes, you should submit that in a single patchset. That is the usual way. Thanks. >Here, we assumed that the infra goes first, and we will submit the >driver soon after. We could submit the driver first instead. No. You cannot do that like this. > >Dave, would you prefer to get the driver patches that use this infra before >the infra? > > ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH v3 net-next 0/6] tls: Add generic NIC offload infrastructure 2017-12-20 8:28 ` Boris Pismenny 2017-12-20 10:08 ` Jiri Pirko @ 2017-12-20 10:15 ` Or Gerlitz 2017-12-20 10:31 ` Or Gerlitz 2017-12-20 16:12 ` David Miller 2 siblings, 1 reply; 35+ messages in thread From: Or Gerlitz @ 2017-12-20 10:15 UTC (permalink / raw) To: Boris Pismenny Cc: Jiri Pirko, Ilya Lesokhin, netdev, davem, davejwatson, tom, hannes, Aviad Yehezkel, Liran Liss On Wed, Dec 20, 2017 at 10:28 AM, Boris Pismenny <borisp@mellanox.com> wrote: > Isn't this a chicken and egg problem, where something must come first, > driver or infra. Unless we combine the infra patches with mlx5 driver > code and submit both in a single pull request. why chicken and egg? you do the infra changes and apply them on the driver. > Here, we assumed that the infra goes first, and we will submit the > driver soon after. We could submit the driver first instead. NOo Boris, It is fundamental requirement to show a use-case along with infra-structure. Or. ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH v3 net-next 0/6] tls: Add generic NIC offload infrastructure 2017-12-20 10:15 ` Or Gerlitz @ 2017-12-20 10:31 ` Or Gerlitz 0 siblings, 0 replies; 35+ messages in thread From: Or Gerlitz @ 2017-12-20 10:31 UTC (permalink / raw) To: Boris Pismenny Cc: Jiri Pirko, Ilya Lesokhin, netdev, davem, davejwatson, tom, hannes, Aviad Yehezkel, Liran Liss > It is fundamental requirement to show a use-case along with infra-structure. To make it clear, sometimes you don't have yet the full HW/FW/FPGA solution when you come to upstream the infra code. In that case you do the driver code and submit it based on HW emulator run, this is common the netdev community and done by few vendors among them us. Or. ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH v3 net-next 0/6] tls: Add generic NIC offload infrastructure 2017-12-20 8:28 ` Boris Pismenny 2017-12-20 10:08 ` Jiri Pirko 2017-12-20 10:15 ` Or Gerlitz @ 2017-12-20 16:12 ` David Miller 2017-12-20 16:23 ` Ilya Lesokhin 2 siblings, 1 reply; 35+ messages in thread From: David Miller @ 2017-12-20 16:12 UTC (permalink / raw) To: borisp; +Cc: jiri, ilyal, netdev, davejwatson, tom, hannes, aviadye, liranl From: Boris Pismenny <borisp@mellanox.com> Date: Wed, 20 Dec 2017 08:28:03 +0000 > Dave, would you prefer to get the driver patches that use this infra > before the infra? The arguments you present are silly. In order to analyze any proposed API, the users of it must be presented for the reviewers to see as well. Logically, you must have tried to make use of the APIs to see how well they work and are usable for at least one such user, right? Therefore, the use case exists, and you must present it alongside the API proposal. Whether you provide the API addition patches and the user in the same patch series, or a separate one, doesn't really matter. What is important is that this is accessible to the reviewer at the same time. ^ permalink raw reply [flat|nested] 35+ messages in thread
* RE: [PATCH v3 net-next 0/6] tls: Add generic NIC offload infrastructure 2017-12-20 16:12 ` David Miller @ 2017-12-20 16:23 ` Ilya Lesokhin 2017-12-20 16:36 ` David Miller 0 siblings, 1 reply; 35+ messages in thread From: Ilya Lesokhin @ 2017-12-20 16:23 UTC (permalink / raw) To: David Miller, Boris Pismenny Cc: jiri, netdev, davejwatson, tom, hannes, Aviad Yehezkel, Liran Liss > > > Dave, would you prefer to get the driver patches that use this infra > > before the infra? > > The arguments you present are silly. > > In order to analyze any proposed API, the users of it must be presented for the > reviewers to see as well. > > Logically, you must have tried to make use of the APIs to see how well they > work and are usable for at least one such user, right? Right, we agree. > > Therefore, the use case exists, and you must present it alongside the API > proposal. > > Whether you provide the API addition patches and the user in the same patch > series, or a separate one, doesn't really matter. What is important is that this > is accessible to the reviewer at the same time. Note that we did provide a user in an accessible place. https://github.com/Mellanox/tls-offload/tree/tls_device_v3 The link was at the bottom of the cover letter. We just feel that the code there is not yet ready for upstream submission, and it might have conflicts with other stuff submitted by Mellanox. Would it be better if we submitted the mlx5e TLS support as an RFC alongside the TLS Infrastructure patches? ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH v3 net-next 0/6] tls: Add generic NIC offload infrastructure 2017-12-20 16:23 ` Ilya Lesokhin @ 2017-12-20 16:36 ` David Miller 2017-12-20 19:12 ` Jiri Pirko 0 siblings, 1 reply; 35+ messages in thread From: David Miller @ 2017-12-20 16:36 UTC (permalink / raw) To: ilyal; +Cc: borisp, jiri, netdev, davejwatson, tom, hannes, aviadye, liranl From: Ilya Lesokhin <ilyal@mellanox.com> Date: Wed, 20 Dec 2017 16:23:03 +0000 >> Whether you provide the API addition patches and the user in the same patch >> series, or a separate one, doesn't really matter. What is important is that this >> is accessible to the reviewer at the same time. > > Note that we did provide a user in an accessible place. That is not accessible for people reading netdev, it needs to be posted on the netdev list. It is never appropriate to require a reviewer to look at some external site to review a patch series posted here. ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH v3 net-next 0/6] tls: Add generic NIC offload infrastructure 2017-12-20 16:36 ` David Miller @ 2017-12-20 19:12 ` Jiri Pirko 0 siblings, 0 replies; 35+ messages in thread From: Jiri Pirko @ 2017-12-20 19:12 UTC (permalink / raw) To: David Miller Cc: ilyal, borisp, netdev, davejwatson, tom, hannes, aviadye, liranl Wed, Dec 20, 2017 at 05:36:14PM CET, davem@davemloft.net wrote: >From: Ilya Lesokhin <ilyal@mellanox.com> >Date: Wed, 20 Dec 2017 16:23:03 +0000 > >>> Whether you provide the API addition patches and the user in the same patch >>> series, or a separate one, doesn't really matter. What is important is that this >>> is accessible to the reviewer at the same time. >> >> Note that we did provide a user in an accessible place. > >That is not accessible for people reading netdev, it needs to be posted >on the netdev list. > >It is never appropriate to require a reviewer to look at some external >site to review a patch series posted here. Yeah, that is why I think it is really best to have the infra and the driver that uses it in a single patchset. Easier to understand what is going on :) ^ permalink raw reply [flat|nested] 35+ messages in thread
end of thread, other threads:[~2017-12-20 19:12 UTC | newest] Thread overview: 35+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2017-12-18 11:10 [PATCH v3 net-next 0/6] tls: Add generic NIC offload infrastructure Ilya Lesokhin 2017-12-18 11:10 ` [PATCH v3 net-next 1/6] tcp: Add clean acked data hook Ilya Lesokhin 2017-12-19 19:13 ` Eric Dumazet 2017-12-19 19:21 ` Ilya Lesokhin 2017-12-19 19:28 ` Eric Dumazet 2017-12-19 19:43 ` Ilya Lesokhin 2017-12-18 11:10 ` [PATCH v3 net-next 2/6] net: Rename and export copy_skb_header Ilya Lesokhin 2017-12-18 11:10 ` [PATCH v3 net-next 3/6] net: Add SW fallback infrastructure for offloaded sockets Ilya Lesokhin 2017-12-18 19:18 ` Marcelo Ricardo Leitner 2017-12-19 7:51 ` Ilya Lesokhin 2017-12-19 15:05 ` Marcelo Ricardo Leitner 2017-12-19 19:12 ` Eric Dumazet 2017-12-19 19:15 ` Ilya Lesokhin 2017-12-18 11:10 ` [PATCH v3 net-next 4/6] net: Add TLS offload netdev ops Ilya Lesokhin 2017-12-18 11:10 ` [PATCH v3 net-next 5/6] net: Add TLS TX offload features Ilya Lesokhin 2017-12-18 11:10 ` [PATCH v3 net-next 6/6] tls: Add generic NIC offload infrastructure Ilya Lesokhin 2017-12-18 19:53 ` Marcelo Ricardo Leitner 2017-12-19 7:31 ` Ilya Lesokhin 2017-12-19 15:11 ` Marcelo Ricardo Leitner 2017-12-19 15:38 ` Ilya Lesokhin 2017-12-19 16:18 ` Marcelo Ricardo Leitner 2017-12-19 7:00 ` kbuild test robot 2017-12-19 7:01 ` kbuild test robot 2017-12-19 8:17 ` [RFC PATCH] tls: tls_sw_fallback() can be static kbuild test robot 2017-12-19 8:17 ` [PATCH v3 net-next 6/6] tls: Add generic NIC offload infrastructure kbuild test robot 2017-12-18 17:10 ` [PATCH v3 net-next 0/6] " Jiri Pirko 2017-12-19 10:30 ` Jiri Pirko 2017-12-20 8:28 ` Boris Pismenny 2017-12-20 10:08 ` Jiri Pirko 2017-12-20 10:15 ` Or Gerlitz 2017-12-20 10:31 ` Or Gerlitz 2017-12-20 16:12 ` David Miller 2017-12-20 16:23 ` Ilya Lesokhin 2017-12-20 16:36 ` David Miller 2017-12-20 19:12 ` Jiri Pirko
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.