From: Boris Pismenny <borisp@mellanox.com> To: kuba@kernel.org, davem@davemloft.net, saeedm@nvidia.com, hch@lst.de, sagi@grimberg.me, axboe@fb.com, kbusch@kernel.org, viro@zeniv.linux.org.uk, edumazet@google.com, dsahern@gmail.com, smalin@marvell.com Cc: boris.pismenny@gmail.com, linux-nvme@lists.infradead.org, netdev@vger.kernel.org, benishay@nvidia.com, ogerlitz@nvidia.com, yorayz@nvidia.com, Ben Ben-Ishay <benishay@mellanox.com>, Or Gerlitz <ogerlitz@mellanox.com>, Yoray Zack <yorayz@mellanox.com> Subject: [PATCH v2 net-next 03/21] net: Introduce crc offload for tcp ddp ulp Date: Thu, 14 Jan 2021 17:10:15 +0200 [thread overview] Message-ID: <20210114151033.13020-4-borisp@mellanox.com> (raw) In-Reply-To: <20210114151033.13020-1-borisp@mellanox.com> This commit introduces support for CRC offload to direct data placement ULP on the receive side. Both DDP and CRC share a common API to initialize the offload for a TCP socket. But otherwise, both can be executed independently. On the receive side, CRC offload requires a new SKB bit that indicates that no CRC error was encountered while processing this packet. If all packets of a ULP message have this bit set, then the CRC verification for the message can be skipped, as hardware already checked it. The following patches will set and use this bit to perform NVME-TCP CRC offload. A subsequent series, will add NVMe-TCP transmit side CRC support. Signed-off-by: Boris Pismenny <borisp@mellanox.com> Signed-off-by: Ben Ben-Ishay <benishay@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Yoray Zack <yorayz@mellanox.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> --- include/linux/netdev_features.h | 2 ++ include/linux/skbuff.h | 5 +++++ net/Kconfig | 8 ++++++++ net/ethtool/common.c | 1 + net/ipv4/tcp_input.c | 7 +++++++ net/ipv4/tcp_ipv4.c | 3 +++ net/ipv4/tcp_offload.c | 3 +++ 7 files changed, 29 insertions(+) diff --git a/include/linux/netdev_features.h b/include/linux/netdev_features.h index fb35dcac03d2..dc79709586cd 100644 --- a/include/linux/netdev_features.h +++ b/include/linux/netdev_features.h @@ -85,6 +85,7 @@ enum { NETIF_F_HW_MACSEC_BIT, /* Offload MACsec operations */ NETIF_F_HW_TCP_DDP_BIT, /* TCP direct data placement offload */ + NETIF_F_HW_TCP_DDP_CRC_RX_BIT, /* TCP DDP CRC RX offload */ /* * Add your fresh new feature above and remember to update @@ -159,6 +160,7 @@ enum { #define NETIF_F_GSO_FRAGLIST __NETIF_F(GSO_FRAGLIST) #define NETIF_F_HW_MACSEC __NETIF_F(HW_MACSEC) #define NETIF_F_HW_TCP_DDP __NETIF_F(HW_TCP_DDP) +#define NETIF_F_HW_TCP_DDP_CRC_RX __NETIF_F(HW_TCP_DDP_CRC_RX) /* Finds the next feature with the highest number of the range of start till 0. */ diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index 333bcdc39635..a841bc974452 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -683,6 +683,7 @@ typedef unsigned char *sk_buff_data_t; * CHECKSUM_UNNECESSARY (max 3) * @dst_pending_confirm: need to confirm neighbour * @decrypted: Decrypted SKB + * @ddp_crc: NIC is responsible for PDU's CRC computation and verification * @napi_id: id of the NAPI struct this skb came from * @sender_cpu: (aka @napi_id) source CPU in XPS * @secmark: security marking @@ -859,6 +860,10 @@ struct sk_buff { #ifdef CONFIG_TLS_DEVICE __u8 decrypted:1; #endif +#ifdef CONFIG_TCP_DDP_CRC + __u8 ddp_crc:1; +#endif + #ifdef CONFIG_NET_SCHED __u16 tc_index; /* traffic control index */ diff --git a/net/Kconfig b/net/Kconfig index 3876861cdc90..80ed9f038968 100644 --- a/net/Kconfig +++ b/net/Kconfig @@ -465,6 +465,14 @@ config TCP_DDP NVMe-TCP/iSCSI, to request the NIC to place TCP payload data of a command response directly into kernel pages. +config TCP_DDP_CRC + bool "TCP direct data placement CRC offload" + default n + help + Direct Data Placement (DDP) CRC32C offload for TCP enables ULP, such as + NVMe-TCP/iSCSI, to request the NIC to calculate/verify the data digest + of commands as they go through the NIC. Thus avoiding the costly + per-byte overhead. endif # if NET diff --git a/net/ethtool/common.c b/net/ethtool/common.c index a2ff7a4a6bbf..cc6858105449 100644 --- a/net/ethtool/common.c +++ b/net/ethtool/common.c @@ -69,6 +69,7 @@ const char netdev_features_strings[NETDEV_FEATURE_COUNT][ETH_GSTRING_LEN] = { [NETIF_F_GRO_FRAGLIST_BIT] = "rx-gro-list", [NETIF_F_HW_MACSEC_BIT] = "macsec-hw-offload", [NETIF_F_HW_TCP_DDP_BIT] = "tcp-ddp-offload", + [NETIF_F_HW_TCP_DDP_CRC_RX_BIT] = "tcp-ddp-crc-rx-offload", }; const char diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index c7e16b0ed791..090c695e18b8 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -5147,6 +5147,9 @@ tcp_collapse(struct sock *sk, struct sk_buff_head *list, struct rb_root *root, memcpy(nskb->cb, skb->cb, sizeof(skb->cb)); #ifdef CONFIG_TLS_DEVICE nskb->decrypted = skb->decrypted; +#endif +#ifdef CONFIG_TCP_DDP_CRC + nskb->ddp_crc = skb->ddp_crc; #endif TCP_SKB_CB(nskb)->seq = TCP_SKB_CB(nskb)->end_seq = start; if (list) @@ -5180,6 +5183,10 @@ tcp_collapse(struct sock *sk, struct sk_buff_head *list, struct rb_root *root, #ifdef CONFIG_TLS_DEVICE if (skb->decrypted != nskb->decrypted) goto end; +#endif +#ifdef CONFIG_TCP_DDP_CRC + if (skb->ddp_crc != nskb->ddp_crc) + goto end; #endif } } diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c index 58207c7769d0..64ccaf08fb3a 100644 --- a/net/ipv4/tcp_ipv4.c +++ b/net/ipv4/tcp_ipv4.c @@ -1815,6 +1815,9 @@ bool tcp_add_backlog(struct sock *sk, struct sk_buff *skb) TCP_SKB_CB(skb)->tcp_flags) & (TCPHDR_ECE | TCPHDR_CWR)) || #ifdef CONFIG_TLS_DEVICE tail->decrypted != skb->decrypted || +#endif +#ifdef CONFIG_TCP_DDP_CRC + tail->ddp_crc != skb->ddp_crc || #endif thtail->doff != th->doff || memcmp(thtail + 1, th + 1, hdrlen - sizeof(*th))) diff --git a/net/ipv4/tcp_offload.c b/net/ipv4/tcp_offload.c index e09147ac9a99..39f5f0bcf181 100644 --- a/net/ipv4/tcp_offload.c +++ b/net/ipv4/tcp_offload.c @@ -262,6 +262,9 @@ struct sk_buff *tcp_gro_receive(struct list_head *head, struct sk_buff *skb) #ifdef CONFIG_TLS_DEVICE flush |= p->decrypted ^ skb->decrypted; #endif +#ifdef CONFIG_TCP_DDP_CRC + flush |= p->ddp_crc ^ skb->ddp_crc; +#endif if (flush || skb_gro_receive(p, skb)) { mss = 1; -- 2.24.1
WARNING: multiple messages have this Message-ID (diff)
From: Boris Pismenny <borisp@mellanox.com> To: kuba@kernel.org, davem@davemloft.net, saeedm@nvidia.com, hch@lst.de, sagi@grimberg.me, axboe@fb.com, kbusch@kernel.org, viro@zeniv.linux.org.uk, edumazet@google.com, dsahern@gmail.com, smalin@marvell.com Cc: Yoray Zack <yorayz@mellanox.com>, yorayz@nvidia.com, boris.pismenny@gmail.com, Ben Ben-Ishay <benishay@mellanox.com>, benishay@nvidia.com, linux-nvme@lists.infradead.org, netdev@vger.kernel.org, Or Gerlitz <ogerlitz@mellanox.com>, ogerlitz@nvidia.com Subject: [PATCH v2 net-next 03/21] net: Introduce crc offload for tcp ddp ulp Date: Thu, 14 Jan 2021 17:10:15 +0200 [thread overview] Message-ID: <20210114151033.13020-4-borisp@mellanox.com> (raw) In-Reply-To: <20210114151033.13020-1-borisp@mellanox.com> This commit introduces support for CRC offload to direct data placement ULP on the receive side. Both DDP and CRC share a common API to initialize the offload for a TCP socket. But otherwise, both can be executed independently. On the receive side, CRC offload requires a new SKB bit that indicates that no CRC error was encountered while processing this packet. If all packets of a ULP message have this bit set, then the CRC verification for the message can be skipped, as hardware already checked it. The following patches will set and use this bit to perform NVME-TCP CRC offload. A subsequent series, will add NVMe-TCP transmit side CRC support. Signed-off-by: Boris Pismenny <borisp@mellanox.com> Signed-off-by: Ben Ben-Ishay <benishay@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Yoray Zack <yorayz@mellanox.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> --- include/linux/netdev_features.h | 2 ++ include/linux/skbuff.h | 5 +++++ net/Kconfig | 8 ++++++++ net/ethtool/common.c | 1 + net/ipv4/tcp_input.c | 7 +++++++ net/ipv4/tcp_ipv4.c | 3 +++ net/ipv4/tcp_offload.c | 3 +++ 7 files changed, 29 insertions(+) diff --git a/include/linux/netdev_features.h b/include/linux/netdev_features.h index fb35dcac03d2..dc79709586cd 100644 --- a/include/linux/netdev_features.h +++ b/include/linux/netdev_features.h @@ -85,6 +85,7 @@ enum { NETIF_F_HW_MACSEC_BIT, /* Offload MACsec operations */ NETIF_F_HW_TCP_DDP_BIT, /* TCP direct data placement offload */ + NETIF_F_HW_TCP_DDP_CRC_RX_BIT, /* TCP DDP CRC RX offload */ /* * Add your fresh new feature above and remember to update @@ -159,6 +160,7 @@ enum { #define NETIF_F_GSO_FRAGLIST __NETIF_F(GSO_FRAGLIST) #define NETIF_F_HW_MACSEC __NETIF_F(HW_MACSEC) #define NETIF_F_HW_TCP_DDP __NETIF_F(HW_TCP_DDP) +#define NETIF_F_HW_TCP_DDP_CRC_RX __NETIF_F(HW_TCP_DDP_CRC_RX) /* Finds the next feature with the highest number of the range of start till 0. */ diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index 333bcdc39635..a841bc974452 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -683,6 +683,7 @@ typedef unsigned char *sk_buff_data_t; * CHECKSUM_UNNECESSARY (max 3) * @dst_pending_confirm: need to confirm neighbour * @decrypted: Decrypted SKB + * @ddp_crc: NIC is responsible for PDU's CRC computation and verification * @napi_id: id of the NAPI struct this skb came from * @sender_cpu: (aka @napi_id) source CPU in XPS * @secmark: security marking @@ -859,6 +860,10 @@ struct sk_buff { #ifdef CONFIG_TLS_DEVICE __u8 decrypted:1; #endif +#ifdef CONFIG_TCP_DDP_CRC + __u8 ddp_crc:1; +#endif + #ifdef CONFIG_NET_SCHED __u16 tc_index; /* traffic control index */ diff --git a/net/Kconfig b/net/Kconfig index 3876861cdc90..80ed9f038968 100644 --- a/net/Kconfig +++ b/net/Kconfig @@ -465,6 +465,14 @@ config TCP_DDP NVMe-TCP/iSCSI, to request the NIC to place TCP payload data of a command response directly into kernel pages. +config TCP_DDP_CRC + bool "TCP direct data placement CRC offload" + default n + help + Direct Data Placement (DDP) CRC32C offload for TCP enables ULP, such as + NVMe-TCP/iSCSI, to request the NIC to calculate/verify the data digest + of commands as they go through the NIC. Thus avoiding the costly + per-byte overhead. endif # if NET diff --git a/net/ethtool/common.c b/net/ethtool/common.c index a2ff7a4a6bbf..cc6858105449 100644 --- a/net/ethtool/common.c +++ b/net/ethtool/common.c @@ -69,6 +69,7 @@ const char netdev_features_strings[NETDEV_FEATURE_COUNT][ETH_GSTRING_LEN] = { [NETIF_F_GRO_FRAGLIST_BIT] = "rx-gro-list", [NETIF_F_HW_MACSEC_BIT] = "macsec-hw-offload", [NETIF_F_HW_TCP_DDP_BIT] = "tcp-ddp-offload", + [NETIF_F_HW_TCP_DDP_CRC_RX_BIT] = "tcp-ddp-crc-rx-offload", }; const char diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index c7e16b0ed791..090c695e18b8 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -5147,6 +5147,9 @@ tcp_collapse(struct sock *sk, struct sk_buff_head *list, struct rb_root *root, memcpy(nskb->cb, skb->cb, sizeof(skb->cb)); #ifdef CONFIG_TLS_DEVICE nskb->decrypted = skb->decrypted; +#endif +#ifdef CONFIG_TCP_DDP_CRC + nskb->ddp_crc = skb->ddp_crc; #endif TCP_SKB_CB(nskb)->seq = TCP_SKB_CB(nskb)->end_seq = start; if (list) @@ -5180,6 +5183,10 @@ tcp_collapse(struct sock *sk, struct sk_buff_head *list, struct rb_root *root, #ifdef CONFIG_TLS_DEVICE if (skb->decrypted != nskb->decrypted) goto end; +#endif +#ifdef CONFIG_TCP_DDP_CRC + if (skb->ddp_crc != nskb->ddp_crc) + goto end; #endif } } diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c index 58207c7769d0..64ccaf08fb3a 100644 --- a/net/ipv4/tcp_ipv4.c +++ b/net/ipv4/tcp_ipv4.c @@ -1815,6 +1815,9 @@ bool tcp_add_backlog(struct sock *sk, struct sk_buff *skb) TCP_SKB_CB(skb)->tcp_flags) & (TCPHDR_ECE | TCPHDR_CWR)) || #ifdef CONFIG_TLS_DEVICE tail->decrypted != skb->decrypted || +#endif +#ifdef CONFIG_TCP_DDP_CRC + tail->ddp_crc != skb->ddp_crc || #endif thtail->doff != th->doff || memcmp(thtail + 1, th + 1, hdrlen - sizeof(*th))) diff --git a/net/ipv4/tcp_offload.c b/net/ipv4/tcp_offload.c index e09147ac9a99..39f5f0bcf181 100644 --- a/net/ipv4/tcp_offload.c +++ b/net/ipv4/tcp_offload.c @@ -262,6 +262,9 @@ struct sk_buff *tcp_gro_receive(struct list_head *head, struct sk_buff *skb) #ifdef CONFIG_TLS_DEVICE flush |= p->decrypted ^ skb->decrypted; #endif +#ifdef CONFIG_TCP_DDP_CRC + flush |= p->ddp_crc ^ skb->ddp_crc; +#endif if (flush || skb_gro_receive(p, skb)) { mss = 1; -- 2.24.1 _______________________________________________ Linux-nvme mailing list Linux-nvme@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-nvme
next prev parent reply other threads:[~2021-01-14 15:12 UTC|newest] Thread overview: 68+ messages / expand[flat|nested] mbox.gz Atom feed top 2021-01-14 15:10 [PATCH v2 net-next 00/21] nvme-tcp receive offloads Boris Pismenny 2021-01-14 15:10 ` Boris Pismenny 2021-01-14 15:10 ` [PATCH v2 net-next 01/21] iov_iter: Introduce new procedures for copy to iter/pages Boris Pismenny 2021-01-14 15:10 ` Boris Pismenny 2021-01-14 15:10 ` [PATCH v2 net-next 02/21] net: Introduce direct data placement tcp offload Boris Pismenny 2021-01-14 15:10 ` Boris Pismenny 2021-01-14 15:57 ` Eric Dumazet 2021-01-14 15:57 ` Eric Dumazet 2021-01-14 20:19 ` Boris Pismenny 2021-01-14 20:19 ` Boris Pismenny 2021-01-14 20:43 ` Eric Dumazet 2021-01-14 20:43 ` Eric Dumazet 2021-01-31 10:40 ` Boris Pismenny 2021-01-31 10:40 ` Boris Pismenny 2021-01-14 15:10 ` Boris Pismenny [this message] 2021-01-14 15:10 ` [PATCH v2 net-next 03/21] net: Introduce crc offload for tcp ddp ulp Boris Pismenny 2021-01-14 15:10 ` [PATCH v2 net-next 04/21] net: SKB copy(+hash) iterators for DDP offloads Boris Pismenny 2021-01-14 15:10 ` Boris Pismenny 2021-01-14 15:10 ` [PATCH v2 net-next 05/21] net/tls: expose get_netdev_for_sock Boris Pismenny 2021-01-14 15:10 ` Boris Pismenny 2021-01-14 15:10 ` [PATCH v2 net-next 06/21] nvme-tcp: Add DDP offload control path Boris Pismenny 2021-01-14 15:10 ` Boris Pismenny 2021-01-19 3:47 ` David Ahern 2021-01-19 3:47 ` David Ahern 2021-01-31 7:51 ` Boris Pismenny 2021-01-31 7:51 ` Boris Pismenny 2021-01-14 15:10 ` [PATCH v2 net-next 07/21] nvme-tcp: Add DDP data-path Boris Pismenny 2021-01-14 15:10 ` Boris Pismenny 2021-01-19 4:18 ` David Ahern 2021-01-19 4:18 ` David Ahern 2021-01-31 8:44 ` Boris Pismenny 2021-01-31 8:44 ` Boris Pismenny 2021-01-14 15:10 ` [PATCH v2 net-next 08/21] nvme-tcp : Recalculate crc in the end of the capsule Boris Pismenny 2021-01-14 15:10 ` Boris Pismenny 2021-01-14 15:10 ` [PATCH v2 net-next 09/21] nvme-tcp: Deal with netdevice DOWN events Boris Pismenny 2021-01-14 15:10 ` Boris Pismenny 2021-01-14 15:10 ` [PATCH v2 net-next 10/21] net/mlx5: Header file changes for nvme-tcp offload Boris Pismenny 2021-01-14 15:10 ` Boris Pismenny 2021-01-14 15:10 ` [PATCH v2 net-next 11/21] net/mlx5: Add 128B CQE for NVMEoTCP offload Boris Pismenny 2021-01-14 15:10 ` Boris Pismenny 2021-01-14 15:10 ` [PATCH v2 net-next 12/21] net/mlx5e: TCP flow steering for nvme-tcp Boris Pismenny 2021-01-14 15:10 ` Boris Pismenny 2021-01-14 15:10 ` [PATCH v2 net-next 13/21] net/mlx5e: NVMEoTCP offload initialization Boris Pismenny 2021-01-14 15:10 ` Boris Pismenny 2021-01-14 15:10 ` [PATCH v2 net-next 14/21] net/mlx5e: KLM UMR helper macros Boris Pismenny 2021-01-14 15:10 ` Boris Pismenny 2021-01-14 15:10 ` [PATCH v2 net-next 15/21] net/mlx5e: NVMEoTCP use KLM UMRs Boris Pismenny 2021-01-14 15:10 ` Boris Pismenny 2021-01-14 15:10 ` [PATCH v2 net-next 16/21] net/mlx5e: NVMEoTCP queue init/teardown Boris Pismenny 2021-01-14 15:10 ` Boris Pismenny 2021-01-14 15:10 ` [PATCH v2 net-next 17/21] net/mlx5e: NVMEoTCP async ddp invalidation Boris Pismenny 2021-01-14 15:10 ` Boris Pismenny 2021-01-14 15:10 ` [PATCH v2 net-next 18/21] net/mlx5e: NVMEoTCP ddp setup and resync Boris Pismenny 2021-01-14 15:10 ` Boris Pismenny 2021-01-14 15:10 ` [PATCH v2 net-next 19/21] net/mlx5e: NVMEoTCP, data-path for DDP offload Boris Pismenny 2021-01-14 15:10 ` Boris Pismenny 2021-01-16 4:57 ` David Ahern 2021-01-16 4:57 ` David Ahern 2021-01-17 8:42 ` Boris Pismenny 2021-01-17 8:42 ` Boris Pismenny 2021-01-19 4:36 ` David Ahern 2021-01-19 4:36 ` David Ahern 2021-01-31 9:27 ` Boris Pismenny 2021-01-31 9:27 ` Boris Pismenny 2021-01-14 15:10 ` [PATCH v2 net-next 20/21] net/mlx5e: NVMEoTCP statistics Boris Pismenny 2021-01-14 15:10 ` Boris Pismenny 2021-01-14 15:10 ` [PATCH v2 net-next 21/21] Documentation: add TCP DDP offload documentation Boris Pismenny 2021-01-14 15:10 ` Boris Pismenny
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=20210114151033.13020-4-borisp@mellanox.com \ --to=borisp@mellanox.com \ --cc=axboe@fb.com \ --cc=benishay@mellanox.com \ --cc=benishay@nvidia.com \ --cc=boris.pismenny@gmail.com \ --cc=davem@davemloft.net \ --cc=dsahern@gmail.com \ --cc=edumazet@google.com \ --cc=hch@lst.de \ --cc=kbusch@kernel.org \ --cc=kuba@kernel.org \ --cc=linux-nvme@lists.infradead.org \ --cc=netdev@vger.kernel.org \ --cc=ogerlitz@mellanox.com \ --cc=ogerlitz@nvidia.com \ --cc=saeedm@nvidia.com \ --cc=sagi@grimberg.me \ --cc=smalin@marvell.com \ --cc=viro@zeniv.linux.org.uk \ --cc=yorayz@mellanox.com \ --cc=yorayz@nvidia.com \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: linkBe sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.