From: Boris Pismenny <borispismenny@gmail.com>
To: Sagi Grimberg <sagi@grimberg.me>, Boris Pismenny <borisp@mellanox.com>,
	kuba@kernel.org, davem@davemloft.net, saeedm@nvidia.com, hch@lst.de,
	axboe@fb.com, kbusch@kernel.org, viro@zeniv.linux.org.uk,
	edumazet@google.com
Cc: Yoray Zack <yorayz@mellanox.com>, Ben Ben-Ishay <benishay@mellanox.com>,
	boris.pismenny@gmail.com, linux-nvme@lists.infradead.org,
	netdev@vger.kernel.org, Or Gerlitz <ogerlitz@mellanox.com>
Subject: Re: [PATCH net-next RFC v1 07/10] nvme-tcp : Recalculate crc in the end of the capsule
Date: Sun, 8 Nov 2020 16:46:45 +0200
Message-ID: <d080bd0c-ca1d-42a6-bee7-e6aa4bcb6896@gmail.com>
In-Reply-To: <a17cf1ca-4183-8f6c-8470-9d45febb755b@grimberg.me>

On 09/10/2020 1:44, Sagi Grimberg wrote:
>> crc offload of the nvme capsule. Check if all the skb bits
>> are on, and if not recalculate the crc in SW and check it.
>
> Can you clarify in the patch description that this is only
> for pdu data digest and not header digest?

Will do.

>> This patch reworks the receive-side crc calculation to always
>> run at the end, so as to keep a single flow for both offload
>> and non-offload. This change simplifies the code, but it may degrade
>> performance for non-offload crc calculation.
>
> ??
>
> From my scan it doesn't look like you do that.. Am I missing something?
> Can you explain?

The performance of the CRC data digest in the offload's fallback path may
be worse than that of CRC calculation with skb_copy_and_hash.
To be clear, the fallback path occurs when `queue->data_digest &&
test_bit(NVME_TCP_Q_OFF_CRC_RX, &queue->flags)`, while we receive SKBs
where `skb->ddp_crc = 0`.

>>  	rq = blk_mq_tag_to_rq(nvme_tcp_tagset(queue), pdu->command_id);
>>  	if (!rq) {
>>  		dev_err(queue->ctrl->ctrl.device,
>> @@ -992,7 +1031,7 @@ static int nvme_tcp_recv_data(struct nvme_tcp_queue *queue, struct sk_buff *skb,
>>  		recv_len = min_t(size_t, recv_len,
>>  				iov_iter_count(&req->iter));
>>  
>> -		if (queue->data_digest)
>> +		if (queue->data_digest && !test_bit(NVME_TCP_Q_OFFLOADS, &queue->flags))
>>  			ret = skb_copy_and_hash_datagram_iter(skb, *offset,
>>  				&req->iter, recv_len, queue->rcv_hash);
>
> This is the skb copy and hash, not clear why you say that you move this
> to the end...

See the offload fallback path below.

>>  		else
>> @@ -1012,7 +1051,6 @@ static int nvme_tcp_recv_data(struct nvme_tcp_queue *queue, struct sk_buff *skb,
>>  
>>  	if (!queue->data_remaining) {
>>  		if (queue->data_digest) {
>> -			nvme_tcp_ddgst_final(queue->rcv_hash, &queue->exp_ddgst);
>
> If I instead do:
> 	if (!test_bit(NVME_TCP_Q_OFFLOADS, &queue->flags))
> 		nvme_tcp_ddgst_final(queue->rcv_hash, &queue->exp_ddgst);
>
> Does that help the mess in nvme_tcp_recv_ddgst?

Not really, as the code path there takes care of the fallback path, i.e.
offload requested, but it didn't succeed.
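To make the split concrete, here is a small userspace model of the three
receive-side digest paths. This is illustrative only, not the driver code:
`classify_path` and the enum names are made up for this sketch; the real
decision is made inline in nvme_tcp_recv_ddgst().

```c
#include <stdbool.h>

/* Hypothetical model of the three data-digest paths. */
enum ddgst_path {
	PATH_SW,	/* non-offload: skb_copy_and_hash path */
	PATH_FALLBACK,	/* offload armed, device CRC invalid: recalc in SW */
	PATH_OFFLOAD,	/* offload armed, device CRC valid: skip SW digest */
};

static enum ddgst_path classify_path(bool crc_rx_offload, bool device_crc_ok)
{
	if (!crc_rx_offload)
		return PATH_SW;		/* digest computed during copy */
	return device_crc_ok ? PATH_OFFLOAD : PATH_FALLBACK;
}
```

Only PATH_SW and PATH_FALLBACK go through nvme_tcp_ddgst_final(); the
fallback can be slower than the non-offload path because the copy and the
hash happen in two separate passes over the data instead of one.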
>>  			queue->ddgst_remaining = NVME_TCP_DIGEST_LENGTH;
>>  		} else {
>>  			if (pdu->hdr.flags & NVME_TCP_F_DATA_SUCCESS) {
>> @@ -1033,8 +1071,11 @@ static int nvme_tcp_recv_ddgst(struct nvme_tcp_queue *queue,
>>  	char *ddgst = (char *)&queue->recv_ddgst;
>>  	size_t recv_len = min_t(size_t, *len, queue->ddgst_remaining);
>>  	off_t off = NVME_TCP_DIGEST_LENGTH - queue->ddgst_remaining;
>> +	bool ddgst_offload_fail;
>>  	int ret;
>>  
>> +	if (test_bit(NVME_TCP_Q_OFFLOADS, &queue->flags))
>> +		nvme_tcp_device_ddgst_update(queue, skb);
>>  	ret = skb_copy_bits(skb, *offset, &ddgst[off], recv_len);
>>  	if (unlikely(ret))
>>  		return ret;
>> @@ -1045,12 +1086,21 @@ static int nvme_tcp_recv_ddgst(struct nvme_tcp_queue *queue,
>>  	if (queue->ddgst_remaining)
>>  		return 0;
>>  
>> -	if (queue->recv_ddgst != queue->exp_ddgst) {
>> -		dev_err(queue->ctrl->ctrl.device,
>> -			"data digest error: recv %#x expected %#x\n",
>> -			le32_to_cpu(queue->recv_ddgst),
>> -			le32_to_cpu(queue->exp_ddgst));
>> -		return -EIO;
>> +	ddgst_offload_fail = !nvme_tcp_device_ddgst_ok(queue);
>> +	if (!test_bit(NVME_TCP_Q_OFFLOADS, &queue->flags) ||
>> +	    ddgst_offload_fail) {
>> +		if (test_bit(NVME_TCP_Q_OFFLOADS, &queue->flags) &&
>> +		    ddgst_offload_fail)
>> +			nvme_tcp_crc_recalculate(queue, pdu);
>> +
>> +		nvme_tcp_ddgst_final(queue->rcv_hash, &queue->exp_ddgst);
>> +		if (queue->recv_ddgst != queue->exp_ddgst) {
>> +			dev_err(queue->ctrl->ctrl.device,
>> +				"data digest error: recv %#x expected %#x\n",
>> +				le32_to_cpu(queue->recv_ddgst),
>> +				le32_to_cpu(queue->exp_ddgst));
>> +			return -EIO;
>
> This gets convoluted here...

Will try to simplify. The general idea is that there are 3 paths with
common code:
1. non-offload
2. offload failed
3. offload success

(1) and (2) share the code that finalizes and checks the data digest,
while (3) skips it entirely.
In other words, how about this:

```
offload_fail = !nvme_tcp_ddp_ddgst_ok(queue);
offload = test_bit(NVME_TCP_Q_OFF_CRC_RX, &queue->flags);
if (!offload || offload_fail) {
	if (offload && offload_fail) // software-fallback
		nvme_tcp_ddp_ddgst_recalc(queue, pdu);

	nvme_tcp_ddgst_final(queue->rcv_hash, &queue->exp_ddgst);
	if (queue->recv_ddgst != queue->exp_ddgst) {
		dev_err(queue->ctrl->ctrl.device,
			"data digest error: recv %#x expected %#x\n",
			le32_to_cpu(queue->recv_ddgst),
			le32_to_cpu(queue->exp_ddgst));
		return -EIO;
	}
}
```

>>  	}
>>  }
>>  
>>  	if (pdu->hdr.flags & NVME_TCP_F_DATA_SUCCESS) {