From: David Ahern <dsahern@gmail.com>
To: Boris Pismenny <borispismenny@gmail.com>, Boris Pismenny <borisp@mellanox.com>, kuba@kernel.org, davem@davemloft.net, saeedm@nvidia.com, hch@lst.de, sagi@grimberg.me, axboe@fb.com, kbusch@kernel.org, viro@zeniv.linux.org.uk, edumazet@google.com, smalin@marvell.com
Cc: boris.pismenny@gmail.com, linux-nvme@lists.infradead.org, netdev@vger.kernel.org, benishay@nvidia.com, ogerlitz@nvidia.com, yorayz@nvidia.com, Or Gerlitz <ogerlitz@mellanox.com>, Yoray Zack <yorayz@mellanox.com>
Subject: Re: [PATCH v2 net-next 19/21] net/mlx5e: NVMEoTCP, data-path for DDP offload
Date: Mon, 18 Jan 2021 21:36:30 -0700
Message-ID: <2a0bfce0-6226-7b9a-95b1-15f4f1f321e8@gmail.com>
In-Reply-To: <15248743-82bf-4283-d8c6-99f2210e42ae@gmail.com>

On 1/17/21 1:42 AM, Boris Pismenny wrote:
> This is needed for a few reasons that are explained in detail
> in the tcp-ddp offload documentation. See patch 21 overview
> and rx-data-path sections. Our reasons are as follows:

I read the documentation patch, and it does not explain it and really
should not since this is very mlx specific based on the changes.
Different h/w will have different limitations. Given that, it would be
best to enhance the patch description to explain why these gymnastics
are needed for the skb.

> 1) Each SKB may contain multiple PDUs. DDP offload doesn't operate on
> PDU headers, so these are written in the receive ring. Therefore, we
> need to rebuild the SKB to account for it. Additionally, due to HW
> limitations, we will only offload the first PDU in the SKB.

Are you referring to LRO skbs here? I can't imagine going through this
for 1500 byte packets that have multiple PDUs.

> 2) The newly constructed SKB represents the original data as it is on
> the wire, such that the network stack is oblivious to the offload.
> 3) We decided not to modify all of the mlx5e_skb_from_cqe* functions
> because it would make the offload harder to distinguish, and it would
> add overhead to the existing data-path functions. Therefore, we opted
> for this modular approach.
>
> If we only had generic header-data split, then we just couldn't
> provide this offload. It is not enough to place payload into some
> buffer without TCP headers because RPC protocols and advanced storage
> protocols, such as nvme-tcp, reorder their responses and require data
> to be placed into application/pagecache buffers, which are anything
> but anonymous. In other words, header-data split alone writes data
> to the wrong buffers (reordering), or to anonymous buffers that
> can't be page-flipped to replace application/pagecache buffers.
>
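The reordering argument in the last quoted paragraph can be modelled with a small user-space sketch. It is not taken from the patch series, and every name in it (`Pdu`, `command_id`, `header_data_split`, `direct_data_placement`) is hypothetical. It only assumes that each response carries an identifier tying it to the request it answers, and shows why writing payloads in arrival order differs from writing each one into the buffer registered for its request.

```python
from dataclasses import dataclass


@dataclass
class Pdu:
    command_id: int  # hypothetical tag: which request this response answers
    payload: bytes


def header_data_split(pdus, anon_buffers):
    """Model of generic header-data split: payloads land in anonymous
    buffers in arrival order, with no link to the originating request."""
    for buf, pdu in zip(anon_buffers, pdus):
        buf[:len(pdu.payload)] = pdu.payload
    return anon_buffers


def direct_data_placement(pdus, registered):
    """Model of DDP: each payload is written into the buffer registered
    for its command ID, regardless of arrival order."""
    for pdu in pdus:
        dest = registered[pdu.command_id]
        dest[:len(pdu.payload)] = pdu.payload
    return registered


# Requests 1 and 2 were issued in that order, but the responses arrive reordered.
arrivals = [Pdu(2, b"BBBB"), Pdu(1, b"AAAA")]

anon = header_data_split(arrivals, [bytearray(4), bytearray(4)])
app = direct_data_placement(arrivals, {1: bytearray(4), 2: bytearray(4)})
```

In this model the first anonymous buffer ends up holding the response to request 2, while the registered buffers each hold the data for their own request, which is the distinction the quoted paragraph draws.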