linux-nvme.lists.infradead.org archive mirror
 help / color / mirror / Atom feed
From: Shai Malin <smalin@nvidia.com>
To: Jakub Kicinski <kuba@kernel.org>, Aurelien Aptel <aaptel@nvidia.com>
Cc: "netdev@vger.kernel.org" <netdev@vger.kernel.org>,
	"davem@davemloft.net" <davem@davemloft.net>,
	"edumazet@google.com" <edumazet@google.com>,
	"pabeni@redhat.com" <pabeni@redhat.com>,
	Saeed Mahameed <saeedm@nvidia.com>,
	Tariq Toukan <tariqt@nvidia.com>,
	"leon@kernel.org" <leon@kernel.org>,
	"linux-nvme@lists.infradead.org" <linux-nvme@lists.infradead.org>,
	"sagi@grimberg.me" <sagi@grimberg.me>, "hch@lst.de" <hch@lst.de>,
	"kbusch@kernel.org" <kbusch@kernel.org>,
	"axboe@fb.com" <axboe@fb.com>,
	Chaitanya Kulkarni <chaitanyak@nvidia.com>,
	Or Gerlitz <ogerlitz@nvidia.com>, Yoray Zack <yorayz@nvidia.com>,
	Boris Pismenny <borisp@nvidia.com>,
	"aurelien.aptel@gmail.com" <aurelien.aptel@gmail.com>,
	"malin1024@gmail.com" <malin1024@gmail.com>
Subject: RE: [PATCH v7 01/23] net: Introduce direct data placement tcp offload
Date: Wed, 26 Oct 2022 15:01:42 +0000	[thread overview]
Message-ID: <DM6PR12MB3564FB23C582CEF338D11435BC309@DM6PR12MB3564.namprd12.prod.outlook.com> (raw)
In-Reply-To: <20221025153925.64b5b040@kernel.org>

On Tue, 25 Oct 2022 at 23:39, Jakub Kicinski <kuba@kernel.org> wrote:
> On Tue, 25 Oct 2022 16:59:36 +0300 Aurelien Aptel wrote:
> > diff --git a/include/linux/netdev_features.h b/include/linux/netdev_features.h
> > index 7c2d77d75a88..bf7391aa04c7 100644
> > --- a/include/linux/netdev_features.h
> > +++ b/include/linux/netdev_features.h
> > @@ -14,7 +14,7 @@ typedef u64 netdev_features_t;
> >  enum {
> >       NETIF_F_SG_BIT,                 /* Scatter/gather IO. */
> >       NETIF_F_IP_CSUM_BIT,            /* Can checksum TCP/UDP over IPv4. */
> > -     __UNUSED_NETIF_F_1,
> > +     NETIF_F_HW_ULP_DDP_BIT,         /* ULP direct data placement offload */
> 
> Why do you need a feature bit if there is a whole caps / limit querying
> mechanism?

The caps are used for the HW device to publish the supported 
capabilities/limitation, while the feature bit is used for the DDP 
enablement "per net-device".

Disabling will be required in case that another feature which is 
mutually exclusive to the DDP is needed (as an example in the mlx case, 
CQE compress which is controlled from ethtool).

> 
> >       NETIF_F_HW_CSUM_BIT,            /* Can checksum all the packets. */
> >       NETIF_F_IPV6_CSUM_BIT,          /* Can checksum TCP/UDP over IPV6 */
> >       NETIF_F_HIGHDMA_BIT,            /* Can DMA to high memory. */
> > @@ -168,6 +168,7 @@ enum {
> >  #define NETIF_F_HW_HSR_TAG_RM        __NETIF_F(HW_HSR_TAG_RM)
> >  #define NETIF_F_HW_HSR_FWD   __NETIF_F(HW_HSR_FWD)
> >  #define NETIF_F_HW_HSR_DUP   __NETIF_F(HW_HSR_DUP)
> > +#define NETIF_F_HW_ULP_DDP   __NETIF_F(HW_ULP_DDP)
> >
> >  /* Finds the next feature with the highest number of the range of start-1 till 0.
> >   */
> > diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> > index eddf8ee270e7..84554f26ad6b 100644
> > --- a/include/linux/netdevice.h
> > +++ b/include/linux/netdevice.h
> > @@ -1043,6 +1043,7 @@ struct dev_ifalias {
> >
> >  struct devlink;
> >  struct tlsdev_ops;
> > +struct ulp_ddp_dev_ops;
> 
> I thought forward declarations are not required for struct members,
> are they?

Right, thanks, we will remove it.

> 
> >  struct netdev_net_notifier {
> >       struct list_head list;
> > @@ -2096,6 +2097,10 @@ struct net_device {
> >       const struct tlsdev_ops *tlsdev_ops;
> >  #endif
> >
> > +#if IS_ENABLED(CONFIG_ULP_DDP)
> > +     const struct ulp_ddp_dev_ops *ulp_ddp_ops;
> > +#endif
> 
> It's somewhat unclear to me why we add ops to struct net_device,
> rather than to ops.. can you explain?

We were trying to follow the TLS design which is similar.
Can you please clarify what do you mean by "rather than to ops.."?

> 
> >       const struct header_ops *header_ops;
> >
> >       unsigned char           operstate;
> 
> > +#include <linux/netdevice.h>
> > +#include <net/inet_connection_sock.h>
> > +#include <net/sock.h>
> > +
> > +enum ulp_ddp_type {
> > +     ULP_DDP_NVME = 1,
> 
> I think the DDP and the NVME parts should have more separation.
> 
> Are you planning to implement pure TCP placement, without NIC trying
> to also "add value" by processing whatever TCP is carrying.

We are not planning to implement pure TCP placement.
As we will present in the netdev, this is doable only if the HW is L5 aware.

> 
> Can you split the DDP and NVME harder in the API, somehow?

We can simplify the layering by using union per ulp_ddp_type.
There are no nvme structures or definitions needed in ulp_ddp.

> 
> > +};
> > +
> > +enum ulp_ddp_offload_capabilities {
> > +     ULP_DDP_C_NVME_TCP = 1,
> > +     ULP_DDP_C_NVME_TCP_DDGST_RX = 2,
> > +};
> > +
> > +/**
> > + * struct ulp_ddp_limits - Generic ulp ddp limits: tcp ddp
> > + * protocol limits.
> > + * Protocol implementations must use this as the first member.
> > + * Add new instances of ulp_ddp_limits below (nvme-tcp, etc.).
> > + *
> > + * @type:            type of this limits struct
> > + * @offload_capabilities:bitmask of supported offload types
> > + * @max_ddp_sgl_len: maximum sgl size supported (zero means no limit)
> > + * @io_threshold:    minimum payload size required to offload
> > + * @buf:             protocol-specific limits struct (if any)
> > + */
> > +struct ulp_ddp_limits {
> 
> Why is this called limits not capabilities / caps?

We will change it to caps.

> 
> > +     enum ulp_ddp_type       type;
> > +     u64                     offload_capabilities;
> > +     int                     max_ddp_sgl_len;
> > +     int                     io_threshold;
> > +     unsigned char           buf[];
> 
> Just put a union of all the protos here.

Sure.

> 
> > +};
> > +
> > +/**
> > + * struct nvme_tcp_ddp_limits - nvme tcp driver limitations
> > + *
> > + * @lmt:             generic ULP limits struct
> > + * @full_ccid_range: true if the driver supports the full CID range
> > + */
> > +struct nvme_tcp_ddp_limits {
> > +     struct ulp_ddp_limits   lmt;
> > +
> > +     bool                    full_ccid_range;
> > +};
> 
> > diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
> > index 0640453fce54..df37db420110 100644
> > --- a/net/ipv4/tcp_input.c
> > +++ b/net/ipv4/tcp_input.c
> > @@ -5233,6 +5233,10 @@ tcp_collapse(struct sock *sk, struct sk_buff_head
> *list, struct rb_root *root,
> >               memcpy(nskb->cb, skb->cb, sizeof(skb->cb));
> >  #ifdef CONFIG_TLS_DEVICE
> >               nskb->decrypted = skb->decrypted;
> > +#endif
> > +#ifdef CONFIG_ULP_DDP
> > +             nskb->ulp_ddp = skb->ulp_ddp;
> > +             nskb->ulp_crc = skb->ulp_crc;
> >  #endif
> >               TCP_SKB_CB(nskb)->seq = TCP_SKB_CB(nskb)->end_seq = start;
> >               if (list)
> > @@ -5266,6 +5270,10 @@ tcp_collapse(struct sock *sk, struct sk_buff_head
> *list, struct rb_root *root,
> >  #ifdef CONFIG_TLS_DEVICE
> >                               if (skb->decrypted != nskb->decrypted)
> >                                       goto end;
> > +#endif
> > +#ifdef CONFIG_ULP_DDP
> 
> no ifdef needed

Thanks, we will remove it.

> 
> > +                             if (skb_is_ulp_crc(skb) != skb_is_ulp_crc(nskb))
> > +                                     goto end;
> >  #endif
> >                       }
> >               }



  reply	other threads:[~2022-10-26 15:01 UTC|newest]

Thread overview: 51+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-10-25 13:59 [PATCH v7 00/23] nvme-tcp receive offloads Aurelien Aptel
2022-10-25 13:59 ` [PATCH v7 01/23] net: Introduce direct data placement tcp offload Aurelien Aptel
2022-10-25 22:39   ` Jakub Kicinski
2022-10-26 15:01     ` Shai Malin [this message]
2022-10-26 16:24       ` Jakub Kicinski
2022-10-28 10:32         ` Shai Malin
2022-10-28 15:40           ` Jakub Kicinski
2022-10-31 18:13             ` Shai Malin
2022-10-31 23:47               ` Jakub Kicinski
2022-11-03 17:23                 ` Shai Malin
2022-11-03 17:29                   ` Aurelien Aptel
2022-11-04  1:57                     ` Jakub Kicinski
2022-11-04 13:44                       ` Aurelien Aptel
2022-11-04 16:15                         ` Jakub Kicinski
2022-10-25 13:59 ` [PATCH v7 02/23] iov_iter: DDP copy to iter/pages Aurelien Aptel
2022-10-25 16:01   ` Christoph Hellwig
2022-10-26 16:05     ` Aurelien Aptel
2022-10-25 22:40   ` Jakub Kicinski
2022-10-25 13:59 ` [PATCH v7 03/23] net/tls: export get_netdev_for_sock Aurelien Aptel
2022-10-25 16:12   ` Christoph Hellwig
2022-10-26 15:55     ` Aurelien Aptel
2022-10-30 16:06       ` Christoph Hellwig
2022-10-25 13:59 ` [PATCH v7 04/23] Revert "nvme-tcp: remove the unused queue_size member in nvme_tcp_queue" Aurelien Aptel
2022-10-25 16:14   ` Christoph Hellwig
2022-10-26 11:02     ` Sagi Grimberg
2022-10-26 11:52       ` Shai Malin
2022-10-25 13:59 ` [PATCH v7 05/23] nvme-tcp: Add DDP offload control path Aurelien Aptel
2022-10-25 13:59 ` [PATCH v7 06/23] nvme-tcp: Add DDP data-path Aurelien Aptel
2022-10-25 13:59 ` [PATCH v7 07/23] nvme-tcp: RX DDGST offload Aurelien Aptel
2022-10-25 13:59 ` [PATCH v7 08/23] nvme-tcp: Deal with netdevice DOWN events Aurelien Aptel
2022-10-25 13:59 ` [PATCH v7 09/23] nvme-tcp: Add modparam to control the ULP offload enablement Aurelien Aptel
2022-10-25 13:59 ` [PATCH v7 10/23] Documentation: add ULP DDP offload documentation Aurelien Aptel
2022-10-26 22:23   ` kernel test robot
2022-10-25 13:59 ` [PATCH v7 11/23] net/mlx5e: Rename from tls to transport static params Aurelien Aptel
2022-10-25 13:59 ` [PATCH v7 12/23] net/mlx5e: Refactor ico sq polling to get budget Aurelien Aptel
2022-10-25 13:59 ` [PATCH v7 13/23] net/mlx5e: Have mdev pointer directly on the icosq structure Aurelien Aptel
2022-10-25 13:59 ` [PATCH v7 14/23] net/mlx5e: Refactor doorbell function to allow avoiding a completion Aurelien Aptel
2022-10-25 13:59 ` [PATCH v7 15/23] net/mlx5: Add NVMEoTCP caps, HW bits, 128B CQE and enumerations Aurelien Aptel
2022-10-25 13:59 ` [PATCH v7 16/23] net/mlx5e: NVMEoTCP, offload initialization Aurelien Aptel
2022-10-25 13:59 ` [PATCH v7 17/23] net/mlx5e: TCP flow steering for nvme-tcp acceleration Aurelien Aptel
2022-10-25 13:59 ` [PATCH v7 18/23] net/mlx5e: NVMEoTCP, use KLM UMRs for buffer registration Aurelien Aptel
2022-10-25 13:59 ` [PATCH v7 19/23] net/mlx5e: NVMEoTCP, queue init/teardown Aurelien Aptel
2022-10-25 13:59 ` [PATCH v7 20/23] net/mlx5e: NVMEoTCP, ddp setup and resync Aurelien Aptel
2022-10-25 13:59 ` [PATCH v7 21/23] net/mlx5e: NVMEoTCP, async ddp invalidation Aurelien Aptel
2022-10-25 13:59 ` [PATCH v7 22/23] net/mlx5e: NVMEoTCP, data-path for DDP+DDGST offload Aurelien Aptel
2022-10-25 13:59 ` [PATCH v7 23/23] net/mlx5e: NVMEoTCP, statistics Aurelien Aptel
2022-10-25 16:00 ` [PATCH v7 00/23] nvme-tcp receive offloads Christoph Hellwig
2022-10-26  8:28   ` Or Gerlitz
2022-10-26 11:52   ` Aurelien Aptel
2022-10-27 10:35 ` Sagi Grimberg
2022-10-27 10:45   ` Shai Malin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=DM6PR12MB3564FB23C582CEF338D11435BC309@DM6PR12MB3564.namprd12.prod.outlook.com \
    --to=smalin@nvidia.com \
    --cc=aaptel@nvidia.com \
    --cc=aurelien.aptel@gmail.com \
    --cc=axboe@fb.com \
    --cc=borisp@nvidia.com \
    --cc=chaitanyak@nvidia.com \
    --cc=davem@davemloft.net \
    --cc=edumazet@google.com \
    --cc=hch@lst.de \
    --cc=kbusch@kernel.org \
    --cc=kuba@kernel.org \
    --cc=leon@kernel.org \
    --cc=linux-nvme@lists.infradead.org \
    --cc=malin1024@gmail.com \
    --cc=netdev@vger.kernel.org \
    --cc=ogerlitz@nvidia.com \
    --cc=pabeni@redhat.com \
    --cc=saeedm@nvidia.com \
    --cc=sagi@grimberg.me \
    --cc=tariqt@nvidia.com \
    --cc=yorayz@nvidia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).