* [PATCH 00/10] introduce GVE PMD
@ 2022-07-29 19:30 Xiaoyun Li
  2022-07-29 19:30 ` [PATCH 01/10] net/gve: introduce GVE PMD base code Xiaoyun Li
                   ` (9 more replies)
  0 siblings, 10 replies; 192+ messages in thread
From: Xiaoyun Li @ 2022-07-29 19:30 UTC (permalink / raw)
  To: junfeng.guo, qi.z.zhang, awogbemila, bruce.richardson; +Cc: dev, Xiaoyun Li

Introduce a new PMD for Google Virtual Ethernet (GVE).

This patch set requires a license exception for the MIT-licensed GVE base
code, which includes the following files:
 - gve_adminq.c
 - gve_adminq.h
 - gve_desc.h
 - gve_desc_dqo.h
 - gve_register.h
It is based on the GVE kernel driver v1.3.0; the original code is available at
https://github.com/GoogleCloudPlatform/compute-virtual-ethernet-linux/tree/v1.3.0
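
Once merged, applications are expected to drive the GVE PMD through the
standard ethdev API. The snippet below is only an illustrative sketch (queue
counts, ring sizes and the helper name are arbitrary, and nothing in it is
specific to this patch set):

	#include <string.h>
	#include <rte_ethdev.h>
	#include <rte_mempool.h>

	/* Bring up one Rx and one Tx queue on an already-probed GVE port. */
	static int example_gve_port_init(uint16_t port_id, struct rte_mempool *mp)
	{
		struct rte_eth_conf conf;
		int socket_id = rte_eth_dev_socket_id(port_id);
		int ret;

		memset(&conf, 0, sizeof(conf));
		ret = rte_eth_dev_configure(port_id, 1, 1, &conf);
		if (ret < 0)
			return ret;

		ret = rte_eth_rx_queue_setup(port_id, 0, 512, socket_id, NULL, mp);
		if (ret < 0)
			return ret;

		ret = rte_eth_tx_queue_setup(port_id, 0, 512, socket_id, NULL);
		if (ret < 0)
			return ret;

		return rte_eth_dev_start(port_id);
	}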

Xiaoyun Li (10):
  net/gve: introduce GVE PMD base code
  net/gve: add logs and OS specific implementation
  net/gve: support device initialization
  net/gve: add link update support
  net/gve: add MTU set support
  net/gve: add queue operations
  net/gve: add Rx/Tx support
  net/gve: add support to get dev info and configure dev
  net/gve: add stats support
  doc: update documentation

 MAINTAINERS                            |   6 +
 doc/guides/nics/features/gve.ini       |  18 +
 doc/guides/nics/gve.rst                |  65 ++
 doc/guides/rel_notes/release_22_11.rst |   4 +
 drivers/net/gve/gve.h                  | 331 +++++++++
 drivers/net/gve/gve_adminq.c           | 926 +++++++++++++++++++++++++
 drivers/net/gve/gve_adminq.h           | 383 ++++++++++
 drivers/net/gve/gve_desc.h             | 139 ++++
 drivers/net/gve/gve_desc_dqo.h         | 256 +++++++
 drivers/net/gve/gve_ethdev.c           | 772 +++++++++++++++++++++
 drivers/net/gve/gve_logs.h             |  22 +
 drivers/net/gve/gve_osdep.h            | 149 ++++
 drivers/net/gve/gve_register.h         |  30 +
 drivers/net/gve/gve_rx.c               | 366 ++++++++++
 drivers/net/gve/gve_tx.c               | 678 ++++++++++++++++++
 drivers/net/gve/meson.build            |  15 +
 drivers/net/gve/version.map            |   3 +
 drivers/net/meson.build                |   1 +
 18 files changed, 4164 insertions(+)
 create mode 100644 doc/guides/nics/features/gve.ini
 create mode 100644 doc/guides/nics/gve.rst
 create mode 100644 drivers/net/gve/gve.h
 create mode 100644 drivers/net/gve/gve_adminq.c
 create mode 100644 drivers/net/gve/gve_adminq.h
 create mode 100644 drivers/net/gve/gve_desc.h
 create mode 100644 drivers/net/gve/gve_desc_dqo.h
 create mode 100644 drivers/net/gve/gve_ethdev.c
 create mode 100644 drivers/net/gve/gve_logs.h
 create mode 100644 drivers/net/gve/gve_osdep.h
 create mode 100644 drivers/net/gve/gve_register.h
 create mode 100644 drivers/net/gve/gve_rx.c
 create mode 100644 drivers/net/gve/gve_tx.c
 create mode 100644 drivers/net/gve/meson.build
 create mode 100644 drivers/net/gve/version.map

-- 
2.25.1



* [PATCH 01/10] net/gve: introduce GVE PMD base code
  2022-07-29 19:30 [PATCH 00/10] introduce GVE PMD Xiaoyun Li
@ 2022-07-29 19:30 ` Xiaoyun Li
  2022-07-29 22:42   ` Stephen Hemminger
  2022-08-29  8:41   ` [PATCH v2 00/10] introduce GVE PMD Junfeng Guo
  2022-07-29 19:30 ` [PATCH 02/10] net/gve: add logs and OS specific implementation Xiaoyun Li
                   ` (8 subsequent siblings)
  9 siblings, 2 replies; 192+ messages in thread
From: Xiaoyun Li @ 2022-07-29 19:30 UTC (permalink / raw)
  To: junfeng.guo, qi.z.zhang, awogbemila, bruce.richardson
  Cc: dev, Xiaoyun Li, Haiyue Wang

The following base code is derived from the Google Virtual Ethernet (gve)
kernel driver v1.3.0, which is licensed under MIT:
  - gve_adminq.c
  - gve_adminq.h
  - gve_desc.h
  - gve_desc_dqo.h
  - gve_register.h

The original code is in:
https://github.com/GoogleCloudPlatform/compute-virtual-ethernet-linux/\
tree/v1.3.0/google/gve
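
For context, the sketch below illustrates the usual admin-queue bring-up
sequence exposed by gve_adminq.h. It is an assumption about how the PMD
patches later in this series are expected to use this base code, not part of
the base code itself; counter_pa/db_pa stand in for driver-allocated DMA
regions and are hypothetical parameters:

	static int example_gve_adminq_setup(struct gve_priv *priv,
					    dma_addr_t counter_pa, u32 num_counters,
					    dma_addr_t db_pa, u32 num_ntfy_blks)
	{
		int err;

		/* Allocate the AQ page and hand its PFN to the device. */
		err = gve_adminq_alloc(priv);
		if (err)
			return err;

		/* Read back queue format, MTU, MAC address, ring sizes, etc. */
		err = gve_adminq_describe_device(priv);
		if (err)
			goto free_adminq;

		/* Register event counters and doorbells with the device. */
		err = gve_adminq_configure_device_resources(priv, counter_pa,
							    num_counters, db_pa,
							    num_ntfy_blks);
		if (err)
			goto free_adminq;

		return 0;

	free_adminq:
		gve_adminq_free(priv);
		return err;
	}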

Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com>
Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
---
 drivers/net/gve/gve_adminq.c   | 925 +++++++++++++++++++++++++++++++++
 drivers/net/gve/gve_adminq.h   | 381 ++++++++++++++
 drivers/net/gve/gve_desc.h     | 137 +++++
 drivers/net/gve/gve_desc_dqo.h | 254 +++++++++
 drivers/net/gve/gve_register.h |  28 +
 5 files changed, 1725 insertions(+)
 create mode 100644 drivers/net/gve/gve_adminq.c
 create mode 100644 drivers/net/gve/gve_adminq.h
 create mode 100644 drivers/net/gve/gve_desc.h
 create mode 100644 drivers/net/gve/gve_desc_dqo.h
 create mode 100644 drivers/net/gve/gve_register.h

diff --git a/drivers/net/gve/gve_adminq.c b/drivers/net/gve/gve_adminq.c
new file mode 100644
index 0000000000..8a724f12c6
--- /dev/null
+++ b/drivers/net/gve/gve_adminq.c
@@ -0,0 +1,925 @@
+/* SPDX-License-Identifier: MIT
+ * Google Virtual Ethernet (gve) driver
+ * Version: 1.3.0
+ * Copyright (C) 2015-2022 Google, Inc.
+ * Copyright(C) 2022 Intel Corporation
+ */
+
+#include "gve_adminq.h"
+#include "gve_register.h"
+
+#define GVE_MAX_ADMINQ_RELEASE_CHECK	500
+#define GVE_ADMINQ_SLEEP_LEN		20
+#define GVE_MAX_ADMINQ_EVENT_COUNTER_CHECK	100
+
+#define GVE_DEVICE_OPTION_ERROR_FMT "%s option error:\n" \
+"Expected: length=%d, feature_mask=%x.\n" \
+"Actual: length=%d, feature_mask=%x."
+
+#define GVE_DEVICE_OPTION_TOO_BIG_FMT "Length of %s option larger than expected. Possible older version of guest driver."
+
+static
+struct gve_device_option *gve_get_next_option(struct gve_device_descriptor *descriptor,
+					      struct gve_device_option *option)
+{
+	uintptr_t option_end, descriptor_end;
+
+	option_end = (uintptr_t)option + sizeof(*option) + be16_to_cpu(option->option_length);
+	descriptor_end = (uintptr_t)descriptor + be16_to_cpu(descriptor->total_length);
+
+	return option_end > descriptor_end ? NULL : (struct gve_device_option *)option_end;
+}
+
+static
+void gve_parse_device_option(struct gve_priv *priv,
+			     struct gve_device_option *option,
+			     struct gve_device_option_gqi_rda **dev_op_gqi_rda,
+			     struct gve_device_option_gqi_qpl **dev_op_gqi_qpl,
+			     struct gve_device_option_dqo_rda **dev_op_dqo_rda,
+			     struct gve_device_option_jumbo_frames **dev_op_jumbo_frames)
+{
+	u32 req_feat_mask = be32_to_cpu(option->required_features_mask);
+	u16 option_length = be16_to_cpu(option->option_length);
+	u16 option_id = be16_to_cpu(option->option_id);
+
+	/* If the length or feature mask doesn't match, continue without
+	 * enabling the feature.
+	 */
+	switch (option_id) {
+	case GVE_DEV_OPT_ID_GQI_RAW_ADDRESSING:
+		if (option_length != GVE_DEV_OPT_LEN_GQI_RAW_ADDRESSING ||
+		    req_feat_mask != GVE_DEV_OPT_REQ_FEAT_MASK_GQI_RAW_ADDRESSING) {
+			PMD_DRV_LOG(WARNING, GVE_DEVICE_OPTION_ERROR_FMT,
+				    "Raw Addressing",
+				    GVE_DEV_OPT_LEN_GQI_RAW_ADDRESSING,
+				    GVE_DEV_OPT_REQ_FEAT_MASK_GQI_RAW_ADDRESSING,
+				    option_length, req_feat_mask);
+			break;
+		}
+
+		PMD_DRV_LOG(INFO, "Gqi raw addressing device option enabled.");
+		priv->queue_format = GVE_GQI_RDA_FORMAT;
+		break;
+	case GVE_DEV_OPT_ID_GQI_RDA:
+		if (option_length < sizeof(**dev_op_gqi_rda) ||
+		    req_feat_mask != GVE_DEV_OPT_REQ_FEAT_MASK_GQI_RDA) {
+			PMD_DRV_LOG(WARNING, GVE_DEVICE_OPTION_ERROR_FMT,
+				    "GQI RDA", (int)sizeof(**dev_op_gqi_rda),
+				    GVE_DEV_OPT_REQ_FEAT_MASK_GQI_RDA,
+				    option_length, req_feat_mask);
+			break;
+		}
+
+		if (option_length > sizeof(**dev_op_gqi_rda)) {
+			PMD_DRV_LOG(WARNING,
+				    GVE_DEVICE_OPTION_TOO_BIG_FMT, "GQI RDA");
+		}
+		*dev_op_gqi_rda = RTE_PTR_ADD(option, sizeof(*option));
+		break;
+	case GVE_DEV_OPT_ID_GQI_QPL:
+		if (option_length < sizeof(**dev_op_gqi_qpl) ||
+		    req_feat_mask != GVE_DEV_OPT_REQ_FEAT_MASK_GQI_QPL) {
+			PMD_DRV_LOG(WARNING, GVE_DEVICE_OPTION_ERROR_FMT,
+				    "GQI QPL", (int)sizeof(**dev_op_gqi_qpl),
+				    GVE_DEV_OPT_REQ_FEAT_MASK_GQI_QPL,
+				    option_length, req_feat_mask);
+			break;
+		}
+
+		if (option_length > sizeof(**dev_op_gqi_qpl)) {
+			PMD_DRV_LOG(WARNING,
+				    GVE_DEVICE_OPTION_TOO_BIG_FMT, "GQI QPL");
+		}
+		*dev_op_gqi_qpl = RTE_PTR_ADD(option, sizeof(*option));
+		break;
+	case GVE_DEV_OPT_ID_DQO_RDA:
+		if (option_length < sizeof(**dev_op_dqo_rda) ||
+		    req_feat_mask != GVE_DEV_OPT_REQ_FEAT_MASK_DQO_RDA) {
+			PMD_DRV_LOG(WARNING, GVE_DEVICE_OPTION_ERROR_FMT,
+				    "DQO RDA", (int)sizeof(**dev_op_dqo_rda),
+				    GVE_DEV_OPT_REQ_FEAT_MASK_DQO_RDA,
+				    option_length, req_feat_mask);
+			break;
+		}
+
+		if (option_length > sizeof(**dev_op_dqo_rda)) {
+			PMD_DRV_LOG(WARNING,
+				    GVE_DEVICE_OPTION_TOO_BIG_FMT, "DQO RDA");
+		}
+		*dev_op_dqo_rda = RTE_PTR_ADD(option, sizeof(*option));
+		break;
+	case GVE_DEV_OPT_ID_JUMBO_FRAMES:
+		if (option_length < sizeof(**dev_op_jumbo_frames) ||
+		    req_feat_mask != GVE_DEV_OPT_REQ_FEAT_MASK_JUMBO_FRAMES) {
+			PMD_DRV_LOG(WARNING, GVE_DEVICE_OPTION_ERROR_FMT,
+				    "Jumbo Frames",
+				    (int)sizeof(**dev_op_jumbo_frames),
+				    GVE_DEV_OPT_REQ_FEAT_MASK_JUMBO_FRAMES,
+				    option_length, req_feat_mask);
+			break;
+		}
+
+		if (option_length > sizeof(**dev_op_jumbo_frames)) {
+			PMD_DRV_LOG(WARNING,
+				    GVE_DEVICE_OPTION_TOO_BIG_FMT,
+				    "Jumbo Frames");
+		}
+		*dev_op_jumbo_frames = RTE_PTR_ADD(option, sizeof(*option));
+		break;
+	default:
+		/* If we don't recognize the option just continue
+		 * without doing anything.
+		 */
+		PMD_DRV_LOG(DEBUG, "Unrecognized device option 0x%hx not enabled.\n",
+			    option_id);
+	}
+}
+
+/* Process all device options for a given describe device call. */
+static int
+gve_process_device_options(struct gve_priv *priv,
+			   struct gve_device_descriptor *descriptor,
+			   struct gve_device_option_gqi_rda **dev_op_gqi_rda,
+			   struct gve_device_option_gqi_qpl **dev_op_gqi_qpl,
+			   struct gve_device_option_dqo_rda **dev_op_dqo_rda,
+			   struct gve_device_option_jumbo_frames **dev_op_jumbo_frames)
+{
+	const int num_options = be16_to_cpu(descriptor->num_device_options);
+	struct gve_device_option *dev_opt;
+	int i;
+
+	/* The options struct directly follows the device descriptor. */
+	dev_opt = RTE_PTR_ADD(descriptor, sizeof(*descriptor));
+	for (i = 0; i < num_options; i++) {
+		struct gve_device_option *next_opt;
+
+		next_opt = gve_get_next_option(descriptor, dev_opt);
+		if (!next_opt) {
+			PMD_DRV_LOG(ERR,
+				    "options exceed device_descriptor's total length.\n");
+			return -EINVAL;
+		}
+
+		gve_parse_device_option(priv, dev_opt,
+					dev_op_gqi_rda, dev_op_gqi_qpl,
+					dev_op_dqo_rda, dev_op_jumbo_frames);
+		dev_opt = next_opt;
+	}
+
+	return 0;
+}
+
+int gve_adminq_alloc(struct gve_priv *priv)
+{
+	priv->adminq = gve_alloc_dma_mem(&priv->adminq_dma_mem, PAGE_SIZE);
+	if (unlikely(!priv->adminq))
+		return -ENOMEM;
+
+	priv->adminq_mask = (PAGE_SIZE / sizeof(union gve_adminq_command)) - 1;
+	priv->adminq_prod_cnt = 0;
+	priv->adminq_cmd_fail = 0;
+	priv->adminq_timeouts = 0;
+	priv->adminq_describe_device_cnt = 0;
+	priv->adminq_cfg_device_resources_cnt = 0;
+	priv->adminq_register_page_list_cnt = 0;
+	priv->adminq_unregister_page_list_cnt = 0;
+	priv->adminq_create_tx_queue_cnt = 0;
+	priv->adminq_create_rx_queue_cnt = 0;
+	priv->adminq_destroy_tx_queue_cnt = 0;
+	priv->adminq_destroy_rx_queue_cnt = 0;
+	priv->adminq_dcfg_device_resources_cnt = 0;
+	priv->adminq_set_driver_parameter_cnt = 0;
+	priv->adminq_report_stats_cnt = 0;
+	priv->adminq_report_link_speed_cnt = 0;
+	priv->adminq_get_ptype_map_cnt = 0;
+
+	/* Setup Admin queue with the device */
+	iowrite32be(priv->adminq_dma_mem.pa / PAGE_SIZE,
+		    &priv->reg_bar0->adminq_pfn);
+
+	gve_set_admin_queue_ok(priv);
+	return 0;
+}
+
+void gve_adminq_release(struct gve_priv *priv)
+{
+	int i = 0;
+
+	/* Tell the device the adminq is leaving */
+	iowrite32be(0x0, &priv->reg_bar0->adminq_pfn);
+	while (ioread32be(&priv->reg_bar0->adminq_pfn)) {
+		/* If this is reached the device is unrecoverable and still
+		 * holding memory. Continue looping to avoid memory corruption,
+		 * but WARN so it is visible what is going on.
+		 */
+		if (i == GVE_MAX_ADMINQ_RELEASE_CHECK)
+			PMD_DRV_LOG(WARNING, "Unrecoverable platform error!");
+		i++;
+		msleep(GVE_ADMINQ_SLEEP_LEN);
+	}
+	gve_clear_device_rings_ok(priv);
+	gve_clear_device_resources_ok(priv);
+	gve_clear_admin_queue_ok(priv);
+}
+
+void gve_adminq_free(struct gve_priv *priv)
+{
+	if (!gve_get_admin_queue_ok(priv))
+		return;
+	gve_adminq_release(priv);
+	gve_free_dma_mem(&priv->adminq_dma_mem);
+	gve_clear_admin_queue_ok(priv);
+}
+
+static void gve_adminq_kick_cmd(struct gve_priv *priv, u32 prod_cnt)
+{
+	iowrite32be(prod_cnt, &priv->reg_bar0->adminq_doorbell);
+}
+
+static bool gve_adminq_wait_for_cmd(struct gve_priv *priv, u32 prod_cnt)
+{
+	int i;
+
+	for (i = 0; i < GVE_MAX_ADMINQ_EVENT_COUNTER_CHECK; i++) {
+		if (ioread32be(&priv->reg_bar0->adminq_event_counter)
+		    == prod_cnt)
+			return true;
+		msleep(GVE_ADMINQ_SLEEP_LEN);
+	}
+
+	return false;
+}
+
+static int gve_adminq_parse_err(struct gve_priv *priv, u32 status)
+{
+	if (status != GVE_ADMINQ_COMMAND_PASSED &&
+	    status != GVE_ADMINQ_COMMAND_UNSET) {
+		PMD_DRV_LOG(ERR, "AQ command failed with status %d", status);
+		priv->adminq_cmd_fail++;
+	}
+	switch (status) {
+	case GVE_ADMINQ_COMMAND_PASSED:
+		return 0;
+	case GVE_ADMINQ_COMMAND_UNSET:
+		PMD_DRV_LOG(ERR, "parse_aq_err: err and status both unset, this should not be possible.");
+		return -EINVAL;
+	case GVE_ADMINQ_COMMAND_ERROR_ABORTED:
+	case GVE_ADMINQ_COMMAND_ERROR_CANCELLED:
+	case GVE_ADMINQ_COMMAND_ERROR_DATALOSS:
+	case GVE_ADMINQ_COMMAND_ERROR_FAILED_PRECONDITION:
+	case GVE_ADMINQ_COMMAND_ERROR_UNAVAILABLE:
+		return -EAGAIN;
+	case GVE_ADMINQ_COMMAND_ERROR_ALREADY_EXISTS:
+	case GVE_ADMINQ_COMMAND_ERROR_INTERNAL_ERROR:
+	case GVE_ADMINQ_COMMAND_ERROR_INVALID_ARGUMENT:
+	case GVE_ADMINQ_COMMAND_ERROR_NOT_FOUND:
+	case GVE_ADMINQ_COMMAND_ERROR_OUT_OF_RANGE:
+	case GVE_ADMINQ_COMMAND_ERROR_UNKNOWN_ERROR:
+		return -EINVAL;
+	case GVE_ADMINQ_COMMAND_ERROR_DEADLINE_EXCEEDED:
+		return -ETIME;
+	case GVE_ADMINQ_COMMAND_ERROR_PERMISSION_DENIED:
+	case GVE_ADMINQ_COMMAND_ERROR_UNAUTHENTICATED:
+		return -EACCES;
+	case GVE_ADMINQ_COMMAND_ERROR_RESOURCE_EXHAUSTED:
+		return -ENOMEM;
+	case GVE_ADMINQ_COMMAND_ERROR_UNIMPLEMENTED:
+		return -ENOTSUP;
+	default:
+		PMD_DRV_LOG(ERR, "parse_aq_err: unknown status code %d",
+			    status);
+		return -EINVAL;
+	}
+}
+
+/* Flushes all AQ commands currently queued and waits for them to complete.
+ * If there are failures, it will return the first error.
+ */
+static int gve_adminq_kick_and_wait(struct gve_priv *priv)
+{
+	u32 tail, head;
+	u32 i;
+
+	tail = ioread32be(&priv->reg_bar0->adminq_event_counter);
+	head = priv->adminq_prod_cnt;
+
+	gve_adminq_kick_cmd(priv, head);
+	if (!gve_adminq_wait_for_cmd(priv, head)) {
+		PMD_DRV_LOG(ERR, "AQ commands timed out, need to reset AQ");
+		priv->adminq_timeouts++;
+		return -ENOTRECOVERABLE;
+	}
+
+	for (i = tail; i < head; i++) {
+		union gve_adminq_command *cmd;
+		u32 status, err;
+
+		cmd = &priv->adminq[i & priv->adminq_mask];
+		status = be32_to_cpu(READ_ONCE32(cmd->status));
+		err = gve_adminq_parse_err(priv, status);
+		if (err)
+			/* Return the first error if we failed. */
+			return err;
+	}
+
+	return 0;
+}
+
+/* This function is not threadsafe - the caller is responsible for any
+ * necessary locks.
+ */
+static int gve_adminq_issue_cmd(struct gve_priv *priv,
+				union gve_adminq_command *cmd_orig)
+{
+	union gve_adminq_command *cmd;
+	u32 opcode;
+	u32 tail;
+
+	tail = ioread32be(&priv->reg_bar0->adminq_event_counter);
+
+	/* Check if next command will overflow the buffer. */
+	if (((priv->adminq_prod_cnt + 1) & priv->adminq_mask) ==
+	    (tail & priv->adminq_mask)) {
+		int err;
+
+		/* Flush existing commands to make room. */
+		err = gve_adminq_kick_and_wait(priv);
+		if (err)
+			return err;
+
+		/* Retry. */
+		tail = ioread32be(&priv->reg_bar0->adminq_event_counter);
+		if (((priv->adminq_prod_cnt + 1) & priv->adminq_mask) ==
+		    (tail & priv->adminq_mask)) {
+			/* This should never happen. We just flushed the
+			 * command queue so there should be enough space.
+			 */
+			return -ENOMEM;
+		}
+	}
+
+	cmd = &priv->adminq[priv->adminq_prod_cnt & priv->adminq_mask];
+	priv->adminq_prod_cnt++;
+
+	memcpy(cmd, cmd_orig, sizeof(*cmd_orig));
+	opcode = be32_to_cpu(READ_ONCE32(cmd->opcode));
+
+	switch (opcode) {
+	case GVE_ADMINQ_DESCRIBE_DEVICE:
+		priv->adminq_describe_device_cnt++;
+		break;
+	case GVE_ADMINQ_CONFIGURE_DEVICE_RESOURCES:
+		priv->adminq_cfg_device_resources_cnt++;
+		break;
+	case GVE_ADMINQ_REGISTER_PAGE_LIST:
+		priv->adminq_register_page_list_cnt++;
+		break;
+	case GVE_ADMINQ_UNREGISTER_PAGE_LIST:
+		priv->adminq_unregister_page_list_cnt++;
+		break;
+	case GVE_ADMINQ_CREATE_TX_QUEUE:
+		priv->adminq_create_tx_queue_cnt++;
+		break;
+	case GVE_ADMINQ_CREATE_RX_QUEUE:
+		priv->adminq_create_rx_queue_cnt++;
+		break;
+	case GVE_ADMINQ_DESTROY_TX_QUEUE:
+		priv->adminq_destroy_tx_queue_cnt++;
+		break;
+	case GVE_ADMINQ_DESTROY_RX_QUEUE:
+		priv->adminq_destroy_rx_queue_cnt++;
+		break;
+	case GVE_ADMINQ_DECONFIGURE_DEVICE_RESOURCES:
+		priv->adminq_dcfg_device_resources_cnt++;
+		break;
+	case GVE_ADMINQ_SET_DRIVER_PARAMETER:
+		priv->adminq_set_driver_parameter_cnt++;
+		break;
+	case GVE_ADMINQ_REPORT_STATS:
+		priv->adminq_report_stats_cnt++;
+		break;
+	case GVE_ADMINQ_REPORT_LINK_SPEED:
+		priv->adminq_report_link_speed_cnt++;
+		break;
+	case GVE_ADMINQ_GET_PTYPE_MAP:
+		priv->adminq_get_ptype_map_cnt++;
+		break;
+	default:
+		PMD_DRV_LOG(ERR, "unknown AQ command opcode %d", opcode);
+	}
+
+	return 0;
+}
+
+/* This function is not threadsafe - the caller is responsible for any
+ * necessary locks.
+ * The caller is also responsible for making sure there are no commands
+ * waiting to be executed.
+ */
+static int gve_adminq_execute_cmd(struct gve_priv *priv,
+				  union gve_adminq_command *cmd_orig)
+{
+	u32 tail, head;
+	int err;
+
+	tail = ioread32be(&priv->reg_bar0->adminq_event_counter);
+	head = priv->adminq_prod_cnt;
+	if (tail != head)
+		/* This is not a valid path */
+		return -EINVAL;
+
+	err = gve_adminq_issue_cmd(priv, cmd_orig);
+	if (err)
+		return err;
+
+	return gve_adminq_kick_and_wait(priv);
+}
+
+/* The device specifies that the management vector can either be the first irq
+ * or the last irq. ntfy_blk_msix_base_idx indicates the first irq assigned to
+ * the ntfy blks. If it is 0 then the management vector is last; if it is 1 then
+ * the management vector is first.
+ *
+ * gve arranges the msix vectors so that the management vector is last.
+ */
+#define GVE_NTFY_BLK_BASE_MSIX_IDX	0
+int gve_adminq_configure_device_resources(struct gve_priv *priv,
+					  dma_addr_t counter_array_bus_addr,
+					  u32 num_counters,
+					  dma_addr_t db_array_bus_addr,
+					  u32 num_ntfy_blks)
+{
+	union gve_adminq_command cmd;
+
+	memset(&cmd, 0, sizeof(cmd));
+	cmd.opcode = cpu_to_be32(GVE_ADMINQ_CONFIGURE_DEVICE_RESOURCES);
+	cmd.configure_device_resources =
+		(struct gve_adminq_configure_device_resources) {
+		.counter_array = cpu_to_be64(counter_array_bus_addr),
+		.num_counters = cpu_to_be32(num_counters),
+		.irq_db_addr = cpu_to_be64(db_array_bus_addr),
+		.num_irq_dbs = cpu_to_be32(num_ntfy_blks),
+		.irq_db_stride = cpu_to_be32(sizeof(*priv->irq_dbs)),
+		.ntfy_blk_msix_base_idx =
+					cpu_to_be32(GVE_NTFY_BLK_BASE_MSIX_IDX),
+		.queue_format = priv->queue_format,
+	};
+
+	return gve_adminq_execute_cmd(priv, &cmd);
+}
+
+int gve_adminq_deconfigure_device_resources(struct gve_priv *priv)
+{
+	union gve_adminq_command cmd;
+
+	memset(&cmd, 0, sizeof(cmd));
+	cmd.opcode = cpu_to_be32(GVE_ADMINQ_DECONFIGURE_DEVICE_RESOURCES);
+
+	return gve_adminq_execute_cmd(priv, &cmd);
+}
+
+static int gve_adminq_create_tx_queue(struct gve_priv *priv, u32 queue_index)
+{
+	struct gve_tx_queue *txq = priv->txqs[queue_index];
+	union gve_adminq_command cmd;
+
+	memset(&cmd, 0, sizeof(cmd));
+	cmd.opcode = cpu_to_be32(GVE_ADMINQ_CREATE_TX_QUEUE);
+	cmd.create_tx_queue = (struct gve_adminq_create_tx_queue) {
+		.queue_id = cpu_to_be32(queue_index),
+		.queue_resources_addr =
+			cpu_to_be64(txq->qres_mz->iova),
+		.tx_ring_addr = cpu_to_be64(txq->tx_ring_phys_addr),
+		.ntfy_id = cpu_to_be32(txq->ntfy_id),
+	};
+
+	if (gve_is_gqi(priv)) {
+		u32 qpl_id = priv->queue_format == GVE_GQI_RDA_FORMAT ?
+			GVE_RAW_ADDRESSING_QPL_ID : txq->qpl->id;
+
+		cmd.create_tx_queue.queue_page_list_id = cpu_to_be32(qpl_id);
+	} else {
+		cmd.create_tx_queue.tx_ring_size =
+			cpu_to_be16(txq->nb_tx_desc);
+		cmd.create_tx_queue.tx_comp_ring_addr =
+			cpu_to_be64(txq->complq->tx_ring_phys_addr);
+		cmd.create_tx_queue.tx_comp_ring_size =
+			cpu_to_be16(priv->tx_compq_size);
+	}
+
+	return gve_adminq_issue_cmd(priv, &cmd);
+}
+
+int gve_adminq_create_tx_queues(struct gve_priv *priv, u32 num_queues)
+{
+	int err;
+	u32 i;
+
+	for (i = 0; i < num_queues; i++) {
+		err = gve_adminq_create_tx_queue(priv, i);
+		if (err)
+			return err;
+	}
+
+	return gve_adminq_kick_and_wait(priv);
+}
+
+static int gve_adminq_create_rx_queue(struct gve_priv *priv, u32 queue_index)
+{
+	struct gve_rx_queue *rxq = priv->rxqs[queue_index];
+	union gve_adminq_command cmd;
+
+	memset(&cmd, 0, sizeof(cmd));
+	cmd.opcode = cpu_to_be32(GVE_ADMINQ_CREATE_RX_QUEUE);
+	cmd.create_rx_queue = (struct gve_adminq_create_rx_queue) {
+		.queue_id = cpu_to_be32(queue_index),
+		.ntfy_id = cpu_to_be32(rxq->ntfy_id),
+		.queue_resources_addr = cpu_to_be64(rxq->qres_mz->iova),
+	};
+
+	if (gve_is_gqi(priv)) {
+		u32 qpl_id = priv->queue_format == GVE_GQI_RDA_FORMAT ?
+			GVE_RAW_ADDRESSING_QPL_ID : rxq->qpl->id;
+
+		cmd.create_rx_queue.rx_desc_ring_addr =
+			cpu_to_be64(rxq->mz->iova),
+		cmd.create_rx_queue.rx_data_ring_addr =
+			cpu_to_be64(rxq->data_mz->iova),
+		cmd.create_rx_queue.index = cpu_to_be32(queue_index);
+		cmd.create_rx_queue.queue_page_list_id = cpu_to_be32(qpl_id);
+		cmd.create_rx_queue.packet_buffer_size = cpu_to_be16(rxq->rx_buf_len);
+	} else {
+		cmd.create_rx_queue.rx_ring_size =
+			cpu_to_be16(priv->rx_desc_cnt);
+		cmd.create_rx_queue.rx_desc_ring_addr =
+			cpu_to_be64(rxq->rx_ring_phys_addr);
+		cmd.create_rx_queue.rx_data_ring_addr =
+			cpu_to_be64(rxq->bufq->rx_ring_phys_addr);
+		cmd.create_rx_queue.packet_buffer_size =
+			cpu_to_be16(rxq->rx_buf_len);
+		cmd.create_rx_queue.rx_buff_ring_size =
+			cpu_to_be16(priv->rx_bufq_size);
+		cmd.create_rx_queue.enable_rsc = !!(priv->enable_lsc);
+	}
+
+	return gve_adminq_issue_cmd(priv, &cmd);
+}
+
+int gve_adminq_create_rx_queues(struct gve_priv *priv, u32 num_queues)
+{
+	int err;
+	u32 i;
+
+	for (i = 0; i < num_queues; i++) {
+		err = gve_adminq_create_rx_queue(priv, i);
+		if (err)
+			return err;
+	}
+
+	return gve_adminq_kick_and_wait(priv);
+}
+
+static int gve_adminq_destroy_tx_queue(struct gve_priv *priv, u32 queue_index)
+{
+	union gve_adminq_command cmd;
+	int err;
+
+	memset(&cmd, 0, sizeof(cmd));
+	cmd.opcode = cpu_to_be32(GVE_ADMINQ_DESTROY_TX_QUEUE);
+	cmd.destroy_tx_queue = (struct gve_adminq_destroy_tx_queue) {
+		.queue_id = cpu_to_be32(queue_index),
+	};
+
+	err = gve_adminq_issue_cmd(priv, &cmd);
+	if (err)
+		return err;
+
+	return 0;
+}
+
+int gve_adminq_destroy_tx_queues(struct gve_priv *priv, u32 num_queues)
+{
+	int err;
+	u32 i;
+
+	for (i = 0; i < num_queues; i++) {
+		err = gve_adminq_destroy_tx_queue(priv, i);
+		if (err)
+			return err;
+	}
+
+	return gve_adminq_kick_and_wait(priv);
+}
+
+static int gve_adminq_destroy_rx_queue(struct gve_priv *priv, u32 queue_index)
+{
+	union gve_adminq_command cmd;
+	int err;
+
+	memset(&cmd, 0, sizeof(cmd));
+	cmd.opcode = cpu_to_be32(GVE_ADMINQ_DESTROY_RX_QUEUE);
+	cmd.destroy_rx_queue = (struct gve_adminq_destroy_rx_queue) {
+		.queue_id = cpu_to_be32(queue_index),
+	};
+
+	err = gve_adminq_issue_cmd(priv, &cmd);
+	if (err)
+		return err;
+
+	return 0;
+}
+
+int gve_adminq_destroy_rx_queues(struct gve_priv *priv, u32 num_queues)
+{
+	int err;
+	u32 i;
+
+	for (i = 0; i < num_queues; i++) {
+		err = gve_adminq_destroy_rx_queue(priv, i);
+		if (err)
+			return err;
+	}
+
+	return gve_adminq_kick_and_wait(priv);
+}
+
+static int gve_set_desc_cnt(struct gve_priv *priv,
+			    struct gve_device_descriptor *descriptor)
+{
+	priv->tx_desc_cnt = be16_to_cpu(descriptor->tx_queue_entries);
+	if (priv->tx_desc_cnt * sizeof(priv->txqs[0]->tx_desc_ring[0])
+	    < PAGE_SIZE) {
+		PMD_DRV_LOG(ERR, "Tx desc count %d too low", priv->tx_desc_cnt);
+		return -EINVAL;
+	}
+	priv->rx_desc_cnt = be16_to_cpu(descriptor->rx_queue_entries);
+	if (priv->rx_desc_cnt * sizeof(priv->rxqs[0]->rx_desc_ring[0])
+	    < PAGE_SIZE) {
+		PMD_DRV_LOG(ERR, "Rx desc count %d too low", priv->rx_desc_cnt);
+		return -EINVAL;
+	}
+	return 0;
+}
+
+static int
+gve_set_desc_cnt_dqo(struct gve_priv *priv,
+		     const struct gve_device_descriptor *descriptor,
+		     const struct gve_device_option_dqo_rda *dev_op_dqo_rda)
+{
+	priv->tx_desc_cnt = be16_to_cpu(descriptor->tx_queue_entries);
+	priv->tx_compq_size = be16_to_cpu(dev_op_dqo_rda->tx_comp_ring_entries);
+	priv->rx_desc_cnt = be16_to_cpu(descriptor->rx_queue_entries);
+	priv->rx_bufq_size = be16_to_cpu(dev_op_dqo_rda->rx_buff_ring_entries);
+
+	return 0;
+}
+
+static void gve_enable_supported_features(struct gve_priv *priv,
+					  u32 supported_features_mask,
+					  const struct gve_device_option_jumbo_frames
+						  *dev_op_jumbo_frames)
+{
+	/* Before control reaches this point, the page-size-capped max MTU from
+	 * the gve_device_descriptor field has already been stored in
+	 * priv->max_mtu. We overwrite it with the true max MTU below.
+	 */
+	if (dev_op_jumbo_frames &&
+	    (supported_features_mask & GVE_SUP_JUMBO_FRAMES_MASK)) {
+		PMD_DRV_LOG(INFO, "JUMBO FRAMES device option enabled.");
+		priv->max_mtu = be16_to_cpu(dev_op_jumbo_frames->max_mtu);
+	}
+}
+
+int gve_adminq_describe_device(struct gve_priv *priv)
+{
+	struct gve_device_option_jumbo_frames *dev_op_jumbo_frames = NULL;
+	struct gve_device_option_gqi_rda *dev_op_gqi_rda = NULL;
+	struct gve_device_option_gqi_qpl *dev_op_gqi_qpl = NULL;
+	struct gve_device_option_dqo_rda *dev_op_dqo_rda = NULL;
+	struct gve_device_descriptor *descriptor;
+	struct gve_dma_mem descriptor_dma_mem;
+	u32 supported_features_mask = 0;
+	union gve_adminq_command cmd;
+	int err = 0;
+	u8 *mac;
+	u16 mtu;
+
+	memset(&cmd, 0, sizeof(cmd));
+	descriptor = gve_alloc_dma_mem(&descriptor_dma_mem, PAGE_SIZE);
+	if (!descriptor)
+		return -ENOMEM;
+	cmd.opcode = cpu_to_be32(GVE_ADMINQ_DESCRIBE_DEVICE);
+	cmd.describe_device.device_descriptor_addr =
+					cpu_to_be64(descriptor_dma_mem.pa);
+	cmd.describe_device.device_descriptor_version =
+			cpu_to_be32(GVE_ADMINQ_DEVICE_DESCRIPTOR_VERSION);
+	cmd.describe_device.available_length = cpu_to_be32(PAGE_SIZE);
+
+	err = gve_adminq_execute_cmd(priv, &cmd);
+	if (err)
+		goto free_device_descriptor;
+
+	err = gve_process_device_options(priv, descriptor, &dev_op_gqi_rda,
+					 &dev_op_gqi_qpl, &dev_op_dqo_rda,
+					 &dev_op_jumbo_frames);
+	if (err)
+		goto free_device_descriptor;
+
+	/* If the GQI_RAW_ADDRESSING option is not enabled and the queue format
+	 * is not set to GqiRda, choose the queue format in a priority order:
+	 * DqoRda, GqiRda, GqiQpl. Use GqiQpl as default.
+	 */
+	if (dev_op_dqo_rda) {
+		priv->queue_format = GVE_DQO_RDA_FORMAT;
+		PMD_DRV_LOG(INFO, "Driver is running with DQO RDA queue format.");
+		supported_features_mask =
+			be32_to_cpu(dev_op_dqo_rda->supported_features_mask);
+	} else if (dev_op_gqi_rda) {
+		priv->queue_format = GVE_GQI_RDA_FORMAT;
+		PMD_DRV_LOG(INFO, "Driver is running with GQI RDA queue format.");
+		supported_features_mask =
+			be32_to_cpu(dev_op_gqi_rda->supported_features_mask);
+	} else if (priv->queue_format == GVE_GQI_RDA_FORMAT) {
+		PMD_DRV_LOG(INFO, "Driver is running with GQI RDA queue format.");
+	} else {
+		priv->queue_format = GVE_GQI_QPL_FORMAT;
+		if (dev_op_gqi_qpl)
+			supported_features_mask =
+				be32_to_cpu(dev_op_gqi_qpl->supported_features_mask);
+		PMD_DRV_LOG(INFO, "Driver is running with GQI QPL queue format.");
+	}
+	if (gve_is_gqi(priv)) {
+		err = gve_set_desc_cnt(priv, descriptor);
+	} else {
+		/* DQO supports LRO. */
+		err = gve_set_desc_cnt_dqo(priv, descriptor, dev_op_dqo_rda);
+	}
+	if (err)
+		goto free_device_descriptor;
+
+	priv->max_registered_pages =
+				be64_to_cpu(descriptor->max_registered_pages);
+	mtu = be16_to_cpu(descriptor->mtu);
+	if (mtu < ETH_MIN_MTU) {
+		PMD_DRV_LOG(ERR, "MTU %d below minimum MTU", mtu);
+		err = -EINVAL;
+		goto free_device_descriptor;
+	}
+	priv->max_mtu = mtu;
+	priv->num_event_counters = be16_to_cpu(descriptor->counters);
+	rte_memcpy(priv->dev_addr.addr_bytes, descriptor->mac, ETH_ALEN);
+	mac = descriptor->mac;
+	PMD_DRV_LOG(INFO, "MAC addr: %02x:%02x:%02x:%02x:%02x:%02x",
+		    mac[0], mac[1], mac[2], mac[3], mac[4], mac[5]);
+	priv->tx_pages_per_qpl = be16_to_cpu(descriptor->tx_pages_per_qpl);
+	priv->rx_data_slot_cnt = be16_to_cpu(descriptor->rx_pages_per_qpl);
+
+	if (gve_is_gqi(priv) && priv->rx_data_slot_cnt < priv->rx_desc_cnt) {
+		PMD_DRV_LOG(ERR, "rx_data_slot_cnt cannot be smaller than rx_desc_cnt, setting rx_desc_cnt down to %d",
+			    priv->rx_data_slot_cnt);
+		priv->rx_desc_cnt = priv->rx_data_slot_cnt;
+	}
+	priv->default_num_queues = be16_to_cpu(descriptor->default_num_queues);
+
+	gve_enable_supported_features(priv, supported_features_mask,
+				      dev_op_jumbo_frames);
+
+free_device_descriptor:
+	gve_free_dma_mem(&descriptor_dma_mem);
+	return err;
+}
+
+int gve_adminq_register_page_list(struct gve_priv *priv,
+				  struct gve_queue_page_list *qpl)
+{
+	struct gve_dma_mem page_list_dma_mem;
+	u32 num_entries = qpl->num_entries;
+	u32 size = num_entries * sizeof(qpl->page_buses[0]);
+	union gve_adminq_command cmd;
+	__be64 *page_list;
+	int err;
+	u32 i;
+
+	memset(&cmd, 0, sizeof(cmd));
+	page_list = gve_alloc_dma_mem(&page_list_dma_mem, size);
+	if (!page_list)
+		return -ENOMEM;
+
+	for (i = 0; i < num_entries; i++)
+		page_list[i] = cpu_to_be64(qpl->page_buses[i]);
+
+	cmd.opcode = cpu_to_be32(GVE_ADMINQ_REGISTER_PAGE_LIST);
+	cmd.reg_page_list = (struct gve_adminq_register_page_list) {
+		.page_list_id = cpu_to_be32(qpl->id),
+		.num_pages = cpu_to_be32(num_entries),
+		.page_address_list_addr = cpu_to_be64(page_list_dma_mem.pa),
+	};
+
+	err = gve_adminq_execute_cmd(priv, &cmd);
+	gve_free_dma_mem(&page_list_dma_mem);
+	return err;
+}
+
+int gve_adminq_unregister_page_list(struct gve_priv *priv, u32 page_list_id)
+{
+	union gve_adminq_command cmd;
+
+	memset(&cmd, 0, sizeof(cmd));
+	cmd.opcode = cpu_to_be32(GVE_ADMINQ_UNREGISTER_PAGE_LIST);
+	cmd.unreg_page_list = (struct gve_adminq_unregister_page_list) {
+		.page_list_id = cpu_to_be32(page_list_id),
+	};
+
+	return gve_adminq_execute_cmd(priv, &cmd);
+}
+
+int gve_adminq_set_mtu(struct gve_priv *priv, u64 mtu)
+{
+	union gve_adminq_command cmd;
+
+	memset(&cmd, 0, sizeof(cmd));
+	cmd.opcode = cpu_to_be32(GVE_ADMINQ_SET_DRIVER_PARAMETER);
+	cmd.set_driver_param = (struct gve_adminq_set_driver_parameter) {
+		.parameter_type = cpu_to_be32(GVE_SET_PARAM_MTU),
+		.parameter_value = cpu_to_be64(mtu),
+	};
+
+	return gve_adminq_execute_cmd(priv, &cmd);
+}
+
+int gve_adminq_report_stats(struct gve_priv *priv, u64 stats_report_len,
+			    dma_addr_t stats_report_addr, u64 interval)
+{
+	union gve_adminq_command cmd;
+
+	memset(&cmd, 0, sizeof(cmd));
+	cmd.opcode = cpu_to_be32(GVE_ADMINQ_REPORT_STATS);
+	cmd.report_stats = (struct gve_adminq_report_stats) {
+		.stats_report_len = cpu_to_be64(stats_report_len),
+		.stats_report_addr = cpu_to_be64(stats_report_addr),
+		.interval = cpu_to_be64(interval),
+	};
+
+	return gve_adminq_execute_cmd(priv, &cmd);
+}
+
+int gve_adminq_report_link_speed(struct gve_priv *priv)
+{
+	struct gve_dma_mem link_speed_region_dma_mem;
+	union gve_adminq_command gvnic_cmd;
+	u64 *link_speed_region;
+	int err;
+
+	link_speed_region = gve_alloc_dma_mem(&link_speed_region_dma_mem,
+					      sizeof(*link_speed_region));
+
+	if (!link_speed_region)
+		return -ENOMEM;
+
+	memset(&gvnic_cmd, 0, sizeof(gvnic_cmd));
+	gvnic_cmd.opcode = cpu_to_be32(GVE_ADMINQ_REPORT_LINK_SPEED);
+	gvnic_cmd.report_link_speed.link_speed_address =
+		cpu_to_be64(link_speed_region_dma_mem.pa);
+
+	err = gve_adminq_execute_cmd(priv, &gvnic_cmd);
+
+	priv->link_speed = be64_to_cpu(*link_speed_region);
+	gve_free_dma_mem(&link_speed_region_dma_mem);
+	return err;
+}
+
+int gve_adminq_get_ptype_map_dqo(struct gve_priv *priv,
+				 struct gve_ptype_lut *ptype_lut)
+{
+	struct gve_dma_mem ptype_map_dma_mem;
+	struct gve_ptype_map *ptype_map;
+	union gve_adminq_command cmd;
+	int err = 0;
+	int i;
+
+	memset(&cmd, 0, sizeof(cmd));
+	ptype_map = gve_alloc_dma_mem(&ptype_map_dma_mem, sizeof(*ptype_map));
+	if (!ptype_map)
+		return -ENOMEM;
+
+	cmd.opcode = cpu_to_be32(GVE_ADMINQ_GET_PTYPE_MAP);
+	cmd.get_ptype_map = (struct gve_adminq_get_ptype_map) {
+		.ptype_map_len = cpu_to_be64(sizeof(*ptype_map)),
+		.ptype_map_addr = cpu_to_be64(ptype_map_dma_mem.pa),
+	};
+
+	err = gve_adminq_execute_cmd(priv, &cmd);
+	if (err)
+		goto err;
+
+	/* Populate ptype_lut. */
+	for (i = 0; i < GVE_NUM_PTYPES; i++) {
+		ptype_lut->ptypes[i].l3_type =
+			ptype_map->ptypes[i].l3_type;
+		ptype_lut->ptypes[i].l4_type =
+			ptype_map->ptypes[i].l4_type;
+	}
+err:
+	gve_free_dma_mem(&ptype_map_dma_mem);
+	return err;
+}
diff --git a/drivers/net/gve/gve_adminq.h b/drivers/net/gve/gve_adminq.h
new file mode 100644
index 0000000000..c7114cc883
--- /dev/null
+++ b/drivers/net/gve/gve_adminq.h
@@ -0,0 +1,381 @@
+/* SPDX-License-Identifier: MIT
+ * Google Virtual Ethernet (gve) driver
+ * Version: 1.3.0
+ * Copyright (C) 2015-2022 Google, Inc.
+ * Copyright(C) 2022 Intel Corporation
+ */
+
+#ifndef _GVE_ADMINQ_H
+#define _GVE_ADMINQ_H
+
+/* Admin queue opcodes */
+enum gve_adminq_opcodes {
+	GVE_ADMINQ_DESCRIBE_DEVICE		= 0x1,
+	GVE_ADMINQ_CONFIGURE_DEVICE_RESOURCES	= 0x2,
+	GVE_ADMINQ_REGISTER_PAGE_LIST		= 0x3,
+	GVE_ADMINQ_UNREGISTER_PAGE_LIST		= 0x4,
+	GVE_ADMINQ_CREATE_TX_QUEUE		= 0x5,
+	GVE_ADMINQ_CREATE_RX_QUEUE		= 0x6,
+	GVE_ADMINQ_DESTROY_TX_QUEUE		= 0x7,
+	GVE_ADMINQ_DESTROY_RX_QUEUE		= 0x8,
+	GVE_ADMINQ_DECONFIGURE_DEVICE_RESOURCES	= 0x9,
+	GVE_ADMINQ_SET_DRIVER_PARAMETER		= 0xB,
+	GVE_ADMINQ_REPORT_STATS			= 0xC,
+	GVE_ADMINQ_REPORT_LINK_SPEED		= 0xD,
+	GVE_ADMINQ_GET_PTYPE_MAP		= 0xE,
+};
+
+/* Admin queue status codes */
+enum gve_adminq_statuses {
+	GVE_ADMINQ_COMMAND_UNSET			= 0x0,
+	GVE_ADMINQ_COMMAND_PASSED			= 0x1,
+	GVE_ADMINQ_COMMAND_ERROR_ABORTED		= 0xFFFFFFF0,
+	GVE_ADMINQ_COMMAND_ERROR_ALREADY_EXISTS		= 0xFFFFFFF1,
+	GVE_ADMINQ_COMMAND_ERROR_CANCELLED		= 0xFFFFFFF2,
+	GVE_ADMINQ_COMMAND_ERROR_DATALOSS		= 0xFFFFFFF3,
+	GVE_ADMINQ_COMMAND_ERROR_DEADLINE_EXCEEDED	= 0xFFFFFFF4,
+	GVE_ADMINQ_COMMAND_ERROR_FAILED_PRECONDITION	= 0xFFFFFFF5,
+	GVE_ADMINQ_COMMAND_ERROR_INTERNAL_ERROR		= 0xFFFFFFF6,
+	GVE_ADMINQ_COMMAND_ERROR_INVALID_ARGUMENT	= 0xFFFFFFF7,
+	GVE_ADMINQ_COMMAND_ERROR_NOT_FOUND		= 0xFFFFFFF8,
+	GVE_ADMINQ_COMMAND_ERROR_OUT_OF_RANGE		= 0xFFFFFFF9,
+	GVE_ADMINQ_COMMAND_ERROR_PERMISSION_DENIED	= 0xFFFFFFFA,
+	GVE_ADMINQ_COMMAND_ERROR_UNAUTHENTICATED	= 0xFFFFFFFB,
+	GVE_ADMINQ_COMMAND_ERROR_RESOURCE_EXHAUSTED	= 0xFFFFFFFC,
+	GVE_ADMINQ_COMMAND_ERROR_UNAVAILABLE		= 0xFFFFFFFD,
+	GVE_ADMINQ_COMMAND_ERROR_UNIMPLEMENTED		= 0xFFFFFFFE,
+	GVE_ADMINQ_COMMAND_ERROR_UNKNOWN_ERROR		= 0xFFFFFFFF,
+};
+
+#define GVE_ADMINQ_DEVICE_DESCRIPTOR_VERSION 1
+
+/* All AdminQ command structs should be naturally packed.
+ * GVE_CHECK_STRUCT/UNION_LEN will check struct/union length and throw
+ * error at compile time when the size is not correct.
+ */
+
+struct gve_adminq_describe_device {
+	__be64 device_descriptor_addr;
+	__be32 device_descriptor_version;
+	__be32 available_length;
+};
+
+GVE_CHECK_STRUCT_LEN(16, gve_adminq_describe_device);
+
+struct gve_device_descriptor {
+	__be64 max_registered_pages;
+	__be16 reserved1;
+	__be16 tx_queue_entries;
+	__be16 rx_queue_entries;
+	__be16 default_num_queues;
+	__be16 mtu;
+	__be16 counters;
+	__be16 tx_pages_per_qpl;
+	__be16 rx_pages_per_qpl;
+	u8  mac[ETH_ALEN];
+	__be16 num_device_options;
+	__be16 total_length;
+	u8  reserved2[6];
+};
+
+GVE_CHECK_STRUCT_LEN(40, gve_device_descriptor);
+
+struct gve_device_option {
+	__be16 option_id;
+	__be16 option_length;
+	__be32 required_features_mask;
+};
+
+GVE_CHECK_STRUCT_LEN(8, gve_device_option);
+
+struct gve_device_option_gqi_rda {
+	__be32 supported_features_mask;
+};
+
+GVE_CHECK_STRUCT_LEN(4, gve_device_option_gqi_rda);
+
+struct gve_device_option_gqi_qpl {
+	__be32 supported_features_mask;
+};
+
+GVE_CHECK_STRUCT_LEN(4, gve_device_option_gqi_qpl);
+
+struct gve_device_option_dqo_rda {
+	__be32 supported_features_mask;
+	__be16 tx_comp_ring_entries;
+	__be16 rx_buff_ring_entries;
+};
+
+GVE_CHECK_STRUCT_LEN(8, gve_device_option_dqo_rda);
+
+struct gve_device_option_jumbo_frames {
+	__be32 supported_features_mask;
+	__be16 max_mtu;
+	u8 padding[2];
+};
+
+GVE_CHECK_STRUCT_LEN(8, gve_device_option_jumbo_frames);
+
+/* Terminology:
+ *
+ * RDA - Raw DMA Addressing - Buffers associated with SKBs are directly DMA
+ *       mapped and read/updated by the device.
+ *
+ * QPL - Queue Page Lists - Driver uses bounce buffers which are DMA mapped with
+ *       the device for read/write and data is copied from/to SKBs.
+ */
+enum gve_dev_opt_id {
+	GVE_DEV_OPT_ID_GQI_RAW_ADDRESSING = 0x1,
+	GVE_DEV_OPT_ID_GQI_RDA = 0x2,
+	GVE_DEV_OPT_ID_GQI_QPL = 0x3,
+	GVE_DEV_OPT_ID_DQO_RDA = 0x4,
+	GVE_DEV_OPT_ID_JUMBO_FRAMES = 0x8,
+};
+
+enum gve_dev_opt_req_feat_mask {
+	GVE_DEV_OPT_REQ_FEAT_MASK_GQI_RAW_ADDRESSING = 0x0,
+	GVE_DEV_OPT_REQ_FEAT_MASK_GQI_RDA = 0x0,
+	GVE_DEV_OPT_REQ_FEAT_MASK_GQI_QPL = 0x0,
+	GVE_DEV_OPT_REQ_FEAT_MASK_DQO_RDA = 0x0,
+	GVE_DEV_OPT_REQ_FEAT_MASK_JUMBO_FRAMES = 0x0,
+};
+
+enum gve_sup_feature_mask {
+	GVE_SUP_JUMBO_FRAMES_MASK = 1 << 2,
+};
+
+#define GVE_DEV_OPT_LEN_GQI_RAW_ADDRESSING 0x0
+
+struct gve_adminq_configure_device_resources {
+	__be64 counter_array;
+	__be64 irq_db_addr;
+	__be32 num_counters;
+	__be32 num_irq_dbs;
+	__be32 irq_db_stride;
+	__be32 ntfy_blk_msix_base_idx;
+	u8 queue_format;
+	u8 padding[7];
+};
+
+GVE_CHECK_STRUCT_LEN(40, gve_adminq_configure_device_resources);
+
+struct gve_adminq_register_page_list {
+	__be32 page_list_id;
+	__be32 num_pages;
+	__be64 page_address_list_addr;
+};
+
+GVE_CHECK_STRUCT_LEN(16, gve_adminq_register_page_list);
+
+struct gve_adminq_unregister_page_list {
+	__be32 page_list_id;
+};
+
+GVE_CHECK_STRUCT_LEN(4, gve_adminq_unregister_page_list);
+
+#define GVE_RAW_ADDRESSING_QPL_ID 0xFFFFFFFF
+
+struct gve_adminq_create_tx_queue {
+	__be32 queue_id;
+	__be32 reserved;
+	__be64 queue_resources_addr;
+	__be64 tx_ring_addr;
+	__be32 queue_page_list_id;
+	__be32 ntfy_id;
+	__be64 tx_comp_ring_addr;
+	__be16 tx_ring_size;
+	__be16 tx_comp_ring_size;
+	u8 padding[4];
+};
+
+GVE_CHECK_STRUCT_LEN(48, gve_adminq_create_tx_queue);
+
+struct gve_adminq_create_rx_queue {
+	__be32 queue_id;
+	__be32 index;
+	__be32 reserved;
+	__be32 ntfy_id;
+	__be64 queue_resources_addr;
+	__be64 rx_desc_ring_addr;
+	__be64 rx_data_ring_addr;
+	__be32 queue_page_list_id;
+	__be16 rx_ring_size;
+	__be16 packet_buffer_size;
+	__be16 rx_buff_ring_size;
+	u8 enable_rsc;
+	u8 padding[5];
+};
+
+GVE_CHECK_STRUCT_LEN(56, gve_adminq_create_rx_queue);
+
+/* Queue resources that are shared with the device */
+struct gve_queue_resources {
+	union {
+		struct {
+			__be32 db_index;	/* Device -> Guest */
+			__be32 counter_index;	/* Device -> Guest */
+		};
+		u8 reserved[64];
+	};
+};
+
+GVE_CHECK_STRUCT_LEN(64, gve_queue_resources);
+
+struct gve_adminq_destroy_tx_queue {
+	__be32 queue_id;
+};
+
+GVE_CHECK_STRUCT_LEN(4, gve_adminq_destroy_tx_queue);
+
+struct gve_adminq_destroy_rx_queue {
+	__be32 queue_id;
+};
+
+GVE_CHECK_STRUCT_LEN(4, gve_adminq_destroy_rx_queue);
+
+/* GVE Set Driver Parameter Types */
+enum gve_set_driver_param_types {
+	GVE_SET_PARAM_MTU	= 0x1,
+};
+
+struct gve_adminq_set_driver_parameter {
+	__be32 parameter_type;
+	u8 reserved[4];
+	__be64 parameter_value;
+};
+
+GVE_CHECK_STRUCT_LEN(16, gve_adminq_set_driver_parameter);
+
+struct gve_adminq_report_stats {
+	__be64 stats_report_len;
+	__be64 stats_report_addr;
+	__be64 interval;
+};
+
+GVE_CHECK_STRUCT_LEN(24, gve_adminq_report_stats);
+
+struct gve_adminq_report_link_speed {
+	__be64 link_speed_address;
+};
+
+GVE_CHECK_STRUCT_LEN(8, gve_adminq_report_link_speed);
+
+struct stats {
+	__be32 stat_name;
+	__be32 queue_id;
+	__be64 value;
+};
+
+GVE_CHECK_STRUCT_LEN(16, stats);
+
+struct gve_stats_report {
+	__be64 written_count;
+	struct stats stats[];
+};
+
+GVE_CHECK_STRUCT_LEN(8, gve_stats_report);
+
+enum gve_stat_names {
+	/* stats from gve */
+	TX_WAKE_CNT			= 1,
+	TX_STOP_CNT			= 2,
+	TX_FRAMES_SENT			= 3,
+	TX_BYTES_SENT			= 4,
+	TX_LAST_COMPLETION_PROCESSED	= 5,
+	RX_NEXT_EXPECTED_SEQUENCE	= 6,
+	RX_BUFFERS_POSTED		= 7,
+	TX_TIMEOUT_CNT			= 8,
+	/* stats from NIC */
+	RX_QUEUE_DROP_CNT		= 65,
+	RX_NO_BUFFERS_POSTED		= 66,
+	RX_DROPS_PACKET_OVER_MRU	= 67,
+	RX_DROPS_INVALID_CHECKSUM	= 68,
+};
+
+enum gve_l3_type {
+	/* Must be zero so zero initialized LUT is unknown. */
+	GVE_L3_TYPE_UNKNOWN = 0,
+	GVE_L3_TYPE_OTHER,
+	GVE_L3_TYPE_IPV4,
+	GVE_L3_TYPE_IPV6,
+};
+
+enum gve_l4_type {
+	/* Must be zero so zero initialized LUT is unknown. */
+	GVE_L4_TYPE_UNKNOWN = 0,
+	GVE_L4_TYPE_OTHER,
+	GVE_L4_TYPE_TCP,
+	GVE_L4_TYPE_UDP,
+	GVE_L4_TYPE_ICMP,
+	GVE_L4_TYPE_SCTP,
+};
+
+/* These are control path types for PTYPE which are the same as the data path
+ * types.
+ */
+struct gve_ptype_entry {
+	u8 l3_type;
+	u8 l4_type;
+};
+
+struct gve_ptype_map {
+	struct gve_ptype_entry ptypes[1 << 10]; /* PTYPES are always 10 bits. */
+};
+
+struct gve_adminq_get_ptype_map {
+	__be64 ptype_map_len;
+	__be64 ptype_map_addr;
+};
+
+union gve_adminq_command {
+	struct {
+		__be32 opcode;
+		__be32 status;
+		union {
+			struct gve_adminq_configure_device_resources
+						configure_device_resources;
+			struct gve_adminq_create_tx_queue create_tx_queue;
+			struct gve_adminq_create_rx_queue create_rx_queue;
+			struct gve_adminq_destroy_tx_queue destroy_tx_queue;
+			struct gve_adminq_destroy_rx_queue destroy_rx_queue;
+			struct gve_adminq_describe_device describe_device;
+			struct gve_adminq_register_page_list reg_page_list;
+			struct gve_adminq_unregister_page_list unreg_page_list;
+			struct gve_adminq_set_driver_parameter set_driver_param;
+			struct gve_adminq_report_stats report_stats;
+			struct gve_adminq_report_link_speed report_link_speed;
+			struct gve_adminq_get_ptype_map get_ptype_map;
+		};
+	};
+	u8 reserved[64];
+};
+
+GVE_CHECK_UNION_LEN(64, gve_adminq_command);
+
+int gve_adminq_alloc(struct gve_priv *priv);
+void gve_adminq_free(struct gve_priv *priv);
+void gve_adminq_release(struct gve_priv *priv);
+int gve_adminq_describe_device(struct gve_priv *priv);
+int gve_adminq_configure_device_resources(struct gve_priv *priv,
+					  dma_addr_t counter_array_bus_addr,
+					  u32 num_counters,
+					  dma_addr_t db_array_bus_addr,
+					  u32 num_ntfy_blks);
+int gve_adminq_deconfigure_device_resources(struct gve_priv *priv);
+int gve_adminq_create_tx_queues(struct gve_priv *priv, u32 num_queues);
+int gve_adminq_destroy_tx_queues(struct gve_priv *priv, u32 queue_id);
+int gve_adminq_create_rx_queues(struct gve_priv *priv, u32 num_queues);
+int gve_adminq_destroy_rx_queues(struct gve_priv *priv, u32 queue_id);
+int gve_adminq_register_page_list(struct gve_priv *priv,
+				  struct gve_queue_page_list *qpl);
+int gve_adminq_unregister_page_list(struct gve_priv *priv, u32 page_list_id);
+int gve_adminq_set_mtu(struct gve_priv *priv, u64 mtu);
+int gve_adminq_report_stats(struct gve_priv *priv, u64 stats_report_len,
+			    dma_addr_t stats_report_addr, u64 interval);
+int gve_adminq_report_link_speed(struct gve_priv *priv);
+
+struct gve_ptype_lut;
+int gve_adminq_get_ptype_map_dqo(struct gve_priv *priv,
+				 struct gve_ptype_lut *ptype_lut);
+
+#endif /* _GVE_ADMINQ_H */
diff --git a/drivers/net/gve/gve_desc.h b/drivers/net/gve/gve_desc.h
new file mode 100644
index 0000000000..b531669bc0
--- /dev/null
+++ b/drivers/net/gve/gve_desc.h
@@ -0,0 +1,137 @@
+/* SPDX-License-Identifier: MIT
+ * Google Virtual Ethernet (gve) driver
+ * Version: 1.3.0
+ * Copyright (C) 2015-2022 Google, Inc.
+ */
+
+/* GVE Transmit Descriptor formats */
+
+#ifndef _GVE_DESC_H_
+#define _GVE_DESC_H_
+
+/* A note on seg_addrs
+ *
+ * Base addresses encoded in seg_addr are not assumed to be physical
+ * addresses. The ring format assumes these come from some linear address
+ * space. This could be physical memory, kernel virtual memory, user virtual
+ * memory.
+ * If raw dma addressing is not supported then gVNIC uses lists of registered
+ * pages. Each queue is assumed to be associated with a single such linear
+ * address space to ensure a consistent meaning for seg_addrs posted to its
+ * rings.
+ */
+
+struct gve_tx_pkt_desc {
+	u8	type_flags;  /* desc type is lower 4 bits, flags upper */
+	u8	l4_csum_offset;  /* relative offset of L4 csum word */
+	u8	l4_hdr_offset;  /* Offset of start of L4 headers in packet */
+	u8	desc_cnt;  /* Total descriptors for this packet */
+	__be16	len;  /* Total length of this packet (in bytes) */
+	__be16	seg_len;  /* Length of this descriptor's segment */
+	__be64	seg_addr;  /* Base address (see note) of this segment */
+} __packed;
+
+struct gve_tx_mtd_desc {
+	u8      type_flags;     /* type is lower 4 bits, subtype upper  */
+	u8      path_state;     /* state is lower 4 bits, hash type upper */
+	__be16  reserved0;
+	__be32  path_hash;
+	__be64  reserved1;
+} __packed;
+
+struct gve_tx_seg_desc {
+	u8	type_flags;	/* type is lower 4 bits, flags upper	*/
+	u8	l3_offset;	/* TSO: 2 byte units to start of IPH	*/
+	__be16	reserved;
+	__be16	mss;		/* TSO MSS				*/
+	__be16	seg_len;
+	__be64	seg_addr;
+} __packed;
+
+/* GVE Transmit Descriptor Types */
+#define	GVE_TXD_STD		(0x0 << 4) /* Std with Host Address	*/
+#define	GVE_TXD_TSO		(0x1 << 4) /* TSO with Host Address	*/
+#define	GVE_TXD_SEG		(0x2 << 4) /* Seg with Host Address	*/
+#define	GVE_TXD_MTD		(0x3 << 4) /* Metadata			*/
+
+/* GVE Transmit Descriptor Flags for Std Pkts */
+#define	GVE_TXF_L4CSUM	BIT(0)	/* Need csum offload */
+#define	GVE_TXF_TSTAMP	BIT(2)	/* Timestamp required */
+
+/* GVE Transmit Descriptor Flags for TSO Segs */
+#define	GVE_TXSF_IPV6	BIT(1)	/* IPv6 TSO */
+
+/* GVE Transmit Descriptor Options for MTD Segs */
+#define GVE_MTD_SUBTYPE_PATH		0
+
+#define GVE_MTD_PATH_STATE_DEFAULT	0
+#define GVE_MTD_PATH_STATE_TIMEOUT	1
+#define GVE_MTD_PATH_STATE_CONGESTION	2
+#define GVE_MTD_PATH_STATE_RETRANSMIT	3
+
+#define GVE_MTD_PATH_HASH_NONE         (0x0 << 4)
+#define GVE_MTD_PATH_HASH_L4           (0x1 << 4)
+
+/* GVE Receive Packet Descriptor */
+/* The start of an ethernet packet comes 2 bytes into the rx buffer.
+ * gVNIC adds this padding so that both the DMA and the L3/4 protocol header
+ * access is aligned.
+ */
+#define GVE_RX_PAD 2
+
+struct gve_rx_desc {
+	u8	padding[48];
+	__be32	rss_hash;  /* Receive-side scaling hash (Toeplitz for gVNIC) */
+	__be16	mss;
+	__be16	reserved;  /* Reserved to zero */
+	u8	hdr_len;  /* Header length (L2-L4) including padding */
+	u8	hdr_off;  /* 64-byte-scaled offset into RX_DATA entry */
+	__sum16	csum;  /* 1's-complement partial checksum of L3+ bytes */
+	__be16	len;  /* Length of the received packet */
+	__be16	flags_seq;  /* Flags [15:3] and sequence number [2:0] (1-7) */
+} __packed;
+static_assert(sizeof(struct gve_rx_desc) == 64);
+
+/* If the device supports raw dma addressing then the addr in data slot is
+ * the dma address of the buffer.
+ * If the device only supports registered segments then the addr is a byte
+ * offset into the registered segment (an ordered list of pages) where the
+ * buffer is.
+ */
+union gve_rx_data_slot {
+	__be64 qpl_offset;
+	__be64 addr;
+};
+
+/* GVE Receive Packet Descriptor Seq No */
+#define GVE_SEQNO(x) (be16_to_cpu(x) & 0x7)
+
+/* GVE Receive Packet Descriptor Flags */
+#define GVE_RXFLG(x)	cpu_to_be16(1 << (3 + (x)))
+#define	GVE_RXF_FRAG		GVE_RXFLG(3)	/* IP Fragment			*/
+#define	GVE_RXF_IPV4		GVE_RXFLG(4)	/* IPv4				*/
+#define	GVE_RXF_IPV6		GVE_RXFLG(5)	/* IPv6				*/
+#define	GVE_RXF_TCP		GVE_RXFLG(6)	/* TCP Packet			*/
+#define	GVE_RXF_UDP		GVE_RXFLG(7)	/* UDP Packet			*/
+#define	GVE_RXF_ERR		GVE_RXFLG(8)	/* Packet Error Detected	*/
+#define	GVE_RXF_PKT_CONT	GVE_RXFLG(10)	/* Multi Fragment RX packet	*/
+
+/* GVE IRQ */
+#define GVE_IRQ_ACK	BIT(31)
+#define GVE_IRQ_MASK	BIT(30)
+#define GVE_IRQ_EVENT	BIT(29)
+
+static inline bool gve_needs_rss(__be16 flag)
+{
+	if (flag & GVE_RXF_FRAG)
+		return false;
+	if (flag & (GVE_RXF_IPV4 | GVE_RXF_IPV6))
+		return true;
+	return false;
+}
+
+static inline u8 gve_next_seqno(u8 seq)
+{
+	return (seq + 1) == 8 ? 1 : seq + 1;
+}
+#endif /* _GVE_DESC_H_ */
diff --git a/drivers/net/gve/gve_desc_dqo.h b/drivers/net/gve/gve_desc_dqo.h
new file mode 100644
index 0000000000..0d533abcd1
--- /dev/null
+++ b/drivers/net/gve/gve_desc_dqo.h
@@ -0,0 +1,254 @@
+/* SPDX-License-Identifier: MIT
+ * Google Virtual Ethernet (gve) driver
+ * Version: 1.3.0
+ * Copyright (C) 2015-2022 Google, Inc.
+ */
+
+/* GVE DQO Descriptor formats */
+
+#ifndef _GVE_DESC_DQO_H_
+#define _GVE_DESC_DQO_H_
+
+#define GVE_TX_MAX_HDR_SIZE_DQO 255
+#define GVE_TX_MIN_TSO_MSS_DQO 88
+
+#ifndef __LITTLE_ENDIAN_BITFIELD
+#error "Only little endian supported"
+#endif
+
+/* Basic TX descriptor (DTYPE 0x0C) */
+struct gve_tx_pkt_desc_dqo {
+	__le64 buf_addr;
+
+	/* Must be GVE_TX_PKT_DESC_DTYPE_DQO (0xc) */
+	u8 dtype: 5;
+
+	/* Denotes the last descriptor of a packet. */
+	u8 end_of_packet: 1;
+	u8 checksum_offload_enable: 1;
+
+	/* If set, will generate a descriptor completion for this descriptor. */
+	u8 report_event: 1;
+	u8 reserved0;
+	__le16 reserved1;
+
+	/* The TX completion associated with this packet will contain this tag.
+	 */
+	__le16 compl_tag;
+	u16 buf_size: 14;
+	u16 reserved2: 2;
+} __packed;
+GVE_CHECK_STRUCT_LEN(16, gve_tx_pkt_desc_dqo);
+
+#define GVE_TX_PKT_DESC_DTYPE_DQO 0xc
+#define GVE_TX_MAX_BUF_SIZE_DQO ((16 * 1024) - 1)
+
+/* Maximum number of data descriptors allowed per packet, or per-TSO segment. */
+#define GVE_TX_MAX_DATA_DESCS 10
+
+/* Min gap between tail and head to avoid cacheline overlap */
+#define GVE_TX_MIN_DESC_PREVENT_CACHE_OVERLAP 4
+
+/* "report_event" on TX packet descriptors may only be reported on the last
+ * descriptor of a TX packet, and they must be spaced apart with at least this
+ * value.
+ */
+#define GVE_TX_MIN_RE_INTERVAL 32
+
+struct gve_tx_context_cmd_dtype {
+	u8 dtype: 5;
+	u8 tso: 1;
+	u8 reserved1: 2;
+
+	u8 reserved2;
+};
+
+GVE_CHECK_STRUCT_LEN(2, gve_tx_context_cmd_dtype);
+
+/* TX Native TSO Context DTYPE (0x05)
+ *
+ * "flex" fields allow the driver to send additional packet context to HW.
+ */
+struct gve_tx_tso_context_desc_dqo {
+	/* The L4 payload bytes that should be segmented. */
+	u32 tso_total_len: 24;
+	u32 flex10: 8;
+
+	/* Max segment size in TSO excluding headers. */
+	u16 mss: 14;
+	u16 reserved: 2;
+
+	u8 header_len; /* Header length to use for TSO offload */
+	u8 flex11;
+	struct gve_tx_context_cmd_dtype cmd_dtype;
+	u8 flex0;
+	u8 flex5;
+	u8 flex6;
+	u8 flex7;
+	u8 flex8;
+	u8 flex9;
+} __packed;
+GVE_CHECK_STRUCT_LEN(16, gve_tx_tso_context_desc_dqo);
+
+#define GVE_TX_TSO_CTX_DESC_DTYPE_DQO 0x5
+
+/* General context descriptor for sending metadata. */
+struct gve_tx_general_context_desc_dqo {
+	u8 flex4;
+	u8 flex5;
+	u8 flex6;
+	u8 flex7;
+	u8 flex8;
+	u8 flex9;
+	u8 flex10;
+	u8 flex11;
+	struct gve_tx_context_cmd_dtype cmd_dtype;
+	u16 reserved;
+	u8 flex0;
+	u8 flex1;
+	u8 flex2;
+	u8 flex3;
+} __packed;
+GVE_CHECK_STRUCT_LEN(16, gve_tx_general_context_desc_dqo);
+
+#define GVE_TX_GENERAL_CTX_DESC_DTYPE_DQO 0x4
+
+/* Logical structure of metadata which is packed into context descriptor flex
+ * fields.
+ */
+struct gve_tx_metadata_dqo {
+	union {
+		struct {
+			u8 version;
+
+			/* If `skb->l4_hash` is set, this value should be
+			 * derived from `skb->hash`.
+			 *
+			 * A zero value means no l4_hash was associated with the
+			 * skb.
+			 */
+			u16 path_hash: 15;
+
+			/* Should be set to 1 if the flow associated with the
+			 * skb had a rehash from the TCP stack.
+			 */
+			u16 rehash_event: 1;
+		}  __packed;
+		u8 bytes[12];
+	};
+}  __packed;
+GVE_CHECK_STRUCT_LEN(12, gve_tx_metadata_dqo);
+
+#define GVE_TX_METADATA_VERSION_DQO 0
+
+/* TX completion descriptor */
+struct gve_tx_compl_desc {
+	/* For types 0-4 this is the TX queue ID associated with this
+	 * completion.
+	 */
+	u16 id: 11;
+
+	/* See: GVE_COMPL_TYPE_DQO* */
+	u16 type: 3;
+	u16 reserved0: 1;
+
+	/* Flipped by HW to notify the descriptor is populated. */
+	u16 generation: 1;
+	union {
+		/* For descriptor completions, this is the last index fetched
+		 * by HW + 1.
+		 */
+		__le16 tx_head;
+
+		/* For packet completions, this is the completion tag set on the
+		 * TX packet descriptors.
+		 */
+		__le16 completion_tag;
+	};
+	__le32 reserved1;
+} __packed;
+GVE_CHECK_STRUCT_LEN(8, gve_tx_compl_desc);
+
+#define GVE_COMPL_TYPE_DQO_PKT 0x2 /* Packet completion */
+#define GVE_COMPL_TYPE_DQO_DESC 0x4 /* Descriptor completion */
+#define GVE_COMPL_TYPE_DQO_MISS 0x1 /* Miss path completion */
+#define GVE_COMPL_TYPE_DQO_REINJECTION 0x3 /* Re-injection completion */
+
+/* Descriptor to post buffers to HW on buffer queue. */
+struct gve_rx_desc_dqo {
+	__le16 buf_id; /* ID returned in Rx completion descriptor */
+	__le16 reserved0;
+	__le32 reserved1;
+	__le64 buf_addr; /* DMA address of the buffer */
+	__le64 header_buf_addr;
+	__le64 reserved2;
+} __packed;
+GVE_CHECK_STRUCT_LEN(32, gve_rx_desc_dqo);
+
+/* Descriptor for HW to notify SW of new packets received on RX queue. */
+struct gve_rx_compl_desc_dqo {
+	/* Must be 1 */
+	u8 rxdid: 4;
+	u8 reserved0: 4;
+
+	/* Packet originated from this system rather than the network. */
+	u8 loopback: 1;
+	/* Set when IPv6 packet contains a destination options header or routing
+	 * header.
+	 */
+	u8 ipv6_ex_add: 1;
+	/* Invalid packet was received. */
+	u8 rx_error: 1;
+	u8 reserved1: 5;
+
+	u16 packet_type: 10;
+	u16 ip_hdr_err: 1;
+	u16 udp_len_err: 1;
+	u16 raw_cs_invalid: 1;
+	u16 reserved2: 3;
+
+	u16 packet_len: 14;
+	/* Flipped by HW to notify the descriptor is populated. */
+	u16 generation: 1;
+	/* Should be zero. */
+	u16 buffer_queue_id: 1;
+
+	u16 header_len: 10;
+	u16 rsc: 1;
+	u16 split_header: 1;
+	u16 reserved3: 4;
+
+	u8 descriptor_done: 1;
+	u8 end_of_packet: 1;
+	u8 header_buffer_overflow: 1;
+	u8 l3_l4_processed: 1;
+	u8 csum_ip_err: 1;
+	u8 csum_l4_err: 1;
+	u8 csum_external_ip_err: 1;
+	u8 csum_external_udp_err: 1;
+
+	u8 status_error1;
+
+	__le16 reserved5;
+	__le16 buf_id; /* Buffer ID which was sent on the buffer queue. */
+
+	union {
+		/* Packet checksum. */
+		__le16 raw_cs;
+		/* Segment length for RSC packets. */
+		__le16 rsc_seg_len;
+	};
+	__le32 hash;
+	__le32 reserved6;
+	__le64 reserved7;
+} __packed;
+
+GVE_CHECK_STRUCT_LEN(32, gve_rx_compl_desc_dqo);
+
+/* Ringing the doorbell too often can hurt performance.
+ *
+ * HW requires this value to be at least 8.
+ */
+#define GVE_RX_BUF_THRESH_DQO 32
+
+#endif /* _GVE_DESC_DQO_H_ */
diff --git a/drivers/net/gve/gve_register.h b/drivers/net/gve/gve_register.h
new file mode 100644
index 0000000000..b65f336be2
--- /dev/null
+++ b/drivers/net/gve/gve_register.h
@@ -0,0 +1,28 @@
+/* SPDX-License-Identifier: MIT
+ * Google Virtual Ethernet (gve) driver
+ * Version: 1.3.0
+ * Copyright (C) 2015-2022 Google, Inc.
+ */
+
+#ifndef _GVE_REGISTER_H_
+#define _GVE_REGISTER_H_
+
+/* Fixed Configuration Registers */
+struct gve_registers {
+	__be32	device_status;
+	__be32	driver_status;
+	__be32	max_tx_queues;
+	__be32	max_rx_queues;
+	__be32	adminq_pfn;
+	__be32	adminq_doorbell;
+	__be32	adminq_event_counter;
+	u8	reserved[3];
+	u8	driver_version;
+};
+
+enum gve_device_status_flags {
+	GVE_DEVICE_STATUS_RESET_MASK		= BIT(1),
+	GVE_DEVICE_STATUS_LINK_STATUS_MASK	= BIT(2),
+	GVE_DEVICE_STATUS_REPORT_STATS_MASK	= BIT(3),
+};
+#endif /* _GVE_REGISTER_H_ */
-- 
2.25.1



* [PATCH 02/10] net/gve: add logs and OS specific implementation
  2022-07-29 19:30 [PATCH 00/10] introduce GVE PMD Xiaoyun Li
  2022-07-29 19:30 ` [PATCH 01/10] net/gve: introduce GVE PMD base code Xiaoyun Li
@ 2022-07-29 19:30 ` Xiaoyun Li
  2022-07-29 19:30 ` [PATCH 03/10] net/gve: support device initialization Xiaoyun Li
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 192+ messages in thread
From: Xiaoyun Li @ 2022-07-29 19:30 UTC (permalink / raw)
  To: junfeng.guo, qi.z.zhang, awogbemila, bruce.richardson
  Cc: dev, Xiaoyun Li, Haiyue Wang

Add GVE PMD logs.
Add some macro definitions and memory operations which are specific
to DPDK.

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com>
---
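A minimal usage sketch of the helpers added here, assuming a caller that
owns a struct gve_dma_mem; the function name example_dma_roundtrip is
invented for illustration and is not part of this patch:

#include <errno.h>
#include <inttypes.h>

#include "gve_osdep.h"

static int
example_dma_roundtrip(struct gve_dma_mem *dma)
{
	void *va = gve_alloc_dma_mem(dma, PAGE_SIZE);

	if (va == NULL) {
		PMD_DRV_LOG(ERR, "Failed to allocate DMA memory");
		return -ENOMEM;
	}
	PMD_DRV_LOG(DEBUG, "va=%p iova=0x%" PRIx64 " size=%u",
		    va, dma->pa, dma->size);
	/* ... hand the buffer to the device, then release it ... */
	gve_free_dma_mem(dma);
	return 0;
}
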
 drivers/net/gve/gve_adminq.h   |   2 +
 drivers/net/gve/gve_desc.h     |   2 +
 drivers/net/gve/gve_desc_dqo.h |   2 +
 drivers/net/gve/gve_logs.h     |  22 +++++
 drivers/net/gve/gve_osdep.h    | 149 +++++++++++++++++++++++++++++++++
 drivers/net/gve/gve_register.h |   2 +
 6 files changed, 179 insertions(+)
 create mode 100644 drivers/net/gve/gve_logs.h
 create mode 100644 drivers/net/gve/gve_osdep.h

diff --git a/drivers/net/gve/gve_adminq.h b/drivers/net/gve/gve_adminq.h
index c7114cc883..cd496760ae 100644
--- a/drivers/net/gve/gve_adminq.h
+++ b/drivers/net/gve/gve_adminq.h
@@ -8,6 +8,8 @@
 #ifndef _GVE_ADMINQ_H
 #define _GVE_ADMINQ_H
 
+#include "gve_osdep.h"
+
 /* Admin queue opcodes */
 enum gve_adminq_opcodes {
 	GVE_ADMINQ_DESCRIBE_DEVICE		= 0x1,
diff --git a/drivers/net/gve/gve_desc.h b/drivers/net/gve/gve_desc.h
index b531669bc0..049792b43e 100644
--- a/drivers/net/gve/gve_desc.h
+++ b/drivers/net/gve/gve_desc.h
@@ -9,6 +9,8 @@
 #ifndef _GVE_DESC_H_
 #define _GVE_DESC_H_
 
+#include "gve_osdep.h"
+
 /* A note on seg_addrs
  *
  * Base addresses encoded in seg_addr are not assumed to be physical
diff --git a/drivers/net/gve/gve_desc_dqo.h b/drivers/net/gve/gve_desc_dqo.h
index 0d533abcd1..5031752b43 100644
--- a/drivers/net/gve/gve_desc_dqo.h
+++ b/drivers/net/gve/gve_desc_dqo.h
@@ -9,6 +9,8 @@
 #ifndef _GVE_DESC_DQO_H_
 #define _GVE_DESC_DQO_H_
 
+#include "gve_osdep.h"
+
 #define GVE_TX_MAX_HDR_SIZE_DQO 255
 #define GVE_TX_MIN_TSO_MSS_DQO 88
 
diff --git a/drivers/net/gve/gve_logs.h b/drivers/net/gve/gve_logs.h
new file mode 100644
index 0000000000..a050253f59
--- /dev/null
+++ b/drivers/net/gve/gve_logs.h
@@ -0,0 +1,22 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2022 Intel Corporation
+ */
+
+#ifndef _GVE_LOGS_H_
+#define _GVE_LOGS_H_
+
+extern int gve_logtype_init;
+extern int gve_logtype_driver;
+
+#define PMD_INIT_LOG(level, fmt, args...) \
+	rte_log(RTE_LOG_ ## level, gve_logtype_init, "%s(): " fmt "\n", \
+		__func__, ##args)
+
+#define PMD_DRV_LOG_RAW(level, fmt, args...) \
+	rte_log(RTE_LOG_ ## level, gve_logtype_driver, "%s(): " fmt, \
+		__func__, ## args)
+
+#define PMD_DRV_LOG(level, fmt, args...) \
+	PMD_DRV_LOG_RAW(level, fmt "\n", ## args)
+
+#endif
diff --git a/drivers/net/gve/gve_osdep.h b/drivers/net/gve/gve_osdep.h
new file mode 100644
index 0000000000..92acccf846
--- /dev/null
+++ b/drivers/net/gve/gve_osdep.h
@@ -0,0 +1,149 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2022 Intel Corporation
+ */
+
+#ifndef _GVE_OSDEP_H_
+#define _GVE_OSDEP_H_
+
+#include <string.h>
+#include <stdint.h>
+#include <stdio.h>
+#include <stdarg.h>
+#include <inttypes.h>
+#include <stdbool.h>
+
+#include <rte_bitops.h>
+#include <rte_byteorder.h>
+#include <rte_common.h>
+#include <rte_ether.h>
+#include <rte_io.h>
+#include <rte_log.h>
+#include <rte_malloc.h>
+#include <rte_memcpy.h>
+#include <rte_memzone.h>
+
+#include "gve_logs.h"
+
+typedef uint8_t u8;
+typedef uint16_t u16;
+typedef uint32_t u32;
+typedef uint64_t u64;
+
+typedef rte_be16_t __sum16;
+
+typedef rte_be16_t __be16;
+typedef rte_be32_t __be32;
+typedef rte_be64_t __be64;
+
+typedef rte_iova_t dma_addr_t;
+
+#define ETH_MIN_MTU	RTE_ETHER_MIN_MTU
+#define ETH_ALEN	RTE_ETHER_ADDR_LEN
+#define PAGE_SIZE	4096
+
+#define BIT(nr)		RTE_BIT32(nr)
+
+#define be16_to_cpu(x) rte_be_to_cpu_16(x)
+#define be32_to_cpu(x) rte_be_to_cpu_32(x)
+#define be64_to_cpu(x) rte_be_to_cpu_64(x)
+
+#define cpu_to_be16(x) rte_cpu_to_be_16(x)
+#define cpu_to_be32(x) rte_cpu_to_be_32(x)
+#define cpu_to_be64(x) rte_cpu_to_be_64(x)
+
+#define READ_ONCE32(x) rte_read32(&(x))
+
+#define ____cacheline_aligned	__rte_cache_aligned
+#define __packed		__rte_packed
+#define __iomem
+
+#define msleep(ms)		rte_delay_ms(ms)
+
+/* These macros are used to generate compilation errors if a struct/union
+ * is not exactly the correct length. It gives a divide by zero error if
+ * the struct/union is not of the correct size, otherwise it creates an
+ * enum that is never used.
+ */
+#define GVE_CHECK_STRUCT_LEN(n, X) enum gve_static_assert_enum_##X \
+	{ gve_static_assert_##X = (n) / ((sizeof(struct X) == (n)) ? 1 : 0) }
+#define GVE_CHECK_UNION_LEN(n, X) enum gve_static_asset_enum_##X \
+	{ gve_static_assert_##X = (n) / ((sizeof(union X) == (n)) ? 1 : 0) }
+
+static __rte_always_inline u8
+readb(volatile void *addr)
+{
+	return rte_read8(addr);
+}
+
+static __rte_always_inline void
+writeb(u8 value, volatile void *addr)
+{
+	rte_write8(value, addr);
+}
+
+static __rte_always_inline void
+writel(u32 value, volatile void *addr)
+{
+	rte_write32(value, addr);
+}
+
+static __rte_always_inline u32
+ioread32be(const volatile void *addr)
+{
+	return rte_be_to_cpu_32(rte_read32(addr));
+}
+
+static __rte_always_inline void
+iowrite32be(u32 value, volatile void *addr)
+{
+	writel(rte_cpu_to_be_32(value), addr);
+}
+
+/* DMA memory allocation tracking */
+struct gve_dma_mem {
+	void *va;
+	rte_iova_t pa;
+	uint32_t size;
+	const void *zone;
+};
+
+static inline void *
+gve_alloc_dma_mem(struct gve_dma_mem *mem, u64 size)
+{
+	static uint16_t gve_dma_memzone_id;
+	const struct rte_memzone *mz = NULL;
+	char z_name[RTE_MEMZONE_NAMESIZE];
+
+	if (!mem)
+		return NULL;
+
+	snprintf(z_name, sizeof(z_name), "gve_dma_%u",
+		 __atomic_fetch_add(&gve_dma_memzone_id, 1, __ATOMIC_RELAXED));
+	mz = rte_memzone_reserve_aligned(z_name, size, SOCKET_ID_ANY,
+					 RTE_MEMZONE_IOVA_CONTIG,
+					 PAGE_SIZE);
+	if (!mz)
+		return NULL;
+
+	mem->size = size;
+	mem->va = mz->addr;
+	mem->pa = mz->iova;
+	mem->zone = mz;
+	PMD_DRV_LOG(DEBUG, "memzone %s is allocated", mz->name);
+
+	return mem->va;
+}
+
+static inline void
+gve_free_dma_mem(struct gve_dma_mem *mem)
+{
+	PMD_DRV_LOG(DEBUG, "memzone %s to be freed",
+		    ((const struct rte_memzone *)mem->zone)->name);
+
+	rte_memzone_free(mem->zone);
+	mem->zone = NULL;
+	mem->va = NULL;
+	mem->pa = 0;
+}
+
+#endif /* _GVE_OSDEP_H_ */
diff --git a/drivers/net/gve/gve_register.h b/drivers/net/gve/gve_register.h
index b65f336be2..a599c1a08e 100644
--- a/drivers/net/gve/gve_register.h
+++ b/drivers/net/gve/gve_register.h
@@ -7,6 +7,8 @@
 #ifndef _GVE_REGISTER_H_
 #define _GVE_REGISTER_H_
 
+#include "gve_osdep.h"
+
 /* Fixed Configuration Registers */
 struct gve_registers {
 	__be32	device_status;
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 192+ messages in thread

* [PATCH 03/10] net/gve: support device initialization
  2022-07-29 19:30 [PATCH 00/10] introduce GVE PMD Xiaoyun Li
  2022-07-29 19:30 ` [PATCH 01/10] net/gve: introduce GVE PMD base code Xiaoyun Li
  2022-07-29 19:30 ` [PATCH 02/10] net/gve: add logs and OS specific implementation Xiaoyun Li
@ 2022-07-29 19:30 ` Xiaoyun Li
  2022-07-29 19:30 ` [PATCH 04/10] net/gve: add link update support Xiaoyun Li
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 192+ messages in thread
From: Xiaoyun Li @ 2022-07-29 19:30 UTC (permalink / raw)
  To: junfeng.guo, qi.z.zhang, awogbemila, bruce.richardson
  Cc: dev, Xiaoyun Li, Haiyue Wang

Support device init and the following dev_ops:
  - dev_configure
  - dev_start
  - dev_stop
  - dev_close

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com>
---
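For context, a sketch of the application-side path these dev_ops serve,
assuming an already probed port_id; queue setup only arrives later in
this series, so just the configure/start/stop/close flow is shown:

#include <rte_ethdev.h>

static int
example_bring_up(uint16_t port_id)
{
	struct rte_eth_conf conf = { 0 };
	int ret;

	ret = rte_eth_dev_configure(port_id, 1, 1, &conf); /* gve_dev_configure */
	if (ret != 0)
		return ret;

	ret = rte_eth_dev_start(port_id);	/* gve_dev_start */
	if (ret != 0)
		return ret;

	ret = rte_eth_dev_stop(port_id);	/* gve_dev_stop */
	if (ret != 0)
		return ret;

	return rte_eth_dev_close(port_id);	/* gve_dev_close */
}
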
 drivers/net/gve/gve.h        | 249 +++++++++++++++++++++++
 drivers/net/gve/gve_adminq.c |   1 +
 drivers/net/gve/gve_ethdev.c | 375 +++++++++++++++++++++++++++++++++++
 drivers/net/gve/meson.build  |  13 ++
 drivers/net/gve/version.map  |   3 +
 drivers/net/meson.build      |   1 +
 6 files changed, 642 insertions(+)
 create mode 100644 drivers/net/gve/gve.h
 create mode 100644 drivers/net/gve/gve_ethdev.c
 create mode 100644 drivers/net/gve/meson.build
 create mode 100644 drivers/net/gve/version.map

diff --git a/drivers/net/gve/gve.h b/drivers/net/gve/gve.h
new file mode 100644
index 0000000000..704c88983c
--- /dev/null
+++ b/drivers/net/gve/gve.h
@@ -0,0 +1,249 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2022 Intel Corporation
+ */
+
+#ifndef _GVE_H_
+#define _GVE_H_
+
+#include <ethdev_driver.h>
+#include <ethdev_pci.h>
+#include <rte_ether.h>
+
+#include "gve_desc.h"
+
+#ifndef GOOGLE_VENDOR_ID
+#define GOOGLE_VENDOR_ID	0x1ae0
+#endif
+
+#define GVE_DEV_ID		0x0042
+
+#define GVE_REG_BAR	0
+#define GVE_DB_BAR	2
+
+/* 1 for management, 1 for rx, 1 for tx */
+#define GVE_MIN_MSIX		3
+
+/* PTYPEs are always 10 bits. */
+#define GVE_NUM_PTYPES	1024
+
+/* A list of pages registered with the device during setup and used by a queue
+ * as buffers
+ */
+struct gve_queue_page_list {
+	uint32_t id; /* unique id */
+	uint32_t num_entries;
+	dma_addr_t *page_buses; /* the dma addrs of the pages */
+	const struct rte_memzone *mz;
+};
+
+/* A TX desc ring entry */
+union gve_tx_desc {
+	struct gve_tx_pkt_desc pkt; /* first desc for a packet */
+	struct gve_tx_seg_desc seg; /* subsequent descs for a packet */
+};
+
+struct gve_tx_queue {
+	volatile union gve_tx_desc *tx_desc_ring;
+	const struct rte_memzone *mz;
+	uint64_t tx_ring_phys_addr;
+
+	uint16_t nb_tx_desc;
+
+	/* Only valid for DQO_QPL queue format */
+	struct gve_queue_page_list *qpl;
+
+	uint16_t port_id;
+	uint16_t queue_id;
+
+	uint16_t ntfy_id;
+	volatile rte_be32_t *ntfy_addr;
+
+	struct gve_priv *hw;
+	const struct rte_memzone *qres_mz;
+	struct gve_queue_resources *qres;
+
+	/* Only valid for DQO_RDA queue format */
+	struct gve_tx_queue *complq;
+};
+
+struct gve_rx_queue {
+	volatile struct gve_rx_desc *rx_desc_ring;
+	volatile union gve_rx_data_slot *rx_data_ring;
+	const struct rte_memzone *mz;
+	const struct rte_memzone *data_mz;
+	uint64_t rx_ring_phys_addr;
+
+	uint16_t nb_rx_desc;
+
+	volatile rte_be32_t *ntfy_addr;
+
+	/* only valid for GQI_QPL queue format */
+	struct gve_queue_page_list *qpl;
+
+	struct gve_priv *hw;
+	const struct rte_memzone *qres_mz;
+	struct gve_queue_resources *qres;
+
+	uint16_t port_id;
+	uint16_t queue_id;
+	uint16_t ntfy_id;
+	uint16_t rx_buf_len;
+
+	/* Only valid for DQO_RDA queue format */
+	struct gve_rx_queue *bufq;
+};
+
+struct gve_irq_db {
+	rte_be32_t id;
+} ____cacheline_aligned;
+
+struct gve_ptype {
+	uint8_t l3_type;  /* `gve_l3_type` in gve_adminq.h */
+	uint8_t l4_type;  /* `gve_l4_type` in gve_adminq.h */
+};
+
+struct gve_ptype_lut {
+	struct gve_ptype ptypes[GVE_NUM_PTYPES];
+};
+
+enum gve_queue_format {
+	GVE_QUEUE_FORMAT_UNSPECIFIED = 0x0, /* default unspecified */
+	GVE_GQI_RDA_FORMAT	     = 0x1, /* GQI Raw Addressing */
+	GVE_GQI_QPL_FORMAT	     = 0x2, /* GQI Queue Page List */
+	GVE_DQO_RDA_FORMAT	     = 0x3, /* DQO Raw Addressing */
+};
+
+struct gve_priv {
+	struct gve_irq_db *irq_dbs; /* array of num_ntfy_blks */
+	const struct rte_memzone *irq_dbs_mz;
+	uint32_t mgmt_msix_idx;
+	rte_be32_t *cnt_array; /* array of num_event_counters */
+	const struct rte_memzone *cnt_array_mz;
+
+	uint16_t num_event_counters;
+	uint16_t tx_desc_cnt; /* txq size */
+	uint16_t rx_desc_cnt; /* rxq size */
+	uint16_t tx_pages_per_qpl; /* tx buffer length */
+	uint16_t rx_data_slot_cnt; /* rx buffer length */
+
+	/* Only valid for DQO_RDA queue format */
+	uint16_t tx_compq_size; /* tx completion queue size */
+	uint16_t rx_bufq_size; /* rx buff queue size */
+
+	uint64_t max_registered_pages;
+	uint64_t num_registered_pages; /* num pages registered with NIC */
+	uint16_t default_num_queues; /* default num queues to set up */
+	enum gve_queue_format queue_format; /* see enum gve_queue_format */
+	uint8_t enable_lsc;
+
+	uint16_t max_nb_txq;
+	uint16_t max_nb_rxq;
+	uint32_t num_ntfy_blks; /* split between TX and RX so must be even */
+
+	struct gve_registers __iomem *reg_bar0; /* see gve_register.h */
+	rte_be32_t __iomem *db_bar2; /* "array" of doorbells */
+	struct rte_pci_device *pci_dev;
+
+	/* Admin queue - see gve_adminq.h*/
+	union gve_adminq_command *adminq;
+	struct gve_dma_mem adminq_dma_mem;
+	uint32_t adminq_mask; /* masks prod_cnt to adminq size */
+	uint32_t adminq_prod_cnt; /* free-running count of AQ cmds executed */
+	uint32_t adminq_cmd_fail; /* free-running count of AQ cmds failed */
+	uint32_t adminq_timeouts; /* free-running count of AQ cmds timeouts */
+	/* free-running count of per AQ cmd executed */
+	uint32_t adminq_describe_device_cnt;
+	uint32_t adminq_cfg_device_resources_cnt;
+	uint32_t adminq_register_page_list_cnt;
+	uint32_t adminq_unregister_page_list_cnt;
+	uint32_t adminq_create_tx_queue_cnt;
+	uint32_t adminq_create_rx_queue_cnt;
+	uint32_t adminq_destroy_tx_queue_cnt;
+	uint32_t adminq_destroy_rx_queue_cnt;
+	uint32_t adminq_dcfg_device_resources_cnt;
+	uint32_t adminq_set_driver_parameter_cnt;
+	uint32_t adminq_report_stats_cnt;
+	uint32_t adminq_report_link_speed_cnt;
+	uint32_t adminq_get_ptype_map_cnt;
+
+	volatile uint32_t state_flags;
+
+	/* Gvnic device link speed from hypervisor. */
+	uint64_t link_speed;
+
+	uint16_t max_mtu;
+	struct rte_ether_addr dev_addr; /* mac address */
+
+	struct gve_queue_page_list *qpl;
+
+	struct gve_tx_queue **txqs;
+	struct gve_rx_queue **rxqs;
+};
+
+enum gve_state_flags_bit {
+	GVE_PRIV_FLAGS_ADMIN_QUEUE_OK		= 1,
+	GVE_PRIV_FLAGS_DEVICE_RESOURCES_OK	= 2,
+	GVE_PRIV_FLAGS_DEVICE_RINGS_OK		= 3,
+	GVE_PRIV_FLAGS_NAPI_ENABLED		= 4,
+};
+
+static inline bool gve_is_gqi(struct gve_priv *priv)
+{
+	return priv->queue_format == GVE_GQI_RDA_FORMAT ||
+		priv->queue_format == GVE_GQI_QPL_FORMAT;
+}
+
+static inline bool gve_get_admin_queue_ok(struct gve_priv *priv)
+{
+	return !!rte_bit_relaxed_get32(GVE_PRIV_FLAGS_ADMIN_QUEUE_OK,
+				       &priv->state_flags);
+}
+
+static inline void gve_set_admin_queue_ok(struct gve_priv *priv)
+{
+	rte_bit_relaxed_set32(GVE_PRIV_FLAGS_ADMIN_QUEUE_OK,
+			      &priv->state_flags);
+}
+
+static inline void gve_clear_admin_queue_ok(struct gve_priv *priv)
+{
+	rte_bit_relaxed_clear32(GVE_PRIV_FLAGS_ADMIN_QUEUE_OK,
+				&priv->state_flags);
+}
+
+static inline bool gve_get_device_resources_ok(struct gve_priv *priv)
+{
+	return !!rte_bit_relaxed_get32(GVE_PRIV_FLAGS_DEVICE_RESOURCES_OK,
+				       &priv->state_flags);
+}
+
+static inline void gve_set_device_resources_ok(struct gve_priv *priv)
+{
+	rte_bit_relaxed_set32(GVE_PRIV_FLAGS_DEVICE_RESOURCES_OK,
+			      &priv->state_flags);
+}
+
+static inline void gve_clear_device_resources_ok(struct gve_priv *priv)
+{
+	rte_bit_relaxed_clear32(GVE_PRIV_FLAGS_DEVICE_RESOURCES_OK,
+				&priv->state_flags);
+}
+
+static inline bool gve_get_device_rings_ok(struct gve_priv *priv)
+{
+	return !!rte_bit_relaxed_get32(GVE_PRIV_FLAGS_DEVICE_RINGS_OK,
+				       &priv->state_flags);
+}
+
+static inline void gve_set_device_rings_ok(struct gve_priv *priv)
+{
+	rte_bit_relaxed_set32(GVE_PRIV_FLAGS_DEVICE_RINGS_OK,
+			      &priv->state_flags);
+}
+
+static inline void gve_clear_device_rings_ok(struct gve_priv *priv)
+{
+	rte_bit_relaxed_clear32(GVE_PRIV_FLAGS_DEVICE_RINGS_OK,
+				&priv->state_flags);
+}
+#endif /* _GVE_H_ */
diff --git a/drivers/net/gve/gve_adminq.c b/drivers/net/gve/gve_adminq.c
index 8a724f12c6..438ca2070e 100644
--- a/drivers/net/gve/gve_adminq.c
+++ b/drivers/net/gve/gve_adminq.c
@@ -5,6 +5,7 @@
  * Copyright(C) 2022 Intel Corporation
  */
 
+#include "gve.h"
 #include "gve_adminq.h"
 #include "gve_register.h"
 
diff --git a/drivers/net/gve/gve_ethdev.c b/drivers/net/gve/gve_ethdev.c
new file mode 100644
index 0000000000..f10f273f7d
--- /dev/null
+++ b/drivers/net/gve/gve_ethdev.c
@@ -0,0 +1,375 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2022 Intel Corporation
+ */
+#include <linux/pci_regs.h>
+
+#include "gve.h"
+#include "gve_adminq.h"
+#include "gve_register.h"
+
+#define GVE_VERSION		"1.3.0"
+#define GVE_VERSION_PREFIX	"GVE-"
+
+const char gve_version_str[] = GVE_VERSION;
+static const char gve_version_prefix[] = GVE_VERSION_PREFIX;
+
+static void
+gve_write_version(uint8_t *driver_version_register)
+{
+	const char *c = gve_version_prefix;
+
+	while (*c) {
+		writeb(*c, driver_version_register);
+		c++;
+	}
+
+	c = gve_version_str;
+	while (*c) {
+		writeb(*c, driver_version_register);
+		c++;
+	}
+	writeb('\n', driver_version_register);
+}
+
+static int
+gve_dev_configure(__rte_unused struct rte_eth_dev *dev)
+{
+	return 0;
+}
+
+static int
+gve_dev_start(struct rte_eth_dev *dev)
+{
+	dev->data->dev_started = 1;
+
+	return 0;
+}
+
+static int
+gve_dev_stop(struct rte_eth_dev *dev)
+{
+	dev->data->dev_link.link_status = RTE_ETH_LINK_DOWN;
+	dev->data->dev_started = 0;
+
+	return 0;
+}
+
+static int
+gve_dev_close(struct rte_eth_dev *dev)
+{
+	int err = 0;
+
+	if (dev->data->dev_started) {
+		err = gve_dev_stop(dev);
+		if (err != 0)
+			PMD_DRV_LOG(ERR, "Failed to stop dev.");
+	}
+
+	return err;
+}
+
+static const struct eth_dev_ops gve_eth_dev_ops = {
+	.dev_configure        = gve_dev_configure,
+	.dev_start            = gve_dev_start,
+	.dev_stop             = gve_dev_stop,
+	.dev_close            = gve_dev_close,
+};
+
+static void
+gve_free_counter_array(struct gve_priv *priv)
+{
+	rte_memzone_free(priv->cnt_array_mz);
+	priv->cnt_array = NULL;
+}
+
+static void
+gve_free_irq_db(struct gve_priv *priv)
+{
+	rte_memzone_free(priv->irq_dbs_mz);
+	priv->irq_dbs = NULL;
+}
+
+static void
+gve_teardown_device_resources(struct gve_priv *priv)
+{
+	int err;
+
+	/* Tell device its resources are being freed */
+	if (gve_get_device_resources_ok(priv)) {
+		err = gve_adminq_deconfigure_device_resources(priv);
+		if (err)
+			PMD_DRV_LOG(ERR, "Could not deconfigure device resources: err=%d\n", err);
+	}
+	gve_free_counter_array(priv);
+	gve_free_irq_db(priv);
+	gve_clear_device_resources_ok(priv);
+}
+
+static uint8_t
+pci_dev_find_capability(struct rte_pci_device *pdev, int cap)
+{
+	uint8_t pos, id;
+	uint16_t ent;
+	int loops;
+	int ret;
+
+	ret = rte_pci_read_config(pdev, &pos, sizeof(pos), PCI_CAPABILITY_LIST);
+	if (ret != sizeof(pos))
+		return 0;
+
+	loops = (PCI_CFG_SPACE_SIZE - PCI_STD_HEADER_SIZEOF) / PCI_CAP_SIZEOF;
+
+	while (pos && loops--) {
+		ret = rte_pci_read_config(pdev, &ent, sizeof(ent), pos);
+		if (ret != sizeof(ent))
+			return 0;
+
+		id = ent & 0xff;
+		if (id == 0xff)
+			break;
+
+		if (id == cap)
+			return pos;
+
+		pos = (ent >> 8);
+	}
+
+	return 0;
+}
+
+static int
+pci_dev_msix_vec_count(struct rte_pci_device *pdev)
+{
+	uint8_t msix_cap = pci_dev_find_capability(pdev, PCI_CAP_ID_MSIX);
+	uint16_t control;
+	int ret;
+
+	if (!msix_cap)
+		return 0;
+
+	ret = rte_pci_read_config(pdev, &control, sizeof(control), msix_cap + PCI_MSIX_FLAGS);
+	if (ret != sizeof(control))
+		return 0;
+
+	return (control & PCI_MSIX_FLAGS_QSIZE) + 1;
+}
+
+static int
+gve_setup_device_resources(struct gve_priv *priv)
+{
+	char z_name[RTE_MEMZONE_NAMESIZE];
+	const struct rte_memzone *mz;
+	int err = 0;
+
+	snprintf(z_name, sizeof(z_name), "gve_%s_cnt_arr", priv->pci_dev->device.name);
+	mz = rte_memzone_reserve_aligned(z_name,
+					 priv->num_event_counters * sizeof(*priv->cnt_array),
+					 rte_socket_id(), RTE_MEMZONE_IOVA_CONTIG,
+					 PAGE_SIZE);
+	if (mz == NULL) {
+		PMD_INIT_LOG(ERR, "Could not alloc memzone for count array");
+		return -ENOMEM;
+	}
+	priv->cnt_array = (rte_be32_t *)mz->addr;
+	priv->cnt_array_mz = mz;
+
+	snprintf(z_name, sizeof(z_name), "gve_%s_irqmz", priv->pci_dev->device.name);
+	mz = rte_memzone_reserve_aligned(z_name,
+					 sizeof(*priv->irq_dbs) * (priv->num_ntfy_blks),
+					 rte_socket_id(), RTE_MEMZONE_IOVA_CONTIG,
+					 PAGE_SIZE);
+	if (mz == NULL) {
+		PMD_INIT_LOG(ERR, "Could not alloc memzone for irq_dbs");
+		err = -ENOMEM;
+		goto free_cnt_array;
+	}
+	priv->irq_dbs = (struct gve_irq_db *)mz->addr;
+	priv->irq_dbs_mz = mz;
+
+	err = gve_adminq_configure_device_resources(priv,
+						    priv->cnt_array_mz->iova,
+						    priv->num_event_counters,
+						    priv->irq_dbs_mz->iova,
+						    priv->num_ntfy_blks);
+	if (unlikely(err)) {
+		PMD_INIT_LOG(ERR, "Could not config device resources: err=%d", err);
+		goto free_irq_dbs;
+	}
+	return 0;
+
+free_irq_dbs:
+	gve_free_irq_db(priv);
+free_cnt_array:
+	gve_free_counter_array(priv);
+
+	return err;
+}
+
+static int
+gve_init_priv(struct gve_priv *priv, bool skip_describe_device)
+{
+	int num_ntfy;
+	int err;
+
+	/* Set up the adminq */
+	err = gve_adminq_alloc(priv);
+	if (err) {
+		PMD_INIT_LOG(ERR, "Failed to alloc admin queue: err=%d", err);
+		return err;
+	}
+
+	if (skip_describe_device)
+		goto setup_device;
+
+	/* Get the initial information we need from the device */
+	err = gve_adminq_describe_device(priv);
+	if (err) {
+		PMD_INIT_LOG(ERR, "Could not get device information: err=%d", err);
+		goto free_adminq;
+	}
+
+	num_ntfy = pci_dev_msix_vec_count(priv->pci_dev);
+	if (num_ntfy <= 0) {
+		PMD_DRV_LOG(ERR, "Could not count MSI-x vectors");
+		err = -EIO;
+		goto free_adminq;
+	} else if (num_ntfy < GVE_MIN_MSIX) {
+		PMD_DRV_LOG(ERR, "GVE needs at least %d MSI-x vectors, but only has %d",
+			    GVE_MIN_MSIX, num_ntfy);
+		err = -EINVAL;
+		goto free_adminq;
+	}
+
+	priv->num_registered_pages = 0;
+
+	/* gvnic has one Notification Block per MSI-x vector, except for the
+	 * management vector
+	 */
+	priv->num_ntfy_blks = (num_ntfy - 1) & ~0x1;
+	priv->mgmt_msix_idx = priv->num_ntfy_blks;
+
+	priv->max_nb_txq = RTE_MIN(priv->max_nb_txq, priv->num_ntfy_blks / 2);
+	priv->max_nb_rxq = RTE_MIN(priv->max_nb_rxq, priv->num_ntfy_blks / 2);
+
+	if (priv->default_num_queues > 0) {
+		priv->max_nb_txq = RTE_MIN(priv->default_num_queues, priv->max_nb_txq);
+		priv->max_nb_rxq = RTE_MIN(priv->default_num_queues, priv->max_nb_rxq);
+	}
+
+	PMD_DRV_LOG(INFO, "Max TX queues %d, Max RX queues %d",
+		    priv->max_nb_txq, priv->max_nb_rxq);
+
+setup_device:
+	err = gve_setup_device_resources(priv);
+	if (!err)
+		return 0;
+free_adminq:
+	gve_adminq_free(priv);
+	return err;
+}
+
+static void
+gve_teardown_priv_resources(struct gve_priv *priv)
+{
+	gve_teardown_device_resources(priv);
+	gve_adminq_free(priv);
+}
+
+static int
+gve_dev_init(struct rte_eth_dev *eth_dev)
+{
+	struct gve_priv *priv = eth_dev->data->dev_private;
+	int max_tx_queues, max_rx_queues;
+	struct rte_pci_device *pci_dev;
+	struct gve_registers *reg_bar;
+	rte_be32_t *db_bar;
+	int err;
+
+	eth_dev->dev_ops = &gve_eth_dev_ops;
+
+	if (rte_eal_process_type() != RTE_PROC_PRIMARY)
+		return 0;
+
+	pci_dev = RTE_DEV_TO_PCI(eth_dev->device);
+
+	reg_bar = pci_dev->mem_resource[GVE_REG_BAR].addr;
+	if (!reg_bar) {
+		PMD_INIT_LOG(ERR, "Failed to map pci bar!\n");
+		return -ENOMEM;
+	}
+
+	db_bar = pci_dev->mem_resource[GVE_DB_BAR].addr;
+	if (!db_bar) {
+		PMD_INIT_LOG(ERR, "Failed to map doorbell bar!\n");
+		return -ENOMEM;
+	}
+
+	gve_write_version(&reg_bar->driver_version);
+	/* Get max queues to alloc etherdev */
+	max_tx_queues = ioread32be(&reg_bar->max_tx_queues);
+	max_rx_queues = ioread32be(&reg_bar->max_rx_queues);
+
+	priv->reg_bar0 = reg_bar;
+	priv->db_bar2 = db_bar;
+	priv->pci_dev = pci_dev;
+	priv->state_flags = 0x0;
+
+	priv->max_nb_txq = max_tx_queues;
+	priv->max_nb_rxq = max_rx_queues;
+
+	err = gve_init_priv(priv, false);
+	if (err)
+		return err;
+
+	eth_dev->data->mac_addrs = rte_zmalloc("gve_mac", sizeof(struct rte_ether_addr), 0);
+	if (!eth_dev->data->mac_addrs) {
+		PMD_INIT_LOG(ERR, "Failed to allocate memory to store mac address");
+		return -ENOMEM;
+	}
+	rte_ether_addr_copy(&priv->dev_addr, eth_dev->data->mac_addrs);
+
+	return 0;
+}
+
+static int
+gve_dev_uninit(struct rte_eth_dev *eth_dev)
+{
+	struct gve_priv *priv = eth_dev->data->dev_private;
+
+	eth_dev->data->mac_addrs = NULL;
+
+	gve_teardown_priv_resources(priv);
+
+	return 0;
+}
+
+static int
+gve_pci_probe(__rte_unused struct rte_pci_driver *pci_drv,
+	      struct rte_pci_device *pci_dev)
+{
+	return rte_eth_dev_pci_generic_probe(pci_dev, sizeof(struct gve_priv), gve_dev_init);
+}
+
+static int
+gve_pci_remove(struct rte_pci_device *pci_dev)
+{
+	return rte_eth_dev_pci_generic_remove(pci_dev, gve_dev_uninit);
+}
+
+static const struct rte_pci_id pci_id_gve_map[] = {
+	{ RTE_PCI_DEVICE(GOOGLE_VENDOR_ID, GVE_DEV_ID) },
+	{ .device_id = 0 },
+};
+
+static struct rte_pci_driver rte_gve_pmd = {
+	.id_table = pci_id_gve_map,
+	.drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC,
+	.probe = gve_pci_probe,
+	.remove = gve_pci_remove,
+};
+
+RTE_PMD_REGISTER_PCI(net_gve, rte_gve_pmd);
+RTE_PMD_REGISTER_PCI_TABLE(net_gve, pci_id_gve_map);
+RTE_PMD_REGISTER_KMOD_DEP(net_gve, "* igb_uio | vfio-pci");
+RTE_LOG_REGISTER_SUFFIX(gve_logtype_init, init, NOTICE);
+RTE_LOG_REGISTER_SUFFIX(gve_logtype_driver, driver, NOTICE);
diff --git a/drivers/net/gve/meson.build b/drivers/net/gve/meson.build
new file mode 100644
index 0000000000..9a22cc9abe
--- /dev/null
+++ b/drivers/net/gve/meson.build
@@ -0,0 +1,13 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(C) 2022 Intel Corporation
+
+if is_windows
+    build = false
+    reason = 'not supported on Windows'
+    subdir_done()
+endif
+
+sources = files(
+        'gve_adminq.c',
+        'gve_ethdev.c',
+)
diff --git a/drivers/net/gve/version.map b/drivers/net/gve/version.map
new file mode 100644
index 0000000000..c2e0723b4c
--- /dev/null
+++ b/drivers/net/gve/version.map
@@ -0,0 +1,3 @@
+DPDK_22 {
+	local: *;
+};
diff --git a/drivers/net/meson.build b/drivers/net/meson.build
index e35652fe63..f1a0ee2cef 100644
--- a/drivers/net/meson.build
+++ b/drivers/net/meson.build
@@ -23,6 +23,7 @@ drivers = [
         'enic',
         'failsafe',
         'fm10k',
+        'gve',
         'hinic',
         'hns3',
         'i40e',
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 192+ messages in thread

* [PATCH 04/10] net/gve: add link update support
  2022-07-29 19:30 [PATCH 00/10] introduce GVE PMD Xiaoyun Li
                   ` (2 preceding siblings ...)
  2022-07-29 19:30 ` [PATCH 03/10] net/gve: support device initialization Xiaoyun Li
@ 2022-07-29 19:30 ` Xiaoyun Li
  2022-07-29 19:30 ` [PATCH 05/10] net/gve: add MTU set support Xiaoyun Li
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 192+ messages in thread
From: Xiaoyun Li @ 2022-07-29 19:30 UTC (permalink / raw)
  To: junfeng.guo, qi.z.zhang, awogbemila, bruce.richardson; +Cc: dev, Xiaoyun Li

Support dev_ops link_update.

Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com>
---
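A short sketch of how the new dev_op is observed from an application,
assuming a started port_id; everything below is existing ethdev API,
nothing is added by this patch:

#include <stdio.h>
#include <rte_ethdev.h>

static void
example_print_link(uint16_t port_id)
{
	struct rte_eth_link link;

	if (rte_eth_link_get_nowait(port_id, &link) != 0)
		return;

	if (link.link_status == RTE_ETH_LINK_UP)
		printf("port %u: link up, %u Mbps\n", port_id, link.link_speed);
	else
		printf("port %u: link down\n", port_id);
}
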
 drivers/net/gve/gve_ethdev.c | 30 ++++++++++++++++++++++++++++++
 1 file changed, 30 insertions(+)

diff --git a/drivers/net/gve/gve_ethdev.c b/drivers/net/gve/gve_ethdev.c
index f10f273f7d..435115c047 100644
--- a/drivers/net/gve/gve_ethdev.c
+++ b/drivers/net/gve/gve_ethdev.c
@@ -37,10 +37,39 @@ gve_dev_configure(__rte_unused struct rte_eth_dev *dev)
 	return 0;
 }
 
+static int
+gve_link_update(struct rte_eth_dev *dev, __rte_unused int wait_to_complete)
+{
+	struct gve_priv *priv = dev->data->dev_private;
+	struct rte_eth_link link;
+	int err;
+
+	memset(&link, 0, sizeof(link));
+	link.link_duplex = RTE_ETH_LINK_FULL_DUPLEX;
+	link.link_autoneg = RTE_ETH_LINK_AUTONEG;
+
+	if (!dev->data->dev_started) {
+		link.link_status = RTE_ETH_LINK_DOWN;
+		link.link_speed = RTE_ETH_SPEED_NUM_NONE;
+	} else {
+		link.link_status = RTE_ETH_LINK_UP;
+		PMD_INIT_LOG(DEBUG, "Get link status from hw");
+		err = gve_adminq_report_link_speed(priv);
+		if (err) {
+			PMD_DRV_LOG(ERR, "Failed to get link speed.");
+			priv->link_speed = RTE_ETH_SPEED_NUM_UNKNOWN;
+		}
+		link.link_speed = priv->link_speed;
+	}
+
+	return rte_eth_linkstatus_set(dev, &link);
+}
+
 static int
 gve_dev_start(struct rte_eth_dev *dev)
 {
 	dev->data->dev_started = 1;
+	gve_link_update(dev, 0);
 
 	return 0;
 }
@@ -73,6 +102,7 @@ static const struct eth_dev_ops gve_eth_dev_ops = {
 	.dev_start            = gve_dev_start,
 	.dev_stop             = gve_dev_stop,
 	.dev_close            = gve_dev_close,
+	.link_update          = gve_link_update,
 };
 
 static void
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 192+ messages in thread

* [PATCH 05/10] net/gve: add MTU set support
  2022-07-29 19:30 [PATCH 00/10] introduce GVE PMD Xiaoyun Li
                   ` (3 preceding siblings ...)
  2022-07-29 19:30 ` [PATCH 04/10] net/gve: add link update support Xiaoyun Li
@ 2022-07-29 19:30 ` Xiaoyun Li
  2022-07-29 19:30 ` [PATCH 06/10] net/gve: add queue operations Xiaoyun Li
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 192+ messages in thread
From: Xiaoyun Li @ 2022-07-29 19:30 UTC (permalink / raw)
  To: junfeng.guo, qi.z.zhang, awogbemila, bruce.richardson; +Cc: dev, Xiaoyun Li

Support dev_ops mtu_set.

Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com>
---
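A sketch of the call path from an application, assuming a stopped
port_id; rte_eth_dev_set_mtu() ends up in gve_dev_mtu_set(), and the
error handling mirrors the checks added below:

#include <errno.h>
#include <stdio.h>
#include <rte_ethdev.h>

static int
example_set_mtu(uint16_t port_id, uint16_t mtu)
{
	int ret = rte_eth_dev_set_mtu(port_id, mtu);

	if (ret == -EBUSY)
		printf("stop port %u before changing its MTU\n", port_id);
	else if (ret == -EINVAL)
		printf("MTU %u is outside the supported range\n", mtu);
	return ret;
}
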
 drivers/net/gve/gve_ethdev.c | 29 +++++++++++++++++++++++++++++
 1 file changed, 29 insertions(+)

diff --git a/drivers/net/gve/gve_ethdev.c b/drivers/net/gve/gve_ethdev.c
index 435115c047..26b45fde6f 100644
--- a/drivers/net/gve/gve_ethdev.c
+++ b/drivers/net/gve/gve_ethdev.c
@@ -97,12 +97,41 @@ gve_dev_close(struct rte_eth_dev *dev)
 	return err;
 }
 
+static int
+gve_dev_mtu_set(struct rte_eth_dev *dev, uint16_t mtu)
+{
+	struct gve_priv *priv = dev->data->dev_private;
+	int err;
+
+	if (mtu < RTE_ETHER_MIN_MTU || mtu > priv->max_mtu) {
+		PMD_DRV_LOG(ERR, "MIN MTU is %u MAX MTU is %u", RTE_ETHER_MIN_MTU, priv->max_mtu);
+		return -EINVAL;
+	}
+
+	/* MTU setting is forbidden while the port is running */
+	if (dev->data->dev_started) {
+		PMD_DRV_LOG(ERR, "Port must be stopped before configuration");
+		return -EBUSY;
+	}
+
+	dev->data->dev_conf.rxmode.mtu = mtu + RTE_ETHER_HDR_LEN;
+
+	err = gve_adminq_set_mtu(priv, mtu);
+	if (err) {
+		PMD_DRV_LOG(ERR, "Failed to set mtu as %u err = %d", mtu, err);
+		return err;
+	}
+
+	return 0;
+}
+
 static const struct eth_dev_ops gve_eth_dev_ops = {
 	.dev_configure        = gve_dev_configure,
 	.dev_start            = gve_dev_start,
 	.dev_stop             = gve_dev_stop,
 	.dev_close            = gve_dev_close,
 	.link_update          = gve_link_update,
+	.mtu_set              = gve_dev_mtu_set,
 };
 
 static void
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 192+ messages in thread

* [PATCH 06/10] net/gve: add queue operations
  2022-07-29 19:30 [PATCH 00/10] introduce GVE PMD Xiaoyun Li
                   ` (4 preceding siblings ...)
  2022-07-29 19:30 ` [PATCH 05/10] net/gve: add MTU set support Xiaoyun Li
@ 2022-07-29 19:30 ` Xiaoyun Li
  2022-07-29 19:30 ` [PATCH 07/10] net/gve: add Rx/Tx support Xiaoyun Li
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 192+ messages in thread
From: Xiaoyun Li @ 2022-07-29 19:30 UTC (permalink / raw)
  To: junfeng.guo, qi.z.zhang, awogbemila, bruce.richardson; +Cc: dev, Xiaoyun Li

Add support for queue operations:
 - setup rx/tx queue
 - release rx/tx queue
 - start rx/tx queues
 - stop rx/tx queues

Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com>
---
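A sketch of the application-side setup these entry points back, assuming
a configured but not yet started port_id; the pool sizing numbers are
arbitrary, and the requested descriptor count is overridden by the
driver to the value the HW reports:

#include <errno.h>
#include <rte_ethdev.h>
#include <rte_lcore.h>
#include <rte_mbuf.h>

static int
example_setup_queues(uint16_t port_id)
{
	struct rte_mempool *mp;
	int ret;

	mp = rte_pktmbuf_pool_create("gve_rx_pool", 8192, 256, 0,
				     RTE_MBUF_DEFAULT_BUF_SIZE, rte_socket_id());
	if (mp == NULL)
		return -ENOMEM;

	/* lands in gve_rx_queue_setup(); NULL rxconf selects defaults */
	ret = rte_eth_rx_queue_setup(port_id, 0, 1024, rte_socket_id(), NULL, mp);
	if (ret != 0)
		return ret;

	/* lands in gve_tx_queue_setup() */
	return rte_eth_tx_queue_setup(port_id, 0, 1024, rte_socket_id(), NULL);
}
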
 drivers/net/gve/gve.h        |  52 +++++++++
 drivers/net/gve/gve_ethdev.c | 203 +++++++++++++++++++++++++++++++++
 drivers/net/gve/gve_rx.c     | 212 ++++++++++++++++++++++++++++++++++
 drivers/net/gve/gve_tx.c     | 214 +++++++++++++++++++++++++++++++++++
 drivers/net/gve/meson.build  |   2 +
 5 files changed, 683 insertions(+)
 create mode 100644 drivers/net/gve/gve_rx.c
 create mode 100644 drivers/net/gve/gve_tx.c

diff --git a/drivers/net/gve/gve.h b/drivers/net/gve/gve.h
index 704c88983c..a53a852a5f 100644
--- a/drivers/net/gve/gve.h
+++ b/drivers/net/gve/gve.h
@@ -23,6 +23,9 @@
 /* 1 for management, 1 for rx, 1 for tx */
 #define GVE_MIN_MSIX		3
 
+#define GVE_DEFAULT_RX_FREE_THRESH  512
+#define GVE_DEFAULT_TX_FREE_THRESH  256
+
 /* PTYPEs are always 10 bits. */
 #define GVE_NUM_PTYPES	1024
 
@@ -42,15 +45,35 @@ union gve_tx_desc {
 	struct gve_tx_seg_desc seg; /* subsequent descs for a packet */
 };
 
+struct gve_tx_iovec {
+	uint32_t iov_base; /* offset in fifo */
+	uint32_t iov_len;
+};
+
 struct gve_tx_queue {
 	volatile union gve_tx_desc *tx_desc_ring;
 	const struct rte_memzone *mz;
 	uint64_t tx_ring_phys_addr;
+	struct rte_mbuf **sw_ring;
+	volatile rte_be32_t *qtx_tail;
+	volatile rte_be32_t *qtx_head;
 
+	uint32_t tx_tail;
 	uint16_t nb_tx_desc;
+	uint16_t nb_free;
+	uint32_t next_to_clean;
+	uint16_t free_thresh;
 
 	/* Only valid for DQO_QPL queue format */
+	uint16_t sw_tail;
+	uint16_t sw_ntc;
+	uint16_t sw_nb_free;
+	uint32_t fifo_size;
+	uint32_t fifo_head;
+	uint32_t fifo_avail;
+	uint64_t fifo_base;
 	struct gve_queue_page_list *qpl;
+	struct gve_tx_iovec *iov_ring;
 
 	uint16_t port_id;
 	uint16_t queue_id;
@@ -64,6 +87,8 @@ struct gve_tx_queue {
 
 	/* Only valid for DQO_RDA queue format */
 	struct gve_tx_queue *complq;
+
+	uint8_t is_gqi_qpl;
 };
 
 struct gve_rx_queue {
@@ -72,9 +97,17 @@ struct gve_rx_queue {
 	const struct rte_memzone *mz;
 	const struct rte_memzone *data_mz;
 	uint64_t rx_ring_phys_addr;
+	struct rte_mbuf **sw_ring;
+	struct rte_mempool *mpool;
 
+	uint16_t rx_tail;
 	uint16_t nb_rx_desc;
+	uint16_t expected_seqno; /* the next expected seqno */
+	uint16_t free_thresh;
+	uint32_t next_avail;
+	uint32_t nb_avail;
 
+	volatile rte_be32_t *qrx_tail;
 	volatile rte_be32_t *ntfy_addr;
 
 	/* only valid for GQI_QPL queue format */
@@ -91,6 +124,8 @@ struct gve_rx_queue {
 
 	/* Only valid for DQO_RDA queue format */
 	struct gve_rx_queue *bufq;
+
+	uint8_t is_gqi_qpl;
 };
 
 struct gve_irq_db {
@@ -246,4 +281,21 @@ static inline void gve_clear_device_rings_ok(struct gve_priv *priv)
 	rte_bit_relaxed_clear32(GVE_PRIV_FLAGS_DEVICE_RINGS_OK,
 				&priv->state_flags);
 }
+
+int
+gve_rx_queue_setup(struct rte_eth_dev *dev, uint16_t queue_id, uint16_t nb_desc,
+		   unsigned int socket_id, const struct rte_eth_rxconf *conf,
+		   struct rte_mempool *pool);
+int
+gve_tx_queue_setup(struct rte_eth_dev *dev, uint16_t queue_id, uint16_t nb_desc,
+		   unsigned int socket_id, const struct rte_eth_txconf *conf);
+
+void gve_tx_queue_release(void *txq);
+
+void gve_rx_queue_release(void *rxq);
+
+void gve_stop_tx_queues(struct rte_eth_dev *dev);
+
+void gve_stop_rx_queues(struct rte_eth_dev *dev);
+
 #endif /* _GVE_H_ */
diff --git a/drivers/net/gve/gve_ethdev.c b/drivers/net/gve/gve_ethdev.c
index 26b45fde6f..5201398664 100644
--- a/drivers/net/gve/gve_ethdev.c
+++ b/drivers/net/gve/gve_ethdev.c
@@ -31,12 +31,111 @@ gve_write_version(uint8_t *driver_version_register)
 	writeb('\n', driver_version_register);
 }
 
+static int
+gve_alloc_queue_page_list(struct gve_priv *priv, uint32_t id, uint32_t pages)
+{
+	char z_name[RTE_MEMZONE_NAMESIZE];
+	struct gve_queue_page_list *qpl;
+	const struct rte_memzone *mz;
+	dma_addr_t page_bus;
+	uint32_t i;
+
+	if (priv->num_registered_pages + pages >
+	    priv->max_registered_pages) {
+		PMD_DRV_LOG(ERR, "Pages %" PRIu64 " > max registered pages %" PRIu64,
+			    priv->num_registered_pages + pages,
+			    priv->max_registered_pages);
+		return -EINVAL;
+	}
+	qpl = &priv->qpl[id];
+	snprintf(z_name, sizeof(z_name), "gve_%s_qpl%d", priv->pci_dev->device.name, id);
+	mz = rte_memzone_reserve_aligned(z_name, pages * PAGE_SIZE,
+					 rte_socket_id(),
+					 RTE_MEMZONE_IOVA_CONTIG, PAGE_SIZE);
+	if (mz == NULL) {
+		PMD_DRV_LOG(ERR, "Failed to alloc %s.", z_name);
+		return -ENOMEM;
+	}
+	qpl->page_buses = rte_zmalloc("qpl page buses", pages * sizeof(dma_addr_t), 0);
+	if (qpl->page_buses == NULL) {
+		PMD_DRV_LOG(ERR, "Failed to alloc qpl %u page buses", id);
+		return -ENOMEM;
+	}
+	page_bus = mz->iova;
+	for (i = 0; i < pages; i++) {
+		qpl->page_buses[i] = page_bus;
+		page_bus += PAGE_SIZE;
+	}
+	qpl->id = id;
+	qpl->mz = mz;
+	qpl->num_entries = pages;
+
+	priv->num_registered_pages += pages;
+
+	return 0;
+}
+
+static void
+gve_free_qpls(struct gve_priv *priv)
+{
+	uint16_t nb_txqs = priv->max_nb_txq;
+	uint16_t nb_rxqs = priv->max_nb_rxq;
+	uint32_t i;
+
+	for (i = 0; i < nb_txqs + nb_rxqs; i++) {
+		if (priv->qpl[i].mz != NULL)
+			rte_memzone_free(priv->qpl[i].mz);
+		if (priv->qpl[i].page_buses != NULL)
+			rte_free(priv->qpl[i].page_buses);
+	}
+
+	if (priv->qpl != NULL)
+		rte_free(priv->qpl);
+}
+
 static int
 gve_dev_configure(__rte_unused struct rte_eth_dev *dev)
 {
 	return 0;
 }
 
+static int
+gve_refill_pages(struct gve_rx_queue *rxq)
+{
+	struct rte_mbuf *nmb;
+	uint16_t i;
+	int diag;
+
+	diag = rte_pktmbuf_alloc_bulk(rxq->mpool, &rxq->sw_ring[0], rxq->nb_rx_desc);
+	if (diag < 0) {
+		for (i = 0; i < rxq->nb_rx_desc - 1; i++) {
+			nmb = rte_pktmbuf_alloc(rxq->mpool);
+			if (!nmb)
+				break;
+			rxq->sw_ring[i] = nmb;
+		}
+		if (i < rxq->nb_rx_desc - 1)
+			return -ENOMEM;
+	}
+	rxq->nb_avail = 0;
+	rxq->next_avail = rxq->nb_rx_desc - 1;
+
+	for (i = 0; i < rxq->nb_rx_desc; i++) {
+		if (rxq->is_gqi_qpl) {
+			rxq->rx_data_ring[i].addr = rte_cpu_to_be_64(i * PAGE_SIZE);
+		} else {
+			if (i == rxq->nb_rx_desc - 1)
+				break;
+			nmb = rxq->sw_ring[i];
+			rxq->rx_data_ring[i].addr = rte_cpu_to_be_64(rte_mbuf_data_iova(nmb));
+		}
+	}
+
+	rte_write32(rte_cpu_to_be_32(rxq->next_avail), rxq->qrx_tail);
+
+	return 0;
+}
+
 static int
 gve_link_update(struct rte_eth_dev *dev, __rte_unused int wait_to_complete)
 {
@@ -68,16 +167,70 @@ gve_link_update(struct rte_eth_dev *dev, __rte_unused int wait_to_complete)
 static int
 gve_dev_start(struct rte_eth_dev *dev)
 {
+	uint16_t num_queues = dev->data->nb_tx_queues;
+	struct gve_priv *priv = dev->data->dev_private;
+	struct gve_tx_queue *txq;
+	struct gve_rx_queue *rxq;
+	uint16_t i;
+	int err;
+
+	priv->txqs = (struct gve_tx_queue **)dev->data->tx_queues;
+	err = gve_adminq_create_tx_queues(priv, num_queues);
+	if (err) {
+		PMD_DRV_LOG(ERR, "failed to create %u tx queues.", num_queues);
+		return err;
+	}
+	for (i = 0; i < num_queues; i++) {
+		txq = priv->txqs[i];
+		txq->qtx_tail =
+		&priv->db_bar2[rte_be_to_cpu_32(txq->qres->db_index)];
+		txq->qtx_head =
+		&priv->cnt_array[rte_be_to_cpu_32(txq->qres->counter_index)];
+
+		rte_write32(rte_cpu_to_be_32(GVE_IRQ_MASK), txq->ntfy_addr);
+	}
+
+	num_queues = dev->data->nb_rx_queues;
+	priv->rxqs = (struct gve_rx_queue **)dev->data->rx_queues;
+	err = gve_adminq_create_rx_queues(priv, num_queues);
+	if (err) {
+		PMD_DRV_LOG(ERR, "failed to create %u rx queues.", num_queues);
+		goto err_tx;
+	}
+	for (i = 0; i < num_queues; i++) {
+		rxq = priv->rxqs[i];
+		rxq->qrx_tail =
+		&priv->db_bar2[rte_be_to_cpu_32(rxq->qres->db_index)];
+
+		rte_write32(rte_cpu_to_be_32(GVE_IRQ_MASK), rxq->ntfy_addr);
+
+		err = gve_refill_pages(rxq);
+		if (err) {
+			PMD_DRV_LOG(ERR, "Failed to refill for RX");
+			goto err_rx;
+		}
+	}
+
 	dev->data->dev_started = 1;
 	gve_link_update(dev, 0);
 
 	return 0;
+
+err_rx:
+	gve_stop_rx_queues(dev);
+err_tx:
+	gve_stop_tx_queues(dev);
+	return err;
 }
 
 static int
 gve_dev_stop(struct rte_eth_dev *dev)
 {
 	dev->data->dev_link.link_status = RTE_ETH_LINK_DOWN;
+
+	gve_stop_tx_queues(dev);
+	gve_stop_rx_queues(dev);
+
 	dev->data->dev_started = 0;
 
 	return 0;
@@ -86,7 +239,11 @@ gve_dev_stop(struct rte_eth_dev *dev)
 static int
 gve_dev_close(struct rte_eth_dev *dev)
 {
+	struct gve_priv *priv = dev->data->dev_private;
+	struct gve_tx_queue *txq;
+	struct gve_rx_queue *rxq;
 	int err = 0;
+	uint16_t i;
 
 	if (dev->data->dev_started) {
 		err = gve_dev_stop(dev);
@@ -94,6 +251,18 @@ gve_dev_close(struct rte_eth_dev *dev)
 			PMD_DRV_LOG(ERR, "Failed to stop dev.");
 	}
 
+	for (i = 0; i < dev->data->nb_tx_queues; i++) {
+		txq = dev->data->tx_queues[i];
+		gve_tx_queue_release(txq);
+	}
+
+	for (i = 0; i < dev->data->nb_rx_queues; i++) {
+		rxq = dev->data->rx_queues[i];
+		gve_rx_queue_release(rxq);
+	}
+
+	gve_free_qpls(priv);
+
 	return err;
 }
 
@@ -130,6 +299,8 @@ static const struct eth_dev_ops gve_eth_dev_ops = {
 	.dev_start            = gve_dev_start,
 	.dev_stop             = gve_dev_stop,
 	.dev_close            = gve_dev_close,
+	.rx_queue_setup       = gve_rx_queue_setup,
+	.tx_queue_setup       = gve_tx_queue_setup,
 	.link_update          = gve_link_update,
 	.mtu_set              = gve_dev_mtu_set,
 };
@@ -267,7 +438,9 @@ gve_setup_device_resources(struct gve_priv *priv)
 static int
 gve_init_priv(struct gve_priv *priv, bool skip_describe_device)
 {
+	uint16_t pages;
 	int num_ntfy;
+	uint32_t i;
 	int err;
 
 	/* Set up the adminq */
@@ -318,10 +491,40 @@ gve_init_priv(struct gve_priv *priv, bool skip_describe_device)
 	PMD_DRV_LOG(INFO, "Max TX queues %d, Max RX queues %d",
 		    priv->max_nb_txq, priv->max_nb_rxq);
 
+	/* In GQI_QPL queue format:
+	 * Allocate queue page lists according to max queue number
+	 * tx qpl id should start from 0 while rx qpl id should start
+	 * from priv->max_nb_txq
+	 */
+	if (priv->queue_format == GVE_GQI_QPL_FORMAT) {
+		priv->qpl = rte_zmalloc("gve_qpl",
+					(priv->max_nb_txq + priv->max_nb_rxq) *
+					sizeof(struct gve_queue_page_list), 0);
+		if (priv->qpl == NULL) {
+			PMD_DRV_LOG(ERR, "Failed to alloc qpl.");
+			err = -ENOMEM;
+			goto free_adminq;
+		}
+
+		for (i = 0; i < priv->max_nb_txq + priv->max_nb_rxq; i++) {
+			if (i < priv->max_nb_txq)
+				pages = priv->tx_pages_per_qpl;
+			else
+				pages = priv->rx_data_slot_cnt;
+			err = gve_alloc_queue_page_list(priv, i, pages);
+			if (err != 0) {
+				PMD_DRV_LOG(ERR, "Failed to alloc qpl %u.", i);
+				goto err_qpl;
+			}
+		}
+	}
+
 setup_device:
 	err = gve_setup_device_resources(priv);
 	if (!err)
 		return 0;
+err_qpl:
+	gve_free_qpls(priv);
 free_adminq:
 	gve_adminq_free(priv);
 	return err;
diff --git a/drivers/net/gve/gve_rx.c b/drivers/net/gve/gve_rx.c
new file mode 100644
index 0000000000..7298b4cc86
--- /dev/null
+++ b/drivers/net/gve/gve_rx.c
@@ -0,0 +1,212 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2022 Intel Corporation
+ */
+
+#include "gve.h"
+#include "gve_adminq.h"
+
+static inline void
+gve_reset_rxq(struct gve_rx_queue *rxq)
+{
+	struct rte_mbuf **sw_ring = rxq->sw_ring;
+	uint32_t size, i;
+
+	if (rxq == NULL) {
+		PMD_DRV_LOG(ERR, "pointer to rxq is NULL");
+		return;
+	}
+
+	size = rxq->nb_rx_desc * sizeof(struct gve_rx_desc);
+	for (i = 0; i < size; i++)
+		((volatile char *)rxq->rx_desc_ring)[i] = 0;
+
+	size = rxq->nb_rx_desc * sizeof(union gve_rx_data_slot);
+	for (i = 0; i < size; i++)
+		((volatile char *)rxq->rx_data_ring)[i] = 0;
+
+	for (i = 0; i < rxq->nb_rx_desc; i++)
+		sw_ring[i] = NULL;
+
+	rxq->rx_tail = 0;
+	rxq->next_avail = 0;
+	rxq->nb_avail = rxq->nb_rx_desc;
+	rxq->expected_seqno = 1;
+}
+
+static inline void
+gve_release_rxq_mbufs(struct gve_rx_queue *rxq)
+{
+	uint16_t i;
+
+	for (i = 0; i < rxq->nb_rx_desc; i++) {
+		if (rxq->sw_ring[i]) {
+			rte_pktmbuf_free_seg(rxq->sw_ring[i]);
+			rxq->sw_ring[i] = NULL;
+		}
+	}
+
+	rxq->nb_avail = rxq->nb_rx_desc;
+}
+
+void
+gve_rx_queue_release(void *rxq)
+{
+	struct gve_rx_queue *q = rxq;
+
+	if (!q)
+		return;
+
+	if (q->is_gqi_qpl) {
+		gve_adminq_unregister_page_list(q->hw, q->qpl->id);
+		q->qpl = NULL;
+	}
+
+	gve_release_rxq_mbufs(q);
+	rte_free(q->sw_ring);
+	rte_memzone_free(q->data_mz);
+	rte_memzone_free(q->mz);
+	rte_memzone_free(q->qres_mz);
+	q->qres = NULL;
+	rte_free(q);
+}
+
+int
+gve_rx_queue_setup(struct rte_eth_dev *dev, uint16_t queue_id,
+		uint16_t nb_desc, unsigned int socket_id,
+		const struct rte_eth_rxconf *conf, struct rte_mempool *pool)
+{
+	struct gve_priv *hw = dev->data->dev_private;
+	const struct rte_memzone *mz;
+	struct gve_rx_queue *rxq;
+	uint16_t free_thresh;
+	int err = 0;
+
+	if (nb_desc != hw->rx_desc_cnt) {
+		PMD_INIT_LOG(WARNING, "gve doesn't support nb_desc config, use hw nb_desc %u.",
+			     hw->rx_desc_cnt);
+	}
+	nb_desc = hw->rx_desc_cnt;
+
+	/* Free memory if needed. */
+	if (dev->data->rx_queues[queue_id]) {
+		gve_rx_queue_release(dev->data->rx_queues[queue_id]);
+		dev->data->rx_queues[queue_id] = NULL;
+	}
+
+	/* Allocate the RX queue data structure. */
+	rxq = rte_zmalloc_socket("gve rxq",
+				 sizeof(struct gve_rx_queue),
+				 RTE_CACHE_LINE_SIZE,
+				 socket_id);
+	if (!rxq) {
+		PMD_INIT_LOG(ERR, "Failed to allocate memory for rx queue structure");
+		err = -ENOMEM;
+		goto err_rxq;
+	}
+
+	free_thresh = conf->rx_free_thresh ? conf->rx_free_thresh : GVE_DEFAULT_RX_FREE_THRESH;
+	if (free_thresh >= nb_desc) {
+		PMD_INIT_LOG(ERR, "rx_free_thresh (%u) must be less than nb_desc (%u) minus 3.",
+			     free_thresh, rxq->nb_rx_desc);
+		err = -EINVAL;
+		goto err_rxq;
+	}
+
+	rxq->nb_rx_desc = nb_desc;
+	rxq->free_thresh = free_thresh;
+	rxq->queue_id = queue_id;
+	rxq->port_id = dev->data->port_id;
+	rxq->ntfy_id = hw->num_ntfy_blks / 2 + queue_id;
+	rxq->is_gqi_qpl = hw->queue_format == GVE_GQI_QPL_FORMAT;
+	rxq->mpool = pool;
+	rxq->hw = hw;
+	rxq->ntfy_addr = &hw->db_bar2[rte_be_to_cpu_32(hw->irq_dbs[rxq->ntfy_id].id)];
+
+	rxq->rx_buf_len = rte_pktmbuf_data_room_size(rxq->mpool) - RTE_PKTMBUF_HEADROOM;
+
+	/* Allocate software ring */
+	rxq->sw_ring = rte_zmalloc_socket("gve rx sw ring", sizeof(struct rte_mbuf *) * nb_desc,
+					  RTE_CACHE_LINE_SIZE, socket_id);
+	if (!rxq->sw_ring) {
+		PMD_INIT_LOG(ERR, "Failed to allocate memory for SW RX ring");
+		err = -ENOMEM;
+		goto err_rxq;
+	}
+
+	mz = rte_eth_dma_zone_reserve(dev, "rx_ring", queue_id,
+				      nb_desc * sizeof(struct gve_rx_desc),
+				      PAGE_SIZE, socket_id);
+	if (mz == NULL) {
+		PMD_INIT_LOG(ERR, "Failed to reserve DMA memory for RX");
+		err = -ENOMEM;
+		goto err_sw_ring;
+	}
+	rxq->rx_desc_ring = (struct gve_rx_desc *)mz->addr;
+	rxq->rx_ring_phys_addr = mz->iova;
+	rxq->mz = mz;
+
+	mz = rte_eth_dma_zone_reserve(dev, "gve rx data ring", queue_id,
+				      sizeof(union gve_rx_data_slot) * nb_desc,
+				      PAGE_SIZE, socket_id);
+	if (mz == NULL) {
+		PMD_INIT_LOG(ERR, "Failed to allocate memory for RX data ring");
+		err = -ENOMEM;
+		goto err_rx_ring;
+	}
+	rxq->rx_data_ring = (union gve_rx_data_slot *)mz->addr;
+	rxq->data_mz = mz;
+	if (rxq->is_gqi_qpl) {
+		rxq->qpl = &hw->qpl[rxq->ntfy_id];
+		err = gve_adminq_register_page_list(hw, rxq->qpl);
+		if (err != 0) {
+			PMD_INIT_LOG(ERR, "Failed to register qpl %u", queue_id);
+			goto err_data_ring;
+		}
+	}
+
+	mz = rte_eth_dma_zone_reserve(dev, "rxq_res", queue_id,
+				      sizeof(struct gve_queue_resources),
+				      PAGE_SIZE, socket_id);
+	if (mz == NULL) {
+		PMD_INIT_LOG(ERR, "Failed to reserve DMA memory for RX resource");
+		err = -ENOMEM;
+		goto err_data_ring;
+	}
+	rxq->qres = (struct gve_queue_resources *)mz->addr;
+	rxq->qres_mz = mz;
+
+	gve_reset_rxq(rxq);
+
+	dev->data->rx_queues[queue_id] = rxq;
+
+	return 0;
+
+err_data_ring:
+	rte_memzone_free(rxq->data_mz);
+err_rx_ring:
+	rte_memzone_free(rxq->mz);
+err_sw_ring:
+	rte_free(rxq->sw_ring);
+err_rxq:
+	rte_free(rxq);
+	return err;
+}
+
+void
+gve_stop_rx_queues(struct rte_eth_dev *dev)
+{
+	struct gve_priv *hw = dev->data->dev_private;
+	struct gve_rx_queue *rxq;
+	uint16_t i;
+	int err;
+
+	err = gve_adminq_destroy_rx_queues(hw, dev->data->nb_rx_queues);
+	if (err != 0)
+		PMD_DRV_LOG(WARNING, "failed to destroy rxqs");
+
+	for (i = 0; i < dev->data->nb_rx_queues; i++) {
+		rxq = dev->data->rx_queues[i];
+		gve_release_rxq_mbufs(rxq);
+		gve_reset_rxq(rxq);
+	}
+}
diff --git a/drivers/net/gve/gve_tx.c b/drivers/net/gve/gve_tx.c
new file mode 100644
index 0000000000..947c9d1627
--- /dev/null
+++ b/drivers/net/gve/gve_tx.c
@@ -0,0 +1,214 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2022 Intel Corporation
+ */
+
+#include "gve.h"
+#include "gve_adminq.h"
+
+static inline void
+gve_reset_txq(struct gve_tx_queue *txq)
+{
+	struct rte_mbuf **sw_ring = txq->sw_ring;
+	uint32_t size, i;
+
+	if (txq == NULL) {
+		PMD_DRV_LOG(ERR, "Pointer to txq is NULL");
+		return;
+	}
+
+	size = txq->nb_tx_desc * sizeof(union gve_tx_desc);
+	for (i = 0; i < size; i++)
+		((volatile char *)txq->tx_desc_ring)[i] = 0;
+
+	for (i = 0; i < txq->nb_tx_desc; i++) {
+		sw_ring[i] = NULL;
+		if (txq->is_gqi_qpl) {
+			txq->iov_ring[i].iov_base = 0;
+			txq->iov_ring[i].iov_len = 0;
+		}
+	}
+
+	txq->tx_tail = 0;
+	txq->nb_free = txq->nb_tx_desc - 1;
+	txq->next_to_clean = 0;
+
+	if (txq->is_gqi_qpl) {
+		txq->fifo_size = PAGE_SIZE * txq->hw->tx_pages_per_qpl;
+		txq->fifo_avail = txq->fifo_size;
+		txq->fifo_head = 0;
+		txq->fifo_base = (uint64_t)(txq->qpl->mz->addr);
+
+		txq->sw_tail = 0;
+		txq->sw_nb_free = txq->nb_tx_desc - 1;
+		txq->sw_ntc = 0;
+	}
+}
+
+static inline void
+gve_release_txq_mbufs(struct gve_tx_queue *txq)
+{
+	uint16_t i;
+
+	for (i = 0; i < txq->nb_tx_desc; i++) {
+		if (txq->sw_ring[i]) {
+			rte_pktmbuf_free_seg(txq->sw_ring[i]);
+			txq->sw_ring[i] = NULL;
+		}
+	}
+}
+
+void
+gve_tx_queue_release(void *txq)
+{
+	struct gve_tx_queue *q = txq;
+
+	if (!q)
+		return;
+
+	if (q->is_gqi_qpl) {
+		gve_adminq_unregister_page_list(q->hw, q->qpl->id);
+		rte_free(q->iov_ring);
+		q->qpl = NULL;
+	}
+
+	gve_release_txq_mbufs(q);
+	rte_free(q->sw_ring);
+	rte_memzone_free(q->mz);
+	rte_memzone_free(q->qres_mz);
+	q->qres = NULL;
+	rte_free(q);
+}
+
+int
+gve_tx_queue_setup(struct rte_eth_dev *dev, uint16_t queue_id, uint16_t nb_desc,
+		   unsigned int socket_id, const struct rte_eth_txconf *conf)
+{
+	struct gve_priv *hw = dev->data->dev_private;
+	const struct rte_memzone *mz;
+	struct gve_tx_queue *txq;
+	uint16_t free_thresh;
+	int err = 0;
+
+	if (nb_desc != hw->tx_desc_cnt) {
+		PMD_INIT_LOG(WARNING, "gve doesn't support nb_desc config, use hw nb_desc %u.",
+			     hw->tx_desc_cnt);
+	}
+	nb_desc = hw->tx_desc_cnt;
+
+	/* Free memory if needed. */
+	if (dev->data->tx_queues[queue_id]) {
+		gve_tx_queue_release(dev->data->tx_queues[queue_id]);
+		dev->data->tx_queues[queue_id] = NULL;
+	}
+
+	/* Allocate the TX queue data structure. */
+	txq = rte_zmalloc_socket("gve txq", sizeof(struct gve_tx_queue),
+				 RTE_CACHE_LINE_SIZE, socket_id);
+	if (!txq) {
+		PMD_INIT_LOG(ERR, "Failed to allocate memory for tx queue structure");
+		err = -ENOMEM;
+		goto err_txq;
+	}
+
+	free_thresh = conf->tx_free_thresh ? conf->tx_free_thresh : GVE_DEFAULT_TX_FREE_THRESH;
+	if (free_thresh >= nb_desc - 3) {
+		PMD_INIT_LOG(ERR, "tx_free_thresh (%u) must be less than nb_desc (%u) minus 3.",
+			     free_thresh, nb_desc);
+		err = -EINVAL;
+		goto err_txq;
+	}
+
+	txq->nb_tx_desc = nb_desc;
+	txq->free_thresh = free_thresh;
+	txq->queue_id = queue_id;
+	txq->port_id = dev->data->port_id;
+	txq->ntfy_id = queue_id;
+	txq->is_gqi_qpl = hw->queue_format == GVE_GQI_QPL_FORMAT;
+	txq->hw = hw;
+	txq->ntfy_addr = &hw->db_bar2[rte_be_to_cpu_32(hw->irq_dbs[txq->ntfy_id].id)];
+
+	/* Allocate software ring */
+	txq->sw_ring = rte_zmalloc_socket("gve tx sw ring",
+					  sizeof(struct rte_mbuf *) * nb_desc,
+					  RTE_CACHE_LINE_SIZE, socket_id);
+	if (!txq->sw_ring) {
+		PMD_INIT_LOG(ERR, "Failed to allocate memory for SW TX ring");
+		err = -ENOMEM;
+		goto err_txq;
+	}
+
+	mz = rte_eth_dma_zone_reserve(dev, "tx_ring", queue_id,
+				      nb_desc * sizeof(union gve_tx_desc),
+				      PAGE_SIZE, socket_id);
+	if (mz == NULL) {
+		PMD_INIT_LOG(ERR, "Failed to reserve DMA memory for TX");
+		err = -ENOMEM;
+		goto err_sw_ring;
+	}
+	txq->tx_desc_ring = (union gve_tx_desc *)mz->addr;
+	txq->tx_ring_phys_addr = mz->iova;
+	txq->mz = mz;
+
+	if (txq->is_gqi_qpl) {
+		txq->iov_ring = rte_zmalloc_socket("gve tx iov ring",
+						   sizeof(struct gve_tx_iovec) * nb_desc,
+						   RTE_CACHE_LINE_SIZE, socket_id);
+		if (!txq->iov_ring) {
+			PMD_INIT_LOG(ERR, "Failed to allocate memory for SW TX ring");
+			err = -ENOMEM;
+			goto err_tx_ring;
+		}
+		txq->qpl = &hw->qpl[queue_id];
+		err = gve_adminq_register_page_list(hw, txq->qpl);
+		if (err != 0) {
+			PMD_INIT_LOG(ERR, "Failed to register qpl %u", queue_id);
+			goto err_iov_ring;
+		}
+	}
+
+	mz = rte_eth_dma_zone_reserve(dev, "txq_res", queue_id, sizeof(struct gve_queue_resources),
+				      PAGE_SIZE, socket_id);
+	if (mz == NULL) {
+		PMD_INIT_LOG(ERR, "Failed to reserve DMA memory for TX resource");
+		err = -ENOMEM;
+		goto err_iov_ring;
+	}
+	txq->qres = (struct gve_queue_resources *)mz->addr;
+	txq->qres_mz = mz;
+
+	gve_reset_txq(txq);
+
+	dev->data->tx_queues[queue_id] = txq;
+
+	return 0;
+
+err_iov_ring:
+	if (txq->is_gqi_qpl)
+		rte_free(txq->iov_ring);
+err_tx_ring:
+	rte_memzone_free(txq->mz);
+err_sw_ring:
+	rte_free(txq->sw_ring);
+err_txq:
+	rte_free(txq);
+	return err;
+}
+
+void
+gve_stop_tx_queues(struct rte_eth_dev *dev)
+{
+	struct gve_priv *hw = dev->data->dev_private;
+	struct gve_tx_queue *txq;
+	uint16_t i;
+	int err;
+
+	err = gve_adminq_destroy_tx_queues(hw, dev->data->nb_tx_queues);
+	if (err != 0)
+		PMD_DRV_LOG(WARNING, "failed to destroy txqs");
+
+	for (i = 0; i < dev->data->nb_tx_queues; i++) {
+		txq = dev->data->tx_queues[i];
+		gve_release_txq_mbufs(txq);
+		gve_reset_txq(txq);
+	}
+}
diff --git a/drivers/net/gve/meson.build b/drivers/net/gve/meson.build
index 9a22cc9abe..c4fd013ef2 100644
--- a/drivers/net/gve/meson.build
+++ b/drivers/net/gve/meson.build
@@ -9,5 +9,7 @@ endif
 
 sources = files(
         'gve_adminq.c',
+        'gve_rx.c',
+        'gve_tx.c',
         'gve_ethdev.c',
 )
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 192+ messages in thread

* [PATCH 07/10] net/gve: add Rx/Tx support
  2022-07-29 19:30 [PATCH 00/10] introduce GVE PMD Xiaoyun Li
                   ` (5 preceding siblings ...)
  2022-07-29 19:30 ` [PATCH 06/10] net/gve: add queue operations Xiaoyun Li
@ 2022-07-29 19:30 ` Xiaoyun Li
  2022-07-29 19:30 ` [PATCH 08/10] net/gve: add support to get dev info and configure dev Xiaoyun Li
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 192+ messages in thread
From: Xiaoyun Li @ 2022-07-29 19:30 UTC (permalink / raw)
  To: junfeng.guo, qi.z.zhang, awogbemila, bruce.richardson; +Cc: dev, Xiaoyun Li

Add Rx/Tx support for the GQI_QPL and GQI_RDA queue formats.

Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com>
---
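A sketch of the datapath these functions plug into, assuming a started
port_id with one queue pair; applications never call gve_rx_burst() or
gve_tx_burst() directly, only through the ethdev burst API:

#include <rte_ethdev.h>
#include <rte_mbuf.h>

#define BURST_SZ 32

static void
example_forward_once(uint16_t port_id)
{
	struct rte_mbuf *pkts[BURST_SZ];
	uint16_t nb_rx, nb_tx;

	/* dispatches to gve_rx_burst() for the GQI queue formats */
	nb_rx = rte_eth_rx_burst(port_id, 0, pkts, BURST_SZ);

	/* dispatches to gve_tx_burst(); free whatever was not sent */
	nb_tx = rte_eth_tx_burst(port_id, 0, pkts, nb_rx);
	while (nb_tx < nb_rx)
		rte_pktmbuf_free(pkts[nb_tx++]);
}
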
 drivers/net/gve/gve.h        |  17 ++
 drivers/net/gve/gve_ethdev.c |   5 +
 drivers/net/gve/gve_rx.c     | 143 +++++++++++
 drivers/net/gve/gve_tx.c     | 452 +++++++++++++++++++++++++++++++++++
 4 files changed, 617 insertions(+)

diff --git a/drivers/net/gve/gve.h b/drivers/net/gve/gve.h
index a53a852a5f..7f4d0e37f3 100644
--- a/drivers/net/gve/gve.h
+++ b/drivers/net/gve/gve.h
@@ -25,6 +25,7 @@
 
 #define GVE_DEFAULT_RX_FREE_THRESH  512
 #define GVE_DEFAULT_TX_FREE_THRESH  256
+#define GVE_TX_MAX_FREE_SZ          512
 
 /* PTYPEs are always 10 bits. */
 #define GVE_NUM_PTYPES	1024
@@ -45,6 +46,18 @@ union gve_tx_desc {
 	struct gve_tx_seg_desc seg; /* subsequent descs for a packet */
 };
 
+/* Offload features */
+union gve_tx_offload {
+	uint64_t data;
+	struct {
+		uint64_t l2_len:7; /* L2 (MAC) Header Length. */
+		uint64_t l3_len:9; /* L3 (IP) Header Length. */
+		uint64_t l4_len:8; /* L4 Header Length. */
+		uint64_t tso_segsz:16; /* TCP TSO segment size */
+		/* uint64_t unused : 24; */
+	};
+};
+
 struct gve_tx_iovec {
 	uint32_t iov_base; /* offset in fifo */
 	uint32_t iov_len;
@@ -298,4 +311,8 @@ void gve_stop_tx_queues(struct rte_eth_dev *dev);
 
 void gve_stop_rx_queues(struct rte_eth_dev *dev);
 
+uint16_t gve_rx_burst(void *rxq, struct rte_mbuf **rx_pkts, uint16_t nb_pkts);
+
+uint16_t gve_tx_burst(void *txq, struct rte_mbuf **tx_pkts, uint16_t nb_pkts);
+
 #endif /* _GVE_H_ */
diff --git a/drivers/net/gve/gve_ethdev.c b/drivers/net/gve/gve_ethdev.c
index 5201398664..5ebe2c30ea 100644
--- a/drivers/net/gve/gve_ethdev.c
+++ b/drivers/net/gve/gve_ethdev.c
@@ -583,6 +583,11 @@ gve_dev_init(struct rte_eth_dev *eth_dev)
 	if (err)
 		return err;
 
+	if (gve_is_gqi(priv)) {
+		eth_dev->rx_pkt_burst = gve_rx_burst;
+		eth_dev->tx_pkt_burst = gve_tx_burst;
+	}
+
 	eth_dev->data->mac_addrs = rte_zmalloc("gve_mac", sizeof(struct rte_ether_addr), 0);
 	if (!eth_dev->data->mac_addrs) {
 		PMD_INIT_LOG(ERR, "Failed to allocate memory to store mac address");
diff --git a/drivers/net/gve/gve_rx.c b/drivers/net/gve/gve_rx.c
index 7298b4cc86..8f560ae592 100644
--- a/drivers/net/gve/gve_rx.c
+++ b/drivers/net/gve/gve_rx.c
@@ -5,6 +5,149 @@
 #include "gve.h"
 #include "gve_adminq.h"
 
+static inline void
+gve_rx_refill(struct gve_rx_queue *rxq)
+{
+	uint16_t mask = rxq->nb_rx_desc - 1;
+	uint16_t idx = rxq->next_avail & mask;
+	uint32_t next_avail = rxq->next_avail;
+	uint16_t nb_alloc, i;
+	struct rte_mbuf *nmb;
+	int diag;
+
+	/* wrap around */
+	nb_alloc = rxq->nb_rx_desc - idx;
+	if (nb_alloc <= rxq->nb_avail) {
+		diag = rte_pktmbuf_alloc_bulk(rxq->mpool, &rxq->sw_ring[idx], nb_alloc);
+		if (diag < 0) {
+			for (i = 0; i < nb_alloc; i++) {
+				nmb = rte_pktmbuf_alloc(rxq->mpool);
+				if (!nmb)
+					break;
+				rxq->sw_ring[idx + i] = nmb;
+			}
+			if (i != nb_alloc)
+				nb_alloc = i;
+		}
+		rxq->nb_avail -= nb_alloc;
+		next_avail += nb_alloc;
+
+		/* queue page list mode doesn't need real refill. */
+		if (rxq->is_gqi_qpl) {
+			idx += nb_alloc;
+		} else {
+			for (i = 0; i < nb_alloc; i++) {
+				nmb = rxq->sw_ring[idx];
+				rxq->rx_data_ring[idx].addr =
+					rte_cpu_to_be_64(rte_mbuf_data_iova(nmb));
+				idx++;
+			}
+		}
+		if (idx == rxq->nb_rx_desc)
+			idx = 0;
+	}
+
+	if (rxq->nb_avail > 0) {
+		nb_alloc = rxq->nb_avail;
+		if (rxq->nb_rx_desc < idx + rxq->nb_avail)
+			nb_alloc = rxq->nb_rx_desc - idx;
+		diag = rte_pktmbuf_alloc_bulk(rxq->mpool, &rxq->sw_ring[idx], nb_alloc);
+		if (diag < 0) {
+			for (i = 0; i < nb_alloc; i++) {
+				nmb = rte_pktmbuf_alloc(rxq->mpool);
+				if (!nmb)
+					break;
+				rxq->sw_ring[idx + i] = nmb;
+			}
+			nb_alloc = i;
+		}
+		rxq->nb_avail -= nb_alloc;
+		next_avail += nb_alloc;
+
+		if (!rxq->is_gqi_qpl) {
+			for (i = 0; i < nb_alloc; i++) {
+				nmb = rxq->sw_ring[idx];
+				rxq->rx_data_ring[idx].addr =
+					rte_cpu_to_be_64(rte_mbuf_data_iova(nmb));
+				idx++;
+			}
+		}
+	}
+
+	if (next_avail != rxq->next_avail) {
+		rte_write32(rte_cpu_to_be_32(next_avail), rxq->qrx_tail);
+		rxq->next_avail = next_avail;
+	}
+}
+
+uint16_t
+gve_rx_burst(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t nb_pkts)
+{
+	volatile struct gve_rx_desc *rxr, *rxd;
+	struct gve_rx_queue *rxq = rx_queue;
+	uint16_t rx_id = rxq->rx_tail;
+	struct rte_mbuf *rxe;
+	uint16_t nb_rx, len;
+	uint64_t addr;
+
+	rxr = rxq->rx_desc_ring;
+
+	for (nb_rx = 0; nb_rx < nb_pkts; nb_rx++) {
+		rxd = &rxr[rx_id];
+		if (GVE_SEQNO(rxd->flags_seq) != rxq->expected_seqno)
+			break;
+
+		if (rxd->flags_seq & GVE_RXF_ERR)
+			continue;
+
+		len = rte_be_to_cpu_16(rxd->len) - GVE_RX_PAD;
+		rxe = rxq->sw_ring[rx_id];
+		rxe->data_off = RTE_PKTMBUF_HEADROOM;
+		if (rxq->is_gqi_qpl) {
+			addr = (uint64_t)(rxq->qpl->mz->addr) + rx_id * PAGE_SIZE + GVE_RX_PAD;
+			rte_memcpy((void *)((uint64_t)rxe->buf_addr + rxe->data_off),
+				   (void *)addr, len);
+		}
+		rxe->nb_segs = 1;
+		rxe->next = NULL;
+		rxe->pkt_len = len;
+		rxe->data_len = len;
+		rxe->port = rxq->port_id;
+		rxe->packet_type = 0;
+		rxe->ol_flags = 0;
+
+		if (rxd->flags_seq & GVE_RXF_TCP)
+			rxe->packet_type |= RTE_PTYPE_L4_TCP;
+		if (rxd->flags_seq & GVE_RXF_UDP)
+			rxe->packet_type |= RTE_PTYPE_L4_UDP;
+		if (rxd->flags_seq & GVE_RXF_IPV4)
+			rxe->packet_type |= RTE_PTYPE_L3_IPV4;
+		if (rxd->flags_seq & GVE_RXF_IPV6)
+			rxe->packet_type |= RTE_PTYPE_L3_IPV6;
+
+		if (gve_needs_rss(rxd->flags_seq)) {
+			rxe->ol_flags |= RTE_MBUF_F_RX_RSS_HASH;
+			rxe->hash.rss = rte_be_to_cpu_32(rxd->rss_hash);
+		}
+
+		rxq->expected_seqno = gve_next_seqno(rxq->expected_seqno);
+
+		rx_id++;
+		if (rx_id == rxq->nb_rx_desc)
+			rx_id = 0;
+
+		rx_pkts[nb_rx] = rxe;
+	}
+
+	rxq->nb_avail += nb_rx;
+	rxq->rx_tail = rx_id;
+
+	if (rxq->nb_avail > rxq->free_thresh)
+		gve_rx_refill(rxq);
+
+	return nb_rx;
+}
+
 static inline void
 gve_reset_rxq(struct gve_rx_queue *rxq)
 {
diff --git a/drivers/net/gve/gve_tx.c b/drivers/net/gve/gve_tx.c
index 947c9d1627..2dc3411672 100644
--- a/drivers/net/gve/gve_tx.c
+++ b/drivers/net/gve/gve_tx.c
@@ -5,6 +5,458 @@
 #include "gve.h"
 #include "gve_adminq.h"
 
+static inline void
+gve_free_bulk_mbuf(struct rte_mbuf **txep, int num)
+{
+	struct rte_mbuf *m, *free[GVE_TX_MAX_FREE_SZ];
+	int nb_free = 0;
+	int i, s;
+
+	if (unlikely(num == 0))
+		return;
+
+	/* Find the first mbuf which needs to be freed */
+	for (s = 0; s < num; s++) {
+		if (txep[s] != NULL) {
+			m = rte_pktmbuf_prefree_seg(txep[s]);
+			if (m != NULL)
+				break;
+		}
+	}
+
+	if (s == num)
+		return;
+
+	free[0] = m;
+	nb_free = 1;
+	for (i = s + 1; i < num; i++) {
+		if (likely(txep[i] != NULL)) {
+			m = rte_pktmbuf_prefree_seg(txep[i]);
+			if (likely(m != NULL)) {
+				if (likely(m->pool == free[0]->pool)) {
+					free[nb_free++] = m;
+				} else {
+					rte_mempool_put_bulk(free[0]->pool, (void *)free, nb_free);
+					free[0] = m;
+					nb_free = 1;
+				}
+			}
+			txep[i] = NULL;
+		}
+	}
+	rte_mempool_put_bulk(free[0]->pool, (void **)free, nb_free);
+}
+
+static inline void
+gve_tx_clean(struct gve_tx_queue *txq)
+{
+	uint16_t mask = txq->nb_tx_desc - 1;
+	uint32_t start = txq->next_to_clean & mask;
+	uint32_t ntc, nb_clean, i;
+	struct gve_tx_iovec *iov;
+
+	ntc = rte_be_to_cpu_32(rte_read32(txq->qtx_head));
+	ntc = ntc & mask;
+
+	if (ntc == start)
+		return;
+
+	/* if wrap around, free twice. */
+	if (ntc < start) {
+		nb_clean = txq->nb_tx_desc - start;
+		if (nb_clean > GVE_TX_MAX_FREE_SZ)
+			nb_clean = GVE_TX_MAX_FREE_SZ;
+		if (txq->is_gqi_qpl) {
+			for (i = start; i < start + nb_clean; i++) {
+				iov = &txq->iov_ring[i];
+				txq->fifo_avail += iov->iov_len;
+				iov->iov_base = 0;
+				iov->iov_len = 0;
+			}
+		} else {
+			gve_free_bulk_mbuf(&txq->sw_ring[start], nb_clean);
+		}
+		txq->nb_free += nb_clean;
+		start += nb_clean;
+		if (start == txq->nb_tx_desc)
+			start = 0;
+		txq->next_to_clean += nb_clean;
+	}
+
+	if (ntc > start) {
+		nb_clean = ntc - start;
+		if (nb_clean > GVE_TX_MAX_FREE_SZ)
+			nb_clean = GVE_TX_MAX_FREE_SZ;
+		if (txq->is_gqi_qpl) {
+			for (i = start; i < start + nb_clean; i++) {
+				iov = &txq->iov_ring[i];
+				txq->fifo_avail += iov->iov_len;
+				iov->iov_base = 0;
+				iov->iov_len = 0;
+			}
+		} else {
+			gve_free_bulk_mbuf(&txq->sw_ring[start], nb_clean);
+		}
+		txq->nb_free += nb_clean;
+		txq->next_to_clean += nb_clean;
+	}
+}
+
+static inline void
+gve_tx_clean_swr_qpl(struct gve_tx_queue *txq)
+{
+	uint32_t start = txq->sw_ntc;
+	uint32_t ntc, nb_clean;
+
+	ntc = txq->sw_tail;
+
+	if (ntc == start)
+		return;
+
+	/* if wrap around, free twice. */
+	if (ntc < start) {
+		nb_clean = txq->nb_tx_desc - start;
+		if (nb_clean > GVE_TX_MAX_FREE_SZ)
+			nb_clean = GVE_TX_MAX_FREE_SZ;
+		gve_free_bulk_mbuf(&txq->sw_ring[start], nb_clean);
+
+		txq->sw_nb_free += nb_clean;
+		start += nb_clean;
+		if (start == txq->nb_tx_desc)
+			start = 0;
+		txq->sw_ntc = start;
+	}
+
+	if (ntc > start) {
+		nb_clean = ntc - start;
+		if (nb_clean > GVE_TX_MAX_FREE_SZ)
+			nb_clean = GVE_TX_MAX_FREE_SZ;
+		gve_free_bulk_mbuf(&txq->sw_ring[start], nb_clean);
+		txq->sw_nb_free += nb_clean;
+		start += nb_clean;
+		txq->sw_ntc = start;
+	}
+}
+
+static inline void
+gve_tx_fill_pkt_desc(volatile union gve_tx_desc *desc, struct rte_mbuf *mbuf,
+		     uint8_t desc_cnt, uint16_t len, uint64_t addr)
+{
+	uint64_t csum_l4 = mbuf->ol_flags & RTE_MBUF_F_TX_L4_MASK;
+	uint8_t l4_csum_offset = 0;
+	uint8_t l4_hdr_offset = 0;
+
+	if (mbuf->ol_flags & RTE_MBUF_F_TX_TCP_SEG)
+		csum_l4 |= RTE_MBUF_F_TX_TCP_CKSUM;
+
+	switch (csum_l4) {
+	case RTE_MBUF_F_TX_TCP_CKSUM:
+		l4_csum_offset = offsetof(struct rte_tcp_hdr, cksum);
+		l4_hdr_offset = mbuf->l2_len + mbuf->l3_len;
+		break;
+	case RTE_MBUF_F_TX_UDP_CKSUM:
+		l4_csum_offset = offsetof(struct rte_udp_hdr, dgram_cksum);
+		l4_hdr_offset = mbuf->l2_len + mbuf->l3_len;
+		break;
+	case RTE_MBUF_F_TX_SCTP_CKSUM:
+		l4_csum_offset = offsetof(struct rte_sctp_hdr, cksum);
+		l4_hdr_offset = mbuf->l2_len + mbuf->l3_len;
+		break;
+	}
+
+	if (mbuf->ol_flags & RTE_MBUF_F_TX_TCP_SEG) {
+		desc->pkt.type_flags = GVE_TXD_TSO | GVE_TXF_L4CSUM;
+		desc->pkt.l4_csum_offset = l4_csum_offset >> 1;
+		desc->pkt.l4_hdr_offset = l4_hdr_offset >> 1;
+	} else if (mbuf->ol_flags & RTE_MBUF_F_TX_L4_MASK) {
+		desc->pkt.type_flags = GVE_TXD_STD | GVE_TXF_L4CSUM;
+		desc->pkt.l4_csum_offset = l4_csum_offset >> 1;
+		desc->pkt.l4_hdr_offset = l4_hdr_offset >> 1;
+	} else {
+		desc->pkt.type_flags = GVE_TXD_STD;
+		desc->pkt.l4_csum_offset = 0;
+		desc->pkt.l4_hdr_offset = 0;
+	}
+	desc->pkt.desc_cnt = desc_cnt;
+	desc->pkt.len = rte_cpu_to_be_16(mbuf->pkt_len);
+	desc->pkt.seg_len = rte_cpu_to_be_16(len);
+	desc->pkt.seg_addr = rte_cpu_to_be_64(addr);
+}
+
+static inline void
+gve_tx_fill_seg_desc(volatile union gve_tx_desc *desc, uint64_t ol_flags,
+		      union gve_tx_offload tx_offload,
+		      uint16_t len, uint64_t addr)
+{
+	desc->seg.type_flags = GVE_TXD_SEG;
+	if (ol_flags & RTE_MBUF_F_TX_TCP_SEG) {
+		if (ol_flags & RTE_MBUF_F_TX_IPV6)
+			desc->seg.type_flags |= GVE_TXSF_IPV6;
+		desc->seg.l3_offset = tx_offload.l2_len >> 1;
+		desc->seg.mss = rte_cpu_to_be_16(tx_offload.tso_segsz);
+	}
+	desc->seg.seg_len = rte_cpu_to_be_16(len);
+	desc->seg.seg_addr = rte_cpu_to_be_64(addr);
+}
+
+static inline bool
+is_fifo_avail(struct gve_tx_queue *txq, uint16_t len)
+{
+	if (txq->fifo_avail < len)
+		return false;
+	/* Don't split segment. */
+	if (txq->fifo_head + len > txq->fifo_size &&
+	    txq->fifo_size - txq->fifo_head + len > txq->fifo_avail)
+		return false;
+	return true;
+}
+static inline uint64_t
+gve_tx_alloc_from_fifo(struct gve_tx_queue *txq, uint16_t tx_id, uint16_t len)
+{
+	uint32_t head = txq->fifo_head;
+	uint32_t size = txq->fifo_size;
+	struct gve_tx_iovec *iov;
+	uint32_t aligned_head;
+	uint32_t iov_len = 0;
+	uint64_t fifo_addr;
+
+	iov = &txq->iov_ring[tx_id];
+
+	/* Don't split segment */
+	if (head + len > size) {
+		iov_len += (size - head);
+		head = 0;
+	}
+
+	fifo_addr = head;
+	iov_len += len;
+	iov->iov_base = head;
+
+	/* Re-align to a cacheline for next head */
+	head += len;
+	aligned_head = RTE_ALIGN(head, RTE_CACHE_LINE_SIZE);
+	iov_len += (aligned_head - head);
+	iov->iov_len = iov_len;
+
+	if (aligned_head == txq->fifo_size)
+		aligned_head = 0;
+	txq->fifo_head = aligned_head;
+	txq->fifo_avail -= iov_len;
+
+	return fifo_addr;
+}
+
+static inline uint16_t
+gve_tx_burst_qpl(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
+{
+	union gve_tx_offload tx_offload = {0};
+	volatile union gve_tx_desc *txr, *txd;
+	struct gve_tx_queue *txq = tx_queue;
+	struct rte_mbuf **sw_ring = txq->sw_ring;
+	uint16_t mask = txq->nb_tx_desc - 1;
+	uint16_t tx_id = txq->tx_tail & mask;
+	uint64_t ol_flags, addr, fifo_addr;
+	uint32_t tx_tail = txq->tx_tail;
+	struct rte_mbuf *tx_pkt, *first;
+	uint16_t sw_id = txq->sw_tail;
+	uint16_t nb_used, i;
+	uint16_t nb_tx = 0;
+	uint32_t hlen;
+
+	txr = txq->tx_desc_ring;
+
+	if (txq->nb_free < txq->free_thresh || txq->fifo_avail == 0)
+		gve_tx_clean(txq);
+
+	if (txq->sw_nb_free < txq->free_thresh)
+		gve_tx_clean_swr_qpl(txq);
+
+	for (nb_tx = 0; nb_tx < nb_pkts; nb_tx++) {
+		tx_pkt = *tx_pkts++;
+		ol_flags = tx_pkt->ol_flags;
+
+		if (txq->sw_nb_free < tx_pkt->nb_segs) {
+			gve_tx_clean_swr_qpl(txq);
+			if (txq->sw_nb_free < tx_pkt->nb_segs)
+				goto end_of_tx;
+		}
+
+		/* Even for multi-segs, use 1 qpl buf for data */
+		nb_used = 1;
+		if (ol_flags & RTE_MBUF_F_TX_TCP_SEG)
+			nb_used++;
+
+		if (txq->nb_free < nb_used)
+			goto end_of_tx;
+
+		tx_offload.l2_len = tx_pkt->l2_len;
+		tx_offload.l3_len = tx_pkt->l3_len;
+		tx_offload.l4_len = tx_pkt->l4_len;
+		tx_offload.tso_segsz = tx_pkt->tso_segsz;
+
+		first = tx_pkt;
+		txd = &txr[tx_id];
+		hlen = ol_flags & RTE_MBUF_F_TX_TCP_SEG ?
+			(uint32_t)(tx_offload.l2_len + tx_offload.l3_len + tx_offload.l4_len) :
+			tx_pkt->pkt_len;
+
+		sw_ring[sw_id] = tx_pkt;
+		if (!is_fifo_avail(txq, hlen)) {
+			gve_tx_clean(txq);
+			if (!is_fifo_avail(txq, hlen))
+				goto end_of_tx;
+		}
+		addr = (uint64_t)(tx_pkt->buf_addr) + tx_pkt->data_off;
+		fifo_addr = gve_tx_alloc_from_fifo(txq, tx_id, hlen);
+
+		/* For TSO, check if there's enough fifo space for data first */
+		if (ol_flags & RTE_MBUF_F_TX_TCP_SEG) {
+			if (!is_fifo_avail(txq, tx_pkt->pkt_len - hlen)) {
+				gve_tx_clean(txq);
+				if (!is_fifo_avail(txq, tx_pkt->pkt_len - hlen))
+					goto end_of_tx;
+			}
+		}
+		if (tx_pkt->nb_segs == 1 || ol_flags & RTE_MBUF_F_TX_TCP_SEG)
+			rte_memcpy((void *)(fifo_addr + txq->fifo_base), (void *)addr, hlen);
+		else
+			rte_pktmbuf_read(tx_pkt, 0, hlen, (void *)(fifo_addr + txq->fifo_base));
+		gve_tx_fill_pkt_desc(txd, tx_pkt, nb_used, hlen, fifo_addr);
+
+		if (ol_flags & RTE_MBUF_F_TX_TCP_SEG) {
+			tx_id = (tx_id + 1) & mask;
+			txd = &txr[tx_id];
+			addr = (uint64_t)(tx_pkt->buf_addr) + tx_pkt->data_off + hlen;
+			fifo_addr = gve_tx_alloc_from_fifo(txq, tx_id, tx_pkt->pkt_len - hlen);
+			if (tx_pkt->nb_segs == 1)
+				rte_memcpy((void *)(fifo_addr + txq->fifo_base), (void *)addr,
+					   tx_pkt->pkt_len - hlen);
+			else
+				rte_pktmbuf_read(tx_pkt, hlen, tx_pkt->pkt_len - hlen,
+						 (void *)(fifo_addr + txq->fifo_base));
+
+			gve_tx_fill_seg_desc(txd, ol_flags, tx_offload,
+					     tx_pkt->pkt_len - hlen, fifo_addr);
+		}
+
+		/* record mbuf in sw_ring for free */
+		for (i = 1; i < first->nb_segs; i++) {
+			sw_id = (sw_id + 1) & mask;
+			tx_pkt = tx_pkt->next;
+			sw_ring[sw_id] = tx_pkt;
+		}
+
+		sw_id = (sw_id + 1) & mask;
+		tx_id = (tx_id + 1) & mask;
+
+		txq->nb_free -= nb_used;
+		txq->sw_nb_free -= first->nb_segs;
+		tx_tail += nb_used;
+	}
+
+end_of_tx:
+	if (nb_tx) {
+		rte_write32(rte_cpu_to_be_32(tx_tail), txq->qtx_tail);
+		txq->tx_tail = tx_tail;
+		txq->sw_tail = sw_id;
+	}
+
+	return nb_tx;
+}
+
+static inline uint16_t
+gve_tx_burst_ra(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
+{
+	union gve_tx_offload tx_offload = {0};
+	volatile union gve_tx_desc *txr, *txd;
+	struct gve_tx_queue *txq = tx_queue;
+	struct rte_mbuf **sw_ring = txq->sw_ring;
+	uint16_t mask = txq->nb_tx_desc - 1;
+	uint16_t tx_id = txq->tx_tail & mask;
+	uint32_t tx_tail = txq->tx_tail;
+	struct rte_mbuf *tx_pkt, *first;
+	uint16_t nb_used, hlen, i;
+	uint64_t ol_flags, addr;
+	uint16_t nb_tx = 0;
+
+	txr = txq->tx_desc_ring;
+
+	if (txq->nb_free < txq->free_thresh)
+		gve_tx_clean(txq);
+
+	for (nb_tx = 0; nb_tx < nb_pkts; nb_tx++) {
+		tx_pkt = *tx_pkts++;
+		ol_flags = tx_pkt->ol_flags;
+
+		nb_used = tx_pkt->nb_segs;
+		if (ol_flags & RTE_MBUF_F_TX_TCP_SEG)
+			nb_used++;
+
+		if (txq->nb_free < nb_used)
+			goto end_of_tx;
+
+		tx_offload.l2_len = tx_pkt->l2_len;
+		tx_offload.l3_len = tx_pkt->l3_len;
+		tx_offload.l4_len = tx_pkt->l4_len;
+		tx_offload.tso_segsz = tx_pkt->tso_segsz;
+
+		first = tx_pkt;
+		txd = &txr[tx_id];
+
+		hlen = ol_flags & RTE_MBUF_F_TX_TCP_SEG ?
+			(uint32_t)(tx_offload.l2_len + tx_offload.l3_len + tx_offload.l4_len) :
+			tx_pkt->pkt_len;
+		/*
+		 * if tso, the driver needs to fill 2 descs for 1 mbuf
+		 * so only put this mbuf into the 1st tx entry in sw ring
+		 */
+		sw_ring[tx_id] = tx_pkt;
+		addr = rte_mbuf_data_iova(tx_pkt);
+		gve_tx_fill_pkt_desc(txd, tx_pkt, nb_used, hlen, addr);
+
+		if (ol_flags & RTE_MBUF_F_TX_TCP_SEG) {
+			tx_id = (tx_id + 1) & mask;
+			txd = &txr[tx_id];
+			addr = rte_mbuf_data_iova(tx_pkt) + hlen;
+			gve_tx_fill_seg_desc(txd, ol_flags, tx_offload,
+					     tx_pkt->data_len - hlen, addr);
+		}
+
+		for (i = 1; i < first->nb_segs; i++) {
+			tx_id = (tx_id + 1) & mask;
+			txd = &txr[tx_id];
+			tx_pkt = tx_pkt->next;
+			sw_ring[tx_id] = tx_pkt;
+			addr = rte_mbuf_data_iova(tx_pkt);
+			gve_tx_fill_seg_desc(txd, ol_flags, tx_offload,
+					     tx_pkt->data_len, addr);
+		}
+		tx_id = (tx_id + 1) & mask;
+
+		txq->nb_free -= nb_used;
+		tx_tail += nb_used;
+	}
+
+end_of_tx:
+	if (nb_tx) {
+		rte_write32(rte_cpu_to_be_32(tx_tail), txq->qtx_tail);
+		txq->tx_tail = tx_tail;
+	}
+
+	return nb_tx;
+}
+
+uint16_t
+gve_tx_burst(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
+{
+	struct gve_tx_queue *txq = tx_queue;
+
+	if (txq->is_gqi_qpl)
+		return gve_tx_burst_qpl(tx_queue, tx_pkts, nb_pkts);
+
+	return gve_tx_burst_ra(tx_queue, tx_pkts, nb_pkts);
+}
+
 static inline void
 gve_reset_txq(struct gve_tx_queue *txq)
 {
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 192+ messages in thread

* [PATCH 08/10] net/gve: add support to get dev info and configure dev
  2022-07-29 19:30 [PATCH 00/10] introduce GVE PMD Xiaoyun Li
                   ` (6 preceding siblings ...)
  2022-07-29 19:30 ` [PATCH 07/10] net/gve: add Rx/Tx support Xiaoyun Li
@ 2022-07-29 19:30 ` Xiaoyun Li
  2022-07-29 19:30 ` [PATCH 09/10] net/gve: add stats support Xiaoyun Li
  2022-07-29 19:30 ` [PATCH 10/10] doc: update documentation Xiaoyun Li
  9 siblings, 0 replies; 192+ messages in thread
From: Xiaoyun Li @ 2022-07-29 19:30 UTC (permalink / raw)
  To: junfeng.guo, qi.z.zhang, awogbemila, bruce.richardson; +Cc: dev, Xiaoyun Li

Add the dev_infos_get dev_ops callback.
Complete dev_configure with Rx offload configuration.

Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com>
---
 drivers/net/gve/gve.h        |  3 ++
 drivers/net/gve/gve_ethdev.c | 61 ++++++++++++++++++++++++++++++++++++
 2 files changed, 64 insertions(+)
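
For reference, not part of the patch: a minimal sketch of how an application
exercises the new dev_infos_get and dev_configure paths. The port is assumed
to be already probed, "mp" to be an initialized mbuf mempool, and the helper
name is made up for the example.

#include <rte_ethdev.h>
#include <rte_mempool.h>

static int
configure_port(uint16_t port_id, struct rte_mempool *mp)
{
	struct rte_eth_conf conf = {
		/* Requesting RSS makes gve_dev_configure() enable
		 * RTE_ETH_RX_OFFLOAD_RSS_HASH, as added in this patch. */
		.rxmode = { .mq_mode = RTE_ETH_MQ_RX_RSS, },
	};
	struct rte_eth_dev_info dev_info;
	int ret;

	ret = rte_eth_dev_info_get(port_id, &dev_info);
	if (ret != 0)
		return ret;

	ret = rte_eth_dev_configure(port_id, 1, 1, &conf);
	if (ret != 0)
		return ret;

	/* Ring sizes must match the device-reported descriptor counts:
	 * dev_info reports nb_min == nb_max for this device. */
	ret = rte_eth_rx_queue_setup(port_id, 0, dev_info.rx_desc_lim.nb_max,
				     rte_eth_dev_socket_id(port_id), NULL, mp);
	if (ret != 0)
		return ret;
	ret = rte_eth_tx_queue_setup(port_id, 0, dev_info.tx_desc_lim.nb_max,
				     rte_eth_dev_socket_id(port_id), NULL);
	if (ret != 0)
		return ret;

	return rte_eth_dev_start(port_id);
}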

diff --git a/drivers/net/gve/gve.h b/drivers/net/gve/gve.h
index 7f4d0e37f3..004e0a75ca 100644
--- a/drivers/net/gve/gve.h
+++ b/drivers/net/gve/gve.h
@@ -27,6 +27,9 @@
 #define GVE_DEFAULT_TX_FREE_THRESH  256
 #define GVE_TX_MAX_FREE_SZ          512
 
+#define GVE_MIN_BUF_SIZE	    1024
+#define GVE_MAX_RX_PKTLEN	    65535
+
 /* PTYPEs are always 10 bits. */
 #define GVE_NUM_PTYPES	1024
 
diff --git a/drivers/net/gve/gve_ethdev.c b/drivers/net/gve/gve_ethdev.c
index 5ebe2c30ea..6bc7bf4519 100644
--- a/drivers/net/gve/gve_ethdev.c
+++ b/drivers/net/gve/gve_ethdev.c
@@ -96,6 +96,14 @@ gve_free_qpls(struct gve_priv *priv)
 static int
 gve_dev_configure(__rte_unused struct rte_eth_dev *dev)
 {
+	struct gve_priv *priv = dev->data->dev_private;
+
+	if (dev->data->dev_conf.rxmode.mq_mode & RTE_ETH_MQ_RX_RSS_FLAG)
+		dev->data->dev_conf.rxmode.offloads |= RTE_ETH_RX_OFFLOAD_RSS_HASH;
+
+	if (dev->data->dev_conf.rxmode.offloads & RTE_ETH_RX_OFFLOAD_TCP_LRO)
+		priv->enable_lsc = 1;
+
 	return 0;
 }
 
@@ -266,6 +274,58 @@ gve_dev_close(struct rte_eth_dev *dev)
 	return err;
 }
 
+static int
+gve_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
+{
+	struct gve_priv *priv = dev->data->dev_private;
+
+	dev_info->device = dev->device;
+	dev_info->max_mac_addrs = 1;
+	dev_info->max_rx_queues = priv->max_nb_rxq;
+	dev_info->max_tx_queues = priv->max_nb_txq;
+	dev_info->min_rx_bufsize = GVE_MIN_BUF_SIZE;
+	dev_info->max_rx_pktlen = GVE_MAX_RX_PKTLEN;
+
+	dev_info->rx_offload_capa = 0;
+	dev_info->tx_offload_capa =
+		RTE_ETH_TX_OFFLOAD_MULTI_SEGS |
+		RTE_ETH_TX_OFFLOAD_IPV4_CKSUM |
+		RTE_ETH_TX_OFFLOAD_UDP_CKSUM |
+		RTE_ETH_TX_OFFLOAD_TCP_CKSUM |
+		RTE_ETH_TX_OFFLOAD_SCTP_CKSUM |
+		RTE_ETH_TX_OFFLOAD_TCP_TSO;
+
+	if (priv->queue_format == GVE_DQO_RDA_FORMAT)
+		dev_info->rx_offload_capa |= RTE_ETH_RX_OFFLOAD_TCP_LRO;
+
+	dev_info->default_rxconf = (struct rte_eth_rxconf) {
+		.rx_free_thresh = GVE_DEFAULT_RX_FREE_THRESH,
+		.rx_drop_en = 0,
+		.offloads = 0,
+	};
+
+	dev_info->default_txconf = (struct rte_eth_txconf) {
+		.tx_free_thresh = GVE_DEFAULT_TX_FREE_THRESH,
+		.offloads = 0,
+	};
+
+	dev_info->default_rxportconf.ring_size = priv->rx_desc_cnt;
+	dev_info->rx_desc_lim = (struct rte_eth_desc_lim) {
+		.nb_max = priv->rx_desc_cnt,
+		.nb_min = priv->rx_desc_cnt,
+		.nb_align = 1,
+	};
+
+	dev_info->default_txportconf.ring_size = priv->tx_desc_cnt;
+	dev_info->tx_desc_lim = (struct rte_eth_desc_lim) {
+		.nb_max = priv->tx_desc_cnt,
+		.nb_min = priv->tx_desc_cnt,
+		.nb_align = 1,
+	};
+
+	return 0;
+}
+
 static int
 gve_dev_mtu_set(struct rte_eth_dev *dev, uint16_t mtu)
 {
@@ -299,6 +359,7 @@ static const struct eth_dev_ops gve_eth_dev_ops = {
 	.dev_start            = gve_dev_start,
 	.dev_stop             = gve_dev_stop,
 	.dev_close            = gve_dev_close,
+	.dev_infos_get        = gve_dev_info_get,
 	.rx_queue_setup       = gve_rx_queue_setup,
 	.tx_queue_setup       = gve_tx_queue_setup,
 	.link_update          = gve_link_update,
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 192+ messages in thread

* [PATCH 09/10] net/gve: add stats support
  2022-07-29 19:30 [PATCH 00/10] introduce GVE PMD Xiaoyun Li
                   ` (7 preceding siblings ...)
  2022-07-29 19:30 ` [PATCH 08/10] net/gve: add support to get dev info and configure dev Xiaoyun Li
@ 2022-07-29 19:30 ` Xiaoyun Li
  2022-07-29 19:30 ` [PATCH 10/10] doc: update documentation Xiaoyun Li
  9 siblings, 0 replies; 192+ messages in thread
From: Xiaoyun Li @ 2022-07-29 19:30 UTC (permalink / raw)
  To: junfeng.guo, qi.z.zhang, awogbemila, bruce.richardson; +Cc: dev, Xiaoyun Li

Update queue stats in the Rx/Tx paths and add support for the
stats_get/stats_reset dev_ops.

Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com>
---
 drivers/net/gve/gve.h        | 10 ++++++
 drivers/net/gve/gve_ethdev.c | 69 ++++++++++++++++++++++++++++++++++++
 drivers/net/gve/gve_rx.c     | 15 ++++++--
 drivers/net/gve/gve_tx.c     | 12 +++++++
 4 files changed, 104 insertions(+), 2 deletions(-)
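
For reference, not part of the patch: the new counters surface through the
standard ethdev stats API. A minimal sketch, assuming a started port; the
helper name is made up for the example.

#include <stdio.h>
#include <inttypes.h>
#include <rte_ethdev.h>

static void
dump_and_clear_stats(uint16_t port_id)
{
	struct rte_eth_stats stats;

	if (rte_eth_stats_get(port_id, &stats) != 0)
		return;

	printf("rx: %" PRIu64 " pkts, %" PRIu64 " bytes, %" PRIu64
	       " errors, %" PRIu64 " rx_nombuf\n",
	       stats.ipackets, stats.ibytes, stats.ierrors, stats.rx_nombuf);
	printf("tx: %" PRIu64 " pkts, %" PRIu64 " bytes\n",
	       stats.opackets, stats.obytes);

	/* Per-queue counters are only filled for the first
	 * RTE_ETHDEV_QUEUE_STAT_CNTRS queues, as in gve_dev_stats_get(). */
	printf("rxq0: %" PRIu64 " pkts\n", stats.q_ipackets[0]);

	rte_eth_stats_reset(port_id);
}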

diff --git a/drivers/net/gve/gve.h b/drivers/net/gve/gve.h
index 004e0a75ca..e256a2bec2 100644
--- a/drivers/net/gve/gve.h
+++ b/drivers/net/gve/gve.h
@@ -91,6 +91,10 @@ struct gve_tx_queue {
 	struct gve_queue_page_list *qpl;
 	struct gve_tx_iovec *iov_ring;
 
+	/* Stats */
+	uint64_t packets;
+	uint64_t bytes;
+
 	uint16_t port_id;
 	uint16_t queue_id;
 
@@ -129,6 +133,12 @@ struct gve_rx_queue {
 	/* only valid for GQI_QPL queue format */
 	struct gve_queue_page_list *qpl;
 
+	/* stats */
+	uint64_t no_mbufs;
+	uint64_t errors;
+	uint64_t packets;
+	uint64_t bytes;
+
 	struct gve_priv *hw;
 	const struct rte_memzone *qres_mz;
 	struct gve_queue_resources *qres;
diff --git a/drivers/net/gve/gve_ethdev.c b/drivers/net/gve/gve_ethdev.c
index 6bc7bf4519..2977df01f1 100644
--- a/drivers/net/gve/gve_ethdev.c
+++ b/drivers/net/gve/gve_ethdev.c
@@ -326,6 +326,73 @@ gve_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
 	return 0;
 }
 
+static int
+gve_dev_stats_get(struct rte_eth_dev *dev, struct rte_eth_stats *stats)
+{
+	uint16_t i;
+
+	for (i = 0; i < dev->data->nb_tx_queues; i++) {
+		struct gve_tx_queue *txq = dev->data->tx_queues[i];
+		if (txq == NULL)
+			continue;
+
+		stats->opackets += txq->packets;
+		stats->obytes += txq->bytes;
+
+		if (i < RTE_ETHDEV_QUEUE_STAT_CNTRS) {
+			stats->q_opackets[i] = txq->packets;
+			stats->q_obytes[i] = txq->bytes;
+		}
+	}
+
+	for (i = 0; i < dev->data->nb_rx_queues; i++) {
+		struct gve_rx_queue *rxq = dev->data->rx_queues[i];
+		if (rxq == NULL)
+			continue;
+
+		stats->ipackets += rxq->packets;
+		stats->ibytes += rxq->bytes;
+		stats->ierrors += rxq->errors;
+		stats->rx_nombuf += rxq->no_mbufs;
+
+		if (i < RTE_ETHDEV_QUEUE_STAT_CNTRS) {
+			stats->q_ipackets[i] = rxq->packets;
+			stats->q_ibytes[i] = rxq->bytes;
+			stats->q_errors[i] = rxq->errors;
+		}
+	}
+
+	return 0;
+}
+
+static int
+gve_dev_stats_reset(struct rte_eth_dev *dev)
+{
+	uint16_t i;
+
+	for (i = 0; i < dev->data->nb_tx_queues; i++) {
+		struct gve_tx_queue *txq = dev->data->tx_queues[i];
+		if (txq == NULL)
+			continue;
+
+		txq->packets  = 0;
+		txq->bytes = 0;
+	}
+
+	for (i = 0; i < dev->data->nb_rx_queues; i++) {
+		struct gve_rx_queue *rxq = dev->data->rx_queues[i];
+		if (rxq == NULL)
+			continue;
+
+		rxq->packets  = 0;
+		rxq->bytes = 0;
+		rxq->no_mbufs = 0;
+		rxq->errors = 0;
+	}
+
+	return 0;
+}
+
 static int
 gve_dev_mtu_set(struct rte_eth_dev *dev, uint16_t mtu)
 {
@@ -363,6 +430,8 @@ static const struct eth_dev_ops gve_eth_dev_ops = {
 	.rx_queue_setup       = gve_rx_queue_setup,
 	.tx_queue_setup       = gve_tx_queue_setup,
 	.link_update          = gve_link_update,
+	.stats_get            = gve_dev_stats_get,
+	.stats_reset          = gve_dev_stats_reset,
 	.mtu_set              = gve_dev_mtu_set,
 };
 
diff --git a/drivers/net/gve/gve_rx.c b/drivers/net/gve/gve_rx.c
index 8f560ae592..3a8a869980 100644
--- a/drivers/net/gve/gve_rx.c
+++ b/drivers/net/gve/gve_rx.c
@@ -26,8 +26,10 @@ gve_rx_refill(struct gve_rx_queue *rxq)
 					break;
 				rxq->sw_ring[idx + i] = nmb;
 			}
-			if (i != nb_alloc)
+			if (i != nb_alloc) {
+				rxq->no_mbufs += nb_alloc - i;
 				nb_alloc = i;
+			}
 		}
 		rxq->nb_avail -= nb_alloc;
 		next_avail += nb_alloc;
@@ -88,6 +90,7 @@ gve_rx_burst(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t nb_pkts)
 	uint16_t rx_id = rxq->rx_tail;
 	struct rte_mbuf *rxe;
 	uint16_t nb_rx, len;
+	uint64_t bytes = 0;
 	uint64_t addr;
 
 	rxr = rxq->rx_desc_ring;
@@ -97,8 +100,10 @@ gve_rx_burst(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t nb_pkts)
 		if (GVE_SEQNO(rxd->flags_seq) != rxq->expected_seqno)
 			break;
 
-		if (rxd->flags_seq & GVE_RXF_ERR)
+		if (rxd->flags_seq & GVE_RXF_ERR) {
+			rxq->errors++;
 			continue;
+		}
 
 		len = rte_be_to_cpu_16(rxd->len) - GVE_RX_PAD;
 		rxe = rxq->sw_ring[rx_id];
@@ -137,6 +142,7 @@ gve_rx_burst(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t nb_pkts)
 			rx_id = 0;
 
 		rx_pkts[nb_rx] = rxe;
+		bytes += len;
 	}
 
 	rxq->nb_avail += nb_rx;
@@ -145,6 +151,11 @@ gve_rx_burst(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t nb_pkts)
 	if (rxq->nb_avail > rxq->free_thresh)
 		gve_rx_refill(rxq);
 
+	if (nb_rx) {
+		rxq->packets += nb_rx;
+		rxq->bytes += bytes;
+	}
+
 	return nb_rx;
 }
 
diff --git a/drivers/net/gve/gve_tx.c b/drivers/net/gve/gve_tx.c
index 2dc3411672..d99e6eb009 100644
--- a/drivers/net/gve/gve_tx.c
+++ b/drivers/net/gve/gve_tx.c
@@ -260,6 +260,7 @@ gve_tx_burst_qpl(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 	struct rte_mbuf *tx_pkt, *first;
 	uint16_t sw_id = txq->sw_tail;
 	uint16_t nb_used, i;
+	uint64_t bytes = 0;
 	uint16_t nb_tx = 0;
 	uint32_t hlen;
 
@@ -352,6 +353,8 @@ gve_tx_burst_qpl(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 		txq->nb_free -= nb_used;
 		txq->sw_nb_free -= first->nb_segs;
 		tx_tail += nb_used;
+
+		bytes += first->pkt_len;
 	}
 
 end_of_tx:
@@ -359,6 +362,9 @@ gve_tx_burst_qpl(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 		rte_write32(rte_cpu_to_be_32(tx_tail), txq->qtx_tail);
 		txq->tx_tail = tx_tail;
 		txq->sw_tail = sw_id;
+
+		txq->packets += nb_tx;
+		txq->bytes += bytes;
 	}
 
 	return nb_tx;
@@ -377,6 +383,7 @@ gve_tx_burst_ra(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 	struct rte_mbuf *tx_pkt, *first;
 	uint16_t nb_used, hlen, i;
 	uint64_t ol_flags, addr;
+	uint64_t bytes = 0;
 	uint16_t nb_tx = 0;
 
 	txr = txq->tx_desc_ring;
@@ -435,12 +442,17 @@ gve_tx_burst_ra(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 
 		txq->nb_free -= nb_used;
 		tx_tail += nb_used;
+
+		bytes += first->pkt_len;
 	}
 
 end_of_tx:
 	if (nb_tx) {
 		rte_write32(rte_cpu_to_be_32(tx_tail), txq->qtx_tail);
 		txq->tx_tail = tx_tail;
+
+		txq->packets += nb_tx;
+		txq->bytes += bytes;
 	}
 
 	return nb_tx;
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 192+ messages in thread

* [PATCH 10/10] doc: update documentation
  2022-07-29 19:30 [PATCH 00/10] introduce GVE PMD Xiaoyun Li
                   ` (8 preceding siblings ...)
  2022-07-29 19:30 ` [PATCH 09/10] net/gve: add stats support Xiaoyun Li
@ 2022-07-29 19:30 ` Xiaoyun Li
  9 siblings, 0 replies; 192+ messages in thread
From: Xiaoyun Li @ 2022-07-29 19:30 UTC (permalink / raw)
  To: junfeng.guo, qi.z.zhang, awogbemila, bruce.richardson; +Cc: dev, Xiaoyun Li

Update the GVE PMD documentation and the release notes.
Add Junfeng Guo as GVE PMD maintainer, since he will take over development
and maintenance of the PMD and I will no longer be available to maintain it.

Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com>
---
 MAINTAINERS                            |  6 +++
 doc/guides/nics/features/gve.ini       | 18 +++++++
 doc/guides/nics/gve.rst                | 65 ++++++++++++++++++++++++++
 doc/guides/rel_notes/release_22_11.rst |  4 ++
 4 files changed, 93 insertions(+)
 create mode 100644 doc/guides/nics/features/gve.ini
 create mode 100644 doc/guides/nics/gve.rst

diff --git a/MAINTAINERS b/MAINTAINERS
index 32ffdd1a61..474f41f0de 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -697,6 +697,12 @@ F: drivers/net/enic/
 F: doc/guides/nics/enic.rst
 F: doc/guides/nics/features/enic.ini
 
+Google Virtual Ethernet
+M: Junfeng Guo <junfeng.guo@intel.com>
+F: drivers/net/gve/
+F: doc/guides/nics/gve.rst
+F: doc/guides/nics/features/gve.ini
+
 Hisilicon hns3
 M: Dongdong Liu <liudongdong3@huawei.com>
 M: Yisen Zhuang <yisen.zhuang@huawei.com>
diff --git a/doc/guides/nics/features/gve.ini b/doc/guides/nics/features/gve.ini
new file mode 100644
index 0000000000..180408aa80
--- /dev/null
+++ b/doc/guides/nics/features/gve.ini
@@ -0,0 +1,18 @@
+;
+; Supported features of the Google Virtual Ethernet 'gve' poll mode driver.
+;
+; Refer to default.ini for the full list of available PMD features.
+;
+[Features]
+Speed capabilities   = Y
+Link status          = Y
+MTU update           = Y
+TSO                  = Y
+RSS hash             = Y
+L4 checksum offload  = Y
+Basic stats          = Y
+Stats per queue      = Y
+Linux                = Y
+x86-32               = Y
+x86-64               = Y
+Usage doc            = Y
diff --git a/doc/guides/nics/gve.rst b/doc/guides/nics/gve.rst
new file mode 100644
index 0000000000..310328c8ab
--- /dev/null
+++ b/doc/guides/nics/gve.rst
@@ -0,0 +1,65 @@
+..  SPDX-License-Identifier: BSD-3-Clause
+    Copyright(C) 2022 Intel Corporation.
+
+GVE poll mode driver
+====================
+
+The GVE PMD (**librte_net_gve**) provides poll mode driver support for
+Google Virtual Ethernet devices.
+
+The base code is under MIT license and based on GVE kernel driver v1.3.0.
+GVE base code files are:
+
+- gve_adminq.h
+- gve_adminq.c
+- gve_register.h
+- gve_desc.h
+- gve_desc_dqo.h
+
+Please refer to https://github.com/GoogleCloudPlatform/compute-virtual-ethernet-linux/tree/v1.3.0/google/gve
+to find the original base code.
+
+GVE has 3 queue formats:
+
+- GQI_QPL
+- GQI_RDA
+- DQO_RDA
+
+GQI_QPL is the queue page list mode. The driver first allocates memory
+and registers it with the hardware (Google Hypervisor/GVE Backend) as a
+Queue Page List (QPL). Each queue has its own QPL.
+On Tx, the driver copies each packet into the QPL memory and writes the
+packet's offset within the QPL into the hardware descriptor so that the
+hardware can fetch the packet data. On Rx, the driver reads the offset
+from the descriptor and copies the packet data out of the QPL memory.
+
+GQI_RDA queue format works like a conventional NIC: the driver puts the
+packets' physical addresses directly into the hardware descriptors.
+
+DQO_RDA queue format has a submission and completion queue pair for each
+Tx/Rx queue. Similar to GQI_RDA, the driver puts the packets' physical
+addresses into the hardware descriptors.
+
+Please refer to https://www.kernel.org/doc/html/latest/networking/device_drivers/ethernet/google/gve.html
+to get more information about GVE queue formats.
+
+Features and Limitations
+------------------------
+
+In this release, the GVE PMD provides the basic functionality of packet
+reception and transmission.
+Supported features of the GVE PMD are:
+
+- Multiple queues for TX and RX
+- Receiver Side Scaling (RSS)
+- TSO offload
+- Port hardware statistics
+- Link state information
+- TX multi-segments (Scatter TX)
+- Tx UDP/TCP/SCTP Checksum
+
+Currently, only the GQI_QPL and GQI_RDA queue formats are supported in the
+PMD. Jumbo Frame is not supported yet; it will be added in a future DPDK
+release.
+Also, only the GQI_QPL queue format is in use on GCP, since GQI_RDA has not
+been released in production.
diff --git a/doc/guides/rel_notes/release_22_11.rst b/doc/guides/rel_notes/release_22_11.rst
index 8c021cf050..6674f4cf6f 100644
--- a/doc/guides/rel_notes/release_22_11.rst
+++ b/doc/guides/rel_notes/release_22_11.rst
@@ -55,6 +55,10 @@ New Features
      Also, make sure to start the actual text at the margin.
      =======================================================
 
+* **Added GVE net PMD**
+
+  Added the new ``gve`` net driver for Google Virtual Ethernet devices.
+  See the :doc:`../nics/gve` NIC guide for more details on this new driver.
 
 Removed Items
 -------------
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 192+ messages in thread

* Re: [PATCH 01/10] net/gve: introduce GVE PMD base code
  2022-07-29 19:30 ` [PATCH 01/10] net/gve: introduce GVE PMD base code Xiaoyun Li
@ 2022-07-29 22:42   ` Stephen Hemminger
  2022-07-29 22:45     ` Stephen Hemminger
  2022-08-29  8:41   ` [PATCH v2 00/10] introduce GVE PMD Junfeng Guo
  1 sibling, 1 reply; 192+ messages in thread
From: Stephen Hemminger @ 2022-07-29 22:42 UTC (permalink / raw)
  To: Xiaoyun Li
  Cc: junfeng.guo, qi.z.zhang, awogbemila, bruce.richardson, dev, Haiyue Wang

On Fri, 29 Jul 2022 19:30:33 +0000
Xiaoyun Li <xiaoyun.li@intel.com> wrote:

> diff --git a/drivers/net/gve/gve_adminq.c b/drivers/net/gve/gve_adminq.c
> new file mode 100644
> index 0000000000..8a724f12c6
> --- /dev/null
> +++ b/drivers/net/gve/gve_adminq.c
> @@ -0,0 +1,925 @@
> +/* SPDX-License-Identifier: MIT
> + * Google Virtual Ethernet (gve) driver
> + * Version: 1.3.0
> + * Copyright (C) 2015-2022 Google, Inc.
> + * Copyright(C) 2022 Intel Corporation
> + */
> +

This would require special license exception approval by the tech board.
Can you make it GPL or dual licensed instead please?

^ permalink raw reply	[flat|nested] 192+ messages in thread

* Re: [PATCH 01/10] net/gve: introduce GVE PMD base code
  2022-07-29 22:42   ` Stephen Hemminger
@ 2022-07-29 22:45     ` Stephen Hemminger
  2022-08-23  8:44       ` Guo, Junfeng
  0 siblings, 1 reply; 192+ messages in thread
From: Stephen Hemminger @ 2022-07-29 22:45 UTC (permalink / raw)
  To: Xiaoyun Li
  Cc: junfeng.guo, qi.z.zhang, awogbemila, bruce.richardson, dev, Haiyue Wang

On Fri, 29 Jul 2022 15:42:48 -0700
Stephen Hemminger <stephen@networkplumber.org> wrote:

> On Fri, 29 Jul 2022 19:30:33 +0000
> Xiaoyun Li <xiaoyun.li@intel.com> wrote:
> 
> > diff --git a/drivers/net/gve/gve_adminq.c b/drivers/net/gve/gve_adminq.c
> > new file mode 100644
> > index 0000000000..8a724f12c6
> > --- /dev/null
> > +++ b/drivers/net/gve/gve_adminq.c
> > @@ -0,0 +1,925 @@
> > +/* SPDX-License-Identifier: MIT
> > + * Google Virtual Ethernet (gve) driver
> > + * Version: 1.3.0
> > + * Copyright (C) 2015-2022 Google, Inc.
> > + * Copyright(C) 2022 Intel Corporation
> > + */
> > +  
> 
> This would require special license exception approval by the tech board.
> Can you make it GPL or dual licensed instead please?

I meant BSD

^ permalink raw reply	[flat|nested] 192+ messages in thread

* RE: [PATCH 01/10] net/gve: introduce GVE PMD base code
  2022-07-29 22:45     ` Stephen Hemminger
@ 2022-08-23  8:44       ` Guo, Junfeng
  0 siblings, 0 replies; 192+ messages in thread
From: Guo, Junfeng @ 2022-08-23  8:44 UTC (permalink / raw)
  To: Stephen Hemminger, Richardson, Bruce
  Cc: Zhang, Qi Z, awogbemila, dev, Wang, Haiyue, Li, Xiaoyun

Hi Bruce,

Could you help give some comments about the License?
Thanks!

Regards,
Junfeng Guo

> -----Original Message-----
> From: Stephen Hemminger <stephen@networkplumber.org>
> Sent: Saturday, July 30, 2022 06:45
> To: Li, Xiaoyun <xiaoyun.li@intel.com>
> Cc: Guo, Junfeng <junfeng.guo@intel.com>; Zhang, Qi Z
> <qi.z.zhang@intel.com>; awogbemila@google.com; Richardson, Bruce
> <bruce.richardson@intel.com>; dev@dpdk.org; Wang, Haiyue
> <haiyue.wang@intel.com>
> Subject: Re: [PATCH 01/10] net/gve: introduce GVE PMD base code
> 
> On Fri, 29 Jul 2022 15:42:48 -0700
> Stephen Hemminger <stephen@networkplumber.org> wrote:
> 
> > On Fri, 29 Jul 2022 19:30:33 +0000
> > Xiaoyun Li <xiaoyun.li@intel.com> wrote:
> >
> > > diff --git a/drivers/net/gve/gve_adminq.c
> b/drivers/net/gve/gve_adminq.c
> > > new file mode 100644
> > > index 0000000000..8a724f12c6
> > > --- /dev/null
> > > +++ b/drivers/net/gve/gve_adminq.c
> > > @@ -0,0 +1,925 @@
> > > +/* SPDX-License-Identifier: MIT
> > > + * Google Virtual Ethernet (gve) driver
> > > + * Version: 1.3.0
> > > + * Copyright (C) 2015-2022 Google, Inc.
> > > + * Copyright(C) 2022 Intel Corporation
> > > + */
> > > +
> >
> > This would require special license exception approval by the tech board.
> > Can you make it GPL or dual licensed instead please?
> 
> I meant BSD

^ permalink raw reply	[flat|nested] 192+ messages in thread

* [PATCH v2 00/10] introduce GVE PMD
  2022-07-29 19:30 ` [PATCH 01/10] net/gve: introduce GVE PMD base code Xiaoyun Li
  2022-07-29 22:42   ` Stephen Hemminger
@ 2022-08-29  8:41   ` Junfeng Guo
  2022-08-29  8:41     ` [PATCH v2 01/10] net/gve: introduce GVE PMD base code Junfeng Guo
                       ` (10 more replies)
  1 sibling, 11 replies; 192+ messages in thread
From: Junfeng Guo @ 2022-08-29  8:41 UTC (permalink / raw)
  To: qi.z.zhang, jingjing.wu
  Cc: ferruh.yigit, dev, xiaoyun.li, awogbemila, bruce.richardson, junfeng.guo

Introduce a new PMD for Google Virtual Ethernet (GVE).

This patch set requires an exception for MIT license for GVE base code.
And the base code includes the following files:
	- gve_adminq.c
	- gve_adminq.h
	- gve_desc.h
	- gve_desc_dqo.h
	- gve_register.h

It's based on GVE kernel driver v1.3.0 and the original code is in
https://github.com/GoogleCloudPlatform/compute-virtual-ethernet-linux/tree/v1.3.0

v2:
fix some CI check errors.

Junfeng Guo (10):
  net/gve: introduce GVE PMD base code
  net/gve: add logs and OS specific implementation
  net/gve: support device initialization
  net/gve: add link update support
  net/gve: add MTU set support
  net/gve: add queue operations
  net/gve: add Rx/Tx support
  net/gve: add support to get dev info and configure dev
  net/gve: add stats support
  doc: update documentation

 MAINTAINERS                            |   6 +
 doc/guides/nics/features/gve.ini       |  18 +
 doc/guides/nics/gve.rst                |  65 ++
 doc/guides/rel_notes/release_22_11.rst |   4 +
 drivers/net/gve/gve.h                  | 331 +++++++++
 drivers/net/gve/gve_adminq.c           | 926 +++++++++++++++++++++++++
 drivers/net/gve/gve_adminq.h           | 383 ++++++++++
 drivers/net/gve/gve_desc.h             | 139 ++++
 drivers/net/gve/gve_desc_dqo.h         | 256 +++++++
 drivers/net/gve/gve_ethdev.c           | 772 +++++++++++++++++++++
 drivers/net/gve/gve_logs.h             |  22 +
 drivers/net/gve/gve_osdep.h            | 149 ++++
 drivers/net/gve/gve_register.h         |  30 +
 drivers/net/gve/gve_rx.c               | 366 ++++++++++
 drivers/net/gve/gve_tx.c               | 678 ++++++++++++++++++
 drivers/net/gve/meson.build            |  15 +
 drivers/net/gve/version.map            |   3 +
 drivers/net/meson.build                |   1 +
 18 files changed, 4164 insertions(+)
 create mode 100644 doc/guides/nics/features/gve.ini
 create mode 100644 doc/guides/nics/gve.rst
 create mode 100644 drivers/net/gve/gve.h
 create mode 100644 drivers/net/gve/gve_adminq.c
 create mode 100644 drivers/net/gve/gve_adminq.h
 create mode 100644 drivers/net/gve/gve_desc.h
 create mode 100644 drivers/net/gve/gve_desc_dqo.h
 create mode 100644 drivers/net/gve/gve_ethdev.c
 create mode 100644 drivers/net/gve/gve_logs.h
 create mode 100644 drivers/net/gve/gve_osdep.h
 create mode 100644 drivers/net/gve/gve_register.h
 create mode 100644 drivers/net/gve/gve_rx.c
 create mode 100644 drivers/net/gve/gve_tx.c
 create mode 100644 drivers/net/gve/meson.build
 create mode 100644 drivers/net/gve/version.map

-- 
2.34.1


^ permalink raw reply	[flat|nested] 192+ messages in thread

* [PATCH v2 01/10] net/gve: introduce GVE PMD base code
  2022-08-29  8:41   ` [PATCH v2 00/10] introduce GVE PMD Junfeng Guo
@ 2022-08-29  8:41     ` Junfeng Guo
  2022-09-01 17:19       ` Ferruh Yigit
  2022-08-29  8:41     ` [PATCH v2 02/10] net/gve: add logs and OS specific implementation Junfeng Guo
                       ` (9 subsequent siblings)
  10 siblings, 1 reply; 192+ messages in thread
From: Junfeng Guo @ 2022-08-29  8:41 UTC (permalink / raw)
  To: qi.z.zhang, jingjing.wu
  Cc: ferruh.yigit, dev, xiaoyun.li, awogbemila, bruce.richardson,
	junfeng.guo, Haiyue Wang

The following base code is based on Google Virtual Ethernet (gve)
driver v1.3.0 under MIT license.
  - gve_adminq.c
  - gve_adminq.h
  - gve_desc.h
  - gve_desc_dqo.h
  - gve_register.h

The original code is in:
https://github.com/GoogleCloudPlatform/compute-virtual-ethernet-linux/\
tree/v1.3.0/google/gve

Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com>
Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Junfeng Guo <junfeng.guo@intel.com>
---
 drivers/net/gve/gve_adminq.c   | 925 +++++++++++++++++++++++++++++++++
 drivers/net/gve/gve_adminq.h   | 381 ++++++++++++++
 drivers/net/gve/gve_desc.h     | 137 +++++
 drivers/net/gve/gve_desc_dqo.h | 254 +++++++++
 drivers/net/gve/gve_register.h |  28 +
 5 files changed, 1725 insertions(+)
 create mode 100644 drivers/net/gve/gve_adminq.c
 create mode 100644 drivers/net/gve/gve_adminq.h
 create mode 100644 drivers/net/gve/gve_desc.h
 create mode 100644 drivers/net/gve/gve_desc_dqo.h
 create mode 100644 drivers/net/gve/gve_register.h
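
For reference, not part of the patch: a rough sketch of the admin queue
lifecycle implied by the helpers below. Error handling is minimal, the
function name is made up, and priv (including priv->txqs[]/priv->rxqs[] and
the DMA addresses) is assumed to be set up by the caller; the real wiring
lands in gve_ethdev.c later in this series.

#include "gve_adminq.h"

static int
gve_setup_device_sketch(struct gve_priv *priv,
			dma_addr_t cnt_array_pa, u32 num_counters,
			dma_addr_t db_array_pa, u32 num_ntfy_blks,
			u32 num_txq, u32 num_rxq)
{
	int err;

	err = gve_adminq_alloc(priv);	/* map the AQ page, tell the device */
	if (err)
		return err;

	err = gve_adminq_configure_device_resources(priv, cnt_array_pa,
						    num_counters, db_array_pa,
						    num_ntfy_blks);
	if (err)
		goto free_aq;

	err = gve_adminq_create_tx_queues(priv, num_txq);
	if (err)
		goto deconfig;
	err = gve_adminq_create_rx_queues(priv, num_rxq);
	if (err)
		goto destroy_tx;

	return 0;

destroy_tx:
	gve_adminq_destroy_tx_queues(priv, num_txq);
deconfig:
	gve_adminq_deconfigure_device_resources(priv);
free_aq:
	gve_adminq_free(priv);
	return err;
}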

diff --git a/drivers/net/gve/gve_adminq.c b/drivers/net/gve/gve_adminq.c
new file mode 100644
index 0000000000..8a724f12c6
--- /dev/null
+++ b/drivers/net/gve/gve_adminq.c
@@ -0,0 +1,925 @@
+/* SPDX-License-Identifier: MIT
+ * Google Virtual Ethernet (gve) driver
+ * Version: 1.3.0
+ * Copyright (C) 2015-2022 Google, Inc.
+ * Copyright(C) 2022 Intel Corporation
+ */
+
+#include "gve_adminq.h"
+#include "gve_register.h"
+
+#define GVE_MAX_ADMINQ_RELEASE_CHECK	500
+#define GVE_ADMINQ_SLEEP_LEN		20
+#define GVE_MAX_ADMINQ_EVENT_COUNTER_CHECK	100
+
+#define GVE_DEVICE_OPTION_ERROR_FMT "%s option error:\n" \
+"Expected: length=%d, feature_mask=%x.\n" \
+"Actual: length=%d, feature_mask=%x."
+
+#define GVE_DEVICE_OPTION_TOO_BIG_FMT "Length of %s option larger than expected. Possible older version of guest driver."
+
+static
+struct gve_device_option *gve_get_next_option(struct gve_device_descriptor *descriptor,
+					      struct gve_device_option *option)
+{
+	uintptr_t option_end, descriptor_end;
+
+	option_end = (uintptr_t)option + sizeof(*option) + be16_to_cpu(option->option_length);
+	descriptor_end = (uintptr_t)descriptor + be16_to_cpu(descriptor->total_length);
+
+	return option_end > descriptor_end ? NULL : (struct gve_device_option *)option_end;
+}
+
+static
+void gve_parse_device_option(struct gve_priv *priv,
+			     struct gve_device_option *option,
+			     struct gve_device_option_gqi_rda **dev_op_gqi_rda,
+			     struct gve_device_option_gqi_qpl **dev_op_gqi_qpl,
+			     struct gve_device_option_dqo_rda **dev_op_dqo_rda,
+			     struct gve_device_option_jumbo_frames **dev_op_jumbo_frames)
+{
+	u32 req_feat_mask = be32_to_cpu(option->required_features_mask);
+	u16 option_length = be16_to_cpu(option->option_length);
+	u16 option_id = be16_to_cpu(option->option_id);
+
+	/* If the length or feature mask doesn't match, continue without
+	 * enabling the feature.
+	 */
+	switch (option_id) {
+	case GVE_DEV_OPT_ID_GQI_RAW_ADDRESSING:
+		if (option_length != GVE_DEV_OPT_LEN_GQI_RAW_ADDRESSING ||
+		    req_feat_mask != GVE_DEV_OPT_REQ_FEAT_MASK_GQI_RAW_ADDRESSING) {
+			PMD_DRV_LOG(WARNING, GVE_DEVICE_OPTION_ERROR_FMT,
+				    "Raw Addressing",
+				    GVE_DEV_OPT_LEN_GQI_RAW_ADDRESSING,
+				    GVE_DEV_OPT_REQ_FEAT_MASK_GQI_RAW_ADDRESSING,
+				    option_length, req_feat_mask);
+			break;
+		}
+
+		PMD_DRV_LOG(INFO, "Gqi raw addressing device option enabled.");
+		priv->queue_format = GVE_GQI_RDA_FORMAT;
+		break;
+	case GVE_DEV_OPT_ID_GQI_RDA:
+		if (option_length < sizeof(**dev_op_gqi_rda) ||
+		    req_feat_mask != GVE_DEV_OPT_REQ_FEAT_MASK_GQI_RDA) {
+			PMD_DRV_LOG(WARNING, GVE_DEVICE_OPTION_ERROR_FMT,
+				    "GQI RDA", (int)sizeof(**dev_op_gqi_rda),
+				    GVE_DEV_OPT_REQ_FEAT_MASK_GQI_RDA,
+				    option_length, req_feat_mask);
+			break;
+		}
+
+		if (option_length > sizeof(**dev_op_gqi_rda)) {
+			PMD_DRV_LOG(WARNING,
+				    GVE_DEVICE_OPTION_TOO_BIG_FMT, "GQI RDA");
+		}
+		*dev_op_gqi_rda = RTE_PTR_ADD(option, sizeof(*option));
+		break;
+	case GVE_DEV_OPT_ID_GQI_QPL:
+		if (option_length < sizeof(**dev_op_gqi_qpl) ||
+		    req_feat_mask != GVE_DEV_OPT_REQ_FEAT_MASK_GQI_QPL) {
+			PMD_DRV_LOG(WARNING, GVE_DEVICE_OPTION_ERROR_FMT,
+				    "GQI QPL", (int)sizeof(**dev_op_gqi_qpl),
+				    GVE_DEV_OPT_REQ_FEAT_MASK_GQI_QPL,
+				    option_length, req_feat_mask);
+			break;
+		}
+
+		if (option_length > sizeof(**dev_op_gqi_qpl)) {
+			PMD_DRV_LOG(WARNING,
+				    GVE_DEVICE_OPTION_TOO_BIG_FMT, "GQI QPL");
+		}
+		*dev_op_gqi_qpl = RTE_PTR_ADD(option, sizeof(*option));
+		break;
+	case GVE_DEV_OPT_ID_DQO_RDA:
+		if (option_length < sizeof(**dev_op_dqo_rda) ||
+		    req_feat_mask != GVE_DEV_OPT_REQ_FEAT_MASK_DQO_RDA) {
+			PMD_DRV_LOG(WARNING, GVE_DEVICE_OPTION_ERROR_FMT,
+				    "DQO RDA", (int)sizeof(**dev_op_dqo_rda),
+				    GVE_DEV_OPT_REQ_FEAT_MASK_DQO_RDA,
+				    option_length, req_feat_mask);
+			break;
+		}
+
+		if (option_length > sizeof(**dev_op_dqo_rda)) {
+			PMD_DRV_LOG(WARNING,
+				    GVE_DEVICE_OPTION_TOO_BIG_FMT, "DQO RDA");
+		}
+		*dev_op_dqo_rda = RTE_PTR_ADD(option, sizeof(*option));
+		break;
+	case GVE_DEV_OPT_ID_JUMBO_FRAMES:
+		if (option_length < sizeof(**dev_op_jumbo_frames) ||
+		    req_feat_mask != GVE_DEV_OPT_REQ_FEAT_MASK_JUMBO_FRAMES) {
+			PMD_DRV_LOG(WARNING, GVE_DEVICE_OPTION_ERROR_FMT,
+				    "Jumbo Frames",
+				    (int)sizeof(**dev_op_jumbo_frames),
+				    GVE_DEV_OPT_REQ_FEAT_MASK_JUMBO_FRAMES,
+				    option_length, req_feat_mask);
+			break;
+		}
+
+		if (option_length > sizeof(**dev_op_jumbo_frames)) {
+			PMD_DRV_LOG(WARNING,
+				    GVE_DEVICE_OPTION_TOO_BIG_FMT,
+				    "Jumbo Frames");
+		}
+		*dev_op_jumbo_frames = RTE_PTR_ADD(option, sizeof(*option));
+		break;
+	default:
+		/* If we don't recognize the option just continue
+		 * without doing anything.
+		 */
+		PMD_DRV_LOG(DEBUG, "Unrecognized device option 0x%hx not enabled.\n",
+			    option_id);
+	}
+}
+
+/* Process all device options for a given describe device call. */
+static int
+gve_process_device_options(struct gve_priv *priv,
+			   struct gve_device_descriptor *descriptor,
+			   struct gve_device_option_gqi_rda **dev_op_gqi_rda,
+			   struct gve_device_option_gqi_qpl **dev_op_gqi_qpl,
+			   struct gve_device_option_dqo_rda **dev_op_dqo_rda,
+			   struct gve_device_option_jumbo_frames **dev_op_jumbo_frames)
+{
+	const int num_options = be16_to_cpu(descriptor->num_device_options);
+	struct gve_device_option *dev_opt;
+	int i;
+
+	/* The options struct directly follows the device descriptor. */
+	dev_opt = RTE_PTR_ADD(descriptor, sizeof(*descriptor));
+	for (i = 0; i < num_options; i++) {
+		struct gve_device_option *next_opt;
+
+		next_opt = gve_get_next_option(descriptor, dev_opt);
+		if (!next_opt) {
+			PMD_DRV_LOG(ERR,
+				    "options exceed device_descriptor's total length.\n");
+			return -EINVAL;
+		}
+
+		gve_parse_device_option(priv, dev_opt,
+					dev_op_gqi_rda, dev_op_gqi_qpl,
+					dev_op_dqo_rda, dev_op_jumbo_frames);
+		dev_opt = next_opt;
+	}
+
+	return 0;
+}
+
+int gve_adminq_alloc(struct gve_priv *priv)
+{
+	priv->adminq = gve_alloc_dma_mem(&priv->adminq_dma_mem, PAGE_SIZE);
+	if (unlikely(!priv->adminq))
+		return -ENOMEM;
+
+	priv->adminq_mask = (PAGE_SIZE / sizeof(union gve_adminq_command)) - 1;
+	priv->adminq_prod_cnt = 0;
+	priv->adminq_cmd_fail = 0;
+	priv->adminq_timeouts = 0;
+	priv->adminq_describe_device_cnt = 0;
+	priv->adminq_cfg_device_resources_cnt = 0;
+	priv->adminq_register_page_list_cnt = 0;
+	priv->adminq_unregister_page_list_cnt = 0;
+	priv->adminq_create_tx_queue_cnt = 0;
+	priv->adminq_create_rx_queue_cnt = 0;
+	priv->adminq_destroy_tx_queue_cnt = 0;
+	priv->adminq_destroy_rx_queue_cnt = 0;
+	priv->adminq_dcfg_device_resources_cnt = 0;
+	priv->adminq_set_driver_parameter_cnt = 0;
+	priv->adminq_report_stats_cnt = 0;
+	priv->adminq_report_link_speed_cnt = 0;
+	priv->adminq_get_ptype_map_cnt = 0;
+
+	/* Setup Admin queue with the device */
+	iowrite32be(priv->adminq_dma_mem.pa / PAGE_SIZE,
+		    &priv->reg_bar0->adminq_pfn);
+
+	gve_set_admin_queue_ok(priv);
+	return 0;
+}
+
+void gve_adminq_release(struct gve_priv *priv)
+{
+	int i = 0;
+
+	/* Tell the device the adminq is leaving */
+	iowrite32be(0x0, &priv->reg_bar0->adminq_pfn);
+	while (ioread32be(&priv->reg_bar0->adminq_pfn)) {
+		/* If this is reached the device is unrecoverable and still
+		 * holding memory. Continue looping to avoid memory corruption,
+		 * but WARN so it is visible what is going on.
+		 */
+		if (i == GVE_MAX_ADMINQ_RELEASE_CHECK)
+			PMD_DRV_LOG(WARNING, "Unrecoverable platform error!");
+		i++;
+		msleep(GVE_ADMINQ_SLEEP_LEN);
+	}
+	gve_clear_device_rings_ok(priv);
+	gve_clear_device_resources_ok(priv);
+	gve_clear_admin_queue_ok(priv);
+}
+
+void gve_adminq_free(struct gve_priv *priv)
+{
+	if (!gve_get_admin_queue_ok(priv))
+		return;
+	gve_adminq_release(priv);
+	gve_free_dma_mem(&priv->adminq_dma_mem);
+	gve_clear_admin_queue_ok(priv);
+}
+
+static void gve_adminq_kick_cmd(struct gve_priv *priv, u32 prod_cnt)
+{
+	iowrite32be(prod_cnt, &priv->reg_bar0->adminq_doorbell);
+}
+
+static bool gve_adminq_wait_for_cmd(struct gve_priv *priv, u32 prod_cnt)
+{
+	int i;
+
+	for (i = 0; i < GVE_MAX_ADMINQ_EVENT_COUNTER_CHECK; i++) {
+		if (ioread32be(&priv->reg_bar0->adminq_event_counter)
+		    == prod_cnt)
+			return true;
+		msleep(GVE_ADMINQ_SLEEP_LEN);
+	}
+
+	return false;
+}
+
+static int gve_adminq_parse_err(struct gve_priv *priv, u32 status)
+{
+	if (status != GVE_ADMINQ_COMMAND_PASSED &&
+	    status != GVE_ADMINQ_COMMAND_UNSET) {
+		PMD_DRV_LOG(ERR, "AQ command failed with status %d", status);
+		priv->adminq_cmd_fail++;
+	}
+	switch (status) {
+	case GVE_ADMINQ_COMMAND_PASSED:
+		return 0;
+	case GVE_ADMINQ_COMMAND_UNSET:
+		PMD_DRV_LOG(ERR, "parse_aq_err: err and status both unset, this should not be possible.");
+		return -EINVAL;
+	case GVE_ADMINQ_COMMAND_ERROR_ABORTED:
+	case GVE_ADMINQ_COMMAND_ERROR_CANCELLED:
+	case GVE_ADMINQ_COMMAND_ERROR_DATALOSS:
+	case GVE_ADMINQ_COMMAND_ERROR_FAILED_PRECONDITION:
+	case GVE_ADMINQ_COMMAND_ERROR_UNAVAILABLE:
+		return -EAGAIN;
+	case GVE_ADMINQ_COMMAND_ERROR_ALREADY_EXISTS:
+	case GVE_ADMINQ_COMMAND_ERROR_INTERNAL_ERROR:
+	case GVE_ADMINQ_COMMAND_ERROR_INVALID_ARGUMENT:
+	case GVE_ADMINQ_COMMAND_ERROR_NOT_FOUND:
+	case GVE_ADMINQ_COMMAND_ERROR_OUT_OF_RANGE:
+	case GVE_ADMINQ_COMMAND_ERROR_UNKNOWN_ERROR:
+		return -EINVAL;
+	case GVE_ADMINQ_COMMAND_ERROR_DEADLINE_EXCEEDED:
+		return -ETIME;
+	case GVE_ADMINQ_COMMAND_ERROR_PERMISSION_DENIED:
+	case GVE_ADMINQ_COMMAND_ERROR_UNAUTHENTICATED:
+		return -EACCES;
+	case GVE_ADMINQ_COMMAND_ERROR_RESOURCE_EXHAUSTED:
+		return -ENOMEM;
+	case GVE_ADMINQ_COMMAND_ERROR_UNIMPLEMENTED:
+		return -ENOTSUP;
+	default:
+		PMD_DRV_LOG(ERR, "parse_aq_err: unknown status code %d",
+			    status);
+		return -EINVAL;
+	}
+}
+
+/* Flushes all AQ commands currently queued and waits for them to complete.
+ * If there are failures, it will return the first error.
+ */
+static int gve_adminq_kick_and_wait(struct gve_priv *priv)
+{
+	u32 tail, head;
+	u32 i;
+
+	tail = ioread32be(&priv->reg_bar0->adminq_event_counter);
+	head = priv->adminq_prod_cnt;
+
+	gve_adminq_kick_cmd(priv, head);
+	if (!gve_adminq_wait_for_cmd(priv, head)) {
+		PMD_DRV_LOG(ERR, "AQ commands timed out, need to reset AQ");
+		priv->adminq_timeouts++;
+		return -ENOTRECOVERABLE;
+	}
+
+	for (i = tail; i < head; i++) {
+		union gve_adminq_command *cmd;
+		u32 status, err;
+
+		cmd = &priv->adminq[i & priv->adminq_mask];
+		status = be32_to_cpu(READ_ONCE32(cmd->status));
+		err = gve_adminq_parse_err(priv, status);
+		if (err)
+			/* Return the first error if we failed. */
+			return err;
+	}
+
+	return 0;
+}
+
+/* This function is not threadsafe - the caller is responsible for any
+ * necessary locks.
+ */
+static int gve_adminq_issue_cmd(struct gve_priv *priv,
+				union gve_adminq_command *cmd_orig)
+{
+	union gve_adminq_command *cmd;
+	u32 opcode;
+	u32 tail;
+
+	tail = ioread32be(&priv->reg_bar0->adminq_event_counter);
+
+	/* Check if next command will overflow the buffer. */
+	if (((priv->adminq_prod_cnt + 1) & priv->adminq_mask) ==
+	    (tail & priv->adminq_mask)) {
+		int err;
+
+		/* Flush existing commands to make room. */
+		err = gve_adminq_kick_and_wait(priv);
+		if (err)
+			return err;
+
+		/* Retry. */
+		tail = ioread32be(&priv->reg_bar0->adminq_event_counter);
+		if (((priv->adminq_prod_cnt + 1) & priv->adminq_mask) ==
+		    (tail & priv->adminq_mask)) {
+			/* This should never happen. We just flushed the
+			 * command queue so there should be enough space.
+			 */
+			return -ENOMEM;
+		}
+	}
+
+	cmd = &priv->adminq[priv->adminq_prod_cnt & priv->adminq_mask];
+	priv->adminq_prod_cnt++;
+
+	memcpy(cmd, cmd_orig, sizeof(*cmd_orig));
+	opcode = be32_to_cpu(READ_ONCE32(cmd->opcode));
+
+	switch (opcode) {
+	case GVE_ADMINQ_DESCRIBE_DEVICE:
+		priv->adminq_describe_device_cnt++;
+		break;
+	case GVE_ADMINQ_CONFIGURE_DEVICE_RESOURCES:
+		priv->adminq_cfg_device_resources_cnt++;
+		break;
+	case GVE_ADMINQ_REGISTER_PAGE_LIST:
+		priv->adminq_register_page_list_cnt++;
+		break;
+	case GVE_ADMINQ_UNREGISTER_PAGE_LIST:
+		priv->adminq_unregister_page_list_cnt++;
+		break;
+	case GVE_ADMINQ_CREATE_TX_QUEUE:
+		priv->adminq_create_tx_queue_cnt++;
+		break;
+	case GVE_ADMINQ_CREATE_RX_QUEUE:
+		priv->adminq_create_rx_queue_cnt++;
+		break;
+	case GVE_ADMINQ_DESTROY_TX_QUEUE:
+		priv->adminq_destroy_tx_queue_cnt++;
+		break;
+	case GVE_ADMINQ_DESTROY_RX_QUEUE:
+		priv->adminq_destroy_rx_queue_cnt++;
+		break;
+	case GVE_ADMINQ_DECONFIGURE_DEVICE_RESOURCES:
+		priv->adminq_dcfg_device_resources_cnt++;
+		break;
+	case GVE_ADMINQ_SET_DRIVER_PARAMETER:
+		priv->adminq_set_driver_parameter_cnt++;
+		break;
+	case GVE_ADMINQ_REPORT_STATS:
+		priv->adminq_report_stats_cnt++;
+		break;
+	case GVE_ADMINQ_REPORT_LINK_SPEED:
+		priv->adminq_report_link_speed_cnt++;
+		break;
+	case GVE_ADMINQ_GET_PTYPE_MAP:
+		priv->adminq_get_ptype_map_cnt++;
+		break;
+	default:
+		PMD_DRV_LOG(ERR, "unknown AQ command opcode %d", opcode);
+	}
+
+	return 0;
+}
+
+/* This function is not threadsafe - the caller is responsible for any
+ * necessary locks.
+ * The caller is also responsible for making sure there are no commands
+ * waiting to be executed.
+ */
+static int gve_adminq_execute_cmd(struct gve_priv *priv,
+				  union gve_adminq_command *cmd_orig)
+{
+	u32 tail, head;
+	int err;
+
+	tail = ioread32be(&priv->reg_bar0->adminq_event_counter);
+	head = priv->adminq_prod_cnt;
+	if (tail != head)
+		/* This is not a valid path */
+		return -EINVAL;
+
+	err = gve_adminq_issue_cmd(priv, cmd_orig);
+	if (err)
+		return err;
+
+	return gve_adminq_kick_and_wait(priv);
+}
+
+/* The device specifies that the management vector can either be the first irq
+ * or the last irq. ntfy_blk_msix_base_idx indicates the first irq assigned to
+ * the ntfy blks. It if is 0 then the management vector is last, if it is 1 then
+ * the management vector is first.
+ *
+ * gve arranges the msix vectors so that the management vector is last.
+ */
+#define GVE_NTFY_BLK_BASE_MSIX_IDX	0
+int gve_adminq_configure_device_resources(struct gve_priv *priv,
+					  dma_addr_t counter_array_bus_addr,
+					  u32 num_counters,
+					  dma_addr_t db_array_bus_addr,
+					  u32 num_ntfy_blks)
+{
+	union gve_adminq_command cmd;
+
+	memset(&cmd, 0, sizeof(cmd));
+	cmd.opcode = cpu_to_be32(GVE_ADMINQ_CONFIGURE_DEVICE_RESOURCES);
+	cmd.configure_device_resources =
+		(struct gve_adminq_configure_device_resources) {
+		.counter_array = cpu_to_be64(counter_array_bus_addr),
+		.num_counters = cpu_to_be32(num_counters),
+		.irq_db_addr = cpu_to_be64(db_array_bus_addr),
+		.num_irq_dbs = cpu_to_be32(num_ntfy_blks),
+		.irq_db_stride = cpu_to_be32(sizeof(*priv->irq_dbs)),
+		.ntfy_blk_msix_base_idx =
+					cpu_to_be32(GVE_NTFY_BLK_BASE_MSIX_IDX),
+		.queue_format = priv->queue_format,
+	};
+
+	return gve_adminq_execute_cmd(priv, &cmd);
+}
+
+int gve_adminq_deconfigure_device_resources(struct gve_priv *priv)
+{
+	union gve_adminq_command cmd;
+
+	memset(&cmd, 0, sizeof(cmd));
+	cmd.opcode = cpu_to_be32(GVE_ADMINQ_DECONFIGURE_DEVICE_RESOURCES);
+
+	return gve_adminq_execute_cmd(priv, &cmd);
+}
+
+static int gve_adminq_create_tx_queue(struct gve_priv *priv, u32 queue_index)
+{
+	struct gve_tx_queue *txq = priv->txqs[queue_index];
+	union gve_adminq_command cmd;
+
+	memset(&cmd, 0, sizeof(cmd));
+	cmd.opcode = cpu_to_be32(GVE_ADMINQ_CREATE_TX_QUEUE);
+	cmd.create_tx_queue = (struct gve_adminq_create_tx_queue) {
+		.queue_id = cpu_to_be32(queue_index),
+		.queue_resources_addr =
+			cpu_to_be64(txq->qres_mz->iova),
+		.tx_ring_addr = cpu_to_be64(txq->tx_ring_phys_addr),
+		.ntfy_id = cpu_to_be32(txq->ntfy_id),
+	};
+
+	if (gve_is_gqi(priv)) {
+		u32 qpl_id = priv->queue_format == GVE_GQI_RDA_FORMAT ?
+			GVE_RAW_ADDRESSING_QPL_ID : txq->qpl->id;
+
+		cmd.create_tx_queue.queue_page_list_id = cpu_to_be32(qpl_id);
+	} else {
+		cmd.create_tx_queue.tx_ring_size =
+			cpu_to_be16(txq->nb_tx_desc);
+		cmd.create_tx_queue.tx_comp_ring_addr =
+			cpu_to_be64(txq->complq->tx_ring_phys_addr);
+		cmd.create_tx_queue.tx_comp_ring_size =
+			cpu_to_be16(priv->tx_compq_size);
+	}
+
+	return gve_adminq_issue_cmd(priv, &cmd);
+}
+
+int gve_adminq_create_tx_queues(struct gve_priv *priv, u32 num_queues)
+{
+	int err;
+	u32 i;
+
+	for (i = 0; i < num_queues; i++) {
+		err = gve_adminq_create_tx_queue(priv, i);
+		if (err)
+			return err;
+	}
+
+	return gve_adminq_kick_and_wait(priv);
+}
+
+static int gve_adminq_create_rx_queue(struct gve_priv *priv, u32 queue_index)
+{
+	struct gve_rx_queue *rxq = priv->rxqs[queue_index];
+	union gve_adminq_command cmd;
+
+	memset(&cmd, 0, sizeof(cmd));
+	cmd.opcode = cpu_to_be32(GVE_ADMINQ_CREATE_RX_QUEUE);
+	cmd.create_rx_queue = (struct gve_adminq_create_rx_queue) {
+		.queue_id = cpu_to_be32(queue_index),
+		.ntfy_id = cpu_to_be32(rxq->ntfy_id),
+		.queue_resources_addr = cpu_to_be64(rxq->qres_mz->iova),
+	};
+
+	if (gve_is_gqi(priv)) {
+		u32 qpl_id = priv->queue_format == GVE_GQI_RDA_FORMAT ?
+			GVE_RAW_ADDRESSING_QPL_ID : rxq->qpl->id;
+
+		cmd.create_rx_queue.rx_desc_ring_addr =
+			cpu_to_be64(rxq->mz->iova),
+		cmd.create_rx_queue.rx_data_ring_addr =
+			cpu_to_be64(rxq->data_mz->iova),
+		cmd.create_rx_queue.index = cpu_to_be32(queue_index);
+		cmd.create_rx_queue.queue_page_list_id = cpu_to_be32(qpl_id);
+		cmd.create_rx_queue.packet_buffer_size = cpu_to_be16(rxq->rx_buf_len);
+	} else {
+		cmd.create_rx_queue.rx_ring_size =
+			cpu_to_be16(priv->rx_desc_cnt);
+		cmd.create_rx_queue.rx_desc_ring_addr =
+			cpu_to_be64(rxq->rx_ring_phys_addr);
+		cmd.create_rx_queue.rx_data_ring_addr =
+			cpu_to_be64(rxq->bufq->rx_ring_phys_addr);
+		cmd.create_rx_queue.packet_buffer_size =
+			cpu_to_be16(rxq->rx_buf_len);
+		cmd.create_rx_queue.rx_buff_ring_size =
+			cpu_to_be16(priv->rx_bufq_size);
+		cmd.create_rx_queue.enable_rsc = !!(priv->enable_lsc);
+	}
+
+	return gve_adminq_issue_cmd(priv, &cmd);
+}
+
+int gve_adminq_create_rx_queues(struct gve_priv *priv, u32 num_queues)
+{
+	int err;
+	u32 i;
+
+	for (i = 0; i < num_queues; i++) {
+		err = gve_adminq_create_rx_queue(priv, i);
+		if (err)
+			return err;
+	}
+
+	return gve_adminq_kick_and_wait(priv);
+}
+
+static int gve_adminq_destroy_tx_queue(struct gve_priv *priv, u32 queue_index)
+{
+	union gve_adminq_command cmd;
+	int err;
+
+	memset(&cmd, 0, sizeof(cmd));
+	cmd.opcode = cpu_to_be32(GVE_ADMINQ_DESTROY_TX_QUEUE);
+	cmd.destroy_tx_queue = (struct gve_adminq_destroy_tx_queue) {
+		.queue_id = cpu_to_be32(queue_index),
+	};
+
+	err = gve_adminq_issue_cmd(priv, &cmd);
+	if (err)
+		return err;
+
+	return 0;
+}
+
+int gve_adminq_destroy_tx_queues(struct gve_priv *priv, u32 num_queues)
+{
+	int err;
+	u32 i;
+
+	for (i = 0; i < num_queues; i++) {
+		err = gve_adminq_destroy_tx_queue(priv, i);
+		if (err)
+			return err;
+	}
+
+	return gve_adminq_kick_and_wait(priv);
+}
+
+static int gve_adminq_destroy_rx_queue(struct gve_priv *priv, u32 queue_index)
+{
+	union gve_adminq_command cmd;
+	int err;
+
+	memset(&cmd, 0, sizeof(cmd));
+	cmd.opcode = cpu_to_be32(GVE_ADMINQ_DESTROY_RX_QUEUE);
+	cmd.destroy_rx_queue = (struct gve_adminq_destroy_rx_queue) {
+		.queue_id = cpu_to_be32(queue_index),
+	};
+
+	err = gve_adminq_issue_cmd(priv, &cmd);
+	if (err)
+		return err;
+
+	return 0;
+}
+
+int gve_adminq_destroy_rx_queues(struct gve_priv *priv, u32 num_queues)
+{
+	int err;
+	u32 i;
+
+	for (i = 0; i < num_queues; i++) {
+		err = gve_adminq_destroy_rx_queue(priv, i);
+		if (err)
+			return err;
+	}
+
+	return gve_adminq_kick_and_wait(priv);
+}
+
+static int gve_set_desc_cnt(struct gve_priv *priv,
+			    struct gve_device_descriptor *descriptor)
+{
+	priv->tx_desc_cnt = be16_to_cpu(descriptor->tx_queue_entries);
+	if (priv->tx_desc_cnt * sizeof(priv->txqs[0]->tx_desc_ring[0])
+	    < PAGE_SIZE) {
+		PMD_DRV_LOG(ERR, "Tx desc count %d too low", priv->tx_desc_cnt);
+		return -EINVAL;
+	}
+	priv->rx_desc_cnt = be16_to_cpu(descriptor->rx_queue_entries);
+	if (priv->rx_desc_cnt * sizeof(priv->rxqs[0]->rx_desc_ring[0])
+	    < PAGE_SIZE) {
+		PMD_DRV_LOG(ERR, "Rx desc count %d too low", priv->rx_desc_cnt);
+		return -EINVAL;
+	}
+	return 0;
+}
+
+static int
+gve_set_desc_cnt_dqo(struct gve_priv *priv,
+		     const struct gve_device_descriptor *descriptor,
+		     const struct gve_device_option_dqo_rda *dev_op_dqo_rda)
+{
+	priv->tx_desc_cnt = be16_to_cpu(descriptor->tx_queue_entries);
+	priv->tx_compq_size = be16_to_cpu(dev_op_dqo_rda->tx_comp_ring_entries);
+	priv->rx_desc_cnt = be16_to_cpu(descriptor->rx_queue_entries);
+	priv->rx_bufq_size = be16_to_cpu(dev_op_dqo_rda->rx_buff_ring_entries);
+
+	return 0;
+}
+
+static void gve_enable_supported_features(struct gve_priv *priv,
+					  u32 supported_features_mask,
+					  const struct gve_device_option_jumbo_frames
+						  *dev_op_jumbo_frames)
+{
+	/* Before control reaches this point, the page-size-capped max MTU from
+	 * the gve_device_descriptor field has already been stored in
+	 * priv->dev->max_mtu. We overwrite it with the true max MTU below.
+	 */
+	if (dev_op_jumbo_frames &&
+	    (supported_features_mask & GVE_SUP_JUMBO_FRAMES_MASK)) {
+		PMD_DRV_LOG(INFO, "JUMBO FRAMES device option enabled.");
+		priv->max_mtu = be16_to_cpu(dev_op_jumbo_frames->max_mtu);
+	}
+}
+
+int gve_adminq_describe_device(struct gve_priv *priv)
+{
+	struct gve_device_option_jumbo_frames *dev_op_jumbo_frames = NULL;
+	struct gve_device_option_gqi_rda *dev_op_gqi_rda = NULL;
+	struct gve_device_option_gqi_qpl *dev_op_gqi_qpl = NULL;
+	struct gve_device_option_dqo_rda *dev_op_dqo_rda = NULL;
+	struct gve_device_descriptor *descriptor;
+	struct gve_dma_mem descriptor_dma_mem;
+	u32 supported_features_mask = 0;
+	union gve_adminq_command cmd;
+	int err = 0;
+	u8 *mac;
+	u16 mtu;
+
+	memset(&cmd, 0, sizeof(cmd));
+	descriptor = gve_alloc_dma_mem(&descriptor_dma_mem, PAGE_SIZE);
+	if (!descriptor)
+		return -ENOMEM;
+	cmd.opcode = cpu_to_be32(GVE_ADMINQ_DESCRIBE_DEVICE);
+	cmd.describe_device.device_descriptor_addr =
+					cpu_to_be64(descriptor_dma_mem.pa);
+	cmd.describe_device.device_descriptor_version =
+			cpu_to_be32(GVE_ADMINQ_DEVICE_DESCRIPTOR_VERSION);
+	cmd.describe_device.available_length = cpu_to_be32(PAGE_SIZE);
+
+	err = gve_adminq_execute_cmd(priv, &cmd);
+	if (err)
+		goto free_device_descriptor;
+
+	err = gve_process_device_options(priv, descriptor, &dev_op_gqi_rda,
+					 &dev_op_gqi_qpl, &dev_op_dqo_rda,
+					 &dev_op_jumbo_frames);
+	if (err)
+		goto free_device_descriptor;
+
+	/* If the GQI_RAW_ADDRESSING option is not enabled and the queue format
+	 * is not set to GqiRda, choose the queue format in a priority order:
+	 * DqoRda, GqiRda, GqiQpl. Use GqiQpl as default.
+	 */
+	if (dev_op_dqo_rda) {
+		priv->queue_format = GVE_DQO_RDA_FORMAT;
+		PMD_DRV_LOG(INFO, "Driver is running with DQO RDA queue format.");
+		supported_features_mask =
+			be32_to_cpu(dev_op_dqo_rda->supported_features_mask);
+	} else if (dev_op_gqi_rda) {
+		priv->queue_format = GVE_GQI_RDA_FORMAT;
+		PMD_DRV_LOG(INFO, "Driver is running with GQI RDA queue format.");
+		supported_features_mask =
+			be32_to_cpu(dev_op_gqi_rda->supported_features_mask);
+	} else if (priv->queue_format == GVE_GQI_RDA_FORMAT) {
+		PMD_DRV_LOG(INFO, "Driver is running with GQI RDA queue format.");
+	} else {
+		priv->queue_format = GVE_GQI_QPL_FORMAT;
+		if (dev_op_gqi_qpl)
+			supported_features_mask =
+				be32_to_cpu(dev_op_gqi_qpl->supported_features_mask);
+		PMD_DRV_LOG(INFO, "Driver is running with GQI QPL queue format.");
+	}
+	if (gve_is_gqi(priv)) {
+		err = gve_set_desc_cnt(priv, descriptor);
+	} else {
+		/* DQO supports LRO. */
+		err = gve_set_desc_cnt_dqo(priv, descriptor, dev_op_dqo_rda);
+	}
+	if (err)
+		goto free_device_descriptor;
+
+	priv->max_registered_pages =
+				be64_to_cpu(descriptor->max_registered_pages);
+	mtu = be16_to_cpu(descriptor->mtu);
+	if (mtu < ETH_MIN_MTU) {
+		PMD_DRV_LOG(ERR, "MTU %d below minimum MTU", mtu);
+		err = -EINVAL;
+		goto free_device_descriptor;
+	}
+	priv->max_mtu = mtu;
+	priv->num_event_counters = be16_to_cpu(descriptor->counters);
+	rte_memcpy(priv->dev_addr.addr_bytes, descriptor->mac, ETH_ALEN);
+	mac = descriptor->mac;
+	PMD_DRV_LOG(INFO, "MAC addr: %02x:%02x:%02x:%02x:%02x:%02x",
+		    mac[0], mac[1], mac[2], mac[3], mac[4], mac[5]);
+	priv->tx_pages_per_qpl = be16_to_cpu(descriptor->tx_pages_per_qpl);
+	priv->rx_data_slot_cnt = be16_to_cpu(descriptor->rx_pages_per_qpl);
+
+	if (gve_is_gqi(priv) && priv->rx_data_slot_cnt < priv->rx_desc_cnt) {
+		PMD_DRV_LOG(ERR, "rx_data_slot_cnt cannot be smaller than rx_desc_cnt, setting rx_desc_cnt down to %d",
+			    priv->rx_data_slot_cnt);
+		priv->rx_desc_cnt = priv->rx_data_slot_cnt;
+	}
+	priv->default_num_queues = be16_to_cpu(descriptor->default_num_queues);
+
+	gve_enable_supported_features(priv, supported_features_mask,
+				      dev_op_jumbo_frames);
+
+free_device_descriptor:
+	gve_free_dma_mem(&descriptor_dma_mem);
+	return err;
+}
+
+int gve_adminq_register_page_list(struct gve_priv *priv,
+				  struct gve_queue_page_list *qpl)
+{
+	struct gve_dma_mem page_list_dma_mem;
+	u32 num_entries = qpl->num_entries;
+	u32 size = num_entries * sizeof(qpl->page_buses[0]);
+	union gve_adminq_command cmd;
+	__be64 *page_list;
+	int err;
+	u32 i;
+
+	memset(&cmd, 0, sizeof(cmd));
+	page_list = gve_alloc_dma_mem(&page_list_dma_mem, size);
+	if (!page_list)
+		return -ENOMEM;
+
+	for (i = 0; i < num_entries; i++)
+		page_list[i] = cpu_to_be64(qpl->page_buses[i]);
+
+	cmd.opcode = cpu_to_be32(GVE_ADMINQ_REGISTER_PAGE_LIST);
+	cmd.reg_page_list = (struct gve_adminq_register_page_list) {
+		.page_list_id = cpu_to_be32(qpl->id),
+		.num_pages = cpu_to_be32(num_entries),
+		.page_address_list_addr = cpu_to_be64(page_list_dma_mem.pa),
+	};
+
+	err = gve_adminq_execute_cmd(priv, &cmd);
+	gve_free_dma_mem(&page_list_dma_mem);
+	return err;
+}
+
+int gve_adminq_unregister_page_list(struct gve_priv *priv, u32 page_list_id)
+{
+	union gve_adminq_command cmd;
+
+	memset(&cmd, 0, sizeof(cmd));
+	cmd.opcode = cpu_to_be32(GVE_ADMINQ_UNREGISTER_PAGE_LIST);
+	cmd.unreg_page_list = (struct gve_adminq_unregister_page_list) {
+		.page_list_id = cpu_to_be32(page_list_id),
+	};
+
+	return gve_adminq_execute_cmd(priv, &cmd);
+}
+
+int gve_adminq_set_mtu(struct gve_priv *priv, u64 mtu)
+{
+	union gve_adminq_command cmd;
+
+	memset(&cmd, 0, sizeof(cmd));
+	cmd.opcode = cpu_to_be32(GVE_ADMINQ_SET_DRIVER_PARAMETER);
+	cmd.set_driver_param = (struct gve_adminq_set_driver_parameter) {
+		.parameter_type = cpu_to_be32(GVE_SET_PARAM_MTU),
+		.parameter_value = cpu_to_be64(mtu),
+	};
+
+	return gve_adminq_execute_cmd(priv, &cmd);
+}
+
+int gve_adminq_report_stats(struct gve_priv *priv, u64 stats_report_len,
+			    dma_addr_t stats_report_addr, u64 interval)
+{
+	union gve_adminq_command cmd;
+
+	memset(&cmd, 0, sizeof(cmd));
+	cmd.opcode = cpu_to_be32(GVE_ADMINQ_REPORT_STATS);
+	cmd.report_stats = (struct gve_adminq_report_stats) {
+		.stats_report_len = cpu_to_be64(stats_report_len),
+		.stats_report_addr = cpu_to_be64(stats_report_addr),
+		.interval = cpu_to_be64(interval),
+	};
+
+	return gve_adminq_execute_cmd(priv, &cmd);
+}
+
+int gve_adminq_report_link_speed(struct gve_priv *priv)
+{
+	struct gve_dma_mem link_speed_region_dma_mem;
+	union gve_adminq_command gvnic_cmd;
+	u64 *link_speed_region;
+	int err;
+
+	link_speed_region = gve_alloc_dma_mem(&link_speed_region_dma_mem,
+					      sizeof(*link_speed_region));
+
+	if (!link_speed_region)
+		return -ENOMEM;
+
+	memset(&gvnic_cmd, 0, sizeof(gvnic_cmd));
+	gvnic_cmd.opcode = cpu_to_be32(GVE_ADMINQ_REPORT_LINK_SPEED);
+	gvnic_cmd.report_link_speed.link_speed_address =
+		cpu_to_be64(link_speed_region_dma_mem.pa);
+
+	err = gve_adminq_execute_cmd(priv, &gvnic_cmd);
+
+	priv->link_speed = be64_to_cpu(*link_speed_region);
+	gve_free_dma_mem(&link_speed_region_dma_mem);
+	return err;
+}
+
+int gve_adminq_get_ptype_map_dqo(struct gve_priv *priv,
+				 struct gve_ptype_lut *ptype_lut)
+{
+	struct gve_dma_mem ptype_map_dma_mem;
+	struct gve_ptype_map *ptype_map;
+	union gve_adminq_command cmd;
+	int err = 0;
+	int i;
+
+	memset(&cmd, 0, sizeof(cmd));
+	ptype_map = gve_alloc_dma_mem(&ptype_map_dma_mem, sizeof(*ptype_map));
+	if (!ptype_map)
+		return -ENOMEM;
+
+	cmd.opcode = cpu_to_be32(GVE_ADMINQ_GET_PTYPE_MAP);
+	cmd.get_ptype_map = (struct gve_adminq_get_ptype_map) {
+		.ptype_map_len = cpu_to_be64(sizeof(*ptype_map)),
+		.ptype_map_addr = cpu_to_be64(ptype_map_dma_mem.pa),
+	};
+
+	err = gve_adminq_execute_cmd(priv, &cmd);
+	if (err)
+		goto err;
+
+	/* Populate ptype_lut. */
+	for (i = 0; i < GVE_NUM_PTYPES; i++) {
+		ptype_lut->ptypes[i].l3_type =
+			ptype_map->ptypes[i].l3_type;
+		ptype_lut->ptypes[i].l4_type =
+			ptype_map->ptypes[i].l4_type;
+	}
+err:
+	gve_free_dma_mem(&ptype_map_dma_mem);
+	return err;
+}
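
The only bookkeeping done per command is the per-opcode counter bump in the
parse switch near the top of this file. As a hedged illustration (not part of
the patch), a debug helper built on those gve_priv counters could look like
this; the function name is hypothetical:

static void gve_adminq_dump_counters_sketch(struct gve_priv *priv)
{
	/* Free-running admin queue totals */
	PMD_DRV_LOG(DEBUG, "AQ cmds issued %u, failed %u, timed out %u",
		    priv->adminq_prod_cnt, priv->adminq_cmd_fail,
		    priv->adminq_timeouts);
	/* A few of the per-opcode counters bumped in the parse switch */
	PMD_DRV_LOG(DEBUG, "describe_device %u, cfg_device_resources %u",
		    priv->adminq_describe_device_cnt,
		    priv->adminq_cfg_device_resources_cnt);
	PMD_DRV_LOG(DEBUG, "create_tx_queue %u, create_rx_queue %u",
		    priv->adminq_create_tx_queue_cnt,
		    priv->adminq_create_rx_queue_cnt);
}
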
diff --git a/drivers/net/gve/gve_adminq.h b/drivers/net/gve/gve_adminq.h
new file mode 100644
index 0000000000..c7114cc883
--- /dev/null
+++ b/drivers/net/gve/gve_adminq.h
@@ -0,0 +1,381 @@
+/* SPDX-License-Identifier: MIT
+ * Google Virtual Ethernet (gve) driver
+ * Version: 1.3.0
+ * Copyright (C) 2015-2022 Google, Inc.
+ * Copyright(C) 2022 Intel Corporation
+ */
+
+#ifndef _GVE_ADMINQ_H
+#define _GVE_ADMINQ_H
+
+/* Admin queue opcodes */
+enum gve_adminq_opcodes {
+	GVE_ADMINQ_DESCRIBE_DEVICE		= 0x1,
+	GVE_ADMINQ_CONFIGURE_DEVICE_RESOURCES	= 0x2,
+	GVE_ADMINQ_REGISTER_PAGE_LIST		= 0x3,
+	GVE_ADMINQ_UNREGISTER_PAGE_LIST		= 0x4,
+	GVE_ADMINQ_CREATE_TX_QUEUE		= 0x5,
+	GVE_ADMINQ_CREATE_RX_QUEUE		= 0x6,
+	GVE_ADMINQ_DESTROY_TX_QUEUE		= 0x7,
+	GVE_ADMINQ_DESTROY_RX_QUEUE		= 0x8,
+	GVE_ADMINQ_DECONFIGURE_DEVICE_RESOURCES	= 0x9,
+	GVE_ADMINQ_SET_DRIVER_PARAMETER		= 0xB,
+	GVE_ADMINQ_REPORT_STATS			= 0xC,
+	GVE_ADMINQ_REPORT_LINK_SPEED		= 0xD,
+	GVE_ADMINQ_GET_PTYPE_MAP		= 0xE,
+};
+
+/* Admin queue status codes */
+enum gve_adminq_statuses {
+	GVE_ADMINQ_COMMAND_UNSET			= 0x0,
+	GVE_ADMINQ_COMMAND_PASSED			= 0x1,
+	GVE_ADMINQ_COMMAND_ERROR_ABORTED		= 0xFFFFFFF0,
+	GVE_ADMINQ_COMMAND_ERROR_ALREADY_EXISTS		= 0xFFFFFFF1,
+	GVE_ADMINQ_COMMAND_ERROR_CANCELLED		= 0xFFFFFFF2,
+	GVE_ADMINQ_COMMAND_ERROR_DATALOSS		= 0xFFFFFFF3,
+	GVE_ADMINQ_COMMAND_ERROR_DEADLINE_EXCEEDED	= 0xFFFFFFF4,
+	GVE_ADMINQ_COMMAND_ERROR_FAILED_PRECONDITION	= 0xFFFFFFF5,
+	GVE_ADMINQ_COMMAND_ERROR_INTERNAL_ERROR		= 0xFFFFFFF6,
+	GVE_ADMINQ_COMMAND_ERROR_INVALID_ARGUMENT	= 0xFFFFFFF7,
+	GVE_ADMINQ_COMMAND_ERROR_NOT_FOUND		= 0xFFFFFFF8,
+	GVE_ADMINQ_COMMAND_ERROR_OUT_OF_RANGE		= 0xFFFFFFF9,
+	GVE_ADMINQ_COMMAND_ERROR_PERMISSION_DENIED	= 0xFFFFFFFA,
+	GVE_ADMINQ_COMMAND_ERROR_UNAUTHENTICATED	= 0xFFFFFFFB,
+	GVE_ADMINQ_COMMAND_ERROR_RESOURCE_EXHAUSTED	= 0xFFFFFFFC,
+	GVE_ADMINQ_COMMAND_ERROR_UNAVAILABLE		= 0xFFFFFFFD,
+	GVE_ADMINQ_COMMAND_ERROR_UNIMPLEMENTED		= 0xFFFFFFFE,
+	GVE_ADMINQ_COMMAND_ERROR_UNKNOWN_ERROR		= 0xFFFFFFFF,
+};
+
+#define GVE_ADMINQ_DEVICE_DESCRIPTOR_VERSION 1
+
+/* All AdminQ command structs should be naturally packed.
+ * GVE_CHECK_STRUCT/UNION_LEN will check struct/union length and throw
+ * error at compile time when the size is not correct.
+ */
+
+struct gve_adminq_describe_device {
+	__be64 device_descriptor_addr;
+	__be32 device_descriptor_version;
+	__be32 available_length;
+};
+
+GVE_CHECK_STRUCT_LEN(16, gve_adminq_describe_device);
+
+struct gve_device_descriptor {
+	__be64 max_registered_pages;
+	__be16 reserved1;
+	__be16 tx_queue_entries;
+	__be16 rx_queue_entries;
+	__be16 default_num_queues;
+	__be16 mtu;
+	__be16 counters;
+	__be16 tx_pages_per_qpl;
+	__be16 rx_pages_per_qpl;
+	u8  mac[ETH_ALEN];
+	__be16 num_device_options;
+	__be16 total_length;
+	u8  reserved2[6];
+};
+
+GVE_CHECK_STRUCT_LEN(40, gve_device_descriptor);
+
+struct gve_device_option {
+	__be16 option_id;
+	__be16 option_length;
+	__be32 required_features_mask;
+};
+
+GVE_CHECK_STRUCT_LEN(8, gve_device_option);
+
+struct gve_device_option_gqi_rda {
+	__be32 supported_features_mask;
+};
+
+GVE_CHECK_STRUCT_LEN(4, gve_device_option_gqi_rda);
+
+struct gve_device_option_gqi_qpl {
+	__be32 supported_features_mask;
+};
+
+GVE_CHECK_STRUCT_LEN(4, gve_device_option_gqi_qpl);
+
+struct gve_device_option_dqo_rda {
+	__be32 supported_features_mask;
+	__be16 tx_comp_ring_entries;
+	__be16 rx_buff_ring_entries;
+};
+
+GVE_CHECK_STRUCT_LEN(8, gve_device_option_dqo_rda);
+
+struct gve_device_option_jumbo_frames {
+	__be32 supported_features_mask;
+	__be16 max_mtu;
+	u8 padding[2];
+};
+
+GVE_CHECK_STRUCT_LEN(8, gve_device_option_jumbo_frames);
+
+/* Terminology:
+ *
+ * RDA - Raw DMA Addressing - Buffers associated with SKBs are directly DMA
+ *       mapped and read/updated by the device.
+ *
+ * QPL - Queue Page Lists - Driver uses bounce buffers which are DMA mapped with
+ *       the device for read/write and data is copied from/to SKBs.
+ */
+enum gve_dev_opt_id {
+	GVE_DEV_OPT_ID_GQI_RAW_ADDRESSING = 0x1,
+	GVE_DEV_OPT_ID_GQI_RDA = 0x2,
+	GVE_DEV_OPT_ID_GQI_QPL = 0x3,
+	GVE_DEV_OPT_ID_DQO_RDA = 0x4,
+	GVE_DEV_OPT_ID_JUMBO_FRAMES = 0x8,
+};
+
+enum gve_dev_opt_req_feat_mask {
+	GVE_DEV_OPT_REQ_FEAT_MASK_GQI_RAW_ADDRESSING = 0x0,
+	GVE_DEV_OPT_REQ_FEAT_MASK_GQI_RDA = 0x0,
+	GVE_DEV_OPT_REQ_FEAT_MASK_GQI_QPL = 0x0,
+	GVE_DEV_OPT_REQ_FEAT_MASK_DQO_RDA = 0x0,
+	GVE_DEV_OPT_REQ_FEAT_MASK_JUMBO_FRAMES = 0x0,
+};
+
+enum gve_sup_feature_mask {
+	GVE_SUP_JUMBO_FRAMES_MASK = 1 << 2,
+};
+
+#define GVE_DEV_OPT_LEN_GQI_RAW_ADDRESSING 0x0
+
+struct gve_adminq_configure_device_resources {
+	__be64 counter_array;
+	__be64 irq_db_addr;
+	__be32 num_counters;
+	__be32 num_irq_dbs;
+	__be32 irq_db_stride;
+	__be32 ntfy_blk_msix_base_idx;
+	u8 queue_format;
+	u8 padding[7];
+};
+
+GVE_CHECK_STRUCT_LEN(40, gve_adminq_configure_device_resources);
+
+struct gve_adminq_register_page_list {
+	__be32 page_list_id;
+	__be32 num_pages;
+	__be64 page_address_list_addr;
+};
+
+GVE_CHECK_STRUCT_LEN(16, gve_adminq_register_page_list);
+
+struct gve_adminq_unregister_page_list {
+	__be32 page_list_id;
+};
+
+GVE_CHECK_STRUCT_LEN(4, gve_adminq_unregister_page_list);
+
+#define GVE_RAW_ADDRESSING_QPL_ID 0xFFFFFFFF
+
+struct gve_adminq_create_tx_queue {
+	__be32 queue_id;
+	__be32 reserved;
+	__be64 queue_resources_addr;
+	__be64 tx_ring_addr;
+	__be32 queue_page_list_id;
+	__be32 ntfy_id;
+	__be64 tx_comp_ring_addr;
+	__be16 tx_ring_size;
+	__be16 tx_comp_ring_size;
+	u8 padding[4];
+};
+
+GVE_CHECK_STRUCT_LEN(48, gve_adminq_create_tx_queue);
+
+struct gve_adminq_create_rx_queue {
+	__be32 queue_id;
+	__be32 index;
+	__be32 reserved;
+	__be32 ntfy_id;
+	__be64 queue_resources_addr;
+	__be64 rx_desc_ring_addr;
+	__be64 rx_data_ring_addr;
+	__be32 queue_page_list_id;
+	__be16 rx_ring_size;
+	__be16 packet_buffer_size;
+	__be16 rx_buff_ring_size;
+	u8 enable_rsc;
+	u8 padding[5];
+};
+
+GVE_CHECK_STRUCT_LEN(56, gve_adminq_create_rx_queue);
+
+/* Queue resources that are shared with the device */
+struct gve_queue_resources {
+	union {
+		struct {
+			__be32 db_index;	/* Device -> Guest */
+			__be32 counter_index;	/* Device -> Guest */
+		};
+		u8 reserved[64];
+	};
+};
+
+GVE_CHECK_STRUCT_LEN(64, gve_queue_resources);
+
+struct gve_adminq_destroy_tx_queue {
+	__be32 queue_id;
+};
+
+GVE_CHECK_STRUCT_LEN(4, gve_adminq_destroy_tx_queue);
+
+struct gve_adminq_destroy_rx_queue {
+	__be32 queue_id;
+};
+
+GVE_CHECK_STRUCT_LEN(4, gve_adminq_destroy_rx_queue);
+
+/* GVE Set Driver Parameter Types */
+enum gve_set_driver_param_types {
+	GVE_SET_PARAM_MTU	= 0x1,
+};
+
+struct gve_adminq_set_driver_parameter {
+	__be32 parameter_type;
+	u8 reserved[4];
+	__be64 parameter_value;
+};
+
+GVE_CHECK_STRUCT_LEN(16, gve_adminq_set_driver_parameter);
+
+struct gve_adminq_report_stats {
+	__be64 stats_report_len;
+	__be64 stats_report_addr;
+	__be64 interval;
+};
+
+GVE_CHECK_STRUCT_LEN(24, gve_adminq_report_stats);
+
+struct gve_adminq_report_link_speed {
+	__be64 link_speed_address;
+};
+
+GVE_CHECK_STRUCT_LEN(8, gve_adminq_report_link_speed);
+
+struct stats {
+	__be32 stat_name;
+	__be32 queue_id;
+	__be64 value;
+};
+
+GVE_CHECK_STRUCT_LEN(16, stats);
+
+struct gve_stats_report {
+	__be64 written_count;
+	struct stats stats[];
+};
+
+GVE_CHECK_STRUCT_LEN(8, gve_stats_report);
+
+enum gve_stat_names {
+	/* stats from gve */
+	TX_WAKE_CNT			= 1,
+	TX_STOP_CNT			= 2,
+	TX_FRAMES_SENT			= 3,
+	TX_BYTES_SENT			= 4,
+	TX_LAST_COMPLETION_PROCESSED	= 5,
+	RX_NEXT_EXPECTED_SEQUENCE	= 6,
+	RX_BUFFERS_POSTED		= 7,
+	TX_TIMEOUT_CNT			= 8,
+	/* stats from NIC */
+	RX_QUEUE_DROP_CNT		= 65,
+	RX_NO_BUFFERS_POSTED		= 66,
+	RX_DROPS_PACKET_OVER_MRU	= 67,
+	RX_DROPS_INVALID_CHECKSUM	= 68,
+};
+
+enum gve_l3_type {
+	/* Must be zero so zero initialized LUT is unknown. */
+	GVE_L3_TYPE_UNKNOWN = 0,
+	GVE_L3_TYPE_OTHER,
+	GVE_L3_TYPE_IPV4,
+	GVE_L3_TYPE_IPV6,
+};
+
+enum gve_l4_type {
+	/* Must be zero so zero initialized LUT is unknown. */
+	GVE_L4_TYPE_UNKNOWN = 0,
+	GVE_L4_TYPE_OTHER,
+	GVE_L4_TYPE_TCP,
+	GVE_L4_TYPE_UDP,
+	GVE_L4_TYPE_ICMP,
+	GVE_L4_TYPE_SCTP,
+};
+
+/* These are control path types for PTYPE which are the same as the data path
+ * types.
+ */
+struct gve_ptype_entry {
+	u8 l3_type;
+	u8 l4_type;
+};
+
+struct gve_ptype_map {
+	struct gve_ptype_entry ptypes[1 << 10]; /* PTYPES are always 10 bits. */
+};
+
+struct gve_adminq_get_ptype_map {
+	__be64 ptype_map_len;
+	__be64 ptype_map_addr;
+};
+
+union gve_adminq_command {
+	struct {
+		__be32 opcode;
+		__be32 status;
+		union {
+			struct gve_adminq_configure_device_resources
+						configure_device_resources;
+			struct gve_adminq_create_tx_queue create_tx_queue;
+			struct gve_adminq_create_rx_queue create_rx_queue;
+			struct gve_adminq_destroy_tx_queue destroy_tx_queue;
+			struct gve_adminq_destroy_rx_queue destroy_rx_queue;
+			struct gve_adminq_describe_device describe_device;
+			struct gve_adminq_register_page_list reg_page_list;
+			struct gve_adminq_unregister_page_list unreg_page_list;
+			struct gve_adminq_set_driver_parameter set_driver_param;
+			struct gve_adminq_report_stats report_stats;
+			struct gve_adminq_report_link_speed report_link_speed;
+			struct gve_adminq_get_ptype_map get_ptype_map;
+		};
+	};
+	u8 reserved[64];
+};
+
+GVE_CHECK_UNION_LEN(64, gve_adminq_command);
+
+int gve_adminq_alloc(struct gve_priv *priv);
+void gve_adminq_free(struct gve_priv *priv);
+void gve_adminq_release(struct gve_priv *priv);
+int gve_adminq_describe_device(struct gve_priv *priv);
+int gve_adminq_configure_device_resources(struct gve_priv *priv,
+					  dma_addr_t counter_array_bus_addr,
+					  u32 num_counters,
+					  dma_addr_t db_array_bus_addr,
+					  u32 num_ntfy_blks);
+int gve_adminq_deconfigure_device_resources(struct gve_priv *priv);
+int gve_adminq_create_tx_queues(struct gve_priv *priv, u32 num_queues);
+int gve_adminq_destroy_tx_queues(struct gve_priv *priv, u32 queue_id);
+int gve_adminq_create_rx_queues(struct gve_priv *priv, u32 num_queues);
+int gve_adminq_destroy_rx_queues(struct gve_priv *priv, u32 queue_id);
+int gve_adminq_register_page_list(struct gve_priv *priv,
+				  struct gve_queue_page_list *qpl);
+int gve_adminq_unregister_page_list(struct gve_priv *priv, u32 page_list_id);
+int gve_adminq_set_mtu(struct gve_priv *priv, u64 mtu);
+int gve_adminq_report_stats(struct gve_priv *priv, u64 stats_report_len,
+			    dma_addr_t stats_report_addr, u64 interval);
+int gve_adminq_report_link_speed(struct gve_priv *priv);
+
+struct gve_ptype_lut;
+int gve_adminq_get_ptype_map_dqo(struct gve_priv *priv,
+				 struct gve_ptype_lut *ptype_lut);
+
+#endif /* _GVE_ADMINQ_H */
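
The GVE_CHECK_STRUCT_LEN()/GVE_CHECK_UNION_LEN() annotations sprinkled through
this header turn any mismatch between a command struct and its wire size into
a divide-by-zero at compile time. A small sketch with a hypothetical struct
shows the effect:

struct gve_example_cmd {	/* illustrative only, not a real command */
	__be32 id;
	__be32 len;
};

GVE_CHECK_STRUCT_LEN(8, gve_example_cmd);	/* builds: sizeof() == 8 */
/* GVE_CHECK_STRUCT_LEN(12, gve_example_cmd); would fail to compile:
 * (12) / ((sizeof(struct gve_example_cmd) == 12) ? 1 : 0) divides by zero.
 */
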
diff --git a/drivers/net/gve/gve_desc.h b/drivers/net/gve/gve_desc.h
new file mode 100644
index 0000000000..358755b7e0
--- /dev/null
+++ b/drivers/net/gve/gve_desc.h
@@ -0,0 +1,137 @@
+/* SPDX-License-Identifier: MIT
+ * Google Virtual Ethernet (gve) driver
+ * Version: 1.3.0
+ * Copyright (C) 2015-2022 Google, Inc.
+ */
+
+/* GVE Transmit Descriptor formats */
+
+#ifndef _GVE_DESC_H_
+#define _GVE_DESC_H_
+
+/* A note on seg_addrs
+ *
+ * Base addresses encoded in seg_addr are not assumed to be physical
+ * addresses. The ring format assumes these come from some linear address
+ * space. This could be physical memory, kernel virtual memory, user virtual
+ * memory.
+ * If raw dma addressing is not supported then gVNIC uses lists of registered
+ * pages. Each queue is assumed to be associated with a single such linear
+ * address space to ensure a consistent meaning for seg_addrs posted to its
+ * rings.
+ */
+
+struct gve_tx_pkt_desc {
+	u8	type_flags;  /* desc type is lower 4 bits, flags upper */
+	u8	l4_csum_offset;  /* relative offset of L4 csum word */
+	u8	l4_hdr_offset;  /* Offset of start of L4 headers in packet */
+	u8	desc_cnt;  /* Total descriptors for this packet */
+	__be16	len;  /* Total length of this packet (in bytes) */
+	__be16	seg_len;  /* Length of this descriptor's segment */
+	__be64	seg_addr;  /* Base address (see note) of this segment */
+} __packed;
+
+struct gve_tx_mtd_desc {
+	u8      type_flags;     /* type is lower 4 bits, subtype upper  */
+	u8      path_state;     /* state is lower 4 bits, hash type upper */
+	__be16  reserved0;
+	__be32  path_hash;
+	__be64  reserved1;
+} __packed;
+
+struct gve_tx_seg_desc {
+	u8	type_flags;	/* type is lower 4 bits, flags upper	*/
+	u8	l3_offset;	/* TSO: 2 byte units to start of IPH	*/
+	__be16	reserved;
+	__be16	mss;		/* TSO MSS				*/
+	__be16	seg_len;
+	__be64	seg_addr;
+} __packed;
+
+/* GVE Transmit Descriptor Types */
+#define	GVE_TXD_STD		(0x0 << 4) /* Std with Host Address	*/
+#define	GVE_TXD_TSO		(0x1 << 4) /* TSO with Host Address	*/
+#define	GVE_TXD_SEG		(0x2 << 4) /* Seg with Host Address	*/
+#define	GVE_TXD_MTD		(0x3 << 4) /* Metadata			*/
+
+/* GVE Transmit Descriptor Flags for Std Pkts */
+#define	GVE_TXF_L4CSUM	BIT(0)	/* Need csum offload */
+#define	GVE_TXF_TSTAMP	BIT(2)	/* Timestamp required */
+
+/* GVE Transmit Descriptor Flags for TSO Segs */
+#define	GVE_TXSF_IPV6	BIT(1)	/* IPv6 TSO */
+
+/* GVE Transmit Descriptor Options for MTD Segs */
+#define GVE_MTD_SUBTYPE_PATH		0
+
+#define GVE_MTD_PATH_STATE_DEFAULT	0
+#define GVE_MTD_PATH_STATE_TIMEOUT	1
+#define GVE_MTD_PATH_STATE_CONGESTION	2
+#define GVE_MTD_PATH_STATE_RETRANSMIT	3
+
+#define GVE_MTD_PATH_HASH_NONE         (0x0 << 4)
+#define GVE_MTD_PATH_HASH_L4           (0x1 << 4)
+
+/* GVE Receive Packet Descriptor */
+/* The start of an ethernet packet comes 2 bytes into the rx buffer.
+ * gVNIC adds this padding so that both the DMA and the L3/L4 protocol header
+ * accesses are aligned.
+ */
+#define GVE_RX_PAD 2
+
+struct gve_rx_desc {
+	u8	padding[48];
+	__be32	rss_hash;  /* Receive-side scaling hash (Toeplitz for gVNIC) */
+	__be16	mss;
+	__be16	reserved;  /* Reserved to zero */
+	u8	hdr_len;  /* Header length (L2-L4) including padding */
+	u8	hdr_off;  /* 64-byte-scaled offset into RX_DATA entry */
+	__sum16	csum;  /* 1's-complement partial checksum of L3+ bytes */
+	__be16	len;  /* Length of the received packet */
+	__be16	flags_seq;  /* Flags [15:3] and sequence number [2:0] (1-7) */
+} __packed;
+GVE_CHECK_STRUCT_LEN(64, gve_rx_desc);
+
+/* If the device supports raw dma addressing then the addr in data slot is
+ * the dma address of the buffer.
+ * If the device only supports registered segments then the addr is a byte
+ * offset into the registered segment (an ordered list of pages) where the
+ * buffer is.
+ */
+union gve_rx_data_slot {
+	__be64 qpl_offset;
+	__be64 addr;
+};
+
+/* GVE Receive Packet Descriptor Seq No */
+#define GVE_SEQNO(x) (be16_to_cpu(x) & 0x7)
+
+/* GVE Receive Packet Descriptor Flags */
+#define GVE_RXFLG(x)	cpu_to_be16(1 << (3 + (x)))
+#define	GVE_RXF_FRAG		GVE_RXFLG(3)	/* IP Fragment			*/
+#define	GVE_RXF_IPV4		GVE_RXFLG(4)	/* IPv4				*/
+#define	GVE_RXF_IPV6		GVE_RXFLG(5)	/* IPv6				*/
+#define	GVE_RXF_TCP		GVE_RXFLG(6)	/* TCP Packet			*/
+#define	GVE_RXF_UDP		GVE_RXFLG(7)	/* UDP Packet			*/
+#define	GVE_RXF_ERR		GVE_RXFLG(8)	/* Packet Error Detected	*/
+#define	GVE_RXF_PKT_CONT	GVE_RXFLG(10)	/* Multi Fragment RX packet	*/
+
+/* GVE IRQ */
+#define GVE_IRQ_ACK	BIT(31)
+#define GVE_IRQ_MASK	BIT(30)
+#define GVE_IRQ_EVENT	BIT(29)
+
+static inline bool gve_needs_rss(__be16 flag)
+{
+	if (flag & GVE_RXF_FRAG)
+		return false;
+	if (flag & (GVE_RXF_IPV4 | GVE_RXF_IPV6))
+		return true;
+	return false;
+}
+
+static inline u8 gve_next_seqno(u8 seq)
+{
+	return (seq + 1) == 8 ? 1 : seq + 1;
+}
+#endif /* _GVE_DESC_H_ */
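
The 3-bit sequence number is how the GQI Rx path detects that the device has
filled a descriptor: it cycles 1-7 and only matches the driver's expected
value once the slot has been written. A hedged sketch of the poll check (the
function name is hypothetical; a real Rx loop also needs a read barrier before
touching the rest of the descriptor):

static inline bool gve_rx_desc_ready_sketch(const struct gve_rx_desc *desc,
					    u8 expected_seqno)
{
	return GVE_SEQNO(desc->flags_seq) == expected_seqno;
}

/* After consuming a descriptor:
 *	expected_seqno = gve_next_seqno(expected_seqno);
 */
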
diff --git a/drivers/net/gve/gve_desc_dqo.h b/drivers/net/gve/gve_desc_dqo.h
new file mode 100644
index 0000000000..0d533abcd1
--- /dev/null
+++ b/drivers/net/gve/gve_desc_dqo.h
@@ -0,0 +1,254 @@
+/* SPDX-License-Identifier: MIT
+ * Google Virtual Ethernet (gve) driver
+ * Version: 1.3.0
+ * Copyright (C) 2015-2022 Google, Inc.
+ */
+
+/* GVE DQO Descriptor formats */
+
+#ifndef _GVE_DESC_DQO_H_
+#define _GVE_DESC_DQO_H_
+
+#define GVE_TX_MAX_HDR_SIZE_DQO 255
+#define GVE_TX_MIN_TSO_MSS_DQO 88
+
+#ifndef __LITTLE_ENDIAN_BITFIELD
+#error "Only little endian supported"
+#endif
+
+/* Basic TX descriptor (DTYPE 0x0C) */
+struct gve_tx_pkt_desc_dqo {
+	__le64 buf_addr;
+
+	/* Must be GVE_TX_PKT_DESC_DTYPE_DQO (0xc) */
+	u8 dtype: 5;
+
+	/* Denotes the last descriptor of a packet. */
+	u8 end_of_packet: 1;
+	u8 checksum_offload_enable: 1;
+
+	/* If set, will generate a descriptor completion for this descriptor. */
+	u8 report_event: 1;
+	u8 reserved0;
+	__le16 reserved1;
+
+	/* The TX completion associated with this packet will contain this tag.
+	 */
+	__le16 compl_tag;
+	u16 buf_size: 14;
+	u16 reserved2: 2;
+} __packed;
+GVE_CHECK_STRUCT_LEN(16, gve_tx_pkt_desc_dqo);
+
+#define GVE_TX_PKT_DESC_DTYPE_DQO 0xc
+#define GVE_TX_MAX_BUF_SIZE_DQO ((16 * 1024) - 1)
+
+/* Maximum number of data descriptors allowed per packet, or per-TSO segment. */
+#define GVE_TX_MAX_DATA_DESCS 10
+
+/* Min gap between tail and head to avoid cacheline overlap */
+#define GVE_TX_MIN_DESC_PREVENT_CACHE_OVERLAP 4
+
+/* "report_event" on TX packet descriptors may only be reported on the last
+ * descriptor of a TX packet, and they must be spaced apart with at least this
+ * value.
+ */
+#define GVE_TX_MIN_RE_INTERVAL 32
+
+struct gve_tx_context_cmd_dtype {
+	u8 dtype: 5;
+	u8 tso: 1;
+	u8 reserved1: 2;
+
+	u8 reserved2;
+};
+
+GVE_CHECK_STRUCT_LEN(2, gve_tx_context_cmd_dtype);
+
+/* TX Native TSO Context DTYPE (0x05)
+ *
+ * "flex" fields allow the driver to send additional packet context to HW.
+ */
+struct gve_tx_tso_context_desc_dqo {
+	/* The L4 payload bytes that should be segmented. */
+	u32 tso_total_len: 24;
+	u32 flex10: 8;
+
+	/* Max segment size in TSO excluding headers. */
+	u16 mss: 14;
+	u16 reserved: 2;
+
+	u8 header_len; /* Header length to use for TSO offload */
+	u8 flex11;
+	struct gve_tx_context_cmd_dtype cmd_dtype;
+	u8 flex0;
+	u8 flex5;
+	u8 flex6;
+	u8 flex7;
+	u8 flex8;
+	u8 flex9;
+} __packed;
+GVE_CHECK_STRUCT_LEN(16, gve_tx_tso_context_desc_dqo);
+
+#define GVE_TX_TSO_CTX_DESC_DTYPE_DQO 0x5
+
+/* General context descriptor for sending metadata. */
+struct gve_tx_general_context_desc_dqo {
+	u8 flex4;
+	u8 flex5;
+	u8 flex6;
+	u8 flex7;
+	u8 flex8;
+	u8 flex9;
+	u8 flex10;
+	u8 flex11;
+	struct gve_tx_context_cmd_dtype cmd_dtype;
+	u16 reserved;
+	u8 flex0;
+	u8 flex1;
+	u8 flex2;
+	u8 flex3;
+} __packed;
+GVE_CHECK_STRUCT_LEN(16, gve_tx_general_context_desc_dqo);
+
+#define GVE_TX_GENERAL_CTX_DESC_DTYPE_DQO 0x4
+
+/* Logical structure of metadata which is packed into context descriptor flex
+ * fields.
+ */
+struct gve_tx_metadata_dqo {
+	union {
+		struct {
+			u8 version;
+
+			/* If `skb->l4_hash` is set, this value should be
+			 * derived from `skb->hash`.
+			 *
+			 * A zero value means no l4_hash was associated with the
+			 * skb.
+			 */
+			u16 path_hash: 15;
+
+			/* Should be set to 1 if the flow associated with the
+			 * skb had a rehash from the TCP stack.
+			 */
+			u16 rehash_event: 1;
+		}  __packed;
+		u8 bytes[12];
+	};
+}  __packed;
+GVE_CHECK_STRUCT_LEN(12, gve_tx_metadata_dqo);
+
+#define GVE_TX_METADATA_VERSION_DQO 0
+
+/* TX completion descriptor */
+struct gve_tx_compl_desc {
+	/* For types 0-4 this is the TX queue ID associated with this
+	 * completion.
+	 */
+	u16 id: 11;
+
+	/* See: GVE_COMPL_TYPE_DQO* */
+	u16 type: 3;
+	u16 reserved0: 1;
+
+	/* Flipped by HW to notify the descriptor is populated. */
+	u16 generation: 1;
+	union {
+		/* For descriptor completions, this is the last index fetched
+		 * by HW + 1.
+		 */
+		__le16 tx_head;
+
+		/* For packet completions, this is the completion tag set on the
+		 * TX packet descriptors.
+		 */
+		__le16 completion_tag;
+	};
+	__le32 reserved1;
+} __packed;
+GVE_CHECK_STRUCT_LEN(8, gve_tx_compl_desc);
+
+#define GVE_COMPL_TYPE_DQO_PKT 0x2 /* Packet completion */
+#define GVE_COMPL_TYPE_DQO_DESC 0x4 /* Descriptor completion */
+#define GVE_COMPL_TYPE_DQO_MISS 0x1 /* Miss path completion */
+#define GVE_COMPL_TYPE_DQO_REINJECTION 0x3 /* Re-injection completion */
+
+/* Descriptor to post buffers to HW on buffer queue. */
+struct gve_rx_desc_dqo {
+	__le16 buf_id; /* ID returned in Rx completion descriptor */
+	__le16 reserved0;
+	__le32 reserved1;
+	__le64 buf_addr; /* DMA address of the buffer */
+	__le64 header_buf_addr;
+	__le64 reserved2;
+} __packed;
+GVE_CHECK_STRUCT_LEN(32, gve_rx_desc_dqo);
+
+/* Descriptor for HW to notify SW of new packets received on RX queue. */
+struct gve_rx_compl_desc_dqo {
+	/* Must be 1 */
+	u8 rxdid: 4;
+	u8 reserved0: 4;
+
+	/* Packet originated from this system rather than the network. */
+	u8 loopback: 1;
+	/* Set when IPv6 packet contains a destination options header or routing
+	 * header.
+	 */
+	u8 ipv6_ex_add: 1;
+	/* Invalid packet was received. */
+	u8 rx_error: 1;
+	u8 reserved1: 5;
+
+	u16 packet_type: 10;
+	u16 ip_hdr_err: 1;
+	u16 udp_len_err: 1;
+	u16 raw_cs_invalid: 1;
+	u16 reserved2: 3;
+
+	u16 packet_len: 14;
+	/* Flipped by HW to notify the descriptor is populated. */
+	u16 generation: 1;
+	/* Should be zero. */
+	u16 buffer_queue_id: 1;
+
+	u16 header_len: 10;
+	u16 rsc: 1;
+	u16 split_header: 1;
+	u16 reserved3: 4;
+
+	u8 descriptor_done: 1;
+	u8 end_of_packet: 1;
+	u8 header_buffer_overflow: 1;
+	u8 l3_l4_processed: 1;
+	u8 csum_ip_err: 1;
+	u8 csum_l4_err: 1;
+	u8 csum_external_ip_err: 1;
+	u8 csum_external_udp_err: 1;
+
+	u8 status_error1;
+
+	__le16 reserved5;
+	__le16 buf_id; /* Buffer ID which was sent on the buffer queue. */
+
+	union {
+		/* Packet checksum. */
+		__le16 raw_cs;
+		/* Segment length for RSC packets. */
+		__le16 rsc_seg_len;
+	};
+	__le32 hash;
+	__le32 reserved6;
+	__le64 reserved7;
+} __packed;
+
+GVE_CHECK_STRUCT_LEN(32, gve_rx_compl_desc_dqo);
+
+/* Ringing the doorbell too often can hurt performance.
+ *
+ * HW requires this value to be at least 8.
+ */
+#define GVE_RX_BUF_THRESH_DQO 32
+
+#endif /* _GVE_DESC_DQO_H_ */
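
Both DQO completion rings rely on the 'generation' bit rather than a sequence
number: the device writes the current generation value when it populates a
descriptor, that value flips on every pass over the ring, and the driver
toggles its expected value on each wrap. A minimal sketch with hypothetical
naming (a real poll loop would also issue a read barrier before consuming the
other fields):

static inline bool
gve_rx_compl_ready_sketch(const struct gve_rx_compl_desc_dqo *desc,
			  u8 expected_gen)
{
	return desc->generation == expected_gen;
}

/* On wrap-around of the completion ring:
 *	expected_gen ^= 1;
 */
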
diff --git a/drivers/net/gve/gve_register.h b/drivers/net/gve/gve_register.h
new file mode 100644
index 0000000000..b65f336be2
--- /dev/null
+++ b/drivers/net/gve/gve_register.h
@@ -0,0 +1,28 @@
+/* SPDX-License-Identifier: MIT
+ * Google Virtual Ethernet (gve) driver
+ * Version: 1.3.0
+ * Copyright (C) 2015-2022 Google, Inc.
+ */
+
+#ifndef _GVE_REGISTER_H_
+#define _GVE_REGISTER_H_
+
+/* Fixed Configuration Registers */
+struct gve_registers {
+	__be32	device_status;
+	__be32	driver_status;
+	__be32	max_tx_queues;
+	__be32	max_rx_queues;
+	__be32	adminq_pfn;
+	__be32	adminq_doorbell;
+	__be32	adminq_event_counter;
+	u8	reserved[3];
+	u8	driver_version;
+};
+
+enum gve_device_status_flags {
+	GVE_DEVICE_STATUS_RESET_MASK		= BIT(1),
+	GVE_DEVICE_STATUS_LINK_STATUS_MASK	= BIT(2),
+	GVE_DEVICE_STATUS_REPORT_STATS_MASK	= BIT(3),
+};
+#endif /* _GVE_REGISTER_H_ */
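
All of these registers are big-endian, so accesses go through the be32 I/O
helpers (ioread32be()/iowrite32be() from the OS-shim patch in this series).
A hedged example of checking link state from device_status; the helper name
is illustrative:

static bool gve_link_up_sketch(struct gve_registers *reg_bar)
{
	u32 status = ioread32be(&reg_bar->device_status);

	return !!(status & GVE_DEVICE_STATUS_LINK_STATUS_MASK);
}
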
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 192+ messages in thread

* [PATCH v2 02/10] net/gve: add logs and OS specific implementation
  2022-08-29  8:41   ` [PATCH v2 00/10] introduce GVE PMD Junfeng Guo
  2022-08-29  8:41     ` [PATCH v2 01/10] net/gve: introduce GVE PMD base code Junfeng Guo
@ 2022-08-29  8:41     ` Junfeng Guo
  2022-09-01 17:20       ` Ferruh Yigit
  2022-08-29  8:41     ` [PATCH v2 03/10] net/gve: support device initialization Junfeng Guo
                       ` (8 subsequent siblings)
  10 siblings, 1 reply; 192+ messages in thread
From: Junfeng Guo @ 2022-08-29  8:41 UTC (permalink / raw)
  To: qi.z.zhang, jingjing.wu
  Cc: ferruh.yigit, dev, xiaoyun.li, awogbemila, bruce.richardson,
	junfeng.guo, Haiyue Wang

Add GVE PMD logs.
Add some macro definitions and memory operations that are specific
to DPDK.

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com>
Signed-off-by: Junfeng Guo <junfeng.guo@intel.com>
---
 drivers/net/gve/gve_adminq.h   |   2 +
 drivers/net/gve/gve_desc.h     |   2 +
 drivers/net/gve/gve_desc_dqo.h |   2 +
 drivers/net/gve/gve_logs.h     |  22 +++++
 drivers/net/gve/gve_osdep.h    | 149 +++++++++++++++++++++++++++++++++
 drivers/net/gve/gve_register.h |   2 +
 6 files changed, 179 insertions(+)
 create mode 100644 drivers/net/gve/gve_logs.h
 create mode 100644 drivers/net/gve/gve_osdep.h

diff --git a/drivers/net/gve/gve_adminq.h b/drivers/net/gve/gve_adminq.h
index c7114cc883..cd496760ae 100644
--- a/drivers/net/gve/gve_adminq.h
+++ b/drivers/net/gve/gve_adminq.h
@@ -8,6 +8,8 @@
 #ifndef _GVE_ADMINQ_H
 #define _GVE_ADMINQ_H
 
+#include "gve_osdep.h"
+
 /* Admin queue opcodes */
 enum gve_adminq_opcodes {
 	GVE_ADMINQ_DESCRIBE_DEVICE		= 0x1,
diff --git a/drivers/net/gve/gve_desc.h b/drivers/net/gve/gve_desc.h
index 358755b7e0..627b9120dc 100644
--- a/drivers/net/gve/gve_desc.h
+++ b/drivers/net/gve/gve_desc.h
@@ -9,6 +9,8 @@
 #ifndef _GVE_DESC_H_
 #define _GVE_DESC_H_
 
+#include "gve_osdep.h"
+
 /* A note on seg_addrs
  *
  * Base addresses encoded in seg_addr are not assumed to be physical
diff --git a/drivers/net/gve/gve_desc_dqo.h b/drivers/net/gve/gve_desc_dqo.h
index 0d533abcd1..5031752b43 100644
--- a/drivers/net/gve/gve_desc_dqo.h
+++ b/drivers/net/gve/gve_desc_dqo.h
@@ -9,6 +9,8 @@
 #ifndef _GVE_DESC_DQO_H_
 #define _GVE_DESC_DQO_H_
 
+#include "gve_osdep.h"
+
 #define GVE_TX_MAX_HDR_SIZE_DQO 255
 #define GVE_TX_MIN_TSO_MSS_DQO 88
 
diff --git a/drivers/net/gve/gve_logs.h b/drivers/net/gve/gve_logs.h
new file mode 100644
index 0000000000..a050253f59
--- /dev/null
+++ b/drivers/net/gve/gve_logs.h
@@ -0,0 +1,22 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2022 Intel Corporation
+ */
+
+#ifndef _GVE_LOGS_H_
+#define _GVE_LOGS_H_
+
+extern int gve_logtype_init;
+extern int gve_logtype_driver;
+
+#define PMD_INIT_LOG(level, fmt, args...) \
+	rte_log(RTE_LOG_ ## level, gve_logtype_init, "%s(): " fmt "\n", \
+		__func__, ##args)
+
+#define PMD_DRV_LOG_RAW(level, fmt, args...) \
+	rte_log(RTE_LOG_ ## level, gve_logtype_driver, "%s(): " fmt, \
+		__func__, ## args)
+
+#define PMD_DRV_LOG(level, fmt, args...) \
+	PMD_DRV_LOG_RAW(level, fmt "\n", ## args)
+
+#endif
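
The two logtype variables are only declared extern here; they still have to be
defined and registered once in the PMD. A sketch of what that registration
typically looks like in a DPDK driver (the real definition is expected to live
in gve_ethdev.c):

RTE_LOG_REGISTER_SUFFIX(gve_logtype_init, init, NOTICE);
RTE_LOG_REGISTER_SUFFIX(gve_logtype_driver, driver, NOTICE);

/* Assuming the usual pmd.net.gve logtype prefix set by the build system,
 * the logs can then be raised at run time with something like:
 *	--log-level=pmd.net.gve.*,debug
 */
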
diff --git a/drivers/net/gve/gve_osdep.h b/drivers/net/gve/gve_osdep.h
new file mode 100644
index 0000000000..92acccf846
--- /dev/null
+++ b/drivers/net/gve/gve_osdep.h
@@ -0,0 +1,149 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2022 Intel Corporation
+ */
+
+#ifndef _GVE_OSDEP_H_
+#define _GVE_OSDEP_H_
+
+#include <string.h>
+#include <stdint.h>
+#include <stdio.h>
+#include <stdarg.h>
+#include <inttypes.h>
+#include <stdbool.h>
+
+#include <rte_bitops.h>
+#include <rte_byteorder.h>
+#include <rte_common.h>
+#include <rte_ether.h>
+#include <rte_io.h>
+#include <rte_log.h>
+#include <rte_malloc.h>
+#include <rte_memcpy.h>
+#include <rte_memzone.h>
+
+#include "gve_logs.h"
+
+typedef uint8_t u8;
+typedef uint16_t u16;
+typedef uint32_t u32;
+typedef uint64_t u64;
+
+typedef rte_be16_t __sum16;
+
+typedef rte_be16_t __be16;
+typedef rte_be32_t __be32;
+typedef rte_be64_t __be64;
+
+typedef rte_iova_t dma_addr_t;
+
+#define ETH_MIN_MTU	RTE_ETHER_MIN_MTU
+#define ETH_ALEN	RTE_ETHER_ADDR_LEN
+#define PAGE_SIZE	4096
+
+#define BIT(nr)		RTE_BIT32(nr)
+
+#define be16_to_cpu(x) rte_be_to_cpu_16(x)
+#define be32_to_cpu(x) rte_be_to_cpu_32(x)
+#define be64_to_cpu(x) rte_be_to_cpu_64(x)
+
+#define cpu_to_be16(x) rte_cpu_to_be_16(x)
+#define cpu_to_be32(x) rte_cpu_to_be_32(x)
+#define cpu_to_be64(x) rte_cpu_to_be_64(x)
+
+#define READ_ONCE32(x) rte_read32(&(x))
+
+#define ____cacheline_aligned	__rte_cache_aligned
+#define __packed		__rte_packed
+#define __iomem
+
+#define msleep(ms)		rte_delay_ms(ms)
+
+/* These macros are used to generate compilation errors if a struct/union
+ * is not exactly the correct length. It gives a divide by zero error if
+ * the struct/union is not of the correct size, otherwise it creates an
+ * enum that is never used.
+ */
+#define GVE_CHECK_STRUCT_LEN(n, X) enum gve_static_assert_enum_##X \
+	{ gve_static_assert_##X = (n) / ((sizeof(struct X) == (n)) ? 1 : 0) }
+#define GVE_CHECK_UNION_LEN(n, X) enum gve_static_assert_enum_##X \
+	{ gve_static_assert_##X = (n) / ((sizeof(union X) == (n)) ? 1 : 0) }
+
+static __rte_always_inline u8
+readb(volatile void *addr)
+{
+	return rte_read8(addr);
+}
+
+static __rte_always_inline void
+writeb(u8 value, volatile void *addr)
+{
+	rte_write8(value, addr);
+}
+
+static __rte_always_inline void
+writel(u32 value, volatile void *addr)
+{
+	rte_write32(value, addr);
+}
+
+static __rte_always_inline u32
+ioread32be(const volatile void *addr)
+{
+	return rte_be_to_cpu_32(rte_read32(addr));
+}
+
+static __rte_always_inline void
+iowrite32be(u32 value, volatile void *addr)
+{
+	writel(rte_cpu_to_be_32(value), addr);
+}
+
+/* DMA memory allocation tracking */
+struct gve_dma_mem {
+	void *va;
+	rte_iova_t pa;
+	uint32_t size;
+	const void *zone;
+};
+
+static inline void *
+gve_alloc_dma_mem(struct gve_dma_mem *mem, u64 size)
+{
+	static uint16_t gve_dma_memzone_id;
+	const struct rte_memzone *mz = NULL;
+	char z_name[RTE_MEMZONE_NAMESIZE];
+
+	if (!mem)
+		return NULL;
+
+	snprintf(z_name, sizeof(z_name), "gve_dma_%u",
+		 __atomic_fetch_add(&gve_dma_memzone_id, 1, __ATOMIC_RELAXED));
+	mz = rte_memzone_reserve_aligned(z_name, size, SOCKET_ID_ANY,
+					 RTE_MEMZONE_IOVA_CONTIG,
+					 PAGE_SIZE);
+	if (!mz)
+		return NULL;
+
+	mem->size = size;
+	mem->va = mz->addr;
+	mem->pa = mz->iova;
+	mem->zone = mz;
+	PMD_DRV_LOG(DEBUG, "memzone %s is allocated", mz->name);
+
+	return mem->va;
+}
+
+static inline void
+gve_free_dma_mem(struct gve_dma_mem *mem)
+{
+	PMD_DRV_LOG(DEBUG, "memzone %s to be freed",
+		    ((const struct rte_memzone *)mem->zone)->name);
+
+	rte_memzone_free(mem->zone);
+	mem->zone = NULL;
+	mem->va = NULL;
+	mem->pa = 0;
+}
+
+#endif /* _GVE_OSDEP_H_ */
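
gve_alloc_dma_mem()/gve_free_dma_mem() are the DPDK stand-ins for the kernel
DMA API used by the base code: one page-aligned, IOVA-contiguous memzone per
allocation, with the handle kept in struct gve_dma_mem. A hedged usage sketch
mirroring how the admin-queue code uses them:

static int gve_dma_roundtrip_sketch(void)
{
	struct gve_dma_mem mem;
	void *va;

	va = gve_alloc_dma_mem(&mem, PAGE_SIZE);
	if (va == NULL)
		return -ENOMEM;

	/* CPU uses 'va' (== mem.va); the device is given mem.pa. */

	gve_free_dma_mem(&mem);
	return 0;
}
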
diff --git a/drivers/net/gve/gve_register.h b/drivers/net/gve/gve_register.h
index b65f336be2..a599c1a08e 100644
--- a/drivers/net/gve/gve_register.h
+++ b/drivers/net/gve/gve_register.h
@@ -7,6 +7,8 @@
 #ifndef _GVE_REGISTER_H_
 #define _GVE_REGISTER_H_
 
+#include "gve_osdep.h"
+
 /* Fixed Configuration Registers */
 struct gve_registers {
 	__be32	device_status;
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 192+ messages in thread

* [PATCH v2 03/10] net/gve: support device initialization
  2022-08-29  8:41   ` [PATCH v2 00/10] introduce GVE PMD Junfeng Guo
  2022-08-29  8:41     ` [PATCH v2 01/10] net/gve: introduce GVE PMD base code Junfeng Guo
  2022-08-29  8:41     ` [PATCH v2 02/10] net/gve: add logs and OS specific implementation Junfeng Guo
@ 2022-08-29  8:41     ` Junfeng Guo
  2022-09-01 17:21       ` Ferruh Yigit
  2022-09-01 17:22       ` Ferruh Yigit
  2022-08-29  8:41     ` [PATCH v2 04/10] net/gve: add link update support Junfeng Guo
                       ` (7 subsequent siblings)
  10 siblings, 2 replies; 192+ messages in thread
From: Junfeng Guo @ 2022-08-29  8:41 UTC (permalink / raw)
  To: qi.z.zhang, jingjing.wu
  Cc: ferruh.yigit, dev, xiaoyun.li, awogbemila, bruce.richardson,
	junfeng.guo, Haiyue Wang

Support device init and the following dev_ops (a minimal usage
sketch follows the list):
  - dev_configure
  - dev_start
  - dev_stop
  - dev_close
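
A minimal application-side sketch of what these dev_ops back, using the
standard ethdev API (return values and queue setup are omitted since Rx/Tx
queues only arrive in later patches; port_id is hypothetical):

  struct rte_eth_conf conf = { 0 };
  uint16_t port_id = 0;

  rte_eth_dev_configure(port_id, 0, 0, &conf);	/* gve_dev_configure */
  rte_eth_dev_start(port_id);			/* gve_dev_start */
  rte_eth_dev_stop(port_id);			/* gve_dev_stop */
  rte_eth_dev_close(port_id);			/* gve_dev_close */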

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com>
Signed-off-by: Junfeng Guo <junfeng.guo@intel.com>
---
 drivers/net/gve/gve.h        | 249 +++++++++++++++++++++++
 drivers/net/gve/gve_adminq.c |   1 +
 drivers/net/gve/gve_ethdev.c | 375 +++++++++++++++++++++++++++++++++++
 drivers/net/gve/meson.build  |  13 ++
 drivers/net/gve/version.map  |   3 +
 drivers/net/meson.build      |   1 +
 6 files changed, 642 insertions(+)
 create mode 100644 drivers/net/gve/gve.h
 create mode 100644 drivers/net/gve/gve_ethdev.c
 create mode 100644 drivers/net/gve/meson.build
 create mode 100644 drivers/net/gve/version.map

diff --git a/drivers/net/gve/gve.h b/drivers/net/gve/gve.h
new file mode 100644
index 0000000000..704c88983c
--- /dev/null
+++ b/drivers/net/gve/gve.h
@@ -0,0 +1,249 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2022 Intel Corporation
+ */
+
+#ifndef _GVE_H_
+#define _GVE_H_
+
+#include <ethdev_driver.h>
+#include <ethdev_pci.h>
+#include <rte_ether.h>
+
+#include "gve_desc.h"
+
+#ifndef GOOGLE_VENDOR_ID
+#define GOOGLE_VENDOR_ID	0x1ae0
+#endif
+
+#define GVE_DEV_ID		0x0042
+
+#define GVE_REG_BAR	0
+#define GVE_DB_BAR	2
+
+/* 1 for management, 1 for rx, 1 for tx */
+#define GVE_MIN_MSIX		3
+
+/* PTYPEs are always 10 bits. */
+#define GVE_NUM_PTYPES	1024
+
+/* A list of pages registered with the device during setup and used by a queue
+ * as buffers
+ */
+struct gve_queue_page_list {
+	uint32_t id; /* unique id */
+	uint32_t num_entries;
+	dma_addr_t *page_buses; /* the dma addrs of the pages */
+	const struct rte_memzone *mz;
+};
+
+/* A TX desc ring entry */
+union gve_tx_desc {
+	struct gve_tx_pkt_desc pkt; /* first desc for a packet */
+	struct gve_tx_seg_desc seg; /* subsequent descs for a packet */
+};
+
+struct gve_tx_queue {
+	volatile union gve_tx_desc *tx_desc_ring;
+	const struct rte_memzone *mz;
+	uint64_t tx_ring_phys_addr;
+
+	uint16_t nb_tx_desc;
+
+	/* Only valid for GQI_QPL queue format */
+	struct gve_queue_page_list *qpl;
+
+	uint16_t port_id;
+	uint16_t queue_id;
+
+	uint16_t ntfy_id;
+	volatile rte_be32_t *ntfy_addr;
+
+	struct gve_priv *hw;
+	const struct rte_memzone *qres_mz;
+	struct gve_queue_resources *qres;
+
+	/* Only valid for DQO_RDA queue format */
+	struct gve_tx_queue *complq;
+};
+
+struct gve_rx_queue {
+	volatile struct gve_rx_desc *rx_desc_ring;
+	volatile union gve_rx_data_slot *rx_data_ring;
+	const struct rte_memzone *mz;
+	const struct rte_memzone *data_mz;
+	uint64_t rx_ring_phys_addr;
+
+	uint16_t nb_rx_desc;
+
+	volatile rte_be32_t *ntfy_addr;
+
+	/* only valid for GQI_QPL queue format */
+	struct gve_queue_page_list *qpl;
+
+	struct gve_priv *hw;
+	const struct rte_memzone *qres_mz;
+	struct gve_queue_resources *qres;
+
+	uint16_t port_id;
+	uint16_t queue_id;
+	uint16_t ntfy_id;
+	uint16_t rx_buf_len;
+
+	/* Only valid for DQO_RDA queue format */
+	struct gve_rx_queue *bufq;
+};
+
+struct gve_irq_db {
+	rte_be32_t id;
+} ____cacheline_aligned;
+
+struct gve_ptype {
+	uint8_t l3_type;  /* `gve_l3_type` in gve_adminq.h */
+	uint8_t l4_type;  /* `gve_l4_type` in gve_adminq.h */
+};
+
+struct gve_ptype_lut {
+	struct gve_ptype ptypes[GVE_NUM_PTYPES];
+};
+
+enum gve_queue_format {
+	GVE_QUEUE_FORMAT_UNSPECIFIED = 0x0, /* default unspecified */
+	GVE_GQI_RDA_FORMAT	     = 0x1, /* GQI Raw Addressing */
+	GVE_GQI_QPL_FORMAT	     = 0x2, /* GQI Queue Page List */
+	GVE_DQO_RDA_FORMAT	     = 0x3, /* DQO Raw Addressing */
+};
+
+struct gve_priv {
+	struct gve_irq_db *irq_dbs; /* array of num_ntfy_blks */
+	const struct rte_memzone *irq_dbs_mz;
+	uint32_t mgmt_msix_idx;
+	rte_be32_t *cnt_array; /* array of num_event_counters */
+	const struct rte_memzone *cnt_array_mz;
+
+	uint16_t num_event_counters;
+	uint16_t tx_desc_cnt; /* txq size */
+	uint16_t rx_desc_cnt; /* rxq size */
+	uint16_t tx_pages_per_qpl; /* tx buffer length */
+	uint16_t rx_data_slot_cnt; /* rx buffer length */
+
+	/* Only valid for DQO_RDA queue format */
+	uint16_t tx_compq_size; /* tx completion queue size */
+	uint16_t rx_bufq_size; /* rx buff queue size */
+
+	uint64_t max_registered_pages;
+	uint64_t num_registered_pages; /* num pages registered with NIC */
+	uint16_t default_num_queues; /* default num queues to set up */
+	enum gve_queue_format queue_format; /* see enum gve_queue_format */
+	uint8_t enable_lsc;
+
+	uint16_t max_nb_txq;
+	uint16_t max_nb_rxq;
+	uint32_t num_ntfy_blks; /* split between TX and RX so must be even */
+
+	struct gve_registers __iomem *reg_bar0; /* see gve_register.h */
+	rte_be32_t __iomem *db_bar2; /* "array" of doorbells */
+	struct rte_pci_device *pci_dev;
+
+	/* Admin queue - see gve_adminq.h */
+	union gve_adminq_command *adminq;
+	struct gve_dma_mem adminq_dma_mem;
+	uint32_t adminq_mask; /* masks prod_cnt to adminq size */
+	uint32_t adminq_prod_cnt; /* free-running count of AQ cmds executed */
+	uint32_t adminq_cmd_fail; /* free-running count of AQ cmds failed */
+	uint32_t adminq_timeouts; /* free-running count of AQ cmds timeouts */
+	/* free-running count of per AQ cmd executed */
+	uint32_t adminq_describe_device_cnt;
+	uint32_t adminq_cfg_device_resources_cnt;
+	uint32_t adminq_register_page_list_cnt;
+	uint32_t adminq_unregister_page_list_cnt;
+	uint32_t adminq_create_tx_queue_cnt;
+	uint32_t adminq_create_rx_queue_cnt;
+	uint32_t adminq_destroy_tx_queue_cnt;
+	uint32_t adminq_destroy_rx_queue_cnt;
+	uint32_t adminq_dcfg_device_resources_cnt;
+	uint32_t adminq_set_driver_parameter_cnt;
+	uint32_t adminq_report_stats_cnt;
+	uint32_t adminq_report_link_speed_cnt;
+	uint32_t adminq_get_ptype_map_cnt;
+
+	volatile uint32_t state_flags;
+
+	/* Gvnic device link speed from hypervisor. */
+	uint64_t link_speed;
+
+	uint16_t max_mtu;
+	struct rte_ether_addr dev_addr; /* mac address */
+
+	struct gve_queue_page_list *qpl;
+
+	struct gve_tx_queue **txqs;
+	struct gve_rx_queue **rxqs;
+};
+
+enum gve_state_flags_bit {
+	GVE_PRIV_FLAGS_ADMIN_QUEUE_OK		= 1,
+	GVE_PRIV_FLAGS_DEVICE_RESOURCES_OK	= 2,
+	GVE_PRIV_FLAGS_DEVICE_RINGS_OK		= 3,
+	GVE_PRIV_FLAGS_NAPI_ENABLED		= 4,
+};
+
+static inline bool gve_is_gqi(struct gve_priv *priv)
+{
+	return priv->queue_format == GVE_GQI_RDA_FORMAT ||
+		priv->queue_format == GVE_GQI_QPL_FORMAT;
+}
+
+static inline bool gve_get_admin_queue_ok(struct gve_priv *priv)
+{
+	return !!rte_bit_relaxed_get32(GVE_PRIV_FLAGS_ADMIN_QUEUE_OK,
+				       &priv->state_flags);
+}
+
+static inline void gve_set_admin_queue_ok(struct gve_priv *priv)
+{
+	rte_bit_relaxed_set32(GVE_PRIV_FLAGS_ADMIN_QUEUE_OK,
+			      &priv->state_flags);
+}
+
+static inline void gve_clear_admin_queue_ok(struct gve_priv *priv)
+{
+	rte_bit_relaxed_clear32(GVE_PRIV_FLAGS_ADMIN_QUEUE_OK,
+				&priv->state_flags);
+}
+
+static inline bool gve_get_device_resources_ok(struct gve_priv *priv)
+{
+	return !!rte_bit_relaxed_get32(GVE_PRIV_FLAGS_DEVICE_RESOURCES_OK,
+				       &priv->state_flags);
+}
+
+static inline void gve_set_device_resources_ok(struct gve_priv *priv)
+{
+	rte_bit_relaxed_set32(GVE_PRIV_FLAGS_DEVICE_RESOURCES_OK,
+			      &priv->state_flags);
+}
+
+static inline void gve_clear_device_resources_ok(struct gve_priv *priv)
+{
+	rte_bit_relaxed_clear32(GVE_PRIV_FLAGS_DEVICE_RESOURCES_OK,
+				&priv->state_flags);
+}
+
+static inline bool gve_get_device_rings_ok(struct gve_priv *priv)
+{
+	return !!rte_bit_relaxed_get32(GVE_PRIV_FLAGS_DEVICE_RINGS_OK,
+				       &priv->state_flags);
+}
+
+static inline void gve_set_device_rings_ok(struct gve_priv *priv)
+{
+	rte_bit_relaxed_set32(GVE_PRIV_FLAGS_DEVICE_RINGS_OK,
+			      &priv->state_flags);
+}
+
+static inline void gve_clear_device_rings_ok(struct gve_priv *priv)
+{
+	rte_bit_relaxed_clear32(GVE_PRIV_FLAGS_DEVICE_RINGS_OK,
+				&priv->state_flags);
+}
+#endif /* _GVE_H_ */
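
gve_is_gqi() is the switch that the rest of the PMD keys off when the GQI and
DQO paths differ (descriptor layouts, QPL vs raw addressing, and so on). A
hedged sketch of the kind of branch later queue-setup code makes; the helper
name is hypothetical and assumes gve_desc_dqo.h is also included:

static inline size_t gve_tx_desc_size_sketch(struct gve_priv *priv)
{
	return gve_is_gqi(priv) ? sizeof(union gve_tx_desc)
				: sizeof(struct gve_tx_pkt_desc_dqo);
}
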
diff --git a/drivers/net/gve/gve_adminq.c b/drivers/net/gve/gve_adminq.c
index 8a724f12c6..438ca2070e 100644
--- a/drivers/net/gve/gve_adminq.c
+++ b/drivers/net/gve/gve_adminq.c
@@ -5,6 +5,7 @@
  * Copyright(C) 2022 Intel Corporation
  */
 
+#include "gve.h"
 #include "gve_adminq.h"
 #include "gve_register.h"
 
diff --git a/drivers/net/gve/gve_ethdev.c b/drivers/net/gve/gve_ethdev.c
new file mode 100644
index 0000000000..f10f273f7d
--- /dev/null
+++ b/drivers/net/gve/gve_ethdev.c
@@ -0,0 +1,375 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2022 Intel Corporation
+ */
+#include <linux/pci_regs.h>
+
+#include "gve.h"
+#include "gve_adminq.h"
+#include "gve_register.h"
+
+#define GVE_VERSION		"1.3.0"
+#define GVE_VERSION_PREFIX	"GVE-"
+
+const char gve_version_str[] = GVE_VERSION;
+static const char gve_version_prefix[] = GVE_VERSION_PREFIX;
+
+static void
+gve_write_version(uint8_t *driver_version_register)
+{
+	const char *c = gve_version_prefix;
+
+	while (*c) {
+		writeb(*c, driver_version_register);
+		c++;
+	}
+
+	c = gve_version_str;
+	while (*c) {
+		writeb(*c, driver_version_register);
+		c++;
+	}
+	writeb('\n', driver_version_register);
+}
+
+static int
+gve_dev_configure(__rte_unused struct rte_eth_dev *dev)
+{
+	return 0;
+}
+
+static int
+gve_dev_start(struct rte_eth_dev *dev)
+{
+	dev->data->dev_started = 1;
+
+	return 0;
+}
+
+static int
+gve_dev_stop(struct rte_eth_dev *dev)
+{
+	dev->data->dev_link.link_status = RTE_ETH_LINK_DOWN;
+	dev->data->dev_started = 0;
+
+	return 0;
+}
+
+static int
+gve_dev_close(struct rte_eth_dev *dev)
+{
+	int err = 0;
+
+	if (dev->data->dev_started) {
+		err = gve_dev_stop(dev);
+		if (err != 0)
+			PMD_DRV_LOG(ERR, "Failed to stop dev.");
+	}
+
+	return err;
+}
+
+static const struct eth_dev_ops gve_eth_dev_ops = {
+	.dev_configure        = gve_dev_configure,
+	.dev_start            = gve_dev_start,
+	.dev_stop             = gve_dev_stop,
+	.dev_close            = gve_dev_close,
+};
+
+static void
+gve_free_counter_array(struct gve_priv *priv)
+{
+	rte_memzone_free(priv->cnt_array_mz);
+	priv->cnt_array = NULL;
+}
+
+static void
+gve_free_irq_db(struct gve_priv *priv)
+{
+	rte_memzone_free(priv->irq_dbs_mz);
+	priv->irq_dbs = NULL;
+}
+
+static void
+gve_teardown_device_resources(struct gve_priv *priv)
+{
+	int err;
+
+	/* Tell device its resources are being freed */
+	if (gve_get_device_resources_ok(priv)) {
+		err = gve_adminq_deconfigure_device_resources(priv);
+		if (err)
+			PMD_DRV_LOG(ERR, "Could not deconfigure device resources: err=%d", err);
+	}
+	gve_free_counter_array(priv);
+	gve_free_irq_db(priv);
+	gve_clear_device_resources_ok(priv);
+}
+
+static uint8_t
+pci_dev_find_capability(struct rte_pci_device *pdev, int cap)
+{
+	uint8_t pos, id;
+	uint16_t ent;
+	int loops;
+	int ret;
+
+	ret = rte_pci_read_config(pdev, &pos, sizeof(pos), PCI_CAPABILITY_LIST);
+	if (ret != sizeof(pos))
+		return 0;
+
+	loops = (PCI_CFG_SPACE_SIZE - PCI_STD_HEADER_SIZEOF) / PCI_CAP_SIZEOF;
+
+	while (pos && loops--) {
+		ret = rte_pci_read_config(pdev, &ent, sizeof(ent), pos);
+		if (ret != sizeof(ent))
+			return 0;
+
+		id = ent & 0xff;
+		if (id == 0xff)
+			break;
+
+		if (id == cap)
+			return pos;
+
+		pos = (ent >> 8);
+	}
+
+	return 0;
+}
+
+static int
+pci_dev_msix_vec_count(struct rte_pci_device *pdev)
+{
+	uint8_t msix_cap = pci_dev_find_capability(pdev, PCI_CAP_ID_MSIX);
+	uint16_t control;
+	int ret;
+
+	if (!msix_cap)
+		return 0;
+
+	ret = rte_pci_read_config(pdev, &control, sizeof(control), msix_cap + PCI_MSIX_FLAGS);
+	if (ret != sizeof(control))
+		return 0;
+
+	return (control & PCI_MSIX_FLAGS_QSIZE) + 1;
+}
+
+static int
+gve_setup_device_resources(struct gve_priv *priv)
+{
+	char z_name[RTE_MEMZONE_NAMESIZE];
+	const struct rte_memzone *mz;
+	int err = 0;
+
+	snprintf(z_name, sizeof(z_name), "gve_%s_cnt_arr", priv->pci_dev->device.name);
+	mz = rte_memzone_reserve_aligned(z_name,
+					 priv->num_event_counters * sizeof(*priv->cnt_array),
+					 rte_socket_id(), RTE_MEMZONE_IOVA_CONTIG,
+					 PAGE_SIZE);
+	if (mz == NULL) {
+		PMD_INIT_LOG(ERR, "Could not alloc memzone for count array");
+		return -ENOMEM;
+	}
+	priv->cnt_array = (rte_be32_t *)mz->addr;
+	priv->cnt_array_mz = mz;
+
+	snprintf(z_name, sizeof(z_name), "gve_%s_irqmz", priv->pci_dev->device.name);
+	mz = rte_memzone_reserve_aligned(z_name,
+					 sizeof(*priv->irq_dbs) * (priv->num_ntfy_blks),
+					 rte_socket_id(), RTE_MEMZONE_IOVA_CONTIG,
+					 PAGE_SIZE);
+	if (mz == NULL) {
+		PMD_INIT_LOG(ERR, "Could not alloc memzone for irq_dbs");
+		err = -ENOMEM;
+		goto free_cnt_array;
+	}
+	priv->irq_dbs = (struct gve_irq_db *)mz->addr;
+	priv->irq_dbs_mz = mz;
+
+	err = gve_adminq_configure_device_resources(priv,
+						    priv->cnt_array_mz->iova,
+						    priv->num_event_counters,
+						    priv->irq_dbs_mz->iova,
+						    priv->num_ntfy_blks);
+	if (unlikely(err)) {
+		PMD_INIT_LOG(ERR, "Could not config device resources: err=%d", err);
+		goto free_irq_dbs;
+	}
+	return 0;
+
+free_irq_dbs:
+	gve_free_irq_db(priv);
+free_cnt_array:
+	gve_free_counter_array(priv);
+
+	return err;
+}
+
+static int
+gve_init_priv(struct gve_priv *priv, bool skip_describe_device)
+{
+	int num_ntfy;
+	int err;
+
+	/* Set up the adminq */
+	err = gve_adminq_alloc(priv);
+	if (err) {
+		PMD_INIT_LOG(ERR, "Failed to alloc admin queue: err=%d", err);
+		return err;
+	}
+
+	if (skip_describe_device)
+		goto setup_device;
+
+	/* Get the initial information we need from the device */
+	err = gve_adminq_describe_device(priv);
+	if (err) {
+		PMD_INIT_LOG(ERR, "Could not get device information: err=%d", err);
+		goto free_adminq;
+	}
+
+	num_ntfy = pci_dev_msix_vec_count(priv->pci_dev);
+	if (num_ntfy <= 0) {
+		PMD_DRV_LOG(ERR, "Could not count MSI-x vectors");
+		err = -EIO;
+		goto free_adminq;
+	} else if (num_ntfy < GVE_MIN_MSIX) {
+		PMD_DRV_LOG(ERR, "GVE needs at least %d MSI-x vectors, but only has %d",
+			    GVE_MIN_MSIX, num_ntfy);
+		err = -EINVAL;
+		goto free_adminq;
+	}
+
+	priv->num_registered_pages = 0;
+
+	/* gvnic has one Notification Block per MSI-x vector, except for the
+	 * management vector
+	 */
+	priv->num_ntfy_blks = (num_ntfy - 1) & ~0x1;
+	priv->mgmt_msix_idx = priv->num_ntfy_blks;
+
+	priv->max_nb_txq = RTE_MIN(priv->max_nb_txq, priv->num_ntfy_blks / 2);
+	priv->max_nb_rxq = RTE_MIN(priv->max_nb_rxq, priv->num_ntfy_blks / 2);
+
+	if (priv->default_num_queues > 0) {
+		priv->max_nb_txq = RTE_MIN(priv->default_num_queues, priv->max_nb_txq);
+		priv->max_nb_rxq = RTE_MIN(priv->default_num_queues, priv->max_nb_rxq);
+	}
+
+	PMD_DRV_LOG(INFO, "Max TX queues %d, Max RX queues %d",
+		    priv->max_nb_txq, priv->max_nb_rxq);
+
+setup_device:
+	err = gve_setup_device_resources(priv);
+	if (!err)
+		return 0;
+free_adminq:
+	gve_adminq_free(priv);
+	return err;
+}
+
+static void
+gve_teardown_priv_resources(struct gve_priv *priv)
+{
+	gve_teardown_device_resources(priv);
+	gve_adminq_free(priv);
+}
+
+static int
+gve_dev_init(struct rte_eth_dev *eth_dev)
+{
+	struct gve_priv *priv = eth_dev->data->dev_private;
+	int max_tx_queues, max_rx_queues;
+	struct rte_pci_device *pci_dev;
+	struct gve_registers *reg_bar;
+	rte_be32_t *db_bar;
+	int err;
+
+	eth_dev->dev_ops = &gve_eth_dev_ops;
+
+	if (rte_eal_process_type() != RTE_PROC_PRIMARY)
+		return 0;
+
+	pci_dev = RTE_DEV_TO_PCI(eth_dev->device);
+
+	reg_bar = pci_dev->mem_resource[GVE_REG_BAR].addr;
+	if (!reg_bar) {
+		PMD_INIT_LOG(ERR, "Failed to map PCI bar!");
+		return -ENOMEM;
+	}
+
+	db_bar = pci_dev->mem_resource[GVE_DB_BAR].addr;
+	if (!db_bar) {
+		PMD_INIT_LOG(ERR, "Failed to map doorbell bar!");
+		return -ENOMEM;
+	}
+
+	gve_write_version(&reg_bar->driver_version);
+	/* Get max queues to alloc etherdev */
+	max_tx_queues = ioread32be(&reg_bar->max_tx_queues);
+	max_rx_queues = ioread32be(&reg_bar->max_rx_queues);
+
+	priv->reg_bar0 = reg_bar;
+	priv->db_bar2 = db_bar;
+	priv->pci_dev = pci_dev;
+	priv->state_flags = 0x0;
+
+	priv->max_nb_txq = max_tx_queues;
+	priv->max_nb_rxq = max_rx_queues;
+
+	err = gve_init_priv(priv, false);
+	if (err)
+		return err;
+
+	eth_dev->data->mac_addrs = rte_zmalloc("gve_mac", sizeof(struct rte_ether_addr), 0);
+	if (!eth_dev->data->mac_addrs) {
+		PMD_INIT_LOG(ERR, "Failed to allocate memory to store mac address");
+		return -ENOMEM;
+	}
+	rte_ether_addr_copy(&priv->dev_addr, eth_dev->data->mac_addrs);
+
+	return 0;
+}
+
+static int
+gve_dev_uninit(struct rte_eth_dev *eth_dev)
+{
+	struct gve_priv *priv = eth_dev->data->dev_private;
+
+	eth_dev->data->mac_addrs = NULL;
+
+	gve_teardown_priv_resources(priv);
+
+	return 0;
+}
+
+static int
+gve_pci_probe(__rte_unused struct rte_pci_driver *pci_drv,
+	      struct rte_pci_device *pci_dev)
+{
+	return rte_eth_dev_pci_generic_probe(pci_dev, sizeof(struct gve_priv), gve_dev_init);
+}
+
+static int
+gve_pci_remove(struct rte_pci_device *pci_dev)
+{
+	return rte_eth_dev_pci_generic_remove(pci_dev, gve_dev_uninit);
+}
+
+static const struct rte_pci_id pci_id_gve_map[] = {
+	{ RTE_PCI_DEVICE(GOOGLE_VENDOR_ID, GVE_DEV_ID) },
+	{ .device_id = 0 },
+};
+
+static struct rte_pci_driver rte_gve_pmd = {
+	.id_table = pci_id_gve_map,
+	.drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC,
+	.probe = gve_pci_probe,
+	.remove = gve_pci_remove,
+};
+
+RTE_PMD_REGISTER_PCI(net_gve, rte_gve_pmd);
+RTE_PMD_REGISTER_PCI_TABLE(net_gve, pci_id_gve_map);
+RTE_PMD_REGISTER_KMOD_DEP(net_gve, "* igb_uio | vfio-pci");
+RTE_LOG_REGISTER_SUFFIX(gve_logtype_init, init, NOTICE);
+RTE_LOG_REGISTER_SUFFIX(gve_logtype_driver, driver, NOTICE);
diff --git a/drivers/net/gve/meson.build b/drivers/net/gve/meson.build
new file mode 100644
index 0000000000..9a22cc9abe
--- /dev/null
+++ b/drivers/net/gve/meson.build
@@ -0,0 +1,13 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(C) 2022 Intel Corporation
+
+if is_windows
+    build = false
+    reason = 'not supported on Windows'
+    subdir_done()
+endif
+
+sources = files(
+        'gve_adminq.c',
+        'gve_ethdev.c',
+)
diff --git a/drivers/net/gve/version.map b/drivers/net/gve/version.map
new file mode 100644
index 0000000000..c2e0723b4c
--- /dev/null
+++ b/drivers/net/gve/version.map
@@ -0,0 +1,3 @@
+DPDK_22 {
+	local: *;
+};
diff --git a/drivers/net/meson.build b/drivers/net/meson.build
index e35652fe63..f1a0ee2cef 100644
--- a/drivers/net/meson.build
+++ b/drivers/net/meson.build
@@ -23,6 +23,7 @@ drivers = [
         'enic',
         'failsafe',
         'fm10k',
+        'gve',
         'hinic',
         'hns3',
         'i40e',
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 192+ messages in thread

* [PATCH v2 04/10] net/gve: add link update support
  2022-08-29  8:41   ` [PATCH v2 00/10] introduce GVE PMD Junfeng Guo
                       ` (2 preceding siblings ...)
  2022-08-29  8:41     ` [PATCH v2 03/10] net/gve: support device initialization Junfeng Guo
@ 2022-08-29  8:41     ` Junfeng Guo
  2022-09-01 17:23       ` Ferruh Yigit
  2022-08-29  8:41     ` [PATCH v2 05/10] net/gve: add MTU set support Junfeng Guo
                       ` (6 subsequent siblings)
  10 siblings, 1 reply; 192+ messages in thread
From: Junfeng Guo @ 2022-08-29  8:41 UTC (permalink / raw)
  To: qi.z.zhang, jingjing.wu
  Cc: ferruh.yigit, dev, xiaoyun.li, awogbemila, bruce.richardson, junfeng.guo

Support dev_ops link_update.
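
Not part of the patch: a minimal application-side sketch of how the new
callback is reached through the generic ethdev API. The helper name is made
up, and it assumes port_id refers to a valid GVE port:

  #include <stdio.h>
  #include <rte_ethdev.h>

  static void
  show_link(uint16_t port_id)
  {
      struct rte_eth_link link;

      /* dispatches to gve_link_update() via dev_ops->link_update */
      if (rte_eth_link_get_nowait(port_id, &link) != 0)
          return;

      printf("Port %u: %s, speed %u Mbps\n", port_id,
             link.link_status == RTE_ETH_LINK_UP ? "up" : "down",
             link.link_speed);
  }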

Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com>
Signed-off-by: Junfeng Guo <junfeng.guo@intel.com>
---
 drivers/net/gve/gve_ethdev.c | 30 ++++++++++++++++++++++++++++++
 1 file changed, 30 insertions(+)

diff --git a/drivers/net/gve/gve_ethdev.c b/drivers/net/gve/gve_ethdev.c
index f10f273f7d..435115c047 100644
--- a/drivers/net/gve/gve_ethdev.c
+++ b/drivers/net/gve/gve_ethdev.c
@@ -37,10 +37,39 @@ gve_dev_configure(__rte_unused struct rte_eth_dev *dev)
 	return 0;
 }
 
+static int
+gve_link_update(struct rte_eth_dev *dev, __rte_unused int wait_to_complete)
+{
+	struct gve_priv *priv = dev->data->dev_private;
+	struct rte_eth_link link;
+	int err;
+
+	memset(&link, 0, sizeof(link));
+	link.link_duplex = RTE_ETH_LINK_FULL_DUPLEX;
+	link.link_autoneg = RTE_ETH_LINK_AUTONEG;
+
+	if (!dev->data->dev_started) {
+		link.link_status = RTE_ETH_LINK_DOWN;
+		link.link_speed = RTE_ETH_SPEED_NUM_NONE;
+	} else {
+		link.link_status = RTE_ETH_LINK_UP;
+		PMD_INIT_LOG(DEBUG, "Get link status from hw");
+		err = gve_adminq_report_link_speed(priv);
+		if (err) {
+			PMD_DRV_LOG(ERR, "Failed to get link speed.");
+			priv->link_speed = RTE_ETH_SPEED_NUM_UNKNOWN;
+		}
+		link.link_speed = priv->link_speed;
+	}
+
+	return rte_eth_linkstatus_set(dev, &link);
+}
+
 static int
 gve_dev_start(struct rte_eth_dev *dev)
 {
 	dev->data->dev_started = 1;
+	gve_link_update(dev, 0);
 
 	return 0;
 }
@@ -73,6 +102,7 @@ static const struct eth_dev_ops gve_eth_dev_ops = {
 	.dev_start            = gve_dev_start,
 	.dev_stop             = gve_dev_stop,
 	.dev_close            = gve_dev_close,
+	.link_update          = gve_link_update,
 };
 
 static void
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 192+ messages in thread

* [PATCH v2 05/10] net/gve: add MTU set support
  2022-08-29  8:41   ` [PATCH v2 00/10] introduce GVE PMD Junfeng Guo
                       ` (3 preceding siblings ...)
  2022-08-29  8:41     ` [PATCH v2 04/10] net/gve: add link update support Junfeng Guo
@ 2022-08-29  8:41     ` Junfeng Guo
  2022-08-29  8:41     ` [PATCH v2 06/10] net/gve: add queue operations Junfeng Guo
                       ` (5 subsequent siblings)
  10 siblings, 0 replies; 192+ messages in thread
From: Junfeng Guo @ 2022-08-29  8:41 UTC (permalink / raw)
  To: qi.z.zhang, jingjing.wu
  Cc: ferruh.yigit, dev, xiaoyun.li, awogbemila, bruce.richardson, junfeng.guo

Support dev_ops mtu_set.
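
A usage sketch for illustration only (helper name invented, error handling
trimmed). It relies on the behaviour added here that mtu_set is rejected
with -EBUSY while the port is running, so the port is stopped first:

  #include <rte_ethdev.h>

  static int
  set_port_mtu(uint16_t port_id, uint16_t mtu)
  {
      int ret;

      ret = rte_eth_dev_stop(port_id);         /* gve_dev_mtu_set() needs a stopped port */
      if (ret != 0)
          return ret;

      ret = rte_eth_dev_set_mtu(port_id, mtu); /* ends up in gve_dev_mtu_set() */
      if (ret != 0)
          return ret;

      return rte_eth_dev_start(port_id);
  }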

Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com>
Signed-off-by: Junfeng Guo <junfeng.guo@intel.com>
---
 drivers/net/gve/gve_ethdev.c | 29 +++++++++++++++++++++++++++++
 1 file changed, 29 insertions(+)

diff --git a/drivers/net/gve/gve_ethdev.c b/drivers/net/gve/gve_ethdev.c
index 435115c047..26b45fde6f 100644
--- a/drivers/net/gve/gve_ethdev.c
+++ b/drivers/net/gve/gve_ethdev.c
@@ -97,12 +97,41 @@ gve_dev_close(struct rte_eth_dev *dev)
 	return err;
 }
 
+static int
+gve_dev_mtu_set(struct rte_eth_dev *dev, uint16_t mtu)
+{
+	struct gve_priv *priv = dev->data->dev_private;
+	int err;
+
+	if (mtu < RTE_ETHER_MIN_MTU || mtu > priv->max_mtu) {
+		PMD_DRV_LOG(ERR, "MIN MTU is %u, MAX MTU is %u", RTE_ETHER_MIN_MTU, priv->max_mtu);
+		return -EINVAL;
+	}
+
+	/* MTU setting is forbidden while the port is started */
+	if (dev->data->dev_started) {
+		PMD_DRV_LOG(ERR, "Port must be stopped before configuration");
+		return -EBUSY;
+	}
+
+	dev->data->dev_conf.rxmode.mtu = mtu + RTE_ETHER_HDR_LEN;
+
+	err = gve_adminq_set_mtu(priv, mtu);
+	if (err) {
+		PMD_DRV_LOG(ERR, "Failed to set mtu as %u err = %d", mtu, err);
+		return err;
+	}
+
+	return 0;
+}
+
 static const struct eth_dev_ops gve_eth_dev_ops = {
 	.dev_configure        = gve_dev_configure,
 	.dev_start            = gve_dev_start,
 	.dev_stop             = gve_dev_stop,
 	.dev_close            = gve_dev_close,
 	.link_update          = gve_link_update,
+	.mtu_set              = gve_dev_mtu_set,
 };
 
 static void
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 192+ messages in thread

* [PATCH v2 06/10] net/gve: add queue operations
  2022-08-29  8:41   ` [PATCH v2 00/10] introduce GVE PMD Junfeng Guo
                       ` (4 preceding siblings ...)
  2022-08-29  8:41     ` [PATCH v2 05/10] net/gve: add MTU set support Junfeng Guo
@ 2022-08-29  8:41     ` Junfeng Guo
  2022-08-29  8:41     ` [PATCH v2 07/10] net/gve: add Rx/Tx support Junfeng Guo
                       ` (4 subsequent siblings)
  10 siblings, 0 replies; 192+ messages in thread
From: Junfeng Guo @ 2022-08-29  8:41 UTC (permalink / raw)
  To: qi.z.zhang, jingjing.wu
  Cc: ferruh.yigit, dev, xiaoyun.li, awogbemila, bruce.richardson, junfeng.guo

Add support for queue operations:
 - setup rx/tx queue
 - release rx/tx queue
 - start rx/tx queues
 - stop rx/tx queues
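
Below is a minimal setup sketch (illustrative, not part of the patch) that
exercises the new queue setup paths through the ethdev layer. Assumptions:
the rest of the series (dev_infos_get etc.) is applied, a single queue pair,
an already created mbuf pool "mp", default port config; the descriptor count
passed in is only a hint, since the PMD overrides it with the ring size
reported by the device:

  #include <rte_ethdev.h>
  #include <rte_lcore.h>
  #include <rte_mbuf.h>

  static int
  setup_one_queue_pair(uint16_t port_id, struct rte_mempool *mp)
  {
      struct rte_eth_conf conf = { 0 };
      int ret;

      ret = rte_eth_dev_configure(port_id, 1, 1, &conf);
      if (ret != 0)
          return ret;

      /* -> gve_rx_queue_setup() / gve_tx_queue_setup() */
      ret = rte_eth_rx_queue_setup(port_id, 0, 512, rte_socket_id(), NULL, mp);
      if (ret != 0)
          return ret;

      ret = rte_eth_tx_queue_setup(port_id, 0, 512, rte_socket_id(), NULL);
      if (ret != 0)
          return ret;

      /* queues are created on the admin queue when the port starts */
      return rte_eth_dev_start(port_id);
  }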

Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com>
Signed-off-by: Junfeng Guo <junfeng.guo@intel.com>
---
 drivers/net/gve/gve.h        |  52 +++++++++
 drivers/net/gve/gve_ethdev.c | 203 +++++++++++++++++++++++++++++++++
 drivers/net/gve/gve_rx.c     | 212 ++++++++++++++++++++++++++++++++++
 drivers/net/gve/gve_tx.c     | 214 +++++++++++++++++++++++++++++++++++
 drivers/net/gve/meson.build  |   2 +
 5 files changed, 683 insertions(+)
 create mode 100644 drivers/net/gve/gve_rx.c
 create mode 100644 drivers/net/gve/gve_tx.c

diff --git a/drivers/net/gve/gve.h b/drivers/net/gve/gve.h
index 704c88983c..a53a852a5f 100644
--- a/drivers/net/gve/gve.h
+++ b/drivers/net/gve/gve.h
@@ -23,6 +23,9 @@
 /* 1 for management, 1 for rx, 1 for tx */
 #define GVE_MIN_MSIX		3
 
+#define GVE_DEFAULT_RX_FREE_THRESH  512
+#define GVE_DEFAULT_TX_FREE_THRESH  256
+
 /* PTYPEs are always 10 bits. */
 #define GVE_NUM_PTYPES	1024
 
@@ -42,15 +45,35 @@ union gve_tx_desc {
 	struct gve_tx_seg_desc seg; /* subsequent descs for a packet */
 };
 
+struct gve_tx_iovec {
+	uint32_t iov_base; /* offset in fifo */
+	uint32_t iov_len;
+};
+
 struct gve_tx_queue {
 	volatile union gve_tx_desc *tx_desc_ring;
 	const struct rte_memzone *mz;
 	uint64_t tx_ring_phys_addr;
+	struct rte_mbuf **sw_ring;
+	volatile rte_be32_t *qtx_tail;
+	volatile rte_be32_t *qtx_head;
 
+	uint32_t tx_tail;
 	uint16_t nb_tx_desc;
+	uint16_t nb_free;
+	uint32_t next_to_clean;
+	uint16_t free_thresh;
 
 	/* Only valid for DQO_QPL queue format */
+	uint16_t sw_tail;
+	uint16_t sw_ntc;
+	uint16_t sw_nb_free;
+	uint32_t fifo_size;
+	uint32_t fifo_head;
+	uint32_t fifo_avail;
+	uint64_t fifo_base;
 	struct gve_queue_page_list *qpl;
+	struct gve_tx_iovec *iov_ring;
 
 	uint16_t port_id;
 	uint16_t queue_id;
@@ -64,6 +87,8 @@ struct gve_tx_queue {
 
 	/* Only valid for DQO_RDA queue format */
 	struct gve_tx_queue *complq;
+
+	uint8_t is_gqi_qpl;
 };
 
 struct gve_rx_queue {
@@ -72,9 +97,17 @@ struct gve_rx_queue {
 	const struct rte_memzone *mz;
 	const struct rte_memzone *data_mz;
 	uint64_t rx_ring_phys_addr;
+	struct rte_mbuf **sw_ring;
+	struct rte_mempool *mpool;
 
+	uint16_t rx_tail;
 	uint16_t nb_rx_desc;
+	uint16_t expected_seqno; /* the next expected seqno */
+	uint16_t free_thresh;
+	uint32_t next_avail;
+	uint32_t nb_avail;
 
+	volatile rte_be32_t *qrx_tail;
 	volatile rte_be32_t *ntfy_addr;
 
 	/* only valid for GQI_QPL queue format */
@@ -91,6 +124,8 @@ struct gve_rx_queue {
 
 	/* Only valid for DQO_RDA queue format */
 	struct gve_rx_queue *bufq;
+
+	uint8_t is_gqi_qpl;
 };
 
 struct gve_irq_db {
@@ -246,4 +281,21 @@ static inline void gve_clear_device_rings_ok(struct gve_priv *priv)
 	rte_bit_relaxed_clear32(GVE_PRIV_FLAGS_DEVICE_RINGS_OK,
 				&priv->state_flags);
 }
+
+int
+gve_rx_queue_setup(struct rte_eth_dev *dev, uint16_t queue_id, uint16_t nb_desc,
+		   unsigned int socket_id, const struct rte_eth_rxconf *conf,
+		   struct rte_mempool *pool);
+int
+gve_tx_queue_setup(struct rte_eth_dev *dev, uint16_t queue_id, uint16_t nb_desc,
+		   unsigned int socket_id, const struct rte_eth_txconf *conf);
+
+void gve_tx_queue_release(void *txq);
+
+void gve_rx_queue_release(void *rxq);
+
+void gve_stop_tx_queues(struct rte_eth_dev *dev);
+
+void gve_stop_rx_queues(struct rte_eth_dev *dev);
+
 #endif /* _GVE_H_ */
diff --git a/drivers/net/gve/gve_ethdev.c b/drivers/net/gve/gve_ethdev.c
index 26b45fde6f..5201398664 100644
--- a/drivers/net/gve/gve_ethdev.c
+++ b/drivers/net/gve/gve_ethdev.c
@@ -31,12 +31,111 @@ gve_write_version(uint8_t *driver_version_register)
 	writeb('\n', driver_version_register);
 }
 
+static int
+gve_alloc_queue_page_list(struct gve_priv *priv, uint32_t id, uint32_t pages)
+{
+	char z_name[RTE_MEMZONE_NAMESIZE];
+	struct gve_queue_page_list *qpl;
+	const struct rte_memzone *mz;
+	dma_addr_t page_bus;
+	uint32_t i;
+
+	if (priv->num_registered_pages + pages >
+	    priv->max_registered_pages) {
+		PMD_DRV_LOG(ERR, "Pages %" PRIu64 " > max registered pages %" PRIu64,
+			    priv->num_registered_pages + pages,
+			    priv->max_registered_pages);
+		return -EINVAL;
+	}
+	qpl = &priv->qpl[id];
+	snprintf(z_name, sizeof(z_name), "gve_%s_qpl%d", priv->pci_dev->device.name, id);
+	mz = rte_memzone_reserve_aligned(z_name, pages * PAGE_SIZE,
+					 rte_socket_id(),
+					 RTE_MEMZONE_IOVA_CONTIG, PAGE_SIZE);
+	if (mz == NULL) {
+		PMD_DRV_LOG(ERR, "Failed to alloc %s.", z_name);
+		return -ENOMEM;
+	}
+	qpl->page_buses = rte_zmalloc("qpl page buses", pages * sizeof(dma_addr_t), 0);
+	if (qpl->page_buses == NULL) {
+		PMD_DRV_LOG(ERR, "Failed to alloc qpl %u page buses", id);
+		return -ENOMEM;
+	}
+	page_bus = mz->iova;
+	for (i = 0; i < pages; i++) {
+		qpl->page_buses[i] = page_bus;
+		page_bus += PAGE_SIZE;
+	}
+	qpl->id = id;
+	qpl->mz = mz;
+	qpl->num_entries = pages;
+
+	priv->num_registered_pages += pages;
+
+	return 0;
+}
+
+static void
+gve_free_qpls(struct gve_priv *priv)
+{
+	uint16_t nb_txqs = priv->max_nb_txq;
+	uint16_t nb_rxqs = priv->max_nb_rxq;
+	uint32_t i;
+
+	for (i = 0; i < nb_txqs + nb_rxqs; i++) {
+		if (priv->qpl[i].mz != NULL)
+			rte_memzone_free(priv->qpl[i].mz);
+		if (priv->qpl[i].page_buses != NULL)
+			rte_free(priv->qpl[i].page_buses);
+	}
+
+	if (priv->qpl != NULL)
+		rte_free(priv->qpl);
+}
+
 static int
 gve_dev_configure(__rte_unused struct rte_eth_dev *dev)
 {
 	return 0;
 }
 
+static int
+gve_refill_pages(struct gve_rx_queue *rxq)
+{
+	struct rte_mbuf *nmb;
+	uint16_t i;
+	int diag;
+
+	diag = rte_pktmbuf_alloc_bulk(rxq->mpool, &rxq->sw_ring[0], rxq->nb_rx_desc);
+	if (diag < 0) {
+		for (i = 0; i < rxq->nb_rx_desc - 1; i++) {
+			nmb = rte_pktmbuf_alloc(rxq->mpool);
+			if (!nmb)
+				break;
+			rxq->sw_ring[i] = nmb;
+		}
+		if (i < rxq->nb_rx_desc - 1)
+			return -ENOMEM;
+	}
+	rxq->nb_avail = 0;
+	rxq->next_avail = rxq->nb_rx_desc - 1;
+
+	for (i = 0; i < rxq->nb_rx_desc; i++) {
+		if (rxq->is_gqi_qpl) {
+			rxq->rx_data_ring[i].addr = rte_cpu_to_be_64(i * PAGE_SIZE);
+		} else {
+			if (i == rxq->nb_rx_desc - 1)
+				break;
+			nmb = rxq->sw_ring[i];
+			rxq->rx_data_ring[i].addr = rte_cpu_to_be_64(rte_mbuf_data_iova(nmb));
+		}
+	}
+
+	rte_write32(rte_cpu_to_be_32(rxq->next_avail), rxq->qrx_tail);
+
+	return 0;
+}
+
 static int
 gve_link_update(struct rte_eth_dev *dev, __rte_unused int wait_to_complete)
 {
@@ -68,16 +167,70 @@ gve_link_update(struct rte_eth_dev *dev, __rte_unused int wait_to_complete)
 static int
 gve_dev_start(struct rte_eth_dev *dev)
 {
+	uint16_t num_queues = dev->data->nb_tx_queues;
+	struct gve_priv *priv = dev->data->dev_private;
+	struct gve_tx_queue *txq;
+	struct gve_rx_queue *rxq;
+	uint16_t i;
+	int err;
+
+	priv->txqs = (struct gve_tx_queue **)dev->data->tx_queues;
+	err = gve_adminq_create_tx_queues(priv, num_queues);
+	if (err) {
+		PMD_DRV_LOG(ERR, "failed to create %u tx queues.", num_queues);
+		return err;
+	}
+	for (i = 0; i < num_queues; i++) {
+		txq = priv->txqs[i];
+		txq->qtx_tail =
+		&priv->db_bar2[rte_be_to_cpu_32(txq->qres->db_index)];
+		txq->qtx_head =
+		&priv->cnt_array[rte_be_to_cpu_32(txq->qres->counter_index)];
+
+		rte_write32(rte_cpu_to_be_32(GVE_IRQ_MASK), txq->ntfy_addr);
+	}
+
+	num_queues = dev->data->nb_rx_queues;
+	priv->rxqs = (struct gve_rx_queue **)dev->data->rx_queues;
+	err = gve_adminq_create_rx_queues(priv, num_queues);
+	if (err) {
+		PMD_DRV_LOG(ERR, "failed to create %u rx queues.", num_queues);
+		goto err_tx;
+	}
+	for (i = 0; i < num_queues; i++) {
+		rxq = priv->rxqs[i];
+		rxq->qrx_tail =
+		&priv->db_bar2[rte_be_to_cpu_32(rxq->qres->db_index)];
+
+		rte_write32(rte_cpu_to_be_32(GVE_IRQ_MASK), rxq->ntfy_addr);
+
+		err = gve_refill_pages(rxq);
+		if (err) {
+			PMD_DRV_LOG(ERR, "Failed to refill for RX");
+			goto err_rx;
+		}
+	}
+
 	dev->data->dev_started = 1;
 	gve_link_update(dev, 0);
 
 	return 0;
+
+err_rx:
+	gve_stop_rx_queues(dev);
+err_tx:
+	gve_stop_tx_queues(dev);
+	return err;
 }
 
 static int
 gve_dev_stop(struct rte_eth_dev *dev)
 {
 	dev->data->dev_link.link_status = RTE_ETH_LINK_DOWN;
+
+	gve_stop_tx_queues(dev);
+	gve_stop_rx_queues(dev);
+
 	dev->data->dev_started = 0;
 
 	return 0;
@@ -86,7 +239,11 @@ gve_dev_stop(struct rte_eth_dev *dev)
 static int
 gve_dev_close(struct rte_eth_dev *dev)
 {
+	struct gve_priv *priv = dev->data->dev_private;
+	struct gve_tx_queue *txq;
+	struct gve_rx_queue *rxq;
 	int err = 0;
+	uint16_t i;
 
 	if (dev->data->dev_started) {
 		err = gve_dev_stop(dev);
@@ -94,6 +251,18 @@ gve_dev_close(struct rte_eth_dev *dev)
 			PMD_DRV_LOG(ERR, "Failed to stop dev.");
 	}
 
+	for (i = 0; i < dev->data->nb_tx_queues; i++) {
+		txq = dev->data->tx_queues[i];
+		gve_tx_queue_release(txq);
+	}
+
+	for (i = 0; i < dev->data->nb_rx_queues; i++) {
+		rxq = dev->data->rx_queues[i];
+		gve_rx_queue_release(rxq);
+	}
+
+	gve_free_qpls(priv);
+
 	return err;
 }
 
@@ -130,6 +299,8 @@ static const struct eth_dev_ops gve_eth_dev_ops = {
 	.dev_start            = gve_dev_start,
 	.dev_stop             = gve_dev_stop,
 	.dev_close            = gve_dev_close,
+	.rx_queue_setup       = gve_rx_queue_setup,
+	.tx_queue_setup       = gve_tx_queue_setup,
 	.link_update          = gve_link_update,
 	.mtu_set              = gve_dev_mtu_set,
 };
@@ -267,7 +438,9 @@ gve_setup_device_resources(struct gve_priv *priv)
 static int
 gve_init_priv(struct gve_priv *priv, bool skip_describe_device)
 {
+	uint16_t pages;
 	int num_ntfy;
+	uint32_t i;
 	int err;
 
 	/* Set up the adminq */
@@ -318,10 +491,40 @@ gve_init_priv(struct gve_priv *priv, bool skip_describe_device)
 	PMD_DRV_LOG(INFO, "Max TX queues %d, Max RX queues %d",
 		    priv->max_nb_txq, priv->max_nb_rxq);
 
+	/* In GQI_QPL queue format:
+	 * Allocate queue page lists according to max queue number
+	 * tx qpl id should start from 0 while rx qpl id should start
+	 * from priv->max_nb_txq
+	 */
+	if (priv->queue_format == GVE_GQI_QPL_FORMAT) {
+		priv->qpl = rte_zmalloc("gve_qpl",
+					(priv->max_nb_txq + priv->max_nb_rxq) *
+					sizeof(struct gve_queue_page_list), 0);
+		if (priv->qpl == NULL) {
+			PMD_DRV_LOG(ERR, "Failed to alloc qpl.");
+			err = -ENOMEM;
+			goto free_adminq;
+		}
+
+		for (i = 0; i < priv->max_nb_txq + priv->max_nb_rxq; i++) {
+			if (i < priv->max_nb_txq)
+				pages = priv->tx_pages_per_qpl;
+			else
+				pages = priv->rx_data_slot_cnt;
+			err = gve_alloc_queue_page_list(priv, i, pages);
+			if (err != 0) {
+				PMD_DRV_LOG(ERR, "Failed to alloc qpl %u.", i);
+				goto err_qpl;
+			}
+		}
+	}
+
 setup_device:
 	err = gve_setup_device_resources(priv);
 	if (!err)
 		return 0;
+err_qpl:
+	gve_free_qpls(priv);
 free_adminq:
 	gve_adminq_free(priv);
 	return err;
diff --git a/drivers/net/gve/gve_rx.c b/drivers/net/gve/gve_rx.c
new file mode 100644
index 0000000000..7298b4cc86
--- /dev/null
+++ b/drivers/net/gve/gve_rx.c
@@ -0,0 +1,212 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2022 Intel Corporation
+ */
+
+#include "gve.h"
+#include "gve_adminq.h"
+
+static inline void
+gve_reset_rxq(struct gve_rx_queue *rxq)
+{
+	struct rte_mbuf **sw_ring = rxq->sw_ring;
+	uint32_t size, i;
+
+	if (rxq == NULL) {
+		PMD_DRV_LOG(ERR, "pointer to rxq is NULL");
+		return;
+	}
+
+	size = rxq->nb_rx_desc * sizeof(struct gve_rx_desc);
+	for (i = 0; i < size; i++)
+		((volatile char *)rxq->rx_desc_ring)[i] = 0;
+
+	size = rxq->nb_rx_desc * sizeof(union gve_rx_data_slot);
+	for (i = 0; i < size; i++)
+		((volatile char *)rxq->rx_data_ring)[i] = 0;
+
+	for (i = 0; i < rxq->nb_rx_desc; i++)
+		sw_ring[i] = NULL;
+
+	rxq->rx_tail = 0;
+	rxq->next_avail = 0;
+	rxq->nb_avail = rxq->nb_rx_desc;
+	rxq->expected_seqno = 1;
+}
+
+static inline void
+gve_release_rxq_mbufs(struct gve_rx_queue *rxq)
+{
+	uint16_t i;
+
+	for (i = 0; i < rxq->nb_rx_desc; i++) {
+		if (rxq->sw_ring[i]) {
+			rte_pktmbuf_free_seg(rxq->sw_ring[i]);
+			rxq->sw_ring[i] = NULL;
+		}
+	}
+
+	rxq->nb_avail = rxq->nb_rx_desc;
+}
+
+void
+gve_rx_queue_release(void *rxq)
+{
+	struct gve_rx_queue *q = rxq;
+
+	if (!q)
+		return;
+
+	if (q->is_gqi_qpl) {
+		gve_adminq_unregister_page_list(q->hw, q->qpl->id);
+		q->qpl = NULL;
+	}
+
+	gve_release_rxq_mbufs(q);
+	rte_free(q->sw_ring);
+	rte_memzone_free(q->data_mz);
+	rte_memzone_free(q->mz);
+	rte_memzone_free(q->qres_mz);
+	q->qres = NULL;
+	rte_free(q);
+}
+
+int
+gve_rx_queue_setup(struct rte_eth_dev *dev, uint16_t queue_id,
+		uint16_t nb_desc, unsigned int socket_id,
+		const struct rte_eth_rxconf *conf, struct rte_mempool *pool)
+{
+	struct gve_priv *hw = dev->data->dev_private;
+	const struct rte_memzone *mz;
+	struct gve_rx_queue *rxq;
+	uint16_t free_thresh;
+	int err = 0;
+
+	if (nb_desc != hw->rx_desc_cnt) {
+		PMD_INIT_LOG(WARNING, "gve doesn't support nb_desc config, use hw nb_desc %u.",
+			     hw->rx_desc_cnt);
+	}
+	nb_desc = hw->rx_desc_cnt;
+
+	/* Free memory if needed. */
+	if (dev->data->rx_queues[queue_id]) {
+		gve_rx_queue_release(dev->data->rx_queues[queue_id]);
+		dev->data->rx_queues[queue_id] = NULL;
+	}
+
+	/* Allocate the RX queue data structure. */
+	rxq = rte_zmalloc_socket("gve rxq",
+				 sizeof(struct gve_rx_queue),
+				 RTE_CACHE_LINE_SIZE,
+				 socket_id);
+	if (!rxq) {
+		PMD_INIT_LOG(ERR, "Failed to allocate memory for rx queue structure");
+		err = -ENOMEM;
+		goto err_rxq;
+	}
+
+	free_thresh = conf->rx_free_thresh ? conf->rx_free_thresh : GVE_DEFAULT_RX_FREE_THRESH;
+	if (free_thresh >= nb_desc) {
+		PMD_INIT_LOG(ERR, "rx_free_thresh (%u) must be less than nb_desc (%u).",
+			     free_thresh, nb_desc);
+		err = -EINVAL;
+		goto err_rxq;
+	}
+
+	rxq->nb_rx_desc = nb_desc;
+	rxq->free_thresh = free_thresh;
+	rxq->queue_id = queue_id;
+	rxq->port_id = dev->data->port_id;
+	rxq->ntfy_id = hw->num_ntfy_blks / 2 + queue_id;
+	rxq->is_gqi_qpl = hw->queue_format == GVE_GQI_QPL_FORMAT;
+	rxq->mpool = pool;
+	rxq->hw = hw;
+	rxq->ntfy_addr = &hw->db_bar2[rte_be_to_cpu_32(hw->irq_dbs[rxq->ntfy_id].id)];
+
+	rxq->rx_buf_len = rte_pktmbuf_data_room_size(rxq->mpool) - RTE_PKTMBUF_HEADROOM;
+
+	/* Allocate software ring */
+	rxq->sw_ring = rte_zmalloc_socket("gve rx sw ring", sizeof(struct rte_mbuf *) * nb_desc,
+					  RTE_CACHE_LINE_SIZE, socket_id);
+	if (!rxq->sw_ring) {
+		PMD_INIT_LOG(ERR, "Failed to allocate memory for SW RX ring");
+		err = -ENOMEM;
+		goto err_rxq;
+	}
+
+	mz = rte_eth_dma_zone_reserve(dev, "rx_ring", queue_id,
+				      nb_desc * sizeof(struct gve_rx_desc),
+				      PAGE_SIZE, socket_id);
+	if (mz == NULL) {
+		PMD_INIT_LOG(ERR, "Failed to reserve DMA memory for RX");
+		err = -ENOMEM;
+		goto err_sw_ring;
+	}
+	rxq->rx_desc_ring = (struct gve_rx_desc *)mz->addr;
+	rxq->rx_ring_phys_addr = mz->iova;
+	rxq->mz = mz;
+
+	mz = rte_eth_dma_zone_reserve(dev, "gve rx data ring", queue_id,
+				      sizeof(union gve_rx_data_slot) * nb_desc,
+				      PAGE_SIZE, socket_id);
+	if (mz == NULL) {
+		PMD_INIT_LOG(ERR, "Failed to allocate memory for RX data ring");
+		err = -ENOMEM;
+		goto err_rx_ring;
+	}
+	rxq->rx_data_ring = (union gve_rx_data_slot *)mz->addr;
+	rxq->data_mz = mz;
+	if (rxq->is_gqi_qpl) {
+		rxq->qpl = &hw->qpl[rxq->ntfy_id];
+		err = gve_adminq_register_page_list(hw, rxq->qpl);
+		if (err != 0) {
+			PMD_INIT_LOG(ERR, "Failed to register qpl %u", queue_id);
+			goto err_data_ring;
+		}
+	}
+
+	mz = rte_eth_dma_zone_reserve(dev, "rxq_res", queue_id,
+				      sizeof(struct gve_queue_resources),
+				      PAGE_SIZE, socket_id);
+	if (mz == NULL) {
+		PMD_INIT_LOG(ERR, "Failed to reserve DMA memory for RX resource");
+		err = -ENOMEM;
+		goto err_data_ring;
+	}
+	rxq->qres = (struct gve_queue_resources *)mz->addr;
+	rxq->qres_mz = mz;
+
+	gve_reset_rxq(rxq);
+
+	dev->data->rx_queues[queue_id] = rxq;
+
+	return 0;
+
+err_data_ring:
+	rte_memzone_free(rxq->data_mz);
+err_rx_ring:
+	rte_memzone_free(rxq->mz);
+err_sw_ring:
+	rte_free(rxq->sw_ring);
+err_rxq:
+	rte_free(rxq);
+	return err;
+}
+
+void
+gve_stop_rx_queues(struct rte_eth_dev *dev)
+{
+	struct gve_priv *hw = dev->data->dev_private;
+	struct gve_rx_queue *rxq;
+	uint16_t i;
+	int err;
+
+	err = gve_adminq_destroy_rx_queues(hw, dev->data->nb_rx_queues);
+	if (err != 0)
+		PMD_DRV_LOG(WARNING, "failed to destroy rxqs");
+
+	for (i = 0; i < dev->data->nb_rx_queues; i++) {
+		rxq = dev->data->rx_queues[i];
+		gve_release_rxq_mbufs(rxq);
+		gve_reset_rxq(rxq);
+	}
+}
diff --git a/drivers/net/gve/gve_tx.c b/drivers/net/gve/gve_tx.c
new file mode 100644
index 0000000000..947c9d1627
--- /dev/null
+++ b/drivers/net/gve/gve_tx.c
@@ -0,0 +1,214 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2022 Intel Corporation
+ */
+
+#include "gve.h"
+#include "gve_adminq.h"
+
+static inline void
+gve_reset_txq(struct gve_tx_queue *txq)
+{
+	struct rte_mbuf **sw_ring = txq->sw_ring;
+	uint32_t size, i;
+
+	if (txq == NULL) {
+		PMD_DRV_LOG(ERR, "Pointer to txq is NULL");
+		return;
+	}
+
+	size = txq->nb_tx_desc * sizeof(union gve_tx_desc);
+	for (i = 0; i < size; i++)
+		((volatile char *)txq->tx_desc_ring)[i] = 0;
+
+	for (i = 0; i < txq->nb_tx_desc; i++) {
+		sw_ring[i] = NULL;
+		if (txq->is_gqi_qpl) {
+			txq->iov_ring[i].iov_base = 0;
+			txq->iov_ring[i].iov_len = 0;
+		}
+	}
+
+	txq->tx_tail = 0;
+	txq->nb_free = txq->nb_tx_desc - 1;
+	txq->next_to_clean = 0;
+
+	if (txq->is_gqi_qpl) {
+		txq->fifo_size = PAGE_SIZE * txq->hw->tx_pages_per_qpl;
+		txq->fifo_avail = txq->fifo_size;
+		txq->fifo_head = 0;
+		txq->fifo_base = (uint64_t)(txq->qpl->mz->addr);
+
+		txq->sw_tail = 0;
+		txq->sw_nb_free = txq->nb_tx_desc - 1;
+		txq->sw_ntc = 0;
+	}
+}
+
+static inline void
+gve_release_txq_mbufs(struct gve_tx_queue *txq)
+{
+	uint16_t i;
+
+	for (i = 0; i < txq->nb_tx_desc; i++) {
+		if (txq->sw_ring[i]) {
+			rte_pktmbuf_free_seg(txq->sw_ring[i]);
+			txq->sw_ring[i] = NULL;
+		}
+	}
+}
+
+void
+gve_tx_queue_release(void *txq)
+{
+	struct gve_tx_queue *q = txq;
+
+	if (!q)
+		return;
+
+	if (q->is_gqi_qpl) {
+		gve_adminq_unregister_page_list(q->hw, q->qpl->id);
+		rte_free(q->iov_ring);
+		q->qpl = NULL;
+	}
+
+	gve_release_txq_mbufs(q);
+	rte_free(q->sw_ring);
+	rte_memzone_free(q->mz);
+	rte_memzone_free(q->qres_mz);
+	q->qres = NULL;
+	rte_free(q);
+}
+
+int
+gve_tx_queue_setup(struct rte_eth_dev *dev, uint16_t queue_id, uint16_t nb_desc,
+		   unsigned int socket_id, const struct rte_eth_txconf *conf)
+{
+	struct gve_priv *hw = dev->data->dev_private;
+	const struct rte_memzone *mz;
+	struct gve_tx_queue *txq;
+	uint16_t free_thresh;
+	int err = 0;
+
+	if (nb_desc != hw->tx_desc_cnt) {
+		PMD_INIT_LOG(WARNING, "gve doesn't support nb_desc config, use hw nb_desc %u.",
+			     hw->tx_desc_cnt);
+	}
+	nb_desc = hw->tx_desc_cnt;
+
+	/* Free memory if needed. */
+	if (dev->data->tx_queues[queue_id]) {
+		gve_tx_queue_release(dev->data->tx_queues[queue_id]);
+		dev->data->tx_queues[queue_id] = NULL;
+	}
+
+	/* Allocate the TX queue data structure. */
+	txq = rte_zmalloc_socket("gve txq", sizeof(struct gve_tx_queue),
+				 RTE_CACHE_LINE_SIZE, socket_id);
+	if (!txq) {
+		PMD_INIT_LOG(ERR, "Failed to allocate memory for tx queue structure");
+		err = -ENOMEM;
+		goto err_txq;
+	}
+
+	free_thresh = conf->tx_free_thresh ? conf->tx_free_thresh : GVE_DEFAULT_TX_FREE_THRESH;
+	if (free_thresh >= nb_desc - 3) {
+		PMD_INIT_LOG(ERR, "tx_free_thresh (%u) must be less than nb_desc (%u) minus 3.",
+			     free_thresh, nb_desc);
+		err = -EINVAL;
+		goto err_txq;
+	}
+
+	txq->nb_tx_desc = nb_desc;
+	txq->free_thresh = free_thresh;
+	txq->queue_id = queue_id;
+	txq->port_id = dev->data->port_id;
+	txq->ntfy_id = queue_id;
+	txq->is_gqi_qpl = hw->queue_format == GVE_GQI_QPL_FORMAT;
+	txq->hw = hw;
+	txq->ntfy_addr = &hw->db_bar2[rte_be_to_cpu_32(hw->irq_dbs[txq->ntfy_id].id)];
+
+	/* Allocate software ring */
+	txq->sw_ring = rte_zmalloc_socket("gve tx sw ring",
+					  sizeof(struct rte_mbuf *) * nb_desc,
+					  RTE_CACHE_LINE_SIZE, socket_id);
+	if (!txq->sw_ring) {
+		PMD_INIT_LOG(ERR, "Failed to allocate memory for SW TX ring");
+		err = -ENOMEM;
+		goto err_txq;
+	}
+
+	mz = rte_eth_dma_zone_reserve(dev, "tx_ring", queue_id,
+				      nb_desc * sizeof(union gve_tx_desc),
+				      PAGE_SIZE, socket_id);
+	if (mz == NULL) {
+		PMD_INIT_LOG(ERR, "Failed to reserve DMA memory for TX");
+		err = -ENOMEM;
+		goto err_sw_ring;
+	}
+	txq->tx_desc_ring = (union gve_tx_desc *)mz->addr;
+	txq->tx_ring_phys_addr = mz->iova;
+	txq->mz = mz;
+
+	if (txq->is_gqi_qpl) {
+		txq->iov_ring = rte_zmalloc_socket("gve tx iov ring",
+						   sizeof(struct gve_tx_iovec) * nb_desc,
+						   RTE_CACHE_LINE_SIZE, socket_id);
+		if (!txq->iov_ring) {
+			PMD_INIT_LOG(ERR, "Failed to allocate memory for SW TX ring");
+			err = -ENOMEM;
+			goto err_tx_ring;
+		}
+		txq->qpl = &hw->qpl[queue_id];
+		err = gve_adminq_register_page_list(hw, txq->qpl);
+		if (err != 0) {
+			PMD_INIT_LOG(ERR, "Failed to register qpl %u", queue_id);
+			goto err_iov_ring;
+		}
+	}
+
+	mz = rte_eth_dma_zone_reserve(dev, "txq_res", queue_id, sizeof(struct gve_queue_resources),
+				      PAGE_SIZE, socket_id);
+	if (mz == NULL) {
+		PMD_INIT_LOG(ERR, "Failed to reserve DMA memory for TX resource");
+		err = -ENOMEM;
+		goto err_iov_ring;
+	}
+	txq->qres = (struct gve_queue_resources *)mz->addr;
+	txq->qres_mz = mz;
+
+	gve_reset_txq(txq);
+
+	dev->data->tx_queues[queue_id] = txq;
+
+	return 0;
+
+err_iov_ring:
+	if (txq->is_gqi_qpl)
+		rte_free(txq->iov_ring);
+err_tx_ring:
+	rte_memzone_free(txq->mz);
+err_sw_ring:
+	rte_free(txq->sw_ring);
+err_txq:
+	rte_free(txq);
+	return err;
+}
+
+void
+gve_stop_tx_queues(struct rte_eth_dev *dev)
+{
+	struct gve_priv *hw = dev->data->dev_private;
+	struct gve_tx_queue *txq;
+	uint16_t i;
+	int err;
+
+	err = gve_adminq_destroy_tx_queues(hw, dev->data->nb_tx_queues);
+	if (err != 0)
+		PMD_DRV_LOG(WARNING, "failed to destroy txqs");
+
+	for (i = 0; i < dev->data->nb_tx_queues; i++) {
+		txq = dev->data->tx_queues[i];
+		gve_release_txq_mbufs(txq);
+		gve_reset_txq(txq);
+	}
+}
diff --git a/drivers/net/gve/meson.build b/drivers/net/gve/meson.build
index 9a22cc9abe..c4fd013ef2 100644
--- a/drivers/net/gve/meson.build
+++ b/drivers/net/gve/meson.build
@@ -9,5 +9,7 @@ endif
 
 sources = files(
         'gve_adminq.c',
+        'gve_rx.c',
+        'gve_tx.c',
         'gve_ethdev.c',
 )
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 192+ messages in thread

* [PATCH v2 07/10] net/gve: add Rx/Tx support
  2022-08-29  8:41   ` [PATCH v2 00/10] introduce GVE PMD Junfeng Guo
                       ` (5 preceding siblings ...)
  2022-08-29  8:41     ` [PATCH v2 06/10] net/gve: add queue operations Junfeng Guo
@ 2022-08-29  8:41     ` Junfeng Guo
  2022-08-29  8:41     ` [PATCH v2 08/10] net/gve: add support to get dev info and configure dev Junfeng Guo
                       ` (3 subsequent siblings)
  10 siblings, 0 replies; 192+ messages in thread
From: Junfeng Guo @ 2022-08-29  8:41 UTC (permalink / raw)
  To: qi.z.zhang, jingjing.wu
  Cc: ferruh.yigit, dev, xiaoyun.li, awogbemila, bruce.richardson, junfeng.guo

Add Rx/Tx support for the GQI_QPL and GQI_RDA queue formats.
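
An illustrative polling loop (not part of the patch), assuming port/queue 0
are already set up and started; rte_eth_rx_burst() and rte_eth_tx_burst()
dispatch to gve_rx_burst()/gve_tx_burst() for the GQI queue formats:

  #include <rte_ethdev.h>
  #include <rte_mbuf.h>

  #define BURST_SIZE 32

  static void
  poll_once(uint16_t port_id)
  {
      struct rte_mbuf *pkts[BURST_SIZE];
      uint16_t nb_rx, nb_tx;

      nb_rx = rte_eth_rx_burst(port_id, 0, pkts, BURST_SIZE);
      if (nb_rx == 0)
          return;

      nb_tx = rte_eth_tx_burst(port_id, 0, pkts, nb_rx);

      /* free whatever the Tx ring could not accept */
      while (nb_tx < nb_rx)
          rte_pktmbuf_free(pkts[nb_tx++]);
  }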

Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com>
Signed-off-by: Junfeng Guo <junfeng.guo@intel.com>
---
 drivers/net/gve/gve.h        |  17 ++
 drivers/net/gve/gve_ethdev.c |   5 +
 drivers/net/gve/gve_rx.c     | 143 +++++++++++
 drivers/net/gve/gve_tx.c     | 452 +++++++++++++++++++++++++++++++++++
 4 files changed, 617 insertions(+)

diff --git a/drivers/net/gve/gve.h b/drivers/net/gve/gve.h
index a53a852a5f..7f4d0e37f3 100644
--- a/drivers/net/gve/gve.h
+++ b/drivers/net/gve/gve.h
@@ -25,6 +25,7 @@
 
 #define GVE_DEFAULT_RX_FREE_THRESH  512
 #define GVE_DEFAULT_TX_FREE_THRESH  256
+#define GVE_TX_MAX_FREE_SZ          512
 
 /* PTYPEs are always 10 bits. */
 #define GVE_NUM_PTYPES	1024
@@ -45,6 +46,18 @@ union gve_tx_desc {
 	struct gve_tx_seg_desc seg; /* subsequent descs for a packet */
 };
 
+/* Offload features */
+union gve_tx_offload {
+	uint64_t data;
+	struct {
+		uint64_t l2_len:7; /* L2 (MAC) Header Length. */
+		uint64_t l3_len:9; /* L3 (IP) Header Length. */
+		uint64_t l4_len:8; /* L4 Header Length. */
+		uint64_t tso_segsz:16; /* TCP TSO segment size */
+		/* uint64_t unused : 24; */
+	};
+};
+
 struct gve_tx_iovec {
 	uint32_t iov_base; /* offset in fifo */
 	uint32_t iov_len;
@@ -298,4 +311,8 @@ void gve_stop_tx_queues(struct rte_eth_dev *dev);
 
 void gve_stop_rx_queues(struct rte_eth_dev *dev);
 
+uint16_t gve_rx_burst(void *rxq, struct rte_mbuf **rx_pkts, uint16_t nb_pkts);
+
+uint16_t gve_tx_burst(void *txq, struct rte_mbuf **tx_pkts, uint16_t nb_pkts);
+
 #endif /* _GVE_H_ */
diff --git a/drivers/net/gve/gve_ethdev.c b/drivers/net/gve/gve_ethdev.c
index 5201398664..5ebe2c30ea 100644
--- a/drivers/net/gve/gve_ethdev.c
+++ b/drivers/net/gve/gve_ethdev.c
@@ -583,6 +583,11 @@ gve_dev_init(struct rte_eth_dev *eth_dev)
 	if (err)
 		return err;
 
+	if (gve_is_gqi(priv)) {
+		eth_dev->rx_pkt_burst = gve_rx_burst;
+		eth_dev->tx_pkt_burst = gve_tx_burst;
+	}
+
 	eth_dev->data->mac_addrs = rte_zmalloc("gve_mac", sizeof(struct rte_ether_addr), 0);
 	if (!eth_dev->data->mac_addrs) {
 		PMD_INIT_LOG(ERR, "Failed to allocate memory to store mac address");
diff --git a/drivers/net/gve/gve_rx.c b/drivers/net/gve/gve_rx.c
index 7298b4cc86..8f560ae592 100644
--- a/drivers/net/gve/gve_rx.c
+++ b/drivers/net/gve/gve_rx.c
@@ -5,6 +5,149 @@
 #include "gve.h"
 #include "gve_adminq.h"
 
+static inline void
+gve_rx_refill(struct gve_rx_queue *rxq)
+{
+	uint16_t mask = rxq->nb_rx_desc - 1;
+	uint16_t idx = rxq->next_avail & mask;
+	uint32_t next_avail = rxq->next_avail;
+	uint16_t nb_alloc, i;
+	struct rte_mbuf *nmb;
+	int diag;
+
+	/* wrap around */
+	nb_alloc = rxq->nb_rx_desc - idx;
+	if (nb_alloc <= rxq->nb_avail) {
+		diag = rte_pktmbuf_alloc_bulk(rxq->mpool, &rxq->sw_ring[idx], nb_alloc);
+		if (diag < 0) {
+			for (i = 0; i < nb_alloc; i++) {
+				nmb = rte_pktmbuf_alloc(rxq->mpool);
+				if (!nmb)
+					break;
+				rxq->sw_ring[idx + i] = nmb;
+			}
+			if (i != nb_alloc)
+				nb_alloc = i;
+		}
+		rxq->nb_avail -= nb_alloc;
+		next_avail += nb_alloc;
+
+		/* queue page list mode doesn't need real refill. */
+		if (rxq->is_gqi_qpl) {
+			idx += nb_alloc;
+		} else {
+			for (i = 0; i < nb_alloc; i++) {
+				nmb = rxq->sw_ring[idx];
+				rxq->rx_data_ring[idx].addr =
+					rte_cpu_to_be_64(rte_mbuf_data_iova(nmb));
+				idx++;
+			}
+		}
+		if (idx == rxq->nb_rx_desc)
+			idx = 0;
+	}
+
+	if (rxq->nb_avail > 0) {
+		nb_alloc = rxq->nb_avail;
+		if (rxq->nb_rx_desc < idx + rxq->nb_avail)
+			nb_alloc = rxq->nb_rx_desc - idx;
+		diag = rte_pktmbuf_alloc_bulk(rxq->mpool, &rxq->sw_ring[idx], nb_alloc);
+		if (diag < 0) {
+			for (i = 0; i < nb_alloc; i++) {
+				nmb = rte_pktmbuf_alloc(rxq->mpool);
+				if (!nmb)
+					break;
+				rxq->sw_ring[idx + i] = nmb;
+			}
+			nb_alloc = i;
+		}
+		rxq->nb_avail -= nb_alloc;
+		next_avail += nb_alloc;
+
+		if (!rxq->is_gqi_qpl) {
+			for (i = 0; i < nb_alloc; i++) {
+				nmb = rxq->sw_ring[idx];
+				rxq->rx_data_ring[idx].addr =
+					rte_cpu_to_be_64(rte_mbuf_data_iova(nmb));
+				idx++;
+			}
+		}
+	}
+
+	if (next_avail != rxq->next_avail) {
+		rte_write32(rte_cpu_to_be_32(next_avail), rxq->qrx_tail);
+		rxq->next_avail = next_avail;
+	}
+}
+
+uint16_t
+gve_rx_burst(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t nb_pkts)
+{
+	volatile struct gve_rx_desc *rxr, *rxd;
+	struct gve_rx_queue *rxq = rx_queue;
+	uint16_t rx_id = rxq->rx_tail;
+	struct rte_mbuf *rxe;
+	uint16_t nb_rx, len;
+	uint64_t addr;
+
+	rxr = rxq->rx_desc_ring;
+
+	for (nb_rx = 0; nb_rx < nb_pkts; nb_rx++) {
+		rxd = &rxr[rx_id];
+		if (GVE_SEQNO(rxd->flags_seq) != rxq->expected_seqno)
+			break;
+
+		if (rxd->flags_seq & GVE_RXF_ERR)
+			continue;
+
+		len = rte_be_to_cpu_16(rxd->len) - GVE_RX_PAD;
+		rxe = rxq->sw_ring[rx_id];
+		rxe->data_off = RTE_PKTMBUF_HEADROOM;
+		if (rxq->is_gqi_qpl) {
+			addr = (uint64_t)(rxq->qpl->mz->addr) + rx_id * PAGE_SIZE + GVE_RX_PAD;
+			rte_memcpy((void *)((uint64_t)rxe->buf_addr + rxe->data_off),
+				   (void *)addr, len);
+		}
+		rxe->nb_segs = 1;
+		rxe->next = NULL;
+		rxe->pkt_len = len;
+		rxe->data_len = len;
+		rxe->port = rxq->port_id;
+		rxe->packet_type = 0;
+		rxe->ol_flags = 0;
+
+		if (rxd->flags_seq & GVE_RXF_TCP)
+			rxe->packet_type |= RTE_PTYPE_L4_TCP;
+		if (rxd->flags_seq & GVE_RXF_UDP)
+			rxe->packet_type |= RTE_PTYPE_L4_UDP;
+		if (rxd->flags_seq & GVE_RXF_IPV4)
+			rxe->packet_type |= RTE_PTYPE_L3_IPV4;
+		if (rxd->flags_seq & GVE_RXF_IPV6)
+			rxe->packet_type |= RTE_PTYPE_L3_IPV6;
+
+		if (gve_needs_rss(rxd->flags_seq)) {
+			rxe->ol_flags |= RTE_MBUF_F_RX_RSS_HASH;
+			rxe->hash.rss = rte_be_to_cpu_32(rxd->rss_hash);
+		}
+
+		rxq->expected_seqno = gve_next_seqno(rxq->expected_seqno);
+
+		rx_id++;
+		if (rx_id == rxq->nb_rx_desc)
+			rx_id = 0;
+
+		rx_pkts[nb_rx] = rxe;
+	}
+
+	rxq->nb_avail += nb_rx;
+	rxq->rx_tail = rx_id;
+
+	if (rxq->nb_avail > rxq->free_thresh)
+		gve_rx_refill(rxq);
+
+	return nb_rx;
+}
+
 static inline void
 gve_reset_rxq(struct gve_rx_queue *rxq)
 {
diff --git a/drivers/net/gve/gve_tx.c b/drivers/net/gve/gve_tx.c
index 947c9d1627..2dc3411672 100644
--- a/drivers/net/gve/gve_tx.c
+++ b/drivers/net/gve/gve_tx.c
@@ -5,6 +5,458 @@
 #include "gve.h"
 #include "gve_adminq.h"
 
+static inline void
+gve_free_bulk_mbuf(struct rte_mbuf **txep, int num)
+{
+	struct rte_mbuf *m, *free[GVE_TX_MAX_FREE_SZ];
+	int nb_free = 0;
+	int i, s;
+
+	if (unlikely(num == 0))
+		return;
+
+	/* Find the 1st mbuf which needs to be free */
+	for (s = 0; s < num; s++) {
+		if (txep[s] != NULL) {
+			m = rte_pktmbuf_prefree_seg(txep[s]);
+			if (m != NULL)
+				break;
+		}
+	}
+
+	if (s == num)
+		return;
+
+	free[0] = m;
+	nb_free = 1;
+	for (i = s + 1; i < num; i++) {
+		if (likely(txep[i] != NULL)) {
+			m = rte_pktmbuf_prefree_seg(txep[i]);
+			if (likely(m != NULL)) {
+				if (likely(m->pool == free[0]->pool)) {
+					free[nb_free++] = m;
+				} else {
+					rte_mempool_put_bulk(free[0]->pool, (void *)free, nb_free);
+					free[0] = m;
+					nb_free = 1;
+				}
+			}
+			txep[i] = NULL;
+		}
+	}
+	rte_mempool_put_bulk(free[0]->pool, (void **)free, nb_free);
+}
+
+static inline void
+gve_tx_clean(struct gve_tx_queue *txq)
+{
+	uint16_t mask = txq->nb_tx_desc - 1;
+	uint32_t start = txq->next_to_clean & mask;
+	uint32_t ntc, nb_clean, i;
+	struct gve_tx_iovec *iov;
+
+	ntc = rte_be_to_cpu_32(rte_read32(txq->qtx_head));
+	ntc = ntc & mask;
+
+	if (ntc == start)
+		return;
+
+	/* if wrap around, free twice. */
+	if (ntc < start) {
+		nb_clean = txq->nb_tx_desc - start;
+		if (nb_clean > GVE_TX_MAX_FREE_SZ)
+			nb_clean = GVE_TX_MAX_FREE_SZ;
+		if (txq->is_gqi_qpl) {
+			for (i = start; i < start + nb_clean; i++) {
+				iov = &txq->iov_ring[i];
+				txq->fifo_avail += iov->iov_len;
+				iov->iov_base = 0;
+				iov->iov_len = 0;
+			}
+		} else {
+			gve_free_bulk_mbuf(&txq->sw_ring[start], nb_clean);
+		}
+		txq->nb_free += nb_clean;
+		start += nb_clean;
+		if (start == txq->nb_tx_desc)
+			start = 0;
+		txq->next_to_clean += nb_clean;
+	}
+
+	if (ntc > start) {
+		nb_clean = ntc - start;
+		if (nb_clean > GVE_TX_MAX_FREE_SZ)
+			nb_clean = GVE_TX_MAX_FREE_SZ;
+		if (txq->is_gqi_qpl) {
+			for (i = start; i < start + nb_clean; i++) {
+				iov = &txq->iov_ring[i];
+				txq->fifo_avail += iov->iov_len;
+				iov->iov_base = 0;
+				iov->iov_len = 0;
+			}
+		} else {
+			gve_free_bulk_mbuf(&txq->sw_ring[start], nb_clean);
+		}
+		txq->nb_free += nb_clean;
+		txq->next_to_clean += nb_clean;
+	}
+}
+
+static inline void
+gve_tx_clean_swr_qpl(struct gve_tx_queue *txq)
+{
+	uint32_t start = txq->sw_ntc;
+	uint32_t ntc, nb_clean;
+
+	ntc = txq->sw_tail;
+
+	if (ntc == start)
+		return;
+
+	/* if wrap around, free twice. */
+	if (ntc < start) {
+		nb_clean = txq->nb_tx_desc - start;
+		if (nb_clean > GVE_TX_MAX_FREE_SZ)
+			nb_clean = GVE_TX_MAX_FREE_SZ;
+		gve_free_bulk_mbuf(&txq->sw_ring[start], nb_clean);
+
+		txq->sw_nb_free += nb_clean;
+		start += nb_clean;
+		if (start == txq->nb_tx_desc)
+			start = 0;
+		txq->sw_ntc = start;
+	}
+
+	if (ntc > start) {
+		nb_clean = ntc - start;
+		if (nb_clean > GVE_TX_MAX_FREE_SZ)
+			nb_clean = GVE_TX_MAX_FREE_SZ;
+		gve_free_bulk_mbuf(&txq->sw_ring[start], nb_clean);
+		txq->sw_nb_free += nb_clean;
+		start += nb_clean;
+		txq->sw_ntc = start;
+	}
+}
+
+static inline void
+gve_tx_fill_pkt_desc(volatile union gve_tx_desc *desc, struct rte_mbuf *mbuf,
+		     uint8_t desc_cnt, uint16_t len, uint64_t addr)
+{
+	uint64_t csum_l4 = mbuf->ol_flags & RTE_MBUF_F_TX_L4_MASK;
+	uint8_t l4_csum_offset = 0;
+	uint8_t l4_hdr_offset = 0;
+
+	if (mbuf->ol_flags & RTE_MBUF_F_TX_TCP_SEG)
+		csum_l4 |= RTE_MBUF_F_TX_TCP_CKSUM;
+
+	switch (csum_l4) {
+	case RTE_MBUF_F_TX_TCP_CKSUM:
+		l4_csum_offset = offsetof(struct rte_tcp_hdr, cksum);
+		l4_hdr_offset = mbuf->l2_len + mbuf->l3_len;
+		break;
+	case RTE_MBUF_F_TX_UDP_CKSUM:
+		l4_csum_offset = offsetof(struct rte_udp_hdr, dgram_cksum);
+		l4_hdr_offset = mbuf->l2_len + mbuf->l3_len;
+		break;
+	case RTE_MBUF_F_TX_SCTP_CKSUM:
+		l4_csum_offset = offsetof(struct rte_sctp_hdr, cksum);
+		l4_hdr_offset = mbuf->l2_len + mbuf->l3_len;
+		break;
+	}
+
+	if (mbuf->ol_flags & RTE_MBUF_F_TX_TCP_SEG) {
+		desc->pkt.type_flags = GVE_TXD_TSO | GVE_TXF_L4CSUM;
+		desc->pkt.l4_csum_offset = l4_csum_offset >> 1;
+		desc->pkt.l4_hdr_offset = l4_hdr_offset >> 1;
+	} else if (mbuf->ol_flags & RTE_MBUF_F_TX_L4_MASK) {
+		desc->pkt.type_flags = GVE_TXD_STD | GVE_TXF_L4CSUM;
+		desc->pkt.l4_csum_offset = l4_csum_offset >> 1;
+		desc->pkt.l4_hdr_offset = l4_hdr_offset >> 1;
+	} else {
+		desc->pkt.type_flags = GVE_TXD_STD;
+		desc->pkt.l4_csum_offset = 0;
+		desc->pkt.l4_hdr_offset = 0;
+	}
+	desc->pkt.desc_cnt = desc_cnt;
+	desc->pkt.len = rte_cpu_to_be_16(mbuf->pkt_len);
+	desc->pkt.seg_len = rte_cpu_to_be_16(len);
+	desc->pkt.seg_addr = rte_cpu_to_be_64(addr);
+}
+
+static inline void
+gve_tx_fill_seg_desc(volatile union gve_tx_desc *desc, uint64_t ol_flags,
+		      union gve_tx_offload tx_offload,
+		      uint16_t len, uint64_t addr)
+{
+	desc->seg.type_flags = GVE_TXD_SEG;
+	if (ol_flags & RTE_MBUF_F_TX_TCP_SEG) {
+		if (ol_flags & RTE_MBUF_F_TX_IPV6)
+			desc->seg.type_flags |= GVE_TXSF_IPV6;
+		desc->seg.l3_offset = tx_offload.l2_len >> 1;
+		desc->seg.mss = rte_cpu_to_be_16(tx_offload.tso_segsz);
+	}
+	desc->seg.seg_len = rte_cpu_to_be_16(len);
+	desc->seg.seg_addr = rte_cpu_to_be_64(addr);
+}
+
+static inline bool
+is_fifo_avail(struct gve_tx_queue *txq, uint16_t len)
+{
+	if (txq->fifo_avail < len)
+		return false;
+	/* Don't split segment. */
+	if (txq->fifo_head + len > txq->fifo_size &&
+	    txq->fifo_size - txq->fifo_head + len > txq->fifo_avail)
+		return false;
+	return true;
+}
+static inline uint64_t
+gve_tx_alloc_from_fifo(struct gve_tx_queue *txq, uint16_t tx_id, uint16_t len)
+{
+	uint32_t head = txq->fifo_head;
+	uint32_t size = txq->fifo_size;
+	struct gve_tx_iovec *iov;
+	uint32_t aligned_head;
+	uint32_t iov_len = 0;
+	uint64_t fifo_addr;
+
+	iov = &txq->iov_ring[tx_id];
+
+	/* Don't split segment */
+	if (head + len > size) {
+		iov_len += (size - head);
+		head = 0;
+	}
+
+	fifo_addr = head;
+	iov_len += len;
+	iov->iov_base = head;
+
+	/* Re-align to a cacheline for next head */
+	head += len;
+	aligned_head = RTE_ALIGN(head, RTE_CACHE_LINE_SIZE);
+	iov_len += (aligned_head - head);
+	iov->iov_len = iov_len;
+
+	if (aligned_head == txq->fifo_size)
+		aligned_head = 0;
+	txq->fifo_head = aligned_head;
+	txq->fifo_avail -= iov_len;
+
+	return fifo_addr;
+}
+
+static inline uint16_t
+gve_tx_burst_qpl(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
+{
+	union gve_tx_offload tx_offload = {0};
+	volatile union gve_tx_desc *txr, *txd;
+	struct gve_tx_queue *txq = tx_queue;
+	struct rte_mbuf **sw_ring = txq->sw_ring;
+	uint16_t mask = txq->nb_tx_desc - 1;
+	uint16_t tx_id = txq->tx_tail & mask;
+	uint64_t ol_flags, addr, fifo_addr;
+	uint32_t tx_tail = txq->tx_tail;
+	struct rte_mbuf *tx_pkt, *first;
+	uint16_t sw_id = txq->sw_tail;
+	uint16_t nb_used, i;
+	uint16_t nb_tx = 0;
+	uint32_t hlen;
+
+	txr = txq->tx_desc_ring;
+
+	if (txq->nb_free < txq->free_thresh || txq->fifo_avail == 0)
+		gve_tx_clean(txq);
+
+	if (txq->sw_nb_free < txq->free_thresh)
+		gve_tx_clean_swr_qpl(txq);
+
+	for (nb_tx = 0; nb_tx < nb_pkts; nb_tx++) {
+		tx_pkt = *tx_pkts++;
+		ol_flags = tx_pkt->ol_flags;
+
+		if (txq->sw_nb_free < tx_pkt->nb_segs) {
+			gve_tx_clean_swr_qpl(txq);
+			if (txq->sw_nb_free < tx_pkt->nb_segs)
+				goto end_of_tx;
+		}
+
+		/* Even for multi-segs, use 1 qpl buf for data */
+		nb_used = 1;
+		if (ol_flags & RTE_MBUF_F_TX_TCP_SEG)
+			nb_used++;
+
+		if (txq->nb_free < nb_used)
+			goto end_of_tx;
+
+		tx_offload.l2_len = tx_pkt->l2_len;
+		tx_offload.l3_len = tx_pkt->l3_len;
+		tx_offload.l4_len = tx_pkt->l4_len;
+		tx_offload.tso_segsz = tx_pkt->tso_segsz;
+
+		first = tx_pkt;
+		txd = &txr[tx_id];
+		hlen = ol_flags & RTE_MBUF_F_TX_TCP_SEG ?
+			(uint32_t)(tx_offload.l2_len + tx_offload.l3_len + tx_offload.l4_len) :
+			tx_pkt->pkt_len;
+
+		sw_ring[sw_id] = tx_pkt;
+		if (!is_fifo_avail(txq, hlen)) {
+			gve_tx_clean(txq);
+			if (!is_fifo_avail(txq, hlen))
+				goto end_of_tx;
+		}
+		addr = (uint64_t)(tx_pkt->buf_addr) + tx_pkt->data_off;
+		fifo_addr = gve_tx_alloc_from_fifo(txq, tx_id, hlen);
+
+		/* For TSO, check if there's enough fifo space for data first */
+		if (ol_flags & RTE_MBUF_F_TX_TCP_SEG) {
+			if (!is_fifo_avail(txq, tx_pkt->pkt_len - hlen)) {
+				gve_tx_clean(txq);
+				if (!is_fifo_avail(txq, tx_pkt->pkt_len - hlen))
+					goto end_of_tx;
+			}
+		}
+		if (tx_pkt->nb_segs == 1 || ol_flags & RTE_MBUF_F_TX_TCP_SEG)
+			rte_memcpy((void *)(fifo_addr + txq->fifo_base), (void *)addr, hlen);
+		else
+			rte_pktmbuf_read(tx_pkt, 0, hlen, (void *)(fifo_addr + txq->fifo_base));
+		gve_tx_fill_pkt_desc(txd, tx_pkt, nb_used, hlen, fifo_addr);
+
+		if (ol_flags & RTE_MBUF_F_TX_TCP_SEG) {
+			tx_id = (tx_id + 1) & mask;
+			txd = &txr[tx_id];
+			addr = (uint64_t)(tx_pkt->buf_addr) + tx_pkt->data_off + hlen;
+			fifo_addr = gve_tx_alloc_from_fifo(txq, tx_id, tx_pkt->pkt_len - hlen);
+			if (tx_pkt->nb_segs == 1)
+				rte_memcpy((void *)(fifo_addr + txq->fifo_base), (void *)addr,
+					   tx_pkt->pkt_len - hlen);
+			else
+				rte_pktmbuf_read(tx_pkt, hlen, tx_pkt->pkt_len - hlen,
+						 (void *)(fifo_addr + txq->fifo_base));
+
+			gve_tx_fill_seg_desc(txd, ol_flags, tx_offload,
+					     tx_pkt->pkt_len - hlen, fifo_addr);
+		}
+
+		/* record mbuf in sw_ring for free */
+		for (i = 1; i < first->nb_segs; i++) {
+			sw_id = (sw_id + 1) & mask;
+			tx_pkt = tx_pkt->next;
+			sw_ring[sw_id] = tx_pkt;
+		}
+
+		sw_id = (sw_id + 1) & mask;
+		tx_id = (tx_id + 1) & mask;
+
+		txq->nb_free -= nb_used;
+		txq->sw_nb_free -= first->nb_segs;
+		tx_tail += nb_used;
+	}
+
+end_of_tx:
+	if (nb_tx) {
+		rte_write32(rte_cpu_to_be_32(tx_tail), txq->qtx_tail);
+		txq->tx_tail = tx_tail;
+		txq->sw_tail = sw_id;
+	}
+
+	return nb_tx;
+}
+
+static inline uint16_t
+gve_tx_burst_ra(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
+{
+	union gve_tx_offload tx_offload = {0};
+	volatile union gve_tx_desc *txr, *txd;
+	struct gve_tx_queue *txq = tx_queue;
+	struct rte_mbuf **sw_ring = txq->sw_ring;
+	uint16_t mask = txq->nb_tx_desc - 1;
+	uint16_t tx_id = txq->tx_tail & mask;
+	uint32_t tx_tail = txq->tx_tail;
+	struct rte_mbuf *tx_pkt, *first;
+	uint16_t nb_used, hlen, i;
+	uint64_t ol_flags, addr;
+	uint16_t nb_tx = 0;
+
+	txr = txq->tx_desc_ring;
+
+	if (txq->nb_free < txq->free_thresh)
+		gve_tx_clean(txq);
+
+	for (nb_tx = 0; nb_tx < nb_pkts; nb_tx++) {
+		tx_pkt = *tx_pkts++;
+		ol_flags = tx_pkt->ol_flags;
+
+		nb_used = tx_pkt->nb_segs;
+		if (ol_flags & RTE_MBUF_F_TX_TCP_SEG)
+			nb_used++;
+
+		if (txq->nb_free < nb_used)
+			goto end_of_tx;
+
+		tx_offload.l2_len = tx_pkt->l2_len;
+		tx_offload.l3_len = tx_pkt->l3_len;
+		tx_offload.l4_len = tx_pkt->l4_len;
+		tx_offload.tso_segsz = tx_pkt->tso_segsz;
+
+		first = tx_pkt;
+		txd = &txr[tx_id];
+
+		hlen = ol_flags & RTE_MBUF_F_TX_TCP_SEG ?
+			(uint32_t)(tx_offload.l2_len + tx_offload.l3_len + tx_offload.l4_len) :
+			tx_pkt->pkt_len;
+		/*
+		 * For TSO, the driver needs to fill 2 descriptors for 1 mbuf,
+		 * so only put this mbuf into the 1st tx entry of the sw ring.
+		 */
+		sw_ring[tx_id] = tx_pkt;
+		addr = rte_mbuf_data_iova(tx_pkt);
+		gve_tx_fill_pkt_desc(txd, tx_pkt, nb_used, hlen, addr);
+
+		if (ol_flags & RTE_MBUF_F_TX_TCP_SEG) {
+			tx_id = (tx_id + 1) & mask;
+			txd = &txr[tx_id];
+			addr = rte_mbuf_data_iova(tx_pkt) + hlen;
+			gve_tx_fill_seg_desc(txd, ol_flags, tx_offload,
+					     tx_pkt->data_len - hlen, addr);
+		}
+
+		for (i = 1; i < first->nb_segs; i++) {
+			tx_id = (tx_id + 1) & mask;
+			txd = &txr[tx_id];
+			tx_pkt = tx_pkt->next;
+			sw_ring[tx_id] = tx_pkt;
+			addr = rte_mbuf_data_iova(tx_pkt);
+			gve_tx_fill_seg_desc(txd, ol_flags, tx_offload,
+					     tx_pkt->data_len, addr);
+		}
+		tx_id = (tx_id + 1) & mask;
+
+		txq->nb_free -= nb_used;
+		tx_tail += nb_used;
+	}
+
+end_of_tx:
+	if (nb_tx) {
+		rte_write32(rte_cpu_to_be_32(tx_tail), txq->qtx_tail);
+		txq->tx_tail = tx_tail;
+	}
+
+	return nb_tx;
+}
+
+uint16_t
+gve_tx_burst(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
+{
+	struct gve_tx_queue *txq = tx_queue;
+
+	if (txq->is_gqi_qpl)
+		return gve_tx_burst_qpl(tx_queue, tx_pkts, nb_pkts);
+
+	return gve_tx_burst_ra(tx_queue, tx_pkts, nb_pkts);
+}
+
 static inline void
 gve_reset_txq(struct gve_tx_queue *txq)
 {
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 192+ messages in thread

* [PATCH v2 08/10] net/gve: add support to get dev info and configure dev
  2022-08-29  8:41   ` [PATCH v2 00/10] introduce GVE PMD Junfeng Guo
                       ` (6 preceding siblings ...)
  2022-08-29  8:41     ` [PATCH v2 07/10] net/gve: add Rx/Tx support Junfeng Guo
@ 2022-08-29  8:41     ` Junfeng Guo
  2022-09-01 17:23       ` Ferruh Yigit
  2022-08-29  8:41     ` [PATCH v2 09/10] net/gve: add stats support Junfeng Guo
                       ` (2 subsequent siblings)
  10 siblings, 1 reply; 192+ messages in thread
From: Junfeng Guo @ 2022-08-29  8:41 UTC (permalink / raw)
  To: qi.z.zhang, jingjing.wu
  Cc: ferruh.yigit, dev, xiaoyun.li, awogbemila, bruce.richardson, junfeng.guo

Add dev_ops dev_infos_get.
Complete dev_configure with RX offloads configuration.
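
A small read-back sketch (not part of the patch) showing the capabilities
this patch starts reporting; the helper name and the choice of fields are
arbitrary:

  #include <stdio.h>
  #include <inttypes.h>
  #include <rte_ethdev.h>

  static void
  print_gve_caps(uint16_t port_id)
  {
      struct rte_eth_dev_info info;

      if (rte_eth_dev_info_get(port_id, &info) != 0)  /* -> gve_dev_info_get() */
          return;

      printf("max rxq %u, max txq %u, ring size %u, tx offloads 0x%" PRIx64 "\n",
             info.max_rx_queues, info.max_tx_queues,
             info.default_rxportconf.ring_size, info.tx_offload_capa);
  }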

Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com>
Signed-off-by: Junfeng Guo <junfeng.guo@intel.com>
---
 drivers/net/gve/gve.h        |  3 ++
 drivers/net/gve/gve_ethdev.c | 61 ++++++++++++++++++++++++++++++++++++
 2 files changed, 64 insertions(+)

diff --git a/drivers/net/gve/gve.h b/drivers/net/gve/gve.h
index 7f4d0e37f3..004e0a75ca 100644
--- a/drivers/net/gve/gve.h
+++ b/drivers/net/gve/gve.h
@@ -27,6 +27,9 @@
 #define GVE_DEFAULT_TX_FREE_THRESH  256
 #define GVE_TX_MAX_FREE_SZ          512
 
+#define GVE_MIN_BUF_SIZE	    1024
+#define GVE_MAX_RX_PKTLEN	    65535
+
 /* PTYPEs are always 10 bits. */
 #define GVE_NUM_PTYPES	1024
 
diff --git a/drivers/net/gve/gve_ethdev.c b/drivers/net/gve/gve_ethdev.c
index 5ebe2c30ea..6bc7bf4519 100644
--- a/drivers/net/gve/gve_ethdev.c
+++ b/drivers/net/gve/gve_ethdev.c
@@ -96,6 +96,14 @@ gve_free_qpls(struct gve_priv *priv)
 static int
 gve_dev_configure(__rte_unused struct rte_eth_dev *dev)
 {
+	struct gve_priv *priv = dev->data->dev_private;
+
+	if (dev->data->dev_conf.rxmode.mq_mode & RTE_ETH_MQ_RX_RSS_FLAG)
+		dev->data->dev_conf.rxmode.offloads |= RTE_ETH_RX_OFFLOAD_RSS_HASH;
+
+	if (dev->data->dev_conf.rxmode.offloads & RTE_ETH_RX_OFFLOAD_TCP_LRO)
+		priv->enable_lsc = 1;
+
 	return 0;
 }
 
@@ -266,6 +274,58 @@ gve_dev_close(struct rte_eth_dev *dev)
 	return err;
 }
 
+static int
+gve_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
+{
+	struct gve_priv *priv = dev->data->dev_private;
+
+	dev_info->device = dev->device;
+	dev_info->max_mac_addrs = 1;
+	dev_info->max_rx_queues = priv->max_nb_rxq;
+	dev_info->max_tx_queues = priv->max_nb_txq;
+	dev_info->min_rx_bufsize = GVE_MIN_BUF_SIZE;
+	dev_info->max_rx_pktlen = GVE_MAX_RX_PKTLEN;
+
+	dev_info->rx_offload_capa = 0;
+	dev_info->tx_offload_capa =
+		RTE_ETH_TX_OFFLOAD_MULTI_SEGS |
+		RTE_ETH_TX_OFFLOAD_IPV4_CKSUM |
+		RTE_ETH_TX_OFFLOAD_UDP_CKSUM |
+		RTE_ETH_TX_OFFLOAD_TCP_CKSUM |
+		RTE_ETH_TX_OFFLOAD_SCTP_CKSUM |
+		RTE_ETH_TX_OFFLOAD_TCP_TSO;
+
+	if (priv->queue_format == GVE_DQO_RDA_FORMAT)
+		dev_info->rx_offload_capa |= RTE_ETH_RX_OFFLOAD_TCP_LRO;
+
+	dev_info->default_rxconf = (struct rte_eth_rxconf) {
+		.rx_free_thresh = GVE_DEFAULT_RX_FREE_THRESH,
+		.rx_drop_en = 0,
+		.offloads = 0,
+	};
+
+	dev_info->default_txconf = (struct rte_eth_txconf) {
+		.tx_free_thresh = GVE_DEFAULT_TX_FREE_THRESH,
+		.offloads = 0,
+	};
+
+	dev_info->default_rxportconf.ring_size = priv->rx_desc_cnt;
+	dev_info->rx_desc_lim = (struct rte_eth_desc_lim) {
+		.nb_max = priv->rx_desc_cnt,
+		.nb_min = priv->rx_desc_cnt,
+		.nb_align = 1,
+	};
+
+	dev_info->default_txportconf.ring_size = priv->tx_desc_cnt;
+	dev_info->tx_desc_lim = (struct rte_eth_desc_lim) {
+		.nb_max = priv->tx_desc_cnt,
+		.nb_min = priv->tx_desc_cnt,
+		.nb_align = 1,
+	};
+
+	return 0;
+}
+
 static int
 gve_dev_mtu_set(struct rte_eth_dev *dev, uint16_t mtu)
 {
@@ -299,6 +359,7 @@ static const struct eth_dev_ops gve_eth_dev_ops = {
 	.dev_start            = gve_dev_start,
 	.dev_stop             = gve_dev_stop,
 	.dev_close            = gve_dev_close,
+	.dev_infos_get        = gve_dev_info_get,
 	.rx_queue_setup       = gve_rx_queue_setup,
 	.tx_queue_setup       = gve_tx_queue_setup,
 	.link_update          = gve_link_update,
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 192+ messages in thread

* [PATCH v2 09/10] net/gve: add stats support
  2022-08-29  8:41   ` [PATCH v2 00/10] introduce GVE PMD Junfeng Guo
                       ` (7 preceding siblings ...)
  2022-08-29  8:41     ` [PATCH v2 08/10] net/gve: add support to get dev info and configure dev Junfeng Guo
@ 2022-08-29  8:41     ` Junfeng Guo
  2022-09-01 17:24       ` Ferruh Yigit
  2022-08-29  8:41     ` [PATCH v2 10/10] doc: update documentation Junfeng Guo
  2022-09-01 17:19     ` [PATCH v2 00/10] introduce GVE PMD Ferruh Yigit
  10 siblings, 1 reply; 192+ messages in thread
From: Junfeng Guo @ 2022-08-29  8:41 UTC (permalink / raw)
  To: qi.z.zhang, jingjing.wu
  Cc: ferruh.yigit, dev, xiaoyun.li, awogbemila, bruce.richardson, junfeng.guo

Add support for dev_ops stats_get and stats_reset.
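
An illustrative snippet (not part of the patch) that reads and clears the
new per-port counters through the standard ethdev stats API; the helper
name is invented:

  #include <stdio.h>
  #include <inttypes.h>
  #include <rte_ethdev.h>

  static void
  dump_and_clear_stats(uint16_t port_id)
  {
      struct rte_eth_stats st;

      if (rte_eth_stats_get(port_id, &st) != 0)  /* -> gve_dev_stats_get() */
          return;

      printf("rx %" PRIu64 " pkts / %" PRIu64 " bytes, tx %" PRIu64
             " pkts / %" PRIu64 " bytes, rx_nombuf %" PRIu64 "\n",
             st.ipackets, st.ibytes, st.opackets, st.obytes, st.rx_nombuf);

      rte_eth_stats_reset(port_id);              /* -> gve_dev_stats_reset() */
  }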

Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com>
Signed-off-by: Junfeng Guo <junfeng.guo@intel.com>
---
 drivers/net/gve/gve.h        | 10 ++++++
 drivers/net/gve/gve_ethdev.c | 69 ++++++++++++++++++++++++++++++++++++
 drivers/net/gve/gve_rx.c     | 15 ++++++--
 drivers/net/gve/gve_tx.c     | 12 +++++++
 4 files changed, 104 insertions(+), 2 deletions(-)

diff --git a/drivers/net/gve/gve.h b/drivers/net/gve/gve.h
index 004e0a75ca..e256a2bec2 100644
--- a/drivers/net/gve/gve.h
+++ b/drivers/net/gve/gve.h
@@ -91,6 +91,10 @@ struct gve_tx_queue {
 	struct gve_queue_page_list *qpl;
 	struct gve_tx_iovec *iov_ring;
 
+	/* Stats */
+	uint64_t packets;
+	uint64_t bytes;
+
 	uint16_t port_id;
 	uint16_t queue_id;
 
@@ -129,6 +133,12 @@ struct gve_rx_queue {
 	/* only valid for GQI_QPL queue format */
 	struct gve_queue_page_list *qpl;
 
+	/* stats */
+	uint64_t no_mbufs;
+	uint64_t errors;
+	uint64_t packets;
+	uint64_t bytes;
+
 	struct gve_priv *hw;
 	const struct rte_memzone *qres_mz;
 	struct gve_queue_resources *qres;
diff --git a/drivers/net/gve/gve_ethdev.c b/drivers/net/gve/gve_ethdev.c
index 6bc7bf4519..2977df01f1 100644
--- a/drivers/net/gve/gve_ethdev.c
+++ b/drivers/net/gve/gve_ethdev.c
@@ -326,6 +326,73 @@ gve_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
 	return 0;
 }
 
+static int
+gve_dev_stats_get(struct rte_eth_dev *dev, struct rte_eth_stats *stats)
+{
+	uint16_t i;
+
+	for (i = 0; i < dev->data->nb_tx_queues; i++) {
+		struct gve_tx_queue *txq = dev->data->tx_queues[i];
+		if (txq == NULL)
+			continue;
+
+		stats->opackets += txq->packets;
+		stats->obytes += txq->bytes;
+
+		if (i < RTE_ETHDEV_QUEUE_STAT_CNTRS) {
+			stats->q_opackets[i] = txq->packets;
+			stats->q_obytes[i] = txq->bytes;
+		}
+	}
+
+	for (i = 0; i < dev->data->nb_rx_queues; i++) {
+		struct gve_rx_queue *rxq = dev->data->rx_queues[i];
+		if (rxq == NULL)
+			continue;
+
+		stats->ipackets += rxq->packets;
+		stats->ibytes += rxq->bytes;
+		stats->ierrors += rxq->errors;
+		stats->rx_nombuf += rxq->no_mbufs;
+
+		if (i < RTE_ETHDEV_QUEUE_STAT_CNTRS) {
+			stats->q_ipackets[i] = rxq->packets;
+			stats->q_ibytes[i] = rxq->bytes;
+			stats->q_errors[i] = rxq->errors;
+		}
+	}
+
+	return 0;
+}
+
+static int
+gve_dev_stats_reset(struct rte_eth_dev *dev)
+{
+	uint16_t i;
+
+	for (i = 0; i < dev->data->nb_tx_queues; i++) {
+		struct gve_tx_queue *txq = dev->data->tx_queues[i];
+		if (txq == NULL)
+			continue;
+
+		txq->packets  = 0;
+		txq->bytes = 0;
+	}
+
+	for (i = 0; i < dev->data->nb_rx_queues; i++) {
+		struct gve_rx_queue *rxq = dev->data->rx_queues[i];
+		if (rxq == NULL)
+			continue;
+
+		rxq->packets  = 0;
+		rxq->bytes = 0;
+		rxq->no_mbufs = 0;
+		rxq->errors = 0;
+	}
+
+	return 0;
+}
+
 static int
 gve_dev_mtu_set(struct rte_eth_dev *dev, uint16_t mtu)
 {
@@ -363,6 +430,8 @@ static const struct eth_dev_ops gve_eth_dev_ops = {
 	.rx_queue_setup       = gve_rx_queue_setup,
 	.tx_queue_setup       = gve_tx_queue_setup,
 	.link_update          = gve_link_update,
+	.stats_get            = gve_dev_stats_get,
+	.stats_reset          = gve_dev_stats_reset,
 	.mtu_set              = gve_dev_mtu_set,
 };
 
diff --git a/drivers/net/gve/gve_rx.c b/drivers/net/gve/gve_rx.c
index 8f560ae592..3a8a869980 100644
--- a/drivers/net/gve/gve_rx.c
+++ b/drivers/net/gve/gve_rx.c
@@ -26,8 +26,10 @@ gve_rx_refill(struct gve_rx_queue *rxq)
 					break;
 				rxq->sw_ring[idx + i] = nmb;
 			}
-			if (i != nb_alloc)
+			if (i != nb_alloc) {
+				rxq->no_mbufs += nb_alloc - i;
 				nb_alloc = i;
+			}
 		}
 		rxq->nb_avail -= nb_alloc;
 		next_avail += nb_alloc;
@@ -88,6 +90,7 @@ gve_rx_burst(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t nb_pkts)
 	uint16_t rx_id = rxq->rx_tail;
 	struct rte_mbuf *rxe;
 	uint16_t nb_rx, len;
+	uint64_t bytes = 0;
 	uint64_t addr;
 
 	rxr = rxq->rx_desc_ring;
@@ -97,8 +100,10 @@ gve_rx_burst(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t nb_pkts)
 		if (GVE_SEQNO(rxd->flags_seq) != rxq->expected_seqno)
 			break;
 
-		if (rxd->flags_seq & GVE_RXF_ERR)
+		if (rxd->flags_seq & GVE_RXF_ERR) {
+			rxq->errors++;
 			continue;
+		}
 
 		len = rte_be_to_cpu_16(rxd->len) - GVE_RX_PAD;
 		rxe = rxq->sw_ring[rx_id];
@@ -137,6 +142,7 @@ gve_rx_burst(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t nb_pkts)
 			rx_id = 0;
 
 		rx_pkts[nb_rx] = rxe;
+		bytes += len;
 	}
 
 	rxq->nb_avail += nb_rx;
@@ -145,6 +151,11 @@ gve_rx_burst(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t nb_pkts)
 	if (rxq->nb_avail > rxq->free_thresh)
 		gve_rx_refill(rxq);
 
+	if (nb_rx) {
+		rxq->packets += nb_rx;
+		rxq->bytes += bytes;
+	}
+
 	return nb_rx;
 }
 
diff --git a/drivers/net/gve/gve_tx.c b/drivers/net/gve/gve_tx.c
index 2dc3411672..d99e6eb009 100644
--- a/drivers/net/gve/gve_tx.c
+++ b/drivers/net/gve/gve_tx.c
@@ -260,6 +260,7 @@ gve_tx_burst_qpl(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 	struct rte_mbuf *tx_pkt, *first;
 	uint16_t sw_id = txq->sw_tail;
 	uint16_t nb_used, i;
+	uint64_t bytes = 0;
 	uint16_t nb_tx = 0;
 	uint32_t hlen;
 
@@ -352,6 +353,8 @@ gve_tx_burst_qpl(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 		txq->nb_free -= nb_used;
 		txq->sw_nb_free -= first->nb_segs;
 		tx_tail += nb_used;
+
+		bytes += first->pkt_len;
 	}
 
 end_of_tx:
@@ -359,6 +362,9 @@ gve_tx_burst_qpl(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 		rte_write32(rte_cpu_to_be_32(tx_tail), txq->qtx_tail);
 		txq->tx_tail = tx_tail;
 		txq->sw_tail = sw_id;
+
+		txq->packets += nb_tx;
+		txq->bytes += bytes;
 	}
 
 	return nb_tx;
@@ -377,6 +383,7 @@ gve_tx_burst_ra(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 	struct rte_mbuf *tx_pkt, *first;
 	uint16_t nb_used, hlen, i;
 	uint64_t ol_flags, addr;
+	uint64_t bytes = 0;
 	uint16_t nb_tx = 0;
 
 	txr = txq->tx_desc_ring;
@@ -435,12 +442,17 @@ gve_tx_burst_ra(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 
 		txq->nb_free -= nb_used;
 		tx_tail += nb_used;
+
+		bytes += first->pkt_len;
 	}
 
 end_of_tx:
 	if (nb_tx) {
 		rte_write32(rte_cpu_to_be_32(tx_tail), txq->qtx_tail);
 		txq->tx_tail = tx_tail;
+
+		txq->packets += nb_tx;
+		txq->bytes += bytes;
 	}
 
 	return nb_tx;
-- 
2.34.1
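
As an illustration only (not part of the patch), the per-queue counters added
above surface through the standard ethdev stats API; a small sketch of reading
and clearing them from an application, assuming a valid 'port_id':

	#include <inttypes.h>
	#include <stdio.h>
	#include <rte_ethdev.h>

	static void
	dump_and_reset_gve_stats(uint16_t port_id)
	{
		struct rte_eth_stats stats;

		if (rte_eth_stats_get(port_id, &stats) != 0)
			return;

		printf("rx: %" PRIu64 " pkts, %" PRIu64 " bytes, %" PRIu64
		       " errors, %" PRIu64 " mbuf alloc failures\n",
		       stats.ipackets, stats.ibytes, stats.ierrors, stats.rx_nombuf);
		printf("tx: %" PRIu64 " pkts, %" PRIu64 " bytes\n",
		       stats.opackets, stats.obytes);

		/* q_ipackets/q_opackets etc. are only filled for the first
		 * RTE_ETHDEV_QUEUE_STAT_CNTRS queues, as in gve_dev_stats_get().
		 */

		rte_eth_stats_reset(port_id); /* ends up in gve_dev_stats_reset() */
	}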


^ permalink raw reply related	[flat|nested] 192+ messages in thread

* [PATCH v2 10/10] doc: update documentation
  2022-08-29  8:41   ` [PATCH v2 00/10] introduce GVE PMD Junfeng Guo
                       ` (8 preceding siblings ...)
  2022-08-29  8:41     ` [PATCH v2 09/10] net/gve: add stats support Junfeng Guo
@ 2022-08-29  8:41     ` Junfeng Guo
  2022-09-01 17:20       ` Ferruh Yigit
  2022-09-23  9:38       ` [PATCH v3 0/9] introduce GVE PMD Junfeng Guo
  2022-09-01 17:19     ` [PATCH v2 00/10] introduce GVE PMD Ferruh Yigit
  10 siblings, 2 replies; 192+ messages in thread
From: Junfeng Guo @ 2022-08-29  8:41 UTC (permalink / raw)
  To: qi.z.zhang, jingjing.wu
  Cc: ferruh.yigit, dev, xiaoyun.li, awogbemila, bruce.richardson, junfeng.guo

Update documentation of GVE PMD and the release notes.

Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com>
Signed-off-by: Junfeng Guo <junfeng.guo@intel.com>
---
 MAINTAINERS                            |  6 +++
 doc/guides/nics/features/gve.ini       | 18 +++++++
 doc/guides/nics/gve.rst                | 65 ++++++++++++++++++++++++++
 doc/guides/rel_notes/release_22_11.rst |  4 ++
 4 files changed, 93 insertions(+)
 create mode 100644 doc/guides/nics/features/gve.ini
 create mode 100644 doc/guides/nics/gve.rst

diff --git a/MAINTAINERS b/MAINTAINERS
index 32ffdd1a61..474f41f0de 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -697,6 +697,12 @@ F: drivers/net/enic/
 F: doc/guides/nics/enic.rst
 F: doc/guides/nics/features/enic.ini
 
+Google Virtual Ethernet
+M: Junfeng Guo <junfeng.guo@intel.com>
+F: drivers/net/gve/
+F: doc/guides/nics/gve.rst
+F: doc/guides/nics/features/gve.ini
+
 Hisilicon hns3
 M: Dongdong Liu <liudongdong3@huawei.com>
 M: Yisen Zhuang <yisen.zhuang@huawei.com>
diff --git a/doc/guides/nics/features/gve.ini b/doc/guides/nics/features/gve.ini
new file mode 100644
index 0000000000..180408aa80
--- /dev/null
+++ b/doc/guides/nics/features/gve.ini
@@ -0,0 +1,18 @@
+;
+; Supported features of the Google Virtual Ethernet 'gve' poll mode driver.
+;
+; Refer to default.ini for the full list of available PMD features.
+;
+[Features]
+Speed capabilities   = Y
+Link status          = Y
+MTU update           = Y
+TSO                  = Y
+RSS hash             = Y
+L4 checksum offload  = Y
+Basic stats          = Y
+Stats per queue      = Y
+Linux                = Y
+x86-32               = Y
+x86-64               = Y
+Usage doc            = Y
diff --git a/doc/guides/nics/gve.rst b/doc/guides/nics/gve.rst
new file mode 100644
index 0000000000..20cda5031b
--- /dev/null
+++ b/doc/guides/nics/gve.rst
@@ -0,0 +1,65 @@
+..  SPDX-License-Identifier: BSD-3-Clause
+    Copyright(C) 2022 Intel Corporation.
+
+GVE poll mode driver
+=======================
+
+The GVE PMD (**librte_net_gve**) provides poll mode driver support for
+the Google Virtual Ethernet device.
+
+The base code is under MIT license and based on GVE kernel driver v1.3.0.
+GVE base code files are:
+
+- gve_adminq.h
+- gve_adminq.c
+- gve_desc.h
+- gve_desc_dqo.h
+- gve_register.h
+
+Please refer to https://github.com/GoogleCloudPlatform/compute-virtual-ethernet-linux/tree/v1.3.0/google/gve
+to find the original base code.
+
+GVE has 3 queue formats:
+
+- GQI_QPL - GQI with queue page list
+- GQI_RDA - GQI with raw DMA addressing
+- DQO_RDA - DQO with raw DMA addressing
+
+The GQI_QPL queue format is queue page list mode. The driver first needs to
+allocate memory and register it as a Queue Page List (QPL) with the hardware
+(Google Hypervisor/GVE Backend). Each queue has its own QPL.
+On Tx, the driver copies each packet into QPL memory and puts the packet's
+offset within the QPL into the hardware descriptor, so that the hardware can
+fetch the packet data. On Rx, the driver reads the offset from the descriptor
+and copies the packet data out of the QPL at that offset.
+
+The GQI_RDA queue format works like a typical NIC: the driver can put the
+packets' physical addresses directly into the hardware descriptors.
+
+The DQO_RDA queue format has a submission and completion queue pair for each
+Tx/Rx queue. Similar to GQI_RDA, the driver can put the packets' physical
+addresses into the hardware descriptors.
+
+Please refer to https://www.kernel.org/doc/html/latest/networking/device_drivers/ethernet/google/gve.html
+to get more information about GVE queue formats.
+
+Features and Limitations
+------------------------
+
+In this release, the GVE PMD provides the basic functionality of packet
+reception and transmission.
+Supported features of the GVE PMD are:
+
+- Multiple queues for TX and RX
+- Receiver Side Scaling (RSS)
+- TSO offload
+- Port hardware statistics
+- Link state information
+- TX multi-segments (Scatter TX)
+- Tx UDP/TCP/SCTP Checksum
+
+Currently, only the GQI_QPL and GQI_RDA queue formats are supported in the
+PMD. Jumbo frames are not supported in the PMD for now; support will be added
+in a future DPDK release.
+Also, only the GQI_QPL queue format is in use on GCP, since GQI_RDA has not
+been released in production.
diff --git a/doc/guides/rel_notes/release_22_11.rst b/doc/guides/rel_notes/release_22_11.rst
index 8c021cf050..6674f4cf6f 100644
--- a/doc/guides/rel_notes/release_22_11.rst
+++ b/doc/guides/rel_notes/release_22_11.rst
@@ -55,6 +55,10 @@ New Features
      Also, make sure to start the actual text at the margin.
      =======================================================
 
+* **Added GVE net PMD**
+
+  Added the new ``gve`` net driver for Google Virtual Ethernet devices.
+  See the :doc:`../nics/gve` NIC guide for more details on this new driver.
 
 Removed Items
 -------------
-- 
2.34.1
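
To make the GQI_QPL description in gve.rst above more concrete, here is a toy
model of the copy-into-QPL idea; it is a simplified sketch only, not the
driver's real data structures (no wrap-around, no per-segment descriptors),
and all names are illustrative:

	#include <stdint.h>
	#include <string.h>

	/* Toy queue page list: one flat buffer registered with the device. */
	struct toy_qpl {
		uint8_t *base;   /* start of the registered QPL memory */
		uint32_t size;   /* total QPL size in bytes */
		uint32_t head;   /* next free byte */
	};

	/* Copy a packet into the QPL and return the offset that the Tx
	 * descriptor would carry: the device is given offsets inside the
	 * registered QPL, not packet DMA addresses. Returns UINT32_MAX when
	 * the QPL is full. Rx is the mirror image: read the offset from the
	 * descriptor and copy the data out of the QPL into a fresh mbuf.
	 */
	static uint32_t
	toy_qpl_copy(struct toy_qpl *qpl, const void *pkt, uint32_t len)
	{
		uint32_t off = qpl->head;

		if (len > qpl->size - off)
			return UINT32_MAX;
		memcpy(qpl->base + off, pkt, len);
		qpl->head += len;
		return off;
	}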


^ permalink raw reply related	[flat|nested] 192+ messages in thread

* Re: [PATCH v2 01/10] net/gve: introduce GVE PMD base code
  2022-08-29  8:41     ` [PATCH v2 01/10] net/gve: introduce GVE PMD base code Junfeng Guo
@ 2022-09-01 17:19       ` Ferruh Yigit
  2022-09-01 18:23         ` Stephen Hemminger
  0 siblings, 1 reply; 192+ messages in thread
From: Ferruh Yigit @ 2022-09-01 17:19 UTC (permalink / raw)
  To: Junfeng Guo, qi.z.zhang, jingjing.wu, Stephen Hemminger, Hemant Agrawal
  Cc: dev, xiaoyun.li, awogbemila, bruce.richardson, Haiyue Wang, techboard

On 8/29/2022 9:41 AM, Junfeng Guo wrote:
> The following base code is based on Google Virtual Ethernet (gve)
> driver v1.3.0 under MIT license.
>    - gve_adminq.c
>    - gve_adminq.h
>    - gve_desc.h
>    - gve_desc_dqo.h
>    - gve_register.h
> 
> The original code is in:
> https://github.com/GoogleCloudPlatform/compute-virtual-ethernet-linux/\
> tree/v1.3.0/google/gve
> 
> Signed-off-by: Xiaoyun Li<xiaoyun.li@intel.com>
> Signed-off-by: Haiyue Wang<haiyue.wang@intel.com>
> Signed-off-by: Junfeng Guo<junfeng.guo@intel.com>
> ---
>   drivers/net/gve/gve_adminq.c   | 925 +++++++++++++++++++++++++++++++++
>   drivers/net/gve/gve_adminq.h   | 381 ++++++++++++++
>   drivers/net/gve/gve_desc.h     | 137 +++++
>   drivers/net/gve/gve_desc_dqo.h | 254 +++++++++
>   drivers/net/gve/gve_register.h |  28 +
>   5 files changed, 1725 insertions(+)
>   create mode 100644 drivers/net/gve/gve_adminq.c
>   create mode 100644 drivers/net/gve/gve_adminq.h
>   create mode 100644 drivers/net/gve/gve_desc.h
>   create mode 100644 drivers/net/gve/gve_desc_dqo.h
>   create mode 100644 drivers/net/gve/gve_register.h
> 
> diff --git a/drivers/net/gve/gve_adminq.c b/drivers/net/gve/gve_adminq.c
> new file mode 100644
> index 0000000000..8a724f12c6
> --- /dev/null
> +++ b/drivers/net/gve/gve_adminq.c
> @@ -0,0 +1,925 @@
> +/* SPDX-License-Identifier: MIT
> + * Google Virtual Ethernet (gve) driver
> + * Version: 1.3.0
> + * Copyright (C) 2015-2022 Google, Inc.
> + * Copyright(C) 2022 Intel Corporation
> + */
> +

Can you please get approval for the MIT license from techboard, as 
Stephen highlighted in previous version?


^ permalink raw reply	[flat|nested] 192+ messages in thread

* Re: [PATCH v2 00/10] introduce GVE PMD
  2022-08-29  8:41   ` [PATCH v2 00/10] introduce GVE PMD Junfeng Guo
                       ` (9 preceding siblings ...)
  2022-08-29  8:41     ` [PATCH v2 10/10] doc: update documentation Junfeng Guo
@ 2022-09-01 17:19     ` Ferruh Yigit
  2022-09-07  2:09       ` Guo, Junfeng
  10 siblings, 1 reply; 192+ messages in thread
From: Ferruh Yigit @ 2022-09-01 17:19 UTC (permalink / raw)
  To: Junfeng Guo, qi.z.zhang, jingjing.wu
  Cc: dev, xiaoyun.li, awogbemila, bruce.richardson

On 8/29/2022 9:41 AM, Junfeng Guo wrote:

> 
> Introduce a new PMD for Google Virtual Ethernet (GVE).
> 
> This patch set requires an exception for MIT license for GVE base code.
> And the base code includes the following files:
>          - gve_adminq.c
>          - gve_adminq.h
>          - gve_desc.h
>          - gve_desc_dqo.h
>          - gve_register.h
> 
> It's based on GVE kernel driver v1.3.0 and the original code is in
> https://github.com/GoogleCloudPlatform/compute-virtual-ethernet-linux/tree/v1.3.0
> 
> v2:
> fix some CI check error.
> 
> Junfeng Guo (10):
>    net/gve: introduce GVE PMD base code
>    net/gve: add logs and OS specific implementation
>    net/gve: support device initialization
>    net/gve: add link update support
>    net/gve: add MTU set support
>    net/gve: add queue operations
>    net/gve: add Rx/Tx support
>    net/gve: add support to get dev info and configure dev
>    net/gve: add stats support
>    doc: update documentation
> 

Please check build error reported by CI:
https://patches.dpdk.org/project/dpdk/patch/20220829084127.934183-11-junfeng.guo@intel.com/

I am also getting various build errors, even not able to reach patch by 
patch build stage where I expect some issues, can you please verify 
patch by patch build in next version?


^ permalink raw reply	[flat|nested] 192+ messages in thread

* Re: [PATCH v2 10/10] doc: update documentation
  2022-08-29  8:41     ` [PATCH v2 10/10] doc: update documentation Junfeng Guo
@ 2022-09-01 17:20       ` Ferruh Yigit
  2022-09-23  9:38       ` [PATCH v3 0/9] introduce GVE PMD Junfeng Guo
  1 sibling, 0 replies; 192+ messages in thread
From: Ferruh Yigit @ 2022-09-01 17:20 UTC (permalink / raw)
  To: Junfeng Guo, qi.z.zhang, jingjing.wu
  Cc: dev, xiaoyun.li, awogbemila, bruce.richardson

On 8/29/2022 9:41 AM, Junfeng Guo wrote:
> 
> Update documentation of GVE PMD and release note.
> 
> Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com>
> Signed-off-by: Junfeng Guo <junfeng.guo@intel.com>
> ---
>   MAINTAINERS                            |  6 +++
>   doc/guides/nics/features/gve.ini       | 18 +++++++
>   doc/guides/nics/gve.rst                | 65 ++++++++++++++++++++++++++
>   doc/guides/rel_notes/release_22_11.rst |  4 ++
>   4 files changed, 93 insertions(+)
>   create mode 100644 doc/guides/nics/features/gve.ini
>   create mode 100644 doc/guides/nics/gve.rst
> 

Need to update the index file (doc/guides/nics/index.rst) to make the new 
gve.rst file visible. Did you test building and viewing the documentation?

> diff --git a/MAINTAINERS b/MAINTAINERS
> index 32ffdd1a61..474f41f0de 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -697,6 +697,12 @@ F: drivers/net/enic/
>   F: doc/guides/nics/enic.rst
>   F: doc/guides/nics/features/enic.ini
> 
> +Google Virtual Ethernet
> +M: Junfeng Guo <junfeng.guo@intel.com>
> +F: drivers/net/gve/
> +F: doc/guides/nics/gve.rst
> +F: doc/guides/nics/features/gve.ini
> +
>   Hisilicon hns3
>   M: Dongdong Liu <liudongdong3@huawei.com>
>   M: Yisen Zhuang <yisen.zhuang@huawei.com>
> diff --git a/doc/guides/nics/features/gve.ini b/doc/guides/nics/features/gve.ini
> new file mode 100644
> index 0000000000..180408aa80
> --- /dev/null
> +++ b/doc/guides/nics/features/gve.ini
> @@ -0,0 +1,18 @@
> +;
> +; Supported features of the Google Virtual Ethernet 'gve' poll mode driver.
> +;
> +; Refer to default.ini for the full list of available PMD features.
> +;
> +[Features]
> +Speed capabilities   = Y
> +Link status          = Y
> +MTU update           = Y
> +TSO                  = Y
> +RSS hash             = Y
> +L4 checksum offload  = Y
> +Basic stats          = Y
> +Stats per queue      = Y
> +Linux                = Y
> +x86-32               = Y
> +x86-64               = Y
> +Usage doc            = Y

Can you please add this patch as the first patch, with the infrastructure files 
(like meson files, .map file etc.), and later update the .ini with each patch, 
so that as a patch introduces a feature, the .ini is updated to add that feature.

> diff --git a/doc/guides/nics/gve.rst b/doc/guides/nics/gve.rst
> new file mode 100644
> index 0000000000..20cda5031b
> --- /dev/null
> +++ b/doc/guides/nics/gve.rst
> @@ -0,0 +1,65 @@
> +..  SPDX-License-Identifier: BSD-3-Clause
> +    Copyright(C) 2022 Intel Corporation.
> +
> +GVE poll mode driver
> +=======================
> +
> +The GVE PMD (**librte_net_gve**) provides poll mode driver support for
> +Google Virtual Ethernet device.
> +

Can you please provide some references for the device? Official product 
link etc...

> +The base code is under MIT license and based on GVE kernel driver v1.3.0.
> +GVE base code files are:
> +
> +- gve_adminq.h
> +- gve_adminq.c
> +- gve_desc.h
> +- gve_desc_dqo.h
> +- gve_register.h
> +

Instead of listing these files in the documentation, what do you think 
about placing them under a 'base' folder (drivers/net/gve/base/*), which 
would clarify that these are base code files?

> +Please refer to https://github.com/GoogleCloudPlatform/compute-virtual-ethernet-linux/tree/v1.3.0/google/gve
> +to find the original base code.
> +
> +GVE has 3 queue formats:
> +
> +- GQI_QPL - GQI with queue page list
> +- GQI_RDA - GQI with raw DMA addressing
> +- DQO_RDA - DQO with raw DMA addressing
> +
> +GQI_QPL queue format is queue page list mode. Driver needs to allocate
> +memory and register this memory as a Queue Page List (QPL) in hardware
> +(Google Hypervisor/GVE Backend) first. Each queue has its own QPL.
> +Then Tx needs to copy packets to QPL memory and put this packet's offset
> +in the QPL memory into hardware descriptors so that hardware can get the
> +packets data. And Rx needs to read descriptors of offset in QPL to get
> +QPL address and copy packets from the address to get real packets data.
> +
> +GQI_RDA queue format works like usual NICs that driver can put packets'
> +physical address into hardware descriptors.
> +
> +DQO_RDA queue format has submission and completion queue pair for each
> +Tx/Rx queue. And similar as GQI_RDA, driver can put packets' physical
> +address into hardware descriptors.
> +
> +Please refer to https://www.kernel.org/doc/html/latest/networking/device_drivers/ethernet/google/gve.html
> +to get more information about GVE queue formats.
> +
> +Features and Limitations
> +------------------------
> +
> +In this release, the GVE PMD provides the basic functionality of packet
> +reception and transmission.
> +Supported features of the GVE PMD are:
> +
> +- Multiple queues for TX and RX
> +- Receiver Side Scaling (RSS)
> +- TSO offload
> +- Port hardware statistics
> +- Link state information
> +- TX multi-segments (Scatter TX)
> +- Tx UDP/TCP/SCTP Checksum
> +

Same comment as for the .ini file above. Let's build the features list gradually 
as the code adds them, instead of adding it as a batch at the end.



^ permalink raw reply	[flat|nested] 192+ messages in thread

* Re: [PATCH v2 02/10] net/gve: add logs and OS specific implementation
  2022-08-29  8:41     ` [PATCH v2 02/10] net/gve: add logs and OS specific implementation Junfeng Guo
@ 2022-09-01 17:20       ` Ferruh Yigit
  2022-09-07  6:58         ` Guo, Junfeng
  0 siblings, 1 reply; 192+ messages in thread
From: Ferruh Yigit @ 2022-09-01 17:20 UTC (permalink / raw)
  To: Junfeng Guo, qi.z.zhang, jingjing.wu
  Cc: dev, xiaoyun.li, awogbemila, bruce.richardson, Haiyue Wang

On 8/29/2022 9:41 AM, Junfeng Guo wrote:

> 
> Add GVE PMD logs.
> Add some MACRO definitions and memory operations which are specific
> for DPDK.
> 
> Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
> Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com>
> Signed-off-by: Junfeng Guo <junfeng.guo@intel.com>

<...>

> diff --git a/drivers/net/gve/gve_logs.h b/drivers/net/gve/gve_logs.h
> new file mode 100644
> index 0000000000..a050253f59
> --- /dev/null
> +++ b/drivers/net/gve/gve_logs.h
> @@ -0,0 +1,22 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright(C) 2022 Intel Corporation
> + */
> +
> +#ifndef _GVE_LOGS_H_
> +#define _GVE_LOGS_H_
> +
> +extern int gve_logtype_init;
> +extern int gve_logtype_driver;
> +
> +#define PMD_INIT_LOG(level, fmt, args...) \
> +       rte_log(RTE_LOG_ ## level, gve_logtype_init, "%s(): " fmt "\n", \
> +               __func__, ##args)
> +
> +#define PMD_DRV_LOG_RAW(level, fmt, args...) \
> +       rte_log(RTE_LOG_ ## level, gve_logtype_driver, "%s(): " fmt, \
> +               __func__, ## args)
> +
> +#define PMD_DRV_LOG(level, fmt, args...) \
> +       PMD_DRV_LOG_RAW(level, fmt "\n", ## args)
> +

Why is 'PMD_DRV_LOG_RAW' needed? Why not directly use 'PMD_DRV_LOG'?


Do you really need two different log types? How do you differentiate 
'init' & 'driver' types? As far as I can see there is mixed usage of them.


^ permalink raw reply	[flat|nested] 192+ messages in thread

* Re: [PATCH v2 03/10] net/gve: support device initialization
  2022-08-29  8:41     ` [PATCH v2 03/10] net/gve: support device initialization Junfeng Guo
@ 2022-09-01 17:21       ` Ferruh Yigit
  2022-09-23  9:38         ` Guo, Junfeng
  2022-09-01 17:22       ` Ferruh Yigit
  1 sibling, 1 reply; 192+ messages in thread
From: Ferruh Yigit @ 2022-09-01 17:21 UTC (permalink / raw)
  To: Junfeng Guo, qi.z.zhang, jingjing.wu
  Cc: dev, xiaoyun.li, awogbemila, bruce.richardson, Haiyue Wang

On 8/29/2022 9:41 AM, Junfeng Guo wrote:

> 
> Support device init and the following devops:
>    - dev_configure
>    - dev_start
>    - dev_stop
>    - dev_close
> 
> Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
> Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com>
> Signed-off-by: Junfeng Guo <junfeng.guo@intel.com>
> ---
>   drivers/net/gve/gve.h        | 249 +++++++++++++++++++++++
>   drivers/net/gve/gve_adminq.c |   1 +
>   drivers/net/gve/gve_ethdev.c | 375 +++++++++++++++++++++++++++++++++++
>   drivers/net/gve/meson.build  |  13 ++
>   drivers/net/gve/version.map  |   3 +
>   drivers/net/meson.build      |   1 +
>   6 files changed, 642 insertions(+)
>   create mode 100644 drivers/net/gve/gve.h
>   create mode 100644 drivers/net/gve/gve_ethdev.c
>   create mode 100644 drivers/net/gve/meson.build
>   create mode 100644 drivers/net/gve/version.map
> 
> diff --git a/drivers/net/gve/gve.h b/drivers/net/gve/gve.h
> new file mode 100644
> index 0000000000..704c88983c
> --- /dev/null
> +++ b/drivers/net/gve/gve.h
> @@ -0,0 +1,249 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright(C) 2022 Intel Corporation
> + */
> +
> +#ifndef _GVE_H_
> +#define _GVE_H_
> +
> +#include <ethdev_driver.h>
> +#include <ethdev_pci.h>
> +#include <rte_ether.h>
> +
> +#include "gve_desc.h"
> +
> +#ifndef GOOGLE_VENDOR_ID
> +#define GOOGLE_VENDOR_ID       0x1ae0
> +#endif
> +
> +#define GVE_DEV_ID             0x0042
> +
> +#define GVE_REG_BAR    0
> +#define GVE_DB_BAR     2
> +
> +/* 1 for management, 1 for rx, 1 for tx */
> +#define GVE_MIN_MSIX           3
> +
> +/* PTYPEs are always 10 bits. */
> +#define GVE_NUM_PTYPES 1024
> +
> +/* A list of pages registered with the device during setup and used by a queue
> + * as buffers
> + */
> +struct gve_queue_page_list {
> +       uint32_t id; /* unique id */
> +       uint32_t num_entries;
> +       dma_addr_t *page_buses; /* the dma addrs of the pages */
> +       const struct rte_memzone *mz;
> +};
> +
> +/* A TX desc ring entry */
> +union gve_tx_desc {
> +       struct gve_tx_pkt_desc pkt; /* first desc for a packet */
> +       struct gve_tx_seg_desc seg; /* subsequent descs for a packet */
> +};
> +
> +struct gve_tx_queue {
> +       volatile union gve_tx_desc *tx_desc_ring;
> +       const struct rte_memzone *mz;
> +       uint64_t tx_ring_phys_addr;
> +
> +       uint16_t nb_tx_desc;
> +
> +       /* Only valid for DQO_QPL queue format */
> +       struct gve_queue_page_list *qpl;
> +
> +       uint16_t port_id;
> +       uint16_t queue_id;
> +
> +       uint16_t ntfy_id;
> +       volatile rte_be32_t *ntfy_addr;
> +
> +       struct gve_priv *hw;
> +       const struct rte_memzone *qres_mz;
> +       struct gve_queue_resources *qres;
> +
> +       /* Only valid for DQO_RDA queue format */
> +       struct gve_tx_queue *complq;
> +};
> +
> +struct gve_rx_queue {
> +       volatile struct gve_rx_desc *rx_desc_ring;
> +       volatile union gve_rx_data_slot *rx_data_ring;
> +       const struct rte_memzone *mz;
> +       const struct rte_memzone *data_mz;
> +       uint64_t rx_ring_phys_addr;
> +
> +       uint16_t nb_rx_desc;
> +
> +       volatile rte_be32_t *ntfy_addr;
> +
> +       /* only valid for GQI_QPL queue format */
> +       struct gve_queue_page_list *qpl;
> +
> +       struct gve_priv *hw;
> +       const struct rte_memzone *qres_mz;
> +       struct gve_queue_resources *qres;
> +
> +       uint16_t port_id;
> +       uint16_t queue_id;
> +       uint16_t ntfy_id;
> +       uint16_t rx_buf_len;
> +
> +       /* Only valid for DQO_RDA queue format */
> +       struct gve_rx_queue *bufq;
> +};
> +
> +struct gve_irq_db {
> +       rte_be32_t id;
> +} ____cacheline_aligned;
> +
> +struct gve_ptype {
> +       uint8_t l3_type;  /* `gve_l3_type` in gve_adminq.h */
> +       uint8_t l4_type;  /* `gve_l4_type` in gve_adminq.h */
> +};
> +
> +struct gve_ptype_lut {
> +       struct gve_ptype ptypes[GVE_NUM_PTYPES];
> +};
> +
> +enum gve_queue_format {
> +       GVE_QUEUE_FORMAT_UNSPECIFIED = 0x0, /* default unspecified */
> +       GVE_GQI_RDA_FORMAT           = 0x1, /* GQI Raw Addressing */
> +       GVE_GQI_QPL_FORMAT           = 0x2, /* GQI Queue Page List */
> +       GVE_DQO_RDA_FORMAT           = 0x3, /* DQO Raw Addressing */
> +};
> +

Shouldn't this queue format information be part of a 'base' file? Both 
for licensing reasons and to cover the case where it is updated in the 
Google repo.
But if some DPDK-related information is required, what do you think about 
splitting this file into one for the base folder and another for DPDK?

> +struct gve_priv {
> +       struct gve_irq_db *irq_dbs; /* array of num_ntfy_blks */
> +       const struct rte_memzone *irq_dbs_mz;
> +       uint32_t mgmt_msix_idx;
> +       rte_be32_t *cnt_array; /* array of num_event_counters */
> +       const struct rte_memzone *cnt_array_mz;
> +
> +       uint16_t num_event_counters;
> +       uint16_t tx_desc_cnt; /* txq size */
> +       uint16_t rx_desc_cnt; /* rxq size */
> +       uint16_t tx_pages_per_qpl; /* tx buffer length */
> +       uint16_t rx_data_slot_cnt; /* rx buffer length */
> +
> +       /* Only valid for DQO_RDA queue format */
> +       uint16_t tx_compq_size; /* tx completion queue size */
> +       uint16_t rx_bufq_size; /* rx buff queue size */
> +
> +       uint64_t max_registered_pages;
> +       uint64_t num_registered_pages; /* num pages registered with NIC */
> +       uint16_t default_num_queues; /* default num queues to set up */
> +       enum gve_queue_format queue_format; /* see enum gve_queue_format */
> +       uint8_t enable_lsc;
> +
> +       uint16_t max_nb_txq;
> +       uint16_t max_nb_rxq;
> +       uint32_t num_ntfy_blks; /* spilt between TX and RX so must be even */
> +
> +       struct gve_registers __iomem *reg_bar0; /* see gve_register.h */
> +       rte_be32_t __iomem *db_bar2; /* "array" of doorbells */
> +       struct rte_pci_device *pci_dev;
> +
> +       /* Admin queue - see gve_adminq.h*/
> +       union gve_adminq_command *adminq;
> +       struct gve_dma_mem adminq_dma_mem;
> +       uint32_t adminq_mask; /* masks prod_cnt to adminq size */
> +       uint32_t adminq_prod_cnt; /* free-running count of AQ cmds executed */
> +       uint32_t adminq_cmd_fail; /* free-running count of AQ cmds failed */
> +       uint32_t adminq_timeouts; /* free-running count of AQ cmds timeouts */
> +       /* free-running count of per AQ cmd executed */
> +       uint32_t adminq_describe_device_cnt;
> +       uint32_t adminq_cfg_device_resources_cnt;
> +       uint32_t adminq_register_page_list_cnt;
> +       uint32_t adminq_unregister_page_list_cnt;
> +       uint32_t adminq_create_tx_queue_cnt;
> +       uint32_t adminq_create_rx_queue_cnt;
> +       uint32_t adminq_destroy_tx_queue_cnt;
> +       uint32_t adminq_destroy_rx_queue_cnt;
> +       uint32_t adminq_dcfg_device_resources_cnt;
> +       uint32_t adminq_set_driver_parameter_cnt;
> +       uint32_t adminq_report_stats_cnt;
> +       uint32_t adminq_report_link_speed_cnt;
> +       uint32_t adminq_get_ptype_map_cnt;
> +
> +       volatile uint32_t state_flags;
> +
> +       /* Gvnic device link speed from hypervisor. */
> +       uint64_t link_speed;
> +
> +       uint16_t max_mtu;
> +       struct rte_ether_addr dev_addr; /* mac address */
> +
> +       struct gve_queue_page_list *qpl;
> +
> +       struct gve_tx_queue **txqs;
> +       struct gve_rx_queue **rxqs;
> +};
> +

The device private data is provided in full here, as well as the 
'gve_rx_queue' and 'gve_tx_queue' structs etc., although most of the 
fields are not used at this stage and it is not clear why they are needed.

Instead of adding the full structs, can you please add here the bare minimum 
structs, only what is used at this point, and as more fields are used in the 
.c files, keep adding them to the structs too?
This clarifies what is used/added for specific features, and also lets us 
easily figure out unused fields / clutter in the header files.

> +enum gve_state_flags_bit {
> +       GVE_PRIV_FLAGS_ADMIN_QUEUE_OK           = 1,
> +       GVE_PRIV_FLAGS_DEVICE_RESOURCES_OK      = 2,
> +       GVE_PRIV_FLAGS_DEVICE_RINGS_OK          = 3,
> +       GVE_PRIV_FLAGS_NAPI_ENABLED             = 4,
> +};
> +
> +static inline bool gve_is_gqi(struct gve_priv *priv)
> +{
> +       return priv->queue_format == GVE_GQI_RDA_FORMAT ||
> +               priv->queue_format == GVE_GQI_QPL_FORMAT;
> +}
> +
> +static inline bool gve_get_admin_queue_ok(struct gve_priv *priv)
> +{
> +       return !!rte_bit_relaxed_get32(GVE_PRIV_FLAGS_ADMIN_QUEUE_OK,
> +                                      &priv->state_flags);
> +}
> +
> +static inline void gve_set_admin_queue_ok(struct gve_priv *priv)
> +{
> +       rte_bit_relaxed_set32(GVE_PRIV_FLAGS_ADMIN_QUEUE_OK,
> +                             &priv->state_flags);
> +}
> +
> +static inline void gve_clear_admin_queue_ok(struct gve_priv *priv)
> +{
> +       rte_bit_relaxed_clear32(GVE_PRIV_FLAGS_ADMIN_QUEUE_OK,
> +                               &priv->state_flags);
> +}
> +
> +static inline bool gve_get_device_resources_ok(struct gve_priv *priv)
> +{
> +       return !!rte_bit_relaxed_get32(GVE_PRIV_FLAGS_DEVICE_RESOURCES_OK,
> +                                      &priv->state_flags);
> +}
> +
> +static inline void gve_set_device_resources_ok(struct gve_priv *priv)
> +{
> +       rte_bit_relaxed_set32(GVE_PRIV_FLAGS_DEVICE_RESOURCES_OK,
> +                             &priv->state_flags);
> +}
> +
> +static inline void gve_clear_device_resources_ok(struct gve_priv *priv)
> +{
> +       rte_bit_relaxed_clear32(GVE_PRIV_FLAGS_DEVICE_RESOURCES_OK,
> +                               &priv->state_flags);
> +}
> +
> +static inline bool gve_get_device_rings_ok(struct gve_priv *priv)
> +{
> +       return !!rte_bit_relaxed_get32(GVE_PRIV_FLAGS_DEVICE_RINGS_OK,
> +                                      &priv->state_flags);
> +}
> +
> +static inline void gve_set_device_rings_ok(struct gve_priv *priv)
> +{
> +       rte_bit_relaxed_set32(GVE_PRIV_FLAGS_DEVICE_RINGS_OK,
> +                             &priv->state_flags);
> +}
> +
> +static inline void gve_clear_device_rings_ok(struct gve_priv *priv)
> +{
> +       rte_bit_relaxed_clear32(GVE_PRIV_FLAGS_DEVICE_RINGS_OK,
> +                               &priv->state_flags);
> +}

If this is a DPDK file, not a base file, please follow the DPDK coding 
convention, e.g. the return type should be on a separate line.

> +#endif /* _GVE_H_ */
> diff --git a/drivers/net/gve/gve_adminq.c b/drivers/net/gve/gve_adminq.c
> index 8a724f12c6..438ca2070e 100644
> --- a/drivers/net/gve/gve_adminq.c
> +++ b/drivers/net/gve/gve_adminq.c
> @@ -5,6 +5,7 @@
>    * Copyright(C) 2022 Intel Corporation
>    */
> 
> +#include "gve.h"
>   #include "gve_adminq.h"
>   #include "gve_register.h"
> 
> diff --git a/drivers/net/gve/gve_ethdev.c b/drivers/net/gve/gve_ethdev.c
> new file mode 100644
> index 0000000000..f10f273f7d
> --- /dev/null
> +++ b/drivers/net/gve/gve_ethdev.c
> @@ -0,0 +1,375 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright(C) 2022 Intel Corporation
> + */
> +#include <linux/pci_regs.h>
> +
> +#include "gve.h"
> +#include "gve_adminq.h"
> +#include "gve_register.h"
> +
> +#define GVE_VERSION            "1.3.0"
> +#define GVE_VERSION_PREFIX     "GVE-"
> +

Again, shouldn't these come from base file (google repo)?

Perhaps it would be good to discuss what the base file update 
strategy/plan is for the future?
It can be easier if you can drop in updated external files with minimum changes 
to the DPDK files; that can be done by grouping the external content in specific 
files.
Qi has lots of experience with this and I believe he can provide useful 
insight.

> +const char gve_version_str[] = GVE_VERSION;
> +static const char gve_version_prefix[] = GVE_VERSION_PREFIX;
> +
> +static void
> +gve_write_version(uint8_t *driver_version_register)
> +{
> +       const char *c = gve_version_prefix;
> +
> +       while (*c) {
> +               writeb(*c, driver_version_register);
> +               c++;
> +       }
> +
> +       c = gve_version_str;
> +       while (*c) {
> +               writeb(*c, driver_version_register);
> +               c++;
> +       }
> +       writeb('\n', driver_version_register);
> +}
> +
> +static int
> +gve_dev_configure(__rte_unused struct rte_eth_dev *dev)
> +{
> +       return 0;
> +}
> +
> +static int
> +gve_dev_start(struct rte_eth_dev *dev)
> +{
> +       dev->data->dev_started = 1;
> +
> +       return 0;
> +}
> +
> +static int
> +gve_dev_stop(struct rte_eth_dev *dev)
> +{
> +       dev->data->dev_link.link_status = RTE_ETH_LINK_DOWN;
> +       dev->data->dev_started = 0;
> +
> +       return 0;
> +}
> +
> +static int
> +gve_dev_close(struct rte_eth_dev *dev)
> +{
> +       int err = 0;
> +
> +       if (dev->data->dev_started) {
> +               err = gve_dev_stop(dev);
> +               if (err != 0)
> +                       PMD_DRV_LOG(ERR, "Failed to stop dev.");
> +       }
> +

Just a reminder that in 'close' the driver should free all the resources; if 
there is previously allocated memory, it should be freed now.


^ permalink raw reply	[flat|nested] 192+ messages in thread

* Re: [PATCH v2 03/10] net/gve: support device initialization
  2022-08-29  8:41     ` [PATCH v2 03/10] net/gve: support device initialization Junfeng Guo
  2022-09-01 17:21       ` Ferruh Yigit
@ 2022-09-01 17:22       ` Ferruh Yigit
  1 sibling, 0 replies; 192+ messages in thread
From: Ferruh Yigit @ 2022-09-01 17:22 UTC (permalink / raw)
  To: bruce.richardson, David Marchand, Thomas Monjalon, Andrew Rybchenko
  Cc: dev, xiaoyun.li, awogbemila, Haiyue Wang, Junfeng Guo,
	qi.z.zhang, jingjing.wu

On 8/29/2022 9:41 AM, Junfeng Guo wrote:
> diff --git a/drivers/net/gve/version.map b/drivers/net/gve/version.map
> new file mode 100644
> index 0000000000..c2e0723b4c
> --- /dev/null
> +++ b/drivers/net/gve/version.map
> @@ -0,0 +1,3 @@
> +DPDK_22 {
> +       local: *;
> +};

It should be 'DPDK_23' now.

@Bruce, @David, @Thomas, @Andrew,

This is one of the common errors as far as I can see, and many times the 
.map file is just empty and only required because of the build process.

Can we add some build system magic to let libraries/drivers use a common 
template for the .map file, unless a library/driver needs something special?
This would prevent this trivial error and also reduce overhead for the
'yy.mm-rc0' commit.

^ permalink raw reply	[flat|nested] 192+ messages in thread

* Re: [PATCH v2 04/10] net/gve: add link update support
  2022-08-29  8:41     ` [PATCH v2 04/10] net/gve: add link update support Junfeng Guo
@ 2022-09-01 17:23       ` Ferruh Yigit
  2022-09-23  9:38         ` Guo, Junfeng
  0 siblings, 1 reply; 192+ messages in thread
From: Ferruh Yigit @ 2022-09-01 17:23 UTC (permalink / raw)
  To: Junfeng Guo, qi.z.zhang, jingjing.wu
  Cc: dev, xiaoyun.li, awogbemila, bruce.richardson

On 8/29/2022 9:41 AM, Junfeng Guo wrote:

> 
> Support dev_ops link_update.
> 
> Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com>
> Signed-off-by: Junfeng Guo <junfeng.guo@intel.com>
> ---
>   drivers/net/gve/gve_ethdev.c | 30 ++++++++++++++++++++++++++++++
>   1 file changed, 30 insertions(+)
> 
> diff --git a/drivers/net/gve/gve_ethdev.c b/drivers/net/gve/gve_ethdev.c
> index f10f273f7d..435115c047 100644
> --- a/drivers/net/gve/gve_ethdev.c
> +++ b/drivers/net/gve/gve_ethdev.c
> @@ -37,10 +37,39 @@ gve_dev_configure(__rte_unused struct rte_eth_dev *dev)
>          return 0;
>   }
> 
> +static int
> +gve_link_update(struct rte_eth_dev *dev, __rte_unused int wait_to_complete)
> +{
> +       struct gve_priv *priv = dev->data->dev_private;
> +       struct rte_eth_link link;
> +       int err;
> +
> +       memset(&link, 0, sizeof(link));
> +       link.link_duplex = RTE_ETH_LINK_FULL_DUPLEX;
> +       link.link_autoneg = RTE_ETH_LINK_AUTONEG;
> +
> +       if (!dev->data->dev_started) {
> +               link.link_status = RTE_ETH_LINK_DOWN;
> +               link.link_speed = RTE_ETH_SPEED_NUM_NONE;
> +       } else {
> +               link.link_status = RTE_ETH_LINK_UP;
> +               PMD_INIT_LOG(DEBUG, "Get link status from hw");
> +               err = gve_adminq_report_link_speed(priv);

As far as I can see the API is calling an adminq command; is this 
command blocking until the link is up? If so, is there a non-blocking version 
to utilize 'wait_to_complete', instead of ignoring it?

Also, what will happen if the 'start()' dev_ops is called but the cable is not 
plugged in at all? Won't this still set the link status to "UP"?



^ permalink raw reply	[flat|nested] 192+ messages in thread

* Re: [PATCH v2 08/10] net/gve: add support to get dev info and configure dev
  2022-08-29  8:41     ` [PATCH v2 08/10] net/gve: add support to get dev info and configure dev Junfeng Guo
@ 2022-09-01 17:23       ` Ferruh Yigit
  2022-09-23  9:38         ` Guo, Junfeng
  0 siblings, 1 reply; 192+ messages in thread
From: Ferruh Yigit @ 2022-09-01 17:23 UTC (permalink / raw)
  To: Junfeng Guo, qi.z.zhang, jingjing.wu
  Cc: dev, xiaoyun.li, awogbemila, bruce.richardson

On 8/29/2022 9:41 AM, Junfeng Guo wrote:

> 
> Add dev_ops dev_infos_get.
> Complete dev_configure with RX offloads configuration.
> 
> Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com>
> Signed-off-by: Junfeng Guo <junfeng.guo@intel.com>
> ---
>   drivers/net/gve/gve.h        |  3 ++
>   drivers/net/gve/gve_ethdev.c | 61 ++++++++++++++++++++++++++++++++++++
>   2 files changed, 64 insertions(+)
> 
> diff --git a/drivers/net/gve/gve.h b/drivers/net/gve/gve.h
> index 7f4d0e37f3..004e0a75ca 100644
> --- a/drivers/net/gve/gve.h
> +++ b/drivers/net/gve/gve.h
> @@ -27,6 +27,9 @@
>   #define GVE_DEFAULT_TX_FREE_THRESH  256
>   #define GVE_TX_MAX_FREE_SZ          512
> 
> +#define GVE_MIN_BUF_SIZE           1024
> +#define GVE_MAX_RX_PKTLEN          65535
> +
>   /* PTYPEs are always 10 bits. */
>   #define GVE_NUM_PTYPES 1024
> 
> diff --git a/drivers/net/gve/gve_ethdev.c b/drivers/net/gve/gve_ethdev.c
> index 5ebe2c30ea..6bc7bf4519 100644
> --- a/drivers/net/gve/gve_ethdev.c
> +++ b/drivers/net/gve/gve_ethdev.c
> @@ -96,6 +96,14 @@ gve_free_qpls(struct gve_priv *priv)
>   static int
>   gve_dev_configure(__rte_unused struct rte_eth_dev *dev)
>   {
> +       struct gve_priv *priv = dev->data->dev_private;
> +
> +       if (dev->data->dev_conf.rxmode.mq_mode & RTE_ETH_MQ_RX_RSS_FLAG)
> +               dev->data->dev_conf.rxmode.offloads |= RTE_ETH_RX_OFFLOAD_RSS_HASH;
> +
> +       if (dev->data->dev_conf.rxmode.offloads & RTE_ETH_RX_OFFLOAD_TCP_LRO)
> +               priv->enable_lsc = 1;

What is the relation between LRO and LSC? Is it a typo?

And does driver support LSC at all? Or any interrupt?

> +
>          return 0;
>   }
> 
> @@ -266,6 +274,58 @@ gve_dev_close(struct rte_eth_dev *dev)
>          return err;
>   }
> 
> +static int
> +gve_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
> +{
> +       struct gve_priv *priv = dev->data->dev_private;
> +
> +       dev_info->device = dev->device;
> +       dev_info->max_mac_addrs = 1;
> +       dev_info->max_rx_queues = priv->max_nb_rxq;
> +       dev_info->max_tx_queues = priv->max_nb_txq;
> +       dev_info->min_rx_bufsize = GVE_MIN_BUF_SIZE;
> +       dev_info->max_rx_pktlen = GVE_MAX_RX_PKTLEN;
> +

Can you please provide 'max_mtu' & 'min_mtu' values too?
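
For example, something along these lines inside gve_dev_info_get() could work
(sketch only; 'priv->max_mtu' is the field already present in 'struct gve_priv',
and RTE_ETHER_MIN_MTU is the generic ethdev minimum):

	dev_info->min_mtu = RTE_ETHER_MIN_MTU;
	dev_info->max_mtu = priv->max_mtu;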


^ permalink raw reply	[flat|nested] 192+ messages in thread

* Re: [PATCH v2 09/10] net/gve: add stats support
  2022-08-29  8:41     ` [PATCH v2 09/10] net/gve: add stats support Junfeng Guo
@ 2022-09-01 17:24       ` Ferruh Yigit
  2022-09-23  9:38         ` Guo, Junfeng
  0 siblings, 1 reply; 192+ messages in thread
From: Ferruh Yigit @ 2022-09-01 17:24 UTC (permalink / raw)
  To: Junfeng Guo, qi.z.zhang, jingjing.wu
  Cc: dev, xiaoyun.li, awogbemila, bruce.richardson

On 8/29/2022 9:41 AM, Junfeng Guo wrote:

> 
> Update stats add support of dev_ops stats_get/reset.
> 
> Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com>
> Signed-off-by: Junfeng Guo <junfeng.guo@intel.com>
> ---
>   drivers/net/gve/gve.h        | 10 ++++++
>   drivers/net/gve/gve_ethdev.c | 69 ++++++++++++++++++++++++++++++++++++
>   drivers/net/gve/gve_rx.c     | 15 ++++++--
>   drivers/net/gve/gve_tx.c     | 12 +++++++
>   4 files changed, 104 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/net/gve/gve.h b/drivers/net/gve/gve.h
> index 004e0a75ca..e256a2bec2 100644
> --- a/drivers/net/gve/gve.h
> +++ b/drivers/net/gve/gve.h
> @@ -91,6 +91,10 @@ struct gve_tx_queue {
>          struct gve_queue_page_list *qpl;
>          struct gve_tx_iovec *iov_ring;
> 
> +       /* Stats */
> +       uint64_t packets;
> +       uint64_t bytes;
> +

Can't you get stats for 'errors' in Tx path?



^ permalink raw reply	[flat|nested] 192+ messages in thread

* Re: [PATCH v2 01/10] net/gve: introduce GVE PMD base code
  2022-09-01 17:19       ` Ferruh Yigit
@ 2022-09-01 18:23         ` Stephen Hemminger
  2022-09-01 20:49           ` Thomas Monjalon
  0 siblings, 1 reply; 192+ messages in thread
From: Stephen Hemminger @ 2022-09-01 18:23 UTC (permalink / raw)
  To: Ferruh Yigit
  Cc: Junfeng Guo, qi.z.zhang, jingjing.wu, Hemant Agrawal, dev,
	xiaoyun.li, awogbemila, bruce.richardson, Haiyue Wang, techboard

On Thu, 1 Sep 2022 18:19:22 +0100
Ferruh Yigit <ferruh.yigit@xilinx.com> wrote:

> > 
> > diff --git a/drivers/net/gve/gve_adminq.c b/drivers/net/gve/gve_adminq.c
> > new file mode 100644
> > index 0000000000..8a724f12c6
> > --- /dev/null
> > +++ b/drivers/net/gve/gve_adminq.c
> > @@ -0,0 +1,925 @@
> > +/* SPDX-License-Identifier: MIT
> > + * Google Virtual Ethernet (gve) driver
> > + * Version: 1.3.0
> > + * Copyright (C) 2015-2022 Google, Inc.
> > + * Copyright(C) 2022 Intel Corporation
> > + */
> > +  
> 
> Can you please get approval for the MIT license from techboard, as 
> Stephen highlighted in previous version?


I would prefer that it be BSD or dual licensed.
Although MIT and BSD-3 licenses are compatible, this is not something the techboard can decide;
it requires a statement from a knowledgeable open source lawyer (Intel or LF).

Please fix the license to BSD and save lots of trouble.

^ permalink raw reply	[flat|nested] 192+ messages in thread

* Re: [PATCH v2 01/10] net/gve: introduce GVE PMD base code
  2022-09-01 18:23         ` Stephen Hemminger
@ 2022-09-01 20:49           ` Thomas Monjalon
  2022-09-06  9:31             ` Guo, Junfeng
  0 siblings, 1 reply; 192+ messages in thread
From: Thomas Monjalon @ 2022-09-01 20:49 UTC (permalink / raw)
  To: Ferruh Yigit, techboard
  Cc: Junfeng Guo, qi.z.zhang, jingjing.wu, Hemant Agrawal, dev,
	xiaoyun.li, awogbemila, bruce.richardson, Haiyue Wang, techboard,
	Stephen Hemminger

01/09/2022 20:23, Stephen Hemminger:
> On Thu, 1 Sep 2022 18:19:22 +0100
> Ferruh Yigit <ferruh.yigit@xilinx.com> wrote:
> 
> > > 
> > > diff --git a/drivers/net/gve/gve_adminq.c b/drivers/net/gve/gve_adminq.c
> > > new file mode 100644
> > > index 0000000000..8a724f12c6
> > > --- /dev/null
> > > +++ b/drivers/net/gve/gve_adminq.c
> > > @@ -0,0 +1,925 @@
> > > +/* SPDX-License-Identifier: MIT
> > > + * Google Virtual Ethernet (gve) driver
> > > + * Version: 1.3.0
> > > + * Copyright (C) 2015-2022 Google, Inc.
> > > + * Copyright(C) 2022 Intel Corporation
> > > + */
> > > +  
> > 
> > Can you please get approval for the MIT license from techboard, as 
> > Stephen highlighted in previous version?
> 
> 
> I would prefer that it be BSD or dual licensed.
> Although MIT and BSD-3 licenses are compatible, this is not something techboard can decide
> it requires a statement from a knowledgeable open source lawyer (Intel or LF).
> 
> Please fix the license to BSD and save lots of trouble.

+1 to change to BSD to avoid trouble.




^ permalink raw reply	[flat|nested] 192+ messages in thread

* RE: [PATCH v2 01/10] net/gve: introduce GVE PMD base code
  2022-09-01 20:49           ` Thomas Monjalon
@ 2022-09-06  9:31             ` Guo, Junfeng
  2022-09-14 10:38               ` Thomas Monjalon
  0 siblings, 1 reply; 192+ messages in thread
From: Guo, Junfeng @ 2022-09-06  9:31 UTC (permalink / raw)
  To: Thomas Monjalon, Ferruh Yigit, techboard
  Cc: Zhang, Qi Z, Wu, Jingjing, Hemant Agrawal, dev, Li, Xiaoyun,
	awogbemila, Richardson, Bruce, Wang, Haiyue, techboard,
	Stephen Hemminger



> -----Original Message-----
> From: Thomas Monjalon <thomas@monjalon.net>
> Sent: Friday, September 2, 2022 04:50
> To: Ferruh Yigit <ferruh.yigit@xilinx.com>; techboard@dpdk.org
> Cc: Guo, Junfeng <junfeng.guo@intel.com>; Zhang, Qi Z
> <qi.z.zhang@intel.com>; Wu, Jingjing <jingjing.wu@intel.com>; Hemant
> Agrawal <hemant.agrawal@nxp.com>; dev@dpdk.org; Li, Xiaoyun
> <xiaoyun.li@intel.com>; awogbemila@google.com; Richardson, Bruce
> <bruce.richardson@intel.com>; Wang, Haiyue <haiyue.wang@intel.com>;
> techboard@dpdk.org; Stephen Hemminger
> <stephen@networkplumber.org>
> Subject: Re: [PATCH v2 01/10] net/gve: introduce GVE PMD base code
> 
> 01/09/2022 20:23, Stephen Hemminger:
> > On Thu, 1 Sep 2022 18:19:22 +0100
> > Ferruh Yigit <ferruh.yigit@xilinx.com> wrote:
> >
> > > >
> > > > diff --git a/drivers/net/gve/gve_adminq.c
> b/drivers/net/gve/gve_adminq.c
> > > > new file mode 100644
> > > > index 0000000000..8a724f12c6
> > > > --- /dev/null
> > > > +++ b/drivers/net/gve/gve_adminq.c
> > > > @@ -0,0 +1,925 @@
> > > > +/* SPDX-License-Identifier: MIT
> > > > + * Google Virtual Ethernet (gve) driver
> > > > + * Version: 1.3.0
> > > > + * Copyright (C) 2015-2022 Google, Inc.
> > > > + * Copyright(C) 2022 Intel Corporation
> > > > + */
> > > > +
> > >
> > > Can you please get approval for the MIT license from techboard, as
> > > Stephen highlighted in previous version?
> >
> >
> > I would prefer that it be BSD or dual licensed.
> > Although MIT and BSD-3 licenses are compatible, this is not something
> techboard can decide
> > it requires a statement from a knowledgeable open source lawyer (Intel
> or LF).
> >
> > Please fix the license to BSD and save lots of trouble.
> 
> +1 to change to BSD to avoid trouble.

Thanks for your concern and comments!
Yes, we are also willing to have this base code under the BSD license.

Note that this code is not Intel's; it comes from the kernel community.
Everyone can reach the code at:
https://github.com/GoogleCloudPlatform/compute-virtual-ethernet-linux/tree/v1.3.0.
The base code there carries the statement SPDX-License-Identifier: (GPL-2.0 OR MIT).

Thus, we may not be in a good position to re-license this code,
and we didn't find a BSD-licensed version in any open community,
so we just follow the required MIT license as an exception in DPDK.

Regards,
Junfeng

> 
> 


^ permalink raw reply	[flat|nested] 192+ messages in thread

* RE: [PATCH v2 00/10] introduce GVE PMD
  2022-09-01 17:19     ` [PATCH v2 00/10] introduce GVE PMD Ferruh Yigit
@ 2022-09-07  2:09       ` Guo, Junfeng
  0 siblings, 0 replies; 192+ messages in thread
From: Guo, Junfeng @ 2022-09-07  2:09 UTC (permalink / raw)
  To: Ferruh Yigit, Zhang, Qi Z, Wu, Jingjing
  Cc: dev, Li, Xiaoyun, awogbemila, Richardson, Bruce



> -----Original Message-----
> From: Ferruh Yigit <ferruh.yigit@xilinx.com>
> Sent: Friday, September 2, 2022 01:19
> To: Guo, Junfeng <junfeng.guo@intel.com>; Zhang, Qi Z
> <qi.z.zhang@intel.com>; Wu, Jingjing <jingjing.wu@intel.com>
> Cc: dev@dpdk.org; Li, Xiaoyun <xiaoyun.li@intel.com>;
> awogbemila@google.com; Richardson, Bruce
> <bruce.richardson@intel.com>
> Subject: Re: [PATCH v2 00/10] introduce GVE PMD
> 
> On 8/29/2022 9:41 AM, Junfeng Guo wrote:
> 
> >
> > Introduce a new PMD for Google Virtual Ethernet (GVE).
> >
> > This patch set requires an exception for MIT license for GVE base code.
> > And the base code includes the following files:
> >          - gve_adminq.c
> >          - gve_adminq.h
> >          - gve_desc.h
> >          - gve_desc_dqo.h
> >          - gve_register.h
> >
> > It's based on GVE kernel driver v1.3.0 and the original code is in
> > https://github.com/GoogleCloudPlatform/compute-virtual-ethernet-
> linux/tree/v1.3.0
> >
> > v2:
> > fix some CI check error.
> >
> > Junfeng Guo (10):
> >    net/gve: introduce GVE PMD base code
> >    net/gve: add logs and OS specific implementation
> >    net/gve: support device initialization
> >    net/gve: add link update support
> >    net/gve: add MTU set support
> >    net/gve: add queue operations
> >    net/gve: add Rx/Tx support
> >    net/gve: add support to get dev info and configure dev
> >    net/gve: add stats support
> >    doc: update documentation
> >
> 
> Please check build error reported by CI:
> https://patches.dpdk.org/project/dpdk/patch/20220829084127.934183-
> 11-junfeng.guo@intel.com/
> 
> I am also getting various build errors, even not able to reach patch by
> patch build stage where I expect some issues, can you please verify
> patch by patch build in next version?

Sure, thanks for the reminder!
The compile/build issues are being worked on now.
Thanks!


^ permalink raw reply	[flat|nested] 192+ messages in thread

* RE: [PATCH v2 02/10] net/gve: add logs and OS specific implementation
  2022-09-01 17:20       ` Ferruh Yigit
@ 2022-09-07  6:58         ` Guo, Junfeng
  2022-09-07 11:16           ` Ferruh Yigit
  0 siblings, 1 reply; 192+ messages in thread
From: Guo, Junfeng @ 2022-09-07  6:58 UTC (permalink / raw)
  To: Ferruh Yigit, Zhang, Qi Z, Wu, Jingjing
  Cc: dev, Li, Xiaoyun, awogbemila, Richardson, Bruce, Wang, Haiyue



> -----Original Message-----
> From: Ferruh Yigit <ferruh.yigit@xilinx.com>
> Sent: Friday, September 2, 2022 01:21
> To: Guo, Junfeng <junfeng.guo@intel.com>; Zhang, Qi Z
> <qi.z.zhang@intel.com>; Wu, Jingjing <jingjing.wu@intel.com>
> Cc: dev@dpdk.org; Li, Xiaoyun <xiaoyun.li@intel.com>;
> awogbemila@google.com; Richardson, Bruce
> <bruce.richardson@intel.com>; Wang, Haiyue <haiyue.wang@intel.com>
> Subject: Re: [PATCH v2 02/10] net/gve: add logs and OS specific
> implementation
> 
> On 8/29/2022 9:41 AM, Junfeng Guo wrote:
> 
> >
> > Add GVE PMD logs.
> > Add some MACRO definitions and memory operations which are specific
> > for DPDK.
> >
> > Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
> > Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com>
> > Signed-off-by: Junfeng Guo <junfeng.guo@intel.com>
> 
> <...>
> 
> > diff --git a/drivers/net/gve/gve_logs.h b/drivers/net/gve/gve_logs.h
> > new file mode 100644
> > index 0000000000..a050253f59
> > --- /dev/null
> > +++ b/drivers/net/gve/gve_logs.h
> > @@ -0,0 +1,22 @@
> > +/* SPDX-License-Identifier: BSD-3-Clause
> > + * Copyright(C) 2022 Intel Corporation
> > + */
> > +
> > +#ifndef _GVE_LOGS_H_
> > +#define _GVE_LOGS_H_
> > +
> > +extern int gve_logtype_init;
> > +extern int gve_logtype_driver;
> > +
> > +#define PMD_INIT_LOG(level, fmt, args...) \
> > +       rte_log(RTE_LOG_ ## level, gve_logtype_init, "%s(): " fmt "\n", \
> > +               __func__, ##args)
> > +
> > +#define PMD_DRV_LOG_RAW(level, fmt, args...) \
> > +       rte_log(RTE_LOG_ ## level, gve_logtype_driver, "%s(): " fmt, \
> > +               __func__, ## args)
>>> +
>>> +#define PMD_DRV_LOG(level, fmt, args...) \
> > +       PMD_DRV_LOG_RAW(level, fmt "\n", ## args)
> > +
> 
> Why 'PMD_DRV_LOG_RAW' is needed, why not directly use
> 'PMD_DRV_LOG'?

It seems that the _RAW macro was first introduced in the i40e driver logs file.
Since the trailing '\n' is sometimes already added at the end of the log message in
the base code, the PMD_DRV_LOG_RAW macro, which does not add one, is
used to keep the newline handling consistent.

Well, it looks like the PMD_DRV_LOG_RAW macro is somewhat redundant.
I think it's OK to remove PMD_DRV_LOG_RAW and have all the log messages
end without the trailing '\n'. Thanks!

> 
> 
> Do you really need two different log types? How do you differentiate
> 'init' & 'driver' types? As far as I can see there is mixed usage of them.

PMD_INIT_LOG is used at the init stage, while PMD_DRV_LOG
is used during normal driver operation. I agree that there might be
mixed usage of these two macros. I'll try to check all these usages and
use the right macro in each place in the coming versions.
If you insist that only one log type is needed to keep the code clean,
then I could update them as you expect. Thanks!

Regards,
Junfeng

^ permalink raw reply	[flat|nested] 192+ messages in thread

* Re: [PATCH v2 02/10] net/gve: add logs and OS specific implementation
  2022-09-07  6:58         ` Guo, Junfeng
@ 2022-09-07 11:16           ` Ferruh Yigit
  2022-09-08  8:09             ` Guo, Junfeng
  0 siblings, 1 reply; 192+ messages in thread
From: Ferruh Yigit @ 2022-09-07 11:16 UTC (permalink / raw)
  To: Guo, Junfeng, Zhang, Qi Z, Wu, Jingjing
  Cc: dev, Li, Xiaoyun, awogbemila, Richardson, Bruce, Wang, Haiyue

On 9/7/2022 7:58 AM, Guo, Junfeng wrote:
> 
>> -----Original Message-----
>> From: Ferruh Yigit <ferruh.yigit@xilinx.com>
>> Sent: Friday, September 2, 2022 01:21
>> To: Guo, Junfeng <junfeng.guo@intel.com>; Zhang, Qi Z
>> <qi.z.zhang@intel.com>; Wu, Jingjing <jingjing.wu@intel.com>
>> Cc: dev@dpdk.org; Li, Xiaoyun <xiaoyun.li@intel.com>;
>> awogbemila@google.com; Richardson, Bruce
>> <bruce.richardson@intel.com>; Wang, Haiyue <haiyue.wang@intel.com>
>> Subject: Re: [PATCH v2 02/10] net/gve: add logs and OS specific
>> implementation
>>
>> On 8/29/2022 9:41 AM, Junfeng Guo wrote:
>>
>>>
>>> Add GVE PMD logs.
>>> Add some MACRO definitions and memory operations which are specific
>>> for DPDK.
>>>
>>> Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
>>> Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com>
>>> Signed-off-by: Junfeng Guo <junfeng.guo@intel.com>
>>
>> <...>
>>
>>> diff --git a/drivers/net/gve/gve_logs.h b/drivers/net/gve/gve_logs.h
>>> new file mode 100644
>>> index 0000000000..a050253f59
>>> --- /dev/null
>>> +++ b/drivers/net/gve/gve_logs.h
>>> @@ -0,0 +1,22 @@
>>> +/* SPDX-License-Identifier: BSD-3-Clause
>>> + * Copyright(C) 2022 Intel Corporation
>>> + */
>>> +
>>> +#ifndef _GVE_LOGS_H_
>>> +#define _GVE_LOGS_H_
>>> +
>>> +extern int gve_logtype_init;
>>> +extern int gve_logtype_driver;
>>> +
>>> +#define PMD_INIT_LOG(level, fmt, args...) \
>>> +       rte_log(RTE_LOG_ ## level, gve_logtype_init, "%s(): " fmt "\n", \
>>> +               __func__, ##args)
>>> +
>>> +#define PMD_DRV_LOG_RAW(level, fmt, args...) \
>>> +       rte_log(RTE_LOG_ ## level, gve_logtype_driver, "%s(): " fmt, \
>>> +               __func__, ## args)
>>> +
>>> +#define PMD_DRV_LOG(level, fmt, args...) \
>>> +       PMD_DRV_LOG_RAW(level, fmt "\n", ## args)
>>> +
>>
>> Why is 'PMD_DRV_LOG_RAW' needed, why not directly use
>> 'PMD_DRV_LOG'?
> 
> It seems that the _RAW macro was first introduced in the i40e driver logs file.
> Since the trailing '\n' is sometimes already added at the end of the log message
> in the base code, the PMD_DRV_LOG_RAW macro, which does not add one, is
> used to keep the newline character consistent.
> 
> Well, it looks like the PMD_DRV_LOG_RAW macro is somewhat redundant.
> I think it's ok to remove PMD_DRV_LOG_RAW and keep all the log messages
> ending without the trailing '\n'. Thanks!
> 

Or you can add '\n' to 'PMD_DRV_LOG', to not change all logs. Only 
having two macros seems unnecessary.

>>
>>
>> Do you really need two different log types? How do you differentiate
>> 'init' & 'driver' types? As far as I can see there is mixed usage of them.
> 
> The PMD_INIT_LOG is used at the init stage, while the PMD_DRV_LOG
> is used at the driver normal running stage. I agree that there might be
> mixed usage of these two macros. I'll try to check all these usages and
> update them at correct conditions in the coming versions.
> If you insist that only one log type is needed to keep the code clean,
> then I could update them as you expected. Thanks!
> 

I do not insist, but it looks like you are complicating things; is there
really a benefit to having two different log types?



^ permalink raw reply	[flat|nested] 192+ messages in thread

* RE: [PATCH v2 02/10] net/gve: add logs and OS specific implementation
  2022-09-07 11:16           ` Ferruh Yigit
@ 2022-09-08  8:09             ` Guo, Junfeng
  0 siblings, 0 replies; 192+ messages in thread
From: Guo, Junfeng @ 2022-09-08  8:09 UTC (permalink / raw)
  To: Ferruh Yigit, Zhang, Qi Z, Wu, Jingjing
  Cc: dev, Li, Xiaoyun, awogbemila, Richardson, Bruce, Wang, Haiyue



> -----Original Message-----
> From: Ferruh Yigit <ferruh.yigit@xilinx.com>
> Sent: Wednesday, September 7, 2022 19:17
> To: Guo, Junfeng <junfeng.guo@intel.com>; Zhang, Qi Z
> <qi.z.zhang@intel.com>; Wu, Jingjing <jingjing.wu@intel.com>
> Cc: dev@dpdk.org; Li, Xiaoyun <xiaoyun.li@intel.com>;
> awogbemila@google.com; Richardson, Bruce
> <bruce.richardson@intel.com>; Wang, Haiyue <haiyue.wang@intel.com>
> Subject: Re: [PATCH v2 02/10] net/gve: add logs and OS specific
> implementation
> 
> On 9/7/2022 7:58 AM, Guo, Junfeng wrote:
> > CAUTION: This message has originated from an External Source. Please
> > use proper judgment and caution when opening attachments, clicking
> > links, or responding to this email.
> >
> >
> >> -----Original Message-----
> >> From: Ferruh Yigit <ferruh.yigit@xilinx.com>
> >> Sent: Friday, September 2, 2022 01:21
> >> To: Guo, Junfeng <junfeng.guo@intel.com>; Zhang, Qi Z
> >> <qi.z.zhang@intel.com>; Wu, Jingjing <jingjing.wu@intel.com>
> >> Cc: dev@dpdk.org; Li, Xiaoyun <xiaoyun.li@intel.com>;
> >> awogbemila@google.com; Richardson, Bruce
> >> <bruce.richardson@intel.com>; Wang, Haiyue
> <haiyue.wang@intel.com>
> >> Subject: Re: [PATCH v2 02/10] net/gve: add logs and OS specific
> >> implementation
> >>
> >> On 8/29/2022 9:41 AM, Junfeng Guo wrote:
> >>
> >>>
> >>> Add GVE PMD logs.
> >>> Add some MACRO definitions and memory operations which are specific
> >>> for DPDK.
> >>>
> >>> Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
> >>> Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com>
> >>> Signed-off-by: Junfeng Guo <junfeng.guo@intel.com>
> >>
> >> <...>
> >>
> >>> diff --git a/drivers/net/gve/gve_logs.h b/drivers/net/gve/gve_logs.h
> >>> new file mode 100644
> >>> index 0000000000..a050253f59
> >>> --- /dev/null
> >>> +++ b/drivers/net/gve/gve_logs.h
> >>> @@ -0,0 +1,22 @@
> >>> +/* SPDX-License-Identifier: BSD-3-Clause
> >>> + * Copyright(C) 2022 Intel Corporation
> >>> + */
> >>> +
> >>> +#ifndef _GVE_LOGS_H_
> >>> +#define _GVE_LOGS_H_
> >>> +
> >>> +extern int gve_logtype_init;
> >>> +extern int gve_logtype_driver;
> >>> +
> >>> +#define PMD_INIT_LOG(level, fmt, args...) \
> >>> +       rte_log(RTE_LOG_ ## level, gve_logtype_init, "%s(): " fmt "\n", \
> >>> +               __func__, ##args)
> >>> +
> >>> +#define PMD_DRV_LOG_RAW(level, fmt, args...) \
> >>> +       rte_log(RTE_LOG_ ## level, gve_logtype_driver, "%s(): " fmt, \
> >>> +               __func__, ## args)
> >>> +
> >>> +#define PMD_DRV_LOG(level, fmt, args...) \
> >>> +       PMD_DRV_LOG_RAW(level, fmt "\n", ## args)
> >>> +
> >>
> >> Why is 'PMD_DRV_LOG_RAW' needed, why not directly use
> >> 'PMD_DRV_LOG'?
> >
> > It seems that the _RAW macro was first introduced in the i40e driver
> > logs file.
> > Since the trailing '\n' is sometimes already added at the end of the log
> > message in the base code, the PMD_DRV_LOG_RAW macro, which does not
> > add one, is used to keep the newline character consistent.
> >
> > Well, it looks like the PMD_DRV_LOG_RAW macro is somewhat redundant.
> > I think it's ok to remove PMD_DRV_LOG_RAW and keep all the log messages
> > ending without the trailing '\n'. Thanks!
> >
> 
> Or you can add '\n' to 'PMD_DRV_LOG', to not change all logs. Only
> having two macros seems unnecessary.

Yes, I already did it this way in the coming version of the gve pmd code. Thanks!

> 
> >>
> >>
> >> Do you really need two different log types? How do you differentiate
> >> 'init' & 'driver' types? As far as I can see there is mixed usage of them.
> >
> > The PMD_INIT_LOG is used at the init stage, while the PMD_DRV_LOG
> > is used at the driver normal running stage. I agree that there might be
> > mixed usage of these two macros. I'll try to check all these usages and
> > update them at correct conditions in the coming versions.
> > If you insist that only one log type is needed to keep the code clean,
> > then I could update them as you expected. Thanks!
> >
> 
> I do not insist, but it looks like you are complicating things; is there
> really a benefit to having two different log types?

Well, these two types may be used to show init/driver logs, respectively.
But it seems that there is no specific need to use two log types in the
GVE PMD. Anyway, I think it is a good time to keep the code clean and not
just inherit from previous drivers. We can add a new log type in the future
if it's required. Thanks!
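
For reference, a minimal sketch of the consolidated logging being discussed,
assuming a single logtype and one macro that always appends '\n' (names are
illustrative only, not the final driver code):

/* gve_logs.h (sketch only) */
#ifndef _GVE_LOGS_H_
#define _GVE_LOGS_H_

#include <rte_log.h>

extern int gve_logtype_driver;

#define PMD_DRV_LOG(level, fmt, args...) \
	rte_log(RTE_LOG_ ## level, gve_logtype_driver, "%s(): " fmt "\n", \
		__func__, ## args)

#endif /* _GVE_LOGS_H_ */

/* gve_ethdev.c (sketch only): register the single logtype */
RTE_LOG_REGISTER_SUFFIX(gve_logtype_driver, driver, NOTICE);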

^ permalink raw reply	[flat|nested] 192+ messages in thread

* Re: [PATCH v2 01/10] net/gve: introduce GVE PMD base code
  2022-09-06  9:31             ` Guo, Junfeng
@ 2022-09-14 10:38               ` Thomas Monjalon
  0 siblings, 0 replies; 192+ messages in thread
From: Thomas Monjalon @ 2022-09-14 10:38 UTC (permalink / raw)
  To: Ferruh Yigit, techboard, Guo, Junfeng, Stephen Hemminger
  Cc: Zhang, Qi Z, Wu, Jingjing, Hemant Agrawal, dev, Li, Xiaoyun,
	awogbemila, Richardson, Bruce, Wang, Haiyue

06/09/2022 11:31, Guo, Junfeng:
> From: Thomas Monjalon <thomas@monjalon.net>
> > 01/09/2022 20:23, Stephen Hemminger:
> > > On Thu, 1 Sep 2022 18:19:22 +0100
> > > Ferruh Yigit <ferruh.yigit@xilinx.com> wrote:
> > >
> > > > >
> > > > > diff --git a/drivers/net/gve/gve_adminq.c b/drivers/net/gve/gve_adminq.c
> > > > > new file mode 100644
> > > > > index 0000000000..8a724f12c6
> > > > > --- /dev/null
> > > > > +++ b/drivers/net/gve/gve_adminq.c
> > > > > @@ -0,0 +1,925 @@
> > > > > +/* SPDX-License-Identifier: MIT
> > > > > + * Google Virtual Ethernet (gve) driver
> > > > > + * Version: 1.3.0
> > > > > + * Copyright (C) 2015-2022 Google, Inc.
> > > > > + * Copyright(C) 2022 Intel Corporation
> > > > > + */
> > > > > +
> > > >
> > > > Can you please get approval for the MIT license from techboard, as
> > > > Stephen highlighted in previous version?
> > >
> > >
> > > I would prefer that it be BSD or dual licensed.
> > > Although MIT and BSD-3 licenses are compatible, this is not something
> > > techboard can decide
> > > it requires a statement from a knowledgeable open source lawyer (Intel
> > > or LF).
> > >
> > > Please fix the license to BSD and save lots of trouble.
> > 
> > +1 to change to BSD to avoid trouble.
> 
> Thanks for your concern and comments!
> Yes, we are also willing to have this base code under a BSD license.
> 
> Note that this code is not Intel's; it comes from the kernel community.
> Everyone can reach the code at:
> https://github.com/GoogleCloudPlatform/compute-virtual-ethernet-linux/tree/v1.3.0.
> The base code there carries the statement SPDX-License-Identifier: (GPL-2.0 OR MIT).
> 
> Thus, we may not be in a good position to re-license this code,
> and we did not find a BSD-licensed version in any open community,
> so we just follow the required MIT license as an exception for DPDK.

I understand we are in trouble here.
We need the techboard to decide what to do.
If we want to go in the MIT direction, we need to ask the govboard.



^ permalink raw reply	[flat|nested] 192+ messages in thread

* RE: [PATCH v2 03/10] net/gve: support device initialization
  2022-09-01 17:21       ` Ferruh Yigit
@ 2022-09-23  9:38         ` Guo, Junfeng
  0 siblings, 0 replies; 192+ messages in thread
From: Guo, Junfeng @ 2022-09-23  9:38 UTC (permalink / raw)
  To: Ferruh Yigit, Zhang, Qi Z, Wu, Jingjing
  Cc: dev, Li, Xiaoyun, awogbemila, Richardson, Bruce, Wang, Haiyue



> -----Original Message-----
> From: Ferruh Yigit <ferruh.yigit@xilinx.com>
> Sent: Friday, September 2, 2022 01:22
> To: Guo, Junfeng <junfeng.guo@intel.com>; Zhang, Qi Z
> <qi.z.zhang@intel.com>; Wu, Jingjing <jingjing.wu@intel.com>
> Cc: dev@dpdk.org; Li, Xiaoyun <xiaoyun.li@intel.com>;
> awogbemila@google.com; Richardson, Bruce
> <bruce.richardson@intel.com>; Wang, Haiyue <haiyue.wang@intel.com>
> Subject: Re: [PATCH v2 03/10] net/gve: support device initialization
> 
> On 8/29/2022 9:41 AM, Junfeng Guo wrote:
> 
> >
> > Support device init and the following dev_ops:
> >    - dev_configure
> >    - dev_start
> >    - dev_stop
> >    - dev_close
> >
> > Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
> > Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com>
> > Signed-off-by: Junfeng Guo <junfeng.guo@intel.com>
> > ---
> >   drivers/net/gve/gve.h        | 249 +++++++++++++++++++++++
> >   drivers/net/gve/gve_adminq.c |   1 +
> >   drivers/net/gve/gve_ethdev.c | 375 +++++++++++++++++++++++++++++++++++
> >   drivers/net/gve/meson.build  |  13 ++
> >   drivers/net/gve/version.map  |   3 +
> >   drivers/net/meson.build      |   1 +
> >   6 files changed, 642 insertions(+)
> >   create mode 100644 drivers/net/gve/gve.h
> >   create mode 100644 drivers/net/gve/gve_ethdev.c
> >   create mode 100644 drivers/net/gve/meson.build
> >   create mode 100644 drivers/net/gve/version.map
> >
> > diff --git a/drivers/net/gve/gve.h b/drivers/net/gve/gve.h
> > new file mode 100644
> > index 0000000000..704c88983c
> > --- /dev/null
> > +++ b/drivers/net/gve/gve.h
> > @@ -0,0 +1,249 @@
> > +/* SPDX-License-Identifier: BSD-3-Clause
> > + * Copyright(C) 2022 Intel Corporation
> > + */
> > +
> > +#ifndef _GVE_H_
> > +#define _GVE_H_
> > +
> > +#include <ethdev_driver.h>
> > +#include <ethdev_pci.h>
> > +#include <rte_ether.h>
> > +
> > +#include "gve_desc.h"
> > +
> > +#ifndef GOOGLE_VENDOR_ID
> > +#define GOOGLE_VENDOR_ID       0x1ae0
> > +#endif
> > +
> > +#define GVE_DEV_ID             0x0042
> > +
> > +#define GVE_REG_BAR    0
> > +#define GVE_DB_BAR     2
> > +
> > +/* 1 for management, 1 for rx, 1 for tx */
> > +#define GVE_MIN_MSIX           3
> > +
> > +/* PTYPEs are always 10 bits. */
> > +#define GVE_NUM_PTYPES 1024
> > +
> > +/* A list of pages registered with the device during setup and used by a queue
> > + * as buffers
> > + */
> > +struct gve_queue_page_list {
> > +       uint32_t id; /* unique id */
> > +       uint32_t num_entries;
> > +       dma_addr_t *page_buses; /* the dma addrs of the pages */
> > +       const struct rte_memzone *mz;
> > +};
> > +
> > +/* A TX desc ring entry */
> > +union gve_tx_desc {
> > +       struct gve_tx_pkt_desc pkt; /* first desc for a packet */
> > +       struct gve_tx_seg_desc seg; /* subsequent descs for a packet */
> > +};
> > +
> > +struct gve_tx_queue {
> > +       volatile union gve_tx_desc *tx_desc_ring;
> > +       const struct rte_memzone *mz;
> > +       uint64_t tx_ring_phys_addr;
> > +
> > +       uint16_t nb_tx_desc;
> > +
> > +       /* Only valid for DQO_QPL queue format */
> > +       struct gve_queue_page_list *qpl;
> > +
> > +       uint16_t port_id;
> > +       uint16_t queue_id;
> > +
> > +       uint16_t ntfy_id;
> > +       volatile rte_be32_t *ntfy_addr;
> > +
> > +       struct gve_priv *hw;
> > +       const struct rte_memzone *qres_mz;
> > +       struct gve_queue_resources *qres;
> > +
> > +       /* Only valid for DQO_RDA queue format */
> > +       struct gve_tx_queue *complq;
> > +};
> > +
> > +struct gve_rx_queue {
> > +       volatile struct gve_rx_desc *rx_desc_ring;
> > +       volatile union gve_rx_data_slot *rx_data_ring;
> > +       const struct rte_memzone *mz;
> > +       const struct rte_memzone *data_mz;
> > +       uint64_t rx_ring_phys_addr;
> > +
> > +       uint16_t nb_rx_desc;
> > +
> > +       volatile rte_be32_t *ntfy_addr;
> > +
> > +       /* only valid for GQI_QPL queue format */
> > +       struct gve_queue_page_list *qpl;
> > +
> > +       struct gve_priv *hw;
> > +       const struct rte_memzone *qres_mz;
> > +       struct gve_queue_resources *qres;
> > +
> > +       uint16_t port_id;
> > +       uint16_t queue_id;
> > +       uint16_t ntfy_id;
> > +       uint16_t rx_buf_len;
> > +
> > +       /* Only valid for DQO_RDA queue format */
> > +       struct gve_rx_queue *bufq;
> > +};
> > +
> > +struct gve_irq_db {
> > +       rte_be32_t id;
> > +} ____cacheline_aligned;
> > +
> > +struct gve_ptype {
> > +       uint8_t l3_type;  /* `gve_l3_type` in gve_adminq.h */
> > +       uint8_t l4_type;  /* `gve_l4_type` in gve_adminq.h */
> > +};
> > +
> > +struct gve_ptype_lut {
> > +       struct gve_ptype ptypes[GVE_NUM_PTYPES];
> > +};
> > +
> > +enum gve_queue_format {
> > +       GVE_QUEUE_FORMAT_UNSPECIFIED = 0x0, /* default unspecified */
> > +       GVE_GQI_RDA_FORMAT           = 0x1, /* GQI Raw Addressing */
> > +       GVE_GQI_QPL_FORMAT           = 0x2, /* GQI Queue Page List */
> > +       GVE_DQO_RDA_FORMAT           = 0x3, /* DQO Raw Addressing */
> > +};
> > +
> 
> Shouldn't this queue format information be part of the 'base' file? Both
> for licensing issues and to cover the case where it is updated in the
> google repo.
> But if some dpdk related information is required, what do you think about
> splitting this file into one for the base folder and another for dpdk?

Yes, the current solution is to move this base-related code into a separate
gve.h file and rename this file to gve_ethdev.h to match the dpdk style.
Will update the solution in the coming version.
Thanks!
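
For illustration, the split could look like this (a sketch only; the actual
contents are settled in the later revisions of the series):

/* drivers/net/gve/gve_ethdev.h (sketch): DPDK-facing definitions */
#ifndef _GVE_ETHDEV_H_
#define _GVE_ETHDEV_H_

#include <ethdev_driver.h>

#include "gve.h"	/* base (MIT-licensed) definitions stay in gve.h */

/* DPDK-specific queue and device structs live here, not in the base header */
struct gve_tx_queue;
struct gve_rx_queue;

#endif /* _GVE_ETHDEV_H_ */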

> 
> > +struct gve_priv {
> > +       struct gve_irq_db *irq_dbs; /* array of num_ntfy_blks */
> > +       const struct rte_memzone *irq_dbs_mz;
> > +       uint32_t mgmt_msix_idx;
> > +       rte_be32_t *cnt_array; /* array of num_event_counters */
> > +       const struct rte_memzone *cnt_array_mz;
> > +
> > +       uint16_t num_event_counters;
> > +       uint16_t tx_desc_cnt; /* txq size */
> > +       uint16_t rx_desc_cnt; /* rxq size */
> > +       uint16_t tx_pages_per_qpl; /* tx buffer length */
> > +       uint16_t rx_data_slot_cnt; /* rx buffer length */
> > +
> > +       /* Only valid for DQO_RDA queue format */
> > +       uint16_t tx_compq_size; /* tx completion queue size */
> > +       uint16_t rx_bufq_size; /* rx buff queue size */
> > +
> > +       uint64_t max_registered_pages;
> > +       uint64_t num_registered_pages; /* num pages registered with NIC */
> > +       uint16_t default_num_queues; /* default num queues to set up */
> > +       enum gve_queue_format queue_format; /* see enum gve_queue_format */
> > +       uint8_t enable_lsc;
> > +
> > +       uint16_t max_nb_txq;
> > +       uint16_t max_nb_rxq;
> > +       uint32_t num_ntfy_blks; /* spilt between TX and RX so must be even */
> > +
> > +       struct gve_registers __iomem *reg_bar0; /* see gve_register.h */
> > +       rte_be32_t __iomem *db_bar2; /* "array" of doorbells */
> > +       struct rte_pci_device *pci_dev;
> > +
> > +       /* Admin queue - see gve_adminq.h*/
> > +       union gve_adminq_command *adminq;
> > +       struct gve_dma_mem adminq_dma_mem;
> > +       uint32_t adminq_mask; /* masks prod_cnt to adminq size */
> > +       uint32_t adminq_prod_cnt; /* free-running count of AQ cmds executed */
> > +       uint32_t adminq_cmd_fail; /* free-running count of AQ cmds failed */
> > +       uint32_t adminq_timeouts; /* free-running count of AQ cmds timeouts */
> > +       /* free-running count of per AQ cmd executed */
> > +       uint32_t adminq_describe_device_cnt;
> > +       uint32_t adminq_cfg_device_resources_cnt;
> > +       uint32_t adminq_register_page_list_cnt;
> > +       uint32_t adminq_unregister_page_list_cnt;
> > +       uint32_t adminq_create_tx_queue_cnt;
> > +       uint32_t adminq_create_rx_queue_cnt;
> > +       uint32_t adminq_destroy_tx_queue_cnt;
> > +       uint32_t adminq_destroy_rx_queue_cnt;
> > +       uint32_t adminq_dcfg_device_resources_cnt;
> > +       uint32_t adminq_set_driver_parameter_cnt;
> > +       uint32_t adminq_report_stats_cnt;
> > +       uint32_t adminq_report_link_speed_cnt;
> > +       uint32_t adminq_get_ptype_map_cnt;
> > +
> > +       volatile uint32_t state_flags;
> > +
> > +       /* Gvnic device link speed from hypervisor. */
> > +       uint64_t link_speed;
> > +
> > +       uint16_t max_mtu;
> > +       struct rte_ether_addr dev_addr; /* mac address */
> > +
> > +       struct gve_queue_page_list *qpl;
> > +
> > +       struct gve_tx_queue **txqs;
> > +       struct gve_rx_queue **rxqs;
> > +};
> > +
> 
> The device private data is provided fully here, as well as the
> 'gve_rx_queue' and 'gve_tx_queue' structs etc., although most of the
> fields are not used at this stage and it is not clear why they are needed.
> 
> Instead of adding full structs, can you please add here bare minimum
> structs, with only whatever is used at this point, and as fields are used
> in the .c file, keep adding them to the structs too?
> This clarifies what is used/added for specific features, and also lets us
> easily figure out unused fields / clutter in the header files.

Sure, will update in the coming version. Thanks!
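
To make the "bare minimum" idea concrete, a sketch of what the init-stage
private data could be reduced to, keeping only fields the quoted code already
uses at this point (illustrative only, not the final layout):

struct gve_priv {
	struct gve_registers __iomem *reg_bar0; /* see gve_register.h */
	rte_be32_t __iomem *db_bar2;            /* "array" of doorbells */
	struct rte_pci_device *pci_dev;

	/* Admin queue - see gve_adminq.h */
	union gve_adminq_command *adminq;
	struct gve_dma_mem adminq_dma_mem;
	uint32_t adminq_mask;
	uint32_t adminq_prod_cnt;

	volatile uint32_t state_flags;
	uint16_t max_mtu;
	struct rte_ether_addr dev_addr; /* mac address */

	/* Rx/Tx queue members get added by the patches that use them. */
};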

> 
> > +enum gve_state_flags_bit {
> > +       GVE_PRIV_FLAGS_ADMIN_QUEUE_OK           = 1,
> > +       GVE_PRIV_FLAGS_DEVICE_RESOURCES_OK      = 2,
> > +       GVE_PRIV_FLAGS_DEVICE_RINGS_OK          = 3,
> > +       GVE_PRIV_FLAGS_NAPI_ENABLED             = 4,
> > +};
> > +
> > +static inline bool gve_is_gqi(struct gve_priv *priv)
> > +{
> > +       return priv->queue_format == GVE_GQI_RDA_FORMAT ||
> > +               priv->queue_format == GVE_GQI_QPL_FORMAT;
> > +}
> > +
> > +static inline bool gve_get_admin_queue_ok(struct gve_priv *priv)
> > +{
> > +       return !!rte_bit_relaxed_get32(GVE_PRIV_FLAGS_ADMIN_QUEUE_OK,
> > +                                      &priv->state_flags);
> > +}
> > +
> > +static inline void gve_set_admin_queue_ok(struct gve_priv *priv)
> > +{
> > +       rte_bit_relaxed_set32(GVE_PRIV_FLAGS_ADMIN_QUEUE_OK,
> > +                             &priv->state_flags);
> > +}
> > +
> > +static inline void gve_clear_admin_queue_ok(struct gve_priv *priv)
> > +{
> > +       rte_bit_relaxed_clear32(GVE_PRIV_FLAGS_ADMIN_QUEUE_OK,
> > +                               &priv->state_flags);
> > +}
> > +
> > +static inline bool gve_get_device_resources_ok(struct gve_priv *priv)
> > +{
> > +       return !!rte_bit_relaxed_get32(GVE_PRIV_FLAGS_DEVICE_RESOURCES_OK,
> > +                                      &priv->state_flags);
> > +}
> > +
> > +static inline void gve_set_device_resources_ok(struct gve_priv *priv)
> > +{
> > +       rte_bit_relaxed_set32(GVE_PRIV_FLAGS_DEVICE_RESOURCES_OK,
> > +                             &priv->state_flags);
> > +}
> > +
> > +static inline void gve_clear_device_resources_ok(struct gve_priv *priv)
> > +{
> > +       rte_bit_relaxed_clear32(GVE_PRIV_FLAGS_DEVICE_RESOURCES_OK,
> > +                               &priv->state_flags);
> > +}
> > +
> > +static inline bool gve_get_device_rings_ok(struct gve_priv *priv)
> > +{
> > +       return !!rte_bit_relaxed_get32(GVE_PRIV_FLAGS_DEVICE_RINGS_OK,
> > +                                      &priv->state_flags);
> > +}
> > +
> > +static inline void gve_set_device_rings_ok(struct gve_priv *priv)
> > +{
> > +       rte_bit_relaxed_set32(GVE_PRIV_FLAGS_DEVICE_RINGS_OK,
> > +                             &priv->state_flags);
> > +}
> > +
> > +static inline void gve_clear_device_rings_ok(struct gve_priv *priv)
> > +{
> > +       rte_bit_relaxed_clear32(GVE_PRIV_FLAGS_DEVICE_RINGS_OK,
> > +                               &priv->state_flags);
> > +}
> 
> If this is a dpdk file, not a base file, please follow the dpdk coding
> convention, e.g. the return type should be on a separate line.

Sure, will update in the coming version patchset. Thanks!
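
For illustration, the DPDK-style layout referred to above puts the return
type on its own line; a sketch based on one of the quoted helpers:

static inline bool
gve_is_gqi(struct gve_priv *priv)
{
	return priv->queue_format == GVE_GQI_RDA_FORMAT ||
	       priv->queue_format == GVE_GQI_QPL_FORMAT;
}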

> 
> > +#endif /* _GVE_H_ */
> > diff --git a/drivers/net/gve/gve_adminq.c b/drivers/net/gve/gve_adminq.c
> > index 8a724f12c6..438ca2070e 100644
> > --- a/drivers/net/gve/gve_adminq.c
> > +++ b/drivers/net/gve/gve_adminq.c
> > @@ -5,6 +5,7 @@
> >    * Copyright(C) 2022 Intel Corporation
> >    */
> >
> > +#include "gve.h"
> >   #include "gve_adminq.h"
> >   #include "gve_register.h"
> >
> > diff --git a/drivers/net/gve/gve_ethdev.c b/drivers/net/gve/gve_ethdev.c
> > new file mode 100644
> > index 0000000000..f10f273f7d
> > --- /dev/null
> > +++ b/drivers/net/gve/gve_ethdev.c
> > @@ -0,0 +1,375 @@
> > +/* SPDX-License-Identifier: BSD-3-Clause
> > + * Copyright(C) 2022 Intel Corporation
> > + */
> > +#include <linux/pci_regs.h>
> > +
> > +#include "gve.h"
> > +#include "gve_adminq.h"
> > +#include "gve_register.h"
> > +
> > +#define GVE_VERSION            "1.3.0"
> > +#define GVE_VERSION_PREFIX     "GVE-"
> > +
> 
> Again, shouldn't these come from the base file (google repo)?
> 
> Perhaps it would be good to discuss what the base file update
> strategy/plan is for the future?
> It can be easier if you can drop some external files with minimum change
> in the dpdk files; that can be done by grouping external content in
> specific files.
> Qi has lots of experience on this and I believe he can provide useful
> insight.

Yes, we need to keep the base files in a base folder in the future.
Thanks!

> 
> > +const char gve_version_str[] = GVE_VERSION;
> > +static const char gve_version_prefix[] = GVE_VERSION_PREFIX;
> > +
> > +static void
> > +gve_write_version(uint8_t *driver_version_register)
> > +{
> > +       const char *c = gve_version_prefix;
> > +
> > +       while (*c) {
> > +               writeb(*c, driver_version_register);
> > +               c++;
> > +       }
> > +
> > +       c = gve_version_str;
> > +       while (*c) {
> > +               writeb(*c, driver_version_register);
> > +               c++;
> > +       }
> > +       writeb('\n', driver_version_register);
> > +}
> > +
> > +static int
> > +gve_dev_configure(__rte_unused struct rte_eth_dev *dev)
> > +{
> > +       return 0;
> > +}
> > +
> > +static int
> > +gve_dev_start(struct rte_eth_dev *dev)
> > +{
> > +       dev->data->dev_started = 1;
> > +
> > +       return 0;
> > +}
> > +
> > +static int
> > +gve_dev_stop(struct rte_eth_dev *dev)
> > +{
> > +       dev->data->dev_link.link_status = RTE_ETH_LINK_DOWN;
> > +       dev->data->dev_started = 0;
> > +
> > +       return 0;
> > +}
> > +
> > +static int
> > +gve_dev_close(struct rte_eth_dev *dev)
> > +{
> > +       int err = 0;
> > +
> > +       if (dev->data->dev_started) {
> > +               err = gve_dev_stop(dev);
> > +               if (err != 0)
> > +                       PMD_DRV_LOG(ERR, "Failed to stop dev.");
> > +       }
> > +
> 
> Just a reminder that in 'close' the driver should free all the resources; if
> there is previously allocated memory, it should be freed now.

Thanks! Will update in the coming version.
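
A minimal sketch of what a resource-releasing close could look like at this
stage of the series, assuming the only allocations so far are the admin queue
and the device resources configured at init (helper names follow the quoted
adminq code; later patches would also free queue memzones here):

static int
gve_dev_close(struct rte_eth_dev *dev)
{
	struct gve_priv *priv = dev->data->dev_private;
	int err = 0;

	if (dev->data->dev_started) {
		err = gve_dev_stop(dev);
		if (err != 0)
			PMD_DRV_LOG(ERR, "Failed to stop dev.");
	}

	/* Release what init allocated: device resources, then the adminq. */
	gve_adminq_deconfigure_device_resources(priv);
	gve_adminq_free(priv);

	return err;
}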


^ permalink raw reply	[flat|nested] 192+ messages in thread

* RE: [PATCH v2 04/10] net/gve: add link update support
  2022-09-01 17:23       ` Ferruh Yigit
@ 2022-09-23  9:38         ` Guo, Junfeng
  0 siblings, 0 replies; 192+ messages in thread
From: Guo, Junfeng @ 2022-09-23  9:38 UTC (permalink / raw)
  To: Ferruh Yigit, Zhang, Qi Z, Wu, Jingjing
  Cc: dev, Li, Xiaoyun, awogbemila, Richardson, Bruce



> -----Original Message-----
> From: Ferruh Yigit <ferruh.yigit@xilinx.com>
> Sent: Friday, September 2, 2022 01:23
> To: Guo, Junfeng <junfeng.guo@intel.com>; Zhang, Qi Z
> <qi.z.zhang@intel.com>; Wu, Jingjing <jingjing.wu@intel.com>
> Cc: dev@dpdk.org; Li, Xiaoyun <xiaoyun.li@intel.com>;
> awogbemila@google.com; Richardson, Bruce
> <bruce.richardson@intel.com>
> Subject: Re: [PATCH v2 04/10] net/gve: add link update support
> 
> On 8/29/2022 9:41 AM, Junfeng Guo wrote:
> 
> >
> > Support dev_ops link_update.
> >
> > Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com>
> > Signed-off-by: Junfeng Guo <junfeng.guo@intel.com>
> > ---
> >   drivers/net/gve/gve_ethdev.c | 30 ++++++++++++++++++++++++++++++
> >   1 file changed, 30 insertions(+)
> >
> > diff --git a/drivers/net/gve/gve_ethdev.c b/drivers/net/gve/gve_ethdev.c
> > index f10f273f7d..435115c047 100644
> > --- a/drivers/net/gve/gve_ethdev.c
> > +++ b/drivers/net/gve/gve_ethdev.c
> > @@ -37,10 +37,39 @@ gve_dev_configure(__rte_unused struct rte_eth_dev *dev)
> >          return 0;
> >   }
> >
> > +static int
> > +gve_link_update(struct rte_eth_dev *dev, __rte_unused int wait_to_complete)
> > +{
> > +       struct gve_priv *priv = dev->data->dev_private;
> > +       struct rte_eth_link link;
> > +       int err;
> > +
> > +       memset(&link, 0, sizeof(link));
> > +       link.link_duplex = RTE_ETH_LINK_FULL_DUPLEX;
> > +       link.link_autoneg = RTE_ETH_LINK_AUTONEG;
> > +
> > +       if (!dev->data->dev_started) {
> > +               link.link_status = RTE_ETH_LINK_DOWN;
> > +               link.link_speed = RTE_ETH_SPEED_NUM_NONE;
> > +       } else {
> > +               link.link_status = RTE_ETH_LINK_UP;
> > +               PMD_INIT_LOG(DEBUG, "Get link status from hw");
> > +               err = gve_adminq_report_link_speed(priv);
> 
> As far as I can see the API is calling an adminq command, is this
> command blocking until the link is up? If so, is there a non-blocking
> version to utilize 'wait_to_complete', instead of ignoring it?

Yes, getting the link speed via an adminq command here is a blocking
call, and that's the only method we can utilize. As for a non-blocking
version, it depends on the real behavior of the HW (or backend), which
is like a black box to us. There is no HW register that stores the link
status info. It seems 'wait_to_complete' cannot be used here. Thanks!
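
For context, the tail of gve_link_update() that is cut off in the quote above
would typically fill in the reported speed and then publish the link. Below is
a sketch of how such a function is commonly written in ethdev drivers,
assuming priv->link_speed is filled by gve_adminq_report_link_speed() as in
the quoted adminq code; it is not necessarily the exact code in the patch:

static int
gve_link_update(struct rte_eth_dev *dev, __rte_unused int wait_to_complete)
{
	struct gve_priv *priv = dev->data->dev_private;
	struct rte_eth_link link;
	int err;

	memset(&link, 0, sizeof(link));
	link.link_duplex = RTE_ETH_LINK_FULL_DUPLEX;
	link.link_autoneg = RTE_ETH_LINK_AUTONEG;

	if (!dev->data->dev_started) {
		link.link_status = RTE_ETH_LINK_DOWN;
		link.link_speed = RTE_ETH_SPEED_NUM_NONE;
	} else {
		link.link_status = RTE_ETH_LINK_UP;
		err = gve_adminq_report_link_speed(priv);
		if (err) {
			PMD_DRV_LOG(ERR, "Failed to get link speed.");
			priv->link_speed = RTE_ETH_SPEED_NUM_UNKNOWN;
		}
		link.link_speed = priv->link_speed;
	}

	/* Publish the result so concurrent readers see a consistent link. */
	return rte_eth_linkstatus_set(dev, &link);
}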

> 
> Also, what will happen if the 'start()' dev_ops is called but the cable is
> not plugged in at all, won't this still set the link status to "UP"?

Yes, that could be a terrible situation. Since this driver runs in a cloud
environment, we can only assume that the HW & backend parts
work well when we run our driver. Thanks!

> 


^ permalink raw reply	[flat|nested] 192+ messages in thread

* [PATCH v3 0/9] introduce GVE PMD
  2022-08-29  8:41     ` [PATCH v2 10/10] doc: update documentation Junfeng Guo
  2022-09-01 17:20       ` Ferruh Yigit
@ 2022-09-23  9:38       ` Junfeng Guo
  2022-09-23  9:38         ` [PATCH v3 1/9] net/gve: introduce GVE PMD base code Junfeng Guo
                           ` (8 more replies)
  1 sibling, 9 replies; 192+ messages in thread
From: Junfeng Guo @ 2022-09-23  9:38 UTC (permalink / raw)
  To: qi.z.zhang, jingjing.wu
  Cc: ferruh.yigit, dev, xiaoyun.li, awogbemila, bruce.richardson,
	xueqin.lin, junfeng.guo

Introduce a new PMD for Google Virtual Ethernet (GVE).

This patch set requires an exception for MIT license for GVE base code.
And the base code includes the following files:
 - gve_adminq.c
 - gve_adminq.h
 - gve_desc.h
 - gve_desc_dqo.h
 - gve_register.h

It's based on GVE kernel driver v1.3.0 and the original code is in
https://github.com/GoogleCloudPlatform/compute-virtual-ethernet-linux/tree/v1.3.0

v2:
fix some CI check errors.

v3:
refactor some code and fix some build errors.

Junfeng Guo (9):
  net/gve: introduce GVE PMD base code
  net/gve: add logs and OS specific implementation
  net/gve: support device initialization
  net/gve: add link update support
  net/gve: add MTU set support
  net/gve: add queue operations
  net/gve: add Rx/Tx support
  net/gve: add support to get dev info and configure dev
  net/gve: add stats support

 MAINTAINERS                            |   6 +
 doc/guides/nics/features/gve.ini       |  18 +
 doc/guides/nics/gve.rst                |  69 ++
 doc/guides/nics/index.rst              |   1 +
 doc/guides/rel_notes/release_22_11.rst |   5 +
 drivers/net/gve/gve.h                  |  58 ++
 drivers/net/gve/gve_adminq.c           | 926 +++++++++++++++++++++++++
 drivers/net/gve/gve_adminq.h           | 383 ++++++++++
 drivers/net/gve/gve_desc.h             | 139 ++++
 drivers/net/gve/gve_desc_dqo.h         | 256 +++++++
 drivers/net/gve/gve_ethdev.c           | 775 +++++++++++++++++++++
 drivers/net/gve/gve_ethdev.h           | 300 ++++++++
 drivers/net/gve/gve_logs.h             |  14 +
 drivers/net/gve/gve_osdep.h            | 159 +++++
 drivers/net/gve/gve_register.h         |  30 +
 drivers/net/gve/gve_rx.c               | 366 ++++++++++
 drivers/net/gve/gve_tx.c               | 682 ++++++++++++++++++
 drivers/net/gve/meson.build            |  15 +
 drivers/net/gve/version.map            |   3 +
 drivers/net/meson.build                |   1 +
 20 files changed, 4206 insertions(+)
 create mode 100644 doc/guides/nics/features/gve.ini
 create mode 100644 doc/guides/nics/gve.rst
 create mode 100644 drivers/net/gve/gve.h
 create mode 100644 drivers/net/gve/gve_adminq.c
 create mode 100644 drivers/net/gve/gve_adminq.h
 create mode 100644 drivers/net/gve/gve_desc.h
 create mode 100644 drivers/net/gve/gve_desc_dqo.h
 create mode 100644 drivers/net/gve/gve_ethdev.c
 create mode 100644 drivers/net/gve/gve_ethdev.h
 create mode 100644 drivers/net/gve/gve_logs.h
 create mode 100644 drivers/net/gve/gve_osdep.h
 create mode 100644 drivers/net/gve/gve_register.h
 create mode 100644 drivers/net/gve/gve_rx.c
 create mode 100644 drivers/net/gve/gve_tx.c
 create mode 100644 drivers/net/gve/meson.build
 create mode 100644 drivers/net/gve/version.map

-- 
2.34.1


^ permalink raw reply	[flat|nested] 192+ messages in thread

* [PATCH v3 1/9] net/gve: introduce GVE PMD base code
  2022-09-23  9:38       ` [PATCH v3 0/9] introduce GVE PMD Junfeng Guo
@ 2022-09-23  9:38         ` Junfeng Guo
  2022-09-23 18:57           ` Stephen Hemminger
                             ` (2 more replies)
  2022-09-23  9:38         ` [PATCH v3 2/9] net/gve: add logs and OS specific implementation Junfeng Guo
                           ` (7 subsequent siblings)
  8 siblings, 3 replies; 192+ messages in thread
From: Junfeng Guo @ 2022-09-23  9:38 UTC (permalink / raw)
  To: qi.z.zhang, jingjing.wu
  Cc: ferruh.yigit, dev, xiaoyun.li, awogbemila, bruce.richardson,
	xueqin.lin, junfeng.guo, Haiyue Wang

The following base code is based on Google Virtual Ethernet (gve)
driver v1.3.0 under MIT license.
 - gve_adminq.c
 - gve_adminq.h
 - gve_desc.h
 - gve_desc_dqo.h
 - gve_register.h
 - gve.h

The original code is in:
https://github.com/GoogleCloudPlatform/compute-virtual-ethernet-linux/\
tree/v1.3.0/google/gve

Note that this code is not Intel's and it comes from the kernel
community. The base code there carries the statement
SPDX-License-Identifier: (GPL-2.0 OR MIT). Here we just follow the
required MIT license as an exception for DPDK.

Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com>
Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Junfeng Guo <junfeng.guo@intel.com>
---
 drivers/net/gve/gve.h          |  58 +++
 drivers/net/gve/gve_adminq.c   | 925 +++++++++++++++++++++++++++++++++
 drivers/net/gve/gve_adminq.h   | 381 ++++++++++++++
 drivers/net/gve/gve_desc.h     | 137 +++++
 drivers/net/gve/gve_desc_dqo.h | 254 +++++++++
 drivers/net/gve/gve_register.h |  28 +
 6 files changed, 1783 insertions(+)
 create mode 100644 drivers/net/gve/gve.h
 create mode 100644 drivers/net/gve/gve_adminq.c
 create mode 100644 drivers/net/gve/gve_adminq.h
 create mode 100644 drivers/net/gve/gve_desc.h
 create mode 100644 drivers/net/gve/gve_desc_dqo.h
 create mode 100644 drivers/net/gve/gve_register.h

diff --git a/drivers/net/gve/gve.h b/drivers/net/gve/gve.h
new file mode 100644
index 0000000000..1b0d59b639
--- /dev/null
+++ b/drivers/net/gve/gve.h
@@ -0,0 +1,58 @@
+/* SPDX-License-Identifier: MIT
+ * Google Virtual Ethernet (gve) driver
+ * Version: 1.3.0
+ * Copyright (C) 2015-2022 Google, Inc.
+ * Copyright(C) 2022 Intel Corporation
+ */
+
+#ifndef _GVE_H_
+#define _GVE_H_
+
+#include "gve_desc.h"
+
+#define GVE_VERSION		"1.3.0"
+#define GVE_VERSION_PREFIX	"GVE-"
+
+#ifndef GOOGLE_VENDOR_ID
+#define GOOGLE_VENDOR_ID	0x1ae0
+#endif
+
+#define GVE_DEV_ID		0x0042
+
+#define GVE_REG_BAR		0
+#define GVE_DB_BAR		2
+
+/* 1 for management, 1 for rx, 1 for tx */
+#define GVE_MIN_MSIX		3
+
+/* PTYPEs are always 10 bits. */
+#define GVE_NUM_PTYPES		1024
+
+struct gve_irq_db {
+	rte_be32_t id;
+} ____cacheline_aligned;
+
+struct gve_ptype {
+	uint8_t l3_type;  /* `gve_l3_type` in gve_adminq.h */
+	uint8_t l4_type;  /* `gve_l4_type` in gve_adminq.h */
+};
+
+struct gve_ptype_lut {
+	struct gve_ptype ptypes[GVE_NUM_PTYPES];
+};
+
+enum gve_queue_format {
+	GVE_QUEUE_FORMAT_UNSPECIFIED = 0x0, /* default unspecified */
+	GVE_GQI_RDA_FORMAT	     = 0x1, /* GQI Raw Addressing */
+	GVE_GQI_QPL_FORMAT	     = 0x2, /* GQI Queue Page List */
+	GVE_DQO_RDA_FORMAT	     = 0x3, /* DQO Raw Addressing */
+};
+
+enum gve_state_flags_bit {
+	GVE_PRIV_FLAGS_ADMIN_QUEUE_OK		= 1,
+	GVE_PRIV_FLAGS_DEVICE_RESOURCES_OK	= 2,
+	GVE_PRIV_FLAGS_DEVICE_RINGS_OK		= 3,
+	GVE_PRIV_FLAGS_NAPI_ENABLED		= 4,
+};
+
+#endif /* _GVE_H_ */
diff --git a/drivers/net/gve/gve_adminq.c b/drivers/net/gve/gve_adminq.c
new file mode 100644
index 0000000000..06f2ac2315
--- /dev/null
+++ b/drivers/net/gve/gve_adminq.c
@@ -0,0 +1,925 @@
+/* SPDX-License-Identifier: MIT
+ * Google Virtual Ethernet (gve) driver
+ * Version: 1.3.0
+ * Copyright (C) 2015-2022 Google, Inc.
+ * Copyright(C) 2022 Intel Corporation
+ */
+
+#include "gve_adminq.h"
+#include "gve_register.h"
+
+#define GVE_MAX_ADMINQ_RELEASE_CHECK	500
+#define GVE_ADMINQ_SLEEP_LEN		20
+#define GVE_MAX_ADMINQ_EVENT_COUNTER_CHECK	100
+
+#define GVE_DEVICE_OPTION_ERROR_FMT "%s option error:\n" \
+"Expected: length=%d, feature_mask=%x.\n" \
+"Actual: length=%d, feature_mask=%x."
+
+#define GVE_DEVICE_OPTION_TOO_BIG_FMT "Length of %s option larger than expected. Possible older version of guest driver."
+
+static
+struct gve_device_option *gve_get_next_option(struct gve_device_descriptor *descriptor,
+					      struct gve_device_option *option)
+{
+	uintptr_t option_end, descriptor_end;
+
+	option_end = (uintptr_t)option + sizeof(*option) + be16_to_cpu(option->option_length);
+	descriptor_end = (uintptr_t)descriptor + be16_to_cpu(descriptor->total_length);
+
+	return option_end > descriptor_end ? NULL : (struct gve_device_option *)option_end;
+}
+
+static
+void gve_parse_device_option(struct gve_priv *priv,
+			     struct gve_device_option *option,
+			     struct gve_device_option_gqi_rda **dev_op_gqi_rda,
+			     struct gve_device_option_gqi_qpl **dev_op_gqi_qpl,
+			     struct gve_device_option_dqo_rda **dev_op_dqo_rda,
+			     struct gve_device_option_jumbo_frames **dev_op_jumbo_frames)
+{
+	u32 req_feat_mask = be32_to_cpu(option->required_features_mask);
+	u16 option_length = be16_to_cpu(option->option_length);
+	u16 option_id = be16_to_cpu(option->option_id);
+
+	/* If the length or feature mask doesn't match, continue without
+	 * enabling the feature.
+	 */
+	switch (option_id) {
+	case GVE_DEV_OPT_ID_GQI_RAW_ADDRESSING:
+		if (option_length != GVE_DEV_OPT_LEN_GQI_RAW_ADDRESSING ||
+		    req_feat_mask != GVE_DEV_OPT_REQ_FEAT_MASK_GQI_RAW_ADDRESSING) {
+			PMD_DRV_LOG(WARNING, GVE_DEVICE_OPTION_ERROR_FMT,
+				    "Raw Addressing",
+				    GVE_DEV_OPT_LEN_GQI_RAW_ADDRESSING,
+				    GVE_DEV_OPT_REQ_FEAT_MASK_GQI_RAW_ADDRESSING,
+				    option_length, req_feat_mask);
+			break;
+		}
+
+		PMD_DRV_LOG(INFO, "Gqi raw addressing device option enabled.");
+		priv->queue_format = GVE_GQI_RDA_FORMAT;
+		break;
+	case GVE_DEV_OPT_ID_GQI_RDA:
+		if (option_length < sizeof(**dev_op_gqi_rda) ||
+		    req_feat_mask != GVE_DEV_OPT_REQ_FEAT_MASK_GQI_RDA) {
+			PMD_DRV_LOG(WARNING, GVE_DEVICE_OPTION_ERROR_FMT,
+				    "GQI RDA", (int)sizeof(**dev_op_gqi_rda),
+				    GVE_DEV_OPT_REQ_FEAT_MASK_GQI_RDA,
+				    option_length, req_feat_mask);
+			break;
+		}
+
+		if (option_length > sizeof(**dev_op_gqi_rda)) {
+			PMD_DRV_LOG(WARNING,
+				    GVE_DEVICE_OPTION_TOO_BIG_FMT, "GQI RDA");
+		}
+		*dev_op_gqi_rda = RTE_PTR_ADD(option, sizeof(*option));
+		break;
+	case GVE_DEV_OPT_ID_GQI_QPL:
+		if (option_length < sizeof(**dev_op_gqi_qpl) ||
+		    req_feat_mask != GVE_DEV_OPT_REQ_FEAT_MASK_GQI_QPL) {
+			PMD_DRV_LOG(WARNING, GVE_DEVICE_OPTION_ERROR_FMT,
+				    "GQI QPL", (int)sizeof(**dev_op_gqi_qpl),
+				    GVE_DEV_OPT_REQ_FEAT_MASK_GQI_QPL,
+				    option_length, req_feat_mask);
+			break;
+		}
+
+		if (option_length > sizeof(**dev_op_gqi_qpl)) {
+			PMD_DRV_LOG(WARNING,
+				    GVE_DEVICE_OPTION_TOO_BIG_FMT, "GQI QPL");
+		}
+		*dev_op_gqi_qpl = RTE_PTR_ADD(option, sizeof(*option));
+		break;
+	case GVE_DEV_OPT_ID_DQO_RDA:
+		if (option_length < sizeof(**dev_op_dqo_rda) ||
+		    req_feat_mask != GVE_DEV_OPT_REQ_FEAT_MASK_DQO_RDA) {
+			PMD_DRV_LOG(WARNING, GVE_DEVICE_OPTION_ERROR_FMT,
+				    "DQO RDA", (int)sizeof(**dev_op_dqo_rda),
+				    GVE_DEV_OPT_REQ_FEAT_MASK_DQO_RDA,
+				    option_length, req_feat_mask);
+			break;
+		}
+
+		if (option_length > sizeof(**dev_op_dqo_rda)) {
+			PMD_DRV_LOG(WARNING,
+				    GVE_DEVICE_OPTION_TOO_BIG_FMT, "DQO RDA");
+		}
+		*dev_op_dqo_rda = RTE_PTR_ADD(option, sizeof(*option));
+		break;
+	case GVE_DEV_OPT_ID_JUMBO_FRAMES:
+		if (option_length < sizeof(**dev_op_jumbo_frames) ||
+		    req_feat_mask != GVE_DEV_OPT_REQ_FEAT_MASK_JUMBO_FRAMES) {
+			PMD_DRV_LOG(WARNING, GVE_DEVICE_OPTION_ERROR_FMT,
+				    "Jumbo Frames",
+				    (int)sizeof(**dev_op_jumbo_frames),
+				    GVE_DEV_OPT_REQ_FEAT_MASK_JUMBO_FRAMES,
+				    option_length, req_feat_mask);
+			break;
+		}
+
+		if (option_length > sizeof(**dev_op_jumbo_frames)) {
+			PMD_DRV_LOG(WARNING,
+				    GVE_DEVICE_OPTION_TOO_BIG_FMT,
+				    "Jumbo Frames");
+		}
+		*dev_op_jumbo_frames = RTE_PTR_ADD(option, sizeof(*option));
+		break;
+	default:
+		/* If we don't recognize the option just continue
+		 * without doing anything.
+		 */
+		PMD_DRV_LOG(DEBUG, "Unrecognized device option 0x%hx not enabled.\n",
+			    option_id);
+	}
+}
+
+/* Process all device options for a given describe device call. */
+static int
+gve_process_device_options(struct gve_priv *priv,
+			   struct gve_device_descriptor *descriptor,
+			   struct gve_device_option_gqi_rda **dev_op_gqi_rda,
+			   struct gve_device_option_gqi_qpl **dev_op_gqi_qpl,
+			   struct gve_device_option_dqo_rda **dev_op_dqo_rda,
+			   struct gve_device_option_jumbo_frames **dev_op_jumbo_frames)
+{
+	const int num_options = be16_to_cpu(descriptor->num_device_options);
+	struct gve_device_option *dev_opt;
+	int i;
+
+	/* The options struct directly follows the device descriptor. */
+	dev_opt = RTE_PTR_ADD(descriptor, sizeof(*descriptor));
+	for (i = 0; i < num_options; i++) {
+		struct gve_device_option *next_opt;
+
+		next_opt = gve_get_next_option(descriptor, dev_opt);
+		if (!next_opt) {
+			PMD_DRV_LOG(ERR,
+				    "options exceed device_descriptor's total length.\n");
+			return -EINVAL;
+		}
+
+		gve_parse_device_option(priv, dev_opt,
+					dev_op_gqi_rda, dev_op_gqi_qpl,
+					dev_op_dqo_rda, dev_op_jumbo_frames);
+		dev_opt = next_opt;
+	}
+
+	return 0;
+}
+
+int gve_adminq_alloc(struct gve_priv *priv)
+{
+	priv->adminq = gve_alloc_dma_mem(&priv->adminq_dma_mem, PAGE_SIZE);
+	if (unlikely(!priv->adminq))
+		return -ENOMEM;
+
+	priv->adminq_mask = (PAGE_SIZE / sizeof(union gve_adminq_command)) - 1;
+	priv->adminq_prod_cnt = 0;
+	priv->adminq_cmd_fail = 0;
+	priv->adminq_timeouts = 0;
+	priv->adminq_describe_device_cnt = 0;
+	priv->adminq_cfg_device_resources_cnt = 0;
+	priv->adminq_register_page_list_cnt = 0;
+	priv->adminq_unregister_page_list_cnt = 0;
+	priv->adminq_create_tx_queue_cnt = 0;
+	priv->adminq_create_rx_queue_cnt = 0;
+	priv->adminq_destroy_tx_queue_cnt = 0;
+	priv->adminq_destroy_rx_queue_cnt = 0;
+	priv->adminq_dcfg_device_resources_cnt = 0;
+	priv->adminq_set_driver_parameter_cnt = 0;
+	priv->adminq_report_stats_cnt = 0;
+	priv->adminq_report_link_speed_cnt = 0;
+	priv->adminq_get_ptype_map_cnt = 0;
+
+	/* Setup Admin queue with the device */
+	iowrite32be(priv->adminq_dma_mem.pa / PAGE_SIZE,
+		    &priv->reg_bar0->adminq_pfn);
+
+	gve_set_admin_queue_ok(priv);
+	return 0;
+}
+
+void gve_adminq_release(struct gve_priv *priv)
+{
+	int i = 0;
+
+	/* Tell the device the adminq is leaving */
+	iowrite32be(0x0, &priv->reg_bar0->adminq_pfn);
+	while (ioread32be(&priv->reg_bar0->adminq_pfn)) {
+		/* If this is reached the device is unrecoverable and still
+		 * holding memory. Continue looping to avoid memory corruption,
+		 * but WARN so it is visible what is going on.
+		 */
+		if (i == GVE_MAX_ADMINQ_RELEASE_CHECK)
+			PMD_DRV_LOG(WARNING, "Unrecoverable platform error!");
+		i++;
+		msleep(GVE_ADMINQ_SLEEP_LEN);
+	}
+	gve_clear_device_rings_ok(priv);
+	gve_clear_device_resources_ok(priv);
+	gve_clear_admin_queue_ok(priv);
+}
+
+void gve_adminq_free(struct gve_priv *priv)
+{
+	if (!gve_get_admin_queue_ok(priv))
+		return;
+	gve_adminq_release(priv);
+	gve_free_dma_mem(&priv->adminq_dma_mem);
+	gve_clear_admin_queue_ok(priv);
+}
+
+static void gve_adminq_kick_cmd(struct gve_priv *priv, u32 prod_cnt)
+{
+	iowrite32be(prod_cnt, &priv->reg_bar0->adminq_doorbell);
+}
+
+static bool gve_adminq_wait_for_cmd(struct gve_priv *priv, u32 prod_cnt)
+{
+	int i;
+
+	for (i = 0; i < GVE_MAX_ADMINQ_EVENT_COUNTER_CHECK; i++) {
+		if (ioread32be(&priv->reg_bar0->adminq_event_counter)
+		    == prod_cnt)
+			return true;
+		msleep(GVE_ADMINQ_SLEEP_LEN);
+	}
+
+	return false;
+}
+
+static int gve_adminq_parse_err(struct gve_priv *priv, u32 status)
+{
+	if (status != GVE_ADMINQ_COMMAND_PASSED &&
+	    status != GVE_ADMINQ_COMMAND_UNSET) {
+		PMD_DRV_LOG(ERR, "AQ command failed with status %d", status);
+		priv->adminq_cmd_fail++;
+	}
+	switch (status) {
+	case GVE_ADMINQ_COMMAND_PASSED:
+		return 0;
+	case GVE_ADMINQ_COMMAND_UNSET:
+		PMD_DRV_LOG(ERR, "parse_aq_err: err and status both unset, this should not be possible.");
+		return -EINVAL;
+	case GVE_ADMINQ_COMMAND_ERROR_ABORTED:
+	case GVE_ADMINQ_COMMAND_ERROR_CANCELLED:
+	case GVE_ADMINQ_COMMAND_ERROR_DATALOSS:
+	case GVE_ADMINQ_COMMAND_ERROR_FAILED_PRECONDITION:
+	case GVE_ADMINQ_COMMAND_ERROR_UNAVAILABLE:
+		return -EAGAIN;
+	case GVE_ADMINQ_COMMAND_ERROR_ALREADY_EXISTS:
+	case GVE_ADMINQ_COMMAND_ERROR_INTERNAL_ERROR:
+	case GVE_ADMINQ_COMMAND_ERROR_INVALID_ARGUMENT:
+	case GVE_ADMINQ_COMMAND_ERROR_NOT_FOUND:
+	case GVE_ADMINQ_COMMAND_ERROR_OUT_OF_RANGE:
+	case GVE_ADMINQ_COMMAND_ERROR_UNKNOWN_ERROR:
+		return -EINVAL;
+	case GVE_ADMINQ_COMMAND_ERROR_DEADLINE_EXCEEDED:
+		return -ETIME;
+	case GVE_ADMINQ_COMMAND_ERROR_PERMISSION_DENIED:
+	case GVE_ADMINQ_COMMAND_ERROR_UNAUTHENTICATED:
+		return -EACCES;
+	case GVE_ADMINQ_COMMAND_ERROR_RESOURCE_EXHAUSTED:
+		return -ENOMEM;
+	case GVE_ADMINQ_COMMAND_ERROR_UNIMPLEMENTED:
+		return -ENOTSUP;
+	default:
+		PMD_DRV_LOG(ERR, "parse_aq_err: unknown status code %d",
+			    status);
+		return -EINVAL;
+	}
+}
+
+/* Flushes all AQ commands currently queued and waits for them to complete.
+ * If there are failures, it will return the first error.
+ */
+static int gve_adminq_kick_and_wait(struct gve_priv *priv)
+{
+	u32 tail, head;
+	u32 i;
+
+	tail = ioread32be(&priv->reg_bar0->adminq_event_counter);
+	head = priv->adminq_prod_cnt;
+
+	gve_adminq_kick_cmd(priv, head);
+	if (!gve_adminq_wait_for_cmd(priv, head)) {
+		PMD_DRV_LOG(ERR, "AQ commands timed out, need to reset AQ");
+		priv->adminq_timeouts++;
+		return -ENOTRECOVERABLE;
+	}
+
+	for (i = tail; i < head; i++) {
+		union gve_adminq_command *cmd;
+		u32 status, err;
+
+		cmd = &priv->adminq[i & priv->adminq_mask];
+		status = be32_to_cpu(READ_ONCE32(cmd->status));
+		err = gve_adminq_parse_err(priv, status);
+		if (err)
+			/* Return the first error if we failed. */
+			return err;
+	}
+
+	return 0;
+}
+
+/* This function is not threadsafe - the caller is responsible for any
+ * necessary locks.
+ */
+static int gve_adminq_issue_cmd(struct gve_priv *priv,
+				union gve_adminq_command *cmd_orig)
+{
+	union gve_adminq_command *cmd;
+	u32 opcode;
+	u32 tail;
+
+	tail = ioread32be(&priv->reg_bar0->adminq_event_counter);
+
+	/* Check if next command will overflow the buffer. */
+	if (((priv->adminq_prod_cnt + 1) & priv->adminq_mask) ==
+	    (tail & priv->adminq_mask)) {
+		int err;
+
+		/* Flush existing commands to make room. */
+		err = gve_adminq_kick_and_wait(priv);
+		if (err)
+			return err;
+
+		/* Retry. */
+		tail = ioread32be(&priv->reg_bar0->adminq_event_counter);
+		if (((priv->adminq_prod_cnt + 1) & priv->adminq_mask) ==
+		    (tail & priv->adminq_mask)) {
+			/* This should never happen. We just flushed the
+			 * command queue so there should be enough space.
+			 */
+			return -ENOMEM;
+		}
+	}
+
+	cmd = &priv->adminq[priv->adminq_prod_cnt & priv->adminq_mask];
+	priv->adminq_prod_cnt++;
+
+	memcpy(cmd, cmd_orig, sizeof(*cmd_orig));
+	opcode = be32_to_cpu(READ_ONCE32(cmd->opcode));
+
+	switch (opcode) {
+	case GVE_ADMINQ_DESCRIBE_DEVICE:
+		priv->adminq_describe_device_cnt++;
+		break;
+	case GVE_ADMINQ_CONFIGURE_DEVICE_RESOURCES:
+		priv->adminq_cfg_device_resources_cnt++;
+		break;
+	case GVE_ADMINQ_REGISTER_PAGE_LIST:
+		priv->adminq_register_page_list_cnt++;
+		break;
+	case GVE_ADMINQ_UNREGISTER_PAGE_LIST:
+		priv->adminq_unregister_page_list_cnt++;
+		break;
+	case GVE_ADMINQ_CREATE_TX_QUEUE:
+		priv->adminq_create_tx_queue_cnt++;
+		break;
+	case GVE_ADMINQ_CREATE_RX_QUEUE:
+		priv->adminq_create_rx_queue_cnt++;
+		break;
+	case GVE_ADMINQ_DESTROY_TX_QUEUE:
+		priv->adminq_destroy_tx_queue_cnt++;
+		break;
+	case GVE_ADMINQ_DESTROY_RX_QUEUE:
+		priv->adminq_destroy_rx_queue_cnt++;
+		break;
+	case GVE_ADMINQ_DECONFIGURE_DEVICE_RESOURCES:
+		priv->adminq_dcfg_device_resources_cnt++;
+		break;
+	case GVE_ADMINQ_SET_DRIVER_PARAMETER:
+		priv->adminq_set_driver_parameter_cnt++;
+		break;
+	case GVE_ADMINQ_REPORT_STATS:
+		priv->adminq_report_stats_cnt++;
+		break;
+	case GVE_ADMINQ_REPORT_LINK_SPEED:
+		priv->adminq_report_link_speed_cnt++;
+		break;
+	case GVE_ADMINQ_GET_PTYPE_MAP:
+		priv->adminq_get_ptype_map_cnt++;
+		break;
+	default:
+		PMD_DRV_LOG(ERR, "unknown AQ command opcode %d", opcode);
+	}
+
+	return 0;
+}
+
+/* This function is not threadsafe - the caller is responsible for any
+ * necessary locks.
+ * The caller is also responsible for making sure there are no commands
+ * waiting to be executed.
+ */
+static int gve_adminq_execute_cmd(struct gve_priv *priv,
+				  union gve_adminq_command *cmd_orig)
+{
+	u32 tail, head;
+	int err;
+
+	tail = ioread32be(&priv->reg_bar0->adminq_event_counter);
+	head = priv->adminq_prod_cnt;
+	if (tail != head)
+		/* This is not a valid path */
+		return -EINVAL;
+
+	err = gve_adminq_issue_cmd(priv, cmd_orig);
+	if (err)
+		return err;
+
+	return gve_adminq_kick_and_wait(priv);
+}
+
+/* The device specifies that the management vector can either be the first irq
+ * or the last irq. ntfy_blk_msix_base_idx indicates the first irq assigned to
+ * the ntfy blks. It if is 0 then the management vector is last, if it is 1 then
+ * the management vector is first.
+ *
+ * gve arranges the msix vectors so that the management vector is last.
+ */
+#define GVE_NTFY_BLK_BASE_MSIX_IDX	0
+int gve_adminq_configure_device_resources(struct gve_priv *priv,
+					  dma_addr_t counter_array_bus_addr,
+					  u32 num_counters,
+					  dma_addr_t db_array_bus_addr,
+					  u32 num_ntfy_blks)
+{
+	union gve_adminq_command cmd;
+
+	memset(&cmd, 0, sizeof(cmd));
+	cmd.opcode = cpu_to_be32(GVE_ADMINQ_CONFIGURE_DEVICE_RESOURCES);
+	cmd.configure_device_resources =
+		(struct gve_adminq_configure_device_resources) {
+		.counter_array = cpu_to_be64(counter_array_bus_addr),
+		.num_counters = cpu_to_be32(num_counters),
+		.irq_db_addr = cpu_to_be64(db_array_bus_addr),
+		.num_irq_dbs = cpu_to_be32(num_ntfy_blks),
+		.irq_db_stride = cpu_to_be32(sizeof(*priv->irq_dbs)),
+		.ntfy_blk_msix_base_idx =
+					cpu_to_be32(GVE_NTFY_BLK_BASE_MSIX_IDX),
+		.queue_format = priv->queue_format,
+	};
+
+	return gve_adminq_execute_cmd(priv, &cmd);
+}
+
+int gve_adminq_deconfigure_device_resources(struct gve_priv *priv)
+{
+	union gve_adminq_command cmd;
+
+	memset(&cmd, 0, sizeof(cmd));
+	cmd.opcode = cpu_to_be32(GVE_ADMINQ_DECONFIGURE_DEVICE_RESOURCES);
+
+	return gve_adminq_execute_cmd(priv, &cmd);
+}
+
+static int gve_adminq_create_tx_queue(struct gve_priv *priv, u32 queue_index)
+{
+	struct gve_tx_queue *txq = priv->txqs[queue_index];
+	union gve_adminq_command cmd;
+
+	memset(&cmd, 0, sizeof(cmd));
+	cmd.opcode = cpu_to_be32(GVE_ADMINQ_CREATE_TX_QUEUE);
+	cmd.create_tx_queue = (struct gve_adminq_create_tx_queue) {
+		.queue_id = cpu_to_be32(queue_index),
+		.queue_resources_addr =
+			cpu_to_be64(txq->qres_mz->iova),
+		.tx_ring_addr = cpu_to_be64(txq->tx_ring_phys_addr),
+		.ntfy_id = cpu_to_be32(txq->ntfy_id),
+	};
+
+	if (gve_is_gqi(priv)) {
+		u32 qpl_id = priv->queue_format == GVE_GQI_RDA_FORMAT ?
+			GVE_RAW_ADDRESSING_QPL_ID : txq->qpl->id;
+
+		cmd.create_tx_queue.queue_page_list_id = cpu_to_be32(qpl_id);
+	} else {
+		cmd.create_tx_queue.tx_ring_size =
+			cpu_to_be16(txq->nb_tx_desc);
+		cmd.create_tx_queue.tx_comp_ring_addr =
+			cpu_to_be64(txq->complq->tx_ring_phys_addr);
+		cmd.create_tx_queue.tx_comp_ring_size =
+			cpu_to_be16(priv->tx_compq_size);
+	}
+
+	return gve_adminq_issue_cmd(priv, &cmd);
+}
+
+int gve_adminq_create_tx_queues(struct gve_priv *priv, u32 num_queues)
+{
+	int err;
+	u32 i;
+
+	for (i = 0; i < num_queues; i++) {
+		err = gve_adminq_create_tx_queue(priv, i);
+		if (err)
+			return err;
+	}
+
+	return gve_adminq_kick_and_wait(priv);
+}
+
+static int gve_adminq_create_rx_queue(struct gve_priv *priv, u32 queue_index)
+{
+	struct gve_rx_queue *rxq = priv->rxqs[queue_index];
+	union gve_adminq_command cmd;
+
+	memset(&cmd, 0, sizeof(cmd));
+	cmd.opcode = cpu_to_be32(GVE_ADMINQ_CREATE_RX_QUEUE);
+	cmd.create_rx_queue = (struct gve_adminq_create_rx_queue) {
+		.queue_id = cpu_to_be32(queue_index),
+		.ntfy_id = cpu_to_be32(rxq->ntfy_id),
+		.queue_resources_addr = cpu_to_be64(rxq->qres_mz->iova),
+	};
+
+	if (gve_is_gqi(priv)) {
+		u32 qpl_id = priv->queue_format == GVE_GQI_RDA_FORMAT ?
+			GVE_RAW_ADDRESSING_QPL_ID : rxq->qpl->id;
+
+		cmd.create_rx_queue.rx_desc_ring_addr =
+			cpu_to_be64(rxq->mz->iova),
+		cmd.create_rx_queue.rx_data_ring_addr =
+			cpu_to_be64(rxq->data_mz->iova),
+		cmd.create_rx_queue.index = cpu_to_be32(queue_index);
+		cmd.create_rx_queue.queue_page_list_id = cpu_to_be32(qpl_id);
+		cmd.create_rx_queue.packet_buffer_size = cpu_to_be16(rxq->rx_buf_len);
+	} else {
+		cmd.create_rx_queue.rx_ring_size =
+			cpu_to_be16(priv->rx_desc_cnt);
+		cmd.create_rx_queue.rx_desc_ring_addr =
+			cpu_to_be64(rxq->rx_ring_phys_addr);
+		cmd.create_rx_queue.rx_data_ring_addr =
+			cpu_to_be64(rxq->bufq->rx_ring_phys_addr);
+		cmd.create_rx_queue.packet_buffer_size =
+			cpu_to_be16(rxq->rx_buf_len);
+		cmd.create_rx_queue.rx_buff_ring_size =
+			cpu_to_be16(priv->rx_bufq_size);
+		cmd.create_rx_queue.enable_rsc = !!(priv->enable_rsc);
+	}
+
+	return gve_adminq_issue_cmd(priv, &cmd);
+}
+
+int gve_adminq_create_rx_queues(struct gve_priv *priv, u32 num_queues)
+{
+	int err;
+	u32 i;
+
+	for (i = 0; i < num_queues; i++) {
+		err = gve_adminq_create_rx_queue(priv, i);
+		if (err)
+			return err;
+	}
+
+	return gve_adminq_kick_and_wait(priv);
+}
+
+static int gve_adminq_destroy_tx_queue(struct gve_priv *priv, u32 queue_index)
+{
+	union gve_adminq_command cmd;
+	int err;
+
+	memset(&cmd, 0, sizeof(cmd));
+	cmd.opcode = cpu_to_be32(GVE_ADMINQ_DESTROY_TX_QUEUE);
+	cmd.destroy_tx_queue = (struct gve_adminq_destroy_tx_queue) {
+		.queue_id = cpu_to_be32(queue_index),
+	};
+
+	err = gve_adminq_issue_cmd(priv, &cmd);
+	if (err)
+		return err;
+
+	return 0;
+}
+
+int gve_adminq_destroy_tx_queues(struct gve_priv *priv, u32 num_queues)
+{
+	int err;
+	u32 i;
+
+	for (i = 0; i < num_queues; i++) {
+		err = gve_adminq_destroy_tx_queue(priv, i);
+		if (err)
+			return err;
+	}
+
+	return gve_adminq_kick_and_wait(priv);
+}
+
+static int gve_adminq_destroy_rx_queue(struct gve_priv *priv, u32 queue_index)
+{
+	union gve_adminq_command cmd;
+	int err;
+
+	memset(&cmd, 0, sizeof(cmd));
+	cmd.opcode = cpu_to_be32(GVE_ADMINQ_DESTROY_RX_QUEUE);
+	cmd.destroy_rx_queue = (struct gve_adminq_destroy_rx_queue) {
+		.queue_id = cpu_to_be32(queue_index),
+	};
+
+	err = gve_adminq_issue_cmd(priv, &cmd);
+	if (err)
+		return err;
+
+	return 0;
+}
+
+int gve_adminq_destroy_rx_queues(struct gve_priv *priv, u32 num_queues)
+{
+	int err;
+	u32 i;
+
+	for (i = 0; i < num_queues; i++) {
+		err = gve_adminq_destroy_rx_queue(priv, i);
+		if (err)
+			return err;
+	}
+
+	return gve_adminq_kick_and_wait(priv);
+}
+
+static int gve_set_desc_cnt(struct gve_priv *priv,
+			    struct gve_device_descriptor *descriptor)
+{
+	priv->tx_desc_cnt = be16_to_cpu(descriptor->tx_queue_entries);
+	if (priv->tx_desc_cnt * sizeof(priv->txqs[0]->tx_desc_ring[0])
+	    < PAGE_SIZE) {
+		PMD_DRV_LOG(ERR, "Tx desc count %d too low", priv->tx_desc_cnt);
+		return -EINVAL;
+	}
+	priv->rx_desc_cnt = be16_to_cpu(descriptor->rx_queue_entries);
+	if (priv->rx_desc_cnt * sizeof(priv->rxqs[0]->rx_desc_ring[0])
+	    < PAGE_SIZE) {
+		PMD_DRV_LOG(ERR, "Rx desc count %d too low", priv->rx_desc_cnt);
+		return -EINVAL;
+	}
+	return 0;
+}
+
+static int
+gve_set_desc_cnt_dqo(struct gve_priv *priv,
+		     const struct gve_device_descriptor *descriptor,
+		     const struct gve_device_option_dqo_rda *dev_op_dqo_rda)
+{
+	priv->tx_desc_cnt = be16_to_cpu(descriptor->tx_queue_entries);
+	priv->tx_compq_size = be16_to_cpu(dev_op_dqo_rda->tx_comp_ring_entries);
+	priv->rx_desc_cnt = be16_to_cpu(descriptor->rx_queue_entries);
+	priv->rx_bufq_size = be16_to_cpu(dev_op_dqo_rda->rx_buff_ring_entries);
+
+	return 0;
+}
+
+static void gve_enable_supported_features(struct gve_priv *priv,
+					  u32 supported_features_mask,
+					  const struct gve_device_option_jumbo_frames
+						  *dev_op_jumbo_frames)
+{
+	/* Before control reaches this point, the page-size-capped max MTU from
+	 * the gve_device_descriptor field has already been stored in
+	 * priv->dev->max_mtu. We overwrite it with the true max MTU below.
+	 */
+	if (dev_op_jumbo_frames &&
+	    (supported_features_mask & GVE_SUP_JUMBO_FRAMES_MASK)) {
+		PMD_DRV_LOG(INFO, "JUMBO FRAMES device option enabled.");
+		priv->max_mtu = be16_to_cpu(dev_op_jumbo_frames->max_mtu);
+	}
+}
+
+int gve_adminq_describe_device(struct gve_priv *priv)
+{
+	struct gve_device_option_jumbo_frames *dev_op_jumbo_frames = NULL;
+	struct gve_device_option_gqi_rda *dev_op_gqi_rda = NULL;
+	struct gve_device_option_gqi_qpl *dev_op_gqi_qpl = NULL;
+	struct gve_device_option_dqo_rda *dev_op_dqo_rda = NULL;
+	struct gve_device_descriptor *descriptor;
+	struct gve_dma_mem descriptor_dma_mem;
+	u32 supported_features_mask = 0;
+	union gve_adminq_command cmd;
+	int err = 0;
+	u8 *mac;
+	u16 mtu;
+
+	memset(&cmd, 0, sizeof(cmd));
+	descriptor = gve_alloc_dma_mem(&descriptor_dma_mem, PAGE_SIZE);
+	if (!descriptor)
+		return -ENOMEM;
+	cmd.opcode = cpu_to_be32(GVE_ADMINQ_DESCRIBE_DEVICE);
+	cmd.describe_device.device_descriptor_addr =
+					cpu_to_be64(descriptor_dma_mem.pa);
+	cmd.describe_device.device_descriptor_version =
+			cpu_to_be32(GVE_ADMINQ_DEVICE_DESCRIPTOR_VERSION);
+	cmd.describe_device.available_length = cpu_to_be32(PAGE_SIZE);
+
+	err = gve_adminq_execute_cmd(priv, &cmd);
+	if (err)
+		goto free_device_descriptor;
+
+	err = gve_process_device_options(priv, descriptor, &dev_op_gqi_rda,
+					 &dev_op_gqi_qpl, &dev_op_dqo_rda,
+					 &dev_op_jumbo_frames);
+	if (err)
+		goto free_device_descriptor;
+
+	/* If the GQI_RAW_ADDRESSING option is not enabled and the queue format
+	 * is not set to GqiRda, choose the queue format in a priority order:
+	 * DqoRda, GqiRda, GqiQpl. Use GqiQpl as default.
+	 */
+	if (dev_op_dqo_rda) {
+		priv->queue_format = GVE_DQO_RDA_FORMAT;
+		PMD_DRV_LOG(INFO, "Driver is running with DQO RDA queue format.");
+		supported_features_mask =
+			be32_to_cpu(dev_op_dqo_rda->supported_features_mask);
+	} else if (dev_op_gqi_rda) {
+		priv->queue_format = GVE_GQI_RDA_FORMAT;
+		PMD_DRV_LOG(INFO, "Driver is running with GQI RDA queue format.");
+		supported_features_mask =
+			be32_to_cpu(dev_op_gqi_rda->supported_features_mask);
+	} else if (priv->queue_format == GVE_GQI_RDA_FORMAT) {
+		PMD_DRV_LOG(INFO, "Driver is running with GQI RDA queue format.");
+	} else {
+		priv->queue_format = GVE_GQI_QPL_FORMAT;
+		if (dev_op_gqi_qpl)
+			supported_features_mask =
+				be32_to_cpu(dev_op_gqi_qpl->supported_features_mask);
+		PMD_DRV_LOG(INFO, "Driver is running with GQI QPL queue format.");
+	}
+	if (gve_is_gqi(priv)) {
+		err = gve_set_desc_cnt(priv, descriptor);
+	} else {
+		/* DQO supports LRO. */
+		err = gve_set_desc_cnt_dqo(priv, descriptor, dev_op_dqo_rda);
+	}
+	if (err)
+		goto free_device_descriptor;
+
+	priv->max_registered_pages =
+				be64_to_cpu(descriptor->max_registered_pages);
+	mtu = be16_to_cpu(descriptor->mtu);
+	if (mtu < ETH_MIN_MTU) {
+		PMD_DRV_LOG(ERR, "MTU %d below minimum MTU", mtu);
+		err = -EINVAL;
+		goto free_device_descriptor;
+	}
+	priv->max_mtu = mtu;
+	priv->num_event_counters = be16_to_cpu(descriptor->counters);
+	rte_memcpy(priv->dev_addr.addr_bytes, descriptor->mac, ETH_ALEN);
+	mac = descriptor->mac;
+	PMD_DRV_LOG(INFO, "MAC addr: %02x:%02x:%02x:%02x:%02x:%02x",
+		    mac[0], mac[1], mac[2], mac[3], mac[4], mac[5]);
+	priv->tx_pages_per_qpl = be16_to_cpu(descriptor->tx_pages_per_qpl);
+	priv->rx_data_slot_cnt = be16_to_cpu(descriptor->rx_pages_per_qpl);
+
+	if (gve_is_gqi(priv) && priv->rx_data_slot_cnt < priv->rx_desc_cnt) {
+		PMD_DRV_LOG(ERR, "rx_data_slot_cnt cannot be smaller than rx_desc_cnt, setting rx_desc_cnt down to %d",
+			    priv->rx_data_slot_cnt);
+		priv->rx_desc_cnt = priv->rx_data_slot_cnt;
+	}
+	priv->default_num_queues = be16_to_cpu(descriptor->default_num_queues);
+
+	gve_enable_supported_features(priv, supported_features_mask,
+				      dev_op_jumbo_frames);
+
+free_device_descriptor:
+	gve_free_dma_mem(&descriptor_dma_mem);
+	return err;
+}
+
+int gve_adminq_register_page_list(struct gve_priv *priv,
+				  struct gve_queue_page_list *qpl)
+{
+	struct gve_dma_mem page_list_dma_mem;
+	u32 num_entries = qpl->num_entries;
+	u32 size = num_entries * sizeof(qpl->page_buses[0]);
+	union gve_adminq_command cmd;
+	__be64 *page_list;
+	int err;
+	u32 i;
+
+	memset(&cmd, 0, sizeof(cmd));
+	page_list = gve_alloc_dma_mem(&page_list_dma_mem, size);
+	if (!page_list)
+		return -ENOMEM;
+
+	for (i = 0; i < num_entries; i++)
+		page_list[i] = cpu_to_be64(qpl->page_buses[i]);
+
+	cmd.opcode = cpu_to_be32(GVE_ADMINQ_REGISTER_PAGE_LIST);
+	cmd.reg_page_list = (struct gve_adminq_register_page_list) {
+		.page_list_id = cpu_to_be32(qpl->id),
+		.num_pages = cpu_to_be32(num_entries),
+		.page_address_list_addr = cpu_to_be64(page_list_dma_mem.pa),
+	};
+
+	err = gve_adminq_execute_cmd(priv, &cmd);
+	gve_free_dma_mem(&page_list_dma_mem);
+	return err;
+}
+
+int gve_adminq_unregister_page_list(struct gve_priv *priv, u32 page_list_id)
+{
+	union gve_adminq_command cmd;
+
+	memset(&cmd, 0, sizeof(cmd));
+	cmd.opcode = cpu_to_be32(GVE_ADMINQ_UNREGISTER_PAGE_LIST);
+	cmd.unreg_page_list = (struct gve_adminq_unregister_page_list) {
+		.page_list_id = cpu_to_be32(page_list_id),
+	};
+
+	return gve_adminq_execute_cmd(priv, &cmd);
+}
+
+int gve_adminq_set_mtu(struct gve_priv *priv, u64 mtu)
+{
+	union gve_adminq_command cmd;
+
+	memset(&cmd, 0, sizeof(cmd));
+	cmd.opcode = cpu_to_be32(GVE_ADMINQ_SET_DRIVER_PARAMETER);
+	cmd.set_driver_param = (struct gve_adminq_set_driver_parameter) {
+		.parameter_type = cpu_to_be32(GVE_SET_PARAM_MTU),
+		.parameter_value = cpu_to_be64(mtu),
+	};
+
+	return gve_adminq_execute_cmd(priv, &cmd);
+}
+
+int gve_adminq_report_stats(struct gve_priv *priv, u64 stats_report_len,
+			    dma_addr_t stats_report_addr, u64 interval)
+{
+	union gve_adminq_command cmd;
+
+	memset(&cmd, 0, sizeof(cmd));
+	cmd.opcode = cpu_to_be32(GVE_ADMINQ_REPORT_STATS);
+	cmd.report_stats = (struct gve_adminq_report_stats) {
+		.stats_report_len = cpu_to_be64(stats_report_len),
+		.stats_report_addr = cpu_to_be64(stats_report_addr),
+		.interval = cpu_to_be64(interval),
+	};
+
+	return gve_adminq_execute_cmd(priv, &cmd);
+}
+
+int gve_adminq_report_link_speed(struct gve_priv *priv)
+{
+	struct gve_dma_mem link_speed_region_dma_mem;
+	union gve_adminq_command gvnic_cmd;
+	u64 *link_speed_region;
+	int err;
+
+	link_speed_region = gve_alloc_dma_mem(&link_speed_region_dma_mem,
+					      sizeof(*link_speed_region));
+
+	if (!link_speed_region)
+		return -ENOMEM;
+
+	memset(&gvnic_cmd, 0, sizeof(gvnic_cmd));
+	gvnic_cmd.opcode = cpu_to_be32(GVE_ADMINQ_REPORT_LINK_SPEED);
+	gvnic_cmd.report_link_speed.link_speed_address =
+		cpu_to_be64(link_speed_region_dma_mem.pa);
+
+	err = gve_adminq_execute_cmd(priv, &gvnic_cmd);
+
+	priv->link_speed = be64_to_cpu(*link_speed_region);
+	gve_free_dma_mem(&link_speed_region_dma_mem);
+	return err;
+}
+
+int gve_adminq_get_ptype_map_dqo(struct gve_priv *priv,
+				 struct gve_ptype_lut *ptype_lut)
+{
+	struct gve_dma_mem ptype_map_dma_mem;
+	struct gve_ptype_map *ptype_map;
+	union gve_adminq_command cmd;
+	int err = 0;
+	int i;
+
+	memset(&cmd, 0, sizeof(cmd));
+	ptype_map = gve_alloc_dma_mem(&ptype_map_dma_mem, sizeof(*ptype_map));
+	if (!ptype_map)
+		return -ENOMEM;
+
+	cmd.opcode = cpu_to_be32(GVE_ADMINQ_GET_PTYPE_MAP);
+	cmd.get_ptype_map = (struct gve_adminq_get_ptype_map) {
+		.ptype_map_len = cpu_to_be64(sizeof(*ptype_map)),
+		.ptype_map_addr = cpu_to_be64(ptype_map_dma_mem.pa),
+	};
+
+	err = gve_adminq_execute_cmd(priv, &cmd);
+	if (err)
+		goto err;
+
+	/* Populate ptype_lut. */
+	for (i = 0; i < GVE_NUM_PTYPES; i++) {
+		ptype_lut->ptypes[i].l3_type =
+			ptype_map->ptypes[i].l3_type;
+		ptype_lut->ptypes[i].l4_type =
+			ptype_map->ptypes[i].l4_type;
+	}
+err:
+	gve_free_dma_mem(&ptype_map_dma_mem);
+	return err;
+}
diff --git a/drivers/net/gve/gve_adminq.h b/drivers/net/gve/gve_adminq.h
new file mode 100644
index 0000000000..c7114cc883
--- /dev/null
+++ b/drivers/net/gve/gve_adminq.h
@@ -0,0 +1,381 @@
+/* SPDX-License-Identifier: MIT
+ * Google Virtual Ethernet (gve) driver
+ * Version: 1.3.0
+ * Copyright (C) 2015-2022 Google, Inc.
+ * Copyright(C) 2022 Intel Corporation
+ */
+
+#ifndef _GVE_ADMINQ_H
+#define _GVE_ADMINQ_H
+
+/* Admin queue opcodes */
+enum gve_adminq_opcodes {
+	GVE_ADMINQ_DESCRIBE_DEVICE		= 0x1,
+	GVE_ADMINQ_CONFIGURE_DEVICE_RESOURCES	= 0x2,
+	GVE_ADMINQ_REGISTER_PAGE_LIST		= 0x3,
+	GVE_ADMINQ_UNREGISTER_PAGE_LIST		= 0x4,
+	GVE_ADMINQ_CREATE_TX_QUEUE		= 0x5,
+	GVE_ADMINQ_CREATE_RX_QUEUE		= 0x6,
+	GVE_ADMINQ_DESTROY_TX_QUEUE		= 0x7,
+	GVE_ADMINQ_DESTROY_RX_QUEUE		= 0x8,
+	GVE_ADMINQ_DECONFIGURE_DEVICE_RESOURCES	= 0x9,
+	GVE_ADMINQ_SET_DRIVER_PARAMETER		= 0xB,
+	GVE_ADMINQ_REPORT_STATS			= 0xC,
+	GVE_ADMINQ_REPORT_LINK_SPEED		= 0xD,
+	GVE_ADMINQ_GET_PTYPE_MAP		= 0xE,
+};
+
+/* Admin queue status codes */
+enum gve_adminq_statuses {
+	GVE_ADMINQ_COMMAND_UNSET			= 0x0,
+	GVE_ADMINQ_COMMAND_PASSED			= 0x1,
+	GVE_ADMINQ_COMMAND_ERROR_ABORTED		= 0xFFFFFFF0,
+	GVE_ADMINQ_COMMAND_ERROR_ALREADY_EXISTS		= 0xFFFFFFF1,
+	GVE_ADMINQ_COMMAND_ERROR_CANCELLED		= 0xFFFFFFF2,
+	GVE_ADMINQ_COMMAND_ERROR_DATALOSS		= 0xFFFFFFF3,
+	GVE_ADMINQ_COMMAND_ERROR_DEADLINE_EXCEEDED	= 0xFFFFFFF4,
+	GVE_ADMINQ_COMMAND_ERROR_FAILED_PRECONDITION	= 0xFFFFFFF5,
+	GVE_ADMINQ_COMMAND_ERROR_INTERNAL_ERROR		= 0xFFFFFFF6,
+	GVE_ADMINQ_COMMAND_ERROR_INVALID_ARGUMENT	= 0xFFFFFFF7,
+	GVE_ADMINQ_COMMAND_ERROR_NOT_FOUND		= 0xFFFFFFF8,
+	GVE_ADMINQ_COMMAND_ERROR_OUT_OF_RANGE		= 0xFFFFFFF9,
+	GVE_ADMINQ_COMMAND_ERROR_PERMISSION_DENIED	= 0xFFFFFFFA,
+	GVE_ADMINQ_COMMAND_ERROR_UNAUTHENTICATED	= 0xFFFFFFFB,
+	GVE_ADMINQ_COMMAND_ERROR_RESOURCE_EXHAUSTED	= 0xFFFFFFFC,
+	GVE_ADMINQ_COMMAND_ERROR_UNAVAILABLE		= 0xFFFFFFFD,
+	GVE_ADMINQ_COMMAND_ERROR_UNIMPLEMENTED		= 0xFFFFFFFE,
+	GVE_ADMINQ_COMMAND_ERROR_UNKNOWN_ERROR		= 0xFFFFFFFF,
+};
+
+#define GVE_ADMINQ_DEVICE_DESCRIPTOR_VERSION 1
+
+/* All AdminQ command structs should be naturally packed.
+ * GVE_CHECK_STRUCT/UNION_LEN will check struct/union length and throw
+ * error at compile time when the size is not correct.
+ */
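+
+/* Illustration (not part of the upstream base code): with the
+ * GVE_CHECK_STRUCT_LEN() definition supplied by the osdep layer, a check
+ * such as
+ *
+ *   GVE_CHECK_STRUCT_LEN(16, gve_adminq_describe_device);
+ *
+ * expands to an unused enum whose initialiser is
+ *   16 / ((sizeof(struct gve_adminq_describe_device) == 16) ? 1 : 0)
+ * so any accidental size change becomes a divide-by-zero compile error.
+ */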
+
+struct gve_adminq_describe_device {
+	__be64 device_descriptor_addr;
+	__be32 device_descriptor_version;
+	__be32 available_length;
+};
+
+GVE_CHECK_STRUCT_LEN(16, gve_adminq_describe_device);
+
+struct gve_device_descriptor {
+	__be64 max_registered_pages;
+	__be16 reserved1;
+	__be16 tx_queue_entries;
+	__be16 rx_queue_entries;
+	__be16 default_num_queues;
+	__be16 mtu;
+	__be16 counters;
+	__be16 tx_pages_per_qpl;
+	__be16 rx_pages_per_qpl;
+	u8  mac[ETH_ALEN];
+	__be16 num_device_options;
+	__be16 total_length;
+	u8  reserved2[6];
+};
+
+GVE_CHECK_STRUCT_LEN(40, gve_device_descriptor);
+
+struct gve_device_option {
+	__be16 option_id;
+	__be16 option_length;
+	__be32 required_features_mask;
+};
+
+GVE_CHECK_STRUCT_LEN(8, gve_device_option);
+
+struct gve_device_option_gqi_rda {
+	__be32 supported_features_mask;
+};
+
+GVE_CHECK_STRUCT_LEN(4, gve_device_option_gqi_rda);
+
+struct gve_device_option_gqi_qpl {
+	__be32 supported_features_mask;
+};
+
+GVE_CHECK_STRUCT_LEN(4, gve_device_option_gqi_qpl);
+
+struct gve_device_option_dqo_rda {
+	__be32 supported_features_mask;
+	__be16 tx_comp_ring_entries;
+	__be16 rx_buff_ring_entries;
+};
+
+GVE_CHECK_STRUCT_LEN(8, gve_device_option_dqo_rda);
+
+struct gve_device_option_jumbo_frames {
+	__be32 supported_features_mask;
+	__be16 max_mtu;
+	u8 padding[2];
+};
+
+GVE_CHECK_STRUCT_LEN(8, gve_device_option_jumbo_frames);
+
+/* Terminology:
+ *
+ * RDA - Raw DMA Addressing - Buffers associated with SKBs are directly DMA
+ *       mapped and read/updated by the device.
+ *
+ * QPL - Queue Page Lists - Driver uses bounce buffers which are DMA mapped with
+ *       the device for read/write and data is copied from/to SKBs.
+ */
+enum gve_dev_opt_id {
+	GVE_DEV_OPT_ID_GQI_RAW_ADDRESSING = 0x1,
+	GVE_DEV_OPT_ID_GQI_RDA = 0x2,
+	GVE_DEV_OPT_ID_GQI_QPL = 0x3,
+	GVE_DEV_OPT_ID_DQO_RDA = 0x4,
+	GVE_DEV_OPT_ID_JUMBO_FRAMES = 0x8,
+};
+
+enum gve_dev_opt_req_feat_mask {
+	GVE_DEV_OPT_REQ_FEAT_MASK_GQI_RAW_ADDRESSING = 0x0,
+	GVE_DEV_OPT_REQ_FEAT_MASK_GQI_RDA = 0x0,
+	GVE_DEV_OPT_REQ_FEAT_MASK_GQI_QPL = 0x0,
+	GVE_DEV_OPT_REQ_FEAT_MASK_DQO_RDA = 0x0,
+	GVE_DEV_OPT_REQ_FEAT_MASK_JUMBO_FRAMES = 0x0,
+};
+
+enum gve_sup_feature_mask {
+	GVE_SUP_JUMBO_FRAMES_MASK = 1 << 2,
+};
+
+#define GVE_DEV_OPT_LEN_GQI_RAW_ADDRESSING 0x0
+
+struct gve_adminq_configure_device_resources {
+	__be64 counter_array;
+	__be64 irq_db_addr;
+	__be32 num_counters;
+	__be32 num_irq_dbs;
+	__be32 irq_db_stride;
+	__be32 ntfy_blk_msix_base_idx;
+	u8 queue_format;
+	u8 padding[7];
+};
+
+GVE_CHECK_STRUCT_LEN(40, gve_adminq_configure_device_resources);
+
+struct gve_adminq_register_page_list {
+	__be32 page_list_id;
+	__be32 num_pages;
+	__be64 page_address_list_addr;
+};
+
+GVE_CHECK_STRUCT_LEN(16, gve_adminq_register_page_list);
+
+struct gve_adminq_unregister_page_list {
+	__be32 page_list_id;
+};
+
+GVE_CHECK_STRUCT_LEN(4, gve_adminq_unregister_page_list);
+
+#define GVE_RAW_ADDRESSING_QPL_ID 0xFFFFFFFF
+
+struct gve_adminq_create_tx_queue {
+	__be32 queue_id;
+	__be32 reserved;
+	__be64 queue_resources_addr;
+	__be64 tx_ring_addr;
+	__be32 queue_page_list_id;
+	__be32 ntfy_id;
+	__be64 tx_comp_ring_addr;
+	__be16 tx_ring_size;
+	__be16 tx_comp_ring_size;
+	u8 padding[4];
+};
+
+GVE_CHECK_STRUCT_LEN(48, gve_adminq_create_tx_queue);
+
+struct gve_adminq_create_rx_queue {
+	__be32 queue_id;
+	__be32 index;
+	__be32 reserved;
+	__be32 ntfy_id;
+	__be64 queue_resources_addr;
+	__be64 rx_desc_ring_addr;
+	__be64 rx_data_ring_addr;
+	__be32 queue_page_list_id;
+	__be16 rx_ring_size;
+	__be16 packet_buffer_size;
+	__be16 rx_buff_ring_size;
+	u8 enable_rsc;
+	u8 padding[5];
+};
+
+GVE_CHECK_STRUCT_LEN(56, gve_adminq_create_rx_queue);
+
+/* Queue resources that are shared with the device */
+struct gve_queue_resources {
+	union {
+		struct {
+			__be32 db_index;	/* Device -> Guest */
+			__be32 counter_index;	/* Device -> Guest */
+		};
+		u8 reserved[64];
+	};
+};
+
+GVE_CHECK_STRUCT_LEN(64, gve_queue_resources);
+
+struct gve_adminq_destroy_tx_queue {
+	__be32 queue_id;
+};
+
+GVE_CHECK_STRUCT_LEN(4, gve_adminq_destroy_tx_queue);
+
+struct gve_adminq_destroy_rx_queue {
+	__be32 queue_id;
+};
+
+GVE_CHECK_STRUCT_LEN(4, gve_adminq_destroy_rx_queue);
+
+/* GVE Set Driver Parameter Types */
+enum gve_set_driver_param_types {
+	GVE_SET_PARAM_MTU	= 0x1,
+};
+
+struct gve_adminq_set_driver_parameter {
+	__be32 parameter_type;
+	u8 reserved[4];
+	__be64 parameter_value;
+};
+
+GVE_CHECK_STRUCT_LEN(16, gve_adminq_set_driver_parameter);
+
+struct gve_adminq_report_stats {
+	__be64 stats_report_len;
+	__be64 stats_report_addr;
+	__be64 interval;
+};
+
+GVE_CHECK_STRUCT_LEN(24, gve_adminq_report_stats);
+
+struct gve_adminq_report_link_speed {
+	__be64 link_speed_address;
+};
+
+GVE_CHECK_STRUCT_LEN(8, gve_adminq_report_link_speed);
+
+struct stats {
+	__be32 stat_name;
+	__be32 queue_id;
+	__be64 value;
+};
+
+GVE_CHECK_STRUCT_LEN(16, stats);
+
+struct gve_stats_report {
+	__be64 written_count;
+	struct stats stats[];
+};
+
+GVE_CHECK_STRUCT_LEN(8, gve_stats_report);
+
+enum gve_stat_names {
+	/* stats from gve */
+	TX_WAKE_CNT			= 1,
+	TX_STOP_CNT			= 2,
+	TX_FRAMES_SENT			= 3,
+	TX_BYTES_SENT			= 4,
+	TX_LAST_COMPLETION_PROCESSED	= 5,
+	RX_NEXT_EXPECTED_SEQUENCE	= 6,
+	RX_BUFFERS_POSTED		= 7,
+	TX_TIMEOUT_CNT			= 8,
+	/* stats from NIC */
+	RX_QUEUE_DROP_CNT		= 65,
+	RX_NO_BUFFERS_POSTED		= 66,
+	RX_DROPS_PACKET_OVER_MRU	= 67,
+	RX_DROPS_INVALID_CHECKSUM	= 68,
+};
+
+enum gve_l3_type {
+	/* Must be zero so zero initialized LUT is unknown. */
+	GVE_L3_TYPE_UNKNOWN = 0,
+	GVE_L3_TYPE_OTHER,
+	GVE_L3_TYPE_IPV4,
+	GVE_L3_TYPE_IPV6,
+};
+
+enum gve_l4_type {
+	/* Must be zero so zero initialized LUT is unknown. */
+	GVE_L4_TYPE_UNKNOWN = 0,
+	GVE_L4_TYPE_OTHER,
+	GVE_L4_TYPE_TCP,
+	GVE_L4_TYPE_UDP,
+	GVE_L4_TYPE_ICMP,
+	GVE_L4_TYPE_SCTP,
+};
+
+/* These are control path types for PTYPE which are the same as the data path
+ * types.
+ */
+struct gve_ptype_entry {
+	u8 l3_type;
+	u8 l4_type;
+};
+
+struct gve_ptype_map {
+	struct gve_ptype_entry ptypes[1 << 10]; /* PTYPES are always 10 bits. */
+};
+
+struct gve_adminq_get_ptype_map {
+	__be64 ptype_map_len;
+	__be64 ptype_map_addr;
+};
+
+union gve_adminq_command {
+	struct {
+		__be32 opcode;
+		__be32 status;
+		union {
+			struct gve_adminq_configure_device_resources
+						configure_device_resources;
+			struct gve_adminq_create_tx_queue create_tx_queue;
+			struct gve_adminq_create_rx_queue create_rx_queue;
+			struct gve_adminq_destroy_tx_queue destroy_tx_queue;
+			struct gve_adminq_destroy_rx_queue destroy_rx_queue;
+			struct gve_adminq_describe_device describe_device;
+			struct gve_adminq_register_page_list reg_page_list;
+			struct gve_adminq_unregister_page_list unreg_page_list;
+			struct gve_adminq_set_driver_parameter set_driver_param;
+			struct gve_adminq_report_stats report_stats;
+			struct gve_adminq_report_link_speed report_link_speed;
+			struct gve_adminq_get_ptype_map get_ptype_map;
+		};
+	};
+	u8 reserved[64];
+};
+
+GVE_CHECK_UNION_LEN(64, gve_adminq_command);
+
+int gve_adminq_alloc(struct gve_priv *priv);
+void gve_adminq_free(struct gve_priv *priv);
+void gve_adminq_release(struct gve_priv *priv);
+int gve_adminq_describe_device(struct gve_priv *priv);
+int gve_adminq_configure_device_resources(struct gve_priv *priv,
+					  dma_addr_t counter_array_bus_addr,
+					  u32 num_counters,
+					  dma_addr_t db_array_bus_addr,
+					  u32 num_ntfy_blks);
+int gve_adminq_deconfigure_device_resources(struct gve_priv *priv);
+int gve_adminq_create_tx_queues(struct gve_priv *priv, u32 num_queues);
+int gve_adminq_destroy_tx_queues(struct gve_priv *priv, u32 queue_id);
+int gve_adminq_create_rx_queues(struct gve_priv *priv, u32 num_queues);
+int gve_adminq_destroy_rx_queues(struct gve_priv *priv, u32 queue_id);
+int gve_adminq_register_page_list(struct gve_priv *priv,
+				  struct gve_queue_page_list *qpl);
+int gve_adminq_unregister_page_list(struct gve_priv *priv, u32 page_list_id);
+int gve_adminq_set_mtu(struct gve_priv *priv, u64 mtu);
+int gve_adminq_report_stats(struct gve_priv *priv, u64 stats_report_len,
+			    dma_addr_t stats_report_addr, u64 interval);
+int gve_adminq_report_link_speed(struct gve_priv *priv);
+
+struct gve_ptype_lut;
+int gve_adminq_get_ptype_map_dqo(struct gve_priv *priv,
+				 struct gve_ptype_lut *ptype_lut);
+
+#endif /* _GVE_ADMINQ_H */
diff --git a/drivers/net/gve/gve_desc.h b/drivers/net/gve/gve_desc.h
new file mode 100644
index 0000000000..358755b7e0
--- /dev/null
+++ b/drivers/net/gve/gve_desc.h
@@ -0,0 +1,137 @@
+/* SPDX-License-Identifier: MIT
+ * Google Virtual Ethernet (gve) driver
+ * Version: 1.3.0
+ * Copyright (C) 2015-2022 Google, Inc.
+ */
+
+/* GVE Transmit Descriptor formats */
+
+#ifndef _GVE_DESC_H_
+#define _GVE_DESC_H_
+
+/* A note on seg_addrs
+ *
+ * Base addresses encoded in seg_addr are not assumed to be physical
+ * addresses. The ring format assumes these come from some linear address
+ * space. This could be physical memory, kernel virtual memory, user virtual
+ * memory.
+ * If raw dma addressing is not supported then gVNIC uses lists of registered
+ * pages. Each queue is assumed to be associated with a single such linear
+ * address space to ensure a consistent meaning for seg_addrs posted to its
+ * rings.
+ */
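+
+/* Illustration (not part of the base code): with a queue page list, a TX
+ * routine posts the byte offset of the bounce buffer inside the registered
+ * page list rather than a physical address, along the lines of
+ *
+ *   pkt_desc->seg_addr = cpu_to_be64((u64)page_idx * PAGE_SIZE + page_off);
+ *
+ * whereas with raw DMA addressing it posts the buffer's DMA address
+ * directly (pkt_desc, page_idx and page_off are illustrative names).
+ */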
+
+struct gve_tx_pkt_desc {
+	u8	type_flags;  /* desc type is lower 4 bits, flags upper */
+	u8	l4_csum_offset;  /* relative offset of L4 csum word */
+	u8	l4_hdr_offset;  /* Offset of start of L4 headers in packet */
+	u8	desc_cnt;  /* Total descriptors for this packet */
+	__be16	len;  /* Total length of this packet (in bytes) */
+	__be16	seg_len;  /* Length of this descriptor's segment */
+	__be64	seg_addr;  /* Base address (see note) of this segment */
+} __packed;
+
+struct gve_tx_mtd_desc {
+	u8      type_flags;     /* type is lower 4 bits, subtype upper  */
+	u8      path_state;     /* state is lower 4 bits, hash type upper */
+	__be16  reserved0;
+	__be32  path_hash;
+	__be64  reserved1;
+} __packed;
+
+struct gve_tx_seg_desc {
+	u8	type_flags;	/* type is lower 4 bits, flags upper	*/
+	u8	l3_offset;	/* TSO: 2 byte units to start of IPH	*/
+	__be16	reserved;
+	__be16	mss;		/* TSO MSS				*/
+	__be16	seg_len;
+	__be64	seg_addr;
+} __packed;
+
+/* GVE Transmit Descriptor Types */
+#define	GVE_TXD_STD		(0x0 << 4) /* Std with Host Address	*/
+#define	GVE_TXD_TSO		(0x1 << 4) /* TSO with Host Address	*/
+#define	GVE_TXD_SEG		(0x2 << 4) /* Seg with Host Address	*/
+#define	GVE_TXD_MTD		(0x3 << 4) /* Metadata			*/
+
+/* GVE Transmit Descriptor Flags for Std Pkts */
+#define	GVE_TXF_L4CSUM	BIT(0)	/* Need csum offload */
+#define	GVE_TXF_TSTAMP	BIT(2)	/* Timestamp required */
+
+/* GVE Transmit Descriptor Flags for TSO Segs */
+#define	GVE_TXSF_IPV6	BIT(1)	/* IPv6 TSO */
+
+/* GVE Transmit Descriptor Options for MTD Segs */
+#define GVE_MTD_SUBTYPE_PATH		0
+
+#define GVE_MTD_PATH_STATE_DEFAULT	0
+#define GVE_MTD_PATH_STATE_TIMEOUT	1
+#define GVE_MTD_PATH_STATE_CONGESTION	2
+#define GVE_MTD_PATH_STATE_RETRANSMIT	3
+
+#define GVE_MTD_PATH_HASH_NONE         (0x0 << 4)
+#define GVE_MTD_PATH_HASH_L4           (0x1 << 4)
+
+/* GVE Receive Packet Descriptor */
+/* The start of an ethernet packet comes 2 bytes into the rx buffer.
+ * gVNIC adds this padding so that both the DMA and the L3/4 protocol header
+ * access is aligned.
+ */
+#define GVE_RX_PAD 2
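+
+/* Illustration (not part of the base code): an RX routine is expected to
+ * strip this padding before handing the frame up, along the lines of
+ *
+ *   payload = rx_buf + GVE_RX_PAD;
+ *   pkt_len = be16_to_cpu(rxd->len) - GVE_RX_PAD;
+ *
+ * (rx_buf and rxd are illustrative names for the posted buffer and the
+ * completed gve_rx_desc.)
+ */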
+
+struct gve_rx_desc {
+	u8	padding[48];
+	__be32	rss_hash;  /* Receive-side scaling hash (Toeplitz for gVNIC) */
+	__be16	mss;
+	__be16	reserved;  /* Reserved to zero */
+	u8	hdr_len;  /* Header length (L2-L4) including padding */
+	u8	hdr_off;  /* 64-byte-scaled offset into RX_DATA entry */
+	__sum16	csum;  /* 1's-complement partial checksum of L3+ bytes */
+	__be16	len;  /* Length of the received packet */
+	__be16	flags_seq;  /* Flags [15:3] and sequence number [2:0] (1-7) */
+} __packed;
+GVE_CHECK_STRUCT_LEN(64, gve_rx_desc);
+
+/* If the device supports raw dma addressing then the addr in data slot is
+ * the dma address of the buffer.
+ * If the device only supports registered segments then the addr is a byte
+ * offset into the registered segment (an ordered list of pages) where the
+ * buffer is.
+ */
+union gve_rx_data_slot {
+	__be64 qpl_offset;
+	__be64 addr;
+};
+
+/* GVE Receive Packet Descriptor Seq No */
+#define GVE_SEQNO(x) (be16_to_cpu(x) & 0x7)
+
+/* GVE Receive Packet Descriptor Flags */
+#define GVE_RXFLG(x)	cpu_to_be16(1 << (3 + (x)))
+#define	GVE_RXF_FRAG		GVE_RXFLG(3)	/* IP Fragment			*/
+#define	GVE_RXF_IPV4		GVE_RXFLG(4)	/* IPv4				*/
+#define	GVE_RXF_IPV6		GVE_RXFLG(5)	/* IPv6				*/
+#define	GVE_RXF_TCP		GVE_RXFLG(6)	/* TCP Packet			*/
+#define	GVE_RXF_UDP		GVE_RXFLG(7)	/* UDP Packet			*/
+#define	GVE_RXF_ERR		GVE_RXFLG(8)	/* Packet Error Detected	*/
+#define	GVE_RXF_PKT_CONT	GVE_RXFLG(10)	/* Multi Fragment RX packet	*/
+
+/* GVE IRQ */
+#define GVE_IRQ_ACK	BIT(31)
+#define GVE_IRQ_MASK	BIT(30)
+#define GVE_IRQ_EVENT	BIT(29)
+
+static inline bool gve_needs_rss(__be16 flag)
+{
+	if (flag & GVE_RXF_FRAG)
+		return false;
+	if (flag & (GVE_RXF_IPV4 | GVE_RXF_IPV6))
+		return true;
+	return false;
+}
+
+static inline u8 gve_next_seqno(u8 seq)
+{
+	return (seq + 1) == 8 ? 1 : seq + 1;
+}
+#endif /* _GVE_DESC_H_ */
diff --git a/drivers/net/gve/gve_desc_dqo.h b/drivers/net/gve/gve_desc_dqo.h
new file mode 100644
index 0000000000..0d533abcd1
--- /dev/null
+++ b/drivers/net/gve/gve_desc_dqo.h
@@ -0,0 +1,254 @@
+/* SPDX-License-Identifier: MIT
+ * Google Virtual Ethernet (gve) driver
+ * Version: 1.3.0
+ * Copyright (C) 2015-2022 Google, Inc.
+ */
+
+/* GVE DQO Descriptor formats */
+
+#ifndef _GVE_DESC_DQO_H_
+#define _GVE_DESC_DQO_H_
+
+#define GVE_TX_MAX_HDR_SIZE_DQO 255
+#define GVE_TX_MIN_TSO_MSS_DQO 88
+
+#ifndef __LITTLE_ENDIAN_BITFIELD
+#error "Only little endian supported"
+#endif
+
+/* Basic TX descriptor (DTYPE 0x0C) */
+struct gve_tx_pkt_desc_dqo {
+	__le64 buf_addr;
+
+	/* Must be GVE_TX_PKT_DESC_DTYPE_DQO (0xc) */
+	u8 dtype: 5;
+
+	/* Denotes the last descriptor of a packet. */
+	u8 end_of_packet: 1;
+	u8 checksum_offload_enable: 1;
+
+	/* If set, will generate a descriptor completion for this descriptor. */
+	u8 report_event: 1;
+	u8 reserved0;
+	__le16 reserved1;
+
+	/* The TX completion associated with this packet will contain this tag.
+	 */
+	__le16 compl_tag;
+	u16 buf_size: 14;
+	u16 reserved2: 2;
+} __packed;
+GVE_CHECK_STRUCT_LEN(16, gve_tx_pkt_desc_dqo);
+
+#define GVE_TX_PKT_DESC_DTYPE_DQO 0xc
+#define GVE_TX_MAX_BUF_SIZE_DQO ((16 * 1024) - 1)
+
+/* Maximum number of data descriptors allowed per packet, or per-TSO segment. */
+#define GVE_TX_MAX_DATA_DESCS 10
+
+/* Min gap between tail and head to avoid cacheline overlap */
+#define GVE_TX_MIN_DESC_PREVENT_CACHE_OVERLAP 4
+
+/* "report_event" on TX packet descriptors may only be reported on the last
+ * descriptor of a TX packet, and they must be spaced apart with at least this
+ * value.
+ */
+#define GVE_TX_MIN_RE_INTERVAL 32
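+
+/* Illustration (not part of the base code): a TX path would typically only
+ * request a completion on the last descriptor of a packet, and only when
+ * enough descriptors have passed since the previous request, e.g.
+ *
+ *   desc->report_event = is_last_desc &&
+ *                        (tx_id - last_re_id >= GVE_TX_MIN_RE_INTERVAL);
+ *
+ * (is_last_desc, tx_id and last_re_id are illustrative names.)
+ */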
+
+struct gve_tx_context_cmd_dtype {
+	u8 dtype: 5;
+	u8 tso: 1;
+	u8 reserved1: 2;
+
+	u8 reserved2;
+};
+
+GVE_CHECK_STRUCT_LEN(2, gve_tx_context_cmd_dtype);
+
+/* TX Native TSO Context DTYPE (0x05)
+ *
+ * "flex" fields allow the driver to send additional packet context to HW.
+ */
+struct gve_tx_tso_context_desc_dqo {
+	/* The L4 payload bytes that should be segmented. */
+	u32 tso_total_len: 24;
+	u32 flex10: 8;
+
+	/* Max segment size in TSO excluding headers. */
+	u16 mss: 14;
+	u16 reserved: 2;
+
+	u8 header_len; /* Header length to use for TSO offload */
+	u8 flex11;
+	struct gve_tx_context_cmd_dtype cmd_dtype;
+	u8 flex0;
+	u8 flex5;
+	u8 flex6;
+	u8 flex7;
+	u8 flex8;
+	u8 flex9;
+} __packed;
+GVE_CHECK_STRUCT_LEN(16, gve_tx_tso_context_desc_dqo);
+
+#define GVE_TX_TSO_CTX_DESC_DTYPE_DQO 0x5
+
+/* General context descriptor for sending metadata. */
+struct gve_tx_general_context_desc_dqo {
+	u8 flex4;
+	u8 flex5;
+	u8 flex6;
+	u8 flex7;
+	u8 flex8;
+	u8 flex9;
+	u8 flex10;
+	u8 flex11;
+	struct gve_tx_context_cmd_dtype cmd_dtype;
+	u16 reserved;
+	u8 flex0;
+	u8 flex1;
+	u8 flex2;
+	u8 flex3;
+} __packed;
+GVE_CHECK_STRUCT_LEN(16, gve_tx_general_context_desc_dqo);
+
+#define GVE_TX_GENERAL_CTX_DESC_DTYPE_DQO 0x4
+
+/* Logical structure of metadata which is packed into context descriptor flex
+ * fields.
+ */
+struct gve_tx_metadata_dqo {
+	union {
+		struct {
+			u8 version;
+
+			/* If `skb->l4_hash` is set, this value should be
+			 * derived from `skb->hash`.
+			 *
+			 * A zero value means no l4_hash was associated with the
+			 * skb.
+			 */
+			u16 path_hash: 15;
+
+			/* Should be set to 1 if the flow associated with the
+			 * skb had a rehash from the TCP stack.
+			 */
+			u16 rehash_event: 1;
+		}  __packed;
+		u8 bytes[12];
+	};
+}  __packed;
+GVE_CHECK_STRUCT_LEN(12, gve_tx_metadata_dqo);
+
+#define GVE_TX_METADATA_VERSION_DQO 0
+
+/* TX completion descriptor */
+struct gve_tx_compl_desc {
+	/* For types 0-4 this is the TX queue ID associated with this
+	 * completion.
+	 */
+	u16 id: 11;
+
+	/* See: GVE_COMPL_TYPE_DQO* */
+	u16 type: 3;
+	u16 reserved0: 1;
+
+	/* Flipped by HW to notify the descriptor is populated. */
+	u16 generation: 1;
+	union {
+		/* For descriptor completions, this is the last index fetched
+		 * by HW + 1.
+		 */
+		__le16 tx_head;
+
+		/* For packet completions, this is the completion tag set on the
+		 * TX packet descriptors.
+		 */
+		__le16 completion_tag;
+	};
+	__le32 reserved1;
+} __packed;
+GVE_CHECK_STRUCT_LEN(8, gve_tx_compl_desc);
+
+#define GVE_COMPL_TYPE_DQO_PKT 0x2 /* Packet completion */
+#define GVE_COMPL_TYPE_DQO_DESC 0x4 /* Descriptor completion */
+#define GVE_COMPL_TYPE_DQO_MISS 0x1 /* Miss path completion */
+#define GVE_COMPL_TYPE_DQO_REINJECTION 0x3 /* Re-injection completion */
+
+/* Descriptor to post buffers to HW on buffer queue. */
+struct gve_rx_desc_dqo {
+	__le16 buf_id; /* ID returned in Rx completion descriptor */
+	__le16 reserved0;
+	__le32 reserved1;
+	__le64 buf_addr; /* DMA address of the buffer */
+	__le64 header_buf_addr;
+	__le64 reserved2;
+} __packed;
+GVE_CHECK_STRUCT_LEN(32, gve_rx_desc_dqo);
+
+/* Descriptor for HW to notify SW of new packets received on RX queue. */
+struct gve_rx_compl_desc_dqo {
+	/* Must be 1 */
+	u8 rxdid: 4;
+	u8 reserved0: 4;
+
+	/* Packet originated from this system rather than the network. */
+	u8 loopback: 1;
+	/* Set when IPv6 packet contains a destination options header or routing
+	 * header.
+	 */
+	u8 ipv6_ex_add: 1;
+	/* Invalid packet was received. */
+	u8 rx_error: 1;
+	u8 reserved1: 5;
+
+	u16 packet_type: 10;
+	u16 ip_hdr_err: 1;
+	u16 udp_len_err: 1;
+	u16 raw_cs_invalid: 1;
+	u16 reserved2: 3;
+
+	u16 packet_len: 14;
+	/* Flipped by HW to notify the descriptor is populated. */
+	u16 generation: 1;
+	/* Should be zero. */
+	u16 buffer_queue_id: 1;
+
+	u16 header_len: 10;
+	u16 rsc: 1;
+	u16 split_header: 1;
+	u16 reserved3: 4;
+
+	u8 descriptor_done: 1;
+	u8 end_of_packet: 1;
+	u8 header_buffer_overflow: 1;
+	u8 l3_l4_processed: 1;
+	u8 csum_ip_err: 1;
+	u8 csum_l4_err: 1;
+	u8 csum_external_ip_err: 1;
+	u8 csum_external_udp_err: 1;
+
+	u8 status_error1;
+
+	__le16 reserved5;
+	__le16 buf_id; /* Buffer ID which was sent on the buffer queue. */
+
+	union {
+		/* Packet checksum. */
+		__le16 raw_cs;
+		/* Segment length for RSC packets. */
+		__le16 rsc_seg_len;
+	};
+	__le32 hash;
+	__le32 reserved6;
+	__le64 reserved7;
+} __packed;
+
+GVE_CHECK_STRUCT_LEN(32, gve_rx_compl_desc_dqo);
+
+/* Ringing the doorbell too often can hurt performance.
+ *
+ * HW requires this value to be at least 8.
+ */
+#define GVE_RX_BUF_THRESH_DQO 32
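+
+/* Illustration (not part of the base code): a buffer-queue refill path
+ * would batch newly posted buffers and only write the doorbell once the
+ * threshold is crossed, e.g.
+ *
+ *   if (nb_posted >= GVE_RX_BUF_THRESH_DQO) {
+ *           write_doorbell(rxq, bufq_tail);
+ *           nb_posted = 0;
+ *   }
+ *
+ * (write_doorbell, rxq, bufq_tail and nb_posted are illustrative names.)
+ */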
+
+#endif /* _GVE_DESC_DQO_H_ */
diff --git a/drivers/net/gve/gve_register.h b/drivers/net/gve/gve_register.h
new file mode 100644
index 0000000000..b65f336be2
--- /dev/null
+++ b/drivers/net/gve/gve_register.h
@@ -0,0 +1,28 @@
+/* SPDX-License-Identifier: MIT
+ * Google Virtual Ethernet (gve) driver
+ * Version: 1.3.0
+ * Copyright (C) 2015-2022 Google, Inc.
+ */
+
+#ifndef _GVE_REGISTER_H_
+#define _GVE_REGISTER_H_
+
+/* Fixed Configuration Registers */
+struct gve_registers {
+	__be32	device_status;
+	__be32	driver_status;
+	__be32	max_tx_queues;
+	__be32	max_rx_queues;
+	__be32	adminq_pfn;
+	__be32	adminq_doorbell;
+	__be32	adminq_event_counter;
+	u8	reserved[3];
+	u8	driver_version;
+};
+
+enum gve_device_status_flags {
+	GVE_DEVICE_STATUS_RESET_MASK		= BIT(1),
+	GVE_DEVICE_STATUS_LINK_STATUS_MASK	= BIT(2),
+	GVE_DEVICE_STATUS_REPORT_STATS_MASK	= BIT(3),
+};
+#endif /* _GVE_REGISTER_H_ */
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 192+ messages in thread

* RE: [PATCH v2 08/10] net/gve: add support to get dev info and configure dev
  2022-09-01 17:23       ` Ferruh Yigit
@ 2022-09-23  9:38         ` Guo, Junfeng
  0 siblings, 0 replies; 192+ messages in thread
From: Guo, Junfeng @ 2022-09-23  9:38 UTC (permalink / raw)
  To: Ferruh Yigit, Zhang, Qi Z, Wu, Jingjing
  Cc: dev, Li, Xiaoyun, awogbemila, Richardson, Bruce



> -----Original Message-----
> From: Ferruh Yigit <ferruh.yigit@xilinx.com>
> Sent: Friday, September 2, 2022 01:24
> To: Guo, Junfeng <junfeng.guo@intel.com>; Zhang, Qi Z
> <qi.z.zhang@intel.com>; Wu, Jingjing <jingjing.wu@intel.com>
> Cc: dev@dpdk.org; Li, Xiaoyun <xiaoyun.li@intel.com>;
> awogbemila@google.com; Richardson, Bruce
> <bruce.richardson@intel.com>
> Subject: Re: [PATCH v2 08/10] net/gve: add support to get dev info and
> configure dev
> 
> On 8/29/2022 9:41 AM, Junfeng Guo wrote:
> 
> >
> > Add dev_ops dev_infos_get.
> > Complete dev_configure with RX offloads configuration.
> >
> > Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com>
> > Signed-off-by: Junfeng Guo <junfeng.guo@intel.com>
> > ---
> >   drivers/net/gve/gve.h        |  3 ++
> >   drivers/net/gve/gve_ethdev.c | 61
> ++++++++++++++++++++++++++++++++++++
> >   2 files changed, 64 insertions(+)
> >
> > diff --git a/drivers/net/gve/gve.h b/drivers/net/gve/gve.h
> > index 7f4d0e37f3..004e0a75ca 100644
> > --- a/drivers/net/gve/gve.h
> > +++ b/drivers/net/gve/gve.h
> > @@ -27,6 +27,9 @@
> >   #define GVE_DEFAULT_TX_FREE_THRESH  256
> >   #define GVE_TX_MAX_FREE_SZ          512
> >
> > +#define GVE_MIN_BUF_SIZE           1024
> > +#define GVE_MAX_RX_PKTLEN          65535
> > +
> >   /* PTYPEs are always 10 bits. */
> >   #define GVE_NUM_PTYPES 1024
> >
> > diff --git a/drivers/net/gve/gve_ethdev.c
> b/drivers/net/gve/gve_ethdev.c
> > index 5ebe2c30ea..6bc7bf4519 100644
> > --- a/drivers/net/gve/gve_ethdev.c
> > +++ b/drivers/net/gve/gve_ethdev.c
> > @@ -96,6 +96,14 @@ gve_free_qpls(struct gve_priv *priv)
> >   static int
> >   gve_dev_configure(__rte_unused struct rte_eth_dev *dev)
> >   {
> > +       struct gve_priv *priv = dev->data->dev_private;
> > +
> > +       if (dev->data->dev_conf.rxmode.mq_mode &
> RTE_ETH_MQ_RX_RSS_FLAG)
> > +               dev->data->dev_conf.rxmode.offloads |=
> RTE_ETH_RX_OFFLOAD_RSS_HASH;
> > +
> > +       if (dev->data->dev_conf.rxmode.offloads &
> RTE_ETH_RX_OFFLOAD_TCP_LRO)
> > +               priv->enable_lsc = 1;
> 
> What is the relation between LRO and LSC? Is it a typo?

Yes, this is just a typo; it should be 'enable_rsc', which indicates
Receive Segment Coalescing for TCP Large Receive Offload.
Thanks for pointing this out!
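
The hunk above would then read along these lines (just a sketch; the
actual change will be in the next version):

	if (dev->data->dev_conf.rxmode.offloads & RTE_ETH_RX_OFFLOAD_TCP_LRO)
		priv->enable_rsc = 1;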

> 
> And does driver support LSC at all? Or any interrupt?

It looks like the current base code does not provide these functions yet.

> 
> > +
> >          return 0;
> >   }
> >
> > @@ -266,6 +274,58 @@ gve_dev_close(struct rte_eth_dev *dev)
> >          return err;
> >   }
> >
> > +static int
> > +gve_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info
> *dev_info)
> > +{
> > +       struct gve_priv *priv = dev->data->dev_private;
> > +
> > +       dev_info->device = dev->device;
> > +       dev_info->max_mac_addrs = 1;
> > +       dev_info->max_rx_queues = priv->max_nb_rxq;
> > +       dev_info->max_tx_queues = priv->max_nb_txq;
> > +       dev_info->min_rx_bufsize = GVE_MIN_BUF_SIZE;
> > +       dev_info->max_rx_pktlen = GVE_MAX_RX_PKTLEN;
> > +
> 
> Can you please provide 'max_mtu' & 'min_mtu' values too?

Yes, we will add these in the next version of the patchset. Thanks!
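
Probably something along these lines in gve_dev_info_get() (a sketch only;
the exact bounds will be confirmed in the next version):

	dev_info->min_mtu = RTE_ETHER_MIN_MTU;
	dev_info->max_mtu = priv->max_mtu;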


^ permalink raw reply	[flat|nested] 192+ messages in thread

* [PATCH v3 2/9] net/gve: add logs and OS specific implementation
  2022-09-23  9:38       ` [PATCH v3 0/9] introduce GVE PMD Junfeng Guo
  2022-09-23  9:38         ` [PATCH v3 1/9] net/gve: introduce GVE PMD base code Junfeng Guo
@ 2022-09-23  9:38         ` Junfeng Guo
  2022-09-23 19:01           ` Stephen Hemminger
  2022-09-23  9:38         ` [PATCH v3 3/9] net/gve: support device initialization Junfeng Guo
                           ` (6 subsequent siblings)
  8 siblings, 1 reply; 192+ messages in thread
From: Junfeng Guo @ 2022-09-23  9:38 UTC (permalink / raw)
  To: qi.z.zhang, jingjing.wu
  Cc: ferruh.yigit, dev, xiaoyun.li, awogbemila, bruce.richardson,
	xueqin.lin, junfeng.guo, Haiyue Wang

Add GVE PMD logs.
Add some macro definitions and memory operations that are specific
to DPDK.

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com>
Signed-off-by: Junfeng Guo <junfeng.guo@intel.com>
---
 drivers/net/gve/gve_adminq.h   |   2 +
 drivers/net/gve/gve_desc.h     |   2 +
 drivers/net/gve/gve_desc_dqo.h |   2 +
 drivers/net/gve/gve_logs.h     |  14 +++
 drivers/net/gve/gve_osdep.h    | 159 +++++++++++++++++++++++++++++++++
 drivers/net/gve/gve_register.h |   2 +
 6 files changed, 181 insertions(+)
 create mode 100644 drivers/net/gve/gve_logs.h
 create mode 100644 drivers/net/gve/gve_osdep.h

diff --git a/drivers/net/gve/gve_adminq.h b/drivers/net/gve/gve_adminq.h
index c7114cc883..cd496760ae 100644
--- a/drivers/net/gve/gve_adminq.h
+++ b/drivers/net/gve/gve_adminq.h
@@ -8,6 +8,8 @@
 #ifndef _GVE_ADMINQ_H
 #define _GVE_ADMINQ_H
 
+#include "gve_osdep.h"
+
 /* Admin queue opcodes */
 enum gve_adminq_opcodes {
 	GVE_ADMINQ_DESCRIBE_DEVICE		= 0x1,
diff --git a/drivers/net/gve/gve_desc.h b/drivers/net/gve/gve_desc.h
index 358755b7e0..627b9120dc 100644
--- a/drivers/net/gve/gve_desc.h
+++ b/drivers/net/gve/gve_desc.h
@@ -9,6 +9,8 @@
 #ifndef _GVE_DESC_H_
 #define _GVE_DESC_H_
 
+#include "gve_osdep.h"
+
 /* A note on seg_addrs
  *
  * Base addresses encoded in seg_addr are not assumed to be physical
diff --git a/drivers/net/gve/gve_desc_dqo.h b/drivers/net/gve/gve_desc_dqo.h
index 0d533abcd1..5031752b43 100644
--- a/drivers/net/gve/gve_desc_dqo.h
+++ b/drivers/net/gve/gve_desc_dqo.h
@@ -9,6 +9,8 @@
 #ifndef _GVE_DESC_DQO_H_
 #define _GVE_DESC_DQO_H_
 
+#include "gve_osdep.h"
+
 #define GVE_TX_MAX_HDR_SIZE_DQO 255
 #define GVE_TX_MIN_TSO_MSS_DQO 88
 
diff --git a/drivers/net/gve/gve_logs.h b/drivers/net/gve/gve_logs.h
new file mode 100644
index 0000000000..0d02da46e1
--- /dev/null
+++ b/drivers/net/gve/gve_logs.h
@@ -0,0 +1,14 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2022 Intel Corporation
+ */
+
+#ifndef _GVE_LOGS_H_
+#define _GVE_LOGS_H_
+
+extern int gve_logtype_driver;
+
+#define PMD_DRV_LOG(level, fmt, args...) \
+	rte_log(RTE_LOG_ ## level, gve_logtype_driver, "%s(): " fmt "\n", \
+		__func__, ## args)
+
+#endif
diff --git a/drivers/net/gve/gve_osdep.h b/drivers/net/gve/gve_osdep.h
new file mode 100644
index 0000000000..ba882038f5
--- /dev/null
+++ b/drivers/net/gve/gve_osdep.h
@@ -0,0 +1,159 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2022 Intel Corporation
+ */
+
+#ifndef _GVE_OSDEP_H_
+#define _GVE_OSDEP_H_
+
+#include <string.h>
+#include <stdint.h>
+#include <stdio.h>
+#include <stdarg.h>
+#include <inttypes.h>
+#include <stdbool.h>
+
+#include <rte_bitops.h>
+#include <rte_byteorder.h>
+#include <rte_common.h>
+#include <rte_ether.h>
+#include <rte_io.h>
+#include <rte_log.h>
+#include <rte_malloc.h>
+#include <rte_memcpy.h>
+#include <rte_memzone.h>
+
+#include "gve_logs.h"
+
+typedef uint8_t u8;
+typedef uint16_t u16;
+typedef uint32_t u32;
+typedef uint64_t u64;
+
+typedef rte_be16_t __sum16;
+
+typedef rte_be16_t __be16;
+typedef rte_be32_t __be32;
+typedef rte_be64_t __be64;
+
+typedef rte_iova_t dma_addr_t;
+
+#define ETH_MIN_MTU	RTE_ETHER_MIN_MTU
+#define ETH_ALEN	RTE_ETHER_ADDR_LEN
+
+#ifndef PAGE_SHIFT
+#define PAGE_SHIFT	12
+#endif
+#ifndef PAGE_SIZE
+#define PAGE_SIZE	(1UL << PAGE_SHIFT)
+#endif
+
+#define BIT(nr)		RTE_BIT32(nr)
+
+#define be16_to_cpu(x) rte_be_to_cpu_16(x)
+#define be32_to_cpu(x) rte_be_to_cpu_32(x)
+#define be64_to_cpu(x) rte_be_to_cpu_64(x)
+
+#define cpu_to_be16(x) rte_cpu_to_be_16(x)
+#define cpu_to_be32(x) rte_cpu_to_be_32(x)
+#define cpu_to_be64(x) rte_cpu_to_be_64(x)
+
+#define READ_ONCE32(x) rte_read32(&(x))
+
+#ifndef ____cacheline_aligned
+#define ____cacheline_aligned	__rte_cache_aligned
+#endif
+#ifndef __packed
+#define __packed		__rte_packed
+#endif
+#define __iomem
+
+#define msleep(ms)		rte_delay_ms(ms)
+
+/* These macros are used to generate compilation errors if a struct/union
+ * is not exactly the correct length. It gives a divide by zero error if
+ * the struct/union is not of the correct size, otherwise it creates an
+ * enum that is never used.
+ */
+#define GVE_CHECK_STRUCT_LEN(n, X) enum gve_static_assert_enum_##X \
+	{ gve_static_assert_##X = (n) / ((sizeof(struct X) == (n)) ? 1 : 0) }
+#define GVE_CHECK_UNION_LEN(n, X) enum gve_static_asset_enum_##X \
+	{ gve_static_assert_##X = (n) / ((sizeof(union X) == (n)) ? 1 : 0) }
+
+static __rte_always_inline u8
+readb(volatile void *addr)
+{
+	return rte_read8(addr);
+}
+
+static __rte_always_inline void
+writeb(u8 value, volatile void *addr)
+{
+	rte_write8(value, addr);
+}
+
+static __rte_always_inline void
+writel(u32 value, volatile void *addr)
+{
+	rte_write32(value, addr);
+}
+
+static __rte_always_inline u32
+ioread32be(const volatile void *addr)
+{
+	return rte_be_to_cpu_32(rte_read32(addr));
+}
+
+static __rte_always_inline void
+iowrite32be(u32 value, volatile void *addr)
+{
+	writel(rte_cpu_to_be_32(value), addr);
+}
+
+/* DMA memory allocation tracking */
+struct gve_dma_mem {
+	void *va;
+	rte_iova_t pa;
+	uint32_t size;
+	const void *zone;
+};
+
+static inline void *
+gve_alloc_dma_mem(struct gve_dma_mem *mem, u64 size)
+{
+	static uint16_t gve_dma_memzone_id;
+	const struct rte_memzone *mz = NULL;
+	char z_name[RTE_MEMZONE_NAMESIZE];
+
+	if (!mem)
+		return NULL;
+
+	snprintf(z_name, sizeof(z_name), "gve_dma_%u",
+		 __atomic_fetch_add(&gve_dma_memzone_id, 1, __ATOMIC_RELAXED));
+	mz = rte_memzone_reserve_aligned(z_name, size, SOCKET_ID_ANY,
+					 RTE_MEMZONE_IOVA_CONTIG,
+					 PAGE_SIZE);
+	if (!mz)
+		return NULL;
+
+	mem->size = size;
+	mem->va = mz->addr;
+	mem->pa = mz->iova;
+	mem->zone = mz;
+	PMD_DRV_LOG(DEBUG, "memzone %s is allocated", mz->name);
+
+	return mem->va;
+}
+
+static inline void
+gve_free_dma_mem(struct gve_dma_mem *mem)
+{
+	PMD_DRV_LOG(DEBUG, "memzone %s to be freed",
+		    ((const struct rte_memzone *)mem->zone)->name);
+
+	rte_memzone_free(mem->zone);
+	mem->zone = NULL;
+	mem->va = NULL;
+	mem->pa = 0;
+}
+
+#endif /* _GVE_OSDEP_H_ */
diff --git a/drivers/net/gve/gve_register.h b/drivers/net/gve/gve_register.h
index b65f336be2..a599c1a08e 100644
--- a/drivers/net/gve/gve_register.h
+++ b/drivers/net/gve/gve_register.h
@@ -7,6 +7,8 @@
 #ifndef _GVE_REGISTER_H_
 #define _GVE_REGISTER_H_
 
+#include "gve_osdep.h"
+
 /* Fixed Configuration Registers */
 struct gve_registers {
 	__be32	device_status;
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 192+ messages in thread

* [PATCH v3 3/9] net/gve: support device initialization
  2022-09-23  9:38       ` [PATCH v3 0/9] introduce GVE PMD Junfeng Guo
  2022-09-23  9:38         ` [PATCH v3 1/9] net/gve: introduce GVE PMD base code Junfeng Guo
  2022-09-23  9:38         ` [PATCH v3 2/9] net/gve: add logs and OS specific implementation Junfeng Guo
@ 2022-09-23  9:38         ` Junfeng Guo
  2022-09-23  9:38         ` [PATCH v3 4/9] net/gve: add link update support Junfeng Guo
                           ` (5 subsequent siblings)
  8 siblings, 0 replies; 192+ messages in thread
From: Junfeng Guo @ 2022-09-23  9:38 UTC (permalink / raw)
  To: qi.z.zhang, jingjing.wu
  Cc: ferruh.yigit, dev, xiaoyun.li, awogbemila, bruce.richardson,
	xueqin.lin, junfeng.guo, Haiyue Wang

Support device init and the following dev_ops:
- dev_configure
- dev_start
- dev_stop
- dev_close

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com>
Signed-off-by: Junfeng Guo <junfeng.guo@intel.com>
---
 MAINTAINERS                            |   6 +
 doc/guides/nics/features/gve.ini       |  10 +
 doc/guides/nics/gve.rst                |  69 +++++
 doc/guides/nics/index.rst              |   1 +
 doc/guides/rel_notes/release_22_11.rst |   5 +
 drivers/net/gve/gve_adminq.c           |   1 +
 drivers/net/gve/gve_ethdev.c           | 371 +++++++++++++++++++++++++
 drivers/net/gve/gve_ethdev.h           | 225 +++++++++++++++
 drivers/net/gve/meson.build            |  13 +
 drivers/net/gve/version.map            |   3 +
 drivers/net/meson.build                |   1 +
 11 files changed, 705 insertions(+)
 create mode 100644 doc/guides/nics/features/gve.ini
 create mode 100644 doc/guides/nics/gve.rst
 create mode 100644 drivers/net/gve/gve_ethdev.c
 create mode 100644 drivers/net/gve/gve_ethdev.h
 create mode 100644 drivers/net/gve/meson.build
 create mode 100644 drivers/net/gve/version.map

diff --git a/MAINTAINERS b/MAINTAINERS
index 32ffdd1a61..474f41f0de 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -697,6 +697,12 @@ F: drivers/net/enic/
 F: doc/guides/nics/enic.rst
 F: doc/guides/nics/features/enic.ini
 
+Google Virtual Ethernet
+M: Junfeng Guo <junfeng.guo@intel.com>
+F: drivers/net/gve/
+F: doc/guides/nics/gve.rst
+F: doc/guides/nics/features/gve.ini
+
 Hisilicon hns3
 M: Dongdong Liu <liudongdong3@huawei.com>
 M: Yisen Zhuang <yisen.zhuang@huawei.com>
diff --git a/doc/guides/nics/features/gve.ini b/doc/guides/nics/features/gve.ini
new file mode 100644
index 0000000000..44aec28009
--- /dev/null
+++ b/doc/guides/nics/features/gve.ini
@@ -0,0 +1,10 @@
+;
+; Supported features of the Google Virtual Ethernet 'gve' poll mode driver.
+;
+; Refer to default.ini for the full list of available PMD features.
+;
+[Features]
+Linux                = Y
+x86-32               = Y
+x86-64               = Y
+Usage doc            = Y
diff --git a/doc/guides/nics/gve.rst b/doc/guides/nics/gve.rst
new file mode 100644
index 0000000000..e93a0a6338
--- /dev/null
+++ b/doc/guides/nics/gve.rst
@@ -0,0 +1,69 @@
+..  SPDX-License-Identifier: BSD-3-Clause
+    Copyright(C) 2022 Intel Corporation.
+
+GVE poll mode driver
+=======================
+
+The GVE PMD (**librte_net_gve**) provides poll mode driver support for
+the Google Virtual Ethernet device.
+
+Please refer to https://cloud.google.com/compute/docs/networking/using-gvnic
+for the device description.
+
+The base code is under MIT license and based on GVE kernel driver v1.3.0.
+GVE base code files are:
+
+- gve_adminq.h
+- gve_adminq.c
+- gve_desc.h
+- gve_desc_dqo.h
+- gve_register.h
+- gve.h
+
+Please refer to https://github.com/GoogleCloudPlatform/compute-virtual-ethernet-linux/tree/v1.3.0/google/gve
+to find the original base code.
+
+GVE has 3 queue formats:
+
+- GQI_QPL - GQI with queue page list
+- GQI_RDA - GQI with raw DMA addressing
+- DQO_RDA - DQO with raw DMA addressing
+
+GQI_QPL is the queue page list mode. The driver must first allocate memory
+and register it with the hardware (Google Hypervisor/GVE Backend) as a
+Queue Page List (QPL); each queue has its own QPL. On Tx, the driver
+copies the packet into the QPL memory and writes the packet's offset
+within the QPL into the hardware descriptor so that the hardware can
+fetch the packet data. On Rx, the driver reads the offset from the
+descriptor, locates the buffer in the QPL, and copies the packet out.
+
+GQI_RDA queue format works like a conventional NIC: the driver puts the
+packets' physical addresses directly into the hardware descriptors.
+
+DQO_RDA queue format has a submission and completion queue pair for each
+Tx/Rx queue. As with GQI_RDA, the driver puts the packets' physical
+addresses directly into the hardware descriptors.
+
+Please refer to https://www.kernel.org/doc/html/latest/networking/device_drivers/ethernet/google/gve.html
+to get more information about GVE queue formats.
+
+Features and Limitations
+------------------------
+
+In this release, the GVE PMD provides the basic functionality of packet
+reception and transmission.
+Supported features of the GVE PMD are:
+
+- Multiple queues for TX and RX
+- Receiver Side Scaling (RSS)
+- TSO offload
+- Port hardware statistics
+- Link state information
+- TX multi-segments (Scatter TX)
+- Tx UDP/TCP/SCTP Checksum
+
+Currently, only the GQI_QPL and GQI_RDA queue formats are supported in the
+PMD. Jumbo frames are not supported yet; support will be added in a future
+DPDK release.
+Also, only the GQI_QPL queue format is in use on GCP, since GQI_RDA has not
+been released in production.
diff --git a/doc/guides/nics/index.rst b/doc/guides/nics/index.rst
index f48e9f815c..64388adad0 100644
--- a/doc/guides/nics/index.rst
+++ b/doc/guides/nics/index.rst
@@ -29,6 +29,7 @@ Network Interface Controller Drivers
     enetfec
     enic
     fm10k
+    gve
     hinic
     hns3
     i40e
diff --git a/doc/guides/rel_notes/release_22_11.rst b/doc/guides/rel_notes/release_22_11.rst
index bb77a03e24..20d9dcaafd 100644
--- a/doc/guides/rel_notes/release_22_11.rst
+++ b/doc/guides/rel_notes/release_22_11.rst
@@ -59,6 +59,11 @@ New Features
 
   * Added flow subscription support.
 
+* **Added GVE net PMD**
+
+  * Added the new ``gve`` net driver for Google Virtual Ethernet devices.
+  * See the :doc:`../nics/gve` NIC guide for more details on this new driver.
+
 
 Removed Items
 -------------
diff --git a/drivers/net/gve/gve_adminq.c b/drivers/net/gve/gve_adminq.c
index 06f2ac2315..dcc573c077 100644
--- a/drivers/net/gve/gve_adminq.c
+++ b/drivers/net/gve/gve_adminq.c
@@ -5,6 +5,7 @@
  * Copyright(C) 2022 Intel Corporation
  */
 
+#include "gve_ethdev.h"
 #include "gve_adminq.h"
 #include "gve_register.h"
 
diff --git a/drivers/net/gve/gve_ethdev.c b/drivers/net/gve/gve_ethdev.c
new file mode 100644
index 0000000000..4bb73b188d
--- /dev/null
+++ b/drivers/net/gve/gve_ethdev.c
@@ -0,0 +1,371 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2022 Intel Corporation
+ */
+#include <linux/pci_regs.h>
+
+#include "gve_ethdev.h"
+#include "gve_adminq.h"
+#include "gve_register.h"
+
+const char gve_version_str[] = GVE_VERSION;
+static const char gve_version_prefix[] = GVE_VERSION_PREFIX;
+
+static void
+gve_write_version(uint8_t *driver_version_register)
+{
+	const char *c = gve_version_prefix;
+
+	while (*c) {
+		writeb(*c, driver_version_register);
+		c++;
+	}
+
+	c = gve_version_str;
+	while (*c) {
+		writeb(*c, driver_version_register);
+		c++;
+	}
+	writeb('\n', driver_version_register);
+}
+
+static int
+gve_dev_configure(__rte_unused struct rte_eth_dev *dev)
+{
+	return 0;
+}
+
+static int
+gve_dev_start(struct rte_eth_dev *dev)
+{
+	dev->data->dev_started = 1;
+
+	return 0;
+}
+
+static int
+gve_dev_stop(struct rte_eth_dev *dev)
+{
+	dev->data->dev_link.link_status = RTE_ETH_LINK_DOWN;
+	dev->data->dev_started = 0;
+
+	return 0;
+}
+
+static int
+gve_dev_close(struct rte_eth_dev *dev)
+{
+	int err = 0;
+
+	if (dev->data->dev_started) {
+		err = gve_dev_stop(dev);
+		if (err != 0)
+			PMD_DRV_LOG(ERR, "Failed to stop dev.");
+	}
+
+	return err;
+}
+
+static const struct eth_dev_ops gve_eth_dev_ops = {
+	.dev_configure        = gve_dev_configure,
+	.dev_start            = gve_dev_start,
+	.dev_stop             = gve_dev_stop,
+	.dev_close            = gve_dev_close,
+};
+
+static void
+gve_free_counter_array(struct gve_priv *priv)
+{
+	rte_memzone_free(priv->cnt_array_mz);
+	priv->cnt_array = NULL;
+}
+
+static void
+gve_free_irq_db(struct gve_priv *priv)
+{
+	rte_memzone_free(priv->irq_dbs_mz);
+	priv->irq_dbs = NULL;
+}
+
+static void
+gve_teardown_device_resources(struct gve_priv *priv)
+{
+	int err;
+
+	/* Tell device its resources are being freed */
+	if (gve_get_device_resources_ok(priv)) {
+		err = gve_adminq_deconfigure_device_resources(priv);
+		if (err)
+			PMD_DRV_LOG(ERR, "Could not deconfigure device resources: err=%d\n", err);
+	}
+	gve_free_counter_array(priv);
+	gve_free_irq_db(priv);
+	gve_clear_device_resources_ok(priv);
+}
+
+static uint8_t
+pci_dev_find_capability(struct rte_pci_device *pdev, int cap)
+{
+	uint8_t pos, id;
+	uint16_t ent;
+	int loops;
+	int ret;
+
+	ret = rte_pci_read_config(pdev, &pos, sizeof(pos), PCI_CAPABILITY_LIST);
+	if (ret != sizeof(pos))
+		return 0;
+
+	loops = (PCI_CFG_SPACE_SIZE - PCI_STD_HEADER_SIZEOF) / PCI_CAP_SIZEOF;
+
+	while (pos && loops--) {
+		ret = rte_pci_read_config(pdev, &ent, sizeof(ent), pos);
+		if (ret != sizeof(ent))
+			return 0;
+
+		id = ent & 0xff;
+		if (id == 0xff)
+			break;
+
+		if (id == cap)
+			return pos;
+
+		pos = (ent >> 8);
+	}
+
+	return 0;
+}
+
+static int
+pci_dev_msix_vec_count(struct rte_pci_device *pdev)
+{
+	uint8_t msix_cap = pci_dev_find_capability(pdev, PCI_CAP_ID_MSIX);
+	uint16_t control;
+	int ret;
+
+	if (!msix_cap)
+		return 0;
+
+	ret = rte_pci_read_config(pdev, &control, sizeof(control), msix_cap + PCI_MSIX_FLAGS);
+	if (ret != sizeof(control))
+		return 0;
+
+	return (control & PCI_MSIX_FLAGS_QSIZE) + 1;
+}
+
+static int
+gve_setup_device_resources(struct gve_priv *priv)
+{
+	char z_name[RTE_MEMZONE_NAMESIZE];
+	const struct rte_memzone *mz;
+	int err = 0;
+
+	snprintf(z_name, sizeof(z_name), "gve_%s_cnt_arr", priv->pci_dev->device.name);
+	mz = rte_memzone_reserve_aligned(z_name,
+					 priv->num_event_counters * sizeof(*priv->cnt_array),
+					 rte_socket_id(), RTE_MEMZONE_IOVA_CONTIG,
+					 PAGE_SIZE);
+	if (mz == NULL) {
+		PMD_DRV_LOG(ERR, "Could not alloc memzone for count array");
+		return -ENOMEM;
+	}
+	priv->cnt_array = (rte_be32_t *)mz->addr;
+	priv->cnt_array_mz = mz;
+
+	snprintf(z_name, sizeof(z_name), "gve_%s_irqmz", priv->pci_dev->device.name);
+	mz = rte_memzone_reserve_aligned(z_name,
+					 sizeof(*priv->irq_dbs) * (priv->num_ntfy_blks),
+					 rte_socket_id(), RTE_MEMZONE_IOVA_CONTIG,
+					 PAGE_SIZE);
+	if (mz == NULL) {
+		PMD_DRV_LOG(ERR, "Could not alloc memzone for irq_dbs");
+		err = -ENOMEM;
+		goto free_cnt_array;
+	}
+	priv->irq_dbs = (struct gve_irq_db *)mz->addr;
+	priv->irq_dbs_mz = mz;
+
+	err = gve_adminq_configure_device_resources(priv,
+						    priv->cnt_array_mz->iova,
+						    priv->num_event_counters,
+						    priv->irq_dbs_mz->iova,
+						    priv->num_ntfy_blks);
+	if (unlikely(err)) {
+		PMD_DRV_LOG(ERR, "Could not config device resources: err=%d", err);
+		goto free_irq_dbs;
+	}
+	return 0;
+
+free_irq_dbs:
+	gve_free_irq_db(priv);
+free_cnt_array:
+	gve_free_counter_array(priv);
+
+	return err;
+}
+
+static int
+gve_init_priv(struct gve_priv *priv, bool skip_describe_device)
+{
+	int num_ntfy;
+	int err;
+
+	/* Set up the adminq */
+	err = gve_adminq_alloc(priv);
+	if (err) {
+		PMD_DRV_LOG(ERR, "Failed to alloc admin queue: err=%d", err);
+		return err;
+	}
+
+	if (skip_describe_device)
+		goto setup_device;
+
+	/* Get the initial information we need from the device */
+	err = gve_adminq_describe_device(priv);
+	if (err) {
+		PMD_DRV_LOG(ERR, "Could not get device information: err=%d", err);
+		goto free_adminq;
+	}
+
+	num_ntfy = pci_dev_msix_vec_count(priv->pci_dev);
+	if (num_ntfy <= 0) {
+		PMD_DRV_LOG(ERR, "Could not count MSI-x vectors");
+		err = -EIO;
+		goto free_adminq;
+	} else if (num_ntfy < GVE_MIN_MSIX) {
+		PMD_DRV_LOG(ERR, "GVE needs at least %d MSI-x vectors, but only has %d",
+			    GVE_MIN_MSIX, num_ntfy);
+		err = -EINVAL;
+		goto free_adminq;
+	}
+
+	priv->num_registered_pages = 0;
+
+	/* gvnic has one Notification Block per MSI-x vector, except for the
+	 * management vector
+	 */
+	priv->num_ntfy_blks = (num_ntfy - 1) & ~0x1;
+	priv->mgmt_msix_idx = priv->num_ntfy_blks;
+
+	priv->max_nb_txq = RTE_MIN(priv->max_nb_txq, priv->num_ntfy_blks / 2);
+	priv->max_nb_rxq = RTE_MIN(priv->max_nb_rxq, priv->num_ntfy_blks / 2);
+
+	if (priv->default_num_queues > 0) {
+		priv->max_nb_txq = RTE_MIN(priv->default_num_queues, priv->max_nb_txq);
+		priv->max_nb_rxq = RTE_MIN(priv->default_num_queues, priv->max_nb_rxq);
+	}
+
+	PMD_DRV_LOG(INFO, "Max TX queues %d, Max RX queues %d",
+		    priv->max_nb_txq, priv->max_nb_rxq);
+
+setup_device:
+	err = gve_setup_device_resources(priv);
+	if (!err)
+		return 0;
+free_adminq:
+	gve_adminq_free(priv);
+	return err;
+}
+
+static void
+gve_teardown_priv_resources(struct gve_priv *priv)
+{
+	gve_teardown_device_resources(priv);
+	gve_adminq_free(priv);
+}
+
+static int
+gve_dev_init(struct rte_eth_dev *eth_dev)
+{
+	struct gve_priv *priv = eth_dev->data->dev_private;
+	int max_tx_queues, max_rx_queues;
+	struct rte_pci_device *pci_dev;
+	struct gve_registers *reg_bar;
+	rte_be32_t *db_bar;
+	int err;
+
+	eth_dev->dev_ops = &gve_eth_dev_ops;
+
+	if (rte_eal_process_type() != RTE_PROC_PRIMARY)
+		return 0;
+
+	pci_dev = RTE_DEV_TO_PCI(eth_dev->device);
+
+	reg_bar = pci_dev->mem_resource[GVE_REG_BAR].addr;
+	if (!reg_bar) {
+		PMD_DRV_LOG(ERR, "Failed to map pci bar!");
+		return -ENOMEM;
+	}
+
+	db_bar = pci_dev->mem_resource[GVE_DB_BAR].addr;
+	if (!db_bar) {
+		PMD_DRV_LOG(ERR, "Failed to map doorbell bar!");
+		return -ENOMEM;
+	}
+
+	gve_write_version(&reg_bar->driver_version);
+	/* Get max queues to alloc etherdev */
+	max_tx_queues = ioread32be(&reg_bar->max_tx_queues);
+	max_rx_queues = ioread32be(&reg_bar->max_rx_queues);
+
+	priv->reg_bar0 = reg_bar;
+	priv->db_bar2 = db_bar;
+	priv->pci_dev = pci_dev;
+	priv->state_flags = 0x0;
+
+	priv->max_nb_txq = max_tx_queues;
+	priv->max_nb_rxq = max_rx_queues;
+
+	err = gve_init_priv(priv, false);
+	if (err)
+		return err;
+
+	eth_dev->data->mac_addrs = rte_zmalloc("gve_mac", sizeof(struct rte_ether_addr), 0);
+	if (!eth_dev->data->mac_addrs) {
+		PMD_DRV_LOG(ERR, "Failed to allocate memory to store mac address");
+		return -ENOMEM;
+	}
+	rte_ether_addr_copy(&priv->dev_addr, eth_dev->data->mac_addrs);
+
+	return 0;
+}
+
+static int
+gve_dev_uninit(struct rte_eth_dev *eth_dev)
+{
+	struct gve_priv *priv = eth_dev->data->dev_private;
+
+	eth_dev->data->mac_addrs = NULL;
+
+	gve_teardown_priv_resources(priv);
+
+	return 0;
+}
+
+static int
+gve_pci_probe(__rte_unused struct rte_pci_driver *pci_drv,
+	      struct rte_pci_device *pci_dev)
+{
+	return rte_eth_dev_pci_generic_probe(pci_dev, sizeof(struct gve_priv), gve_dev_init);
+}
+
+static int
+gve_pci_remove(struct rte_pci_device *pci_dev)
+{
+	return rte_eth_dev_pci_generic_remove(pci_dev, gve_dev_uninit);
+}
+
+static const struct rte_pci_id pci_id_gve_map[] = {
+	{ RTE_PCI_DEVICE(GOOGLE_VENDOR_ID, GVE_DEV_ID) },
+	{ .device_id = 0 },
+};
+
+static struct rte_pci_driver rte_gve_pmd = {
+	.id_table = pci_id_gve_map,
+	.drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC,
+	.probe = gve_pci_probe,
+	.remove = gve_pci_remove,
+};
+
+RTE_PMD_REGISTER_PCI(net_gve, rte_gve_pmd);
+RTE_PMD_REGISTER_PCI_TABLE(net_gve, pci_id_gve_map);
+RTE_PMD_REGISTER_KMOD_DEP(net_gve, "* igb_uio | vfio-pci");
+RTE_LOG_REGISTER_SUFFIX(gve_logtype_driver, driver, NOTICE);
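
Illustration only, not part of the patch: once the PCI driver above is
registered, a probed gvnic shows up as an ordinary ethdev port. A minimal
sketch of how an application would find it, using only standard ethdev calls:

    #include <stdio.h>
    #include <rte_eal.h>
    #include <rte_ethdev.h>

    int
    main(int argc, char **argv)
    {
        uint16_t port_id;

        /* PCI probing (and thus gve_pci_probe) happens inside EAL init */
        if (rte_eal_init(argc, argv) < 0)
            return -1;

        RTE_ETH_FOREACH_DEV(port_id) {
            struct rte_eth_dev_info info;

            if (rte_eth_dev_info_get(port_id, &info) == 0)
                printf("port %u: driver %s\n", port_id, info.driver_name);
        }
        return 0;
    }
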
diff --git a/drivers/net/gve/gve_ethdev.h b/drivers/net/gve/gve_ethdev.h
new file mode 100644
index 0000000000..8ab5c2c877
--- /dev/null
+++ b/drivers/net/gve/gve_ethdev.h
@@ -0,0 +1,225 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2022 Intel Corporation
+ */
+
+#ifndef _GVE_ETHDEV_H_
+#define _GVE_ETHDEV_H_
+
+#include <ethdev_driver.h>
+#include <ethdev_pci.h>
+#include <rte_ether.h>
+
+#include "gve.h"
+
+#define GVE_DEFAULT_RX_FREE_THRESH  512
+#define GVE_DEFAULT_TX_FREE_THRESH  256
+#define GVE_TX_MAX_FREE_SZ          512
+
+#define GVE_MIN_BUF_SIZE	    1024
+#define GVE_MAX_RX_PKTLEN	    65535
+
+/* A list of pages registered with the device during setup and used by a queue
+ * as buffers
+ */
+struct gve_queue_page_list {
+	uint32_t id; /* unique id */
+	uint32_t num_entries;
+	dma_addr_t *page_buses; /* the dma addrs of the pages */
+	const struct rte_memzone *mz;
+};
+
+/* A TX desc ring entry */
+union gve_tx_desc {
+	struct gve_tx_pkt_desc pkt; /* first desc for a packet */
+	struct gve_tx_seg_desc seg; /* subsequent descs for a packet */
+};
+
+struct gve_tx_queue {
+	volatile union gve_tx_desc *tx_desc_ring;
+	const struct rte_memzone *mz;
+	uint64_t tx_ring_phys_addr;
+
+	uint16_t nb_tx_desc;
+
+	/* Only valid for DQO_QPL queue format */
+	struct gve_queue_page_list *qpl;
+
+	uint16_t port_id;
+	uint16_t queue_id;
+
+	uint16_t ntfy_id;
+	volatile rte_be32_t *ntfy_addr;
+
+	struct gve_priv *hw;
+	const struct rte_memzone *qres_mz;
+	struct gve_queue_resources *qres;
+
+	/* Only valid for DQO_RDA queue format */
+	struct gve_tx_queue *complq;
+};
+
+struct gve_rx_queue {
+	volatile struct gve_rx_desc *rx_desc_ring;
+	volatile union gve_rx_data_slot *rx_data_ring;
+	const struct rte_memzone *mz;
+	const struct rte_memzone *data_mz;
+	uint64_t rx_ring_phys_addr;
+
+	uint16_t nb_rx_desc;
+
+	volatile rte_be32_t *ntfy_addr;
+
+	/* only valid for GQI_QPL queue format */
+	struct gve_queue_page_list *qpl;
+
+	struct gve_priv *hw;
+	const struct rte_memzone *qres_mz;
+	struct gve_queue_resources *qres;
+
+	uint16_t port_id;
+	uint16_t queue_id;
+	uint16_t ntfy_id;
+	uint16_t rx_buf_len;
+
+	/* Only valid for DQO_RDA queue format */
+	struct gve_rx_queue *bufq;
+};
+
+struct gve_priv {
+	struct gve_irq_db *irq_dbs; /* array of num_ntfy_blks */
+	const struct rte_memzone *irq_dbs_mz;
+	uint32_t mgmt_msix_idx;
+	rte_be32_t *cnt_array; /* array of num_event_counters */
+	const struct rte_memzone *cnt_array_mz;
+
+	uint16_t num_event_counters;
+	uint16_t tx_desc_cnt; /* txq size */
+	uint16_t rx_desc_cnt; /* rxq size */
+	uint16_t tx_pages_per_qpl; /* number of pages per tx queue page list */
+	uint16_t rx_data_slot_cnt; /* number of rx data slots per queue */
+
+	/* Only valid for DQO_RDA queue format */
+	uint16_t tx_compq_size; /* tx completion queue size */
+	uint16_t rx_bufq_size; /* rx buff queue size */
+
+	uint64_t max_registered_pages;
+	uint64_t num_registered_pages; /* num pages registered with NIC */
+	uint16_t default_num_queues; /* default num queues to set up */
+	enum gve_queue_format queue_format; /* see enum gve_queue_format */
+	uint8_t enable_rsc;
+
+	uint16_t max_nb_txq;
+	uint16_t max_nb_rxq;
+	uint32_t num_ntfy_blks; /* split between TX and RX so must be even */
+
+	struct gve_registers __iomem *reg_bar0; /* see gve_register.h */
+	rte_be32_t __iomem *db_bar2; /* "array" of doorbells */
+	struct rte_pci_device *pci_dev;
+
+	/* Admin queue - see gve_adminq.h */
+	union gve_adminq_command *adminq;
+	struct gve_dma_mem adminq_dma_mem;
+	uint32_t adminq_mask; /* masks prod_cnt to adminq size */
+	uint32_t adminq_prod_cnt; /* free-running count of AQ cmds executed */
+	uint32_t adminq_cmd_fail; /* free-running count of AQ cmds failed */
+	uint32_t adminq_timeouts; /* free-running count of AQ cmds timeouts */
+	/* free-running count of per AQ cmd executed */
+	uint32_t adminq_describe_device_cnt;
+	uint32_t adminq_cfg_device_resources_cnt;
+	uint32_t adminq_register_page_list_cnt;
+	uint32_t adminq_unregister_page_list_cnt;
+	uint32_t adminq_create_tx_queue_cnt;
+	uint32_t adminq_create_rx_queue_cnt;
+	uint32_t adminq_destroy_tx_queue_cnt;
+	uint32_t adminq_destroy_rx_queue_cnt;
+	uint32_t adminq_dcfg_device_resources_cnt;
+	uint32_t adminq_set_driver_parameter_cnt;
+	uint32_t adminq_report_stats_cnt;
+	uint32_t adminq_report_link_speed_cnt;
+	uint32_t adminq_get_ptype_map_cnt;
+
+	volatile uint32_t state_flags;
+
+	/* Gvnic device link speed from hypervisor. */
+	uint64_t link_speed;
+
+	uint16_t max_mtu;
+	struct rte_ether_addr dev_addr; /* mac address */
+
+	struct gve_queue_page_list *qpl;
+
+	struct gve_tx_queue **txqs;
+	struct gve_rx_queue **rxqs;
+};
+
+static inline bool
+gve_is_gqi(struct gve_priv *priv)
+{
+	return priv->queue_format == GVE_GQI_RDA_FORMAT ||
+		priv->queue_format == GVE_GQI_QPL_FORMAT;
+}
+
+static inline bool
+gve_get_admin_queue_ok(struct gve_priv *priv)
+{
+	return !!rte_bit_relaxed_get32(GVE_PRIV_FLAGS_ADMIN_QUEUE_OK,
+				       &priv->state_flags);
+}
+
+static inline void
+gve_set_admin_queue_ok(struct gve_priv *priv)
+{
+	rte_bit_relaxed_set32(GVE_PRIV_FLAGS_ADMIN_QUEUE_OK,
+			      &priv->state_flags);
+}
+
+static inline void
+gve_clear_admin_queue_ok(struct gve_priv *priv)
+{
+	rte_bit_relaxed_clear32(GVE_PRIV_FLAGS_ADMIN_QUEUE_OK,
+				&priv->state_flags);
+}
+
+static inline bool
+gve_get_device_resources_ok(struct gve_priv *priv)
+{
+	return !!rte_bit_relaxed_get32(GVE_PRIV_FLAGS_DEVICE_RESOURCES_OK,
+				       &priv->state_flags);
+}
+
+static inline void
+gve_set_device_resources_ok(struct gve_priv *priv)
+{
+	rte_bit_relaxed_set32(GVE_PRIV_FLAGS_DEVICE_RESOURCES_OK,
+			      &priv->state_flags);
+}
+
+static inline void
+gve_clear_device_resources_ok(struct gve_priv *priv)
+{
+	rte_bit_relaxed_clear32(GVE_PRIV_FLAGS_DEVICE_RESOURCES_OK,
+				&priv->state_flags);
+}
+
+static inline bool
+gve_get_device_rings_ok(struct gve_priv *priv)
+{
+	return !!rte_bit_relaxed_get32(GVE_PRIV_FLAGS_DEVICE_RINGS_OK,
+				       &priv->state_flags);
+}
+
+static inline void
+gve_set_device_rings_ok(struct gve_priv *priv)
+{
+	rte_bit_relaxed_set32(GVE_PRIV_FLAGS_DEVICE_RINGS_OK,
+			      &priv->state_flags);
+}
+
+static inline void
+gve_clear_device_rings_ok(struct gve_priv *priv)
+{
+	rte_bit_relaxed_clear32(GVE_PRIV_FLAGS_DEVICE_RINGS_OK,
+				&priv->state_flags);
+}
+
+#endif /* _GVE_ETHDEV_H_ */
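
Illustration only, not part of the patch: the *_ok helpers above are thin
wrappers around DPDK's relaxed bit operations, keyed by the GVE_PRIV_FLAGS_*
bit indices defined in gve.h. A stand-alone sketch of the semantics:

    #include <stdio.h>
    #include <rte_bitops.h>

    int
    main(void)
    {
        volatile uint32_t state_flags = 0;
        unsigned int bit = 1; /* stands in for a GVE_PRIV_FLAGS_* index */

        rte_bit_relaxed_set32(bit, &state_flags);
        printf("after set:   %d\n", rte_bit_relaxed_get32(bit, &state_flags) ? 1 : 0);
        rte_bit_relaxed_clear32(bit, &state_flags);
        printf("after clear: %d\n", rte_bit_relaxed_get32(bit, &state_flags) ? 1 : 0);
        return 0;
    }
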
diff --git a/drivers/net/gve/meson.build b/drivers/net/gve/meson.build
new file mode 100644
index 0000000000..9a22cc9abe
--- /dev/null
+++ b/drivers/net/gve/meson.build
@@ -0,0 +1,13 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(C) 2022 Intel Corporation
+
+if is_windows
+    build = false
+    reason = 'not supported on Windows'
+    subdir_done()
+endif
+
+sources = files(
+        'gve_adminq.c',
+        'gve_ethdev.c',
+)
diff --git a/drivers/net/gve/version.map b/drivers/net/gve/version.map
new file mode 100644
index 0000000000..c2e0723b4c
--- /dev/null
+++ b/drivers/net/gve/version.map
@@ -0,0 +1,3 @@
+DPDK_22 {
+	local: *;
+};
diff --git a/drivers/net/meson.build b/drivers/net/meson.build
index e35652fe63..f1a0ee2cef 100644
--- a/drivers/net/meson.build
+++ b/drivers/net/meson.build
@@ -23,6 +23,7 @@ drivers = [
         'enic',
         'failsafe',
         'fm10k',
+        'gve',
         'hinic',
         'hns3',
         'i40e',
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 192+ messages in thread

* RE: [PATCH v2 09/10] net/gve: add stats support
  2022-09-01 17:24       ` Ferruh Yigit
@ 2022-09-23  9:38         ` Guo, Junfeng
  0 siblings, 0 replies; 192+ messages in thread
From: Guo, Junfeng @ 2022-09-23  9:38 UTC (permalink / raw)
  To: Ferruh Yigit, Zhang, Qi Z, Wu, Jingjing
  Cc: dev, Li, Xiaoyun, awogbemila, Richardson, Bruce



> -----Original Message-----
> From: Ferruh Yigit <ferruh.yigit@xilinx.com>
> Sent: Friday, September 2, 2022 01:24
> To: Guo, Junfeng <junfeng.guo@intel.com>; Zhang, Qi Z
> <qi.z.zhang@intel.com>; Wu, Jingjing <jingjing.wu@intel.com>
> Cc: dev@dpdk.org; Li, Xiaoyun <xiaoyun.li@intel.com>;
> awogbemila@google.com; Richardson, Bruce
> <bruce.richardson@intel.com>
> Subject: Re: [PATCH v2 09/10] net/gve: add stats support
> 
> On 8/29/2022 9:41 AM, Junfeng Guo wrote:
> 
> >
> > Update stats add support of dev_ops stats_get/reset.
> >
> > Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com>
> > Signed-off-by: Junfeng Guo <junfeng.guo@intel.com>
> > ---
> >   drivers/net/gve/gve.h        | 10 ++++++
> >   drivers/net/gve/gve_ethdev.c | 69
> ++++++++++++++++++++++++++++++++++++
> >   drivers/net/gve/gve_rx.c     | 15 ++++++--
> >   drivers/net/gve/gve_tx.c     | 12 +++++++
> >   4 files changed, 104 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/net/gve/gve.h b/drivers/net/gve/gve.h
> > index 004e0a75ca..e256a2bec2 100644
> > --- a/drivers/net/gve/gve.h
> > +++ b/drivers/net/gve/gve.h
> > @@ -91,6 +91,10 @@ struct gve_tx_queue {
> >          struct gve_queue_page_list *qpl;
> >          struct gve_tx_iovec *iov_ring;
> >
> > +       /* Stats */
> > +       uint64_t packets;
> > +       uint64_t bytes;
> > +
> 
> Can't you get stats for 'errors' in Tx path?

Yes, will add in the coming version. Thanks!

> 
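
For reference, one possible shape for that follow-up is sketched below. This
is only an illustration, not the actual change: it assumes the per-queue
packets/bytes counters quoted above plus a hypothetical "errors" field, and a
stats_get op that walks the queue arrays.

    /* Sketch only; would live alongside the driver code with gve.h included. */
    static int
    gve_dev_stats_get(struct rte_eth_dev *dev, struct rte_eth_stats *stats)
    {
        uint16_t i;

        for (i = 0; i < dev->data->nb_tx_queues; i++) {
            struct gve_tx_queue *txq = dev->data->tx_queues[i];

            if (txq == NULL)
                continue;
            stats->opackets += txq->packets;
            stats->obytes += txq->bytes;
            stats->oerrors += txq->errors; /* hypothetical counter */
        }
        for (i = 0; i < dev->data->nb_rx_queues; i++) {
            struct gve_rx_queue *rxq = dev->data->rx_queues[i];

            if (rxq == NULL)
                continue;
            stats->ipackets += rxq->packets;
            stats->ibytes += rxq->bytes;
            stats->ierrors += rxq->errors; /* hypothetical counter */
        }
        return 0;
    }
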


^ permalink raw reply	[flat|nested] 192+ messages in thread

* [PATCH v3 4/9] net/gve: add link update support
  2022-09-23  9:38       ` [PATCH v3 0/9] introduce GVE PMD Junfeng Guo
                           ` (2 preceding siblings ...)
  2022-09-23  9:38         ` [PATCH v3 3/9] net/gve: support device initialization Junfeng Guo
@ 2022-09-23  9:38         ` Junfeng Guo
  2022-09-23  9:38         ` [PATCH v3 5/9] net/gve: add MTU set support Junfeng Guo
                           ` (4 subsequent siblings)
  8 siblings, 0 replies; 192+ messages in thread
From: Junfeng Guo @ 2022-09-23  9:38 UTC (permalink / raw)
  To: qi.z.zhang, jingjing.wu
  Cc: ferruh.yigit, dev, xiaoyun.li, awogbemila, bruce.richardson,
	xueqin.lin, junfeng.guo

Support dev_ops link_update.

Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com>
Signed-off-by: Junfeng Guo <junfeng.guo@intel.com>
---
 doc/guides/nics/features/gve.ini |  2 ++
 drivers/net/gve/gve_ethdev.c     | 30 ++++++++++++++++++++++++++++++
 2 files changed, 32 insertions(+)
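
Illustration only, not part of the patch: applications reach this op through
the generic ethdev call, e.g.

    #include <stdio.h>
    #include <rte_ethdev.h>

    /* Print the current link state of a started port. */
    static void
    print_link(uint16_t port_id)
    {
        struct rte_eth_link link;

        if (rte_eth_link_get_nowait(port_id, &link) != 0)
            return;
        printf("port %u: link %s, %u Mbps\n", port_id,
               link.link_status == RTE_ETH_LINK_UP ? "up" : "down",
               link.link_speed);
    }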

diff --git a/doc/guides/nics/features/gve.ini b/doc/guides/nics/features/gve.ini
index 44aec28009..d03e3ac89e 100644
--- a/doc/guides/nics/features/gve.ini
+++ b/doc/guides/nics/features/gve.ini
@@ -4,6 +4,8 @@
 ; Refer to default.ini for the full list of available PMD features.
 ;
 [Features]
+Speed capabilities   = Y
+Link status          = Y
 Linux                = Y
 x86-32               = Y
 x86-64               = Y
diff --git a/drivers/net/gve/gve_ethdev.c b/drivers/net/gve/gve_ethdev.c
index 4bb73b188d..7eb93d7366 100644
--- a/drivers/net/gve/gve_ethdev.c
+++ b/drivers/net/gve/gve_ethdev.c
@@ -34,10 +34,39 @@ gve_dev_configure(__rte_unused struct rte_eth_dev *dev)
 	return 0;
 }
 
+static int
+gve_link_update(struct rte_eth_dev *dev, __rte_unused int wait_to_complete)
+{
+	struct gve_priv *priv = dev->data->dev_private;
+	struct rte_eth_link link;
+	int err;
+
+	memset(&link, 0, sizeof(link));
+	link.link_duplex = RTE_ETH_LINK_FULL_DUPLEX;
+	link.link_autoneg = RTE_ETH_LINK_AUTONEG;
+
+	if (!dev->data->dev_started) {
+		link.link_status = RTE_ETH_LINK_DOWN;
+		link.link_speed = RTE_ETH_SPEED_NUM_NONE;
+	} else {
+		link.link_status = RTE_ETH_LINK_UP;
+		PMD_DRV_LOG(DEBUG, "Get link status from hw");
+		err = gve_adminq_report_link_speed(priv);
+		if (err) {
+			PMD_DRV_LOG(ERR, "Failed to get link speed.");
+			priv->link_speed = RTE_ETH_SPEED_NUM_UNKNOWN;
+		}
+		link.link_speed = priv->link_speed;
+	}
+
+	return rte_eth_linkstatus_set(dev, &link);
+}
+
 static int
 gve_dev_start(struct rte_eth_dev *dev)
 {
 	dev->data->dev_started = 1;
+	gve_link_update(dev, 0);
 
 	return 0;
 }
@@ -70,6 +99,7 @@ static const struct eth_dev_ops gve_eth_dev_ops = {
 	.dev_start            = gve_dev_start,
 	.dev_stop             = gve_dev_stop,
 	.dev_close            = gve_dev_close,
+	.link_update          = gve_link_update,
 };
 
 static void
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 192+ messages in thread

* [PATCH v3 5/9] net/gve: add MTU set support
  2022-09-23  9:38       ` [PATCH v3 0/9] introduce GVE PMD Junfeng Guo
                           ` (3 preceding siblings ...)
  2022-09-23  9:38         ` [PATCH v3 4/9] net/gve: add link update support Junfeng Guo
@ 2022-09-23  9:38         ` Junfeng Guo
  2022-09-23  9:38         ` [PATCH v3 6/9] net/gve: add queue operations Junfeng Guo
                           ` (3 subsequent siblings)
  8 siblings, 0 replies; 192+ messages in thread
From: Junfeng Guo @ 2022-09-23  9:38 UTC (permalink / raw)
  To: qi.z.zhang, jingjing.wu
  Cc: ferruh.yigit, dev, xiaoyun.li, awogbemila, bruce.richardson,
	xueqin.lin, junfeng.guo

Support dev_ops mtu_set.

Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com>
Signed-off-by: Junfeng Guo <junfeng.guo@intel.com>
---
 doc/guides/nics/features/gve.ini |  1 +
 drivers/net/gve/gve_ethdev.c     | 29 +++++++++++++++++++++++++++++
 2 files changed, 30 insertions(+)
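
Illustration only, not part of the patch: with this op in place the standard
MTU call works, and because the op rejects changes on a started port, a
caller typically stops the port first, e.g.

    #include <rte_ethdev.h>

    static int
    change_mtu(uint16_t port_id, uint16_t mtu)
    {
        int ret;

        ret = rte_eth_dev_stop(port_id);
        if (ret != 0)
            return ret;
        ret = rte_eth_dev_set_mtu(port_id, mtu); /* ends up in gve_dev_mtu_set */
        if (ret != 0)
            return ret;
        return rte_eth_dev_start(port_id);
    }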

diff --git a/doc/guides/nics/features/gve.ini b/doc/guides/nics/features/gve.ini
index d03e3ac89e..fbff0a5462 100644
--- a/doc/guides/nics/features/gve.ini
+++ b/doc/guides/nics/features/gve.ini
@@ -6,6 +6,7 @@
 [Features]
 Speed capabilities   = Y
 Link status          = Y
+MTU update           = Y
 Linux                = Y
 x86-32               = Y
 x86-64               = Y
diff --git a/drivers/net/gve/gve_ethdev.c b/drivers/net/gve/gve_ethdev.c
index 7eb93d7366..c510938832 100644
--- a/drivers/net/gve/gve_ethdev.c
+++ b/drivers/net/gve/gve_ethdev.c
@@ -94,12 +94,41 @@ gve_dev_close(struct rte_eth_dev *dev)
 	return err;
 }
 
+static int
+gve_dev_mtu_set(struct rte_eth_dev *dev, uint16_t mtu)
+{
+	struct gve_priv *priv = dev->data->dev_private;
+	int err;
+
+	if (mtu < RTE_ETHER_MIN_MTU || mtu > priv->max_mtu) {
+		PMD_DRV_LOG(ERR, "MIN MTU is %u MAX MTU is %u", RTE_ETHER_MIN_MTU, priv->max_mtu);
+		return -EINVAL;
+	}
+
+	/* MTU setting is forbidden if the port is started */
+	if (dev->data->dev_started) {
+		PMD_DRV_LOG(ERR, "Port must be stopped before configuration");
+		return -EBUSY;
+	}
+
+	dev->data->dev_conf.rxmode.mtu = mtu + RTE_ETHER_HDR_LEN;
+
+	err = gve_adminq_set_mtu(priv, mtu);
+	if (err) {
+		PMD_DRV_LOG(ERR, "Failed to set mtu as %u err = %d", mtu, err);
+		return err;
+	}
+
+	return 0;
+}
+
 static const struct eth_dev_ops gve_eth_dev_ops = {
 	.dev_configure        = gve_dev_configure,
 	.dev_start            = gve_dev_start,
 	.dev_stop             = gve_dev_stop,
 	.dev_close            = gve_dev_close,
 	.link_update          = gve_link_update,
+	.mtu_set              = gve_dev_mtu_set,
 };
 
 static void
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 192+ messages in thread

* [PATCH v3 6/9] net/gve: add queue operations
  2022-09-23  9:38       ` [PATCH v3 0/9] introduce GVE PMD Junfeng Guo
                           ` (4 preceding siblings ...)
  2022-09-23  9:38         ` [PATCH v3 5/9] net/gve: add MTU set support Junfeng Guo
@ 2022-09-23  9:38         ` Junfeng Guo
  2022-09-23  9:38         ` [PATCH v3 7/9] net/gve: add Rx/Tx support Junfeng Guo
                           ` (2 subsequent siblings)
  8 siblings, 0 replies; 192+ messages in thread
From: Junfeng Guo @ 2022-09-23  9:38 UTC (permalink / raw)
  To: qi.z.zhang, jingjing.wu
  Cc: ferruh.yigit, dev, xiaoyun.li, awogbemila, bruce.richardson,
	xueqin.lin, junfeng.guo

Add support for queue operations:
- setup rx/tx queue
- release rx/tx queue
- start rx/tx queues
- stop rx/tx queues

Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com>
Signed-off-by: Junfeng Guo <junfeng.guo@intel.com>
---
 drivers/net/gve/gve_ethdev.c | 206 +++++++++++++++++++++++++++++++++
 drivers/net/gve/gve_ethdev.h |  47 ++++++++
 drivers/net/gve/gve_rx.c     | 212 ++++++++++++++++++++++++++++++++++
 drivers/net/gve/gve_tx.c     | 214 +++++++++++++++++++++++++++++++++++
 drivers/net/gve/meson.build  |   2 +
 5 files changed, 681 insertions(+)
 create mode 100644 drivers/net/gve/gve_rx.c
 create mode 100644 drivers/net/gve/gve_tx.c
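
Illustration only, not part of the patch: these ops back the standard
per-queue setup calls made between configure and start. Note the PMD clamps
nb_desc to the value reported by the device, so the count passed here is only
a hint, e.g.

    #include <rte_ethdev.h>
    #include <rte_mempool.h>

    static int
    setup_one_queue_pair(uint16_t port_id, struct rte_mempool *mb_pool)
    {
        int socket_id = rte_eth_dev_socket_id(port_id);
        int ret;

        ret = rte_eth_rx_queue_setup(port_id, 0, 512, socket_id, NULL, mb_pool);
        if (ret != 0)
            return ret;
        return rte_eth_tx_queue_setup(port_id, 0, 512, socket_id, NULL);
    }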

diff --git a/drivers/net/gve/gve_ethdev.c b/drivers/net/gve/gve_ethdev.c
index c510938832..72e7a78ace 100644
--- a/drivers/net/gve/gve_ethdev.c
+++ b/drivers/net/gve/gve_ethdev.c
@@ -28,12 +28,111 @@ gve_write_version(uint8_t *driver_version_register)
 	writeb('\n', driver_version_register);
 }
 
+static int
+gve_alloc_queue_page_list(struct gve_priv *priv, uint32_t id, uint32_t pages)
+{
+	char z_name[RTE_MEMZONE_NAMESIZE];
+	struct gve_queue_page_list *qpl;
+	const struct rte_memzone *mz;
+	dma_addr_t page_bus;
+	uint32_t i;
+
+	if (priv->num_registered_pages + pages >
+	    priv->max_registered_pages) {
+		PMD_DRV_LOG(ERR, "Pages %" PRIu64 " > max registered pages %" PRIu64,
+			    priv->num_registered_pages + pages,
+			    priv->max_registered_pages);
+		return -EINVAL;
+	}
+	qpl = &priv->qpl[id];
+	snprintf(z_name, sizeof(z_name), "gve_%s_qpl%d", priv->pci_dev->device.name, id);
+	mz = rte_memzone_reserve_aligned(z_name, pages * PAGE_SIZE,
+					 rte_socket_id(),
+					 RTE_MEMZONE_IOVA_CONTIG, PAGE_SIZE);
+	if (mz == NULL) {
+		PMD_DRV_LOG(ERR, "Failed to alloc %s.", z_name);
+		return -ENOMEM;
+	}
+	qpl->page_buses = rte_zmalloc("qpl page buses", pages * sizeof(dma_addr_t), 0);
+	if (qpl->page_buses == NULL) {
+		PMD_DRV_LOG(ERR, "Failed to alloc qpl %u page buses", id);
+		return -ENOMEM;
+	}
+	page_bus = mz->iova;
+	for (i = 0; i < pages; i++) {
+		qpl->page_buses[i] = page_bus;
+		page_bus += PAGE_SIZE;
+	}
+	qpl->id = id;
+	qpl->mz = mz;
+	qpl->num_entries = pages;
+
+	priv->num_registered_pages += pages;
+
+	return 0;
+}
+
+static void
+gve_free_qpls(struct gve_priv *priv)
+{
+	uint16_t nb_txqs = priv->max_nb_txq;
+	uint16_t nb_rxqs = priv->max_nb_rxq;
+	uint32_t i;
+
+	for (i = 0; i < nb_txqs + nb_rxqs; i++) {
+		if (priv->qpl[i].mz != NULL)
+			rte_memzone_free(priv->qpl[i].mz);
+		if (priv->qpl[i].page_buses != NULL)
+			rte_free(priv->qpl[i].page_buses);
+	}
+
+	if (priv->qpl != NULL)
+		rte_free(priv->qpl);
+}
+
 static int
 gve_dev_configure(__rte_unused struct rte_eth_dev *dev)
 {
 	return 0;
 }
 
+static int
+gve_refill_pages(struct gve_rx_queue *rxq)
+{
+	struct rte_mbuf *nmb;
+	uint16_t i;
+	int diag;
+
+	diag = rte_pktmbuf_alloc_bulk(rxq->mpool, &rxq->sw_ring[0], rxq->nb_rx_desc);
+	if (diag < 0) {
+		for (i = 0; i < rxq->nb_rx_desc - 1; i++) {
+			nmb = rte_pktmbuf_alloc(rxq->mpool);
+			if (!nmb)
+				break;
+			rxq->sw_ring[i] = nmb;
+		}
+		if (i < rxq->nb_rx_desc - 1)
+			return -ENOMEM;
+	}
+	rxq->nb_avail = 0;
+	rxq->next_avail = rxq->nb_rx_desc - 1;
+
+	for (i = 0; i < rxq->nb_rx_desc; i++) {
+		if (rxq->is_gqi_qpl) {
+			rxq->rx_data_ring[i].addr = rte_cpu_to_be_64(i * PAGE_SIZE);
+		} else {
+			if (i == rxq->nb_rx_desc - 1)
+				break;
+			nmb = rxq->sw_ring[i];
+			rxq->rx_data_ring[i].addr = rte_cpu_to_be_64(rte_mbuf_data_iova(nmb));
+		}
+	}
+
+	rte_write32(rte_cpu_to_be_32(rxq->next_avail), rxq->qrx_tail);
+
+	return 0;
+}
+
 static int
 gve_link_update(struct rte_eth_dev *dev, __rte_unused int wait_to_complete)
 {
@@ -65,16 +164,70 @@ gve_link_update(struct rte_eth_dev *dev, __rte_unused int wait_to_complete)
 static int
 gve_dev_start(struct rte_eth_dev *dev)
 {
+	uint16_t num_queues = dev->data->nb_tx_queues;
+	struct gve_priv *priv = dev->data->dev_private;
+	struct gve_tx_queue *txq;
+	struct gve_rx_queue *rxq;
+	uint16_t i;
+	int err;
+
+	priv->txqs = (struct gve_tx_queue **)dev->data->tx_queues;
+	err = gve_adminq_create_tx_queues(priv, num_queues);
+	if (err) {
+		PMD_DRV_LOG(ERR, "failed to create %u tx queues.", num_queues);
+		return err;
+	}
+	for (i = 0; i < num_queues; i++) {
+		txq = priv->txqs[i];
+		txq->qtx_tail =
+		&priv->db_bar2[rte_be_to_cpu_32(txq->qres->db_index)];
+		txq->qtx_head =
+		&priv->cnt_array[rte_be_to_cpu_32(txq->qres->counter_index)];
+
+		rte_write32(rte_cpu_to_be_32(GVE_IRQ_MASK), txq->ntfy_addr);
+	}
+
+	num_queues = dev->data->nb_rx_queues;
+	priv->rxqs = (struct gve_rx_queue **)dev->data->rx_queues;
+	err = gve_adminq_create_rx_queues(priv, num_queues);
+	if (err) {
+		PMD_DRV_LOG(ERR, "failed to create %u rx queues.", num_queues);
+		goto err_tx;
+	}
+	for (i = 0; i < num_queues; i++) {
+		rxq = priv->rxqs[i];
+		rxq->qrx_tail =
+		&priv->db_bar2[rte_be_to_cpu_32(rxq->qres->db_index)];
+
+		rte_write32(rte_cpu_to_be_32(GVE_IRQ_MASK), rxq->ntfy_addr);
+
+		err = gve_refill_pages(rxq);
+		if (err) {
+			PMD_DRV_LOG(ERR, "Failed to refill for RX");
+			goto err_rx;
+		}
+	}
+
 	dev->data->dev_started = 1;
 	gve_link_update(dev, 0);
 
 	return 0;
+
+err_rx:
+	gve_stop_rx_queues(dev);
+err_tx:
+	gve_stop_tx_queues(dev);
+	return err;
 }
 
 static int
 gve_dev_stop(struct rte_eth_dev *dev)
 {
 	dev->data->dev_link.link_status = RTE_ETH_LINK_DOWN;
+
+	gve_stop_tx_queues(dev);
+	gve_stop_rx_queues(dev);
+
 	dev->data->dev_started = 0;
 
 	return 0;
@@ -83,7 +236,11 @@ gve_dev_stop(struct rte_eth_dev *dev)
 static int
 gve_dev_close(struct rte_eth_dev *dev)
 {
+	struct gve_priv *priv = dev->data->dev_private;
+	struct gve_tx_queue *txq;
+	struct gve_rx_queue *rxq;
 	int err = 0;
+	uint16_t i;
 
 	if (dev->data->dev_started) {
 		err = gve_dev_stop(dev);
@@ -91,6 +248,21 @@ gve_dev_close(struct rte_eth_dev *dev)
 			PMD_DRV_LOG(ERR, "Failed to stop dev.");
 	}
 
+	for (i = 0; i < dev->data->nb_tx_queues; i++) {
+		txq = dev->data->tx_queues[i];
+		gve_tx_queue_release(txq);
+	}
+
+	for (i = 0; i < dev->data->nb_rx_queues; i++) {
+		rxq = dev->data->rx_queues[i];
+		gve_rx_queue_release(rxq);
+	}
+
+	gve_free_qpls(priv);
+	rte_free(priv->adminq);
+	rte_free(priv->qpl);
+	rte_free(priv);
+
 	return err;
 }
 
@@ -127,6 +299,8 @@ static const struct eth_dev_ops gve_eth_dev_ops = {
 	.dev_start            = gve_dev_start,
 	.dev_stop             = gve_dev_stop,
 	.dev_close            = gve_dev_close,
+	.rx_queue_setup       = gve_rx_queue_setup,
+	.tx_queue_setup       = gve_tx_queue_setup,
 	.link_update          = gve_link_update,
 	.mtu_set              = gve_dev_mtu_set,
 };
@@ -264,7 +438,9 @@ gve_setup_device_resources(struct gve_priv *priv)
 static int
 gve_init_priv(struct gve_priv *priv, bool skip_describe_device)
 {
+	uint16_t pages;
 	int num_ntfy;
+	uint32_t i;
 	int err;
 
 	/* Set up the adminq */
@@ -315,10 +491,40 @@ gve_init_priv(struct gve_priv *priv, bool skip_describe_device)
 	PMD_DRV_LOG(INFO, "Max TX queues %d, Max RX queues %d",
 		    priv->max_nb_txq, priv->max_nb_rxq);
 
+	/* In GQI_QPL queue format:
+	 * Allocate queue page lists according to max queue number
+	 * tx qpl id should start from 0 while rx qpl id should start
+	 * from priv->max_nb_txq
+	 */
+	if (priv->queue_format == GVE_GQI_QPL_FORMAT) {
+		priv->qpl = rte_zmalloc("gve_qpl",
+					(priv->max_nb_txq + priv->max_nb_rxq) *
+					sizeof(struct gve_queue_page_list), 0);
+		if (priv->qpl == NULL) {
+			PMD_DRV_LOG(ERR, "Failed to alloc qpl.");
+			err = -ENOMEM;
+			goto free_adminq;
+		}
+
+		for (i = 0; i < priv->max_nb_txq + priv->max_nb_rxq; i++) {
+			if (i < priv->max_nb_txq)
+				pages = priv->tx_pages_per_qpl;
+			else
+				pages = priv->rx_data_slot_cnt;
+			err = gve_alloc_queue_page_list(priv, i, pages);
+			if (err != 0) {
+				PMD_DRV_LOG(ERR, "Failed to alloc qpl %u.", i);
+				goto err_qpl;
+			}
+		}
+	}
+
 setup_device:
 	err = gve_setup_device_resources(priv);
 	if (!err)
 		return 0;
+err_qpl:
+	gve_free_qpls(priv);
 free_adminq:
 	gve_adminq_free(priv);
 	return err;
diff --git a/drivers/net/gve/gve_ethdev.h b/drivers/net/gve/gve_ethdev.h
index 8ab5c2c877..44e075e166 100644
--- a/drivers/net/gve/gve_ethdev.h
+++ b/drivers/net/gve/gve_ethdev.h
@@ -34,15 +34,35 @@ union gve_tx_desc {
 	struct gve_tx_seg_desc seg; /* subsequent descs for a packet */
 };
 
+struct gve_tx_iovec {
+	uint32_t iov_base; /* offset in fifo */
+	uint32_t iov_len;
+};
+
 struct gve_tx_queue {
 	volatile union gve_tx_desc *tx_desc_ring;
 	const struct rte_memzone *mz;
 	uint64_t tx_ring_phys_addr;
+	struct rte_mbuf **sw_ring;
+	volatile rte_be32_t *qtx_tail;
+	volatile rte_be32_t *qtx_head;
 
+	uint32_t tx_tail;
 	uint16_t nb_tx_desc;
+	uint16_t nb_free;
+	uint32_t next_to_clean;
+	uint16_t free_thresh;
 
 	/* Only valid for DQO_QPL queue format */
+	uint16_t sw_tail;
+	uint16_t sw_ntc;
+	uint16_t sw_nb_free;
+	uint32_t fifo_size;
+	uint32_t fifo_head;
+	uint32_t fifo_avail;
+	uint64_t fifo_base;
 	struct gve_queue_page_list *qpl;
+	struct gve_tx_iovec *iov_ring;
 
 	uint16_t port_id;
 	uint16_t queue_id;
@@ -56,6 +76,8 @@ struct gve_tx_queue {
 
 	/* Only valid for DQO_RDA queue format */
 	struct gve_tx_queue *complq;
+
+	uint8_t is_gqi_qpl;
 };
 
 struct gve_rx_queue {
@@ -64,9 +86,17 @@ struct gve_rx_queue {
 	const struct rte_memzone *mz;
 	const struct rte_memzone *data_mz;
 	uint64_t rx_ring_phys_addr;
+	struct rte_mbuf **sw_ring;
+	struct rte_mempool *mpool;
 
+	uint16_t rx_tail;
 	uint16_t nb_rx_desc;
+	uint16_t expected_seqno; /* the next expected seqno */
+	uint16_t free_thresh;
+	uint32_t next_avail;
+	uint32_t nb_avail;
 
+	volatile rte_be32_t *qrx_tail;
 	volatile rte_be32_t *ntfy_addr;
 
 	/* only valid for GQI_QPL queue format */
@@ -83,6 +113,7 @@ struct gve_rx_queue {
 
 	/* Only valid for DQO_RDA queue format */
 	struct gve_rx_queue *bufq;
+	uint8_t is_gqi_qpl;
 };
 
 struct gve_priv {
@@ -222,4 +253,20 @@ gve_clear_device_rings_ok(struct gve_priv *priv)
 				&priv->state_flags);
 }
 
+int
+gve_rx_queue_setup(struct rte_eth_dev *dev, uint16_t queue_id, uint16_t nb_desc,
+		   unsigned int socket_id, const struct rte_eth_rxconf *conf,
+		   struct rte_mempool *pool);
+int
+gve_tx_queue_setup(struct rte_eth_dev *dev, uint16_t queue_id, uint16_t nb_desc,
+		   unsigned int socket_id, const struct rte_eth_txconf *conf);
+
+void gve_tx_queue_release(void *txq);
+
+void gve_rx_queue_release(void *rxq);
+
+void gve_stop_tx_queues(struct rte_eth_dev *dev);
+
+void gve_stop_rx_queues(struct rte_eth_dev *dev);
+
 #endif /* _GVE_ETHDEV_H_ */
diff --git a/drivers/net/gve/gve_rx.c b/drivers/net/gve/gve_rx.c
new file mode 100644
index 0000000000..90b3a52aca
--- /dev/null
+++ b/drivers/net/gve/gve_rx.c
@@ -0,0 +1,212 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2022 Intel Corporation
+ */
+
+#include "gve_ethdev.h"
+#include "gve_adminq.h"
+
+static inline void
+gve_reset_rxq(struct gve_rx_queue *rxq)
+{
+	struct rte_mbuf **sw_ring;
+	uint32_t size, i;
+
+	if (rxq == NULL) {
+		PMD_DRV_LOG(ERR, "Pointer to rxq is NULL");
+		return;
+	}
+	sw_ring = rxq->sw_ring;
+	size = rxq->nb_rx_desc * sizeof(struct gve_rx_desc);
+	for (i = 0; i < size; i++)
+		((volatile char *)rxq->rx_desc_ring)[i] = 0;
+
+	size = rxq->nb_rx_desc * sizeof(union gve_rx_data_slot);
+	for (i = 0; i < size; i++)
+		((volatile char *)rxq->rx_data_ring)[i] = 0;
+
+	for (i = 0; i < rxq->nb_rx_desc; i++)
+		sw_ring[i] = NULL;
+
+	rxq->rx_tail = 0;
+	rxq->next_avail = 0;
+	rxq->nb_avail = rxq->nb_rx_desc;
+	rxq->expected_seqno = 1;
+}
+
+static inline void
+gve_release_rxq_mbufs(struct gve_rx_queue *rxq)
+{
+	uint16_t i;
+
+	for (i = 0; i < rxq->nb_rx_desc; i++) {
+		if (rxq->sw_ring[i]) {
+			rte_pktmbuf_free_seg(rxq->sw_ring[i]);
+			rxq->sw_ring[i] = NULL;
+		}
+	}
+
+	rxq->nb_avail = rxq->nb_rx_desc;
+}
+
+void
+gve_rx_queue_release(void *rxq)
+{
+	struct gve_rx_queue *q = rxq;
+
+	if (!q)
+		return;
+
+	if (q->is_gqi_qpl) {
+		gve_adminq_unregister_page_list(q->hw, q->qpl->id);
+		q->qpl = NULL;
+	}
+
+	gve_release_rxq_mbufs(q);
+	rte_free(q->sw_ring);
+	rte_memzone_free(q->data_mz);
+	rte_memzone_free(q->mz);
+	rte_memzone_free(q->qres_mz);
+	q->qres = NULL;
+	rte_free(q);
+}
+
+int
+gve_rx_queue_setup(struct rte_eth_dev *dev, uint16_t queue_id,
+		uint16_t nb_desc, unsigned int socket_id,
+		const struct rte_eth_rxconf *conf, struct rte_mempool *pool)
+{
+	struct gve_priv *hw = dev->data->dev_private;
+	const struct rte_memzone *mz;
+	struct gve_rx_queue *rxq;
+	uint16_t free_thresh;
+	int err = 0;
+
+	if (nb_desc != hw->rx_desc_cnt) {
+		PMD_DRV_LOG(WARNING, "gve doesn't support nb_desc config, use hw nb_desc %u.",
+			    hw->rx_desc_cnt);
+	}
+	nb_desc = hw->rx_desc_cnt;
+
+	/* Free memory if needed. */
+	if (dev->data->rx_queues[queue_id]) {
+		gve_rx_queue_release(dev->data->rx_queues[queue_id]);
+		dev->data->rx_queues[queue_id] = NULL;
+	}
+
+	/* Allocate the RX queue data structure. */
+	rxq = rte_zmalloc_socket("gve rxq",
+				 sizeof(struct gve_rx_queue),
+				 RTE_CACHE_LINE_SIZE,
+				 socket_id);
+	if (!rxq) {
+		PMD_DRV_LOG(ERR, "Failed to allocate memory for rx queue structure");
+		err = -ENOMEM;
+		goto err_rxq;
+	}
+
+	free_thresh = conf->rx_free_thresh ? conf->rx_free_thresh : GVE_DEFAULT_RX_FREE_THRESH;
+	if (free_thresh >= nb_desc) {
+		PMD_DRV_LOG(ERR, "rx_free_thresh (%u) must be less than nb_desc (%u).",
+			    free_thresh, nb_desc);
+		err = -EINVAL;
+		goto err_rxq;
+	}
+
+	rxq->nb_rx_desc = nb_desc;
+	rxq->free_thresh = free_thresh;
+	rxq->queue_id = queue_id;
+	rxq->port_id = dev->data->port_id;
+	rxq->ntfy_id = hw->num_ntfy_blks / 2 + queue_id;
+	rxq->is_gqi_qpl = hw->queue_format == GVE_GQI_QPL_FORMAT;
+	rxq->mpool = pool;
+	rxq->hw = hw;
+	rxq->ntfy_addr = &hw->db_bar2[rte_be_to_cpu_32(hw->irq_dbs[rxq->ntfy_id].id)];
+
+	rxq->rx_buf_len = rte_pktmbuf_data_room_size(rxq->mpool) - RTE_PKTMBUF_HEADROOM;
+
+	/* Allocate software ring */
+	rxq->sw_ring = rte_zmalloc_socket("gve rx sw ring", sizeof(struct rte_mbuf *) * nb_desc,
+					  RTE_CACHE_LINE_SIZE, socket_id);
+	if (!rxq->sw_ring) {
+		PMD_DRV_LOG(ERR, "Failed to allocate memory for SW RX ring");
+		err = -ENOMEM;
+		goto err_rxq;
+	}
+
+	mz = rte_eth_dma_zone_reserve(dev, "rx_ring", queue_id,
+				      nb_desc * sizeof(struct gve_rx_desc),
+				      PAGE_SIZE, socket_id);
+	if (mz == NULL) {
+		PMD_DRV_LOG(ERR, "Failed to reserve DMA memory for RX");
+		err = -ENOMEM;
+		goto err_sw_ring;
+	}
+	rxq->rx_desc_ring = (struct gve_rx_desc *)mz->addr;
+	rxq->rx_ring_phys_addr = mz->iova;
+	rxq->mz = mz;
+
+	mz = rte_eth_dma_zone_reserve(dev, "gve rx data ring", queue_id,
+				      sizeof(union gve_rx_data_slot) * nb_desc,
+				      PAGE_SIZE, socket_id);
+	if (mz == NULL) {
+		PMD_DRV_LOG(ERR, "Failed to allocate memory for RX data ring");
+		err = -ENOMEM;
+		goto err_rx_ring;
+	}
+	rxq->rx_data_ring = (union gve_rx_data_slot *)mz->addr;
+	rxq->data_mz = mz;
+	if (rxq->is_gqi_qpl) {
+		rxq->qpl = &hw->qpl[rxq->ntfy_id];
+		err = gve_adminq_register_page_list(hw, rxq->qpl);
+		if (err != 0) {
+			PMD_DRV_LOG(ERR, "Failed to register qpl %u", queue_id);
+			goto err_data_ring;
+		}
+	}
+
+	mz = rte_eth_dma_zone_reserve(dev, "rxq_res", queue_id,
+				      sizeof(struct gve_queue_resources),
+				      PAGE_SIZE, socket_id);
+	if (mz == NULL) {
+		PMD_DRV_LOG(ERR, "Failed to reserve DMA memory for RX resource");
+		err = -ENOMEM;
+		goto err_data_ring;
+	}
+	rxq->qres = (struct gve_queue_resources *)mz->addr;
+	rxq->qres_mz = mz;
+
+	gve_reset_rxq(rxq);
+
+	dev->data->rx_queues[queue_id] = rxq;
+
+	return 0;
+
+err_data_ring:
+	rte_memzone_free(rxq->data_mz);
+err_rx_ring:
+	rte_memzone_free(rxq->mz);
+err_sw_ring:
+	rte_free(rxq->sw_ring);
+err_rxq:
+	rte_free(rxq);
+	return err;
+}
+
+void
+gve_stop_rx_queues(struct rte_eth_dev *dev)
+{
+	struct gve_priv *hw = dev->data->dev_private;
+	struct gve_rx_queue *rxq;
+	uint16_t i;
+	int err;
+
+	err = gve_adminq_destroy_rx_queues(hw, dev->data->nb_rx_queues);
+	if (err != 0)
+		PMD_DRV_LOG(WARNING, "failed to destroy rxqs");
+
+	for (i = 0; i < dev->data->nb_rx_queues; i++) {
+		rxq = dev->data->rx_queues[i];
+		gve_release_rxq_mbufs(rxq);
+		gve_reset_rxq(rxq);
+	}
+}
diff --git a/drivers/net/gve/gve_tx.c b/drivers/net/gve/gve_tx.c
new file mode 100644
index 0000000000..b18e8123aa
--- /dev/null
+++ b/drivers/net/gve/gve_tx.c
@@ -0,0 +1,214 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2022 Intel Corporation
+ */
+
+#include "gve_ethdev.h"
+#include "gve_adminq.h"
+
+static inline void
+gve_reset_txq(struct gve_tx_queue *txq)
+{
+	struct rte_mbuf **sw_ring;
+	uint32_t size, i;
+
+	if (txq == NULL) {
+		PMD_DRV_LOG(ERR, "Pointer to txq is NULL");
+		return;
+	}
+	sw_ring = txq->sw_ring;
+	size = txq->nb_tx_desc * sizeof(union gve_tx_desc);
+	for (i = 0; i < size; i++)
+		((volatile char *)txq->tx_desc_ring)[i] = 0;
+
+	for (i = 0; i < txq->nb_tx_desc; i++) {
+		sw_ring[i] = NULL;
+		if (txq->is_gqi_qpl) {
+			txq->iov_ring[i].iov_base = 0;
+			txq->iov_ring[i].iov_len = 0;
+		}
+	}
+
+	txq->tx_tail = 0;
+	txq->nb_free = txq->nb_tx_desc - 1;
+	txq->next_to_clean = 0;
+
+	if (txq->is_gqi_qpl) {
+		txq->fifo_size = PAGE_SIZE * txq->hw->tx_pages_per_qpl;
+		txq->fifo_avail = txq->fifo_size;
+		txq->fifo_head = 0;
+		txq->fifo_base = (uint64_t)(txq->qpl->mz->addr);
+
+		txq->sw_tail = 0;
+		txq->sw_nb_free = txq->nb_tx_desc - 1;
+		txq->sw_ntc = 0;
+	}
+}
+
+static inline void
+gve_release_txq_mbufs(struct gve_tx_queue *txq)
+{
+	uint16_t i;
+
+	for (i = 0; i < txq->nb_tx_desc; i++) {
+		if (txq->sw_ring[i]) {
+			rte_pktmbuf_free_seg(txq->sw_ring[i]);
+			txq->sw_ring[i] = NULL;
+		}
+	}
+}
+
+void
+gve_tx_queue_release(void *txq)
+{
+	struct gve_tx_queue *q = txq;
+
+	if (!q)
+		return;
+
+	if (q->is_gqi_qpl) {
+		gve_adminq_unregister_page_list(q->hw, q->qpl->id);
+		rte_free(q->iov_ring);
+		q->qpl = NULL;
+	}
+
+	gve_release_txq_mbufs(q);
+	rte_free(q->sw_ring);
+	rte_memzone_free(q->mz);
+	rte_memzone_free(q->qres_mz);
+	q->qres = NULL;
+	rte_free(q);
+}
+
+int
+gve_tx_queue_setup(struct rte_eth_dev *dev, uint16_t queue_id, uint16_t nb_desc,
+		   unsigned int socket_id, const struct rte_eth_txconf *conf)
+{
+	struct gve_priv *hw = dev->data->dev_private;
+	const struct rte_memzone *mz;
+	struct gve_tx_queue *txq;
+	uint16_t free_thresh;
+	int err = 0;
+
+	if (nb_desc != hw->tx_desc_cnt) {
+		PMD_DRV_LOG(WARNING, "gve doesn't support nb_desc config, use hw nb_desc %u.",
+			    hw->tx_desc_cnt);
+	}
+	nb_desc = hw->tx_desc_cnt;
+
+	/* Free memory if needed. */
+	if (dev->data->tx_queues[queue_id]) {
+		gve_tx_queue_release(dev->data->tx_queues[queue_id]);
+		dev->data->tx_queues[queue_id] = NULL;
+	}
+
+	/* Allocate the TX queue data structure. */
+	txq = rte_zmalloc_socket("gve txq", sizeof(struct gve_tx_queue),
+				 RTE_CACHE_LINE_SIZE, socket_id);
+	if (!txq) {
+		PMD_DRV_LOG(ERR, "Failed to allocate memory for tx queue structure");
+		err = -ENOMEM;
+		goto err_txq;
+	}
+
+	free_thresh = conf->tx_free_thresh ? conf->tx_free_thresh : GVE_DEFAULT_TX_FREE_THRESH;
+	if (free_thresh >= nb_desc - 3) {
+		PMD_DRV_LOG(ERR, "tx_free_thresh (%u) must be less than nb_desc (%u) minus 3.",
+			    free_thresh, nb_desc);
+		err = -EINVAL;
+		goto err_txq;
+	}
+
+	txq->nb_tx_desc = nb_desc;
+	txq->free_thresh = free_thresh;
+	txq->queue_id = queue_id;
+	txq->port_id = dev->data->port_id;
+	txq->ntfy_id = queue_id;
+	txq->is_gqi_qpl = hw->queue_format == GVE_GQI_QPL_FORMAT;
+	txq->hw = hw;
+	txq->ntfy_addr = &hw->db_bar2[rte_be_to_cpu_32(hw->irq_dbs[txq->ntfy_id].id)];
+
+	/* Allocate software ring */
+	txq->sw_ring = rte_zmalloc_socket("gve tx sw ring",
+					  sizeof(struct rte_mbuf *) * nb_desc,
+					  RTE_CACHE_LINE_SIZE, socket_id);
+	if (!txq->sw_ring) {
+		PMD_DRV_LOG(ERR, "Failed to allocate memory for SW TX ring");
+		err = -ENOMEM;
+		goto err_txq;
+	}
+
+	mz = rte_eth_dma_zone_reserve(dev, "tx_ring", queue_id,
+				      nb_desc * sizeof(union gve_tx_desc),
+				      PAGE_SIZE, socket_id);
+	if (mz == NULL) {
+		PMD_DRV_LOG(ERR, "Failed to reserve DMA memory for TX");
+		err = -ENOMEM;
+		goto err_sw_ring;
+	}
+	txq->tx_desc_ring = (union gve_tx_desc *)mz->addr;
+	txq->tx_ring_phys_addr = mz->iova;
+	txq->mz = mz;
+
+	if (txq->is_gqi_qpl) {
+		txq->iov_ring = rte_zmalloc_socket("gve tx iov ring",
+						   sizeof(struct gve_tx_iovec) * nb_desc,
+						   RTE_CACHE_LINE_SIZE, socket_id);
+		if (!txq->iov_ring) {
+			PMD_DRV_LOG(ERR, "Failed to allocate memory for SW TX ring");
+			err = -ENOMEM;
+			goto err_tx_ring;
+		}
+		txq->qpl = &hw->qpl[queue_id];
+		err = gve_adminq_register_page_list(hw, txq->qpl);
+		if (err != 0) {
+			PMD_DRV_LOG(ERR, "Failed to register qpl %u", queue_id);
+			goto err_iov_ring;
+		}
+	}
+
+	mz = rte_eth_dma_zone_reserve(dev, "txq_res", queue_id, sizeof(struct gve_queue_resources),
+				      PAGE_SIZE, socket_id);
+	if (mz == NULL) {
+		PMD_DRV_LOG(ERR, "Failed to reserve DMA memory for TX resource");
+		err = -ENOMEM;
+		goto err_iov_ring;
+	}
+	txq->qres = (struct gve_queue_resources *)mz->addr;
+	txq->qres_mz = mz;
+
+	gve_reset_txq(txq);
+
+	dev->data->tx_queues[queue_id] = txq;
+
+	return 0;
+
+err_iov_ring:
+	if (txq->is_gqi_qpl)
+		rte_free(txq->iov_ring);
+err_tx_ring:
+	rte_memzone_free(txq->mz);
+err_sw_ring:
+	rte_free(txq->sw_ring);
+err_txq:
+	rte_free(txq);
+	return err;
+}
+
+void
+gve_stop_tx_queues(struct rte_eth_dev *dev)
+{
+	struct gve_priv *hw = dev->data->dev_private;
+	struct gve_tx_queue *txq;
+	uint16_t i;
+	int err;
+
+	err = gve_adminq_destroy_tx_queues(hw, dev->data->nb_tx_queues);
+	if (err != 0)
+		PMD_DRV_LOG(WARNING, "failed to destroy txqs");
+
+	for (i = 0; i < dev->data->nb_tx_queues; i++) {
+		txq = dev->data->tx_queues[i];
+		gve_release_txq_mbufs(txq);
+		gve_reset_txq(txq);
+	}
+}
diff --git a/drivers/net/gve/meson.build b/drivers/net/gve/meson.build
index 9a22cc9abe..c4fd013ef2 100644
--- a/drivers/net/gve/meson.build
+++ b/drivers/net/gve/meson.build
@@ -9,5 +9,7 @@ endif
 
 sources = files(
         'gve_adminq.c',
+        'gve_rx.c',
+        'gve_tx.c',
         'gve_ethdev.c',
 )
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 192+ messages in thread

* [PATCH v3 7/9] net/gve: add Rx/Tx support
  2022-09-23  9:38       ` [PATCH v3 0/9] introduce GVE PMD Junfeng Guo
                           ` (5 preceding siblings ...)
  2022-09-23  9:38         ` [PATCH v3 6/9] net/gve: add queue operations Junfeng Guo
@ 2022-09-23  9:38         ` Junfeng Guo
  2022-09-23  9:38         ` [PATCH v3 8/9] net/gve: add support to get dev info and configure dev Junfeng Guo
  2022-09-23  9:38         ` [PATCH v3 9/9] net/gve: add stats support Junfeng Guo
  8 siblings, 0 replies; 192+ messages in thread
From: Junfeng Guo @ 2022-09-23  9:38 UTC (permalink / raw)
  To: qi.z.zhang, jingjing.wu
  Cc: ferruh.yigit, dev, xiaoyun.li, awogbemila, bruce.richardson,
	xueqin.lin, junfeng.guo

Add Rx/Tx support for the GQI_QPL and GQI_RDA queue formats.

Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com>
Signed-off-by: Junfeng Guo <junfeng.guo@intel.com>
---
 doc/guides/nics/features/gve.ini |   2 +
 drivers/net/gve/gve_ethdev.c     |   5 +
 drivers/net/gve/gve_ethdev.h     |  16 ++
 drivers/net/gve/gve_rx.c         | 143 ++++++++++
 drivers/net/gve/gve_tx.c         | 455 +++++++++++++++++++++++++++++++
 5 files changed, 621 insertions(+)
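
Illustration only, not part of the patch: the new burst functions are reached
through the usual polling loop, e.g. a trivial loop body that forwards a
burst back out of the same port:

    #include <rte_ethdev.h>
    #include <rte_mbuf.h>

    #define BURST_SIZE 32

    static void
    forward_one_burst(uint16_t port_id)
    {
        struct rte_mbuf *pkts[BURST_SIZE];
        uint16_t nb_rx, nb_tx;

        nb_rx = rte_eth_rx_burst(port_id, 0, pkts, BURST_SIZE);
        if (nb_rx == 0)
            return;
        nb_tx = rte_eth_tx_burst(port_id, 0, pkts, nb_rx);
        /* Free whatever the Tx ring could not take. */
        while (nb_tx < nb_rx)
            rte_pktmbuf_free(pkts[nb_tx++]);
    }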

diff --git a/doc/guides/nics/features/gve.ini b/doc/guides/nics/features/gve.ini
index fbff0a5462..38dc7024d6 100644
--- a/doc/guides/nics/features/gve.ini
+++ b/doc/guides/nics/features/gve.ini
@@ -7,6 +7,8 @@
 Speed capabilities   = Y
 Link status          = Y
 MTU update           = Y
+TSO                  = Y
+L4 checksum offload  = Y
 Linux                = Y
 x86-32               = Y
 x86-64               = Y
diff --git a/drivers/net/gve/gve_ethdev.c b/drivers/net/gve/gve_ethdev.c
index 72e7a78ace..dcf79ddb23 100644
--- a/drivers/net/gve/gve_ethdev.c
+++ b/drivers/net/gve/gve_ethdev.c
@@ -583,6 +583,11 @@ gve_dev_init(struct rte_eth_dev *eth_dev)
 	if (err)
 		return err;
 
+	if (gve_is_gqi(priv)) {
+		eth_dev->rx_pkt_burst = gve_rx_burst;
+		eth_dev->tx_pkt_burst = gve_tx_burst;
+	}
+
 	eth_dev->data->mac_addrs = rte_zmalloc("gve_mac", sizeof(struct rte_ether_addr), 0);
 	if (!eth_dev->data->mac_addrs) {
 		PMD_DRV_LOG(ERR, "Failed to allocate memory to store mac address");
diff --git a/drivers/net/gve/gve_ethdev.h b/drivers/net/gve/gve_ethdev.h
index 44e075e166..0624085517 100644
--- a/drivers/net/gve/gve_ethdev.h
+++ b/drivers/net/gve/gve_ethdev.h
@@ -34,6 +34,18 @@ union gve_tx_desc {
 	struct gve_tx_seg_desc seg; /* subsequent descs for a packet */
 };
 
+/* Offload features */
+union gve_tx_offload {
+	uint64_t data;
+	struct {
+		uint64_t l2_len:7; /* L2 (MAC) Header Length. */
+		uint64_t l3_len:9; /* L3 (IP) Header Length. */
+		uint64_t l4_len:8; /* L4 Header Length. */
+		uint64_t tso_segsz:16; /* TCP TSO segment size */
+		/* uint64_t unused : 24; */
+	};
+};
+
 struct gve_tx_iovec {
 	uint32_t iov_base; /* offset in fifo */
 	uint32_t iov_len;
@@ -269,4 +281,8 @@ void gve_stop_tx_queues(struct rte_eth_dev *dev);
 
 void gve_stop_rx_queues(struct rte_eth_dev *dev);
 
+uint16_t gve_rx_burst(void *rxq, struct rte_mbuf **rx_pkts, uint16_t nb_pkts);
+
+uint16_t gve_tx_burst(void *txq, struct rte_mbuf **tx_pkts, uint16_t nb_pkts);
+
 #endif /* _GVE_ETHDEV_H_ */
diff --git a/drivers/net/gve/gve_rx.c b/drivers/net/gve/gve_rx.c
index 90b3a52aca..e29f979a4e 100644
--- a/drivers/net/gve/gve_rx.c
+++ b/drivers/net/gve/gve_rx.c
@@ -5,6 +5,149 @@
 #include "gve_ethdev.h"
 #include "gve_adminq.h"
 
+static inline void
+gve_rx_refill(struct gve_rx_queue *rxq)
+{
+	uint16_t mask = rxq->nb_rx_desc - 1;
+	uint16_t idx = rxq->next_avail & mask;
+	uint32_t next_avail = rxq->next_avail;
+	uint16_t nb_alloc, i;
+	struct rte_mbuf *nmb;
+	int diag;
+
+	/* wrap around */
+	nb_alloc = rxq->nb_rx_desc - idx;
+	if (nb_alloc <= rxq->nb_avail) {
+		diag = rte_pktmbuf_alloc_bulk(rxq->mpool, &rxq->sw_ring[idx], nb_alloc);
+		if (diag < 0) {
+			for (i = 0; i < nb_alloc; i++) {
+				nmb = rte_pktmbuf_alloc(rxq->mpool);
+				if (!nmb)
+					break;
+				rxq->sw_ring[idx + i] = nmb;
+			}
+			if (i != nb_alloc)
+				nb_alloc = i;
+		}
+		rxq->nb_avail -= nb_alloc;
+		next_avail += nb_alloc;
+
+		/* queue page list mode doesn't need real refill. */
+		if (rxq->is_gqi_qpl) {
+			idx += nb_alloc;
+		} else {
+			for (i = 0; i < nb_alloc; i++) {
+				nmb = rxq->sw_ring[idx];
+				rxq->rx_data_ring[idx].addr =
+					rte_cpu_to_be_64(rte_mbuf_data_iova(nmb));
+				idx++;
+			}
+		}
+		if (idx == rxq->nb_rx_desc)
+			idx = 0;
+	}
+
+	if (rxq->nb_avail > 0) {
+		nb_alloc = rxq->nb_avail;
+		if (rxq->nb_rx_desc < idx + rxq->nb_avail)
+			nb_alloc = rxq->nb_rx_desc - idx;
+		diag = rte_pktmbuf_alloc_bulk(rxq->mpool, &rxq->sw_ring[idx], nb_alloc);
+		if (diag < 0) {
+			for (i = 0; i < nb_alloc; i++) {
+				nmb = rte_pktmbuf_alloc(rxq->mpool);
+				if (!nmb)
+					break;
+				rxq->sw_ring[idx + i] = nmb;
+			}
+			nb_alloc = i;
+		}
+		rxq->nb_avail -= nb_alloc;
+		next_avail += nb_alloc;
+
+		if (!rxq->is_gqi_qpl) {
+			for (i = 0; i < nb_alloc; i++) {
+				nmb = rxq->sw_ring[idx];
+				rxq->rx_data_ring[idx].addr =
+					rte_cpu_to_be_64(rte_mbuf_data_iova(nmb));
+				idx++;
+			}
+		}
+	}
+
+	if (next_avail != rxq->next_avail) {
+		rte_write32(rte_cpu_to_be_32(next_avail), rxq->qrx_tail);
+		rxq->next_avail = next_avail;
+	}
+}
+
+uint16_t
+gve_rx_burst(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t nb_pkts)
+{
+	volatile struct gve_rx_desc *rxr, *rxd;
+	struct gve_rx_queue *rxq = rx_queue;
+	uint16_t rx_id = rxq->rx_tail;
+	struct rte_mbuf *rxe;
+	uint16_t nb_rx, len;
+	uint64_t addr;
+
+	rxr = rxq->rx_desc_ring;
+
+	for (nb_rx = 0; nb_rx < nb_pkts; nb_rx++) {
+		rxd = &rxr[rx_id];
+		if (GVE_SEQNO(rxd->flags_seq) != rxq->expected_seqno)
+			break;
+
+		if (rxd->flags_seq & GVE_RXF_ERR)
+			continue;
+
+		len = rte_be_to_cpu_16(rxd->len) - GVE_RX_PAD;
+		rxe = rxq->sw_ring[rx_id];
+		rxe->data_off = RTE_PKTMBUF_HEADROOM;
+		if (rxq->is_gqi_qpl) {
+			addr = (uint64_t)(rxq->qpl->mz->addr) + rx_id * PAGE_SIZE + GVE_RX_PAD;
+			rte_memcpy((void *)((size_t)rxe->buf_addr + rxe->data_off),
+				   (void *)(size_t)addr, len);
+		}
+		rxe->nb_segs = 1;
+		rxe->next = NULL;
+		rxe->pkt_len = len;
+		rxe->data_len = len;
+		rxe->port = rxq->port_id;
+		rxe->packet_type = 0;
+		rxe->ol_flags = 0;
+
+		if (rxd->flags_seq & GVE_RXF_TCP)
+			rxe->packet_type |= RTE_PTYPE_L4_TCP;
+		if (rxd->flags_seq & GVE_RXF_UDP)
+			rxe->packet_type |= RTE_PTYPE_L4_UDP;
+		if (rxd->flags_seq & GVE_RXF_IPV4)
+			rxe->packet_type |= RTE_PTYPE_L3_IPV4;
+		if (rxd->flags_seq & GVE_RXF_IPV6)
+			rxe->packet_type |= RTE_PTYPE_L3_IPV6;
+
+		if (gve_needs_rss(rxd->flags_seq)) {
+			rxe->ol_flags |= RTE_MBUF_F_RX_RSS_HASH;
+			rxe->hash.rss = rte_be_to_cpu_32(rxd->rss_hash);
+		}
+
+		rxq->expected_seqno = gve_next_seqno(rxq->expected_seqno);
+
+		rx_id++;
+		if (rx_id == rxq->nb_rx_desc)
+			rx_id = 0;
+
+		rx_pkts[nb_rx] = rxe;
+	}
+
+	rxq->nb_avail += nb_rx;
+	rxq->rx_tail = rx_id;
+
+	if (rxq->nb_avail > rxq->free_thresh)
+		gve_rx_refill(rxq);
+
+	return nb_rx;
+}
+
 static inline void
 gve_reset_rxq(struct gve_rx_queue *rxq)
 {
diff --git a/drivers/net/gve/gve_tx.c b/drivers/net/gve/gve_tx.c
index b18e8123aa..6196c29e24 100644
--- a/drivers/net/gve/gve_tx.c
+++ b/drivers/net/gve/gve_tx.c
@@ -5,6 +5,461 @@
 #include "gve_ethdev.h"
 #include "gve_adminq.h"
 
+static inline void
+gve_free_bulk_mbuf(struct rte_mbuf **txep, int num)
+{
+	struct rte_mbuf *m, *free[GVE_TX_MAX_FREE_SZ];
+	int nb_free = 0;
+	int i, s;
+
+	if (unlikely(num == 0))
+		return;
+
+	/* Find the 1st mbuf which needs to be freed */
+	for (s = 0; s < num; s++) {
+		if (txep[s] != NULL) {
+			m = rte_pktmbuf_prefree_seg(txep[s]);
+			if (m != NULL)
+				break;
+		}
+	}
+
+	if (s == num)
+		return;
+
+	free[0] = m;
+	nb_free = 1;
+	for (i = s + 1; i < num; i++) {
+		if (likely(txep[i] != NULL)) {
+			m = rte_pktmbuf_prefree_seg(txep[i]);
+			if (likely(m != NULL)) {
+				if (likely(m->pool == free[0]->pool)) {
+					free[nb_free++] = m;
+				} else {
+					rte_mempool_put_bulk(free[0]->pool, (void *)free, nb_free);
+					free[0] = m;
+					nb_free = 1;
+				}
+			}
+			txep[i] = NULL;
+		}
+	}
+	rte_mempool_put_bulk(free[0]->pool, (void **)free, nb_free);
+}
+
+static inline void
+gve_tx_clean(struct gve_tx_queue *txq)
+{
+	uint16_t mask = txq->nb_tx_desc - 1;
+	uint32_t start = txq->next_to_clean & mask;
+	uint32_t ntc, nb_clean, i;
+	struct gve_tx_iovec *iov;
+
+	ntc = rte_be_to_cpu_32(rte_read32(txq->qtx_head));
+	ntc = ntc & mask;
+
+	if (ntc == start)
+		return;
+
+	/* if wrap around, free twice. */
+	if (ntc < start) {
+		nb_clean = txq->nb_tx_desc - start;
+		if (nb_clean > GVE_TX_MAX_FREE_SZ)
+			nb_clean = GVE_TX_MAX_FREE_SZ;
+		if (txq->is_gqi_qpl) {
+			for (i = start; i < start + nb_clean; i++) {
+				iov = &txq->iov_ring[i];
+				txq->fifo_avail += iov->iov_len;
+				iov->iov_base = 0;
+				iov->iov_len = 0;
+			}
+		} else {
+			gve_free_bulk_mbuf(&txq->sw_ring[start], nb_clean);
+		}
+		txq->nb_free += nb_clean;
+		start += nb_clean;
+		if (start == txq->nb_tx_desc)
+			start = 0;
+		txq->next_to_clean += nb_clean;
+	}
+
+	if (ntc > start) {
+		nb_clean = ntc - start;
+		if (nb_clean > GVE_TX_MAX_FREE_SZ)
+			nb_clean = GVE_TX_MAX_FREE_SZ;
+		if (txq->is_gqi_qpl) {
+			for (i = start; i < start + nb_clean; i++) {
+				iov = &txq->iov_ring[i];
+				txq->fifo_avail += iov->iov_len;
+				iov->iov_base = 0;
+				iov->iov_len = 0;
+			}
+		} else {
+			gve_free_bulk_mbuf(&txq->sw_ring[start], nb_clean);
+		}
+		txq->nb_free += nb_clean;
+		txq->next_to_clean += nb_clean;
+	}
+}
+
+static inline void
+gve_tx_clean_swr_qpl(struct gve_tx_queue *txq)
+{
+	uint32_t start = txq->sw_ntc;
+	uint32_t ntc, nb_clean;
+
+	ntc = txq->sw_tail;
+
+	if (ntc == start)
+		return;
+
+	/* if wrap around, free twice. */
+	if (ntc < start) {
+		nb_clean = txq->nb_tx_desc - start;
+		if (nb_clean > GVE_TX_MAX_FREE_SZ)
+			nb_clean = GVE_TX_MAX_FREE_SZ;
+		gve_free_bulk_mbuf(&txq->sw_ring[start], nb_clean);
+
+		txq->sw_nb_free += nb_clean;
+		start += nb_clean;
+		if (start == txq->nb_tx_desc)
+			start = 0;
+		txq->sw_ntc = start;
+	}
+
+	if (ntc > start) {
+		nb_clean = ntc - start;
+		if (nb_clean > GVE_TX_MAX_FREE_SZ)
+			nb_clean = GVE_TX_MAX_FREE_SZ;
+		gve_free_bulk_mbuf(&txq->sw_ring[start], nb_clean);
+		txq->sw_nb_free += nb_clean;
+		start += nb_clean;
+		txq->sw_ntc = start;
+	}
+}
+
+static inline void
+gve_tx_fill_pkt_desc(volatile union gve_tx_desc *desc, struct rte_mbuf *mbuf,
+		     uint8_t desc_cnt, uint16_t len, uint64_t addr)
+{
+	uint64_t csum_l4 = mbuf->ol_flags & RTE_MBUF_F_TX_L4_MASK;
+	uint8_t l4_csum_offset = 0;
+	uint8_t l4_hdr_offset = 0;
+
+	if (mbuf->ol_flags & RTE_MBUF_F_TX_TCP_SEG)
+		csum_l4 |= RTE_MBUF_F_TX_TCP_CKSUM;
+
+	switch (csum_l4) {
+	case RTE_MBUF_F_TX_TCP_CKSUM:
+		l4_csum_offset = offsetof(struct rte_tcp_hdr, cksum);
+		l4_hdr_offset = mbuf->l2_len + mbuf->l3_len;
+		break;
+	case RTE_MBUF_F_TX_UDP_CKSUM:
+		l4_csum_offset = offsetof(struct rte_udp_hdr, dgram_cksum);
+		l4_hdr_offset = mbuf->l2_len + mbuf->l3_len;
+		break;
+	case RTE_MBUF_F_TX_SCTP_CKSUM:
+		l4_csum_offset = offsetof(struct rte_sctp_hdr, cksum);
+		l4_hdr_offset = mbuf->l2_len + mbuf->l3_len;
+		break;
+	}
+
+	if (mbuf->ol_flags & RTE_MBUF_F_TX_TCP_SEG) {
+		desc->pkt.type_flags = GVE_TXD_TSO | GVE_TXF_L4CSUM;
+		desc->pkt.l4_csum_offset = l4_csum_offset >> 1;
+		desc->pkt.l4_hdr_offset = l4_hdr_offset >> 1;
+	} else if (mbuf->ol_flags & RTE_MBUF_F_TX_L4_MASK) {
+		desc->pkt.type_flags = GVE_TXD_STD | GVE_TXF_L4CSUM;
+		desc->pkt.l4_csum_offset = l4_csum_offset >> 1;
+		desc->pkt.l4_hdr_offset = l4_hdr_offset >> 1;
+	} else {
+		desc->pkt.type_flags = GVE_TXD_STD;
+		desc->pkt.l4_csum_offset = 0;
+		desc->pkt.l4_hdr_offset = 0;
+	}
+	desc->pkt.desc_cnt = desc_cnt;
+	desc->pkt.len = rte_cpu_to_be_16(mbuf->pkt_len);
+	desc->pkt.seg_len = rte_cpu_to_be_16(len);
+	desc->pkt.seg_addr = rte_cpu_to_be_64(addr);
+}
+
+static inline void
+gve_tx_fill_seg_desc(volatile union gve_tx_desc *desc, uint64_t ol_flags,
+		      union gve_tx_offload tx_offload,
+		      uint16_t len, uint64_t addr)
+{
+	desc->seg.type_flags = GVE_TXD_SEG;
+	if (ol_flags & RTE_MBUF_F_TX_TCP_SEG) {
+		if (ol_flags & RTE_MBUF_F_TX_IPV6)
+			desc->seg.type_flags |= GVE_TXSF_IPV6;
+		desc->seg.l3_offset = tx_offload.l2_len >> 1;
+		desc->seg.mss = rte_cpu_to_be_16(tx_offload.tso_segsz);
+	}
+	desc->seg.seg_len = rte_cpu_to_be_16(len);
+	desc->seg.seg_addr = rte_cpu_to_be_64(addr);
+}
+
+static inline bool
+is_fifo_avail(struct gve_tx_queue *txq, uint16_t len)
+{
+	if (txq->fifo_avail < len)
+		return false;
+	/* Don't split segment. */
+	if (txq->fifo_head + len > txq->fifo_size &&
+	    txq->fifo_size - txq->fifo_head + len > txq->fifo_avail)
+		return false;
+	return true;
+}
+static inline uint64_t
+gve_tx_alloc_from_fifo(struct gve_tx_queue *txq, uint16_t tx_id, uint16_t len)
+{
+	uint32_t head = txq->fifo_head;
+	uint32_t size = txq->fifo_size;
+	struct gve_tx_iovec *iov;
+	uint32_t aligned_head;
+	uint32_t iov_len = 0;
+	uint64_t fifo_addr;
+
+	iov = &txq->iov_ring[tx_id];
+
+	/* Don't split segment */
+	if (head + len > size) {
+		iov_len += (size - head);
+		head = 0;
+	}
+
+	fifo_addr = head;
+	iov_len += len;
+	iov->iov_base = head;
+
+	/* Re-align to a cacheline for next head */
+	head += len;
+	aligned_head = RTE_ALIGN(head, RTE_CACHE_LINE_SIZE);
+	iov_len += (aligned_head - head);
+	iov->iov_len = iov_len;
+
+	if (aligned_head == txq->fifo_size)
+		aligned_head = 0;
+	txq->fifo_head = aligned_head;
+	txq->fifo_avail -= iov_len;
+
+	return fifo_addr;
+}
+
+static inline uint16_t
+gve_tx_burst_qpl(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
+{
+	union gve_tx_offload tx_offload = {0};
+	volatile union gve_tx_desc *txr, *txd;
+	struct gve_tx_queue *txq = tx_queue;
+	struct rte_mbuf **sw_ring = txq->sw_ring;
+	uint16_t mask = txq->nb_tx_desc - 1;
+	uint16_t tx_id = txq->tx_tail & mask;
+	uint64_t ol_flags, addr, fifo_addr;
+	uint32_t tx_tail = txq->tx_tail;
+	struct rte_mbuf *tx_pkt, *first;
+	uint16_t sw_id = txq->sw_tail;
+	uint16_t nb_used, i;
+	uint16_t nb_tx = 0;
+	uint32_t hlen;
+
+	txr = txq->tx_desc_ring;
+
+	if (txq->nb_free < txq->free_thresh || txq->fifo_avail == 0)
+		gve_tx_clean(txq);
+
+	if (txq->sw_nb_free < txq->free_thresh)
+		gve_tx_clean_swr_qpl(txq);
+
+	for (nb_tx = 0; nb_tx < nb_pkts; nb_tx++) {
+		tx_pkt = *tx_pkts++;
+		ol_flags = tx_pkt->ol_flags;
+
+		if (txq->sw_nb_free < tx_pkt->nb_segs) {
+			gve_tx_clean_swr_qpl(txq);
+			if (txq->sw_nb_free < tx_pkt->nb_segs)
+				goto end_of_tx;
+		}
+
+		/* Even for multi-segs, use 1 qpl buf for data */
+		nb_used = 1;
+		if (ol_flags & RTE_MBUF_F_TX_TCP_SEG)
+			nb_used++;
+
+		if (txq->nb_free < nb_used)
+			goto end_of_tx;
+
+		tx_offload.l2_len = tx_pkt->l2_len;
+		tx_offload.l3_len = tx_pkt->l3_len;
+		tx_offload.l4_len = tx_pkt->l4_len;
+		tx_offload.tso_segsz = tx_pkt->tso_segsz;
+
+		first = tx_pkt;
+		txd = &txr[tx_id];
+		hlen = ol_flags & RTE_MBUF_F_TX_TCP_SEG ?
+			(uint32_t)(tx_offload.l2_len + tx_offload.l3_len + tx_offload.l4_len) :
+			tx_pkt->pkt_len;
+
+		sw_ring[sw_id] = tx_pkt;
+		if (!is_fifo_avail(txq, hlen)) {
+			gve_tx_clean(txq);
+			if (!is_fifo_avail(txq, hlen))
+				goto end_of_tx;
+		}
+		addr = (uint64_t)(tx_pkt->buf_addr) + tx_pkt->data_off;
+		fifo_addr = gve_tx_alloc_from_fifo(txq, tx_id, hlen);
+
+		/* For TSO, check if there's enough fifo space for data first */
+		if (ol_flags & RTE_MBUF_F_TX_TCP_SEG) {
+			if (!is_fifo_avail(txq, tx_pkt->pkt_len - hlen)) {
+				gve_tx_clean(txq);
+				if (!is_fifo_avail(txq, tx_pkt->pkt_len - hlen))
+					goto end_of_tx;
+			}
+		}
+		if (tx_pkt->nb_segs == 1 || ol_flags & RTE_MBUF_F_TX_TCP_SEG)
+			rte_memcpy((void *)(size_t)(fifo_addr + txq->fifo_base),
+				   (void *)(size_t)addr, hlen);
+		else
+			rte_pktmbuf_read(tx_pkt, 0, hlen,
+					 (void *)(size_t)(fifo_addr + txq->fifo_base));
+		gve_tx_fill_pkt_desc(txd, tx_pkt, nb_used, hlen, fifo_addr);
+
+		if (ol_flags & RTE_MBUF_F_TX_TCP_SEG) {
+			tx_id = (tx_id + 1) & mask;
+			txd = &txr[tx_id];
+			addr = (uint64_t)(tx_pkt->buf_addr) + tx_pkt->data_off + hlen;
+			fifo_addr = gve_tx_alloc_from_fifo(txq, tx_id, tx_pkt->pkt_len - hlen);
+			if (tx_pkt->nb_segs == 1)
+				rte_memcpy((void *)(size_t)(fifo_addr + txq->fifo_base),
+					   (void *)(size_t)addr,
+					   tx_pkt->pkt_len - hlen);
+			else
+				rte_pktmbuf_read(tx_pkt, hlen, tx_pkt->pkt_len - hlen,
+						 (void *)(size_t)(fifo_addr + txq->fifo_base));
+
+			gve_tx_fill_seg_desc(txd, ol_flags, tx_offload,
+					     tx_pkt->pkt_len - hlen, fifo_addr);
+		}
+
+		/* record mbuf in sw_ring for free */
+		for (i = 1; i < first->nb_segs; i++) {
+			sw_id = (sw_id + 1) & mask;
+			tx_pkt = tx_pkt->next;
+			sw_ring[sw_id] = tx_pkt;
+		}
+
+		sw_id = (sw_id + 1) & mask;
+		tx_id = (tx_id + 1) & mask;
+
+		txq->nb_free -= nb_used;
+		txq->sw_nb_free -= first->nb_segs;
+		tx_tail += nb_used;
+	}
+
+end_of_tx:
+	if (nb_tx) {
+		rte_write32(rte_cpu_to_be_32(tx_tail), txq->qtx_tail);
+		txq->tx_tail = tx_tail;
+		txq->sw_tail = sw_id;
+	}
+
+	return nb_tx;
+}
+
+static inline uint16_t
+gve_tx_burst_ra(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
+{
+	union gve_tx_offload tx_offload = {0};
+	volatile union gve_tx_desc *txr, *txd;
+	struct gve_tx_queue *txq = tx_queue;
+	struct rte_mbuf **sw_ring = txq->sw_ring;
+	uint16_t mask = txq->nb_tx_desc - 1;
+	uint16_t tx_id = txq->tx_tail & mask;
+	uint32_t tx_tail = txq->tx_tail;
+	struct rte_mbuf *tx_pkt, *first;
+	uint16_t nb_used, hlen, i;
+	uint64_t ol_flags, addr;
+	uint16_t nb_tx = 0;
+
+	txr = txq->tx_desc_ring;
+
+	if (txq->nb_free < txq->free_thresh)
+		gve_tx_clean(txq);
+
+	for (nb_tx = 0; nb_tx < nb_pkts; nb_tx++) {
+		tx_pkt = *tx_pkts++;
+		ol_flags = tx_pkt->ol_flags;
+
+		nb_used = tx_pkt->nb_segs;
+		if (ol_flags & RTE_MBUF_F_TX_TCP_SEG)
+			nb_used++;
+
+		if (txq->nb_free < nb_used)
+			goto end_of_tx;
+
+		tx_offload.l2_len = tx_pkt->l2_len;
+		tx_offload.l3_len = tx_pkt->l3_len;
+		tx_offload.l4_len = tx_pkt->l4_len;
+		tx_offload.tso_segsz = tx_pkt->tso_segsz;
+
+		first = tx_pkt;
+		txd = &txr[tx_id];
+
+		hlen = ol_flags & RTE_MBUF_F_TX_TCP_SEG ?
+			(uint32_t)(tx_offload.l2_len + tx_offload.l3_len + tx_offload.l4_len) :
+			tx_pkt->pkt_len;
+		/*
+		 * if tso, the driver needs to fill 2 descs for 1 mbuf
+		 * so only put this mbuf into the 1st tx entry in sw ring
+		 */
+		sw_ring[tx_id] = tx_pkt;
+		addr = rte_mbuf_data_iova(tx_pkt);
+		gve_tx_fill_pkt_desc(txd, tx_pkt, nb_used, hlen, addr);
+
+		if (ol_flags & RTE_MBUF_F_TX_TCP_SEG) {
+			tx_id = (tx_id + 1) & mask;
+			txd = &txr[tx_id];
+			addr = rte_mbuf_data_iova(tx_pkt) + hlen;
+			gve_tx_fill_seg_desc(txd, ol_flags, tx_offload,
+					     tx_pkt->data_len - hlen, addr);
+		}
+
+		for (i = 1; i < first->nb_segs; i++) {
+			tx_id = (tx_id + 1) & mask;
+			txd = &txr[tx_id];
+			tx_pkt = tx_pkt->next;
+			sw_ring[tx_id] = tx_pkt;
+			addr = rte_mbuf_data_iova(tx_pkt);
+			gve_tx_fill_seg_desc(txd, ol_flags, tx_offload,
+					     tx_pkt->data_len, addr);
+		}
+		tx_id = (tx_id + 1) & mask;
+
+		txq->nb_free -= nb_used;
+		tx_tail += nb_used;
+	}
+
+end_of_tx:
+	if (nb_tx) {
+		rte_write32(rte_cpu_to_be_32(tx_tail), txq->qtx_tail);
+		txq->tx_tail = tx_tail;
+	}
+
+	return nb_tx;
+}
+
+uint16_t
+gve_tx_burst(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
+{
+	struct gve_tx_queue *txq = tx_queue;
+
+	if (txq->is_gqi_qpl)
+		return gve_tx_burst_qpl(tx_queue, tx_pkts, nb_pkts);
+
+	return gve_tx_burst_ra(tx_queue, tx_pkts, nb_pkts);
+}
+
 static inline void
 gve_reset_txq(struct gve_tx_queue *txq)
 {
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 192+ messages in thread

* [PATCH v3 8/9] net/gve: add support to get dev info and configure dev
  2022-09-23  9:38       ` [PATCH v3 0/9] introduce GVE PMD Junfeng Guo
                           ` (6 preceding siblings ...)
  2022-09-23  9:38         ` [PATCH v3 7/9] net/gve: add Rx/Tx support Junfeng Guo
@ 2022-09-23  9:38         ` Junfeng Guo
  2022-09-23  9:38         ` [PATCH v3 9/9] net/gve: add stats support Junfeng Guo
  8 siblings, 0 replies; 192+ messages in thread
From: Junfeng Guo @ 2022-09-23  9:38 UTC (permalink / raw)
  To: qi.z.zhang, jingjing.wu
  Cc: ferruh.yigit, dev, xiaoyun.li, awogbemila, bruce.richardson,
	xueqin.lin, junfeng.guo

Add the dev_infos_get dev_ops callback.
Complete dev_configure with Rx offloads configuration.
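
As a reference point, below is a minimal sketch of how an application could
consume this callback through the generic ethdev API (illustrative only; the
port id and the printed fields are assumptions, not part of this patch):

#include <stdio.h>
#include <inttypes.h>
#include <rte_ethdev.h>

/* Illustrative only: query the capabilities reported via dev_infos_get. */
static int
show_dev_info(uint16_t port_id)
{
	struct rte_eth_dev_info dev_info;
	int ret = rte_eth_dev_info_get(port_id, &dev_info);

	if (ret != 0)
		return ret;

	printf("max_rx_queues=%u max_tx_queues=%u max_rx_pktlen=%u\n",
	       dev_info.max_rx_queues, dev_info.max_tx_queues,
	       dev_info.max_rx_pktlen);
	printf("tx_offload_capa=0x%" PRIx64 "\n", dev_info.tx_offload_capa);
	return 0;
}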

Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com>
Signed-off-by: Junfeng Guo <junfeng.guo@intel.com>
---
 doc/guides/nics/features/gve.ini |  1 +
 drivers/net/gve/gve_ethdev.c     | 63 ++++++++++++++++++++++++++++++++
 2 files changed, 64 insertions(+)

diff --git a/doc/guides/nics/features/gve.ini b/doc/guides/nics/features/gve.ini
index 38dc7024d6..cdc46b08a3 100644
--- a/doc/guides/nics/features/gve.ini
+++ b/doc/guides/nics/features/gve.ini
@@ -8,6 +8,7 @@ Speed capabilities   = Y
 Link status          = Y
 MTU update           = Y
 TSO                  = Y
+RSS hash             = Y
 L4 checksum offload  = Y
 Linux                = Y
 x86-32               = Y
diff --git a/drivers/net/gve/gve_ethdev.c b/drivers/net/gve/gve_ethdev.c
index dcf79ddb23..e3195376c4 100644
--- a/drivers/net/gve/gve_ethdev.c
+++ b/drivers/net/gve/gve_ethdev.c
@@ -93,6 +93,14 @@ gve_free_qpls(struct gve_priv *priv)
 static int
 gve_dev_configure(__rte_unused struct rte_eth_dev *dev)
 {
+	struct gve_priv *priv = dev->data->dev_private;
+
+	if (dev->data->dev_conf.rxmode.mq_mode & RTE_ETH_MQ_RX_RSS_FLAG)
+		dev->data->dev_conf.rxmode.offloads |= RTE_ETH_RX_OFFLOAD_RSS_HASH;
+
+	if (dev->data->dev_conf.rxmode.offloads & RTE_ETH_RX_OFFLOAD_TCP_LRO)
+		priv->enable_rsc = 1;
+
 	return 0;
 }
 
@@ -266,6 +274,60 @@ gve_dev_close(struct rte_eth_dev *dev)
 	return err;
 }
 
+static int
+gve_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
+{
+	struct gve_priv *priv = dev->data->dev_private;
+
+	dev_info->device = dev->device;
+	dev_info->max_mac_addrs = 1;
+	dev_info->max_rx_queues = priv->max_nb_rxq;
+	dev_info->max_tx_queues = priv->max_nb_txq;
+	dev_info->min_rx_bufsize = GVE_MIN_BUF_SIZE;
+	dev_info->max_rx_pktlen = GVE_MAX_RX_PKTLEN;
+	dev_info->max_mtu = RTE_ETHER_MTU;
+	dev_info->min_mtu = RTE_ETHER_MIN_MTU;
+
+	dev_info->rx_offload_capa = 0;
+	dev_info->tx_offload_capa =
+		RTE_ETH_TX_OFFLOAD_MULTI_SEGS |
+		RTE_ETH_TX_OFFLOAD_IPV4_CKSUM |
+		RTE_ETH_TX_OFFLOAD_UDP_CKSUM |
+		RTE_ETH_TX_OFFLOAD_TCP_CKSUM |
+		RTE_ETH_TX_OFFLOAD_SCTP_CKSUM |
+		RTE_ETH_TX_OFFLOAD_TCP_TSO;
+
+	if (priv->queue_format == GVE_DQO_RDA_FORMAT)
+		dev_info->rx_offload_capa |= RTE_ETH_RX_OFFLOAD_TCP_LRO;
+
+	dev_info->default_rxconf = (struct rte_eth_rxconf) {
+		.rx_free_thresh = GVE_DEFAULT_RX_FREE_THRESH,
+		.rx_drop_en = 0,
+		.offloads = 0,
+	};
+
+	dev_info->default_txconf = (struct rte_eth_txconf) {
+		.tx_free_thresh = GVE_DEFAULT_TX_FREE_THRESH,
+		.offloads = 0,
+	};
+
+	dev_info->default_rxportconf.ring_size = priv->rx_desc_cnt;
+	dev_info->rx_desc_lim = (struct rte_eth_desc_lim) {
+		.nb_max = priv->rx_desc_cnt,
+		.nb_min = priv->rx_desc_cnt,
+		.nb_align = 1,
+	};
+
+	dev_info->default_txportconf.ring_size = priv->tx_desc_cnt;
+	dev_info->tx_desc_lim = (struct rte_eth_desc_lim) {
+		.nb_max = priv->tx_desc_cnt,
+		.nb_min = priv->tx_desc_cnt,
+		.nb_align = 1,
+	};
+
+	return 0;
+}
+
 static int
 gve_dev_mtu_set(struct rte_eth_dev *dev, uint16_t mtu)
 {
@@ -299,6 +361,7 @@ static const struct eth_dev_ops gve_eth_dev_ops = {
 	.dev_start            = gve_dev_start,
 	.dev_stop             = gve_dev_stop,
 	.dev_close            = gve_dev_close,
+	.dev_infos_get        = gve_dev_info_get,
 	.rx_queue_setup       = gve_rx_queue_setup,
 	.tx_queue_setup       = gve_tx_queue_setup,
 	.link_update          = gve_link_update,
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 192+ messages in thread

* [PATCH v3 9/9] net/gve: add stats support
  2022-09-23  9:38       ` [PATCH v3 0/9] introduce GVE PMD Junfeng Guo
                           ` (7 preceding siblings ...)
  2022-09-23  9:38         ` [PATCH v3 8/9] net/gve: add support to get dev info and configure dev Junfeng Guo
@ 2022-09-23  9:38         ` Junfeng Guo
  8 siblings, 0 replies; 192+ messages in thread
From: Junfeng Guo @ 2022-09-23  9:38 UTC (permalink / raw)
  To: qi.z.zhang, jingjing.wu
  Cc: ferruh.yigit, dev, xiaoyun.li, awogbemila, bruce.richardson,
	xueqin.lin, junfeng.guo

Add support for the dev_ops stats_get/reset callbacks to report and clear
basic per-port and per-queue statistics.
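
For context, a minimal sketch of how these callbacks are reached from an
application (illustrative only; the port id and the output format are
assumptions, not part of this patch):

#include <stdio.h>
#include <inttypes.h>
#include <rte_ethdev.h>

/* Illustrative only: read the counters filled by stats_get, then clear them. */
static void
dump_and_clear_stats(uint16_t port_id)
{
	struct rte_eth_stats stats;

	if (rte_eth_stats_get(port_id, &stats) == 0)
		printf("ipackets=%" PRIu64 " opackets=%" PRIu64 " rx_nombuf=%" PRIu64 "\n",
		       stats.ipackets, stats.opackets, stats.rx_nombuf);

	rte_eth_stats_reset(port_id);	/* ends up in the PMD's stats_reset */
}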

Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com>
Signed-off-by: Junfeng Guo <junfeng.guo@intel.com>
---
 doc/guides/nics/features/gve.ini |  2 +
 drivers/net/gve/gve_ethdev.c     | 71 ++++++++++++++++++++++++++++++++
 drivers/net/gve/gve_ethdev.h     | 12 ++++++
 drivers/net/gve/gve_rx.c         | 15 ++++++-
 drivers/net/gve/gve_tx.c         | 13 ++++++
 5 files changed, 111 insertions(+), 2 deletions(-)

diff --git a/doc/guides/nics/features/gve.ini b/doc/guides/nics/features/gve.ini
index cdc46b08a3..180408aa80 100644
--- a/doc/guides/nics/features/gve.ini
+++ b/doc/guides/nics/features/gve.ini
@@ -10,6 +10,8 @@ MTU update           = Y
 TSO                  = Y
 RSS hash             = Y
 L4 checksum offload  = Y
+Basic stats          = Y
+Stats per queue      = Y
 Linux                = Y
 x86-32               = Y
 x86-64               = Y
diff --git a/drivers/net/gve/gve_ethdev.c b/drivers/net/gve/gve_ethdev.c
index e3195376c4..7730835ed5 100644
--- a/drivers/net/gve/gve_ethdev.c
+++ b/drivers/net/gve/gve_ethdev.c
@@ -328,6 +328,75 @@ gve_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
 	return 0;
 }
 
+static int
+gve_dev_stats_get(struct rte_eth_dev *dev, struct rte_eth_stats *stats)
+{
+	uint16_t i;
+
+	for (i = 0; i < dev->data->nb_tx_queues; i++) {
+		struct gve_tx_queue *txq = dev->data->tx_queues[i];
+		if (txq == NULL)
+			continue;
+
+		stats->opackets += txq->packets;
+		stats->obytes += txq->bytes;
+		stats->oerrors += txq->errors;
+
+		if (i < RTE_ETHDEV_QUEUE_STAT_CNTRS) {
+			stats->q_opackets[i] = txq->packets;
+			stats->q_obytes[i] = txq->bytes;
+		}
+	}
+
+	for (i = 0; i < dev->data->nb_rx_queues; i++) {
+		struct gve_rx_queue *rxq = dev->data->rx_queues[i];
+		if (rxq == NULL)
+			continue;
+
+		stats->ipackets += rxq->packets;
+		stats->ibytes += rxq->bytes;
+		stats->ierrors += rxq->errors;
+		stats->rx_nombuf += rxq->no_mbufs;
+
+		if (i < RTE_ETHDEV_QUEUE_STAT_CNTRS) {
+			stats->q_ipackets[i] = rxq->packets;
+			stats->q_ibytes[i] = rxq->bytes;
+			stats->q_errors[i] = rxq->errors;
+		}
+	}
+
+	return 0;
+}
+
+static int
+gve_dev_stats_reset(struct rte_eth_dev *dev)
+{
+	uint16_t i;
+
+	for (i = 0; i < dev->data->nb_tx_queues; i++) {
+		struct gve_tx_queue *txq = dev->data->tx_queues[i];
+		if (txq == NULL)
+			continue;
+
+		txq->packets  = 0;
+		txq->bytes = 0;
+		txq->errors = 0;
+	}
+
+	for (i = 0; i < dev->data->nb_rx_queues; i++) {
+		struct gve_rx_queue *rxq = dev->data->rx_queues[i];
+		if (rxq == NULL)
+			continue;
+
+		rxq->packets  = 0;
+		rxq->bytes = 0;
+		rxq->no_mbufs = 0;
+		rxq->errors = 0;
+	}
+
+	return 0;
+}
+
 static int
 gve_dev_mtu_set(struct rte_eth_dev *dev, uint16_t mtu)
 {
@@ -365,6 +434,8 @@ static const struct eth_dev_ops gve_eth_dev_ops = {
 	.rx_queue_setup       = gve_rx_queue_setup,
 	.tx_queue_setup       = gve_tx_queue_setup,
 	.link_update          = gve_link_update,
+	.stats_get            = gve_dev_stats_get,
+	.stats_reset          = gve_dev_stats_reset,
 	.mtu_set              = gve_dev_mtu_set,
 };
 
diff --git a/drivers/net/gve/gve_ethdev.h b/drivers/net/gve/gve_ethdev.h
index 0624085517..a07c438b5d 100644
--- a/drivers/net/gve/gve_ethdev.h
+++ b/drivers/net/gve/gve_ethdev.h
@@ -76,6 +76,11 @@ struct gve_tx_queue {
 	struct gve_queue_page_list *qpl;
 	struct gve_tx_iovec *iov_ring;
 
+	/* Stats */
+	uint64_t errors;
+	uint64_t packets;
+	uint64_t bytes;
+
 	uint16_t port_id;
 	uint16_t queue_id;
 
@@ -114,6 +119,12 @@ struct gve_rx_queue {
 	/* only valid for GQI_QPL queue format */
 	struct gve_queue_page_list *qpl;
 
+	/* stats */
+	uint64_t no_mbufs;
+	uint64_t errors;
+	uint64_t packets;
+	uint64_t bytes;
+
 	struct gve_priv *hw;
 	const struct rte_memzone *qres_mz;
 	struct gve_queue_resources *qres;
@@ -125,6 +136,7 @@ struct gve_rx_queue {
 
 	/* Only valid for DQO_RDA queue format */
 	struct gve_rx_queue *bufq;
+
 	uint8_t is_gqi_qpl;
 };
 
diff --git a/drivers/net/gve/gve_rx.c b/drivers/net/gve/gve_rx.c
index e29f979a4e..8d3ee35472 100644
--- a/drivers/net/gve/gve_rx.c
+++ b/drivers/net/gve/gve_rx.c
@@ -26,8 +26,10 @@ gve_rx_refill(struct gve_rx_queue *rxq)
 					break;
 				rxq->sw_ring[idx + i] = nmb;
 			}
-			if (i != nb_alloc)
+			if (i != nb_alloc) {
+				rxq->no_mbufs += nb_alloc - i;
 				nb_alloc = i;
+			}
 		}
 		rxq->nb_avail -= nb_alloc;
 		next_avail += nb_alloc;
@@ -88,6 +90,7 @@ gve_rx_burst(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t nb_pkts)
 	uint16_t rx_id = rxq->rx_tail;
 	struct rte_mbuf *rxe;
 	uint16_t nb_rx, len;
+	uint64_t bytes = 0;
 	uint64_t addr;
 
 	rxr = rxq->rx_desc_ring;
@@ -97,8 +100,10 @@ gve_rx_burst(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t nb_pkts)
 		if (GVE_SEQNO(rxd->flags_seq) != rxq->expected_seqno)
 			break;
 
-		if (rxd->flags_seq & GVE_RXF_ERR)
+		if (rxd->flags_seq & GVE_RXF_ERR) {
+			rxq->errors++;
 			continue;
+		}
 
 		len = rte_be_to_cpu_16(rxd->len) - GVE_RX_PAD;
 		rxe = rxq->sw_ring[rx_id];
@@ -137,6 +142,7 @@ gve_rx_burst(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t nb_pkts)
 			rx_id = 0;
 
 		rx_pkts[nb_rx] = rxe;
+		bytes += len;
 	}
 
 	rxq->nb_avail += nb_rx;
@@ -145,6 +151,11 @@ gve_rx_burst(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t nb_pkts)
 	if (rxq->nb_avail > rxq->free_thresh)
 		gve_rx_refill(rxq);
 
+	if (nb_rx) {
+		rxq->packets += nb_rx;
+		rxq->bytes += bytes;
+	}
+
 	return nb_rx;
 }
 
diff --git a/drivers/net/gve/gve_tx.c b/drivers/net/gve/gve_tx.c
index 6196c29e24..81778840cf 100644
--- a/drivers/net/gve/gve_tx.c
+++ b/drivers/net/gve/gve_tx.c
@@ -260,6 +260,7 @@ gve_tx_burst_qpl(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 	struct rte_mbuf *tx_pkt, *first;
 	uint16_t sw_id = txq->sw_tail;
 	uint16_t nb_used, i;
+	uint64_t bytes = 0;
 	uint16_t nb_tx = 0;
 	uint32_t hlen;
 
@@ -355,6 +356,8 @@ gve_tx_burst_qpl(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 		txq->nb_free -= nb_used;
 		txq->sw_nb_free -= first->nb_segs;
 		tx_tail += nb_used;
+
+		bytes += first->pkt_len;
 	}
 
 end_of_tx:
@@ -362,6 +365,10 @@ gve_tx_burst_qpl(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 		rte_write32(rte_cpu_to_be_32(tx_tail), txq->qtx_tail);
 		txq->tx_tail = tx_tail;
 		txq->sw_tail = sw_id;
+
+		txq->errors += nb_pkts - nb_tx;
+		txq->packets += nb_tx;
+		txq->bytes += bytes;
 	}
 
 	return nb_tx;
@@ -380,6 +387,7 @@ gve_tx_burst_ra(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 	struct rte_mbuf *tx_pkt, *first;
 	uint16_t nb_used, hlen, i;
 	uint64_t ol_flags, addr;
+	uint64_t bytes = 0;
 	uint16_t nb_tx = 0;
 
 	txr = txq->tx_desc_ring;
@@ -438,12 +446,17 @@ gve_tx_burst_ra(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 
 		txq->nb_free -= nb_used;
 		tx_tail += nb_used;
+
+		bytes += first->pkt_len;
 	}
 
 end_of_tx:
 	if (nb_tx) {
 		rte_write32(rte_cpu_to_be_32(tx_tail), txq->qtx_tail);
 		txq->tx_tail = tx_tail;
+
+		txq->packets += nb_tx;
+		txq->bytes += bytes;
 	}
 
 	return nb_tx;
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 192+ messages in thread

* Re: [PATCH v3 1/9] net/gve: introduce GVE PMD base code
  2022-09-23  9:38         ` [PATCH v3 1/9] net/gve: introduce GVE PMD base code Junfeng Guo
@ 2022-09-23 18:57           ` Stephen Hemminger
  2022-09-27  7:27             ` Guo, Junfeng
  2022-09-23 18:58           ` Stephen Hemminger
  2022-09-27  7:32           ` [PATCH v4 0/9] introduce GVE PMD Junfeng Guo
  2 siblings, 1 reply; 192+ messages in thread
From: Stephen Hemminger @ 2022-09-23 18:57 UTC (permalink / raw)
  To: Junfeng Guo
  Cc: qi.z.zhang, jingjing.wu, ferruh.yigit, dev, xiaoyun.li,
	awogbemila, bruce.richardson, xueqin.lin, Haiyue Wang

On Fri, 23 Sep 2022 17:38:21 +0800
Junfeng Guo <junfeng.guo@intel.com> wrote:

> +#define GVE_DEVICE_OPTION_TOO_BIG_FMT "Length of %s option larger than expected. Possible older version of guest driver."

Why do you need #define for this?

+#define GVE_DEVICE_OPTION_ERROR_FMT "%s option error:\n" \
+"Expected: length=%d, feature_mask=%x.\n" \
+"Actual: length=%d, feature_mask=%x."
+

Why such a wordy multi-line message, please use single line

^ permalink raw reply	[flat|nested] 192+ messages in thread

* Re: [PATCH v3 1/9] net/gve: introduce GVE PMD base code
  2022-09-23  9:38         ` [PATCH v3 1/9] net/gve: introduce GVE PMD base code Junfeng Guo
  2022-09-23 18:57           ` Stephen Hemminger
@ 2022-09-23 18:58           ` Stephen Hemminger
  2022-09-27  7:27             ` Guo, Junfeng
  2022-09-27  7:32           ` [PATCH v4 0/9] introduce GVE PMD Junfeng Guo
  2 siblings, 1 reply; 192+ messages in thread
From: Stephen Hemminger @ 2022-09-23 18:58 UTC (permalink / raw)
  To: Junfeng Guo
  Cc: qi.z.zhang, jingjing.wu, ferruh.yigit, dev, xiaoyun.li,
	awogbemila, bruce.richardson, xueqin.lin, Haiyue Wang

On Fri, 23 Sep 2022 17:38:21 +0800
Junfeng Guo <junfeng.guo@intel.com> wrote:

> Note that these code are not Intel files and they come from the kernel
> community. The base code there has the statement of
> SPDX-License-Identifier: (GPL-2.0 OR MIT). Here we just follow the
> required MIT license as an exception to DPDK.

Using MIT license in DPDK will require approval from TAB and from
the DPDK governing board. So it probably won't make this release.

^ permalink raw reply	[flat|nested] 192+ messages in thread

* Re: [PATCH v3 2/9] net/gve: add logs and OS specific implementation
  2022-09-23  9:38         ` [PATCH v3 2/9] net/gve: add logs and OS specific implementation Junfeng Guo
@ 2022-09-23 19:01           ` Stephen Hemminger
  2022-09-27  7:27             ` Guo, Junfeng
  0 siblings, 1 reply; 192+ messages in thread
From: Stephen Hemminger @ 2022-09-23 19:01 UTC (permalink / raw)
  To: Junfeng Guo
  Cc: qi.z.zhang, jingjing.wu, ferruh.yigit, dev, xiaoyun.li,
	awogbemila, bruce.richardson, xueqin.lin, Haiyue Wang

On Fri, 23 Sep 2022 17:38:22 +0800
Junfeng Guo <junfeng.guo@intel.com> wrote:

> +
> +#define PMD_DRV_LOG(level, fmt, args...) \
> +	rte_log(RTE_LOG_ ## level, gve_logtype_driver, "%s(): " fmt "\n", \
> +		__func__, ## args)

Many of your existing log messages already have newline, so using this
common definition will create double spaced log messages.

Please audit all usages and print one newline.
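
To illustrate the point (a sketch, not code from the patch): with the macro
above already appending "\n", any call site that embeds its own newline ends
up printing a blank line after the message.

	PMD_DRV_LOG(ERR, "Failed to map adminq page\n");  /* prints two newlines */
	PMD_DRV_LOG(ERR, "Failed to map adminq page");    /* prints exactly one */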

^ permalink raw reply	[flat|nested] 192+ messages in thread

* RE: [PATCH v3 1/9] net/gve: introduce GVE PMD base code
  2022-09-23 18:57           ` Stephen Hemminger
@ 2022-09-27  7:27             ` Guo, Junfeng
  0 siblings, 0 replies; 192+ messages in thread
From: Guo, Junfeng @ 2022-09-27  7:27 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Zhang, Qi Z, Wu, Jingjing, ferruh.yigit, dev, Li, Xiaoyun,
	awogbemila, Richardson, Bruce, Lin, Xueqin, Wang, Haiyue



> -----Original Message-----
> From: Stephen Hemminger <stephen@networkplumber.org>
> Sent: Saturday, September 24, 2022 02:58
> To: Guo, Junfeng <junfeng.guo@intel.com>
> Cc: Zhang, Qi Z <qi.z.zhang@intel.com>; Wu, Jingjing
> <jingjing.wu@intel.com>; ferruh.yigit@xilinx.com; dev@dpdk.org; Li,
> Xiaoyun <xiaoyun.li@intel.com>; awogbemila@google.com; Richardson,
> Bruce <bruce.richardson@intel.com>; Lin, Xueqin <xueqin.lin@intel.com>;
> Wang, Haiyue <haiyue.wang@intel.com>
> Subject: Re: [PATCH v3 1/9] net/gve: introduce GVE PMD base code
> 
> On Fri, 23 Sep 2022 17:38:21 +0800
> Junfeng Guo <junfeng.guo@intel.com> wrote:
> 
> > +#define GVE_DEVICE_OPTION_TOO_BIG_FMT "Length of %s option
> larger than expected. Possible older version of guest driver."
> 
> Why do you need #define for this?

This macro is used by the gve kernel driver and we just keep it here. Thanks!

> 
> +#define GVE_DEVICE_OPTION_ERROR_FMT "%s option error:\n" \
> +"Expected: length=%d, feature_mask=%x.\n" \
> +"Actual: length=%d, feature_mask=%x."
> +
> 
> Why such a wordy multi-line message, please use single line

This one is also from the gve kernel driver.
Will update it to a single-line message in the coming version.
Thanks!

^ permalink raw reply	[flat|nested] 192+ messages in thread

* RE: [PATCH v3 1/9] net/gve: introduce GVE PMD base code
  2022-09-23 18:58           ` Stephen Hemminger
@ 2022-09-27  7:27             ` Guo, Junfeng
  0 siblings, 0 replies; 192+ messages in thread
From: Guo, Junfeng @ 2022-09-27  7:27 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Zhang, Qi Z, Wu, Jingjing, ferruh.yigit, dev, Li, Xiaoyun,
	awogbemila, Richardson, Bruce, Lin, Xueqin, Wang, Haiyue



> -----Original Message-----
> From: Stephen Hemminger <stephen@networkplumber.org>
> Sent: Saturday, September 24, 2022 02:59
> To: Guo, Junfeng <junfeng.guo@intel.com>
> Cc: Zhang, Qi Z <qi.z.zhang@intel.com>; Wu, Jingjing
> <jingjing.wu@intel.com>; ferruh.yigit@xilinx.com; dev@dpdk.org; Li,
> Xiaoyun <xiaoyun.li@intel.com>; awogbemila@google.com; Richardson,
> Bruce <bruce.richardson@intel.com>; Lin, Xueqin <xueqin.lin@intel.com>;
> Wang, Haiyue <haiyue.wang@intel.com>
> Subject: Re: [PATCH v3 1/9] net/gve: introduce GVE PMD base code
> 
> On Fri, 23 Sep 2022 17:38:21 +0800
> Junfeng Guo <junfeng.guo@intel.com> wrote:
> 
> > Note that these code are not Intel files and they come from the kernel
> > community. The base code there has the statement of
> > SPDX-License-Identifier: (GPL-2.0 OR MIT). Here we just follow the
> > required MIT license as an exception to DPDK.
> 
> Using MIT license in DPDK will require approval from TAB and from
> the DPDK governing board. So it probably won't make this release.

Yes, the approval is under process now. Thanks!

^ permalink raw reply	[flat|nested] 192+ messages in thread

* RE: [PATCH v3 2/9] net/gve: add logs and OS specific implementation
  2022-09-23 19:01           ` Stephen Hemminger
@ 2022-09-27  7:27             ` Guo, Junfeng
  0 siblings, 0 replies; 192+ messages in thread
From: Guo, Junfeng @ 2022-09-27  7:27 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Zhang, Qi Z, Wu, Jingjing, ferruh.yigit, dev, Li, Xiaoyun,
	awogbemila, Richardson, Bruce, Lin, Xueqin, Wang, Haiyue



> -----Original Message-----
> From: Stephen Hemminger <stephen@networkplumber.org>
> Sent: Saturday, September 24, 2022 03:01
> To: Guo, Junfeng <junfeng.guo@intel.com>
> Cc: Zhang, Qi Z <qi.z.zhang@intel.com>; Wu, Jingjing
> <jingjing.wu@intel.com>; ferruh.yigit@xilinx.com; dev@dpdk.org; Li,
> Xiaoyun <xiaoyun.li@intel.com>; awogbemila@google.com; Richardson,
> Bruce <bruce.richardson@intel.com>; Lin, Xueqin <xueqin.lin@intel.com>;
> Wang, Haiyue <haiyue.wang@intel.com>
> Subject: Re: [PATCH v3 2/9] net/gve: add logs and OS specific
> implementation
> 
> On Fri, 23 Sep 2022 17:38:22 +0800
> Junfeng Guo <junfeng.guo@intel.com> wrote:
> 
> > +
> > +#define PMD_DRV_LOG(level, fmt, args...) \
> > +	rte_log(RTE_LOG_ ## level, gve_logtype_driver, "%s(): " fmt "\n",
> \
> > +		__func__, ## args)
> 
> Many of your existing log messages already have newline, so using this
> common definition will create double spaced log messages.
> 
> Please audit all usages and print one newline.

Sure, will double-check all the usages and update them in the coming version.
Thanks!

^ permalink raw reply	[flat|nested] 192+ messages in thread

* [PATCH v4 0/9] introduce GVE PMD
  2022-09-23  9:38         ` [PATCH v3 1/9] net/gve: introduce GVE PMD base code Junfeng Guo
  2022-09-23 18:57           ` Stephen Hemminger
  2022-09-23 18:58           ` Stephen Hemminger
@ 2022-09-27  7:32           ` Junfeng Guo
  2022-09-27  7:32             ` [PATCH v4 1/9] net/gve/base: introduce GVE PMD base code Junfeng Guo
                               ` (8 more replies)
  2 siblings, 9 replies; 192+ messages in thread
From: Junfeng Guo @ 2022-09-27  7:32 UTC (permalink / raw)
  To: qi.z.zhang, jingjing.wu
  Cc: ferruh.yigit, dev, xiaoyun.li, awogbemila, bruce.richardson,
	xueqin.lin, junfeng.guo

Introduce a new PMD for Google Virtual Ethernet (GVE).

This patch set requires an exception for MIT license for GVE base code.
And the base code includes the following files:
 - gve_adminq.c
 - gve_adminq.h
 - gve_desc.h
 - gve_desc_dqo.h
 - gve_register.h

It's based on GVE kernel driver v1.3.0 and the original code is in
https://github.com/GoogleCloudPlatform/compute-virtual-ethernet-linux/tree/v1.3.0

v2:
fix some CI check errors.

v3:
refactor some code and fix some build errors.

v4:
move the Google base code files into the DPDK base folder.

Junfeng Guo (9):
  net/gve/base: introduce GVE PMD base code
  net/gve/base: add logs and OS specific implementation
  net/gve: add support for device initialization
  net/gve: add support for link update
  net/gve: add support for MTU setting
  net/gve: add support for queue operations
  net/gve: add support for Rx/Tx
  net/gve: add support for dev info get and dev configure
  net/gve: add support for stats

 MAINTAINERS                            |   6 +
 doc/guides/nics/features/gve.ini       |  18 +
 doc/guides/nics/gve.rst                |  69 ++
 doc/guides/nics/index.rst              |   1 +
 doc/guides/rel_notes/release_22_11.rst |   5 +
 drivers/net/gve/base/gve.h             |  58 ++
 drivers/net/gve/base/gve_adminq.c      | 924 +++++++++++++++++++++++++
 drivers/net/gve/base/gve_adminq.h      | 383 ++++++++++
 drivers/net/gve/base/gve_desc.h        | 139 ++++
 drivers/net/gve/base/gve_desc_dqo.h    | 256 +++++++
 drivers/net/gve/base/gve_osdep.h       | 159 +++++
 drivers/net/gve/base/gve_register.h    |  30 +
 drivers/net/gve/gve_ethdev.c           | 775 +++++++++++++++++++++
 drivers/net/gve/gve_ethdev.h           | 300 ++++++++
 drivers/net/gve/gve_logs.h             |  14 +
 drivers/net/gve/gve_rx.c               | 366 ++++++++++
 drivers/net/gve/gve_tx.c               | 682 ++++++++++++++++++
 drivers/net/gve/meson.build            |  16 +
 drivers/net/gve/version.map            |   3 +
 drivers/net/meson.build                |   1 +
 20 files changed, 4205 insertions(+)
 create mode 100644 doc/guides/nics/features/gve.ini
 create mode 100644 doc/guides/nics/gve.rst
 create mode 100644 drivers/net/gve/base/gve.h
 create mode 100644 drivers/net/gve/base/gve_adminq.c
 create mode 100644 drivers/net/gve/base/gve_adminq.h
 create mode 100644 drivers/net/gve/base/gve_desc.h
 create mode 100644 drivers/net/gve/base/gve_desc_dqo.h
 create mode 100644 drivers/net/gve/base/gve_osdep.h
 create mode 100644 drivers/net/gve/base/gve_register.h
 create mode 100644 drivers/net/gve/gve_ethdev.c
 create mode 100644 drivers/net/gve/gve_ethdev.h
 create mode 100644 drivers/net/gve/gve_logs.h
 create mode 100644 drivers/net/gve/gve_rx.c
 create mode 100644 drivers/net/gve/gve_tx.c
 create mode 100644 drivers/net/gve/meson.build
 create mode 100644 drivers/net/gve/version.map

-- 
2.34.1


^ permalink raw reply	[flat|nested] 192+ messages in thread

* [PATCH v4 1/9] net/gve/base: introduce GVE PMD base code
  2022-09-27  7:32           ` [PATCH v4 0/9] introduce GVE PMD Junfeng Guo
@ 2022-09-27  7:32             ` Junfeng Guo
  2022-10-06 14:19               ` Ferruh Yigit
  2022-10-10 10:17               ` [PATCH v5 0/8] introduce GVE PMD Junfeng Guo
  2022-09-27  7:32             ` [PATCH v4 2/9] net/gve/base: add logs and OS specific implementation Junfeng Guo
                               ` (7 subsequent siblings)
  8 siblings, 2 replies; 192+ messages in thread
From: Junfeng Guo @ 2022-09-27  7:32 UTC (permalink / raw)
  To: qi.z.zhang, jingjing.wu
  Cc: ferruh.yigit, dev, xiaoyun.li, awogbemila, bruce.richardson,
	xueqin.lin, junfeng.guo, Haiyue Wang

The following base code is based on the Google Virtual Ethernet (gve)
driver v1.3.0 under the MIT license.
- gve_adminq.c
- gve_adminq.h
- gve_desc.h
- gve_desc_dqo.h
- gve_register.h
- gve.h

The original code is in:
https://github.com/GoogleCloudPlatform/compute-virtual-ethernet-linux/\
tree/v1.3.0/google/gve

Note that this code is not an Intel file set; it comes from the kernel
community. The base code there carries the statement
SPDX-License-Identifier: (GPL-2.0 OR MIT). Here we just follow the
required MIT license as an exception in DPDK.

Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com>
Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Junfeng Guo <junfeng.guo@intel.com>
---
 drivers/net/gve/base/gve.h          |  58 ++
 drivers/net/gve/base/gve_adminq.c   | 923 ++++++++++++++++++++++++++++
 drivers/net/gve/base/gve_adminq.h   | 381 ++++++++++++
 drivers/net/gve/base/gve_desc.h     | 137 +++++
 drivers/net/gve/base/gve_desc_dqo.h | 254 ++++++++
 drivers/net/gve/base/gve_register.h |  28 +
 6 files changed, 1781 insertions(+)
 create mode 100644 drivers/net/gve/base/gve.h
 create mode 100644 drivers/net/gve/base/gve_adminq.c
 create mode 100644 drivers/net/gve/base/gve_adminq.h
 create mode 100644 drivers/net/gve/base/gve_desc.h
 create mode 100644 drivers/net/gve/base/gve_desc_dqo.h
 create mode 100644 drivers/net/gve/base/gve_register.h

diff --git a/drivers/net/gve/base/gve.h b/drivers/net/gve/base/gve.h
new file mode 100644
index 0000000000..1b0d59b639
--- /dev/null
+++ b/drivers/net/gve/base/gve.h
@@ -0,0 +1,58 @@
+/* SPDX-License-Identifier: MIT
+ * Google Virtual Ethernet (gve) driver
+ * Version: 1.3.0
+ * Copyright (C) 2015-2022 Google, Inc.
+ * Copyright(C) 2022 Intel Corporation
+ */
+
+#ifndef _GVE_H_
+#define _GVE_H_
+
+#include "gve_desc.h"
+
+#define GVE_VERSION		"1.3.0"
+#define GVE_VERSION_PREFIX	"GVE-"
+
+#ifndef GOOGLE_VENDOR_ID
+#define GOOGLE_VENDOR_ID	0x1ae0
+#endif
+
+#define GVE_DEV_ID		0x0042
+
+#define GVE_REG_BAR		0
+#define GVE_DB_BAR		2
+
+/* 1 for management, 1 for rx, 1 for tx */
+#define GVE_MIN_MSIX		3
+
+/* PTYPEs are always 10 bits. */
+#define GVE_NUM_PTYPES		1024
+
+struct gve_irq_db {
+	rte_be32_t id;
+} ____cacheline_aligned;
+
+struct gve_ptype {
+	uint8_t l3_type;  /* `gve_l3_type` in gve_adminq.h */
+	uint8_t l4_type;  /* `gve_l4_type` in gve_adminq.h */
+};
+
+struct gve_ptype_lut {
+	struct gve_ptype ptypes[GVE_NUM_PTYPES];
+};
+
+enum gve_queue_format {
+	GVE_QUEUE_FORMAT_UNSPECIFIED = 0x0, /* default unspecified */
+	GVE_GQI_RDA_FORMAT	     = 0x1, /* GQI Raw Addressing */
+	GVE_GQI_QPL_FORMAT	     = 0x2, /* GQI Queue Page List */
+	GVE_DQO_RDA_FORMAT	     = 0x3, /* DQO Raw Addressing */
+};
+
+enum gve_state_flags_bit {
+	GVE_PRIV_FLAGS_ADMIN_QUEUE_OK		= 1,
+	GVE_PRIV_FLAGS_DEVICE_RESOURCES_OK	= 2,
+	GVE_PRIV_FLAGS_DEVICE_RINGS_OK		= 3,
+	GVE_PRIV_FLAGS_NAPI_ENABLED		= 4,
+};
+
+#endif /* _GVE_H_ */
diff --git a/drivers/net/gve/base/gve_adminq.c b/drivers/net/gve/base/gve_adminq.c
new file mode 100644
index 0000000000..95ec6b015c
--- /dev/null
+++ b/drivers/net/gve/base/gve_adminq.c
@@ -0,0 +1,923 @@
+/* SPDX-License-Identifier: MIT
+ * Google Virtual Ethernet (gve) driver
+ * Version: 1.3.0
+ * Copyright (C) 2015-2022 Google, Inc.
+ * Copyright(C) 2022 Intel Corporation
+ */
+
+#include "gve_adminq.h"
+#include "gve_register.h"
+
+#define GVE_MAX_ADMINQ_RELEASE_CHECK	500
+#define GVE_ADMINQ_SLEEP_LEN		20
+#define GVE_MAX_ADMINQ_EVENT_COUNTER_CHECK	100
+
+#define GVE_DEVICE_OPTION_ERROR_FMT "%s option error:\n Expected: length=%d, feature_mask=%x.\n Actual: length=%d, feature_mask=%x."
+
+#define GVE_DEVICE_OPTION_TOO_BIG_FMT "Length of %s option larger than expected. Possible older version of guest driver."
+
+static
+struct gve_device_option *gve_get_next_option(struct gve_device_descriptor *descriptor,
+					      struct gve_device_option *option)
+{
+	uintptr_t option_end, descriptor_end;
+
+	option_end = (uintptr_t)option + sizeof(*option) + be16_to_cpu(option->option_length);
+	descriptor_end = (uintptr_t)descriptor + be16_to_cpu(descriptor->total_length);
+
+	return option_end > descriptor_end ? NULL : (struct gve_device_option *)option_end;
+}
+
+static
+void gve_parse_device_option(struct gve_priv *priv,
+			     struct gve_device_option *option,
+			     struct gve_device_option_gqi_rda **dev_op_gqi_rda,
+			     struct gve_device_option_gqi_qpl **dev_op_gqi_qpl,
+			     struct gve_device_option_dqo_rda **dev_op_dqo_rda,
+			     struct gve_device_option_jumbo_frames **dev_op_jumbo_frames)
+{
+	u32 req_feat_mask = be32_to_cpu(option->required_features_mask);
+	u16 option_length = be16_to_cpu(option->option_length);
+	u16 option_id = be16_to_cpu(option->option_id);
+
+	/* If the length or feature mask doesn't match, continue without
+	 * enabling the feature.
+	 */
+	switch (option_id) {
+	case GVE_DEV_OPT_ID_GQI_RAW_ADDRESSING:
+		if (option_length != GVE_DEV_OPT_LEN_GQI_RAW_ADDRESSING ||
+		    req_feat_mask != GVE_DEV_OPT_REQ_FEAT_MASK_GQI_RAW_ADDRESSING) {
+			PMD_DRV_LOG(WARNING, GVE_DEVICE_OPTION_ERROR_FMT,
+				    "Raw Addressing",
+				    GVE_DEV_OPT_LEN_GQI_RAW_ADDRESSING,
+				    GVE_DEV_OPT_REQ_FEAT_MASK_GQI_RAW_ADDRESSING,
+				    option_length, req_feat_mask);
+			break;
+		}
+
+		PMD_DRV_LOG(INFO, "Gqi raw addressing device option enabled.");
+		priv->queue_format = GVE_GQI_RDA_FORMAT;
+		break;
+	case GVE_DEV_OPT_ID_GQI_RDA:
+		if (option_length < sizeof(**dev_op_gqi_rda) ||
+		    req_feat_mask != GVE_DEV_OPT_REQ_FEAT_MASK_GQI_RDA) {
+			PMD_DRV_LOG(WARNING, GVE_DEVICE_OPTION_ERROR_FMT,
+				    "GQI RDA", (int)sizeof(**dev_op_gqi_rda),
+				    GVE_DEV_OPT_REQ_FEAT_MASK_GQI_RDA,
+				    option_length, req_feat_mask);
+			break;
+		}
+
+		if (option_length > sizeof(**dev_op_gqi_rda)) {
+			PMD_DRV_LOG(WARNING,
+				    GVE_DEVICE_OPTION_TOO_BIG_FMT, "GQI RDA");
+		}
+		*dev_op_gqi_rda = RTE_PTR_ADD(option, sizeof(*option));
+		break;
+	case GVE_DEV_OPT_ID_GQI_QPL:
+		if (option_length < sizeof(**dev_op_gqi_qpl) ||
+		    req_feat_mask != GVE_DEV_OPT_REQ_FEAT_MASK_GQI_QPL) {
+			PMD_DRV_LOG(WARNING, GVE_DEVICE_OPTION_ERROR_FMT,
+				    "GQI QPL", (int)sizeof(**dev_op_gqi_qpl),
+				    GVE_DEV_OPT_REQ_FEAT_MASK_GQI_QPL,
+				    option_length, req_feat_mask);
+			break;
+		}
+
+		if (option_length > sizeof(**dev_op_gqi_qpl)) {
+			PMD_DRV_LOG(WARNING,
+				    GVE_DEVICE_OPTION_TOO_BIG_FMT, "GQI QPL");
+		}
+		*dev_op_gqi_qpl = RTE_PTR_ADD(option, sizeof(*option));
+		break;
+	case GVE_DEV_OPT_ID_DQO_RDA:
+		if (option_length < sizeof(**dev_op_dqo_rda) ||
+		    req_feat_mask != GVE_DEV_OPT_REQ_FEAT_MASK_DQO_RDA) {
+			PMD_DRV_LOG(WARNING, GVE_DEVICE_OPTION_ERROR_FMT,
+				    "DQO RDA", (int)sizeof(**dev_op_dqo_rda),
+				    GVE_DEV_OPT_REQ_FEAT_MASK_DQO_RDA,
+				    option_length, req_feat_mask);
+			break;
+		}
+
+		if (option_length > sizeof(**dev_op_dqo_rda)) {
+			PMD_DRV_LOG(WARNING,
+				    GVE_DEVICE_OPTION_TOO_BIG_FMT, "DQO RDA");
+		}
+		*dev_op_dqo_rda = RTE_PTR_ADD(option, sizeof(*option));
+		break;
+	case GVE_DEV_OPT_ID_JUMBO_FRAMES:
+		if (option_length < sizeof(**dev_op_jumbo_frames) ||
+		    req_feat_mask != GVE_DEV_OPT_REQ_FEAT_MASK_JUMBO_FRAMES) {
+			PMD_DRV_LOG(WARNING, GVE_DEVICE_OPTION_ERROR_FMT,
+				    "Jumbo Frames",
+				    (int)sizeof(**dev_op_jumbo_frames),
+				    GVE_DEV_OPT_REQ_FEAT_MASK_JUMBO_FRAMES,
+				    option_length, req_feat_mask);
+			break;
+		}
+
+		if (option_length > sizeof(**dev_op_jumbo_frames)) {
+			PMD_DRV_LOG(WARNING,
+				    GVE_DEVICE_OPTION_TOO_BIG_FMT,
+				    "Jumbo Frames");
+		}
+		*dev_op_jumbo_frames = RTE_PTR_ADD(option, sizeof(*option));
+		break;
+	default:
+		/* If we don't recognize the option just continue
+		 * without doing anything.
+		 */
+		PMD_DRV_LOG(DEBUG, "Unrecognized device option 0x%hx not enabled.",
+			    option_id);
+	}
+}
+
+/* Process all device options for a given describe device call. */
+static int
+gve_process_device_options(struct gve_priv *priv,
+			   struct gve_device_descriptor *descriptor,
+			   struct gve_device_option_gqi_rda **dev_op_gqi_rda,
+			   struct gve_device_option_gqi_qpl **dev_op_gqi_qpl,
+			   struct gve_device_option_dqo_rda **dev_op_dqo_rda,
+			   struct gve_device_option_jumbo_frames **dev_op_jumbo_frames)
+{
+	const int num_options = be16_to_cpu(descriptor->num_device_options);
+	struct gve_device_option *dev_opt;
+	int i;
+
+	/* The options struct directly follows the device descriptor. */
+	dev_opt = RTE_PTR_ADD(descriptor, sizeof(*descriptor));
+	for (i = 0; i < num_options; i++) {
+		struct gve_device_option *next_opt;
+
+		next_opt = gve_get_next_option(descriptor, dev_opt);
+		if (!next_opt) {
+			PMD_DRV_LOG(ERR,
+				    "options exceed device_descriptor's total length.");
+			return -EINVAL;
+		}
+
+		gve_parse_device_option(priv, dev_opt,
+					dev_op_gqi_rda, dev_op_gqi_qpl,
+					dev_op_dqo_rda, dev_op_jumbo_frames);
+		dev_opt = next_opt;
+	}
+
+	return 0;
+}
+
+int gve_adminq_alloc(struct gve_priv *priv)
+{
+	priv->adminq = gve_alloc_dma_mem(&priv->adminq_dma_mem, PAGE_SIZE);
+	if (unlikely(!priv->adminq))
+		return -ENOMEM;
+
+	priv->adminq_mask = (PAGE_SIZE / sizeof(union gve_adminq_command)) - 1;
+	priv->adminq_prod_cnt = 0;
+	priv->adminq_cmd_fail = 0;
+	priv->adminq_timeouts = 0;
+	priv->adminq_describe_device_cnt = 0;
+	priv->adminq_cfg_device_resources_cnt = 0;
+	priv->adminq_register_page_list_cnt = 0;
+	priv->adminq_unregister_page_list_cnt = 0;
+	priv->adminq_create_tx_queue_cnt = 0;
+	priv->adminq_create_rx_queue_cnt = 0;
+	priv->adminq_destroy_tx_queue_cnt = 0;
+	priv->adminq_destroy_rx_queue_cnt = 0;
+	priv->adminq_dcfg_device_resources_cnt = 0;
+	priv->adminq_set_driver_parameter_cnt = 0;
+	priv->adminq_report_stats_cnt = 0;
+	priv->adminq_report_link_speed_cnt = 0;
+	priv->adminq_get_ptype_map_cnt = 0;
+
+	/* Setup Admin queue with the device */
+	iowrite32be(priv->adminq_dma_mem.pa / PAGE_SIZE,
+		    &priv->reg_bar0->adminq_pfn);
+
+	gve_set_admin_queue_ok(priv);
+	return 0;
+}
+
+void gve_adminq_release(struct gve_priv *priv)
+{
+	int i = 0;
+
+	/* Tell the device the adminq is leaving */
+	iowrite32be(0x0, &priv->reg_bar0->adminq_pfn);
+	while (ioread32be(&priv->reg_bar0->adminq_pfn)) {
+		/* If this is reached the device is unrecoverable and still
+		 * holding memory. Continue looping to avoid memory corruption,
+		 * but WARN so it is visible what is going on.
+		 */
+		if (i == GVE_MAX_ADMINQ_RELEASE_CHECK)
+			PMD_DRV_LOG(WARNING, "Unrecoverable platform error!");
+		i++;
+		msleep(GVE_ADMINQ_SLEEP_LEN);
+	}
+	gve_clear_device_rings_ok(priv);
+	gve_clear_device_resources_ok(priv);
+	gve_clear_admin_queue_ok(priv);
+}
+
+void gve_adminq_free(struct gve_priv *priv)
+{
+	if (!gve_get_admin_queue_ok(priv))
+		return;
+	gve_adminq_release(priv);
+	gve_free_dma_mem(&priv->adminq_dma_mem);
+	gve_clear_admin_queue_ok(priv);
+}
+
+static void gve_adminq_kick_cmd(struct gve_priv *priv, u32 prod_cnt)
+{
+	iowrite32be(prod_cnt, &priv->reg_bar0->adminq_doorbell);
+}
+
+static bool gve_adminq_wait_for_cmd(struct gve_priv *priv, u32 prod_cnt)
+{
+	int i;
+
+	for (i = 0; i < GVE_MAX_ADMINQ_EVENT_COUNTER_CHECK; i++) {
+		if (ioread32be(&priv->reg_bar0->adminq_event_counter)
+		    == prod_cnt)
+			return true;
+		msleep(GVE_ADMINQ_SLEEP_LEN);
+	}
+
+	return false;
+}
+
+static int gve_adminq_parse_err(struct gve_priv *priv, u32 status)
+{
+	if (status != GVE_ADMINQ_COMMAND_PASSED &&
+	    status != GVE_ADMINQ_COMMAND_UNSET) {
+		PMD_DRV_LOG(ERR, "AQ command failed with status %d", status);
+		priv->adminq_cmd_fail++;
+	}
+	switch (status) {
+	case GVE_ADMINQ_COMMAND_PASSED:
+		return 0;
+	case GVE_ADMINQ_COMMAND_UNSET:
+		PMD_DRV_LOG(ERR, "parse_aq_err: err and status both unset, this should not be possible.");
+		return -EINVAL;
+	case GVE_ADMINQ_COMMAND_ERROR_ABORTED:
+	case GVE_ADMINQ_COMMAND_ERROR_CANCELLED:
+	case GVE_ADMINQ_COMMAND_ERROR_DATALOSS:
+	case GVE_ADMINQ_COMMAND_ERROR_FAILED_PRECONDITION:
+	case GVE_ADMINQ_COMMAND_ERROR_UNAVAILABLE:
+		return -EAGAIN;
+	case GVE_ADMINQ_COMMAND_ERROR_ALREADY_EXISTS:
+	case GVE_ADMINQ_COMMAND_ERROR_INTERNAL_ERROR:
+	case GVE_ADMINQ_COMMAND_ERROR_INVALID_ARGUMENT:
+	case GVE_ADMINQ_COMMAND_ERROR_NOT_FOUND:
+	case GVE_ADMINQ_COMMAND_ERROR_OUT_OF_RANGE:
+	case GVE_ADMINQ_COMMAND_ERROR_UNKNOWN_ERROR:
+		return -EINVAL;
+	case GVE_ADMINQ_COMMAND_ERROR_DEADLINE_EXCEEDED:
+		return -ETIME;
+	case GVE_ADMINQ_COMMAND_ERROR_PERMISSION_DENIED:
+	case GVE_ADMINQ_COMMAND_ERROR_UNAUTHENTICATED:
+		return -EACCES;
+	case GVE_ADMINQ_COMMAND_ERROR_RESOURCE_EXHAUSTED:
+		return -ENOMEM;
+	case GVE_ADMINQ_COMMAND_ERROR_UNIMPLEMENTED:
+		return -ENOTSUP;
+	default:
+		PMD_DRV_LOG(ERR, "parse_aq_err: unknown status code %d",
+			    status);
+		return -EINVAL;
+	}
+}
+
+/* Flushes all AQ commands currently queued and waits for them to complete.
+ * If there are failures, it will return the first error.
+ */
+static int gve_adminq_kick_and_wait(struct gve_priv *priv)
+{
+	u32 tail, head;
+	u32 i;
+
+	tail = ioread32be(&priv->reg_bar0->adminq_event_counter);
+	head = priv->adminq_prod_cnt;
+
+	gve_adminq_kick_cmd(priv, head);
+	if (!gve_adminq_wait_for_cmd(priv, head)) {
+		PMD_DRV_LOG(ERR, "AQ commands timed out, need to reset AQ");
+		priv->adminq_timeouts++;
+		return -ENOTRECOVERABLE;
+	}
+
+	for (i = tail; i < head; i++) {
+		union gve_adminq_command *cmd;
+		u32 status, err;
+
+		cmd = &priv->adminq[i & priv->adminq_mask];
+		status = be32_to_cpu(READ_ONCE32(cmd->status));
+		err = gve_adminq_parse_err(priv, status);
+		if (err)
+			/* Return the first error if we failed. */
+			return err;
+	}
+
+	return 0;
+}
+
+/* This function is not threadsafe - the caller is responsible for any
+ * necessary locks.
+ */
+static int gve_adminq_issue_cmd(struct gve_priv *priv,
+				union gve_adminq_command *cmd_orig)
+{
+	union gve_adminq_command *cmd;
+	u32 opcode;
+	u32 tail;
+
+	tail = ioread32be(&priv->reg_bar0->adminq_event_counter);
+
+	/* Check if next command will overflow the buffer. */
+	if (((priv->adminq_prod_cnt + 1) & priv->adminq_mask) ==
+	    (tail & priv->adminq_mask)) {
+		int err;
+
+		/* Flush existing commands to make room. */
+		err = gve_adminq_kick_and_wait(priv);
+		if (err)
+			return err;
+
+		/* Retry. */
+		tail = ioread32be(&priv->reg_bar0->adminq_event_counter);
+		if (((priv->adminq_prod_cnt + 1) & priv->adminq_mask) ==
+		    (tail & priv->adminq_mask)) {
+			/* This should never happen. We just flushed the
+			 * command queue so there should be enough space.
+			 */
+			return -ENOMEM;
+		}
+	}
+
+	cmd = &priv->adminq[priv->adminq_prod_cnt & priv->adminq_mask];
+	priv->adminq_prod_cnt++;
+
+	memcpy(cmd, cmd_orig, sizeof(*cmd_orig));
+	opcode = be32_to_cpu(READ_ONCE32(cmd->opcode));
+
+	switch (opcode) {
+	case GVE_ADMINQ_DESCRIBE_DEVICE:
+		priv->adminq_describe_device_cnt++;
+		break;
+	case GVE_ADMINQ_CONFIGURE_DEVICE_RESOURCES:
+		priv->adminq_cfg_device_resources_cnt++;
+		break;
+	case GVE_ADMINQ_REGISTER_PAGE_LIST:
+		priv->adminq_register_page_list_cnt++;
+		break;
+	case GVE_ADMINQ_UNREGISTER_PAGE_LIST:
+		priv->adminq_unregister_page_list_cnt++;
+		break;
+	case GVE_ADMINQ_CREATE_TX_QUEUE:
+		priv->adminq_create_tx_queue_cnt++;
+		break;
+	case GVE_ADMINQ_CREATE_RX_QUEUE:
+		priv->adminq_create_rx_queue_cnt++;
+		break;
+	case GVE_ADMINQ_DESTROY_TX_QUEUE:
+		priv->adminq_destroy_tx_queue_cnt++;
+		break;
+	case GVE_ADMINQ_DESTROY_RX_QUEUE:
+		priv->adminq_destroy_rx_queue_cnt++;
+		break;
+	case GVE_ADMINQ_DECONFIGURE_DEVICE_RESOURCES:
+		priv->adminq_dcfg_device_resources_cnt++;
+		break;
+	case GVE_ADMINQ_SET_DRIVER_PARAMETER:
+		priv->adminq_set_driver_parameter_cnt++;
+		break;
+	case GVE_ADMINQ_REPORT_STATS:
+		priv->adminq_report_stats_cnt++;
+		break;
+	case GVE_ADMINQ_REPORT_LINK_SPEED:
+		priv->adminq_report_link_speed_cnt++;
+		break;
+	case GVE_ADMINQ_GET_PTYPE_MAP:
+		priv->adminq_get_ptype_map_cnt++;
+		break;
+	default:
+		PMD_DRV_LOG(ERR, "unknown AQ command opcode %d", opcode);
+	}
+
+	return 0;
+}
+
+/* This function is not threadsafe - the caller is responsible for any
+ * necessary locks.
+ * The caller is also responsible for making sure there are no commands
+ * waiting to be executed.
+ */
+static int gve_adminq_execute_cmd(struct gve_priv *priv,
+				  union gve_adminq_command *cmd_orig)
+{
+	u32 tail, head;
+	int err;
+
+	tail = ioread32be(&priv->reg_bar0->adminq_event_counter);
+	head = priv->adminq_prod_cnt;
+	if (tail != head)
+		/* This is not a valid path */
+		return -EINVAL;
+
+	err = gve_adminq_issue_cmd(priv, cmd_orig);
+	if (err)
+		return err;
+
+	return gve_adminq_kick_and_wait(priv);
+}
+
+/* The device specifies that the management vector can either be the first irq
+ * or the last irq. ntfy_blk_msix_base_idx indicates the first irq assigned to
+ * the ntfy blks. It if is 0 then the management vector is last, if it is 1 then
+ * the management vector is first.
+ *
+ * gve arranges the msix vectors so that the management vector is last.
+ */
+#define GVE_NTFY_BLK_BASE_MSIX_IDX	0
+int gve_adminq_configure_device_resources(struct gve_priv *priv,
+					  dma_addr_t counter_array_bus_addr,
+					  u32 num_counters,
+					  dma_addr_t db_array_bus_addr,
+					  u32 num_ntfy_blks)
+{
+	union gve_adminq_command cmd;
+
+	memset(&cmd, 0, sizeof(cmd));
+	cmd.opcode = cpu_to_be32(GVE_ADMINQ_CONFIGURE_DEVICE_RESOURCES);
+	cmd.configure_device_resources =
+		(struct gve_adminq_configure_device_resources) {
+		.counter_array = cpu_to_be64(counter_array_bus_addr),
+		.num_counters = cpu_to_be32(num_counters),
+		.irq_db_addr = cpu_to_be64(db_array_bus_addr),
+		.num_irq_dbs = cpu_to_be32(num_ntfy_blks),
+		.irq_db_stride = cpu_to_be32(sizeof(*priv->irq_dbs)),
+		.ntfy_blk_msix_base_idx =
+					cpu_to_be32(GVE_NTFY_BLK_BASE_MSIX_IDX),
+		.queue_format = priv->queue_format,
+	};
+
+	return gve_adminq_execute_cmd(priv, &cmd);
+}
+
+int gve_adminq_deconfigure_device_resources(struct gve_priv *priv)
+{
+	union gve_adminq_command cmd;
+
+	memset(&cmd, 0, sizeof(cmd));
+	cmd.opcode = cpu_to_be32(GVE_ADMINQ_DECONFIGURE_DEVICE_RESOURCES);
+
+	return gve_adminq_execute_cmd(priv, &cmd);
+}
+
+static int gve_adminq_create_tx_queue(struct gve_priv *priv, u32 queue_index)
+{
+	struct gve_tx_queue *txq = priv->txqs[queue_index];
+	union gve_adminq_command cmd;
+
+	memset(&cmd, 0, sizeof(cmd));
+	cmd.opcode = cpu_to_be32(GVE_ADMINQ_CREATE_TX_QUEUE);
+	cmd.create_tx_queue = (struct gve_adminq_create_tx_queue) {
+		.queue_id = cpu_to_be32(queue_index),
+		.queue_resources_addr =
+			cpu_to_be64(txq->qres_mz->iova),
+		.tx_ring_addr = cpu_to_be64(txq->tx_ring_phys_addr),
+		.ntfy_id = cpu_to_be32(txq->ntfy_id),
+	};
+
+	if (gve_is_gqi(priv)) {
+		u32 qpl_id = priv->queue_format == GVE_GQI_RDA_FORMAT ?
+			GVE_RAW_ADDRESSING_QPL_ID : txq->qpl->id;
+
+		cmd.create_tx_queue.queue_page_list_id = cpu_to_be32(qpl_id);
+	} else {
+		cmd.create_tx_queue.tx_ring_size =
+			cpu_to_be16(txq->nb_tx_desc);
+		cmd.create_tx_queue.tx_comp_ring_addr =
+			cpu_to_be64(txq->complq->tx_ring_phys_addr);
+		cmd.create_tx_queue.tx_comp_ring_size =
+			cpu_to_be16(priv->tx_compq_size);
+	}
+
+	return gve_adminq_issue_cmd(priv, &cmd);
+}
+
+int gve_adminq_create_tx_queues(struct gve_priv *priv, u32 num_queues)
+{
+	int err;
+	u32 i;
+
+	for (i = 0; i < num_queues; i++) {
+		err = gve_adminq_create_tx_queue(priv, i);
+		if (err)
+			return err;
+	}
+
+	return gve_adminq_kick_and_wait(priv);
+}
+
+static int gve_adminq_create_rx_queue(struct gve_priv *priv, u32 queue_index)
+{
+	struct gve_rx_queue *rxq = priv->rxqs[queue_index];
+	union gve_adminq_command cmd;
+
+	memset(&cmd, 0, sizeof(cmd));
+	cmd.opcode = cpu_to_be32(GVE_ADMINQ_CREATE_RX_QUEUE);
+	cmd.create_rx_queue = (struct gve_adminq_create_rx_queue) {
+		.queue_id = cpu_to_be32(queue_index),
+		.ntfy_id = cpu_to_be32(rxq->ntfy_id),
+		.queue_resources_addr = cpu_to_be64(rxq->qres_mz->iova),
+	};
+
+	if (gve_is_gqi(priv)) {
+		u32 qpl_id = priv->queue_format == GVE_GQI_RDA_FORMAT ?
+			GVE_RAW_ADDRESSING_QPL_ID : rxq->qpl->id;
+
+		cmd.create_rx_queue.rx_desc_ring_addr =
+			cpu_to_be64(rxq->mz->iova),
+		cmd.create_rx_queue.rx_data_ring_addr =
+			cpu_to_be64(rxq->data_mz->iova),
+		cmd.create_rx_queue.index = cpu_to_be32(queue_index);
+		cmd.create_rx_queue.queue_page_list_id = cpu_to_be32(qpl_id);
+		cmd.create_rx_queue.packet_buffer_size = cpu_to_be16(rxq->rx_buf_len);
+	} else {
+		cmd.create_rx_queue.rx_ring_size =
+			cpu_to_be16(priv->rx_desc_cnt);
+		cmd.create_rx_queue.rx_desc_ring_addr =
+			cpu_to_be64(rxq->rx_ring_phys_addr);
+		cmd.create_rx_queue.rx_data_ring_addr =
+			cpu_to_be64(rxq->bufq->rx_ring_phys_addr);
+		cmd.create_rx_queue.packet_buffer_size =
+			cpu_to_be16(rxq->rx_buf_len);
+		cmd.create_rx_queue.rx_buff_ring_size =
+			cpu_to_be16(priv->rx_bufq_size);
+		cmd.create_rx_queue.enable_rsc = !!(priv->enable_rsc);
+	}
+
+	return gve_adminq_issue_cmd(priv, &cmd);
+}
+
+int gve_adminq_create_rx_queues(struct gve_priv *priv, u32 num_queues)
+{
+	int err;
+	u32 i;
+
+	for (i = 0; i < num_queues; i++) {
+		err = gve_adminq_create_rx_queue(priv, i);
+		if (err)
+			return err;
+	}
+
+	return gve_adminq_kick_and_wait(priv);
+}
+
+static int gve_adminq_destroy_tx_queue(struct gve_priv *priv, u32 queue_index)
+{
+	union gve_adminq_command cmd;
+	int err;
+
+	memset(&cmd, 0, sizeof(cmd));
+	cmd.opcode = cpu_to_be32(GVE_ADMINQ_DESTROY_TX_QUEUE);
+	cmd.destroy_tx_queue = (struct gve_adminq_destroy_tx_queue) {
+		.queue_id = cpu_to_be32(queue_index),
+	};
+
+	err = gve_adminq_issue_cmd(priv, &cmd);
+	if (err)
+		return err;
+
+	return 0;
+}
+
+int gve_adminq_destroy_tx_queues(struct gve_priv *priv, u32 num_queues)
+{
+	int err;
+	u32 i;
+
+	for (i = 0; i < num_queues; i++) {
+		err = gve_adminq_destroy_tx_queue(priv, i);
+		if (err)
+			return err;
+	}
+
+	return gve_adminq_kick_and_wait(priv);
+}
+
+static int gve_adminq_destroy_rx_queue(struct gve_priv *priv, u32 queue_index)
+{
+	union gve_adminq_command cmd;
+	int err;
+
+	memset(&cmd, 0, sizeof(cmd));
+	cmd.opcode = cpu_to_be32(GVE_ADMINQ_DESTROY_RX_QUEUE);
+	cmd.destroy_rx_queue = (struct gve_adminq_destroy_rx_queue) {
+		.queue_id = cpu_to_be32(queue_index),
+	};
+
+	err = gve_adminq_issue_cmd(priv, &cmd);
+	if (err)
+		return err;
+
+	return 0;
+}
+
+int gve_adminq_destroy_rx_queues(struct gve_priv *priv, u32 num_queues)
+{
+	int err;
+	u32 i;
+
+	for (i = 0; i < num_queues; i++) {
+		err = gve_adminq_destroy_rx_queue(priv, i);
+		if (err)
+			return err;
+	}
+
+	return gve_adminq_kick_and_wait(priv);
+}
+
+static int gve_set_desc_cnt(struct gve_priv *priv,
+			    struct gve_device_descriptor *descriptor)
+{
+	priv->tx_desc_cnt = be16_to_cpu(descriptor->tx_queue_entries);
+	if (priv->tx_desc_cnt * sizeof(priv->txqs[0]->tx_desc_ring[0])
+	    < PAGE_SIZE) {
+		PMD_DRV_LOG(ERR, "Tx desc count %d too low", priv->tx_desc_cnt);
+		return -EINVAL;
+	}
+	priv->rx_desc_cnt = be16_to_cpu(descriptor->rx_queue_entries);
+	if (priv->rx_desc_cnt * sizeof(priv->rxqs[0]->rx_desc_ring[0])
+	    < PAGE_SIZE) {
+		PMD_DRV_LOG(ERR, "Rx desc count %d too low", priv->rx_desc_cnt);
+		return -EINVAL;
+	}
+	return 0;
+}
+
+static int
+gve_set_desc_cnt_dqo(struct gve_priv *priv,
+		     const struct gve_device_descriptor *descriptor,
+		     const struct gve_device_option_dqo_rda *dev_op_dqo_rda)
+{
+	priv->tx_desc_cnt = be16_to_cpu(descriptor->tx_queue_entries);
+	priv->tx_compq_size = be16_to_cpu(dev_op_dqo_rda->tx_comp_ring_entries);
+	priv->rx_desc_cnt = be16_to_cpu(descriptor->rx_queue_entries);
+	priv->rx_bufq_size = be16_to_cpu(dev_op_dqo_rda->rx_buff_ring_entries);
+
+	return 0;
+}
+
+static void gve_enable_supported_features(struct gve_priv *priv,
+					  u32 supported_features_mask,
+					  const struct gve_device_option_jumbo_frames
+						  *dev_op_jumbo_frames)
+{
+	/* Before control reaches this point, the page-size-capped max MTU from
+	 * the gve_device_descriptor field has already been stored in
+	 * priv->dev->max_mtu. We overwrite it with the true max MTU below.
+	 */
+	if (dev_op_jumbo_frames &&
+	    (supported_features_mask & GVE_SUP_JUMBO_FRAMES_MASK)) {
+		PMD_DRV_LOG(INFO, "JUMBO FRAMES device option enabled.");
+		priv->max_mtu = be16_to_cpu(dev_op_jumbo_frames->max_mtu);
+	}
+}
+
+int gve_adminq_describe_device(struct gve_priv *priv)
+{
+	struct gve_device_option_jumbo_frames *dev_op_jumbo_frames = NULL;
+	struct gve_device_option_gqi_rda *dev_op_gqi_rda = NULL;
+	struct gve_device_option_gqi_qpl *dev_op_gqi_qpl = NULL;
+	struct gve_device_option_dqo_rda *dev_op_dqo_rda = NULL;
+	struct gve_device_descriptor *descriptor;
+	struct gve_dma_mem descriptor_dma_mem;
+	u32 supported_features_mask = 0;
+	union gve_adminq_command cmd;
+	int err = 0;
+	u8 *mac;
+	u16 mtu;
+
+	memset(&cmd, 0, sizeof(cmd));
+	descriptor = gve_alloc_dma_mem(&descriptor_dma_mem, PAGE_SIZE);
+	if (!descriptor)
+		return -ENOMEM;
+	cmd.opcode = cpu_to_be32(GVE_ADMINQ_DESCRIBE_DEVICE);
+	cmd.describe_device.device_descriptor_addr =
+					cpu_to_be64(descriptor_dma_mem.pa);
+	cmd.describe_device.device_descriptor_version =
+			cpu_to_be32(GVE_ADMINQ_DEVICE_DESCRIPTOR_VERSION);
+	cmd.describe_device.available_length = cpu_to_be32(PAGE_SIZE);
+
+	err = gve_adminq_execute_cmd(priv, &cmd);
+	if (err)
+		goto free_device_descriptor;
+
+	err = gve_process_device_options(priv, descriptor, &dev_op_gqi_rda,
+					 &dev_op_gqi_qpl, &dev_op_dqo_rda,
+					 &dev_op_jumbo_frames);
+	if (err)
+		goto free_device_descriptor;
+
+	/* If the GQI_RAW_ADDRESSING option is not enabled and the queue format
+	 * is not set to GqiRda, choose the queue format in a priority order:
+	 * DqoRda, GqiRda, GqiQpl. Use GqiQpl as default.
+	 */
+	if (dev_op_dqo_rda) {
+		priv->queue_format = GVE_DQO_RDA_FORMAT;
+		PMD_DRV_LOG(INFO, "Driver is running with DQO RDA queue format.");
+		supported_features_mask =
+			be32_to_cpu(dev_op_dqo_rda->supported_features_mask);
+	} else if (dev_op_gqi_rda) {
+		priv->queue_format = GVE_GQI_RDA_FORMAT;
+		PMD_DRV_LOG(INFO, "Driver is running with GQI RDA queue format.");
+		supported_features_mask =
+			be32_to_cpu(dev_op_gqi_rda->supported_features_mask);
+	} else if (priv->queue_format == GVE_GQI_RDA_FORMAT) {
+		PMD_DRV_LOG(INFO, "Driver is running with GQI RDA queue format.");
+	} else {
+		priv->queue_format = GVE_GQI_QPL_FORMAT;
+		if (dev_op_gqi_qpl)
+			supported_features_mask =
+				be32_to_cpu(dev_op_gqi_qpl->supported_features_mask);
+		PMD_DRV_LOG(INFO, "Driver is running with GQI QPL queue format.");
+	}
+	if (gve_is_gqi(priv)) {
+		err = gve_set_desc_cnt(priv, descriptor);
+	} else {
+		/* DQO supports LRO. */
+		err = gve_set_desc_cnt_dqo(priv, descriptor, dev_op_dqo_rda);
+	}
+	if (err)
+		goto free_device_descriptor;
+
+	priv->max_registered_pages =
+				be64_to_cpu(descriptor->max_registered_pages);
+	mtu = be16_to_cpu(descriptor->mtu);
+	if (mtu < ETH_MIN_MTU) {
+		PMD_DRV_LOG(ERR, "MTU %d below minimum MTU", mtu);
+		err = -EINVAL;
+		goto free_device_descriptor;
+	}
+	priv->max_mtu = mtu;
+	priv->num_event_counters = be16_to_cpu(descriptor->counters);
+	rte_memcpy(priv->dev_addr.addr_bytes, descriptor->mac, ETH_ALEN);
+	mac = descriptor->mac;
+	PMD_DRV_LOG(INFO, "MAC addr: %02x:%02x:%02x:%02x:%02x:%02x",
+		    mac[0], mac[1], mac[2], mac[3], mac[4], mac[5]);
+	priv->tx_pages_per_qpl = be16_to_cpu(descriptor->tx_pages_per_qpl);
+	priv->rx_data_slot_cnt = be16_to_cpu(descriptor->rx_pages_per_qpl);
+
+	if (gve_is_gqi(priv) && priv->rx_data_slot_cnt < priv->rx_desc_cnt) {
+		PMD_DRV_LOG(ERR, "rx_data_slot_cnt cannot be smaller than rx_desc_cnt, setting rx_desc_cnt down to %d",
+			    priv->rx_data_slot_cnt);
+		priv->rx_desc_cnt = priv->rx_data_slot_cnt;
+	}
+	priv->default_num_queues = be16_to_cpu(descriptor->default_num_queues);
+
+	gve_enable_supported_features(priv, supported_features_mask,
+				      dev_op_jumbo_frames);
+
+free_device_descriptor:
+	gve_free_dma_mem(&descriptor_dma_mem);
+	return err;
+}
+
+int gve_adminq_register_page_list(struct gve_priv *priv,
+				  struct gve_queue_page_list *qpl)
+{
+	struct gve_dma_mem page_list_dma_mem;
+	u32 num_entries = qpl->num_entries;
+	u32 size = num_entries * sizeof(qpl->page_buses[0]);
+	union gve_adminq_command cmd;
+	__be64 *page_list;
+	int err;
+	u32 i;
+
+	memset(&cmd, 0, sizeof(cmd));
+	page_list = gve_alloc_dma_mem(&page_list_dma_mem, size);
+	if (!page_list)
+		return -ENOMEM;
+
+	for (i = 0; i < num_entries; i++)
+		page_list[i] = cpu_to_be64(qpl->page_buses[i]);
+
+	cmd.opcode = cpu_to_be32(GVE_ADMINQ_REGISTER_PAGE_LIST);
+	cmd.reg_page_list = (struct gve_adminq_register_page_list) {
+		.page_list_id = cpu_to_be32(qpl->id),
+		.num_pages = cpu_to_be32(num_entries),
+		.page_address_list_addr = cpu_to_be64(page_list_dma_mem.pa),
+	};
+
+	err = gve_adminq_execute_cmd(priv, &cmd);
+	gve_free_dma_mem(&page_list_dma_mem);
+	return err;
+}
+
+int gve_adminq_unregister_page_list(struct gve_priv *priv, u32 page_list_id)
+{
+	union gve_adminq_command cmd;
+
+	memset(&cmd, 0, sizeof(cmd));
+	cmd.opcode = cpu_to_be32(GVE_ADMINQ_UNREGISTER_PAGE_LIST);
+	cmd.unreg_page_list = (struct gve_adminq_unregister_page_list) {
+		.page_list_id = cpu_to_be32(page_list_id),
+	};
+
+	return gve_adminq_execute_cmd(priv, &cmd);
+}
+
+int gve_adminq_set_mtu(struct gve_priv *priv, u64 mtu)
+{
+	union gve_adminq_command cmd;
+
+	memset(&cmd, 0, sizeof(cmd));
+	cmd.opcode = cpu_to_be32(GVE_ADMINQ_SET_DRIVER_PARAMETER);
+	cmd.set_driver_param = (struct gve_adminq_set_driver_parameter) {
+		.parameter_type = cpu_to_be32(GVE_SET_PARAM_MTU),
+		.parameter_value = cpu_to_be64(mtu),
+	};
+
+	return gve_adminq_execute_cmd(priv, &cmd);
+}
+
+int gve_adminq_report_stats(struct gve_priv *priv, u64 stats_report_len,
+			    dma_addr_t stats_report_addr, u64 interval)
+{
+	union gve_adminq_command cmd;
+
+	memset(&cmd, 0, sizeof(cmd));
+	cmd.opcode = cpu_to_be32(GVE_ADMINQ_REPORT_STATS);
+	cmd.report_stats = (struct gve_adminq_report_stats) {
+		.stats_report_len = cpu_to_be64(stats_report_len),
+		.stats_report_addr = cpu_to_be64(stats_report_addr),
+		.interval = cpu_to_be64(interval),
+	};
+
+	return gve_adminq_execute_cmd(priv, &cmd);
+}
+
+int gve_adminq_report_link_speed(struct gve_priv *priv)
+{
+	struct gve_dma_mem link_speed_region_dma_mem;
+	union gve_adminq_command gvnic_cmd;
+	u64 *link_speed_region;
+	int err;
+
+	link_speed_region = gve_alloc_dma_mem(&link_speed_region_dma_mem,
+					      sizeof(*link_speed_region));
+
+	if (!link_speed_region)
+		return -ENOMEM;
+
+	memset(&gvnic_cmd, 0, sizeof(gvnic_cmd));
+	gvnic_cmd.opcode = cpu_to_be32(GVE_ADMINQ_REPORT_LINK_SPEED);
+	gvnic_cmd.report_link_speed.link_speed_address =
+		cpu_to_be64(link_speed_region_dma_mem.pa);
+
+	err = gve_adminq_execute_cmd(priv, &gvnic_cmd);
+
+	priv->link_speed = be64_to_cpu(*link_speed_region);
+	gve_free_dma_mem(&link_speed_region_dma_mem);
+	return err;
+}
+
+int gve_adminq_get_ptype_map_dqo(struct gve_priv *priv,
+				 struct gve_ptype_lut *ptype_lut)
+{
+	struct gve_dma_mem ptype_map_dma_mem;
+	struct gve_ptype_map *ptype_map;
+	union gve_adminq_command cmd;
+	int err = 0;
+	int i;
+
+	memset(&cmd, 0, sizeof(cmd));
+	ptype_map = gve_alloc_dma_mem(&ptype_map_dma_mem, sizeof(*ptype_map));
+	if (!ptype_map)
+		return -ENOMEM;
+
+	cmd.opcode = cpu_to_be32(GVE_ADMINQ_GET_PTYPE_MAP);
+	cmd.get_ptype_map = (struct gve_adminq_get_ptype_map) {
+		.ptype_map_len = cpu_to_be64(sizeof(*ptype_map)),
+		.ptype_map_addr = cpu_to_be64(ptype_map_dma_mem.pa),
+	};
+
+	err = gve_adminq_execute_cmd(priv, &cmd);
+	if (err)
+		goto err;
+
+	/* Populate ptype_lut. */
+	for (i = 0; i < GVE_NUM_PTYPES; i++) {
+		ptype_lut->ptypes[i].l3_type =
+			ptype_map->ptypes[i].l3_type;
+		ptype_lut->ptypes[i].l4_type =
+			ptype_map->ptypes[i].l4_type;
+	}
+err:
+	gve_free_dma_mem(&ptype_map_dma_mem);
+	return err;
+}
diff --git a/drivers/net/gve/base/gve_adminq.h b/drivers/net/gve/base/gve_adminq.h
new file mode 100644
index 0000000000..c7114cc883
--- /dev/null
+++ b/drivers/net/gve/base/gve_adminq.h
@@ -0,0 +1,381 @@
+/* SPDX-License-Identifier: MIT
+ * Google Virtual Ethernet (gve) driver
+ * Version: 1.3.0
+ * Copyright (C) 2015-2022 Google, Inc.
+ * Copyright(C) 2022 Intel Corporation
+ */
+
+#ifndef _GVE_ADMINQ_H
+#define _GVE_ADMINQ_H
+
+/* Admin queue opcodes */
+enum gve_adminq_opcodes {
+	GVE_ADMINQ_DESCRIBE_DEVICE		= 0x1,
+	GVE_ADMINQ_CONFIGURE_DEVICE_RESOURCES	= 0x2,
+	GVE_ADMINQ_REGISTER_PAGE_LIST		= 0x3,
+	GVE_ADMINQ_UNREGISTER_PAGE_LIST		= 0x4,
+	GVE_ADMINQ_CREATE_TX_QUEUE		= 0x5,
+	GVE_ADMINQ_CREATE_RX_QUEUE		= 0x6,
+	GVE_ADMINQ_DESTROY_TX_QUEUE		= 0x7,
+	GVE_ADMINQ_DESTROY_RX_QUEUE		= 0x8,
+	GVE_ADMINQ_DECONFIGURE_DEVICE_RESOURCES	= 0x9,
+	GVE_ADMINQ_SET_DRIVER_PARAMETER		= 0xB,
+	GVE_ADMINQ_REPORT_STATS			= 0xC,
+	GVE_ADMINQ_REPORT_LINK_SPEED		= 0xD,
+	GVE_ADMINQ_GET_PTYPE_MAP		= 0xE,
+};
+
+/* Admin queue status codes */
+enum gve_adminq_statuses {
+	GVE_ADMINQ_COMMAND_UNSET			= 0x0,
+	GVE_ADMINQ_COMMAND_PASSED			= 0x1,
+	GVE_ADMINQ_COMMAND_ERROR_ABORTED		= 0xFFFFFFF0,
+	GVE_ADMINQ_COMMAND_ERROR_ALREADY_EXISTS		= 0xFFFFFFF1,
+	GVE_ADMINQ_COMMAND_ERROR_CANCELLED		= 0xFFFFFFF2,
+	GVE_ADMINQ_COMMAND_ERROR_DATALOSS		= 0xFFFFFFF3,
+	GVE_ADMINQ_COMMAND_ERROR_DEADLINE_EXCEEDED	= 0xFFFFFFF4,
+	GVE_ADMINQ_COMMAND_ERROR_FAILED_PRECONDITION	= 0xFFFFFFF5,
+	GVE_ADMINQ_COMMAND_ERROR_INTERNAL_ERROR		= 0xFFFFFFF6,
+	GVE_ADMINQ_COMMAND_ERROR_INVALID_ARGUMENT	= 0xFFFFFFF7,
+	GVE_ADMINQ_COMMAND_ERROR_NOT_FOUND		= 0xFFFFFFF8,
+	GVE_ADMINQ_COMMAND_ERROR_OUT_OF_RANGE		= 0xFFFFFFF9,
+	GVE_ADMINQ_COMMAND_ERROR_PERMISSION_DENIED	= 0xFFFFFFFA,
+	GVE_ADMINQ_COMMAND_ERROR_UNAUTHENTICATED	= 0xFFFFFFFB,
+	GVE_ADMINQ_COMMAND_ERROR_RESOURCE_EXHAUSTED	= 0xFFFFFFFC,
+	GVE_ADMINQ_COMMAND_ERROR_UNAVAILABLE		= 0xFFFFFFFD,
+	GVE_ADMINQ_COMMAND_ERROR_UNIMPLEMENTED		= 0xFFFFFFFE,
+	GVE_ADMINQ_COMMAND_ERROR_UNKNOWN_ERROR		= 0xFFFFFFFF,
+};
+
+#define GVE_ADMINQ_DEVICE_DESCRIPTOR_VERSION 1
+
+/* All AdminQ command structs should be naturally packed.
+ * GVE_CHECK_STRUCT/UNION_LEN will check struct/union length and throw
+ * error at compile time when the size is not correct.
+ */
+
+struct gve_adminq_describe_device {
+	__be64 device_descriptor_addr;
+	__be32 device_descriptor_version;
+	__be32 available_length;
+};
+
+GVE_CHECK_STRUCT_LEN(16, gve_adminq_describe_device);
+
+struct gve_device_descriptor {
+	__be64 max_registered_pages;
+	__be16 reserved1;
+	__be16 tx_queue_entries;
+	__be16 rx_queue_entries;
+	__be16 default_num_queues;
+	__be16 mtu;
+	__be16 counters;
+	__be16 tx_pages_per_qpl;
+	__be16 rx_pages_per_qpl;
+	u8  mac[ETH_ALEN];
+	__be16 num_device_options;
+	__be16 total_length;
+	u8  reserved2[6];
+};
+
+GVE_CHECK_STRUCT_LEN(40, gve_device_descriptor);
+
+struct gve_device_option {
+	__be16 option_id;
+	__be16 option_length;
+	__be32 required_features_mask;
+};
+
+GVE_CHECK_STRUCT_LEN(8, gve_device_option);
+
+struct gve_device_option_gqi_rda {
+	__be32 supported_features_mask;
+};
+
+GVE_CHECK_STRUCT_LEN(4, gve_device_option_gqi_rda);
+
+struct gve_device_option_gqi_qpl {
+	__be32 supported_features_mask;
+};
+
+GVE_CHECK_STRUCT_LEN(4, gve_device_option_gqi_qpl);
+
+struct gve_device_option_dqo_rda {
+	__be32 supported_features_mask;
+	__be16 tx_comp_ring_entries;
+	__be16 rx_buff_ring_entries;
+};
+
+GVE_CHECK_STRUCT_LEN(8, gve_device_option_dqo_rda);
+
+struct gve_device_option_jumbo_frames {
+	__be32 supported_features_mask;
+	__be16 max_mtu;
+	u8 padding[2];
+};
+
+GVE_CHECK_STRUCT_LEN(8, gve_device_option_jumbo_frames);
+
+/* Terminology:
+ *
+ * RDA - Raw DMA Addressing - Buffers associated with SKBs are directly DMA
+ *       mapped and read/updated by the device.
+ *
+ * QPL - Queue Page Lists - Driver uses bounce buffers which are DMA mapped with
+ *       the device for read/write and data is copied from/to SKBs.
+ */
+enum gve_dev_opt_id {
+	GVE_DEV_OPT_ID_GQI_RAW_ADDRESSING = 0x1,
+	GVE_DEV_OPT_ID_GQI_RDA = 0x2,
+	GVE_DEV_OPT_ID_GQI_QPL = 0x3,
+	GVE_DEV_OPT_ID_DQO_RDA = 0x4,
+	GVE_DEV_OPT_ID_JUMBO_FRAMES = 0x8,
+};
+
+enum gve_dev_opt_req_feat_mask {
+	GVE_DEV_OPT_REQ_FEAT_MASK_GQI_RAW_ADDRESSING = 0x0,
+	GVE_DEV_OPT_REQ_FEAT_MASK_GQI_RDA = 0x0,
+	GVE_DEV_OPT_REQ_FEAT_MASK_GQI_QPL = 0x0,
+	GVE_DEV_OPT_REQ_FEAT_MASK_DQO_RDA = 0x0,
+	GVE_DEV_OPT_REQ_FEAT_MASK_JUMBO_FRAMES = 0x0,
+};
+
+enum gve_sup_feature_mask {
+	GVE_SUP_JUMBO_FRAMES_MASK = 1 << 2,
+};
+
+#define GVE_DEV_OPT_LEN_GQI_RAW_ADDRESSING 0x0
+
+struct gve_adminq_configure_device_resources {
+	__be64 counter_array;
+	__be64 irq_db_addr;
+	__be32 num_counters;
+	__be32 num_irq_dbs;
+	__be32 irq_db_stride;
+	__be32 ntfy_blk_msix_base_idx;
+	u8 queue_format;
+	u8 padding[7];
+};
+
+GVE_CHECK_STRUCT_LEN(40, gve_adminq_configure_device_resources);
+
+struct gve_adminq_register_page_list {
+	__be32 page_list_id;
+	__be32 num_pages;
+	__be64 page_address_list_addr;
+};
+
+GVE_CHECK_STRUCT_LEN(16, gve_adminq_register_page_list);
+
+struct gve_adminq_unregister_page_list {
+	__be32 page_list_id;
+};
+
+GVE_CHECK_STRUCT_LEN(4, gve_adminq_unregister_page_list);
+
+#define GVE_RAW_ADDRESSING_QPL_ID 0xFFFFFFFF
+
+struct gve_adminq_create_tx_queue {
+	__be32 queue_id;
+	__be32 reserved;
+	__be64 queue_resources_addr;
+	__be64 tx_ring_addr;
+	__be32 queue_page_list_id;
+	__be32 ntfy_id;
+	__be64 tx_comp_ring_addr;
+	__be16 tx_ring_size;
+	__be16 tx_comp_ring_size;
+	u8 padding[4];
+};
+
+GVE_CHECK_STRUCT_LEN(48, gve_adminq_create_tx_queue);
+
+struct gve_adminq_create_rx_queue {
+	__be32 queue_id;
+	__be32 index;
+	__be32 reserved;
+	__be32 ntfy_id;
+	__be64 queue_resources_addr;
+	__be64 rx_desc_ring_addr;
+	__be64 rx_data_ring_addr;
+	__be32 queue_page_list_id;
+	__be16 rx_ring_size;
+	__be16 packet_buffer_size;
+	__be16 rx_buff_ring_size;
+	u8 enable_rsc;
+	u8 padding[5];
+};
+
+GVE_CHECK_STRUCT_LEN(56, gve_adminq_create_rx_queue);
+
+/* Queue resources that are shared with the device */
+struct gve_queue_resources {
+	union {
+		struct {
+			__be32 db_index;	/* Device -> Guest */
+			__be32 counter_index;	/* Device -> Guest */
+		};
+		u8 reserved[64];
+	};
+};
+
+GVE_CHECK_STRUCT_LEN(64, gve_queue_resources);
+
+struct gve_adminq_destroy_tx_queue {
+	__be32 queue_id;
+};
+
+GVE_CHECK_STRUCT_LEN(4, gve_adminq_destroy_tx_queue);
+
+struct gve_adminq_destroy_rx_queue {
+	__be32 queue_id;
+};
+
+GVE_CHECK_STRUCT_LEN(4, gve_adminq_destroy_rx_queue);
+
+/* GVE Set Driver Parameter Types */
+enum gve_set_driver_param_types {
+	GVE_SET_PARAM_MTU	= 0x1,
+};
+
+struct gve_adminq_set_driver_parameter {
+	__be32 parameter_type;
+	u8 reserved[4];
+	__be64 parameter_value;
+};
+
+GVE_CHECK_STRUCT_LEN(16, gve_adminq_set_driver_parameter);
+
+struct gve_adminq_report_stats {
+	__be64 stats_report_len;
+	__be64 stats_report_addr;
+	__be64 interval;
+};
+
+GVE_CHECK_STRUCT_LEN(24, gve_adminq_report_stats);
+
+struct gve_adminq_report_link_speed {
+	__be64 link_speed_address;
+};
+
+GVE_CHECK_STRUCT_LEN(8, gve_adminq_report_link_speed);
+
+struct stats {
+	__be32 stat_name;
+	__be32 queue_id;
+	__be64 value;
+};
+
+GVE_CHECK_STRUCT_LEN(16, stats);
+
+struct gve_stats_report {
+	__be64 written_count;
+	struct stats stats[];
+};
+
+GVE_CHECK_STRUCT_LEN(8, gve_stats_report);
+
+enum gve_stat_names {
+	/* stats from gve */
+	TX_WAKE_CNT			= 1,
+	TX_STOP_CNT			= 2,
+	TX_FRAMES_SENT			= 3,
+	TX_BYTES_SENT			= 4,
+	TX_LAST_COMPLETION_PROCESSED	= 5,
+	RX_NEXT_EXPECTED_SEQUENCE	= 6,
+	RX_BUFFERS_POSTED		= 7,
+	TX_TIMEOUT_CNT			= 8,
+	/* stats from NIC */
+	RX_QUEUE_DROP_CNT		= 65,
+	RX_NO_BUFFERS_POSTED		= 66,
+	RX_DROPS_PACKET_OVER_MRU	= 67,
+	RX_DROPS_INVALID_CHECKSUM	= 68,
+};
+
+enum gve_l3_type {
+	/* Must be zero so zero initialized LUT is unknown. */
+	GVE_L3_TYPE_UNKNOWN = 0,
+	GVE_L3_TYPE_OTHER,
+	GVE_L3_TYPE_IPV4,
+	GVE_L3_TYPE_IPV6,
+};
+
+enum gve_l4_type {
+	/* Must be zero so zero initialized LUT is unknown. */
+	GVE_L4_TYPE_UNKNOWN = 0,
+	GVE_L4_TYPE_OTHER,
+	GVE_L4_TYPE_TCP,
+	GVE_L4_TYPE_UDP,
+	GVE_L4_TYPE_ICMP,
+	GVE_L4_TYPE_SCTP,
+};
+
+/* These are control path types for PTYPE which are the same as the data path
+ * types.
+ */
+struct gve_ptype_entry {
+	u8 l3_type;
+	u8 l4_type;
+};
+
+struct gve_ptype_map {
+	struct gve_ptype_entry ptypes[1 << 10]; /* PTYPES are always 10 bits. */
+};
+
+struct gve_adminq_get_ptype_map {
+	__be64 ptype_map_len;
+	__be64 ptype_map_addr;
+};
+
+union gve_adminq_command {
+	struct {
+		__be32 opcode;
+		__be32 status;
+		union {
+			struct gve_adminq_configure_device_resources
+						configure_device_resources;
+			struct gve_adminq_create_tx_queue create_tx_queue;
+			struct gve_adminq_create_rx_queue create_rx_queue;
+			struct gve_adminq_destroy_tx_queue destroy_tx_queue;
+			struct gve_adminq_destroy_rx_queue destroy_rx_queue;
+			struct gve_adminq_describe_device describe_device;
+			struct gve_adminq_register_page_list reg_page_list;
+			struct gve_adminq_unregister_page_list unreg_page_list;
+			struct gve_adminq_set_driver_parameter set_driver_param;
+			struct gve_adminq_report_stats report_stats;
+			struct gve_adminq_report_link_speed report_link_speed;
+			struct gve_adminq_get_ptype_map get_ptype_map;
+		};
+	};
+	u8 reserved[64];
+};
+
+GVE_CHECK_UNION_LEN(64, gve_adminq_command);
+
+int gve_adminq_alloc(struct gve_priv *priv);
+void gve_adminq_free(struct gve_priv *priv);
+void gve_adminq_release(struct gve_priv *priv);
+int gve_adminq_describe_device(struct gve_priv *priv);
+int gve_adminq_configure_device_resources(struct gve_priv *priv,
+					  dma_addr_t counter_array_bus_addr,
+					  u32 num_counters,
+					  dma_addr_t db_array_bus_addr,
+					  u32 num_ntfy_blks);
+int gve_adminq_deconfigure_device_resources(struct gve_priv *priv);
+int gve_adminq_create_tx_queues(struct gve_priv *priv, u32 num_queues);
+int gve_adminq_destroy_tx_queues(struct gve_priv *priv, u32 queue_id);
+int gve_adminq_create_rx_queues(struct gve_priv *priv, u32 num_queues);
+int gve_adminq_destroy_rx_queues(struct gve_priv *priv, u32 queue_id);
+int gve_adminq_register_page_list(struct gve_priv *priv,
+				  struct gve_queue_page_list *qpl);
+int gve_adminq_unregister_page_list(struct gve_priv *priv, u32 page_list_id);
+int gve_adminq_set_mtu(struct gve_priv *priv, u64 mtu);
+int gve_adminq_report_stats(struct gve_priv *priv, u64 stats_report_len,
+			    dma_addr_t stats_report_addr, u64 interval);
+int gve_adminq_report_link_speed(struct gve_priv *priv);
+
+struct gve_ptype_lut;
+int gve_adminq_get_ptype_map_dqo(struct gve_priv *priv,
+				 struct gve_ptype_lut *ptype_lut);
+
+#endif /* _GVE_ADMINQ_H */
diff --git a/drivers/net/gve/base/gve_desc.h b/drivers/net/gve/base/gve_desc.h
new file mode 100644
index 0000000000..358755b7e0
--- /dev/null
+++ b/drivers/net/gve/base/gve_desc.h
@@ -0,0 +1,137 @@
+/* SPDX-License-Identifier: MIT
+ * Google Virtual Ethernet (gve) driver
+ * Version: 1.3.0
+ * Copyright (C) 2015-2022 Google, Inc.
+ */
+
+/* GVE Transmit Descriptor formats */
+
+#ifndef _GVE_DESC_H_
+#define _GVE_DESC_H_
+
+/* A note on seg_addrs
+ *
+ * Base addresses encoded in seg_addr are not assumed to be physical
+ * addresses. The ring format assumes these come from some linear address
+ * space. This could be physical memory, kernel virtual memory, user virtual
+ * memory.
+ * If raw dma addressing is not supported then gVNIC uses lists of registered
+ * pages. Each queue is assumed to be associated with a single such linear
+ * address space to ensure a consistent meaning for seg_addrs posted to its
+ * rings.
+ */
+
+struct gve_tx_pkt_desc {
+	u8	type_flags;  /* desc type is lower 4 bits, flags upper */
+	u8	l4_csum_offset;  /* relative offset of L4 csum word */
+	u8	l4_hdr_offset;  /* Offset of start of L4 headers in packet */
+	u8	desc_cnt;  /* Total descriptors for this packet */
+	__be16	len;  /* Total length of this packet (in bytes) */
+	__be16	seg_len;  /* Length of this descriptor's segment */
+	__be64	seg_addr;  /* Base address (see note) of this segment */
+} __packed;
+
+struct gve_tx_mtd_desc {
+	u8      type_flags;     /* type is lower 4 bits, subtype upper  */
+	u8      path_state;     /* state is lower 4 bits, hash type upper */
+	__be16  reserved0;
+	__be32  path_hash;
+	__be64  reserved1;
+} __packed;
+
+struct gve_tx_seg_desc {
+	u8	type_flags;	/* type is lower 4 bits, flags upper	*/
+	u8	l3_offset;	/* TSO: 2 byte units to start of IPH	*/
+	__be16	reserved;
+	__be16	mss;		/* TSO MSS				*/
+	__be16	seg_len;
+	__be64	seg_addr;
+} __packed;
+
+/* GVE Transmit Descriptor Types */
+#define	GVE_TXD_STD		(0x0 << 4) /* Std with Host Address	*/
+#define	GVE_TXD_TSO		(0x1 << 4) /* TSO with Host Address	*/
+#define	GVE_TXD_SEG		(0x2 << 4) /* Seg with Host Address	*/
+#define	GVE_TXD_MTD		(0x3 << 4) /* Metadata			*/
+
+/* GVE Transmit Descriptor Flags for Std Pkts */
+#define	GVE_TXF_L4CSUM	BIT(0)	/* Need csum offload */
+#define	GVE_TXF_TSTAMP	BIT(2)	/* Timestamp required */
+
+/* GVE Transmit Descriptor Flags for TSO Segs */
+#define	GVE_TXSF_IPV6	BIT(1)	/* IPv6 TSO */
+
+/* GVE Transmit Descriptor Options for MTD Segs */
+#define GVE_MTD_SUBTYPE_PATH		0
+
+#define GVE_MTD_PATH_STATE_DEFAULT	0
+#define GVE_MTD_PATH_STATE_TIMEOUT	1
+#define GVE_MTD_PATH_STATE_CONGESTION	2
+#define GVE_MTD_PATH_STATE_RETRANSMIT	3
+
+#define GVE_MTD_PATH_HASH_NONE         (0x0 << 4)
+#define GVE_MTD_PATH_HASH_L4           (0x1 << 4)
+
+/* GVE Receive Packet Descriptor */
+/* The start of an ethernet packet comes 2 bytes into the rx buffer.
+ * gVNIC adds this padding so that both the DMA and the L3/4 protocol header
+ * access is aligned.
+ */
+#define GVE_RX_PAD 2
+
+struct gve_rx_desc {
+	u8	padding[48];
+	__be32	rss_hash;  /* Receive-side scaling hash (Toeplitz for gVNIC) */
+	__be16	mss;
+	__be16	reserved;  /* Reserved to zero */
+	u8	hdr_len;  /* Header length (L2-L4) including padding */
+	u8	hdr_off;  /* 64-byte-scaled offset into RX_DATA entry */
+	__sum16	csum;  /* 1's-complement partial checksum of L3+ bytes */
+	__be16	len;  /* Length of the received packet */
+	__be16	flags_seq;  /* Flags [15:3] and sequence number [2:0] (1-7) */
+} __packed;
+GVE_CHECK_STRUCT_LEN(64, gve_rx_desc);
+
+/* If the device supports raw dma addressing then the addr in data slot is
+ * the dma address of the buffer.
+ * If the device only supports registered segments then the addr is a byte
+ * offset into the registered segment (an ordered list of pages) where the
+ * buffer is.
+ */
+union gve_rx_data_slot {
+	__be64 qpl_offset;
+	__be64 addr;
+};
+
+/* GVE Receive Packet Descriptor Seq No */
+#define GVE_SEQNO(x) (be16_to_cpu(x) & 0x7)
+
+/* GVE Receive Packet Descriptor Flags */
+#define GVE_RXFLG(x)	cpu_to_be16(1 << (3 + (x)))
+#define	GVE_RXF_FRAG		GVE_RXFLG(3)	/* IP Fragment			*/
+#define	GVE_RXF_IPV4		GVE_RXFLG(4)	/* IPv4				*/
+#define	GVE_RXF_IPV6		GVE_RXFLG(5)	/* IPv6				*/
+#define	GVE_RXF_TCP		GVE_RXFLG(6)	/* TCP Packet			*/
+#define	GVE_RXF_UDP		GVE_RXFLG(7)	/* UDP Packet			*/
+#define	GVE_RXF_ERR		GVE_RXFLG(8)	/* Packet Error Detected	*/
+#define	GVE_RXF_PKT_CONT	GVE_RXFLG(10)	/* Multi Fragment RX packet	*/
+
+/* GVE IRQ */
+#define GVE_IRQ_ACK	BIT(31)
+#define GVE_IRQ_MASK	BIT(30)
+#define GVE_IRQ_EVENT	BIT(29)
+
+static inline bool gve_needs_rss(__be16 flag)
+{
+	if (flag & GVE_RXF_FRAG)
+		return false;
+	if (flag & (GVE_RXF_IPV4 | GVE_RXF_IPV6))
+		return true;
+	return false;
+}
+
+static inline u8 gve_next_seqno(u8 seq)
+{
+	return (seq + 1) == 8 ? 1 : seq + 1;
+}
+#endif /* _GVE_DESC_H_ */
diff --git a/drivers/net/gve/base/gve_desc_dqo.h b/drivers/net/gve/base/gve_desc_dqo.h
new file mode 100644
index 0000000000..0d533abcd1
--- /dev/null
+++ b/drivers/net/gve/base/gve_desc_dqo.h
@@ -0,0 +1,254 @@
+/* SPDX-License-Identifier: MIT
+ * Google Virtual Ethernet (gve) driver
+ * Version: 1.3.0
+ * Copyright (C) 2015-2022 Google, Inc.
+ */
+
+/* GVE DQO Descriptor formats */
+
+#ifndef _GVE_DESC_DQO_H_
+#define _GVE_DESC_DQO_H_
+
+#define GVE_TX_MAX_HDR_SIZE_DQO 255
+#define GVE_TX_MIN_TSO_MSS_DQO 88
+
+#ifndef __LITTLE_ENDIAN_BITFIELD
+#error "Only little endian supported"
+#endif
+
+/* Basic TX descriptor (DTYPE 0x0C) */
+struct gve_tx_pkt_desc_dqo {
+	__le64 buf_addr;
+
+	/* Must be GVE_TX_PKT_DESC_DTYPE_DQO (0xc) */
+	u8 dtype: 5;
+
+	/* Denotes the last descriptor of a packet. */
+	u8 end_of_packet: 1;
+	u8 checksum_offload_enable: 1;
+
+	/* If set, will generate a descriptor completion for this descriptor. */
+	u8 report_event: 1;
+	u8 reserved0;
+	__le16 reserved1;
+
+	/* The TX completion associated with this packet will contain this tag.
+	 */
+	__le16 compl_tag;
+	u16 buf_size: 14;
+	u16 reserved2: 2;
+} __packed;
+GVE_CHECK_STRUCT_LEN(16, gve_tx_pkt_desc_dqo);
+
+#define GVE_TX_PKT_DESC_DTYPE_DQO 0xc
+#define GVE_TX_MAX_BUF_SIZE_DQO ((16 * 1024) - 1)
+
+/* Maximum number of data descriptors allowed per packet, or per-TSO segment. */
+#define GVE_TX_MAX_DATA_DESCS 10
+
+/* Min gap between tail and head to avoid cacheline overlap */
+#define GVE_TX_MIN_DESC_PREVENT_CACHE_OVERLAP 4
+
+/* "report_event" on TX packet descriptors may only be reported on the last
+ * descriptor of a TX packet, and they must be spaced apart with at least this
+ * value.
+ */
+#define GVE_TX_MIN_RE_INTERVAL 32
+
+struct gve_tx_context_cmd_dtype {
+	u8 dtype: 5;
+	u8 tso: 1;
+	u8 reserved1: 2;
+
+	u8 reserved2;
+};
+
+GVE_CHECK_STRUCT_LEN(2, gve_tx_context_cmd_dtype);
+
+/* TX Native TSO Context DTYPE (0x05)
+ *
+ * "flex" fields allow the driver to send additional packet context to HW.
+ */
+struct gve_tx_tso_context_desc_dqo {
+	/* The L4 payload bytes that should be segmented. */
+	u32 tso_total_len: 24;
+	u32 flex10: 8;
+
+	/* Max segment size in TSO excluding headers. */
+	u16 mss: 14;
+	u16 reserved: 2;
+
+	u8 header_len; /* Header length to use for TSO offload */
+	u8 flex11;
+	struct gve_tx_context_cmd_dtype cmd_dtype;
+	u8 flex0;
+	u8 flex5;
+	u8 flex6;
+	u8 flex7;
+	u8 flex8;
+	u8 flex9;
+} __packed;
+GVE_CHECK_STRUCT_LEN(16, gve_tx_tso_context_desc_dqo);
+
+#define GVE_TX_TSO_CTX_DESC_DTYPE_DQO 0x5
+
+/* General context descriptor for sending metadata. */
+struct gve_tx_general_context_desc_dqo {
+	u8 flex4;
+	u8 flex5;
+	u8 flex6;
+	u8 flex7;
+	u8 flex8;
+	u8 flex9;
+	u8 flex10;
+	u8 flex11;
+	struct gve_tx_context_cmd_dtype cmd_dtype;
+	u16 reserved;
+	u8 flex0;
+	u8 flex1;
+	u8 flex2;
+	u8 flex3;
+} __packed;
+GVE_CHECK_STRUCT_LEN(16, gve_tx_general_context_desc_dqo);
+
+#define GVE_TX_GENERAL_CTX_DESC_DTYPE_DQO 0x4
+
+/* Logical structure of metadata which is packed into context descriptor flex
+ * fields.
+ */
+struct gve_tx_metadata_dqo {
+	union {
+		struct {
+			u8 version;
+
+			/* If `skb->l4_hash` is set, this value should be
+			 * derived from `skb->hash`.
+			 *
+			 * A zero value means no l4_hash was associated with the
+			 * skb.
+			 */
+			u16 path_hash: 15;
+
+			/* Should be set to 1 if the flow associated with the
+			 * skb had a rehash from the TCP stack.
+			 */
+			u16 rehash_event: 1;
+		}  __packed;
+		u8 bytes[12];
+	};
+}  __packed;
+GVE_CHECK_STRUCT_LEN(12, gve_tx_metadata_dqo);
+
+#define GVE_TX_METADATA_VERSION_DQO 0
+
+/* TX completion descriptor */
+struct gve_tx_compl_desc {
+	/* For types 0-4 this is the TX queue ID associated with this
+	 * completion.
+	 */
+	u16 id: 11;
+
+	/* See: GVE_COMPL_TYPE_DQO* */
+	u16 type: 3;
+	u16 reserved0: 1;
+
+	/* Flipped by HW to notify the descriptor is populated. */
+	u16 generation: 1;
+	union {
+		/* For descriptor completions, this is the last index fetched
+		 * by HW + 1.
+		 */
+		__le16 tx_head;
+
+		/* For packet completions, this is the completion tag set on the
+		 * TX packet descriptors.
+		 */
+		__le16 completion_tag;
+	};
+	__le32 reserved1;
+} __packed;
+GVE_CHECK_STRUCT_LEN(8, gve_tx_compl_desc);
+
+#define GVE_COMPL_TYPE_DQO_PKT 0x2 /* Packet completion */
+#define GVE_COMPL_TYPE_DQO_DESC 0x4 /* Descriptor completion */
+#define GVE_COMPL_TYPE_DQO_MISS 0x1 /* Miss path completion */
+#define GVE_COMPL_TYPE_DQO_REINJECTION 0x3 /* Re-injection completion */
+
+/* Descriptor to post buffers to HW on buffer queue. */
+struct gve_rx_desc_dqo {
+	__le16 buf_id; /* ID returned in Rx completion descriptor */
+	__le16 reserved0;
+	__le32 reserved1;
+	__le64 buf_addr; /* DMA address of the buffer */
+	__le64 header_buf_addr;
+	__le64 reserved2;
+} __packed;
+GVE_CHECK_STRUCT_LEN(32, gve_rx_desc_dqo);
+
+/* Descriptor for HW to notify SW of new packets received on RX queue. */
+struct gve_rx_compl_desc_dqo {
+	/* Must be 1 */
+	u8 rxdid: 4;
+	u8 reserved0: 4;
+
+	/* Packet originated from this system rather than the network. */
+	u8 loopback: 1;
+	/* Set when IPv6 packet contains a destination options header or routing
+	 * header.
+	 */
+	u8 ipv6_ex_add: 1;
+	/* Invalid packet was received. */
+	u8 rx_error: 1;
+	u8 reserved1: 5;
+
+	u16 packet_type: 10;
+	u16 ip_hdr_err: 1;
+	u16 udp_len_err: 1;
+	u16 raw_cs_invalid: 1;
+	u16 reserved2: 3;
+
+	u16 packet_len: 14;
+	/* Flipped by HW to notify the descriptor is populated. */
+	u16 generation: 1;
+	/* Should be zero. */
+	u16 buffer_queue_id: 1;
+
+	u16 header_len: 10;
+	u16 rsc: 1;
+	u16 split_header: 1;
+	u16 reserved3: 4;
+
+	u8 descriptor_done: 1;
+	u8 end_of_packet: 1;
+	u8 header_buffer_overflow: 1;
+	u8 l3_l4_processed: 1;
+	u8 csum_ip_err: 1;
+	u8 csum_l4_err: 1;
+	u8 csum_external_ip_err: 1;
+	u8 csum_external_udp_err: 1;
+
+	u8 status_error1;
+
+	__le16 reserved5;
+	__le16 buf_id; /* Buffer ID which was sent on the buffer queue. */
+
+	union {
+		/* Packet checksum. */
+		__le16 raw_cs;
+		/* Segment length for RSC packets. */
+		__le16 rsc_seg_len;
+	};
+	__le32 hash;
+	__le32 reserved6;
+	__le64 reserved7;
+} __packed;
+
+GVE_CHECK_STRUCT_LEN(32, gve_rx_compl_desc_dqo);
+
+/* Ringing the doorbell too often can hurt performance.
+ *
+ * HW requires this value to be at least 8.
+ */
+#define GVE_RX_BUF_THRESH_DQO 32
+
+#endif /* _GVE_DESC_DQO_H_ */
diff --git a/drivers/net/gve/base/gve_register.h b/drivers/net/gve/base/gve_register.h
new file mode 100644
index 0000000000..b65f336be2
--- /dev/null
+++ b/drivers/net/gve/base/gve_register.h
@@ -0,0 +1,28 @@
+/* SPDX-License-Identifier: MIT
+ * Google Virtual Ethernet (gve) driver
+ * Version: 1.3.0
+ * Copyright (C) 2015-2022 Google, Inc.
+ */
+
+#ifndef _GVE_REGISTER_H_
+#define _GVE_REGISTER_H_
+
+/* Fixed Configuration Registers */
+struct gve_registers {
+	__be32	device_status;
+	__be32	driver_status;
+	__be32	max_tx_queues;
+	__be32	max_rx_queues;
+	__be32	adminq_pfn;
+	__be32	adminq_doorbell;
+	__be32	adminq_event_counter;
+	u8	reserved[3];
+	u8	driver_version;
+};
+
+enum gve_device_status_flags {
+	GVE_DEVICE_STATUS_RESET_MASK		= BIT(1),
+	GVE_DEVICE_STATUS_LINK_STATUS_MASK	= BIT(2),
+	GVE_DEVICE_STATUS_REPORT_STATS_MASK	= BIT(3),
+};
+#endif /* _GVE_REGISTER_H_ */
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 192+ messages in thread
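
A note on the GVE_CHECK_STRUCT_LEN/GVE_CHECK_UNION_LEN lines used throughout
the headers above: they are compile-time size assertions (their definitions
come with gve_osdep.h later in this series). A minimal sketch of how they
behave, using a throwaway struct that is not part of the patch:

    /* 'example_ok' really is 8 bytes, so the macro expands to a harmless
     * enum whose single member evaluates to 8 / 1.
     */
    struct example_ok {
            __be32 a;
            __be32 b;
    };
    GVE_CHECK_STRUCT_LEN(8, example_ok);

    /* A wrong size is caught at build time: with 16 here, the ternary in
     * the macro picks a zero divisor and the constant expression 16 / 0
     * is rejected by the compiler.
     *
     * GVE_CHECK_STRUCT_LEN(16, example_ok);
     */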

* [PATCH v4 2/9] net/gve/base: add logs and OS specific implementation
  2022-09-27  7:32           ` [PATCH v4 0/9] introduce GVE PMD Junfeng Guo
  2022-09-27  7:32             ` [PATCH v4 1/9] net/gve/base: introduce GVE PMD base code Junfeng Guo
@ 2022-09-27  7:32             ` Junfeng Guo
  2022-10-06 14:20               ` Ferruh Yigit
  2022-09-27  7:32             ` [PATCH v4 3/9] net/gve: add support for device initialization Junfeng Guo
                               ` (6 subsequent siblings)
  8 siblings, 1 reply; 192+ messages in thread
From: Junfeng Guo @ 2022-09-27  7:32 UTC (permalink / raw)
  To: qi.z.zhang, jingjing.wu
  Cc: ferruh.yigit, dev, xiaoyun.li, awogbemila, bruce.richardson,
	xueqin.lin, junfeng.guo, Haiyue Wang

Add GVE PMD logs.
Add some macro definitions and memory operations which are specific
to DPDK.

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com>
Signed-off-by: Junfeng Guo <junfeng.guo@intel.com>
---
 drivers/net/gve/base/gve_adminq.h   |   2 +
 drivers/net/gve/base/gve_desc.h     |   2 +
 drivers/net/gve/base/gve_desc_dqo.h |   2 +
 drivers/net/gve/base/gve_osdep.h    | 159 ++++++++++++++++++++++++++++
 drivers/net/gve/base/gve_register.h |   2 +
 drivers/net/gve/gve_logs.h          |  14 +++
 6 files changed, 181 insertions(+)
 create mode 100644 drivers/net/gve/base/gve_osdep.h
 create mode 100644 drivers/net/gve/gve_logs.h

diff --git a/drivers/net/gve/base/gve_adminq.h b/drivers/net/gve/base/gve_adminq.h
index c7114cc883..cd496760ae 100644
--- a/drivers/net/gve/base/gve_adminq.h
+++ b/drivers/net/gve/base/gve_adminq.h
@@ -8,6 +8,8 @@
 #ifndef _GVE_ADMINQ_H
 #define _GVE_ADMINQ_H
 
+#include "gve_osdep.h"
+
 /* Admin queue opcodes */
 enum gve_adminq_opcodes {
 	GVE_ADMINQ_DESCRIBE_DEVICE		= 0x1,
diff --git a/drivers/net/gve/base/gve_desc.h b/drivers/net/gve/base/gve_desc.h
index 358755b7e0..627b9120dc 100644
--- a/drivers/net/gve/base/gve_desc.h
+++ b/drivers/net/gve/base/gve_desc.h
@@ -9,6 +9,8 @@
 #ifndef _GVE_DESC_H_
 #define _GVE_DESC_H_
 
+#include "gve_osdep.h"
+
 /* A note on seg_addrs
  *
  * Base addresses encoded in seg_addr are not assumed to be physical
diff --git a/drivers/net/gve/base/gve_desc_dqo.h b/drivers/net/gve/base/gve_desc_dqo.h
index 0d533abcd1..5031752b43 100644
--- a/drivers/net/gve/base/gve_desc_dqo.h
+++ b/drivers/net/gve/base/gve_desc_dqo.h
@@ -9,6 +9,8 @@
 #ifndef _GVE_DESC_DQO_H_
 #define _GVE_DESC_DQO_H_
 
+#include "gve_osdep.h"
+
 #define GVE_TX_MAX_HDR_SIZE_DQO 255
 #define GVE_TX_MIN_TSO_MSS_DQO 88
 
diff --git a/drivers/net/gve/base/gve_osdep.h b/drivers/net/gve/base/gve_osdep.h
new file mode 100644
index 0000000000..7cb73002f4
--- /dev/null
+++ b/drivers/net/gve/base/gve_osdep.h
@@ -0,0 +1,159 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2022 Intel Corporation
+ */
+
+#ifndef _GVE_OSDEP_H_
+#define _GVE_OSDEP_H_
+
+#include <string.h>
+#include <stdint.h>
+#include <stdio.h>
+#include <stdarg.h>
+#include <inttypes.h>
+#include <stdbool.h>
+
+#include <rte_bitops.h>
+#include <rte_byteorder.h>
+#include <rte_common.h>
+#include <rte_ether.h>
+#include <rte_io.h>
+#include <rte_log.h>
+#include <rte_malloc.h>
+#include <rte_memcpy.h>
+#include <rte_memzone.h>
+
+#include "../gve_logs.h"
+
+typedef uint8_t u8;
+typedef uint16_t u16;
+typedef uint32_t u32;
+typedef uint64_t u64;
+
+typedef rte_be16_t __sum16;
+
+typedef rte_be16_t __be16;
+typedef rte_be32_t __be32;
+typedef rte_be64_t __be64;
+
+typedef rte_iova_t dma_addr_t;
+
+#define ETH_MIN_MTU	RTE_ETHER_MIN_MTU
+#define ETH_ALEN	RTE_ETHER_ADDR_LEN
+
+#ifndef PAGE_SHIFT
+#define PAGE_SHIFT	12
+#endif
+#ifndef PAGE_SIZE
+#define PAGE_SIZE	(1UL << PAGE_SHIFT)
+#endif
+
+#define BIT(nr)		RTE_BIT32(nr)
+
+#define be16_to_cpu(x) rte_be_to_cpu_16(x)
+#define be32_to_cpu(x) rte_be_to_cpu_32(x)
+#define be64_to_cpu(x) rte_be_to_cpu_64(x)
+
+#define cpu_to_be16(x) rte_cpu_to_be_16(x)
+#define cpu_to_be32(x) rte_cpu_to_be_32(x)
+#define cpu_to_be64(x) rte_cpu_to_be_64(x)
+
+#define READ_ONCE32(x) rte_read32(&(x))
+
+#ifndef ____cacheline_aligned
+#define ____cacheline_aligned	__rte_cache_aligned
+#endif
+#ifndef __packed
+#define __packed		__rte_packed
+#endif
+#define __iomem
+
+#define msleep(ms)		rte_delay_ms(ms)
+
+/* These macros are used to generate compilation errors if a struct/union
+ * is not exactly the correct length. It gives a divide by zero error if
+ * the struct/union is not of the correct size, otherwise it creates an
+ * enum that is never used.
+ */
+#define GVE_CHECK_STRUCT_LEN(n, X) enum gve_static_assert_enum_##X \
+	{ gve_static_assert_##X = (n) / ((sizeof(struct X) == (n)) ? 1 : 0) }
+#define GVE_CHECK_UNION_LEN(n, X) enum gve_static_asset_enum_##X \
+	{ gve_static_assert_##X = (n) / ((sizeof(union X) == (n)) ? 1 : 0) }
+
+static __rte_always_inline u8
+readb(volatile void *addr)
+{
+	return rte_read8(addr);
+}
+
+static __rte_always_inline void
+writeb(u8 value, volatile void *addr)
+{
+	rte_write8(value, addr);
+}
+
+static __rte_always_inline void
+writel(u32 value, volatile void *addr)
+{
+	rte_write32(value, addr);
+}
+
+static __rte_always_inline u32
+ioread32be(const volatile void *addr)
+{
+	return rte_be_to_cpu_32(rte_read32(addr));
+}
+
+static __rte_always_inline void
+iowrite32be(u32 value, volatile void *addr)
+{
+	writel(rte_cpu_to_be_32(value), addr);
+}
+
+/* DMA memory allocation tracking */
+struct gve_dma_mem {
+	void *va;
+	rte_iova_t pa;
+	uint32_t size;
+	const void *zone;
+};
+
+static inline void *
+gve_alloc_dma_mem(struct gve_dma_mem *mem, u64 size)
+{
+	static uint16_t gve_dma_memzone_id;
+	const struct rte_memzone *mz = NULL;
+	char z_name[RTE_MEMZONE_NAMESIZE];
+
+	if (!mem)
+		return NULL;
+
+	snprintf(z_name, sizeof(z_name), "gve_dma_%u",
+		 __atomic_fetch_add(&gve_dma_memzone_id, 1, __ATOMIC_RELAXED));
+	mz = rte_memzone_reserve_aligned(z_name, size, SOCKET_ID_ANY,
+					 RTE_MEMZONE_IOVA_CONTIG,
+					 PAGE_SIZE);
+	if (!mz)
+		return NULL;
+
+	mem->size = size;
+	mem->va = mz->addr;
+	mem->pa = mz->iova;
+	mem->zone = mz;
+	PMD_DRV_LOG(DEBUG, "memzone %s is allocated", mz->name);
+
+	return mem->va;
+}
+
+static inline void
+gve_free_dma_mem(struct gve_dma_mem *mem)
+{
+	PMD_DRV_LOG(DEBUG, "memzone %s to be freed",
+		    ((const struct rte_memzone *)mem->zone)->name);
+
+	rte_memzone_free(mem->zone);
+	mem->zone = NULL;
+	mem->va = NULL;
+	mem->pa = 0;
+}
+
+#endif /* _GVE_OSDEP_H_ */
diff --git a/drivers/net/gve/base/gve_register.h b/drivers/net/gve/base/gve_register.h
index b65f336be2..a599c1a08e 100644
--- a/drivers/net/gve/base/gve_register.h
+++ b/drivers/net/gve/base/gve_register.h
@@ -7,6 +7,8 @@
 #ifndef _GVE_REGISTER_H_
 #define _GVE_REGISTER_H_
 
+#include "gve_osdep.h"
+
 /* Fixed Configuration Registers */
 struct gve_registers {
 	__be32	device_status;
diff --git a/drivers/net/gve/gve_logs.h b/drivers/net/gve/gve_logs.h
new file mode 100644
index 0000000000..0d02da46e1
--- /dev/null
+++ b/drivers/net/gve/gve_logs.h
@@ -0,0 +1,14 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2022 Intel Corporation
+ */
+
+#ifndef _GVE_LOGS_H_
+#define _GVE_LOGS_H_
+
+extern int gve_logtype_driver;
+
+#define PMD_DRV_LOG(level, fmt, args...) \
+	rte_log(RTE_LOG_ ## level, gve_logtype_driver, "%s(): " fmt "\n", \
+		__func__, ## args)
+
+#endif
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 192+ messages in thread
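
The gve_alloc_dma_mem()/gve_free_dma_mem() helpers above back every DMA
region with a page-aligned, IOVA-contiguous memzone and record both the CPU
mapping and the IOVA. A minimal usage sketch, mirroring how the adminq base
code handles its DMA regions (the function name is illustrative and
gve_osdep.h is assumed to be included):

    static int gve_dma_roundtrip_example(void)
    {
            struct gve_dma_mem region;
            u64 *value;

            /* 'value' is the CPU mapping, region.pa is the address that
             * would be handed to the device.
             */
            value = gve_alloc_dma_mem(&region, sizeof(*value));
            if (value == NULL)
                    return -ENOMEM;

            *value = 0;
            /* ... program region.pa into an adminq command and execute ... */

            gve_free_dma_mem(&region);  /* releases the backing memzone */
            return 0;
    }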

* [PATCH v4 3/9] net/gve: add support for device initialization
  2022-09-27  7:32           ` [PATCH v4 0/9] introduce GVE PMD Junfeng Guo
  2022-09-27  7:32             ` [PATCH v4 1/9] net/gve/base: introduce GVE PMD base code Junfeng Guo
  2022-09-27  7:32             ` [PATCH v4 2/9] net/gve/base: add logs and OS specific implementation Junfeng Guo
@ 2022-09-27  7:32             ` Junfeng Guo
  2022-10-06 14:22               ` Ferruh Yigit
  2022-09-27  7:32             ` [PATCH v4 4/9] net/gve: add support for link update Junfeng Guo
                               ` (5 subsequent siblings)
  8 siblings, 1 reply; 192+ messages in thread
From: Junfeng Guo @ 2022-09-27  7:32 UTC (permalink / raw)
  To: qi.z.zhang, jingjing.wu
  Cc: ferruh.yigit, dev, xiaoyun.li, awogbemila, bruce.richardson,
	xueqin.lin, junfeng.guo, Haiyue Wang

Support device init and the following dev_ops:
- dev_configure
- dev_start
- dev_stop
- dev_close

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com>
Signed-off-by: Junfeng Guo <junfeng.guo@intel.com>
---
 MAINTAINERS                            |   6 +
 doc/guides/nics/features/gve.ini       |  10 +
 doc/guides/nics/gve.rst                |  69 +++++
 doc/guides/nics/index.rst              |   1 +
 doc/guides/rel_notes/release_22_11.rst |   5 +
 drivers/net/gve/base/gve_adminq.c      |   1 +
 drivers/net/gve/gve_ethdev.c           | 371 +++++++++++++++++++++++++
 drivers/net/gve/gve_ethdev.h           | 225 +++++++++++++++
 drivers/net/gve/meson.build            |  14 +
 drivers/net/gve/version.map            |   3 +
 drivers/net/meson.build                |   1 +
 11 files changed, 706 insertions(+)
 create mode 100644 doc/guides/nics/features/gve.ini
 create mode 100644 doc/guides/nics/gve.rst
 create mode 100644 drivers/net/gve/gve_ethdev.c
 create mode 100644 drivers/net/gve/gve_ethdev.h
 create mode 100644 drivers/net/gve/meson.build
 create mode 100644 drivers/net/gve/version.map

diff --git a/MAINTAINERS b/MAINTAINERS
index 32ffdd1a61..474f41f0de 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -697,6 +697,12 @@ F: drivers/net/enic/
 F: doc/guides/nics/enic.rst
 F: doc/guides/nics/features/enic.ini
 
+Google Virtual Ethernet
+M: Junfeng Guo <junfeng.guo@intel.com>
+F: drivers/net/gve/
+F: doc/guides/nics/gve.rst
+F: doc/guides/nics/features/gve.ini
+
 Hisilicon hns3
 M: Dongdong Liu <liudongdong3@huawei.com>
 M: Yisen Zhuang <yisen.zhuang@huawei.com>
diff --git a/doc/guides/nics/features/gve.ini b/doc/guides/nics/features/gve.ini
new file mode 100644
index 0000000000..44aec28009
--- /dev/null
+++ b/doc/guides/nics/features/gve.ini
@@ -0,0 +1,10 @@
+;
+; Supported features of the Google Virtual Ethernet 'gve' poll mode driver.
+;
+; Refer to default.ini for the full list of available PMD features.
+;
+[Features]
+Linux                = Y
+x86-32               = Y
+x86-64               = Y
+Usage doc            = Y
diff --git a/doc/guides/nics/gve.rst b/doc/guides/nics/gve.rst
new file mode 100644
index 0000000000..e93a0a6338
--- /dev/null
+++ b/doc/guides/nics/gve.rst
@@ -0,0 +1,69 @@
+..  SPDX-License-Identifier: BSD-3-Clause
+    Copyright(C) 2022 Intel Corporation.
+
+GVE poll mode driver
+=======================
+
+The GVE PMD (**librte_net_gve**) provides poll mode driver support for
+the Google Virtual Ethernet device.
+
+Please refer to https://cloud.google.com/compute/docs/networking/using-gvnic
+for the device description.
+
+The base code is under MIT license and based on GVE kernel driver v1.3.0.
+GVE base code files are:
+
+- gve_adminq.h
+- gve_adminq.c
+- gve_desc.h
+- gve_desc_dqo.h
+- gve_register.h
+- gve.h
+
+Please refer to https://github.com/GoogleCloudPlatform/compute-virtual-ethernet-linux/tree/v1.3.0/google/gve
+to find the original base code.
+
+GVE has 3 queue formats:
+
+- GQI_QPL - GQI with queue page list
+- GQI_RDA - GQI with raw DMA addressing
+- DQO_RDA - DQO with raw DMA addressing
+
+GQI_QPL queue format is the queue page list mode. The driver first needs
+to allocate memory and register it as a Queue Page List (QPL) with the
+hardware (Google Hypervisor/GVE Backend). Each queue has its own QPL.
+On Tx, the driver copies packets into QPL memory and writes each packet's
+offset within the QPL into the hardware descriptors so that the hardware
+can fetch the packet data. On Rx, the driver reads the offset from the
+descriptor, locates the buffer in the QPL and copies the packet data out.
+
+GQI_RDA queue format works like a typical NIC: the driver can put the
+packets' physical addresses directly into the hardware descriptors.
+
+DQO_RDA queue format has a submission/completion queue pair for each
+Tx/Rx queue. Similar to GQI_RDA, the driver puts the packets' physical
+addresses into the hardware descriptors.
+
+Please refer to https://www.kernel.org/doc/html/latest/networking/device_drivers/ethernet/google/gve.html
+to get more information about GVE queue formats.
+
+Features and Limitations
+------------------------
+
+In this release, the GVE PMD provides the basic functionality of packet
+reception and transmission.
+Supported features of the GVE PMD are:
+
+- Multiple queues for TX and RX
+- Receiver Side Scaling (RSS)
+- TSO offload
+- Port hardware statistics
+- Link state information
+- TX multi-segments (Scatter TX)
+- Tx UDP/TCP/SCTP Checksum
+
+Currently, only the GQI_QPL and GQI_RDA queue formats are supported in the
+PMD. Jumbo frames are not supported for now; support will be added in a
+future DPDK release.
+Also, only the GQI_QPL queue format is in use on GCP, since GQI_RDA has
+not been released in production.
diff --git a/doc/guides/nics/index.rst b/doc/guides/nics/index.rst
index f48e9f815c..64388adad0 100644
--- a/doc/guides/nics/index.rst
+++ b/doc/guides/nics/index.rst
@@ -29,6 +29,7 @@ Network Interface Controller Drivers
     enetfec
     enic
     fm10k
+    gve
     hinic
     hns3
     i40e
diff --git a/doc/guides/rel_notes/release_22_11.rst b/doc/guides/rel_notes/release_22_11.rst
index bb77a03e24..20d9dcaafd 100644
--- a/doc/guides/rel_notes/release_22_11.rst
+++ b/doc/guides/rel_notes/release_22_11.rst
@@ -59,6 +59,11 @@ New Features
 
   * Added flow subscription support.
 
+* **Added GVE net PMD**
+
+  * Added the new ``gve`` net driver for Google Virtual Ethernet devices.
+  * See the :doc:`../nics/gve` NIC guide for more details on this new driver.
+
 
 Removed Items
 -------------
diff --git a/drivers/net/gve/base/gve_adminq.c b/drivers/net/gve/base/gve_adminq.c
index 95ec6b015c..072fbee539 100644
--- a/drivers/net/gve/base/gve_adminq.c
+++ b/drivers/net/gve/base/gve_adminq.c
@@ -5,6 +5,7 @@
  * Copyright(C) 2022 Intel Corporation
  */
 
+#include "../gve_ethdev.h"
 #include "gve_adminq.h"
 #include "gve_register.h"
 
diff --git a/drivers/net/gve/gve_ethdev.c b/drivers/net/gve/gve_ethdev.c
new file mode 100644
index 0000000000..df698c1b02
--- /dev/null
+++ b/drivers/net/gve/gve_ethdev.c
@@ -0,0 +1,371 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2022 Intel Corporation
+ */
+#include <linux/pci_regs.h>
+
+#include "gve_ethdev.h"
+#include "base/gve_adminq.h"
+#include "base/gve_register.h"
+
+const char gve_version_str[] = GVE_VERSION;
+static const char gve_version_prefix[] = GVE_VERSION_PREFIX;
+
+static void
+gve_write_version(uint8_t *driver_version_register)
+{
+	const char *c = gve_version_prefix;
+
+	while (*c) {
+		writeb(*c, driver_version_register);
+		c++;
+	}
+
+	c = gve_version_str;
+	while (*c) {
+		writeb(*c, driver_version_register);
+		c++;
+	}
+	writeb('\n', driver_version_register);
+}
+
+static int
+gve_dev_configure(__rte_unused struct rte_eth_dev *dev)
+{
+	return 0;
+}
+
+static int
+gve_dev_start(struct rte_eth_dev *dev)
+{
+	dev->data->dev_started = 1;
+
+	return 0;
+}
+
+static int
+gve_dev_stop(struct rte_eth_dev *dev)
+{
+	dev->data->dev_link.link_status = RTE_ETH_LINK_DOWN;
+	dev->data->dev_started = 0;
+
+	return 0;
+}
+
+static int
+gve_dev_close(struct rte_eth_dev *dev)
+{
+	int err = 0;
+
+	if (dev->data->dev_started) {
+		err = gve_dev_stop(dev);
+		if (err != 0)
+			PMD_DRV_LOG(ERR, "Failed to stop dev.");
+	}
+
+	return err;
+}
+
+static const struct eth_dev_ops gve_eth_dev_ops = {
+	.dev_configure        = gve_dev_configure,
+	.dev_start            = gve_dev_start,
+	.dev_stop             = gve_dev_stop,
+	.dev_close            = gve_dev_close,
+};
+
+static void
+gve_free_counter_array(struct gve_priv *priv)
+{
+	rte_memzone_free(priv->cnt_array_mz);
+	priv->cnt_array = NULL;
+}
+
+static void
+gve_free_irq_db(struct gve_priv *priv)
+{
+	rte_memzone_free(priv->irq_dbs_mz);
+	priv->irq_dbs = NULL;
+}
+
+static void
+gve_teardown_device_resources(struct gve_priv *priv)
+{
+	int err;
+
+	/* Tell device its resources are being freed */
+	if (gve_get_device_resources_ok(priv)) {
+		err = gve_adminq_deconfigure_device_resources(priv);
+		if (err)
+			PMD_DRV_LOG(ERR, "Could not deconfigure device resources: err=%d", err);
+	}
+	gve_free_counter_array(priv);
+	gve_free_irq_db(priv);
+	gve_clear_device_resources_ok(priv);
+}
+
+static uint8_t
+pci_dev_find_capability(struct rte_pci_device *pdev, int cap)
+{
+	uint8_t pos, id;
+	uint16_t ent;
+	int loops;
+	int ret;
+
+	ret = rte_pci_read_config(pdev, &pos, sizeof(pos), PCI_CAPABILITY_LIST);
+	if (ret != sizeof(pos))
+		return 0;
+
+	loops = (PCI_CFG_SPACE_SIZE - PCI_STD_HEADER_SIZEOF) / PCI_CAP_SIZEOF;
+
+	while (pos && loops--) {
+		ret = rte_pci_read_config(pdev, &ent, sizeof(ent), pos);
+		if (ret != sizeof(ent))
+			return 0;
+
+		id = ent & 0xff;
+		if (id == 0xff)
+			break;
+
+		if (id == cap)
+			return pos;
+
+		pos = (ent >> 8);
+	}
+
+	return 0;
+}
+
+static int
+pci_dev_msix_vec_count(struct rte_pci_device *pdev)
+{
+	uint8_t msix_cap = pci_dev_find_capability(pdev, PCI_CAP_ID_MSIX);
+	uint16_t control;
+	int ret;
+
+	if (!msix_cap)
+		return 0;
+
+	ret = rte_pci_read_config(pdev, &control, sizeof(control), msix_cap + PCI_MSIX_FLAGS);
+	if (ret != sizeof(control))
+		return 0;
+
+	return (control & PCI_MSIX_FLAGS_QSIZE) + 1;
+}
+
+static int
+gve_setup_device_resources(struct gve_priv *priv)
+{
+	char z_name[RTE_MEMZONE_NAMESIZE];
+	const struct rte_memzone *mz;
+	int err = 0;
+
+	snprintf(z_name, sizeof(z_name), "gve_%s_cnt_arr", priv->pci_dev->device.name);
+	mz = rte_memzone_reserve_aligned(z_name,
+					 priv->num_event_counters * sizeof(*priv->cnt_array),
+					 rte_socket_id(), RTE_MEMZONE_IOVA_CONTIG,
+					 PAGE_SIZE);
+	if (mz == NULL) {
+		PMD_DRV_LOG(ERR, "Could not alloc memzone for count array");
+		return -ENOMEM;
+	}
+	priv->cnt_array = (rte_be32_t *)mz->addr;
+	priv->cnt_array_mz = mz;
+
+	snprintf(z_name, sizeof(z_name), "gve_%s_irqmz", priv->pci_dev->device.name);
+	mz = rte_memzone_reserve_aligned(z_name,
+					 sizeof(*priv->irq_dbs) * (priv->num_ntfy_blks),
+					 rte_socket_id(), RTE_MEMZONE_IOVA_CONTIG,
+					 PAGE_SIZE);
+	if (mz == NULL) {
+		PMD_DRV_LOG(ERR, "Could not alloc memzone for irq_dbs");
+		err = -ENOMEM;
+		goto free_cnt_array;
+	}
+	priv->irq_dbs = (struct gve_irq_db *)mz->addr;
+	priv->irq_dbs_mz = mz;
+
+	err = gve_adminq_configure_device_resources(priv,
+						    priv->cnt_array_mz->iova,
+						    priv->num_event_counters,
+						    priv->irq_dbs_mz->iova,
+						    priv->num_ntfy_blks);
+	if (unlikely(err)) {
+		PMD_DRV_LOG(ERR, "Could not config device resources: err=%d", err);
+		goto free_irq_dbs;
+	}
+	return 0;
+
+free_irq_dbs:
+	gve_free_irq_db(priv);
+free_cnt_array:
+	gve_free_counter_array(priv);
+
+	return err;
+}
+
+static int
+gve_init_priv(struct gve_priv *priv, bool skip_describe_device)
+{
+	int num_ntfy;
+	int err;
+
+	/* Set up the adminq */
+	err = gve_adminq_alloc(priv);
+	if (err) {
+		PMD_DRV_LOG(ERR, "Failed to alloc admin queue: err=%d", err);
+		return err;
+	}
+
+	if (skip_describe_device)
+		goto setup_device;
+
+	/* Get the initial information we need from the device */
+	err = gve_adminq_describe_device(priv);
+	if (err) {
+		PMD_DRV_LOG(ERR, "Could not get device information: err=%d", err);
+		goto free_adminq;
+	}
+
+	num_ntfy = pci_dev_msix_vec_count(priv->pci_dev);
+	if (num_ntfy <= 0) {
+		PMD_DRV_LOG(ERR, "Could not count MSI-x vectors");
+		err = -EIO;
+		goto free_adminq;
+	} else if (num_ntfy < GVE_MIN_MSIX) {
+		PMD_DRV_LOG(ERR, "GVE needs at least %d MSI-x vectors, but only has %d",
+			    GVE_MIN_MSIX, num_ntfy);
+		err = -EINVAL;
+		goto free_adminq;
+	}
+
+	priv->num_registered_pages = 0;
+
+	/* gvnic has one Notification Block per MSI-x vector, except for the
+	 * management vector
+	 */
+	priv->num_ntfy_blks = (num_ntfy - 1) & ~0x1;
+	priv->mgmt_msix_idx = priv->num_ntfy_blks;
+
+	priv->max_nb_txq = RTE_MIN(priv->max_nb_txq, priv->num_ntfy_blks / 2);
+	priv->max_nb_rxq = RTE_MIN(priv->max_nb_rxq, priv->num_ntfy_blks / 2);
+
+	if (priv->default_num_queues > 0) {
+		priv->max_nb_txq = RTE_MIN(priv->default_num_queues, priv->max_nb_txq);
+		priv->max_nb_rxq = RTE_MIN(priv->default_num_queues, priv->max_nb_rxq);
+	}
+
+	PMD_DRV_LOG(INFO, "Max TX queues %d, Max RX queues %d",
+		    priv->max_nb_txq, priv->max_nb_rxq);
+
+setup_device:
+	err = gve_setup_device_resources(priv);
+	if (!err)
+		return 0;
+free_adminq:
+	gve_adminq_free(priv);
+	return err;
+}
+
+static void
+gve_teardown_priv_resources(struct gve_priv *priv)
+{
+	gve_teardown_device_resources(priv);
+	gve_adminq_free(priv);
+}
+
+static int
+gve_dev_init(struct rte_eth_dev *eth_dev)
+{
+	struct gve_priv *priv = eth_dev->data->dev_private;
+	int max_tx_queues, max_rx_queues;
+	struct rte_pci_device *pci_dev;
+	struct gve_registers *reg_bar;
+	rte_be32_t *db_bar;
+	int err;
+
+	eth_dev->dev_ops = &gve_eth_dev_ops;
+
+	if (rte_eal_process_type() != RTE_PROC_PRIMARY)
+		return 0;
+
+	pci_dev = RTE_DEV_TO_PCI(eth_dev->device);
+
+	reg_bar = pci_dev->mem_resource[GVE_REG_BAR].addr;
+	if (!reg_bar) {
+		PMD_DRV_LOG(ERR, "Failed to map pci bar!");
+		return -ENOMEM;
+	}
+
+	db_bar = pci_dev->mem_resource[GVE_DB_BAR].addr;
+	if (!db_bar) {
+		PMD_DRV_LOG(ERR, "Failed to map doorbell bar!");
+		return -ENOMEM;
+	}
+
+	gve_write_version(&reg_bar->driver_version);
+	/* Get max queues to alloc etherdev */
+	max_tx_queues = ioread32be(&reg_bar->max_tx_queues);
+	max_rx_queues = ioread32be(&reg_bar->max_rx_queues);
+
+	priv->reg_bar0 = reg_bar;
+	priv->db_bar2 = db_bar;
+	priv->pci_dev = pci_dev;
+	priv->state_flags = 0x0;
+
+	priv->max_nb_txq = max_tx_queues;
+	priv->max_nb_rxq = max_rx_queues;
+
+	err = gve_init_priv(priv, false);
+	if (err)
+		return err;
+
+	eth_dev->data->mac_addrs = rte_zmalloc("gve_mac", sizeof(struct rte_ether_addr), 0);
+	if (!eth_dev->data->mac_addrs) {
+		PMD_DRV_LOG(ERR, "Failed to allocate memory to store mac address");
+		return -ENOMEM;
+	}
+	rte_ether_addr_copy(&priv->dev_addr, eth_dev->data->mac_addrs);
+
+	return 0;
+}
+
+static int
+gve_dev_uninit(struct rte_eth_dev *eth_dev)
+{
+	struct gve_priv *priv = eth_dev->data->dev_private;
+
+	eth_dev->data->mac_addrs = NULL;
+
+	gve_teardown_priv_resources(priv);
+
+	return 0;
+}
+
+static int
+gve_pci_probe(__rte_unused struct rte_pci_driver *pci_drv,
+	      struct rte_pci_device *pci_dev)
+{
+	return rte_eth_dev_pci_generic_probe(pci_dev, sizeof(struct gve_priv), gve_dev_init);
+}
+
+static int
+gve_pci_remove(struct rte_pci_device *pci_dev)
+{
+	return rte_eth_dev_pci_generic_remove(pci_dev, gve_dev_uninit);
+}
+
+static const struct rte_pci_id pci_id_gve_map[] = {
+	{ RTE_PCI_DEVICE(GOOGLE_VENDOR_ID, GVE_DEV_ID) },
+	{ .device_id = 0 },
+};
+
+static struct rte_pci_driver rte_gve_pmd = {
+	.id_table = pci_id_gve_map,
+	.drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC,
+	.probe = gve_pci_probe,
+	.remove = gve_pci_remove,
+};
+
+RTE_PMD_REGISTER_PCI(net_gve, rte_gve_pmd);
+RTE_PMD_REGISTER_PCI_TABLE(net_gve, pci_id_gve_map);
+RTE_PMD_REGISTER_KMOD_DEP(net_gve, "* igb_uio | vfio-pci");
+RTE_LOG_REGISTER_SUFFIX(gve_logtype_driver, driver, NOTICE);
diff --git a/drivers/net/gve/gve_ethdev.h b/drivers/net/gve/gve_ethdev.h
new file mode 100644
index 0000000000..2ac2a46ac1
--- /dev/null
+++ b/drivers/net/gve/gve_ethdev.h
@@ -0,0 +1,225 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2022 Intel Corporation
+ */
+
+#ifndef _GVE_ETHDEV_H_
+#define _GVE_ETHDEV_H_
+
+#include <ethdev_driver.h>
+#include <ethdev_pci.h>
+#include <rte_ether.h>
+
+#include "base/gve.h"
+
+#define GVE_DEFAULT_RX_FREE_THRESH  512
+#define GVE_DEFAULT_TX_FREE_THRESH  256
+#define GVE_TX_MAX_FREE_SZ          512
+
+#define GVE_MIN_BUF_SIZE	    1024
+#define GVE_MAX_RX_PKTLEN	    65535
+
+/* A list of pages registered with the device during setup and used by a queue
+ * as buffers
+ */
+struct gve_queue_page_list {
+	uint32_t id; /* unique id */
+	uint32_t num_entries;
+	dma_addr_t *page_buses; /* the dma addrs of the pages */
+	const struct rte_memzone *mz;
+};
+
+/* A TX desc ring entry */
+union gve_tx_desc {
+	struct gve_tx_pkt_desc pkt; /* first desc for a packet */
+	struct gve_tx_seg_desc seg; /* subsequent descs for a packet */
+};
+
+struct gve_tx_queue {
+	volatile union gve_tx_desc *tx_desc_ring;
+	const struct rte_memzone *mz;
+	uint64_t tx_ring_phys_addr;
+
+	uint16_t nb_tx_desc;
+
+	/* Only valid for GQI_QPL queue format */
+	struct gve_queue_page_list *qpl;
+
+	uint16_t port_id;
+	uint16_t queue_id;
+
+	uint16_t ntfy_id;
+	volatile rte_be32_t *ntfy_addr;
+
+	struct gve_priv *hw;
+	const struct rte_memzone *qres_mz;
+	struct gve_queue_resources *qres;
+
+	/* Only valid for DQO_RDA queue format */
+	struct gve_tx_queue *complq;
+};
+
+struct gve_rx_queue {
+	volatile struct gve_rx_desc *rx_desc_ring;
+	volatile union gve_rx_data_slot *rx_data_ring;
+	const struct rte_memzone *mz;
+	const struct rte_memzone *data_mz;
+	uint64_t rx_ring_phys_addr;
+
+	uint16_t nb_rx_desc;
+
+	volatile rte_be32_t *ntfy_addr;
+
+	/* only valid for GQI_QPL queue format */
+	struct gve_queue_page_list *qpl;
+
+	struct gve_priv *hw;
+	const struct rte_memzone *qres_mz;
+	struct gve_queue_resources *qres;
+
+	uint16_t port_id;
+	uint16_t queue_id;
+	uint16_t ntfy_id;
+	uint16_t rx_buf_len;
+
+	/* Only valid for DQO_RDA queue format */
+	struct gve_rx_queue *bufq;
+};
+
+struct gve_priv {
+	struct gve_irq_db *irq_dbs; /* array of num_ntfy_blks */
+	const struct rte_memzone *irq_dbs_mz;
+	uint32_t mgmt_msix_idx;
+	rte_be32_t *cnt_array; /* array of num_event_counters */
+	const struct rte_memzone *cnt_array_mz;
+
+	uint16_t num_event_counters;
+	uint16_t tx_desc_cnt; /* txq size */
+	uint16_t rx_desc_cnt; /* rxq size */
+	uint16_t tx_pages_per_qpl; /* tx buffer length */
+	uint16_t rx_data_slot_cnt; /* rx buffer length */
+
+	/* Only valid for DQO_RDA queue format */
+	uint16_t tx_compq_size; /* tx completion queue size */
+	uint16_t rx_bufq_size; /* rx buff queue size */
+
+	uint64_t max_registered_pages;
+	uint64_t num_registered_pages; /* num pages registered with NIC */
+	uint16_t default_num_queues; /* default num queues to set up */
+	enum gve_queue_format queue_format; /* see enum gve_queue_format */
+	uint8_t enable_rsc;
+
+	uint16_t max_nb_txq;
+	uint16_t max_nb_rxq;
+	uint32_t num_ntfy_blks; /* split between TX and RX so must be even */
+
+	struct gve_registers __iomem *reg_bar0; /* see gve_register.h */
+	rte_be32_t __iomem *db_bar2; /* "array" of doorbells */
+	struct rte_pci_device *pci_dev;
+
+	/* Admin queue - see gve_adminq.h */
+	union gve_adminq_command *adminq;
+	struct gve_dma_mem adminq_dma_mem;
+	uint32_t adminq_mask; /* masks prod_cnt to adminq size */
+	uint32_t adminq_prod_cnt; /* free-running count of AQ cmds executed */
+	uint32_t adminq_cmd_fail; /* free-running count of AQ cmds failed */
+	uint32_t adminq_timeouts; /* free-running count of AQ cmds timeouts */
+	/* free-running count of per AQ cmd executed */
+	uint32_t adminq_describe_device_cnt;
+	uint32_t adminq_cfg_device_resources_cnt;
+	uint32_t adminq_register_page_list_cnt;
+	uint32_t adminq_unregister_page_list_cnt;
+	uint32_t adminq_create_tx_queue_cnt;
+	uint32_t adminq_create_rx_queue_cnt;
+	uint32_t adminq_destroy_tx_queue_cnt;
+	uint32_t adminq_destroy_rx_queue_cnt;
+	uint32_t adminq_dcfg_device_resources_cnt;
+	uint32_t adminq_set_driver_parameter_cnt;
+	uint32_t adminq_report_stats_cnt;
+	uint32_t adminq_report_link_speed_cnt;
+	uint32_t adminq_get_ptype_map_cnt;
+
+	volatile uint32_t state_flags;
+
+	/* Gvnic device link speed from hypervisor. */
+	uint64_t link_speed;
+
+	uint16_t max_mtu;
+	struct rte_ether_addr dev_addr; /* mac address */
+
+	struct gve_queue_page_list *qpl;
+
+	struct gve_tx_queue **txqs;
+	struct gve_rx_queue **rxqs;
+};
+
+static inline bool
+gve_is_gqi(struct gve_priv *priv)
+{
+	return priv->queue_format == GVE_GQI_RDA_FORMAT ||
+		priv->queue_format == GVE_GQI_QPL_FORMAT;
+}
+
+static inline bool
+gve_get_admin_queue_ok(struct gve_priv *priv)
+{
+	return !!rte_bit_relaxed_get32(GVE_PRIV_FLAGS_ADMIN_QUEUE_OK,
+				       &priv->state_flags);
+}
+
+static inline void
+gve_set_admin_queue_ok(struct gve_priv *priv)
+{
+	rte_bit_relaxed_set32(GVE_PRIV_FLAGS_ADMIN_QUEUE_OK,
+			      &priv->state_flags);
+}
+
+static inline void
+gve_clear_admin_queue_ok(struct gve_priv *priv)
+{
+	rte_bit_relaxed_clear32(GVE_PRIV_FLAGS_ADMIN_QUEUE_OK,
+				&priv->state_flags);
+}
+
+static inline bool
+gve_get_device_resources_ok(struct gve_priv *priv)
+{
+	return !!rte_bit_relaxed_get32(GVE_PRIV_FLAGS_DEVICE_RESOURCES_OK,
+				       &priv->state_flags);
+}
+
+static inline void
+gve_set_device_resources_ok(struct gve_priv *priv)
+{
+	rte_bit_relaxed_set32(GVE_PRIV_FLAGS_DEVICE_RESOURCES_OK,
+			      &priv->state_flags);
+}
+
+static inline void
+gve_clear_device_resources_ok(struct gve_priv *priv)
+{
+	rte_bit_relaxed_clear32(GVE_PRIV_FLAGS_DEVICE_RESOURCES_OK,
+				&priv->state_flags);
+}
+
+static inline bool
+gve_get_device_rings_ok(struct gve_priv *priv)
+{
+	return !!rte_bit_relaxed_get32(GVE_PRIV_FLAGS_DEVICE_RINGS_OK,
+				       &priv->state_flags);
+}
+
+static inline void
+gve_set_device_rings_ok(struct gve_priv *priv)
+{
+	rte_bit_relaxed_set32(GVE_PRIV_FLAGS_DEVICE_RINGS_OK,
+			      &priv->state_flags);
+}
+
+static inline void
+gve_clear_device_rings_ok(struct gve_priv *priv)
+{
+	rte_bit_relaxed_clear32(GVE_PRIV_FLAGS_DEVICE_RINGS_OK,
+				&priv->state_flags);
+}
+
+#endif /* _GVE_ETHDEV_H_ */
diff --git a/drivers/net/gve/meson.build b/drivers/net/gve/meson.build
new file mode 100644
index 0000000000..d8ec64b3a3
--- /dev/null
+++ b/drivers/net/gve/meson.build
@@ -0,0 +1,14 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(C) 2022 Intel Corporation
+
+if is_windows
+    build = false
+    reason = 'not supported on Windows'
+    subdir_done()
+endif
+
+sources = files(
+        'base/gve_adminq.c',
+        'gve_ethdev.c',
+)
+includes += include_directories('base')
diff --git a/drivers/net/gve/version.map b/drivers/net/gve/version.map
new file mode 100644
index 0000000000..c2e0723b4c
--- /dev/null
+++ b/drivers/net/gve/version.map
@@ -0,0 +1,3 @@
+DPDK_22 {
+	local: *;
+};
diff --git a/drivers/net/meson.build b/drivers/net/meson.build
index e35652fe63..f1a0ee2cef 100644
--- a/drivers/net/meson.build
+++ b/drivers/net/meson.build
@@ -23,6 +23,7 @@ drivers = [
         'enic',
         'failsafe',
         'fm10k',
+        'gve',
         'hinic',
         'hns3',
         'i40e',
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 192+ messages in thread
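
On the GQI_QPL description added to gve.rst above: a conceptual sketch of
the Tx bounce-copy it describes is below, assuming gve_ethdev.h and the
descriptor headers are included. This is not the PMD's actual datapath
(Rx/Tx arrive later in the series); the one-page-per-descriptor layout and
the helper name are simplifications for illustration only.

    static void gve_qpl_tx_sketch(struct gve_tx_queue *txq, uint16_t idx,
                                  const void *pkt, uint16_t len)
    {
            /* for this sketch, give each descriptor slot its own QPL page */
            uint64_t offset = (uint64_t)idx * PAGE_SIZE;
            void *dst = RTE_PTR_ADD(txq->qpl->mz->addr, offset);

            /* bounce-copy the packet into registered QPL memory */
            rte_memcpy(dst, pkt, len);

            /* the descriptor carries the offset inside the QPL,
             * not a physical address
             */
            txq->tx_desc_ring[idx].pkt = (struct gve_tx_pkt_desc) {
                    .type_flags = GVE_TXD_STD,
                    .desc_cnt = 1,
                    .len = rte_cpu_to_be_16(len),
                    .seg_len = rte_cpu_to_be_16(len),
                    .seg_addr = rte_cpu_to_be_64(offset),
            };
    }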

* [PATCH v4 4/9] net/gve: add support for link update
  2022-09-27  7:32           ` [PATCH v4 0/9] introduce GVE PMD Junfeng Guo
                               ` (2 preceding siblings ...)
  2022-09-27  7:32             ` [PATCH v4 3/9] net/gve: add support for device initialization Junfeng Guo
@ 2022-09-27  7:32             ` Junfeng Guo
  2022-10-06 14:23               ` Ferruh Yigit
  2022-09-27  7:32             ` [PATCH v4 5/9] net/gve: add support for MTU setting Junfeng Guo
                               ` (4 subsequent siblings)
  8 siblings, 1 reply; 192+ messages in thread
From: Junfeng Guo @ 2022-09-27  7:32 UTC (permalink / raw)
  To: qi.z.zhang, jingjing.wu
  Cc: ferruh.yigit, dev, xiaoyun.li, awogbemila, bruce.richardson,
	xueqin.lin, junfeng.guo

Support dev_ops link_update.

Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com>
Signed-off-by: Junfeng Guo <junfeng.guo@intel.com>
---
 doc/guides/nics/features/gve.ini |  2 ++
 drivers/net/gve/gve_ethdev.c     | 30 ++++++++++++++++++++++++++++++
 2 files changed, 32 insertions(+)

diff --git a/doc/guides/nics/features/gve.ini b/doc/guides/nics/features/gve.ini
index 44aec28009..d03e3ac89e 100644
--- a/doc/guides/nics/features/gve.ini
+++ b/doc/guides/nics/features/gve.ini
@@ -4,6 +4,8 @@
 ; Refer to default.ini for the full list of available PMD features.
 ;
 [Features]
+Speed capabilities   = Y
+Link status          = Y
 Linux                = Y
 x86-32               = Y
 x86-64               = Y
diff --git a/drivers/net/gve/gve_ethdev.c b/drivers/net/gve/gve_ethdev.c
index df698c1b02..43112d901a 100644
--- a/drivers/net/gve/gve_ethdev.c
+++ b/drivers/net/gve/gve_ethdev.c
@@ -34,10 +34,39 @@ gve_dev_configure(__rte_unused struct rte_eth_dev *dev)
 	return 0;
 }
 
+static int
+gve_link_update(struct rte_eth_dev *dev, __rte_unused int wait_to_complete)
+{
+	struct gve_priv *priv = dev->data->dev_private;
+	struct rte_eth_link link;
+	int err;
+
+	memset(&link, 0, sizeof(link));
+	link.link_duplex = RTE_ETH_LINK_FULL_DUPLEX;
+	link.link_autoneg = RTE_ETH_LINK_AUTONEG;
+
+	if (!dev->data->dev_started) {
+		link.link_status = RTE_ETH_LINK_DOWN;
+		link.link_speed = RTE_ETH_SPEED_NUM_NONE;
+	} else {
+		link.link_status = RTE_ETH_LINK_UP;
+		PMD_DRV_LOG(DEBUG, "Get link status from hw");
+		err = gve_adminq_report_link_speed(priv);
+		if (err) {
+			PMD_DRV_LOG(ERR, "Failed to get link speed.");
+			priv->link_speed = RTE_ETH_SPEED_NUM_UNKNOWN;
+		}
+		link.link_speed = priv->link_speed;
+	}
+
+	return rte_eth_linkstatus_set(dev, &link);
+}
+
 static int
 gve_dev_start(struct rte_eth_dev *dev)
 {
 	dev->data->dev_started = 1;
+	gve_link_update(dev, 0);
 
 	return 0;
 }
@@ -70,6 +99,7 @@ static const struct eth_dev_ops gve_eth_dev_ops = {
 	.dev_start            = gve_dev_start,
 	.dev_stop             = gve_dev_stop,
 	.dev_close            = gve_dev_close,
+	.link_update          = gve_link_update,
 };
 
 static void
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 192+ messages in thread
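
For completeness, a small application-side sketch of how the link state
reported by gve_link_update() is consumed; the helper name is illustrative
and the port id is whatever ethdev assigned to the gvNIC:

    #include <stdio.h>
    #include <rte_ethdev.h>

    static void gve_print_link_example(uint16_t port_id)
    {
            struct rte_eth_link link;

            /* rte_eth_link_get() lands in the PMD's .link_update callback */
            if (rte_eth_link_get(port_id, &link) != 0)
                    return;

            printf("port %u: link %s, %u Mbps\n", port_id,
                   link.link_status == RTE_ETH_LINK_UP ? "up" : "down",
                   link.link_speed);
    }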

* [PATCH v4 5/9] net/gve: add support for MTU setting
  2022-09-27  7:32           ` [PATCH v4 0/9] introduce GVE PMD Junfeng Guo
                               ` (3 preceding siblings ...)
  2022-09-27  7:32             ` [PATCH v4 4/9] net/gve: add support for link update Junfeng Guo
@ 2022-09-27  7:32             ` Junfeng Guo
  2022-09-27  7:32             ` [PATCH v4 6/9] net/gve: add support for queue operations Junfeng Guo
                               ` (3 subsequent siblings)
  8 siblings, 0 replies; 192+ messages in thread
From: Junfeng Guo @ 2022-09-27  7:32 UTC (permalink / raw)
  To: qi.z.zhang, jingjing.wu
  Cc: ferruh.yigit, dev, xiaoyun.li, awogbemila, bruce.richardson,
	xueqin.lin, junfeng.guo

Support dev_ops mtu_set.
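
A minimal usage sketch (not part of the patch), assuming the port is still
stopped, since the op below returns -EBUSY once the port is started;
set_mtu_then_start() and the value 1460 are only examples.

#include <rte_ethdev.h>

/* Hypothetical helper: apply the MTU before starting the port, since
 * this PMD rejects MTU changes while the port is running.
 */
static int
set_mtu_then_start(uint16_t port_id, uint16_t mtu)
{
	int ret;

	ret = rte_eth_dev_set_mtu(port_id, mtu); /* e.g. mtu = 1460 */
	if (ret != 0)
		return ret;
	return rte_eth_dev_start(port_id);
}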

Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com>
Signed-off-by: Junfeng Guo <junfeng.guo@intel.com>
---
 doc/guides/nics/features/gve.ini |  1 +
 drivers/net/gve/gve_ethdev.c     | 29 +++++++++++++++++++++++++++++
 2 files changed, 30 insertions(+)

diff --git a/doc/guides/nics/features/gve.ini b/doc/guides/nics/features/gve.ini
index d03e3ac89e..fbff0a5462 100644
--- a/doc/guides/nics/features/gve.ini
+++ b/doc/guides/nics/features/gve.ini
@@ -6,6 +6,7 @@
 [Features]
 Speed capabilities   = Y
 Link status          = Y
+MTU update           = Y
 Linux                = Y
 x86-32               = Y
 x86-64               = Y
diff --git a/drivers/net/gve/gve_ethdev.c b/drivers/net/gve/gve_ethdev.c
index 43112d901a..5bcc9ab3a0 100644
--- a/drivers/net/gve/gve_ethdev.c
+++ b/drivers/net/gve/gve_ethdev.c
@@ -94,12 +94,41 @@ gve_dev_close(struct rte_eth_dev *dev)
 	return err;
 }
 
+static int
+gve_dev_mtu_set(struct rte_eth_dev *dev, uint16_t mtu)
+{
+	struct gve_priv *priv = dev->data->dev_private;
+	int err;
+
+	if (mtu < RTE_ETHER_MIN_MTU || mtu > priv->max_mtu) {
+		PMD_DRV_LOG(ERR, "MIN MTU is %u MAX MTU is %u", RTE_ETHER_MIN_MTU, priv->max_mtu);
+		return -EINVAL;
+	}
+
+	/* MTU setting is forbidden while the port is started */
+	if (dev->data->dev_started) {
+		PMD_DRV_LOG(ERR, "Port must be stopped before configuration");
+		return -EBUSY;
+	}
+
+	dev->data->dev_conf.rxmode.mtu = mtu + RTE_ETHER_HDR_LEN;
+
+	err = gve_adminq_set_mtu(priv, mtu);
+	if (err) {
+		PMD_DRV_LOG(ERR, "Failed to set mtu as %u err = %d", mtu, err);
+		return err;
+	}
+
+	return 0;
+}
+
 static const struct eth_dev_ops gve_eth_dev_ops = {
 	.dev_configure        = gve_dev_configure,
 	.dev_start            = gve_dev_start,
 	.dev_stop             = gve_dev_stop,
 	.dev_close            = gve_dev_close,
 	.link_update          = gve_link_update,
+	.mtu_set              = gve_dev_mtu_set,
 };
 
 static void
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 192+ messages in thread

* [PATCH v4 6/9] net/gve: add support for queue operations
  2022-09-27  7:32           ` [PATCH v4 0/9] introduce GVE PMD Junfeng Guo
                               ` (4 preceding siblings ...)
  2022-09-27  7:32             ` [PATCH v4 5/9] net/gve: add support for MTU setting Junfeng Guo
@ 2022-09-27  7:32             ` Junfeng Guo
  2022-09-27  7:32             ` [PATCH v4 7/9] net/gve: add support for Rx/Tx Junfeng Guo
                               ` (2 subsequent siblings)
  8 siblings, 0 replies; 192+ messages in thread
From: Junfeng Guo @ 2022-09-27  7:32 UTC (permalink / raw)
  To: qi.z.zhang, jingjing.wu
  Cc: ferruh.yigit, dev, xiaoyun.li, awogbemila, bruce.richardson,
	xueqin.lin, junfeng.guo

Add support for queue operations:
- setup rx/tx queue
- release rx/tx queue
- start rx/tx queues
- stop rx/tx queues
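
For context, a sketch of the ethdev call order these ops plug into; the
queue and descriptor counts are placeholders (the PMD overrides nb_desc
with the hw value anyway) and setup_one_queue_pair() is a hypothetical
helper name.

#include <rte_ethdev.h>
#include <rte_lcore.h>
#include <rte_mempool.h>

/* Hypothetical helper: configure one Rx/Tx queue pair and start the port.
 * Queue creation on the admin queue happens later, in dev_start.
 */
static int
setup_one_queue_pair(uint16_t port_id, struct rte_mempool *mp)
{
	struct rte_eth_conf conf = {0};
	int ret;

	ret = rte_eth_dev_configure(port_id, 1, 1, &conf);
	if (ret != 0)
		return ret;
	ret = rte_eth_rx_queue_setup(port_id, 0, 512, rte_socket_id(), NULL, mp);
	if (ret != 0)
		return ret;
	ret = rte_eth_tx_queue_setup(port_id, 0, 512, rte_socket_id(), NULL);
	if (ret != 0)
		return ret;
	return rte_eth_dev_start(port_id);
}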

Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com>
Signed-off-by: Junfeng Guo <junfeng.guo@intel.com>
---
 drivers/net/gve/gve_ethdev.c | 206 +++++++++++++++++++++++++++++++++
 drivers/net/gve/gve_ethdev.h |  48 ++++++++
 drivers/net/gve/gve_rx.c     | 212 ++++++++++++++++++++++++++++++++++
 drivers/net/gve/gve_tx.c     | 214 +++++++++++++++++++++++++++++++++++
 drivers/net/gve/meson.build  |   2 +
 5 files changed, 682 insertions(+)
 create mode 100644 drivers/net/gve/gve_rx.c
 create mode 100644 drivers/net/gve/gve_tx.c

diff --git a/drivers/net/gve/gve_ethdev.c b/drivers/net/gve/gve_ethdev.c
index 5bcc9ab3a0..7a3695aec1 100644
--- a/drivers/net/gve/gve_ethdev.c
+++ b/drivers/net/gve/gve_ethdev.c
@@ -28,12 +28,111 @@ gve_write_version(uint8_t *driver_version_register)
 	writeb('\n', driver_version_register);
 }
 
+static int
+gve_alloc_queue_page_list(struct gve_priv *priv, uint32_t id, uint32_t pages)
+{
+	char z_name[RTE_MEMZONE_NAMESIZE];
+	struct gve_queue_page_list *qpl;
+	const struct rte_memzone *mz;
+	dma_addr_t page_bus;
+	uint32_t i;
+
+	if (priv->num_registered_pages + pages >
+	    priv->max_registered_pages) {
+		PMD_DRV_LOG(ERR, "Pages %" PRIu64 " > max registered pages %" PRIu64,
+			    priv->num_registered_pages + pages,
+			    priv->max_registered_pages);
+		return -EINVAL;
+	}
+	qpl = &priv->qpl[id];
+	snprintf(z_name, sizeof(z_name), "gve_%s_qpl%d", priv->pci_dev->device.name, id);
+	mz = rte_memzone_reserve_aligned(z_name, pages * PAGE_SIZE,
+					 rte_socket_id(),
+					 RTE_MEMZONE_IOVA_CONTIG, PAGE_SIZE);
+	if (mz == NULL) {
+		PMD_DRV_LOG(ERR, "Failed to alloc %s.", z_name);
+		return -ENOMEM;
+	}
+	qpl->page_buses = rte_zmalloc("qpl page buses", pages * sizeof(dma_addr_t), 0);
+	if (qpl->page_buses == NULL) {
+		PMD_DRV_LOG(ERR, "Failed to alloc qpl %u page buses", id);
+		return -ENOMEM;
+	}
+	page_bus = mz->iova;
+	for (i = 0; i < pages; i++) {
+		qpl->page_buses[i] = page_bus;
+		page_bus += PAGE_SIZE;
+	}
+	qpl->id = id;
+	qpl->mz = mz;
+	qpl->num_entries = pages;
+
+	priv->num_registered_pages += pages;
+
+	return 0;
+}
+
+static void
+gve_free_qpls(struct gve_priv *priv)
+{
+	uint16_t nb_txqs = priv->max_nb_txq;
+	uint16_t nb_rxqs = priv->max_nb_rxq;
+	uint32_t i;
+
+	for (i = 0; i < nb_txqs + nb_rxqs; i++) {
+		if (priv->qpl[i].mz != NULL)
+			rte_memzone_free(priv->qpl[i].mz);
+		if (priv->qpl[i].page_buses != NULL)
+			rte_free(priv->qpl[i].page_buses);
+	}
+
+	if (priv->qpl != NULL)
+		rte_free(priv->qpl);
+}
+
 static int
 gve_dev_configure(__rte_unused struct rte_eth_dev *dev)
 {
 	return 0;
 }
 
+static int
+gve_refill_pages(struct gve_rx_queue *rxq)
+{
+	struct rte_mbuf *nmb;
+	uint16_t i;
+	int diag;
+
+	diag = rte_pktmbuf_alloc_bulk(rxq->mpool, &rxq->sw_ring[0], rxq->nb_rx_desc);
+	if (diag < 0) {
+		for (i = 0; i < rxq->nb_rx_desc - 1; i++) {
+			nmb = rte_pktmbuf_alloc(rxq->mpool);
+			if (!nmb)
+				break;
+			rxq->sw_ring[i] = nmb;
+		}
+		if (i < rxq->nb_rx_desc - 1)
+			return -ENOMEM;
+	}
+	rxq->nb_avail = 0;
+	rxq->next_avail = rxq->nb_rx_desc - 1;
+
+	for (i = 0; i < rxq->nb_rx_desc; i++) {
+		if (rxq->is_gqi_qpl) {
+			rxq->rx_data_ring[i].addr = rte_cpu_to_be_64(i * PAGE_SIZE);
+		} else {
+			if (i == rxq->nb_rx_desc - 1)
+				break;
+			nmb = rxq->sw_ring[i];
+			rxq->rx_data_ring[i].addr = rte_cpu_to_be_64(rte_mbuf_data_iova(nmb));
+		}
+	}
+
+	rte_write32(rte_cpu_to_be_32(rxq->next_avail), rxq->qrx_tail);
+
+	return 0;
+}
+
 static int
 gve_link_update(struct rte_eth_dev *dev, __rte_unused int wait_to_complete)
 {
@@ -65,16 +164,70 @@ gve_link_update(struct rte_eth_dev *dev, __rte_unused int wait_to_complete)
 static int
 gve_dev_start(struct rte_eth_dev *dev)
 {
+	uint16_t num_queues = dev->data->nb_tx_queues;
+	struct gve_priv *priv = dev->data->dev_private;
+	struct gve_tx_queue *txq;
+	struct gve_rx_queue *rxq;
+	uint16_t i;
+	int err;
+
+	priv->txqs = (struct gve_tx_queue **)dev->data->tx_queues;
+	err = gve_adminq_create_tx_queues(priv, num_queues);
+	if (err) {
+		PMD_DRV_LOG(ERR, "failed to create %u tx queues.", num_queues);
+		return err;
+	}
+	for (i = 0; i < num_queues; i++) {
+		txq = priv->txqs[i];
+		txq->qtx_tail =
+		&priv->db_bar2[rte_be_to_cpu_32(txq->qres->db_index)];
+		txq->qtx_head =
+		&priv->cnt_array[rte_be_to_cpu_32(txq->qres->counter_index)];
+
+		rte_write32(rte_cpu_to_be_32(GVE_IRQ_MASK), txq->ntfy_addr);
+	}
+
+	num_queues = dev->data->nb_rx_queues;
+	priv->rxqs = (struct gve_rx_queue **)dev->data->rx_queues;
+	err = gve_adminq_create_rx_queues(priv, num_queues);
+	if (err) {
+		PMD_DRV_LOG(ERR, "failed to create %u rx queues.", num_queues);
+		goto err_tx;
+	}
+	for (i = 0; i < num_queues; i++) {
+		rxq = priv->rxqs[i];
+		rxq->qrx_tail =
+		&priv->db_bar2[rte_be_to_cpu_32(rxq->qres->db_index)];
+
+		rte_write32(rte_cpu_to_be_32(GVE_IRQ_MASK), rxq->ntfy_addr);
+
+		err = gve_refill_pages(rxq);
+		if (err) {
+			PMD_DRV_LOG(ERR, "Failed to refill for RX");
+			goto err_rx;
+		}
+	}
+
 	dev->data->dev_started = 1;
 	gve_link_update(dev, 0);
 
 	return 0;
+
+err_rx:
+	gve_stop_rx_queues(dev);
+err_tx:
+	gve_stop_tx_queues(dev);
+	return err;
 }
 
 static int
 gve_dev_stop(struct rte_eth_dev *dev)
 {
 	dev->data->dev_link.link_status = RTE_ETH_LINK_DOWN;
+
+	gve_stop_tx_queues(dev);
+	gve_stop_rx_queues(dev);
+
 	dev->data->dev_started = 0;
 
 	return 0;
@@ -83,7 +236,11 @@ gve_dev_stop(struct rte_eth_dev *dev)
 static int
 gve_dev_close(struct rte_eth_dev *dev)
 {
+	struct gve_priv *priv = dev->data->dev_private;
+	struct gve_tx_queue *txq;
+	struct gve_rx_queue *rxq;
 	int err = 0;
+	uint16_t i;
 
 	if (dev->data->dev_started) {
 		err = gve_dev_stop(dev);
@@ -91,6 +248,21 @@ gve_dev_close(struct rte_eth_dev *dev)
 			PMD_DRV_LOG(ERR, "Failed to stop dev.");
 	}
 
+	for (i = 0; i < dev->data->nb_tx_queues; i++) {
+		txq = dev->data->tx_queues[i];
+		gve_tx_queue_release(txq);
+	}
+
+	for (i = 0; i < dev->data->nb_rx_queues; i++) {
+		rxq = dev->data->rx_queues[i];
+		gve_rx_queue_release(rxq);
+	}
+
+	gve_free_qpls(priv);
+	rte_free(priv->adminq);
+	rte_free(priv->qpl);
+	rte_free(priv);
+
 	return err;
 }
 
@@ -127,6 +299,8 @@ static const struct eth_dev_ops gve_eth_dev_ops = {
 	.dev_start            = gve_dev_start,
 	.dev_stop             = gve_dev_stop,
 	.dev_close            = gve_dev_close,
+	.rx_queue_setup       = gve_rx_queue_setup,
+	.tx_queue_setup       = gve_tx_queue_setup,
 	.link_update          = gve_link_update,
 	.mtu_set              = gve_dev_mtu_set,
 };
@@ -264,7 +438,9 @@ gve_setup_device_resources(struct gve_priv *priv)
 static int
 gve_init_priv(struct gve_priv *priv, bool skip_describe_device)
 {
+	uint16_t pages;
 	int num_ntfy;
+	uint32_t i;
 	int err;
 
 	/* Set up the adminq */
@@ -315,10 +491,40 @@ gve_init_priv(struct gve_priv *priv, bool skip_describe_device)
 	PMD_DRV_LOG(INFO, "Max TX queues %d, Max RX queues %d",
 		    priv->max_nb_txq, priv->max_nb_rxq);
 
+	/* In GQI_QPL queue format:
+	 * Allocate queue page lists according to max queue number
+	 * tx qpl id should start from 0 while rx qpl id should start
+	 * from priv->max_nb_txq
+	 */
+	if (priv->queue_format == GVE_GQI_QPL_FORMAT) {
+		priv->qpl = rte_zmalloc("gve_qpl",
+					(priv->max_nb_txq + priv->max_nb_rxq) *
+					sizeof(struct gve_queue_page_list), 0);
+		if (priv->qpl == NULL) {
+			PMD_DRV_LOG(ERR, "Failed to alloc qpl.");
+			err = -ENOMEM;
+			goto free_adminq;
+		}
+
+		for (i = 0; i < priv->max_nb_txq + priv->max_nb_rxq; i++) {
+			if (i < priv->max_nb_txq)
+				pages = priv->tx_pages_per_qpl;
+			else
+				pages = priv->rx_data_slot_cnt;
+			err = gve_alloc_queue_page_list(priv, i, pages);
+			if (err != 0) {
+				PMD_DRV_LOG(ERR, "Failed to alloc qpl %u.", i);
+				goto err_qpl;
+			}
+		}
+	}
+
 setup_device:
 	err = gve_setup_device_resources(priv);
 	if (!err)
 		return 0;
+err_qpl:
+	gve_free_qpls(priv);
 free_adminq:
 	gve_adminq_free(priv);
 	return err;
diff --git a/drivers/net/gve/gve_ethdev.h b/drivers/net/gve/gve_ethdev.h
index 2ac2a46ac1..b0391f7df5 100644
--- a/drivers/net/gve/gve_ethdev.h
+++ b/drivers/net/gve/gve_ethdev.h
@@ -34,15 +34,35 @@ union gve_tx_desc {
 	struct gve_tx_seg_desc seg; /* subsequent descs for a packet */
 };
 
+struct gve_tx_iovec {
+	uint32_t iov_base; /* offset in fifo */
+	uint32_t iov_len;
+};
+
 struct gve_tx_queue {
 	volatile union gve_tx_desc *tx_desc_ring;
 	const struct rte_memzone *mz;
 	uint64_t tx_ring_phys_addr;
+	struct rte_mbuf **sw_ring;
+	volatile rte_be32_t *qtx_tail;
+	volatile rte_be32_t *qtx_head;
 
+	uint32_t tx_tail;
 	uint16_t nb_tx_desc;
+	uint16_t nb_free;
+	uint32_t next_to_clean;
+	uint16_t free_thresh;
 
 	/* Only valid for DQO_QPL queue format */
+	uint16_t sw_tail;
+	uint16_t sw_ntc;
+	uint16_t sw_nb_free;
+	uint32_t fifo_size;
+	uint32_t fifo_head;
+	uint32_t fifo_avail;
+	uint64_t fifo_base;
 	struct gve_queue_page_list *qpl;
+	struct gve_tx_iovec *iov_ring;
 
 	uint16_t port_id;
 	uint16_t queue_id;
@@ -56,6 +76,8 @@ struct gve_tx_queue {
 
 	/* Only valid for DQO_RDA queue format */
 	struct gve_tx_queue *complq;
+
+	uint8_t is_gqi_qpl;
 };
 
 struct gve_rx_queue {
@@ -64,9 +86,17 @@ struct gve_rx_queue {
 	const struct rte_memzone *mz;
 	const struct rte_memzone *data_mz;
 	uint64_t rx_ring_phys_addr;
+	struct rte_mbuf **sw_ring;
+	struct rte_mempool *mpool;
 
+	uint16_t rx_tail;
 	uint16_t nb_rx_desc;
+	uint16_t expected_seqno; /* the next expected seqno */
+	uint16_t free_thresh;
+	uint32_t next_avail;
+	uint32_t nb_avail;
 
+	volatile rte_be32_t *qrx_tail;
 	volatile rte_be32_t *ntfy_addr;
 
 	/* only valid for GQI_QPL queue format */
@@ -83,6 +113,8 @@ struct gve_rx_queue {
 
 	/* Only valid for DQO_RDA queue format */
 	struct gve_rx_queue *bufq;
+
+	uint8_t is_gqi_qpl;
 };
 
 struct gve_priv {
@@ -222,4 +254,20 @@ gve_clear_device_rings_ok(struct gve_priv *priv)
 				&priv->state_flags);
 }
 
+int
+gve_rx_queue_setup(struct rte_eth_dev *dev, uint16_t queue_id, uint16_t nb_desc,
+		   unsigned int socket_id, const struct rte_eth_rxconf *conf,
+		   struct rte_mempool *pool);
+int
+gve_tx_queue_setup(struct rte_eth_dev *dev, uint16_t queue_id, uint16_t nb_desc,
+		   unsigned int socket_id, const struct rte_eth_txconf *conf);
+
+void gve_tx_queue_release(void *txq);
+
+void gve_rx_queue_release(void *rxq);
+
+void gve_stop_tx_queues(struct rte_eth_dev *dev);
+
+void gve_stop_rx_queues(struct rte_eth_dev *dev);
+
 #endif /* _GVE_ETHDEV_H_ */
diff --git a/drivers/net/gve/gve_rx.c b/drivers/net/gve/gve_rx.c
new file mode 100644
index 0000000000..e64a461253
--- /dev/null
+++ b/drivers/net/gve/gve_rx.c
@@ -0,0 +1,212 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2022 Intel Corporation
+ */
+
+#include "gve_ethdev.h"
+#include "base/gve_adminq.h"
+
+static inline void
+gve_reset_rxq(struct gve_rx_queue *rxq)
+{
+	struct rte_mbuf **sw_ring;
+	uint32_t size, i;
+
+	if (rxq == NULL) {
+		PMD_DRV_LOG(ERR, "pointer to rxq is NULL");
+		return;
+	}
+	sw_ring = rxq->sw_ring;
+	size = rxq->nb_rx_desc * sizeof(struct gve_rx_desc);
+	for (i = 0; i < size; i++)
+		((volatile char *)rxq->rx_desc_ring)[i] = 0;
+
+	size = rxq->nb_rx_desc * sizeof(union gve_rx_data_slot);
+	for (i = 0; i < size; i++)
+		((volatile char *)rxq->rx_data_ring)[i] = 0;
+
+	for (i = 0; i < rxq->nb_rx_desc; i++)
+		sw_ring[i] = NULL;
+
+	rxq->rx_tail = 0;
+	rxq->next_avail = 0;
+	rxq->nb_avail = rxq->nb_rx_desc;
+	rxq->expected_seqno = 1;
+}
+
+static inline void
+gve_release_rxq_mbufs(struct gve_rx_queue *rxq)
+{
+	uint16_t i;
+
+	for (i = 0; i < rxq->nb_rx_desc; i++) {
+		if (rxq->sw_ring[i]) {
+			rte_pktmbuf_free_seg(rxq->sw_ring[i]);
+			rxq->sw_ring[i] = NULL;
+		}
+	}
+
+	rxq->nb_avail = rxq->nb_rx_desc;
+}
+
+void
+gve_rx_queue_release(void *rxq)
+{
+	struct gve_rx_queue *q = rxq;
+
+	if (!q)
+		return;
+
+	if (q->is_gqi_qpl) {
+		gve_adminq_unregister_page_list(q->hw, q->qpl->id);
+		q->qpl = NULL;
+	}
+
+	gve_release_rxq_mbufs(q);
+	rte_free(q->sw_ring);
+	rte_memzone_free(q->data_mz);
+	rte_memzone_free(q->mz);
+	rte_memzone_free(q->qres_mz);
+	q->qres = NULL;
+	rte_free(q);
+}
+
+int
+gve_rx_queue_setup(struct rte_eth_dev *dev, uint16_t queue_id,
+		uint16_t nb_desc, unsigned int socket_id,
+		const struct rte_eth_rxconf *conf, struct rte_mempool *pool)
+{
+	struct gve_priv *hw = dev->data->dev_private;
+	const struct rte_memzone *mz;
+	struct gve_rx_queue *rxq;
+	uint16_t free_thresh;
+	int err = 0;
+
+	if (nb_desc != hw->rx_desc_cnt) {
+		PMD_DRV_LOG(WARNING, "gve doesn't support nb_desc config, use hw nb_desc %u.",
+			    hw->rx_desc_cnt);
+	}
+	nb_desc = hw->rx_desc_cnt;
+
+	/* Free memory if needed. */
+	if (dev->data->rx_queues[queue_id]) {
+		gve_rx_queue_release(dev->data->rx_queues[queue_id]);
+		dev->data->rx_queues[queue_id] = NULL;
+	}
+
+	/* Allocate the RX queue data structure. */
+	rxq = rte_zmalloc_socket("gve rxq",
+				 sizeof(struct gve_rx_queue),
+				 RTE_CACHE_LINE_SIZE,
+				 socket_id);
+	if (!rxq) {
+		PMD_DRV_LOG(ERR, "Failed to allocate memory for rx queue structure");
+		err = -ENOMEM;
+		goto err_rxq;
+	}
+
+	free_thresh = conf->rx_free_thresh ? conf->rx_free_thresh : GVE_DEFAULT_RX_FREE_THRESH;
+	if (free_thresh >= nb_desc) {
+		PMD_DRV_LOG(ERR, "rx_free_thresh (%u) must be less than nb_desc (%u).",
+			    free_thresh, nb_desc);
+		err = -EINVAL;
+		goto err_rxq;
+	}
+
+	rxq->nb_rx_desc = nb_desc;
+	rxq->free_thresh = free_thresh;
+	rxq->queue_id = queue_id;
+	rxq->port_id = dev->data->port_id;
+	rxq->ntfy_id = hw->num_ntfy_blks / 2 + queue_id;
+	rxq->is_gqi_qpl = hw->queue_format == GVE_GQI_QPL_FORMAT;
+	rxq->mpool = pool;
+	rxq->hw = hw;
+	rxq->ntfy_addr = &hw->db_bar2[rte_be_to_cpu_32(hw->irq_dbs[rxq->ntfy_id].id)];
+
+	rxq->rx_buf_len = rte_pktmbuf_data_room_size(rxq->mpool) - RTE_PKTMBUF_HEADROOM;
+
+	/* Allocate software ring */
+	rxq->sw_ring = rte_zmalloc_socket("gve rx sw ring", sizeof(struct rte_mbuf *) * nb_desc,
+					  RTE_CACHE_LINE_SIZE, socket_id);
+	if (!rxq->sw_ring) {
+		PMD_DRV_LOG(ERR, "Failed to allocate memory for SW RX ring");
+		err = -ENOMEM;
+		goto err_rxq;
+	}
+
+	mz = rte_eth_dma_zone_reserve(dev, "rx_ring", queue_id,
+				      nb_desc * sizeof(struct gve_rx_desc),
+				      PAGE_SIZE, socket_id);
+	if (mz == NULL) {
+		PMD_DRV_LOG(ERR, "Failed to reserve DMA memory for RX");
+		err = -ENOMEM;
+		goto err_sw_ring;
+	}
+	rxq->rx_desc_ring = (struct gve_rx_desc *)mz->addr;
+	rxq->rx_ring_phys_addr = mz->iova;
+	rxq->mz = mz;
+
+	mz = rte_eth_dma_zone_reserve(dev, "gve rx data ring", queue_id,
+				      sizeof(union gve_rx_data_slot) * nb_desc,
+				      PAGE_SIZE, socket_id);
+	if (mz == NULL) {
+		PMD_DRV_LOG(ERR, "Failed to allocate memory for RX data ring");
+		err = -ENOMEM;
+		goto err_rx_ring;
+	}
+	rxq->rx_data_ring = (union gve_rx_data_slot *)mz->addr;
+	rxq->data_mz = mz;
+	if (rxq->is_gqi_qpl) {
+		rxq->qpl = &hw->qpl[rxq->ntfy_id];
+		err = gve_adminq_register_page_list(hw, rxq->qpl);
+		if (err != 0) {
+			PMD_DRV_LOG(ERR, "Failed to register qpl %u", queue_id);
+			goto err_data_ring;
+		}
+	}
+
+	mz = rte_eth_dma_zone_reserve(dev, "rxq_res", queue_id,
+				      sizeof(struct gve_queue_resources),
+				      PAGE_SIZE, socket_id);
+	if (mz == NULL) {
+		PMD_DRV_LOG(ERR, "Failed to reserve DMA memory for RX resource");
+		err = -ENOMEM;
+		goto err_data_ring;
+	}
+	rxq->qres = (struct gve_queue_resources *)mz->addr;
+	rxq->qres_mz = mz;
+
+	gve_reset_rxq(rxq);
+
+	dev->data->rx_queues[queue_id] = rxq;
+
+	return 0;
+
+err_data_ring:
+	rte_memzone_free(rxq->data_mz);
+err_rx_ring:
+	rte_memzone_free(rxq->mz);
+err_sw_ring:
+	rte_free(rxq->sw_ring);
+err_rxq:
+	rte_free(rxq);
+	return err;
+}
+
+void
+gve_stop_rx_queues(struct rte_eth_dev *dev)
+{
+	struct gve_priv *hw = dev->data->dev_private;
+	struct gve_rx_queue *rxq;
+	uint16_t i;
+	int err;
+
+	err = gve_adminq_destroy_rx_queues(hw, dev->data->nb_rx_queues);
+	if (err != 0)
+		PMD_DRV_LOG(WARNING, "failed to destroy rxqs");
+
+	for (i = 0; i < dev->data->nb_rx_queues; i++) {
+		rxq = dev->data->rx_queues[i];
+		gve_release_rxq_mbufs(rxq);
+		gve_reset_rxq(rxq);
+	}
+}
diff --git a/drivers/net/gve/gve_tx.c b/drivers/net/gve/gve_tx.c
new file mode 100644
index 0000000000..b706b62e71
--- /dev/null
+++ b/drivers/net/gve/gve_tx.c
@@ -0,0 +1,214 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2022 Intel Corporation
+ */
+
+#include "gve_ethdev.h"
+#include "base/gve_adminq.h"
+
+static inline void
+gve_reset_txq(struct gve_tx_queue *txq)
+{
+	struct rte_mbuf **sw_ring;
+	uint32_t size, i;
+
+	if (txq == NULL) {
+		PMD_DRV_LOG(ERR, "Pointer to txq is NULL");
+		return;
+	}
+	sw_ring = txq->sw_ring;
+	size = txq->nb_tx_desc * sizeof(union gve_tx_desc);
+	for (i = 0; i < size; i++)
+		((volatile char *)txq->tx_desc_ring)[i] = 0;
+
+	for (i = 0; i < txq->nb_tx_desc; i++) {
+		sw_ring[i] = NULL;
+		if (txq->is_gqi_qpl) {
+			txq->iov_ring[i].iov_base = 0;
+			txq->iov_ring[i].iov_len = 0;
+		}
+	}
+
+	txq->tx_tail = 0;
+	txq->nb_free = txq->nb_tx_desc - 1;
+	txq->next_to_clean = 0;
+
+	if (txq->is_gqi_qpl) {
+		txq->fifo_size = PAGE_SIZE * txq->hw->tx_pages_per_qpl;
+		txq->fifo_avail = txq->fifo_size;
+		txq->fifo_head = 0;
+		txq->fifo_base = (uint64_t)(txq->qpl->mz->addr);
+
+		txq->sw_tail = 0;
+		txq->sw_nb_free = txq->nb_tx_desc - 1;
+		txq->sw_ntc = 0;
+	}
+}
+
+static inline void
+gve_release_txq_mbufs(struct gve_tx_queue *txq)
+{
+	uint16_t i;
+
+	for (i = 0; i < txq->nb_tx_desc; i++) {
+		if (txq->sw_ring[i]) {
+			rte_pktmbuf_free_seg(txq->sw_ring[i]);
+			txq->sw_ring[i] = NULL;
+		}
+	}
+}
+
+void
+gve_tx_queue_release(void *txq)
+{
+	struct gve_tx_queue *q = txq;
+
+	if (!q)
+		return;
+
+	if (q->is_gqi_qpl) {
+		gve_adminq_unregister_page_list(q->hw, q->qpl->id);
+		rte_free(q->iov_ring);
+		q->qpl = NULL;
+	}
+
+	gve_release_txq_mbufs(q);
+	rte_free(q->sw_ring);
+	rte_memzone_free(q->mz);
+	rte_memzone_free(q->qres_mz);
+	q->qres = NULL;
+	rte_free(q);
+}
+
+int
+gve_tx_queue_setup(struct rte_eth_dev *dev, uint16_t queue_id, uint16_t nb_desc,
+		   unsigned int socket_id, const struct rte_eth_txconf *conf)
+{
+	struct gve_priv *hw = dev->data->dev_private;
+	const struct rte_memzone *mz;
+	struct gve_tx_queue *txq;
+	uint16_t free_thresh;
+	int err = 0;
+
+	if (nb_desc != hw->tx_desc_cnt) {
+		PMD_DRV_LOG(WARNING, "gve doesn't support nb_desc config, use hw nb_desc %u.",
+			    hw->tx_desc_cnt);
+	}
+	nb_desc = hw->tx_desc_cnt;
+
+	/* Free memory if needed. */
+	if (dev->data->tx_queues[queue_id]) {
+		gve_tx_queue_release(dev->data->tx_queues[queue_id]);
+		dev->data->tx_queues[queue_id] = NULL;
+	}
+
+	/* Allocate the TX queue data structure. */
+	txq = rte_zmalloc_socket("gve txq", sizeof(struct gve_tx_queue),
+				 RTE_CACHE_LINE_SIZE, socket_id);
+	if (!txq) {
+		PMD_DRV_LOG(ERR, "Failed to allocate memory for tx queue structure");
+		err = -ENOMEM;
+		goto err_txq;
+	}
+
+	free_thresh = conf->tx_free_thresh ? conf->tx_free_thresh : GVE_DEFAULT_TX_FREE_THRESH;
+	if (free_thresh >= nb_desc - 3) {
+		PMD_DRV_LOG(ERR, "tx_free_thresh (%u) must be less than nb_desc (%u) minus 3.",
+			    free_thresh, nb_desc);
+		err = -EINVAL;
+		goto err_txq;
+	}
+
+	txq->nb_tx_desc = nb_desc;
+	txq->free_thresh = free_thresh;
+	txq->queue_id = queue_id;
+	txq->port_id = dev->data->port_id;
+	txq->ntfy_id = queue_id;
+	txq->is_gqi_qpl = hw->queue_format == GVE_GQI_QPL_FORMAT;
+	txq->hw = hw;
+	txq->ntfy_addr = &hw->db_bar2[rte_be_to_cpu_32(hw->irq_dbs[txq->ntfy_id].id)];
+
+	/* Allocate software ring */
+	txq->sw_ring = rte_zmalloc_socket("gve tx sw ring",
+					  sizeof(struct rte_mbuf *) * nb_desc,
+					  RTE_CACHE_LINE_SIZE, socket_id);
+	if (!txq->sw_ring) {
+		PMD_DRV_LOG(ERR, "Failed to allocate memory for SW TX ring");
+		err = -ENOMEM;
+		goto err_txq;
+	}
+
+	mz = rte_eth_dma_zone_reserve(dev, "tx_ring", queue_id,
+				      nb_desc * sizeof(union gve_tx_desc),
+				      PAGE_SIZE, socket_id);
+	if (mz == NULL) {
+		PMD_DRV_LOG(ERR, "Failed to reserve DMA memory for TX");
+		err = -ENOMEM;
+		goto err_sw_ring;
+	}
+	txq->tx_desc_ring = (union gve_tx_desc *)mz->addr;
+	txq->tx_ring_phys_addr = mz->iova;
+	txq->mz = mz;
+
+	if (txq->is_gqi_qpl) {
+		txq->iov_ring = rte_zmalloc_socket("gve tx iov ring",
+						   sizeof(struct gve_tx_iovec) * nb_desc,
+						   RTE_CACHE_LINE_SIZE, socket_id);
+		if (!txq->iov_ring) {
+			PMD_DRV_LOG(ERR, "Failed to allocate memory for TX iov ring");
+			err = -ENOMEM;
+			goto err_tx_ring;
+		}
+		txq->qpl = &hw->qpl[queue_id];
+		err = gve_adminq_register_page_list(hw, txq->qpl);
+		if (err != 0) {
+			PMD_DRV_LOG(ERR, "Failed to register qpl %u", queue_id);
+			goto err_iov_ring;
+		}
+	}
+
+	mz = rte_eth_dma_zone_reserve(dev, "txq_res", queue_id, sizeof(struct gve_queue_resources),
+				      PAGE_SIZE, socket_id);
+	if (mz == NULL) {
+		PMD_DRV_LOG(ERR, "Failed to reserve DMA memory for TX resource");
+		err = -ENOMEM;
+		goto err_iov_ring;
+	}
+	txq->qres = (struct gve_queue_resources *)mz->addr;
+	txq->qres_mz = mz;
+
+	gve_reset_txq(txq);
+
+	dev->data->tx_queues[queue_id] = txq;
+
+	return 0;
+
+err_iov_ring:
+	if (txq->is_gqi_qpl)
+		rte_free(txq->iov_ring);
+err_tx_ring:
+	rte_memzone_free(txq->mz);
+err_sw_ring:
+	rte_free(txq->sw_ring);
+err_txq:
+	rte_free(txq);
+	return err;
+}
+
+void
+gve_stop_tx_queues(struct rte_eth_dev *dev)
+{
+	struct gve_priv *hw = dev->data->dev_private;
+	struct gve_tx_queue *txq;
+	uint16_t i;
+	int err;
+
+	err = gve_adminq_destroy_tx_queues(hw, dev->data->nb_tx_queues);
+	if (err != 0)
+		PMD_DRV_LOG(WARNING, "failed to destroy txqs");
+
+	for (i = 0; i < dev->data->nb_tx_queues; i++) {
+		txq = dev->data->tx_queues[i];
+		gve_release_txq_mbufs(txq);
+		gve_reset_txq(txq);
+	}
+}
diff --git a/drivers/net/gve/meson.build b/drivers/net/gve/meson.build
index d8ec64b3a3..af0010c01c 100644
--- a/drivers/net/gve/meson.build
+++ b/drivers/net/gve/meson.build
@@ -9,6 +9,8 @@ endif
 
 sources = files(
         'base/gve_adminq.c',
+        'gve_rx.c',
+        'gve_tx.c',
         'gve_ethdev.c',
 )
 includes += include_directories('base')
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 192+ messages in thread

* [PATCH v4 7/9] net/gve: add support for Rx/Tx
  2022-09-27  7:32           ` [PATCH v4 0/9] introduce GVE PMD Junfeng Guo
                               ` (5 preceding siblings ...)
  2022-09-27  7:32             ` [PATCH v4 6/9] net/gve: add support for queue operations Junfeng Guo
@ 2022-09-27  7:32             ` Junfeng Guo
  2022-10-06 14:24               ` Ferruh Yigit
  2022-09-27  7:32             ` [PATCH v4 8/9] net/gve: add support for dev info get and dev configure Junfeng Guo
  2022-09-27  7:32             ` [PATCH v4 9/9] net/gve: add support for stats Junfeng Guo
  8 siblings, 1 reply; 192+ messages in thread
From: Junfeng Guo @ 2022-09-27  7:32 UTC (permalink / raw)
  To: qi.z.zhang, jingjing.wu
  Cc: ferruh.yigit, dev, xiaoyun.li, awogbemila, bruce.richardson,
	xueqin.lin, junfeng.guo

Add Rx/Tx support for the GQI_QPL and GQI_RDA queue formats.
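
For illustration only (not part of the patch), a minimal poll loop that
ends up in the gve_rx_burst()/gve_tx_burst() paths added here; echo_loop()
and the burst size are made-up names/values and error handling is omitted.

#include <rte_ethdev.h>
#include <rte_mbuf.h>

#define BURST_SZ 32

/* Hypothetical loop: receive a burst on queue 0 and send it back out. */
static void
echo_loop(uint16_t port_id)
{
	struct rte_mbuf *pkts[BURST_SZ];
	uint16_t nb_rx, nb_tx;

	for (;;) {
		nb_rx = rte_eth_rx_burst(port_id, 0, pkts, BURST_SZ);
		nb_tx = rte_eth_tx_burst(port_id, 0, pkts, nb_rx);
		/* Free any mbufs the Tx path did not accept. */
		while (nb_tx < nb_rx)
			rte_pktmbuf_free(pkts[nb_tx++]);
	}
}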

Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com>
Signed-off-by: Junfeng Guo <junfeng.guo@intel.com>
---
 doc/guides/nics/features/gve.ini |   2 +
 drivers/net/gve/gve_ethdev.c     |   5 +
 drivers/net/gve/gve_ethdev.h     |  16 ++
 drivers/net/gve/gve_rx.c         | 143 ++++++++++
 drivers/net/gve/gve_tx.c         | 455 +++++++++++++++++++++++++++++++
 5 files changed, 621 insertions(+)

diff --git a/doc/guides/nics/features/gve.ini b/doc/guides/nics/features/gve.ini
index fbff0a5462..38dc7024d6 100644
--- a/doc/guides/nics/features/gve.ini
+++ b/doc/guides/nics/features/gve.ini
@@ -7,6 +7,8 @@
 Speed capabilities   = Y
 Link status          = Y
 MTU update           = Y
+TSO                  = Y
+L4 checksum offload  = Y
 Linux                = Y
 x86-32               = Y
 x86-64               = Y
diff --git a/drivers/net/gve/gve_ethdev.c b/drivers/net/gve/gve_ethdev.c
index 7a3695aec1..0aae447b9b 100644
--- a/drivers/net/gve/gve_ethdev.c
+++ b/drivers/net/gve/gve_ethdev.c
@@ -583,6 +583,11 @@ gve_dev_init(struct rte_eth_dev *eth_dev)
 	if (err)
 		return err;
 
+	if (gve_is_gqi(priv)) {
+		eth_dev->rx_pkt_burst = gve_rx_burst;
+		eth_dev->tx_pkt_burst = gve_tx_burst;
+	}
+
 	eth_dev->data->mac_addrs = rte_zmalloc("gve_mac", sizeof(struct rte_ether_addr), 0);
 	if (!eth_dev->data->mac_addrs) {
 		PMD_DRV_LOG(ERR, "Failed to allocate memory to store mac address");
diff --git a/drivers/net/gve/gve_ethdev.h b/drivers/net/gve/gve_ethdev.h
index b0391f7df5..502ba88dc3 100644
--- a/drivers/net/gve/gve_ethdev.h
+++ b/drivers/net/gve/gve_ethdev.h
@@ -34,6 +34,18 @@ union gve_tx_desc {
 	struct gve_tx_seg_desc seg; /* subsequent descs for a packet */
 };
 
+/* Offload features */
+union gve_tx_offload {
+	uint64_t data;
+	struct {
+		uint64_t l2_len:7; /* L2 (MAC) Header Length. */
+		uint64_t l3_len:9; /* L3 (IP) Header Length. */
+		uint64_t l4_len:8; /* L4 Header Length. */
+		uint64_t tso_segsz:16; /* TCP TSO segment size */
+		/* uint64_t unused : 24; */
+	};
+};
+
 struct gve_tx_iovec {
 	uint32_t iov_base; /* offset in fifo */
 	uint32_t iov_len;
@@ -270,4 +282,8 @@ void gve_stop_tx_queues(struct rte_eth_dev *dev);
 
 void gve_stop_rx_queues(struct rte_eth_dev *dev);
 
+uint16_t gve_rx_burst(void *rxq, struct rte_mbuf **rx_pkts, uint16_t nb_pkts);
+
+uint16_t gve_tx_burst(void *txq, struct rte_mbuf **tx_pkts, uint16_t nb_pkts);
+
 #endif /* _GVE_ETHDEV_H_ */
diff --git a/drivers/net/gve/gve_rx.c b/drivers/net/gve/gve_rx.c
index e64a461253..3634a2762f 100644
--- a/drivers/net/gve/gve_rx.c
+++ b/drivers/net/gve/gve_rx.c
@@ -5,6 +5,149 @@
 #include "gve_ethdev.h"
 #include "base/gve_adminq.h"
 
+static inline void
+gve_rx_refill(struct gve_rx_queue *rxq)
+{
+	uint16_t mask = rxq->nb_rx_desc - 1;
+	uint16_t idx = rxq->next_avail & mask;
+	uint32_t next_avail = rxq->next_avail;
+	uint16_t nb_alloc, i;
+	struct rte_mbuf *nmb;
+	int diag;
+
+	/* wrap around */
+	nb_alloc = rxq->nb_rx_desc - idx;
+	if (nb_alloc <= rxq->nb_avail) {
+		diag = rte_pktmbuf_alloc_bulk(rxq->mpool, &rxq->sw_ring[idx], nb_alloc);
+		if (diag < 0) {
+			for (i = 0; i < nb_alloc; i++) {
+				nmb = rte_pktmbuf_alloc(rxq->mpool);
+				if (!nmb)
+					break;
+				rxq->sw_ring[idx + i] = nmb;
+			}
+			if (i != nb_alloc)
+				nb_alloc = i;
+		}
+		rxq->nb_avail -= nb_alloc;
+		next_avail += nb_alloc;
+
+		/* queue page list mode doesn't need real refill. */
+		if (rxq->is_gqi_qpl) {
+			idx += nb_alloc;
+		} else {
+			for (i = 0; i < nb_alloc; i++) {
+				nmb = rxq->sw_ring[idx];
+				rxq->rx_data_ring[idx].addr =
+					rte_cpu_to_be_64(rte_mbuf_data_iova(nmb));
+				idx++;
+			}
+		}
+		if (idx == rxq->nb_rx_desc)
+			idx = 0;
+	}
+
+	if (rxq->nb_avail > 0) {
+		nb_alloc = rxq->nb_avail;
+		if (rxq->nb_rx_desc < idx + rxq->nb_avail)
+			nb_alloc = rxq->nb_rx_desc - idx;
+		diag = rte_pktmbuf_alloc_bulk(rxq->mpool, &rxq->sw_ring[idx], nb_alloc);
+		if (diag < 0) {
+			for (i = 0; i < nb_alloc; i++) {
+				nmb = rte_pktmbuf_alloc(rxq->mpool);
+				if (!nmb)
+					break;
+				rxq->sw_ring[idx + i] = nmb;
+			}
+			nb_alloc = i;
+		}
+		rxq->nb_avail -= nb_alloc;
+		next_avail += nb_alloc;
+
+		if (!rxq->is_gqi_qpl) {
+			for (i = 0; i < nb_alloc; i++) {
+				nmb = rxq->sw_ring[idx];
+				rxq->rx_data_ring[idx].addr =
+					rte_cpu_to_be_64(rte_mbuf_data_iova(nmb));
+				idx++;
+			}
+		}
+	}
+
+	if (next_avail != rxq->next_avail) {
+		rte_write32(rte_cpu_to_be_32(next_avail), rxq->qrx_tail);
+		rxq->next_avail = next_avail;
+	}
+}
+
+uint16_t
+gve_rx_burst(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t nb_pkts)
+{
+	volatile struct gve_rx_desc *rxr, *rxd;
+	struct gve_rx_queue *rxq = rx_queue;
+	uint16_t rx_id = rxq->rx_tail;
+	struct rte_mbuf *rxe;
+	uint16_t nb_rx, len;
+	uint64_t addr;
+
+	rxr = rxq->rx_desc_ring;
+
+	for (nb_rx = 0; nb_rx < nb_pkts; nb_rx++) {
+		rxd = &rxr[rx_id];
+		if (GVE_SEQNO(rxd->flags_seq) != rxq->expected_seqno)
+			break;
+
+		if (rxd->flags_seq & GVE_RXF_ERR)
+			continue;
+
+		len = rte_be_to_cpu_16(rxd->len) - GVE_RX_PAD;
+		rxe = rxq->sw_ring[rx_id];
+		rxe->data_off = RTE_PKTMBUF_HEADROOM;
+		if (rxq->is_gqi_qpl) {
+			addr = (uint64_t)(rxq->qpl->mz->addr) + rx_id * PAGE_SIZE + GVE_RX_PAD;
+			rte_memcpy((void *)((size_t)rxe->buf_addr + rxe->data_off),
+				   (void *)(size_t)addr, len);
+		}
+		rxe->nb_segs = 1;
+		rxe->next = NULL;
+		rxe->pkt_len = len;
+		rxe->data_len = len;
+		rxe->port = rxq->port_id;
+		rxe->packet_type = 0;
+		rxe->ol_flags = 0;
+
+		if (rxd->flags_seq & GVE_RXF_TCP)
+			rxe->packet_type |= RTE_PTYPE_L4_TCP;
+		if (rxd->flags_seq & GVE_RXF_UDP)
+			rxe->packet_type |= RTE_PTYPE_L4_UDP;
+		if (rxd->flags_seq & GVE_RXF_IPV4)
+			rxe->packet_type |= RTE_PTYPE_L3_IPV4;
+		if (rxd->flags_seq & GVE_RXF_IPV6)
+			rxe->packet_type |= RTE_PTYPE_L3_IPV6;
+
+		if (gve_needs_rss(rxd->flags_seq)) {
+			rxe->ol_flags |= RTE_MBUF_F_RX_RSS_HASH;
+			rxe->hash.rss = rte_be_to_cpu_32(rxd->rss_hash);
+		}
+
+		rxq->expected_seqno = gve_next_seqno(rxq->expected_seqno);
+
+		rx_id++;
+		if (rx_id == rxq->nb_rx_desc)
+			rx_id = 0;
+
+		rx_pkts[nb_rx] = rxe;
+	}
+
+	rxq->nb_avail += nb_rx;
+	rxq->rx_tail = rx_id;
+
+	if (rxq->nb_avail > rxq->free_thresh)
+		gve_rx_refill(rxq);
+
+	return nb_rx;
+}
+
 static inline void
 gve_reset_rxq(struct gve_rx_queue *rxq)
 {
diff --git a/drivers/net/gve/gve_tx.c b/drivers/net/gve/gve_tx.c
index b706b62e71..d94b1186a4 100644
--- a/drivers/net/gve/gve_tx.c
+++ b/drivers/net/gve/gve_tx.c
@@ -5,6 +5,461 @@
 #include "gve_ethdev.h"
 #include "base/gve_adminq.h"
 
+static inline void
+gve_free_bulk_mbuf(struct rte_mbuf **txep, int num)
+{
+	struct rte_mbuf *m, *free[GVE_TX_MAX_FREE_SZ];
+	int nb_free = 0;
+	int i, s;
+
+	if (unlikely(num == 0))
+		return;
+
+	/* Find the 1st mbuf which needs to be freed */
+	for (s = 0; s < num; s++) {
+		if (txep[s] != NULL) {
+			m = rte_pktmbuf_prefree_seg(txep[s]);
+			if (m != NULL)
+				break;
+		}
+	}
+
+	if (s == num)
+		return;
+
+	free[0] = m;
+	nb_free = 1;
+	for (i = s + 1; i < num; i++) {
+		if (likely(txep[i] != NULL)) {
+			m = rte_pktmbuf_prefree_seg(txep[i]);
+			if (likely(m != NULL)) {
+				if (likely(m->pool == free[0]->pool)) {
+					free[nb_free++] = m;
+				} else {
+					rte_mempool_put_bulk(free[0]->pool, (void *)free, nb_free);
+					free[0] = m;
+					nb_free = 1;
+				}
+			}
+			txep[i] = NULL;
+		}
+	}
+	rte_mempool_put_bulk(free[0]->pool, (void **)free, nb_free);
+}
+
+static inline void
+gve_tx_clean(struct gve_tx_queue *txq)
+{
+	uint16_t mask = txq->nb_tx_desc - 1;
+	uint32_t start = txq->next_to_clean & mask;
+	uint32_t ntc, nb_clean, i;
+	struct gve_tx_iovec *iov;
+
+	ntc = rte_be_to_cpu_32(rte_read32(txq->qtx_head));
+	ntc = ntc & mask;
+
+	if (ntc == start)
+		return;
+
+	/* if wrap around, free twice. */
+	if (ntc < start) {
+		nb_clean = txq->nb_tx_desc - start;
+		if (nb_clean > GVE_TX_MAX_FREE_SZ)
+			nb_clean = GVE_TX_MAX_FREE_SZ;
+		if (txq->is_gqi_qpl) {
+			for (i = start; i < start + nb_clean; i++) {
+				iov = &txq->iov_ring[i];
+				txq->fifo_avail += iov->iov_len;
+				iov->iov_base = 0;
+				iov->iov_len = 0;
+			}
+		} else {
+			gve_free_bulk_mbuf(&txq->sw_ring[start], nb_clean);
+		}
+		txq->nb_free += nb_clean;
+		start += nb_clean;
+		if (start == txq->nb_tx_desc)
+			start = 0;
+		txq->next_to_clean += nb_clean;
+	}
+
+	if (ntc > start) {
+		nb_clean = ntc - start;
+		if (nb_clean > GVE_TX_MAX_FREE_SZ)
+			nb_clean = GVE_TX_MAX_FREE_SZ;
+		if (txq->is_gqi_qpl) {
+			for (i = start; i < start + nb_clean; i++) {
+				iov = &txq->iov_ring[i];
+				txq->fifo_avail += iov->iov_len;
+				iov->iov_base = 0;
+				iov->iov_len = 0;
+			}
+		} else {
+			gve_free_bulk_mbuf(&txq->sw_ring[start], nb_clean);
+		}
+		txq->nb_free += nb_clean;
+		txq->next_to_clean += nb_clean;
+	}
+}
+
+static inline void
+gve_tx_clean_swr_qpl(struct gve_tx_queue *txq)
+{
+	uint32_t start = txq->sw_ntc;
+	uint32_t ntc, nb_clean;
+
+	ntc = txq->sw_tail;
+
+	if (ntc == start)
+		return;
+
+	/* if wrap around, free twice. */
+	if (ntc < start) {
+		nb_clean = txq->nb_tx_desc - start;
+		if (nb_clean > GVE_TX_MAX_FREE_SZ)
+			nb_clean = GVE_TX_MAX_FREE_SZ;
+		gve_free_bulk_mbuf(&txq->sw_ring[start], nb_clean);
+
+		txq->sw_nb_free += nb_clean;
+		start += nb_clean;
+		if (start == txq->nb_tx_desc)
+			start = 0;
+		txq->sw_ntc = start;
+	}
+
+	if (ntc > start) {
+		nb_clean = ntc - start;
+		if (nb_clean > GVE_TX_MAX_FREE_SZ)
+			nb_clean = GVE_TX_MAX_FREE_SZ;
+		gve_free_bulk_mbuf(&txq->sw_ring[start], nb_clean);
+		txq->sw_nb_free += nb_clean;
+		start += nb_clean;
+		txq->sw_ntc = start;
+	}
+}
+
+static inline void
+gve_tx_fill_pkt_desc(volatile union gve_tx_desc *desc, struct rte_mbuf *mbuf,
+		     uint8_t desc_cnt, uint16_t len, uint64_t addr)
+{
+	uint64_t csum_l4 = mbuf->ol_flags & RTE_MBUF_F_TX_L4_MASK;
+	uint8_t l4_csum_offset = 0;
+	uint8_t l4_hdr_offset = 0;
+
+	if (mbuf->ol_flags & RTE_MBUF_F_TX_TCP_SEG)
+		csum_l4 |= RTE_MBUF_F_TX_TCP_CKSUM;
+
+	switch (csum_l4) {
+	case RTE_MBUF_F_TX_TCP_CKSUM:
+		l4_csum_offset = offsetof(struct rte_tcp_hdr, cksum);
+		l4_hdr_offset = mbuf->l2_len + mbuf->l3_len;
+		break;
+	case RTE_MBUF_F_TX_UDP_CKSUM:
+		l4_csum_offset = offsetof(struct rte_udp_hdr, dgram_cksum);
+		l4_hdr_offset = mbuf->l2_len + mbuf->l3_len;
+		break;
+	case RTE_MBUF_F_TX_SCTP_CKSUM:
+		l4_csum_offset = offsetof(struct rte_sctp_hdr, cksum);
+		l4_hdr_offset = mbuf->l2_len + mbuf->l3_len;
+		break;
+	}
+
+	if (mbuf->ol_flags & RTE_MBUF_F_TX_TCP_SEG) {
+		desc->pkt.type_flags = GVE_TXD_TSO | GVE_TXF_L4CSUM;
+		desc->pkt.l4_csum_offset = l4_csum_offset >> 1;
+		desc->pkt.l4_hdr_offset = l4_hdr_offset >> 1;
+	} else if (mbuf->ol_flags & RTE_MBUF_F_TX_L4_MASK) {
+		desc->pkt.type_flags = GVE_TXD_STD | GVE_TXF_L4CSUM;
+		desc->pkt.l4_csum_offset = l4_csum_offset >> 1;
+		desc->pkt.l4_hdr_offset = l4_hdr_offset >> 1;
+	} else {
+		desc->pkt.type_flags = GVE_TXD_STD;
+		desc->pkt.l4_csum_offset = 0;
+		desc->pkt.l4_hdr_offset = 0;
+	}
+	desc->pkt.desc_cnt = desc_cnt;
+	desc->pkt.len = rte_cpu_to_be_16(mbuf->pkt_len);
+	desc->pkt.seg_len = rte_cpu_to_be_16(len);
+	desc->pkt.seg_addr = rte_cpu_to_be_64(addr);
+}
+
+static inline void
+gve_tx_fill_seg_desc(volatile union gve_tx_desc *desc, uint64_t ol_flags,
+		      union gve_tx_offload tx_offload,
+		      uint16_t len, uint64_t addr)
+{
+	desc->seg.type_flags = GVE_TXD_SEG;
+	if (ol_flags & RTE_MBUF_F_TX_TCP_SEG) {
+		if (ol_flags & RTE_MBUF_F_TX_IPV6)
+			desc->seg.type_flags |= GVE_TXSF_IPV6;
+		desc->seg.l3_offset = tx_offload.l2_len >> 1;
+		desc->seg.mss = rte_cpu_to_be_16(tx_offload.tso_segsz);
+	}
+	desc->seg.seg_len = rte_cpu_to_be_16(len);
+	desc->seg.seg_addr = rte_cpu_to_be_64(addr);
+}
+
+static inline bool
+is_fifo_avail(struct gve_tx_queue *txq, uint16_t len)
+{
+	if (txq->fifo_avail < len)
+		return false;
+	/* Don't split segment. */
+	if (txq->fifo_head + len > txq->fifo_size &&
+	    txq->fifo_size - txq->fifo_head + len > txq->fifo_avail)
+		return false;
+	return true;
+}
+static inline uint64_t
+gve_tx_alloc_from_fifo(struct gve_tx_queue *txq, uint16_t tx_id, uint16_t len)
+{
+	uint32_t head = txq->fifo_head;
+	uint32_t size = txq->fifo_size;
+	struct gve_tx_iovec *iov;
+	uint32_t aligned_head;
+	uint32_t iov_len = 0;
+	uint64_t fifo_addr;
+
+	iov = &txq->iov_ring[tx_id];
+
+	/* Don't split segment */
+	if (head + len > size) {
+		iov_len += (size - head);
+		head = 0;
+	}
+
+	fifo_addr = head;
+	iov_len += len;
+	iov->iov_base = head;
+
+	/* Re-align to a cacheline for next head */
+	head += len;
+	aligned_head = RTE_ALIGN(head, RTE_CACHE_LINE_SIZE);
+	iov_len += (aligned_head - head);
+	iov->iov_len = iov_len;
+
+	if (aligned_head == txq->fifo_size)
+		aligned_head = 0;
+	txq->fifo_head = aligned_head;
+	txq->fifo_avail -= iov_len;
+
+	return fifo_addr;
+}
+
+static inline uint16_t
+gve_tx_burst_qpl(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
+{
+	union gve_tx_offload tx_offload = {0};
+	volatile union gve_tx_desc *txr, *txd;
+	struct gve_tx_queue *txq = tx_queue;
+	struct rte_mbuf **sw_ring = txq->sw_ring;
+	uint16_t mask = txq->nb_tx_desc - 1;
+	uint16_t tx_id = txq->tx_tail & mask;
+	uint64_t ol_flags, addr, fifo_addr;
+	uint32_t tx_tail = txq->tx_tail;
+	struct rte_mbuf *tx_pkt, *first;
+	uint16_t sw_id = txq->sw_tail;
+	uint16_t nb_used, i;
+	uint16_t nb_tx = 0;
+	uint32_t hlen;
+
+	txr = txq->tx_desc_ring;
+
+	if (txq->nb_free < txq->free_thresh || txq->fifo_avail == 0)
+		gve_tx_clean(txq);
+
+	if (txq->sw_nb_free < txq->free_thresh)
+		gve_tx_clean_swr_qpl(txq);
+
+	for (nb_tx = 0; nb_tx < nb_pkts; nb_tx++) {
+		tx_pkt = *tx_pkts++;
+		ol_flags = tx_pkt->ol_flags;
+
+		if (txq->sw_nb_free < tx_pkt->nb_segs) {
+			gve_tx_clean_swr_qpl(txq);
+			if (txq->sw_nb_free < tx_pkt->nb_segs)
+				goto end_of_tx;
+		}
+
+		/* Even for multi-segs, use 1 qpl buf for data */
+		nb_used = 1;
+		if (ol_flags & RTE_MBUF_F_TX_TCP_SEG)
+			nb_used++;
+
+		if (txq->nb_free < nb_used)
+			goto end_of_tx;
+
+		tx_offload.l2_len = tx_pkt->l2_len;
+		tx_offload.l3_len = tx_pkt->l3_len;
+		tx_offload.l4_len = tx_pkt->l4_len;
+		tx_offload.tso_segsz = tx_pkt->tso_segsz;
+
+		first = tx_pkt;
+		txd = &txr[tx_id];
+		hlen = ol_flags & RTE_MBUF_F_TX_TCP_SEG ?
+			(uint32_t)(tx_offload.l2_len + tx_offload.l3_len + tx_offload.l4_len) :
+			tx_pkt->pkt_len;
+
+		sw_ring[sw_id] = tx_pkt;
+		if (!is_fifo_avail(txq, hlen)) {
+			gve_tx_clean(txq);
+			if (!is_fifo_avail(txq, hlen))
+				goto end_of_tx;
+		}
+		addr = (uint64_t)(tx_pkt->buf_addr) + tx_pkt->data_off;
+		fifo_addr = gve_tx_alloc_from_fifo(txq, tx_id, hlen);
+
+		/* For TSO, check if there's enough fifo space for data first */
+		if (ol_flags & RTE_MBUF_F_TX_TCP_SEG) {
+			if (!is_fifo_avail(txq, tx_pkt->pkt_len - hlen)) {
+				gve_tx_clean(txq);
+				if (!is_fifo_avail(txq, tx_pkt->pkt_len - hlen))
+					goto end_of_tx;
+			}
+		}
+		if (tx_pkt->nb_segs == 1 || ol_flags & RTE_MBUF_F_TX_TCP_SEG)
+			rte_memcpy((void *)(size_t)(fifo_addr + txq->fifo_base),
+				   (void *)(size_t)addr, hlen);
+		else
+			rte_pktmbuf_read(tx_pkt, 0, hlen,
+					 (void *)(size_t)(fifo_addr + txq->fifo_base));
+		gve_tx_fill_pkt_desc(txd, tx_pkt, nb_used, hlen, fifo_addr);
+
+		if (ol_flags & RTE_MBUF_F_TX_TCP_SEG) {
+			tx_id = (tx_id + 1) & mask;
+			txd = &txr[tx_id];
+			addr = (uint64_t)(tx_pkt->buf_addr) + tx_pkt->data_off + hlen;
+			fifo_addr = gve_tx_alloc_from_fifo(txq, tx_id, tx_pkt->pkt_len - hlen);
+			if (tx_pkt->nb_segs == 1)
+				rte_memcpy((void *)(size_t)(fifo_addr + txq->fifo_base),
+					   (void *)(size_t)addr,
+					   tx_pkt->pkt_len - hlen);
+			else
+				rte_pktmbuf_read(tx_pkt, hlen, tx_pkt->pkt_len - hlen,
+						 (void *)(size_t)(fifo_addr + txq->fifo_base));
+
+			gve_tx_fill_seg_desc(txd, ol_flags, tx_offload,
+					     tx_pkt->pkt_len - hlen, fifo_addr);
+		}
+
+		/* record mbuf in sw_ring for free */
+		for (i = 1; i < first->nb_segs; i++) {
+			sw_id = (sw_id + 1) & mask;
+			tx_pkt = tx_pkt->next;
+			sw_ring[sw_id] = tx_pkt;
+		}
+
+		sw_id = (sw_id + 1) & mask;
+		tx_id = (tx_id + 1) & mask;
+
+		txq->nb_free -= nb_used;
+		txq->sw_nb_free -= first->nb_segs;
+		tx_tail += nb_used;
+	}
+
+end_of_tx:
+	if (nb_tx) {
+		rte_write32(rte_cpu_to_be_32(tx_tail), txq->qtx_tail);
+		txq->tx_tail = tx_tail;
+		txq->sw_tail = sw_id;
+	}
+
+	return nb_tx;
+}
+
+static inline uint16_t
+gve_tx_burst_ra(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
+{
+	union gve_tx_offload tx_offload = {0};
+	volatile union gve_tx_desc *txr, *txd;
+	struct gve_tx_queue *txq = tx_queue;
+	struct rte_mbuf **sw_ring = txq->sw_ring;
+	uint16_t mask = txq->nb_tx_desc - 1;
+	uint16_t tx_id = txq->tx_tail & mask;
+	uint32_t tx_tail = txq->tx_tail;
+	struct rte_mbuf *tx_pkt, *first;
+	uint16_t nb_used, hlen, i;
+	uint64_t ol_flags, addr;
+	uint16_t nb_tx = 0;
+
+	txr = txq->tx_desc_ring;
+
+	if (txq->nb_free < txq->free_thresh)
+		gve_tx_clean(txq);
+
+	for (nb_tx = 0; nb_tx < nb_pkts; nb_tx++) {
+		tx_pkt = *tx_pkts++;
+		ol_flags = tx_pkt->ol_flags;
+
+		nb_used = tx_pkt->nb_segs;
+		if (ol_flags & RTE_MBUF_F_TX_TCP_SEG)
+			nb_used++;
+
+		if (txq->nb_free < nb_used)
+			goto end_of_tx;
+
+		tx_offload.l2_len = tx_pkt->l2_len;
+		tx_offload.l3_len = tx_pkt->l3_len;
+		tx_offload.l4_len = tx_pkt->l4_len;
+		tx_offload.tso_segsz = tx_pkt->tso_segsz;
+
+		first = tx_pkt;
+		txd = &txr[tx_id];
+
+		hlen = ol_flags & RTE_MBUF_F_TX_TCP_SEG ?
+			(uint32_t)(tx_offload.l2_len + tx_offload.l3_len + tx_offload.l4_len) :
+			tx_pkt->pkt_len;
+		/*
+		 * if tso, the driver needs to fill 2 descs for 1 mbuf
+		 * so only put this mbuf into the 1st tx entry in sw ring
+		 */
+		sw_ring[tx_id] = tx_pkt;
+		addr = rte_mbuf_data_iova(tx_pkt);
+		gve_tx_fill_pkt_desc(txd, tx_pkt, nb_used, hlen, addr);
+
+		if (ol_flags & RTE_MBUF_F_TX_TCP_SEG) {
+			tx_id = (tx_id + 1) & mask;
+			txd = &txr[tx_id];
+			addr = rte_mbuf_data_iova(tx_pkt) + hlen;
+			gve_tx_fill_seg_desc(txd, ol_flags, tx_offload,
+					     tx_pkt->data_len - hlen, addr);
+		}
+
+		for (i = 1; i < first->nb_segs; i++) {
+			tx_id = (tx_id + 1) & mask;
+			txd = &txr[tx_id];
+			tx_pkt = tx_pkt->next;
+			sw_ring[tx_id] = tx_pkt;
+			addr = rte_mbuf_data_iova(tx_pkt);
+			gve_tx_fill_seg_desc(txd, ol_flags, tx_offload,
+					     tx_pkt->data_len, addr);
+		}
+		tx_id = (tx_id + 1) & mask;
+
+		txq->nb_free -= nb_used;
+		tx_tail += nb_used;
+	}
+
+end_of_tx:
+	if (nb_tx) {
+		rte_write32(rte_cpu_to_be_32(tx_tail), txq->qtx_tail);
+		txq->tx_tail = tx_tail;
+	}
+
+	return nb_tx;
+}
+
+uint16_t
+gve_tx_burst(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
+{
+	struct gve_tx_queue *txq = tx_queue;
+
+	if (txq->is_gqi_qpl)
+		return gve_tx_burst_qpl(tx_queue, tx_pkts, nb_pkts);
+
+	return gve_tx_burst_ra(tx_queue, tx_pkts, nb_pkts);
+}
+
 static inline void
 gve_reset_txq(struct gve_tx_queue *txq)
 {
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 192+ messages in thread

* [PATCH v4 8/9] net/gve: add support for dev info get and dev configure
  2022-09-27  7:32           ` [PATCH v4 0/9] introduce GVE PMD Junfeng Guo
                               ` (6 preceding siblings ...)
  2022-09-27  7:32             ` [PATCH v4 7/9] net/gve: add support for Rx/Tx Junfeng Guo
@ 2022-09-27  7:32             ` Junfeng Guo
  2022-10-06 14:25               ` Ferruh Yigit
  2022-09-27  7:32             ` [PATCH v4 9/9] net/gve: add support for stats Junfeng Guo
  8 siblings, 1 reply; 192+ messages in thread
From: Junfeng Guo @ 2022-09-27  7:32 UTC (permalink / raw)
  To: qi.z.zhang, jingjing.wu
  Cc: ferruh.yigit, dev, xiaoyun.li, awogbemila, bruce.richardson,
	xueqin.lin, junfeng.guo

Add dev_ops dev_infos_get.
Complete dev_configure with RX offloads configuration.
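
A hedged sketch (not part of the patch) of how an application could request
the Rx offloads this dev_configure now honours; configure_with_offloads()
is a hypothetical helper and LRO is only advertised for the DQO_RDA format.

#include <rte_ethdev.h>

/* Hypothetical helper: enable RSS hashing and, when the PMD offers it,
 * TCP LRO, then apply the configuration.
 */
static int
configure_with_offloads(uint16_t port_id, uint16_t nb_rxq, uint16_t nb_txq)
{
	struct rte_eth_dev_info info;
	struct rte_eth_conf conf = {0};
	int ret;

	ret = rte_eth_dev_info_get(port_id, &info);
	if (ret != 0)
		return ret;

	conf.rxmode.mq_mode = RTE_ETH_MQ_RX_RSS;
	if (info.rx_offload_capa & RTE_ETH_RX_OFFLOAD_TCP_LRO)
		conf.rxmode.offloads |= RTE_ETH_RX_OFFLOAD_TCP_LRO;

	return rte_eth_dev_configure(port_id, nb_rxq, nb_txq, &conf);
}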

Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com>
Signed-off-by: Junfeng Guo <junfeng.guo@intel.com>
---
 doc/guides/nics/features/gve.ini |  1 +
 drivers/net/gve/gve_ethdev.c     | 65 +++++++++++++++++++++++++++++++-
 2 files changed, 65 insertions(+), 1 deletion(-)

diff --git a/doc/guides/nics/features/gve.ini b/doc/guides/nics/features/gve.ini
index 38dc7024d6..cdc46b08a3 100644
--- a/doc/guides/nics/features/gve.ini
+++ b/doc/guides/nics/features/gve.ini
@@ -8,6 +8,7 @@ Speed capabilities   = Y
 Link status          = Y
 MTU update           = Y
 TSO                  = Y
+RSS hash             = Y
 L4 checksum offload  = Y
 Linux                = Y
 x86-32               = Y
diff --git a/drivers/net/gve/gve_ethdev.c b/drivers/net/gve/gve_ethdev.c
index 0aae447b9b..b9b8e51b02 100644
--- a/drivers/net/gve/gve_ethdev.c
+++ b/drivers/net/gve/gve_ethdev.c
@@ -91,8 +91,16 @@ gve_free_qpls(struct gve_priv *priv)
 }
 
 static int
-gve_dev_configure(__rte_unused struct rte_eth_dev *dev)
+gve_dev_configure(struct rte_eth_dev *dev)
 {
+	struct gve_priv *priv = dev->data->dev_private;
+
+	if (dev->data->dev_conf.rxmode.mq_mode & RTE_ETH_MQ_RX_RSS_FLAG)
+		dev->data->dev_conf.rxmode.offloads |= RTE_ETH_RX_OFFLOAD_RSS_HASH;
+
+	if (dev->data->dev_conf.rxmode.offloads & RTE_ETH_RX_OFFLOAD_TCP_LRO)
+		priv->enable_rsc = 1;
+
 	return 0;
 }
 
@@ -266,6 +274,60 @@ gve_dev_close(struct rte_eth_dev *dev)
 	return err;
 }
 
+static int
+gve_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
+{
+	struct gve_priv *priv = dev->data->dev_private;
+
+	dev_info->device = dev->device;
+	dev_info->max_mac_addrs = 1;
+	dev_info->max_rx_queues = priv->max_nb_rxq;
+	dev_info->max_tx_queues = priv->max_nb_txq;
+	dev_info->min_rx_bufsize = GVE_MIN_BUF_SIZE;
+	dev_info->max_rx_pktlen = GVE_MAX_RX_PKTLEN;
+	dev_info->max_mtu = RTE_ETHER_MTU;
+	dev_info->min_mtu = RTE_ETHER_MIN_MTU;
+
+	dev_info->rx_offload_capa = 0;
+	dev_info->tx_offload_capa =
+		RTE_ETH_TX_OFFLOAD_MULTI_SEGS	|
+		RTE_ETH_TX_OFFLOAD_IPV4_CKSUM	|
+		RTE_ETH_TX_OFFLOAD_UDP_CKSUM	|
+		RTE_ETH_TX_OFFLOAD_TCP_CKSUM	|
+		RTE_ETH_TX_OFFLOAD_SCTP_CKSUM	|
+		RTE_ETH_TX_OFFLOAD_TCP_TSO;
+
+	if (priv->queue_format == GVE_DQO_RDA_FORMAT)
+		dev_info->rx_offload_capa |= RTE_ETH_RX_OFFLOAD_TCP_LRO;
+
+	dev_info->default_rxconf = (struct rte_eth_rxconf) {
+		.rx_free_thresh = GVE_DEFAULT_RX_FREE_THRESH,
+		.rx_drop_en = 0,
+		.offloads = 0,
+	};
+
+	dev_info->default_txconf = (struct rte_eth_txconf) {
+		.tx_free_thresh = GVE_DEFAULT_TX_FREE_THRESH,
+		.offloads = 0,
+	};
+
+	dev_info->default_rxportconf.ring_size = priv->rx_desc_cnt;
+	dev_info->rx_desc_lim = (struct rte_eth_desc_lim) {
+		.nb_max = priv->rx_desc_cnt,
+		.nb_min = priv->rx_desc_cnt,
+		.nb_align = 1,
+	};
+
+	dev_info->default_txportconf.ring_size = priv->tx_desc_cnt;
+	dev_info->tx_desc_lim = (struct rte_eth_desc_lim) {
+		.nb_max = priv->tx_desc_cnt,
+		.nb_min = priv->tx_desc_cnt,
+		.nb_align = 1,
+	};
+
+	return 0;
+}
+
 static int
 gve_dev_mtu_set(struct rte_eth_dev *dev, uint16_t mtu)
 {
@@ -299,6 +361,7 @@ static const struct eth_dev_ops gve_eth_dev_ops = {
 	.dev_start            = gve_dev_start,
 	.dev_stop             = gve_dev_stop,
 	.dev_close            = gve_dev_close,
+	.dev_infos_get        = gve_dev_info_get,
 	.rx_queue_setup       = gve_rx_queue_setup,
 	.tx_queue_setup       = gve_tx_queue_setup,
 	.link_update          = gve_link_update,
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 192+ messages in thread

* [PATCH v4 9/9] net/gve: add support for stats
  2022-09-27  7:32           ` [PATCH v4 0/9] introduce GVE PMD Junfeng Guo
                               ` (7 preceding siblings ...)
  2022-09-27  7:32             ` [PATCH v4 8/9] net/gve: add support for dev info get and dev configure Junfeng Guo
@ 2022-09-27  7:32             ` Junfeng Guo
  2022-10-06 14:25               ` Ferruh Yigit
  8 siblings, 1 reply; 192+ messages in thread
From: Junfeng Guo @ 2022-09-27  7:32 UTC (permalink / raw)
  To: qi.z.zhang, jingjing.wu
  Cc: ferruh.yigit, dev, xiaoyun.li, awogbemila, bruce.richardson,
	xueqin.lin, junfeng.guo

Add support for dev_ops stats_get/reset to report and reset stats.
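
For illustration only, a sketch of reading and clearing the new counters
from an application; dump_and_reset_stats() is a hypothetical helper name.

#include <inttypes.h>
#include <stdio.h>
#include <rte_ethdev.h>

/* Hypothetical helper: print the basic counters this patch maintains,
 * then reset them.
 */
static void
dump_and_reset_stats(uint16_t port_id)
{
	struct rte_eth_stats stats;

	if (rte_eth_stats_get(port_id, &stats) == 0)
		printf("rx %" PRIu64 " pkts, tx %" PRIu64 " pkts, rx_nombuf %" PRIu64 "\n",
		       stats.ipackets, stats.opackets, stats.rx_nombuf);
	rte_eth_stats_reset(port_id);
}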

Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com>
Signed-off-by: Junfeng Guo <junfeng.guo@intel.com>
---
 doc/guides/nics/features/gve.ini |  2 +
 drivers/net/gve/gve_ethdev.c     | 71 ++++++++++++++++++++++++++++++++
 drivers/net/gve/gve_ethdev.h     | 11 +++++
 drivers/net/gve/gve_rx.c         | 15 ++++++-
 drivers/net/gve/gve_tx.c         | 13 ++++++
 5 files changed, 110 insertions(+), 2 deletions(-)

diff --git a/doc/guides/nics/features/gve.ini b/doc/guides/nics/features/gve.ini
index cdc46b08a3..180408aa80 100644
--- a/doc/guides/nics/features/gve.ini
+++ b/doc/guides/nics/features/gve.ini
@@ -10,6 +10,8 @@ MTU update           = Y
 TSO                  = Y
 RSS hash             = Y
 L4 checksum offload  = Y
+Basic stats          = Y
+Stats per queue      = Y
 Linux                = Y
 x86-32               = Y
 x86-64               = Y
diff --git a/drivers/net/gve/gve_ethdev.c b/drivers/net/gve/gve_ethdev.c
index b9b8e51b02..cd474b8128 100644
--- a/drivers/net/gve/gve_ethdev.c
+++ b/drivers/net/gve/gve_ethdev.c
@@ -328,6 +328,75 @@ gve_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
 	return 0;
 }
 
+static int
+gve_dev_stats_get(struct rte_eth_dev *dev, struct rte_eth_stats *stats)
+{
+	uint16_t i;
+
+	for (i = 0; i < dev->data->nb_tx_queues; i++) {
+		struct gve_tx_queue *txq = dev->data->tx_queues[i];
+		if (txq == NULL)
+			continue;
+
+		stats->opackets += txq->packets;
+		stats->obytes += txq->bytes;
+		stats->oerrors += txq->errors;
+
+		if (i < RTE_ETHDEV_QUEUE_STAT_CNTRS) {
+			stats->q_opackets[i] = txq->packets;
+			stats->q_obytes[i] = txq->bytes;
+		}
+	}
+
+	for (i = 0; i < dev->data->nb_rx_queues; i++) {
+		struct gve_rx_queue *rxq = dev->data->rx_queues[i];
+		if (rxq == NULL)
+			continue;
+
+		stats->ipackets += rxq->packets;
+		stats->ibytes += rxq->bytes;
+		stats->ierrors += rxq->errors;
+		stats->rx_nombuf += rxq->no_mbufs;
+
+		if (i < RTE_ETHDEV_QUEUE_STAT_CNTRS) {
+			stats->q_ipackets[i] = rxq->packets;
+			stats->q_ibytes[i] = rxq->bytes;
+			stats->q_errors[i] = rxq->errors;
+		}
+	}
+
+	return 0;
+}
+
+static int
+gve_dev_stats_reset(struct rte_eth_dev *dev)
+{
+	uint16_t i;
+
+	for (i = 0; i < dev->data->nb_tx_queues; i++) {
+		struct gve_tx_queue *txq = dev->data->tx_queues[i];
+		if (txq == NULL)
+			continue;
+
+		txq->packets  = 0;
+		txq->bytes = 0;
+		txq->errors = 0;
+	}
+
+	for (i = 0; i < dev->data->nb_rx_queues; i++) {
+		struct gve_rx_queue *rxq = dev->data->rx_queues[i];
+		if (rxq == NULL)
+			continue;
+
+		rxq->packets  = 0;
+		rxq->bytes = 0;
+		rxq->no_mbufs = 0;
+		rxq->errors = 0;
+	}
+
+	return 0;
+}
+
 static int
 gve_dev_mtu_set(struct rte_eth_dev *dev, uint16_t mtu)
 {
@@ -365,6 +434,8 @@ static const struct eth_dev_ops gve_eth_dev_ops = {
 	.rx_queue_setup       = gve_rx_queue_setup,
 	.tx_queue_setup       = gve_tx_queue_setup,
 	.link_update          = gve_link_update,
+	.stats_get            = gve_dev_stats_get,
+	.stats_reset          = gve_dev_stats_reset,
 	.mtu_set              = gve_dev_mtu_set,
 };
 
diff --git a/drivers/net/gve/gve_ethdev.h b/drivers/net/gve/gve_ethdev.h
index 502ba88dc3..7d9283f8fa 100644
--- a/drivers/net/gve/gve_ethdev.h
+++ b/drivers/net/gve/gve_ethdev.h
@@ -76,6 +76,11 @@ struct gve_tx_queue {
 	struct gve_queue_page_list *qpl;
 	struct gve_tx_iovec *iov_ring;
 
+	/* Stats */
+	uint64_t errors;
+	uint64_t packets;
+	uint64_t bytes;
+
 	uint16_t port_id;
 	uint16_t queue_id;
 
@@ -114,6 +119,12 @@ struct gve_rx_queue {
 	/* only valid for GQI_QPL queue format */
 	struct gve_queue_page_list *qpl;
 
+	/* stats */
+	uint64_t no_mbufs;
+	uint64_t errors;
+	uint64_t packets;
+	uint64_t bytes;
+
 	struct gve_priv *hw;
 	const struct rte_memzone *qres_mz;
 	struct gve_queue_resources *qres;
diff --git a/drivers/net/gve/gve_rx.c b/drivers/net/gve/gve_rx.c
index 3634a2762f..6928afe96e 100644
--- a/drivers/net/gve/gve_rx.c
+++ b/drivers/net/gve/gve_rx.c
@@ -26,8 +26,10 @@ gve_rx_refill(struct gve_rx_queue *rxq)
 					break;
 				rxq->sw_ring[idx + i] = nmb;
 			}
-			if (i != nb_alloc)
+			if (i != nb_alloc) {
+				rxq->no_mbufs += nb_alloc - i;
 				nb_alloc = i;
+			}
 		}
 		rxq->nb_avail -= nb_alloc;
 		next_avail += nb_alloc;
@@ -88,6 +90,7 @@ gve_rx_burst(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t nb_pkts)
 	uint16_t rx_id = rxq->rx_tail;
 	struct rte_mbuf *rxe;
 	uint16_t nb_rx, len;
+	uint64_t bytes = 0;
 	uint64_t addr;
 
 	rxr = rxq->rx_desc_ring;
@@ -97,8 +100,10 @@ gve_rx_burst(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t nb_pkts)
 		if (GVE_SEQNO(rxd->flags_seq) != rxq->expected_seqno)
 			break;
 
-		if (rxd->flags_seq & GVE_RXF_ERR)
+		if (rxd->flags_seq & GVE_RXF_ERR) {
+			rxq->errors++;
 			continue;
+		}
 
 		len = rte_be_to_cpu_16(rxd->len) - GVE_RX_PAD;
 		rxe = rxq->sw_ring[rx_id];
@@ -137,6 +142,7 @@ gve_rx_burst(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t nb_pkts)
 			rx_id = 0;
 
 		rx_pkts[nb_rx] = rxe;
+		bytes += len;
 	}
 
 	rxq->nb_avail += nb_rx;
@@ -145,6 +151,11 @@ gve_rx_burst(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t nb_pkts)
 	if (rxq->nb_avail > rxq->free_thresh)
 		gve_rx_refill(rxq);
 
+	if (nb_rx) {
+		rxq->packets += nb_rx;
+		rxq->bytes += bytes;
+	}
+
 	return nb_rx;
 }
 
diff --git a/drivers/net/gve/gve_tx.c b/drivers/net/gve/gve_tx.c
index d94b1186a4..0f2c3f8288 100644
--- a/drivers/net/gve/gve_tx.c
+++ b/drivers/net/gve/gve_tx.c
@@ -260,6 +260,7 @@ gve_tx_burst_qpl(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 	struct rte_mbuf *tx_pkt, *first;
 	uint16_t sw_id = txq->sw_tail;
 	uint16_t nb_used, i;
+	uint64_t bytes = 0;
 	uint16_t nb_tx = 0;
 	uint32_t hlen;
 
@@ -355,6 +356,8 @@ gve_tx_burst_qpl(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 		txq->nb_free -= nb_used;
 		txq->sw_nb_free -= first->nb_segs;
 		tx_tail += nb_used;
+
+		bytes += first->pkt_len;
 	}
 
 end_of_tx:
@@ -362,6 +365,10 @@ gve_tx_burst_qpl(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 		rte_write32(rte_cpu_to_be_32(tx_tail), txq->qtx_tail);
 		txq->tx_tail = tx_tail;
 		txq->sw_tail = sw_id;
+
+		txq->errors += nb_pkts - nb_tx;
+		txq->packets += nb_tx;
+		txq->bytes += bytes;
 	}
 
 	return nb_tx;
@@ -380,6 +387,7 @@ gve_tx_burst_ra(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 	struct rte_mbuf *tx_pkt, *first;
 	uint16_t nb_used, hlen, i;
 	uint64_t ol_flags, addr;
+	uint64_t bytes = 0;
 	uint16_t nb_tx = 0;
 
 	txr = txq->tx_desc_ring;
@@ -438,12 +446,17 @@ gve_tx_burst_ra(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 
 		txq->nb_free -= nb_used;
 		tx_tail += nb_used;
+
+		bytes += first->pkt_len;
 	}
 
 end_of_tx:
 	if (nb_tx) {
 		rte_write32(rte_cpu_to_be_32(tx_tail), txq->qtx_tail);
 		txq->tx_tail = tx_tail;
+
+		txq->packets += nb_tx;
+		txq->bytes += bytes;
 	}
 
 	return nb_tx;
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 192+ messages in thread

* Re: [PATCH v4 1/9] net/gve/base: introduce GVE PMD base code
  2022-09-27  7:32             ` [PATCH v4 1/9] net/gve/base: introduce GVE PMD base code Junfeng Guo
@ 2022-10-06 14:19               ` Ferruh Yigit
  2022-10-09  9:14                 ` Guo, Junfeng
  2022-10-10 10:17               ` [PATCH v5 0/8] introduce GVE PMD Junfeng Guo
  1 sibling, 1 reply; 192+ messages in thread
From: Ferruh Yigit @ 2022-10-06 14:19 UTC (permalink / raw)
  To: Junfeng Guo, qi.z.zhang, jingjing.wu
  Cc: ferruh.yigit, dev, xiaoyun.li, awogbemila, bruce.richardson,
	xueqin.lin, Haiyue Wang

On 9/27/2022 8:32 AM, Junfeng Guo wrote:

> 
> The following base code is based on Google Virtual Ethernet (gve)
> driver v1.3.0 under MIT license.
> - gve_adminq.c
> - gve_adminq.h
> - gve_desc.h
> - gve_desc_dqo.h
> - gve_register.h
> - gve.h
> 
> The original code is in:
> https://github.com/GoogleCloudPlatform/compute-virtual-ethernet-linux/\
> tree/v1.3.0/google/gve
> 
> Note that these code are not Intel files and they come from the kernel
> community. The base code there has the statement of
> SPDX-License-Identifier: (GPL-2.0 OR MIT). Here we just follow the
> required MIT license as an exception to DPDK.
> 
> Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com>
> Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
> Signed-off-by: Junfeng Guo <junfeng.guo@intel.com>

<...>

> +/* Process all device options for a given describe device call. */
> +static int
> +gve_process_device_options(struct gve_priv *priv,
> +                          struct gve_device_descriptor *descriptor,
> +                          struct gve_device_option_gqi_rda **dev_op_gqi_rda,
> +                          struct gve_device_option_gqi_qpl **dev_op_gqi_qpl,
> +                          struct gve_device_option_dqo_rda **dev_op_dqo_rda,
> +                          struct gve_device_option_jumbo_frames **dev_op_jumbo_frames)
> +{
> +       const int num_options = be16_to_cpu(descriptor->num_device_options);
> +       struct gve_device_option *dev_opt;
> +       int i;
> +
> +       /* The options struct directly follows the device descriptor. */
> +       dev_opt = RTE_PTR_ADD(descriptor, sizeof(*descriptor));
> +       for (i = 0; i < num_options; i++) {
> +               struct gve_device_option *next_opt;
> +
> +               next_opt = gve_get_next_option(descriptor, dev_opt);
> +               if (!next_opt) {
> +                       PMD_DRV_LOG(ERR,
> +                                   "options exceed device_descriptor's total length.");
> +                       return -EINVAL;
> +               }
> +
> +               gve_parse_device_option(priv, dev_opt,
> +                                       dev_op_gqi_rda, dev_op_gqi_qpl,
> +                                       dev_op_dqo_rda, dev_op_jumbo_frames);
> +               dev_opt = next_opt;
> +       }
> +
> +       return 0;
> +}
> +
> +int gve_adminq_alloc(struct gve_priv *priv)

Can you please be consistent in the syntax, at least within the same
file? If this file has slightly different syntax because it is a base
file, keep the file's syntax instead of mixing it with the DPDK syntax,
where e.g. the return type of a function goes on a separate line.

A generic comment for all base files.
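
For illustration, a hedged sketch of the two styles side by side (the
helper below is made up, not taken from the driver):

#include <errno.h>

/* base/kernel style, kept as-is in the base files: return type on the same line */
static int gve_check_len(int len) { return len < 0 ? -EINVAL : 0; }

/* DPDK style used in the rest of the driver: return type on its own line */
static int
gve_check_len_dpdk(int len)
{
	return len < 0 ? -EINVAL : 0;
}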

<...>

> +int gve_adminq_describe_device(struct gve_priv *priv)
> +{
> +       struct gve_device_option_jumbo_frames *dev_op_jumbo_frames = NULL;
> +       struct gve_device_option_gqi_rda *dev_op_gqi_rda = NULL;
> +       struct gve_device_option_gqi_qpl *dev_op_gqi_qpl = NULL;
> +       struct gve_device_option_dqo_rda *dev_op_dqo_rda = NULL;
> +       struct gve_device_descriptor *descriptor;
> +       struct gve_dma_mem descriptor_dma_mem;
> +       u32 supported_features_mask = 0;
> +       union gve_adminq_command cmd;
> +       int err = 0;
> +       u8 *mac;
> +       u16 mtu;
> +
> +       memset(&cmd, 0, sizeof(cmd));
> +       descriptor = gve_alloc_dma_mem(&descriptor_dma_mem, PAGE_SIZE);
> +       if (!descriptor)
> +               return -ENOMEM;
> +       cmd.opcode = cpu_to_be32(GVE_ADMINQ_DESCRIBE_DEVICE);
> +       cmd.describe_device.device_descriptor_addr =
> +                                       cpu_to_be64(descriptor_dma_mem.pa);
> +       cmd.describe_device.device_descriptor_version =
> +                       cpu_to_be32(GVE_ADMINQ_DEVICE_DESCRIPTOR_VERSION);
> +       cmd.describe_device.available_length = cpu_to_be32(PAGE_SIZE);
> +
> +       err = gve_adminq_execute_cmd(priv, &cmd);
> +       if (err)
> +               goto free_device_descriptor;
> +
> +       err = gve_process_device_options(priv, descriptor, &dev_op_gqi_rda,
> +                                        &dev_op_gqi_qpl, &dev_op_dqo_rda,
> +                                        &dev_op_jumbo_frames);
> +       if (err)
> +               goto free_device_descriptor;
> +
> +       /* If the GQI_RAW_ADDRESSING option is not enabled and the queue format
> +        * is not set to GqiRda, choose the queue format in a priority order:
> +        * DqoRda, GqiRda, GqiQpl. Use GqiQpl as default.
> +        */
> +       if (dev_op_dqo_rda) {
> +               priv->queue_format = GVE_DQO_RDA_FORMAT;
> +               PMD_DRV_LOG(INFO, "Driver is running with DQO RDA queue format.");
> +               supported_features_mask =
> +                       be32_to_cpu(dev_op_dqo_rda->supported_features_mask);
> +       } else if (dev_op_gqi_rda) {
> +               priv->queue_format = GVE_GQI_RDA_FORMAT;
> +               PMD_DRV_LOG(INFO, "Driver is running with GQI RDA queue format.");
> +               supported_features_mask =
> +                       be32_to_cpu(dev_op_gqi_rda->supported_features_mask);
> +       } else if (priv->queue_format == GVE_GQI_RDA_FORMAT) {
> +               PMD_DRV_LOG(INFO, "Driver is running with GQI RDA queue format.");
> +       } else {
> +               priv->queue_format = GVE_GQI_QPL_FORMAT;
> +               if (dev_op_gqi_qpl)
> +                       supported_features_mask =
> +                               be32_to_cpu(dev_op_gqi_qpl->supported_features_mask);
> +               PMD_DRV_LOG(INFO, "Driver is running with GQI QPL queue format.");
> +       }
> +       if (gve_is_gqi(priv)) {
> +               err = gve_set_desc_cnt(priv, descriptor);
> +       } else {
> +               /* DQO supports LRO. */
> +               err = gve_set_desc_cnt_dqo(priv, descriptor, dev_op_dqo_rda);
> +       }
> +       if (err)
> +               goto free_device_descriptor;
> +
> +       priv->max_registered_pages =
> +                               be64_to_cpu(descriptor->max_registered_pages);
> +       mtu = be16_to_cpu(descriptor->mtu);
> +       if (mtu < ETH_MIN_MTU) {
> +               PMD_DRV_LOG(ERR, "MTU %d below minimum MTU", mtu);
> +               err = -EINVAL;
> +               goto free_device_descriptor;
> +       }
> +       priv->max_mtu = mtu;
> +       priv->num_event_counters = be16_to_cpu(descriptor->counters);
> +       rte_memcpy(priv->dev_addr.addr_bytes, descriptor->mac, ETH_ALEN);
> +       mac = descriptor->mac;
> +       PMD_DRV_LOG(INFO, "MAC addr: %02x:%02x:%02x:%02x:%02x:%02x",
> +                   mac[0], mac[1], mac[2], mac[3], mac[4], mac[5]);
> +       priv->tx_pages_per_qpl = be16_to_cpu(descriptor->tx_pages_per_qpl);
> +       priv->rx_data_slot_cnt = be16_to_cpu(descriptor->rx_pages_per_qpl);
> +
> +       if (gve_is_gqi(priv) && priv->rx_data_slot_cnt < priv->rx_desc_cnt) {
> +               PMD_DRV_LOG(ERR, "rx_data_slot_cnt cannot be smaller than rx_desc_cnt, setting rx_desc_cnt down to %d",

Can you try to reduce the line length as:
PMD_DRV_LOG(ERR,
	    "rx_data_slot_cnt cannot be smaller than rx_desc_cnt, setting rx_desc_cnt down to %d",


^ permalink raw reply	[flat|nested] 192+ messages in thread

* Re: [PATCH v4 2/9] net/gve/base: add logs and OS specific implementation
  2022-09-27  7:32             ` [PATCH v4 2/9] net/gve/base: add logs and OS specific implementation Junfeng Guo
@ 2022-10-06 14:20               ` Ferruh Yigit
  2022-10-09  9:14                 ` Guo, Junfeng
  0 siblings, 1 reply; 192+ messages in thread
From: Ferruh Yigit @ 2022-10-06 14:20 UTC (permalink / raw)
  To: Junfeng Guo, qi.z.zhang, jingjing.wu
  Cc: ferruh.yigit, dev, xiaoyun.li, awogbemila, bruce.richardson,
	xueqin.lin, Haiyue Wang

On 9/27/2022 8:32 AM, Junfeng Guo wrote:

> 
> Add GVE PMD logs.
> Add some MACRO definitions and memory operations which are specific
> for DPDK.
> 
> Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
> Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com>
> Signed-off-by: Junfeng Guo <junfeng.guo@intel.com>

<...>

> --- /dev/null
> +++ b/drivers/net/gve/gve_logs.h
> @@ -0,0 +1,14 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright(C) 2022 Intel Corporation
> + */
> +
> +#ifndef _GVE_LOGS_H_
> +#define _GVE_LOGS_H_
> +
> +extern int gve_logtype_driver;
> +
> +#define PMD_DRV_LOG(level, fmt, args...) \
> +       rte_log(RTE_LOG_ ## level, gve_logtype_driver, "%s(): " fmt "\n", \
> +               __func__, ## args)
> +

What do you think about moving 'gve_logs.h' to the next patch, since it
declares 'extern int gve_logtype_driver' which is not added yet?
Although the files are not compiled yet, logically I think it fits
better in the next patch.

^ permalink raw reply	[flat|nested] 192+ messages in thread

* Re: [PATCH v4 3/9] net/gve: add support for device initialization
  2022-09-27  7:32             ` [PATCH v4 3/9] net/gve: add support for device initialization Junfeng Guo
@ 2022-10-06 14:22               ` Ferruh Yigit
  2022-10-09  9:14                 ` Guo, Junfeng
  0 siblings, 1 reply; 192+ messages in thread
From: Ferruh Yigit @ 2022-10-06 14:22 UTC (permalink / raw)
  To: Junfeng Guo, qi.z.zhang, jingjing.wu
  Cc: dev, xiaoyun.li, awogbemila, bruce.richardson, xueqin.lin, Haiyue Wang

On 9/27/2022 8:32 AM, Junfeng Guo wrote:

> 
> Support device init and the fowllowing devops:

s/fowllowing/following/

> - dev_configure
> - dev_start
> - dev_stop
> - dev_close

At this stage most of the above are empty functions that are not
implemented yet; instead, can you document in the commit log that the
build system and device initialization are added?

> 
> Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
> Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com>
> Signed-off-by: Junfeng Guo <junfeng.guo@intel.com>

<...>

> --- /dev/null
> +++ b/doc/guides/nics/gve.rst
> @@ -0,0 +1,69 @@
> +..  SPDX-License-Identifier: BSD-3-Clause
> +    Copyright(C) 2022 Intel Corporation.
> +
> +GVE poll mode driver
> +=======================
> +
> +The GVE PMD (**librte_net_gve**) provides poll mode driver support for
> +Google Virtual Ethernet device.
> +
> +Please refer to https://cloud.google.com/compute/docs/networking/using-gvnic
> +for the device description.
> +

This seems to be another virtual interface, similar to iavf/virtio/idpf ...

Can you please briefly describe here the motivation for adding yet
another virtual interface, and briefly describe the pros/cons of the
interface?

> +The base code is under MIT license and based on GVE kernel driver v1.3.0.
> +GVE base code files are:
> +
> +- gve_adminq.h
> +- gve_adminq.c
> +- gve_desc.h
> +- gve_desc_dqo.h
> +- gve_register.h
> +- gve.h
> +
> +Please refer to https://github.com/GoogleCloudPlatform/compute-virtual-ethernet-linux/tree/v1.3.0/google/gve
> +to find the original base code.
> +
> +GVE has 3 queue formats:
> +
> +- GQI_QPL - GQI with queue page list
> +- GQI_RDA - GQI with raw DMA addressing
> +- DQO_RDA - DQO with raw DMA addressing
> +
> +GQI_QPL queue format is queue page list mode. Driver needs to allocate
> +memory and register this memory as a Queue Page List (QPL) in hardware
> +(Google Hypervisor/GVE Backend) first. Each queue has its own QPL.
> +Then Tx needs to copy packets to QPL memory and put this packet's offset
> +in the QPL memory into hardware descriptors so that hardware can get the
> +packets data. And Rx needs to read descriptors of offset in QPL to get
> +QPL address and copy packets from the address to get real packets data.
> +
> +GQI_RDA queue format works like usual NICs that driver can put packets'
> +physical address into hardware descriptors.
> +
> +DQO_RDA queue format has submission and completion queue pair for each
> +Tx/Rx queue. And similar as GQI_RDA, driver can put packets' physical
> +address into hardware descriptors.
> +
> +Please refer to https://www.kernel.org/doc/html/latest/networking/device_drivers/ethernet/google/gve.html
> +to get more information about GVE queue formats.
> +
> +Features and Limitations
> +------------------------
> +
> +In this release, the GVE PMD provides the basic functionality of packet
> +reception and transmission.
> +Supported features of the GVE PMD are:
> +
> +- Multiple queues for TX and RX
> +- Receiver Side Scaling (RSS)
> +- TSO offload
> +- Port hardware statistics
> +- Link state information
> +- TX multi-segments (Scatter TX)
> +- Tx UDP/TCP/SCTP Checksum
> +

Can you build this list gradually, by adding the relevant item in each
patch that adds it?
That way the mapping between the code and the documented features
becomes more obvious.

<...>

> +static int
> +gve_dev_uninit(struct rte_eth_dev *eth_dev)
> +{
> +       struct gve_priv *priv = eth_dev->data->dev_private;
> +
> +       eth_dev->data->mac_addrs = NULL;
> +

At this stage 'mac_addrs' is not freed; setting it to NULL prevents it
from being freed.
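
A hedged sketch of one way to handle this in gve_dev_uninit(), assuming
the MAC address array is separately allocated at init time (e.g. with
rte_zmalloc) rather than pointing into the private data:

	/* free the copy made at init time instead of only dropping the pointer */
	rte_free(eth_dev->data->mac_addrs);
	eth_dev->data->mac_addrs = NULL;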

<...>

> +
> +static struct rte_pci_driver rte_gve_pmd = {
> +       .id_table = pci_id_gve_map,
> +       .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC,

As far as I can see the LSC interrupt is not supported; if that is
correct, should we drop the flag?

<...>

> +
> +struct gve_priv {
> +       struct gve_irq_db *irq_dbs; /* array of num_ntfy_blks */
> +       const struct rte_memzone *irq_dbs_mz;
> +       uint32_t mgmt_msix_idx;
> +       rte_be32_t *cnt_array; /* array of num_event_counters */
> +       const struct rte_memzone *cnt_array_mz;
> +
> +       uint16_t num_event_counters;
> +       uint16_t tx_desc_cnt; /* txq size */
> +       uint16_t rx_desc_cnt; /* rxq size */
> +       uint16_t tx_pages_per_qpl; /* tx buffer length */
> +       uint16_t rx_data_slot_cnt; /* rx buffer length */
> +
> +       /* Only valid for DQO_RDA queue format */
> +       uint16_t tx_compq_size; /* tx completion queue size */
> +       uint16_t rx_bufq_size; /* rx buff queue size */
> +
> +       uint64_t max_registered_pages;
> +       uint64_t num_registered_pages; /* num pages registered with NIC */
> +       uint16_t default_num_queues; /* default num queues to set up */
> +       enum gve_queue_format queue_format; /* see enum gve_queue_format */
> +       uint8_t enable_rsc;
> +
> +       uint16_t max_nb_txq;
> +       uint16_t max_nb_rxq;
> +       uint32_t num_ntfy_blks; /* spilt between TX and RX so must be even */
> +
> +       struct gve_registers __iomem *reg_bar0; /* see gve_register.h */
> +       rte_be32_t __iomem *db_bar2; /* "array" of doorbells */
> +       struct rte_pci_device *pci_dev;
> +
> +       /* Admin queue - see gve_adminq.h*/
> +       union gve_adminq_command *adminq;
> +       struct gve_dma_mem adminq_dma_mem;
> +       uint32_t adminq_mask; /* masks prod_cnt to adminq size */
> +       uint32_t adminq_prod_cnt; /* free-running count of AQ cmds executed */
> +       uint32_t adminq_cmd_fail; /* free-running count of AQ cmds failed */
> +       uint32_t adminq_timeouts; /* free-running count of AQ cmds timeouts */
> +       /* free-running count of per AQ cmd executed */
> +       uint32_t adminq_describe_device_cnt;
> +       uint32_t adminq_cfg_device_resources_cnt;
> +       uint32_t adminq_register_page_list_cnt;
> +       uint32_t adminq_unregister_page_list_cnt;
> +       uint32_t adminq_create_tx_queue_cnt;
> +       uint32_t adminq_create_rx_queue_cnt;
> +       uint32_t adminq_destroy_tx_queue_cnt;
> +       uint32_t adminq_destroy_rx_queue_cnt;
> +       uint32_t adminq_dcfg_device_resources_cnt;
> +       uint32_t adminq_set_driver_parameter_cnt;
> +       uint32_t adminq_report_stats_cnt;
> +       uint32_t adminq_report_link_speed_cnt;
> +       uint32_t adminq_get_ptype_map_cnt;
> +
> +       volatile uint32_t state_flags;
> +
> +       /* Gvnic device link speed from hypervisor. */
> +       uint64_t link_speed;
> +
> +       uint16_t max_mtu;
> +       struct rte_ether_addr dev_addr; /* mac address */
> +
> +       struct gve_queue_page_list *qpl;
> +
> +       struct gve_tx_queue **txqs;
> +       struct gve_rx_queue **rxqs;
> +};
> +

Similar to the previous comment, can you construct the headers by only
adding the fields used in that patch?

When an existing struct is copied over wholesale, it is very easy to add
unused code and very hard to detect it. If you only add what you need,
it becomes easy to be sure that all fields are used.

It also makes it more obvious which fields relate to which feature.

<...>

> new file mode 100644
> index 0000000000..c2e0723b4c
> --- /dev/null
> +++ b/drivers/net/gve/version.map
> @@ -0,0 +1,3 @@
> +DPDK_22 {
> +       local: *;
> +};

It is 'DPDK_23' now; hopefully we will have an update to get rid of
empty map files. Feel free to review:
https://patches.dpdk.org/project/dpdk/list/?series=25002



^ permalink raw reply	[flat|nested] 192+ messages in thread

* Re: [PATCH v4 4/9] net/gve: add support for link update
  2022-09-27  7:32             ` [PATCH v4 4/9] net/gve: add support for link update Junfeng Guo
@ 2022-10-06 14:23               ` Ferruh Yigit
  2022-10-09  9:14                 ` Guo, Junfeng
  0 siblings, 1 reply; 192+ messages in thread
From: Ferruh Yigit @ 2022-10-06 14:23 UTC (permalink / raw)
  To: Junfeng Guo, qi.z.zhang, jingjing.wu
  Cc: ferruh.yigit, dev, xiaoyun.li, awogbemila, bruce.richardson, xueqin.lin

On 9/27/2022 8:32 AM, Junfeng Guo wrote:

> 
> Support dev_ops link_update.
> 
> Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com>
> Signed-off-by: Junfeng Guo <junfeng.guo@intel.com>
> ---
>   doc/guides/nics/features/gve.ini |  2 ++
>   drivers/net/gve/gve_ethdev.c     | 30 ++++++++++++++++++++++++++++++
>   2 files changed, 32 insertions(+)
> 
> diff --git a/doc/guides/nics/features/gve.ini b/doc/guides/nics/features/gve.ini
> index 44aec28009..d03e3ac89e 100644
> --- a/doc/guides/nics/features/gve.ini
> +++ b/doc/guides/nics/features/gve.ini
> @@ -4,6 +4,8 @@
>   ; Refer to default.ini for the full list of available PMD features.
>   ;
>   [Features]
> +Speed capabilities   = Y

'Speed capabilities' is for when the device reports its supported
speeds in 'rte_eth_dev_info_get()', so it shouldn't be in this patch.

Please check 'doc/guides/nics/features.rst' for more details.
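
For illustration, a minimal sketch of how the feature would be claimed
from the dev_infos_get callback; the speed values below are placeholders
only, not what gVNIC actually reports:

#include <rte_ethdev.h>

static int
gve_dev_info_get(struct rte_eth_dev *dev __rte_unused,
		 struct rte_eth_dev_info *dev_info)
{
	/* 'Speed capabilities' = Y means reporting supported speeds here */
	dev_info->speed_capa = RTE_ETH_LINK_SPEED_10G |
			       RTE_ETH_LINK_SPEED_25G |
			       RTE_ETH_LINK_SPEED_100G;
	return 0;
}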


^ permalink raw reply	[flat|nested] 192+ messages in thread

* Re: [PATCH v4 7/9] net/gve: add support for Rx/Tx
  2022-09-27  7:32             ` [PATCH v4 7/9] net/gve: add support for Rx/Tx Junfeng Guo
@ 2022-10-06 14:24               ` Ferruh Yigit
  2022-10-09  9:14                 ` Guo, Junfeng
  0 siblings, 1 reply; 192+ messages in thread
From: Ferruh Yigit @ 2022-10-06 14:24 UTC (permalink / raw)
  To: Junfeng Guo, qi.z.zhang, jingjing.wu
  Cc: ferruh.yigit, dev, xiaoyun.li, awogbemila, bruce.richardson, xueqin.lin

On 9/27/2022 8:32 AM, Junfeng Guo wrote:

> 
> Add Rx/Tx of GQI_QPL queue format and GQI_RDA queue format.
> 
> Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com>
> Signed-off-by: Junfeng Guo <junfeng.guo@intel.com>

<...>

> --- a/drivers/net/gve/gve_ethdev.c
> +++ b/drivers/net/gve/gve_ethdev.c
> @@ -583,6 +583,11 @@ gve_dev_init(struct rte_eth_dev *eth_dev)
>          if (err)
>                  return err;
> 
> +       if (gve_is_gqi(priv)) {
> +               eth_dev->rx_pkt_burst = gve_rx_burst;
> +               eth_dev->tx_pkt_burst = gve_tx_burst;
> +       }
> +

What do you think about adding a log here for the 'else' case, to
inform the user why the datapath is not working?

<...>

> +uint16_t
> +gve_rx_burst(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t nb_pkts)
> +{
> +       volatile struct gve_rx_desc *rxr, *rxd;
> +       struct gve_rx_queue *rxq = rx_queue;
> +       uint16_t rx_id = rxq->rx_tail;
> +       struct rte_mbuf *rxe;
> +       uint16_t nb_rx, len;
> +       uint64_t addr;
> +
> +       rxr = rxq->rx_desc_ring;
> +
> +       for (nb_rx = 0; nb_rx < nb_pkts; nb_rx++) {
> +               rxd = &rxr[rx_id];
> +               if (GVE_SEQNO(rxd->flags_seq) != rxq->expected_seqno)
> +                       break;
> +
> +               if (rxd->flags_seq & GVE_RXF_ERR)
> +                       continue;
> +
> +               len = rte_be_to_cpu_16(rxd->len) - GVE_RX_PAD;
> +               rxe = rxq->sw_ring[rx_id];
> +               rxe->data_off = RTE_PKTMBUF_HEADROOM;
> +               if (rxq->is_gqi_qpl) {
> +                       addr = (uint64_t)(rxq->qpl->mz->addr) + rx_id * PAGE_SIZE + GVE_RX_PAD;
> +                       rte_memcpy((void *)((size_t)rxe->buf_addr + rxe->data_off),
> +                                  (void *)(size_t)addr, len);

Why is a 'memcpy' needed? Can't it DMA to the mbuf data buffer?

> +               }
> +               rxe->nb_segs = 1;
> +               rxe->next = NULL;
> +               rxe->pkt_len = len;
> +               rxe->data_len = len;
> +               rxe->port = rxq->port_id;
> +               rxe->packet_type = 0;
> +               rxe->ol_flags = 0;
> +

As far as I can see 'sw_ring[]' is filled using the
'rte_pktmbuf_alloc_bulk()' API, which should reset the mbuf fields to
default values, so some of the assignments above may be redundant.

> +               if (rxd->flags_seq & GVE_RXF_TCP)
> +                       rxe->packet_type |= RTE_PTYPE_L4_TCP;
> +               if (rxd->flags_seq & GVE_RXF_UDP)
> +                       rxe->packet_type |= RTE_PTYPE_L4_UDP;
> +               if (rxd->flags_seq & GVE_RXF_IPV4)
> +                       rxe->packet_type |= RTE_PTYPE_L3_IPV4;
> +               if (rxd->flags_seq & GVE_RXF_IPV6)
> +                       rxe->packet_type |= RTE_PTYPE_L3_IPV6;
> +

If you are setting packet_type, it is better to implement the
'dev_supported_ptypes_get()' dev_ops too, to announce to the host which
packet type parsing is supported (+ the dev_ptypes_set() dev_ops).
The driver can then announce the "Packet type parsing" feature in the
.ini file.
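
A hedged sketch of such a callback, matching the ptypes set in the Rx
path quoted above (the function name and exact list are assumptions):

#include <rte_ethdev.h>
#include <rte_mbuf_ptype.h>

static const uint32_t *
gve_dev_supported_ptypes_get(struct rte_eth_dev *dev __rte_unused)
{
	/* the list must be terminated with RTE_PTYPE_UNKNOWN */
	static const uint32_t ptypes[] = {
		RTE_PTYPE_L3_IPV4,
		RTE_PTYPE_L3_IPV6,
		RTE_PTYPE_L4_TCP,
		RTE_PTYPE_L4_UDP,
		RTE_PTYPE_UNKNOWN
	};

	return ptypes;
}

It would then be hooked into eth_dev_ops as .dev_supported_ptypes_get.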

^ permalink raw reply	[flat|nested] 192+ messages in thread

* Re: [PATCH v4 8/9] net/gve: add support for dev info get and dev configure
  2022-09-27  7:32             ` [PATCH v4 8/9] net/gve: add support for dev info get and dev configure Junfeng Guo
@ 2022-10-06 14:25               ` Ferruh Yigit
  2022-10-09  9:14                 ` Guo, Junfeng
  0 siblings, 1 reply; 192+ messages in thread
From: Ferruh Yigit @ 2022-10-06 14:25 UTC (permalink / raw)
  To: Junfeng Guo, qi.z.zhang, jingjing.wu
  Cc: ferruh.yigit, dev, xiaoyun.li, awogbemila, bruce.richardson, xueqin.lin

On 9/27/2022 8:32 AM, Junfeng Guo wrote:

> 
> Add dev_ops dev_infos_get.
> Complete dev_configure with RX offloads configuration.
> 

I think it is better to have this before the datapath patches (6/9 &
7/9), because this is a more fundamental step.


^ permalink raw reply	[flat|nested] 192+ messages in thread

* Re: [PATCH v4 9/9] net/gve: add support for stats
  2022-09-27  7:32             ` [PATCH v4 9/9] net/gve: add support for stats Junfeng Guo
@ 2022-10-06 14:25               ` Ferruh Yigit
  2022-10-09  9:15                 ` Guo, Junfeng
  0 siblings, 1 reply; 192+ messages in thread
From: Ferruh Yigit @ 2022-10-06 14:25 UTC (permalink / raw)
  To: Junfeng Guo, qi.z.zhang, jingjing.wu
  Cc: ferruh.yigit, dev, xiaoyun.li, awogbemila, bruce.richardson, xueqin.lin

On 9/27/2022 8:32 AM, Junfeng Guo wrote:

> 
> Update stats add support of dev_ops stats_get/reset.
> 
> Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com>
> Signed-off-by: Junfeng Guo <junfeng.guo@intel.com>
> ---
>   doc/guides/nics/features/gve.ini |  2 +
>   drivers/net/gve/gve_ethdev.c     | 71 ++++++++++++++++++++++++++++++++
>   drivers/net/gve/gve_ethdev.h     | 11 +++++
>   drivers/net/gve/gve_rx.c         | 15 ++++++-
>   drivers/net/gve/gve_tx.c         | 13 ++++++
>   5 files changed, 110 insertions(+), 2 deletions(-)
> 
> diff --git a/doc/guides/nics/features/gve.ini b/doc/guides/nics/features/gve.ini
> index cdc46b08a3..180408aa80 100644
> --- a/doc/guides/nics/features/gve.ini
> +++ b/doc/guides/nics/features/gve.ini
> @@ -10,6 +10,8 @@ MTU update           = Y
>   TSO                  = Y
>   RSS hash             = Y
>   L4 checksum offload  = Y
> +Basic stats          = Y
> +Stats per queue      = Y

"stats per queue" is something else, agree that it is bad naming, please 
check features.rst file.

>   Linux                = Y
>   x86-32               = Y
>   x86-64               = Y
> diff --git a/drivers/net/gve/gve_ethdev.c b/drivers/net/gve/gve_ethdev.c
> index b9b8e51b02..cd474b8128 100644
> --- a/drivers/net/gve/gve_ethdev.c
> +++ b/drivers/net/gve/gve_ethdev.c
> @@ -328,6 +328,75 @@ gve_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
>          return 0;
>   }
> 
> +static int
> +gve_dev_stats_get(struct rte_eth_dev *dev, struct rte_eth_stats *stats)
> +{
> +       uint16_t i;
> +
> +       for (i = 0; i < dev->data->nb_tx_queues; i++) {
> +               struct gve_tx_queue *txq = dev->data->tx_queues[i];
> +               if (txq == NULL)
> +                       continue;
> +
> +               stats->opackets += txq->packets;
> +               stats->obytes += txq->bytes;
> +               stats->oerrors += txq->errors;
> +
> +               if (i < RTE_ETHDEV_QUEUE_STAT_CNTRS) {
> +                       stats->q_opackets[i] = txq->packets;
> +                       stats->q_obytes[i] = txq->bytes;

Queue stats have been moved to xstats [1]; we are waiting for existing
PMDs to adopt it, but for new drivers it is better to implement the new
method.

Can you please either drop the queue stats completely, or implement
them via xstats?

[1]
https://elixir.bootlin.com/dpdk/v22.07/source/doc/guides/rel_notes/deprecation.rst#L118
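
For reference, a rough sketch of what per-queue counters could look
like via xstats; it covers Rx packets only and the names are
assumptions, the Tx side and the other counters would follow the same
pattern:

#include <stdio.h>
#include <rte_ethdev.h>

#include "gve_ethdev.h"

/* assumes the rxq->packets counter added in this patch */
static int
gve_xstats_get_names(struct rte_eth_dev *dev,
		     struct rte_eth_xstat_name *names, unsigned int size)
{
	unsigned int count = dev->data->nb_rx_queues;
	unsigned int i;

	if (names == NULL || size < count)
		return count;

	for (i = 0; i < count; i++)
		snprintf(names[i].name, sizeof(names[i].name),
			 "rx_q%u_packets", i);

	return count;
}

static int
gve_xstats_get(struct rte_eth_dev *dev, struct rte_eth_xstat *xstats,
	       unsigned int n)
{
	unsigned int count = dev->data->nb_rx_queues;
	unsigned int i;

	if (xstats == NULL || n < count)
		return count;

	for (i = 0; i < count; i++) {
		struct gve_rx_queue *rxq = dev->data->rx_queues[i];

		xstats[i].id = i;
		xstats[i].value = rxq != NULL ? rxq->packets : 0;
	}

	return count;
}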


^ permalink raw reply	[flat|nested] 192+ messages in thread

* RE: [PATCH v4 1/9] net/gve/base: introduce GVE PMD base code
  2022-10-06 14:19               ` Ferruh Yigit
@ 2022-10-09  9:14                 ` Guo, Junfeng
  0 siblings, 0 replies; 192+ messages in thread
From: Guo, Junfeng @ 2022-10-09  9:14 UTC (permalink / raw)
  To: Ferruh Yigit, Zhang, Qi Z, Wu, Jingjing
  Cc: ferruh.yigit, dev, Li, Xiaoyun, awogbemila, Richardson, Bruce,
	Lin, Xueqin, Wang, Haiyue



> -----Original Message-----
> From: Ferruh Yigit <ferruh.yigit@amd.com>
> Sent: Thursday, October 6, 2022 22:20
> To: Guo, Junfeng <junfeng.guo@intel.com>; Zhang, Qi Z
> <qi.z.zhang@intel.com>; Wu, Jingjing <jingjing.wu@intel.com>
> Cc: ferruh.yigit@xilinx.com; dev@dpdk.org; Li, Xiaoyun
> <xiaoyun.li@intel.com>; awogbemila@google.com; Richardson, Bruce
> <bruce.richardson@intel.com>; Lin, Xueqin <xueqin.lin@intel.com>;
> Wang, Haiyue <haiyue.wang@intel.com>
> Subject: Re: [PATCH v4 1/9] net/gve/base: introduce GVE PMD base code
> 
> On 9/27/2022 8:32 AM, Junfeng Guo wrote:
> 
> >
> > The following base code is based on Google Virtual Ethernet (gve)
> > driver v1.3.0 under MIT license.
> > - gve_adminq.c
> > - gve_adminq.h
> > - gve_desc.h
> > - gve_desc_dqo.h
> > - gve_register.h
> > - gve.h
> >
> > The original code is in:
> > https://github.com/GoogleCloudPlatform/compute-virtual-ethernet-
> linux/\
> > tree/v1.3.0/google/gve
> >
> > Note that these code are not Intel files and they come from the kernel
> > community. The base code there has the statement of
> > SPDX-License-Identifier: (GPL-2.0 OR MIT). Here we just follow the
> > required MIT license as an exception to DPDK.
> >
> > Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com>
> > Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
> > Signed-off-by: Junfeng Guo <junfeng.guo@intel.com>
> 
> <...>
> 
> > +/* Process all device options for a given describe device call. */
> > +static int
> > +gve_process_device_options(struct gve_priv *priv,
> > +                          struct gve_device_descriptor *descriptor,
> > +                          struct gve_device_option_gqi_rda **dev_op_gqi_rda,
> > +                          struct gve_device_option_gqi_qpl **dev_op_gqi_qpl,
> > +                          struct gve_device_option_dqo_rda **dev_op_dqo_rda,
> > +                          struct gve_device_option_jumbo_frames
> **dev_op_jumbo_frames)
> > +{
> > +       const int num_options = be16_to_cpu(descriptor-
> >num_device_options);
> > +       struct gve_device_option *dev_opt;
> > +       int i;
> > +
> > +       /* The options struct directly follows the device descriptor. */
> > +       dev_opt = RTE_PTR_ADD(descriptor, sizeof(*descriptor));
> > +       for (i = 0; i < num_options; i++) {
> > +               struct gve_device_option *next_opt;
> > +
> > +               next_opt = gve_get_next_option(descriptor, dev_opt);
> > +               if (!next_opt) {
> > +                       PMD_DRV_LOG(ERR,
> > +                                   "options exceed device_descriptor's total length.");
> > +                       return -EINVAL;
> > +               }
> > +
> > +               gve_parse_device_option(priv, dev_opt,
> > +                                       dev_op_gqi_rda, dev_op_gqi_qpl,
> > +                                       dev_op_dqo_rda, dev_op_jumbo_frames);
> > +               dev_opt = next_opt;
> > +       }
> > +
> > +       return 0;
> > +}
> > +
> > +int gve_adminq_alloc(struct gve_priv *priv)
> 
> Can you please be consistent in the syntax, at least within same file,
> if this file has slightly different syntax because it is base file, keep
> the file syntax instead of mixing with DPDK syntax,
> like return type of function should be on separate line.
> 
> A generic comment for all base files.

Thanks for the reminder!
Will keep the same syntax as the kernel code for the files within the
base folder, and use the DPDK syntax for the files in the folder above
base.
For the base files, it would be better to keep most of the code aligned
with the original.

> 
> <...>
> 
> > +int gve_adminq_describe_device(struct gve_priv *priv)
> > +{
> > +       struct gve_device_option_jumbo_frames *dev_op_jumbo_frames
> = NULL;
> > +       struct gve_device_option_gqi_rda *dev_op_gqi_rda = NULL;
> > +       struct gve_device_option_gqi_qpl *dev_op_gqi_qpl = NULL;
> > +       struct gve_device_option_dqo_rda *dev_op_dqo_rda = NULL;
> > +       struct gve_device_descriptor *descriptor;
> > +       struct gve_dma_mem descriptor_dma_mem;
> > +       u32 supported_features_mask = 0;
> > +       union gve_adminq_command cmd;
> > +       int err = 0;
> > +       u8 *mac;
> > +       u16 mtu;
> > +
> > +       memset(&cmd, 0, sizeof(cmd));
> > +       descriptor = gve_alloc_dma_mem(&descriptor_dma_mem,
> PAGE_SIZE);
> > +       if (!descriptor)
> > +               return -ENOMEM;
> > +       cmd.opcode = cpu_to_be32(GVE_ADMINQ_DESCRIBE_DEVICE);
> > +       cmd.describe_device.device_descriptor_addr =
> > +                                       cpu_to_be64(descriptor_dma_mem.pa);
> > +       cmd.describe_device.device_descriptor_version =
> > +
> cpu_to_be32(GVE_ADMINQ_DEVICE_DESCRIPTOR_VERSION);
> > +       cmd.describe_device.available_length = cpu_to_be32(PAGE_SIZE);
> > +
> > +       err = gve_adminq_execute_cmd(priv, &cmd);
> > +       if (err)
> > +               goto free_device_descriptor;
> > +
> > +       err = gve_process_device_options(priv, descriptor,
> &dev_op_gqi_rda,
> > +                                        &dev_op_gqi_qpl, &dev_op_dqo_rda,
> > +                                        &dev_op_jumbo_frames);
> > +       if (err)
> > +               goto free_device_descriptor;
> > +
> > +       /* If the GQI_RAW_ADDRESSING option is not enabled and the
> queue format
> > +        * is not set to GqiRda, choose the queue format in a priority order:
> > +        * DqoRda, GqiRda, GqiQpl. Use GqiQpl as default.
> > +        */
> > +       if (dev_op_dqo_rda) {
> > +               priv->queue_format = GVE_DQO_RDA_FORMAT;
> > +               PMD_DRV_LOG(INFO, "Driver is running with DQO RDA queue
> format.");
> > +               supported_features_mask =
> > +                       be32_to_cpu(dev_op_dqo_rda-
> >supported_features_mask);
> > +       } else if (dev_op_gqi_rda) {
> > +               priv->queue_format = GVE_GQI_RDA_FORMAT;
> > +               PMD_DRV_LOG(INFO, "Driver is running with GQI RDA queue
> format.");
> > +               supported_features_mask =
> > +                       be32_to_cpu(dev_op_gqi_rda-
> >supported_features_mask);
> > +       } else if (priv->queue_format == GVE_GQI_RDA_FORMAT) {
> > +               PMD_DRV_LOG(INFO, "Driver is running with GQI RDA queue
> format.");
> > +       } else {
> > +               priv->queue_format = GVE_GQI_QPL_FORMAT;
> > +               if (dev_op_gqi_qpl)
> > +                       supported_features_mask =
> > +                               be32_to_cpu(dev_op_gqi_qpl-
> >supported_features_mask);
> > +               PMD_DRV_LOG(INFO, "Driver is running with GQI QPL queue
> format.");
> > +       }
> > +       if (gve_is_gqi(priv)) {
> > +               err = gve_set_desc_cnt(priv, descriptor);
> > +       } else {
> > +               /* DQO supports LRO. */
> > +               err = gve_set_desc_cnt_dqo(priv, descriptor,
> dev_op_dqo_rda);
> > +       }
> > +       if (err)
> > +               goto free_device_descriptor;
> > +
> > +       priv->max_registered_pages =
> > +                               be64_to_cpu(descriptor->max_registered_pages);
> > +       mtu = be16_to_cpu(descriptor->mtu);
> > +       if (mtu < ETH_MIN_MTU) {
> > +               PMD_DRV_LOG(ERR, "MTU %d below minimum MTU", mtu);
> > +               err = -EINVAL;
> > +               goto free_device_descriptor;
> > +       }
> > +       priv->max_mtu = mtu;
> > +       priv->num_event_counters = be16_to_cpu(descriptor->counters);
> > +       rte_memcpy(priv->dev_addr.addr_bytes, descriptor->mac,
> ETH_ALEN);
> > +       mac = descriptor->mac;
> > +       PMD_DRV_LOG(INFO, "MAC
> addr: %02x:%02x:%02x:%02x:%02x:%02x",
> > +                   mac[0], mac[1], mac[2], mac[3], mac[4], mac[5]);
> > +       priv->tx_pages_per_qpl = be16_to_cpu(descriptor-
> >tx_pages_per_qpl);
> > +       priv->rx_data_slot_cnt = be16_to_cpu(descriptor-
> >rx_pages_per_qpl);
> > +
> > +       if (gve_is_gqi(priv) && priv->rx_data_slot_cnt < priv->rx_desc_cnt)
> {
> > +               PMD_DRV_LOG(ERR, "rx_data_slot_cnt cannot be smaller
> than rx_desc_cnt, setting rx_desc_cnt down to %d",
> 
> Can you try to reduce the line length as;
> PMD_DRV_LOG(ERR,
> 	"rx_data_slot_cnt cannot be smaller than rx_desc_cnt, setting
> rx_desc_cnt down to %d",

Sure, it could be. But it would still exceed the 100-character limit
even when started on a new line... The good news is that it is only a
log string, which won't trigger warnings during the build.


^ permalink raw reply	[flat|nested] 192+ messages in thread

* RE: [PATCH v4 2/9] net/gve/base: add logs and OS specific implementation
  2022-10-06 14:20               ` Ferruh Yigit
@ 2022-10-09  9:14                 ` Guo, Junfeng
  0 siblings, 0 replies; 192+ messages in thread
From: Guo, Junfeng @ 2022-10-09  9:14 UTC (permalink / raw)
  To: Ferruh Yigit, Zhang, Qi Z, Wu, Jingjing
  Cc: ferruh.yigit, dev, Li, Xiaoyun, awogbemila, Richardson, Bruce,
	Lin, Xueqin, Wang, Haiyue



> -----Original Message-----
> From: Ferruh Yigit <ferruh.yigit@amd.com>
> Sent: Thursday, October 6, 2022 22:20
> To: Guo, Junfeng <junfeng.guo@intel.com>; Zhang, Qi Z
> <qi.z.zhang@intel.com>; Wu, Jingjing <jingjing.wu@intel.com>
> Cc: ferruh.yigit@xilinx.com; dev@dpdk.org; Li, Xiaoyun
> <xiaoyun.li@intel.com>; awogbemila@google.com; Richardson, Bruce
> <bruce.richardson@intel.com>; Lin, Xueqin <xueqin.lin@intel.com>;
> Wang, Haiyue <haiyue.wang@intel.com>
> Subject: Re: [PATCH v4 2/9] net/gve/base: add logs and OS specific
> implementation
> 
> On 9/27/2022 8:32 AM, Junfeng Guo wrote:
> 
> >
> > Add GVE PMD logs.
> > Add some MACRO definitions and memory operations which are specific
> > for DPDK.
> >
> > Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
> > Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com>
> > Signed-off-by: Junfeng Guo <junfeng.guo@intel.com>
> 
> <...>
> 
> > --- /dev/null
> > +++ b/drivers/net/gve/gve_logs.h
> > @@ -0,0 +1,14 @@
> > +/* SPDX-License-Identifier: BSD-3-Clause
> > + * Copyright(C) 2022 Intel Corporation
> > + */
> > +
> > +#ifndef _GVE_LOGS_H_
> > +#define _GVE_LOGS_H_
> > +
> > +extern int gve_logtype_driver;
> > +
> > +#define PMD_DRV_LOG(level, fmt, args...) \
> > +       rte_log(RTE_LOG_ ## level, gve_logtype_driver, "%s(): " fmt "\n", \
> > +               __func__, ## args)
> > +
> 
> What do you think to move 'gve_logs.h' to next patch, since this is
> extern 'gve_logtype_driver' which is not added yet.
> Although files are not compiled yet, logically I think it suits better
> to next patch.

Sure, makes sense. Thanks!

^ permalink raw reply	[flat|nested] 192+ messages in thread

* RE: [PATCH v4 3/9] net/gve: add support for device initialization
  2022-10-06 14:22               ` Ferruh Yigit
@ 2022-10-09  9:14                 ` Guo, Junfeng
  0 siblings, 0 replies; 192+ messages in thread
From: Guo, Junfeng @ 2022-10-09  9:14 UTC (permalink / raw)
  To: Ferruh Yigit, Zhang, Qi Z, Wu, Jingjing
  Cc: dev, Li, Xiaoyun, awogbemila, Richardson, Bruce, Lin, Xueqin,
	Wang, Haiyue



> -----Original Message-----
> From: Ferruh Yigit <ferruh.yigit@amd.com>
> Sent: Thursday, October 6, 2022 22:23
> To: Guo, Junfeng <junfeng.guo@intel.com>; Zhang, Qi Z
> <qi.z.zhang@intel.com>; Wu, Jingjing <jingjing.wu@intel.com>
> Cc: dev@dpdk.org; Li, Xiaoyun <xiaoyun.li@intel.com>;
> awogbemila@google.com; Richardson, Bruce
> <bruce.richardson@intel.com>; Lin, Xueqin <xueqin.lin@intel.com>;
> Wang, Haiyue <haiyue.wang@intel.com>
> Subject: Re: [PATCH v4 3/9] net/gve: add support for device initialization
> 
> On 9/27/2022 8:32 AM, Junfeng Guo wrote:
> 
> >
> > Support device init and the fowllowing devops:
> 
> s/fowllowing/following/
> 
> > - dev_configure
> > - dev_start
> > - dev_stop
> > - dev_close
> 
> At this stage most of above are empty functions and not implemented yet,
> instead can you document in the commit log that build system and device
> initialization is added?

Agreed, will add this in the coming version. Thanks!

> 
> >
> > Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
> > Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com>
> > Signed-off-by: Junfeng Guo <junfeng.guo@intel.com>
> 
> <...>
> 
> > --- /dev/null
> > +++ b/doc/guides/nics/gve.rst
> > @@ -0,0 +1,69 @@
> > +..  SPDX-License-Identifier: BSD-3-Clause
> > +    Copyright(C) 2022 Intel Corporation.
> > +
> > +GVE poll mode driver
> > +=======================
> > +
> > +The GVE PMD (**librte_net_gve**) provides poll mode driver support
> for
> > +Google Virtual Ethernet device.
> > +
> > +Please refer to
> https://cloud.google.com/compute/docs/networking/using-gvnic
> > +for the device description.
> > +
> 
> This seems another virtual interface, similar to iavf/virtio/idpf ...
> 
> Can you please briefly describe here the motivation to add yet another
> virtual interface, and again briefly describe cons/pros of the interface?

Sure. According to the official gVNIC description, gVNIC is an
alternative to the virtio driver that can provide higher network
bandwidth.
Will add a brief description of the pros/cons in the coming version.
Thanks!

> 
> > +The base code is under MIT license and based on GVE kernel driver
> v1.3.0.
> > +GVE base code files are:
> > +
> > +- gve_adminq.h
> > +- gve_adminq.c
> > +- gve_desc.h
> > +- gve_desc_dqo.h
> > +- gve_register.h
> > +- gve.h
> > +
> > +Please refer to https://github.com/GoogleCloudPlatform/compute-
> virtual-ethernet-linux/tree/v1.3.0/google/gve
> > +to find the original base code.
> > +
> > +GVE has 3 queue formats:
> > +
> > +- GQI_QPL - GQI with queue page list
> > +- GQI_RDA - GQI with raw DMA addressing
> > +- DQO_RDA - DQO with raw DMA addressing
> > +
> > +GQI_QPL queue format is queue page list mode. Driver needs to
> allocate
> > +memory and register this memory as a Queue Page List (QPL) in
> hardware
> > +(Google Hypervisor/GVE Backend) first. Each queue has its own QPL.
> > +Then Tx needs to copy packets to QPL memory and put this packet's
> offset
> > +in the QPL memory into hardware descriptors so that hardware can get
> the
> > +packets data. And Rx needs to read descriptors of offset in QPL to get
> > +QPL address and copy packets from the address to get real packets
> data.
> > +
> > +GQI_RDA queue format works like usual NICs that driver can put
> packets'
> > +physical address into hardware descriptors.
> > +
> > +DQO_RDA queue format has submission and completion queue pair for
> each
> > +Tx/Rx queue. And similar as GQI_RDA, driver can put packets' physical
> > +address into hardware descriptors.
> > +
> > +Please refer to
> https://www.kernel.org/doc/html/latest/networking/device_drivers/ethe
> rnet/google/gve.html
> > +to get more information about GVE queue formats.
> > +
> > +Features and Limitations
> > +------------------------
> > +
> > +In this release, the GVE PMD provides the basic functionality of packet
> > +reception and transmission.
> > +Supported features of the GVE PMD are:
> > +
> > +- Multiple queues for TX and RX
> > +- Receiver Side Scaling (RSS)
> > +- TSO offload
> > +- Port hardware statistics
> > +- Link state information
> > +- TX multi-segments (Scatter TX)
> > +- Tx UDP/TCP/SCTP Checksum
> > +
> 
> Can you build this list gradully, by adding relavent item in each patch
> that adds it?
> That way mapping with the code and documented feature becomes more
> obvious.

Sure... Will add the items of this list gradually in the coming version. Thanks!

> 
> <...>
> 
> > +static int
> > +gve_dev_uninit(struct rte_eth_dev *eth_dev)
> > +{
> > +       struct gve_priv *priv = eth_dev->data->dev_private;
> > +
> > +       eth_dev->data->mac_addrs = NULL;
> > +
> 
> At this stage 'mac_addrs' is not freed, setting it to NULL prevents it
> to be freed.

Thanks for the catch! Will improve this. Thanks!

> 
> <...>
> 
> > +
> > +static struct rte_pci_driver rte_gve_pmd = {
> > +       .id_table = pci_id_gve_map,
> > +       .drv_flags = RTE_PCI_DRV_NEED_MAPPING |
> RTE_PCI_DRV_INTR_LSC,
> 
> As far as I can see LSC interrupt is not supported, if that is correct
> should we drop the flag?

Sure, it seems this flag is not used in the current GCP env.
Will remove it in the coming version. Thanks!

> 
> <...>
> 
> > +
> > +struct gve_priv {
> > +       struct gve_irq_db *irq_dbs; /* array of num_ntfy_blks */
> > +       const struct rte_memzone *irq_dbs_mz;
> > +       uint32_t mgmt_msix_idx;
> > +       rte_be32_t *cnt_array; /* array of num_event_counters */
> > +       const struct rte_memzone *cnt_array_mz;
> > +
> > +       uint16_t num_event_counters;
> > +       uint16_t tx_desc_cnt; /* txq size */
> > +       uint16_t rx_desc_cnt; /* rxq size */
> > +       uint16_t tx_pages_per_qpl; /* tx buffer length */
> > +       uint16_t rx_data_slot_cnt; /* rx buffer length */
> > +
> > +       /* Only valid for DQO_RDA queue format */
> > +       uint16_t tx_compq_size; /* tx completion queue size */
> > +       uint16_t rx_bufq_size; /* rx buff queue size */
> > +
> > +       uint64_t max_registered_pages;
> > +       uint64_t num_registered_pages; /* num pages registered with NIC
> */
> > +       uint16_t default_num_queues; /* default num queues to set up */
> > +       enum gve_queue_format queue_format; /* see enum
> gve_queue_format */
> > +       uint8_t enable_rsc;
> > +
> > +       uint16_t max_nb_txq;
> > +       uint16_t max_nb_rxq;
> > +       uint32_t num_ntfy_blks; /* spilt between TX and RX so must be
> even */
> > +
> > +       struct gve_registers __iomem *reg_bar0; /* see gve_register.h */
> > +       rte_be32_t __iomem *db_bar2; /* "array" of doorbells */
> > +       struct rte_pci_device *pci_dev;
> > +
> > +       /* Admin queue - see gve_adminq.h*/
> > +       union gve_adminq_command *adminq;
> > +       struct gve_dma_mem adminq_dma_mem;
> > +       uint32_t adminq_mask; /* masks prod_cnt to adminq size */
> > +       uint32_t adminq_prod_cnt; /* free-running count of AQ cmds
> executed */
> > +       uint32_t adminq_cmd_fail; /* free-running count of AQ cmds failed
> */
> > +       uint32_t adminq_timeouts; /* free-running count of AQ cmds
> timeouts */
> > +       /* free-running count of per AQ cmd executed */
> > +       uint32_t adminq_describe_device_cnt;
> > +       uint32_t adminq_cfg_device_resources_cnt;
> > +       uint32_t adminq_register_page_list_cnt;
> > +       uint32_t adminq_unregister_page_list_cnt;
> > +       uint32_t adminq_create_tx_queue_cnt;
> > +       uint32_t adminq_create_rx_queue_cnt;
> > +       uint32_t adminq_destroy_tx_queue_cnt;
> > +       uint32_t adminq_destroy_rx_queue_cnt;
> > +       uint32_t adminq_dcfg_device_resources_cnt;
> > +       uint32_t adminq_set_driver_parameter_cnt;
> > +       uint32_t adminq_report_stats_cnt;
> > +       uint32_t adminq_report_link_speed_cnt;
> > +       uint32_t adminq_get_ptype_map_cnt;
> > +
> > +       volatile uint32_t state_flags;
> > +
> > +       /* Gvnic device link speed from hypervisor. */
> > +       uint64_t link_speed;
> > +
> > +       uint16_t max_mtu;
> > +       struct rte_ether_addr dev_addr; /* mac address */
> > +
> > +       struct gve_queue_page_list *qpl;
> > +
> > +       struct gve_tx_queue **txqs;
> > +       struct gve_rx_queue **rxqs;
> > +};
> > +
> 
> Similar to previous comment, can you construct the headers by only
> adding used fields in that patch?
> 
> When batch copied an existing struct, it is very easy to add unused code
> and very hard to detect it. So if you only add what you need, that
> becomes easy to be sure all fields are used.
> 
> Also it makes more obvious which fields related to which feature.

Sure... Will try our best to construct the header struct fields
gradually. Thanks!

> 
> <...>
> 
> > new file mode 100644
> > index 0000000000..c2e0723b4c
> > --- /dev/null
> > +++ b/drivers/net/gve/version.map
> > @@ -0,0 +1,3 @@
> > +DPDK_22 {
> > +       local: *;
> > +};
> 
> it is 'DPDK_23' now, hopefully we will have an update to get rid of
> empty map files, feel free to review:
> https://patches.dpdk.org/project/dpdk/list/?series=25002

Sure, it looks much better. Thanks!

> 


^ permalink raw reply	[flat|nested] 192+ messages in thread

* RE: [PATCH v4 4/9] net/gve: add support for link update
  2022-10-06 14:23               ` Ferruh Yigit
@ 2022-10-09  9:14                 ` Guo, Junfeng
  0 siblings, 0 replies; 192+ messages in thread
From: Guo, Junfeng @ 2022-10-09  9:14 UTC (permalink / raw)
  To: Ferruh Yigit, Zhang, Qi Z, Wu, Jingjing
  Cc: ferruh.yigit, dev, Li, Xiaoyun, awogbemila, Richardson, Bruce,
	Lin, Xueqin



> -----Original Message-----
> From: Ferruh Yigit <ferruh.yigit@amd.com>
> Sent: Thursday, October 6, 2022 22:23
> To: Guo, Junfeng <junfeng.guo@intel.com>; Zhang, Qi Z
> <qi.z.zhang@intel.com>; Wu, Jingjing <jingjing.wu@intel.com>
> Cc: ferruh.yigit@xilinx.com; dev@dpdk.org; Li, Xiaoyun
> <xiaoyun.li@intel.com>; awogbemila@google.com; Richardson, Bruce
> <bruce.richardson@intel.com>; Lin, Xueqin <xueqin.lin@intel.com>
> Subject: Re: [PATCH v4 4/9] net/gve: add support for link update
> 
> On 9/27/2022 8:32 AM, Junfeng Guo wrote:
> 
> >
> > Support dev_ops link_update.
> >
> > Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com>
> > Signed-off-by: Junfeng Guo <junfeng.guo@intel.com>
> > ---
> >   doc/guides/nics/features/gve.ini |  2 ++
> >   drivers/net/gve/gve_ethdev.c     | 30
> ++++++++++++++++++++++++++++++
> >   2 files changed, 32 insertions(+)
> >
> > diff --git a/doc/guides/nics/features/gve.ini
> b/doc/guides/nics/features/gve.ini
> > index 44aec28009..d03e3ac89e 100644
> > --- a/doc/guides/nics/features/gve.ini
> > +++ b/doc/guides/nics/features/gve.ini
> > @@ -4,6 +4,8 @@
> >   ; Refer to default.ini for the full list of available PMD features.
> >   ;
> >   [Features]
> > +Speed capabilities   = Y
> 
> 'Speed capabilities' is when device reports supported speeds in
> 'rte_eth_dev_info_get()', so it shouldn't be in this patch.
> 
> Please check 'doc/guides/nics/features.rst' for more details.

Thanks for the comments!
Will check the features file and update accordingly.


^ permalink raw reply	[flat|nested] 192+ messages in thread

* RE: [PATCH v4 7/9] net/gve: add support for Rx/Tx
  2022-10-06 14:24               ` Ferruh Yigit
@ 2022-10-09  9:14                 ` Guo, Junfeng
  2022-10-10  9:39                   ` Li, Xiaoyun
  0 siblings, 1 reply; 192+ messages in thread
From: Guo, Junfeng @ 2022-10-09  9:14 UTC (permalink / raw)
  To: Ferruh Yigit, Zhang, Qi Z, Wu, Jingjing
  Cc: ferruh.yigit, dev, Li, Xiaoyun, awogbemila, Richardson, Bruce,
	Lin, Xueqin



> -----Original Message-----
> From: Ferruh Yigit <ferruh.yigit@amd.com>
> Sent: Thursday, October 6, 2022 22:25
> To: Guo, Junfeng <junfeng.guo@intel.com>; Zhang, Qi Z
> <qi.z.zhang@intel.com>; Wu, Jingjing <jingjing.wu@intel.com>
> Cc: ferruh.yigit@xilinx.com; dev@dpdk.org; Li, Xiaoyun
> <xiaoyun.li@intel.com>; awogbemila@google.com; Richardson, Bruce
> <bruce.richardson@intel.com>; Lin, Xueqin <xueqin.lin@intel.com>
> Subject: Re: [PATCH v4 7/9] net/gve: add support for Rx/Tx
> 
> On 9/27/2022 8:32 AM, Junfeng Guo wrote:
> 
> >
> > Add Rx/Tx of GQI_QPL queue format and GQI_RDA queue format.
> >
> > Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com>
> > Signed-off-by: Junfeng Guo <junfeng.guo@intel.com>
> 
> <...>
> 
> > --- a/drivers/net/gve/gve_ethdev.c
> > +++ b/drivers/net/gve/gve_ethdev.c
> > @@ -583,6 +583,11 @@ gve_dev_init(struct rte_eth_dev *eth_dev)
> >          if (err)
> >                  return err;
> >
> > +       if (gve_is_gqi(priv)) {
> > +               eth_dev->rx_pkt_burst = gve_rx_burst;
> > +               eth_dev->tx_pkt_burst = gve_tx_burst;
> > +       }
> > +
> 
> What do you think to add a log here for 'else' case, to inform user why
> datapath is not working?

Agreed, makes sense!
Currently only one queue mode (i.e., QPL mode) is supported on the GCP
env. Will add a log to inform the user of this in the else case. Thanks!
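
A minimal sketch of what the else branch in gve_dev_init() could look
like (the exact log wording is still open):

	if (gve_is_gqi(priv)) {
		eth_dev->rx_pkt_burst = gve_rx_burst;
		eth_dev->tx_pkt_burst = gve_tx_burst;
	} else {
		PMD_DRV_LOG(ERR,
			    "Only the GQI_QPL and GQI_RDA queue formats have Rx/Tx support, the datapath is disabled for this device");
	}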

> 
> <...>
> 
> > +uint16_t
> > +gve_rx_burst(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t
> nb_pkts)
> > +{
> > +       volatile struct gve_rx_desc *rxr, *rxd;
> > +       struct gve_rx_queue *rxq = rx_queue;
> > +       uint16_t rx_id = rxq->rx_tail;
> > +       struct rte_mbuf *rxe;
> > +       uint16_t nb_rx, len;
> > +       uint64_t addr;
> > +
> > +       rxr = rxq->rx_desc_ring;
> > +
> > +       for (nb_rx = 0; nb_rx < nb_pkts; nb_rx++) {
> > +               rxd = &rxr[rx_id];
> > +               if (GVE_SEQNO(rxd->flags_seq) != rxq->expected_seqno)
> > +                       break;
> > +
> > +               if (rxd->flags_seq & GVE_RXF_ERR)
> > +                       continue;
> > +
> > +               len = rte_be_to_cpu_16(rxd->len) - GVE_RX_PAD;
> > +               rxe = rxq->sw_ring[rx_id];
> > +               rxe->data_off = RTE_PKTMBUF_HEADROOM;
> > +               if (rxq->is_gqi_qpl) {
> > +                       addr = (uint64_t)(rxq->qpl->mz->addr) + rx_id * PAGE_SIZE
> + GVE_RX_PAD;
> > +                       rte_memcpy((void *)((size_t)rxe->buf_addr + rxe-
> >data_off),
> > +                                  (void *)(size_t)addr, len);
> 
> Why a 'memcpy' is needed? Can't it DMA to mbuf data buffer?

Well, only QPL (queue page list) mode is supported on the GCP env now,
so DMA directly to the mbuf may not be used in the current case.

> 
> > +               }
> > +               rxe->nb_segs = 1;
> > +               rxe->next = NULL;
> > +               rxe->pkt_len = len;
> > +               rxe->data_len = len;
> > +               rxe->port = rxq->port_id;
> > +               rxe->packet_type = 0;
> > +               rxe->ol_flags = 0;
> > +
> 
> As far as I can see 'sw_ring[]' filled using 'rte_pktmbuf_alloc_bulk()'
> API, which should reset mbuf fields to default values, so some of the
> assignment above can be redundant.

Yes, some fields are already assigned in 'rte_pktmbuf_reset()'.
Will remove the redundant ones in the coming version. Thanks!
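
For example, a hedged sketch of the trimmed block, assuming the mbufs
always come fresh from rte_pktmbuf_alloc_bulk():

		/* nb_segs, next, ol_flags, data_off and packet_type are
		 * already set by the mbuf reset done at allocation time */
		rxe->pkt_len = len;
		rxe->data_len = len;
		rxe->port = rxq->port_id;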

> 
> > +               if (rxd->flags_seq & GVE_RXF_TCP)
> > +                       rxe->packet_type |= RTE_PTYPE_L4_TCP;
> > +               if (rxd->flags_seq & GVE_RXF_UDP)
> > +                       rxe->packet_type |= RTE_PTYPE_L4_UDP;
> > +               if (rxd->flags_seq & GVE_RXF_IPV4)
> > +                       rxe->packet_type |= RTE_PTYPE_L3_IPV4;
> > +               if (rxd->flags_seq & GVE_RXF_IPV6)
> > +                       rxe->packet_type |= RTE_PTYPE_L3_IPV6;
> > +
> 
> If you are setting packet_type, it is better to implement
> 'dev_supported_ptypes_get()' dev_ops too, to announce host which
> packet
> type parsin supporting. (+ dev_ptypes_set() dev_ops)
> And later driver can announce "Packet type parsing" feature in .ini file.

Well, on the current GCP env, the APIs for getting/setting supported
ptypes have not been exposed, even in the base code. The only one in
the base code is for the DQO mode (gve_adminq_get_ptype_map_dqo), but
that also cannot be used on the current GCP env. We can only implement
this once they are supported and exposed by GCP. Thanks!

^ permalink raw reply	[flat|nested] 192+ messages in thread

* RE: [PATCH v4 8/9] net/gve: add support for dev info get and dev configure
  2022-10-06 14:25               ` Ferruh Yigit
@ 2022-10-09  9:14                 ` Guo, Junfeng
  0 siblings, 0 replies; 192+ messages in thread
From: Guo, Junfeng @ 2022-10-09  9:14 UTC (permalink / raw)
  To: Ferruh Yigit, Zhang, Qi Z, Wu, Jingjing
  Cc: ferruh.yigit, dev, Li, Xiaoyun, awogbemila, Richardson, Bruce,
	Lin, Xueqin



> -----Original Message-----
> From: Ferruh Yigit <ferruh.yigit@amd.com>
> Sent: Thursday, October 6, 2022 22:25
> To: Guo, Junfeng <junfeng.guo@intel.com>; Zhang, Qi Z
> <qi.z.zhang@intel.com>; Wu, Jingjing <jingjing.wu@intel.com>
> Cc: ferruh.yigit@xilinx.com; dev@dpdk.org; Li, Xiaoyun
> <xiaoyun.li@intel.com>; awogbemila@google.com; Richardson, Bruce
> <bruce.richardson@intel.com>; Lin, Xueqin <xueqin.lin@intel.com>
> Subject: Re: [PATCH v4 8/9] net/gve: add support for dev info get and dev
> configure
> 
> On 9/27/2022 8:32 AM, Junfeng Guo wrote:
> 
> >
> > Add dev_ops dev_infos_get.
> > Complete dev_configure with RX offloads configuration.
> >
> 
> I think better to have this before datapath patches (6/9 & 7/9), because
> this is more fundamental step.

Sure, makes sense! Will reorder the patches in the coming version of
the patchset.
Thanks!


^ permalink raw reply	[flat|nested] 192+ messages in thread

* RE: [PATCH v4 9/9] net/gve: add support for stats
  2022-10-06 14:25               ` Ferruh Yigit
@ 2022-10-09  9:15                 ` Guo, Junfeng
  0 siblings, 0 replies; 192+ messages in thread
From: Guo, Junfeng @ 2022-10-09  9:15 UTC (permalink / raw)
  To: Ferruh Yigit, Zhang, Qi Z, Wu, Jingjing
  Cc: ferruh.yigit, dev, Li, Xiaoyun, awogbemila, Richardson, Bruce,
	Lin, Xueqin



> -----Original Message-----
> From: Ferruh Yigit <ferruh.yigit@amd.com>
> Sent: Thursday, October 6, 2022 22:26
> To: Guo, Junfeng <junfeng.guo@intel.com>; Zhang, Qi Z
> <qi.z.zhang@intel.com>; Wu, Jingjing <jingjing.wu@intel.com>
> Cc: ferruh.yigit@xilinx.com; dev@dpdk.org; Li, Xiaoyun
> <xiaoyun.li@intel.com>; awogbemila@google.com; Richardson, Bruce
> <bruce.richardson@intel.com>; Lin, Xueqin <xueqin.lin@intel.com>
> Subject: Re: [PATCH v4 9/9] net/gve: add support for stats
> 
> On 9/27/2022 8:32 AM, Junfeng Guo wrote:
> 
> >
> > Update stats add support of dev_ops stats_get/reset.
> >
> > Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com>
> > Signed-off-by: Junfeng Guo <junfeng.guo@intel.com>
> > ---
> >   doc/guides/nics/features/gve.ini |  2 +
> >   drivers/net/gve/gve_ethdev.c     | 71
> ++++++++++++++++++++++++++++++++
> >   drivers/net/gve/gve_ethdev.h     | 11 +++++
> >   drivers/net/gve/gve_rx.c         | 15 ++++++-
> >   drivers/net/gve/gve_tx.c         | 13 ++++++
> >   5 files changed, 110 insertions(+), 2 deletions(-)
> >
> > diff --git a/doc/guides/nics/features/gve.ini
> b/doc/guides/nics/features/gve.ini
> > index cdc46b08a3..180408aa80 100644
> > --- a/doc/guides/nics/features/gve.ini
> > +++ b/doc/guides/nics/features/gve.ini
> > @@ -10,6 +10,8 @@ MTU update           = Y
> >   TSO                  = Y
> >   RSS hash             = Y
> >   L4 checksum offload  = Y
> > +Basic stats          = Y
> > +Stats per queue      = Y
> 
> "stats per queue" is something else, agree that it is bad naming, please
> check features.rst file.

Sure, will check the features file and update accordingly. Thanks!

> 
> >   Linux                = Y
> >   x86-32               = Y
> >   x86-64               = Y
> > diff --git a/drivers/net/gve/gve_ethdev.c
> b/drivers/net/gve/gve_ethdev.c
> > index b9b8e51b02..cd474b8128 100644
> > --- a/drivers/net/gve/gve_ethdev.c
> > +++ b/drivers/net/gve/gve_ethdev.c
> > @@ -328,6 +328,75 @@ gve_dev_info_get(struct rte_eth_dev *dev,
> struct rte_eth_dev_info *dev_info)
> >          return 0;
> >   }
> >
> > +static int
> > +gve_dev_stats_get(struct rte_eth_dev *dev, struct rte_eth_stats *stats)
> > +{
> > +       uint16_t i;
> > +
> > +       for (i = 0; i < dev->data->nb_tx_queues; i++) {
> > +               struct gve_tx_queue *txq = dev->data->tx_queues[i];
> > +               if (txq == NULL)
> > +                       continue;
> > +
> > +               stats->opackets += txq->packets;
> > +               stats->obytes += txq->bytes;
> > +               stats->oerrors += txq->errors;
> > +
> > +               if (i < RTE_ETHDEV_QUEUE_STAT_CNTRS) {
> > +                       stats->q_opackets[i] = txq->packets;
> > +                       stats->q_obytes[i] = txq->bytes;
> 
> Queue stats update has moved to xstats [1]; we are waiting for existing PMDs
> to adopt it, but for new drivers it is better to implement the new method.
> 
> Can you please either drop queue stats completely, or implement it via
> xstats?
> 
> [1]
> https://elixir.bootlin.com/dpdk/v22.07/source/doc/guides/rel_notes/dep
> recation.rst#L118

Sure, thanks for the reminder!
It seems better to drop the stats feature at this point due to lack of time
and bandwidth... But we can still measure the performance via other
tools like TRex. And we can plan the implementation of the new method for
the coming release. Thanks!
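
As a reference for that future work, here is a minimal sketch of how the per-queue Tx counters could be exposed through xstats instead of the deprecated q_opackets/q_obytes fields (the function name and the callback wiring are only illustrative, not part of this series):

static int
gve_xstats_get(struct rte_eth_dev *dev, struct rte_eth_xstat *xstats,
	       unsigned int n)
{
	unsigned int count = dev->data->nb_tx_queues * 2; /* packets + bytes */
	unsigned int i, idx = 0;

	if (xstats == NULL || n < count)
		return count;	/* tell the caller how much room is needed */

	for (i = 0; i < dev->data->nb_tx_queues; i++) {
		struct gve_tx_queue *txq = dev->data->tx_queues[i];

		xstats[idx].id = idx;
		xstats[idx].value = txq ? txq->packets : 0;
		idx++;
		xstats[idx].id = idx;
		xstats[idx].value = txq ? txq->bytes : 0;
		idx++;
	}

	return count;
}

A matching xstats_get_names() callback would supply the per-queue counter names; Rx queues could be handled the same way.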


^ permalink raw reply	[flat|nested] 192+ messages in thread

* RE: [PATCH v4 7/9] net/gve: add support for Rx/Tx
  2022-10-09  9:14                 ` Guo, Junfeng
@ 2022-10-10  9:39                   ` Li, Xiaoyun
  2022-10-10 10:18                     ` Guo, Junfeng
  0 siblings, 1 reply; 192+ messages in thread
From: Li, Xiaoyun @ 2022-10-10  9:39 UTC (permalink / raw)
  To: Guo, Junfeng, Ferruh Yigit, Zhang, Qi Z, Wu, Jingjing
  Cc: ferruh.yigit, dev, awogbemila, Richardson,  Bruce, Lin, Xueqin

Hi

> -----Original Message-----
> From: Guo, Junfeng <junfeng.guo@intel.com>
> Sent: Sunday, October 9, 2022 10:15
> To: Ferruh Yigit <ferruh.yigit@amd.com>; Zhang, Qi Z
> <qi.z.zhang@intel.com>; Wu, Jingjing <jingjing.wu@intel.com>
> Cc: ferruh.yigit@xilinx.com; dev@dpdk.org; Li, Xiaoyun
> <xiaoyun.li@intel.com>; awogbemila@google.com; Richardson, Bruce
> <bruce.richardson@intel.com>; Lin, Xueqin <xueqin.lin@intel.com>
> Subject: RE: [PATCH v4 7/9] net/gve: add support for Rx/Tx
> 
> 
> 
> > -----Original Message-----
> > From: Ferruh Yigit <ferruh.yigit@amd.com>
> > Sent: Thursday, October 6, 2022 22:25
> > To: Guo, Junfeng <junfeng.guo@intel.com>; Zhang, Qi Z
> > <qi.z.zhang@intel.com>; Wu, Jingjing <jingjing.wu@intel.com>
> > Cc: ferruh.yigit@xilinx.com; dev@dpdk.org; Li, Xiaoyun
> > <xiaoyun.li@intel.com>; awogbemila@google.com; Richardson, Bruce
> > <bruce.richardson@intel.com>; Lin, Xueqin <xueqin.lin@intel.com>
> > Subject: Re: [PATCH v4 7/9] net/gve: add support for Rx/Tx
> >
> > On 9/27/2022 8:32 AM, Junfeng Guo wrote:
> >
> > >
> > > Add Rx/Tx of GQI_QPL queue format and GQI_RDA queue format.
> > >
> > > Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com>
> > > Signed-off-by: Junfeng Guo <junfeng.guo@intel.com>
> >
> > <...>
> >
> > > --- a/drivers/net/gve/gve_ethdev.c
> > > +++ b/drivers/net/gve/gve_ethdev.c
> > > @@ -583,6 +583,11 @@ gve_dev_init(struct rte_eth_dev *eth_dev)
> > >          if (err)
> > >                  return err;
> > >
> > > +       if (gve_is_gqi(priv)) {
> > > +               eth_dev->rx_pkt_burst = gve_rx_burst;
> > > +               eth_dev->tx_pkt_burst = gve_tx_burst;
> > > +       }
> > > +
> >
> > What do you think about adding a log here for the 'else' case, to inform
> > the user why the datapath is not working?
> 
> Agreed, makes sense!
> Currently only one queue mode (i.e., QPL mode) is supported in the GCP env.
> Will add a log for this in the else case. Thanks!

This explanation is not correct. Only QPL mode is supported in GCP now, but that is an env limitation and not related to the else branch here.
gve_is_gqi() covers two modes, GQI_QPL and GQI_RDA, and both of these datapaths are supported in rxtx.
GQI means a single queue model (txq for Tx and rxq for Rx), and there are two addressing schemes for this queue model: QPL and RDA.
QPL has to copy packets from/to a set of reserved pages negotiated with the backend. RDA behaves like a normal device and uses PAs in the descriptors.

The datapath that is not supported is DQO_RDA, which targets different hardware and so uses a different (split/double) queue model: Tx uses txq plus tx_completion_q and Rx uses rxq plus rx_completion_q.
This is not implemented in the datapath for now and will be added in the future.

So if you want to add a log here, please say "DQO_RDA is not implemented and will be added in the future". Don't say it's not available in the GCP env, because that is not the reason.
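
For illustration, a minimal sketch of what that else branch in gve_dev_init() could look like (the log wording above is only a suggestion):

	if (gve_is_gqi(priv)) {
		eth_dev->rx_pkt_burst = gve_rx_burst;
		eth_dev->tx_pkt_burst = gve_tx_burst;
	} else {
		/* Datapath for the DQO_RDA queue format is not implemented
		 * yet, so no Rx/Tx burst functions are set.
		 */
		PMD_DRV_LOG(ERR,
			    "DQO_RDA is not implemented and will be added in the future");
	}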

> 
> >
> > <...>
> >
> > > +uint16_t
> > > +gve_rx_burst(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t
> > nb_pkts)
> > > +{
> > > +       volatile struct gve_rx_desc *rxr, *rxd;
> > > +       struct gve_rx_queue *rxq = rx_queue;
> > > +       uint16_t rx_id = rxq->rx_tail;
> > > +       struct rte_mbuf *rxe;
> > > +       uint16_t nb_rx, len;
> > > +       uint64_t addr;
> > > +
> > > +       rxr = rxq->rx_desc_ring;
> > > +
> > > +       for (nb_rx = 0; nb_rx < nb_pkts; nb_rx++) {
> > > +               rxd = &rxr[rx_id];
> > > +               if (GVE_SEQNO(rxd->flags_seq) != rxq->expected_seqno)
> > > +                       break;
> > > +
> > > +               if (rxd->flags_seq & GVE_RXF_ERR)
> > > +                       continue;
> > > +
> > > +               len = rte_be_to_cpu_16(rxd->len) - GVE_RX_PAD;
> > > +               rxe = rxq->sw_ring[rx_id];
> > > +               rxe->data_off = RTE_PKTMBUF_HEADROOM;
> > > +               if (rxq->is_gqi_qpl) {
> > > +                       addr = (uint64_t)(rxq->qpl->mz->addr) +
> > > +                              rx_id * PAGE_SIZE + GVE_RX_PAD;
> > > +                       rte_memcpy((void *)((size_t)rxe->buf_addr + rxe->data_off),
> > > +                                  (void *)(size_t)addr, len);
> >
> > Why is a 'memcpy' needed? Can't it DMA to the mbuf data buffer?

When the queue model is gqi_qpl (this is negotiated with the backend over the adminq), the driver has to register a block of memory (the queue page list) with the device. Tx then copies packets into this memory and Rx reads packets out of this area.
The backend is responsible for moving packets between this memory and the device/line on both Tx and Rx (we don't really know how the backend does this).
Please refer to https://www.kernel.org/doc/html/v5.4/networking/device_drivers/google/gve.html, which has a bit more explanation about this queue format.
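
As an illustration of that copy model, the Tx side does roughly the mirror image of the Rx code quoted above: the payload is copied into the page that backs the descriptor slot before the descriptor is written. A simplified sketch (tx_id, desc and txq->is_gqi_qpl are placeholder names assumed to mirror the Rx side, not the exact code from the patch):

	if (txq->is_gqi_qpl) {
		/* Copy the packet into the bounce page that belongs to this
		 * descriptor slot inside the registered queue page list.
		 */
		void *dst = RTE_PTR_ADD(txq->qpl->mz->addr,
					(size_t)tx_id * PAGE_SIZE);

		rte_memcpy(dst, rte_pktmbuf_mtod(tx_pkt, void *),
			   tx_pkt->pkt_len);
		/* With QPL, seg_addr is an offset into the page list,
		 * not a physical address.
		 */
		desc->seg_addr = rte_cpu_to_be_64((uint64_t)tx_id * PAGE_SIZE);
	} else {
		/* GQI_RDA: raw addressing, the device DMAs the mbuf directly. */
		desc->seg_addr = rte_cpu_to_be_64(rte_mbuf_data_iova(tx_pkt));
	}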

> 
> Well, only QPL (queue page list) mode is supported in the GCP env now.
> So DMA may not be used in the current case.

And yes, GCP doesn't support GQI_RDA for now, so GQI_QPL has to be implemented. But even if the GCP env supports RDA in the future, QPL will still be needed unless they completely remove QPL support.
That is because the queue format/model is obtained from the backend through gve_adminq_describe_device(): you may simply be given the QPL variant, and the driver can't really control which queue format it gets.

> 
> >
> > > +               }
> > > +               rxe->nb_segs = 1;
> > > +               rxe->next = NULL;
> > > +               rxe->pkt_len = len;
> > > +               rxe->data_len = len;
> > > +               rxe->port = rxq->port_id;
> > > +               rxe->packet_type = 0;
> > > +               rxe->ol_flags = 0;
> > > +
> >
> > As far as I can see, 'sw_ring[]' is filled using the 'rte_pktmbuf_alloc_bulk()'
> > API, which should reset mbuf fields to default values, so some of the
> > assignments above can be redundant.
> 
> Yes, some fields are already assigned in 'rte_pktmbuf_reset()'.
> Will remove the redundant ones in the coming version. Thanks!
> 
> >
> > > +               if (rxd->flags_seq & GVE_RXF_TCP)
> > > +                       rxe->packet_type |= RTE_PTYPE_L4_TCP;
> > > +               if (rxd->flags_seq & GVE_RXF_UDP)
> > > +                       rxe->packet_type |= RTE_PTYPE_L4_UDP;
> > > +               if (rxd->flags_seq & GVE_RXF_IPV4)
> > > +                       rxe->packet_type |= RTE_PTYPE_L3_IPV4;
> > > +               if (rxd->flags_seq & GVE_RXF_IPV6)
> > > +                       rxe->packet_type |= RTE_PTYPE_L3_IPV6;
> > > +
> >
> > If you are setting packet_type, it is better to implement the
> > 'dev_supported_ptypes_get()' dev_ops too, to announce to the host which
> > packet type parsing is supported (+ the dev_ptypes_set() dev_ops). Later the
> > driver can announce the "Packet type parsing" feature in the .ini file.
> 
> Well, in the current GCP env, the APIs for supported ptypes get/set have not
> been exposed even in the base code. The only one in the base code is for
> the dqo mode (gve_adminq_get_ptype_map_dqo), but this also cannot be
> used in the current GCP env. We can only implement this once they are
> supported and exposed by GCP. Thanks!

You're mixing up the concepts again. "The GCP env only supports QPL" is not an excuse here.
Packet types are supported even in QPL; they are just limited to L4_TCP/UDP and L3_IPV4/6. Ptypes_get is possible, and it would report RTE_PTYPE_L3_IPV4/6 and RTE_PTYPE_L4_UDP/TCP.
For the DQO mode you mentioned, it will be more flexible and have more support. I'm not sure what your plan is, but it can be implemented whenever your plan calls for it, not based on GCP env availability. The base code is there; you just may not be able to verify and debug it in time.

Ptype_set is not supported since the hardware doesn't support it (there's no such adminq).
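
For what it's worth, the ptypes-get callback for the GQI formats could then be as small as the following sketch (a suggestion only, not code from this series):

static const uint32_t *
gve_dev_supported_ptypes_get(struct rte_eth_dev *dev __rte_unused)
{
	/* Only what the GQI Rx descriptor flags can report today. */
	static const uint32_t ptypes[] = {
		RTE_PTYPE_L3_IPV4,
		RTE_PTYPE_L3_IPV6,
		RTE_PTYPE_L4_TCP,
		RTE_PTYPE_L4_UDP,
		RTE_PTYPE_UNKNOWN
	};

	return ptypes;
}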

^ permalink raw reply	[flat|nested] 192+ messages in thread

* [PATCH v5 0/8] introduce GVE PMD
  2022-09-27  7:32             ` [PATCH v4 1/9] net/gve/base: introduce GVE PMD base code Junfeng Guo
  2022-10-06 14:19               ` Ferruh Yigit
@ 2022-10-10 10:17               ` Junfeng Guo
  2022-10-10 10:17                 ` [PATCH v5 1/8] net/gve/base: introduce GVE PMD base code Junfeng Guo
                                   ` (7 more replies)
  1 sibling, 8 replies; 192+ messages in thread
From: Junfeng Guo @ 2022-10-10 10:17 UTC (permalink / raw)
  To: qi.z.zhang, jingjing.wu
  Cc: ferruh.yigit, dev, xiaoyun.li, awogbemila, bruce.richardson,
	xueqin.lin, junfeng.guo

Introduce a new PMD for Google Virtual Ethernet (GVE).

This patch set requires an exception for MIT license for GVE base code.
And the base code includes the following files:
 - gve_adminq.c
 - gve_adminq.h
 - gve_desc.h
 - gve_desc_dqo.h
 - gve_register.h

It's based on GVE kernel driver v1.3.0 and the original code is in
https://github.com/GoogleCloudPlatform/compute-virtual-ethernet-linux/tree/v1.3.0

v2:
fix some CI check errors.

v3:
refactor some code and fix some build errors.

v4:
move the Google base code files into DPDK base folder.

v5:
reorder commit sequence and drop the stats feature.

Junfeng Guo (8):
  net/gve/base: introduce GVE PMD base code
  net/gve/base: add OS specific implementation
  net/gve: add support for device initialization
  net/gve: add support for link update
  net/gve: add support for MTU setting
  net/gve: add support for dev info get and dev configure
  net/gve: add support for queue operations
  net/gve: add support for Rx/Tx

 MAINTAINERS                            |   6 +
 doc/guides/nics/features/gve.ini       |  16 +
 doc/guides/nics/gve.rst                |  71 ++
 doc/guides/nics/index.rst              |   1 +
 doc/guides/rel_notes/release_22_11.rst |   5 +
 drivers/net/gve/base/gve.h             |  58 ++
 drivers/net/gve/base/gve_adminq.c      | 925 +++++++++++++++++++++++++
 drivers/net/gve/base/gve_adminq.h      | 383 ++++++++++
 drivers/net/gve/base/gve_desc.h        | 139 ++++
 drivers/net/gve/base/gve_desc_dqo.h    | 256 +++++++
 drivers/net/gve/base/gve_osdep.h       | 159 +++++
 drivers/net/gve/base/gve_register.h    |  30 +
 drivers/net/gve/gve_ethdev.c           | 704 +++++++++++++++++++
 drivers/net/gve/gve_ethdev.h           | 295 ++++++++
 drivers/net/gve/gve_logs.h             |  14 +
 drivers/net/gve/gve_rx.c               | 352 ++++++++++
 drivers/net/gve/gve_tx.c               | 669 ++++++++++++++++++
 drivers/net/gve/meson.build            |  16 +
 drivers/net/gve/version.map            |   3 +
 drivers/net/meson.build                |   1 +
 20 files changed, 4103 insertions(+)
 create mode 100644 doc/guides/nics/features/gve.ini
 create mode 100644 doc/guides/nics/gve.rst
 create mode 100644 drivers/net/gve/base/gve.h
 create mode 100644 drivers/net/gve/base/gve_adminq.c
 create mode 100644 drivers/net/gve/base/gve_adminq.h
 create mode 100644 drivers/net/gve/base/gve_desc.h
 create mode 100644 drivers/net/gve/base/gve_desc_dqo.h
 create mode 100644 drivers/net/gve/base/gve_osdep.h
 create mode 100644 drivers/net/gve/base/gve_register.h
 create mode 100644 drivers/net/gve/gve_ethdev.c
 create mode 100644 drivers/net/gve/gve_ethdev.h
 create mode 100644 drivers/net/gve/gve_logs.h
 create mode 100644 drivers/net/gve/gve_rx.c
 create mode 100644 drivers/net/gve/gve_tx.c
 create mode 100644 drivers/net/gve/meson.build
 create mode 100644 drivers/net/gve/version.map

-- 
2.34.1


^ permalink raw reply	[flat|nested] 192+ messages in thread

* [PATCH v5 1/8] net/gve/base: introduce GVE PMD base code
  2022-10-10 10:17               ` [PATCH v5 0/8] introduce GVE PMD Junfeng Guo
@ 2022-10-10 10:17                 ` Junfeng Guo
  2022-10-19 13:45                   ` Ferruh Yigit
  2022-10-20 10:36                   ` [PATCH v6 0/8] introduce GVE PMD Junfeng Guo
  2022-10-10 10:17                 ` [PATCH v5 2/8] net/gve/base: add OS specific implementation Junfeng Guo
                                   ` (6 subsequent siblings)
  7 siblings, 2 replies; 192+ messages in thread
From: Junfeng Guo @ 2022-10-10 10:17 UTC (permalink / raw)
  To: qi.z.zhang, jingjing.wu
  Cc: ferruh.yigit, dev, xiaoyun.li, awogbemila, bruce.richardson,
	xueqin.lin, junfeng.guo, Haiyue Wang

The following base code is based on Google Virtual Ethernet (gve)
driver v1.3.0 under MIT license.
- gve_adminq.c
- gve_adminq.h
- gve_desc.h
- gve_desc_dqo.h
- gve_register.h
- gve.h

The original code is in:
https://github.com/GoogleCloudPlatform/compute-virtual-ethernet-linux/\
tree/v1.3.0/google/gve

Note that these files are not Intel code; they come from the kernel
community. The base code there carries the statement
SPDX-License-Identifier: (GPL-2.0 OR MIT). Here we just follow the
MIT license, which requires an exception in DPDK.

Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com>
Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Junfeng Guo <junfeng.guo@intel.com>
---
 drivers/net/gve/base/gve.h          |  58 ++
 drivers/net/gve/base/gve_adminq.c   | 924 ++++++++++++++++++++++++++++
 drivers/net/gve/base/gve_adminq.h   | 381 ++++++++++++
 drivers/net/gve/base/gve_desc.h     | 137 +++++
 drivers/net/gve/base/gve_desc_dqo.h | 254 ++++++++
 drivers/net/gve/base/gve_register.h |  28 +
 6 files changed, 1782 insertions(+)
 create mode 100644 drivers/net/gve/base/gve.h
 create mode 100644 drivers/net/gve/base/gve_adminq.c
 create mode 100644 drivers/net/gve/base/gve_adminq.h
 create mode 100644 drivers/net/gve/base/gve_desc.h
 create mode 100644 drivers/net/gve/base/gve_desc_dqo.h
 create mode 100644 drivers/net/gve/base/gve_register.h

diff --git a/drivers/net/gve/base/gve.h b/drivers/net/gve/base/gve.h
new file mode 100644
index 0000000000..1b0d59b639
--- /dev/null
+++ b/drivers/net/gve/base/gve.h
@@ -0,0 +1,58 @@
+/* SPDX-License-Identifier: MIT
+ * Google Virtual Ethernet (gve) driver
+ * Version: 1.3.0
+ * Copyright (C) 2015-2022 Google, Inc.
+ * Copyright(C) 2022 Intel Corporation
+ */
+
+#ifndef _GVE_H_
+#define _GVE_H_
+
+#include "gve_desc.h"
+
+#define GVE_VERSION		"1.3.0"
+#define GVE_VERSION_PREFIX	"GVE-"
+
+#ifndef GOOGLE_VENDOR_ID
+#define GOOGLE_VENDOR_ID	0x1ae0
+#endif
+
+#define GVE_DEV_ID		0x0042
+
+#define GVE_REG_BAR		0
+#define GVE_DB_BAR		2
+
+/* 1 for management, 1 for rx, 1 for tx */
+#define GVE_MIN_MSIX		3
+
+/* PTYPEs are always 10 bits. */
+#define GVE_NUM_PTYPES		1024
+
+struct gve_irq_db {
+	rte_be32_t id;
+} ____cacheline_aligned;
+
+struct gve_ptype {
+	uint8_t l3_type;  /* `gve_l3_type` in gve_adminq.h */
+	uint8_t l4_type;  /* `gve_l4_type` in gve_adminq.h */
+};
+
+struct gve_ptype_lut {
+	struct gve_ptype ptypes[GVE_NUM_PTYPES];
+};
+
+enum gve_queue_format {
+	GVE_QUEUE_FORMAT_UNSPECIFIED = 0x0, /* default unspecified */
+	GVE_GQI_RDA_FORMAT	     = 0x1, /* GQI Raw Addressing */
+	GVE_GQI_QPL_FORMAT	     = 0x2, /* GQI Queue Page List */
+	GVE_DQO_RDA_FORMAT	     = 0x3, /* DQO Raw Addressing */
+};
+
+enum gve_state_flags_bit {
+	GVE_PRIV_FLAGS_ADMIN_QUEUE_OK		= 1,
+	GVE_PRIV_FLAGS_DEVICE_RESOURCES_OK	= 2,
+	GVE_PRIV_FLAGS_DEVICE_RINGS_OK		= 3,
+	GVE_PRIV_FLAGS_NAPI_ENABLED		= 4,
+};
+
+#endif /* _GVE_H_ */
diff --git a/drivers/net/gve/base/gve_adminq.c b/drivers/net/gve/base/gve_adminq.c
new file mode 100644
index 0000000000..2344100f1a
--- /dev/null
+++ b/drivers/net/gve/base/gve_adminq.c
@@ -0,0 +1,924 @@
+/* SPDX-License-Identifier: MIT
+ * Google Virtual Ethernet (gve) driver
+ * Version: 1.3.0
+ * Copyright (C) 2015-2022 Google, Inc.
+ * Copyright(C) 2022 Intel Corporation
+ */
+
+#include "gve_adminq.h"
+#include "gve_register.h"
+
+#define GVE_MAX_ADMINQ_RELEASE_CHECK	500
+#define GVE_ADMINQ_SLEEP_LEN		20
+#define GVE_MAX_ADMINQ_EVENT_COUNTER_CHECK	100
+
+#define GVE_DEVICE_OPTION_ERROR_FMT "%s option error:\n Expected: length=%d, feature_mask=%x.\n Actual: length=%d, feature_mask=%x."
+
+#define GVE_DEVICE_OPTION_TOO_BIG_FMT "Length of %s option larger than expected. Possible older version of guest driver."
+
+static
+struct gve_device_option *gve_get_next_option(struct gve_device_descriptor *descriptor,
+					      struct gve_device_option *option)
+{
+	uintptr_t option_end, descriptor_end;
+
+	option_end = (uintptr_t)option + sizeof(*option) + be16_to_cpu(option->option_length);
+	descriptor_end = (uintptr_t)descriptor + be16_to_cpu(descriptor->total_length);
+
+	return option_end > descriptor_end ? NULL : (struct gve_device_option *)option_end;
+}
+
+static
+void gve_parse_device_option(struct gve_priv *priv,
+			     struct gve_device_option *option,
+			     struct gve_device_option_gqi_rda **dev_op_gqi_rda,
+			     struct gve_device_option_gqi_qpl **dev_op_gqi_qpl,
+			     struct gve_device_option_dqo_rda **dev_op_dqo_rda,
+			     struct gve_device_option_jumbo_frames **dev_op_jumbo_frames)
+{
+	u32 req_feat_mask = be32_to_cpu(option->required_features_mask);
+	u16 option_length = be16_to_cpu(option->option_length);
+	u16 option_id = be16_to_cpu(option->option_id);
+
+	/* If the length or feature mask doesn't match, continue without
+	 * enabling the feature.
+	 */
+	switch (option_id) {
+	case GVE_DEV_OPT_ID_GQI_RAW_ADDRESSING:
+		if (option_length != GVE_DEV_OPT_LEN_GQI_RAW_ADDRESSING ||
+		    req_feat_mask != GVE_DEV_OPT_REQ_FEAT_MASK_GQI_RAW_ADDRESSING) {
+			PMD_DRV_LOG(WARNING, GVE_DEVICE_OPTION_ERROR_FMT,
+				    "Raw Addressing",
+				    GVE_DEV_OPT_LEN_GQI_RAW_ADDRESSING,
+				    GVE_DEV_OPT_REQ_FEAT_MASK_GQI_RAW_ADDRESSING,
+				    option_length, req_feat_mask);
+			break;
+		}
+
+		PMD_DRV_LOG(INFO, "Gqi raw addressing device option enabled.");
+		priv->queue_format = GVE_GQI_RDA_FORMAT;
+		break;
+	case GVE_DEV_OPT_ID_GQI_RDA:
+		if (option_length < sizeof(**dev_op_gqi_rda) ||
+		    req_feat_mask != GVE_DEV_OPT_REQ_FEAT_MASK_GQI_RDA) {
+			PMD_DRV_LOG(WARNING, GVE_DEVICE_OPTION_ERROR_FMT,
+				    "GQI RDA", (int)sizeof(**dev_op_gqi_rda),
+				    GVE_DEV_OPT_REQ_FEAT_MASK_GQI_RDA,
+				    option_length, req_feat_mask);
+			break;
+		}
+
+		if (option_length > sizeof(**dev_op_gqi_rda)) {
+			PMD_DRV_LOG(WARNING,
+				    GVE_DEVICE_OPTION_TOO_BIG_FMT, "GQI RDA");
+		}
+		*dev_op_gqi_rda = RTE_PTR_ADD(option, sizeof(*option));
+		break;
+	case GVE_DEV_OPT_ID_GQI_QPL:
+		if (option_length < sizeof(**dev_op_gqi_qpl) ||
+		    req_feat_mask != GVE_DEV_OPT_REQ_FEAT_MASK_GQI_QPL) {
+			PMD_DRV_LOG(WARNING, GVE_DEVICE_OPTION_ERROR_FMT,
+				    "GQI QPL", (int)sizeof(**dev_op_gqi_qpl),
+				    GVE_DEV_OPT_REQ_FEAT_MASK_GQI_QPL,
+				    option_length, req_feat_mask);
+			break;
+		}
+
+		if (option_length > sizeof(**dev_op_gqi_qpl)) {
+			PMD_DRV_LOG(WARNING,
+				    GVE_DEVICE_OPTION_TOO_BIG_FMT, "GQI QPL");
+		}
+		*dev_op_gqi_qpl = RTE_PTR_ADD(option, sizeof(*option));
+		break;
+	case GVE_DEV_OPT_ID_DQO_RDA:
+		if (option_length < sizeof(**dev_op_dqo_rda) ||
+		    req_feat_mask != GVE_DEV_OPT_REQ_FEAT_MASK_DQO_RDA) {
+			PMD_DRV_LOG(WARNING, GVE_DEVICE_OPTION_ERROR_FMT,
+				    "DQO RDA", (int)sizeof(**dev_op_dqo_rda),
+				    GVE_DEV_OPT_REQ_FEAT_MASK_DQO_RDA,
+				    option_length, req_feat_mask);
+			break;
+		}
+
+		if (option_length > sizeof(**dev_op_dqo_rda)) {
+			PMD_DRV_LOG(WARNING,
+				    GVE_DEVICE_OPTION_TOO_BIG_FMT, "DQO RDA");
+		}
+		*dev_op_dqo_rda = RTE_PTR_ADD(option, sizeof(*option));
+		break;
+	case GVE_DEV_OPT_ID_JUMBO_FRAMES:
+		if (option_length < sizeof(**dev_op_jumbo_frames) ||
+		    req_feat_mask != GVE_DEV_OPT_REQ_FEAT_MASK_JUMBO_FRAMES) {
+			PMD_DRV_LOG(WARNING, GVE_DEVICE_OPTION_ERROR_FMT,
+				    "Jumbo Frames",
+				    (int)sizeof(**dev_op_jumbo_frames),
+				    GVE_DEV_OPT_REQ_FEAT_MASK_JUMBO_FRAMES,
+				    option_length, req_feat_mask);
+			break;
+		}
+
+		if (option_length > sizeof(**dev_op_jumbo_frames)) {
+			PMD_DRV_LOG(WARNING,
+				    GVE_DEVICE_OPTION_TOO_BIG_FMT,
+				    "Jumbo Frames");
+		}
+		*dev_op_jumbo_frames = RTE_PTR_ADD(option, sizeof(*option));
+		break;
+	default:
+		/* If we don't recognize the option just continue
+		 * without doing anything.
+		 */
+		PMD_DRV_LOG(DEBUG, "Unrecognized device option 0x%hx not enabled.",
+			    option_id);
+	}
+}
+
+/* Process all device options for a given describe device call. */
+static int
+gve_process_device_options(struct gve_priv *priv,
+			   struct gve_device_descriptor *descriptor,
+			   struct gve_device_option_gqi_rda **dev_op_gqi_rda,
+			   struct gve_device_option_gqi_qpl **dev_op_gqi_qpl,
+			   struct gve_device_option_dqo_rda **dev_op_dqo_rda,
+			   struct gve_device_option_jumbo_frames **dev_op_jumbo_frames)
+{
+	const int num_options = be16_to_cpu(descriptor->num_device_options);
+	struct gve_device_option *dev_opt;
+	int i;
+
+	/* The options struct directly follows the device descriptor. */
+	dev_opt = RTE_PTR_ADD(descriptor, sizeof(*descriptor));
+	for (i = 0; i < num_options; i++) {
+		struct gve_device_option *next_opt;
+
+		next_opt = gve_get_next_option(descriptor, dev_opt);
+		if (!next_opt) {
+			PMD_DRV_LOG(ERR,
+				    "options exceed device_descriptor's total length.");
+			return -EINVAL;
+		}
+
+		gve_parse_device_option(priv, dev_opt,
+					dev_op_gqi_rda, dev_op_gqi_qpl,
+					dev_op_dqo_rda, dev_op_jumbo_frames);
+		dev_opt = next_opt;
+	}
+
+	return 0;
+}
+
+int gve_adminq_alloc(struct gve_priv *priv)
+{
+	priv->adminq = gve_alloc_dma_mem(&priv->adminq_dma_mem, PAGE_SIZE);
+	if (unlikely(!priv->adminq))
+		return -ENOMEM;
+
+	priv->adminq_mask = (PAGE_SIZE / sizeof(union gve_adminq_command)) - 1;
+	priv->adminq_prod_cnt = 0;
+	priv->adminq_cmd_fail = 0;
+	priv->adminq_timeouts = 0;
+	priv->adminq_describe_device_cnt = 0;
+	priv->adminq_cfg_device_resources_cnt = 0;
+	priv->adminq_register_page_list_cnt = 0;
+	priv->adminq_unregister_page_list_cnt = 0;
+	priv->adminq_create_tx_queue_cnt = 0;
+	priv->adminq_create_rx_queue_cnt = 0;
+	priv->adminq_destroy_tx_queue_cnt = 0;
+	priv->adminq_destroy_rx_queue_cnt = 0;
+	priv->adminq_dcfg_device_resources_cnt = 0;
+	priv->adminq_set_driver_parameter_cnt = 0;
+	priv->adminq_report_stats_cnt = 0;
+	priv->adminq_report_link_speed_cnt = 0;
+	priv->adminq_get_ptype_map_cnt = 0;
+
+	/* Setup Admin queue with the device */
+	iowrite32be(priv->adminq_dma_mem.pa / PAGE_SIZE,
+		    &priv->reg_bar0->adminq_pfn);
+
+	gve_set_admin_queue_ok(priv);
+	return 0;
+}
+
+void gve_adminq_release(struct gve_priv *priv)
+{
+	int i = 0;
+
+	/* Tell the device the adminq is leaving */
+	iowrite32be(0x0, &priv->reg_bar0->adminq_pfn);
+	while (ioread32be(&priv->reg_bar0->adminq_pfn)) {
+		/* If this is reached the device is unrecoverable and still
+		 * holding memory. Continue looping to avoid memory corruption,
+		 * but WARN so it is visible what is going on.
+		 */
+		if (i == GVE_MAX_ADMINQ_RELEASE_CHECK)
+			PMD_DRV_LOG(WARNING, "Unrecoverable platform error!");
+		i++;
+		msleep(GVE_ADMINQ_SLEEP_LEN);
+	}
+	gve_clear_device_rings_ok(priv);
+	gve_clear_device_resources_ok(priv);
+	gve_clear_admin_queue_ok(priv);
+}
+
+void gve_adminq_free(struct gve_priv *priv)
+{
+	if (!gve_get_admin_queue_ok(priv))
+		return;
+	gve_adminq_release(priv);
+	gve_free_dma_mem(&priv->adminq_dma_mem);
+	gve_clear_admin_queue_ok(priv);
+}
+
+static void gve_adminq_kick_cmd(struct gve_priv *priv, u32 prod_cnt)
+{
+	iowrite32be(prod_cnt, &priv->reg_bar0->adminq_doorbell);
+}
+
+static bool gve_adminq_wait_for_cmd(struct gve_priv *priv, u32 prod_cnt)
+{
+	int i;
+
+	for (i = 0; i < GVE_MAX_ADMINQ_EVENT_COUNTER_CHECK; i++) {
+		if (ioread32be(&priv->reg_bar0->adminq_event_counter)
+		    == prod_cnt)
+			return true;
+		msleep(GVE_ADMINQ_SLEEP_LEN);
+	}
+
+	return false;
+}
+
+static int gve_adminq_parse_err(struct gve_priv *priv, u32 status)
+{
+	if (status != GVE_ADMINQ_COMMAND_PASSED &&
+	    status != GVE_ADMINQ_COMMAND_UNSET) {
+		PMD_DRV_LOG(ERR, "AQ command failed with status %d", status);
+		priv->adminq_cmd_fail++;
+	}
+	switch (status) {
+	case GVE_ADMINQ_COMMAND_PASSED:
+		return 0;
+	case GVE_ADMINQ_COMMAND_UNSET:
+		PMD_DRV_LOG(ERR, "parse_aq_err: err and status both unset, this should not be possible.");
+		return -EINVAL;
+	case GVE_ADMINQ_COMMAND_ERROR_ABORTED:
+	case GVE_ADMINQ_COMMAND_ERROR_CANCELLED:
+	case GVE_ADMINQ_COMMAND_ERROR_DATALOSS:
+	case GVE_ADMINQ_COMMAND_ERROR_FAILED_PRECONDITION:
+	case GVE_ADMINQ_COMMAND_ERROR_UNAVAILABLE:
+		return -EAGAIN;
+	case GVE_ADMINQ_COMMAND_ERROR_ALREADY_EXISTS:
+	case GVE_ADMINQ_COMMAND_ERROR_INTERNAL_ERROR:
+	case GVE_ADMINQ_COMMAND_ERROR_INVALID_ARGUMENT:
+	case GVE_ADMINQ_COMMAND_ERROR_NOT_FOUND:
+	case GVE_ADMINQ_COMMAND_ERROR_OUT_OF_RANGE:
+	case GVE_ADMINQ_COMMAND_ERROR_UNKNOWN_ERROR:
+		return -EINVAL;
+	case GVE_ADMINQ_COMMAND_ERROR_DEADLINE_EXCEEDED:
+		return -ETIME;
+	case GVE_ADMINQ_COMMAND_ERROR_PERMISSION_DENIED:
+	case GVE_ADMINQ_COMMAND_ERROR_UNAUTHENTICATED:
+		return -EACCES;
+	case GVE_ADMINQ_COMMAND_ERROR_RESOURCE_EXHAUSTED:
+		return -ENOMEM;
+	case GVE_ADMINQ_COMMAND_ERROR_UNIMPLEMENTED:
+		return -ENOTSUP;
+	default:
+		PMD_DRV_LOG(ERR, "parse_aq_err: unknown status code %d",
+			    status);
+		return -EINVAL;
+	}
+}
+
+/* Flushes all AQ commands currently queued and waits for them to complete.
+ * If there are failures, it will return the first error.
+ */
+static int gve_adminq_kick_and_wait(struct gve_priv *priv)
+{
+	u32 tail, head;
+	u32 i;
+
+	tail = ioread32be(&priv->reg_bar0->adminq_event_counter);
+	head = priv->adminq_prod_cnt;
+
+	gve_adminq_kick_cmd(priv, head);
+	if (!gve_adminq_wait_for_cmd(priv, head)) {
+		PMD_DRV_LOG(ERR, "AQ commands timed out, need to reset AQ");
+		priv->adminq_timeouts++;
+		return -ENOTRECOVERABLE;
+	}
+
+	for (i = tail; i < head; i++) {
+		union gve_adminq_command *cmd;
+		u32 status, err;
+
+		cmd = &priv->adminq[i & priv->adminq_mask];
+		status = be32_to_cpu(READ_ONCE32(cmd->status));
+		err = gve_adminq_parse_err(priv, status);
+		if (err)
+			/* Return the first error if we failed. */
+			return err;
+	}
+
+	return 0;
+}
+
+/* This function is not threadsafe - the caller is responsible for any
+ * necessary locks.
+ */
+static int gve_adminq_issue_cmd(struct gve_priv *priv,
+				union gve_adminq_command *cmd_orig)
+{
+	union gve_adminq_command *cmd;
+	u32 opcode;
+	u32 tail;
+
+	tail = ioread32be(&priv->reg_bar0->adminq_event_counter);
+
+	/* Check if next command will overflow the buffer. */
+	if (((priv->adminq_prod_cnt + 1) & priv->adminq_mask) ==
+	    (tail & priv->adminq_mask)) {
+		int err;
+
+		/* Flush existing commands to make room. */
+		err = gve_adminq_kick_and_wait(priv);
+		if (err)
+			return err;
+
+		/* Retry. */
+		tail = ioread32be(&priv->reg_bar0->adminq_event_counter);
+		if (((priv->adminq_prod_cnt + 1) & priv->adminq_mask) ==
+		    (tail & priv->adminq_mask)) {
+			/* This should never happen. We just flushed the
+			 * command queue so there should be enough space.
+			 */
+			return -ENOMEM;
+		}
+	}
+
+	cmd = &priv->adminq[priv->adminq_prod_cnt & priv->adminq_mask];
+	priv->adminq_prod_cnt++;
+
+	memcpy(cmd, cmd_orig, sizeof(*cmd_orig));
+	opcode = be32_to_cpu(READ_ONCE32(cmd->opcode));
+
+	switch (opcode) {
+	case GVE_ADMINQ_DESCRIBE_DEVICE:
+		priv->adminq_describe_device_cnt++;
+		break;
+	case GVE_ADMINQ_CONFIGURE_DEVICE_RESOURCES:
+		priv->adminq_cfg_device_resources_cnt++;
+		break;
+	case GVE_ADMINQ_REGISTER_PAGE_LIST:
+		priv->adminq_register_page_list_cnt++;
+		break;
+	case GVE_ADMINQ_UNREGISTER_PAGE_LIST:
+		priv->adminq_unregister_page_list_cnt++;
+		break;
+	case GVE_ADMINQ_CREATE_TX_QUEUE:
+		priv->adminq_create_tx_queue_cnt++;
+		break;
+	case GVE_ADMINQ_CREATE_RX_QUEUE:
+		priv->adminq_create_rx_queue_cnt++;
+		break;
+	case GVE_ADMINQ_DESTROY_TX_QUEUE:
+		priv->adminq_destroy_tx_queue_cnt++;
+		break;
+	case GVE_ADMINQ_DESTROY_RX_QUEUE:
+		priv->adminq_destroy_rx_queue_cnt++;
+		break;
+	case GVE_ADMINQ_DECONFIGURE_DEVICE_RESOURCES:
+		priv->adminq_dcfg_device_resources_cnt++;
+		break;
+	case GVE_ADMINQ_SET_DRIVER_PARAMETER:
+		priv->adminq_set_driver_parameter_cnt++;
+		break;
+	case GVE_ADMINQ_REPORT_STATS:
+		priv->adminq_report_stats_cnt++;
+		break;
+	case GVE_ADMINQ_REPORT_LINK_SPEED:
+		priv->adminq_report_link_speed_cnt++;
+		break;
+	case GVE_ADMINQ_GET_PTYPE_MAP:
+		priv->adminq_get_ptype_map_cnt++;
+		break;
+	default:
+		PMD_DRV_LOG(ERR, "unknown AQ command opcode %d", opcode);
+	}
+
+	return 0;
+}
+
+/* This function is not threadsafe - the caller is responsible for any
+ * necessary locks.
+ * The caller is also responsible for making sure there are no commands
+ * waiting to be executed.
+ */
+static int gve_adminq_execute_cmd(struct gve_priv *priv,
+				  union gve_adminq_command *cmd_orig)
+{
+	u32 tail, head;
+	int err;
+
+	tail = ioread32be(&priv->reg_bar0->adminq_event_counter);
+	head = priv->adminq_prod_cnt;
+	if (tail != head)
+		/* This is not a valid path */
+		return -EINVAL;
+
+	err = gve_adminq_issue_cmd(priv, cmd_orig);
+	if (err)
+		return err;
+
+	return gve_adminq_kick_and_wait(priv);
+}
+
+/* The device specifies that the management vector can either be the first irq
+ * or the last irq. ntfy_blk_msix_base_idx indicates the first irq assigned to
+ * the ntfy blks. It if is 0 then the management vector is last, if it is 1 then
+ * the management vector is first.
+ *
+ * gve arranges the msix vectors so that the management vector is last.
+ */
+#define GVE_NTFY_BLK_BASE_MSIX_IDX	0
+int gve_adminq_configure_device_resources(struct gve_priv *priv,
+					  dma_addr_t counter_array_bus_addr,
+					  u32 num_counters,
+					  dma_addr_t db_array_bus_addr,
+					  u32 num_ntfy_blks)
+{
+	union gve_adminq_command cmd;
+
+	memset(&cmd, 0, sizeof(cmd));
+	cmd.opcode = cpu_to_be32(GVE_ADMINQ_CONFIGURE_DEVICE_RESOURCES);
+	cmd.configure_device_resources =
+		(struct gve_adminq_configure_device_resources) {
+		.counter_array = cpu_to_be64(counter_array_bus_addr),
+		.num_counters = cpu_to_be32(num_counters),
+		.irq_db_addr = cpu_to_be64(db_array_bus_addr),
+		.num_irq_dbs = cpu_to_be32(num_ntfy_blks),
+		.irq_db_stride = cpu_to_be32(sizeof(*priv->irq_dbs)),
+		.ntfy_blk_msix_base_idx =
+					cpu_to_be32(GVE_NTFY_BLK_BASE_MSIX_IDX),
+		.queue_format = priv->queue_format,
+	};
+
+	return gve_adminq_execute_cmd(priv, &cmd);
+}
+
+int gve_adminq_deconfigure_device_resources(struct gve_priv *priv)
+{
+	union gve_adminq_command cmd;
+
+	memset(&cmd, 0, sizeof(cmd));
+	cmd.opcode = cpu_to_be32(GVE_ADMINQ_DECONFIGURE_DEVICE_RESOURCES);
+
+	return gve_adminq_execute_cmd(priv, &cmd);
+}
+
+static int gve_adminq_create_tx_queue(struct gve_priv *priv, u32 queue_index)
+{
+	struct gve_tx_queue *txq = priv->txqs[queue_index];
+	union gve_adminq_command cmd;
+
+	memset(&cmd, 0, sizeof(cmd));
+	cmd.opcode = cpu_to_be32(GVE_ADMINQ_CREATE_TX_QUEUE);
+	cmd.create_tx_queue = (struct gve_adminq_create_tx_queue) {
+		.queue_id = cpu_to_be32(queue_index),
+		.queue_resources_addr =
+			cpu_to_be64(txq->qres_mz->iova),
+		.tx_ring_addr = cpu_to_be64(txq->tx_ring_phys_addr),
+		.ntfy_id = cpu_to_be32(txq->ntfy_id),
+	};
+
+	if (gve_is_gqi(priv)) {
+		u32 qpl_id = priv->queue_format == GVE_GQI_RDA_FORMAT ?
+			GVE_RAW_ADDRESSING_QPL_ID : txq->qpl->id;
+
+		cmd.create_tx_queue.queue_page_list_id = cpu_to_be32(qpl_id);
+	} else {
+		cmd.create_tx_queue.tx_ring_size =
+			cpu_to_be16(txq->nb_tx_desc);
+		cmd.create_tx_queue.tx_comp_ring_addr =
+			cpu_to_be64(txq->complq->tx_ring_phys_addr);
+		cmd.create_tx_queue.tx_comp_ring_size =
+			cpu_to_be16(priv->tx_compq_size);
+	}
+
+	return gve_adminq_issue_cmd(priv, &cmd);
+}
+
+int gve_adminq_create_tx_queues(struct gve_priv *priv, u32 num_queues)
+{
+	int err;
+	u32 i;
+
+	for (i = 0; i < num_queues; i++) {
+		err = gve_adminq_create_tx_queue(priv, i);
+		if (err)
+			return err;
+	}
+
+	return gve_adminq_kick_and_wait(priv);
+}
+
+static int gve_adminq_create_rx_queue(struct gve_priv *priv, u32 queue_index)
+{
+	struct gve_rx_queue *rxq = priv->rxqs[queue_index];
+	union gve_adminq_command cmd;
+
+	memset(&cmd, 0, sizeof(cmd));
+	cmd.opcode = cpu_to_be32(GVE_ADMINQ_CREATE_RX_QUEUE);
+	cmd.create_rx_queue = (struct gve_adminq_create_rx_queue) {
+		.queue_id = cpu_to_be32(queue_index),
+		.ntfy_id = cpu_to_be32(rxq->ntfy_id),
+		.queue_resources_addr = cpu_to_be64(rxq->qres_mz->iova),
+	};
+
+	if (gve_is_gqi(priv)) {
+		u32 qpl_id = priv->queue_format == GVE_GQI_RDA_FORMAT ?
+			GVE_RAW_ADDRESSING_QPL_ID : rxq->qpl->id;
+
+		cmd.create_rx_queue.rx_desc_ring_addr =
+			cpu_to_be64(rxq->mz->iova),
+		cmd.create_rx_queue.rx_data_ring_addr =
+			cpu_to_be64(rxq->data_mz->iova),
+		cmd.create_rx_queue.index = cpu_to_be32(queue_index);
+		cmd.create_rx_queue.queue_page_list_id = cpu_to_be32(qpl_id);
+		cmd.create_rx_queue.packet_buffer_size = cpu_to_be16(rxq->rx_buf_len);
+	} else {
+		cmd.create_rx_queue.rx_ring_size =
+			cpu_to_be16(priv->rx_desc_cnt);
+		cmd.create_rx_queue.rx_desc_ring_addr =
+			cpu_to_be64(rxq->rx_ring_phys_addr);
+		cmd.create_rx_queue.rx_data_ring_addr =
+			cpu_to_be64(rxq->bufq->rx_ring_phys_addr);
+		cmd.create_rx_queue.packet_buffer_size =
+			cpu_to_be16(rxq->rx_buf_len);
+		cmd.create_rx_queue.rx_buff_ring_size =
+			cpu_to_be16(priv->rx_bufq_size);
+		cmd.create_rx_queue.enable_rsc = !!(priv->enable_rsc);
+	}
+
+	return gve_adminq_issue_cmd(priv, &cmd);
+}
+
+int gve_adminq_create_rx_queues(struct gve_priv *priv, u32 num_queues)
+{
+	int err;
+	u32 i;
+
+	for (i = 0; i < num_queues; i++) {
+		err = gve_adminq_create_rx_queue(priv, i);
+		if (err)
+			return err;
+	}
+
+	return gve_adminq_kick_and_wait(priv);
+}
+
+static int gve_adminq_destroy_tx_queue(struct gve_priv *priv, u32 queue_index)
+{
+	union gve_adminq_command cmd;
+	int err;
+
+	memset(&cmd, 0, sizeof(cmd));
+	cmd.opcode = cpu_to_be32(GVE_ADMINQ_DESTROY_TX_QUEUE);
+	cmd.destroy_tx_queue = (struct gve_adminq_destroy_tx_queue) {
+		.queue_id = cpu_to_be32(queue_index),
+	};
+
+	err = gve_adminq_issue_cmd(priv, &cmd);
+	if (err)
+		return err;
+
+	return 0;
+}
+
+int gve_adminq_destroy_tx_queues(struct gve_priv *priv, u32 num_queues)
+{
+	int err;
+	u32 i;
+
+	for (i = 0; i < num_queues; i++) {
+		err = gve_adminq_destroy_tx_queue(priv, i);
+		if (err)
+			return err;
+	}
+
+	return gve_adminq_kick_and_wait(priv);
+}
+
+static int gve_adminq_destroy_rx_queue(struct gve_priv *priv, u32 queue_index)
+{
+	union gve_adminq_command cmd;
+	int err;
+
+	memset(&cmd, 0, sizeof(cmd));
+	cmd.opcode = cpu_to_be32(GVE_ADMINQ_DESTROY_RX_QUEUE);
+	cmd.destroy_rx_queue = (struct gve_adminq_destroy_rx_queue) {
+		.queue_id = cpu_to_be32(queue_index),
+	};
+
+	err = gve_adminq_issue_cmd(priv, &cmd);
+	if (err)
+		return err;
+
+	return 0;
+}
+
+int gve_adminq_destroy_rx_queues(struct gve_priv *priv, u32 num_queues)
+{
+	int err;
+	u32 i;
+
+	for (i = 0; i < num_queues; i++) {
+		err = gve_adminq_destroy_rx_queue(priv, i);
+		if (err)
+			return err;
+	}
+
+	return gve_adminq_kick_and_wait(priv);
+}
+
+static int gve_set_desc_cnt(struct gve_priv *priv,
+			    struct gve_device_descriptor *descriptor)
+{
+	priv->tx_desc_cnt = be16_to_cpu(descriptor->tx_queue_entries);
+	if (priv->tx_desc_cnt * sizeof(priv->txqs[0]->tx_desc_ring[0])
+	    < PAGE_SIZE) {
+		PMD_DRV_LOG(ERR, "Tx desc count %d too low", priv->tx_desc_cnt);
+		return -EINVAL;
+	}
+	priv->rx_desc_cnt = be16_to_cpu(descriptor->rx_queue_entries);
+	if (priv->rx_desc_cnt * sizeof(priv->rxqs[0]->rx_desc_ring[0])
+	    < PAGE_SIZE) {
+		PMD_DRV_LOG(ERR, "Rx desc count %d too low", priv->rx_desc_cnt);
+		return -EINVAL;
+	}
+	return 0;
+}
+
+static int
+gve_set_desc_cnt_dqo(struct gve_priv *priv,
+		     const struct gve_device_descriptor *descriptor,
+		     const struct gve_device_option_dqo_rda *dev_op_dqo_rda)
+{
+	priv->tx_desc_cnt = be16_to_cpu(descriptor->tx_queue_entries);
+	priv->tx_compq_size = be16_to_cpu(dev_op_dqo_rda->tx_comp_ring_entries);
+	priv->rx_desc_cnt = be16_to_cpu(descriptor->rx_queue_entries);
+	priv->rx_bufq_size = be16_to_cpu(dev_op_dqo_rda->rx_buff_ring_entries);
+
+	return 0;
+}
+
+static void gve_enable_supported_features(struct gve_priv *priv,
+					  u32 supported_features_mask,
+					  const struct gve_device_option_jumbo_frames
+						  *dev_op_jumbo_frames)
+{
+	/* Before control reaches this point, the page-size-capped max MTU from
+	 * the gve_device_descriptor field has already been stored in
+	 * priv->dev->max_mtu. We overwrite it with the true max MTU below.
+	 */
+	if (dev_op_jumbo_frames &&
+	    (supported_features_mask & GVE_SUP_JUMBO_FRAMES_MASK)) {
+		PMD_DRV_LOG(INFO, "JUMBO FRAMES device option enabled.");
+		priv->max_mtu = be16_to_cpu(dev_op_jumbo_frames->max_mtu);
+	}
+}
+
+int gve_adminq_describe_device(struct gve_priv *priv)
+{
+	struct gve_device_option_jumbo_frames *dev_op_jumbo_frames = NULL;
+	struct gve_device_option_gqi_rda *dev_op_gqi_rda = NULL;
+	struct gve_device_option_gqi_qpl *dev_op_gqi_qpl = NULL;
+	struct gve_device_option_dqo_rda *dev_op_dqo_rda = NULL;
+	struct gve_device_descriptor *descriptor;
+	struct gve_dma_mem descriptor_dma_mem;
+	u32 supported_features_mask = 0;
+	union gve_adminq_command cmd;
+	int err = 0;
+	u8 *mac;
+	u16 mtu;
+
+	memset(&cmd, 0, sizeof(cmd));
+	descriptor = gve_alloc_dma_mem(&descriptor_dma_mem, PAGE_SIZE);
+	if (!descriptor)
+		return -ENOMEM;
+	cmd.opcode = cpu_to_be32(GVE_ADMINQ_DESCRIBE_DEVICE);
+	cmd.describe_device.device_descriptor_addr =
+					cpu_to_be64(descriptor_dma_mem.pa);
+	cmd.describe_device.device_descriptor_version =
+			cpu_to_be32(GVE_ADMINQ_DEVICE_DESCRIPTOR_VERSION);
+	cmd.describe_device.available_length = cpu_to_be32(PAGE_SIZE);
+
+	err = gve_adminq_execute_cmd(priv, &cmd);
+	if (err)
+		goto free_device_descriptor;
+
+	err = gve_process_device_options(priv, descriptor, &dev_op_gqi_rda,
+					 &dev_op_gqi_qpl, &dev_op_dqo_rda,
+					 &dev_op_jumbo_frames);
+	if (err)
+		goto free_device_descriptor;
+
+	/* If the GQI_RAW_ADDRESSING option is not enabled and the queue format
+	 * is not set to GqiRda, choose the queue format in a priority order:
+	 * DqoRda, GqiRda, GqiQpl. Use GqiQpl as default.
+	 */
+	if (dev_op_dqo_rda) {
+		priv->queue_format = GVE_DQO_RDA_FORMAT;
+		PMD_DRV_LOG(INFO, "Driver is running with DQO RDA queue format.");
+		supported_features_mask =
+			be32_to_cpu(dev_op_dqo_rda->supported_features_mask);
+	} else if (dev_op_gqi_rda) {
+		priv->queue_format = GVE_GQI_RDA_FORMAT;
+		PMD_DRV_LOG(INFO, "Driver is running with GQI RDA queue format.");
+		supported_features_mask =
+			be32_to_cpu(dev_op_gqi_rda->supported_features_mask);
+	} else if (priv->queue_format == GVE_GQI_RDA_FORMAT) {
+		PMD_DRV_LOG(INFO, "Driver is running with GQI RDA queue format.");
+	} else {
+		priv->queue_format = GVE_GQI_QPL_FORMAT;
+		if (dev_op_gqi_qpl)
+			supported_features_mask =
+				be32_to_cpu(dev_op_gqi_qpl->supported_features_mask);
+		PMD_DRV_LOG(INFO, "Driver is running with GQI QPL queue format.");
+	}
+	if (gve_is_gqi(priv)) {
+		err = gve_set_desc_cnt(priv, descriptor);
+	} else {
+		/* DQO supports LRO. */
+		err = gve_set_desc_cnt_dqo(priv, descriptor, dev_op_dqo_rda);
+	}
+	if (err)
+		goto free_device_descriptor;
+
+	priv->max_registered_pages =
+				be64_to_cpu(descriptor->max_registered_pages);
+	mtu = be16_to_cpu(descriptor->mtu);
+	if (mtu < ETH_MIN_MTU) {
+		PMD_DRV_LOG(ERR, "MTU %d below minimum MTU", mtu);
+		err = -EINVAL;
+		goto free_device_descriptor;
+	}
+	priv->max_mtu = mtu;
+	priv->num_event_counters = be16_to_cpu(descriptor->counters);
+	rte_memcpy(priv->dev_addr.addr_bytes, descriptor->mac, ETH_ALEN);
+	mac = descriptor->mac;
+	PMD_DRV_LOG(INFO, "MAC addr: %02x:%02x:%02x:%02x:%02x:%02x",
+		    mac[0], mac[1], mac[2], mac[3], mac[4], mac[5]);
+	priv->tx_pages_per_qpl = be16_to_cpu(descriptor->tx_pages_per_qpl);
+	priv->rx_data_slot_cnt = be16_to_cpu(descriptor->rx_pages_per_qpl);
+
+	if (gve_is_gqi(priv) && priv->rx_data_slot_cnt < priv->rx_desc_cnt) {
+		PMD_DRV_LOG(ERR,
+			    "rx_data_slot_cnt cannot be smaller than rx_desc_cnt, setting rx_desc_cnt down to %d",
+			    priv->rx_data_slot_cnt);
+		priv->rx_desc_cnt = priv->rx_data_slot_cnt;
+	}
+	priv->default_num_queues = be16_to_cpu(descriptor->default_num_queues);
+
+	gve_enable_supported_features(priv, supported_features_mask,
+				      dev_op_jumbo_frames);
+
+free_device_descriptor:
+	gve_free_dma_mem(&descriptor_dma_mem);
+	return err;
+}
+
+int gve_adminq_register_page_list(struct gve_priv *priv,
+				  struct gve_queue_page_list *qpl)
+{
+	struct gve_dma_mem page_list_dma_mem;
+	u32 num_entries = qpl->num_entries;
+	u32 size = num_entries * sizeof(qpl->page_buses[0]);
+	union gve_adminq_command cmd;
+	__be64 *page_list;
+	int err;
+	u32 i;
+
+	memset(&cmd, 0, sizeof(cmd));
+	page_list = gve_alloc_dma_mem(&page_list_dma_mem, size);
+	if (!page_list)
+		return -ENOMEM;
+
+	for (i = 0; i < num_entries; i++)
+		page_list[i] = cpu_to_be64(qpl->page_buses[i]);
+
+	cmd.opcode = cpu_to_be32(GVE_ADMINQ_REGISTER_PAGE_LIST);
+	cmd.reg_page_list = (struct gve_adminq_register_page_list) {
+		.page_list_id = cpu_to_be32(qpl->id),
+		.num_pages = cpu_to_be32(num_entries),
+		.page_address_list_addr = cpu_to_be64(page_list_dma_mem.pa),
+	};
+
+	err = gve_adminq_execute_cmd(priv, &cmd);
+	gve_free_dma_mem(&page_list_dma_mem);
+	return err;
+}
+
+int gve_adminq_unregister_page_list(struct gve_priv *priv, u32 page_list_id)
+{
+	union gve_adminq_command cmd;
+
+	memset(&cmd, 0, sizeof(cmd));
+	cmd.opcode = cpu_to_be32(GVE_ADMINQ_UNREGISTER_PAGE_LIST);
+	cmd.unreg_page_list = (struct gve_adminq_unregister_page_list) {
+		.page_list_id = cpu_to_be32(page_list_id),
+	};
+
+	return gve_adminq_execute_cmd(priv, &cmd);
+}
+
+int gve_adminq_set_mtu(struct gve_priv *priv, u64 mtu)
+{
+	union gve_adminq_command cmd;
+
+	memset(&cmd, 0, sizeof(cmd));
+	cmd.opcode = cpu_to_be32(GVE_ADMINQ_SET_DRIVER_PARAMETER);
+	cmd.set_driver_param = (struct gve_adminq_set_driver_parameter) {
+		.parameter_type = cpu_to_be32(GVE_SET_PARAM_MTU),
+		.parameter_value = cpu_to_be64(mtu),
+	};
+
+	return gve_adminq_execute_cmd(priv, &cmd);
+}
+
+int gve_adminq_report_stats(struct gve_priv *priv, u64 stats_report_len,
+			    dma_addr_t stats_report_addr, u64 interval)
+{
+	union gve_adminq_command cmd;
+
+	memset(&cmd, 0, sizeof(cmd));
+	cmd.opcode = cpu_to_be32(GVE_ADMINQ_REPORT_STATS);
+	cmd.report_stats = (struct gve_adminq_report_stats) {
+		.stats_report_len = cpu_to_be64(stats_report_len),
+		.stats_report_addr = cpu_to_be64(stats_report_addr),
+		.interval = cpu_to_be64(interval),
+	};
+
+	return gve_adminq_execute_cmd(priv, &cmd);
+}
+
+int gve_adminq_report_link_speed(struct gve_priv *priv)
+{
+	struct gve_dma_mem link_speed_region_dma_mem;
+	union gve_adminq_command gvnic_cmd;
+	u64 *link_speed_region;
+	int err;
+
+	link_speed_region = gve_alloc_dma_mem(&link_speed_region_dma_mem,
+					      sizeof(*link_speed_region));
+
+	if (!link_speed_region)
+		return -ENOMEM;
+
+	memset(&gvnic_cmd, 0, sizeof(gvnic_cmd));
+	gvnic_cmd.opcode = cpu_to_be32(GVE_ADMINQ_REPORT_LINK_SPEED);
+	gvnic_cmd.report_link_speed.link_speed_address =
+		cpu_to_be64(link_speed_region_dma_mem.pa);
+
+	err = gve_adminq_execute_cmd(priv, &gvnic_cmd);
+
+	priv->link_speed = be64_to_cpu(*link_speed_region);
+	gve_free_dma_mem(&link_speed_region_dma_mem);
+	return err;
+}
+
+int gve_adminq_get_ptype_map_dqo(struct gve_priv *priv,
+				 struct gve_ptype_lut *ptype_lut)
+{
+	struct gve_dma_mem ptype_map_dma_mem;
+	struct gve_ptype_map *ptype_map;
+	union gve_adminq_command cmd;
+	int err = 0;
+	int i;
+
+	memset(&cmd, 0, sizeof(cmd));
+	ptype_map = gve_alloc_dma_mem(&ptype_map_dma_mem, sizeof(*ptype_map));
+	if (!ptype_map)
+		return -ENOMEM;
+
+	cmd.opcode = cpu_to_be32(GVE_ADMINQ_GET_PTYPE_MAP);
+	cmd.get_ptype_map = (struct gve_adminq_get_ptype_map) {
+		.ptype_map_len = cpu_to_be64(sizeof(*ptype_map)),
+		.ptype_map_addr = cpu_to_be64(ptype_map_dma_mem.pa),
+	};
+
+	err = gve_adminq_execute_cmd(priv, &cmd);
+	if (err)
+		goto err;
+
+	/* Populate ptype_lut. */
+	for (i = 0; i < GVE_NUM_PTYPES; i++) {
+		ptype_lut->ptypes[i].l3_type =
+			ptype_map->ptypes[i].l3_type;
+		ptype_lut->ptypes[i].l4_type =
+			ptype_map->ptypes[i].l4_type;
+	}
+err:
+	gve_free_dma_mem(&ptype_map_dma_mem);
+	return err;
+}
diff --git a/drivers/net/gve/base/gve_adminq.h b/drivers/net/gve/base/gve_adminq.h
new file mode 100644
index 0000000000..c7114cc883
--- /dev/null
+++ b/drivers/net/gve/base/gve_adminq.h
@@ -0,0 +1,381 @@
+/* SPDX-License-Identifier: MIT
+ * Google Virtual Ethernet (gve) driver
+ * Version: 1.3.0
+ * Copyright (C) 2015-2022 Google, Inc.
+ * Copyright(C) 2022 Intel Corporation
+ */
+
+#ifndef _GVE_ADMINQ_H
+#define _GVE_ADMINQ_H
+
+/* Admin queue opcodes */
+enum gve_adminq_opcodes {
+	GVE_ADMINQ_DESCRIBE_DEVICE		= 0x1,
+	GVE_ADMINQ_CONFIGURE_DEVICE_RESOURCES	= 0x2,
+	GVE_ADMINQ_REGISTER_PAGE_LIST		= 0x3,
+	GVE_ADMINQ_UNREGISTER_PAGE_LIST		= 0x4,
+	GVE_ADMINQ_CREATE_TX_QUEUE		= 0x5,
+	GVE_ADMINQ_CREATE_RX_QUEUE		= 0x6,
+	GVE_ADMINQ_DESTROY_TX_QUEUE		= 0x7,
+	GVE_ADMINQ_DESTROY_RX_QUEUE		= 0x8,
+	GVE_ADMINQ_DECONFIGURE_DEVICE_RESOURCES	= 0x9,
+	GVE_ADMINQ_SET_DRIVER_PARAMETER		= 0xB,
+	GVE_ADMINQ_REPORT_STATS			= 0xC,
+	GVE_ADMINQ_REPORT_LINK_SPEED		= 0xD,
+	GVE_ADMINQ_GET_PTYPE_MAP		= 0xE,
+};
+
+/* Admin queue status codes */
+enum gve_adminq_statuses {
+	GVE_ADMINQ_COMMAND_UNSET			= 0x0,
+	GVE_ADMINQ_COMMAND_PASSED			= 0x1,
+	GVE_ADMINQ_COMMAND_ERROR_ABORTED		= 0xFFFFFFF0,
+	GVE_ADMINQ_COMMAND_ERROR_ALREADY_EXISTS		= 0xFFFFFFF1,
+	GVE_ADMINQ_COMMAND_ERROR_CANCELLED		= 0xFFFFFFF2,
+	GVE_ADMINQ_COMMAND_ERROR_DATALOSS		= 0xFFFFFFF3,
+	GVE_ADMINQ_COMMAND_ERROR_DEADLINE_EXCEEDED	= 0xFFFFFFF4,
+	GVE_ADMINQ_COMMAND_ERROR_FAILED_PRECONDITION	= 0xFFFFFFF5,
+	GVE_ADMINQ_COMMAND_ERROR_INTERNAL_ERROR		= 0xFFFFFFF6,
+	GVE_ADMINQ_COMMAND_ERROR_INVALID_ARGUMENT	= 0xFFFFFFF7,
+	GVE_ADMINQ_COMMAND_ERROR_NOT_FOUND		= 0xFFFFFFF8,
+	GVE_ADMINQ_COMMAND_ERROR_OUT_OF_RANGE		= 0xFFFFFFF9,
+	GVE_ADMINQ_COMMAND_ERROR_PERMISSION_DENIED	= 0xFFFFFFFA,
+	GVE_ADMINQ_COMMAND_ERROR_UNAUTHENTICATED	= 0xFFFFFFFB,
+	GVE_ADMINQ_COMMAND_ERROR_RESOURCE_EXHAUSTED	= 0xFFFFFFFC,
+	GVE_ADMINQ_COMMAND_ERROR_UNAVAILABLE		= 0xFFFFFFFD,
+	GVE_ADMINQ_COMMAND_ERROR_UNIMPLEMENTED		= 0xFFFFFFFE,
+	GVE_ADMINQ_COMMAND_ERROR_UNKNOWN_ERROR		= 0xFFFFFFFF,
+};
+
+#define GVE_ADMINQ_DEVICE_DESCRIPTOR_VERSION 1
+
+/* All AdminQ command structs should be naturally packed.
+ * GVE_CHECK_STRUCT/UNION_LEN will check struct/union length and throw
+ * error at compile time when the size is not correct.
+ */
+
+struct gve_adminq_describe_device {
+	__be64 device_descriptor_addr;
+	__be32 device_descriptor_version;
+	__be32 available_length;
+};
+
+GVE_CHECK_STRUCT_LEN(16, gve_adminq_describe_device);
+
+struct gve_device_descriptor {
+	__be64 max_registered_pages;
+	__be16 reserved1;
+	__be16 tx_queue_entries;
+	__be16 rx_queue_entries;
+	__be16 default_num_queues;
+	__be16 mtu;
+	__be16 counters;
+	__be16 tx_pages_per_qpl;
+	__be16 rx_pages_per_qpl;
+	u8  mac[ETH_ALEN];
+	__be16 num_device_options;
+	__be16 total_length;
+	u8  reserved2[6];
+};
+
+GVE_CHECK_STRUCT_LEN(40, gve_device_descriptor);
+
+struct gve_device_option {
+	__be16 option_id;
+	__be16 option_length;
+	__be32 required_features_mask;
+};
+
+GVE_CHECK_STRUCT_LEN(8, gve_device_option);
+
+struct gve_device_option_gqi_rda {
+	__be32 supported_features_mask;
+};
+
+GVE_CHECK_STRUCT_LEN(4, gve_device_option_gqi_rda);
+
+struct gve_device_option_gqi_qpl {
+	__be32 supported_features_mask;
+};
+
+GVE_CHECK_STRUCT_LEN(4, gve_device_option_gqi_qpl);
+
+struct gve_device_option_dqo_rda {
+	__be32 supported_features_mask;
+	__be16 tx_comp_ring_entries;
+	__be16 rx_buff_ring_entries;
+};
+
+GVE_CHECK_STRUCT_LEN(8, gve_device_option_dqo_rda);
+
+struct gve_device_option_jumbo_frames {
+	__be32 supported_features_mask;
+	__be16 max_mtu;
+	u8 padding[2];
+};
+
+GVE_CHECK_STRUCT_LEN(8, gve_device_option_jumbo_frames);
+
+/* Terminology:
+ *
+ * RDA - Raw DMA Addressing - Buffers associated with SKBs are directly DMA
+ *       mapped and read/updated by the device.
+ *
+ * QPL - Queue Page Lists - Driver uses bounce buffers which are DMA mapped with
+ *       the device for read/write and data is copied from/to SKBs.
+ */
+enum gve_dev_opt_id {
+	GVE_DEV_OPT_ID_GQI_RAW_ADDRESSING = 0x1,
+	GVE_DEV_OPT_ID_GQI_RDA = 0x2,
+	GVE_DEV_OPT_ID_GQI_QPL = 0x3,
+	GVE_DEV_OPT_ID_DQO_RDA = 0x4,
+	GVE_DEV_OPT_ID_JUMBO_FRAMES = 0x8,
+};
+
+enum gve_dev_opt_req_feat_mask {
+	GVE_DEV_OPT_REQ_FEAT_MASK_GQI_RAW_ADDRESSING = 0x0,
+	GVE_DEV_OPT_REQ_FEAT_MASK_GQI_RDA = 0x0,
+	GVE_DEV_OPT_REQ_FEAT_MASK_GQI_QPL = 0x0,
+	GVE_DEV_OPT_REQ_FEAT_MASK_DQO_RDA = 0x0,
+	GVE_DEV_OPT_REQ_FEAT_MASK_JUMBO_FRAMES = 0x0,
+};
+
+enum gve_sup_feature_mask {
+	GVE_SUP_JUMBO_FRAMES_MASK = 1 << 2,
+};
+
+#define GVE_DEV_OPT_LEN_GQI_RAW_ADDRESSING 0x0
+
+struct gve_adminq_configure_device_resources {
+	__be64 counter_array;
+	__be64 irq_db_addr;
+	__be32 num_counters;
+	__be32 num_irq_dbs;
+	__be32 irq_db_stride;
+	__be32 ntfy_blk_msix_base_idx;
+	u8 queue_format;
+	u8 padding[7];
+};
+
+GVE_CHECK_STRUCT_LEN(40, gve_adminq_configure_device_resources);
+
+struct gve_adminq_register_page_list {
+	__be32 page_list_id;
+	__be32 num_pages;
+	__be64 page_address_list_addr;
+};
+
+GVE_CHECK_STRUCT_LEN(16, gve_adminq_register_page_list);
+
+struct gve_adminq_unregister_page_list {
+	__be32 page_list_id;
+};
+
+GVE_CHECK_STRUCT_LEN(4, gve_adminq_unregister_page_list);
+
+#define GVE_RAW_ADDRESSING_QPL_ID 0xFFFFFFFF
+
+struct gve_adminq_create_tx_queue {
+	__be32 queue_id;
+	__be32 reserved;
+	__be64 queue_resources_addr;
+	__be64 tx_ring_addr;
+	__be32 queue_page_list_id;
+	__be32 ntfy_id;
+	__be64 tx_comp_ring_addr;
+	__be16 tx_ring_size;
+	__be16 tx_comp_ring_size;
+	u8 padding[4];
+};
+
+GVE_CHECK_STRUCT_LEN(48, gve_adminq_create_tx_queue);
+
+struct gve_adminq_create_rx_queue {
+	__be32 queue_id;
+	__be32 index;
+	__be32 reserved;
+	__be32 ntfy_id;
+	__be64 queue_resources_addr;
+	__be64 rx_desc_ring_addr;
+	__be64 rx_data_ring_addr;
+	__be32 queue_page_list_id;
+	__be16 rx_ring_size;
+	__be16 packet_buffer_size;
+	__be16 rx_buff_ring_size;
+	u8 enable_rsc;
+	u8 padding[5];
+};
+
+GVE_CHECK_STRUCT_LEN(56, gve_adminq_create_rx_queue);
+
+/* Queue resources that are shared with the device */
+struct gve_queue_resources {
+	union {
+		struct {
+			__be32 db_index;	/* Device -> Guest */
+			__be32 counter_index;	/* Device -> Guest */
+		};
+		u8 reserved[64];
+	};
+};
+
+GVE_CHECK_STRUCT_LEN(64, gve_queue_resources);
+
+struct gve_adminq_destroy_tx_queue {
+	__be32 queue_id;
+};
+
+GVE_CHECK_STRUCT_LEN(4, gve_adminq_destroy_tx_queue);
+
+struct gve_adminq_destroy_rx_queue {
+	__be32 queue_id;
+};
+
+GVE_CHECK_STRUCT_LEN(4, gve_adminq_destroy_rx_queue);
+
+/* GVE Set Driver Parameter Types */
+enum gve_set_driver_param_types {
+	GVE_SET_PARAM_MTU	= 0x1,
+};
+
+struct gve_adminq_set_driver_parameter {
+	__be32 parameter_type;
+	u8 reserved[4];
+	__be64 parameter_value;
+};
+
+GVE_CHECK_STRUCT_LEN(16, gve_adminq_set_driver_parameter);
+
+struct gve_adminq_report_stats {
+	__be64 stats_report_len;
+	__be64 stats_report_addr;
+	__be64 interval;
+};
+
+GVE_CHECK_STRUCT_LEN(24, gve_adminq_report_stats);
+
+struct gve_adminq_report_link_speed {
+	__be64 link_speed_address;
+};
+
+GVE_CHECK_STRUCT_LEN(8, gve_adminq_report_link_speed);
+
+struct stats {
+	__be32 stat_name;
+	__be32 queue_id;
+	__be64 value;
+};
+
+GVE_CHECK_STRUCT_LEN(16, stats);
+
+struct gve_stats_report {
+	__be64 written_count;
+	struct stats stats[];
+};
+
+GVE_CHECK_STRUCT_LEN(8, gve_stats_report);
+
+enum gve_stat_names {
+	/* stats from gve */
+	TX_WAKE_CNT			= 1,
+	TX_STOP_CNT			= 2,
+	TX_FRAMES_SENT			= 3,
+	TX_BYTES_SENT			= 4,
+	TX_LAST_COMPLETION_PROCESSED	= 5,
+	RX_NEXT_EXPECTED_SEQUENCE	= 6,
+	RX_BUFFERS_POSTED		= 7,
+	TX_TIMEOUT_CNT			= 8,
+	/* stats from NIC */
+	RX_QUEUE_DROP_CNT		= 65,
+	RX_NO_BUFFERS_POSTED		= 66,
+	RX_DROPS_PACKET_OVER_MRU	= 67,
+	RX_DROPS_INVALID_CHECKSUM	= 68,
+};
+
+enum gve_l3_type {
+	/* Must be zero so zero initialized LUT is unknown. */
+	GVE_L3_TYPE_UNKNOWN = 0,
+	GVE_L3_TYPE_OTHER,
+	GVE_L3_TYPE_IPV4,
+	GVE_L3_TYPE_IPV6,
+};
+
+enum gve_l4_type {
+	/* Must be zero so zero initialized LUT is unknown. */
+	GVE_L4_TYPE_UNKNOWN = 0,
+	GVE_L4_TYPE_OTHER,
+	GVE_L4_TYPE_TCP,
+	GVE_L4_TYPE_UDP,
+	GVE_L4_TYPE_ICMP,
+	GVE_L4_TYPE_SCTP,
+};
+
+/* These are control path types for PTYPE which are the same as the data path
+ * types.
+ */
+struct gve_ptype_entry {
+	u8 l3_type;
+	u8 l4_type;
+};
+
+struct gve_ptype_map {
+	struct gve_ptype_entry ptypes[1 << 10]; /* PTYPES are always 10 bits. */
+};
+
+struct gve_adminq_get_ptype_map {
+	__be64 ptype_map_len;
+	__be64 ptype_map_addr;
+};
+
+union gve_adminq_command {
+	struct {
+		__be32 opcode;
+		__be32 status;
+		union {
+			struct gve_adminq_configure_device_resources
+						configure_device_resources;
+			struct gve_adminq_create_tx_queue create_tx_queue;
+			struct gve_adminq_create_rx_queue create_rx_queue;
+			struct gve_adminq_destroy_tx_queue destroy_tx_queue;
+			struct gve_adminq_destroy_rx_queue destroy_rx_queue;
+			struct gve_adminq_describe_device describe_device;
+			struct gve_adminq_register_page_list reg_page_list;
+			struct gve_adminq_unregister_page_list unreg_page_list;
+			struct gve_adminq_set_driver_parameter set_driver_param;
+			struct gve_adminq_report_stats report_stats;
+			struct gve_adminq_report_link_speed report_link_speed;
+			struct gve_adminq_get_ptype_map get_ptype_map;
+		};
+	};
+	u8 reserved[64];
+};
+
+GVE_CHECK_UNION_LEN(64, gve_adminq_command);
+
+int gve_adminq_alloc(struct gve_priv *priv);
+void gve_adminq_free(struct gve_priv *priv);
+void gve_adminq_release(struct gve_priv *priv);
+int gve_adminq_describe_device(struct gve_priv *priv);
+int gve_adminq_configure_device_resources(struct gve_priv *priv,
+					  dma_addr_t counter_array_bus_addr,
+					  u32 num_counters,
+					  dma_addr_t db_array_bus_addr,
+					  u32 num_ntfy_blks);
+int gve_adminq_deconfigure_device_resources(struct gve_priv *priv);
+int gve_adminq_create_tx_queues(struct gve_priv *priv, u32 num_queues);
+int gve_adminq_destroy_tx_queues(struct gve_priv *priv, u32 queue_id);
+int gve_adminq_create_rx_queues(struct gve_priv *priv, u32 num_queues);
+int gve_adminq_destroy_rx_queues(struct gve_priv *priv, u32 queue_id);
+int gve_adminq_register_page_list(struct gve_priv *priv,
+				  struct gve_queue_page_list *qpl);
+int gve_adminq_unregister_page_list(struct gve_priv *priv, u32 page_list_id);
+int gve_adminq_set_mtu(struct gve_priv *priv, u64 mtu);
+int gve_adminq_report_stats(struct gve_priv *priv, u64 stats_report_len,
+			    dma_addr_t stats_report_addr, u64 interval);
+int gve_adminq_report_link_speed(struct gve_priv *priv);
+
+struct gve_ptype_lut;
+int gve_adminq_get_ptype_map_dqo(struct gve_priv *priv,
+				 struct gve_ptype_lut *ptype_lut);
+
+#endif /* _GVE_ADMINQ_H */
diff --git a/drivers/net/gve/base/gve_desc.h b/drivers/net/gve/base/gve_desc.h
new file mode 100644
index 0000000000..358755b7e0
--- /dev/null
+++ b/drivers/net/gve/base/gve_desc.h
@@ -0,0 +1,137 @@
+/* SPDX-License-Identifier: MIT
+ * Google Virtual Ethernet (gve) driver
+ * Version: 1.3.0
+ * Copyright (C) 2015-2022 Google, Inc.
+ */
+
+/* GVE Transmit Descriptor formats */
+
+#ifndef _GVE_DESC_H_
+#define _GVE_DESC_H_
+
+/* A note on seg_addrs
+ *
+ * Base addresses encoded in seg_addr are not assumed to be physical
+ * addresses. The ring format assumes these come from some linear address
+ * space. This could be physical memory, kernel virtual memory, user virtual
+ * memory.
+ * If raw dma addressing is not supported then gVNIC uses lists of registered
+ * pages. Each queue is assumed to be associated with a single such linear
+ * address space to ensure a consistent meaning for seg_addrs posted to its
+ * rings.
+ */
+
+struct gve_tx_pkt_desc {
+	u8	type_flags;  /* desc type is lower 4 bits, flags upper */
+	u8	l4_csum_offset;  /* relative offset of L4 csum word */
+	u8	l4_hdr_offset;  /* Offset of start of L4 headers in packet */
+	u8	desc_cnt;  /* Total descriptors for this packet */
+	__be16	len;  /* Total length of this packet (in bytes) */
+	__be16	seg_len;  /* Length of this descriptor's segment */
+	__be64	seg_addr;  /* Base address (see note) of this segment */
+} __packed;
+
+struct gve_tx_mtd_desc {
+	u8      type_flags;     /* type is lower 4 bits, subtype upper  */
+	u8      path_state;     /* state is lower 4 bits, hash type upper */
+	__be16  reserved0;
+	__be32  path_hash;
+	__be64  reserved1;
+} __packed;
+
+struct gve_tx_seg_desc {
+	u8	type_flags;	/* type is lower 4 bits, flags upper	*/
+	u8	l3_offset;	/* TSO: 2 byte units to start of IPH	*/
+	__be16	reserved;
+	__be16	mss;		/* TSO MSS				*/
+	__be16	seg_len;
+	__be64	seg_addr;
+} __packed;
+
+/* GVE Transmit Descriptor Types */
+#define	GVE_TXD_STD		(0x0 << 4) /* Std with Host Address	*/
+#define	GVE_TXD_TSO		(0x1 << 4) /* TSO with Host Address	*/
+#define	GVE_TXD_SEG		(0x2 << 4) /* Seg with Host Address	*/
+#define	GVE_TXD_MTD		(0x3 << 4) /* Metadata			*/
+
+/* GVE Transmit Descriptor Flags for Std Pkts */
+#define	GVE_TXF_L4CSUM	BIT(0)	/* Need csum offload */
+#define	GVE_TXF_TSTAMP	BIT(2)	/* Timestamp required */
+
+/* GVE Transmit Descriptor Flags for TSO Segs */
+#define	GVE_TXSF_IPV6	BIT(1)	/* IPv6 TSO */
+
+/* GVE Transmit Descriptor Options for MTD Segs */
+#define GVE_MTD_SUBTYPE_PATH		0
+
+#define GVE_MTD_PATH_STATE_DEFAULT	0
+#define GVE_MTD_PATH_STATE_TIMEOUT	1
+#define GVE_MTD_PATH_STATE_CONGESTION	2
+#define GVE_MTD_PATH_STATE_RETRANSMIT	3
+
+#define GVE_MTD_PATH_HASH_NONE         (0x0 << 4)
+#define GVE_MTD_PATH_HASH_L4           (0x1 << 4)
+
+/* GVE Receive Packet Descriptor */
+/* The start of an ethernet packet comes 2 bytes into the rx buffer.
+ * gVNIC adds this padding so that both the DMA and the L3/4 protocol header
+ * access is aligned.
+ */
+#define GVE_RX_PAD 2
+
+struct gve_rx_desc {
+	u8	padding[48];
+	__be32	rss_hash;  /* Receive-side scaling hash (Toeplitz for gVNIC) */
+	__be16	mss;
+	__be16	reserved;  /* Reserved to zero */
+	u8	hdr_len;  /* Header length (L2-L4) including padding */
+	u8	hdr_off;  /* 64-byte-scaled offset into RX_DATA entry */
+	__sum16	csum;  /* 1's-complement partial checksum of L3+ bytes */
+	__be16	len;  /* Length of the received packet */
+	__be16	flags_seq;  /* Flags [15:3] and sequence number [2:0] (1-7) */
+} __packed;
+GVE_CHECK_STRUCT_LEN(64, gve_rx_desc);
+
+/* If the device supports raw dma addressing then the addr in data slot is
+ * the dma address of the buffer.
+ * If the device only supports registered segments then the addr is a byte
+ * offset into the registered segment (an ordered list of pages) where the
+ * buffer is.
+ */
+union gve_rx_data_slot {
+	__be64 qpl_offset;
+	__be64 addr;
+};
+
+/* GVE Receive Packet Descriptor Seq No */
+#define GVE_SEQNO(x) (be16_to_cpu(x) & 0x7)
+
+/* GVE Receive Packet Descriptor Flags */
+#define GVE_RXFLG(x)	cpu_to_be16(1 << (3 + (x)))
+#define	GVE_RXF_FRAG		GVE_RXFLG(3)	/* IP Fragment			*/
+#define	GVE_RXF_IPV4		GVE_RXFLG(4)	/* IPv4				*/
+#define	GVE_RXF_IPV6		GVE_RXFLG(5)	/* IPv6				*/
+#define	GVE_RXF_TCP		GVE_RXFLG(6)	/* TCP Packet			*/
+#define	GVE_RXF_UDP		GVE_RXFLG(7)	/* UDP Packet			*/
+#define	GVE_RXF_ERR		GVE_RXFLG(8)	/* Packet Error Detected	*/
+#define	GVE_RXF_PKT_CONT	GVE_RXFLG(10)	/* Multi Fragment RX packet	*/
+
+/* GVE IRQ */
+#define GVE_IRQ_ACK	BIT(31)
+#define GVE_IRQ_MASK	BIT(30)
+#define GVE_IRQ_EVENT	BIT(29)
+
+static inline bool gve_needs_rss(__be16 flag)
+{
+	if (flag & GVE_RXF_FRAG)
+		return false;
+	if (flag & (GVE_RXF_IPV4 | GVE_RXF_IPV6))
+		return true;
+	return false;
+}
+
+static inline u8 gve_next_seqno(u8 seq)
+{
+	return (seq + 1) == 8 ? 1 : seq + 1;
+}
+#endif /* _GVE_DESC_H_ */
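The 3-bit sequence number embedded in flags_seq is what tells software that the
device has produced a new descriptor: the ring is scanned while the
descriptor's sequence number matches the value the driver expects, and the
expected value advances through 1..7 via gve_next_seqno(). A minimal
consumption sketch, assuming a power-of-two ring size; the per-ring state names
are made up for illustration and are not part of this series.

/* Sketch: walk newly completed GQI RX descriptors by sequence number. */
static u16 gve_rx_scan_sketch(const struct gve_rx_desc *ring, u16 cnt,
			      u16 head, u8 *expected_seq)
{
	u16 done = 0;

	while (GVE_SEQNO(ring[head].flags_seq) == *expected_seq) {
		const struct gve_rx_desc *d = &ring[head];

		if (!(d->flags_seq & GVE_RXF_ERR)) {
			/* Hand be16_to_cpu(d->len) bytes up the stack; the
			 * Toeplitz hash in d->rss_hash is only meaningful
			 * when gve_needs_rss(d->flags_seq) returns true.
			 */
			done++;
		}

		*expected_seq = gve_next_seqno(*expected_seq);
		head = (head + 1) & (cnt - 1);	/* cnt assumed a power of two */
	}

	return done;
}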
diff --git a/drivers/net/gve/base/gve_desc_dqo.h b/drivers/net/gve/base/gve_desc_dqo.h
new file mode 100644
index 0000000000..0d533abcd1
--- /dev/null
+++ b/drivers/net/gve/base/gve_desc_dqo.h
@@ -0,0 +1,254 @@
+/* SPDX-License-Identifier: MIT
+ * Google Virtual Ethernet (gve) driver
+ * Version: 1.3.0
+ * Copyright (C) 2015-2022 Google, Inc.
+ */
+
+/* GVE DQO Descriptor formats */
+
+#ifndef _GVE_DESC_DQO_H_
+#define _GVE_DESC_DQO_H_
+
+#define GVE_TX_MAX_HDR_SIZE_DQO 255
+#define GVE_TX_MIN_TSO_MSS_DQO 88
+
+#ifndef __LITTLE_ENDIAN_BITFIELD
+#error "Only little endian supported"
+#endif
+
+/* Basic TX descriptor (DTYPE 0x0C) */
+struct gve_tx_pkt_desc_dqo {
+	__le64 buf_addr;
+
+	/* Must be GVE_TX_PKT_DESC_DTYPE_DQO (0xc) */
+	u8 dtype: 5;
+
+	/* Denotes the last descriptor of a packet. */
+	u8 end_of_packet: 1;
+	u8 checksum_offload_enable: 1;
+
+	/* If set, will generate a descriptor completion for this descriptor. */
+	u8 report_event: 1;
+	u8 reserved0;
+	__le16 reserved1;
+
+	/* The TX completion associated with this packet will contain this tag.
+	 */
+	__le16 compl_tag;
+	u16 buf_size: 14;
+	u16 reserved2: 2;
+} __packed;
+GVE_CHECK_STRUCT_LEN(16, gve_tx_pkt_desc_dqo);
+
+#define GVE_TX_PKT_DESC_DTYPE_DQO 0xc
+#define GVE_TX_MAX_BUF_SIZE_DQO ((16 * 1024) - 1)
+
+/* Maximum number of data descriptors allowed per packet, or per-TSO segment. */
+#define GVE_TX_MAX_DATA_DESCS 10
+
+/* Min gap between tail and head to avoid cacheline overlap */
+#define GVE_TX_MIN_DESC_PREVENT_CACHE_OVERLAP 4
+
+/* "report_event" on TX packet descriptors may only be reported on the last
+ * descriptor of a TX packet, and they must be spaced apart with at least this
+ * value.
+ */
+#define GVE_TX_MIN_RE_INTERVAL 32
+
+struct gve_tx_context_cmd_dtype {
+	u8 dtype: 5;
+	u8 tso: 1;
+	u8 reserved1: 2;
+
+	u8 reserved2;
+};
+
+GVE_CHECK_STRUCT_LEN(2, gve_tx_context_cmd_dtype);
+
+/* TX Native TSO Context DTYPE (0x05)
+ *
+ * "flex" fields allow the driver to send additional packet context to HW.
+ */
+struct gve_tx_tso_context_desc_dqo {
+	/* The L4 payload bytes that should be segmented. */
+	u32 tso_total_len: 24;
+	u32 flex10: 8;
+
+	/* Max segment size in TSO excluding headers. */
+	u16 mss: 14;
+	u16 reserved: 2;
+
+	u8 header_len; /* Header length to use for TSO offload */
+	u8 flex11;
+	struct gve_tx_context_cmd_dtype cmd_dtype;
+	u8 flex0;
+	u8 flex5;
+	u8 flex6;
+	u8 flex7;
+	u8 flex8;
+	u8 flex9;
+} __packed;
+GVE_CHECK_STRUCT_LEN(16, gve_tx_tso_context_desc_dqo);
+
+#define GVE_TX_TSO_CTX_DESC_DTYPE_DQO 0x5
+
+/* General context descriptor for sending metadata. */
+struct gve_tx_general_context_desc_dqo {
+	u8 flex4;
+	u8 flex5;
+	u8 flex6;
+	u8 flex7;
+	u8 flex8;
+	u8 flex9;
+	u8 flex10;
+	u8 flex11;
+	struct gve_tx_context_cmd_dtype cmd_dtype;
+	u16 reserved;
+	u8 flex0;
+	u8 flex1;
+	u8 flex2;
+	u8 flex3;
+} __packed;
+GVE_CHECK_STRUCT_LEN(16, gve_tx_general_context_desc_dqo);
+
+#define GVE_TX_GENERAL_CTX_DESC_DTYPE_DQO 0x4
+
+/* Logical structure of metadata which is packed into context descriptor flex
+ * fields.
+ */
+struct gve_tx_metadata_dqo {
+	union {
+		struct {
+			u8 version;
+
+			/* If `skb->l4_hash` is set, this value should be
+			 * derived from `skb->hash`.
+			 *
+			 * A zero value means no l4_hash was associated with the
+			 * skb.
+			 */
+			u16 path_hash: 15;
+
+			/* Should be set to 1 if the flow associated with the
+			 * skb had a rehash from the TCP stack.
+			 */
+			u16 rehash_event: 1;
+		}  __packed;
+		u8 bytes[12];
+	};
+}  __packed;
+GVE_CHECK_STRUCT_LEN(12, gve_tx_metadata_dqo);
+
+#define GVE_TX_METADATA_VERSION_DQO 0
+
+/* TX completion descriptor */
+struct gve_tx_compl_desc {
+	/* For types 0-4 this is the TX queue ID associated with this
+	 * completion.
+	 */
+	u16 id: 11;
+
+	/* See: GVE_COMPL_TYPE_DQO* */
+	u16 type: 3;
+	u16 reserved0: 1;
+
+	/* Flipped by HW to notify the descriptor is populated. */
+	u16 generation: 1;
+	union {
+		/* For descriptor completions, this is the last index fetched
+		 * by HW + 1.
+		 */
+		__le16 tx_head;
+
+		/* For packet completions, this is the completion tag set on the
+		 * TX packet descriptors.
+		 */
+		__le16 completion_tag;
+	};
+	__le32 reserved1;
+} __packed;
+GVE_CHECK_STRUCT_LEN(8, gve_tx_compl_desc);
+
+#define GVE_COMPL_TYPE_DQO_PKT 0x2 /* Packet completion */
+#define GVE_COMPL_TYPE_DQO_DESC 0x4 /* Descriptor completion */
+#define GVE_COMPL_TYPE_DQO_MISS 0x1 /* Miss path completion */
+#define GVE_COMPL_TYPE_DQO_REINJECTION 0x3 /* Re-injection completion */
+
+/* Descriptor to post buffers to HW on buffer queue. */
+struct gve_rx_desc_dqo {
+	__le16 buf_id; /* ID returned in Rx completion descriptor */
+	__le16 reserved0;
+	__le32 reserved1;
+	__le64 buf_addr; /* DMA address of the buffer */
+	__le64 header_buf_addr;
+	__le64 reserved2;
+} __packed;
+GVE_CHECK_STRUCT_LEN(32, gve_rx_desc_dqo);
+
+/* Descriptor for HW to notify SW of new packets received on RX queue. */
+struct gve_rx_compl_desc_dqo {
+	/* Must be 1 */
+	u8 rxdid: 4;
+	u8 reserved0: 4;
+
+	/* Packet originated from this system rather than the network. */
+	u8 loopback: 1;
+	/* Set when IPv6 packet contains a destination options header or routing
+	 * header.
+	 */
+	u8 ipv6_ex_add: 1;
+	/* Invalid packet was received. */
+	u8 rx_error: 1;
+	u8 reserved1: 5;
+
+	u16 packet_type: 10;
+	u16 ip_hdr_err: 1;
+	u16 udp_len_err: 1;
+	u16 raw_cs_invalid: 1;
+	u16 reserved2: 3;
+
+	u16 packet_len: 14;
+	/* Flipped by HW to notify the descriptor is populated. */
+	u16 generation: 1;
+	/* Should be zero. */
+	u16 buffer_queue_id: 1;
+
+	u16 header_len: 10;
+	u16 rsc: 1;
+	u16 split_header: 1;
+	u16 reserved3: 4;
+
+	u8 descriptor_done: 1;
+	u8 end_of_packet: 1;
+	u8 header_buffer_overflow: 1;
+	u8 l3_l4_processed: 1;
+	u8 csum_ip_err: 1;
+	u8 csum_l4_err: 1;
+	u8 csum_external_ip_err: 1;
+	u8 csum_external_udp_err: 1;
+
+	u8 status_error1;
+
+	__le16 reserved5;
+	__le16 buf_id; /* Buffer ID which was sent on the buffer queue. */
+
+	union {
+		/* Packet checksum. */
+		__le16 raw_cs;
+		/* Segment length for RSC packets. */
+		__le16 rsc_seg_len;
+	};
+	__le32 hash;
+	__le32 reserved6;
+	__le64 reserved7;
+} __packed;
+
+GVE_CHECK_STRUCT_LEN(32, gve_rx_compl_desc_dqo);
+
+/* Ringing the doorbell too often can hurt performance.
+ *
+ * HW requires this value to be at least 8.
+ */
+#define GVE_RX_BUF_THRESH_DQO 32
+
+#endif /* _GVE_DESC_DQO_H_ */
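In the DQO format the generation bit replaces a head pointer on the completion
rings: software keeps its own expected generation value, flips it every time it
wraps the ring, and stops consuming as soon as a descriptor's bit no longer
matches. A minimal sketch of draining a TX completion ring along those lines;
the per-ring bookkeeping names are hypothetical, and real code would also issue
a read barrier after the generation check before trusting the other fields.

/* Sketch: drain a DQO TX completion ring by watching the generation bit. */
static u16 gve_tx_compl_drain_sketch(volatile struct gve_tx_compl_desc *ring,
				     u16 mask, u16 *head, u8 *cur_gen)
{
	u16 done = 0;

	for (;;) {
		volatile struct gve_tx_compl_desc *c = &ring[*head & mask];

		if (c->generation != *cur_gen)
			break;	/* HW has not written this slot yet */

		switch (c->type) {
		case GVE_COMPL_TYPE_DQO_PKT:
			/* release the packet tagged with c->completion_tag */
			break;
		case GVE_COMPL_TYPE_DQO_DESC:
			/* HW has fetched TX descriptors up to c->tx_head */
			break;
		default:
			break;
		}

		(*head)++;
		if ((*head & mask) == 0)
			*cur_gen ^= 1;	/* generation flips on every wrap */
		done++;
	}

	return done;
}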
diff --git a/drivers/net/gve/base/gve_register.h b/drivers/net/gve/base/gve_register.h
new file mode 100644
index 0000000000..b65f336be2
--- /dev/null
+++ b/drivers/net/gve/base/gve_register.h
@@ -0,0 +1,28 @@
+/* SPDX-License-Identifier: MIT
+ * Google Virtual Ethernet (gve) driver
+ * Version: 1.3.0
+ * Copyright (C) 2015-2022 Google, Inc.
+ */
+
+#ifndef _GVE_REGISTER_H_
+#define _GVE_REGISTER_H_
+
+/* Fixed Configuration Registers */
+struct gve_registers {
+	__be32	device_status;
+	__be32	driver_status;
+	__be32	max_tx_queues;
+	__be32	max_rx_queues;
+	__be32	adminq_pfn;
+	__be32	adminq_doorbell;
+	__be32	adminq_event_counter;
+	u8	reserved[3];
+	u8	driver_version;
+};
+
+enum gve_device_status_flags {
+	GVE_DEVICE_STATUS_RESET_MASK		= BIT(1),
+	GVE_DEVICE_STATUS_LINK_STATUS_MASK	= BIT(2),
+	GVE_DEVICE_STATUS_REPORT_STATS_MASK	= BIT(3),
+};
+#endif /* _GVE_REGISTER_H_ */
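The configuration registers above are declared big-endian; for example, link
state can be derived from device_status with a single read. A minimal sketch,
assuming the ioread32be() helper provided by the driver's OS-dependent layer
(gve_osdep.h):

/* Sketch: check link state from the fixed configuration registers. */
static bool gve_link_up_sketch(struct gve_registers *reg_bar)
{
	u32 status = ioread32be(&reg_bar->device_status);

	return (status & GVE_DEVICE_STATUS_LINK_STATUS_MASK) != 0;
}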
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 192+ messages in thread

* [PATCH v5 2/8] net/gve/base: add OS specific implementation
  2022-10-10 10:17               ` [PATCH v5 0/8] introduce GVE PMD Junfeng Guo
  2022-10-10 10:17                 ` [PATCH v5 1/8] net/gve/base: introduce GVE PMD base code Junfeng Guo
@ 2022-10-10 10:17                 ` Junfeng Guo
  2022-10-10 10:17                 ` [PATCH v5 3/8] net/gve: add support for device initialization Junfeng Guo
                                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 192+ messages in thread
From: Junfeng Guo @ 2022-10-10 10:17 UTC (permalink / raw)
  To: qi.z.zhang, jingjing.wu
  Cc: ferruh.yigit, dev, xiaoyun.li, awogbemila, bruce.richardson,
	xueqin.lin, junfeng.guo, Haiyue Wang

Add some macro definitions and memory operations which are specific
to DPDK.

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com>
Signed-off-by: Junfeng Guo <junfeng.guo@intel.com>
---
 drivers/net/gve/base/gve_adminq.h   |   2 +
 drivers/net/gve/base/gve_desc.h     |   2 +
 drivers/net/gve/base/gve_desc_dqo.h |   2 +
 drivers/net/gve/base/gve_osdep.h    | 159 ++++++++++++++++++++++++++++
 drivers/net/gve/base/gve_register.h |   2 +
 5 files changed, 167 insertions(+)
 create mode 100644 drivers/net/gve/base/gve_osdep.h

diff --git a/drivers/net/gve/base/gve_adminq.h b/drivers/net/gve/base/gve_adminq.h
index c7114cc883..cd496760ae 100644
--- a/drivers/net/gve/base/gve_adminq.h
+++ b/drivers/net/gve/base/gve_adminq.h
@@ -8,6 +8,8 @@
 #ifndef _GVE_ADMINQ_H
 #define _GVE_ADMINQ_H
 
+#include "gve_osdep.h"
+
 /* Admin queue opcodes */
 enum gve_adminq_opcodes {
 	GVE_ADMINQ_DESCRIBE_DEVICE		= 0x1,
diff --git a/drivers/net/gve/base/gve_desc.h b/drivers/net/gve/base/gve_desc.h
index 358755b7e0..627b9120dc 100644
--- a/drivers/net/gve/base/gve_desc.h
+++ b/drivers/net/gve/base/gve_desc.h
@@ -9,6 +9,8 @@
 #ifndef _GVE_DESC_H_
 #define _GVE_DESC_H_
 
+#include "gve_osdep.h"
+
 /* A note on seg_addrs
  *
  * Base addresses encoded in seg_addr are not assumed to be physical
diff --git a/drivers/net/gve/base/gve_desc_dqo.h b/drivers/net/gve/base/gve_desc_dqo.h
index 0d533abcd1..5031752b43 100644
--- a/drivers/net/gve/base/gve_desc_dqo.h
+++ b/drivers/net/gve/base/gve_desc_dqo.h
@@ -9,6 +9,8 @@
 #ifndef _GVE_DESC_DQO_H_
 #define _GVE_DESC_DQO_H_
 
+#include "gve_osdep.h"
+
 #define GVE_TX_MAX_HDR_SIZE_DQO 255
 #define GVE_TX_MIN_TSO_MSS_DQO 88
 
diff --git a/drivers/net/gve/base/gve_osdep.h b/drivers/net/gve/base/gve_osdep.h
new file mode 100644
index 0000000000..7cb73002f4
--- /dev/null
+++ b/drivers/net/gve/base/gve_osdep.h
@@ -0,0 +1,159 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2022 Intel Corporation
+ */
+
+#ifndef _GVE_OSDEP_H_
+#define _GVE_OSDEP_H_
+
+#include <string.h>
+#include <stdint.h>
+#include <stdio.h>
+#include <stdarg.h>
+#include <inttypes.h>
+#include <stdbool.h>
+
+#include <rte_bitops.h>
+#include <rte_byteorder.h>
+#include <rte_common.h>
+#include <rte_ether.h>
+#include <rte_io.h>
+#include <rte_log.h>
+#include <rte_malloc.h>
+#include <rte_memcpy.h>
+#include <rte_memzone.h>
+
+#include "../gve_logs.h"
+
+typedef uint8_t u8;
+typedef uint16_t u16;
+typedef uint32_t u32;
+typedef uint64_t u64;
+
+typedef rte_be16_t __sum16;
+
+typedef rte_be16_t __be16;
+typedef rte_be32_t __be32;
+typedef rte_be64_t __be64;
+
+typedef rte_iova_t dma_addr_t;
+
+#define ETH_MIN_MTU	RTE_ETHER_MIN_MTU
+#define ETH_ALEN	RTE_ETHER_ADDR_LEN
+
+#ifndef PAGE_SHIFT
+#define PAGE_SHIFT	12
+#endif
+#ifndef PAGE_SIZE
+#define PAGE_SIZE	(1UL << PAGE_SHIFT)
+#endif
+
+#define BIT(nr)		RTE_BIT32(nr)
+
+#define be16_to_cpu(x) rte_be_to_cpu_16(x)
+#define be32_to_cpu(x) rte_be_to_cpu_32(x)
+#define be64_to_cpu(x) rte_be_to_cpu_64(x)
+
+#define cpu_to_be16(x) rte_cpu_to_be_16(x)
+#define cpu_to_be32(x) rte_cpu_to_be_32(x)
+#define cpu_to_be64(x) rte_cpu_to_be_64(x)
+
+#define READ_ONCE32(x) rte_read32(&(x))
+
+#ifndef ____cacheline_aligned
+#define ____cacheline_aligned	__rte_cache_aligned
+#endif
+#ifndef __packed
+#define __packed		__rte_packed
+#endif
+#define __iomem
+
+#define msleep(ms)		rte_delay_ms(ms)
+
+/* These macros are used to generate compilation errors if a struct/union
+ * is not exactly the correct length. It gives a divide by zero error if
+ * the struct/union is not of the correct size, otherwise it creates an
+ * enum that is never used.
+ */
+#define GVE_CHECK_STRUCT_LEN(n, X) enum gve_static_assert_enum_##X \
+	{ gve_static_assert_##X = (n) / ((sizeof(struct X) == (n)) ? 1 : 0) }
+#define GVE_CHECK_UNION_LEN(n, X) enum gve_static_asset_enum_##X \
+	{ gve_static_assert_##X = (n) / ((sizeof(union X) == (n)) ? 1 : 0) }
+
+static __rte_always_inline u8
+readb(volatile void *addr)
+{
+	return rte_read8(addr);
+}
+
+static __rte_always_inline void
+writeb(u8 value, volatile void *addr)
+{
+	rte_write8(value, addr);
+}
+
+static __rte_always_inline void
+writel(u32 value, volatile void *addr)
+{
+	rte_write32(value, addr);
+}
+
+static __rte_always_inline u32
+ioread32be(const volatile void *addr)
+{
+	return rte_be_to_cpu_32(rte_read32(addr));
+}
+
+static __rte_always_inline void
+iowrite32be(u32 value, volatile void *addr)
+{
+	writel(rte_cpu_to_be_32(value), addr);
+}
+
+/* DMA memory allocation tracking */
+struct gve_dma_mem {
+	void *va;
+	rte_iova_t pa;
+	uint32_t size;
+	const void *zone;
+};
+
+static inline void *
+gve_alloc_dma_mem(struct gve_dma_mem *mem, u64 size)
+{
+	static uint16_t gve_dma_memzone_id;
+	const struct rte_memzone *mz = NULL;
+	char z_name[RTE_MEMZONE_NAMESIZE];
+
+	if (!mem)
+		return NULL;
+
+	snprintf(z_name, sizeof(z_name), "gve_dma_%u",
+		 __atomic_fetch_add(&gve_dma_memzone_id, 1, __ATOMIC_RELAXED));
+	mz = rte_memzone_reserve_aligned(z_name, size, SOCKET_ID_ANY,
+					 RTE_MEMZONE_IOVA_CONTIG,
+					 PAGE_SIZE);
+	if (!mz)
+		return NULL;
+
+	mem->size = size;
+	mem->va = mz->addr;
+	mem->pa = mz->iova;
+	mem->zone = mz;
+	PMD_DRV_LOG(DEBUG, "memzone %s is allocated", mz->name);
+
+	return mem->va;
+}
+
+static inline void
+gve_free_dma_mem(struct gve_dma_mem *mem)
+{
+	PMD_DRV_LOG(DEBUG, "memzone %s to be freed",
+		    ((const struct rte_memzone *)mem->zone)->name);
+
+	rte_memzone_free(mem->zone);
+	mem->zone = NULL;
+	mem->va = NULL;
+	mem->pa = 0;
+}
+
+#endif /* _GVE_OSDEP_H_ */
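To make the size-check trick above concrete, GVE_CHECK_STRUCT_LEN(8,
gve_tx_compl_desc), as used in the descriptor headers, expands roughly as shown
below, so a size mismatch becomes a compile-time division by zero rather than a
silent ABI break (expansion reproduced for illustration only):

/* GVE_CHECK_STRUCT_LEN(8, gve_tx_compl_desc) expands (roughly) to: */
enum gve_static_assert_enum_gve_tx_compl_desc {
	gve_static_assert_gve_tx_compl_desc =
		(8) / ((sizeof(struct gve_tx_compl_desc) == (8)) ? 1 : 0)
};
/* When the size matches, this is just an unused enum; when it does not,
 * the divisor is 0 and the build fails on this line.
 */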
diff --git a/drivers/net/gve/base/gve_register.h b/drivers/net/gve/base/gve_register.h
index b65f336be2..a599c1a08e 100644
--- a/drivers/net/gve/base/gve_register.h
+++ b/drivers/net/gve/base/gve_register.h
@@ -7,6 +7,8 @@
 #ifndef _GVE_REGISTER_H_
 #define _GVE_REGISTER_H_
 
+#include "gve_osdep.h"
+
 /* Fixed Configuration Registers */
 struct gve_registers {
 	__be32	device_status;
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 192+ messages in thread

* [PATCH v5 3/8] net/gve: add support for device initialization
  2022-10-10 10:17               ` [PATCH v5 0/8] introduce GVE PMD Junfeng Guo
  2022-10-10 10:17                 ` [PATCH v5 1/8] net/gve/base: introduce GVE PMD base code Junfeng Guo
  2022-10-10 10:17                 ` [PATCH v5 2/8] net/gve/base: add OS specific implementation Junfeng Guo
@ 2022-10-10 10:17                 ` Junfeng Guo
  2022-10-19 13:46                   ` Ferruh Yigit
  2022-10-19 13:47                   ` Ferruh Yigit
  2022-10-10 10:17                 ` [PATCH v5 4/8] net/gve: add support for link update Junfeng Guo
                                   ` (4 subsequent siblings)
  7 siblings, 2 replies; 192+ messages in thread
From: Junfeng Guo @ 2022-10-10 10:17 UTC (permalink / raw)
  To: qi.z.zhang, jingjing.wu
  Cc: ferruh.yigit, dev, xiaoyun.li, awogbemila, bruce.richardson,
	xueqin.lin, junfeng.guo, Haiyue Wang

Support device init and add the following dev_ops skeleton:
 - dev_configure
 - dev_start
 - dev_stop
 - dev_close

Note that the build system (including doc) is also added in this patch.

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com>
Signed-off-by: Junfeng Guo <junfeng.guo@intel.com>
---
 MAINTAINERS                            |   6 +
 doc/guides/nics/features/gve.ini       |  10 +
 doc/guides/nics/gve.rst                |  63 +++++
 doc/guides/nics/index.rst              |   1 +
 doc/guides/rel_notes/release_22_11.rst |   5 +
 drivers/net/gve/base/gve_adminq.c      |   1 +
 drivers/net/gve/gve_ethdev.c           | 371 +++++++++++++++++++++++++
 drivers/net/gve/gve_ethdev.h           | 225 +++++++++++++++
 drivers/net/gve/gve_logs.h             |  14 +
 drivers/net/gve/meson.build            |  14 +
 drivers/net/gve/version.map            |   3 +
 drivers/net/meson.build                |   1 +
 12 files changed, 714 insertions(+)
 create mode 100644 doc/guides/nics/features/gve.ini
 create mode 100644 doc/guides/nics/gve.rst
 create mode 100644 drivers/net/gve/gve_ethdev.c
 create mode 100644 drivers/net/gve/gve_ethdev.h
 create mode 100644 drivers/net/gve/gve_logs.h
 create mode 100644 drivers/net/gve/meson.build
 create mode 100644 drivers/net/gve/version.map

diff --git a/MAINTAINERS b/MAINTAINERS
index 3757ccc3b3..a959f710a8 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -700,6 +700,12 @@ F: drivers/net/enic/
 F: doc/guides/nics/enic.rst
 F: doc/guides/nics/features/enic.ini
 
+Google Virtual Ethernet
+M: Junfeng Guo <junfeng.guo@intel.com>
+F: drivers/net/gve/
+F: doc/guides/nics/gve.rst
+F: doc/guides/nics/features/gve.ini
+
 Hisilicon hns3
 M: Dongdong Liu <liudongdong3@huawei.com>
 M: Yisen Zhuang <yisen.zhuang@huawei.com>
diff --git a/doc/guides/nics/features/gve.ini b/doc/guides/nics/features/gve.ini
new file mode 100644
index 0000000000..44aec28009
--- /dev/null
+++ b/doc/guides/nics/features/gve.ini
@@ -0,0 +1,10 @@
+;
+; Supported features of the Google Virtual Ethernet 'gve' poll mode driver.
+;
+; Refer to default.ini for the full list of available PMD features.
+;
+[Features]
+Linux                = Y
+x86-32               = Y
+x86-64               = Y
+Usage doc            = Y
diff --git a/doc/guides/nics/gve.rst b/doc/guides/nics/gve.rst
new file mode 100644
index 0000000000..4fbef8f9dc
--- /dev/null
+++ b/doc/guides/nics/gve.rst
@@ -0,0 +1,63 @@
+..  SPDX-License-Identifier: BSD-3-Clause
+    Copyright(C) 2022 Intel Corporation.
+
+GVE poll mode driver
+=======================
+
+The GVE PMD (**librte_net_gve**) provides poll mode driver support for
+the Google Virtual Ethernet device (also known as gVNIC).
+
+gVNIC is an alternative to the virtio-based ethernet interface that can
+support higher network bandwidths, such as 50-100 Gbps.
+Large receive offload (LRO) is currently not supported.
+Please refer to https://cloud.google.com/compute/docs/networking/using-gvnic
+for the device description.
+
+The base code is under the MIT license and based on GVE kernel driver v1.3.0.
+GVE base code files are:
+
+- gve_adminq.h
+- gve_adminq.c
+- gve_desc.h
+- gve_desc_dqo.h
+- gve_register.h
+- gve.h
+
+Please refer to https://github.com/GoogleCloudPlatform/compute-virtual-ethernet-linux/tree/v1.3.0/google/gve
+to find the original base code.
+
+GVE has 3 queue formats:
+
+- GQI_QPL - GQI with queue page list
+- GQI_RDA - GQI with raw DMA addressing
+- DQO_RDA - DQO with raw DMA addressing
+
+GQI_QPL queue format is the queue page list mode. The driver must first
+allocate memory and register it with the hardware (Google Hypervisor /
+GVE Backend) as a Queue Page List (QPL); each queue has its own QPL.
+On Tx, the driver copies packets into QPL memory and writes each packet's
+offset within the QPL into the hardware descriptor so that the hardware can
+fetch the packet data. On Rx, the driver reads the offset from the descriptor
+to locate the data in the QPL and copies the packet out of it.
+
+GQI_RDA queue format works like a conventional NIC: the driver puts packets'
+physical addresses directly into the hardware descriptors.
+
+DQO_RDA queue format has a submission and completion queue pair for each
+Tx/Rx queue. As with GQI_RDA, the driver puts packets' physical addresses
+into the hardware descriptors.
+
+Please refer to https://www.kernel.org/doc/html/latest/networking/device_drivers/ethernet/google/gve.html
+to get more information about GVE queue formats.
+
+Features and Limitations
+------------------------
+
+In this release, the GVE PMD provides the basic functionality of packet
+reception and transmission.
+
+Currently, only the GQI_QPL and GQI_RDA queue formats are supported in the PMD.
+Jumbo frames are not supported in the PMD for now; support will be added in a
+future DPDK release.
+Also, only the GQI_QPL queue format is in use on GCP, since GQI_RDA has not
+been released in production.
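To make the GQI_QPL/GQI_RDA difference above concrete: on the Tx side the only
thing that really changes is what goes into the descriptor's seg_addr field.
A minimal sketch based on the gve_tx_pkt_desc layout from base/gve_desc.h; the
helper name and its qpl_va/qpl_offset/pkt_iova parameters are made up for
illustration and are not part of this series.

/* Sketch: fill one GQI TX packet descriptor in either addressing mode. */
static void gve_fill_pkt_desc_sketch(struct gve_tx_pkt_desc *desc, bool is_qpl,
				     void *qpl_va, uint64_t qpl_offset,
				     uint64_t pkt_iova, const void *pkt,
				     uint16_t len)
{
	desc->type_flags = GVE_TXD_STD;
	desc->desc_cnt = 1;			/* single-descriptor packet */
	desc->len = rte_cpu_to_be_16(len);
	desc->seg_len = rte_cpu_to_be_16(len);

	if (is_qpl) {
		/* GQI_QPL: copy into registered QPL memory and post the
		 * offset within that registered address space.
		 */
		rte_memcpy((uint8_t *)qpl_va + qpl_offset, pkt, len);
		desc->seg_addr = rte_cpu_to_be_64(qpl_offset);
	} else {
		/* GQI_RDA: post the packet's bus address directly. */
		desc->seg_addr = rte_cpu_to_be_64(pkt_iova);
	}
}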
diff --git a/doc/guides/nics/index.rst b/doc/guides/nics/index.rst
index 32c7544968..4d40ea29a3 100644
--- a/doc/guides/nics/index.rst
+++ b/doc/guides/nics/index.rst
@@ -29,6 +29,7 @@ Network Interface Controller Drivers
     enetfec
     enic
     fm10k
+    gve
     hinic
     hns3
     i40e
diff --git a/doc/guides/rel_notes/release_22_11.rst b/doc/guides/rel_notes/release_22_11.rst
index fbb575255f..c1162ea1a4 100644
--- a/doc/guides/rel_notes/release_22_11.rst
+++ b/doc/guides/rel_notes/release_22_11.rst
@@ -200,6 +200,11 @@ New Features
   into single event containing ``rte_event_vector``
   whose event type is ``RTE_EVENT_TYPE_CRYPTODEV_VECTOR``.
 
+* **Added GVE net PMD**
+
+  * Added the new ``gve`` net driver for Google Virtual Ethernet devices.
+  * See the :doc:`../nics/gve` NIC guide for more details on this new driver.
+
 
 Removed Items
 -------------
diff --git a/drivers/net/gve/base/gve_adminq.c b/drivers/net/gve/base/gve_adminq.c
index 2344100f1a..ceafa05286 100644
--- a/drivers/net/gve/base/gve_adminq.c
+++ b/drivers/net/gve/base/gve_adminq.c
@@ -5,6 +5,7 @@
  * Copyright(C) 2022 Intel Corporation
  */
 
+#include "../gve_ethdev.h"
 #include "gve_adminq.h"
 #include "gve_register.h"
 
diff --git a/drivers/net/gve/gve_ethdev.c b/drivers/net/gve/gve_ethdev.c
new file mode 100644
index 0000000000..f8ff3b1923
--- /dev/null
+++ b/drivers/net/gve/gve_ethdev.c
@@ -0,0 +1,371 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2022 Intel Corporation
+ */
+#include <linux/pci_regs.h>
+
+#include "gve_ethdev.h"
+#include "base/gve_adminq.h"
+#include "base/gve_register.h"
+
+const char gve_version_str[] = GVE_VERSION;
+static const char gve_version_prefix[] = GVE_VERSION_PREFIX;
+
+static void
+gve_write_version(uint8_t *driver_version_register)
+{
+	const char *c = gve_version_prefix;
+
+	while (*c) {
+		writeb(*c, driver_version_register);
+		c++;
+	}
+
+	c = gve_version_str;
+	while (*c) {
+		writeb(*c, driver_version_register);
+		c++;
+	}
+	writeb('\n', driver_version_register);
+}
+
+static int
+gve_dev_configure(__rte_unused struct rte_eth_dev *dev)
+{
+	return 0;
+}
+
+static int
+gve_dev_start(struct rte_eth_dev *dev)
+{
+	dev->data->dev_started = 1;
+
+	return 0;
+}
+
+static int
+gve_dev_stop(struct rte_eth_dev *dev)
+{
+	dev->data->dev_link.link_status = RTE_ETH_LINK_DOWN;
+	dev->data->dev_started = 0;
+
+	return 0;
+}
+
+static int
+gve_dev_close(struct rte_eth_dev *dev)
+{
+	int err = 0;
+
+	if (dev->data->dev_started) {
+		err = gve_dev_stop(dev);
+		if (err != 0)
+			PMD_DRV_LOG(ERR, "Failed to stop dev.");
+	}
+
+	return err;
+}
+
+static const struct eth_dev_ops gve_eth_dev_ops = {
+	.dev_configure        = gve_dev_configure,
+	.dev_start            = gve_dev_start,
+	.dev_stop             = gve_dev_stop,
+	.dev_close            = gve_dev_close,
+};
+
+static void
+gve_free_counter_array(struct gve_priv *priv)
+{
+	rte_memzone_free(priv->cnt_array_mz);
+	priv->cnt_array = NULL;
+}
+
+static void
+gve_free_irq_db(struct gve_priv *priv)
+{
+	rte_memzone_free(priv->irq_dbs_mz);
+	priv->irq_dbs = NULL;
+}
+
+static void
+gve_teardown_device_resources(struct gve_priv *priv)
+{
+	int err;
+
+	/* Tell device its resources are being freed */
+	if (gve_get_device_resources_ok(priv)) {
+		err = gve_adminq_deconfigure_device_resources(priv);
+		if (err)
+			PMD_DRV_LOG(ERR, "Could not deconfigure device resources: err=%d", err);
+	}
+	gve_free_counter_array(priv);
+	gve_free_irq_db(priv);
+	gve_clear_device_resources_ok(priv);
+}
+
+static uint8_t
+pci_dev_find_capability(struct rte_pci_device *pdev, int cap)
+{
+	uint8_t pos, id;
+	uint16_t ent;
+	int loops;
+	int ret;
+
+	ret = rte_pci_read_config(pdev, &pos, sizeof(pos), PCI_CAPABILITY_LIST);
+	if (ret != sizeof(pos))
+		return 0;
+
+	loops = (PCI_CFG_SPACE_SIZE - PCI_STD_HEADER_SIZEOF) / PCI_CAP_SIZEOF;
+
+	while (pos && loops--) {
+		ret = rte_pci_read_config(pdev, &ent, sizeof(ent), pos);
+		if (ret != sizeof(ent))
+			return 0;
+
+		id = ent & 0xff;
+		if (id == 0xff)
+			break;
+
+		if (id == cap)
+			return pos;
+
+		pos = (ent >> 8);
+	}
+
+	return 0;
+}
+
+static int
+pci_dev_msix_vec_count(struct rte_pci_device *pdev)
+{
+	uint8_t msix_cap = pci_dev_find_capability(pdev, PCI_CAP_ID_MSIX);
+	uint16_t control;
+	int ret;
+
+	if (!msix_cap)
+		return 0;
+
+	ret = rte_pci_read_config(pdev, &control, sizeof(control), msix_cap + PCI_MSIX_FLAGS);
+	if (ret != sizeof(control))
+		return 0;
+
+	return (control & PCI_MSIX_FLAGS_QSIZE) + 1;
+}
+
+static int
+gve_setup_device_resources(struct gve_priv *priv)
+{
+	char z_name[RTE_MEMZONE_NAMESIZE];
+	const struct rte_memzone *mz;
+	int err = 0;
+
+	snprintf(z_name, sizeof(z_name), "gve_%s_cnt_arr", priv->pci_dev->device.name);
+	mz = rte_memzone_reserve_aligned(z_name,
+					 priv->num_event_counters * sizeof(*priv->cnt_array),
+					 rte_socket_id(), RTE_MEMZONE_IOVA_CONTIG,
+					 PAGE_SIZE);
+	if (mz == NULL) {
+		PMD_DRV_LOG(ERR, "Could not alloc memzone for count array");
+		return -ENOMEM;
+	}
+	priv->cnt_array = (rte_be32_t *)mz->addr;
+	priv->cnt_array_mz = mz;
+
+	snprintf(z_name, sizeof(z_name), "gve_%s_irqmz", priv->pci_dev->device.name);
+	mz = rte_memzone_reserve_aligned(z_name,
+					 sizeof(*priv->irq_dbs) * (priv->num_ntfy_blks),
+					 rte_socket_id(), RTE_MEMZONE_IOVA_CONTIG,
+					 PAGE_SIZE);
+	if (mz == NULL) {
+		PMD_DRV_LOG(ERR, "Could not alloc memzone for irq_dbs");
+		err = -ENOMEM;
+		goto free_cnt_array;
+	}
+	priv->irq_dbs = (struct gve_irq_db *)mz->addr;
+	priv->irq_dbs_mz = mz;
+
+	err = gve_adminq_configure_device_resources(priv,
+						    priv->cnt_array_mz->iova,
+						    priv->num_event_counters,
+						    priv->irq_dbs_mz->iova,
+						    priv->num_ntfy_blks);
+	if (unlikely(err)) {
+		PMD_DRV_LOG(ERR, "Could not config device resources: err=%d", err);
+		goto free_irq_dbs;
+	}
+	return 0;
+
+free_irq_dbs:
+	gve_free_irq_db(priv);
+free_cnt_array:
+	gve_free_counter_array(priv);
+
+	return err;
+}
+
+static int
+gve_init_priv(struct gve_priv *priv, bool skip_describe_device)
+{
+	int num_ntfy;
+	int err;
+
+	/* Set up the adminq */
+	err = gve_adminq_alloc(priv);
+	if (err) {
+		PMD_DRV_LOG(ERR, "Failed to alloc admin queue: err=%d", err);
+		return err;
+	}
+
+	if (skip_describe_device)
+		goto setup_device;
+
+	/* Get the initial information we need from the device */
+	err = gve_adminq_describe_device(priv);
+	if (err) {
+		PMD_DRV_LOG(ERR, "Could not get device information: err=%d", err);
+		goto free_adminq;
+	}
+
+	num_ntfy = pci_dev_msix_vec_count(priv->pci_dev);
+	if (num_ntfy <= 0) {
+		PMD_DRV_LOG(ERR, "Could not count MSI-x vectors");
+		err = -EIO;
+		goto free_adminq;
+	} else if (num_ntfy < GVE_MIN_MSIX) {
+		PMD_DRV_LOG(ERR, "GVE needs at least %d MSI-x vectors, but only has %d",
+			    GVE_MIN_MSIX, num_ntfy);
+		err = -EINVAL;
+		goto free_adminq;
+	}
+
+	priv->num_registered_pages = 0;
+
+	/* gvnic has one Notification Block per MSI-x vector, except for the
+	 * management vector
+	 */
+	priv->num_ntfy_blks = (num_ntfy - 1) & ~0x1;
+	priv->mgmt_msix_idx = priv->num_ntfy_blks;
+
+	priv->max_nb_txq = RTE_MIN(priv->max_nb_txq, priv->num_ntfy_blks / 2);
+	priv->max_nb_rxq = RTE_MIN(priv->max_nb_rxq, priv->num_ntfy_blks / 2);
+
+	if (priv->default_num_queues > 0) {
+		priv->max_nb_txq = RTE_MIN(priv->default_num_queues, priv->max_nb_txq);
+		priv->max_nb_rxq = RTE_MIN(priv->default_num_queues, priv->max_nb_rxq);
+	}
+
+	PMD_DRV_LOG(INFO, "Max TX queues %d, Max RX queues %d",
+		    priv->max_nb_txq, priv->max_nb_rxq);
+
+setup_device:
+	err = gve_setup_device_resources(priv);
+	if (!err)
+		return 0;
+free_adminq:
+	gve_adminq_free(priv);
+	return err;
+}
+
+static void
+gve_teardown_priv_resources(struct gve_priv *priv)
+{
+	gve_teardown_device_resources(priv);
+	gve_adminq_free(priv);
+}
+
+static int
+gve_dev_init(struct rte_eth_dev *eth_dev)
+{
+	struct gve_priv *priv = eth_dev->data->dev_private;
+	int max_tx_queues, max_rx_queues;
+	struct rte_pci_device *pci_dev;
+	struct gve_registers *reg_bar;
+	rte_be32_t *db_bar;
+	int err;
+
+	eth_dev->dev_ops = &gve_eth_dev_ops;
+
+	if (rte_eal_process_type() != RTE_PROC_PRIMARY)
+		return 0;
+
+	pci_dev = RTE_DEV_TO_PCI(eth_dev->device);
+
+	reg_bar = pci_dev->mem_resource[GVE_REG_BAR].addr;
+	if (!reg_bar) {
+		PMD_DRV_LOG(ERR, "Failed to map pci bar!");
+		return -ENOMEM;
+	}
+
+	db_bar = pci_dev->mem_resource[GVE_DB_BAR].addr;
+	if (!db_bar) {
+		PMD_DRV_LOG(ERR, "Failed to map doorbell bar!");
+		return -ENOMEM;
+	}
+
+	gve_write_version(&reg_bar->driver_version);
+	/* Get max queues to alloc etherdev */
+	max_tx_queues = ioread32be(&reg_bar->max_tx_queues);
+	max_rx_queues = ioread32be(&reg_bar->max_rx_queues);
+
+	priv->reg_bar0 = reg_bar;
+	priv->db_bar2 = db_bar;
+	priv->pci_dev = pci_dev;
+	priv->state_flags = 0x0;
+
+	priv->max_nb_txq = max_tx_queues;
+	priv->max_nb_rxq = max_rx_queues;
+
+	err = gve_init_priv(priv, false);
+	if (err)
+		return err;
+
+	eth_dev->data->mac_addrs = rte_zmalloc("gve_mac", sizeof(struct rte_ether_addr), 0);
+	if (!eth_dev->data->mac_addrs) {
+		PMD_DRV_LOG(ERR, "Failed to allocate memory to store mac address");
+		return -ENOMEM;
+	}
+	rte_ether_addr_copy(&priv->dev_addr, eth_dev->data->mac_addrs);
+
+	return 0;
+}
+
+static int
+gve_dev_uninit(struct rte_eth_dev *eth_dev)
+{
+	struct gve_priv *priv = eth_dev->data->dev_private;
+
+	gve_teardown_priv_resources(priv);
+
+	eth_dev->data->mac_addrs = NULL;
+
+	return 0;
+}
+
+static int
+gve_pci_probe(__rte_unused struct rte_pci_driver *pci_drv,
+	      struct rte_pci_device *pci_dev)
+{
+	return rte_eth_dev_pci_generic_probe(pci_dev, sizeof(struct gve_priv), gve_dev_init);
+}
+
+static int
+gve_pci_remove(struct rte_pci_device *pci_dev)
+{
+	return rte_eth_dev_pci_generic_remove(pci_dev, gve_dev_uninit);
+}
+
+static const struct rte_pci_id pci_id_gve_map[] = {
+	{ RTE_PCI_DEVICE(GOOGLE_VENDOR_ID, GVE_DEV_ID) },
+	{ .device_id = 0 },
+};
+
+static struct rte_pci_driver rte_gve_pmd = {
+	.id_table = pci_id_gve_map,
+	.drv_flags = RTE_PCI_DRV_NEED_MAPPING,
+	.probe = gve_pci_probe,
+	.remove = gve_pci_remove,
+};
+
+RTE_PMD_REGISTER_PCI(net_gve, rte_gve_pmd);
+RTE_PMD_REGISTER_PCI_TABLE(net_gve, pci_id_gve_map);
+RTE_PMD_REGISTER_KMOD_DEP(net_gve, "* igb_uio | vfio-pci");
+RTE_LOG_REGISTER_SUFFIX(gve_logtype_driver, driver, NOTICE);
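To make the notification-block arithmetic in gve_init_priv() above concrete,
a short worked example (the vector count is only illustrative):

/* Example: a gVNIC instance exposing num_ntfy = 17 MSI-X vectors.
 * One vector is reserved for management, and the block count is kept even:
 *   num_ntfy_blks = (17 - 1) & ~0x1 = 16
 *   mgmt_msix_idx = 16 (the last vector)
 * The blocks are split evenly between TX and RX, so before the
 * default_num_queues clamp the PMD allows at most:
 *   max_nb_txq = max_nb_rxq = 16 / 2 = 8
 */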
diff --git a/drivers/net/gve/gve_ethdev.h b/drivers/net/gve/gve_ethdev.h
new file mode 100644
index 0000000000..2ac2a46ac1
--- /dev/null
+++ b/drivers/net/gve/gve_ethdev.h
@@ -0,0 +1,225 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2022 Intel Corporation
+ */
+
+#ifndef _GVE_ETHDEV_H_
+#define _GVE_ETHDEV_H_
+
+#include <ethdev_driver.h>
+#include <ethdev_pci.h>
+#include <rte_ether.h>
+
+#include "base/gve.h"
+
+#define GVE_DEFAULT_RX_FREE_THRESH  512
+#define GVE_DEFAULT_TX_FREE_THRESH  256
+#define GVE_TX_MAX_FREE_SZ          512
+
+#define GVE_MIN_BUF_SIZE	    1024
+#define GVE_MAX_RX_PKTLEN	    65535
+
+/* A list of pages registered with the device during setup and used by a queue
+ * as buffers
+ */
+struct gve_queue_page_list {
+	uint32_t id; /* unique id */
+	uint32_t num_entries;
+	dma_addr_t *page_buses; /* the dma addrs of the pages */
+	const struct rte_memzone *mz;
+};
+
+/* A TX desc ring entry */
+union gve_tx_desc {
+	struct gve_tx_pkt_desc pkt; /* first desc for a packet */
+	struct gve_tx_seg_desc seg; /* subsequent descs for a packet */
+};
+
+struct gve_tx_queue {
+	volatile union gve_tx_desc *tx_desc_ring;
+	const struct rte_memzone *mz;
+	uint64_t tx_ring_phys_addr;
+
+	uint16_t nb_tx_desc;
+
+	/* Only valid for DQO_QPL queue format */
+	struct gve_queue_page_list *qpl;
+
+	uint16_t port_id;
+	uint16_t queue_id;
+
+	uint16_t ntfy_id;
+	volatile rte_be32_t *ntfy_addr;
+
+	struct gve_priv *hw;
+	const struct rte_memzone *qres_mz;
+	struct gve_queue_resources *qres;
+
+	/* Only valid for DQO_RDA queue format */
+	struct gve_tx_queue *complq;
+};
+
+struct gve_rx_queue {
+	volatile struct gve_rx_desc *rx_desc_ring;
+	volatile union gve_rx_data_slot *rx_data_ring;
+	const struct rte_memzone *mz;
+	const struct rte_memzone *data_mz;
+	uint64_t rx_ring_phys_addr;
+
+	uint16_t nb_rx_desc;
+
+	volatile rte_be32_t *ntfy_addr;
+
+	/* only valid for GQI_QPL queue format */
+	struct gve_queue_page_list *qpl;
+
+	struct gve_priv *hw;
+	const struct rte_memzone *qres_mz;
+	struct gve_queue_resources *qres;
+
+	uint16_t port_id;
+	uint16_t queue_id;
+	uint16_t ntfy_id;
+	uint16_t rx_buf_len;
+
+	/* Only valid for DQO_RDA queue format */
+	struct gve_rx_queue *bufq;
+};
+
+struct gve_priv {
+	struct gve_irq_db *irq_dbs; /* array of num_ntfy_blks */
+	const struct rte_memzone *irq_dbs_mz;
+	uint32_t mgmt_msix_idx;
+	rte_be32_t *cnt_array; /* array of num_event_counters */
+	const struct rte_memzone *cnt_array_mz;
+
+	uint16_t num_event_counters;
+	uint16_t tx_desc_cnt; /* txq size */
+	uint16_t rx_desc_cnt; /* rxq size */
+	uint16_t tx_pages_per_qpl; /* tx buffer length */
+	uint16_t rx_data_slot_cnt; /* rx buffer length */
+
+	/* Only valid for DQO_RDA queue format */
+	uint16_t tx_compq_size; /* tx completion queue size */
+	uint16_t rx_bufq_size; /* rx buff queue size */
+
+	uint64_t max_registered_pages;
+	uint64_t num_registered_pages; /* num pages registered with NIC */
+	uint16_t default_num_queues; /* default num queues to set up */
+	enum gve_queue_format queue_format; /* see enum gve_queue_format */
+	uint8_t enable_rsc;
+
+	uint16_t max_nb_txq;
+	uint16_t max_nb_rxq;
+	uint32_t num_ntfy_blks; /* split between TX and RX so must be even */
+
+	struct gve_registers __iomem *reg_bar0; /* see gve_register.h */
+	rte_be32_t __iomem *db_bar2; /* "array" of doorbells */
+	struct rte_pci_device *pci_dev;
+
+	/* Admin queue - see gve_adminq.h*/
+	union gve_adminq_command *adminq;
+	struct gve_dma_mem adminq_dma_mem;
+	uint32_t adminq_mask; /* masks prod_cnt to adminq size */
+	uint32_t adminq_prod_cnt; /* free-running count of AQ cmds executed */
+	uint32_t adminq_cmd_fail; /* free-running count of AQ cmds failed */
+	uint32_t adminq_timeouts; /* free-running count of AQ cmds timeouts */
+	/* free-running count of per AQ cmd executed */
+	uint32_t adminq_describe_device_cnt;
+	uint32_t adminq_cfg_device_resources_cnt;
+	uint32_t adminq_register_page_list_cnt;
+	uint32_t adminq_unregister_page_list_cnt;
+	uint32_t adminq_create_tx_queue_cnt;
+	uint32_t adminq_create_rx_queue_cnt;
+	uint32_t adminq_destroy_tx_queue_cnt;
+	uint32_t adminq_destroy_rx_queue_cnt;
+	uint32_t adminq_dcfg_device_resources_cnt;
+	uint32_t adminq_set_driver_parameter_cnt;
+	uint32_t adminq_report_stats_cnt;
+	uint32_t adminq_report_link_speed_cnt;
+	uint32_t adminq_get_ptype_map_cnt;
+
+	volatile uint32_t state_flags;
+
+	/* Gvnic device link speed from hypervisor. */
+	uint64_t link_speed;
+
+	uint16_t max_mtu;
+	struct rte_ether_addr dev_addr; /* mac address */
+
+	struct gve_queue_page_list *qpl;
+
+	struct gve_tx_queue **txqs;
+	struct gve_rx_queue **rxqs;
+};
+
+static inline bool
+gve_is_gqi(struct gve_priv *priv)
+{
+	return priv->queue_format == GVE_GQI_RDA_FORMAT ||
+		priv->queue_format == GVE_GQI_QPL_FORMAT;
+}
+
+static inline bool
+gve_get_admin_queue_ok(struct gve_priv *priv)
+{
+	return !!rte_bit_relaxed_get32(GVE_PRIV_FLAGS_ADMIN_QUEUE_OK,
+				       &priv->state_flags);
+}
+
+static inline void
+gve_set_admin_queue_ok(struct gve_priv *priv)
+{
+	rte_bit_relaxed_set32(GVE_PRIV_FLAGS_ADMIN_QUEUE_OK,
+			      &priv->state_flags);
+}
+
+static inline void
+gve_clear_admin_queue_ok(struct gve_priv *priv)
+{
+	rte_bit_relaxed_clear32(GVE_PRIV_FLAGS_ADMIN_QUEUE_OK,
+				&priv->state_flags);
+}
+
+static inline bool
+gve_get_device_resources_ok(struct gve_priv *priv)
+{
+	return !!rte_bit_relaxed_get32(GVE_PRIV_FLAGS_DEVICE_RESOURCES_OK,
+				       &priv->state_flags);
+}
+
+static inline void
+gve_set_device_resources_ok(struct gve_priv *priv)
+{
+	rte_bit_relaxed_set32(GVE_PRIV_FLAGS_DEVICE_RESOURCES_OK,
+			      &priv->state_flags);
+}
+
+static inline void
+gve_clear_device_resources_ok(struct gve_priv *priv)
+{
+	rte_bit_relaxed_clear32(GVE_PRIV_FLAGS_DEVICE_RESOURCES_OK,
+				&priv->state_flags);
+}
+
+static inline bool
+gve_get_device_rings_ok(struct gve_priv *priv)
+{
+	return !!rte_bit_relaxed_get32(GVE_PRIV_FLAGS_DEVICE_RINGS_OK,
+				       &priv->state_flags);
+}
+
+static inline void
+gve_set_device_rings_ok(struct gve_priv *priv)
+{
+	rte_bit_relaxed_set32(GVE_PRIV_FLAGS_DEVICE_RINGS_OK,
+			      &priv->state_flags);
+}
+
+static inline void
+gve_clear_device_rings_ok(struct gve_priv *priv)
+{
+	rte_bit_relaxed_clear32(GVE_PRIV_FLAGS_DEVICE_RINGS_OK,
+				&priv->state_flags);
+}
+
+#endif /* _GVE_ETHDEV_H_ */
diff --git a/drivers/net/gve/gve_logs.h b/drivers/net/gve/gve_logs.h
new file mode 100644
index 0000000000..0d02da46e1
--- /dev/null
+++ b/drivers/net/gve/gve_logs.h
@@ -0,0 +1,14 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2022 Intel Corporation
+ */
+
+#ifndef _GVE_LOGS_H_
+#define _GVE_LOGS_H_
+
+extern int gve_logtype_driver;
+
+#define PMD_DRV_LOG(level, fmt, args...) \
+	rte_log(RTE_LOG_ ## level, gve_logtype_driver, "%s(): " fmt "\n", \
+		__func__, ## args)
+
+#endif
diff --git a/drivers/net/gve/meson.build b/drivers/net/gve/meson.build
new file mode 100644
index 0000000000..d8ec64b3a3
--- /dev/null
+++ b/drivers/net/gve/meson.build
@@ -0,0 +1,14 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(C) 2022 Intel Corporation
+
+if is_windows
+    build = false
+    reason = 'not supported on Windows'
+    subdir_done()
+endif
+
+sources = files(
+        'base/gve_adminq.c',
+        'gve_ethdev.c',
+)
+includes += include_directories('base')
diff --git a/drivers/net/gve/version.map b/drivers/net/gve/version.map
new file mode 100644
index 0000000000..c2e0723b4c
--- /dev/null
+++ b/drivers/net/gve/version.map
@@ -0,0 +1,3 @@
+DPDK_22 {
+	local: *;
+};
diff --git a/drivers/net/meson.build b/drivers/net/meson.build
index 35bfa78dee..355dbd07e9 100644
--- a/drivers/net/meson.build
+++ b/drivers/net/meson.build
@@ -23,6 +23,7 @@ drivers = [
         'enic',
         'failsafe',
         'fm10k',
+        'gve',
         'hinic',
         'hns3',
         'i40e',
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 192+ messages in thread

* [PATCH v5 4/8] net/gve: add support for link update
  2022-10-10 10:17               ` [PATCH v5 0/8] introduce GVE PMD Junfeng Guo
                                   ` (2 preceding siblings ...)
  2022-10-10 10:17                 ` [PATCH v5 3/8] net/gve: add support for device initialization Junfeng Guo
@ 2022-10-10 10:17                 ` Junfeng Guo
  2022-10-10 10:17                 ` [PATCH v5 5/8] net/gve: add support for MTU setting Junfeng Guo
                                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 192+ messages in thread
From: Junfeng Guo @ 2022-10-10 10:17 UTC (permalink / raw)
  To: qi.z.zhang, jingjing.wu
  Cc: ferruh.yigit, dev, xiaoyun.li, awogbemila, bruce.richardson,
	xueqin.lin, junfeng.guo

Support dev_ops link_update.

Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com>
Signed-off-by: Junfeng Guo <junfeng.guo@intel.com>
---
 doc/guides/nics/features/gve.ini |  1 +
 doc/guides/nics/gve.rst          |  3 +++
 drivers/net/gve/gve_ethdev.c     | 30 ++++++++++++++++++++++++++++++
 3 files changed, 34 insertions(+)

diff --git a/doc/guides/nics/features/gve.ini b/doc/guides/nics/features/gve.ini
index 44aec28009..ae466ad677 100644
--- a/doc/guides/nics/features/gve.ini
+++ b/doc/guides/nics/features/gve.ini
@@ -4,6 +4,7 @@
 ; Refer to default.ini for the full list of available PMD features.
 ;
 [Features]
+Link status          = Y
 Linux                = Y
 x86-32               = Y
 x86-64               = Y
diff --git a/doc/guides/nics/gve.rst b/doc/guides/nics/gve.rst
index 4fbef8f9dc..6e95528f55 100644
--- a/doc/guides/nics/gve.rst
+++ b/doc/guides/nics/gve.rst
@@ -55,6 +55,9 @@ Features and Limitations
 
 In this release, the GVE PMD provides the basic functionality of packet
 reception and transmission.
+Supported features of the GVE PMD are:
+
+- Link state information
 
 Currently, only the GQI_QPL and GQI_RDA queue formats are supported in the PMD.
 Jumbo frames are not supported in the PMD for now; support will be added in a
diff --git a/drivers/net/gve/gve_ethdev.c b/drivers/net/gve/gve_ethdev.c
index f8ff3b1923..ca4a467140 100644
--- a/drivers/net/gve/gve_ethdev.c
+++ b/drivers/net/gve/gve_ethdev.c
@@ -34,10 +34,39 @@ gve_dev_configure(__rte_unused struct rte_eth_dev *dev)
 	return 0;
 }
 
+static int
+gve_link_update(struct rte_eth_dev *dev, __rte_unused int wait_to_complete)
+{
+	struct gve_priv *priv = dev->data->dev_private;
+	struct rte_eth_link link;
+	int err;
+
+	memset(&link, 0, sizeof(link));
+	link.link_duplex = RTE_ETH_LINK_FULL_DUPLEX;
+	link.link_autoneg = RTE_ETH_LINK_AUTONEG;
+
+	if (!dev->data->dev_started) {
+		link.link_status = RTE_ETH_LINK_DOWN;
+		link.link_speed = RTE_ETH_SPEED_NUM_NONE;
+	} else {
+		link.link_status = RTE_ETH_LINK_UP;
+		PMD_DRV_LOG(DEBUG, "Get link status from hw");
+		err = gve_adminq_report_link_speed(priv);
+		if (err) {
+			PMD_DRV_LOG(ERR, "Failed to get link speed.");
+			priv->link_speed = RTE_ETH_SPEED_NUM_UNKNOWN;
+		}
+		link.link_speed = priv->link_speed;
+	}
+
+	return rte_eth_linkstatus_set(dev, &link);
+}
+
 static int
 gve_dev_start(struct rte_eth_dev *dev)
 {
 	dev->data->dev_started = 1;
+	gve_link_update(dev, 0);
 
 	return 0;
 }
@@ -70,6 +99,7 @@ static const struct eth_dev_ops gve_eth_dev_ops = {
 	.dev_start            = gve_dev_start,
 	.dev_stop             = gve_dev_stop,
 	.dev_close            = gve_dev_close,
+	.link_update          = gve_link_update,
 };
 
 static void
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 192+ messages in thread

* [PATCH v5 5/8] net/gve: add support for MTU setting
  2022-10-10 10:17               ` [PATCH v5 0/8] introduce GVE PMD Junfeng Guo
                                   ` (3 preceding siblings ...)
  2022-10-10 10:17                 ` [PATCH v5 4/8] net/gve: add support for link update Junfeng Guo
@ 2022-10-10 10:17                 ` Junfeng Guo
  2022-10-19 13:47                   ` Ferruh Yigit
  2022-10-10 10:17                 ` [PATCH v5 6/8] net/gve: add support for dev info get and dev configure Junfeng Guo
                                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 192+ messages in thread
From: Junfeng Guo @ 2022-10-10 10:17 UTC (permalink / raw)
  To: qi.z.zhang, jingjing.wu
  Cc: ferruh.yigit, dev, xiaoyun.li, awogbemila, bruce.richardson,
	xueqin.lin, junfeng.guo

Support dev_ops mtu_set.

Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com>
Signed-off-by: Junfeng Guo <junfeng.guo@intel.com>
---
 doc/guides/nics/features/gve.ini |  1 +
 drivers/net/gve/gve_ethdev.c     | 29 +++++++++++++++++++++++++++++
 2 files changed, 30 insertions(+)

diff --git a/doc/guides/nics/features/gve.ini b/doc/guides/nics/features/gve.ini
index ae466ad677..d1703d8dab 100644
--- a/doc/guides/nics/features/gve.ini
+++ b/doc/guides/nics/features/gve.ini
@@ -5,6 +5,7 @@
 ;
 [Features]
 Link status          = Y
+MTU update           = Y
 Linux                = Y
 x86-32               = Y
 x86-64               = Y
diff --git a/drivers/net/gve/gve_ethdev.c b/drivers/net/gve/gve_ethdev.c
index ca4a467140..e9c68964ac 100644
--- a/drivers/net/gve/gve_ethdev.c
+++ b/drivers/net/gve/gve_ethdev.c
@@ -94,12 +94,41 @@ gve_dev_close(struct rte_eth_dev *dev)
 	return err;
 }
 
+static int
+gve_dev_mtu_set(struct rte_eth_dev *dev, uint16_t mtu)
+{
+	struct gve_priv *priv = dev->data->dev_private;
+	int err;
+
+	if (mtu < RTE_ETHER_MIN_MTU || mtu > priv->max_mtu) {
+		PMD_DRV_LOG(ERR, "MIN MTU is %u MAX MTU is %u", RTE_ETHER_MIN_MTU, priv->max_mtu);
+		return -EINVAL;
+	}
+
+	/* MTU setting is forbidden if the port is started */
+	if (dev->data->dev_started) {
+		PMD_DRV_LOG(ERR, "Port must be stopped before configuration");
+		return -EBUSY;
+	}
+
+	dev->data->dev_conf.rxmode.mtu = mtu + RTE_ETHER_HDR_LEN;
+
+	err = gve_adminq_set_mtu(priv, mtu);
+	if (err) {
+		PMD_DRV_LOG(ERR, "Failed to set mtu as %u err = %d", mtu, err);
+		return err;
+	}
+
+	return 0;
+}
+
 static const struct eth_dev_ops gve_eth_dev_ops = {
 	.dev_configure        = gve_dev_configure,
 	.dev_start            = gve_dev_start,
 	.dev_stop             = gve_dev_stop,
 	.dev_close            = gve_dev_close,
 	.link_update          = gve_link_update,
+	.mtu_set              = gve_dev_mtu_set,
 };
 
 static void
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 192+ messages in thread

* [PATCH v5 6/8] net/gve: add support for dev info get and dev configure
  2022-10-10 10:17               ` [PATCH v5 0/8] introduce GVE PMD Junfeng Guo
                                   ` (4 preceding siblings ...)
  2022-10-10 10:17                 ` [PATCH v5 5/8] net/gve: add support for MTU setting Junfeng Guo
@ 2022-10-10 10:17                 ` Junfeng Guo
  2022-10-19 13:49                   ` Ferruh Yigit
  2022-10-10 10:17                 ` [PATCH v5 7/8] net/gve: add support for queue operations Junfeng Guo
  2022-10-10 10:17                 ` [PATCH v5 8/8] net/gve: add support for Rx/Tx Junfeng Guo
  7 siblings, 1 reply; 192+ messages in thread
From: Junfeng Guo @ 2022-10-10 10:17 UTC (permalink / raw)
  To: qi.z.zhang, jingjing.wu
  Cc: ferruh.yigit, dev, xiaoyun.li, awogbemila, bruce.richardson,
	xueqin.lin, junfeng.guo

Add dev_ops dev_infos_get.
Complete dev_configure with RX offloads configuration.

Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com>
Signed-off-by: Junfeng Guo <junfeng.guo@intel.com>
---
 doc/guides/nics/features/gve.ini |  2 +
 doc/guides/nics/gve.rst          |  1 +
 drivers/net/gve/gve_ethdev.c     | 65 +++++++++++++++++++++++++++++++-
 3 files changed, 67 insertions(+), 1 deletion(-)

diff --git a/doc/guides/nics/features/gve.ini b/doc/guides/nics/features/gve.ini
index d1703d8dab..986df7f94a 100644
--- a/doc/guides/nics/features/gve.ini
+++ b/doc/guides/nics/features/gve.ini
@@ -4,8 +4,10 @@
 ; Refer to default.ini for the full list of available PMD features.
 ;
 [Features]
+Speed capabilities   = Y
 Link status          = Y
 MTU update           = Y
+RSS hash             = Y
 Linux                = Y
 x86-32               = Y
 x86-64               = Y
diff --git a/doc/guides/nics/gve.rst b/doc/guides/nics/gve.rst
index 6e95528f55..d76f9bd5b9 100644
--- a/doc/guides/nics/gve.rst
+++ b/doc/guides/nics/gve.rst
@@ -57,6 +57,7 @@ In this release, the GVE PMD provides the basic functionality of packet
 reception and transmission.
 Supported features of the GVE PMD are:
 
+- Receive Side Scaling (RSS)
 - Link state information
 
 Currently, only the GQI_QPL and GQI_RDA queue formats are supported in the PMD.
diff --git a/drivers/net/gve/gve_ethdev.c b/drivers/net/gve/gve_ethdev.c
index e9c68964ac..2f914b55ed 100644
--- a/drivers/net/gve/gve_ethdev.c
+++ b/drivers/net/gve/gve_ethdev.c
@@ -29,8 +29,16 @@ gve_write_version(uint8_t *driver_version_register)
 }
 
 static int
-gve_dev_configure(__rte_unused struct rte_eth_dev *dev)
+gve_dev_configure(struct rte_eth_dev *dev)
 {
+	struct gve_priv *priv = dev->data->dev_private;
+
+	if (dev->data->dev_conf.rxmode.mq_mode & RTE_ETH_MQ_RX_RSS_FLAG)
+		dev->data->dev_conf.rxmode.offloads |= RTE_ETH_RX_OFFLOAD_RSS_HASH;
+
+	if (dev->data->dev_conf.rxmode.offloads & RTE_ETH_RX_OFFLOAD_TCP_LRO)
+		priv->enable_rsc = 1;
+
 	return 0;
 }
 
@@ -94,6 +102,60 @@ gve_dev_close(struct rte_eth_dev *dev)
 	return err;
 }
 
+static int
+gve_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
+{
+	struct gve_priv *priv = dev->data->dev_private;
+
+	dev_info->device = dev->device;
+	dev_info->max_mac_addrs = 1;
+	dev_info->max_rx_queues = priv->max_nb_rxq;
+	dev_info->max_tx_queues = priv->max_nb_txq;
+	dev_info->min_rx_bufsize = GVE_MIN_BUF_SIZE;
+	dev_info->max_rx_pktlen = GVE_MAX_RX_PKTLEN;
+	dev_info->max_mtu = RTE_ETHER_MTU;
+	dev_info->min_mtu = RTE_ETHER_MIN_MTU;
+
+	dev_info->rx_offload_capa = 0;
+	dev_info->tx_offload_capa =
+		RTE_ETH_TX_OFFLOAD_MULTI_SEGS	|
+		RTE_ETH_TX_OFFLOAD_IPV4_CKSUM	|
+		RTE_ETH_TX_OFFLOAD_UDP_CKSUM	|
+		RTE_ETH_TX_OFFLOAD_TCP_CKSUM	|
+		RTE_ETH_TX_OFFLOAD_SCTP_CKSUM	|
+		RTE_ETH_TX_OFFLOAD_TCP_TSO;
+
+	if (priv->queue_format == GVE_DQO_RDA_FORMAT)
+		dev_info->rx_offload_capa |= RTE_ETH_RX_OFFLOAD_TCP_LRO;
+
+	dev_info->default_rxconf = (struct rte_eth_rxconf) {
+		.rx_free_thresh = GVE_DEFAULT_RX_FREE_THRESH,
+		.rx_drop_en = 0,
+		.offloads = 0,
+	};
+
+	dev_info->default_txconf = (struct rte_eth_txconf) {
+		.tx_free_thresh = GVE_DEFAULT_TX_FREE_THRESH,
+		.offloads = 0,
+	};
+
+	dev_info->default_rxportconf.ring_size = priv->rx_desc_cnt;
+	dev_info->rx_desc_lim = (struct rte_eth_desc_lim) {
+		.nb_max = priv->rx_desc_cnt,
+		.nb_min = priv->rx_desc_cnt,
+		.nb_align = 1,
+	};
+
+	dev_info->default_txportconf.ring_size = priv->tx_desc_cnt;
+	dev_info->tx_desc_lim = (struct rte_eth_desc_lim) {
+		.nb_max = priv->tx_desc_cnt,
+		.nb_min = priv->tx_desc_cnt,
+		.nb_align = 1,
+	};
+
+	return 0;
+}
+
 static int
 gve_dev_mtu_set(struct rte_eth_dev *dev, uint16_t mtu)
 {
@@ -127,6 +189,7 @@ static const struct eth_dev_ops gve_eth_dev_ops = {
 	.dev_start            = gve_dev_start,
 	.dev_stop             = gve_dev_stop,
 	.dev_close            = gve_dev_close,
+	.dev_infos_get        = gve_dev_info_get,
 	.link_update          = gve_link_update,
 	.mtu_set              = gve_dev_mtu_set,
 };
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 192+ messages in thread

* [PATCH v5 7/8] net/gve: add support for queue operations
  2022-10-10 10:17               ` [PATCH v5 0/8] introduce GVE PMD Junfeng Guo
                                   ` (5 preceding siblings ...)
  2022-10-10 10:17                 ` [PATCH v5 6/8] net/gve: add support for dev info get and dev configure Junfeng Guo
@ 2022-10-10 10:17                 ` Junfeng Guo
  2022-10-10 10:17                 ` [PATCH v5 8/8] net/gve: add support for Rx/Tx Junfeng Guo
  7 siblings, 0 replies; 192+ messages in thread
From: Junfeng Guo @ 2022-10-10 10:17 UTC (permalink / raw)
  To: qi.z.zhang, jingjing.wu
  Cc: ferruh.yigit, dev, xiaoyun.li, awogbemila, bruce.richardson,
	xueqin.lin, junfeng.guo

Add support for queue operations:
- setup rx/tx queue
- release rx/tx queue
- start rx/tx queues
- stop rx/tx queues

Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com>
Signed-off-by: Junfeng Guo <junfeng.guo@intel.com>
---
 drivers/net/gve/gve_ethdev.c | 204 +++++++++++++++++++++++++++++++++
 drivers/net/gve/gve_ethdev.h |  52 +++++++++
 drivers/net/gve/gve_rx.c     | 212 ++++++++++++++++++++++++++++++++++
 drivers/net/gve/gve_tx.c     | 214 +++++++++++++++++++++++++++++++++++
 drivers/net/gve/meson.build  |   2 +
 5 files changed, 684 insertions(+)
 create mode 100644 drivers/net/gve/gve_rx.c
 create mode 100644 drivers/net/gve/gve_tx.c

diff --git a/drivers/net/gve/gve_ethdev.c b/drivers/net/gve/gve_ethdev.c
index 2f914b55ed..5c568268fa 100644
--- a/drivers/net/gve/gve_ethdev.c
+++ b/drivers/net/gve/gve_ethdev.c
@@ -28,6 +28,68 @@ gve_write_version(uint8_t *driver_version_register)
 	writeb('\n', driver_version_register);
 }
 
+static int
+gve_alloc_queue_page_list(struct gve_priv *priv, uint32_t id, uint32_t pages)
+{
+	char z_name[RTE_MEMZONE_NAMESIZE];
+	struct gve_queue_page_list *qpl;
+	const struct rte_memzone *mz;
+	dma_addr_t page_bus;
+	uint32_t i;
+
+	if (priv->num_registered_pages + pages >
+	    priv->max_registered_pages) {
+		PMD_DRV_LOG(ERR, "Pages %" PRIu64 " > max registered pages %" PRIu64,
+			    priv->num_registered_pages + pages,
+			    priv->max_registered_pages);
+		return -EINVAL;
+	}
+	qpl = &priv->qpl[id];
+	snprintf(z_name, sizeof(z_name), "gve_%s_qpl%d", priv->pci_dev->device.name, id);
+	mz = rte_memzone_reserve_aligned(z_name, pages * PAGE_SIZE,
+					 rte_socket_id(),
+					 RTE_MEMZONE_IOVA_CONTIG, PAGE_SIZE);
+	if (mz == NULL) {
+		PMD_DRV_LOG(ERR, "Failed to alloc %s.", z_name);
+		return -ENOMEM;
+	}
+	qpl->page_buses = rte_zmalloc("qpl page buses", pages * sizeof(dma_addr_t), 0);
+	if (qpl->page_buses == NULL) {
+		PMD_DRV_LOG(ERR, "Failed to alloc qpl %u page buses", id);
+		return -ENOMEM;
+	}
+	page_bus = mz->iova;
+	for (i = 0; i < pages; i++) {
+		qpl->page_buses[i] = page_bus;
+		page_bus += PAGE_SIZE;
+	}
+	qpl->id = id;
+	qpl->mz = mz;
+	qpl->num_entries = pages;
+
+	priv->num_registered_pages += pages;
+
+	return 0;
+}
+
+static void
+gve_free_qpls(struct gve_priv *priv)
+{
+	uint16_t nb_txqs = priv->max_nb_txq;
+	uint16_t nb_rxqs = priv->max_nb_rxq;
+	uint32_t i;
+
+	for (i = 0; i < nb_txqs + nb_rxqs; i++) {
+		if (priv->qpl[i].mz != NULL)
+			rte_memzone_free(priv->qpl[i].mz);
+		if (priv->qpl[i].page_buses != NULL)
+			rte_free(priv->qpl[i].page_buses);
+	}
+
+	if (priv->qpl != NULL)
+		rte_free(priv->qpl);
+}
+
 static int
 gve_dev_configure(struct rte_eth_dev *dev)
 {
@@ -42,6 +104,43 @@ gve_dev_configure(struct rte_eth_dev *dev)
 	return 0;
 }
 
+static int
+gve_refill_pages(struct gve_rx_queue *rxq)
+{
+	struct rte_mbuf *nmb;
+	uint16_t i;
+	int diag;
+
+	diag = rte_pktmbuf_alloc_bulk(rxq->mpool, &rxq->sw_ring[0], rxq->nb_rx_desc);
+	if (diag < 0) {
+		for (i = 0; i < rxq->nb_rx_desc - 1; i++) {
+			nmb = rte_pktmbuf_alloc(rxq->mpool);
+			if (!nmb)
+				break;
+			rxq->sw_ring[i] = nmb;
+		}
+		if (i < rxq->nb_rx_desc - 1)
+			return -ENOMEM;
+	}
+	rxq->nb_avail = 0;
+	rxq->next_avail = rxq->nb_rx_desc - 1;
+
+	for (i = 0; i < rxq->nb_rx_desc; i++) {
+		if (rxq->is_gqi_qpl) {
+			rxq->rx_data_ring[i].addr = rte_cpu_to_be_64(i * PAGE_SIZE);
+		} else {
+			if (i == rxq->nb_rx_desc - 1)
+				break;
+			nmb = rxq->sw_ring[i];
+			rxq->rx_data_ring[i].addr = rte_cpu_to_be_64(rte_mbuf_data_iova(nmb));
+		}
+	}
+
+	rte_write32(rte_cpu_to_be_32(rxq->next_avail), rxq->qrx_tail);
+
+	return 0;
+}
+
 static int
 gve_link_update(struct rte_eth_dev *dev, __rte_unused int wait_to_complete)
 {
@@ -73,16 +172,70 @@ gve_link_update(struct rte_eth_dev *dev, __rte_unused int wait_to_complete)
 static int
 gve_dev_start(struct rte_eth_dev *dev)
 {
+	uint16_t num_queues = dev->data->nb_tx_queues;
+	struct gve_priv *priv = dev->data->dev_private;
+	struct gve_tx_queue *txq;
+	struct gve_rx_queue *rxq;
+	uint16_t i;
+	int err;
+
+	priv->txqs = (struct gve_tx_queue **)dev->data->tx_queues;
+	err = gve_adminq_create_tx_queues(priv, num_queues);
+	if (err) {
+		PMD_DRV_LOG(ERR, "failed to create %u tx queues.", num_queues);
+		return err;
+	}
+	for (i = 0; i < num_queues; i++) {
+		txq = priv->txqs[i];
+		txq->qtx_tail =
+		&priv->db_bar2[rte_be_to_cpu_32(txq->qres->db_index)];
+		txq->qtx_head =
+		&priv->cnt_array[rte_be_to_cpu_32(txq->qres->counter_index)];
+
+		rte_write32(rte_cpu_to_be_32(GVE_IRQ_MASK), txq->ntfy_addr);
+	}
+
+	num_queues = dev->data->nb_rx_queues;
+	priv->rxqs = (struct gve_rx_queue **)dev->data->rx_queues;
+	err = gve_adminq_create_rx_queues(priv, num_queues);
+	if (err) {
+		PMD_DRV_LOG(ERR, "failed to create %u rx queues.", num_queues);
+		goto err_tx;
+	}
+	for (i = 0; i < num_queues; i++) {
+		rxq = priv->rxqs[i];
+		rxq->qrx_tail =
+		&priv->db_bar2[rte_be_to_cpu_32(rxq->qres->db_index)];
+
+		rte_write32(rte_cpu_to_be_32(GVE_IRQ_MASK), rxq->ntfy_addr);
+
+		err = gve_refill_pages(rxq);
+		if (err) {
+			PMD_DRV_LOG(ERR, "Failed to refill for RX");
+			goto err_rx;
+		}
+	}
+
 	dev->data->dev_started = 1;
 	gve_link_update(dev, 0);
 
 	return 0;
+
+err_rx:
+	gve_stop_rx_queues(dev);
+err_tx:
+	gve_stop_tx_queues(dev);
+	return err;
 }
 
 static int
 gve_dev_stop(struct rte_eth_dev *dev)
 {
 	dev->data->dev_link.link_status = RTE_ETH_LINK_DOWN;
+
+	gve_stop_tx_queues(dev);
+	gve_stop_rx_queues(dev);
+
 	dev->data->dev_started = 0;
 
 	return 0;
@@ -91,7 +244,11 @@ gve_dev_stop(struct rte_eth_dev *dev)
 static int
 gve_dev_close(struct rte_eth_dev *dev)
 {
+	struct gve_priv *priv = dev->data->dev_private;
+	struct gve_tx_queue *txq;
+	struct gve_rx_queue *rxq;
 	int err = 0;
+	uint16_t i;
 
 	if (dev->data->dev_started) {
 		err = gve_dev_stop(dev);
@@ -99,6 +256,19 @@ gve_dev_close(struct rte_eth_dev *dev)
 			PMD_DRV_LOG(ERR, "Failed to stop dev.");
 	}
 
+	for (i = 0; i < dev->data->nb_tx_queues; i++) {
+		txq = dev->data->tx_queues[i];
+		gve_tx_queue_release(txq);
+	}
+
+	for (i = 0; i < dev->data->nb_rx_queues; i++) {
+		rxq = dev->data->rx_queues[i];
+		gve_rx_queue_release(rxq);
+	}
+
+	gve_free_qpls(priv);
+	rte_free(priv->adminq);
+
 	return err;
 }
 
@@ -190,6 +360,8 @@ static const struct eth_dev_ops gve_eth_dev_ops = {
 	.dev_stop             = gve_dev_stop,
 	.dev_close            = gve_dev_close,
 	.dev_infos_get        = gve_dev_info_get,
+	.rx_queue_setup       = gve_rx_queue_setup,
+	.tx_queue_setup       = gve_tx_queue_setup,
 	.link_update          = gve_link_update,
 	.mtu_set              = gve_dev_mtu_set,
 };
@@ -327,7 +499,9 @@ gve_setup_device_resources(struct gve_priv *priv)
 static int
 gve_init_priv(struct gve_priv *priv, bool skip_describe_device)
 {
+	uint16_t pages;
 	int num_ntfy;
+	uint32_t i;
 	int err;
 
 	/* Set up the adminq */
@@ -378,10 +552,40 @@ gve_init_priv(struct gve_priv *priv, bool skip_describe_device)
 	PMD_DRV_LOG(INFO, "Max TX queues %d, Max RX queues %d",
 		    priv->max_nb_txq, priv->max_nb_rxq);
 
+	/* In GQI_QPL queue format:
+	 * Allocate queue page lists according to max queue number
+	 * tx qpl id should start from 0 while rx qpl id should start
+	 * from priv->max_nb_txq
+	 */
+	if (priv->queue_format == GVE_GQI_QPL_FORMAT) {
+		priv->qpl = rte_zmalloc("gve_qpl",
+					(priv->max_nb_txq + priv->max_nb_rxq) *
+					sizeof(struct gve_queue_page_list), 0);
+		if (priv->qpl == NULL) {
+			PMD_DRV_LOG(ERR, "Failed to alloc qpl.");
+			err = -ENOMEM;
+			goto free_adminq;
+		}
+
+		for (i = 0; i < priv->max_nb_txq + priv->max_nb_rxq; i++) {
+			if (i < priv->max_nb_txq)
+				pages = priv->tx_pages_per_qpl;
+			else
+				pages = priv->rx_data_slot_cnt;
+			err = gve_alloc_queue_page_list(priv, i, pages);
+			if (err != 0) {
+				PMD_DRV_LOG(ERR, "Failed to alloc qpl %u.", i);
+				goto err_qpl;
+			}
+		}
+	}
+
 setup_device:
 	err = gve_setup_device_resources(priv);
 	if (!err)
 		return 0;
+err_qpl:
+	gve_free_qpls(priv);
 free_adminq:
 	gve_adminq_free(priv);
 	return err;
diff --git a/drivers/net/gve/gve_ethdev.h b/drivers/net/gve/gve_ethdev.h
index 2ac2a46ac1..20fe57781e 100644
--- a/drivers/net/gve/gve_ethdev.h
+++ b/drivers/net/gve/gve_ethdev.h
@@ -34,15 +34,35 @@ union gve_tx_desc {
 	struct gve_tx_seg_desc seg; /* subsequent descs for a packet */
 };
 
+struct gve_tx_iovec {
+	uint32_t iov_base; /* offset in fifo */
+	uint32_t iov_len;
+};
+
 struct gve_tx_queue {
 	volatile union gve_tx_desc *tx_desc_ring;
 	const struct rte_memzone *mz;
 	uint64_t tx_ring_phys_addr;
+	struct rte_mbuf **sw_ring;
+	volatile rte_be32_t *qtx_tail;
+	volatile rte_be32_t *qtx_head;
 
+	uint32_t tx_tail;
 	uint16_t nb_tx_desc;
+	uint16_t nb_free;
+	uint32_t next_to_clean;
+	uint16_t free_thresh;
 
 	/* Only valid for DQO_QPL queue format */
+	uint16_t sw_tail;
+	uint16_t sw_ntc;
+	uint16_t sw_nb_free;
+	uint32_t fifo_size;
+	uint32_t fifo_head;
+	uint32_t fifo_avail;
+	uint64_t fifo_base;
 	struct gve_queue_page_list *qpl;
+	struct gve_tx_iovec *iov_ring;
 
 	uint16_t port_id;
 	uint16_t queue_id;
@@ -56,6 +76,8 @@ struct gve_tx_queue {
 
 	/* Only valid for DQO_RDA queue format */
 	struct gve_tx_queue *complq;
+
+	uint8_t is_gqi_qpl;
 };
 
 struct gve_rx_queue {
@@ -64,9 +86,17 @@ struct gve_rx_queue {
 	const struct rte_memzone *mz;
 	const struct rte_memzone *data_mz;
 	uint64_t rx_ring_phys_addr;
+	struct rte_mbuf **sw_ring;
+	struct rte_mempool *mpool;
 
+	uint16_t rx_tail;
 	uint16_t nb_rx_desc;
+	uint16_t expected_seqno; /* the next expected seqno */
+	uint16_t free_thresh;
+	uint32_t next_avail;
+	uint32_t nb_avail;
 
+	volatile rte_be32_t *qrx_tail;
 	volatile rte_be32_t *ntfy_addr;
 
 	/* only valid for GQI_QPL queue format */
@@ -83,6 +113,8 @@ struct gve_rx_queue {
 
 	/* Only valid for DQO_RDA queue format */
 	struct gve_rx_queue *bufq;
+
+	uint8_t is_gqi_qpl;
 };
 
 struct gve_priv {
@@ -222,4 +254,24 @@ gve_clear_device_rings_ok(struct gve_priv *priv)
 				&priv->state_flags);
 }
 
+int
+gve_rx_queue_setup(struct rte_eth_dev *dev, uint16_t queue_id, uint16_t nb_desc,
+		   unsigned int socket_id, const struct rte_eth_rxconf *conf,
+		   struct rte_mempool *pool);
+int
+gve_tx_queue_setup(struct rte_eth_dev *dev, uint16_t queue_id, uint16_t nb_desc,
+		   unsigned int socket_id, const struct rte_eth_txconf *conf);
+
+void
+gve_tx_queue_release(void *txq);
+
+void
+gve_rx_queue_release(void *rxq);
+
+void
+gve_stop_tx_queues(struct rte_eth_dev *dev);
+
+void
+gve_stop_rx_queues(struct rte_eth_dev *dev);
+
 #endif /* _GVE_ETHDEV_H_ */
diff --git a/drivers/net/gve/gve_rx.c b/drivers/net/gve/gve_rx.c
new file mode 100644
index 0000000000..e64a461253
--- /dev/null
+++ b/drivers/net/gve/gve_rx.c
@@ -0,0 +1,212 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2022 Intel Corporation
+ */
+
+#include "gve_ethdev.h"
+#include "base/gve_adminq.h"
+
+static inline void
+gve_reset_rxq(struct gve_rx_queue *rxq)
+{
+	struct rte_mbuf **sw_ring = rxq->sw_ring;
+	uint32_t size, i;
+
+	if (rxq == NULL) {
+		PMD_DRV_LOG(ERR, "pointer to rxq is NULL");
+		return;
+	}
+
+	size = rxq->nb_rx_desc * sizeof(struct gve_rx_desc);
+	for (i = 0; i < size; i++)
+		((volatile char *)rxq->rx_desc_ring)[i] = 0;
+
+	size = rxq->nb_rx_desc * sizeof(union gve_rx_data_slot);
+	for (i = 0; i < size; i++)
+		((volatile char *)rxq->rx_data_ring)[i] = 0;
+
+	for (i = 0; i < rxq->nb_rx_desc; i++)
+		sw_ring[i] = NULL;
+
+	rxq->rx_tail = 0;
+	rxq->next_avail = 0;
+	rxq->nb_avail = rxq->nb_rx_desc;
+	rxq->expected_seqno = 1;
+}
+
+static inline void
+gve_release_rxq_mbufs(struct gve_rx_queue *rxq)
+{
+	uint16_t i;
+
+	for (i = 0; i < rxq->nb_rx_desc; i++) {
+		if (rxq->sw_ring[i]) {
+			rte_pktmbuf_free_seg(rxq->sw_ring[i]);
+			rxq->sw_ring[i] = NULL;
+		}
+	}
+
+	rxq->nb_avail = rxq->nb_rx_desc;
+}
+
+void
+gve_rx_queue_release(void *rxq)
+{
+	struct gve_rx_queue *q = rxq;
+
+	if (!q)
+		return;
+
+	if (q->is_gqi_qpl) {
+		gve_adminq_unregister_page_list(q->hw, q->qpl->id);
+		q->qpl = NULL;
+	}
+
+	gve_release_rxq_mbufs(q);
+	rte_free(q->sw_ring);
+	rte_memzone_free(q->data_mz);
+	rte_memzone_free(q->mz);
+	rte_memzone_free(q->qres_mz);
+	q->qres = NULL;
+	rte_free(q);
+}
+
+int
+gve_rx_queue_setup(struct rte_eth_dev *dev, uint16_t queue_id,
+		uint16_t nb_desc, unsigned int socket_id,
+		const struct rte_eth_rxconf *conf, struct rte_mempool *pool)
+{
+	struct gve_priv *hw = dev->data->dev_private;
+	const struct rte_memzone *mz;
+	struct gve_rx_queue *rxq;
+	uint16_t free_thresh;
+	int err = 0;
+
+	if (nb_desc != hw->rx_desc_cnt) {
+		PMD_DRV_LOG(WARNING, "gve doesn't support nb_desc config, use hw nb_desc %u.",
+			    hw->rx_desc_cnt);
+	}
+	nb_desc = hw->rx_desc_cnt;
+
+	/* Free memory if needed. */
+	if (dev->data->rx_queues[queue_id]) {
+		gve_rx_queue_release(dev->data->rx_queues[queue_id]);
+		dev->data->rx_queues[queue_id] = NULL;
+	}
+
+	/* Allocate the RX queue data structure. */
+	rxq = rte_zmalloc_socket("gve rxq",
+				 sizeof(struct gve_rx_queue),
+				 RTE_CACHE_LINE_SIZE,
+				 socket_id);
+	if (!rxq) {
+		PMD_DRV_LOG(ERR, "Failed to allocate memory for rx queue structure");
+		err = -ENOMEM;
+		goto err_rxq;
+	}
+
+	free_thresh = conf->rx_free_thresh ? conf->rx_free_thresh : GVE_DEFAULT_RX_FREE_THRESH;
+	if (free_thresh >= nb_desc) {
+		PMD_DRV_LOG(ERR, "rx_free_thresh (%u) must be less than nb_desc (%u).",
+			    free_thresh, nb_desc);
+		err = -EINVAL;
+		goto err_rxq;
+	}
+
+	rxq->nb_rx_desc = nb_desc;
+	rxq->free_thresh = free_thresh;
+	rxq->queue_id = queue_id;
+	rxq->port_id = dev->data->port_id;
+	rxq->ntfy_id = hw->num_ntfy_blks / 2 + queue_id;
+	rxq->is_gqi_qpl = hw->queue_format == GVE_GQI_QPL_FORMAT;
+	rxq->mpool = pool;
+	rxq->hw = hw;
+	rxq->ntfy_addr = &hw->db_bar2[rte_be_to_cpu_32(hw->irq_dbs[rxq->ntfy_id].id)];
+
+	rxq->rx_buf_len = rte_pktmbuf_data_room_size(rxq->mpool) - RTE_PKTMBUF_HEADROOM;
+
+	/* Allocate software ring */
+	rxq->sw_ring = rte_zmalloc_socket("gve rx sw ring", sizeof(struct rte_mbuf *) * nb_desc,
+					  RTE_CACHE_LINE_SIZE, socket_id);
+	if (!rxq->sw_ring) {
+		PMD_DRV_LOG(ERR, "Failed to allocate memory for SW RX ring");
+		err = -ENOMEM;
+		goto err_rxq;
+	}
+
+	mz = rte_eth_dma_zone_reserve(dev, "rx_ring", queue_id,
+				      nb_desc * sizeof(struct gve_rx_desc),
+				      PAGE_SIZE, socket_id);
+	if (mz == NULL) {
+		PMD_DRV_LOG(ERR, "Failed to reserve DMA memory for RX");
+		err = -ENOMEM;
+		goto err_sw_ring;
+	}
+	rxq->rx_desc_ring = (struct gve_rx_desc *)mz->addr;
+	rxq->rx_ring_phys_addr = mz->iova;
+	rxq->mz = mz;
+
+	mz = rte_eth_dma_zone_reserve(dev, "gve rx data ring", queue_id,
+				      sizeof(union gve_rx_data_slot) * nb_desc,
+				      PAGE_SIZE, socket_id);
+	if (mz == NULL) {
+		PMD_DRV_LOG(ERR, "Failed to allocate memory for RX data ring");
+		err = -ENOMEM;
+		goto err_rx_ring;
+	}
+	rxq->rx_data_ring = (union gve_rx_data_slot *)mz->addr;
+	rxq->data_mz = mz;
+	if (rxq->is_gqi_qpl) {
+		rxq->qpl = &hw->qpl[rxq->ntfy_id];
+		err = gve_adminq_register_page_list(hw, rxq->qpl);
+		if (err != 0) {
+			PMD_DRV_LOG(ERR, "Failed to register qpl %u", queue_id);
+			goto err_data_ring;
+		}
+	}
+
+	mz = rte_eth_dma_zone_reserve(dev, "rxq_res", queue_id,
+				      sizeof(struct gve_queue_resources),
+				      PAGE_SIZE, socket_id);
+	if (mz == NULL) {
+		PMD_DRV_LOG(ERR, "Failed to reserve DMA memory for RX resource");
+		err = -ENOMEM;
+		goto err_data_ring;
+	}
+	rxq->qres = (struct gve_queue_resources *)mz->addr;
+	rxq->qres_mz = mz;
+
+	gve_reset_rxq(rxq);
+
+	dev->data->rx_queues[queue_id] = rxq;
+
+	return 0;
+
+err_data_ring:
+	rte_memzone_free(rxq->data_mz);
+err_rx_ring:
+	rte_memzone_free(rxq->mz);
+err_sw_ring:
+	rte_free(rxq->sw_ring);
+err_rxq:
+	rte_free(rxq);
+	return err;
+}
+
+void
+gve_stop_rx_queues(struct rte_eth_dev *dev)
+{
+	struct gve_priv *hw = dev->data->dev_private;
+	struct gve_rx_queue *rxq;
+	uint16_t i;
+	int err;
+
+	err = gve_adminq_destroy_rx_queues(hw, dev->data->nb_rx_queues);
+	if (err != 0)
+		PMD_DRV_LOG(WARNING, "failed to destroy rxqs");
+
+	for (i = 0; i < dev->data->nb_rx_queues; i++) {
+		rxq = dev->data->rx_queues[i];
+		gve_release_rxq_mbufs(rxq);
+		gve_reset_rxq(rxq);
+	}
+}
diff --git a/drivers/net/gve/gve_tx.c b/drivers/net/gve/gve_tx.c
new file mode 100644
index 0000000000..b706b62e71
--- /dev/null
+++ b/drivers/net/gve/gve_tx.c
@@ -0,0 +1,214 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2022 Intel Corporation
+ */
+
+#include "gve_ethdev.h"
+#include "base/gve_adminq.h"
+
+static inline void
+gve_reset_txq(struct gve_tx_queue *txq)
+{
+	struct rte_mbuf **sw_ring = txq->sw_ring;
+	uint32_t size, i;
+
+	if (txq == NULL) {
+		PMD_DRV_LOG(ERR, "Pointer to txq is NULL");
+		return;
+	}
+
+	size = txq->nb_tx_desc * sizeof(union gve_tx_desc);
+	for (i = 0; i < size; i++)
+		((volatile char *)txq->tx_desc_ring)[i] = 0;
+
+	for (i = 0; i < txq->nb_tx_desc; i++) {
+		sw_ring[i] = NULL;
+		if (txq->is_gqi_qpl) {
+			txq->iov_ring[i].iov_base = 0;
+			txq->iov_ring[i].iov_len = 0;
+		}
+	}
+
+	txq->tx_tail = 0;
+	txq->nb_free = txq->nb_tx_desc - 1;
+	txq->next_to_clean = 0;
+
+	if (txq->is_gqi_qpl) {
+		txq->fifo_size = PAGE_SIZE * txq->hw->tx_pages_per_qpl;
+		txq->fifo_avail = txq->fifo_size;
+		txq->fifo_head = 0;
+		txq->fifo_base = (uint64_t)(txq->qpl->mz->addr);
+
+		txq->sw_tail = 0;
+		txq->sw_nb_free = txq->nb_tx_desc - 1;
+		txq->sw_ntc = 0;
+	}
+}
+
+static inline void
+gve_release_txq_mbufs(struct gve_tx_queue *txq)
+{
+	uint16_t i;
+
+	for (i = 0; i < txq->nb_tx_desc; i++) {
+		if (txq->sw_ring[i]) {
+			rte_pktmbuf_free_seg(txq->sw_ring[i]);
+			txq->sw_ring[i] = NULL;
+		}
+	}
+}
+
+void
+gve_tx_queue_release(void *txq)
+{
+	struct gve_tx_queue *q = txq;
+
+	if (!q)
+		return;
+
+	if (q->is_gqi_qpl) {
+		gve_adminq_unregister_page_list(q->hw, q->qpl->id);
+		rte_free(q->iov_ring);
+		q->qpl = NULL;
+	}
+
+	gve_release_txq_mbufs(q);
+	rte_free(q->sw_ring);
+	rte_memzone_free(q->mz);
+	rte_memzone_free(q->qres_mz);
+	q->qres = NULL;
+	rte_free(q);
+}
+
+int
+gve_tx_queue_setup(struct rte_eth_dev *dev, uint16_t queue_id, uint16_t nb_desc,
+		   unsigned int socket_id, const struct rte_eth_txconf *conf)
+{
+	struct gve_priv *hw = dev->data->dev_private;
+	const struct rte_memzone *mz;
+	struct gve_tx_queue *txq;
+	uint16_t free_thresh;
+	int err = 0;
+
+	if (nb_desc != hw->tx_desc_cnt) {
+		PMD_DRV_LOG(WARNING, "gve doesn't support nb_desc config, use hw nb_desc %u.",
+			    hw->tx_desc_cnt);
+	}
+	nb_desc = hw->tx_desc_cnt;
+
+	/* Free memory if needed. */
+	if (dev->data->tx_queues[queue_id]) {
+		gve_tx_queue_release(dev->data->tx_queues[queue_id]);
+		dev->data->tx_queues[queue_id] = NULL;
+	}
+
+	/* Allocate the TX queue data structure. */
+	txq = rte_zmalloc_socket("gve txq", sizeof(struct gve_tx_queue),
+				 RTE_CACHE_LINE_SIZE, socket_id);
+	if (!txq) {
+		PMD_DRV_LOG(ERR, "Failed to allocate memory for tx queue structure");
+		err = -ENOMEM;
+		goto err_txq;
+	}
+
+	free_thresh = conf->tx_free_thresh ? conf->tx_free_thresh : GVE_DEFAULT_TX_FREE_THRESH;
+	if (free_thresh >= nb_desc - 3) {
+		PMD_DRV_LOG(ERR, "tx_free_thresh (%u) must be less than nb_desc (%u) minus 3.",
+			    free_thresh, nb_desc);
+		err = -EINVAL;
+		goto err_txq;
+	}
+
+	txq->nb_tx_desc = nb_desc;
+	txq->free_thresh = free_thresh;
+	txq->queue_id = queue_id;
+	txq->port_id = dev->data->port_id;
+	txq->ntfy_id = queue_id;
+	txq->is_gqi_qpl = hw->queue_format == GVE_GQI_QPL_FORMAT;
+	txq->hw = hw;
+	txq->ntfy_addr = &hw->db_bar2[rte_be_to_cpu_32(hw->irq_dbs[txq->ntfy_id].id)];
+
+	/* Allocate software ring */
+	txq->sw_ring = rte_zmalloc_socket("gve tx sw ring",
+					  sizeof(struct rte_mbuf *) * nb_desc,
+					  RTE_CACHE_LINE_SIZE, socket_id);
+	if (!txq->sw_ring) {
+		PMD_DRV_LOG(ERR, "Failed to allocate memory for SW TX ring");
+		err = -ENOMEM;
+		goto err_txq;
+	}
+
+	mz = rte_eth_dma_zone_reserve(dev, "tx_ring", queue_id,
+				      nb_desc * sizeof(union gve_tx_desc),
+				      PAGE_SIZE, socket_id);
+	if (mz == NULL) {
+		PMD_DRV_LOG(ERR, "Failed to reserve DMA memory for TX");
+		err = -ENOMEM;
+		goto err_sw_ring;
+	}
+	txq->tx_desc_ring = (union gve_tx_desc *)mz->addr;
+	txq->tx_ring_phys_addr = mz->iova;
+	txq->mz = mz;
+
+	if (txq->is_gqi_qpl) {
+		txq->iov_ring = rte_zmalloc_socket("gve tx iov ring",
+						   sizeof(struct gve_tx_iovec) * nb_desc,
+						   RTE_CACHE_LINE_SIZE, socket_id);
+		if (!txq->iov_ring) {
+			PMD_DRV_LOG(ERR, "Failed to allocate memory for SW TX ring");
+			err = -ENOMEM;
+			goto err_tx_ring;
+		}
+		txq->qpl = &hw->qpl[queue_id];
+		err = gve_adminq_register_page_list(hw, txq->qpl);
+		if (err != 0) {
+			PMD_DRV_LOG(ERR, "Failed to register qpl %u", queue_id);
+			goto err_iov_ring;
+		}
+	}
+
+	mz = rte_eth_dma_zone_reserve(dev, "txq_res", queue_id, sizeof(struct gve_queue_resources),
+				      PAGE_SIZE, socket_id);
+	if (mz == NULL) {
+		PMD_DRV_LOG(ERR, "Failed to reserve DMA memory for TX resource");
+		err = -ENOMEM;
+		goto err_iov_ring;
+	}
+	txq->qres = (struct gve_queue_resources *)mz->addr;
+	txq->qres_mz = mz;
+
+	gve_reset_txq(txq);
+
+	dev->data->tx_queues[queue_id] = txq;
+
+	return 0;
+
+err_iov_ring:
+	if (txq->is_gqi_qpl)
+		rte_free(txq->iov_ring);
+err_tx_ring:
+	rte_memzone_free(txq->mz);
+err_sw_ring:
+	rte_free(txq->sw_ring);
+err_txq:
+	rte_free(txq);
+	return err;
+}
+
+void
+gve_stop_tx_queues(struct rte_eth_dev *dev)
+{
+	struct gve_priv *hw = dev->data->dev_private;
+	struct gve_tx_queue *txq;
+	uint16_t i;
+	int err;
+
+	err = gve_adminq_destroy_tx_queues(hw, dev->data->nb_tx_queues);
+	if (err != 0)
+		PMD_DRV_LOG(WARNING, "failed to destroy txqs");
+
+	for (i = 0; i < dev->data->nb_tx_queues; i++) {
+		txq = dev->data->tx_queues[i];
+		gve_release_txq_mbufs(txq);
+		gve_reset_txq(txq);
+	}
+}
diff --git a/drivers/net/gve/meson.build b/drivers/net/gve/meson.build
index d8ec64b3a3..af0010c01c 100644
--- a/drivers/net/gve/meson.build
+++ b/drivers/net/gve/meson.build
@@ -9,6 +9,8 @@ endif
 
 sources = files(
         'base/gve_adminq.c',
+        'gve_rx.c',
+        'gve_tx.c',
         'gve_ethdev.c',
 )
 includes += include_directories('base')
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 192+ messages in thread

* [PATCH v5 8/8] net/gve: add support for Rx/Tx
  2022-10-10 10:17               ` [PATCH v5 0/8] introduce GVE PMD Junfeng Guo
                                   ` (6 preceding siblings ...)
  2022-10-10 10:17                 ` [PATCH v5 7/8] net/gve: add support for queue operations Junfeng Guo
@ 2022-10-10 10:17                 ` Junfeng Guo
  2022-10-19 13:47                   ` Ferruh Yigit
  7 siblings, 1 reply; 192+ messages in thread
From: Junfeng Guo @ 2022-10-10 10:17 UTC (permalink / raw)
  To: qi.z.zhang, jingjing.wu
  Cc: ferruh.yigit, dev, xiaoyun.li, awogbemila, bruce.richardson,
	xueqin.lin, junfeng.guo

Add Rx/Tx of GQI_QPL queue format and GQI_RDA queue format.

Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com>
Signed-off-by: Junfeng Guo <junfeng.guo@intel.com>
---
 doc/guides/nics/features/gve.ini |   2 +
 doc/guides/nics/gve.rst          |   4 +
 drivers/net/gve/gve_ethdev.c     |   7 +
 drivers/net/gve/gve_ethdev.h     |  18 ++
 drivers/net/gve/gve_rx.c         | 140 ++++++++++
 drivers/net/gve/gve_tx.c         | 455 +++++++++++++++++++++++++++++++
 6 files changed, 626 insertions(+)

diff --git a/doc/guides/nics/features/gve.ini b/doc/guides/nics/features/gve.ini
index 986df7f94a..cdc46b08a3 100644
--- a/doc/guides/nics/features/gve.ini
+++ b/doc/guides/nics/features/gve.ini
@@ -7,7 +7,9 @@
 Speed capabilities   = Y
 Link status          = Y
 MTU update           = Y
+TSO                  = Y
 RSS hash             = Y
+L4 checksum offload  = Y
 Linux                = Y
 x86-32               = Y
 x86-64               = Y
diff --git a/doc/guides/nics/gve.rst b/doc/guides/nics/gve.rst
index d76f9bd5b9..f675679625 100644
--- a/doc/guides/nics/gve.rst
+++ b/doc/guides/nics/gve.rst
@@ -57,8 +57,12 @@ In this release, the GVE PMD provides the basic functionality of packet
 reception and transmission.
 Supported features of the GVE PMD are:
 
+- Multiple queues for TX and RX
 - Receiver Side Scaling (RSS)
+- TSO offload
 - Link state information
+- TX multi-segments (Scatter TX)
+- Tx UDP/TCP/SCTP Checksum
 
 Currently, only GQI_QPL and GQI_RDA queue format are supported in PMD.
 Jumbo Frame is not supported in PMD for now. It'll be added in the future
diff --git a/drivers/net/gve/gve_ethdev.c b/drivers/net/gve/gve_ethdev.c
index 5c568268fa..47ffb3afff 100644
--- a/drivers/net/gve/gve_ethdev.c
+++ b/drivers/net/gve/gve_ethdev.c
@@ -644,6 +644,13 @@ gve_dev_init(struct rte_eth_dev *eth_dev)
 	if (err)
 		return err;
 
+	if (gve_is_gqi(priv)) {
+		eth_dev->rx_pkt_burst = gve_rx_burst;
+		eth_dev->tx_pkt_burst = gve_tx_burst;
+	} else {
+		PMD_DRV_LOG(ERR, "DQO_RDA is not implemented and will be added in the future");
+	}
+
 	eth_dev->data->mac_addrs = rte_zmalloc("gve_mac", sizeof(struct rte_ether_addr), 0);
 	if (!eth_dev->data->mac_addrs) {
 		PMD_DRV_LOG(ERR, "Failed to allocate memory to store mac address");
diff --git a/drivers/net/gve/gve_ethdev.h b/drivers/net/gve/gve_ethdev.h
index 20fe57781e..266b831a01 100644
--- a/drivers/net/gve/gve_ethdev.h
+++ b/drivers/net/gve/gve_ethdev.h
@@ -34,6 +34,18 @@ union gve_tx_desc {
 	struct gve_tx_seg_desc seg; /* subsequent descs for a packet */
 };
 
+/* Offload features */
+union gve_tx_offload {
+	uint64_t data;
+	struct {
+		uint64_t l2_len:7; /* L2 (MAC) Header Length. */
+		uint64_t l3_len:9; /* L3 (IP) Header Length. */
+		uint64_t l4_len:8; /* L4 Header Length. */
+		uint64_t tso_segsz:16; /* TCP TSO segment size */
+		/* uint64_t unused : 24; */
+	};
+};
+
 struct gve_tx_iovec {
 	uint32_t iov_base; /* offset in fifo */
 	uint32_t iov_len;
@@ -274,4 +286,10 @@ gve_stop_tx_queues(struct rte_eth_dev *dev);
 void
 gve_stop_rx_queues(struct rte_eth_dev *dev);
 
+uint16_t
+gve_rx_burst(void *rxq, struct rte_mbuf **rx_pkts, uint16_t nb_pkts);
+
+uint16_t
+gve_tx_burst(void *txq, struct rte_mbuf **tx_pkts, uint16_t nb_pkts);
+
 #endif /* _GVE_ETHDEV_H_ */
diff --git a/drivers/net/gve/gve_rx.c b/drivers/net/gve/gve_rx.c
index e64a461253..d83c0eccbc 100644
--- a/drivers/net/gve/gve_rx.c
+++ b/drivers/net/gve/gve_rx.c
@@ -5,6 +5,146 @@
 #include "gve_ethdev.h"
 #include "base/gve_adminq.h"
 
+static inline void
+gve_rx_refill(struct gve_rx_queue *rxq)
+{
+	uint16_t mask = rxq->nb_rx_desc - 1;
+	uint16_t idx = rxq->next_avail & mask;
+	uint32_t next_avail = rxq->next_avail;
+	uint16_t nb_alloc, i;
+	struct rte_mbuf *nmb;
+	int diag;
+
+	/* wrap around */
+	nb_alloc = rxq->nb_rx_desc - idx;
+	if (nb_alloc <= rxq->nb_avail) {
+		diag = rte_pktmbuf_alloc_bulk(rxq->mpool, &rxq->sw_ring[idx], nb_alloc);
+		if (diag < 0) {
+			for (i = 0; i < nb_alloc; i++) {
+				nmb = rte_pktmbuf_alloc(rxq->mpool);
+				if (!nmb)
+					break;
+				rxq->sw_ring[idx + i] = nmb;
+			}
+			if (i != nb_alloc)
+				nb_alloc = i;
+		}
+		rxq->nb_avail -= nb_alloc;
+		next_avail += nb_alloc;
+
+		/* queue page list mode doesn't need real refill. */
+		if (rxq->is_gqi_qpl) {
+			idx += nb_alloc;
+		} else {
+			for (i = 0; i < nb_alloc; i++) {
+				nmb = rxq->sw_ring[idx];
+				rxq->rx_data_ring[idx].addr =
+					rte_cpu_to_be_64(rte_mbuf_data_iova(nmb));
+				idx++;
+			}
+		}
+		if (idx == rxq->nb_rx_desc)
+			idx = 0;
+	}
+
+	if (rxq->nb_avail > 0) {
+		nb_alloc = rxq->nb_avail;
+		if (rxq->nb_rx_desc < idx + rxq->nb_avail)
+			nb_alloc = rxq->nb_rx_desc - idx;
+		diag = rte_pktmbuf_alloc_bulk(rxq->mpool, &rxq->sw_ring[idx], nb_alloc);
+		if (diag < 0) {
+			for (i = 0; i < nb_alloc; i++) {
+				nmb = rte_pktmbuf_alloc(rxq->mpool);
+				if (!nmb)
+					break;
+				rxq->sw_ring[idx + i] = nmb;
+			}
+			nb_alloc = i;
+		}
+		rxq->nb_avail -= nb_alloc;
+		next_avail += nb_alloc;
+
+		if (!rxq->is_gqi_qpl) {
+			for (i = 0; i < nb_alloc; i++) {
+				nmb = rxq->sw_ring[idx];
+				rxq->rx_data_ring[idx].addr =
+					rte_cpu_to_be_64(rte_mbuf_data_iova(nmb));
+				idx++;
+			}
+		}
+	}
+
+	if (next_avail != rxq->next_avail) {
+		rte_write32(rte_cpu_to_be_32(next_avail), rxq->qrx_tail);
+		rxq->next_avail = next_avail;
+	}
+}
+
+uint16_t
+gve_rx_burst(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t nb_pkts)
+{
+	volatile struct gve_rx_desc *rxr, *rxd;
+	struct gve_rx_queue *rxq = rx_queue;
+	uint16_t rx_id = rxq->rx_tail;
+	struct rte_mbuf *rxe;
+	uint16_t nb_rx, len;
+	uint64_t addr;
+
+	rxr = rxq->rx_desc_ring;
+
+	for (nb_rx = 0; nb_rx < nb_pkts; nb_rx++) {
+		rxd = &rxr[rx_id];
+		if (GVE_SEQNO(rxd->flags_seq) != rxq->expected_seqno)
+			break;
+
+		if (rxd->flags_seq & GVE_RXF_ERR)
+			continue;
+
+		len = rte_be_to_cpu_16(rxd->len) - GVE_RX_PAD;
+		rxe = rxq->sw_ring[rx_id];
+		rxe->data_off = RTE_PKTMBUF_HEADROOM;
+		if (rxq->is_gqi_qpl) {
+			addr = (uint64_t)(rxq->qpl->mz->addr) + rx_id * PAGE_SIZE + GVE_RX_PAD;
+			rte_memcpy((void *)((size_t)rxe->buf_addr + rxe->data_off),
+				   (void *)(size_t)addr, len);
+		}
+		rxe->pkt_len = len;
+		rxe->data_len = len;
+		rxe->port = rxq->port_id;
+		rxe->ol_flags = 0;
+
+		if (rxd->flags_seq & GVE_RXF_TCP)
+			rxe->packet_type |= RTE_PTYPE_L4_TCP;
+		if (rxd->flags_seq & GVE_RXF_UDP)
+			rxe->packet_type |= RTE_PTYPE_L4_UDP;
+		if (rxd->flags_seq & GVE_RXF_IPV4)
+			rxe->packet_type |= RTE_PTYPE_L3_IPV4;
+		if (rxd->flags_seq & GVE_RXF_IPV6)
+			rxe->packet_type |= RTE_PTYPE_L3_IPV6;
+
+		if (gve_needs_rss(rxd->flags_seq)) {
+			rxe->ol_flags |= RTE_MBUF_F_RX_RSS_HASH;
+			rxe->hash.rss = rte_be_to_cpu_32(rxd->rss_hash);
+		}
+
+		rxq->expected_seqno = gve_next_seqno(rxq->expected_seqno);
+
+		rx_id++;
+		if (rx_id == rxq->nb_rx_desc)
+			rx_id = 0;
+
+		rx_pkts[nb_rx] = rxe;
+	}
+
+	rxq->nb_avail += nb_rx;
+	rxq->rx_tail = rx_id;
+
+	if (rxq->nb_avail > rxq->free_thresh)
+		gve_rx_refill(rxq);
+
+	return nb_rx;
+}
+
 static inline void
 gve_reset_rxq(struct gve_rx_queue *rxq)
 {
diff --git a/drivers/net/gve/gve_tx.c b/drivers/net/gve/gve_tx.c
index b706b62e71..d94b1186a4 100644
--- a/drivers/net/gve/gve_tx.c
+++ b/drivers/net/gve/gve_tx.c
@@ -5,6 +5,461 @@
 #include "gve_ethdev.h"
 #include "base/gve_adminq.h"
 
+static inline void
+gve_free_bulk_mbuf(struct rte_mbuf **txep, int num)
+{
+	struct rte_mbuf *m, *free[GVE_TX_MAX_FREE_SZ];
+	int nb_free = 0;
+	int i, s;
+
+	if (unlikely(num == 0))
+		return;
+
+	/* Find the 1st mbuf which needs to be free */
+	for (s = 0; s < num; s++) {
+		if (txep[s] != NULL) {
+			m = rte_pktmbuf_prefree_seg(txep[s]);
+			if (m != NULL)
+				break;
+			}
+	}
+
+	if (s == num)
+		return;
+
+	free[0] = m;
+	nb_free = 1;
+	for (i = s + 1; i < num; i++) {
+		if (likely(txep[i] != NULL)) {
+			m = rte_pktmbuf_prefree_seg(txep[i]);
+			if (likely(m != NULL)) {
+				if (likely(m->pool == free[0]->pool)) {
+					free[nb_free++] = m;
+				} else {
+					rte_mempool_put_bulk(free[0]->pool, (void *)free, nb_free);
+					free[0] = m;
+					nb_free = 1;
+				}
+			}
+			txep[i] = NULL;
+		}
+	}
+	rte_mempool_put_bulk(free[0]->pool, (void **)free, nb_free);
+}
+
+static inline void
+gve_tx_clean(struct gve_tx_queue *txq)
+{
+	uint16_t mask = txq->nb_tx_desc - 1;
+	uint32_t start = txq->next_to_clean & mask;
+	uint32_t ntc, nb_clean, i;
+	struct gve_tx_iovec *iov;
+
+	ntc = rte_be_to_cpu_32(rte_read32(txq->qtx_head));
+	ntc = ntc & mask;
+
+	if (ntc == start)
+		return;
+
+	/* if wrap around, free twice. */
+	if (ntc < start) {
+		nb_clean = txq->nb_tx_desc - start;
+		if (nb_clean > GVE_TX_MAX_FREE_SZ)
+			nb_clean = GVE_TX_MAX_FREE_SZ;
+		if (txq->is_gqi_qpl) {
+			for (i = start; i < start + nb_clean; i++) {
+				iov = &txq->iov_ring[i];
+				txq->fifo_avail += iov->iov_len;
+				iov->iov_base = 0;
+				iov->iov_len = 0;
+			}
+		} else {
+			gve_free_bulk_mbuf(&txq->sw_ring[start], nb_clean);
+		}
+		txq->nb_free += nb_clean;
+		start += nb_clean;
+		if (start == txq->nb_tx_desc)
+			start = 0;
+		txq->next_to_clean += nb_clean;
+	}
+
+	if (ntc > start) {
+		nb_clean = ntc - start;
+		if (nb_clean > GVE_TX_MAX_FREE_SZ)
+			nb_clean = GVE_TX_MAX_FREE_SZ;
+		if (txq->is_gqi_qpl) {
+			for (i = start; i < start + nb_clean; i++) {
+				iov = &txq->iov_ring[i];
+				txq->fifo_avail += iov->iov_len;
+				iov->iov_base = 0;
+				iov->iov_len = 0;
+			}
+		} else {
+			gve_free_bulk_mbuf(&txq->sw_ring[start], nb_clean);
+		}
+		txq->nb_free += nb_clean;
+		txq->next_to_clean += nb_clean;
+	}
+}
+
+static inline void
+gve_tx_clean_swr_qpl(struct gve_tx_queue *txq)
+{
+	uint32_t start = txq->sw_ntc;
+	uint32_t ntc, nb_clean;
+
+	ntc = txq->sw_tail;
+
+	if (ntc == start)
+		return;
+
+	/* if wrap around, free twice. */
+	if (ntc < start) {
+		nb_clean = txq->nb_tx_desc - start;
+		if (nb_clean > GVE_TX_MAX_FREE_SZ)
+			nb_clean = GVE_TX_MAX_FREE_SZ;
+		gve_free_bulk_mbuf(&txq->sw_ring[start], nb_clean);
+
+		txq->sw_nb_free += nb_clean;
+		start += nb_clean;
+		if (start == txq->nb_tx_desc)
+			start = 0;
+		txq->sw_ntc = start;
+	}
+
+	if (ntc > start) {
+		nb_clean = ntc - start;
+		if (nb_clean > GVE_TX_MAX_FREE_SZ)
+			nb_clean = GVE_TX_MAX_FREE_SZ;
+		gve_free_bulk_mbuf(&txq->sw_ring[start], nb_clean);
+		txq->sw_nb_free += nb_clean;
+		start += nb_clean;
+		txq->sw_ntc = start;
+	}
+}
+
+static inline void
+gve_tx_fill_pkt_desc(volatile union gve_tx_desc *desc, struct rte_mbuf *mbuf,
+		     uint8_t desc_cnt, uint16_t len, uint64_t addr)
+{
+	uint64_t csum_l4 = mbuf->ol_flags & RTE_MBUF_F_TX_L4_MASK;
+	uint8_t l4_csum_offset = 0;
+	uint8_t l4_hdr_offset = 0;
+
+	if (mbuf->ol_flags & RTE_MBUF_F_TX_TCP_SEG)
+		csum_l4 |= RTE_MBUF_F_TX_TCP_CKSUM;
+
+	switch (csum_l4) {
+	case RTE_MBUF_F_TX_TCP_CKSUM:
+		l4_csum_offset = offsetof(struct rte_tcp_hdr, cksum);
+		l4_hdr_offset = mbuf->l2_len + mbuf->l3_len;
+		break;
+	case RTE_MBUF_F_TX_UDP_CKSUM:
+		l4_csum_offset = offsetof(struct rte_udp_hdr, dgram_cksum);
+		l4_hdr_offset = mbuf->l2_len + mbuf->l3_len;
+		break;
+	case RTE_MBUF_F_TX_SCTP_CKSUM:
+		l4_csum_offset = offsetof(struct rte_sctp_hdr, cksum);
+		l4_hdr_offset = mbuf->l2_len + mbuf->l3_len;
+		break;
+	}
+
+	if (mbuf->ol_flags & RTE_MBUF_F_TX_TCP_SEG) {
+		desc->pkt.type_flags = GVE_TXD_TSO | GVE_TXF_L4CSUM;
+		desc->pkt.l4_csum_offset = l4_csum_offset >> 1;
+		desc->pkt.l4_hdr_offset = l4_hdr_offset >> 1;
+	} else if (mbuf->ol_flags & RTE_MBUF_F_TX_L4_MASK) {
+		desc->pkt.type_flags = GVE_TXD_STD | GVE_TXF_L4CSUM;
+		desc->pkt.l4_csum_offset = l4_csum_offset >> 1;
+		desc->pkt.l4_hdr_offset = l4_hdr_offset >> 1;
+	} else {
+		desc->pkt.type_flags = GVE_TXD_STD;
+		desc->pkt.l4_csum_offset = 0;
+		desc->pkt.l4_hdr_offset = 0;
+	}
+	desc->pkt.desc_cnt = desc_cnt;
+	desc->pkt.len = rte_cpu_to_be_16(mbuf->pkt_len);
+	desc->pkt.seg_len = rte_cpu_to_be_16(len);
+	desc->pkt.seg_addr = rte_cpu_to_be_64(addr);
+}
+
+static inline void
+gve_tx_fill_seg_desc(volatile union gve_tx_desc *desc, uint64_t ol_flags,
+		      union gve_tx_offload tx_offload,
+		      uint16_t len, uint64_t addr)
+{
+	desc->seg.type_flags = GVE_TXD_SEG;
+	if (ol_flags & RTE_MBUF_F_TX_TCP_SEG) {
+		if (ol_flags & RTE_MBUF_F_TX_IPV6)
+			desc->seg.type_flags |= GVE_TXSF_IPV6;
+		desc->seg.l3_offset = tx_offload.l2_len >> 1;
+		desc->seg.mss = rte_cpu_to_be_16(tx_offload.tso_segsz);
+	}
+	desc->seg.seg_len = rte_cpu_to_be_16(len);
+	desc->seg.seg_addr = rte_cpu_to_be_64(addr);
+}
+
+static inline bool
+is_fifo_avail(struct gve_tx_queue *txq, uint16_t len)
+{
+	if (txq->fifo_avail < len)
+		return false;
+	/* Don't split segment. */
+	if (txq->fifo_head + len > txq->fifo_size &&
+	    txq->fifo_size - txq->fifo_head + len > txq->fifo_avail)
+		return false;
+	return true;
+}
+static inline uint64_t
+gve_tx_alloc_from_fifo(struct gve_tx_queue *txq, uint16_t tx_id, uint16_t len)
+{
+	uint32_t head = txq->fifo_head;
+	uint32_t size = txq->fifo_size;
+	struct gve_tx_iovec *iov;
+	uint32_t aligned_head;
+	uint32_t iov_len = 0;
+	uint64_t fifo_addr;
+
+	iov = &txq->iov_ring[tx_id];
+
+	/* Don't split segment */
+	if (head + len > size) {
+		iov_len += (size - head);
+		head = 0;
+	}
+
+	fifo_addr = head;
+	iov_len += len;
+	iov->iov_base = head;
+
+	/* Re-align to a cacheline for next head */
+	head += len;
+	aligned_head = RTE_ALIGN(head, RTE_CACHE_LINE_SIZE);
+	iov_len += (aligned_head - head);
+	iov->iov_len = iov_len;
+
+	if (aligned_head == txq->fifo_size)
+		aligned_head = 0;
+	txq->fifo_head = aligned_head;
+	txq->fifo_avail -= iov_len;
+
+	return fifo_addr;
+}
+
+static inline uint16_t
+gve_tx_burst_qpl(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
+{
+	union gve_tx_offload tx_offload = {0};
+	volatile union gve_tx_desc *txr, *txd;
+	struct gve_tx_queue *txq = tx_queue;
+	struct rte_mbuf **sw_ring = txq->sw_ring;
+	uint16_t mask = txq->nb_tx_desc - 1;
+	uint16_t tx_id = txq->tx_tail & mask;
+	uint64_t ol_flags, addr, fifo_addr;
+	uint32_t tx_tail = txq->tx_tail;
+	struct rte_mbuf *tx_pkt, *first;
+	uint16_t sw_id = txq->sw_tail;
+	uint16_t nb_used, i;
+	uint16_t nb_tx = 0;
+	uint32_t hlen;
+
+	txr = txq->tx_desc_ring;
+
+	if (txq->nb_free < txq->free_thresh || txq->fifo_avail == 0)
+		gve_tx_clean(txq);
+
+	if (txq->sw_nb_free < txq->free_thresh)
+		gve_tx_clean_swr_qpl(txq);
+
+	for (nb_tx = 0; nb_tx < nb_pkts; nb_tx++) {
+		tx_pkt = *tx_pkts++;
+		ol_flags = tx_pkt->ol_flags;
+
+		if (txq->sw_nb_free < tx_pkt->nb_segs) {
+			gve_tx_clean_swr_qpl(txq);
+			if (txq->sw_nb_free < tx_pkt->nb_segs)
+				goto end_of_tx;
+		}
+
+		/* Even for multi-segs, use 1 qpl buf for data */
+		nb_used = 1;
+		if (ol_flags & RTE_MBUF_F_TX_TCP_SEG)
+			nb_used++;
+
+		if (txq->nb_free < nb_used)
+			goto end_of_tx;
+
+		tx_offload.l2_len = tx_pkt->l2_len;
+		tx_offload.l3_len = tx_pkt->l3_len;
+		tx_offload.l4_len = tx_pkt->l4_len;
+		tx_offload.tso_segsz = tx_pkt->tso_segsz;
+
+		first = tx_pkt;
+		txd = &txr[tx_id];
+		hlen = ol_flags & RTE_MBUF_F_TX_TCP_SEG ?
+			(uint32_t)(tx_offload.l2_len + tx_offload.l3_len + tx_offload.l4_len) :
+			tx_pkt->pkt_len;
+
+		sw_ring[sw_id] = tx_pkt;
+		if (!is_fifo_avail(txq, hlen)) {
+			gve_tx_clean(txq);
+			if (!is_fifo_avail(txq, hlen))
+				goto end_of_tx;
+		}
+		addr = (uint64_t)(tx_pkt->buf_addr) + tx_pkt->data_off;
+		fifo_addr = gve_tx_alloc_from_fifo(txq, tx_id, hlen);
+
+		/* For TSO, check if there's enough fifo space for data first */
+		if (ol_flags & RTE_MBUF_F_TX_TCP_SEG) {
+			if (!is_fifo_avail(txq, tx_pkt->pkt_len - hlen)) {
+				gve_tx_clean(txq);
+				if (!is_fifo_avail(txq, tx_pkt->pkt_len - hlen))
+					goto end_of_tx;
+			}
+		}
+		if (tx_pkt->nb_segs == 1 || ol_flags & RTE_MBUF_F_TX_TCP_SEG)
+			rte_memcpy((void *)(size_t)(fifo_addr + txq->fifo_base),
+				   (void *)(size_t)addr, hlen);
+		else
+			rte_pktmbuf_read(tx_pkt, 0, hlen,
+					 (void *)(size_t)(fifo_addr + txq->fifo_base));
+		gve_tx_fill_pkt_desc(txd, tx_pkt, nb_used, hlen, fifo_addr);
+
+		if (ol_flags & RTE_MBUF_F_TX_TCP_SEG) {
+			tx_id = (tx_id + 1) & mask;
+			txd = &txr[tx_id];
+			addr = (uint64_t)(tx_pkt->buf_addr) + tx_pkt->data_off + hlen;
+			fifo_addr = gve_tx_alloc_from_fifo(txq, tx_id, tx_pkt->pkt_len - hlen);
+			if (tx_pkt->nb_segs == 1)
+				rte_memcpy((void *)(size_t)(fifo_addr + txq->fifo_base),
+					   (void *)(size_t)addr,
+					   tx_pkt->pkt_len - hlen);
+			else
+				rte_pktmbuf_read(tx_pkt, hlen, tx_pkt->pkt_len - hlen,
+						 (void *)(size_t)(fifo_addr + txq->fifo_base));
+
+			gve_tx_fill_seg_desc(txd, ol_flags, tx_offload,
+					     tx_pkt->pkt_len - hlen, fifo_addr);
+		}
+
+		/* record mbuf in sw_ring for free */
+		for (i = 1; i < first->nb_segs; i++) {
+			sw_id = (sw_id + 1) & mask;
+			tx_pkt = tx_pkt->next;
+			sw_ring[sw_id] = tx_pkt;
+		}
+
+		sw_id = (sw_id + 1) & mask;
+		tx_id = (tx_id + 1) & mask;
+
+		txq->nb_free -= nb_used;
+		txq->sw_nb_free -= first->nb_segs;
+		tx_tail += nb_used;
+	}
+
+end_of_tx:
+	if (nb_tx) {
+		rte_write32(rte_cpu_to_be_32(tx_tail), txq->qtx_tail);
+		txq->tx_tail = tx_tail;
+		txq->sw_tail = sw_id;
+	}
+
+	return nb_tx;
+}
+
+static inline uint16_t
+gve_tx_burst_ra(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
+{
+	union gve_tx_offload tx_offload = {0};
+	volatile union gve_tx_desc *txr, *txd;
+	struct gve_tx_queue *txq = tx_queue;
+	struct rte_mbuf **sw_ring = txq->sw_ring;
+	uint16_t mask = txq->nb_tx_desc - 1;
+	uint16_t tx_id = txq->tx_tail & mask;
+	uint32_t tx_tail = txq->tx_tail;
+	struct rte_mbuf *tx_pkt, *first;
+	uint16_t nb_used, hlen, i;
+	uint64_t ol_flags, addr;
+	uint16_t nb_tx = 0;
+
+	txr = txq->tx_desc_ring;
+
+	if (txq->nb_free < txq->free_thresh)
+		gve_tx_clean(txq);
+
+	for (nb_tx = 0; nb_tx < nb_pkts; nb_tx++) {
+		tx_pkt = *tx_pkts++;
+		ol_flags = tx_pkt->ol_flags;
+
+		nb_used = tx_pkt->nb_segs;
+		if (ol_flags & RTE_MBUF_F_TX_TCP_SEG)
+			nb_used++;
+
+		if (txq->nb_free < nb_used)
+			goto end_of_tx;
+
+		tx_offload.l2_len = tx_pkt->l2_len;
+		tx_offload.l3_len = tx_pkt->l3_len;
+		tx_offload.l4_len = tx_pkt->l4_len;
+		tx_offload.tso_segsz = tx_pkt->tso_segsz;
+
+		first = tx_pkt;
+		txd = &txr[tx_id];
+
+		hlen = ol_flags & RTE_MBUF_F_TX_TCP_SEG ?
+			(uint32_t)(tx_offload.l2_len + tx_offload.l3_len + tx_offload.l4_len) :
+			tx_pkt->pkt_len;
+		/*
+		 * if tso, the driver needs to fill 2 descs for 1 mbuf
+		 * so only put this mbuf into the 1st tx entry in sw ring
+		 */
+		sw_ring[tx_id] = tx_pkt;
+		addr = rte_mbuf_data_iova(tx_pkt);
+		gve_tx_fill_pkt_desc(txd, tx_pkt, nb_used, hlen, addr);
+
+		if (ol_flags & RTE_MBUF_F_TX_TCP_SEG) {
+			tx_id = (tx_id + 1) & mask;
+			txd = &txr[tx_id];
+			addr = rte_mbuf_data_iova(tx_pkt) + hlen;
+			gve_tx_fill_seg_desc(txd, ol_flags, tx_offload,
+					     tx_pkt->data_len - hlen, addr);
+		}
+
+		for (i = 1; i < first->nb_segs; i++) {
+			tx_id = (tx_id + 1) & mask;
+			txd = &txr[tx_id];
+			tx_pkt = tx_pkt->next;
+			sw_ring[tx_id] = tx_pkt;
+			addr = rte_mbuf_data_iova(tx_pkt);
+			gve_tx_fill_seg_desc(txd, ol_flags, tx_offload,
+					     tx_pkt->data_len, addr);
+		}
+		tx_id = (tx_id + 1) & mask;
+
+		txq->nb_free -= nb_used;
+		tx_tail += nb_used;
+	}
+
+end_of_tx:
+	if (nb_tx) {
+		rte_write32(rte_cpu_to_be_32(tx_tail), txq->qtx_tail);
+		txq->tx_tail = tx_tail;
+	}
+
+	return nb_tx;
+}
+
+uint16_t
+gve_tx_burst(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
+{
+	struct gve_tx_queue *txq = tx_queue;
+
+	if (txq->is_gqi_qpl)
+		return gve_tx_burst_qpl(tx_queue, tx_pkts, nb_pkts);
+
+	return gve_tx_burst_ra(tx_queue, tx_pkts, nb_pkts);
+}
+
 static inline void
 gve_reset_txq(struct gve_tx_queue *txq)
 {
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 192+ messages in thread

* RE: [PATCH v4 7/9] net/gve: add support for Rx/Tx
  2022-10-10  9:39                   ` Li, Xiaoyun
@ 2022-10-10 10:18                     ` Guo, Junfeng
  0 siblings, 0 replies; 192+ messages in thread
From: Guo, Junfeng @ 2022-10-10 10:18 UTC (permalink / raw)
  To: Li, Xiaoyun, Ferruh Yigit, Zhang, Qi Z, Wu, Jingjing
  Cc: ferruh.yigit, dev, awogbemila, Richardson,  Bruce, Lin, Xueqin

Thanks Xiaoyun for helping explain, it helps a lot!

> -----Original Message-----
> From: Li, Xiaoyun <xiaoyun.li@intel.com>
> Sent: Monday, October 10, 2022 17:40
> To: Guo, Junfeng <junfeng.guo@intel.com>; Ferruh Yigit
> <ferruh.yigit@amd.com>; Zhang, Qi Z <qi.z.zhang@intel.com>; Wu,
> Jingjing <jingjing.wu@intel.com>
> Cc: ferruh.yigit@xilinx.com; dev@dpdk.org; awogbemila@google.com;
> Richardson, Bruce <bruce.richardson@intel.com>; Lin, Xueqin
> <xueqin.lin@intel.com>
> Subject: RE: [PATCH v4 7/9] net/gve: add support for Rx/Tx
> 
> Hi
> 
> > -----Original Message-----
> > From: Guo, Junfeng <junfeng.guo@intel.com>
> > Sent: Sunday, October 9, 2022 10:15
> > To: Ferruh Yigit <ferruh.yigit@amd.com>; Zhang, Qi Z
> > <qi.z.zhang@intel.com>; Wu, Jingjing <jingjing.wu@intel.com>
> > Cc: ferruh.yigit@xilinx.com; dev@dpdk.org; Li, Xiaoyun
> > <xiaoyun.li@intel.com>; awogbemila@google.com; Richardson, Bruce
> > <bruce.richardson@intel.com>; Lin, Xueqin <xueqin.lin@intel.com>
> > Subject: RE: [PATCH v4 7/9] net/gve: add support for Rx/Tx
> >
> >
> >
> > > -----Original Message-----
> > > From: Ferruh Yigit <ferruh.yigit@amd.com>
> > > Sent: Thursday, October 6, 2022 22:25
> > > To: Guo, Junfeng <junfeng.guo@intel.com>; Zhang, Qi Z
> > > <qi.z.zhang@intel.com>; Wu, Jingjing <jingjing.wu@intel.com>
> > > Cc: ferruh.yigit@xilinx.com; dev@dpdk.org; Li, Xiaoyun
> > > <xiaoyun.li@intel.com>; awogbemila@google.com; Richardson, Bruce
> > > <bruce.richardson@intel.com>; Lin, Xueqin <xueqin.lin@intel.com>
> > > Subject: Re: [PATCH v4 7/9] net/gve: add support for Rx/Tx
> > >
> > > On 9/27/2022 8:32 AM, Junfeng Guo wrote:
> > >
> > > >
> > > > Add Rx/Tx of GQI_QPL queue format and GQI_RDA queue format.
> > > >
> > > > Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com>
> > > > Signed-off-by: Junfeng Guo <junfeng.guo@intel.com>
> > >
> > > <...>
> > >
> > > > --- a/drivers/net/gve/gve_ethdev.c
> > > > +++ b/drivers/net/gve/gve_ethdev.c
> > > > @@ -583,6 +583,11 @@ gve_dev_init(struct rte_eth_dev *eth_dev)
> > > >          if (err)
> > > >                  return err;
> > > >
> > > > +       if (gve_is_gqi(priv)) {
> > > > +               eth_dev->rx_pkt_burst = gve_rx_burst;
> > > > +               eth_dev->tx_pkt_burst = gve_tx_burst;
> > > > +       }
> > > > +
> > >
> > > What do you think about adding a log here for the 'else' case, to
> > > inform the user why the datapath is not working?
> >
> > Agreed, makes sense!
> > Currently only one queue mode (i.e., qpl mode) is supported on the
> > GCP env.
> > Will add a log to inform this in the else case. Thanks!
> 
> This explanation is not correct. Only QPL mode is supported in GCP now.
> That is an env limitation and not related to the else code here.
> gve_is_gqi() includes two modes, GQI_QPL and GQI_RDA, and both of these
> datapaths are supported in rxtx.
> GQI means the queue model is a single-queue model (txq for tx and rxq
> for rx), and there are two ways for this queue model: QPL and RDA.
> QPL needs to copy packets from/to several reserved pages negotiated
> with the backend. RDA is just like a normal device and uses PA in descs.
> 
> The datapath that is not supported is DQO_RDA, which uses different
> hardware and therefore a different queue model (split/double queue
> model). Tx will use txq and tx_completion_q and Rx will use rxq and
> rx_completion_q.
> This is not implemented in the datapath for now and will be implemented
> in the future.
> 
> So if you want to add a comment here, please say "DQO_RDA is not
> implemented and will be added in the future". Don't say it's not
> available in the GCP env, since that is not the reason.

Okay, will add this in the coming version. Thanks!
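
For reference, a sketch of how that reads in gve_dev_init() (this matches
what the v5 8/8 patch in this thread does):

    if (gve_is_gqi(priv)) {
        eth_dev->rx_pkt_burst = gve_rx_burst;
        eth_dev->tx_pkt_burst = gve_tx_burst;
    } else {
        PMD_DRV_LOG(ERR, "DQO_RDA is not implemented and will be added in the future");
    }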

> 
> >
> > >
> > > <...>
> > >
> > > > +uint16_t
> > > > +gve_rx_burst(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t nb_pkts)
> > > > +{
> > > > +       volatile struct gve_rx_desc *rxr, *rxd;
> > > > +       struct gve_rx_queue *rxq = rx_queue;
> > > > +       uint16_t rx_id = rxq->rx_tail;
> > > > +       struct rte_mbuf *rxe;
> > > > +       uint16_t nb_rx, len;
> > > > +       uint64_t addr;
> > > > +
> > > > +       rxr = rxq->rx_desc_ring;
> > > > +
> > > > +       for (nb_rx = 0; nb_rx < nb_pkts; nb_rx++) {
> > > > +               rxd = &rxr[rx_id];
> > > > +               if (GVE_SEQNO(rxd->flags_seq) != rxq->expected_seqno)
> > > > +                       break;
> > > > +
> > > > +               if (rxd->flags_seq & GVE_RXF_ERR)
> > > > +                       continue;
> > > > +
> > > > +               len = rte_be_to_cpu_16(rxd->len) - GVE_RX_PAD;
> > > > +               rxe = rxq->sw_ring[rx_id];
> > > > +               rxe->data_off = RTE_PKTMBUF_HEADROOM;
> > > > +               if (rxq->is_gqi_qpl) {
> > > > +                       addr = (uint64_t)(rxq->qpl->mz->addr) + rx_id * PAGE_SIZE + GVE_RX_PAD;
> > > > +                       rte_memcpy((void *)((size_t)rxe->buf_addr + rxe->data_off),
> > > > +                                  (void *)(size_t)addr, len);
> > >
> > > Why a 'memcpy' is needed? Can't it DMA to mbuf data buffer?
> 
> When the queue model is gqi_qpl (this is negotiated with the backend
> using the adminq), the device needs to register a block of memory
> (called a page list). Tx needs to copy the packets into this memory
> and rx will get packets from this area.
> The backend is responsible for getting (tx)/giving (rx) packets from
> this memory to the device/line (we don't really know how the backend
> does this).
> Please refer to
> https://www.kernel.org/doc/html/v5.4/networking/device_drivers/google/gve.html.
> There's a bit more explanation about this queue format.
> 
> >
> > Well, only the qpl (queue page list) mode is supported on the GCP env now.
> > So the DMA may not be used in the current case.
> 
> And yes, it's because GCP doesn't support GQI_RDA for now, so GQI_QPL
> has to be implemented. But even if the GCP env supports RDA in the
> future, unless they completely remove QPL support, QPL is still needed,
> because the queue format/model is obtained from the backend through
> gve_adminq_describe_device(). You may just get the QPL version. The
> device can't really control which queue format to get.

Thanks for the explanation!

> 
> >
> > >
> > > > +               }
> > > > +               rxe->nb_segs = 1;
> > > > +               rxe->next = NULL;
> > > > +               rxe->pkt_len = len;
> > > > +               rxe->data_len = len;
> > > > +               rxe->port = rxq->port_id;
> > > > +               rxe->packet_type = 0;
> > > > +               rxe->ol_flags = 0;
> > > > +
> > >
> > > As far as I can see, 'sw_ring[]' is filled using the
> > > 'rte_pktmbuf_alloc_bulk()' API, which should reset mbuf fields to
> > > default values, so some of the assignments above can be redundant.
> >
> > Yes, some fields are already assigned at 'rte_pktmbuf_reset()'.
> > Will remove the redundant ones in the coming version. Thanks!
> >
> > >
> > > > +               if (rxd->flags_seq & GVE_RXF_TCP)
> > > > +                       rxe->packet_type |= RTE_PTYPE_L4_TCP;
> > > > +               if (rxd->flags_seq & GVE_RXF_UDP)
> > > > +                       rxe->packet_type |= RTE_PTYPE_L4_UDP;
> > > > +               if (rxd->flags_seq & GVE_RXF_IPV4)
> > > > +                       rxe->packet_type |= RTE_PTYPE_L3_IPV4;
> > > > +               if (rxd->flags_seq & GVE_RXF_IPV6)
> > > > +                       rxe->packet_type |= RTE_PTYPE_L3_IPV6;
> > > > +
> > >
> > > If you are setting packet_type, it is better to implement the
> > > 'dev_supported_ptypes_get()' dev_ops too, to announce to the host which
> > > packet type parsing is supported. (+ dev_ptypes_set() dev_ops) And later
> > > the driver can announce the "Packet type parsing" feature in the .ini file.
> >
> > Well, on current GCP env, the APIs for supported ptypes get/set have not
> > been exposed even in the base code. The only one in the base code is for
> > the dqo mode (gve_adminq_get_ptype_map_dqo). But this also cannot be
> > used on current GCP env. We can only implement this once they are
> > supported and exposed at GCP. Thanks!
> 
> You're mixing the concepts again. That the GCP env only supports QPL is
> not an excuse.
> The packet type is supported even in QPL. It's just very limited to
> L4_TCP/UDP and L3_IPV4/6. Ptypes_get is possible and it'll be
> RTE_PTYPE_L3_IPV4/6 and RTE_PTYPE_L4_UDP/TCP.
> For the DQO mode you mentioned, it'll be more flexible and have more
> support. I'm not sure what your plan is, but it can be implemented
> whenever based on your plan, not on GCP env availability. The base code
> is there. It's just that you may not be able to verify and debug it in
> a timely manner.
> 
> Ptype_set is not supported since the hardware doesn't support it (There's
> no such adminq).

Okay... not much bandwidth to implement this at this point.
Maybe next release, thanks!
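
If it helps whenever this gets picked up, here is a rough, untested sketch
of such a callback for the types the Rx path already reports. The function
name is only illustrative, and it assumes the usual dev_supported_ptypes_get
dev_ops signature with an RTE_PTYPE_UNKNOWN-terminated list:

    static const uint32_t *
    gve_dev_supported_ptypes_get(struct rte_eth_dev *dev __rte_unused)
    {
        /* Packet types the GQI Rx path can set today. */
        static const uint32_t ptypes[] = {
            RTE_PTYPE_L3_IPV4,
            RTE_PTYPE_L3_IPV6,
            RTE_PTYPE_L4_TCP,
            RTE_PTYPE_L4_UDP,
            RTE_PTYPE_UNKNOWN, /* list terminator */
        };

        return ptypes;
    }

plus a '.dev_supported_ptypes_get = gve_dev_supported_ptypes_get,' entry in
gve_eth_dev_ops.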

^ permalink raw reply	[flat|nested] 192+ messages in thread

* Re: [PATCH v5 1/8] net/gve/base: introduce GVE PMD base code
  2022-10-10 10:17                 ` [PATCH v5 1/8] net/gve/base: introduce GVE PMD base code Junfeng Guo
@ 2022-10-19 13:45                   ` Ferruh Yigit
  2022-10-19 15:13                     ` Hemant Agrawal
  2022-10-19 15:48                     ` Li, Xiaoyun
  2022-10-20 10:36                   ` [PATCH v6 0/8] introduce GVE PMD Junfeng Guo
  1 sibling, 2 replies; 192+ messages in thread
From: Ferruh Yigit @ 2022-10-19 13:45 UTC (permalink / raw)
  To: Junfeng Guo, qi.z.zhang, jingjing.wu, Hemant Agrawal, Stephen Hemminger
  Cc: dev, xiaoyun.li, awogbemila, bruce.richardson, xueqin.lin, Haiyue Wang

On 10/10/2022 11:17 AM, Junfeng Guo wrote:
> The following base code is based on Google Virtual Ethernet (gve)
> driver v1.3.0 under MIT license.
> - gve_adminq.c
> - gve_adminq.h
> - gve_desc.h
> - gve_desc_dqo.h
> - gve_register.h
> - gve.h
> 
> The original code is in:
> https://github.com/GoogleCloudPlatform/compute-virtual-ethernet-linux/\
> tree/v1.3.0/google/gve
> 
> Note that these code are not Intel files and they come from the kernel
> community. The base code there has the statement of
> SPDX-License-Identifier: (GPL-2.0 OR MIT). Here we just follow the
> required MIT license as an exception to DPDK.
> 
> Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com>
> Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
> Signed-off-by: Junfeng Guo <junfeng.guo@intel.com>

<...>

> diff --git a/drivers/net/gve/base/gve.h b/drivers/net/gve/base/gve.h
> new file mode 100644
> index 0000000000..1b0d59b639
> --- /dev/null
> +++ b/drivers/net/gve/base/gve.h
> @@ -0,0 +1,58 @@
> +/* SPDX-License-Identifier: MIT
> + * Google Virtual Ethernet (gve) driver
> + * Version: 1.3.0

There is a version macro in the code; is the version information required
in the file comment?

> + * Copyright (C) 2015-2022 Google, Inc.
> + * Copyright(C) 2022 Intel Corporation

I don't know if it is OK to add the Intel copyright; as far as I know this
requires a big enough contribution to the code. If this is a copy of
existing code, maybe only the original copyright should exist.

cc'ed @Hemant and @Stephen for more comment.

^ permalink raw reply	[flat|nested] 192+ messages in thread

* Re: [PATCH v5 3/8] net/gve: add support for device initialization
  2022-10-10 10:17                 ` [PATCH v5 3/8] net/gve: add support for device initialization Junfeng Guo
@ 2022-10-19 13:46                   ` Ferruh Yigit
  2022-10-19 15:59                     ` Li, Xiaoyun
  2022-10-19 13:47                   ` Ferruh Yigit
  1 sibling, 1 reply; 192+ messages in thread
From: Ferruh Yigit @ 2022-10-19 13:46 UTC (permalink / raw)
  To: Junfeng Guo, qi.z.zhang, jingjing.wu
  Cc: ferruh.yigit, dev, xiaoyun.li, awogbemila, bruce.richardson,
	xueqin.lin, Haiyue Wang

On 10/10/2022 11:17 AM, Junfeng Guo wrote:
> 
> Support device init and add following devops skeleton:
>   - dev_configure
>   - dev_start
>   - dev_stop
>   - dev_close
> 
> Note that build system (including doc) is also added in this patch.
> 
> Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
> Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com>
> Signed-off-by: Junfeng Guo <junfeng.guo@intel.com>

<...>

> diff --git a/doc/guides/rel_notes/release_22_11.rst b/doc/guides/rel_notes/release_22_11.rst
> index fbb575255f..c1162ea1a4 100644
> --- a/doc/guides/rel_notes/release_22_11.rst
> +++ b/doc/guides/rel_notes/release_22_11.rst
> @@ -200,6 +200,11 @@ New Features
>     into single event containing ``rte_event_vector``
>     whose event type is ``RTE_EVENT_TYPE_CRYPTODEV_VECTOR``.
> 
> +* **Added GVE net PMD**
> +
> +  * Added the new ``gve`` net driver for Google Virtual Ethernet devices.
> +  * See the :doc:`../nics/gve` NIC guide for more details on this new driver.
> +
> 

Can you please move the block among the other ethdev drivers, keeping the
list alphabetically sorted?

<...>

> +static int
> +gve_dev_init(struct rte_eth_dev *eth_dev)
> +{
> +       struct gve_priv *priv = eth_dev->data->dev_private;
> +       int max_tx_queues, max_rx_queues;
> +       struct rte_pci_device *pci_dev;
> +       struct gve_registers *reg_bar;
> +       rte_be32_t *db_bar;
> +       int err;
> +
> +       eth_dev->dev_ops = &gve_eth_dev_ops;
> +
> +       if (rte_eal_process_type() != RTE_PROC_PRIMARY)
> +               return 0;
> +
> +       pci_dev = RTE_DEV_TO_PCI(eth_dev->device);
> +
> +       reg_bar = pci_dev->mem_resource[GVE_REG_BAR].addr;
> +       if (!reg_bar) {
> +               PMD_DRV_LOG(ERR, "Failed to map pci bar!");
> +               return -ENOMEM;
> +       }
> +
> +       db_bar = pci_dev->mem_resource[GVE_DB_BAR].addr;
> +       if (!db_bar) {
> +               PMD_DRV_LOG(ERR, "Failed to map doorbell bar!");
> +               return -ENOMEM;
> +       }
> +
> +       gve_write_version(&reg_bar->driver_version);
> +       /* Get max queues to alloc etherdev */
> +       max_tx_queues = ioread32be(&reg_bar->max_tx_queues);
> +       max_rx_queues = ioread32be(&reg_bar->max_rx_queues);
> +
> +       priv->reg_bar0 = reg_bar;
> +       priv->db_bar2 = db_bar;
> +       priv->pci_dev = pci_dev;
> +       priv->state_flags = 0x0;
> +
> +       priv->max_nb_txq = max_tx_queues;
> +       priv->max_nb_rxq = max_rx_queues;
> +
> +       err = gve_init_priv(priv, false);
> +       if (err)
> +               return err;
> +
> +       eth_dev->data->mac_addrs = rte_zmalloc("gve_mac", sizeof(struct rte_ether_addr), 0);
> +       if (!eth_dev->data->mac_addrs) {
> +               PMD_DRV_LOG(ERR, "Failed to allocate memory to store mac address");
> +               return -ENOMEM;
> +       }
> +       rte_ether_addr_copy(&priv->dev_addr, eth_dev->data->mac_addrs);
> +

Is anything assigned to 'priv->dev_addr' to copy?
Also since there is a 'priv->dev_addr' field, why not use it directly, 
instead of allocating memory for 'eth_dev->data->mac_addrs'?
I mean why not "eth_dev->data->mac_addrs = &priv->dev_addr"?
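
To make the suggestion concrete, an untested sketch (it only helps if
gve_init_priv() really fills in priv->dev_addr, which is the open question
above):

    /* Untested sketch: point at the address already stored in the
     * private data instead of allocating a separate copy.
     */
    eth_dev->data->mac_addrs = &priv->dev_addr;
    if (rte_is_zero_ether_addr(eth_dev->data->mac_addrs))
        PMD_DRV_LOG(WARNING, "Device did not report a MAC address");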

<...>

> +struct gve_priv {
> +       struct gve_irq_db *irq_dbs; /* array of num_ntfy_blks */
> +       const struct rte_memzone *irq_dbs_mz;
> +       uint32_t mgmt_msix_idx;
> +       rte_be32_t *cnt_array; /* array of num_event_counters */
> +       const struct rte_memzone *cnt_array_mz;
> +
> +       uint16_t num_event_counters;
> +       uint16_t tx_desc_cnt; /* txq size */
> +       uint16_t rx_desc_cnt; /* rxq size */
> +       uint16_t tx_pages_per_qpl; /* tx buffer length */
> +       uint16_t rx_data_slot_cnt; /* rx buffer length */

These fields are not used in this patch; I guess some will be used in the
datapath patch.

Can you please only add fields that are used in the patch? This way it
will be clear in which functionality each field is used, and it makes it
possible to detect unused fields.
We are accepting batch updates for the base code, but this is DPDK-related
code, so let's only add things when they are used.
Same for all data structures.

<...>

> diff --git a/drivers/net/gve/version.map b/drivers/net/gve/version.map
> new file mode 100644
> index 0000000000..c2e0723b4c
> --- /dev/null
> +++ b/drivers/net/gve/version.map
> @@ -0,0 +1,3 @@
> +DPDK_22 {

DPDK_23

^ permalink raw reply	[flat|nested] 192+ messages in thread

* Re: [PATCH v5 8/8] net/gve: add support for Rx/Tx
  2022-10-10 10:17                 ` [PATCH v5 8/8] net/gve: add support for Rx/Tx Junfeng Guo
@ 2022-10-19 13:47                   ` Ferruh Yigit
  2022-10-20  9:34                     ` Guo, Junfeng
  0 siblings, 1 reply; 192+ messages in thread
From: Ferruh Yigit @ 2022-10-19 13:47 UTC (permalink / raw)
  To: Junfeng Guo, qi.z.zhang, jingjing.wu
  Cc: ferruh.yigit, dev, xiaoyun.li, awogbemila, bruce.richardson, xueqin.lin

On 10/10/2022 11:17 AM, Junfeng Guo wrote:

> 
> Add Rx/Tx of GQI_QPL queue format and GQI_RDA queue format.
> 
> Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com>
> Signed-off-by: Junfeng Guo <junfeng.guo@intel.com>

<...>

> +uint16_t
> +gve_rx_burst(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t nb_pkts)
> +{
> +       volatile struct gve_rx_desc *rxr, *rxd;
> +       struct gve_rx_queue *rxq = rx_queue;
> +       uint16_t rx_id = rxq->rx_tail;
> +       struct rte_mbuf *rxe;
> +       uint16_t nb_rx, len;
> +       uint64_t addr;
> +
> +       rxr = rxq->rx_desc_ring;
> +
> +       for (nb_rx = 0; nb_rx < nb_pkts; nb_rx++) {
> +               rxd = &rxr[rx_id];
> +               if (GVE_SEQNO(rxd->flags_seq) != rxq->expected_seqno)
> +                       break;
> +
> +               if (rxd->flags_seq & GVE_RXF_ERR)
> +                       continue;
> +

I think the above code is wrong.
The function returns 'nb_rx'; if you continue on error, 'nb_rx' keeps
increasing, so the application will receive the wrong number of packets.

Also, packets are assigned as "rx_pkts[nb_rx] = rxe;", so this will leave
gaps in the array.

You can either break on the error, or keep a separate variable to store
the number of received packets.
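For example, here is a minimal self-contained sketch of the second option
(this is not the driver loop itself, just the bookkeeping pattern being
suggested, with made-up names):

  #include <stdint.h>
  #include <stddef.h>

  /* The ring index advances for every descriptor that is looked at, while
   * the returned count only grows for entries actually stored in the output
   * array, so there are no gaps and the count is accurate.
   */
  static uint16_t
  collect_good(const int *ring, size_t ring_len, int *out, uint16_t nb_max)
  {
          uint16_t nb_out = 0;    /* packets handed back to the caller */
          size_t idx;             /* ring position, advances regardless */

          for (idx = 0; idx < ring_len && nb_out < nb_max; idx++) {
                  if (ring[idx] < 0)      /* error entry: skip, don't count */
                          continue;
                  out[nb_out++] = ring[idx];
          }
          return nb_out;
  }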

> +               len = rte_be_to_cpu_16(rxd->len) - GVE_RX_PAD;
> +               rxe = rxq->sw_ring[rx_id];
> +               rxe->data_off = RTE_PKTMBUF_HEADROOM;

As far as I can see, the mbufs are allocated using 'rte_pktmbuf_alloc()';
if so, there is no need to explicitly set 'm->data_off' (the alloc path
already resets it to the headroom).


^ permalink raw reply	[flat|nested] 192+ messages in thread

* Re: [PATCH v5 3/8] net/gve: add support for device initialization
  2022-10-10 10:17                 ` [PATCH v5 3/8] net/gve: add support for device initialization Junfeng Guo
  2022-10-19 13:46                   ` Ferruh Yigit
@ 2022-10-19 13:47                   ` Ferruh Yigit
  2022-10-19 14:02                     ` Xia, Chenbo
                                       ` (2 more replies)
  1 sibling, 3 replies; 192+ messages in thread
From: Ferruh Yigit @ 2022-10-19 13:47 UTC (permalink / raw)
  To: Junfeng Guo, qi.z.zhang, jingjing.wu, Maxime Coquelin, Chenbo Xia
  Cc: dev, xiaoyun.li, awogbemila, bruce.richardson, xueqin.lin,
	Haiyue Wang, Helin Zhang, Thomas Monjalon

On 10/10/2022 11:17 AM, Junfeng Guo wrote:

> 
> Support device init and add following devops skeleton:
>   - dev_configure
>   - dev_start
>   - dev_stop
>   - dev_close
> 
> Note that build system (including doc) is also added in this patch.
> 
> Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
> Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com>
> Signed-off-by: Junfeng Guo <junfeng.guo@intel.com>

<...>

> --- /dev/null
> +++ b/doc/guides/nics/gve.rst
> @@ -0,0 +1,63 @@
> +..  SPDX-License-Identifier: BSD-3-Clause
> +    Copyright(C) 2022 Intel Corporation.
> +
> +GVE poll mode driver
> +=======================
> +
> +The GVE PMD (**librte_net_gve**) provides poll mode driver support for
> +Google Virtual Ethernet device (also called as gVNIC).
> +

This is a virtual device, emulated in the VM as a PCI device, right?
If so, what emulates it? I mean, can we use QEMU for it?
And is there a kernel-supported backend, as virtio has vhost?

> +Current gVNIC is an alternative to the virtIO-based ethernet interface that can
> +support higher network bandwidths such as the 50-100 Gbps speeds.

This is an alternative to virtio, and it would be good to document the
pros/cons of this device/approach, to help users choose one or the other.

Does "support higher network bandwidths" mean this device is faster than
virtio? Is there any performance report?
Aren't there any other notable differences?

I think it is better to document as much as possible; cc'ed more virtio people.



^ permalink raw reply	[flat|nested] 192+ messages in thread

* Re: [PATCH v5 5/8] net/gve: add support for MTU setting
  2022-10-10 10:17                 ` [PATCH v5 5/8] net/gve: add support for MTU setting Junfeng Guo
@ 2022-10-19 13:47                   ` Ferruh Yigit
  2022-10-20 10:14                     ` Guo, Junfeng
  0 siblings, 1 reply; 192+ messages in thread
From: Ferruh Yigit @ 2022-10-19 13:47 UTC (permalink / raw)
  To: Junfeng Guo, qi.z.zhang, jingjing.wu
  Cc: ferruh.yigit, dev, xiaoyun.li, awogbemila, bruce.richardson, xueqin.lin

On 10/10/2022 11:17 AM, Junfeng Guo wrote:

> 
> Support dev_ops mtu_set.
> 
> Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com>
> Signed-off-by: Junfeng Guo <junfeng.guo@intel.com>

<...>

> 
> +static int
> +gve_dev_mtu_set(struct rte_eth_dev *dev, uint16_t mtu)
> +{
> +       struct gve_priv *priv = dev->data->dev_private;
> +       int err;
> +
> +       if (mtu < RTE_ETHER_MIN_MTU || mtu > priv->max_mtu) {
> +               PMD_DRV_LOG(ERR, "MIN MTU is %u MAX MTU is %u", RTE_ETHER_MIN_MTU, priv->max_mtu);
> +               return -EINVAL;
> +       }
> +
> +       /* mtu setting is forbidden if port is start */
> +       if (dev->data->dev_started) {
> +               PMD_DRV_LOG(ERR, "Port must be stopped before configuration");
> +               return -EBUSY;
> +       }
> +
> +       dev->data->dev_conf.rxmode.mtu = mtu + RTE_ETHER_HDR_LEN;

It is 'dev->data->mtu' that holds the latest MTU value.

'dev_conf.rxmode.mtu' is the configuration requested by the user; there is
no need to update it.

And since 'dev->data->mtu' is already updated by 'rte_eth_dev_set_mtu()',
you can drop the line above.
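I.e. roughly the following (a sketch only; the tail of the function is not
visible in the quote above, so it is left as a placeholder comment here):

  static int
  gve_dev_mtu_set(struct rte_eth_dev *dev, uint16_t mtu)
  {
          struct gve_priv *priv = dev->data->dev_private;

          if (mtu < RTE_ETHER_MIN_MTU || mtu > priv->max_mtu) {
                  PMD_DRV_LOG(ERR, "MIN MTU is %u MAX MTU is %u",
                              RTE_ETHER_MIN_MTU, priv->max_mtu);
                  return -EINVAL;
          }

          /* MTU setting is forbidden while the port is started */
          if (dev->data->dev_started) {
                  PMD_DRV_LOG(ERR, "Port must be stopped before configuration");
                  return -EBUSY;
          }

          /* No dev_conf.rxmode.mtu update here: rte_eth_dev_set_mtu() stores
           * the new value in dev->data->mtu once this callback succeeds.
           */

          /* ... the rest of the original function (the device MTU update and
           * its error handling) continues unchanged here ... */

          return 0;
  }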


^ permalink raw reply	[flat|nested] 192+ messages in thread

* Re: [PATCH v5 6/8] net/gve: add support for dev info get and dev configure
  2022-10-10 10:17                 ` [PATCH v5 6/8] net/gve: add support for dev info get and dev configure Junfeng Guo
@ 2022-10-19 13:49                   ` Ferruh Yigit
  2022-10-20  9:29                     ` Guo, Junfeng
  0 siblings, 1 reply; 192+ messages in thread
From: Ferruh Yigit @ 2022-10-19 13:49 UTC (permalink / raw)
  To: Junfeng Guo, qi.z.zhang, jingjing.wu
  Cc: ferruh.yigit, dev, xiaoyun.li, awogbemila, bruce.richardson, xueqin.lin

On 10/10/2022 11:17 AM, Junfeng Guo wrote:

> 
> Add dev_ops dev_infos_get.
> Complete dev_configure with RX offloads configuration.
> 
> Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com>
> Signed-off-by: Junfeng Guo <junfeng.guo@intel.com>

<...>

> --- a/drivers/net/gve/gve_ethdev.c
> +++ b/drivers/net/gve/gve_ethdev.c
> @@ -29,8 +29,16 @@ gve_write_version(uint8_t *driver_version_register)
>   }
> 
>   static int
> -gve_dev_configure(__rte_unused struct rte_eth_dev *dev)
> +gve_dev_configure(struct rte_eth_dev *dev)
>   {
> +       struct gve_priv *priv = dev->data->dev_private;
> +
> +       if (dev->data->dev_conf.rxmode.mq_mode & RTE_ETH_MQ_RX_RSS_FLAG)
> +               dev->data->dev_conf.rxmode.offloads |= RTE_ETH_RX_OFFLOAD_RSS_HASH;
> +

This is force-enabling the feature. We only do this for PMDs that have the
hash value available anyway, where no additional work or performance loss
is observed from enabling this offload. Otherwise, drivers shouldn't update
'dev_conf.rxmode'.

Can you please confirm this PMD fits the above description? And can you
please add a comment saying that the feature is being force-enabled?
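E.g. something along these lines (only a sketch, and the wording only
applies if the PMD really does fit the description above):

  /*
   * Force-enable the RSS hash offload when an RSS mq mode is requested
   * (assuming the device delivers the hash anyway at no extra cost).
   */
  if (dev->data->dev_conf.rxmode.mq_mode & RTE_ETH_MQ_RX_RSS_FLAG)
          dev->data->dev_conf.rxmode.offloads |= RTE_ETH_RX_OFFLOAD_RSS_HASH;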

> +       if (dev->data->dev_conf.rxmode.offloads & RTE_ETH_RX_OFFLOAD_TCP_LRO)
> +               priv->enable_rsc = 1;
> +
>          return 0;
>   }
> 
> @@ -94,6 +102,60 @@ gve_dev_close(struct rte_eth_dev *dev)
>          return err;
>   }
> 
> +static int
> +gve_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
> +{
> +       struct gve_priv *priv = dev->data->dev_private;
> +
> +       dev_info->device = dev->device;
> +       dev_info->max_mac_addrs = 1;
> +       dev_info->max_rx_queues = priv->max_nb_rxq;
> +       dev_info->max_tx_queues = priv->max_nb_txq;
> +       dev_info->min_rx_bufsize = GVE_MIN_BUF_SIZE;
> +       dev_info->max_rx_pktlen = GVE_MAX_RX_PKTLEN;
> +       dev_info->max_mtu = RTE_ETHER_MTU;

Can you please confirm that the max MTU this PMD supports is 1500, meaning
it doesn't support jumbo frames, etc.?

> +       dev_info->min_mtu = RTE_ETHER_MIN_MTU;
> +
> +       dev_info->rx_offload_capa = 0;
> +       dev_info->tx_offload_capa =
> +               RTE_ETH_TX_OFFLOAD_MULTI_SEGS   |
> +               RTE_ETH_TX_OFFLOAD_IPV4_CKSUM   |
> +               RTE_ETH_TX_OFFLOAD_UDP_CKSUM    |
> +               RTE_ETH_TX_OFFLOAD_TCP_CKSUM    |
> +               RTE_ETH_TX_OFFLOAD_SCTP_CKSUM   |
> +               RTE_ETH_TX_OFFLOAD_TCP_TSO;

Can you advertise these capabilities in the patch that implements them?


^ permalink raw reply	[flat|nested] 192+ messages in thread

* RE: [PATCH v5 3/8] net/gve: add support for device initialization
  2022-10-19 13:47                   ` Ferruh Yigit
@ 2022-10-19 14:02                     ` Xia, Chenbo
  2022-10-19 14:24                     ` Zhang, Helin
  2022-10-19 16:20                     ` Li, Xiaoyun
  2 siblings, 0 replies; 192+ messages in thread
From: Xia, Chenbo @ 2022-10-19 14:02 UTC (permalink / raw)
  To: Ferruh Yigit, Guo, Junfeng, Zhang, Qi Z, Wu, Jingjing, Maxime Coquelin
  Cc: dev, Li, Xiaoyun, awogbemila, Richardson, Bruce, Lin, Xueqin,
	Wang, Haiyue, Zhang, Helin, Thomas Monjalon

> -----Original Message-----
> From: Ferruh Yigit <ferruh.yigit@amd.com>
> Sent: Wednesday, October 19, 2022 9:47 PM
> To: Guo, Junfeng <junfeng.guo@intel.com>; Zhang, Qi Z
> <qi.z.zhang@intel.com>; Wu, Jingjing <jingjing.wu@intel.com>; Maxime
> Coquelin <maxime.coquelin@redhat.com>; Xia, Chenbo <chenbo.xia@intel.com>
> Cc: dev@dpdk.org; Li, Xiaoyun <xiaoyun.li@intel.com>;
> awogbemila@google.com; Richardson, Bruce <bruce.richardson@intel.com>; Lin,
> Xueqin <xueqin.lin@intel.com>; Wang, Haiyue <haiyue.wang@intel.com>; Zhang,
> Helin <helin.zhang@intel.com>; Thomas Monjalon <thomas@monjalon.net>
> Subject: Re: [PATCH v5 3/8] net/gve: add support for device initialization
> 
> On 10/10/2022 11:17 AM, Junfeng Guo wrote:
> 
> >
> > Support device init and add following devops skeleton:
> >   - dev_configure
> >   - dev_start
> >   - dev_stop
> >   - dev_close
> >
> > Note that build system (including doc) is also added in this patch.
> >
> > Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
> > Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com>
> > Signed-off-by: Junfeng Guo <junfeng.guo@intel.com>
> 
> <...>
> 
> > --- /dev/null
> > +++ b/doc/guides/nics/gve.rst
> > @@ -0,0 +1,63 @@
> > +..  SPDX-License-Identifier: BSD-3-Clause
> > +    Copyright(C) 2022 Intel Corporation.
> > +
> > +GVE poll mode driver
> > +=======================
> > +
> > +The GVE PMD (**librte_net_gve**) provides poll mode driver support for
> > +Google Virtual Ethernet device (also called as gVNIC).
> > +
> 
> This is a virtual device, emulated in VM as PCI device, right?

Most likely yes. But I think this part has not been made public by Google.
(Correct me if I am wrong.)

> If so what emulates it, I mean can we use QEMU for it?

Yes, if the user has an emulated gVNIC device or real HW that supports the
gVNIC interface.

> And is there a kernel supported backend, as virtio has vhost?

I guess not, given that the back-end is not public.

> 
> > +Current gVNIC is an alternative to the virtIO-based ethernet interface
> that can
> > +support higher network bandwidths such as the 50-100 Gbps speeds.
> 
> This is an alternative to virtio, and it would be good to document
> pros/cons of this device/approach, to help users to chose one or other.

I don't think it's good to compare these two. They are just two virtual
interfaces; I don't see a strong analysis proving which one is better.

For example, if you use Google Cloud, you use gVNIC; if you use another
cloud that offers virtio, you choose virtio.

> 
> Is "support higher network bandwidths" means this device is faster than
> virtio? Is there any performance report?
> Aren't there any other notable difference?

If we want to keep such a description, I agree with Ferruh that we need to
give some evidence.

But in my understanding, they are just two similar virtual interfaces: one
has a public standard, and one is Google-specific. Users end up using
whichever one the cloud service they choose provides.

Thanks,
Chenbo

> 
> I think better to document as much as possible, cc'ed more virtio people.
> 


^ permalink raw reply	[flat|nested] 192+ messages in thread

* RE: [PATCH v5 3/8] net/gve: add support for device initialization
  2022-10-19 13:47                   ` Ferruh Yigit
  2022-10-19 14:02                     ` Xia, Chenbo
@ 2022-10-19 14:24                     ` Zhang, Helin
  2022-10-19 21:16                       ` Ferruh Yigit
  2022-10-19 16:20                     ` Li, Xiaoyun
  2 siblings, 1 reply; 192+ messages in thread
From: Zhang, Helin @ 2022-10-19 14:24 UTC (permalink / raw)
  To: Ferruh Yigit, Guo, Junfeng, Zhang, Qi Z, Wu, Jingjing,
	Maxime Coquelin, Xia, Chenbo
  Cc: dev, Li, Xiaoyun, awogbemila, Richardson, Bruce, Lin, Xueqin,
	Wang, Haiyue, Thomas Monjalon



> -----Original Message-----
> From: Ferruh Yigit <ferruh.yigit@amd.com> 
> Sent: Wednesday, October 19, 2022 9:47 PM
> To: Guo, Junfeng <junfeng.guo@intel.com>; Zhang, Qi Z <qi.z.zhang@intel.com>; Wu, Jingjing <jingjing.wu@intel.com>; Maxime Coquelin <maxime.coquelin@redhat.com>; Xia, Chenbo <chenbo.xia@intel.com>
> Cc: dev@dpdk.org; Li, Xiaoyun <xiaoyun.li@intel.com>; awogbemila@google.com; Richardson, Bruce <bruce.richardson@intel.com>; Lin, Xueqin <xueqin.lin@intel.com>; Wang, Haiyue <haiyue.wang@intel.com>; Zhang, Helin <helin.zhang@intel.com>; Thomas Monjalon <thomas@monjalon.net>
> Subject: Re: [PATCH v5 3/8] net/gve: add support for device initialization

> On 10/10/2022 11:17 AM, Junfeng Guo wrote:

> > 
> > Support device init and add following devops skeleton:
> >   - dev_configure
> >   - dev_start
> >   - dev_stop
> >   - dev_close
> > 
> > Note that build system (including doc) is also added in this patch.
> > 
> > Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
> > Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com>
> > Signed-off-by: Junfeng Guo <junfeng.guo@intel.com>
>
> <...>
>
> > --- /dev/null
> > +++ b/doc/guides/nics/gve.rst
> > @@ -0,0 +1,63 @@
> > +..  SPDX-License-Identifier: BSD-3-Clause
> > +    Copyright(C) 2022 Intel Corporation.
> > +
> > +GVE poll mode driver
> > +=======================
> > +
> > +The GVE PMD (**librte_net_gve**) provides poll mode driver support 
> > +for Google Virtual Ethernet device (also called as gVNIC).
> > +
>
> This is a virtual device, emulated in VM as PCI device, right?
> If so what emulates it, I mean can we use QEMU for it?
> And is there a kernel supported backend, as virtio has vhost?
This is a virtual interface provided only on Google Cloud Platform (GCP), like ENA on AWS and virtio on Alibaba Cloud, etc.
gve (gVNIC) is the standard virtual Ethernet interface provided to anyone who buys a cloud instance on GCP; it is already there and everybody can access it.
The backend details are not open to anyone outside of Google; I assume they are more of a business and technical secret of Google.
>
> > +Current gVNIC is an alternative to the virtIO-based ethernet 
> > +interface that can support higher network bandwidths such as the 50-100 Gbps speeds.
I don't think it is an alternative to virtio. gve is the driver name for gVNIC on GCP, which is the standard virtual Ethernet interface on that cloud platform.

>
> This is an alternative to virtio, and it would be good to document pros/cons of this device/approach, to help users to chose one or other.
>
> Is "support higher network bandwidths" means this device is faster than virtio? Is there any performance report?
There is no indication that the gve (or virtio) interface is faster than virtio (or gve). I think it heavily depends on the backend design, which can be SW or HW.
I would treat gve (actually the driver for gVNIC) as just one of the leading virtual Ethernet interfaces across different cloud environments. So far, gve is for the GCP cloud environment only.

Hopefully my understanding is correct, as I am not an expert on Google Cloud, and I got all this information from public sources.

Thanks for all the good questions! Hopefully my answers help!

Regards,
Helin

> Aren't there any other notable difference?
>
> I think better to document as much as possible, cc'ed more virtio people.



^ permalink raw reply	[flat|nested] 192+ messages in thread

* RE: [PATCH v5 1/8] net/gve/base: introduce GVE PMD base code
  2022-10-19 13:45                   ` Ferruh Yigit
@ 2022-10-19 15:13                     ` Hemant Agrawal
  2022-10-19 15:18                       ` Ferruh Yigit
  2022-10-19 15:48                     ` Li, Xiaoyun
  1 sibling, 1 reply; 192+ messages in thread
From: Hemant Agrawal @ 2022-10-19 15:13 UTC (permalink / raw)
  To: Ferruh Yigit, Junfeng Guo, qi.z.zhang, jingjing.wu, Stephen Hemminger
  Cc: dev, xiaoyun.li, awogbemila, bruce.richardson, xueqin.lin, Haiyue Wang

> On 10/10/2022 11:17 AM, Junfeng Guo wrote:
> > The following base code is based on Google Virtual Ethernet (gve)
> > driver v1.3.0 under MIT license.
> > - gve_adminq.c
> > - gve_adminq.h
> > - gve_desc.h
> > - gve_desc_dqo.h
> > - gve_register.h
> > - gve.h
> >
> > The original code is in:
> > https://github.com/GoogleCloudPlatform/compute-virtual-ethernet-linux/
> > tree/v1.3.0/google/gve
> >
> > Note that these code are not Intel files and they come from the kernel
> > community. The base code there has the statement of
> > SPDX-License-Identifier: (GPL-2.0 OR MIT). Here we just follow the
> > required MIT license as an exception to DPDK.
> >
> > Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com>
> > Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
> > Signed-off-by: Junfeng Guo <junfeng.guo@intel.com>
> 
> <...>
> 
> > diff --git a/drivers/net/gve/base/gve.h b/drivers/net/gve/base/gve.h
> > new file mode 100644 index 0000000000..1b0d59b639
> > --- /dev/null
> > +++ b/drivers/net/gve/base/gve.h
> > @@ -0,0 +1,58 @@
> > +/* SPDX-License-Identifier: MIT
> > + * Google Virtual Ethernet (gve) driver
> > + * Version: 1.3.0
> 
> There is a version macro in the code, is version information required in the
> file comment?
> 
> > + * Copyright (C) 2015-2022 Google, Inc.
> > + * Copyright(C) 2022 Intel Corporation
> 
> I don't know if it is OK to add Intel copyright, as far as I know this requires big
> enough contribution to the code, if this is copy of existing code, may be only
> original copyright should exist.
> 
[Hemant] Yes, the general guideline is that one should add their copyright if they have a big enough contribution. But in the end it is a guideline, not a rule.
It is up to the original copyright holder to object.

> cc'ed @Hemant and @Stephen for more comment.

^ permalink raw reply	[flat|nested] 192+ messages in thread

* Re: [PATCH v5 1/8] net/gve/base: introduce GVE PMD base code
  2022-10-19 15:13                     ` Hemant Agrawal
@ 2022-10-19 15:18                       ` Ferruh Yigit
  2022-10-20  3:33                         ` Hemant Agrawal
  0 siblings, 1 reply; 192+ messages in thread
From: Ferruh Yigit @ 2022-10-19 15:18 UTC (permalink / raw)
  To: Hemant Agrawal, Junfeng Guo, qi.z.zhang, jingjing.wu, Stephen Hemminger
  Cc: dev, xiaoyun.li, awogbemila, bruce.richardson, xueqin.lin, Haiyue Wang

On 10/19/2022 4:13 PM, Hemant Agrawal wrote:
>> On 10/10/2022 11:17 AM, Junfeng Guo wrote:
>>> The following base code is based on Google Virtual Ethernet (gve)
>>> driver v1.3.0 under MIT license.
>>> - gve_adminq.c
>>> - gve_adminq.h
>>> - gve_desc.h
>>> - gve_desc_dqo.h
>>> - gve_register.h
>>> - gve.h
>>>
>>> The original code is in:
>>> https://github.com/GoogleCloudPlatform/compute-virtual-ethernet-linux/
>>> tree/v1.3.0/google/gve
>>>
>>> Note that these code are not Intel files and they come from the kernel
>>> community. The base code there has the statement of
>>> SPDX-License-Identifier: (GPL-2.0 OR MIT). Here we just follow the
>>> required MIT license as an exception to DPDK.
>>>
>>> Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com>
>>> Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
>>> Signed-off-by: Junfeng Guo <junfeng.guo@intel.com>
>>
>> <...>
>>
>>> diff --git a/drivers/net/gve/base/gve.h b/drivers/net/gve/base/gve.h
>>> new file mode 100644 index 0000000000..1b0d59b639
>>> --- /dev/null
>>> +++ b/drivers/net/gve/base/gve.h
>>> @@ -0,0 +1,58 @@
>>> +/* SPDX-License-Identifier: MIT
>>> + * Google Virtual Ethernet (gve) driver
>>> + * Version: 1.3.0
>>
>> There is a version macro in the code, is version information required in the
>> file comment?
>>
>>> + * Copyright (C) 2015-2022 Google, Inc.
>>> + * Copyright(C) 2022 Intel Corporation
>>
>> I don't know if it is OK to add Intel copyright, as far as I know this requires big
>> enough contribution to the code, if this is copy of existing code, may be only
>> original copyright should exist.
>>
> [Hemant] Yes, the general guideline is that one should add their copyright if they have big enough contribution.  But at the end it is a guideline - not the rule.
> It is up-to the original copyright holder to object.

Does this mean that as long as the original copyright holder does not
object, it is OK to add more copyrights?
I don't think they are represented here or even aware of this change. I
believe we (as a community) also have a responsibility to get these things
right, within our capacity.

> 
>> cc'ed @Hemant and @Stephen for more comment.


^ permalink raw reply	[flat|nested] 192+ messages in thread

* RE: [PATCH v5 1/8] net/gve/base: introduce GVE PMD base code
  2022-10-19 13:45                   ` Ferruh Yigit
  2022-10-19 15:13                     ` Hemant Agrawal
@ 2022-10-19 15:48                     ` Li, Xiaoyun
  2022-10-19 20:52                       ` Ferruh Yigit
  1 sibling, 1 reply; 192+ messages in thread
From: Li, Xiaoyun @ 2022-10-19 15:48 UTC (permalink / raw)
  To: Ferruh Yigit, Guo, Junfeng, Zhang, Qi Z, Wu, Jingjing,
	Hemant Agrawal, Stephen Hemminger
  Cc: dev, awogbemila, Richardson, Bruce, Lin, Xueqin, Wang, Haiyue

Hi

> -----Original Message-----
> From: Ferruh Yigit <ferruh.yigit@amd.com>
> Sent: Wednesday, October 19, 2022 14:45
> To: Guo, Junfeng <junfeng.guo@intel.com>; Zhang, Qi Z
> <qi.z.zhang@intel.com>; Wu, Jingjing <jingjing.wu@intel.com>; Hemant
> Agrawal <hemant.agrawal@nxp.com>; Stephen Hemminger
> <stephen@networkplumber.org>
> Cc: dev@dpdk.org; Li, Xiaoyun <xiaoyun.li@intel.com>;
> awogbemila@google.com; Richardson, Bruce <bruce.richardson@intel.com>;
> Lin, Xueqin <xueqin.lin@intel.com>; Wang, Haiyue <haiyue.wang@intel.com>
> Subject: Re: [PATCH v5 1/8] net/gve/base: introduce GVE PMD base code
> 
> On 10/10/2022 11:17 AM, Junfeng Guo wrote:
> > The following base code is based on Google Virtual Ethernet (gve)
> > driver v1.3.0 under MIT license.
> > - gve_adminq.c
> > - gve_adminq.h
> > - gve_desc.h
> > - gve_desc_dqo.h
> > - gve_register.h
> > - gve.h
> >
> > The original code is in:
> > https://github.com/GoogleCloudPlatform/compute-virtual-ethernet-linux/
> > \
> > tree/v1.3.0/google/gve
> >
> > Note that these code are not Intel files and they come from the kernel
> > community. The base code there has the statement of
> > SPDX-License-Identifier: (GPL-2.0 OR MIT). Here we just follow the
> > required MIT license as an exception to DPDK.
> >
> > Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com>
> > Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
> > Signed-off-by: Junfeng Guo <junfeng.guo@intel.com>
> 
> <...>
> 
> > diff --git a/drivers/net/gve/base/gve.h b/drivers/net/gve/base/gve.h
> > new file mode 100644 index 0000000000..1b0d59b639
> > --- /dev/null
> > +++ b/drivers/net/gve/base/gve.h
> > @@ -0,0 +1,58 @@
> > +/* SPDX-License-Identifier: MIT
> > + * Google Virtual Ethernet (gve) driver
> > + * Version: 1.3.0
> 
> There is a version macro in the code, is version information required in the
> file comment?

Different versions of the gve kernel driver change a lot, so for reference I think adding the version info to the base code makes sense. It just tells whoever does a later update which version was used.

> 
> > + * Copyright (C) 2015-2022 Google, Inc.
> > + * Copyright(C) 2022 Intel Corporation
> 
> I don't know if it is OK to add Intel copyright, as far as I know this requires big
> enough contribution to the code, if this is copy of existing code, may be only
> original copyright should exist.

It's not just a direct copy. Files that are direct copies, like gve_desc.h, don't have the Intel copyright.
But gve.h in the gve kernel driver has a lot of info that DPDK doesn't need, or that DPDK has its own version of, such as the txq/rxq info.
I'm not sure whether the contribution counts as a lot or not, but I believe this patchset follows the principle that if the code is changed, the Intel copyright is added; otherwise, only Google's copyright is kept.

> 
> cc'ed @Hemant and @Stephen for more comment.

^ permalink raw reply	[flat|nested] 192+ messages in thread

* RE: [PATCH v5 3/8] net/gve: add support for device initialization
  2022-10-19 13:46                   ` Ferruh Yigit
@ 2022-10-19 15:59                     ` Li, Xiaoyun
  2022-10-19 21:00                       ` Ferruh Yigit
  0 siblings, 1 reply; 192+ messages in thread
From: Li, Xiaoyun @ 2022-10-19 15:59 UTC (permalink / raw)
  To: Ferruh Yigit, Guo, Junfeng, Zhang, Qi Z, Wu, Jingjing
  Cc: ferruh.yigit, dev, awogbemila, Richardson,  Bruce, Lin, Xueqin,
	Wang, Haiyue

Hi

> -----Original Message-----
> From: Ferruh Yigit <ferruh.yigit@amd.com>
> Sent: Wednesday, October 19, 2022 14:46
> To: Guo, Junfeng <junfeng.guo@intel.com>; Zhang, Qi Z
> <qi.z.zhang@intel.com>; Wu, Jingjing <jingjing.wu@intel.com>
> Cc: ferruh.yigit@xilinx.com; dev@dpdk.org; Li, Xiaoyun
> <xiaoyun.li@intel.com>; awogbemila@google.com; Richardson, Bruce
> <bruce.richardson@intel.com>; Lin, Xueqin <xueqin.lin@intel.com>; Wang,
> Haiyue <haiyue.wang@intel.com>
> Subject: Re: [PATCH v5 3/8] net/gve: add support for device initialization
> 
> On 10/10/2022 11:17 AM, Junfeng Guo wrote:
> >
> > Support device init and add following devops skeleton:
> >   - dev_configure
> >   - dev_start
> >   - dev_stop
> >   - dev_close
> >
> > Note that build system (including doc) is also added in this patch.
> >
> > Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
> > Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com>
> > Signed-off-by: Junfeng Guo <junfeng.guo@intel.com>
> 
> <...>
> 
> > diff --git a/doc/guides/rel_notes/release_22_11.rst
> > b/doc/guides/rel_notes/release_22_11.rst
> > index fbb575255f..c1162ea1a4 100644
> > --- a/doc/guides/rel_notes/release_22_11.rst
> > +++ b/doc/guides/rel_notes/release_22_11.rst
> > @@ -200,6 +200,11 @@ New Features
> >     into single event containing ``rte_event_vector``
> >     whose event type is ``RTE_EVENT_TYPE_CRYPTODEV_VECTOR``.
> >
> > +* **Added GVE net PMD**
> > +
> > +  * Added the new ``gve`` net driver for Google Virtual Ethernet devices.
> > +  * See the :doc:`../nics/gve` NIC guide for more details on this new driver.
> > +
> >
> 
> Can you please move the block amaong the other ethdev drivers, as
> alphabetically sorted?
> 
> <...>
> 
> > +static int
> > +gve_dev_init(struct rte_eth_dev *eth_dev) {
> > +       struct gve_priv *priv = eth_dev->data->dev_private;
> > +       int max_tx_queues, max_rx_queues;
> > +       struct rte_pci_device *pci_dev;
> > +       struct gve_registers *reg_bar;
> > +       rte_be32_t *db_bar;
> > +       int err;
> > +
> > +       eth_dev->dev_ops = &gve_eth_dev_ops;
> > +
> > +       if (rte_eal_process_type() != RTE_PROC_PRIMARY)
> > +               return 0;
> > +
> > +       pci_dev = RTE_DEV_TO_PCI(eth_dev->device);
> > +
> > +       reg_bar = pci_dev->mem_resource[GVE_REG_BAR].addr;
> > +       if (!reg_bar) {
> > +               PMD_DRV_LOG(ERR, "Failed to map pci bar!");
> > +               return -ENOMEM;
> > +       }
> > +
> > +       db_bar = pci_dev->mem_resource[GVE_DB_BAR].addr;
> > +       if (!db_bar) {
> > +               PMD_DRV_LOG(ERR, "Failed to map doorbell bar!");
> > +               return -ENOMEM;
> > +       }
> > +
> > +       gve_write_version(&reg_bar->driver_version);
> > +       /* Get max queues to alloc etherdev */
> > +       max_tx_queues = ioread32be(&reg_bar->max_tx_queues);
> > +       max_rx_queues = ioread32be(&reg_bar->max_rx_queues);
> > +
> > +       priv->reg_bar0 = reg_bar;
> > +       priv->db_bar2 = db_bar;
> > +       priv->pci_dev = pci_dev;
> > +       priv->state_flags = 0x0;
> > +
> > +       priv->max_nb_txq = max_tx_queues;
> > +       priv->max_nb_rxq = max_rx_queues;
> > +
> > +       err = gve_init_priv(priv, false);
> > +       if (err)
> > +               return err;
> > +
> > +       eth_dev->data->mac_addrs = rte_zmalloc("gve_mac", sizeof(struct
> rte_ether_addr), 0);
> > +       if (!eth_dev->data->mac_addrs) {
> > +               PMD_DRV_LOG(ERR, "Failed to allocate memory to store mac
> address");
> > +               return -ENOMEM;
> > +       }
> > +       rte_ether_addr_copy(&priv->dev_addr,
> > + eth_dev->data->mac_addrs);
> > +
> 
> Is anything assinged to 'priv->dev_addr' to copy?
> Also since there is a 'priv->dev_addr' field, why not use it directly, instead of
> allocating memory for 'eth_dev->data->mac_addrs'?
> I mean why not "eth_dev->data->mac_addrs = &priv->dev_addr"?

Makes sense. There's no need to allocate new memory. @Guo, Junfeng, can you update this?
> 
> <...>
> 
> > +struct gve_priv {
> > +       struct gve_irq_db *irq_dbs; /* array of num_ntfy_blks */
> > +       const struct rte_memzone *irq_dbs_mz;
> > +       uint32_t mgmt_msix_idx;
> > +       rte_be32_t *cnt_array; /* array of num_event_counters */
> > +       const struct rte_memzone *cnt_array_mz;
> > +
> > +       uint16_t num_event_counters;
> > +       uint16_t tx_desc_cnt; /* txq size */
> > +       uint16_t rx_desc_cnt; /* rxq size */
> > +       uint16_t tx_pages_per_qpl; /* tx buffer length */
> > +       uint16_t rx_data_slot_cnt; /* rx buffer length */
> 
> These fields are not used in this patch, I guess some will be used in datapath
> patch.

This is needed for the base code gve_adminq.c, not for the datapath. Most of the fields in gve_priv are for gve_adminq.c.
The adminq updates this info, which the DPDK PMD will need later. The compiler will complain if these fields don't exist.

> 
> Can you please only add fields that is used in the patch? This way it will be
> clear in which functionality that field is used and enable to detect not used
> fields.
> We are accepting batch updates for base code, but this is dpdk related code,
> lets only add things that are used when they are used.
> Same for all data structures.
> 
> <...>
> 
> > diff --git a/drivers/net/gve/version.map b/drivers/net/gve/version.map
> > new file mode 100644 index 0000000000..c2e0723b4c
> > --- /dev/null
> > +++ b/drivers/net/gve/version.map
> > @@ -0,0 +1,3 @@
> > +DPDK_22 {
> 
> DPDK_23

^ permalink raw reply	[flat|nested] 192+ messages in thread

* RE: [PATCH v5 3/8] net/gve: add support for device initialization
  2022-10-19 13:47                   ` Ferruh Yigit
  2022-10-19 14:02                     ` Xia, Chenbo
  2022-10-19 14:24                     ` Zhang, Helin
@ 2022-10-19 16:20                     ` Li, Xiaoyun
  2 siblings, 0 replies; 192+ messages in thread
From: Li, Xiaoyun @ 2022-10-19 16:20 UTC (permalink / raw)
  To: Ferruh Yigit, Guo, Junfeng, Zhang, Qi Z, Wu, Jingjing,
	Maxime Coquelin, Xia, Chenbo
  Cc: dev, awogbemila, Richardson, Bruce, Lin, Xueqin, Wang, Haiyue,
	Zhang, Helin, Thomas Monjalon

Hi

> -----Original Message-----
> From: Ferruh Yigit <ferruh.yigit@amd.com>
> Sent: Wednesday, October 19, 2022 14:47
> To: Guo, Junfeng <junfeng.guo@intel.com>; Zhang, Qi Z
> <qi.z.zhang@intel.com>; Wu, Jingjing <jingjing.wu@intel.com>; Maxime
> Coquelin <maxime.coquelin@redhat.com>; Xia, Chenbo
> <chenbo.xia@intel.com>
> Cc: dev@dpdk.org; Li, Xiaoyun <xiaoyun.li@intel.com>;
> awogbemila@google.com; Richardson, Bruce <bruce.richardson@intel.com>;
> Lin, Xueqin <xueqin.lin@intel.com>; Wang, Haiyue
> <haiyue.wang@intel.com>; Zhang, Helin <helin.zhang@intel.com>; Thomas
> Monjalon <thomas@monjalon.net>
> Subject: Re: [PATCH v5 3/8] net/gve: add support for device initialization
> 
> On 10/10/2022 11:17 AM, Junfeng Guo wrote:
> 
> >
> > Support device init and add following devops skeleton:
> >   - dev_configure
> >   - dev_start
> >   - dev_stop
> >   - dev_close
> >
> > Note that build system (including doc) is also added in this patch.
> >
> > Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
> > Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com>
> > Signed-off-by: Junfeng Guo <junfeng.guo@intel.com>
> 
> <...>
> 
> > --- /dev/null
> > +++ b/doc/guides/nics/gve.rst
> > @@ -0,0 +1,63 @@
> > +..  SPDX-License-Identifier: BSD-3-Clause
> > +    Copyright(C) 2022 Intel Corporation.
> > +
> > +GVE poll mode driver
> > +=======================
> > +
> > +The GVE PMD (**librte_net_gve**) provides poll mode driver support
> > +for Google Virtual Ethernet device (also called as gVNIC).
> > +
> 
> This is a virtual device, emulated in VM as PCI device, right?

It's similar to the ENA driver for AWS. This is Google's own device for GCP.

> If so what emulates it, I mean can we use QEMU for it?
> And is there a kernel supported backend, as virtio has vhost?

No. The backend is not public.

> 
> > +Current gVNIC is an alternative to the virtIO-based ethernet
> > +interface that can support higher network bandwidths such as the 50-100
> Gbps speeds.
> 
> This is an alternative to virtio, and it would be good to document pros/cons of
> this device/approach, to help users to chose one or other.

In GCP, two types of device can be chosen for an instance's interfaces: virtio or gVNIC. If users choose gVNIC, the gve driver is needed; if they choose virtio, virtio is used.
I'm not sure we should suggest which one to choose.

I guess the best thing to do is just link Google's doc and let users choose for themselves:
https://cloud.google.com/compute/docs/networking/using-gvnic

> 
> Is "support higher network bandwidths" means this device is faster than
> virtio? Is there any performance report?
> Aren't there any other notable difference?

That wording is from https://cloud.google.com/compute/docs/networking/using-gvnic.
There is no official performance report; Google didn't provide any.

From our private testing, gVNIC is faster than virtio on the same instance type, but virtio is more stable.
Our testing could be wrong, though; it all depends on how you configure and use the cloud environment.

> 
> I think better to document as much as possible, cc'ed more virtio people.
> 


^ permalink raw reply	[flat|nested] 192+ messages in thread

* Re: [PATCH v5 1/8] net/gve/base: introduce GVE PMD base code
  2022-10-19 15:48                     ` Li, Xiaoyun
@ 2022-10-19 20:52                       ` Ferruh Yigit
  2022-10-20  8:50                         ` Li, Xiaoyun
  0 siblings, 1 reply; 192+ messages in thread
From: Ferruh Yigit @ 2022-10-19 20:52 UTC (permalink / raw)
  To: Li, Xiaoyun, Guo, Junfeng, Zhang, Qi Z, Wu, Jingjing,
	Hemant Agrawal, Stephen Hemminger
  Cc: dev, awogbemila, Richardson, Bruce, Lin, Xueqin, Wang, Haiyue

On 10/19/2022 4:48 PM, Li, Xiaoyun wrote:
> Hi
> 
>> -----Original Message-----
>> From: Ferruh Yigit <ferruh.yigit@amd.com>
>> Sent: Wednesday, October 19, 2022 14:45
>> To: Guo, Junfeng <junfeng.guo@intel.com>; Zhang, Qi Z
>> <qi.z.zhang@intel.com>; Wu, Jingjing <jingjing.wu@intel.com>; Hemant
>> Agrawal <hemant.agrawal@nxp.com>; Stephen Hemminger
>> <stephen@networkplumber.org>
>> Cc: dev@dpdk.org; Li, Xiaoyun <xiaoyun.li@intel.com>;
>> awogbemila@google.com; Richardson, Bruce <bruce.richardson@intel.com>;
>> Lin, Xueqin <xueqin.lin@intel.com>; Wang, Haiyue <haiyue.wang@intel.com>
>> Subject: Re: [PATCH v5 1/8] net/gve/base: introduce GVE PMD base code
>>
>> On 10/10/2022 11:17 AM, Junfeng Guo wrote:
>>> The following base code is based on Google Virtual Ethernet (gve)
>>> driver v1.3.0 under MIT license.
>>> - gve_adminq.c
>>> - gve_adminq.h
>>> - gve_desc.h
>>> - gve_desc_dqo.h
>>> - gve_register.h
>>> - gve.h
>>>
>>> The original code is in:
>>> https://github.com/GoogleCloudPlatform/compute-virtual-ethernet-linux/
>>> \
>>> tree/v1.3.0/google/gve
>>>
>>> Note that these code are not Intel files and they come from the kernel
>>> community. The base code there has the statement of
>>> SPDX-License-Identifier: (GPL-2.0 OR MIT). Here we just follow the
>>> required MIT license as an exception to DPDK.
>>>
>>> Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com>
>>> Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
>>> Signed-off-by: Junfeng Guo <junfeng.guo@intel.com>
>>
>> <...>
>>
>>> diff --git a/drivers/net/gve/base/gve.h b/drivers/net/gve/base/gve.h
>>> new file mode 100644 index 0000000000..1b0d59b639
>>> --- /dev/null
>>> +++ b/drivers/net/gve/base/gve.h
>>> @@ -0,0 +1,58 @@
>>> +/* SPDX-License-Identifier: MIT
>>> + * Google Virtual Ethernet (gve) driver
>>> + * Version: 1.3.0
>>
>> There is a version macro in the code, is version information required in the
>> file comment?
> 
> Different versions of gve kernel driver change a lot. So for reference, I think adding the version info for the base code makes sense. Just tell the following update people which version is used.
> 

No problem with adding the version to the base code; it is already there as
a macro:

  #define GVE_VERSION		"1.3.0"
  #define GVE_VERSION_PREFIX	"GVE-"

My question is whether to have it in the *file comment* or not, since it is
a duplicate and another thing to maintain (I wouldn't be surprised if, in
the future, someone updates the macro but not the file comment).

>>
>>> + * Copyright (C) 2015-2022 Google, Inc.
>>> + * Copyright(C) 2022 Intel Corporation
>>
>> I don't know if it is OK to add Intel copyright, as far as I know this requires big
>> enough contribution to the code, if this is copy of existing code, may be only
>> original copyright should exist.
> 
> It's not just directly copy. Directly copy like gve_desc.h doesn't have Intel copyright.
> But gve.h, in gve kernel driver, it has a lot of info dpdk doesn't need or dpdk has its own version like txq/rxq info.
> I'm not sure the contribution is a lot or not. But I suppose this patchset is following the principle that if the code is changed, intel copy right is added, otherwise, only google's copyright.
> 

Thanks, Xiaoyun, for confirming it is not a direct copy. Can someone on
your end check Intel's additions to the file and whether they justify the
copyright or not?

>>
>> cc'ed @Hemant and @Stephen for more comment.


^ permalink raw reply	[flat|nested] 192+ messages in thread

* Re: [PATCH v5 3/8] net/gve: add support for device initialization
  2022-10-19 15:59                     ` Li, Xiaoyun
@ 2022-10-19 21:00                       ` Ferruh Yigit
  2022-10-20  9:29                         ` Guo, Junfeng
  0 siblings, 1 reply; 192+ messages in thread
From: Ferruh Yigit @ 2022-10-19 21:00 UTC (permalink / raw)
  To: Li, Xiaoyun, Guo, Junfeng, Zhang, Qi Z, Wu, Jingjing
  Cc: ferruh.yigit, dev, awogbemila, Richardson, Bruce, Lin, Xueqin,
	Wang, Haiyue

On 10/19/2022 4:59 PM, Li, Xiaoyun wrote:

> 
> Hi
> 
>> -----Original Message-----
>> From: Ferruh Yigit <ferruh.yigit@amd.com>
>> Sent: Wednesday, October 19, 2022 14:46
>> To: Guo, Junfeng <junfeng.guo@intel.com>; Zhang, Qi Z
>> <qi.z.zhang@intel.com>; Wu, Jingjing <jingjing.wu@intel.com>
>> Cc: ferruh.yigit@xilinx.com; dev@dpdk.org; Li, Xiaoyun
>> <xiaoyun.li@intel.com>; awogbemila@google.com; Richardson, Bruce
>> <bruce.richardson@intel.com>; Lin, Xueqin <xueqin.lin@intel.com>; Wang,
>> Haiyue <haiyue.wang@intel.com>
>> Subject: Re: [PATCH v5 3/8] net/gve: add support for device initialization
>>
>> On 10/10/2022 11:17 AM, Junfeng Guo wrote:
>>>
>>> Support device init and add following devops skeleton:
>>>    - dev_configure
>>>    - dev_start
>>>    - dev_stop
>>>    - dev_close
>>>
>>> Note that build system (including doc) is also added in this patch.
>>>
>>> Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
>>> Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com>
>>> Signed-off-by: Junfeng Guo <junfeng.guo@intel.com>
>>
>> <...>
>>
>>> diff --git a/doc/guides/rel_notes/release_22_11.rst
>>> b/doc/guides/rel_notes/release_22_11.rst
>>> index fbb575255f..c1162ea1a4 100644
>>> --- a/doc/guides/rel_notes/release_22_11.rst
>>> +++ b/doc/guides/rel_notes/release_22_11.rst
>>> @@ -200,6 +200,11 @@ New Features
>>>      into single event containing ``rte_event_vector``
>>>      whose event type is ``RTE_EVENT_TYPE_CRYPTODEV_VECTOR``.
>>>
>>> +* **Added GVE net PMD**
>>> +
>>> +  * Added the new ``gve`` net driver for Google Virtual Ethernet devices.
>>> +  * See the :doc:`../nics/gve` NIC guide for more details on this new driver.
>>> +
>>>
>>
>> Can you please move the block amaong the other ethdev drivers, as
>> alphabetically sorted?
>>
>> <...>
>>
>>> +static int
>>> +gve_dev_init(struct rte_eth_dev *eth_dev) {
>>> +       struct gve_priv *priv = eth_dev->data->dev_private;
>>> +       int max_tx_queues, max_rx_queues;
>>> +       struct rte_pci_device *pci_dev;
>>> +       struct gve_registers *reg_bar;
>>> +       rte_be32_t *db_bar;
>>> +       int err;
>>> +
>>> +       eth_dev->dev_ops = &gve_eth_dev_ops;
>>> +
>>> +       if (rte_eal_process_type() != RTE_PROC_PRIMARY)
>>> +               return 0;
>>> +
>>> +       pci_dev = RTE_DEV_TO_PCI(eth_dev->device);
>>> +
>>> +       reg_bar = pci_dev->mem_resource[GVE_REG_BAR].addr;
>>> +       if (!reg_bar) {
>>> +               PMD_DRV_LOG(ERR, "Failed to map pci bar!");
>>> +               return -ENOMEM;
>>> +       }
>>> +
>>> +       db_bar = pci_dev->mem_resource[GVE_DB_BAR].addr;
>>> +       if (!db_bar) {
>>> +               PMD_DRV_LOG(ERR, "Failed to map doorbell bar!");
>>> +               return -ENOMEM;
>>> +       }
>>> +
>>> +       gve_write_version(&reg_bar->driver_version);
>>> +       /* Get max queues to alloc etherdev */
>>> +       max_tx_queues = ioread32be(&reg_bar->max_tx_queues);
>>> +       max_rx_queues = ioread32be(&reg_bar->max_rx_queues);
>>> +
>>> +       priv->reg_bar0 = reg_bar;
>>> +       priv->db_bar2 = db_bar;
>>> +       priv->pci_dev = pci_dev;
>>> +       priv->state_flags = 0x0;
>>> +
>>> +       priv->max_nb_txq = max_tx_queues;
>>> +       priv->max_nb_rxq = max_rx_queues;
>>> +
>>> +       err = gve_init_priv(priv, false);
>>> +       if (err)
>>> +               return err;
>>> +
>>> +       eth_dev->data->mac_addrs = rte_zmalloc("gve_mac", sizeof(struct
>> rte_ether_addr), 0);
>>> +       if (!eth_dev->data->mac_addrs) {
>>> +               PMD_DRV_LOG(ERR, "Failed to allocate memory to store mac
>> address");
>>> +               return -ENOMEM;
>>> +       }
>>> +       rte_ether_addr_copy(&priv->dev_addr,
>>> + eth_dev->data->mac_addrs);
>>> +
>>
>> Is anything assinged to 'priv->dev_addr' to copy?
>> Also since there is a 'priv->dev_addr' field, why not use it directly, instead of
>> allocating memory for 'eth_dev->data->mac_addrs'?
>> I mean why not "eth_dev->data->mac_addrs = &priv->dev_addr"?
> 
> Makes sense. There's no need to allocate a new memory. @Guo, Junfeng Can you update this?
>>
>> <...>
>>
>>> +struct gve_priv {
>>> +       struct gve_irq_db *irq_dbs; /* array of num_ntfy_blks */
>>> +       const struct rte_memzone *irq_dbs_mz;
>>> +       uint32_t mgmt_msix_idx;
>>> +       rte_be32_t *cnt_array; /* array of num_event_counters */
>>> +       const struct rte_memzone *cnt_array_mz;
>>> +
>>> +       uint16_t num_event_counters;
>>> +       uint16_t tx_desc_cnt; /* txq size */
>>> +       uint16_t rx_desc_cnt; /* rxq size */
>>> +       uint16_t tx_pages_per_qpl; /* tx buffer length */
>>> +       uint16_t rx_data_slot_cnt; /* rx buffer length */
>>
>> These fields are not used in this patch, I guess some will be used in datapath
>> patch.
> 
> This is needed for base code gve_adminq.c not for datapath. Most of the stuff in gve_priv is for gve_adminq.c.
> The adminq will update this info which dpdk pmd will need later. Compiler will complain if these don't exsit.
> 

You are right, they are used by 'gve_adminq.c', so it is OK to keep them.
If there are any that are not used at this stage, can you add them whenever
they are used, or remove them if they are not used at all? If all are
used/required, no change is needed.

>>
>> Can you please only add fields that is used in the patch? This way it will be
>> clear in which functionality that field is used and enable to detect not used
>> fields.
>> We are accepting batch updates for base code, but this is dpdk related code,
>> lets only add things that are used when they are used.
>> Same for all data structures.
>>
>> <...>
>>
>>> diff --git a/drivers/net/gve/version.map b/drivers/net/gve/version.map
>>> new file mode 100644 index 0000000000..c2e0723b4c
>>> --- /dev/null
>>> +++ b/drivers/net/gve/version.map
>>> @@ -0,0 +1,3 @@
>>> +DPDK_22 {
>>
>> DPDK_23


^ permalink raw reply	[flat|nested] 192+ messages in thread

* Re: [PATCH v5 3/8] net/gve: add support for device initialization
  2022-10-19 14:24                     ` Zhang, Helin
@ 2022-10-19 21:16                       ` Ferruh Yigit
  0 siblings, 0 replies; 192+ messages in thread
From: Ferruh Yigit @ 2022-10-19 21:16 UTC (permalink / raw)
  To: Zhang, Helin, Guo, Junfeng, Zhang, Qi Z, Wu, Jingjing,
	Maxime Coquelin, Xia, Chenbo, Xiaoyun Li
  Cc: dev, Li, Xiaoyun, awogbemila, Richardson, Bruce, Lin, Xueqin,
	Wang, Haiyue, Thomas Monjalon

On 10/19/2022 3:24 PM, Zhang, Helin wrote:
> 
> 
>> -----Original Message-----
>> From: Ferruh Yigit <ferruh.yigit@amd.com>
>> Sent: Wednesday, October 19, 2022 9:47 PM
>> To: Guo, Junfeng <junfeng.guo@intel.com>; Zhang, Qi Z <qi.z.zhang@intel.com>; Wu, Jingjing <jingjing.wu@intel.com>; Maxime Coquelin <maxime.coquelin@redhat.com>; Xia, Chenbo <chenbo.xia@intel.com>
>> Cc: dev@dpdk.org; Li, Xiaoyun <xiaoyun.li@intel.com>; awogbemila@google.com; Richardson, Bruce <bruce.richardson@intel.com>; Lin, Xueqin <xueqin.lin@intel.com>; Wang, Haiyue <haiyue.wang@intel.com>; Zhang, Helin <helin.zhang@intel.com>; Thomas Monjalon <thomas@monjalon.net>
>> Subject: Re: [PATCH v5 3/8] net/gve: add support for device initialization
> 
>> On 10/10/2022 11:17 AM, Junfeng Guo wrote:
> 
>>>
>>> Support device init and add following devops skeleton:
>>>    - dev_configure
>>>    - dev_start
>>>    - dev_stop
>>>    - dev_close
>>>
>>> Note that build system (including doc) is also added in this patch.
>>>
>>> Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
>>> Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com>
>>> Signed-off-by: Junfeng Guo <junfeng.guo@intel.com>
>>
>> <...>
>>
>>> --- /dev/null
>>> +++ b/doc/guides/nics/gve.rst
>>> @@ -0,0 +1,63 @@
>>> +..  SPDX-License-Identifier: BSD-3-Clause
>>> +    Copyright(C) 2022 Intel Corporation.
>>> +
>>> +GVE poll mode driver
>>> +=======================
>>> +
>>> +The GVE PMD (**librte_net_gve**) provides poll mode driver support
>>> +for Google Virtual Ethernet device (also called as gVNIC).
>>> +
>>
>> This is a virtual device, emulated in VM as PCI device, right?
>> If so what emulates it, I mean can we use QEMU for it?
>> And is there a kernel supported backend, as virtio has vhost?
> This is a virtual interface only provided on Google Cloud Platform (GCP), like ena on AWS, and virtio on Alibaba Cloud, etc.
> The gve (gVNIC) is the standard virtual ethernet interface provided to users when anyone buys a cloud instance on GCP, which is ready there and everybody can access it.
> The backend details are not open to anyone outside of Google, I assume it is more like a business & technical secret of Google.
>>
>>> +Current gVNIC is an alternative to the virtIO-based ethernet
>>> +interface that can support higher network bandwidths such as the 50-100 Gbps speeds.
> I don't think it is an alternative to virtio. Gve is the driver name of gVNIC of GCP, which is the standard virtual ethernet interface on that cloud platform.
> 

Thanks Helin, Chenbo, Xiaoyun for the clarification.

My understanding that gVNIC is another virtual interface spec like 'virtio'
was not correct, so comparing it with virtio in the documentation is not
really necessary.

Perhaps the "gVNIC is an alternative to the virtIO-based ethernet interface"
part of the documentation can be updated slightly, because as far as I
understand gVNIC is an alternative to virtio on the *Google Cloud
Platform*, not a general alternative.

Overall, I agree with enabling the virtual network interface in Google
Cloud, and as you all highlighted, we have similar device support for other
cloud platforms.

Thanks,
ferruh

>>
>> This is an alternative to virtio, and it would be good to document pros/cons of this device/approach, to help users to chose one or other.
>>
>> Is "support higher network bandwidths" means this device is faster than virtio? Is there any performance report?
> There is no hint that gve (or virtio) interface is faster than virtio (or gve). I think it heavily depends on the backend design, which can be SW or HW.
> I would treat gve (actually the driver for gVNIC) as just one of the leading virtual ethernet interfaces on different cloud environments. Till now, gve is for GCP cloud environment only.
> 
> Hopefully my understanding is correct, as I am not the expert at Google cloud, and I got all the information from public.
> 
> Thanks for all the good questions! Hopefully my answers help!
> 
> Regards,
> Helin
> 
>> Aren't there any other notable difference?
>>
>> I think better to document as much as possible, cc'ed more virtio people.
> 
> 


^ permalink raw reply	[flat|nested] 192+ messages in thread

* RE: [PATCH v5 1/8] net/gve/base: introduce GVE PMD base code
  2022-10-19 15:18                       ` Ferruh Yigit
@ 2022-10-20  3:33                         ` Hemant Agrawal
  0 siblings, 0 replies; 192+ messages in thread
From: Hemant Agrawal @ 2022-10-20  3:33 UTC (permalink / raw)
  To: Ferruh Yigit, Junfeng Guo, qi.z.zhang, jingjing.wu, Stephen Hemminger
  Cc: dev, xiaoyun.li, awogbemila, bruce.richardson, xueqin.lin, Haiyue Wang

> On 10/19/2022 4:13 PM, Hemant Agrawal wrote:
> >> On 10/10/2022 11:17 AM, Junfeng Guo wrote:
> >>> The following base code is based on Google Virtual Ethernet (gve)
> >>> driver v1.3.0 under MIT license.
> >>> - gve_adminq.c
> >>> - gve_adminq.h
> >>> - gve_desc.h
> >>> - gve_desc_dqo.h
> >>> - gve_register.h
> >>> - gve.h
> >>>
> >>> The original code is in:
> >>> https://github.com/GoogleCloudPlatform/compute-virtual-ethernet-linux/
> >>> tree/v1.3.0/google/gve
> >>>
> >>> Note that these code are not Intel files and they come from the
> >>> kernel community. The base code there has the statement of
> >>> SPDX-License-Identifier: (GPL-2.0 OR MIT). Here we just follow the
> >>> required MIT license as an exception to DPDK.
> >>>
> >>> Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com>
> >>> Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
> >>> Signed-off-by: Junfeng Guo <junfeng.guo@intel.com>
> >>
> >> <...>
> >>
> >>> diff --git a/drivers/net/gve/base/gve.h b/drivers/net/gve/base/gve.h
> >>> new file mode 100644 index 0000000000..1b0d59b639
> >>> --- /dev/null
> >>> +++ b/drivers/net/gve/base/gve.h
> >>> @@ -0,0 +1,58 @@
> >>> +/* SPDX-License-Identifier: MIT
> >>> + * Google Virtual Ethernet (gve) driver
> >>> + * Version: 1.3.0
> >>
> >> There is a version macro in the code, is version information required
> >> in the file comment?
> >>
> >>> + * Copyright (C) 2015-2022 Google, Inc.
> >>> + * Copyright(C) 2022 Intel Corporation
> >>
> >> I don't know if it is OK to add Intel copyright, as far as I know
> >> this requires big enough contribution to the code, if this is copy of
> >> existing code, may be only original copyright should exist.
> >>
> > [Hemant] Yes, the general guideline is that one should add their copyright if
> they have big enough contribution.  But at the end it is a guideline - not the
> rule.
> > It is up-to the original copyright holder to object.
> 
> Does this mean as long as original copyright holder did not object, it is OK to
> add more copyright?
> I don't think they are represented or aware of it this change at all, I believe
> we (as community) also have responsibility to make these things correct, in
> our capacity.

[Hemant] I tried to convey the same thing in polite words.
Yes, it is incorrect to add a copyright without a major contribution.
The Intel team should provide details about their contribution on top of the original code, or they should remove their copyright.

> >
> >> cc'ed @Hemant and @Stephen for more comment.


^ permalink raw reply	[flat|nested] 192+ messages in thread

* RE: [PATCH v5 1/8] net/gve/base: introduce GVE PMD base code
  2022-10-19 20:52                       ` Ferruh Yigit
@ 2022-10-20  8:50                         ` Li, Xiaoyun
  0 siblings, 0 replies; 192+ messages in thread
From: Li, Xiaoyun @ 2022-10-20  8:50 UTC (permalink / raw)
  To: Ferruh Yigit, Guo, Junfeng, Zhang, Qi Z, Wu, Jingjing,
	Hemant Agrawal, Stephen Hemminger
  Cc: dev, awogbemila, Richardson, Bruce, Lin, Xueqin, Wang, Haiyue

> -----Original Message-----
> From: Ferruh Yigit <ferruh.yigit@amd.com>
> Sent: Wednesday, October 19, 2022 21:53
> To: Li, Xiaoyun <xiaoyun.li@intel.com>; Guo, Junfeng
> <junfeng.guo@intel.com>; Zhang, Qi Z <qi.z.zhang@intel.com>; Wu, Jingjing
> <jingjing.wu@intel.com>; Hemant Agrawal <hemant.agrawal@nxp.com>;
> Stephen Hemminger <stephen@networkplumber.org>
> Cc: dev@dpdk.org; awogbemila@google.com; Richardson, Bruce
> <bruce.richardson@intel.com>; Lin, Xueqin <xueqin.lin@intel.com>; Wang,
> Haiyue <haiyue.wang@intel.com>
> Subject: Re: [PATCH v5 1/8] net/gve/base: introduce GVE PMD base code
> 
> On 10/19/2022 4:48 PM, Li, Xiaoyun wrote:
> > Hi
> >
> >> -----Original Message-----
> >> From: Ferruh Yigit <ferruh.yigit@amd.com>
> >> Sent: Wednesday, October 19, 2022 14:45
> >> To: Guo, Junfeng <junfeng.guo@intel.com>; Zhang, Qi Z
> >> <qi.z.zhang@intel.com>; Wu, Jingjing <jingjing.wu@intel.com>; Hemant
> >> Agrawal <hemant.agrawal@nxp.com>; Stephen Hemminger
> >> <stephen@networkplumber.org>
> >> Cc: dev@dpdk.org; Li, Xiaoyun <xiaoyun.li@intel.com>;
> >> awogbemila@google.com; Richardson, Bruce
> >> <bruce.richardson@intel.com>; Lin, Xueqin <xueqin.lin@intel.com>;
> >> Wang, Haiyue <haiyue.wang@intel.com>
> >> Subject: Re: [PATCH v5 1/8] net/gve/base: introduce GVE PMD base code
> >>
> >> On 10/10/2022 11:17 AM, Junfeng Guo wrote:
> >>> The following base code is based on Google Virtual Ethernet (gve)
> >>> driver v1.3.0 under MIT license.
> >>> - gve_adminq.c
> >>> - gve_adminq.h
> >>> - gve_desc.h
> >>> - gve_desc_dqo.h
> >>> - gve_register.h
> >>> - gve.h
> >>>
> >>> The original code is in:
> >>> https://github.com/GoogleCloudPlatform/compute-virtual-ethernet-linux/
> >>> \
> >>> tree/v1.3.0/google/gve
> >>>
> >>> Note that these code are not Intel files and they come from the
> >>> kernel community. The base code there has the statement of
> >>> SPDX-License-Identifier: (GPL-2.0 OR MIT). Here we just follow the
> >>> required MIT license as an exception to DPDK.
> >>>
> >>> Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com>
> >>> Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
> >>> Signed-off-by: Junfeng Guo <junfeng.guo@intel.com>
> >>
> >> <...>
> >>
> >>> diff --git a/drivers/net/gve/base/gve.h b/drivers/net/gve/base/gve.h
> >>> new file mode 100644 index 0000000000..1b0d59b639
> >>> --- /dev/null
> >>> +++ b/drivers/net/gve/base/gve.h
> >>> @@ -0,0 +1,58 @@
> >>> +/* SPDX-License-Identifier: MIT
> >>> + * Google Virtual Ethernet (gve) driver
> >>> + * Version: 1.3.0
> >>
> >> There is a version macro in the code, is version information required
> >> in the file comment?
> >
> > Different versions of the gve kernel driver change a lot. So for reference, I
> think adding the version info to the base code makes sense. It just tells
> whoever updates the code later which version was used.
> >
> 
> No problem to add version to base code, it is already in the code as macro:
> 
>   #define GVE_VERSION		"1.3.0"
>   #define GVE_VERSION_PREFIX	"GVE-"
> 
> My question is whether to have it in the *file comment* or not, since it is a duplicate
> and another thing to maintain (I won't be surprised if in the future someone
> updates the macro but not the file comment).

Forgot about this one. Then it's better to remove it from the copyright block. @Guo, Junfeng
> 
> >>
> >>> + * Copyright (C) 2015-2022 Google, Inc.
> >>> + * Copyright(C) 2022 Intel Corporation
> >>
> >> I don't know if it is OK to add Intel copyright, as far as I know
> >> this requires big enough contribution to the code, if this is copy of
> >> existing code, may be only original copyright should exist.
> >
> > It's not just a direct copy. Direct copies like gve_desc.h don't have the Intel
> copyright.
> > But gve.h in the gve kernel driver has a lot of info DPDK doesn't need, or for
> which DPDK has its own version, like the txq/rxq info.
> > I'm not sure whether the contribution is a lot or not. But I suppose this patchset is
> following the principle that if the code is changed, the Intel copyright is added;
> otherwise, only Google's copyright.
> >
> 
> Thanks Xiaoyun for confirming it is not a direct copy. Can someone on your end
> check Intel's additions to the file, and whether they justify the copyright or not?

@Guo, Junfeng Can you update this? I think you can ask for Qi's opinion.
If people want to keep the copyright, add one short sentence under it to explain what has changed.

In my opinion, there are many changes, but those changes are only for adapting the kernel code to DPDK.
I'm not sure whether that counts as a significant contribution, and I think Intel's copyright can be removed for the base code.
After all, Intel's main contribution is the ethdev part and the DPDK datapath.

> 
> >>
> >> cc'ed @Hemant and @Stephen for more comment.


^ permalink raw reply	[flat|nested] 192+ messages in thread

* RE: [PATCH v5 3/8] net/gve: add support for device initialization
  2022-10-19 21:00                       ` Ferruh Yigit
@ 2022-10-20  9:29                         ` Guo, Junfeng
  2022-10-20 11:15                           ` Ferruh Yigit
  0 siblings, 1 reply; 192+ messages in thread
From: Guo, Junfeng @ 2022-10-20  9:29 UTC (permalink / raw)
  To: Ferruh Yigit, Li, Xiaoyun, Zhang, Qi Z, Wu, Jingjing
  Cc: ferruh.yigit, dev, awogbemila, Richardson,  Bruce, Lin, Xueqin,
	Wang, Haiyue



> -----Original Message-----
> From: Ferruh Yigit <ferruh.yigit@amd.com>
> Sent: Thursday, October 20, 2022 05:01
> To: Li, Xiaoyun <xiaoyun.li@intel.com>; Guo, Junfeng
> <junfeng.guo@intel.com>; Zhang, Qi Z <qi.z.zhang@intel.com>; Wu,
> Jingjing <jingjing.wu@intel.com>
> Cc: ferruh.yigit@xilinx.com; dev@dpdk.org; awogbemila@google.com;
> Richardson, Bruce <bruce.richardson@intel.com>; Lin, Xueqin
> <xueqin.lin@intel.com>; Wang, Haiyue <haiyue.wang@intel.com>
> Subject: Re: [PATCH v5 3/8] net/gve: add support for device initialization
> 
> On 10/19/2022 4:59 PM, Li, Xiaoyun wrote:
> 
> >
> > Hi
> >
> >> -----Original Message-----
> >> From: Ferruh Yigit <ferruh.yigit@amd.com>
> >> Sent: Wednesday, October 19, 2022 14:46
> >> To: Guo, Junfeng <junfeng.guo@intel.com>; Zhang, Qi Z
> >> <qi.z.zhang@intel.com>; Wu, Jingjing <jingjing.wu@intel.com>
> >> Cc: ferruh.yigit@xilinx.com; dev@dpdk.org; Li, Xiaoyun
> >> <xiaoyun.li@intel.com>; awogbemila@google.com; Richardson, Bruce
> >> <bruce.richardson@intel.com>; Lin, Xueqin <xueqin.lin@intel.com>;
> Wang,
> >> Haiyue <haiyue.wang@intel.com>
> >> Subject: Re: [PATCH v5 3/8] net/gve: add support for device
> initialization
> >>
> >> On 10/10/2022 11:17 AM, Junfeng Guo wrote:
> >>>
> >>> Support device init and add following devops skeleton:
> >>>    - dev_configure
> >>>    - dev_start
> >>>    - dev_stop
> >>>    - dev_close
> >>>
> >>> Note that build system (including doc) is also added in this patch.
> >>>
> >>> Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
> >>> Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com>
> >>> Signed-off-by: Junfeng Guo <junfeng.guo@intel.com>
> >>
> >> <...>
> >>
> >>> diff --git a/doc/guides/rel_notes/release_22_11.rst
> >>> b/doc/guides/rel_notes/release_22_11.rst
> >>> index fbb575255f..c1162ea1a4 100644
> >>> --- a/doc/guides/rel_notes/release_22_11.rst
> >>> +++ b/doc/guides/rel_notes/release_22_11.rst
> >>> @@ -200,6 +200,11 @@ New Features
> >>>      into single event containing ``rte_event_vector``
> >>>      whose event type is ``RTE_EVENT_TYPE_CRYPTODEV_VECTOR``.
> >>>
> >>> +* **Added GVE net PMD**
> >>> +
> >>> +  * Added the new ``gve`` net driver for Google Virtual Ethernet
> devices.
> >>> +  * See the :doc:`../nics/gve` NIC guide for more details on this new
> driver.
> >>> +
> >>>
> >>
> >> Can you please move the block among the other ethdev drivers, keeping them
> >> alphabetically sorted?
> >>
> >> <...>
> >>
> >>> +static int
> >>> +gve_dev_init(struct rte_eth_dev *eth_dev) {
> >>> +       struct gve_priv *priv = eth_dev->data->dev_private;
> >>> +       int max_tx_queues, max_rx_queues;
> >>> +       struct rte_pci_device *pci_dev;
> >>> +       struct gve_registers *reg_bar;
> >>> +       rte_be32_t *db_bar;
> >>> +       int err;
> >>> +
> >>> +       eth_dev->dev_ops = &gve_eth_dev_ops;
> >>> +
> >>> +       if (rte_eal_process_type() != RTE_PROC_PRIMARY)
> >>> +               return 0;
> >>> +
> >>> +       pci_dev = RTE_DEV_TO_PCI(eth_dev->device);
> >>> +
> >>> +       reg_bar = pci_dev->mem_resource[GVE_REG_BAR].addr;
> >>> +       if (!reg_bar) {
> >>> +               PMD_DRV_LOG(ERR, "Failed to map pci bar!");
> >>> +               return -ENOMEM;
> >>> +       }
> >>> +
> >>> +       db_bar = pci_dev->mem_resource[GVE_DB_BAR].addr;
> >>> +       if (!db_bar) {
> >>> +               PMD_DRV_LOG(ERR, "Failed to map doorbell bar!");
> >>> +               return -ENOMEM;
> >>> +       }
> >>> +
> >>> +       gve_write_version(&reg_bar->driver_version);
> >>> +       /* Get max queues to alloc etherdev */
> >>> +       max_tx_queues = ioread32be(&reg_bar->max_tx_queues);
> >>> +       max_rx_queues = ioread32be(&reg_bar->max_rx_queues);
> >>> +
> >>> +       priv->reg_bar0 = reg_bar;
> >>> +       priv->db_bar2 = db_bar;
> >>> +       priv->pci_dev = pci_dev;
> >>> +       priv->state_flags = 0x0;
> >>> +
> >>> +       priv->max_nb_txq = max_tx_queues;
> >>> +       priv->max_nb_rxq = max_rx_queues;
> >>> +
> >>> +       err = gve_init_priv(priv, false);
> >>> +       if (err)
> >>> +               return err;
> >>> +
> >>> +       eth_dev->data->mac_addrs = rte_zmalloc("gve_mac",
> sizeof(struct
> >> rte_ether_addr), 0);
> >>> +       if (!eth_dev->data->mac_addrs) {
> >>> +               PMD_DRV_LOG(ERR, "Failed to allocate memory to store
> mac
> >> address");
> >>> +               return -ENOMEM;
> >>> +       }
> >>> +       rte_ether_addr_copy(&priv->dev_addr,
> >>> + eth_dev->data->mac_addrs);
> >>> +
> >>
> >> Is anything assigned to 'priv->dev_addr' to copy?
> >> Also, since there is a 'priv->dev_addr' field, why not use it directly instead of
> >> allocating memory for 'eth_dev->data->mac_addrs'?
> >> I mean, why not "eth_dev->data->mac_addrs = &priv->dev_addr"?
> >
> > Makes sense. There's no need to allocate new memory. @Guo,
> Junfeng Can you update this?

Thanks Xiaoyun and Ferruh for the comments!
I tried to update the code as suggested but got an "Invalid Memory"
warning when quitting testpmd. I found it was caused by
rte_eth_dev_release_port() calling " rte_free(eth_dev->data->mac_addrs); ".
It seems that allocating memory for 'eth_dev->data->mac_addrs' is still
needed. Please correct me if I misunderstood this. Thanks! I'll keep
this part unchanged for the coming patchset for now.
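
For reference, keeping the allocation means the teardown path stays as-is,
because rte_eth_dev_release_port() hands 'mac_addrs' to rte_free(), which only
works for rte_malloc'd memory. A sketch based on the v5 code (error message
unchanged):

	/* Must come from rte_malloc so that rte_eth_dev_release_port()
	 * can safely rte_free() it on teardown.
	 */
	eth_dev->data->mac_addrs = rte_zmalloc("gve_mac",
					       sizeof(struct rte_ether_addr), 0);
	if (eth_dev->data->mac_addrs == NULL) {
		PMD_DRV_LOG(ERR, "Failed to allocate memory to store mac address");
		return -ENOMEM;
	}
	rte_ether_addr_copy(&priv->dev_addr, eth_dev->data->mac_addrs);

If we did point 'mac_addrs' at &priv->dev_addr instead, we would probably have
to clear the pointer in dev_close()/uninit before the ethdev layer releases the
port, to avoid that rte_free() warning; keeping the small allocation seems
simpler.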

> >>
> >> <...>
> >>
> >>> +struct gve_priv {
> >>> +       struct gve_irq_db *irq_dbs; /* array of num_ntfy_blks */
> >>> +       const struct rte_memzone *irq_dbs_mz;
> >>> +       uint32_t mgmt_msix_idx;
> >>> +       rte_be32_t *cnt_array; /* array of num_event_counters */
> >>> +       const struct rte_memzone *cnt_array_mz;
> >>> +
> >>> +       uint16_t num_event_counters;
> >>> +       uint16_t tx_desc_cnt; /* txq size */
> >>> +       uint16_t rx_desc_cnt; /* rxq size */
> >>> +       uint16_t tx_pages_per_qpl; /* tx buffer length */
> >>> +       uint16_t rx_data_slot_cnt; /* rx buffer length */
> >>
> >> These fields are not used in this patch; I guess some will be used in the
> >> datapath patch.
> >
> > This is needed for the base code gve_adminq.c, not for the datapath. Most of
> the stuff in gve_priv is for gve_adminq.c.
> > The adminq will update this info, which the DPDK PMD will need later. The
> compiler will complain if these don't exist.
> >
> 
> You are right, they are used by 'gve_adminq.c', so it is OK to keep them. If
> there are ones not used at this stage, can you add them whenever they
> are used, or remove them if not used at all? If all are used/required, no
> change is required.

Yes, we have already tried to move all the unused items to the corresponding
stages, patch by patch. Thanks for the reminder!

> 
> >>
> >> Can you please only add fields that are used in the patch? This way it will be
> >> clear in which functionality each field is used and make it possible to detect
> >> unused fields.
> >> We are accepting batch updates for the base code, but this is DPDK-related
> >> code, so let's only add things when they are used.
> >> Same for all data structures.
> >>
> >> <...>
> >>
> >>> diff --git a/drivers/net/gve/version.map
> b/drivers/net/gve/version.map
> >>> new file mode 100644 index 0000000000..c2e0723b4c
> >>> --- /dev/null
> >>> +++ b/drivers/net/gve/version.map
> >>> @@ -0,0 +1,3 @@
> >>> +DPDK_22 {
> >>
> >> DPDK_23


^ permalink raw reply	[flat|nested] 192+ messages in thread

* RE: [PATCH v5 6/8] net/gve: add support for dev info get and dev configure
  2022-10-19 13:49                   ` Ferruh Yigit
@ 2022-10-20  9:29                     ` Guo, Junfeng
  2022-10-20 11:19                       ` Ferruh Yigit
  0 siblings, 1 reply; 192+ messages in thread
From: Guo, Junfeng @ 2022-10-20  9:29 UTC (permalink / raw)
  To: Ferruh Yigit, Zhang, Qi Z, Wu, Jingjing
  Cc: ferruh.yigit, dev, Li, Xiaoyun, awogbemila, Richardson, Bruce,
	Lin, Xueqin



> -----Original Message-----
> From: Ferruh Yigit <ferruh.yigit@amd.com>
> Sent: Wednesday, October 19, 2022 21:49
> To: Guo, Junfeng <junfeng.guo@intel.com>; Zhang, Qi Z
> <qi.z.zhang@intel.com>; Wu, Jingjing <jingjing.wu@intel.com>
> Cc: ferruh.yigit@xilinx.com; dev@dpdk.org; Li, Xiaoyun
> <xiaoyun.li@intel.com>; awogbemila@google.com; Richardson, Bruce
> <bruce.richardson@intel.com>; Lin, Xueqin <xueqin.lin@intel.com>
> Subject: Re: [PATCH v5 6/8] net/gve: add support for dev info get and dev
> configure
> 
> On 10/10/2022 11:17 AM, Junfeng Guo wrote:
> 
> >
> > Add dev_ops dev_infos_get.
> > Complete dev_configure with RX offloads configuration.
> >
> > Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com>
> > Signed-off-by: Junfeng Guo <junfeng.guo@intel.com>
> 
> <...>
> 
> > --- a/drivers/net/gve/gve_ethdev.c
> > +++ b/drivers/net/gve/gve_ethdev.c
> > @@ -29,8 +29,16 @@ gve_write_version(uint8_t
> *driver_version_register)
> >   }
> >
> >   static int
> > -gve_dev_configure(__rte_unused struct rte_eth_dev *dev)
> > +gve_dev_configure(struct rte_eth_dev *dev)
> >   {
> > +       struct gve_priv *priv = dev->data->dev_private;
> > +
> > +       if (dev->data->dev_conf.rxmode.mq_mode &
> RTE_ETH_MQ_RX_RSS_FLAG)
> > +               dev->data->dev_conf.rxmode.offloads |=
> RTE_ETH_RX_OFFLOAD_RSS_HASH;
> > +
> 
> This is force enabling the feature. We do this for PMDs that have
> the hash value anyway, where no additional work or performance loss is
> observed when enabling this offload. Otherwise drivers shouldn't update
> 'dev_conf.rxmode'.
> 
> Can you please confirm this PMD fits the above description? And can you
> please add a comment that says the feature is being force enabled?

Yes, it seems force-enabling this offload is not quite reasonable here.
This was probably just following previous PMD conventions, so we decided to remove
this part in the coming version. Thanks!
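
So for the coming version the configure callback would look roughly like the
sketch below: the forced RSS_HASH offload is gone and only the LRO request
from the application is honoured (everything else unchanged from v5):

static int
gve_dev_configure(struct rte_eth_dev *dev)
{
	struct gve_priv *priv = dev->data->dev_private;

	/* Do not force-enable RTE_ETH_RX_OFFLOAD_RSS_HASH here; only act on
	 * what the application actually requested.
	 */
	if (dev->data->dev_conf.rxmode.offloads & RTE_ETH_RX_OFFLOAD_TCP_LRO)
		priv->enable_rsc = 1;

	return 0;
}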

> 
> > +       if (dev->data->dev_conf.rxmode.offloads &
> RTE_ETH_RX_OFFLOAD_TCP_LRO)
> > +               priv->enable_rsc = 1;
> > +
> >          return 0;
> >   }
> >
> > @@ -94,6 +102,60 @@ gve_dev_close(struct rte_eth_dev *dev)
> >          return err;
> >   }
> >
> > +static int
> > +gve_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info
> *dev_info)
> > +{
> > +       struct gve_priv *priv = dev->data->dev_private;
> > +
> > +       dev_info->device = dev->device;
> > +       dev_info->max_mac_addrs = 1;
> > +       dev_info->max_rx_queues = priv->max_nb_rxq;
> > +       dev_info->max_tx_queues = priv->max_nb_txq;
> > +       dev_info->min_rx_bufsize = GVE_MIN_BUF_SIZE;
> > +       dev_info->max_rx_pktlen = GVE_MAX_RX_PKTLEN;
> > +       dev_info->max_mtu = RTE_ETHER_MTU;
> 
> Can you please confirm that the max MTU this PMD supports is 1500? Meaning it
> doesn't support jumbo frames, etc.?

Actually, this is just a workaround for the max_mtu info.
We can only get the max_mtu value via an adminq message from the backend,
but the real value we get (i.e., priv->max_mtu) is 1460, which is less than 1500.
It seems to be a GCP bug or something similar.
If we use "dev_info->max_mtu = priv->max_mtu", testpmd cannot even
be launched successfully.
I'll keep this part unchanged, with some comments here, unless there is a better solution.
Please correct me if you have any other idea. Thanks a lot!
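
For now the plan is to keep the workaround together with an explanatory
comment, along these lines (a sketch, not the final wording):

	/* Workaround: the max MTU reported by the backend via the adminq
	 * (priv->max_mtu) is 1460, i.e. below RTE_ETHER_MTU, which breaks
	 * testpmd startup, so advertise the standard Ethernet MTU here.
	 */
	dev_info->max_mtu = RTE_ETHER_MTU;
	dev_info->min_mtu = RTE_ETHER_MIN_MTU;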

> 
> > +       dev_info->min_mtu = RTE_ETHER_MIN_MTU;
> > +
> > +       dev_info->rx_offload_capa = 0;
> > +       dev_info->tx_offload_capa =
> > +               RTE_ETH_TX_OFFLOAD_MULTI_SEGS   |
> > +               RTE_ETH_TX_OFFLOAD_IPV4_CKSUM   |
> > +               RTE_ETH_TX_OFFLOAD_UDP_CKSUM    |
> > +               RTE_ETH_TX_OFFLOAD_TCP_CKSUM    |
> > +               RTE_ETH_TX_OFFLOAD_SCTP_CKSUM   |
> > +               RTE_ETH_TX_OFFLOAD_TCP_TSO;
> 
> Can you advertise these capabilities in the patch that implements them?

Will move this to the corresponding patch, thanks!


^ permalink raw reply	[flat|nested] 192+ messages in thread

* RE: [PATCH v5 8/8] net/gve: add support for Rx/Tx
  2022-10-19 13:47                   ` Ferruh Yigit
@ 2022-10-20  9:34                     ` Guo, Junfeng
  0 siblings, 0 replies; 192+ messages in thread
From: Guo, Junfeng @ 2022-10-20  9:34 UTC (permalink / raw)
  To: Ferruh Yigit, Zhang, Qi Z, Wu, Jingjing
  Cc: ferruh.yigit, dev, Li, Xiaoyun, awogbemila, Richardson, Bruce,
	Lin, Xueqin



> -----Original Message-----
> From: Ferruh Yigit <ferruh.yigit@amd.com>
> Sent: Wednesday, October 19, 2022 21:47
> To: Guo, Junfeng <junfeng.guo@intel.com>; Zhang, Qi Z
> <qi.z.zhang@intel.com>; Wu, Jingjing <jingjing.wu@intel.com>
> Cc: ferruh.yigit@xilinx.com; dev@dpdk.org; Li, Xiaoyun
> <xiaoyun.li@intel.com>; awogbemila@google.com; Richardson, Bruce
> <bruce.richardson@intel.com>; Lin, Xueqin <xueqin.lin@intel.com>
> Subject: Re: [PATCH v5 8/8] net/gve: add support for Rx/Tx
> 
> On 10/10/2022 11:17 AM, Junfeng Guo wrote:
> 
> >
> > Add Rx/Tx of GQI_QPL queue format and GQI_RDA queue format.
> >
> > Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com>
> > Signed-off-by: Junfeng Guo <junfeng.guo@intel.com>
> 
> <...>
> 
> > +uint16_t
> > +gve_rx_burst(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t
> nb_pkts)
> > +{
> > +       volatile struct gve_rx_desc *rxr, *rxd;
> > +       struct gve_rx_queue *rxq = rx_queue;
> > +       uint16_t rx_id = rxq->rx_tail;
> > +       struct rte_mbuf *rxe;
> > +       uint16_t nb_rx, len;
> > +       uint64_t addr;
> > +
> > +       rxr = rxq->rx_desc_ring;
> > +
> > +       for (nb_rx = 0; nb_rx < nb_pkts; nb_rx++) {
> > +               rxd = &rxr[rx_id];
> > +               if (GVE_SEQNO(rxd->flags_seq) != rxq->expected_seqno)
> > +                       break;
> > +
> > +               if (rxd->flags_seq & GVE_RXF_ERR)
> > +                       continue;
> > +
> 
> I think the above code is wrong.
> The function returns 'nb_rx'; if you continue on error, 'nb_rx' keeps
> increasing, so the application will receive the wrong number of packets.
> 
> Also, packets are assigned as "rx_pkts[nb_rx] = rxe;", which will cause gaps
> in the array.
> 
> You can either break on the error, or keep another variable to store the
> number of received packets.

Thanks for pointing this out!
Yes, we will use another variable as the index to walk through up to nb_pkts
descriptors and use nb_rx to count only the valid packets. This way there are no gaps
in the rx_pkts array and its size matches the final returned nb_rx.
Thanks!
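
Just to make the intended pattern explicit, here is a tiny standalone
illustration in plain C (not GVE code; the names and the 'demo' types are made
up): 'i' walks the descriptor ring while 'nb_rx' counts only the packets that
are actually handed back, so the output array never has gaps.

#include <stdint.h>
#include <stdbool.h>

struct demo_desc {
	bool ready;   /* descriptor has been written by the device */
	bool error;   /* device flagged an error for this descriptor */
	void *pkt;    /* payload buffer */
};

static uint16_t
demo_rx_burst(struct demo_desc *ring, uint16_t ring_mask, uint16_t *tail,
	      void **out, uint16_t nb_pkts)
{
	uint16_t nb_rx = 0;
	uint16_t i;

	for (i = 0; i < nb_pkts; i++) {
		struct demo_desc *d = &ring[*tail & ring_mask];

		if (!d->ready)
			break;          /* nothing more to receive */

		(*tail)++;              /* descriptor is consumed either way */

		if (d->error)
			continue;       /* dropped, but NOT counted */

		out[nb_rx++] = d->pkt;  /* only valid packets advance nb_rx */
	}

	return nb_rx;                   /* out[0..nb_rx-1] are all valid */
}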

> 
> > +               len = rte_be_to_cpu_16(rxd->len) - GVE_RX_PAD;
> > +               rxe = rxq->sw_ring[rx_id];
> > +               rxe->data_off = RTE_PKTMBUF_HEADROOM;
> 
> As far as I can see, the mbufs are allocated using 'rte_pktmbuf_alloc()'; if so,
> there is no need to explicitly set 'm->data_off'.

Will update this, thanks!
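
Since the refill path allocates the mbufs with rte_pktmbuf_alloc(), which
resets them via rte_pktmbuf_reset() (data_off becomes RTE_PKTMBUF_HEADROOM in
the normal case where the headroom fits in buf_len), the receive loop can
simply drop that assignment, roughly:

	len = rte_be_to_cpu_16(rxd->len) - GVE_RX_PAD;
	rxe = rxq->sw_ring[rx_id];
	/* rxe->data_off is already RTE_PKTMBUF_HEADROOM, no explicit set needed */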


^ permalink raw reply	[flat|nested] 192+ messages in thread

* RE: [PATCH v5 5/8] net/gve: add support for MTU setting
  2022-10-19 13:47                   ` Ferruh Yigit
@ 2022-10-20 10:14                     ` Guo, Junfeng
  0 siblings, 0 replies; 192+ messages in thread
From: Guo, Junfeng @ 2022-10-20 10:14 UTC (permalink / raw)
  To: Ferruh Yigit, Zhang, Qi Z, Wu, Jingjing
  Cc: ferruh.yigit, dev, Li, Xiaoyun, awogbemila, Richardson, Bruce,
	Lin, Xueqin



> -----Original Message-----
> From: Ferruh Yigit <ferruh.yigit@amd.com>
> Sent: Wednesday, October 19, 2022 21:48
> To: Guo, Junfeng <junfeng.guo@intel.com>; Zhang, Qi Z
> <qi.z.zhang@intel.com>; Wu, Jingjing <jingjing.wu@intel.com>
> Cc: ferruh.yigit@xilinx.com; dev@dpdk.org; Li, Xiaoyun
> <xiaoyun.li@intel.com>; awogbemila@google.com; Richardson, Bruce
> <bruce.richardson@intel.com>; Lin, Xueqin <xueqin.lin@intel.com>
> Subject: Re: [PATCH v5 5/8] net/gve: add support for MTU setting
> 
> On 10/10/2022 11:17 AM, Junfeng Guo wrote:
> 
> >
> > Support dev_ops mtu_set.
> >
> > Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com>
> > Signed-off-by: Junfeng Guo <junfeng.guo@intel.com>
> 
> <...>
> 
> >
> > +static int
> > +gve_dev_mtu_set(struct rte_eth_dev *dev, uint16_t mtu)
> > +{
> > +       struct gve_priv *priv = dev->data->dev_private;
> > +       int err;
> > +
> > +       if (mtu < RTE_ETHER_MIN_MTU || mtu > priv->max_mtu) {
> > +               PMD_DRV_LOG(ERR, "MIN MTU is %u MAX MTU is %u",
> RTE_ETHER_MIN_MTU, priv->max_mtu);
> > +               return -EINVAL;
> > +       }
> > +
> > +       /* mtu setting is forbidden if port is start */
> > +       if (dev->data->dev_started) {
> > +               PMD_DRV_LOG(ERR, "Port must be stopped before
> configuration");
> > +               return -EBUSY;
> > +       }
> > +
> > +       dev->data->dev_conf.rxmode.mtu = mtu + RTE_ETHER_HDR_LEN;
> 
> It is 'dev->data->mtu' that holds the latest MTU value.
> 
> 'dev_conf.rxmode.mtu' is the config requested by the user, so there is no need to
> update that.
> 
> And since 'dev->data->mtu' is already updated by 'rte_eth_dev_set_mtu()',
> you can drop the above line.

Will fix this, thanks!
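
So the callback would shrink to something like the sketch below (the adminq
call is assumed to be the same gve_adminq_set_mtu() used elsewhere in the v5
patch; only the dev_conf.rxmode.mtu update is dropped):

static int
gve_dev_mtu_set(struct rte_eth_dev *dev, uint16_t mtu)
{
	struct gve_priv *priv = dev->data->dev_private;
	int err;

	if (mtu < RTE_ETHER_MIN_MTU || mtu > priv->max_mtu) {
		PMD_DRV_LOG(ERR, "MIN MTU is %u, MAX MTU is %u",
			    RTE_ETHER_MIN_MTU, priv->max_mtu);
		return -EINVAL;
	}

	/* MTU setting is forbidden if the port is started */
	if (dev->data->dev_started) {
		PMD_DRV_LOG(ERR, "Port must be stopped before configuration");
		return -EBUSY;
	}

	err = gve_adminq_set_mtu(priv, mtu);
	if (err) {
		PMD_DRV_LOG(ERR, "Failed to set mtu to %u, err = %d", mtu, err);
		return err;
	}

	/* dev->data->mtu is updated by rte_eth_dev_set_mtu() on success,
	 * so nothing else needs to be stored here.
	 */
	return 0;
}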

^ permalink raw reply	[flat|nested] 192+ messages in thread

* [PATCH v6 0/8] introduce GVE PMD
  2022-10-10 10:17                 ` [PATCH v5 1/8] net/gve/base: introduce GVE PMD base code Junfeng Guo
  2022-10-19 13:45                   ` Ferruh Yigit
@ 2022-10-20 10:36                   ` Junfeng Guo
  2022-10-20 10:36                     ` [PATCH v6 1/8] net/gve/base: introduce GVE PMD base code Junfeng Guo
                                       ` (8 more replies)
  1 sibling, 9 replies; 192+ messages in thread
From: Junfeng Guo @ 2022-10-20 10:36 UTC (permalink / raw)
  To: qi.z.zhang, jingjing.wu, ferruh.yigit, beilei.xing
  Cc: dev, xiaoyun.li, awogbemila, bruce.richardson, hemant.agrawal,
	stephen, chenbo.xia, helin.zhang, Junfeng Guo

Introduce a new PMD for Google Virtual Ethernet (GVE).

gve (or gVNIC) is the standard virtual ethernet interface on Google Cloud
Platform (GCP) and one of the multiple virtual interfaces offered by the
world's leading CSPs.

Having a well-maintained and optimized gve PMD in the DPDK community can give
cloud instance consumers who want to run their own VNFs on GCP a better
experience in terms of performance and maintenance.

Please refer to https://cloud.google.com/compute/docs/networking/using-gvnic
for the device description.

This patch set requires an exception for MIT license for GVE base code.
And the base code includes the following files:
 - gve_adminq.c
 - gve_adminq.h
 - gve_desc.h
 - gve_desc_dqo.h
 - gve_register.h

It's based on GVE kernel driver v1.3.0 and the original code is in
https://github.com/GoogleCloudPlatform/compute-virtual-ethernet-linux/tree/v1.3.0

Please inform us if there is any objection regarding the copyright concern. Thanks!

v2:
fix some CI check errors.

v3:
refactor some code and fix some build errors.

v4:
move the Google base code files into DPDK base folder.

v5:
reorder commit sequence and drop the stats feature.

v6:
improve the code.

Junfeng Guo (8):
  net/gve/base: introduce GVE PMD base code
  net/gve/base: add OS specific implementation
  net/gve: add support for device initialization
  net/gve: add support for link update
  net/gve: add support for MTU setting
  net/gve: add support for dev info get and dev configure
  net/gve: add support for queue operations
  net/gve: add support for Rx/Tx

 MAINTAINERS                            |   6 +
 doc/guides/nics/features/gve.ini       |  16 +
 doc/guides/nics/gve.rst                |  76 ++
 doc/guides/nics/index.rst              |   1 +
 doc/guides/rel_notes/release_22_11.rst |   5 +
 drivers/net/gve/base/gve.h             |  58 ++
 drivers/net/gve/base/gve_adminq.c      | 925 +++++++++++++++++++++++++
 drivers/net/gve/base/gve_adminq.h      | 383 ++++++++++
 drivers/net/gve/base/gve_desc.h        | 139 ++++
 drivers/net/gve/base/gve_desc_dqo.h    | 256 +++++++
 drivers/net/gve/base/gve_osdep.h       | 159 +++++
 drivers/net/gve/base/gve_register.h    |  30 +
 drivers/net/gve/gve_ethdev.c           | 699 +++++++++++++++++++
 drivers/net/gve/gve_ethdev.h           | 295 ++++++++
 drivers/net/gve/gve_logs.h             |  14 +
 drivers/net/gve/gve_rx.c               | 354 ++++++++++
 drivers/net/gve/gve_tx.c               | 669 ++++++++++++++++++
 drivers/net/gve/meson.build            |  16 +
 drivers/net/gve/version.map            |   3 +
 drivers/net/meson.build                |   1 +
 20 files changed, 4105 insertions(+)
 create mode 100644 doc/guides/nics/features/gve.ini
 create mode 100644 doc/guides/nics/gve.rst
 create mode 100644 drivers/net/gve/base/gve.h
 create mode 100644 drivers/net/gve/base/gve_adminq.c
 create mode 100644 drivers/net/gve/base/gve_adminq.h
 create mode 100644 drivers/net/gve/base/gve_desc.h
 create mode 100644 drivers/net/gve/base/gve_desc_dqo.h
 create mode 100644 drivers/net/gve/base/gve_osdep.h
 create mode 100644 drivers/net/gve/base/gve_register.h
 create mode 100644 drivers/net/gve/gve_ethdev.c
 create mode 100644 drivers/net/gve/gve_ethdev.h
 create mode 100644 drivers/net/gve/gve_logs.h
 create mode 100644 drivers/net/gve/gve_rx.c
 create mode 100644 drivers/net/gve/gve_tx.c
 create mode 100644 drivers/net/gve/meson.build
 create mode 100644 drivers/net/gve/version.map

-- 
2.34.1


^ permalink raw reply	[flat|nested] 192+ messages in thread

* [PATCH v6 1/8] net/gve/base: introduce GVE PMD base code
  2022-10-20 10:36                   ` [PATCH v6 0/8] introduce GVE PMD Junfeng Guo
@ 2022-10-20 10:36                     ` Junfeng Guo
  2022-10-20 14:39                       ` Ferruh Yigit
  2022-10-20 14:40                       ` Ferruh Yigit
  2022-10-20 10:36                     ` [PATCH v6 2/8] net/gve/base: add OS specific implementation Junfeng Guo
                                       ` (7 subsequent siblings)
  8 siblings, 2 replies; 192+ messages in thread
From: Junfeng Guo @ 2022-10-20 10:36 UTC (permalink / raw)
  To: qi.z.zhang, jingjing.wu, ferruh.yigit, beilei.xing
  Cc: dev, xiaoyun.li, awogbemila, bruce.richardson, hemant.agrawal,
	stephen, chenbo.xia, helin.zhang, Junfeng Guo, Haiyue Wang

The following base code is based on Google Virtual Ethernet (gve)
driver v1.3.0 under MIT license.
- gve_adminq.c
- gve_adminq.h
- gve_desc.h
- gve_desc_dqo.h
- gve_register.h
- gve.h

The original code is in:
https://github.com/GoogleCloudPlatform/compute-virtual-ethernet-linux/\
tree/v1.3.0/google/gve

Note that these files are not Intel files; they come from the kernel
community. The base code there carries the statement
SPDX-License-Identifier: (GPL-2.0 OR MIT). Here we just follow the
required MIT license as an exception for DPDK.

Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com>
Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Junfeng Guo <junfeng.guo@intel.com>
---
 drivers/net/gve/base/gve.h          |  58 ++
 drivers/net/gve/base/gve_adminq.c   | 924 ++++++++++++++++++++++++++++
 drivers/net/gve/base/gve_adminq.h   | 381 ++++++++++++
 drivers/net/gve/base/gve_desc.h     | 137 +++++
 drivers/net/gve/base/gve_desc_dqo.h | 254 ++++++++
 drivers/net/gve/base/gve_register.h |  28 +
 6 files changed, 1782 insertions(+)
 create mode 100644 drivers/net/gve/base/gve.h
 create mode 100644 drivers/net/gve/base/gve_adminq.c
 create mode 100644 drivers/net/gve/base/gve_adminq.h
 create mode 100644 drivers/net/gve/base/gve_desc.h
 create mode 100644 drivers/net/gve/base/gve_desc_dqo.h
 create mode 100644 drivers/net/gve/base/gve_register.h

diff --git a/drivers/net/gve/base/gve.h b/drivers/net/gve/base/gve.h
new file mode 100644
index 0000000000..1b0d59b639
--- /dev/null
+++ b/drivers/net/gve/base/gve.h
@@ -0,0 +1,58 @@
+/* SPDX-License-Identifier: MIT
+ * Google Virtual Ethernet (gve) driver
+ * Version: 1.3.0
+ * Copyright (C) 2015-2022 Google, Inc.
+ * Copyright(C) 2022 Intel Corporation
+ */
+
+#ifndef _GVE_H_
+#define _GVE_H_
+
+#include "gve_desc.h"
+
+#define GVE_VERSION		"1.3.0"
+#define GVE_VERSION_PREFIX	"GVE-"
+
+#ifndef GOOGLE_VENDOR_ID
+#define GOOGLE_VENDOR_ID	0x1ae0
+#endif
+
+#define GVE_DEV_ID		0x0042
+
+#define GVE_REG_BAR		0
+#define GVE_DB_BAR		2
+
+/* 1 for management, 1 for rx, 1 for tx */
+#define GVE_MIN_MSIX		3
+
+/* PTYPEs are always 10 bits. */
+#define GVE_NUM_PTYPES		1024
+
+struct gve_irq_db {
+	rte_be32_t id;
+} ____cacheline_aligned;
+
+struct gve_ptype {
+	uint8_t l3_type;  /* `gve_l3_type` in gve_adminq.h */
+	uint8_t l4_type;  /* `gve_l4_type` in gve_adminq.h */
+};
+
+struct gve_ptype_lut {
+	struct gve_ptype ptypes[GVE_NUM_PTYPES];
+};
+
+enum gve_queue_format {
+	GVE_QUEUE_FORMAT_UNSPECIFIED = 0x0, /* default unspecified */
+	GVE_GQI_RDA_FORMAT	     = 0x1, /* GQI Raw Addressing */
+	GVE_GQI_QPL_FORMAT	     = 0x2, /* GQI Queue Page List */
+	GVE_DQO_RDA_FORMAT	     = 0x3, /* DQO Raw Addressing */
+};
+
+enum gve_state_flags_bit {
+	GVE_PRIV_FLAGS_ADMIN_QUEUE_OK		= 1,
+	GVE_PRIV_FLAGS_DEVICE_RESOURCES_OK	= 2,
+	GVE_PRIV_FLAGS_DEVICE_RINGS_OK		= 3,
+	GVE_PRIV_FLAGS_NAPI_ENABLED		= 4,
+};
+
+#endif /* _GVE_H_ */
diff --git a/drivers/net/gve/base/gve_adminq.c b/drivers/net/gve/base/gve_adminq.c
new file mode 100644
index 0000000000..2344100f1a
--- /dev/null
+++ b/drivers/net/gve/base/gve_adminq.c
@@ -0,0 +1,924 @@
+/* SPDX-License-Identifier: MIT
+ * Google Virtual Ethernet (gve) driver
+ * Version: 1.3.0
+ * Copyright (C) 2015-2022 Google, Inc.
+ * Copyright(C) 2022 Intel Corporation
+ */
+
+#include "gve_adminq.h"
+#include "gve_register.h"
+
+#define GVE_MAX_ADMINQ_RELEASE_CHECK	500
+#define GVE_ADMINQ_SLEEP_LEN		20
+#define GVE_MAX_ADMINQ_EVENT_COUNTER_CHECK	100
+
+#define GVE_DEVICE_OPTION_ERROR_FMT "%s option error:\n Expected: length=%d, feature_mask=%x.\n Actual: length=%d, feature_mask=%x."
+
+#define GVE_DEVICE_OPTION_TOO_BIG_FMT "Length of %s option larger than expected. Possible older version of guest driver."
+
+static
+struct gve_device_option *gve_get_next_option(struct gve_device_descriptor *descriptor,
+					      struct gve_device_option *option)
+{
+	uintptr_t option_end, descriptor_end;
+
+	option_end = (uintptr_t)option + sizeof(*option) + be16_to_cpu(option->option_length);
+	descriptor_end = (uintptr_t)descriptor + be16_to_cpu(descriptor->total_length);
+
+	return option_end > descriptor_end ? NULL : (struct gve_device_option *)option_end;
+}
+
+static
+void gve_parse_device_option(struct gve_priv *priv,
+			     struct gve_device_option *option,
+			     struct gve_device_option_gqi_rda **dev_op_gqi_rda,
+			     struct gve_device_option_gqi_qpl **dev_op_gqi_qpl,
+			     struct gve_device_option_dqo_rda **dev_op_dqo_rda,
+			     struct gve_device_option_jumbo_frames **dev_op_jumbo_frames)
+{
+	u32 req_feat_mask = be32_to_cpu(option->required_features_mask);
+	u16 option_length = be16_to_cpu(option->option_length);
+	u16 option_id = be16_to_cpu(option->option_id);
+
+	/* If the length or feature mask doesn't match, continue without
+	 * enabling the feature.
+	 */
+	switch (option_id) {
+	case GVE_DEV_OPT_ID_GQI_RAW_ADDRESSING:
+		if (option_length != GVE_DEV_OPT_LEN_GQI_RAW_ADDRESSING ||
+		    req_feat_mask != GVE_DEV_OPT_REQ_FEAT_MASK_GQI_RAW_ADDRESSING) {
+			PMD_DRV_LOG(WARNING, GVE_DEVICE_OPTION_ERROR_FMT,
+				    "Raw Addressing",
+				    GVE_DEV_OPT_LEN_GQI_RAW_ADDRESSING,
+				    GVE_DEV_OPT_REQ_FEAT_MASK_GQI_RAW_ADDRESSING,
+				    option_length, req_feat_mask);
+			break;
+		}
+
+		PMD_DRV_LOG(INFO, "Gqi raw addressing device option enabled.");
+		priv->queue_format = GVE_GQI_RDA_FORMAT;
+		break;
+	case GVE_DEV_OPT_ID_GQI_RDA:
+		if (option_length < sizeof(**dev_op_gqi_rda) ||
+		    req_feat_mask != GVE_DEV_OPT_REQ_FEAT_MASK_GQI_RDA) {
+			PMD_DRV_LOG(WARNING, GVE_DEVICE_OPTION_ERROR_FMT,
+				    "GQI RDA", (int)sizeof(**dev_op_gqi_rda),
+				    GVE_DEV_OPT_REQ_FEAT_MASK_GQI_RDA,
+				    option_length, req_feat_mask);
+			break;
+		}
+
+		if (option_length > sizeof(**dev_op_gqi_rda)) {
+			PMD_DRV_LOG(WARNING,
+				    GVE_DEVICE_OPTION_TOO_BIG_FMT, "GQI RDA");
+		}
+		*dev_op_gqi_rda = RTE_PTR_ADD(option, sizeof(*option));
+		break;
+	case GVE_DEV_OPT_ID_GQI_QPL:
+		if (option_length < sizeof(**dev_op_gqi_qpl) ||
+		    req_feat_mask != GVE_DEV_OPT_REQ_FEAT_MASK_GQI_QPL) {
+			PMD_DRV_LOG(WARNING, GVE_DEVICE_OPTION_ERROR_FMT,
+				    "GQI QPL", (int)sizeof(**dev_op_gqi_qpl),
+				    GVE_DEV_OPT_REQ_FEAT_MASK_GQI_QPL,
+				    option_length, req_feat_mask);
+			break;
+		}
+
+		if (option_length > sizeof(**dev_op_gqi_qpl)) {
+			PMD_DRV_LOG(WARNING,
+				    GVE_DEVICE_OPTION_TOO_BIG_FMT, "GQI QPL");
+		}
+		*dev_op_gqi_qpl = RTE_PTR_ADD(option, sizeof(*option));
+		break;
+	case GVE_DEV_OPT_ID_DQO_RDA:
+		if (option_length < sizeof(**dev_op_dqo_rda) ||
+		    req_feat_mask != GVE_DEV_OPT_REQ_FEAT_MASK_DQO_RDA) {
+			PMD_DRV_LOG(WARNING, GVE_DEVICE_OPTION_ERROR_FMT,
+				    "DQO RDA", (int)sizeof(**dev_op_dqo_rda),
+				    GVE_DEV_OPT_REQ_FEAT_MASK_DQO_RDA,
+				    option_length, req_feat_mask);
+			break;
+		}
+
+		if (option_length > sizeof(**dev_op_dqo_rda)) {
+			PMD_DRV_LOG(WARNING,
+				    GVE_DEVICE_OPTION_TOO_BIG_FMT, "DQO RDA");
+		}
+		*dev_op_dqo_rda = RTE_PTR_ADD(option, sizeof(*option));
+		break;
+	case GVE_DEV_OPT_ID_JUMBO_FRAMES:
+		if (option_length < sizeof(**dev_op_jumbo_frames) ||
+		    req_feat_mask != GVE_DEV_OPT_REQ_FEAT_MASK_JUMBO_FRAMES) {
+			PMD_DRV_LOG(WARNING, GVE_DEVICE_OPTION_ERROR_FMT,
+				    "Jumbo Frames",
+				    (int)sizeof(**dev_op_jumbo_frames),
+				    GVE_DEV_OPT_REQ_FEAT_MASK_JUMBO_FRAMES,
+				    option_length, req_feat_mask);
+			break;
+		}
+
+		if (option_length > sizeof(**dev_op_jumbo_frames)) {
+			PMD_DRV_LOG(WARNING,
+				    GVE_DEVICE_OPTION_TOO_BIG_FMT,
+				    "Jumbo Frames");
+		}
+		*dev_op_jumbo_frames = RTE_PTR_ADD(option, sizeof(*option));
+		break;
+	default:
+		/* If we don't recognize the option just continue
+		 * without doing anything.
+		 */
+		PMD_DRV_LOG(DEBUG, "Unrecognized device option 0x%hx not enabled.",
+			    option_id);
+	}
+}
+
+/* Process all device options for a given describe device call. */
+static int
+gve_process_device_options(struct gve_priv *priv,
+			   struct gve_device_descriptor *descriptor,
+			   struct gve_device_option_gqi_rda **dev_op_gqi_rda,
+			   struct gve_device_option_gqi_qpl **dev_op_gqi_qpl,
+			   struct gve_device_option_dqo_rda **dev_op_dqo_rda,
+			   struct gve_device_option_jumbo_frames **dev_op_jumbo_frames)
+{
+	const int num_options = be16_to_cpu(descriptor->num_device_options);
+	struct gve_device_option *dev_opt;
+	int i;
+
+	/* The options struct directly follows the device descriptor. */
+	dev_opt = RTE_PTR_ADD(descriptor, sizeof(*descriptor));
+	for (i = 0; i < num_options; i++) {
+		struct gve_device_option *next_opt;
+
+		next_opt = gve_get_next_option(descriptor, dev_opt);
+		if (!next_opt) {
+			PMD_DRV_LOG(ERR,
+				    "options exceed device_descriptor's total length.");
+			return -EINVAL;
+		}
+
+		gve_parse_device_option(priv, dev_opt,
+					dev_op_gqi_rda, dev_op_gqi_qpl,
+					dev_op_dqo_rda, dev_op_jumbo_frames);
+		dev_opt = next_opt;
+	}
+
+	return 0;
+}
+
+int gve_adminq_alloc(struct gve_priv *priv)
+{
+	priv->adminq = gve_alloc_dma_mem(&priv->adminq_dma_mem, PAGE_SIZE);
+	if (unlikely(!priv->adminq))
+		return -ENOMEM;
+
+	priv->adminq_mask = (PAGE_SIZE / sizeof(union gve_adminq_command)) - 1;
+	priv->adminq_prod_cnt = 0;
+	priv->adminq_cmd_fail = 0;
+	priv->adminq_timeouts = 0;
+	priv->adminq_describe_device_cnt = 0;
+	priv->adminq_cfg_device_resources_cnt = 0;
+	priv->adminq_register_page_list_cnt = 0;
+	priv->adminq_unregister_page_list_cnt = 0;
+	priv->adminq_create_tx_queue_cnt = 0;
+	priv->adminq_create_rx_queue_cnt = 0;
+	priv->adminq_destroy_tx_queue_cnt = 0;
+	priv->adminq_destroy_rx_queue_cnt = 0;
+	priv->adminq_dcfg_device_resources_cnt = 0;
+	priv->adminq_set_driver_parameter_cnt = 0;
+	priv->adminq_report_stats_cnt = 0;
+	priv->adminq_report_link_speed_cnt = 0;
+	priv->adminq_get_ptype_map_cnt = 0;
+
+	/* Setup Admin queue with the device */
+	iowrite32be(priv->adminq_dma_mem.pa / PAGE_SIZE,
+		    &priv->reg_bar0->adminq_pfn);
+
+	gve_set_admin_queue_ok(priv);
+	return 0;
+}
+
+void gve_adminq_release(struct gve_priv *priv)
+{
+	int i = 0;
+
+	/* Tell the device the adminq is leaving */
+	iowrite32be(0x0, &priv->reg_bar0->adminq_pfn);
+	while (ioread32be(&priv->reg_bar0->adminq_pfn)) {
+		/* If this is reached the device is unrecoverable and still
+		 * holding memory. Continue looping to avoid memory corruption,
+		 * but WARN so it is visible what is going on.
+		 */
+		if (i == GVE_MAX_ADMINQ_RELEASE_CHECK)
+			PMD_DRV_LOG(WARNING, "Unrecoverable platform error!");
+		i++;
+		msleep(GVE_ADMINQ_SLEEP_LEN);
+	}
+	gve_clear_device_rings_ok(priv);
+	gve_clear_device_resources_ok(priv);
+	gve_clear_admin_queue_ok(priv);
+}
+
+void gve_adminq_free(struct gve_priv *priv)
+{
+	if (!gve_get_admin_queue_ok(priv))
+		return;
+	gve_adminq_release(priv);
+	gve_free_dma_mem(&priv->adminq_dma_mem);
+	gve_clear_admin_queue_ok(priv);
+}
+
+static void gve_adminq_kick_cmd(struct gve_priv *priv, u32 prod_cnt)
+{
+	iowrite32be(prod_cnt, &priv->reg_bar0->adminq_doorbell);
+}
+
+static bool gve_adminq_wait_for_cmd(struct gve_priv *priv, u32 prod_cnt)
+{
+	int i;
+
+	for (i = 0; i < GVE_MAX_ADMINQ_EVENT_COUNTER_CHECK; i++) {
+		if (ioread32be(&priv->reg_bar0->adminq_event_counter)
+		    == prod_cnt)
+			return true;
+		msleep(GVE_ADMINQ_SLEEP_LEN);
+	}
+
+	return false;
+}
+
+static int gve_adminq_parse_err(struct gve_priv *priv, u32 status)
+{
+	if (status != GVE_ADMINQ_COMMAND_PASSED &&
+	    status != GVE_ADMINQ_COMMAND_UNSET) {
+		PMD_DRV_LOG(ERR, "AQ command failed with status %d", status);
+		priv->adminq_cmd_fail++;
+	}
+	switch (status) {
+	case GVE_ADMINQ_COMMAND_PASSED:
+		return 0;
+	case GVE_ADMINQ_COMMAND_UNSET:
+		PMD_DRV_LOG(ERR, "parse_aq_err: err and status both unset, this should not be possible.");
+		return -EINVAL;
+	case GVE_ADMINQ_COMMAND_ERROR_ABORTED:
+	case GVE_ADMINQ_COMMAND_ERROR_CANCELLED:
+	case GVE_ADMINQ_COMMAND_ERROR_DATALOSS:
+	case GVE_ADMINQ_COMMAND_ERROR_FAILED_PRECONDITION:
+	case GVE_ADMINQ_COMMAND_ERROR_UNAVAILABLE:
+		return -EAGAIN;
+	case GVE_ADMINQ_COMMAND_ERROR_ALREADY_EXISTS:
+	case GVE_ADMINQ_COMMAND_ERROR_INTERNAL_ERROR:
+	case GVE_ADMINQ_COMMAND_ERROR_INVALID_ARGUMENT:
+	case GVE_ADMINQ_COMMAND_ERROR_NOT_FOUND:
+	case GVE_ADMINQ_COMMAND_ERROR_OUT_OF_RANGE:
+	case GVE_ADMINQ_COMMAND_ERROR_UNKNOWN_ERROR:
+		return -EINVAL;
+	case GVE_ADMINQ_COMMAND_ERROR_DEADLINE_EXCEEDED:
+		return -ETIME;
+	case GVE_ADMINQ_COMMAND_ERROR_PERMISSION_DENIED:
+	case GVE_ADMINQ_COMMAND_ERROR_UNAUTHENTICATED:
+		return -EACCES;
+	case GVE_ADMINQ_COMMAND_ERROR_RESOURCE_EXHAUSTED:
+		return -ENOMEM;
+	case GVE_ADMINQ_COMMAND_ERROR_UNIMPLEMENTED:
+		return -ENOTSUP;
+	default:
+		PMD_DRV_LOG(ERR, "parse_aq_err: unknown status code %d",
+			    status);
+		return -EINVAL;
+	}
+}
+
+/* Flushes all AQ commands currently queued and waits for them to complete.
+ * If there are failures, it will return the first error.
+ */
+static int gve_adminq_kick_and_wait(struct gve_priv *priv)
+{
+	u32 tail, head;
+	u32 i;
+
+	tail = ioread32be(&priv->reg_bar0->adminq_event_counter);
+	head = priv->adminq_prod_cnt;
+
+	gve_adminq_kick_cmd(priv, head);
+	if (!gve_adminq_wait_for_cmd(priv, head)) {
+		PMD_DRV_LOG(ERR, "AQ commands timed out, need to reset AQ");
+		priv->adminq_timeouts++;
+		return -ENOTRECOVERABLE;
+	}
+
+	for (i = tail; i < head; i++) {
+		union gve_adminq_command *cmd;
+		u32 status, err;
+
+		cmd = &priv->adminq[i & priv->adminq_mask];
+		status = be32_to_cpu(READ_ONCE32(cmd->status));
+		err = gve_adminq_parse_err(priv, status);
+		if (err)
+			/* Return the first error if we failed. */
+			return err;
+	}
+
+	return 0;
+}
+
+/* This function is not threadsafe - the caller is responsible for any
+ * necessary locks.
+ */
+static int gve_adminq_issue_cmd(struct gve_priv *priv,
+				union gve_adminq_command *cmd_orig)
+{
+	union gve_adminq_command *cmd;
+	u32 opcode;
+	u32 tail;
+
+	tail = ioread32be(&priv->reg_bar0->adminq_event_counter);
+
+	/* Check if next command will overflow the buffer. */
+	if (((priv->adminq_prod_cnt + 1) & priv->adminq_mask) ==
+	    (tail & priv->adminq_mask)) {
+		int err;
+
+		/* Flush existing commands to make room. */
+		err = gve_adminq_kick_and_wait(priv);
+		if (err)
+			return err;
+
+		/* Retry. */
+		tail = ioread32be(&priv->reg_bar0->adminq_event_counter);
+		if (((priv->adminq_prod_cnt + 1) & priv->adminq_mask) ==
+		    (tail & priv->adminq_mask)) {
+			/* This should never happen. We just flushed the
+			 * command queue so there should be enough space.
+			 */
+			return -ENOMEM;
+		}
+	}
+
+	cmd = &priv->adminq[priv->adminq_prod_cnt & priv->adminq_mask];
+	priv->adminq_prod_cnt++;
+
+	memcpy(cmd, cmd_orig, sizeof(*cmd_orig));
+	opcode = be32_to_cpu(READ_ONCE32(cmd->opcode));
+
+	switch (opcode) {
+	case GVE_ADMINQ_DESCRIBE_DEVICE:
+		priv->adminq_describe_device_cnt++;
+		break;
+	case GVE_ADMINQ_CONFIGURE_DEVICE_RESOURCES:
+		priv->adminq_cfg_device_resources_cnt++;
+		break;
+	case GVE_ADMINQ_REGISTER_PAGE_LIST:
+		priv->adminq_register_page_list_cnt++;
+		break;
+	case GVE_ADMINQ_UNREGISTER_PAGE_LIST:
+		priv->adminq_unregister_page_list_cnt++;
+		break;
+	case GVE_ADMINQ_CREATE_TX_QUEUE:
+		priv->adminq_create_tx_queue_cnt++;
+		break;
+	case GVE_ADMINQ_CREATE_RX_QUEUE:
+		priv->adminq_create_rx_queue_cnt++;
+		break;
+	case GVE_ADMINQ_DESTROY_TX_QUEUE:
+		priv->adminq_destroy_tx_queue_cnt++;
+		break;
+	case GVE_ADMINQ_DESTROY_RX_QUEUE:
+		priv->adminq_destroy_rx_queue_cnt++;
+		break;
+	case GVE_ADMINQ_DECONFIGURE_DEVICE_RESOURCES:
+		priv->adminq_dcfg_device_resources_cnt++;
+		break;
+	case GVE_ADMINQ_SET_DRIVER_PARAMETER:
+		priv->adminq_set_driver_parameter_cnt++;
+		break;
+	case GVE_ADMINQ_REPORT_STATS:
+		priv->adminq_report_stats_cnt++;
+		break;
+	case GVE_ADMINQ_REPORT_LINK_SPEED:
+		priv->adminq_report_link_speed_cnt++;
+		break;
+	case GVE_ADMINQ_GET_PTYPE_MAP:
+		priv->adminq_get_ptype_map_cnt++;
+		break;
+	default:
+		PMD_DRV_LOG(ERR, "unknown AQ command opcode %d", opcode);
+	}
+
+	return 0;
+}
+
+/* This function is not threadsafe - the caller is responsible for any
+ * necessary locks.
+ * The caller is also responsible for making sure there are no commands
+ * waiting to be executed.
+ */
+static int gve_adminq_execute_cmd(struct gve_priv *priv,
+				  union gve_adminq_command *cmd_orig)
+{
+	u32 tail, head;
+	int err;
+
+	tail = ioread32be(&priv->reg_bar0->adminq_event_counter);
+	head = priv->adminq_prod_cnt;
+	if (tail != head)
+		/* This is not a valid path */
+		return -EINVAL;
+
+	err = gve_adminq_issue_cmd(priv, cmd_orig);
+	if (err)
+		return err;
+
+	return gve_adminq_kick_and_wait(priv);
+}
+
+/* The device specifies that the management vector can either be the first irq
+ * or the last irq. ntfy_blk_msix_base_idx indicates the first irq assigned to
+ * the ntfy blks. If it is 0 then the management vector is last, if it is 1 then
+ * the management vector is first.
+ *
+ * gve arranges the msix vectors so that the management vector is last.
+ */
+#define GVE_NTFY_BLK_BASE_MSIX_IDX	0
+int gve_adminq_configure_device_resources(struct gve_priv *priv,
+					  dma_addr_t counter_array_bus_addr,
+					  u32 num_counters,
+					  dma_addr_t db_array_bus_addr,
+					  u32 num_ntfy_blks)
+{
+	union gve_adminq_command cmd;
+
+	memset(&cmd, 0, sizeof(cmd));
+	cmd.opcode = cpu_to_be32(GVE_ADMINQ_CONFIGURE_DEVICE_RESOURCES);
+	cmd.configure_device_resources =
+		(struct gve_adminq_configure_device_resources) {
+		.counter_array = cpu_to_be64(counter_array_bus_addr),
+		.num_counters = cpu_to_be32(num_counters),
+		.irq_db_addr = cpu_to_be64(db_array_bus_addr),
+		.num_irq_dbs = cpu_to_be32(num_ntfy_blks),
+		.irq_db_stride = cpu_to_be32(sizeof(*priv->irq_dbs)),
+		.ntfy_blk_msix_base_idx =
+					cpu_to_be32(GVE_NTFY_BLK_BASE_MSIX_IDX),
+		.queue_format = priv->queue_format,
+	};
+
+	return gve_adminq_execute_cmd(priv, &cmd);
+}
+
+int gve_adminq_deconfigure_device_resources(struct gve_priv *priv)
+{
+	union gve_adminq_command cmd;
+
+	memset(&cmd, 0, sizeof(cmd));
+	cmd.opcode = cpu_to_be32(GVE_ADMINQ_DECONFIGURE_DEVICE_RESOURCES);
+
+	return gve_adminq_execute_cmd(priv, &cmd);
+}
+
+static int gve_adminq_create_tx_queue(struct gve_priv *priv, u32 queue_index)
+{
+	struct gve_tx_queue *txq = priv->txqs[queue_index];
+	union gve_adminq_command cmd;
+
+	memset(&cmd, 0, sizeof(cmd));
+	cmd.opcode = cpu_to_be32(GVE_ADMINQ_CREATE_TX_QUEUE);
+	cmd.create_tx_queue = (struct gve_adminq_create_tx_queue) {
+		.queue_id = cpu_to_be32(queue_index),
+		.queue_resources_addr =
+			cpu_to_be64(txq->qres_mz->iova),
+		.tx_ring_addr = cpu_to_be64(txq->tx_ring_phys_addr),
+		.ntfy_id = cpu_to_be32(txq->ntfy_id),
+	};
+
+	if (gve_is_gqi(priv)) {
+		u32 qpl_id = priv->queue_format == GVE_GQI_RDA_FORMAT ?
+			GVE_RAW_ADDRESSING_QPL_ID : txq->qpl->id;
+
+		cmd.create_tx_queue.queue_page_list_id = cpu_to_be32(qpl_id);
+	} else {
+		cmd.create_tx_queue.tx_ring_size =
+			cpu_to_be16(txq->nb_tx_desc);
+		cmd.create_tx_queue.tx_comp_ring_addr =
+			cpu_to_be64(txq->complq->tx_ring_phys_addr);
+		cmd.create_tx_queue.tx_comp_ring_size =
+			cpu_to_be16(priv->tx_compq_size);
+	}
+
+	return gve_adminq_issue_cmd(priv, &cmd);
+}
+
+int gve_adminq_create_tx_queues(struct gve_priv *priv, u32 num_queues)
+{
+	int err;
+	u32 i;
+
+	for (i = 0; i < num_queues; i++) {
+		err = gve_adminq_create_tx_queue(priv, i);
+		if (err)
+			return err;
+	}
+
+	return gve_adminq_kick_and_wait(priv);
+}
+
+static int gve_adminq_create_rx_queue(struct gve_priv *priv, u32 queue_index)
+{
+	struct gve_rx_queue *rxq = priv->rxqs[queue_index];
+	union gve_adminq_command cmd;
+
+	memset(&cmd, 0, sizeof(cmd));
+	cmd.opcode = cpu_to_be32(GVE_ADMINQ_CREATE_RX_QUEUE);
+	cmd.create_rx_queue = (struct gve_adminq_create_rx_queue) {
+		.queue_id = cpu_to_be32(queue_index),
+		.ntfy_id = cpu_to_be32(rxq->ntfy_id),
+		.queue_resources_addr = cpu_to_be64(rxq->qres_mz->iova),
+	};
+
+	if (gve_is_gqi(priv)) {
+		u32 qpl_id = priv->queue_format == GVE_GQI_RDA_FORMAT ?
+			GVE_RAW_ADDRESSING_QPL_ID : rxq->qpl->id;
+
+		cmd.create_rx_queue.rx_desc_ring_addr =
+			cpu_to_be64(rxq->mz->iova),
+		cmd.create_rx_queue.rx_data_ring_addr =
+			cpu_to_be64(rxq->data_mz->iova),
+		cmd.create_rx_queue.index = cpu_to_be32(queue_index);
+		cmd.create_rx_queue.queue_page_list_id = cpu_to_be32(qpl_id);
+		cmd.create_rx_queue.packet_buffer_size = cpu_to_be16(rxq->rx_buf_len);
+	} else {
+		cmd.create_rx_queue.rx_ring_size =
+			cpu_to_be16(priv->rx_desc_cnt);
+		cmd.create_rx_queue.rx_desc_ring_addr =
+			cpu_to_be64(rxq->rx_ring_phys_addr);
+		cmd.create_rx_queue.rx_data_ring_addr =
+			cpu_to_be64(rxq->bufq->rx_ring_phys_addr);
+		cmd.create_rx_queue.packet_buffer_size =
+			cpu_to_be16(rxq->rx_buf_len);
+		cmd.create_rx_queue.rx_buff_ring_size =
+			cpu_to_be16(priv->rx_bufq_size);
+		cmd.create_rx_queue.enable_rsc = !!(priv->enable_rsc);
+	}
+
+	return gve_adminq_issue_cmd(priv, &cmd);
+}
+
+int gve_adminq_create_rx_queues(struct gve_priv *priv, u32 num_queues)
+{
+	int err;
+	u32 i;
+
+	for (i = 0; i < num_queues; i++) {
+		err = gve_adminq_create_rx_queue(priv, i);
+		if (err)
+			return err;
+	}
+
+	return gve_adminq_kick_and_wait(priv);
+}
+
+static int gve_adminq_destroy_tx_queue(struct gve_priv *priv, u32 queue_index)
+{
+	union gve_adminq_command cmd;
+	int err;
+
+	memset(&cmd, 0, sizeof(cmd));
+	cmd.opcode = cpu_to_be32(GVE_ADMINQ_DESTROY_TX_QUEUE);
+	cmd.destroy_tx_queue = (struct gve_adminq_destroy_tx_queue) {
+		.queue_id = cpu_to_be32(queue_index),
+	};
+
+	err = gve_adminq_issue_cmd(priv, &cmd);
+	if (err)
+		return err;
+
+	return 0;
+}
+
+int gve_adminq_destroy_tx_queues(struct gve_priv *priv, u32 num_queues)
+{
+	int err;
+	u32 i;
+
+	for (i = 0; i < num_queues; i++) {
+		err = gve_adminq_destroy_tx_queue(priv, i);
+		if (err)
+			return err;
+	}
+
+	return gve_adminq_kick_and_wait(priv);
+}
+
+static int gve_adminq_destroy_rx_queue(struct gve_priv *priv, u32 queue_index)
+{
+	union gve_adminq_command cmd;
+	int err;
+
+	memset(&cmd, 0, sizeof(cmd));
+	cmd.opcode = cpu_to_be32(GVE_ADMINQ_DESTROY_RX_QUEUE);
+	cmd.destroy_rx_queue = (struct gve_adminq_destroy_rx_queue) {
+		.queue_id = cpu_to_be32(queue_index),
+	};
+
+	err = gve_adminq_issue_cmd(priv, &cmd);
+	if (err)
+		return err;
+
+	return 0;
+}
+
+int gve_adminq_destroy_rx_queues(struct gve_priv *priv, u32 num_queues)
+{
+	int err;
+	u32 i;
+
+	for (i = 0; i < num_queues; i++) {
+		err = gve_adminq_destroy_rx_queue(priv, i);
+		if (err)
+			return err;
+	}
+
+	return gve_adminq_kick_and_wait(priv);
+}
+
+static int gve_set_desc_cnt(struct gve_priv *priv,
+			    struct gve_device_descriptor *descriptor)
+{
+	priv->tx_desc_cnt = be16_to_cpu(descriptor->tx_queue_entries);
+	if (priv->tx_desc_cnt * sizeof(priv->txqs[0]->tx_desc_ring[0])
+	    < PAGE_SIZE) {
+		PMD_DRV_LOG(ERR, "Tx desc count %d too low", priv->tx_desc_cnt);
+		return -EINVAL;
+	}
+	priv->rx_desc_cnt = be16_to_cpu(descriptor->rx_queue_entries);
+	if (priv->rx_desc_cnt * sizeof(priv->rxqs[0]->rx_desc_ring[0])
+	    < PAGE_SIZE) {
+		PMD_DRV_LOG(ERR, "Rx desc count %d too low", priv->rx_desc_cnt);
+		return -EINVAL;
+	}
+	return 0;
+}
+
+static int
+gve_set_desc_cnt_dqo(struct gve_priv *priv,
+		     const struct gve_device_descriptor *descriptor,
+		     const struct gve_device_option_dqo_rda *dev_op_dqo_rda)
+{
+	priv->tx_desc_cnt = be16_to_cpu(descriptor->tx_queue_entries);
+	priv->tx_compq_size = be16_to_cpu(dev_op_dqo_rda->tx_comp_ring_entries);
+	priv->rx_desc_cnt = be16_to_cpu(descriptor->rx_queue_entries);
+	priv->rx_bufq_size = be16_to_cpu(dev_op_dqo_rda->rx_buff_ring_entries);
+
+	return 0;
+}
+
+static void gve_enable_supported_features(struct gve_priv *priv,
+					  u32 supported_features_mask,
+					  const struct gve_device_option_jumbo_frames
+						  *dev_op_jumbo_frames)
+{
+	/* Before control reaches this point, the page-size-capped max MTU from
+	 * the gve_device_descriptor field has already been stored in
+	 * priv->dev->max_mtu. We overwrite it with the true max MTU below.
+	 */
+	if (dev_op_jumbo_frames &&
+	    (supported_features_mask & GVE_SUP_JUMBO_FRAMES_MASK)) {
+		PMD_DRV_LOG(INFO, "JUMBO FRAMES device option enabled.");
+		priv->max_mtu = be16_to_cpu(dev_op_jumbo_frames->max_mtu);
+	}
+}
+
+int gve_adminq_describe_device(struct gve_priv *priv)
+{
+	struct gve_device_option_jumbo_frames *dev_op_jumbo_frames = NULL;
+	struct gve_device_option_gqi_rda *dev_op_gqi_rda = NULL;
+	struct gve_device_option_gqi_qpl *dev_op_gqi_qpl = NULL;
+	struct gve_device_option_dqo_rda *dev_op_dqo_rda = NULL;
+	struct gve_device_descriptor *descriptor;
+	struct gve_dma_mem descriptor_dma_mem;
+	u32 supported_features_mask = 0;
+	union gve_adminq_command cmd;
+	int err = 0;
+	u8 *mac;
+	u16 mtu;
+
+	memset(&cmd, 0, sizeof(cmd));
+	descriptor = gve_alloc_dma_mem(&descriptor_dma_mem, PAGE_SIZE);
+	if (!descriptor)
+		return -ENOMEM;
+	cmd.opcode = cpu_to_be32(GVE_ADMINQ_DESCRIBE_DEVICE);
+	cmd.describe_device.device_descriptor_addr =
+					cpu_to_be64(descriptor_dma_mem.pa);
+	cmd.describe_device.device_descriptor_version =
+			cpu_to_be32(GVE_ADMINQ_DEVICE_DESCRIPTOR_VERSION);
+	cmd.describe_device.available_length = cpu_to_be32(PAGE_SIZE);
+
+	err = gve_adminq_execute_cmd(priv, &cmd);
+	if (err)
+		goto free_device_descriptor;
+
+	err = gve_process_device_options(priv, descriptor, &dev_op_gqi_rda,
+					 &dev_op_gqi_qpl, &dev_op_dqo_rda,
+					 &dev_op_jumbo_frames);
+	if (err)
+		goto free_device_descriptor;
+
+	/* If the GQI_RAW_ADDRESSING option is not enabled and the queue format
+	 * is not set to GqiRda, choose the queue format in a priority order:
+	 * DqoRda, GqiRda, GqiQpl. Use GqiQpl as default.
+	 */
+	if (dev_op_dqo_rda) {
+		priv->queue_format = GVE_DQO_RDA_FORMAT;
+		PMD_DRV_LOG(INFO, "Driver is running with DQO RDA queue format.");
+		supported_features_mask =
+			be32_to_cpu(dev_op_dqo_rda->supported_features_mask);
+	} else if (dev_op_gqi_rda) {
+		priv->queue_format = GVE_GQI_RDA_FORMAT;
+		PMD_DRV_LOG(INFO, "Driver is running with GQI RDA queue format.");
+		supported_features_mask =
+			be32_to_cpu(dev_op_gqi_rda->supported_features_mask);
+	} else if (priv->queue_format == GVE_GQI_RDA_FORMAT) {
+		PMD_DRV_LOG(INFO, "Driver is running with GQI RDA queue format.");
+	} else {
+		priv->queue_format = GVE_GQI_QPL_FORMAT;
+		if (dev_op_gqi_qpl)
+			supported_features_mask =
+				be32_to_cpu(dev_op_gqi_qpl->supported_features_mask);
+		PMD_DRV_LOG(INFO, "Driver is running with GQI QPL queue format.");
+	}
+	if (gve_is_gqi(priv)) {
+		err = gve_set_desc_cnt(priv, descriptor);
+	} else {
+		/* DQO supports LRO. */
+		err = gve_set_desc_cnt_dqo(priv, descriptor, dev_op_dqo_rda);
+	}
+	if (err)
+		goto free_device_descriptor;
+
+	priv->max_registered_pages =
+				be64_to_cpu(descriptor->max_registered_pages);
+	mtu = be16_to_cpu(descriptor->mtu);
+	if (mtu < ETH_MIN_MTU) {
+		PMD_DRV_LOG(ERR, "MTU %d below minimum MTU", mtu);
+		err = -EINVAL;
+		goto free_device_descriptor;
+	}
+	priv->max_mtu = mtu;
+	priv->num_event_counters = be16_to_cpu(descriptor->counters);
+	rte_memcpy(priv->dev_addr.addr_bytes, descriptor->mac, ETH_ALEN);
+	mac = descriptor->mac;
+	PMD_DRV_LOG(INFO, "MAC addr: %02x:%02x:%02x:%02x:%02x:%02x",
+		    mac[0], mac[1], mac[2], mac[3], mac[4], mac[5]);
+	priv->tx_pages_per_qpl = be16_to_cpu(descriptor->tx_pages_per_qpl);
+	priv->rx_data_slot_cnt = be16_to_cpu(descriptor->rx_pages_per_qpl);
+
+	if (gve_is_gqi(priv) && priv->rx_data_slot_cnt < priv->rx_desc_cnt) {
+		PMD_DRV_LOG(ERR,
+			    "rx_data_slot_cnt cannot be smaller than rx_desc_cnt, setting rx_desc_cnt down to %d",
+			    priv->rx_data_slot_cnt);
+		priv->rx_desc_cnt = priv->rx_data_slot_cnt;
+	}
+	priv->default_num_queues = be16_to_cpu(descriptor->default_num_queues);
+
+	gve_enable_supported_features(priv, supported_features_mask,
+				      dev_op_jumbo_frames);
+
+free_device_descriptor:
+	gve_free_dma_mem(&descriptor_dma_mem);
+	return err;
+}
+
+int gve_adminq_register_page_list(struct gve_priv *priv,
+				  struct gve_queue_page_list *qpl)
+{
+	struct gve_dma_mem page_list_dma_mem;
+	u32 num_entries = qpl->num_entries;
+	u32 size = num_entries * sizeof(qpl->page_buses[0]);
+	union gve_adminq_command cmd;
+	__be64 *page_list;
+	int err;
+	u32 i;
+
+	memset(&cmd, 0, sizeof(cmd));
+	page_list = gve_alloc_dma_mem(&page_list_dma_mem, size);
+	if (!page_list)
+		return -ENOMEM;
+
+	for (i = 0; i < num_entries; i++)
+		page_list[i] = cpu_to_be64(qpl->page_buses[i]);
+
+	cmd.opcode = cpu_to_be32(GVE_ADMINQ_REGISTER_PAGE_LIST);
+	cmd.reg_page_list = (struct gve_adminq_register_page_list) {
+		.page_list_id = cpu_to_be32(qpl->id),
+		.num_pages = cpu_to_be32(num_entries),
+		.page_address_list_addr = cpu_to_be64(page_list_dma_mem.pa),
+	};
+
+	err = gve_adminq_execute_cmd(priv, &cmd);
+	gve_free_dma_mem(&page_list_dma_mem);
+	return err;
+}
+
+int gve_adminq_unregister_page_list(struct gve_priv *priv, u32 page_list_id)
+{
+	union gve_adminq_command cmd;
+
+	memset(&cmd, 0, sizeof(cmd));
+	cmd.opcode = cpu_to_be32(GVE_ADMINQ_UNREGISTER_PAGE_LIST);
+	cmd.unreg_page_list = (struct gve_adminq_unregister_page_list) {
+		.page_list_id = cpu_to_be32(page_list_id),
+	};
+
+	return gve_adminq_execute_cmd(priv, &cmd);
+}
+
+int gve_adminq_set_mtu(struct gve_priv *priv, u64 mtu)
+{
+	union gve_adminq_command cmd;
+
+	memset(&cmd, 0, sizeof(cmd));
+	cmd.opcode = cpu_to_be32(GVE_ADMINQ_SET_DRIVER_PARAMETER);
+	cmd.set_driver_param = (struct gve_adminq_set_driver_parameter) {
+		.parameter_type = cpu_to_be32(GVE_SET_PARAM_MTU),
+		.parameter_value = cpu_to_be64(mtu),
+	};
+
+	return gve_adminq_execute_cmd(priv, &cmd);
+}
+
+int gve_adminq_report_stats(struct gve_priv *priv, u64 stats_report_len,
+			    dma_addr_t stats_report_addr, u64 interval)
+{
+	union gve_adminq_command cmd;
+
+	memset(&cmd, 0, sizeof(cmd));
+	cmd.opcode = cpu_to_be32(GVE_ADMINQ_REPORT_STATS);
+	cmd.report_stats = (struct gve_adminq_report_stats) {
+		.stats_report_len = cpu_to_be64(stats_report_len),
+		.stats_report_addr = cpu_to_be64(stats_report_addr),
+		.interval = cpu_to_be64(interval),
+	};
+
+	return gve_adminq_execute_cmd(priv, &cmd);
+}
+
+int gve_adminq_report_link_speed(struct gve_priv *priv)
+{
+	struct gve_dma_mem link_speed_region_dma_mem;
+	union gve_adminq_command gvnic_cmd;
+	u64 *link_speed_region;
+	int err;
+
+	link_speed_region = gve_alloc_dma_mem(&link_speed_region_dma_mem,
+					      sizeof(*link_speed_region));
+
+	if (!link_speed_region)
+		return -ENOMEM;
+
+	memset(&gvnic_cmd, 0, sizeof(gvnic_cmd));
+	gvnic_cmd.opcode = cpu_to_be32(GVE_ADMINQ_REPORT_LINK_SPEED);
+	gvnic_cmd.report_link_speed.link_speed_address =
+		cpu_to_be64(link_speed_region_dma_mem.pa);
+
+	err = gve_adminq_execute_cmd(priv, &gvnic_cmd);
+
+	priv->link_speed = be64_to_cpu(*link_speed_region);
+	gve_free_dma_mem(&link_speed_region_dma_mem);
+	return err;
+}
+
+int gve_adminq_get_ptype_map_dqo(struct gve_priv *priv,
+				 struct gve_ptype_lut *ptype_lut)
+{
+	struct gve_dma_mem ptype_map_dma_mem;
+	struct gve_ptype_map *ptype_map;
+	union gve_adminq_command cmd;
+	int err = 0;
+	int i;
+
+	memset(&cmd, 0, sizeof(cmd));
+	ptype_map = gve_alloc_dma_mem(&ptype_map_dma_mem, sizeof(*ptype_map));
+	if (!ptype_map)
+		return -ENOMEM;
+
+	cmd.opcode = cpu_to_be32(GVE_ADMINQ_GET_PTYPE_MAP);
+	cmd.get_ptype_map = (struct gve_adminq_get_ptype_map) {
+		.ptype_map_len = cpu_to_be64(sizeof(*ptype_map)),
+		.ptype_map_addr = cpu_to_be64(ptype_map_dma_mem.pa),
+	};
+
+	err = gve_adminq_execute_cmd(priv, &cmd);
+	if (err)
+		goto err;
+
+	/* Populate ptype_lut. */
+	for (i = 0; i < GVE_NUM_PTYPES; i++) {
+		ptype_lut->ptypes[i].l3_type =
+			ptype_map->ptypes[i].l3_type;
+		ptype_lut->ptypes[i].l4_type =
+			ptype_map->ptypes[i].l4_type;
+	}
+err:
+	gve_free_dma_mem(&ptype_map_dma_mem);
+	return err;
+}
diff --git a/drivers/net/gve/base/gve_adminq.h b/drivers/net/gve/base/gve_adminq.h
new file mode 100644
index 0000000000..c7114cc883
--- /dev/null
+++ b/drivers/net/gve/base/gve_adminq.h
@@ -0,0 +1,381 @@
+/* SPDX-License-Identifier: MIT
+ * Google Virtual Ethernet (gve) driver
+ * Version: 1.3.0
+ * Copyright (C) 2015-2022 Google, Inc.
+ * Copyright(C) 2022 Intel Corporation
+ */
+
+#ifndef _GVE_ADMINQ_H
+#define _GVE_ADMINQ_H
+
+/* Admin queue opcodes */
+enum gve_adminq_opcodes {
+	GVE_ADMINQ_DESCRIBE_DEVICE		= 0x1,
+	GVE_ADMINQ_CONFIGURE_DEVICE_RESOURCES	= 0x2,
+	GVE_ADMINQ_REGISTER_PAGE_LIST		= 0x3,
+	GVE_ADMINQ_UNREGISTER_PAGE_LIST		= 0x4,
+	GVE_ADMINQ_CREATE_TX_QUEUE		= 0x5,
+	GVE_ADMINQ_CREATE_RX_QUEUE		= 0x6,
+	GVE_ADMINQ_DESTROY_TX_QUEUE		= 0x7,
+	GVE_ADMINQ_DESTROY_RX_QUEUE		= 0x8,
+	GVE_ADMINQ_DECONFIGURE_DEVICE_RESOURCES	= 0x9,
+	GVE_ADMINQ_SET_DRIVER_PARAMETER		= 0xB,
+	GVE_ADMINQ_REPORT_STATS			= 0xC,
+	GVE_ADMINQ_REPORT_LINK_SPEED		= 0xD,
+	GVE_ADMINQ_GET_PTYPE_MAP		= 0xE,
+};
+
+/* Admin queue status codes */
+enum gve_adminq_statuses {
+	GVE_ADMINQ_COMMAND_UNSET			= 0x0,
+	GVE_ADMINQ_COMMAND_PASSED			= 0x1,
+	GVE_ADMINQ_COMMAND_ERROR_ABORTED		= 0xFFFFFFF0,
+	GVE_ADMINQ_COMMAND_ERROR_ALREADY_EXISTS		= 0xFFFFFFF1,
+	GVE_ADMINQ_COMMAND_ERROR_CANCELLED		= 0xFFFFFFF2,
+	GVE_ADMINQ_COMMAND_ERROR_DATALOSS		= 0xFFFFFFF3,
+	GVE_ADMINQ_COMMAND_ERROR_DEADLINE_EXCEEDED	= 0xFFFFFFF4,
+	GVE_ADMINQ_COMMAND_ERROR_FAILED_PRECONDITION	= 0xFFFFFFF5,
+	GVE_ADMINQ_COMMAND_ERROR_INTERNAL_ERROR		= 0xFFFFFFF6,
+	GVE_ADMINQ_COMMAND_ERROR_INVALID_ARGUMENT	= 0xFFFFFFF7,
+	GVE_ADMINQ_COMMAND_ERROR_NOT_FOUND		= 0xFFFFFFF8,
+	GVE_ADMINQ_COMMAND_ERROR_OUT_OF_RANGE		= 0xFFFFFFF9,
+	GVE_ADMINQ_COMMAND_ERROR_PERMISSION_DENIED	= 0xFFFFFFFA,
+	GVE_ADMINQ_COMMAND_ERROR_UNAUTHENTICATED	= 0xFFFFFFFB,
+	GVE_ADMINQ_COMMAND_ERROR_RESOURCE_EXHAUSTED	= 0xFFFFFFFC,
+	GVE_ADMINQ_COMMAND_ERROR_UNAVAILABLE		= 0xFFFFFFFD,
+	GVE_ADMINQ_COMMAND_ERROR_UNIMPLEMENTED		= 0xFFFFFFFE,
+	GVE_ADMINQ_COMMAND_ERROR_UNKNOWN_ERROR		= 0xFFFFFFFF,
+};
+
+#define GVE_ADMINQ_DEVICE_DESCRIPTOR_VERSION 1
+
+/* All AdminQ command structs should be naturally packed.
+ * GVE_CHECK_STRUCT/UNION_LEN will check struct/union length and throw
+ * error at compile time when the size is not correct.
+ */
+
+struct gve_adminq_describe_device {
+	__be64 device_descriptor_addr;
+	__be32 device_descriptor_version;
+	__be32 available_length;
+};
+
+GVE_CHECK_STRUCT_LEN(16, gve_adminq_describe_device);
+
+struct gve_device_descriptor {
+	__be64 max_registered_pages;
+	__be16 reserved1;
+	__be16 tx_queue_entries;
+	__be16 rx_queue_entries;
+	__be16 default_num_queues;
+	__be16 mtu;
+	__be16 counters;
+	__be16 tx_pages_per_qpl;
+	__be16 rx_pages_per_qpl;
+	u8  mac[ETH_ALEN];
+	__be16 num_device_options;
+	__be16 total_length;
+	u8  reserved2[6];
+};
+
+GVE_CHECK_STRUCT_LEN(40, gve_device_descriptor);
+
+struct gve_device_option {
+	__be16 option_id;
+	__be16 option_length;
+	__be32 required_features_mask;
+};
+
+GVE_CHECK_STRUCT_LEN(8, gve_device_option);
+
+struct gve_device_option_gqi_rda {
+	__be32 supported_features_mask;
+};
+
+GVE_CHECK_STRUCT_LEN(4, gve_device_option_gqi_rda);
+
+struct gve_device_option_gqi_qpl {
+	__be32 supported_features_mask;
+};
+
+GVE_CHECK_STRUCT_LEN(4, gve_device_option_gqi_qpl);
+
+struct gve_device_option_dqo_rda {
+	__be32 supported_features_mask;
+	__be16 tx_comp_ring_entries;
+	__be16 rx_buff_ring_entries;
+};
+
+GVE_CHECK_STRUCT_LEN(8, gve_device_option_dqo_rda);
+
+struct gve_device_option_jumbo_frames {
+	__be32 supported_features_mask;
+	__be16 max_mtu;
+	u8 padding[2];
+};
+
+GVE_CHECK_STRUCT_LEN(8, gve_device_option_jumbo_frames);
+
+/* Terminology:
+ *
+ * RDA - Raw DMA Addressing - Buffers associated with SKBs are directly DMA
+ *       mapped and read/updated by the device.
+ *
+ * QPL - Queue Page Lists - Driver uses bounce buffers which are DMA mapped with
+ *       the device for read/write and data is copied from/to SKBs.
+ */
+enum gve_dev_opt_id {
+	GVE_DEV_OPT_ID_GQI_RAW_ADDRESSING = 0x1,
+	GVE_DEV_OPT_ID_GQI_RDA = 0x2,
+	GVE_DEV_OPT_ID_GQI_QPL = 0x3,
+	GVE_DEV_OPT_ID_DQO_RDA = 0x4,
+	GVE_DEV_OPT_ID_JUMBO_FRAMES = 0x8,
+};
+
+enum gve_dev_opt_req_feat_mask {
+	GVE_DEV_OPT_REQ_FEAT_MASK_GQI_RAW_ADDRESSING = 0x0,
+	GVE_DEV_OPT_REQ_FEAT_MASK_GQI_RDA = 0x0,
+	GVE_DEV_OPT_REQ_FEAT_MASK_GQI_QPL = 0x0,
+	GVE_DEV_OPT_REQ_FEAT_MASK_DQO_RDA = 0x0,
+	GVE_DEV_OPT_REQ_FEAT_MASK_JUMBO_FRAMES = 0x0,
+};
+
+enum gve_sup_feature_mask {
+	GVE_SUP_JUMBO_FRAMES_MASK = 1 << 2,
+};
+
+#define GVE_DEV_OPT_LEN_GQI_RAW_ADDRESSING 0x0
+
+struct gve_adminq_configure_device_resources {
+	__be64 counter_array;
+	__be64 irq_db_addr;
+	__be32 num_counters;
+	__be32 num_irq_dbs;
+	__be32 irq_db_stride;
+	__be32 ntfy_blk_msix_base_idx;
+	u8 queue_format;
+	u8 padding[7];
+};
+
+GVE_CHECK_STRUCT_LEN(40, gve_adminq_configure_device_resources);
+
+struct gve_adminq_register_page_list {
+	__be32 page_list_id;
+	__be32 num_pages;
+	__be64 page_address_list_addr;
+};
+
+GVE_CHECK_STRUCT_LEN(16, gve_adminq_register_page_list);
+
+struct gve_adminq_unregister_page_list {
+	__be32 page_list_id;
+};
+
+GVE_CHECK_STRUCT_LEN(4, gve_adminq_unregister_page_list);
+
+#define GVE_RAW_ADDRESSING_QPL_ID 0xFFFFFFFF
+
+struct gve_adminq_create_tx_queue {
+	__be32 queue_id;
+	__be32 reserved;
+	__be64 queue_resources_addr;
+	__be64 tx_ring_addr;
+	__be32 queue_page_list_id;
+	__be32 ntfy_id;
+	__be64 tx_comp_ring_addr;
+	__be16 tx_ring_size;
+	__be16 tx_comp_ring_size;
+	u8 padding[4];
+};
+
+GVE_CHECK_STRUCT_LEN(48, gve_adminq_create_tx_queue);
+
+struct gve_adminq_create_rx_queue {
+	__be32 queue_id;
+	__be32 index;
+	__be32 reserved;
+	__be32 ntfy_id;
+	__be64 queue_resources_addr;
+	__be64 rx_desc_ring_addr;
+	__be64 rx_data_ring_addr;
+	__be32 queue_page_list_id;
+	__be16 rx_ring_size;
+	__be16 packet_buffer_size;
+	__be16 rx_buff_ring_size;
+	u8 enable_rsc;
+	u8 padding[5];
+};
+
+GVE_CHECK_STRUCT_LEN(56, gve_adminq_create_rx_queue);
+
+/* Queue resources that are shared with the device */
+struct gve_queue_resources {
+	union {
+		struct {
+			__be32 db_index;	/* Device -> Guest */
+			__be32 counter_index;	/* Device -> Guest */
+		};
+		u8 reserved[64];
+	};
+};
+
+GVE_CHECK_STRUCT_LEN(64, gve_queue_resources);
+
+struct gve_adminq_destroy_tx_queue {
+	__be32 queue_id;
+};
+
+GVE_CHECK_STRUCT_LEN(4, gve_adminq_destroy_tx_queue);
+
+struct gve_adminq_destroy_rx_queue {
+	__be32 queue_id;
+};
+
+GVE_CHECK_STRUCT_LEN(4, gve_adminq_destroy_rx_queue);
+
+/* GVE Set Driver Parameter Types */
+enum gve_set_driver_param_types {
+	GVE_SET_PARAM_MTU	= 0x1,
+};
+
+struct gve_adminq_set_driver_parameter {
+	__be32 parameter_type;
+	u8 reserved[4];
+	__be64 parameter_value;
+};
+
+GVE_CHECK_STRUCT_LEN(16, gve_adminq_set_driver_parameter);
+
+struct gve_adminq_report_stats {
+	__be64 stats_report_len;
+	__be64 stats_report_addr;
+	__be64 interval;
+};
+
+GVE_CHECK_STRUCT_LEN(24, gve_adminq_report_stats);
+
+struct gve_adminq_report_link_speed {
+	__be64 link_speed_address;
+};
+
+GVE_CHECK_STRUCT_LEN(8, gve_adminq_report_link_speed);
+
+struct stats {
+	__be32 stat_name;
+	__be32 queue_id;
+	__be64 value;
+};
+
+GVE_CHECK_STRUCT_LEN(16, stats);
+
+struct gve_stats_report {
+	__be64 written_count;
+	struct stats stats[];
+};
+
+GVE_CHECK_STRUCT_LEN(8, gve_stats_report);
+
+enum gve_stat_names {
+	/* stats from gve */
+	TX_WAKE_CNT			= 1,
+	TX_STOP_CNT			= 2,
+	TX_FRAMES_SENT			= 3,
+	TX_BYTES_SENT			= 4,
+	TX_LAST_COMPLETION_PROCESSED	= 5,
+	RX_NEXT_EXPECTED_SEQUENCE	= 6,
+	RX_BUFFERS_POSTED		= 7,
+	TX_TIMEOUT_CNT			= 8,
+	/* stats from NIC */
+	RX_QUEUE_DROP_CNT		= 65,
+	RX_NO_BUFFERS_POSTED		= 66,
+	RX_DROPS_PACKET_OVER_MRU	= 67,
+	RX_DROPS_INVALID_CHECKSUM	= 68,
+};
+
+enum gve_l3_type {
+	/* Must be zero so zero initialized LUT is unknown. */
+	GVE_L3_TYPE_UNKNOWN = 0,
+	GVE_L3_TYPE_OTHER,
+	GVE_L3_TYPE_IPV4,
+	GVE_L3_TYPE_IPV6,
+};
+
+enum gve_l4_type {
+	/* Must be zero so zero initialized LUT is unknown. */
+	GVE_L4_TYPE_UNKNOWN = 0,
+	GVE_L4_TYPE_OTHER,
+	GVE_L4_TYPE_TCP,
+	GVE_L4_TYPE_UDP,
+	GVE_L4_TYPE_ICMP,
+	GVE_L4_TYPE_SCTP,
+};
+
+/* These are control path types for PTYPE which are the same as the data path
+ * types.
+ */
+struct gve_ptype_entry {
+	u8 l3_type;
+	u8 l4_type;
+};
+
+struct gve_ptype_map {
+	struct gve_ptype_entry ptypes[1 << 10]; /* PTYPES are always 10 bits. */
+};
+
+struct gve_adminq_get_ptype_map {
+	__be64 ptype_map_len;
+	__be64 ptype_map_addr;
+};
+
+union gve_adminq_command {
+	struct {
+		__be32 opcode;
+		__be32 status;
+		union {
+			struct gve_adminq_configure_device_resources
+						configure_device_resources;
+			struct gve_adminq_create_tx_queue create_tx_queue;
+			struct gve_adminq_create_rx_queue create_rx_queue;
+			struct gve_adminq_destroy_tx_queue destroy_tx_queue;
+			struct gve_adminq_destroy_rx_queue destroy_rx_queue;
+			struct gve_adminq_describe_device describe_device;
+			struct gve_adminq_register_page_list reg_page_list;
+			struct gve_adminq_unregister_page_list unreg_page_list;
+			struct gve_adminq_set_driver_parameter set_driver_param;
+			struct gve_adminq_report_stats report_stats;
+			struct gve_adminq_report_link_speed report_link_speed;
+			struct gve_adminq_get_ptype_map get_ptype_map;
+		};
+	};
+	u8 reserved[64];
+};
+
+GVE_CHECK_UNION_LEN(64, gve_adminq_command);
+
+int gve_adminq_alloc(struct gve_priv *priv);
+void gve_adminq_free(struct gve_priv *priv);
+void gve_adminq_release(struct gve_priv *priv);
+int gve_adminq_describe_device(struct gve_priv *priv);
+int gve_adminq_configure_device_resources(struct gve_priv *priv,
+					  dma_addr_t counter_array_bus_addr,
+					  u32 num_counters,
+					  dma_addr_t db_array_bus_addr,
+					  u32 num_ntfy_blks);
+int gve_adminq_deconfigure_device_resources(struct gve_priv *priv);
+int gve_adminq_create_tx_queues(struct gve_priv *priv, u32 num_queues);
+int gve_adminq_destroy_tx_queues(struct gve_priv *priv, u32 queue_id);
+int gve_adminq_create_rx_queues(struct gve_priv *priv, u32 num_queues);
+int gve_adminq_destroy_rx_queues(struct gve_priv *priv, u32 queue_id);
+int gve_adminq_register_page_list(struct gve_priv *priv,
+				  struct gve_queue_page_list *qpl);
+int gve_adminq_unregister_page_list(struct gve_priv *priv, u32 page_list_id);
+int gve_adminq_set_mtu(struct gve_priv *priv, u64 mtu);
+int gve_adminq_report_stats(struct gve_priv *priv, u64 stats_report_len,
+			    dma_addr_t stats_report_addr, u64 interval);
+int gve_adminq_report_link_speed(struct gve_priv *priv);
+
+struct gve_ptype_lut;
+int gve_adminq_get_ptype_map_dqo(struct gve_priv *priv,
+				 struct gve_ptype_lut *ptype_lut);
+
+#endif /* _GVE_ADMINQ_H */
diff --git a/drivers/net/gve/base/gve_desc.h b/drivers/net/gve/base/gve_desc.h
new file mode 100644
index 0000000000..358755b7e0
--- /dev/null
+++ b/drivers/net/gve/base/gve_desc.h
@@ -0,0 +1,137 @@
+/* SPDX-License-Identifier: MIT
+ * Google Virtual Ethernet (gve) driver
+ * Version: 1.3.0
+ * Copyright (C) 2015-2022 Google, Inc.
+ */
+
+/* GVE Transmit Descriptor formats */
+
+#ifndef _GVE_DESC_H_
+#define _GVE_DESC_H_
+
+/* A note on seg_addrs
+ *
+ * Base addresses encoded in seg_addr are not assumed to be physical
+ * addresses. The ring format assumes these come from some linear address
+ * space. This could be physical memory, kernel virtual memory, user virtual
+ * memory.
+ * If raw dma addressing is not supported then gVNIC uses lists of registered
+ * pages. Each queue is assumed to be associated with a single such linear
+ * address space to ensure a consistent meaning for seg_addrs posted to its
+ * rings.
+ */
+
+struct gve_tx_pkt_desc {
+	u8	type_flags;  /* desc type is lower 4 bits, flags upper */
+	u8	l4_csum_offset;  /* relative offset of L4 csum word */
+	u8	l4_hdr_offset;  /* Offset of start of L4 headers in packet */
+	u8	desc_cnt;  /* Total descriptors for this packet */
+	__be16	len;  /* Total length of this packet (in bytes) */
+	__be16	seg_len;  /* Length of this descriptor's segment */
+	__be64	seg_addr;  /* Base address (see note) of this segment */
+} __packed;
+
+struct gve_tx_mtd_desc {
+	u8      type_flags;     /* type is lower 4 bits, subtype upper  */
+	u8      path_state;     /* state is lower 4 bits, hash type upper */
+	__be16  reserved0;
+	__be32  path_hash;
+	__be64  reserved1;
+} __packed;
+
+struct gve_tx_seg_desc {
+	u8	type_flags;	/* type is lower 4 bits, flags upper	*/
+	u8	l3_offset;	/* TSO: 2 byte units to start of IPH	*/
+	__be16	reserved;
+	__be16	mss;		/* TSO MSS				*/
+	__be16	seg_len;
+	__be64	seg_addr;
+} __packed;
+
+/* GVE Transmit Descriptor Types */
+#define	GVE_TXD_STD		(0x0 << 4) /* Std with Host Address	*/
+#define	GVE_TXD_TSO		(0x1 << 4) /* TSO with Host Address	*/
+#define	GVE_TXD_SEG		(0x2 << 4) /* Seg with Host Address	*/
+#define	GVE_TXD_MTD		(0x3 << 4) /* Metadata			*/
+
+/* GVE Transmit Descriptor Flags for Std Pkts */
+#define	GVE_TXF_L4CSUM	BIT(0)	/* Need csum offload */
+#define	GVE_TXF_TSTAMP	BIT(2)	/* Timestamp required */
+
+/* GVE Transmit Descriptor Flags for TSO Segs */
+#define	GVE_TXSF_IPV6	BIT(1)	/* IPv6 TSO */
+
+/* GVE Transmit Descriptor Options for MTD Segs */
+#define GVE_MTD_SUBTYPE_PATH		0
+
+#define GVE_MTD_PATH_STATE_DEFAULT	0
+#define GVE_MTD_PATH_STATE_TIMEOUT	1
+#define GVE_MTD_PATH_STATE_CONGESTION	2
+#define GVE_MTD_PATH_STATE_RETRANSMIT	3
+
+#define GVE_MTD_PATH_HASH_NONE         (0x0 << 4)
+#define GVE_MTD_PATH_HASH_L4           (0x1 << 4)
+
+/* GVE Receive Packet Descriptor */
+/* The start of an ethernet packet comes 2 bytes into the rx buffer.
+ * gVNIC adds this padding so that both the DMA and the L3/4 protocol header
+ * access is aligned.
+ */
+#define GVE_RX_PAD 2
+
+struct gve_rx_desc {
+	u8	padding[48];
+	__be32	rss_hash;  /* Receive-side scaling hash (Toeplitz for gVNIC) */
+	__be16	mss;
+	__be16	reserved;  /* Reserved to zero */
+	u8	hdr_len;  /* Header length (L2-L4) including padding */
+	u8	hdr_off;  /* 64-byte-scaled offset into RX_DATA entry */
+	__sum16	csum;  /* 1's-complement partial checksum of L3+ bytes */
+	__be16	len;  /* Length of the received packet */
+	__be16	flags_seq;  /* Flags [15:3] and sequence number [2:0] (1-7) */
+} __packed;
+GVE_CHECK_STRUCT_LEN(64, gve_rx_desc);
+
+/* If the device supports raw dma addressing then the addr in data slot is
+ * the dma address of the buffer.
+ * If the device only supports registered segments then the addr is a byte
+ * offset into the registered segment (an ordered list of pages) where the
+ * buffer is.
+ */
+union gve_rx_data_slot {
+	__be64 qpl_offset;
+	__be64 addr;
+};
+
+/* GVE Receive Packet Descriptor Seq No */
+#define GVE_SEQNO(x) (be16_to_cpu(x) & 0x7)
+
+/* GVE Receive Packet Descriptor Flags */
+#define GVE_RXFLG(x)	cpu_to_be16(1 << (3 + (x)))
+#define	GVE_RXF_FRAG		GVE_RXFLG(3)	/* IP Fragment			*/
+#define	GVE_RXF_IPV4		GVE_RXFLG(4)	/* IPv4				*/
+#define	GVE_RXF_IPV6		GVE_RXFLG(5)	/* IPv6				*/
+#define	GVE_RXF_TCP		GVE_RXFLG(6)	/* TCP Packet			*/
+#define	GVE_RXF_UDP		GVE_RXFLG(7)	/* UDP Packet			*/
+#define	GVE_RXF_ERR		GVE_RXFLG(8)	/* Packet Error Detected	*/
+#define	GVE_RXF_PKT_CONT	GVE_RXFLG(10)	/* Multi Fragment RX packet	*/
+
+/* GVE IRQ */
+#define GVE_IRQ_ACK	BIT(31)
+#define GVE_IRQ_MASK	BIT(30)
+#define GVE_IRQ_EVENT	BIT(29)
+
+static inline bool gve_needs_rss(__be16 flag)
+{
+	if (flag & GVE_RXF_FRAG)
+		return false;
+	if (flag & (GVE_RXF_IPV4 | GVE_RXF_IPV6))
+		return true;
+	return false;
+}
+
+static inline u8 gve_next_seqno(u8 seq)
+{
+	return (seq + 1) == 8 ? 1 : seq + 1;
+}
+#endif /* _GVE_DESC_H_ */
diff --git a/drivers/net/gve/base/gve_desc_dqo.h b/drivers/net/gve/base/gve_desc_dqo.h
new file mode 100644
index 0000000000..0d533abcd1
--- /dev/null
+++ b/drivers/net/gve/base/gve_desc_dqo.h
@@ -0,0 +1,254 @@
+/* SPDX-License-Identifier: MIT
+ * Google Virtual Ethernet (gve) driver
+ * Version: 1.3.0
+ * Copyright (C) 2015-2022 Google, Inc.
+ */
+
+/* GVE DQO Descriptor formats */
+
+#ifndef _GVE_DESC_DQO_H_
+#define _GVE_DESC_DQO_H_
+
+#define GVE_TX_MAX_HDR_SIZE_DQO 255
+#define GVE_TX_MIN_TSO_MSS_DQO 88
+
+#ifndef __LITTLE_ENDIAN_BITFIELD
+#error "Only little endian supported"
+#endif
+
+/* Basic TX descriptor (DTYPE 0x0C) */
+struct gve_tx_pkt_desc_dqo {
+	__le64 buf_addr;
+
+	/* Must be GVE_TX_PKT_DESC_DTYPE_DQO (0xc) */
+	u8 dtype: 5;
+
+	/* Denotes the last descriptor of a packet. */
+	u8 end_of_packet: 1;
+	u8 checksum_offload_enable: 1;
+
+	/* If set, will generate a descriptor completion for this descriptor. */
+	u8 report_event: 1;
+	u8 reserved0;
+	__le16 reserved1;
+
+	/* The TX completion associated with this packet will contain this tag.
+	 */
+	__le16 compl_tag;
+	u16 buf_size: 14;
+	u16 reserved2: 2;
+} __packed;
+GVE_CHECK_STRUCT_LEN(16, gve_tx_pkt_desc_dqo);
+
+#define GVE_TX_PKT_DESC_DTYPE_DQO 0xc
+#define GVE_TX_MAX_BUF_SIZE_DQO ((16 * 1024) - 1)
+
+/* Maximum number of data descriptors allowed per packet, or per-TSO segment. */
+#define GVE_TX_MAX_DATA_DESCS 10
+
+/* Min gap between tail and head to avoid cacheline overlap */
+#define GVE_TX_MIN_DESC_PREVENT_CACHE_OVERLAP 4
+
+/* "report_event" on TX packet descriptors may only be reported on the last
+ * descriptor of a TX packet, and they must be spaced apart with at least this
+ * value.
+ */
+#define GVE_TX_MIN_RE_INTERVAL 32
+
+struct gve_tx_context_cmd_dtype {
+	u8 dtype: 5;
+	u8 tso: 1;
+	u8 reserved1: 2;
+
+	u8 reserved2;
+};
+
+GVE_CHECK_STRUCT_LEN(2, gve_tx_context_cmd_dtype);
+
+/* TX Native TSO Context DTYPE (0x05)
+ *
+ * "flex" fields allow the driver to send additional packet context to HW.
+ */
+struct gve_tx_tso_context_desc_dqo {
+	/* The L4 payload bytes that should be segmented. */
+	u32 tso_total_len: 24;
+	u32 flex10: 8;
+
+	/* Max segment size in TSO excluding headers. */
+	u16 mss: 14;
+	u16 reserved: 2;
+
+	u8 header_len; /* Header length to use for TSO offload */
+	u8 flex11;
+	struct gve_tx_context_cmd_dtype cmd_dtype;
+	u8 flex0;
+	u8 flex5;
+	u8 flex6;
+	u8 flex7;
+	u8 flex8;
+	u8 flex9;
+} __packed;
+GVE_CHECK_STRUCT_LEN(16, gve_tx_tso_context_desc_dqo);
+
+#define GVE_TX_TSO_CTX_DESC_DTYPE_DQO 0x5
+
+/* General context descriptor for sending metadata. */
+struct gve_tx_general_context_desc_dqo {
+	u8 flex4;
+	u8 flex5;
+	u8 flex6;
+	u8 flex7;
+	u8 flex8;
+	u8 flex9;
+	u8 flex10;
+	u8 flex11;
+	struct gve_tx_context_cmd_dtype cmd_dtype;
+	u16 reserved;
+	u8 flex0;
+	u8 flex1;
+	u8 flex2;
+	u8 flex3;
+} __packed;
+GVE_CHECK_STRUCT_LEN(16, gve_tx_general_context_desc_dqo);
+
+#define GVE_TX_GENERAL_CTX_DESC_DTYPE_DQO 0x4
+
+/* Logical structure of metadata which is packed into context descriptor flex
+ * fields.
+ */
+struct gve_tx_metadata_dqo {
+	union {
+		struct {
+			u8 version;
+
+			/* If `skb->l4_hash` is set, this value should be
+			 * derived from `skb->hash`.
+			 *
+			 * A zero value means no l4_hash was associated with the
+			 * skb.
+			 */
+			u16 path_hash: 15;
+
+			/* Should be set to 1 if the flow associated with the
+			 * skb had a rehash from the TCP stack.
+			 */
+			u16 rehash_event: 1;
+		}  __packed;
+		u8 bytes[12];
+	};
+}  __packed;
+GVE_CHECK_STRUCT_LEN(12, gve_tx_metadata_dqo);
+
+#define GVE_TX_METADATA_VERSION_DQO 0
+
+/* TX completion descriptor */
+struct gve_tx_compl_desc {
+	/* For types 0-4 this is the TX queue ID associated with this
+	 * completion.
+	 */
+	u16 id: 11;
+
+	/* See: GVE_COMPL_TYPE_DQO* */
+	u16 type: 3;
+	u16 reserved0: 1;
+
+	/* Flipped by HW to notify the descriptor is populated. */
+	u16 generation: 1;
+	union {
+		/* For descriptor completions, this is the last index fetched
+		 * by HW + 1.
+		 */
+		__le16 tx_head;
+
+		/* For packet completions, this is the completion tag set on the
+		 * TX packet descriptors.
+		 */
+		__le16 completion_tag;
+	};
+	__le32 reserved1;
+} __packed;
+GVE_CHECK_STRUCT_LEN(8, gve_tx_compl_desc);
+
+#define GVE_COMPL_TYPE_DQO_PKT 0x2 /* Packet completion */
+#define GVE_COMPL_TYPE_DQO_DESC 0x4 /* Descriptor completion */
+#define GVE_COMPL_TYPE_DQO_MISS 0x1 /* Miss path completion */
+#define GVE_COMPL_TYPE_DQO_REINJECTION 0x3 /* Re-injection completion */
+
+/* Descriptor to post buffers to HW on buffer queue. */
+struct gve_rx_desc_dqo {
+	__le16 buf_id; /* ID returned in Rx completion descriptor */
+	__le16 reserved0;
+	__le32 reserved1;
+	__le64 buf_addr; /* DMA address of the buffer */
+	__le64 header_buf_addr;
+	__le64 reserved2;
+} __packed;
+GVE_CHECK_STRUCT_LEN(32, gve_rx_desc_dqo);
+
+/* Descriptor for HW to notify SW of new packets received on RX queue. */
+struct gve_rx_compl_desc_dqo {
+	/* Must be 1 */
+	u8 rxdid: 4;
+	u8 reserved0: 4;
+
+	/* Packet originated from this system rather than the network. */
+	u8 loopback: 1;
+	/* Set when IPv6 packet contains a destination options header or routing
+	 * header.
+	 */
+	u8 ipv6_ex_add: 1;
+	/* Invalid packet was received. */
+	u8 rx_error: 1;
+	u8 reserved1: 5;
+
+	u16 packet_type: 10;
+	u16 ip_hdr_err: 1;
+	u16 udp_len_err: 1;
+	u16 raw_cs_invalid: 1;
+	u16 reserved2: 3;
+
+	u16 packet_len: 14;
+	/* Flipped by HW to notify the descriptor is populated. */
+	u16 generation: 1;
+	/* Should be zero. */
+	u16 buffer_queue_id: 1;
+
+	u16 header_len: 10;
+	u16 rsc: 1;
+	u16 split_header: 1;
+	u16 reserved3: 4;
+
+	u8 descriptor_done: 1;
+	u8 end_of_packet: 1;
+	u8 header_buffer_overflow: 1;
+	u8 l3_l4_processed: 1;
+	u8 csum_ip_err: 1;
+	u8 csum_l4_err: 1;
+	u8 csum_external_ip_err: 1;
+	u8 csum_external_udp_err: 1;
+
+	u8 status_error1;
+
+	__le16 reserved5;
+	__le16 buf_id; /* Buffer ID which was sent on the buffer queue. */
+
+	union {
+		/* Packet checksum. */
+		__le16 raw_cs;
+		/* Segment length for RSC packets. */
+		__le16 rsc_seg_len;
+	};
+	__le32 hash;
+	__le32 reserved6;
+	__le64 reserved7;
+} __packed;
+
+GVE_CHECK_STRUCT_LEN(32, gve_rx_compl_desc_dqo);
+
+/* Ringing the doorbell too often can hurt performance.
+ *
+ * HW requires this value to be at least 8.
+ */
+#define GVE_RX_BUF_THRESH_DQO 32
+
+#endif /* _GVE_DESC_DQO_H_ */
diff --git a/drivers/net/gve/base/gve_register.h b/drivers/net/gve/base/gve_register.h
new file mode 100644
index 0000000000..b65f336be2
--- /dev/null
+++ b/drivers/net/gve/base/gve_register.h
@@ -0,0 +1,28 @@
+/* SPDX-License-Identifier: MIT
+ * Google Virtual Ethernet (gve) driver
+ * Version: 1.3.0
+ * Copyright (C) 2015-2022 Google, Inc.
+ */
+
+#ifndef _GVE_REGISTER_H_
+#define _GVE_REGISTER_H_
+
+/* Fixed Configuration Registers */
+struct gve_registers {
+	__be32	device_status;
+	__be32	driver_status;
+	__be32	max_tx_queues;
+	__be32	max_rx_queues;
+	__be32	adminq_pfn;
+	__be32	adminq_doorbell;
+	__be32	adminq_event_counter;
+	u8	reserved[3];
+	u8	driver_version;
+};
+
+enum gve_device_status_flags {
+	GVE_DEVICE_STATUS_RESET_MASK		= BIT(1),
+	GVE_DEVICE_STATUS_LINK_STATUS_MASK	= BIT(2),
+	GVE_DEVICE_STATUS_REPORT_STATS_MASK	= BIT(3),
+};
+#endif /* _GVE_REGISTER_H_ */
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 192+ messages in thread
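
The admin queue and descriptor headers in the patch above pin every wire
structure to an exact size with GVE_CHECK_STRUCT_LEN/GVE_CHECK_UNION_LEN
(the macros themselves are added in gve_osdep.h by the next patch). Below is
a minimal, standalone sketch of how that divide-by-zero trick fails the build
on a size mismatch; the struct name "demo_desc" is purely illustrative:

	#include <stdint.h>

	/* Copied from gve_osdep.h: divides by zero unless sizeof(struct X) == n */
	#define GVE_CHECK_STRUCT_LEN(n, X) enum gve_static_assert_enum_##X \
		{ gve_static_assert_##X = (n) / ((sizeof(struct X) == (n)) ? 1 : 0) }

	struct demo_desc {
		uint32_t id;
		uint32_t len;
	};

	GVE_CHECK_STRUCT_LEN(8, demo_desc);	/* builds: struct is 8 bytes */
	/* GVE_CHECK_STRUCT_LEN(16, demo_desc); -- would fail to compile */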

* [PATCH v6 2/8] net/gve/base: add OS specific implementation
  2022-10-20 10:36                   ` [PATCH v6 0/8] introduce GVE PMD Junfeng Guo
  2022-10-20 10:36                     ` [PATCH v6 1/8] net/gve/base: introduce GVE PMD base code Junfeng Guo
@ 2022-10-20 10:36                     ` Junfeng Guo
  2022-10-20 10:36                     ` [PATCH v6 3/8] net/gve: add support for device initialization Junfeng Guo
                                       ` (6 subsequent siblings)
  8 siblings, 0 replies; 192+ messages in thread
From: Junfeng Guo @ 2022-10-20 10:36 UTC (permalink / raw)
  To: qi.z.zhang, jingjing.wu, ferruh.yigit, beilei.xing
  Cc: dev, xiaoyun.li, awogbemila, bruce.richardson, hemant.agrawal,
	stephen, chenbo.xia, helin.zhang, Junfeng Guo, Haiyue Wang

Add some macro definitions and memory operations that are specific
to DPDK.

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com>
Signed-off-by: Junfeng Guo <junfeng.guo@intel.com>
---
 drivers/net/gve/base/gve_adminq.h   |   2 +
 drivers/net/gve/base/gve_desc.h     |   2 +
 drivers/net/gve/base/gve_desc_dqo.h |   2 +
 drivers/net/gve/base/gve_osdep.h    | 159 ++++++++++++++++++++++++++++
 drivers/net/gve/base/gve_register.h |   2 +
 5 files changed, 167 insertions(+)
 create mode 100644 drivers/net/gve/base/gve_osdep.h

diff --git a/drivers/net/gve/base/gve_adminq.h b/drivers/net/gve/base/gve_adminq.h
index c7114cc883..cd496760ae 100644
--- a/drivers/net/gve/base/gve_adminq.h
+++ b/drivers/net/gve/base/gve_adminq.h
@@ -8,6 +8,8 @@
 #ifndef _GVE_ADMINQ_H
 #define _GVE_ADMINQ_H
 
+#include "gve_osdep.h"
+
 /* Admin queue opcodes */
 enum gve_adminq_opcodes {
 	GVE_ADMINQ_DESCRIBE_DEVICE		= 0x1,
diff --git a/drivers/net/gve/base/gve_desc.h b/drivers/net/gve/base/gve_desc.h
index 358755b7e0..627b9120dc 100644
--- a/drivers/net/gve/base/gve_desc.h
+++ b/drivers/net/gve/base/gve_desc.h
@@ -9,6 +9,8 @@
 #ifndef _GVE_DESC_H_
 #define _GVE_DESC_H_
 
+#include "gve_osdep.h"
+
 /* A note on seg_addrs
  *
  * Base addresses encoded in seg_addr are not assumed to be physical
diff --git a/drivers/net/gve/base/gve_desc_dqo.h b/drivers/net/gve/base/gve_desc_dqo.h
index 0d533abcd1..5031752b43 100644
--- a/drivers/net/gve/base/gve_desc_dqo.h
+++ b/drivers/net/gve/base/gve_desc_dqo.h
@@ -9,6 +9,8 @@
 #ifndef _GVE_DESC_DQO_H_
 #define _GVE_DESC_DQO_H_
 
+#include "gve_osdep.h"
+
 #define GVE_TX_MAX_HDR_SIZE_DQO 255
 #define GVE_TX_MIN_TSO_MSS_DQO 88
 
diff --git a/drivers/net/gve/base/gve_osdep.h b/drivers/net/gve/base/gve_osdep.h
new file mode 100644
index 0000000000..7cb73002f4
--- /dev/null
+++ b/drivers/net/gve/base/gve_osdep.h
@@ -0,0 +1,159 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2022 Intel Corporation
+ */
+
+#ifndef _GVE_OSDEP_H_
+#define _GVE_OSDEP_H_
+
+#include <string.h>
+#include <stdint.h>
+#include <stdio.h>
+#include <stdarg.h>
+#include <inttypes.h>
+#include <stdbool.h>
+
+#include <rte_bitops.h>
+#include <rte_byteorder.h>
+#include <rte_common.h>
+#include <rte_ether.h>
+#include <rte_io.h>
+#include <rte_log.h>
+#include <rte_malloc.h>
+#include <rte_memcpy.h>
+#include <rte_memzone.h>
+
+#include "../gve_logs.h"
+
+typedef uint8_t u8;
+typedef uint16_t u16;
+typedef uint32_t u32;
+typedef uint64_t u64;
+
+typedef rte_be16_t __sum16;
+
+typedef rte_be16_t __be16;
+typedef rte_be32_t __be32;
+typedef rte_be64_t __be64;
+
+typedef rte_iova_t dma_addr_t;
+
+#define ETH_MIN_MTU	RTE_ETHER_MIN_MTU
+#define ETH_ALEN	RTE_ETHER_ADDR_LEN
+
+#ifndef PAGE_SHIFT
+#define PAGE_SHIFT	12
+#endif
+#ifndef PAGE_SIZE
+#define PAGE_SIZE	(1UL << PAGE_SHIFT)
+#endif
+
+#define BIT(nr)		RTE_BIT32(nr)
+
+#define be16_to_cpu(x) rte_be_to_cpu_16(x)
+#define be32_to_cpu(x) rte_be_to_cpu_32(x)
+#define be64_to_cpu(x) rte_be_to_cpu_64(x)
+
+#define cpu_to_be16(x) rte_cpu_to_be_16(x)
+#define cpu_to_be32(x) rte_cpu_to_be_32(x)
+#define cpu_to_be64(x) rte_cpu_to_be_64(x)
+
+#define READ_ONCE32(x) rte_read32(&(x))
+
+#ifndef ____cacheline_aligned
+#define ____cacheline_aligned	__rte_cache_aligned
+#endif
+#ifndef __packed
+#define __packed		__rte_packed
+#endif
+#define __iomem
+
+#define msleep(ms)		rte_delay_ms(ms)
+
+/* These macros are used to generate compilation errors if a struct/union
+ * is not exactly the correct length. It gives a divide by zero error if
+ * the struct/union is not of the correct size, otherwise it creates an
+ * enum that is never used.
+ */
+#define GVE_CHECK_STRUCT_LEN(n, X) enum gve_static_assert_enum_##X \
+	{ gve_static_assert_##X = (n) / ((sizeof(struct X) == (n)) ? 1 : 0) }
+#define GVE_CHECK_UNION_LEN(n, X) enum gve_static_asset_enum_##X \
+	{ gve_static_assert_##X = (n) / ((sizeof(union X) == (n)) ? 1 : 0) }
+
+static __rte_always_inline u8
+readb(volatile void *addr)
+{
+	return rte_read8(addr);
+}
+
+static __rte_always_inline void
+writeb(u8 value, volatile void *addr)
+{
+	rte_write8(value, addr);
+}
+
+static __rte_always_inline void
+writel(u32 value, volatile void *addr)
+{
+	rte_write32(value, addr);
+}
+
+static __rte_always_inline u32
+ioread32be(const volatile void *addr)
+{
+	return rte_be_to_cpu_32(rte_read32(addr));
+}
+
+static __rte_always_inline void
+iowrite32be(u32 value, volatile void *addr)
+{
+	writel(rte_cpu_to_be_32(value), addr);
+}
+
+/* DMA memory allocation tracking */
+struct gve_dma_mem {
+	void *va;
+	rte_iova_t pa;
+	uint32_t size;
+	const void *zone;
+};
+
+static inline void *
+gve_alloc_dma_mem(struct gve_dma_mem *mem, u64 size)
+{
+	static uint16_t gve_dma_memzone_id;
+	const struct rte_memzone *mz = NULL;
+	char z_name[RTE_MEMZONE_NAMESIZE];
+
+	if (!mem)
+		return NULL;
+
+	snprintf(z_name, sizeof(z_name), "gve_dma_%u",
+		 __atomic_fetch_add(&gve_dma_memzone_id, 1, __ATOMIC_RELAXED));
+	mz = rte_memzone_reserve_aligned(z_name, size, SOCKET_ID_ANY,
+					 RTE_MEMZONE_IOVA_CONTIG,
+					 PAGE_SIZE);
+	if (!mz)
+		return NULL;
+
+	mem->size = size;
+	mem->va = mz->addr;
+	mem->pa = mz->iova;
+	mem->zone = mz;
+	PMD_DRV_LOG(DEBUG, "memzone %s is allocated", mz->name);
+
+	return mem->va;
+}
+
+static inline void
+gve_free_dma_mem(struct gve_dma_mem *mem)
+{
+	PMD_DRV_LOG(DEBUG, "memzone %s to be freed",
+		    ((const struct rte_memzone *)mem->zone)->name);
+
+	rte_memzone_free(mem->zone);
+	mem->zone = NULL;
+	mem->va = NULL;
+	mem->pa = 0;
+}
+
+#endif /* _GVE_OSDEP_H_ */
diff --git a/drivers/net/gve/base/gve_register.h b/drivers/net/gve/base/gve_register.h
index b65f336be2..a599c1a08e 100644
--- a/drivers/net/gve/base/gve_register.h
+++ b/drivers/net/gve/base/gve_register.h
@@ -7,6 +7,8 @@
 #ifndef _GVE_REGISTER_H_
 #define _GVE_REGISTER_H_
 
+#include "gve_osdep.h"
+
 /* Fixed Configuration Registers */
 struct gve_registers {
 	__be32	device_status;
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 192+ messages in thread
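
The gve_alloc_dma_mem()/gve_free_dma_mem() helpers above back every
device-visible buffer with an IOVA-contiguous memzone: the host reads and
writes through the returned virtual address, while the device is handed the
physical address kept in gve_dma_mem.pa. A short usage sketch, mirroring what
gve_adminq_report_link_speed() in the previous patch does; the function name
demo_read_device_u64() and the placeholder comment are illustrative only:

	#include <errno.h>
	#include "gve_osdep.h"

	static int demo_read_device_u64(u64 *out)
	{
		struct gve_dma_mem mem;
		u64 *val;

		val = gve_alloc_dma_mem(&mem, sizeof(*val));
		if (val == NULL)
			return -ENOMEM;

		/* Hand mem.pa to the device here (e.g. inside an admin queue
		 * command) and wait for the command to complete.
		 */

		*out = be64_to_cpu(*val);	/* device writes big endian */
		gve_free_dma_mem(&mem);		/* releases the memzone */
		return 0;
	}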

* [PATCH v6 3/8] net/gve: add support for device initialization
  2022-10-20 10:36                   ` [PATCH v6 0/8] introduce GVE PMD Junfeng Guo
  2022-10-20 10:36                     ` [PATCH v6 1/8] net/gve/base: introduce GVE PMD base code Junfeng Guo
  2022-10-20 10:36                     ` [PATCH v6 2/8] net/gve/base: add OS specific implementation Junfeng Guo
@ 2022-10-20 10:36                     ` Junfeng Guo
  2022-10-20 14:42                       ` Ferruh Yigit
  2022-10-20 10:36                     ` [PATCH v6 4/8] net/gve: add support for link update Junfeng Guo
                                       ` (5 subsequent siblings)
  8 siblings, 1 reply; 192+ messages in thread
From: Junfeng Guo @ 2022-10-20 10:36 UTC (permalink / raw)
  To: qi.z.zhang, jingjing.wu, ferruh.yigit, beilei.xing
  Cc: dev, xiaoyun.li, awogbemila, bruce.richardson, hemant.agrawal,
	stephen, chenbo.xia, helin.zhang, Junfeng Guo, Haiyue Wang

Support device init and add the following dev_ops skeleton:
 - dev_configure
 - dev_start
 - dev_stop
 - dev_close

Note that build system (including doc) is also added in this patch.

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com>
Signed-off-by: Junfeng Guo <junfeng.guo@intel.com>
---
 MAINTAINERS                            |   6 +
 doc/guides/nics/features/gve.ini       |  10 +
 doc/guides/nics/gve.rst                |  68 +++++
 doc/guides/nics/index.rst              |   1 +
 doc/guides/rel_notes/release_22_11.rst |   5 +
 drivers/net/gve/base/gve_adminq.c      |   1 +
 drivers/net/gve/gve_ethdev.c           | 371 +++++++++++++++++++++++++
 drivers/net/gve/gve_ethdev.h           | 225 +++++++++++++++
 drivers/net/gve/gve_logs.h             |  14 +
 drivers/net/gve/meson.build            |  14 +
 drivers/net/gve/version.map            |   3 +
 drivers/net/meson.build                |   1 +
 12 files changed, 719 insertions(+)
 create mode 100644 doc/guides/nics/features/gve.ini
 create mode 100644 doc/guides/nics/gve.rst
 create mode 100644 drivers/net/gve/gve_ethdev.c
 create mode 100644 drivers/net/gve/gve_ethdev.h
 create mode 100644 drivers/net/gve/gve_logs.h
 create mode 100644 drivers/net/gve/meson.build
 create mode 100644 drivers/net/gve/version.map

diff --git a/MAINTAINERS b/MAINTAINERS
index 92b381bc30..2d06a76efe 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -697,6 +697,12 @@ F: drivers/net/enic/
 F: doc/guides/nics/enic.rst
 F: doc/guides/nics/features/enic.ini
 
+Google Virtual Ethernet
+M: Junfeng Guo <junfeng.guo@intel.com>
+F: drivers/net/gve/
+F: doc/guides/nics/gve.rst
+F: doc/guides/nics/features/gve.ini
+
 Hisilicon hns3
 M: Dongdong Liu <liudongdong3@huawei.com>
 M: Yisen Zhuang <yisen.zhuang@huawei.com>
diff --git a/doc/guides/nics/features/gve.ini b/doc/guides/nics/features/gve.ini
new file mode 100644
index 0000000000..44aec28009
--- /dev/null
+++ b/doc/guides/nics/features/gve.ini
@@ -0,0 +1,10 @@
+;
+; Supported features of the Google Virtual Ethernet 'gve' poll mode driver.
+;
+; Refer to default.ini for the full list of available PMD features.
+;
+[Features]
+Linux                = Y
+x86-32               = Y
+x86-64               = Y
+Usage doc            = Y
diff --git a/doc/guides/nics/gve.rst b/doc/guides/nics/gve.rst
new file mode 100644
index 0000000000..703fbcc5de
--- /dev/null
+++ b/doc/guides/nics/gve.rst
@@ -0,0 +1,68 @@
+..  SPDX-License-Identifier: BSD-3-Clause
+    Copyright(C) 2022 Intel Corporation.
+
+GVE poll mode driver
+=======================
+
+The GVE PMD (**librte_net_gve**) provides poll mode driver support for
+the Google Virtual Ethernet device (also known as gVNIC).
+
+gVNIC is the standard virtual Ethernet interface on Google Cloud Platform
+(GCP), and is one of the virtual network interface types offered by the
+leading cloud service providers (CSPs).
+
+Please refer to https://cloud.google.com/compute/docs/networking/using-gvnic
+for the device description.
+
+A well maintained and optimized gve PMD in the DPDK community helps cloud
+instance consumers who want to run their own VNFs on GCP get a better
+experience in terms of performance and maintenance.
+
+The base code is under MIT license and based on GVE kernel driver v1.3.0.
+GVE base code files are:
+
+- gve_adminq.h
+- gve_adminq.c
+- gve_desc.h
+- gve_desc_dqo.h
+- gve_register.h
+- gve.h
+
+Please refer to https://github.com/GoogleCloudPlatform/compute-virtual-ethernet-linux/tree/v1.3.0/google/gve
+to find the original base code.
+
+GVE has 3 queue formats:
+
+- GQI_QPL - GQI with queue page list
+- GQI_RDA - GQI with raw DMA addressing
+- DQO_RDA - DQO with raw DMA addressing
+
+GQI_QPL is the queue page list mode. The driver first allocates memory and
+registers it with the hardware (Google Hypervisor/GVE backend) as a Queue
+Page List (QPL); each queue has its own QPL. On Tx, the driver copies the
+packet into QPL memory and writes the packet's offset within the QPL into
+the hardware descriptor so that the hardware can fetch the packet data.
+On Rx, the driver reads the offset from the descriptor to locate the data
+in the QPL and copies the packet out of it.
+
+The GQI_RDA queue format works like a conventional NIC: the driver puts the
+packets' physical addresses directly into the hardware descriptors.
+
+The DQO_RDA queue format has a submission and completion queue pair for
+each Tx/Rx queue. As with GQI_RDA, the driver puts the packets' physical
+addresses into the hardware descriptors.
+
+Please refer to https://www.kernel.org/doc/html/latest/networking/device_drivers/ethernet/google/gve.html
+to get more information about GVE queue formats.
+
+Features and Limitations
+------------------------
+
+In this release, the GVE PMD provides the basic functionality of packet
+reception and transmission.
+
+Currently, only the GQI_QPL and GQI_RDA queue formats are supported in the PMD.
+Jumbo frames are not supported for now; support will be added in a future
+DPDK release.
+Also, only the GQI_QPL queue format is in use on GCP, since GQI_RDA has not
+been released in production.
diff --git a/doc/guides/nics/index.rst b/doc/guides/nics/index.rst
index 32c7544968..4d40ea29a3 100644
--- a/doc/guides/nics/index.rst
+++ b/doc/guides/nics/index.rst
@@ -29,6 +29,7 @@ Network Interface Controller Drivers
     enetfec
     enic
     fm10k
+    gve
     hinic
     hns3
     i40e
diff --git a/doc/guides/rel_notes/release_22_11.rst b/doc/guides/rel_notes/release_22_11.rst
index 1c3daf141d..715013fa35 100644
--- a/doc/guides/rel_notes/release_22_11.rst
+++ b/doc/guides/rel_notes/release_22_11.rst
@@ -140,6 +140,11 @@ New Features
 
   * Made compatible with libbpf v0.8.0 (when used with libxdp).
 
+* **Added GVE net PMD**
+
+  * Added the new ``gve`` net driver for Google Virtual Ethernet devices.
+  * See the :doc:`../nics/gve` NIC guide for more details on this new driver.
+
 * **Updated AMD Pensando ionic driver.**
 
   Updated the ionic PMD with new features and improvements, including:
diff --git a/drivers/net/gve/base/gve_adminq.c b/drivers/net/gve/base/gve_adminq.c
index 2344100f1a..ceafa05286 100644
--- a/drivers/net/gve/base/gve_adminq.c
+++ b/drivers/net/gve/base/gve_adminq.c
@@ -5,6 +5,7 @@
  * Copyright(C) 2022 Intel Corporation
  */
 
+#include "../gve_ethdev.h"
 #include "gve_adminq.h"
 #include "gve_register.h"
 
diff --git a/drivers/net/gve/gve_ethdev.c b/drivers/net/gve/gve_ethdev.c
new file mode 100644
index 0000000000..f8ff3b1923
--- /dev/null
+++ b/drivers/net/gve/gve_ethdev.c
@@ -0,0 +1,371 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2022 Intel Corporation
+ */
+#include <linux/pci_regs.h>
+
+#include "gve_ethdev.h"
+#include "base/gve_adminq.h"
+#include "base/gve_register.h"
+
+const char gve_version_str[] = GVE_VERSION;
+static const char gve_version_prefix[] = GVE_VERSION_PREFIX;
+
+static void
+gve_write_version(uint8_t *driver_version_register)
+{
+	const char *c = gve_version_prefix;
+
+	while (*c) {
+		writeb(*c, driver_version_register);
+		c++;
+	}
+
+	c = gve_version_str;
+	while (*c) {
+		writeb(*c, driver_version_register);
+		c++;
+	}
+	writeb('\n', driver_version_register);
+}
+
+static int
+gve_dev_configure(__rte_unused struct rte_eth_dev *dev)
+{
+	return 0;
+}
+
+static int
+gve_dev_start(struct rte_eth_dev *dev)
+{
+	dev->data->dev_started = 1;
+
+	return 0;
+}
+
+static int
+gve_dev_stop(struct rte_eth_dev *dev)
+{
+	dev->data->dev_link.link_status = RTE_ETH_LINK_DOWN;
+	dev->data->dev_started = 0;
+
+	return 0;
+}
+
+static int
+gve_dev_close(struct rte_eth_dev *dev)
+{
+	int err = 0;
+
+	if (dev->data->dev_started) {
+		err = gve_dev_stop(dev);
+		if (err != 0)
+			PMD_DRV_LOG(ERR, "Failed to stop dev.");
+	}
+
+	return err;
+}
+
+static const struct eth_dev_ops gve_eth_dev_ops = {
+	.dev_configure        = gve_dev_configure,
+	.dev_start            = gve_dev_start,
+	.dev_stop             = gve_dev_stop,
+	.dev_close            = gve_dev_close,
+};
+
+static void
+gve_free_counter_array(struct gve_priv *priv)
+{
+	rte_memzone_free(priv->cnt_array_mz);
+	priv->cnt_array = NULL;
+}
+
+static void
+gve_free_irq_db(struct gve_priv *priv)
+{
+	rte_memzone_free(priv->irq_dbs_mz);
+	priv->irq_dbs = NULL;
+}
+
+static void
+gve_teardown_device_resources(struct gve_priv *priv)
+{
+	int err;
+
+	/* Tell device its resources are being freed */
+	if (gve_get_device_resources_ok(priv)) {
+		err = gve_adminq_deconfigure_device_resources(priv);
+		if (err)
+			PMD_DRV_LOG(ERR, "Could not deconfigure device resources: err=%d", err);
+	}
+	gve_free_counter_array(priv);
+	gve_free_irq_db(priv);
+	gve_clear_device_resources_ok(priv);
+}
+
+static uint8_t
+pci_dev_find_capability(struct rte_pci_device *pdev, int cap)
+{
+	uint8_t pos, id;
+	uint16_t ent;
+	int loops;
+	int ret;
+
+	ret = rte_pci_read_config(pdev, &pos, sizeof(pos), PCI_CAPABILITY_LIST);
+	if (ret != sizeof(pos))
+		return 0;
+
+	loops = (PCI_CFG_SPACE_SIZE - PCI_STD_HEADER_SIZEOF) / PCI_CAP_SIZEOF;
+
+	while (pos && loops--) {
+		ret = rte_pci_read_config(pdev, &ent, sizeof(ent), pos);
+		if (ret != sizeof(ent))
+			return 0;
+
+		id = ent & 0xff;
+		if (id == 0xff)
+			break;
+
+		if (id == cap)
+			return pos;
+
+		pos = (ent >> 8);
+	}
+
+	return 0;
+}
+
+static int
+pci_dev_msix_vec_count(struct rte_pci_device *pdev)
+{
+	uint8_t msix_cap = pci_dev_find_capability(pdev, PCI_CAP_ID_MSIX);
+	uint16_t control;
+	int ret;
+
+	if (!msix_cap)
+		return 0;
+
+	ret = rte_pci_read_config(pdev, &control, sizeof(control), msix_cap + PCI_MSIX_FLAGS);
+	if (ret != sizeof(control))
+		return 0;
+
+	return (control & PCI_MSIX_FLAGS_QSIZE) + 1;
+}
+
+static int
+gve_setup_device_resources(struct gve_priv *priv)
+{
+	char z_name[RTE_MEMZONE_NAMESIZE];
+	const struct rte_memzone *mz;
+	int err = 0;
+
+	snprintf(z_name, sizeof(z_name), "gve_%s_cnt_arr", priv->pci_dev->device.name);
+	mz = rte_memzone_reserve_aligned(z_name,
+					 priv->num_event_counters * sizeof(*priv->cnt_array),
+					 rte_socket_id(), RTE_MEMZONE_IOVA_CONTIG,
+					 PAGE_SIZE);
+	if (mz == NULL) {
+		PMD_DRV_LOG(ERR, "Could not alloc memzone for count array");
+		return -ENOMEM;
+	}
+	priv->cnt_array = (rte_be32_t *)mz->addr;
+	priv->cnt_array_mz = mz;
+
+	snprintf(z_name, sizeof(z_name), "gve_%s_irqmz", priv->pci_dev->device.name);
+	mz = rte_memzone_reserve_aligned(z_name,
+					 sizeof(*priv->irq_dbs) * (priv->num_ntfy_blks),
+					 rte_socket_id(), RTE_MEMZONE_IOVA_CONTIG,
+					 PAGE_SIZE);
+	if (mz == NULL) {
+		PMD_DRV_LOG(ERR, "Could not alloc memzone for irq_dbs");
+		err = -ENOMEM;
+		goto free_cnt_array;
+	}
+	priv->irq_dbs = (struct gve_irq_db *)mz->addr;
+	priv->irq_dbs_mz = mz;
+
+	err = gve_adminq_configure_device_resources(priv,
+						    priv->cnt_array_mz->iova,
+						    priv->num_event_counters,
+						    priv->irq_dbs_mz->iova,
+						    priv->num_ntfy_blks);
+	if (unlikely(err)) {
+		PMD_DRV_LOG(ERR, "Could not config device resources: err=%d", err);
+		goto free_irq_dbs;
+	}
+	return 0;
+
+free_irq_dbs:
+	gve_free_irq_db(priv);
+free_cnt_array:
+	gve_free_counter_array(priv);
+
+	return err;
+}
+
+static int
+gve_init_priv(struct gve_priv *priv, bool skip_describe_device)
+{
+	int num_ntfy;
+	int err;
+
+	/* Set up the adminq */
+	err = gve_adminq_alloc(priv);
+	if (err) {
+		PMD_DRV_LOG(ERR, "Failed to alloc admin queue: err=%d", err);
+		return err;
+	}
+
+	if (skip_describe_device)
+		goto setup_device;
+
+	/* Get the initial information we need from the device */
+	err = gve_adminq_describe_device(priv);
+	if (err) {
+		PMD_DRV_LOG(ERR, "Could not get device information: err=%d", err);
+		goto free_adminq;
+	}
+
+	num_ntfy = pci_dev_msix_vec_count(priv->pci_dev);
+	if (num_ntfy <= 0) {
+		PMD_DRV_LOG(ERR, "Could not count MSI-x vectors");
+		err = -EIO;
+		goto free_adminq;
+	} else if (num_ntfy < GVE_MIN_MSIX) {
+		PMD_DRV_LOG(ERR, "GVE needs at least %d MSI-x vectors, but only has %d",
+			    GVE_MIN_MSIX, num_ntfy);
+		err = -EINVAL;
+		goto free_adminq;
+	}
+
+	priv->num_registered_pages = 0;
+
+	/* gvnic has one Notification Block per MSI-x vector, except for the
+	 * management vector
+	 */
+	priv->num_ntfy_blks = (num_ntfy - 1) & ~0x1;
+	priv->mgmt_msix_idx = priv->num_ntfy_blks;
+
+	priv->max_nb_txq = RTE_MIN(priv->max_nb_txq, priv->num_ntfy_blks / 2);
+	priv->max_nb_rxq = RTE_MIN(priv->max_nb_rxq, priv->num_ntfy_blks / 2);
+
+	if (priv->default_num_queues > 0) {
+		priv->max_nb_txq = RTE_MIN(priv->default_num_queues, priv->max_nb_txq);
+		priv->max_nb_rxq = RTE_MIN(priv->default_num_queues, priv->max_nb_rxq);
+	}
+
+	PMD_DRV_LOG(INFO, "Max TX queues %d, Max RX queues %d",
+		    priv->max_nb_txq, priv->max_nb_rxq);
+
+setup_device:
+	err = gve_setup_device_resources(priv);
+	if (!err)
+		return 0;
+free_adminq:
+	gve_adminq_free(priv);
+	return err;
+}
+
+static void
+gve_teardown_priv_resources(struct gve_priv *priv)
+{
+	gve_teardown_device_resources(priv);
+	gve_adminq_free(priv);
+}
+
+static int
+gve_dev_init(struct rte_eth_dev *eth_dev)
+{
+	struct gve_priv *priv = eth_dev->data->dev_private;
+	int max_tx_queues, max_rx_queues;
+	struct rte_pci_device *pci_dev;
+	struct gve_registers *reg_bar;
+	rte_be32_t *db_bar;
+	int err;
+
+	eth_dev->dev_ops = &gve_eth_dev_ops;
+
+	if (rte_eal_process_type() != RTE_PROC_PRIMARY)
+		return 0;
+
+	pci_dev = RTE_DEV_TO_PCI(eth_dev->device);
+
+	reg_bar = pci_dev->mem_resource[GVE_REG_BAR].addr;
+	if (!reg_bar) {
+		PMD_DRV_LOG(ERR, "Failed to map pci bar!");
+		return -ENOMEM;
+	}
+
+	db_bar = pci_dev->mem_resource[GVE_DB_BAR].addr;
+	if (!db_bar) {
+		PMD_DRV_LOG(ERR, "Failed to map doorbell bar!");
+		return -ENOMEM;
+	}
+
+	gve_write_version(&reg_bar->driver_version);
+	/* Get max queues to alloc etherdev */
+	max_tx_queues = ioread32be(&reg_bar->max_tx_queues);
+	max_rx_queues = ioread32be(&reg_bar->max_rx_queues);
+
+	priv->reg_bar0 = reg_bar;
+	priv->db_bar2 = db_bar;
+	priv->pci_dev = pci_dev;
+	priv->state_flags = 0x0;
+
+	priv->max_nb_txq = max_tx_queues;
+	priv->max_nb_rxq = max_rx_queues;
+
+	err = gve_init_priv(priv, false);
+	if (err)
+		return err;
+
+	eth_dev->data->mac_addrs = rte_zmalloc("gve_mac", sizeof(struct rte_ether_addr), 0);
+	if (!eth_dev->data->mac_addrs) {
+		PMD_DRV_LOG(ERR, "Failed to allocate memory to store mac address");
+		return -ENOMEM;
+	}
+	rte_ether_addr_copy(&priv->dev_addr, eth_dev->data->mac_addrs);
+
+	return 0;
+}
+
+static int
+gve_dev_uninit(struct rte_eth_dev *eth_dev)
+{
+	struct gve_priv *priv = eth_dev->data->dev_private;
+
+	gve_teardown_priv_resources(priv);
+
+	eth_dev->data->mac_addrs = NULL;
+
+	return 0;
+}
+
+static int
+gve_pci_probe(__rte_unused struct rte_pci_driver *pci_drv,
+	      struct rte_pci_device *pci_dev)
+{
+	return rte_eth_dev_pci_generic_probe(pci_dev, sizeof(struct gve_priv), gve_dev_init);
+}
+
+static int
+gve_pci_remove(struct rte_pci_device *pci_dev)
+{
+	return rte_eth_dev_pci_generic_remove(pci_dev, gve_dev_uninit);
+}
+
+static const struct rte_pci_id pci_id_gve_map[] = {
+	{ RTE_PCI_DEVICE(GOOGLE_VENDOR_ID, GVE_DEV_ID) },
+	{ .device_id = 0 },
+};
+
+static struct rte_pci_driver rte_gve_pmd = {
+	.id_table = pci_id_gve_map,
+	.drv_flags = RTE_PCI_DRV_NEED_MAPPING,
+	.probe = gve_pci_probe,
+	.remove = gve_pci_remove,
+};
+
+RTE_PMD_REGISTER_PCI(net_gve, rte_gve_pmd);
+RTE_PMD_REGISTER_PCI_TABLE(net_gve, pci_id_gve_map);
+RTE_PMD_REGISTER_KMOD_DEP(net_gve, "* igb_uio | vfio-pci");
+RTE_LOG_REGISTER_SUFFIX(gve_logtype_driver, driver, NOTICE);
diff --git a/drivers/net/gve/gve_ethdev.h b/drivers/net/gve/gve_ethdev.h
new file mode 100644
index 0000000000..2ac2a46ac1
--- /dev/null
+++ b/drivers/net/gve/gve_ethdev.h
@@ -0,0 +1,225 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2022 Intel Corporation
+ */
+
+#ifndef _GVE_ETHDEV_H_
+#define _GVE_ETHDEV_H_
+
+#include <ethdev_driver.h>
+#include <ethdev_pci.h>
+#include <rte_ether.h>
+
+#include "base/gve.h"
+
+#define GVE_DEFAULT_RX_FREE_THRESH  512
+#define GVE_DEFAULT_TX_FREE_THRESH  256
+#define GVE_TX_MAX_FREE_SZ          512
+
+#define GVE_MIN_BUF_SIZE	    1024
+#define GVE_MAX_RX_PKTLEN	    65535
+
+/* A list of pages registered with the device during setup and used by a queue
+ * as buffers
+ */
+struct gve_queue_page_list {
+	uint32_t id; /* unique id */
+	uint32_t num_entries;
+	dma_addr_t *page_buses; /* the dma addrs of the pages */
+	const struct rte_memzone *mz;
+};
+
+/* A TX desc ring entry */
+union gve_tx_desc {
+	struct gve_tx_pkt_desc pkt; /* first desc for a packet */
+	struct gve_tx_seg_desc seg; /* subsequent descs for a packet */
+};
+
+struct gve_tx_queue {
+	volatile union gve_tx_desc *tx_desc_ring;
+	const struct rte_memzone *mz;
+	uint64_t tx_ring_phys_addr;
+
+	uint16_t nb_tx_desc;
+
+	/* Only valid for DQO_QPL queue format */
+	struct gve_queue_page_list *qpl;
+
+	uint16_t port_id;
+	uint16_t queue_id;
+
+	uint16_t ntfy_id;
+	volatile rte_be32_t *ntfy_addr;
+
+	struct gve_priv *hw;
+	const struct rte_memzone *qres_mz;
+	struct gve_queue_resources *qres;
+
+	/* Only valid for DQO_RDA queue format */
+	struct gve_tx_queue *complq;
+};
+
+struct gve_rx_queue {
+	volatile struct gve_rx_desc *rx_desc_ring;
+	volatile union gve_rx_data_slot *rx_data_ring;
+	const struct rte_memzone *mz;
+	const struct rte_memzone *data_mz;
+	uint64_t rx_ring_phys_addr;
+
+	uint16_t nb_rx_desc;
+
+	volatile rte_be32_t *ntfy_addr;
+
+	/* only valid for GQI_QPL queue format */
+	struct gve_queue_page_list *qpl;
+
+	struct gve_priv *hw;
+	const struct rte_memzone *qres_mz;
+	struct gve_queue_resources *qres;
+
+	uint16_t port_id;
+	uint16_t queue_id;
+	uint16_t ntfy_id;
+	uint16_t rx_buf_len;
+
+	/* Only valid for DQO_RDA queue format */
+	struct gve_rx_queue *bufq;
+};
+
+struct gve_priv {
+	struct gve_irq_db *irq_dbs; /* array of num_ntfy_blks */
+	const struct rte_memzone *irq_dbs_mz;
+	uint32_t mgmt_msix_idx;
+	rte_be32_t *cnt_array; /* array of num_event_counters */
+	const struct rte_memzone *cnt_array_mz;
+
+	uint16_t num_event_counters;
+	uint16_t tx_desc_cnt; /* txq size */
+	uint16_t rx_desc_cnt; /* rxq size */
+	uint16_t tx_pages_per_qpl; /* tx buffer length */
+	uint16_t rx_data_slot_cnt; /* rx buffer length */
+
+	/* Only valid for DQO_RDA queue format */
+	uint16_t tx_compq_size; /* tx completion queue size */
+	uint16_t rx_bufq_size; /* rx buff queue size */
+
+	uint64_t max_registered_pages;
+	uint64_t num_registered_pages; /* num pages registered with NIC */
+	uint16_t default_num_queues; /* default num queues to set up */
+	enum gve_queue_format queue_format; /* see enum gve_queue_format */
+	uint8_t enable_rsc;
+
+	uint16_t max_nb_txq;
+	uint16_t max_nb_rxq;
+	uint32_t num_ntfy_blks; /* split between TX and RX so must be even */
+
+	struct gve_registers __iomem *reg_bar0; /* see gve_register.h */
+	rte_be32_t __iomem *db_bar2; /* "array" of doorbells */
+	struct rte_pci_device *pci_dev;
+
+	/* Admin queue - see gve_adminq.h*/
+	union gve_adminq_command *adminq;
+	struct gve_dma_mem adminq_dma_mem;
+	uint32_t adminq_mask; /* masks prod_cnt to adminq size */
+	uint32_t adminq_prod_cnt; /* free-running count of AQ cmds executed */
+	uint32_t adminq_cmd_fail; /* free-running count of AQ cmds failed */
+	uint32_t adminq_timeouts; /* free-running count of AQ cmds timeouts */
+	/* free-running count of per AQ cmd executed */
+	uint32_t adminq_describe_device_cnt;
+	uint32_t adminq_cfg_device_resources_cnt;
+	uint32_t adminq_register_page_list_cnt;
+	uint32_t adminq_unregister_page_list_cnt;
+	uint32_t adminq_create_tx_queue_cnt;
+	uint32_t adminq_create_rx_queue_cnt;
+	uint32_t adminq_destroy_tx_queue_cnt;
+	uint32_t adminq_destroy_rx_queue_cnt;
+	uint32_t adminq_dcfg_device_resources_cnt;
+	uint32_t adminq_set_driver_parameter_cnt;
+	uint32_t adminq_report_stats_cnt;
+	uint32_t adminq_report_link_speed_cnt;
+	uint32_t adminq_get_ptype_map_cnt;
+
+	volatile uint32_t state_flags;
+
+	/* Gvnic device link speed from hypervisor. */
+	uint64_t link_speed;
+
+	uint16_t max_mtu;
+	struct rte_ether_addr dev_addr; /* mac address */
+
+	struct gve_queue_page_list *qpl;
+
+	struct gve_tx_queue **txqs;
+	struct gve_rx_queue **rxqs;
+};
+
+static inline bool
+gve_is_gqi(struct gve_priv *priv)
+{
+	return priv->queue_format == GVE_GQI_RDA_FORMAT ||
+		priv->queue_format == GVE_GQI_QPL_FORMAT;
+}
+
+static inline bool
+gve_get_admin_queue_ok(struct gve_priv *priv)
+{
+	return !!rte_bit_relaxed_get32(GVE_PRIV_FLAGS_ADMIN_QUEUE_OK,
+				       &priv->state_flags);
+}
+
+static inline void
+gve_set_admin_queue_ok(struct gve_priv *priv)
+{
+	rte_bit_relaxed_set32(GVE_PRIV_FLAGS_ADMIN_QUEUE_OK,
+			      &priv->state_flags);
+}
+
+static inline void
+gve_clear_admin_queue_ok(struct gve_priv *priv)
+{
+	rte_bit_relaxed_clear32(GVE_PRIV_FLAGS_ADMIN_QUEUE_OK,
+				&priv->state_flags);
+}
+
+static inline bool
+gve_get_device_resources_ok(struct gve_priv *priv)
+{
+	return !!rte_bit_relaxed_get32(GVE_PRIV_FLAGS_DEVICE_RESOURCES_OK,
+				       &priv->state_flags);
+}
+
+static inline void
+gve_set_device_resources_ok(struct gve_priv *priv)
+{
+	rte_bit_relaxed_set32(GVE_PRIV_FLAGS_DEVICE_RESOURCES_OK,
+			      &priv->state_flags);
+}
+
+static inline void
+gve_clear_device_resources_ok(struct gve_priv *priv)
+{
+	rte_bit_relaxed_clear32(GVE_PRIV_FLAGS_DEVICE_RESOURCES_OK,
+				&priv->state_flags);
+}
+
+static inline bool
+gve_get_device_rings_ok(struct gve_priv *priv)
+{
+	return !!rte_bit_relaxed_get32(GVE_PRIV_FLAGS_DEVICE_RINGS_OK,
+				       &priv->state_flags);
+}
+
+static inline void
+gve_set_device_rings_ok(struct gve_priv *priv)
+{
+	rte_bit_relaxed_set32(GVE_PRIV_FLAGS_DEVICE_RINGS_OK,
+			      &priv->state_flags);
+}
+
+static inline void
+gve_clear_device_rings_ok(struct gve_priv *priv)
+{
+	rte_bit_relaxed_clear32(GVE_PRIV_FLAGS_DEVICE_RINGS_OK,
+				&priv->state_flags);
+}
+
+#endif /* _GVE_ETHDEV_H_ */
diff --git a/drivers/net/gve/gve_logs.h b/drivers/net/gve/gve_logs.h
new file mode 100644
index 0000000000..0d02da46e1
--- /dev/null
+++ b/drivers/net/gve/gve_logs.h
@@ -0,0 +1,14 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2022 Intel Corporation
+ */
+
+#ifndef _GVE_LOGS_H_
+#define _GVE_LOGS_H_
+
+extern int gve_logtype_driver;
+
+#define PMD_DRV_LOG(level, fmt, args...) \
+	rte_log(RTE_LOG_ ## level, gve_logtype_driver, "%s(): " fmt "\n", \
+		__func__, ## args)
+
+#endif
diff --git a/drivers/net/gve/meson.build b/drivers/net/gve/meson.build
new file mode 100644
index 0000000000..d8ec64b3a3
--- /dev/null
+++ b/drivers/net/gve/meson.build
@@ -0,0 +1,14 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(C) 2022 Intel Corporation
+
+if is_windows
+    build = false
+    reason = 'not supported on Windows'
+    subdir_done()
+endif
+
+sources = files(
+        'base/gve_adminq.c',
+        'gve_ethdev.c',
+)
+includes += include_directories('base')
diff --git a/drivers/net/gve/version.map b/drivers/net/gve/version.map
new file mode 100644
index 0000000000..c2e0723b4c
--- /dev/null
+++ b/drivers/net/gve/version.map
@@ -0,0 +1,3 @@
+DPDK_22 {
+	local: *;
+};
diff --git a/drivers/net/meson.build b/drivers/net/meson.build
index 35bfa78dee..355dbd07e9 100644
--- a/drivers/net/meson.build
+++ b/drivers/net/meson.build
@@ -23,6 +23,7 @@ drivers = [
         'enic',
         'failsafe',
         'fm10k',
+        'gve',
         'hinic',
         'hns3',
         'i40e',
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 192+ messages in thread
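
gve_init_priv() above derives the queue limits from the MSI-X vector count:
one vector is reserved for management and the remaining notification blocks
are split evenly between Tx and Rx. A worked example, assuming the VM exposes
9 MSI-X vectors:

	num_ntfy      = 9
	num_ntfy_blks = (9 - 1) & ~0x1 = 8
	mgmt_msix_idx = num_ntfy_blks  = 8
	max_nb_txq    = min(max_tx_queues, 8 / 2)  ->  at most 4
	max_nb_rxq    = min(max_rx_queues, 8 / 2)  ->  at most 4

A non-zero default_num_queues from the device description would lower the
two limits further.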

* [PATCH v6 4/8] net/gve: add support for link update
  2022-10-20 10:36                   ` [PATCH v6 0/8] introduce GVE PMD Junfeng Guo
                                       ` (2 preceding siblings ...)
  2022-10-20 10:36                     ` [PATCH v6 3/8] net/gve: add support for device initialization Junfeng Guo
@ 2022-10-20 10:36                     ` Junfeng Guo
  2022-10-20 10:36                     ` [PATCH v6 5/8] net/gve: add support for MTU setting Junfeng Guo
                                       ` (4 subsequent siblings)
  8 siblings, 0 replies; 192+ messages in thread
From: Junfeng Guo @ 2022-10-20 10:36 UTC (permalink / raw)
  To: qi.z.zhang, jingjing.wu, ferruh.yigit, beilei.xing
  Cc: dev, xiaoyun.li, awogbemila, bruce.richardson, hemant.agrawal,
	stephen, chenbo.xia, helin.zhang, Junfeng Guo

Support dev_ops link_update.

Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com>
Signed-off-by: Junfeng Guo <junfeng.guo@intel.com>
---
 doc/guides/nics/features/gve.ini |  1 +
 doc/guides/nics/gve.rst          |  3 +++
 drivers/net/gve/gve_ethdev.c     | 30 ++++++++++++++++++++++++++++++
 3 files changed, 34 insertions(+)

diff --git a/doc/guides/nics/features/gve.ini b/doc/guides/nics/features/gve.ini
index 44aec28009..ae466ad677 100644
--- a/doc/guides/nics/features/gve.ini
+++ b/doc/guides/nics/features/gve.ini
@@ -4,6 +4,7 @@
 ; Refer to default.ini for the full list of available PMD features.
 ;
 [Features]
+Link status          = Y
 Linux                = Y
 x86-32               = Y
 x86-64               = Y
diff --git a/doc/guides/nics/gve.rst b/doc/guides/nics/gve.rst
index 703fbcc5de..c42ff23841 100644
--- a/doc/guides/nics/gve.rst
+++ b/doc/guides/nics/gve.rst
@@ -60,6 +60,9 @@ Features and Limitations
 
 In this release, the GVE PMD provides the basic functionality of packet
 reception and transmission.
+Supported features of the GVE PMD are:
+
+- Link state information
 
 Currently, only the GQI_QPL and GQI_RDA queue formats are supported in the PMD.
 Jumbo frames are not supported for now; support will be added in a future
diff --git a/drivers/net/gve/gve_ethdev.c b/drivers/net/gve/gve_ethdev.c
index f8ff3b1923..ca4a467140 100644
--- a/drivers/net/gve/gve_ethdev.c
+++ b/drivers/net/gve/gve_ethdev.c
@@ -34,10 +34,39 @@ gve_dev_configure(__rte_unused struct rte_eth_dev *dev)
 	return 0;
 }
 
+static int
+gve_link_update(struct rte_eth_dev *dev, __rte_unused int wait_to_complete)
+{
+	struct gve_priv *priv = dev->data->dev_private;
+	struct rte_eth_link link;
+	int err;
+
+	memset(&link, 0, sizeof(link));
+	link.link_duplex = RTE_ETH_LINK_FULL_DUPLEX;
+	link.link_autoneg = RTE_ETH_LINK_AUTONEG;
+
+	if (!dev->data->dev_started) {
+		link.link_status = RTE_ETH_LINK_DOWN;
+		link.link_speed = RTE_ETH_SPEED_NUM_NONE;
+	} else {
+		link.link_status = RTE_ETH_LINK_UP;
+		PMD_DRV_LOG(DEBUG, "Get link status from hw");
+		err = gve_adminq_report_link_speed(priv);
+		if (err) {
+			PMD_DRV_LOG(ERR, "Failed to get link speed.");
+			priv->link_speed = RTE_ETH_SPEED_NUM_UNKNOWN;
+		}
+		link.link_speed = priv->link_speed;
+	}
+
+	return rte_eth_linkstatus_set(dev, &link);
+}
+
 static int
 gve_dev_start(struct rte_eth_dev *dev)
 {
 	dev->data->dev_started = 1;
+	gve_link_update(dev, 0);
 
 	return 0;
 }
@@ -70,6 +99,7 @@ static const struct eth_dev_ops gve_eth_dev_ops = {
 	.dev_start            = gve_dev_start,
 	.dev_stop             = gve_dev_stop,
 	.dev_close            = gve_dev_close,
+	.link_update          = gve_link_update,
 };
 
 static void
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 192+ messages in thread
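
For context, a minimal application-side sketch of how the link_update support
above is exercised through the generic ethdev API. The helper name and the
port id are illustrative assumptions, not part of the patch:

#include <stdio.h>
#include <rte_ethdev.h>

/* Query link state; this ends up in the PMD's gve_link_update() callback. */
static void
show_link(uint16_t port_id)
{
	struct rte_eth_link link;
	int ret;

	ret = rte_eth_link_get_nowait(port_id, &link);
	if (ret != 0) {
		printf("port %u: link query failed (%d)\n", port_id, ret);
		return;
	}
	printf("port %u: link %s, %u Mbps\n", port_id,
	       link.link_status == RTE_ETH_LINK_UP ? "up" : "down",
	       link.link_speed);
}

Since gve_link_update() ignores wait_to_complete, the _nowait variant behaves
the same as rte_eth_link_get() for this PMD.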

* [PATCH v6 5/8] net/gve: add support for MTU setting
  2022-10-20 10:36                   ` [PATCH v6 0/8] introduce GVE PMD Junfeng Guo
                                       ` (3 preceding siblings ...)
  2022-10-20 10:36                     ` [PATCH v6 4/8] net/gve: add support for link update Junfeng Guo
@ 2022-10-20 10:36                     ` Junfeng Guo
  2022-10-20 14:45                       ` Ferruh Yigit
  2022-10-20 10:36                     ` [PATCH v6 6/8] net/gve: add support for dev info get and dev configure Junfeng Guo
                                       ` (3 subsequent siblings)
  8 siblings, 1 reply; 192+ messages in thread
From: Junfeng Guo @ 2022-10-20 10:36 UTC (permalink / raw)
  To: qi.z.zhang, jingjing.wu, ferruh.yigit, beilei.xing
  Cc: dev, xiaoyun.li, awogbemila, bruce.richardson, hemant.agrawal,
	stephen, chenbo.xia, helin.zhang, Junfeng Guo

Support dev_ops mtu_set.

Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com>
Signed-off-by: Junfeng Guo <junfeng.guo@intel.com>
---
 doc/guides/nics/features/gve.ini |  1 +
 drivers/net/gve/gve_ethdev.c     | 27 +++++++++++++++++++++++++++
 2 files changed, 28 insertions(+)

diff --git a/doc/guides/nics/features/gve.ini b/doc/guides/nics/features/gve.ini
index ae466ad677..d1703d8dab 100644
--- a/doc/guides/nics/features/gve.ini
+++ b/doc/guides/nics/features/gve.ini
@@ -5,6 +5,7 @@
 ;
 [Features]
 Link status          = Y
+MTU update           = Y
 Linux                = Y
 x86-32               = Y
 x86-64               = Y
diff --git a/drivers/net/gve/gve_ethdev.c b/drivers/net/gve/gve_ethdev.c
index ca4a467140..1968f38eb6 100644
--- a/drivers/net/gve/gve_ethdev.c
+++ b/drivers/net/gve/gve_ethdev.c
@@ -94,12 +94,39 @@ gve_dev_close(struct rte_eth_dev *dev)
 	return err;
 }
 
+static int
+gve_dev_mtu_set(struct rte_eth_dev *dev, uint16_t mtu)
+{
+	struct gve_priv *priv = dev->data->dev_private;
+	int err;
+
+	if (mtu < RTE_ETHER_MIN_MTU || mtu > priv->max_mtu) {
+		PMD_DRV_LOG(ERR, "MIN MTU is %u MAX MTU is %u", RTE_ETHER_MIN_MTU, priv->max_mtu);
+		return -EINVAL;
+	}
+
+	/* MTU setting is not allowed while the port is started */
+	if (dev->data->dev_started) {
+		PMD_DRV_LOG(ERR, "Port must be stopped before configuration");
+		return -EBUSY;
+	}
+
+	err = gve_adminq_set_mtu(priv, mtu);
+	if (err) {
+		PMD_DRV_LOG(ERR, "Failed to set mtu as %u err = %d", mtu, err);
+		return err;
+	}
+
+	return 0;
+}
+
 static const struct eth_dev_ops gve_eth_dev_ops = {
 	.dev_configure        = gve_dev_configure,
 	.dev_start            = gve_dev_start,
 	.dev_stop             = gve_dev_stop,
 	.dev_close            = gve_dev_close,
 	.link_update          = gve_link_update,
+	.mtu_set              = gve_dev_mtu_set,
 };
 
 static void
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 192+ messages in thread
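
For context, a hedged sketch of driving the new mtu_set op from an
application. The helper name is an assumption; the call has to happen after
configure but before rte_eth_dev_start(), since the PMD returns -EBUSY on a
started port:

#include <stdio.h>
#include <rte_ethdev.h>

/* Set the MTU before starting the port; gve rejects changes afterwards. */
static int
set_port_mtu(uint16_t port_id, uint16_t mtu)
{
	int ret = rte_eth_dev_set_mtu(port_id, mtu);

	if (ret != 0)
		printf("port %u: setting MTU %u failed (%d)\n",
		       port_id, mtu, ret);
	return ret;
}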

* [PATCH v6 6/8] net/gve: add support for dev info get and dev configure
  2022-10-20 10:36                   ` [PATCH v6 0/8] introduce GVE PMD Junfeng Guo
                                       ` (4 preceding siblings ...)
  2022-10-20 10:36                     ` [PATCH v6 5/8] net/gve: add support for MTU setting Junfeng Guo
@ 2022-10-20 10:36                     ` Junfeng Guo
  2022-10-20 14:45                       ` Ferruh Yigit
  2022-10-20 10:36                     ` [PATCH v6 7/8] net/gve: add support for queue operations Junfeng Guo
                                       ` (2 subsequent siblings)
  8 siblings, 1 reply; 192+ messages in thread
From: Junfeng Guo @ 2022-10-20 10:36 UTC (permalink / raw)
  To: qi.z.zhang, jingjing.wu, ferruh.yigit, beilei.xing
  Cc: dev, xiaoyun.li, awogbemila, bruce.richardson, hemant.agrawal,
	stephen, chenbo.xia, helin.zhang, Junfeng Guo

Add dev_ops dev_infos_get.
Complete dev_configure with RX offloads configuration.

Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com>
Signed-off-by: Junfeng Guo <junfeng.guo@intel.com>
---
 doc/guides/nics/features/gve.ini |  2 ++
 doc/guides/nics/gve.rst          |  1 +
 drivers/net/gve/gve_ethdev.c     | 56 +++++++++++++++++++++++++++++++-
 3 files changed, 58 insertions(+), 1 deletion(-)

diff --git a/doc/guides/nics/features/gve.ini b/doc/guides/nics/features/gve.ini
index d1703d8dab..986df7f94a 100644
--- a/doc/guides/nics/features/gve.ini
+++ b/doc/guides/nics/features/gve.ini
@@ -4,8 +4,10 @@
 ; Refer to default.ini for the full list of available PMD features.
 ;
 [Features]
+Speed capabilities   = Y
 Link status          = Y
 MTU update           = Y
+RSS hash             = Y
 Linux                = Y
 x86-32               = Y
 x86-64               = Y
diff --git a/doc/guides/nics/gve.rst b/doc/guides/nics/gve.rst
index c42ff23841..8c09a5a7fa 100644
--- a/doc/guides/nics/gve.rst
+++ b/doc/guides/nics/gve.rst
@@ -62,6 +62,7 @@ In this release, the GVE PMD provides the basic functionality of packet
 reception and transmission.
 Supported features of the GVE PMD are:
 
+- Receiver Side Scaling (RSS)
 - Link state information
 
 Currently, only GQI_QPL and GQI_RDA queue format are supported in PMD.
diff --git a/drivers/net/gve/gve_ethdev.c b/drivers/net/gve/gve_ethdev.c
index 1968f38eb6..5be8d664f3 100644
--- a/drivers/net/gve/gve_ethdev.c
+++ b/drivers/net/gve/gve_ethdev.c
@@ -29,8 +29,13 @@ gve_write_version(uint8_t *driver_version_register)
 }
 
 static int
-gve_dev_configure(__rte_unused struct rte_eth_dev *dev)
+gve_dev_configure(struct rte_eth_dev *dev)
 {
+	struct gve_priv *priv = dev->data->dev_private;
+
+	if (dev->data->dev_conf.rxmode.offloads & RTE_ETH_RX_OFFLOAD_TCP_LRO)
+		priv->enable_rsc = 1;
+
 	return 0;
 }
 
@@ -94,6 +99,54 @@ gve_dev_close(struct rte_eth_dev *dev)
 	return err;
 }
 
+static int
+gve_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
+{
+	struct gve_priv *priv = dev->data->dev_private;
+
+	dev_info->device = dev->device;
+	dev_info->max_mac_addrs = 1;
+	dev_info->max_rx_queues = priv->max_nb_rxq;
+	dev_info->max_tx_queues = priv->max_nb_txq;
+	dev_info->min_rx_bufsize = GVE_MIN_BUF_SIZE;
+	dev_info->max_rx_pktlen = GVE_MAX_RX_PKTLEN;
+	dev_info->max_mtu = RTE_ETHER_MTU;
+	dev_info->min_mtu = RTE_ETHER_MIN_MTU;
+
+	dev_info->rx_offload_capa = 0;
+	dev_info->tx_offload_capa = 0;
+
+	if (priv->queue_format == GVE_DQO_RDA_FORMAT)
+		dev_info->rx_offload_capa |= RTE_ETH_RX_OFFLOAD_TCP_LRO;
+
+	dev_info->default_rxconf = (struct rte_eth_rxconf) {
+		.rx_free_thresh = GVE_DEFAULT_RX_FREE_THRESH,
+		.rx_drop_en = 0,
+		.offloads = 0,
+	};
+
+	dev_info->default_txconf = (struct rte_eth_txconf) {
+		.tx_free_thresh = GVE_DEFAULT_TX_FREE_THRESH,
+		.offloads = 0,
+	};
+
+	dev_info->default_rxportconf.ring_size = priv->rx_desc_cnt;
+	dev_info->rx_desc_lim = (struct rte_eth_desc_lim) {
+		.nb_max = priv->rx_desc_cnt,
+		.nb_min = priv->rx_desc_cnt,
+		.nb_align = 1,
+	};
+
+	dev_info->default_txportconf.ring_size = priv->tx_desc_cnt;
+	dev_info->tx_desc_lim = (struct rte_eth_desc_lim) {
+		.nb_max = priv->tx_desc_cnt,
+		.nb_min = priv->tx_desc_cnt,
+		.nb_align = 1,
+	};
+
+	return 0;
+}
+
 static int
 gve_dev_mtu_set(struct rte_eth_dev *dev, uint16_t mtu)
 {
@@ -125,6 +178,7 @@ static const struct eth_dev_ops gve_eth_dev_ops = {
 	.dev_start            = gve_dev_start,
 	.dev_stop             = gve_dev_stop,
 	.dev_close            = gve_dev_close,
+	.dev_infos_get        = gve_dev_info_get,
 	.link_update          = gve_link_update,
 	.mtu_set              = gve_dev_mtu_set,
 };
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 192+ messages in thread
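
As a usage sketch (not part of the patch): because the PMD reports
nb_max == nb_min in both descriptor limits, an application should read the
ring size back from dev_infos_get rather than hard-coding one. The helper
name below is an assumption:

#include <stdio.h>
#include <rte_ethdev.h>

/* Read back the fixed descriptor counts and queue limits reported by gve. */
static int
query_ring_sizes(uint16_t port_id, uint16_t *nb_rxd, uint16_t *nb_txd)
{
	struct rte_eth_dev_info dev_info;
	int ret;

	ret = rte_eth_dev_info_get(port_id, &dev_info);
	if (ret != 0)
		return ret;

	*nb_rxd = dev_info.rx_desc_lim.nb_max;
	*nb_txd = dev_info.tx_desc_lim.nb_max;
	printf("port %u: %u rx desc, %u tx desc, up to %u rx / %u tx queues\n",
	       port_id, *nb_rxd, *nb_txd,
	       dev_info.max_rx_queues, dev_info.max_tx_queues);
	return 0;
}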

* [PATCH v6 7/8] net/gve: add support for queue operations
  2022-10-20 10:36                   ` [PATCH v6 0/8] introduce GVE PMD Junfeng Guo
                                       ` (5 preceding siblings ...)
  2022-10-20 10:36                     ` [PATCH v6 6/8] net/gve: add support for dev info get and dev configure Junfeng Guo
@ 2022-10-20 10:36                     ` Junfeng Guo
  2022-10-20 10:36                     ` [PATCH v6 8/8] net/gve: add support for Rx/Tx Junfeng Guo
  2022-10-21  9:19                     ` [PATCH v7 0/8] introduce GVE PMD Junfeng Guo
  8 siblings, 0 replies; 192+ messages in thread
From: Junfeng Guo @ 2022-10-20 10:36 UTC (permalink / raw)
  To: qi.z.zhang, jingjing.wu, ferruh.yigit, beilei.xing
  Cc: dev, xiaoyun.li, awogbemila, bruce.richardson, hemant.agrawal,
	stephen, chenbo.xia, helin.zhang, Junfeng Guo

Add support for queue operations:
- setup rx/tx queue
- release rx/tx queue
- start rx/tx queues
- stop rx/tx queues

Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com>
Signed-off-by: Junfeng Guo <junfeng.guo@intel.com>
---
 drivers/net/gve/gve_ethdev.c | 204 +++++++++++++++++++++++++++++++++
 drivers/net/gve/gve_ethdev.h |  52 +++++++++
 drivers/net/gve/gve_rx.c     | 212 ++++++++++++++++++++++++++++++++++
 drivers/net/gve/gve_tx.c     | 214 +++++++++++++++++++++++++++++++++++
 drivers/net/gve/meson.build  |   2 +
 5 files changed, 684 insertions(+)
 create mode 100644 drivers/net/gve/gve_rx.c
 create mode 100644 drivers/net/gve/gve_tx.c

diff --git a/drivers/net/gve/gve_ethdev.c b/drivers/net/gve/gve_ethdev.c
index 5be8d664f3..52c335bc16 100644
--- a/drivers/net/gve/gve_ethdev.c
+++ b/drivers/net/gve/gve_ethdev.c
@@ -28,6 +28,68 @@ gve_write_version(uint8_t *driver_version_register)
 	writeb('\n', driver_version_register);
 }
 
+static int
+gve_alloc_queue_page_list(struct gve_priv *priv, uint32_t id, uint32_t pages)
+{
+	char z_name[RTE_MEMZONE_NAMESIZE];
+	struct gve_queue_page_list *qpl;
+	const struct rte_memzone *mz;
+	dma_addr_t page_bus;
+	uint32_t i;
+
+	if (priv->num_registered_pages + pages >
+	    priv->max_registered_pages) {
+		PMD_DRV_LOG(ERR, "Pages %" PRIu64 " > max registered pages %" PRIu64,
+			    priv->num_registered_pages + pages,
+			    priv->max_registered_pages);
+		return -EINVAL;
+	}
+	qpl = &priv->qpl[id];
+	snprintf(z_name, sizeof(z_name), "gve_%s_qpl%d", priv->pci_dev->device.name, id);
+	mz = rte_memzone_reserve_aligned(z_name, pages * PAGE_SIZE,
+					 rte_socket_id(),
+					 RTE_MEMZONE_IOVA_CONTIG, PAGE_SIZE);
+	if (mz == NULL) {
+		PMD_DRV_LOG(ERR, "Failed to alloc %s.", z_name);
+		return -ENOMEM;
+	}
+	qpl->page_buses = rte_zmalloc("qpl page buses", pages * sizeof(dma_addr_t), 0);
+	if (qpl->page_buses == NULL) {
+		PMD_DRV_LOG(ERR, "Failed to alloc qpl %u page buses", id);
+		return -ENOMEM;
+	}
+	page_bus = mz->iova;
+	for (i = 0; i < pages; i++) {
+		qpl->page_buses[i] = page_bus;
+		page_bus += PAGE_SIZE;
+	}
+	qpl->id = id;
+	qpl->mz = mz;
+	qpl->num_entries = pages;
+
+	priv->num_registered_pages += pages;
+
+	return 0;
+}
+
+static void
+gve_free_qpls(struct gve_priv *priv)
+{
+	uint16_t nb_txqs = priv->max_nb_txq;
+	uint16_t nb_rxqs = priv->max_nb_rxq;
+	uint32_t i;
+
+	/* QPLs exist only for the GQI_QPL queue format */
+	if (priv->qpl == NULL)
+		return;
+
+	for (i = 0; i < nb_txqs + nb_rxqs; i++) {
+		if (priv->qpl[i].mz != NULL)
+			rte_memzone_free(priv->qpl[i].mz);
+		if (priv->qpl[i].page_buses != NULL)
+			rte_free(priv->qpl[i].page_buses);
+	}
+
+	rte_free(priv->qpl);
+}
+
 static int
 gve_dev_configure(struct rte_eth_dev *dev)
 {
@@ -39,6 +101,43 @@ gve_dev_configure(struct rte_eth_dev *dev)
 	return 0;
 }
 
+static int
+gve_refill_pages(struct gve_rx_queue *rxq)
+{
+	struct rte_mbuf *nmb;
+	uint16_t i;
+	int diag;
+
+	diag = rte_pktmbuf_alloc_bulk(rxq->mpool, &rxq->sw_ring[0], rxq->nb_rx_desc);
+	if (diag < 0) {
+		for (i = 0; i < rxq->nb_rx_desc - 1; i++) {
+			nmb = rte_pktmbuf_alloc(rxq->mpool);
+			if (!nmb)
+				break;
+			rxq->sw_ring[i] = nmb;
+		}
+		if (i < rxq->nb_rx_desc - 1)
+			return -ENOMEM;
+	}
+	rxq->nb_avail = 0;
+	rxq->next_avail = rxq->nb_rx_desc - 1;
+
+	for (i = 0; i < rxq->nb_rx_desc; i++) {
+		if (rxq->is_gqi_qpl) {
+			rxq->rx_data_ring[i].addr = rte_cpu_to_be_64(i * PAGE_SIZE);
+		} else {
+			if (i == rxq->nb_rx_desc - 1)
+				break;
+			nmb = rxq->sw_ring[i];
+			rxq->rx_data_ring[i].addr = rte_cpu_to_be_64(rte_mbuf_data_iova(nmb));
+		}
+	}
+
+	rte_write32(rte_cpu_to_be_32(rxq->next_avail), rxq->qrx_tail);
+
+	return 0;
+}
+
 static int
 gve_link_update(struct rte_eth_dev *dev, __rte_unused int wait_to_complete)
 {
@@ -70,16 +169,70 @@ gve_link_update(struct rte_eth_dev *dev, __rte_unused int wait_to_complete)
 static int
 gve_dev_start(struct rte_eth_dev *dev)
 {
+	uint16_t num_queues = dev->data->nb_tx_queues;
+	struct gve_priv *priv = dev->data->dev_private;
+	struct gve_tx_queue *txq;
+	struct gve_rx_queue *rxq;
+	uint16_t i;
+	int err;
+
+	priv->txqs = (struct gve_tx_queue **)dev->data->tx_queues;
+	err = gve_adminq_create_tx_queues(priv, num_queues);
+	if (err) {
+		PMD_DRV_LOG(ERR, "failed to create %u tx queues.", num_queues);
+		return err;
+	}
+	for (i = 0; i < num_queues; i++) {
+		txq = priv->txqs[i];
+		txq->qtx_tail =
+		&priv->db_bar2[rte_be_to_cpu_32(txq->qres->db_index)];
+		txq->qtx_head =
+		&priv->cnt_array[rte_be_to_cpu_32(txq->qres->counter_index)];
+
+		rte_write32(rte_cpu_to_be_32(GVE_IRQ_MASK), txq->ntfy_addr);
+	}
+
+	num_queues = dev->data->nb_rx_queues;
+	priv->rxqs = (struct gve_rx_queue **)dev->data->rx_queues;
+	err = gve_adminq_create_rx_queues(priv, num_queues);
+	if (err) {
+		PMD_DRV_LOG(ERR, "failed to create %u rx queues.", num_queues);
+		goto err_tx;
+	}
+	for (i = 0; i < num_queues; i++) {
+		rxq = priv->rxqs[i];
+		rxq->qrx_tail =
+		&priv->db_bar2[rte_be_to_cpu_32(rxq->qres->db_index)];
+
+		rte_write32(rte_cpu_to_be_32(GVE_IRQ_MASK), rxq->ntfy_addr);
+
+		err = gve_refill_pages(rxq);
+		if (err) {
+			PMD_DRV_LOG(ERR, "Failed to refill for RX");
+			goto err_rx;
+		}
+	}
+
 	dev->data->dev_started = 1;
 	gve_link_update(dev, 0);
 
 	return 0;
+
+err_rx:
+	gve_stop_rx_queues(dev);
+err_tx:
+	gve_stop_tx_queues(dev);
+	return err;
 }
 
 static int
 gve_dev_stop(struct rte_eth_dev *dev)
 {
 	dev->data->dev_link.link_status = RTE_ETH_LINK_DOWN;
+
+	gve_stop_tx_queues(dev);
+	gve_stop_rx_queues(dev);
+
 	dev->data->dev_started = 0;
 
 	return 0;
@@ -88,7 +241,11 @@ gve_dev_stop(struct rte_eth_dev *dev)
 static int
 gve_dev_close(struct rte_eth_dev *dev)
 {
+	struct gve_priv *priv = dev->data->dev_private;
+	struct gve_tx_queue *txq;
+	struct gve_rx_queue *rxq;
 	int err = 0;
+	uint16_t i;
 
 	if (dev->data->dev_started) {
 		err = gve_dev_stop(dev);
@@ -96,6 +253,19 @@ gve_dev_close(struct rte_eth_dev *dev)
 			PMD_DRV_LOG(ERR, "Failed to stop dev.");
 	}
 
+	for (i = 0; i < dev->data->nb_tx_queues; i++) {
+		txq = dev->data->tx_queues[i];
+		gve_tx_queue_release(txq);
+	}
+
+	for (i = 0; i < dev->data->nb_rx_queues; i++) {
+		rxq = dev->data->rx_queues[i];
+		gve_rx_queue_release(rxq);
+	}
+
+	gve_free_qpls(priv);
+	rte_free(priv->adminq);
+
 	return err;
 }
 
@@ -179,6 +349,8 @@ static const struct eth_dev_ops gve_eth_dev_ops = {
 	.dev_stop             = gve_dev_stop,
 	.dev_close            = gve_dev_close,
 	.dev_infos_get        = gve_dev_info_get,
+	.rx_queue_setup       = gve_rx_queue_setup,
+	.tx_queue_setup       = gve_tx_queue_setup,
 	.link_update          = gve_link_update,
 	.mtu_set              = gve_dev_mtu_set,
 };
@@ -316,7 +488,9 @@ gve_setup_device_resources(struct gve_priv *priv)
 static int
 gve_init_priv(struct gve_priv *priv, bool skip_describe_device)
 {
+	uint16_t pages;
 	int num_ntfy;
+	uint32_t i;
 	int err;
 
 	/* Set up the adminq */
@@ -367,10 +541,40 @@ gve_init_priv(struct gve_priv *priv, bool skip_describe_device)
 	PMD_DRV_LOG(INFO, "Max TX queues %d, Max RX queues %d",
 		    priv->max_nb_txq, priv->max_nb_rxq);
 
+	/* In GQI_QPL queue format:
+	 * Allocate queue page lists according to max queue number
+	 * tx qpl id should start from 0 while rx qpl id should start
+	 * from priv->max_nb_txq
+	 */
+	if (priv->queue_format == GVE_GQI_QPL_FORMAT) {
+		priv->qpl = rte_zmalloc("gve_qpl",
+					(priv->max_nb_txq + priv->max_nb_rxq) *
+					sizeof(struct gve_queue_page_list), 0);
+		if (priv->qpl == NULL) {
+			PMD_DRV_LOG(ERR, "Failed to alloc qpl.");
+			err = -ENOMEM;
+			goto free_adminq;
+		}
+
+		for (i = 0; i < priv->max_nb_txq + priv->max_nb_rxq; i++) {
+			if (i < priv->max_nb_txq)
+				pages = priv->tx_pages_per_qpl;
+			else
+				pages = priv->rx_data_slot_cnt;
+			err = gve_alloc_queue_page_list(priv, i, pages);
+			if (err != 0) {
+				PMD_DRV_LOG(ERR, "Failed to alloc qpl %u.", i);
+				goto err_qpl;
+			}
+		}
+	}
+
 setup_device:
 	err = gve_setup_device_resources(priv);
 	if (!err)
 		return 0;
+err_qpl:
+	gve_free_qpls(priv);
 free_adminq:
 	gve_adminq_free(priv);
 	return err;
diff --git a/drivers/net/gve/gve_ethdev.h b/drivers/net/gve/gve_ethdev.h
index 2ac2a46ac1..20fe57781e 100644
--- a/drivers/net/gve/gve_ethdev.h
+++ b/drivers/net/gve/gve_ethdev.h
@@ -34,15 +34,35 @@ union gve_tx_desc {
 	struct gve_tx_seg_desc seg; /* subsequent descs for a packet */
 };
 
+struct gve_tx_iovec {
+	uint32_t iov_base; /* offset in fifo */
+	uint32_t iov_len;
+};
+
 struct gve_tx_queue {
 	volatile union gve_tx_desc *tx_desc_ring;
 	const struct rte_memzone *mz;
 	uint64_t tx_ring_phys_addr;
+	struct rte_mbuf **sw_ring;
+	volatile rte_be32_t *qtx_tail;
+	volatile rte_be32_t *qtx_head;
 
+	uint32_t tx_tail;
 	uint16_t nb_tx_desc;
+	uint16_t nb_free;
+	uint32_t next_to_clean;
+	uint16_t free_thresh;
 
 	/* Only valid for DQO_QPL queue format */
+	uint16_t sw_tail;
+	uint16_t sw_ntc;
+	uint16_t sw_nb_free;
+	uint32_t fifo_size;
+	uint32_t fifo_head;
+	uint32_t fifo_avail;
+	uint64_t fifo_base;
 	struct gve_queue_page_list *qpl;
+	struct gve_tx_iovec *iov_ring;
 
 	uint16_t port_id;
 	uint16_t queue_id;
@@ -56,6 +76,8 @@ struct gve_tx_queue {
 
 	/* Only valid for DQO_RDA queue format */
 	struct gve_tx_queue *complq;
+
+	uint8_t is_gqi_qpl;
 };
 
 struct gve_rx_queue {
@@ -64,9 +86,17 @@ struct gve_rx_queue {
 	const struct rte_memzone *mz;
 	const struct rte_memzone *data_mz;
 	uint64_t rx_ring_phys_addr;
+	struct rte_mbuf **sw_ring;
+	struct rte_mempool *mpool;
 
+	uint16_t rx_tail;
 	uint16_t nb_rx_desc;
+	uint16_t expected_seqno; /* the next expected seqno */
+	uint16_t free_thresh;
+	uint32_t next_avail;
+	uint32_t nb_avail;
 
+	volatile rte_be32_t *qrx_tail;
 	volatile rte_be32_t *ntfy_addr;
 
 	/* only valid for GQI_QPL queue format */
@@ -83,6 +113,8 @@ struct gve_rx_queue {
 
 	/* Only valid for DQO_RDA queue format */
 	struct gve_rx_queue *bufq;
+
+	uint8_t is_gqi_qpl;
 };
 
 struct gve_priv {
@@ -222,4 +254,24 @@ gve_clear_device_rings_ok(struct gve_priv *priv)
 				&priv->state_flags);
 }
 
+int
+gve_rx_queue_setup(struct rte_eth_dev *dev, uint16_t queue_id, uint16_t nb_desc,
+		   unsigned int socket_id, const struct rte_eth_rxconf *conf,
+		   struct rte_mempool *pool);
+int
+gve_tx_queue_setup(struct rte_eth_dev *dev, uint16_t queue_id, uint16_t nb_desc,
+		   unsigned int socket_id, const struct rte_eth_txconf *conf);
+
+void
+gve_tx_queue_release(void *txq);
+
+void
+gve_rx_queue_release(void *rxq);
+
+void
+gve_stop_tx_queues(struct rte_eth_dev *dev);
+
+void
+gve_stop_rx_queues(struct rte_eth_dev *dev);
+
 #endif /* _GVE_ETHDEV_H_ */
diff --git a/drivers/net/gve/gve_rx.c b/drivers/net/gve/gve_rx.c
new file mode 100644
index 0000000000..e64a461253
--- /dev/null
+++ b/drivers/net/gve/gve_rx.c
@@ -0,0 +1,212 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2022 Intel Corporation
+ */
+
+#include "gve_ethdev.h"
+#include "base/gve_adminq.h"
+
+static inline void
+gve_reset_rxq(struct gve_rx_queue *rxq)
+{
+	struct rte_mbuf **sw_ring;
+	uint32_t size, i;
+
+	if (rxq == NULL) {
+		PMD_DRV_LOG(ERR, "pointer to rxq is NULL");
+		return;
+	}
+	/* only dereference rxq after the NULL check */
+	sw_ring = rxq->sw_ring;
+
+	size = rxq->nb_rx_desc * sizeof(struct gve_rx_desc);
+	for (i = 0; i < size; i++)
+		((volatile char *)rxq->rx_desc_ring)[i] = 0;
+
+	size = rxq->nb_rx_desc * sizeof(union gve_rx_data_slot);
+	for (i = 0; i < size; i++)
+		((volatile char *)rxq->rx_data_ring)[i] = 0;
+
+	for (i = 0; i < rxq->nb_rx_desc; i++)
+		sw_ring[i] = NULL;
+
+	rxq->rx_tail = 0;
+	rxq->next_avail = 0;
+	rxq->nb_avail = rxq->nb_rx_desc;
+	rxq->expected_seqno = 1;
+}
+
+static inline void
+gve_release_rxq_mbufs(struct gve_rx_queue *rxq)
+{
+	uint16_t i;
+
+	for (i = 0; i < rxq->nb_rx_desc; i++) {
+		if (rxq->sw_ring[i]) {
+			rte_pktmbuf_free_seg(rxq->sw_ring[i]);
+			rxq->sw_ring[i] = NULL;
+		}
+	}
+
+	rxq->nb_avail = rxq->nb_rx_desc;
+}
+
+void
+gve_rx_queue_release(void *rxq)
+{
+	struct gve_rx_queue *q = rxq;
+
+	if (!q)
+		return;
+
+	if (q->is_gqi_qpl) {
+		gve_adminq_unregister_page_list(q->hw, q->qpl->id);
+		q->qpl = NULL;
+	}
+
+	gve_release_rxq_mbufs(q);
+	rte_free(q->sw_ring);
+	rte_memzone_free(q->data_mz);
+	rte_memzone_free(q->mz);
+	rte_memzone_free(q->qres_mz);
+	q->qres = NULL;
+	rte_free(q);
+}
+
+int
+gve_rx_queue_setup(struct rte_eth_dev *dev, uint16_t queue_id,
+		uint16_t nb_desc, unsigned int socket_id,
+		const struct rte_eth_rxconf *conf, struct rte_mempool *pool)
+{
+	struct gve_priv *hw = dev->data->dev_private;
+	const struct rte_memzone *mz;
+	struct gve_rx_queue *rxq;
+	uint16_t free_thresh;
+	int err = 0;
+
+	if (nb_desc != hw->rx_desc_cnt) {
+		PMD_DRV_LOG(WARNING, "gve doesn't support nb_desc config, use hw nb_desc %u.",
+			    hw->rx_desc_cnt);
+	}
+	nb_desc = hw->rx_desc_cnt;
+
+	/* Free memory if needed. */
+	if (dev->data->rx_queues[queue_id]) {
+		gve_rx_queue_release(dev->data->rx_queues[queue_id]);
+		dev->data->rx_queues[queue_id] = NULL;
+	}
+
+	/* Allocate the RX queue data structure. */
+	rxq = rte_zmalloc_socket("gve rxq",
+				 sizeof(struct gve_rx_queue),
+				 RTE_CACHE_LINE_SIZE,
+				 socket_id);
+	if (!rxq) {
+		PMD_DRV_LOG(ERR, "Failed to allocate memory for rx queue structure");
+		err = -ENOMEM;
+		goto err_rxq;
+	}
+
+	free_thresh = conf->rx_free_thresh ? conf->rx_free_thresh : GVE_DEFAULT_RX_FREE_THRESH;
+	if (free_thresh >= nb_desc) {
+		PMD_DRV_LOG(ERR, "rx_free_thresh (%u) must be less than nb_desc (%u).",
+			    free_thresh, nb_desc);
+		err = -EINVAL;
+		goto err_rxq;
+	}
+
+	rxq->nb_rx_desc = nb_desc;
+	rxq->free_thresh = free_thresh;
+	rxq->queue_id = queue_id;
+	rxq->port_id = dev->data->port_id;
+	rxq->ntfy_id = hw->num_ntfy_blks / 2 + queue_id;
+	rxq->is_gqi_qpl = hw->queue_format == GVE_GQI_QPL_FORMAT;
+	rxq->mpool = pool;
+	rxq->hw = hw;
+	rxq->ntfy_addr = &hw->db_bar2[rte_be_to_cpu_32(hw->irq_dbs[rxq->ntfy_id].id)];
+
+	rxq->rx_buf_len = rte_pktmbuf_data_room_size(rxq->mpool) - RTE_PKTMBUF_HEADROOM;
+
+	/* Allocate software ring */
+	rxq->sw_ring = rte_zmalloc_socket("gve rx sw ring", sizeof(struct rte_mbuf *) * nb_desc,
+					  RTE_CACHE_LINE_SIZE, socket_id);
+	if (!rxq->sw_ring) {
+		PMD_DRV_LOG(ERR, "Failed to allocate memory for SW RX ring");
+		err = -ENOMEM;
+		goto err_rxq;
+	}
+
+	mz = rte_eth_dma_zone_reserve(dev, "rx_ring", queue_id,
+				      nb_desc * sizeof(struct gve_rx_desc),
+				      PAGE_SIZE, socket_id);
+	if (mz == NULL) {
+		PMD_DRV_LOG(ERR, "Failed to reserve DMA memory for RX");
+		err = -ENOMEM;
+		goto err_sw_ring;
+	}
+	rxq->rx_desc_ring = (struct gve_rx_desc *)mz->addr;
+	rxq->rx_ring_phys_addr = mz->iova;
+	rxq->mz = mz;
+
+	mz = rte_eth_dma_zone_reserve(dev, "gve rx data ring", queue_id,
+				      sizeof(union gve_rx_data_slot) * nb_desc,
+				      PAGE_SIZE, socket_id);
+	if (mz == NULL) {
+		PMD_DRV_LOG(ERR, "Failed to allocate memory for RX data ring");
+		err = -ENOMEM;
+		goto err_rx_ring;
+	}
+	rxq->rx_data_ring = (union gve_rx_data_slot *)mz->addr;
+	rxq->data_mz = mz;
+	if (rxq->is_gqi_qpl) {
+		rxq->qpl = &hw->qpl[rxq->ntfy_id];
+		err = gve_adminq_register_page_list(hw, rxq->qpl);
+		if (err != 0) {
+			PMD_DRV_LOG(ERR, "Failed to register qpl %u", queue_id);
+			goto err_data_ring;
+		}
+	}
+
+	mz = rte_eth_dma_zone_reserve(dev, "rxq_res", queue_id,
+				      sizeof(struct gve_queue_resources),
+				      PAGE_SIZE, socket_id);
+	if (mz == NULL) {
+		PMD_DRV_LOG(ERR, "Failed to reserve DMA memory for RX resource");
+		err = -ENOMEM;
+		goto err_data_ring;
+	}
+	rxq->qres = (struct gve_queue_resources *)mz->addr;
+	rxq->qres_mz = mz;
+
+	gve_reset_rxq(rxq);
+
+	dev->data->rx_queues[queue_id] = rxq;
+
+	return 0;
+
+err_data_ring:
+	rte_memzone_free(rxq->data_mz);
+err_rx_ring:
+	rte_memzone_free(rxq->mz);
+err_sw_ring:
+	rte_free(rxq->sw_ring);
+err_rxq:
+	rte_free(rxq);
+	return err;
+}
+
+void
+gve_stop_rx_queues(struct rte_eth_dev *dev)
+{
+	struct gve_priv *hw = dev->data->dev_private;
+	struct gve_rx_queue *rxq;
+	uint16_t i;
+	int err;
+
+	err = gve_adminq_destroy_rx_queues(hw, dev->data->nb_rx_queues);
+	if (err != 0)
+		PMD_DRV_LOG(WARNING, "failed to destroy rxqs");
+
+	for (i = 0; i < dev->data->nb_rx_queues; i++) {
+		rxq = dev->data->rx_queues[i];
+		gve_release_rxq_mbufs(rxq);
+		gve_reset_rxq(rxq);
+	}
+}
diff --git a/drivers/net/gve/gve_tx.c b/drivers/net/gve/gve_tx.c
new file mode 100644
index 0000000000..b706b62e71
--- /dev/null
+++ b/drivers/net/gve/gve_tx.c
@@ -0,0 +1,214 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2022 Intel Corporation
+ */
+
+#include "gve_ethdev.h"
+#include "base/gve_adminq.h"
+
+static inline void
+gve_reset_txq(struct gve_tx_queue *txq)
+{
+	struct rte_mbuf **sw_ring;
+	uint32_t size, i;
+
+	if (txq == NULL) {
+		PMD_DRV_LOG(ERR, "Pointer to txq is NULL");
+		return;
+	}
+	/* only dereference txq after the NULL check */
+	sw_ring = txq->sw_ring;
+
+	size = txq->nb_tx_desc * sizeof(union gve_tx_desc);
+	for (i = 0; i < size; i++)
+		((volatile char *)txq->tx_desc_ring)[i] = 0;
+
+	for (i = 0; i < txq->nb_tx_desc; i++) {
+		sw_ring[i] = NULL;
+		if (txq->is_gqi_qpl) {
+			txq->iov_ring[i].iov_base = 0;
+			txq->iov_ring[i].iov_len = 0;
+		}
+	}
+
+	txq->tx_tail = 0;
+	txq->nb_free = txq->nb_tx_desc - 1;
+	txq->next_to_clean = 0;
+
+	if (txq->is_gqi_qpl) {
+		txq->fifo_size = PAGE_SIZE * txq->hw->tx_pages_per_qpl;
+		txq->fifo_avail = txq->fifo_size;
+		txq->fifo_head = 0;
+		txq->fifo_base = (uint64_t)(txq->qpl->mz->addr);
+
+		txq->sw_tail = 0;
+		txq->sw_nb_free = txq->nb_tx_desc - 1;
+		txq->sw_ntc = 0;
+	}
+}
+
+static inline void
+gve_release_txq_mbufs(struct gve_tx_queue *txq)
+{
+	uint16_t i;
+
+	for (i = 0; i < txq->nb_tx_desc; i++) {
+		if (txq->sw_ring[i]) {
+			rte_pktmbuf_free_seg(txq->sw_ring[i]);
+			txq->sw_ring[i] = NULL;
+		}
+	}
+}
+
+void
+gve_tx_queue_release(void *txq)
+{
+	struct gve_tx_queue *q = txq;
+
+	if (!q)
+		return;
+
+	if (q->is_gqi_qpl) {
+		gve_adminq_unregister_page_list(q->hw, q->qpl->id);
+		rte_free(q->iov_ring);
+		q->qpl = NULL;
+	}
+
+	gve_release_txq_mbufs(q);
+	rte_free(q->sw_ring);
+	rte_memzone_free(q->mz);
+	rte_memzone_free(q->qres_mz);
+	q->qres = NULL;
+	rte_free(q);
+}
+
+int
+gve_tx_queue_setup(struct rte_eth_dev *dev, uint16_t queue_id, uint16_t nb_desc,
+		   unsigned int socket_id, const struct rte_eth_txconf *conf)
+{
+	struct gve_priv *hw = dev->data->dev_private;
+	const struct rte_memzone *mz;
+	struct gve_tx_queue *txq;
+	uint16_t free_thresh;
+	int err = 0;
+
+	if (nb_desc != hw->tx_desc_cnt) {
+		PMD_DRV_LOG(WARNING, "gve doesn't support nb_desc config, use hw nb_desc %u.",
+			    hw->tx_desc_cnt);
+	}
+	nb_desc = hw->tx_desc_cnt;
+
+	/* Free memory if needed. */
+	if (dev->data->tx_queues[queue_id]) {
+		gve_tx_queue_release(dev->data->tx_queues[queue_id]);
+		dev->data->tx_queues[queue_id] = NULL;
+	}
+
+	/* Allocate the TX queue data structure. */
+	txq = rte_zmalloc_socket("gve txq", sizeof(struct gve_tx_queue),
+				 RTE_CACHE_LINE_SIZE, socket_id);
+	if (!txq) {
+		PMD_DRV_LOG(ERR, "Failed to allocate memory for tx queue structure");
+		err = -ENOMEM;
+		goto err_txq;
+	}
+
+	free_thresh = conf->tx_free_thresh ? conf->tx_free_thresh : GVE_DEFAULT_TX_FREE_THRESH;
+	if (free_thresh >= nb_desc - 3) {
+		PMD_DRV_LOG(ERR, "tx_free_thresh (%u) must be less than nb_desc (%u) minus 3.",
+			    free_thresh, nb_desc);
+		err = -EINVAL;
+		goto err_txq;
+	}
+
+	txq->nb_tx_desc = nb_desc;
+	txq->free_thresh = free_thresh;
+	txq->queue_id = queue_id;
+	txq->port_id = dev->data->port_id;
+	txq->ntfy_id = queue_id;
+	txq->is_gqi_qpl = hw->queue_format == GVE_GQI_QPL_FORMAT;
+	txq->hw = hw;
+	txq->ntfy_addr = &hw->db_bar2[rte_be_to_cpu_32(hw->irq_dbs[txq->ntfy_id].id)];
+
+	/* Allocate software ring */
+	txq->sw_ring = rte_zmalloc_socket("gve tx sw ring",
+					  sizeof(struct rte_mbuf *) * nb_desc,
+					  RTE_CACHE_LINE_SIZE, socket_id);
+	if (!txq->sw_ring) {
+		PMD_DRV_LOG(ERR, "Failed to allocate memory for SW TX ring");
+		err = -ENOMEM;
+		goto err_txq;
+	}
+
+	mz = rte_eth_dma_zone_reserve(dev, "tx_ring", queue_id,
+				      nb_desc * sizeof(union gve_tx_desc),
+				      PAGE_SIZE, socket_id);
+	if (mz == NULL) {
+		PMD_DRV_LOG(ERR, "Failed to reserve DMA memory for TX");
+		err = -ENOMEM;
+		goto err_sw_ring;
+	}
+	txq->tx_desc_ring = (union gve_tx_desc *)mz->addr;
+	txq->tx_ring_phys_addr = mz->iova;
+	txq->mz = mz;
+
+	if (txq->is_gqi_qpl) {
+		txq->iov_ring = rte_zmalloc_socket("gve tx iov ring",
+						   sizeof(struct gve_tx_iovec) * nb_desc,
+						   RTE_CACHE_LINE_SIZE, socket_id);
+		if (!txq->iov_ring) {
+			PMD_DRV_LOG(ERR, "Failed to allocate memory for TX iov ring");
+			err = -ENOMEM;
+			goto err_tx_ring;
+		}
+		txq->qpl = &hw->qpl[queue_id];
+		err = gve_adminq_register_page_list(hw, txq->qpl);
+		if (err != 0) {
+			PMD_DRV_LOG(ERR, "Failed to register qpl %u", queue_id);
+			goto err_iov_ring;
+		}
+	}
+
+	mz = rte_eth_dma_zone_reserve(dev, "txq_res", queue_id, sizeof(struct gve_queue_resources),
+				      PAGE_SIZE, socket_id);
+	if (mz == NULL) {
+		PMD_DRV_LOG(ERR, "Failed to reserve DMA memory for TX resource");
+		err = -ENOMEM;
+		goto err_iov_ring;
+	}
+	txq->qres = (struct gve_queue_resources *)mz->addr;
+	txq->qres_mz = mz;
+
+	gve_reset_txq(txq);
+
+	dev->data->tx_queues[queue_id] = txq;
+
+	return 0;
+
+err_iov_ring:
+	if (txq->is_gqi_qpl)
+		rte_free(txq->iov_ring);
+err_tx_ring:
+	rte_memzone_free(txq->mz);
+err_sw_ring:
+	rte_free(txq->sw_ring);
+err_txq:
+	rte_free(txq);
+	return err;
+}
+
+void
+gve_stop_tx_queues(struct rte_eth_dev *dev)
+{
+	struct gve_priv *hw = dev->data->dev_private;
+	struct gve_tx_queue *txq;
+	uint16_t i;
+	int err;
+
+	err = gve_adminq_destroy_tx_queues(hw, dev->data->nb_tx_queues);
+	if (err != 0)
+		PMD_DRV_LOG(WARNING, "failed to destroy txqs");
+
+	for (i = 0; i < dev->data->nb_tx_queues; i++) {
+		txq = dev->data->tx_queues[i];
+		gve_release_txq_mbufs(txq);
+		gve_reset_txq(txq);
+	}
+}
diff --git a/drivers/net/gve/meson.build b/drivers/net/gve/meson.build
index d8ec64b3a3..af0010c01c 100644
--- a/drivers/net/gve/meson.build
+++ b/drivers/net/gve/meson.build
@@ -9,6 +9,8 @@ endif
 
 sources = files(
         'base/gve_adminq.c',
+        'gve_rx.c',
+        'gve_tx.c',
         'gve_ethdev.c',
 )
 includes += include_directories('base')
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 192+ messages in thread
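
For context, a minimal bring-up sketch showing where the rx/tx_queue_setup
ops added above fit in the usual ethdev sequence. The port_init() helper, the
single-queue configuration and the externally created mb_pool are
illustrative assumptions:

#include <string.h>
#include <rte_ethdev.h>
#include <rte_mempool.h>

/* Configure one rx and one tx queue and start the port; nb_rxd/nb_txd are
 * taken from rx/tx_desc_lim since gve overrides other ring sizes anyway. */
static int
port_init(uint16_t port_id, struct rte_mempool *mb_pool,
	  uint16_t nb_rxd, uint16_t nb_txd)
{
	struct rte_eth_conf port_conf;
	int ret;

	memset(&port_conf, 0, sizeof(port_conf));
	ret = rte_eth_dev_configure(port_id, 1, 1, &port_conf);
	if (ret != 0)
		return ret;

	ret = rte_eth_rx_queue_setup(port_id, 0, nb_rxd,
				     rte_eth_dev_socket_id(port_id),
				     NULL, mb_pool);
	if (ret < 0)
		return ret;

	ret = rte_eth_tx_queue_setup(port_id, 0, nb_txd,
				     rte_eth_dev_socket_id(port_id),
				     NULL);
	if (ret < 0)
		return ret;

	return rte_eth_dev_start(port_id);
}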

* [PATCH v6 8/8] net/gve: add support for Rx/Tx
  2022-10-20 10:36                   ` [PATCH v6 0/8] introduce GVE PMD Junfeng Guo
                                       ` (6 preceding siblings ...)
  2022-10-20 10:36                     ` [PATCH v6 7/8] net/gve: add support for queue operations Junfeng Guo
@ 2022-10-20 10:36                     ` Junfeng Guo
  2022-10-20 14:47                       ` Ferruh Yigit
  2022-10-21  9:19                     ` [PATCH v7 0/8] introduce GVE PMD Junfeng Guo
  8 siblings, 1 reply; 192+ messages in thread
From: Junfeng Guo @ 2022-10-20 10:36 UTC (permalink / raw)
  To: qi.z.zhang, jingjing.wu, ferruh.yigit, beilei.xing
  Cc: dev, xiaoyun.li, awogbemila, bruce.richardson, hemant.agrawal,
	stephen, chenbo.xia, helin.zhang, Junfeng Guo

Add Rx/Tx support for the GQI_QPL and GQI_RDA queue formats.

Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com>
Signed-off-by: Junfeng Guo <junfeng.guo@intel.com>
---
 doc/guides/nics/features/gve.ini |   2 +
 doc/guides/nics/gve.rst          |   4 +
 drivers/net/gve/gve_ethdev.c     |  15 +-
 drivers/net/gve/gve_ethdev.h     |  18 ++
 drivers/net/gve/gve_rx.c         | 142 ++++++++++
 drivers/net/gve/gve_tx.c         | 455 +++++++++++++++++++++++++++++++
 6 files changed, 635 insertions(+), 1 deletion(-)

diff --git a/doc/guides/nics/features/gve.ini b/doc/guides/nics/features/gve.ini
index 986df7f94a..cdc46b08a3 100644
--- a/doc/guides/nics/features/gve.ini
+++ b/doc/guides/nics/features/gve.ini
@@ -7,7 +7,9 @@
 Speed capabilities   = Y
 Link status          = Y
 MTU update           = Y
+TSO                  = Y
 RSS hash             = Y
+L4 checksum offload  = Y
 Linux                = Y
 x86-32               = Y
 x86-64               = Y
diff --git a/doc/guides/nics/gve.rst b/doc/guides/nics/gve.rst
index 8c09a5a7fa..1042852fd6 100644
--- a/doc/guides/nics/gve.rst
+++ b/doc/guides/nics/gve.rst
@@ -62,8 +62,12 @@ In this release, the GVE PMD provides the basic functionality of packet
 reception and transmission.
 Supported features of the GVE PMD are:
 
+- Multiple queues for TX and RX
 - Receiver Side Scaling (RSS)
+- TSO offload
 - Link state information
+- TX multi-segments (Scatter TX)
+- Tx UDP/TCP/SCTP Checksum
 
 Currently, only GQI_QPL and GQI_RDA queue format are supported in PMD.
 Jumbo Frame is not supported in PMD for now. It'll be added in the future
diff --git a/drivers/net/gve/gve_ethdev.c b/drivers/net/gve/gve_ethdev.c
index 52c335bc16..91f930957d 100644
--- a/drivers/net/gve/gve_ethdev.c
+++ b/drivers/net/gve/gve_ethdev.c
@@ -284,7 +284,13 @@ gve_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
 	dev_info->min_mtu = RTE_ETHER_MIN_MTU;
 
 	dev_info->rx_offload_capa = 0;
-	dev_info->tx_offload_capa = 0;
+	dev_info->tx_offload_capa =
+		RTE_ETH_TX_OFFLOAD_MULTI_SEGS	|
+		RTE_ETH_TX_OFFLOAD_IPV4_CKSUM	|
+		RTE_ETH_TX_OFFLOAD_UDP_CKSUM	|
+		RTE_ETH_TX_OFFLOAD_TCP_CKSUM	|
+		RTE_ETH_TX_OFFLOAD_SCTP_CKSUM	|
+		RTE_ETH_TX_OFFLOAD_TCP_TSO;
 
 	if (priv->queue_format == GVE_DQO_RDA_FORMAT)
 		dev_info->rx_offload_capa |= RTE_ETH_RX_OFFLOAD_TCP_LRO;
@@ -633,6 +639,13 @@ gve_dev_init(struct rte_eth_dev *eth_dev)
 	if (err)
 		return err;
 
+	if (gve_is_gqi(priv)) {
+		eth_dev->rx_pkt_burst = gve_rx_burst;
+		eth_dev->tx_pkt_burst = gve_tx_burst;
+	} else {
+		PMD_DRV_LOG(ERR, "DQO_RDA is not implemented and will be added in the future");
+	}
+
 	eth_dev->data->mac_addrs = rte_zmalloc("gve_mac", sizeof(struct rte_ether_addr), 0);
 	if (!eth_dev->data->mac_addrs) {
 		PMD_DRV_LOG(ERR, "Failed to allocate memory to store mac address");
diff --git a/drivers/net/gve/gve_ethdev.h b/drivers/net/gve/gve_ethdev.h
index 20fe57781e..266b831a01 100644
--- a/drivers/net/gve/gve_ethdev.h
+++ b/drivers/net/gve/gve_ethdev.h
@@ -34,6 +34,18 @@ union gve_tx_desc {
 	struct gve_tx_seg_desc seg; /* subsequent descs for a packet */
 };
 
+/* Offload features */
+union gve_tx_offload {
+	uint64_t data;
+	struct {
+		uint64_t l2_len:7; /* L2 (MAC) Header Length. */
+		uint64_t l3_len:9; /* L3 (IP) Header Length. */
+		uint64_t l4_len:8; /* L4 Header Length. */
+		uint64_t tso_segsz:16; /* TCP TSO segment size */
+		/* uint64_t unused : 24; */
+	};
+};
+
 struct gve_tx_iovec {
 	uint32_t iov_base; /* offset in fifo */
 	uint32_t iov_len;
@@ -274,4 +286,10 @@ gve_stop_tx_queues(struct rte_eth_dev *dev);
 void
 gve_stop_rx_queues(struct rte_eth_dev *dev);
 
+uint16_t
+gve_rx_burst(void *rxq, struct rte_mbuf **rx_pkts, uint16_t nb_pkts);
+
+uint16_t
+gve_tx_burst(void *txq, struct rte_mbuf **tx_pkts, uint16_t nb_pkts);
+
 #endif /* _GVE_ETHDEV_H_ */
diff --git a/drivers/net/gve/gve_rx.c b/drivers/net/gve/gve_rx.c
index e64a461253..ea397d68fa 100644
--- a/drivers/net/gve/gve_rx.c
+++ b/drivers/net/gve/gve_rx.c
@@ -5,6 +5,148 @@
 #include "gve_ethdev.h"
 #include "base/gve_adminq.h"
 
+static inline void
+gve_rx_refill(struct gve_rx_queue *rxq)
+{
+	uint16_t mask = rxq->nb_rx_desc - 1;
+	uint16_t idx = rxq->next_avail & mask;
+	uint32_t next_avail = rxq->next_avail;
+	uint16_t nb_alloc, i;
+	struct rte_mbuf *nmb;
+	int diag;
+
+	/* wrap around */
+	nb_alloc = rxq->nb_rx_desc - idx;
+	if (nb_alloc <= rxq->nb_avail) {
+		diag = rte_pktmbuf_alloc_bulk(rxq->mpool, &rxq->sw_ring[idx], nb_alloc);
+		if (diag < 0) {
+			for (i = 0; i < nb_alloc; i++) {
+				nmb = rte_pktmbuf_alloc(rxq->mpool);
+				if (!nmb)
+					break;
+				rxq->sw_ring[idx + i] = nmb;
+			}
+			if (i != nb_alloc)
+				nb_alloc = i;
+		}
+		rxq->nb_avail -= nb_alloc;
+		next_avail += nb_alloc;
+
+		/* queue page list mode doesn't need real refill. */
+		if (rxq->is_gqi_qpl) {
+			idx += nb_alloc;
+		} else {
+			for (i = 0; i < nb_alloc; i++) {
+				nmb = rxq->sw_ring[idx];
+				rxq->rx_data_ring[idx].addr =
+					rte_cpu_to_be_64(rte_mbuf_data_iova(nmb));
+				idx++;
+			}
+		}
+		if (idx == rxq->nb_rx_desc)
+			idx = 0;
+	}
+
+	if (rxq->nb_avail > 0) {
+		nb_alloc = rxq->nb_avail;
+		if (rxq->nb_rx_desc < idx + rxq->nb_avail)
+			nb_alloc = rxq->nb_rx_desc - idx;
+		diag = rte_pktmbuf_alloc_bulk(rxq->mpool, &rxq->sw_ring[idx], nb_alloc);
+		if (diag < 0) {
+			for (i = 0; i < nb_alloc; i++) {
+				nmb = rte_pktmbuf_alloc(rxq->mpool);
+				if (!nmb)
+					break;
+				rxq->sw_ring[idx + i] = nmb;
+			}
+			nb_alloc = i;
+		}
+		rxq->nb_avail -= nb_alloc;
+		next_avail += nb_alloc;
+
+		if (!rxq->is_gqi_qpl) {
+			for (i = 0; i < nb_alloc; i++) {
+				nmb = rxq->sw_ring[idx];
+				rxq->rx_data_ring[idx].addr =
+					rte_cpu_to_be_64(rte_mbuf_data_iova(nmb));
+				idx++;
+			}
+		}
+	}
+
+	if (next_avail != rxq->next_avail) {
+		rte_write32(rte_cpu_to_be_32(next_avail), rxq->qrx_tail);
+		rxq->next_avail = next_avail;
+	}
+}
+
+uint16_t
+gve_rx_burst(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t nb_pkts)
+{
+	volatile struct gve_rx_desc *rxr, *rxd;
+	struct gve_rx_queue *rxq = rx_queue;
+	uint16_t rx_id = rxq->rx_tail;
+	struct rte_mbuf *rxe;
+	uint16_t nb_rx, len;
+	uint64_t addr;
+	uint16_t i;
+
+	rxr = rxq->rx_desc_ring;
+	nb_rx = 0;
+
+	for (i = 0; i < nb_pkts; i++) {
+		rxd = &rxr[rx_id];
+		if (GVE_SEQNO(rxd->flags_seq) != rxq->expected_seqno)
+			break;
+
+		if (rxd->flags_seq & GVE_RXF_ERR)
+			continue;
+
+		len = rte_be_to_cpu_16(rxd->len) - GVE_RX_PAD;
+		rxe = rxq->sw_ring[rx_id];
+		if (rxq->is_gqi_qpl) {
+			addr = (uint64_t)(rxq->qpl->mz->addr) + rx_id * PAGE_SIZE + GVE_RX_PAD;
+			rte_memcpy((void *)((size_t)rxe->buf_addr + rxe->data_off),
+				   (void *)(size_t)addr, len);
+		}
+		rxe->pkt_len = len;
+		rxe->data_len = len;
+		rxe->port = rxq->port_id;
+		rxe->ol_flags = 0;
+
+		if (rxd->flags_seq & GVE_RXF_TCP)
+			rxe->packet_type |= RTE_PTYPE_L4_TCP;
+		if (rxd->flags_seq & GVE_RXF_UDP)
+			rxe->packet_type |= RTE_PTYPE_L4_UDP;
+		if (rxd->flags_seq & GVE_RXF_IPV4)
+			rxe->packet_type |= RTE_PTYPE_L3_IPV4;
+		if (rxd->flags_seq & GVE_RXF_IPV6)
+			rxe->packet_type |= RTE_PTYPE_L3_IPV6;
+
+		if (gve_needs_rss(rxd->flags_seq)) {
+			rxe->ol_flags |= RTE_MBUF_F_RX_RSS_HASH;
+			rxe->hash.rss = rte_be_to_cpu_32(rxd->rss_hash);
+		}
+
+		rxq->expected_seqno = gve_next_seqno(rxq->expected_seqno);
+
+		rx_id++;
+		if (rx_id == rxq->nb_rx_desc)
+			rx_id = 0;
+
+		rx_pkts[nb_rx] = rxe;
+		nb_rx++;
+	}
+
+	rxq->nb_avail += nb_rx;
+	rxq->rx_tail = rx_id;
+
+	if (rxq->nb_avail > rxq->free_thresh)
+		gve_rx_refill(rxq);
+
+	return nb_rx;
+}
+
 static inline void
 gve_reset_rxq(struct gve_rx_queue *rxq)
 {
diff --git a/drivers/net/gve/gve_tx.c b/drivers/net/gve/gve_tx.c
index b706b62e71..d94b1186a4 100644
--- a/drivers/net/gve/gve_tx.c
+++ b/drivers/net/gve/gve_tx.c
@@ -5,6 +5,461 @@
 #include "gve_ethdev.h"
 #include "base/gve_adminq.h"
 
+static inline void
+gve_free_bulk_mbuf(struct rte_mbuf **txep, int num)
+{
+	struct rte_mbuf *m, *free[GVE_TX_MAX_FREE_SZ];
+	int nb_free = 0;
+	int i, s;
+
+	if (unlikely(num == 0))
+		return;
+
+	/* Find the 1st mbuf which needs to be free */
+	for (s = 0; s < num; s++) {
+		if (txep[s] != NULL) {
+			m = rte_pktmbuf_prefree_seg(txep[s]);
+			if (m != NULL)
+				break;
+			}
+	}
+
+	if (s == num)
+		return;
+
+	free[0] = m;
+	nb_free = 1;
+	for (i = s + 1; i < num; i++) {
+		if (likely(txep[i] != NULL)) {
+			m = rte_pktmbuf_prefree_seg(txep[i]);
+			if (likely(m != NULL)) {
+				if (likely(m->pool == free[0]->pool)) {
+					free[nb_free++] = m;
+				} else {
+					rte_mempool_put_bulk(free[0]->pool, (void *)free, nb_free);
+					free[0] = m;
+					nb_free = 1;
+				}
+			}
+			txep[i] = NULL;
+		}
+	}
+	rte_mempool_put_bulk(free[0]->pool, (void **)free, nb_free);
+}
+
+static inline void
+gve_tx_clean(struct gve_tx_queue *txq)
+{
+	uint16_t mask = txq->nb_tx_desc - 1;
+	uint32_t start = txq->next_to_clean & mask;
+	uint32_t ntc, nb_clean, i;
+	struct gve_tx_iovec *iov;
+
+	ntc = rte_be_to_cpu_32(rte_read32(txq->qtx_head));
+	ntc = ntc & mask;
+
+	if (ntc == start)
+		return;
+
+	/* if wrap around, free twice. */
+	if (ntc < start) {
+		nb_clean = txq->nb_tx_desc - start;
+		if (nb_clean > GVE_TX_MAX_FREE_SZ)
+			nb_clean = GVE_TX_MAX_FREE_SZ;
+		if (txq->is_gqi_qpl) {
+			for (i = start; i < start + nb_clean; i++) {
+				iov = &txq->iov_ring[i];
+				txq->fifo_avail += iov->iov_len;
+				iov->iov_base = 0;
+				iov->iov_len = 0;
+			}
+		} else {
+			gve_free_bulk_mbuf(&txq->sw_ring[start], nb_clean);
+		}
+		txq->nb_free += nb_clean;
+		start += nb_clean;
+		if (start == txq->nb_tx_desc)
+			start = 0;
+		txq->next_to_clean += nb_clean;
+	}
+
+	if (ntc > start) {
+		nb_clean = ntc - start;
+		if (nb_clean > GVE_TX_MAX_FREE_SZ)
+			nb_clean = GVE_TX_MAX_FREE_SZ;
+		if (txq->is_gqi_qpl) {
+			for (i = start; i < start + nb_clean; i++) {
+				iov = &txq->iov_ring[i];
+				txq->fifo_avail += iov->iov_len;
+				iov->iov_base = 0;
+				iov->iov_len = 0;
+			}
+		} else {
+			gve_free_bulk_mbuf(&txq->sw_ring[start], nb_clean);
+		}
+		txq->nb_free += nb_clean;
+		txq->next_to_clean += nb_clean;
+	}
+}
+
+static inline void
+gve_tx_clean_swr_qpl(struct gve_tx_queue *txq)
+{
+	uint32_t start = txq->sw_ntc;
+	uint32_t ntc, nb_clean;
+
+	ntc = txq->sw_tail;
+
+	if (ntc == start)
+		return;
+
+	/* if wrap around, free twice. */
+	if (ntc < start) {
+		nb_clean = txq->nb_tx_desc - start;
+		if (nb_clean > GVE_TX_MAX_FREE_SZ)
+			nb_clean = GVE_TX_MAX_FREE_SZ;
+		gve_free_bulk_mbuf(&txq->sw_ring[start], nb_clean);
+
+		txq->sw_nb_free += nb_clean;
+		start += nb_clean;
+		if (start == txq->nb_tx_desc)
+			start = 0;
+		txq->sw_ntc = start;
+	}
+
+	if (ntc > start) {
+		nb_clean = ntc - start;
+		if (nb_clean > GVE_TX_MAX_FREE_SZ)
+			nb_clean = GVE_TX_MAX_FREE_SZ;
+		gve_free_bulk_mbuf(&txq->sw_ring[start], nb_clean);
+		txq->sw_nb_free += nb_clean;
+		start += nb_clean;
+		txq->sw_ntc = start;
+	}
+}
+
+static inline void
+gve_tx_fill_pkt_desc(volatile union gve_tx_desc *desc, struct rte_mbuf *mbuf,
+		     uint8_t desc_cnt, uint16_t len, uint64_t addr)
+{
+	uint64_t csum_l4 = mbuf->ol_flags & RTE_MBUF_F_TX_L4_MASK;
+	uint8_t l4_csum_offset = 0;
+	uint8_t l4_hdr_offset = 0;
+
+	if (mbuf->ol_flags & RTE_MBUF_F_TX_TCP_SEG)
+		csum_l4 |= RTE_MBUF_F_TX_TCP_CKSUM;
+
+	switch (csum_l4) {
+	case RTE_MBUF_F_TX_TCP_CKSUM:
+		l4_csum_offset = offsetof(struct rte_tcp_hdr, cksum);
+		l4_hdr_offset = mbuf->l2_len + mbuf->l3_len;
+		break;
+	case RTE_MBUF_F_TX_UDP_CKSUM:
+		l4_csum_offset = offsetof(struct rte_udp_hdr, dgram_cksum);
+		l4_hdr_offset = mbuf->l2_len + mbuf->l3_len;
+		break;
+	case RTE_MBUF_F_TX_SCTP_CKSUM:
+		l4_csum_offset = offsetof(struct rte_sctp_hdr, cksum);
+		l4_hdr_offset = mbuf->l2_len + mbuf->l3_len;
+		break;
+	}
+
+	if (mbuf->ol_flags & RTE_MBUF_F_TX_TCP_SEG) {
+		desc->pkt.type_flags = GVE_TXD_TSO | GVE_TXF_L4CSUM;
+		desc->pkt.l4_csum_offset = l4_csum_offset >> 1;
+		desc->pkt.l4_hdr_offset = l4_hdr_offset >> 1;
+	} else if (mbuf->ol_flags & RTE_MBUF_F_TX_L4_MASK) {
+		desc->pkt.type_flags = GVE_TXD_STD | GVE_TXF_L4CSUM;
+		desc->pkt.l4_csum_offset = l4_csum_offset >> 1;
+		desc->pkt.l4_hdr_offset = l4_hdr_offset >> 1;
+	} else {
+		desc->pkt.type_flags = GVE_TXD_STD;
+		desc->pkt.l4_csum_offset = 0;
+		desc->pkt.l4_hdr_offset = 0;
+	}
+	desc->pkt.desc_cnt = desc_cnt;
+	desc->pkt.len = rte_cpu_to_be_16(mbuf->pkt_len);
+	desc->pkt.seg_len = rte_cpu_to_be_16(len);
+	desc->pkt.seg_addr = rte_cpu_to_be_64(addr);
+}
+
+static inline void
+gve_tx_fill_seg_desc(volatile union gve_tx_desc *desc, uint64_t ol_flags,
+		      union gve_tx_offload tx_offload,
+		      uint16_t len, uint64_t addr)
+{
+	desc->seg.type_flags = GVE_TXD_SEG;
+	if (ol_flags & RTE_MBUF_F_TX_TCP_SEG) {
+		if (ol_flags & RTE_MBUF_F_TX_IPV6)
+			desc->seg.type_flags |= GVE_TXSF_IPV6;
+		desc->seg.l3_offset = tx_offload.l2_len >> 1;
+		desc->seg.mss = rte_cpu_to_be_16(tx_offload.tso_segsz);
+	}
+	desc->seg.seg_len = rte_cpu_to_be_16(len);
+	desc->seg.seg_addr = rte_cpu_to_be_64(addr);
+}
+
+static inline bool
+is_fifo_avail(struct gve_tx_queue *txq, uint16_t len)
+{
+	if (txq->fifo_avail < len)
+		return false;
+	/* Don't split segment. */
+	if (txq->fifo_head + len > txq->fifo_size &&
+	    txq->fifo_size - txq->fifo_head + len > txq->fifo_avail)
+		return false;
+	return true;
+}
+static inline uint64_t
+gve_tx_alloc_from_fifo(struct gve_tx_queue *txq, uint16_t tx_id, uint16_t len)
+{
+	uint32_t head = txq->fifo_head;
+	uint32_t size = txq->fifo_size;
+	struct gve_tx_iovec *iov;
+	uint32_t aligned_head;
+	uint32_t iov_len = 0;
+	uint64_t fifo_addr;
+
+	iov = &txq->iov_ring[tx_id];
+
+	/* Don't split segment */
+	if (head + len > size) {
+		iov_len += (size - head);
+		head = 0;
+	}
+
+	fifo_addr = head;
+	iov_len += len;
+	iov->iov_base = head;
+
+	/* Re-align to a cacheline for next head */
+	head += len;
+	aligned_head = RTE_ALIGN(head, RTE_CACHE_LINE_SIZE);
+	iov_len += (aligned_head - head);
+	iov->iov_len = iov_len;
+
+	if (aligned_head == txq->fifo_size)
+		aligned_head = 0;
+	txq->fifo_head = aligned_head;
+	txq->fifo_avail -= iov_len;
+
+	return fifo_addr;
+}
+
+static inline uint16_t
+gve_tx_burst_qpl(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
+{
+	union gve_tx_offload tx_offload = {0};
+	volatile union gve_tx_desc *txr, *txd;
+	struct gve_tx_queue *txq = tx_queue;
+	struct rte_mbuf **sw_ring = txq->sw_ring;
+	uint16_t mask = txq->nb_tx_desc - 1;
+	uint16_t tx_id = txq->tx_tail & mask;
+	uint64_t ol_flags, addr, fifo_addr;
+	uint32_t tx_tail = txq->tx_tail;
+	struct rte_mbuf *tx_pkt, *first;
+	uint16_t sw_id = txq->sw_tail;
+	uint16_t nb_used, i;
+	uint16_t nb_tx = 0;
+	uint32_t hlen;
+
+	txr = txq->tx_desc_ring;
+
+	if (txq->nb_free < txq->free_thresh || txq->fifo_avail == 0)
+		gve_tx_clean(txq);
+
+	if (txq->sw_nb_free < txq->free_thresh)
+		gve_tx_clean_swr_qpl(txq);
+
+	for (nb_tx = 0; nb_tx < nb_pkts; nb_tx++) {
+		tx_pkt = *tx_pkts++;
+		ol_flags = tx_pkt->ol_flags;
+
+		if (txq->sw_nb_free < tx_pkt->nb_segs) {
+			gve_tx_clean_swr_qpl(txq);
+			if (txq->sw_nb_free < tx_pkt->nb_segs)
+				goto end_of_tx;
+		}
+
+		/* Even for multi-segs, use 1 qpl buf for data */
+		nb_used = 1;
+		if (ol_flags & RTE_MBUF_F_TX_TCP_SEG)
+			nb_used++;
+
+		if (txq->nb_free < nb_used)
+			goto end_of_tx;
+
+		tx_offload.l2_len = tx_pkt->l2_len;
+		tx_offload.l3_len = tx_pkt->l3_len;
+		tx_offload.l4_len = tx_pkt->l4_len;
+		tx_offload.tso_segsz = tx_pkt->tso_segsz;
+
+		first = tx_pkt;
+		txd = &txr[tx_id];
+		hlen = ol_flags & RTE_MBUF_F_TX_TCP_SEG ?
+			(uint32_t)(tx_offload.l2_len + tx_offload.l3_len + tx_offload.l4_len) :
+			tx_pkt->pkt_len;
+
+		sw_ring[sw_id] = tx_pkt;
+		if (!is_fifo_avail(txq, hlen)) {
+			gve_tx_clean(txq);
+			if (!is_fifo_avail(txq, hlen))
+				goto end_of_tx;
+		}
+		addr = (uint64_t)(tx_pkt->buf_addr) + tx_pkt->data_off;
+		fifo_addr = gve_tx_alloc_from_fifo(txq, tx_id, hlen);
+
+		/* For TSO, check if there's enough fifo space for data first */
+		if (ol_flags & RTE_MBUF_F_TX_TCP_SEG) {
+			if (!is_fifo_avail(txq, tx_pkt->pkt_len - hlen)) {
+				gve_tx_clean(txq);
+				if (!is_fifo_avail(txq, tx_pkt->pkt_len - hlen))
+					goto end_of_tx;
+			}
+		}
+		if (tx_pkt->nb_segs == 1 || ol_flags & RTE_MBUF_F_TX_TCP_SEG)
+			rte_memcpy((void *)(size_t)(fifo_addr + txq->fifo_base),
+				   (void *)(size_t)addr, hlen);
+		else
+			rte_pktmbuf_read(tx_pkt, 0, hlen,
+					 (void *)(size_t)(fifo_addr + txq->fifo_base));
+		gve_tx_fill_pkt_desc(txd, tx_pkt, nb_used, hlen, fifo_addr);
+
+		if (ol_flags & RTE_MBUF_F_TX_TCP_SEG) {
+			tx_id = (tx_id + 1) & mask;
+			txd = &txr[tx_id];
+			addr = (uint64_t)(tx_pkt->buf_addr) + tx_pkt->data_off + hlen;
+			fifo_addr = gve_tx_alloc_from_fifo(txq, tx_id, tx_pkt->pkt_len - hlen);
+			if (tx_pkt->nb_segs == 1)
+				rte_memcpy((void *)(size_t)(fifo_addr + txq->fifo_base),
+					   (void *)(size_t)addr,
+					   tx_pkt->pkt_len - hlen);
+			else
+				rte_pktmbuf_read(tx_pkt, hlen, tx_pkt->pkt_len - hlen,
+						 (void *)(size_t)(fifo_addr + txq->fifo_base));
+
+			gve_tx_fill_seg_desc(txd, ol_flags, tx_offload,
+					     tx_pkt->pkt_len - hlen, fifo_addr);
+		}
+
+		/* record mbuf in sw_ring for free */
+		for (i = 1; i < first->nb_segs; i++) {
+			sw_id = (sw_id + 1) & mask;
+			tx_pkt = tx_pkt->next;
+			sw_ring[sw_id] = tx_pkt;
+		}
+
+		sw_id = (sw_id + 1) & mask;
+		tx_id = (tx_id + 1) & mask;
+
+		txq->nb_free -= nb_used;
+		txq->sw_nb_free -= first->nb_segs;
+		tx_tail += nb_used;
+	}
+
+end_of_tx:
+	if (nb_tx) {
+		rte_write32(rte_cpu_to_be_32(tx_tail), txq->qtx_tail);
+		txq->tx_tail = tx_tail;
+		txq->sw_tail = sw_id;
+	}
+
+	return nb_tx;
+}
+
+static inline uint16_t
+gve_tx_burst_ra(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
+{
+	union gve_tx_offload tx_offload = {0};
+	volatile union gve_tx_desc *txr, *txd;
+	struct gve_tx_queue *txq = tx_queue;
+	struct rte_mbuf **sw_ring = txq->sw_ring;
+	uint16_t mask = txq->nb_tx_desc - 1;
+	uint16_t tx_id = txq->tx_tail & mask;
+	uint32_t tx_tail = txq->tx_tail;
+	struct rte_mbuf *tx_pkt, *first;
+	uint16_t nb_used, hlen, i;
+	uint64_t ol_flags, addr;
+	uint16_t nb_tx = 0;
+
+	txr = txq->tx_desc_ring;
+
+	if (txq->nb_free < txq->free_thresh)
+		gve_tx_clean(txq);
+
+	for (nb_tx = 0; nb_tx < nb_pkts; nb_tx++) {
+		tx_pkt = *tx_pkts++;
+		ol_flags = tx_pkt->ol_flags;
+
+		nb_used = tx_pkt->nb_segs;
+		if (ol_flags & RTE_MBUF_F_TX_TCP_SEG)
+			nb_used++;
+
+		if (txq->nb_free < nb_used)
+			goto end_of_tx;
+
+		tx_offload.l2_len = tx_pkt->l2_len;
+		tx_offload.l3_len = tx_pkt->l3_len;
+		tx_offload.l4_len = tx_pkt->l4_len;
+		tx_offload.tso_segsz = tx_pkt->tso_segsz;
+
+		first = tx_pkt;
+		txd = &txr[tx_id];
+
+		hlen = ol_flags & RTE_MBUF_F_TX_TCP_SEG ?
+			(uint32_t)(tx_offload.l2_len + tx_offload.l3_len + tx_offload.l4_len) :
+			tx_pkt->pkt_len;
+		/*
+		 * if tso, the driver needs to fill 2 descs for 1 mbuf
+		 * so only put this mbuf into the 1st tx entry in sw ring
+		 */
+		sw_ring[tx_id] = tx_pkt;
+		addr = rte_mbuf_data_iova(tx_pkt);
+		gve_tx_fill_pkt_desc(txd, tx_pkt, nb_used, hlen, addr);
+
+		if (ol_flags & RTE_MBUF_F_TX_TCP_SEG) {
+			tx_id = (tx_id + 1) & mask;
+			txd = &txr[tx_id];
+			addr = rte_mbuf_data_iova(tx_pkt) + hlen;
+			gve_tx_fill_seg_desc(txd, ol_flags, tx_offload,
+					     tx_pkt->data_len - hlen, addr);
+		}
+
+		for (i = 1; i < first->nb_segs; i++) {
+			tx_id = (tx_id + 1) & mask;
+			txd = &txr[tx_id];
+			tx_pkt = tx_pkt->next;
+			sw_ring[tx_id] = tx_pkt;
+			addr = rte_mbuf_data_iova(tx_pkt);
+			gve_tx_fill_seg_desc(txd, ol_flags, tx_offload,
+					     tx_pkt->data_len, addr);
+		}
+		tx_id = (tx_id + 1) & mask;
+
+		txq->nb_free -= nb_used;
+		tx_tail += nb_used;
+	}
+
+end_of_tx:
+	if (nb_tx) {
+		rte_write32(rte_cpu_to_be_32(tx_tail), txq->qtx_tail);
+		txq->tx_tail = tx_tail;
+	}
+
+	return nb_tx;
+}
+
+uint16_t
+gve_tx_burst(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
+{
+	struct gve_tx_queue *txq = tx_queue;
+
+	if (txq->is_gqi_qpl)
+		return gve_tx_burst_qpl(tx_queue, tx_pkts, nb_pkts);
+
+	return gve_tx_burst_ra(tx_queue, tx_pkts, nb_pkts);
+}
+
 static inline void
 gve_reset_txq(struct gve_tx_queue *txq)
 {
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 192+ messages in thread
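
For context, a minimal forwarding-loop sketch built on the burst functions
this patch wires up. The loop shape, BURST_SIZE and the single queue are
illustrative assumptions, not taken from the patch:

#include <rte_ethdev.h>
#include <rte_mbuf.h>

#define BURST_SIZE 32

/* Receive a burst on queue 0, send it back out of the same port and free
 * anything the tx burst could not take. */
static void
fwd_loop(uint16_t port_id)
{
	struct rte_mbuf *pkts[BURST_SIZE];
	uint16_t nb_rx, nb_tx, i;

	for (;;) {
		nb_rx = rte_eth_rx_burst(port_id, 0, pkts, BURST_SIZE);
		if (nb_rx == 0)
			continue;

		nb_tx = rte_eth_tx_burst(port_id, 0, pkts, nb_rx);
		for (i = nb_tx; i < nb_rx; i++)
			rte_pktmbuf_free(pkts[i]);
	}
}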

* Re: [PATCH v5 3/8] net/gve: add support for device initialization
  2022-10-20  9:29                         ` Guo, Junfeng
@ 2022-10-20 11:15                           ` Ferruh Yigit
  2022-10-21  4:46                             ` Guo, Junfeng
  0 siblings, 1 reply; 192+ messages in thread
From: Ferruh Yigit @ 2022-10-20 11:15 UTC (permalink / raw)
  To: Guo, Junfeng, Li, Xiaoyun, Zhang, Qi Z, Wu, Jingjing
  Cc: ferruh.yigit, dev, awogbemila, Richardson, Bruce, Lin, Xueqin,
	Wang, Haiyue

On 10/20/2022 10:29 AM, Guo, Junfeng wrote:
> 
> 
>> -----Original Message-----
>> From: Ferruh Yigit <ferruh.yigit@amd.com>
>> Sent: Thursday, October 20, 2022 05:01
>> To: Li, Xiaoyun <xiaoyun.li@intel.com>; Guo, Junfeng
>> <junfeng.guo@intel.com>; Zhang, Qi Z <qi.z.zhang@intel.com>; Wu,
>> Jingjing <jingjing.wu@intel.com>
>> Cc: ferruh.yigit@xilinx.com; dev@dpdk.org; awogbemila@google.com;
>> Richardson, Bruce <bruce.richardson@intel.com>; Lin, Xueqin
>> <xueqin.lin@intel.com>; Wang, Haiyue <haiyue.wang@intel.com>
>> Subject: Re: [PATCH v5 3/8] net/gve: add support for device initialization
>>
>> On 10/19/2022 4:59 PM, Li, Xiaoyun wrote:
>>
>>>
>>> Hi
>>>
>>>> -----Original Message-----
>>>> From: Ferruh Yigit <ferruh.yigit@amd.com>
>>>> Sent: Wednesday, October 19, 2022 14:46
>>>> To: Guo, Junfeng <junfeng.guo@intel.com>; Zhang, Qi Z
>>>> <qi.z.zhang@intel.com>; Wu, Jingjing <jingjing.wu@intel.com>
>>>> Cc: ferruh.yigit@xilinx.com; dev@dpdk.org; Li, Xiaoyun
>>>> <xiaoyun.li@intel.com>; awogbemila@google.com; Richardson, Bruce
>>>> <bruce.richardson@intel.com>; Lin, Xueqin <xueqin.lin@intel.com>;
>> Wang,
>>>> Haiyue <haiyue.wang@intel.com>
>>>> Subject: Re: [PATCH v5 3/8] net/gve: add support for device
>> initialization
>>>>
>>>> On 10/10/2022 11:17 AM, Junfeng Guo wrote:
>>>>>
>>>>> Support device init and add following devops skeleton:
>>>>>     - dev_configure
>>>>>     - dev_start
>>>>>     - dev_stop
>>>>>     - dev_close
>>>>>
>>>>> Note that build system (including doc) is also added in this patch.
>>>>>
>>>>> Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
>>>>> Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com>
>>>>> Signed-off-by: Junfeng Guo <junfeng.guo@intel.com>
>>>>
>>>> <...>
>>>>
>>>>> diff --git a/doc/guides/rel_notes/release_22_11.rst
>>>>> b/doc/guides/rel_notes/release_22_11.rst
>>>>> index fbb575255f..c1162ea1a4 100644
>>>>> --- a/doc/guides/rel_notes/release_22_11.rst
>>>>> +++ b/doc/guides/rel_notes/release_22_11.rst
>>>>> @@ -200,6 +200,11 @@ New Features
>>>>>       into single event containing ``rte_event_vector``
>>>>>       whose event type is ``RTE_EVENT_TYPE_CRYPTODEV_VECTOR``.
>>>>>
>>>>> +* **Added GVE net PMD**
>>>>> +
>>>>> +  * Added the new ``gve`` net driver for Google Virtual Ethernet
>> devices.
>>>>> +  * See the :doc:`../nics/gve` NIC guide for more details on this new
>> driver.
>>>>> +
>>>>>
>>>>
>>>> Can you please move the block among the other ethdev drivers, keeping
>>>> the list alphabetically sorted?
>>>>
>>>> <...>
>>>>
>>>>> +static int
>>>>> +gve_dev_init(struct rte_eth_dev *eth_dev) {
>>>>> +       struct gve_priv *priv = eth_dev->data->dev_private;
>>>>> +       int max_tx_queues, max_rx_queues;
>>>>> +       struct rte_pci_device *pci_dev;
>>>>> +       struct gve_registers *reg_bar;
>>>>> +       rte_be32_t *db_bar;
>>>>> +       int err;
>>>>> +
>>>>> +       eth_dev->dev_ops = &gve_eth_dev_ops;
>>>>> +
>>>>> +       if (rte_eal_process_type() != RTE_PROC_PRIMARY)
>>>>> +               return 0;
>>>>> +
>>>>> +       pci_dev = RTE_DEV_TO_PCI(eth_dev->device);
>>>>> +
>>>>> +       reg_bar = pci_dev->mem_resource[GVE_REG_BAR].addr;
>>>>> +       if (!reg_bar) {
>>>>> +               PMD_DRV_LOG(ERR, "Failed to map pci bar!");
>>>>> +               return -ENOMEM;
>>>>> +       }
>>>>> +
>>>>> +       db_bar = pci_dev->mem_resource[GVE_DB_BAR].addr;
>>>>> +       if (!db_bar) {
>>>>> +               PMD_DRV_LOG(ERR, "Failed to map doorbell bar!");
>>>>> +               return -ENOMEM;
>>>>> +       }
>>>>> +
>>>>> +       gve_write_version(&reg_bar->driver_version);
>>>>> +       /* Get max queues to alloc etherdev */
>>>>> +       max_tx_queues = ioread32be(&reg_bar->max_tx_queues);
>>>>> +       max_rx_queues = ioread32be(&reg_bar->max_rx_queues);
>>>>> +
>>>>> +       priv->reg_bar0 = reg_bar;
>>>>> +       priv->db_bar2 = db_bar;
>>>>> +       priv->pci_dev = pci_dev;
>>>>> +       priv->state_flags = 0x0;
>>>>> +
>>>>> +       priv->max_nb_txq = max_tx_queues;
>>>>> +       priv->max_nb_rxq = max_rx_queues;
>>>>> +
>>>>> +       err = gve_init_priv(priv, false);
>>>>> +       if (err)
>>>>> +               return err;
>>>>> +
>>>>> +       eth_dev->data->mac_addrs = rte_zmalloc("gve_mac",
>> sizeof(struct
>>>> rte_ether_addr), 0);
>>>>> +       if (!eth_dev->data->mac_addrs) {
>>>>> +               PMD_DRV_LOG(ERR, "Failed to allocate memory to store
>> mac
>>>> address");
>>>>> +               return -ENOMEM;
>>>>> +       }
>>>>> +       rte_ether_addr_copy(&priv->dev_addr,
>>>>> + eth_dev->data->mac_addrs);
>>>>> +
>>>>
>>>> Is anything assigned to 'priv->dev_addr' to copy?
>>>> Also since there is a 'priv->dev_addr' field, why not use it directly,
>> instead of
>>>> allocating memory for 'eth_dev->data->mac_addrs'?
>>>> I mean why not "eth_dev->data->mac_addrs = &priv->dev_addr"?
>>>
>>> Makes sense. There's no need to allocate a new memory. @Guo,
>> Junfeng Can you update this?
> 
> Thanks Xiaoyun and Ferruh for the comments!
> I tried to update the code as suggested but got an "Invalid Memory"
> warning when quitting testpmd. I found it was caused in
> rte_eth_dev_release_port() by "rte_free(eth_dev->data->mac_addrs);".
> It seems that allocating memory for 'eth_dev->data->mac_addrs' is still
> needed. Please correct me if I misunderstood this. Thanks! I'll keep
> this part unchanged for the coming patchset for now.
> 

No, it is not needed; you need to set the pointer to NULL on the release
path to prevent the common code from freeing it (the problem you are
getting). There are examples in various PMDs, please check.
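
For example, a minimal sketch of that pattern, reusing the names from this
patch (the exact placement on the close/uninit path is up to you):

	/* in gve_dev_init(): point at the existing field instead of allocating */
	eth_dev->data->mac_addrs = &priv->dev_addr;

	/* on the close/uninit path, before the port is released: clear the
	 * pointer so rte_eth_dev_release_port() does not rte_free() memory
	 * the PMD does not own */
	dev->data->mac_addrs = NULL;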

>>>>
>>>> <...>
>>>>
>>>>> +struct gve_priv {
>>>>> +       struct gve_irq_db *irq_dbs; /* array of num_ntfy_blks */
>>>>> +       const struct rte_memzone *irq_dbs_mz;
>>>>> +       uint32_t mgmt_msix_idx;
>>>>> +       rte_be32_t *cnt_array; /* array of num_event_counters */
>>>>> +       const struct rte_memzone *cnt_array_mz;
>>>>> +
>>>>> +       uint16_t num_event_counters;
>>>>> +       uint16_t tx_desc_cnt; /* txq size */
>>>>> +       uint16_t rx_desc_cnt; /* rxq size */
>>>>> +       uint16_t tx_pages_per_qpl; /* tx buffer length */
>>>>> +       uint16_t rx_data_slot_cnt; /* rx buffer length */
>>>>
>>>> These fields are not used in this patch, I guess some will be used in
>> datapath
>>>> patch.
>>>
>>> This is needed for base code gve_adminq.c not for datapath. Most of
>> the stuff in gve_priv is for gve_adminq.c.
>>> The adminq will update this info which dpdk pmd will need later.
>> Compiler will complain if these don't exist.
>>>
>>
>> You are right they are used by 'gve_adminq.c', so OK to keep them, if
>> there are ones not used at this stage, can you add them whenever they
>> are used, or remove them if not used at all. If all used/required, no
>> change required.
> 
> Yes, we have already tried to move all the unused items to the corresponding
> stages patch by patch. Thanks for the reminder!
> 

thanks.

>>
>>>>
>>>> Can you please only add fields that is used in the patch? This way it will
>> be
>>>> clear in which functionality that field is used and enable to detect not
>> used
>>>> fields.
>>>> We are accepting batch updates for base code, but this is dpdk related
>> code,
>>>> lets only add things that are used when they are used.
>>>> Same for all data structures.
>>>>
>>>> <...>
>>>>
>>>>> diff --git a/drivers/net/gve/version.map
>> b/drivers/net/gve/version.map
>>>>> new file mode 100644 index 0000000000..c2e0723b4c
>>>>> --- /dev/null
>>>>> +++ b/drivers/net/gve/version.map
>>>>> @@ -0,0 +1,3 @@
>>>>> +DPDK_22 {
>>>>
>>>> DPDK_23
> 


^ permalink raw reply	[flat|nested] 192+ messages in thread

* Re: [PATCH v5 6/8] net/gve: add support for dev info get and dev configure
  2022-10-20  9:29                     ` Guo, Junfeng
@ 2022-10-20 11:19                       ` Ferruh Yigit
  2022-10-21  5:22                         ` Guo, Junfeng
  0 siblings, 1 reply; 192+ messages in thread
From: Ferruh Yigit @ 2022-10-20 11:19 UTC (permalink / raw)
  To: Guo, Junfeng, Zhang, Qi Z, Wu, Jingjing
  Cc: ferruh.yigit, dev, Li, Xiaoyun, awogbemila, Richardson, Bruce,
	Lin, Xueqin

On 10/20/2022 10:29 AM, Guo, Junfeng wrote:
> 
> 
>> -----Original Message-----
>> From: Ferruh Yigit <ferruh.yigit@amd.com>
>> Sent: Wednesday, October 19, 2022 21:49
>> To: Guo, Junfeng <junfeng.guo@intel.com>; Zhang, Qi Z
>> <qi.z.zhang@intel.com>; Wu, Jingjing <jingjing.wu@intel.com>
>> Cc: ferruh.yigit@xilinx.com; dev@dpdk.org; Li, Xiaoyun
>> <xiaoyun.li@intel.com>; awogbemila@google.com; Richardson, Bruce
>> <bruce.richardson@intel.com>; Lin, Xueqin <xueqin.lin@intel.com>
>> Subject: Re: [PATCH v5 6/8] net/gve: add support for dev info get and dev
>> configure
>>
>> On 10/10/2022 11:17 AM, Junfeng Guo wrote:
>>
>>>
>>> Add dev_ops dev_infos_get.
>>> Complete dev_configure with RX offloads configuration.
>>>
>>> Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com>
>>> Signed-off-by: Junfeng Guo <junfeng.guo@intel.com>
>>
>> <...>
>>
>>> --- a/drivers/net/gve/gve_ethdev.c
>>> +++ b/drivers/net/gve/gve_ethdev.c
>>> @@ -29,8 +29,16 @@ gve_write_version(uint8_t
>> *driver_version_register)
>>>    }
>>>
>>>    static int
>>> -gve_dev_configure(__rte_unused struct rte_eth_dev *dev)
>>> +gve_dev_configure(struct rte_eth_dev *dev)
>>>    {
>>> +       struct gve_priv *priv = dev->data->dev_private;
>>> +
>>> +       if (dev->data->dev_conf.rxmode.mq_mode &
>> RTE_ETH_MQ_RX_RSS_FLAG)
>>> +               dev->data->dev_conf.rxmode.offloads |=
>> RTE_ETH_RX_OFFLOAD_RSS_HASH;
>>> +
>>
>> This is force enabling the feature, we are doing this for PMDs that has
>> the hash value anyway and no additional work or performance loss
>> observed to enable this offload. Otherwise drivers shouldn't update
>> 'dev_conf.rxmode'.
>>
>> Can you please confirm this PMD fits above description? And can you
>> please add a comment that says force enabling the feature?
> 
> Yes, it seems force enabling this offloading is not quite reasonable here.
> This may just have followed previous PMD convention, so we decided to remove
> this part in the coming version. Thanks!
> 
>>
>>> +       if (dev->data->dev_conf.rxmode.offloads &
>> RTE_ETH_RX_OFFLOAD_TCP_LRO)
>>> +               priv->enable_rsc = 1;
>>> +
>>>           return 0;
>>>    }
>>>
>>> @@ -94,6 +102,60 @@ gve_dev_close(struct rte_eth_dev *dev)
>>>           return err;
>>>    }
>>>
>>> +static int
>>> +gve_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info
>> *dev_info)
>>> +{
>>> +       struct gve_priv *priv = dev->data->dev_private;
>>> +
>>> +       dev_info->device = dev->device;
>>> +       dev_info->max_mac_addrs = 1;
>>> +       dev_info->max_rx_queues = priv->max_nb_rxq;
>>> +       dev_info->max_tx_queues = priv->max_nb_txq;
>>> +       dev_info->min_rx_bufsize = GVE_MIN_BUF_SIZE;
>>> +       dev_info->max_rx_pktlen = GVE_MAX_RX_PKTLEN;
>>> +       dev_info->max_mtu = RTE_ETHER_MTU;
>>
>> Can you please confirm max MTU this PMD supports is 1500? Meaning it
>> doesn't support jumbo frames etc...
> 
> Actually, this is just a workaround for the max_mtu info.
> We can only get the max_mtu value via an adminq message from the backend,
> but the real value we get (i.e., priv->max_mtu) is 1460, which is less than
> 1500. It seems to be a GCP bug or something.
> If we use "dev_info->max_mtu = priv->max_mtu", testpmd cannot even
> be launched successfully.
> I'll keep this part unchanged, with some comments here, if there is no other
> solution. Please correct me if you have any other idea. Thanks a lot!
> 

Getting the actual value from the device is the correct thing to do, but it
seems the received value is not good, so it is OK to keep it as is.
Can you please follow this up with GVE?
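
In the meantime, one way to keep that context visible in the code is a short
comment next to the workaround, e.g. (a sketch only; the value stays as in
this patch):

	/* Workaround: the adminq reports max_mtu = 1460, which is below
	 * RTE_ETHER_MTU and breaks testpmd, so advertise the standard
	 * Ethernet MTU until this is clarified with GVE. */
	dev_info->max_mtu = RTE_ETHER_MTU;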

>>
>>> +       dev_info->min_mtu = RTE_ETHER_MIN_MTU;
>>> +
>>> +       dev_info->rx_offload_capa = 0;
>>> +       dev_info->tx_offload_capa =
>>> +               RTE_ETH_TX_OFFLOAD_MULTI_SEGS   |
>>> +               RTE_ETH_TX_OFFLOAD_IPV4_CKSUM   |
>>> +               RTE_ETH_TX_OFFLOAD_UDP_CKSUM    |
>>> +               RTE_ETH_TX_OFFLOAD_TCP_CKSUM    |
>>> +               RTE_ETH_TX_OFFLOAD_SCTP_CKSUM   |
>>> +               RTE_ETH_TX_OFFLOAD_TCP_TSO;
>>
>> Can you advertise these capabilities in the patch that implements them?
> 
> Will move this to the corresponding patch, thanks!
> 


^ permalink raw reply	[flat|nested] 192+ messages in thread

* Re: [PATCH v6 1/8] net/gve/base: introduce GVE PMD base code
  2022-10-20 10:36                     ` [PATCH v6 1/8] net/gve/base: introduce GVE PMD base code Junfeng Guo
@ 2022-10-20 14:39                       ` Ferruh Yigit
  2022-10-24  2:10                         ` Guo, Junfeng
  2022-10-20 14:40                       ` Ferruh Yigit
  1 sibling, 1 reply; 192+ messages in thread
From: Ferruh Yigit @ 2022-10-20 14:39 UTC (permalink / raw)
  To: Junfeng Guo, qi.z.zhang, jingjing.wu, beilei.xing, Xiaoyun Li
  Cc: dev, xiaoyun.li, awogbemila, bruce.richardson, hemant.agrawal,
	stephen, chenbo.xia, helin.zhang, Haiyue Wang

On 10/20/2022 11:36 AM, Junfeng Guo wrote:
> diff --git a/drivers/net/gve/base/gve.h b/drivers/net/gve/base/gve.h
> new file mode 100644
> index 0000000000..1b0d59b639
> --- /dev/null
> +++ b/drivers/net/gve/base/gve.h
> @@ -0,0 +1,58 @@
> +/* SPDX-License-Identifier: MIT
> + * Google Virtual Ethernet (gve) driver
> + * Version: 1.3.0

[1]

> + * Copyright (C) 2015-2022 Google, Inc.
> + * Copyright(C) 2022 Intel Corporation

[2]

> + */
> +
> +#ifndef _GVE_H_
> +#define _GVE_H_
> +
> +#include "gve_desc.h"
> +
> +#define GVE_VERSION            "1.3.0"
> +#define GVE_VERSION_PREFIX     "GVE-"
> +

Has it been clarified/decided to keep the version in the file comment [1] and
keep the Intel copyright [2], or is this just not addressed yet?

^ permalink raw reply	[flat|nested] 192+ messages in thread

* Re: [PATCH v6 1/8] net/gve/base: introduce GVE PMD base code
  2022-10-20 10:36                     ` [PATCH v6 1/8] net/gve/base: introduce GVE PMD base code Junfeng Guo
  2022-10-20 14:39                       ` Ferruh Yigit
@ 2022-10-20 14:40                       ` Ferruh Yigit
  2022-10-24  2:10                         ` Guo, Junfeng
  1 sibling, 1 reply; 192+ messages in thread
From: Ferruh Yigit @ 2022-10-20 14:40 UTC (permalink / raw)
  To: Junfeng Guo, qi.z.zhang, jingjing.wu, beilei.xing
  Cc: dev, xiaoyun.li, awogbemila, bruce.richardson, hemant.agrawal,
	stephen, chenbo.xia, helin.zhang, Haiyue Wang

On 10/20/2022 11:36 AM, Junfeng Guo wrote:

> 
> The following base code is based on Google Virtual Ethernet (gve)
> driver v1.3.0 under MIT license.
> - gve_adminq.c
> - gve_adminq.h
> - gve_desc.h
> - gve_desc_dqo.h
> - gve_register.h
> - gve.h
> 
> The original code is in:
> https://github.com/GoogleCloudPlatform/compute-virtual-ethernet-linux/\
> tree/v1.3.0/google/gve
> 
> Note that these are not Intel files and they come from the kernel
> community. The base code there has the statement of
> SPDX-License-Identifier: (GPL-2.0 OR MIT). Here we just follow the
> required MIT license as an exception to DPDK.

Can drop "GVE PMD" from patch title, since 'net/gve/base:' already 
implies it, like:
net/gve/base: introduce base code

> 
> Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com>
> Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
> Signed-off-by: Junfeng Guo <junfeng.guo@intel.com>

<...>

> +static bool gve_adminq_wait_for_cmd(struct gve_priv *priv, u32 prod_cnt)
> +{
> +       int i;
> +
> +       for (i = 0; i < GVE_MAX_ADMINQ_EVENT_COUNTER_CHECK; i++) {
> +               if (ioread32be(&priv->reg_bar0->adminq_event_counter)
> +                   == prod_cnt)

Syntax: why not move the second half of the comparison onto the line above?
Unless this is coming from the Google code and updating it adds a
maintenance cost.
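
That is, something like (the same check, just re-wrapped):

		if (ioread32be(&priv->reg_bar0->adminq_event_counter) == prod_cnt)
			return true;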


^ permalink raw reply	[flat|nested] 192+ messages in thread

* Re: [PATCH v6 3/8] net/gve: add support for device initialization
  2022-10-20 10:36                     ` [PATCH v6 3/8] net/gve: add support for device initialization Junfeng Guo
@ 2022-10-20 14:42                       ` Ferruh Yigit
  2022-10-24  2:10                         ` Guo, Junfeng
  0 siblings, 1 reply; 192+ messages in thread
From: Ferruh Yigit @ 2022-10-20 14:42 UTC (permalink / raw)
  To: Junfeng Guo, qi.z.zhang, jingjing.wu, beilei.xing
  Cc: dev, xiaoyun.li, awogbemila, bruce.richardson, hemant.agrawal,
	stephen, chenbo.xia, helin.zhang, Haiyue Wang

On 10/20/2022 11:36 AM, Junfeng Guo wrote:

> 
> Support device init and add following devops skeleton:
>   - dev_configure
>   - dev_start
>   - dev_stop
>   - dev_close
> 
> Note that build system (including doc) is also added in this patch.
> 
> Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
> Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com>
> Signed-off-by: Junfeng Guo <junfeng.guo@intel.com>

<...>

> index 1c3daf141d..715013fa35 100644
> --- a/doc/guides/rel_notes/release_22_11.rst
> +++ b/doc/guides/rel_notes/release_22_11.rst
> @@ -140,6 +140,11 @@ New Features
> 
>     * Made compatible with libbpf v0.8.0 (when used with libxdp).
> 
> +* **Added GVE net PMD**
> +
> +  * Added the new ``gve`` net driver for Google Virtual Ethernet devices.
> +  * See the :doc:`../nics/gve` NIC guide for more details on this new driver.
> +

Can you please move it one more place down, just above 'Intel', to sort it
alphabetically by vendor name, in this case 'G' I guess.
We are almost there :)

<...>

> +static int
> +gve_dev_init(struct rte_eth_dev *eth_dev)
> +{
> +       struct gve_priv *priv = eth_dev->data->dev_private;
> +       int max_tx_queues, max_rx_queues;
> +       struct rte_pci_device *pci_dev;
> +       struct gve_registers *reg_bar;
> +       rte_be32_t *db_bar;
> +       int err;
> +
> +       eth_dev->dev_ops = &gve_eth_dev_ops;
> +
> +       if (rte_eal_process_type() != RTE_PROC_PRIMARY)
> +               return 0;
> +
> +       pci_dev = RTE_DEV_TO_PCI(eth_dev->device);
> +
> +       reg_bar = pci_dev->mem_resource[GVE_REG_BAR].addr;
> +       if (!reg_bar) {
> +               PMD_DRV_LOG(ERR, "Failed to map pci bar!");
> +               return -ENOMEM;
> +       }
> +
> +       db_bar = pci_dev->mem_resource[GVE_DB_BAR].addr;
> +       if (!db_bar) {
> +               PMD_DRV_LOG(ERR, "Failed to map doorbell bar!");
> +               return -ENOMEM;
> +       }
> +
> +       gve_write_version(&reg_bar->driver_version);
> +       /* Get max queues to alloc etherdev */
> +       max_tx_queues = ioread32be(&reg_bar->max_tx_queues);
> +       max_rx_queues = ioread32be(&reg_bar->max_rx_queues);
> +
> +       priv->reg_bar0 = reg_bar;
> +       priv->db_bar2 = db_bar;
> +       priv->pci_dev = pci_dev;
> +       priv->state_flags = 0x0;
> +
> +       priv->max_nb_txq = max_tx_queues;
> +       priv->max_nb_rxq = max_rx_queues;
> +
> +       err = gve_init_priv(priv, false);
> +       if (err)
> +               return err;
> +
> +       eth_dev->data->mac_addrs = rte_zmalloc("gve_mac", sizeof(struct rte_ether_addr), 0);
> +       if (!eth_dev->data->mac_addrs) {
> +               PMD_DRV_LOG(ERR, "Failed to allocate memory to store mac address");
> +               return -ENOMEM;
> +       }
> +       rte_ether_addr_copy(&priv->dev_addr, eth_dev->data->mac_addrs);
> +

What is the value in 'priv->dev_addr'?
Whether or not the memory allocation for 'eth_dev->data->mac_addrs' is
removed as we discussed, independent of that, a valid value needs to be set
in 'priv->dev_addr'.

<...>

> diff --git a/drivers/net/gve/version.map b/drivers/net/gve/version.map
> new file mode 100644
> index 0000000000..c2e0723b4c
> --- /dev/null
> +++ b/drivers/net/gve/version.map
> @@ -0,0 +1,3 @@
> +DPDK_22 {

DPDK_23

In case it is not clear, since this comment was skipped in the previous few
versions: the ABI version should be 'DPDK_23', so the content of this
file should be:

DPDK_23 {
         local: *;
};


^ permalink raw reply	[flat|nested] 192+ messages in thread

* Re: [PATCH v6 5/8] net/gve: add support for MTU setting
  2022-10-20 10:36                     ` [PATCH v6 5/8] net/gve: add support for MTU setting Junfeng Guo
@ 2022-10-20 14:45                       ` Ferruh Yigit
  2022-10-24  2:10                         ` Guo, Junfeng
  0 siblings, 1 reply; 192+ messages in thread
From: Ferruh Yigit @ 2022-10-20 14:45 UTC (permalink / raw)
  To: Junfeng Guo, qi.z.zhang, jingjing.wu, beilei.xing
  Cc: dev, xiaoyun.li, awogbemila, bruce.richardson, hemant.agrawal,
	stephen, chenbo.xia, helin.zhang

On 10/20/2022 11:36 AM, Junfeng Guo wrote:

> 
> Support dev_ops mtu_set.
> 
> Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com>
> Signed-off-by: Junfeng Guo <junfeng.guo@intel.com>
> ---
>   doc/guides/nics/features/gve.ini |  1 +
>   drivers/net/gve/gve_ethdev.c     | 27 +++++++++++++++++++++++++++
>   2 files changed, 28 insertions(+)
> 
> diff --git a/doc/guides/nics/features/gve.ini b/doc/guides/nics/features/gve.ini
> index ae466ad677..d1703d8dab 100644
> --- a/doc/guides/nics/features/gve.ini
> +++ b/doc/guides/nics/features/gve.ini
> @@ -5,6 +5,7 @@
>   ;
>   [Features]
>   Link status          = Y
> +MTU update           = Y
>   Linux                = Y
>   x86-32               = Y
>   x86-64               = Y
> diff --git a/drivers/net/gve/gve_ethdev.c b/drivers/net/gve/gve_ethdev.c
> index ca4a467140..1968f38eb6 100644
> --- a/drivers/net/gve/gve_ethdev.c
> +++ b/drivers/net/gve/gve_ethdev.c
> @@ -94,12 +94,39 @@ gve_dev_close(struct rte_eth_dev *dev)
>          return err;
>   }
> 
> +static int
> +gve_dev_mtu_set(struct rte_eth_dev *dev, uint16_t mtu)
> +{
> +       struct gve_priv *priv = dev->data->dev_private;
> +       int err;
> +
> +       if (mtu < RTE_ETHER_MIN_MTU || mtu > priv->max_mtu) {
> +               PMD_DRV_LOG(ERR, "MIN MTU is %u MAX MTU is %u", RTE_ETHER_MIN_MTU, priv->max_mtu);

Although this is within the new 100-column limit, it is easy to break it
without sacrificing readability; can you break it as something like:

PMD_DRV_LOG(ERR, "MIN MTU is %u MAX MTU is %u",
	RTE_ETHER_MIN_MTU, priv->max_mtu);

> +               return -EINVAL;
> +       }
> +
> +       /* mtu setting is forbidden if port is start */
> +       if (dev->data->dev_started) {
> +               PMD_DRV_LOG(ERR, "Port must be stopped before configuration");
> +               return -EBUSY;
> +       }
> +
> +       err = gve_adminq_set_mtu(priv, mtu);
> +       if (err) {
> +               PMD_DRV_LOG(ERR, "Failed to set mtu as %u err = %d", mtu, err);
> +               return err;
> +       }
> +
> +       return 0;
> +}


configure() (gve_dev_configure()) also gets 'mtu' as user config
('eth_conf->rxmode.mtu'), which is ignored right now.

Since there is already a 'gve_adminq_set_mtu()' command, what do you think
about using it within 'gve_dev_configure()'?
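
For illustration, one possible shape (a sketch only; it assumes an MTU of 0
in 'rxmode' means "leave the current value unchanged"):

	/* inside gve_dev_configure(): apply the MTU requested via configure() */
	uint32_t mtu = dev->data->dev_conf.rxmode.mtu;

	if (mtu != 0) {
		int err = gve_dev_mtu_set(dev, mtu);
		if (err)
			return err;
	}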

> +
>   static const struct eth_dev_ops gve_eth_dev_ops = {
>          .dev_configure        = gve_dev_configure,
>          .dev_start            = gve_dev_start,
>          .dev_stop             = gve_dev_stop,
>          .dev_close            = gve_dev_close,
>          .link_update          = gve_link_update,
> +       .mtu_set              = gve_dev_mtu_set,
>   };
> 
>   static void
> --
> 2.34.1
> 


^ permalink raw reply	[flat|nested] 192+ messages in thread

* Re: [PATCH v6 6/8] net/gve: add support for dev info get and dev configure
  2022-10-20 10:36                     ` [PATCH v6 6/8] net/gve: add support for dev info get and dev configure Junfeng Guo
@ 2022-10-20 14:45                       ` Ferruh Yigit
  2022-10-24  2:10                         ` Guo, Junfeng
  0 siblings, 1 reply; 192+ messages in thread
From: Ferruh Yigit @ 2022-10-20 14:45 UTC (permalink / raw)
  To: Junfeng Guo, qi.z.zhang, jingjing.wu, beilei.xing
  Cc: dev, xiaoyun.li, awogbemila, bruce.richardson, hemant.agrawal,
	stephen, chenbo.xia, helin.zhang

On 10/20/2022 11:36 AM, Junfeng Guo wrote:

> 
> Add dev_ops dev_infos_get.
> Complete dev_configure with RX offloads configuration.
> 
> Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com>
> Signed-off-by: Junfeng Guo <junfeng.guo@intel.com>
> ---
>   doc/guides/nics/features/gve.ini |  2 ++
>   doc/guides/nics/gve.rst          |  1 +
>   drivers/net/gve/gve_ethdev.c     | 56 +++++++++++++++++++++++++++++++-
>   3 files changed, 58 insertions(+), 1 deletion(-)
> 
> diff --git a/doc/guides/nics/features/gve.ini b/doc/guides/nics/features/gve.ini
> index d1703d8dab..986df7f94a 100644
> --- a/doc/guides/nics/features/gve.ini
> +++ b/doc/guides/nics/features/gve.ini
> @@ -4,8 +4,10 @@
>   ; Refer to default.ini for the full list of available PMD features.
>   ;
>   [Features]
> +Speed capabilities   = Y
>   Link status          = Y
>   MTU update           = Y
> +RSS hash             = Y

I think this was added because of 'RTE_ETH_RX_OFFLOAD_RSS_HASH'. It is
OK to keep this feature if you keep force enabling the above offload;
otherwise please remove the feature.

>   Linux                = Y
>   x86-32               = Y
>   x86-64               = Y
> diff --git a/doc/guides/nics/gve.rst b/doc/guides/nics/gve.rst
> index c42ff23841..8c09a5a7fa 100644
> --- a/doc/guides/nics/gve.rst
> +++ b/doc/guides/nics/gve.rst
> @@ -62,6 +62,7 @@ In this release, the GVE PMD provides the basic functionality of packet
>   reception and transmission.
>   Supported features of the GVE PMD are:
> 
> +- Receiver Side Scaling (RSS)

I am not sure the driver can claim this. I can see an RSS hash is provided,
but is it possible to update which hash function to use, or to update the key
or RETA table to configure which queue packets go to?

Right now, what is the RSS hash calculated on?

Perhaps RSS support can be documented as limited?

And I am not sure this update belongs in this patch; it should go in the one
that adds the datapath.



^ permalink raw reply	[flat|nested] 192+ messages in thread

* Re: [PATCH v6 8/8] net/gve: add support for Rx/Tx
  2022-10-20 10:36                     ` [PATCH v6 8/8] net/gve: add support for Rx/Tx Junfeng Guo
@ 2022-10-20 14:47                       ` Ferruh Yigit
  2022-10-24  2:10                         ` Guo, Junfeng
  0 siblings, 1 reply; 192+ messages in thread
From: Ferruh Yigit @ 2022-10-20 14:47 UTC (permalink / raw)
  To: Junfeng Guo, qi.z.zhang, jingjing.wu, beilei.xing
  Cc: dev, xiaoyun.li, awogbemila, bruce.richardson, hemant.agrawal,
	stephen, chenbo.xia, helin.zhang

On 10/20/2022 11:36 AM, Junfeng Guo wrote:

> 
> Add Rx/Tx of GQI_QPL queue format and GQI_RDA queue format.
> 
> Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com>
> Signed-off-by: Junfeng Guo <junfeng.guo@intel.com>

<...>

> +uint16_t
> +gve_rx_burst(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t nb_pkts)
> +{
> +       volatile struct gve_rx_desc *rxr, *rxd;
> +       struct gve_rx_queue *rxq = rx_queue;
> +       uint16_t rx_id = rxq->rx_tail;
> +       struct rte_mbuf *rxe;
> +       uint16_t nb_rx, len;
> +       uint64_t addr;
> +       uint16_t i;
> +
> +       rxr = rxq->rx_desc_ring;
> +       nb_rx = 0;
> +
> +       for (i = 0; i < nb_pkts; i++) {
> +               rxd = &rxr[rx_id];
> +               if (GVE_SEQNO(rxd->flags_seq) != rxq->expected_seqno)
> +                       break;
> +
> +               if (rxd->flags_seq & GVE_RXF_ERR)
> +                       continue;
> +
> +               len = rte_be_to_cpu_16(rxd->len) - GVE_RX_PAD;
> +               rxe = rxq->sw_ring[rx_id];
> +               if (rxq->is_gqi_qpl) {
> +                       addr = (uint64_t)(rxq->qpl->mz->addr) + rx_id * PAGE_SIZE + GVE_RX_PAD;
> +                       rte_memcpy((void *)((size_t)rxe->buf_addr + rxe->data_off),
> +                                  (void *)(size_t)addr, len);
> +               }
> +               rxe->pkt_len = len;
> +               rxe->data_len = len;
> +               rxe->port = rxq->port_id;
> +               rxe->ol_flags = 0;
> +
> +               if (rxd->flags_seq & GVE_RXF_TCP)
> +                       rxe->packet_type |= RTE_PTYPE_L4_TCP;
> +               if (rxd->flags_seq & GVE_RXF_UDP)
> +                       rxe->packet_type |= RTE_PTYPE_L4_UDP;
> +               if (rxd->flags_seq & GVE_RXF_IPV4)
> +                       rxe->packet_type |= RTE_PTYPE_L3_IPV4;
> +               if (rxd->flags_seq & GVE_RXF_IPV6)
> +                       rxe->packet_type |= RTE_PTYPE_L3_IPV6;
> +
> +               if (gve_needs_rss(rxd->flags_seq)) {
> +                       rxe->ol_flags |= RTE_MBUF_F_RX_RSS_HASH;
> +                       rxe->hash.rss = rte_be_to_cpu_32(rxd->rss_hash);

You are updating "m->hash.rss" anyway, and if this is without and cost 
you can force enable as done in previous version:
'dev->data->dev_conf.rxmode.offloads |= RTE_ETH_RX_OFFLOAD_RSS_HASH;'
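
e.g. with the comment made explicit (a sketch of the earlier code plus the
requested comment):

	/* RSS hash is delivered by the device at no extra cost, so force
	 * enable the offload to always expose mbuf->hash.rss */
	if (dev->data->dev_conf.rxmode.mq_mode & RTE_ETH_MQ_RX_RSS_FLAG)
		dev->data->dev_conf.rxmode.offloads |= RTE_ETH_RX_OFFLOAD_RSS_HASH;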

<...>

> +static inline void
> +gve_free_bulk_mbuf(struct rte_mbuf **txep, int num)
> +{
> +       struct rte_mbuf *m, *free[GVE_TX_MAX_FREE_SZ];
> +       int nb_free = 0;
> +       int i, s;
> +
> +       if (unlikely(num == 0))
> +               return;
> +
> +       /* Find the 1st mbuf which needs to be free */
> +       for (s = 0; s < num; s++) {
> +               if (txep[s] != NULL) {
> +                       m = rte_pktmbuf_prefree_seg(txep[s]);
> +                       if (m != NULL)
> +                               break;
> +                       }

'}' indentation is wrong.

<...>

> +static inline void
> +gve_tx_clean_swr_qpl(struct gve_tx_queue *txq)
> +{
> +       uint32_t start = txq->sw_ntc;
> +       uint32_t ntc, nb_clean;
> +
> +       ntc = txq->sw_tail;
> +
> +       if (ntc == start)
> +               return;
> +
> +       /* if wrap around, free twice. */
> +       if (ntc < start) {
> +               nb_clean = txq->nb_tx_desc - start;
> +               if (nb_clean > GVE_TX_MAX_FREE_SZ)
> +                       nb_clean = GVE_TX_MAX_FREE_SZ;
> +               gve_free_bulk_mbuf(&txq->sw_ring[start], nb_clean);
> +
> +               txq->sw_nb_free += nb_clean;
> +               start += nb_clean;
> +               if (start == txq->nb_tx_desc)
> +                       start = 0;
> +               txq->sw_ntc = start;
> +       }
> +
> +       if (ntc > start) {

Maybe you can drop the 'if' block, since the "ntc == start" and "ntc < start"
cases are already covered.

<...>

> +uint16_t
> +gve_tx_burst(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
> +{
> +       struct gve_tx_queue *txq = tx_queue;
> +
> +       if (txq->is_gqi_qpl)
> +               return gve_tx_burst_qpl(tx_queue, tx_pkts, nb_pkts);
> +
> +       return gve_tx_burst_ra(tx_queue, tx_pkts, nb_pkts);
> +}
> +

Can there be a mix of queue types?
If only one queue type is supported in a specific config, perhaps the burst
function can be set during configuration, to avoid the if check on the
datapath; see the sketch below.

This is an optimization and can be done later; it doesn't have to be in this
set.
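
A sketch of that idea (it assumes the queue format is fixed for the lifetime
of the port, and that the two burst functions are visible where the
assignment is done, e.g. at dev_start/configure time):

	/* pick the Tx burst function once instead of branching per burst call */
	if (priv->queue_format == GVE_GQI_QPL_FORMAT)
		dev->tx_pkt_burst = gve_tx_burst_qpl;
	else
		dev->tx_pkt_burst = gve_tx_burst_ra;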

^ permalink raw reply	[flat|nested] 192+ messages in thread

* RE: [PATCH v5 3/8] net/gve: add support for device initialization
  2022-10-20 11:15                           ` Ferruh Yigit
@ 2022-10-21  4:46                             ` Guo, Junfeng
  0 siblings, 0 replies; 192+ messages in thread
From: Guo, Junfeng @ 2022-10-21  4:46 UTC (permalink / raw)
  To: Ferruh Yigit, Li, Xiaoyun, Zhang, Qi Z, Wu, Jingjing
  Cc: ferruh.yigit, dev, awogbemila, Richardson,  Bruce, Lin, Xueqin,
	Wang, Haiyue



> -----Original Message-----
> From: Ferruh Yigit <ferruh.yigit@amd.com>
> Sent: Thursday, October 20, 2022 19:16
> To: Guo, Junfeng <junfeng.guo@intel.com>; Li, Xiaoyun
> <xiaoyun.li@intel.com>; Zhang, Qi Z <qi.z.zhang@intel.com>; Wu, Jingjing
> <jingjing.wu@intel.com>
> Cc: ferruh.yigit@xilinx.com; dev@dpdk.org; awogbemila@google.com;
> Richardson, Bruce <bruce.richardson@intel.com>; Lin, Xueqin
> <xueqin.lin@intel.com>; Wang, Haiyue <haiyue.wang@intel.com>
> Subject: Re: [PATCH v5 3/8] net/gve: add support for device initialization
> 
> On 10/20/2022 10:29 AM, Guo, Junfeng wrote:
> >
> >
> >> -----Original Message-----
> >> From: Ferruh Yigit <ferruh.yigit@amd.com>
> >> Sent: Thursday, October 20, 2022 05:01
> >> To: Li, Xiaoyun <xiaoyun.li@intel.com>; Guo, Junfeng
> >> <junfeng.guo@intel.com>; Zhang, Qi Z <qi.z.zhang@intel.com>; Wu,
> >> Jingjing <jingjing.wu@intel.com>
> >> Cc: ferruh.yigit@xilinx.com; dev@dpdk.org; awogbemila@google.com;
> >> Richardson, Bruce <bruce.richardson@intel.com>; Lin, Xueqin
> >> <xueqin.lin@intel.com>; Wang, Haiyue <haiyue.wang@intel.com>
> >> Subject: Re: [PATCH v5 3/8] net/gve: add support for device
> initialization
> >>
> >> On 10/19/2022 4:59 PM, Li, Xiaoyun wrote:
> >>
> >>>
> >>> Hi
> >>>
> >>>> -----Original Message-----
> >>>> From: Ferruh Yigit <ferruh.yigit@amd.com>
> >>>> Sent: Wednesday, October 19, 2022 14:46
> >>>> To: Guo, Junfeng <junfeng.guo@intel.com>; Zhang, Qi Z
> >>>> <qi.z.zhang@intel.com>; Wu, Jingjing <jingjing.wu@intel.com>
> >>>> Cc: ferruh.yigit@xilinx.com; dev@dpdk.org; Li, Xiaoyun
> >>>> <xiaoyun.li@intel.com>; awogbemila@google.com; Richardson,
> Bruce
> >>>> <bruce.richardson@intel.com>; Lin, Xueqin <xueqin.lin@intel.com>;
> >> Wang,
> >>>> Haiyue <haiyue.wang@intel.com>
> >>>> Subject: Re: [PATCH v5 3/8] net/gve: add support for device
> >> initialization
> >>>>
> >>>> On 10/10/2022 11:17 AM, Junfeng Guo wrote:
> >>>>>
> >>>>> Support device init and add following devops skeleton:
> >>>>>     - dev_configure
> >>>>>     - dev_start
> >>>>>     - dev_stop
> >>>>>     - dev_close
> >>>>>
> >>>>> Note that build system (including doc) is also added in this patch.
> >>>>>
> >>>>> Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
> >>>>> Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com>
> >>>>> Signed-off-by: Junfeng Guo <junfeng.guo@intel.com>
> >>>>
> >>>> <...>
> >>>>
> >>>>> diff --git a/doc/guides/rel_notes/release_22_11.rst
> >>>>> b/doc/guides/rel_notes/release_22_11.rst
> >>>>> index fbb575255f..c1162ea1a4 100644
> >>>>> --- a/doc/guides/rel_notes/release_22_11.rst
> >>>>> +++ b/doc/guides/rel_notes/release_22_11.rst
> >>>>> @@ -200,6 +200,11 @@ New Features
> >>>>>       into single event containing ``rte_event_vector``
> >>>>>       whose event type is ``RTE_EVENT_TYPE_CRYPTODEV_VECTOR``.
> >>>>>
> >>>>> +* **Added GVE net PMD**
> >>>>> +
> >>>>> +  * Added the new ``gve`` net driver for Google Virtual Ethernet
> >> devices.
> >>>>> +  * See the :doc:`../nics/gve` NIC guide for more details on this
> new
> >> driver.
> >>>>> +
> >>>>>
> >>>>
> >>>> Can you please move the block among the other ethdev drivers, as
> >>>> alphabetically sorted?
> >>>>
> >>>> <...>
> >>>>
> >>>>> +static int
> >>>>> +gve_dev_init(struct rte_eth_dev *eth_dev) {
> >>>>> +       struct gve_priv *priv = eth_dev->data->dev_private;
> >>>>> +       int max_tx_queues, max_rx_queues;
> >>>>> +       struct rte_pci_device *pci_dev;
> >>>>> +       struct gve_registers *reg_bar;
> >>>>> +       rte_be32_t *db_bar;
> >>>>> +       int err;
> >>>>> +
> >>>>> +       eth_dev->dev_ops = &gve_eth_dev_ops;
> >>>>> +
> >>>>> +       if (rte_eal_process_type() != RTE_PROC_PRIMARY)
> >>>>> +               return 0;
> >>>>> +
> >>>>> +       pci_dev = RTE_DEV_TO_PCI(eth_dev->device);
> >>>>> +
> >>>>> +       reg_bar = pci_dev->mem_resource[GVE_REG_BAR].addr;
> >>>>> +       if (!reg_bar) {
> >>>>> +               PMD_DRV_LOG(ERR, "Failed to map pci bar!");
> >>>>> +               return -ENOMEM;
> >>>>> +       }
> >>>>> +
> >>>>> +       db_bar = pci_dev->mem_resource[GVE_DB_BAR].addr;
> >>>>> +       if (!db_bar) {
> >>>>> +               PMD_DRV_LOG(ERR, "Failed to map doorbell bar!");
> >>>>> +               return -ENOMEM;
> >>>>> +       }
> >>>>> +
> >>>>> +       gve_write_version(&reg_bar->driver_version);
> >>>>> +       /* Get max queues to alloc etherdev */
> >>>>> +       max_tx_queues = ioread32be(&reg_bar->max_tx_queues);
> >>>>> +       max_rx_queues = ioread32be(&reg_bar->max_rx_queues);
> >>>>> +
> >>>>> +       priv->reg_bar0 = reg_bar;
> >>>>> +       priv->db_bar2 = db_bar;
> >>>>> +       priv->pci_dev = pci_dev;
> >>>>> +       priv->state_flags = 0x0;
> >>>>> +
> >>>>> +       priv->max_nb_txq = max_tx_queues;
> >>>>> +       priv->max_nb_rxq = max_rx_queues;
> >>>>> +
> >>>>> +       err = gve_init_priv(priv, false);
> >>>>> +       if (err)
> >>>>> +               return err;
> >>>>> +
> >>>>> +       eth_dev->data->mac_addrs = rte_zmalloc("gve_mac",
> >> sizeof(struct
> >>>> rte_ether_addr), 0);
> >>>>> +       if (!eth_dev->data->mac_addrs) {
> >>>>> +               PMD_DRV_LOG(ERR, "Failed to allocate memory to store
> >> mac
> >>>> address");
> >>>>> +               return -ENOMEM;
> >>>>> +       }
> >>>>> +       rte_ether_addr_copy(&priv->dev_addr,
> >>>>> + eth_dev->data->mac_addrs);
> >>>>> +
> >>>>
> >>>> Is anything assigned to 'priv->dev_addr' to copy?
> >>>> Also since there is a 'priv->dev_addr' field, why not use it directly,
> >> instead of
> >>>> allocating memory for 'eth_dev->data->mac_addrs'?
> >>>> I mean why not "eth_dev->data->mac_addrs = &priv->dev_addr"?
> >>>
> >>> Makes sense. There's no need to allocate a new memory. @Guo,
> >> Junfeng Can you update this?
> >
> > Thanks Xiaoyun and Ferruh for the comments!
> > I tried to update the code as suggested but may get "Invalid Memory"
> > warning when quit the testpmd. I found it was caused at the function
> > rte_eth_dev_release_port with " rte_free(eth_dev->data->mac_addrs);
> ".
> > Seems that allocating memory for 'eth_dev->data->mac_addrs' is still
> > needed. Please help correct me if I misunderstood this. Thanks! I'll keep
> > this part unchanged for the coming patchset first.
> >
> 
> No, it is not needed; you need to set the pointer to NULL on the release
> path to prevent the common code from freeing it (the problem you are
> getting). There are examples in various PMDs, please check.

Yes, that makes sense!
I'll double-check this and update it in the coming version. Thanks!

> 
> >>>>
> >>>> <...>
> >>>>
> >>>>> +struct gve_priv {
> >>>>> +       struct gve_irq_db *irq_dbs; /* array of num_ntfy_blks */
> >>>>> +       const struct rte_memzone *irq_dbs_mz;
> >>>>> +       uint32_t mgmt_msix_idx;
> >>>>> +       rte_be32_t *cnt_array; /* array of num_event_counters */
> >>>>> +       const struct rte_memzone *cnt_array_mz;
> >>>>> +
> >>>>> +       uint16_t num_event_counters;
> >>>>> +       uint16_t tx_desc_cnt; /* txq size */
> >>>>> +       uint16_t rx_desc_cnt; /* rxq size */
> >>>>> +       uint16_t tx_pages_per_qpl; /* tx buffer length */
> >>>>> +       uint16_t rx_data_slot_cnt; /* rx buffer length */
> >>>>
> >>>> These fields are not used in this patch, I guess some will be used in
> >> datapath
> >>>> patch.
> >>>
> >>> This is needed for base code gve_adminq.c not for datapath. Most of
> >> the stuff in gve_priv is for gve_adminq.c.
> >>> The adminq will update this info which dpdk pmd will need later.
> >> Compiler will complain if these don't exist.
> >>>
> >>
> >> You are right they are used by 'gve_adminq.c', so OK to keep them, if
> >> there are ones not used at this stage, can you add them whenever they
> >> are used, or remove them if not used at all. If all used/required, no
> >> change required.
> >
> > Yes, we have already tried to move all the unused items to the
> corresponding
> > stages patch by patch. Thanks for reminding this!
> >
> 
> thanks.
> 
> >>
> >>>>
> >>>> Can you please only add fields that is used in the patch? This way it
> will
> >> be
> >>>> clear in which functionality that field is used and enable to detect
> not
> >> used
> >>>> fields.
> >>>> We are accepting batch updates for base code, but this is dpdk
> related
> >> code,
> >>>> lets only add things that are used when they are used.
> >>>> Same for all data structures.
> >>>>
> >>>> <...>
> >>>>
> >>>>> diff --git a/drivers/net/gve/version.map
> >> b/drivers/net/gve/version.map
> >>>>> new file mode 100644 index 0000000000..c2e0723b4c
> >>>>> --- /dev/null
> >>>>> +++ b/drivers/net/gve/version.map
> >>>>> @@ -0,0 +1,3 @@
> >>>>> +DPDK_22 {
> >>>>
> >>>> DPDK_23
> >


^ permalink raw reply	[flat|nested] 192+ messages in thread

* RE: [PATCH v5 6/8] net/gve: add support for dev info get and dev configure
  2022-10-20 11:19                       ` Ferruh Yigit
@ 2022-10-21  5:22                         ` Guo, Junfeng
  0 siblings, 0 replies; 192+ messages in thread
From: Guo, Junfeng @ 2022-10-21  5:22 UTC (permalink / raw)
  To: Ferruh Yigit, Zhang, Qi Z, Wu, Jingjing
  Cc: ferruh.yigit, dev, Li, Xiaoyun, awogbemila, Richardson, Bruce,
	Lin, Xueqin



> -----Original Message-----
> From: Ferruh Yigit <ferruh.yigit@amd.com>
> Sent: Thursday, October 20, 2022 19:20
> To: Guo, Junfeng <junfeng.guo@intel.com>; Zhang, Qi Z
> <qi.z.zhang@intel.com>; Wu, Jingjing <jingjing.wu@intel.com>
> Cc: ferruh.yigit@xilinx.com; dev@dpdk.org; Li, Xiaoyun
> <xiaoyun.li@intel.com>; awogbemila@google.com; Richardson, Bruce
> <bruce.richardson@intel.com>; Lin, Xueqin <xueqin.lin@intel.com>
> Subject: Re: [PATCH v5 6/8] net/gve: add support for dev info get and dev
> configure
> 
> On 10/20/2022 10:29 AM, Guo, Junfeng wrote:
> >
> >
> >> -----Original Message-----
> >> From: Ferruh Yigit <ferruh.yigit@amd.com>
> >> Sent: Wednesday, October 19, 2022 21:49
> >> To: Guo, Junfeng <junfeng.guo@intel.com>; Zhang, Qi Z
> >> <qi.z.zhang@intel.com>; Wu, Jingjing <jingjing.wu@intel.com>
> >> Cc: ferruh.yigit@xilinx.com; dev@dpdk.org; Li, Xiaoyun
> >> <xiaoyun.li@intel.com>; awogbemila@google.com; Richardson, Bruce
> >> <bruce.richardson@intel.com>; Lin, Xueqin <xueqin.lin@intel.com>
> >> Subject: Re: [PATCH v5 6/8] net/gve: add support for dev info get and
> dev
> >> configure
> >>
> >> On 10/10/2022 11:17 AM, Junfeng Guo wrote:
> >>
> >>>
> >>> Add dev_ops dev_infos_get.
> >>> Complete dev_configure with RX offloads configuration.
> >>>
> >>> Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com>
> >>> Signed-off-by: Junfeng Guo <junfeng.guo@intel.com>
> >>
> >> <...>
> >>
> >>> --- a/drivers/net/gve/gve_ethdev.c
> >>> +++ b/drivers/net/gve/gve_ethdev.c
> >>> @@ -29,8 +29,16 @@ gve_write_version(uint8_t
> >> *driver_version_register)
> >>>    }
> >>>
> >>>    static int
> >>> -gve_dev_configure(__rte_unused struct rte_eth_dev *dev)
> >>> +gve_dev_configure(struct rte_eth_dev *dev)
> >>>    {
> >>> +       struct gve_priv *priv = dev->data->dev_private;
> >>> +
> >>> +       if (dev->data->dev_conf.rxmode.mq_mode &
> >> RTE_ETH_MQ_RX_RSS_FLAG)
> >>> +               dev->data->dev_conf.rxmode.offloads |=
> >> RTE_ETH_RX_OFFLOAD_RSS_HASH;
> >>> +
> >>
> >> This is force enabling the feature, we are doing this for PMDs that has
> >> the hash value anyway and no additional work or performance loss
> >> observed to enable this offload. Otherwise drivers shouldn't update
> >> 'dev_conf.rxmode'.
> >>
> >> Can you please confirm this PMD fits above description? And can you
> >> please add a comment that says force enabling the feature?
> >
> > Yes, it seems force enabling this offloading is not quite reasonable here.
> > This may just follow previous PMD convention, so we decided to
> remove
> > this part in the coming version. Thanks!
> >
> >>
> >>> +       if (dev->data->dev_conf.rxmode.offloads &
> >> RTE_ETH_RX_OFFLOAD_TCP_LRO)
> >>> +               priv->enable_rsc = 1;
> >>> +
> >>>           return 0;
> >>>    }
> >>>
> >>> @@ -94,6 +102,60 @@ gve_dev_close(struct rte_eth_dev *dev)
> >>>           return err;
> >>>    }
> >>>
> >>> +static int
> >>> +gve_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info
> >> *dev_info)
> >>> +{
> >>> +       struct gve_priv *priv = dev->data->dev_private;
> >>> +
> >>> +       dev_info->device = dev->device;
> >>> +       dev_info->max_mac_addrs = 1;
> >>> +       dev_info->max_rx_queues = priv->max_nb_rxq;
> >>> +       dev_info->max_tx_queues = priv->max_nb_txq;
> >>> +       dev_info->min_rx_bufsize = GVE_MIN_BUF_SIZE;
> >>> +       dev_info->max_rx_pktlen = GVE_MAX_RX_PKTLEN;
> >>> +       dev_info->max_mtu = RTE_ETHER_MTU;
> >>
> >> Can you please confirm max MTU this PMD supports is 1500? Meaning
> it
> >> doesn't support jumbo frames etc...
> >
> > Actually here is just a workaround solution for the max_mtu info...
> > We can only get the max_mtu value via adminq message from the
> backend.
> > But the real one (i.e., priv->max_mtu) we get is 1460, which is less than
> 1500
> > Seems it is the GCP bug or something.
> > If we use "dev_info->max_mtu = priv->max_mtu", the testpmd cannot
> even
> > be launched successfully...
> > I'll keep this part unchanged with some comments here if no other
> solutions.
> > Please help correct me if you have any other idea. Thanks a lot!
> >
> 
> Getting the actual value from the device is the correct thing to do, but it
> seems the received value is not good, so it is OK to keep it as is.
> Can you please follow this up with GVE?

Sure, will update this in the coming version. Thanks!

> 
> >>
> >>> +       dev_info->min_mtu = RTE_ETHER_MIN_MTU;
> >>> +
> >>> +       dev_info->rx_offload_capa = 0;
> >>> +       dev_info->tx_offload_capa =
> >>> +               RTE_ETH_TX_OFFLOAD_MULTI_SEGS   |
> >>> +               RTE_ETH_TX_OFFLOAD_IPV4_CKSUM   |
> >>> +               RTE_ETH_TX_OFFLOAD_UDP_CKSUM    |
> >>> +               RTE_ETH_TX_OFFLOAD_TCP_CKSUM    |
> >>> +               RTE_ETH_TX_OFFLOAD_SCTP_CKSUM   |
> >>> +               RTE_ETH_TX_OFFLOAD_TCP_TSO;
> >>
> >> Can you advertise these capabilities in the patch that implements
> them?
> >
> > Will move this to the corresponding patch, thanks!
> >


^ permalink raw reply	[flat|nested] 192+ messages in thread

* [PATCH v7 0/8] introduce GVE PMD
  2022-10-20 10:36                   ` [PATCH v6 0/8] introduce GVE PMD Junfeng Guo
                                       ` (7 preceding siblings ...)
  2022-10-20 10:36                     ` [PATCH v6 8/8] net/gve: add support for Rx/Tx Junfeng Guo
@ 2022-10-21  9:19                     ` Junfeng Guo
  2022-10-21  9:19                       ` [PATCH v7 1/8] net/gve/base: introduce base code Junfeng Guo
                                         ` (8 more replies)
  8 siblings, 9 replies; 192+ messages in thread
From: Junfeng Guo @ 2022-10-21  9:19 UTC (permalink / raw)
  To: qi.z.zhang, jingjing.wu, ferruh.yigit, beilei.xing
  Cc: dev, xiaoyun.li, awogbemila, bruce.richardson, hemant.agrawal,
	stephen, chenbo.xia, helin.zhang, Junfeng Guo

Introduce a new PMD for Google Virtual Ethernet (GVE).

gve (or gVNIC) is the standard virtual Ethernet interface on Google Cloud
Platform (GCP), one of the multiple virtual interfaces offered by the leading
CSPs in the world.

Having a well maintained and optimized gve PMD in the DPDK community can give
cloud instance users who want to run their own VNFs on GCP a better experience
in terms of performance and maintenance.

Please refer to https://cloud.google.com/compute/docs/networking/using-gvnic
for the device description.

This patch set requires an exception for MIT license for GVE base code.
And the base code includes the following files:
 - gve_adminq.c
 - gve_adminq.h
 - gve_desc.h
 - gve_desc_dqo.h
 - gve_register.h

It's based on GVE kernel driver v1.3.0 and the original code is in
https://github.com/GoogleCloudPlatform/compute-virtual-ethernet-linux/tree/v1.3.0


v2:
fix some CI check errors.

v3:
refactor some code and fix some build errors.

v4:
move the Google base code files into DPDK base folder.

v5:
reorder commit sequence and drop the stats feature.

v6-v7:
improve the code.

Junfeng Guo (8):
  net/gve/base: introduce base code
  net/gve/base: add OS specific implementation
  net/gve: add support for device initialization
  net/gve: add support for link update
  net/gve: add support for MTU setting
  net/gve: add support for dev info get and dev configure
  net/gve: add support for queue operations
  net/gve: add support for Rx/Tx

 MAINTAINERS                            |   6 +
 doc/guides/nics/features/gve.ini       |  16 +
 doc/guides/nics/gve.rst                |  76 ++
 doc/guides/nics/index.rst              |   1 +
 doc/guides/rel_notes/release_22_11.rst |   5 +
 drivers/net/gve/base/gve.h             |  56 ++
 drivers/net/gve/base/gve_adminq.c      | 923 +++++++++++++++++++++++++
 drivers/net/gve/base/gve_adminq.h      | 381 ++++++++++
 drivers/net/gve/base/gve_desc.h        | 138 ++++
 drivers/net/gve/base/gve_desc_dqo.h    | 255 +++++++
 drivers/net/gve/base/gve_osdep.h       | 159 +++++
 drivers/net/gve/base/gve_register.h    |  29 +
 drivers/net/gve/gve_ethdev.c           | 700 +++++++++++++++++++
 drivers/net/gve/gve_ethdev.h           | 298 ++++++++
 drivers/net/gve/gve_logs.h             |  14 +
 drivers/net/gve/gve_rx.c               | 354 ++++++++++
 drivers/net/gve/gve_tx.c               | 668 ++++++++++++++++++
 drivers/net/gve/meson.build            |  16 +
 drivers/net/gve/version.map            |   3 +
 drivers/net/meson.build                |   1 +
 20 files changed, 4099 insertions(+)
 create mode 100644 doc/guides/nics/features/gve.ini
 create mode 100644 doc/guides/nics/gve.rst
 create mode 100644 drivers/net/gve/base/gve.h
 create mode 100644 drivers/net/gve/base/gve_adminq.c
 create mode 100644 drivers/net/gve/base/gve_adminq.h
 create mode 100644 drivers/net/gve/base/gve_desc.h
 create mode 100644 drivers/net/gve/base/gve_desc_dqo.h
 create mode 100644 drivers/net/gve/base/gve_osdep.h
 create mode 100644 drivers/net/gve/base/gve_register.h
 create mode 100644 drivers/net/gve/gve_ethdev.c
 create mode 100644 drivers/net/gve/gve_ethdev.h
 create mode 100644 drivers/net/gve/gve_logs.h
 create mode 100644 drivers/net/gve/gve_rx.c
 create mode 100644 drivers/net/gve/gve_tx.c
 create mode 100644 drivers/net/gve/meson.build
 create mode 100644 drivers/net/gve/version.map

-- 
2.34.1


^ permalink raw reply	[flat|nested] 192+ messages in thread

* [PATCH v7 1/8] net/gve/base: introduce base code
  2022-10-21  9:19                     ` [PATCH v7 0/8] introduce GVE PMD Junfeng Guo
@ 2022-10-21  9:19                       ` Junfeng Guo
  2022-10-21  9:49                         ` Ferruh Yigit
                                           ` (2 more replies)
  2022-10-21  9:19                       ` [PATCH v7 2/8] net/gve/base: add OS specific implementation Junfeng Guo
                                         ` (7 subsequent siblings)
  8 siblings, 3 replies; 192+ messages in thread
From: Junfeng Guo @ 2022-10-21  9:19 UTC (permalink / raw)
  To: qi.z.zhang, jingjing.wu, ferruh.yigit, beilei.xing
  Cc: dev, xiaoyun.li, awogbemila, bruce.richardson, hemant.agrawal,
	stephen, chenbo.xia, helin.zhang, Junfeng Guo, Haiyue Wang

The following base code is based on Google Virtual Ethernet (gve)
driver v1.3.0 under MIT license.
- gve_adminq.c
- gve_adminq.h
- gve_desc.h
- gve_desc_dqo.h
- gve_register.h
- gve.h

The original code is in:
https://github.com/GoogleCloudPlatform/compute-virtual-ethernet-linux/\
tree/v1.3.0/google/gve

Note that these are not Intel files and they come from the kernel
community. The base code there has the statement of
SPDX-License-Identifier: (GPL-2.0 OR MIT). Here we just follow the
required MIT license as an exception to DPDK.

Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com>
Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Junfeng Guo <junfeng.guo@intel.com>
---
 drivers/net/gve/base/gve.h          |  56 ++
 drivers/net/gve/base/gve_adminq.c   | 922 ++++++++++++++++++++++++++++
 drivers/net/gve/base/gve_adminq.h   | 379 ++++++++++++
 drivers/net/gve/base/gve_desc.h     | 136 ++++
 drivers/net/gve/base/gve_desc_dqo.h | 253 ++++++++
 drivers/net/gve/base/gve_register.h |  27 +
 6 files changed, 1773 insertions(+)
 create mode 100644 drivers/net/gve/base/gve.h
 create mode 100644 drivers/net/gve/base/gve_adminq.c
 create mode 100644 drivers/net/gve/base/gve_adminq.h
 create mode 100644 drivers/net/gve/base/gve_desc.h
 create mode 100644 drivers/net/gve/base/gve_desc_dqo.h
 create mode 100644 drivers/net/gve/base/gve_register.h

diff --git a/drivers/net/gve/base/gve.h b/drivers/net/gve/base/gve.h
new file mode 100644
index 0000000000..2dc4507acb
--- /dev/null
+++ b/drivers/net/gve/base/gve.h
@@ -0,0 +1,56 @@
+/* SPDX-License-Identifier: MIT
+ * Google Virtual Ethernet (gve) driver
+ * Copyright (C) 2015-2022 Google, Inc.
+ */
+
+#ifndef _GVE_H_
+#define _GVE_H_
+
+#include "gve_desc.h"
+
+#define GVE_VERSION		"1.3.0"
+#define GVE_VERSION_PREFIX	"GVE-"
+
+#ifndef GOOGLE_VENDOR_ID
+#define GOOGLE_VENDOR_ID	0x1ae0
+#endif
+
+#define GVE_DEV_ID		0x0042
+
+#define GVE_REG_BAR		0
+#define GVE_DB_BAR		2
+
+/* 1 for management, 1 for rx, 1 for tx */
+#define GVE_MIN_MSIX		3
+
+/* PTYPEs are always 10 bits. */
+#define GVE_NUM_PTYPES		1024
+
+struct gve_irq_db {
+	rte_be32_t id;
+} ____cacheline_aligned;
+
+struct gve_ptype {
+	uint8_t l3_type;  /* `gve_l3_type` in gve_adminq.h */
+	uint8_t l4_type;  /* `gve_l4_type` in gve_adminq.h */
+};
+
+struct gve_ptype_lut {
+	struct gve_ptype ptypes[GVE_NUM_PTYPES];
+};
+
+enum gve_queue_format {
+	GVE_QUEUE_FORMAT_UNSPECIFIED = 0x0, /* default unspecified */
+	GVE_GQI_RDA_FORMAT	     = 0x1, /* GQI Raw Addressing */
+	GVE_GQI_QPL_FORMAT	     = 0x2, /* GQI Queue Page List */
+	GVE_DQO_RDA_FORMAT	     = 0x3, /* DQO Raw Addressing */
+};
+
+enum gve_state_flags_bit {
+	GVE_PRIV_FLAGS_ADMIN_QUEUE_OK		= 1,
+	GVE_PRIV_FLAGS_DEVICE_RESOURCES_OK	= 2,
+	GVE_PRIV_FLAGS_DEVICE_RINGS_OK		= 3,
+	GVE_PRIV_FLAGS_NAPI_ENABLED		= 4,
+};
+
+#endif /* _GVE_H_ */
diff --git a/drivers/net/gve/base/gve_adminq.c b/drivers/net/gve/base/gve_adminq.c
new file mode 100644
index 0000000000..2eb06a7b68
--- /dev/null
+++ b/drivers/net/gve/base/gve_adminq.c
@@ -0,0 +1,922 @@
+/* SPDX-License-Identifier: MIT
+ * Google Virtual Ethernet (gve) driver
+ * Copyright (C) 2015-2022 Google, Inc.
+ */
+
+#include "gve_adminq.h"
+#include "gve_register.h"
+
+#define GVE_MAX_ADMINQ_RELEASE_CHECK	500
+#define GVE_ADMINQ_SLEEP_LEN		20
+#define GVE_MAX_ADMINQ_EVENT_COUNTER_CHECK	100
+
+#define GVE_DEVICE_OPTION_ERROR_FMT "%s option error:\n Expected: length=%d, feature_mask=%x.\n Actual: length=%d, feature_mask=%x."
+
+#define GVE_DEVICE_OPTION_TOO_BIG_FMT "Length of %s option larger than expected. Possible older version of guest driver."
+
+static
+struct gve_device_option *gve_get_next_option(struct gve_device_descriptor *descriptor,
+					      struct gve_device_option *option)
+{
+	uintptr_t option_end, descriptor_end;
+
+	option_end = (uintptr_t)option + sizeof(*option) + be16_to_cpu(option->option_length);
+	descriptor_end = (uintptr_t)descriptor + be16_to_cpu(descriptor->total_length);
+
+	return option_end > descriptor_end ? NULL : (struct gve_device_option *)option_end;
+}
+
+static
+void gve_parse_device_option(struct gve_priv *priv,
+			     struct gve_device_option *option,
+			     struct gve_device_option_gqi_rda **dev_op_gqi_rda,
+			     struct gve_device_option_gqi_qpl **dev_op_gqi_qpl,
+			     struct gve_device_option_dqo_rda **dev_op_dqo_rda,
+			     struct gve_device_option_jumbo_frames **dev_op_jumbo_frames)
+{
+	u32 req_feat_mask = be32_to_cpu(option->required_features_mask);
+	u16 option_length = be16_to_cpu(option->option_length);
+	u16 option_id = be16_to_cpu(option->option_id);
+
+	/* If the length or feature mask doesn't match, continue without
+	 * enabling the feature.
+	 */
+	switch (option_id) {
+	case GVE_DEV_OPT_ID_GQI_RAW_ADDRESSING:
+		if (option_length != GVE_DEV_OPT_LEN_GQI_RAW_ADDRESSING ||
+		    req_feat_mask != GVE_DEV_OPT_REQ_FEAT_MASK_GQI_RAW_ADDRESSING) {
+			PMD_DRV_LOG(WARNING, GVE_DEVICE_OPTION_ERROR_FMT,
+				    "Raw Addressing",
+				    GVE_DEV_OPT_LEN_GQI_RAW_ADDRESSING,
+				    GVE_DEV_OPT_REQ_FEAT_MASK_GQI_RAW_ADDRESSING,
+				    option_length, req_feat_mask);
+			break;
+		}
+
+		PMD_DRV_LOG(INFO, "Gqi raw addressing device option enabled.");
+		priv->queue_format = GVE_GQI_RDA_FORMAT;
+		break;
+	case GVE_DEV_OPT_ID_GQI_RDA:
+		if (option_length < sizeof(**dev_op_gqi_rda) ||
+		    req_feat_mask != GVE_DEV_OPT_REQ_FEAT_MASK_GQI_RDA) {
+			PMD_DRV_LOG(WARNING, GVE_DEVICE_OPTION_ERROR_FMT,
+				    "GQI RDA", (int)sizeof(**dev_op_gqi_rda),
+				    GVE_DEV_OPT_REQ_FEAT_MASK_GQI_RDA,
+				    option_length, req_feat_mask);
+			break;
+		}
+
+		if (option_length > sizeof(**dev_op_gqi_rda)) {
+			PMD_DRV_LOG(WARNING,
+				    GVE_DEVICE_OPTION_TOO_BIG_FMT, "GQI RDA");
+		}
+		*dev_op_gqi_rda = RTE_PTR_ADD(option, sizeof(*option));
+		break;
+	case GVE_DEV_OPT_ID_GQI_QPL:
+		if (option_length < sizeof(**dev_op_gqi_qpl) ||
+		    req_feat_mask != GVE_DEV_OPT_REQ_FEAT_MASK_GQI_QPL) {
+			PMD_DRV_LOG(WARNING, GVE_DEVICE_OPTION_ERROR_FMT,
+				    "GQI QPL", (int)sizeof(**dev_op_gqi_qpl),
+				    GVE_DEV_OPT_REQ_FEAT_MASK_GQI_QPL,
+				    option_length, req_feat_mask);
+			break;
+		}
+
+		if (option_length > sizeof(**dev_op_gqi_qpl)) {
+			PMD_DRV_LOG(WARNING,
+				    GVE_DEVICE_OPTION_TOO_BIG_FMT, "GQI QPL");
+		}
+		*dev_op_gqi_qpl = RTE_PTR_ADD(option, sizeof(*option));
+		break;
+	case GVE_DEV_OPT_ID_DQO_RDA:
+		if (option_length < sizeof(**dev_op_dqo_rda) ||
+		    req_feat_mask != GVE_DEV_OPT_REQ_FEAT_MASK_DQO_RDA) {
+			PMD_DRV_LOG(WARNING, GVE_DEVICE_OPTION_ERROR_FMT,
+				    "DQO RDA", (int)sizeof(**dev_op_dqo_rda),
+				    GVE_DEV_OPT_REQ_FEAT_MASK_DQO_RDA,
+				    option_length, req_feat_mask);
+			break;
+		}
+
+		if (option_length > sizeof(**dev_op_dqo_rda)) {
+			PMD_DRV_LOG(WARNING,
+				    GVE_DEVICE_OPTION_TOO_BIG_FMT, "DQO RDA");
+		}
+		*dev_op_dqo_rda = RTE_PTR_ADD(option, sizeof(*option));
+		break;
+	case GVE_DEV_OPT_ID_JUMBO_FRAMES:
+		if (option_length < sizeof(**dev_op_jumbo_frames) ||
+		    req_feat_mask != GVE_DEV_OPT_REQ_FEAT_MASK_JUMBO_FRAMES) {
+			PMD_DRV_LOG(WARNING, GVE_DEVICE_OPTION_ERROR_FMT,
+				    "Jumbo Frames",
+				    (int)sizeof(**dev_op_jumbo_frames),
+				    GVE_DEV_OPT_REQ_FEAT_MASK_JUMBO_FRAMES,
+				    option_length, req_feat_mask);
+			break;
+		}
+
+		if (option_length > sizeof(**dev_op_jumbo_frames)) {
+			PMD_DRV_LOG(WARNING,
+				    GVE_DEVICE_OPTION_TOO_BIG_FMT,
+				    "Jumbo Frames");
+		}
+		*dev_op_jumbo_frames = RTE_PTR_ADD(option, sizeof(*option));
+		break;
+	default:
+		/* If we don't recognize the option just continue
+		 * without doing anything.
+		 */
+		PMD_DRV_LOG(DEBUG, "Unrecognized device option 0x%hx not enabled.",
+			    option_id);
+	}
+}
+
+/* Process all device options for a given describe device call. */
+static int
+gve_process_device_options(struct gve_priv *priv,
+			   struct gve_device_descriptor *descriptor,
+			   struct gve_device_option_gqi_rda **dev_op_gqi_rda,
+			   struct gve_device_option_gqi_qpl **dev_op_gqi_qpl,
+			   struct gve_device_option_dqo_rda **dev_op_dqo_rda,
+			   struct gve_device_option_jumbo_frames **dev_op_jumbo_frames)
+{
+	const int num_options = be16_to_cpu(descriptor->num_device_options);
+	struct gve_device_option *dev_opt;
+	int i;
+
+	/* The options struct directly follows the device descriptor. */
+	dev_opt = RTE_PTR_ADD(descriptor, sizeof(*descriptor));
+	for (i = 0; i < num_options; i++) {
+		struct gve_device_option *next_opt;
+
+		next_opt = gve_get_next_option(descriptor, dev_opt);
+		if (!next_opt) {
+			PMD_DRV_LOG(ERR,
+				    "options exceed device_descriptor's total length.");
+			return -EINVAL;
+		}
+
+		gve_parse_device_option(priv, dev_opt,
+					dev_op_gqi_rda, dev_op_gqi_qpl,
+					dev_op_dqo_rda, dev_op_jumbo_frames);
+		dev_opt = next_opt;
+	}
+
+	return 0;
+}
+
+int gve_adminq_alloc(struct gve_priv *priv)
+{
+	priv->adminq = gve_alloc_dma_mem(&priv->adminq_dma_mem, PAGE_SIZE);
+	if (unlikely(!priv->adminq))
+		return -ENOMEM;
+
+	priv->adminq_mask = (PAGE_SIZE / sizeof(union gve_adminq_command)) - 1;
+	priv->adminq_prod_cnt = 0;
+	priv->adminq_cmd_fail = 0;
+	priv->adminq_timeouts = 0;
+	priv->adminq_describe_device_cnt = 0;
+	priv->adminq_cfg_device_resources_cnt = 0;
+	priv->adminq_register_page_list_cnt = 0;
+	priv->adminq_unregister_page_list_cnt = 0;
+	priv->adminq_create_tx_queue_cnt = 0;
+	priv->adminq_create_rx_queue_cnt = 0;
+	priv->adminq_destroy_tx_queue_cnt = 0;
+	priv->adminq_destroy_rx_queue_cnt = 0;
+	priv->adminq_dcfg_device_resources_cnt = 0;
+	priv->adminq_set_driver_parameter_cnt = 0;
+	priv->adminq_report_stats_cnt = 0;
+	priv->adminq_report_link_speed_cnt = 0;
+	priv->adminq_get_ptype_map_cnt = 0;
+
+	/* Setup Admin queue with the device */
+	iowrite32be(priv->adminq_dma_mem.pa / PAGE_SIZE,
+		    &priv->reg_bar0->adminq_pfn);
+
+	gve_set_admin_queue_ok(priv);
+	return 0;
+}
+
+void gve_adminq_release(struct gve_priv *priv)
+{
+	int i = 0;
+
+	/* Tell the device the adminq is leaving */
+	iowrite32be(0x0, &priv->reg_bar0->adminq_pfn);
+	while (ioread32be(&priv->reg_bar0->adminq_pfn)) {
+		/* If this is reached the device is unrecoverable and still
+		 * holding memory. Continue looping to avoid memory corruption,
+		 * but WARN so it is visible what is going on.
+		 */
+		if (i == GVE_MAX_ADMINQ_RELEASE_CHECK)
+			PMD_DRV_LOG(WARNING, "Unrecoverable platform error!");
+		i++;
+		msleep(GVE_ADMINQ_SLEEP_LEN);
+	}
+	gve_clear_device_rings_ok(priv);
+	gve_clear_device_resources_ok(priv);
+	gve_clear_admin_queue_ok(priv);
+}
+
+void gve_adminq_free(struct gve_priv *priv)
+{
+	if (!gve_get_admin_queue_ok(priv))
+		return;
+	gve_adminq_release(priv);
+	gve_free_dma_mem(&priv->adminq_dma_mem);
+	gve_clear_admin_queue_ok(priv);
+}
+
+static void gve_adminq_kick_cmd(struct gve_priv *priv, u32 prod_cnt)
+{
+	iowrite32be(prod_cnt, &priv->reg_bar0->adminq_doorbell);
+}
+
+static bool gve_adminq_wait_for_cmd(struct gve_priv *priv, u32 prod_cnt)
+{
+	int i;
+
+	for (i = 0; i < GVE_MAX_ADMINQ_EVENT_COUNTER_CHECK; i++) {
+		if (ioread32be(&priv->reg_bar0->adminq_event_counter)
+		    == prod_cnt)
+			return true;
+		msleep(GVE_ADMINQ_SLEEP_LEN);
+	}
+
+	return false;
+}
+
+static int gve_adminq_parse_err(struct gve_priv *priv, u32 status)
+{
+	if (status != GVE_ADMINQ_COMMAND_PASSED &&
+	    status != GVE_ADMINQ_COMMAND_UNSET) {
+		PMD_DRV_LOG(ERR, "AQ command failed with status %d", status);
+		priv->adminq_cmd_fail++;
+	}
+	switch (status) {
+	case GVE_ADMINQ_COMMAND_PASSED:
+		return 0;
+	case GVE_ADMINQ_COMMAND_UNSET:
+		PMD_DRV_LOG(ERR, "parse_aq_err: err and status both unset, this should not be possible.");
+		return -EINVAL;
+	case GVE_ADMINQ_COMMAND_ERROR_ABORTED:
+	case GVE_ADMINQ_COMMAND_ERROR_CANCELLED:
+	case GVE_ADMINQ_COMMAND_ERROR_DATALOSS:
+	case GVE_ADMINQ_COMMAND_ERROR_FAILED_PRECONDITION:
+	case GVE_ADMINQ_COMMAND_ERROR_UNAVAILABLE:
+		return -EAGAIN;
+	case GVE_ADMINQ_COMMAND_ERROR_ALREADY_EXISTS:
+	case GVE_ADMINQ_COMMAND_ERROR_INTERNAL_ERROR:
+	case GVE_ADMINQ_COMMAND_ERROR_INVALID_ARGUMENT:
+	case GVE_ADMINQ_COMMAND_ERROR_NOT_FOUND:
+	case GVE_ADMINQ_COMMAND_ERROR_OUT_OF_RANGE:
+	case GVE_ADMINQ_COMMAND_ERROR_UNKNOWN_ERROR:
+		return -EINVAL;
+	case GVE_ADMINQ_COMMAND_ERROR_DEADLINE_EXCEEDED:
+		return -ETIME;
+	case GVE_ADMINQ_COMMAND_ERROR_PERMISSION_DENIED:
+	case GVE_ADMINQ_COMMAND_ERROR_UNAUTHENTICATED:
+		return -EACCES;
+	case GVE_ADMINQ_COMMAND_ERROR_RESOURCE_EXHAUSTED:
+		return -ENOMEM;
+	case GVE_ADMINQ_COMMAND_ERROR_UNIMPLEMENTED:
+		return -ENOTSUP;
+	default:
+		PMD_DRV_LOG(ERR, "parse_aq_err: unknown status code %d",
+			    status);
+		return -EINVAL;
+	}
+}
+
+/* Flushes all AQ commands currently queued and waits for them to complete.
+ * If there are failures, it will return the first error.
+ */
+static int gve_adminq_kick_and_wait(struct gve_priv *priv)
+{
+	u32 tail, head;
+	u32 i;
+
+	tail = ioread32be(&priv->reg_bar0->adminq_event_counter);
+	head = priv->adminq_prod_cnt;
+
+	gve_adminq_kick_cmd(priv, head);
+	if (!gve_adminq_wait_for_cmd(priv, head)) {
+		PMD_DRV_LOG(ERR, "AQ commands timed out, need to reset AQ");
+		priv->adminq_timeouts++;
+		return -ENOTRECOVERABLE;
+	}
+
+	for (i = tail; i < head; i++) {
+		union gve_adminq_command *cmd;
+		u32 status, err;
+
+		cmd = &priv->adminq[i & priv->adminq_mask];
+		status = be32_to_cpu(READ_ONCE32(cmd->status));
+		err = gve_adminq_parse_err(priv, status);
+		if (err)
+			/* Return the first error if we failed. */
+			return err;
+	}
+
+	return 0;
+}
+
+/* This function is not threadsafe - the caller is responsible for any
+ * necessary locks.
+ */
+static int gve_adminq_issue_cmd(struct gve_priv *priv,
+				union gve_adminq_command *cmd_orig)
+{
+	union gve_adminq_command *cmd;
+	u32 opcode;
+	u32 tail;
+
+	tail = ioread32be(&priv->reg_bar0->adminq_event_counter);
+
+	/* Check if next command will overflow the buffer. */
+	if (((priv->adminq_prod_cnt + 1) & priv->adminq_mask) ==
+	    (tail & priv->adminq_mask)) {
+		int err;
+
+		/* Flush existing commands to make room. */
+		err = gve_adminq_kick_and_wait(priv);
+		if (err)
+			return err;
+
+		/* Retry. */
+		tail = ioread32be(&priv->reg_bar0->adminq_event_counter);
+		if (((priv->adminq_prod_cnt + 1) & priv->adminq_mask) ==
+		    (tail & priv->adminq_mask)) {
+			/* This should never happen. We just flushed the
+			 * command queue so there should be enough space.
+			 */
+			return -ENOMEM;
+		}
+	}
+
+	cmd = &priv->adminq[priv->adminq_prod_cnt & priv->adminq_mask];
+	priv->adminq_prod_cnt++;
+
+	memcpy(cmd, cmd_orig, sizeof(*cmd_orig));
+	opcode = be32_to_cpu(READ_ONCE32(cmd->opcode));
+
+	switch (opcode) {
+	case GVE_ADMINQ_DESCRIBE_DEVICE:
+		priv->adminq_describe_device_cnt++;
+		break;
+	case GVE_ADMINQ_CONFIGURE_DEVICE_RESOURCES:
+		priv->adminq_cfg_device_resources_cnt++;
+		break;
+	case GVE_ADMINQ_REGISTER_PAGE_LIST:
+		priv->adminq_register_page_list_cnt++;
+		break;
+	case GVE_ADMINQ_UNREGISTER_PAGE_LIST:
+		priv->adminq_unregister_page_list_cnt++;
+		break;
+	case GVE_ADMINQ_CREATE_TX_QUEUE:
+		priv->adminq_create_tx_queue_cnt++;
+		break;
+	case GVE_ADMINQ_CREATE_RX_QUEUE:
+		priv->adminq_create_rx_queue_cnt++;
+		break;
+	case GVE_ADMINQ_DESTROY_TX_QUEUE:
+		priv->adminq_destroy_tx_queue_cnt++;
+		break;
+	case GVE_ADMINQ_DESTROY_RX_QUEUE:
+		priv->adminq_destroy_rx_queue_cnt++;
+		break;
+	case GVE_ADMINQ_DECONFIGURE_DEVICE_RESOURCES:
+		priv->adminq_dcfg_device_resources_cnt++;
+		break;
+	case GVE_ADMINQ_SET_DRIVER_PARAMETER:
+		priv->adminq_set_driver_parameter_cnt++;
+		break;
+	case GVE_ADMINQ_REPORT_STATS:
+		priv->adminq_report_stats_cnt++;
+		break;
+	case GVE_ADMINQ_REPORT_LINK_SPEED:
+		priv->adminq_report_link_speed_cnt++;
+		break;
+	case GVE_ADMINQ_GET_PTYPE_MAP:
+		priv->adminq_get_ptype_map_cnt++;
+		break;
+	default:
+		PMD_DRV_LOG(ERR, "unknown AQ command opcode %d", opcode);
+	}
+
+	return 0;
+}
+
+/* This function is not threadsafe - the caller is responsible for any
+ * necessary locks.
+ * The caller is also responsible for making sure there are no commands
+ * waiting to be executed.
+ */
+static int gve_adminq_execute_cmd(struct gve_priv *priv,
+				  union gve_adminq_command *cmd_orig)
+{
+	u32 tail, head;
+	int err;
+
+	tail = ioread32be(&priv->reg_bar0->adminq_event_counter);
+	head = priv->adminq_prod_cnt;
+	if (tail != head)
+		/* This is not a valid path */
+		return -EINVAL;
+
+	err = gve_adminq_issue_cmd(priv, cmd_orig);
+	if (err)
+		return err;
+
+	return gve_adminq_kick_and_wait(priv);
+}
+
+/* The device specifies that the management vector can either be the first irq
+ * or the last irq. ntfy_blk_msix_base_idx indicates the first irq assigned to
+ * the ntfy blks. If it is 0, the management vector is last; if it is 1, the
+ * management vector is first.
+ *
+ * gve arranges the msix vectors so that the management vector is last.
+ */
+#define GVE_NTFY_BLK_BASE_MSIX_IDX	0
+int gve_adminq_configure_device_resources(struct gve_priv *priv,
+					  dma_addr_t counter_array_bus_addr,
+					  u32 num_counters,
+					  dma_addr_t db_array_bus_addr,
+					  u32 num_ntfy_blks)
+{
+	union gve_adminq_command cmd;
+
+	memset(&cmd, 0, sizeof(cmd));
+	cmd.opcode = cpu_to_be32(GVE_ADMINQ_CONFIGURE_DEVICE_RESOURCES);
+	cmd.configure_device_resources =
+		(struct gve_adminq_configure_device_resources) {
+		.counter_array = cpu_to_be64(counter_array_bus_addr),
+		.num_counters = cpu_to_be32(num_counters),
+		.irq_db_addr = cpu_to_be64(db_array_bus_addr),
+		.num_irq_dbs = cpu_to_be32(num_ntfy_blks),
+		.irq_db_stride = cpu_to_be32(sizeof(*priv->irq_dbs)),
+		.ntfy_blk_msix_base_idx =
+					cpu_to_be32(GVE_NTFY_BLK_BASE_MSIX_IDX),
+		.queue_format = priv->queue_format,
+	};
+
+	return gve_adminq_execute_cmd(priv, &cmd);
+}
+
+int gve_adminq_deconfigure_device_resources(struct gve_priv *priv)
+{
+	union gve_adminq_command cmd;
+
+	memset(&cmd, 0, sizeof(cmd));
+	cmd.opcode = cpu_to_be32(GVE_ADMINQ_DECONFIGURE_DEVICE_RESOURCES);
+
+	return gve_adminq_execute_cmd(priv, &cmd);
+}
+
+static int gve_adminq_create_tx_queue(struct gve_priv *priv, u32 queue_index)
+{
+	struct gve_tx_queue *txq = priv->txqs[queue_index];
+	union gve_adminq_command cmd;
+
+	memset(&cmd, 0, sizeof(cmd));
+	cmd.opcode = cpu_to_be32(GVE_ADMINQ_CREATE_TX_QUEUE);
+	cmd.create_tx_queue = (struct gve_adminq_create_tx_queue) {
+		.queue_id = cpu_to_be32(queue_index),
+		.queue_resources_addr =
+			cpu_to_be64(txq->qres_mz->iova),
+		.tx_ring_addr = cpu_to_be64(txq->tx_ring_phys_addr),
+		.ntfy_id = cpu_to_be32(txq->ntfy_id),
+	};
+
+	if (gve_is_gqi(priv)) {
+		u32 qpl_id = priv->queue_format == GVE_GQI_RDA_FORMAT ?
+			GVE_RAW_ADDRESSING_QPL_ID : txq->qpl->id;
+
+		cmd.create_tx_queue.queue_page_list_id = cpu_to_be32(qpl_id);
+	} else {
+		cmd.create_tx_queue.tx_ring_size =
+			cpu_to_be16(txq->nb_tx_desc);
+		cmd.create_tx_queue.tx_comp_ring_addr =
+			cpu_to_be64(txq->complq->tx_ring_phys_addr);
+		cmd.create_tx_queue.tx_comp_ring_size =
+			cpu_to_be16(priv->tx_compq_size);
+	}
+
+	return gve_adminq_issue_cmd(priv, &cmd);
+}
+
+int gve_adminq_create_tx_queues(struct gve_priv *priv, u32 num_queues)
+{
+	int err;
+	u32 i;
+
+	for (i = 0; i < num_queues; i++) {
+		err = gve_adminq_create_tx_queue(priv, i);
+		if (err)
+			return err;
+	}
+
+	return gve_adminq_kick_and_wait(priv);
+}
+
+static int gve_adminq_create_rx_queue(struct gve_priv *priv, u32 queue_index)
+{
+	struct gve_rx_queue *rxq = priv->rxqs[queue_index];
+	union gve_adminq_command cmd;
+
+	memset(&cmd, 0, sizeof(cmd));
+	cmd.opcode = cpu_to_be32(GVE_ADMINQ_CREATE_RX_QUEUE);
+	cmd.create_rx_queue = (struct gve_adminq_create_rx_queue) {
+		.queue_id = cpu_to_be32(queue_index),
+		.ntfy_id = cpu_to_be32(rxq->ntfy_id),
+		.queue_resources_addr = cpu_to_be64(rxq->qres_mz->iova),
+	};
+
+	if (gve_is_gqi(priv)) {
+		u32 qpl_id = priv->queue_format == GVE_GQI_RDA_FORMAT ?
+			GVE_RAW_ADDRESSING_QPL_ID : rxq->qpl->id;
+
+		cmd.create_rx_queue.rx_desc_ring_addr =
+			cpu_to_be64(rxq->mz->iova),
+		cmd.create_rx_queue.rx_data_ring_addr =
+			cpu_to_be64(rxq->data_mz->iova),
+		cmd.create_rx_queue.index = cpu_to_be32(queue_index);
+		cmd.create_rx_queue.queue_page_list_id = cpu_to_be32(qpl_id);
+		cmd.create_rx_queue.packet_buffer_size = cpu_to_be16(rxq->rx_buf_len);
+	} else {
+		cmd.create_rx_queue.rx_ring_size =
+			cpu_to_be16(priv->rx_desc_cnt);
+		cmd.create_rx_queue.rx_desc_ring_addr =
+			cpu_to_be64(rxq->rx_ring_phys_addr);
+		cmd.create_rx_queue.rx_data_ring_addr =
+			cpu_to_be64(rxq->bufq->rx_ring_phys_addr);
+		cmd.create_rx_queue.packet_buffer_size =
+			cpu_to_be16(rxq->rx_buf_len);
+		cmd.create_rx_queue.rx_buff_ring_size =
+			cpu_to_be16(priv->rx_bufq_size);
+		cmd.create_rx_queue.enable_rsc = !!(priv->enable_rsc);
+	}
+
+	return gve_adminq_issue_cmd(priv, &cmd);
+}
+
+int gve_adminq_create_rx_queues(struct gve_priv *priv, u32 num_queues)
+{
+	int err;
+	u32 i;
+
+	for (i = 0; i < num_queues; i++) {
+		err = gve_adminq_create_rx_queue(priv, i);
+		if (err)
+			return err;
+	}
+
+	return gve_adminq_kick_and_wait(priv);
+}
+
+static int gve_adminq_destroy_tx_queue(struct gve_priv *priv, u32 queue_index)
+{
+	union gve_adminq_command cmd;
+	int err;
+
+	memset(&cmd, 0, sizeof(cmd));
+	cmd.opcode = cpu_to_be32(GVE_ADMINQ_DESTROY_TX_QUEUE);
+	cmd.destroy_tx_queue = (struct gve_adminq_destroy_tx_queue) {
+		.queue_id = cpu_to_be32(queue_index),
+	};
+
+	err = gve_adminq_issue_cmd(priv, &cmd);
+	if (err)
+		return err;
+
+	return 0;
+}
+
+int gve_adminq_destroy_tx_queues(struct gve_priv *priv, u32 num_queues)
+{
+	int err;
+	u32 i;
+
+	for (i = 0; i < num_queues; i++) {
+		err = gve_adminq_destroy_tx_queue(priv, i);
+		if (err)
+			return err;
+	}
+
+	return gve_adminq_kick_and_wait(priv);
+}
+
+static int gve_adminq_destroy_rx_queue(struct gve_priv *priv, u32 queue_index)
+{
+	union gve_adminq_command cmd;
+	int err;
+
+	memset(&cmd, 0, sizeof(cmd));
+	cmd.opcode = cpu_to_be32(GVE_ADMINQ_DESTROY_RX_QUEUE);
+	cmd.destroy_rx_queue = (struct gve_adminq_destroy_rx_queue) {
+		.queue_id = cpu_to_be32(queue_index),
+	};
+
+	err = gve_adminq_issue_cmd(priv, &cmd);
+	if (err)
+		return err;
+
+	return 0;
+}
+
+int gve_adminq_destroy_rx_queues(struct gve_priv *priv, u32 num_queues)
+{
+	int err;
+	u32 i;
+
+	for (i = 0; i < num_queues; i++) {
+		err = gve_adminq_destroy_rx_queue(priv, i);
+		if (err)
+			return err;
+	}
+
+	return gve_adminq_kick_and_wait(priv);
+}
+
+static int gve_set_desc_cnt(struct gve_priv *priv,
+			    struct gve_device_descriptor *descriptor)
+{
+	priv->tx_desc_cnt = be16_to_cpu(descriptor->tx_queue_entries);
+	if (priv->tx_desc_cnt * sizeof(priv->txqs[0]->tx_desc_ring[0])
+	    < PAGE_SIZE) {
+		PMD_DRV_LOG(ERR, "Tx desc count %d too low", priv->tx_desc_cnt);
+		return -EINVAL;
+	}
+	priv->rx_desc_cnt = be16_to_cpu(descriptor->rx_queue_entries);
+	if (priv->rx_desc_cnt * sizeof(priv->rxqs[0]->rx_desc_ring[0])
+	    < PAGE_SIZE) {
+		PMD_DRV_LOG(ERR, "Rx desc count %d too low", priv->rx_desc_cnt);
+		return -EINVAL;
+	}
+	return 0;
+}
+
+static int
+gve_set_desc_cnt_dqo(struct gve_priv *priv,
+		     const struct gve_device_descriptor *descriptor,
+		     const struct gve_device_option_dqo_rda *dev_op_dqo_rda)
+{
+	priv->tx_desc_cnt = be16_to_cpu(descriptor->tx_queue_entries);
+	priv->tx_compq_size = be16_to_cpu(dev_op_dqo_rda->tx_comp_ring_entries);
+	priv->rx_desc_cnt = be16_to_cpu(descriptor->rx_queue_entries);
+	priv->rx_bufq_size = be16_to_cpu(dev_op_dqo_rda->rx_buff_ring_entries);
+
+	return 0;
+}
+
+static void gve_enable_supported_features(struct gve_priv *priv,
+					  u32 supported_features_mask,
+					  const struct gve_device_option_jumbo_frames
+						  *dev_op_jumbo_frames)
+{
+	/* Before control reaches this point, the page-size-capped max MTU from
+	 * the gve_device_descriptor field has already been stored in
+	 * priv->max_mtu. We overwrite it with the true max MTU below.
+	 */
+	if (dev_op_jumbo_frames &&
+	    (supported_features_mask & GVE_SUP_JUMBO_FRAMES_MASK)) {
+		PMD_DRV_LOG(INFO, "JUMBO FRAMES device option enabled.");
+		priv->max_mtu = be16_to_cpu(dev_op_jumbo_frames->max_mtu);
+	}
+}
+
+int gve_adminq_describe_device(struct gve_priv *priv)
+{
+	struct gve_device_option_jumbo_frames *dev_op_jumbo_frames = NULL;
+	struct gve_device_option_gqi_rda *dev_op_gqi_rda = NULL;
+	struct gve_device_option_gqi_qpl *dev_op_gqi_qpl = NULL;
+	struct gve_device_option_dqo_rda *dev_op_dqo_rda = NULL;
+	struct gve_device_descriptor *descriptor;
+	struct gve_dma_mem descriptor_dma_mem;
+	u32 supported_features_mask = 0;
+	union gve_adminq_command cmd;
+	int err = 0;
+	u8 *mac;
+	u16 mtu;
+
+	memset(&cmd, 0, sizeof(cmd));
+	descriptor = gve_alloc_dma_mem(&descriptor_dma_mem, PAGE_SIZE);
+	if (!descriptor)
+		return -ENOMEM;
+	cmd.opcode = cpu_to_be32(GVE_ADMINQ_DESCRIBE_DEVICE);
+	cmd.describe_device.device_descriptor_addr =
+					cpu_to_be64(descriptor_dma_mem.pa);
+	cmd.describe_device.device_descriptor_version =
+			cpu_to_be32(GVE_ADMINQ_DEVICE_DESCRIPTOR_VERSION);
+	cmd.describe_device.available_length = cpu_to_be32(PAGE_SIZE);
+
+	err = gve_adminq_execute_cmd(priv, &cmd);
+	if (err)
+		goto free_device_descriptor;
+
+	err = gve_process_device_options(priv, descriptor, &dev_op_gqi_rda,
+					 &dev_op_gqi_qpl, &dev_op_dqo_rda,
+					 &dev_op_jumbo_frames);
+	if (err)
+		goto free_device_descriptor;
+
+	/* If the GQI_RAW_ADDRESSING option is not enabled and the queue format
+	 * is not set to GqiRda, choose the queue format in a priority order:
+	 * DqoRda, GqiRda, GqiQpl. Use GqiQpl as default.
+	 */
+	if (dev_op_dqo_rda) {
+		priv->queue_format = GVE_DQO_RDA_FORMAT;
+		PMD_DRV_LOG(INFO, "Driver is running with DQO RDA queue format.");
+		supported_features_mask =
+			be32_to_cpu(dev_op_dqo_rda->supported_features_mask);
+	} else if (dev_op_gqi_rda) {
+		priv->queue_format = GVE_GQI_RDA_FORMAT;
+		PMD_DRV_LOG(INFO, "Driver is running with GQI RDA queue format.");
+		supported_features_mask =
+			be32_to_cpu(dev_op_gqi_rda->supported_features_mask);
+	} else if (priv->queue_format == GVE_GQI_RDA_FORMAT) {
+		PMD_DRV_LOG(INFO, "Driver is running with GQI RDA queue format.");
+	} else {
+		priv->queue_format = GVE_GQI_QPL_FORMAT;
+		if (dev_op_gqi_qpl)
+			supported_features_mask =
+				be32_to_cpu(dev_op_gqi_qpl->supported_features_mask);
+		PMD_DRV_LOG(INFO, "Driver is running with GQI QPL queue format.");
+	}
+	if (gve_is_gqi(priv)) {
+		err = gve_set_desc_cnt(priv, descriptor);
+	} else {
+		/* DQO supports LRO. */
+		err = gve_set_desc_cnt_dqo(priv, descriptor, dev_op_dqo_rda);
+	}
+	if (err)
+		goto free_device_descriptor;
+
+	priv->max_registered_pages =
+				be64_to_cpu(descriptor->max_registered_pages);
+	mtu = be16_to_cpu(descriptor->mtu);
+	if (mtu < ETH_MIN_MTU) {
+		PMD_DRV_LOG(ERR, "MTU %d below minimum MTU", mtu);
+		err = -EINVAL;
+		goto free_device_descriptor;
+	}
+	priv->max_mtu = mtu;
+	priv->num_event_counters = be16_to_cpu(descriptor->counters);
+	rte_memcpy(priv->dev_addr.addr_bytes, descriptor->mac, ETH_ALEN);
+	mac = descriptor->mac;
+	PMD_DRV_LOG(INFO, "MAC addr: %02x:%02x:%02x:%02x:%02x:%02x",
+		    mac[0], mac[1], mac[2], mac[3], mac[4], mac[5]);
+	priv->tx_pages_per_qpl = be16_to_cpu(descriptor->tx_pages_per_qpl);
+	priv->rx_data_slot_cnt = be16_to_cpu(descriptor->rx_pages_per_qpl);
+
+	if (gve_is_gqi(priv) && priv->rx_data_slot_cnt < priv->rx_desc_cnt) {
+		PMD_DRV_LOG(ERR,
+			    "rx_data_slot_cnt cannot be smaller than rx_desc_cnt, setting rx_desc_cnt down to %d",
+			    priv->rx_data_slot_cnt);
+		priv->rx_desc_cnt = priv->rx_data_slot_cnt;
+	}
+	priv->default_num_queues = be16_to_cpu(descriptor->default_num_queues);
+
+	gve_enable_supported_features(priv, supported_features_mask,
+				      dev_op_jumbo_frames);
+
+free_device_descriptor:
+	gve_free_dma_mem(&descriptor_dma_mem);
+	return err;
+}
+
+int gve_adminq_register_page_list(struct gve_priv *priv,
+				  struct gve_queue_page_list *qpl)
+{
+	struct gve_dma_mem page_list_dma_mem;
+	u32 num_entries = qpl->num_entries;
+	u32 size = num_entries * sizeof(qpl->page_buses[0]);
+	union gve_adminq_command cmd;
+	__be64 *page_list;
+	int err;
+	u32 i;
+
+	memset(&cmd, 0, sizeof(cmd));
+	page_list = gve_alloc_dma_mem(&page_list_dma_mem, size);
+	if (!page_list)
+		return -ENOMEM;
+
+	for (i = 0; i < num_entries; i++)
+		page_list[i] = cpu_to_be64(qpl->page_buses[i]);
+
+	cmd.opcode = cpu_to_be32(GVE_ADMINQ_REGISTER_PAGE_LIST);
+	cmd.reg_page_list = (struct gve_adminq_register_page_list) {
+		.page_list_id = cpu_to_be32(qpl->id),
+		.num_pages = cpu_to_be32(num_entries),
+		.page_address_list_addr = cpu_to_be64(page_list_dma_mem.pa),
+	};
+
+	err = gve_adminq_execute_cmd(priv, &cmd);
+	gve_free_dma_mem(&page_list_dma_mem);
+	return err;
+}
+
+int gve_adminq_unregister_page_list(struct gve_priv *priv, u32 page_list_id)
+{
+	union gve_adminq_command cmd;
+
+	memset(&cmd, 0, sizeof(cmd));
+	cmd.opcode = cpu_to_be32(GVE_ADMINQ_UNREGISTER_PAGE_LIST);
+	cmd.unreg_page_list = (struct gve_adminq_unregister_page_list) {
+		.page_list_id = cpu_to_be32(page_list_id),
+	};
+
+	return gve_adminq_execute_cmd(priv, &cmd);
+}
+
+int gve_adminq_set_mtu(struct gve_priv *priv, u64 mtu)
+{
+	union gve_adminq_command cmd;
+
+	memset(&cmd, 0, sizeof(cmd));
+	cmd.opcode = cpu_to_be32(GVE_ADMINQ_SET_DRIVER_PARAMETER);
+	cmd.set_driver_param = (struct gve_adminq_set_driver_parameter) {
+		.parameter_type = cpu_to_be32(GVE_SET_PARAM_MTU),
+		.parameter_value = cpu_to_be64(mtu),
+	};
+
+	return gve_adminq_execute_cmd(priv, &cmd);
+}
+
+int gve_adminq_report_stats(struct gve_priv *priv, u64 stats_report_len,
+			    dma_addr_t stats_report_addr, u64 interval)
+{
+	union gve_adminq_command cmd;
+
+	memset(&cmd, 0, sizeof(cmd));
+	cmd.opcode = cpu_to_be32(GVE_ADMINQ_REPORT_STATS);
+	cmd.report_stats = (struct gve_adminq_report_stats) {
+		.stats_report_len = cpu_to_be64(stats_report_len),
+		.stats_report_addr = cpu_to_be64(stats_report_addr),
+		.interval = cpu_to_be64(interval),
+	};
+
+	return gve_adminq_execute_cmd(priv, &cmd);
+}
+
+int gve_adminq_report_link_speed(struct gve_priv *priv)
+{
+	struct gve_dma_mem link_speed_region_dma_mem;
+	union gve_adminq_command gvnic_cmd;
+	u64 *link_speed_region;
+	int err;
+
+	link_speed_region = gve_alloc_dma_mem(&link_speed_region_dma_mem,
+					      sizeof(*link_speed_region));
+
+	if (!link_speed_region)
+		return -ENOMEM;
+
+	memset(&gvnic_cmd, 0, sizeof(gvnic_cmd));
+	gvnic_cmd.opcode = cpu_to_be32(GVE_ADMINQ_REPORT_LINK_SPEED);
+	gvnic_cmd.report_link_speed.link_speed_address =
+		cpu_to_be64(link_speed_region_dma_mem.pa);
+
+	err = gve_adminq_execute_cmd(priv, &gvnic_cmd);
+
+	priv->link_speed = be64_to_cpu(*link_speed_region);
+	gve_free_dma_mem(&link_speed_region_dma_mem);
+	return err;
+}
+
+int gve_adminq_get_ptype_map_dqo(struct gve_priv *priv,
+				 struct gve_ptype_lut *ptype_lut)
+{
+	struct gve_dma_mem ptype_map_dma_mem;
+	struct gve_ptype_map *ptype_map;
+	union gve_adminq_command cmd;
+	int err = 0;
+	int i;
+
+	memset(&cmd, 0, sizeof(cmd));
+	ptype_map = gve_alloc_dma_mem(&ptype_map_dma_mem, sizeof(*ptype_map));
+	if (!ptype_map)
+		return -ENOMEM;
+
+	cmd.opcode = cpu_to_be32(GVE_ADMINQ_GET_PTYPE_MAP);
+	cmd.get_ptype_map = (struct gve_adminq_get_ptype_map) {
+		.ptype_map_len = cpu_to_be64(sizeof(*ptype_map)),
+		.ptype_map_addr = cpu_to_be64(ptype_map_dma_mem.pa),
+	};
+
+	err = gve_adminq_execute_cmd(priv, &cmd);
+	if (err)
+		goto err;
+
+	/* Populate ptype_lut. */
+	for (i = 0; i < GVE_NUM_PTYPES; i++) {
+		ptype_lut->ptypes[i].l3_type =
+			ptype_map->ptypes[i].l3_type;
+		ptype_lut->ptypes[i].l4_type =
+			ptype_map->ptypes[i].l4_type;
+	}
+err:
+	gve_free_dma_mem(&ptype_map_dma_mem);
+	return err;
+}
diff --git a/drivers/net/gve/base/gve_adminq.h b/drivers/net/gve/base/gve_adminq.h
new file mode 100644
index 0000000000..b2422d7dc8
--- /dev/null
+++ b/drivers/net/gve/base/gve_adminq.h
@@ -0,0 +1,379 @@
+/* SPDX-License-Identifier: MIT
+ * Google Virtual Ethernet (gve) driver
+ * Copyright (C) 2015-2022 Google, Inc.
+ */
+
+#ifndef _GVE_ADMINQ_H
+#define _GVE_ADMINQ_H
+
+/* Admin queue opcodes */
+enum gve_adminq_opcodes {
+	GVE_ADMINQ_DESCRIBE_DEVICE		= 0x1,
+	GVE_ADMINQ_CONFIGURE_DEVICE_RESOURCES	= 0x2,
+	GVE_ADMINQ_REGISTER_PAGE_LIST		= 0x3,
+	GVE_ADMINQ_UNREGISTER_PAGE_LIST		= 0x4,
+	GVE_ADMINQ_CREATE_TX_QUEUE		= 0x5,
+	GVE_ADMINQ_CREATE_RX_QUEUE		= 0x6,
+	GVE_ADMINQ_DESTROY_TX_QUEUE		= 0x7,
+	GVE_ADMINQ_DESTROY_RX_QUEUE		= 0x8,
+	GVE_ADMINQ_DECONFIGURE_DEVICE_RESOURCES	= 0x9,
+	GVE_ADMINQ_SET_DRIVER_PARAMETER		= 0xB,
+	GVE_ADMINQ_REPORT_STATS			= 0xC,
+	GVE_ADMINQ_REPORT_LINK_SPEED		= 0xD,
+	GVE_ADMINQ_GET_PTYPE_MAP		= 0xE,
+};
+
+/* Admin queue status codes */
+enum gve_adminq_statuses {
+	GVE_ADMINQ_COMMAND_UNSET			= 0x0,
+	GVE_ADMINQ_COMMAND_PASSED			= 0x1,
+	GVE_ADMINQ_COMMAND_ERROR_ABORTED		= 0xFFFFFFF0,
+	GVE_ADMINQ_COMMAND_ERROR_ALREADY_EXISTS		= 0xFFFFFFF1,
+	GVE_ADMINQ_COMMAND_ERROR_CANCELLED		= 0xFFFFFFF2,
+	GVE_ADMINQ_COMMAND_ERROR_DATALOSS		= 0xFFFFFFF3,
+	GVE_ADMINQ_COMMAND_ERROR_DEADLINE_EXCEEDED	= 0xFFFFFFF4,
+	GVE_ADMINQ_COMMAND_ERROR_FAILED_PRECONDITION	= 0xFFFFFFF5,
+	GVE_ADMINQ_COMMAND_ERROR_INTERNAL_ERROR		= 0xFFFFFFF6,
+	GVE_ADMINQ_COMMAND_ERROR_INVALID_ARGUMENT	= 0xFFFFFFF7,
+	GVE_ADMINQ_COMMAND_ERROR_NOT_FOUND		= 0xFFFFFFF8,
+	GVE_ADMINQ_COMMAND_ERROR_OUT_OF_RANGE		= 0xFFFFFFF9,
+	GVE_ADMINQ_COMMAND_ERROR_PERMISSION_DENIED	= 0xFFFFFFFA,
+	GVE_ADMINQ_COMMAND_ERROR_UNAUTHENTICATED	= 0xFFFFFFFB,
+	GVE_ADMINQ_COMMAND_ERROR_RESOURCE_EXHAUSTED	= 0xFFFFFFFC,
+	GVE_ADMINQ_COMMAND_ERROR_UNAVAILABLE		= 0xFFFFFFFD,
+	GVE_ADMINQ_COMMAND_ERROR_UNIMPLEMENTED		= 0xFFFFFFFE,
+	GVE_ADMINQ_COMMAND_ERROR_UNKNOWN_ERROR		= 0xFFFFFFFF,
+};
+
+#define GVE_ADMINQ_DEVICE_DESCRIPTOR_VERSION 1
+
+/* All AdminQ command structs should be naturally packed.
+ * GVE_CHECK_STRUCT/UNION_LEN will check struct/union length and throw
+ * error at compile time when the size is not correct.
+ */
+
+struct gve_adminq_describe_device {
+	__be64 device_descriptor_addr;
+	__be32 device_descriptor_version;
+	__be32 available_length;
+};
+
+GVE_CHECK_STRUCT_LEN(16, gve_adminq_describe_device);
+
+struct gve_device_descriptor {
+	__be64 max_registered_pages;
+	__be16 reserved1;
+	__be16 tx_queue_entries;
+	__be16 rx_queue_entries;
+	__be16 default_num_queues;
+	__be16 mtu;
+	__be16 counters;
+	__be16 tx_pages_per_qpl;
+	__be16 rx_pages_per_qpl;
+	u8  mac[ETH_ALEN];
+	__be16 num_device_options;
+	__be16 total_length;
+	u8  reserved2[6];
+};
+
+GVE_CHECK_STRUCT_LEN(40, gve_device_descriptor);
+
+struct gve_device_option {
+	__be16 option_id;
+	__be16 option_length;
+	__be32 required_features_mask;
+};
+
+GVE_CHECK_STRUCT_LEN(8, gve_device_option);
+
+struct gve_device_option_gqi_rda {
+	__be32 supported_features_mask;
+};
+
+GVE_CHECK_STRUCT_LEN(4, gve_device_option_gqi_rda);
+
+struct gve_device_option_gqi_qpl {
+	__be32 supported_features_mask;
+};
+
+GVE_CHECK_STRUCT_LEN(4, gve_device_option_gqi_qpl);
+
+struct gve_device_option_dqo_rda {
+	__be32 supported_features_mask;
+	__be16 tx_comp_ring_entries;
+	__be16 rx_buff_ring_entries;
+};
+
+GVE_CHECK_STRUCT_LEN(8, gve_device_option_dqo_rda);
+
+struct gve_device_option_jumbo_frames {
+	__be32 supported_features_mask;
+	__be16 max_mtu;
+	u8 padding[2];
+};
+
+GVE_CHECK_STRUCT_LEN(8, gve_device_option_jumbo_frames);
+
+/* Terminology:
+ *
+ * RDA - Raw DMA Addressing - Buffers associated with SKBs are directly DMA
+ *       mapped and read/updated by the device.
+ *
+ * QPL - Queue Page Lists - Driver uses bounce buffers which are DMA mapped with
+ *       the device for read/write and data is copied from/to SKBs.
+ */
+enum gve_dev_opt_id {
+	GVE_DEV_OPT_ID_GQI_RAW_ADDRESSING = 0x1,
+	GVE_DEV_OPT_ID_GQI_RDA = 0x2,
+	GVE_DEV_OPT_ID_GQI_QPL = 0x3,
+	GVE_DEV_OPT_ID_DQO_RDA = 0x4,
+	GVE_DEV_OPT_ID_JUMBO_FRAMES = 0x8,
+};
+
+enum gve_dev_opt_req_feat_mask {
+	GVE_DEV_OPT_REQ_FEAT_MASK_GQI_RAW_ADDRESSING = 0x0,
+	GVE_DEV_OPT_REQ_FEAT_MASK_GQI_RDA = 0x0,
+	GVE_DEV_OPT_REQ_FEAT_MASK_GQI_QPL = 0x0,
+	GVE_DEV_OPT_REQ_FEAT_MASK_DQO_RDA = 0x0,
+	GVE_DEV_OPT_REQ_FEAT_MASK_JUMBO_FRAMES = 0x0,
+};
+
+enum gve_sup_feature_mask {
+	GVE_SUP_JUMBO_FRAMES_MASK = 1 << 2,
+};
+
+#define GVE_DEV_OPT_LEN_GQI_RAW_ADDRESSING 0x0
+
+struct gve_adminq_configure_device_resources {
+	__be64 counter_array;
+	__be64 irq_db_addr;
+	__be32 num_counters;
+	__be32 num_irq_dbs;
+	__be32 irq_db_stride;
+	__be32 ntfy_blk_msix_base_idx;
+	u8 queue_format;
+	u8 padding[7];
+};
+
+GVE_CHECK_STRUCT_LEN(40, gve_adminq_configure_device_resources);
+
+struct gve_adminq_register_page_list {
+	__be32 page_list_id;
+	__be32 num_pages;
+	__be64 page_address_list_addr;
+};
+
+GVE_CHECK_STRUCT_LEN(16, gve_adminq_register_page_list);
+
+struct gve_adminq_unregister_page_list {
+	__be32 page_list_id;
+};
+
+GVE_CHECK_STRUCT_LEN(4, gve_adminq_unregister_page_list);
+
+#define GVE_RAW_ADDRESSING_QPL_ID 0xFFFFFFFF
+
+struct gve_adminq_create_tx_queue {
+	__be32 queue_id;
+	__be32 reserved;
+	__be64 queue_resources_addr;
+	__be64 tx_ring_addr;
+	__be32 queue_page_list_id;
+	__be32 ntfy_id;
+	__be64 tx_comp_ring_addr;
+	__be16 tx_ring_size;
+	__be16 tx_comp_ring_size;
+	u8 padding[4];
+};
+
+GVE_CHECK_STRUCT_LEN(48, gve_adminq_create_tx_queue);
+
+struct gve_adminq_create_rx_queue {
+	__be32 queue_id;
+	__be32 index;
+	__be32 reserved;
+	__be32 ntfy_id;
+	__be64 queue_resources_addr;
+	__be64 rx_desc_ring_addr;
+	__be64 rx_data_ring_addr;
+	__be32 queue_page_list_id;
+	__be16 rx_ring_size;
+	__be16 packet_buffer_size;
+	__be16 rx_buff_ring_size;
+	u8 enable_rsc;
+	u8 padding[5];
+};
+
+GVE_CHECK_STRUCT_LEN(56, gve_adminq_create_rx_queue);
+
+/* Queue resources that are shared with the device */
+struct gve_queue_resources {
+	union {
+		struct {
+			__be32 db_index;	/* Device -> Guest */
+			__be32 counter_index;	/* Device -> Guest */
+		};
+		u8 reserved[64];
+	};
+};
+
+GVE_CHECK_STRUCT_LEN(64, gve_queue_resources);
+
+struct gve_adminq_destroy_tx_queue {
+	__be32 queue_id;
+};
+
+GVE_CHECK_STRUCT_LEN(4, gve_adminq_destroy_tx_queue);
+
+struct gve_adminq_destroy_rx_queue {
+	__be32 queue_id;
+};
+
+GVE_CHECK_STRUCT_LEN(4, gve_adminq_destroy_rx_queue);
+
+/* GVE Set Driver Parameter Types */
+enum gve_set_driver_param_types {
+	GVE_SET_PARAM_MTU	= 0x1,
+};
+
+struct gve_adminq_set_driver_parameter {
+	__be32 parameter_type;
+	u8 reserved[4];
+	__be64 parameter_value;
+};
+
+GVE_CHECK_STRUCT_LEN(16, gve_adminq_set_driver_parameter);
+
+struct gve_adminq_report_stats {
+	__be64 stats_report_len;
+	__be64 stats_report_addr;
+	__be64 interval;
+};
+
+GVE_CHECK_STRUCT_LEN(24, gve_adminq_report_stats);
+
+struct gve_adminq_report_link_speed {
+	__be64 link_speed_address;
+};
+
+GVE_CHECK_STRUCT_LEN(8, gve_adminq_report_link_speed);
+
+struct stats {
+	__be32 stat_name;
+	__be32 queue_id;
+	__be64 value;
+};
+
+GVE_CHECK_STRUCT_LEN(16, stats);
+
+struct gve_stats_report {
+	__be64 written_count;
+	struct stats stats[];
+};
+
+GVE_CHECK_STRUCT_LEN(8, gve_stats_report);
+
+enum gve_stat_names {
+	/* stats from gve */
+	TX_WAKE_CNT			= 1,
+	TX_STOP_CNT			= 2,
+	TX_FRAMES_SENT			= 3,
+	TX_BYTES_SENT			= 4,
+	TX_LAST_COMPLETION_PROCESSED	= 5,
+	RX_NEXT_EXPECTED_SEQUENCE	= 6,
+	RX_BUFFERS_POSTED		= 7,
+	TX_TIMEOUT_CNT			= 8,
+	/* stats from NIC */
+	RX_QUEUE_DROP_CNT		= 65,
+	RX_NO_BUFFERS_POSTED		= 66,
+	RX_DROPS_PACKET_OVER_MRU	= 67,
+	RX_DROPS_INVALID_CHECKSUM	= 68,
+};
+
+enum gve_l3_type {
+	/* Must be zero so zero initialized LUT is unknown. */
+	GVE_L3_TYPE_UNKNOWN = 0,
+	GVE_L3_TYPE_OTHER,
+	GVE_L3_TYPE_IPV4,
+	GVE_L3_TYPE_IPV6,
+};
+
+enum gve_l4_type {
+	/* Must be zero so zero initialized LUT is unknown. */
+	GVE_L4_TYPE_UNKNOWN = 0,
+	GVE_L4_TYPE_OTHER,
+	GVE_L4_TYPE_TCP,
+	GVE_L4_TYPE_UDP,
+	GVE_L4_TYPE_ICMP,
+	GVE_L4_TYPE_SCTP,
+};
+
+/* These are control path types for PTYPE which are the same as the data path
+ * types.
+ */
+struct gve_ptype_entry {
+	u8 l3_type;
+	u8 l4_type;
+};
+
+struct gve_ptype_map {
+	struct gve_ptype_entry ptypes[1 << 10]; /* PTYPES are always 10 bits. */
+};
+
+struct gve_adminq_get_ptype_map {
+	__be64 ptype_map_len;
+	__be64 ptype_map_addr;
+};
+
+union gve_adminq_command {
+	struct {
+		__be32 opcode;
+		__be32 status;
+		union {
+			struct gve_adminq_configure_device_resources
+						configure_device_resources;
+			struct gve_adminq_create_tx_queue create_tx_queue;
+			struct gve_adminq_create_rx_queue create_rx_queue;
+			struct gve_adminq_destroy_tx_queue destroy_tx_queue;
+			struct gve_adminq_destroy_rx_queue destroy_rx_queue;
+			struct gve_adminq_describe_device describe_device;
+			struct gve_adminq_register_page_list reg_page_list;
+			struct gve_adminq_unregister_page_list unreg_page_list;
+			struct gve_adminq_set_driver_parameter set_driver_param;
+			struct gve_adminq_report_stats report_stats;
+			struct gve_adminq_report_link_speed report_link_speed;
+			struct gve_adminq_get_ptype_map get_ptype_map;
+		};
+	};
+	u8 reserved[64];
+};
+
+GVE_CHECK_UNION_LEN(64, gve_adminq_command);
+
+int gve_adminq_alloc(struct gve_priv *priv);
+void gve_adminq_free(struct gve_priv *priv);
+void gve_adminq_release(struct gve_priv *priv);
+int gve_adminq_describe_device(struct gve_priv *priv);
+int gve_adminq_configure_device_resources(struct gve_priv *priv,
+					  dma_addr_t counter_array_bus_addr,
+					  u32 num_counters,
+					  dma_addr_t db_array_bus_addr,
+					  u32 num_ntfy_blks);
+int gve_adminq_deconfigure_device_resources(struct gve_priv *priv);
+int gve_adminq_create_tx_queues(struct gve_priv *priv, u32 num_queues);
+int gve_adminq_destroy_tx_queues(struct gve_priv *priv, u32 num_queues);
+int gve_adminq_create_rx_queues(struct gve_priv *priv, u32 num_queues);
+int gve_adminq_destroy_rx_queues(struct gve_priv *priv, u32 num_queues);
+int gve_adminq_register_page_list(struct gve_priv *priv,
+				  struct gve_queue_page_list *qpl);
+int gve_adminq_unregister_page_list(struct gve_priv *priv, u32 page_list_id);
+int gve_adminq_set_mtu(struct gve_priv *priv, u64 mtu);
+int gve_adminq_report_stats(struct gve_priv *priv, u64 stats_report_len,
+			    dma_addr_t stats_report_addr, u64 interval);
+int gve_adminq_report_link_speed(struct gve_priv *priv);
+
+struct gve_ptype_lut;
+int gve_adminq_get_ptype_map_dqo(struct gve_priv *priv,
+				 struct gve_ptype_lut *ptype_lut);
+
+#endif /* _GVE_ADMINQ_H */
diff --git a/drivers/net/gve/base/gve_desc.h b/drivers/net/gve/base/gve_desc.h
new file mode 100644
index 0000000000..e0bbadcfd4
--- /dev/null
+++ b/drivers/net/gve/base/gve_desc.h
@@ -0,0 +1,136 @@
+/* SPDX-License-Identifier: MIT
+ * Google Virtual Ethernet (gve) driver
+ * Copyright (C) 2015-2022 Google, Inc.
+ */
+
+/* GVE Transmit Descriptor formats */
+
+#ifndef _GVE_DESC_H_
+#define _GVE_DESC_H_
+
+/* A note on seg_addrs
+ *
+ * Base addresses encoded in seg_addr are not assumed to be physical
+ * addresses. The ring format assumes these come from some linear address
+ * space. This could be physical memory, kernel virtual memory, user virtual
+ * memory.
+ * If raw dma addressing is not supported then gVNIC uses lists of registered
+ * pages. Each queue is assumed to be associated with a single such linear
+ * address space to ensure a consistent meaning for seg_addrs posted to its
+ * rings.
+ */
+
+struct gve_tx_pkt_desc {
+	u8	type_flags;  /* desc type is lower 4 bits, flags upper */
+	u8	l4_csum_offset;  /* relative offset of L4 csum word */
+	u8	l4_hdr_offset;  /* Offset of start of L4 headers in packet */
+	u8	desc_cnt;  /* Total descriptors for this packet */
+	__be16	len;  /* Total length of this packet (in bytes) */
+	__be16	seg_len;  /* Length of this descriptor's segment */
+	__be64	seg_addr;  /* Base address (see note) of this segment */
+} __packed;
+
+struct gve_tx_mtd_desc {
+	u8      type_flags;     /* type is lower 4 bits, subtype upper  */
+	u8      path_state;     /* state is lower 4 bits, hash type upper */
+	__be16  reserved0;
+	__be32  path_hash;
+	__be64  reserved1;
+} __packed;
+
+struct gve_tx_seg_desc {
+	u8	type_flags;	/* type is lower 4 bits, flags upper	*/
+	u8	l3_offset;	/* TSO: 2 byte units to start of IPH	*/
+	__be16	reserved;
+	__be16	mss;		/* TSO MSS				*/
+	__be16	seg_len;
+	__be64	seg_addr;
+} __packed;
+
+/* GVE Transmit Descriptor Types */
+#define	GVE_TXD_STD		(0x0 << 4) /* Std with Host Address	*/
+#define	GVE_TXD_TSO		(0x1 << 4) /* TSO with Host Address	*/
+#define	GVE_TXD_SEG		(0x2 << 4) /* Seg with Host Address	*/
+#define	GVE_TXD_MTD		(0x3 << 4) /* Metadata			*/
+
+/* GVE Transmit Descriptor Flags for Std Pkts */
+#define	GVE_TXF_L4CSUM	BIT(0)	/* Need csum offload */
+#define	GVE_TXF_TSTAMP	BIT(2)	/* Timestamp required */
+
+/* GVE Transmit Descriptor Flags for TSO Segs */
+#define	GVE_TXSF_IPV6	BIT(1)	/* IPv6 TSO */
+
+/* GVE Transmit Descriptor Options for MTD Segs */
+#define GVE_MTD_SUBTYPE_PATH		0
+
+#define GVE_MTD_PATH_STATE_DEFAULT	0
+#define GVE_MTD_PATH_STATE_TIMEOUT	1
+#define GVE_MTD_PATH_STATE_CONGESTION	2
+#define GVE_MTD_PATH_STATE_RETRANSMIT	3
+
+#define GVE_MTD_PATH_HASH_NONE         (0x0 << 4)
+#define GVE_MTD_PATH_HASH_L4           (0x1 << 4)
+
+/* GVE Receive Packet Descriptor */
+/* The start of an Ethernet packet comes 2 bytes into the rx buffer.
+ * gVNIC adds this padding so that both the DMA access and the L3/L4 protocol
+ * header access are aligned.
+ */
+#define GVE_RX_PAD 2
+
+struct gve_rx_desc {
+	u8	padding[48];
+	__be32	rss_hash;  /* Receive-side scaling hash (Toeplitz for gVNIC) */
+	__be16	mss;
+	__be16	reserved;  /* Reserved to zero */
+	u8	hdr_len;  /* Header length (L2-L4) including padding */
+	u8	hdr_off;  /* 64-byte-scaled offset into RX_DATA entry */
+	__sum16	csum;  /* 1's-complement partial checksum of L3+ bytes */
+	__be16	len;  /* Length of the received packet */
+	__be16	flags_seq;  /* Flags [15:3] and sequence number [2:0] (1-7) */
+} __packed;
+GVE_CHECK_STRUCT_LEN(64, gve_rx_desc);
+
+/* If the device supports raw dma addressing then the addr in data slot is
+ * the dma address of the buffer.
+ * If the device only supports registered segments then the addr is a byte
+ * offset into the registered segment (an ordered list of pages) where the
+ * buffer is.
+ */
+union gve_rx_data_slot {
+	__be64 qpl_offset;
+	__be64 addr;
+};
+
+/* GVE Receive Packet Descriptor Seq No */
+#define GVE_SEQNO(x) (be16_to_cpu(x) & 0x7)
+
+/* GVE Receive Packet Descriptor Flags */
+#define GVE_RXFLG(x)	cpu_to_be16(1 << (3 + (x)))
+#define	GVE_RXF_FRAG		GVE_RXFLG(3)	/* IP Fragment			*/
+#define	GVE_RXF_IPV4		GVE_RXFLG(4)	/* IPv4				*/
+#define	GVE_RXF_IPV6		GVE_RXFLG(5)	/* IPv6				*/
+#define	GVE_RXF_TCP		GVE_RXFLG(6)	/* TCP Packet			*/
+#define	GVE_RXF_UDP		GVE_RXFLG(7)	/* UDP Packet			*/
+#define	GVE_RXF_ERR		GVE_RXFLG(8)	/* Packet Error Detected	*/
+#define	GVE_RXF_PKT_CONT	GVE_RXFLG(10)	/* Multi Fragment RX packet	*/
+
+/* GVE IRQ */
+#define GVE_IRQ_ACK	BIT(31)
+#define GVE_IRQ_MASK	BIT(30)
+#define GVE_IRQ_EVENT	BIT(29)
+
+static inline bool gve_needs_rss(__be16 flag)
+{
+	if (flag & GVE_RXF_FRAG)
+		return false;
+	if (flag & (GVE_RXF_IPV4 | GVE_RXF_IPV6))
+		return true;
+	return false;
+}
+
+static inline u8 gve_next_seqno(u8 seq)
+{
+	return (seq + 1) == 8 ? 1 : seq + 1;
+}
+#endif /* _GVE_DESC_H_ */
diff --git a/drivers/net/gve/base/gve_desc_dqo.h b/drivers/net/gve/base/gve_desc_dqo.h
new file mode 100644
index 0000000000..9965f190d1
--- /dev/null
+++ b/drivers/net/gve/base/gve_desc_dqo.h
@@ -0,0 +1,253 @@
+/* SPDX-License-Identifier: MIT
+ * Google Virtual Ethernet (gve) driver
+ * Copyright (C) 2015-2022 Google, Inc.
+ */
+
+/* GVE DQO Descriptor formats */
+
+#ifndef _GVE_DESC_DQO_H_
+#define _GVE_DESC_DQO_H_
+
+#define GVE_TX_MAX_HDR_SIZE_DQO 255
+#define GVE_TX_MIN_TSO_MSS_DQO 88
+
+#ifndef __LITTLE_ENDIAN_BITFIELD
+#error "Only little endian supported"
+#endif
+
+/* Basic TX descriptor (DTYPE 0x0C) */
+struct gve_tx_pkt_desc_dqo {
+	__le64 buf_addr;
+
+	/* Must be GVE_TX_PKT_DESC_DTYPE_DQO (0xc) */
+	u8 dtype: 5;
+
+	/* Denotes the last descriptor of a packet. */
+	u8 end_of_packet: 1;
+	u8 checksum_offload_enable: 1;
+
+	/* If set, will generate a descriptor completion for this descriptor. */
+	u8 report_event: 1;
+	u8 reserved0;
+	__le16 reserved1;
+
+	/* The TX completion associated with this packet will contain this tag.
+	 */
+	__le16 compl_tag;
+	u16 buf_size: 14;
+	u16 reserved2: 2;
+} __packed;
+GVE_CHECK_STRUCT_LEN(16, gve_tx_pkt_desc_dqo);
+
+#define GVE_TX_PKT_DESC_DTYPE_DQO 0xc
+#define GVE_TX_MAX_BUF_SIZE_DQO ((16 * 1024) - 1)
+
+/* Maximum number of data descriptors allowed per packet, or per-TSO segment. */
+#define GVE_TX_MAX_DATA_DESCS 10
+
+/* Min gap between tail and head to avoid cacheline overlap */
+#define GVE_TX_MIN_DESC_PREVENT_CACHE_OVERLAP 4
+
+/* "report_event" may only be set on the last descriptor of a TX packet, and
+ * consecutive report events must be spaced at least this many descriptors
+ * apart.
+ */
+#define GVE_TX_MIN_RE_INTERVAL 32
+
+struct gve_tx_context_cmd_dtype {
+	u8 dtype: 5;
+	u8 tso: 1;
+	u8 reserved1: 2;
+
+	u8 reserved2;
+};
+
+GVE_CHECK_STRUCT_LEN(2, gve_tx_context_cmd_dtype);
+
+/* TX Native TSO Context DTYPE (0x05)
+ *
+ * "flex" fields allow the driver to send additional packet context to HW.
+ */
+struct gve_tx_tso_context_desc_dqo {
+	/* The L4 payload bytes that should be segmented. */
+	u32 tso_total_len: 24;
+	u32 flex10: 8;
+
+	/* Max segment size in TSO excluding headers. */
+	u16 mss: 14;
+	u16 reserved: 2;
+
+	u8 header_len; /* Header length to use for TSO offload */
+	u8 flex11;
+	struct gve_tx_context_cmd_dtype cmd_dtype;
+	u8 flex0;
+	u8 flex5;
+	u8 flex6;
+	u8 flex7;
+	u8 flex8;
+	u8 flex9;
+} __packed;
+GVE_CHECK_STRUCT_LEN(16, gve_tx_tso_context_desc_dqo);
+
+#define GVE_TX_TSO_CTX_DESC_DTYPE_DQO 0x5
+
+/* General context descriptor for sending metadata. */
+struct gve_tx_general_context_desc_dqo {
+	u8 flex4;
+	u8 flex5;
+	u8 flex6;
+	u8 flex7;
+	u8 flex8;
+	u8 flex9;
+	u8 flex10;
+	u8 flex11;
+	struct gve_tx_context_cmd_dtype cmd_dtype;
+	u16 reserved;
+	u8 flex0;
+	u8 flex1;
+	u8 flex2;
+	u8 flex3;
+} __packed;
+GVE_CHECK_STRUCT_LEN(16, gve_tx_general_context_desc_dqo);
+
+#define GVE_TX_GENERAL_CTX_DESC_DTYPE_DQO 0x4
+
+/* Logical structure of metadata which is packed into context descriptor flex
+ * fields.
+ */
+struct gve_tx_metadata_dqo {
+	union {
+		struct {
+			u8 version;
+
+			/* If `skb->l4_hash` is set, this value should be
+			 * derived from `skb->hash`.
+			 *
+			 * A zero value means no l4_hash was associated with the
+			 * skb.
+			 */
+			u16 path_hash: 15;
+
+			/* Should be set to 1 if the flow associated with the
+			 * skb had a rehash from the TCP stack.
+			 */
+			u16 rehash_event: 1;
+		}  __packed;
+		u8 bytes[12];
+	};
+}  __packed;
+GVE_CHECK_STRUCT_LEN(12, gve_tx_metadata_dqo);
+
+#define GVE_TX_METADATA_VERSION_DQO 0
+
+/* TX completion descriptor */
+struct gve_tx_compl_desc {
+	/* For types 0-4 this is the TX queue ID associated with this
+	 * completion.
+	 */
+	u16 id: 11;
+
+	/* See: GVE_COMPL_TYPE_DQO* */
+	u16 type: 3;
+	u16 reserved0: 1;
+
+	/* Flipped by HW to notify the descriptor is populated. */
+	u16 generation: 1;
+	union {
+		/* For descriptor completions, this is the last index fetched
+		 * by HW + 1.
+		 */
+		__le16 tx_head;
+
+		/* For packet completions, this is the completion tag set on the
+		 * TX packet descriptors.
+		 */
+		__le16 completion_tag;
+	};
+	__le32 reserved1;
+} __packed;
+GVE_CHECK_STRUCT_LEN(8, gve_tx_compl_desc);
+
+#define GVE_COMPL_TYPE_DQO_PKT 0x2 /* Packet completion */
+#define GVE_COMPL_TYPE_DQO_DESC 0x4 /* Descriptor completion */
+#define GVE_COMPL_TYPE_DQO_MISS 0x1 /* Miss path completion */
+#define GVE_COMPL_TYPE_DQO_REINJECTION 0x3 /* Re-injection completion */
+
+/* Descriptor to post buffers to HW on buffer queue. */
+struct gve_rx_desc_dqo {
+	__le16 buf_id; /* ID returned in Rx completion descriptor */
+	__le16 reserved0;
+	__le32 reserved1;
+	__le64 buf_addr; /* DMA address of the buffer */
+	__le64 header_buf_addr;
+	__le64 reserved2;
+} __packed;
+GVE_CHECK_STRUCT_LEN(32, gve_rx_desc_dqo);
+
+/* Descriptor for HW to notify SW of new packets received on RX queue. */
+struct gve_rx_compl_desc_dqo {
+	/* Must be 1 */
+	u8 rxdid: 4;
+	u8 reserved0: 4;
+
+	/* Packet originated from this system rather than the network. */
+	u8 loopback: 1;
+	/* Set when IPv6 packet contains a destination options header or routing
+	 * header.
+	 */
+	u8 ipv6_ex_add: 1;
+	/* Invalid packet was received. */
+	u8 rx_error: 1;
+	u8 reserved1: 5;
+
+	u16 packet_type: 10;
+	u16 ip_hdr_err: 1;
+	u16 udp_len_err: 1;
+	u16 raw_cs_invalid: 1;
+	u16 reserved2: 3;
+
+	u16 packet_len: 14;
+	/* Flipped by HW to notify the descriptor is populated. */
+	u16 generation: 1;
+	/* Should be zero. */
+	u16 buffer_queue_id: 1;
+
+	u16 header_len: 10;
+	u16 rsc: 1;
+	u16 split_header: 1;
+	u16 reserved3: 4;
+
+	u8 descriptor_done: 1;
+	u8 end_of_packet: 1;
+	u8 header_buffer_overflow: 1;
+	u8 l3_l4_processed: 1;
+	u8 csum_ip_err: 1;
+	u8 csum_l4_err: 1;
+	u8 csum_external_ip_err: 1;
+	u8 csum_external_udp_err: 1;
+
+	u8 status_error1;
+
+	__le16 reserved5;
+	__le16 buf_id; /* Buffer ID which was sent on the buffer queue. */
+
+	union {
+		/* Packet checksum. */
+		__le16 raw_cs;
+		/* Segment length for RSC packets. */
+		__le16 rsc_seg_len;
+	};
+	__le32 hash;
+	__le32 reserved6;
+	__le64 reserved7;
+} __packed;
+
+GVE_CHECK_STRUCT_LEN(32, gve_rx_compl_desc_dqo);
+
+/* Ringing the doorbell too often can hurt performance.
+ *
+ * HW requires this value to be at least 8.
+ */
+#define GVE_RX_BUF_THRESH_DQO 32
+
+#endif /* _GVE_DESC_DQO_H_ */
diff --git a/drivers/net/gve/base/gve_register.h b/drivers/net/gve/base/gve_register.h
new file mode 100644
index 0000000000..bf7f102cde
--- /dev/null
+++ b/drivers/net/gve/base/gve_register.h
@@ -0,0 +1,27 @@
+/* SPDX-License-Identifier: MIT
+ * Google Virtual Ethernet (gve) driver
+ * Copyright (C) 2015-2022 Google, Inc.
+ */
+
+#ifndef _GVE_REGISTER_H_
+#define _GVE_REGISTER_H_
+
+/* Fixed Configuration Registers */
+struct gve_registers {
+	__be32	device_status;
+	__be32	driver_status;
+	__be32	max_tx_queues;
+	__be32	max_rx_queues;
+	__be32	adminq_pfn;
+	__be32	adminq_doorbell;
+	__be32	adminq_event_counter;
+	u8	reserved[3];
+	u8	driver_version;
+};
+
+enum gve_device_status_flags {
+	GVE_DEVICE_STATUS_RESET_MASK		= BIT(1),
+	GVE_DEVICE_STATUS_LINK_STATUS_MASK	= BIT(2),
+	GVE_DEVICE_STATUS_REPORT_STATS_MASK	= BIT(3),
+};
+#endif /* _GVE_REGISTER_H_ */
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 192+ messages in thread

* [PATCH v7 2/8] net/gve/base: add OS specific implementation
  2022-10-21  9:19                     ` [PATCH v7 0/8] introduce GVE PMD Junfeng Guo
  2022-10-21  9:19                       ` [PATCH v7 1/8] net/gve/base: introduce base code Junfeng Guo
@ 2022-10-21  9:19                       ` Junfeng Guo
  2022-10-21  9:19                       ` [PATCH v7 3/8] net/gve: add support for device initialization Junfeng Guo
                                         ` (6 subsequent siblings)
  8 siblings, 0 replies; 192+ messages in thread
From: Junfeng Guo @ 2022-10-21  9:19 UTC (permalink / raw)
  To: qi.z.zhang, jingjing.wu, ferruh.yigit, beilei.xing
  Cc: dev, xiaoyun.li, awogbemila, bruce.richardson, hemant.agrawal,
	stephen, chenbo.xia, helin.zhang, Junfeng Guo, Haiyue Wang

Add some macro definitions and memory operations that are specific to
DPDK.

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com>
Signed-off-by: Junfeng Guo <junfeng.guo@intel.com>
---
 drivers/net/gve/base/gve_adminq.h   |   2 +
 drivers/net/gve/base/gve_desc.h     |   2 +
 drivers/net/gve/base/gve_desc_dqo.h |   2 +
 drivers/net/gve/base/gve_osdep.h    | 159 ++++++++++++++++++++++++++++
 drivers/net/gve/base/gve_register.h |   2 +
 5 files changed, 167 insertions(+)
 create mode 100644 drivers/net/gve/base/gve_osdep.h

diff --git a/drivers/net/gve/base/gve_adminq.h b/drivers/net/gve/base/gve_adminq.h
index b2422d7dc8..05550119de 100644
--- a/drivers/net/gve/base/gve_adminq.h
+++ b/drivers/net/gve/base/gve_adminq.h
@@ -6,6 +6,8 @@
 #ifndef _GVE_ADMINQ_H
 #define _GVE_ADMINQ_H
 
+#include "gve_osdep.h"
+
 /* Admin queue opcodes */
 enum gve_adminq_opcodes {
 	GVE_ADMINQ_DESCRIBE_DEVICE		= 0x1,
diff --git a/drivers/net/gve/base/gve_desc.h b/drivers/net/gve/base/gve_desc.h
index e0bbadcfd4..006b36442f 100644
--- a/drivers/net/gve/base/gve_desc.h
+++ b/drivers/net/gve/base/gve_desc.h
@@ -8,6 +8,8 @@
 #ifndef _GVE_DESC_H_
 #define _GVE_DESC_H_
 
+#include "gve_osdep.h"
+
 /* A note on seg_addrs
  *
  * Base addresses encoded in seg_addr are not assumed to be physical
diff --git a/drivers/net/gve/base/gve_desc_dqo.h b/drivers/net/gve/base/gve_desc_dqo.h
index 9965f190d1..ee1afdecb8 100644
--- a/drivers/net/gve/base/gve_desc_dqo.h
+++ b/drivers/net/gve/base/gve_desc_dqo.h
@@ -8,6 +8,8 @@
 #ifndef _GVE_DESC_DQO_H_
 #define _GVE_DESC_DQO_H_
 
+#include "gve_osdep.h"
+
 #define GVE_TX_MAX_HDR_SIZE_DQO 255
 #define GVE_TX_MIN_TSO_MSS_DQO 88
 
diff --git a/drivers/net/gve/base/gve_osdep.h b/drivers/net/gve/base/gve_osdep.h
new file mode 100644
index 0000000000..7cb73002f4
--- /dev/null
+++ b/drivers/net/gve/base/gve_osdep.h
@@ -0,0 +1,159 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2022 Intel Corporation
+ */
+
+#ifndef _GVE_OSDEP_H_
+#define _GVE_OSDEP_H_
+
+#include <string.h>
+#include <stdint.h>
+#include <stdio.h>
+#include <stdarg.h>
+#include <inttypes.h>
+#include <stdbool.h>
+
+#include <rte_bitops.h>
+#include <rte_byteorder.h>
+#include <rte_common.h>
+#include <rte_ether.h>
+#include <rte_io.h>
+#include <rte_log.h>
+#include <rte_malloc.h>
+#include <rte_memcpy.h>
+#include <rte_memzone.h>
+
+#include "../gve_logs.h"
+
+typedef uint8_t u8;
+typedef uint16_t u16;
+typedef uint32_t u32;
+typedef uint64_t u64;
+
+typedef rte_be16_t __sum16;
+
+typedef rte_be16_t __be16;
+typedef rte_be32_t __be32;
+typedef rte_be64_t __be64;
+
+typedef rte_iova_t dma_addr_t;
+
+#define ETH_MIN_MTU	RTE_ETHER_MIN_MTU
+#define ETH_ALEN	RTE_ETHER_ADDR_LEN
+
+#ifndef PAGE_SHIFT
+#define PAGE_SHIFT	12
+#endif
+#ifndef PAGE_SIZE
+#define PAGE_SIZE	(1UL << PAGE_SHIFT)
+#endif
+
+#define BIT(nr)		RTE_BIT32(nr)
+
+#define be16_to_cpu(x) rte_be_to_cpu_16(x)
+#define be32_to_cpu(x) rte_be_to_cpu_32(x)
+#define be64_to_cpu(x) rte_be_to_cpu_64(x)
+
+#define cpu_to_be16(x) rte_cpu_to_be_16(x)
+#define cpu_to_be32(x) rte_cpu_to_be_32(x)
+#define cpu_to_be64(x) rte_cpu_to_be_64(x)
+
+#define READ_ONCE32(x) rte_read32(&(x))
+
+#ifndef ____cacheline_aligned
+#define ____cacheline_aligned	__rte_cache_aligned
+#endif
+#ifndef __packed
+#define __packed		__rte_packed
+#endif
+#define __iomem
+
+#define msleep(ms)		rte_delay_ms(ms)
+
+/* These macros are used to generate compilation errors if a struct/union
+ * is not exactly the correct length. It gives a divide by zero error if
+ * the struct/union is not of the correct size, otherwise it creates an
+ * enum that is never used.
+ */
+#define GVE_CHECK_STRUCT_LEN(n, X) enum gve_static_assert_enum_##X \
+	{ gve_static_assert_##X = (n) / ((sizeof(struct X) == (n)) ? 1 : 0) }
+#define GVE_CHECK_UNION_LEN(n, X) enum gve_static_asset_enum_##X \
+	{ gve_static_assert_##X = (n) / ((sizeof(union X) == (n)) ? 1 : 0) }
+
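
A brief illustration of how these length checks behave (editorial sketch; demo_hdr is a hypothetical struct, not part of the driver): a matching size expands to a harmless unused enum, while a mismatch makes the divisor zero and fails the build with a division-by-zero error.

struct demo_hdr {			/* hypothetical 8-byte struct */
	u32 a;
	u32 b;
};
GVE_CHECK_STRUCT_LEN(8, demo_hdr);	/* size matches: expands to an unused enum */
/* GVE_CHECK_STRUCT_LEN(12, demo_hdr);    size mismatch: compile-time divide by zero */
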
+static __rte_always_inline u8
+readb(volatile void *addr)
+{
+	return rte_read8(addr);
+}
+
+static __rte_always_inline void
+writeb(u8 value, volatile void *addr)
+{
+	rte_write8(value, addr);
+}
+
+static __rte_always_inline void
+writel(u32 value, volatile void *addr)
+{
+	rte_write32(value, addr);
+}
+
+static __rte_always_inline u32
+ioread32be(const volatile void *addr)
+{
+	return rte_be_to_cpu_32(rte_read32(addr));
+}
+
+static __rte_always_inline void
+iowrite32be(u32 value, volatile void *addr)
+{
+	writel(rte_cpu_to_be_32(value), addr);
+}
+
+/* DMA memory allocation tracking */
+struct gve_dma_mem {
+	void *va;
+	rte_iova_t pa;
+	uint32_t size;
+	const void *zone;
+};
+
+static inline void *
+gve_alloc_dma_mem(struct gve_dma_mem *mem, u64 size)
+{
+	static uint16_t gve_dma_memzone_id;
+	const struct rte_memzone *mz = NULL;
+	char z_name[RTE_MEMZONE_NAMESIZE];
+
+	if (!mem)
+		return NULL;
+
+	snprintf(z_name, sizeof(z_name), "gve_dma_%u",
+		 __atomic_fetch_add(&gve_dma_memzone_id, 1, __ATOMIC_RELAXED));
+	mz = rte_memzone_reserve_aligned(z_name, size, SOCKET_ID_ANY,
+					 RTE_MEMZONE_IOVA_CONTIG,
+					 PAGE_SIZE);
+	if (!mz)
+		return NULL;
+
+	mem->size = size;
+	mem->va = mz->addr;
+	mem->pa = mz->iova;
+	mem->zone = mz;
+	PMD_DRV_LOG(DEBUG, "memzone %s is allocated", mz->name);
+
+	return mem->va;
+}
+
+static inline void
+gve_free_dma_mem(struct gve_dma_mem *mem)
+{
+	PMD_DRV_LOG(DEBUG, "memzone %s to be freed",
+		    ((const struct rte_memzone *)mem->zone)->name);
+
+	rte_memzone_free(mem->zone);
+	mem->zone = NULL;
+	mem->va = NULL;
+	mem->pa = 0;
+}
+
+#endif /* _GVE_OSDEP_H_ */
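
The helpers above are meant to be used in an allocate / hand-the-IOVA-to-the-device / free pattern, as the admin queue code earlier in this series does. A minimal sketch, assuming <errno.h> is available and with issue_cmd_using_iova() as a hypothetical placeholder for a real admin queue command:

/* Editorial sketch of the expected alloc/use/free pattern. */
static int
example_dma_round_trip(void)
{
	struct gve_dma_mem mem;
	void *va = gve_alloc_dma_mem(&mem, PAGE_SIZE);

	if (va == NULL)
		return -ENOMEM;		/* memzone reservation failed */

	/* mem.pa is the IOVA handed to the device; mem.va is the host mapping. */
	issue_cmd_using_iova(mem.pa);	/* hypothetical device command */

	gve_free_dma_mem(&mem);
	return 0;
}
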
diff --git a/drivers/net/gve/base/gve_register.h b/drivers/net/gve/base/gve_register.h
index bf7f102cde..c674167f31 100644
--- a/drivers/net/gve/base/gve_register.h
+++ b/drivers/net/gve/base/gve_register.h
@@ -6,6 +6,8 @@
 #ifndef _GVE_REGISTER_H_
 #define _GVE_REGISTER_H_
 
+#include "gve_osdep.h"
+
 /* Fixed Configuration Registers */
 struct gve_registers {
 	__be32	device_status;
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 192+ messages in thread

* [PATCH v7 3/8] net/gve: add support for device initialization
  2022-10-21  9:19                     ` [PATCH v7 0/8] introduce GVE PMD Junfeng Guo
  2022-10-21  9:19                       ` [PATCH v7 1/8] net/gve/base: introduce base code Junfeng Guo
  2022-10-21  9:19                       ` [PATCH v7 2/8] net/gve/base: add OS specific implementation Junfeng Guo
@ 2022-10-21  9:19                       ` Junfeng Guo
  2022-10-21  9:49                         ` Ferruh Yigit
  2022-10-21  9:19                       ` [PATCH v7 4/8] net/gve: add support for link update Junfeng Guo
                                         ` (5 subsequent siblings)
  8 siblings, 1 reply; 192+ messages in thread
From: Junfeng Guo @ 2022-10-21  9:19 UTC (permalink / raw)
  To: qi.z.zhang, jingjing.wu, ferruh.yigit, beilei.xing
  Cc: dev, xiaoyun.li, awogbemila, bruce.richardson, hemant.agrawal,
	stephen, chenbo.xia, helin.zhang, Junfeng Guo, Haiyue Wang

Support device init and add the following dev_ops skeleton:
 - dev_configure
 - dev_start
 - dev_stop
 - dev_close

Note that the build system (including docs) is also added in this patch.

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com>
Signed-off-by: Junfeng Guo <junfeng.guo@intel.com>
---
 MAINTAINERS                            |   6 +
 doc/guides/nics/features/gve.ini       |  10 +
 doc/guides/nics/gve.rst                |  68 +++++
 doc/guides/nics/index.rst              |   1 +
 doc/guides/rel_notes/release_22_11.rst |   5 +
 drivers/net/gve/base/gve_adminq.c      |   1 +
 drivers/net/gve/gve_ethdev.c           | 368 +++++++++++++++++++++++++
 drivers/net/gve/gve_ethdev.h           | 225 +++++++++++++++
 drivers/net/gve/gve_logs.h             |  14 +
 drivers/net/gve/meson.build            |  14 +
 drivers/net/gve/version.map            |   3 +
 drivers/net/meson.build                |   1 +
 12 files changed, 716 insertions(+)
 create mode 100644 doc/guides/nics/features/gve.ini
 create mode 100644 doc/guides/nics/gve.rst
 create mode 100644 drivers/net/gve/gve_ethdev.c
 create mode 100644 drivers/net/gve/gve_ethdev.h
 create mode 100644 drivers/net/gve/gve_logs.h
 create mode 100644 drivers/net/gve/meson.build
 create mode 100644 drivers/net/gve/version.map

diff --git a/MAINTAINERS b/MAINTAINERS
index 92b381bc30..2d06a76efe 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -697,6 +697,12 @@ F: drivers/net/enic/
 F: doc/guides/nics/enic.rst
 F: doc/guides/nics/features/enic.ini
 
+Google Virtual Ethernet
+M: Junfeng Guo <junfeng.guo@intel.com>
+F: drivers/net/gve/
+F: doc/guides/nics/gve.rst
+F: doc/guides/nics/features/gve.ini
+
 Hisilicon hns3
 M: Dongdong Liu <liudongdong3@huawei.com>
 M: Yisen Zhuang <yisen.zhuang@huawei.com>
diff --git a/doc/guides/nics/features/gve.ini b/doc/guides/nics/features/gve.ini
new file mode 100644
index 0000000000..44aec28009
--- /dev/null
+++ b/doc/guides/nics/features/gve.ini
@@ -0,0 +1,10 @@
+;
+; Supported features of the Google Virtual Ethernet 'gve' poll mode driver.
+;
+; Refer to default.ini for the full list of available PMD features.
+;
+[Features]
+Linux                = Y
+x86-32               = Y
+x86-64               = Y
+Usage doc            = Y
diff --git a/doc/guides/nics/gve.rst b/doc/guides/nics/gve.rst
new file mode 100644
index 0000000000..703fbcc5de
--- /dev/null
+++ b/doc/guides/nics/gve.rst
@@ -0,0 +1,68 @@
+..  SPDX-License-Identifier: BSD-3-Clause
+    Copyright(C) 2022 Intel Corporation.
+
+GVE poll mode driver
+=======================
+
+The GVE PMD (**librte_net_gve**) provides poll mode driver support for
+the Google Virtual Ethernet device (also known as gVNIC).
+
+gVNIC is the standard virtual Ethernet interface on Google Cloud Platform
+(GCP) and one of several virtual NIC types offered by the leading cloud
+service providers.
+
+Please refer to https://cloud.google.com/compute/docs/networking/using-gvnic
+for the device description.
+
+A well maintained and optimized gve PMD in the DPDK community gives cloud
+instance users who want to run their own VNFs on GCP better performance and
+easier maintenance.
+
+The base code is under the MIT license and is based on GVE kernel driver
+v1.3.0.
+GVE base code files are:
+
+- gve_adminq.h
+- gve_adminq.c
+- gve_desc.h
+- gve_desc_dqo.h
+- gve_register.h
+- gve.h
+
+Please refer to https://github.com/GoogleCloudPlatform/compute-virtual-ethernet-linux/tree/v1.3.0/google/gve
+to find the original base code.
+
+GVE has 3 queue formats:
+
+- GQI_QPL - GQI with queue page list
+- GQI_RDA - GQI with raw DMA addressing
+- DQO_RDA - DQO with raw DMA addressing
+
+The GQI_QPL queue format is the queue page list mode. The driver first
+allocates memory and registers it with the hardware (Google Hypervisor/GVE
+backend) as a Queue Page List (QPL); each queue has its own QPL.
+On Tx, the driver copies each packet into QPL memory and writes the packet's
+offset within the QPL into the hardware descriptor so that the hardware can
+fetch the data. On Rx, the driver reads the offset from the descriptor to
+locate the data in the QPL and copies the packet out of the QPL buffer
+(a minimal sketch of this copy scheme is shown after the queue format
+descriptions below).
+
+The GQI_RDA queue format works like a conventional NIC: the driver puts the
+packets' physical addresses directly into the hardware descriptors.
+
+The DQO_RDA queue format uses a submission/completion queue pair for each
+Tx/Rx queue. As with GQI_RDA, the driver puts the packets' physical addresses
+directly into the hardware descriptors.
+
+Please refer to https://www.kernel.org/doc/html/latest/networking/device_drivers/ethernet/google/gve.html
+to get more information about GVE queue formats.
+
+Features and Limitations
+------------------------
+
+In this release, the GVE PMD provides the basic functionality of packet
+reception and transmission.
+
+Currently, the PMD supports only the GQI_QPL and GQI_RDA queue formats.
+Jumbo frames are not supported for now; support will be added in a future
+DPDK release.
+Also, only the GQI_QPL queue format is in use on GCP, since GQI_RDA has not
+been released in production yet.
diff --git a/doc/guides/nics/index.rst b/doc/guides/nics/index.rst
index 32c7544968..4d40ea29a3 100644
--- a/doc/guides/nics/index.rst
+++ b/doc/guides/nics/index.rst
@@ -29,6 +29,7 @@ Network Interface Controller Drivers
     enetfec
     enic
     fm10k
+    gve
     hinic
     hns3
     i40e
diff --git a/doc/guides/rel_notes/release_22_11.rst b/doc/guides/rel_notes/release_22_11.rst
index 1c3daf141d..21c366b0e2 100644
--- a/doc/guides/rel_notes/release_22_11.rst
+++ b/doc/guides/rel_notes/release_22_11.rst
@@ -152,6 +152,11 @@ New Features
   * Added Q-in-CMB feature controlled by devarg ionic_cmb.
   * Added optimized handlers for non-scattered Rx and Tx.
 
+* **Added GVE net PMD**
+
+  * Added the new ``gve`` net driver for Google Virtual Ethernet devices.
+  * See the :doc:`../nics/gve` NIC guide for more details on this new driver.
+
 * **Updated Intel iavf driver.**
 
   * Added flow subscription support.
diff --git a/drivers/net/gve/base/gve_adminq.c b/drivers/net/gve/base/gve_adminq.c
index 2eb06a7b68..a550b27e61 100644
--- a/drivers/net/gve/base/gve_adminq.c
+++ b/drivers/net/gve/base/gve_adminq.c
@@ -3,6 +3,7 @@
  * Copyright (C) 2015-2022 Google, Inc.
  */
 
+#include "../gve_ethdev.h"
 #include "gve_adminq.h"
 #include "gve_register.h"
 
diff --git a/drivers/net/gve/gve_ethdev.c b/drivers/net/gve/gve_ethdev.c
new file mode 100644
index 0000000000..acbb412509
--- /dev/null
+++ b/drivers/net/gve/gve_ethdev.c
@@ -0,0 +1,368 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2022 Intel Corporation
+ */
+#include <linux/pci_regs.h>
+
+#include "gve_ethdev.h"
+#include "base/gve_adminq.h"
+#include "base/gve_register.h"
+
+const char gve_version_str[] = GVE_VERSION;
+static const char gve_version_prefix[] = GVE_VERSION_PREFIX;
+
+static void
+gve_write_version(uint8_t *driver_version_register)
+{
+	const char *c = gve_version_prefix;
+
+	while (*c) {
+		writeb(*c, driver_version_register);
+		c++;
+	}
+
+	c = gve_version_str;
+	while (*c) {
+		writeb(*c, driver_version_register);
+		c++;
+	}
+	writeb('\n', driver_version_register);
+}
+
+static int
+gve_dev_configure(__rte_unused struct rte_eth_dev *dev)
+{
+	return 0;
+}
+
+static int
+gve_dev_start(struct rte_eth_dev *dev)
+{
+	dev->data->dev_started = 1;
+
+	return 0;
+}
+
+static int
+gve_dev_stop(struct rte_eth_dev *dev)
+{
+	dev->data->dev_link.link_status = RTE_ETH_LINK_DOWN;
+	dev->data->dev_started = 0;
+
+	return 0;
+}
+
+static int
+gve_dev_close(struct rte_eth_dev *dev)
+{
+	int err = 0;
+
+	if (dev->data->dev_started) {
+		err = gve_dev_stop(dev);
+		if (err != 0)
+			PMD_DRV_LOG(ERR, "Failed to stop dev.");
+	}
+
+	dev->data->mac_addrs = NULL;
+
+	return err;
+}
+
+static const struct eth_dev_ops gve_eth_dev_ops = {
+	.dev_configure        = gve_dev_configure,
+	.dev_start            = gve_dev_start,
+	.dev_stop             = gve_dev_stop,
+	.dev_close            = gve_dev_close,
+};
+
+static void
+gve_free_counter_array(struct gve_priv *priv)
+{
+	rte_memzone_free(priv->cnt_array_mz);
+	priv->cnt_array = NULL;
+}
+
+static void
+gve_free_irq_db(struct gve_priv *priv)
+{
+	rte_memzone_free(priv->irq_dbs_mz);
+	priv->irq_dbs = NULL;
+}
+
+static void
+gve_teardown_device_resources(struct gve_priv *priv)
+{
+	int err;
+
+	/* Tell device its resources are being freed */
+	if (gve_get_device_resources_ok(priv)) {
+		err = gve_adminq_deconfigure_device_resources(priv);
+		if (err)
+			PMD_DRV_LOG(ERR, "Could not deconfigure device resources: err=%d", err);
+	}
+	gve_free_counter_array(priv);
+	gve_free_irq_db(priv);
+	gve_clear_device_resources_ok(priv);
+}
+
+static uint8_t
+pci_dev_find_capability(struct rte_pci_device *pdev, int cap)
+{
+	uint8_t pos, id;
+	uint16_t ent;
+	int loops;
+	int ret;
+
+	ret = rte_pci_read_config(pdev, &pos, sizeof(pos), PCI_CAPABILITY_LIST);
+	if (ret != sizeof(pos))
+		return 0;
+
+	loops = (PCI_CFG_SPACE_SIZE - PCI_STD_HEADER_SIZEOF) / PCI_CAP_SIZEOF;
+
+	while (pos && loops--) {
+		ret = rte_pci_read_config(pdev, &ent, sizeof(ent), pos);
+		if (ret != sizeof(ent))
+			return 0;
+
+		id = ent & 0xff;
+		if (id == 0xff)
+			break;
+
+		if (id == cap)
+			return pos;
+
+		pos = (ent >> 8);
+	}
+
+	return 0;
+}
+
+static int
+pci_dev_msix_vec_count(struct rte_pci_device *pdev)
+{
+	uint8_t msix_cap = pci_dev_find_capability(pdev, PCI_CAP_ID_MSIX);
+	uint16_t control;
+	int ret;
+
+	if (!msix_cap)
+		return 0;
+
+	ret = rte_pci_read_config(pdev, &control, sizeof(control), msix_cap + PCI_MSIX_FLAGS);
+	if (ret != sizeof(control))
+		return 0;
+
+	return (control & PCI_MSIX_FLAGS_QSIZE) + 1;
+}
+
+static int
+gve_setup_device_resources(struct gve_priv *priv)
+{
+	char z_name[RTE_MEMZONE_NAMESIZE];
+	const struct rte_memzone *mz;
+	int err = 0;
+
+	snprintf(z_name, sizeof(z_name), "gve_%s_cnt_arr", priv->pci_dev->device.name);
+	mz = rte_memzone_reserve_aligned(z_name,
+					 priv->num_event_counters * sizeof(*priv->cnt_array),
+					 rte_socket_id(), RTE_MEMZONE_IOVA_CONTIG,
+					 PAGE_SIZE);
+	if (mz == NULL) {
+		PMD_DRV_LOG(ERR, "Could not alloc memzone for count array");
+		return -ENOMEM;
+	}
+	priv->cnt_array = (rte_be32_t *)mz->addr;
+	priv->cnt_array_mz = mz;
+
+	snprintf(z_name, sizeof(z_name), "gve_%s_irqmz", priv->pci_dev->device.name);
+	mz = rte_memzone_reserve_aligned(z_name,
+					 sizeof(*priv->irq_dbs) * (priv->num_ntfy_blks),
+					 rte_socket_id(), RTE_MEMZONE_IOVA_CONTIG,
+					 PAGE_SIZE);
+	if (mz == NULL) {
+		PMD_DRV_LOG(ERR, "Could not alloc memzone for irq_dbs");
+		err = -ENOMEM;
+		goto free_cnt_array;
+	}
+	priv->irq_dbs = (struct gve_irq_db *)mz->addr;
+	priv->irq_dbs_mz = mz;
+
+	err = gve_adminq_configure_device_resources(priv,
+						    priv->cnt_array_mz->iova,
+						    priv->num_event_counters,
+						    priv->irq_dbs_mz->iova,
+						    priv->num_ntfy_blks);
+	if (unlikely(err)) {
+		PMD_DRV_LOG(ERR, "Could not config device resources: err=%d", err);
+		goto free_irq_dbs;
+	}
+	return 0;
+
+free_irq_dbs:
+	gve_free_irq_db(priv);
+free_cnt_array:
+	gve_free_counter_array(priv);
+
+	return err;
+}
+
+static int
+gve_init_priv(struct gve_priv *priv, bool skip_describe_device)
+{
+	int num_ntfy;
+	int err;
+
+	/* Set up the adminq */
+	err = gve_adminq_alloc(priv);
+	if (err) {
+		PMD_DRV_LOG(ERR, "Failed to alloc admin queue: err=%d", err);
+		return err;
+	}
+
+	if (skip_describe_device)
+		goto setup_device;
+
+	/* Get the initial information we need from the device */
+	err = gve_adminq_describe_device(priv);
+	if (err) {
+		PMD_DRV_LOG(ERR, "Could not get device information: err=%d", err);
+		goto free_adminq;
+	}
+
+	num_ntfy = pci_dev_msix_vec_count(priv->pci_dev);
+	if (num_ntfy <= 0) {
+		PMD_DRV_LOG(ERR, "Could not count MSI-x vectors");
+		err = -EIO;
+		goto free_adminq;
+	} else if (num_ntfy < GVE_MIN_MSIX) {
+		PMD_DRV_LOG(ERR, "GVE needs at least %d MSI-x vectors, but only has %d",
+			    GVE_MIN_MSIX, num_ntfy);
+		err = -EINVAL;
+		goto free_adminq;
+	}
+
+	priv->num_registered_pages = 0;
+
+	/* gvnic has one Notification Block per MSI-x vector, except for the
+	 * management vector
+	 */
+	priv->num_ntfy_blks = (num_ntfy - 1) & ~0x1;
+	priv->mgmt_msix_idx = priv->num_ntfy_blks;
+
+	priv->max_nb_txq = RTE_MIN(priv->max_nb_txq, priv->num_ntfy_blks / 2);
+	priv->max_nb_rxq = RTE_MIN(priv->max_nb_rxq, priv->num_ntfy_blks / 2);
+
+	if (priv->default_num_queues > 0) {
+		priv->max_nb_txq = RTE_MIN(priv->default_num_queues, priv->max_nb_txq);
+		priv->max_nb_rxq = RTE_MIN(priv->default_num_queues, priv->max_nb_rxq);
+	}
+
+	PMD_DRV_LOG(INFO, "Max TX queues %d, Max RX queues %d",
+		    priv->max_nb_txq, priv->max_nb_rxq);
+
+setup_device:
+	err = gve_setup_device_resources(priv);
+	if (!err)
+		return 0;
+free_adminq:
+	gve_adminq_free(priv);
+	return err;
+}
+
+static void
+gve_teardown_priv_resources(struct gve_priv *priv)
+{
+	gve_teardown_device_resources(priv);
+	gve_adminq_free(priv);
+}
+
+static int
+gve_dev_init(struct rte_eth_dev *eth_dev)
+{
+	struct gve_priv *priv = eth_dev->data->dev_private;
+	int max_tx_queues, max_rx_queues;
+	struct rte_pci_device *pci_dev;
+	struct gve_registers *reg_bar;
+	rte_be32_t *db_bar;
+	int err;
+
+	eth_dev->dev_ops = &gve_eth_dev_ops;
+
+	if (rte_eal_process_type() != RTE_PROC_PRIMARY)
+		return 0;
+
+	pci_dev = RTE_DEV_TO_PCI(eth_dev->device);
+
+	reg_bar = pci_dev->mem_resource[GVE_REG_BAR].addr;
+	if (!reg_bar) {
+		PMD_DRV_LOG(ERR, "Failed to map pci bar!");
+		return -ENOMEM;
+	}
+
+	db_bar = pci_dev->mem_resource[GVE_DB_BAR].addr;
+	if (!db_bar) {
+		PMD_DRV_LOG(ERR, "Failed to map doorbell bar!");
+		return -ENOMEM;
+	}
+
+	gve_write_version(&reg_bar->driver_version);
+	/* Get max queues to alloc etherdev */
+	max_tx_queues = ioread32be(&reg_bar->max_tx_queues);
+	max_rx_queues = ioread32be(&reg_bar->max_rx_queues);
+
+	priv->reg_bar0 = reg_bar;
+	priv->db_bar2 = db_bar;
+	priv->pci_dev = pci_dev;
+	priv->state_flags = 0x0;
+
+	priv->max_nb_txq = max_tx_queues;
+	priv->max_nb_rxq = max_rx_queues;
+
+	err = gve_init_priv(priv, false);
+	if (err)
+		return err;
+
+	eth_dev->data->mac_addrs = &priv->dev_addr;
+
+	return 0;
+}
+
+static int
+gve_dev_uninit(struct rte_eth_dev *eth_dev)
+{
+	struct gve_priv *priv = eth_dev->data->dev_private;
+
+	gve_teardown_priv_resources(priv);
+
+	eth_dev->data->mac_addrs = NULL;
+
+	return 0;
+}
+
+static int
+gve_pci_probe(__rte_unused struct rte_pci_driver *pci_drv,
+	      struct rte_pci_device *pci_dev)
+{
+	return rte_eth_dev_pci_generic_probe(pci_dev, sizeof(struct gve_priv), gve_dev_init);
+}
+
+static int
+gve_pci_remove(struct rte_pci_device *pci_dev)
+{
+	return rte_eth_dev_pci_generic_remove(pci_dev, gve_dev_uninit);
+}
+
+static const struct rte_pci_id pci_id_gve_map[] = {
+	{ RTE_PCI_DEVICE(GOOGLE_VENDOR_ID, GVE_DEV_ID) },
+	{ .device_id = 0 },
+};
+
+static struct rte_pci_driver rte_gve_pmd = {
+	.id_table = pci_id_gve_map,
+	.drv_flags = RTE_PCI_DRV_NEED_MAPPING,
+	.probe = gve_pci_probe,
+	.remove = gve_pci_remove,
+};
+
+RTE_PMD_REGISTER_PCI(net_gve, rte_gve_pmd);
+RTE_PMD_REGISTER_PCI_TABLE(net_gve, pci_id_gve_map);
+RTE_PMD_REGISTER_KMOD_DEP(net_gve, "* igb_uio | vfio-pci");
+RTE_LOG_REGISTER_SUFFIX(gve_logtype_driver, driver, NOTICE);
diff --git a/drivers/net/gve/gve_ethdev.h b/drivers/net/gve/gve_ethdev.h
new file mode 100644
index 0000000000..2ac2a46ac1
--- /dev/null
+++ b/drivers/net/gve/gve_ethdev.h
@@ -0,0 +1,225 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2022 Intel Corporation
+ */
+
+#ifndef _GVE_ETHDEV_H_
+#define _GVE_ETHDEV_H_
+
+#include <ethdev_driver.h>
+#include <ethdev_pci.h>
+#include <rte_ether.h>
+
+#include "base/gve.h"
+
+#define GVE_DEFAULT_RX_FREE_THRESH  512
+#define GVE_DEFAULT_TX_FREE_THRESH  256
+#define GVE_TX_MAX_FREE_SZ          512
+
+#define GVE_MIN_BUF_SIZE	    1024
+#define GVE_MAX_RX_PKTLEN	    65535
+
+/* A list of pages registered with the device during setup and used by a queue
+ * as buffers
+ */
+struct gve_queue_page_list {
+	uint32_t id; /* unique id */
+	uint32_t num_entries;
+	dma_addr_t *page_buses; /* the dma addrs of the pages */
+	const struct rte_memzone *mz;
+};
+
+/* A TX desc ring entry */
+union gve_tx_desc {
+	struct gve_tx_pkt_desc pkt; /* first desc for a packet */
+	struct gve_tx_seg_desc seg; /* subsequent descs for a packet */
+};
+
+struct gve_tx_queue {
+	volatile union gve_tx_desc *tx_desc_ring;
+	const struct rte_memzone *mz;
+	uint64_t tx_ring_phys_addr;
+
+	uint16_t nb_tx_desc;
+
+	/* Only valid for DQO_QPL queue format */
+	struct gve_queue_page_list *qpl;
+
+	uint16_t port_id;
+	uint16_t queue_id;
+
+	uint16_t ntfy_id;
+	volatile rte_be32_t *ntfy_addr;
+
+	struct gve_priv *hw;
+	const struct rte_memzone *qres_mz;
+	struct gve_queue_resources *qres;
+
+	/* Only valid for DQO_RDA queue format */
+	struct gve_tx_queue *complq;
+};
+
+struct gve_rx_queue {
+	volatile struct gve_rx_desc *rx_desc_ring;
+	volatile union gve_rx_data_slot *rx_data_ring;
+	const struct rte_memzone *mz;
+	const struct rte_memzone *data_mz;
+	uint64_t rx_ring_phys_addr;
+
+	uint16_t nb_rx_desc;
+
+	volatile rte_be32_t *ntfy_addr;
+
+	/* only valid for GQI_QPL queue format */
+	struct gve_queue_page_list *qpl;
+
+	struct gve_priv *hw;
+	const struct rte_memzone *qres_mz;
+	struct gve_queue_resources *qres;
+
+	uint16_t port_id;
+	uint16_t queue_id;
+	uint16_t ntfy_id;
+	uint16_t rx_buf_len;
+
+	/* Only valid for DQO_RDA queue format */
+	struct gve_rx_queue *bufq;
+};
+
+struct gve_priv {
+	struct gve_irq_db *irq_dbs; /* array of num_ntfy_blks */
+	const struct rte_memzone *irq_dbs_mz;
+	uint32_t mgmt_msix_idx;
+	rte_be32_t *cnt_array; /* array of num_event_counters */
+	const struct rte_memzone *cnt_array_mz;
+
+	uint16_t num_event_counters;
+	uint16_t tx_desc_cnt; /* txq size */
+	uint16_t rx_desc_cnt; /* rxq size */
+	uint16_t tx_pages_per_qpl; /* tx buffer length */
+	uint16_t rx_data_slot_cnt; /* rx buffer length */
+
+	/* Only valid for DQO_RDA queue format */
+	uint16_t tx_compq_size; /* tx completion queue size */
+	uint16_t rx_bufq_size; /* rx buff queue size */
+
+	uint64_t max_registered_pages;
+	uint64_t num_registered_pages; /* num pages registered with NIC */
+	uint16_t default_num_queues; /* default num queues to set up */
+	enum gve_queue_format queue_format; /* see enum gve_queue_format */
+	uint8_t enable_rsc;
+
+	uint16_t max_nb_txq;
+	uint16_t max_nb_rxq;
+	uint32_t num_ntfy_blks; /* split between TX and RX so must be even */
+
+	struct gve_registers __iomem *reg_bar0; /* see gve_register.h */
+	rte_be32_t __iomem *db_bar2; /* "array" of doorbells */
+	struct rte_pci_device *pci_dev;
+
+	/* Admin queue - see gve_adminq.h*/
+	union gve_adminq_command *adminq;
+	struct gve_dma_mem adminq_dma_mem;
+	uint32_t adminq_mask; /* masks prod_cnt to adminq size */
+	uint32_t adminq_prod_cnt; /* free-running count of AQ cmds executed */
+	uint32_t adminq_cmd_fail; /* free-running count of AQ cmds failed */
+	uint32_t adminq_timeouts; /* free-running count of AQ cmds timeouts */
+	/* free-running count of per AQ cmd executed */
+	uint32_t adminq_describe_device_cnt;
+	uint32_t adminq_cfg_device_resources_cnt;
+	uint32_t adminq_register_page_list_cnt;
+	uint32_t adminq_unregister_page_list_cnt;
+	uint32_t adminq_create_tx_queue_cnt;
+	uint32_t adminq_create_rx_queue_cnt;
+	uint32_t adminq_destroy_tx_queue_cnt;
+	uint32_t adminq_destroy_rx_queue_cnt;
+	uint32_t adminq_dcfg_device_resources_cnt;
+	uint32_t adminq_set_driver_parameter_cnt;
+	uint32_t adminq_report_stats_cnt;
+	uint32_t adminq_report_link_speed_cnt;
+	uint32_t adminq_get_ptype_map_cnt;
+
+	volatile uint32_t state_flags;
+
+	/* Gvnic device link speed from hypervisor. */
+	uint64_t link_speed;
+
+	uint16_t max_mtu;
+	struct rte_ether_addr dev_addr; /* mac address */
+
+	struct gve_queue_page_list *qpl;
+
+	struct gve_tx_queue **txqs;
+	struct gve_rx_queue **rxqs;
+};
+
+static inline bool
+gve_is_gqi(struct gve_priv *priv)
+{
+	return priv->queue_format == GVE_GQI_RDA_FORMAT ||
+		priv->queue_format == GVE_GQI_QPL_FORMAT;
+}
+
+static inline bool
+gve_get_admin_queue_ok(struct gve_priv *priv)
+{
+	return !!rte_bit_relaxed_get32(GVE_PRIV_FLAGS_ADMIN_QUEUE_OK,
+				       &priv->state_flags);
+}
+
+static inline void
+gve_set_admin_queue_ok(struct gve_priv *priv)
+{
+	rte_bit_relaxed_set32(GVE_PRIV_FLAGS_ADMIN_QUEUE_OK,
+			      &priv->state_flags);
+}
+
+static inline void
+gve_clear_admin_queue_ok(struct gve_priv *priv)
+{
+	rte_bit_relaxed_clear32(GVE_PRIV_FLAGS_ADMIN_QUEUE_OK,
+				&priv->state_flags);
+}
+
+static inline bool
+gve_get_device_resources_ok(struct gve_priv *priv)
+{
+	return !!rte_bit_relaxed_get32(GVE_PRIV_FLAGS_DEVICE_RESOURCES_OK,
+				       &priv->state_flags);
+}
+
+static inline void
+gve_set_device_resources_ok(struct gve_priv *priv)
+{
+	rte_bit_relaxed_set32(GVE_PRIV_FLAGS_DEVICE_RESOURCES_OK,
+			      &priv->state_flags);
+}
+
+static inline void
+gve_clear_device_resources_ok(struct gve_priv *priv)
+{
+	rte_bit_relaxed_clear32(GVE_PRIV_FLAGS_DEVICE_RESOURCES_OK,
+				&priv->state_flags);
+}
+
+static inline bool
+gve_get_device_rings_ok(struct gve_priv *priv)
+{
+	return !!rte_bit_relaxed_get32(GVE_PRIV_FLAGS_DEVICE_RINGS_OK,
+				       &priv->state_flags);
+}
+
+static inline void
+gve_set_device_rings_ok(struct gve_priv *priv)
+{
+	rte_bit_relaxed_set32(GVE_PRIV_FLAGS_DEVICE_RINGS_OK,
+			      &priv->state_flags);
+}
+
+static inline void
+gve_clear_device_rings_ok(struct gve_priv *priv)
+{
+	rte_bit_relaxed_clear32(GVE_PRIV_FLAGS_DEVICE_RINGS_OK,
+				&priv->state_flags);
+}
+
+#endif /* _GVE_ETHDEV_H_ */
diff --git a/drivers/net/gve/gve_logs.h b/drivers/net/gve/gve_logs.h
new file mode 100644
index 0000000000..0d02da46e1
--- /dev/null
+++ b/drivers/net/gve/gve_logs.h
@@ -0,0 +1,14 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2022 Intel Corporation
+ */
+
+#ifndef _GVE_LOGS_H_
+#define _GVE_LOGS_H_
+
+extern int gve_logtype_driver;
+
+#define PMD_DRV_LOG(level, fmt, args...) \
+	rte_log(RTE_LOG_ ## level, gve_logtype_driver, "%s(): " fmt "\n", \
+		__func__, ## args)
+
+#endif
diff --git a/drivers/net/gve/meson.build b/drivers/net/gve/meson.build
new file mode 100644
index 0000000000..d8ec64b3a3
--- /dev/null
+++ b/drivers/net/gve/meson.build
@@ -0,0 +1,14 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(C) 2022 Intel Corporation
+
+if is_windows
+    build = false
+    reason = 'not supported on Windows'
+    subdir_done()
+endif
+
+sources = files(
+        'base/gve_adminq.c',
+        'gve_ethdev.c',
+)
+includes += include_directories('base')
diff --git a/drivers/net/gve/version.map b/drivers/net/gve/version.map
new file mode 100644
index 0000000000..78c3585d7c
--- /dev/null
+++ b/drivers/net/gve/version.map
@@ -0,0 +1,3 @@
+DPDK_23 {
+	local: *;
+};
diff --git a/drivers/net/meson.build b/drivers/net/meson.build
index 35bfa78dee..355dbd07e9 100644
--- a/drivers/net/meson.build
+++ b/drivers/net/meson.build
@@ -23,6 +23,7 @@ drivers = [
         'enic',
         'failsafe',
         'fm10k',
+        'gve',
         'hinic',
         'hns3',
         'i40e',
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 192+ messages in thread
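
A note on the GQI_QPL vs. GQI_RDA descriptions in the gve.rst added by the
patch above: the difference comes down to what a Tx/Rx descriptor carries.
The standalone C sketch below is only an illustration under simplified
assumptions; struct fake_tx_desc, fill_desc_qpl() and fill_desc_rda() are
made-up names, not the PMD's real descriptor layout or helpers.

#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

struct fake_tx_desc {            /* simplified stand-in for a GVE Tx descriptor */
	uint64_t addr_or_offset; /* QPL mode: offset into the QPL; RDA mode: buffer IOVA */
	uint16_t len;
};

/* GQI_QPL: the packet is copied into pre-registered QPL memory and the
 * descriptor only carries the offset of that copy inside the QPL. */
static void
fill_desc_qpl(struct fake_tx_desc *d, uint8_t *qpl_mem, uint64_t fifo_head,
	      const void *pkt, uint16_t len)
{
	memcpy(qpl_mem + fifo_head, pkt, len);
	d->addr_or_offset = fifo_head;
	d->len = len;
}

/* GQI_RDA / DQO_RDA: no copy, the descriptor carries the buffer's DMA address. */
static void
fill_desc_rda(struct fake_tx_desc *d, uint64_t pkt_iova, uint16_t len)
{
	d->addr_or_offset = pkt_iova;
	d->len = len;
}

int
main(void)
{
	static uint8_t qpl_mem[4096];   /* pretend this page was registered as a QPL */
	uint8_t pkt[64] = { 0 };
	struct fake_tx_desc d_qpl, d_rda;

	fill_desc_qpl(&d_qpl, qpl_mem, 0, pkt, sizeof(pkt));
	fill_desc_rda(&d_rda, (uint64_t)(uintptr_t)pkt, sizeof(pkt));
	printf("QPL offset=%" PRIu64 ", RDA addr=0x%" PRIx64 "\n",
	       d_qpl.addr_or_offset, d_rda.addr_or_offset);
	return 0;
}

The extra copy into pre-registered memory is the cost of QPL mode, which is
why the Tx queue in this series manages a small FIFO over each queue's QPL.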

* [PATCH v7 4/8] net/gve: add support for link update
  2022-10-21  9:19                     ` [PATCH v7 0/8] introduce GVE PMD Junfeng Guo
                                         ` (2 preceding siblings ...)
  2022-10-21  9:19                       ` [PATCH v7 3/8] net/gve: add support for device initialization Junfeng Guo
@ 2022-10-21  9:19                       ` Junfeng Guo
  2022-10-21  9:19                       ` [PATCH v7 5/8] net/gve: add support for MTU setting Junfeng Guo
                                         ` (4 subsequent siblings)
  8 siblings, 0 replies; 192+ messages in thread
From: Junfeng Guo @ 2022-10-21  9:19 UTC (permalink / raw)
  To: qi.z.zhang, jingjing.wu, ferruh.yigit, beilei.xing
  Cc: dev, xiaoyun.li, awogbemila, bruce.richardson, hemant.agrawal,
	stephen, chenbo.xia, helin.zhang, Junfeng Guo

Support dev_ops link_update.

Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com>
Signed-off-by: Junfeng Guo <junfeng.guo@intel.com>
---
 doc/guides/nics/features/gve.ini |  1 +
 doc/guides/nics/gve.rst          |  3 +++
 drivers/net/gve/gve_ethdev.c     | 30 ++++++++++++++++++++++++++++++
 3 files changed, 34 insertions(+)

diff --git a/doc/guides/nics/features/gve.ini b/doc/guides/nics/features/gve.ini
index 44aec28009..ae466ad677 100644
--- a/doc/guides/nics/features/gve.ini
+++ b/doc/guides/nics/features/gve.ini
@@ -4,6 +4,7 @@
 ; Refer to default.ini for the full list of available PMD features.
 ;
 [Features]
+Link status          = Y
 Linux                = Y
 x86-32               = Y
 x86-64               = Y
diff --git a/doc/guides/nics/gve.rst b/doc/guides/nics/gve.rst
index 703fbcc5de..c42ff23841 100644
--- a/doc/guides/nics/gve.rst
+++ b/doc/guides/nics/gve.rst
@@ -60,6 +60,9 @@ Features and Limitations
 
 In this release, the GVE PMD provides the basic functionality of packet
 reception and transmission.
+Supported features of the GVE PMD are:
+
+- Link state information
 
 Currently, the PMD supports only the GQI_QPL and GQI_RDA queue formats.
 Jumbo frames are not supported for now; support will be added in a future
diff --git a/drivers/net/gve/gve_ethdev.c b/drivers/net/gve/gve_ethdev.c
index acbb412509..34243c1672 100644
--- a/drivers/net/gve/gve_ethdev.c
+++ b/drivers/net/gve/gve_ethdev.c
@@ -34,10 +34,39 @@ gve_dev_configure(__rte_unused struct rte_eth_dev *dev)
 	return 0;
 }
 
+static int
+gve_link_update(struct rte_eth_dev *dev, __rte_unused int wait_to_complete)
+{
+	struct gve_priv *priv = dev->data->dev_private;
+	struct rte_eth_link link;
+	int err;
+
+	memset(&link, 0, sizeof(link));
+	link.link_duplex = RTE_ETH_LINK_FULL_DUPLEX;
+	link.link_autoneg = RTE_ETH_LINK_AUTONEG;
+
+	if (!dev->data->dev_started) {
+		link.link_status = RTE_ETH_LINK_DOWN;
+		link.link_speed = RTE_ETH_SPEED_NUM_NONE;
+	} else {
+		link.link_status = RTE_ETH_LINK_UP;
+		PMD_DRV_LOG(DEBUG, "Get link status from hw");
+		err = gve_adminq_report_link_speed(priv);
+		if (err) {
+			PMD_DRV_LOG(ERR, "Failed to get link speed.");
+			priv->link_speed = RTE_ETH_SPEED_NUM_UNKNOWN;
+		}
+		link.link_speed = priv->link_speed;
+	}
+
+	return rte_eth_linkstatus_set(dev, &link);
+}
+
 static int
 gve_dev_start(struct rte_eth_dev *dev)
 {
 	dev->data->dev_started = 1;
+	gve_link_update(dev, 0);
 
 	return 0;
 }
@@ -72,6 +101,7 @@ static const struct eth_dev_ops gve_eth_dev_ops = {
 	.dev_start            = gve_dev_start,
 	.dev_stop             = gve_dev_stop,
 	.dev_close            = gve_dev_close,
+	.link_update          = gve_link_update,
 };
 
 static void
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 192+ messages in thread
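
For context, the link_update callback added above is what runs when an
application queries the port's link state through the standard ethdev API.
A minimal application-side sketch, assuming EAL is initialized and port_id
refers to a started gve port; print_gve_link is only an illustrative helper
name, not part of the patch:

#include <stdio.h>
#include <rte_ethdev.h>

static void
print_gve_link(uint16_t port_id)
{
	struct rte_eth_link link;

	/* Invokes the PMD's link_update callback (wait_to_complete = 1). */
	if (rte_eth_link_get(port_id, &link) != 0)
		return;

	printf("port %u link %s, speed %u Mbps, %s-duplex\n",
	       port_id,
	       link.link_status == RTE_ETH_LINK_UP ? "up" : "down",
	       link.link_speed,
	       link.link_duplex == RTE_ETH_LINK_FULL_DUPLEX ? "full" : "half");
}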

* [PATCH v7 5/8] net/gve: add support for MTU setting
  2022-10-21  9:19                     ` [PATCH v7 0/8] introduce GVE PMD Junfeng Guo
                                         ` (3 preceding siblings ...)
  2022-10-21  9:19                       ` [PATCH v7 4/8] net/gve: add support for link update Junfeng Guo
@ 2022-10-21  9:19                       ` Junfeng Guo
  2022-10-21  9:50                         ` Ferruh Yigit
  2022-10-21  9:19                       ` [PATCH v7 6/8] net/gve: add support for dev info get and dev configure Junfeng Guo
                                         ` (3 subsequent siblings)
  8 siblings, 1 reply; 192+ messages in thread
From: Junfeng Guo @ 2022-10-21  9:19 UTC (permalink / raw)
  To: qi.z.zhang, jingjing.wu, ferruh.yigit, beilei.xing
  Cc: dev, xiaoyun.li, awogbemila, bruce.richardson, hemant.agrawal,
	stephen, chenbo.xia, helin.zhang, Junfeng Guo

Support dev_ops mtu_set.

Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com>
Signed-off-by: Junfeng Guo <junfeng.guo@intel.com>
---
 doc/guides/nics/features/gve.ini |  1 +
 drivers/net/gve/gve_ethdev.c     | 28 ++++++++++++++++++++++++++++
 2 files changed, 29 insertions(+)

diff --git a/doc/guides/nics/features/gve.ini b/doc/guides/nics/features/gve.ini
index ae466ad677..d1703d8dab 100644
--- a/doc/guides/nics/features/gve.ini
+++ b/doc/guides/nics/features/gve.ini
@@ -5,6 +5,7 @@
 ;
 [Features]
 Link status          = Y
+MTU update           = Y
 Linux                = Y
 x86-32               = Y
 x86-64               = Y
diff --git a/drivers/net/gve/gve_ethdev.c b/drivers/net/gve/gve_ethdev.c
index 34243c1672..554f58640d 100644
--- a/drivers/net/gve/gve_ethdev.c
+++ b/drivers/net/gve/gve_ethdev.c
@@ -96,12 +96,40 @@ gve_dev_close(struct rte_eth_dev *dev)
 	return err;
 }
 
+static int
+gve_dev_mtu_set(struct rte_eth_dev *dev, uint16_t mtu)
+{
+	struct gve_priv *priv = dev->data->dev_private;
+	int err;
+
+	if (mtu < RTE_ETHER_MIN_MTU || mtu > priv->max_mtu) {
+		PMD_DRV_LOG(ERR, "MIN MTU is %u, MAX MTU is %u",
+			    RTE_ETHER_MIN_MTU, priv->max_mtu);
+		return -EINVAL;
+	}
+
+	/* MTU setting is forbidden if the port is started */
+	if (dev->data->dev_started) {
+		PMD_DRV_LOG(ERR, "Port must be stopped before configuration");
+		return -EBUSY;
+	}
+
+	err = gve_adminq_set_mtu(priv, mtu);
+	if (err) {
+		PMD_DRV_LOG(ERR, "Failed to set mtu as %u err = %d", mtu, err);
+		return err;
+	}
+
+	return 0;
+}
+
 static const struct eth_dev_ops gve_eth_dev_ops = {
 	.dev_configure        = gve_dev_configure,
 	.dev_start            = gve_dev_start,
 	.dev_stop             = gve_dev_stop,
 	.dev_close            = gve_dev_close,
 	.link_update          = gve_link_update,
+	.mtu_set              = gve_dev_mtu_set,
 };
 
 static void
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 192+ messages in thread
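
From the application side, the new mtu_set callback is reached through
rte_eth_dev_set_mtu(). A minimal sketch, assuming port_id refers to a
configured gve port; set_gve_mtu is only an illustrative helper, and the
stop/start around the call mirrors the -EBUSY check in gve_dev_mtu_set()
above:

#include <rte_ethdev.h>

static int
set_gve_mtu(uint16_t port_id, uint16_t mtu)
{
	int ret;

	ret = rte_eth_dev_stop(port_id);   /* mtu_set returns -EBUSY on a started port */
	if (ret != 0)
		return ret;

	ret = rte_eth_dev_set_mtu(port_id, mtu); /* ends up in gve_dev_mtu_set() */
	if (ret != 0)
		return ret;

	return rte_eth_dev_start(port_id);
}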

* [PATCH v7 6/8] net/gve: add support for dev info get and dev configure
  2022-10-21  9:19                     ` [PATCH v7 0/8] introduce GVE PMD Junfeng Guo
                                         ` (4 preceding siblings ...)
  2022-10-21  9:19                       ` [PATCH v7 5/8] net/gve: add support for MTU setting Junfeng Guo
@ 2022-10-21  9:19                       ` Junfeng Guo
  2022-10-21  9:51                         ` Ferruh Yigit
  2022-10-21  9:19                       ` [PATCH v7 7/8] net/gve: add support for queue operations Junfeng Guo
                                         ` (2 subsequent siblings)
  8 siblings, 1 reply; 192+ messages in thread
From: Junfeng Guo @ 2022-10-21  9:19 UTC (permalink / raw)
  To: qi.z.zhang, jingjing.wu, ferruh.yigit, beilei.xing
  Cc: dev, xiaoyun.li, awogbemila, bruce.richardson, hemant.agrawal,
	stephen, chenbo.xia, helin.zhang, Junfeng Guo

Add dev_ops dev_infos_get.
Complete dev_configure with forced enabling of RX offloads.

Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com>
Signed-off-by: Junfeng Guo <junfeng.guo@intel.com>
---
 doc/guides/nics/features/gve.ini |  2 ++
 doc/guides/nics/gve.rst          |  1 +
 drivers/net/gve/gve_ethdev.c     | 59 +++++++++++++++++++++++++++++++-
 drivers/net/gve/gve_ethdev.h     |  3 ++
 4 files changed, 64 insertions(+), 1 deletion(-)

diff --git a/doc/guides/nics/features/gve.ini b/doc/guides/nics/features/gve.ini
index d1703d8dab..986df7f94a 100644
--- a/doc/guides/nics/features/gve.ini
+++ b/doc/guides/nics/features/gve.ini
@@ -4,8 +4,10 @@
 ; Refer to default.ini for the full list of available PMD features.
 ;
 [Features]
+Speed capabilities   = Y
 Link status          = Y
 MTU update           = Y
+RSS hash             = Y
 Linux                = Y
 x86-32               = Y
 x86-64               = Y
diff --git a/doc/guides/nics/gve.rst b/doc/guides/nics/gve.rst
index c42ff23841..8c09a5a7fa 100644
--- a/doc/guides/nics/gve.rst
+++ b/doc/guides/nics/gve.rst
@@ -62,6 +62,7 @@ In this release, the GVE PMD provides the basic functionality of packet
 reception and transmission.
 Supported features of the GVE PMD are:
 
+- Receiver Side Scaling (RSS)
 - Link state information
 
 Currently, the PMD supports only the GQI_QPL and GQI_RDA queue formats.
diff --git a/drivers/net/gve/gve_ethdev.c b/drivers/net/gve/gve_ethdev.c
index 554f58640d..7fbe0c78c9 100644
--- a/drivers/net/gve/gve_ethdev.c
+++ b/drivers/net/gve/gve_ethdev.c
@@ -29,8 +29,16 @@ gve_write_version(uint8_t *driver_version_register)
 }
 
 static int
-gve_dev_configure(__rte_unused struct rte_eth_dev *dev)
+gve_dev_configure(struct rte_eth_dev *dev)
 {
+	struct gve_priv *priv = dev->data->dev_private;
+
+	if (dev->data->dev_conf.rxmode.mq_mode & RTE_ETH_MQ_RX_RSS_FLAG)
+		dev->data->dev_conf.rxmode.offloads |= RTE_ETH_RX_OFFLOAD_RSS_HASH;
+
+	if (dev->data->dev_conf.rxmode.offloads & RTE_ETH_RX_OFFLOAD_TCP_LRO)
+		priv->enable_rsc = 1;
+
 	return 0;
 }
 
@@ -96,6 +104,54 @@ gve_dev_close(struct rte_eth_dev *dev)
 	return err;
 }
 
+static int
+gve_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
+{
+	struct gve_priv *priv = dev->data->dev_private;
+
+	dev_info->device = dev->device;
+	dev_info->max_mac_addrs = 1;
+	dev_info->max_rx_queues = priv->max_nb_rxq;
+	dev_info->max_tx_queues = priv->max_nb_txq;
+	dev_info->min_rx_bufsize = GVE_MIN_BUF_SIZE;
+	dev_info->max_rx_pktlen = GVE_MAX_RX_PKTLEN;
+	dev_info->max_mtu = GVE_MAX_MTU;
+	dev_info->min_mtu = GVE_MIN_MTU;
+
+	dev_info->rx_offload_capa = 0;
+	dev_info->tx_offload_capa = 0;
+
+	if (priv->queue_format == GVE_DQO_RDA_FORMAT)
+		dev_info->rx_offload_capa |= RTE_ETH_RX_OFFLOAD_TCP_LRO;
+
+	dev_info->default_rxconf = (struct rte_eth_rxconf) {
+		.rx_free_thresh = GVE_DEFAULT_RX_FREE_THRESH,
+		.rx_drop_en = 0,
+		.offloads = 0,
+	};
+
+	dev_info->default_txconf = (struct rte_eth_txconf) {
+		.tx_free_thresh = GVE_DEFAULT_TX_FREE_THRESH,
+		.offloads = 0,
+	};
+
+	dev_info->default_rxportconf.ring_size = priv->rx_desc_cnt;
+	dev_info->rx_desc_lim = (struct rte_eth_desc_lim) {
+		.nb_max = priv->rx_desc_cnt,
+		.nb_min = priv->rx_desc_cnt,
+		.nb_align = 1,
+	};
+
+	dev_info->default_txportconf.ring_size = priv->tx_desc_cnt;
+	dev_info->tx_desc_lim = (struct rte_eth_desc_lim) {
+		.nb_max = priv->tx_desc_cnt,
+		.nb_min = priv->tx_desc_cnt,
+		.nb_align = 1,
+	};
+
+	return 0;
+}
+
 static int
 gve_dev_mtu_set(struct rte_eth_dev *dev, uint16_t mtu)
 {
@@ -128,6 +184,7 @@ static const struct eth_dev_ops gve_eth_dev_ops = {
 	.dev_start            = gve_dev_start,
 	.dev_stop             = gve_dev_stop,
 	.dev_close            = gve_dev_close,
+	.dev_infos_get        = gve_dev_info_get,
 	.link_update          = gve_link_update,
 	.mtu_set              = gve_dev_mtu_set,
 };
diff --git a/drivers/net/gve/gve_ethdev.h b/drivers/net/gve/gve_ethdev.h
index 2ac2a46ac1..57c29374b5 100644
--- a/drivers/net/gve/gve_ethdev.h
+++ b/drivers/net/gve/gve_ethdev.h
@@ -18,6 +18,9 @@
 #define GVE_MIN_BUF_SIZE	    1024
 #define GVE_MAX_RX_PKTLEN	    65535
 
+#define GVE_MAX_MTU	RTE_ETHER_MTU
+#define GVE_MIN_MTU	RTE_ETHER_MIN_MTU
+
 /* A list of pages registered with the device during setup and used by a queue
  * as buffers
  */
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 192+ messages in thread
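
A minimal application-side sketch of reading back what gve_dev_info_get()
now reports, assuming port_id refers to a probed gve port; show_gve_limits
is only an illustrative helper name:

#include <stdio.h>
#include <rte_ethdev.h>

static void
show_gve_limits(uint16_t port_id)
{
	struct rte_eth_dev_info info;

	if (rte_eth_dev_info_get(port_id, &info) != 0)
		return;

	printf("max rxq %u, max txq %u, mtu %u..%u, rx ring fixed at %u descs\n",
	       info.max_rx_queues, info.max_tx_queues,
	       info.min_mtu, info.max_mtu,
	       info.rx_desc_lim.nb_max); /* nb_min == nb_max for gve */
}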

* [PATCH v7 7/8] net/gve: add support for queue operations
  2022-10-21  9:19                     ` [PATCH v7 0/8] introduce GVE PMD Junfeng Guo
                                         ` (5 preceding siblings ...)
  2022-10-21  9:19                       ` [PATCH v7 6/8] net/gve: add support for dev info get and dev configure Junfeng Guo
@ 2022-10-21  9:19                       ` Junfeng Guo
  2022-10-21  9:19                       ` [PATCH v7 8/8] net/gve: add support for Rx/Tx Junfeng Guo
  2022-10-21 13:12                       ` [PATCH v7 0/8] introduce GVE PMD Ferruh Yigit
  8 siblings, 0 replies; 192+ messages in thread
From: Junfeng Guo @ 2022-10-21  9:19 UTC (permalink / raw)
  To: qi.z.zhang, jingjing.wu, ferruh.yigit, beilei.xing
  Cc: dev, xiaoyun.li, awogbemila, bruce.richardson, hemant.agrawal,
	stephen, chenbo.xia, helin.zhang, Junfeng Guo

Add support for queue operations:
- setup rx/tx queue
- release rx/tx queue
- start rx/tx queues
- stop rx/tx queues

Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com>
Signed-off-by: Junfeng Guo <junfeng.guo@intel.com>
---
 drivers/net/gve/gve_ethdev.c | 204 +++++++++++++++++++++++++++++++++
 drivers/net/gve/gve_ethdev.h |  52 +++++++++
 drivers/net/gve/gve_rx.c     | 212 ++++++++++++++++++++++++++++++++++
 drivers/net/gve/gve_tx.c     | 214 +++++++++++++++++++++++++++++++++++
 drivers/net/gve/meson.build  |   2 +
 5 files changed, 684 insertions(+)
 create mode 100644 drivers/net/gve/gve_rx.c
 create mode 100644 drivers/net/gve/gve_tx.c

diff --git a/drivers/net/gve/gve_ethdev.c b/drivers/net/gve/gve_ethdev.c
index 7fbe0c78c9..892e7e2e1c 100644
--- a/drivers/net/gve/gve_ethdev.c
+++ b/drivers/net/gve/gve_ethdev.c
@@ -28,6 +28,68 @@ gve_write_version(uint8_t *driver_version_register)
 	writeb('\n', driver_version_register);
 }
 
+static int
+gve_alloc_queue_page_list(struct gve_priv *priv, uint32_t id, uint32_t pages)
+{
+	char z_name[RTE_MEMZONE_NAMESIZE];
+	struct gve_queue_page_list *qpl;
+	const struct rte_memzone *mz;
+	dma_addr_t page_bus;
+	uint32_t i;
+
+	if (priv->num_registered_pages + pages >
+	    priv->max_registered_pages) {
+		PMD_DRV_LOG(ERR, "Pages %" PRIu64 " > max registered pages %" PRIu64,
+			    priv->num_registered_pages + pages,
+			    priv->max_registered_pages);
+		return -EINVAL;
+	}
+	qpl = &priv->qpl[id];
+	snprintf(z_name, sizeof(z_name), "gve_%s_qpl%d", priv->pci_dev->device.name, id);
+	mz = rte_memzone_reserve_aligned(z_name, pages * PAGE_SIZE,
+					 rte_socket_id(),
+					 RTE_MEMZONE_IOVA_CONTIG, PAGE_SIZE);
+	if (mz == NULL) {
+		PMD_DRV_LOG(ERR, "Failed to alloc %s.", z_name);
+		return -ENOMEM;
+	}
+	qpl->page_buses = rte_zmalloc("qpl page buses", pages * sizeof(dma_addr_t), 0);
+	if (qpl->page_buses == NULL) {
+		PMD_DRV_LOG(ERR, "Failed to alloc qpl %u page buses", id);
+		return -ENOMEM;
+	}
+	page_bus = mz->iova;
+	for (i = 0; i < pages; i++) {
+		qpl->page_buses[i] = page_bus;
+		page_bus += PAGE_SIZE;
+	}
+	qpl->id = id;
+	qpl->mz = mz;
+	qpl->num_entries = pages;
+
+	priv->num_registered_pages += pages;
+
+	return 0;
+}
+
+static void
+gve_free_qpls(struct gve_priv *priv)
+{
+	uint16_t nb_txqs = priv->max_nb_txq;
+	uint16_t nb_rxqs = priv->max_nb_rxq;
+	uint32_t i;
+
+	for (i = 0; i < nb_txqs + nb_rxqs; i++) {
+		if (priv->qpl[i].mz != NULL)
+			rte_memzone_free(priv->qpl[i].mz);
+		if (priv->qpl[i].page_buses != NULL)
+			rte_free(priv->qpl[i].page_buses);
+	}
+
+	if (priv->qpl != NULL)
+		rte_free(priv->qpl);
+}
+
 static int
 gve_dev_configure(struct rte_eth_dev *dev)
 {
@@ -42,6 +104,43 @@ gve_dev_configure(struct rte_eth_dev *dev)
 	return 0;
 }
 
+static int
+gve_refill_pages(struct gve_rx_queue *rxq)
+{
+	struct rte_mbuf *nmb;
+	uint16_t i;
+	int diag;
+
+	diag = rte_pktmbuf_alloc_bulk(rxq->mpool, &rxq->sw_ring[0], rxq->nb_rx_desc);
+	if (diag < 0) {
+		for (i = 0; i < rxq->nb_rx_desc - 1; i++) {
+			nmb = rte_pktmbuf_alloc(rxq->mpool);
+			if (!nmb)
+				break;
+			rxq->sw_ring[i] = nmb;
+		}
+		if (i < rxq->nb_rx_desc - 1)
+			return -ENOMEM;
+	}
+	rxq->nb_avail = 0;
+	rxq->next_avail = rxq->nb_rx_desc - 1;
+
+	for (i = 0; i < rxq->nb_rx_desc; i++) {
+		if (rxq->is_gqi_qpl) {
+			rxq->rx_data_ring[i].addr = rte_cpu_to_be_64(i * PAGE_SIZE);
+		} else {
+			if (i == rxq->nb_rx_desc - 1)
+				break;
+			nmb = rxq->sw_ring[i];
+			rxq->rx_data_ring[i].addr = rte_cpu_to_be_64(rte_mbuf_data_iova(nmb));
+		}
+	}
+
+	rte_write32(rte_cpu_to_be_32(rxq->next_avail), rxq->qrx_tail);
+
+	return 0;
+}
+
 static int
 gve_link_update(struct rte_eth_dev *dev, __rte_unused int wait_to_complete)
 {
@@ -73,16 +172,70 @@ gve_link_update(struct rte_eth_dev *dev, __rte_unused int wait_to_complete)
 static int
 gve_dev_start(struct rte_eth_dev *dev)
 {
+	uint16_t num_queues = dev->data->nb_tx_queues;
+	struct gve_priv *priv = dev->data->dev_private;
+	struct gve_tx_queue *txq;
+	struct gve_rx_queue *rxq;
+	uint16_t i;
+	int err;
+
+	priv->txqs = (struct gve_tx_queue **)dev->data->tx_queues;
+	err = gve_adminq_create_tx_queues(priv, num_queues);
+	if (err) {
+		PMD_DRV_LOG(ERR, "failed to create %u tx queues.", num_queues);
+		return err;
+	}
+	for (i = 0; i < num_queues; i++) {
+		txq = priv->txqs[i];
+		txq->qtx_tail =
+		&priv->db_bar2[rte_be_to_cpu_32(txq->qres->db_index)];
+		txq->qtx_head =
+		&priv->cnt_array[rte_be_to_cpu_32(txq->qres->counter_index)];
+
+		rte_write32(rte_cpu_to_be_32(GVE_IRQ_MASK), txq->ntfy_addr);
+	}
+
+	num_queues = dev->data->nb_rx_queues;
+	priv->rxqs = (struct gve_rx_queue **)dev->data->rx_queues;
+	err = gve_adminq_create_rx_queues(priv, num_queues);
+	if (err) {
+		PMD_DRV_LOG(ERR, "failed to create %u rx queues.", num_queues);
+		goto err_tx;
+	}
+	for (i = 0; i < num_queues; i++) {
+		rxq = priv->rxqs[i];
+		rxq->qrx_tail =
+		&priv->db_bar2[rte_be_to_cpu_32(rxq->qres->db_index)];
+
+		rte_write32(rte_cpu_to_be_32(GVE_IRQ_MASK), rxq->ntfy_addr);
+
+		err = gve_refill_pages(rxq);
+		if (err) {
+			PMD_DRV_LOG(ERR, "Failed to refill for RX");
+			goto err_rx;
+		}
+	}
+
 	dev->data->dev_started = 1;
 	gve_link_update(dev, 0);
 
 	return 0;
+
+err_rx:
+	gve_stop_rx_queues(dev);
+err_tx:
+	gve_stop_tx_queues(dev);
+	return err;
 }
 
 static int
 gve_dev_stop(struct rte_eth_dev *dev)
 {
 	dev->data->dev_link.link_status = RTE_ETH_LINK_DOWN;
+
+	gve_stop_tx_queues(dev);
+	gve_stop_rx_queues(dev);
+
 	dev->data->dev_started = 0;
 
 	return 0;
@@ -91,7 +244,11 @@ gve_dev_stop(struct rte_eth_dev *dev)
 static int
 gve_dev_close(struct rte_eth_dev *dev)
 {
+	struct gve_priv *priv = dev->data->dev_private;
+	struct gve_tx_queue *txq;
+	struct gve_rx_queue *rxq;
 	int err = 0;
+	uint16_t i;
 
 	if (dev->data->dev_started) {
 		err = gve_dev_stop(dev);
@@ -99,6 +256,19 @@ gve_dev_close(struct rte_eth_dev *dev)
 			PMD_DRV_LOG(ERR, "Failed to stop dev.");
 	}
 
+	for (i = 0; i < dev->data->nb_tx_queues; i++) {
+		txq = dev->data->tx_queues[i];
+		gve_tx_queue_release(txq);
+	}
+
+	for (i = 0; i < dev->data->nb_rx_queues; i++) {
+		rxq = dev->data->rx_queues[i];
+		gve_rx_queue_release(rxq);
+	}
+
+	gve_free_qpls(priv);
+	rte_free(priv->adminq);
+
 	dev->data->mac_addrs = NULL;
 
 	return err;
@@ -185,6 +355,8 @@ static const struct eth_dev_ops gve_eth_dev_ops = {
 	.dev_stop             = gve_dev_stop,
 	.dev_close            = gve_dev_close,
 	.dev_infos_get        = gve_dev_info_get,
+	.rx_queue_setup       = gve_rx_queue_setup,
+	.tx_queue_setup       = gve_tx_queue_setup,
 	.link_update          = gve_link_update,
 	.mtu_set              = gve_dev_mtu_set,
 };
@@ -322,7 +494,9 @@ gve_setup_device_resources(struct gve_priv *priv)
 static int
 gve_init_priv(struct gve_priv *priv, bool skip_describe_device)
 {
+	uint16_t pages;
 	int num_ntfy;
+	uint32_t i;
 	int err;
 
 	/* Set up the adminq */
@@ -373,10 +547,40 @@ gve_init_priv(struct gve_priv *priv, bool skip_describe_device)
 	PMD_DRV_LOG(INFO, "Max TX queues %d, Max RX queues %d",
 		    priv->max_nb_txq, priv->max_nb_rxq);
 
+	/* In GQI_QPL queue format:
+	 * Allocate queue page lists according to max queue number
+	 * tx qpl id should start from 0 while rx qpl id should start
+	 * from priv->max_nb_txq
+	 */
+	if (priv->queue_format == GVE_GQI_QPL_FORMAT) {
+		priv->qpl = rte_zmalloc("gve_qpl",
+					(priv->max_nb_txq + priv->max_nb_rxq) *
+					sizeof(struct gve_queue_page_list), 0);
+		if (priv->qpl == NULL) {
+			PMD_DRV_LOG(ERR, "Failed to alloc qpl.");
+			err = -ENOMEM;
+			goto free_adminq;
+		}
+
+		for (i = 0; i < priv->max_nb_txq + priv->max_nb_rxq; i++) {
+			if (i < priv->max_nb_txq)
+				pages = priv->tx_pages_per_qpl;
+			else
+				pages = priv->rx_data_slot_cnt;
+			err = gve_alloc_queue_page_list(priv, i, pages);
+			if (err != 0) {
+				PMD_DRV_LOG(ERR, "Failed to alloc qpl %u.", i);
+				goto err_qpl;
+			}
+		}
+	}
+
 setup_device:
 	err = gve_setup_device_resources(priv);
 	if (!err)
 		return 0;
+err_qpl:
+	gve_free_qpls(priv);
 free_adminq:
 	gve_adminq_free(priv);
 	return err;
diff --git a/drivers/net/gve/gve_ethdev.h b/drivers/net/gve/gve_ethdev.h
index 57c29374b5..00c69d1b88 100644
--- a/drivers/net/gve/gve_ethdev.h
+++ b/drivers/net/gve/gve_ethdev.h
@@ -37,15 +37,35 @@ union gve_tx_desc {
 	struct gve_tx_seg_desc seg; /* subsequent descs for a packet */
 };
 
+struct gve_tx_iovec {
+	uint32_t iov_base; /* offset in fifo */
+	uint32_t iov_len;
+};
+
 struct gve_tx_queue {
 	volatile union gve_tx_desc *tx_desc_ring;
 	const struct rte_memzone *mz;
 	uint64_t tx_ring_phys_addr;
+	struct rte_mbuf **sw_ring;
+	volatile rte_be32_t *qtx_tail;
+	volatile rte_be32_t *qtx_head;
 
+	uint32_t tx_tail;
 	uint16_t nb_tx_desc;
+	uint16_t nb_free;
+	uint32_t next_to_clean;
+	uint16_t free_thresh;
 
 	/* Only valid for DQO_QPL queue format */
+	uint16_t sw_tail;
+	uint16_t sw_ntc;
+	uint16_t sw_nb_free;
+	uint32_t fifo_size;
+	uint32_t fifo_head;
+	uint32_t fifo_avail;
+	uint64_t fifo_base;
 	struct gve_queue_page_list *qpl;
+	struct gve_tx_iovec *iov_ring;
 
 	uint16_t port_id;
 	uint16_t queue_id;
@@ -59,6 +79,8 @@ struct gve_tx_queue {
 
 	/* Only valid for DQO_RDA queue format */
 	struct gve_tx_queue *complq;
+
+	uint8_t is_gqi_qpl;
 };
 
 struct gve_rx_queue {
@@ -67,9 +89,17 @@ struct gve_rx_queue {
 	const struct rte_memzone *mz;
 	const struct rte_memzone *data_mz;
 	uint64_t rx_ring_phys_addr;
+	struct rte_mbuf **sw_ring;
+	struct rte_mempool *mpool;
 
+	uint16_t rx_tail;
 	uint16_t nb_rx_desc;
+	uint16_t expected_seqno; /* the next expected seqno */
+	uint16_t free_thresh;
+	uint32_t next_avail;
+	uint32_t nb_avail;
 
+	volatile rte_be32_t *qrx_tail;
 	volatile rte_be32_t *ntfy_addr;
 
 	/* only valid for GQI_QPL queue format */
@@ -86,6 +116,8 @@ struct gve_rx_queue {
 
 	/* Only valid for DQO_RDA queue format */
 	struct gve_rx_queue *bufq;
+
+	uint8_t is_gqi_qpl;
 };
 
 struct gve_priv {
@@ -225,4 +257,24 @@ gve_clear_device_rings_ok(struct gve_priv *priv)
 				&priv->state_flags);
 }
 
+int
+gve_rx_queue_setup(struct rte_eth_dev *dev, uint16_t queue_id, uint16_t nb_desc,
+		   unsigned int socket_id, const struct rte_eth_rxconf *conf,
+		   struct rte_mempool *pool);
+int
+gve_tx_queue_setup(struct rte_eth_dev *dev, uint16_t queue_id, uint16_t nb_desc,
+		   unsigned int socket_id, const struct rte_eth_txconf *conf);
+
+void
+gve_tx_queue_release(void *txq);
+
+void
+gve_rx_queue_release(void *rxq);
+
+void
+gve_stop_tx_queues(struct rte_eth_dev *dev);
+
+void
+gve_stop_rx_queues(struct rte_eth_dev *dev);
+
 #endif /* _GVE_ETHDEV_H_ */
diff --git a/drivers/net/gve/gve_rx.c b/drivers/net/gve/gve_rx.c
new file mode 100644
index 0000000000..e64a461253
--- /dev/null
+++ b/drivers/net/gve/gve_rx.c
@@ -0,0 +1,212 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2022 Intel Corporation
+ */
+
+#include "gve_ethdev.h"
+#include "base/gve_adminq.h"
+
+static inline void
+gve_reset_rxq(struct gve_rx_queue *rxq)
+{
+	struct rte_mbuf **sw_ring = rxq->sw_ring;
+	uint32_t size, i;
+
+	if (rxq == NULL) {
+		PMD_DRV_LOG(ERR, "pointer to rxq is NULL");
+		return;
+	}
+
+	size = rxq->nb_rx_desc * sizeof(struct gve_rx_desc);
+	for (i = 0; i < size; i++)
+		((volatile char *)rxq->rx_desc_ring)[i] = 0;
+
+	size = rxq->nb_rx_desc * sizeof(union gve_rx_data_slot);
+	for (i = 0; i < size; i++)
+		((volatile char *)rxq->rx_data_ring)[i] = 0;
+
+	for (i = 0; i < rxq->nb_rx_desc; i++)
+		sw_ring[i] = NULL;
+
+	rxq->rx_tail = 0;
+	rxq->next_avail = 0;
+	rxq->nb_avail = rxq->nb_rx_desc;
+	rxq->expected_seqno = 1;
+}
+
+static inline void
+gve_release_rxq_mbufs(struct gve_rx_queue *rxq)
+{
+	uint16_t i;
+
+	for (i = 0; i < rxq->nb_rx_desc; i++) {
+		if (rxq->sw_ring[i]) {
+			rte_pktmbuf_free_seg(rxq->sw_ring[i]);
+			rxq->sw_ring[i] = NULL;
+		}
+	}
+
+	rxq->nb_avail = rxq->nb_rx_desc;
+}
+
+void
+gve_rx_queue_release(void *rxq)
+{
+	struct gve_rx_queue *q = rxq;
+
+	if (!q)
+		return;
+
+	if (q->is_gqi_qpl) {
+		gve_adminq_unregister_page_list(q->hw, q->qpl->id);
+		q->qpl = NULL;
+	}
+
+	gve_release_rxq_mbufs(q);
+	rte_free(q->sw_ring);
+	rte_memzone_free(q->data_mz);
+	rte_memzone_free(q->mz);
+	rte_memzone_free(q->qres_mz);
+	q->qres = NULL;
+	rte_free(q);
+}
+
+int
+gve_rx_queue_setup(struct rte_eth_dev *dev, uint16_t queue_id,
+		uint16_t nb_desc, unsigned int socket_id,
+		const struct rte_eth_rxconf *conf, struct rte_mempool *pool)
+{
+	struct gve_priv *hw = dev->data->dev_private;
+	const struct rte_memzone *mz;
+	struct gve_rx_queue *rxq;
+	uint16_t free_thresh;
+	int err = 0;
+
+	if (nb_desc != hw->rx_desc_cnt) {
+		PMD_DRV_LOG(WARNING, "gve doesn't support nb_desc config, use hw nb_desc %u.",
+			    hw->rx_desc_cnt);
+	}
+	nb_desc = hw->rx_desc_cnt;
+
+	/* Free memory if needed. */
+	if (dev->data->rx_queues[queue_id]) {
+		gve_rx_queue_release(dev->data->rx_queues[queue_id]);
+		dev->data->rx_queues[queue_id] = NULL;
+	}
+
+	/* Allocate the RX queue data structure. */
+	rxq = rte_zmalloc_socket("gve rxq",
+				 sizeof(struct gve_rx_queue),
+				 RTE_CACHE_LINE_SIZE,
+				 socket_id);
+	if (!rxq) {
+		PMD_DRV_LOG(ERR, "Failed to allocate memory for rx queue structure");
+		err = -ENOMEM;
+		goto err_rxq;
+	}
+
+	free_thresh = conf->rx_free_thresh ? conf->rx_free_thresh : GVE_DEFAULT_RX_FREE_THRESH;
+	if (free_thresh >= nb_desc) {
+		PMD_DRV_LOG(ERR, "rx_free_thresh (%u) must be less than nb_desc (%u) minus 3.",
+			    free_thresh, rxq->nb_rx_desc);
+		err = -EINVAL;
+		goto err_rxq;
+	}
+
+	rxq->nb_rx_desc = nb_desc;
+	rxq->free_thresh = free_thresh;
+	rxq->queue_id = queue_id;
+	rxq->port_id = dev->data->port_id;
+	rxq->ntfy_id = hw->num_ntfy_blks / 2 + queue_id;
+	rxq->is_gqi_qpl = hw->queue_format == GVE_GQI_QPL_FORMAT;
+	rxq->mpool = pool;
+	rxq->hw = hw;
+	rxq->ntfy_addr = &hw->db_bar2[rte_be_to_cpu_32(hw->irq_dbs[rxq->ntfy_id].id)];
+
+	rxq->rx_buf_len = rte_pktmbuf_data_room_size(rxq->mpool) - RTE_PKTMBUF_HEADROOM;
+
+	/* Allocate software ring */
+	rxq->sw_ring = rte_zmalloc_socket("gve rx sw ring", sizeof(struct rte_mbuf *) * nb_desc,
+					  RTE_CACHE_LINE_SIZE, socket_id);
+	if (!rxq->sw_ring) {
+		PMD_DRV_LOG(ERR, "Failed to allocate memory for SW RX ring");
+		err = -ENOMEM;
+		goto err_rxq;
+	}
+
+	mz = rte_eth_dma_zone_reserve(dev, "rx_ring", queue_id,
+				      nb_desc * sizeof(struct gve_rx_desc),
+				      PAGE_SIZE, socket_id);
+	if (mz == NULL) {
+		PMD_DRV_LOG(ERR, "Failed to reserve DMA memory for RX");
+		err = -ENOMEM;
+		goto err_sw_ring;
+	}
+	rxq->rx_desc_ring = (struct gve_rx_desc *)mz->addr;
+	rxq->rx_ring_phys_addr = mz->iova;
+	rxq->mz = mz;
+
+	mz = rte_eth_dma_zone_reserve(dev, "gve rx data ring", queue_id,
+				      sizeof(union gve_rx_data_slot) * nb_desc,
+				      PAGE_SIZE, socket_id);
+	if (mz == NULL) {
+		PMD_DRV_LOG(ERR, "Failed to allocate memory for RX data ring");
+		err = -ENOMEM;
+		goto err_rx_ring;
+	}
+	rxq->rx_data_ring = (union gve_rx_data_slot *)mz->addr;
+	rxq->data_mz = mz;
+	if (rxq->is_gqi_qpl) {
+		rxq->qpl = &hw->qpl[rxq->ntfy_id];
+		err = gve_adminq_register_page_list(hw, rxq->qpl);
+		if (err != 0) {
+			PMD_DRV_LOG(ERR, "Failed to register qpl %u", queue_id);
+			goto err_data_ring;
+		}
+	}
+
+	mz = rte_eth_dma_zone_reserve(dev, "rxq_res", queue_id,
+				      sizeof(struct gve_queue_resources),
+				      PAGE_SIZE, socket_id);
+	if (mz == NULL) {
+		PMD_DRV_LOG(ERR, "Failed to reserve DMA memory for RX resource");
+		err = -ENOMEM;
+		goto err_data_ring;
+	}
+	rxq->qres = (struct gve_queue_resources *)mz->addr;
+	rxq->qres_mz = mz;
+
+	gve_reset_rxq(rxq);
+
+	dev->data->rx_queues[queue_id] = rxq;
+
+	return 0;
+
+err_data_ring:
+	rte_memzone_free(rxq->data_mz);
+err_rx_ring:
+	rte_memzone_free(rxq->mz);
+err_sw_ring:
+	rte_free(rxq->sw_ring);
+err_rxq:
+	rte_free(rxq);
+	return err;
+}
+
+void
+gve_stop_rx_queues(struct rte_eth_dev *dev)
+{
+	struct gve_priv *hw = dev->data->dev_private;
+	struct gve_rx_queue *rxq;
+	uint16_t i;
+	int err;
+
+	err = gve_adminq_destroy_rx_queues(hw, dev->data->nb_rx_queues);
+	if (err != 0)
+		PMD_DRV_LOG(WARNING, "failed to destroy rxqs");
+
+	for (i = 0; i < dev->data->nb_rx_queues; i++) {
+		rxq = dev->data->rx_queues[i];
+		gve_release_rxq_mbufs(rxq);
+		gve_reset_rxq(rxq);
+	}
+}
diff --git a/drivers/net/gve/gve_tx.c b/drivers/net/gve/gve_tx.c
new file mode 100644
index 0000000000..b706b62e71
--- /dev/null
+++ b/drivers/net/gve/gve_tx.c
@@ -0,0 +1,214 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2022 Intel Corporation
+ */
+
+#include "gve_ethdev.h"
+#include "base/gve_adminq.h"
+
+static inline void
+gve_reset_txq(struct gve_tx_queue *txq)
+{
+	struct rte_mbuf **sw_ring = txq->sw_ring;
+	uint32_t size, i;
+
+	if (txq == NULL) {
+		PMD_DRV_LOG(ERR, "Pointer to txq is NULL");
+		return;
+	}
+
+	size = txq->nb_tx_desc * sizeof(union gve_tx_desc);
+	for (i = 0; i < size; i++)
+		((volatile char *)txq->tx_desc_ring)[i] = 0;
+
+	for (i = 0; i < txq->nb_tx_desc; i++) {
+		sw_ring[i] = NULL;
+		if (txq->is_gqi_qpl) {
+			txq->iov_ring[i].iov_base = 0;
+			txq->iov_ring[i].iov_len = 0;
+		}
+	}
+
+	txq->tx_tail = 0;
+	txq->nb_free = txq->nb_tx_desc - 1;
+	txq->next_to_clean = 0;
+
+	if (txq->is_gqi_qpl) {
+		txq->fifo_size = PAGE_SIZE * txq->hw->tx_pages_per_qpl;
+		txq->fifo_avail = txq->fifo_size;
+		txq->fifo_head = 0;
+		txq->fifo_base = (uint64_t)(txq->qpl->mz->addr);
+
+		txq->sw_tail = 0;
+		txq->sw_nb_free = txq->nb_tx_desc - 1;
+		txq->sw_ntc = 0;
+	}
+}
+
+static inline void
+gve_release_txq_mbufs(struct gve_tx_queue *txq)
+{
+	uint16_t i;
+
+	for (i = 0; i < txq->nb_tx_desc; i++) {
+		if (txq->sw_ring[i]) {
+			rte_pktmbuf_free_seg(txq->sw_ring[i]);
+			txq->sw_ring[i] = NULL;
+		}
+	}
+}
+
+void
+gve_tx_queue_release(void *txq)
+{
+	struct gve_tx_queue *q = txq;
+
+	if (!q)
+		return;
+
+	if (q->is_gqi_qpl) {
+		gve_adminq_unregister_page_list(q->hw, q->qpl->id);
+		rte_free(q->iov_ring);
+		q->qpl = NULL;
+	}
+
+	gve_release_txq_mbufs(q);
+	rte_free(q->sw_ring);
+	rte_memzone_free(q->mz);
+	rte_memzone_free(q->qres_mz);
+	q->qres = NULL;
+	rte_free(q);
+}
+
+int
+gve_tx_queue_setup(struct rte_eth_dev *dev, uint16_t queue_id, uint16_t nb_desc,
+		   unsigned int socket_id, const struct rte_eth_txconf *conf)
+{
+	struct gve_priv *hw = dev->data->dev_private;
+	const struct rte_memzone *mz;
+	struct gve_tx_queue *txq;
+	uint16_t free_thresh;
+	int err = 0;
+
+	if (nb_desc != hw->tx_desc_cnt) {
+		PMD_DRV_LOG(WARNING, "gve doesn't support nb_desc config, use hw nb_desc %u.",
+			    hw->tx_desc_cnt);
+	}
+	nb_desc = hw->tx_desc_cnt;
+
+	/* Free memory if needed. */
+	if (dev->data->tx_queues[queue_id]) {
+		gve_tx_queue_release(dev->data->tx_queues[queue_id]);
+		dev->data->tx_queues[queue_id] = NULL;
+	}
+
+	/* Allocate the TX queue data structure. */
+	txq = rte_zmalloc_socket("gve txq", sizeof(struct gve_tx_queue),
+				 RTE_CACHE_LINE_SIZE, socket_id);
+	if (!txq) {
+		PMD_DRV_LOG(ERR, "Failed to allocate memory for tx queue structure");
+		err = -ENOMEM;
+		goto err_txq;
+	}
+
+	free_thresh = conf->tx_free_thresh ? conf->tx_free_thresh : GVE_DEFAULT_TX_FREE_THRESH;
+	if (free_thresh >= nb_desc - 3) {
+		PMD_DRV_LOG(ERR, "tx_free_thresh (%u) must be less than nb_desc (%u) minus 3.",
+			    free_thresh, nb_desc);
+		err = -EINVAL;
+		goto err_txq;
+	}
+
+	txq->nb_tx_desc = nb_desc;
+	txq->free_thresh = free_thresh;
+	txq->queue_id = queue_id;
+	txq->port_id = dev->data->port_id;
+	txq->ntfy_id = queue_id;
+	txq->is_gqi_qpl = hw->queue_format == GVE_GQI_QPL_FORMAT;
+	txq->hw = hw;
+	txq->ntfy_addr = &hw->db_bar2[rte_be_to_cpu_32(hw->irq_dbs[txq->ntfy_id].id)];
+
+	/* Allocate software ring */
+	txq->sw_ring = rte_zmalloc_socket("gve tx sw ring",
+					  sizeof(struct rte_mbuf *) * nb_desc,
+					  RTE_CACHE_LINE_SIZE, socket_id);
+	if (!txq->sw_ring) {
+		PMD_DRV_LOG(ERR, "Failed to allocate memory for SW TX ring");
+		err = -ENOMEM;
+		goto err_txq;
+	}
+
+	mz = rte_eth_dma_zone_reserve(dev, "tx_ring", queue_id,
+				      nb_desc * sizeof(union gve_tx_desc),
+				      PAGE_SIZE, socket_id);
+	if (mz == NULL) {
+		PMD_DRV_LOG(ERR, "Failed to reserve DMA memory for TX");
+		err = -ENOMEM;
+		goto err_sw_ring;
+	}
+	txq->tx_desc_ring = (union gve_tx_desc *)mz->addr;
+	txq->tx_ring_phys_addr = mz->iova;
+	txq->mz = mz;
+
+	if (txq->is_gqi_qpl) {
+		txq->iov_ring = rte_zmalloc_socket("gve tx iov ring",
+						   sizeof(struct gve_tx_iovec) * nb_desc,
+						   RTE_CACHE_LINE_SIZE, socket_id);
+		if (!txq->iov_ring) {
+			PMD_DRV_LOG(ERR, "Failed to allocate memory for SW TX ring");
+			err = -ENOMEM;
+			goto err_tx_ring;
+		}
+		txq->qpl = &hw->qpl[queue_id];
+		err = gve_adminq_register_page_list(hw, txq->qpl);
+		if (err != 0) {
+			PMD_DRV_LOG(ERR, "Failed to register qpl %u", queue_id);
+			goto err_iov_ring;
+		}
+	}
+
+	mz = rte_eth_dma_zone_reserve(dev, "txq_res", queue_id, sizeof(struct gve_queue_resources),
+				      PAGE_SIZE, socket_id);
+	if (mz == NULL) {
+		PMD_DRV_LOG(ERR, "Failed to reserve DMA memory for TX resource");
+		err = -ENOMEM;
+		goto err_iov_ring;
+	}
+	txq->qres = (struct gve_queue_resources *)mz->addr;
+	txq->qres_mz = mz;
+
+	gve_reset_txq(txq);
+
+	dev->data->tx_queues[queue_id] = txq;
+
+	return 0;
+
+err_iov_ring:
+	if (txq->is_gqi_qpl)
+		rte_free(txq->iov_ring);
+err_tx_ring:
+	rte_memzone_free(txq->mz);
+err_sw_ring:
+	rte_free(txq->sw_ring);
+err_txq:
+	rte_free(txq);
+	return err;
+}
+
+void
+gve_stop_tx_queues(struct rte_eth_dev *dev)
+{
+	struct gve_priv *hw = dev->data->dev_private;
+	struct gve_tx_queue *txq;
+	uint16_t i;
+	int err;
+
+	err = gve_adminq_destroy_tx_queues(hw, dev->data->nb_tx_queues);
+	if (err != 0)
+		PMD_DRV_LOG(WARNING, "failed to destroy txqs");
+
+	for (i = 0; i < dev->data->nb_tx_queues; i++) {
+		txq = dev->data->tx_queues[i];
+		gve_release_txq_mbufs(txq);
+		gve_reset_txq(txq);
+	}
+}
diff --git a/drivers/net/gve/meson.build b/drivers/net/gve/meson.build
index d8ec64b3a3..af0010c01c 100644
--- a/drivers/net/gve/meson.build
+++ b/drivers/net/gve/meson.build
@@ -9,6 +9,8 @@ endif
 
 sources = files(
         'base/gve_adminq.c',
+        'gve_rx.c',
+        'gve_tx.c',
         'gve_ethdev.c',
 )
 includes += include_directories('base')
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 192+ messages in thread
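
Putting the new queue operations together, a minimal application-side
bring-up sketch, assuming EAL is initialized, port_id refers to a gve port
and mbuf_pool was created beforehand with rte_pktmbuf_pool_create();
gve_port_setup is only an illustrative helper. The ring sizes are taken
from dev_info because the driver overrides any requested nb_desc with the
hardware value:

#include <rte_ethdev.h>
#include <rte_mbuf.h>

static int
gve_port_setup(uint16_t port_id, struct rte_mempool *mbuf_pool)
{
	struct rte_eth_conf conf = { 0 };
	struct rte_eth_dev_info info;
	int ret;

	ret = rte_eth_dev_info_get(port_id, &info);
	if (ret != 0)
		return ret;

	ret = rte_eth_dev_configure(port_id, 1, 1, &conf);
	if (ret != 0)
		return ret;

	ret = rte_eth_rx_queue_setup(port_id, 0, info.rx_desc_lim.nb_max,
				     rte_eth_dev_socket_id(port_id), NULL,
				     mbuf_pool);
	if (ret != 0)
		return ret;

	ret = rte_eth_tx_queue_setup(port_id, 0, info.tx_desc_lim.nb_max,
				     rte_eth_dev_socket_id(port_id), NULL);
	if (ret != 0)
		return ret;

	return rte_eth_dev_start(port_id); /* queues are created via the adminq here */
}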

* [PATCH v7 8/8] net/gve: add support for Rx/Tx
  2022-10-21  9:19                     ` [PATCH v7 0/8] introduce GVE PMD Junfeng Guo
                                         ` (6 preceding siblings ...)
  2022-10-21  9:19                       ` [PATCH v7 7/8] net/gve: add support for queue operations Junfeng Guo
@ 2022-10-21  9:19                       ` Junfeng Guo
  2022-10-21  9:52                         ` Ferruh Yigit
  2022-10-21 13:12                       ` [PATCH v7 0/8] introduce GVE PMD Ferruh Yigit
  8 siblings, 1 reply; 192+ messages in thread
From: Junfeng Guo @ 2022-10-21  9:19 UTC (permalink / raw)
  To: qi.z.zhang, jingjing.wu, ferruh.yigit, beilei.xing
  Cc: dev, xiaoyun.li, awogbemila, bruce.richardson, hemant.agrawal,
	stephen, chenbo.xia, helin.zhang, Junfeng Guo

Add Rx/Tx support for the GQI_QPL and GQI_RDA queue formats.

Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com>
Signed-off-by: Junfeng Guo <junfeng.guo@intel.com>
---
 doc/guides/nics/features/gve.ini |   2 +
 doc/guides/nics/gve.rst          |   4 +
 drivers/net/gve/gve_ethdev.c     |  15 +-
 drivers/net/gve/gve_ethdev.h     |  18 ++
 drivers/net/gve/gve_rx.c         | 142 ++++++++++
 drivers/net/gve/gve_tx.c         | 454 +++++++++++++++++++++++++++++++
 6 files changed, 634 insertions(+), 1 deletion(-)

diff --git a/doc/guides/nics/features/gve.ini b/doc/guides/nics/features/gve.ini
index 986df7f94a..cdc46b08a3 100644
--- a/doc/guides/nics/features/gve.ini
+++ b/doc/guides/nics/features/gve.ini
@@ -7,7 +7,9 @@
 Speed capabilities   = Y
 Link status          = Y
 MTU update           = Y
+TSO                  = Y
 RSS hash             = Y
+L4 checksum offload  = Y
 Linux                = Y
 x86-32               = Y
 x86-64               = Y
diff --git a/doc/guides/nics/gve.rst b/doc/guides/nics/gve.rst
index 8c09a5a7fa..1042852fd6 100644
--- a/doc/guides/nics/gve.rst
+++ b/doc/guides/nics/gve.rst
@@ -62,8 +62,12 @@ In this release, the GVE PMD provides the basic functionality of packet
 reception and transmission.
 Supported features of the GVE PMD are:
 
+- Multiple queues for TX and RX
 - Receiver Side Scaling (RSS)
+- TSO offload
 - Link state information
+- TX multi-segments (Scatter TX)
+- Tx UDP/TCP/SCTP Checksum
 
 Currently, the PMD supports only the GQI_QPL and GQI_RDA queue formats.
 Jumbo frames are not supported for now; support will be added in a future
diff --git a/drivers/net/gve/gve_ethdev.c b/drivers/net/gve/gve_ethdev.c
index 892e7e2e1c..5c0cd2f2c4 100644
--- a/drivers/net/gve/gve_ethdev.c
+++ b/drivers/net/gve/gve_ethdev.c
@@ -289,7 +289,13 @@ gve_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
 	dev_info->min_mtu = GVE_MIN_MTU;
 
 	dev_info->rx_offload_capa = 0;
-	dev_info->tx_offload_capa = 0;
+	dev_info->tx_offload_capa =
+		RTE_ETH_TX_OFFLOAD_MULTI_SEGS	|
+		RTE_ETH_TX_OFFLOAD_IPV4_CKSUM	|
+		RTE_ETH_TX_OFFLOAD_UDP_CKSUM	|
+		RTE_ETH_TX_OFFLOAD_TCP_CKSUM	|
+		RTE_ETH_TX_OFFLOAD_SCTP_CKSUM	|
+		RTE_ETH_TX_OFFLOAD_TCP_TSO;
 
 	if (priv->queue_format == GVE_DQO_RDA_FORMAT)
 		dev_info->rx_offload_capa |= RTE_ETH_RX_OFFLOAD_TCP_LRO;
@@ -639,6 +645,13 @@ gve_dev_init(struct rte_eth_dev *eth_dev)
 	if (err)
 		return err;
 
+	if (gve_is_gqi(priv)) {
+		eth_dev->rx_pkt_burst = gve_rx_burst;
+		eth_dev->tx_pkt_burst = gve_tx_burst;
+	} else {
+		PMD_DRV_LOG(ERR, "DQO_RDA is not implemented and will be added in the future");
+	}
+
 	eth_dev->data->mac_addrs = &priv->dev_addr;
 
 	return 0;
diff --git a/drivers/net/gve/gve_ethdev.h b/drivers/net/gve/gve_ethdev.h
index 00c69d1b88..36b334c36b 100644
--- a/drivers/net/gve/gve_ethdev.h
+++ b/drivers/net/gve/gve_ethdev.h
@@ -37,6 +37,18 @@ union gve_tx_desc {
 	struct gve_tx_seg_desc seg; /* subsequent descs for a packet */
 };
 
+/* Offload features */
+union gve_tx_offload {
+	uint64_t data;
+	struct {
+		uint64_t l2_len:7; /* L2 (MAC) Header Length. */
+		uint64_t l3_len:9; /* L3 (IP) Header Length. */
+		uint64_t l4_len:8; /* L4 Header Length. */
+		uint64_t tso_segsz:16; /* TCP TSO segment size */
+		/* uint64_t unused : 24; */
+	};
+};
+
 struct gve_tx_iovec {
 	uint32_t iov_base; /* offset in fifo */
 	uint32_t iov_len;
@@ -277,4 +289,10 @@ gve_stop_tx_queues(struct rte_eth_dev *dev);
 void
 gve_stop_rx_queues(struct rte_eth_dev *dev);
 
+uint16_t
+gve_rx_burst(void *rxq, struct rte_mbuf **rx_pkts, uint16_t nb_pkts);
+
+uint16_t
+gve_tx_burst(void *txq, struct rte_mbuf **tx_pkts, uint16_t nb_pkts);
+
 #endif /* _GVE_ETHDEV_H_ */
diff --git a/drivers/net/gve/gve_rx.c b/drivers/net/gve/gve_rx.c
index e64a461253..ea397d68fa 100644
--- a/drivers/net/gve/gve_rx.c
+++ b/drivers/net/gve/gve_rx.c
@@ -5,6 +5,148 @@
 #include "gve_ethdev.h"
 #include "base/gve_adminq.h"
 
+static inline void
+gve_rx_refill(struct gve_rx_queue *rxq)
+{
+	uint16_t mask = rxq->nb_rx_desc - 1;
+	uint16_t idx = rxq->next_avail & mask;
+	uint32_t next_avail = rxq->next_avail;
+	uint16_t nb_alloc, i;
+	struct rte_mbuf *nmb;
+	int diag;
+
+	/* wrap around */
+	nb_alloc = rxq->nb_rx_desc - idx;
+	if (nb_alloc <= rxq->nb_avail) {
+		diag = rte_pktmbuf_alloc_bulk(rxq->mpool, &rxq->sw_ring[idx], nb_alloc);
+		if (diag < 0) {
+			for (i = 0; i < nb_alloc; i++) {
+				nmb = rte_pktmbuf_alloc(rxq->mpool);
+				if (!nmb)
+					break;
+				rxq->sw_ring[idx + i] = nmb;
+			}
+			if (i != nb_alloc)
+				nb_alloc = i;
+		}
+		rxq->nb_avail -= nb_alloc;
+		next_avail += nb_alloc;
+
+		/* queue page list mode doesn't need real refill. */
+		if (rxq->is_gqi_qpl) {
+			idx += nb_alloc;
+		} else {
+			for (i = 0; i < nb_alloc; i++) {
+				nmb = rxq->sw_ring[idx];
+				rxq->rx_data_ring[idx].addr =
+					rte_cpu_to_be_64(rte_mbuf_data_iova(nmb));
+				idx++;
+			}
+		}
+		if (idx == rxq->nb_rx_desc)
+			idx = 0;
+	}
+
+	if (rxq->nb_avail > 0) {
+		nb_alloc = rxq->nb_avail;
+		if (rxq->nb_rx_desc < idx + rxq->nb_avail)
+			nb_alloc = rxq->nb_rx_desc - idx;
+		diag = rte_pktmbuf_alloc_bulk(rxq->mpool, &rxq->sw_ring[idx], nb_alloc);
+		if (diag < 0) {
+			for (i = 0; i < nb_alloc; i++) {
+				nmb = rte_pktmbuf_alloc(rxq->mpool);
+				if (!nmb)
+					break;
+				rxq->sw_ring[idx + i] = nmb;
+			}
+			nb_alloc = i;
+		}
+		rxq->nb_avail -= nb_alloc;
+		next_avail += nb_alloc;
+
+		if (!rxq->is_gqi_qpl) {
+			for (i = 0; i < nb_alloc; i++) {
+				nmb = rxq->sw_ring[idx];
+				rxq->rx_data_ring[idx].addr =
+					rte_cpu_to_be_64(rte_mbuf_data_iova(nmb));
+				idx++;
+			}
+		}
+	}
+
+	if (next_avail != rxq->next_avail) {
+		rte_write32(rte_cpu_to_be_32(next_avail), rxq->qrx_tail);
+		rxq->next_avail = next_avail;
+	}
+}
+
+uint16_t
+gve_rx_burst(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t nb_pkts)
+{
+	volatile struct gve_rx_desc *rxr, *rxd;
+	struct gve_rx_queue *rxq = rx_queue;
+	uint16_t rx_id = rxq->rx_tail;
+	struct rte_mbuf *rxe;
+	uint16_t nb_rx, len;
+	uint64_t addr;
+	uint16_t i;
+
+	rxr = rxq->rx_desc_ring;
+	nb_rx = 0;
+
+	for (i = 0; i < nb_pkts; i++) {
+		rxd = &rxr[rx_id];
+		if (GVE_SEQNO(rxd->flags_seq) != rxq->expected_seqno)
+			break;
+
+		if (rxd->flags_seq & GVE_RXF_ERR)
+			continue;
+
+		len = rte_be_to_cpu_16(rxd->len) - GVE_RX_PAD;
+		rxe = rxq->sw_ring[rx_id];
+		if (rxq->is_gqi_qpl) {
+			addr = (uint64_t)(rxq->qpl->mz->addr) + rx_id * PAGE_SIZE + GVE_RX_PAD;
+			rte_memcpy((void *)((size_t)rxe->buf_addr + rxe->data_off),
+				   (void *)(size_t)addr, len);
+		}
+		rxe->pkt_len = len;
+		rxe->data_len = len;
+		rxe->port = rxq->port_id;
+		rxe->ol_flags = 0;
+
+		if (rxd->flags_seq & GVE_RXF_TCP)
+			rxe->packet_type |= RTE_PTYPE_L4_TCP;
+		if (rxd->flags_seq & GVE_RXF_UDP)
+			rxe->packet_type |= RTE_PTYPE_L4_UDP;
+		if (rxd->flags_seq & GVE_RXF_IPV4)
+			rxe->packet_type |= RTE_PTYPE_L3_IPV4;
+		if (rxd->flags_seq & GVE_RXF_IPV6)
+			rxe->packet_type |= RTE_PTYPE_L3_IPV6;
+
+		if (gve_needs_rss(rxd->flags_seq)) {
+			rxe->ol_flags |= RTE_MBUF_F_RX_RSS_HASH;
+			rxe->hash.rss = rte_be_to_cpu_32(rxd->rss_hash);
+		}
+
+		rxq->expected_seqno = gve_next_seqno(rxq->expected_seqno);
+
+		rx_id++;
+		if (rx_id == rxq->nb_rx_desc)
+			rx_id = 0;
+
+		rx_pkts[nb_rx] = rxe;
+		nb_rx++;
+	}
+
+	rxq->nb_avail += nb_rx;
+	rxq->rx_tail = rx_id;
+
+	if (rxq->nb_avail > rxq->free_thresh)
+		gve_rx_refill(rxq);
+
+	return nb_rx;
+}
+
 static inline void
 gve_reset_rxq(struct gve_rx_queue *rxq)
 {
diff --git a/drivers/net/gve/gve_tx.c b/drivers/net/gve/gve_tx.c
index b706b62e71..9678bb4dfa 100644
--- a/drivers/net/gve/gve_tx.c
+++ b/drivers/net/gve/gve_tx.c
@@ -5,6 +5,460 @@
 #include "gve_ethdev.h"
 #include "base/gve_adminq.h"
 
+static inline void
+gve_free_bulk_mbuf(struct rte_mbuf **txep, int num)
+{
+	struct rte_mbuf *m, *free[GVE_TX_MAX_FREE_SZ];
+	int nb_free = 0;
+	int i, s;
+
+	if (unlikely(num == 0))
+		return;
+
+	/* Find the 1st mbuf which needs to be free */
+	for (s = 0; s < num; s++) {
+		if (txep[s] != NULL) {
+			m = rte_pktmbuf_prefree_seg(txep[s]);
+			if (m != NULL)
+				break;
+		}
+	}
+
+	if (s == num)
+		return;
+
+	free[0] = m;
+	nb_free = 1;
+	for (i = s + 1; i < num; i++) {
+		if (likely(txep[i] != NULL)) {
+			m = rte_pktmbuf_prefree_seg(txep[i]);
+			if (likely(m != NULL)) {
+				if (likely(m->pool == free[0]->pool)) {
+					free[nb_free++] = m;
+				} else {
+					rte_mempool_put_bulk(free[0]->pool, (void *)free, nb_free);
+					free[0] = m;
+					nb_free = 1;
+				}
+			}
+			txep[i] = NULL;
+		}
+	}
+	rte_mempool_put_bulk(free[0]->pool, (void **)free, nb_free);
+}
+
+static inline void
+gve_tx_clean(struct gve_tx_queue *txq)
+{
+	uint16_t mask = txq->nb_tx_desc - 1;
+	uint32_t start = txq->next_to_clean & mask;
+	uint32_t ntc, nb_clean, i;
+	struct gve_tx_iovec *iov;
+
+	ntc = rte_be_to_cpu_32(rte_read32(txq->qtx_head));
+	ntc = ntc & mask;
+
+	if (ntc == start)
+		return;
+
+	/* if wrap around, free twice. */
+	if (ntc < start) {
+		nb_clean = txq->nb_tx_desc - start;
+		if (nb_clean > GVE_TX_MAX_FREE_SZ)
+			nb_clean = GVE_TX_MAX_FREE_SZ;
+		if (txq->is_gqi_qpl) {
+			for (i = start; i < start + nb_clean; i++) {
+				iov = &txq->iov_ring[i];
+				txq->fifo_avail += iov->iov_len;
+				iov->iov_base = 0;
+				iov->iov_len = 0;
+			}
+		} else {
+			gve_free_bulk_mbuf(&txq->sw_ring[start], nb_clean);
+		}
+		txq->nb_free += nb_clean;
+		start += nb_clean;
+		if (start == txq->nb_tx_desc)
+			start = 0;
+		txq->next_to_clean += nb_clean;
+	}
+
+	/* for the case 'ntc > start' */
+	nb_clean = ntc - start;
+	if (nb_clean > GVE_TX_MAX_FREE_SZ)
+		nb_clean = GVE_TX_MAX_FREE_SZ;
+	if (txq->is_gqi_qpl) {
+		for (i = start; i < start + nb_clean; i++) {
+			iov = &txq->iov_ring[i];
+			txq->fifo_avail += iov->iov_len;
+			iov->iov_base = 0;
+			iov->iov_len = 0;
+		}
+	} else {
+		gve_free_bulk_mbuf(&txq->sw_ring[start], nb_clean);
+	}
+	txq->nb_free += nb_clean;
+	txq->next_to_clean += nb_clean;
+}
+
+static inline void
+gve_tx_clean_swr_qpl(struct gve_tx_queue *txq)
+{
+	uint32_t start = txq->sw_ntc;
+	uint32_t ntc, nb_clean;
+
+	ntc = txq->sw_tail;
+
+	if (ntc == start)
+		return;
+
+	/* if wrap around, free twice. */
+	if (ntc < start) {
+		nb_clean = txq->nb_tx_desc - start;
+		if (nb_clean > GVE_TX_MAX_FREE_SZ)
+			nb_clean = GVE_TX_MAX_FREE_SZ;
+		gve_free_bulk_mbuf(&txq->sw_ring[start], nb_clean);
+
+		txq->sw_nb_free += nb_clean;
+		start += nb_clean;
+		if (start == txq->nb_tx_desc)
+			start = 0;
+		txq->sw_ntc = start;
+	}
+
+	if (ntc > start) {
+		nb_clean = ntc - start;
+		if (nb_clean > GVE_TX_MAX_FREE_SZ)
+			nb_clean = GVE_TX_MAX_FREE_SZ;
+		gve_free_bulk_mbuf(&txq->sw_ring[start], nb_clean);
+		txq->sw_nb_free += nb_clean;
+		start += nb_clean;
+		txq->sw_ntc = start;
+	}
+}
+
+static inline void
+gve_tx_fill_pkt_desc(volatile union gve_tx_desc *desc, struct rte_mbuf *mbuf,
+		     uint8_t desc_cnt, uint16_t len, uint64_t addr)
+{
+	uint64_t csum_l4 = mbuf->ol_flags & RTE_MBUF_F_TX_L4_MASK;
+	uint8_t l4_csum_offset = 0;
+	uint8_t l4_hdr_offset = 0;
+
+	if (mbuf->ol_flags & RTE_MBUF_F_TX_TCP_SEG)
+		csum_l4 |= RTE_MBUF_F_TX_TCP_CKSUM;
+
+	switch (csum_l4) {
+	case RTE_MBUF_F_TX_TCP_CKSUM:
+		l4_csum_offset = offsetof(struct rte_tcp_hdr, cksum);
+		l4_hdr_offset = mbuf->l2_len + mbuf->l3_len;
+		break;
+	case RTE_MBUF_F_TX_UDP_CKSUM:
+		l4_csum_offset = offsetof(struct rte_udp_hdr, dgram_cksum);
+		l4_hdr_offset = mbuf->l2_len + mbuf->l3_len;
+		break;
+	case RTE_MBUF_F_TX_SCTP_CKSUM:
+		l4_csum_offset = offsetof(struct rte_sctp_hdr, cksum);
+		l4_hdr_offset = mbuf->l2_len + mbuf->l3_len;
+		break;
+	}
+
+	if (mbuf->ol_flags & RTE_MBUF_F_TX_TCP_SEG) {
+		desc->pkt.type_flags = GVE_TXD_TSO | GVE_TXF_L4CSUM;
+		desc->pkt.l4_csum_offset = l4_csum_offset >> 1;
+		desc->pkt.l4_hdr_offset = l4_hdr_offset >> 1;
+	} else if (mbuf->ol_flags & RTE_MBUF_F_TX_L4_MASK) {
+		desc->pkt.type_flags = GVE_TXD_STD | GVE_TXF_L4CSUM;
+		desc->pkt.l4_csum_offset = l4_csum_offset >> 1;
+		desc->pkt.l4_hdr_offset = l4_hdr_offset >> 1;
+	} else {
+		desc->pkt.type_flags = GVE_TXD_STD;
+		desc->pkt.l4_csum_offset = 0;
+		desc->pkt.l4_hdr_offset = 0;
+	}
+	desc->pkt.desc_cnt = desc_cnt;
+	desc->pkt.len = rte_cpu_to_be_16(mbuf->pkt_len);
+	desc->pkt.seg_len = rte_cpu_to_be_16(len);
+	desc->pkt.seg_addr = rte_cpu_to_be_64(addr);
+}
+
+static inline void
+gve_tx_fill_seg_desc(volatile union gve_tx_desc *desc, uint64_t ol_flags,
+		      union gve_tx_offload tx_offload,
+		      uint16_t len, uint64_t addr)
+{
+	desc->seg.type_flags = GVE_TXD_SEG;
+	if (ol_flags & RTE_MBUF_F_TX_TCP_SEG) {
+		if (ol_flags & RTE_MBUF_F_TX_IPV6)
+			desc->seg.type_flags |= GVE_TXSF_IPV6;
+		desc->seg.l3_offset = tx_offload.l2_len >> 1;
+		desc->seg.mss = rte_cpu_to_be_16(tx_offload.tso_segsz);
+	}
+	desc->seg.seg_len = rte_cpu_to_be_16(len);
+	desc->seg.seg_addr = rte_cpu_to_be_64(addr);
+}
+
+static inline bool
+is_fifo_avail(struct gve_tx_queue *txq, uint16_t len)
+{
+	if (txq->fifo_avail < len)
+		return false;
+	/* Don't split segment. */
+	if (txq->fifo_head + len > txq->fifo_size &&
+	    txq->fifo_size - txq->fifo_head + len > txq->fifo_avail)
+		return false;
+	return true;
+}
+static inline uint64_t
+gve_tx_alloc_from_fifo(struct gve_tx_queue *txq, uint16_t tx_id, uint16_t len)
+{
+	uint32_t head = txq->fifo_head;
+	uint32_t size = txq->fifo_size;
+	struct gve_tx_iovec *iov;
+	uint32_t aligned_head;
+	uint32_t iov_len = 0;
+	uint64_t fifo_addr;
+
+	iov = &txq->iov_ring[tx_id];
+
+	/* Don't split segment */
+	if (head + len > size) {
+		iov_len += (size - head);
+		head = 0;
+	}
+
+	fifo_addr = head;
+	iov_len += len;
+	iov->iov_base = head;
+
+	/* Re-align to a cacheline for next head */
+	head += len;
+	aligned_head = RTE_ALIGN(head, RTE_CACHE_LINE_SIZE);
+	iov_len += (aligned_head - head);
+	iov->iov_len = iov_len;
+
+	if (aligned_head == txq->fifo_size)
+		aligned_head = 0;
+	txq->fifo_head = aligned_head;
+	txq->fifo_avail -= iov_len;
+
+	return fifo_addr;
+}
+
+static inline uint16_t
+gve_tx_burst_qpl(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
+{
+	union gve_tx_offload tx_offload = {0};
+	volatile union gve_tx_desc *txr, *txd;
+	struct gve_tx_queue *txq = tx_queue;
+	struct rte_mbuf **sw_ring = txq->sw_ring;
+	uint16_t mask = txq->nb_tx_desc - 1;
+	uint16_t tx_id = txq->tx_tail & mask;
+	uint64_t ol_flags, addr, fifo_addr;
+	uint32_t tx_tail = txq->tx_tail;
+	struct rte_mbuf *tx_pkt, *first;
+	uint16_t sw_id = txq->sw_tail;
+	uint16_t nb_used, i;
+	uint16_t nb_tx = 0;
+	uint32_t hlen;
+
+	txr = txq->tx_desc_ring;
+
+	if (txq->nb_free < txq->free_thresh || txq->fifo_avail == 0)
+		gve_tx_clean(txq);
+
+	if (txq->sw_nb_free < txq->free_thresh)
+		gve_tx_clean_swr_qpl(txq);
+
+	for (nb_tx = 0; nb_tx < nb_pkts; nb_tx++) {
+		tx_pkt = *tx_pkts++;
+		ol_flags = tx_pkt->ol_flags;
+
+		if (txq->sw_nb_free < tx_pkt->nb_segs) {
+			gve_tx_clean_swr_qpl(txq);
+			if (txq->sw_nb_free < tx_pkt->nb_segs)
+				goto end_of_tx;
+		}
+
+		/* Even for multi-segs, use 1 qpl buf for data */
+		nb_used = 1;
+		if (ol_flags & RTE_MBUF_F_TX_TCP_SEG)
+			nb_used++;
+
+		if (txq->nb_free < nb_used)
+			goto end_of_tx;
+
+		tx_offload.l2_len = tx_pkt->l2_len;
+		tx_offload.l3_len = tx_pkt->l3_len;
+		tx_offload.l4_len = tx_pkt->l4_len;
+		tx_offload.tso_segsz = tx_pkt->tso_segsz;
+
+		first = tx_pkt;
+		txd = &txr[tx_id];
+		hlen = ol_flags & RTE_MBUF_F_TX_TCP_SEG ?
+			(uint32_t)(tx_offload.l2_len + tx_offload.l3_len + tx_offload.l4_len) :
+			tx_pkt->pkt_len;
+
+		sw_ring[sw_id] = tx_pkt;
+		if (!is_fifo_avail(txq, hlen)) {
+			gve_tx_clean(txq);
+			if (!is_fifo_avail(txq, hlen))
+				goto end_of_tx;
+		}
+		addr = (uint64_t)(tx_pkt->buf_addr) + tx_pkt->data_off;
+		fifo_addr = gve_tx_alloc_from_fifo(txq, tx_id, hlen);
+
+		/* For TSO, check if there's enough fifo space for data first */
+		if (ol_flags & RTE_MBUF_F_TX_TCP_SEG) {
+			if (!is_fifo_avail(txq, tx_pkt->pkt_len - hlen)) {
+				gve_tx_clean(txq);
+				if (!is_fifo_avail(txq, tx_pkt->pkt_len - hlen))
+					goto end_of_tx;
+			}
+		}
+		if (tx_pkt->nb_segs == 1 || ol_flags & RTE_MBUF_F_TX_TCP_SEG)
+			rte_memcpy((void *)(size_t)(fifo_addr + txq->fifo_base),
+				   (void *)(size_t)addr, hlen);
+		else
+			rte_pktmbuf_read(tx_pkt, 0, hlen,
+					 (void *)(size_t)(fifo_addr + txq->fifo_base));
+		gve_tx_fill_pkt_desc(txd, tx_pkt, nb_used, hlen, fifo_addr);
+
+		if (ol_flags & RTE_MBUF_F_TX_TCP_SEG) {
+			tx_id = (tx_id + 1) & mask;
+			txd = &txr[tx_id];
+			addr = (uint64_t)(tx_pkt->buf_addr) + tx_pkt->data_off + hlen;
+			fifo_addr = gve_tx_alloc_from_fifo(txq, tx_id, tx_pkt->pkt_len - hlen);
+			if (tx_pkt->nb_segs == 1)
+				rte_memcpy((void *)(size_t)(fifo_addr + txq->fifo_base),
+					   (void *)(size_t)addr,
+					   tx_pkt->pkt_len - hlen);
+			else
+				rte_pktmbuf_read(tx_pkt, hlen, tx_pkt->pkt_len - hlen,
+						 (void *)(size_t)(fifo_addr + txq->fifo_base));
+
+			gve_tx_fill_seg_desc(txd, ol_flags, tx_offload,
+					     tx_pkt->pkt_len - hlen, fifo_addr);
+		}
+
+		/* record mbuf in sw_ring for free */
+		for (i = 1; i < first->nb_segs; i++) {
+			sw_id = (sw_id + 1) & mask;
+			tx_pkt = tx_pkt->next;
+			sw_ring[sw_id] = tx_pkt;
+		}
+
+		sw_id = (sw_id + 1) & mask;
+		tx_id = (tx_id + 1) & mask;
+
+		txq->nb_free -= nb_used;
+		txq->sw_nb_free -= first->nb_segs;
+		tx_tail += nb_used;
+	}
+
+end_of_tx:
+	if (nb_tx) {
+		rte_write32(rte_cpu_to_be_32(tx_tail), txq->qtx_tail);
+		txq->tx_tail = tx_tail;
+		txq->sw_tail = sw_id;
+	}
+
+	return nb_tx;
+}
+
+static inline uint16_t
+gve_tx_burst_ra(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
+{
+	union gve_tx_offload tx_offload = {0};
+	volatile union gve_tx_desc *txr, *txd;
+	struct gve_tx_queue *txq = tx_queue;
+	struct rte_mbuf **sw_ring = txq->sw_ring;
+	uint16_t mask = txq->nb_tx_desc - 1;
+	uint16_t tx_id = txq->tx_tail & mask;
+	uint32_t tx_tail = txq->tx_tail;
+	struct rte_mbuf *tx_pkt, *first;
+	uint16_t nb_used, hlen, i;
+	uint64_t ol_flags, addr;
+	uint16_t nb_tx = 0;
+
+	txr = txq->tx_desc_ring;
+
+	if (txq->nb_free < txq->free_thresh)
+		gve_tx_clean(txq);
+
+	for (nb_tx = 0; nb_tx < nb_pkts; nb_tx++) {
+		tx_pkt = *tx_pkts++;
+		ol_flags = tx_pkt->ol_flags;
+
+		nb_used = tx_pkt->nb_segs;
+		if (ol_flags & RTE_MBUF_F_TX_TCP_SEG)
+			nb_used++;
+
+		if (txq->nb_free < nb_used)
+			goto end_of_tx;
+
+		tx_offload.l2_len = tx_pkt->l2_len;
+		tx_offload.l3_len = tx_pkt->l3_len;
+		tx_offload.l4_len = tx_pkt->l4_len;
+		tx_offload.tso_segsz = tx_pkt->tso_segsz;
+
+		first = tx_pkt;
+		txd = &txr[tx_id];
+
+		hlen = ol_flags & RTE_MBUF_F_TX_TCP_SEG ?
+			(uint32_t)(tx_offload.l2_len + tx_offload.l3_len + tx_offload.l4_len) :
+			tx_pkt->pkt_len;
+		/*
+		 * if tso, the driver needs to fill 2 descs for 1 mbuf
+		 * so only put this mbuf into the 1st tx entry in sw ring
+		 */
+		sw_ring[tx_id] = tx_pkt;
+		addr = rte_mbuf_data_iova(tx_pkt);
+		gve_tx_fill_pkt_desc(txd, tx_pkt, nb_used, hlen, addr);
+
+		if (ol_flags & RTE_MBUF_F_TX_TCP_SEG) {
+			tx_id = (tx_id + 1) & mask;
+			txd = &txr[tx_id];
+			addr = rte_mbuf_data_iova(tx_pkt) + hlen;
+			gve_tx_fill_seg_desc(txd, ol_flags, tx_offload,
+					     tx_pkt->data_len - hlen, addr);
+		}
+
+		for (i = 1; i < first->nb_segs; i++) {
+			tx_id = (tx_id + 1) & mask;
+			txd = &txr[tx_id];
+			tx_pkt = tx_pkt->next;
+			sw_ring[tx_id] = tx_pkt;
+			addr = rte_mbuf_data_iova(tx_pkt);
+			gve_tx_fill_seg_desc(txd, ol_flags, tx_offload,
+					     tx_pkt->data_len, addr);
+		}
+		tx_id = (tx_id + 1) & mask;
+
+		txq->nb_free -= nb_used;
+		tx_tail += nb_used;
+	}
+
+end_of_tx:
+	if (nb_tx) {
+		rte_write32(rte_cpu_to_be_32(tx_tail), txq->qtx_tail);
+		txq->tx_tail = tx_tail;
+	}
+
+	return nb_tx;
+}
+
+uint16_t
+gve_tx_burst(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
+{
+	struct gve_tx_queue *txq = tx_queue;
+
+	if (txq->is_gqi_qpl)
+		return gve_tx_burst_qpl(tx_queue, tx_pkts, nb_pkts);
+
+	return gve_tx_burst_ra(tx_queue, tx_pkts, nb_pkts);
+}
+
 static inline void
 gve_reset_txq(struct gve_tx_queue *txq)
 {
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 192+ messages in thread

* Re: [PATCH v7 1/8] net/gve/base: introduce base code
  2022-10-21  9:19                       ` [PATCH v7 1/8] net/gve/base: introduce base code Junfeng Guo
@ 2022-10-21  9:49                         ` Ferruh Yigit
  2022-10-24  5:04                           ` Guo, Junfeng
  2022-10-24 10:50                         ` Ferruh Yigit
  2022-10-25  9:07                         ` [PATCH v8 0/8] introduce GVE PMD Junfeng Guo
  2 siblings, 1 reply; 192+ messages in thread
From: Ferruh Yigit @ 2022-10-21  9:49 UTC (permalink / raw)
  To: Junfeng Guo, qi.z.zhang, jingjing.wu, beilei.xing
  Cc: dev, xiaoyun.li, awogbemila, bruce.richardson, hemant.agrawal,
	stephen, chenbo.xia, helin.zhang, Haiyue Wang

On 10/21/2022 10:19 AM, Junfeng Guo wrote:
> The following base code is based on Google Virtual Ethernet (gve)
> driver v1.3.0 under MIT license.
> - gve_adminq.c
> - gve_adminq.h
> - gve_desc.h
> - gve_desc_dqo.h
> - gve_register.h
> - gve.h
> 
> The original code is in:
> https://github.com/GoogleCloudPlatform/compute-virtual-ethernet-linux/\
> tree/v1.3.0/google/gve
> 
> Note that these code are not Intel files and they come from the kernel
> community. The base code there has the statement of
> SPDX-License-Identifier: (GPL-2.0 OR MIT). Here we just follow the
> required MIT license as an exception to DPDK.
> 
> Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com>
> Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
> Signed-off-by: Junfeng Guo <junfeng.guo@intel.com>

<...>

> +static bool gve_adminq_wait_for_cmd(struct gve_priv *priv, u32 prod_cnt)
> +{
> +	int i;
> +
> +	for (i = 0; i < GVE_MAX_ADMINQ_EVENT_COUNTER_CHECK; i++) {
> +		if (ioread32be(&priv->reg_bar0->adminq_event_counter)
> +		    == prod_cnt)

[copy/paste from previous version]

Syntax: why not move the second half of the comparison onto the line above?
Unless this is coming from the Google code and updating it would bring a
maintenance cost.

^ permalink raw reply	[flat|nested] 192+ messages in thread

* Re: [PATCH v7 3/8] net/gve: add support for device initialization
  2022-10-21  9:19                       ` [PATCH v7 3/8] net/gve: add support for device initialization Junfeng Guo
@ 2022-10-21  9:49                         ` Ferruh Yigit
  2022-10-24  5:04                           ` Guo, Junfeng
  0 siblings, 1 reply; 192+ messages in thread
From: Ferruh Yigit @ 2022-10-21  9:49 UTC (permalink / raw)
  To: Junfeng Guo, qi.z.zhang, jingjing.wu, beilei.xing
  Cc: dev, xiaoyun.li, awogbemila, bruce.richardson, hemant.agrawal,
	stephen, chenbo.xia, helin.zhang, Haiyue Wang

On 10/21/2022 10:19 AM, Junfeng Guo wrote:

> 
> Support device init and add following devops skeleton:
>   - dev_configure
>   - dev_start
>   - dev_stop
>   - dev_close
> 
> Note that build system (including doc) is also added in this patch.
> 
> Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
> Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com>
> Signed-off-by: Junfeng Guo <junfeng.guo@intel.com>

<...>

> +static int
> +gve_dev_init(struct rte_eth_dev *eth_dev)
> +{
> +       struct gve_priv *priv = eth_dev->data->dev_private;
> +       int max_tx_queues, max_rx_queues;
> +       struct rte_pci_device *pci_dev;
> +       struct gve_registers *reg_bar;
> +       rte_be32_t *db_bar;
> +       int err;
> +
> +       eth_dev->dev_ops = &gve_eth_dev_ops;
> +
> +       if (rte_eal_process_type() != RTE_PROC_PRIMARY)
> +               return 0;
> +
> +       pci_dev = RTE_DEV_TO_PCI(eth_dev->device);
> +
> +       reg_bar = pci_dev->mem_resource[GVE_REG_BAR].addr;
> +       if (!reg_bar) {
> +               PMD_DRV_LOG(ERR, "Failed to map pci bar!");
> +               return -ENOMEM;
> +       }
> +
> +       db_bar = pci_dev->mem_resource[GVE_DB_BAR].addr;
> +       if (!db_bar) {
> +               PMD_DRV_LOG(ERR, "Failed to map doorbell bar!");
> +               return -ENOMEM;
> +       }
> +
> +       gve_write_version(&reg_bar->driver_version);
> +       /* Get max queues to alloc etherdev */
> +       max_tx_queues = ioread32be(&reg_bar->max_tx_queues);
> +       max_rx_queues = ioread32be(&reg_bar->max_rx_queues);
> +
> +       priv->reg_bar0 = reg_bar;
> +       priv->db_bar2 = db_bar;
> +       priv->pci_dev = pci_dev;
> +       priv->state_flags = 0x0;
> +
> +       priv->max_nb_txq = max_tx_queues;
> +       priv->max_nb_rxq = max_rx_queues;
> +
> +       err = gve_init_priv(priv, false);
> +       if (err)
> +               return err;
> +
> +       eth_dev->data->mac_addrs = &priv->dev_addr;
> +

[copy/paste from previous version]

What is the value in 'priv->dev_addr'?
Whether or not the allocation of 'eth_dev->data->mac_addrs' is removed, as
we discussed, a valid value still needs to be set in 'priv->dev_addr'.
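
Something along these lines would do as a sketch (the random-address
fallback is only an assumption on my side; the device may well report a MAC
through the admin queue, in which case only the validity check matters):

/* Sketch only: make sure a usable MAC ends up in 'priv->dev_addr' before
 * it is exposed through 'eth_dev->data->mac_addrs'.
 */
if (!rte_is_valid_assigned_ether_addr(&priv->dev_addr)) {
	PMD_DRV_LOG(WARNING, "Device did not report a MAC address, using a random one");
	rte_eth_random_addr(priv->dev_addr.addr_bytes);
}
eth_dev->data->mac_addrs = &priv->dev_addr;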




^ permalink raw reply	[flat|nested] 192+ messages in thread

* Re: [PATCH v7 5/8] net/gve: add support for MTU setting
  2022-10-21  9:19                       ` [PATCH v7 5/8] net/gve: add support for MTU setting Junfeng Guo
@ 2022-10-21  9:50                         ` Ferruh Yigit
  2022-10-24  5:04                           ` Guo, Junfeng
  0 siblings, 1 reply; 192+ messages in thread
From: Ferruh Yigit @ 2022-10-21  9:50 UTC (permalink / raw)
  To: Junfeng Guo, qi.z.zhang, jingjing.wu, beilei.xing
  Cc: dev, xiaoyun.li, awogbemila, bruce.richardson, hemant.agrawal,
	stephen, chenbo.xia, helin.zhang

On 10/21/2022 10:19 AM, Junfeng Guo wrote:

> 
> Support dev_ops mtu_set.
> 
> Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com>
> Signed-off-by: Junfeng Guo <junfeng.guo@intel.com>

<...>

> +static int
> +gve_dev_mtu_set(struct rte_eth_dev *dev, uint16_t mtu)
> +{
> +       struct gve_priv *priv = dev->data->dev_private;
> +       int err;
> +
> +       if (mtu < RTE_ETHER_MIN_MTU || mtu > priv->max_mtu) {
> +               PMD_DRV_LOG(ERR, "MIN MTU is %u, MAX MTU is %u",
> +                           RTE_ETHER_MIN_MTU, priv->max_mtu);
> +               return -EINVAL;
> +       }
> +
> +       /* mtu setting is forbidden if port is start */
> +       if (dev->data->dev_started) {
> +               PMD_DRV_LOG(ERR, "Port must be stopped before configuration");
> +               return -EBUSY;
> +       }
> +
> +       err = gve_adminq_set_mtu(priv, mtu);
> +       if (err) {
> +               PMD_DRV_LOG(ERR, "Failed to set mtu as %u err = %d", mtu, err);
> +               return err;
> +       }
> +
> +       return 0;
> +}
> +

[copy/paste from previous version]

configure() (gve_dev_configure()) also gets 'mtu' as user config
('eth_conf->rxmode.mtu'), which is ignored right now.

Since there is already a 'gve_adminq_set_mtu()' command, what do you think
about using it within 'gve_dev_configure()'?
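
Something like the following could work as a sketch; 'gve_configure_mtu()'
is only an illustrative helper name, while 'gve_dev_mtu_set()' is the setter
added in this patch:

/* Sketch only: apply the MTU requested via rxmode when configuring. */
static int
gve_configure_mtu(struct rte_eth_dev *dev)
{
	uint32_t mtu = dev->data->dev_conf.rxmode.mtu;

	/* Nothing requested, or already at the requested value. */
	if (mtu == 0 || mtu == dev->data->mtu)
		return 0;

	return gve_dev_mtu_set(dev, mtu);
}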


^ permalink raw reply	[flat|nested] 192+ messages in thread

* Re: [PATCH v7 6/8] net/gve: add support for dev info get and dev configure
  2022-10-21  9:19                       ` [PATCH v7 6/8] net/gve: add support for dev info get and dev configure Junfeng Guo
@ 2022-10-21  9:51                         ` Ferruh Yigit
  2022-10-24  5:04                           ` Guo, Junfeng
  0 siblings, 1 reply; 192+ messages in thread
From: Ferruh Yigit @ 2022-10-21  9:51 UTC (permalink / raw)
  To: Junfeng Guo, qi.z.zhang, jingjing.wu, ferruh.yigit, beilei.xing
  Cc: dev, xiaoyun.li, awogbemila, bruce.richardson, hemant.agrawal,
	stephen, chenbo.xia, helin.zhang

On 10/21/2022 10:19 AM, Junfeng Guo wrote:

> 
> Add dev_ops dev_infos_get.
> Complete dev_configure with RX offloads force enabling.
> 
> Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com>
> Signed-off-by: Junfeng Guo <junfeng.guo@intel.com>

<...>

> --- a/doc/guides/nics/gve.rst
> +++ b/doc/guides/nics/gve.rst
> @@ -62,6 +62,7 @@ In this release, the GVE PMD provides the basic functionality of packet
>   reception and transmission.
>   Supported features of the GVE PMD are:
> 
> +- Receiver Side Scaling (RSS)

[copy/paste from previous version]

I am not sure if the driver can claim this. I can see an RSS hash is
provided, but is it possible to update which hash function to use, or to
update the key or RETA table to configure which queue packets go to?

Right now, what is the RSS hash calculated on?

Perhaps RSS support can be documented as limited?

Also, I am not sure this update belongs in this patch; it should go in the
one that has the datapath.

^ permalink raw reply	[flat|nested] 192+ messages in thread

* Re: [PATCH v7 8/8] net/gve: add support for Rx/Tx
  2022-10-21  9:19                       ` [PATCH v7 8/8] net/gve: add support for Rx/Tx Junfeng Guo
@ 2022-10-21  9:52                         ` Ferruh Yigit
  2022-10-24  5:04                           ` Guo, Junfeng
  0 siblings, 1 reply; 192+ messages in thread
From: Ferruh Yigit @ 2022-10-21  9:52 UTC (permalink / raw)
  To: Junfeng Guo, qi.z.zhang, jingjing.wu, ferruh.yigit, beilei.xing
  Cc: dev, xiaoyun.li, awogbemila, bruce.richardson, hemant.agrawal,
	stephen, chenbo.xia, helin.zhang

On 10/21/2022 10:19 AM, Junfeng Guo wrote:

> 
> Add Rx/Tx of GQI_QPL queue format and GQI_RDA queue format.
> 
> Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com>
> Signed-off-by: Junfeng Guo <junfeng.guo@intel.com>

<...>

> +
> +static inline void
> +gve_tx_clean_swr_qpl(struct gve_tx_queue *txq)
> +{
> +       uint32_t start = txq->sw_ntc;
> +       uint32_t ntc, nb_clean;
> +
> +       ntc = txq->sw_tail;
> +
> +       if (ntc == start)
> +               return;
> +
> +       /* if wrap around, free twice. */
> +       if (ntc < start) {
> +               nb_clean = txq->nb_tx_desc - start;
> +               if (nb_clean > GVE_TX_MAX_FREE_SZ)
> +                       nb_clean = GVE_TX_MAX_FREE_SZ;
> +               gve_free_bulk_mbuf(&txq->sw_ring[start], nb_clean);
> +
> +               txq->sw_nb_free += nb_clean;
> +               start += nb_clean;
> +               if (start == txq->nb_tx_desc)
> +                       start = 0;
> +               txq->sw_ntc = start;
> +       }
> +
> +       if (ntc > start) {
> +               nb_clean = ntc - start;
> +               if (nb_clean > GVE_TX_MAX_FREE_SZ)
> +                       nb_clean = GVE_TX_MAX_FREE_SZ;
> +               gve_free_bulk_mbuf(&txq->sw_ring[start], nb_clean);
> +               txq->sw_nb_free += nb_clean;
> +               start += nb_clean;
> +               txq->sw_ntc = start;
> +       }
> +}

[copy/paste from previous version]

Maybe the 'if' block can be dropped, since the "ntc == start" and
"ntc < start" cases are already covered above.
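
For reference, a sketch of that simplification; the one subtlety is that
when the wrap-around pass is clamped by GVE_TX_MAX_FREE_SZ, 'start' has not
reached 0 yet, so the second pass must be skipped for that call or
'ntc - start' underflows:

static inline void
gve_tx_clean_swr_qpl(struct gve_tx_queue *txq)
{
	uint32_t start = txq->sw_ntc;
	uint32_t ntc = txq->sw_tail;
	uint32_t nb_clean;

	if (ntc == start)
		return;

	/* if wrap around, free the tail of the software ring first. */
	if (ntc < start) {
		nb_clean = txq->nb_tx_desc - start;
		if (nb_clean > GVE_TX_MAX_FREE_SZ)
			nb_clean = GVE_TX_MAX_FREE_SZ;
		gve_free_bulk_mbuf(&txq->sw_ring[start], nb_clean);
		txq->sw_nb_free += nb_clean;
		start += nb_clean;
		if (start != txq->nb_tx_desc) {
			/* clamped: finish the remainder on a later call */
			txq->sw_ntc = start;
			return;
		}
		start = 0;
	}

	/* here 'ntc >= start' always holds, so no extra guard is needed. */
	nb_clean = ntc - start;
	if (nb_clean > GVE_TX_MAX_FREE_SZ)
		nb_clean = GVE_TX_MAX_FREE_SZ;
	gve_free_bulk_mbuf(&txq->sw_ring[start], nb_clean);
	txq->sw_nb_free += nb_clean;
	txq->sw_ntc = start + nb_clean;
}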

<...>

> +uint16_t
> +gve_tx_burst(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
> +{
> +       struct gve_tx_queue *txq = tx_queue;
> +
> +       if (txq->is_gqi_qpl)
> +               return gve_tx_burst_qpl(tx_queue, tx_pkts, nb_pkts);
> +
> +       return gve_tx_burst_ra(tx_queue, tx_pkts, nb_pkts);
> +}
> +

[copy/paste from previous version]

Can there be a mix of queue types?
If only one queue type is supported in a given configuration, perhaps the
burst function can be selected during configuration, to avoid the 'if'
check on the datapath.

This is an optimization and can be done later; it doesn't have to be in
this set.
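
As a rough sketch of what I mean, assuming the per-format burst functions
are exported from gve_tx.c instead of staying static, and that the
GVE_GQI_QPL_FORMAT value from the base code is the right thing to key on:

/* Sketch only: pick the Tx burst function once, based on the negotiated
 * queue format, instead of branching inside every burst call.
 */
static void
gve_set_tx_function(struct rte_eth_dev *dev)
{
	struct gve_priv *priv = dev->data->dev_private;

	if (priv->queue_format == GVE_GQI_QPL_FORMAT)
		dev->tx_pkt_burst = gve_tx_burst_qpl;
	else
		dev->tx_pkt_burst = gve_tx_burst_ra;
}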


^ permalink raw reply	[flat|nested] 192+ messages in thread

* Re: [PATCH v7 0/8] introduce GVE PMD
  2022-10-21  9:19                     ` [PATCH v7 0/8] introduce GVE PMD Junfeng Guo
                                         ` (7 preceding siblings ...)
  2022-10-21  9:19                       ` [PATCH v7 8/8] net/gve: add support for Rx/Tx Junfeng Guo
@ 2022-10-21 13:12                       ` Ferruh Yigit
  2022-10-24 10:50                         ` Ferruh Yigit
  8 siblings, 1 reply; 192+ messages in thread
From: Ferruh Yigit @ 2022-10-21 13:12 UTC (permalink / raw)
  To: Junfeng Guo, qi.z.zhang, jingjing.wu, beilei.xing
  Cc: dev, xiaoyun.li, awogbemila, bruce.richardson, hemant.agrawal,
	stephen, chenbo.xia, helin.zhang

On 10/21/2022 10:19 AM, Junfeng Guo wrote:
> Introduce a new PMD for Google Virtual Ethernet (GVE).
> 
> gve (or gVNIC) is the standard virtual ethernet interface on Google Cloud
> Platform (GCP), which is one of the multiple virtual interfaces from those
> leading CSP customers in the world.
> 
> Having a well maintained/optimized gve PMD on DPDK community can help those
> cloud instance consumers with better experience of performance, maintenance
> who wants to run their own VNFs on GCP.
> 
> Please refer to https://cloud.google.com/compute/docs/networking/using-gvnic
> for the device description.
> 
> This patch set requires an exception for MIT license for GVE base code.
> And the base code includes the following files:
>   - gve_adminq.c
>   - gve_adminq.h
>   - gve_desc.h
>   - gve_desc_dqo.h
>   - gve_register.h
> 
> It's based on GVE kernel driver v1.3.0 and the original code is in
> https://github.com/GoogleCloudPlatform/compute-virtual-ethernet-linux/tree/v1.3.0
> 
> 
> v2:
> fix some CI check error.
> 
> v3:
> refactor some code and fix some build error.
> 
> v4:
> move the Google base code files into DPDK base folder.
> 
> v5:
> reorder commit sequence and drop the stats feature.
> 
> v6-v7:
> improve the code.
> 
> Junfeng Guo (8):
>    net/gve/base: introduce base code
>    net/gve/base: add OS specific implementation
>    net/gve: add support for device initialization
>    net/gve: add support for link update
>    net/gve: add support for MTU setting
>    net/gve: add support for dev info get and dev configure
>    net/gve: add support for queue operations
>    net/gve: add support for Rx/Tx

Can you please check the build error reported by CI:
https://mails.dpdk.org/archives/test-report/2022-October/318054.html


The following link may be helpful:
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=225324


^ permalink raw reply	[flat|nested] 192+ messages in thread

* RE: [PATCH v6 1/8] net/gve/base: introduce GVE PMD base code
  2022-10-20 14:39                       ` Ferruh Yigit
@ 2022-10-24  2:10                         ` Guo, Junfeng
  0 siblings, 0 replies; 192+ messages in thread
From: Guo, Junfeng @ 2022-10-24  2:10 UTC (permalink / raw)
  To: Ferruh Yigit, Zhang, Qi Z, Wu, Jingjing, Xing, Beilei, Li, Xiaoyun
  Cc: dev, Li, Xiaoyun, awogbemila, Richardson, Bruce, hemant.agrawal,
	stephen, Xia, Chenbo, Zhang, Helin, Wang, Haiyue



> -----Original Message-----
> From: Ferruh Yigit <ferruh.yigit@amd.com>
> Sent: Thursday, October 20, 2022 22:39
> To: Guo, Junfeng <junfeng.guo@intel.com>; Zhang, Qi Z
> <qi.z.zhang@intel.com>; Wu, Jingjing <jingjing.wu@intel.com>; Xing,
> Beilei <beilei.xing@intel.com>; Li, Xiaoyun <xiaoyun.li@intel.com>
> Cc: dev@dpdk.org; Li, Xiaoyun <xiaoyun.li@intel.com>;
> awogbemila@google.com; Richardson, Bruce
> <bruce.richardson@intel.com>; hemant.agrawal@nxp.com;
> stephen@networkplumber.org; Xia, Chenbo <chenbo.xia@intel.com>;
> Zhang, Helin <helin.zhang@intel.com>; Wang, Haiyue
> <haiyue.wang@intel.com>
> Subject: Re: [PATCH v6 1/8] net/gve/base: introduce GVE PMD base code
> 
> On 10/20/2022 11:36 AM, Junfeng Guo wrote:
> > diff --git a/drivers/net/gve/base/gve.h b/drivers/net/gve/base/gve.h
> > new file mode 100644
> > index 0000000000..1b0d59b639
> > --- /dev/null
> > +++ b/drivers/net/gve/base/gve.h
> > @@ -0,0 +1,58 @@
> > +/* SPDX-License-Identifier: MIT
> > + * Google Virtual Ethernet (gve) driver
> > + * Version: 1.3.0
> 
> [1]
> 
> > + * Copyright (C) 2015-2022 Google, Inc.
> > + * Copyright(C) 2022 Intel Corporation
> 
> [2]
> 
> > + */
> > +
> > +#ifndef_GVE_H_
> > +#define_GVE_H_
> > +
> > +#include "gve_desc.h"
> > +
> > +#define GVE_VERSION            "1.3.0"
> > +#define GVE_VERSION_PREFIX     "GVE-"
> > +
> 
> Is it clarified/decided to keep version in the file comment [1] and keep
> Intel copyright [2], or is this just not addressed yet?

Yes, we will remove these in the coming version. Thanks!

^ permalink raw reply	[flat|nested] 192+ messages in thread

* RE: [PATCH v6 1/8] net/gve/base: introduce GVE PMD base code
  2022-10-20 14:40                       ` Ferruh Yigit
@ 2022-10-24  2:10                         ` Guo, Junfeng
  0 siblings, 0 replies; 192+ messages in thread
From: Guo, Junfeng @ 2022-10-24  2:10 UTC (permalink / raw)
  To: Ferruh Yigit, Zhang, Qi Z, Wu, Jingjing, Xing, Beilei
  Cc: dev, Li, Xiaoyun, awogbemila, Richardson, Bruce, hemant.agrawal,
	stephen, Xia, Chenbo, Zhang, Helin, Wang, Haiyue



> -----Original Message-----
> From: Ferruh Yigit <ferruh.yigit@amd.com>
> Sent: Thursday, October 20, 2022 22:41
> To: Guo, Junfeng <junfeng.guo@intel.com>; Zhang, Qi Z
> <qi.z.zhang@intel.com>; Wu, Jingjing <jingjing.wu@intel.com>; Xing,
> Beilei <beilei.xing@intel.com>
> Cc: dev@dpdk.org; Li, Xiaoyun <xiaoyun.li@intel.com>;
> awogbemila@google.com; Richardson, Bruce
> <bruce.richardson@intel.com>; hemant.agrawal@nxp.com;
> stephen@networkplumber.org; Xia, Chenbo <chenbo.xia@intel.com>;
> Zhang, Helin <helin.zhang@intel.com>; Wang, Haiyue
> <haiyue.wang@intel.com>
> Subject: Re: [PATCH v6 1/8] net/gve/base: introduce GVE PMD base code
> 
> On 10/20/2022 11:36 AM, Junfeng Guo wrote:
> 
> >
> > The following base code is based on Google Virtual Ethernet (gve)
> > driver v1.3.0 under MIT license.
> > - gve_adminq.c
> > - gve_adminq.h
> > - gve_desc.h
> > - gve_desc_dqo.h
> > - gve_register.h
> > - gve.h
> >
> > The original code is in:
> > https://github.com/GoogleCloudPlatform/compute-virtual-ethernet-
> linux/\
> > tree/v1.3.0/google/gve
> >
> > Note that these code are not Intel files and they come from the kernel
> > community. The base code there has the statement of
> > SPDX-License-Identifier: (GPL-2.0 OR MIT). Here we just follow the
> > required MIT license as an exception to DPDK.
> 
> Can drop "GVE PMD" from patch title, since 'net/gve/base:' already
> implies it, like:
> net/gve/base: introduce base code

Sure, makes sense!
Will update this in the coming version, thanks!

> 
> >
> > Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com>
> > Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
> > Signed-off-by: Junfeng Guo <junfeng.guo@intel.com>
> 
> <...>
> 
> > +static bool gve_adminq_wait_for_cmd(struct gve_priv *priv, u32
> prod_cnt)
> > +{
> > +       int i;
> > +
> > +       for (i = 0; i < GVE_MAX_ADMINQ_EVENT_COUNTER_CHECK; i++) {
> > +               if (ioread32be(&priv->reg_bar0->adminq_event_counter)
> > +                   == prod_cnt)
> 
> Syntax, why not move second half of the equation in above line?
> Unless this is coming from google code and updating it brings
> maintanance cost.

Yes, this is a basic adminq processing function and it comes from the Google
code without any change. Better to keep it consistent with the original. Thanks!


^ permalink raw reply	[flat|nested] 192+ messages in thread

* RE: [PATCH v6 3/8] net/gve: add support for device initialization
  2022-10-20 14:42                       ` Ferruh Yigit
@ 2022-10-24  2:10                         ` Guo, Junfeng
  0 siblings, 0 replies; 192+ messages in thread
From: Guo, Junfeng @ 2022-10-24  2:10 UTC (permalink / raw)
  To: Ferruh Yigit, Zhang, Qi Z, Wu, Jingjing, Xing, Beilei
  Cc: dev, Li, Xiaoyun, awogbemila, Richardson, Bruce, hemant.agrawal,
	stephen, Xia, Chenbo, Zhang, Helin, Wang, Haiyue



> -----Original Message-----
> From: Ferruh Yigit <ferruh.yigit@amd.com>
> Sent: Thursday, October 20, 2022 22:42
> To: Guo, Junfeng <junfeng.guo@intel.com>; Zhang, Qi Z
> <qi.z.zhang@intel.com>; Wu, Jingjing <jingjing.wu@intel.com>; Xing,
> Beilei <beilei.xing@intel.com>
> Cc: dev@dpdk.org; Li, Xiaoyun <xiaoyun.li@intel.com>;
> awogbemila@google.com; Richardson, Bruce
> <bruce.richardson@intel.com>; hemant.agrawal@nxp.com;
> stephen@networkplumber.org; Xia, Chenbo <chenbo.xia@intel.com>;
> Zhang, Helin <helin.zhang@intel.com>; Wang, Haiyue
> <haiyue.wang@intel.com>
> Subject: Re: [PATCH v6 3/8] net/gve: add support for device initialization
> 
> On 10/20/2022 11:36 AM, Junfeng Guo wrote:
> 
> >
> > Support device init and add following devops skeleton:
> >   - dev_configure
> >   - dev_start
> >   - dev_stop
> >   - dev_close
> >
> > Note that build system (including doc) is also added in this patch.
> >
> > Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
> > Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com>
> > Signed-off-by: Junfeng Guo <junfeng.guo@intel.com>
> 
> <...>
> 
> > index 1c3daf141d..715013fa35 100644
> > --- a/doc/guides/rel_notes/release_22_11.rst
> > +++ b/doc/guides/rel_notes/release_22_11.rst
> > @@ -140,6 +140,11 @@ New Features
> >
> >     * Made compatible with libbpf v0.8.0 (when used with libxdp).
> >
> > +* **Added GVE net PMD**
> > +
> > +  * Added the new ``gve`` net driver for Google Virtual Ethernet devices.
> > +  * See the :doc:`../nics/gve` NIC guide for more details on this new
> driver.
> > +
> 
> Can you please move it one more down, just above 'Intel', to sort it
> alphabetically based on Vendor name, in this case 'G' I guess.
> We are almost there :)

Sure, thanks for the reminder!

> 
> <...>
> 
> > +static int
> > +gve_dev_init(struct rte_eth_dev *eth_dev)
> > +{
> > +       struct gve_priv *priv = eth_dev->data->dev_private;
> > +       int max_tx_queues, max_rx_queues;
> > +       struct rte_pci_device *pci_dev;
> > +       struct gve_registers *reg_bar;
> > +       rte_be32_t *db_bar;
> > +       int err;
> > +
> > +       eth_dev->dev_ops = &gve_eth_dev_ops;
> > +
> > +       if (rte_eal_process_type() != RTE_PROC_PRIMARY)
> > +               return 0;
> > +
> > +       pci_dev = RTE_DEV_TO_PCI(eth_dev->device);
> > +
> > +       reg_bar = pci_dev->mem_resource[GVE_REG_BAR].addr;
> > +       if (!reg_bar) {
> > +               PMD_DRV_LOG(ERR, "Failed to map pci bar!");
> > +               return -ENOMEM;
> > +       }
> > +
> > +       db_bar = pci_dev->mem_resource[GVE_DB_BAR].addr;
> > +       if (!db_bar) {
> > +               PMD_DRV_LOG(ERR, "Failed to map doorbell bar!");
> > +               return -ENOMEM;
> > +       }
> > +
> > +       gve_write_version(&reg_bar->driver_version);
> > +       /* Get max queues to alloc etherdev */
> > +       max_tx_queues = ioread32be(&reg_bar->max_tx_queues);
> > +       max_rx_queues = ioread32be(&reg_bar->max_rx_queues);
> > +
> > +       priv->reg_bar0 = reg_bar;
> > +       priv->db_bar2 = db_bar;
> > +       priv->pci_dev = pci_dev;
> > +       priv->state_flags = 0x0;
> > +
> > +       priv->max_nb_txq = max_tx_queues;
> > +       priv->max_nb_rxq = max_rx_queues;
> > +
> > +       err = gve_init_priv(priv, false);
> > +       if (err)
> > +               return err;
> > +
> > +       eth_dev->data->mac_addrs = rte_zmalloc("gve_mac", sizeof(struct
> rte_ether_addr), 0);
> > +       if (!eth_dev->data->mac_addrs) {
> > +               PMD_DRV_LOG(ERR, "Failed to allocate memory to store mac
> address");
> > +               return -ENOMEM;
> > +       }
> > +       rte_ether_addr_copy(&priv->dev_addr, eth_dev->data-
> >mac_addrs);
> > +
> 
> What is the value in 'priv->dev_addr'?
> Even allocating memory for 'eth_dev->data->mac_addrs' removed or not,
> as
> we discussed, independent from it, need to set a valid value to
> 'priv->dev_addr'.

Again, thanks for the explanation!
Will update this and re-validate it.
The address value could look like "42:01:0A:00:08:03".
Thanks!

> 
> <...>
> 
> > diff --git a/drivers/net/gve/version.map b/drivers/net/gve/version.map
> > new file mode 100644
> > index 0000000000..c2e0723b4c
> > --- /dev/null
> > +++ b/drivers/net/gve/version.map
> > @@ -0,0 +1,3 @@
> > +DPDK_22 {
> 
> DPDK_23
> 
> In case it is not clear, since this comment skipped in previous a few
> versions, the ABI version should be 'DPDK_23', so the content of this
> file should be;
> 
> DPDK_23 {
>          local: *;
> };

Sure, will update this in the coming version. Thanks a lot!


^ permalink raw reply	[flat|nested] 192+ messages in thread

* RE: [PATCH v6 5/8] net/gve: add support for MTU setting
  2022-10-20 14:45                       ` Ferruh Yigit
@ 2022-10-24  2:10                         ` Guo, Junfeng
  0 siblings, 0 replies; 192+ messages in thread
From: Guo, Junfeng @ 2022-10-24  2:10 UTC (permalink / raw)
  To: Ferruh Yigit, Zhang, Qi Z, Wu, Jingjing, Xing, Beilei
  Cc: dev, Li, Xiaoyun, awogbemila, Richardson, Bruce, hemant.agrawal,
	stephen, Xia, Chenbo, Zhang, Helin



> -----Original Message-----
> From: Ferruh Yigit <ferruh.yigit@amd.com>
> Sent: Thursday, October 20, 2022 22:45
> To: Guo, Junfeng <junfeng.guo@intel.com>; Zhang, Qi Z
> <qi.z.zhang@intel.com>; Wu, Jingjing <jingjing.wu@intel.com>; Xing,
> Beilei <beilei.xing@intel.com>
> Cc: dev@dpdk.org; Li, Xiaoyun <xiaoyun.li@intel.com>;
> awogbemila@google.com; Richardson, Bruce
> <bruce.richardson@intel.com>; hemant.agrawal@nxp.com;
> stephen@networkplumber.org; Xia, Chenbo <chenbo.xia@intel.com>;
> Zhang, Helin <helin.zhang@intel.com>
> Subject: Re: [PATCH v6 5/8] net/gve: add support for MTU setting
> 
> On 10/20/2022 11:36 AM, Junfeng Guo wrote:
> 
> >
> > Support dev_ops mtu_set.
> >
> > Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com>
> > Signed-off-by: Junfeng Guo <junfeng.guo@intel.com>
> > ---
> >   doc/guides/nics/features/gve.ini |  1 +
> >   drivers/net/gve/gve_ethdev.c     | 27 +++++++++++++++++++++++++++
> >   2 files changed, 28 insertions(+)
> >
> > diff --git a/doc/guides/nics/features/gve.ini
> b/doc/guides/nics/features/gve.ini
> > index ae466ad677..d1703d8dab 100644
> > --- a/doc/guides/nics/features/gve.ini
> > +++ b/doc/guides/nics/features/gve.ini
> > @@ -5,6 +5,7 @@
> >   ;
> >   [Features]
> >   Link status          = Y
> > +MTU update           = Y
> >   Linux                = Y
> >   x86-32               = Y
> >   x86-64               = Y
> > diff --git a/drivers/net/gve/gve_ethdev.c
> b/drivers/net/gve/gve_ethdev.c
> > index ca4a467140..1968f38eb6 100644
> > --- a/drivers/net/gve/gve_ethdev.c
> > +++ b/drivers/net/gve/gve_ethdev.c
> > @@ -94,12 +94,39 @@ gve_dev_close(struct rte_eth_dev *dev)
> >          return err;
> >   }
> >
> > +static int
> > +gve_dev_mtu_set(struct rte_eth_dev *dev, uint16_t mtu)
> > +{
> > +       struct gve_priv *priv = dev->data->dev_private;
> > +       int err;
> > +
> > +       if (mtu < RTE_ETHER_MIN_MTU || mtu > priv->max_mtu) {
> > +               PMD_DRV_LOG(ERR, "MIN MTU is %u MAX MTU is %u",
> RTE_ETHER_MIN_MTU, priv->max_mtu);
> 
> Although this is within new 100 column limit, it is easy to break it
> without sacrificing the readability, can you break it as something like:
> 
> PMD_DRV_LOG(ERR, "MIN MTU is %u MAX MTU is %u",
> 	RTE_ETHER_MIN_MTU, priv->max_mtu);

Sure, will improve this. Thanks!

> 
> > +               return -EINVAL;
> > +       }
> > +
> > +       /* mtu setting is forbidden if port is start */
> > +       if (dev->data->dev_started) {
> > +               PMD_DRV_LOG(ERR, "Port must be stopped before
> configuration");
> > +               return -EBUSY;
> > +       }
> > +
> > +       err = gve_adminq_set_mtu(priv, mtu);
> > +       if (err) {
> > +               PMD_DRV_LOG(ERR, "Failed to set mtu as %u err = %d", mtu,
> err);
> > +               return err;
> > +       }
> > +
> > +       return 0;
> > +}
> 
> 
> configure() (gve_dev_configure()) also get 'mtu' as user config
> ('eth_conf->rxmode.mtu') which is ignored right now,
> 
> since there is 'gve_adminq_set_mtu()' command already what do you
> think
> to use it within 'gve_dev_configure()'?

Do you mean to set the MTU with the user config value, like:
'gve_dev_mtu_set(dev, dev->data->dev_conf.rxmode.mtu)'
within 'gve_dev_configure()'?

The 'dev->data->dev_conf.rxmode.mtu' I get at the dev configure stage
is also 1500, which is larger than priv->max_mtu (1460), and this may
still cause the testpmd launch to fail...

So I'll keep this part unchanged and investigate further to figure
out the MTU issues we hit. Thanks!

> 
> > +
> >   static const struct eth_dev_ops gve_eth_dev_ops = {
> >          .dev_configure        = gve_dev_configure,
> >          .dev_start            = gve_dev_start,
> >          .dev_stop             = gve_dev_stop,
> >          .dev_close            = gve_dev_close,
> >          .link_update          = gve_link_update,
> > +       .mtu_set              = gve_dev_mtu_set,
> >   };
> >
> >   static void
> > --
> > 2.34.1
> >


^ permalink raw reply	[flat|nested] 192+ messages in thread

* RE: [PATCH v6 6/8] net/gve: add support for dev info get and dev configure
  2022-10-20 14:45                       ` Ferruh Yigit
@ 2022-10-24  2:10                         ` Guo, Junfeng
  0 siblings, 0 replies; 192+ messages in thread
From: Guo, Junfeng @ 2022-10-24  2:10 UTC (permalink / raw)
  To: Ferruh Yigit, Zhang, Qi Z, Wu, Jingjing, Xing, Beilei
  Cc: dev, Li, Xiaoyun, awogbemila, Richardson, Bruce, hemant.agrawal,
	stephen, Xia, Chenbo, Zhang, Helin



> -----Original Message-----
> From: Ferruh Yigit <ferruh.yigit@amd.com>
> Sent: Thursday, October 20, 2022 22:46
> To: Guo, Junfeng <junfeng.guo@intel.com>; Zhang, Qi Z
> <qi.z.zhang@intel.com>; Wu, Jingjing <jingjing.wu@intel.com>; Xing,
> Beilei <beilei.xing@intel.com>
> Cc: dev@dpdk.org; Li, Xiaoyun <xiaoyun.li@intel.com>;
> awogbemila@google.com; Richardson, Bruce
> <bruce.richardson@intel.com>; hemant.agrawal@nxp.com;
> stephen@networkplumber.org; Xia, Chenbo <chenbo.xia@intel.com>;
> Zhang, Helin <helin.zhang@intel.com>
> Subject: Re: [PATCH v6 6/8] net/gve: add support for dev info get and dev
> configure
> 
> On 10/20/2022 11:36 AM, Junfeng Guo wrote:
> 
> >
> > Add dev_ops dev_infos_get.
> > Complete dev_configure with RX offloads configuration.
> >
> > Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com>
> > Signed-off-by: Junfeng Guo <junfeng.guo@intel.com>
> > ---
> >   doc/guides/nics/features/gve.ini |  2 ++
> >   doc/guides/nics/gve.rst          |  1 +
> >   drivers/net/gve/gve_ethdev.c     | 56
> +++++++++++++++++++++++++++++++-
> >   3 files changed, 58 insertions(+), 1 deletion(-)
> >
> > diff --git a/doc/guides/nics/features/gve.ini
> b/doc/guides/nics/features/gve.ini
> > index d1703d8dab..986df7f94a 100644
> > --- a/doc/guides/nics/features/gve.ini
> > +++ b/doc/guides/nics/features/gve.ini
> > @@ -4,8 +4,10 @@
> >   ; Refer to default.ini for the full list of available PMD features.
> >   ;
> >   [Features]
> > +Speed capabilities   = Y
> >   Link status          = Y
> >   MTU update           = Y
> > +RSS hash             = Y
> 
> I think this was added because of 'RTE_ETH_RX_OFFLOAD_RSS_HASH', it
> is
> OK to keep this feature if you add force enabling above offload,
> otherwise please remove the feature.

Sure, will keep this, along with the force-enabling code at the dev configure stage. Thanks!

> 
> >   Linux                = Y
> >   x86-32               = Y
> >   x86-64               = Y
> > diff --git a/doc/guides/nics/gve.rst b/doc/guides/nics/gve.rst
> > index c42ff23841..8c09a5a7fa 100644
> > --- a/doc/guides/nics/gve.rst
> > +++ b/doc/guides/nics/gve.rst
> > @@ -62,6 +62,7 @@ In this release, the GVE PMD provides the basic
> functionality of packet
> >   reception and transmission.
> >   Supported features of the GVE PMD are:
> >
> > +- Receiver Side Scaling (RSS)
> 
> I am not sure if driver can claim this, I can see a RSS hash is provided
> but is it possible to update which hash function to use or update key or
> RETA table to configure which queue packets goes?
> 
> Right now what is RSS calculated on?
> 
> Perpaps RSS support can be documented as limited?
> 
> And not sure if this update belongs this patch, it should be to the one
> that has the datapath.

Same for this, thanks!

> 


^ permalink raw reply	[flat|nested] 192+ messages in thread

* RE: [PATCH v6 8/8] net/gve: add support for Rx/Tx
  2022-10-20 14:47                       ` Ferruh Yigit
@ 2022-10-24  2:10                         ` Guo, Junfeng
  0 siblings, 0 replies; 192+ messages in thread
From: Guo, Junfeng @ 2022-10-24  2:10 UTC (permalink / raw)
  To: Ferruh Yigit, Zhang, Qi Z, Wu, Jingjing, Xing, Beilei
  Cc: dev, Li, Xiaoyun, awogbemila, Richardson, Bruce, hemant.agrawal,
	stephen, Xia, Chenbo, Zhang, Helin



> -----Original Message-----
> From: Ferruh Yigit <ferruh.yigit@amd.com>
> Sent: Thursday, October 20, 2022 22:47
> To: Guo, Junfeng <junfeng.guo@intel.com>; Zhang, Qi Z
> <qi.z.zhang@intel.com>; Wu, Jingjing <jingjing.wu@intel.com>; Xing,
> Beilei <beilei.xing@intel.com>
> Cc: dev@dpdk.org; Li, Xiaoyun <xiaoyun.li@intel.com>;
> awogbemila@google.com; Richardson, Bruce
> <bruce.richardson@intel.com>; hemant.agrawal@nxp.com;
> stephen@networkplumber.org; Xia, Chenbo <chenbo.xia@intel.com>;
> Zhang, Helin <helin.zhang@intel.com>
> Subject: Re: [PATCH v6 8/8] net/gve: add support for Rx/Tx
> 
> On 10/20/2022 11:36 AM, Junfeng Guo wrote:
> 
> >
> > Add Rx/Tx of GQI_QPL queue format and GQI_RDA queue format.
> >
> > Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com>
> > Signed-off-by: Junfeng Guo <junfeng.guo@intel.com>
> 
> <...>
> 
> > +uint16_t
> > +gve_rx_burst(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t
> nb_pkts)
> > +{
> > +       volatile struct gve_rx_desc *rxr, *rxd;
> > +       struct gve_rx_queue *rxq = rx_queue;
> > +       uint16_t rx_id = rxq->rx_tail;
> > +       struct rte_mbuf *rxe;
> > +       uint16_t nb_rx, len;
> > +       uint64_t addr;
> > +       uint16_t i;
> > +
> > +       rxr = rxq->rx_desc_ring;
> > +       nb_rx = 0;
> > +
> > +       for (i = 0; i < nb_pkts; i++) {
> > +               rxd = &rxr[rx_id];
> > +               if (GVE_SEQNO(rxd->flags_seq) != rxq->expected_seqno)
> > +                       break;
> > +
> > +               if (rxd->flags_seq & GVE_RXF_ERR)
> > +                       continue;
> > +
> > +               len = rte_be_to_cpu_16(rxd->len) - GVE_RX_PAD;
> > +               rxe = rxq->sw_ring[rx_id];
> > +               if (rxq->is_gqi_qpl) {
> > +                       addr = (uint64_t)(rxq->qpl->mz->addr) + rx_id * PAGE_SIZE
> + GVE_RX_PAD;
> > +                       rte_memcpy((void *)((size_t)rxe->buf_addr + rxe-
> >data_off),
> > +                                  (void *)(size_t)addr, len);
> > +               }
> > +               rxe->pkt_len = len;
> > +               rxe->data_len = len;
> > +               rxe->port = rxq->port_id;
> > +               rxe->ol_flags = 0;
> > +
> > +               if (rxd->flags_seq & GVE_RXF_TCP)
> > +                       rxe->packet_type |= RTE_PTYPE_L4_TCP;
> > +               if (rxd->flags_seq & GVE_RXF_UDP)
> > +                       rxe->packet_type |= RTE_PTYPE_L4_UDP;
> > +               if (rxd->flags_seq & GVE_RXF_IPV4)
> > +                       rxe->packet_type |= RTE_PTYPE_L3_IPV4;
> > +               if (rxd->flags_seq & GVE_RXF_IPV6)
> > +                       rxe->packet_type |= RTE_PTYPE_L3_IPV6;
> > +
> > +               if (gve_needs_rss(rxd->flags_seq)) {
> > +                       rxe->ol_flags |= RTE_MBUF_F_RX_RSS_HASH;
> > +                       rxe->hash.rss = rte_be_to_cpu_32(rxd->rss_hash);
> 
> You are updating "m->hash.rss" anyway, and if this is without and cost
> you can force enable as done in previous version:
> 'dev->data->dev_conf.rxmode.offloads |=
> RTE_ETH_RX_OFFLOAD_RSS_HASH;'

Yes, it seems RSS is enabled by default with no obvious performance loss,
and there is no RSS init stage. We will also force-enable this offload at the
dev configure stage. Thanks!
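
A minimal sketch of what we have in mind for the configure op, showing only
the RSS part; everything else in gve_dev_configure() stays as it is:

static int
gve_dev_configure(struct rte_eth_dev *dev)
{
	/* The device always delivers the RSS hash, so reflect that in the
	 * effective Rx offload configuration (sketch only).
	 */
	dev->data->dev_conf.rxmode.offloads |= RTE_ETH_RX_OFFLOAD_RSS_HASH;

	return 0;
}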

> 
> <...>
> 
> > +static inline void
> > +gve_free_bulk_mbuf(struct rte_mbuf **txep, int num)
> > +{
> > +       struct rte_mbuf *m, *free[GVE_TX_MAX_FREE_SZ];
> > +       int nb_free = 0;
> > +       int i, s;
> > +
> > +       if (unlikely(num == 0))
> > +               return;
> > +
> > +       /* Find the 1st mbuf which needs to be free */
> > +       for (s = 0; s < num; s++) {
> > +               if (txep[s] != NULL) {
> > +                       m = rte_pktmbuf_prefree_seg(txep[s]);
> > +                       if (m != NULL)
> > +                               break;
> > +                       }
> 
> '}' indentation is wrong.

Thanks for the catch! Will update in the coming version.

> 
> <...>
> 
> > +static inline void
> > +gve_tx_clean_swr_qpl(struct gve_tx_queue *txq)
> > +{
> > +       uint32_t start = txq->sw_ntc;
> > +       uint32_t ntc, nb_clean;
> > +
> > +       ntc = txq->sw_tail;
> > +
> > +       if (ntc == start)
> > +               return;
> > +
> > +       /* if wrap around, free twice. */
> > +       if (ntc < start) {
> > +               nb_clean = txq->nb_tx_desc - start;
> > +               if (nb_clean > GVE_TX_MAX_FREE_SZ)
> > +                       nb_clean = GVE_TX_MAX_FREE_SZ;
> > +               gve_free_bulk_mbuf(&txq->sw_ring[start], nb_clean);
> > +
> > +               txq->sw_nb_free += nb_clean;
> > +               start += nb_clean;
> > +               if (start == txq->nb_tx_desc)
> > +                       start = 0;
> > +               txq->sw_ntc = start;
> > +       }
> > +
> > +       if (ntc > start) {
> 
> may be can drop the 'if' block, since "ntc == start" and "ntc < start"
> cases already covered.

Sure, will drop this 'if' in the coming version. Thanks!

> 
> <...>
> 
> > +uint16_t
> > +gve_tx_burst(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t
> nb_pkts)
> > +{
> > +       struct gve_tx_queue *txq = tx_queue;
> > +
> > +       if (txq->is_gqi_qpl)
> > +               return gve_tx_burst_qpl(tx_queue, tx_pkts, nb_pkts);
> > +
> > +       return gve_tx_burst_ra(tx_queue, tx_pkts, nb_pkts);
> > +}
> > +
> 
> Can there be mix of queue types?
> If only one queue type is supported in specific config, perhaps burst
> function can be set during configuration, to prevent if check on datapath.
> 
> This is optimization and can be done later, it doesn't have to be in the
> set.

Maybe not. There are three queue format types, and we can get the one
actually in use via the adminq in 'priv->queue_format', so there should not
be a mix of queue types. Currently only the GQI_QPL and GQI_RDA queue
formats are supported in the PMD, and only the GQI_QPL queue format is in
use on GCP, since GQI_RDA hasn't been released in production.

Will do some refactoring of the queue types later. Thanks for the advice!

^ permalink raw reply	[flat|nested] 192+ messages in thread

* RE: [PATCH v7 1/8] net/gve/base: introduce base code
  2022-10-21  9:49                         ` Ferruh Yigit
@ 2022-10-24  5:04                           ` Guo, Junfeng
  2022-10-24 10:47                             ` Ferruh Yigit
  0 siblings, 1 reply; 192+ messages in thread
From: Guo, Junfeng @ 2022-10-24  5:04 UTC (permalink / raw)
  To: Ferruh Yigit, Zhang, Qi Z, Wu, Jingjing, Xing, Beilei
  Cc: dev, Li, Xiaoyun, awogbemila, Richardson, Bruce, hemant.agrawal,
	stephen, Xia, Chenbo, Zhang, Helin, Wang, Haiyue



> -----Original Message-----
> From: Ferruh Yigit <ferruh.yigit@amd.com>
> Sent: Friday, October 21, 2022 17:49
> To: Guo, Junfeng <junfeng.guo@intel.com>; Zhang, Qi Z
> <qi.z.zhang@intel.com>; Wu, Jingjing <jingjing.wu@intel.com>; Xing,
> Beilei <beilei.xing@intel.com>
> Cc: dev@dpdk.org; Li, Xiaoyun <xiaoyun.li@intel.com>;
> awogbemila@google.com; Richardson, Bruce
> <bruce.richardson@intel.com>; hemant.agrawal@nxp.com;
> stephen@networkplumber.org; Xia, Chenbo <chenbo.xia@intel.com>;
> Zhang, Helin <helin.zhang@intel.com>; Wang, Haiyue
> <haiyue.wang@intel.com>
> Subject: Re: [PATCH v7 1/8] net/gve/base: introduce base code
> 
> On 10/21/2022 10:19 AM, Junfeng Guo wrote:
> > The following base code is based on Google Virtual Ethernet (gve)
> > driver v1.3.0 under MIT license.
> > - gve_adminq.c
> > - gve_adminq.h
> > - gve_desc.h
> > - gve_desc_dqo.h
> > - gve_register.h
> > - gve.h
> >
> > The original code is in:
> > https://github.com/GoogleCloudPlatform/compute-virtual-ethernet-
> linux/\
> > tree/v1.3.0/google/gve
> >
> > Note that these code are not Intel files and they come from the kernel
> > community. The base code there has the statement of
> > SPDX-License-Identifier: (GPL-2.0 OR MIT). Here we just follow the
> > required MIT license as an exception to DPDK.
> >
> > Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com>
> > Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
> > Signed-off-by: Junfeng Guo <junfeng.guo@intel.com>
> 
> <...>
> 
> > +static bool gve_adminq_wait_for_cmd(struct gve_priv *priv, u32
> prod_cnt)
> > +{
> > +	int i;
> > +
> > +	for (i = 0; i < GVE_MAX_ADMINQ_EVENT_COUNTER_CHECK; i++)
> {
> > +		if (ioread32be(&priv->reg_bar0->adminq_event_counter)
> > +		    == prod_cnt)
> 
> [copy/paste from previous version]
> 
> Syntax, why not move second half of the equation in above line?
> Unless this is coming from google code and updating it brings
> maintenance cost.

Yes, this function is just coming from google code without changes.
So it would be better to keep this unchanged. Thanks!

Hi Ferruh,
Sorry for the late reply to v6, my mistake.
The replies were edited as drafts but never sent out in the end...
Please help review, thanks!

^ permalink raw reply	[flat|nested] 192+ messages in thread

* RE: [PATCH v7 3/8] net/gve: add support for device initialization
  2022-10-21  9:49                         ` Ferruh Yigit
@ 2022-10-24  5:04                           ` Guo, Junfeng
  2022-10-24 10:47                             ` Ferruh Yigit
  0 siblings, 1 reply; 192+ messages in thread
From: Guo, Junfeng @ 2022-10-24  5:04 UTC (permalink / raw)
  To: Ferruh Yigit, Zhang, Qi Z, Wu, Jingjing, Xing, Beilei
  Cc: dev, Li, Xiaoyun, awogbemila, Richardson, Bruce, hemant.agrawal,
	stephen, Xia, Chenbo, Zhang, Helin, Wang, Haiyue



> -----Original Message-----
> From: Ferruh Yigit <ferruh.yigit@amd.com>
> Sent: Friday, October 21, 2022 17:50
> To: Guo, Junfeng <junfeng.guo@intel.com>; Zhang, Qi Z
> <qi.z.zhang@intel.com>; Wu, Jingjing <jingjing.wu@intel.com>; Xing,
> Beilei <beilei.xing@intel.com>
> Cc: dev@dpdk.org; Li, Xiaoyun <xiaoyun.li@intel.com>;
> awogbemila@google.com; Richardson, Bruce
> <bruce.richardson@intel.com>; hemant.agrawal@nxp.com;
> stephen@networkplumber.org; Xia, Chenbo <chenbo.xia@intel.com>;
> Zhang, Helin <helin.zhang@intel.com>; Wang, Haiyue
> <haiyue.wang@intel.com>
> Subject: Re: [PATCH v7 3/8] net/gve: add support for device initialization
> 
> On 10/21/2022 10:19 AM, Junfeng Guo wrote:
> 
> >
> > Support device init and add following devops skeleton:
> >   - dev_configure
> >   - dev_start
> >   - dev_stop
> >   - dev_close
> >
> > Note that build system (including doc) is also added in this patch.
> >
> > Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
> > Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com>
> > Signed-off-by: Junfeng Guo <junfeng.guo@intel.com>
> 
> <...>
> 
> > +static int
> > +gve_dev_init(struct rte_eth_dev *eth_dev)
> > +{
> > +       struct gve_priv *priv = eth_dev->data->dev_private;
> > +       int max_tx_queues, max_rx_queues;
> > +       struct rte_pci_device *pci_dev;
> > +       struct gve_registers *reg_bar;
> > +       rte_be32_t *db_bar;
> > +       int err;
> > +
> > +       eth_dev->dev_ops = &gve_eth_dev_ops;
> > +
> > +       if (rte_eal_process_type() != RTE_PROC_PRIMARY)
> > +               return 0;
> > +
> > +       pci_dev = RTE_DEV_TO_PCI(eth_dev->device);
> > +
> > +       reg_bar = pci_dev->mem_resource[GVE_REG_BAR].addr;
> > +       if (!reg_bar) {
> > +               PMD_DRV_LOG(ERR, "Failed to map pci bar!");
> > +               return -ENOMEM;
> > +       }
> > +
> > +       db_bar = pci_dev->mem_resource[GVE_DB_BAR].addr;
> > +       if (!db_bar) {
> > +               PMD_DRV_LOG(ERR, "Failed to map doorbell bar!");
> > +               return -ENOMEM;
> > +       }
> > +
> > +       gve_write_version(&reg_bar->driver_version);
> > +       /* Get max queues to alloc etherdev */
> > +       max_tx_queues = ioread32be(&reg_bar->max_tx_queues);
> > +       max_rx_queues = ioread32be(&reg_bar->max_rx_queues);
> > +
> > +       priv->reg_bar0 = reg_bar;
> > +       priv->db_bar2 = db_bar;
> > +       priv->pci_dev = pci_dev;
> > +       priv->state_flags = 0x0;
> > +
> > +       priv->max_nb_txq = max_tx_queues;
> > +       priv->max_nb_rxq = max_rx_queues;
> > +
> > +       err = gve_init_priv(priv, false);
> > +       if (err)
> > +               return err;
> > +
> > +       eth_dev->data->mac_addrs = &priv->dev_addr;
> > +
> 
> [copy/paste from previous version]
> 
> What is the value in 'priv->dev_addr'?
> Even allocating memory for 'eth_dev->data->mac_addrs' removed or not,
> as
> we discussed, independent from it, need to set a valid value to
> 'priv->dev_addr'.

The value in 'priv->dev_addr' is the 'real' MAC address of the gvnic port.
So I suppose there is no need to set a default valid one, since we can
get it from the backend in gve_adminq_describe_device(priv).
Thanks!

> 
> 


^ permalink raw reply	[flat|nested] 192+ messages in thread

* RE: [PATCH v7 5/8] net/gve: add support for MTU setting
  2022-10-21  9:50                         ` Ferruh Yigit
@ 2022-10-24  5:04                           ` Guo, Junfeng
  2022-10-24 10:47                             ` Ferruh Yigit
  0 siblings, 1 reply; 192+ messages in thread
From: Guo, Junfeng @ 2022-10-24  5:04 UTC (permalink / raw)
  To: Ferruh Yigit, Zhang, Qi Z, Wu, Jingjing, Xing, Beilei
  Cc: dev, Li, Xiaoyun, awogbemila, Richardson, Bruce, hemant.agrawal,
	stephen, Xia, Chenbo, Zhang, Helin



> -----Original Message-----
> From: Ferruh Yigit <ferruh.yigit@amd.com>
> Sent: Friday, October 21, 2022 17:50
> To: Guo, Junfeng <junfeng.guo@intel.com>; Zhang, Qi Z
> <qi.z.zhang@intel.com>; Wu, Jingjing <jingjing.wu@intel.com>; Xing,
> Beilei <beilei.xing@intel.com>
> Cc: dev@dpdk.org; Li, Xiaoyun <xiaoyun.li@intel.com>;
> awogbemila@google.com; Richardson, Bruce
> <bruce.richardson@intel.com>; hemant.agrawal@nxp.com;
> stephen@networkplumber.org; Xia, Chenbo <chenbo.xia@intel.com>;
> Zhang, Helin <helin.zhang@intel.com>
> Subject: Re: [PATCH v7 5/8] net/gve: add support for MTU setting
> 
> On 10/21/2022 10:19 AM, Junfeng Guo wrote:
> 
> >
> > Support dev_ops mtu_set.
> >
> > Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com>
> > Signed-off-by: Junfeng Guo <junfeng.guo@intel.com>
> 
> <...>
> 
> > +static int
> > +gve_dev_mtu_set(struct rte_eth_dev *dev, uint16_t mtu)
> > +{
> > +       struct gve_priv *priv = dev->data->dev_private;
> > +       int err;
> > +
> > +       if (mtu < RTE_ETHER_MIN_MTU || mtu > priv->max_mtu) {
> > +               PMD_DRV_LOG(ERR, "MIN MTU is %u, MAX MTU is %u",
> > +                           RTE_ETHER_MIN_MTU, priv->max_mtu);
> > +               return -EINVAL;
> > +       }
> > +
> > +       /* mtu setting is forbidden if port is start */
> > +       if (dev->data->dev_started) {
> > +               PMD_DRV_LOG(ERR, "Port must be stopped before
> configuration");
> > +               return -EBUSY;
> > +       }
> > +
> > +       err = gve_adminq_set_mtu(priv, mtu);
> > +       if (err) {
> > +               PMD_DRV_LOG(ERR, "Failed to set mtu as %u err = %d", mtu,
> err);
> > +               return err;
> > +       }
> > +
> > +       return 0;
> > +}
> > +
> 
> [copy/paste from previous version]
> 
> configure() (gve_dev_configure()) also get 'mtu' as user config
> ('eth_conf->rxmode.mtu') which is ignored right now,
> 
> since there is 'gve_adminq_set_mtu()' command already what do you
> think
> to use it within 'gve_dev_configure()'?

There may be issues with setting the MTU from 'eth_conf->rxmode.mtu',
so it is better to keep it ignored at this stage.
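
For reference, the reviewer's suggestion would roughly take the shape below
inside gve_dev_configure(); only the MTU handling is shown, and the helper
name, placement and error handling are assumptions rather than the actual
patch. gve_adminq_set_mtu() and priv->max_mtu come from the quoted code.

/* Sketch only: honor the user-provided rxmode.mtu at configure time by
 * reusing the existing adminq command and the max MTU reported by the device.
 */
static int
gve_configure_mtu(struct rte_eth_dev *dev)
{
	struct gve_priv *priv = dev->data->dev_private;
	uint32_t mtu = dev->data->dev_conf.rxmode.mtu;
	int err;

	if (mtu < RTE_ETHER_MIN_MTU || mtu > priv->max_mtu)
		return -EINVAL;

	err = gve_adminq_set_mtu(priv, mtu);
	if (err)
		PMD_DRV_LOG(ERR, "Failed to set mtu %u, err = %d", mtu, err);

	return err;
}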


^ permalink raw reply	[flat|nested] 192+ messages in thread

* RE: [PATCH v7 6/8] net/gve: add support for dev info get and dev configure
  2022-10-21  9:51                         ` Ferruh Yigit
@ 2022-10-24  5:04                           ` Guo, Junfeng
  2022-10-24 10:48                             ` Ferruh Yigit
  0 siblings, 1 reply; 192+ messages in thread
From: Guo, Junfeng @ 2022-10-24  5:04 UTC (permalink / raw)
  To: Ferruh Yigit, Zhang, Qi Z, Wu, Jingjing, ferruh.yigit, Xing, Beilei
  Cc: dev, Li, Xiaoyun, awogbemila, Richardson, Bruce, hemant.agrawal,
	stephen, Xia, Chenbo, Zhang, Helin



> -----Original Message-----
> From: Ferruh Yigit <ferruh.yigit@amd.com>
> Sent: Friday, October 21, 2022 17:51
> To: Guo, Junfeng <junfeng.guo@intel.com>; Zhang, Qi Z
> <qi.z.zhang@intel.com>; Wu, Jingjing <jingjing.wu@intel.com>;
> ferruh.yigit@xilinx.com; Xing, Beilei <beilei.xing@intel.com>
> Cc: dev@dpdk.org; Li, Xiaoyun <xiaoyun.li@intel.com>;
> awogbemila@google.com; Richardson, Bruce
> <bruce.richardson@intel.com>; hemant.agrawal@nxp.com;
> stephen@networkplumber.org; Xia, Chenbo <chenbo.xia@intel.com>;
> Zhang, Helin <helin.zhang@intel.com>
> Subject: Re: [PATCH v7 6/8] net/gve: add support for dev info get and dev
> configure
> 
> On 10/21/2022 10:19 AM, Junfeng Guo wrote:
> 
> >
> > Add dev_ops dev_infos_get.
> > Complete dev_configure with RX offloads force enabling.
> >
> > Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com>
> > Signed-off-by: Junfeng Guo <junfeng.guo@intel.com>
> 
> <...>
> 
> > --- a/doc/guides/nics/gve.rst
> > +++ b/doc/guides/nics/gve.rst
> > @@ -62,6 +62,7 @@ In this release, the GVE PMD provides the basic
> functionality of packet
> >   reception and transmission.
> >   Supported features of the GVE PMD are:
> >
> > +- Receiver Side Scaling (RSS)
> 
> [copy/paste from previous version]
> 
> I am not sure if driver can claim this, I can see a RSS hash is provided
> but is it possible to update which hash function to use or update key or
> RETA table to configure which queue packets goes?
> 
> Right now what is RSS calculated on?
> 
> Perhaps RSS support can be documented as limited?
> 
> And not sure if this update belongs this patch, it should be to the one
> that has the datapath.

It looks like RSS is enabled by default, and there is no RSS init API.
So I just added back the force-enabled RSS offloading code with the
corresponding commit message, and the feature list remains unchanged.
Thanks!
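
For context, the force-enabled RSS hash offload mentioned above boils down to
something like the fragment below in gve_dev_configure(); treat it as a sketch,
since the wording of the final patch may differ.

/* Sketch only: the device computes an RSS hash by default, so surface it to
 * the application by forcing the RSS_HASH Rx offload when RSS mq mode is set.
 */
if (dev->data->dev_conf.rxmode.mq_mode & RTE_ETH_MQ_RX_RSS_FLAG)
	dev->data->dev_conf.rxmode.offloads |= RTE_ETH_RX_OFFLOAD_RSS_HASH;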

^ permalink raw reply	[flat|nested] 192+ messages in thread

* RE: [PATCH v7 8/8] net/gve: add support for Rx/Tx
  2022-10-21  9:52                         ` Ferruh Yigit
@ 2022-10-24  5:04                           ` Guo, Junfeng
  2022-10-24 10:50                             ` Ferruh Yigit
  0 siblings, 1 reply; 192+ messages in thread
From: Guo, Junfeng @ 2022-10-24  5:04 UTC (permalink / raw)
  To: Ferruh Yigit, Zhang, Qi Z, Wu, Jingjing, ferruh.yigit, Xing, Beilei
  Cc: dev, Li, Xiaoyun, awogbemila, Richardson, Bruce, hemant.agrawal,
	stephen, Xia, Chenbo, Zhang, Helin



> -----Original Message-----
> From: Ferruh Yigit <ferruh.yigit@amd.com>
> Sent: Friday, October 21, 2022 17:52
> To: Guo, Junfeng <junfeng.guo@intel.com>; Zhang, Qi Z
> <qi.z.zhang@intel.com>; Wu, Jingjing <jingjing.wu@intel.com>;
> ferruh.yigit@xilinx.com; Xing, Beilei <beilei.xing@intel.com>
> Cc: dev@dpdk.org; Li, Xiaoyun <xiaoyun.li@intel.com>;
> awogbemila@google.com; Richardson, Bruce
> <bruce.richardson@intel.com>; hemant.agrawal@nxp.com;
> stephen@networkplumber.org; Xia, Chenbo <chenbo.xia@intel.com>;
> Zhang, Helin <helin.zhang@intel.com>
> Subject: Re: [PATCH v7 8/8] net/gve: add support for Rx/Tx
> 
> On 10/21/2022 10:19 AM, Junfeng Guo wrote:
> 
> >
> > Add Rx/Tx of GQI_QPL queue format and GQI_RDA queue format.
> >
> > Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com>
> > Signed-off-by: Junfeng Guo <junfeng.guo@intel.com>
> 
> <...>
> 
> > +
> > +static inline void
> > +gve_tx_clean_swr_qpl(struct gve_tx_queue *txq)
> > +{
> > +       uint32_t start = txq->sw_ntc;
> > +       uint32_t ntc, nb_clean;
> > +
> > +       ntc = txq->sw_tail;
> > +
> > +       if (ntc == start)
> > +               return;
> > +
> > +       /* if wrap around, free twice. */
> > +       if (ntc < start) {
> > +               nb_clean = txq->nb_tx_desc - start;
> > +               if (nb_clean > GVE_TX_MAX_FREE_SZ)
> > +                       nb_clean = GVE_TX_MAX_FREE_SZ;
> > +               gve_free_bulk_mbuf(&txq->sw_ring[start], nb_clean);
> > +
> > +               txq->sw_nb_free += nb_clean;
> > +               start += nb_clean;
> > +               if (start == txq->nb_tx_desc)
> > +                       start = 0;
> > +               txq->sw_ntc = start;
> > +       }
> > +
> > +       if (ntc > start) {
> > +               nb_clean = ntc - start;
> > +               if (nb_clean > GVE_TX_MAX_FREE_SZ)
> > +                       nb_clean = GVE_TX_MAX_FREE_SZ;
> > +               gve_free_bulk_mbuf(&txq->sw_ring[start], nb_clean);
> > +               txq->sw_nb_free += nb_clean;
> > +               start += nb_clean;
> > +               txq->sw_ntc = start;
> > +       }
> > +}
> 
> [copy/paste from previous version]
> 
> may be can drop the 'if' block, since "ntc == start" and "ntc < start"
> cases already covered.

Yes, this 'if' block is dropped in v7 as suggested. Thanks!

> 
> <...>
> 
> > +uint16_t
> > +gve_tx_burst(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t
> nb_pkts)
> > +{
> > +       struct gve_tx_queue *txq = tx_queue;
> > +
> > +       if (txq->is_gqi_qpl)
> > +               return gve_tx_burst_qpl(tx_queue, tx_pkts, nb_pkts);
> > +
> > +       return gve_tx_burst_ra(tx_queue, tx_pkts, nb_pkts);
> > +}
> > +
> 
> [copy/paste from previous version]
> 
> Can there be mix of queue types?
> If only one queue type is supported in specific config, perhaps burst
> function can be set during configuration, to prevent if check on datapath.
> 
> This is optimization and can be done later, it doesn't have to be in the
> set.

The exact queue type can be fetched from the backend via the adminq
in priv->queue_format, so there won't be a mix of queue types.
Currently, only the GQI_QPL and GQI_RDA queue formats are supported
in the PMD. Also, only the GQI_QPL queue format is in use on GCP since
GQI_RDA hasn't been released in production.
This part of the code will be optimized/refactored later when the
DQO_RDA queue type is involved. Thanks!


^ permalink raw reply	[flat|nested] 192+ messages in thread

* Re: [PATCH v7 3/8] net/gve: add support for device initialization
  2022-10-24  5:04                           ` Guo, Junfeng
@ 2022-10-24 10:47                             ` Ferruh Yigit
  2022-10-24 13:22                               ` Guo, Junfeng
  0 siblings, 1 reply; 192+ messages in thread
From: Ferruh Yigit @ 2022-10-24 10:47 UTC (permalink / raw)
  To: Guo, Junfeng, Zhang, Qi Z, Wu, Jingjing, Xing, Beilei
  Cc: dev, Li, Xiaoyun, awogbemila, Richardson, Bruce, hemant.agrawal,
	stephen, Xia, Chenbo, Zhang, Helin, Wang, Haiyue

On 10/24/2022 6:04 AM, Guo, Junfeng wrote:
> 
> 
>> -----Original Message-----
>> From: Ferruh Yigit <ferruh.yigit@amd.com>
>> Sent: Friday, October 21, 2022 17:50
>> To: Guo, Junfeng <junfeng.guo@intel.com>; Zhang, Qi Z
>> <qi.z.zhang@intel.com>; Wu, Jingjing <jingjing.wu@intel.com>; Xing,
>> Beilei <beilei.xing@intel.com>
>> Cc: dev@dpdk.org; Li, Xiaoyun <xiaoyun.li@intel.com>;
>> awogbemila@google.com; Richardson, Bruce
>> <bruce.richardson@intel.com>; hemant.agrawal@nxp.com;
>> stephen@networkplumber.org; Xia, Chenbo <chenbo.xia@intel.com>;
>> Zhang, Helin <helin.zhang@intel.com>; Wang, Haiyue
>> <haiyue.wang@intel.com>
>> Subject: Re: [PATCH v7 3/8] net/gve: add support for device initialization
>>
>> On 10/21/2022 10:19 AM, Junfeng Guo wrote:
>>
>>>
>>> Support device init and add following devops skeleton:
>>>    - dev_configure
>>>    - dev_start
>>>    - dev_stop
>>>    - dev_close
>>>
>>> Note that build system (including doc) is also added in this patch.
>>>
>>> Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
>>> Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com>
>>> Signed-off-by: Junfeng Guo <junfeng.guo@intel.com>
>>
>> <...>
>>
>>> +static int
>>> +gve_dev_init(struct rte_eth_dev *eth_dev)
>>> +{
>>> +       struct gve_priv *priv = eth_dev->data->dev_private;
>>> +       int max_tx_queues, max_rx_queues;
>>> +       struct rte_pci_device *pci_dev;
>>> +       struct gve_registers *reg_bar;
>>> +       rte_be32_t *db_bar;
>>> +       int err;
>>> +
>>> +       eth_dev->dev_ops = &gve_eth_dev_ops;
>>> +
>>> +       if (rte_eal_process_type() != RTE_PROC_PRIMARY)
>>> +               return 0;
>>> +
>>> +       pci_dev = RTE_DEV_TO_PCI(eth_dev->device);
>>> +
>>> +       reg_bar = pci_dev->mem_resource[GVE_REG_BAR].addr;
>>> +       if (!reg_bar) {
>>> +               PMD_DRV_LOG(ERR, "Failed to map pci bar!");
>>> +               return -ENOMEM;
>>> +       }
>>> +
>>> +       db_bar = pci_dev->mem_resource[GVE_DB_BAR].addr;
>>> +       if (!db_bar) {
>>> +               PMD_DRV_LOG(ERR, "Failed to map doorbell bar!");
>>> +               return -ENOMEM;
>>> +       }
>>> +
>>> +       gve_write_version(&reg_bar->driver_version);
>>> +       /* Get max queues to alloc etherdev */
>>> +       max_tx_queues = ioread32be(&reg_bar->max_tx_queues);
>>> +       max_rx_queues = ioread32be(&reg_bar->max_rx_queues);
>>> +
>>> +       priv->reg_bar0 = reg_bar;
>>> +       priv->db_bar2 = db_bar;
>>> +       priv->pci_dev = pci_dev;
>>> +       priv->state_flags = 0x0;
>>> +
>>> +       priv->max_nb_txq = max_tx_queues;
>>> +       priv->max_nb_rxq = max_rx_queues;
>>> +
>>> +       err = gve_init_priv(priv, false);
>>> +       if (err)
>>> +               return err;
>>> +
>>> +       eth_dev->data->mac_addrs = &priv->dev_addr;
>>> +
>>
>> [copy/paste from previous version]
>>
>> What is the value in 'priv->dev_addr'?
>> Even allocating memory for 'eth_dev->data->mac_addrs' removed or not,
>> as
>> we discussed, independent from it, need to set a valid value to
>> 'priv->dev_addr'.
> 
> The value in 'priv->dev_addr' is the 'real' MAC address of the gvnic port.
> So I suppose there is no need to set a default valid one, since we can
> get it from the backend in gve_adminq_describe_device(priv).

Ack, thanks for clarification.

In 'gve_adminq_describe_device()', RTE_ETHER_ADDR_PRT_FMT &
RTE_ETHER_ADDR_BYTES can be used for the log; I will comment on the patch.


^ permalink raw reply	[flat|nested] 192+ messages in thread

* Re: [PATCH v7 1/8] net/gve/base: introduce base code
  2022-10-24  5:04                           ` Guo, Junfeng
@ 2022-10-24 10:47                             ` Ferruh Yigit
  2022-10-24 13:23                               ` Guo, Junfeng
  0 siblings, 1 reply; 192+ messages in thread
From: Ferruh Yigit @ 2022-10-24 10:47 UTC (permalink / raw)
  To: Guo, Junfeng, Zhang, Qi Z, Wu, Jingjing, Xing, Beilei
  Cc: dev, Li, Xiaoyun, awogbemila, Richardson, Bruce, hemant.agrawal,
	stephen, Xia, Chenbo, Zhang, Helin, Wang, Haiyue

On 10/24/2022 6:04 AM, Guo, Junfeng wrote:
> 
> 
>> -----Original Message-----
>> From: Ferruh Yigit <ferruh.yigit@amd.com>
>> Sent: Friday, October 21, 2022 17:49
>> To: Guo, Junfeng <junfeng.guo@intel.com>; Zhang, Qi Z
>> <qi.z.zhang@intel.com>; Wu, Jingjing <jingjing.wu@intel.com>; Xing,
>> Beilei <beilei.xing@intel.com>
>> Cc: dev@dpdk.org; Li, Xiaoyun <xiaoyun.li@intel.com>;
>> awogbemila@google.com; Richardson, Bruce
>> <bruce.richardson@intel.com>; hemant.agrawal@nxp.com;
>> stephen@networkplumber.org; Xia, Chenbo <chenbo.xia@intel.com>;
>> Zhang, Helin <helin.zhang@intel.com>; Wang, Haiyue
>> <haiyue.wang@intel.com>
>> Subject: Re: [PATCH v7 1/8] net/gve/base: introduce base code
>>
>> On 10/21/2022 10:19 AM, Junfeng Guo wrote:
>>> The following base code is based on Google Virtual Ethernet (gve)
>>> driver v1.3.0 under MIT license.
>>> - gve_adminq.c
>>> - gve_adminq.h
>>> - gve_desc.h
>>> - gve_desc_dqo.h
>>> - gve_register.h
>>> - gve.h
>>>
>>> The original code is in:
>>> https://github.com/GoogleCloudPlatform/compute-virtual-ethernet-
>> linux/\
>>> tree/v1.3.0/google/gve
>>>
>>> Note that these code are not Intel files and they come from the kernel
>>> community. The base code there has the statement of
>>> SPDX-License-Identifier: (GPL-2.0 OR MIT). Here we just follow the
>>> required MIT license as an exception to DPDK.
>>>
>>> Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com>
>>> Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
>>> Signed-off-by: Junfeng Guo <junfeng.guo@intel.com>
>>
>> <...>
>>
>>> +static bool gve_adminq_wait_for_cmd(struct gve_priv *priv, u32
>> prod_cnt)
>>> +{
>>> +	int i;
>>> +
>>> +	for (i = 0; i < GVE_MAX_ADMINQ_EVENT_COUNTER_CHECK; i++)
>> {
>>> +		if (ioread32be(&priv->reg_bar0->adminq_event_counter)
>>> +		    == prod_cnt)
>>
>> [copy/paste from previous version]
>>
>> Syntax, why not move second half of the equation in above line?
>> Unless this is coming from google code and updating it brings
>> maintenance cost.
> 
> Yes, this function is just coming from google code without changes.
> So it would be better to keep this unchanged. Thanks!
> 

ack


^ permalink raw reply	[flat|nested] 192+ messages in thread

* Re: [PATCH v7 5/8] net/gve: add support for MTU setting
  2022-10-24  5:04                           ` Guo, Junfeng
@ 2022-10-24 10:47                             ` Ferruh Yigit
  2022-10-24 13:23                               ` Guo, Junfeng
  0 siblings, 1 reply; 192+ messages in thread
From: Ferruh Yigit @ 2022-10-24 10:47 UTC (permalink / raw)
  To: Guo, Junfeng, Zhang, Qi Z, Wu, Jingjing, Xing, Beilei
  Cc: dev, Li, Xiaoyun, awogbemila, Richardson, Bruce, hemant.agrawal,
	stephen, Xia, Chenbo, Zhang, Helin

On 10/24/2022 6:04 AM, Guo, Junfeng wrote:
> 
> 
>> -----Original Message-----
>> From: Ferruh Yigit <ferruh.yigit@amd.com>
>> Sent: Friday, October 21, 2022 17:50
>> To: Guo, Junfeng <junfeng.guo@intel.com>; Zhang, Qi Z
>> <qi.z.zhang@intel.com>; Wu, Jingjing <jingjing.wu@intel.com>; Xing,
>> Beilei <beilei.xing@intel.com>
>> Cc: dev@dpdk.org; Li, Xiaoyun <xiaoyun.li@intel.com>;
>> awogbemila@google.com; Richardson, Bruce
>> <bruce.richardson@intel.com>; hemant.agrawal@nxp.com;
>> stephen@networkplumber.org; Xia, Chenbo <chenbo.xia@intel.com>;
>> Zhang, Helin <helin.zhang@intel.com>
>> Subject: Re: [PATCH v7 5/8] net/gve: add support for MTU setting
>>
>> On 10/21/2022 10:19 AM, Junfeng Guo wrote:
>>
>>>
>>> Support dev_ops mtu_set.
>>>
>>> Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com>
>>> Signed-off-by: Junfeng Guo <junfeng.guo@intel.com>
>>
>> <...>
>>
>>> +static int
>>> +gve_dev_mtu_set(struct rte_eth_dev *dev, uint16_t mtu)
>>> +{
>>> +       struct gve_priv *priv = dev->data->dev_private;
>>> +       int err;
>>> +
>>> +       if (mtu < RTE_ETHER_MIN_MTU || mtu > priv->max_mtu) {
>>> +               PMD_DRV_LOG(ERR, "MIN MTU is %u, MAX MTU is %u",
>>> +                           RTE_ETHER_MIN_MTU, priv->max_mtu);
>>> +               return -EINVAL;
>>> +       }
>>> +
>>> +       /* mtu setting is forbidden if port is start */
>>> +       if (dev->data->dev_started) {
>>> +               PMD_DRV_LOG(ERR, "Port must be stopped before
>> configuration");
>>> +               return -EBUSY;
>>> +       }
>>> +
>>> +       err = gve_adminq_set_mtu(priv, mtu);
>>> +       if (err) {
>>> +               PMD_DRV_LOG(ERR, "Failed to set mtu as %u err = %d", mtu,
>> err);
>>> +               return err;
>>> +       }
>>> +
>>> +       return 0;
>>> +}
>>> +
>>
>> [copy/paste from previous version]
>>
>> configure() (gve_dev_configure()) also get 'mtu' as user config
>> ('eth_conf->rxmode.mtu') which is ignored right now,
>>
>> since there is 'gve_adminq_set_mtu()' command already what do you
>> think
>> to use it within 'gve_dev_configure()'?
> 
> There may be issues with setting the MTU from 'eth_conf->rxmode.mtu',
> so it is better to keep it ignored at this stage.
> 

What do you mean by issues?

'eth_conf->rxmode.mtu' is a user-provided config parameter, so a user may 
prefer to provide this value, never call 'rte_eth_dev_set_mtu()' at all, 
and still expect the correct MTU value.
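
To illustrate, from the application side the MTU can arrive purely through
rte_eth_dev_configure(), with no rte_eth_dev_set_mtu() call at all; a minimal
example, where the port id, queue counts and helper name are placeholders:

#include <rte_ethdev.h>

/* The PMD is expected to apply conf.rxmode.mtu while configuring the port. */
static int
configure_port_with_mtu(uint16_t port_id, uint16_t mtu)
{
	struct rte_eth_conf conf = {
		.rxmode = { .mtu = mtu },
	};

	return rte_eth_dev_configure(port_id, 1, 1, &conf);
}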

^ permalink raw reply	[flat|nested] 192+ messages in thread

* Re: [PATCH v7 6/8] net/gve: add support for dev info get and dev configure
  2022-10-24  5:04                           ` Guo, Junfeng
@ 2022-10-24 10:48                             ` Ferruh Yigit
  2022-10-24 13:23                               ` Guo, Junfeng
  0 siblings, 1 reply; 192+ messages in thread
From: Ferruh Yigit @ 2022-10-24 10:48 UTC (permalink / raw)
  To: Guo, Junfeng, Zhang, Qi Z, Wu, Jingjing, Xing, Beilei
  Cc: dev, Li, Xiaoyun, awogbemila, Richardson, Bruce, hemant.agrawal,
	stephen, Xia, Chenbo, Zhang, Helin

On 10/24/2022 6:04 AM, Guo, Junfeng wrote:
> 
> 
>> -----Original Message-----
>> From: Ferruh Yigit <ferruh.yigit@amd.com>
>> Sent: Friday, October 21, 2022 17:51
>> To: Guo, Junfeng <junfeng.guo@intel.com>; Zhang, Qi Z
>> <qi.z.zhang@intel.com>; Wu, Jingjing <jingjing.wu@intel.com>;
>> ferruh.yigit@xilinx.com; Xing, Beilei <beilei.xing@intel.com>
>> Cc: dev@dpdk.org; Li, Xiaoyun <xiaoyun.li@intel.com>;
>> awogbemila@google.com; Richardson, Bruce
>> <bruce.richardson@intel.com>; hemant.agrawal@nxp.com;
>> stephen@networkplumber.org; Xia, Chenbo <chenbo.xia@intel.com>;
>> Zhang, Helin <helin.zhang@intel.com>
>> Subject: Re: [PATCH v7 6/8] net/gve: add support for dev info get and dev
>> configure
>>
>> On 10/21/2022 10:19 AM, Junfeng Guo wrote:
>>
>>>
>>> Add dev_ops dev_infos_get.
>>> Complete dev_configure with RX offloads force enabling.
>>>
>>> Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com>
>>> Signed-off-by: Junfeng Guo <junfeng.guo@intel.com>
>>
>> <...>
>>
>>> --- a/doc/guides/nics/gve.rst
>>> +++ b/doc/guides/nics/gve.rst
>>> @@ -62,6 +62,7 @@ In this release, the GVE PMD provides the basic
>> functionality of packet
>>>    reception and transmission.
>>>    Supported features of the GVE PMD are:
>>>
>>> +- Receiver Side Scaling (RSS)
>>
>> [copy/paste from previous version]
>>
>> I am not sure if driver can claim this, I can see a RSS hash is provided
>> but is it possible to update which hash function to use or update key or
>> RETA table to configure which queue packets goes?
>>
>> Right now what is RSS calculated on?
>>
>> Perhaps RSS support can be documented as limited?
>>
>> And not sure if this update belongs this patch, it should be to the one
>> that has the datapath.
> 
> It looks like RSS is enabled by default, and there is no RSS init API.
> So I just added back the force-enabled RSS offloading code with the
> corresponding commit message, and the feature list remains unchanged.

There is a difference between RSS and RSS hash. What is force-enabled is "RSS 
hash", where the device-calculated hash value is shared with the application 
in case the application wants to reuse this value for some reason.

But for RSS support, a set of configuration required by DPDK 
seems to be missing, as mentioned above: configuring the RSS hash function 
(on which part of the packet the hash is calculated), or RETA table updates 
so the app can select which packets go to which queue, etc.

Is it at least possible to document what the existing configuration is?

Because of the missing configuration support, I don't think it is correct to 
document RSS as supported; can you please update it to say something 
like limited support exists with the default config.
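
To make the missing pieces concrete, full RSS support would let an application
do something like the calls below; the port id, queue count and the 128-entry
RETA size are illustrative assumptions, not values reported by gve.

#include <rte_ethdev.h>

/* Choose which packet fields the hash is computed on, then steer each hash
 * bucket to a queue via the redirection table.
 */
static int
app_rss_config(uint16_t port_id, uint16_t nb_queues)
{
	struct rte_eth_rss_conf rss_conf = {
		.rss_key = NULL,	/* keep the device default key */
		.rss_hf = RTE_ETH_RSS_IP | RTE_ETH_RSS_TCP,
	};
	struct rte_eth_rss_reta_entry64 reta[128 / RTE_ETH_RETA_GROUP_SIZE] = {0};
	uint16_t i;
	int ret;

	ret = rte_eth_dev_rss_hash_update(port_id, &rss_conf);
	if (ret != 0)
		return ret;

	for (i = 0; i < 128; i++) {
		reta[i / RTE_ETH_RETA_GROUP_SIZE].mask |=
			1ULL << (i % RTE_ETH_RETA_GROUP_SIZE);
		reta[i / RTE_ETH_RETA_GROUP_SIZE].reta[i % RTE_ETH_RETA_GROUP_SIZE] =
			i % nb_queues;
	}

	return rte_eth_dev_rss_reta_update(port_id, reta, 128);
}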


^ permalink raw reply	[flat|nested] 192+ messages in thread

* Re: [PATCH v7 8/8] net/gve: add support for Rx/Tx
  2022-10-24  5:04                           ` Guo, Junfeng
@ 2022-10-24 10:50                             ` Ferruh Yigit
  2022-10-24 13:25                               ` Guo, Junfeng
  0 siblings, 1 reply; 192+ messages in thread
From: Ferruh Yigit @ 2022-10-24 10:50 UTC (permalink / raw)
  To: Guo, Junfeng, Zhang, Qi Z, Wu, Jingjing, Xing, Beilei
  Cc: dev, Li, Xiaoyun, awogbemila, Richardson, Bruce, hemant.agrawal,
	stephen, Xia, Chenbo, Zhang, Helin

On 10/24/2022 6:04 AM, Guo, Junfeng wrote:
> 
> 
>> -----Original Message-----
>> From: Ferruh Yigit <ferruh.yigit@amd.com>
>> Sent: Friday, October 21, 2022 17:52
>> To: Guo, Junfeng <junfeng.guo@intel.com>; Zhang, Qi Z
>> <qi.z.zhang@intel.com>; Wu, Jingjing <jingjing.wu@intel.com>;
>> ferruh.yigit@xilinx.com; Xing, Beilei <beilei.xing@intel.com>
>> Cc: dev@dpdk.org; Li, Xiaoyun <xiaoyun.li@intel.com>;
>> awogbemila@google.com; Richardson, Bruce
>> <bruce.richardson@intel.com>; hemant.agrawal@nxp.com;
>> stephen@networkplumber.org; Xia, Chenbo <chenbo.xia@intel.com>;
>> Zhang, Helin <helin.zhang@intel.com>
>> Subject: Re: [PATCH v7 8/8] net/gve: add support for Rx/Tx
>>
>> On 10/21/2022 10:19 AM, Junfeng Guo wrote:
>>
>>>
>>> Add Rx/Tx of GQI_QPL queue format and GQI_RDA queue format.
>>>
>>> Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com>
>>> Signed-off-by: Junfeng Guo <junfeng.guo@intel.com>
>>
>> <...>
>>
>>> +
>>> +static inline void
>>> +gve_tx_clean_swr_qpl(struct gve_tx_queue *txq)
>>> +{
>>> +       uint32_t start = txq->sw_ntc;
>>> +       uint32_t ntc, nb_clean;
>>> +
>>> +       ntc = txq->sw_tail;
>>> +
>>> +       if (ntc == start)
>>> +               return;
>>> +
>>> +       /* if wrap around, free twice. */
>>> +       if (ntc < start) {
>>> +               nb_clean = txq->nb_tx_desc - start;
>>> +               if (nb_clean > GVE_TX_MAX_FREE_SZ)
>>> +                       nb_clean = GVE_TX_MAX_FREE_SZ;
>>> +               gve_free_bulk_mbuf(&txq->sw_ring[start], nb_clean);
>>> +
>>> +               txq->sw_nb_free += nb_clean;
>>> +               start += nb_clean;
>>> +               if (start == txq->nb_tx_desc)
>>> +                       start = 0;
>>> +               txq->sw_ntc = start;
>>> +       }
>>> +
>>> +       if (ntc > start) {
>>> +               nb_clean = ntc - start;
>>> +               if (nb_clean > GVE_TX_MAX_FREE_SZ)
>>> +                       nb_clean = GVE_TX_MAX_FREE_SZ;
>>> +               gve_free_bulk_mbuf(&txq->sw_ring[start], nb_clean);
>>> +               txq->sw_nb_free += nb_clean;
>>> +               start += nb_clean;
>>> +               txq->sw_ntc = start;
>>> +       }
>>> +}
>>
>> [copy/paste from previous version]
>>
>> may be can drop the 'if' block, since "ntc == start" and "ntc < start"
>> cases already covered.
> 
> Yes, this 'if' block is dropped in v7 as suggested. Thanks!
> 

This is v7, and the code above still has the mentioned 'if' block;
do you mean in the coming v8?

>>
>> <...>
>>
>>> +uint16_t
>>> +gve_tx_burst(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t
>> nb_pkts)
>>> +{
>>> +       struct gve_tx_queue *txq = tx_queue;
>>> +
>>> +       if (txq->is_gqi_qpl)
>>> +               return gve_tx_burst_qpl(tx_queue, tx_pkts, nb_pkts);
>>> +
>>> +       return gve_tx_burst_ra(tx_queue, tx_pkts, nb_pkts);
>>> +}
>>> +
>>
>> [copy/paste from previous version]
>>
>> Can there be mix of queue types?
>> If only one queue type is supported in specific config, perhaps burst
>> function can be set during configuration, to prevent if check on datapath.
>>
>> This is optimization and can be done later, it doesn't have to be in the
>> set.
> 
> The exact queue type can be fetched from the backend via the adminq
> in priv->queue_format, so there won't be a mix of queue types.
> Currently, only the GQI_QPL and GQI_RDA queue formats are supported
> in the PMD. Also, only the GQI_QPL queue format is in use on GCP since
> GQI_RDA hasn't been released in production.
> This part of the code will be optimized/refactored later when the
> DQO_RDA queue type is involved. Thanks!
> 


^ permalink raw reply	[flat|nested] 192+ messages in thread

* Re: [PATCH v7 0/8] introduce GVE PMD
  2022-10-21 13:12                       ` [PATCH v7 0/8] introduce GVE PMD Ferruh Yigit
@ 2022-10-24 10:50                         ` Ferruh Yigit
  2022-10-24 13:25                           ` Guo, Junfeng
  0 siblings, 1 reply; 192+ messages in thread
From: Ferruh Yigit @ 2022-10-24 10:50 UTC (permalink / raw)
  To: Junfeng Guo, qi.z.zhang, jingjing.wu, beilei.xing
  Cc: dev, xiaoyun.li, awogbemila, bruce.richardson, hemant.agrawal,
	stephen, chenbo.xia, helin.zhang

On 10/21/2022 2:12 PM, Ferruh Yigit wrote:
> On 10/21/2022 10:19 AM, Junfeng Guo wrote:
>> Introduce a new PMD for Google Virtual Ethernet (GVE).
>>
>> gve (or gVNIC) is the standard virtual ethernet interface on Google Cloud
>> Platform (GCP), which is one of the multiple virtual interfaces from 
>> those
>> leading CSP customers in the world.
>>
>> Having a well maintained/optimized gve PMD on DPDK community can help 
>> those
>> cloud instance consumers with better experience of performance, 
>> maintenance
>> who wants to run their own VNFs on GCP.
>>
>> Please refer 
>> tohttps://cloud.google.com/compute/docs/networking/using-gvnic
>> for the device description.
>>
>> This patch set requires an exception for MIT license for GVE base code.
>> And the base code includes the following files:
>>   - gve_adminq.c
>>   - gve_adminq.h
>>   - gve_desc.h
>>   - gve_desc_dqo.h
>>   - gve_register.h
>>
>> It's based on GVE kernel driver v1.3.0 and the original code is in
>> https://github.com/GoogleCloudPlatform/compute-virtual-ethernet-linux/tree/v1.3.0
>>
>>
>> v2:
>> fix some CI check error.
>>
>> v3:
>> refactor some code and fix some build error.
>>
>> v4:
>> move the Google base code files into DPDK base folder.
>>
>> v5:
>> reorder commit sequence and drop the stats feature.
>>
>> v6-v7:
>> improve the code.
>>
>> Junfeng Guo (8):
>>    net/gve/base: introduce base code
>>    net/gve/base: add OS specific implementation
>>    net/gve: add support for device initialization
>>    net/gve: add support for link update
>>    net/gve: add support for MTU setting
>>    net/gve: add support for dev info get and dev configure
>>    net/gve: add support for queue operations
>>    net/gve: add support for Rx/Tx
> 
> Can you please check the build error reported by CI:
> https://mails.dpdk.org/archives/test-report/2022-October/318054.html
> 
> 
> Following link can be helpful:
> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=225324
> 

Reminder of this build error; please send v8 with the fix.

^ permalink raw reply	[flat|nested] 192+ messages in thread

* Re: [PATCH v7 1/8] net/gve/base: introduce base code
  2022-10-21  9:19                       ` [PATCH v7 1/8] net/gve/base: introduce base code Junfeng Guo
  2022-10-21  9:49                         ` Ferruh Yigit
@ 2022-10-24 10:50                         ` Ferruh Yigit
  2022-10-24 13:26                           ` Guo, Junfeng
  2022-10-25  9:07                         ` [PATCH v8 0/8] introduce GVE PMD Junfeng Guo
  2 siblings, 1 reply; 192+ messages in thread
From: Ferruh Yigit @ 2022-10-24 10:50 UTC (permalink / raw)
  To: Junfeng Guo, qi.z.zhang, jingjing.wu, ferruh.yigit, beilei.xing
  Cc: dev, xiaoyun.li, awogbemila, bruce.richardson, hemant.agrawal,
	stephen, chenbo.xia, helin.zhang, Haiyue Wang

On 10/21/2022 10:19 AM, Junfeng Guo wrote:
> +int gve_adminq_describe_device(struct gve_priv *priv)
> +{
> +	struct gve_device_option_jumbo_frames *dev_op_jumbo_frames = NULL;
> +	struct gve_device_option_gqi_rda *dev_op_gqi_rda = NULL;
> +	struct gve_device_option_gqi_qpl *dev_op_gqi_qpl = NULL;
> +	struct gve_device_option_dqo_rda *dev_op_dqo_rda = NULL;
> +	struct gve_device_descriptor *descriptor;
> +	struct gve_dma_mem descriptor_dma_mem;
> +	u32 supported_features_mask = 0;
> +	union gve_adminq_command cmd;
> +	int err = 0;
> +	u8 *mac;
> +	u16 mtu;
> +
> +	memset(&cmd, 0, sizeof(cmd));
> +	descriptor = gve_alloc_dma_mem(&descriptor_dma_mem, PAGE_SIZE);
> +	if (!descriptor)
> +		return -ENOMEM;
> +	cmd.opcode = cpu_to_be32(GVE_ADMINQ_DESCRIBE_DEVICE);
> +	cmd.describe_device.device_descriptor_addr =
> +					cpu_to_be64(descriptor_dma_mem.pa);
> +	cmd.describe_device.device_descriptor_version =
> +			cpu_to_be32(GVE_ADMINQ_DEVICE_DESCRIPTOR_VERSION);
> +	cmd.describe_device.available_length = cpu_to_be32(PAGE_SIZE);
> +
> +	err = gve_adminq_execute_cmd(priv, &cmd);
> +	if (err)
> +		goto free_device_descriptor;
> +
> +	err = gve_process_device_options(priv, descriptor, &dev_op_gqi_rda,
> +					 &dev_op_gqi_qpl, &dev_op_dqo_rda,
> +					 &dev_op_jumbo_frames);
> +	if (err)
> +		goto free_device_descriptor;
> +
> +	/* If the GQI_RAW_ADDRESSING option is not enabled and the queue format
> +	 * is not set to GqiRda, choose the queue format in a priority order:
> +	 * DqoRda, GqiRda, GqiQpl. Use GqiQpl as default.
> +	 */
> +	if (dev_op_dqo_rda) {
> +		priv->queue_format = GVE_DQO_RDA_FORMAT;
> +		PMD_DRV_LOG(INFO, "Driver is running with DQO RDA queue format.");
> +		supported_features_mask =
> +			be32_to_cpu(dev_op_dqo_rda->supported_features_mask);
> +	} else if (dev_op_gqi_rda) {
> +		priv->queue_format = GVE_GQI_RDA_FORMAT;
> +		PMD_DRV_LOG(INFO, "Driver is running with GQI RDA queue format.");
> +		supported_features_mask =
> +			be32_to_cpu(dev_op_gqi_rda->supported_features_mask);
> +	} else if (priv->queue_format == GVE_GQI_RDA_FORMAT) {
> +		PMD_DRV_LOG(INFO, "Driver is running with GQI RDA queue format.");
> +	} else {
> +		priv->queue_format = GVE_GQI_QPL_FORMAT;
> +		if (dev_op_gqi_qpl)
> +			supported_features_mask =
> +				be32_to_cpu(dev_op_gqi_qpl->supported_features_mask);
> +		PMD_DRV_LOG(INFO, "Driver is running with GQI QPL queue format.");
> +	}
> +	if (gve_is_gqi(priv)) {
> +		err = gve_set_desc_cnt(priv, descriptor);
> +	} else {
> +		/* DQO supports LRO. */
> +		err = gve_set_desc_cnt_dqo(priv, descriptor, dev_op_dqo_rda);
> +	}
> +	if (err)
> +		goto free_device_descriptor;
> +
> +	priv->max_registered_pages =
> +				be64_to_cpu(descriptor->max_registered_pages);
> +	mtu = be16_to_cpu(descriptor->mtu);
> +	if (mtu < ETH_MIN_MTU) {
> +		PMD_DRV_LOG(ERR, "MTU %d below minimum MTU", mtu);
> +		err = -EINVAL;
> +		goto free_device_descriptor;
> +	}
> +	priv->max_mtu = mtu;
> +	priv->num_event_counters = be16_to_cpu(descriptor->counters);
> +	rte_memcpy(priv->dev_addr.addr_bytes, descriptor->mac, ETH_ALEN);
> +	mac = descriptor->mac;
> +	PMD_DRV_LOG(INFO, "MAC addr: %02x:%02x:%02x:%02x:%02x:%02x",
> +		    mac[0], mac[1], mac[2], mac[3], mac[4], mac[5]);


There are 'RTE_ETHER_ADDR_PRT_FMT' & 'RTE_ETHER_ADDR_BYTES' macros for 
this purpose; you can use them like:

PMD_DRV_LOG(INFO, "MAC addr" RTE_ETHER_ADDR_PRT_FMT, 
RTE_ETHER_ADDR_BYTES(priv->dev_addr));

So you can get rid of the 'mac' variable.
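
As applied to the patch above, where 'priv->dev_addr' is a struct rather than
a pointer, the macro needs the address of the field; a sketch of the resulting
call (the ": " separator is only a readability suggestion):

PMD_DRV_LOG(INFO, "MAC addr: " RTE_ETHER_ADDR_PRT_FMT,
	    RTE_ETHER_ADDR_BYTES(&priv->dev_addr));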

^ permalink raw reply	[flat|nested] 192+ messages in thread

* RE: [PATCH v7 3/8] net/gve: add support for device initialization
  2022-10-24 10:47                             ` Ferruh Yigit
@ 2022-10-24 13:22                               ` Guo, Junfeng
  0 siblings, 0 replies; 192+ messages in thread
From: Guo, Junfeng @ 2022-10-24 13:22 UTC (permalink / raw)
  To: Ferruh Yigit, Zhang, Qi Z, Wu, Jingjing, Xing, Beilei
  Cc: dev, Li, Xiaoyun, awogbemila, Richardson, Bruce, hemant.agrawal,
	stephen, Xia, Chenbo, Zhang, Helin, Wang, Haiyue



> -----Original Message-----
> From: Ferruh Yigit <ferruh.yigit@amd.com>
> Sent: Monday, October 24, 2022 18:47
> To: Guo, Junfeng <junfeng.guo@intel.com>; Zhang, Qi Z
> <qi.z.zhang@intel.com>; Wu, Jingjing <jingjing.wu@intel.com>; Xing,
> Beilei <beilei.xing@intel.com>
> Cc: dev@dpdk.org; Li, Xiaoyun <xiaoyun.li@intel.com>;
> awogbemila@google.com; Richardson, Bruce
> <bruce.richardson@intel.com>; hemant.agrawal@nxp.com;
> stephen@networkplumber.org; Xia, Chenbo <chenbo.xia@intel.com>;
> Zhang, Helin <helin.zhang@intel.com>; Wang, Haiyue
> <haiyue.wang@intel.com>
> Subject: Re: [PATCH v7 3/8] net/gve: add support for device initialization
> 
> On 10/24/2022 6:04 AM, Guo, Junfeng wrote:
> >
> >
> >> -----Original Message-----
> >> From: Ferruh Yigit <ferruh.yigit@amd.com>
> >> Sent: Friday, October 21, 2022 17:50
> >> To: Guo, Junfeng <junfeng.guo@intel.com>; Zhang, Qi Z
> >> <qi.z.zhang@intel.com>; Wu, Jingjing <jingjing.wu@intel.com>; Xing,
> >> Beilei <beilei.xing@intel.com>
> >> Cc: dev@dpdk.org; Li, Xiaoyun <xiaoyun.li@intel.com>;
> >> awogbemila@google.com; Richardson, Bruce
> >> <bruce.richardson@intel.com>; hemant.agrawal@nxp.com;
> >> stephen@networkplumber.org; Xia, Chenbo <chenbo.xia@intel.com>;
> >> Zhang, Helin <helin.zhang@intel.com>; Wang, Haiyue
> >> <haiyue.wang@intel.com>
> >> Subject: Re: [PATCH v7 3/8] net/gve: add support for device
> initialization
> >>
> >> On 10/21/2022 10:19 AM, Junfeng Guo wrote:
> >>
> >>>
> >>> Support device init and add following devops skeleton:
> >>>    - dev_configure
> >>>    - dev_start
> >>>    - dev_stop
> >>>    - dev_close
> >>>
> >>> Note that build system (including doc) is also added in this patch.
> >>>
> >>> Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
> >>> Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com>
> >>> Signed-off-by: Junfeng Guo <junfeng.guo@intel.com>
> >>
> >> <...>
> >>
> >>> +static int
> >>> +gve_dev_init(struct rte_eth_dev *eth_dev)
> >>> +{
> >>> +       struct gve_priv *priv = eth_dev->data->dev_private;
> >>> +       int max_tx_queues, max_rx_queues;
> >>> +       struct rte_pci_device *pci_dev;
> >>> +       struct gve_registers *reg_bar;
> >>> +       rte_be32_t *db_bar;
> >>> +       int err;
> >>> +
> >>> +       eth_dev->dev_ops = &gve_eth_dev_ops;
> >>> +
> >>> +       if (rte_eal_process_type() != RTE_PROC_PRIMARY)
> >>> +               return 0;
> >>> +
> >>> +       pci_dev = RTE_DEV_TO_PCI(eth_dev->device);
> >>> +
> >>> +       reg_bar = pci_dev->mem_resource[GVE_REG_BAR].addr;
> >>> +       if (!reg_bar) {
> >>> +               PMD_DRV_LOG(ERR, "Failed to map pci bar!");
> >>> +               return -ENOMEM;
> >>> +       }
> >>> +
> >>> +       db_bar = pci_dev->mem_resource[GVE_DB_BAR].addr;
> >>> +       if (!db_bar) {
> >>> +               PMD_DRV_LOG(ERR, "Failed to map doorbell bar!");
> >>> +               return -ENOMEM;
> >>> +       }
> >>> +
> >>> +       gve_write_version(&reg_bar->driver_version);
> >>> +       /* Get max queues to alloc etherdev */
> >>> +       max_tx_queues = ioread32be(&reg_bar->max_tx_queues);
> >>> +       max_rx_queues = ioread32be(&reg_bar->max_rx_queues);
> >>> +
> >>> +       priv->reg_bar0 = reg_bar;
> >>> +       priv->db_bar2 = db_bar;
> >>> +       priv->pci_dev = pci_dev;
> >>> +       priv->state_flags = 0x0;
> >>> +
> >>> +       priv->max_nb_txq = max_tx_queues;
> >>> +       priv->max_nb_rxq = max_rx_queues;
> >>> +
> >>> +       err = gve_init_priv(priv, false);
> >>> +       if (err)
> >>> +               return err;
> >>> +
> >>> +       eth_dev->data->mac_addrs = &priv->dev_addr;
> >>> +
> >>
> >> [copy/paste from previous version]
> >>
> >> What is the value in 'priv->dev_addr'?
> >> Even allocating memory for 'eth_dev->data->mac_addrs' removed or
> not,
> >> as
> >> we discussed, independent from it, need to set a valid value to
> >> 'priv->dev_addr'.
> >
> > The value in 'priv->dev_addr' is the 'real' MAC address of the gvnic port.
> > So I suppose there is no need to set a default valid one, since we can
> > get it from the backend in gve_adminq_describe_device(priv).
> 
> Ack, thanks for clarification.
> 
> In 'gve_adminq_describe_device()', RTE_ETHER_ADDR_PRT_FMT &
> RTE_ETHER_ADDR_BYTES can be used for the log; I will comment on the patch.

Thanks for the advice! Will try this in the coming version.


^ permalink raw reply	[flat|nested] 192+ messages in thread

* RE: [PATCH v7 1/8] net/gve/base: introduce base code
  2022-10-24 10:47                             ` Ferruh Yigit
@ 2022-10-24 13:23                               ` Guo, Junfeng
  0 siblings, 0 replies; 192+ messages in thread
From: Guo, Junfeng @ 2022-10-24 13:23 UTC (permalink / raw)
  To: Ferruh Yigit, Zhang, Qi Z, Wu, Jingjing, Xing, Beilei
  Cc: dev, Li, Xiaoyun, awogbemila, Richardson, Bruce, hemant.agrawal,
	stephen, Xia, Chenbo, Zhang, Helin, Wang, Haiyue



> -----Original Message-----
> From: Ferruh Yigit <ferruh.yigit@amd.com>
> Sent: Monday, October 24, 2022 18:48
> To: Guo, Junfeng <junfeng.guo@intel.com>; Zhang, Qi Z
> <qi.z.zhang@intel.com>; Wu, Jingjing <jingjing.wu@intel.com>; Xing,
> Beilei <beilei.xing@intel.com>
> Cc: dev@dpdk.org; Li, Xiaoyun <xiaoyun.li@intel.com>;
> awogbemila@google.com; Richardson, Bruce
> <bruce.richardson@intel.com>; hemant.agrawal@nxp.com;
> stephen@networkplumber.org; Xia, Chenbo <chenbo.xia@intel.com>;
> Zhang, Helin <helin.zhang@intel.com>; Wang, Haiyue
> <haiyue.wang@intel.com>
> Subject: Re: [PATCH v7 1/8] net/gve/base: introduce base code
> 
> On 10/24/2022 6:04 AM, Guo, Junfeng wrote:
> >
> >
> >> -----Original Message-----
> >> From: Ferruh Yigit <ferruh.yigit@amd.com>
> >> Sent: Friday, October 21, 2022 17:49
> >> To: Guo, Junfeng <junfeng.guo@intel.com>; Zhang, Qi Z
> >> <qi.z.zhang@intel.com>; Wu, Jingjing <jingjing.wu@intel.com>; Xing,
> >> Beilei <beilei.xing@intel.com>
> >> Cc: dev@dpdk.org; Li, Xiaoyun <xiaoyun.li@intel.com>;
> >> awogbemila@google.com; Richardson, Bruce
> >> <bruce.richardson@intel.com>; hemant.agrawal@nxp.com;
> >> stephen@networkplumber.org; Xia, Chenbo <chenbo.xia@intel.com>;
> >> Zhang, Helin <helin.zhang@intel.com>; Wang, Haiyue
> >> <haiyue.wang@intel.com>
> >> Subject: Re: [PATCH v7 1/8] net/gve/base: introduce base code
> >>
> >> On 10/21/2022 10:19 AM, Junfeng Guo wrote:
> >>> The following base code is based on Google Virtual Ethernet (gve)
> >>> driver v1.3.0 under MIT license.
> >>> - gve_adminq.c
> >>> - gve_adminq.h
> >>> - gve_desc.h
> >>> - gve_desc_dqo.h
> >>> - gve_register.h
> >>> - gve.h
> >>>
> >>> The original code is in:
> >>> https://github.com/GoogleCloudPlatform/compute-virtual-ethernet-
> >> linux/\
> >>> tree/v1.3.0/google/gve
> >>>
> >>> Note that these code are not Intel files and they come from the
> kernel
> >>> community. The base code there has the statement of
> >>> SPDX-License-Identifier: (GPL-2.0 OR MIT). Here we just follow the
> >>> required MIT license as an exception to DPDK.
> >>>
> >>> Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com>
> >>> Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
> >>> Signed-off-by: Junfeng Guo <junfeng.guo@intel.com>
> >>
> >> <...>
> >>
> >>> +static bool gve_adminq_wait_for_cmd(struct gve_priv *priv, u32
> >> prod_cnt)
> >>> +{
> >>> +	int i;
> >>> +
> >>> +	for (i = 0; i < GVE_MAX_ADMINQ_EVENT_COUNTER_CHECK; i++)
> >> {
> >>> +		if (ioread32be(&priv->reg_bar0->adminq_event_counter)
> >>> +		    == prod_cnt)
> >>
> >> [copy/paste from previous version]
> >>
> >> Syntax, why not move second half of the equation in above line?
> >> Unless this is coming from google code and updating it brings
> >> maintenance cost.
> >
> > Yes, this function is just coming from google code without changes.
> > So it would be better to keep this unchanged. Thanks!
> >
> 
> ack

Thanks for your review!


^ permalink raw reply	[flat|nested] 192+ messages in thread

* RE: [PATCH v7 5/8] net/gve: add support for MTU setting
  2022-10-24 10:47                             ` Ferruh Yigit
@ 2022-10-24 13:23                               ` Guo, Junfeng
  0 siblings, 0 replies; 192+ messages in thread
From: Guo, Junfeng @ 2022-10-24 13:23 UTC (permalink / raw)
  To: Ferruh Yigit, Zhang, Qi Z, Wu, Jingjing, Xing, Beilei
  Cc: dev, Li, Xiaoyun, awogbemila, Richardson, Bruce, hemant.agrawal,
	stephen, Xia, Chenbo, Zhang, Helin



> -----Original Message-----
> From: Ferruh Yigit <ferruh.yigit@amd.com>
> Sent: Monday, October 24, 2022 18:48
> To: Guo, Junfeng <junfeng.guo@intel.com>; Zhang, Qi Z
> <qi.z.zhang@intel.com>; Wu, Jingjing <jingjing.wu@intel.com>; Xing,
> Beilei <beilei.xing@intel.com>
> Cc: dev@dpdk.org; Li, Xiaoyun <xiaoyun.li@intel.com>;
> awogbemila@google.com; Richardson, Bruce
> <bruce.richardson@intel.com>; hemant.agrawal@nxp.com;
> stephen@networkplumber.org; Xia, Chenbo <chenbo.xia@intel.com>;
> Zhang, Helin <helin.zhang@intel.com>
> Subject: Re: [PATCH v7 5/8] net/gve: add support for MTU setting
> 
> On 10/24/2022 6:04 AM, Guo, Junfeng wrote:
> >
> >
> >> -----Original Message-----
> >> From: Ferruh Yigit <ferruh.yigit@amd.com>
> >> Sent: Friday, October 21, 2022 17:50
> >> To: Guo, Junfeng <junfeng.guo@intel.com>; Zhang, Qi Z
> >> <qi.z.zhang@intel.com>; Wu, Jingjing <jingjing.wu@intel.com>; Xing,
> >> Beilei <beilei.xing@intel.com>
> >> Cc: dev@dpdk.org; Li, Xiaoyun <xiaoyun.li@intel.com>;
> >> awogbemila@google.com; Richardson, Bruce
> >> <bruce.richardson@intel.com>; hemant.agrawal@nxp.com;
> >> stephen@networkplumber.org; Xia, Chenbo <chenbo.xia@intel.com>;
> >> Zhang, Helin <helin.zhang@intel.com>
> >> Subject: Re: [PATCH v7 5/8] net/gve: add support for MTU setting
> >>
> >> On 10/21/2022 10:19 AM, Junfeng Guo wrote:
> >>
> >>>
> >>> Support dev_ops mtu_set.
> >>>
> >>> Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com>
> >>> Signed-off-by: Junfeng Guo <junfeng.guo@intel.com>
> >>
> >> <...>
> >>
> >>> +static int
> >>> +gve_dev_mtu_set(struct rte_eth_dev *dev, uint16_t mtu)
> >>> +{
> >>> +       struct gve_priv *priv = dev->data->dev_private;
> >>> +       int err;
> >>> +
> >>> +       if (mtu < RTE_ETHER_MIN_MTU || mtu > priv->max_mtu) {
> >>> +               PMD_DRV_LOG(ERR, "MIN MTU is %u, MAX MTU is %u",
> >>> +                           RTE_ETHER_MIN_MTU, priv->max_mtu);
> >>> +               return -EINVAL;
> >>> +       }
> >>> +
> >>> +       /* mtu setting is forbidden if port is start */
> >>> +       if (dev->data->dev_started) {
> >>> +               PMD_DRV_LOG(ERR, "Port must be stopped before
> >> configuration");
> >>> +               return -EBUSY;
> >>> +       }
> >>> +
> >>> +       err = gve_adminq_set_mtu(priv, mtu);
> >>> +       if (err) {
> >>> +               PMD_DRV_LOG(ERR, "Failed to set mtu as %u err = %d",
> mtu,
> >> err);
> >>> +               return err;
> >>> +       }
> >>> +
> >>> +       return 0;
> >>> +}
> >>> +
> >>
> >> [copy/paste from previous version]
> >>
> >> configure() (gve_dev_configure()) also get 'mtu' as user config
> >> ('eth_conf->rxmode.mtu') which is ignored right now,
> >>
> >> since there is 'gve_adminq_set_mtu()' command already what do you
> >> think
> >> to use it within 'gve_dev_configure()'?
> >
> > There may be issues with setting the MTU from 'eth_conf->rxmode.mtu',
> > so it is better to keep it ignored at this stage.
> >
> 
> What do you mean by issues?
> 
> 'eth_conf->rxmode.mtu' is a user-provided config parameter, so a user may
> prefer to provide this value, never call 'rte_eth_dev_set_mtu()' at all,
> and still expect the correct MTU value.

Yes, it should be like this. But on current GCP, the fetched MTU value is
1460, which is smaller than 1500. And if we set the MTU to 1500, the
backend will return a failure.
I think it would be better to document this as a limitation or known issue.


^ permalink raw reply	[flat|nested] 192+ messages in thread

* RE: [PATCH v7 6/8] net/gve: add support for dev info get and dev configure
  2022-10-24 10:48                             ` Ferruh Yigit
@ 2022-10-24 13:23                               ` Guo, Junfeng
  0 siblings, 0 replies; 192+ messages in thread
From: Guo, Junfeng @ 2022-10-24 13:23 UTC (permalink / raw)
  To: Ferruh Yigit, Zhang, Qi Z, Wu, Jingjing, Xing, Beilei
  Cc: dev, Li, Xiaoyun, awogbemila, Richardson, Bruce, hemant.agrawal,
	stephen, Xia, Chenbo, Zhang, Helin



> -----Original Message-----
> From: Ferruh Yigit <ferruh.yigit@amd.com>
> Sent: Monday, October 24, 2022 18:49
> To: Guo, Junfeng <junfeng.guo@intel.com>; Zhang, Qi Z
> <qi.z.zhang@intel.com>; Wu, Jingjing <jingjing.wu@intel.com>; Xing,
> Beilei <beilei.xing@intel.com>
> Cc: dev@dpdk.org; Li, Xiaoyun <xiaoyun.li@intel.com>;
> awogbemila@google.com; Richardson, Bruce
> <bruce.richardson@intel.com>; hemant.agrawal@nxp.com;
> stephen@networkplumber.org; Xia, Chenbo <chenbo.xia@intel.com>;
> Zhang, Helin <helin.zhang@intel.com>
> Subject: Re: [PATCH v7 6/8] net/gve: add support for dev info get and dev
> configure
> 
> On 10/24/2022 6:04 AM, Guo, Junfeng wrote:
> >
> >
> >> -----Original Message-----
> >> From: Ferruh Yigit <ferruh.yigit@amd.com>
> >> Sent: Friday, October 21, 2022 17:51
> >> To: Guo, Junfeng <junfeng.guo@intel.com>; Zhang, Qi Z
> >> <qi.z.zhang@intel.com>; Wu, Jingjing <jingjing.wu@intel.com>;
> >> ferruh.yigit@xilinx.com; Xing, Beilei <beilei.xing@intel.com>
> >> Cc: dev@dpdk.org; Li, Xiaoyun <xiaoyun.li@intel.com>;
> >> awogbemila@google.com; Richardson, Bruce
> >> <bruce.richardson@intel.com>; hemant.agrawal@nxp.com;
> >> stephen@networkplumber.org; Xia, Chenbo <chenbo.xia@intel.com>;
> >> Zhang, Helin <helin.zhang@intel.com>
> >> Subject: Re: [PATCH v7 6/8] net/gve: add support for dev info get and
> dev
> >> configure
> >>
> >> On 10/21/2022 10:19 AM, Junfeng Guo wrote:
> >>
> >>>
> >>> Add dev_ops dev_infos_get.
> >>> Complete dev_configure with RX offloads force enabling.
> >>>
> >>> Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com>
> >>> Signed-off-by: Junfeng Guo <junfeng.guo@intel.com>
> >>
> >> <...>
> >>
> >>> --- a/doc/guides/nics/gve.rst
> >>> +++ b/doc/guides/nics/gve.rst
> >>> @@ -62,6 +62,7 @@ In this release, the GVE PMD provides the basic
> >> functionality of packet
> >>>    reception and transmission.
> >>>    Supported features of the GVE PMD are:
> >>>
> >>> +- Receiver Side Scaling (RSS)
> >>
> >> [copy/paste from previous version]
> >>
> >> I am not sure if driver can claim this, I can see a RSS hash is provided
> >> but is it possible to update which hash function to use or update key or
> >> RETA table to configure which queue packets goes?
> >>
> >> Right now what is RSS calculated on?
> >>
> >> Perhaps RSS support can be documented as limited?
> >>
> >> And not sure if this update belongs this patch, it should be to the one
> >> that has the datapath.
> >
> > It looks like RSS is enabled by default, and there is no RSS init API.
> > So I just added back the force-enabled RSS offloading code with the
> > corresponding commit message, and the feature list remains unchanged.
> 
> There is a difference between RSS and RSS hash. What is force-enabled is "RSS
> hash", where the device-calculated hash value is shared with the application
> in case the application wants to reuse this value for some reason.
> 
> But for RSS support, a set of configuration required by DPDK
> seems to be missing, as mentioned above: configuring the RSS hash function
> (on which part of the packet the hash is calculated), or RETA table updates
> so the app can select which packets go to which queue, etc.
> 
> Is it at least possible to document what the existing configuration is?
> 
> Because of the missing configuration support, I don't think it is correct to
> document RSS as supported; can you please update it to say something
> like limited support exists with the default config.

Thanks for the detailed explanation!
Yes, you are right. RSS cannot be configured by the user at this time,
so it should be documented as a limitation here. Will update this in the 
coming version. Thanks for your time and effort in reviewing!


^ permalink raw reply	[flat|nested] 192+ messages in thread

* RE: [PATCH v7 8/8] net/gve: add support for Rx/Tx
  2022-10-24 10:50                             ` Ferruh Yigit
@ 2022-10-24 13:25                               ` Guo, Junfeng
  2022-10-25  9:07                                 ` Guo, Junfeng
  0 siblings, 1 reply; 192+ messages in thread
From: Guo, Junfeng @ 2022-10-24 13:25 UTC (permalink / raw)
  To: Ferruh Yigit, Zhang, Qi Z, Wu, Jingjing, Xing, Beilei
  Cc: dev, Li, Xiaoyun, awogbemila, Richardson, Bruce, hemant.agrawal,
	stephen, Xia, Chenbo, Zhang, Helin



> -----Original Message-----
> From: Ferruh Yigit <ferruh.yigit@amd.com>
> Sent: Monday, October 24, 2022 18:50
> To: Guo, Junfeng <junfeng.guo@intel.com>; Zhang, Qi Z
> <qi.z.zhang@intel.com>; Wu, Jingjing <jingjing.wu@intel.com>; Xing,
> Beilei <beilei.xing@intel.com>
> Cc: dev@dpdk.org; Li, Xiaoyun <xiaoyun.li@intel.com>;
> awogbemila@google.com; Richardson, Bruce
> <bruce.richardson@intel.com>; hemant.agrawal@nxp.com;
> stephen@networkplumber.org; Xia, Chenbo <chenbo.xia@intel.com>;
> Zhang, Helin <helin.zhang@intel.com>
> Subject: Re: [PATCH v7 8/8] net/gve: add support for Rx/Tx
> 
> On 10/24/2022 6:04 AM, Guo, Junfeng wrote:
> >
> >
> >> -----Original Message-----
> >> From: Ferruh Yigit <ferruh.yigit@amd.com>
> >> Sent: Friday, October 21, 2022 17:52
> >> To: Guo, Junfeng <junfeng.guo@intel.com>; Zhang, Qi Z
> >> <qi.z.zhang@intel.com>; Wu, Jingjing <jingjing.wu@intel.com>;
> >> ferruh.yigit@xilinx.com; Xing, Beilei <beilei.xing@intel.com>
> >> Cc: dev@dpdk.org; Li, Xiaoyun <xiaoyun.li@intel.com>;
> >> awogbemila@google.com; Richardson, Bruce
> >> <bruce.richardson@intel.com>; hemant.agrawal@nxp.com;
> >> stephen@networkplumber.org; Xia, Chenbo <chenbo.xia@intel.com>;
> >> Zhang, Helin <helin.zhang@intel.com>
> >> Subject: Re: [PATCH v7 8/8] net/gve: add support for Rx/Tx
> >>
> >> On 10/21/2022 10:19 AM, Junfeng Guo wrote:
> >>
> >>>
> >>> Add Rx/Tx of GQI_QPL queue format and GQI_RDA queue format.
> >>>
> >>> Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com>
> >>> Signed-off-by: Junfeng Guo <junfeng.guo@intel.com>
> >>
> >> <...>
> >>
> >>> +
> >>> +static inline void
> >>> +gve_tx_clean_swr_qpl(struct gve_tx_queue *txq)
> >>> +{
> >>> +       uint32_t start = txq->sw_ntc;
> >>> +       uint32_t ntc, nb_clean;
> >>> +
> >>> +       ntc = txq->sw_tail;
> >>> +
> >>> +       if (ntc == start)
> >>> +               return;
> >>> +
> >>> +       /* if wrap around, free twice. */
> >>> +       if (ntc < start) {
> >>> +               nb_clean = txq->nb_tx_desc - start;
> >>> +               if (nb_clean > GVE_TX_MAX_FREE_SZ)
> >>> +                       nb_clean = GVE_TX_MAX_FREE_SZ;
> >>> +               gve_free_bulk_mbuf(&txq->sw_ring[start], nb_clean);
> >>> +
> >>> +               txq->sw_nb_free += nb_clean;
> >>> +               start += nb_clean;
> >>> +               if (start == txq->nb_tx_desc)
> >>> +                       start = 0;
> >>> +               txq->sw_ntc = start;
> >>> +       }
> >>> +
> >>> +       if (ntc > start) {
> >>> +               nb_clean = ntc - start;
> >>> +               if (nb_clean > GVE_TX_MAX_FREE_SZ)
> >>> +                       nb_clean = GVE_TX_MAX_FREE_SZ;
> >>> +               gve_free_bulk_mbuf(&txq->sw_ring[start], nb_clean);
> >>> +               txq->sw_nb_free += nb_clean;
> >>> +               start += nb_clean;
> >>> +               txq->sw_ntc = start;
> >>> +       }
> >>> +}
> >>
> >> [copy/paste from previous version]
> >>
> >> maybe the 'if' block can be dropped, since the "ntc == start" and
> >> "ntc < start" cases are already covered.
> >
> > Yes, this 'if' block is dropped in v7 as suggested. Thanks!
> >
> 
> This is v7 and please check above code which has the mentioned 'if'
> block exist, do you mean in coming v8?

Oh, sorry about this! There is another function with the same issue;
I updated the code there and forgot this one. Will update this and
check the rest in the coming version. Thanks!
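
To illustrate the point about the redundant guard, here is a generic sketch of
the wrap-around free pattern (ignoring the GVE_TX_MAX_FREE_SZ batching cap used
in the driver, and using rte_pktmbuf_free_bulk() instead of the driver's own
helper): once the wrapped chunk has been freed and start reset to 0, the second
free no longer needs an 'if' around it.

	#include <rte_mbuf.h>

	/* Sketch only: free the circular range [start, end) of sw_ring in at
	 * most two contiguous chunks.
	 */
	static void
	ring_free_range(struct rte_mbuf **sw_ring, uint16_t ring_size,
			uint16_t start, uint16_t end)
	{
		if (start == end)
			return;

		if (end < start) {
			/* The range wraps: free up to the end of the ring first. */
			rte_pktmbuf_free_bulk(&sw_ring[start], ring_size - start);
			start = 0;
		}
		/* Here end >= start always holds, so no extra guard is needed. */
		rte_pktmbuf_free_bulk(&sw_ring[start], end - start);
	}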

> 
> >>
> >> <...>
> >>
> >>> +uint16_t
> >>> +gve_tx_burst(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t
> >> nb_pkts)
> >>> +{
> >>> +       struct gve_tx_queue *txq = tx_queue;
> >>> +
> >>> +       if (txq->is_gqi_qpl)
> >>> +               return gve_tx_burst_qpl(tx_queue, tx_pkts, nb_pkts);
> >>> +
> >>> +       return gve_tx_burst_ra(tx_queue, tx_pkts, nb_pkts);
> >>> +}
> >>> +
> >>
> >> [copy/paste from previous version]
> >>
> >> Can there be a mix of queue types?
> >> If only one queue type is supported in a specific config, perhaps the
> >> burst function can be set during configuration, to avoid the 'if' check
> >> on the datapath.
> >>
> >> This is an optimization and can be done later; it doesn't have to be in
> >> this set.
> >
> > The exact queue type can be fetched from the backend via the adminq
> > in priv->queue_format. So there won't be a mix of queue types.
> > Currently, only the GQI_QPL and GQI_RDA queue formats are supported
> > in the PMD. Also, only the GQI_QPL queue format is in use on GCP since
> > GQI_RDA hasn't been released in production.
> > This part of the code will be optimized/refactored later when the
> > DQO_RDA queue type is introduced. Thanks!
> >


^ permalink raw reply	[flat|nested] 192+ messages in thread

* RE: [PATCH v7 0/8] introduce GVE PMD
  2022-10-24 10:50                         ` Ferruh Yigit
@ 2022-10-24 13:25                           ` Guo, Junfeng
  0 siblings, 0 replies; 192+ messages in thread
From: Guo, Junfeng @ 2022-10-24 13:25 UTC (permalink / raw)
  To: Ferruh Yigit, Zhang, Qi Z, Wu, Jingjing, Xing, Beilei
  Cc: dev, Li, Xiaoyun, awogbemila, Richardson, Bruce, hemant.agrawal,
	stephen, Xia, Chenbo, Zhang, Helin



> -----Original Message-----
> From: Ferruh Yigit <ferruh.yigit@amd.com>
> Sent: Monday, October 24, 2022 18:50
> To: Guo, Junfeng <junfeng.guo@intel.com>; Zhang, Qi Z
> <qi.z.zhang@intel.com>; Wu, Jingjing <jingjing.wu@intel.com>; Xing,
> Beilei <beilei.xing@intel.com>
> Cc: dev@dpdk.org; Li, Xiaoyun <xiaoyun.li@intel.com>;
> awogbemila@google.com; Richardson, Bruce
> <bruce.richardson@intel.com>; hemant.agrawal@nxp.com;
> stephen@networkplumber.org; Xia, Chenbo <chenbo.xia@intel.com>;
> Zhang, Helin <helin.zhang@intel.com>
> Subject: Re: [PATCH v7 0/8] introduce GVE PMD
> 
> On 10/21/2022 2:12 PM, Ferruh Yigit wrote:
> > On 10/21/2022 10:19 AM, Junfeng Guo wrote:
> >> Introduce a new PMD for Google Virtual Ethernet (GVE).
> >>
> >> gve (or gVNIC) is the standard virtual ethernet interface on Google
> Cloud
> >> Platform (GCP), which is one of the multiple virtual interfaces from
> >> those
> >> leading CSP customers in the world.
> >>
> >> Having a well maintained/optimized gve PMD on DPDK community can
> help
> >> those
> >> cloud instance consumers with better experience of performance,
> >> maintenance
> >> who wants to run their own VNFs on GCP.
> >>
> >> Please refer
> >> to https://cloud.google.com/compute/docs/networking/using-gvnic
> >> for the device description.
> >>
> >> This patch set requires an exception for MIT license for GVE base code.
> >> And the base code includes the following files:
> >>   - gve_adminq.c
> >>   - gve_adminq.h
> >>   - gve_desc.h
> >>   - gve_desc_dqo.h
> >>   - gve_register.h
> >>
> >> It's based on GVE kernel driver v1.3.0 and the original code is in
> >> https://github.com/GoogleCloudPlatform/compute-virtual-ethernet-
> linux/tree/v1.3.0
> >>
> >>
> >> v2:
> >> fix some CI check error.
> >>
> >> v3:
> >> refactor some code and fix some build error.
> >>
> >> v4:
> >> move the Google base code files into DPDK base folder.
> >>
> >> v5:
> >> reorder commit sequence and drop the stats feature.
> >>
> >> v6-v7:
> >> improve the code.
> >>
> >> Junfeng Guo (8):
> >>    net/gve/base: introduce base code
> >>    net/gve/base: add OS specific implementation
> >>    net/gve: add support for device initialization
> >>    net/gve: add support for link update
> >>    net/gve: add support for MTU setting
> >>    net/gve: add support for dev info get and dev configure
> >>    net/gve: add support for queue operations
> >>    net/gve: add support for Rx/Tx
> >
> > Can you please check the build error reported by CI:
> > https://mails.dpdk.org/archives/test-report/2022-October/318054.html
> >
> >
> > Following link can be helpful:
> > https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=225324
> >
> 
> Reminder of this build error, please send v8 with the fix.

Sure, will try to fix this. Thanks for the reminder!

^ permalink raw reply	[flat|nested] 192+ messages in thread

* RE: [PATCH v7 1/8] net/gve/base: introduce base code
  2022-10-24 10:50                         ` Ferruh Yigit
@ 2022-10-24 13:26                           ` Guo, Junfeng
  0 siblings, 0 replies; 192+ messages in thread
From: Guo, Junfeng @ 2022-10-24 13:26 UTC (permalink / raw)
  To: Ferruh Yigit, Zhang, Qi Z, Wu, Jingjing, ferruh.yigit, Xing, Beilei
  Cc: dev, Li, Xiaoyun, awogbemila, Richardson, Bruce, hemant.agrawal,
	stephen, Xia, Chenbo, Zhang, Helin, Wang, Haiyue



> -----Original Message-----
> From: Ferruh Yigit <ferruh.yigit@amd.com>
> Sent: Monday, October 24, 2022 18:51
> To: Guo, Junfeng <junfeng.guo@intel.com>; Zhang, Qi Z
> <qi.z.zhang@intel.com>; Wu, Jingjing <jingjing.wu@intel.com>;
> ferruh.yigit@xilinx.com; Xing, Beilei <beilei.xing@intel.com>
> Cc: dev@dpdk.org; Li, Xiaoyun <xiaoyun.li@intel.com>;
> awogbemila@google.com; Richardson, Bruce
> <bruce.richardson@intel.com>; hemant.agrawal@nxp.com;
> stephen@networkplumber.org; Xia, Chenbo <chenbo.xia@intel.com>;
> Zhang, Helin <helin.zhang@intel.com>; Wang, Haiyue
> <haiyue.wang@intel.com>
> Subject: Re: [PATCH v7 1/8] net/gve/base: introduce base code
> 
> On 10/21/2022 10:19 AM, Junfeng Guo wrote:
> > +int gve_adminq_describe_device(struct gve_priv *priv)
> > +{
> > +	struct gve_device_option_jumbo_frames
> *dev_op_jumbo_frames = NULL;
> > +	struct gve_device_option_gqi_rda *dev_op_gqi_rda = NULL;
> > +	struct gve_device_option_gqi_qpl *dev_op_gqi_qpl = NULL;
> > +	struct gve_device_option_dqo_rda *dev_op_dqo_rda = NULL;
> > +	struct gve_device_descriptor *descriptor;
> > +	struct gve_dma_mem descriptor_dma_mem;
> > +	u32 supported_features_mask = 0;
> > +	union gve_adminq_command cmd;
> > +	int err = 0;
> > +	u8 *mac;
> > +	u16 mtu;
> > +
> > +	memset(&cmd, 0, sizeof(cmd));
> > +	descriptor = gve_alloc_dma_mem(&descriptor_dma_mem,
> PAGE_SIZE);
> > +	if (!descriptor)
> > +		return -ENOMEM;
> > +	cmd.opcode = cpu_to_be32(GVE_ADMINQ_DESCRIBE_DEVICE);
> > +	cmd.describe_device.device_descriptor_addr =
> > +
> 	cpu_to_be64(descriptor_dma_mem.pa);
> > +	cmd.describe_device.device_descriptor_version =
> > +
> 	cpu_to_be32(GVE_ADMINQ_DEVICE_DESCRIPTOR_VERSION);
> > +	cmd.describe_device.available_length = cpu_to_be32(PAGE_SIZE);
> > +
> > +	err = gve_adminq_execute_cmd(priv, &cmd);
> > +	if (err)
> > +		goto free_device_descriptor;
> > +
> > +	err = gve_process_device_options(priv, descriptor,
> &dev_op_gqi_rda,
> > +					 &dev_op_gqi_qpl,
> &dev_op_dqo_rda,
> > +					 &dev_op_jumbo_frames);
> > +	if (err)
> > +		goto free_device_descriptor;
> > +
> > +	/* If the GQI_RAW_ADDRESSING option is not enabled and the
> queue format
> > +	 * is not set to GqiRda, choose the queue format in a priority
> order:
> > +	 * DqoRda, GqiRda, GqiQpl. Use GqiQpl as default.
> > +	 */
> > +	if (dev_op_dqo_rda) {
> > +		priv->queue_format = GVE_DQO_RDA_FORMAT;
> > +		PMD_DRV_LOG(INFO, "Driver is running with DQO RDA
> queue format.");
> > +		supported_features_mask =
> > +			be32_to_cpu(dev_op_dqo_rda-
> >supported_features_mask);
> > +	} else if (dev_op_gqi_rda) {
> > +		priv->queue_format = GVE_GQI_RDA_FORMAT;
> > +		PMD_DRV_LOG(INFO, "Driver is running with GQI RDA
> queue format.");
> > +		supported_features_mask =
> > +			be32_to_cpu(dev_op_gqi_rda-
> >supported_features_mask);
> > +	} else if (priv->queue_format == GVE_GQI_RDA_FORMAT) {
> > +		PMD_DRV_LOG(INFO, "Driver is running with GQI RDA
> queue format.");
> > +	} else {
> > +		priv->queue_format = GVE_GQI_QPL_FORMAT;
> > +		if (dev_op_gqi_qpl)
> > +			supported_features_mask =
> > +				be32_to_cpu(dev_op_gqi_qpl-
> >supported_features_mask);
> > +		PMD_DRV_LOG(INFO, "Driver is running with GQI QPL
> queue format.");
> > +	}
> > +	if (gve_is_gqi(priv)) {
> > +		err = gve_set_desc_cnt(priv, descriptor);
> > +	} else {
> > +		/* DQO supports LRO. */
> > +		err = gve_set_desc_cnt_dqo(priv, descriptor,
> dev_op_dqo_rda);
> > +	}
> > +	if (err)
> > +		goto free_device_descriptor;
> > +
> > +	priv->max_registered_pages =
> > +				be64_to_cpu(descriptor-
> >max_registered_pages);
> > +	mtu = be16_to_cpu(descriptor->mtu);
> > +	if (mtu < ETH_MIN_MTU) {
> > +		PMD_DRV_LOG(ERR, "MTU %d below minimum MTU",
> mtu);
> > +		err = -EINVAL;
> > +		goto free_device_descriptor;
> > +	}
> > +	priv->max_mtu = mtu;
> > +	priv->num_event_counters = be16_to_cpu(descriptor->counters);
> > +	rte_memcpy(priv->dev_addr.addr_bytes, descriptor->mac,
> ETH_ALEN);
> > +	mac = descriptor->mac;
> > +	PMD_DRV_LOG(INFO, "MAC
> addr: %02x:%02x:%02x:%02x:%02x:%02x",
> > +		    mac[0], mac[1], mac[2], mac[3], mac[4], mac[5]);
> 
> 
> There are 'RTE_ETHER_ADDR_PRT_FMT' & 'RTE_ETHER_ADDR_BYTES'
> macros for
> this purpose, you can use it like:
> 
> PMD_DRV_LOG(INFO, "MAC addr" RTE_ETHER_ADDR_PRT_FMT,
> RTE_ETHER_ADDR_BYTES(priv->dev_addr));
> 
> So can get rid of 'mac' variable.

Thanks for the advice! Will try this in the coming version.

^ permalink raw reply	[flat|nested] 192+ messages in thread

* [PATCH v8 0/8] introduce GVE PMD
  2022-10-21  9:19                       ` [PATCH v7 1/8] net/gve/base: introduce base code Junfeng Guo
  2022-10-21  9:49                         ` Ferruh Yigit
  2022-10-24 10:50                         ` Ferruh Yigit
@ 2022-10-25  9:07                         ` Junfeng Guo
  2022-10-25  9:07                           ` [PATCH v8 1/8] net/gve/base: introduce base code Junfeng Guo
                                             ` (8 more replies)
  2 siblings, 9 replies; 192+ messages in thread
From: Junfeng Guo @ 2022-10-25  9:07 UTC (permalink / raw)
  To: qi.z.zhang, jingjing.wu, ferruh.yigit, beilei.xing
  Cc: dev, xiaoyun.li, awogbemila, bruce.richardson, hemant.agrawal,
	stephen, chenbo.xia, helin.zhang, Junfeng Guo

Introduce a new PMD for Google Virtual Ethernet (GVE).

gve (or gVNIC) is the standard virtual Ethernet interface on Google Cloud
Platform (GCP), and one of the several virtual interfaces offered by the
leading CSPs in the world.

Having a well maintained/optimized gve PMD in the DPDK community can give
cloud instance consumers who want to run their own VNFs on GCP a better
experience in terms of performance and maintenance.

Please refer to https://cloud.google.com/compute/docs/networking/using-gvnic
for the device description.

This patch set requires an exception for MIT license for GVE base code.
And the base code includes the following files:
 - gve_adminq.c
 - gve_adminq.h
 - gve_desc.h
 - gve_desc_dqo.h
 - gve_register.h

It's based on GVE kernel driver v1.3.0 and the original code is in
https://github.com/GoogleCloudPlatform/compute-virtual-ethernet-linux/tree/v1.3.0


v2:
fix some CI check errors.

v3:
refactor some code and fix some build errors.

v4:
move the Google base code files into DPDK base folder.

v5:
reorder commit sequence and drop the stats feature.

v6:
improve the code.

v7:
- remove the Intel copyright from the Google base files.

v8:
- replace ETIME with ETIMEDOUT to pass the build check.
- use RTE_ETHER_ADDR_PRT_FMT/_ADDR_BYTES to get rid of the 'mac' variable.
- add limitations in the doc for the currently limited RSS and MTU support.


Junfeng Guo (8):
  net/gve/base: introduce base code
  net/gve/base: add OS specific implementation
  net/gve: add support for device initialization
  net/gve: add support for link update
  net/gve: add support for MTU setting
  net/gve: add support for dev info get and dev configure
  net/gve: add support for queue operations
  net/gve: add support for Rx/Tx

 MAINTAINERS                            |   6 +
 doc/guides/nics/features/gve.ini       |  16 +
 doc/guides/nics/gve.rst                |  82 +++
 doc/guides/nics/index.rst              |   1 +
 doc/guides/rel_notes/release_22_11.rst |   5 +
 drivers/net/gve/base/gve.h             |  56 ++
 drivers/net/gve/base/gve_adminq.c      | 921 +++++++++++++++++++++++++
 drivers/net/gve/base/gve_adminq.h      | 381 ++++++++++
 drivers/net/gve/base/gve_desc.h        | 138 ++++
 drivers/net/gve/base/gve_desc_dqo.h    | 255 +++++++
 drivers/net/gve/base/gve_osdep.h       | 159 +++++
 drivers/net/gve/base/gve_register.h    |  29 +
 drivers/net/gve/gve_ethdev.c           | 700 +++++++++++++++++++
 drivers/net/gve/gve_ethdev.h           | 298 ++++++++
 drivers/net/gve/gve_logs.h             |  14 +
 drivers/net/gve/gve_rx.c               | 354 ++++++++++
 drivers/net/gve/gve_tx.c               | 669 ++++++++++++++++++
 drivers/net/gve/meson.build            |  16 +
 drivers/net/gve/version.map            |   3 +
 drivers/net/meson.build                |   1 +
 20 files changed, 4104 insertions(+)
 create mode 100644 doc/guides/nics/features/gve.ini
 create mode 100644 doc/guides/nics/gve.rst
 create mode 100644 drivers/net/gve/base/gve.h
 create mode 100644 drivers/net/gve/base/gve_adminq.c
 create mode 100644 drivers/net/gve/base/gve_adminq.h
 create mode 100644 drivers/net/gve/base/gve_desc.h
 create mode 100644 drivers/net/gve/base/gve_desc_dqo.h
 create mode 100644 drivers/net/gve/base/gve_osdep.h
 create mode 100644 drivers/net/gve/base/gve_register.h
 create mode 100644 drivers/net/gve/gve_ethdev.c
 create mode 100644 drivers/net/gve/gve_ethdev.h
 create mode 100644 drivers/net/gve/gve_logs.h
 create mode 100644 drivers/net/gve/gve_rx.c
 create mode 100644 drivers/net/gve/gve_tx.c
 create mode 100644 drivers/net/gve/meson.build
 create mode 100644 drivers/net/gve/version.map

-- 
2.34.1


^ permalink raw reply	[flat|nested] 192+ messages in thread

* [PATCH v8 1/8] net/gve/base: introduce base code
  2022-10-25  9:07                         ` [PATCH v8 0/8] introduce GVE PMD Junfeng Guo
@ 2022-10-25  9:07                           ` Junfeng Guo
  2022-10-25  9:07                           ` [PATCH v8 2/8] net/gve/base: add OS specific implementation Junfeng Guo
                                             ` (7 subsequent siblings)
  8 siblings, 0 replies; 192+ messages in thread
From: Junfeng Guo @ 2022-10-25  9:07 UTC (permalink / raw)
  To: qi.z.zhang, jingjing.wu, ferruh.yigit, beilei.xing
  Cc: dev, xiaoyun.li, awogbemila, bruce.richardson, hemant.agrawal,
	stephen, chenbo.xia, helin.zhang, Junfeng Guo, Haiyue Wang

The following base code is based on the Google Virtual Ethernet (gve)
driver v1.3.0 under the MIT license.
- gve_adminq.c
- gve_adminq.h
- gve_desc.h
- gve_desc_dqo.h
- gve_register.h
- gve.h

The original code is in:
https://github.com/GoogleCloudPlatform/compute-virtual-ethernet-linux/\
tree/v1.3.0/google/gve

Note that these files are not Intel files; they come from the kernel
community. The base code there carries the statement
SPDX-License-Identifier: (GPL-2.0 OR MIT). Here we just follow the
required MIT license as an exception in DPDK.

Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com>
Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Junfeng Guo <junfeng.guo@intel.com>
---
 drivers/net/gve/base/gve.h          |  56 ++
 drivers/net/gve/base/gve_adminq.c   | 920 ++++++++++++++++++++++++++++
 drivers/net/gve/base/gve_adminq.h   | 379 ++++++++++++
 drivers/net/gve/base/gve_desc.h     | 136 ++++
 drivers/net/gve/base/gve_desc_dqo.h | 253 ++++++++
 drivers/net/gve/base/gve_register.h |  27 +
 6 files changed, 1771 insertions(+)
 create mode 100644 drivers/net/gve/base/gve.h
 create mode 100644 drivers/net/gve/base/gve_adminq.c
 create mode 100644 drivers/net/gve/base/gve_adminq.h
 create mode 100644 drivers/net/gve/base/gve_desc.h
 create mode 100644 drivers/net/gve/base/gve_desc_dqo.h
 create mode 100644 drivers/net/gve/base/gve_register.h

diff --git a/drivers/net/gve/base/gve.h b/drivers/net/gve/base/gve.h
new file mode 100644
index 0000000000..2dc4507acb
--- /dev/null
+++ b/drivers/net/gve/base/gve.h
@@ -0,0 +1,56 @@
+/* SPDX-License-Identifier: MIT
+ * Google Virtual Ethernet (gve) driver
+ * Copyright (C) 2015-2022 Google, Inc.
+ */
+
+#ifndef _GVE_H_
+#define _GVE_H_
+
+#include "gve_desc.h"
+
+#define GVE_VERSION		"1.3.0"
+#define GVE_VERSION_PREFIX	"GVE-"
+
+#ifndef GOOGLE_VENDOR_ID
+#define GOOGLE_VENDOR_ID	0x1ae0
+#endif
+
+#define GVE_DEV_ID		0x0042
+
+#define GVE_REG_BAR		0
+#define GVE_DB_BAR		2
+
+/* 1 for management, 1 for rx, 1 for tx */
+#define GVE_MIN_MSIX		3
+
+/* PTYPEs are always 10 bits. */
+#define GVE_NUM_PTYPES		1024
+
+struct gve_irq_db {
+	rte_be32_t id;
+} ____cacheline_aligned;
+
+struct gve_ptype {
+	uint8_t l3_type;  /* `gve_l3_type` in gve_adminq.h */
+	uint8_t l4_type;  /* `gve_l4_type` in gve_adminq.h */
+};
+
+struct gve_ptype_lut {
+	struct gve_ptype ptypes[GVE_NUM_PTYPES];
+};
+
+enum gve_queue_format {
+	GVE_QUEUE_FORMAT_UNSPECIFIED = 0x0, /* default unspecified */
+	GVE_GQI_RDA_FORMAT	     = 0x1, /* GQI Raw Addressing */
+	GVE_GQI_QPL_FORMAT	     = 0x2, /* GQI Queue Page List */
+	GVE_DQO_RDA_FORMAT	     = 0x3, /* DQO Raw Addressing */
+};
+
+enum gve_state_flags_bit {
+	GVE_PRIV_FLAGS_ADMIN_QUEUE_OK		= 1,
+	GVE_PRIV_FLAGS_DEVICE_RESOURCES_OK	= 2,
+	GVE_PRIV_FLAGS_DEVICE_RINGS_OK		= 3,
+	GVE_PRIV_FLAGS_NAPI_ENABLED		= 4,
+};
+
+#endif /* _GVE_H_ */
diff --git a/drivers/net/gve/base/gve_adminq.c b/drivers/net/gve/base/gve_adminq.c
new file mode 100644
index 0000000000..045d47615d
--- /dev/null
+++ b/drivers/net/gve/base/gve_adminq.c
@@ -0,0 +1,920 @@
+/* SPDX-License-Identifier: MIT
+ * Google Virtual Ethernet (gve) driver
+ * Copyright (C) 2015-2022 Google, Inc.
+ */
+
+#include "gve_adminq.h"
+#include "gve_register.h"
+
+#define GVE_MAX_ADMINQ_RELEASE_CHECK	500
+#define GVE_ADMINQ_SLEEP_LEN		20
+#define GVE_MAX_ADMINQ_EVENT_COUNTER_CHECK	100
+
+#define GVE_DEVICE_OPTION_ERROR_FMT "%s option error:\n Expected: length=%d, feature_mask=%x.\n Actual: length=%d, feature_mask=%x."
+
+#define GVE_DEVICE_OPTION_TOO_BIG_FMT "Length of %s option larger than expected. Possible older version of guest driver."
+
+static
+struct gve_device_option *gve_get_next_option(struct gve_device_descriptor *descriptor,
+					      struct gve_device_option *option)
+{
+	uintptr_t option_end, descriptor_end;
+
+	option_end = (uintptr_t)option + sizeof(*option) + be16_to_cpu(option->option_length);
+	descriptor_end = (uintptr_t)descriptor + be16_to_cpu(descriptor->total_length);
+
+	return option_end > descriptor_end ? NULL : (struct gve_device_option *)option_end;
+}
+
+static
+void gve_parse_device_option(struct gve_priv *priv,
+			     struct gve_device_option *option,
+			     struct gve_device_option_gqi_rda **dev_op_gqi_rda,
+			     struct gve_device_option_gqi_qpl **dev_op_gqi_qpl,
+			     struct gve_device_option_dqo_rda **dev_op_dqo_rda,
+			     struct gve_device_option_jumbo_frames **dev_op_jumbo_frames)
+{
+	u32 req_feat_mask = be32_to_cpu(option->required_features_mask);
+	u16 option_length = be16_to_cpu(option->option_length);
+	u16 option_id = be16_to_cpu(option->option_id);
+
+	/* If the length or feature mask doesn't match, continue without
+	 * enabling the feature.
+	 */
+	switch (option_id) {
+	case GVE_DEV_OPT_ID_GQI_RAW_ADDRESSING:
+		if (option_length != GVE_DEV_OPT_LEN_GQI_RAW_ADDRESSING ||
+		    req_feat_mask != GVE_DEV_OPT_REQ_FEAT_MASK_GQI_RAW_ADDRESSING) {
+			PMD_DRV_LOG(WARNING, GVE_DEVICE_OPTION_ERROR_FMT,
+				    "Raw Addressing",
+				    GVE_DEV_OPT_LEN_GQI_RAW_ADDRESSING,
+				    GVE_DEV_OPT_REQ_FEAT_MASK_GQI_RAW_ADDRESSING,
+				    option_length, req_feat_mask);
+			break;
+		}
+
+		PMD_DRV_LOG(INFO, "Gqi raw addressing device option enabled.");
+		priv->queue_format = GVE_GQI_RDA_FORMAT;
+		break;
+	case GVE_DEV_OPT_ID_GQI_RDA:
+		if (option_length < sizeof(**dev_op_gqi_rda) ||
+		    req_feat_mask != GVE_DEV_OPT_REQ_FEAT_MASK_GQI_RDA) {
+			PMD_DRV_LOG(WARNING, GVE_DEVICE_OPTION_ERROR_FMT,
+				    "GQI RDA", (int)sizeof(**dev_op_gqi_rda),
+				    GVE_DEV_OPT_REQ_FEAT_MASK_GQI_RDA,
+				    option_length, req_feat_mask);
+			break;
+		}
+
+		if (option_length > sizeof(**dev_op_gqi_rda)) {
+			PMD_DRV_LOG(WARNING,
+				    GVE_DEVICE_OPTION_TOO_BIG_FMT, "GQI RDA");
+		}
+		*dev_op_gqi_rda = RTE_PTR_ADD(option, sizeof(*option));
+		break;
+	case GVE_DEV_OPT_ID_GQI_QPL:
+		if (option_length < sizeof(**dev_op_gqi_qpl) ||
+		    req_feat_mask != GVE_DEV_OPT_REQ_FEAT_MASK_GQI_QPL) {
+			PMD_DRV_LOG(WARNING, GVE_DEVICE_OPTION_ERROR_FMT,
+				    "GQI QPL", (int)sizeof(**dev_op_gqi_qpl),
+				    GVE_DEV_OPT_REQ_FEAT_MASK_GQI_QPL,
+				    option_length, req_feat_mask);
+			break;
+		}
+
+		if (option_length > sizeof(**dev_op_gqi_qpl)) {
+			PMD_DRV_LOG(WARNING,
+				    GVE_DEVICE_OPTION_TOO_BIG_FMT, "GQI QPL");
+		}
+		*dev_op_gqi_qpl = RTE_PTR_ADD(option, sizeof(*option));
+		break;
+	case GVE_DEV_OPT_ID_DQO_RDA:
+		if (option_length < sizeof(**dev_op_dqo_rda) ||
+		    req_feat_mask != GVE_DEV_OPT_REQ_FEAT_MASK_DQO_RDA) {
+			PMD_DRV_LOG(WARNING, GVE_DEVICE_OPTION_ERROR_FMT,
+				    "DQO RDA", (int)sizeof(**dev_op_dqo_rda),
+				    GVE_DEV_OPT_REQ_FEAT_MASK_DQO_RDA,
+				    option_length, req_feat_mask);
+			break;
+		}
+
+		if (option_length > sizeof(**dev_op_dqo_rda)) {
+			PMD_DRV_LOG(WARNING,
+				    GVE_DEVICE_OPTION_TOO_BIG_FMT, "DQO RDA");
+		}
+		*dev_op_dqo_rda = RTE_PTR_ADD(option, sizeof(*option));
+		break;
+	case GVE_DEV_OPT_ID_JUMBO_FRAMES:
+		if (option_length < sizeof(**dev_op_jumbo_frames) ||
+		    req_feat_mask != GVE_DEV_OPT_REQ_FEAT_MASK_JUMBO_FRAMES) {
+			PMD_DRV_LOG(WARNING, GVE_DEVICE_OPTION_ERROR_FMT,
+				    "Jumbo Frames",
+				    (int)sizeof(**dev_op_jumbo_frames),
+				    GVE_DEV_OPT_REQ_FEAT_MASK_JUMBO_FRAMES,
+				    option_length, req_feat_mask);
+			break;
+		}
+
+		if (option_length > sizeof(**dev_op_jumbo_frames)) {
+			PMD_DRV_LOG(WARNING,
+				    GVE_DEVICE_OPTION_TOO_BIG_FMT,
+				    "Jumbo Frames");
+		}
+		*dev_op_jumbo_frames = RTE_PTR_ADD(option, sizeof(*option));
+		break;
+	default:
+		/* If we don't recognize the option just continue
+		 * without doing anything.
+		 */
+		PMD_DRV_LOG(DEBUG, "Unrecognized device option 0x%hx not enabled.",
+			    option_id);
+	}
+}
+
+/* Process all device options for a given describe device call. */
+static int
+gve_process_device_options(struct gve_priv *priv,
+			   struct gve_device_descriptor *descriptor,
+			   struct gve_device_option_gqi_rda **dev_op_gqi_rda,
+			   struct gve_device_option_gqi_qpl **dev_op_gqi_qpl,
+			   struct gve_device_option_dqo_rda **dev_op_dqo_rda,
+			   struct gve_device_option_jumbo_frames **dev_op_jumbo_frames)
+{
+	const int num_options = be16_to_cpu(descriptor->num_device_options);
+	struct gve_device_option *dev_opt;
+	int i;
+
+	/* The options struct directly follows the device descriptor. */
+	dev_opt = RTE_PTR_ADD(descriptor, sizeof(*descriptor));
+	for (i = 0; i < num_options; i++) {
+		struct gve_device_option *next_opt;
+
+		next_opt = gve_get_next_option(descriptor, dev_opt);
+		if (!next_opt) {
+			PMD_DRV_LOG(ERR,
+				    "options exceed device_descriptor's total length.");
+			return -EINVAL;
+		}
+
+		gve_parse_device_option(priv, dev_opt,
+					dev_op_gqi_rda, dev_op_gqi_qpl,
+					dev_op_dqo_rda, dev_op_jumbo_frames);
+		dev_opt = next_opt;
+	}
+
+	return 0;
+}
+
+int gve_adminq_alloc(struct gve_priv *priv)
+{
+	priv->adminq = gve_alloc_dma_mem(&priv->adminq_dma_mem, PAGE_SIZE);
+	if (unlikely(!priv->adminq))
+		return -ENOMEM;
+
+	priv->adminq_mask = (PAGE_SIZE / sizeof(union gve_adminq_command)) - 1;
+	priv->adminq_prod_cnt = 0;
+	priv->adminq_cmd_fail = 0;
+	priv->adminq_timeouts = 0;
+	priv->adminq_describe_device_cnt = 0;
+	priv->adminq_cfg_device_resources_cnt = 0;
+	priv->adminq_register_page_list_cnt = 0;
+	priv->adminq_unregister_page_list_cnt = 0;
+	priv->adminq_create_tx_queue_cnt = 0;
+	priv->adminq_create_rx_queue_cnt = 0;
+	priv->adminq_destroy_tx_queue_cnt = 0;
+	priv->adminq_destroy_rx_queue_cnt = 0;
+	priv->adminq_dcfg_device_resources_cnt = 0;
+	priv->adminq_set_driver_parameter_cnt = 0;
+	priv->adminq_report_stats_cnt = 0;
+	priv->adminq_report_link_speed_cnt = 0;
+	priv->adminq_get_ptype_map_cnt = 0;
+
+	/* Setup Admin queue with the device */
+	iowrite32be(priv->adminq_dma_mem.pa / PAGE_SIZE,
+		    &priv->reg_bar0->adminq_pfn);
+
+	gve_set_admin_queue_ok(priv);
+	return 0;
+}
+
+void gve_adminq_release(struct gve_priv *priv)
+{
+	int i = 0;
+
+	/* Tell the device the adminq is leaving */
+	iowrite32be(0x0, &priv->reg_bar0->adminq_pfn);
+	while (ioread32be(&priv->reg_bar0->adminq_pfn)) {
+		/* If this is reached the device is unrecoverable and still
+		 * holding memory. Continue looping to avoid memory corruption,
+		 * but WARN so it is visible what is going on.
+		 */
+		if (i == GVE_MAX_ADMINQ_RELEASE_CHECK)
+			PMD_DRV_LOG(WARNING, "Unrecoverable platform error!");
+		i++;
+		msleep(GVE_ADMINQ_SLEEP_LEN);
+	}
+	gve_clear_device_rings_ok(priv);
+	gve_clear_device_resources_ok(priv);
+	gve_clear_admin_queue_ok(priv);
+}
+
+void gve_adminq_free(struct gve_priv *priv)
+{
+	if (!gve_get_admin_queue_ok(priv))
+		return;
+	gve_adminq_release(priv);
+	gve_free_dma_mem(&priv->adminq_dma_mem);
+	gve_clear_admin_queue_ok(priv);
+}
+
+static void gve_adminq_kick_cmd(struct gve_priv *priv, u32 prod_cnt)
+{
+	iowrite32be(prod_cnt, &priv->reg_bar0->adminq_doorbell);
+}
+
+static bool gve_adminq_wait_for_cmd(struct gve_priv *priv, u32 prod_cnt)
+{
+	int i;
+
+	for (i = 0; i < GVE_MAX_ADMINQ_EVENT_COUNTER_CHECK; i++) {
+		if (ioread32be(&priv->reg_bar0->adminq_event_counter)
+		    == prod_cnt)
+			return true;
+		msleep(GVE_ADMINQ_SLEEP_LEN);
+	}
+
+	return false;
+}
+
+static int gve_adminq_parse_err(struct gve_priv *priv, u32 status)
+{
+	if (status != GVE_ADMINQ_COMMAND_PASSED &&
+	    status != GVE_ADMINQ_COMMAND_UNSET) {
+		PMD_DRV_LOG(ERR, "AQ command failed with status %d", status);
+		priv->adminq_cmd_fail++;
+	}
+	switch (status) {
+	case GVE_ADMINQ_COMMAND_PASSED:
+		return 0;
+	case GVE_ADMINQ_COMMAND_UNSET:
+		PMD_DRV_LOG(ERR, "parse_aq_err: err and status both unset, this should not be possible.");
+		return -EINVAL;
+	case GVE_ADMINQ_COMMAND_ERROR_ABORTED:
+	case GVE_ADMINQ_COMMAND_ERROR_CANCELLED:
+	case GVE_ADMINQ_COMMAND_ERROR_DATALOSS:
+	case GVE_ADMINQ_COMMAND_ERROR_FAILED_PRECONDITION:
+	case GVE_ADMINQ_COMMAND_ERROR_UNAVAILABLE:
+		return -EAGAIN;
+	case GVE_ADMINQ_COMMAND_ERROR_ALREADY_EXISTS:
+	case GVE_ADMINQ_COMMAND_ERROR_INTERNAL_ERROR:
+	case GVE_ADMINQ_COMMAND_ERROR_INVALID_ARGUMENT:
+	case GVE_ADMINQ_COMMAND_ERROR_NOT_FOUND:
+	case GVE_ADMINQ_COMMAND_ERROR_OUT_OF_RANGE:
+	case GVE_ADMINQ_COMMAND_ERROR_UNKNOWN_ERROR:
+		return -EINVAL;
+	case GVE_ADMINQ_COMMAND_ERROR_DEADLINE_EXCEEDED:
+		return -ETIMEDOUT;
+	case GVE_ADMINQ_COMMAND_ERROR_PERMISSION_DENIED:
+	case GVE_ADMINQ_COMMAND_ERROR_UNAUTHENTICATED:
+		return -EACCES;
+	case GVE_ADMINQ_COMMAND_ERROR_RESOURCE_EXHAUSTED:
+		return -ENOMEM;
+	case GVE_ADMINQ_COMMAND_ERROR_UNIMPLEMENTED:
+		return -ENOTSUP;
+	default:
+		PMD_DRV_LOG(ERR, "parse_aq_err: unknown status code %d",
+			    status);
+		return -EINVAL;
+	}
+}
+
+/* Flushes all AQ commands currently queued and waits for them to complete.
+ * If there are failures, it will return the first error.
+ */
+static int gve_adminq_kick_and_wait(struct gve_priv *priv)
+{
+	u32 tail, head;
+	u32 i;
+
+	tail = ioread32be(&priv->reg_bar0->adminq_event_counter);
+	head = priv->adminq_prod_cnt;
+
+	gve_adminq_kick_cmd(priv, head);
+	if (!gve_adminq_wait_for_cmd(priv, head)) {
+		PMD_DRV_LOG(ERR, "AQ commands timed out, need to reset AQ");
+		priv->adminq_timeouts++;
+		return -ENOTRECOVERABLE;
+	}
+
+	for (i = tail; i < head; i++) {
+		union gve_adminq_command *cmd;
+		u32 status, err;
+
+		cmd = &priv->adminq[i & priv->adminq_mask];
+		status = be32_to_cpu(READ_ONCE32(cmd->status));
+		err = gve_adminq_parse_err(priv, status);
+		if (err)
+			/* Return the first error if we failed. */
+			return err;
+	}
+
+	return 0;
+}
+
+/* This function is not threadsafe - the caller is responsible for any
+ * necessary locks.
+ */
+static int gve_adminq_issue_cmd(struct gve_priv *priv,
+				union gve_adminq_command *cmd_orig)
+{
+	union gve_adminq_command *cmd;
+	u32 opcode;
+	u32 tail;
+
+	tail = ioread32be(&priv->reg_bar0->adminq_event_counter);
+
+	/* Check if next command will overflow the buffer. */
+	if (((priv->adminq_prod_cnt + 1) & priv->adminq_mask) ==
+	    (tail & priv->adminq_mask)) {
+		int err;
+
+		/* Flush existing commands to make room. */
+		err = gve_adminq_kick_and_wait(priv);
+		if (err)
+			return err;
+
+		/* Retry. */
+		tail = ioread32be(&priv->reg_bar0->adminq_event_counter);
+		if (((priv->adminq_prod_cnt + 1) & priv->adminq_mask) ==
+		    (tail & priv->adminq_mask)) {
+			/* This should never happen. We just flushed the
+			 * command queue so there should be enough space.
+			 */
+			return -ENOMEM;
+		}
+	}
+
+	cmd = &priv->adminq[priv->adminq_prod_cnt & priv->adminq_mask];
+	priv->adminq_prod_cnt++;
+
+	memcpy(cmd, cmd_orig, sizeof(*cmd_orig));
+	opcode = be32_to_cpu(READ_ONCE32(cmd->opcode));
+
+	switch (opcode) {
+	case GVE_ADMINQ_DESCRIBE_DEVICE:
+		priv->adminq_describe_device_cnt++;
+		break;
+	case GVE_ADMINQ_CONFIGURE_DEVICE_RESOURCES:
+		priv->adminq_cfg_device_resources_cnt++;
+		break;
+	case GVE_ADMINQ_REGISTER_PAGE_LIST:
+		priv->adminq_register_page_list_cnt++;
+		break;
+	case GVE_ADMINQ_UNREGISTER_PAGE_LIST:
+		priv->adminq_unregister_page_list_cnt++;
+		break;
+	case GVE_ADMINQ_CREATE_TX_QUEUE:
+		priv->adminq_create_tx_queue_cnt++;
+		break;
+	case GVE_ADMINQ_CREATE_RX_QUEUE:
+		priv->adminq_create_rx_queue_cnt++;
+		break;
+	case GVE_ADMINQ_DESTROY_TX_QUEUE:
+		priv->adminq_destroy_tx_queue_cnt++;
+		break;
+	case GVE_ADMINQ_DESTROY_RX_QUEUE:
+		priv->adminq_destroy_rx_queue_cnt++;
+		break;
+	case GVE_ADMINQ_DECONFIGURE_DEVICE_RESOURCES:
+		priv->adminq_dcfg_device_resources_cnt++;
+		break;
+	case GVE_ADMINQ_SET_DRIVER_PARAMETER:
+		priv->adminq_set_driver_parameter_cnt++;
+		break;
+	case GVE_ADMINQ_REPORT_STATS:
+		priv->adminq_report_stats_cnt++;
+		break;
+	case GVE_ADMINQ_REPORT_LINK_SPEED:
+		priv->adminq_report_link_speed_cnt++;
+		break;
+	case GVE_ADMINQ_GET_PTYPE_MAP:
+		priv->adminq_get_ptype_map_cnt++;
+		break;
+	default:
+		PMD_DRV_LOG(ERR, "unknown AQ command opcode %d", opcode);
+	}
+
+	return 0;
+}
+
+/* This function is not threadsafe - the caller is responsible for any
+ * necessary locks.
+ * The caller is also responsible for making sure there are no commands
+ * waiting to be executed.
+ */
+static int gve_adminq_execute_cmd(struct gve_priv *priv,
+				  union gve_adminq_command *cmd_orig)
+{
+	u32 tail, head;
+	int err;
+
+	tail = ioread32be(&priv->reg_bar0->adminq_event_counter);
+	head = priv->adminq_prod_cnt;
+	if (tail != head)
+		/* This is not a valid path */
+		return -EINVAL;
+
+	err = gve_adminq_issue_cmd(priv, cmd_orig);
+	if (err)
+		return err;
+
+	return gve_adminq_kick_and_wait(priv);
+}
+
+/* The device specifies that the management vector can either be the first irq
+ * or the last irq. ntfy_blk_msix_base_idx indicates the first irq assigned to
+ * the ntfy blks. If it is 0 then the management vector is last, if it is 1 then
+ * the management vector is first.
+ *
+ * gve arranges the msix vectors so that the management vector is last.
+ */
+#define GVE_NTFY_BLK_BASE_MSIX_IDX	0
+int gve_adminq_configure_device_resources(struct gve_priv *priv,
+					  dma_addr_t counter_array_bus_addr,
+					  u32 num_counters,
+					  dma_addr_t db_array_bus_addr,
+					  u32 num_ntfy_blks)
+{
+	union gve_adminq_command cmd;
+
+	memset(&cmd, 0, sizeof(cmd));
+	cmd.opcode = cpu_to_be32(GVE_ADMINQ_CONFIGURE_DEVICE_RESOURCES);
+	cmd.configure_device_resources =
+		(struct gve_adminq_configure_device_resources) {
+		.counter_array = cpu_to_be64(counter_array_bus_addr),
+		.num_counters = cpu_to_be32(num_counters),
+		.irq_db_addr = cpu_to_be64(db_array_bus_addr),
+		.num_irq_dbs = cpu_to_be32(num_ntfy_blks),
+		.irq_db_stride = cpu_to_be32(sizeof(*priv->irq_dbs)),
+		.ntfy_blk_msix_base_idx =
+					cpu_to_be32(GVE_NTFY_BLK_BASE_MSIX_IDX),
+		.queue_format = priv->queue_format,
+	};
+
+	return gve_adminq_execute_cmd(priv, &cmd);
+}
+
+int gve_adminq_deconfigure_device_resources(struct gve_priv *priv)
+{
+	union gve_adminq_command cmd;
+
+	memset(&cmd, 0, sizeof(cmd));
+	cmd.opcode = cpu_to_be32(GVE_ADMINQ_DECONFIGURE_DEVICE_RESOURCES);
+
+	return gve_adminq_execute_cmd(priv, &cmd);
+}
+
+static int gve_adminq_create_tx_queue(struct gve_priv *priv, u32 queue_index)
+{
+	struct gve_tx_queue *txq = priv->txqs[queue_index];
+	union gve_adminq_command cmd;
+
+	memset(&cmd, 0, sizeof(cmd));
+	cmd.opcode = cpu_to_be32(GVE_ADMINQ_CREATE_TX_QUEUE);
+	cmd.create_tx_queue = (struct gve_adminq_create_tx_queue) {
+		.queue_id = cpu_to_be32(queue_index),
+		.queue_resources_addr =
+			cpu_to_be64(txq->qres_mz->iova),
+		.tx_ring_addr = cpu_to_be64(txq->tx_ring_phys_addr),
+		.ntfy_id = cpu_to_be32(txq->ntfy_id),
+	};
+
+	if (gve_is_gqi(priv)) {
+		u32 qpl_id = priv->queue_format == GVE_GQI_RDA_FORMAT ?
+			GVE_RAW_ADDRESSING_QPL_ID : txq->qpl->id;
+
+		cmd.create_tx_queue.queue_page_list_id = cpu_to_be32(qpl_id);
+	} else {
+		cmd.create_tx_queue.tx_ring_size =
+			cpu_to_be16(txq->nb_tx_desc);
+		cmd.create_tx_queue.tx_comp_ring_addr =
+			cpu_to_be64(txq->complq->tx_ring_phys_addr);
+		cmd.create_tx_queue.tx_comp_ring_size =
+			cpu_to_be16(priv->tx_compq_size);
+	}
+
+	return gve_adminq_issue_cmd(priv, &cmd);
+}
+
+int gve_adminq_create_tx_queues(struct gve_priv *priv, u32 num_queues)
+{
+	int err;
+	u32 i;
+
+	for (i = 0; i < num_queues; i++) {
+		err = gve_adminq_create_tx_queue(priv, i);
+		if (err)
+			return err;
+	}
+
+	return gve_adminq_kick_and_wait(priv);
+}
+
+static int gve_adminq_create_rx_queue(struct gve_priv *priv, u32 queue_index)
+{
+	struct gve_rx_queue *rxq = priv->rxqs[queue_index];
+	union gve_adminq_command cmd;
+
+	memset(&cmd, 0, sizeof(cmd));
+	cmd.opcode = cpu_to_be32(GVE_ADMINQ_CREATE_RX_QUEUE);
+	cmd.create_rx_queue = (struct gve_adminq_create_rx_queue) {
+		.queue_id = cpu_to_be32(queue_index),
+		.ntfy_id = cpu_to_be32(rxq->ntfy_id),
+		.queue_resources_addr = cpu_to_be64(rxq->qres_mz->iova),
+	};
+
+	if (gve_is_gqi(priv)) {
+		u32 qpl_id = priv->queue_format == GVE_GQI_RDA_FORMAT ?
+			GVE_RAW_ADDRESSING_QPL_ID : rxq->qpl->id;
+
+		cmd.create_rx_queue.rx_desc_ring_addr =
+			cpu_to_be64(rxq->mz->iova),
+		cmd.create_rx_queue.rx_data_ring_addr =
+			cpu_to_be64(rxq->data_mz->iova),
+		cmd.create_rx_queue.index = cpu_to_be32(queue_index);
+		cmd.create_rx_queue.queue_page_list_id = cpu_to_be32(qpl_id);
+		cmd.create_rx_queue.packet_buffer_size = cpu_to_be16(rxq->rx_buf_len);
+	} else {
+		cmd.create_rx_queue.rx_ring_size =
+			cpu_to_be16(priv->rx_desc_cnt);
+		cmd.create_rx_queue.rx_desc_ring_addr =
+			cpu_to_be64(rxq->rx_ring_phys_addr);
+		cmd.create_rx_queue.rx_data_ring_addr =
+			cpu_to_be64(rxq->bufq->rx_ring_phys_addr);
+		cmd.create_rx_queue.packet_buffer_size =
+			cpu_to_be16(rxq->rx_buf_len);
+		cmd.create_rx_queue.rx_buff_ring_size =
+			cpu_to_be16(priv->rx_bufq_size);
+		cmd.create_rx_queue.enable_rsc = !!(priv->enable_rsc);
+	}
+
+	return gve_adminq_issue_cmd(priv, &cmd);
+}
+
+int gve_adminq_create_rx_queues(struct gve_priv *priv, u32 num_queues)
+{
+	int err;
+	u32 i;
+
+	for (i = 0; i < num_queues; i++) {
+		err = gve_adminq_create_rx_queue(priv, i);
+		if (err)
+			return err;
+	}
+
+	return gve_adminq_kick_and_wait(priv);
+}
+
+static int gve_adminq_destroy_tx_queue(struct gve_priv *priv, u32 queue_index)
+{
+	union gve_adminq_command cmd;
+	int err;
+
+	memset(&cmd, 0, sizeof(cmd));
+	cmd.opcode = cpu_to_be32(GVE_ADMINQ_DESTROY_TX_QUEUE);
+	cmd.destroy_tx_queue = (struct gve_adminq_destroy_tx_queue) {
+		.queue_id = cpu_to_be32(queue_index),
+	};
+
+	err = gve_adminq_issue_cmd(priv, &cmd);
+	if (err)
+		return err;
+
+	return 0;
+}
+
+int gve_adminq_destroy_tx_queues(struct gve_priv *priv, u32 num_queues)
+{
+	int err;
+	u32 i;
+
+	for (i = 0; i < num_queues; i++) {
+		err = gve_adminq_destroy_tx_queue(priv, i);
+		if (err)
+			return err;
+	}
+
+	return gve_adminq_kick_and_wait(priv);
+}
+
+static int gve_adminq_destroy_rx_queue(struct gve_priv *priv, u32 queue_index)
+{
+	union gve_adminq_command cmd;
+	int err;
+
+	memset(&cmd, 0, sizeof(cmd));
+	cmd.opcode = cpu_to_be32(GVE_ADMINQ_DESTROY_RX_QUEUE);
+	cmd.destroy_rx_queue = (struct gve_adminq_destroy_rx_queue) {
+		.queue_id = cpu_to_be32(queue_index),
+	};
+
+	err = gve_adminq_issue_cmd(priv, &cmd);
+	if (err)
+		return err;
+
+	return 0;
+}
+
+int gve_adminq_destroy_rx_queues(struct gve_priv *priv, u32 num_queues)
+{
+	int err;
+	u32 i;
+
+	for (i = 0; i < num_queues; i++) {
+		err = gve_adminq_destroy_rx_queue(priv, i);
+		if (err)
+			return err;
+	}
+
+	return gve_adminq_kick_and_wait(priv);
+}
+
+static int gve_set_desc_cnt(struct gve_priv *priv,
+			    struct gve_device_descriptor *descriptor)
+{
+	priv->tx_desc_cnt = be16_to_cpu(descriptor->tx_queue_entries);
+	if (priv->tx_desc_cnt * sizeof(priv->txqs[0]->tx_desc_ring[0])
+	    < PAGE_SIZE) {
+		PMD_DRV_LOG(ERR, "Tx desc count %d too low", priv->tx_desc_cnt);
+		return -EINVAL;
+	}
+	priv->rx_desc_cnt = be16_to_cpu(descriptor->rx_queue_entries);
+	if (priv->rx_desc_cnt * sizeof(priv->rxqs[0]->rx_desc_ring[0])
+	    < PAGE_SIZE) {
+		PMD_DRV_LOG(ERR, "Rx desc count %d too low", priv->rx_desc_cnt);
+		return -EINVAL;
+	}
+	return 0;
+}
+
+static int
+gve_set_desc_cnt_dqo(struct gve_priv *priv,
+		     const struct gve_device_descriptor *descriptor,
+		     const struct gve_device_option_dqo_rda *dev_op_dqo_rda)
+{
+	priv->tx_desc_cnt = be16_to_cpu(descriptor->tx_queue_entries);
+	priv->tx_compq_size = be16_to_cpu(dev_op_dqo_rda->tx_comp_ring_entries);
+	priv->rx_desc_cnt = be16_to_cpu(descriptor->rx_queue_entries);
+	priv->rx_bufq_size = be16_to_cpu(dev_op_dqo_rda->rx_buff_ring_entries);
+
+	return 0;
+}
+
+static void gve_enable_supported_features(struct gve_priv *priv,
+					  u32 supported_features_mask,
+					  const struct gve_device_option_jumbo_frames
+						  *dev_op_jumbo_frames)
+{
+	/* Before control reaches this point, the page-size-capped max MTU from
+	 * the gve_device_descriptor field has already been stored in
+	 * priv->dev->max_mtu. We overwrite it with the true max MTU below.
+	 */
+	if (dev_op_jumbo_frames &&
+	    (supported_features_mask & GVE_SUP_JUMBO_FRAMES_MASK)) {
+		PMD_DRV_LOG(INFO, "JUMBO FRAMES device option enabled.");
+		priv->max_mtu = be16_to_cpu(dev_op_jumbo_frames->max_mtu);
+	}
+}
+
+int gve_adminq_describe_device(struct gve_priv *priv)
+{
+	struct gve_device_option_jumbo_frames *dev_op_jumbo_frames = NULL;
+	struct gve_device_option_gqi_rda *dev_op_gqi_rda = NULL;
+	struct gve_device_option_gqi_qpl *dev_op_gqi_qpl = NULL;
+	struct gve_device_option_dqo_rda *dev_op_dqo_rda = NULL;
+	struct gve_device_descriptor *descriptor;
+	struct gve_dma_mem descriptor_dma_mem;
+	u32 supported_features_mask = 0;
+	union gve_adminq_command cmd;
+	int err = 0;
+	u16 mtu;
+
+	memset(&cmd, 0, sizeof(cmd));
+	descriptor = gve_alloc_dma_mem(&descriptor_dma_mem, PAGE_SIZE);
+	if (!descriptor)
+		return -ENOMEM;
+	cmd.opcode = cpu_to_be32(GVE_ADMINQ_DESCRIBE_DEVICE);
+	cmd.describe_device.device_descriptor_addr =
+					cpu_to_be64(descriptor_dma_mem.pa);
+	cmd.describe_device.device_descriptor_version =
+			cpu_to_be32(GVE_ADMINQ_DEVICE_DESCRIPTOR_VERSION);
+	cmd.describe_device.available_length = cpu_to_be32(PAGE_SIZE);
+
+	err = gve_adminq_execute_cmd(priv, &cmd);
+	if (err)
+		goto free_device_descriptor;
+
+	err = gve_process_device_options(priv, descriptor, &dev_op_gqi_rda,
+					 &dev_op_gqi_qpl, &dev_op_dqo_rda,
+					 &dev_op_jumbo_frames);
+	if (err)
+		goto free_device_descriptor;
+
+	/* If the GQI_RAW_ADDRESSING option is not enabled and the queue format
+	 * is not set to GqiRda, choose the queue format in a priority order:
+	 * DqoRda, GqiRda, GqiQpl. Use GqiQpl as default.
+	 */
+	if (dev_op_dqo_rda) {
+		priv->queue_format = GVE_DQO_RDA_FORMAT;
+		PMD_DRV_LOG(INFO, "Driver is running with DQO RDA queue format.");
+		supported_features_mask =
+			be32_to_cpu(dev_op_dqo_rda->supported_features_mask);
+	} else if (dev_op_gqi_rda) {
+		priv->queue_format = GVE_GQI_RDA_FORMAT;
+		PMD_DRV_LOG(INFO, "Driver is running with GQI RDA queue format.");
+		supported_features_mask =
+			be32_to_cpu(dev_op_gqi_rda->supported_features_mask);
+	} else if (priv->queue_format == GVE_GQI_RDA_FORMAT) {
+		PMD_DRV_LOG(INFO, "Driver is running with GQI RDA queue format.");
+	} else {
+		priv->queue_format = GVE_GQI_QPL_FORMAT;
+		if (dev_op_gqi_qpl)
+			supported_features_mask =
+				be32_to_cpu(dev_op_gqi_qpl->supported_features_mask);
+		PMD_DRV_LOG(INFO, "Driver is running with GQI QPL queue format.");
+	}
+	if (gve_is_gqi(priv)) {
+		err = gve_set_desc_cnt(priv, descriptor);
+	} else {
+		/* DQO supports LRO. */
+		err = gve_set_desc_cnt_dqo(priv, descriptor, dev_op_dqo_rda);
+	}
+	if (err)
+		goto free_device_descriptor;
+
+	priv->max_registered_pages =
+				be64_to_cpu(descriptor->max_registered_pages);
+	mtu = be16_to_cpu(descriptor->mtu);
+	if (mtu < ETH_MIN_MTU) {
+		PMD_DRV_LOG(ERR, "MTU %d below minimum MTU", mtu);
+		err = -EINVAL;
+		goto free_device_descriptor;
+	}
+	priv->max_mtu = mtu;
+	priv->num_event_counters = be16_to_cpu(descriptor->counters);
+	rte_memcpy(priv->dev_addr.addr_bytes, descriptor->mac, ETH_ALEN);
+	PMD_DRV_LOG(INFO, "MAC addr: " RTE_ETHER_ADDR_PRT_FMT,
+		    RTE_ETHER_ADDR_BYTES(&priv->dev_addr));
+	priv->tx_pages_per_qpl = be16_to_cpu(descriptor->tx_pages_per_qpl);
+	priv->rx_data_slot_cnt = be16_to_cpu(descriptor->rx_pages_per_qpl);
+
+	if (gve_is_gqi(priv) && priv->rx_data_slot_cnt < priv->rx_desc_cnt) {
+		PMD_DRV_LOG(ERR,
+			    "rx_data_slot_cnt cannot be smaller than rx_desc_cnt, setting rx_desc_cnt down to %d",
+			    priv->rx_data_slot_cnt);
+		priv->rx_desc_cnt = priv->rx_data_slot_cnt;
+	}
+	priv->default_num_queues = be16_to_cpu(descriptor->default_num_queues);
+
+	gve_enable_supported_features(priv, supported_features_mask,
+				      dev_op_jumbo_frames);
+
+free_device_descriptor:
+	gve_free_dma_mem(&descriptor_dma_mem);
+	return err;
+}
+
+int gve_adminq_register_page_list(struct gve_priv *priv,
+				  struct gve_queue_page_list *qpl)
+{
+	struct gve_dma_mem page_list_dma_mem;
+	u32 num_entries = qpl->num_entries;
+	u32 size = num_entries * sizeof(qpl->page_buses[0]);
+	union gve_adminq_command cmd;
+	__be64 *page_list;
+	int err;
+	u32 i;
+
+	memset(&cmd, 0, sizeof(cmd));
+	page_list = gve_alloc_dma_mem(&page_list_dma_mem, size);
+	if (!page_list)
+		return -ENOMEM;
+
+	for (i = 0; i < num_entries; i++)
+		page_list[i] = cpu_to_be64(qpl->page_buses[i]);
+
+	cmd.opcode = cpu_to_be32(GVE_ADMINQ_REGISTER_PAGE_LIST);
+	cmd.reg_page_list = (struct gve_adminq_register_page_list) {
+		.page_list_id = cpu_to_be32(qpl->id),
+		.num_pages = cpu_to_be32(num_entries),
+		.page_address_list_addr = cpu_to_be64(page_list_dma_mem.pa),
+	};
+
+	err = gve_adminq_execute_cmd(priv, &cmd);
+	gve_free_dma_mem(&page_list_dma_mem);
+	return err;
+}
+
+int gve_adminq_unregister_page_list(struct gve_priv *priv, u32 page_list_id)
+{
+	union gve_adminq_command cmd;
+
+	memset(&cmd, 0, sizeof(cmd));
+	cmd.opcode = cpu_to_be32(GVE_ADMINQ_UNREGISTER_PAGE_LIST);
+	cmd.unreg_page_list = (struct gve_adminq_unregister_page_list) {
+		.page_list_id = cpu_to_be32(page_list_id),
+	};
+
+	return gve_adminq_execute_cmd(priv, &cmd);
+}
+
+int gve_adminq_set_mtu(struct gve_priv *priv, u64 mtu)
+{
+	union gve_adminq_command cmd;
+
+	memset(&cmd, 0, sizeof(cmd));
+	cmd.opcode = cpu_to_be32(GVE_ADMINQ_SET_DRIVER_PARAMETER);
+	cmd.set_driver_param = (struct gve_adminq_set_driver_parameter) {
+		.parameter_type = cpu_to_be32(GVE_SET_PARAM_MTU),
+		.parameter_value = cpu_to_be64(mtu),
+	};
+
+	return gve_adminq_execute_cmd(priv, &cmd);
+}
+
+int gve_adminq_report_stats(struct gve_priv *priv, u64 stats_report_len,
+			    dma_addr_t stats_report_addr, u64 interval)
+{
+	union gve_adminq_command cmd;
+
+	memset(&cmd, 0, sizeof(cmd));
+	cmd.opcode = cpu_to_be32(GVE_ADMINQ_REPORT_STATS);
+	cmd.report_stats = (struct gve_adminq_report_stats) {
+		.stats_report_len = cpu_to_be64(stats_report_len),
+		.stats_report_addr = cpu_to_be64(stats_report_addr),
+		.interval = cpu_to_be64(interval),
+	};
+
+	return gve_adminq_execute_cmd(priv, &cmd);
+}
+
+int gve_adminq_report_link_speed(struct gve_priv *priv)
+{
+	struct gve_dma_mem link_speed_region_dma_mem;
+	union gve_adminq_command gvnic_cmd;
+	u64 *link_speed_region;
+	int err;
+
+	link_speed_region = gve_alloc_dma_mem(&link_speed_region_dma_mem,
+					      sizeof(*link_speed_region));
+
+	if (!link_speed_region)
+		return -ENOMEM;
+
+	memset(&gvnic_cmd, 0, sizeof(gvnic_cmd));
+	gvnic_cmd.opcode = cpu_to_be32(GVE_ADMINQ_REPORT_LINK_SPEED);
+	gvnic_cmd.report_link_speed.link_speed_address =
+		cpu_to_be64(link_speed_region_dma_mem.pa);
+
+	err = gve_adminq_execute_cmd(priv, &gvnic_cmd);
+
+	priv->link_speed = be64_to_cpu(*link_speed_region);
+	gve_free_dma_mem(&link_speed_region_dma_mem);
+	return err;
+}
+
+int gve_adminq_get_ptype_map_dqo(struct gve_priv *priv,
+				 struct gve_ptype_lut *ptype_lut)
+{
+	struct gve_dma_mem ptype_map_dma_mem;
+	struct gve_ptype_map *ptype_map;
+	union gve_adminq_command cmd;
+	int err = 0;
+	int i;
+
+	memset(&cmd, 0, sizeof(cmd));
+	ptype_map = gve_alloc_dma_mem(&ptype_map_dma_mem, sizeof(*ptype_map));
+	if (!ptype_map)
+		return -ENOMEM;
+
+	cmd.opcode = cpu_to_be32(GVE_ADMINQ_GET_PTYPE_MAP);
+	cmd.get_ptype_map = (struct gve_adminq_get_ptype_map) {
+		.ptype_map_len = cpu_to_be64(sizeof(*ptype_map)),
+		.ptype_map_addr = cpu_to_be64(ptype_map_dma_mem.pa),
+	};
+
+	err = gve_adminq_execute_cmd(priv, &cmd);
+	if (err)
+		goto err;
+
+	/* Populate ptype_lut. */
+	for (i = 0; i < GVE_NUM_PTYPES; i++) {
+		ptype_lut->ptypes[i].l3_type =
+			ptype_map->ptypes[i].l3_type;
+		ptype_lut->ptypes[i].l4_type =
+			ptype_map->ptypes[i].l4_type;
+	}
+err:
+	gve_free_dma_mem(&ptype_map_dma_mem);
+	return err;
+}
diff --git a/drivers/net/gve/base/gve_adminq.h b/drivers/net/gve/base/gve_adminq.h
new file mode 100644
index 0000000000..b2422d7dc8
--- /dev/null
+++ b/drivers/net/gve/base/gve_adminq.h
@@ -0,0 +1,379 @@
+/* SPDX-License-Identifier: MIT
+ * Google Virtual Ethernet (gve) driver
+ * Copyright (C) 2015-2022 Google, Inc.
+ */
+
+#ifndef _GVE_ADMINQ_H
+#define _GVE_ADMINQ_H
+
+/* Admin queue opcodes */
+enum gve_adminq_opcodes {
+	GVE_ADMINQ_DESCRIBE_DEVICE		= 0x1,
+	GVE_ADMINQ_CONFIGURE_DEVICE_RESOURCES	= 0x2,
+	GVE_ADMINQ_REGISTER_PAGE_LIST		= 0x3,
+	GVE_ADMINQ_UNREGISTER_PAGE_LIST		= 0x4,
+	GVE_ADMINQ_CREATE_TX_QUEUE		= 0x5,
+	GVE_ADMINQ_CREATE_RX_QUEUE		= 0x6,
+	GVE_ADMINQ_DESTROY_TX_QUEUE		= 0x7,
+	GVE_ADMINQ_DESTROY_RX_QUEUE		= 0x8,
+	GVE_ADMINQ_DECONFIGURE_DEVICE_RESOURCES	= 0x9,
+	GVE_ADMINQ_SET_DRIVER_PARAMETER		= 0xB,
+	GVE_ADMINQ_REPORT_STATS			= 0xC,
+	GVE_ADMINQ_REPORT_LINK_SPEED		= 0xD,
+	GVE_ADMINQ_GET_PTYPE_MAP		= 0xE,
+};
+
+/* Admin queue status codes */
+enum gve_adminq_statuses {
+	GVE_ADMINQ_COMMAND_UNSET			= 0x0,
+	GVE_ADMINQ_COMMAND_PASSED			= 0x1,
+	GVE_ADMINQ_COMMAND_ERROR_ABORTED		= 0xFFFFFFF0,
+	GVE_ADMINQ_COMMAND_ERROR_ALREADY_EXISTS		= 0xFFFFFFF1,
+	GVE_ADMINQ_COMMAND_ERROR_CANCELLED		= 0xFFFFFFF2,
+	GVE_ADMINQ_COMMAND_ERROR_DATALOSS		= 0xFFFFFFF3,
+	GVE_ADMINQ_COMMAND_ERROR_DEADLINE_EXCEEDED	= 0xFFFFFFF4,
+	GVE_ADMINQ_COMMAND_ERROR_FAILED_PRECONDITION	= 0xFFFFFFF5,
+	GVE_ADMINQ_COMMAND_ERROR_INTERNAL_ERROR		= 0xFFFFFFF6,
+	GVE_ADMINQ_COMMAND_ERROR_INVALID_ARGUMENT	= 0xFFFFFFF7,
+	GVE_ADMINQ_COMMAND_ERROR_NOT_FOUND		= 0xFFFFFFF8,
+	GVE_ADMINQ_COMMAND_ERROR_OUT_OF_RANGE		= 0xFFFFFFF9,
+	GVE_ADMINQ_COMMAND_ERROR_PERMISSION_DENIED	= 0xFFFFFFFA,
+	GVE_ADMINQ_COMMAND_ERROR_UNAUTHENTICATED	= 0xFFFFFFFB,
+	GVE_ADMINQ_COMMAND_ERROR_RESOURCE_EXHAUSTED	= 0xFFFFFFFC,
+	GVE_ADMINQ_COMMAND_ERROR_UNAVAILABLE		= 0xFFFFFFFD,
+	GVE_ADMINQ_COMMAND_ERROR_UNIMPLEMENTED		= 0xFFFFFFFE,
+	GVE_ADMINQ_COMMAND_ERROR_UNKNOWN_ERROR		= 0xFFFFFFFF,
+};
+
+#define GVE_ADMINQ_DEVICE_DESCRIPTOR_VERSION 1
+
+/* All AdminQ command structs should be naturally packed.
+ * GVE_CHECK_STRUCT/UNION_LEN will check struct/union length and throw
+ * error at compile time when the size is not correct.
+ */
+
+struct gve_adminq_describe_device {
+	__be64 device_descriptor_addr;
+	__be32 device_descriptor_version;
+	__be32 available_length;
+};
+
+GVE_CHECK_STRUCT_LEN(16, gve_adminq_describe_device);
+
+struct gve_device_descriptor {
+	__be64 max_registered_pages;
+	__be16 reserved1;
+	__be16 tx_queue_entries;
+	__be16 rx_queue_entries;
+	__be16 default_num_queues;
+	__be16 mtu;
+	__be16 counters;
+	__be16 tx_pages_per_qpl;
+	__be16 rx_pages_per_qpl;
+	u8  mac[ETH_ALEN];
+	__be16 num_device_options;
+	__be16 total_length;
+	u8  reserved2[6];
+};
+
+GVE_CHECK_STRUCT_LEN(40, gve_device_descriptor);
+
+struct gve_device_option {
+	__be16 option_id;
+	__be16 option_length;
+	__be32 required_features_mask;
+};
+
+GVE_CHECK_STRUCT_LEN(8, gve_device_option);
+
+struct gve_device_option_gqi_rda {
+	__be32 supported_features_mask;
+};
+
+GVE_CHECK_STRUCT_LEN(4, gve_device_option_gqi_rda);
+
+struct gve_device_option_gqi_qpl {
+	__be32 supported_features_mask;
+};
+
+GVE_CHECK_STRUCT_LEN(4, gve_device_option_gqi_qpl);
+
+struct gve_device_option_dqo_rda {
+	__be32 supported_features_mask;
+	__be16 tx_comp_ring_entries;
+	__be16 rx_buff_ring_entries;
+};
+
+GVE_CHECK_STRUCT_LEN(8, gve_device_option_dqo_rda);
+
+struct gve_device_option_jumbo_frames {
+	__be32 supported_features_mask;
+	__be16 max_mtu;
+	u8 padding[2];
+};
+
+GVE_CHECK_STRUCT_LEN(8, gve_device_option_jumbo_frames);
+
+/* Terminology:
+ *
+ * RDA - Raw DMA Addressing - Buffers associated with SKBs are directly DMA
+ *       mapped and read/updated by the device.
+ *
+ * QPL - Queue Page Lists - Driver uses bounce buffers which are DMA mapped with
+ *       the device for read/write and data is copied from/to SKBs.
+ */
+enum gve_dev_opt_id {
+	GVE_DEV_OPT_ID_GQI_RAW_ADDRESSING = 0x1,
+	GVE_DEV_OPT_ID_GQI_RDA = 0x2,
+	GVE_DEV_OPT_ID_GQI_QPL = 0x3,
+	GVE_DEV_OPT_ID_DQO_RDA = 0x4,
+	GVE_DEV_OPT_ID_JUMBO_FRAMES = 0x8,
+};
+
+enum gve_dev_opt_req_feat_mask {
+	GVE_DEV_OPT_REQ_FEAT_MASK_GQI_RAW_ADDRESSING = 0x0,
+	GVE_DEV_OPT_REQ_FEAT_MASK_GQI_RDA = 0x0,
+	GVE_DEV_OPT_REQ_FEAT_MASK_GQI_QPL = 0x0,
+	GVE_DEV_OPT_REQ_FEAT_MASK_DQO_RDA = 0x0,
+	GVE_DEV_OPT_REQ_FEAT_MASK_JUMBO_FRAMES = 0x0,
+};
+
+enum gve_sup_feature_mask {
+	GVE_SUP_JUMBO_FRAMES_MASK = 1 << 2,
+};
+
+#define GVE_DEV_OPT_LEN_GQI_RAW_ADDRESSING 0x0
+
+struct gve_adminq_configure_device_resources {
+	__be64 counter_array;
+	__be64 irq_db_addr;
+	__be32 num_counters;
+	__be32 num_irq_dbs;
+	__be32 irq_db_stride;
+	__be32 ntfy_blk_msix_base_idx;
+	u8 queue_format;
+	u8 padding[7];
+};
+
+GVE_CHECK_STRUCT_LEN(40, gve_adminq_configure_device_resources);
+
+struct gve_adminq_register_page_list {
+	__be32 page_list_id;
+	__be32 num_pages;
+	__be64 page_address_list_addr;
+};
+
+GVE_CHECK_STRUCT_LEN(16, gve_adminq_register_page_list);
+
+struct gve_adminq_unregister_page_list {
+	__be32 page_list_id;
+};
+
+GVE_CHECK_STRUCT_LEN(4, gve_adminq_unregister_page_list);
+
+#define GVE_RAW_ADDRESSING_QPL_ID 0xFFFFFFFF
+
+struct gve_adminq_create_tx_queue {
+	__be32 queue_id;
+	__be32 reserved;
+	__be64 queue_resources_addr;
+	__be64 tx_ring_addr;
+	__be32 queue_page_list_id;
+	__be32 ntfy_id;
+	__be64 tx_comp_ring_addr;
+	__be16 tx_ring_size;
+	__be16 tx_comp_ring_size;
+	u8 padding[4];
+};
+
+GVE_CHECK_STRUCT_LEN(48, gve_adminq_create_tx_queue);
+
+struct gve_adminq_create_rx_queue {
+	__be32 queue_id;
+	__be32 index;
+	__be32 reserved;
+	__be32 ntfy_id;
+	__be64 queue_resources_addr;
+	__be64 rx_desc_ring_addr;
+	__be64 rx_data_ring_addr;
+	__be32 queue_page_list_id;
+	__be16 rx_ring_size;
+	__be16 packet_buffer_size;
+	__be16 rx_buff_ring_size;
+	u8 enable_rsc;
+	u8 padding[5];
+};
+
+GVE_CHECK_STRUCT_LEN(56, gve_adminq_create_rx_queue);
+
+/* Queue resources that are shared with the device */
+struct gve_queue_resources {
+	union {
+		struct {
+			__be32 db_index;	/* Device -> Guest */
+			__be32 counter_index;	/* Device -> Guest */
+		};
+		u8 reserved[64];
+	};
+};
+
+GVE_CHECK_STRUCT_LEN(64, gve_queue_resources);
+
+struct gve_adminq_destroy_tx_queue {
+	__be32 queue_id;
+};
+
+GVE_CHECK_STRUCT_LEN(4, gve_adminq_destroy_tx_queue);
+
+struct gve_adminq_destroy_rx_queue {
+	__be32 queue_id;
+};
+
+GVE_CHECK_STRUCT_LEN(4, gve_adminq_destroy_rx_queue);
+
+/* GVE Set Driver Parameter Types */
+enum gve_set_driver_param_types {
+	GVE_SET_PARAM_MTU	= 0x1,
+};
+
+struct gve_adminq_set_driver_parameter {
+	__be32 parameter_type;
+	u8 reserved[4];
+	__be64 parameter_value;
+};
+
+GVE_CHECK_STRUCT_LEN(16, gve_adminq_set_driver_parameter);
+
+struct gve_adminq_report_stats {
+	__be64 stats_report_len;
+	__be64 stats_report_addr;
+	__be64 interval;
+};
+
+GVE_CHECK_STRUCT_LEN(24, gve_adminq_report_stats);
+
+struct gve_adminq_report_link_speed {
+	__be64 link_speed_address;
+};
+
+GVE_CHECK_STRUCT_LEN(8, gve_adminq_report_link_speed);
+
+struct stats {
+	__be32 stat_name;
+	__be32 queue_id;
+	__be64 value;
+};
+
+GVE_CHECK_STRUCT_LEN(16, stats);
+
+struct gve_stats_report {
+	__be64 written_count;
+	struct stats stats[];
+};
+
+GVE_CHECK_STRUCT_LEN(8, gve_stats_report);
+
+enum gve_stat_names {
+	/* stats from gve */
+	TX_WAKE_CNT			= 1,
+	TX_STOP_CNT			= 2,
+	TX_FRAMES_SENT			= 3,
+	TX_BYTES_SENT			= 4,
+	TX_LAST_COMPLETION_PROCESSED	= 5,
+	RX_NEXT_EXPECTED_SEQUENCE	= 6,
+	RX_BUFFERS_POSTED		= 7,
+	TX_TIMEOUT_CNT			= 8,
+	/* stats from NIC */
+	RX_QUEUE_DROP_CNT		= 65,
+	RX_NO_BUFFERS_POSTED		= 66,
+	RX_DROPS_PACKET_OVER_MRU	= 67,
+	RX_DROPS_INVALID_CHECKSUM	= 68,
+};
+
+enum gve_l3_type {
+	/* Must be zero so zero initialized LUT is unknown. */
+	GVE_L3_TYPE_UNKNOWN = 0,
+	GVE_L3_TYPE_OTHER,
+	GVE_L3_TYPE_IPV4,
+	GVE_L3_TYPE_IPV6,
+};
+
+enum gve_l4_type {
+	/* Must be zero so zero initialized LUT is unknown. */
+	GVE_L4_TYPE_UNKNOWN = 0,
+	GVE_L4_TYPE_OTHER,
+	GVE_L4_TYPE_TCP,
+	GVE_L4_TYPE_UDP,
+	GVE_L4_TYPE_ICMP,
+	GVE_L4_TYPE_SCTP,
+};
+
+/* These are control path types for PTYPE which are the same as the data path
+ * types.
+ */
+struct gve_ptype_entry {
+	u8 l3_type;
+	u8 l4_type;
+};
+
+struct gve_ptype_map {
+	struct gve_ptype_entry ptypes[1 << 10]; /* PTYPES are always 10 bits. */
+};
+
+struct gve_adminq_get_ptype_map {
+	__be64 ptype_map_len;
+	__be64 ptype_map_addr;
+};
+
+union gve_adminq_command {
+	struct {
+		__be32 opcode;
+		__be32 status;
+		union {
+			struct gve_adminq_configure_device_resources
+						configure_device_resources;
+			struct gve_adminq_create_tx_queue create_tx_queue;
+			struct gve_adminq_create_rx_queue create_rx_queue;
+			struct gve_adminq_destroy_tx_queue destroy_tx_queue;
+			struct gve_adminq_destroy_rx_queue destroy_rx_queue;
+			struct gve_adminq_describe_device describe_device;
+			struct gve_adminq_register_page_list reg_page_list;
+			struct gve_adminq_unregister_page_list unreg_page_list;
+			struct gve_adminq_set_driver_parameter set_driver_param;
+			struct gve_adminq_report_stats report_stats;
+			struct gve_adminq_report_link_speed report_link_speed;
+			struct gve_adminq_get_ptype_map get_ptype_map;
+		};
+	};
+	u8 reserved[64];
+};
+
+GVE_CHECK_UNION_LEN(64, gve_adminq_command);
+
+int gve_adminq_alloc(struct gve_priv *priv);
+void gve_adminq_free(struct gve_priv *priv);
+void gve_adminq_release(struct gve_priv *priv);
+int gve_adminq_describe_device(struct gve_priv *priv);
+int gve_adminq_configure_device_resources(struct gve_priv *priv,
+					  dma_addr_t counter_array_bus_addr,
+					  u32 num_counters,
+					  dma_addr_t db_array_bus_addr,
+					  u32 num_ntfy_blks);
+int gve_adminq_deconfigure_device_resources(struct gve_priv *priv);
+int gve_adminq_create_tx_queues(struct gve_priv *priv, u32 num_queues);
+int gve_adminq_destroy_tx_queues(struct gve_priv *priv, u32 queue_id);
+int gve_adminq_create_rx_queues(struct gve_priv *priv, u32 num_queues);
+int gve_adminq_destroy_rx_queues(struct gve_priv *priv, u32 queue_id);
+int gve_adminq_register_page_list(struct gve_priv *priv,
+				  struct gve_queue_page_list *qpl);
+int gve_adminq_unregister_page_list(struct gve_priv *priv, u32 page_list_id);
+int gve_adminq_set_mtu(struct gve_priv *priv, u64 mtu);
+int gve_adminq_report_stats(struct gve_priv *priv, u64 stats_report_len,
+			    dma_addr_t stats_report_addr, u64 interval);
+int gve_adminq_report_link_speed(struct gve_priv *priv);
+
+struct gve_ptype_lut;
+int gve_adminq_get_ptype_map_dqo(struct gve_priv *priv,
+				 struct gve_ptype_lut *ptype_lut);
+
+#endif /* _GVE_ADMINQ_H */
diff --git a/drivers/net/gve/base/gve_desc.h b/drivers/net/gve/base/gve_desc.h
new file mode 100644
index 0000000000..e0bbadcfd4
--- /dev/null
+++ b/drivers/net/gve/base/gve_desc.h
@@ -0,0 +1,136 @@
+/* SPDX-License-Identifier: MIT
+ * Google Virtual Ethernet (gve) driver
+ * Copyright (C) 2015-2022 Google, Inc.
+ */
+
+/* GVE Transmit Descriptor formats */
+
+#ifndef _GVE_DESC_H_
+#define _GVE_DESC_H_
+
+/* A note on seg_addrs
+ *
+ * Base addresses encoded in seg_addr are not assumed to be physical
+ * addresses. The ring format assumes these come from some linear address
+ * space. This could be physical memory, kernel virtual memory, user virtual
+ * memory.
+ * If raw dma addressing is not supported then gVNIC uses lists of registered
+ * pages. Each queue is assumed to be associated with a single such linear
+ * address space to ensure a consistent meaning for seg_addrs posted to its
+ * rings.
+ */
+
+struct gve_tx_pkt_desc {
+	u8	type_flags;  /* desc type is lower 4 bits, flags upper */
+	u8	l4_csum_offset;  /* relative offset of L4 csum word */
+	u8	l4_hdr_offset;  /* Offset of start of L4 headers in packet */
+	u8	desc_cnt;  /* Total descriptors for this packet */
+	__be16	len;  /* Total length of this packet (in bytes) */
+	__be16	seg_len;  /* Length of this descriptor's segment */
+	__be64	seg_addr;  /* Base address (see note) of this segment */
+} __packed;
+
+struct gve_tx_mtd_desc {
+	u8      type_flags;     /* type is lower 4 bits, subtype upper  */
+	u8      path_state;     /* state is lower 4 bits, hash type upper */
+	__be16  reserved0;
+	__be32  path_hash;
+	__be64  reserved1;
+} __packed;
+
+struct gve_tx_seg_desc {
+	u8	type_flags;	/* type is lower 4 bits, flags upper	*/
+	u8	l3_offset;	/* TSO: 2 byte units to start of IPH	*/
+	__be16	reserved;
+	__be16	mss;		/* TSO MSS				*/
+	__be16	seg_len;
+	__be64	seg_addr;
+} __packed;
+
+/* GVE Transmit Descriptor Types */
+#define	GVE_TXD_STD		(0x0 << 4) /* Std with Host Address	*/
+#define	GVE_TXD_TSO		(0x1 << 4) /* TSO with Host Address	*/
+#define	GVE_TXD_SEG		(0x2 << 4) /* Seg with Host Address	*/
+#define	GVE_TXD_MTD		(0x3 << 4) /* Metadata			*/
+
+/* GVE Transmit Descriptor Flags for Std Pkts */
+#define	GVE_TXF_L4CSUM	BIT(0)	/* Need csum offload */
+#define	GVE_TXF_TSTAMP	BIT(2)	/* Timestamp required */
+
+/* GVE Transmit Descriptor Flags for TSO Segs */
+#define	GVE_TXSF_IPV6	BIT(1)	/* IPv6 TSO */
+
+/* GVE Transmit Descriptor Options for MTD Segs */
+#define GVE_MTD_SUBTYPE_PATH		0
+
+#define GVE_MTD_PATH_STATE_DEFAULT	0
+#define GVE_MTD_PATH_STATE_TIMEOUT	1
+#define GVE_MTD_PATH_STATE_CONGESTION	2
+#define GVE_MTD_PATH_STATE_RETRANSMIT	3
+
+#define GVE_MTD_PATH_HASH_NONE         (0x0 << 4)
+#define GVE_MTD_PATH_HASH_L4           (0x1 << 4)
+
+/* GVE Receive Packet Descriptor */
+/* The start of an ethernet packet comes 2 bytes into the rx buffer.
+ * gVNIC adds this padding so that both the DMA and the L3/4 protocol header
+ * access is aligned.
+ */
+#define GVE_RX_PAD 2
+
+struct gve_rx_desc {
+	u8	padding[48];
+	__be32	rss_hash;  /* Receive-side scaling hash (Toeplitz for gVNIC) */
+	__be16	mss;
+	__be16	reserved;  /* Reserved to zero */
+	u8	hdr_len;  /* Header length (L2-L4) including padding */
+	u8	hdr_off;  /* 64-byte-scaled offset into RX_DATA entry */
+	__sum16	csum;  /* 1's-complement partial checksum of L3+ bytes */
+	__be16	len;  /* Length of the received packet */
+	__be16	flags_seq;  /* Flags [15:3] and sequence number [2:0] (1-7) */
+} __packed;
+GVE_CHECK_STRUCT_LEN(64, gve_rx_desc);
+
+/* If the device supports raw dma addressing then the addr in data slot is
+ * the dma address of the buffer.
+ * If the device only supports registered segments then the addr is a byte
+ * offset into the registered segment (an ordered list of pages) where the
+ * buffer is.
+ */
+union gve_rx_data_slot {
+	__be64 qpl_offset;
+	__be64 addr;
+};
+
+/* GVE Receive Packet Descriptor Seq No */
+#define GVE_SEQNO(x) (be16_to_cpu(x) & 0x7)
+
+/* GVE Receive Packet Descriptor Flags */
+#define GVE_RXFLG(x)	cpu_to_be16(1 << (3 + (x)))
+#define	GVE_RXF_FRAG		GVE_RXFLG(3)	/* IP Fragment			*/
+#define	GVE_RXF_IPV4		GVE_RXFLG(4)	/* IPv4				*/
+#define	GVE_RXF_IPV6		GVE_RXFLG(5)	/* IPv6				*/
+#define	GVE_RXF_TCP		GVE_RXFLG(6)	/* TCP Packet			*/
+#define	GVE_RXF_UDP		GVE_RXFLG(7)	/* UDP Packet			*/
+#define	GVE_RXF_ERR		GVE_RXFLG(8)	/* Packet Error Detected	*/
+#define	GVE_RXF_PKT_CONT	GVE_RXFLG(10)	/* Multi Fragment RX packet	*/
+
+/* GVE IRQ */
+#define GVE_IRQ_ACK	BIT(31)
+#define GVE_IRQ_MASK	BIT(30)
+#define GVE_IRQ_EVENT	BIT(29)
+
+static inline bool gve_needs_rss(__be16 flag)
+{
+	if (flag & GVE_RXF_FRAG)
+		return false;
+	if (flag & (GVE_RXF_IPV4 | GVE_RXF_IPV6))
+		return true;
+	return false;
+}
+
+static inline u8 gve_next_seqno(u8 seq)
+{
+	return (seq + 1) == 8 ? 1 : seq + 1;
+}
+#endif /* _GVE_DESC_H_ */
diff --git a/drivers/net/gve/base/gve_desc_dqo.h b/drivers/net/gve/base/gve_desc_dqo.h
new file mode 100644
index 0000000000..9965f190d1
--- /dev/null
+++ b/drivers/net/gve/base/gve_desc_dqo.h
@@ -0,0 +1,253 @@
+/* SPDX-License-Identifier: MIT
+ * Google Virtual Ethernet (gve) driver
+ * Copyright (C) 2015-2022 Google, Inc.
+ */
+
+/* GVE DQO Descriptor formats */
+
+#ifndef _GVE_DESC_DQO_H_
+#define _GVE_DESC_DQO_H_
+
+#define GVE_TX_MAX_HDR_SIZE_DQO 255
+#define GVE_TX_MIN_TSO_MSS_DQO 88
+
+#ifndef __LITTLE_ENDIAN_BITFIELD
+#error "Only little endian supported"
+#endif
+
+/* Basic TX descriptor (DTYPE 0x0C) */
+struct gve_tx_pkt_desc_dqo {
+	__le64 buf_addr;
+
+	/* Must be GVE_TX_PKT_DESC_DTYPE_DQO (0xc) */
+	u8 dtype: 5;
+
+	/* Denotes the last descriptor of a packet. */
+	u8 end_of_packet: 1;
+	u8 checksum_offload_enable: 1;
+
+	/* If set, will generate a descriptor completion for this descriptor. */
+	u8 report_event: 1;
+	u8 reserved0;
+	__le16 reserved1;
+
+	/* The TX completion associated with this packet will contain this tag.
+	 */
+	__le16 compl_tag;
+	u16 buf_size: 14;
+	u16 reserved2: 2;
+} __packed;
+GVE_CHECK_STRUCT_LEN(16, gve_tx_pkt_desc_dqo);
+
+#define GVE_TX_PKT_DESC_DTYPE_DQO 0xc
+#define GVE_TX_MAX_BUF_SIZE_DQO ((16 * 1024) - 1)
+
+/* Maximum number of data descriptors allowed per packet, or per-TSO segment. */
+#define GVE_TX_MAX_DATA_DESCS 10
+
+/* Min gap between tail and head to avoid cacheline overlap */
+#define GVE_TX_MIN_DESC_PREVENT_CACHE_OVERLAP 4
+
+/* "report_event" on TX packet descriptors may only be reported on the last
+ * descriptor of a TX packet, and they must be spaced apart with at least this
+ * value.
+ */
+#define GVE_TX_MIN_RE_INTERVAL 32
+
+struct gve_tx_context_cmd_dtype {
+	u8 dtype: 5;
+	u8 tso: 1;
+	u8 reserved1: 2;
+
+	u8 reserved2;
+};
+
+GVE_CHECK_STRUCT_LEN(2, gve_tx_context_cmd_dtype);
+
+/* TX Native TSO Context DTYPE (0x05)
+ *
+ * "flex" fields allow the driver to send additional packet context to HW.
+ */
+struct gve_tx_tso_context_desc_dqo {
+	/* The L4 payload bytes that should be segmented. */
+	u32 tso_total_len: 24;
+	u32 flex10: 8;
+
+	/* Max segment size in TSO excluding headers. */
+	u16 mss: 14;
+	u16 reserved: 2;
+
+	u8 header_len; /* Header length to use for TSO offload */
+	u8 flex11;
+	struct gve_tx_context_cmd_dtype cmd_dtype;
+	u8 flex0;
+	u8 flex5;
+	u8 flex6;
+	u8 flex7;
+	u8 flex8;
+	u8 flex9;
+} __packed;
+GVE_CHECK_STRUCT_LEN(16, gve_tx_tso_context_desc_dqo);
+
+#define GVE_TX_TSO_CTX_DESC_DTYPE_DQO 0x5
+
+/* General context descriptor for sending metadata. */
+struct gve_tx_general_context_desc_dqo {
+	u8 flex4;
+	u8 flex5;
+	u8 flex6;
+	u8 flex7;
+	u8 flex8;
+	u8 flex9;
+	u8 flex10;
+	u8 flex11;
+	struct gve_tx_context_cmd_dtype cmd_dtype;
+	u16 reserved;
+	u8 flex0;
+	u8 flex1;
+	u8 flex2;
+	u8 flex3;
+} __packed;
+GVE_CHECK_STRUCT_LEN(16, gve_tx_general_context_desc_dqo);
+
+#define GVE_TX_GENERAL_CTX_DESC_DTYPE_DQO 0x4
+
+/* Logical structure of metadata which is packed into context descriptor flex
+ * fields.
+ */
+struct gve_tx_metadata_dqo {
+	union {
+		struct {
+			u8 version;
+
+			/* If `skb->l4_hash` is set, this value should be
+			 * derived from `skb->hash`.
+			 *
+			 * A zero value means no l4_hash was associated with the
+			 * skb.
+			 */
+			u16 path_hash: 15;
+
+			/* Should be set to 1 if the flow associated with the
+			 * skb had a rehash from the TCP stack.
+			 */
+			u16 rehash_event: 1;
+		}  __packed;
+		u8 bytes[12];
+	};
+}  __packed;
+GVE_CHECK_STRUCT_LEN(12, gve_tx_metadata_dqo);
+
+#define GVE_TX_METADATA_VERSION_DQO 0
+
+/* TX completion descriptor */
+struct gve_tx_compl_desc {
+	/* For types 0-4 this is the TX queue ID associated with this
+	 * completion.
+	 */
+	u16 id: 11;
+
+	/* See: GVE_COMPL_TYPE_DQO* */
+	u16 type: 3;
+	u16 reserved0: 1;
+
+	/* Flipped by HW to notify the descriptor is populated. */
+	u16 generation: 1;
+	union {
+		/* For descriptor completions, this is the last index fetched
+		 * by HW + 1.
+		 */
+		__le16 tx_head;
+
+		/* For packet completions, this is the completion tag set on the
+		 * TX packet descriptors.
+		 */
+		__le16 completion_tag;
+	};
+	__le32 reserved1;
+} __packed;
+GVE_CHECK_STRUCT_LEN(8, gve_tx_compl_desc);
+
+#define GVE_COMPL_TYPE_DQO_PKT 0x2 /* Packet completion */
+#define GVE_COMPL_TYPE_DQO_DESC 0x4 /* Descriptor completion */
+#define GVE_COMPL_TYPE_DQO_MISS 0x1 /* Miss path completion */
+#define GVE_COMPL_TYPE_DQO_REINJECTION 0x3 /* Re-injection completion */
+
+/* Descriptor to post buffers to HW on buffer queue. */
+struct gve_rx_desc_dqo {
+	__le16 buf_id; /* ID returned in Rx completion descriptor */
+	__le16 reserved0;
+	__le32 reserved1;
+	__le64 buf_addr; /* DMA address of the buffer */
+	__le64 header_buf_addr;
+	__le64 reserved2;
+} __packed;
+GVE_CHECK_STRUCT_LEN(32, gve_rx_desc_dqo);
+
+/* Descriptor for HW to notify SW of new packets received on RX queue. */
+struct gve_rx_compl_desc_dqo {
+	/* Must be 1 */
+	u8 rxdid: 4;
+	u8 reserved0: 4;
+
+	/* Packet originated from this system rather than the network. */
+	u8 loopback: 1;
+	/* Set when IPv6 packet contains a destination options header or routing
+	 * header.
+	 */
+	u8 ipv6_ex_add: 1;
+	/* Invalid packet was received. */
+	u8 rx_error: 1;
+	u8 reserved1: 5;
+
+	u16 packet_type: 10;
+	u16 ip_hdr_err: 1;
+	u16 udp_len_err: 1;
+	u16 raw_cs_invalid: 1;
+	u16 reserved2: 3;
+
+	u16 packet_len: 14;
+	/* Flipped by HW to notify the descriptor is populated. */
+	u16 generation: 1;
+	/* Should be zero. */
+	u16 buffer_queue_id: 1;
+
+	u16 header_len: 10;
+	u16 rsc: 1;
+	u16 split_header: 1;
+	u16 reserved3: 4;
+
+	u8 descriptor_done: 1;
+	u8 end_of_packet: 1;
+	u8 header_buffer_overflow: 1;
+	u8 l3_l4_processed: 1;
+	u8 csum_ip_err: 1;
+	u8 csum_l4_err: 1;
+	u8 csum_external_ip_err: 1;
+	u8 csum_external_udp_err: 1;
+
+	u8 status_error1;
+
+	__le16 reserved5;
+	__le16 buf_id; /* Buffer ID which was sent on the buffer queue. */
+
+	union {
+		/* Packet checksum. */
+		__le16 raw_cs;
+		/* Segment length for RSC packets. */
+		__le16 rsc_seg_len;
+	};
+	__le32 hash;
+	__le32 reserved6;
+	__le64 reserved7;
+} __packed;
+
+GVE_CHECK_STRUCT_LEN(32, gve_rx_compl_desc_dqo);
+
+/* Ringing the doorbell too often can hurt performance.
+ *
+ * HW requires this value to be at least 8.
+ */
+#define GVE_RX_BUF_THRESH_DQO 32
+
+#endif /* _GVE_DESC_DQO_H_ */
diff --git a/drivers/net/gve/base/gve_register.h b/drivers/net/gve/base/gve_register.h
new file mode 100644
index 0000000000..bf7f102cde
--- /dev/null
+++ b/drivers/net/gve/base/gve_register.h
@@ -0,0 +1,27 @@
+/* SPDX-License-Identifier: MIT
+ * Google Virtual Ethernet (gve) driver
+ * Copyright (C) 2015-2022 Google, Inc.
+ */
+
+#ifndef _GVE_REGISTER_H_
+#define _GVE_REGISTER_H_
+
+/* Fixed Configuration Registers */
+struct gve_registers {
+	__be32	device_status;
+	__be32	driver_status;
+	__be32	max_tx_queues;
+	__be32	max_rx_queues;
+	__be32	adminq_pfn;
+	__be32	adminq_doorbell;
+	__be32	adminq_event_counter;
+	u8	reserved[3];
+	u8	driver_version;
+};
+
+enum gve_device_status_flags {
+	GVE_DEVICE_STATUS_RESET_MASK		= BIT(1),
+	GVE_DEVICE_STATUS_LINK_STATUS_MASK	= BIT(2),
+	GVE_DEVICE_STATUS_REPORT_STATS_MASK	= BIT(3),
+};
+#endif /* _GVE_REGISTER_H_ */
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 192+ messages in thread

* [PATCH v8 2/8] net/gve/base: add OS specific implementation
  2022-10-25  9:07                         ` [PATCH v8 0/8] introduce GVE PMD Junfeng Guo
  2022-10-25  9:07                           ` [PATCH v8 1/8] net/gve/base: introduce base code Junfeng Guo
@ 2022-10-25  9:07                           ` Junfeng Guo
  2022-10-25  9:07                           ` [PATCH v8 3/8] net/gve: add support for device initialization Junfeng Guo
                                             ` (6 subsequent siblings)
  8 siblings, 0 replies; 192+ messages in thread
From: Junfeng Guo @ 2022-10-25  9:07 UTC (permalink / raw)
  To: qi.z.zhang, jingjing.wu, ferruh.yigit, beilei.xing
  Cc: dev, xiaoyun.li, awogbemila, bruce.richardson, hemant.agrawal,
	stephen, chenbo.xia, helin.zhang, Junfeng Guo, Haiyue Wang

Add some macro definitions and memory operations that are specific
to DPDK.
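
For illustration, a small usage sketch (not part of this patch) of how the
base code is expected to consume these DPDK shims; the include path and the
function name below are hypothetical:

#include <errno.h>
#include "base/gve_osdep.h"

/* Allocate a page-aligned, IOVA-contiguous region via the osdep helpers,
 * touch it with the endianness shims and release it again.
 */
static int
example_dma_roundtrip(void)
{
	struct gve_dma_mem dma;
	u32 *ring;

	ring = gve_alloc_dma_mem(&dma, PAGE_SIZE);
	if (ring == NULL)
		return -ENOMEM;

	ring[0] = cpu_to_be32(1);	/* device-visible fields are big endian */
	/* ... dma.pa holds the IOVA to program into a device register ... */

	gve_free_dma_mem(&dma);
	return 0;
}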

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com>
Signed-off-by: Junfeng Guo <junfeng.guo@intel.com>
---
 drivers/net/gve/base/gve_adminq.h   |   2 +
 drivers/net/gve/base/gve_desc.h     |   2 +
 drivers/net/gve/base/gve_desc_dqo.h |   2 +
 drivers/net/gve/base/gve_osdep.h    | 159 ++++++++++++++++++++++++++++
 drivers/net/gve/base/gve_register.h |   2 +
 5 files changed, 167 insertions(+)
 create mode 100644 drivers/net/gve/base/gve_osdep.h

diff --git a/drivers/net/gve/base/gve_adminq.h b/drivers/net/gve/base/gve_adminq.h
index b2422d7dc8..05550119de 100644
--- a/drivers/net/gve/base/gve_adminq.h
+++ b/drivers/net/gve/base/gve_adminq.h
@@ -6,6 +6,8 @@
 #ifndef _GVE_ADMINQ_H
 #define _GVE_ADMINQ_H
 
+#include "gve_osdep.h"
+
 /* Admin queue opcodes */
 enum gve_adminq_opcodes {
 	GVE_ADMINQ_DESCRIBE_DEVICE		= 0x1,
diff --git a/drivers/net/gve/base/gve_desc.h b/drivers/net/gve/base/gve_desc.h
index e0bbadcfd4..006b36442f 100644
--- a/drivers/net/gve/base/gve_desc.h
+++ b/drivers/net/gve/base/gve_desc.h
@@ -8,6 +8,8 @@
 #ifndef _GVE_DESC_H_
 #define _GVE_DESC_H_
 
+#include "gve_osdep.h"
+
 /* A note on seg_addrs
  *
  * Base addresses encoded in seg_addr are not assumed to be physical
diff --git a/drivers/net/gve/base/gve_desc_dqo.h b/drivers/net/gve/base/gve_desc_dqo.h
index 9965f190d1..ee1afdecb8 100644
--- a/drivers/net/gve/base/gve_desc_dqo.h
+++ b/drivers/net/gve/base/gve_desc_dqo.h
@@ -8,6 +8,8 @@
 #ifndef _GVE_DESC_DQO_H_
 #define _GVE_DESC_DQO_H_
 
+#include "gve_osdep.h"
+
 #define GVE_TX_MAX_HDR_SIZE_DQO 255
 #define GVE_TX_MIN_TSO_MSS_DQO 88
 
diff --git a/drivers/net/gve/base/gve_osdep.h b/drivers/net/gve/base/gve_osdep.h
new file mode 100644
index 0000000000..7cb73002f4
--- /dev/null
+++ b/drivers/net/gve/base/gve_osdep.h
@@ -0,0 +1,159 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2022 Intel Corporation
+ */
+
+#ifndef _GVE_OSDEP_H_
+#define _GVE_OSDEP_H_
+
+#include <string.h>
+#include <stdint.h>
+#include <stdio.h>
+#include <stdarg.h>
+#include <inttypes.h>
+#include <stdbool.h>
+
+#include <rte_bitops.h>
+#include <rte_byteorder.h>
+#include <rte_common.h>
+#include <rte_ether.h>
+#include <rte_io.h>
+#include <rte_log.h>
+#include <rte_malloc.h>
+#include <rte_memcpy.h>
+#include <rte_memzone.h>
+
+#include "../gve_logs.h"
+
+typedef uint8_t u8;
+typedef uint16_t u16;
+typedef uint32_t u32;
+typedef uint64_t u64;
+
+typedef rte_be16_t __sum16;
+
+typedef rte_be16_t __be16;
+typedef rte_be32_t __be32;
+typedef rte_be64_t __be64;
+
+typedef rte_iova_t dma_addr_t;
+
+#define ETH_MIN_MTU	RTE_ETHER_MIN_MTU
+#define ETH_ALEN	RTE_ETHER_ADDR_LEN
+
+#ifndef PAGE_SHIFT
+#define PAGE_SHIFT	12
+#endif
+#ifndef PAGE_SIZE
+#define PAGE_SIZE	(1UL << PAGE_SHIFT)
+#endif
+
+#define BIT(nr)		RTE_BIT32(nr)
+
+#define be16_to_cpu(x) rte_be_to_cpu_16(x)
+#define be32_to_cpu(x) rte_be_to_cpu_32(x)
+#define be64_to_cpu(x) rte_be_to_cpu_64(x)
+
+#define cpu_to_be16(x) rte_cpu_to_be_16(x)
+#define cpu_to_be32(x) rte_cpu_to_be_32(x)
+#define cpu_to_be64(x) rte_cpu_to_be_64(x)
+
+#define READ_ONCE32(x) rte_read32(&(x))
+
+#ifndef ____cacheline_aligned
+#define ____cacheline_aligned	__rte_cache_aligned
+#endif
+#ifndef __packed
+#define __packed		__rte_packed
+#endif
+#define __iomem
+
+#define msleep(ms)		rte_delay_ms(ms)
+
+/* These macros are used to generate compilation errors if a struct/union
+ * is not exactly the correct length. It gives a divide by zero error if
+ * the struct/union is not of the correct size, otherwise it creates an
+ * enum that is never used.
+ */
+#define GVE_CHECK_STRUCT_LEN(n, X) enum gve_static_assert_enum_##X \
+	{ gve_static_assert_##X = (n) / ((sizeof(struct X) == (n)) ? 1 : 0) }
+#define GVE_CHECK_UNION_LEN(n, X) enum gve_static_asset_enum_##X \
+	{ gve_static_assert_##X = (n) / ((sizeof(union X) == (n)) ? 1 : 0) }
+
+static __rte_always_inline u8
+readb(volatile void *addr)
+{
+	return rte_read8(addr);
+}
+
+static __rte_always_inline void
+writeb(u8 value, volatile void *addr)
+{
+	rte_write8(value, addr);
+}
+
+static __rte_always_inline void
+writel(u32 value, volatile void *addr)
+{
+	rte_write32(value, addr);
+}
+
+static __rte_always_inline u32
+ioread32be(const volatile void *addr)
+{
+	return rte_be_to_cpu_32(rte_read32(addr));
+}
+
+static __rte_always_inline void
+iowrite32be(u32 value, volatile void *addr)
+{
+	writel(rte_cpu_to_be_32(value), addr);
+}
+
+/* DMA memory allocation tracking */
+struct gve_dma_mem {
+	void *va;
+	rte_iova_t pa;
+	uint32_t size;
+	const void *zone;
+};
+
+static inline void *
+gve_alloc_dma_mem(struct gve_dma_mem *mem, u64 size)
+{
+	static uint16_t gve_dma_memzone_id;
+	const struct rte_memzone *mz = NULL;
+	char z_name[RTE_MEMZONE_NAMESIZE];
+
+	if (!mem)
+		return NULL;
+
+	snprintf(z_name, sizeof(z_name), "gve_dma_%u",
+		 __atomic_fetch_add(&gve_dma_memzone_id, 1, __ATOMIC_RELAXED));
+	mz = rte_memzone_reserve_aligned(z_name, size, SOCKET_ID_ANY,
+					 RTE_MEMZONE_IOVA_CONTIG,
+					 PAGE_SIZE);
+	if (!mz)
+		return NULL;
+
+	mem->size = size;
+	mem->va = mz->addr;
+	mem->pa = mz->iova;
+	mem->zone = mz;
+	PMD_DRV_LOG(DEBUG, "memzone %s is allocated", mz->name);
+
+	return mem->va;
+}
+
+static inline void
+gve_free_dma_mem(struct gve_dma_mem *mem)
+{
+	PMD_DRV_LOG(DEBUG, "memzone %s to be freed",
+		    ((const struct rte_memzone *)mem->zone)->name);
+
+	rte_memzone_free(mem->zone);
+	mem->zone = NULL;
+	mem->va = NULL;
+	mem->pa = 0;
+}
+
+#endif /* _GVE_OSDEP_H_ */
diff --git a/drivers/net/gve/base/gve_register.h b/drivers/net/gve/base/gve_register.h
index bf7f102cde..c674167f31 100644
--- a/drivers/net/gve/base/gve_register.h
+++ b/drivers/net/gve/base/gve_register.h
@@ -6,6 +6,8 @@
 #ifndef _GVE_REGISTER_H_
 #define _GVE_REGISTER_H_
 
+#include "gve_osdep.h"
+
 /* Fixed Configuration Registers */
 struct gve_registers {
 	__be32	device_status;
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 192+ messages in thread

* [PATCH v8 3/8] net/gve: add support for device initialization
  2022-10-25  9:07                         ` [PATCH v8 0/8] introduce GVE PMD Junfeng Guo
  2022-10-25  9:07                           ` [PATCH v8 1/8] net/gve/base: introduce base code Junfeng Guo
  2022-10-25  9:07                           ` [PATCH v8 2/8] net/gve/base: add OS specific implementation Junfeng Guo
@ 2022-10-25  9:07                           ` Junfeng Guo
  2022-10-25  9:07                           ` [PATCH v8 4/8] net/gve: add support for link update Junfeng Guo
                                             ` (5 subsequent siblings)
  8 siblings, 0 replies; 192+ messages in thread
From: Junfeng Guo @ 2022-10-25  9:07 UTC (permalink / raw)
  To: qi.z.zhang, jingjing.wu, ferruh.yigit, beilei.xing
  Cc: dev, xiaoyun.li, awogbemila, bruce.richardson, hemant.agrawal,
	stephen, chenbo.xia, helin.zhang, Junfeng Guo, Haiyue Wang

Support device initialization and add the following dev_ops skeleton:
 - dev_configure
 - dev_start
 - dev_stop
 - dev_close

Note that the build system (including documentation) is also added in this patch.
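
For context, a minimal application-side sketch (not part of this patch) of how
this skeleton is exercised through the standard ethdev API; queue setup, which
arrives in a later patch of this series, would normally sit between configure
and start:

#include <rte_ethdev.h>

static int
example_bring_up(uint16_t port_id)
{
	struct rte_eth_conf conf = { 0 };
	int ret;

	ret = rte_eth_dev_configure(port_id, 1, 1, &conf);	/* -> gve_dev_configure */
	if (ret != 0)
		return ret;

	ret = rte_eth_dev_start(port_id);	/* -> gve_dev_start */
	if (ret != 0)
		return ret;

	ret = rte_eth_dev_stop(port_id);	/* -> gve_dev_stop */
	if (ret != 0)
		return ret;

	return rte_eth_dev_close(port_id);	/* -> gve_dev_close */
}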

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com>
Signed-off-by: Junfeng Guo <junfeng.guo@intel.com>
---
 MAINTAINERS                            |   6 +
 doc/guides/nics/features/gve.ini       |  10 +
 doc/guides/nics/gve.rst                |  68 +++++
 doc/guides/nics/index.rst              |   1 +
 doc/guides/rel_notes/release_22_11.rst |   5 +
 drivers/net/gve/base/gve_adminq.c      |   1 +
 drivers/net/gve/gve_ethdev.c           | 368 +++++++++++++++++++++++++
 drivers/net/gve/gve_ethdev.h           | 225 +++++++++++++++
 drivers/net/gve/gve_logs.h             |  14 +
 drivers/net/gve/meson.build            |  14 +
 drivers/net/gve/version.map            |   3 +
 drivers/net/meson.build                |   1 +
 12 files changed, 716 insertions(+)
 create mode 100644 doc/guides/nics/features/gve.ini
 create mode 100644 doc/guides/nics/gve.rst
 create mode 100644 drivers/net/gve/gve_ethdev.c
 create mode 100644 drivers/net/gve/gve_ethdev.h
 create mode 100644 drivers/net/gve/gve_logs.h
 create mode 100644 drivers/net/gve/meson.build
 create mode 100644 drivers/net/gve/version.map

diff --git a/MAINTAINERS b/MAINTAINERS
index 92b381bc30..2d06a76efe 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -697,6 +697,12 @@ F: drivers/net/enic/
 F: doc/guides/nics/enic.rst
 F: doc/guides/nics/features/enic.ini
 
+Google Virtual Ethernet
+M: Junfeng Guo <junfeng.guo@intel.com>
+F: drivers/net/gve/
+F: doc/guides/nics/gve.rst
+F: doc/guides/nics/features/gve.ini
+
 Hisilicon hns3
 M: Dongdong Liu <liudongdong3@huawei.com>
 M: Yisen Zhuang <yisen.zhuang@huawei.com>
diff --git a/doc/guides/nics/features/gve.ini b/doc/guides/nics/features/gve.ini
new file mode 100644
index 0000000000..44aec28009
--- /dev/null
+++ b/doc/guides/nics/features/gve.ini
@@ -0,0 +1,10 @@
+;
+; Supported features of the Google Virtual Ethernet 'gve' poll mode driver.
+;
+; Refer to default.ini for the full list of available PMD features.
+;
+[Features]
+Linux                = Y
+x86-32               = Y
+x86-64               = Y
+Usage doc            = Y
diff --git a/doc/guides/nics/gve.rst b/doc/guides/nics/gve.rst
new file mode 100644
index 0000000000..703fbcc5de
--- /dev/null
+++ b/doc/guides/nics/gve.rst
@@ -0,0 +1,68 @@
+..  SPDX-License-Identifier: BSD-3-Clause
+    Copyright(C) 2022 Intel Corporation.
+
+GVE poll mode driver
+====================
+
+The GVE PMD (**librte_net_gve**) provides poll mode driver support for
+the Google Virtual Ethernet device (also known as gVNIC).
+
+gVNIC is the standard virtual Ethernet interface on Google Cloud Platform (GCP)
+and is one of the virtual NIC interfaces offered by the leading cloud service
+providers.
+
+Please refer to https://cloud.google.com/compute/docs/networking/using-gvnic
+for the device description.
+
+A well maintained and optimized gve PMD in the DPDK community gives cloud
+instance users who want to run their own VNFs on GCP a better experience in
+terms of performance and maintenance.
+
+The base code is under MIT license and based on GVE kernel driver v1.3.0.
+GVE base code files are:
+
+- gve_adminq.h
+- gve_adminq.c
+- gve_desc.h
+- gve_desc_dqo.h
+- gve_register.h
+- gve.h
+
+Please refer to https://github.com/GoogleCloudPlatform/compute-virtual-ethernet-linux/tree/v1.3.0/google/gve
+to find the original base code.
+
+GVE has 3 queue formats:
+
+- GQI_QPL - GQI with queue page list
+- GQI_RDA - GQI with raw DMA addressing
+- DQO_RDA - DQO with raw DMA addressing
+
+GQI_QPL is the queue page list mode. The driver first allocates memory and
+registers it with the hardware (the Google hypervisor / GVE backend) as a
+Queue Page List (QPL); each queue has its own QPL.
+On Tx, the driver copies packet data into QPL memory and writes the packet's
+offset within the QPL into the hardware descriptors so that the hardware can
+fetch the data. On Rx, the driver reads the offset from the descriptor,
+locates the data in the QPL and copies it out to obtain the real packet data.
+
+GQI_RDA queue format works like a conventional NIC: the driver can put the
+packets' physical addresses directly into the hardware descriptors.
+
+DQO_RDA queue format has a submission and completion queue pair for each
+Tx/Rx queue. Similar to GQI_RDA, the driver can put the packets' physical
+addresses into the hardware descriptors.
+
+Please refer to https://www.kernel.org/doc/html/latest/networking/device_drivers/ethernet/google/gve.html
+to get more information about GVE queue formats.
+
+Features and Limitations
+------------------------
+
+In this release, the GVE PMD provides the basic functionality of packet
+reception and transmission.
+
+Currently, only GQI_QPL and GQI_RDA queue format are supported in PMD.
+Jumbo Frame is not supported in PMD for now. It'll be added in the future
+DPDK release.
+Also, only GQI_QPL queue format is in use on GCP since GQI_RDA hasn't been
+released in production.
diff --git a/doc/guides/nics/index.rst b/doc/guides/nics/index.rst
index 32c7544968..4d40ea29a3 100644
--- a/doc/guides/nics/index.rst
+++ b/doc/guides/nics/index.rst
@@ -29,6 +29,7 @@ Network Interface Controller Drivers
     enetfec
     enic
     fm10k
+    gve
     hinic
     hns3
     i40e
diff --git a/doc/guides/rel_notes/release_22_11.rst b/doc/guides/rel_notes/release_22_11.rst
index 1c3daf141d..21c366b0e2 100644
--- a/doc/guides/rel_notes/release_22_11.rst
+++ b/doc/guides/rel_notes/release_22_11.rst
@@ -152,6 +152,11 @@ New Features
   * Added Q-in-CMB feature controlled by devarg ionic_cmb.
   * Added optimized handlers for non-scattered Rx and Tx.
 
+* **Added GVE net PMD**
+
+  * Added the new ``gve`` net driver for Google Virtual Ethernet devices.
+  * See the :doc:`../nics/gve` NIC guide for more details on this new driver.
+
 * **Updated Intel iavf driver.**
 
   * Added flow subscription support.
diff --git a/drivers/net/gve/base/gve_adminq.c b/drivers/net/gve/base/gve_adminq.c
index 045d47615d..e745b709b2 100644
--- a/drivers/net/gve/base/gve_adminq.c
+++ b/drivers/net/gve/base/gve_adminq.c
@@ -3,6 +3,7 @@
  * Copyright (C) 2015-2022 Google, Inc.
  */
 
+#include "../gve_ethdev.h"
 #include "gve_adminq.h"
 #include "gve_register.h"
 
diff --git a/drivers/net/gve/gve_ethdev.c b/drivers/net/gve/gve_ethdev.c
new file mode 100644
index 0000000000..acbb412509
--- /dev/null
+++ b/drivers/net/gve/gve_ethdev.c
@@ -0,0 +1,368 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2022 Intel Corporation
+ */
+#include <linux/pci_regs.h>
+
+#include "gve_ethdev.h"
+#include "base/gve_adminq.h"
+#include "base/gve_register.h"
+
+const char gve_version_str[] = GVE_VERSION;
+static const char gve_version_prefix[] = GVE_VERSION_PREFIX;
+
+static void
+gve_write_version(uint8_t *driver_version_register)
+{
+	const char *c = gve_version_prefix;
+
+	while (*c) {
+		writeb(*c, driver_version_register);
+		c++;
+	}
+
+	c = gve_version_str;
+	while (*c) {
+		writeb(*c, driver_version_register);
+		c++;
+	}
+	writeb('\n', driver_version_register);
+}
+
+static int
+gve_dev_configure(__rte_unused struct rte_eth_dev *dev)
+{
+	return 0;
+}
+
+static int
+gve_dev_start(struct rte_eth_dev *dev)
+{
+	dev->data->dev_started = 1;
+
+	return 0;
+}
+
+static int
+gve_dev_stop(struct rte_eth_dev *dev)
+{
+	dev->data->dev_link.link_status = RTE_ETH_LINK_DOWN;
+	dev->data->dev_started = 0;
+
+	return 0;
+}
+
+static int
+gve_dev_close(struct rte_eth_dev *dev)
+{
+	int err = 0;
+
+	if (dev->data->dev_started) {
+		err = gve_dev_stop(dev);
+		if (err != 0)
+			PMD_DRV_LOG(ERR, "Failed to stop dev.");
+	}
+
+	dev->data->mac_addrs = NULL;
+
+	return err;
+}
+
+static const struct eth_dev_ops gve_eth_dev_ops = {
+	.dev_configure        = gve_dev_configure,
+	.dev_start            = gve_dev_start,
+	.dev_stop             = gve_dev_stop,
+	.dev_close            = gve_dev_close,
+};
+
+static void
+gve_free_counter_array(struct gve_priv *priv)
+{
+	rte_memzone_free(priv->cnt_array_mz);
+	priv->cnt_array = NULL;
+}
+
+static void
+gve_free_irq_db(struct gve_priv *priv)
+{
+	rte_memzone_free(priv->irq_dbs_mz);
+	priv->irq_dbs = NULL;
+}
+
+static void
+gve_teardown_device_resources(struct gve_priv *priv)
+{
+	int err;
+
+	/* Tell device its resources are being freed */
+	if (gve_get_device_resources_ok(priv)) {
+		err = gve_adminq_deconfigure_device_resources(priv);
+		if (err)
+			PMD_DRV_LOG(ERR, "Could not deconfigure device resources: err=%d", err);
+	}
+	gve_free_counter_array(priv);
+	gve_free_irq_db(priv);
+	gve_clear_device_resources_ok(priv);
+}
+
+static uint8_t
+pci_dev_find_capability(struct rte_pci_device *pdev, int cap)
+{
+	uint8_t pos, id;
+	uint16_t ent;
+	int loops;
+	int ret;
+
+	ret = rte_pci_read_config(pdev, &pos, sizeof(pos), PCI_CAPABILITY_LIST);
+	if (ret != sizeof(pos))
+		return 0;
+
+	loops = (PCI_CFG_SPACE_SIZE - PCI_STD_HEADER_SIZEOF) / PCI_CAP_SIZEOF;
+
+	while (pos && loops--) {
+		ret = rte_pci_read_config(pdev, &ent, sizeof(ent), pos);
+		if (ret != sizeof(ent))
+			return 0;
+
+		id = ent & 0xff;
+		if (id == 0xff)
+			break;
+
+		if (id == cap)
+			return pos;
+
+		pos = (ent >> 8);
+	}
+
+	return 0;
+}
+
+static int
+pci_dev_msix_vec_count(struct rte_pci_device *pdev)
+{
+	uint8_t msix_cap = pci_dev_find_capability(pdev, PCI_CAP_ID_MSIX);
+	uint16_t control;
+	int ret;
+
+	if (!msix_cap)
+		return 0;
+
+	ret = rte_pci_read_config(pdev, &control, sizeof(control), msix_cap + PCI_MSIX_FLAGS);
+	if (ret != sizeof(control))
+		return 0;
+
+	return (control & PCI_MSIX_FLAGS_QSIZE) + 1;
+}
+
+static int
+gve_setup_device_resources(struct gve_priv *priv)
+{
+	char z_name[RTE_MEMZONE_NAMESIZE];
+	const struct rte_memzone *mz;
+	int err = 0;
+
+	snprintf(z_name, sizeof(z_name), "gve_%s_cnt_arr", priv->pci_dev->device.name);
+	mz = rte_memzone_reserve_aligned(z_name,
+					 priv->num_event_counters * sizeof(*priv->cnt_array),
+					 rte_socket_id(), RTE_MEMZONE_IOVA_CONTIG,
+					 PAGE_SIZE);
+	if (mz == NULL) {
+		PMD_DRV_LOG(ERR, "Could not alloc memzone for count array");
+		return -ENOMEM;
+	}
+	priv->cnt_array = (rte_be32_t *)mz->addr;
+	priv->cnt_array_mz = mz;
+
+	snprintf(z_name, sizeof(z_name), "gve_%s_irqmz", priv->pci_dev->device.name);
+	mz = rte_memzone_reserve_aligned(z_name,
+					 sizeof(*priv->irq_dbs) * (priv->num_ntfy_blks),
+					 rte_socket_id(), RTE_MEMZONE_IOVA_CONTIG,
+					 PAGE_SIZE);
+	if (mz == NULL) {
+		PMD_DRV_LOG(ERR, "Could not alloc memzone for irq_dbs");
+		err = -ENOMEM;
+		goto free_cnt_array;
+	}
+	priv->irq_dbs = (struct gve_irq_db *)mz->addr;
+	priv->irq_dbs_mz = mz;
+
+	err = gve_adminq_configure_device_resources(priv,
+						    priv->cnt_array_mz->iova,
+						    priv->num_event_counters,
+						    priv->irq_dbs_mz->iova,
+						    priv->num_ntfy_blks);
+	if (unlikely(err)) {
+		PMD_DRV_LOG(ERR, "Could not config device resources: err=%d", err);
+		goto free_irq_dbs;
+	}
+	return 0;
+
+free_irq_dbs:
+	gve_free_irq_db(priv);
+free_cnt_array:
+	gve_free_counter_array(priv);
+
+	return err;
+}
+
+static int
+gve_init_priv(struct gve_priv *priv, bool skip_describe_device)
+{
+	int num_ntfy;
+	int err;
+
+	/* Set up the adminq */
+	err = gve_adminq_alloc(priv);
+	if (err) {
+		PMD_DRV_LOG(ERR, "Failed to alloc admin queue: err=%d", err);
+		return err;
+	}
+
+	if (skip_describe_device)
+		goto setup_device;
+
+	/* Get the initial information we need from the device */
+	err = gve_adminq_describe_device(priv);
+	if (err) {
+		PMD_DRV_LOG(ERR, "Could not get device information: err=%d", err);
+		goto free_adminq;
+	}
+
+	num_ntfy = pci_dev_msix_vec_count(priv->pci_dev);
+	if (num_ntfy <= 0) {
+		PMD_DRV_LOG(ERR, "Could not count MSI-x vectors");
+		err = -EIO;
+		goto free_adminq;
+	} else if (num_ntfy < GVE_MIN_MSIX) {
+		PMD_DRV_LOG(ERR, "GVE needs at least %d MSI-x vectors, but only has %d",
+			    GVE_MIN_MSIX, num_ntfy);
+		err = -EINVAL;
+		goto free_adminq;
+	}
+
+	priv->num_registered_pages = 0;
+
+	/* gvnic has one Notification Block per MSI-x vector, except for the
+	 * management vector
+	 */
+	priv->num_ntfy_blks = (num_ntfy - 1) & ~0x1;
+	priv->mgmt_msix_idx = priv->num_ntfy_blks;
+
+	priv->max_nb_txq = RTE_MIN(priv->max_nb_txq, priv->num_ntfy_blks / 2);
+	priv->max_nb_rxq = RTE_MIN(priv->max_nb_rxq, priv->num_ntfy_blks / 2);
+
+	if (priv->default_num_queues > 0) {
+		priv->max_nb_txq = RTE_MIN(priv->default_num_queues, priv->max_nb_txq);
+		priv->max_nb_rxq = RTE_MIN(priv->default_num_queues, priv->max_nb_rxq);
+	}
+
+	PMD_DRV_LOG(INFO, "Max TX queues %d, Max RX queues %d",
+		    priv->max_nb_txq, priv->max_nb_rxq);
+
+setup_device:
+	err = gve_setup_device_resources(priv);
+	if (!err)
+		return 0;
+free_adminq:
+	gve_adminq_free(priv);
+	return err;
+}
+
+static void
+gve_teardown_priv_resources(struct gve_priv *priv)
+{
+	gve_teardown_device_resources(priv);
+	gve_adminq_free(priv);
+}
+
+static int
+gve_dev_init(struct rte_eth_dev *eth_dev)
+{
+	struct gve_priv *priv = eth_dev->data->dev_private;
+	int max_tx_queues, max_rx_queues;
+	struct rte_pci_device *pci_dev;
+	struct gve_registers *reg_bar;
+	rte_be32_t *db_bar;
+	int err;
+
+	eth_dev->dev_ops = &gve_eth_dev_ops;
+
+	if (rte_eal_process_type() != RTE_PROC_PRIMARY)
+		return 0;
+
+	pci_dev = RTE_DEV_TO_PCI(eth_dev->device);
+
+	reg_bar = pci_dev->mem_resource[GVE_REG_BAR].addr;
+	if (!reg_bar) {
+		PMD_DRV_LOG(ERR, "Failed to map pci bar!");
+		return -ENOMEM;
+	}
+
+	db_bar = pci_dev->mem_resource[GVE_DB_BAR].addr;
+	if (!db_bar) {
+		PMD_DRV_LOG(ERR, "Failed to map doorbell bar!");
+		return -ENOMEM;
+	}
+
+	gve_write_version(&reg_bar->driver_version);
+	/* Get max queues to alloc etherdev */
+	max_tx_queues = ioread32be(&reg_bar->max_tx_queues);
+	max_rx_queues = ioread32be(&reg_bar->max_rx_queues);
+
+	priv->reg_bar0 = reg_bar;
+	priv->db_bar2 = db_bar;
+	priv->pci_dev = pci_dev;
+	priv->state_flags = 0x0;
+
+	priv->max_nb_txq = max_tx_queues;
+	priv->max_nb_rxq = max_rx_queues;
+
+	err = gve_init_priv(priv, false);
+	if (err)
+		return err;
+
+	eth_dev->data->mac_addrs = &priv->dev_addr;
+
+	return 0;
+}
+
+static int
+gve_dev_uninit(struct rte_eth_dev *eth_dev)
+{
+	struct gve_priv *priv = eth_dev->data->dev_private;
+
+	gve_teardown_priv_resources(priv);
+
+	eth_dev->data->mac_addrs = NULL;
+
+	return 0;
+}
+
+static int
+gve_pci_probe(__rte_unused struct rte_pci_driver *pci_drv,
+	      struct rte_pci_device *pci_dev)
+{
+	return rte_eth_dev_pci_generic_probe(pci_dev, sizeof(struct gve_priv), gve_dev_init);
+}
+
+static int
+gve_pci_remove(struct rte_pci_device *pci_dev)
+{
+	return rte_eth_dev_pci_generic_remove(pci_dev, gve_dev_uninit);
+}
+
+static const struct rte_pci_id pci_id_gve_map[] = {
+	{ RTE_PCI_DEVICE(GOOGLE_VENDOR_ID, GVE_DEV_ID) },
+	{ .device_id = 0 },
+};
+
+static struct rte_pci_driver rte_gve_pmd = {
+	.id_table = pci_id_gve_map,
+	.drv_flags = RTE_PCI_DRV_NEED_MAPPING,
+	.probe = gve_pci_probe,
+	.remove = gve_pci_remove,
+};
+
+RTE_PMD_REGISTER_PCI(net_gve, rte_gve_pmd);
+RTE_PMD_REGISTER_PCI_TABLE(net_gve, pci_id_gve_map);
+RTE_PMD_REGISTER_KMOD_DEP(net_gve, "* igb_uio | vfio-pci");
+RTE_LOG_REGISTER_SUFFIX(gve_logtype_driver, driver, NOTICE);
diff --git a/drivers/net/gve/gve_ethdev.h b/drivers/net/gve/gve_ethdev.h
new file mode 100644
index 0000000000..2ac2a46ac1
--- /dev/null
+++ b/drivers/net/gve/gve_ethdev.h
@@ -0,0 +1,225 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2022 Intel Corporation
+ */
+
+#ifndef _GVE_ETHDEV_H_
+#define _GVE_ETHDEV_H_
+
+#include <ethdev_driver.h>
+#include <ethdev_pci.h>
+#include <rte_ether.h>
+
+#include "base/gve.h"
+
+#define GVE_DEFAULT_RX_FREE_THRESH  512
+#define GVE_DEFAULT_TX_FREE_THRESH  256
+#define GVE_TX_MAX_FREE_SZ          512
+
+#define GVE_MIN_BUF_SIZE	    1024
+#define GVE_MAX_RX_PKTLEN	    65535
+
+/* A list of pages registered with the device during setup and used by a queue
+ * as buffers
+ */
+struct gve_queue_page_list {
+	uint32_t id; /* unique id */
+	uint32_t num_entries;
+	dma_addr_t *page_buses; /* the dma addrs of the pages */
+	const struct rte_memzone *mz;
+};
+
+/* A TX desc ring entry */
+union gve_tx_desc {
+	struct gve_tx_pkt_desc pkt; /* first desc for a packet */
+	struct gve_tx_seg_desc seg; /* subsequent descs for a packet */
+};
+
+struct gve_tx_queue {
+	volatile union gve_tx_desc *tx_desc_ring;
+	const struct rte_memzone *mz;
+	uint64_t tx_ring_phys_addr;
+
+	uint16_t nb_tx_desc;
+
+	/* Only valid for DQO_QPL queue format */
+	struct gve_queue_page_list *qpl;
+
+	uint16_t port_id;
+	uint16_t queue_id;
+
+	uint16_t ntfy_id;
+	volatile rte_be32_t *ntfy_addr;
+
+	struct gve_priv *hw;
+	const struct rte_memzone *qres_mz;
+	struct gve_queue_resources *qres;
+
+	/* Only valid for DQO_RDA queue format */
+	struct gve_tx_queue *complq;
+};
+
+struct gve_rx_queue {
+	volatile struct gve_rx_desc *rx_desc_ring;
+	volatile union gve_rx_data_slot *rx_data_ring;
+	const struct rte_memzone *mz;
+	const struct rte_memzone *data_mz;
+	uint64_t rx_ring_phys_addr;
+
+	uint16_t nb_rx_desc;
+
+	volatile rte_be32_t *ntfy_addr;
+
+	/* only valid for GQI_QPL queue format */
+	struct gve_queue_page_list *qpl;
+
+	struct gve_priv *hw;
+	const struct rte_memzone *qres_mz;
+	struct gve_queue_resources *qres;
+
+	uint16_t port_id;
+	uint16_t queue_id;
+	uint16_t ntfy_id;
+	uint16_t rx_buf_len;
+
+	/* Only valid for DQO_RDA queue format */
+	struct gve_rx_queue *bufq;
+};
+
+struct gve_priv {
+	struct gve_irq_db *irq_dbs; /* array of num_ntfy_blks */
+	const struct rte_memzone *irq_dbs_mz;
+	uint32_t mgmt_msix_idx;
+	rte_be32_t *cnt_array; /* array of num_event_counters */
+	const struct rte_memzone *cnt_array_mz;
+
+	uint16_t num_event_counters;
+	uint16_t tx_desc_cnt; /* txq size */
+	uint16_t rx_desc_cnt; /* rxq size */
+	uint16_t tx_pages_per_qpl; /* tx buffer length */
+	uint16_t rx_data_slot_cnt; /* rx buffer length */
+
+	/* Only valid for DQO_RDA queue format */
+	uint16_t tx_compq_size; /* tx completion queue size */
+	uint16_t rx_bufq_size; /* rx buff queue size */
+
+	uint64_t max_registered_pages;
+	uint64_t num_registered_pages; /* num pages registered with NIC */
+	uint16_t default_num_queues; /* default num queues to set up */
+	enum gve_queue_format queue_format; /* see enum gve_queue_format */
+	uint8_t enable_rsc;
+
+	uint16_t max_nb_txq;
+	uint16_t max_nb_rxq;
+	uint32_t num_ntfy_blks; /* spilt between TX and RX so must be even */
+
+	struct gve_registers __iomem *reg_bar0; /* see gve_register.h */
+	rte_be32_t __iomem *db_bar2; /* "array" of doorbells */
+	struct rte_pci_device *pci_dev;
+
+	/* Admin queue - see gve_adminq.h*/
+	union gve_adminq_command *adminq;
+	struct gve_dma_mem adminq_dma_mem;
+	uint32_t adminq_mask; /* masks prod_cnt to adminq size */
+	uint32_t adminq_prod_cnt; /* free-running count of AQ cmds executed */
+	uint32_t adminq_cmd_fail; /* free-running count of AQ cmds failed */
+	uint32_t adminq_timeouts; /* free-running count of AQ cmds timeouts */
+	/* free-running count of per AQ cmd executed */
+	uint32_t adminq_describe_device_cnt;
+	uint32_t adminq_cfg_device_resources_cnt;
+	uint32_t adminq_register_page_list_cnt;
+	uint32_t adminq_unregister_page_list_cnt;
+	uint32_t adminq_create_tx_queue_cnt;
+	uint32_t adminq_create_rx_queue_cnt;
+	uint32_t adminq_destroy_tx_queue_cnt;
+	uint32_t adminq_destroy_rx_queue_cnt;
+	uint32_t adminq_dcfg_device_resources_cnt;
+	uint32_t adminq_set_driver_parameter_cnt;
+	uint32_t adminq_report_stats_cnt;
+	uint32_t adminq_report_link_speed_cnt;
+	uint32_t adminq_get_ptype_map_cnt;
+
+	volatile uint32_t state_flags;
+
+	/* Gvnic device link speed from hypervisor. */
+	uint64_t link_speed;
+
+	uint16_t max_mtu;
+	struct rte_ether_addr dev_addr; /* mac address */
+
+	struct gve_queue_page_list *qpl;
+
+	struct gve_tx_queue **txqs;
+	struct gve_rx_queue **rxqs;
+};
+
+static inline bool
+gve_is_gqi(struct gve_priv *priv)
+{
+	return priv->queue_format == GVE_GQI_RDA_FORMAT ||
+		priv->queue_format == GVE_GQI_QPL_FORMAT;
+}
+
+static inline bool
+gve_get_admin_queue_ok(struct gve_priv *priv)
+{
+	return !!rte_bit_relaxed_get32(GVE_PRIV_FLAGS_ADMIN_QUEUE_OK,
+				       &priv->state_flags);
+}
+
+static inline void
+gve_set_admin_queue_ok(struct gve_priv *priv)
+{
+	rte_bit_relaxed_set32(GVE_PRIV_FLAGS_ADMIN_QUEUE_OK,
+			      &priv->state_flags);
+}
+
+static inline void
+gve_clear_admin_queue_ok(struct gve_priv *priv)
+{
+	rte_bit_relaxed_clear32(GVE_PRIV_FLAGS_ADMIN_QUEUE_OK,
+				&priv->state_flags);
+}
+
+static inline bool
+gve_get_device_resources_ok(struct gve_priv *priv)
+{
+	return !!rte_bit_relaxed_get32(GVE_PRIV_FLAGS_DEVICE_RESOURCES_OK,
+				       &priv->state_flags);
+}
+
+static inline void
+gve_set_device_resources_ok(struct gve_priv *priv)
+{
+	rte_bit_relaxed_set32(GVE_PRIV_FLAGS_DEVICE_RESOURCES_OK,
+			      &priv->state_flags);
+}
+
+static inline void
+gve_clear_device_resources_ok(struct gve_priv *priv)
+{
+	rte_bit_relaxed_clear32(GVE_PRIV_FLAGS_DEVICE_RESOURCES_OK,
+				&priv->state_flags);
+}
+
+static inline bool
+gve_get_device_rings_ok(struct gve_priv *priv)
+{
+	return !!rte_bit_relaxed_get32(GVE_PRIV_FLAGS_DEVICE_RINGS_OK,
+				       &priv->state_flags);
+}
+
+static inline void
+gve_set_device_rings_ok(struct gve_priv *priv)
+{
+	rte_bit_relaxed_set32(GVE_PRIV_FLAGS_DEVICE_RINGS_OK,
+			      &priv->state_flags);
+}
+
+static inline void
+gve_clear_device_rings_ok(struct gve_priv *priv)
+{
+	rte_bit_relaxed_clear32(GVE_PRIV_FLAGS_DEVICE_RINGS_OK,
+				&priv->state_flags);
+}
+
+#endif /* _GVE_ETHDEV_H_ */
diff --git a/drivers/net/gve/gve_logs.h b/drivers/net/gve/gve_logs.h
new file mode 100644
index 0000000000..0d02da46e1
--- /dev/null
+++ b/drivers/net/gve/gve_logs.h
@@ -0,0 +1,14 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2022 Intel Corporation
+ */
+
+#ifndef _GVE_LOGS_H_
+#define _GVE_LOGS_H_
+
+extern int gve_logtype_driver;
+
+#define PMD_DRV_LOG(level, fmt, args...) \
+	rte_log(RTE_LOG_ ## level, gve_logtype_driver, "%s(): " fmt "\n", \
+		__func__, ## args)
+
+#endif
diff --git a/drivers/net/gve/meson.build b/drivers/net/gve/meson.build
new file mode 100644
index 0000000000..d8ec64b3a3
--- /dev/null
+++ b/drivers/net/gve/meson.build
@@ -0,0 +1,14 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(C) 2022 Intel Corporation
+
+if is_windows
+    build = false
+    reason = 'not supported on Windows'
+    subdir_done()
+endif
+
+sources = files(
+        'base/gve_adminq.c',
+        'gve_ethdev.c',
+)
+includes += include_directories('base')
diff --git a/drivers/net/gve/version.map b/drivers/net/gve/version.map
new file mode 100644
index 0000000000..78c3585d7c
--- /dev/null
+++ b/drivers/net/gve/version.map
@@ -0,0 +1,3 @@
+DPDK_23 {
+	local: *;
+};
diff --git a/drivers/net/meson.build b/drivers/net/meson.build
index 35bfa78dee..355dbd07e9 100644
--- a/drivers/net/meson.build
+++ b/drivers/net/meson.build
@@ -23,6 +23,7 @@ drivers = [
         'enic',
         'failsafe',
         'fm10k',
+        'gve',
         'hinic',
         'hns3',
         'i40e',
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 192+ messages in thread

* [PATCH v8 4/8] net/gve: add support for link update
  2022-10-25  9:07                         ` [PATCH v8 0/8] introduce GVE PMD Junfeng Guo
                                             ` (2 preceding siblings ...)
  2022-10-25  9:07                           ` [PATCH v8 3/8] net/gve: add support for device initialization Junfeng Guo
@ 2022-10-25  9:07                           ` Junfeng Guo
  2022-10-25  9:07                           ` [PATCH v8 5/8] net/gve: add support for MTU setting Junfeng Guo
                                             ` (4 subsequent siblings)
  8 siblings, 0 replies; 192+ messages in thread
From: Junfeng Guo @ 2022-10-25  9:07 UTC (permalink / raw)
  To: qi.z.zhang, jingjing.wu, ferruh.yigit, beilei.xing
  Cc: dev, xiaoyun.li, awogbemila, bruce.richardson, hemant.agrawal,
	stephen, chenbo.xia, helin.zhang, Junfeng Guo

Support dev_ops link_update.
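
An illustrative application-side call (not part of this patch) that now ends
up in gve_link_update():

#include <stdio.h>
#include <rte_ethdev.h>

static void
example_print_link(uint16_t port_id)
{
	struct rte_eth_link link;

	/* Non-blocking query; fills status/speed as reported by the PMD. */
	if (rte_eth_link_get_nowait(port_id, &link) == 0)
		printf("port %u link %s, %u Mbps\n", port_id,
		       link.link_status == RTE_ETH_LINK_UP ? "up" : "down",
		       link.link_speed);
}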

Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com>
Signed-off-by: Junfeng Guo <junfeng.guo@intel.com>
---
 doc/guides/nics/features/gve.ini |  1 +
 doc/guides/nics/gve.rst          |  3 +++
 drivers/net/gve/gve_ethdev.c     | 30 ++++++++++++++++++++++++++++++
 3 files changed, 34 insertions(+)

diff --git a/doc/guides/nics/features/gve.ini b/doc/guides/nics/features/gve.ini
index 44aec28009..ae466ad677 100644
--- a/doc/guides/nics/features/gve.ini
+++ b/doc/guides/nics/features/gve.ini
@@ -4,6 +4,7 @@
 ; Refer to default.ini for the full list of available PMD features.
 ;
 [Features]
+Link status          = Y
 Linux                = Y
 x86-32               = Y
 x86-64               = Y
diff --git a/doc/guides/nics/gve.rst b/doc/guides/nics/gve.rst
index 703fbcc5de..c42ff23841 100644
--- a/doc/guides/nics/gve.rst
+++ b/doc/guides/nics/gve.rst
@@ -60,6 +60,9 @@ Features and Limitations
 
 In this release, the GVE PMD provides the basic functionality of packet
 reception and transmission.
+Supported features of the GVE PMD are:
+
+- Link state information
 
 Currently, only GQI_QPL and GQI_RDA queue format are supported in PMD.
 Jumbo Frame is not supported in PMD for now. It'll be added in the future
diff --git a/drivers/net/gve/gve_ethdev.c b/drivers/net/gve/gve_ethdev.c
index acbb412509..34243c1672 100644
--- a/drivers/net/gve/gve_ethdev.c
+++ b/drivers/net/gve/gve_ethdev.c
@@ -34,10 +34,39 @@ gve_dev_configure(__rte_unused struct rte_eth_dev *dev)
 	return 0;
 }
 
+static int
+gve_link_update(struct rte_eth_dev *dev, __rte_unused int wait_to_complete)
+{
+	struct gve_priv *priv = dev->data->dev_private;
+	struct rte_eth_link link;
+	int err;
+
+	memset(&link, 0, sizeof(link));
+	link.link_duplex = RTE_ETH_LINK_FULL_DUPLEX;
+	link.link_autoneg = RTE_ETH_LINK_AUTONEG;
+
+	if (!dev->data->dev_started) {
+		link.link_status = RTE_ETH_LINK_DOWN;
+		link.link_speed = RTE_ETH_SPEED_NUM_NONE;
+	} else {
+		link.link_status = RTE_ETH_LINK_UP;
+		PMD_DRV_LOG(DEBUG, "Get link status from hw");
+		err = gve_adminq_report_link_speed(priv);
+		if (err) {
+			PMD_DRV_LOG(ERR, "Failed to get link speed.");
+			priv->link_speed = RTE_ETH_SPEED_NUM_UNKNOWN;
+		}
+		link.link_speed = priv->link_speed;
+	}
+
+	return rte_eth_linkstatus_set(dev, &link);
+}
+
 static int
 gve_dev_start(struct rte_eth_dev *dev)
 {
 	dev->data->dev_started = 1;
+	gve_link_update(dev, 0);
 
 	return 0;
 }
@@ -72,6 +101,7 @@ static const struct eth_dev_ops gve_eth_dev_ops = {
 	.dev_start            = gve_dev_start,
 	.dev_stop             = gve_dev_stop,
 	.dev_close            = gve_dev_close,
+	.link_update          = gve_link_update,
 };
 
 static void
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 192+ messages in thread

* [PATCH v8 5/8] net/gve: add support for MTU setting
  2022-10-25  9:07                         ` [PATCH v8 0/8] introduce GVE PMD Junfeng Guo
                                             ` (3 preceding siblings ...)
  2022-10-25  9:07                           ` [PATCH v8 4/8] net/gve: add support for link update Junfeng Guo
@ 2022-10-25  9:07                           ` Junfeng Guo
  2022-10-25 15:55                             ` Stephen Hemminger
  2022-10-25  9:07                           ` [PATCH v8 6/8] net/gve: add support for dev info get and dev configure Junfeng Guo
                                             ` (3 subsequent siblings)
  8 siblings, 1 reply; 192+ messages in thread
From: Junfeng Guo @ 2022-10-25  9:07 UTC (permalink / raw)
  To: qi.z.zhang, jingjing.wu, ferruh.yigit, beilei.xing
  Cc: dev, xiaoyun.li, awogbemila, bruce.richardson, hemant.agrawal,
	stephen, chenbo.xia, helin.zhang, Junfeng Guo

Support dev_ops mtu_set.
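
An illustrative application-side call (not part of this patch); the new
mtu_set hook forwards the value to the device through the GVE_SET_PARAM_MTU
admin queue command and requires the port to be stopped:

#include <rte_ethdev.h>

static int
example_set_mtu(uint16_t port_id)
{
	/* 1460 is a typical gVNIC MTU; values above the device maximum,
	 * or calls made while the port is started, are rejected by the PMD.
	 */
	return rte_eth_dev_set_mtu(port_id, 1460);
}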

Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com>
Signed-off-by: Junfeng Guo <junfeng.guo@intel.com>
---
 doc/guides/nics/features/gve.ini |  1 +
 doc/guides/nics/gve.rst          |  2 ++
 drivers/net/gve/gve_ethdev.c     | 28 ++++++++++++++++++++++++++++
 3 files changed, 31 insertions(+)

diff --git a/doc/guides/nics/features/gve.ini b/doc/guides/nics/features/gve.ini
index ae466ad677..d1703d8dab 100644
--- a/doc/guides/nics/features/gve.ini
+++ b/doc/guides/nics/features/gve.ini
@@ -5,6 +5,7 @@
 ;
 [Features]
 Link status          = Y
+MTU update           = Y
 Linux                = Y
 x86-32               = Y
 x86-64               = Y
diff --git a/doc/guides/nics/gve.rst b/doc/guides/nics/gve.rst
index c42ff23841..36a65e4717 100644
--- a/doc/guides/nics/gve.rst
+++ b/doc/guides/nics/gve.rst
@@ -69,3 +69,5 @@ Jumbo Frame is not supported in PMD for now. It'll be added in the future
 DPDK release.
 Also, only GQI_QPL queue format is in use on GCP since GQI_RDA hasn't been
 released in production.
+
+Currently, setting MTU with value larger than 1460 is not supported.
diff --git a/drivers/net/gve/gve_ethdev.c b/drivers/net/gve/gve_ethdev.c
index 34243c1672..554f58640d 100644
--- a/drivers/net/gve/gve_ethdev.c
+++ b/drivers/net/gve/gve_ethdev.c
@@ -96,12 +96,40 @@ gve_dev_close(struct rte_eth_dev *dev)
 	return err;
 }
 
+static int
+gve_dev_mtu_set(struct rte_eth_dev *dev, uint16_t mtu)
+{
+	struct gve_priv *priv = dev->data->dev_private;
+	int err;
+
+	if (mtu < RTE_ETHER_MIN_MTU || mtu > priv->max_mtu) {
+		PMD_DRV_LOG(ERR, "MIN MTU is %u, MAX MTU is %u",
+			    RTE_ETHER_MIN_MTU, priv->max_mtu);
+		return -EINVAL;
+	}
+
+	/* MTU setting is forbidden while the port is started */
+	if (dev->data->dev_started) {
+		PMD_DRV_LOG(ERR, "Port must be stopped before configuration");
+		return -EBUSY;
+	}
+
+	err = gve_adminq_set_mtu(priv, mtu);
+	if (err) {
+		PMD_DRV_LOG(ERR, "Failed to set mtu as %u err = %d", mtu, err);
+		return err;
+	}
+
+	return 0;
+}
+
 static const struct eth_dev_ops gve_eth_dev_ops = {
 	.dev_configure        = gve_dev_configure,
 	.dev_start            = gve_dev_start,
 	.dev_stop             = gve_dev_stop,
 	.dev_close            = gve_dev_close,
 	.link_update          = gve_link_update,
+	.mtu_set              = gve_dev_mtu_set,
 };
 
 static void
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 192+ messages in thread

* [PATCH v8 6/8] net/gve: add support for dev info get and dev configure
  2022-10-25  9:07                         ` [PATCH v8 0/8] introduce GVE PMD Junfeng Guo
                                             ` (4 preceding siblings ...)
  2022-10-25  9:07                           ` [PATCH v8 5/8] net/gve: add support for MTU setting Junfeng Guo
@ 2022-10-25  9:07                           ` Junfeng Guo
  2022-10-25  9:07                           ` [PATCH v8 7/8] net/gve: add support for queue operations Junfeng Guo
                                             ` (2 subsequent siblings)
  8 siblings, 0 replies; 192+ messages in thread
From: Junfeng Guo @ 2022-10-25  9:07 UTC (permalink / raw)
  To: qi.z.zhang, jingjing.wu, ferruh.yigit, beilei.xing
  Cc: dev, xiaoyun.li, awogbemila, bruce.richardson, hemant.agrawal,
	stephen, chenbo.xia, helin.zhang, Junfeng Guo

Add dev_ops dev_infos_get.
Complete dev_configure with forced enabling of Rx offloads.
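
An illustrative application-side query (not part of this patch) of the limits
reported by gve_dev_info_get():

#include <stdio.h>
#include <rte_ethdev.h>

static int
example_query_caps(uint16_t port_id)
{
	struct rte_eth_dev_info info;
	int ret;

	ret = rte_eth_dev_info_get(port_id, &info);
	if (ret != 0)
		return ret;

	printf("max rxq %u, max txq %u, ring size %u, LRO capa %s\n",
	       info.max_rx_queues, info.max_tx_queues,
	       info.default_rxportconf.ring_size,
	       (info.rx_offload_capa & RTE_ETH_RX_OFFLOAD_TCP_LRO) ? "yes" : "no");
	return 0;
}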

Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com>
Signed-off-by: Junfeng Guo <junfeng.guo@intel.com>
---
 doc/guides/nics/features/gve.ini |  2 ++
 doc/guides/nics/gve.rst          |  5 +++
 drivers/net/gve/gve_ethdev.c     | 59 +++++++++++++++++++++++++++++++-
 drivers/net/gve/gve_ethdev.h     |  3 ++
 4 files changed, 68 insertions(+), 1 deletion(-)

diff --git a/doc/guides/nics/features/gve.ini b/doc/guides/nics/features/gve.ini
index d1703d8dab..986df7f94a 100644
--- a/doc/guides/nics/features/gve.ini
+++ b/doc/guides/nics/features/gve.ini
@@ -4,8 +4,10 @@
 ; Refer to default.ini for the full list of available PMD features.
 ;
 [Features]
+Speed capabilities   = Y
 Link status          = Y
 MTU update           = Y
+RSS hash             = Y
 Linux                = Y
 x86-32               = Y
 x86-64               = Y
diff --git a/doc/guides/nics/gve.rst b/doc/guides/nics/gve.rst
index 36a65e4717..1051d964f1 100644
--- a/doc/guides/nics/gve.rst
+++ b/doc/guides/nics/gve.rst
@@ -71,3 +71,8 @@ Also, only GQI_QPL queue format is in use on GCP since GQI_RDA hasn't been
 released in production.
 
 Currently, setting MTU with value larger than 1460 is not supported.
+
+Currently, only "RSS hash" is force enabled so that the backend hardware
+device calculated hash values could be shared with applications. But for
+RSS, there is no such API to config RSS hash function or RETA table. So,
+limited RSS is supported only with default config/setting.
diff --git a/drivers/net/gve/gve_ethdev.c b/drivers/net/gve/gve_ethdev.c
index 554f58640d..7fbe0c78c9 100644
--- a/drivers/net/gve/gve_ethdev.c
+++ b/drivers/net/gve/gve_ethdev.c
@@ -29,8 +29,16 @@ gve_write_version(uint8_t *driver_version_register)
 }
 
 static int
-gve_dev_configure(__rte_unused struct rte_eth_dev *dev)
+gve_dev_configure(struct rte_eth_dev *dev)
 {
+	struct gve_priv *priv = dev->data->dev_private;
+
+	if (dev->data->dev_conf.rxmode.mq_mode & RTE_ETH_MQ_RX_RSS_FLAG)
+		dev->data->dev_conf.rxmode.offloads |= RTE_ETH_RX_OFFLOAD_RSS_HASH;
+
+	if (dev->data->dev_conf.rxmode.offloads & RTE_ETH_RX_OFFLOAD_TCP_LRO)
+		priv->enable_rsc = 1;
+
 	return 0;
 }
 
@@ -96,6 +104,54 @@ gve_dev_close(struct rte_eth_dev *dev)
 	return err;
 }
 
+static int
+gve_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
+{
+	struct gve_priv *priv = dev->data->dev_private;
+
+	dev_info->device = dev->device;
+	dev_info->max_mac_addrs = 1;
+	dev_info->max_rx_queues = priv->max_nb_rxq;
+	dev_info->max_tx_queues = priv->max_nb_txq;
+	dev_info->min_rx_bufsize = GVE_MIN_BUF_SIZE;
+	dev_info->max_rx_pktlen = GVE_MAX_RX_PKTLEN;
+	dev_info->max_mtu = GVE_MAX_MTU;
+	dev_info->min_mtu = GVE_MIN_MTU;
+
+	dev_info->rx_offload_capa = 0;
+	dev_info->tx_offload_capa = 0;
+
+	if (priv->queue_format == GVE_DQO_RDA_FORMAT)
+		dev_info->rx_offload_capa |= RTE_ETH_RX_OFFLOAD_TCP_LRO;
+
+	dev_info->default_rxconf = (struct rte_eth_rxconf) {
+		.rx_free_thresh = GVE_DEFAULT_RX_FREE_THRESH,
+		.rx_drop_en = 0,
+		.offloads = 0,
+	};
+
+	dev_info->default_txconf = (struct rte_eth_txconf) {
+		.tx_free_thresh = GVE_DEFAULT_TX_FREE_THRESH,
+		.offloads = 0,
+	};
+
+	dev_info->default_rxportconf.ring_size = priv->rx_desc_cnt;
+	dev_info->rx_desc_lim = (struct rte_eth_desc_lim) {
+		.nb_max = priv->rx_desc_cnt,
+		.nb_min = priv->rx_desc_cnt,
+		.nb_align = 1,
+	};
+
+	dev_info->default_txportconf.ring_size = priv->tx_desc_cnt;
+	dev_info->tx_desc_lim = (struct rte_eth_desc_lim) {
+		.nb_max = priv->tx_desc_cnt,
+		.nb_min = priv->tx_desc_cnt,
+		.nb_align = 1,
+	};
+
+	return 0;
+}
+
 static int
 gve_dev_mtu_set(struct rte_eth_dev *dev, uint16_t mtu)
 {
@@ -128,6 +184,7 @@ static const struct eth_dev_ops gve_eth_dev_ops = {
 	.dev_start            = gve_dev_start,
 	.dev_stop             = gve_dev_stop,
 	.dev_close            = gve_dev_close,
+	.dev_infos_get        = gve_dev_info_get,
 	.link_update          = gve_link_update,
 	.mtu_set              = gve_dev_mtu_set,
 };
diff --git a/drivers/net/gve/gve_ethdev.h b/drivers/net/gve/gve_ethdev.h
index 2ac2a46ac1..57c29374b5 100644
--- a/drivers/net/gve/gve_ethdev.h
+++ b/drivers/net/gve/gve_ethdev.h
@@ -18,6 +18,9 @@
 #define GVE_MIN_BUF_SIZE	    1024
 #define GVE_MAX_RX_PKTLEN	    65535
 
+#define GVE_MAX_MTU	RTE_ETHER_MTU
+#define GVE_MIN_MTU	RTE_ETHER_MIN_MTU
+
 /* A list of pages registered with the device during setup and used by a queue
  * as buffers
  */
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 192+ messages in thread

* [PATCH v8 7/8] net/gve: add support for queue operations
  2022-10-25  9:07                         ` [PATCH v8 0/8] introduce GVE PMD Junfeng Guo
                                             ` (5 preceding siblings ...)
  2022-10-25  9:07                           ` [PATCH v8 6/8] net/gve: add support for dev info get and dev configure Junfeng Guo
@ 2022-10-25  9:07                           ` Junfeng Guo
  2022-10-25  9:07                           ` [PATCH v8 8/8] net/gve: add support for Rx/Tx Junfeng Guo
  2022-10-25 12:33                           ` [PATCH v8 0/8] introduce GVE PMD Ferruh Yigit
  8 siblings, 0 replies; 192+ messages in thread
From: Junfeng Guo @ 2022-10-25  9:07 UTC (permalink / raw)
  To: qi.z.zhang, jingjing.wu, ferruh.yigit, beilei.xing
  Cc: dev, xiaoyun.li, awogbemila, bruce.richardson, hemant.agrawal,
	stephen, chenbo.xia, helin.zhang, Junfeng Guo

Add support for queue operations:
- setup rx/tx queue
- release rx/tx queue
- start rx/tx queues
- stop rx/tx queues
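
As an illustration (not part of this patch), a minimal application-side
sketch of how these queue operations are exercised through the normal
ethdev API; the port id, descriptor count and mempool below are
placeholders chosen only for the example:

#include <string.h>
#include <rte_ethdev.h>
#include <rte_lcore.h>
#include <rte_mempool.h>

/* Usage sketch only: queue and descriptor numbers are illustrative. */
static int
gve_port_setup_sketch(uint16_t port_id, struct rte_mempool *mp)
{
	struct rte_eth_conf port_conf;
	int ret;

	memset(&port_conf, 0, sizeof(port_conf));

	ret = rte_eth_dev_configure(port_id, 1, 1, &port_conf);
	if (ret != 0)
		return ret;

	/* ends up in gve_rx_queue_setup()/gve_tx_queue_setup() */
	ret = rte_eth_rx_queue_setup(port_id, 0, 512, rte_socket_id(), NULL, mp);
	if (ret != 0)
		return ret;
	ret = rte_eth_tx_queue_setup(port_id, 0, 512, rte_socket_id(), NULL);
	if (ret != 0)
		return ret;

	/* dev_start creates the queues on the device; dev_stop/dev_close
	 * later stop and release them via the routines added here.
	 */
	return rte_eth_dev_start(port_id);
}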

Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com>
Signed-off-by: Junfeng Guo <junfeng.guo@intel.com>
---
 drivers/net/gve/gve_ethdev.c | 204 +++++++++++++++++++++++++++++++++
 drivers/net/gve/gve_ethdev.h |  52 +++++++++
 drivers/net/gve/gve_rx.c     | 212 ++++++++++++++++++++++++++++++++++
 drivers/net/gve/gve_tx.c     | 214 +++++++++++++++++++++++++++++++++++
 drivers/net/gve/meson.build  |   2 +
 5 files changed, 684 insertions(+)
 create mode 100644 drivers/net/gve/gve_rx.c
 create mode 100644 drivers/net/gve/gve_tx.c

diff --git a/drivers/net/gve/gve_ethdev.c b/drivers/net/gve/gve_ethdev.c
index 7fbe0c78c9..892e7e2e1c 100644
--- a/drivers/net/gve/gve_ethdev.c
+++ b/drivers/net/gve/gve_ethdev.c
@@ -28,6 +28,68 @@ gve_write_version(uint8_t *driver_version_register)
 	writeb('\n', driver_version_register);
 }
 
+static int
+gve_alloc_queue_page_list(struct gve_priv *priv, uint32_t id, uint32_t pages)
+{
+	char z_name[RTE_MEMZONE_NAMESIZE];
+	struct gve_queue_page_list *qpl;
+	const struct rte_memzone *mz;
+	dma_addr_t page_bus;
+	uint32_t i;
+
+	if (priv->num_registered_pages + pages >
+	    priv->max_registered_pages) {
+		PMD_DRV_LOG(ERR, "Pages %" PRIu64 " > max registered pages %" PRIu64,
+			    priv->num_registered_pages + pages,
+			    priv->max_registered_pages);
+		return -EINVAL;
+	}
+	qpl = &priv->qpl[id];
+	snprintf(z_name, sizeof(z_name), "gve_%s_qpl%d", priv->pci_dev->device.name, id);
+	mz = rte_memzone_reserve_aligned(z_name, pages * PAGE_SIZE,
+					 rte_socket_id(),
+					 RTE_MEMZONE_IOVA_CONTIG, PAGE_SIZE);
+	if (mz == NULL) {
+		PMD_DRV_LOG(ERR, "Failed to alloc %s.", z_name);
+		return -ENOMEM;
+	}
+	qpl->page_buses = rte_zmalloc("qpl page buses", pages * sizeof(dma_addr_t), 0);
+	if (qpl->page_buses == NULL) {
+		PMD_DRV_LOG(ERR, "Failed to alloc qpl %u page buses", id);
+		return -ENOMEM;
+	}
+	page_bus = mz->iova;
+	for (i = 0; i < pages; i++) {
+		qpl->page_buses[i] = page_bus;
+		page_bus += PAGE_SIZE;
+	}
+	qpl->id = id;
+	qpl->mz = mz;
+	qpl->num_entries = pages;
+
+	priv->num_registered_pages += pages;
+
+	return 0;
+}
+
+static void
+gve_free_qpls(struct gve_priv *priv)
+{
+	uint16_t nb_txqs = priv->max_nb_txq;
+	uint16_t nb_rxqs = priv->max_nb_rxq;
+	uint32_t i;
+
+	for (i = 0; i < nb_txqs + nb_rxqs; i++) {
+		if (priv->qpl[i].mz != NULL)
+			rte_memzone_free(priv->qpl[i].mz);
+		if (priv->qpl[i].page_buses != NULL)
+			rte_free(priv->qpl[i].page_buses);
+	}
+
+	if (priv->qpl != NULL)
+		rte_free(priv->qpl);
+}
+
 static int
 gve_dev_configure(struct rte_eth_dev *dev)
 {
@@ -42,6 +104,43 @@ gve_dev_configure(struct rte_eth_dev *dev)
 	return 0;
 }
 
+static int
+gve_refill_pages(struct gve_rx_queue *rxq)
+{
+	struct rte_mbuf *nmb;
+	uint16_t i;
+	int diag;
+
+	diag = rte_pktmbuf_alloc_bulk(rxq->mpool, &rxq->sw_ring[0], rxq->nb_rx_desc);
+	if (diag < 0) {
+		for (i = 0; i < rxq->nb_rx_desc - 1; i++) {
+			nmb = rte_pktmbuf_alloc(rxq->mpool);
+			if (!nmb)
+				break;
+			rxq->sw_ring[i] = nmb;
+		}
+		if (i < rxq->nb_rx_desc - 1)
+			return -ENOMEM;
+	}
+	rxq->nb_avail = 0;
+	rxq->next_avail = rxq->nb_rx_desc - 1;
+
+	for (i = 0; i < rxq->nb_rx_desc; i++) {
+		if (rxq->is_gqi_qpl) {
+			rxq->rx_data_ring[i].addr = rte_cpu_to_be_64(i * PAGE_SIZE);
+		} else {
+			if (i == rxq->nb_rx_desc - 1)
+				break;
+			nmb = rxq->sw_ring[i];
+			rxq->rx_data_ring[i].addr = rte_cpu_to_be_64(rte_mbuf_data_iova(nmb));
+		}
+	}
+
+	rte_write32(rte_cpu_to_be_32(rxq->next_avail), rxq->qrx_tail);
+
+	return 0;
+}
+
 static int
 gve_link_update(struct rte_eth_dev *dev, __rte_unused int wait_to_complete)
 {
@@ -73,16 +172,70 @@ gve_link_update(struct rte_eth_dev *dev, __rte_unused int wait_to_complete)
 static int
 gve_dev_start(struct rte_eth_dev *dev)
 {
+	uint16_t num_queues = dev->data->nb_tx_queues;
+	struct gve_priv *priv = dev->data->dev_private;
+	struct gve_tx_queue *txq;
+	struct gve_rx_queue *rxq;
+	uint16_t i;
+	int err;
+
+	priv->txqs = (struct gve_tx_queue **)dev->data->tx_queues;
+	err = gve_adminq_create_tx_queues(priv, num_queues);
+	if (err) {
+		PMD_DRV_LOG(ERR, "failed to create %u tx queues.", num_queues);
+		return err;
+	}
+	for (i = 0; i < num_queues; i++) {
+		txq = priv->txqs[i];
+		txq->qtx_tail =
+		&priv->db_bar2[rte_be_to_cpu_32(txq->qres->db_index)];
+		txq->qtx_head =
+		&priv->cnt_array[rte_be_to_cpu_32(txq->qres->counter_index)];
+
+		rte_write32(rte_cpu_to_be_32(GVE_IRQ_MASK), txq->ntfy_addr);
+	}
+
+	num_queues = dev->data->nb_rx_queues;
+	priv->rxqs = (struct gve_rx_queue **)dev->data->rx_queues;
+	err = gve_adminq_create_rx_queues(priv, num_queues);
+	if (err) {
+		PMD_DRV_LOG(ERR, "failed to create %u rx queues.", num_queues);
+		goto err_tx;
+	}
+	for (i = 0; i < num_queues; i++) {
+		rxq = priv->rxqs[i];
+		rxq->qrx_tail =
+		&priv->db_bar2[rte_be_to_cpu_32(rxq->qres->db_index)];
+
+		rte_write32(rte_cpu_to_be_32(GVE_IRQ_MASK), rxq->ntfy_addr);
+
+		err = gve_refill_pages(rxq);
+		if (err) {
+			PMD_DRV_LOG(ERR, "Failed to refill for RX");
+			goto err_rx;
+		}
+	}
+
 	dev->data->dev_started = 1;
 	gve_link_update(dev, 0);
 
 	return 0;
+
+err_rx:
+	gve_stop_rx_queues(dev);
+err_tx:
+	gve_stop_tx_queues(dev);
+	return err;
 }
 
 static int
 gve_dev_stop(struct rte_eth_dev *dev)
 {
 	dev->data->dev_link.link_status = RTE_ETH_LINK_DOWN;
+
+	gve_stop_tx_queues(dev);
+	gve_stop_rx_queues(dev);
+
 	dev->data->dev_started = 0;
 
 	return 0;
@@ -91,7 +244,11 @@ gve_dev_stop(struct rte_eth_dev *dev)
 static int
 gve_dev_close(struct rte_eth_dev *dev)
 {
+	struct gve_priv *priv = dev->data->dev_private;
+	struct gve_tx_queue *txq;
+	struct gve_rx_queue *rxq;
 	int err = 0;
+	uint16_t i;
 
 	if (dev->data->dev_started) {
 		err = gve_dev_stop(dev);
@@ -99,6 +256,19 @@ gve_dev_close(struct rte_eth_dev *dev)
 			PMD_DRV_LOG(ERR, "Failed to stop dev.");
 	}
 
+	for (i = 0; i < dev->data->nb_tx_queues; i++) {
+		txq = dev->data->tx_queues[i];
+		gve_tx_queue_release(txq);
+	}
+
+	for (i = 0; i < dev->data->nb_rx_queues; i++) {
+		rxq = dev->data->rx_queues[i];
+		gve_rx_queue_release(rxq);
+	}
+
+	gve_free_qpls(priv);
+	rte_free(priv->adminq);
+
 	dev->data->mac_addrs = NULL;
 
 	return err;
@@ -185,6 +355,8 @@ static const struct eth_dev_ops gve_eth_dev_ops = {
 	.dev_stop             = gve_dev_stop,
 	.dev_close            = gve_dev_close,
 	.dev_infos_get        = gve_dev_info_get,
+	.rx_queue_setup       = gve_rx_queue_setup,
+	.tx_queue_setup       = gve_tx_queue_setup,
 	.link_update          = gve_link_update,
 	.mtu_set              = gve_dev_mtu_set,
 };
@@ -322,7 +494,9 @@ gve_setup_device_resources(struct gve_priv *priv)
 static int
 gve_init_priv(struct gve_priv *priv, bool skip_describe_device)
 {
+	uint16_t pages;
 	int num_ntfy;
+	uint32_t i;
 	int err;
 
 	/* Set up the adminq */
@@ -373,10 +547,40 @@ gve_init_priv(struct gve_priv *priv, bool skip_describe_device)
 	PMD_DRV_LOG(INFO, "Max TX queues %d, Max RX queues %d",
 		    priv->max_nb_txq, priv->max_nb_rxq);
 
+	/* In GQI_QPL queue format:
+	 * Allocate queue page lists according to max queue number
+	 * tx qpl id should start from 0 while rx qpl id should start
+	 * from priv->max_nb_txq
+	 */
+	if (priv->queue_format == GVE_GQI_QPL_FORMAT) {
+		priv->qpl = rte_zmalloc("gve_qpl",
+					(priv->max_nb_txq + priv->max_nb_rxq) *
+					sizeof(struct gve_queue_page_list), 0);
+		if (priv->qpl == NULL) {
+			PMD_DRV_LOG(ERR, "Failed to alloc qpl.");
+			err = -ENOMEM;
+			goto free_adminq;
+		}
+
+		for (i = 0; i < priv->max_nb_txq + priv->max_nb_rxq; i++) {
+			if (i < priv->max_nb_txq)
+				pages = priv->tx_pages_per_qpl;
+			else
+				pages = priv->rx_data_slot_cnt;
+			err = gve_alloc_queue_page_list(priv, i, pages);
+			if (err != 0) {
+				PMD_DRV_LOG(ERR, "Failed to alloc qpl %u.", i);
+				goto err_qpl;
+			}
+		}
+	}
+
 setup_device:
 	err = gve_setup_device_resources(priv);
 	if (!err)
 		return 0;
+err_qpl:
+	gve_free_qpls(priv);
 free_adminq:
 	gve_adminq_free(priv);
 	return err;
diff --git a/drivers/net/gve/gve_ethdev.h b/drivers/net/gve/gve_ethdev.h
index 57c29374b5..00c69d1b88 100644
--- a/drivers/net/gve/gve_ethdev.h
+++ b/drivers/net/gve/gve_ethdev.h
@@ -37,15 +37,35 @@ union gve_tx_desc {
 	struct gve_tx_seg_desc seg; /* subsequent descs for a packet */
 };
 
+struct gve_tx_iovec {
+	uint32_t iov_base; /* offset in fifo */
+	uint32_t iov_len;
+};
+
 struct gve_tx_queue {
 	volatile union gve_tx_desc *tx_desc_ring;
 	const struct rte_memzone *mz;
 	uint64_t tx_ring_phys_addr;
+	struct rte_mbuf **sw_ring;
+	volatile rte_be32_t *qtx_tail;
+	volatile rte_be32_t *qtx_head;
 
+	uint32_t tx_tail;
 	uint16_t nb_tx_desc;
+	uint16_t nb_free;
+	uint32_t next_to_clean;
+	uint16_t free_thresh;
 
 	/* Only valid for DQO_QPL queue format */
+	uint16_t sw_tail;
+	uint16_t sw_ntc;
+	uint16_t sw_nb_free;
+	uint32_t fifo_size;
+	uint32_t fifo_head;
+	uint32_t fifo_avail;
+	uint64_t fifo_base;
 	struct gve_queue_page_list *qpl;
+	struct gve_tx_iovec *iov_ring;
 
 	uint16_t port_id;
 	uint16_t queue_id;
@@ -59,6 +79,8 @@ struct gve_tx_queue {
 
 	/* Only valid for DQO_RDA queue format */
 	struct gve_tx_queue *complq;
+
+	uint8_t is_gqi_qpl;
 };
 
 struct gve_rx_queue {
@@ -67,9 +89,17 @@ struct gve_rx_queue {
 	const struct rte_memzone *mz;
 	const struct rte_memzone *data_mz;
 	uint64_t rx_ring_phys_addr;
+	struct rte_mbuf **sw_ring;
+	struct rte_mempool *mpool;
 
+	uint16_t rx_tail;
 	uint16_t nb_rx_desc;
+	uint16_t expected_seqno; /* the next expected seqno */
+	uint16_t free_thresh;
+	uint32_t next_avail;
+	uint32_t nb_avail;
 
+	volatile rte_be32_t *qrx_tail;
 	volatile rte_be32_t *ntfy_addr;
 
 	/* only valid for GQI_QPL queue format */
@@ -86,6 +116,8 @@ struct gve_rx_queue {
 
 	/* Only valid for DQO_RDA queue format */
 	struct gve_rx_queue *bufq;
+
+	uint8_t is_gqi_qpl;
 };
 
 struct gve_priv {
@@ -225,4 +257,24 @@ gve_clear_device_rings_ok(struct gve_priv *priv)
 				&priv->state_flags);
 }
 
+int
+gve_rx_queue_setup(struct rte_eth_dev *dev, uint16_t queue_id, uint16_t nb_desc,
+		   unsigned int socket_id, const struct rte_eth_rxconf *conf,
+		   struct rte_mempool *pool);
+int
+gve_tx_queue_setup(struct rte_eth_dev *dev, uint16_t queue_id, uint16_t nb_desc,
+		   unsigned int socket_id, const struct rte_eth_txconf *conf);
+
+void
+gve_tx_queue_release(void *txq);
+
+void
+gve_rx_queue_release(void *rxq);
+
+void
+gve_stop_tx_queues(struct rte_eth_dev *dev);
+
+void
+gve_stop_rx_queues(struct rte_eth_dev *dev);
+
 #endif /* _GVE_ETHDEV_H_ */
diff --git a/drivers/net/gve/gve_rx.c b/drivers/net/gve/gve_rx.c
new file mode 100644
index 0000000000..e64a461253
--- /dev/null
+++ b/drivers/net/gve/gve_rx.c
@@ -0,0 +1,212 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2022 Intel Corporation
+ */
+
+#include "gve_ethdev.h"
+#include "base/gve_adminq.h"
+
+static inline void
+gve_reset_rxq(struct gve_rx_queue *rxq)
+{
+	struct rte_mbuf **sw_ring = rxq->sw_ring;
+	uint32_t size, i;
+
+	if (rxq == NULL) {
+		PMD_DRV_LOG(ERR, "pointer to rxq is NULL");
+		return;
+	}
+
+	size = rxq->nb_rx_desc * sizeof(struct gve_rx_desc);
+	for (i = 0; i < size; i++)
+		((volatile char *)rxq->rx_desc_ring)[i] = 0;
+
+	size = rxq->nb_rx_desc * sizeof(union gve_rx_data_slot);
+	for (i = 0; i < size; i++)
+		((volatile char *)rxq->rx_data_ring)[i] = 0;
+
+	for (i = 0; i < rxq->nb_rx_desc; i++)
+		sw_ring[i] = NULL;
+
+	rxq->rx_tail = 0;
+	rxq->next_avail = 0;
+	rxq->nb_avail = rxq->nb_rx_desc;
+	rxq->expected_seqno = 1;
+}
+
+static inline void
+gve_release_rxq_mbufs(struct gve_rx_queue *rxq)
+{
+	uint16_t i;
+
+	for (i = 0; i < rxq->nb_rx_desc; i++) {
+		if (rxq->sw_ring[i]) {
+			rte_pktmbuf_free_seg(rxq->sw_ring[i]);
+			rxq->sw_ring[i] = NULL;
+		}
+	}
+
+	rxq->nb_avail = rxq->nb_rx_desc;
+}
+
+void
+gve_rx_queue_release(void *rxq)
+{
+	struct gve_rx_queue *q = rxq;
+
+	if (!q)
+		return;
+
+	if (q->is_gqi_qpl) {
+		gve_adminq_unregister_page_list(q->hw, q->qpl->id);
+		q->qpl = NULL;
+	}
+
+	gve_release_rxq_mbufs(q);
+	rte_free(q->sw_ring);
+	rte_memzone_free(q->data_mz);
+	rte_memzone_free(q->mz);
+	rte_memzone_free(q->qres_mz);
+	q->qres = NULL;
+	rte_free(q);
+}
+
+int
+gve_rx_queue_setup(struct rte_eth_dev *dev, uint16_t queue_id,
+		uint16_t nb_desc, unsigned int socket_id,
+		const struct rte_eth_rxconf *conf, struct rte_mempool *pool)
+{
+	struct gve_priv *hw = dev->data->dev_private;
+	const struct rte_memzone *mz;
+	struct gve_rx_queue *rxq;
+	uint16_t free_thresh;
+	int err = 0;
+
+	if (nb_desc != hw->rx_desc_cnt) {
+		PMD_DRV_LOG(WARNING, "gve doesn't support nb_desc config, use hw nb_desc %u.",
+			    hw->rx_desc_cnt);
+	}
+	nb_desc = hw->rx_desc_cnt;
+
+	/* Free memory if needed. */
+	if (dev->data->rx_queues[queue_id]) {
+		gve_rx_queue_release(dev->data->rx_queues[queue_id]);
+		dev->data->rx_queues[queue_id] = NULL;
+	}
+
+	/* Allocate the RX queue data structure. */
+	rxq = rte_zmalloc_socket("gve rxq",
+				 sizeof(struct gve_rx_queue),
+				 RTE_CACHE_LINE_SIZE,
+				 socket_id);
+	if (!rxq) {
+		PMD_DRV_LOG(ERR, "Failed to allocate memory for rx queue structure");
+		err = -ENOMEM;
+		goto err_rxq;
+	}
+
+	free_thresh = conf->rx_free_thresh ? conf->rx_free_thresh : GVE_DEFAULT_RX_FREE_THRESH;
+	if (free_thresh >= nb_desc) {
+		PMD_DRV_LOG(ERR, "rx_free_thresh (%u) must be less than nb_desc (%u) minus 3.",
+			    free_thresh, rxq->nb_rx_desc);
+		err = -EINVAL;
+		goto err_rxq;
+	}
+
+	rxq->nb_rx_desc = nb_desc;
+	rxq->free_thresh = free_thresh;
+	rxq->queue_id = queue_id;
+	rxq->port_id = dev->data->port_id;
+	rxq->ntfy_id = hw->num_ntfy_blks / 2 + queue_id;
+	rxq->is_gqi_qpl = hw->queue_format == GVE_GQI_QPL_FORMAT;
+	rxq->mpool = pool;
+	rxq->hw = hw;
+	rxq->ntfy_addr = &hw->db_bar2[rte_be_to_cpu_32(hw->irq_dbs[rxq->ntfy_id].id)];
+
+	rxq->rx_buf_len = rte_pktmbuf_data_room_size(rxq->mpool) - RTE_PKTMBUF_HEADROOM;
+
+	/* Allocate software ring */
+	rxq->sw_ring = rte_zmalloc_socket("gve rx sw ring", sizeof(struct rte_mbuf *) * nb_desc,
+					  RTE_CACHE_LINE_SIZE, socket_id);
+	if (!rxq->sw_ring) {
+		PMD_DRV_LOG(ERR, "Failed to allocate memory for SW RX ring");
+		err = -ENOMEM;
+		goto err_rxq;
+	}
+
+	mz = rte_eth_dma_zone_reserve(dev, "rx_ring", queue_id,
+				      nb_desc * sizeof(struct gve_rx_desc),
+				      PAGE_SIZE, socket_id);
+	if (mz == NULL) {
+		PMD_DRV_LOG(ERR, "Failed to reserve DMA memory for RX");
+		err = -ENOMEM;
+		goto err_sw_ring;
+	}
+	rxq->rx_desc_ring = (struct gve_rx_desc *)mz->addr;
+	rxq->rx_ring_phys_addr = mz->iova;
+	rxq->mz = mz;
+
+	mz = rte_eth_dma_zone_reserve(dev, "gve rx data ring", queue_id,
+				      sizeof(union gve_rx_data_slot) * nb_desc,
+				      PAGE_SIZE, socket_id);
+	if (mz == NULL) {
+		PMD_DRV_LOG(ERR, "Failed to allocate memory for RX data ring");
+		err = -ENOMEM;
+		goto err_rx_ring;
+	}
+	rxq->rx_data_ring = (union gve_rx_data_slot *)mz->addr;
+	rxq->data_mz = mz;
+	if (rxq->is_gqi_qpl) {
+		rxq->qpl = &hw->qpl[rxq->ntfy_id];
+		err = gve_adminq_register_page_list(hw, rxq->qpl);
+		if (err != 0) {
+			PMD_DRV_LOG(ERR, "Failed to register qpl %u", queue_id);
+			goto err_data_ring;
+		}
+	}
+
+	mz = rte_eth_dma_zone_reserve(dev, "rxq_res", queue_id,
+				      sizeof(struct gve_queue_resources),
+				      PAGE_SIZE, socket_id);
+	if (mz == NULL) {
+		PMD_DRV_LOG(ERR, "Failed to reserve DMA memory for RX resource");
+		err = -ENOMEM;
+		goto err_data_ring;
+	}
+	rxq->qres = (struct gve_queue_resources *)mz->addr;
+	rxq->qres_mz = mz;
+
+	gve_reset_rxq(rxq);
+
+	dev->data->rx_queues[queue_id] = rxq;
+
+	return 0;
+
+err_data_ring:
+	rte_memzone_free(rxq->data_mz);
+err_rx_ring:
+	rte_memzone_free(rxq->mz);
+err_sw_ring:
+	rte_free(rxq->sw_ring);
+err_rxq:
+	rte_free(rxq);
+	return err;
+}
+
+void
+gve_stop_rx_queues(struct rte_eth_dev *dev)
+{
+	struct gve_priv *hw = dev->data->dev_private;
+	struct gve_rx_queue *rxq;
+	uint16_t i;
+	int err;
+
+	err = gve_adminq_destroy_rx_queues(hw, dev->data->nb_rx_queues);
+	if (err != 0)
+		PMD_DRV_LOG(WARNING, "failed to destroy rxqs");
+
+	for (i = 0; i < dev->data->nb_rx_queues; i++) {
+		rxq = dev->data->rx_queues[i];
+		gve_release_rxq_mbufs(rxq);
+		gve_reset_rxq(rxq);
+	}
+}
diff --git a/drivers/net/gve/gve_tx.c b/drivers/net/gve/gve_tx.c
new file mode 100644
index 0000000000..b706b62e71
--- /dev/null
+++ b/drivers/net/gve/gve_tx.c
@@ -0,0 +1,214 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2022 Intel Corporation
+ */
+
+#include "gve_ethdev.h"
+#include "base/gve_adminq.h"
+
+static inline void
+gve_reset_txq(struct gve_tx_queue *txq)
+{
+	struct rte_mbuf **sw_ring = txq->sw_ring;
+	uint32_t size, i;
+
+	if (txq == NULL) {
+		PMD_DRV_LOG(ERR, "Pointer to txq is NULL");
+		return;
+	}
+
+	size = txq->nb_tx_desc * sizeof(union gve_tx_desc);
+	for (i = 0; i < size; i++)
+		((volatile char *)txq->tx_desc_ring)[i] = 0;
+
+	for (i = 0; i < txq->nb_tx_desc; i++) {
+		sw_ring[i] = NULL;
+		if (txq->is_gqi_qpl) {
+			txq->iov_ring[i].iov_base = 0;
+			txq->iov_ring[i].iov_len = 0;
+		}
+	}
+
+	txq->tx_tail = 0;
+	txq->nb_free = txq->nb_tx_desc - 1;
+	txq->next_to_clean = 0;
+
+	if (txq->is_gqi_qpl) {
+		txq->fifo_size = PAGE_SIZE * txq->hw->tx_pages_per_qpl;
+		txq->fifo_avail = txq->fifo_size;
+		txq->fifo_head = 0;
+		txq->fifo_base = (uint64_t)(txq->qpl->mz->addr);
+
+		txq->sw_tail = 0;
+		txq->sw_nb_free = txq->nb_tx_desc - 1;
+		txq->sw_ntc = 0;
+	}
+}
+
+static inline void
+gve_release_txq_mbufs(struct gve_tx_queue *txq)
+{
+	uint16_t i;
+
+	for (i = 0; i < txq->nb_tx_desc; i++) {
+		if (txq->sw_ring[i]) {
+			rte_pktmbuf_free_seg(txq->sw_ring[i]);
+			txq->sw_ring[i] = NULL;
+		}
+	}
+}
+
+void
+gve_tx_queue_release(void *txq)
+{
+	struct gve_tx_queue *q = txq;
+
+	if (!q)
+		return;
+
+	if (q->is_gqi_qpl) {
+		gve_adminq_unregister_page_list(q->hw, q->qpl->id);
+		rte_free(q->iov_ring);
+		q->qpl = NULL;
+	}
+
+	gve_release_txq_mbufs(q);
+	rte_free(q->sw_ring);
+	rte_memzone_free(q->mz);
+	rte_memzone_free(q->qres_mz);
+	q->qres = NULL;
+	rte_free(q);
+}
+
+int
+gve_tx_queue_setup(struct rte_eth_dev *dev, uint16_t queue_id, uint16_t nb_desc,
+		   unsigned int socket_id, const struct rte_eth_txconf *conf)
+{
+	struct gve_priv *hw = dev->data->dev_private;
+	const struct rte_memzone *mz;
+	struct gve_tx_queue *txq;
+	uint16_t free_thresh;
+	int err = 0;
+
+	if (nb_desc != hw->tx_desc_cnt) {
+		PMD_DRV_LOG(WARNING, "gve doesn't support nb_desc config, use hw nb_desc %u.",
+			    hw->tx_desc_cnt);
+	}
+	nb_desc = hw->tx_desc_cnt;
+
+	/* Free memory if needed. */
+	if (dev->data->tx_queues[queue_id]) {
+		gve_tx_queue_release(dev->data->tx_queues[queue_id]);
+		dev->data->tx_queues[queue_id] = NULL;
+	}
+
+	/* Allocate the TX queue data structure. */
+	txq = rte_zmalloc_socket("gve txq", sizeof(struct gve_tx_queue),
+				 RTE_CACHE_LINE_SIZE, socket_id);
+	if (!txq) {
+		PMD_DRV_LOG(ERR, "Failed to allocate memory for tx queue structure");
+		err = -ENOMEM;
+		goto err_txq;
+	}
+
+	free_thresh = conf->tx_free_thresh ? conf->tx_free_thresh : GVE_DEFAULT_TX_FREE_THRESH;
+	if (free_thresh >= nb_desc - 3) {
+		PMD_DRV_LOG(ERR, "tx_free_thresh (%u) must be less than nb_desc (%u) minus 3.",
+			    free_thresh, txq->nb_tx_desc);
+		err = -EINVAL;
+		goto err_txq;
+	}
+
+	txq->nb_tx_desc = nb_desc;
+	txq->free_thresh = free_thresh;
+	txq->queue_id = queue_id;
+	txq->port_id = dev->data->port_id;
+	txq->ntfy_id = queue_id;
+	txq->is_gqi_qpl = hw->queue_format == GVE_GQI_QPL_FORMAT;
+	txq->hw = hw;
+	txq->ntfy_addr = &hw->db_bar2[rte_be_to_cpu_32(hw->irq_dbs[txq->ntfy_id].id)];
+
+	/* Allocate software ring */
+	txq->sw_ring = rte_zmalloc_socket("gve tx sw ring",
+					  sizeof(struct rte_mbuf *) * nb_desc,
+					  RTE_CACHE_LINE_SIZE, socket_id);
+	if (!txq->sw_ring) {
+		PMD_DRV_LOG(ERR, "Failed to allocate memory for SW TX ring");
+		err = -ENOMEM;
+		goto err_txq;
+	}
+
+	mz = rte_eth_dma_zone_reserve(dev, "tx_ring", queue_id,
+				      nb_desc * sizeof(union gve_tx_desc),
+				      PAGE_SIZE, socket_id);
+	if (mz == NULL) {
+		PMD_DRV_LOG(ERR, "Failed to reserve DMA memory for TX");
+		err = -ENOMEM;
+		goto err_sw_ring;
+	}
+	txq->tx_desc_ring = (union gve_tx_desc *)mz->addr;
+	txq->tx_ring_phys_addr = mz->iova;
+	txq->mz = mz;
+
+	if (txq->is_gqi_qpl) {
+		txq->iov_ring = rte_zmalloc_socket("gve tx iov ring",
+						   sizeof(struct gve_tx_iovec) * nb_desc,
+						   RTE_CACHE_LINE_SIZE, socket_id);
+		if (!txq->iov_ring) {
+			PMD_DRV_LOG(ERR, "Failed to allocate memory for SW TX ring");
+			err = -ENOMEM;
+			goto err_tx_ring;
+		}
+		txq->qpl = &hw->qpl[queue_id];
+		err = gve_adminq_register_page_list(hw, txq->qpl);
+		if (err != 0) {
+			PMD_DRV_LOG(ERR, "Failed to register qpl %u", queue_id);
+			goto err_iov_ring;
+		}
+	}
+
+	mz = rte_eth_dma_zone_reserve(dev, "txq_res", queue_id, sizeof(struct gve_queue_resources),
+				      PAGE_SIZE, socket_id);
+	if (mz == NULL) {
+		PMD_DRV_LOG(ERR, "Failed to reserve DMA memory for TX resource");
+		err = -ENOMEM;
+		goto err_iov_ring;
+	}
+	txq->qres = (struct gve_queue_resources *)mz->addr;
+	txq->qres_mz = mz;
+
+	gve_reset_txq(txq);
+
+	dev->data->tx_queues[queue_id] = txq;
+
+	return 0;
+
+err_iov_ring:
+	if (txq->is_gqi_qpl)
+		rte_free(txq->iov_ring);
+err_tx_ring:
+	rte_memzone_free(txq->mz);
+err_sw_ring:
+	rte_free(txq->sw_ring);
+err_txq:
+	rte_free(txq);
+	return err;
+}
+
+void
+gve_stop_tx_queues(struct rte_eth_dev *dev)
+{
+	struct gve_priv *hw = dev->data->dev_private;
+	struct gve_tx_queue *txq;
+	uint16_t i;
+	int err;
+
+	err = gve_adminq_destroy_tx_queues(hw, dev->data->nb_tx_queues);
+	if (err != 0)
+		PMD_DRV_LOG(WARNING, "failed to destroy txqs");
+
+	for (i = 0; i < dev->data->nb_tx_queues; i++) {
+		txq = dev->data->tx_queues[i];
+		gve_release_txq_mbufs(txq);
+		gve_reset_txq(txq);
+	}
+}
diff --git a/drivers/net/gve/meson.build b/drivers/net/gve/meson.build
index d8ec64b3a3..af0010c01c 100644
--- a/drivers/net/gve/meson.build
+++ b/drivers/net/gve/meson.build
@@ -9,6 +9,8 @@ endif
 
 sources = files(
         'base/gve_adminq.c',
+        'gve_rx.c',
+        'gve_tx.c',
         'gve_ethdev.c',
 )
 includes += include_directories('base')
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 192+ messages in thread

* [PATCH v8 8/8] net/gve: add support for Rx/Tx
  2022-10-25  9:07                         ` [PATCH v8 0/8] introduce GVE PMD Junfeng Guo
                                             ` (6 preceding siblings ...)
  2022-10-25  9:07                           ` [PATCH v8 7/8] net/gve: add support for queue operations Junfeng Guo
@ 2022-10-25  9:07                           ` Junfeng Guo
  2022-10-25 12:33                           ` [PATCH v8 0/8] introduce GVE PMD Ferruh Yigit
  8 siblings, 0 replies; 192+ messages in thread
From: Junfeng Guo @ 2022-10-25  9:07 UTC (permalink / raw)
  To: qi.z.zhang, jingjing.wu, ferruh.yigit, beilei.xing
  Cc: dev, xiaoyun.li, awogbemila, bruce.richardson, hemant.agrawal,
	stephen, chenbo.xia, helin.zhang, Junfeng Guo

Add Rx/Tx of GQI_QPL queue format and GQI_RDA queue format.
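
For illustration only (not part of this patch), a minimal forwarding loop
showing how an application drives these paths through the ethdev API;
BURST_SIZE and the queue id are placeholders, and rte_eth_rx_burst()/
rte_eth_tx_burst() dispatch to gve_rx_burst()/gve_tx_burst() on a gve port:

#include <rte_ethdev.h>
#include <rte_mbuf.h>

#define BURST_SIZE 32	/* illustrative burst size */

static void
gve_forward_once_sketch(uint16_t port_id)
{
	struct rte_mbuf *pkts[BURST_SIZE];
	uint16_t nb_rx, nb_tx, i;

	nb_rx = rte_eth_rx_burst(port_id, 0, pkts, BURST_SIZE);

	for (i = 0; i < nb_rx; i++) {
		/* the RSS hash is reported when the Rx descriptor carried one */
		if (pkts[i]->ol_flags & RTE_MBUF_F_RX_RSS_HASH)
			(void)pkts[i]->hash.rss;
	}

	nb_tx = rte_eth_tx_burst(port_id, 0, pkts, nb_rx);

	/* free whatever the Tx queue did not accept */
	for (i = nb_tx; i < nb_rx; i++)
		rte_pktmbuf_free(pkts[i]);
}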

Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com>
Signed-off-by: Junfeng Guo <junfeng.guo@intel.com>
---
 doc/guides/nics/features/gve.ini |   2 +
 doc/guides/nics/gve.rst          |   4 +
 drivers/net/gve/gve_ethdev.c     |  15 +-
 drivers/net/gve/gve_ethdev.h     |  18 ++
 drivers/net/gve/gve_rx.c         | 142 ++++++++++
 drivers/net/gve/gve_tx.c         | 455 +++++++++++++++++++++++++++++++
 6 files changed, 635 insertions(+), 1 deletion(-)

diff --git a/doc/guides/nics/features/gve.ini b/doc/guides/nics/features/gve.ini
index 986df7f94a..cdc46b08a3 100644
--- a/doc/guides/nics/features/gve.ini
+++ b/doc/guides/nics/features/gve.ini
@@ -7,7 +7,9 @@
 Speed capabilities   = Y
 Link status          = Y
 MTU update           = Y
+TSO                  = Y
 RSS hash             = Y
+L4 checksum offload  = Y
 Linux                = Y
 x86-32               = Y
 x86-64               = Y
diff --git a/doc/guides/nics/gve.rst b/doc/guides/nics/gve.rst
index 1051d964f1..92b85e0dbe 100644
--- a/doc/guides/nics/gve.rst
+++ b/doc/guides/nics/gve.rst
@@ -62,7 +62,11 @@ In this release, the GVE PMD provides the basic functionality of packet
 reception and transmission.
 Supported features of the GVE PMD are:
 
+- Multiple queues for TX and RX
+- TSO offload
 - Link state information
+- TX multi-segments (Scatter TX)
+- Tx UDP/TCP/SCTP Checksum
 
 Currently, only GQI_QPL and GQI_RDA queue format are supported in PMD.
 Jumbo Frame is not supported in PMD for now. It'll be added in the future
diff --git a/drivers/net/gve/gve_ethdev.c b/drivers/net/gve/gve_ethdev.c
index 892e7e2e1c..5c0cd2f2c4 100644
--- a/drivers/net/gve/gve_ethdev.c
+++ b/drivers/net/gve/gve_ethdev.c
@@ -289,7 +289,13 @@ gve_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
 	dev_info->min_mtu = GVE_MIN_MTU;
 
 	dev_info->rx_offload_capa = 0;
-	dev_info->tx_offload_capa = 0;
+	dev_info->tx_offload_capa =
+		RTE_ETH_TX_OFFLOAD_MULTI_SEGS	|
+		RTE_ETH_TX_OFFLOAD_IPV4_CKSUM	|
+		RTE_ETH_TX_OFFLOAD_UDP_CKSUM	|
+		RTE_ETH_TX_OFFLOAD_TCP_CKSUM	|
+		RTE_ETH_TX_OFFLOAD_SCTP_CKSUM	|
+		RTE_ETH_TX_OFFLOAD_TCP_TSO;
 
 	if (priv->queue_format == GVE_DQO_RDA_FORMAT)
 		dev_info->rx_offload_capa |= RTE_ETH_RX_OFFLOAD_TCP_LRO;
@@ -639,6 +645,13 @@ gve_dev_init(struct rte_eth_dev *eth_dev)
 	if (err)
 		return err;
 
+	if (gve_is_gqi(priv)) {
+		eth_dev->rx_pkt_burst = gve_rx_burst;
+		eth_dev->tx_pkt_burst = gve_tx_burst;
+	} else {
+		PMD_DRV_LOG(ERR, "DQO_RDA is not implemented and will be added in the future");
+	}
+
 	eth_dev->data->mac_addrs = &priv->dev_addr;
 
 	return 0;
diff --git a/drivers/net/gve/gve_ethdev.h b/drivers/net/gve/gve_ethdev.h
index 00c69d1b88..36b334c36b 100644
--- a/drivers/net/gve/gve_ethdev.h
+++ b/drivers/net/gve/gve_ethdev.h
@@ -37,6 +37,18 @@ union gve_tx_desc {
 	struct gve_tx_seg_desc seg; /* subsequent descs for a packet */
 };
 
+/* Offload features */
+union gve_tx_offload {
+	uint64_t data;
+	struct {
+		uint64_t l2_len:7; /* L2 (MAC) Header Length. */
+		uint64_t l3_len:9; /* L3 (IP) Header Length. */
+		uint64_t l4_len:8; /* L4 Header Length. */
+		uint64_t tso_segsz:16; /* TCP TSO segment size */
+		/* uint64_t unused : 24; */
+	};
+};
+
 struct gve_tx_iovec {
 	uint32_t iov_base; /* offset in fifo */
 	uint32_t iov_len;
@@ -277,4 +289,10 @@ gve_stop_tx_queues(struct rte_eth_dev *dev);
 void
 gve_stop_rx_queues(struct rte_eth_dev *dev);
 
+uint16_t
+gve_rx_burst(void *rxq, struct rte_mbuf **rx_pkts, uint16_t nb_pkts);
+
+uint16_t
+gve_tx_burst(void *txq, struct rte_mbuf **tx_pkts, uint16_t nb_pkts);
+
 #endif /* _GVE_ETHDEV_H_ */
diff --git a/drivers/net/gve/gve_rx.c b/drivers/net/gve/gve_rx.c
index e64a461253..ea397d68fa 100644
--- a/drivers/net/gve/gve_rx.c
+++ b/drivers/net/gve/gve_rx.c
@@ -5,6 +5,148 @@
 #include "gve_ethdev.h"
 #include "base/gve_adminq.h"
 
+static inline void
+gve_rx_refill(struct gve_rx_queue *rxq)
+{
+	uint16_t mask = rxq->nb_rx_desc - 1;
+	uint16_t idx = rxq->next_avail & mask;
+	uint32_t next_avail = rxq->next_avail;
+	uint16_t nb_alloc, i;
+	struct rte_mbuf *nmb;
+	int diag;
+
+	/* wrap around */
+	nb_alloc = rxq->nb_rx_desc - idx;
+	if (nb_alloc <= rxq->nb_avail) {
+		diag = rte_pktmbuf_alloc_bulk(rxq->mpool, &rxq->sw_ring[idx], nb_alloc);
+		if (diag < 0) {
+			for (i = 0; i < nb_alloc; i++) {
+				nmb = rte_pktmbuf_alloc(rxq->mpool);
+				if (!nmb)
+					break;
+				rxq->sw_ring[idx + i] = nmb;
+			}
+			if (i != nb_alloc)
+				nb_alloc = i;
+		}
+		rxq->nb_avail -= nb_alloc;
+		next_avail += nb_alloc;
+
+		/* queue page list mode doesn't need real refill. */
+		if (rxq->is_gqi_qpl) {
+			idx += nb_alloc;
+		} else {
+			for (i = 0; i < nb_alloc; i++) {
+				nmb = rxq->sw_ring[idx];
+				rxq->rx_data_ring[idx].addr =
+					rte_cpu_to_be_64(rte_mbuf_data_iova(nmb));
+				idx++;
+			}
+		}
+		if (idx == rxq->nb_rx_desc)
+			idx = 0;
+	}
+
+	if (rxq->nb_avail > 0) {
+		nb_alloc = rxq->nb_avail;
+		if (rxq->nb_rx_desc < idx + rxq->nb_avail)
+			nb_alloc = rxq->nb_rx_desc - idx;
+		diag = rte_pktmbuf_alloc_bulk(rxq->mpool, &rxq->sw_ring[idx], nb_alloc);
+		if (diag < 0) {
+			for (i = 0; i < nb_alloc; i++) {
+				nmb = rte_pktmbuf_alloc(rxq->mpool);
+				if (!nmb)
+					break;
+				rxq->sw_ring[idx + i] = nmb;
+			}
+			nb_alloc = i;
+		}
+		rxq->nb_avail -= nb_alloc;
+		next_avail += nb_alloc;
+
+		if (!rxq->is_gqi_qpl) {
+			for (i = 0; i < nb_alloc; i++) {
+				nmb = rxq->sw_ring[idx];
+				rxq->rx_data_ring[idx].addr =
+					rte_cpu_to_be_64(rte_mbuf_data_iova(nmb));
+				idx++;
+			}
+		}
+	}
+
+	if (next_avail != rxq->next_avail) {
+		rte_write32(rte_cpu_to_be_32(next_avail), rxq->qrx_tail);
+		rxq->next_avail = next_avail;
+	}
+}
+
+uint16_t
+gve_rx_burst(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t nb_pkts)
+{
+	volatile struct gve_rx_desc *rxr, *rxd;
+	struct gve_rx_queue *rxq = rx_queue;
+	uint16_t rx_id = rxq->rx_tail;
+	struct rte_mbuf *rxe;
+	uint16_t nb_rx, len;
+	uint64_t addr;
+	uint16_t i;
+
+	rxr = rxq->rx_desc_ring;
+	nb_rx = 0;
+
+	for (i = 0; i < nb_pkts; i++) {
+		rxd = &rxr[rx_id];
+		if (GVE_SEQNO(rxd->flags_seq) != rxq->expected_seqno)
+			break;
+
+		if (rxd->flags_seq & GVE_RXF_ERR)
+			continue;
+
+		len = rte_be_to_cpu_16(rxd->len) - GVE_RX_PAD;
+		rxe = rxq->sw_ring[rx_id];
+		if (rxq->is_gqi_qpl) {
+			addr = (uint64_t)(rxq->qpl->mz->addr) + rx_id * PAGE_SIZE + GVE_RX_PAD;
+			rte_memcpy((void *)((size_t)rxe->buf_addr + rxe->data_off),
+				   (void *)(size_t)addr, len);
+		}
+		rxe->pkt_len = len;
+		rxe->data_len = len;
+		rxe->port = rxq->port_id;
+		rxe->ol_flags = 0;
+
+		if (rxd->flags_seq & GVE_RXF_TCP)
+			rxe->packet_type |= RTE_PTYPE_L4_TCP;
+		if (rxd->flags_seq & GVE_RXF_UDP)
+			rxe->packet_type |= RTE_PTYPE_L4_UDP;
+		if (rxd->flags_seq & GVE_RXF_IPV4)
+			rxe->packet_type |= RTE_PTYPE_L3_IPV4;
+		if (rxd->flags_seq & GVE_RXF_IPV6)
+			rxe->packet_type |= RTE_PTYPE_L3_IPV6;
+
+		if (gve_needs_rss(rxd->flags_seq)) {
+			rxe->ol_flags |= RTE_MBUF_F_RX_RSS_HASH;
+			rxe->hash.rss = rte_be_to_cpu_32(rxd->rss_hash);
+		}
+
+		rxq->expected_seqno = gve_next_seqno(rxq->expected_seqno);
+
+		rx_id++;
+		if (rx_id == rxq->nb_rx_desc)
+			rx_id = 0;
+
+		rx_pkts[nb_rx] = rxe;
+		nb_rx++;
+	}
+
+	rxq->nb_avail += nb_rx;
+	rxq->rx_tail = rx_id;
+
+	if (rxq->nb_avail > rxq->free_thresh)
+		gve_rx_refill(rxq);
+
+	return nb_rx;
+}
+
 static inline void
 gve_reset_rxq(struct gve_rx_queue *rxq)
 {
diff --git a/drivers/net/gve/gve_tx.c b/drivers/net/gve/gve_tx.c
index b706b62e71..cd0bdaa2ad 100644
--- a/drivers/net/gve/gve_tx.c
+++ b/drivers/net/gve/gve_tx.c
@@ -5,6 +5,461 @@
 #include "gve_ethdev.h"
 #include "base/gve_adminq.h"
 
+static inline void
+gve_free_bulk_mbuf(struct rte_mbuf **txep, int num)
+{
+	struct rte_mbuf *m, *free[GVE_TX_MAX_FREE_SZ];
+	int nb_free = 0;
+	int i, s;
+
+	if (unlikely(num == 0))
+		return;
+
+	/* Find the 1st mbuf which needs to be free */
+	for (s = 0; s < num; s++) {
+		if (txep[s] != NULL) {
+			m = rte_pktmbuf_prefree_seg(txep[s]);
+			if (m != NULL)
+				break;
+		}
+	}
+
+	if (s == num)
+		return;
+
+	free[0] = m;
+	nb_free = 1;
+	for (i = s + 1; i < num; i++) {
+		if (likely(txep[i] != NULL)) {
+			m = rte_pktmbuf_prefree_seg(txep[i]);
+			if (likely(m != NULL)) {
+				if (likely(m->pool == free[0]->pool)) {
+					free[nb_free++] = m;
+				} else {
+					rte_mempool_put_bulk(free[0]->pool, (void *)free, nb_free);
+					free[0] = m;
+					nb_free = 1;
+				}
+			}
+			txep[i] = NULL;
+		}
+	}
+	rte_mempool_put_bulk(free[0]->pool, (void **)free, nb_free);
+}
+
+static inline void
+gve_tx_clean(struct gve_tx_queue *txq)
+{
+	uint16_t mask = txq->nb_tx_desc - 1;
+	uint32_t start = txq->next_to_clean & mask;
+	uint32_t ntc, nb_clean, i;
+	struct gve_tx_iovec *iov;
+
+	ntc = rte_be_to_cpu_32(rte_read32(txq->qtx_head));
+	ntc = ntc & mask;
+
+	if (ntc == start)
+		return;
+
+	/* if wrap around, free twice. */
+	if (ntc < start) {
+		nb_clean = txq->nb_tx_desc - start;
+		if (nb_clean > GVE_TX_MAX_FREE_SZ)
+			nb_clean = GVE_TX_MAX_FREE_SZ;
+		if (txq->is_gqi_qpl) {
+			for (i = start; i < start + nb_clean; i++) {
+				iov = &txq->iov_ring[i];
+				txq->fifo_avail += iov->iov_len;
+				iov->iov_base = 0;
+				iov->iov_len = 0;
+			}
+		} else {
+			gve_free_bulk_mbuf(&txq->sw_ring[start], nb_clean);
+		}
+		txq->nb_free += nb_clean;
+		start += nb_clean;
+		if (start == txq->nb_tx_desc)
+			start = 0;
+		txq->next_to_clean += nb_clean;
+	}
+
+	if (ntc > start) {
+		nb_clean = ntc - start;
+		if (nb_clean > GVE_TX_MAX_FREE_SZ)
+			nb_clean = GVE_TX_MAX_FREE_SZ;
+		if (txq->is_gqi_qpl) {
+			for (i = start; i < start + nb_clean; i++) {
+				iov = &txq->iov_ring[i];
+				txq->fifo_avail += iov->iov_len;
+				iov->iov_base = 0;
+				iov->iov_len = 0;
+			}
+		} else {
+			gve_free_bulk_mbuf(&txq->sw_ring[start], nb_clean);
+		}
+		txq->nb_free += nb_clean;
+		txq->next_to_clean += nb_clean;
+	}
+}
+
+static inline void
+gve_tx_clean_swr_qpl(struct gve_tx_queue *txq)
+{
+	uint32_t start = txq->sw_ntc;
+	uint32_t ntc, nb_clean;
+
+	ntc = txq->sw_tail;
+
+	if (ntc == start)
+		return;
+
+	/* if wrap around, free twice. */
+	if (ntc < start) {
+		nb_clean = txq->nb_tx_desc - start;
+		if (nb_clean > GVE_TX_MAX_FREE_SZ)
+			nb_clean = GVE_TX_MAX_FREE_SZ;
+		gve_free_bulk_mbuf(&txq->sw_ring[start], nb_clean);
+
+		txq->sw_nb_free += nb_clean;
+		start += nb_clean;
+		if (start == txq->nb_tx_desc)
+			start = 0;
+		txq->sw_ntc = start;
+	}
+
+	if (ntc > start) {
+		nb_clean = ntc - start;
+		if (nb_clean > GVE_TX_MAX_FREE_SZ)
+			nb_clean = GVE_TX_MAX_FREE_SZ;
+		gve_free_bulk_mbuf(&txq->sw_ring[start], nb_clean);
+		txq->sw_nb_free += nb_clean;
+		start += nb_clean;
+		txq->sw_ntc = start;
+	}
+}
+
+static inline void
+gve_tx_fill_pkt_desc(volatile union gve_tx_desc *desc, struct rte_mbuf *mbuf,
+		     uint8_t desc_cnt, uint16_t len, uint64_t addr)
+{
+	uint64_t csum_l4 = mbuf->ol_flags & RTE_MBUF_F_TX_L4_MASK;
+	uint8_t l4_csum_offset = 0;
+	uint8_t l4_hdr_offset = 0;
+
+	if (mbuf->ol_flags & RTE_MBUF_F_TX_TCP_SEG)
+		csum_l4 |= RTE_MBUF_F_TX_TCP_CKSUM;
+
+	switch (csum_l4) {
+	case RTE_MBUF_F_TX_TCP_CKSUM:
+		l4_csum_offset = offsetof(struct rte_tcp_hdr, cksum);
+		l4_hdr_offset = mbuf->l2_len + mbuf->l3_len;
+		break;
+	case RTE_MBUF_F_TX_UDP_CKSUM:
+		l4_csum_offset = offsetof(struct rte_udp_hdr, dgram_cksum);
+		l4_hdr_offset = mbuf->l2_len + mbuf->l3_len;
+		break;
+	case RTE_MBUF_F_TX_SCTP_CKSUM:
+		l4_csum_offset = offsetof(struct rte_sctp_hdr, cksum);
+		l4_hdr_offset = mbuf->l2_len + mbuf->l3_len;
+		break;
+	}
+
+	if (mbuf->ol_flags & RTE_MBUF_F_TX_TCP_SEG) {
+		desc->pkt.type_flags = GVE_TXD_TSO | GVE_TXF_L4CSUM;
+		desc->pkt.l4_csum_offset = l4_csum_offset >> 1;
+		desc->pkt.l4_hdr_offset = l4_hdr_offset >> 1;
+	} else if (mbuf->ol_flags & RTE_MBUF_F_TX_L4_MASK) {
+		desc->pkt.type_flags = GVE_TXD_STD | GVE_TXF_L4CSUM;
+		desc->pkt.l4_csum_offset = l4_csum_offset >> 1;
+		desc->pkt.l4_hdr_offset = l4_hdr_offset >> 1;
+	} else {
+		desc->pkt.type_flags = GVE_TXD_STD;
+		desc->pkt.l4_csum_offset = 0;
+		desc->pkt.l4_hdr_offset = 0;
+	}
+	desc->pkt.desc_cnt = desc_cnt;
+	desc->pkt.len = rte_cpu_to_be_16(mbuf->pkt_len);
+	desc->pkt.seg_len = rte_cpu_to_be_16(len);
+	desc->pkt.seg_addr = rte_cpu_to_be_64(addr);
+}
+
+static inline void
+gve_tx_fill_seg_desc(volatile union gve_tx_desc *desc, uint64_t ol_flags,
+		      union gve_tx_offload tx_offload,
+		      uint16_t len, uint64_t addr)
+{
+	desc->seg.type_flags = GVE_TXD_SEG;
+	if (ol_flags & RTE_MBUF_F_TX_TCP_SEG) {
+		if (ol_flags & RTE_MBUF_F_TX_IPV6)
+			desc->seg.type_flags |= GVE_TXSF_IPV6;
+		desc->seg.l3_offset = tx_offload.l2_len >> 1;
+		desc->seg.mss = rte_cpu_to_be_16(tx_offload.tso_segsz);
+	}
+	desc->seg.seg_len = rte_cpu_to_be_16(len);
+	desc->seg.seg_addr = rte_cpu_to_be_64(addr);
+}
+
+static inline bool
+is_fifo_avail(struct gve_tx_queue *txq, uint16_t len)
+{
+	if (txq->fifo_avail < len)
+		return false;
+	/* Don't split segment. */
+	if (txq->fifo_head + len > txq->fifo_size &&
+	    txq->fifo_size - txq->fifo_head + len > txq->fifo_avail)
+		return false;
+	return true;
+}
+static inline uint64_t
+gve_tx_alloc_from_fifo(struct gve_tx_queue *txq, uint16_t tx_id, uint16_t len)
+{
+	uint32_t head = txq->fifo_head;
+	uint32_t size = txq->fifo_size;
+	struct gve_tx_iovec *iov;
+	uint32_t aligned_head;
+	uint32_t iov_len = 0;
+	uint64_t fifo_addr;
+
+	iov = &txq->iov_ring[tx_id];
+
+	/* Don't split segment */
+	if (head + len > size) {
+		iov_len += (size - head);
+		head = 0;
+	}
+
+	fifo_addr = head;
+	iov_len += len;
+	iov->iov_base = head;
+
+	/* Re-align to a cacheline for next head */
+	head += len;
+	aligned_head = RTE_ALIGN(head, RTE_CACHE_LINE_SIZE);
+	iov_len += (aligned_head - head);
+	iov->iov_len = iov_len;
+
+	if (aligned_head == txq->fifo_size)
+		aligned_head = 0;
+	txq->fifo_head = aligned_head;
+	txq->fifo_avail -= iov_len;
+
+	return fifo_addr;
+}
+
+static inline uint16_t
+gve_tx_burst_qpl(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
+{
+	union gve_tx_offload tx_offload = {0};
+	volatile union gve_tx_desc *txr, *txd;
+	struct gve_tx_queue *txq = tx_queue;
+	struct rte_mbuf **sw_ring = txq->sw_ring;
+	uint16_t mask = txq->nb_tx_desc - 1;
+	uint16_t tx_id = txq->tx_tail & mask;
+	uint64_t ol_flags, addr, fifo_addr;
+	uint32_t tx_tail = txq->tx_tail;
+	struct rte_mbuf *tx_pkt, *first;
+	uint16_t sw_id = txq->sw_tail;
+	uint16_t nb_used, i;
+	uint16_t nb_tx = 0;
+	uint32_t hlen;
+
+	txr = txq->tx_desc_ring;
+
+	if (txq->nb_free < txq->free_thresh || txq->fifo_avail == 0)
+		gve_tx_clean(txq);
+
+	if (txq->sw_nb_free < txq->free_thresh)
+		gve_tx_clean_swr_qpl(txq);
+
+	for (nb_tx = 0; nb_tx < nb_pkts; nb_tx++) {
+		tx_pkt = *tx_pkts++;
+		ol_flags = tx_pkt->ol_flags;
+
+		if (txq->sw_nb_free < tx_pkt->nb_segs) {
+			gve_tx_clean_swr_qpl(txq);
+			if (txq->sw_nb_free < tx_pkt->nb_segs)
+				goto end_of_tx;
+		}
+
+		/* Even for multi-segs, use 1 qpl buf for data */
+		nb_used = 1;
+		if (ol_flags & RTE_MBUF_F_TX_TCP_SEG)
+			nb_used++;
+
+		if (txq->nb_free < nb_used)
+			goto end_of_tx;
+
+		tx_offload.l2_len = tx_pkt->l2_len;
+		tx_offload.l3_len = tx_pkt->l3_len;
+		tx_offload.l4_len = tx_pkt->l4_len;
+		tx_offload.tso_segsz = tx_pkt->tso_segsz;
+
+		first = tx_pkt;
+		txd = &txr[tx_id];
+		hlen = ol_flags & RTE_MBUF_F_TX_TCP_SEG ?
+			(uint32_t)(tx_offload.l2_len + tx_offload.l3_len + tx_offload.l4_len) :
+			tx_pkt->pkt_len;
+
+		sw_ring[sw_id] = tx_pkt;
+		if (!is_fifo_avail(txq, hlen)) {
+			gve_tx_clean(txq);
+			if (!is_fifo_avail(txq, hlen))
+				goto end_of_tx;
+		}
+		addr = (uint64_t)(tx_pkt->buf_addr) + tx_pkt->data_off;
+		fifo_addr = gve_tx_alloc_from_fifo(txq, tx_id, hlen);
+
+		/* For TSO, check if there's enough fifo space for data first */
+		if (ol_flags & RTE_MBUF_F_TX_TCP_SEG) {
+			if (!is_fifo_avail(txq, tx_pkt->pkt_len - hlen)) {
+				gve_tx_clean(txq);
+				if (!is_fifo_avail(txq, tx_pkt->pkt_len - hlen))
+					goto end_of_tx;
+			}
+		}
+		if (tx_pkt->nb_segs == 1 || ol_flags & RTE_MBUF_F_TX_TCP_SEG)
+			rte_memcpy((void *)(size_t)(fifo_addr + txq->fifo_base),
+				   (void *)(size_t)addr, hlen);
+		else
+			rte_pktmbuf_read(tx_pkt, 0, hlen,
+					 (void *)(size_t)(fifo_addr + txq->fifo_base));
+		gve_tx_fill_pkt_desc(txd, tx_pkt, nb_used, hlen, fifo_addr);
+
+		if (ol_flags & RTE_MBUF_F_TX_TCP_SEG) {
+			tx_id = (tx_id + 1) & mask;
+			txd = &txr[tx_id];
+			addr = (uint64_t)(tx_pkt->buf_addr) + tx_pkt->data_off + hlen;
+			fifo_addr = gve_tx_alloc_from_fifo(txq, tx_id, tx_pkt->pkt_len - hlen);
+			if (tx_pkt->nb_segs == 1)
+				rte_memcpy((void *)(size_t)(fifo_addr + txq->fifo_base),
+					   (void *)(size_t)addr,
+					   tx_pkt->pkt_len - hlen);
+			else
+				rte_pktmbuf_read(tx_pkt, hlen, tx_pkt->pkt_len - hlen,
+						 (void *)(size_t)(fifo_addr + txq->fifo_base));
+
+			gve_tx_fill_seg_desc(txd, ol_flags, tx_offload,
+					     tx_pkt->pkt_len - hlen, fifo_addr);
+		}
+
+		/* record mbuf in sw_ring for free */
+		for (i = 1; i < first->nb_segs; i++) {
+			sw_id = (sw_id + 1) & mask;
+			tx_pkt = tx_pkt->next;
+			sw_ring[sw_id] = tx_pkt;
+		}
+
+		sw_id = (sw_id + 1) & mask;
+		tx_id = (tx_id + 1) & mask;
+
+		txq->nb_free -= nb_used;
+		txq->sw_nb_free -= first->nb_segs;
+		tx_tail += nb_used;
+	}
+
+end_of_tx:
+	if (nb_tx) {
+		rte_write32(rte_cpu_to_be_32(tx_tail), txq->qtx_tail);
+		txq->tx_tail = tx_tail;
+		txq->sw_tail = sw_id;
+	}
+
+	return nb_tx;
+}
+
+static inline uint16_t
+gve_tx_burst_ra(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
+{
+	union gve_tx_offload tx_offload = {0};
+	volatile union gve_tx_desc *txr, *txd;
+	struct gve_tx_queue *txq = tx_queue;
+	struct rte_mbuf **sw_ring = txq->sw_ring;
+	uint16_t mask = txq->nb_tx_desc - 1;
+	uint16_t tx_id = txq->tx_tail & mask;
+	uint32_t tx_tail = txq->tx_tail;
+	struct rte_mbuf *tx_pkt, *first;
+	uint16_t nb_used, hlen, i;
+	uint64_t ol_flags, addr;
+	uint16_t nb_tx = 0;
+
+	txr = txq->tx_desc_ring;
+
+	if (txq->nb_free < txq->free_thresh)
+		gve_tx_clean(txq);
+
+	for (nb_tx = 0; nb_tx < nb_pkts; nb_tx++) {
+		tx_pkt = *tx_pkts++;
+		ol_flags = tx_pkt->ol_flags;
+
+		nb_used = tx_pkt->nb_segs;
+		if (ol_flags & RTE_MBUF_F_TX_TCP_SEG)
+			nb_used++;
+
+		if (txq->nb_free < nb_used)
+			goto end_of_tx;
+
+		tx_offload.l2_len = tx_pkt->l2_len;
+		tx_offload.l3_len = tx_pkt->l3_len;
+		tx_offload.l4_len = tx_pkt->l4_len;
+		tx_offload.tso_segsz = tx_pkt->tso_segsz;
+
+		first = tx_pkt;
+		txd = &txr[tx_id];
+
+		hlen = ol_flags & RTE_MBUF_F_TX_TCP_SEG ?
+			(uint32_t)(tx_offload.l2_len + tx_offload.l3_len + tx_offload.l4_len) :
+			tx_pkt->pkt_len;
+		/*
+		 * if tso, the driver needs to fill 2 descs for 1 mbuf
+		 * so only put this mbuf into the 1st tx entry in sw ring
+		 */
+		sw_ring[tx_id] = tx_pkt;
+		addr = rte_mbuf_data_iova(tx_pkt);
+		gve_tx_fill_pkt_desc(txd, tx_pkt, nb_used, hlen, addr);
+
+		if (ol_flags & RTE_MBUF_F_TX_TCP_SEG) {
+			tx_id = (tx_id + 1) & mask;
+			txd = &txr[tx_id];
+			addr = rte_mbuf_data_iova(tx_pkt) + hlen;
+			gve_tx_fill_seg_desc(txd, ol_flags, tx_offload,
+					     tx_pkt->data_len - hlen, addr);
+		}
+
+		for (i = 1; i < first->nb_segs; i++) {
+			tx_id = (tx_id + 1) & mask;
+			txd = &txr[tx_id];
+			tx_pkt = tx_pkt->next;
+			sw_ring[tx_id] = tx_pkt;
+			addr = rte_mbuf_data_iova(tx_pkt);
+			gve_tx_fill_seg_desc(txd, ol_flags, tx_offload,
+					     tx_pkt->data_len, addr);
+		}
+		tx_id = (tx_id + 1) & mask;
+
+		txq->nb_free -= nb_used;
+		tx_tail += nb_used;
+	}
+
+end_of_tx:
+	if (nb_tx) {
+		rte_write32(rte_cpu_to_be_32(tx_tail), txq->qtx_tail);
+		txq->tx_tail = tx_tail;
+	}
+
+	return nb_tx;
+}
+
+uint16_t
+gve_tx_burst(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
+{
+	struct gve_tx_queue *txq = tx_queue;
+
+	if (txq->is_gqi_qpl)
+		return gve_tx_burst_qpl(tx_queue, tx_pkts, nb_pkts);
+
+	return gve_tx_burst_ra(tx_queue, tx_pkts, nb_pkts);
+}
+
 static inline void
 gve_reset_txq(struct gve_tx_queue *txq)
 {
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 192+ messages in thread

* RE: [PATCH v7 8/8] net/gve: add support for Rx/Tx
  2022-10-24 13:25                               ` Guo, Junfeng
@ 2022-10-25  9:07                                 ` Guo, Junfeng
  0 siblings, 0 replies; 192+ messages in thread
From: Guo, Junfeng @ 2022-10-25  9:07 UTC (permalink / raw)
  To: Ferruh Yigit, Zhang, Qi Z, Wu, Jingjing, Xing, Beilei
  Cc: dev, Li, Xiaoyun, awogbemila, Richardson, Bruce, hemant.agrawal,
	stephen, Xia, Chenbo, Zhang, Helin



> -----Original Message-----
> From: Guo, Junfeng
> Sent: Monday, October 24, 2022 21:25
> To: Ferruh Yigit <ferruh.yigit@amd.com>; Zhang, Qi Z
> <qi.z.zhang@intel.com>; Wu, Jingjing <jingjing.wu@intel.com>; Xing,
> Beilei <beilei.xing@intel.com>
> Cc: dev@dpdk.org; Li, Xiaoyun <xiaoyun.li@intel.com>;
> awogbemila@google.com; Richardson, Bruce
> <bruce.richardson@intel.com>; hemant.agrawal@nxp.com;
> stephen@networkplumber.org; Xia, Chenbo <Chenbo.Xia@intel.com>;
> Zhang, Helin <helin.zhang@intel.com>
> Subject: RE: [PATCH v7 8/8] net/gve: add support for Rx/Tx
> 
> 
> 
> > -----Original Message-----
> > From: Ferruh Yigit <ferruh.yigit@amd.com>
> > Sent: Monday, October 24, 2022 18:50
> > To: Guo, Junfeng <junfeng.guo@intel.com>; Zhang, Qi Z
> > <qi.z.zhang@intel.com>; Wu, Jingjing <jingjing.wu@intel.com>; Xing,
> > Beilei <beilei.xing@intel.com>
> > Cc: dev@dpdk.org; Li, Xiaoyun <xiaoyun.li@intel.com>;
> > awogbemila@google.com; Richardson, Bruce
> > <bruce.richardson@intel.com>; hemant.agrawal@nxp.com;
> > stephen@networkplumber.org; Xia, Chenbo <chenbo.xia@intel.com>;
> > Zhang, Helin <helin.zhang@intel.com>
> > Subject: Re: [PATCH v7 8/8] net/gve: add support for Rx/Tx
> >
> > On 10/24/2022 6:04 AM, Guo, Junfeng wrote:
> > >
> > >
> > >> -----Original Message-----
> > >> From: Ferruh Yigit <ferruh.yigit@amd.com>
> > >> Sent: Friday, October 21, 2022 17:52
> > >> To: Guo, Junfeng <junfeng.guo@intel.com>; Zhang, Qi Z
> > >> <qi.z.zhang@intel.com>; Wu, Jingjing <jingjing.wu@intel.com>;
> > >> ferruh.yigit@xilinx.com; Xing, Beilei <beilei.xing@intel.com>
> > >> Cc: dev@dpdk.org; Li, Xiaoyun <xiaoyun.li@intel.com>;
> > >> awogbemila@google.com; Richardson, Bruce
> > >> <bruce.richardson@intel.com>; hemant.agrawal@nxp.com;
> > >> stephen@networkplumber.org; Xia, Chenbo
> <chenbo.xia@intel.com>;
> > >> Zhang, Helin <helin.zhang@intel.com>
> > >> Subject: Re: [PATCH v7 8/8] net/gve: add support for Rx/Tx
> > >>
> > >> On 10/21/2022 10:19 AM, Junfeng Guo wrote:
> > >>
> > >>>
> > >>> Add Rx/Tx of GQI_QPL queue format and GQI_RDA queue format.
> > >>>
> > >>> Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com>
> > >>> Signed-off-by: Junfeng Guo <junfeng.guo@intel.com>
> > >>
> > >> <...>
> > >>
> > >>> +
> > >>> +static inline void
> > >>> +gve_tx_clean_swr_qpl(struct gve_tx_queue *txq)
> > >>> +{
> > >>> +       uint32_t start = txq->sw_ntc;
> > >>> +       uint32_t ntc, nb_clean;
> > >>> +
> > >>> +       ntc = txq->sw_tail;
> > >>> +
> > >>> +       if (ntc == start)
> > >>> +               return;
> > >>> +
> > >>> +       /* if wrap around, free twice. */
> > >>> +       if (ntc < start) {
> > >>> +               nb_clean = txq->nb_tx_desc - start;
> > >>> +               if (nb_clean > GVE_TX_MAX_FREE_SZ)
> > >>> +                       nb_clean = GVE_TX_MAX_FREE_SZ;
> > >>> +               gve_free_bulk_mbuf(&txq->sw_ring[start], nb_clean);
> > >>> +
> > >>> +               txq->sw_nb_free += nb_clean;
> > >>> +               start += nb_clean;
> > >>> +               if (start == txq->nb_tx_desc)
> > >>> +                       start = 0;
> > >>> +               txq->sw_ntc = start;
> > >>> +       }
> > >>> +
> > >>> +       if (ntc > start) {
> > >>> +               nb_clean = ntc - start;
> > >>> +               if (nb_clean > GVE_TX_MAX_FREE_SZ)
> > >>> +                       nb_clean = GVE_TX_MAX_FREE_SZ;
> > >>> +               gve_free_bulk_mbuf(&txq->sw_ring[start], nb_clean);
> > >>> +               txq->sw_nb_free += nb_clean;
> > >>> +               start += nb_clean;
> > >>> +               txq->sw_ntc = start;
> > >>> +       }
> > >>> +}
> > >>
> > >> [copy/paste from previous version]
> > >>
> > >> may be can drop the 'if' block, since "ntc == start" and "ntc < start"
> > >> cases already covered.
> > >
> > > Yes, this 'if' block is dropped in v7 as suggested. Thanks!
> > >
> >
> > This is v7 and please check above code which has the mentioned 'if'
> > block exist, do you mean in coming v8?
> 
> Oh, sorry about this! There is another function with the same issue.
> I just updated the code there and forgot this. Will update this and
> check for the rest in the coming version. Thanks!

Sorry again about this part!!

After running the code and double-checking it, it turns out the 'if'
blocks here cannot simply be removed. In the first 'if' block the
variable 'start' is not fixed: it advances as entries are cleaned, so
the second 'if' block can still apply afterwards and is needed for
those cases.

The queue page list ring is a bit tricky in that it can wrap around
(to improve utilization/performance). On wrap-around the entries have
to be freed in two passes, so the two 'if' blocks here run one after
the other.

I'll restore these modifications in the coming version. Thanks! 
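
For reference, a small standalone sketch of the wrap-around pattern being
discussed; the ring, indices and free_range() helper are hypothetical and
only illustrate why both 'if' blocks are needed:

/* Hypothetical sketch: when the region to clean wraps past the end of
 * the ring, it has to be freed in two passes; the first pass advances
 * 'start', which is why the second check can still apply afterwards.
 */
static void
clean_ring_sketch(uint32_t *start, uint32_t ntc, uint32_t ring_size)
{
	uint32_t nb_clean;

	if (ntc == *start)
		return;

	/* pass 1: wrap-around, clean from 'start' up to the end of the ring */
	if (ntc < *start) {
		nb_clean = ring_size - *start;
		/* free_range(*start, nb_clean); -- placeholder for the real free */
		*start += nb_clean;
		if (*start == ring_size)
			*start = 0;
	}

	/* pass 2: clean from the (possibly reset) 'start' up to 'ntc' */
	if (ntc > *start) {
		nb_clean = ntc - *start;
		/* free_range(*start, nb_clean); */
		*start += nb_clean;
	}
}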

> 
> >
> > >>
> > >> <...>
> > >>
> > >>> +uint16_t
> > >>> +gve_tx_burst(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t
> > >> nb_pkts)
> > >>> +{
> > >>> +       struct gve_tx_queue *txq = tx_queue;
> > >>> +
> > >>> +       if (txq->is_gqi_qpl)
> > >>> +               return gve_tx_burst_qpl(tx_queue, tx_pkts, nb_pkts);
> > >>> +
> > >>> +       return gve_tx_burst_ra(tx_queue, tx_pkts, nb_pkts);
> > >>> +}
> > >>> +
> > >>
> > >> [copy/paste from previous version]
> > >>
> > >> Can there be mix of queue types?
> > >> If only one queue type is supported in specific config, perhaps burst
> > >> function can be set during configuration, to prevent if check on
> > datapath.
> > >>
> > >> This is optimization and can be done later, it doesn't have to be in the
> > >> set.
> > >
> > > The exact queue type can be fetched from the backend via adminq
> > > in priv->queue_format. So there won't be mix of the queue types.
> > > Currently, only GQI_QPL and GQI_RDA queue format are supported
> > > in PMD. Also, only GQI_QPL queue format is in use on GCP since
> > > GQI_RDA hasn't been released in production.
> > > This part code will be optimized/refactored later when involving
> > > the queue type DQO_RDA. Thanks!
> > >


^ permalink raw reply	[flat|nested] 192+ messages in thread

* Re: [PATCH v8 0/8] introduce GVE PMD
  2022-10-25  9:07                         ` [PATCH v8 0/8] introduce GVE PMD Junfeng Guo
                                             ` (7 preceding siblings ...)
  2022-10-25  9:07                           ` [PATCH v8 8/8] net/gve: add support for Rx/Tx Junfeng Guo
@ 2022-10-25 12:33                           ` Ferruh Yigit
  2022-10-26  2:05                             ` Guo, Junfeng
  8 siblings, 1 reply; 192+ messages in thread
From: Ferruh Yigit @ 2022-10-25 12:33 UTC (permalink / raw)
  To: Junfeng Guo, qi.z.zhang, jingjing.wu, beilei.xing
  Cc: dev, xiaoyun.li, awogbemila, bruce.richardson, hemant.agrawal,
	stephen, chenbo.xia, helin.zhang

On 10/25/2022 10:07 AM, Junfeng Guo wrote:
> Introduce a new PMD for Google Virtual Ethernet (GVE).
> 
> gve (or gVNIC) is the standard virtual ethernet interface on Google Cloud
> Platform (GCP), which is one of the multiple virtual interfaces from those
> leading CSP customers in the world.
> 
> Having a well maintained/optimized gve PMD on DPDK community can help those
> cloud instance consumers with better experience of performance, maintenance
> who wants to run their own VNFs on GCP.
> 
> Please refer tohttps://cloud.google.com/compute/docs/networking/using-gvnic
> for the device description.
> 
> This patch set requires an exception for MIT license for GVE base code.
> And the base code includes the following files:
>   - gve_adminq.c
>   - gve_adminq.h
>   - gve_desc.h
>   - gve_desc_dqo.h
>   - gve_register.h
> 
> It's based on GVE kernel driver v1.3.0 and the original code is in
> https://github.com/GoogleCloudPlatform/compute-virtual-ethernet-linux/tree/v1.3.0
> 
> 
> v2:
> fix some CI check error.
> 
> v3:
> refactor some code and fix some build error.
> 
> v4:
> move the Google base code files into DPDK base folder.
> 
> v5:
> reorder commit sequence and drop the stats feature.
> 
> v6:
> improve the code.
> 
> v7:
> - remove Intel copyright for the google base files.
> 
> v8:
> - replace ETIME with ETIMEDOUT to pass the build check.
> - use RTE_ETHER_ADDR_PRT_FMT/_ADDR_BYTES to get rid of 'mac' variable.
> - add limitations in doc for current limited RSS and MTU.
> 
> 
> Junfeng Guo (8):
>    net/gve/base: introduce base code
>    net/gve/base: add OS specific implementation
>    net/gve: add support for device initialization
>    net/gve: add support for link update
>    net/gve: add support for MTU setting
>    net/gve: add support for dev info get and dev configure
>    net/gve: add support for queue operations
>    net/gve: add support for Rx/Tx

Series applied to dpdk-next-net/main, thanks.

Please send a web page to document the new PMD:
https://core.dpdk.org/supported/#nics

^ permalink raw reply	[flat|nested] 192+ messages in thread

* Re: [PATCH v8 5/8] net/gve: add support for MTU setting
  2022-10-25  9:07                           ` [PATCH v8 5/8] net/gve: add support for MTU setting Junfeng Guo
@ 2022-10-25 15:55                             ` Stephen Hemminger
  2022-10-26  2:15                               ` Guo, Junfeng
  0 siblings, 1 reply; 192+ messages in thread
From: Stephen Hemminger @ 2022-10-25 15:55 UTC (permalink / raw)
  To: Junfeng Guo
  Cc: qi.z.zhang, jingjing.wu, ferruh.yigit, beilei.xing, dev,
	xiaoyun.li, awogbemila, bruce.richardson, hemant.agrawal,
	chenbo.xia, helin.zhang

On Tue, 25 Oct 2022 17:07:26 +0800
Junfeng Guo <junfeng.guo@intel.com> wrote:

> +static int
> +gve_dev_mtu_set(struct rte_eth_dev *dev, uint16_t mtu)
> +{
> +	struct gve_priv *priv = dev->data->dev_private;
> +	int err;
> +
> +	if (mtu < RTE_ETHER_MIN_MTU || mtu > priv->max_mtu) {
> +		PMD_DRV_LOG(ERR, "MIN MTU is %u, MAX MTU is %u",
> +			    RTE_ETHER_MIN_MTU, priv->max_mtu);
> +		return -EINVAL;
> +	}

This check should not be necessary.
rte_eth_dev_set_mtu() already queries the device for its min/max MTU
and then calls eth_dev_validate_mtu() to check that the requested MTU
is in range before the driver callback is invoked.
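
For context, a simplified sketch (not the literal rte_ethdev code) of the
check the ethdev layer already performs before the PMD's mtu_set callback
is reached:

#include <errno.h>
#include <rte_ethdev.h>

/* Simplified illustration of the ethdev-level validation; the internal
 * helper is eth_dev_validate_mtu(), shown here in equivalent open-coded
 * form.
 */
static int
validate_mtu_sketch(uint16_t port_id, uint16_t mtu)
{
	struct rte_eth_dev_info dev_info;
	int ret;

	ret = rte_eth_dev_info_get(port_id, &dev_info);
	if (ret != 0)
		return ret;

	if (mtu < dev_info.min_mtu || mtu > dev_info.max_mtu)
		return -EINVAL;

	/* only after this does rte_eth_dev_set_mtu() call the driver op */
	return 0;
}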

^ permalink raw reply	[flat|nested] 192+ messages in thread

* RE: [PATCH v8 0/8] introduce GVE PMD
  2022-10-25 12:33                           ` [PATCH v8 0/8] introduce GVE PMD Ferruh Yigit
@ 2022-10-26  2:05                             ` Guo, Junfeng
  0 siblings, 0 replies; 192+ messages in thread
From: Guo, Junfeng @ 2022-10-26  2:05 UTC (permalink / raw)
  To: Ferruh Yigit, Zhang, Qi Z, Wu, Jingjing, Xing, Beilei
  Cc: dev, Li, Xiaoyun, awogbemila, Richardson, Bruce, hemant.agrawal,
	stephen, Xia, Chenbo, Zhang, Helin



> -----Original Message-----
> From: Ferruh Yigit <ferruh.yigit@amd.com>
> Sent: Tuesday, October 25, 2022 20:33
> To: Guo, Junfeng <junfeng.guo@intel.com>; Zhang, Qi Z
> <qi.z.zhang@intel.com>; Wu, Jingjing <jingjing.wu@intel.com>; Xing,
> Beilei <beilei.xing@intel.com>
> Cc: dev@dpdk.org; Li, Xiaoyun <xiaoyun.li@intel.com>;
> awogbemila@google.com; Richardson, Bruce
> <bruce.richardson@intel.com>; hemant.agrawal@nxp.com;
> stephen@networkplumber.org; Xia, Chenbo <chenbo.xia@intel.com>;
> Zhang, Helin <helin.zhang@intel.com>
> Subject: Re: [PATCH v8 0/8] introduce GVE PMD
> 
> On 10/25/2022 10:07 AM, Junfeng Guo wrote:
> > Introduce a new PMD for Google Virtual Ethernet (GVE).
> >
> > gve (or gVNIC) is the standard virtual ethernet interface on Google
> Cloud
> > Platform (GCP), which is one of the multiple virtual interfaces from
> those
> > leading CSP customers in the world.
> >
> > Having a well maintained/optimized gve PMD on DPDK community can
> help those
> > cloud instance consumers with better experience of performance,
> maintenance
> > who wants to run their own VNFs on GCP.
> >
> > Please refer
> tohttps://cloud.google.com/compute/docs/networking/using-gvnic
> > for the device description.
> >
> > This patch set requires an exception for MIT license for GVE base code.
> > And the base code includes the following files:
> >   - gve_adminq.c
> >   - gve_adminq.h
> >   - gve_desc.h
> >   - gve_desc_dqo.h
> >   - gve_register.h
> >
> > It's based on GVE kernel driver v1.3.0 and the original code is in
> > https://github.com/GoogleCloudPlatform/compute-virtual-ethernet-linux/tree/v1.3.0
> >
> >
> > v2:
> > fix some CI check errors.
> >
> > v3:
> > refactor some code and fix some build errors.
> >
> > v4:
> > move the Google base code files into DPDK base folder.
> >
> > v5:
> > reorder commit sequence and drop the stats feature.
> >
> > v6:
> > improve the code.
> >
> > v7:
> > - remove the Intel copyright from the Google base files.
> >
> > v8:
> > - replace ETIME with ETIMEDOUT to pass the build check.
> > - use RTE_ETHER_ADDR_PRT_FMT/_ADDR_BYTES to get rid of 'mac' variable.
> > - add limitations in doc for current limited RSS and MTU.
> >
> >
> > Junfeng Guo (8):
> >    net/gve/base: introduce base code
> >    net/gve/base: add OS specific implementation
> >    net/gve: add support for device initialization
> >    net/gve: add support for link update
> >    net/gve: add support for MTU setting
> >    net/gve: add support for dev info get and dev configure
> >    net/gve: add support for queue operations
> >    net/gve: add support for Rx/Tx
> 
> Series applied to dpdk-next-net/main, thanks.
> 
> Please send an update for the web page to document the new PMD:
> https://core.dpdk.org/supported/#nics

Sure, thanks for the reminder!

^ permalink raw reply	[flat|nested] 192+ messages in thread

* RE: [PATCH v8 5/8] net/gve: add support for MTU setting
  2022-10-25 15:55                             ` Stephen Hemminger
@ 2022-10-26  2:15                               ` Guo, Junfeng
  0 siblings, 0 replies; 192+ messages in thread
From: Guo, Junfeng @ 2022-10-26  2:15 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Zhang, Qi Z, Wu, Jingjing, ferruh.yigit, Xing, Beilei, dev, Li,
	Xiaoyun, awogbemila, Richardson, Bruce, hemant.agrawal, Xia,
	Chenbo, Zhang, Helin



> -----Original Message-----
> From: Stephen Hemminger <stephen@networkplumber.org>
> Sent: Tuesday, October 25, 2022 23:55
> To: Guo, Junfeng <junfeng.guo@intel.com>
> Cc: Zhang, Qi Z <qi.z.zhang@intel.com>; Wu, Jingjing
> <jingjing.wu@intel.com>; ferruh.yigit@xilinx.com; Xing, Beilei
> <beilei.xing@intel.com>; dev@dpdk.org; Li, Xiaoyun
> <xiaoyun.li@intel.com>; awogbemila@google.com; Richardson, Bruce
> <bruce.richardson@intel.com>; hemant.agrawal@nxp.com; Xia, Chenbo
> <chenbo.xia@intel.com>; Zhang, Helin <helin.zhang@intel.com>
> Subject: Re: [PATCH v8 5/8] net/gve: add support for MTU setting
> 
> On Tue, 25 Oct 2022 17:07:26 +0800
> Junfeng Guo <junfeng.guo@intel.com> wrote:
> 
> > +static int
> > +gve_dev_mtu_set(struct rte_eth_dev *dev, uint16_t mtu)
> > +{
> > +	struct gve_priv *priv = dev->data->dev_private;
> > +	int err;
> > +
> > +	if (mtu < RTE_ETHER_MIN_MTU || mtu > priv->max_mtu) {
> > +		PMD_DRV_LOG(ERR, "MIN MTU is %u, MAX MTU is %u",
> > +			    RTE_ETHER_MIN_MTU, priv->max_mtu);
> > +		return -EINVAL;
> > +	}
> 
> This check should not be necessary.
> rte_eth_dev_set_mtu() queries the device for its min/max MTU and
> then calls eth_dev_validate_mtu() to check that the requested MTU
> is within range, before the driver callback is invoked.

Thanks for the comment. Yes, this check is redundant with
eth_dev_validate_mtu(), which already performs the same validation.
It would be better to remove it in a follow-up bugfix. Thanks!
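
For the record, a minimal sketch of what that follow-up fix could look
like, assuming the remainder of the callback simply forwards the new MTU
to the device through the admin queue (gve_adminq_set_mtu() stands in
here for the unquoted part of the function and is not a claim about the
exact patch contents):

static int
gve_dev_mtu_set(struct rte_eth_dev *dev, uint16_t mtu)
{
	struct gve_priv *priv = dev->data->dev_private;
	int err;

	/* No explicit range check: rte_eth_dev_set_mtu() has already
	 * validated mtu against the min/max reported via dev_infos_get().
	 */
	err = gve_adminq_set_mtu(priv, mtu);
	if (err) {
		PMD_DRV_LOG(ERR, "Failed to set MTU to %u, err = %d",
			    mtu, err);
		return err;
	}

	return 0;
}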

^ permalink raw reply	[flat|nested] 192+ messages in thread

end of thread, other threads:[~2022-10-26  2:15 UTC | newest]

Thread overview: 192+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-07-29 19:30 [PATCH 00/10] introduce GVE PMD Xiaoyun Li
2022-07-29 19:30 ` [PATCH 01/10] net/gve: introduce GVE PMD base code Xiaoyun Li
2022-07-29 22:42   ` Stephen Hemminger
2022-07-29 22:45     ` Stephen Hemminger
2022-08-23  8:44       ` Guo, Junfeng
2022-08-29  8:41   ` [PATCH v2 00/10] introduce GVE PMD Junfeng Guo
2022-08-29  8:41     ` [PATCH v2 01/10] net/gve: introduce GVE PMD base code Junfeng Guo
2022-09-01 17:19       ` Ferruh Yigit
2022-09-01 18:23         ` Stephen Hemminger
2022-09-01 20:49           ` Thomas Monjalon
2022-09-06  9:31             ` Guo, Junfeng
2022-09-14 10:38               ` Thomas Monjalon
2022-08-29  8:41     ` [PATCH v2 02/10] net/gve: add logs and OS specific implementation Junfeng Guo
2022-09-01 17:20       ` Ferruh Yigit
2022-09-07  6:58         ` Guo, Junfeng
2022-09-07 11:16           ` Ferruh Yigit
2022-09-08  8:09             ` Guo, Junfeng
2022-08-29  8:41     ` [PATCH v2 03/10] net/gve: support device initialization Junfeng Guo
2022-09-01 17:21       ` Ferruh Yigit
2022-09-23  9:38         ` Guo, Junfeng
2022-09-01 17:22       ` Ferruh Yigit
2022-08-29  8:41     ` [PATCH v2 04/10] net/gve: add link update support Junfeng Guo
2022-09-01 17:23       ` Ferruh Yigit
2022-09-23  9:38         ` Guo, Junfeng
2022-08-29  8:41     ` [PATCH v2 05/10] net/gve: add MTU set support Junfeng Guo
2022-08-29  8:41     ` [PATCH v2 06/10] net/gve: add queue operations Junfeng Guo
2022-08-29  8:41     ` [PATCH v2 07/10] net/gve: add Rx/Tx support Junfeng Guo
2022-08-29  8:41     ` [PATCH v2 08/10] net/gve: add support to get dev info and configure dev Junfeng Guo
2022-09-01 17:23       ` Ferruh Yigit
2022-09-23  9:38         ` Guo, Junfeng
2022-08-29  8:41     ` [PATCH v2 09/10] net/gve: add stats support Junfeng Guo
2022-09-01 17:24       ` Ferruh Yigit
2022-09-23  9:38         ` Guo, Junfeng
2022-08-29  8:41     ` [PATCH v2 10/10] doc: update documentation Junfeng Guo
2022-09-01 17:20       ` Ferruh Yigit
2022-09-23  9:38       ` [PATCH v3 0/9] introduce GVE PMD Junfeng Guo
2022-09-23  9:38         ` [PATCH v3 1/9] net/gve: introduce GVE PMD base code Junfeng Guo
2022-09-23 18:57           ` Stephen Hemminger
2022-09-27  7:27             ` Guo, Junfeng
2022-09-23 18:58           ` Stephen Hemminger
2022-09-27  7:27             ` Guo, Junfeng
2022-09-27  7:32           ` [PATCH v4 0/9] introduce GVE PMD Junfeng Guo
2022-09-27  7:32             ` [PATCH v4 1/9] net/gve/base: introduce GVE PMD base code Junfeng Guo
2022-10-06 14:19               ` Ferruh Yigit
2022-10-09  9:14                 ` Guo, Junfeng
2022-10-10 10:17               ` [PATCH v5 0/8] introduce GVE PMD Junfeng Guo
2022-10-10 10:17                 ` [PATCH v5 1/8] net/gve/base: introduce GVE PMD base code Junfeng Guo
2022-10-19 13:45                   ` Ferruh Yigit
2022-10-19 15:13                     ` Hemant Agrawal
2022-10-19 15:18                       ` Ferruh Yigit
2022-10-20  3:33                         ` Hemant Agrawal
2022-10-19 15:48                     ` Li, Xiaoyun
2022-10-19 20:52                       ` Ferruh Yigit
2022-10-20  8:50                         ` Li, Xiaoyun
2022-10-20 10:36                   ` [PATCH v6 0/8] introduce GVE PMD Junfeng Guo
2022-10-20 10:36                     ` [PATCH v6 1/8] net/gve/base: introduce GVE PMD base code Junfeng Guo
2022-10-20 14:39                       ` Ferruh Yigit
2022-10-24  2:10                         ` Guo, Junfeng
2022-10-20 14:40                       ` Ferruh Yigit
2022-10-24  2:10                         ` Guo, Junfeng
2022-10-20 10:36                     ` [PATCH v6 2/8] net/gve/base: add OS specific implementation Junfeng Guo
2022-10-20 10:36                     ` [PATCH v6 3/8] net/gve: add support for device initialization Junfeng Guo
2022-10-20 14:42                       ` Ferruh Yigit
2022-10-24  2:10                         ` Guo, Junfeng
2022-10-20 10:36                     ` [PATCH v6 4/8] net/gve: add support for link update Junfeng Guo
2022-10-20 10:36                     ` [PATCH v6 5/8] net/gve: add support for MTU setting Junfeng Guo
2022-10-20 14:45                       ` Ferruh Yigit
2022-10-24  2:10                         ` Guo, Junfeng
2022-10-20 10:36                     ` [PATCH v6 6/8] net/gve: add support for dev info get and dev configure Junfeng Guo
2022-10-20 14:45                       ` Ferruh Yigit
2022-10-24  2:10                         ` Guo, Junfeng
2022-10-20 10:36                     ` [PATCH v6 7/8] net/gve: add support for queue operations Junfeng Guo
2022-10-20 10:36                     ` [PATCH v6 8/8] net/gve: add support for Rx/Tx Junfeng Guo
2022-10-20 14:47                       ` Ferruh Yigit
2022-10-24  2:10                         ` Guo, Junfeng
2022-10-21  9:19                     ` [PATCH v7 0/8] introduce GVE PMD Junfeng Guo
2022-10-21  9:19                       ` [PATCH v7 1/8] net/gve/base: introduce base code Junfeng Guo
2022-10-21  9:49                         ` Ferruh Yigit
2022-10-24  5:04                           ` Guo, Junfeng
2022-10-24 10:47                             ` Ferruh Yigit
2022-10-24 13:23                               ` Guo, Junfeng
2022-10-24 10:50                         ` Ferruh Yigit
2022-10-24 13:26                           ` Guo, Junfeng
2022-10-25  9:07                         ` [PATCH v8 0/8] introduce GVE PMD Junfeng Guo
2022-10-25  9:07                           ` [PATCH v8 1/8] net/gve/base: introduce base code Junfeng Guo
2022-10-25  9:07                           ` [PATCH v8 2/8] net/gve/base: add OS specific implementation Junfeng Guo
2022-10-25  9:07                           ` [PATCH v8 3/8] net/gve: add support for device initialization Junfeng Guo
2022-10-25  9:07                           ` [PATCH v8 4/8] net/gve: add support for link update Junfeng Guo
2022-10-25  9:07                           ` [PATCH v8 5/8] net/gve: add support for MTU setting Junfeng Guo
2022-10-25 15:55                             ` Stephen Hemminger
2022-10-26  2:15                               ` Guo, Junfeng
2022-10-25  9:07                           ` [PATCH v8 6/8] net/gve: add support for dev info get and dev configure Junfeng Guo
2022-10-25  9:07                           ` [PATCH v8 7/8] net/gve: add support for queue operations Junfeng Guo
2022-10-25  9:07                           ` [PATCH v8 8/8] net/gve: add support for Rx/Tx Junfeng Guo
2022-10-25 12:33                           ` [PATCH v8 0/8] introduce GVE PMD Ferruh Yigit
2022-10-26  2:05                             ` Guo, Junfeng
2022-10-21  9:19                       ` [PATCH v7 2/8] net/gve/base: add OS specific implementation Junfeng Guo
2022-10-21  9:19                       ` [PATCH v7 3/8] net/gve: add support for device initialization Junfeng Guo
2022-10-21  9:49                         ` Ferruh Yigit
2022-10-24  5:04                           ` Guo, Junfeng
2022-10-24 10:47                             ` Ferruh Yigit
2022-10-24 13:22                               ` Guo, Junfeng
2022-10-21  9:19                       ` [PATCH v7 4/8] net/gve: add support for link update Junfeng Guo
2022-10-21  9:19                       ` [PATCH v7 5/8] net/gve: add support for MTU setting Junfeng Guo
2022-10-21  9:50                         ` Ferruh Yigit
2022-10-24  5:04                           ` Guo, Junfeng
2022-10-24 10:47                             ` Ferruh Yigit
2022-10-24 13:23                               ` Guo, Junfeng
2022-10-21  9:19                       ` [PATCH v7 6/8] net/gve: add support for dev info get and dev configure Junfeng Guo
2022-10-21  9:51                         ` Ferruh Yigit
2022-10-24  5:04                           ` Guo, Junfeng
2022-10-24 10:48                             ` Ferruh Yigit
2022-10-24 13:23                               ` Guo, Junfeng
2022-10-21  9:19                       ` [PATCH v7 7/8] net/gve: add support for queue operations Junfeng Guo
2022-10-21  9:19                       ` [PATCH v7 8/8] net/gve: add support for Rx/Tx Junfeng Guo
2022-10-21  9:52                         ` Ferruh Yigit
2022-10-24  5:04                           ` Guo, Junfeng
2022-10-24 10:50                             ` Ferruh Yigit
2022-10-24 13:25                               ` Guo, Junfeng
2022-10-25  9:07                                 ` Guo, Junfeng
2022-10-21 13:12                       ` [PATCH v7 0/8] introduce GVE PMD Ferruh Yigit
2022-10-24 10:50                         ` Ferruh Yigit
2022-10-24 13:25                           ` Guo, Junfeng
2022-10-10 10:17                 ` [PATCH v5 2/8] net/gve/base: add OS specific implementation Junfeng Guo
2022-10-10 10:17                 ` [PATCH v5 3/8] net/gve: add support for device initialization Junfeng Guo
2022-10-19 13:46                   ` Ferruh Yigit
2022-10-19 15:59                     ` Li, Xiaoyun
2022-10-19 21:00                       ` Ferruh Yigit
2022-10-20  9:29                         ` Guo, Junfeng
2022-10-20 11:15                           ` Ferruh Yigit
2022-10-21  4:46                             ` Guo, Junfeng
2022-10-19 13:47                   ` Ferruh Yigit
2022-10-19 14:02                     ` Xia, Chenbo
2022-10-19 14:24                     ` Zhang, Helin
2022-10-19 21:16                       ` Ferruh Yigit
2022-10-19 16:20                     ` Li, Xiaoyun
2022-10-10 10:17                 ` [PATCH v5 4/8] net/gve: add support for link update Junfeng Guo
2022-10-10 10:17                 ` [PATCH v5 5/8] net/gve: add support for MTU setting Junfeng Guo
2022-10-19 13:47                   ` Ferruh Yigit
2022-10-20 10:14                     ` Guo, Junfeng
2022-10-10 10:17                 ` [PATCH v5 6/8] net/gve: add support for dev info get and dev configure Junfeng Guo
2022-10-19 13:49                   ` Ferruh Yigit
2022-10-20  9:29                     ` Guo, Junfeng
2022-10-20 11:19                       ` Ferruh Yigit
2022-10-21  5:22                         ` Guo, Junfeng
2022-10-10 10:17                 ` [PATCH v5 7/8] net/gve: add support for queue operations Junfeng Guo
2022-10-10 10:17                 ` [PATCH v5 8/8] net/gve: add support for Rx/Tx Junfeng Guo
2022-10-19 13:47                   ` Ferruh Yigit
2022-10-20  9:34                     ` Guo, Junfeng
2022-09-27  7:32             ` [PATCH v4 2/9] net/gve/base: add logs and OS specific implementation Junfeng Guo
2022-10-06 14:20               ` Ferruh Yigit
2022-10-09  9:14                 ` Guo, Junfeng
2022-09-27  7:32             ` [PATCH v4 3/9] net/gve: add support for device initialization Junfeng Guo
2022-10-06 14:22               ` Ferruh Yigit
2022-10-09  9:14                 ` Guo, Junfeng
2022-09-27  7:32             ` [PATCH v4 4/9] net/gve: add support for link update Junfeng Guo
2022-10-06 14:23               ` Ferruh Yigit
2022-10-09  9:14                 ` Guo, Junfeng
2022-09-27  7:32             ` [PATCH v4 5/9] net/gve: add support for MTU setting Junfeng Guo
2022-09-27  7:32             ` [PATCH v4 6/9] net/gve: add support for queue operations Junfeng Guo
2022-09-27  7:32             ` [PATCH v4 7/9] net/gve: add support for Rx/Tx Junfeng Guo
2022-10-06 14:24               ` Ferruh Yigit
2022-10-09  9:14                 ` Guo, Junfeng
2022-10-10  9:39                   ` Li, Xiaoyun
2022-10-10 10:18                     ` Guo, Junfeng
2022-09-27  7:32             ` [PATCH v4 8/9] net/gve: add support for dev info get and dev configure Junfeng Guo
2022-10-06 14:25               ` Ferruh Yigit
2022-10-09  9:14                 ` Guo, Junfeng
2022-09-27  7:32             ` [PATCH v4 9/9] net/gve: add support for stats Junfeng Guo
2022-10-06 14:25               ` Ferruh Yigit
2022-10-09  9:15                 ` Guo, Junfeng
2022-09-23  9:38         ` [PATCH v3 2/9] net/gve: add logs and OS specific implementation Junfeng Guo
2022-09-23 19:01           ` Stephen Hemminger
2022-09-27  7:27             ` Guo, Junfeng
2022-09-23  9:38         ` [PATCH v3 3/9] net/gve: support device initialization Junfeng Guo
2022-09-23  9:38         ` [PATCH v3 4/9] net/gve: add link update support Junfeng Guo
2022-09-23  9:38         ` [PATCH v3 5/9] net/gve: add MTU set support Junfeng Guo
2022-09-23  9:38         ` [PATCH v3 6/9] net/gve: add queue operations Junfeng Guo
2022-09-23  9:38         ` [PATCH v3 7/9] net/gve: add Rx/Tx support Junfeng Guo
2022-09-23  9:38         ` [PATCH v3 8/9] net/gve: add support to get dev info and configure dev Junfeng Guo
2022-09-23  9:38         ` [PATCH v3 9/9] net/gve: add stats support Junfeng Guo
2022-09-01 17:19     ` [PATCH v2 00/10] introduce GVE PMD Ferruh Yigit
2022-09-07  2:09       ` Guo, Junfeng
2022-07-29 19:30 ` [PATCH 02/10] net/gve: add logs and OS specific implementation Xiaoyun Li
2022-07-29 19:30 ` [PATCH 03/10] net/gve: support device initialization Xiaoyun Li
2022-07-29 19:30 ` [PATCH 04/10] net/gve: add link update support Xiaoyun Li
2022-07-29 19:30 ` [PATCH 05/10] net/gve: add MTU set support Xiaoyun Li
2022-07-29 19:30 ` [PATCH 06/10] net/gve: add queue operations Xiaoyun Li
2022-07-29 19:30 ` [PATCH 07/10] net/gve: add Rx/Tx support Xiaoyun Li
2022-07-29 19:30 ` [PATCH 08/10] net/gve: add support to get dev info and configure dev Xiaoyun Li
2022-07-29 19:30 ` [PATCH 09/10] net/gve: add stats support Xiaoyun Li
2022-07-29 19:30 ` [PATCH 10/10] doc: update documentation Xiaoyun Li
