All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH net-next 00/18] GVE Driver v1.1.0 Features.
@ 2020-08-18 19:43 David Awogbemila
  2020-08-18 19:44 ` [PATCH net-next 01/18] gve: Get and set Rx copybreak via ethtool David Awogbemila
                   ` (18 more replies)
  0 siblings, 19 replies; 38+ messages in thread
From: David Awogbemila @ 2020-08-18 19:43 UTC (permalink / raw)
  To: netdev; +Cc: David Awogbemila

This patch series updates the GVE driver to v1.1.0 which broadly includes:
- introducing "raw adressing" mode, which allows the driver avoid copies in
  the guest.
- increased stats coverage.
- batching AdminQueue commands to the device.

Catherine Sullivan (8):
  gve: Register netdev earlier
  gve: Add support for dma_mask register
  gve: Add support for raw addressing device option
  gve: Add support for raw addressing to the rx path
  gve: Add support for raw addressing in the tx path
  gve: Add netif_set_xps_queue call
  gve: Add rx buffer pagecnt bias.
  gve: Move the irq db indexes out of the ntfy block struct

David Awogbemila (3):
  gve: Enable Link Speed Reporting in the driver.
  gve: Also WARN for skb index equals num_queues.
  gve: Bump version to 1.1.0.

Kuo Zhao (3):
  gve: Get and set Rx copybreak via ethtool
  gve: Add stats for gve.
  gve: Add Gvnic stats AQ command and ethtool show/set-priv-flags.

Patricio Noyola (1):
  gve: Use link status register to report link status

Sagi Shahar (1):
  gve: Batch AQ commands for creating and destroying queues.

Yangchun Fu (2):
  gve: Prefetch packet pages and packet descriptors.
  gve: Switch to use napi_complete_done

 drivers/net/ethernet/google/gve/gve.h         | 171 +++++--
 drivers/net/ethernet/google/gve/gve_adminq.c  | 372 +++++++++++++--
 drivers/net/ethernet/google/gve/gve_adminq.h  |  77 +++-
 drivers/net/ethernet/google/gve/gve_desc.h    |  18 +-
 drivers/net/ethernet/google/gve/gve_ethtool.c | 366 +++++++++++++--
 drivers/net/ethernet/google/gve/gve_main.c    | 424 +++++++++++++-----
 .../net/ethernet/google/gve/gve_register.h    |   4 +-
 drivers/net/ethernet/google/gve/gve_rx.c      | 409 +++++++++++++----
 drivers/net/ethernet/google/gve/gve_tx.c      | 215 +++++++--
 9 files changed, 1700 insertions(+), 356 deletions(-)

-- 
2.28.0.220.ged08abb693-goog


^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH net-next 01/18] gve: Get and set Rx copybreak via ethtool
  2020-08-18 19:43 [PATCH net-next 00/18] GVE Driver v1.1.0 Features David Awogbemila
@ 2020-08-18 19:44 ` David Awogbemila
  2020-08-18 20:00   ` Andrew Lunn
  2020-08-18 19:44 ` [PATCH net-next 02/18] gve: Add stats for gve David Awogbemila
                   ` (17 subsequent siblings)
  18 siblings, 1 reply; 38+ messages in thread
From: David Awogbemila @ 2020-08-18 19:44 UTC (permalink / raw)
  To: netdev; +Cc: Kuo Zhao, Yangchun Fu, David Awogbemila

From: Kuo Zhao <kuozhao@google.com>

This adds support for getting and setting the RX copybreak
value via ethtool.

Reviewed-by: Yangchun Fu <yangchun@google.com>
Signed-off-by: Kuo Zhao <kuozhao@google.com>
Signed-off-by: David Awogbemila <awogbemila@google.com>
---
 drivers/net/ethernet/google/gve/gve_ethtool.c | 34 +++++++++++++++++++
 1 file changed, 34 insertions(+)

diff --git a/drivers/net/ethernet/google/gve/gve_ethtool.c b/drivers/net/ethernet/google/gve/gve_ethtool.c
index d8fa816f4473..469d3332bcd6 100644
--- a/drivers/net/ethernet/google/gve/gve_ethtool.c
+++ b/drivers/net/ethernet/google/gve/gve_ethtool.c
@@ -230,6 +230,38 @@ static int gve_user_reset(struct net_device *netdev, u32 *flags)
 	return -EOPNOTSUPP;
 }
 
+static int gve_get_tunable(struct net_device *netdev,
+			   const struct ethtool_tunable *etuna, void *value)
+{
+	struct gve_priv *priv = netdev_priv(netdev);
+
+	switch (etuna->id) {
+	case ETHTOOL_RX_COPYBREAK:
+		*(u32 *)value = priv->rx_copybreak;
+		return 0;
+	default:
+		return -EINVAL;
+	}
+}
+
+static int gve_set_tunable(struct net_device *netdev,
+			   const struct ethtool_tunable *etuna, const void *value)
+{
+	struct gve_priv *priv = netdev_priv(netdev);
+	u32 len;
+
+	switch (etuna->id) {
+	case ETHTOOL_RX_COPYBREAK:
+		len = *(u32 *)value;
+		if (len > PAGE_SIZE / 2)
+			return -EINVAL;
+		priv->rx_copybreak = len;
+		return 0;
+	default:
+		return -EINVAL;
+	}
+}
+
 const struct ethtool_ops gve_ethtool_ops = {
 	.get_drvinfo = gve_get_drvinfo,
 	.get_strings = gve_get_strings,
@@ -242,4 +274,6 @@ const struct ethtool_ops gve_ethtool_ops = {
 	.get_link = ethtool_op_get_link,
 	.get_ringparam = gve_get_ringparam,
 	.reset = gve_user_reset,
+	.get_tunable = gve_get_tunable,
+	.set_tunable = gve_set_tunable,
 };
-- 
2.28.0.220.ged08abb693-goog


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH net-next 02/18] gve: Add stats for gve.
  2020-08-18 19:43 [PATCH net-next 00/18] GVE Driver v1.1.0 Features David Awogbemila
  2020-08-18 19:44 ` [PATCH net-next 01/18] gve: Get and set Rx copybreak via ethtool David Awogbemila
@ 2020-08-18 19:44 ` David Awogbemila
  2020-08-18 19:44 ` [PATCH net-next 03/18] gve: Register netdev earlier David Awogbemila
                   ` (16 subsequent siblings)
  18 siblings, 0 replies; 38+ messages in thread
From: David Awogbemila @ 2020-08-18 19:44 UTC (permalink / raw)
  To: netdev; +Cc: Kuo Zhao, Yangchun Fu, David Awogbemila

From: Kuo Zhao <kuozhao@google.com>

Sample output of "ethtool -S <interface-name>" with 1 RX queue and 1 TX
queue:
NIC statistics:
     rx_packets: 1039
     tx_packets: 37
     rx_bytes: 822071
     tx_bytes: 4100
     rx_dropped: 0
     tx_dropped: 0
     tx_timeouts: 0
     rx_skb_alloc_fail: 0
     rx_buf_alloc_fail: 0
     rx_desc_err_dropped_pkt: 0
     interface_up_cnt: 1
     interface_down_cnt: 0
     reset_cnt: 0
     page_alloc_fail: 0
     dma_mapping_error: 0
     rx_posted_desc[0]: 1365
     rx_completed_desc[0]: 341
     rx_bytes[0]: 215094
     rx_dropped_pkt[0]: 0
     rx_copybreak_pkt[0]: 3
     rx_copied_pkt[0]: 3
     tx_posted_desc[0]: 6
     tx_completed_desc[0]: 6
     tx_bytes[0]: 420
     tx_wake[0]: 0
     tx_stop[0]: 0
     tx_event_counter[0]: 6
     adminq_prod_cnt: 34
     adminq_cmd_fail: 0
     adminq_timeouts: 0
     adminq_describe_device_cnt: 1
     adminq_cfg_device_resources_cnt: 1
     adminq_register_page_list_cnt: 16
     adminq_unregister_page_list_cnt: 0
     adminq_create_tx_queue_cnt: 8
     adminq_create_rx_queue_cnt: 8
     adminq_destroy_tx_queue_cnt: 0
     adminq_destroy_rx_queue_cnt: 0
     adminq_dcfg_device_resources_cnt: 0
     adminq_set_driver_parameter_cnt: 0

Reviewed-by: Yangchun Fu <yangchun@google.com>
Signed-off-by: Kuo Zhao <kuozhao@google.com>
Signed-off-by: David Awogbemila <awogbemila@google.com>
---
 drivers/net/ethernet/google/gve/gve.h         |  28 +++-
 drivers/net/ethernet/google/gve/gve_adminq.c  |  65 +++++++-
 drivers/net/ethernet/google/gve/gve_ethtool.c | 147 ++++++++++++++----
 drivers/net/ethernet/google/gve/gve_main.c    |  15 +-
 drivers/net/ethernet/google/gve/gve_rx.c      |  37 ++++-
 5 files changed, 245 insertions(+), 47 deletions(-)

diff --git a/drivers/net/ethernet/google/gve/gve.h b/drivers/net/ethernet/google/gve/gve.h
index ebc37e256922..55b34b437764 100644
--- a/drivers/net/ethernet/google/gve/gve.h
+++ b/drivers/net/ethernet/google/gve/gve.h
@@ -71,6 +71,11 @@ struct gve_rx_ring {
 	u32 cnt; /* free-running total number of completed packets */
 	u32 fill_cnt; /* free-running total number of descs and buffs posted */
 	u32 mask; /* masks the cnt and fill_cnt to the size of the ring */
+	u64 rx_copybreak_pkt; /* free-running count of copybreak packets */
+	u64 rx_copied_pkt; /* free-running total number of copied packets */
+	u64 rx_skb_alloc_fail; /* free-running count of skb alloc fails */
+	u64 rx_buf_alloc_fail; /* free-running count of buffer alloc fails */
+	u64 rx_desc_err_dropped_pkt; /* free-running count of packets dropped by descriptor error */
 	u32 q_num; /* queue index */
 	u32 ntfy_id; /* notification block index */
 	struct gve_queue_resources *q_resources; /* head and tail pointer idx */
@@ -202,6 +207,26 @@ struct gve_priv {
 	dma_addr_t adminq_bus_addr;
 	u32 adminq_mask; /* masks prod_cnt to adminq size */
 	u32 adminq_prod_cnt; /* free-running count of AQ cmds executed */
+	u32 adminq_cmd_fail; /* free-running count of AQ cmds failed */
+	u32 adminq_timeouts; /* free-running count of AQ cmds timeouts */
+	/* free-running count of per AQ cmd executed */
+	u32 adminq_describe_device_cnt;
+	u32 adminq_cfg_device_resources_cnt;
+	u32 adminq_register_page_list_cnt;
+	u32 adminq_unregister_page_list_cnt;
+	u32 adminq_create_tx_queue_cnt;
+	u32 adminq_create_rx_queue_cnt;
+	u32 adminq_destroy_tx_queue_cnt;
+	u32 adminq_destroy_rx_queue_cnt;
+	u32 adminq_dcfg_device_resources_cnt;
+	u32 adminq_set_driver_parameter_cnt;
+
+	/* Global stats */
+	u32 interface_up_cnt; /* count of times interface turned up since last reset */
+	u32 interface_down_cnt; /* count of times interface turned down since last reset */
+	u32 reset_cnt; /* count of reset */
+	u32 page_alloc_fail; /* count of page alloc fails */
+	u32 dma_mapping_error; /* count of dma mapping errors */
 
 	struct workqueue_struct *gve_wq;
 	struct work_struct service_task;
@@ -426,7 +451,8 @@ static inline bool gve_can_recycle_pages(struct net_device *dev)
 }
 
 /* buffers */
-int gve_alloc_page(struct device *dev, struct page **page, dma_addr_t *dma,
+int gve_alloc_page(struct gve_priv *priv, struct device *dev,
+		   struct page **page, dma_addr_t *dma,
 		   enum dma_data_direction);
 void gve_free_page(struct device *dev, struct page *page, dma_addr_t dma,
 		   enum dma_data_direction);
diff --git a/drivers/net/ethernet/google/gve/gve_adminq.c b/drivers/net/ethernet/google/gve/gve_adminq.c
index c3ba7baf0107..9dbfcfb8cf63 100644
--- a/drivers/net/ethernet/google/gve/gve_adminq.c
+++ b/drivers/net/ethernet/google/gve/gve_adminq.c
@@ -23,6 +23,18 @@ int gve_adminq_alloc(struct device *dev, struct gve_priv *priv)
 
 	priv->adminq_mask = (PAGE_SIZE / sizeof(union gve_adminq_command)) - 1;
 	priv->adminq_prod_cnt = 0;
+	priv->adminq_cmd_fail = 0;
+	priv->adminq_timeouts = 0;
+	priv->adminq_describe_device_cnt = 0;
+	priv->adminq_cfg_device_resources_cnt = 0;
+	priv->adminq_register_page_list_cnt = 0;
+	priv->adminq_unregister_page_list_cnt = 0;
+	priv->adminq_create_tx_queue_cnt = 0;
+	priv->adminq_create_rx_queue_cnt = 0;
+	priv->adminq_destroy_tx_queue_cnt = 0;
+	priv->adminq_destroy_rx_queue_cnt = 0;
+	priv->adminq_dcfg_device_resources_cnt = 0;
+	priv->adminq_set_driver_parameter_cnt = 0;
 
 	/* Setup Admin queue with the device */
 	iowrite32be(priv->adminq_bus_addr / PAGE_SIZE,
@@ -81,17 +93,18 @@ static bool gve_adminq_wait_for_cmd(struct gve_priv *priv, u32 prod_cnt)
 	return false;
 }
 
-static int gve_adminq_parse_err(struct device *dev, u32 status)
+static int gve_adminq_parse_err(struct gve_priv *priv, u32 status)
 {
 	if (status != GVE_ADMINQ_COMMAND_PASSED &&
-	    status != GVE_ADMINQ_COMMAND_UNSET)
-		dev_err(dev, "AQ command failed with status %d\n", status);
-
+	    status != GVE_ADMINQ_COMMAND_UNSET) {
+		dev_err(&priv->pdev->dev, "AQ command failed with status %d\n", status);
+		priv->adminq_cmd_fail++;
+	}
 	switch (status) {
 	case GVE_ADMINQ_COMMAND_PASSED:
 		return 0;
 	case GVE_ADMINQ_COMMAND_UNSET:
-		dev_err(dev, "parse_aq_err: err and status both unset, this should not be possible.\n");
+		dev_err(&priv->pdev->dev, "parse_aq_err: err and status both unset, this should not be possible.\n");
 		return -EINVAL;
 	case GVE_ADMINQ_COMMAND_ERROR_ABORTED:
 	case GVE_ADMINQ_COMMAND_ERROR_CANCELLED:
@@ -116,7 +129,7 @@ static int gve_adminq_parse_err(struct device *dev, u32 status)
 	case GVE_ADMINQ_COMMAND_ERROR_UNIMPLEMENTED:
 		return -ENOTSUPP;
 	default:
-		dev_err(dev, "parse_aq_err: unknown status code %d\n", status);
+		dev_err(&priv->pdev->dev, "parse_aq_err: unknown status code %d\n", status);
 		return -EINVAL;
 	}
 }
@@ -128,6 +141,7 @@ int gve_adminq_execute_cmd(struct gve_priv *priv,
 			   union gve_adminq_command *cmd_orig)
 {
 	union gve_adminq_command *cmd;
+	u32 opcode;
 	u32 status = 0;
 	u32 prod_cnt;
 
@@ -136,16 +150,53 @@ int gve_adminq_execute_cmd(struct gve_priv *priv,
 	prod_cnt = priv->adminq_prod_cnt;
 
 	memcpy(cmd, cmd_orig, sizeof(*cmd_orig));
+	opcode = be32_to_cpu(READ_ONCE(cmd->opcode));
+
+	switch (opcode) {
+	case GVE_ADMINQ_DESCRIBE_DEVICE:
+		priv->adminq_describe_device_cnt++;
+		break;
+	case GVE_ADMINQ_CONFIGURE_DEVICE_RESOURCES:
+		priv->adminq_cfg_device_resources_cnt++;
+		break;
+	case GVE_ADMINQ_REGISTER_PAGE_LIST:
+		priv->adminq_register_page_list_cnt++;
+		break;
+	case GVE_ADMINQ_UNREGISTER_PAGE_LIST:
+		priv->adminq_unregister_page_list_cnt++;
+		break;
+	case GVE_ADMINQ_CREATE_TX_QUEUE:
+		priv->adminq_create_tx_queue_cnt++;
+		break;
+	case GVE_ADMINQ_CREATE_RX_QUEUE:
+		priv->adminq_create_rx_queue_cnt++;
+		break;
+	case GVE_ADMINQ_DESTROY_TX_QUEUE:
+		priv->adminq_destroy_tx_queue_cnt++;
+		break;
+	case GVE_ADMINQ_DESTROY_RX_QUEUE:
+		priv->adminq_destroy_rx_queue_cnt++;
+		break;
+	case GVE_ADMINQ_DECONFIGURE_DEVICE_RESOURCES:
+		priv->adminq_dcfg_device_resources_cnt++;
+		break;
+	case GVE_ADMINQ_SET_DRIVER_PARAMETER:
+		priv->adminq_set_driver_parameter_cnt++;
+		break;
+	default:
+		dev_err(&priv->pdev->dev, "unknown AQ command opcode %d\n", opcode);
+	}
 
 	gve_adminq_kick_cmd(priv, prod_cnt);
 	if (!gve_adminq_wait_for_cmd(priv, prod_cnt)) {
 		dev_err(&priv->pdev->dev, "AQ command timed out, need to reset AQ\n");
+		priv->adminq_timeouts++;
 		return -ENOTRECOVERABLE;
 	}
 
 	memcpy(cmd_orig, cmd, sizeof(*cmd));
 	status = be32_to_cpu(READ_ONCE(cmd->status));
-	return gve_adminq_parse_err(&priv->pdev->dev, status);
+	return gve_adminq_parse_err(priv, status);
 }
 
 /* The device specifies that the management vector can either be the first irq
diff --git a/drivers/net/ethernet/google/gve/gve_ethtool.c b/drivers/net/ethernet/google/gve/gve_ethtool.c
index 469d3332bcd6..ebbf2c4c444e 100644
--- a/drivers/net/ethernet/google/gve/gve_ethtool.c
+++ b/drivers/net/ethernet/google/gve/gve_ethtool.c
@@ -34,17 +34,40 @@ static u32 gve_get_msglevel(struct net_device *netdev)
 static const char gve_gstrings_main_stats[][ETH_GSTRING_LEN] = {
 	"rx_packets", "tx_packets", "rx_bytes", "tx_bytes",
 	"rx_dropped", "tx_dropped", "tx_timeouts",
+	"rx_skb_alloc_fail", "rx_buf_alloc_fail", "rx_desc_err_dropped_pkt",
+	"interface_up_cnt", "interface_down_cnt", "reset_cnt",
+	"page_alloc_fail", "dma_mapping_error",
+};
+
+static const char gve_gstrings_rx_stats[][ETH_GSTRING_LEN] = {
+	"rx_posted_desc[%u]", "rx_completed_desc[%u]", "rx_bytes[%u]",
+	"rx_dropped_pkt[%u]", "rx_copybreak_pkt[%u]", "rx_copied_pkt[%u]",
+};
+
+static const char gve_gstrings_tx_stats[][ETH_GSTRING_LEN] = {
+	"tx_posted_desc[%u]", "tx_completed_desc[%u]", "tx_bytes[%u]",
+	"tx_wake[%u]", "tx_stop[%u]", "tx_event_counter[%u]",
+};
+
+static const char gve_gstrings_adminq_stats[][ETH_GSTRING_LEN] = {
+	"adminq_prod_cnt", "adminq_cmd_fail", "adminq_timeouts",
+	"adminq_describe_device_cnt", "adminq_cfg_device_resources_cnt",
+	"adminq_register_page_list_cnt", "adminq_unregister_page_list_cnt",
+	"adminq_create_tx_queue_cnt", "adminq_create_rx_queue_cnt",
+	"adminq_destroy_tx_queue_cnt", "adminq_destroy_rx_queue_cnt",
+	"adminq_dcfg_device_resources_cnt", "adminq_set_driver_parameter_cnt",
 };
 
 #define GVE_MAIN_STATS_LEN  ARRAY_SIZE(gve_gstrings_main_stats)
-#define NUM_GVE_TX_CNTS	5
-#define NUM_GVE_RX_CNTS	2
+#define GVE_ADMINQ_STATS_LEN  ARRAY_SIZE(gve_gstrings_adminq_stats)
+#define NUM_GVE_TX_CNTS	ARRAY_SIZE(gve_gstrings_tx_stats)
+#define NUM_GVE_RX_CNTS	ARRAY_SIZE(gve_gstrings_rx_stats)
 
 static void gve_get_strings(struct net_device *netdev, u32 stringset, u8 *data)
 {
 	struct gve_priv *priv = netdev_priv(netdev);
 	char *s = (char *)data;
-	int i;
+	int i, j;
 
 	if (stringset != ETH_SS_STATS)
 		return;
@@ -52,24 +75,24 @@ static void gve_get_strings(struct net_device *netdev, u32 stringset, u8 *data)
 	memcpy(s, *gve_gstrings_main_stats,
 	       sizeof(gve_gstrings_main_stats));
 	s += sizeof(gve_gstrings_main_stats);
+
 	for (i = 0; i < priv->rx_cfg.num_queues; i++) {
-		snprintf(s, ETH_GSTRING_LEN, "rx_desc_cnt[%u]", i);
-		s += ETH_GSTRING_LEN;
-		snprintf(s, ETH_GSTRING_LEN, "rx_desc_fill_cnt[%u]", i);
-		s += ETH_GSTRING_LEN;
+		for (j = 0; j < NUM_GVE_RX_CNTS; j++) {
+			snprintf(s, ETH_GSTRING_LEN, gve_gstrings_rx_stats[j], i);
+			s += ETH_GSTRING_LEN;
+		}
 	}
+
 	for (i = 0; i < priv->tx_cfg.num_queues; i++) {
-		snprintf(s, ETH_GSTRING_LEN, "tx_req[%u]", i);
-		s += ETH_GSTRING_LEN;
-		snprintf(s, ETH_GSTRING_LEN, "tx_done[%u]", i);
-		s += ETH_GSTRING_LEN;
-		snprintf(s, ETH_GSTRING_LEN, "tx_wake[%u]", i);
-		s += ETH_GSTRING_LEN;
-		snprintf(s, ETH_GSTRING_LEN, "tx_stop[%u]", i);
-		s += ETH_GSTRING_LEN;
-		snprintf(s, ETH_GSTRING_LEN, "tx_event_counter[%u]", i);
-		s += ETH_GSTRING_LEN;
+		for (j = 0; j < NUM_GVE_TX_CNTS; j++) {
+			snprintf(s, ETH_GSTRING_LEN, gve_gstrings_tx_stats[j], i);
+			s += ETH_GSTRING_LEN;
+		}
 	}
+
+	memcpy(s, *gve_gstrings_adminq_stats,
+	       sizeof(gve_gstrings_adminq_stats));
+	s += sizeof(gve_gstrings_adminq_stats);
 }
 
 static int gve_get_sset_count(struct net_device *netdev, int sset)
@@ -78,7 +101,7 @@ static int gve_get_sset_count(struct net_device *netdev, int sset)
 
 	switch (sset) {
 	case ETH_SS_STATS:
-		return GVE_MAIN_STATS_LEN +
+		return GVE_MAIN_STATS_LEN + GVE_ADMINQ_STATS_LEN +
 		       (priv->rx_cfg.num_queues * NUM_GVE_RX_CNTS) +
 		       (priv->tx_cfg.num_queues * NUM_GVE_TX_CNTS);
 	default:
@@ -90,24 +113,39 @@ static void
 gve_get_ethtool_stats(struct net_device *netdev,
 		      struct ethtool_stats *stats, u64 *data)
 {
+	u64 tmp_rx_pkts, tmp_rx_bytes, tmp_rx_skb_alloc_fail,
+		tmp_rx_buf_alloc_fail, tmp_rx_desc_err_dropped_pkt,
+		tmp_tx_pkts, tmp_tx_bytes;
+	u64 rx_pkts, rx_bytes, rx_skb_alloc_fail, rx_buf_alloc_fail,
+		rx_desc_err_dropped_pkt, tx_pkts, tx_bytes;
 	struct gve_priv *priv = netdev_priv(netdev);
-	u64 rx_pkts, rx_bytes, tx_pkts, tx_bytes;
 	unsigned int start;
 	int ring;
 	int i;
 
 	ASSERT_RTNL();
 
-	for (rx_pkts = 0, rx_bytes = 0, ring = 0;
+	for (rx_pkts = 0, rx_bytes = 0, rx_skb_alloc_fail = 0,
+	     rx_buf_alloc_fail = 0, rx_desc_err_dropped_pkt = 0, ring = 0;
 	     ring < priv->rx_cfg.num_queues; ring++) {
 		if (priv->rx) {
 			do {
+				struct gve_rx_ring *rx = &priv->rx[ring];
 				start =
 				  u64_stats_fetch_begin(&priv->rx[ring].statss);
-				rx_pkts += priv->rx[ring].rpackets;
-				rx_bytes += priv->rx[ring].rbytes;
+				tmp_rx_pkts = rx->rpackets;
+				tmp_rx_bytes = rx->rbytes;
+				tmp_rx_skb_alloc_fail = rx->rx_skb_alloc_fail;
+				tmp_rx_buf_alloc_fail = rx->rx_buf_alloc_fail;
+				tmp_rx_desc_err_dropped_pkt =
+					rx->rx_desc_err_dropped_pkt;
 			} while (u64_stats_fetch_retry(&priv->rx[ring].statss,
 						       start));
+			rx_pkts += tmp_rx_pkts;
+			rx_bytes += tmp_rx_bytes;
+			rx_skb_alloc_fail += tmp_rx_skb_alloc_fail;
+			rx_buf_alloc_fail += tmp_rx_buf_alloc_fail;
+			rx_desc_err_dropped_pkt += tmp_rx_desc_err_dropped_pkt;
 		}
 	}
 	for (tx_pkts = 0, tx_bytes = 0, ring = 0;
@@ -116,10 +154,12 @@ gve_get_ethtool_stats(struct net_device *netdev,
 			do {
 				start =
 				  u64_stats_fetch_begin(&priv->tx[ring].statss);
-				tx_pkts += priv->tx[ring].pkt_done;
-				tx_bytes += priv->tx[ring].bytes_done;
+				tmp_tx_pkts = priv->tx[ring].pkt_done;
+				tmp_tx_bytes = priv->tx[ring].bytes_done;
 			} while (u64_stats_fetch_retry(&priv->tx[ring].statss,
 						       start));
+			tx_pkts += tmp_tx_pkts;
+			tx_bytes += tmp_tx_bytes;
 		}
 	}
 
@@ -128,9 +168,21 @@ gve_get_ethtool_stats(struct net_device *netdev,
 	data[i++] = tx_pkts;
 	data[i++] = rx_bytes;
 	data[i++] = tx_bytes;
-	/* Skip rx_dropped and tx_dropped */
-	i += 2;
+	/* total rx dropped packets */
+	data[i++] = rx_skb_alloc_fail + rx_buf_alloc_fail +
+		    rx_desc_err_dropped_pkt;
+	/* Skip tx_dropped */
+	i++;
+
 	data[i++] = priv->tx_timeo_cnt;
+	data[i++] = rx_skb_alloc_fail;
+	data[i++] = rx_buf_alloc_fail;
+	data[i++] = rx_desc_err_dropped_pkt;
+	data[i++] = priv->interface_up_cnt;
+	data[i++] = priv->interface_down_cnt;
+	data[i++] = priv->reset_cnt;
+	data[i++] = priv->page_alloc_fail;
+	data[i++] = priv->dma_mapping_error;
 	i = GVE_MAIN_STATS_LEN;
 
 	/* walk RX rings */
@@ -138,8 +190,25 @@ gve_get_ethtool_stats(struct net_device *netdev,
 		for (ring = 0; ring < priv->rx_cfg.num_queues; ring++) {
 			struct gve_rx_ring *rx = &priv->rx[ring];
 
-			data[i++] = rx->cnt;
 			data[i++] = rx->fill_cnt;
+			data[i++] = rx->cnt;
+			do {
+				start =
+				  u64_stats_fetch_begin(&priv->rx[ring].statss);
+				tmp_rx_bytes = rx->rbytes;
+				tmp_rx_skb_alloc_fail = rx->rx_skb_alloc_fail;
+				tmp_rx_buf_alloc_fail = rx->rx_buf_alloc_fail;
+				tmp_rx_desc_err_dropped_pkt =
+					rx->rx_desc_err_dropped_pkt;
+			} while (u64_stats_fetch_retry(&priv->rx[ring].statss,
+						       start));
+			data[i++] = tmp_rx_bytes;
+			/* rx dropped packets */
+			data[i++] = tmp_rx_skb_alloc_fail +
+				tmp_rx_buf_alloc_fail +
+				tmp_rx_desc_err_dropped_pkt;
+			data[i++] = rx->rx_copybreak_pkt;
+			data[i++] = rx->rx_copied_pkt;
 		}
 	} else {
 		i += priv->rx_cfg.num_queues * NUM_GVE_RX_CNTS;
@@ -151,6 +220,13 @@ gve_get_ethtool_stats(struct net_device *netdev,
 
 			data[i++] = tx->req;
 			data[i++] = tx->done;
+			do {
+				start =
+				  u64_stats_fetch_begin(&priv->tx[ring].statss);
+				tmp_tx_bytes = tx->bytes_done;
+			} while (u64_stats_fetch_retry(&priv->tx[ring].statss,
+						       start));
+			data[i++] = tmp_tx_bytes;
 			data[i++] = tx->wake_queue;
 			data[i++] = tx->stop_queue;
 			data[i++] = be32_to_cpu(gve_tx_load_event_counter(priv,
@@ -159,6 +235,20 @@ gve_get_ethtool_stats(struct net_device *netdev,
 	} else {
 		i += priv->tx_cfg.num_queues * NUM_GVE_TX_CNTS;
 	}
+	/* AQ Stats */
+	data[i++] = priv->adminq_prod_cnt;
+	data[i++] = priv->adminq_cmd_fail;
+	data[i++] = priv->adminq_timeouts;
+	data[i++] = priv->adminq_describe_device_cnt;
+	data[i++] = priv->adminq_cfg_device_resources_cnt;
+	data[i++] = priv->adminq_register_page_list_cnt;
+	data[i++] = priv->adminq_unregister_page_list_cnt;
+	data[i++] = priv->adminq_create_tx_queue_cnt;
+	data[i++] = priv->adminq_create_rx_queue_cnt;
+	data[i++] = priv->adminq_destroy_tx_queue_cnt;
+	data[i++] = priv->adminq_destroy_rx_queue_cnt;
+	data[i++] = priv->adminq_dcfg_device_resources_cnt;
+	data[i++] = priv->adminq_set_driver_parameter_cnt;
 }
 
 static void gve_get_channels(struct net_device *netdev,
@@ -245,7 +335,8 @@ static int gve_get_tunable(struct net_device *netdev,
 }
 
 static int gve_set_tunable(struct net_device *netdev,
-			   const struct ethtool_tunable *etuna, const void *value)
+			   const struct ethtool_tunable *etuna,
+			   const void *value)
 {
 	struct gve_priv *priv = netdev_priv(netdev);
 	u32 len;
diff --git a/drivers/net/ethernet/google/gve/gve_main.c b/drivers/net/ethernet/google/gve/gve_main.c
index e032563ceefd..4f6c1fc9c58d 100644
--- a/drivers/net/ethernet/google/gve/gve_main.c
+++ b/drivers/net/ethernet/google/gve/gve_main.c
@@ -514,14 +514,18 @@ static void gve_free_rings(struct gve_priv *priv)
 	}
 }
 
-int gve_alloc_page(struct device *dev, struct page **page, dma_addr_t *dma,
+int gve_alloc_page(struct gve_priv *priv, struct device *dev,
+		   struct page **page, dma_addr_t *dma,
 		   enum dma_data_direction dir)
 {
 	*page = alloc_page(GFP_KERNEL);
-	if (!*page)
+	if (!*page) {
+		priv->page_alloc_fail++;
 		return -ENOMEM;
+	}
 	*dma = dma_map_page(dev, *page, 0, PAGE_SIZE, dir);
 	if (dma_mapping_error(dev, *dma)) {
+		priv->dma_mapping_error++;
 		put_page(*page);
 		return -ENOMEM;
 	}
@@ -556,7 +560,7 @@ static int gve_alloc_queue_page_list(struct gve_priv *priv, u32 id,
 		return -ENOMEM;
 
 	for (i = 0; i < pages; i++) {
-		err = gve_alloc_page(&priv->pdev->dev, &qpl->pages[i],
+		err = gve_alloc_page(priv, &priv->pdev->dev, &qpl->pages[i],
 				     &qpl->page_buses[i],
 				     gve_qpl_dma_dir(priv, id));
 		/* caller handles clean up */
@@ -697,6 +701,7 @@ static int gve_open(struct net_device *dev)
 
 	gve_turnup(priv);
 	netif_carrier_on(dev);
+	priv->interface_up_cnt++;
 	return 0;
 
 free_rings:
@@ -738,6 +743,7 @@ static int gve_close(struct net_device *dev)
 
 	gve_free_rings(priv);
 	gve_free_qpls(priv);
+	priv->interface_down_cnt++;
 	return 0;
 
 err:
@@ -1047,6 +1053,9 @@ int gve_reset(struct gve_priv *priv, bool attempt_teardown)
 	/* Set it all back up */
 	err = gve_reset_recovery(priv, was_up);
 	gve_clear_reset_in_progress(priv);
+	priv->reset_cnt++;
+	priv->interface_up_cnt = 0;
+	priv->interface_down_cnt = 0;
 	return err;
 }
 
diff --git a/drivers/net/ethernet/google/gve/gve_rx.c b/drivers/net/ethernet/google/gve/gve_rx.c
index 9f52e72ff641..008fa897a3e6 100644
--- a/drivers/net/ethernet/google/gve/gve_rx.c
+++ b/drivers/net/ethernet/google/gve/gve_rx.c
@@ -225,7 +225,8 @@ static enum pkt_hash_types gve_rss_type(__be16 pkt_flags)
 	return PKT_HASH_TYPE_L2;
 }
 
-static struct sk_buff *gve_rx_copy(struct net_device *dev,
+static struct sk_buff *gve_rx_copy(struct gve_rx_ring *rx,
+				   struct net_device *dev,
 				   struct napi_struct *napi,
 				   struct gve_rx_slot_page_info *page_info,
 				   u16 len)
@@ -242,6 +243,11 @@ static struct sk_buff *gve_rx_copy(struct net_device *dev,
 	skb_copy_to_linear_data(skb, va, len);
 
 	skb->protocol = eth_type_trans(skb, dev);
+
+	u64_stats_update_begin(&rx->statss);
+	rx->rx_copied_pkt++;
+	u64_stats_update_end(&rx->statss);
+
 	return skb;
 }
 
@@ -284,8 +290,12 @@ static bool gve_rx(struct gve_rx_ring *rx, struct gve_rx_desc *rx_desc,
 	u16 len;
 
 	/* drop this packet */
-	if (unlikely(rx_desc->flags_seq & GVE_RXF_ERR))
+	if (unlikely(rx_desc->flags_seq & GVE_RXF_ERR)) {
+		u64_stats_update_begin(&rx->statss);
+		rx->rx_desc_err_dropped_pkt++;
+		u64_stats_update_end(&rx->statss);
 		return true;
+	}
 
 	len = be16_to_cpu(rx_desc->len) - GVE_RX_PAD;
 	page_info = &rx->data.page_info[idx];
@@ -300,11 +310,14 @@ static bool gve_rx(struct gve_rx_ring *rx, struct gve_rx_desc *rx_desc,
 	if (PAGE_SIZE == 4096) {
 		if (len <= priv->rx_copybreak) {
 			/* Just copy small packets */
-			skb = gve_rx_copy(dev, napi, page_info, len);
+			skb = gve_rx_copy(rx, dev, napi, page_info, len);
+			u64_stats_update_begin(&rx->statss);
+			rx->rx_copybreak_pkt++;
+			u64_stats_update_end(&rx->statss);
 			goto have_skb;
 		}
 		if (unlikely(!gve_can_recycle_pages(dev))) {
-			skb = gve_rx_copy(dev, napi, page_info, len);
+			skb = gve_rx_copy(rx, dev, napi, page_info, len);
 			goto have_skb;
 		}
 		pagecount = page_count(page_info->page);
@@ -314,8 +327,12 @@ static bool gve_rx(struct gve_rx_ring *rx, struct gve_rx_desc *rx_desc,
 			 * stack.
 			 */
 			skb = gve_rx_add_frags(dev, napi, page_info, len);
-			if (!skb)
+			if (!skb) {
+				u64_stats_update_begin(&rx->statss);
+				rx->rx_skb_alloc_fail++;
+				u64_stats_update_end(&rx->statss);
 				return true;
+			}
 			/* Make sure the kernel stack can't release the page */
 			get_page(page_info->page);
 			/* "flip" to other packet buffer on this page */
@@ -324,21 +341,25 @@ static bool gve_rx(struct gve_rx_ring *rx, struct gve_rx_desc *rx_desc,
 			/* We have previously passed the other half of this
 			 * page up the stack, but it has not yet been freed.
 			 */
-			skb = gve_rx_copy(dev, napi, page_info, len);
+			skb = gve_rx_copy(rx, dev, napi, page_info, len);
 		} else {
 			WARN(pagecount < 1, "Pagecount should never be < 1");
 			return false;
 		}
 	} else {
-		skb = gve_rx_copy(dev, napi, page_info, len);
+		skb = gve_rx_copy(rx, dev, napi, page_info, len);
 	}
 
 have_skb:
 	/* We didn't manage to allocate an skb but we haven't had any
 	 * reset worthy failures.
 	 */
-	if (!skb)
+	if (!skb) {
+		u64_stats_update_begin(&rx->statss);
+		rx->rx_skb_alloc_fail++;
+		u64_stats_update_end(&rx->statss);
 		return true;
+	}
 
 	if (likely(feat & NETIF_F_RXCSUM)) {
 		/* NIC passes up the partial sum */
-- 
2.28.0.220.ged08abb693-goog


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH net-next 03/18] gve: Register netdev earlier
  2020-08-18 19:43 [PATCH net-next 00/18] GVE Driver v1.1.0 Features David Awogbemila
  2020-08-18 19:44 ` [PATCH net-next 01/18] gve: Get and set Rx copybreak via ethtool David Awogbemila
  2020-08-18 19:44 ` [PATCH net-next 02/18] gve: Add stats for gve David Awogbemila
@ 2020-08-18 19:44 ` David Awogbemila
  2020-08-18 20:09   ` Andrew Lunn
  2020-08-18 19:44 ` [PATCH net-next 04/18] gve: Add support for dma_mask register David Awogbemila
                   ` (15 subsequent siblings)
  18 siblings, 1 reply; 38+ messages in thread
From: David Awogbemila @ 2020-08-18 19:44 UTC (permalink / raw)
  To: netdev; +Cc: Catherine Sullivan, Yangchun Fu, David Awogbemila

From: Catherine Sullivan <csully@google.com>

Move the registration of the netdev up to initialize
it before doing any logging so that the correct device name
is displayed.

dev_info/err should then be used, not netif_info/err.

Reviewed-by: Yangchun Fu <yangchun@google.com>
Signed-off-by: Catherine Sullivan <csully@google.com>
Signed-off-by: David Awogbemila <awogbemila@google.com>
---
 drivers/net/ethernet/google/gve/gve_adminq.c | 15 +++++------
 drivers/net/ethernet/google/gve/gve_main.c   | 27 ++++++++++++--------
 2 files changed, 23 insertions(+), 19 deletions(-)

diff --git a/drivers/net/ethernet/google/gve/gve_adminq.c b/drivers/net/ethernet/google/gve/gve_adminq.c
index 9dbfcfb8cf63..6a93fe8e6a2b 100644
--- a/drivers/net/ethernet/google/gve/gve_adminq.c
+++ b/drivers/net/ethernet/google/gve/gve_adminq.c
@@ -334,8 +334,7 @@ int gve_adminq_describe_device(struct gve_priv *priv)
 
 	priv->tx_desc_cnt = be16_to_cpu(descriptor->tx_queue_entries);
 	if (priv->tx_desc_cnt * sizeof(priv->tx->desc[0]) < PAGE_SIZE) {
-		netif_err(priv, drv, priv->dev, "Tx desc count %d too low\n",
-			  priv->tx_desc_cnt);
+		dev_err(&priv->pdev->dev, "Tx desc count %d too low\n", priv->tx_desc_cnt);
 		err = -EINVAL;
 		goto free_device_descriptor;
 	}
@@ -344,8 +343,7 @@ int gve_adminq_describe_device(struct gve_priv *priv)
 	    < PAGE_SIZE ||
 	    priv->rx_desc_cnt * sizeof(priv->rx->data.data_ring[0])
 	    < PAGE_SIZE) {
-		netif_err(priv, drv, priv->dev, "Rx desc count %d too low\n",
-			  priv->rx_desc_cnt);
+		dev_err(&priv->pdev->dev, "Rx desc count %d too low\n", priv->rx_desc_cnt);
 		err = -EINVAL;
 		goto free_device_descriptor;
 	}
@@ -353,8 +351,7 @@ int gve_adminq_describe_device(struct gve_priv *priv)
 				be64_to_cpu(descriptor->max_registered_pages);
 	mtu = be16_to_cpu(descriptor->mtu);
 	if (mtu < ETH_MIN_MTU) {
-		netif_err(priv, drv, priv->dev, "MTU %d below minimum MTU\n",
-			  mtu);
+		dev_err(&priv->pdev->dev, "MTU %d below minimum MTU\n", mtu);
 		err = -EINVAL;
 		goto free_device_descriptor;
 	}
@@ -362,12 +359,12 @@ int gve_adminq_describe_device(struct gve_priv *priv)
 	priv->num_event_counters = be16_to_cpu(descriptor->counters);
 	ether_addr_copy(priv->dev->dev_addr, descriptor->mac);
 	mac = descriptor->mac;
-	netif_info(priv, drv, priv->dev, "MAC addr: %pM\n", mac);
+	dev_info(&priv->pdev->dev, "MAC addr: %pM\n", mac);
 	priv->tx_pages_per_qpl = be16_to_cpu(descriptor->tx_pages_per_qpl);
 	priv->rx_pages_per_qpl = be16_to_cpu(descriptor->rx_pages_per_qpl);
 	if (priv->rx_pages_per_qpl < priv->rx_desc_cnt) {
-		netif_err(priv, drv, priv->dev, "rx_pages_per_qpl cannot be smaller than rx_desc_cnt, setting rx_desc_cnt down to %d.\n",
-			  priv->rx_pages_per_qpl);
+		dev_err(&priv->pdev->dev, "rx_pages_per_qpl cannot be smaller than rx_desc_cnt, setting rx_desc_cnt down to %d.\n",
+			priv->rx_pages_per_qpl);
 		priv->rx_desc_cnt = priv->rx_pages_per_qpl;
 	}
 	priv->default_num_queues = be16_to_cpu(descriptor->default_num_queues);
diff --git a/drivers/net/ethernet/google/gve/gve_main.c b/drivers/net/ethernet/google/gve/gve_main.c
index 4f6c1fc9c58d..b0c1cfe44a73 100644
--- a/drivers/net/ethernet/google/gve/gve_main.c
+++ b/drivers/net/ethernet/google/gve/gve_main.c
@@ -930,7 +930,7 @@ static int gve_init_priv(struct gve_priv *priv, bool skip_describe_device)
 		priv->dev->max_mtu = PAGE_SIZE;
 		err = gve_adminq_set_mtu(priv, priv->dev->mtu);
 		if (err) {
-			netif_err(priv, drv, priv->dev, "Could not set mtu");
+			dev_err(&priv->pdev->dev, "Could not set mtu");
 			goto err;
 		}
 	}
@@ -970,10 +970,10 @@ static int gve_init_priv(struct gve_priv *priv, bool skip_describe_device)
 						priv->rx_cfg.num_queues);
 	}
 
-	netif_info(priv, drv, priv->dev, "TX queues %d, RX queues %d\n",
-		   priv->tx_cfg.num_queues, priv->rx_cfg.num_queues);
-	netif_info(priv, drv, priv->dev, "Max TX queues %d, Max RX queues %d\n",
-		   priv->tx_cfg.max_queues, priv->rx_cfg.max_queues);
+	dev_info(&priv->pdev->dev, "TX queues %d, RX queues %d\n",
+		 priv->tx_cfg.num_queues, priv->rx_cfg.num_queues);
+	dev_info(&priv->pdev->dev, "Max TX queues %d, Max RX queues %d\n",
+		 priv->tx_cfg.max_queues, priv->rx_cfg.max_queues);
 
 setup_device:
 	err = gve_setup_device_resources(priv);
@@ -1133,7 +1133,9 @@ static int gve_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 		goto abort_with_db_bar;
 	}
 	SET_NETDEV_DEV(dev, &pdev->dev);
+
 	pci_set_drvdata(pdev, dev);
+
 	dev->ethtool_ops = &gve_ethtool_ops;
 	dev->netdev_ops = &gve_netdev_ops;
 	/* advertise features */
@@ -1160,11 +1162,16 @@ static int gve_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 	priv->state_flags = 0x0;
 
 	gve_set_probe_in_progress(priv);
+
+	err = register_netdev(dev);
+	if (err)
+		goto abort_with_netdev;
+
 	priv->gve_wq = alloc_ordered_workqueue("gve", 0);
 	if (!priv->gve_wq) {
 		dev_err(&pdev->dev, "Could not allocate workqueue");
 		err = -ENOMEM;
-		goto abort_with_netdev;
+		goto abort_while_registered;
 	}
 	INIT_WORK(&priv->service_task, gve_service_task);
 	priv->tx_cfg.max_queues = max_tx_queues;
@@ -1174,18 +1181,18 @@ static int gve_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 	if (err)
 		goto abort_with_wq;
 
-	err = register_netdev(dev);
-	if (err)
-		goto abort_with_wq;
-
 	dev_info(&pdev->dev, "GVE version %s\n", gve_version_str);
 	gve_clear_probe_in_progress(priv);
 	queue_work(priv->gve_wq, &priv->service_task);
+
 	return 0;
 
 abort_with_wq:
 	destroy_workqueue(priv->gve_wq);
 
+abort_while_registered:
+	unregister_netdev(dev);
+
 abort_with_netdev:
 	free_netdev(dev);
 
-- 
2.28.0.220.ged08abb693-goog


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH net-next 04/18] gve: Add support for dma_mask register
  2020-08-18 19:43 [PATCH net-next 00/18] GVE Driver v1.1.0 Features David Awogbemila
                   ` (2 preceding siblings ...)
  2020-08-18 19:44 ` [PATCH net-next 03/18] gve: Register netdev earlier David Awogbemila
@ 2020-08-18 19:44 ` David Awogbemila
  2020-08-18 20:15   ` Andrew Lunn
  2020-08-18 19:44 ` [PATCH net-next 05/18] gve: Add Gvnic stats AQ command and ethtool show/set-priv-flags David Awogbemila
                   ` (14 subsequent siblings)
  18 siblings, 1 reply; 38+ messages in thread
From: David Awogbemila @ 2020-08-18 19:44 UTC (permalink / raw)
  To: netdev; +Cc: Catherine Sullivan, Yangchun Fu, David Awogbemila

From: Catherine Sullivan <csully@google.com>

Add the dma_mask register and read it to set the dma_masks.
gve_alloc_page will alloc_page with:
 GFP_DMA if priv->dma_mask is 24,
 GFP_DMA32 if priv->dma_mask is 32.

Reviewed-by: Yangchun Fu <yangchun@google.com>
Signed-off-by: Catherine Sullivan <csully@google.com>
Signed-off-by: David Awogbemila <awogbemila@google.com>
---
 drivers/net/ethernet/google/gve/gve.h         |  6 ++-
 drivers/net/ethernet/google/gve/gve_main.c    | 42 ++++++++++++-------
 .../net/ethernet/google/gve/gve_register.h    |  3 +-
 3 files changed, 34 insertions(+), 17 deletions(-)

diff --git a/drivers/net/ethernet/google/gve/gve.h b/drivers/net/ethernet/google/gve/gve.h
index 55b34b437764..37a3bbced36a 100644
--- a/drivers/net/ethernet/google/gve/gve.h
+++ b/drivers/net/ethernet/google/gve/gve.h
@@ -232,6 +232,9 @@ struct gve_priv {
 	struct work_struct service_task;
 	unsigned long service_task_flags;
 	unsigned long state_flags;
+
+  /* Gvnic device's dma mask, set during probe. */
+	u8 dma_mask;
 };
 
 enum gve_service_task_flags {
@@ -451,8 +454,7 @@ static inline bool gve_can_recycle_pages(struct net_device *dev)
 }
 
 /* buffers */
-int gve_alloc_page(struct gve_priv *priv, struct device *dev,
-		   struct page **page, dma_addr_t *dma,
+int gve_alloc_page(struct gve_priv *priv, struct device *dev, struct page **page, dma_addr_t *dma,
 		   enum dma_data_direction);
 void gve_free_page(struct device *dev, struct page *page, dma_addr_t dma,
 		   enum dma_data_direction);
diff --git a/drivers/net/ethernet/google/gve/gve_main.c b/drivers/net/ethernet/google/gve/gve_main.c
index b0c1cfe44a73..449345d24f29 100644
--- a/drivers/net/ethernet/google/gve/gve_main.c
+++ b/drivers/net/ethernet/google/gve/gve_main.c
@@ -518,7 +518,14 @@ int gve_alloc_page(struct gve_priv *priv, struct device *dev,
 		   struct page **page, dma_addr_t *dma,
 		   enum dma_data_direction dir)
 {
-	*page = alloc_page(GFP_KERNEL);
+	gfp_t gfp_flags = GFP_KERNEL;
+
+	if (priv->dma_mask == 24)
+		gfp_flags |= GFP_DMA;
+	else if (priv->dma_mask == 32)
+		gfp_flags |= GFP_DMA32;
+
+	*page = alloc_page(gfp_flags);
 	if (!*page) {
 		priv->page_alloc_fail++;
 		return -ENOMEM;
@@ -1083,6 +1090,7 @@ static int gve_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 	__be32 __iomem *db_bar;
 	struct gve_registers __iomem *reg_bar;
 	struct gve_priv *priv;
+	u8 dma_mask;
 	int err;
 
 	err = pci_enable_device(pdev);
@@ -1095,19 +1103,6 @@ static int gve_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 
 	pci_set_master(pdev);
 
-	err = pci_set_dma_mask(pdev, DMA_BIT_MASK(64));
-	if (err) {
-		dev_err(&pdev->dev, "Failed to set dma mask: err=%d\n", err);
-		goto abort_with_pci_region;
-	}
-
-	err = pci_set_consistent_dma_mask(pdev, DMA_BIT_MASK(64));
-	if (err) {
-		dev_err(&pdev->dev,
-			"Failed to set consistent dma mask: err=%d\n", err);
-		goto abort_with_pci_region;
-	}
-
 	reg_bar = pci_iomap(pdev, GVE_REGISTER_BAR, 0);
 	if (!reg_bar) {
 		dev_err(&pdev->dev, "Failed to map pci bar!\n");
@@ -1122,10 +1117,28 @@ static int gve_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 		goto abort_with_reg_bar;
 	}
 
+	dma_mask = readb(&reg_bar->dma_mask);
+	// Default to 64 if the register isn't set
+	if (!dma_mask)
+		dma_mask = 64;
 	gve_write_version(&reg_bar->driver_version);
 	/* Get max queues to alloc etherdev */
 	max_rx_queues = ioread32be(&reg_bar->max_tx_queues);
 	max_tx_queues = ioread32be(&reg_bar->max_rx_queues);
+
+	err = pci_set_dma_mask(pdev, DMA_BIT_MASK(64));
+	if (err) {
+		dev_err(&pdev->dev, "Failed to set dma mask: err=%d\n", err);
+		goto abort_with_reg_bar;
+	}
+
+	err = pci_set_consistent_dma_mask(pdev, DMA_BIT_MASK(64));
+	if (err) {
+		dev_err(&pdev->dev,
+			"Failed to set consistent dma mask: err=%d\n", err);
+		goto abort_with_reg_bar;
+	}
+
 	/* Alloc and setup the netdev and priv */
 	dev = alloc_etherdev_mqs(sizeof(*priv), max_tx_queues, max_rx_queues);
 	if (!dev) {
@@ -1160,6 +1173,7 @@ static int gve_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 	priv->db_bar2 = db_bar;
 	priv->service_task_flags = 0x0;
 	priv->state_flags = 0x0;
+	priv->dma_mask = dma_mask;
 
 	gve_set_probe_in_progress(priv);
 
diff --git a/drivers/net/ethernet/google/gve/gve_register.h b/drivers/net/ethernet/google/gve/gve_register.h
index 84ab8893aadd..fad8813d1bb1 100644
--- a/drivers/net/ethernet/google/gve/gve_register.h
+++ b/drivers/net/ethernet/google/gve/gve_register.h
@@ -16,7 +16,8 @@ struct gve_registers {
 	__be32	adminq_pfn;
 	__be32	adminq_doorbell;
 	__be32	adminq_event_counter;
-	u8	reserved[3];
+	u8	reserved[2];
+	u8	dma_mask;
 	u8	driver_version;
 };
 
-- 
2.28.0.220.ged08abb693-goog


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH net-next 05/18] gve: Add Gvnic stats AQ command and ethtool show/set-priv-flags.
  2020-08-18 19:43 [PATCH net-next 00/18] GVE Driver v1.1.0 Features David Awogbemila
                   ` (3 preceding siblings ...)
  2020-08-18 19:44 ` [PATCH net-next 04/18] gve: Add support for dma_mask register David Awogbemila
@ 2020-08-18 19:44 ` David Awogbemila
  2020-08-19  3:13   ` Jakub Kicinski
  2020-08-18 19:44 ` [PATCH net-next 06/18] gve: Batch AQ commands for creating and destroying queues David Awogbemila
                   ` (13 subsequent siblings)
  18 siblings, 1 reply; 38+ messages in thread
From: David Awogbemila @ 2020-08-18 19:44 UTC (permalink / raw)
  To: netdev; +Cc: Kuo Zhao, Yangchun Fu, David Awogbemila

From: Kuo Zhao <kuozhao@google.com>

Changes:
- Add a new flag in service_task_flags. Check both this flag and
ethtool flag when handle report stats. Update the stats when user turns
ethtool flag on.

- In order to expose the NIC stats to the guest even when the ethtool flag
is off, share the address and length of report at setup. When the
ethtool flag turned off, zero off the gve stats instead of detaching the
report. Only detach the report in free_stats_report.

- Adds the NIC stats to ethtool stats. These stats are always
exposed to guest no matter the report stats flag is turned
on or off.

- Update gve stats once every 20 seconds.

- Add a field for the interval of updating stats report to the AQ
command. It will be exposed to USPS so that USPS can use the same
interval to update its stats in the report.

Reviewed-by: Yangchun Fu <yangchun@google.com>
Signed-off-by: Kuo Zhao <kuozhao@google.com>
Signed-off-by: David Awogbemila <awogbemila@google.com>
---
 drivers/net/ethernet/google/gve/gve.h         |  77 ++++++-
 drivers/net/ethernet/google/gve/gve_adminq.c  |  21 ++
 drivers/net/ethernet/google/gve/gve_adminq.h  |  43 ++++
 drivers/net/ethernet/google/gve/gve_ethtool.c | 204 ++++++++++++++++--
 drivers/net/ethernet/google/gve/gve_main.c    | 147 ++++++++++++-
 .../net/ethernet/google/gve/gve_register.h    |   1 +
 6 files changed, 458 insertions(+), 35 deletions(-)

diff --git a/drivers/net/ethernet/google/gve/gve.h b/drivers/net/ethernet/google/gve/gve.h
index 37a3bbced36a..bc54059f9b2e 100644
--- a/drivers/net/ethernet/google/gve/gve.h
+++ b/drivers/net/ethernet/google/gve/gve.h
@@ -27,6 +27,17 @@
 /* 1 for management, 1 for rx, 1 for tx */
 #define GVE_MIN_MSIX 3
 
+/* Numbers of gve tx/rx stats in stats report. */
+#define GVE_TX_STATS_REPORT_NUM	5
+#define GVE_RX_STATS_REPORT_NUM	2
+
+/* Numbers of NIC tx/rx stats in stats report. */
+#define NIC_TX_STATS_REPORT_NUM	0
+#define NIC_RX_STATS_REPORT_NUM	4
+
+/* Interval to schedule a service task, 20000ms. */
+#define GVE_SERVICE_TIMER_PERIOD	20000
+
 /* Each slot in the desc ring has a 1:1 mapping to a slot in the data ring */
 struct gve_rx_desc_queue {
 	struct gve_rx_desc *desc_ring; /* the descriptor ring */
@@ -220,6 +231,7 @@ struct gve_priv {
 	u32 adminq_destroy_rx_queue_cnt;
 	u32 adminq_dcfg_device_resources_cnt;
 	u32 adminq_set_driver_parameter_cnt;
+	u32 adminq_report_stats_cnt;
 
 	/* Global stats */
 	u32 interface_up_cnt; /* count of times interface turned up since last reset */
@@ -227,27 +239,39 @@ struct gve_priv {
 	u32 reset_cnt; /* count of reset */
 	u32 page_alloc_fail; /* count of page alloc fails */
 	u32 dma_mapping_error; /* count of dma mapping errors */
-
 	struct workqueue_struct *gve_wq;
 	struct work_struct service_task;
 	unsigned long service_task_flags;
 	unsigned long state_flags;
 
+	struct gve_stats_report *stats_report;
+	u64 stats_report_len;
+	dma_addr_t stats_report_bus; /* dma address for the stats report */
+	unsigned long ethtool_flags;
+
+	unsigned long service_timer_period;
+	struct timer_list service_timer;
+
   /* Gvnic device's dma mask, set during probe. */
 	u8 dma_mask;
 };
 
-enum gve_service_task_flags {
-	GVE_PRIV_FLAGS_DO_RESET			= BIT(1),
-	GVE_PRIV_FLAGS_RESET_IN_PROGRESS	= BIT(2),
-	GVE_PRIV_FLAGS_PROBE_IN_PROGRESS	= BIT(3),
+enum gve_service_task_flags_bit {
+	GVE_PRIV_FLAGS_DO_RESET			= 1,
+	GVE_PRIV_FLAGS_RESET_IN_PROGRESS	= 2,
+	GVE_PRIV_FLAGS_PROBE_IN_PROGRESS	= 3,
+	GVE_PRIV_FLAGS_DO_REPORT_STATS = 4,
+};
+
+enum gve_state_flags_bit {
+	GVE_PRIV_FLAGS_ADMIN_QUEUE_OK		= 1,
+	GVE_PRIV_FLAGS_DEVICE_RESOURCES_OK	= 2,
+	GVE_PRIV_FLAGS_DEVICE_RINGS_OK		= 3,
+	GVE_PRIV_FLAGS_NAPI_ENABLED		= 4,
 };
 
-enum gve_state_flags {
-	GVE_PRIV_FLAGS_ADMIN_QUEUE_OK		= BIT(1),
-	GVE_PRIV_FLAGS_DEVICE_RESOURCES_OK	= BIT(2),
-	GVE_PRIV_FLAGS_DEVICE_RINGS_OK		= BIT(3),
-	GVE_PRIV_FLAGS_NAPI_ENABLED		= BIT(4),
+enum gve_ethtool_flags_bit {
+	GVE_PRIV_FLAGS_REPORT_STATS		= 0,
 };
 
 static inline bool gve_get_do_reset(struct gve_priv *priv)
@@ -297,6 +321,22 @@ static inline void gve_clear_probe_in_progress(struct gve_priv *priv)
 	clear_bit(GVE_PRIV_FLAGS_PROBE_IN_PROGRESS, &priv->service_task_flags);
 }
 
+static inline bool gve_get_do_report_stats(struct gve_priv *priv)
+{
+	return test_bit(GVE_PRIV_FLAGS_DO_REPORT_STATS,
+			&priv->service_task_flags);
+}
+
+static inline void gve_set_do_report_stats(struct gve_priv *priv)
+{
+	set_bit(GVE_PRIV_FLAGS_DO_REPORT_STATS, &priv->service_task_flags);
+}
+
+static inline void gve_clear_do_report_stats(struct gve_priv *priv)
+{
+	clear_bit(GVE_PRIV_FLAGS_DO_REPORT_STATS, &priv->service_task_flags);
+}
+
 static inline bool gve_get_admin_queue_ok(struct gve_priv *priv)
 {
 	return test_bit(GVE_PRIV_FLAGS_ADMIN_QUEUE_OK, &priv->state_flags);
@@ -357,6 +397,21 @@ static inline void gve_clear_napi_enabled(struct gve_priv *priv)
 	clear_bit(GVE_PRIV_FLAGS_NAPI_ENABLED, &priv->state_flags);
 }
 
+static inline bool gve_get_report_stats(struct gve_priv *priv)
+{
+	return test_bit(GVE_PRIV_FLAGS_REPORT_STATS, &priv->ethtool_flags);
+}
+
+static inline void gve_set_report_stats(struct gve_priv *priv)
+{
+	set_bit(GVE_PRIV_FLAGS_REPORT_STATS, &priv->ethtool_flags);
+}
+
+static inline void gve_clear_report_stats(struct gve_priv *priv)
+{
+	clear_bit(GVE_PRIV_FLAGS_REPORT_STATS, &priv->ethtool_flags);
+}
+
 /* Returns the address of the ntfy_blocks irq doorbell
  */
 static inline __be32 __iomem *gve_irq_doorbell(struct gve_priv *priv,
@@ -478,6 +533,8 @@ int gve_reset(struct gve_priv *priv, bool attempt_teardown);
 int gve_adjust_queues(struct gve_priv *priv,
 		      struct gve_queue_config new_rx_config,
 		      struct gve_queue_config new_tx_config);
+/* report stats handling */
+void gve_handle_report_stats(struct gve_priv *priv);
 /* exported by ethtool.c */
 extern const struct ethtool_ops gve_ethtool_ops;
 /* needed by ethtool */
diff --git a/drivers/net/ethernet/google/gve/gve_adminq.c b/drivers/net/ethernet/google/gve/gve_adminq.c
index 6a93fe8e6a2b..38e0c3066035 100644
--- a/drivers/net/ethernet/google/gve/gve_adminq.c
+++ b/drivers/net/ethernet/google/gve/gve_adminq.c
@@ -35,6 +35,7 @@ int gve_adminq_alloc(struct device *dev, struct gve_priv *priv)
 	priv->adminq_destroy_rx_queue_cnt = 0;
 	priv->adminq_dcfg_device_resources_cnt = 0;
 	priv->adminq_set_driver_parameter_cnt = 0;
+	priv->adminq_report_stats_cnt = 0;
 
 	/* Setup Admin queue with the device */
 	iowrite32be(priv->adminq_bus_addr / PAGE_SIZE,
@@ -183,6 +184,9 @@ int gve_adminq_execute_cmd(struct gve_priv *priv,
 	case GVE_ADMINQ_SET_DRIVER_PARAMETER:
 		priv->adminq_set_driver_parameter_cnt++;
 		break;
+	case GVE_ADMINQ_REPORT_STATS:
+		priv->adminq_report_stats_cnt++;
+		break;
 	default:
 		dev_err(&priv->pdev->dev, "unknown AQ command opcode %d\n", opcode);
 	}
@@ -433,3 +437,20 @@ int gve_adminq_set_mtu(struct gve_priv *priv, u64 mtu)
 
 	return gve_adminq_execute_cmd(priv, &cmd);
 }
+
+int gve_adminq_report_stats(struct gve_priv *priv, u64 stats_report_len,
+			    dma_addr_t stats_report_addr, u64 interval)
+{
+	union gve_adminq_command cmd;
+
+	memset(&cmd, 0, sizeof(cmd));
+	cmd.opcode = cpu_to_be32(GVE_ADMINQ_REPORT_STATS);
+	cmd.report_stats = (struct gve_adminq_report_stats) {
+		.stats_report_len = cpu_to_be64(stats_report_len),
+		.stats_report_addr = cpu_to_be64(stats_report_addr),
+		.interval = cpu_to_be64(interval),
+	};
+
+	return gve_adminq_execute_cmd(priv, &cmd);
+}
+
diff --git a/drivers/net/ethernet/google/gve/gve_adminq.h b/drivers/net/ethernet/google/gve/gve_adminq.h
index 4dfa06edc0f8..a6c8c29f0d13 100644
--- a/drivers/net/ethernet/google/gve/gve_adminq.h
+++ b/drivers/net/ethernet/google/gve/gve_adminq.h
@@ -21,6 +21,7 @@ enum gve_adminq_opcodes {
 	GVE_ADMINQ_DESTROY_RX_QUEUE		= 0x8,
 	GVE_ADMINQ_DECONFIGURE_DEVICE_RESOURCES	= 0x9,
 	GVE_ADMINQ_SET_DRIVER_PARAMETER		= 0xB,
+	GVE_ADMINQ_REPORT_STATS			= 0xC,
 };
 
 /* Admin queue status codes */
@@ -172,6 +173,45 @@ struct gve_adminq_set_driver_parameter {
 
 static_assert(sizeof(struct gve_adminq_set_driver_parameter) == 16);
 
+struct gve_adminq_report_stats {
+	__be64 stats_report_len;
+	__be64 stats_report_addr;
+	__be64 interval;
+};
+
+static_assert(sizeof(struct gve_adminq_report_stats) == 24);
+
+struct stats {
+	__be32 stat_name;
+	__be32 queue_id;
+	__be64 value;
+};
+
+static_assert(sizeof(struct stats) == 16);
+
+struct gve_stats_report {
+	__be64 written_count;
+	struct stats stats[0];
+};
+
+static_assert(sizeof(struct gve_stats_report) == 8);
+
+enum gve_stat_names {
+	// stats from gve
+	TX_WAKE_CNT			= 1,
+	TX_STOP_CNT			= 2,
+	TX_FRAMES_SENT			= 3,
+	TX_BYTES_SENT			= 4,
+	TX_LAST_COMPLETION_PROCESSED	= 5,
+	RX_NEXT_EXPECTED_SEQUENCE	= 6,
+	RX_BUFFERS_POSTED		= 7,
+	// stats from NIC
+	RX_QUEUE_DROP_CNT		= 65,
+	RX_NO_BUFFERS_POSTED		= 66,
+	RX_DROPS_PACKET_OVER_MRU	= 67,
+	RX_DROPS_INVALID_CHECKSUM	= 68,
+};
+
 union gve_adminq_command {
 	struct {
 		__be32 opcode;
@@ -187,6 +227,7 @@ union gve_adminq_command {
 			struct gve_adminq_register_page_list reg_page_list;
 			struct gve_adminq_unregister_page_list unreg_page_list;
 			struct gve_adminq_set_driver_parameter set_driver_param;
+			struct gve_adminq_report_stats report_stats;
 		};
 	};
 	u8 reserved[64];
@@ -214,4 +255,6 @@ int gve_adminq_register_page_list(struct gve_priv *priv,
 				  struct gve_queue_page_list *qpl);
 int gve_adminq_unregister_page_list(struct gve_priv *priv, u32 page_list_id);
 int gve_adminq_set_mtu(struct gve_priv *priv, u64 mtu);
+int gve_adminq_report_stats(struct gve_priv *priv, u64 stats_report_len,
+			    dma_addr_t stats_report_addr, u64 interval);
 #endif /* _GVE_ADMINQ_H */
diff --git a/drivers/net/ethernet/google/gve/gve_ethtool.c b/drivers/net/ethernet/google/gve/gve_ethtool.c
index ebbf2c4c444e..9d778e04b9f4 100644
--- a/drivers/net/ethernet/google/gve/gve_ethtool.c
+++ b/drivers/net/ethernet/google/gve/gve_ethtool.c
@@ -6,6 +6,7 @@
 
 #include <linux/rtnetlink.h>
 #include "gve.h"
+#include "gve_adminq.h"
 
 static void gve_get_drvinfo(struct net_device *netdev,
 			    struct ethtool_drvinfo *info)
@@ -42,6 +43,8 @@ static const char gve_gstrings_main_stats[][ETH_GSTRING_LEN] = {
 static const char gve_gstrings_rx_stats[][ETH_GSTRING_LEN] = {
 	"rx_posted_desc[%u]", "rx_completed_desc[%u]", "rx_bytes[%u]",
 	"rx_dropped_pkt[%u]", "rx_copybreak_pkt[%u]", "rx_copied_pkt[%u]",
+	"rx_queue_drop_cnt[%u]", "rx_no_buffers_posted[%u]",
+	"rx_drops_packet_over_mru[%u]", "rx_drops_invalid_checksum[%u]",
 };
 
 static const char gve_gstrings_tx_stats[][ETH_GSTRING_LEN] = {
@@ -56,12 +59,18 @@ static const char gve_gstrings_adminq_stats[][ETH_GSTRING_LEN] = {
 	"adminq_create_tx_queue_cnt", "adminq_create_rx_queue_cnt",
 	"adminq_destroy_tx_queue_cnt", "adminq_destroy_rx_queue_cnt",
 	"adminq_dcfg_device_resources_cnt", "adminq_set_driver_parameter_cnt",
+	"adminq_report_stats_cnt",
+};
+
+static const char gve_gstrings_priv_flags[][ETH_GSTRING_LEN] = {
+	"report-stats",
 };
 
 #define GVE_MAIN_STATS_LEN  ARRAY_SIZE(gve_gstrings_main_stats)
 #define GVE_ADMINQ_STATS_LEN  ARRAY_SIZE(gve_gstrings_adminq_stats)
 #define NUM_GVE_TX_CNTS	ARRAY_SIZE(gve_gstrings_tx_stats)
 #define NUM_GVE_RX_CNTS	ARRAY_SIZE(gve_gstrings_rx_stats)
+#define GVE_PRIV_FLAGS_STR_LEN ARRAY_SIZE(gve_gstrings_priv_flags)
 
 static void gve_get_strings(struct net_device *netdev, u32 stringset, u8 *data)
 {
@@ -69,30 +78,42 @@ static void gve_get_strings(struct net_device *netdev, u32 stringset, u8 *data)
 	char *s = (char *)data;
 	int i, j;
 
-	if (stringset != ETH_SS_STATS)
-		return;
-
-	memcpy(s, *gve_gstrings_main_stats,
-	       sizeof(gve_gstrings_main_stats));
-	s += sizeof(gve_gstrings_main_stats);
-
-	for (i = 0; i < priv->rx_cfg.num_queues; i++) {
-		for (j = 0; j < NUM_GVE_RX_CNTS; j++) {
-			snprintf(s, ETH_GSTRING_LEN, gve_gstrings_rx_stats[j], i);
-			s += ETH_GSTRING_LEN;
+	switch (stringset) {
+	case ETH_SS_STATS:
+		memcpy(s, *gve_gstrings_main_stats,
+		       sizeof(gve_gstrings_main_stats));
+		s += sizeof(gve_gstrings_main_stats);
+
+		for (i = 0; i < priv->rx_cfg.num_queues; i++) {
+			for (j = 0; j < NUM_GVE_RX_CNTS; j++) {
+				snprintf(s, ETH_GSTRING_LEN,
+					 gve_gstrings_rx_stats[j], i);
+				s += ETH_GSTRING_LEN;
+			}
 		}
-	}
 
-	for (i = 0; i < priv->tx_cfg.num_queues; i++) {
-		for (j = 0; j < NUM_GVE_TX_CNTS; j++) {
-			snprintf(s, ETH_GSTRING_LEN, gve_gstrings_tx_stats[j], i);
-			s += ETH_GSTRING_LEN;
+		for (i = 0; i < priv->tx_cfg.num_queues; i++) {
+			for (j = 0; j < NUM_GVE_TX_CNTS; j++) {
+				snprintf(s, ETH_GSTRING_LEN,
+					 gve_gstrings_tx_stats[j], i);
+				s += ETH_GSTRING_LEN;
+			}
 		}
-	}
 
-	memcpy(s, *gve_gstrings_adminq_stats,
-	       sizeof(gve_gstrings_adminq_stats));
-	s += sizeof(gve_gstrings_adminq_stats);
+		memcpy(s, *gve_gstrings_adminq_stats,
+		       sizeof(gve_gstrings_adminq_stats));
+		s += sizeof(gve_gstrings_adminq_stats);
+		break;
+
+	case ETH_SS_PRIV_FLAGS:
+		memcpy(s, *gve_gstrings_priv_flags,
+		       sizeof(gve_gstrings_priv_flags));
+		s += sizeof(gve_gstrings_priv_flags);
+		break;
+
+	default:
+		break;
+	}
 }
 
 static int gve_get_sset_count(struct net_device *netdev, int sset)
@@ -104,6 +125,8 @@ static int gve_get_sset_count(struct net_device *netdev, int sset)
 		return GVE_MAIN_STATS_LEN + GVE_ADMINQ_STATS_LEN +
 		       (priv->rx_cfg.num_queues * NUM_GVE_RX_CNTS) +
 		       (priv->tx_cfg.num_queues * NUM_GVE_TX_CNTS);
+	case ETH_SS_PRIV_FLAGS:
+		return GVE_PRIV_FLAGS_STR_LEN;
 	default:
 		return -EOPNOTSUPP;
 	}
@@ -113,18 +136,36 @@ static void
 gve_get_ethtool_stats(struct net_device *netdev,
 		      struct ethtool_stats *stats, u64 *data)
 {
+	u64 rx_pkts, rx_bytes, rx_skb_alloc_fail, rx_buf_alloc_fail,
+		rx_desc_err_dropped_pkt, tx_pkts,
+		tx_bytes;
 	u64 tmp_rx_pkts, tmp_rx_bytes, tmp_rx_skb_alloc_fail,
 		tmp_rx_buf_alloc_fail, tmp_rx_desc_err_dropped_pkt,
 		tmp_tx_pkts, tmp_tx_bytes;
-	u64 rx_pkts, rx_bytes, rx_skb_alloc_fail, rx_buf_alloc_fail,
-		rx_desc_err_dropped_pkt, tx_pkts, tx_bytes;
+	int stats_idx, base_stats_idx, max_stats_idx;
 	struct gve_priv *priv = netdev_priv(netdev);
+	struct stats *report_stats;
+	int *rx_qid_to_stats_idx;
+	int *tx_qid_to_stats_idx;
+	bool skip_nic_stats;
 	unsigned int start;
 	int ring;
-	int i;
+	int i, j;
 
 	ASSERT_RTNL();
 
+	report_stats = priv->stats_report->stats;
+	rx_qid_to_stats_idx = kmalloc_array(priv->rx_cfg.num_queues,
+					    sizeof(int), GFP_KERNEL);
+	if (!rx_qid_to_stats_idx)
+		return;
+	tx_qid_to_stats_idx = kmalloc_array(priv->tx_cfg.num_queues,
+					    sizeof(int), GFP_KERNEL);
+	if (!tx_qid_to_stats_idx) {
+		kfree(rx_qid_to_stats_idx);
+		return;
+	}
+
 	for (rx_pkts = 0, rx_bytes = 0, rx_skb_alloc_fail = 0,
 	     rx_buf_alloc_fail = 0, rx_desc_err_dropped_pkt = 0, ring = 0;
 	     ring < priv->rx_cfg.num_queues; ring++) {
@@ -185,6 +226,25 @@ gve_get_ethtool_stats(struct net_device *netdev,
 	data[i++] = priv->dma_mapping_error;
 	i = GVE_MAIN_STATS_LEN;
 
+	/* For rx cross-reporting stats, start from nic rx stats in report */
+	base_stats_idx = GVE_TX_STATS_REPORT_NUM * priv->tx_cfg.num_queues +
+		GVE_RX_STATS_REPORT_NUM * priv->rx_cfg.num_queues;
+	max_stats_idx = NIC_RX_STATS_REPORT_NUM * priv->rx_cfg.num_queues +
+		base_stats_idx;
+	/* Preprocess the stats report for rx, map queue id to start index */
+	skip_nic_stats = false;
+	for (stats_idx = base_stats_idx; stats_idx < max_stats_idx;
+		stats_idx += NIC_RX_STATS_REPORT_NUM) {
+		u32 stat_name = be32_to_cpu(report_stats[stats_idx].stat_name);
+		u32 queue_id = be32_to_cpu(report_stats[stats_idx].queue_id);
+
+		if (stat_name == 0) {
+			/* no stats written by NIC yet */
+			skip_nic_stats = true;
+			break;
+		}
+		rx_qid_to_stats_idx[queue_id] = stats_idx;
+	}
 	/* walk RX rings */
 	if (priv->rx) {
 		for (ring = 0; ring < priv->rx_cfg.num_queues; ring++) {
@@ -209,10 +269,41 @@ gve_get_ethtool_stats(struct net_device *netdev,
 				tmp_rx_desc_err_dropped_pkt;
 			data[i++] = rx->rx_copybreak_pkt;
 			data[i++] = rx->rx_copied_pkt;
+			/* stats from NIC */
+			if (skip_nic_stats) {
+				/* skip NIC rx stats */
+				i += NIC_RX_STATS_REPORT_NUM;
+				continue;
+			}
+			for (j = 0; j < NIC_RX_STATS_REPORT_NUM; j++) {
+				u64 value =
+				be64_to_cpu(report_stats[rx_qid_to_stats_idx[ring] + j].value);
+
+				data[i++] = value;
+			}
 		}
 	} else {
 		i += priv->rx_cfg.num_queues * NUM_GVE_RX_CNTS;
 	}
+
+	/* For tx cross-reporting stats, start from nic tx stats in report */
+	base_stats_idx = max_stats_idx;
+	max_stats_idx = NIC_TX_STATS_REPORT_NUM * priv->tx_cfg.num_queues +
+		max_stats_idx;
+	/* Preprocess the stats report for tx, map queue id to start index */
+	skip_nic_stats = false;
+	for (stats_idx = base_stats_idx; stats_idx < max_stats_idx;
+		stats_idx += NIC_TX_STATS_REPORT_NUM) {
+		u32 stat_name = be32_to_cpu(report_stats[stats_idx].stat_name);
+		u32 queue_id = be32_to_cpu(report_stats[stats_idx].queue_id);
+
+		if (stat_name == 0) {
+			/* no stats written by NIC yet */
+			skip_nic_stats = true;
+			break;
+		}
+		tx_qid_to_stats_idx[queue_id] = stats_idx;
+	}
 	/* walk TX rings */
 	if (priv->tx) {
 		for (ring = 0; ring < priv->tx_cfg.num_queues; ring++) {
@@ -231,10 +322,26 @@ gve_get_ethtool_stats(struct net_device *netdev,
 			data[i++] = tx->stop_queue;
 			data[i++] = be32_to_cpu(gve_tx_load_event_counter(priv,
 									  tx));
+			/* stats from NIC */
+			if (skip_nic_stats) {
+				/* skip NIC tx stats */
+				i += NIC_TX_STATS_REPORT_NUM;
+				continue;
+			}
+			for (j = 0; j < NIC_TX_STATS_REPORT_NUM; j++) {
+				u64 value =
+				be64_to_cpu(report_stats[tx_qid_to_stats_idx[ring] + j].value);
+
+				data[i++] = value;
+			}
 		}
 	} else {
 		i += priv->tx_cfg.num_queues * NUM_GVE_TX_CNTS;
 	}
+
+	kfree(rx_qid_to_stats_idx);
+	kfree(tx_qid_to_stats_idx);
+
 	/* AQ Stats */
 	data[i++] = priv->adminq_prod_cnt;
 	data[i++] = priv->adminq_cmd_fail;
@@ -249,6 +356,7 @@ gve_get_ethtool_stats(struct net_device *netdev,
 	data[i++] = priv->adminq_destroy_rx_queue_cnt;
 	data[i++] = priv->adminq_dcfg_device_resources_cnt;
 	data[i++] = priv->adminq_set_driver_parameter_cnt;
+	data[i++] = priv->adminq_report_stats_cnt;
 }
 
 static void gve_get_channels(struct net_device *netdev,
@@ -353,6 +461,54 @@ static int gve_set_tunable(struct net_device *netdev,
 	}
 }
 
+static u32 gve_get_priv_flags(struct net_device *netdev)
+{
+	struct gve_priv *priv = netdev_priv(netdev);
+	u32 i, ret_flags = 0;
+
+	for (i = 0; i < GVE_PRIV_FLAGS_STR_LEN; i++) {
+		if (priv->ethtool_flags & BIT(i))
+			ret_flags |= BIT(i);
+	}
+	return ret_flags;
+}
+
+static int gve_set_priv_flags(struct net_device *netdev, u32 flags)
+{
+	struct gve_priv *priv = netdev_priv(netdev);
+	u64 ori_flags, new_flags;
+	u32 i;
+
+	ori_flags = READ_ONCE(priv->ethtool_flags);
+	new_flags = ori_flags;
+
+	for (i = 0; i < GVE_PRIV_FLAGS_STR_LEN; i++) {
+		if (flags & BIT(i))
+			new_flags |= BIT(i);
+		else
+			new_flags &= ~(BIT(i));
+		priv->ethtool_flags = new_flags;
+		/* set report-stats */
+		if (strcmp(gve_gstrings_priv_flags[i], "report-stats") == 0) {
+			/* update the stats when user turns report-stats on */
+			if (flags & BIT(i))
+				gve_handle_report_stats(priv);
+			/* zero off gve stats when report-stats turned off */
+			if (!(flags & BIT(i)) && (ori_flags & BIT(i))) {
+				int tx_stats_num = GVE_TX_STATS_REPORT_NUM *
+					priv->tx_cfg.num_queues;
+				int rx_stats_num = GVE_RX_STATS_REPORT_NUM *
+					priv->rx_cfg.num_queues;
+				memset(priv->stats_report->stats, 0,
+				       (tx_stats_num + rx_stats_num) *
+				       sizeof(struct stats));
+			}
+		}
+	}
+
+	return 0;
+}
+
 const struct ethtool_ops gve_ethtool_ops = {
 	.get_drvinfo = gve_get_drvinfo,
 	.get_strings = gve_get_strings,
@@ -367,4 +523,6 @@ const struct ethtool_ops gve_ethtool_ops = {
 	.reset = gve_user_reset,
 	.get_tunable = gve_get_tunable,
 	.set_tunable = gve_set_tunable,
+	.get_priv_flags = gve_get_priv_flags,
+	.set_priv_flags = gve_set_priv_flags,
 };
diff --git a/drivers/net/ethernet/google/gve/gve_main.c b/drivers/net/ethernet/google/gve/gve_main.c
index 449345d24f29..099d61f00bf0 100644
--- a/drivers/net/ethernet/google/gve/gve_main.c
+++ b/drivers/net/ethernet/google/gve/gve_main.c
@@ -78,6 +78,59 @@ static void gve_free_counter_array(struct gve_priv *priv)
 	priv->counter_array = NULL;
 }
 
+static void gve_service_task_schedule(struct gve_priv *priv)
+{
+	if (!gve_get_probe_in_progress(priv) &&
+	    !gve_get_reset_in_progress(priv)) {
+		gve_set_do_report_stats(priv);
+		queue_work(priv->gve_wq, &priv->service_task);
+	}
+}
+
+static void gve_service_timer(struct timer_list *t)
+{
+	struct gve_priv *priv = from_timer(priv, t, service_timer);
+
+	mod_timer(&priv->service_timer,
+		  round_jiffies(jiffies +
+		  msecs_to_jiffies(priv->service_timer_period)));
+	gve_service_task_schedule(priv);
+}
+
+static int gve_alloc_stats_report(struct gve_priv *priv)
+{
+	int tx_stats_num, rx_stats_num;
+
+	tx_stats_num = (GVE_TX_STATS_REPORT_NUM + NIC_TX_STATS_REPORT_NUM) *
+		       priv->tx_cfg.num_queues;
+	rx_stats_num = (GVE_RX_STATS_REPORT_NUM + NIC_RX_STATS_REPORT_NUM) *
+		       priv->rx_cfg.num_queues;
+	priv->stats_report_len = sizeof(struct gve_stats_report) +
+				 (tx_stats_num + rx_stats_num) *
+				 sizeof(struct stats);
+	priv->stats_report =
+		dma_alloc_coherent(&priv->pdev->dev, priv->stats_report_len,
+				   &priv->stats_report_bus, GFP_KERNEL);
+	if (!priv->stats_report)
+		return -ENOMEM;
+	/* Set up timer for periodic task */
+	timer_setup(&priv->service_timer, gve_service_timer, 0);
+	priv->service_timer_period = GVE_SERVICE_TIMER_PERIOD;
+	/* Start the service task timer */
+	mod_timer(&priv->service_timer,
+		  round_jiffies(jiffies +
+		  msecs_to_jiffies(priv->service_timer_period)));
+	return 0;
+}
+
+static void gve_free_stats_report(struct gve_priv *priv)
+{
+	del_timer_sync(&priv->service_timer);
+	dma_free_coherent(&priv->pdev->dev, priv->stats_report_len,
+			  priv->stats_report, priv->stats_report_bus);
+	priv->stats_report = NULL;
+}
+
 static irqreturn_t gve_mgmnt_intr(int irq, void *arg)
 {
 	struct gve_priv *priv = arg;
@@ -270,6 +323,9 @@ static int gve_setup_device_resources(struct gve_priv *priv)
 	err = gve_alloc_notify_blocks(priv);
 	if (err)
 		goto abort_with_counter;
+	err = gve_alloc_stats_report(priv);
+	if (err)
+		goto abort_with_ntfy_blocks;
 	err = gve_adminq_configure_device_resources(priv,
 						    priv->counter_array_bus,
 						    priv->num_event_counters,
@@ -279,10 +335,18 @@ static int gve_setup_device_resources(struct gve_priv *priv)
 		dev_err(&priv->pdev->dev,
 			"could not setup device_resources: err=%d\n", err);
 		err = -ENXIO;
-		goto abort_with_ntfy_blocks;
+		goto abort_with_stats_report;
 	}
+	err = gve_adminq_report_stats(priv, priv->stats_report_len,
+				      priv->stats_report_bus,
+				      GVE_SERVICE_TIMER_PERIOD);
+	if (err)
+		dev_err(&priv->pdev->dev,
+			"Failed to report stats: err=%d\n", err);
 	gve_set_device_resources_ok(priv);
 	return 0;
+abort_with_stats_report:
+	gve_free_stats_report(priv);
 abort_with_ntfy_blocks:
 	gve_free_notify_blocks(priv);
 abort_with_counter:
@@ -298,6 +362,13 @@ static void gve_teardown_device_resources(struct gve_priv *priv)
 
 	/* Tell device its resources are being freed */
 	if (gve_get_device_resources_ok(priv)) {
+		/* detach the stats report */
+		err = gve_adminq_report_stats(priv, 0, 0x0, GVE_SERVICE_TIMER_PERIOD);
+		if (err) {
+			dev_err(&priv->pdev->dev,
+				"Failed to detach stats report: err=%d\n", err);
+			gve_trigger_reset(priv);
+		}
 		err = gve_adminq_deconfigure_device_resources(priv);
 		if (err) {
 			dev_err(&priv->pdev->dev,
@@ -308,6 +379,7 @@ static void gve_teardown_device_resources(struct gve_priv *priv)
 	}
 	gve_free_counter_array(priv);
 	gve_free_notify_blocks(priv);
+	gve_free_stats_report(priv);
 	gve_clear_device_resources_ok(priv);
 }
 
@@ -830,6 +902,7 @@ static void gve_turndown(struct gve_priv *priv)
 	netif_tx_disable(priv->dev);
 
 	gve_clear_napi_enabled(priv);
+	gve_clear_report_stats(priv);
 }
 
 static void gve_turnup(struct gve_priv *priv)
@@ -880,6 +953,10 @@ static void gve_handle_status(struct gve_priv *priv, u32 status)
 		dev_info(&priv->pdev->dev, "Device requested reset.\n");
 		gve_set_do_reset(priv);
 	}
+	if (GVE_DEVICE_STATUS_REPORT_STATS_MASK & status) {
+		dev_info(&priv->pdev->dev, "Device report stats on.\n");
+		gve_set_do_report_stats(priv);
+	}
 }
 
 static void gve_handle_reset(struct gve_priv *priv)
@@ -898,7 +975,68 @@ static void gve_handle_reset(struct gve_priv *priv)
 	}
 }
 
-/* Handle NIC status register changes and reset requests */
+void gve_handle_report_stats(struct gve_priv *priv)
+{
+	int idx, stats_idx = 0, tx_bytes;
+	unsigned int start = 0;
+	struct stats *stats = priv->stats_report->stats;
+
+	if (!gve_get_report_stats(priv))
+		return;
+
+	be64_add_cpu(&priv->stats_report->written_count, 1);
+	/* tx stats */
+	if (priv->tx) {
+		for (idx = 0; idx < priv->tx_cfg.num_queues; idx++) {
+			do {
+				start = u64_stats_fetch_begin(&priv->tx[idx].statss);
+				tx_bytes = priv->tx[idx].bytes_done;
+			} while (u64_stats_fetch_retry(&priv->tx[idx].statss, start));
+			stats[stats_idx++] = (struct stats) {
+				.stat_name = cpu_to_be32(TX_WAKE_CNT),
+				.value = cpu_to_be64(priv->tx[idx].wake_queue),
+				.queue_id = cpu_to_be32(idx),
+			};
+			stats[stats_idx++] = (struct stats) {
+				.stat_name = cpu_to_be32(TX_STOP_CNT),
+				.value = cpu_to_be64(priv->tx[idx].stop_queue),
+				.queue_id = cpu_to_be32(idx),
+			};
+			stats[stats_idx++] = (struct stats) {
+				.stat_name = cpu_to_be32(TX_FRAMES_SENT),
+				.value = cpu_to_be64(priv->tx[idx].req),
+				.queue_id = cpu_to_be32(idx),
+			};
+			stats[stats_idx++] = (struct stats) {
+				.stat_name = cpu_to_be32(TX_BYTES_SENT),
+				.value = cpu_to_be64(tx_bytes),
+				.queue_id = cpu_to_be32(idx),
+			};
+			stats[stats_idx++] = (struct stats) {
+				.stat_name = cpu_to_be32(TX_LAST_COMPLETION_PROCESSED),
+				.value = cpu_to_be64(priv->tx[idx].done),
+				.queue_id = cpu_to_be32(idx),
+			};
+		}
+	}
+	/* rx stats */
+	if (priv->rx) {
+		for (idx = 0; idx < priv->rx_cfg.num_queues; idx++) {
+			stats[stats_idx++] = (struct stats) {
+				.stat_name = cpu_to_be32(RX_NEXT_EXPECTED_SEQUENCE),
+				.value = cpu_to_be64(priv->rx[idx].desc.seqno),
+				.queue_id = cpu_to_be32(idx),
+			};
+			stats[stats_idx++] = (struct stats) {
+				.stat_name = cpu_to_be32(RX_BUFFERS_POSTED),
+				.value = cpu_to_be64(priv->rx[0].fill_cnt),
+				.queue_id = cpu_to_be32(idx),
+			};
+		}
+	}
+}
+
+/* Handle NIC status register changes, reset requests and report stats */
 static void gve_service_task(struct work_struct *work)
 {
 	struct gve_priv *priv = container_of(work, struct gve_priv,
@@ -908,6 +1046,10 @@ static void gve_service_task(struct work_struct *work)
 			  ioread32be(&priv->reg_bar0->device_status));
 
 	gve_handle_reset(priv);
+	if (gve_get_do_report_stats(priv)) {
+		gve_handle_report_stats(priv);
+		gve_clear_do_report_stats(priv);
+	}
 }
 
 static int gve_init_priv(struct gve_priv *priv, bool skip_describe_device)
@@ -1173,6 +1315,7 @@ static int gve_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 	priv->db_bar2 = db_bar;
 	priv->service_task_flags = 0x0;
 	priv->state_flags = 0x0;
+	priv->ethtool_flags = 0x0;
 	priv->dma_mask = dma_mask;
 
 	gve_set_probe_in_progress(priv);
diff --git a/drivers/net/ethernet/google/gve/gve_register.h b/drivers/net/ethernet/google/gve/gve_register.h
index fad8813d1bb1..776c2911842a 100644
--- a/drivers/net/ethernet/google/gve/gve_register.h
+++ b/drivers/net/ethernet/google/gve/gve_register.h
@@ -24,5 +24,6 @@ struct gve_registers {
 enum gve_device_status_flags {
 	GVE_DEVICE_STATUS_RESET_MASK		= BIT(1),
 	GVE_DEVICE_STATUS_LINK_STATUS_MASK	= BIT(2),
+	GVE_DEVICE_STATUS_REPORT_STATS_MASK	= BIT(3),
 };
 #endif /* _GVE_REGISTER_H_ */
-- 
2.28.0.220.ged08abb693-goog


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH net-next 06/18] gve: Batch AQ commands for creating and destroying queues.
  2020-08-18 19:43 [PATCH net-next 00/18] GVE Driver v1.1.0 Features David Awogbemila
                   ` (4 preceding siblings ...)
  2020-08-18 19:44 ` [PATCH net-next 05/18] gve: Add Gvnic stats AQ command and ethtool show/set-priv-flags David Awogbemila
@ 2020-08-18 19:44 ` David Awogbemila
  2020-08-18 20:16   ` David Miller
  2020-08-18 19:44 ` [PATCH net-next 07/18] gve: Use link status register to report link status David Awogbemila
                   ` (12 subsequent siblings)
  18 siblings, 1 reply; 38+ messages in thread
From: David Awogbemila @ 2020-08-18 19:44 UTC (permalink / raw)
  To: netdev; +Cc: Sagi Shahar, Yangchun Fu, David Awogbemila

From: Sagi Shahar <sagis@google.com>

Adds support for batching AQ commands and uses it for creating and
destroying queues.

Reviewed-by: Yangchun Fu <yangchun@google.com>
Signed-off-by: Sagi Shahar <sagis@google.com>
Signed-off-by: David Awogbemila <awogbemila@google.com>
---
 drivers/net/ethernet/google/gve/gve_adminq.c | 188 ++++++++++++++++---
 drivers/net/ethernet/google/gve/gve_adminq.h |  10 +-
 drivers/net/ethernet/google/gve/gve_main.c   |  94 +++++-----
 3 files changed, 211 insertions(+), 81 deletions(-)

diff --git a/drivers/net/ethernet/google/gve/gve_adminq.c b/drivers/net/ethernet/google/gve/gve_adminq.c
index 38e0c3066035..023e6f804972 100644
--- a/drivers/net/ethernet/google/gve/gve_adminq.c
+++ b/drivers/net/ethernet/google/gve/gve_adminq.c
@@ -135,20 +135,71 @@ static int gve_adminq_parse_err(struct gve_priv *priv, u32 status)
 	}
 }
 
+/* Flushes all AQ commands currently queued and waits for them to complete.
+ * If there are failures, it will return the first error.
+ */
+static int gve_adminq_kick_and_wait(struct gve_priv *priv)
+{
+	u32 tail, head;
+	int i;
+
+	tail = ioread32be(&priv->reg_bar0->adminq_event_counter);
+	head = priv->adminq_prod_cnt;
+
+	gve_adminq_kick_cmd(priv, head);
+	if (!gve_adminq_wait_for_cmd(priv, head)) {
+		dev_err(&priv->pdev->dev, "AQ commands timed out, need to reset AQ\n");
+		priv->adminq_timeouts++;
+		return -ENOTRECOVERABLE;
+	}
+
+	for (i = tail; i < head; i++) {
+		union gve_adminq_command *cmd;
+		u32 status, err;
+
+		cmd = &priv->adminq[i & priv->adminq_mask];
+		status = be32_to_cpu(READ_ONCE(cmd->status));
+		err = gve_adminq_parse_err(priv, status);
+		if (err)
+			// Return the first error if we failed.
+			return err;
+	}
+
+	return 0;
+}
+
 /* This function is not threadsafe - the caller is responsible for any
  * necessary locks.
  */
-int gve_adminq_execute_cmd(struct gve_priv *priv,
-			   union gve_adminq_command *cmd_orig)
+static int gve_adminq_issue_cmd(struct gve_priv *priv,
+				union gve_adminq_command *cmd_orig)
 {
 	union gve_adminq_command *cmd;
+	u32 tail;
 	u32 opcode;
-	u32 status = 0;
-	u32 prod_cnt;
+
+	tail = ioread32be(&priv->reg_bar0->adminq_event_counter);
+
+	// Check if next command will overflow the buffer.
+	if (((priv->adminq_prod_cnt + 1) & priv->adminq_mask) == tail) {
+		int err;
+
+		// Flush existing commands to make room.
+		err = gve_adminq_kick_and_wait(priv);
+		if (err)
+			return err;
+
+		// Retry.
+		tail = ioread32be(&priv->reg_bar0->adminq_event_counter);
+		if (((priv->adminq_prod_cnt + 1) & priv->adminq_mask) == tail) {
+			// This should never happen. We just flushed the
+			// command queue so there should be enough space.
+			return -ENOMEM;
+		}
+	}
 
 	cmd = &priv->adminq[priv->adminq_prod_cnt & priv->adminq_mask];
 	priv->adminq_prod_cnt++;
-	prod_cnt = priv->adminq_prod_cnt;
 
 	memcpy(cmd, cmd_orig, sizeof(*cmd_orig));
 	opcode = be32_to_cpu(READ_ONCE(cmd->opcode));
@@ -191,16 +242,30 @@ int gve_adminq_execute_cmd(struct gve_priv *priv,
 		dev_err(&priv->pdev->dev, "unknown AQ command opcode %d\n", opcode);
 	}
 
-	gve_adminq_kick_cmd(priv, prod_cnt);
-	if (!gve_adminq_wait_for_cmd(priv, prod_cnt)) {
-		dev_err(&priv->pdev->dev, "AQ command timed out, need to reset AQ\n");
-		priv->adminq_timeouts++;
-		return -ENOTRECOVERABLE;
-	}
+	return 0;
+}
+
+/* This function is not threadsafe - the caller is responsible for any
+ * necessary locks.
+ * The caller is also responsible for making sure there are no commands
+ * waiting to be executed.
+ */
+static int gve_adminq_execute_cmd(struct gve_priv *priv, union gve_adminq_command *cmd_orig)
+{
+	u32 tail, head;
+	int err;
+
+	tail = ioread32be(&priv->reg_bar0->adminq_event_counter);
+	head = priv->adminq_prod_cnt;
+	if (tail != head)
+		// This is not a valid path
+		return -EINVAL;
 
-	memcpy(cmd_orig, cmd, sizeof(*cmd));
-	status = be32_to_cpu(READ_ONCE(cmd->status));
-	return gve_adminq_parse_err(priv, status);
+	err = gve_adminq_issue_cmd(priv, cmd_orig);
+	if (err)
+		return err;
+
+	return gve_adminq_kick_and_wait(priv);
 }
 
 /* The device specifies that the management vector can either be the first irq
@@ -245,29 +310,50 @@ int gve_adminq_deconfigure_device_resources(struct gve_priv *priv)
 	return gve_adminq_execute_cmd(priv, &cmd);
 }
 
-int gve_adminq_create_tx_queue(struct gve_priv *priv, u32 queue_index)
+static int gve_adminq_create_tx_queue(struct gve_priv *priv, u32 queue_index)
 {
 	struct gve_tx_ring *tx = &priv->tx[queue_index];
 	union gve_adminq_command cmd;
+	int err;
 
 	memset(&cmd, 0, sizeof(cmd));
 	cmd.opcode = cpu_to_be32(GVE_ADMINQ_CREATE_TX_QUEUE);
 	cmd.create_tx_queue = (struct gve_adminq_create_tx_queue) {
 		.queue_id = cpu_to_be32(queue_index),
 		.reserved = 0,
-		.queue_resources_addr = cpu_to_be64(tx->q_resources_bus),
+		.queue_resources_addr =
+			cpu_to_be64(tx->q_resources_bus),
 		.tx_ring_addr = cpu_to_be64(tx->bus),
 		.queue_page_list_id = cpu_to_be32(tx->tx_fifo.qpl->id),
 		.ntfy_id = cpu_to_be32(tx->ntfy_id),
 	};
 
-	return gve_adminq_execute_cmd(priv, &cmd);
+	err = gve_adminq_issue_cmd(priv, &cmd);
+	if (err)
+		return err;
+
+	return 0;
 }
 
-int gve_adminq_create_rx_queue(struct gve_priv *priv, u32 queue_index)
+int gve_adminq_create_tx_queues(struct gve_priv *priv, u32 num_queues)
+{
+	int err;
+	int i;
+
+	for (i = 0; i < num_queues; i++) {
+		err = gve_adminq_create_tx_queue(priv, i);
+		if (err)
+			return err;
+	}
+
+	return gve_adminq_kick_and_wait(priv);
+}
+
+static int gve_adminq_create_rx_queue(struct gve_priv *priv, u32 queue_index)
 {
 	struct gve_rx_ring *rx = &priv->rx[queue_index];
 	union gve_adminq_command cmd;
+	int err;
 
 	memset(&cmd, 0, sizeof(cmd));
 	cmd.opcode = cpu_to_be32(GVE_ADMINQ_CREATE_RX_QUEUE);
@@ -282,12 +368,31 @@ int gve_adminq_create_rx_queue(struct gve_priv *priv, u32 queue_index)
 		.queue_page_list_id = cpu_to_be32(rx->data.qpl->id),
 	};
 
-	return gve_adminq_execute_cmd(priv, &cmd);
+	err = gve_adminq_issue_cmd(priv, &cmd);
+	if (err)
+		return err;
+
+	return 0;
+}
+
+int gve_adminq_create_rx_queues(struct gve_priv *priv, u32 num_queues)
+{
+	int err;
+	int i;
+
+	for (i = 0; i < num_queues; i++) {
+		err = gve_adminq_create_rx_queue(priv, i);
+		if (err)
+			return err;
+	}
+
+	return gve_adminq_kick_and_wait(priv);
 }
 
-int gve_adminq_destroy_tx_queue(struct gve_priv *priv, u32 queue_index)
+static int gve_adminq_destroy_tx_queue(struct gve_priv *priv, u32 queue_index)
 {
 	union gve_adminq_command cmd;
+	int err;
 
 	memset(&cmd, 0, sizeof(cmd));
 	cmd.opcode = cpu_to_be32(GVE_ADMINQ_DESTROY_TX_QUEUE);
@@ -295,12 +400,31 @@ int gve_adminq_destroy_tx_queue(struct gve_priv *priv, u32 queue_index)
 		.queue_id = cpu_to_be32(queue_index),
 	};
 
-	return gve_adminq_execute_cmd(priv, &cmd);
+	err = gve_adminq_issue_cmd(priv, &cmd);
+	if (err)
+		return err;
+
+	return 0;
 }
 
-int gve_adminq_destroy_rx_queue(struct gve_priv *priv, u32 queue_index)
+int gve_adminq_destroy_tx_queues(struct gve_priv *priv, u32 num_queues)
+{
+	int err;
+	int i;
+
+	for (i = 0; i < num_queues; i++) {
+		err = gve_adminq_destroy_tx_queue(priv, i);
+		if (err)
+			return err;
+	}
+
+	return gve_adminq_kick_and_wait(priv);
+}
+
+static int gve_adminq_destroy_rx_queue(struct gve_priv *priv, u32 queue_index)
 {
 	union gve_adminq_command cmd;
+	int err;
 
 	memset(&cmd, 0, sizeof(cmd));
 	cmd.opcode = cpu_to_be32(GVE_ADMINQ_DESTROY_RX_QUEUE);
@@ -308,7 +432,25 @@ int gve_adminq_destroy_rx_queue(struct gve_priv *priv, u32 queue_index)
 		.queue_id = cpu_to_be32(queue_index),
 	};
 
-	return gve_adminq_execute_cmd(priv, &cmd);
+	err = gve_adminq_issue_cmd(priv, &cmd);
+	if (err)
+		return err;
+
+	return 0;
+}
+
+int gve_adminq_destroy_rx_queues(struct gve_priv *priv, u32 num_queues)
+{
+	int err;
+	int i;
+
+	for (i = 0; i < num_queues; i++) {
+		err = gve_adminq_destroy_rx_queue(priv, i);
+		if (err)
+			return err;
+	}
+
+	return gve_adminq_kick_and_wait(priv);
 }
 
 int gve_adminq_describe_device(struct gve_priv *priv)
diff --git a/drivers/net/ethernet/google/gve/gve_adminq.h b/drivers/net/ethernet/google/gve/gve_adminq.h
index a6c8c29f0d13..784830f75b7c 100644
--- a/drivers/net/ethernet/google/gve/gve_adminq.h
+++ b/drivers/net/ethernet/google/gve/gve_adminq.h
@@ -238,8 +238,6 @@ static_assert(sizeof(union gve_adminq_command) == 64);
 int gve_adminq_alloc(struct device *dev, struct gve_priv *priv);
 void gve_adminq_free(struct device *dev, struct gve_priv *priv);
 void gve_adminq_release(struct gve_priv *priv);
-int gve_adminq_execute_cmd(struct gve_priv *priv,
-			   union gve_adminq_command *cmd_orig);
 int gve_adminq_describe_device(struct gve_priv *priv);
 int gve_adminq_configure_device_resources(struct gve_priv *priv,
 					  dma_addr_t counter_array_bus_addr,
@@ -247,10 +245,10 @@ int gve_adminq_configure_device_resources(struct gve_priv *priv,
 					  dma_addr_t db_array_bus_addr,
 					  u32 num_ntfy_blks);
 int gve_adminq_deconfigure_device_resources(struct gve_priv *priv);
-int gve_adminq_create_tx_queue(struct gve_priv *priv, u32 queue_id);
-int gve_adminq_destroy_tx_queue(struct gve_priv *priv, u32 queue_id);
-int gve_adminq_create_rx_queue(struct gve_priv *priv, u32 queue_id);
-int gve_adminq_destroy_rx_queue(struct gve_priv *priv, u32 queue_id);
+int gve_adminq_create_tx_queues(struct gve_priv *priv, u32 num_queues);
+int gve_adminq_destroy_tx_queues(struct gve_priv *priv, u32 queue_id);
+int gve_adminq_create_rx_queues(struct gve_priv *priv, u32 num_queues);
+int gve_adminq_destroy_rx_queues(struct gve_priv *priv, u32 queue_id);
 int gve_adminq_register_page_list(struct gve_priv *priv,
 				  struct gve_queue_page_list *qpl);
 int gve_adminq_unregister_page_list(struct gve_priv *priv, u32 page_list_id);
diff --git a/drivers/net/ethernet/google/gve/gve_main.c b/drivers/net/ethernet/google/gve/gve_main.c
index 099d61f00bf0..885943d745aa 100644
--- a/drivers/net/ethernet/google/gve/gve_main.c
+++ b/drivers/net/ethernet/google/gve/gve_main.c
@@ -443,36 +443,37 @@ static int gve_create_rings(struct gve_priv *priv)
 	int err;
 	int i;
 
-	for (i = 0; i < priv->tx_cfg.num_queues; i++) {
-		err = gve_adminq_create_tx_queue(priv, i);
-		if (err) {
-			netif_err(priv, drv, priv->dev, "failed to create tx queue %d\n",
-				  i);
-			/* This failure will trigger a reset - no need to clean
-			 * up
-			 */
-			return err;
-		}
-		netif_dbg(priv, drv, priv->dev, "created tx queue %d\n", i);
+	err = gve_adminq_create_tx_queues(priv, priv->tx_cfg.num_queues);
+	if (err) {
+		netif_err(priv, drv, priv->dev, "failed to create %d tx queues\n",
+			  priv->tx_cfg.num_queues);
+		/* This failure will trigger a reset - no need to clean
+		 * up
+		 */
+		return err;
 	}
-	for (i = 0; i < priv->rx_cfg.num_queues; i++) {
-		err = gve_adminq_create_rx_queue(priv, i);
-		if (err) {
-			netif_err(priv, drv, priv->dev, "failed to create rx queue %d\n",
-				  i);
-			/* This failure will trigger a reset - no need to clean
-			 * up
-			 */
-			return err;
-		}
-		/* Rx data ring has been prefilled with packet buffers at
-		 * queue allocation time.
-		 * Write the doorbell to provide descriptor slots and packet
-		 * buffers to the NIC.
+	netif_dbg(priv, drv, priv->dev, "created %d tx queues\n",
+		  priv->tx_cfg.num_queues);
+
+	err = gve_adminq_create_rx_queues(priv, priv->rx_cfg.num_queues);
+	if (err) {
+		netif_err(priv, drv, priv->dev, "failed to create %d rx queues\n",
+			  priv->rx_cfg.num_queues);
+		/* This failure will trigger a reset - no need to clean
+		 * up
 		 */
-		gve_rx_write_doorbell(priv, &priv->rx[i]);
-		netif_dbg(priv, drv, priv->dev, "created rx queue %d\n", i);
+		return err;
 	}
+	netif_dbg(priv, drv, priv->dev, "created %d rx queues\n",
+		  priv->rx_cfg.num_queues);
+
+	/* Rx data ring has been prefilled with packet buffers at queue
+	 * allocation time.
+	 * Write the doorbell to provide descriptor slots and packet buffers
+	 * to the NIC.
+	 */
+	for (i = 0; i < priv->rx_cfg.num_queues; i++)
+		gve_rx_write_doorbell(priv, &priv->rx[i]);
 
 	return 0;
 }
@@ -530,34 +531,23 @@ static int gve_alloc_rings(struct gve_priv *priv)
 static int gve_destroy_rings(struct gve_priv *priv)
 {
 	int err;
-	int i;
 
-	for (i = 0; i < priv->tx_cfg.num_queues; i++) {
-		err = gve_adminq_destroy_tx_queue(priv, i);
-		if (err) {
-			netif_err(priv, drv, priv->dev,
-				  "failed to destroy tx queue %d\n",
-				  i);
-			/* This failure will trigger a reset - no need to clean
-			 * up
-			 */
-			return err;
-		}
-		netif_dbg(priv, drv, priv->dev, "destroyed tx queue %d\n", i);
+	err = gve_adminq_destroy_tx_queues(priv, priv->tx_cfg.num_queues);
+	if (err) {
+		netif_err(priv, drv, priv->dev,
+			  "failed to destroy tx queues\n");
+		/* This failure will trigger a reset - no need to clean up */
+		return err;
 	}
-	for (i = 0; i < priv->rx_cfg.num_queues; i++) {
-		err = gve_adminq_destroy_rx_queue(priv, i);
-		if (err) {
-			netif_err(priv, drv, priv->dev,
-				  "failed to destroy rx queue %d\n",
-				  i);
-			/* This failure will trigger a reset - no need to clean
-			 * up
-			 */
-			return err;
-		}
-		netif_dbg(priv, drv, priv->dev, "destroyed rx queue %d\n", i);
+	netif_dbg(priv, drv, priv->dev, "destroyed tx queues\n");
+	err = gve_adminq_destroy_rx_queues(priv, priv->rx_cfg.num_queues);
+	if (err) {
+		netif_err(priv, drv, priv->dev,
+			  "failed to destroy rx queues\n");
+		/* This failure will trigger a reset - no need to clean up */
+		return err;
 	}
+	netif_dbg(priv, drv, priv->dev, "destroyed rx queues\n");
 	return 0;
 }
 
-- 
2.28.0.220.ged08abb693-goog


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH net-next 07/18] gve: Use link status register to report link status
  2020-08-18 19:43 [PATCH net-next 00/18] GVE Driver v1.1.0 Features David Awogbemila
                   ` (5 preceding siblings ...)
  2020-08-18 19:44 ` [PATCH net-next 06/18] gve: Batch AQ commands for creating and destroying queues David Awogbemila
@ 2020-08-18 19:44 ` David Awogbemila
  2020-08-19  3:36   ` Jakub Kicinski
  2020-08-18 19:44 ` [PATCH net-next 08/18] gve: Enable Link Speed Reporting in the driver David Awogbemila
                   ` (11 subsequent siblings)
  18 siblings, 1 reply; 38+ messages in thread
From: David Awogbemila @ 2020-08-18 19:44 UTC (permalink / raw)
  To: netdev; +Cc: Patricio Noyola, Yangchun Fu, David Awogbemila

From: Patricio Noyola <patricion@google.com>

This makes the driver better aware of the connectivity status of the
device. Based on the device's status register, the driver can call
netif_carrier_{on,off}.

Reviewed-by: Yangchun Fu <yangchun@google.com>
Signed-off-by: Patricio Noyola <patricion@google.com>
Signed-off-by: David Awogbemila <awogbemila@google.com>
---
 drivers/net/ethernet/google/gve/gve_main.c | 23 +++++++++++++++++++---
 1 file changed, 20 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/google/gve/gve_main.c b/drivers/net/ethernet/google/gve/gve_main.c
index 885943d745aa..685f18697fc1 100644
--- a/drivers/net/ethernet/google/gve/gve_main.c
+++ b/drivers/net/ethernet/google/gve/gve_main.c
@@ -769,7 +769,7 @@ static int gve_open(struct net_device *dev)
 	gve_set_device_rings_ok(priv);
 
 	gve_turnup(priv);
-	netif_carrier_on(dev);
+	queue_work(priv->gve_wq, &priv->service_task);
 	priv->interface_up_cnt++;
 	return 0;
 
@@ -1026,16 +1026,33 @@ void gve_handle_report_stats(struct gve_priv *priv)
 	}
 }
 
+static void gve_handle_link_status(struct gve_priv *priv, bool link_status)
+{
+	if (!gve_get_napi_enabled(priv))
+		return;
+
+	if (link_status == netif_carrier_ok(priv->dev))
+		return;
+
+	if (link_status) {
+		netif_carrier_on(priv->dev);
+	} else {
+		dev_info(&priv->pdev->dev, "Device link is down.\n");
+		netif_carrier_off(priv->dev);
+	}
+}
+
 /* Handle NIC status register changes, reset requests and report stats */
 static void gve_service_task(struct work_struct *work)
 {
 	struct gve_priv *priv = container_of(work, struct gve_priv,
 					     service_task);
+	u32 status = ioread32be(&priv->reg_bar0->device_status);
 
-	gve_handle_status(priv,
-			  ioread32be(&priv->reg_bar0->device_status));
+	gve_handle_status(priv, status);
 
 	gve_handle_reset(priv);
+	gve_handle_link_status(priv, GVE_DEVICE_STATUS_LINK_STATUS_MASK & status);
 	if (gve_get_do_report_stats(priv)) {
 		gve_handle_report_stats(priv);
 		gve_clear_do_report_stats(priv);
-- 
2.28.0.220.ged08abb693-goog


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH net-next 08/18] gve: Enable Link Speed Reporting in the driver.
  2020-08-18 19:43 [PATCH net-next 00/18] GVE Driver v1.1.0 Features David Awogbemila
                   ` (6 preceding siblings ...)
  2020-08-18 19:44 ` [PATCH net-next 07/18] gve: Use link status register to report link status David Awogbemila
@ 2020-08-18 19:44 ` David Awogbemila
  2020-08-18 21:30   ` Jakub Kicinski
  2020-08-18 19:44 ` [PATCH net-next 09/18] gve: Add support for raw addressing device option David Awogbemila
                   ` (10 subsequent siblings)
  18 siblings, 1 reply; 38+ messages in thread
From: David Awogbemila @ 2020-08-18 19:44 UTC (permalink / raw)
  To: netdev; +Cc: David Awogbemila, Yangchun Fu

This change allows the driver to report the device link speed
when the ethtool command:
	ethtool <nic name>
is run.
Getting the link speed is done via a new admin queue command:
ReportLinkSpeed.

Reviewed-by: Yangchun Fu <yangchun@google.com>
Signed-off-by: David Awogbemila <awogbemila@google.com>
---
 drivers/net/ethernet/google/gve/gve.h         |  3 +++
 drivers/net/ethernet/google/gve/gve_adminq.c  | 27 +++++++++++++++++++
 drivers/net/ethernet/google/gve/gve_adminq.h  |  9 +++++++
 drivers/net/ethernet/google/gve/gve_ethtool.c | 11 ++++++++
 4 files changed, 50 insertions(+)

diff --git a/drivers/net/ethernet/google/gve/gve.h b/drivers/net/ethernet/google/gve/gve.h
index bc54059f9b2e..0f5871af36af 100644
--- a/drivers/net/ethernet/google/gve/gve.h
+++ b/drivers/net/ethernet/google/gve/gve.h
@@ -252,6 +252,9 @@ struct gve_priv {
 	unsigned long service_timer_period;
 	struct timer_list service_timer;
 
+	/* Gvnic device link speed from hypervisor. */
+	u64 link_speed;
+
   /* Gvnic device's dma mask, set during probe. */
 	u8 dma_mask;
 };
diff --git a/drivers/net/ethernet/google/gve/gve_adminq.c b/drivers/net/ethernet/google/gve/gve_adminq.c
index 023e6f804972..cf5eaab1cb83 100644
--- a/drivers/net/ethernet/google/gve/gve_adminq.c
+++ b/drivers/net/ethernet/google/gve/gve_adminq.c
@@ -596,3 +596,30 @@ int gve_adminq_report_stats(struct gve_priv *priv, u64 stats_report_len,
 	return gve_adminq_execute_cmd(priv, &cmd);
 }
 
+int gve_adminq_report_link_speed(struct gve_priv *priv)
+{
+	union gve_adminq_command gvnic_cmd;
+	dma_addr_t link_speed_region_bus;
+	u64 *link_speed_region;
+	int err;
+
+	link_speed_region =
+		dma_alloc_coherent(&priv->pdev->dev, sizeof(*link_speed_region),
+				   &link_speed_region_bus, GFP_KERNEL);
+
+	if (!link_speed_region)
+		return -ENOMEM;
+
+	memset(&gvnic_cmd, 0, sizeof(gvnic_cmd));
+	gvnic_cmd.opcode = cpu_to_be32(GVE_ADMINQ_REPORT_LINK_SPEED);
+	gvnic_cmd.report_link_speed.link_speed_address =
+		cpu_to_be64(link_speed_region_bus);
+
+	err = gve_adminq_execute_cmd(priv, &gvnic_cmd);
+
+	priv->link_speed = be64_to_cpu(*link_speed_region);
+	dma_free_coherent(&priv->pdev->dev, sizeof(*link_speed_region), link_speed_region,
+			  link_speed_region_bus);
+	return err;
+}
+
diff --git a/drivers/net/ethernet/google/gve/gve_adminq.h b/drivers/net/ethernet/google/gve/gve_adminq.h
index 784830f75b7c..281de8326bc5 100644
--- a/drivers/net/ethernet/google/gve/gve_adminq.h
+++ b/drivers/net/ethernet/google/gve/gve_adminq.h
@@ -22,6 +22,7 @@ enum gve_adminq_opcodes {
 	GVE_ADMINQ_DECONFIGURE_DEVICE_RESOURCES	= 0x9,
 	GVE_ADMINQ_SET_DRIVER_PARAMETER		= 0xB,
 	GVE_ADMINQ_REPORT_STATS			= 0xC,
+	GVE_ADMINQ_REPORT_LINK_SPEED	= 0xD
 };
 
 /* Admin queue status codes */
@@ -181,6 +182,12 @@ struct gve_adminq_report_stats {
 
 static_assert(sizeof(struct gve_adminq_report_stats) == 24);
 
+struct gve_adminq_report_link_speed {
+	__be64 link_speed_address;
+};
+
+static_assert(sizeof(struct gve_adminq_report_link_speed) == 8);
+
 struct stats {
 	__be32 stat_name;
 	__be32 queue_id;
@@ -228,6 +235,7 @@ union gve_adminq_command {
 			struct gve_adminq_unregister_page_list unreg_page_list;
 			struct gve_adminq_set_driver_parameter set_driver_param;
 			struct gve_adminq_report_stats report_stats;
+			struct gve_adminq_report_link_speed report_link_speed;
 		};
 	};
 	u8 reserved[64];
@@ -255,4 +263,5 @@ int gve_adminq_unregister_page_list(struct gve_priv *priv, u32 page_list_id);
 int gve_adminq_set_mtu(struct gve_priv *priv, u64 mtu);
 int gve_adminq_report_stats(struct gve_priv *priv, u64 stats_report_len,
 			    dma_addr_t stats_report_addr, u64 interval);
+int gve_adminq_report_link_speed(struct gve_priv *priv);
 #endif /* _GVE_ADMINQ_H */
diff --git a/drivers/net/ethernet/google/gve/gve_ethtool.c b/drivers/net/ethernet/google/gve/gve_ethtool.c
index 9d778e04b9f4..8bc8383558d8 100644
--- a/drivers/net/ethernet/google/gve/gve_ethtool.c
+++ b/drivers/net/ethernet/google/gve/gve_ethtool.c
@@ -509,6 +509,16 @@ static int gve_set_priv_flags(struct net_device *netdev, u32 flags)
 	return 0;
 }
 
+static int gve_get_link_ksettings(struct net_device *netdev,
+				  struct ethtool_link_ksettings *cmd)
+{
+	struct gve_priv *priv = netdev_priv(netdev);
+	int err = gve_adminq_report_link_speed(priv);
+
+	cmd->base.speed = priv->link_speed;
+	return err;
+}
+
 const struct ethtool_ops gve_ethtool_ops = {
 	.get_drvinfo = gve_get_drvinfo,
 	.get_strings = gve_get_strings,
@@ -525,4 +535,5 @@ const struct ethtool_ops gve_ethtool_ops = {
 	.set_tunable = gve_set_tunable,
 	.get_priv_flags = gve_get_priv_flags,
 	.set_priv_flags = gve_set_priv_flags,
+	.get_link_ksettings = gve_get_link_ksettings
 };
-- 
2.28.0.220.ged08abb693-goog


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH net-next 09/18] gve: Add support for raw addressing device option
  2020-08-18 19:43 [PATCH net-next 00/18] GVE Driver v1.1.0 Features David Awogbemila
                   ` (7 preceding siblings ...)
  2020-08-18 19:44 ` [PATCH net-next 08/18] gve: Enable Link Speed Reporting in the driver David Awogbemila
@ 2020-08-18 19:44 ` David Awogbemila
  2020-08-18 19:44 ` [PATCH net-next 10/18] gve: Add support for raw addressing to the rx path David Awogbemila
                   ` (9 subsequent siblings)
  18 siblings, 0 replies; 38+ messages in thread
From: David Awogbemila @ 2020-08-18 19:44 UTC (permalink / raw)
  To: netdev; +Cc: Catherine Sullivan, Yangchun Fu, David Awogbemila

From: Catherine Sullivan <csully@google.com>

Add support to describe device for parsing device options. As
the first device option, add raw addressing.

"Raw Addressing" mode (as opposed to the current "qpl" mode) is an
operational mode which allows the driver avoid bounce buffer copies
which it currently performs using pre-allocated qpls (queue_page_lists)
when sending and receiving packets. For egress packets the provided
skb buffer addresses are dma_map'ed and passed to the device. This
means that the device can perform DMA directly from these addresses
and the driver does not have to copy the buffer content into
pre-allocated buffers/qpls (as in qpl mode). For ingress packets copies
are also eliminated as buffers are recycled or newly allocated as
necessary.

Also add the necessary priv field to track the enablement status.

Reviewed-by: Yangchun Fu <yangchun@google.com>
Signed-off-by: Catherine Sullivan <csully@google.com>
Signed-off-by: David Awogbemila <awogbemila@google.com>
---
 drivers/net/ethernet/google/gve/gve.h        |  1 +
 drivers/net/ethernet/google/gve/gve_adminq.c | 49 +++++++++++++++++++-
 drivers/net/ethernet/google/gve/gve_adminq.h | 15 ++++--
 drivers/net/ethernet/google/gve/gve_main.c   |  9 ++++
 4 files changed, 69 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/google/gve/gve.h b/drivers/net/ethernet/google/gve/gve.h
index 0f5871af36af..50565516d1fe 100644
--- a/drivers/net/ethernet/google/gve/gve.h
+++ b/drivers/net/ethernet/google/gve/gve.h
@@ -199,6 +199,7 @@ struct gve_priv {
 	u64 num_registered_pages; /* num pages registered with NIC */
 	u32 rx_copybreak; /* copy packets smaller than this */
 	u16 default_num_queues; /* default num queues to set up */
+	bool raw_addressing; /* true if this dev supports raw addressing */
 
 	struct gve_queue_config tx_cfg;
 	struct gve_queue_config rx_cfg;
diff --git a/drivers/net/ethernet/google/gve/gve_adminq.c b/drivers/net/ethernet/google/gve/gve_adminq.c
index cf5eaab1cb83..cf5d172bec83 100644
--- a/drivers/net/ethernet/google/gve/gve_adminq.c
+++ b/drivers/net/ethernet/google/gve/gve_adminq.c
@@ -456,11 +456,14 @@ int gve_adminq_destroy_rx_queues(struct gve_priv *priv, u32 num_queues)
 int gve_adminq_describe_device(struct gve_priv *priv)
 {
 	struct gve_device_descriptor *descriptor;
+	struct gve_device_option *dev_opt;
 	union gve_adminq_command cmd;
 	dma_addr_t descriptor_bus;
+	u16 num_options;
 	int err = 0;
 	u8 *mac;
 	u16 mtu;
+	int i;
 
 	memset(&cmd, 0, sizeof(cmd));
 	descriptor = dma_alloc_coherent(&priv->pdev->dev, PAGE_SIZE,
@@ -510,10 +513,54 @@ int gve_adminq_describe_device(struct gve_priv *priv)
 	priv->rx_pages_per_qpl = be16_to_cpu(descriptor->rx_pages_per_qpl);
 	if (priv->rx_pages_per_qpl < priv->rx_desc_cnt) {
 		dev_err(&priv->pdev->dev, "rx_pages_per_qpl cannot be smaller than rx_desc_cnt, setting rx_desc_cnt down to %d.\n",
-			priv->rx_pages_per_qpl);
+			  priv->rx_pages_per_qpl);
 		priv->rx_desc_cnt = priv->rx_pages_per_qpl;
 	}
 	priv->default_num_queues = be16_to_cpu(descriptor->default_num_queues);
+	dev_opt = (struct gve_device_option *)((void *)descriptor +
+							sizeof(*descriptor));
+
+	num_options = be16_to_cpu(descriptor->num_device_options);
+	for (i = 0; i < num_options; i++) {
+		u16 option_length = be16_to_cpu(dev_opt->option_length);
+		u16 option_id = be16_to_cpu(dev_opt->option_id);
+
+		if ((void *)dev_opt + sizeof(*dev_opt) + option_length > (void *)descriptor +
+				      be16_to_cpu(descriptor->total_length)) {
+			dev_err(&priv->dev->dev,
+				"options exceed device_descriptor's total length.\n");
+			err = -EINVAL;
+			goto free_device_descriptor;
+		}
+
+		switch (option_id) {
+		case GVE_DEV_OPT_ID_RAW_ADDRESSING:
+			/* If the length or feature mask doesn't match,
+			 * continue without enabling the feature.
+			 */
+			if (option_length != GVE_DEV_OPT_LEN_RAW_ADDRESSING ||
+			    be32_to_cpu(dev_opt->feat_mask) !=
+			    GVE_DEV_OPT_FEAT_MASK_RAW_ADDRESSING) {
+				dev_info(&priv->pdev->dev,
+					 "Raw addressing device option not enabled, length or features mask did not match expected.\n");
+				priv->raw_addressing = false;
+			} else {
+				dev_info(&priv->pdev->dev,
+					 "Raw addressing device option enabled.\n");
+				priv->raw_addressing = true;
+			}
+			break;
+		default:
+			/* If we don't recognize the option just continue
+			 * without doing anything.
+			 */
+			dev_info(&priv->pdev->dev,
+				 "Unrecognized device option 0x%hx not enabled.\n",
+				   option_id);
+			break;
+		}
+		dev_opt = (void *)dev_opt + sizeof(*dev_opt) + option_length;
+	}
 
 free_device_descriptor:
 	dma_free_coherent(&priv->pdev->dev, sizeof(*descriptor), descriptor,
diff --git a/drivers/net/ethernet/google/gve/gve_adminq.h b/drivers/net/ethernet/google/gve/gve_adminq.h
index 281de8326bc5..af5f586167bd 100644
--- a/drivers/net/ethernet/google/gve/gve_adminq.h
+++ b/drivers/net/ethernet/google/gve/gve_adminq.h
@@ -79,12 +79,17 @@ struct gve_device_descriptor {
 
 static_assert(sizeof(struct gve_device_descriptor) == 40);
 
-struct device_option {
-	__be32 option_id;
-	__be32 option_length;
+struct gve_device_option {
+	__be16 option_id;
+	__be16 option_length;
+	__be32 feat_mask;
 };
 
-static_assert(sizeof(struct device_option) == 8);
+static_assert(sizeof(struct gve_device_option) == 8);
+
+#define GVE_DEV_OPT_ID_RAW_ADDRESSING 0x1
+#define GVE_DEV_OPT_LEN_RAW_ADDRESSING 0x0
+#define GVE_DEV_OPT_FEAT_MASK_RAW_ADDRESSING 0x0
 
 struct gve_adminq_configure_device_resources {
 	__be64 counter_array;
@@ -111,6 +116,8 @@ struct gve_adminq_unregister_page_list {
 
 static_assert(sizeof(struct gve_adminq_unregister_page_list) == 4);
 
+#define GVE_RAW_ADDRESSING_QPL_ID 0xFFFFFFFF
+
 struct gve_adminq_create_tx_queue {
 	__be32 queue_id;
 	__be32 reserved;
diff --git a/drivers/net/ethernet/google/gve/gve_main.c b/drivers/net/ethernet/google/gve/gve_main.c
index 685f18697fc1..3e9b69747370 100644
--- a/drivers/net/ethernet/google/gve/gve_main.c
+++ b/drivers/net/ethernet/google/gve/gve_main.c
@@ -678,6 +678,10 @@ static int gve_alloc_qpls(struct gve_priv *priv)
 	int i, j;
 	int err;
 
+	/* Raw addressing means no QPLs */
+	if (priv->raw_addressing)
+		return 0;
+
 	priv->qpls = kvzalloc(num_qpls * sizeof(*priv->qpls), GFP_KERNEL);
 	if (!priv->qpls)
 		return -ENOMEM;
@@ -718,6 +722,10 @@ static void gve_free_qpls(struct gve_priv *priv)
 	int num_qpls = gve_num_tx_qpls(priv) + gve_num_rx_qpls(priv);
 	int i;
 
+	/* Raw addressing means no QPLs */
+	if (priv->raw_addressing)
+		return;
+
 	kvfree(priv->qpl_cfg.qpl_id_map);
 
 	for (i = 0; i < num_qpls; i++)
@@ -1075,6 +1083,7 @@ static int gve_init_priv(struct gve_priv *priv, bool skip_describe_device)
 	if (skip_describe_device)
 		goto setup_device;
 
+	priv->raw_addressing = false;
 	/* Get the initial information we need from the device */
 	err = gve_adminq_describe_device(priv);
 	if (err) {
-- 
2.28.0.220.ged08abb693-goog


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH net-next 10/18] gve: Add support for raw addressing to the rx path
  2020-08-18 19:43 [PATCH net-next 00/18] GVE Driver v1.1.0 Features David Awogbemila
                   ` (8 preceding siblings ...)
  2020-08-18 19:44 ` [PATCH net-next 09/18] gve: Add support for raw addressing device option David Awogbemila
@ 2020-08-18 19:44 ` David Awogbemila
  2020-08-18 20:18   ` David Miller
  2020-08-18 19:44 ` [PATCH net-next 11/18] gve: Add support for raw addressing in the tx path David Awogbemila
                   ` (8 subsequent siblings)
  18 siblings, 1 reply; 38+ messages in thread
From: David Awogbemila @ 2020-08-18 19:44 UTC (permalink / raw)
  To: netdev; +Cc: Catherine Sullivan, Yangchun Fu, David Awogbemila

From: Catherine Sullivan <csully@google.com>

Add support to use raw dma addresses in the rx path. Due to this new
support we can alloc a new buffer instead of making a copy.

Reviewed-by: Yangchun Fu <yangchun@google.com>
Signed-off-by: Catherine Sullivan <csully@google.com>
Signed-off-by: David Awogbemila <awogbemila@google.com>
---
 drivers/net/ethernet/google/gve/gve.h        |  23 +-
 drivers/net/ethernet/google/gve/gve_adminq.c |  15 +-
 drivers/net/ethernet/google/gve/gve_desc.h   |  10 +-
 drivers/net/ethernet/google/gve/gve_main.c   |   9 +-
 drivers/net/ethernet/google/gve/gve_rx.c     | 344 ++++++++++++++-----
 5 files changed, 296 insertions(+), 105 deletions(-)

diff --git a/drivers/net/ethernet/google/gve/gve.h b/drivers/net/ethernet/google/gve/gve.h
index 50565516d1fe..c86cec163bd6 100644
--- a/drivers/net/ethernet/google/gve/gve.h
+++ b/drivers/net/ethernet/google/gve/gve.h
@@ -50,6 +50,7 @@ struct gve_rx_slot_page_info {
 	struct page *page;
 	void *page_address;
 	u32 page_offset; /* offset to write to in page */
+	bool can_flip; /* page can be flipped and reused */
 };
 
 /* A list of pages registered with the device during setup and used by a queue
@@ -68,6 +69,7 @@ struct gve_rx_data_queue {
 	dma_addr_t data_bus; /* dma mapping of the slots */
 	struct gve_rx_slot_page_info *page_info; /* page info of the buffers */
 	struct gve_queue_page_list *qpl; /* qpl assigned to this queue */
+	bool raw_addressing; /* use raw_addressing? */
 };
 
 struct gve_priv;
@@ -82,11 +84,14 @@ struct gve_rx_ring {
 	u32 cnt; /* free-running total number of completed packets */
 	u32 fill_cnt; /* free-running total number of descs and buffs posted */
 	u32 mask; /* masks the cnt and fill_cnt to the size of the ring */
+	u32 db_threshold; /* threshold for posting new buffs and descs */
 	u64 rx_copybreak_pkt; /* free-running count of copybreak packets */
 	u64 rx_copied_pkt; /* free-running total number of copied packets */
 	u64 rx_skb_alloc_fail; /* free-running count of skb alloc fails */
 	u64 rx_buf_alloc_fail; /* free-running count of buffer alloc fails */
 	u64 rx_desc_err_dropped_pkt; /* free-running count of packets dropped by descriptor error */
+	/* free-running count of packets dropped because of lack of buffer refill */
+	u64 rx_no_refill_dropped_pkt;
 	u32 q_num; /* queue index */
 	u32 ntfy_id; /* notification block index */
 	struct gve_queue_resources *q_resources; /* head and tail pointer idx */
@@ -194,7 +199,7 @@ struct gve_priv {
 	u16 tx_desc_cnt; /* num desc per ring */
 	u16 rx_desc_cnt; /* num desc per ring */
 	u16 tx_pages_per_qpl; /* tx buffer length */
-	u16 rx_pages_per_qpl; /* rx buffer length */
+	u16 rx_data_slot_cnt; /* rx buffer length */
 	u64 max_registered_pages;
 	u64 num_registered_pages; /* num pages registered with NIC */
 	u32 rx_copybreak; /* copy packets smaller than this */
@@ -449,7 +454,10 @@ static inline u32 gve_num_tx_qpls(struct gve_priv *priv)
  */
 static inline u32 gve_num_rx_qpls(struct gve_priv *priv)
 {
-	return priv->rx_cfg.num_queues;
+	if (priv->raw_addressing)
+		return 0;
+	else
+		return priv->rx_cfg.num_queues;
 }
 
 /* Returns a pointer to the next available tx qpl in the list of qpls
@@ -503,18 +511,9 @@ static inline enum dma_data_direction gve_qpl_dma_dir(struct gve_priv *priv,
 		return DMA_FROM_DEVICE;
 }
 
-/* Returns true if the max mtu allows page recycling */
-static inline bool gve_can_recycle_pages(struct net_device *dev)
-{
-	/* We can't recycle the pages if we can't fit a packet into half a
-	 * page.
-	 */
-	return dev->max_mtu <= PAGE_SIZE / 2;
-}
-
 /* buffers */
 int gve_alloc_page(struct gve_priv *priv, struct device *dev, struct page **page, dma_addr_t *dma,
-		   enum dma_data_direction);
+		   enum dma_data_direction, gfp_t gfp_flags);
 void gve_free_page(struct device *dev, struct page *page, dma_addr_t dma,
 		   enum dma_data_direction);
 /* tx handling */
diff --git a/drivers/net/ethernet/google/gve/gve_adminq.c b/drivers/net/ethernet/google/gve/gve_adminq.c
index cf5d172bec83..bb21891d06a2 100644
--- a/drivers/net/ethernet/google/gve/gve_adminq.c
+++ b/drivers/net/ethernet/google/gve/gve_adminq.c
@@ -353,8 +353,11 @@ static int gve_adminq_create_rx_queue(struct gve_priv *priv, u32 queue_index)
 {
 	struct gve_rx_ring *rx = &priv->rx[queue_index];
 	union gve_adminq_command cmd;
+	u32 qpl_id;
 	int err;
 
+	qpl_id = priv->raw_addressing ? GVE_RAW_ADDRESSING_QPL_ID :
+		rx->data.qpl->id;
 	memset(&cmd, 0, sizeof(cmd));
 	cmd.opcode = cpu_to_be32(GVE_ADMINQ_CREATE_RX_QUEUE);
 	cmd.create_rx_queue = (struct gve_adminq_create_rx_queue) {
@@ -365,7 +368,7 @@ static int gve_adminq_create_rx_queue(struct gve_priv *priv, u32 queue_index)
 		.queue_resources_addr = cpu_to_be64(rx->q_resources_bus),
 		.rx_desc_ring_addr = cpu_to_be64(rx->desc.bus),
 		.rx_data_ring_addr = cpu_to_be64(rx->data.data_bus),
-		.queue_page_list_id = cpu_to_be32(rx->data.qpl->id),
+		.queue_page_list_id = cpu_to_be32(qpl_id),
 	};
 
 	err = gve_adminq_issue_cmd(priv, &cmd);
@@ -510,11 +513,11 @@ int gve_adminq_describe_device(struct gve_priv *priv)
 	mac = descriptor->mac;
 	dev_info(&priv->pdev->dev, "MAC addr: %pM\n", mac);
 	priv->tx_pages_per_qpl = be16_to_cpu(descriptor->tx_pages_per_qpl);
-	priv->rx_pages_per_qpl = be16_to_cpu(descriptor->rx_pages_per_qpl);
-	if (priv->rx_pages_per_qpl < priv->rx_desc_cnt) {
-		dev_err(&priv->pdev->dev, "rx_pages_per_qpl cannot be smaller than rx_desc_cnt, setting rx_desc_cnt down to %d.\n",
-			  priv->rx_pages_per_qpl);
-		priv->rx_desc_cnt = priv->rx_pages_per_qpl;
+	priv->rx_data_slot_cnt = be16_to_cpu(descriptor->rx_pages_per_qpl);
+	if (priv->rx_data_slot_cnt < priv->rx_desc_cnt) {
+		dev_err(&priv->pdev->dev, "rx_data_slot_cnt cannot be smaller than rx_desc_cnt, setting rx_desc_cnt down to %d.\n",
+			priv->rx_data_slot_cnt);
+		priv->rx_desc_cnt = priv->rx_data_slot_cnt;
 	}
 	priv->default_num_queues = be16_to_cpu(descriptor->default_num_queues);
 	dev_opt = (struct gve_device_option *)((void *)descriptor +
diff --git a/drivers/net/ethernet/google/gve/gve_desc.h b/drivers/net/ethernet/google/gve/gve_desc.h
index 54779871d52e..0aad314aefaf 100644
--- a/drivers/net/ethernet/google/gve/gve_desc.h
+++ b/drivers/net/ethernet/google/gve/gve_desc.h
@@ -72,12 +72,14 @@ struct gve_rx_desc {
 } __packed;
 static_assert(sizeof(struct gve_rx_desc) == 64);
 
-/* As with the Tx ring format, the qpl_offset entries below are offsets into an
- * ordered list of registered pages.
+/* If the device supports raw dma addressing then the addr in data slot is
+ * the dma address of the buffer.
+ * If the device only supports registered segments than the addr is a byte
+ * offset into the registered segment (an ordered list of pages) where the
+ * buffer is.
  */
 struct gve_rx_data_slot {
-	/* byte offset into the rx registered segment of this slot */
-	__be64 qpl_offset;
+	__be64 addr;
 };
 
 /* GVE Recive Packet Descriptor Seq No */
diff --git a/drivers/net/ethernet/google/gve/gve_main.c b/drivers/net/ethernet/google/gve/gve_main.c
index 3e9b69747370..de22c60d1fea 100644
--- a/drivers/net/ethernet/google/gve/gve_main.c
+++ b/drivers/net/ethernet/google/gve/gve_main.c
@@ -578,10 +578,8 @@ static void gve_free_rings(struct gve_priv *priv)
 
 int gve_alloc_page(struct gve_priv *priv, struct device *dev,
 		   struct page **page, dma_addr_t *dma,
-		   enum dma_data_direction dir)
+		   enum dma_data_direction dir, gfp_t gfp_flags)
 {
-	gfp_t gfp_flags = GFP_KERNEL;
-
 	if (priv->dma_mask == 24)
 		gfp_flags |= GFP_DMA;
 	else if (priv->dma_mask == 32)
@@ -596,6 +594,7 @@ int gve_alloc_page(struct gve_priv *priv, struct device *dev,
 	if (dma_mapping_error(dev, *dma)) {
 		priv->dma_mapping_error++;
 		put_page(*page);
+		*page = NULL;
 		return -ENOMEM;
 	}
 	return 0;
@@ -631,7 +630,7 @@ static int gve_alloc_queue_page_list(struct gve_priv *priv, u32 id,
 	for (i = 0; i < pages; i++) {
 		err = gve_alloc_page(priv, &priv->pdev->dev, &qpl->pages[i],
 				     &qpl->page_buses[i],
-				     gve_qpl_dma_dir(priv, id));
+				     gve_qpl_dma_dir(priv, id), GFP_KERNEL);
 		/* caller handles clean up */
 		if (err)
 			return -ENOMEM;
@@ -694,7 +693,7 @@ static int gve_alloc_qpls(struct gve_priv *priv)
 	}
 	for (; i < num_qpls; i++) {
 		err = gve_alloc_queue_page_list(priv, i,
-						priv->rx_pages_per_qpl);
+						priv->rx_data_slot_cnt);
 		if (err)
 			goto free_qpls;
 	}
diff --git a/drivers/net/ethernet/google/gve/gve_rx.c b/drivers/net/ethernet/google/gve/gve_rx.c
index 008fa897a3e6..ca12f267d08a 100644
--- a/drivers/net/ethernet/google/gve/gve_rx.c
+++ b/drivers/net/ethernet/google/gve/gve_rx.c
@@ -16,12 +16,22 @@ static void gve_rx_remove_from_block(struct gve_priv *priv, int queue_idx)
 	block->rx = NULL;
 }
 
+static void gve_rx_free_buffer(struct device *dev,
+			       struct gve_rx_slot_page_info *page_info,
+			       struct gve_rx_data_slot *data_slot)
+{
+	dma_addr_t dma = (dma_addr_t)(be64_to_cpu(data_slot->addr) -
+				      page_info->page_offset);
+
+	gve_free_page(dev, page_info->page, dma, DMA_FROM_DEVICE);
+}
+
 static void gve_rx_free_ring(struct gve_priv *priv, int idx)
 {
 	struct gve_rx_ring *rx = &priv->rx[idx];
 	struct device *dev = &priv->pdev->dev;
 	size_t bytes;
-	u32 slots;
+	u32 slots = rx->mask + 1;
 
 	gve_rx_remove_from_block(priv, idx);
 
@@ -33,11 +43,18 @@ static void gve_rx_free_ring(struct gve_priv *priv, int idx)
 			  rx->q_resources, rx->q_resources_bus);
 	rx->q_resources = NULL;
 
-	gve_unassign_qpl(priv, rx->data.qpl->id);
-	rx->data.qpl = NULL;
+	if (rx->data.raw_addressing) {
+		int i;
+
+		for (i = 0; i < slots; i++)
+			gve_rx_free_buffer(dev, &rx->data.page_info[i],
+					   &rx->data.data_ring[i]);
+	} else {
+		gve_unassign_qpl(priv, rx->data.qpl->id);
+		rx->data.qpl = NULL;
+	}
 	kvfree(rx->data.page_info);
 
-	slots = rx->mask + 1;
 	bytes = sizeof(*rx->data.data_ring) * slots;
 	dma_free_coherent(dev, bytes, rx->data.data_ring,
 			  rx->data.data_bus);
@@ -52,13 +69,14 @@ static void gve_setup_rx_buffer(struct gve_rx_slot_page_info *page_info,
 	page_info->page = page;
 	page_info->page_offset = 0;
 	page_info->page_address = page_address(page);
-	slot->qpl_offset = cpu_to_be64(addr);
+	slot->addr = cpu_to_be64(addr);
 }
 
 static int gve_prefill_rx_pages(struct gve_rx_ring *rx)
 {
 	struct gve_priv *priv = rx->gve;
 	u32 slots;
+	int err;
 	int i;
 
 	/* Allocate one page per Rx queue slot. Each page is split into two
@@ -71,12 +89,31 @@ static int gve_prefill_rx_pages(struct gve_rx_ring *rx)
 	if (!rx->data.page_info)
 		return -ENOMEM;
 
-	rx->data.qpl = gve_assign_rx_qpl(priv);
-
+	if (!rx->data.raw_addressing)
+		rx->data.qpl = gve_assign_rx_qpl(priv);
 	for (i = 0; i < slots; i++) {
-		struct page *page = rx->data.qpl->pages[i];
-		dma_addr_t addr = i * PAGE_SIZE;
+		struct page *page;
+		dma_addr_t addr;
+
+		if (rx->data.raw_addressing) {
+			err = gve_alloc_page(priv, &priv->pdev->dev, &page,
+					     &addr, DMA_FROM_DEVICE,
+							 GFP_KERNEL);
+			if (err) {
+				int j;
 
+				u64_stats_update_begin(&rx->statss);
+				rx->rx_buf_alloc_fail++;
+				u64_stats_update_end(&rx->statss);
+				for (j = 0; j < i; j++)
+					gve_free_page(&priv->pdev->dev, page,
+						      addr, DMA_FROM_DEVICE);
+				return err;
+			}
+		} else {
+			page = rx->data.qpl->pages[i];
+			addr = i * PAGE_SIZE;
+		}
 		gve_setup_rx_buffer(&rx->data.page_info[i],
 				    &rx->data.data_ring[i], addr, page);
 	}
@@ -110,8 +147,9 @@ static int gve_rx_alloc_ring(struct gve_priv *priv, int idx)
 	rx->gve = priv;
 	rx->q_num = idx;
 
-	slots = priv->rx_pages_per_qpl;
+	slots = priv->rx_data_slot_cnt;
 	rx->mask = slots - 1;
+	rx->data.raw_addressing = priv->raw_addressing;
 
 	/* alloc rx data ring */
 	bytes = sizeof(*rx->data.data_ring) * slots;
@@ -156,8 +194,8 @@ static int gve_rx_alloc_ring(struct gve_priv *priv, int idx)
 		err = -ENOMEM;
 		goto abort_with_q_resources;
 	}
-	rx->mask = slots - 1;
 	rx->cnt = 0;
+	rx->db_threshold = priv->rx_desc_cnt / 2;
 	rx->desc.seqno = 1;
 	gve_rx_add_to_block(priv, idx);
 
@@ -225,8 +263,7 @@ static enum pkt_hash_types gve_rss_type(__be16 pkt_flags)
 	return PKT_HASH_TYPE_L2;
 }
 
-static struct sk_buff *gve_rx_copy(struct gve_rx_ring *rx,
-				   struct net_device *dev,
+static struct sk_buff *gve_rx_copy(struct net_device *dev,
 				   struct napi_struct *napi,
 				   struct gve_rx_slot_page_info *page_info,
 				   u16 len)
@@ -244,15 +281,10 @@ static struct sk_buff *gve_rx_copy(struct gve_rx_ring *rx,
 
 	skb->protocol = eth_type_trans(skb, dev);
 
-	u64_stats_update_begin(&rx->statss);
-	rx->rx_copied_pkt++;
-	u64_stats_update_end(&rx->statss);
-
 	return skb;
 }
 
-static struct sk_buff *gve_rx_add_frags(struct net_device *dev,
-					struct napi_struct *napi,
+static struct sk_buff *gve_rx_add_frags(struct napi_struct *napi,
 					struct gve_rx_slot_page_info *page_info,
 					u16 len)
 {
@@ -268,14 +300,118 @@ static struct sk_buff *gve_rx_add_frags(struct net_device *dev,
 	return skb;
 }
 
-static void gve_rx_flip_buff(struct gve_rx_slot_page_info *page_info,
-			     struct gve_rx_data_slot *data_ring)
+static int gve_rx_alloc_buffer(struct gve_priv *priv, struct device *dev,
+			       struct gve_rx_slot_page_info *page_info,
+			       struct gve_rx_data_slot *data_slot,
+			       struct gve_rx_ring *rx)
+{
+	struct page *page;
+	dma_addr_t dma;
+	int err;
+
+	err = gve_alloc_page(priv, dev, &page, &dma, DMA_FROM_DEVICE, GFP_ATOMIC);
+	if (err) {
+		u64_stats_update_begin(&rx->statss);
+		rx->rx_buf_alloc_fail++;
+		u64_stats_update_end(&rx->statss);
+		return err;
+	}
+
+	gve_setup_rx_buffer(page_info, data_slot, dma, page);
+	return 0;
+}
+
+static void gve_rx_flip_buffer(struct gve_rx_slot_page_info *page_info,
+			       struct gve_rx_data_slot *data_slot)
 {
-	u64 addr = be64_to_cpu(data_ring->qpl_offset);
+	u64 addr = be64_to_cpu(data_slot->addr);
 
+	/* "flip" to other packet buffer on this page */
 	page_info->page_offset ^= PAGE_SIZE / 2;
 	addr ^= PAGE_SIZE / 2;
-	data_ring->qpl_offset = cpu_to_be64(addr);
+	data_slot->addr = cpu_to_be64(addr);
+}
+
+static bool gve_rx_can_flip_buffers(struct net_device *netdev)
+{
+#if PAGE_SIZE == 4096
+	/* We can't flip a buffer if we can't fit a packet
+	 * into half a page.
+	 */
+	if (netdev->max_mtu + GVE_RX_PAD + ETH_HLEN  > PAGE_SIZE / 2)
+		return false;
+	return true;
+#else
+	/* PAGE_SIZE != 4096 - don't try to reuse */
+	return false;
+#endif
+}
+
+static int gve_rx_can_recycle_buffer(struct page *page)
+{
+	int pagecount = page_count(page);
+
+	/* This page is not being used by any SKBs - reuse */
+	if (pagecount == 1) {
+		return 1;
+	/* This page is still being used by an SKB - we can't reuse */
+	} else if (pagecount >= 2) {
+		return 0;
+	}
+	WARN(pagecount < 1, "Pagecount should never be < 1");
+	return -1;
+}
+
+static struct sk_buff *
+gve_rx_raw_addressing(struct device *dev, struct net_device *netdev,
+		      struct gve_rx_slot_page_info *page_info, u16 len,
+		      struct napi_struct *napi,
+		      struct gve_rx_data_slot *data_slot, bool can_flip)
+{
+	struct sk_buff *skb = gve_rx_add_frags(napi, page_info, len);
+
+	if (!skb)
+		return NULL;
+
+	/* Optimistically stop the kernel from freeing the page by increasing
+	 * the page bias. We will check the refcount in refill to determine if
+	 * we need to alloc a new page.
+	 */
+	get_page(page_info->page);
+	page_info->can_flip = can_flip;
+
+	return skb;
+}
+
+static struct sk_buff *
+gve_rx_qpl(struct device *dev, struct net_device *netdev,
+	   struct gve_rx_ring *rx, struct gve_rx_slot_page_info *page_info,
+	   u16 len, struct napi_struct *napi,
+	   struct gve_rx_data_slot *data_slot, bool recycle)
+{
+	struct sk_buff *skb;
+	/* if raw_addressing mode is not enabled gvnic can only receive into
+	 * registered segments. If the buffer can't be recycled, our only
+	 * choice is to copy the data out of it so that we can return it to the
+	 * device.
+	 */
+	if (recycle) {
+		skb = gve_rx_add_frags(napi, page_info, len);
+		/* No point in recycling if we didn't get the skb */
+		if (skb) {
+			/* Make sure the networking stack can't free the page */
+			get_page(page_info->page);
+			gve_rx_flip_buffer(page_info, data_slot);
+		}
+	} else {
+		skb = gve_rx_copy(netdev, napi, page_info, len);
+		if (skb) {
+			u64_stats_update_begin(&rx->statss);
+			rx->rx_copied_pkt++;
+			u64_stats_update_end(&rx->statss);
+		}
+	}
+	return skb;
 }
 
 static bool gve_rx(struct gve_rx_ring *rx, struct gve_rx_desc *rx_desc,
@@ -284,9 +420,10 @@ static bool gve_rx(struct gve_rx_ring *rx, struct gve_rx_desc *rx_desc,
 	struct gve_rx_slot_page_info *page_info;
 	struct gve_priv *priv = rx->gve;
 	struct napi_struct *napi = &priv->ntfy_blocks[rx->ntfy_id].napi;
-	struct net_device *dev = priv->dev;
-	struct sk_buff *skb;
-	int pagecount;
+	struct net_device *netdev = priv->dev;
+	struct gve_rx_data_slot *data_slot;
+	struct sk_buff *skb = NULL;
+	dma_addr_t page_bus;
 	u16 len;
 
 	/* drop this packet */
@@ -294,71 +431,56 @@ static bool gve_rx(struct gve_rx_ring *rx, struct gve_rx_desc *rx_desc,
 		u64_stats_update_begin(&rx->statss);
 		rx->rx_desc_err_dropped_pkt++;
 		u64_stats_update_end(&rx->statss);
-		return true;
+		return false;
 	}
 
 	len = be16_to_cpu(rx_desc->len) - GVE_RX_PAD;
 	page_info = &rx->data.page_info[idx];
-	dma_sync_single_for_cpu(&priv->pdev->dev, rx->data.qpl->page_buses[idx],
-				PAGE_SIZE, DMA_FROM_DEVICE);
 
-	/* gvnic can only receive into registered segments. If the buffer
-	 * can't be recycled, our only choice is to copy the data out of
-	 * it so that we can return it to the device.
-	 */
+	data_slot = &rx->data.data_ring[idx];
+	page_bus = (rx->data.raw_addressing) ?
+					be64_to_cpu(data_slot->addr) - page_info->page_offset :
+					rx->data.qpl->page_buses[idx];
+	dma_sync_single_for_cpu(&priv->pdev->dev, page_bus,
+				PAGE_SIZE, DMA_FROM_DEVICE);
 
-	if (PAGE_SIZE == 4096) {
-		if (len <= priv->rx_copybreak) {
-			/* Just copy small packets */
-			skb = gve_rx_copy(rx, dev, napi, page_info, len);
+	if (len <= priv->rx_copybreak) {
+		/* Just copy small packets */
+		skb = gve_rx_copy(netdev, napi, page_info, len);
+		if (skb) {
 			u64_stats_update_begin(&rx->statss);
+			rx->rx_copied_pkt++;
 			rx->rx_copybreak_pkt++;
 			u64_stats_update_end(&rx->statss);
-			goto have_skb;
 		}
-		if (unlikely(!gve_can_recycle_pages(dev))) {
-			skb = gve_rx_copy(rx, dev, napi, page_info, len);
-			goto have_skb;
-		}
-		pagecount = page_count(page_info->page);
-		if (pagecount == 1) {
-			/* No part of this page is used by any SKBs; we attach
-			 * the page fragment to a new SKB and pass it up the
-			 * stack.
-			 */
-			skb = gve_rx_add_frags(dev, napi, page_info, len);
-			if (!skb) {
-				u64_stats_update_begin(&rx->statss);
-				rx->rx_skb_alloc_fail++;
-				u64_stats_update_end(&rx->statss);
-				return true;
+	} else {
+		bool can_flip = gve_rx_can_flip_buffers(netdev);
+		int recycle = 0;
+
+		if (can_flip) {
+			recycle = gve_rx_can_recycle_buffer(page_info->page);
+			if (recycle < 0) {
+				gve_schedule_reset(priv);
+				return false;
 			}
-			/* Make sure the kernel stack can't release the page */
-			get_page(page_info->page);
-			/* "flip" to other packet buffer on this page */
-			gve_rx_flip_buff(page_info, &rx->data.data_ring[idx]);
-		} else if (pagecount >= 2) {
-			/* We have previously passed the other half of this
-			 * page up the stack, but it has not yet been freed.
-			 */
-			skb = gve_rx_copy(rx, dev, napi, page_info, len);
+		}
+		if (rx->data.raw_addressing) {
+			skb = gve_rx_raw_addressing(&priv->pdev->dev, netdev,
+						    page_info, len, napi,
+						    data_slot,
+						    can_flip && recycle);
 		} else {
-			WARN(pagecount < 1, "Pagecount should never be < 1");
-			return false;
+			skb = gve_rx_qpl(&priv->pdev->dev, netdev, rx,
+					 page_info, len, napi, data_slot,
+					 can_flip && recycle);
 		}
-	} else {
-		skb = gve_rx_copy(rx, dev, napi, page_info, len);
 	}
 
-have_skb:
-	/* We didn't manage to allocate an skb but we haven't had any
-	 * reset worthy failures.
-	 */
 	if (!skb) {
 		u64_stats_update_begin(&rx->statss);
 		rx->rx_skb_alloc_fail++;
 		u64_stats_update_end(&rx->statss);
-		return true;
+		return false;
 	}
 
 	if (likely(feat & NETIF_F_RXCSUM)) {
@@ -380,6 +502,7 @@ static bool gve_rx(struct gve_rx_ring *rx, struct gve_rx_desc *rx_desc,
 		napi_gro_frags(napi);
 	else
 		napi_gro_receive(napi, skb);
+
 	return true;
 }
 
@@ -399,19 +522,72 @@ static bool gve_rx_work_pending(struct gve_rx_ring *rx)
 	return (GVE_SEQNO(flags_seq) == rx->desc.seqno);
 }
 
+static bool gve_rx_refill_buffers(struct gve_priv *priv, struct gve_rx_ring *rx)
+{
+	u32 fill_cnt = rx->fill_cnt;
+
+	while ((fill_cnt & rx->mask) != (rx->cnt & rx->mask)) {
+		u32 idx = fill_cnt & rx->mask;
+		struct gve_rx_slot_page_info *page_info =
+						&rx->data.page_info[idx];
+
+		if (page_info->can_flip) {
+			/* The other half of the page is free because it was
+			 * free when we processed the descriptor. Flip to it.
+			 */
+			struct gve_rx_data_slot *data_slot =
+						&rx->data.data_ring[idx];
+
+			gve_rx_flip_buffer(page_info, data_slot);
+			page_info->can_flip = false;
+		} else {
+			/* It is possible that the networking stack has already
+			 * finished processing all outstanding packets in the buffer
+			 * and it can be reused.
+			 * Flipping is unceccessary here - if the networking stack still
+			 * owns half the page it is impossible to tell which half. Either
+			 * the whole page is free or it needs to be replaced.
+			 */
+			int recycle = gve_rx_can_recycle_buffer(page_info->page);
+
+			if (recycle < 0) {
+				gve_schedule_reset(priv);
+				return false;
+			}
+			if (!recycle) {
+				/* We can't reuse the buffer - alloc a new one*/
+				struct gve_rx_data_slot *data_slot =
+						&rx->data.data_ring[idx];
+				struct device *dev = &priv->pdev->dev;
+
+				gve_rx_free_buffer(dev, page_info, data_slot);
+				page_info->page = NULL;
+				if (gve_rx_alloc_buffer(priv, dev, page_info,
+							data_slot, rx)) {
+					break;
+				}
+			}
+		}
+		fill_cnt++;
+	}
+	rx->fill_cnt = fill_cnt;
+	return true;
+}
+
 bool gve_clean_rx_done(struct gve_rx_ring *rx, int budget,
 		       netdev_features_t feat)
 {
 	struct gve_priv *priv = rx->gve;
+	u32 work_done = 0, packets = 0;
 	struct gve_rx_desc *desc;
 	u32 cnt = rx->cnt;
 	u32 idx = cnt & rx->mask;
-	u32 work_done = 0;
 	u64 bytes = 0;
 
 	desc = rx->desc.desc_ring + idx;
 	while ((GVE_SEQNO(desc->flags_seq) == rx->desc.seqno) &&
 	       work_done < budget) {
+		bool dropped;
 		netif_info(priv, rx_status, priv->dev,
 			   "[%d] idx=%d desc=%p desc->flags_seq=0x%x\n",
 			   rx->q_num, idx, desc, desc->flags_seq);
@@ -419,9 +595,11 @@ bool gve_clean_rx_done(struct gve_rx_ring *rx, int budget,
 			   "[%d] seqno=%d rx->desc.seqno=%d\n",
 			   rx->q_num, GVE_SEQNO(desc->flags_seq),
 			   rx->desc.seqno);
-		bytes += be16_to_cpu(desc->len) - GVE_RX_PAD;
-		if (!gve_rx(rx, desc, feat, idx))
-			gve_schedule_reset(priv);
+		dropped = !gve_rx(rx, desc, feat, idx);
+		if (!dropped) {
+			bytes += be16_to_cpu(desc->len) - GVE_RX_PAD;
+			packets++;
+		}
 		cnt++;
 		idx = cnt & rx->mask;
 		desc = rx->desc.desc_ring + idx;
@@ -433,11 +611,21 @@ bool gve_clean_rx_done(struct gve_rx_ring *rx, int budget,
 		return false;
 
 	u64_stats_update_begin(&rx->statss);
-	rx->rpackets += work_done;
+	rx->rpackets += packets;
 	rx->rbytes += bytes;
 	u64_stats_update_end(&rx->statss);
 	rx->cnt = cnt;
-	rx->fill_cnt += work_done;
+	/* restock ring slots */
+	if (!rx->data.raw_addressing) {
+		/* In QPL mode buffs are refilled as the desc are processed */
+		rx->fill_cnt += work_done;
+	} else if (rx->fill_cnt - cnt <= rx->db_threshold) {
+		/* In raw addressing mode buffs are only refilled if the avail
+		 * falls below a threshold.
+		 */
+		if (!gve_rx_refill_buffers(priv, rx))
+			return false;
+	}
 
 	gve_rx_write_doorbell(priv, rx);
 	return gve_rx_work_pending(rx);
-- 
2.28.0.220.ged08abb693-goog


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH net-next 11/18] gve: Add support for raw addressing in the tx path
  2020-08-18 19:43 [PATCH net-next 00/18] GVE Driver v1.1.0 Features David Awogbemila
                   ` (9 preceding siblings ...)
  2020-08-18 19:44 ` [PATCH net-next 10/18] gve: Add support for raw addressing to the rx path David Awogbemila
@ 2020-08-18 19:44 ` David Awogbemila
  2020-08-18 19:44 ` [PATCH net-next 12/18] gve: Add netif_set_xps_queue call David Awogbemila
                   ` (7 subsequent siblings)
  18 siblings, 0 replies; 38+ messages in thread
From: David Awogbemila @ 2020-08-18 19:44 UTC (permalink / raw)
  To: netdev; +Cc: Catherine Sullivan, Yangchun Fu, David Awogbemila

From: Catherine Sullivan <csully@google.com>

If raw addressing is supported, don't setup the tx fifo or use it.
Instead dma_map the skb's data buffer and pass the dma address down in
the descriptors. Store the mapping to unmap it during clean.

This means that the device can perform DMA directly from these addresses
and the driver does not have to copy the buffer content into
pre-allocated buffers/qpls (as in qpl mode)

Reviewed-by: Yangchun Fu <yangchun@google.com>
Signed-off-by: Catherine Sullivan <csully@google.com>
Signed-off-by: David Awogbemila <awogbemila@google.com>
---
 drivers/net/ethernet/google/gve/gve.h      |  18 +-
 drivers/net/ethernet/google/gve/gve_desc.h |   8 +-
 drivers/net/ethernet/google/gve/gve_tx.c   | 208 +++++++++++++++++----
 3 files changed, 192 insertions(+), 42 deletions(-)

diff --git a/drivers/net/ethernet/google/gve/gve.h b/drivers/net/ethernet/google/gve/gve.h
index c86cec163bd6..c0f0b22c1ec0 100644
--- a/drivers/net/ethernet/google/gve/gve.h
+++ b/drivers/net/ethernet/google/gve/gve.h
@@ -112,12 +112,20 @@ struct gve_tx_iovec {
 	u32 iov_padding; /* padding associated with this segment */
 };
 
+struct gve_tx_dma_buf {
+	DEFINE_DMA_UNMAP_ADDR(dma);
+	DEFINE_DMA_UNMAP_LEN(len);
+};
+
 /* Tracks the memory in the fifo occupied by the skb. Mapped 1:1 to a desc
  * ring entry but only used for a pkt_desc not a seg_desc
  */
 struct gve_tx_buffer_state {
 	struct sk_buff *skb; /* skb for this pkt */
-	struct gve_tx_iovec iov[GVE_TX_MAX_IOVEC]; /* segments of this pkt */
+	union {
+		struct gve_tx_iovec iov[GVE_TX_MAX_IOVEC]; /* segments of this pkt */
+		struct gve_tx_dma_buf buf;
+	};
 };
 
 /* A TX buffer - each queue has one */
@@ -140,13 +148,16 @@ struct gve_tx_ring {
 	__be32 last_nic_done ____cacheline_aligned; /* NIC tail pointer */
 	u64 pkt_done; /* free-running - total packets completed */
 	u64 bytes_done; /* free-running - total bytes completed */
+	u32 dropped_pkt; /* free-running - total packets dropped */
 
 	/* Cacheline 2 -- Read-mostly fields */
 	union gve_tx_desc *desc ____cacheline_aligned;
 	struct gve_tx_buffer_state *info; /* Maps 1:1 to a desc */
 	struct netdev_queue *netdev_txq;
 	struct gve_queue_resources *q_resources; /* head and tail pointer idx */
+	struct device *dev;
 	u32 mask; /* masks req and done down to queue size */
+	bool raw_addressing; /* use raw_addressing? */
 
 	/* Slow-path fields */
 	u32 q_num ____cacheline_aligned; /* queue idx */
@@ -447,7 +458,10 @@ static inline u32 gve_rx_idx_to_ntfy(struct gve_priv *priv, u32 queue_idx)
  */
 static inline u32 gve_num_tx_qpls(struct gve_priv *priv)
 {
-	return priv->tx_cfg.num_queues;
+	if (priv->raw_addressing)
+		return 0;
+	else
+		return priv->tx_cfg.num_queues;
 }
 
 /* Returns the number of rx queue page lists
diff --git a/drivers/net/ethernet/google/gve/gve_desc.h b/drivers/net/ethernet/google/gve/gve_desc.h
index 0aad314aefaf..a7da364e81c8 100644
--- a/drivers/net/ethernet/google/gve/gve_desc.h
+++ b/drivers/net/ethernet/google/gve/gve_desc.h
@@ -16,9 +16,11 @@
  * Base addresses encoded in seg_addr are not assumed to be physical
  * addresses. The ring format assumes these come from some linear address
  * space. This could be physical memory, kernel virtual memory, user virtual
- * memory. gVNIC uses lists of registered pages. Each queue is assumed
- * to be associated with a single such linear address space to ensure a
- * consistent meaning for seg_addrs posted to its rings.
+ * memory.
+ * If raw dma addressing is not supported then gVNIC uses lists of registered
+ * pages. Each queue is assumed to be associated with a single such linear
+ * address space to ensure a consistent meaning for seg_addrs posted to its
+ * rings.
  */
 
 struct gve_tx_pkt_desc {
diff --git a/drivers/net/ethernet/google/gve/gve_tx.c b/drivers/net/ethernet/google/gve/gve_tx.c
index d0244feb0301..01d26bdca9b1 100644
--- a/drivers/net/ethernet/google/gve/gve_tx.c
+++ b/drivers/net/ethernet/google/gve/gve_tx.c
@@ -158,9 +158,11 @@ static void gve_tx_free_ring(struct gve_priv *priv, int idx)
 			  tx->q_resources, tx->q_resources_bus);
 	tx->q_resources = NULL;
 
-	gve_tx_fifo_release(priv, &tx->tx_fifo);
-	gve_unassign_qpl(priv, tx->tx_fifo.qpl->id);
-	tx->tx_fifo.qpl = NULL;
+	if (!tx->raw_addressing) {
+		gve_tx_fifo_release(priv, &tx->tx_fifo);
+		gve_unassign_qpl(priv, tx->tx_fifo.qpl->id);
+		tx->tx_fifo.qpl = NULL;
+	}
 
 	bytes = sizeof(*tx->desc) * slots;
 	dma_free_coherent(hdev, bytes, tx->desc, tx->bus);
@@ -206,11 +208,15 @@ static int gve_tx_alloc_ring(struct gve_priv *priv, int idx)
 	if (!tx->desc)
 		goto abort_with_info;
 
-	tx->tx_fifo.qpl = gve_assign_tx_qpl(priv);
+	tx->raw_addressing = priv->raw_addressing;
+	tx->dev = &priv->pdev->dev;
+	if (!tx->raw_addressing) {
+		tx->tx_fifo.qpl = gve_assign_tx_qpl(priv);
 
-	/* map Tx FIFO */
-	if (gve_tx_fifo_init(priv, &tx->tx_fifo))
-		goto abort_with_desc;
+		/* map Tx FIFO */
+		if (gve_tx_fifo_init(priv, &tx->tx_fifo))
+			goto abort_with_desc;
+	}
 
 	tx->q_resources =
 		dma_alloc_coherent(hdev,
@@ -228,7 +234,8 @@ static int gve_tx_alloc_ring(struct gve_priv *priv, int idx)
 	return 0;
 
 abort_with_fifo:
-	gve_tx_fifo_release(priv, &tx->tx_fifo);
+	if (!tx->raw_addressing)
+		gve_tx_fifo_release(priv, &tx->tx_fifo);
 abort_with_desc:
 	dma_free_coherent(hdev, bytes, tx->desc, tx->bus);
 	tx->desc = NULL;
@@ -301,27 +308,47 @@ static inline int gve_skb_fifo_bytes_required(struct gve_tx_ring *tx,
 	return bytes;
 }
 
-/* The most descriptors we could need are 3 - 1 for the headers, 1 for
- * the beginning of the payload at the end of the FIFO, and 1 if the
- * payload wraps to the beginning of the FIFO.
+/* The most descriptors we could need is MAX_SKB_FRAGS + 3 : 1 for each skb frag,
+ * +1 for the skb linear portion, +1 for when tcp hdr needs to be in separate descriptor,
+ * and +1 if the payload wraps to the beginning of the FIFO.
  */
-#define MAX_TX_DESC_NEEDED	3
+#define MAX_TX_DESC_NEEDED	(MAX_SKB_FRAGS + 3)
+static void gve_tx_unmap_buf(struct device *dev, struct gve_tx_buffer_state *info)
+{
+	if (info->skb) {
+		dma_unmap_single(dev, dma_unmap_addr(&info->buf, dma),
+				 dma_unmap_len(&info->buf, len),
+				 DMA_TO_DEVICE);
+		dma_unmap_len_set(&info->buf, len, 0);
+	} else {
+		dma_unmap_page(dev, dma_unmap_addr(&info->buf, dma),
+			       dma_unmap_len(&info->buf, len),
+			       DMA_TO_DEVICE);
+		dma_unmap_len_set(&info->buf, len, 0);
+	}
+}
 
 /* Check if sufficient resources (descriptor ring space, FIFO space) are
  * available to transmit the given number of bytes.
  */
 static inline bool gve_can_tx(struct gve_tx_ring *tx, int bytes_required)
 {
-	return (gve_tx_avail(tx) >= MAX_TX_DESC_NEEDED &&
-		gve_tx_fifo_can_alloc(&tx->tx_fifo, bytes_required));
+	bool can_alloc = true;
+
+	if (!tx->raw_addressing)
+		can_alloc = gve_tx_fifo_can_alloc(&tx->tx_fifo, bytes_required);
+
+	return (gve_tx_avail(tx) >= MAX_TX_DESC_NEEDED && can_alloc);
 }
 
 /* Stops the queue if the skb cannot be transmitted. */
 static int gve_maybe_stop_tx(struct gve_tx_ring *tx, struct sk_buff *skb)
 {
-	int bytes_required;
+	int bytes_required = 0;
+
+	if (!tx->raw_addressing)
+		bytes_required = gve_skb_fifo_bytes_required(tx, skb);
 
-	bytes_required = gve_skb_fifo_bytes_required(tx, skb);
 	if (likely(gve_can_tx(tx, bytes_required)))
 		return 0;
 
@@ -390,22 +417,23 @@ static void gve_tx_fill_seg_desc(union gve_tx_desc *seg_desc,
 	seg_desc->seg.seg_addr = cpu_to_be64(addr);
 }
 
-static void gve_dma_sync_for_device(struct device *dev, dma_addr_t *page_buses,
-				    u64 iov_offset, u64 iov_len)
+static void gve_dma_sync_for_device(struct gve_priv *priv,
+				    dma_addr_t *page_buses,
+								u64 iov_offset, u64 iov_len)
 {
 	u64 last_page = (iov_offset + iov_len - 1) / PAGE_SIZE;
 	u64 first_page = iov_offset / PAGE_SIZE;
-	dma_addr_t dma;
 	u64 page;
 
 	for (page = first_page; page <= last_page; page++) {
-		dma = page_buses[page];
-		dma_sync_single_for_device(dev, dma, PAGE_SIZE, DMA_TO_DEVICE);
+		dma_addr_t dma = page_buses[page];
+
+		dma_sync_single_for_device(&priv->pdev->dev, dma, PAGE_SIZE,
+					   DMA_TO_DEVICE);
 	}
 }
 
-static int gve_tx_add_skb(struct gve_tx_ring *tx, struct sk_buff *skb,
-			  struct device *dev)
+static int gve_tx_add_skb_copy(struct gve_priv *priv, struct gve_tx_ring *tx, struct sk_buff *skb)
 {
 	int pad_bytes, hlen, hdr_nfrags, payload_nfrags, l4_hdr_offset;
 	union gve_tx_desc *pkt_desc, *seg_desc;
@@ -447,7 +475,7 @@ static int gve_tx_add_skb(struct gve_tx_ring *tx, struct sk_buff *skb,
 	skb_copy_bits(skb, 0,
 		      tx->tx_fifo.base + info->iov[hdr_nfrags - 1].iov_offset,
 		      hlen);
-	gve_dma_sync_for_device(dev, tx->tx_fifo.qpl->page_buses,
+	gve_dma_sync_for_device(priv, tx->tx_fifo.qpl->page_buses,
 				info->iov[hdr_nfrags - 1].iov_offset,
 				info->iov[hdr_nfrags - 1].iov_len);
 	copy_offset = hlen;
@@ -463,7 +491,7 @@ static int gve_tx_add_skb(struct gve_tx_ring *tx, struct sk_buff *skb,
 		skb_copy_bits(skb, copy_offset,
 			      tx->tx_fifo.base + info->iov[i].iov_offset,
 			      info->iov[i].iov_len);
-		gve_dma_sync_for_device(dev, tx->tx_fifo.qpl->page_buses,
+		gve_dma_sync_for_device(priv, tx->tx_fifo.qpl->page_buses,
 					info->iov[i].iov_offset,
 					info->iov[i].iov_len);
 		copy_offset += info->iov[i].iov_len;
@@ -472,6 +500,98 @@ static int gve_tx_add_skb(struct gve_tx_ring *tx, struct sk_buff *skb,
 	return 1 + payload_nfrags;
 }
 
+static int gve_tx_add_skb_no_copy(struct gve_priv *priv, struct gve_tx_ring *tx,
+				  struct sk_buff *skb)
+{
+	const struct skb_shared_info *shinfo = skb_shinfo(skb);
+	int hlen, payload_nfrags, l4_hdr_offset, seg_idx_bias;
+	union gve_tx_desc *pkt_desc, *seg_desc;
+	struct gve_tx_buffer_state *info;
+	bool is_gso = skb_is_gso(skb);
+	u32 idx = tx->req & tx->mask;
+	struct gve_tx_dma_buf *buf;
+	int last_mapped = 0;
+	u64 addr;
+	u32 len;
+	int i;
+
+	info = &tx->info[idx];
+	pkt_desc = &tx->desc[idx];
+
+	l4_hdr_offset = skb_checksum_start_offset(skb);
+	/* If the skb is gso, then we want only up to the tcp header in the first segment
+	 * to efficiently replicate on each segment otherwise we want the linear portion
+	 * of the skb (which will contain the checksum because skb->csum_start and
+	 * skb->csum_offset are given relative to skb->head) in the first segment.
+	 */
+	hlen = is_gso ? l4_hdr_offset + tcp_hdrlen(skb) :
+			skb_headlen(skb);
+	len = skb_headlen(skb);
+
+	info->skb =  skb;
+
+	addr = dma_map_single(tx->dev, skb->data, len, DMA_TO_DEVICE);
+	if (unlikely(dma_mapping_error(tx->dev, addr))) {
+		priv->dma_mapping_error++;
+		goto drop;
+	}
+	buf = &info->buf;
+	dma_unmap_len_set(buf, len, len);
+	dma_unmap_addr_set(buf, dma, addr);
+
+	payload_nfrags = shinfo->nr_frags;
+	if (hlen < len) {
+		/* For gso the rest of the linear portion of the skb needs to
+		 * be in its own descriptor.
+		 */
+		payload_nfrags++;
+		gve_tx_fill_pkt_desc(pkt_desc, skb, is_gso, l4_hdr_offset,
+				     1 + payload_nfrags, hlen, addr);
+
+		len -= hlen;
+		addr += hlen;
+		seg_desc = &tx->desc[(tx->req + 1) & tx->mask];
+		seg_idx_bias = 2;
+		gve_tx_fill_seg_desc(seg_desc, skb, is_gso, len, addr);
+	} else {
+		seg_idx_bias = 1;
+		gve_tx_fill_pkt_desc(pkt_desc, skb, is_gso, l4_hdr_offset,
+				     1 + payload_nfrags, hlen, addr);
+	}
+	idx = (tx->req + seg_idx_bias) & tx->mask;
+
+	for (i = 0; i < payload_nfrags - (seg_idx_bias - 1); i++) {
+		const skb_frag_t *frag = &shinfo->frags[i];
+
+		seg_desc = &tx->desc[idx];
+		len = skb_frag_size(frag);
+		addr = skb_frag_dma_map(tx->dev, frag, 0, len, DMA_TO_DEVICE);
+		if (unlikely(dma_mapping_error(tx->dev, addr))) {
+			priv->dma_mapping_error++;
+			goto unmap_drop;
+		}
+		buf = &tx->info[idx].buf;
+		tx->info[idx].skb = NULL;
+		dma_unmap_len_set(buf, len, len);
+		dma_unmap_addr_set(buf, dma, addr);
+
+		gve_tx_fill_seg_desc(seg_desc, skb, is_gso, len, addr);
+		idx = (idx + 1) & tx->mask;
+	}
+
+	return 1 + payload_nfrags;
+
+unmap_drop:
+	i--;
+	for (last_mapped = i + seg_idx_bias; last_mapped >= 0; last_mapped--) {
+		idx = (tx->req + last_mapped) & tx->mask;
+		gve_tx_unmap_buf(tx->dev, &tx->info[idx]);
+	}
+drop:
+	tx->dropped_pkt++;
+	return 0;
+}
+
 netdev_tx_t gve_tx(struct sk_buff *skb, struct net_device *dev)
 {
 	struct gve_priv *priv = netdev_priv(dev);
@@ -490,12 +610,20 @@ netdev_tx_t gve_tx(struct sk_buff *skb, struct net_device *dev)
 		gve_tx_put_doorbell(priv, tx->q_resources, tx->req);
 		return NETDEV_TX_BUSY;
 	}
-	nsegs = gve_tx_add_skb(tx, skb, &priv->pdev->dev);
-
-	netdev_tx_sent_queue(tx->netdev_txq, skb->len);
-	skb_tx_timestamp(skb);
+	if (tx->raw_addressing)
+		nsegs = gve_tx_add_skb_no_copy(priv, tx, skb);
+	else
+		nsegs = gve_tx_add_skb_copy(priv, tx, skb);
+
+	/* If the packet is getting sent, we need to update the skb */
+	if (nsegs) {
+		netdev_tx_sent_queue(tx->netdev_txq, skb->len);
+		skb_tx_timestamp(skb);
+	}
 
-	/* give packets to NIC */
+	/* Give packets to NIC. Even if this packet failed to send the doorbell
+	 * might need to be rung because of xmit_more.
+	 */
 	tx->req += nsegs;
 
 	if (!netif_xmit_stopped(tx->netdev_txq) && netdev_xmit_more())
@@ -525,24 +653,30 @@ static int gve_clean_tx_done(struct gve_priv *priv, struct gve_tx_ring *tx,
 		info = &tx->info[idx];
 		skb = info->skb;
 
+		/* Unmap the buffer */
+		if (tx->raw_addressing)
+			gve_tx_unmap_buf(tx->dev, info);
 		/* Mark as free */
 		if (skb) {
 			info->skb = NULL;
 			bytes += skb->len;
 			pkts++;
 			dev_consume_skb_any(skb);
-			/* FIFO free */
-			for (i = 0; i < ARRAY_SIZE(info->iov); i++) {
-				space_freed += info->iov[i].iov_len +
-					       info->iov[i].iov_padding;
-				info->iov[i].iov_len = 0;
-				info->iov[i].iov_padding = 0;
+			if (!tx->raw_addressing) {
+				/* FIFO free */
+				for (i = 0; i < ARRAY_SIZE(info->iov); i++) {
+					space_freed += info->iov[i].iov_len +
+						       info->iov[i].iov_padding;
+					info->iov[i].iov_len = 0;
+					info->iov[i].iov_padding = 0;
+				}
 			}
 		}
 		tx->done++;
 	}
 
-	gve_tx_free_fifo(&tx->tx_fifo, space_freed);
+	if (!tx->raw_addressing)
+		gve_tx_free_fifo(&tx->tx_fifo, space_freed);
 	u64_stats_update_begin(&tx->statss);
 	tx->bytes_done += bytes;
 	tx->pkt_done += pkts;
-- 
2.28.0.220.ged08abb693-goog


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH net-next 12/18] gve: Add netif_set_xps_queue call
  2020-08-18 19:43 [PATCH net-next 00/18] GVE Driver v1.1.0 Features David Awogbemila
                   ` (10 preceding siblings ...)
  2020-08-18 19:44 ` [PATCH net-next 11/18] gve: Add support for raw addressing in the tx path David Awogbemila
@ 2020-08-18 19:44 ` David Awogbemila
  2020-08-18 19:44 ` [PATCH net-next 13/18] gve: Add rx buffer pagecnt bias David Awogbemila
                   ` (6 subsequent siblings)
  18 siblings, 0 replies; 38+ messages in thread
From: David Awogbemila @ 2020-08-18 19:44 UTC (permalink / raw)
  To: netdev; +Cc: Catherine Sullivan, Yangchun Fu, David Awogbemila

From: Catherine Sullivan <csully@google.com>

Configure XPS when adding tx queues to the notification blocks.

Reviewed-by: Yangchun Fu <yangchun@google.com>
Signed-off-by: Catherine Sullivan <csully@google.com>
Signed-off-by: David Awogbemila <awogbemila@google.com>
---
 drivers/net/ethernet/google/gve/gve_tx.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/net/ethernet/google/gve/gve_tx.c b/drivers/net/ethernet/google/gve/gve_tx.c
index 01d26bdca9b1..5d75dec1c520 100644
--- a/drivers/net/ethernet/google/gve/gve_tx.c
+++ b/drivers/net/ethernet/google/gve/gve_tx.c
@@ -176,12 +176,16 @@ static void gve_tx_free_ring(struct gve_priv *priv, int idx)
 
 static void gve_tx_add_to_block(struct gve_priv *priv, int queue_idx)
 {
+	unsigned int active_cpus = min_t(int, priv->num_ntfy_blks / 2,
+					 num_online_cpus());
 	int ntfy_idx = gve_tx_idx_to_ntfy(priv, queue_idx);
 	struct gve_notify_block *block = &priv->ntfy_blocks[ntfy_idx];
 	struct gve_tx_ring *tx = &priv->tx[queue_idx];
 
 	block->tx = tx;
 	tx->ntfy_id = ntfy_idx;
+	netif_set_xps_queue(priv->dev, get_cpu_mask(ntfy_idx % active_cpus),
+			    queue_idx);
 }
 
 static int gve_tx_alloc_ring(struct gve_priv *priv, int idx)
-- 
2.28.0.220.ged08abb693-goog


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH net-next 13/18] gve: Add rx buffer pagecnt bias.
  2020-08-18 19:43 [PATCH net-next 00/18] GVE Driver v1.1.0 Features David Awogbemila
                   ` (11 preceding siblings ...)
  2020-08-18 19:44 ` [PATCH net-next 12/18] gve: Add netif_set_xps_queue call David Awogbemila
@ 2020-08-18 19:44 ` David Awogbemila
  2020-08-18 19:44 ` [PATCH net-next 14/18] gve: Move the irq db indexes out of the ntfy block struct David Awogbemila
                   ` (5 subsequent siblings)
  18 siblings, 0 replies; 38+ messages in thread
From: David Awogbemila @ 2020-08-18 19:44 UTC (permalink / raw)
  To: netdev; +Cc: Catherine Sullivan, Yangchun Fu, David Awogbemila

From: Catherine Sullivan <csully@google.com>

Add a pagecnt bias field to rx buffer info struct to eliminate
needing to increment the atomic page ref count on every pass in the
rx hotpath.

We now keep track of whether the nic has the only reference to a page
by decrementing the bias instead of incrementing the atomic page ref
count, which could be expensive.
If the bias is equal to the pagecount, then the nic has the only
reference to that page. But, if the bias is less than the page count,
the networking stack is still using the page.
The pagecount should never be less than the bias.

Reviewed-by: Yangchun Fu <yangchun@google.com>
Signed-off-by: Catherine Sullivan <csully@google.com>
Signed-off-by: David Awogbemila <awogbemila@google.com>
---
 drivers/net/ethernet/google/gve/gve.h    |  1 +
 drivers/net/ethernet/google/gve/gve_rx.c | 47 ++++++++++++++++++------
 2 files changed, 36 insertions(+), 12 deletions(-)

diff --git a/drivers/net/ethernet/google/gve/gve.h b/drivers/net/ethernet/google/gve/gve.h
index c0f0b22c1ec0..8b1773c45cb6 100644
--- a/drivers/net/ethernet/google/gve/gve.h
+++ b/drivers/net/ethernet/google/gve/gve.h
@@ -50,6 +50,7 @@ struct gve_rx_slot_page_info {
 	struct page *page;
 	void *page_address;
 	u32 page_offset; /* offset to write to in page */
+	int pagecnt_bias; /* expected pagecnt if only the driver has a ref */
 	bool can_flip; /* page can be flipped and reused */
 };
 
diff --git a/drivers/net/ethernet/google/gve/gve_rx.c b/drivers/net/ethernet/google/gve/gve_rx.c
index ca12f267d08a..c65615b9e602 100644
--- a/drivers/net/ethernet/google/gve/gve_rx.c
+++ b/drivers/net/ethernet/google/gve/gve_rx.c
@@ -23,6 +23,7 @@ static void gve_rx_free_buffer(struct device *dev,
 	dma_addr_t dma = (dma_addr_t)(be64_to_cpu(data_slot->addr) -
 				      page_info->page_offset);
 
+	page_ref_sub(page_info->page, page_info->pagecnt_bias - 1);
 	gve_free_page(dev, page_info->page, dma, DMA_FROM_DEVICE);
 }
 
@@ -70,6 +71,9 @@ static void gve_setup_rx_buffer(struct gve_rx_slot_page_info *page_info,
 	page_info->page_offset = 0;
 	page_info->page_address = page_address(page);
 	slot->addr = cpu_to_be64(addr);
+
+	set_page_count(page, INT_MAX);
+	page_info->pagecnt_bias = INT_MAX;
 }
 
 static int gve_prefill_rx_pages(struct gve_rx_ring *rx)
@@ -347,21 +351,40 @@ static bool gve_rx_can_flip_buffers(struct net_device *netdev)
 #endif
 }
 
-static int gve_rx_can_recycle_buffer(struct page *page)
+static int gve_rx_can_recycle_buffer(struct gve_rx_slot_page_info *page_info)
 {
-	int pagecount = page_count(page);
+	int pagecount = page_count(page_info->page);
 
 	/* This page is not being used by any SKBs - reuse */
-	if (pagecount == 1) {
+	if (pagecount == page_info->pagecnt_bias) {
 		return 1;
 	/* This page is still being used by an SKB - we can't reuse */
-	} else if (pagecount >= 2) {
+	} else if (pagecount > page_info->pagecnt_bias) {
 		return 0;
 	}
-	WARN(pagecount < 1, "Pagecount should never be < 1");
+	WARN(pagecount < page_info->pagecnt_bias, "Pagecount should never be less than the bias.");
 	return -1;
 }
 
+/* Update page reference not by incrementing the page count, but by
+ * decrementing the "bias" offset from page_count that determines
+ * whether the nic has the only reference.
+ */
+static void gve_rx_update_pagecnt_bias(struct gve_rx_slot_page_info *page_info)
+{
+	page_info->pagecnt_bias--;
+	if (page_info->pagecnt_bias == 0) {
+		int pagecount = page_count(page_info->page);
+
+		/* If we have run out of bias - set it back up to INT_MAX
+		 * minus the existing refs.
+		 */
+		page_info->pagecnt_bias = INT_MAX - (pagecount);
+		/* Set pagecount back up to max */
+		set_page_count(page_info->page, INT_MAX);
+	}
+}
+
 static struct sk_buff *
 gve_rx_raw_addressing(struct device *dev, struct net_device *netdev,
 		      struct gve_rx_slot_page_info *page_info, u16 len,
@@ -373,11 +396,11 @@ gve_rx_raw_addressing(struct device *dev, struct net_device *netdev,
 	if (!skb)
 		return NULL;
 
-	/* Optimistically stop the kernel from freeing the page by increasing
-	 * the page bias. We will check the refcount in refill to determine if
-	 * we need to alloc a new page.
+	/* Optimistically stop the kernel from freeing the page.
+	 * We will check again in refill to determine if we need to alloc a
+	 * new page.
 	 */
-	get_page(page_info->page);
+	gve_rx_update_pagecnt_bias(page_info);
 	page_info->can_flip = can_flip;
 
 	return skb;
@@ -400,7 +423,7 @@ gve_rx_qpl(struct device *dev, struct net_device *netdev,
 		/* No point in recycling if we didn't get the skb */
 		if (skb) {
 			/* Make sure the networking stack can't free the page */
-			get_page(page_info->page);
+			gve_rx_update_pagecnt_bias(page_info);
 			gve_rx_flip_buffer(page_info, data_slot);
 		}
 	} else {
@@ -458,7 +481,7 @@ static bool gve_rx(struct gve_rx_ring *rx, struct gve_rx_desc *rx_desc,
 		int recycle = 0;
 
 		if (can_flip) {
-			recycle = gve_rx_can_recycle_buffer(page_info->page);
+			recycle = gve_rx_can_recycle_buffer(page_info);
 			if (recycle < 0) {
 				gve_schedule_reset(priv);
 				return false;
@@ -548,7 +571,7 @@ static bool gve_rx_refill_buffers(struct gve_priv *priv, struct gve_rx_ring *rx)
 			 * owns half the page it is impossible to tell which half. Either
 			 * the whole page is free or it needs to be replaced.
 			 */
-			int recycle = gve_rx_can_recycle_buffer(page_info->page);
+			int recycle = gve_rx_can_recycle_buffer(page_info);
 
 			if (recycle < 0) {
 				gve_schedule_reset(priv);
-- 
2.28.0.220.ged08abb693-goog


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH net-next 14/18] gve: Move the irq db indexes out of the ntfy block struct
  2020-08-18 19:43 [PATCH net-next 00/18] GVE Driver v1.1.0 Features David Awogbemila
                   ` (12 preceding siblings ...)
  2020-08-18 19:44 ` [PATCH net-next 13/18] gve: Add rx buffer pagecnt bias David Awogbemila
@ 2020-08-18 19:44 ` David Awogbemila
  2020-08-18 19:44 ` [PATCH net-next 15/18] gve: Prefetch packet pages and packet descriptors David Awogbemila
                   ` (4 subsequent siblings)
  18 siblings, 0 replies; 38+ messages in thread
From: David Awogbemila @ 2020-08-18 19:44 UTC (permalink / raw)
  To: netdev; +Cc: Catherine Sullivan, Yangchun Fu, David Awogbemila

From: Catherine Sullivan <csully@google.com>

Giving the device access to other kernel structs is not ideal.
Move the indexes into their own array and just keep pointers to
them in the ntfy block struct.

Reviewed-by: Yangchun Fu <yangchun@google.com>
Signed-off-by: Catherine Sullivan <csully@google.com>
Signed-off-by: David Awogbemila <awogbemila@google.com>
---
 drivers/net/ethernet/google/gve/gve.h        | 13 ++++---
 drivers/net/ethernet/google/gve/gve_adminq.c |  2 +-
 drivers/net/ethernet/google/gve/gve_main.c   | 37 ++++++++++++++------
 3 files changed, 36 insertions(+), 16 deletions(-)

diff --git a/drivers/net/ethernet/google/gve/gve.h b/drivers/net/ethernet/google/gve/gve.h
index 8b1773c45cb6..bacb4070c755 100644
--- a/drivers/net/ethernet/google/gve/gve.h
+++ b/drivers/net/ethernet/google/gve/gve.h
@@ -174,13 +174,13 @@ struct gve_tx_ring {
  * associated with that irq.
  */
 struct gve_notify_block {
-	__be32 irq_db_index; /* idx into Bar2 - set by device, must be 1st */
+	__be32 *irq_db_index; /* pointer to idx into Bar2 */
 	char name[IFNAMSIZ + 16]; /* name registered with the kernel */
 	struct napi_struct napi; /* kernel napi struct for this block */
 	struct gve_priv *priv;
 	struct gve_tx_ring *tx; /* tx rings on this block */
 	struct gve_rx_ring *rx; /* rx rings on this block */
-} ____cacheline_aligned;
+};
 
 /* Tracks allowed and current queue settings */
 struct gve_queue_config {
@@ -194,13 +194,18 @@ struct gve_qpl_config {
 	unsigned long *qpl_id_map; /* bitmap of used qpl ids */
 };
 
+struct gve_irq_db {
+	__be32 index;
+} ____cacheline_aligned;
+
 struct gve_priv {
 	struct net_device *dev;
 	struct gve_tx_ring *tx; /* array of tx_cfg.num_queues */
 	struct gve_rx_ring *rx; /* array of rx_cfg.num_queues */
 	struct gve_queue_page_list *qpls; /* array of num qpls */
 	struct gve_notify_block *ntfy_blocks; /* array of num_ntfy_blks */
-	dma_addr_t ntfy_block_bus;
+	struct gve_irq_db *irq_db_indices; /* array of num_ntfy_blks */
+	dma_addr_t irq_db_indices_bus;
 	struct msix_entry *msix_vectors; /* array of num_ntfy_blks + 1 */
 	char mgmt_msix_name[IFNAMSIZ + 16];
 	u32 mgmt_msix_idx;
@@ -438,7 +443,7 @@ static inline void gve_clear_report_stats(struct gve_priv *priv)
 static inline __be32 __iomem *gve_irq_doorbell(struct gve_priv *priv,
 					       struct gve_notify_block *block)
 {
-	return &priv->db_bar2[be32_to_cpu(block->irq_db_index)];
+	return &priv->db_bar2[be32_to_cpu(*block->irq_db_index)];
 }
 
 /* Returns the index into ntfy_blocks of the given tx ring's block
diff --git a/drivers/net/ethernet/google/gve/gve_adminq.c b/drivers/net/ethernet/google/gve/gve_adminq.c
index bb21891d06a2..5d6784c4fc41 100644
--- a/drivers/net/ethernet/google/gve/gve_adminq.c
+++ b/drivers/net/ethernet/google/gve/gve_adminq.c
@@ -292,7 +292,7 @@ int gve_adminq_configure_device_resources(struct gve_priv *priv,
 		.num_counters = cpu_to_be32(num_counters),
 		.irq_db_addr = cpu_to_be64(db_array_bus_addr),
 		.num_irq_dbs = cpu_to_be32(num_ntfy_blks),
-		.irq_db_stride = cpu_to_be32(sizeof(priv->ntfy_blocks[0])),
+		.irq_db_stride = cpu_to_be32(sizeof(*priv->irq_db_indices)),
 		.ntfy_blk_msix_base_idx =
 					cpu_to_be32(GVE_NTFY_BLK_BASE_MSIX_IDX),
 	};
diff --git a/drivers/net/ethernet/google/gve/gve_main.c b/drivers/net/ethernet/google/gve/gve_main.c
index de22c60d1fea..ee434d3ca5e7 100644
--- a/drivers/net/ethernet/google/gve/gve_main.c
+++ b/drivers/net/ethernet/google/gve/gve_main.c
@@ -239,15 +239,24 @@ static int gve_alloc_notify_blocks(struct gve_priv *priv)
 		dev_err(&priv->pdev->dev, "Did not receive management vector.\n");
 		goto abort_with_msix_enabled;
 	}
-	priv->ntfy_blocks =
+
+	priv->irq_db_indices =
 		dma_alloc_coherent(&priv->pdev->dev,
 				   priv->num_ntfy_blks *
-				   sizeof(*priv->ntfy_blocks),
-				   &priv->ntfy_block_bus, GFP_KERNEL);
-	if (!priv->ntfy_blocks) {
+				   sizeof(*priv->irq_db_indices),
+				   &priv->irq_db_indices_bus, GFP_KERNEL);
+	if (!priv->irq_db_indices) {
 		err = -ENOMEM;
 		goto abort_with_mgmt_vector;
 	}
+
+	priv->ntfy_blocks = kvzalloc(priv->num_ntfy_blks *
+				     sizeof(*priv->ntfy_blocks), GFP_KERNEL);
+	if (!priv->ntfy_blocks) {
+		err = -ENOMEM;
+		goto abort_with_irq_db_indices;
+	}
+
 	/* Setup the other blocks - the first n-1 vectors */
 	for (i = 0; i < priv->num_ntfy_blks; i++) {
 		struct gve_notify_block *block = &priv->ntfy_blocks[i];
@@ -265,6 +274,7 @@ static int gve_alloc_notify_blocks(struct gve_priv *priv)
 		}
 		irq_set_affinity_hint(priv->msix_vectors[msix_idx].vector,
 				      get_cpu_mask(i % active_cpus));
+		block->irq_db_index = &priv->irq_db_indices[i].index;
 	}
 	return 0;
 abort_with_some_ntfy_blocks:
@@ -276,10 +286,13 @@ static int gve_alloc_notify_blocks(struct gve_priv *priv)
 				      NULL);
 		free_irq(priv->msix_vectors[msix_idx].vector, block);
 	}
-	dma_free_coherent(&priv->pdev->dev, priv->num_ntfy_blks *
-			  sizeof(*priv->ntfy_blocks),
-			  priv->ntfy_blocks, priv->ntfy_block_bus);
+	kvfree(priv->ntfy_blocks);
 	priv->ntfy_blocks = NULL;
+abort_with_irq_db_indices:
+	dma_free_coherent(&priv->pdev->dev, priv->num_ntfy_blks *
+			  sizeof(*priv->irq_db_indices),
+			  priv->irq_db_indices, priv->irq_db_indices_bus);
+	priv->irq_db_indices = NULL;
 abort_with_mgmt_vector:
 	free_irq(priv->msix_vectors[priv->mgmt_msix_idx].vector, priv);
 abort_with_msix_enabled:
@@ -303,10 +316,12 @@ static void gve_free_notify_blocks(struct gve_priv *priv)
 				      NULL);
 		free_irq(priv->msix_vectors[msix_idx].vector, block);
 	}
-	dma_free_coherent(&priv->pdev->dev,
-			  priv->num_ntfy_blks * sizeof(*priv->ntfy_blocks),
-			  priv->ntfy_blocks, priv->ntfy_block_bus);
+	kvfree(priv->ntfy_blocks);
 	priv->ntfy_blocks = NULL;
+	dma_free_coherent(&priv->pdev->dev, priv->num_ntfy_blks *
+			  sizeof(*priv->irq_db_indices),
+			  priv->irq_db_indices, priv->irq_db_indices_bus);
+	priv->irq_db_indices = NULL;
 	free_irq(priv->msix_vectors[priv->mgmt_msix_idx].vector, priv);
 	pci_disable_msix(priv->pdev);
 	kvfree(priv->msix_vectors);
@@ -329,7 +344,7 @@ static int gve_setup_device_resources(struct gve_priv *priv)
 	err = gve_adminq_configure_device_resources(priv,
 						    priv->counter_array_bus,
 						    priv->num_event_counters,
-						    priv->ntfy_block_bus,
+						    priv->irq_db_indices_bus,
 						    priv->num_ntfy_blks);
 	if (unlikely(err)) {
 		dev_err(&priv->pdev->dev,
-- 
2.28.0.220.ged08abb693-goog


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH net-next 15/18] gve: Prefetch packet pages and packet descriptors.
  2020-08-18 19:43 [PATCH net-next 00/18] GVE Driver v1.1.0 Features David Awogbemila
                   ` (13 preceding siblings ...)
  2020-08-18 19:44 ` [PATCH net-next 14/18] gve: Move the irq db indexes out of the ntfy block struct David Awogbemila
@ 2020-08-18 19:44 ` David Awogbemila
  2020-08-18 19:44 ` [PATCH net-next 16/18] gve: Also WARN for skb index equals num_queues David Awogbemila
                   ` (3 subsequent siblings)
  18 siblings, 0 replies; 38+ messages in thread
From: David Awogbemila @ 2020-08-18 19:44 UTC (permalink / raw)
  To: netdev; +Cc: Yangchun Fu, Nathan Lewis, David Awogbemila

From: Yangchun Fu <yangchun@google.com>

Prefetching appears to help performance.

Signed-off-by: Yangchun Fu <yangchun@google.com>
Signed-off-by: Nathan Lewis <npl@google.com>
Signed-off-by: David Awogbemila <awogbemila@google.com>
---
 drivers/net/ethernet/google/gve/gve_rx.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/drivers/net/ethernet/google/gve/gve_rx.c b/drivers/net/ethernet/google/gve/gve_rx.c
index c65615b9e602..24bd556f488e 100644
--- a/drivers/net/ethernet/google/gve/gve_rx.c
+++ b/drivers/net/ethernet/google/gve/gve_rx.c
@@ -447,8 +447,18 @@ static bool gve_rx(struct gve_rx_ring *rx, struct gve_rx_desc *rx_desc,
 	struct gve_rx_data_slot *data_slot;
 	struct sk_buff *skb = NULL;
 	dma_addr_t page_bus;
+	void *va;
 	u16 len;
 
+	/* Prefetch two packet pages ahead, we will need it soon. */
+	page_info = &rx->data.page_info[(idx + 2) & rx->mask];
+	va = page_info->page_address + GVE_RX_PAD +
+			page_info->page_offset;
+
+	prefetch(page_info->page); // Kernel page struct.
+	prefetch(va);              // Packet header.
+	prefetch(va + 64);           // Next cacheline too.
+
 	/* drop this packet */
 	if (unlikely(rx_desc->flags_seq & GVE_RXF_ERR)) {
 		u64_stats_update_begin(&rx->statss);
@@ -618,6 +628,10 @@ bool gve_clean_rx_done(struct gve_rx_ring *rx, int budget,
 			   "[%d] seqno=%d rx->desc.seqno=%d\n",
 			   rx->q_num, GVE_SEQNO(desc->flags_seq),
 			   rx->desc.seqno);
+
+		// prefetch two descriptors ahead
+		prefetch(rx->desc.desc_ring + ((cnt + 2) & rx->mask));
+
 		dropped = !gve_rx(rx, desc, feat, idx);
 		if (!dropped) {
 			bytes += be16_to_cpu(desc->len) - GVE_RX_PAD;
-- 
2.28.0.220.ged08abb693-goog


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH net-next 16/18] gve: Also WARN for skb index equals num_queues.
  2020-08-18 19:43 [PATCH net-next 00/18] GVE Driver v1.1.0 Features David Awogbemila
                   ` (14 preceding siblings ...)
  2020-08-18 19:44 ` [PATCH net-next 15/18] gve: Prefetch packet pages and packet descriptors David Awogbemila
@ 2020-08-18 19:44 ` David Awogbemila
  2020-08-18 19:44 ` [PATCH net-next 17/18] gve: Switch to use napi_complete_done David Awogbemila
                   ` (2 subsequent siblings)
  18 siblings, 0 replies; 38+ messages in thread
From: David Awogbemila @ 2020-08-18 19:44 UTC (permalink / raw)
  To: netdev; +Cc: David Awogbemila

The WARN condition currently checks if the skb's
queue_mapping is (strictly) greater than tx_cfg.num_queues.

queue_mapping == tx_cfg.num_queues should also be invalid
and trigger the WARNing.

Use WARN_ON_ONCE instead otherwise the host will be stuck in syslog
flood.

Signed-off-by: David Awogbemila <awogbemila@google.com>
---
 drivers/net/ethernet/google/gve/gve_tx.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/google/gve/gve_tx.c b/drivers/net/ethernet/google/gve/gve_tx.c
index 5d75dec1c520..5beff4d4d2e3 100644
--- a/drivers/net/ethernet/google/gve/gve_tx.c
+++ b/drivers/net/ethernet/google/gve/gve_tx.c
@@ -602,8 +602,7 @@ netdev_tx_t gve_tx(struct sk_buff *skb, struct net_device *dev)
 	struct gve_tx_ring *tx;
 	int nsegs;
 
-	WARN(skb_get_queue_mapping(skb) > priv->tx_cfg.num_queues,
-	     "skb queue index out of range");
+	WARN_ON_ONCE(skb_get_queue_mapping(skb) >= priv->tx_cfg.num_queues);
 	tx = &priv->tx[skb_get_queue_mapping(skb)];
 	if (unlikely(gve_maybe_stop_tx(tx, skb))) {
 		/* We need to ring the txq doorbell -- we have stopped the Tx
-- 
2.28.0.220.ged08abb693-goog


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH net-next 17/18] gve: Switch to use napi_complete_done
  2020-08-18 19:43 [PATCH net-next 00/18] GVE Driver v1.1.0 Features David Awogbemila
                   ` (15 preceding siblings ...)
  2020-08-18 19:44 ` [PATCH net-next 16/18] gve: Also WARN for skb index equals num_queues David Awogbemila
@ 2020-08-18 19:44 ` David Awogbemila
  2020-08-18 19:44 ` [PATCH net-next 18/18] gve: Bump version to 1.1.0 David Awogbemila
  2020-08-18 20:19 ` [PATCH net-next 00/18] GVE Driver v1.1.0 Features David Miller
  18 siblings, 0 replies; 38+ messages in thread
From: David Awogbemila @ 2020-08-18 19:44 UTC (permalink / raw)
  To: netdev; +Cc: Yangchun Fu, Catherine Sullivan, David Awogbemila

From: Yangchun Fu <yangchun@google.com>

Use napi_complete_done to allow for the use of gro_flush_timeout.

Signed-off-by: Catherine Sullivan <csully@google.com>
Signed-off-by: Yangchun Fu <yangchun@google.com>
Signed-off-by: David Awogbemila <awogbemila@google.com>
---
 drivers/net/ethernet/google/gve/gve.h      |  5 ++---
 drivers/net/ethernet/google/gve/gve_main.c | 23 ++++++++++++++--------
 drivers/net/ethernet/google/gve/gve_rx.c   | 21 ++++++++++----------
 3 files changed, 27 insertions(+), 22 deletions(-)

diff --git a/drivers/net/ethernet/google/gve/gve.h b/drivers/net/ethernet/google/gve/gve.h
index bacb4070c755..c67bdbb0ce11 100644
--- a/drivers/net/ethernet/google/gve/gve.h
+++ b/drivers/net/ethernet/google/gve/gve.h
@@ -545,11 +545,10 @@ __be32 gve_tx_load_event_counter(struct gve_priv *priv,
 				 struct gve_tx_ring *tx);
 /* rx handling */
 void gve_rx_write_doorbell(struct gve_priv *priv, struct gve_rx_ring *rx);
-bool gve_rx_poll(struct gve_notify_block *block, int budget);
+int gve_rx_poll(struct gve_notify_block *block, int budget);
+bool gve_rx_work_pending(struct gve_rx_ring *rx);
 int gve_rx_alloc_rings(struct gve_priv *priv);
 void gve_rx_free_rings(struct gve_priv *priv);
-bool gve_clean_rx_done(struct gve_rx_ring *rx, int budget,
-		       netdev_features_t feat);
 /* Reset */
 void gve_schedule_reset(struct gve_priv *priv);
 int gve_reset(struct gve_priv *priv, bool attempt_teardown);
diff --git a/drivers/net/ethernet/google/gve/gve_main.c b/drivers/net/ethernet/google/gve/gve_main.c
index ee434d3ca5e7..088e8517bb2b 100644
--- a/drivers/net/ethernet/google/gve/gve_main.c
+++ b/drivers/net/ethernet/google/gve/gve_main.c
@@ -155,34 +155,41 @@ static int gve_napi_poll(struct napi_struct *napi, int budget)
 	__be32 __iomem *irq_doorbell;
 	bool reschedule = false;
 	struct gve_priv *priv;
+	int work_done = 0;
 
 	block = container_of(napi, struct gve_notify_block, napi);
 	priv = block->priv;
 
 	if (block->tx)
 		reschedule |= gve_tx_poll(block, budget);
-	if (block->rx)
-		reschedule |= gve_rx_poll(block, budget);
+	if (block->rx) {
+		work_done = gve_rx_poll(block, budget);
+		reschedule |= work_done == budget;
+	}
 
 	if (reschedule)
 		return budget;
 
-	napi_complete(napi);
+	/* Complete processing - don't unmask irq if busy polling is enabled */
+	if (!napi_complete_done(napi, work_done))
+		return work_done;
+
 	irq_doorbell = gve_irq_doorbell(priv, block);
 	iowrite32be(GVE_IRQ_ACK | GVE_IRQ_EVENT, irq_doorbell);
 
-	/* Double check we have no extra work.
-	 * Ensure unmask synchronizes with checking for work.
-	 */
+	/* Double check we have no extra work. Ensure unmask synchronizes with checking */
+	/* for work. */
 	dma_rmb();
+
 	if (block->tx)
 		reschedule |= gve_tx_poll(block, -1);
 	if (block->rx)
-		reschedule |= gve_rx_poll(block, -1);
+		reschedule |= gve_rx_work_pending(block->rx);
+
 	if (reschedule && napi_reschedule(napi))
 		iowrite32be(GVE_IRQ_MASK, irq_doorbell);
 
-	return 0;
+	return work_done;
 }
 
 static int gve_alloc_notify_blocks(struct gve_priv *priv)
diff --git a/drivers/net/ethernet/google/gve/gve_rx.c b/drivers/net/ethernet/google/gve/gve_rx.c
index 24bd556f488e..dad746cc65ac 100644
--- a/drivers/net/ethernet/google/gve/gve_rx.c
+++ b/drivers/net/ethernet/google/gve/gve_rx.c
@@ -539,7 +539,7 @@ static bool gve_rx(struct gve_rx_ring *rx, struct gve_rx_desc *rx_desc,
 	return true;
 }
 
-static bool gve_rx_work_pending(struct gve_rx_ring *rx)
+bool gve_rx_work_pending(struct gve_rx_ring *rx)
 {
 	struct gve_rx_desc *desc;
 	__be16 flags_seq;
@@ -607,8 +607,8 @@ static bool gve_rx_refill_buffers(struct gve_priv *priv, struct gve_rx_ring *rx)
 	return true;
 }
 
-bool gve_clean_rx_done(struct gve_rx_ring *rx, int budget,
-		       netdev_features_t feat)
+static int gve_clean_rx_done(struct gve_rx_ring *rx, int budget,
+			     netdev_features_t feat)
 {
 	struct gve_priv *priv = rx->gve;
 	u32 work_done = 0, packets = 0;
@@ -645,7 +645,7 @@ bool gve_clean_rx_done(struct gve_rx_ring *rx, int budget,
 	}
 
 	if (!work_done)
-		return false;
+		return 0;
 
 	u64_stats_update_begin(&rx->statss);
 	rx->rpackets += packets;
@@ -665,14 +665,14 @@ bool gve_clean_rx_done(struct gve_rx_ring *rx, int budget,
 	}
 
 	gve_rx_write_doorbell(priv, rx);
-	return gve_rx_work_pending(rx);
+	return work_done;
 }
 
-bool gve_rx_poll(struct gve_notify_block *block, int budget)
+int gve_rx_poll(struct gve_notify_block *block, int budget)
 {
 	struct gve_rx_ring *rx = block->rx;
 	netdev_features_t feat;
-	bool repoll = false;
+	int work_done = 0;
 
 	feat = block->napi.dev->features;
 
@@ -681,8 +681,7 @@ bool gve_rx_poll(struct gve_notify_block *block, int budget)
 		budget = INT_MAX;
 
 	if (budget > 0)
-		repoll |= gve_clean_rx_done(rx, budget, feat);
-	else
-		repoll |= gve_rx_work_pending(rx);
-	return repoll;
+		work_done = gve_clean_rx_done(rx, budget, feat);
+
+	return work_done;
 }
-- 
2.28.0.220.ged08abb693-goog


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH net-next 18/18] gve: Bump version to 1.1.0.
  2020-08-18 19:43 [PATCH net-next 00/18] GVE Driver v1.1.0 Features David Awogbemila
                   ` (16 preceding siblings ...)
  2020-08-18 19:44 ` [PATCH net-next 17/18] gve: Switch to use napi_complete_done David Awogbemila
@ 2020-08-18 19:44 ` David Awogbemila
  2020-08-19  3:40   ` Jakub Kicinski
  2020-08-18 20:19 ` [PATCH net-next 00/18] GVE Driver v1.1.0 Features David Miller
  18 siblings, 1 reply; 38+ messages in thread
From: David Awogbemila @ 2020-08-18 19:44 UTC (permalink / raw)
  To: netdev; +Cc: David Awogbemila

Update the driver version reported to the device and ethtool.

Signed-off-by: David Awogbemila <awogbemila@google.com>
---
 drivers/net/ethernet/google/gve/gve_main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/google/gve/gve_main.c b/drivers/net/ethernet/google/gve/gve_main.c
index 088e8517bb2b..944a595dbfef 100644
--- a/drivers/net/ethernet/google/gve/gve_main.c
+++ b/drivers/net/ethernet/google/gve/gve_main.c
@@ -20,7 +20,7 @@
 #define GVE_DEFAULT_RX_COPYBREAK	(256)
 
 #define DEFAULT_MSG_LEVEL	(NETIF_MSG_DRV | NETIF_MSG_LINK)
-#define GVE_VERSION		"1.0.0"
+#define GVE_VERSION		"1.1.0"
 #define GVE_VERSION_PREFIX	"GVE-"
 
 const char gve_version_str[] = GVE_VERSION;
-- 
2.28.0.220.ged08abb693-goog


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* Re: [PATCH net-next 01/18] gve: Get and set Rx copybreak via ethtool
  2020-08-18 19:44 ` [PATCH net-next 01/18] gve: Get and set Rx copybreak via ethtool David Awogbemila
@ 2020-08-18 20:00   ` Andrew Lunn
  0 siblings, 0 replies; 38+ messages in thread
From: Andrew Lunn @ 2020-08-18 20:00 UTC (permalink / raw)
  To: David Awogbemila; +Cc: netdev, Kuo Zhao, Yangchun Fu

On Tue, Aug 18, 2020 at 12:44:00PM -0700, David Awogbemila wrote:
> From: Kuo Zhao <kuozhao@google.com>
> 
> This adds support for getting and setting the RX copybreak
> value via ethtool.
> 
> Reviewed-by: Yangchun Fu <yangchun@google.com>
> Signed-off-by: Kuo Zhao <kuozhao@google.com>
> Signed-off-by: David Awogbemila <awogbemila@google.com>
> ---
>  drivers/net/ethernet/google/gve/gve_ethtool.c | 34 +++++++++++++++++++
>  1 file changed, 34 insertions(+)
> 
> diff --git a/drivers/net/ethernet/google/gve/gve_ethtool.c b/drivers/net/ethernet/google/gve/gve_ethtool.c
> index d8fa816f4473..469d3332bcd6 100644
> --- a/drivers/net/ethernet/google/gve/gve_ethtool.c
> +++ b/drivers/net/ethernet/google/gve/gve_ethtool.c
> @@ -230,6 +230,38 @@ static int gve_user_reset(struct net_device *netdev, u32 *flags)
>  	return -EOPNOTSUPP;
>  }


Hi David.

> +static int gve_get_tunable(struct net_device *netdev,
> +			   const struct ethtool_tunable *etuna, void *value)
> +{
> +	struct gve_priv *priv = netdev_priv(netdev);
> +
> +	switch (etuna->id) {
> +	case ETHTOOL_RX_COPYBREAK:
> +		*(u32 *)value = priv->rx_copybreak;
> +		return 0;
> +	default:
> +		return -EINVAL;

EOPNOTSUPP would be better. Other tunables are not invalid, they are
simply not supported by this driver.

> +	}
> +}
> +
> +static int gve_set_tunable(struct net_device *netdev,
> +			   const struct ethtool_tunable *etuna, const void *value)
> +{
> +	struct gve_priv *priv = netdev_priv(netdev);
> +	u32 len;
> +
> +	switch (etuna->id) {
> +	case ETHTOOL_RX_COPYBREAK:
> +		len = *(u32 *)value;
> +		if (len > PAGE_SIZE / 2)
> +			return -EINVAL;
> +		priv->rx_copybreak = len;
> +		return 0;
> +	default:
> +		return -EINVAL;

Same here.

     Andrew

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH net-next 03/18] gve: Register netdev earlier
  2020-08-18 19:44 ` [PATCH net-next 03/18] gve: Register netdev earlier David Awogbemila
@ 2020-08-18 20:09   ` Andrew Lunn
  0 siblings, 0 replies; 38+ messages in thread
From: Andrew Lunn @ 2020-08-18 20:09 UTC (permalink / raw)
  To: David Awogbemila; +Cc: netdev, Catherine Sullivan, Yangchun Fu

On Tue, Aug 18, 2020 at 12:44:02PM -0700, David Awogbemila wrote:
> From: Catherine Sullivan <csully@google.com>
> 
> Move the registration of the netdev up to initialize
> it before doing any logging so that the correct device name
> is displayed.

This is generally a bad idea. As soon as register_netdev() is called,
the device is in use. Packets are flowing. This is particularly true
with NFS root. The device must be 100% ready to do work before calling
register_netdev().

Take a look at dev_alloc_name().

     Andrew

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH net-next 04/18] gve: Add support for dma_mask register
  2020-08-18 19:44 ` [PATCH net-next 04/18] gve: Add support for dma_mask register David Awogbemila
@ 2020-08-18 20:15   ` Andrew Lunn
  0 siblings, 0 replies; 38+ messages in thread
From: Andrew Lunn @ 2020-08-18 20:15 UTC (permalink / raw)
  To: David Awogbemila; +Cc: netdev, Catherine Sullivan, Yangchun Fu

On Tue, Aug 18, 2020 at 12:44:03PM -0700, David Awogbemila wrote:
> From: Catherine Sullivan <csully@google.com>
> 
> Add the dma_mask register and read it to set the dma_masks.
> gve_alloc_page will alloc_page with:
>  GFP_DMA if priv->dma_mask is 24,
>  GFP_DMA32 if priv->dma_mask is 32.

This needs reviewing by somebody who knows this stuff. My limited
understanding is the core should take care of most of this, so long as
you tell it of any device restrictions.

    Andrew

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH net-next 06/18] gve: Batch AQ commands for creating and destroying queues.
  2020-08-18 19:44 ` [PATCH net-next 06/18] gve: Batch AQ commands for creating and destroying queues David Awogbemila
@ 2020-08-18 20:16   ` David Miller
  2020-08-18 22:25     ` David Awogbemila
  0 siblings, 1 reply; 38+ messages in thread
From: David Miller @ 2020-08-18 20:16 UTC (permalink / raw)
  To: awogbemila; +Cc: netdev, sagis, yangchun

From: David Awogbemila <awogbemila@google.com>
Date: Tue, 18 Aug 2020 12:44:05 -0700

> -int gve_adminq_execute_cmd(struct gve_priv *priv,
> -			   union gve_adminq_command *cmd_orig)
> +static int gve_adminq_issue_cmd(struct gve_priv *priv,
> +				union gve_adminq_command *cmd_orig)
>  {
>  	union gve_adminq_command *cmd;
> +	u32 tail;
>  	u32 opcode;

Please use reverse christmas tree ordering for local variables.

Thanks.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH net-next 10/18] gve: Add support for raw addressing to the rx path
  2020-08-18 19:44 ` [PATCH net-next 10/18] gve: Add support for raw addressing to the rx path David Awogbemila
@ 2020-08-18 20:18   ` David Miller
  2020-08-18 22:25     ` David Awogbemila
  0 siblings, 1 reply; 38+ messages in thread
From: David Miller @ 2020-08-18 20:18 UTC (permalink / raw)
  To: awogbemila; +Cc: netdev, csully, yangchun

From: David Awogbemila <awogbemila@google.com>
Date: Tue, 18 Aug 2020 12:44:09 -0700

>  static void gve_rx_free_ring(struct gve_priv *priv, int idx)
>  {
>  	struct gve_rx_ring *rx = &priv->rx[idx];
>  	struct device *dev = &priv->pdev->dev;
>  	size_t bytes;
> -	u32 slots;
> +	u32 slots = rx->mask + 1;

Reverse christmas tree ordering for local variables please.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH net-next 00/18] GVE Driver v1.1.0 Features.
  2020-08-18 19:43 [PATCH net-next 00/18] GVE Driver v1.1.0 Features David Awogbemila
                   ` (17 preceding siblings ...)
  2020-08-18 19:44 ` [PATCH net-next 18/18] gve: Bump version to 1.1.0 David Awogbemila
@ 2020-08-18 20:19 ` David Miller
  2020-08-18 22:24   ` David Awogbemila
  18 siblings, 1 reply; 38+ messages in thread
From: David Miller @ 2020-08-18 20:19 UTC (permalink / raw)
  To: awogbemila; +Cc: netdev

From: David Awogbemila <awogbemila@google.com>
Date: Tue, 18 Aug 2020 12:43:59 -0700

> This patch series updates the GVE driver to v1.1.0 which broadly includes:
> - introducing "raw adressing" mode, which allows the driver avoid copies in
>   the guest.
> - increased stats coverage.
> - batching AdminQueue commands to the device.

This series is a bit too large.

Please sync with upstream more often so that you can submit smaller
patch sets, perhaps 10 or so patches at a time at most.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH net-next 08/18] gve: Enable Link Speed Reporting in the driver.
  2020-08-18 19:44 ` [PATCH net-next 08/18] gve: Enable Link Speed Reporting in the driver David Awogbemila
@ 2020-08-18 21:30   ` Jakub Kicinski
  0 siblings, 0 replies; 38+ messages in thread
From: Jakub Kicinski @ 2020-08-18 21:30 UTC (permalink / raw)
  To: David Awogbemila; +Cc: netdev, Yangchun Fu

On Tue, 18 Aug 2020 12:44:07 -0700 David Awogbemila wrote:
> This change allows the driver to report the device link speed
> when the ethtool command:
> 	ethtool <nic name>
> is run.
> Getting the link speed is done via a new admin queue command:
> ReportLinkSpeed.
> 
> Reviewed-by: Yangchun Fu <yangchun@google.com>
> Signed-off-by: David Awogbemila <awogbemila@google.com>

Please make sure the code builds cleanly with W=1 C=1:

drivers/net/ethernet/google/gve/gve_adminq.c:620:28: warning: cast to restricted __be64
drivers/net/ethernet/google/gve/gve_adminq.c:620:28: warning: cast to restricted __be64
drivers/net/ethernet/google/gve/gve_adminq.c:620:28: warning: cast to restricted __be64
drivers/net/ethernet/google/gve/gve_adminq.c:620:28: warning: cast to restricted __be64
drivers/net/ethernet/google/gve/gve_adminq.c:620:28: warning: cast to restricted __be64
drivers/net/ethernet/google/gve/gve_adminq.c:620:28: warning: cast to restricted __be64
drivers/net/ethernet/google/gve/gve_adminq.c:620:28: warning: cast to restricted __be64
drivers/net/ethernet/google/gve/gve_adminq.c:620:28: warning: cast to restricted __be64
drivers/net/ethernet/google/gve/gve_adminq.c:620:28: warning: cast to restricted __be64
drivers/net/ethernet/google/gve/gve_adminq.c:620:28: warning: cast to restricted __be64

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH net-next 00/18] GVE Driver v1.1.0 Features.
  2020-08-18 20:19 ` [PATCH net-next 00/18] GVE Driver v1.1.0 Features David Miller
@ 2020-08-18 22:24   ` David Awogbemila
  0 siblings, 0 replies; 38+ messages in thread
From: David Awogbemila @ 2020-08-18 22:24 UTC (permalink / raw)
  To: David Miller; +Cc: netdev

On Tue, Aug 18, 2020 at 1:19 PM David Miller <davem@davemloft.net> wrote:
>
> From: David Awogbemila <awogbemila@google.com>
> Date: Tue, 18 Aug 2020 12:43:59 -0700
>
> > This patch series updates the GVE driver to v1.1.0 which broadly includes:
> > - introducing "raw adressing" mode, which allows the driver avoid copies in
> >   the guest.
> > - increased stats coverage.
> > - batching AdminQueue commands to the device.
>
> This series is a bit too large.
>
> Please sync with upstream more often so that you can submit smaller
> patch sets, perhaps 10 or so patches at a time at most.

Thanks, I'll split the patchset up.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH net-next 10/18] gve: Add support for raw addressing to the rx path
  2020-08-18 20:18   ` David Miller
@ 2020-08-18 22:25     ` David Awogbemila
  0 siblings, 0 replies; 38+ messages in thread
From: David Awogbemila @ 2020-08-18 22:25 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, Catherine Sullivan, Yangchun Fu

On Tue, Aug 18, 2020 at 1:18 PM David Miller <davem@davemloft.net> wrote:
>
> From: David Awogbemila <awogbemila@google.com>
> Date: Tue, 18 Aug 2020 12:44:09 -0700
>
> >  static void gve_rx_free_ring(struct gve_priv *priv, int idx)
> >  {
> >       struct gve_rx_ring *rx = &priv->rx[idx];
> >       struct device *dev = &priv->pdev->dev;
> >       size_t bytes;
> > -     u32 slots;
> > +     u32 slots = rx->mask + 1;
>
> Reverse christmas tree ordering for local variables please.
Got it. I'll fix this.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH net-next 06/18] gve: Batch AQ commands for creating and destroying queues.
  2020-08-18 20:16   ` David Miller
@ 2020-08-18 22:25     ` David Awogbemila
  0 siblings, 0 replies; 38+ messages in thread
From: David Awogbemila @ 2020-08-18 22:25 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, Sagi Shahar, Yangchun Fu

Got it. I'll fix this.

On Tue, Aug 18, 2020 at 1:17 PM David Miller <davem@davemloft.net> wrote:
>
> From: David Awogbemila <awogbemila@google.com>
> Date: Tue, 18 Aug 2020 12:44:05 -0700
>
> > -int gve_adminq_execute_cmd(struct gve_priv *priv,
> > -                        union gve_adminq_command *cmd_orig)
> > +static int gve_adminq_issue_cmd(struct gve_priv *priv,
> > +                             union gve_adminq_command *cmd_orig)
> >  {
> >       union gve_adminq_command *cmd;
> > +     u32 tail;
> >       u32 opcode;
>
> Please use reverse christmas tree ordering for local variables.
>
> Thanks.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH net-next 05/18] gve: Add Gvnic stats AQ command and ethtool show/set-priv-flags.
  2020-08-18 19:44 ` [PATCH net-next 05/18] gve: Add Gvnic stats AQ command and ethtool show/set-priv-flags David Awogbemila
@ 2020-08-19  3:13   ` Jakub Kicinski
  2020-08-25 15:46     ` David Awogbemila
  0 siblings, 1 reply; 38+ messages in thread
From: Jakub Kicinski @ 2020-08-19  3:13 UTC (permalink / raw)
  To: David Awogbemila; +Cc: netdev, Kuo Zhao, Yangchun Fu

On Tue, 18 Aug 2020 12:44:04 -0700 David Awogbemila wrote:
> From: Kuo Zhao <kuozhao@google.com>
> 
> Changes:
> - Add a new flag in service_task_flags. Check both this flag and
> ethtool flag when handle report stats. Update the stats when user turns
> ethtool flag on.
> 
> - In order to expose the NIC stats to the guest even when the ethtool flag
> is off, share the address and length of report at setup. When the
> ethtool flag turned off, zero off the gve stats instead of detaching the
> report. Only detach the report in free_stats_report.
> 
> - Adds the NIC stats to ethtool stats. These stats are always
> exposed to guest no matter the report stats flag is turned
> on or off.
> 
> - Update gve stats once every 20 seconds.
> 
> - Add a field for the interval of updating stats report to the AQ
> command. It will be exposed to USPS so that USPS can use the same
> interval to update its stats in the report.
> 
> Reviewed-by: Yangchun Fu <yangchun@google.com>
> Signed-off-by: Kuo Zhao <kuozhao@google.com>
> Signed-off-by: David Awogbemila <awogbemila@google.com>

This patch is quite hard to parse, please work on improving its
readability. Perhaps start by splitting changes to the stats from
hypervisor from the stats to hypervisor.

> +enum gve_stat_names {
> +	// stats from gve
> +	TX_WAKE_CNT			= 1,
> +	TX_STOP_CNT			= 2,
> +	TX_FRAMES_SENT			= 3,
> +	TX_BYTES_SENT			= 4,
> +	TX_LAST_COMPLETION_PROCESSED	= 5,
> +	RX_NEXT_EXPECTED_SEQUENCE	= 6,
> +	RX_BUFFERS_POSTED		= 7,

Just out of curiosity - what's the use for the stats reported by VM to
the hypervisor? 

> +	// stats from NIC
> +	RX_QUEUE_DROP_CNT		= 65,
> +	RX_NO_BUFFERS_POSTED		= 66,
> +	RX_DROPS_PACKET_OVER_MRU	= 67,
> +	RX_DROPS_INVALID_CHECKSUM	= 68,

Most of these look like a perfect match for members of struct
rtnl_link_stats64. Please use the standard stats to report the errors,
wherever possible.

> +};
> +
>  union gve_adminq_command {
>  	struct {
>  		__be32 opcode;

> +static int gve_set_priv_flags(struct net_device *netdev, u32 flags)
> +{
> +	struct gve_priv *priv = netdev_priv(netdev);
> +	u64 ori_flags, new_flags;
> +	u32 i;
> +
> +	ori_flags = READ_ONCE(priv->ethtool_flags);
> +	new_flags = ori_flags;
> +
> +	for (i = 0; i < GVE_PRIV_FLAGS_STR_LEN; i++) {
> +		if (flags & BIT(i))
> +			new_flags |= BIT(i);
> +		else
> +			new_flags &= ~(BIT(i));
> +		priv->ethtool_flags = new_flags;
> +		/* set report-stats */
> +		if (strcmp(gve_gstrings_priv_flags[i], "report-stats") == 0) {
> +			/* update the stats when user turns report-stats on */
> +			if (flags & BIT(i))
> +				gve_handle_report_stats(priv);
> +			/* zero off gve stats when report-stats turned off */
> +			if (!(flags & BIT(i)) && (ori_flags & BIT(i))) {
> +				int tx_stats_num = GVE_TX_STATS_REPORT_NUM *
> +					priv->tx_cfg.num_queues;
> +				int rx_stats_num = GVE_RX_STATS_REPORT_NUM *
> +					priv->rx_cfg.num_queues;
> +				memset(priv->stats_report->stats, 0,
> +				       (tx_stats_num + rx_stats_num) *
> +				       sizeof(struct stats));

I don't quite get why you need the knob to disable some statistics.
Please remove or explain this in the cover letter. Looks unnecessary.

> +			}
> +		}
> +	}
> +
> +	return 0;
> +}

> @@ -880,6 +953,10 @@ static void gve_handle_status(struct gve_priv *priv, u32 status)
>  		dev_info(&priv->pdev->dev, "Device requested reset.\n");
>  		gve_set_do_reset(priv);
>  	}
> +	if (GVE_DEVICE_STATUS_REPORT_STATS_MASK & status) {
> +		dev_info(&priv->pdev->dev, "Device report stats on.\n");

How often is this printed?

> +		gve_set_do_report_stats(priv);
> +	}
>  }

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH net-next 07/18] gve: Use link status register to report link status
  2020-08-18 19:44 ` [PATCH net-next 07/18] gve: Use link status register to report link status David Awogbemila
@ 2020-08-19  3:36   ` Jakub Kicinski
  2020-08-25 15:46     ` David Awogbemila
  0 siblings, 1 reply; 38+ messages in thread
From: Jakub Kicinski @ 2020-08-19  3:36 UTC (permalink / raw)
  To: David Awogbemila; +Cc: netdev, Patricio Noyola, Yangchun Fu

On Tue, 18 Aug 2020 12:44:06 -0700 David Awogbemila wrote:
> +	if (link_status) {
> +		netif_carrier_on(priv->dev);
> +	} else {
> +		dev_info(&priv->pdev->dev, "Device link is down.\n");

Down message but no Up message?

> +		netif_carrier_off(priv->dev);

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH net-next 18/18] gve: Bump version to 1.1.0.
  2020-08-18 19:44 ` [PATCH net-next 18/18] gve: Bump version to 1.1.0 David Awogbemila
@ 2020-08-19  3:40   ` Jakub Kicinski
  0 siblings, 0 replies; 38+ messages in thread
From: Jakub Kicinski @ 2020-08-19  3:40 UTC (permalink / raw)
  To: David Awogbemila; +Cc: netdev

On Tue, 18 Aug 2020 12:44:17 -0700 David Awogbemila wrote:
> Update the driver version reported to the device and ethtool.
> 
> Signed-off-by: David Awogbemila <awogbemila@google.com>

Please remove the driver version, we're working on getting rid of it in
all networking drivers.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH net-next 05/18] gve: Add Gvnic stats AQ command and ethtool show/set-priv-flags.
  2020-08-19  3:13   ` Jakub Kicinski
@ 2020-08-25 15:46     ` David Awogbemila
  2020-08-25 16:46       ` Jakub Kicinski
  0 siblings, 1 reply; 38+ messages in thread
From: David Awogbemila @ 2020-08-25 15:46 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: netdev, Kuo Zhao, Yangchun Fu

On Tue, Aug 18, 2020 at 8:13 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Tue, 18 Aug 2020 12:44:04 -0700 David Awogbemila wrote:
> > From: Kuo Zhao <kuozhao@google.com>
> >
> > Changes:
> > - Add a new flag in service_task_flags. Check both this flag and
> > ethtool flag when handle report stats. Update the stats when user turns
> > ethtool flag on.
> >
> > - In order to expose the NIC stats to the guest even when the ethtool flag
> > is off, share the address and length of report at setup. When the
> > ethtool flag turned off, zero off the gve stats instead of detaching the
> > report. Only detach the report in free_stats_report.
> >
> > - Adds the NIC stats to ethtool stats. These stats are always
> > exposed to guest no matter the report stats flag is turned
> > on or off.
> >
> > - Update gve stats once every 20 seconds.
> >
> > - Add a field for the interval of updating stats report to the AQ
> > command. It will be exposed to USPS so that USPS can use the same
> > interval to update its stats in the report.
> >
> > Reviewed-by: Yangchun Fu <yangchun@google.com>
> > Signed-off-by: Kuo Zhao <kuozhao@google.com>
> > Signed-off-by: David Awogbemila <awogbemila@google.com>
>
> This patch is quite hard to parse, please work on improving its
> readability. Perhaps start by splitting changes to the stats from
> hypervisor from the stats to hypervisor.
Alright, I will split the patch as suggested.

>
> > +enum gve_stat_names {
> > +     // stats from gve
> > +     TX_WAKE_CNT                     = 1,
> > +     TX_STOP_CNT                     = 2,
> > +     TX_FRAMES_SENT                  = 3,
> > +     TX_BYTES_SENT                   = 4,
> > +     TX_LAST_COMPLETION_PROCESSED    = 5,
> > +     RX_NEXT_EXPECTED_SEQUENCE       = 6,
> > +     RX_BUFFERS_POSTED               = 7,
>
> Just out of curiosity - what's the use for the stats reported by VM to
> the hypervisor?
These stats are not used in the driver but are useful when looking at
the virtual NIC to investigate stuck queues and performance.

>
> > +     // stats from NIC
> > +     RX_QUEUE_DROP_CNT               = 65,
> > +     RX_NO_BUFFERS_POSTED            = 66,
> > +     RX_DROPS_PACKET_OVER_MRU        = 67,
> > +     RX_DROPS_INVALID_CHECKSUM       = 68,
>
> Most of these look like a perfect match for members of struct
> rtnl_link_stats64. Please use the standard stats to report the errors,
> wherever possible.
These stats are based on the NIC stats format which don't exactly
match rtnl_link_stats64.
I'll add some clarification in the description and within the comments.

>
> > +};
> > +
> >  union gve_adminq_command {
> >       struct {
> >               __be32 opcode;
>
> > +static int gve_set_priv_flags(struct net_device *netdev, u32 flags)
> > +{
> > +     struct gve_priv *priv = netdev_priv(netdev);
> > +     u64 ori_flags, new_flags;
> > +     u32 i;
> > +
> > +     ori_flags = READ_ONCE(priv->ethtool_flags);
> > +     new_flags = ori_flags;
> > +
> > +     for (i = 0; i < GVE_PRIV_FLAGS_STR_LEN; i++) {
> > +             if (flags & BIT(i))
> > +                     new_flags |= BIT(i);
> > +             else
> > +                     new_flags &= ~(BIT(i));
> > +             priv->ethtool_flags = new_flags;
> > +             /* set report-stats */
> > +             if (strcmp(gve_gstrings_priv_flags[i], "report-stats") == 0) {
> > +                     /* update the stats when user turns report-stats on */
> > +                     if (flags & BIT(i))
> > +                             gve_handle_report_stats(priv);
> > +                     /* zero off gve stats when report-stats turned off */
> > +                     if (!(flags & BIT(i)) && (ori_flags & BIT(i))) {
> > +                             int tx_stats_num = GVE_TX_STATS_REPORT_NUM *
> > +                                     priv->tx_cfg.num_queues;
> > +                             int rx_stats_num = GVE_RX_STATS_REPORT_NUM *
> > +                                     priv->rx_cfg.num_queues;
> > +                             memset(priv->stats_report->stats, 0,
> > +                                    (tx_stats_num + rx_stats_num) *
> > +                                    sizeof(struct stats));
>
> I don't quite get why you need the knob to disable some statistics.
> Please remove or explain this in the cover letter. Looks unnecessary.
We use this to give the guest the option of disabling stats reporting
through ethtool set-priv-flags. I'll update the cover letter.

>
> > +                     }
> > +             }
> > +     }
> > +
> > +     return 0;
> > +}
>
> > @@ -880,6 +953,10 @@ static void gve_handle_status(struct gve_priv *priv, u32 status)
> >               dev_info(&priv->pdev->dev, "Device requested reset.\n");
> >               gve_set_do_reset(priv);
> >       }
> > +     if (GVE_DEVICE_STATUS_REPORT_STATS_MASK & status) {
> > +             dev_info(&priv->pdev->dev, "Device report stats on.\n");
>
> How often is this printed?
Stats reporting is disabled by default. But when enabled, this would
only get printed whenever the virtual NIC detects
an issue and triggers a report-stats request.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH net-next 07/18] gve: Use link status register to report link status
  2020-08-19  3:36   ` Jakub Kicinski
@ 2020-08-25 15:46     ` David Awogbemila
  0 siblings, 0 replies; 38+ messages in thread
From: David Awogbemila @ 2020-08-25 15:46 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: netdev, Patricio Noyola, Yangchun Fu

On Tue, Aug 18, 2020 at 8:36 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Tue, 18 Aug 2020 12:44:06 -0700 David Awogbemila wrote:
> > +     if (link_status) {
> > +             netif_carrier_on(priv->dev);
> > +     } else {
> > +             dev_info(&priv->pdev->dev, "Device link is down.\n");
>
> Down message but no Up message?
>
> > +             netif_carrier_off(priv->dev);

Thanks Jakub. I'll add an Up message to make it more uniform.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH net-next 05/18] gve: Add Gvnic stats AQ command and ethtool show/set-priv-flags.
  2020-08-25 15:46     ` David Awogbemila
@ 2020-08-25 16:46       ` Jakub Kicinski
  2020-08-26  0:06         ` David Awogbemila
  0 siblings, 1 reply; 38+ messages in thread
From: Jakub Kicinski @ 2020-08-25 16:46 UTC (permalink / raw)
  To: David Awogbemila, Yangchun Fu; +Cc: netdev, Kuo Zhao, Willem de Bruijn

On Tue, 25 Aug 2020 08:46:12 -0700 David Awogbemila wrote:
> > > +     // stats from NIC
> > > +     RX_QUEUE_DROP_CNT               = 65,
> > > +     RX_NO_BUFFERS_POSTED            = 66,
> > > +     RX_DROPS_PACKET_OVER_MRU        = 67,
> > > +     RX_DROPS_INVALID_CHECKSUM       = 68,  
> >
> > Most of these look like a perfect match for members of struct
> > rtnl_link_stats64. Please use the standard stats to report the errors,
> > wherever possible.  
> These stats are based on the NIC stats format which don't exactly
> match rtnl_link_stats64.
> I'll add some clarification in the description and within the comments.

You must report standard stats. Don't be lazy and just dump everything
in ethtool -S and expect the user to figure out the meaning of your
strings.

> > > +static int gve_set_priv_flags(struct net_device *netdev, u32 flags)
> > > +{
> > > +     struct gve_priv *priv = netdev_priv(netdev);
> > > +     u64 ori_flags, new_flags;
> > > +     u32 i;
> > > +
> > > +     ori_flags = READ_ONCE(priv->ethtool_flags);
> > > +     new_flags = ori_flags;
> > > +
> > > +     for (i = 0; i < GVE_PRIV_FLAGS_STR_LEN; i++) {
> > > +             if (flags & BIT(i))
> > > +                     new_flags |= BIT(i);
> > > +             else
> > > +                     new_flags &= ~(BIT(i));
> > > +             priv->ethtool_flags = new_flags;
> > > +             /* set report-stats */
> > > +             if (strcmp(gve_gstrings_priv_flags[i], "report-stats") == 0) {
> > > +                     /* update the stats when user turns report-stats on */
> > > +                     if (flags & BIT(i))
> > > +                             gve_handle_report_stats(priv);
> > > +                     /* zero off gve stats when report-stats turned off */
> > > +                     if (!(flags & BIT(i)) && (ori_flags & BIT(i))) {
> > > +                             int tx_stats_num = GVE_TX_STATS_REPORT_NUM *
> > > +                                     priv->tx_cfg.num_queues;
> > > +                             int rx_stats_num = GVE_RX_STATS_REPORT_NUM *
> > > +                                     priv->rx_cfg.num_queues;
> > > +                             memset(priv->stats_report->stats, 0,
> > > +                                    (tx_stats_num + rx_stats_num) *
> > > +                                    sizeof(struct stats));  
> >
> > I don't quite get why you need the knob to disable some statistics.
> > Please remove or explain this in the cover letter. Looks unnecessary.  
> We use this to give the guest the option of disabling stats reporting
> through ethtool set-priv-flags. I'll update the cover letter.

I asked you why you reply a week later with "I want to give user the
option. I'll update the cover letter." :/ That's quite painful for the
reviewer. Please just provide the justification.

> > > @@ -880,6 +953,10 @@ static void gve_handle_status(struct gve_priv *priv, u32 status)
> > >               dev_info(&priv->pdev->dev, "Device requested reset.\n");
> > >               gve_set_do_reset(priv);
> > >       }
> > > +     if (GVE_DEVICE_STATUS_REPORT_STATS_MASK & status) {
> > > +             dev_info(&priv->pdev->dev, "Device report stats on.\n");  
> >
> > How often is this printed?  
> Stats reporting is disabled by default. But when enabled, this would
> only get printed whenever the virtual NIC detects
> an issue and triggers a report-stats request.

What kind of issue? Something serious? Packet drops?

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH net-next 05/18] gve: Add Gvnic stats AQ command and ethtool show/set-priv-flags.
  2020-08-25 16:46       ` Jakub Kicinski
@ 2020-08-26  0:06         ` David Awogbemila
  2020-08-26  0:53           ` Jakub Kicinski
  0 siblings, 1 reply; 38+ messages in thread
From: David Awogbemila @ 2020-08-26  0:06 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: Yangchun Fu, netdev, Kuo Zhao, Willem de Bruijn

On Tue, Aug 25, 2020 at 9:46 AM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Tue, 25 Aug 2020 08:46:12 -0700 David Awogbemila wrote:
> > > > +     // stats from NIC
> > > > +     RX_QUEUE_DROP_CNT               = 65,
> > > > +     RX_NO_BUFFERS_POSTED            = 66,
> > > > +     RX_DROPS_PACKET_OVER_MRU        = 67,
> > > > +     RX_DROPS_INVALID_CHECKSUM       = 68,
> > >
> > > Most of these look like a perfect match for members of struct
> > > rtnl_link_stats64. Please use the standard stats to report the errors,
> > > wherever possible.
> > These stats are based on the NIC stats format which don't exactly
> > match rtnl_link_stats64.
> > I'll add some clarification in the description and within the comments.
>
> You must report standard stats. Don't be lazy and just dump everything
> in ethtool -S and expect the user to figure out the meaning of your
> strings.
Apologies for responding a week later, I'll try to respond more quickly.
I could use some help figuring out the use of rtnl_link_stats64 here.
These 4 stats are per-queue stats written by the NIC. It looks like
rtnl_link_stats64 is meant to sum stats for the entire device? Is the
requirement here simply to use the member names in rtnl_link_stats64
when reporting stats via ethtool? Thanks.

>
> > > > +static int gve_set_priv_flags(struct net_device *netdev, u32 flags)
> > > > +{
> > > > +     struct gve_priv *priv = netdev_priv(netdev);
> > > > +     u64 ori_flags, new_flags;
> > > > +     u32 i;
> > > > +
> > > > +     ori_flags = READ_ONCE(priv->ethtool_flags);
> > > > +     new_flags = ori_flags;
> > > > +
> > > > +     for (i = 0; i < GVE_PRIV_FLAGS_STR_LEN; i++) {
> > > > +             if (flags & BIT(i))
> > > > +                     new_flags |= BIT(i);
> > > > +             else
> > > > +                     new_flags &= ~(BIT(i));
> > > > +             priv->ethtool_flags = new_flags;
> > > > +             /* set report-stats */
> > > > +             if (strcmp(gve_gstrings_priv_flags[i], "report-stats") == 0) {
> > > > +                     /* update the stats when user turns report-stats on */
> > > > +                     if (flags & BIT(i))
> > > > +                             gve_handle_report_stats(priv);
> > > > +                     /* zero off gve stats when report-stats turned off */
> > > > +                     if (!(flags & BIT(i)) && (ori_flags & BIT(i))) {
> > > > +                             int tx_stats_num = GVE_TX_STATS_REPORT_NUM *
> > > > +                                     priv->tx_cfg.num_queues;
> > > > +                             int rx_stats_num = GVE_RX_STATS_REPORT_NUM *
> > > > +                                     priv->rx_cfg.num_queues;
> > > > +                             memset(priv->stats_report->stats, 0,
> > > > +                                    (tx_stats_num + rx_stats_num) *
> > > > +                                    sizeof(struct stats));
> > >
> > > I don't quite get why you need the knob to disable some statistics.
> > > Please remove or explain this in the cover letter. Looks unnecessary.
> > We use this to give the guest the option of disabling stats reporting
> > through ethtool set-priv-flags. I'll update the cover letter.
>
> I asked you why you reply a week later with "I want to give user the
> option. I'll update the cover letter." :/ That's quite painful for the
> reviewer. Please just provide the justification.
I apologize for the pain; it certainly wasn't intended :) .
Just to clarify, stats will always be available to the user via ethtool.
This is only giving users the option of disabling the reporting of
stats from the driver to the virtual NIC should the user decide they
do not want to share driver stats with Google as a matter of privacy.

>
> > > > @@ -880,6 +953,10 @@ static void gve_handle_status(struct gve_priv *priv, u32 status)
> > > >               dev_info(&priv->pdev->dev, "Device requested reset.\n");
> > > >               gve_set_do_reset(priv);
> > > >       }
> > > > +     if (GVE_DEVICE_STATUS_REPORT_STATS_MASK & status) {
> > > > +             dev_info(&priv->pdev->dev, "Device report stats on.\n");
> > >
> > > How often is this printed?
> > Stats reporting is disabled by default. But when enabled, this would
> > only get printed whenever the virtual NIC detects
> > an issue and triggers a report-stats request.
>
> What kind of issue? Something serious? Packet drops?
Sorry, to correct myself, this would get printed only at the moments
when the device switches report-stats on, not on every stats report.
After that, it would not get printed until it is switched off and then
on again, so rarely.
It would get switched on if there is a networking issue such as packet
drops and help us investigate a stuck queue for example.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH net-next 05/18] gve: Add Gvnic stats AQ command and ethtool show/set-priv-flags.
  2020-08-26  0:06         ` David Awogbemila
@ 2020-08-26  0:53           ` Jakub Kicinski
  2020-08-27 19:24             ` David Awogbemila
  0 siblings, 1 reply; 38+ messages in thread
From: Jakub Kicinski @ 2020-08-26  0:53 UTC (permalink / raw)
  To: David Awogbemila; +Cc: Yangchun Fu, netdev, Kuo Zhao, Willem de Bruijn

On Tue, 25 Aug 2020 17:06:27 -0700 David Awogbemila wrote:
> On Tue, Aug 25, 2020 at 9:46 AM Jakub Kicinski <kuba@kernel.org> wrote:
> > On Tue, 25 Aug 2020 08:46:12 -0700 David Awogbemila wrote:  
> > > > > +     // stats from NIC
> > > > > +     RX_QUEUE_DROP_CNT               = 65,
> > > > > +     RX_NO_BUFFERS_POSTED            = 66,
> > > > > +     RX_DROPS_PACKET_OVER_MRU        = 67,
> > > > > +     RX_DROPS_INVALID_CHECKSUM       = 68,  
> > > >
> > > > Most of these look like a perfect match for members of struct
> > > > rtnl_link_stats64. Please use the standard stats to report the errors,
> > > > wherever possible.  
> > > These stats are based on the NIC stats format which don't exactly
> > > match rtnl_link_stats64.
> > > I'll add some clarification in the description and within the comments.  
> >
> > You must report standard stats. Don't be lazy and just dump everything
> > in ethtool -S and expect the user to figure out the meaning of your
> > strings.  
> Apologies for responding a week later, I'll try to respond more quickly.

Thanks!

> I could use some help figuring out the use of rtnl_link_stats64 here.
> These 4 stats are per-queue stats written by the NIC. It looks like
> rtnl_link_stats64 is meant to sum stats for the entire device? Is the
> requirement here simply to use the member names in rtnl_link_stats64
> when reporting stats via ethtool? Thanks.

If these are per queue you can report them per queue in ethtool (name
is up to you), but you must report the sum over all queues in
rtnl_link_stats64.

FWIW the mapping I'd suggest is probably:

RX_QUEUE_DROP_CNT         -> rx_dropped
RX_NO_BUFFERS_POSTED      -> rx_missed_errors
RX_DROPS_PACKET_OVER_MRU  -> rx_length_errors
RX_DROPS_INVALID_CHECKSUM -> rx_crc_errors

The no-buffers-posted stat is unfortunately slightly disputable between
rx_missed_errors and rx_fifo_errors (even rx_over_errors). I'd go for missed.

> > > > > +static int gve_set_priv_flags(struct net_device *netdev, u32 flags)
> > > > > +{
> > > > > +     struct gve_priv *priv = netdev_priv(netdev);
> > > > > +     u64 ori_flags, new_flags;
> > > > > +     u32 i;
> > > > > +
> > > > > +     ori_flags = READ_ONCE(priv->ethtool_flags);
> > > > > +     new_flags = ori_flags;
> > > > > +
> > > > > +     for (i = 0; i < GVE_PRIV_FLAGS_STR_LEN; i++) {
> > > > > +             if (flags & BIT(i))
> > > > > +                     new_flags |= BIT(i);
> > > > > +             else
> > > > > +                     new_flags &= ~(BIT(i));
> > > > > +             priv->ethtool_flags = new_flags;
> > > > > +             /* set report-stats */
> > > > > +             if (strcmp(gve_gstrings_priv_flags[i], "report-stats") == 0) {
> > > > > +                     /* update the stats when user turns report-stats on */
> > > > > +                     if (flags & BIT(i))
> > > > > +                             gve_handle_report_stats(priv);
> > > > > +                     /* zero off gve stats when report-stats turned off */
> > > > > +                     if (!(flags & BIT(i)) && (ori_flags & BIT(i))) {
> > > > > +                             int tx_stats_num = GVE_TX_STATS_REPORT_NUM *
> > > > > +                                     priv->tx_cfg.num_queues;
> > > > > +                             int rx_stats_num = GVE_RX_STATS_REPORT_NUM *
> > > > > +                                     priv->rx_cfg.num_queues;
> > > > > +                             memset(priv->stats_report->stats, 0,
> > > > > +                                    (tx_stats_num + rx_stats_num) *
> > > > > +                                    sizeof(struct stats));  
> > > >
> > > > I don't quite get why you need the knob to disable some statistics.
> > > > Please remove or explain this in the cover letter. Looks unnecessary.  
> > > We use this to give the guest the option of disabling stats reporting
> > > through ethtool set-priv-flags. I'll update the cover letter.  
> >
> > I asked you why you reply a week later with "I want to give user the
> > option. I'll update the cover letter." :/ That's quite painful for the
> > reviewer. Please just provide the justification.  
> I apologize for the pain; it certainly wasn't intended :) .
> Just to clarify, stats will always be available to the user via ethtool.
> This is only giving users the option of disabling the reporting of
> stats from the driver to the virtual NIC should the user decide they
> do not want to share driver stats with Google as a matter of privacy.

Okay, so this is for the to-hypervisor direction. Hopefully the patch
split will make this clearer.

> > > > > @@ -880,6 +953,10 @@ static void gve_handle_status(struct gve_priv *priv, u32 status)
> > > > >               dev_info(&priv->pdev->dev, "Device requested reset.\n");
> > > > >               gve_set_do_reset(priv);
> > > > >       }
> > > > > +     if (GVE_DEVICE_STATUS_REPORT_STATS_MASK & status) {
> > > > > +             dev_info(&priv->pdev->dev, "Device report stats on.\n");  
> > > >
> > > > How often is this printed?  
> > > Stats reporting is disabled by default. But when enabled, this would
> > > only get printed whenever the virtual NIC detects
> > > an issue and triggers a report-stats request.  
> >
> > What kind of issue? Something serious? Packet drops?  
> Sorry, to correct myself, this would get printed only at the moments
> when the device switches report-stats on, not on every stats report.
> After that, it would not get printed until it is switched off and then
> on again, so rarely.
> It would get switched on if there is a networking issue such as packet
> drops and help us investigate a stuck queue for example.

Reporting of the stats is a one-shot:

+	if (gve_get_do_report_stats(priv)) {
+		gve_handle_report_stats(priv);
+		gve_clear_do_report_stats(priv);
+	}

So the hypervisor implements some rate limiting on the requests?

In general, I don't think users will want these messages to keep
popping up. A ethtool -S statistic for the number of times reporting
happened should be sufficient. Or if you really want them - have a
verbose version of the priv flag, maybe?

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH net-next 05/18] gve: Add Gvnic stats AQ command and ethtool show/set-priv-flags.
  2020-08-26  0:53           ` Jakub Kicinski
@ 2020-08-27 19:24             ` David Awogbemila
  0 siblings, 0 replies; 38+ messages in thread
From: David Awogbemila @ 2020-08-27 19:24 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: Yangchun Fu, netdev, Kuo Zhao, Willem de Bruijn

> > I could use some help figuring out the use of rtnl_link_stats64 here.
> > These 4 stats are per-queue stats written by the NIC. It looks like
> > rtnl_link_stats64 is meant to sum stats for the entire device? Is the
> > requirement here simply to use the member names in rtnl_link_stats64
> > when reporting stats via ethtool? Thanks.
>
> If these are per queue you can report them per queue in ethtool (name
> is up to you), but you must report the sum over all queues in
> rtnl_link_stats64.
>
> FWIW the mapping I'd suggest is probably:
>
> RX_QUEUE_DROP_CNT         -> rx_dropped
> RX_NO_BUFFERS_POSTED      -> rx_missed_errors
> RX_DROPS_PACKET_OVER_MRU  -> rx_length_errors
> RX_DROPS_INVALID_CHECKSUM -> rx_crc_errors
>
> The no-buffers-posted stat is unfortunately slightly disputable between
> rx_missed_errors and rx_fifo_errors (even rx_over_errors). I'd go for missed.
This is very helpful, thanks. The aggregate stats are the stats named
in gve_gstrings_main_stats (in gve_ethtool.c). This particular patch
does not change those. Patch 2 ("gve: Add stats for gve.") is what I
think may have to change to conform to rtnl_link_stats64(?)

> > > > > > +static int gve_set_priv_flags(struct net_device *netdev, u32 flags)
> > > > > > +{
> > > > > > +     struct gve_priv *priv = netdev_priv(netdev);
> > > > > > +     u64 ori_flags, new_flags;
> > > > > > +     u32 i;
> > > > > > +
> > > > > > +     ori_flags = READ_ONCE(priv->ethtool_flags);
> > > > > > +     new_flags = ori_flags;
> > > > > > +
> > > > > > +     for (i = 0; i < GVE_PRIV_FLAGS_STR_LEN; i++) {
> > > > > > +             if (flags & BIT(i))
> > > > > > +                     new_flags |= BIT(i);
> > > > > > +             else
> > > > > > +                     new_flags &= ~(BIT(i));
> > > > > > +             priv->ethtool_flags = new_flags;
> > > > > > +             /* set report-stats */
> > > > > > +             if (strcmp(gve_gstrings_priv_flags[i], "report-stats") == 0) {
> > > > > > +                     /* update the stats when user turns report-stats on */
> > > > > > +                     if (flags & BIT(i))
> > > > > > +                             gve_handle_report_stats(priv);
> > > > > > +                     /* zero off gve stats when report-stats turned off */
> > > > > > +                     if (!(flags & BIT(i)) && (ori_flags & BIT(i))) {
> > > > > > +                             int tx_stats_num = GVE_TX_STATS_REPORT_NUM *
> > > > > > +                                     priv->tx_cfg.num_queues;
> > > > > > +                             int rx_stats_num = GVE_RX_STATS_REPORT_NUM *
> > > > > > +                                     priv->rx_cfg.num_queues;
> > > > > > +                             memset(priv->stats_report->stats, 0,
> > > > > > +                                    (tx_stats_num + rx_stats_num) *
> > > > > > +                                    sizeof(struct stats));
> > > > >
> > > > > I don't quite get why you need the knob to disable some statistics.
> > > > > Please remove or explain this in the cover letter. Looks unnecessary.
> > > > We use this to give the guest the option of disabling stats reporting
> > > > through ethtool set-priv-flags. I'll update the cover letter.
> > >
> > > I asked you why you reply a week later with "I want to give user the
> > > option. I'll update the cover letter." :/ That's quite painful for the
> > > reviewer. Please just provide the justification.
> > I apologize for the pain; it certainly wasn't intended :) .
> > Just to clarify, stats will always be available to the user via ethtool.
> > This is only giving users the option of disabling the reporting of
> > stats from the driver to the virtual NIC should the user decide they
> > do not want to share driver stats with Google as a matter of privacy.
>
> Okay, so this is for the to-hypervisor direction. Hopefully the patch
> split will make this clearer.
>
> > > > > > @@ -880,6 +953,10 @@ static void gve_handle_status(struct gve_priv *priv, u32 status)
> > > > > >               dev_info(&priv->pdev->dev, "Device requested reset.\n");
> > > > > >               gve_set_do_reset(priv);
> > > > > >       }
> > > > > > +     if (GVE_DEVICE_STATUS_REPORT_STATS_MASK & status) {
> > > > > > +             dev_info(&priv->pdev->dev, "Device report stats on.\n");
> > > > >
> > > > > How often is this printed?
> > > > Stats reporting is disabled by default. But when enabled, this would
> > > > only get printed whenever the virtual NIC detects
> > > > an issue and triggers a report-stats request.
> > >
> > > What kind of issue? Something serious? Packet drops?
> > Sorry, to correct myself, this would get printed only at the moments
> > when the device switches report-stats on, not on every stats report.
> > After that, it would not get printed until it is switched off and then
> > on again, so rarely.
> > It would get switched on if there is a networking issue such as packet
> > drops and help us investigate a stuck queue for example.
>
> Reporting of the stats is a one-shot:
>
> +       if (gve_get_do_report_stats(priv)) {
> +               gve_handle_report_stats(priv);
> +               gve_clear_do_report_stats(priv);
> +       }
>
> So the hypervisor implements some rate limiting on the requests?
Yes, not every packet drop would trigger the request, only if the
virtual NIC feels that "too many" are being dropped.
When report-stats is turned on (both user ethtool setting is enabled
and device has requested to turn it on), the stats would be updated
once every 20 seconds.

>
> In general, I don't think users will want these messages to keep
> popping up. A ethtool -S statistic for the number of times reporting
> happened should be sufficient. Or if you really want them - have a
> verbose version of the priv flag, maybe?
An ethtool stat (counting requests to turn report-stats on from the
virtual NIC) should suffice.
(I think a verbose logging option should probably apply to not just
this log, so maybe that's another patch in itself?)

^ permalink raw reply	[flat|nested] 38+ messages in thread

end of thread, other threads:[~2020-08-27 19:24 UTC | newest]

Thread overview: 38+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-08-18 19:43 [PATCH net-next 00/18] GVE Driver v1.1.0 Features David Awogbemila
2020-08-18 19:44 ` [PATCH net-next 01/18] gve: Get and set Rx copybreak via ethtool David Awogbemila
2020-08-18 20:00   ` Andrew Lunn
2020-08-18 19:44 ` [PATCH net-next 02/18] gve: Add stats for gve David Awogbemila
2020-08-18 19:44 ` [PATCH net-next 03/18] gve: Register netdev earlier David Awogbemila
2020-08-18 20:09   ` Andrew Lunn
2020-08-18 19:44 ` [PATCH net-next 04/18] gve: Add support for dma_mask register David Awogbemila
2020-08-18 20:15   ` Andrew Lunn
2020-08-18 19:44 ` [PATCH net-next 05/18] gve: Add Gvnic stats AQ command and ethtool show/set-priv-flags David Awogbemila
2020-08-19  3:13   ` Jakub Kicinski
2020-08-25 15:46     ` David Awogbemila
2020-08-25 16:46       ` Jakub Kicinski
2020-08-26  0:06         ` David Awogbemila
2020-08-26  0:53           ` Jakub Kicinski
2020-08-27 19:24             ` David Awogbemila
2020-08-18 19:44 ` [PATCH net-next 06/18] gve: Batch AQ commands for creating and destroying queues David Awogbemila
2020-08-18 20:16   ` David Miller
2020-08-18 22:25     ` David Awogbemila
2020-08-18 19:44 ` [PATCH net-next 07/18] gve: Use link status register to report link status David Awogbemila
2020-08-19  3:36   ` Jakub Kicinski
2020-08-25 15:46     ` David Awogbemila
2020-08-18 19:44 ` [PATCH net-next 08/18] gve: Enable Link Speed Reporting in the driver David Awogbemila
2020-08-18 21:30   ` Jakub Kicinski
2020-08-18 19:44 ` [PATCH net-next 09/18] gve: Add support for raw addressing device option David Awogbemila
2020-08-18 19:44 ` [PATCH net-next 10/18] gve: Add support for raw addressing to the rx path David Awogbemila
2020-08-18 20:18   ` David Miller
2020-08-18 22:25     ` David Awogbemila
2020-08-18 19:44 ` [PATCH net-next 11/18] gve: Add support for raw addressing in the tx path David Awogbemila
2020-08-18 19:44 ` [PATCH net-next 12/18] gve: Add netif_set_xps_queue call David Awogbemila
2020-08-18 19:44 ` [PATCH net-next 13/18] gve: Add rx buffer pagecnt bias David Awogbemila
2020-08-18 19:44 ` [PATCH net-next 14/18] gve: Move the irq db indexes out of the ntfy block struct David Awogbemila
2020-08-18 19:44 ` [PATCH net-next 15/18] gve: Prefetch packet pages and packet descriptors David Awogbemila
2020-08-18 19:44 ` [PATCH net-next 16/18] gve: Also WARN for skb index equals num_queues David Awogbemila
2020-08-18 19:44 ` [PATCH net-next 17/18] gve: Switch to use napi_complete_done David Awogbemila
2020-08-18 19:44 ` [PATCH net-next 18/18] gve: Bump version to 1.1.0 David Awogbemila
2020-08-19  3:40   ` Jakub Kicinski
2020-08-18 20:19 ` [PATCH net-next 00/18] GVE Driver v1.1.0 Features David Miller
2020-08-18 22:24   ` David Awogbemila

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.