linux-arm-kernel.lists.infradead.org archive mirror
* [PATCH net-next 0/3] net: hns3: add support for TX push
@ 2021-06-22 11:11 Guangbin Huang
  2021-06-22 11:11 ` [PATCH net-next 1/3] arm64: barrier: add DGH macros to control memory accesses merging Guangbin Huang
                   ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Guangbin Huang @ 2021-06-22 11:11 UTC (permalink / raw)
  To: davem, kuba, catalin.marinas, will, maz, mark.rutland, dbrazdil, qperret
  Cc: netdev, linux-kernel, linux-arm-kernel, lipeng321, huangguangbin2

This series adds TX push support for the HNS3 ethernet driver.

Huazhong Tan (2):
  net: hns3: add support for TX push mode
  net: hns3: add ethtool priv-flag for TX push

Xiongfeng Wang (1):
  arm64: barrier: add DGH macros to control memory accesses merging

 arch/arm64/include/asm/assembler.h                 |  7 ++
 arch/arm64/include/asm/barrier.h                   |  1 +
 drivers/net/ethernet/hisilicon/hns3/hnae3.h        |  2 +
 drivers/net/ethernet/hisilicon/hns3/hns3_enet.c    | 86 +++++++++++++++++++++-
 drivers/net/ethernet/hisilicon/hns3/hns3_enet.h    |  6 ++
 drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c | 21 +++++-
 .../net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.c |  2 +
 .../ethernet/hisilicon/hns3/hns3pf/hclge_main.c    | 11 ++-
 .../ethernet/hisilicon/hns3/hns3pf/hclge_main.h    |  8 ++
 .../ethernet/hisilicon/hns3/hns3vf/hclgevf_cmd.c   |  2 +
 .../ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c  | 11 ++-
 .../ethernet/hisilicon/hns3/hns3vf/hclgevf_main.h  |  8 ++
 12 files changed, 156 insertions(+), 9 deletions(-)

-- 
2.8.1


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel


* [PATCH net-next 1/3] arm64: barrier: add DGH macros to control memory accesses merging
  2021-06-22 11:11 [PATCH net-next 0/3] net: hns3: add support for TX push Guangbin Huang
@ 2021-06-22 11:11 ` Guangbin Huang
  2021-06-22 12:16   ` Will Deacon
  2021-06-22 11:11 ` [PATCH net-next 2/3] net: hns3: add support for TX push mode Guangbin Huang
  2021-06-22 11:11 ` [PATCH net-next 3/3] net: hns3: add ethtool priv-flag for TX push Guangbin Huang
  2 siblings, 1 reply; 12+ messages in thread
From: Guangbin Huang @ 2021-06-22 11:11 UTC (permalink / raw)
  To: davem, kuba, catalin.marinas, will, maz, mark.rutland, dbrazdil, qperret
  Cc: netdev, linux-kernel, linux-arm-kernel, lipeng321, huangguangbin2

From: Xiongfeng Wang <wangxiongfeng2@huawei.com>

DGH prohibits merging memory accesses with Normal-NC or Device-GRE
attributes before the hint instruction with any memory accesses
appearing after the hint instruction. Provide macros to expose it to the
arch code.

Signed-off-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
Signed-off-by: Cheng Jian <cj.chengjian@huawei.com>
Signed-off-by: Yufeng Mo <moyufeng@huawei.com>
---
 arch/arm64/include/asm/assembler.h | 7 +++++++
 arch/arm64/include/asm/barrier.h   | 1 +
 2 files changed, 8 insertions(+)

diff --git a/arch/arm64/include/asm/assembler.h b/arch/arm64/include/asm/assembler.h
index 8418c1bd8f04..d723899328bd 100644
--- a/arch/arm64/include/asm/assembler.h
+++ b/arch/arm64/include/asm/assembler.h
@@ -90,6 +90,13 @@
 	.endm
 
 /*
+ * Data gathering hint
+ */
+	.macro	dgh
+	hint	#6
+	.endm
+
+/*
  * RAS Error Synchronization barrier
  */
 	.macro  esb
diff --git a/arch/arm64/include/asm/barrier.h b/arch/arm64/include/asm/barrier.h
index 451e11e5fd23..02e1735706d2 100644
--- a/arch/arm64/include/asm/barrier.h
+++ b/arch/arm64/include/asm/barrier.h
@@ -22,6 +22,7 @@
 #define dmb(opt)	asm volatile("dmb " #opt : : : "memory")
 #define dsb(opt)	asm volatile("dsb " #opt : : : "memory")
 
+#define dgh()		asm volatile("hint #6" : : : "memory")
 #define psb_csync()	asm volatile("hint #17" : : : "memory")
 #define tsb_csync()	asm volatile("hint #18" : : : "memory")
 #define csdb()		asm volatile("hint #20" : : : "memory")
-- 
2.8.1



* [PATCH net-next 2/3] net: hns3: add support for TX push mode
  2021-06-22 11:11 [PATCH net-next 0/3] net: hns3: add support for TX push Guangbin Huang
  2021-06-22 11:11 ` [PATCH net-next 1/3] arm64: barrier: add DGH macros to control memory accesses merging Guangbin Huang
@ 2021-06-22 11:11 ` Guangbin Huang
  2021-06-22 12:16   ` Will Deacon
  2021-06-22 11:11 ` [PATCH net-next 3/3] net: hns3: add ethtool priv-flag for TX push Guangbin Huang
  2 siblings, 1 reply; 12+ messages in thread
From: Guangbin Huang @ 2021-06-22 11:11 UTC (permalink / raw)
  To: davem, kuba, catalin.marinas, will, maz, mark.rutland, dbrazdil, qperret
  Cc: netdev, linux-kernel, linux-arm-kernel, lipeng321, huangguangbin2

From: Huazhong Tan <tanhuazhong@huawei.com>

For devices that support the TX push capability, BDs (buffer
descriptors) can be copied directly to device memory. However, due to
hardware restrictions, push mode can be used only when there are no
more than two BDs; otherwise, doorbell mode based on device memory is
used.
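
As an illustration only, the choice between the two modes can be sketched in plain C as below. The struct and function names (`tx_ring_sketch`, `can_push`, `first_bd_index`) are hypothetical stand-ins and not the driver's own types. The conditions mirror the check this patch adds to hns3_tx_doorbell(), and the index computation mirrors the wraparound used in hns3_tx_push_bd().

```c
#include <stdbool.h>

/* Mirrors HNS3_MAX_PUSH_BD_NUM from the patch. */
#define MAX_PUSH_BD_NUM 2

/* Hypothetical, simplified view of the TX ring state. */
struct tx_ring_sketch {
	int next_to_use;	/* index one past the last filled BD */
	int desc_num;		/* number of BDs in the ring */
	int pending_buf;	/* BDs queued but not yet signalled */
};

/*
 * Push mode is eligible only when the feature is enabled, the caller
 * wants to notify the hardware now, nothing is already pending, and
 * the batch holds between 1 and MAX_PUSH_BD_NUM BDs.
 */
static bool can_push(const struct tx_ring_sketch *ring, bool push_enabled,
		     int num, bool doorbell)
{
	return push_enabled && doorbell && num > 0 &&
	       ring->pending_buf == 0 && num <= MAX_PUSH_BD_NUM;
}

/* Index of the first of the last `num` BDs, allowing for wraparound. */
static int first_bd_index(const struct tx_ring_sketch *ring, int num)
{
	return (ring->next_to_use - num + ring->desc_num) % ring->desc_num;
}
```

With an 8-entry ring whose next_to_use is 1, a two-BD batch starts at index 7, and a three-BD batch is not eligible for push and falls back to the doorbell path.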

Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: Yufeng Mo <moyufeng@huawei.com>
---
 drivers/net/ethernet/hisilicon/hns3/hnae3.h        |  1 +
 drivers/net/ethernet/hisilicon/hns3/hns3_enet.c    | 83 ++++++++++++++++++++--
 drivers/net/ethernet/hisilicon/hns3/hns3_enet.h    |  6 ++
 drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c |  2 +
 .../net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.c |  2 +
 .../ethernet/hisilicon/hns3/hns3pf/hclge_main.c    | 11 ++-
 .../ethernet/hisilicon/hns3/hns3pf/hclge_main.h    |  8 +++
 .../ethernet/hisilicon/hns3/hns3vf/hclgevf_cmd.c   |  2 +
 .../ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c  | 11 ++-
 .../ethernet/hisilicon/hns3/hns3vf/hclgevf_main.h  |  8 +++
 10 files changed, 126 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hns3/hnae3.h b/drivers/net/ethernet/hisilicon/hns3/hnae3.h
index 0b202f4def83..3979d5d2e842 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hnae3.h
+++ b/drivers/net/ethernet/hisilicon/hns3/hnae3.h
@@ -163,6 +163,7 @@ struct hnae3_handle;
 
 struct hnae3_queue {
 	void __iomem *io_base;
+	void __iomem *mem_base;
 	struct hnae3_ae_algo *ae_algo;
 	struct hnae3_handle *handle;
 	int tqp_index;		/* index in a handle */
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
index cdb5f14fb6bc..8649bd8e1b57 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
@@ -2002,9 +2002,77 @@ static int hns3_fill_skb_to_desc(struct hns3_enet_ring *ring,
 	return bd_num;
 }
 
+static void hns3_tx_push_bd(struct hns3_enet_ring *ring, int num)
+{
+#define HNS3_BYTES_PER_64BIT		8
+
+	struct hns3_desc desc[HNS3_MAX_PUSH_BD_NUM] = {};
+	int offset = 0;
+
+	/* make sure everything is visible to device before
+	 * executing tx push or updating doorbell
+	 */
+	dma_wmb();
+
+	do {
+		int idx = (ring->next_to_use - num + ring->desc_num) %
+			  ring->desc_num;
+
+		u64_stats_update_begin(&ring->syncp);
+		ring->stats.tx_push++;
+		u64_stats_update_end(&ring->syncp);
+		memcpy(&desc[offset], &ring->desc[idx],
+		       sizeof(struct hns3_desc));
+		offset++;
+	} while (--num);
+
+	__iowrite64_copy(ring->tqp->mem_base, desc,
+			 (sizeof(struct hns3_desc) * HNS3_MAX_PUSH_BD_NUM) /
+			 HNS3_BYTES_PER_64BIT);
+
+#if defined(CONFIG_ARM64)
+	dgh();
+#endif
+}
+
+static void hns3_tx_mem_doorbell(struct hns3_enet_ring *ring)
+{
+#define HNS3_MEM_DOORBELL_OFFSET	64
+
+	__le64 bd_num = cpu_to_le64((u64)ring->pending_buf);
+
+	/* make sure everything is visible to device before
+	 * executing tx push or updating doorbell
+	 */
+	dma_wmb();
+
+	__iowrite64_copy(ring->tqp->mem_base + HNS3_MEM_DOORBELL_OFFSET,
+			 &bd_num, 1);
+	u64_stats_update_begin(&ring->syncp);
+	ring->stats.tx_mem_doorbell += ring->pending_buf;
+	u64_stats_update_end(&ring->syncp);
+
+#if defined(CONFIG_ARM64)
+	dgh();
+#endif
+}
+
 static void hns3_tx_doorbell(struct hns3_enet_ring *ring, int num,
 			     bool doorbell)
 {
+	struct net_device *netdev = ring_to_netdev(ring);
+	struct hns3_nic_priv *priv = netdev_priv(netdev);
+
+	/* when tx push is enabled, a packet that needs no more than
+	 * HNS3_MAX_PUSH_BD_NUM BDs can be pushed directly.
+	 */
+	if (test_bit(HNS3_NIC_STATE_TX_PUSH_ENABLE, &priv->state) && num &&
+	    !ring->pending_buf && num <= HNS3_MAX_PUSH_BD_NUM && doorbell) {
+		hns3_tx_push_bd(ring, num);
+		WRITE_ONCE(ring->last_to_use, ring->next_to_use);
+		return;
+	}
+
 	ring->pending_buf += num;
 
 	if (!doorbell) {
@@ -2014,11 +2082,12 @@ static void hns3_tx_doorbell(struct hns3_enet_ring *ring, int num,
 		return;
 	}
 
-	if (!ring->pending_buf)
-		return;
+	if (ring->tqp->mem_base)
+		hns3_tx_mem_doorbell(ring);
+	else
+		writel(ring->pending_buf,
+		       ring->tqp->io_base + HNS3_RING_TX_RING_TAIL_REG);
 
-	writel(ring->pending_buf,
-	       ring->tqp->io_base + HNS3_RING_TX_RING_TAIL_REG);
 	ring->pending_buf = 0;
 	WRITE_ONCE(ring->last_to_use, ring->next_to_use);
 }
@@ -2713,6 +2782,9 @@ static bool hns3_get_tx_timeo_queue_info(struct net_device *ndev)
 		    tx_ring->stats.seg_pkt_cnt, tx_ring->stats.tx_more,
 		    tx_ring->stats.restart_queue, tx_ring->stats.tx_busy);
 
+	netdev_info(ndev, "tx_push: %llu, tx_mem_doorbell: %llu\n",
+		    tx_ring->stats.tx_push, tx_ring->stats.tx_mem_doorbell);
+
 	/* When mac received many pause frames continuous, it's unable to send
 	 * packets, which may cause tx timeout
 	 */
@@ -5060,6 +5132,9 @@ static int hns3_client_init(struct hnae3_handle *handle)
 	if (hnae3_ae_dev_rxd_adv_layout_supported(ae_dev))
 		set_bit(HNS3_NIC_STATE_RXD_ADV_LAYOUT_ENABLE, &priv->state);
 
+	if (test_bit(HNAE3_DEV_SUPPORT_TX_PUSH_B, ae_dev->caps))
+		set_bit(HNS3_NIC_STATE_TX_PUSH_ENABLE, &priv->state);
+
 	set_bit(HNS3_NIC_STATE_INITED, &priv->state);
 
 	if (ae_dev->dev_version >= HNAE3_DEVICE_VERSION_V3)
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.h b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.h
index 15af3d93857b..277c4e1bdfa1 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.h
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.h
@@ -6,6 +6,7 @@
 
 #include <linux/dim.h>
 #include <linux/if_vlan.h>
+#include <asm/barrier.h>
 
 #include "hnae3.h"
 
@@ -21,9 +22,12 @@ enum hns3_nic_state {
 	HNS3_NIC_STATE2_RESET_REQUESTED,
 	HNS3_NIC_STATE_HW_TX_CSUM_ENABLE,
 	HNS3_NIC_STATE_RXD_ADV_LAYOUT_ENABLE,
+	HNS3_NIC_STATE_TX_PUSH_ENABLE,
 	HNS3_NIC_STATE_MAX
 };
 
+#define HNS3_MAX_PUSH_BD_NUM		2
+
 #define HNS3_RING_RX_RING_BASEADDR_L_REG	0x00000
 #define HNS3_RING_RX_RING_BASEADDR_H_REG	0x00004
 #define HNS3_RING_RX_RING_BD_NUM_REG		0x00008
@@ -399,6 +403,8 @@ struct ring_stats {
 			u64 tx_pkts;
 			u64 tx_bytes;
 			u64 tx_more;
+			u64 tx_push;
+			u64 tx_mem_doorbell;
 			u64 restart_queue;
 			u64 tx_busy;
 			u64 tx_copy;
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c b/drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c
index 82061ab6930f..155a58e11089 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c
@@ -37,6 +37,8 @@ static const struct hns3_stats hns3_txq_stats[] = {
 	HNS3_TQP_STAT("packets", tx_pkts),
 	HNS3_TQP_STAT("bytes", tx_bytes),
 	HNS3_TQP_STAT("more", tx_more),
+	HNS3_TQP_STAT("push", tx_push),
+	HNS3_TQP_STAT("mem_doorbell", tx_mem_doorbell),
 	HNS3_TQP_STAT("wake", restart_queue),
 	HNS3_TQP_STAT("busy", tx_busy),
 	HNS3_TQP_STAT("copy", tx_copy),
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.c
index 887297e37cf3..fe985fd65870 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.c
@@ -395,6 +395,8 @@ static void hclge_parse_capability(struct hclge_dev *hdev,
 		set_bit(HNAE3_DEV_SUPPORT_PORT_VLAN_BYPASS_B, ae_dev->caps);
 		set_bit(HNAE3_DEV_SUPPORT_VLAN_FLTR_MDF_B, ae_dev->caps);
 	}
+	if (hnae3_get_bit(caps, HCLGE_CAP_TX_PUSH_B))
+		set_bit(HNAE3_DEV_SUPPORT_TX_PUSH_B, ae_dev->caps);
 }
 
 static __le32 hclge_build_api_caps(void)
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
index f3e482ab3c71..369b588abf84 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
@@ -1642,6 +1642,7 @@ static int hclge_config_gro(struct hclge_dev *hdev, bool en)
 
 static int hclge_alloc_tqps(struct hclge_dev *hdev)
 {
+	struct hnae3_ae_dev *ae_dev = pci_get_drvdata(hdev->pdev);
 	struct hclge_tqp *tqp;
 	int i;
 
@@ -1675,6 +1676,14 @@ static int hclge_alloc_tqps(struct hclge_dev *hdev)
 					 (i - HCLGE_TQP_MAX_SIZE_DEV_V2) *
 					 HCLGE_TQP_REG_SIZE;
 
+		/* when device supports tx push and has device memory,
+		 * the queue can execute push mode or doorbell mode on
+		 * device memory.
+		 */
+		if (test_bit(HNAE3_DEV_SUPPORT_TX_PUSH_B, ae_dev->caps))
+			tqp->q.mem_base = hdev->hw.mem_base +
+					  HCLGE_TQP_MEM_OFFSET(hdev, i);
+
 		tqp++;
 	}
 
@@ -11249,8 +11258,6 @@ static void hclge_uninit_client_instance(struct hnae3_client *client,
 
 static int hclge_dev_mem_map(struct hclge_dev *hdev)
 {
-#define HCLGE_MEM_BAR		4
-
 	struct pci_dev *pdev = hdev->pdev;
 	struct hclge_hw *hw = &hdev->hw;
 
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.h b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.h
index 3d3352491dba..db54fdf3ad38 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.h
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.h
@@ -194,6 +194,14 @@ enum HLCGE_PORT_TYPE {
 #define HCLGE_VECTOR0_IMP_RD_POISON_B	5U
 #define HCLGE_VECTOR0_ALL_MSIX_ERR_B	6U
 
+#define HCLGE_TQP_MEM_SIZE		0x10000
+#define HCLGE_MEM_BAR			4
+/* in the bar4, the first half is for roce, and the second half is for nic */
+#define HCLGE_NIC_MEM_OFFSET(hdev)	\
+	(pci_resource_len((hdev)->pdev, HCLGE_MEM_BAR) >> 1)
+#define HCLGE_TQP_MEM_OFFSET(hdev, i)	\
+	(HCLGE_NIC_MEM_OFFSET(hdev) + HCLGE_TQP_MEM_SIZE * (i))
+
 #define HCLGE_MAC_DEFAULT_FRAME \
 	(ETH_HLEN + ETH_FCS_LEN + 2 * VLAN_HLEN + ETH_DATA_LEN)
 #define HCLGE_MAC_MIN_FRAME		64
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_cmd.c b/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_cmd.c
index bd19a2d89f6c..55c56c28bb81 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_cmd.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_cmd.c
@@ -361,6 +361,8 @@ static void hclgevf_parse_capability(struct hclgevf_dev *hdev,
 		set_bit(HNAE3_DEV_SUPPORT_UDP_TUNNEL_CSUM_B, ae_dev->caps);
 	if (hnae3_get_bit(caps, HCLGEVF_CAP_RXD_ADV_LAYOUT_B))
 		set_bit(HNAE3_DEV_SUPPORT_RXD_ADV_LAYOUT_B, ae_dev->caps);
+	if (hnae3_get_bit(caps, HCLGEVF_CAP_TX_PUSH_B))
+		set_bit(HNAE3_DEV_SUPPORT_TX_PUSH_B, ae_dev->caps);
 }
 
 static __le32 hclgevf_build_api_caps(void)
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c b/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c
index 52eaf82b7cd7..983894532b38 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c
@@ -396,6 +396,7 @@ static int hclgevf_get_pf_media_type(struct hclgevf_dev *hdev)
 
 static int hclgevf_alloc_tqps(struct hclgevf_dev *hdev)
 {
+	struct hnae3_ae_dev *ae_dev = pci_get_drvdata(hdev->pdev);
 	struct hclgevf_tqp *tqp;
 	int i;
 
@@ -429,6 +430,14 @@ static int hclgevf_alloc_tqps(struct hclgevf_dev *hdev)
 					 (i - HCLGEVF_TQP_MAX_SIZE_DEV_V2) *
 					 HCLGEVF_TQP_REG_SIZE;
 
+		/* when device supports tx push and has device memory,
+		 * the queue can execute push mode or doorbell mode on
+		 * device memory.
+		 */
+		if (test_bit(HNAE3_DEV_SUPPORT_TX_PUSH_B, ae_dev->caps))
+			tqp->q.mem_base = hdev->hw.mem_base +
+					  HCLGEVF_TQP_MEM_OFFSET(hdev, i);
+
 		tqp++;
 	}
 
@@ -3001,8 +3010,6 @@ static void hclgevf_uninit_client_instance(struct hnae3_client *client,
 
 static int hclgevf_dev_mem_map(struct hclgevf_dev *hdev)
 {
-#define HCLGEVF_MEM_BAR		4
-
 	struct pci_dev *pdev = hdev->pdev;
 	struct hclgevf_hw *hw = &hdev->hw;
 
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.h b/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.h
index d7d02848d674..cacb7c23ca1c 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.h
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.h
@@ -125,6 +125,14 @@
 #define HCLGEVF_RSS_INPUT_TUPLE_SCTP_NO_PORT	\
 	(HCLGEVF_D_IP_BIT | HCLGEVF_S_IP_BIT | HCLGEVF_V_TAG_BIT)
 
+#define HCLGEVF_TQP_MEM_SIZE		0x10000
+#define HCLGEVF_MEM_BAR			4
+/* in the bar4, the first half is for roce, and the second half is for nic */
+#define HCLGEVF_NIC_MEM_OFFSET(hdev)	\
+	(pci_resource_len((hdev)->pdev, HCLGEVF_MEM_BAR) >> 1)
+#define HCLGEVF_TQP_MEM_OFFSET(hdev, i)	\
+	(HCLGEVF_NIC_MEM_OFFSET(hdev) + HCLGEVF_TQP_MEM_SIZE * (i))
+
 #define HCLGEVF_MAC_MAX_FRAME		9728
 
 #define HCLGEVF_STATS_TIMER_INTERVAL	36U
-- 
2.8.1



* [PATCH net-next 3/3] net: hns3: add ethtool priv-flag for TX push
  2021-06-22 11:11 [PATCH net-next 0/3] net: hns3: add support for TX push Guangbin Huang
  2021-06-22 11:11 ` [PATCH net-next 1/3] arm64: barrier: add DGH macros to control memory accesses merging Guangbin Huang
  2021-06-22 11:11 ` [PATCH net-next 2/3] net: hns3: add support for TX push mode Guangbin Huang
@ 2021-06-22 11:11 ` Guangbin Huang
  2 siblings, 0 replies; 12+ messages in thread
From: Guangbin Huang @ 2021-06-22 11:11 UTC (permalink / raw)
  To: davem, kuba, catalin.marinas, will, maz, mark.rutland, dbrazdil, qperret
  Cc: netdev, linux-kernel, linux-arm-kernel, lipeng321, huangguangbin2

From: Huazhong Tan <tanhuazhong@huawei.com>

Add an ethtool private flag to enable or disable the TX push
feature.

Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: Yufeng Mo <moyufeng@huawei.com>
---
 drivers/net/ethernet/hisilicon/hns3/hnae3.h        |  1 +
 drivers/net/ethernet/hisilicon/hns3/hns3_enet.c    |  5 ++++-
 drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c | 19 ++++++++++++++++++-
 3 files changed, 23 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hns3/hnae3.h b/drivers/net/ethernet/hisilicon/hns3/hnae3.h
index 3979d5d2e842..bebb91f7d9a4 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hnae3.h
+++ b/drivers/net/ethernet/hisilicon/hns3/hnae3.h
@@ -805,6 +805,7 @@ struct hnae3_roce_private_info {
 
 enum hnae3_pflag {
 	HNAE3_PFLAG_LIMIT_PROMISC,
+	HNAE3_PFLAG_PUSH_ENABLE,
 	HNAE3_PFLAG_MAX
 };
 
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
index 8649bd8e1b57..8ea6ad783e55 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
@@ -5132,8 +5132,11 @@ static int hns3_client_init(struct hnae3_handle *handle)
 	if (hnae3_ae_dev_rxd_adv_layout_supported(ae_dev))
 		set_bit(HNS3_NIC_STATE_RXD_ADV_LAYOUT_ENABLE, &priv->state);
 
-	if (test_bit(HNAE3_DEV_SUPPORT_TX_PUSH_B, ae_dev->caps))
+	if (test_bit(HNAE3_DEV_SUPPORT_TX_PUSH_B, ae_dev->caps)) {
 		set_bit(HNS3_NIC_STATE_TX_PUSH_ENABLE, &priv->state);
+		handle->priv_flags |= BIT(HNAE3_PFLAG_PUSH_ENABLE);
+		set_bit(HNAE3_PFLAG_PUSH_ENABLE, &handle->supported_pflags);
+	}
 
 	set_bit(HNS3_NIC_STATE_INITED, &priv->state);
 
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c b/drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c
index 155a58e11089..0b2557d4441d 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c
@@ -423,8 +423,25 @@ static void hns3_update_limit_promisc_mode(struct net_device *netdev,
 	hns3_request_update_promisc_mode(handle);
 }
 
+static void hns3_update_state(struct net_device *netdev,
+			      enum hns3_nic_state state, bool enable)
+{
+	struct hns3_nic_priv *priv = netdev_priv(netdev);
+
+	if (enable)
+		set_bit(state, &priv->state);
+	else
+		clear_bit(state, &priv->state);
+}
+
+static void hns3_update_push_state(struct net_device *netdev, bool enable)
+{
+	hns3_update_state(netdev, HNS3_NIC_STATE_TX_PUSH_ENABLE, enable);
+}
+
 static const struct hns3_pflag_desc hns3_priv_flags[HNAE3_PFLAG_MAX] = {
-	{ "limit_promisc",	hns3_update_limit_promisc_mode }
+	{ "limit_promisc",	hns3_update_limit_promisc_mode },
+	{ "tx_push_enable",	hns3_update_push_state }
 };
 
 static int hns3_get_sset_count(struct net_device *netdev, int stringset)
-- 
2.8.1



* Re: [PATCH net-next 2/3] net: hns3: add support for TX push mode
  2021-06-22 11:11 ` [PATCH net-next 2/3] net: hns3: add support for TX push mode Guangbin Huang
@ 2021-06-22 12:16   ` Will Deacon
  2021-06-24 14:15     ` huangguangbin (A)
  0 siblings, 1 reply; 12+ messages in thread
From: Will Deacon @ 2021-06-22 12:16 UTC (permalink / raw)
  To: Guangbin Huang
  Cc: davem, kuba, catalin.marinas, maz, mark.rutland, dbrazdil,
	qperret, netdev, linux-kernel, linux-arm-kernel, lipeng321,
	peterz

On Tue, Jun 22, 2021 at 07:11:10PM +0800, Guangbin Huang wrote:
> From: Huazhong Tan <tanhuazhong@huawei.com>
> 
> For devices that support the TX push capability, BDs (buffer
> descriptors) can be copied directly to device memory. However, due to
> hardware restrictions, push mode can be used only when there are no
> more than two BDs; otherwise, doorbell mode based on device memory is
> used.
> 
> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
> Signed-off-by: Yufeng Mo <moyufeng@huawei.com>
> ---
>  drivers/net/ethernet/hisilicon/hns3/hnae3.h        |  1 +
>  drivers/net/ethernet/hisilicon/hns3/hns3_enet.c    | 83 ++++++++++++++++++++--
>  drivers/net/ethernet/hisilicon/hns3/hns3_enet.h    |  6 ++
>  drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c |  2 +
>  .../net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.c |  2 +
>  .../ethernet/hisilicon/hns3/hns3pf/hclge_main.c    | 11 ++-
>  .../ethernet/hisilicon/hns3/hns3pf/hclge_main.h    |  8 +++
>  .../ethernet/hisilicon/hns3/hns3vf/hclgevf_cmd.c   |  2 +
>  .../ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c  | 11 ++-
>  .../ethernet/hisilicon/hns3/hns3vf/hclgevf_main.h  |  8 +++
>  10 files changed, 126 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/net/ethernet/hisilicon/hns3/hnae3.h b/drivers/net/ethernet/hisilicon/hns3/hnae3.h
> index 0b202f4def83..3979d5d2e842 100644
> --- a/drivers/net/ethernet/hisilicon/hns3/hnae3.h
> +++ b/drivers/net/ethernet/hisilicon/hns3/hnae3.h
> @@ -163,6 +163,7 @@ struct hnae3_handle;
>  
>  struct hnae3_queue {
>  	void __iomem *io_base;
> +	void __iomem *mem_base;
>  	struct hnae3_ae_algo *ae_algo;
>  	struct hnae3_handle *handle;
>  	int tqp_index;		/* index in a handle */
> diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
> index cdb5f14fb6bc..8649bd8e1b57 100644
> --- a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
> +++ b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
> @@ -2002,9 +2002,77 @@ static int hns3_fill_skb_to_desc(struct hns3_enet_ring *ring,
>  	return bd_num;
>  }
>  
> +static void hns3_tx_push_bd(struct hns3_enet_ring *ring, int num)
> +{
> +#define HNS3_BYTES_PER_64BIT		8
> +
> +	struct hns3_desc desc[HNS3_MAX_PUSH_BD_NUM] = {};
> +	int offset = 0;
> +
> +	/* make sure everything is visible to device before
> +	 * executing tx push or updating doorbell
> +	 */
> +	dma_wmb();
> +
> +	do {
> +		int idx = (ring->next_to_use - num + ring->desc_num) %
> +			  ring->desc_num;
> +
> +		u64_stats_update_begin(&ring->syncp);
> +		ring->stats.tx_push++;
> +		u64_stats_update_end(&ring->syncp);
> +		memcpy(&desc[offset], &ring->desc[idx],
> +		       sizeof(struct hns3_desc));
> +		offset++;
> +	} while (--num);
> +
> +	__iowrite64_copy(ring->tqp->mem_base, desc,
> +			 (sizeof(struct hns3_desc) * HNS3_MAX_PUSH_BD_NUM) /
> +			 HNS3_BYTES_PER_64BIT);
> +
> +#if defined(CONFIG_ARM64)
> +	dgh();
> +#endif

It looks a bit weird putting this at the end of the function, given that
it's supposed to do something to a pair of accesses. Please can you explain
what it's doing, and also provide some numbers to show that it's worthwhile
(given that it's a performance hint not a correctness thing afaict).

> +}
> +
> +static void hns3_tx_mem_doorbell(struct hns3_enet_ring *ring)
> +{
> +#define HNS3_MEM_DOORBELL_OFFSET	64
> +
> +	__le64 bd_num = cpu_to_le64((u64)ring->pending_buf);
> +
> +	/* make sure everything is visible to device before
> +	 * executing tx push or updating doorbell
> +	 */
> +	dma_wmb();
> +
> +	__iowrite64_copy(ring->tqp->mem_base + HNS3_MEM_DOORBELL_OFFSET,
> +			 &bd_num, 1);
> +	u64_stats_update_begin(&ring->syncp);
> +	ring->stats.tx_mem_doorbell += ring->pending_buf;
> +	u64_stats_update_end(&ring->syncp);
> +
> +#if defined(CONFIG_ARM64)
> +	dgh();
> +#endif

Same here.

Thanks,

Will


* Re: [PATCH net-next 1/3] arm64: barrier: add DGH macros to control memory accesses merging
  2021-06-22 11:11 ` [PATCH net-next 1/3] arm64: barrier: add DGH macros to control memory accesses merging Guangbin Huang
@ 2021-06-22 12:16   ` Will Deacon
  2021-06-22 12:32     ` Mark Rutland
                       ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Will Deacon @ 2021-06-22 12:16 UTC (permalink / raw)
  To: Guangbin Huang
  Cc: davem, kuba, catalin.marinas, maz, mark.rutland, dbrazdil,
	qperret, netdev, linux-kernel, linux-arm-kernel, lipeng321,
	peterz

On Tue, Jun 22, 2021 at 07:11:09PM +0800, Guangbin Huang wrote:
> From: Xiongfeng Wang <wangxiongfeng2@huawei.com>
> 
> DGH prohibits merging memory accesses with Normal-NC or Device-GRE
> attributes before the hint instruction with any memory accesses
> appearing after the hint instruction. Provide macros to expose it to the
> arch code.

Hmm.

The architecture states:

  | DGH is a hint instruction. A DGH instruction is not expected to be
  | performance optimal to merge memory accesses with Normal Non-cacheable
  | or Device-GRE attributes appearing in program order before the hint
  | instruction with any memory accesses appearing after the hint instruction
  | into a single memory transaction on an interconnect.

which doesn't make a whole lot of sense to me, in all honesty.

> Signed-off-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
> Signed-off-by: Cheng Jian <cj.chengjian@huawei.com>
> Signed-off-by: Yufeng Mo <moyufeng@huawei.com>
> ---
>  arch/arm64/include/asm/assembler.h | 7 +++++++
>  arch/arm64/include/asm/barrier.h   | 1 +
>  2 files changed, 8 insertions(+)
> 
> diff --git a/arch/arm64/include/asm/assembler.h b/arch/arm64/include/asm/assembler.h
> index 8418c1bd8f04..d723899328bd 100644
> --- a/arch/arm64/include/asm/assembler.h
> +++ b/arch/arm64/include/asm/assembler.h
> @@ -90,6 +90,13 @@
>  	.endm
>  
>  /*
> + * Data gathering hint
> + */
> +	.macro	dgh
> +	hint	#6
> +	.endm
> +
> +/*
>   * RAS Error Synchronization barrier
>   */
>  	.macro  esb
> diff --git a/arch/arm64/include/asm/barrier.h b/arch/arm64/include/asm/barrier.h
> index 451e11e5fd23..02e1735706d2 100644
> --- a/arch/arm64/include/asm/barrier.h
> +++ b/arch/arm64/include/asm/barrier.h
> @@ -22,6 +22,7 @@
>  #define dmb(opt)	asm volatile("dmb " #opt : : : "memory")
>  #define dsb(opt)	asm volatile("dsb " #opt : : : "memory")
>  
> +#define dgh()		asm volatile("hint #6" : : : "memory")

Although I'm fine with this in arm64, I don't think this is the interface
which drivers should be using. Instead, once we know what this instruction
is supposed to do, we should look at exposing it as part of the I/O barriers
and providing a NOP implementation for other architectures. That way,
drivers can use it without having to have the #ifdef CONFIG_ARM64 stuff that
you have in the later patches here.
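
A minimal sketch of what such an interface might look like, under stated assumptions: the name `io_gather_hint()` is hypothetical and not an existing kernel macro, and the fallback shown for other architectures is an empty asm statement acting only as a compiler barrier, which is one possible reading of "a NOP implementation".

```c
/*
 * Hypothetical wrapper, not an existing kernel interface.
 * On AArch64 it emits DGH, encoded as "hint #6" as in the patch above.
 * Elsewhere it expands to an empty asm statement that emits no
 * instruction and only keeps the compiler from reordering memory
 * accesses across it.
 */
#if defined(__aarch64__)
#define io_gather_hint()	__asm__ volatile("hint #6" : : : "memory")
#else
#define io_gather_hint()	__asm__ volatile("" : : : "memory")
#endif
```

A driver would then call io_gather_hint() unconditionally after its device writes, with no CONFIG_ARM64 guard at the call site.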

Will


* Re: [PATCH net-next 1/3] arm64: barrier: add DGH macros to control memory accesses merging
  2021-06-22 12:16   ` Will Deacon
@ 2021-06-22 12:32     ` Mark Rutland
  2021-06-24 14:18       ` huangguangbin (A)
  2021-06-24 13:38     ` huangguangbin (A)
  2021-06-29 11:11     ` Xiongfeng Wang
  2 siblings, 1 reply; 12+ messages in thread
From: Mark Rutland @ 2021-06-22 12:32 UTC (permalink / raw)
  To: Will Deacon
  Cc: Guangbin Huang, davem, kuba, catalin.marinas, maz, dbrazdil,
	qperret, netdev, linux-kernel, linux-arm-kernel, lipeng321,
	peterz

On Tue, Jun 22, 2021 at 01:16:31PM +0100, Will Deacon wrote:
> On Tue, Jun 22, 2021 at 07:11:09PM +0800, Guangbin Huang wrote:
> > From: Xiongfeng Wang <wangxiongfeng2@huawei.com>
> > 
> > DGH prohibits merging memory accesses with Normal-NC or Device-GRE
> > attributes before the hint instruction with any memory accesses
> > appearing after the hint instruction. Provide macros to expose it to the
> > arch code.
> 
> Hmm.
> 
> The architecture states:
> 
>   | DGH is a hint instruction. A DGH instruction is not expected to be
>   | performance optimal to merge memory accesses with Normal Non-cacheable
>   | or Device-GRE attributes appearing in program order before the hint
>   | instruction with any memory accesses appearing after the hint instruction
>   | into a single memory transaction on an interconnect.
> 
> which doesn't make a whole lot of sense to me, in all honesty.

I think there are some missing words, and this was supposed to say
something like:

| DGH is a hint instruction. A DGH instruction *indicates that it* is
| not expected to be performance optimal to merge memory accesses with
| Normal Non-cacheable or Device-GRE attributes appearing in program
| order before the hint instruction with any memory accesses appearing
| after the hint instruction into a single memory transaction on an
| interconnect.

... i.e. it's a hint to the CPU to avoid merging accesses which are
either side of the DGH, so that the prior accesses don't get
indefinitely delayed waiting to be merged.

I'll try to get the documentation fixed, since as-is the wording does
not make sense.
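
Under that reading, the usage pattern would look roughly like the sketch below: a userspace stand-in for the __iowrite64_copy() followed by dgh() sequence in patch 2/3. The names `gather_hint` and `push_words` are assumptions made for illustration, and the destination here is ordinary memory, not a device mapping.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed helper: DGH on AArch64, compiler barrier only elsewhere. */
static inline void gather_hint(void)
{
#if defined(__aarch64__)
	__asm__ volatile("hint #6" : : : "memory");
#else
	__asm__ volatile("" : : : "memory");
#endif
}

/*
 * Copy `count` 64-bit words to the destination, then hint that these
 * stores should not be held back to merge with any later accesses.
 */
static void push_words(volatile uint64_t *dst, const uint64_t *src,
		       size_t count)
{
	for (size_t i = 0; i < count; i++)
		dst[i] = src[i];

	gather_hint();
}
```

The hint sits after the last store of the group because it separates accesses before it from accesses after it; it does not order or complete the stores themselves.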

Thanks,
Mark.


* Re: [PATCH net-next 1/3] arm64: barrier: add DGH macros to control memory accesses merging
  2021-06-22 12:16   ` Will Deacon
  2021-06-22 12:32     ` Mark Rutland
@ 2021-06-24 13:38     ` huangguangbin (A)
  2021-06-29 11:11     ` Xiongfeng Wang
  2 siblings, 0 replies; 12+ messages in thread
From: huangguangbin (A) @ 2021-06-24 13:38 UTC (permalink / raw)
  To: Will Deacon
  Cc: davem, kuba, catalin.marinas, maz, mark.rutland, dbrazdil,
	qperret, netdev, linux-kernel, linux-arm-kernel, lipeng321,
	peterz



On 2021/6/22 20:16, Will Deacon wrote:
> On Tue, Jun 22, 2021 at 07:11:09PM +0800, Guangbin Huang wrote:
>> From: Xiongfeng Wang <wangxiongfeng2@huawei.com>
>>
>> DGH prohibits merging memory accesses with Normal-NC or Device-GRE
>> attributes before the hint instruction with any memory accesses
>> appearing after the hint instruction. Provide macros to expose it to the
>> arch code.
> 
> Hmm.
> 
> The architecture states:
> 
>    | DGH is a hint instruction. A DGH instruction is not expected to be
>    | performance optimal to merge memory accesses with Normal Non-cacheable
>    | or Device-GRE attributes appearing in program order before the hint
>    | instruction with any memory accesses appearing after the hint instruction
>    | into a single memory transaction on an interconnect.
> 
> which doesn't make a whole lot of sense to me, in all honesty.
> 
Thanks for your review and modification of commit log.

>> Signed-off-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
>> Signed-off-by: Cheng Jian <cj.chengjian@huawei.com>
>> Signed-off-by: Yufeng Mo <moyufeng@huawei.com>
>> ---
>>   arch/arm64/include/asm/assembler.h | 7 +++++++
>>   arch/arm64/include/asm/barrier.h   | 1 +
>>   2 files changed, 8 insertions(+)
>>
>> diff --git a/arch/arm64/include/asm/assembler.h b/arch/arm64/include/asm/assembler.h
>> index 8418c1bd8f04..d723899328bd 100644
>> --- a/arch/arm64/include/asm/assembler.h
>> +++ b/arch/arm64/include/asm/assembler.h
>> @@ -90,6 +90,13 @@
>>   	.endm
>>   
>>   /*
>> + * Data gathering hint
>> + */
>> +	.macro	dgh
>> +	hint	#6
>> +	.endm
>> +
>> +/*
>>    * RAS Error Synchronization barrier
>>    */
>>   	.macro  esb
>> diff --git a/arch/arm64/include/asm/barrier.h b/arch/arm64/include/asm/barrier.h
>> index 451e11e5fd23..02e1735706d2 100644
>> --- a/arch/arm64/include/asm/barrier.h
>> +++ b/arch/arm64/include/asm/barrier.h
>> @@ -22,6 +22,7 @@
>>   #define dmb(opt)	asm volatile("dmb " #opt : : : "memory")
>>   #define dsb(opt)	asm volatile("dsb " #opt : : : "memory")
>>   
>> +#define dgh()		asm volatile("hint #6" : : : "memory")
> 
> Although I'm fine with this in arm64, I don't think this is the interface
> which drivers should be using. Instead, once we know what this instruction
> is supposed to do, we should look at exposing it as part of the I/O barriers
> and providing a NOP implementation for other architectures. That way,
> drivers can use it without having to have the #ifdef CONFIG_ARM64 stuff that
> you have in the later patches here.
> 
> Will
> .
> 
Ok, thanks. We will try to implement a new I/O barrier interface as you suggested
and will repost a new version once it has been tested.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH net-next 2/3] net: hns3: add support for TX push mode
  2021-06-22 12:16   ` Will Deacon
@ 2021-06-24 14:15     ` huangguangbin (A)
  0 siblings, 0 replies; 12+ messages in thread
From: huangguangbin (A) @ 2021-06-24 14:15 UTC (permalink / raw)
  To: Will Deacon
  Cc: davem, kuba, catalin.marinas, maz, mark.rutland, dbrazdil,
	qperret, netdev, linux-kernel, linux-arm-kernel, lipeng321,
	peterz



On 2021/6/22 20:16, Will Deacon wrote:
> On Tue, Jun 22, 2021 at 07:11:10PM +0800, Guangbin Huang wrote:
>> From: Huazhong Tan <tanhuazhong@huawei.com>
>>
>> For the device that supports the TX push capability, the BD can
>> be directly copied to the device memory. However, due to hardware
>> restrictions, the push mode can be used only when there are no
>> more than two BDs, otherwise, the doorbell mode based on device
>> memory is used.
>>
>> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
>> Signed-off-by: Yufeng Mo <moyufeng@huawei.com>
>> ---
>>   drivers/net/ethernet/hisilicon/hns3/hnae3.h        |  1 +
>>   drivers/net/ethernet/hisilicon/hns3/hns3_enet.c    | 83 ++++++++++++++++++++--
>>   drivers/net/ethernet/hisilicon/hns3/hns3_enet.h    |  6 ++
>>   drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c |  2 +
>>   .../net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.c |  2 +
>>   .../ethernet/hisilicon/hns3/hns3pf/hclge_main.c    | 11 ++-
>>   .../ethernet/hisilicon/hns3/hns3pf/hclge_main.h    |  8 +++
>>   .../ethernet/hisilicon/hns3/hns3vf/hclgevf_cmd.c   |  2 +
>>   .../ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c  | 11 ++-
>>   .../ethernet/hisilicon/hns3/hns3vf/hclgevf_main.h  |  8 +++
>>   10 files changed, 126 insertions(+), 8 deletions(-)
>>
>> diff --git a/drivers/net/ethernet/hisilicon/hns3/hnae3.h b/drivers/net/ethernet/hisilicon/hns3/hnae3.h
>> index 0b202f4def83..3979d5d2e842 100644
>> --- a/drivers/net/ethernet/hisilicon/hns3/hnae3.h
>> +++ b/drivers/net/ethernet/hisilicon/hns3/hnae3.h
>> @@ -163,6 +163,7 @@ struct hnae3_handle;
>>   
>>   struct hnae3_queue {
>>   	void __iomem *io_base;
>> +	void __iomem *mem_base;
>>   	struct hnae3_ae_algo *ae_algo;
>>   	struct hnae3_handle *handle;
>>   	int tqp_index;		/* index in a handle */
>> diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
>> index cdb5f14fb6bc..8649bd8e1b57 100644
>> --- a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
>> +++ b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
>> @@ -2002,9 +2002,77 @@ static int hns3_fill_skb_to_desc(struct hns3_enet_ring *ring,
>>   	return bd_num;
>>   }
>>   
>> +static void hns3_tx_push_bd(struct hns3_enet_ring *ring, int num)
>> +{
>> +#define HNS3_BYTES_PER_64BIT		8
>> +
>> +	struct hns3_desc desc[HNS3_MAX_PUSH_BD_NUM] = {};
>> +	int offset = 0;
>> +
>> +	/* make sure everything is visible to device before
>> +	 * executing tx push or updating doorbell
>> +	 */
>> +	dma_wmb();
>> +
>> +	do {
>> +		int idx = (ring->next_to_use - num + ring->desc_num) %
>> +			  ring->desc_num;
>> +
>> +		u64_stats_update_begin(&ring->syncp);
>> +		ring->stats.tx_push++;
>> +		u64_stats_update_end(&ring->syncp);
>> +		memcpy(&desc[offset], &ring->desc[idx],
>> +		       sizeof(struct hns3_desc));
>> +		offset++;
>> +	} while (--num);
>> +
>> +	__iowrite64_copy(ring->tqp->mem_base, desc,
>> +			 (sizeof(struct hns3_desc) * HNS3_MAX_PUSH_BD_NUM) /
>> +			 HNS3_BYTES_PER_64BIT);
>> +
>> +#if defined(CONFIG_ARM64)
>> +	dgh();
>> +#endif
> 
> It looks a bit weird putting this at the end of the function, given that
> it's supposed to do something to a pair of accesses. Please can you explain
> what it's doing, and also provide some numbers to show that it's worthwhile
> (given that it's a performance hint not a correctness thing afaict).
> 
When the driver writes to device space that is mapped write-combining, the
CPU uses a merge window to combine the writes into cacheline-sized units
and delivers each cacheline to the device. However, even when the cacheline
is full, it is delivered to the device only after the merge window
ends (a delay of about 10ns at a 3GHz clock frequency). To reduce this delay,
the write-combining buffer needs to be flushed explicitly, which is why DGH
is invoked here.
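
To make the sequence above concrete, here is a minimal user-space sketch of
"copy the descriptors as 64-bit words, then issue the hint". It is not the
driver code: every demo_* name and the 32-byte descriptor layout are made up
for illustration, a plain array stands in for the write-combined device
region, and the inline assembly assumes GCC or Clang. On arm64 it emits
"hint #6", the encoding the patch uses for DGH; elsewhere it falls back to a
compiler barrier.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a buffer descriptor; 32 bytes in this sketch. */
struct demo_desc {
	uint64_t addr;
	uint64_t words[3];
};

#define DEMO_MAX_PUSH_BD	2
#define DEMO_BYTES_PER_64BIT	8

/* Hint that earlier stores should not wait to be merged with later ones. */
static inline void demo_wc_hint(void)
{
#if defined(__aarch64__)
	/* DGH lives in the hint space, so cores without it treat this as a NOP. */
	__asm__ volatile("hint #6" : : : "memory");
#else
	/* No equivalent assumed here: compiler barrier only. */
	__asm__ volatile("" : : : "memory");
#endif
}

/* Copy count 64-bit words to dst, one store per word. */
static void demo_copy64(volatile uint64_t *dst, const uint64_t *src,
			size_t count)
{
	while (count--)
		*dst++ = *src++;
}

/*
 * Stage num descriptors (1..DEMO_MAX_PUSH_BD), zero-padding the rest,
 * write the whole fixed-size span, then issue the hint.
 * Returns the number of 64-bit words written.
 */
static size_t demo_tx_push(volatile uint64_t *wc_region,
			   const struct demo_desc *desc, size_t num)
{
	struct demo_desc staged[DEMO_MAX_PUSH_BD] = { 0 };
	size_t words = sizeof(staged) / DEMO_BYTES_PER_64BIT;
	size_t i;

	assert(num >= 1 && num <= DEMO_MAX_PUSH_BD);
	for (i = 0; i < num; i++)
		staged[i] = desc[i];

	demo_copy64(wc_region, (const uint64_t *)staged, words);
	demo_wc_hint();
	return words;
}
```

Even with a single descriptor staged, demo_tx_push() writes the full
two-descriptor span (8 words in this sketch), mirroring the fixed-size copy
in the patch.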

>> +}
>> +
>> +static void hns3_tx_mem_doorbell(struct hns3_enet_ring *ring)
>> +{
>> +#define HNS3_MEM_DOORBELL_OFFSET	64
>> +
>> +	__le64 bd_num = cpu_to_le64((u64)ring->pending_buf);
>> +
>> +	/* make sure everything is visible to device before
>> +	 * executing tx push or updating doorbell
>> +	 */
>> +	dma_wmb();
>> +
>> +	__iowrite64_copy(ring->tqp->mem_base + HNS3_MEM_DOORBELL_OFFSET,
>> +			 &bd_num, 1);
>> +	u64_stats_update_begin(&ring->syncp);
>> +	ring->stats.tx_mem_doorbell += ring->pending_buf;
>> +	u64_stats_update_end(&ring->syncp);
>> +
>> +#if defined(CONFIG_ARM64)
>> +	dgh();
>> +#endif
> 
> Same here.
> 
> Thanks,
> 
> Will
> .
> 
Thanks,

Guangbin

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH net-next 1/3] arm64: barrier: add DGH macros to control memory accesses merging
  2021-06-22 12:32     ` Mark Rutland
@ 2021-06-24 14:18       ` huangguangbin (A)
  0 siblings, 0 replies; 12+ messages in thread
From: huangguangbin (A) @ 2021-06-24 14:18 UTC (permalink / raw)
  To: Mark Rutland, Will Deacon
  Cc: davem, kuba, catalin.marinas, maz, dbrazdil, qperret, netdev,
	linux-kernel, linux-arm-kernel, lipeng321, peterz



On 2021/6/22 20:32, Mark Rutland wrote:
> On Tue, Jun 22, 2021 at 01:16:31PM +0100, Will Deacon wrote:
>> On Tue, Jun 22, 2021 at 07:11:09PM +0800, Guangbin Huang wrote:
>>> From: Xiongfeng Wang <wangxiongfeng2@huawei.com>
>>>
>>> DGH prohibits merging memory accesses with Normal-NC or Device-GRE
>>> attributes before the hint instruction with any memory accesses
>>> appearing after the hint instruction. Provide macros to expose it to the
>>> arch code.
>>
>> Hmm.
>>
>> The architecture states:
>>
>>    | DGH is a hint instruction. A DGH instruction is not expected to be
>>    | performance optimal to merge memory accesses with Normal Non-cacheable
>>    | or Device-GRE attributes appearing in program order before the hint
>>    | instruction with any memory accesses appearing after the hint instruction
>>    | into a single memory transaction on an interconnect.
>>
>> which doesn't make a whole lot of sense to me, in all honesty.
> 
> I think there are some missing words, and this was supposed to say
> something like:
> 
> | DGH is a hint instruction. A DGH instruction *indicates that it* is
> | not expected to be performance optimal to merge memory accesses with
> | Normal Non-cacheable or Device-GRE attributes appearing in program
> | order before the hint instruction with any memory accesses appearing
> | after the hint instruction into a single memory transaction on an
> | interconnect.
> 
> ... i.e. it's a hint to the CPU to avoid merging accesses which are
> either side of the DGH, so that the prior accesses don't get
> indefinitely delayed waiting to be merged.
> 
> I'll try to get the documentation fixed, since as-is the wording does
> not make sense.
> 
> Thanks,
> Mark.
> .
> 
Thanks very much, we will fix the documentation.

Thanks,
Guangbin


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH net-next 1/3] arm64: barrier: add DGH macros to control memory accesses merging
  2021-06-22 12:16   ` Will Deacon
  2021-06-22 12:32     ` Mark Rutland
  2021-06-24 13:38     ` huangguangbin (A)
@ 2021-06-29 11:11     ` Xiongfeng Wang
  2021-07-13  7:27       ` Xiongfeng Wang
  2 siblings, 1 reply; 12+ messages in thread
From: Xiongfeng Wang @ 2021-06-29 11:11 UTC (permalink / raw)
  To: Will Deacon, Guangbin Huang
  Cc: davem, kuba, catalin.marinas, maz, mark.rutland, dbrazdil,
	qperret, netdev, linux-kernel, linux-arm-kernel, lipeng321,
	peterz

Hi Will,

On 2021/6/22 20:16, Will Deacon wrote:
> On Tue, Jun 22, 2021 at 07:11:09PM +0800, Guangbin Huang wrote:
>> From: Xiongfeng Wang <wangxiongfeng2@huawei.com>
>>
>> DGH prohibits merging memory accesses with Normal-NC or Device-GRE
>> attributes before the hint instruction with any memory accesses
>> appearing after the hint instruction. Provide macros to expose it to the
>> arch code.
> 
> Hmm.
> 
> The architecture states:
> 
>   | DGH is a hint instruction. A DGH instruction is not expected to be
>   | performance optimal to merge memory accesses with Normal Non-cacheable
>   | or Device-GRE attributes appearing in program order before the hint
>   | instruction with any memory accesses appearing after the hint instruction
>   | into a single memory transaction on an interconnect.
> 
> which doesn't make a whole lot of sense to me, in all honesty.
> 
>> Signed-off-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
>> Signed-off-by: Cheng Jian <cj.chengjian@huawei.com>
>> Signed-off-by: Yufeng Mo <moyufeng@huawei.com>
>> ---
>>  arch/arm64/include/asm/assembler.h | 7 +++++++
>>  arch/arm64/include/asm/barrier.h   | 1 +
>>  2 files changed, 8 insertions(+)
>>
>> diff --git a/arch/arm64/include/asm/assembler.h b/arch/arm64/include/asm/assembler.h
>> index 8418c1bd8f04..d723899328bd 100644
>> --- a/arch/arm64/include/asm/assembler.h
>> +++ b/arch/arm64/include/asm/assembler.h
>> @@ -90,6 +90,13 @@
>>  	.endm
>>  
>>  /*
>> + * Data gathering hint
>> + */
>> +	.macro	dgh
>> +	hint	#6
>> +	.endm
>> +
>> +/*
>>   * RAS Error Synchronization barrier
>>   */
>>  	.macro  esb
>> diff --git a/arch/arm64/include/asm/barrier.h b/arch/arm64/include/asm/barrier.h
>> index 451e11e5fd23..02e1735706d2 100644
>> --- a/arch/arm64/include/asm/barrier.h
>> +++ b/arch/arm64/include/asm/barrier.h
>> @@ -22,6 +22,7 @@
>>  #define dmb(opt)	asm volatile("dmb " #opt : : : "memory")
>>  #define dsb(opt)	asm volatile("dsb " #opt : : : "memory")
>>  
>> +#define dgh()		asm volatile("hint #6" : : : "memory")
> 
> Although I'm fine with this in arm64, I don't think this is the interface
> which drivers should be using. Instead, once we know what this instruction
> is supposed to do, we should look at exposing it as part of the I/O barriers
> and providing a NOP implementation for other architectures. That way,
> drivers can use it without having to have the #ifdef CONFIG_ARM64 stuff that
> you have in the later patches here.

How about adding an interface called flush_wc_writeX(), which could be used to
flush the write-combining buffers to the device immediately?
I found that this was discussed at the link below, but it turned out to be
unnecessary in their situation.
https://patchwork.ozlabs.org/project/netdev/patch/20200102180830.66676-3-liran.alon@oracle.com/
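
As a rough sketch of the shape such an interface could take, assuming a single
flush_wc_write() helper with a NOP fallback on other architectures as Will
suggested: the helper name, the demo_ring_doorbell() caller and the
FLUSH_WC_WRITE_IS_NOP macro are hypothetical, the code is user-space C for GCC
or Clang rather than kernel code, and since DGH is only a hint the arm64
variant does not guarantee an immediate flush.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical flush_wc_write(): ask the CPU not to hold earlier
 * write-combined stores back for merging with later ones.
 */
#if defined(__aarch64__)
#define FLUSH_WC_WRITE_IS_NOP	0
static inline void flush_wc_write(void)
{
	/* "hint #6" is the DGH encoding used by the patch. */
	__asm__ volatile("hint #6" : : : "memory");
}
#else
#define FLUSH_WC_WRITE_IS_NOP	1
static inline void flush_wc_write(void)
{
	/* Fallback for other architectures: compiler barrier only. */
	__asm__ volatile("" : : : "memory");
}
#endif

/* Example caller: no architecture #ifdef is needed at the call site. */
static void demo_ring_doorbell(volatile uint64_t *doorbell, uint64_t pending)
{
	assert(doorbell != NULL);
	*doorbell = pending;
	flush_wc_write();
}
```

A driver written against a helper like this would call it after its
write-combined stores and leave the per-architecture decision to the header.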

Thanks,
Xiongfeng

> 
> Will
> 
> 

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH net-next 1/3] arm64: barrier: add DGH macros to control memory accesses merging
  2021-06-29 11:11     ` Xiongfeng Wang
@ 2021-07-13  7:27       ` Xiongfeng Wang
  0 siblings, 0 replies; 12+ messages in thread
From: Xiongfeng Wang @ 2021-07-13  7:27 UTC (permalink / raw)
  To: Will Deacon, Guangbin Huang
  Cc: davem, kuba, catalin.marinas, maz, mark.rutland, dbrazdil,
	qperret, netdev, linux-kernel, linux-arm-kernel, lipeng321,
	peterz

Hi,

On 2021/6/29 19:11, Xiongfeng Wang wrote:
> Hi Will,
> 
> On 2021/6/22 20:16, Will Deacon wrote:
>> On Tue, Jun 22, 2021 at 07:11:09PM +0800, Guangbin Huang wrote:
>>> From: Xiongfeng Wang <wangxiongfeng2@huawei.com>
>>>
>>> DGH prohibits merging memory accesses with Normal-NC or Device-GRE
>>> attributes before the hint instruction with any memory accesses
>>> appearing after the hint instruction. Provide macros to expose it to the
>>> arch code.
>>
>> Hmm.
>>
>> The architecture states:
>>
>>   | DGH is a hint instruction. A DGH instruction is not expected to be
>>   | performance optimal to merge memory accesses with Normal Non-cacheable
>>   | or Device-GRE attributes appearing in program order before the hint
>>   | instruction with any memory accesses appearing after the hint instruction
>>   | into a single memory transaction on an interconnect.
>>
>> which doesn't make a whole lot of sense to me, in all honesty.
>>
>>> Signed-off-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
>>> Signed-off-by: Cheng Jian <cj.chengjian@huawei.com>
>>> Signed-off-by: Yufeng Mo <moyufeng@huawei.com>
>>> ---
>>>  arch/arm64/include/asm/assembler.h | 7 +++++++
>>>  arch/arm64/include/asm/barrier.h   | 1 +
>>>  2 files changed, 8 insertions(+)
>>>
>>> diff --git a/arch/arm64/include/asm/assembler.h b/arch/arm64/include/asm/assembler.h
>>> index 8418c1bd8f04..d723899328bd 100644
>>> --- a/arch/arm64/include/asm/assembler.h
>>> +++ b/arch/arm64/include/asm/assembler.h
>>> @@ -90,6 +90,13 @@
>>>  	.endm
>>>  
>>>  /*
>>> + * Data gathering hint
>>> + */
>>> +	.macro	dgh
>>> +	hint	#6
>>> +	.endm
>>> +
>>> +/*
>>>   * RAS Error Synchronization barrier
>>>   */
>>>  	.macro  esb
>>> diff --git a/arch/arm64/include/asm/barrier.h b/arch/arm64/include/asm/barrier.h
>>> index 451e11e5fd23..02e1735706d2 100644
>>> --- a/arch/arm64/include/asm/barrier.h
>>> +++ b/arch/arm64/include/asm/barrier.h
>>> @@ -22,6 +22,7 @@
>>>  #define dmb(opt)	asm volatile("dmb " #opt : : : "memory")
>>>  #define dsb(opt)	asm volatile("dsb " #opt : : : "memory")
>>>  
>>> +#define dgh()		asm volatile("hint #6" : : : "memory")
>>
>> Although I'm fine with this in arm64, I don't think this is the interface
>> which drivers should be using. Instead, once we know what this instruction
>> is supposed to do, we should look at exposing it as part of the I/O barriers
>> and providing a NOP implementation for other architectures. That way,
>> drivers can use it without having to have the #ifdef CONFIG_ARM64 stuff that
>> you have in the later patches here.
> 
> How about adding an interface called flush_wc_writeX(), which could be used to
> flush the write-combining buffers to the device immediately?
> I found that this was discussed at the link below, but it turned out to be
> unnecessary in their situation.
> https://patchwork.ozlabs.org/project/netdev/patch/20200102180830.66676-3-liran.alon@oracle.com/

Do you have any suggestions on this problem? How about adding an interface
called flush_wc_writeX()?

Thanks,
Xiongfeng

> 
> Thanks,
> Xiongfeng
> 
>>
>> Will
>>
>>

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2021-07-13  7:28 UTC | newest]

Thread overview: 12+ messages
2021-06-22 11:11 [PATCH net-next 0/3] net: hns3: add support for TX push Guangbin Huang
2021-06-22 11:11 ` [PATCH net-next 1/3] arm64: barrier: add DGH macros to control memory accesses merging Guangbin Huang
2021-06-22 12:16   ` Will Deacon
2021-06-22 12:32     ` Mark Rutland
2021-06-24 14:18       ` huangguangbin (A)
2021-06-24 13:38     ` huangguangbin (A)
2021-06-29 11:11     ` Xiongfeng Wang
2021-07-13  7:27       ` Xiongfeng Wang
2021-06-22 11:11 ` [PATCH net-next 2/3] net: hns3: add support for TX push mode Guangbin Huang
2021-06-22 12:16   ` Will Deacon
2021-06-24 14:15     ` huangguangbin (A)
2021-06-22 11:11 ` [PATCH net-next 3/3] net: hns3: add ethtool priv-flag for TX push Guangbin Huang
