linux-kernel.vger.kernel.org archive mirror
* [PATCH v3 00/14] Adding GAUDI NIC code to habanalabs driver
@ 2020-09-15 17:10 Oded Gabbay
  2020-09-15 17:10 ` [PATCH v3 02/14] habanalabs/gaudi: add NIC firmware-related definitions Oded Gabbay
                   ` (15 more replies)
  0 siblings, 16 replies; 83+ messages in thread
From: Oded Gabbay @ 2020-09-15 17:10 UTC
  To: linux-kernel, netdev; +Cc: SW_Drivers, gregkh, davem, kuba, andrew, f.fainelli

Hello,

This is the third version of the patch-set to upstream the GAUDI NIC code
into the habanalabs driver.

The only modification from v2 is in the ethtool patch (patch 12). Details
are in that patch's commit message.

Link to v2 cover letter:
https://lkml.org/lkml/2020/9/12/201

Thanks,
Oded

Omer Shpigelman (14):
  habanalabs/gaudi: add NIC H/W and registers definitions
  habanalabs/gaudi: add NIC firmware-related definitions
  habanalabs/gaudi: add NIC security configuration
  habanalabs/gaudi: add support for NIC QMANs
  habanalabs/gaudi: add NIC Ethernet support
  habanalabs/gaudi: add NIC PHY code
  habanalabs/gaudi: allow user to get MAC addresses in INFO IOCTL
  habanalabs/gaudi: add a new IOCTL for NIC control operations
  habanalabs/gaudi: add CQ control operations
  habanalabs/gaudi: add WQ control operations
  habanalabs/gaudi: add QP error handling
  habanalabs/gaudi: Add ethtool support using coresight
  habanalabs/gaudi: support DCB protocol
  habanalabs/gaudi: add NIC init/fini calls from common code

 drivers/misc/habanalabs/common/context.c      |    1 +
 drivers/misc/habanalabs/common/device.c       |   24 +-
 drivers/misc/habanalabs/common/firmware_if.c  |   44 +
 drivers/misc/habanalabs/common/habanalabs.h   |   33 +-
 .../misc/habanalabs/common/habanalabs_drv.c   |    5 +
 .../misc/habanalabs/common/habanalabs_ioctl.c |  151 +-
 drivers/misc/habanalabs/common/pci.c          |    1 +
 drivers/misc/habanalabs/gaudi/Makefile        |    3 +
 drivers/misc/habanalabs/gaudi/gaudi.c         |  957 +++-
 drivers/misc/habanalabs/gaudi/gaudiP.h        |  331 +-
 .../misc/habanalabs/gaudi/gaudi_coresight.c   |  144 +
 drivers/misc/habanalabs/gaudi/gaudi_nic.c     | 4093 +++++++++++++++++
 drivers/misc/habanalabs/gaudi/gaudi_nic.h     |  353 ++
 .../misc/habanalabs/gaudi/gaudi_nic_dcbnl.c   |  108 +
 .../misc/habanalabs/gaudi/gaudi_nic_ethtool.c |  616 +++
 drivers/misc/habanalabs/gaudi/gaudi_phy.c     | 1276 +++++
 .../misc/habanalabs/gaudi/gaudi_security.c    | 3973 ++++++++++++++++
 drivers/misc/habanalabs/goya/goya.c           |   44 +
 .../misc/habanalabs/include/common/cpucp_if.h |   34 +-
 .../include/gaudi/asic_reg/gaudi_regs.h       |   26 +-
 .../include/gaudi/asic_reg/nic0_qm0_masks.h   |  800 ++++
 .../include/gaudi/asic_reg/nic0_qm0_regs.h    |  834 ++++
 .../include/gaudi/asic_reg/nic0_qm1_regs.h    |  834 ++++
 .../include/gaudi/asic_reg/nic0_qpc0_masks.h  |  500 ++
 .../include/gaudi/asic_reg/nic0_qpc0_regs.h   |  710 +++
 .../include/gaudi/asic_reg/nic0_qpc1_regs.h   |  710 +++
 .../include/gaudi/asic_reg/nic0_rxb_regs.h    |  508 ++
 .../include/gaudi/asic_reg/nic0_rxe0_masks.h  |  354 ++
 .../include/gaudi/asic_reg/nic0_rxe0_regs.h   |  158 +
 .../include/gaudi/asic_reg/nic0_rxe1_regs.h   |  158 +
 .../include/gaudi/asic_reg/nic0_stat_regs.h   |  518 +++
 .../include/gaudi/asic_reg/nic0_tmr_regs.h    |  184 +
 .../include/gaudi/asic_reg/nic0_txe0_masks.h  |  336 ++
 .../include/gaudi/asic_reg/nic0_txe0_regs.h   |  264 ++
 .../include/gaudi/asic_reg/nic0_txe1_regs.h   |  264 ++
 .../include/gaudi/asic_reg/nic0_txs0_masks.h  |  336 ++
 .../include/gaudi/asic_reg/nic0_txs0_regs.h   |  214 +
 .../include/gaudi/asic_reg/nic0_txs1_regs.h   |  214 +
 .../include/gaudi/asic_reg/nic1_qm0_regs.h    |  834 ++++
 .../include/gaudi/asic_reg/nic1_qm1_regs.h    |  834 ++++
 .../include/gaudi/asic_reg/nic2_qm0_regs.h    |  834 ++++
 .../include/gaudi/asic_reg/nic2_qm1_regs.h    |  834 ++++
 .../include/gaudi/asic_reg/nic3_qm0_regs.h    |  834 ++++
 .../include/gaudi/asic_reg/nic3_qm1_regs.h    |  834 ++++
 .../include/gaudi/asic_reg/nic4_qm0_regs.h    |  834 ++++
 .../include/gaudi/asic_reg/nic4_qm1_regs.h    |  834 ++++
 drivers/misc/habanalabs/include/gaudi/gaudi.h |   12 +
 .../habanalabs/include/gaudi/gaudi_fw_if.h    |   24 +
 .../habanalabs/include/gaudi/gaudi_masks.h    |   15 +
 .../include/hw_ip/nic/nic_general.h           |   13 +
 include/uapi/misc/habanalabs.h                |  296 +-
 51 files changed, 27083 insertions(+), 62 deletions(-)
 create mode 100644 drivers/misc/habanalabs/gaudi/gaudi_nic.c
 create mode 100644 drivers/misc/habanalabs/gaudi/gaudi_nic.h
 create mode 100644 drivers/misc/habanalabs/gaudi/gaudi_nic_dcbnl.c
 create mode 100644 drivers/misc/habanalabs/gaudi/gaudi_nic_ethtool.c
 create mode 100644 drivers/misc/habanalabs/gaudi/gaudi_phy.c
 create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic0_qm0_masks.h
 create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic0_qm0_regs.h
 create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic0_qm1_regs.h
 create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic0_qpc0_masks.h
 create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic0_qpc0_regs.h
 create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic0_qpc1_regs.h
 create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic0_rxb_regs.h
 create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic0_rxe0_masks.h
 create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic0_rxe0_regs.h
 create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic0_rxe1_regs.h
 create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic0_stat_regs.h
 create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic0_tmr_regs.h
 create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic0_txe0_masks.h
 create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic0_txe0_regs.h
 create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic0_txe1_regs.h
 create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic0_txs0_masks.h
 create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic0_txs0_regs.h
 create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic0_txs1_regs.h
 create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic1_qm0_regs.h
 create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic1_qm1_regs.h
 create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic2_qm0_regs.h
 create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic2_qm1_regs.h
 create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic3_qm0_regs.h
 create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic3_qm1_regs.h
 create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic4_qm0_regs.h
 create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic4_qm1_regs.h
 create mode 100644 drivers/misc/habanalabs/include/hw_ip/nic/nic_general.h

-- 
2.17.1



* [PATCH v3 02/14] habanalabs/gaudi: add NIC firmware-related definitions
  2020-09-15 17:10 [PATCH v3 00/14] Adding GAUDI NIC code to habanalabs driver Oded Gabbay
@ 2020-09-15 17:10 ` Oded Gabbay
  2020-09-15 17:10 ` [PATCH v3 03/14] habanalabs/gaudi: add NIC security configuration Oded Gabbay
                   ` (14 subsequent siblings)
  15 siblings, 0 replies; 83+ messages in thread
From: Oded Gabbay @ 2020-09-15 17:10 UTC
  To: linux-kernel, netdev
  Cc: SW_Drivers, gregkh, davem, kuba, andrew, f.fainelli, Omer Shpigelman

From: Omer Shpigelman <oshpigelman@habana.ai>

Add new structures and messages that the driver uses to interact with the
firmware to receive information and events (errors) about GAUDI's NIC.

Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
---
 .../misc/habanalabs/include/common/cpucp_if.h | 34 ++++++++++++++++---
 .../habanalabs/include/gaudi/gaudi_fw_if.h    | 24 +++++++++++++
 2 files changed, 54 insertions(+), 4 deletions(-)
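
Note on the array sizes in struct cpucp_nic_info: the per-NIC and per-lane
masks are arrays of 64-bit words, sized by rounding the NIC/lane count up to
a multiple of 64. The following is only a userspace sketch of that
arithmetic. The macro values are copied from the cpucp_if.h hunk below;
example_mask_test_bit() is a hypothetical helper, and the bit layout it
assumes (bit i of the mask describes NIC i, and the words have already been
converted from little-endian) is an assumption, not something this patch
states.

```c
#include <assert.h>
#include <stdint.h>

/* Values copied from the cpucp_if.h hunk in this patch. */
#define CPUCP_MAX_NICS			128
#define CPUCP_LANES_PER_NIC		4
#define CPUCP_MAX_NIC_LANES		(CPUCP_MAX_NICS * CPUCP_LANES_PER_NIC)
#define CPUCP_NIC_MASK_ARR_LEN		((CPUCP_MAX_NICS + 63) / 64)
#define CPUCP_NIC_POLARITY_ARR_LEN	((CPUCP_MAX_NIC_LANES + 63) / 64)

/* 128 NICs fit in two 64-bit words, 512 lanes in eight. */
static_assert(CPUCP_NIC_MASK_ARR_LEN == 2, "one bit per NIC");
static_assert(CPUCP_NIC_POLARITY_ARR_LEN == 8, "one bit per lane");

/*
 * Hypothetical helper (not part of the patch): test bit 'idx' in a mask
 * stored as an array of 64-bit words in CPU byte order.
 * ASSUMPTION: bit i of the mask corresponds to NIC (or lane) i.
 */
static int example_mask_test_bit(const uint64_t *mask, unsigned int idx)
{
	return (int)((mask[idx / 64] >> (idx % 64)) & 1);
}
```

In the driver the fields are __le64, so each word would presumably need an
le64_to_cpu() conversion before a test like this.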

diff --git a/drivers/misc/habanalabs/include/common/cpucp_if.h b/drivers/misc/habanalabs/include/common/cpucp_if.h
index 2a5c9cb3d505..782b8b8636be 100644
--- a/drivers/misc/habanalabs/include/common/cpucp_if.h
+++ b/drivers/misc/habanalabs/include/common/cpucp_if.h
@@ -9,6 +9,7 @@
 #define CPUCP_IF_H
 
 #include <linux/types.h>
+#include <linux/if_ether.h>
 
 /*
  * EVENT QUEUE
@@ -199,6 +200,11 @@ enum pq_init_status {
  *       CpuCP to write to the structure, to prevent data corruption in case of
  *       mismatched driver/FW versions.
  *
+ * CPUCP_PACKET_NIC_INFO_GET -
+ *       Fetch information from the device regarding the NIC. The host's driver
+ *       passes the max size it allows the CpuCP to write to the structure, to
+ *       prevent data corruption in case of mismatched driver/FW versions.
+ *
  * CPUCP_PACKET_TEMPERATURE_SET -
  *       Set the value of the offset property of a specified thermal sensor.
  *       The packet's arguments specify the desired sensor and the field to
@@ -244,12 +250,12 @@ enum cpucp_packet_id {
 	CPUCP_PACKET_MAX_POWER_GET,		/* sysfs */
 	CPUCP_PACKET_MAX_POWER_SET,		/* sysfs */
 	CPUCP_PACKET_EEPROM_DATA_GET,		/* sysfs */
-	CPUCP_RESERVED,
+	CPUCP_PACKET_NIC_INFO_GET,		/* internal */
 	CPUCP_PACKET_TEMPERATURE_SET,		/* sysfs */
 	CPUCP_PACKET_VOLTAGE_SET,		/* sysfs */
 	CPUCP_PACKET_CURRENT_SET,		/* sysfs */
-	CPUCP_PACKET_PCIE_THROUGHPUT_GET,		/* internal */
-	CPUCP_PACKET_PCIE_REPLAY_CNT_GET,		/* internal */
+	CPUCP_PACKET_PCIE_THROUGHPUT_GET,	/* internal */
+	CPUCP_PACKET_PCIE_REPLAY_CNT_GET,	/* internal */
 	CPUCP_PACKET_TOTAL_ENERGY_GET,		/* internal */
 	CPUCP_PACKET_PLL_REG_GET,		/* internal */
 };
@@ -300,7 +306,7 @@ struct cpucp_packet {
 		/* For led set */
 		__le32 led_index;
 
-		/* For get CpuCP info/EEPROM data */
+		/* For get CpuCP info/EEPROM data/NIC info */
 		__le32 data_max_size;
 	};
 
@@ -392,6 +398,12 @@ struct eq_generic_event {
 #define CARD_NAME_MAX_LEN		16
 #define VERSION_MAX_LEN			128
 #define CPUCP_MAX_SENSORS		128
+#define CPUCP_MAX_NICS			128
+#define CPUCP_LANES_PER_NIC		4
+#define CPUCP_NIC_QSFP_EEPROM_MAX_LEN	1024
+#define CPUCP_MAX_NIC_LANES		(CPUCP_MAX_NICS * CPUCP_LANES_PER_NIC)
+#define CPUCP_NIC_MASK_ARR_LEN		((CPUCP_MAX_NICS + 63) / 64)
+#define CPUCP_NIC_POLARITY_ARR_LEN	((CPUCP_MAX_NIC_LANES + 63) / 64)
 
 struct cpucp_sensor {
 	__le32 type;
@@ -440,4 +452,18 @@ struct cpucp_info {
 	char card_name[CARD_NAME_MAX_LEN];
 };
 
+struct cpucp_mac_addr {
+	__u8 mac_addr[ETH_ALEN];
+};
+
+struct cpucp_nic_info {
+	struct cpucp_mac_addr mac_addrs[CPUCP_MAX_NICS];
+	__le64 link_mask[CPUCP_NIC_MASK_ARR_LEN];
+	__le64 pol_tx_mask[CPUCP_NIC_POLARITY_ARR_LEN];
+	__le64 pol_rx_mask[CPUCP_NIC_POLARITY_ARR_LEN];
+	__le64 link_ext_mask[CPUCP_NIC_MASK_ARR_LEN];
+	__u8 qsfp_eeprom[CPUCP_NIC_QSFP_EEPROM_MAX_LEN];
+	__le64 auto_neg_mask[CPUCP_NIC_MASK_ARR_LEN];
+};
+
 #endif /* CPUCP_IF_H */
diff --git a/drivers/misc/habanalabs/include/gaudi/gaudi_fw_if.h b/drivers/misc/habanalabs/include/gaudi/gaudi_fw_if.h
index 8aadc6357da1..d61a4c87b765 100644
--- a/drivers/misc/habanalabs/include/gaudi/gaudi_fw_if.h
+++ b/drivers/misc/habanalabs/include/gaudi/gaudi_fw_if.h
@@ -8,6 +8,8 @@
 #ifndef GAUDI_FW_IF_H
 #define GAUDI_FW_IF_H
 
+#include <linux/types.h>
+
 #define GAUDI_EVENT_QUEUE_MSI_IDX	8
 #define GAUDI_NIC_PORT1_MSI_IDX		10
 #define GAUDI_NIC_PORT3_MSI_IDX		12
@@ -31,6 +33,28 @@ enum gaudi_pll_index {
 	IF_PLL
 };
 
+enum gaudi_nic_axi_error {
+	RXB,
+	RXE,
+	TXS,
+	TXE,
+	QPC_RESP,
+	NON_AXI_ERR,
+};
+
+/*
+ * struct eq_nic_sei_event - describes an AXI error cause.
+ * @axi_error_cause: one of the events defined in enum gaudi_nic_axi_error.
+ * @id: can be either 0 or 1, to further describe unit with interrupt cause
+ *      (i.e. TXE0 or TXE1).
+ * @pad[6]: padding structure to 64bit.
+ */
+struct eq_nic_sei_event {
+	__u8 axi_error_cause;
+	__u8 id;
+	__u8 pad[6];
+};
+
 #define GAUDI_PLL_FREQ_LOW		200000000 /* 200 MHz */
 
 #endif /* GAUDI_FW_IF_H */
-- 
2.17.1



* [PATCH v3 03/14] habanalabs/gaudi: add NIC security configuration
  2020-09-15 17:10 [PATCH v3 00/14] Adding GAUDI NIC code to habanalabs driver Oded Gabbay
  2020-09-15 17:10 ` [PATCH v3 02/14] habanalabs/gaudi: add NIC firmware-related definitions Oded Gabbay
@ 2020-09-15 17:10 ` Oded Gabbay
  2020-09-15 17:10 ` [PATCH v3 04/14] habanalabs/gaudi: add support for NIC QMANs Oded Gabbay
                   ` (13 subsequent siblings)
  15 siblings, 0 replies; 83+ messages in thread
From: Oded Gabbay @ 2020-09-15 17:10 UTC
  To: linux-kernel, netdev
  Cc: SW_Drivers, gregkh, davem, kuba, andrew, f.fainelli, Omer Shpigelman

From: Omer Shpigelman <oshpigelman@habana.ai>

Configure the security properties of the NIC IP. This prevents a user
process from misusing the NIC, e.g. crashing the server or stealing
data.

Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
---
 .../misc/habanalabs/gaudi/gaudi_security.c    | 3973 +++++++++++++++++
 1 file changed, 3973 insertions(+)
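
Note on the repeated pattern in the hunk below: every block derives, from a
register address, the address of a protection-bits word and the bit inside
that word, ORs the bits of all listed registers into a mask, and writes the
inverted mask. The following is a standalone sketch of that arithmetic, not
driver code. The example_* names are hypothetical, and the value 0xF80 for
the protection-bits offset is an assumption, since the definition of
PROT_BITS_OFFS is not part of this hunk.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* ASSUMPTION: value of PROT_BITS_OFFS; its definition is not shown here. */
#define EXAMPLE_PROT_BITS_OFFS	0xF80u

struct example_pb_loc {
	uint32_t pb_addr;	/* protection-bits area of the 4 KB page */
	uint32_t word_offset;	/* byte offset of the 32-bit word for reg */
	uint32_t bit;		/* bit index of reg inside that word */
};

/* Same expressions as in the hunk, applied to a single register address. */
static struct example_pb_loc example_pb_locate(uint32_t reg)
{
	struct example_pb_loc loc;

	loc.pb_addr = (reg & ~0xFFFu) + EXAMPLE_PROT_BITS_OFFS;
	/* Each 32-bit word covers 32 registers, i.e. 128 bytes. */
	loc.word_offset = ((reg & EXAMPLE_PROT_BITS_OFFS) >> 7) << 2;
	loc.bit = (reg & 0x7Fu) >> 2;
	return loc;
}

/*
 * Value that would be written for a group of registers sharing one word:
 * all bits set except those of the listed registers.
 */
static uint32_t example_pb_value(const uint32_t *regs, size_t n)
{
	uint32_t mask = 0;
	size_t i;

	for (i = 0; i < n; i++)
		mask |= 1u << example_pb_locate(regs[i]).bit;
	return ~mask;
}
```

With this reading, a new group starts in the hunk whenever the next register
crosses a 128-byte boundary, which is why the blocks differ in length.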

diff --git a/drivers/misc/habanalabs/gaudi/gaudi_security.c b/drivers/misc/habanalabs/gaudi/gaudi_security.c
index 2d7add0e5bcc..8a921ab56557 100644
--- a/drivers/misc/habanalabs/gaudi/gaudi_security.c
+++ b/drivers/misc/habanalabs/gaudi/gaudi_security.c
@@ -5157,6 +5157,3977 @@ static void gaudi_init_dma_protection_bits(struct hl_device *hdev)
 	WREG32(pb_addr + word_offset, ~mask);
 }
 
+static void gaudi_init_nic_protection_bits(struct hl_device *hdev)
+{
+	u32 pb_addr, mask;
+	u8 word_offset;
+
+	WREG32(mmNIC0_QM0_BASE - CFG_BASE + PROT_BITS_OFFS + 0x7C, 0);
+	WREG32(mmNIC0_QM1_BASE - CFG_BASE + PROT_BITS_OFFS + 0x7C, 0);
+
+	pb_addr = (mmNIC0_QM0_GLBL_CFG0 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC0_QM0_GLBL_CFG0 & PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC0_QM0_GLBL_CFG0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_GLBL_CFG1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_GLBL_PROT & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_GLBL_ERR_CFG & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_GLBL_SECURE_PROPS_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_GLBL_SECURE_PROPS_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_GLBL_SECURE_PROPS_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_GLBL_SECURE_PROPS_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_GLBL_SECURE_PROPS_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_GLBL_NON_SECURE_PROPS_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_GLBL_NON_SECURE_PROPS_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_GLBL_NON_SECURE_PROPS_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_GLBL_NON_SECURE_PROPS_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_GLBL_NON_SECURE_PROPS_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_GLBL_STS0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_GLBL_STS1_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_GLBL_STS1_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_GLBL_STS1_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_GLBL_STS1_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_GLBL_STS1_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_GLBL_MSG_EN_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_GLBL_MSG_EN_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_GLBL_MSG_EN_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_GLBL_MSG_EN_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_GLBL_MSG_EN_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_PQ_BASE_LO_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_PQ_BASE_LO_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_PQ_BASE_LO_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_PQ_BASE_LO_3 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC0_QM0_PQ_BASE_HI_0 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC0_QM0_PQ_BASE_HI_0 & PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC0_QM0_PQ_BASE_HI_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_PQ_BASE_HI_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_PQ_BASE_HI_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_PQ_BASE_HI_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_PQ_SIZE_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_PQ_SIZE_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_PQ_SIZE_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_PQ_SIZE_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_PQ_PI_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_PQ_PI_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_PQ_PI_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_PQ_PI_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_PQ_CI_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_PQ_CI_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_PQ_CI_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_PQ_CI_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_PQ_CFG0_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_PQ_CFG0_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_PQ_CFG0_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_PQ_CFG0_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_PQ_CFG1_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_PQ_CFG1_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_PQ_CFG1_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_PQ_CFG1_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_PQ_ARUSER_31_11_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_PQ_ARUSER_31_11_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_PQ_ARUSER_31_11_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_PQ_ARUSER_31_11_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_PQ_STS0_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_PQ_STS0_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_PQ_STS0_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_PQ_STS0_3 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC0_QM0_PQ_STS1_0 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC0_QM0_PQ_STS1_0 & PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC0_QM0_PQ_STS1_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_PQ_STS1_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_PQ_STS1_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_PQ_STS1_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CQ_STS0_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CQ_STS0_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CQ_STS0_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CQ_STS0_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CQ_STS1_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CQ_STS1_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CQ_STS1_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CQ_STS1_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CQ_PTR_LO_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CQ_PTR_HI_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CQ_TSIZE_0 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC0_QM0_CQ_CTL_0 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC0_QM0_CQ_CTL_0 & PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC0_QM0_CQ_CTL_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CQ_PTR_LO_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CQ_PTR_HI_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CQ_TSIZE_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CQ_CTL_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CQ_PTR_LO_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CQ_PTR_HI_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CQ_TSIZE_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CQ_CTL_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CQ_PTR_LO_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CQ_PTR_HI_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CQ_TSIZE_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CQ_CTL_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CQ_PTR_LO_STS_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CQ_PTR_LO_STS_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CQ_PTR_LO_STS_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CQ_PTR_LO_STS_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CQ_PTR_LO_STS_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CQ_PTR_HI_STS_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CQ_PTR_HI_STS_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CQ_PTR_HI_STS_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CQ_PTR_HI_STS_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CQ_PTR_HI_STS_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CQ_TSIZE_STS_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CQ_TSIZE_STS_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CQ_TSIZE_STS_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CQ_TSIZE_STS_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CQ_TSIZE_STS_4 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC0_QM0_CQ_CTL_STS_0 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC0_QM0_CQ_CTL_STS_0 & PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC0_QM0_CQ_CTL_STS_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CQ_CTL_STS_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CQ_CTL_STS_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CQ_CTL_STS_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CQ_CTL_STS_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CQ_IFIFO_CNT_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CQ_IFIFO_CNT_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CQ_IFIFO_CNT_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CQ_IFIFO_CNT_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CQ_IFIFO_CNT_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CP_MSG_BASE0_ADDR_LO_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CP_MSG_BASE0_ADDR_LO_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CP_MSG_BASE0_ADDR_LO_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CP_MSG_BASE0_ADDR_LO_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CP_MSG_BASE0_ADDR_LO_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CP_MSG_BASE0_ADDR_HI_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CP_MSG_BASE0_ADDR_HI_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CP_MSG_BASE0_ADDR_HI_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CP_MSG_BASE0_ADDR_HI_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CP_MSG_BASE0_ADDR_HI_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CP_MSG_BASE1_ADDR_LO_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CP_MSG_BASE1_ADDR_LO_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CP_MSG_BASE1_ADDR_LO_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CP_MSG_BASE1_ADDR_LO_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CP_MSG_BASE1_ADDR_LO_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CP_MSG_BASE1_ADDR_HI_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CP_MSG_BASE1_ADDR_HI_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CP_MSG_BASE1_ADDR_HI_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CP_MSG_BASE1_ADDR_HI_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CP_MSG_BASE1_ADDR_HI_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CP_MSG_BASE2_ADDR_LO_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CP_MSG_BASE2_ADDR_LO_1 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC0_QM0_CP_MSG_BASE2_ADDR_LO_2 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC0_QM0_CP_MSG_BASE2_ADDR_LO_2 &
+			PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC0_QM0_CP_MSG_BASE2_ADDR_LO_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CP_MSG_BASE2_ADDR_LO_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CP_MSG_BASE2_ADDR_LO_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CP_MSG_BASE2_ADDR_HI_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CP_MSG_BASE2_ADDR_HI_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CP_MSG_BASE2_ADDR_HI_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CP_MSG_BASE2_ADDR_HI_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CP_MSG_BASE2_ADDR_HI_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CP_MSG_BASE3_ADDR_LO_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CP_MSG_BASE3_ADDR_LO_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CP_MSG_BASE3_ADDR_LO_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CP_MSG_BASE3_ADDR_LO_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CP_MSG_BASE3_ADDR_LO_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CP_MSG_BASE3_ADDR_HI_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CP_MSG_BASE3_ADDR_HI_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CP_MSG_BASE3_ADDR_HI_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CP_MSG_BASE3_ADDR_HI_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CP_MSG_BASE3_ADDR_HI_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CP_LDMA_TSIZE_OFFSET_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CP_LDMA_TSIZE_OFFSET_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CP_LDMA_TSIZE_OFFSET_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CP_LDMA_TSIZE_OFFSET_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CP_LDMA_TSIZE_OFFSET_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CP_LDMA_SRC_BASE_LO_OFFSET_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CP_LDMA_SRC_BASE_LO_OFFSET_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CP_LDMA_SRC_BASE_LO_OFFSET_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CP_LDMA_SRC_BASE_LO_OFFSET_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CP_LDMA_SRC_BASE_LO_OFFSET_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CP_LDMA_DST_BASE_LO_OFFSET_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CP_LDMA_DST_BASE_LO_OFFSET_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CP_LDMA_DST_BASE_LO_OFFSET_2 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC0_QM0_CP_LDMA_DST_BASE_LO_OFFSET_3 & ~0xFFF) +
+				PROT_BITS_OFFS;
+	word_offset = ((mmNIC0_QM0_CP_LDMA_DST_BASE_LO_OFFSET_3 &
+			PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC0_QM0_CP_LDMA_DST_BASE_LO_OFFSET_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CP_LDMA_DST_BASE_LO_OFFSET_4 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC0_QM0_CP_STS_0 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC0_QM0_CP_STS_0 & PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC0_QM0_CP_STS_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CP_STS_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CP_STS_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CP_STS_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CP_STS_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CP_CURRENT_INST_LO_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CP_CURRENT_INST_LO_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CP_CURRENT_INST_LO_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CP_CURRENT_INST_LO_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CP_CURRENT_INST_LO_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CP_CURRENT_INST_HI_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CP_CURRENT_INST_HI_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CP_CURRENT_INST_HI_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CP_CURRENT_INST_HI_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CP_CURRENT_INST_HI_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CP_BARRIER_CFG_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CP_BARRIER_CFG_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CP_BARRIER_CFG_2 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC0_QM0_CP_BARRIER_CFG_3 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC0_QM0_CP_BARRIER_CFG_3 & PROT_BITS_OFFS)
+			>> 7) << 2;
+	mask = 1U << ((mmNIC0_QM0_CP_BARRIER_CFG_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CP_BARRIER_CFG_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CP_DBG_0_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CP_DBG_0_1 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC0_QM0_CP_DBG_0_2 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC0_QM0_CP_DBG_0_2 & PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC0_QM0_CP_DBG_0_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CP_DBG_0_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CP_DBG_0_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CP_ARUSER_31_11_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CP_ARUSER_31_11_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CP_ARUSER_31_11_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CP_ARUSER_31_11_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CP_ARUSER_31_11_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CP_AWUSER_31_11_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CP_AWUSER_31_11_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CP_AWUSER_31_11_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CP_AWUSER_31_11_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CP_AWUSER_31_11_4 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC0_QM0_ARB_CFG_0 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC0_QM0_ARB_CFG_0 & PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC0_QM0_ARB_CFG_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_ARB_MST_AVAIL_CRED_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_ARB_MST_AVAIL_CRED_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_ARB_MST_AVAIL_CRED_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_ARB_MST_AVAIL_CRED_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_ARB_MST_AVAIL_CRED_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_ARB_MST_AVAIL_CRED_5 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_ARB_MST_AVAIL_CRED_6 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_ARB_MST_AVAIL_CRED_7 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_ARB_MST_AVAIL_CRED_8 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_ARB_MST_AVAIL_CRED_9 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_ARB_MST_AVAIL_CRED_10 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_ARB_MST_AVAIL_CRED_11 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_ARB_MST_AVAIL_CRED_12 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_ARB_MST_AVAIL_CRED_13 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_ARB_MST_AVAIL_CRED_14 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_ARB_MST_AVAIL_CRED_15 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_ARB_MST_AVAIL_CRED_16 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_ARB_MST_AVAIL_CRED_17 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_ARB_MST_AVAIL_CRED_18 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_ARB_MST_AVAIL_CRED_19 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_ARB_MST_AVAIL_CRED_20 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_ARB_MST_AVAIL_CRED_21 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_ARB_MST_AVAIL_CRED_22 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_ARB_MST_AVAIL_CRED_23 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC0_QM0_ARB_MST_AVAIL_CRED_24 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC0_QM0_ARB_MST_AVAIL_CRED_24 &
+			PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC0_QM0_ARB_MST_AVAIL_CRED_24 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_ARB_MST_AVAIL_CRED_25 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_ARB_MST_AVAIL_CRED_26 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_ARB_MST_AVAIL_CRED_27 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_ARB_MST_AVAIL_CRED_28 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_ARB_MST_AVAIL_CRED_29 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_ARB_MST_AVAIL_CRED_30 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_ARB_MST_AVAIL_CRED_31 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC0_QM0_ARB_MST_CHOISE_PUSH_OFST_23 & ~0xFFF) +
+			PROT_BITS_OFFS;
+	word_offset = ((mmNIC0_QM0_ARB_MST_CHOISE_PUSH_OFST_23 &
+			PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC0_QM0_ARB_SLV_CHOISE_WDT & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_ARB_MSG_MAX_INFLIGHT & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_ARB_MSG_AWUSER_31_11 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_ARB_MSG_AWUSER_SEC_PROP & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_ARB_MSG_AWUSER_NON_SEC_PROP & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC0_QM0_ARB_STATE_STS & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC0_QM0_ARB_STATE_STS & PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC0_QM0_ARB_STATE_STS & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_ARB_CHOISE_FULLNESS_STS & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_ARB_MSG_STS & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_ARB_SLV_CHOISE_Q_HEAD & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_ARB_ERR_CAUSE & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_ARB_ERR_MSG_EN & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_ARB_ERR_STS_DRP & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_ARB_MST_CRED_STS_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_ARB_MST_CRED_STS_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_ARB_MST_CRED_STS_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_ARB_MST_CRED_STS_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_ARB_MST_CRED_STS_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_ARB_MST_CRED_STS_5 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_ARB_MST_CRED_STS_6 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_ARB_MST_CRED_STS_7 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_ARB_MST_CRED_STS_8 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_ARB_MST_CRED_STS_9 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_ARB_MST_CRED_STS_10 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_ARB_MST_CRED_STS_11 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_ARB_MST_CRED_STS_12 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_ARB_MST_CRED_STS_13 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_ARB_MST_CRED_STS_14 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_ARB_MST_CRED_STS_15 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_ARB_MST_CRED_STS_16 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_ARB_MST_CRED_STS_17 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_ARB_MST_CRED_STS_18 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_ARB_MST_CRED_STS_19 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC0_QM0_ARB_MST_CRED_STS_20 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC0_QM0_ARB_MST_CRED_STS_20 & PROT_BITS_OFFS)
+			>> 7) << 2;
+	mask = 1U << ((mmNIC0_QM0_ARB_MST_CRED_STS_20 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_ARB_MST_CRED_STS_21 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_ARB_MST_CRED_STS_22 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_ARB_MST_CRED_STS_23 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_ARB_MST_CRED_STS_24 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_ARB_MST_CRED_STS_25 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_ARB_MST_CRED_STS_26 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_ARB_MST_CRED_STS_27 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_ARB_MST_CRED_STS_28 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_ARB_MST_CRED_STS_29 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_ARB_MST_CRED_STS_30 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_ARB_MST_CRED_STS_31 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CGM_CFG & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CGM_STS & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CGM_CFG1 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC0_QM0_LOCAL_RANGE_BASE & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC0_QM0_LOCAL_RANGE_BASE & PROT_BITS_OFFS)
+			>> 7) << 2;
+	mask = 1U << ((mmNIC0_QM0_LOCAL_RANGE_BASE & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_LOCAL_RANGE_SIZE & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CSMR_STRICT_PRIO_CFG & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_HBW_RD_RATE_LIM_CFG_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_LBW_WR_RATE_LIM_CFG_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_LBW_WR_RATE_LIM_CFG_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_HBW_RD_RATE_LIM_CFG_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_GLBL_AXCACHE & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_IND_GW_APB_CFG & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_IND_GW_APB_WDATA & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_IND_GW_APB_RDATA & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_IND_GW_APB_STATUS & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_GLBL_ERR_ADDR_LO & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_GLBL_ERR_ADDR_HI & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_GLBL_ERR_WDATA & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC0_QM0_GLBL_MEM_INIT_BUSY & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC0_QM0_GLBL_MEM_INIT_BUSY & PROT_BITS_OFFS)
+			>> 7) << 2;
+	mask = 1U << ((mmNIC0_QM0_GLBL_MEM_INIT_BUSY & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC0_QM1_GLBL_CFG0 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC0_QM1_GLBL_CFG0 & PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC0_QM1_GLBL_CFG0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_GLBL_CFG1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_GLBL_PROT & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_GLBL_ERR_CFG & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_GLBL_SECURE_PROPS_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_GLBL_SECURE_PROPS_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_GLBL_SECURE_PROPS_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_GLBL_SECURE_PROPS_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_GLBL_SECURE_PROPS_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_GLBL_NON_SECURE_PROPS_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_GLBL_NON_SECURE_PROPS_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_GLBL_NON_SECURE_PROPS_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_GLBL_NON_SECURE_PROPS_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_GLBL_NON_SECURE_PROPS_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_GLBL_STS0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_GLBL_STS1_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_GLBL_STS1_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_GLBL_STS1_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_GLBL_STS1_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_GLBL_STS1_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_GLBL_MSG_EN_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_GLBL_MSG_EN_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_GLBL_MSG_EN_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_GLBL_MSG_EN_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_GLBL_MSG_EN_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_PQ_BASE_LO_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_PQ_BASE_LO_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_PQ_BASE_LO_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_PQ_BASE_LO_3 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC0_QM1_PQ_BASE_HI_0 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC0_QM1_PQ_BASE_HI_0 & PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC0_QM1_PQ_BASE_HI_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_PQ_BASE_HI_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_PQ_BASE_HI_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_PQ_BASE_HI_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_PQ_SIZE_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_PQ_SIZE_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_PQ_SIZE_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_PQ_SIZE_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_PQ_PI_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_PQ_PI_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_PQ_PI_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_PQ_PI_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_PQ_CI_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_PQ_CI_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_PQ_CI_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_PQ_CI_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_PQ_CFG0_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_PQ_CFG0_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_PQ_CFG0_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_PQ_CFG0_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_PQ_CFG1_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_PQ_CFG1_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_PQ_CFG1_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_PQ_CFG1_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_PQ_ARUSER_31_11_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_PQ_ARUSER_31_11_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_PQ_ARUSER_31_11_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_PQ_ARUSER_31_11_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_PQ_STS0_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_PQ_STS0_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_PQ_STS0_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_PQ_STS0_3 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC0_QM1_PQ_STS1_0 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC0_QM1_PQ_STS1_0 & PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC0_QM1_PQ_STS1_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_PQ_STS1_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_PQ_STS1_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_PQ_STS1_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CQ_STS0_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CQ_STS0_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CQ_STS0_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CQ_STS0_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CQ_STS1_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CQ_STS1_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CQ_STS1_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CQ_STS1_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CQ_PTR_LO_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CQ_PTR_HI_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CQ_TSIZE_0 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC0_QM1_CQ_CTL_0 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC0_QM1_CQ_CTL_0 & PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC0_QM1_CQ_CTL_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CQ_PTR_LO_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CQ_PTR_HI_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CQ_TSIZE_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CQ_CTL_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CQ_PTR_LO_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CQ_PTR_HI_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CQ_TSIZE_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CQ_CTL_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CQ_PTR_LO_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CQ_PTR_HI_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CQ_TSIZE_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CQ_CTL_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CQ_PTR_LO_STS_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CQ_PTR_LO_STS_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CQ_PTR_LO_STS_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CQ_PTR_LO_STS_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CQ_PTR_LO_STS_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CQ_PTR_HI_STS_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CQ_PTR_HI_STS_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CQ_PTR_HI_STS_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CQ_PTR_HI_STS_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CQ_PTR_HI_STS_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CQ_TSIZE_STS_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CQ_TSIZE_STS_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CQ_TSIZE_STS_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CQ_TSIZE_STS_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CQ_TSIZE_STS_4 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC0_QM1_CQ_CTL_STS_0 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC0_QM1_CQ_CTL_STS_0 & PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC0_QM1_CQ_CTL_STS_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CQ_CTL_STS_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CQ_CTL_STS_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CQ_CTL_STS_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CQ_CTL_STS_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CQ_IFIFO_CNT_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CQ_IFIFO_CNT_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CQ_IFIFO_CNT_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CQ_IFIFO_CNT_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CQ_IFIFO_CNT_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CP_MSG_BASE0_ADDR_LO_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CP_MSG_BASE0_ADDR_LO_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CP_MSG_BASE0_ADDR_LO_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CP_MSG_BASE0_ADDR_LO_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CP_MSG_BASE0_ADDR_LO_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CP_MSG_BASE0_ADDR_HI_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CP_MSG_BASE0_ADDR_HI_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CP_MSG_BASE0_ADDR_HI_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CP_MSG_BASE0_ADDR_HI_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CP_MSG_BASE0_ADDR_HI_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CP_MSG_BASE1_ADDR_LO_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CP_MSG_BASE1_ADDR_LO_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CP_MSG_BASE1_ADDR_LO_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CP_MSG_BASE1_ADDR_LO_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CP_MSG_BASE1_ADDR_LO_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CP_MSG_BASE1_ADDR_HI_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CP_MSG_BASE1_ADDR_HI_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CP_MSG_BASE1_ADDR_HI_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CP_MSG_BASE1_ADDR_HI_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CP_MSG_BASE1_ADDR_HI_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CP_MSG_BASE2_ADDR_LO_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CP_MSG_BASE2_ADDR_LO_1 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC0_QM1_CP_MSG_BASE2_ADDR_LO_2 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC0_QM1_CP_MSG_BASE2_ADDR_LO_2 &
+			PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC0_QM1_CP_MSG_BASE2_ADDR_LO_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CP_MSG_BASE2_ADDR_LO_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CP_MSG_BASE2_ADDR_LO_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CP_MSG_BASE2_ADDR_HI_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CP_MSG_BASE2_ADDR_HI_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CP_MSG_BASE2_ADDR_HI_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CP_MSG_BASE2_ADDR_HI_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CP_MSG_BASE2_ADDR_HI_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CP_MSG_BASE3_ADDR_LO_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CP_MSG_BASE3_ADDR_LO_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CP_MSG_BASE3_ADDR_LO_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CP_MSG_BASE3_ADDR_LO_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CP_MSG_BASE3_ADDR_LO_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CP_MSG_BASE3_ADDR_HI_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CP_MSG_BASE3_ADDR_HI_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CP_MSG_BASE3_ADDR_HI_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CP_MSG_BASE3_ADDR_HI_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CP_MSG_BASE3_ADDR_HI_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CP_LDMA_TSIZE_OFFSET_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CP_LDMA_TSIZE_OFFSET_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CP_LDMA_TSIZE_OFFSET_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CP_LDMA_TSIZE_OFFSET_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CP_LDMA_TSIZE_OFFSET_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CP_LDMA_SRC_BASE_LO_OFFSET_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CP_LDMA_SRC_BASE_LO_OFFSET_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CP_LDMA_SRC_BASE_LO_OFFSET_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CP_LDMA_SRC_BASE_LO_OFFSET_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CP_LDMA_SRC_BASE_LO_OFFSET_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CP_LDMA_DST_BASE_LO_OFFSET_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CP_LDMA_DST_BASE_LO_OFFSET_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CP_LDMA_DST_BASE_LO_OFFSET_2 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC0_QM1_CP_LDMA_DST_BASE_LO_OFFSET_3 & ~0xFFF) +
+			PROT_BITS_OFFS;
+	word_offset = ((mmNIC0_QM1_CP_LDMA_DST_BASE_LO_OFFSET_3 &
+			PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC0_QM1_CP_LDMA_DST_BASE_LO_OFFSET_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CP_LDMA_DST_BASE_LO_OFFSET_4 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC0_QM1_CP_STS_0 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC0_QM1_CP_STS_0 & PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC0_QM1_CP_STS_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CP_STS_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CP_STS_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CP_STS_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CP_STS_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CP_CURRENT_INST_LO_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CP_CURRENT_INST_LO_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CP_CURRENT_INST_LO_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CP_CURRENT_INST_LO_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CP_CURRENT_INST_LO_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CP_CURRENT_INST_HI_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CP_CURRENT_INST_HI_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CP_CURRENT_INST_HI_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CP_CURRENT_INST_HI_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CP_CURRENT_INST_HI_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CP_BARRIER_CFG_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CP_BARRIER_CFG_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CP_BARRIER_CFG_2 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC0_QM1_CP_BARRIER_CFG_3 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC0_QM1_CP_BARRIER_CFG_3 & PROT_BITS_OFFS)
+			>> 7) << 2;
+	mask = 1U << ((mmNIC0_QM1_CP_BARRIER_CFG_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CP_BARRIER_CFG_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CP_DBG_0_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CP_DBG_0_1 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC0_QM1_CP_DBG_0_2 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC0_QM1_CP_DBG_0_2 & PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC0_QM1_CP_DBG_0_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CP_DBG_0_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CP_DBG_0_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CP_ARUSER_31_11_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CP_ARUSER_31_11_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CP_ARUSER_31_11_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CP_ARUSER_31_11_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CP_ARUSER_31_11_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CP_AWUSER_31_11_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CP_AWUSER_31_11_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CP_AWUSER_31_11_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CP_AWUSER_31_11_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CP_AWUSER_31_11_4 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC0_QM1_ARB_CFG_0 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC0_QM1_ARB_CFG_0 & PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC0_QM1_ARB_CFG_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_ARB_MST_AVAIL_CRED_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_ARB_MST_AVAIL_CRED_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_ARB_MST_AVAIL_CRED_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_ARB_MST_AVAIL_CRED_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_ARB_MST_AVAIL_CRED_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_ARB_MST_AVAIL_CRED_5 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_ARB_MST_AVAIL_CRED_6 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_ARB_MST_AVAIL_CRED_7 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_ARB_MST_AVAIL_CRED_8 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_ARB_MST_AVAIL_CRED_9 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_ARB_MST_AVAIL_CRED_10 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_ARB_MST_AVAIL_CRED_11 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_ARB_MST_AVAIL_CRED_12 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_ARB_MST_AVAIL_CRED_13 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_ARB_MST_AVAIL_CRED_14 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_ARB_MST_AVAIL_CRED_15 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_ARB_MST_AVAIL_CRED_16 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_ARB_MST_AVAIL_CRED_17 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_ARB_MST_AVAIL_CRED_18 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_ARB_MST_AVAIL_CRED_19 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_ARB_MST_AVAIL_CRED_20 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_ARB_MST_AVAIL_CRED_21 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_ARB_MST_AVAIL_CRED_22 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_ARB_MST_AVAIL_CRED_23 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC0_QM1_ARB_MST_AVAIL_CRED_24 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC0_QM1_ARB_MST_AVAIL_CRED_24 &
+			PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC0_QM1_ARB_MST_AVAIL_CRED_24 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_ARB_MST_AVAIL_CRED_25 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_ARB_MST_AVAIL_CRED_26 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_ARB_MST_AVAIL_CRED_27 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_ARB_MST_AVAIL_CRED_28 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_ARB_MST_AVAIL_CRED_29 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_ARB_MST_AVAIL_CRED_30 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_ARB_MST_AVAIL_CRED_31 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC0_QM1_ARB_MST_CHOISE_PUSH_OFST_23 & ~0xFFF) +
+			PROT_BITS_OFFS;
+	word_offset = ((mmNIC0_QM1_ARB_MST_CHOISE_PUSH_OFST_23 &
+			PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC0_QM1_ARB_SLV_CHOISE_WDT & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_ARB_MSG_MAX_INFLIGHT & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_ARB_MSG_AWUSER_31_11 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_ARB_MSG_AWUSER_SEC_PROP & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_ARB_MSG_AWUSER_NON_SEC_PROP & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC0_QM1_ARB_STATE_STS & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC0_QM1_ARB_STATE_STS & PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC0_QM1_ARB_STATE_STS & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_ARB_CHOISE_FULLNESS_STS & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_ARB_MSG_STS & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_ARB_SLV_CHOISE_Q_HEAD & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_ARB_ERR_CAUSE & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_ARB_ERR_MSG_EN & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_ARB_ERR_STS_DRP & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_ARB_MST_CRED_STS_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_ARB_MST_CRED_STS_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_ARB_MST_CRED_STS_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_ARB_MST_CRED_STS_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_ARB_MST_CRED_STS_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_ARB_MST_CRED_STS_5 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_ARB_MST_CRED_STS_6 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_ARB_MST_CRED_STS_7 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_ARB_MST_CRED_STS_8 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_ARB_MST_CRED_STS_9 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_ARB_MST_CRED_STS_10 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_ARB_MST_CRED_STS_11 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_ARB_MST_CRED_STS_12 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_ARB_MST_CRED_STS_13 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_ARB_MST_CRED_STS_14 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_ARB_MST_CRED_STS_15 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_ARB_MST_CRED_STS_16 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_ARB_MST_CRED_STS_17 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_ARB_MST_CRED_STS_18 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_ARB_MST_CRED_STS_19 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC0_QM1_ARB_MST_CRED_STS_20 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC0_QM1_ARB_MST_CRED_STS_20 & PROT_BITS_OFFS)
+			>> 7) << 2;
+	mask = 1U << ((mmNIC0_QM1_ARB_MST_CRED_STS_20 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_ARB_MST_CRED_STS_21 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_ARB_MST_CRED_STS_22 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_ARB_MST_CRED_STS_23 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_ARB_MST_CRED_STS_24 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_ARB_MST_CRED_STS_25 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_ARB_MST_CRED_STS_26 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_ARB_MST_CRED_STS_27 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_ARB_MST_CRED_STS_28 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_ARB_MST_CRED_STS_29 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_ARB_MST_CRED_STS_30 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_ARB_MST_CRED_STS_31 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CGM_CFG & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CGM_STS & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CGM_CFG1 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC0_QM1_LOCAL_RANGE_BASE & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC0_QM1_LOCAL_RANGE_BASE & PROT_BITS_OFFS)
+			>> 7) << 2;
+	mask = 1U << ((mmNIC0_QM1_LOCAL_RANGE_BASE & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_LOCAL_RANGE_SIZE & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CSMR_STRICT_PRIO_CFG & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_HBW_RD_RATE_LIM_CFG_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_LBW_WR_RATE_LIM_CFG_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_LBW_WR_RATE_LIM_CFG_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_HBW_RD_RATE_LIM_CFG_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_GLBL_AXCACHE & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_IND_GW_APB_CFG & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_IND_GW_APB_WDATA & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_IND_GW_APB_RDATA & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_IND_GW_APB_STATUS & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_GLBL_ERR_ADDR_LO & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_GLBL_ERR_ADDR_HI & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_GLBL_ERR_WDATA & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC0_QM1_GLBL_MEM_INIT_BUSY & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC0_QM1_GLBL_MEM_INIT_BUSY & PROT_BITS_OFFS)
+			>> 7) << 2;
+	mask = 1U << ((mmNIC0_QM1_GLBL_MEM_INIT_BUSY & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	WREG32(mmNIC1_QM0_BASE - CFG_BASE + PROT_BITS_OFFS + 0x7C, 0);
+	WREG32(mmNIC1_QM1_BASE - CFG_BASE + PROT_BITS_OFFS + 0x7C, 0);
+
+	pb_addr = (mmNIC1_QM0_GLBL_CFG0 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC1_QM0_GLBL_CFG0 & PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC1_QM0_GLBL_CFG0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_GLBL_CFG1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_GLBL_PROT & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_GLBL_ERR_CFG & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_GLBL_SECURE_PROPS_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_GLBL_SECURE_PROPS_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_GLBL_SECURE_PROPS_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_GLBL_SECURE_PROPS_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_GLBL_SECURE_PROPS_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_GLBL_NON_SECURE_PROPS_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_GLBL_NON_SECURE_PROPS_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_GLBL_NON_SECURE_PROPS_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_GLBL_NON_SECURE_PROPS_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_GLBL_NON_SECURE_PROPS_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_GLBL_STS0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_GLBL_STS1_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_GLBL_STS1_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_GLBL_STS1_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_GLBL_STS1_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_GLBL_STS1_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_GLBL_MSG_EN_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_GLBL_MSG_EN_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_GLBL_MSG_EN_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_GLBL_MSG_EN_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_GLBL_MSG_EN_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_PQ_BASE_LO_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_PQ_BASE_LO_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_PQ_BASE_LO_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_PQ_BASE_LO_3 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC1_QM0_PQ_BASE_HI_0 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC1_QM0_PQ_BASE_HI_0 & PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC1_QM0_PQ_BASE_HI_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_PQ_BASE_HI_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_PQ_BASE_HI_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_PQ_BASE_HI_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_PQ_SIZE_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_PQ_SIZE_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_PQ_SIZE_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_PQ_SIZE_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_PQ_PI_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_PQ_PI_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_PQ_PI_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_PQ_PI_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_PQ_CI_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_PQ_CI_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_PQ_CI_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_PQ_CI_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_PQ_CFG0_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_PQ_CFG0_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_PQ_CFG0_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_PQ_CFG0_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_PQ_CFG1_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_PQ_CFG1_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_PQ_CFG1_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_PQ_CFG1_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_PQ_ARUSER_31_11_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_PQ_ARUSER_31_11_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_PQ_ARUSER_31_11_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_PQ_ARUSER_31_11_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_PQ_STS0_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_PQ_STS0_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_PQ_STS0_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_PQ_STS0_3 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC1_QM0_PQ_STS1_0 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC1_QM0_PQ_STS1_0 & PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC1_QM0_PQ_STS1_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_PQ_STS1_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_PQ_STS1_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_PQ_STS1_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CQ_STS0_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CQ_STS0_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CQ_STS0_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CQ_STS0_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CQ_STS1_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CQ_STS1_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CQ_STS1_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CQ_STS1_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CQ_PTR_LO_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CQ_PTR_HI_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CQ_TSIZE_0 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC1_QM0_CQ_CTL_0 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC1_QM0_CQ_CTL_0 & PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC1_QM0_CQ_CTL_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CQ_PTR_LO_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CQ_PTR_HI_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CQ_TSIZE_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CQ_CTL_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CQ_PTR_LO_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CQ_PTR_HI_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CQ_TSIZE_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CQ_CTL_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CQ_PTR_LO_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CQ_PTR_HI_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CQ_TSIZE_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CQ_CTL_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CQ_PTR_LO_STS_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CQ_PTR_LO_STS_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CQ_PTR_LO_STS_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CQ_PTR_LO_STS_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CQ_PTR_LO_STS_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CQ_PTR_HI_STS_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CQ_PTR_HI_STS_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CQ_PTR_HI_STS_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CQ_PTR_HI_STS_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CQ_PTR_HI_STS_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CQ_TSIZE_STS_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CQ_TSIZE_STS_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CQ_TSIZE_STS_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CQ_TSIZE_STS_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CQ_TSIZE_STS_4 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC1_QM0_CQ_CTL_STS_0 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC1_QM0_CQ_CTL_STS_0 & PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC1_QM0_CQ_CTL_STS_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CQ_CTL_STS_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CQ_CTL_STS_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CQ_CTL_STS_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CQ_CTL_STS_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CQ_IFIFO_CNT_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CQ_IFIFO_CNT_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CQ_IFIFO_CNT_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CQ_IFIFO_CNT_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CQ_IFIFO_CNT_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CP_MSG_BASE0_ADDR_LO_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CP_MSG_BASE0_ADDR_LO_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CP_MSG_BASE0_ADDR_LO_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CP_MSG_BASE0_ADDR_LO_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CP_MSG_BASE0_ADDR_LO_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CP_MSG_BASE0_ADDR_HI_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CP_MSG_BASE0_ADDR_HI_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CP_MSG_BASE0_ADDR_HI_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CP_MSG_BASE0_ADDR_HI_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CP_MSG_BASE0_ADDR_HI_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CP_MSG_BASE1_ADDR_LO_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CP_MSG_BASE1_ADDR_LO_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CP_MSG_BASE1_ADDR_LO_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CP_MSG_BASE1_ADDR_LO_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CP_MSG_BASE1_ADDR_LO_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CP_MSG_BASE1_ADDR_HI_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CP_MSG_BASE1_ADDR_HI_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CP_MSG_BASE1_ADDR_HI_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CP_MSG_BASE1_ADDR_HI_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CP_MSG_BASE1_ADDR_HI_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CP_MSG_BASE2_ADDR_LO_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CP_MSG_BASE2_ADDR_LO_1 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC1_QM0_CP_MSG_BASE2_ADDR_LO_2 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC1_QM0_CP_MSG_BASE2_ADDR_LO_2 &
+			PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC1_QM0_CP_MSG_BASE2_ADDR_LO_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CP_MSG_BASE2_ADDR_LO_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CP_MSG_BASE2_ADDR_LO_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CP_MSG_BASE2_ADDR_HI_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CP_MSG_BASE2_ADDR_HI_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CP_MSG_BASE2_ADDR_HI_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CP_MSG_BASE2_ADDR_HI_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CP_MSG_BASE2_ADDR_HI_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CP_MSG_BASE3_ADDR_LO_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CP_MSG_BASE3_ADDR_LO_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CP_MSG_BASE3_ADDR_LO_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CP_MSG_BASE3_ADDR_LO_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CP_MSG_BASE3_ADDR_LO_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CP_MSG_BASE3_ADDR_HI_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CP_MSG_BASE3_ADDR_HI_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CP_MSG_BASE3_ADDR_HI_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CP_MSG_BASE3_ADDR_HI_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CP_MSG_BASE3_ADDR_HI_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CP_LDMA_TSIZE_OFFSET_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CP_LDMA_TSIZE_OFFSET_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CP_LDMA_TSIZE_OFFSET_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CP_LDMA_TSIZE_OFFSET_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CP_LDMA_TSIZE_OFFSET_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CP_LDMA_SRC_BASE_LO_OFFSET_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CP_LDMA_SRC_BASE_LO_OFFSET_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CP_LDMA_SRC_BASE_LO_OFFSET_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CP_LDMA_SRC_BASE_LO_OFFSET_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CP_LDMA_SRC_BASE_LO_OFFSET_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CP_LDMA_DST_BASE_LO_OFFSET_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CP_LDMA_DST_BASE_LO_OFFSET_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CP_LDMA_DST_BASE_LO_OFFSET_2 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC1_QM0_CP_LDMA_DST_BASE_LO_OFFSET_3 & ~0xFFF) +
+			PROT_BITS_OFFS;
+	word_offset = ((mmNIC1_QM0_CP_LDMA_DST_BASE_LO_OFFSET_3 &
+			PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC1_QM0_CP_LDMA_DST_BASE_LO_OFFSET_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CP_LDMA_DST_BASE_LO_OFFSET_4 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC1_QM0_CP_STS_0 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC1_QM0_CP_STS_0 & PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC1_QM0_CP_STS_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CP_STS_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CP_STS_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CP_STS_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CP_STS_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CP_CURRENT_INST_LO_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CP_CURRENT_INST_LO_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CP_CURRENT_INST_LO_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CP_CURRENT_INST_LO_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CP_CURRENT_INST_LO_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CP_CURRENT_INST_HI_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CP_CURRENT_INST_HI_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CP_CURRENT_INST_HI_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CP_CURRENT_INST_HI_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CP_CURRENT_INST_HI_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CP_BARRIER_CFG_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CP_BARRIER_CFG_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CP_BARRIER_CFG_2 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC1_QM0_CP_BARRIER_CFG_3 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC1_QM0_CP_BARRIER_CFG_3 & PROT_BITS_OFFS)
+			>> 7) << 2;
+	mask = 1U << ((mmNIC1_QM0_CP_BARRIER_CFG_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CP_BARRIER_CFG_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CP_DBG_0_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CP_DBG_0_1 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC1_QM0_CP_DBG_0_2 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC1_QM0_CP_DBG_0_2 & PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC1_QM0_CP_DBG_0_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CP_DBG_0_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CP_DBG_0_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CP_ARUSER_31_11_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CP_ARUSER_31_11_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CP_ARUSER_31_11_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CP_ARUSER_31_11_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CP_ARUSER_31_11_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CP_AWUSER_31_11_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CP_AWUSER_31_11_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CP_AWUSER_31_11_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CP_AWUSER_31_11_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CP_AWUSER_31_11_4 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC1_QM0_ARB_CFG_0 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC1_QM0_ARB_CFG_0 & PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC1_QM0_ARB_CFG_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_ARB_MST_AVAIL_CRED_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_ARB_MST_AVAIL_CRED_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_ARB_MST_AVAIL_CRED_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_ARB_MST_AVAIL_CRED_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_ARB_MST_AVAIL_CRED_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_ARB_MST_AVAIL_CRED_5 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_ARB_MST_AVAIL_CRED_6 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_ARB_MST_AVAIL_CRED_7 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_ARB_MST_AVAIL_CRED_8 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_ARB_MST_AVAIL_CRED_9 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_ARB_MST_AVAIL_CRED_10 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_ARB_MST_AVAIL_CRED_11 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_ARB_MST_AVAIL_CRED_12 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_ARB_MST_AVAIL_CRED_13 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_ARB_MST_AVAIL_CRED_14 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_ARB_MST_AVAIL_CRED_15 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_ARB_MST_AVAIL_CRED_16 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_ARB_MST_AVAIL_CRED_17 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_ARB_MST_AVAIL_CRED_18 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_ARB_MST_AVAIL_CRED_19 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_ARB_MST_AVAIL_CRED_20 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_ARB_MST_AVAIL_CRED_21 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_ARB_MST_AVAIL_CRED_22 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_ARB_MST_AVAIL_CRED_23 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC1_QM0_ARB_MST_AVAIL_CRED_24 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC1_QM0_ARB_MST_AVAIL_CRED_24 &
+			PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC1_QM0_ARB_MST_AVAIL_CRED_24 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_ARB_MST_AVAIL_CRED_25 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_ARB_MST_AVAIL_CRED_26 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_ARB_MST_AVAIL_CRED_27 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_ARB_MST_AVAIL_CRED_28 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_ARB_MST_AVAIL_CRED_29 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_ARB_MST_AVAIL_CRED_30 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_ARB_MST_AVAIL_CRED_31 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC1_QM0_ARB_MST_CHOISE_PUSH_OFST_23 & ~0xFFF) +
+			PROT_BITS_OFFS;
+	word_offset = ((mmNIC1_QM0_ARB_MST_CHOISE_PUSH_OFST_23 &
+			PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC1_QM0_ARB_SLV_CHOISE_WDT & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_ARB_MSG_MAX_INFLIGHT & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_ARB_MSG_AWUSER_31_11 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_ARB_MSG_AWUSER_SEC_PROP & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_ARB_MSG_AWUSER_NON_SEC_PROP & 0x7F) >> 2);
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC1_QM0_ARB_STATE_STS & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC1_QM0_ARB_STATE_STS & PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC1_QM0_ARB_STATE_STS & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_ARB_CHOISE_FULLNESS_STS & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_ARB_MSG_STS & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_ARB_SLV_CHOISE_Q_HEAD & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_ARB_ERR_CAUSE & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_ARB_ERR_MSG_EN & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_ARB_ERR_STS_DRP & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_ARB_MST_CRED_STS_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_ARB_MST_CRED_STS_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_ARB_MST_CRED_STS_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_ARB_MST_CRED_STS_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_ARB_MST_CRED_STS_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_ARB_MST_CRED_STS_5 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_ARB_MST_CRED_STS_6 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_ARB_MST_CRED_STS_7 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_ARB_MST_CRED_STS_8 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_ARB_MST_CRED_STS_9 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_ARB_MST_CRED_STS_10 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_ARB_MST_CRED_STS_11 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_ARB_MST_CRED_STS_12 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_ARB_MST_CRED_STS_13 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_ARB_MST_CRED_STS_14 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_ARB_MST_CRED_STS_15 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_ARB_MST_CRED_STS_16 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_ARB_MST_CRED_STS_17 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_ARB_MST_CRED_STS_18 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_ARB_MST_CRED_STS_19 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC1_QM0_ARB_MST_CRED_STS_20 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC1_QM0_ARB_MST_CRED_STS_20 & PROT_BITS_OFFS)
+			>> 7) << 2;
+	mask = 1U << ((mmNIC1_QM0_ARB_MST_CRED_STS_20 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_ARB_MST_CRED_STS_21 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_ARB_MST_CRED_STS_22 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_ARB_MST_CRED_STS_23 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_ARB_MST_CRED_STS_24 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_ARB_MST_CRED_STS_25 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_ARB_MST_CRED_STS_26 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_ARB_MST_CRED_STS_27 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_ARB_MST_CRED_STS_28 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_ARB_MST_CRED_STS_29 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_ARB_MST_CRED_STS_30 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_ARB_MST_CRED_STS_31 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CGM_CFG & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CGM_STS & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CGM_CFG1 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC1_QM0_LOCAL_RANGE_BASE & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC1_QM0_LOCAL_RANGE_BASE & PROT_BITS_OFFS)
+			>> 7) << 2;
+	mask = 1U << ((mmNIC1_QM0_LOCAL_RANGE_BASE & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_LOCAL_RANGE_SIZE & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CSMR_STRICT_PRIO_CFG & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_HBW_RD_RATE_LIM_CFG_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_LBW_WR_RATE_LIM_CFG_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_LBW_WR_RATE_LIM_CFG_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_HBW_RD_RATE_LIM_CFG_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_GLBL_AXCACHE & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_IND_GW_APB_CFG & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_IND_GW_APB_WDATA & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_IND_GW_APB_RDATA & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_IND_GW_APB_STATUS & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_GLBL_ERR_ADDR_LO & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_GLBL_ERR_ADDR_HI & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_GLBL_ERR_WDATA & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC1_QM0_GLBL_MEM_INIT_BUSY & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC1_QM0_GLBL_MEM_INIT_BUSY & PROT_BITS_OFFS)
+			>> 7) << 2;
+	mask = 1U << ((mmNIC1_QM0_GLBL_MEM_INIT_BUSY & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC1_QM1_GLBL_CFG0 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC1_QM1_GLBL_CFG0 & PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC1_QM1_GLBL_CFG0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_GLBL_CFG1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_GLBL_PROT & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_GLBL_ERR_CFG & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_GLBL_SECURE_PROPS_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_GLBL_SECURE_PROPS_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_GLBL_SECURE_PROPS_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_GLBL_SECURE_PROPS_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_GLBL_SECURE_PROPS_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_GLBL_NON_SECURE_PROPS_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_GLBL_NON_SECURE_PROPS_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_GLBL_NON_SECURE_PROPS_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_GLBL_NON_SECURE_PROPS_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_GLBL_NON_SECURE_PROPS_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_GLBL_STS0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_GLBL_STS1_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_GLBL_STS1_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_GLBL_STS1_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_GLBL_STS1_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_GLBL_STS1_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_GLBL_MSG_EN_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_GLBL_MSG_EN_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_GLBL_MSG_EN_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_GLBL_MSG_EN_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_GLBL_MSG_EN_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_PQ_BASE_LO_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_PQ_BASE_LO_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_PQ_BASE_LO_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_PQ_BASE_LO_3 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC1_QM1_PQ_BASE_HI_0 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC1_QM1_PQ_BASE_HI_0 & PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC1_QM1_PQ_BASE_HI_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_PQ_BASE_HI_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_PQ_BASE_HI_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_PQ_BASE_HI_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_PQ_SIZE_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_PQ_SIZE_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_PQ_SIZE_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_PQ_SIZE_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_PQ_PI_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_PQ_PI_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_PQ_PI_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_PQ_PI_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_PQ_CI_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_PQ_CI_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_PQ_CI_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_PQ_CI_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_PQ_CFG0_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_PQ_CFG0_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_PQ_CFG0_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_PQ_CFG0_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_PQ_CFG1_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_PQ_CFG1_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_PQ_CFG1_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_PQ_CFG1_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_PQ_ARUSER_31_11_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_PQ_ARUSER_31_11_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_PQ_ARUSER_31_11_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_PQ_ARUSER_31_11_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_PQ_STS0_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_PQ_STS0_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_PQ_STS0_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_PQ_STS0_3 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC1_QM1_PQ_STS1_0 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC1_QM1_PQ_STS1_0 & PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC1_QM1_PQ_STS1_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_PQ_STS1_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_PQ_STS1_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_PQ_STS1_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CQ_STS0_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CQ_STS0_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CQ_STS0_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CQ_STS0_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CQ_STS1_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CQ_STS1_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CQ_STS1_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CQ_STS1_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CQ_PTR_LO_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CQ_PTR_HI_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CQ_TSIZE_0 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC1_QM1_CQ_CTL_0 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC1_QM1_CQ_CTL_0 & PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC1_QM1_CQ_CTL_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CQ_PTR_LO_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CQ_PTR_HI_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CQ_TSIZE_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CQ_CTL_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CQ_PTR_LO_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CQ_PTR_HI_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CQ_TSIZE_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CQ_CTL_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CQ_PTR_LO_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CQ_PTR_HI_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CQ_TSIZE_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CQ_CTL_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CQ_PTR_LO_STS_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CQ_PTR_LO_STS_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CQ_PTR_LO_STS_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CQ_PTR_LO_STS_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CQ_PTR_LO_STS_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CQ_PTR_HI_STS_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CQ_PTR_HI_STS_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CQ_PTR_HI_STS_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CQ_PTR_HI_STS_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CQ_PTR_HI_STS_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CQ_TSIZE_STS_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CQ_TSIZE_STS_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CQ_TSIZE_STS_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CQ_TSIZE_STS_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CQ_TSIZE_STS_4 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC1_QM1_CQ_CTL_STS_0 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC1_QM1_CQ_CTL_STS_0 & PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC1_QM1_CQ_CTL_STS_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CQ_CTL_STS_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CQ_CTL_STS_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CQ_CTL_STS_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CQ_CTL_STS_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CQ_IFIFO_CNT_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CQ_IFIFO_CNT_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CQ_IFIFO_CNT_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CQ_IFIFO_CNT_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CQ_IFIFO_CNT_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CP_MSG_BASE0_ADDR_LO_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CP_MSG_BASE0_ADDR_LO_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CP_MSG_BASE0_ADDR_LO_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CP_MSG_BASE0_ADDR_LO_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CP_MSG_BASE0_ADDR_LO_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CP_MSG_BASE0_ADDR_HI_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CP_MSG_BASE0_ADDR_HI_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CP_MSG_BASE0_ADDR_HI_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CP_MSG_BASE0_ADDR_HI_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CP_MSG_BASE0_ADDR_HI_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CP_MSG_BASE1_ADDR_LO_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CP_MSG_BASE1_ADDR_LO_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CP_MSG_BASE1_ADDR_LO_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CP_MSG_BASE1_ADDR_LO_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CP_MSG_BASE1_ADDR_LO_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CP_MSG_BASE1_ADDR_HI_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CP_MSG_BASE1_ADDR_HI_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CP_MSG_BASE1_ADDR_HI_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CP_MSG_BASE1_ADDR_HI_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CP_MSG_BASE1_ADDR_HI_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CP_MSG_BASE2_ADDR_LO_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CP_MSG_BASE2_ADDR_LO_1 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC1_QM1_CP_MSG_BASE2_ADDR_LO_2 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC1_QM1_CP_MSG_BASE2_ADDR_LO_2 &
+			PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC1_QM1_CP_MSG_BASE2_ADDR_LO_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CP_MSG_BASE2_ADDR_LO_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CP_MSG_BASE2_ADDR_LO_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CP_MSG_BASE2_ADDR_HI_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CP_MSG_BASE2_ADDR_HI_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CP_MSG_BASE2_ADDR_HI_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CP_MSG_BASE2_ADDR_HI_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CP_MSG_BASE2_ADDR_HI_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CP_MSG_BASE3_ADDR_LO_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CP_MSG_BASE3_ADDR_LO_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CP_MSG_BASE3_ADDR_LO_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CP_MSG_BASE3_ADDR_LO_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CP_MSG_BASE3_ADDR_LO_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CP_MSG_BASE3_ADDR_HI_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CP_MSG_BASE3_ADDR_HI_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CP_MSG_BASE3_ADDR_HI_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CP_MSG_BASE3_ADDR_HI_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CP_MSG_BASE3_ADDR_HI_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CP_LDMA_TSIZE_OFFSET_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CP_LDMA_TSIZE_OFFSET_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CP_LDMA_TSIZE_OFFSET_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CP_LDMA_TSIZE_OFFSET_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CP_LDMA_TSIZE_OFFSET_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CP_LDMA_SRC_BASE_LO_OFFSET_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CP_LDMA_SRC_BASE_LO_OFFSET_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CP_LDMA_SRC_BASE_LO_OFFSET_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CP_LDMA_SRC_BASE_LO_OFFSET_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CP_LDMA_SRC_BASE_LO_OFFSET_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CP_LDMA_DST_BASE_LO_OFFSET_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CP_LDMA_DST_BASE_LO_OFFSET_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CP_LDMA_DST_BASE_LO_OFFSET_2 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC1_QM1_CP_LDMA_DST_BASE_LO_OFFSET_3 & ~0xFFF) +
+			PROT_BITS_OFFS;
+	word_offset = ((mmNIC1_QM1_CP_LDMA_DST_BASE_LO_OFFSET_3 &
+			PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC1_QM1_CP_LDMA_DST_BASE_LO_OFFSET_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CP_LDMA_DST_BASE_LO_OFFSET_4 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC1_QM1_CP_STS_0 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC1_QM1_CP_STS_0 & PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC1_QM1_CP_STS_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CP_STS_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CP_STS_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CP_STS_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CP_STS_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CP_CURRENT_INST_LO_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CP_CURRENT_INST_LO_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CP_CURRENT_INST_LO_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CP_CURRENT_INST_LO_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CP_CURRENT_INST_LO_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CP_CURRENT_INST_HI_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CP_CURRENT_INST_HI_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CP_CURRENT_INST_HI_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CP_CURRENT_INST_HI_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CP_CURRENT_INST_HI_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CP_BARRIER_CFG_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CP_BARRIER_CFG_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CP_BARRIER_CFG_2 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC1_QM1_CP_BARRIER_CFG_3 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC1_QM1_CP_BARRIER_CFG_3 & PROT_BITS_OFFS)
+			>> 7) << 2;
+	mask = 1U << ((mmNIC1_QM1_CP_BARRIER_CFG_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CP_BARRIER_CFG_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CP_DBG_0_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CP_DBG_0_1 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC1_QM1_CP_DBG_0_2 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC1_QM1_CP_DBG_0_2 & PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC1_QM1_CP_DBG_0_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CP_DBG_0_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CP_DBG_0_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CP_ARUSER_31_11_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CP_ARUSER_31_11_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CP_ARUSER_31_11_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CP_ARUSER_31_11_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CP_ARUSER_31_11_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CP_AWUSER_31_11_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CP_AWUSER_31_11_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CP_AWUSER_31_11_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CP_AWUSER_31_11_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CP_AWUSER_31_11_4 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC1_QM1_ARB_CFG_0 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC1_QM1_ARB_CFG_0 & PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC1_QM1_ARB_CFG_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_ARB_MST_AVAIL_CRED_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_ARB_MST_AVAIL_CRED_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_ARB_MST_AVAIL_CRED_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_ARB_MST_AVAIL_CRED_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_ARB_MST_AVAIL_CRED_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_ARB_MST_AVAIL_CRED_5 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_ARB_MST_AVAIL_CRED_6 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_ARB_MST_AVAIL_CRED_7 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_ARB_MST_AVAIL_CRED_8 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_ARB_MST_AVAIL_CRED_9 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_ARB_MST_AVAIL_CRED_10 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_ARB_MST_AVAIL_CRED_11 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_ARB_MST_AVAIL_CRED_12 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_ARB_MST_AVAIL_CRED_13 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_ARB_MST_AVAIL_CRED_14 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_ARB_MST_AVAIL_CRED_15 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_ARB_MST_AVAIL_CRED_16 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_ARB_MST_AVAIL_CRED_17 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_ARB_MST_AVAIL_CRED_18 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_ARB_MST_AVAIL_CRED_19 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_ARB_MST_AVAIL_CRED_20 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_ARB_MST_AVAIL_CRED_21 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_ARB_MST_AVAIL_CRED_22 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_ARB_MST_AVAIL_CRED_23 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC1_QM1_ARB_MST_AVAIL_CRED_24 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC1_QM1_ARB_MST_AVAIL_CRED_24 &
+			PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC1_QM1_ARB_MST_AVAIL_CRED_24 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_ARB_MST_AVAIL_CRED_25 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_ARB_MST_AVAIL_CRED_26 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_ARB_MST_AVAIL_CRED_27 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_ARB_MST_AVAIL_CRED_28 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_ARB_MST_AVAIL_CRED_29 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_ARB_MST_AVAIL_CRED_30 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_ARB_MST_AVAIL_CRED_31 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC1_QM1_ARB_MST_CHOISE_PUSH_OFST_23 & ~0xFFF) +
+			PROT_BITS_OFFS;
+	word_offset = ((mmNIC1_QM1_ARB_MST_CHOISE_PUSH_OFST_23 &
+			PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC1_QM1_ARB_SLV_CHOISE_WDT & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_ARB_MSG_MAX_INFLIGHT & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_ARB_MSG_AWUSER_31_11 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_ARB_MSG_AWUSER_SEC_PROP & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_ARB_MSG_AWUSER_NON_SEC_PROP & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC1_QM1_ARB_STATE_STS & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC1_QM1_ARB_STATE_STS & PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC1_QM1_ARB_STATE_STS & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_ARB_CHOISE_FULLNESS_STS & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_ARB_MSG_STS & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_ARB_SLV_CHOISE_Q_HEAD & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_ARB_ERR_CAUSE & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_ARB_ERR_MSG_EN & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_ARB_ERR_STS_DRP & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_ARB_MST_CRED_STS_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_ARB_MST_CRED_STS_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_ARB_MST_CRED_STS_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_ARB_MST_CRED_STS_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_ARB_MST_CRED_STS_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_ARB_MST_CRED_STS_5 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_ARB_MST_CRED_STS_6 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_ARB_MST_CRED_STS_7 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_ARB_MST_CRED_STS_8 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_ARB_MST_CRED_STS_9 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_ARB_MST_CRED_STS_10 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_ARB_MST_CRED_STS_11 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_ARB_MST_CRED_STS_12 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_ARB_MST_CRED_STS_13 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_ARB_MST_CRED_STS_14 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_ARB_MST_CRED_STS_15 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_ARB_MST_CRED_STS_16 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_ARB_MST_CRED_STS_17 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_ARB_MST_CRED_STS_18 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_ARB_MST_CRED_STS_19 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC1_QM1_ARB_MST_CRED_STS_20 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC1_QM1_ARB_MST_CRED_STS_20 & PROT_BITS_OFFS)
+			>> 7) << 2;
+	mask = 1U << ((mmNIC1_QM1_ARB_MST_CRED_STS_20 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_ARB_MST_CRED_STS_21 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_ARB_MST_CRED_STS_22 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_ARB_MST_CRED_STS_23 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_ARB_MST_CRED_STS_24 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_ARB_MST_CRED_STS_25 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_ARB_MST_CRED_STS_26 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_ARB_MST_CRED_STS_27 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_ARB_MST_CRED_STS_28 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_ARB_MST_CRED_STS_29 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_ARB_MST_CRED_STS_30 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_ARB_MST_CRED_STS_31 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CGM_CFG & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CGM_STS & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CGM_CFG1 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC1_QM1_LOCAL_RANGE_BASE & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC1_QM1_LOCAL_RANGE_BASE & PROT_BITS_OFFS)
+			>> 7) << 2;
+	mask = 1U << ((mmNIC1_QM1_LOCAL_RANGE_BASE & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_LOCAL_RANGE_SIZE & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CSMR_STRICT_PRIO_CFG & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_HBW_RD_RATE_LIM_CFG_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_LBW_WR_RATE_LIM_CFG_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_LBW_WR_RATE_LIM_CFG_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_HBW_RD_RATE_LIM_CFG_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_GLBL_AXCACHE & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_IND_GW_APB_CFG & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_IND_GW_APB_WDATA & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_IND_GW_APB_RDATA & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_IND_GW_APB_STATUS & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_GLBL_ERR_ADDR_LO & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_GLBL_ERR_ADDR_HI & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_GLBL_ERR_WDATA & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC1_QM1_GLBL_MEM_INIT_BUSY & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC1_QM1_GLBL_MEM_INIT_BUSY & PROT_BITS_OFFS)
+			>> 7) << 2;
+	mask = 1U << ((mmNIC1_QM1_GLBL_MEM_INIT_BUSY & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
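+	/*
+	 * Reader's note, not part of the original patch: a sketch of the
+	 * pattern used by each block here, assuming PROT_BITS_OFFS is
+	 * 0xF80 (its definition is not visible in this hunk).
+	 *
+	 *   pb_addr     = (reg & ~0xFFF) + PROT_BITS_OFFS
+	 *                 start of the protection-bit words of the 4 KB
+	 *                 register page that contains reg
+	 *   word_offset = ((reg & 0xF80) >> 7) << 2
+	 *                 byte offset of the 32-bit word that covers
+	 *                 reg's 128-byte group (group index 0..31)
+	 *   mask bit    = (reg & 0x7F) >> 2
+	 *                 dword index of reg inside that group (0..31)
+	 *
+	 * WREG32(pb_addr + word_offset, ~mask) clears the bits of the
+	 * listed registers and leaves the other bits of that word set.
+	 * Under the same assumption, offset 0x7C in the two writes below
+	 * is word 31 (31 << 2), which covers page offsets 0xF80..0xFFF.
+	 */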
+	WREG32(mmNIC2_QM0_BASE - CFG_BASE + PROT_BITS_OFFS + 0x7C, 0);
+	WREG32(mmNIC2_QM1_BASE - CFG_BASE + PROT_BITS_OFFS + 0x7C, 0);
+
+	pb_addr = (mmNIC2_QM0_GLBL_CFG0 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC2_QM0_GLBL_CFG0 & PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC2_QM0_GLBL_CFG0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_GLBL_CFG1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_GLBL_PROT & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_GLBL_ERR_CFG & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_GLBL_SECURE_PROPS_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_GLBL_SECURE_PROPS_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_GLBL_SECURE_PROPS_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_GLBL_SECURE_PROPS_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_GLBL_SECURE_PROPS_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_GLBL_NON_SECURE_PROPS_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_GLBL_NON_SECURE_PROPS_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_GLBL_NON_SECURE_PROPS_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_GLBL_NON_SECURE_PROPS_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_GLBL_NON_SECURE_PROPS_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_GLBL_STS0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_GLBL_STS1_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_GLBL_STS1_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_GLBL_STS1_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_GLBL_STS1_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_GLBL_STS1_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_GLBL_MSG_EN_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_GLBL_MSG_EN_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_GLBL_MSG_EN_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_GLBL_MSG_EN_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_GLBL_MSG_EN_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_PQ_BASE_LO_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_PQ_BASE_LO_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_PQ_BASE_LO_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_PQ_BASE_LO_3 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC2_QM0_PQ_BASE_HI_0 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC2_QM0_PQ_BASE_HI_0 & PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC2_QM0_PQ_BASE_HI_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_PQ_BASE_HI_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_PQ_BASE_HI_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_PQ_BASE_HI_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_PQ_SIZE_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_PQ_SIZE_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_PQ_SIZE_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_PQ_SIZE_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_PQ_PI_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_PQ_PI_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_PQ_PI_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_PQ_PI_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_PQ_CI_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_PQ_CI_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_PQ_CI_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_PQ_CI_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_PQ_CFG0_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_PQ_CFG0_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_PQ_CFG0_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_PQ_CFG0_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_PQ_CFG1_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_PQ_CFG1_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_PQ_CFG1_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_PQ_CFG1_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_PQ_ARUSER_31_11_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_PQ_ARUSER_31_11_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_PQ_ARUSER_31_11_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_PQ_ARUSER_31_11_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_PQ_STS0_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_PQ_STS0_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_PQ_STS0_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_PQ_STS0_3 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC2_QM0_PQ_STS1_0 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC2_QM0_PQ_STS1_0 & PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC2_QM0_PQ_STS1_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_PQ_STS1_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_PQ_STS1_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_PQ_STS1_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CQ_STS0_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CQ_STS0_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CQ_STS0_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CQ_STS0_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CQ_STS1_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CQ_STS1_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CQ_STS1_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CQ_STS1_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CQ_PTR_LO_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CQ_PTR_HI_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CQ_TSIZE_0 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC2_QM0_CQ_CTL_0 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC2_QM0_CQ_CTL_0 & PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC2_QM0_CQ_CTL_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CQ_PTR_LO_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CQ_PTR_HI_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CQ_TSIZE_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CQ_CTL_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CQ_PTR_LO_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CQ_PTR_HI_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CQ_TSIZE_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CQ_CTL_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CQ_PTR_LO_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CQ_PTR_HI_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CQ_TSIZE_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CQ_CTL_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CQ_PTR_LO_STS_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CQ_PTR_LO_STS_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CQ_PTR_LO_STS_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CQ_PTR_LO_STS_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CQ_PTR_LO_STS_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CQ_PTR_HI_STS_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CQ_PTR_HI_STS_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CQ_PTR_HI_STS_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CQ_PTR_HI_STS_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CQ_PTR_HI_STS_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CQ_TSIZE_STS_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CQ_TSIZE_STS_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CQ_TSIZE_STS_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CQ_TSIZE_STS_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CQ_TSIZE_STS_4 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC2_QM0_CQ_CTL_STS_0 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC2_QM0_CQ_CTL_STS_0 & PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC2_QM0_CQ_CTL_STS_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CQ_CTL_STS_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CQ_CTL_STS_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CQ_CTL_STS_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CQ_CTL_STS_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CQ_IFIFO_CNT_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CQ_IFIFO_CNT_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CQ_IFIFO_CNT_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CQ_IFIFO_CNT_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CQ_IFIFO_CNT_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CP_MSG_BASE0_ADDR_LO_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CP_MSG_BASE0_ADDR_LO_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CP_MSG_BASE0_ADDR_LO_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CP_MSG_BASE0_ADDR_LO_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CP_MSG_BASE0_ADDR_LO_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CP_MSG_BASE0_ADDR_HI_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CP_MSG_BASE0_ADDR_HI_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CP_MSG_BASE0_ADDR_HI_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CP_MSG_BASE0_ADDR_HI_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CP_MSG_BASE0_ADDR_HI_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CP_MSG_BASE1_ADDR_LO_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CP_MSG_BASE1_ADDR_LO_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CP_MSG_BASE1_ADDR_LO_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CP_MSG_BASE1_ADDR_LO_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CP_MSG_BASE1_ADDR_LO_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CP_MSG_BASE1_ADDR_HI_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CP_MSG_BASE1_ADDR_HI_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CP_MSG_BASE1_ADDR_HI_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CP_MSG_BASE1_ADDR_HI_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CP_MSG_BASE1_ADDR_HI_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CP_MSG_BASE2_ADDR_LO_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CP_MSG_BASE2_ADDR_LO_1 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC2_QM0_CP_MSG_BASE2_ADDR_LO_2 & ~0xFFF) +
+			PROT_BITS_OFFS;
+	word_offset = ((mmNIC2_QM0_CP_MSG_BASE2_ADDR_LO_2 & PROT_BITS_OFFS)
+				>> 7) << 2;
+	mask = 1U << ((mmNIC2_QM0_CP_MSG_BASE2_ADDR_LO_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CP_MSG_BASE2_ADDR_LO_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CP_MSG_BASE2_ADDR_LO_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CP_MSG_BASE2_ADDR_HI_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CP_MSG_BASE2_ADDR_HI_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CP_MSG_BASE2_ADDR_HI_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CP_MSG_BASE2_ADDR_HI_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CP_MSG_BASE2_ADDR_HI_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CP_MSG_BASE3_ADDR_LO_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CP_MSG_BASE3_ADDR_LO_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CP_MSG_BASE3_ADDR_LO_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CP_MSG_BASE3_ADDR_LO_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CP_MSG_BASE3_ADDR_LO_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CP_MSG_BASE3_ADDR_HI_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CP_MSG_BASE3_ADDR_HI_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CP_MSG_BASE3_ADDR_HI_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CP_MSG_BASE3_ADDR_HI_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CP_MSG_BASE3_ADDR_HI_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CP_LDMA_TSIZE_OFFSET_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CP_LDMA_TSIZE_OFFSET_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CP_LDMA_TSIZE_OFFSET_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CP_LDMA_TSIZE_OFFSET_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CP_LDMA_TSIZE_OFFSET_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CP_LDMA_SRC_BASE_LO_OFFSET_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CP_LDMA_SRC_BASE_LO_OFFSET_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CP_LDMA_SRC_BASE_LO_OFFSET_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CP_LDMA_SRC_BASE_LO_OFFSET_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CP_LDMA_SRC_BASE_LO_OFFSET_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CP_LDMA_DST_BASE_LO_OFFSET_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CP_LDMA_DST_BASE_LO_OFFSET_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CP_LDMA_DST_BASE_LO_OFFSET_2 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC2_QM0_CP_LDMA_DST_BASE_LO_OFFSET_3 & ~0xFFF) +
+			PROT_BITS_OFFS;
+	word_offset = ((mmNIC2_QM0_CP_LDMA_DST_BASE_LO_OFFSET_3 &
+			PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC2_QM0_CP_LDMA_DST_BASE_LO_OFFSET_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CP_LDMA_DST_BASE_LO_OFFSET_4 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC2_QM0_CP_STS_0 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC2_QM0_CP_STS_0 & PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC2_QM0_CP_STS_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CP_STS_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CP_STS_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CP_STS_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CP_STS_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CP_CURRENT_INST_LO_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CP_CURRENT_INST_LO_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CP_CURRENT_INST_LO_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CP_CURRENT_INST_LO_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CP_CURRENT_INST_LO_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CP_CURRENT_INST_HI_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CP_CURRENT_INST_HI_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CP_CURRENT_INST_HI_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CP_CURRENT_INST_HI_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CP_CURRENT_INST_HI_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CP_BARRIER_CFG_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CP_BARRIER_CFG_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CP_BARRIER_CFG_2 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC2_QM0_CP_BARRIER_CFG_3 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC2_QM0_CP_BARRIER_CFG_3 & PROT_BITS_OFFS)
+			>> 7) << 2;
+	mask = 1U << ((mmNIC2_QM0_CP_BARRIER_CFG_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CP_BARRIER_CFG_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CP_DBG_0_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CP_DBG_0_1 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC2_QM0_CP_DBG_0_2 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC2_QM0_CP_DBG_0_2 & PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC2_QM0_CP_DBG_0_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CP_DBG_0_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CP_DBG_0_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CP_ARUSER_31_11_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CP_ARUSER_31_11_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CP_ARUSER_31_11_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CP_ARUSER_31_11_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CP_ARUSER_31_11_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CP_AWUSER_31_11_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CP_AWUSER_31_11_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CP_AWUSER_31_11_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CP_AWUSER_31_11_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CP_AWUSER_31_11_4 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC2_QM0_ARB_CFG_0 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC2_QM0_ARB_CFG_0 & PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC2_QM0_ARB_CFG_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_ARB_MST_AVAIL_CRED_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_ARB_MST_AVAIL_CRED_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_ARB_MST_AVAIL_CRED_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_ARB_MST_AVAIL_CRED_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_ARB_MST_AVAIL_CRED_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_ARB_MST_AVAIL_CRED_5 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_ARB_MST_AVAIL_CRED_6 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_ARB_MST_AVAIL_CRED_7 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_ARB_MST_AVAIL_CRED_8 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_ARB_MST_AVAIL_CRED_9 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_ARB_MST_AVAIL_CRED_10 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_ARB_MST_AVAIL_CRED_11 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_ARB_MST_AVAIL_CRED_12 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_ARB_MST_AVAIL_CRED_13 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_ARB_MST_AVAIL_CRED_14 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_ARB_MST_AVAIL_CRED_15 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_ARB_MST_AVAIL_CRED_16 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_ARB_MST_AVAIL_CRED_17 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_ARB_MST_AVAIL_CRED_18 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_ARB_MST_AVAIL_CRED_19 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_ARB_MST_AVAIL_CRED_20 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_ARB_MST_AVAIL_CRED_21 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_ARB_MST_AVAIL_CRED_22 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_ARB_MST_AVAIL_CRED_23 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC2_QM0_ARB_MST_AVAIL_CRED_24 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC2_QM0_ARB_MST_AVAIL_CRED_24 &
+			PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC2_QM0_ARB_MST_AVAIL_CRED_24 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_ARB_MST_AVAIL_CRED_25 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_ARB_MST_AVAIL_CRED_26 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_ARB_MST_AVAIL_CRED_27 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_ARB_MST_AVAIL_CRED_28 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_ARB_MST_AVAIL_CRED_29 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_ARB_MST_AVAIL_CRED_30 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_ARB_MST_AVAIL_CRED_31 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC2_QM0_ARB_MST_CHOISE_PUSH_OFST_23 & ~0xFFF) +
+			PROT_BITS_OFFS;
+	word_offset = ((mmNIC2_QM0_ARB_MST_CHOISE_PUSH_OFST_23 &
+			PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC2_QM0_ARB_SLV_CHOISE_WDT & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_ARB_MSG_MAX_INFLIGHT & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_ARB_MSG_AWUSER_31_11 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_ARB_MSG_AWUSER_SEC_PROP & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_ARB_MSG_AWUSER_NON_SEC_PROP & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC2_QM0_ARB_STATE_STS & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC2_QM0_ARB_STATE_STS & PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC2_QM0_ARB_STATE_STS & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_ARB_CHOISE_FULLNESS_STS & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_ARB_MSG_STS & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_ARB_SLV_CHOISE_Q_HEAD & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_ARB_ERR_CAUSE & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_ARB_ERR_MSG_EN & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_ARB_ERR_STS_DRP & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_ARB_MST_CRED_STS_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_ARB_MST_CRED_STS_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_ARB_MST_CRED_STS_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_ARB_MST_CRED_STS_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_ARB_MST_CRED_STS_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_ARB_MST_CRED_STS_5 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_ARB_MST_CRED_STS_6 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_ARB_MST_CRED_STS_7 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_ARB_MST_CRED_STS_8 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_ARB_MST_CRED_STS_9 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_ARB_MST_CRED_STS_10 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_ARB_MST_CRED_STS_11 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_ARB_MST_CRED_STS_12 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_ARB_MST_CRED_STS_13 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_ARB_MST_CRED_STS_14 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_ARB_MST_CRED_STS_15 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_ARB_MST_CRED_STS_16 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_ARB_MST_CRED_STS_17 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_ARB_MST_CRED_STS_18 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_ARB_MST_CRED_STS_19 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC2_QM0_ARB_MST_CRED_STS_20 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC2_QM0_ARB_MST_CRED_STS_20 & PROT_BITS_OFFS)
+			>> 7) << 2;
+	mask = 1U << ((mmNIC2_QM0_ARB_MST_CRED_STS_20 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_ARB_MST_CRED_STS_21 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_ARB_MST_CRED_STS_22 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_ARB_MST_CRED_STS_23 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_ARB_MST_CRED_STS_24 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_ARB_MST_CRED_STS_25 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_ARB_MST_CRED_STS_26 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_ARB_MST_CRED_STS_27 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_ARB_MST_CRED_STS_28 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_ARB_MST_CRED_STS_29 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_ARB_MST_CRED_STS_30 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_ARB_MST_CRED_STS_31 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CGM_CFG & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CGM_STS & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CGM_CFG1 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC2_QM0_LOCAL_RANGE_BASE & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC2_QM0_LOCAL_RANGE_BASE & PROT_BITS_OFFS)
+			>> 7) << 2;
+	mask = 1U << ((mmNIC2_QM0_LOCAL_RANGE_BASE & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_LOCAL_RANGE_SIZE & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CSMR_STRICT_PRIO_CFG & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_HBW_RD_RATE_LIM_CFG_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_LBW_WR_RATE_LIM_CFG_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_LBW_WR_RATE_LIM_CFG_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_HBW_RD_RATE_LIM_CFG_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_GLBL_AXCACHE & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_IND_GW_APB_CFG & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_IND_GW_APB_WDATA & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_IND_GW_APB_RDATA & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_IND_GW_APB_STATUS & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_GLBL_ERR_ADDR_LO & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_GLBL_ERR_ADDR_HI & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_GLBL_ERR_WDATA & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC2_QM0_GLBL_MEM_INIT_BUSY & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC2_QM0_GLBL_MEM_INIT_BUSY & PROT_BITS_OFFS)
+			>> 7) << 2;
+	mask = 1U << ((mmNIC2_QM0_GLBL_MEM_INIT_BUSY & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC2_QM1_GLBL_CFG0 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC2_QM1_GLBL_CFG0 & PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC2_QM1_GLBL_CFG0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_GLBL_CFG1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_GLBL_PROT & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_GLBL_ERR_CFG & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_GLBL_SECURE_PROPS_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_GLBL_SECURE_PROPS_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_GLBL_SECURE_PROPS_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_GLBL_SECURE_PROPS_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_GLBL_SECURE_PROPS_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_GLBL_NON_SECURE_PROPS_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_GLBL_NON_SECURE_PROPS_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_GLBL_NON_SECURE_PROPS_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_GLBL_NON_SECURE_PROPS_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_GLBL_NON_SECURE_PROPS_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_GLBL_STS0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_GLBL_STS1_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_GLBL_STS1_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_GLBL_STS1_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_GLBL_STS1_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_GLBL_STS1_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_GLBL_MSG_EN_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_GLBL_MSG_EN_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_GLBL_MSG_EN_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_GLBL_MSG_EN_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_GLBL_MSG_EN_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_PQ_BASE_LO_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_PQ_BASE_LO_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_PQ_BASE_LO_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_PQ_BASE_LO_3 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC2_QM1_PQ_BASE_HI_0 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC2_QM1_PQ_BASE_HI_0 & PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC2_QM1_PQ_BASE_HI_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_PQ_BASE_HI_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_PQ_BASE_HI_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_PQ_BASE_HI_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_PQ_SIZE_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_PQ_SIZE_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_PQ_SIZE_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_PQ_SIZE_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_PQ_PI_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_PQ_PI_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_PQ_PI_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_PQ_PI_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_PQ_CI_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_PQ_CI_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_PQ_CI_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_PQ_CI_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_PQ_CFG0_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_PQ_CFG0_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_PQ_CFG0_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_PQ_CFG0_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_PQ_CFG1_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_PQ_CFG1_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_PQ_CFG1_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_PQ_CFG1_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_PQ_ARUSER_31_11_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_PQ_ARUSER_31_11_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_PQ_ARUSER_31_11_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_PQ_ARUSER_31_11_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_PQ_STS0_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_PQ_STS0_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_PQ_STS0_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_PQ_STS0_3 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC2_QM1_PQ_STS1_0 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC2_QM1_PQ_STS1_0 & PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC2_QM1_PQ_STS1_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_PQ_STS1_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_PQ_STS1_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_PQ_STS1_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CQ_STS0_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CQ_STS0_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CQ_STS0_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CQ_STS0_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CQ_STS1_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CQ_STS1_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CQ_STS1_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CQ_STS1_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CQ_PTR_LO_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CQ_PTR_HI_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CQ_TSIZE_0 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC2_QM1_CQ_CTL_0 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC2_QM1_CQ_CTL_0 & PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC2_QM1_CQ_CTL_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CQ_PTR_LO_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CQ_PTR_HI_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CQ_TSIZE_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CQ_CTL_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CQ_PTR_LO_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CQ_PTR_HI_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CQ_TSIZE_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CQ_CTL_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CQ_PTR_LO_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CQ_PTR_HI_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CQ_TSIZE_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CQ_CTL_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CQ_PTR_LO_STS_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CQ_PTR_LO_STS_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CQ_PTR_LO_STS_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CQ_PTR_LO_STS_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CQ_PTR_LO_STS_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CQ_PTR_HI_STS_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CQ_PTR_HI_STS_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CQ_PTR_HI_STS_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CQ_PTR_HI_STS_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CQ_PTR_HI_STS_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CQ_TSIZE_STS_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CQ_TSIZE_STS_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CQ_TSIZE_STS_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CQ_TSIZE_STS_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CQ_TSIZE_STS_4 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC2_QM1_CQ_CTL_STS_0 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC2_QM1_CQ_CTL_STS_0 & PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC2_QM1_CQ_CTL_STS_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CQ_CTL_STS_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CQ_CTL_STS_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CQ_CTL_STS_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CQ_CTL_STS_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CQ_IFIFO_CNT_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CQ_IFIFO_CNT_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CQ_IFIFO_CNT_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CQ_IFIFO_CNT_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CQ_IFIFO_CNT_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CP_MSG_BASE0_ADDR_LO_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CP_MSG_BASE0_ADDR_LO_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CP_MSG_BASE0_ADDR_LO_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CP_MSG_BASE0_ADDR_LO_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CP_MSG_BASE0_ADDR_LO_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CP_MSG_BASE0_ADDR_HI_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CP_MSG_BASE0_ADDR_HI_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CP_MSG_BASE0_ADDR_HI_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CP_MSG_BASE0_ADDR_HI_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CP_MSG_BASE0_ADDR_HI_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CP_MSG_BASE1_ADDR_LO_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CP_MSG_BASE1_ADDR_LO_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CP_MSG_BASE1_ADDR_LO_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CP_MSG_BASE1_ADDR_LO_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CP_MSG_BASE1_ADDR_LO_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CP_MSG_BASE1_ADDR_HI_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CP_MSG_BASE1_ADDR_HI_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CP_MSG_BASE1_ADDR_HI_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CP_MSG_BASE1_ADDR_HI_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CP_MSG_BASE1_ADDR_HI_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CP_MSG_BASE2_ADDR_LO_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CP_MSG_BASE2_ADDR_LO_1 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC2_QM1_CP_MSG_BASE2_ADDR_LO_2 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC2_QM1_CP_MSG_BASE2_ADDR_LO_2 &
+			PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC2_QM1_CP_MSG_BASE2_ADDR_LO_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CP_MSG_BASE2_ADDR_LO_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CP_MSG_BASE2_ADDR_LO_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CP_MSG_BASE2_ADDR_HI_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CP_MSG_BASE2_ADDR_HI_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CP_MSG_BASE2_ADDR_HI_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CP_MSG_BASE2_ADDR_HI_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CP_MSG_BASE2_ADDR_HI_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CP_MSG_BASE3_ADDR_LO_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CP_MSG_BASE3_ADDR_LO_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CP_MSG_BASE3_ADDR_LO_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CP_MSG_BASE3_ADDR_LO_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CP_MSG_BASE3_ADDR_LO_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CP_MSG_BASE3_ADDR_HI_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CP_MSG_BASE3_ADDR_HI_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CP_MSG_BASE3_ADDR_HI_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CP_MSG_BASE3_ADDR_HI_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CP_MSG_BASE3_ADDR_HI_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CP_LDMA_TSIZE_OFFSET_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CP_LDMA_TSIZE_OFFSET_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CP_LDMA_TSIZE_OFFSET_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CP_LDMA_TSIZE_OFFSET_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CP_LDMA_TSIZE_OFFSET_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CP_LDMA_SRC_BASE_LO_OFFSET_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CP_LDMA_SRC_BASE_LO_OFFSET_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CP_LDMA_SRC_BASE_LO_OFFSET_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CP_LDMA_SRC_BASE_LO_OFFSET_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CP_LDMA_SRC_BASE_LO_OFFSET_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CP_LDMA_DST_BASE_LO_OFFSET_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CP_LDMA_DST_BASE_LO_OFFSET_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CP_LDMA_DST_BASE_LO_OFFSET_2 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC2_QM1_CP_LDMA_DST_BASE_LO_OFFSET_3 & ~0xFFF) +
+			PROT_BITS_OFFS;
+	word_offset = ((mmNIC2_QM1_CP_LDMA_DST_BASE_LO_OFFSET_3 &
+			PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC2_QM1_CP_LDMA_DST_BASE_LO_OFFSET_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CP_LDMA_DST_BASE_LO_OFFSET_4 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC2_QM1_CP_STS_0 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC2_QM1_CP_STS_0 & PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC2_QM1_CP_STS_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CP_STS_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CP_STS_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CP_STS_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CP_STS_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CP_CURRENT_INST_LO_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CP_CURRENT_INST_LO_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CP_CURRENT_INST_LO_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CP_CURRENT_INST_LO_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CP_CURRENT_INST_LO_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CP_CURRENT_INST_HI_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CP_CURRENT_INST_HI_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CP_CURRENT_INST_HI_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CP_CURRENT_INST_HI_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CP_CURRENT_INST_HI_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CP_BARRIER_CFG_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CP_BARRIER_CFG_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CP_BARRIER_CFG_2 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC2_QM1_CP_BARRIER_CFG_3 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC2_QM1_CP_BARRIER_CFG_3 & PROT_BITS_OFFS)
+			>> 7) << 2;
+	mask = 1U << ((mmNIC2_QM1_CP_BARRIER_CFG_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CP_BARRIER_CFG_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CP_DBG_0_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CP_DBG_0_1 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC2_QM1_CP_DBG_0_2 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC2_QM1_CP_DBG_0_2 & PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC2_QM1_CP_DBG_0_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CP_DBG_0_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CP_DBG_0_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CP_ARUSER_31_11_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CP_ARUSER_31_11_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CP_ARUSER_31_11_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CP_ARUSER_31_11_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CP_ARUSER_31_11_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CP_AWUSER_31_11_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CP_AWUSER_31_11_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CP_AWUSER_31_11_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CP_AWUSER_31_11_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CP_AWUSER_31_11_4 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC2_QM1_ARB_CFG_0 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC2_QM1_ARB_CFG_0 & PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC2_QM1_ARB_CFG_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_ARB_MST_AVAIL_CRED_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_ARB_MST_AVAIL_CRED_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_ARB_MST_AVAIL_CRED_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_ARB_MST_AVAIL_CRED_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_ARB_MST_AVAIL_CRED_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_ARB_MST_AVAIL_CRED_5 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_ARB_MST_AVAIL_CRED_6 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_ARB_MST_AVAIL_CRED_7 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_ARB_MST_AVAIL_CRED_8 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_ARB_MST_AVAIL_CRED_9 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_ARB_MST_AVAIL_CRED_10 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_ARB_MST_AVAIL_CRED_11 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_ARB_MST_AVAIL_CRED_12 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_ARB_MST_AVAIL_CRED_13 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_ARB_MST_AVAIL_CRED_14 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_ARB_MST_AVAIL_CRED_15 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_ARB_MST_AVAIL_CRED_16 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_ARB_MST_AVAIL_CRED_17 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_ARB_MST_AVAIL_CRED_18 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_ARB_MST_AVAIL_CRED_19 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_ARB_MST_AVAIL_CRED_20 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_ARB_MST_AVAIL_CRED_21 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_ARB_MST_AVAIL_CRED_22 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_ARB_MST_AVAIL_CRED_23 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC2_QM1_ARB_MST_AVAIL_CRED_24 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC2_QM1_ARB_MST_AVAIL_CRED_24 &
+			PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC2_QM1_ARB_MST_AVAIL_CRED_24 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_ARB_MST_AVAIL_CRED_25 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_ARB_MST_AVAIL_CRED_26 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_ARB_MST_AVAIL_CRED_27 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_ARB_MST_AVAIL_CRED_28 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_ARB_MST_AVAIL_CRED_29 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_ARB_MST_AVAIL_CRED_30 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_ARB_MST_AVAIL_CRED_31 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC2_QM1_ARB_MST_CHOISE_PUSH_OFST_23 & ~0xFFF) +
+			PROT_BITS_OFFS;
+	word_offset = ((mmNIC2_QM1_ARB_MST_CHOISE_PUSH_OFST_23 &
+			PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC2_QM1_ARB_SLV_CHOISE_WDT & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_ARB_MSG_MAX_INFLIGHT & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_ARB_MSG_AWUSER_31_11 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_ARB_MSG_AWUSER_SEC_PROP & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_ARB_MSG_AWUSER_NON_SEC_PROP & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC2_QM1_ARB_STATE_STS & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC2_QM1_ARB_STATE_STS & PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC2_QM1_ARB_STATE_STS & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_ARB_CHOISE_FULLNESS_STS & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_ARB_MSG_STS & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_ARB_SLV_CHOISE_Q_HEAD & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_ARB_ERR_CAUSE & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_ARB_ERR_MSG_EN & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_ARB_ERR_STS_DRP & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_ARB_MST_CRED_STS_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_ARB_MST_CRED_STS_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_ARB_MST_CRED_STS_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_ARB_MST_CRED_STS_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_ARB_MST_CRED_STS_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_ARB_MST_CRED_STS_5 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_ARB_MST_CRED_STS_6 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_ARB_MST_CRED_STS_7 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_ARB_MST_CRED_STS_8 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_ARB_MST_CRED_STS_9 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_ARB_MST_CRED_STS_10 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_ARB_MST_CRED_STS_11 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_ARB_MST_CRED_STS_12 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_ARB_MST_CRED_STS_13 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_ARB_MST_CRED_STS_14 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_ARB_MST_CRED_STS_15 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_ARB_MST_CRED_STS_16 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_ARB_MST_CRED_STS_17 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_ARB_MST_CRED_STS_18 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_ARB_MST_CRED_STS_19 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC2_QM1_ARB_MST_CRED_STS_20 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC2_QM1_ARB_MST_CRED_STS_20 & PROT_BITS_OFFS)
+			>> 7) << 2;
+	mask = 1U << ((mmNIC2_QM1_ARB_MST_CRED_STS_20 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_ARB_MST_CRED_STS_21 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_ARB_MST_CRED_STS_22 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_ARB_MST_CRED_STS_23 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_ARB_MST_CRED_STS_24 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_ARB_MST_CRED_STS_25 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_ARB_MST_CRED_STS_26 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_ARB_MST_CRED_STS_27 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_ARB_MST_CRED_STS_28 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_ARB_MST_CRED_STS_29 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_ARB_MST_CRED_STS_30 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_ARB_MST_CRED_STS_31 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CGM_CFG & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CGM_STS & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CGM_CFG1 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC2_QM1_LOCAL_RANGE_BASE & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC2_QM1_LOCAL_RANGE_BASE & PROT_BITS_OFFS)
+			>> 7) << 2;
+	mask = 1U << ((mmNIC2_QM1_LOCAL_RANGE_BASE & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_LOCAL_RANGE_SIZE & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CSMR_STRICT_PRIO_CFG & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_HBW_RD_RATE_LIM_CFG_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_LBW_WR_RATE_LIM_CFG_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_LBW_WR_RATE_LIM_CFG_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_HBW_RD_RATE_LIM_CFG_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_GLBL_AXCACHE & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_IND_GW_APB_CFG & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_IND_GW_APB_WDATA & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_IND_GW_APB_RDATA & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_IND_GW_APB_STATUS & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_GLBL_ERR_ADDR_LO & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_GLBL_ERR_ADDR_HI & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_GLBL_ERR_WDATA & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC2_QM1_GLBL_MEM_INIT_BUSY & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC2_QM1_GLBL_MEM_INIT_BUSY & PROT_BITS_OFFS)
+			>> 7) << 2;
+	mask = 1U << ((mmNIC2_QM1_GLBL_MEM_INIT_BUSY & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	WREG32(mmNIC3_QM0_BASE - CFG_BASE + PROT_BITS_OFFS + 0x7C, 0);
+	WREG32(mmNIC3_QM1_BASE - CFG_BASE + PROT_BITS_OFFS + 0x7C, 0);
+
+	pb_addr = (mmNIC3_QM0_GLBL_CFG0 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC3_QM0_GLBL_CFG0 & PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC3_QM0_GLBL_CFG0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_GLBL_CFG1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_GLBL_PROT & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_GLBL_ERR_CFG & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_GLBL_SECURE_PROPS_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_GLBL_SECURE_PROPS_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_GLBL_SECURE_PROPS_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_GLBL_SECURE_PROPS_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_GLBL_SECURE_PROPS_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_GLBL_NON_SECURE_PROPS_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_GLBL_NON_SECURE_PROPS_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_GLBL_NON_SECURE_PROPS_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_GLBL_NON_SECURE_PROPS_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_GLBL_NON_SECURE_PROPS_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_GLBL_STS0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_GLBL_STS1_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_GLBL_STS1_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_GLBL_STS1_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_GLBL_STS1_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_GLBL_STS1_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_GLBL_MSG_EN_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_GLBL_MSG_EN_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_GLBL_MSG_EN_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_GLBL_MSG_EN_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_GLBL_MSG_EN_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_PQ_BASE_LO_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_PQ_BASE_LO_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_PQ_BASE_LO_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_PQ_BASE_LO_3 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC3_QM0_PQ_BASE_HI_0 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC3_QM0_PQ_BASE_HI_0 & PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC3_QM0_PQ_BASE_HI_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_PQ_BASE_HI_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_PQ_BASE_HI_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_PQ_BASE_HI_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_PQ_SIZE_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_PQ_SIZE_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_PQ_SIZE_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_PQ_SIZE_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_PQ_PI_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_PQ_PI_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_PQ_PI_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_PQ_PI_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_PQ_CI_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_PQ_CI_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_PQ_CI_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_PQ_CI_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_PQ_CFG0_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_PQ_CFG0_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_PQ_CFG0_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_PQ_CFG0_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_PQ_CFG1_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_PQ_CFG1_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_PQ_CFG1_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_PQ_CFG1_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_PQ_ARUSER_31_11_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_PQ_ARUSER_31_11_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_PQ_ARUSER_31_11_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_PQ_ARUSER_31_11_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_PQ_STS0_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_PQ_STS0_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_PQ_STS0_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_PQ_STS0_3 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC3_QM0_PQ_STS1_0 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC3_QM0_PQ_STS1_0 & PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC3_QM0_PQ_STS1_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_PQ_STS1_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_PQ_STS1_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_PQ_STS1_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CQ_STS0_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CQ_STS0_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CQ_STS0_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CQ_STS0_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CQ_STS1_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CQ_STS1_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CQ_STS1_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CQ_STS1_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CQ_PTR_LO_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CQ_PTR_HI_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CQ_TSIZE_0 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC3_QM0_CQ_CTL_0 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC3_QM0_CQ_CTL_0 & PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC3_QM0_CQ_CTL_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CQ_PTR_LO_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CQ_PTR_HI_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CQ_TSIZE_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CQ_CTL_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CQ_PTR_LO_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CQ_PTR_HI_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CQ_TSIZE_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CQ_CTL_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CQ_PTR_LO_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CQ_PTR_HI_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CQ_TSIZE_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CQ_CTL_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CQ_PTR_LO_STS_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CQ_PTR_LO_STS_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CQ_PTR_LO_STS_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CQ_PTR_LO_STS_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CQ_PTR_LO_STS_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CQ_PTR_HI_STS_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CQ_PTR_HI_STS_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CQ_PTR_HI_STS_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CQ_PTR_HI_STS_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CQ_PTR_HI_STS_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CQ_TSIZE_STS_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CQ_TSIZE_STS_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CQ_TSIZE_STS_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CQ_TSIZE_STS_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CQ_TSIZE_STS_4 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC3_QM0_CQ_CTL_STS_0 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC3_QM0_CQ_CTL_STS_0 & PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC3_QM0_CQ_CTL_STS_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CQ_CTL_STS_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CQ_CTL_STS_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CQ_CTL_STS_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CQ_CTL_STS_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CQ_IFIFO_CNT_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CQ_IFIFO_CNT_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CQ_IFIFO_CNT_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CQ_IFIFO_CNT_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CQ_IFIFO_CNT_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CP_MSG_BASE0_ADDR_LO_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CP_MSG_BASE0_ADDR_LO_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CP_MSG_BASE0_ADDR_LO_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CP_MSG_BASE0_ADDR_LO_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CP_MSG_BASE0_ADDR_LO_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CP_MSG_BASE0_ADDR_HI_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CP_MSG_BASE0_ADDR_HI_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CP_MSG_BASE0_ADDR_HI_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CP_MSG_BASE0_ADDR_HI_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CP_MSG_BASE0_ADDR_HI_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CP_MSG_BASE1_ADDR_LO_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CP_MSG_BASE1_ADDR_LO_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CP_MSG_BASE1_ADDR_LO_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CP_MSG_BASE1_ADDR_LO_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CP_MSG_BASE1_ADDR_LO_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CP_MSG_BASE1_ADDR_HI_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CP_MSG_BASE1_ADDR_HI_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CP_MSG_BASE1_ADDR_HI_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CP_MSG_BASE1_ADDR_HI_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CP_MSG_BASE1_ADDR_HI_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CP_MSG_BASE2_ADDR_LO_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CP_MSG_BASE2_ADDR_LO_1 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC3_QM0_CP_MSG_BASE2_ADDR_LO_2 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC3_QM0_CP_MSG_BASE2_ADDR_LO_2 &
+			PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC3_QM0_CP_MSG_BASE2_ADDR_LO_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CP_MSG_BASE2_ADDR_LO_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CP_MSG_BASE2_ADDR_LO_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CP_MSG_BASE2_ADDR_HI_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CP_MSG_BASE2_ADDR_HI_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CP_MSG_BASE2_ADDR_HI_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CP_MSG_BASE2_ADDR_HI_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CP_MSG_BASE2_ADDR_HI_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CP_MSG_BASE3_ADDR_LO_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CP_MSG_BASE3_ADDR_LO_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CP_MSG_BASE3_ADDR_LO_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CP_MSG_BASE3_ADDR_LO_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CP_MSG_BASE3_ADDR_LO_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CP_MSG_BASE3_ADDR_HI_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CP_MSG_BASE3_ADDR_HI_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CP_MSG_BASE3_ADDR_HI_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CP_MSG_BASE3_ADDR_HI_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CP_MSG_BASE3_ADDR_HI_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CP_LDMA_TSIZE_OFFSET_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CP_LDMA_TSIZE_OFFSET_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CP_LDMA_TSIZE_OFFSET_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CP_LDMA_TSIZE_OFFSET_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CP_LDMA_TSIZE_OFFSET_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CP_LDMA_SRC_BASE_LO_OFFSET_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CP_LDMA_SRC_BASE_LO_OFFSET_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CP_LDMA_SRC_BASE_LO_OFFSET_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CP_LDMA_SRC_BASE_LO_OFFSET_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CP_LDMA_SRC_BASE_LO_OFFSET_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CP_LDMA_DST_BASE_LO_OFFSET_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CP_LDMA_DST_BASE_LO_OFFSET_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CP_LDMA_DST_BASE_LO_OFFSET_2 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC3_QM0_CP_LDMA_DST_BASE_LO_OFFSET_3 & ~0xFFF) +
+			PROT_BITS_OFFS;
+	word_offset = ((mmNIC3_QM0_CP_LDMA_DST_BASE_LO_OFFSET_3 &
+			PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC3_QM0_CP_LDMA_DST_BASE_LO_OFFSET_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CP_LDMA_DST_BASE_LO_OFFSET_4 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC3_QM0_CP_STS_0 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC3_QM0_CP_STS_0 & PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC3_QM0_CP_STS_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CP_STS_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CP_STS_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CP_STS_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CP_STS_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CP_CURRENT_INST_LO_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CP_CURRENT_INST_LO_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CP_CURRENT_INST_LO_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CP_CURRENT_INST_LO_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CP_CURRENT_INST_LO_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CP_CURRENT_INST_HI_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CP_CURRENT_INST_HI_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CP_CURRENT_INST_HI_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CP_CURRENT_INST_HI_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CP_CURRENT_INST_HI_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CP_BARRIER_CFG_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CP_BARRIER_CFG_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CP_BARRIER_CFG_2 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC3_QM0_CP_BARRIER_CFG_3 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC3_QM0_CP_BARRIER_CFG_3 & PROT_BITS_OFFS)
+			>> 7) << 2;
+	mask = 1U << ((mmNIC3_QM0_CP_BARRIER_CFG_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CP_BARRIER_CFG_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CP_DBG_0_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CP_DBG_0_1 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC3_QM0_CP_DBG_0_2 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC3_QM0_CP_DBG_0_2 & PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC3_QM0_CP_DBG_0_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CP_DBG_0_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CP_DBG_0_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CP_ARUSER_31_11_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CP_ARUSER_31_11_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CP_ARUSER_31_11_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CP_ARUSER_31_11_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CP_ARUSER_31_11_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CP_AWUSER_31_11_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CP_AWUSER_31_11_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CP_AWUSER_31_11_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CP_AWUSER_31_11_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CP_AWUSER_31_11_4 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC3_QM0_ARB_CFG_0 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC3_QM0_ARB_CFG_0 & PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC3_QM0_ARB_CFG_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_ARB_MST_AVAIL_CRED_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_ARB_MST_AVAIL_CRED_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_ARB_MST_AVAIL_CRED_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_ARB_MST_AVAIL_CRED_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_ARB_MST_AVAIL_CRED_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_ARB_MST_AVAIL_CRED_5 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_ARB_MST_AVAIL_CRED_6 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_ARB_MST_AVAIL_CRED_7 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_ARB_MST_AVAIL_CRED_8 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_ARB_MST_AVAIL_CRED_9 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_ARB_MST_AVAIL_CRED_10 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_ARB_MST_AVAIL_CRED_11 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_ARB_MST_AVAIL_CRED_12 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_ARB_MST_AVAIL_CRED_13 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_ARB_MST_AVAIL_CRED_14 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_ARB_MST_AVAIL_CRED_15 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_ARB_MST_AVAIL_CRED_16 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_ARB_MST_AVAIL_CRED_17 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_ARB_MST_AVAIL_CRED_18 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_ARB_MST_AVAIL_CRED_19 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_ARB_MST_AVAIL_CRED_20 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_ARB_MST_AVAIL_CRED_21 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_ARB_MST_AVAIL_CRED_22 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_ARB_MST_AVAIL_CRED_23 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC3_QM0_ARB_MST_AVAIL_CRED_24 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC3_QM0_ARB_MST_AVAIL_CRED_24 &
+			PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC3_QM0_ARB_MST_AVAIL_CRED_24 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_ARB_MST_AVAIL_CRED_25 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_ARB_MST_AVAIL_CRED_26 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_ARB_MST_AVAIL_CRED_27 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_ARB_MST_AVAIL_CRED_28 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_ARB_MST_AVAIL_CRED_29 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_ARB_MST_AVAIL_CRED_30 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_ARB_MST_AVAIL_CRED_31 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC3_QM0_ARB_MST_CHOISE_PUSH_OFST_23 & ~0xFFF) +
+			PROT_BITS_OFFS;
+	word_offset = ((mmNIC3_QM0_ARB_MST_CHOISE_PUSH_OFST_23 &
+			PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC3_QM0_ARB_SLV_CHOISE_WDT & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_ARB_MSG_MAX_INFLIGHT & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_ARB_MSG_AWUSER_31_11 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_ARB_MSG_AWUSER_SEC_PROP & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_ARB_MSG_AWUSER_NON_SEC_PROP & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC3_QM0_ARB_STATE_STS & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC3_QM0_ARB_STATE_STS & PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC3_QM0_ARB_STATE_STS & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_ARB_CHOISE_FULLNESS_STS & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_ARB_MSG_STS & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_ARB_SLV_CHOISE_Q_HEAD & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_ARB_ERR_CAUSE & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_ARB_ERR_MSG_EN & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_ARB_ERR_STS_DRP & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_ARB_MST_CRED_STS_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_ARB_MST_CRED_STS_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_ARB_MST_CRED_STS_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_ARB_MST_CRED_STS_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_ARB_MST_CRED_STS_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_ARB_MST_CRED_STS_5 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_ARB_MST_CRED_STS_6 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_ARB_MST_CRED_STS_7 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_ARB_MST_CRED_STS_8 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_ARB_MST_CRED_STS_9 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_ARB_MST_CRED_STS_10 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_ARB_MST_CRED_STS_11 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_ARB_MST_CRED_STS_12 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_ARB_MST_CRED_STS_13 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_ARB_MST_CRED_STS_14 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_ARB_MST_CRED_STS_15 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_ARB_MST_CRED_STS_16 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_ARB_MST_CRED_STS_17 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_ARB_MST_CRED_STS_18 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_ARB_MST_CRED_STS_19 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC3_QM0_ARB_MST_CRED_STS_20 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC3_QM0_ARB_MST_CRED_STS_20 & PROT_BITS_OFFS)
+			>> 7) << 2;
+	mask = 1U << ((mmNIC3_QM0_ARB_MST_CRED_STS_20 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_ARB_MST_CRED_STS_21 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_ARB_MST_CRED_STS_22 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_ARB_MST_CRED_STS_23 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_ARB_MST_CRED_STS_24 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_ARB_MST_CRED_STS_25 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_ARB_MST_CRED_STS_26 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_ARB_MST_CRED_STS_27 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_ARB_MST_CRED_STS_28 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_ARB_MST_CRED_STS_29 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_ARB_MST_CRED_STS_30 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_ARB_MST_CRED_STS_31 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CGM_CFG & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CGM_STS & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CGM_CFG1 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC3_QM0_LOCAL_RANGE_BASE & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC3_QM0_LOCAL_RANGE_BASE & PROT_BITS_OFFS)
+			>> 7) << 2;
+	mask = 1U << ((mmNIC3_QM0_LOCAL_RANGE_BASE & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_LOCAL_RANGE_SIZE & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CSMR_STRICT_PRIO_CFG & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_HBW_RD_RATE_LIM_CFG_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_LBW_WR_RATE_LIM_CFG_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_LBW_WR_RATE_LIM_CFG_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_HBW_RD_RATE_LIM_CFG_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_GLBL_AXCACHE & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_IND_GW_APB_CFG & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_IND_GW_APB_WDATA & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_IND_GW_APB_RDATA & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_IND_GW_APB_STATUS & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_GLBL_ERR_ADDR_LO & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_GLBL_ERR_ADDR_HI & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_GLBL_ERR_WDATA & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC3_QM0_GLBL_MEM_INIT_BUSY & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC3_QM0_GLBL_MEM_INIT_BUSY & PROT_BITS_OFFS)
+			>> 7) << 2;
+	mask = 1U << ((mmNIC3_QM0_GLBL_MEM_INIT_BUSY & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC3_QM1_GLBL_CFG0 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC3_QM1_GLBL_CFG0 & PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC3_QM1_GLBL_CFG0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_GLBL_CFG1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_GLBL_PROT & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_GLBL_ERR_CFG & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_GLBL_SECURE_PROPS_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_GLBL_SECURE_PROPS_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_GLBL_SECURE_PROPS_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_GLBL_SECURE_PROPS_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_GLBL_SECURE_PROPS_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_GLBL_NON_SECURE_PROPS_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_GLBL_NON_SECURE_PROPS_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_GLBL_NON_SECURE_PROPS_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_GLBL_NON_SECURE_PROPS_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_GLBL_NON_SECURE_PROPS_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_GLBL_STS0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_GLBL_STS1_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_GLBL_STS1_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_GLBL_STS1_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_GLBL_STS1_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_GLBL_STS1_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_GLBL_MSG_EN_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_GLBL_MSG_EN_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_GLBL_MSG_EN_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_GLBL_MSG_EN_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_GLBL_MSG_EN_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_PQ_BASE_LO_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_PQ_BASE_LO_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_PQ_BASE_LO_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_PQ_BASE_LO_3 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC3_QM1_PQ_BASE_HI_0 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC3_QM1_PQ_BASE_HI_0 & PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC3_QM1_PQ_BASE_HI_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_PQ_BASE_HI_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_PQ_BASE_HI_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_PQ_BASE_HI_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_PQ_SIZE_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_PQ_SIZE_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_PQ_SIZE_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_PQ_SIZE_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_PQ_PI_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_PQ_PI_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_PQ_PI_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_PQ_PI_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_PQ_CI_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_PQ_CI_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_PQ_CI_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_PQ_CI_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_PQ_CFG0_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_PQ_CFG0_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_PQ_CFG0_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_PQ_CFG0_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_PQ_CFG1_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_PQ_CFG1_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_PQ_CFG1_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_PQ_CFG1_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_PQ_ARUSER_31_11_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_PQ_ARUSER_31_11_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_PQ_ARUSER_31_11_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_PQ_ARUSER_31_11_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_PQ_STS0_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_PQ_STS0_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_PQ_STS0_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_PQ_STS0_3 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC3_QM1_PQ_STS1_0 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC3_QM1_PQ_STS1_0 & PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC3_QM1_PQ_STS1_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_PQ_STS1_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_PQ_STS1_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_PQ_STS1_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CQ_STS0_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CQ_STS0_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CQ_STS0_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CQ_STS0_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CQ_STS1_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CQ_STS1_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CQ_STS1_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CQ_STS1_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CQ_PTR_LO_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CQ_PTR_HI_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CQ_TSIZE_0 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC3_QM1_CQ_CTL_0 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC3_QM1_CQ_CTL_0 & PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC3_QM1_CQ_CTL_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CQ_PTR_LO_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CQ_PTR_HI_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CQ_TSIZE_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CQ_CTL_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CQ_PTR_LO_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CQ_PTR_HI_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CQ_TSIZE_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CQ_CTL_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CQ_PTR_LO_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CQ_PTR_HI_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CQ_TSIZE_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CQ_CTL_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CQ_PTR_LO_STS_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CQ_PTR_LO_STS_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CQ_PTR_LO_STS_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CQ_PTR_LO_STS_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CQ_PTR_LO_STS_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CQ_PTR_HI_STS_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CQ_PTR_HI_STS_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CQ_PTR_HI_STS_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CQ_PTR_HI_STS_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CQ_PTR_HI_STS_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CQ_TSIZE_STS_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CQ_TSIZE_STS_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CQ_TSIZE_STS_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CQ_TSIZE_STS_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CQ_TSIZE_STS_4 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC3_QM1_CQ_CTL_STS_0 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC3_QM1_CQ_CTL_STS_0 & PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC3_QM1_CQ_CTL_STS_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CQ_CTL_STS_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CQ_CTL_STS_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CQ_CTL_STS_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CQ_CTL_STS_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CQ_IFIFO_CNT_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CQ_IFIFO_CNT_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CQ_IFIFO_CNT_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CQ_IFIFO_CNT_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CQ_IFIFO_CNT_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CP_MSG_BASE0_ADDR_LO_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CP_MSG_BASE0_ADDR_LO_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CP_MSG_BASE0_ADDR_LO_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CP_MSG_BASE0_ADDR_LO_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CP_MSG_BASE0_ADDR_LO_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CP_MSG_BASE0_ADDR_HI_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CP_MSG_BASE0_ADDR_HI_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CP_MSG_BASE0_ADDR_HI_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CP_MSG_BASE0_ADDR_HI_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CP_MSG_BASE0_ADDR_HI_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CP_MSG_BASE1_ADDR_LO_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CP_MSG_BASE1_ADDR_LO_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CP_MSG_BASE1_ADDR_LO_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CP_MSG_BASE1_ADDR_LO_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CP_MSG_BASE1_ADDR_LO_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CP_MSG_BASE1_ADDR_HI_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CP_MSG_BASE1_ADDR_HI_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CP_MSG_BASE1_ADDR_HI_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CP_MSG_BASE1_ADDR_HI_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CP_MSG_BASE1_ADDR_HI_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CP_MSG_BASE2_ADDR_LO_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CP_MSG_BASE2_ADDR_LO_1 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC3_QM1_CP_MSG_BASE2_ADDR_LO_2 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC3_QM1_CP_MSG_BASE2_ADDR_LO_2 &
+			PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC3_QM1_CP_MSG_BASE2_ADDR_LO_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CP_MSG_BASE2_ADDR_LO_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CP_MSG_BASE2_ADDR_LO_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CP_MSG_BASE2_ADDR_HI_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CP_MSG_BASE2_ADDR_HI_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CP_MSG_BASE2_ADDR_HI_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CP_MSG_BASE2_ADDR_HI_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CP_MSG_BASE2_ADDR_HI_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CP_MSG_BASE3_ADDR_LO_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CP_MSG_BASE3_ADDR_LO_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CP_MSG_BASE3_ADDR_LO_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CP_MSG_BASE3_ADDR_LO_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CP_MSG_BASE3_ADDR_LO_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CP_MSG_BASE3_ADDR_HI_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CP_MSG_BASE3_ADDR_HI_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CP_MSG_BASE3_ADDR_HI_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CP_MSG_BASE3_ADDR_HI_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CP_MSG_BASE3_ADDR_HI_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CP_LDMA_TSIZE_OFFSET_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CP_LDMA_TSIZE_OFFSET_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CP_LDMA_TSIZE_OFFSET_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CP_LDMA_TSIZE_OFFSET_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CP_LDMA_TSIZE_OFFSET_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CP_LDMA_SRC_BASE_LO_OFFSET_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CP_LDMA_SRC_BASE_LO_OFFSET_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CP_LDMA_SRC_BASE_LO_OFFSET_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CP_LDMA_SRC_BASE_LO_OFFSET_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CP_LDMA_SRC_BASE_LO_OFFSET_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CP_LDMA_DST_BASE_LO_OFFSET_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CP_LDMA_DST_BASE_LO_OFFSET_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CP_LDMA_DST_BASE_LO_OFFSET_2 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC3_QM1_CP_LDMA_DST_BASE_LO_OFFSET_3 & ~0xFFF) +
+			PROT_BITS_OFFS;
+	word_offset = ((mmNIC3_QM1_CP_LDMA_DST_BASE_LO_OFFSET_3 &
+			PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC3_QM1_CP_LDMA_DST_BASE_LO_OFFSET_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CP_LDMA_DST_BASE_LO_OFFSET_4 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC3_QM1_CP_STS_0 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC3_QM1_CP_STS_0 & PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC3_QM1_CP_STS_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CP_STS_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CP_STS_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CP_STS_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CP_STS_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CP_CURRENT_INST_LO_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CP_CURRENT_INST_LO_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CP_CURRENT_INST_LO_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CP_CURRENT_INST_LO_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CP_CURRENT_INST_LO_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CP_CURRENT_INST_HI_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CP_CURRENT_INST_HI_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CP_CURRENT_INST_HI_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CP_CURRENT_INST_HI_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CP_CURRENT_INST_HI_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CP_BARRIER_CFG_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CP_BARRIER_CFG_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CP_BARRIER_CFG_2 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC3_QM1_CP_BARRIER_CFG_3 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC3_QM1_CP_BARRIER_CFG_3 & PROT_BITS_OFFS)
+			>> 7) << 2;
+	mask = 1U << ((mmNIC3_QM1_CP_BARRIER_CFG_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CP_BARRIER_CFG_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CP_DBG_0_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CP_DBG_0_1 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC3_QM1_CP_DBG_0_2 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC3_QM1_CP_DBG_0_2 & PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC3_QM1_CP_DBG_0_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CP_DBG_0_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CP_DBG_0_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CP_ARUSER_31_11_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CP_ARUSER_31_11_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CP_ARUSER_31_11_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CP_ARUSER_31_11_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CP_ARUSER_31_11_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CP_AWUSER_31_11_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CP_AWUSER_31_11_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CP_AWUSER_31_11_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CP_AWUSER_31_11_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CP_AWUSER_31_11_4 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC3_QM1_ARB_CFG_0 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC3_QM1_ARB_CFG_0 & PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC3_QM1_ARB_CFG_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_ARB_MST_AVAIL_CRED_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_ARB_MST_AVAIL_CRED_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_ARB_MST_AVAIL_CRED_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_ARB_MST_AVAIL_CRED_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_ARB_MST_AVAIL_CRED_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_ARB_MST_AVAIL_CRED_5 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_ARB_MST_AVAIL_CRED_6 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_ARB_MST_AVAIL_CRED_7 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_ARB_MST_AVAIL_CRED_8 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_ARB_MST_AVAIL_CRED_9 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_ARB_MST_AVAIL_CRED_10 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_ARB_MST_AVAIL_CRED_11 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_ARB_MST_AVAIL_CRED_12 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_ARB_MST_AVAIL_CRED_13 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_ARB_MST_AVAIL_CRED_14 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_ARB_MST_AVAIL_CRED_15 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_ARB_MST_AVAIL_CRED_16 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_ARB_MST_AVAIL_CRED_17 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_ARB_MST_AVAIL_CRED_18 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_ARB_MST_AVAIL_CRED_19 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_ARB_MST_AVAIL_CRED_20 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_ARB_MST_AVAIL_CRED_21 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_ARB_MST_AVAIL_CRED_22 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_ARB_MST_AVAIL_CRED_23 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC3_QM1_ARB_MST_AVAIL_CRED_24 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC3_QM1_ARB_MST_AVAIL_CRED_24 &
+			PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC3_QM1_ARB_MST_AVAIL_CRED_24 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_ARB_MST_AVAIL_CRED_25 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_ARB_MST_AVAIL_CRED_26 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_ARB_MST_AVAIL_CRED_27 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_ARB_MST_AVAIL_CRED_28 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_ARB_MST_AVAIL_CRED_29 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_ARB_MST_AVAIL_CRED_30 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_ARB_MST_AVAIL_CRED_31 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC3_QM1_ARB_MST_CHOISE_PUSH_OFST_23 & ~0xFFF) +
+			PROT_BITS_OFFS;
+	word_offset = ((mmNIC3_QM1_ARB_MST_CHOISE_PUSH_OFST_23 &
+			PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC3_QM1_ARB_SLV_CHOISE_WDT & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_ARB_MSG_MAX_INFLIGHT & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_ARB_MSG_AWUSER_31_11 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_ARB_MSG_AWUSER_SEC_PROP & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_ARB_MSG_AWUSER_NON_SEC_PROP & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC3_QM1_ARB_STATE_STS & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC3_QM1_ARB_STATE_STS & PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC3_QM1_ARB_STATE_STS & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_ARB_CHOISE_FULLNESS_STS & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_ARB_MSG_STS & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_ARB_SLV_CHOISE_Q_HEAD & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_ARB_ERR_CAUSE & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_ARB_ERR_MSG_EN & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_ARB_ERR_STS_DRP & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_ARB_MST_CRED_STS_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_ARB_MST_CRED_STS_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_ARB_MST_CRED_STS_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_ARB_MST_CRED_STS_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_ARB_MST_CRED_STS_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_ARB_MST_CRED_STS_5 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_ARB_MST_CRED_STS_6 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_ARB_MST_CRED_STS_7 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_ARB_MST_CRED_STS_8 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_ARB_MST_CRED_STS_9 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_ARB_MST_CRED_STS_10 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_ARB_MST_CRED_STS_11 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_ARB_MST_CRED_STS_12 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_ARB_MST_CRED_STS_13 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_ARB_MST_CRED_STS_14 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_ARB_MST_CRED_STS_15 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_ARB_MST_CRED_STS_16 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_ARB_MST_CRED_STS_17 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_ARB_MST_CRED_STS_18 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_ARB_MST_CRED_STS_19 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC3_QM1_ARB_MST_CRED_STS_20 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC3_QM1_ARB_MST_CRED_STS_20 & PROT_BITS_OFFS)
+			>> 7) << 2;
+	mask = 1U << ((mmNIC3_QM1_ARB_MST_CRED_STS_20 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_ARB_MST_CRED_STS_21 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_ARB_MST_CRED_STS_22 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_ARB_MST_CRED_STS_23 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_ARB_MST_CRED_STS_24 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_ARB_MST_CRED_STS_25 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_ARB_MST_CRED_STS_26 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_ARB_MST_CRED_STS_27 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_ARB_MST_CRED_STS_28 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_ARB_MST_CRED_STS_29 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_ARB_MST_CRED_STS_30 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_ARB_MST_CRED_STS_31 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CGM_CFG & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CGM_STS & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CGM_CFG1 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC3_QM1_LOCAL_RANGE_BASE & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC3_QM1_LOCAL_RANGE_BASE & PROT_BITS_OFFS)
+			>> 7) << 2;
+	mask = 1U << ((mmNIC3_QM1_LOCAL_RANGE_BASE & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_LOCAL_RANGE_SIZE & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CSMR_STRICT_PRIO_CFG & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_HBW_RD_RATE_LIM_CFG_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_LBW_WR_RATE_LIM_CFG_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_LBW_WR_RATE_LIM_CFG_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_HBW_RD_RATE_LIM_CFG_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_GLBL_AXCACHE & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_IND_GW_APB_CFG & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_IND_GW_APB_WDATA & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_IND_GW_APB_RDATA & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_IND_GW_APB_STATUS & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_GLBL_ERR_ADDR_LO & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_GLBL_ERR_ADDR_HI & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_GLBL_ERR_WDATA & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC3_QM1_GLBL_MEM_INIT_BUSY & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC3_QM1_GLBL_MEM_INIT_BUSY & PROT_BITS_OFFS)
+			>> 7) << 2;
+	mask = 1U << ((mmNIC3_QM1_GLBL_MEM_INIT_BUSY & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	WREG32(mmNIC4_QM0_BASE - CFG_BASE + PROT_BITS_OFFS + 0x7C, 0);
+	WREG32(mmNIC4_QM1_BASE - CFG_BASE + PROT_BITS_OFFS + 0x7C, 0);
+
+	pb_addr = (mmNIC4_QM0_GLBL_CFG0 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC4_QM0_GLBL_CFG0 & PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC4_QM0_GLBL_CFG0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_GLBL_CFG1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_GLBL_PROT & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_GLBL_ERR_CFG & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_GLBL_SECURE_PROPS_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_GLBL_SECURE_PROPS_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_GLBL_SECURE_PROPS_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_GLBL_SECURE_PROPS_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_GLBL_SECURE_PROPS_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_GLBL_NON_SECURE_PROPS_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_GLBL_NON_SECURE_PROPS_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_GLBL_NON_SECURE_PROPS_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_GLBL_NON_SECURE_PROPS_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_GLBL_NON_SECURE_PROPS_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_GLBL_STS0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_GLBL_STS1_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_GLBL_STS1_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_GLBL_STS1_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_GLBL_STS1_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_GLBL_STS1_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_GLBL_MSG_EN_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_GLBL_MSG_EN_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_GLBL_MSG_EN_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_GLBL_MSG_EN_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_GLBL_MSG_EN_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_PQ_BASE_LO_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_PQ_BASE_LO_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_PQ_BASE_LO_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_PQ_BASE_LO_3 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC4_QM0_PQ_BASE_HI_0 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC4_QM0_PQ_BASE_HI_0 & PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC4_QM0_PQ_BASE_HI_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_PQ_BASE_HI_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_PQ_BASE_HI_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_PQ_BASE_HI_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_PQ_SIZE_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_PQ_SIZE_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_PQ_SIZE_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_PQ_SIZE_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_PQ_PI_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_PQ_PI_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_PQ_PI_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_PQ_PI_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_PQ_CI_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_PQ_CI_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_PQ_CI_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_PQ_CI_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_PQ_CFG0_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_PQ_CFG0_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_PQ_CFG0_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_PQ_CFG0_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_PQ_CFG1_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_PQ_CFG1_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_PQ_CFG1_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_PQ_CFG1_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_PQ_ARUSER_31_11_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_PQ_ARUSER_31_11_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_PQ_ARUSER_31_11_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_PQ_ARUSER_31_11_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_PQ_STS0_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_PQ_STS0_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_PQ_STS0_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_PQ_STS0_3 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC4_QM0_PQ_STS1_0 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC4_QM0_PQ_STS1_0 & PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC4_QM0_PQ_STS1_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_PQ_STS1_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_PQ_STS1_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_PQ_STS1_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CQ_STS0_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CQ_STS0_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CQ_STS0_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CQ_STS0_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CQ_STS1_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CQ_STS1_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CQ_STS1_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CQ_STS1_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CQ_PTR_LO_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CQ_PTR_HI_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CQ_TSIZE_0 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC4_QM0_CQ_CTL_0 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC4_QM0_CQ_CTL_0 & PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC4_QM0_CQ_CTL_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CQ_PTR_LO_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CQ_PTR_HI_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CQ_TSIZE_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CQ_CTL_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CQ_PTR_LO_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CQ_PTR_HI_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CQ_TSIZE_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CQ_CTL_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CQ_PTR_LO_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CQ_PTR_HI_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CQ_TSIZE_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CQ_CTL_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CQ_PTR_LO_STS_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CQ_PTR_LO_STS_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CQ_PTR_LO_STS_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CQ_PTR_LO_STS_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CQ_PTR_LO_STS_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CQ_PTR_HI_STS_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CQ_PTR_HI_STS_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CQ_PTR_HI_STS_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CQ_PTR_HI_STS_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CQ_PTR_HI_STS_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CQ_TSIZE_STS_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CQ_TSIZE_STS_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CQ_TSIZE_STS_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CQ_TSIZE_STS_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CQ_TSIZE_STS_4 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC4_QM0_CQ_CTL_STS_0 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC4_QM0_CQ_CTL_STS_0 & PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC4_QM0_CQ_CTL_STS_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CQ_CTL_STS_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CQ_CTL_STS_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CQ_CTL_STS_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CQ_CTL_STS_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CQ_IFIFO_CNT_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CQ_IFIFO_CNT_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CQ_IFIFO_CNT_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CQ_IFIFO_CNT_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CQ_IFIFO_CNT_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CP_MSG_BASE0_ADDR_LO_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CP_MSG_BASE0_ADDR_LO_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CP_MSG_BASE0_ADDR_LO_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CP_MSG_BASE0_ADDR_LO_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CP_MSG_BASE0_ADDR_LO_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CP_MSG_BASE0_ADDR_HI_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CP_MSG_BASE0_ADDR_HI_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CP_MSG_BASE0_ADDR_HI_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CP_MSG_BASE0_ADDR_HI_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CP_MSG_BASE0_ADDR_HI_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CP_MSG_BASE1_ADDR_LO_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CP_MSG_BASE1_ADDR_LO_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CP_MSG_BASE1_ADDR_LO_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CP_MSG_BASE1_ADDR_LO_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CP_MSG_BASE1_ADDR_LO_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CP_MSG_BASE1_ADDR_HI_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CP_MSG_BASE1_ADDR_HI_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CP_MSG_BASE1_ADDR_HI_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CP_MSG_BASE1_ADDR_HI_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CP_MSG_BASE1_ADDR_HI_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CP_MSG_BASE2_ADDR_LO_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CP_MSG_BASE2_ADDR_LO_1 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC4_QM0_CP_MSG_BASE2_ADDR_LO_2 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC4_QM0_CP_MSG_BASE2_ADDR_LO_2 &
+			PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC4_QM0_CP_MSG_BASE2_ADDR_LO_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CP_MSG_BASE2_ADDR_LO_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CP_MSG_BASE2_ADDR_LO_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CP_MSG_BASE2_ADDR_HI_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CP_MSG_BASE2_ADDR_HI_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CP_MSG_BASE2_ADDR_HI_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CP_MSG_BASE2_ADDR_HI_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CP_MSG_BASE2_ADDR_HI_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CP_MSG_BASE3_ADDR_LO_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CP_MSG_BASE3_ADDR_LO_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CP_MSG_BASE3_ADDR_LO_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CP_MSG_BASE3_ADDR_LO_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CP_MSG_BASE3_ADDR_LO_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CP_MSG_BASE3_ADDR_HI_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CP_MSG_BASE3_ADDR_HI_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CP_MSG_BASE3_ADDR_HI_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CP_MSG_BASE3_ADDR_HI_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CP_MSG_BASE3_ADDR_HI_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CP_LDMA_TSIZE_OFFSET_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CP_LDMA_TSIZE_OFFSET_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CP_LDMA_TSIZE_OFFSET_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CP_LDMA_TSIZE_OFFSET_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CP_LDMA_TSIZE_OFFSET_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CP_LDMA_SRC_BASE_LO_OFFSET_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CP_LDMA_SRC_BASE_LO_OFFSET_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CP_LDMA_SRC_BASE_LO_OFFSET_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CP_LDMA_SRC_BASE_LO_OFFSET_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CP_LDMA_SRC_BASE_LO_OFFSET_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CP_LDMA_DST_BASE_LO_OFFSET_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CP_LDMA_DST_BASE_LO_OFFSET_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CP_LDMA_DST_BASE_LO_OFFSET_2 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC4_QM0_CP_LDMA_DST_BASE_LO_OFFSET_3 & ~0xFFF) +
+			PROT_BITS_OFFS;
+	word_offset = ((mmNIC4_QM0_CP_LDMA_DST_BASE_LO_OFFSET_3 &
+			PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC4_QM0_CP_LDMA_DST_BASE_LO_OFFSET_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CP_LDMA_DST_BASE_LO_OFFSET_4 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC4_QM0_CP_STS_0 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC4_QM0_CP_STS_0 & PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC4_QM0_CP_STS_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CP_STS_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CP_STS_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CP_STS_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CP_STS_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CP_CURRENT_INST_LO_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CP_CURRENT_INST_LO_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CP_CURRENT_INST_LO_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CP_CURRENT_INST_LO_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CP_CURRENT_INST_LO_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CP_CURRENT_INST_HI_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CP_CURRENT_INST_HI_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CP_CURRENT_INST_HI_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CP_CURRENT_INST_HI_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CP_CURRENT_INST_HI_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CP_BARRIER_CFG_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CP_BARRIER_CFG_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CP_BARRIER_CFG_2 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC4_QM0_CP_BARRIER_CFG_3 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC4_QM0_CP_BARRIER_CFG_3 & PROT_BITS_OFFS)
+			>> 7) << 2;
+	mask = 1U << ((mmNIC4_QM0_CP_BARRIER_CFG_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CP_BARRIER_CFG_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CP_DBG_0_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CP_DBG_0_1 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC4_QM0_CP_DBG_0_2 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC4_QM0_CP_DBG_0_2 & PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC4_QM0_CP_DBG_0_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CP_DBG_0_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CP_DBG_0_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CP_ARUSER_31_11_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CP_ARUSER_31_11_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CP_ARUSER_31_11_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CP_ARUSER_31_11_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CP_ARUSER_31_11_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CP_AWUSER_31_11_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CP_AWUSER_31_11_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CP_AWUSER_31_11_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CP_AWUSER_31_11_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CP_AWUSER_31_11_4 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC4_QM0_ARB_CFG_0 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC4_QM0_ARB_CFG_0 & PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC4_QM0_ARB_CFG_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_ARB_MST_AVAIL_CRED_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_ARB_MST_AVAIL_CRED_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_ARB_MST_AVAIL_CRED_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_ARB_MST_AVAIL_CRED_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_ARB_MST_AVAIL_CRED_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_ARB_MST_AVAIL_CRED_5 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_ARB_MST_AVAIL_CRED_6 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_ARB_MST_AVAIL_CRED_7 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_ARB_MST_AVAIL_CRED_8 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_ARB_MST_AVAIL_CRED_9 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_ARB_MST_AVAIL_CRED_10 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_ARB_MST_AVAIL_CRED_11 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_ARB_MST_AVAIL_CRED_12 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_ARB_MST_AVAIL_CRED_13 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_ARB_MST_AVAIL_CRED_14 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_ARB_MST_AVAIL_CRED_15 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_ARB_MST_AVAIL_CRED_16 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_ARB_MST_AVAIL_CRED_17 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_ARB_MST_AVAIL_CRED_18 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_ARB_MST_AVAIL_CRED_19 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_ARB_MST_AVAIL_CRED_20 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_ARB_MST_AVAIL_CRED_21 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_ARB_MST_AVAIL_CRED_22 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_ARB_MST_AVAIL_CRED_23 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC4_QM0_ARB_MST_AVAIL_CRED_24 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC4_QM0_ARB_MST_AVAIL_CRED_24 &
+			PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC4_QM0_ARB_MST_AVAIL_CRED_24 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_ARB_MST_AVAIL_CRED_25 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_ARB_MST_AVAIL_CRED_26 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_ARB_MST_AVAIL_CRED_27 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_ARB_MST_AVAIL_CRED_28 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_ARB_MST_AVAIL_CRED_29 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_ARB_MST_AVAIL_CRED_30 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_ARB_MST_AVAIL_CRED_31 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC4_QM0_ARB_MST_CHOISE_PUSH_OFST_23 & ~0xFFF) +
+			PROT_BITS_OFFS;
+	word_offset = ((mmNIC4_QM0_ARB_MST_CHOISE_PUSH_OFST_23 &
+			PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC4_QM0_ARB_SLV_CHOISE_WDT & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_ARB_MSG_MAX_INFLIGHT & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_ARB_MSG_AWUSER_31_11 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_ARB_MSG_AWUSER_SEC_PROP & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_ARB_MSG_AWUSER_NON_SEC_PROP & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC4_QM0_ARB_STATE_STS & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC4_QM0_ARB_STATE_STS & PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC4_QM0_ARB_STATE_STS & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_ARB_CHOISE_FULLNESS_STS & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_ARB_MSG_STS & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_ARB_SLV_CHOISE_Q_HEAD & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_ARB_ERR_CAUSE & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_ARB_ERR_MSG_EN & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_ARB_ERR_STS_DRP & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_ARB_MST_CRED_STS_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_ARB_MST_CRED_STS_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_ARB_MST_CRED_STS_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_ARB_MST_CRED_STS_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_ARB_MST_CRED_STS_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_ARB_MST_CRED_STS_5 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_ARB_MST_CRED_STS_6 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_ARB_MST_CRED_STS_7 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_ARB_MST_CRED_STS_8 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_ARB_MST_CRED_STS_9 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_ARB_MST_CRED_STS_10 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_ARB_MST_CRED_STS_11 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_ARB_MST_CRED_STS_12 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_ARB_MST_CRED_STS_13 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_ARB_MST_CRED_STS_14 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_ARB_MST_CRED_STS_15 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_ARB_MST_CRED_STS_16 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_ARB_MST_CRED_STS_17 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_ARB_MST_CRED_STS_18 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_ARB_MST_CRED_STS_19 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC4_QM0_ARB_MST_CRED_STS_20 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC4_QM0_ARB_MST_CRED_STS_20 & PROT_BITS_OFFS)
+			>> 7) << 2;
+	mask = 1U << ((mmNIC4_QM0_ARB_MST_CRED_STS_20 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_ARB_MST_CRED_STS_21 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_ARB_MST_CRED_STS_22 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_ARB_MST_CRED_STS_23 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_ARB_MST_CRED_STS_24 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_ARB_MST_CRED_STS_25 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_ARB_MST_CRED_STS_26 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_ARB_MST_CRED_STS_27 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_ARB_MST_CRED_STS_28 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_ARB_MST_CRED_STS_29 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_ARB_MST_CRED_STS_30 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_ARB_MST_CRED_STS_31 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CGM_CFG & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CGM_STS & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CGM_CFG1 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC4_QM0_LOCAL_RANGE_BASE & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC4_QM0_LOCAL_RANGE_BASE & PROT_BITS_OFFS)
+			>> 7) << 2;
+	mask = 1U << ((mmNIC4_QM0_LOCAL_RANGE_BASE & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_LOCAL_RANGE_SIZE & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CSMR_STRICT_PRIO_CFG & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_HBW_RD_RATE_LIM_CFG_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_LBW_WR_RATE_LIM_CFG_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_LBW_WR_RATE_LIM_CFG_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_HBW_RD_RATE_LIM_CFG_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_GLBL_AXCACHE & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_IND_GW_APB_CFG & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_IND_GW_APB_WDATA & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_IND_GW_APB_RDATA & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_IND_GW_APB_STATUS & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_GLBL_ERR_ADDR_LO & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_GLBL_ERR_ADDR_HI & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_GLBL_ERR_WDATA & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC4_QM0_GLBL_MEM_INIT_BUSY & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC4_QM0_GLBL_MEM_INIT_BUSY & PROT_BITS_OFFS)
+			>> 7) << 2;
+	mask = 1U << ((mmNIC4_QM0_GLBL_MEM_INIT_BUSY & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC4_QM1_GLBL_CFG0 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC4_QM1_GLBL_CFG0 & PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC4_QM1_GLBL_CFG0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_GLBL_CFG1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_GLBL_PROT & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_GLBL_ERR_CFG & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_GLBL_SECURE_PROPS_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_GLBL_SECURE_PROPS_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_GLBL_SECURE_PROPS_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_GLBL_SECURE_PROPS_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_GLBL_SECURE_PROPS_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_GLBL_NON_SECURE_PROPS_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_GLBL_NON_SECURE_PROPS_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_GLBL_NON_SECURE_PROPS_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_GLBL_NON_SECURE_PROPS_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_GLBL_NON_SECURE_PROPS_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_GLBL_STS0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_GLBL_STS1_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_GLBL_STS1_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_GLBL_STS1_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_GLBL_STS1_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_GLBL_STS1_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_GLBL_MSG_EN_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_GLBL_MSG_EN_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_GLBL_MSG_EN_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_GLBL_MSG_EN_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_GLBL_MSG_EN_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_PQ_BASE_LO_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_PQ_BASE_LO_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_PQ_BASE_LO_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_PQ_BASE_LO_3 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC4_QM1_PQ_BASE_HI_0 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC4_QM1_PQ_BASE_HI_0 & PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC4_QM1_PQ_BASE_HI_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_PQ_BASE_HI_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_PQ_BASE_HI_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_PQ_BASE_HI_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_PQ_SIZE_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_PQ_SIZE_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_PQ_SIZE_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_PQ_SIZE_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_PQ_PI_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_PQ_PI_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_PQ_PI_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_PQ_PI_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_PQ_CI_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_PQ_CI_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_PQ_CI_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_PQ_CI_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_PQ_CFG0_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_PQ_CFG0_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_PQ_CFG0_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_PQ_CFG0_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_PQ_CFG1_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_PQ_CFG1_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_PQ_CFG1_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_PQ_CFG1_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_PQ_ARUSER_31_11_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_PQ_ARUSER_31_11_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_PQ_ARUSER_31_11_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_PQ_ARUSER_31_11_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_PQ_STS0_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_PQ_STS0_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_PQ_STS0_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_PQ_STS0_3 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC4_QM1_PQ_STS1_0 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC4_QM1_PQ_STS1_0 & PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC4_QM1_PQ_STS1_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_PQ_STS1_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_PQ_STS1_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_PQ_STS1_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CQ_STS0_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CQ_STS0_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CQ_STS0_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CQ_STS0_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CQ_STS1_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CQ_STS1_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CQ_STS1_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CQ_STS1_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CQ_PTR_LO_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CQ_PTR_HI_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CQ_TSIZE_0 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC4_QM1_CQ_CTL_0 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC4_QM1_CQ_CTL_0 & PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC4_QM1_CQ_CTL_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CQ_PTR_LO_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CQ_PTR_HI_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CQ_TSIZE_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CQ_CTL_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CQ_PTR_LO_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CQ_PTR_HI_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CQ_TSIZE_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CQ_CTL_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CQ_PTR_LO_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CQ_PTR_HI_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CQ_TSIZE_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CQ_CTL_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CQ_PTR_LO_STS_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CQ_PTR_LO_STS_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CQ_PTR_LO_STS_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CQ_PTR_LO_STS_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CQ_PTR_LO_STS_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CQ_PTR_HI_STS_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CQ_PTR_HI_STS_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CQ_PTR_HI_STS_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CQ_PTR_HI_STS_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CQ_PTR_HI_STS_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CQ_TSIZE_STS_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CQ_TSIZE_STS_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CQ_TSIZE_STS_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CQ_TSIZE_STS_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CQ_TSIZE_STS_4 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC4_QM1_CQ_CTL_STS_0 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC4_QM1_CQ_CTL_STS_0 & PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC4_QM1_CQ_CTL_STS_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CQ_CTL_STS_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CQ_CTL_STS_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CQ_CTL_STS_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CQ_CTL_STS_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CQ_IFIFO_CNT_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CQ_IFIFO_CNT_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CQ_IFIFO_CNT_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CQ_IFIFO_CNT_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CQ_IFIFO_CNT_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CP_MSG_BASE0_ADDR_LO_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CP_MSG_BASE0_ADDR_LO_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CP_MSG_BASE0_ADDR_LO_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CP_MSG_BASE0_ADDR_LO_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CP_MSG_BASE0_ADDR_LO_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CP_MSG_BASE0_ADDR_HI_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CP_MSG_BASE0_ADDR_HI_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CP_MSG_BASE0_ADDR_HI_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CP_MSG_BASE0_ADDR_HI_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CP_MSG_BASE0_ADDR_HI_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CP_MSG_BASE1_ADDR_LO_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CP_MSG_BASE1_ADDR_LO_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CP_MSG_BASE1_ADDR_LO_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CP_MSG_BASE1_ADDR_LO_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CP_MSG_BASE1_ADDR_LO_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CP_MSG_BASE1_ADDR_HI_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CP_MSG_BASE1_ADDR_HI_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CP_MSG_BASE1_ADDR_HI_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CP_MSG_BASE1_ADDR_HI_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CP_MSG_BASE1_ADDR_HI_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CP_MSG_BASE2_ADDR_LO_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CP_MSG_BASE2_ADDR_LO_1 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC4_QM1_CP_MSG_BASE2_ADDR_LO_2 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC4_QM1_CP_MSG_BASE2_ADDR_LO_2 &
+			PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC4_QM1_CP_MSG_BASE2_ADDR_LO_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CP_MSG_BASE2_ADDR_LO_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CP_MSG_BASE2_ADDR_LO_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CP_MSG_BASE2_ADDR_HI_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CP_MSG_BASE2_ADDR_HI_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CP_MSG_BASE2_ADDR_HI_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CP_MSG_BASE2_ADDR_HI_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CP_MSG_BASE2_ADDR_HI_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CP_MSG_BASE3_ADDR_LO_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CP_MSG_BASE3_ADDR_LO_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CP_MSG_BASE3_ADDR_LO_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CP_MSG_BASE3_ADDR_LO_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CP_MSG_BASE3_ADDR_LO_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CP_MSG_BASE3_ADDR_HI_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CP_MSG_BASE3_ADDR_HI_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CP_MSG_BASE3_ADDR_HI_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CP_MSG_BASE3_ADDR_HI_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CP_MSG_BASE3_ADDR_HI_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CP_LDMA_TSIZE_OFFSET_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CP_LDMA_TSIZE_OFFSET_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CP_LDMA_TSIZE_OFFSET_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CP_LDMA_TSIZE_OFFSET_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CP_LDMA_TSIZE_OFFSET_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CP_LDMA_SRC_BASE_LO_OFFSET_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CP_LDMA_SRC_BASE_LO_OFFSET_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CP_LDMA_SRC_BASE_LO_OFFSET_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CP_LDMA_SRC_BASE_LO_OFFSET_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CP_LDMA_SRC_BASE_LO_OFFSET_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CP_LDMA_DST_BASE_LO_OFFSET_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CP_LDMA_DST_BASE_LO_OFFSET_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CP_LDMA_DST_BASE_LO_OFFSET_2 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC4_QM1_CP_LDMA_DST_BASE_LO_OFFSET_3 & ~0xFFF) +
+			PROT_BITS_OFFS;
+	word_offset = ((mmNIC4_QM1_CP_LDMA_DST_BASE_LO_OFFSET_3 &
+			PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC4_QM1_CP_LDMA_DST_BASE_LO_OFFSET_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CP_LDMA_DST_BASE_LO_OFFSET_4 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC4_QM1_CP_STS_0 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC4_QM1_CP_STS_0 & PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC4_QM1_CP_STS_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CP_STS_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CP_STS_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CP_STS_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CP_STS_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CP_CURRENT_INST_LO_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CP_CURRENT_INST_LO_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CP_CURRENT_INST_LO_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CP_CURRENT_INST_LO_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CP_CURRENT_INST_LO_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CP_CURRENT_INST_HI_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CP_CURRENT_INST_HI_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CP_CURRENT_INST_HI_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CP_CURRENT_INST_HI_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CP_CURRENT_INST_HI_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CP_BARRIER_CFG_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CP_BARRIER_CFG_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CP_BARRIER_CFG_2 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC4_QM1_CP_BARRIER_CFG_3 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC4_QM1_CP_BARRIER_CFG_3 & PROT_BITS_OFFS)
+			>> 7) << 2;
+	mask = 1U << ((mmNIC4_QM1_CP_BARRIER_CFG_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CP_BARRIER_CFG_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CP_DBG_0_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CP_DBG_0_1 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC4_QM1_CP_DBG_0_2 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC4_QM1_CP_DBG_0_2 & PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC4_QM1_CP_DBG_0_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CP_DBG_0_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CP_DBG_0_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CP_ARUSER_31_11_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CP_ARUSER_31_11_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CP_ARUSER_31_11_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CP_ARUSER_31_11_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CP_ARUSER_31_11_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CP_AWUSER_31_11_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CP_AWUSER_31_11_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CP_AWUSER_31_11_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CP_AWUSER_31_11_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CP_AWUSER_31_11_4 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC4_QM1_ARB_CFG_0 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC4_QM1_ARB_CFG_0 & PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC4_QM1_ARB_CFG_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_ARB_MST_AVAIL_CRED_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_ARB_MST_AVAIL_CRED_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_ARB_MST_AVAIL_CRED_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_ARB_MST_AVAIL_CRED_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_ARB_MST_AVAIL_CRED_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_ARB_MST_AVAIL_CRED_5 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_ARB_MST_AVAIL_CRED_6 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_ARB_MST_AVAIL_CRED_7 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_ARB_MST_AVAIL_CRED_8 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_ARB_MST_AVAIL_CRED_9 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_ARB_MST_AVAIL_CRED_10 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_ARB_MST_AVAIL_CRED_11 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_ARB_MST_AVAIL_CRED_12 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_ARB_MST_AVAIL_CRED_13 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_ARB_MST_AVAIL_CRED_14 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_ARB_MST_AVAIL_CRED_15 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_ARB_MST_AVAIL_CRED_16 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_ARB_MST_AVAIL_CRED_17 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_ARB_MST_AVAIL_CRED_18 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_ARB_MST_AVAIL_CRED_19 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_ARB_MST_AVAIL_CRED_20 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_ARB_MST_AVAIL_CRED_21 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_ARB_MST_AVAIL_CRED_22 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_ARB_MST_AVAIL_CRED_23 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC4_QM1_ARB_MST_AVAIL_CRED_24 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC4_QM1_ARB_MST_AVAIL_CRED_24 &
+			PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC4_QM1_ARB_MST_AVAIL_CRED_24 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_ARB_MST_AVAIL_CRED_25 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_ARB_MST_AVAIL_CRED_26 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_ARB_MST_AVAIL_CRED_27 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_ARB_MST_AVAIL_CRED_28 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_ARB_MST_AVAIL_CRED_29 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_ARB_MST_AVAIL_CRED_30 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_ARB_MST_AVAIL_CRED_31 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC4_QM1_ARB_MST_CHOISE_PUSH_OFST_23 & ~0xFFF) +
+			PROT_BITS_OFFS;
+	word_offset = ((mmNIC4_QM1_ARB_MST_CHOISE_PUSH_OFST_23 &
+			PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC4_QM1_ARB_SLV_CHOISE_WDT & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_ARB_MSG_MAX_INFLIGHT & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_ARB_MSG_AWUSER_31_11 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_ARB_MSG_AWUSER_SEC_PROP & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_ARB_MSG_AWUSER_NON_SEC_PROP & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC4_QM1_ARB_STATE_STS & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC4_QM1_ARB_STATE_STS & PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC4_QM1_ARB_STATE_STS & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_ARB_CHOISE_FULLNESS_STS & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_ARB_MSG_STS & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_ARB_SLV_CHOISE_Q_HEAD & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_ARB_ERR_CAUSE & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_ARB_ERR_MSG_EN & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_ARB_ERR_STS_DRP & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_ARB_MST_CRED_STS_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_ARB_MST_CRED_STS_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_ARB_MST_CRED_STS_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_ARB_MST_CRED_STS_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_ARB_MST_CRED_STS_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_ARB_MST_CRED_STS_5 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_ARB_MST_CRED_STS_6 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_ARB_MST_CRED_STS_7 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_ARB_MST_CRED_STS_8 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_ARB_MST_CRED_STS_9 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_ARB_MST_CRED_STS_10 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_ARB_MST_CRED_STS_11 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_ARB_MST_CRED_STS_12 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_ARB_MST_CRED_STS_13 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_ARB_MST_CRED_STS_14 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_ARB_MST_CRED_STS_15 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_ARB_MST_CRED_STS_16 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_ARB_MST_CRED_STS_17 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_ARB_MST_CRED_STS_18 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_ARB_MST_CRED_STS_19 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC4_QM1_ARB_MST_CRED_STS_20 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC4_QM1_ARB_MST_CRED_STS_20 & PROT_BITS_OFFS)
+			>> 7) << 2;
+	mask = 1U << ((mmNIC4_QM1_ARB_MST_CRED_STS_20 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_ARB_MST_CRED_STS_21 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_ARB_MST_CRED_STS_22 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_ARB_MST_CRED_STS_23 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_ARB_MST_CRED_STS_24 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_ARB_MST_CRED_STS_25 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_ARB_MST_CRED_STS_26 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_ARB_MST_CRED_STS_27 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_ARB_MST_CRED_STS_28 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_ARB_MST_CRED_STS_29 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_ARB_MST_CRED_STS_30 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_ARB_MST_CRED_STS_31 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CGM_CFG & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CGM_STS & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CGM_CFG1 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC4_QM1_LOCAL_RANGE_BASE & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC4_QM1_LOCAL_RANGE_BASE & PROT_BITS_OFFS)
+			>> 7) << 2;
+	mask = 1U << ((mmNIC4_QM1_LOCAL_RANGE_BASE & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_LOCAL_RANGE_SIZE & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CSMR_STRICT_PRIO_CFG & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_HBW_RD_RATE_LIM_CFG_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_LBW_WR_RATE_LIM_CFG_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_LBW_WR_RATE_LIM_CFG_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_HBW_RD_RATE_LIM_CFG_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_GLBL_AXCACHE & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_IND_GW_APB_CFG & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_IND_GW_APB_WDATA & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_IND_GW_APB_RDATA & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_IND_GW_APB_STATUS & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_GLBL_ERR_ADDR_LO & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_GLBL_ERR_ADDR_HI & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_GLBL_ERR_WDATA & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC4_QM1_GLBL_MEM_INIT_BUSY & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC4_QM1_GLBL_MEM_INIT_BUSY & PROT_BITS_OFFS)
+			>> 7) << 2;
+	mask = 1U << ((mmNIC4_QM1_GLBL_MEM_INIT_BUSY & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+}
+
 static void gaudi_init_tpc_protection_bits(struct hl_device *hdev)
 {
 	u32 pb_addr, mask;
@@ -8861,6 +12832,8 @@ static void gaudi_init_protection_bits(struct hl_device *hdev)
 
 	gaudi_init_mme_protection_bits(hdev);
 
+	gaudi_init_nic_protection_bits(hdev);
+
 	gaudi_init_tpc_protection_bits(hdev);
 }
 
-- 
2.17.1
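
Note on the protection-bit writes in gaudi_init_nic_protection_bits():
every block there repeats one address computation, which can be read in
isolation. The sketch below is a standalone illustration, not driver code.
The value of PROT_BITS_OFFS is an assumption (it is not shown in this
patch), and pb_locate() and struct pb_loc are hypothetical names introduced
only for this example.

```c
#include <stdint.h>

/* Assumed value for illustration; the driver's own definition is authoritative. */
#define PROT_BITS_OFFS 0xF80u

/* Hypothetical helper type: where the protection bit of a register lives. */
struct pb_loc {
	uint32_t pb_addr;     /* start of the protection-bit area of the 4 KB block */
	uint32_t word_offset; /* byte offset of the 32-bit word covering the register */
	uint32_t bit;         /* bit index of the register inside that word */
};

/* Mirrors the three expressions used before each WREG32() in the patch. */
static struct pb_loc pb_locate(uint32_t reg)
{
	struct pb_loc loc;

	/* Round down to the 4 KB block, then add the protection-bit offset. */
	loc.pb_addr = (reg & ~0xFFFu) + PROT_BITS_OFFS;
	/* Each 32-bit word covers 128 bytes of registers (32 regs * 4 bytes). */
	loc.word_offset = ((reg & PROT_BITS_OFFS) >> 7) << 2;
	/* One bit per 4-byte register within that 128-byte window. */
	loc.bit = (reg & 0x7Fu) >> 2;

	return loc;
}

/* OR together the bits of several registers that share one word. */
static uint32_t pb_mask(const uint32_t *regs, int count)
{
	uint32_t mask = 0;
	int i;

	for (i = 0; i < count; i++)
		mask |= 1U << pb_locate(regs[i]).bit;

	return mask;
}
```

For a made-up register address 0xE2B184, pb_locate() gives pb_addr
0xE2BF80, word_offset 0xC and bit 1. The patch then writes the complement
of the accumulated mask, so the bits of the listed registers are the ones
left cleared in that word.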



* [PATCH v3 04/14] habanalabs/gaudi: add support for NIC QMANs
  2020-09-15 17:10 [PATCH v3 00/14] Adding GAUDI NIC code to habanalabs driver Oded Gabbay
  2020-09-15 17:10 ` [PATCH v3 02/14] habanalabs/gaudi: add NIC firmware-related definitions Oded Gabbay
  2020-09-15 17:10 ` [PATCH v3 03/14] habanalabs/gaudi: add NIC security configuration Oded Gabbay
@ 2020-09-15 17:10 ` Oded Gabbay
  2020-09-15 17:10 ` [PATCH v3 05/14] habanalabs/gaudi: add NIC Ethernet support Oded Gabbay
                   ` (12 subsequent siblings)
  15 siblings, 0 replies; 83+ messages in thread
From: Oded Gabbay @ 2020-09-15 17:10 UTC (permalink / raw)
  To: linux-kernel, netdev
  Cc: SW_Drivers, gregkh, davem, kuba, andrew, f.fainelli, Omer Shpigelman

From: Omer Shpigelman <oshpigelman@habana.ai>

Initialize the QMANs that are responsible for submitting doorbells to the
NIC engines. Add support for stopping and disabling them, and reset them
as part of the hard-reset procedure of GAUDI. This will allow the user to
submit work to the NICs.

Add support for receiving events on QMAN errors from the firmware.

However, nic_ports_mask is still initialized to 0, so this code won't
initialize the QMANs just yet. That will be enabled in a later patch.

Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
---
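
Note on how nic_ports_mask gates the NIC queues: the init loop sets one
capability bit per enabled port, and the CB parser later rejects a NIC
queue whose engine has no capability bit. The sketch below is a standalone
model of that relationship, not driver code. CAP_NIC_SHIFT and
QUEUE_ID_NIC_0_0 are placeholder values, and both function names are
hypothetical.

```c
#include <stdint.h>

#define NIC_ENGINES      10
#define QMAN_STREAMS     4
#define CAP_NIC_SHIFT    14   /* placeholder, not the driver's value */
#define QUEUE_ID_NIC_0_0 100  /* placeholder, not the driver's value */

/* One capability bit per port that is set in the ports mask. */
static uint64_t caps_from_ports_mask(uint64_t ports_mask)
{
	uint64_t caps = 0;
	int nic_id;

	for (nic_id = 0; nic_id < NIC_ENGINES; nic_id++)
		if (ports_mask & (1ull << nic_id))
			caps |= 1ull << (CAP_NIC_SHIFT + nic_id);

	return caps;
}

/* Returns 1 if queue_id is a NIC queue whose engine was initialized. */
static int nic_queue_enabled(uint64_t caps, uint32_t queue_id)
{
	uint32_t nic_id;

	/* Range check first, so the subtraction below cannot wrap. */
	if (queue_id < QUEUE_ID_NIC_0_0 ||
	    queue_id >= QUEUE_ID_NIC_0_0 + NIC_ENGINES * QMAN_STREAMS)
		return 0;

	/* Four streams per engine, so the engine index is the queue index / 4. */
	nic_id = (queue_id - QUEUE_ID_NIC_0_0) / QMAN_STREAMS;

	return (int)((caps >> (CAP_NIC_SHIFT + nic_id)) & 1);
}
```

With a ports mask of 0, as in this patch, no capability bit is set and
every NIC queue is reported as disabled. With all ten ports set and the
two lowest cleared (the PCI card case in gaudi_late_init()), only the
queues of engines 2 to 9 pass the check.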
 drivers/misc/habanalabs/common/habanalabs.h |   3 +-
 drivers/misc/habanalabs/gaudi/gaudi.c       | 741 ++++++++++++++++++--
 drivers/misc/habanalabs/gaudi/gaudiP.h      |  32 +
 3 files changed, 731 insertions(+), 45 deletions(-)
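
Note on the register offset walk: gaudi_init_nic_qmans() and
gaudi_disable_nic_qmans() advance nic_offset incrementally, while
gaudi_restore_qm_registers() uses a closed form. The sketch below checks
that the two agree, assuming the two offset macros used by the closed form
equal the two deltas used by the walk (their definitions are not visible in
this hunk). The stride values are invented for illustration.

```c
#include <stdint.h>

/* Invented strides; the driver derives the real ones from register addresses. */
#define QMAN_STRIDE  0x2000u   /* QM0 -> QM1 inside one NIC macro */
#define MACRO_STRIDE 0x40000u  /* NICn -> NICn+1 */

/* Incremental walk, following the loop body in gaudi_init_nic_qmans(). */
static uint32_t nic_offset_walk(int nic_id)
{
	uint32_t off = 0;
	int i;

	for (i = 0; i < nic_id; i++) {
		off += QMAN_STRIDE;
		if (i & 1) {
			/* Odd engine done: rewind both QMANs, step to next macro. */
			off -= QMAN_STRIDE * 2;
			off += MACRO_STRIDE;
		}
	}

	return off;
}

/* Closed form, following gaudi_restore_qm_registers(). */
static uint32_t nic_offset_closed(int nic_id)
{
	return (uint32_t)(nic_id >> 1) * MACRO_STRIDE +
	       (uint32_t)(nic_id & 1) * QMAN_STRIDE;
}
```

Under that assumption both functions return the same offset for every
engine index from 0 to 9, so the walk could be replaced by the closed form
wherever the running offset is not otherwise needed.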

diff --git a/drivers/misc/habanalabs/common/habanalabs.h b/drivers/misc/habanalabs/common/habanalabs.h
index eaa9bf3f82a3..146cf14d4d81 100644
--- a/drivers/misc/habanalabs/common/habanalabs.h
+++ b/drivers/misc/habanalabs/common/habanalabs.h
@@ -1574,8 +1574,6 @@ struct hl_mmu_funcs {
  * @pmmu_huge_range: is a different virtual addresses range used for PMMU with
  *                   huge pages.
  * @init_done: is the initialization of the device done.
- * @mmu_enable: is MMU enabled.
- * @mmu_huge_page_opt: is MMU huge pages optimization enabled.
  * @device_cpu_disabled: is the device CPU disabled (due to timeouts)
  * @dma_mask: the dma mask that was set for this device
  * @in_debug: is device under debug. This, together with fpriv_list, enforces
@@ -1691,6 +1689,7 @@ struct hl_device {
 	u8				supports_cb_mapping;
 
 	/* Parameters for bring-up */
+	u64				nic_ports_mask;
 	u8				mmu_enable;
 	u8				mmu_huge_page_opt;
 	u8				cpu_enable;
diff --git a/drivers/misc/habanalabs/gaudi/gaudi.c b/drivers/misc/habanalabs/gaudi/gaudi.c
index 6f7f6ad7a358..ecf89d1e37c8 100644
--- a/drivers/misc/habanalabs/gaudi/gaudi.c
+++ b/drivers/misc/habanalabs/gaudi/gaudi.c
@@ -301,46 +301,46 @@ static enum hl_queue_type gaudi_queue_type[GAUDI_QUEUE_ID_SIZE] = {
 	QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_TPC_7_1 */
 	QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_TPC_7_2 */
 	QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_TPC_7_3 */
-	QUEUE_TYPE_NA,  /* GAUDI_QUEUE_ID_NIC_0_0 */
-	QUEUE_TYPE_NA,  /* GAUDI_QUEUE_ID_NIC_0_1 */
-	QUEUE_TYPE_NA,  /* GAUDI_QUEUE_ID_NIC_0_2 */
-	QUEUE_TYPE_NA,  /* GAUDI_QUEUE_ID_NIC_0_3 */
-	QUEUE_TYPE_NA,  /* GAUDI_QUEUE_ID_NIC_1_0 */
-	QUEUE_TYPE_NA,  /* GAUDI_QUEUE_ID_NIC_1_1 */
-	QUEUE_TYPE_NA,  /* GAUDI_QUEUE_ID_NIC_1_2 */
-	QUEUE_TYPE_NA,  /* GAUDI_QUEUE_ID_NIC_1_3 */
-	QUEUE_TYPE_NA,  /* GAUDI_QUEUE_ID_NIC_2_0 */
-	QUEUE_TYPE_NA,  /* GAUDI_QUEUE_ID_NIC_2_1 */
-	QUEUE_TYPE_NA,  /* GAUDI_QUEUE_ID_NIC_2_2 */
-	QUEUE_TYPE_NA,  /* GAUDI_QUEUE_ID_NIC_2_3 */
-	QUEUE_TYPE_NA,  /* GAUDI_QUEUE_ID_NIC_3_0 */
-	QUEUE_TYPE_NA,  /* GAUDI_QUEUE_ID_NIC_3_1 */
-	QUEUE_TYPE_NA,  /* GAUDI_QUEUE_ID_NIC_3_2 */
-	QUEUE_TYPE_NA,  /* GAUDI_QUEUE_ID_NIC_3_3 */
-	QUEUE_TYPE_NA,  /* GAUDI_QUEUE_ID_NIC_4_0 */
-	QUEUE_TYPE_NA,  /* GAUDI_QUEUE_ID_NIC_4_1 */
-	QUEUE_TYPE_NA,  /* GAUDI_QUEUE_ID_NIC_4_2 */
-	QUEUE_TYPE_NA,  /* GAUDI_QUEUE_ID_NIC_4_3 */
-	QUEUE_TYPE_NA,  /* GAUDI_QUEUE_ID_NIC_5_0 */
-	QUEUE_TYPE_NA,  /* GAUDI_QUEUE_ID_NIC_5_1 */
-	QUEUE_TYPE_NA,  /* GAUDI_QUEUE_ID_NIC_5_2 */
-	QUEUE_TYPE_NA,  /* GAUDI_QUEUE_ID_NIC_5_3 */
-	QUEUE_TYPE_NA,  /* GAUDI_QUEUE_ID_NIC_6_0 */
-	QUEUE_TYPE_NA,  /* GAUDI_QUEUE_ID_NIC_6_1 */
-	QUEUE_TYPE_NA,  /* GAUDI_QUEUE_ID_NIC_6_2 */
-	QUEUE_TYPE_NA,  /* GAUDI_QUEUE_ID_NIC_6_3 */
-	QUEUE_TYPE_NA,  /* GAUDI_QUEUE_ID_NIC_7_0 */
-	QUEUE_TYPE_NA,  /* GAUDI_QUEUE_ID_NIC_7_1 */
-	QUEUE_TYPE_NA,  /* GAUDI_QUEUE_ID_NIC_7_2 */
-	QUEUE_TYPE_NA,  /* GAUDI_QUEUE_ID_NIC_7_3 */
-	QUEUE_TYPE_NA,  /* GAUDI_QUEUE_ID_NIC_8_0 */
-	QUEUE_TYPE_NA,  /* GAUDI_QUEUE_ID_NIC_8_1 */
-	QUEUE_TYPE_NA,  /* GAUDI_QUEUE_ID_NIC_8_2 */
-	QUEUE_TYPE_NA,  /* GAUDI_QUEUE_ID_NIC_8_3 */
-	QUEUE_TYPE_NA,  /* GAUDI_QUEUE_ID_NIC_9_0 */
-	QUEUE_TYPE_NA,  /* GAUDI_QUEUE_ID_NIC_9_1 */
-	QUEUE_TYPE_NA,  /* GAUDI_QUEUE_ID_NIC_9_2 */
-	QUEUE_TYPE_NA,  /* GAUDI_QUEUE_ID_NIC_9_3 */
+	QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_NIC_0_0 */
+	QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_NIC_0_1 */
+	QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_NIC_0_2 */
+	QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_NIC_0_3 */
+	QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_NIC_1_0 */
+	QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_NIC_1_1 */
+	QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_NIC_1_2 */
+	QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_NIC_1_3 */
+	QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_NIC_2_0 */
+	QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_NIC_2_1 */
+	QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_NIC_2_2 */
+	QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_NIC_2_3 */
+	QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_NIC_3_0 */
+	QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_NIC_3_1 */
+	QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_NIC_3_2 */
+	QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_NIC_3_3 */
+	QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_NIC_4_0 */
+	QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_NIC_4_1 */
+	QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_NIC_4_2 */
+	QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_NIC_4_3 */
+	QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_NIC_5_0 */
+	QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_NIC_5_1 */
+	QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_NIC_5_2 */
+	QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_NIC_5_3 */
+	QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_NIC_6_0 */
+	QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_NIC_6_1 */
+	QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_NIC_6_2 */
+	QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_NIC_6_3 */
+	QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_NIC_7_0 */
+	QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_NIC_7_1 */
+	QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_NIC_7_2 */
+	QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_NIC_7_3 */
+	QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_NIC_8_0 */
+	QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_NIC_8_1 */
+	QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_NIC_8_2 */
+	QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_NIC_8_3 */
+	QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_NIC_9_0 */
+	QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_NIC_9_1 */
+	QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_NIC_9_2 */
+	QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_NIC_9_3 */
 };
 
 struct ecc_info_extract_params {
@@ -793,6 +793,27 @@ static int gaudi_late_init(struct hl_device *hdev)
 		return rc;
 	}
 
+	if ((hdev->card_type == cpucp_card_type_pci) &&
+			(hdev->nic_ports_mask & 0x3)) {
+		dev_info(hdev->dev,
+			"PCI card detected, only 8 ports are enabled\n");
+		hdev->nic_ports_mask &= ~0x3;
+
+		/* Stop and disable unused NIC QMANs */
+		WREG32(mmNIC0_QM0_GLBL_CFG1, NIC0_QM0_GLBL_CFG1_PQF_STOP_MASK |
+					NIC0_QM0_GLBL_CFG1_CQF_STOP_MASK |
+					NIC0_QM0_GLBL_CFG1_CP_STOP_MASK);
+
+		WREG32(mmNIC0_QM1_GLBL_CFG1, NIC0_QM0_GLBL_CFG1_PQF_STOP_MASK |
+					NIC0_QM0_GLBL_CFG1_CQF_STOP_MASK |
+					NIC0_QM0_GLBL_CFG1_CP_STOP_MASK);
+
+		WREG32(mmNIC0_QM0_GLBL_CFG0, 0);
+		WREG32(mmNIC0_QM1_GLBL_CFG0, 0);
+
+		gaudi->hw_cap_initialized &= ~(HW_CAP_NIC0 | HW_CAP_NIC1);
+	}
+
 	rc = hl_fw_send_pci_access_msg(hdev, CPUCP_PACKET_ENABLE_PCI_ACCESS);
 	if (rc) {
 		dev_err(hdev->dev, "Failed to enable PCI access from CPU\n");
@@ -939,6 +960,9 @@ static int gaudi_alloc_internal_qmans_pq_mem(struct hl_device *hdev)
 		case GAUDI_QUEUE_ID_TPC_0_0 ... GAUDI_QUEUE_ID_TPC_7_3:
 			q->pq_size = TPC_QMAN_SIZE_IN_BYTES;
 			break;
+		case GAUDI_QUEUE_ID_NIC_0_0 ... GAUDI_QUEUE_ID_NIC_9_3:
+			q->pq_size = NIC_QMAN_SIZE_IN_BYTES;
+			break;
 		default:
 			dev_err(hdev->dev, "Bad internal queue index %d", i);
 			rc = -EINVAL;
@@ -2333,6 +2357,133 @@ static void gaudi_init_tpc_qmans(struct hl_device *hdev)
 	}
 }
 
+static void gaudi_init_nic_qman(struct hl_device *hdev, u32 nic_offset,
+				int qman_id, u64 qman_base_addr, int nic_id)
+{
+	u32 mtr_base_lo, mtr_base_hi;
+	u32 so_base_lo, so_base_hi;
+	u32 q_off;
+	u32 nic_qm_err_cfg;
+
+	mtr_base_lo = lower_32_bits(CFG_BASE +
+				mmSYNC_MNGR_E_N_SYNC_MNGR_OBJS_MON_PAY_ADDRL_0);
+	mtr_base_hi = upper_32_bits(CFG_BASE +
+				mmSYNC_MNGR_E_N_SYNC_MNGR_OBJS_MON_PAY_ADDRL_0);
+	so_base_lo = lower_32_bits(CFG_BASE +
+				mmSYNC_MNGR_E_N_SYNC_MNGR_OBJS_SOB_OBJ_0);
+	so_base_hi = upper_32_bits(CFG_BASE +
+				mmSYNC_MNGR_E_N_SYNC_MNGR_OBJS_SOB_OBJ_0);
+
+	q_off = nic_offset + qman_id * 4;
+
+	WREG32(mmNIC0_QM0_PQ_BASE_LO_0 + q_off, lower_32_bits(qman_base_addr));
+	WREG32(mmNIC0_QM0_PQ_BASE_HI_0 + q_off, upper_32_bits(qman_base_addr));
+
+	WREG32(mmNIC0_QM0_PQ_SIZE_0 + q_off, ilog2(NIC_QMAN_LENGTH));
+	WREG32(mmNIC0_QM0_PQ_PI_0 + q_off, 0);
+	WREG32(mmNIC0_QM0_PQ_CI_0 + q_off, 0);
+
+	WREG32(mmNIC0_QM0_CP_LDMA_TSIZE_OFFSET_0 + q_off, 0x74);
+	WREG32(mmNIC0_QM0_CP_LDMA_SRC_BASE_LO_OFFSET_0 + q_off, 0x14);
+	WREG32(mmNIC0_QM0_CP_LDMA_DST_BASE_LO_OFFSET_0 + q_off, 0x1C);
+
+	WREG32(mmNIC0_QM0_CP_MSG_BASE0_ADDR_LO_0 + q_off, mtr_base_lo);
+	WREG32(mmNIC0_QM0_CP_MSG_BASE0_ADDR_HI_0 + q_off, mtr_base_hi);
+	WREG32(mmNIC0_QM0_CP_MSG_BASE1_ADDR_LO_0 + q_off, so_base_lo);
+	WREG32(mmNIC0_QM0_CP_MSG_BASE1_ADDR_HI_0 + q_off, so_base_hi);
+
+	if (qman_id == 0) {
+		/* Configure RAZWI IRQ */
+		nic_qm_err_cfg = NIC_QMAN_GLBL_ERR_CFG_MSG_EN_MASK;
+		if (hdev->stop_on_err) {
+			nic_qm_err_cfg |=
+				NIC_QMAN_GLBL_ERR_CFG_STOP_ON_ERR_EN_MASK;
+		}
+
+		WREG32(mmNIC0_QM0_GLBL_ERR_CFG + nic_offset, nic_qm_err_cfg);
+		WREG32(mmNIC0_QM0_GLBL_ERR_ADDR_LO + nic_offset,
+			lower_32_bits(CFG_BASE +
+				mmGIC_DISTRIBUTOR__5_GICD_SETSPI_NSR));
+		WREG32(mmNIC0_QM0_GLBL_ERR_ADDR_HI + nic_offset,
+			upper_32_bits(CFG_BASE +
+				mmGIC_DISTRIBUTOR__5_GICD_SETSPI_NSR));
+		WREG32(mmNIC0_QM0_GLBL_ERR_WDATA + nic_offset,
+			gaudi_irq_map_table[GAUDI_EVENT_NIC0_QM0].cpu_id +
+									nic_id);
+
+		WREG32(mmNIC0_QM0_ARB_ERR_MSG_EN + nic_offset,
+				QM_ARB_ERR_MSG_EN_MASK);
+
+		/* Increase ARB WDT to support streams architecture */
+		WREG32(mmNIC0_QM0_ARB_SLV_CHOISE_WDT + nic_offset,
+				GAUDI_ARB_WDT_TIMEOUT);
+
+		WREG32(mmNIC0_QM0_GLBL_CFG1 + nic_offset, 0);
+		WREG32(mmNIC0_QM0_GLBL_PROT + nic_offset,
+				QMAN_INTERNAL_MAKE_TRUSTED);
+	}
+}
+
+/**
+ * gaudi_init_nic_qmans - Initialize NIC QMAN registers
+ *
+ * @hdev: pointer to hl_device structure
+ *
+ * Initialize the H/W registers of the NIC QMANs
+ *
+ */
+void gaudi_init_nic_qmans(struct hl_device *hdev)
+{
+	struct gaudi_device *gaudi = hdev->asic_specific;
+	struct gaudi_internal_qman_info *q;
+	u64 qman_base_addr;
+	u32 nic_offset = 0;
+	u32 nic_delta_between_qmans =
+			mmNIC0_QM1_GLBL_CFG0 - mmNIC0_QM0_GLBL_CFG0;
+	u32 nic_delta_between_nics =
+			mmNIC1_QM0_GLBL_CFG0 - mmNIC0_QM0_GLBL_CFG0;
+	int i, nic_id, internal_q_index;
+
+	if (!hdev->nic_ports_mask)
+		return;
+
+	if (gaudi->hw_cap_initialized & HW_CAP_NIC_MASK)
+		return;
+
+	dev_dbg(hdev->dev, "Initializing NIC QMANs\n");
+
+	for (nic_id = 0 ; nic_id < NIC_NUMBER_OF_ENGINES ; nic_id++) {
+		if (!(hdev->nic_ports_mask & (1 << nic_id))) {
+			nic_offset += nic_delta_between_qmans;
+			if (nic_id & 1) {
+				nic_offset -= (nic_delta_between_qmans * 2);
+				nic_offset += nic_delta_between_nics;
+			}
+			continue;
+		}
+
+		for (i = 0 ; i < QMAN_STREAMS ; i++) {
+			internal_q_index = GAUDI_QUEUE_ID_NIC_0_0 +
+						nic_id * QMAN_STREAMS + i;
+			q = &gaudi->internal_qmans[internal_q_index];
+			qman_base_addr = (u64) q->pq_dma_addr;
+			gaudi_init_nic_qman(hdev, nic_offset, (i & 0x3),
+						qman_base_addr, nic_id);
+		}
+
+		/* Enable the QMAN */
+		WREG32(mmNIC0_QM0_GLBL_CFG0 + nic_offset, NIC_QMAN_ENABLE);
+
+		nic_offset += nic_delta_between_qmans;
+		if (nic_id & 1) {
+			nic_offset -= (nic_delta_between_qmans * 2);
+			nic_offset += nic_delta_between_nics;
+		}
+
+		gaudi->hw_cap_initialized |= 1 << (HW_CAP_NIC_SHIFT + nic_id);
+	}
+}
+
 static void gaudi_disable_pci_dma_qmans(struct hl_device *hdev)
 {
 	struct gaudi_device *gaudi = hdev->asic_specific;
@@ -2385,6 +2536,30 @@ static void gaudi_disable_tpc_qmans(struct hl_device *hdev)
 	}
 }
 
+static void gaudi_disable_nic_qmans(struct hl_device *hdev)
+{
+	struct gaudi_device *gaudi = hdev->asic_specific;
+	u32 nic_mask, nic_offset = 0;
+	u32 nic_delta_between_qmans =
+			mmNIC0_QM1_GLBL_CFG0 - mmNIC0_QM0_GLBL_CFG0;
+	u32 nic_delta_between_nics =
+			mmNIC1_QM0_GLBL_CFG0 - mmNIC0_QM0_GLBL_CFG0;
+	int nic_id;
+
+	for (nic_id = 0 ; nic_id < NIC_NUMBER_OF_ENGINES ; nic_id++) {
+		nic_mask = 1 << (HW_CAP_NIC_SHIFT + nic_id);
+
+		if (gaudi->hw_cap_initialized & nic_mask)
+			WREG32(mmNIC0_QM0_GLBL_CFG0 + nic_offset, 0);
+
+		nic_offset += nic_delta_between_qmans;
+		if (nic_id & 1) {
+			nic_offset -= (nic_delta_between_qmans * 2);
+			nic_offset += nic_delta_between_nics;
+		}
+	}
+}
+
 static void gaudi_stop_pci_dma_qmans(struct hl_device *hdev)
 {
 	struct gaudi_device *gaudi = hdev->asic_specific;
@@ -2443,6 +2618,73 @@ static void gaudi_stop_tpc_qmans(struct hl_device *hdev)
 	WREG32(mmTPC7_QM_GLBL_CFG1, 0x1F << TPC0_QM_GLBL_CFG1_CP_STOP_SHIFT);
 }
 
+static void gaudi_stop_nic_qmans(struct hl_device *hdev)
+{
+	struct gaudi_device *gaudi = hdev->asic_specific;
+
+	/* Stop upper CPs of QMANs */
+
+	if (gaudi->hw_cap_initialized & HW_CAP_NIC0)
+		WREG32(mmNIC0_QM0_GLBL_CFG1,
+				NIC0_QM0_GLBL_CFG1_PQF_STOP_MASK |
+				NIC0_QM0_GLBL_CFG1_CQF_STOP_MASK |
+				NIC0_QM0_GLBL_CFG1_CP_STOP_MASK);
+
+	if (gaudi->hw_cap_initialized & HW_CAP_NIC1)
+		WREG32(mmNIC0_QM1_GLBL_CFG1,
+				NIC0_QM0_GLBL_CFG1_PQF_STOP_MASK |
+				NIC0_QM0_GLBL_CFG1_CQF_STOP_MASK |
+				NIC0_QM0_GLBL_CFG1_CP_STOP_MASK);
+
+	if (gaudi->hw_cap_initialized & HW_CAP_NIC2)
+		WREG32(mmNIC1_QM0_GLBL_CFG1,
+				NIC0_QM0_GLBL_CFG1_PQF_STOP_MASK |
+				NIC0_QM0_GLBL_CFG1_CQF_STOP_MASK |
+				NIC0_QM0_GLBL_CFG1_CP_STOP_MASK);
+
+	if (gaudi->hw_cap_initialized & HW_CAP_NIC3)
+		WREG32(mmNIC1_QM1_GLBL_CFG1,
+				NIC0_QM0_GLBL_CFG1_PQF_STOP_MASK |
+				NIC0_QM0_GLBL_CFG1_CQF_STOP_MASK |
+				NIC0_QM0_GLBL_CFG1_CP_STOP_MASK);
+
+	if (gaudi->hw_cap_initialized & HW_CAP_NIC4)
+		WREG32(mmNIC2_QM0_GLBL_CFG1,
+				NIC0_QM0_GLBL_CFG1_PQF_STOP_MASK |
+				NIC0_QM0_GLBL_CFG1_CQF_STOP_MASK |
+				NIC0_QM0_GLBL_CFG1_CP_STOP_MASK);
+
+	if (gaudi->hw_cap_initialized & HW_CAP_NIC5)
+		WREG32(mmNIC2_QM1_GLBL_CFG1,
+				NIC0_QM0_GLBL_CFG1_PQF_STOP_MASK |
+				NIC0_QM0_GLBL_CFG1_CQF_STOP_MASK |
+				NIC0_QM0_GLBL_CFG1_CP_STOP_MASK);
+
+	if (gaudi->hw_cap_initialized & HW_CAP_NIC6)
+		WREG32(mmNIC3_QM0_GLBL_CFG1,
+				NIC0_QM0_GLBL_CFG1_PQF_STOP_MASK |
+				NIC0_QM0_GLBL_CFG1_CQF_STOP_MASK |
+				NIC0_QM0_GLBL_CFG1_CP_STOP_MASK);
+
+	if (gaudi->hw_cap_initialized & HW_CAP_NIC7)
+		WREG32(mmNIC3_QM1_GLBL_CFG1,
+				NIC0_QM0_GLBL_CFG1_PQF_STOP_MASK |
+				NIC0_QM0_GLBL_CFG1_CQF_STOP_MASK |
+				NIC0_QM0_GLBL_CFG1_CP_STOP_MASK);
+
+	if (gaudi->hw_cap_initialized & HW_CAP_NIC8)
+		WREG32(mmNIC4_QM0_GLBL_CFG1,
+				NIC0_QM0_GLBL_CFG1_PQF_STOP_MASK |
+				NIC0_QM0_GLBL_CFG1_CQF_STOP_MASK |
+				NIC0_QM0_GLBL_CFG1_CP_STOP_MASK);
+
+	if (gaudi->hw_cap_initialized & HW_CAP_NIC9)
+		WREG32(mmNIC4_QM1_GLBL_CFG1,
+				NIC0_QM0_GLBL_CFG1_PQF_STOP_MASK |
+				NIC0_QM0_GLBL_CFG1_CQF_STOP_MASK |
+				NIC0_QM0_GLBL_CFG1_CP_STOP_MASK);
+}
+
 static void gaudi_pci_dma_stall(struct hl_device *hdev)
 {
 	struct gaudi_device *gaudi = hdev->asic_specific;
@@ -2632,6 +2874,7 @@ static void gaudi_halt_engines(struct hl_device *hdev, bool hard_reset)
 	else
 		wait_timeout_ms = GAUDI_RESET_WAIT_MSEC;
 
+	gaudi_stop_nic_qmans(hdev);
 
 	gaudi_stop_mme_qmans(hdev);
 	gaudi_stop_tpc_qmans(hdev);
@@ -2649,6 +2892,7 @@ static void gaudi_halt_engines(struct hl_device *hdev, bool hard_reset)
 
 	msleep(wait_timeout_ms);
 
+	gaudi_disable_nic_qmans(hdev);
 	gaudi_disable_mme_qmans(hdev);
 	gaudi_disable_tpc_qmans(hdev);
 	gaudi_disable_hbm_dma_qmans(hdev);
@@ -2964,11 +3208,13 @@ static int gaudi_hw_init(struct hl_device *hdev)
 
 	gaudi_init_tpc_qmans(hdev);
 
+	gaudi_init_nic_qmans(hdev);
+
 	hdev->asic_funcs->set_clock_gating(hdev);
 
 	gaudi_enable_timestamp(hdev);
 
-	/* MSI must be enabled before CPU queues are initialized */
+	/* MSI must be enabled before CPU queues and NIC are initialized */
 	rc = gaudi_enable_msi(hdev);
 	if (rc)
 		goto disable_queues;
@@ -3067,7 +3313,7 @@ static void gaudi_hw_fini(struct hl_device *hdev, bool hard_reset)
 					HW_CAP_HBM | HW_CAP_PCI_DMA |
 					HW_CAP_MME | HW_CAP_TPC_MASK |
 					HW_CAP_HBM_DMA | HW_CAP_PLL |
-					HW_CAP_MMU |
+					HW_CAP_NIC_MASK | HW_CAP_MMU |
 					HW_CAP_SRAM_SCRAMBLER |
 					HW_CAP_HBM_SCRAMBLER |
 					HW_CAP_CLK_GATE);
@@ -3337,6 +3583,166 @@ static void gaudi_ring_doorbell(struct hl_device *hdev, u32 hw_queue_id, u32 pi)
 		db_reg_offset = mmTPC7_QM_PQ_PI_3;
 		break;
 
+	case GAUDI_QUEUE_ID_NIC_0_0:
+		db_reg_offset = mmNIC0_QM0_PQ_PI_0;
+		break;
+
+	case GAUDI_QUEUE_ID_NIC_0_1:
+		db_reg_offset = mmNIC0_QM0_PQ_PI_1;
+		break;
+
+	case GAUDI_QUEUE_ID_NIC_0_2:
+		db_reg_offset = mmNIC0_QM0_PQ_PI_2;
+		break;
+
+	case GAUDI_QUEUE_ID_NIC_0_3:
+		db_reg_offset = mmNIC0_QM0_PQ_PI_3;
+		break;
+
+	case GAUDI_QUEUE_ID_NIC_1_0:
+		db_reg_offset = mmNIC0_QM1_PQ_PI_0;
+		break;
+
+	case GAUDI_QUEUE_ID_NIC_1_1:
+		db_reg_offset = mmNIC0_QM1_PQ_PI_1;
+		break;
+
+	case GAUDI_QUEUE_ID_NIC_1_2:
+		db_reg_offset = mmNIC0_QM1_PQ_PI_2;
+		break;
+
+	case GAUDI_QUEUE_ID_NIC_1_3:
+		db_reg_offset = mmNIC0_QM1_PQ_PI_3;
+		break;
+
+	case GAUDI_QUEUE_ID_NIC_2_0:
+		db_reg_offset = mmNIC1_QM0_PQ_PI_0;
+		break;
+
+	case GAUDI_QUEUE_ID_NIC_2_1:
+		db_reg_offset = mmNIC1_QM0_PQ_PI_1;
+		break;
+
+	case GAUDI_QUEUE_ID_NIC_2_2:
+		db_reg_offset = mmNIC1_QM0_PQ_PI_2;
+		break;
+
+	case GAUDI_QUEUE_ID_NIC_2_3:
+		db_reg_offset = mmNIC1_QM0_PQ_PI_3;
+		break;
+
+	case GAUDI_QUEUE_ID_NIC_3_0:
+		db_reg_offset = mmNIC1_QM1_PQ_PI_0;
+		break;
+
+	case GAUDI_QUEUE_ID_NIC_3_1:
+		db_reg_offset = mmNIC1_QM1_PQ_PI_1;
+		break;
+
+	case GAUDI_QUEUE_ID_NIC_3_2:
+		db_reg_offset = mmNIC1_QM1_PQ_PI_2;
+		break;
+
+	case GAUDI_QUEUE_ID_NIC_3_3:
+		db_reg_offset = mmNIC1_QM1_PQ_PI_3;
+		break;
+
+	case GAUDI_QUEUE_ID_NIC_4_0:
+		db_reg_offset = mmNIC2_QM0_PQ_PI_0;
+		break;
+
+	case GAUDI_QUEUE_ID_NIC_4_1:
+		db_reg_offset = mmNIC2_QM0_PQ_PI_1;
+		break;
+
+	case GAUDI_QUEUE_ID_NIC_4_2:
+		db_reg_offset = mmNIC2_QM0_PQ_PI_2;
+		break;
+
+	case GAUDI_QUEUE_ID_NIC_4_3:
+		db_reg_offset = mmNIC2_QM0_PQ_PI_3;
+		break;
+
+	case GAUDI_QUEUE_ID_NIC_5_0:
+		db_reg_offset = mmNIC2_QM1_PQ_PI_0;
+		break;
+
+	case GAUDI_QUEUE_ID_NIC_5_1:
+		db_reg_offset = mmNIC2_QM1_PQ_PI_1;
+		break;
+
+	case GAUDI_QUEUE_ID_NIC_5_2:
+		db_reg_offset = mmNIC2_QM1_PQ_PI_2;
+		break;
+
+	case GAUDI_QUEUE_ID_NIC_5_3:
+		db_reg_offset = mmNIC2_QM1_PQ_PI_3;
+		break;
+
+	case GAUDI_QUEUE_ID_NIC_6_0:
+		db_reg_offset = mmNIC3_QM0_PQ_PI_0;
+		break;
+
+	case GAUDI_QUEUE_ID_NIC_6_1:
+		db_reg_offset = mmNIC3_QM0_PQ_PI_1;
+		break;
+
+	case GAUDI_QUEUE_ID_NIC_6_2:
+		db_reg_offset = mmNIC3_QM0_PQ_PI_2;
+		break;
+
+	case GAUDI_QUEUE_ID_NIC_6_3:
+		db_reg_offset = mmNIC3_QM0_PQ_PI_3;
+		break;
+
+	case GAUDI_QUEUE_ID_NIC_7_0:
+		db_reg_offset = mmNIC3_QM1_PQ_PI_0;
+		break;
+
+	case GAUDI_QUEUE_ID_NIC_7_1:
+		db_reg_offset = mmNIC3_QM1_PQ_PI_1;
+		break;
+
+	case GAUDI_QUEUE_ID_NIC_7_2:
+		db_reg_offset = mmNIC3_QM1_PQ_PI_2;
+		break;
+
+	case GAUDI_QUEUE_ID_NIC_7_3:
+		db_reg_offset = mmNIC3_QM1_PQ_PI_3;
+		break;
+
+	case GAUDI_QUEUE_ID_NIC_8_0:
+		db_reg_offset = mmNIC4_QM0_PQ_PI_0;
+		break;
+
+	case GAUDI_QUEUE_ID_NIC_8_1:
+		db_reg_offset = mmNIC4_QM0_PQ_PI_1;
+		break;
+
+	case GAUDI_QUEUE_ID_NIC_8_2:
+		db_reg_offset = mmNIC4_QM0_PQ_PI_2;
+		break;
+
+	case GAUDI_QUEUE_ID_NIC_8_3:
+		db_reg_offset = mmNIC4_QM0_PQ_PI_3;
+		break;
+
+	case GAUDI_QUEUE_ID_NIC_9_0:
+		db_reg_offset = mmNIC4_QM1_PQ_PI_0;
+		break;
+
+	case GAUDI_QUEUE_ID_NIC_9_1:
+		db_reg_offset = mmNIC4_QM1_PQ_PI_1;
+		break;
+
+	case GAUDI_QUEUE_ID_NIC_9_2:
+		db_reg_offset = mmNIC4_QM1_PQ_PI_2;
+		break;
+
+	case GAUDI_QUEUE_ID_NIC_9_3:
+		db_reg_offset = mmNIC4_QM1_PQ_PI_3;
+		break;
+
 	default:
 		invalid_queue = true;
 	}
@@ -4233,6 +4639,17 @@ static int gaudi_parse_cb_no_ext_queue(struct hl_device *hdev,
 					struct hl_cs_parser *parser)
 {
 	struct asic_fixed_properties *asic_prop = &hdev->asic_prop;
+	struct gaudi_device *gaudi = hdev->asic_specific;
+	u32 nic_mask_q_id = 1 << (HW_CAP_NIC_SHIFT +
+		((parser->hw_queue_id - GAUDI_QUEUE_ID_NIC_0_0) >> 2));
+
+	if ((parser->hw_queue_id >= GAUDI_QUEUE_ID_NIC_0_0) &&
+			(parser->hw_queue_id <= GAUDI_QUEUE_ID_NIC_9_3) &&
+			(!(gaudi->hw_cap_initialized & nic_mask_q_id))) {
+		dev_err(hdev->dev, "h/w queue %d is disabled\n",
+				parser->hw_queue_id);
+		return -EINVAL;
+	}
 
 	/* For internal queue jobs just check if CB address is valid */
 	if (hl_mem_area_inside_range((u64) (uintptr_t) parser->user_cb,
@@ -4466,6 +4883,12 @@ static void gaudi_restore_qm_registers(struct hl_device *hdev)
 		qman_offset = i * TPC_QMAN_OFFSET;
 		WREG32(mmTPC0_QM_ARB_CFG_0 + qman_offset, 0);
 	}
+
+	for (i = 0 ; i < NIC_NUMBER_OF_ENGINES ; i++) {
+		qman_offset = (i >> 1) * NIC_MACRO_QMAN_OFFSET +
+				(i & 0x1) * NIC_ENGINE_QMAN_OFFSET;
+		WREG32(mmNIC0_QM0_ARB_CFG_0 + qman_offset, 0);
+	}
 }
 
 static void gaudi_restore_user_registers(struct hl_device *hdev)
@@ -4900,6 +5323,136 @@ static void gaudi_mmu_prepare(struct hl_device *hdev, u32 asid)
 	gaudi_mmu_prepare_reg(hdev, mmMME2_ACC_WBC, asid);
 	gaudi_mmu_prepare_reg(hdev, mmMME3_ACC_WBC, asid);
 
+	if (hdev->nic_ports_mask & GAUDI_NIC_MASK_NIC0) {
+		gaudi_mmu_prepare_reg(hdev, mmNIC0_QM0_GLBL_NON_SECURE_PROPS_0,
+				asid);
+		gaudi_mmu_prepare_reg(hdev, mmNIC0_QM0_GLBL_NON_SECURE_PROPS_1,
+				asid);
+		gaudi_mmu_prepare_reg(hdev, mmNIC0_QM0_GLBL_NON_SECURE_PROPS_2,
+				asid);
+		gaudi_mmu_prepare_reg(hdev, mmNIC0_QM0_GLBL_NON_SECURE_PROPS_3,
+				asid);
+		gaudi_mmu_prepare_reg(hdev, mmNIC0_QM0_GLBL_NON_SECURE_PROPS_4,
+				asid);
+	}
+
+	if (hdev->nic_ports_mask & GAUDI_NIC_MASK_NIC1) {
+		gaudi_mmu_prepare_reg(hdev, mmNIC0_QM1_GLBL_NON_SECURE_PROPS_0,
+				asid);
+		gaudi_mmu_prepare_reg(hdev, mmNIC0_QM1_GLBL_NON_SECURE_PROPS_1,
+				asid);
+		gaudi_mmu_prepare_reg(hdev, mmNIC0_QM1_GLBL_NON_SECURE_PROPS_2,
+				asid);
+		gaudi_mmu_prepare_reg(hdev, mmNIC0_QM1_GLBL_NON_SECURE_PROPS_3,
+				asid);
+		gaudi_mmu_prepare_reg(hdev, mmNIC0_QM1_GLBL_NON_SECURE_PROPS_4,
+				asid);
+	}
+
+	if (hdev->nic_ports_mask & GAUDI_NIC_MASK_NIC2) {
+		gaudi_mmu_prepare_reg(hdev, mmNIC1_QM0_GLBL_NON_SECURE_PROPS_0,
+				asid);
+		gaudi_mmu_prepare_reg(hdev, mmNIC1_QM0_GLBL_NON_SECURE_PROPS_1,
+				asid);
+		gaudi_mmu_prepare_reg(hdev, mmNIC1_QM0_GLBL_NON_SECURE_PROPS_2,
+				asid);
+		gaudi_mmu_prepare_reg(hdev, mmNIC1_QM0_GLBL_NON_SECURE_PROPS_3,
+				asid);
+		gaudi_mmu_prepare_reg(hdev, mmNIC1_QM0_GLBL_NON_SECURE_PROPS_4,
+				asid);
+	}
+
+	if (hdev->nic_ports_mask & GAUDI_NIC_MASK_NIC3) {
+		gaudi_mmu_prepare_reg(hdev, mmNIC1_QM1_GLBL_NON_SECURE_PROPS_0,
+				asid);
+		gaudi_mmu_prepare_reg(hdev, mmNIC1_QM1_GLBL_NON_SECURE_PROPS_1,
+				asid);
+		gaudi_mmu_prepare_reg(hdev, mmNIC1_QM1_GLBL_NON_SECURE_PROPS_2,
+				asid);
+		gaudi_mmu_prepare_reg(hdev, mmNIC1_QM1_GLBL_NON_SECURE_PROPS_3,
+				asid);
+		gaudi_mmu_prepare_reg(hdev, mmNIC1_QM1_GLBL_NON_SECURE_PROPS_4,
+				asid);
+	}
+
+	if (hdev->nic_ports_mask & GAUDI_NIC_MASK_NIC4) {
+		gaudi_mmu_prepare_reg(hdev, mmNIC2_QM0_GLBL_NON_SECURE_PROPS_0,
+				asid);
+		gaudi_mmu_prepare_reg(hdev, mmNIC2_QM0_GLBL_NON_SECURE_PROPS_1,
+				asid);
+		gaudi_mmu_prepare_reg(hdev, mmNIC2_QM0_GLBL_NON_SECURE_PROPS_2,
+				asid);
+		gaudi_mmu_prepare_reg(hdev, mmNIC2_QM0_GLBL_NON_SECURE_PROPS_3,
+				asid);
+		gaudi_mmu_prepare_reg(hdev, mmNIC2_QM0_GLBL_NON_SECURE_PROPS_4,
+				asid);
+	}
+
+	if (hdev->nic_ports_mask & GAUDI_NIC_MASK_NIC5) {
+		gaudi_mmu_prepare_reg(hdev, mmNIC2_QM1_GLBL_NON_SECURE_PROPS_0,
+				asid);
+		gaudi_mmu_prepare_reg(hdev, mmNIC2_QM1_GLBL_NON_SECURE_PROPS_1,
+				asid);
+		gaudi_mmu_prepare_reg(hdev, mmNIC2_QM1_GLBL_NON_SECURE_PROPS_2,
+				asid);
+		gaudi_mmu_prepare_reg(hdev, mmNIC2_QM1_GLBL_NON_SECURE_PROPS_3,
+				asid);
+		gaudi_mmu_prepare_reg(hdev, mmNIC2_QM1_GLBL_NON_SECURE_PROPS_4,
+				asid);
+	}
+
+	if (hdev->nic_ports_mask & GAUDI_NIC_MASK_NIC6) {
+		gaudi_mmu_prepare_reg(hdev, mmNIC3_QM0_GLBL_NON_SECURE_PROPS_0,
+				asid);
+		gaudi_mmu_prepare_reg(hdev, mmNIC3_QM0_GLBL_NON_SECURE_PROPS_1,
+				asid);
+		gaudi_mmu_prepare_reg(hdev, mmNIC3_QM0_GLBL_NON_SECURE_PROPS_2,
+				asid);
+		gaudi_mmu_prepare_reg(hdev, mmNIC3_QM0_GLBL_NON_SECURE_PROPS_3,
+				asid);
+		gaudi_mmu_prepare_reg(hdev, mmNIC3_QM0_GLBL_NON_SECURE_PROPS_4,
+				asid);
+	}
+
+	if (hdev->nic_ports_mask & GAUDI_NIC_MASK_NIC7) {
+		gaudi_mmu_prepare_reg(hdev, mmNIC3_QM1_GLBL_NON_SECURE_PROPS_0,
+				asid);
+		gaudi_mmu_prepare_reg(hdev, mmNIC3_QM1_GLBL_NON_SECURE_PROPS_1,
+				asid);
+		gaudi_mmu_prepare_reg(hdev, mmNIC3_QM1_GLBL_NON_SECURE_PROPS_2,
+				asid);
+		gaudi_mmu_prepare_reg(hdev, mmNIC3_QM1_GLBL_NON_SECURE_PROPS_3,
+				asid);
+		gaudi_mmu_prepare_reg(hdev, mmNIC3_QM1_GLBL_NON_SECURE_PROPS_4,
+				asid);
+	}
+
+	if (hdev->nic_ports_mask & GAUDI_NIC_MASK_NIC8) {
+		gaudi_mmu_prepare_reg(hdev, mmNIC4_QM0_GLBL_NON_SECURE_PROPS_0,
+				asid);
+		gaudi_mmu_prepare_reg(hdev, mmNIC4_QM0_GLBL_NON_SECURE_PROPS_1,
+				asid);
+		gaudi_mmu_prepare_reg(hdev, mmNIC4_QM0_GLBL_NON_SECURE_PROPS_2,
+				asid);
+		gaudi_mmu_prepare_reg(hdev, mmNIC4_QM0_GLBL_NON_SECURE_PROPS_3,
+				asid);
+		gaudi_mmu_prepare_reg(hdev, mmNIC4_QM0_GLBL_NON_SECURE_PROPS_4,
+				asid);
+	}
+
+	if (hdev->nic_ports_mask & GAUDI_NIC_MASK_NIC9) {
+		gaudi_mmu_prepare_reg(hdev, mmNIC4_QM1_GLBL_NON_SECURE_PROPS_0,
+				asid);
+		gaudi_mmu_prepare_reg(hdev, mmNIC4_QM1_GLBL_NON_SECURE_PROPS_1,
+				asid);
+		gaudi_mmu_prepare_reg(hdev, mmNIC4_QM1_GLBL_NON_SECURE_PROPS_2,
+				asid);
+		gaudi_mmu_prepare_reg(hdev, mmNIC4_QM1_GLBL_NON_SECURE_PROPS_3,
+				asid);
+		gaudi_mmu_prepare_reg(hdev, mmNIC4_QM1_GLBL_NON_SECURE_PROPS_4,
+				asid);
+	}
+
 	gaudi_mmu_prepare_reg(hdev, mmPSOC_GLOBAL_CONF_TRACE_ARUSER, asid);
 	gaudi_mmu_prepare_reg(hdev, mmPSOC_GLOBAL_CONF_TRACE_AWUSER, asid);
 
@@ -5429,6 +5982,8 @@ static void gaudi_handle_ecc_event(struct hl_device *hdev, u16 event_type,
 		params.num_memories = 33;
 		params.derr = true;
 		params.disable_clock_gating = true;
+		extract_info_from_fw = false;
+		break;
 	default:
 		return;
 	}
@@ -5480,6 +6035,56 @@ static void gaudi_handle_qman_err(struct hl_device *hdev, u16 event_type)
 			mmDMA0_QM_ARB_ERR_CAUSE + index * DMA_QMAN_OFFSET;
 		snprintf(desc, ARRAY_SIZE(desc), "%s%d", "DMA_QM", index);
 		break;
+	case GAUDI_EVENT_NIC0_QM0:
+		glbl_sts_addr = mmNIC0_QM0_GLBL_STS1_0;
+		arb_err_addr = mmNIC0_QM0_ARB_ERR_CAUSE;
+		snprintf(desc, ARRAY_SIZE(desc), "NIC0_QM0");
+		break;
+	case GAUDI_EVENT_NIC0_QM1:
+		glbl_sts_addr = mmNIC0_QM1_GLBL_STS1_0;
+		arb_err_addr = mmNIC0_QM1_ARB_ERR_CAUSE;
+		snprintf(desc, ARRAY_SIZE(desc), "NIC0_QM1");
+		break;
+	case GAUDI_EVENT_NIC1_QM0:
+		glbl_sts_addr = mmNIC1_QM0_GLBL_STS1_0;
+		arb_err_addr = mmNIC1_QM0_ARB_ERR_CAUSE;
+		snprintf(desc, ARRAY_SIZE(desc), "NIC1_QM0");
+		break;
+	case GAUDI_EVENT_NIC1_QM1:
+		glbl_sts_addr = mmNIC1_QM1_GLBL_STS1_0;
+		arb_err_addr = mmNIC1_QM1_ARB_ERR_CAUSE;
+		snprintf(desc, ARRAY_SIZE(desc), "NIC1_QM1");
+		break;
+	case GAUDI_EVENT_NIC2_QM0:
+		glbl_sts_addr = mmNIC2_QM0_GLBL_STS1_0;
+		arb_err_addr = mmNIC2_QM0_ARB_ERR_CAUSE;
+		snprintf(desc, ARRAY_SIZE(desc), "NIC2_QM0");
+		break;
+	case GAUDI_EVENT_NIC2_QM1:
+		glbl_sts_addr = mmNIC2_QM1_GLBL_STS1_0;
+		arb_err_addr = mmNIC2_QM1_ARB_ERR_CAUSE;
+		snprintf(desc, ARRAY_SIZE(desc), "NIC2_QM1");
+		break;
+	case GAUDI_EVENT_NIC3_QM0:
+		glbl_sts_addr = mmNIC3_QM0_GLBL_STS1_0;
+		arb_err_addr = mmNIC3_QM0_ARB_ERR_CAUSE;
+		snprintf(desc, ARRAY_SIZE(desc), "NIC3_QM0");
+		break;
+	case GAUDI_EVENT_NIC3_QM1:
+		glbl_sts_addr = mmNIC3_QM1_GLBL_STS1_0;
+		arb_err_addr = mmNIC3_QM1_ARB_ERR_CAUSE;
+		snprintf(desc, ARRAY_SIZE(desc), "NIC3_QM1");
+		break;
+	case GAUDI_EVENT_NIC4_QM0:
+		glbl_sts_addr = mmNIC4_QM0_GLBL_STS1_0;
+		arb_err_addr = mmNIC4_QM0_ARB_ERR_CAUSE;
+		snprintf(desc, ARRAY_SIZE(desc), "NIC4_QM0");
+		break;
+	case GAUDI_EVENT_NIC4_QM1:
+		glbl_sts_addr = mmNIC4_QM1_GLBL_STS1_0;
+		arb_err_addr = mmNIC4_QM1_ARB_ERR_CAUSE;
+		snprintf(desc, ARRAY_SIZE(desc), "NIC4_QM1");
+		break;
 	default:
 		return;
 	}
@@ -5857,6 +6462,16 @@ static void gaudi_handle_eqe(struct hl_device *hdev,
 	case GAUDI_EVENT_MME0_QM ... GAUDI_EVENT_MME2_QM:
 	case GAUDI_EVENT_DMA0_QM ... GAUDI_EVENT_DMA7_QM:
 		fallthrough;
+	case GAUDI_EVENT_NIC0_QM0:
+	case GAUDI_EVENT_NIC0_QM1:
+	case GAUDI_EVENT_NIC1_QM0:
+	case GAUDI_EVENT_NIC1_QM1:
+	case GAUDI_EVENT_NIC2_QM0:
+	case GAUDI_EVENT_NIC2_QM1:
+	case GAUDI_EVENT_NIC3_QM0:
+	case GAUDI_EVENT_NIC3_QM1:
+	case GAUDI_EVENT_NIC4_QM0:
+	case GAUDI_EVENT_NIC4_QM1:
 	case GAUDI_EVENT_DMA0_CORE ... GAUDI_EVENT_DMA7_CORE:
 		gaudi_print_irq_info(hdev, event_type, true);
 		gaudi_handle_qman_err(hdev, event_type);
@@ -6090,10 +6705,11 @@ static bool gaudi_is_device_idle(struct hl_device *hdev, u64 *mask,
 	struct gaudi_device *gaudi = hdev->asic_specific;
 	const char *fmt = "%-5d%-9s%#-14x%#-12x%#x\n";
 	const char *mme_slave_fmt = "%-5d%-9s%-14s%-12s%#x\n";
+	const char *nic_fmt = "%-5d%-9s%#-14x%#x\n";
 	u32 qm_glbl_sts0, qm_cgm_sts, dma_core_sts0, tpc_cfg_sts, mme_arch_sts;
 	bool is_idle = true, is_eng_idle, is_slave;
 	u64 offset;
-	int i, dma_id;
+	int i, dma_id, port;
 
 	mutex_lock(&gaudi->clk_gate_mutex);
 
@@ -6182,6 +6798,45 @@ static bool gaudi_is_device_idle(struct hl_device *hdev, u64 *mask,
 		}
 	}
 
+	if (s)
+		seq_puts(s, "\nNIC  is_idle  QM_GLBL_STS0  QM_CGM_STS\n"
+				"---  -------  ------------  ----------\n");
+
+	for (i = 0 ; i < (NIC_NUMBER_OF_ENGINES / 2) ; i++) {
+		offset = i * NIC_MACRO_QMAN_OFFSET;
+		port = 2 * i;
+		if (hdev->nic_ports_mask & BIT(port)) {
+			qm_glbl_sts0 = RREG32(mmNIC0_QM0_GLBL_STS0 + offset);
+			qm_cgm_sts = RREG32(mmNIC0_QM0_CGM_STS + offset);
+			is_eng_idle = IS_QM_IDLE(qm_glbl_sts0, qm_cgm_sts);
+			is_idle &= is_eng_idle;
+
+			if (mask)
+				*mask |= ((u64) !is_eng_idle) <<
+						(GAUDI_ENGINE_ID_NIC_0 + port);
+			if (s)
+				seq_printf(s, nic_fmt, port,
+						is_eng_idle ? "Y" : "N",
+						qm_glbl_sts0, qm_cgm_sts);
+		}
+
+		port = 2 * i + 1;
+		if (hdev->nic_ports_mask & BIT(port)) {
+			qm_glbl_sts0 = RREG32(mmNIC0_QM1_GLBL_STS0 + offset);
+			qm_cgm_sts = RREG32(mmNIC0_QM1_CGM_STS + offset);
+			is_eng_idle = IS_QM_IDLE(qm_glbl_sts0, qm_cgm_sts);
+			is_idle &= is_eng_idle;
+
+			if (mask)
+				*mask |= ((u64) !is_eng_idle) <<
+						(GAUDI_ENGINE_ID_NIC_0 + port);
+			if (s)
+				seq_printf(s, nic_fmt, port,
+						is_eng_idle ? "Y" : "N",
+						qm_glbl_sts0, qm_cgm_sts);
+		}
+	}
+
 	if (s)
 		seq_puts(s, "\n");
 
diff --git a/drivers/misc/habanalabs/gaudi/gaudiP.h b/drivers/misc/habanalabs/gaudi/gaudiP.h
index b70b810c21c9..858434d50b59 100644
--- a/drivers/misc/habanalabs/gaudi/gaudiP.h
+++ b/drivers/misc/habanalabs/gaudi/gaudiP.h
@@ -79,6 +79,7 @@
 #define TPC_QMAN_OFFSET		(mmTPC1_QM_BASE - mmTPC0_QM_BASE)
 #define MME_QMAN_OFFSET		(mmMME1_QM_BASE - mmMME0_QM_BASE)
 #define NIC_MACRO_QMAN_OFFSET	(mmNIC1_QM0_BASE - mmNIC0_QM0_BASE)
+#define NIC_ENGINE_QMAN_OFFSET	(mmNIC0_QM1_BASE - mmNIC0_QM0_BASE)
 
 #define TPC_CFG_OFFSET		(mmTPC1_CFG_BASE - mmTPC0_CFG_BASE)
 
@@ -132,6 +133,10 @@
 #define TPC_QMAN_LENGTH			1024
 #define TPC_QMAN_SIZE_IN_BYTES		(TPC_QMAN_LENGTH * QMAN_PQ_ENTRY_SIZE)
 
+#define NIC_QMAN_LENGTH			1024
+#define NIC_QMAN_SIZE_IN_BYTES		(NIC_QMAN_LENGTH * QMAN_PQ_ENTRY_SIZE)
+
+
 #define SRAM_USER_BASE_OFFSET  GAUDI_DRIVER_SRAM_RESERVED_SIZE_FROM_START
 
 /* Virtual address space */
@@ -153,6 +158,19 @@
 #define HW_CAP_SRAM_SCRAMBLER	BIT(10)
 #define HW_CAP_HBM_SCRAMBLER	BIT(11)
 
+#define HW_CAP_NIC0		BIT(14)
+#define HW_CAP_NIC1		BIT(15)
+#define HW_CAP_NIC2		BIT(16)
+#define HW_CAP_NIC3		BIT(17)
+#define HW_CAP_NIC4		BIT(18)
+#define HW_CAP_NIC5		BIT(19)
+#define HW_CAP_NIC6		BIT(20)
+#define HW_CAP_NIC7		BIT(21)
+#define HW_CAP_NIC8		BIT(22)
+#define HW_CAP_NIC9		BIT(23)
+#define HW_CAP_NIC_MASK		GENMASK(23, 14)
+#define HW_CAP_NIC_SHIFT	14
+
 #define HW_CAP_TPC0		BIT(24)
 #define HW_CAP_TPC1		BIT(25)
 #define HW_CAP_TPC2		BIT(26)
@@ -200,6 +218,20 @@ enum gaudi_tpc_mask {
 	GAUDI_TPC_MASK_ALL = 0xFF
 };
 
+enum gaudi_nic_mask {
+	GAUDI_NIC_MASK_NIC0 = 0x01,
+	GAUDI_NIC_MASK_NIC1 = 0x02,
+	GAUDI_NIC_MASK_NIC2 = 0x04,
+	GAUDI_NIC_MASK_NIC3 = 0x08,
+	GAUDI_NIC_MASK_NIC4 = 0x10,
+	GAUDI_NIC_MASK_NIC5 = 0x20,
+	GAUDI_NIC_MASK_NIC6 = 0x40,
+	GAUDI_NIC_MASK_NIC7 = 0x80,
+	GAUDI_NIC_MASK_NIC8 = 0x100,
+	GAUDI_NIC_MASK_NIC9 = 0x200,
+	GAUDI_NIC_MASK_ALL = 0x3FF
+};
+
 /**
  * struct gaudi_internal_qman_info - Internal QMAN information.
  * @pq_kernel_addr: Kernel address of the PQ memory area in the host.
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 83+ messages in thread

* [PATCH v3 05/14] habanalabs/gaudi: add NIC Ethernet support
  2020-09-15 17:10 [PATCH v3 00/14] Adding GAUDI NIC code to habanalabs driver Oded Gabbay
                   ` (2 preceding siblings ...)
  2020-09-15 17:10 ` [PATCH v3 04/14] habanalabs/gaudi: add support for NIC QMANs Oded Gabbay
@ 2020-09-15 17:10 ` Oded Gabbay
  2020-09-15 17:10 ` [PATCH v3 06/14] habanalabs/gaudi: add NIC PHY code Oded Gabbay
                   ` (11 subsequent siblings)
  15 siblings, 0 replies; 83+ messages in thread
From: Oded Gabbay @ 2020-09-15 17:10 UTC (permalink / raw)
  To: linux-kernel, netdev
  Cc: SW_Drivers, gregkh, davem, kuba, andrew, f.fainelli, Omer Shpigelman

From: Omer Shpigelman <oshpigelman@habana.ai>

Add a basic NIC driver which handles several types of Ethernet packets,
such as IPv4, IPv6, LLDP, VLAN and ARP.

The NIC HW is composed of 5 NIC macros, each containing 2 NIC engines.
Each engine exposes a single 100GbE port, so in total we have 10 ports per
GAUDI device.
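
For illustration only, the port numbering implied by this layout can be
sketched as below. The sketch_* names are hypothetical and are not part of
the driver. The mapping assumes that port N lives in macro N / 2 and uses
engine N % 2, which matches how the idle check in gaudi.c walks the ports
(port = 2 * i for QM0 and port = 2 * i + 1 for QM1).

```c
#include <assert.h>
#include <stdint.h>

/* Values taken from the commit message: 5 macros, 2 engines per macro. */
#define SKETCH_NIC_MACROS		5
#define SKETCH_ENGINES_PER_MACRO	2
#define SKETCH_NIC_PORTS \
	(SKETCH_NIC_MACROS * SKETCH_ENGINES_PER_MACRO)

/* Hypothetical helper: index of the NIC macro that contains the port. */
static inline unsigned int sketch_port_to_macro(unsigned int port)
{
	assert(port < SKETCH_NIC_PORTS);
	return port / SKETCH_ENGINES_PER_MACRO;
}

/* Hypothetical helper: engine (0 for QM0, 1 for QM1) inside that macro. */
static inline unsigned int sketch_port_to_engine(unsigned int port)
{
	assert(port < SKETCH_NIC_PORTS);
	return port % SKETCH_ENGINES_PER_MACRO;
}

/* One bit per port: with all 10 ports enabled the mask is 0x3FF. */
static inline uint64_t sketch_all_ports_mask(void)
{
	return (UINT64_C(1) << SKETCH_NIC_PORTS) - 1;
}
```

Under these assumptions port 3 maps to macro 1, engine 1, and the full
mask equals GAUDI_NIC_MASK_ALL.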

The driver gets the information needed for initialization from the
firmware, such as card type, available ports, auto-negotiation support,
polarity and Tx taps configuration.

Two card types are supported: standalone PCI and PCI Mezzanine Card (PMC),
which is part of a server called HLS-1. Each type has its own unique
configuration.

We define two types of port connectivity: internal and external. An
internal port is connected to a port on another Gaudi card and an external
port is connected to a switch.
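
As a rough sketch of how a port could be classified from the two masks
kept in struct hl_device (nic_ports_mask and nic_ports_ext_mask), see
below. The sketch_* names are hypothetical, and the assumption that a set
bit in the external mask marks an external port is inferred from the field
name only, not confirmed by the driver code.

```c
#include <stdint.h>

/* Hypothetical classification of a single NIC port. */
enum sketch_port_kind {
	SKETCH_PORT_DISABLED,
	SKETCH_PORT_INTERNAL,	/* connected to another Gaudi card */
	SKETCH_PORT_EXTERNAL	/* connected to a switch */
};

/*
 * Assumption: bit N of ports_mask enables port N, and bit N of ext_mask
 * marks port N as external. Ports outside the 64-bit mask are reported
 * as disabled.
 */
static inline enum sketch_port_kind
sketch_classify_port(uint64_t ports_mask, uint64_t ext_mask,
		     unsigned int port)
{
	uint64_t bit;

	if (port >= 64)
		return SKETCH_PORT_DISABLED;

	bit = UINT64_C(1) << port;

	if (!(ports_mask & bit))
		return SKETCH_PORT_DISABLED;

	return (ext_mask & bit) ? SKETCH_PORT_EXTERNAL : SKETCH_PORT_INTERNAL;
}
```

With the bring-up default of nic_ports_ext_mask = 0x3FF set in
set_driver_behavior_per_device(), every enabled port would be treated as
external under this assumption.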

The Ethernet support is needed only for control flows, e.g. getting an IP
address. Hence it is implemented in a very simple way: the packets are
copied rather than passed by descriptors.

The Rx flow uses NAPI by default. Polling mode is also supported and is
selected by a kernel module parameter.

Because we must not access the HW during a hard reset of the device, a
new stage that stops all NIC activity is added at the beginning of the
reset flow.

Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
---
 drivers/misc/habanalabs/common/context.c      |    1 +
 drivers/misc/habanalabs/common/firmware_if.c  |   44 +
 drivers/misc/habanalabs/common/habanalabs.h   |   13 +-
 .../misc/habanalabs/common/habanalabs_drv.c   |    4 +
 drivers/misc/habanalabs/gaudi/Makefile        |    2 +
 drivers/misc/habanalabs/gaudi/gaudi.c         |  176 +-
 drivers/misc/habanalabs/gaudi/gaudiP.h        |  286 +-
 drivers/misc/habanalabs/gaudi/gaudi_nic.c     | 2327 +++++++++++++++++
 drivers/misc/habanalabs/gaudi/gaudi_nic.h     |  336 +++
 drivers/misc/habanalabs/goya/goya.c           |    6 +
 include/uapi/misc/habanalabs.h                |    3 +
 11 files changed, 3190 insertions(+), 8 deletions(-)
 create mode 100644 drivers/misc/habanalabs/gaudi/gaudi_nic.c
 create mode 100644 drivers/misc/habanalabs/gaudi/gaudi_nic.h

diff --git a/drivers/misc/habanalabs/common/context.c b/drivers/misc/habanalabs/common/context.c
index df8171a2226c..39b12d00e287 100644
--- a/drivers/misc/habanalabs/common/context.c
+++ b/drivers/misc/habanalabs/common/context.c
@@ -37,6 +37,7 @@ static void hl_ctx_fini(struct hl_ctx *ctx)
 		if ((hdev->in_debug) && (hdev->compute_ctx == ctx))
 			hl_device_set_debug_mode(hdev, false);
 
+		hdev->asic_funcs->ctx_fini(ctx);
 		hl_cb_va_pool_fini(ctx);
 		hl_vm_ctx_fini(ctx);
 		hl_asid_free(hdev, ctx->asid);
diff --git a/drivers/misc/habanalabs/common/firmware_if.c b/drivers/misc/habanalabs/common/firmware_if.c
index 4409962d30ae..95260dc0458b 100644
--- a/drivers/misc/habanalabs/common/firmware_if.c
+++ b/drivers/misc/habanalabs/common/firmware_if.c
@@ -364,6 +364,50 @@ int hl_fw_get_eeprom_data(struct hl_device *hdev, void *data, size_t max_size)
 	return rc;
 }
 
+int hl_fw_cpucp_nic_info_get(struct hl_device *hdev)
+{
+	struct asic_fixed_properties *prop = &hdev->asic_prop;
+	struct cpucp_packet pkt = {};
+	void *cpucp_nic_info_cpu_addr;
+	dma_addr_t cpucp_nic_info_dma_addr;
+	long result;
+	int rc;
+
+	cpucp_nic_info_cpu_addr =
+			hdev->asic_funcs->cpu_accessible_dma_pool_alloc(hdev,
+					sizeof(struct cpucp_nic_info),
+					&cpucp_nic_info_dma_addr);
+	if (!cpucp_nic_info_cpu_addr) {
+		dev_err(hdev->dev,
+			"Failed to allocate DMA memory for CPU-CP NIC info packet\n");
+		return -ENOMEM;
+	}
+
+	memset(cpucp_nic_info_cpu_addr, 0, sizeof(struct cpucp_nic_info));
+
+	pkt.ctl = cpu_to_le32(CPUCP_PACKET_NIC_INFO_GET <<
+				CPUCP_PKT_CTL_OPCODE_SHIFT);
+	pkt.addr = cpu_to_le64(cpucp_nic_info_dma_addr);
+	pkt.data_max_size = cpu_to_le32(sizeof(struct cpucp_nic_info));
+
+	rc = hdev->asic_funcs->send_cpu_message(hdev, (u32 *) &pkt, sizeof(pkt),
+					HL_CPUCP_INFO_TIMEOUT_USEC, &result);
+	if (rc) {
+		dev_err(hdev->dev,
+			"Failed to handle CPU-CP NIC info pkt, error %d\n", rc);
+		goto out;
+	}
+
+	memcpy(&prop->cpucp_nic_info, cpucp_nic_info_cpu_addr,
+			sizeof(prop->cpucp_nic_info));
+
+out:
+	hdev->asic_funcs->cpu_accessible_dma_pool_free(hdev,
+			sizeof(struct cpucp_nic_info), cpucp_nic_info_cpu_addr);
+
+	return rc;
+}
+
 int hl_fw_cpucp_pci_counters_get(struct hl_device *hdev,
 		struct hl_info_pci_counters *counters)
 {
diff --git a/drivers/misc/habanalabs/common/habanalabs.h b/drivers/misc/habanalabs/common/habanalabs.h
index 146cf14d4d81..45feb4884ab3 100644
--- a/drivers/misc/habanalabs/common/habanalabs.h
+++ b/drivers/misc/habanalabs/common/habanalabs.h
@@ -270,6 +270,8 @@ struct hl_mmu_properties {
  * @hw_queues_props: H/W queues properties.
  * @cpucp_info: received various information from CPU-CP regarding the H/W, e.g.
  *		available sensors.
+ * @cpucp_nic_info: received various information from CPU-CP regarding the NIC
+ *                  H/W, e.g. MAC addresses.
  * @uboot_ver: F/W U-boot version.
  * @preboot_ver: F/W Preboot version.
  * @dmmu: DRAM MMU address translation properties.
@@ -284,7 +286,7 @@ struct hl_mmu_properties {
  * @dram_user_base_address: DRAM physical start address for user access.
  * @dram_size: DRAM total size.
  * @dram_pci_bar_size: size of PCI bar towards DRAM.
- * @max_power_default: max power of the device after reset
+ * @max_power_default: max power of the device after reset.
  * @dram_size_for_default_page_mapping: DRAM size needed to map to avoid page
  *                                      fault.
  * @pcie_dbi_base_address: Base address of the PCIE_DBI block.
@@ -324,6 +326,7 @@ struct hl_mmu_properties {
 struct asic_fixed_properties {
 	struct hw_queue_properties	*hw_queues_props;
 	struct cpucp_info		cpucp_info;
+	struct cpucp_nic_info		cpucp_nic_info;
 	char				uboot_ver[VERSION_MAX_LEN];
 	char				preboot_ver[VERSION_MAX_LEN];
 	struct hl_mmu_properties	dmmu;
@@ -697,6 +700,7 @@ enum div_select_defs {
  * @wreg: Write a register. Needed for simulator support.
  * @halt_coresight: stop the ETF and ETR traces.
  * @ctx_init: context dependent initialization.
+ * @ctx_fini: context dependent cleanup.
  * @get_clk_rate: Retrieve the ASIC current and maximum clock rate in MHz
  * @get_queue_id_for_cq: Get the H/W queue id related to the given CQ index.
  * @read_device_fw_version: read the device's firmware versions that are
@@ -799,6 +803,7 @@ struct hl_asic_funcs {
 	void (*wreg)(struct hl_device *hdev, u32 reg, u32 val);
 	void (*halt_coresight)(struct hl_device *hdev);
 	int (*ctx_init)(struct hl_ctx *ctx);
+	void (*ctx_fini)(struct hl_ctx *ctx);
 	int (*get_clk_rate)(struct hl_device *hdev, u32 *cur_clk, u32 *max_clk);
 	u32 (*get_queue_id_for_cq)(struct hl_device *hdev, u32 cq_idx);
 	void (*read_device_fw_version)(struct hl_device *hdev,
@@ -1586,6 +1591,7 @@ struct hl_mmu_funcs {
  * @sync_stream_queue_idx: helper index for sync stream queues initialization.
  * @supports_coresight: is CoreSight supported.
  * @supports_soft_reset: is soft reset supported.
+ * @nic_rx_poll: enable NIC Rx in polling mode rather than IRQ.
  * @supports_cb_mapping: is mapping a CB to the device's MMU supported.
  */
 struct hl_device {
@@ -1686,10 +1692,13 @@ struct hl_device {
 	u8				sync_stream_queue_idx;
 	u8				supports_coresight;
 	u8				supports_soft_reset;
+	u8				nic_rx_poll;
 	u8				supports_cb_mapping;
 
 	/* Parameters for bring-up */
 	u64				nic_ports_mask;
+	u64				nic_ports_ext_mask;
+	u64				nic_auto_neg_mask;
 	u8				mmu_enable;
 	u8				mmu_huge_page_opt;
 	u8				cpu_enable;
@@ -1702,6 +1711,7 @@ struct hl_device {
 	u8				dram_scrambler_enable;
 	u8				hard_reset_on_fw_events;
 	u8				bmc_enable;
+	u8				nic_load_fw;
 	u8				rl_enable;
 };
 
@@ -1924,6 +1934,7 @@ void hl_fw_cpu_accessible_dma_pool_free(struct hl_device *hdev, size_t size,
 int hl_fw_send_heartbeat(struct hl_device *hdev);
 int hl_fw_cpucp_info_get(struct hl_device *hdev);
 int hl_fw_get_eeprom_data(struct hl_device *hdev, void *data, size_t max_size);
+int hl_fw_cpucp_nic_info_get(struct hl_device *hdev);
 int hl_fw_cpucp_pci_counters_get(struct hl_device *hdev,
 		struct hl_info_pci_counters *counters);
 int hl_fw_cpucp_total_energy_get(struct hl_device *hdev,
diff --git a/drivers/misc/habanalabs/common/habanalabs_drv.c b/drivers/misc/habanalabs/common/habanalabs_drv.c
index f9067d3ef437..b7fbbe8f2577 100644
--- a/drivers/misc/habanalabs/common/habanalabs_drv.c
+++ b/drivers/misc/habanalabs/common/habanalabs_drv.c
@@ -241,6 +241,10 @@ static void set_driver_behavior_per_device(struct hl_device *hdev)
 	hdev->dram_scrambler_enable = 1;
 	hdev->bmc_enable = 1;
 	hdev->hard_reset_on_fw_events = 1;
+	hdev->card_type = cpucp_card_type_pci;
+	hdev->nic_ports_ext_mask = 0x3FF;
+	hdev->nic_auto_neg_mask = 0x3FF;
+	hdev->nic_load_fw = 0;
 }
 
 /*
diff --git a/drivers/misc/habanalabs/gaudi/Makefile b/drivers/misc/habanalabs/gaudi/Makefile
index c9f4703cff24..24e14cff563d 100644
--- a/drivers/misc/habanalabs/gaudi/Makefile
+++ b/drivers/misc/habanalabs/gaudi/Makefile
@@ -1,3 +1,5 @@
 # SPDX-License-Identifier: GPL-2.0-only
 HL_GAUDI_FILES := gaudi/gaudi.o gaudi/gaudi_hwmgr.o gaudi/gaudi_security.o \
 	gaudi/gaudi_coresight.o
+
+HL_GAUDI_FILES += gaudi/gaudi_nic.o
diff --git a/drivers/misc/habanalabs/gaudi/gaudi.c b/drivers/misc/habanalabs/gaudi/gaudi.c
index ecf89d1e37c8..eee83e0a8c6d 100644
--- a/drivers/misc/habanalabs/gaudi/gaudi.c
+++ b/drivers/misc/habanalabs/gaudi/gaudi.c
@@ -78,6 +78,7 @@
 #define GAUDI_PLDM_MMU_TIMEOUT_USEC	(MMU_CONFIG_TIMEOUT_USEC * 100)
 #define GAUDI_PLDM_QMAN0_TIMEOUT_USEC	(HL_DEVICE_TIMEOUT_USEC * 30)
 #define GAUDI_PLDM_TPC_KERNEL_WAIT_USEC	(HL_DEVICE_TIMEOUT_USEC * 30)
+#define GAUDI_PLDM_NIC_QPC_INV_USEC	(NIC_QPC_INV_USEC * 10)
 #define GAUDI_BOOT_FIT_REQ_TIMEOUT_USEC	1000000		/* 1s */
 #define GAUDI_MSG_TO_CPU_TIMEOUT_USEC	4000000		/* 4s */
 
@@ -458,7 +459,10 @@ static int gaudi_get_fixed_properties(struct hl_device *hdev)
 	prop->num_of_events = GAUDI_EVENT_SIZE;
 	prop->tpc_enabled_mask = TPC_ENABLED_MASK;
 
-	prop->max_power_default = MAX_POWER_DEFAULT_PCI;
+	if (hdev->card_type == cpucp_card_type_pmc)
+		prop->max_power_default = MAX_POWER_DEFAULT_PMC;
+	else
+		prop->max_power_default = MAX_POWER_DEFAULT_PCI;
 
 	prop->cb_pool_cb_cnt = GAUDI_CB_POOL_CB_CNT;
 	prop->cb_pool_cb_size = GAUDI_CB_POOL_CB_SIZE;
@@ -782,6 +786,14 @@ static int gaudi_init_tpc_mem(struct hl_device *hdev)
 	return rc;
 }
 
+static int gaudi_nic_clear_mem(struct hl_device *hdev)
+{
+	if (!hdev->nic_ports_mask)
+		return 0;
+
+	return gaudi_memset_device_memory(hdev, NIC_DRV_ADDR, NIC_DRV_SIZE, 0);
+}
+
 static int gaudi_late_init(struct hl_device *hdev)
 {
 	struct gaudi_device *gaudi = hdev->asic_specific;
@@ -836,6 +848,12 @@ static int gaudi_late_init(struct hl_device *hdev)
 		goto disable_pci_access;
 	}
 
+	rc = gaudi_nic_clear_mem(hdev);
+	if (rc) {
+		dev_err(hdev->dev, "Failed to clear NIC memory\n");
+		goto disable_pci_access;
+	}
+
 	return 0;
 
 disable_pci_access:
@@ -865,6 +883,17 @@ static void gaudi_late_fini(struct hl_device *hdev)
 	hdev->hl_chip_info->info = NULL;
 }
 
+static void gaudi_nic_handle_rx(struct gaudi_nic_device *gaudi_nic)
+{
+	/* at this point, interrupts were disabled by the H/W */
+	napi_schedule(&gaudi_nic->napi);
+}
+
+static int gaudi_nic_handle_tx(struct gaudi_nic_device *gaudi_nic, void *data)
+{
+	return gaudi_nic_handle_tx_pkt(gaudi_nic, data);
+}
+
 static int gaudi_alloc_cpu_accessible_dma_mem(struct hl_device *hdev)
 {
 	dma_addr_t dma_addr_arr[GAUDI_ALLOC_CPU_MEM_RETRY_CNT] = {}, end_addr;
@@ -1013,6 +1042,8 @@ static int gaudi_sw_init(struct hl_device *hdev)
 	}
 
 	gaudi->cpucp_info_get = gaudi_cpucp_info_get;
+	gaudi->nic_handle_rx = gaudi_nic_handle_rx;
+	gaudi->nic_handle_tx = gaudi_nic_handle_tx;
 
 	gaudi->max_freq_value = GAUDI_MAX_CLK_FREQ;
 
@@ -1053,14 +1084,29 @@ static int gaudi_sw_init(struct hl_device *hdev)
 	if (rc)
 		goto free_cpu_accessible_dma_pool;
 
+	rc = gaudi_nic_sw_init(hdev);
+	if (rc) {
+		dev_err(hdev->dev, "Failed to init NIC S/W\n");
+		rc = -ENOMEM;
+		goto free_internal_qmans_pq_mem;
+	}
+
 	spin_lock_init(&gaudi->hw_queues_lock);
 	mutex_init(&gaudi->clk_gate_mutex);
 
+	/* Device CPU loads the PHY F/W at boot */
+	gaudi->nic_phy_load_fw = (!hdev->cpu_enable && !hdev->pldm) ||
+					(hdev->nic_load_fw);
+	gaudi->nic_phy_config_fw = !hdev->pldm;
+	gaudi->nic_qpc_cache_inv_timeout = hdev->pldm ?
+			GAUDI_PLDM_NIC_QPC_INV_USEC : NIC_QPC_INV_USEC;
 	hdev->supports_sync_stream = true;
 	hdev->supports_coresight = true;
 
 	return 0;
 
+free_internal_qmans_pq_mem:
+	gaudi_free_internal_qmans_pq_mem(hdev);
 free_cpu_accessible_dma_pool:
 	gen_pool_destroy(hdev->cpu_accessible_dma_pool);
 free_cpu_dma_mem:
@@ -1081,6 +1127,8 @@ static int gaudi_sw_fini(struct hl_device *hdev)
 {
 	struct gaudi_device *gaudi = hdev->asic_specific;
 
+	gaudi_nic_sw_fini(hdev);
+
 	gaudi_free_internal_qmans_pq_mem(hdev);
 
 	gen_pool_destroy(hdev->cpu_accessible_dma_pool);
@@ -1104,6 +1152,8 @@ static int gaudi_sw_fini(struct hl_device *hdev)
 static irqreturn_t gaudi_irq_handler_single(int irq, void *arg)
 {
 	struct hl_device *hdev = arg;
+	struct gaudi_device *gaudi = hdev->asic_specific;
+	struct gaudi_nic_device *gaudi_nic;
 	int i;
 
 	if (hdev->disabled)
@@ -1112,6 +1162,16 @@ static irqreturn_t gaudi_irq_handler_single(int irq, void *arg)
 	for (i = 0 ; i < hdev->asic_prop.completion_queues_count ; i++)
 		hl_irq_handler_cq(irq, &hdev->completion_queue[i]);
 
+	for (i = 0 ; i < NIC_NUMBER_OF_PORTS ; i++) {
+		gaudi_nic = &gaudi->nic_devices[i];
+
+		if (!(hdev->nic_ports_mask & BIT(i)) || (!gaudi_nic->port_open))
+			continue;
+
+		gaudi_nic_rx_irq_handler(irq, gaudi_nic);
+	}
+
+	gaudi_nic_cq_irq_handler(irq, hdev);
 	hl_irq_handler_eq(irq, &hdev->event_queue);
 
 	return IRQ_HANDLED;
@@ -1271,6 +1331,8 @@ static void gaudi_disable_msi(struct hl_device *hdev)
 static void gaudi_init_scrambler_sram(struct hl_device *hdev)
 {
 	struct gaudi_device *gaudi = hdev->asic_specific;
+	u32 status;
+	int rc;
 
 	if (gaudi->hw_cap_initialized & HW_CAP_SRAM_SCRAMBLER)
 		return;
@@ -1278,6 +1340,36 @@ static void gaudi_init_scrambler_sram(struct hl_device *hdev)
 	if (!hdev->sram_scrambler_enable)
 		return;
 
+	/* In case we don't load F/W, we must wait for uboot to finish before
+	 * we enable scrambling. Otherwise, we risk interrupting it in the
+	 * middle of initialization, which can cause the device to get stuck
+	 */
+	if ((!hdev->pldm) && (hdev->cpu_enable) && (!hdev->fw_loading)) {
+		dev_info(hdev->dev,
+			"Waiting for u-boot to finish before enabling SRAM scrambler\n");
+
+		rc = hl_poll_timeout(
+			hdev,
+			mmPSOC_GLOBAL_CONF_CPU_BOOT_STATUS,
+			status,
+			(status == CPU_BOOT_STATUS_NIC_FW_RDY) ||
+			(status == CPU_BOOT_STATUS_READY_TO_BOOT) ||
+			(status == CPU_BOOT_STATUS_SRAM_AVAIL),
+			10000,
+			GAUDI_NIC_FW_TIMEOUT_USEC);
+
+		if (rc)
+			dev_warn(hdev->dev,
+				"Failed to detect u-boot has finished loading NIC F/W. Maybe running old F/W?\n");
+
+		if (status != CPU_BOOT_STATUS_SRAM_AVAIL)
+			ssleep(1);
+
+		/* Stop the device CPU to make sure nothing bad happens */
+		WREG32(mmPSOC_GLOBAL_CONF_KMD_MSG_TO_CPU, KMD_MSG_GOTO_WFE);
+		msleep(GAUDI_CPU_RESET_WAIT_MSEC);
+	}
+
 	WREG32(mmNIF_RTR_CTRL_0_SCRAM_SRAM_EN,
 			1 << IF_RTR_CTRL_SCRAM_SRAM_EN_VAL_SHIFT);
 	WREG32(mmNIF_RTR_CTRL_1_SCRAM_SRAM_EN,
@@ -2874,6 +2966,13 @@ static void gaudi_halt_engines(struct hl_device *hdev, bool hard_reset)
 	else
 		wait_timeout_ms = GAUDI_RESET_WAIT_MSEC;
 
+	/*
+	 * Mark the NIC as in reset to avoid any new NIC accesses to the
+	 * H/W. This must be done before we stop the CPU as the NIC
+	 * might use it e.g. get/set EEPROM data.
+	 */
+	gaudi_nic_hard_reset_prepare(hdev);
+
 	gaudi_stop_nic_qmans(hdev);
 
 	gaudi_stop_mme_qmans(hdev);
@@ -2900,6 +2999,8 @@ static void gaudi_halt_engines(struct hl_device *hdev, bool hard_reset)
 
 	gaudi_disable_timestamp(hdev);
 
+	/* NIC stop must be called before MSI is disabled */
+	gaudi_nic_stop(hdev);
 	gaudi_disable_msi(hdev);
 }
 
@@ -3184,6 +3285,16 @@ static int gaudi_hw_init(struct hl_device *hdev)
 
 	gaudi_init_hbm_dma_qmans(hdev);
 
+	/*
+	 * Before pushing u-boot/linux to the device, set the HBM bar to the
+	 * base address of the DRAM
+	 */
+	if (gaudi_set_hbm_bar_base(hdev, DRAM_PHYS_BASE) == U64_MAX) {
+		dev_err(hdev->dev,
+			"failed to map HBM bar to DRAM base address\n");
+		return -EIO;
+	}
+
 	rc = gaudi_init_cpu(hdev);
 	if (rc) {
 		dev_err(hdev->dev, "failed to initialize CPU\n");
@@ -3315,7 +3426,7 @@ static void gaudi_hw_fini(struct hl_device *hdev, bool hard_reset)
 					HW_CAP_HBM_DMA | HW_CAP_PLL |
 					HW_CAP_NIC_MASK | HW_CAP_MMU |
 					HW_CAP_SRAM_SCRAMBLER |
-					HW_CAP_HBM_SCRAMBLER |
+					HW_CAP_HBM_SCRAMBLER | HW_CAP_NIC_DRV |
 					HW_CAP_CLK_GATE);
 
 	memset(gaudi->events_stat, 0, sizeof(gaudi->events_stat));
@@ -6107,6 +6218,45 @@ static void gaudi_print_irq_info(struct hl_device *hdev, u16 event_type,
 	}
 }
 
+static void gaudi_print_nic_axi_irq_info(struct hl_device *hdev, u16 event_type,
+						void *data)
+{
+	char desc[64] = "", *type;
+	struct eq_nic_sei_event *eq_nic_sei = data;
+	u16 nic_id = event_type - GAUDI_EVENT_NIC_SEI_0;
+
+	switch (eq_nic_sei->axi_error_cause) {
+	case RXB:
+		type = "RXB";
+		break;
+	case RXE:
+		type = "RXE";
+		break;
+	case TXS:
+		type = "TXS";
+		break;
+	case TXE:
+		type = "TXE";
+		break;
+	case QPC_RESP:
+		type = "QPC_RESP";
+		break;
+	case NON_AXI_ERR:
+		type = "NON_AXI_ERR";
+		break;
+	default:
+		dev_err(hdev->dev, "unknown NIC AXI cause %d\n",
+			eq_nic_sei->axi_error_cause);
+		type = "N/A";
+		break;
+	}
+
+	snprintf(desc, sizeof(desc), "NIC%d_%s%d", nic_id, type,
+			eq_nic_sei->id);
+	dev_err_ratelimited(hdev->dev, "Received H/W interrupt %d [\"%s\"]\n",
+		event_type, desc);
+}
+
 static int gaudi_soft_reset_late_init(struct hl_device *hdev)
 {
 	struct gaudi_device *gaudi = hdev->asic_specific;
@@ -6305,6 +6455,7 @@ static void gaudi_handle_eqe(struct hl_device *hdev,
 				struct hl_eq_entry *eq_entry)
 {
 	struct gaudi_device *gaudi = hdev->asic_specific;
+	u64 data = le64_to_cpu(eq_entry->data[0]);
 	u32 ctl = le32_to_cpu(eq_entry->hdr.ctl);
 	u16 event_type = ((ctl & EQ_CTL_EVENT_TYPE_MASK)
 			>> EQ_CTL_EVENT_TYPE_SHIFT);
@@ -6333,6 +6484,7 @@ static void gaudi_handle_eqe(struct hl_device *hdev,
 	case GAUDI_EVENT_PSOC_MEM_DERR:
 	case GAUDI_EVENT_PSOC_CORESIGHT_DERR:
 	case GAUDI_EVENT_SRAM0_DERR ... GAUDI_EVENT_SRAM28_DERR:
+	case GAUDI_EVENT_NIC0_DERR ... GAUDI_EVENT_NIC4_DERR:
 	case GAUDI_EVENT_DMA_IF0_DERR ... GAUDI_EVENT_DMA_IF3_DERR:
 	case GAUDI_EVENT_HBM_0_DERR ... GAUDI_EVENT_HBM_3_DERR:
 	case GAUDI_EVENT_MMU_DERR:
@@ -6434,6 +6586,7 @@ static void gaudi_handle_eqe(struct hl_device *hdev,
 	case GAUDI_EVENT_PSOC_MEM_SERR:
 	case GAUDI_EVENT_PSOC_CORESIGHT_SERR:
 	case GAUDI_EVENT_SRAM0_SERR ... GAUDI_EVENT_SRAM28_SERR:
+	case GAUDI_EVENT_NIC0_SERR ... GAUDI_EVENT_NIC4_SERR:
 	case GAUDI_EVENT_DMA_IF0_SERR ... GAUDI_EVENT_DMA_IF3_SERR:
 	case GAUDI_EVENT_HBM_0_SERR ... GAUDI_EVENT_HBM_3_SERR:
 		fallthrough;
@@ -6497,6 +6650,11 @@ static void gaudi_handle_eqe(struct hl_device *hdev,
 		hl_fw_unmask_irq(hdev, event_type);
 		break;
 
+	case GAUDI_EVENT_NIC_SEI_0 ... GAUDI_EVENT_NIC_SEI_4:
+		gaudi_print_nic_axi_irq_info(hdev, event_type, &data);
+		hl_fw_unmask_irq(hdev, event_type);
+		break;
+
 	case GAUDI_EVENT_FIX_POWER_ENV_S ... GAUDI_EVENT_FIX_THERMAL_ENV_E:
 		gaudi_print_clk_change_info(hdev, event_type);
 		hl_fw_unmask_irq(hdev, event_type);
@@ -7002,6 +7160,19 @@ static int gaudi_ctx_init(struct hl_ctx *ctx)
 	return 0;
 }
 
+static void gaudi_ctx_fini(struct hl_ctx *ctx)
+{
+	struct hl_device *hdev = ctx->hdev;
+
+	/* Gaudi will NEVER support more than a single compute context.
+	 * Therefore, don't clear anything unless it is the compute context
+	 */
+	if (hdev->compute_ctx != ctx)
+		return;
+
+	gaudi_nic_ctx_fini(ctx);
+}
+
 static u32 gaudi_get_queue_id_for_cq(struct hl_device *hdev, u32 cq_idx)
 {
 	return gaudi_cq_assignment[cq_idx];
@@ -7305,6 +7476,7 @@ static const struct hl_asic_funcs gaudi_funcs = {
 	.wreg = hl_wreg,
 	.halt_coresight = gaudi_halt_coresight,
 	.ctx_init = gaudi_ctx_init,
+	.ctx_fini = gaudi_ctx_fini,
 	.get_clk_rate = gaudi_get_clk_rate,
 	.get_queue_id_for_cq = gaudi_get_queue_id_for_cq,
 	.read_device_fw_version = gaudi_read_device_fw_version,
diff --git a/drivers/misc/habanalabs/gaudi/gaudiP.h b/drivers/misc/habanalabs/gaudi/gaudiP.h
index 858434d50b59..6dea73c5682f 100644
--- a/drivers/misc/habanalabs/gaudi/gaudiP.h
+++ b/drivers/misc/habanalabs/gaudi/gaudiP.h
@@ -15,6 +15,9 @@
 #include "../include/gaudi/gaudi.h"
 #include "../include/gaudi/gaudi_async_events.h"
 
+#include <linux/netdevice.h>
+#include <linux/kfifo.h>
+
 #define NUMBER_OF_EXT_HW_QUEUES		12
 #define NUMBER_OF_CMPLT_QUEUES		NUMBER_OF_EXT_HW_QUEUES
 #define NUMBER_OF_CPU_HW_QUEUES		1
@@ -27,9 +30,12 @@
  * Number of MSI interrupts IDS:
  * Each completion queue has 1 ID
  * The event queue has 1 ID
+ * Each NIC engine has 1 ID for Rx
+ * The NIC CQ has 1 ID
  */
 #define NUMBER_OF_INTERRUPTS		(NUMBER_OF_CMPLT_QUEUES + \
-						NUMBER_OF_CPU_HW_QUEUES)
+						NUMBER_OF_CPU_HW_QUEUES + \
+						NIC_NUMBER_OF_ENGINES + 1)
 
 #if (NUMBER_OF_INTERRUPTS > GAUDI_MSI_ENTRIES)
 #error "Number of MSI interrupts must be smaller or equal to GAUDI_MSI_ENTRIES"
@@ -44,6 +50,10 @@
 
 #define GAUDI_CPU_TIMEOUT_USEC		30000000	/* 30s */
 
+#define GAUDI_NIC_FW_TIMEOUT_USEC	12000000	/* 12s */
+
+#define NIC_QPC_INV_USEC		1000000		/* 1s */
+
 #define TPC_ENABLED_MASK		0xFF
 
 #define GAUDI_HBM_SIZE_32GB		0x800000000ull
@@ -100,20 +110,22 @@
 	(((mmSYNC_MNGR_E_N_SYNC_MNGR_OBJS_MON_STATUS_511 - \
 	mmSYNC_MNGR_E_N_SYNC_MNGR_OBJS_MON_STATUS_0) + 4) >> 2)
 
+#define NIC_NUMBER_OF_PORTS	NIC_NUMBER_OF_ENGINES
+#define NIC_MAX_NUM_OF_LANES	(NIC_NUMBER_OF_MACROS * NIC_MAC_NUM_OF_LANES)
 
 /* DRAM Memory Map */
 
 #define CPU_FW_IMAGE_SIZE	0x10000000	/* 256MB */
 #define MMU_PAGE_TABLES_SIZE	0x0BF00000	/* 191MB */
 #define MMU_CACHE_MNG_SIZE	0x00100000	/* 1MB */
-#define RESERVED		0x04000000	/* 64MB */
+#define NIC_DRV_SIZE		0x04000000	/* 64MB */
 
 #define CPU_FW_IMAGE_ADDR	DRAM_PHYS_BASE
 #define MMU_PAGE_TABLES_ADDR	(CPU_FW_IMAGE_ADDR + CPU_FW_IMAGE_SIZE)
 #define MMU_CACHE_MNG_ADDR	(MMU_PAGE_TABLES_ADDR + MMU_PAGE_TABLES_SIZE)
+#define NIC_DRV_ADDR		(MMU_CACHE_MNG_ADDR + MMU_CACHE_MNG_SIZE)
 
-#define DRAM_DRIVER_END_ADDR	(MMU_CACHE_MNG_ADDR + MMU_CACHE_MNG_SIZE +\
-								RESERVED)
+#define DRAM_DRIVER_END_ADDR	(NIC_DRV_ADDR + NIC_DRV_SIZE)
 
 #define DRAM_BASE_ADDR_USER	0x20000000
 
@@ -145,6 +157,8 @@
 #define VA_HOST_SPACE_SIZE	(VA_HOST_SPACE_END - \
 					VA_HOST_SPACE_START) /* 767TB */
 
+#define VA_NIC_MEM_ADDR		0x10000000000ull /* 1TB */
+
 #define HW_CAP_PLL		BIT(0)
 #define HW_CAP_HBM		BIT(1)
 #define HW_CAP_MMU		BIT(2)
@@ -157,6 +171,7 @@
 #define HW_CAP_CLK_GATE		BIT(9)
 #define HW_CAP_SRAM_SCRAMBLER	BIT(10)
 #define HW_CAP_HBM_SCRAMBLER	BIT(11)
+#define HW_CAP_NIC_DRV		BIT(12)
 
 #define HW_CAP_NIC0		BIT(14)
 #define HW_CAP_NIC1		BIT(15)
@@ -232,6 +247,180 @@ enum gaudi_nic_mask {
 	GAUDI_NIC_MASK_ALL = 0x3FF
 };
 
+/**
+ * struct gaudi_nic_tx_taps - holds the NIC Tx taps values for a specific lane.
+ *                            Currently used for PAM4 only.
+ * @taps: holds all taps - tx_pre2, tx_pre1, tx_main, tx_post1 and tx_post2.
+ */
+struct gaudi_nic_tx_taps {
+	s32	taps[NIC_PHY_TX_TAPS_NUM];
+};
+
+/**
+ * struct gaudi_nic_macro - manage specific NIC macro that holds two NIC
+ *                          engines.
+ * @idx: index of the NIC macro.
+ * @num_of_lanes: number of lanes in the NIC macro.
+ */
+struct gaudi_nic_macro {
+	u8	idx;
+	u8	num_of_lanes;
+};
+
+/**
+ * struct gaudi_nic_device - manage specific NIC port.
+ * @hdev: habanalabs device structure.
+ * @ndev: pointer to network device.
+ * @nic_macro: pointer to the manage structure of the containing NIC macro.
+ * @napi: New API structure.
+ * @tx_wq: Tx work queue for handling packet transmission outside interrupt
+ *         context (for simulator only).
+ * @rx_wq: Rx work queue for handling incoming packets outside interrupt
+ *         context (for simulator only).
+ * @cq_wq: CQ work queue for handling CQEs outside interrupt context.
+ * @rx_mem_cpu: CPU address of RX memory.
+ * @rx_mem_dma: DMA address of RX memory.
+ * @cq_mem_cpu: CPU address of CQ memory.
+ * @cq_mem_dma: DMA address of CQ memory.
+ * @qp_err_mem_cpu: CPU address of QP error memory.
+ * @qp_err_mem_dma: DMA address of QP error memory.
+ * @in_reset: 1 if the NIC is currently in reset, 0 otherwise.
+ * @rx_poll_work: Rx work for polling mode.
+ * @cq_work: CQ work for processing CQEs.
+ * @link_status_work: work for checking NIC link status.
+ * @port_open_work: work for initializing the port H/W.
+ * @idr_lock: Protects qp_ids.
+ * @user_wq_lock: protects the user WQ configuration.
+ * @qp_ids: IDR to hold all connections IDs.
+ * @pcs_fail_fifo: queue for keeping the PCS link failures time stamps in order
+ *                 to reconfigure F/W if needed.
+ * @last_cqe_ts: time stamp of last processed CQE.
+ * @last_fw_tuning_ts: time stamp of last F/W tuning.
+ * @last_pcs_link_drop_ts: time stamp of last PCS link drop.
+ * @rx_msi_addr: Rx MSI address.
+ * @tx_swq_mem_device_va: device virtual address of Tx SWQ memory.
+ * @cq_mem_device_va: device virtual address of CQ memory.
+ * @rx_mem_size: Rx memory size.
+ * @cq_mem_size: CQ memory size.
+ * @qp_err_mem_size: QP error buffer memory size.
+ * @rx_ci: incremented by S/W for each received packet from the H/W.
+ * @tx_pi: incremented by S/W for each sent packet to the H/W.
+ * @tx_ci: incremented by H/W for each sent packet from the H/W.
+ * @cq_ci: incremented by S/W for each consumed CQE.
+ * @port: NIC specific port.
+ * @data_rate: NIC data rate according to speed and number of lanes.
+ * @tx_wq_pi: TX work queue PI for transmitting packets by their order (for
+ *            simulator only).
+ * @tx_wq_ci: TX work queue CI for transmitting packets by their order (for
+ *            simulator only).
+ * @qp_err_ci: next index of the QP error to fetch.
+ * @retry_cnt: counts the number of retries during link establishment.
+ * @pcs_fail_cnt: counter of PCS link failures since last F/W configuration.
+ * @pcs_local_fault_cnt: counter of PCS link local errors since last F/W
+ *                       configuration. These errors can appear even when link
+ *                       is up.
+ * @pcs_remote_fault_cnt: counter of PCS link remote errors since last F/W
+ *                        configuration. These errors can appear even when link
+ *                        is up.
+ * @speed: the bandwidth of the port in Mb/s.
+ * @last_cqe_cnt: the last number of processed CQEs.
+ * @cq_delay: the time between two invocations of the CQ polling work when not
+ *            idle.
+ * @cq_delay_idle: the time between two invocations of the CQ polling work when
+ *                 idle.
+ * @correctable_errors_cnt: count the correctable FEC blocks.
+ * @uncorrectable_errors_cnt: count the uncorrectable FEC blocks.
+ * @enabled: true if the NIC is enabled by the S/W, false otherwise. Can be
+ *           changed only from ndo_open/ndo_stop callbacks.
+ * @active: true if the NIC H/W is operational, false otherwise.
+ * @port_open: true if the port H/W is initialized, false otherwise.
+ * @do_macro_cfg: true if this port should handle the macro configuration, false
+ *              otherwise. Each NIC macro contains two ports - even and odd, and
+ *              only one of them should handle the shared configuration.
+ *              The default is for the even port to handle it, but in case that
+ *              the even port is disabled, the odd port will do it.
+ * @phy_fw_tuned: true if F/W is tuned, false otherwise.
+ * @pcs_link: true if the NIC has PCS link, false otherwise.
+ * @mac_loopback: true if port in MAC loopback mode, false otherwise.
+ * @auto_neg_enable: true if this port supports Autonegotiation, false
+ *                   otherwise.
+ * @auto_neg_resolved: true if this port completed Autonegotiation, false
+ *                     otherwise.
+ * @power_up_mask: represents which MAC channels should be configured during PHY
+ *                 power up.
+ * @fw_tuning_mask: represents which MAC channels should be configured during
+ *                  F/W tuning.
+ * @auto_neg_mask: represents which MAC channels should be configured during
+ *                 Autonegotiation.
+ * @pfc_enable: true if this port supports Priority Flow Control, false
+ *              otherwise.
+ */
+struct gaudi_nic_device {
+	struct hl_device	*hdev;
+	struct net_device	*ndev;
+	struct gaudi_nic_macro	*nic_macro;
+	struct napi_struct	napi;
+	struct workqueue_struct	*tx_wq;
+	struct workqueue_struct	*rx_wq;
+	struct workqueue_struct	*cq_wq;
+	void			*rx_mem_cpu;
+	dma_addr_t		rx_mem_dma;
+	void			*cq_mem_cpu;
+	dma_addr_t		cq_mem_dma;
+	void			*qp_err_mem_cpu;
+	dma_addr_t		qp_err_mem_dma;
+	atomic_t		in_reset;
+	struct delayed_work	rx_poll_work;
+	struct delayed_work	cq_work;
+	struct delayed_work	link_status_work;
+	struct delayed_work	port_open_work;
+	struct mutex		idr_lock;
+	struct mutex		user_wq_lock;
+	struct idr		qp_ids;
+	struct kfifo		pcs_fail_fifo;
+	ktime_t			last_cqe_ts;
+	ktime_t			last_fw_tuning_ts;
+	ktime_t			last_pcs_link_drop_ts;
+	u64			rx_msi_addr;
+	u64			tx_swq_mem_device_va;
+	u64			cq_mem_device_va;
+	u32			rx_mem_size;
+	u32			cq_mem_size;
+	u32			qp_err_mem_size;
+	u32			rx_ci;
+	u32			tx_pi;
+	u32			tx_ci;
+	u32			cq_ci;
+	u32			port;
+	u32			data_rate;
+	u32			tx_wq_pi;
+	u32			tx_wq_ci;
+	u32			qp_err_ci;
+	u32			retry_cnt;
+	u32			pcs_fail_cnt;
+	u32			pcs_local_fault_cnt;
+	u32			pcs_remote_fault_cnt;
+	u32			speed;
+	u32			last_cqe_cnt;
+	u32			cq_delay;
+	u32			cq_delay_idle;
+	u32			correctable_errors_cnt;
+	u32			uncorrectable_errors_cnt;
+	u8			enabled;
+	u8			active;
+	u8			port_open;
+	u8			do_macro_cfg;
+	u8			phy_fw_tuned;
+	u8			pcs_link;
+	u8			mac_loopback;
+	u8			auto_neg_enable;
+	u8			auto_neg_resolved;
+	u8			power_up_mask;
+	u8			fw_tuning_mask;
+	u8			auto_neg_mask;
+	u8			pfc_enable;
+};
+
 /**
  * struct gaudi_internal_qman_info - Internal QMAN information.
  * @pq_kernel_addr: Kernel address of the PQ memory area in the host.
@@ -247,14 +436,29 @@ struct gaudi_internal_qman_info {
 /**
  * struct gaudi_device - ASIC specific manage structure.
  * @cpucp_info_get: get information on device from CPU-CP
+ * @nic_handle_rx: NIC handler for incoming packet.
+ * @nic_handle_tx: NIC handler for outgoing packet.
+ * @nic_devices: array that holds all NIC ports manage structures.
+ * @nic_macros: array that holds all NIC macros manage structures.
+ * @nic_pam4_tx_taps: array that holds all PAM4 Tx taps of all NIC lanes.
+ * @nic_cq_comp: completion queue to handle wait/poll NIC CQ IOCTL.
+ * @nic_cq_lock: for serial copying of the CQEs from the NIC buffer to the user
+ *               queue.
  * @hw_queues_lock: protects the H/W queues from concurrent access.
  * @clk_gate_mutex: protects code areas that require clock gating to be disabled
  *                  temporarily
+ * @nic_cq_user_lock: protects the NIC CQ from concurrent operations that may
+ *                    interfere with each other such as wait/mmap/destroy etc.
+ * @nic_qp_err_lock: protects the NIC QP error handler from pushing error
+ *                   entries to the CQ while it is under destruction.
+ * @nic_cq_buf: NIC CQ buffer, shared for all ports.
  * @internal_qmans: Internal QMANs information. The array size is larger than
  *                  the actual number of internal queues because they are not in
  *                  consecutive order.
  * @hbm_bar_cur_addr: current address of HBM PCI bar.
  * @max_freq_value: current max clk frequency.
+ * @nic_mac_loopback: enable MAC loopback on specific NIC ports.
+ * @nic_cq_user_new_cqes: number of available CQEs to process.
  * @events: array that holds all event id's
  * @events_stat: array that holds histogram of all received events.
  * @events_stat_aggregate: same as events_stat but doesn't get cleared on reset
@@ -263,29 +467,86 @@ struct gaudi_internal_qman_info {
  *                      signal we can use this engine in later code paths.
  *                      Each bit is cleared upon reset of its corresponding H/W
  *                      engine.
+ * @nic_cq_user_num_of_entries: number of CQ entries in the user CQ buffer
+ *                              (received from the user).
+ * @nic_cq_user_pi: producer index of the NIC CQ user buffer.
+ * @nic_cq_user_ci: consumer index of the NIC CQ user buffer.
+ * @nic_cq_status: return status of the CQ.
+ * @nic_cq_mmap_size: size of the mmapped CQ buffer.
+ * @nic_pcs_fail_time_frame: time frame in seconds to count PCS link failures.
+ * @nic_pcs_fail_threshold: PCS link failures threshold to reset link.
+ * @nic_qpc_cache_inv_timeout: timeout for NIC QPC cache invalidation.
+ * @nic_phy_load_fw: true if the NIC PHY F/W should be loaded, false otherwise.
+ * @nic_phy_config_fw: true if the NIC PHY F/W should be configured, false
+ *                     otherwise. The NIC PHY F/W should be configured on ASIC
+ *                     only, as opposed to simulator/Palladium.
+ * @nic_cq_enable: true if NIC CQ is enabled, false otherwise.
+ * @nic_cq_mmap: true if NIC CQ is mmapped, false otherwise.
+ * @nic_use_fw_polarity: true if NIC should use polarity values from F/W,
+ *                       false if NIC should use hard coded values.
  * @multi_msi_mode: whether we are working in multi MSI single MSI mode.
  *                  Multi MSI is possible only with IOMMU enabled.
+ * @nic_in_reset: true if the NIC was marked as in reset, false otherwise. Used
+ *                to avoid an additional stopping of the NIC if a hard reset was
+ *                re-initiated.
  * @mmu_cache_inv_pi: PI for MMU cache invalidation flow. The H/W expects an
  *                    8-bit value so use u8.
+ * @nic_check_link: true if the PCS link should be checked periodically.
+ * @nic_cq_irq_enable: true if an interrupt was allocated for the NIC CQ.
+ * @nic_in_teardown: true if the NIC is in teardown (during device remove).
+ * @nic_phy_auto_neg_lpbk: true if the NIC PHY should support Autoneg in
+ *                         loopback mode.
  */
 struct gaudi_device {
 	int (*cpucp_info_get)(struct hl_device *hdev);
-
+	void (*nic_handle_rx)(struct gaudi_nic_device *gaudi_nic);
+	int (*nic_handle_tx)(struct gaudi_nic_device *gaudi_nic, void *data);
+	struct gaudi_nic_device		nic_devices[NIC_NUMBER_OF_PORTS];
+	struct gaudi_nic_macro		nic_macros[NIC_NUMBER_OF_MACROS];
+	struct gaudi_nic_tx_taps	nic_pam4_tx_taps[NIC_MAX_NUM_OF_LANES];
+	struct completion		nic_cq_comp;
+
+	spinlock_t			nic_cq_lock;
 	/* TODO: remove hw_queues_lock after moving to scheduler code */
 	spinlock_t			hw_queues_lock;
 	struct mutex			clk_gate_mutex;
 
+	struct mutex			nic_cq_user_lock;
+	struct mutex			nic_qp_err_lock;
+
+	struct hl_nic_cqe		*nic_cq_buf;
 	struct gaudi_internal_qman_info	internal_qmans[GAUDI_QUEUE_ID_SIZE];
 
 	u64				hbm_bar_cur_addr;
 	u64				max_freq_value;
+	u64				nic_mac_loopback;
+
+	atomic_t			nic_cq_user_new_cqes;
 
 	u32				events[GAUDI_EVENT_SIZE];
 	u32				events_stat[GAUDI_EVENT_SIZE];
 	u32				events_stat_aggregate[GAUDI_EVENT_SIZE];
 	u32				hw_cap_initialized;
+	u32				nic_cq_user_num_of_entries;
+	u32				nic_cq_user_pi;
+	u32				nic_cq_user_ci;
+	u32				nic_cq_status;
+	u32				nic_cq_mmap_size;
+	u32				nic_pcs_fail_time_frame;
+	u32				nic_pcs_fail_threshold;
+	u32				nic_qpc_cache_inv_timeout;
+	u8				nic_phy_load_fw;
+	u8				nic_phy_config_fw;
+	u8				nic_cq_enable;
+	u8				nic_cq_mmap;
+	u8				nic_use_fw_polarity;
 	u8				multi_msi_mode;
+	u8				nic_in_reset;
 	u8				mmu_cache_inv_pi;
+	u8				nic_check_link;
+	u8				nic_cq_irq_enable;
+	u8				nic_in_teardown;
+	u8				nic_phy_auto_neg_lpbk;
 };
 
 void gaudi_init_security(struct hl_device *hdev);
@@ -296,4 +557,19 @@ int gaudi_debug_coresight(struct hl_device *hdev, void *data);
 void gaudi_halt_coresight(struct hl_device *hdev);
 int gaudi_get_clk_rate(struct hl_device *hdev, u32 *cur_clk, u32 *max_clk);
 
+/* NIC functions */
+
+int gaudi_nic_ports_init(struct hl_device *hdev);
+void gaudi_nic_ports_fini(struct hl_device *hdev);
+int gaudi_nic_hard_reset_prepare(struct hl_device *hdev);
+void gaudi_nic_stop(struct hl_device *hdev);
+void gaudi_nic_ports_reopen(struct hl_device *hdev);
+void gaudi_nic_ctx_fini(struct hl_ctx *ctx);
+irqreturn_t gaudi_nic_rx_irq_handler(int irq, void *arg);
+irqreturn_t gaudi_nic_cq_irq_handler(int irq, void *arg);
+netdev_tx_t gaudi_nic_handle_tx_pkt(struct gaudi_nic_device *gaudi_nic,
+					struct sk_buff *skb);
+int gaudi_nic_sw_init(struct hl_device *hdev);
+void gaudi_nic_sw_fini(struct hl_device *hdev);
+
 #endif /* GAUDIP_H_ */
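
Note on the DRAM memory map in gaudiP.h: replacing RESERVED with
NIC_DRV_SIZE keeps the driver-reserved region the same size, so
DRAM_DRIVER_END_ADDR should not move. The standalone sketch below only
redoes that arithmetic. The sizes are copied from the hunk above (with an
added ull suffix); DRAM_PHYS_BASE is set to 0 purely as an assumption for
illustration, since its real definition is not part of this patch, and
dram_driver_region_size() is a hypothetical helper, not a driver function.

```c
#include <assert.h>
#include <stdint.h>

/* ASSUMPTION: illustrative base only; the real value is defined elsewhere. */
#define DRAM_PHYS_BASE		0x0ull

/* Sizes as listed in the DRAM memory map of gaudiP.h */
#define CPU_FW_IMAGE_SIZE	0x10000000ull	/* 256MB */
#define MMU_PAGE_TABLES_SIZE	0x0BF00000ull	/* 191MB */
#define MMU_CACHE_MNG_SIZE	0x00100000ull	/* 1MB */
#define NIC_DRV_SIZE		0x04000000ull	/* 64MB */

/* Each region starts where the previous one ends */
#define CPU_FW_IMAGE_ADDR	DRAM_PHYS_BASE
#define MMU_PAGE_TABLES_ADDR	(CPU_FW_IMAGE_ADDR + CPU_FW_IMAGE_SIZE)
#define MMU_CACHE_MNG_ADDR	(MMU_PAGE_TABLES_ADDR + MMU_PAGE_TABLES_SIZE)
#define NIC_DRV_ADDR		(MMU_CACHE_MNG_ADDR + MMU_CACHE_MNG_SIZE)
#define DRAM_DRIVER_END_ADDR	(NIC_DRV_ADDR + NIC_DRV_SIZE)

/* 256MB + 191MB + 1MB + 64MB = 512MB reserved for the driver */
_Static_assert(DRAM_DRIVER_END_ADDR - DRAM_PHYS_BASE == 0x20000000ull,
	       "driver-reserved DRAM region must be 512MB");

/* Hypothetical helper: size of the driver-reserved DRAM region in bytes */
static inline uint64_t dram_driver_region_size(void)
{
	return DRAM_DRIVER_END_ADDR - DRAM_PHYS_BASE;
}
```

Under the zero-base assumption, the region ends at 0x20000000, which is the
value the header uses for DRAM_BASE_ADDR_USER.
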
diff --git a/drivers/misc/habanalabs/gaudi/gaudi_nic.c b/drivers/misc/habanalabs/gaudi/gaudi_nic.c
new file mode 100644
index 000000000000..9fc6e9fe7ac4
--- /dev/null
+++ b/drivers/misc/habanalabs/gaudi/gaudi_nic.c
@@ -0,0 +1,2327 @@
+// SPDX-License-Identifier: GPL-2.0
+
+/*
+ * Copyright 2018-2020 HabanaLabs, Ltd.
+ * All Rights Reserved.
+ */
+
+#include "gaudi_nic.h"
+#include "../include/gaudi/asic_reg/gaudi_regs.h"
+#include "../include/hw_ip/mmu/mmu_general.h"
+#include "../include/hw_ip/nic/nic_general.h"
+#include <uapi/misc/habanalabs.h>
+
+#include <linux/vmalloc.h>
+#include <linux/etherdevice.h>
+#include <linux/pci.h>
+#include <linux/ipv6.h>
+#include <linux/if_vlan.h>
+#include <linux/io-64-nonatomic-lo-hi.h>
+
+#define HL_NIC_DEBUG 0
+
+/*
+ * enum link_status - PCS link status.
+ * @LINK_UP: PHY is ready and PCS has link.
+ * @PCS_DOWN: PCS has no link.
+ * @PHY_DOWN: PHY is not ready.
+ * @FAIL_RECONFIG: need to reconfigure the PHY due to PCS link failures.
+ * @FAULT_RECONFIG: need to reconfigure the PHY due to PCS link faults.
+ */
+enum link_status {
+	LINK_UP,
+	PCS_DOWN,
+	PHY_DOWN,
+	FAIL_RECONFIG,
+	FAULT_RECONFIG
+};
+
+/*
+ * enum eth_pkt_status - status of Rx Ethernet packet.
+ * @ETH_PKT_OK: packet was received successfully.
+ * @ETH_PKT_DROP: packet should be dropped.
+ * @ETH_PKT_NONE: no available packet.
+ */
+enum eth_pkt_status {
+	ETH_PKT_OK,
+	ETH_PKT_DROP,
+	ETH_PKT_NONE
+};
+
+#define HLS1_EXT_PORTS_MASK		0x302
+#define FW_LINK_TRAINING_CNT		200
+#define FW_TUNING_CNT			3000
+#define PCS_LINK_CNT			10
+#define PCS_FAIL_TIME_FRAME_SEC		(60 * 5) /* 5 minutes */
+#define PCS_FAIL_THRESHOLD		8
+#define PCS_FAULT_THRESHOLD		20
+#define PCS_LINK_RETRY_MSEC		20
+
+/* NIC_MAX_MTU equals 8K minus the Ethernet header, VLAN tag and FCS */
+#define NIC_MAX_MTU	((1 << 13) - (ETH_HLEN + VLAN_HLEN + ETH_FCS_LEN))
+
+/* MAC configuration */
+#define MAC_CFG_MAC(addr, data)		\
+				mac_write(gaudi_nic, i, "mac", addr, data)
+#define MAC_CFG_MAC_CORE(addr, data)	\
+				mac_write(gaudi_nic, i, "mac_core", addr, data)
+#define MAC_CFG_XPCS(addr, data)	\
+				mac_write(gaudi_nic, i, "xpcs", addr, data)
+#define MAC_CFG_XPCS91(addr, data)	\
+				mac_write(gaudi_nic, i, "xpcs91", addr, data)
+
+bool disabled_or_in_reset(struct gaudi_nic_device *gaudi_nic)
+{
+	return atomic_read(&gaudi_nic->in_reset) ||
+			hl_device_disabled_or_in_reset(gaudi_nic->hdev);
+}
+
+static void qpc_cache_inv(struct gaudi_nic_device *gaudi_nic, bool is_req)
+{
+	struct gaudi_device *gaudi = gaudi_nic->hdev->asic_specific;
+	struct hl_device *hdev = gaudi_nic->hdev;
+	u32 status, port = gaudi_nic->port;
+	u64 inv_reg, status_reg, base;
+	int rc;
+
+	if (is_req) {
+		inv_reg = mmNIC0_QPC0_REQ_QPC_CACHE_INVALIDATE;
+		status_reg = mmNIC0_QPC0_REQ_QPC_CACHE_INV_STATUS;
+	} else {
+		inv_reg = mmNIC0_QPC0_RES_QPC_CACHE_INVALIDATE;
+		status_reg = mmNIC0_QPC0_RES_QPC_CACHE_INV_STATUS;
+	}
+
+	/* fix the address to the correct NIC */
+	base = NIC_CFG_BASE(port);
+	inv_reg += base;
+	status_reg += base;
+
+	WREG32(inv_reg, 1);
+	WREG32(inv_reg, 0);
+
+	/* no need to wait for the status in case of hard reset */
+	if (hdev->hard_reset_pending)
+		return;
+
+	rc = hl_poll_timeout(
+		hdev,
+		status_reg,
+		status,
+		status &
+			NIC0_QPC0_REQ_QPC_CACHE_INV_STATUS_INVALIDATE_DONE_MASK,
+		1000,
+		gaudi->nic_qpc_cache_inv_timeout);
+
+	if (rc)
+		dev_warn(hdev->dev,
+			"NIC %s QPC cache invalidation timeout, port: %d\n",
+			is_req ? "requester" : "responder", port);
+}
+
+static void eth_start_stop(struct gaudi_nic_device *gaudi_nic, bool is_start)
+{
+	struct gaudi_device *gaudi = gaudi_nic->hdev->asic_specific;
+	u64 *qpc_addr, req_qpc_addr, res_qpc_addr;
+	struct hl_device *hdev = gaudi_nic->hdev;
+	struct qpc_requester req_qp;
+	struct qpc_responder res_qp;
+	u32 port = gaudi_nic->port;
+	int i;
+
+	/*
+	 * Due to H/W bug, odd ports cannot generate MSI interrupts.
+	 * Hence they generate wire interrupts and CPU-CP converts them to MSI
+	 * interrupts. To prevent CPU-CP from generating MSI interrupts after
+	 * the odd port goes down, clear the interrupt enable bit here.
+	 */
+	if (!is_start && !hdev->nic_rx_poll && (port & 1))
+		NIC_RMWREG32(mmNIC0_QPC0_INTERRUPT_EN, 0,
+				NIC0_QPC0_INTERRUPT_EN_INTERRUPT4_WIRE_EN_MASK);
+
+	/* ETH uses QP 0 */
+	req_qpc_addr = REQ_QPC_ADDR(port, 0);
+
+	memset(&req_qp, 0, sizeof(req_qp));
+	REQ_QPC_SET_TRANSPORT_SERVICE(req_qp, TS_RAW);
+	REQ_QPC_SET_LAST_IDX(req_qp, (WQ_BUFFER_SIZE - 1));
+	/*
+	 * See comment regarding the NIC_HW_MAX_QP_NUM value in the section of
+	 * TXE configuration in config_port_hw().
+	 */
+	REQ_QPC_SET_WQ_BASE_ADDR(req_qp, NIC_HW_MAX_QP_NUM);
+	REQ_QPC_SET_VALID(req_qp, (u64) is_start);
+	REQ_QPC_SET_SECURED(req_qp, SECURED);
+	REQ_QPC_SET_PORT(req_qp, 0);
+
+	qpc_addr = (u64 *) &req_qp;
+	for (i = 0 ; i < (sizeof(req_qp) / sizeof(u64)) ; i++)
+		writeq(qpc_addr[i], hdev->pcie_bar[HBM_BAR_ID] +
+			((req_qpc_addr + i * 8) - gaudi->hbm_bar_cur_addr));
+
+	readq(hdev->pcie_bar[HBM_BAR_ID]);
+
+	qpc_cache_inv(gaudi_nic, true);
+
+	/* ETH uses QP 0 */
+	res_qpc_addr = RES_QPC_ADDR(port, 0);
+
+	memset(&res_qp, 0, sizeof(res_qp));
+	RES_QPC_SET_TRANSPORT_SERVICE(res_qp, TS_RAW);
+	RES_QPC_SET_LOG_BUF_SIZE_MASK(res_qp, QPC_RES_LOG_BUF_SIZE_MASK);
+	RES_QPC_SET_VALID(res_qp, (u64) is_start);
+	RES_QPC_SET_SECURED(res_qp, SECURED);
+	RES_QPC_SET_PORT(res_qp, 0);
+
+	qpc_addr = (u64 *) &res_qp;
+	for (i = 0 ; i < (sizeof(res_qp) / sizeof(u64)) ; i++)
+		writeq(qpc_addr[i], hdev->pcie_bar[HBM_BAR_ID] +
+			((res_qpc_addr + i * 8) - gaudi->hbm_bar_cur_addr));
+
+	readq(hdev->pcie_bar[HBM_BAR_ID]);
+
+	qpc_cache_inv(gaudi_nic, false);
+}
+
+static u32 mac_addr_convert(int mac, char *cfg_type, u32 addr)
+{
+	if (!strcmp(cfg_type, "xpcs")) {
+		if (addr >= 200 && addr <= 219)
+			addr = addr - 200 + 54;
+		else if (addr >= 400 && addr <= 419)
+			addr = addr - 400 + 74;
+		else if (addr >= (1 << 15))
+			addr = addr - (1 << 15) + 95;
+
+		addr = addr * 4 + mac * (1 << 12);
+	} else if (!strcmp(cfg_type, "mac")) {
+		addr = addr + mac * (1 << 12) + (1 << 10);
+	} else if (!strcmp(cfg_type, "mac_core")) {
+		addr = addr + (1 << 15);
+	} else if (!strcmp(cfg_type, "xpcs91")) {
+		addr = addr * 4 + (1 << 11) * 10;
+	}
+
+	return addr + 0xCC0000;
+}
+
+static void mac_write(struct gaudi_nic_device *gaudi_nic, int mac,
+			char *cfg_type, u32 addr, u32 data)
+{
+	struct hl_device *hdev = gaudi_nic->hdev;
+
+	addr = mac_addr_convert(mac, cfg_type, addr);
+
+	NIC_MACRO_WREG32(addr, data);
+}
+
+u32 gaudi_nic_mac_read(struct gaudi_nic_device *gaudi_nic, int mac,
+			char *cfg_type, u32 addr)
+{
+	struct hl_device *hdev = gaudi_nic->hdev;
+
+	addr = mac_addr_convert(mac, cfg_type, addr);
+
+	return NIC_MACRO_RREG32(addr);
+}
+
+static void config_port_hw(struct gaudi_nic_device *gaudi_nic, u64 mac_addr)
+{
+	u64 swq_base_addr = SWQ_BASE_ADDR + gaudi_nic->port * SWQ_BASE_SIZE;
+	u32 rx_mem_addr_lo, rx_mem_addr_hi, txs_fence_idx, txs_pi, txs_ci,
+		txs_tail, txs_head, txs_timeout_31_0, timeout_47_32, prio,
+		txs_port, rl_en_log_time, txs_schedq, port = gaudi_nic->port;
+	u64 txs_addr, cq_msi_addr,
+		req_qpc_base_addr = REQ_QPC_ADDR(gaudi_nic->port, 0),
+		res_qpc_base_addr = RES_QPC_ADDR(gaudi_nic->port, 0);
+	struct gaudi_device *gaudi = gaudi_nic->hdev->asic_specific;
+	u64 tx_swq_base, cq_mem_addr = gaudi_nic->cq_mem_device_va;
+	struct hl_device *hdev = gaudi_nic->hdev;
+	int i;
+
+	if (gaudi->multi_msi_mode) {
+		gaudi_nic->rx_msi_addr = RX_MSI_ADDRESS + port * 4;
+		cq_msi_addr = CQ_MSI_ADDRESS;
+	} else {
+		gaudi_nic->rx_msi_addr = cq_msi_addr = mmPCIE_MSI_INTR_0;
+	}
+
+	/* TXS Configuration */
+	txs_addr = TXS_BASE_ADDR + port * TXS_BASE_SIZE;
+
+	/* Timer free list */
+	for (i = 0 ; i < TXS_FREE_NUM_ENTRIES ; i++) {
+		writel(TXS_GRANULARITY + i, hdev->pcie_bar[HBM_BAR_ID] +
+			((txs_addr + TXS_FREE_OFFS + i * 4) -
+				gaudi->hbm_bar_cur_addr));
+	}
+
+	/* Perform read to flush the writes */
+	readq(hdev->pcie_bar[HBM_BAR_ID]);
+
+	NIC_WREG32(mmNIC0_TXS0_BASE_ADDRESS_49_18,
+				(txs_addr + TXS_FIFO_OFFS) >> 18);
+	NIC_WREG32(mmNIC0_TXS0_BASE_ADDRESS_17_7,
+				((txs_addr + TXS_FIFO_OFFS) >> 7) & 0x7FF);
+	NIC_WREG32(mmNIC0_TXS0_FREE_LIST_PUSH_MASK_EN, 1);
+
+	txs_fence_idx = 0;
+	txs_pi = 0;
+	txs_ci = 0;
+	txs_tail = 0;
+	txs_head = 0;
+	txs_timeout_31_0 = 0;
+	timeout_47_32 = 0;
+	prio = 0;
+	txs_port = 0;
+	rl_en_log_time = 0;
+	txs_schedq = (timeout_47_32 & 0xFFFF) | ((prio & 0x3) << 16) |
+			((txs_port & 1) << 18) |
+			((rl_en_log_time & 0x3F) << 19);
+
+	for (i = 0 ; i < TXS_SCHEDQ ; i++) {
+		txs_tail = txs_head = i;
+		NIC_WREG32(mmNIC0_TXS0_SCHEDQ_UPDATE_DESC_31_0, txs_fence_idx);
+		NIC_WREG32(mmNIC0_TXS0_SCHEDQ_UPDATE_DESC_63_32, txs_pi);
+		NIC_WREG32(mmNIC0_TXS0_SCHEDQ_UPDATE_DESC_95_64, txs_ci);
+		NIC_WREG32(mmNIC0_TXS0_SCHEDQ_UPDATE_DESC_127_96, txs_tail);
+		NIC_WREG32(mmNIC0_TXS0_SCHEDQ_UPDATE_DESC_159_128, txs_head);
+		NIC_WREG32(mmNIC0_TXS0_SCHEDQ_UPDATE_DESC_191_160,
+							txs_timeout_31_0);
+		NIC_WREG32(mmNIC0_TXS0_SCHEDQ_UPDATE_DESC_217_192, txs_schedq);
+		NIC_WREG32(mmNIC0_TXS0_SCHEDQ_UPDATE_FIFO, i);
+		NIC_WREG32(mmNIC0_TXS0_SCHEDQ_UPDATE_EN, 1);
+	}
+
+	NIC_WREG32(mmNIC0_TXS0_TICK_WRAP, 100);
+	NIC_WREG32(mmNIC0_TXS0_FIRST_SCHEDQ_ID,
+			0 << NIC0_TXS0_FIRST_SCHEDQ_ID_R0_SHIFT |
+			64 << NIC0_TXS0_FIRST_SCHEDQ_ID_R1_SHIFT |
+			128 << NIC0_TXS0_FIRST_SCHEDQ_ID_R2_SHIFT |
+			192 << NIC0_TXS0_FIRST_SCHEDQ_ID_R3_SHIFT);
+	NIC_WREG32(mmNIC0_TXS0_LAST_SCHEDQ_ID,
+			63 << NIC0_TXS0_FIRST_SCHEDQ_ID_R0_SHIFT |
+			127 << NIC0_TXS0_FIRST_SCHEDQ_ID_R1_SHIFT |
+			191 << NIC0_TXS0_FIRST_SCHEDQ_ID_R2_SHIFT |
+			155 << NIC0_TXS0_FIRST_SCHEDQ_ID_R3_SHIFT);
+	NIC_WREG32(mmNIC0_TXS0_SCAN_TIME_COMPARE_0, 4);
+	NIC_WREG32(mmNIC0_TXS0_SCAN_TIME_COMPARE_1, 0);
+	NIC_WREG32(mmNIC0_TXS0_TMR_SCAN_EN, 1);
+
+	NIC_WREG32(mmNIC0_TXS0_BASE_ADDRESS_FREE_LIST_49_32,
+				(txs_addr + TXS_FREE_OFFS) >> 32);
+	NIC_WREG32(mmNIC0_TXS0_BASE_ADDRESS_FREE_LIST_31_0,
+				(txs_addr + TXS_FREE_OFFS) & 0xFFFFFFFF);
+
+	NIC_WREG32(mmNIC0_TXS0_LIST_MASK,
+			~(0xFFFFFFFF << (ilog2(TXS_FREE_NUM_ENTRIES) - 5)));
+	NIC_WREG32(mmNIC0_TXS0_PRODUCER_UPDATE, TXS_FREE_NUM_ENTRIES);
+	NIC_WREG32(mmNIC0_TXS0_PRODUCER_UPDATE_EN, 1);
+	NIC_WREG32(mmNIC0_TXS0_PRODUCER_UPDATE_EN, 0);
+	NIC_WREG32(mmNIC0_TXS0_LIST_MEM_READ_MASK, 0);
+	NIC_WREG32(mmNIC0_TXS0_PUSH_LOCK_EN, 1);
+
+	/* Consider burst size */
+	NIC_WREG32(mmNIC0_TXS0_IGNORE_BURST_EN, 0);
+
+	/* TXE Configuration */
+
+	/*
+	 * We want to separate the driver WQ from the user WQs.
+	 * Since the NIC supports 4 different WQ base addresses, base address 0
+	 * will be used by the user and base address 1 by the driver.
+	 * The WQ base address index is inferred by two bits that are taken from
+	 * QPC.WQ_BASE_ADDR and are configurable by SQ_BASE_ADDRESS_SEL.
+	 * Since we support up to NIC_HW_MAX_QP_NUM user QPs and the single
+	 * driver QP is located after them, we configure the driver
+	 * QPC.WQ_BASE_ADDR to the value NIC_HW_MAX_QP_NUM, and
+	 * SQ_BASE_ADDRESS_SEL to have the right shift value so the driver will
+	 * indeed use base address 1.
+	 */
+
+	/*
+	 * Need to subtract the size of the user WQs because the driver uses WQ
+	 * base address 1.
+	 */
+	tx_swq_base = swq_base_addr -
+			(1 << (WQ_BUFFER_LOG_SIZE - 2)) * NIC_HW_MAX_QP_NUM *
+				DEVICE_CACHE_LINE_SIZE;
+
+	NIC_WREG32(mmNIC0_TXE0_SQ_BASE_ADDRESS_49_32_1,
+			(tx_swq_base >> 32) & 0x3FFFFF);
+	NIC_WREG32(mmNIC0_TXE0_SQ_BASE_ADDRESS_31_0_1,
+			tx_swq_base & 0xFFFFFFFF);
+
+	/*
+	 * This register should contain the value of the shift that the H/W will
+	 * apply on QPC.WQ_BASE_ADDR in order to get the WQ base address index.
+	 * The driver uses WQ base address 1 so we need to trim the leading
+	 * zero bits.
+	 */
+	NIC_WREG32(mmNIC0_TXE0_SQ_BASE_ADDRESS_SEL, ffs(NIC_HW_MAX_QP_NUM) - 1);
+
+	NIC_WREG32(mmNIC0_TXE0_LOG_MAX_WQ_SIZE_1, WQ_BUFFER_LOG_SIZE - 2);
+	NIC_WREG32(mmNIC0_TXE0_PORT0_MAC_CFG_47_32, (mac_addr >> 32) & 0xFFFF);
+	NIC_WREG32(mmNIC0_TXE0_PORT0_MAC_CFG_31_0, mac_addr & 0xFFFFFFFF);
+	NIC_WREG32(mmNIC0_TXE0_PORT1_MAC_CFG_47_32, (mac_addr >> 32) & 0xFFFF);
+	NIC_WREG32(mmNIC0_TXE0_PORT1_MAC_CFG_31_0, mac_addr & 0xFFFFFFFF);
+
+	/* Since the user WQs are mapped via MMU by the user, their AXI_USER
+	 * registers are set without MMU bypass and with the user ASID.
+	 * Because these configuration registers are shared between the user WQs
+	 * and the ETH Tx WQ, the latter can't be mapped via MMU as we need to
+	 * configure the LKD ASID for that.
+	 * In addition, the ETH Tx WQ is secured so the user shouldn't be able
+	 * to access it. Hence we place the ETH Tx WQ on HBM in the LKD reserved
+	 * section.
+	 */
+	NIC_WREG32(mmNIC0_TXE0_WQE_FETCH_AXI_USER, 1);
+	/*
+	 * The Tx data is placed on HBM. Hence configure it without MMU bypass
+	 * and with the user ASID to avoid any successful access to the host
+	 */
+	NIC_WREG32(mmNIC0_TXE0_DATA_FETCH_AXI_USER, 1);
+	NIC_WREG32(mmNIC0_TXE0_INTERRUPT_MASK, 3);
+
+	/* Make sure data fetch can never be privileged */
+	NIC_WREG32(mmNIC0_TXE0_DATA_FETCH_AXI_PROT, 0x80);
+	/* Make sure WQE fetch can never be privileged */
+	NIC_WREG32(mmNIC0_TXE0_WQE_FETCH_AXI_PROT, 0x80);
+
+	/* QPC Configuration */
+	NIC_WREG32(mmNIC0_QPC0_REQ_BASE_ADDRESS_49_18,
+			(req_qpc_base_addr >> 18) & 0xFFFFFFFF);
+	NIC_WREG32(mmNIC0_QPC0_REQ_BASE_ADDRESS_17_7,
+			(req_qpc_base_addr >> 7) & 0x7FF);
+	NIC_WREG32(mmNIC0_QPC0_RES_BASE_ADDRESS_49_18,
+			(res_qpc_base_addr >> 18) & 0xFFFFFFFF);
+	NIC_WREG32(mmNIC0_QPC0_RES_BASE_ADDRESS_17_7,
+			(res_qpc_base_addr >> 7) & 0x7FF);
+	NIC_WREG32(mmNIC0_QPC0_RES_QPC_CACHE_INVALIDATE, 1);
+	NIC_WREG32(mmNIC0_QPC0_REQ_QPC_CACHE_INVALIDATE, 1);
+	NIC_WREG32(mmNIC0_QPC0_RES_QPC_CACHE_INVALIDATE, 0);
+	NIC_WREG32(mmNIC0_QPC0_REQ_QPC_CACHE_INVALIDATE, 0);
+	NIC_WREG32(mmNIC0_QPC0_INTERRUPT_BASE_4, gaudi_nic->rx_msi_addr);
+	NIC_WREG32(mmNIC0_QPC0_INTERRUPT_DATA_4, 1);
+	NIC_WREG32(mmNIC0_QPC0_RES_RING0_CFG, RAW_QPN);
+	/* Interrupt each packet */
+	NIC_WREG32(mmNIC0_QPC0_INTERRUPT_CFG, 0x1FF);
+	NIC_WREG32(mmNIC0_QPC0_INTERRUPT_CAUSE, 0);
+	/* enable only the QP error interrupt, other interrupts are unused */
+	NIC_WREG32(mmNIC0_QPC0_INTERRUPT_MASK, 0x110);
+	NIC_WREG32(mmNIC0_QPC0_AXI_PROT, 0); /* secured */
+
+	NIC_WREG32(mmNIC0_QPC0_ERR_FIFO_BASE_ADDR_49_18,
+			(gaudi_nic->qp_err_mem_dma >> 18) & 0xFFFFFFFF);
+	NIC_WREG32(mmNIC0_QPC0_ERR_FIFO_BASE_ADDR_17_7,
+			gaudi_nic->qp_err_mem_dma & 0x3FF80);
+	NIC_WREG32(mmNIC0_QPC0_ERR_FIFO_PRODUCER_INDEX, 0);
+	NIC_WREG32(mmNIC0_QPC0_ERR_FIFO_CONSUMER_INDEX, 0);
+	NIC_WREG32(mmNIC0_QPC0_ERR_FIFO_WRITE_INDEX, 0);
+	NIC_WREG32(mmNIC0_QPC0_ERR_FIFO_MASK, QP_ERR_BUF_SIZE - 1);
+	/* The error FIFO is unmapped, hence the bypass */
+	NIC_WREG32(mmNIC0_QPC0_AXI_USER, 0x400);
+	NIC_WREG32(mmNIC0_QPC0_RETRY_COUNT_MAX, 0xFEFE);
+
+	/*
+	 * Generate wire interrupt in case of a QP error.
+	 * CPU-CP converts it to event.
+	 */
+	NIC_WREG32(mmNIC0_QPC0_INTERRUPT_EN,
+		1 << NIC0_QPC0_INTERRUPT_EN_INTERRUPT8_WIRE_EN_SHIFT);
+
+	/* RXE Configuration */
+	rx_mem_addr_lo = lower_32_bits(gaudi_nic->rx_mem_dma);
+	/* discard packets above the max size */
+	rx_mem_addr_hi = (upper_32_bits(gaudi_nic->rx_mem_dma) <<
+			NIC0_RXE0_RAW_BASE_HI_P1_RAW_BASE_ADDR_HI_P1_SHIFT) |
+		(ilog2(NIC_MAX_PKT_SIZE) <<
+			NIC0_RXE0_RAW_BASE_HI_P1_LOG_RAW_ENTRY_SIZE_P1_SHIFT);
+
+	NIC_WREG32(mmNIC0_RXE0_ARUSER_HBW_10_0, 1);
+	NIC_WREG32(mmNIC0_RXE0_ARUSER_HBW_31_11, 0);
+
+	/* Make sure LBW write access (for SM) can never be privileged */
+	NIC_WREG32(mmNIC0_RXE0_AWPROT_LBW, 0x2);
+
+	/* Make sure HBW read access (for WQE) is always unsecured */
+	NIC_WREG32(mmNIC0_RXE0_ARPROT_HBW, 0x222);
+
+	NIC_WREG32(mmNIC0_RXE0_RAW_QPN_P0_0, RAW_QPN);
+	NIC_WREG32(mmNIC0_RXE0_RAW_QPN_P0_1, RAW_QPN);
+	NIC_WREG32(mmNIC0_RXE0_RAW_QPN_P1_0, RAW_QPN);
+	NIC_WREG32(mmNIC0_RXE0_RAW_QPN_P1_1, RAW_QPN);
+	NIC_WREG32(mmNIC0_RXE0_RAW_QPN_P2_0, RAW_QPN);
+	NIC_WREG32(mmNIC0_RXE0_RAW_QPN_P2_1, RAW_QPN);
+	NIC_WREG32(mmNIC0_RXE0_RAW_QPN_P3_0, RAW_QPN);
+	NIC_WREG32(mmNIC0_RXE0_RAW_QPN_P3_1, RAW_QPN);
+	NIC_WREG32(mmNIC0_RXE0_RAW_BASE_LO_P0_0, rx_mem_addr_lo);
+	NIC_WREG32(mmNIC0_RXE0_RAW_BASE_LO_P0_1, rx_mem_addr_lo);
+	NIC_WREG32(mmNIC0_RXE0_RAW_BASE_HI_P0_0, rx_mem_addr_hi);
+	NIC_WREG32(mmNIC0_RXE0_RAW_BASE_HI_P0_1, rx_mem_addr_hi);
+
+	NIC_WREG32(mmNIC0_RXE0_RAW_BASE_LO_P1_0, rx_mem_addr_lo);
+	NIC_WREG32(mmNIC0_RXE0_RAW_BASE_LO_P1_1, rx_mem_addr_lo);
+	NIC_WREG32(mmNIC0_RXE0_RAW_BASE_HI_P1_0, rx_mem_addr_hi);
+	NIC_WREG32(mmNIC0_RXE0_RAW_BASE_HI_P1_1, rx_mem_addr_hi);
+
+	NIC_WREG32(mmNIC0_RXE0_RAW_BASE_LO_P2_0, rx_mem_addr_lo);
+	NIC_WREG32(mmNIC0_RXE0_RAW_BASE_LO_P2_1, rx_mem_addr_lo);
+	NIC_WREG32(mmNIC0_RXE0_RAW_BASE_HI_P2_0, rx_mem_addr_hi);
+	NIC_WREG32(mmNIC0_RXE0_RAW_BASE_HI_P2_1, rx_mem_addr_hi);
+
+	NIC_WREG32(mmNIC0_RXE0_RAW_BASE_LO_P3_0, rx_mem_addr_lo);
+	NIC_WREG32(mmNIC0_RXE0_RAW_BASE_LO_P3_1, rx_mem_addr_lo);
+	NIC_WREG32(mmNIC0_RXE0_RAW_BASE_HI_P3_0, rx_mem_addr_hi);
+	NIC_WREG32(mmNIC0_RXE0_RAW_BASE_HI_P3_1, rx_mem_addr_hi);
+
+	/*
+	 * See the comment for mmNIC0_TXE0_SQ_BASE_ADDRESS_SEL. The same applies
+	 * for the Rx.
+	 */
+	NIC_WREG32(mmNIC0_RXE0_WQ_BASE_WINDOW_SEL, ffs(NIC_HW_MAX_QP_NUM) - 1);
+
+	NIC_WREG32(mmNIC0_RXE0_PKT_DROP,
+			(0 << NIC0_RXE0_PKT_DROP_ERR_QP_INVALID_SHIFT) |
+			(1 << NIC0_RXE0_PKT_DROP_ERR_TS_MISMATCH_SHIFT) |
+			(0 << NIC0_RXE0_PKT_DROP_ERR_CS_INVALID_SHIFT) |
+			(0 << NIC0_RXE0_PKT_DROP_ERR_REQ_PSN_INVALID_SHIFT) |
+			(1 << NIC0_RXE0_PKT_DROP_ERR_RES_RKEY_INVALID_SHIFT) |
+			(0 << NIC0_RXE0_PKT_DROP_ERR_RES_RESYNC_INVALID_SHIFT) |
+			/* H/W WA for check priority order */
+			(0 << NIC0_RXE0_PKT_DROP_ERR_INV_OPCODE_SHIFT) |
+			(0 << NIC0_RXE0_PKT_DROP_ERR_INV_SYNDROME_SHIFT) |
+			(0 << NIC0_RXE0_PKT_DROP_ERR_INV_RAW_SIZE_SHIFT));
+
+	/* CQ */
+	NIC_WREG32(mmNIC0_RXE0_CQ_BASE_ADDR_31_7, cq_mem_addr &
+					NIC0_RXE0_CQ_BASE_ADDR_31_7_R_MASK);
+	NIC_WREG32(mmNIC0_RXE0_CA_BASE_ADDR_49_32, cq_mem_addr >> 32);
+	NIC_WREG32(mmNIC0_RXE0_CQ_WRITE_INDEX, 0);
+	NIC_WREG32(mmNIC0_RXE0_CQ_PRODUCER_INDEX, 0);
+	NIC_WREG32(mmNIC0_RXE0_CQ_CONSUMER_INDEX, 0);
+	NIC_WREG32(mmNIC0_RXE0_CQ_CFG0,
+			(1 << NIC0_RXE0_CQ_CFG0_ENABLE_SHIFT) |
+			(1 << NIC0_RXE0_CQ_CFG0_INTERRUPT_MASK_SHIFT) |
+			(8 << NIC0_RXE0_CQ_CFG0_CREDIT_SHIFT) |
+			(1 << NIC0_RXE0_CQ_CFG0_WRAPAROUND_EN_SHIFT) |
+			(1 << NIC0_RXE0_CQ_CFG0_SOB_CQ_MUTEX_SHIFT) |
+			(24 << NIC0_RXE0_CQ_CFG0_CQ_SELECT_SHIFT));
+	NIC_WREG32(mmNIC0_RXE0_CQ_MASK, CQ_PORT_BUF_LEN - 1);
+	/* CQ overrun interrupt only */
+	NIC_WREG32(mmNIC0_RXE0_CQ_MSI_ADDR_1, cq_msi_addr);
+	NIC_WREG32(mmNIC0_RXE0_CQ_MSI_DATA_1, 1);
+	NIC_WREG32(mmNIC0_RXE0_MSI_CASUE_MASK, 2);
+	NIC_WREG32(mmNIC0_RXE0_MSI_CAUSE, 0);
+
+	/*
+	 * Due to a H/W bug, odd ports cannot generate MSI interrupts.
+	 * Hence they generate wire interrupts and CPU-CP converts them to MSI
+	 * interrupts.
+	 */
+	if (!hdev->nic_rx_poll && (port & 1))
+		NIC_RMWREG32(mmNIC0_QPC0_INTERRUPT_EN, 1,
+			NIC0_QPC0_INTERRUPT_EN_INTERRUPT4_WIRE_EN_MASK);
+	else
+		NIC_RMWREG32(mmNIC0_QPC0_INTERRUPT_EN, 1,
+			NIC0_QPC0_INTERRUPT_EN_INTERRUPT4_MSI_EN_MASK);
+
+	/* MAC filtering */
+	if (port & 1) {
+		NIC_MACRO_WREG32(mmNIC0_RXB_TS_RC_MAC_31_0_2,
+					mac_addr & 0xFFFFFFFF);
+		NIC_MACRO_WREG32(mmNIC0_RXB_TS_RC_MAC_31_0_3,
+					mac_addr & 0xFFFFFFFF);
+
+		NIC_MACRO_WREG32(mmNIC0_RXB_TS_RC_MAC_47_32_2,
+					(mac_addr >> 32) & 0xFFFF);
+		NIC_MACRO_WREG32(mmNIC0_RXB_TS_RC_MAC_47_32_3,
+					(mac_addr >> 32) & 0xFFFF);
+	} else {
+		NIC_MACRO_WREG32(mmNIC0_RXB_TS_RC_MAC_31_0_0,
+					mac_addr & 0xFFFFFFFF);
+		NIC_MACRO_WREG32(mmNIC0_RXB_TS_RC_MAC_31_0_1,
+					mac_addr & 0xFFFFFFFF);
+
+		NIC_MACRO_WREG32(mmNIC0_RXB_TS_RC_MAC_47_32_0,
+					(mac_addr >> 32) & 0xFFFF);
+		NIC_MACRO_WREG32(mmNIC0_RXB_TS_RC_MAC_47_32_1,
+					(mac_addr >> 32) & 0xFFFF);
+	}
+
+	for (i = NIC_MAC_LANES_START ; i < NIC_MAC_NUM_OF_LANES ; i++) {
+		if (!(gaudi_nic->fw_tuning_mask & BIT(i)))
+			continue;
+
+		MAC_CFG_XPCS(0, gaudi_nic->mac_loopback ? 0xC000 : 0x8000);
+	}
+
+	gaudi_nic_set_pfc(gaudi_nic);
+}
+
+void gaudi_nic_set_pfc(struct gaudi_nic_device *gaudi_nic)
+{
+	int i;
+
+	for (i = NIC_MAC_LANES_START ; i < NIC_MAC_NUM_OF_LANES ; i++) {
+		if (!(gaudi_nic->fw_tuning_mask & BIT(i)))
+			continue;
+
+		MAC_CFG_MAC(0x8, gaudi_nic->pfc_enable ? 0x80813 : 0x2913);
+	}
+}
+
+static void config_port_mac(struct gaudi_nic_device *gaudi_nic)
+{
+	u32 port = gaudi_nic->port, speed = gaudi_nic->speed;
+	struct hl_device *hdev = gaudi_nic->hdev;
+	int i;
+
+	for (i = NIC_MAC_LANES_START ; i < NIC_MAC_NUM_OF_LANES ; i++) {
+		/* H/W WA for error length */
+		MAC_CFG_MAC(0x14, 8192);
+
+		/* Disable FC FEC */
+		MAC_CFG_MAC_CORE(0x10, 0);
+
+		MAC_CFG_MAC(0x20, 4);
+		MAC_CFG_MAC(0x1C, 4);
+
+		switch (speed) {
+		case SPEED_10000:
+			MAC_CFG_XPCS(0x8010, 3);
+			break;
+		case SPEED_25000:
+			MAC_CFG_XPCS(0x8002, 0x4FFF);
+			MAC_CFG_XPCS(0x8010, 5);
+			MAC_CFG_XPCS(0x8008, 0x68C1);
+			MAC_CFG_XPCS(0x8009, 0x21);
+			MAC_CFG_XPCS(0x800A, 0xC4F0);
+			MAC_CFG_XPCS(0x800B, 0xE6);
+			MAC_CFG_XPCS(0x800C, 0x65C5);
+			MAC_CFG_XPCS(0x800D, 0x9B);
+			MAC_CFG_XPCS(0x800E, 0x79A2);
+			MAC_CFG_XPCS(0x800F, 0x3D);
+			break;
+		case SPEED_50000:
+			MAC_CFG_XPCS(0x8002, 0x4FFF);
+			MAC_CFG_XPCS(0x8010, 0);
+			MAC_CFG_XPCS(0x8008, 0x7690);
+			MAC_CFG_XPCS(0x8009, 0x47);
+			MAC_CFG_XPCS(0x800A, 0xC4F0);
+			MAC_CFG_XPCS(0x800B, 0xE6);
+			MAC_CFG_XPCS(0x800C, 0x65C5);
+			MAC_CFG_XPCS(0x800D, 0x9B);
+			MAC_CFG_XPCS(0x800E, 0x79A2);
+			MAC_CFG_XPCS(0x800F, 0x3D);
+			break;
+		case SPEED_100000:
+			MAC_CFG_XPCS(0x8002, 0x3FFF);
+			MAC_CFG_XPCS(0x8010, 0);
+			MAC_CFG_XPCS(0x8008, 0x68C1);
+			MAC_CFG_XPCS(0x8009, 0x21);
+			MAC_CFG_XPCS(0x800A, 0x719D);
+			MAC_CFG_XPCS(0x800B, 0x8E);
+			MAC_CFG_XPCS(0x800C, 0x4B59);
+			MAC_CFG_XPCS(0x800D, 0xE8);
+			MAC_CFG_XPCS(0x800E, 0x954D);
+			MAC_CFG_XPCS(0x800F, 0x7B);
+			MAC_CFG_XPCS(0x8048, 0x07F5);
+			MAC_CFG_XPCS(0x8049, 0x09);
+			MAC_CFG_XPCS(0x804A, 0x14DD);
+			MAC_CFG_XPCS(0x804B, 0xC2);
+			MAC_CFG_XPCS(0x804C, 0x4A9A);
+			MAC_CFG_XPCS(0x804D, 0x26);
+			MAC_CFG_XPCS(0x804E, 0x457B);
+			MAC_CFG_XPCS(0x804F, 0x66);
+			MAC_CFG_XPCS(0x8050, 0x24A0);
+			MAC_CFG_XPCS(0x8051, 0x76);
+			MAC_CFG_XPCS(0x8052, 0xC968);
+			MAC_CFG_XPCS(0x8053, 0xFB);
+			MAC_CFG_XPCS(0x8054, 0x6CFD);
+			MAC_CFG_XPCS(0x8055, 0x99);
+			MAC_CFG_XPCS(0x8056, 0x91B9);
+			MAC_CFG_XPCS(0x8057, 0x55);
+			MAC_CFG_XPCS(0x8058, 0xB95C);
+			MAC_CFG_XPCS(0x8059, 0xB2);
+			MAC_CFG_XPCS(0x805A, 0xF81A);
+			MAC_CFG_XPCS(0x805B, 0xBD);
+			MAC_CFG_XPCS(0x805C, 0xC783);
+			MAC_CFG_XPCS(0x805D, 0xCA);
+			MAC_CFG_XPCS(0x805E, 0x3635);
+			MAC_CFG_XPCS(0x805F, 0xCD);
+			MAC_CFG_XPCS(0x8060, 0x31C4);
+			MAC_CFG_XPCS(0x8061, 0x4C);
+			MAC_CFG_XPCS(0x8062, 0xD6AD);
+			MAC_CFG_XPCS(0x8063, 0xB7);
+			MAC_CFG_XPCS(0x8064, 0x665F);
+			MAC_CFG_XPCS(0x8065, 0x2A);
+			MAC_CFG_XPCS(0x8066, 0xF0C0);
+			MAC_CFG_XPCS(0x8067, 0xE5);
+			break;
+		default:
+			dev_err(hdev->dev,
+				"unknown NIC port %d speed %dMb/s, cannot configure MAC XPCS\n",
+				port, speed);
+			break;
+		}
+	}
+
+	switch (speed) {
+	case SPEED_10000:
+		MAC_CFG_MAC_CORE(0, 0xF0FF00);
+		MAC_CFG_MAC_CORE(0x1C, 0);
+		MAC_CFG_MAC_CORE(0x10, 0);
+		break;
+	case SPEED_25000:
+		MAC_CFG_MAC_CORE(0, 0xF0FF00);
+		MAC_CFG_MAC_CORE(0x18, 0x60F);
+		MAC_CFG_MAC_CORE(0x1C, 0);
+		MAC_CFG_MAC_CORE(0x10, 0);
+		break;
+	case SPEED_50000:
+		MAC_CFG_MAC_CORE(0x18, 0xFF);
+		MAC_CFG_MAC_CORE(0, 0xF0FFF0);
+		MAC_CFG_MAC_CORE(0x1C, 0);
+		MAC_CFG_XPCS91(0, 0x400);
+		MAC_CFG_XPCS91(0x8, 0x400);
+		MAC_CFG_XPCS91(0x10, 0x400);
+		MAC_CFG_XPCS91(0x18, 0x400);
+		break;
+	case SPEED_100000:
+		if (gaudi_nic->nic_macro->num_of_lanes == NIC_LANES_4) {
+			MAC_CFG_MAC_CORE(0, 0xF0FF00);
+			MAC_CFG_MAC_CORE(0x18, 0x0F);
+		} else {
+			MAC_CFG_MAC_CORE(0x18, 0xFF);
+		}
+		break;
+	default:
+		dev_err(hdev->dev,
+			"unknown NIC port %d speed %dMb/s, cannot configure MAC CORE\n",
+			port, speed);
+		break;
+	}
+}
+
+static int hw_config(struct gaudi_nic_device *gaudi_nic)
+{
+	u32 port = gaudi_nic->port, data_rate, speed = gaudi_nic->speed;
+	struct gaudi_device *gaudi = gaudi_nic->hdev->asic_specific;
+	struct hl_device *hdev = gaudi_nic->hdev;
+	u64 mac_addr = 0, tmr_addr;
+	int i;
+
+	for (i = 0 ; i < ETH_ALEN ; i++) {
+		mac_addr <<= 8;
+		mac_addr |= gaudi_nic->ndev->dev_addr[i];
+	}
+
+	switch (speed) {
+	case SPEED_10000:
+		data_rate = NIC_DR_10;
+		break;
+	case SPEED_25000:
+		data_rate = NIC_DR_25;
+		break;
+	case SPEED_50000:
+		data_rate = NIC_DR_50;
+		break;
+	case SPEED_100000:
+		if (gaudi_nic->nic_macro->num_of_lanes == NIC_LANES_4)
+			data_rate = NIC_DR_25;
+		else
+			data_rate = NIC_DR_50;
+		break;
+	default:
+		data_rate = NIC_DR_50;
+		dev_err(hdev->dev,
+			"unknown NIC port %d speed, continue with 50 Gb/s\n",
+			port);
+		break;
+	}
+
+	dev_dbg(hdev->dev, "NIC port %d, speed %d data rate %d\n",
+		port, speed, data_rate);
+
+	gaudi_nic->data_rate = data_rate;
+
+	/* if macro configuration is not needed, do only port configuration */
+	if (gaudi_nic->do_macro_cfg) {
+		config_port_mac(gaudi_nic);
+		config_port_hw(gaudi_nic, mac_addr);
+	} else {
+		config_port_hw(gaudi_nic, mac_addr);
+		goto out;
+	}
+
+	/*
+	 * the following registers are shared between each pair of ports,
+	 * hence need to configure only once per NIC macro
+	 */
+	/* RXB Configuration */
+	NIC_MACRO_WREG32(mmNIC0_RXB_LBW_OFFSET_0, CFG_BASE & 0xFFFFFFFF);
+	NIC_MACRO_WREG32(mmNIC0_RXB_LBW_OFFSET_1, (CFG_BASE >> 32) & 0x3FFFF);
+	NIC_MACRO_WREG32(mmNIC0_RXB_ICRC_CFG, 0);
+	NIC_MACRO_WREG32(mmNIC0_RXB_TS_RC_MAC_31_0_MASK_0, 0xFFFFFFFF);
+	NIC_MACRO_WREG32(mmNIC0_RXB_TS_RC_MAC_31_0_MASK_1, 0xFFFFFFFF);
+	NIC_MACRO_WREG32(mmNIC0_RXB_TS_RC_MAC_31_0_MASK_2, 0xFFFFFFFF);
+	NIC_MACRO_WREG32(mmNIC0_RXB_TS_RC_MAC_31_0_MASK_3, 0xFFFFFFFF);
+	NIC_MACRO_WREG32(mmNIC0_RXB_TS_RC_MAC_47_32_MASK_0, 0xFFFF);
+	NIC_MACRO_WREG32(mmNIC0_RXB_TS_RC_MAC_47_32_MASK_1, 0xFFFF);
+	NIC_MACRO_WREG32(mmNIC0_RXB_TS_RC_MAC_47_32_MASK_2, 0xFFFF);
+	NIC_MACRO_WREG32(mmNIC0_RXB_TS_RC_MAC_47_32_MASK_3, 0xFFFF);
+	/* H/W WA for credit leakage */
+	NIC_MACRO_WREG32(mmNIC0_RXB_DROP_THRESHOLD_0, 0xB37 | (0xB37 << 13));
+	NIC_MACRO_WREG32(mmNIC0_RXB_DROP_THRESHOLD_1, 0xB37 | (0xB37 << 13));
+	NIC_MACRO_WREG32(mmNIC0_RXB_DROP_THRESHOLD_2, 0xB37 | (0xB37 << 13));
+	NIC_MACRO_WREG32(mmNIC0_RXB_DROP_THRESHOLD_3, 0xB37 | (0xB37 << 13));
+	NIC_MACRO_WREG32(mmNIC0_RXB_DROP_THRESHOLD_4, 0xB37 | (0xB37 << 13));
+	NIC_MACRO_WREG32(mmNIC0_RXB_DROP_THRESHOLD_5, 0xB37 | (0xB37 << 13));
+	NIC_MACRO_WREG32(mmNIC0_RXB_DROP_THRESHOLD_6, 0xB37 | (0xB37 << 13));
+	NIC_MACRO_WREG32(mmNIC0_RXB_DROP_THRESHOLD_7, 0xB37 | (0xB37 << 13));
+	NIC_MACRO_WREG32(mmNIC0_RXB_DROP_THRESHOLD_8, 0xB37 | (0xB37 << 13));
+	NIC_MACRO_WREG32(mmNIC0_RXB_DROP_THRESHOLD_9, 0xB37 | (0xB37 << 13));
+	NIC_MACRO_WREG32(mmNIC0_RXB_DROP_THRESHOLD_10, 0xB37 | (0xB37 << 13));
+	NIC_MACRO_WREG32(mmNIC0_RXB_DROP_THRESHOLD_11, 0xB37 | (0xB37 << 13));
+	NIC_MACRO_WREG32(mmNIC0_RXB_DROP_THRESHOLD_12, 0xB37 | (0xB37 << 13));
+	NIC_MACRO_WREG32(mmNIC0_RXB_DROP_THRESHOLD_13, 0xB37 | (0xB37 << 13));
+	NIC_MACRO_WREG32(mmNIC0_RXB_DROP_THRESHOLD_14, 0xB37 | (0xB37 << 13));
+	NIC_MACRO_WREG32(mmNIC0_RXB_DROP_THRESHOLD_15, 0xB37 | (0xB37 << 13));
+	NIC_MACRO_WREG32(mmNIC0_RXB_AXI_AXUSER_10_0_UNTRUST, 1);
+	NIC_MACRO_WREG32(mmNIC0_RXB_AXI_AXUSER_10_0_TRUST, 0x400);
+	NIC_MACRO_WREG32(mmNIC0_RXB_AXI_AXUSER_10_0_PRIV, 0x400);
+	NIC_MACRO_WREG32(mmNIC0_RXB_AXI_AXPROT_PRIV, 0);
+	NIC_MACRO_WREG32(mmNIC0_RXB_AXI_AXPROT_TRUST, 0);
+	NIC_MACRO_WREG32(mmNIC0_RXB_AXI_AXPROT_UNTRUST, 2);
+
+	/* MAC filtering */
+	NIC_MACRO_WREG32(mmNIC0_RXB_TS_RC_MAC_31_0_MASK_0, 0);
+	NIC_MACRO_WREG32(mmNIC0_RXB_TS_RC_MAC_31_0_MASK_1, 0);
+	NIC_MACRO_WREG32(mmNIC0_RXB_TS_RC_MAC_31_0_MASK_2, 0);
+	NIC_MACRO_WREG32(mmNIC0_RXB_TS_RC_MAC_31_0_MASK_3, 0);
+	NIC_MACRO_WREG32(mmNIC0_RXB_TS_RC_MAC_47_32_MASK_0, 0);
+	NIC_MACRO_WREG32(mmNIC0_RXB_TS_RC_MAC_47_32_MASK_1, 0);
+	NIC_MACRO_WREG32(mmNIC0_RXB_TS_RC_MAC_47_32_MASK_2, 0);
+	NIC_MACRO_WREG32(mmNIC0_RXB_TS_RC_MAC_47_32_MASK_3, 0);
+
+	/* Credits allocation - all dynamic */
+	/* H/W WA for credit leakage */
+	NIC_MACRO_WREG32(mmNIC0_RXB_MAX_DYNAMIC, 0xB36);
+	NIC_MACRO_WREG32(mmNIC0_RXB_MAX_STATIC_CREDITS_0, 0x41);
+	NIC_MACRO_WREG32(mmNIC0_RXB_MAX_STATIC_CREDITS_1, 0x41);
+	NIC_MACRO_WREG32(mmNIC0_RXB_MAX_STATIC_CREDITS_2, 0x41);
+	NIC_MACRO_WREG32(mmNIC0_RXB_MAX_STATIC_CREDITS_3, 0x41);
+	NIC_MACRO_WREG32(mmNIC0_RXB_MAX_STATIC_CREDITS_4, 0x41);
+	NIC_MACRO_WREG32(mmNIC0_RXB_MAX_STATIC_CREDITS_5, 0x41);
+	NIC_MACRO_WREG32(mmNIC0_RXB_MAX_STATIC_CREDITS_6, 0x41);
+	NIC_MACRO_WREG32(mmNIC0_RXB_MAX_STATIC_CREDITS_7, 0x41);
+	NIC_MACRO_WREG32(mmNIC0_RXB_MAX_STATIC_CREDITS_8, 0x41);
+	NIC_MACRO_WREG32(mmNIC0_RXB_MAX_STATIC_CREDITS_9, 0x41);
+	NIC_MACRO_WREG32(mmNIC0_RXB_MAX_STATIC_CREDITS_10, 0x41);
+	NIC_MACRO_WREG32(mmNIC0_RXB_MAX_STATIC_CREDITS_11, 0x41);
+	NIC_MACRO_WREG32(mmNIC0_RXB_MAX_STATIC_CREDITS_12, 0x41);
+	NIC_MACRO_WREG32(mmNIC0_RXB_MAX_STATIC_CREDITS_13, 0x41);
+	NIC_MACRO_WREG32(mmNIC0_RXB_MAX_STATIC_CREDITS_14, 0x41);
+	NIC_MACRO_WREG32(mmNIC0_RXB_MAX_STATIC_CREDITS_15, 0x41);
+
+	/* TMR Configuration */
+	tmr_addr = TMR_BASE_ADDR + gaudi_nic->nic_macro->idx * TMR_BASE_SIZE;
+
+	/* Clear timer FSM0 */
+	for (i = 0 ; i < NIC_HW_MAX_QP_NUM ; i++)
+		writeb(0, hdev->pcie_bar[HBM_BAR_ID] +
+			((tmr_addr + TMR_FSM0_OFFS + i) -
+				gaudi->hbm_bar_cur_addr));
+
+	/* Clear timer FSM1 */
+	for (i = 0 ; i < NIC_HW_MAX_QP_NUM ; i++)
+		writeb(0, hdev->pcie_bar[HBM_BAR_ID] +
+			((tmr_addr + TMR_FSM1_OFFS + i) -
+				gaudi->hbm_bar_cur_addr));
+
+	/* Timer free list */
+	for (i = 0 ; i < TMR_FREE_NUM_ENTRIES ; i++)
+		writel(TMR_GRANULARITY + i, hdev->pcie_bar[HBM_BAR_ID] +
+			((tmr_addr + TMR_FREE_OFFS + i * 4) -
+				gaudi->hbm_bar_cur_addr));
+
+	/* Perform read to flush the writes */
+	readq(hdev->pcie_bar[HBM_BAR_ID]);
+
+	NIC_MACRO_WREG32(mmNIC0_TMR_TMR_BASE_ADDRESS_49_18,
+				(tmr_addr + TMR_FIFO_OFFS) >> 18);
+	NIC_MACRO_WREG32(mmNIC0_TMR_TMR_BASE_ADDRESS_17_7,
+				((tmr_addr + TMR_FIFO_OFFS) >> 7) & 0x7FF);
+	NIC_MACRO_WREG32(mmNIC0_TMR_TMR_BASE_ADDRESS_FREE_LIST_49_32,
+				(tmr_addr + TMR_FREE_OFFS) >> 32);
+	NIC_MACRO_WREG32(mmNIC0_TMR_TMR_BASE_ADDRESS_FREE_LIST_31_0,
+				(tmr_addr + TMR_FREE_OFFS) & 0xFFFFFFFF);
+	NIC_MACRO_WREG32(mmNIC0_TMR_TMR_CACHE_BASE_ADDR_49_32,
+				(tmr_addr + TMR_FSM0_OFFS) >> 32);
+	NIC_MACRO_WREG32(mmNIC0_TMR_TMR_CACHE_BASE_ADDR_31_7,
+				((tmr_addr + TMR_FSM0_OFFS) >> 7) & 0xFFFFFF);
+
+	NIC_MACRO_WREG32(mmNIC0_TMR_TMR_SCHEDQ_UPDATE_DESC_31_0, 0);
+	NIC_MACRO_WREG32(mmNIC0_TMR_TMR_SCHEDQ_UPDATE_DESC_63_32, 0);
+	NIC_MACRO_WREG32(mmNIC0_TMR_TMR_SCHEDQ_UPDATE_DESC_95_64, 0);
+	NIC_MACRO_WREG32(mmNIC0_TMR_TMR_SCHEDQ_UPDATE_DESC_191_160, 1000);
+	NIC_MACRO_WREG32(mmNIC0_TMR_TMR_SCHEDQ_UPDATE_DESC_216_192, 0);
+
+	for (i = 0 ; i < TMR_GRANULARITY ; i++) {
+		NIC_MACRO_WREG32(mmNIC0_TMR_TMR_SCHEDQ_UPDATE_DESC_127_96, i);
+		NIC_MACRO_WREG32(mmNIC0_TMR_TMR_SCHEDQ_UPDATE_DESC_159_128, i);
+		NIC_MACRO_WREG32(mmNIC0_TMR_TMR_SCHEDQ_UPDATE_FIFO, i);
+		NIC_MACRO_WREG32(mmNIC0_TMR_TMR_SCHEDQ_UPDATE_EN, 1);
+	}
+
+	NIC_MACRO_WREG32(mmNIC0_TMR_TMR_SCAN_TIMER_COMP_31_0, 10);
+	NIC_MACRO_WREG32(mmNIC0_TMR_TMR_TICK_WRAP, 500);
+	NIC_MACRO_WREG32(mmNIC0_TMR_TMR_LIST_MASK,
+			~(0xFFFFFFFF << (ilog2(TMR_FREE_NUM_ENTRIES) - 5)));
+	NIC_MACRO_WREG32(mmNIC0_TMR_TMR_PRODUCER_UPDATE, TMR_FREE_NUM_ENTRIES);
+	NIC_MACRO_WREG32(mmNIC0_TMR_TMR_PRODUCER_UPDATE_EN, 1);
+	NIC_MACRO_WREG32(mmNIC0_TMR_TMR_PRODUCER_UPDATE_EN, 0);
+	NIC_MACRO_WREG32(mmNIC0_TMR_TMR_LIST_MEM_READ_MASK, 0);
+	NIC_MACRO_WREG32(mmNIC0_TMR_TMR_PUSH_LOCK_EN, 1);
+	NIC_MACRO_WREG32(mmNIC0_TMR_TMR_TIMER_EN, 1);
+	NIC_MACRO_WREG32(mmNIC0_TMR_FREE_LIST_PUSH_MASK_EN, 0);
+
+out:
+	/* Perform read from the device to flush all configurations */
+	NIC_MACRO_RREG32(mmNIC0_TMR_TMR_TIMER_EN);
+
+	return 0;
+}
+
+static bool write_pkt_to_hw(struct gaudi_nic_device *gaudi_nic, u64 *data,
+				u64 size)
+{
+	u32 port = gaudi_nic->port, pi = gaudi_nic->tx_pi, diff, new_pi,
+		ci = gaudi_nic->tx_ci;
+	struct gaudi_device *gaudi = gaudi_nic->hdev->asic_specific;
+	u64 swq_addr, sb_base_address, swq_base_addr;
+	struct hl_device *hdev = gaudi_nic->hdev;
+	struct sq_wqe swq;
+	u64 *swq_p;
+	int i;
+
+	swq_p = (u64 *) &swq;
+
+	if (pi >= ci)
+		diff = pi - ci;
+	else
+		diff = WQ_BUFFER_SIZE - ci + pi;
+
+	/* update CI once in a while */
+	if (diff > (WQ_BUFFER_SIZE >> 1))
+		gaudi_nic->tx_ci = ci = NIC_RREG32(mmNIC0_QPC0_REQ_RING0_CI);
+
+	new_pi = (pi + 1) & (WQ_BUFFER_SIZE - 1);
+	if (new_pi == ci)
+		return false;
+
+	gaudi_nic->tx_pi = new_pi;
+
+	sb_base_address = (SB_BASE_ADDR + port * SB_BASE_SIZE) +
+				pi * NIC_MAX_PKT_SIZE;
+	swq_base_addr = SWQ_BASE_ADDR + port * SWQ_BASE_SIZE;
+
+	/* Create SWQ */
+	memset(&swq, 0, sizeof(swq));
+	CFG_SQ_WQE_OPCODE(swq, WQE_LINEAR);
+	CFG_SQ_WQE_LOCAL_ADDRESS_31_0(swq, sb_base_address & 0xFFFFFFFF);
+	CFG_SQ_WQE_LOCAL_ADDRESS_49_32(swq, (sb_base_address >> 32) & 0x3FFFF);
+	CFG_SQ_WQE_SIZE(swq, size);
+
+	/* Copy packet to SB */
+	for (i = 0 ; i < size ; i++)
+		writeq(data[i], hdev->pcie_bar[HBM_BAR_ID] +
+			((sb_base_address + i * 8) - gaudi->hbm_bar_cur_addr));
+
+	/* Copy WQE to SWQ Buffer */
+	for (i = 0 ; i < (sizeof(swq) / sizeof(u64)) ; i++) {
+		swq_addr = swq_base_addr +
+				(pi * sizeof(struct sq_wqe) + i * 8);
+		writeq(swq_p[i], hdev->pcie_bar[HBM_BAR_ID] +
+				(swq_addr - gaudi->hbm_bar_cur_addr));
+	}
+
+	readq(hdev->pcie_bar[HBM_BAR_ID]);
+
+	/* Make sure we ring the doorbell after the data copying */
+	mb();
+
+	/* Doorbell push */
+	NIC_WREG32(mmNIC0_QPC0_SECURED_DOORBELL_PI, new_pi);
+	NIC_WREG32(mmNIC0_QPC0_SECURED_DOORBELL_QPN, 0x80000000 | RAW_QPN);
+
+	return true;
+}
+
+static enum eth_pkt_status get_pkt_from_hw(struct gaudi_nic_device *gaudi_nic,
+						u64 *ppkt_addr, u32 *ppkt_size,
+						u32 *pi)
+{
+	u64 pkt_addr, mem_addr = (u64) (uintptr_t) gaudi_nic->rx_mem_cpu;
+	u32 ci = gaudi_nic->rx_ci, ether_type, tpid, ipv4_len, ipv6_len,
+		pkt_size, hdr_size = ETH_HLEN, port = gaudi_nic->port;
+	enum eth_pkt_status pkt_status = ETH_PKT_OK;
+	struct hl_device *hdev = gaudi_nic->hdev;
+	bool vlan_double_tag = false;
+	__be32 *data;
+	int idx;
+
+	/*
+	 * check if packet is available by reading the PI, but do it only if
+	 * needed as it is expensive
+	 */
+	if (*pi == ci) {
+		*pi = NIC_RREG32(mmNIC0_QPC0_RES_RING0_PI) & (NIC_RX_SIZE - 1);
+		if (*pi == ci)
+			return ETH_PKT_NONE;
+	}
+
+	pkt_addr = mem_addr + ci * NIC_MAX_PKT_SIZE;
+	data = (__be32 *) pkt_addr;
+
+	/* skip MAC header */
+	idx = (ETH_ALEN * 2) / 4;
+
+	/* handle VLAN tagging */
+	tpid = ntohl(data[idx++]) >> 16;
+	if (tpid == ETH_P_8021AD) {
+		/* skip VLAN double tagging */
+		tpid = ntohl(data[idx++]) >> 16;
+		vlan_double_tag = true;
+		hdr_size += 4;
+	}
+
+	if (tpid == ETH_P_8021Q) {
+		/* skip VLAN tagging */
+		ether_type = ntohl(data[idx++]) >> 16;
+		hdr_size += 4;
+	} else if (vlan_double_tag) {
+		dev_dbg_ratelimited(hdev->dev,
+					"Wrong VLAN TPID double tagging 0x%x\n",
+					tpid);
+		ether_type = UINT_MAX;
+	} else {
+		ether_type = tpid;
+	}
+
+	if (ether_type <= ETH_DATA_LEN) {
+		pkt_size = ether_type;
+	} else if (ether_type == ETH_P_ARP) {
+		pkt_size = hdr_size + NIC_ARP_PKT_SIZE;
+	} else if (ether_type == ETH_P_IP) {
+		ipv4_len = ntohl(data[idx]) >> 16;
+		pkt_size = hdr_size + ipv4_len;
+	} else if (ether_type == ETH_P_IPV6) {
+		ipv6_len = ntohl(data[idx]) & 0xFFFF;
+		pkt_size = hdr_size + ipv6_len + sizeof(struct ipv6hdr);
+	} else if ((ether_type == ETH_P_LLDP) ||
+			(ether_type == ETH_P_LOOPBACK)) {
+		pkt_size = hdr_size + ETH_DATA_LEN;
+	} else {
+		dev_dbg_ratelimited(hdev->dev,
+					"error, unsupported EtherType 0x%x, port %d\n",
+					ether_type, port);
+		pkt_status = ETH_PKT_DROP;
+		goto out;
+	}
+
+	if (pkt_size > NIC_MAX_PKT_SIZE) {
+		dev_dbg_ratelimited(hdev->dev,
+				"error, packet size %uB exceeds maximum of %uB, port %d\n",
+				pkt_size, NIC_MAX_PKT_SIZE, port);
+		pkt_status = ETH_PKT_DROP;
+		goto out;
+	}
+
+#if HL_NIC_DEBUG
+	dev_dbg_ratelimited(hdev->dev,
+				"port %d packet_size %d ether_type 0x%x\n",
+				gaudi_nic->port, pkt_size,
+				ether_type);
+#endif
+
+	*ppkt_addr = pkt_addr;
+	*ppkt_size = pkt_size;
+out:
+	gaudi_nic->rx_ci = (ci + 1) & (NIC_RX_SIZE - 1);
+
+	return pkt_status;
+}
+
+static int gaudi_nic_handle_rx_pkt(struct gaudi_nic_device *gaudi_nic,
+					int budget, u32 *last_pi)
+{
+	struct net_device_stats *stats = &gaudi_nic->ndev->stats;
+	struct hl_device *hdev = gaudi_nic->hdev;
+	u32 pkt_size, pi = gaudi_nic->rx_ci;
+	enum eth_pkt_status pkt_status;
+	int rc, pkt_count = 0;
+	struct sk_buff *skb;
+	u64 pkt_address;
+
+	if (!gaudi_nic->active)
+		return 0;
+
+	while (1) {
+		if (pkt_count >= budget || disabled_or_in_reset(gaudi_nic))
+			break;
+
+		pkt_status = get_pkt_from_hw(gaudi_nic, &pkt_address, &pkt_size,
+						&pi);
+
+		if (pkt_status == ETH_PKT_NONE)
+			break;
+
+		pkt_count++;
+
+		if (pkt_status == ETH_PKT_DROP) {
+			stats->rx_dropped++;
+			continue;
+		}
+
+		if (hdev->nic_rx_poll)
+			skb = netdev_alloc_skb_ip_align(gaudi_nic->ndev,
+							pkt_size);
+		else
+			skb = napi_alloc_skb(&gaudi_nic->napi, pkt_size);
+
+		if (!skb)
+			break;
+
+		skb_copy_to_linear_data(skb, (void *) pkt_address, pkt_size);
+		skb_put(skb, pkt_size);
+		skb->protocol = eth_type_trans(skb, gaudi_nic->ndev);
+
+#if HL_NIC_DEBUG
+		dev_dbg_ratelimited(hdev->dev,
+					"port: %d, addr: 0x%llx, size: %d, rx_ci: %d\n",
+					gaudi_nic->port, pkt_address, pkt_size,
+					gaudi_nic->rx_ci);
+#endif
+
+		rc = netif_receive_skb(skb);
+		if (rc == NET_RX_SUCCESS) {
+			stats->rx_packets++;
+			stats->rx_bytes += pkt_size;
+		} else {
+			stats->rx_dropped++;
+		}
+	}
+
+	*last_pi = pi;
+
+	return pkt_count;
+}
+
+static void rx_pkt_poll(struct work_struct *work)
+{
+	struct gaudi_nic_device *gaudi_nic = container_of(work,
+							struct gaudi_nic_device,
+							rx_poll_work.work);
+	u32 ignore;
+
+	gaudi_nic_handle_rx_pkt(gaudi_nic, NIC_NAPI_MAX_RX_BUDGET, &ignore);
+	schedule_delayed_work(&gaudi_nic->rx_poll_work, msecs_to_jiffies(1));
+}
+
+static void gaudi_nic_reenable_rx_irq(struct gaudi_nic_device *gaudi_nic,
+								u32 last_pi)
+{
+	struct hl_device *hdev = gaudi_nic->hdev;
+	u32 new_pi;
+
+	NIC_WREG32(mmNIC0_QPC0_INTERRUPT_CLR, 0xFFFF);
+
+	if (gaudi_nic->active) {
+		/*
+		 * Packets can still arrive while the IRQ is disabled. Hence,
+		 * if the PI has changed since we finished handling the Rx
+		 * ring, there are more packets to process, so we generate an
+		 * IRQ to handle them.
+		 */
+		new_pi = NIC_RREG32(mmNIC0_QPC0_RES_RING0_PI) &
+				(NIC_RX_SIZE - 1);
+		if (last_pi != new_pi)
+			WREG32(gaudi_nic->rx_msi_addr, 1);
+	}
+}
+
+static int napi_clean(struct napi_struct *napi, int budget)
+{
+	struct gaudi_nic_device *gaudi_nic =
+			container_of(napi, struct gaudi_nic_device, napi);
+	int work_done;
+	u32 last_pi;
+
+	work_done = gaudi_nic_handle_rx_pkt(gaudi_nic, budget, &last_pi);
+
+	/* If budget not fully consumed, exit the polling mode */
+	if (work_done < budget) {
+		napi_complete_done(napi, work_done);
+		gaudi_nic_reenable_rx_irq(gaudi_nic, last_pi);
+	}
+
+	return work_done;
+}
+
+irqreturn_t gaudi_nic_rx_irq_handler(int irq, void *arg)
+{
+	struct gaudi_nic_device *gaudi_nic = arg;
+	struct hl_device *hdev = gaudi_nic->hdev;
+	struct gaudi_device *gaudi;
+
+	gaudi = gaudi_nic->hdev->asic_specific;
+
+	if (!hdev->nic_rx_poll)
+		gaudi->nic_handle_rx(gaudi_nic);
+
+	return IRQ_HANDLED;
+}
+
+static void set_port_status(struct gaudi_nic_device *gaudi_nic, bool active)
+{
+	if (gaudi_nic->active == active)
+		return;
+
+	if (active) {
+		netif_wake_queue(gaudi_nic->ndev);
+		netif_start_queue(gaudi_nic->ndev);
+		netif_carrier_on(gaudi_nic->ndev);
+		gaudi_nic->active = true;
+	} else {
+		netif_stop_queue(gaudi_nic->ndev);
+		netif_carrier_off(gaudi_nic->ndev);
+		gaudi_nic->active = false;
+	}
+}
+
+static void port_reset_state(struct gaudi_nic_device *gaudi_nic)
+{
+	kfifo_reset(&gaudi_nic->pcs_fail_fifo);
+	gaudi_nic->pcs_link = false;
+	gaudi_nic->auto_neg_resolved = false;
+	gaudi_nic->phy_fw_tuned = false;
+	gaudi_nic->retry_cnt = 0;
+	gaudi_nic->pcs_fail_cnt = 0;
+	gaudi_nic->pcs_local_fault_cnt = 0;
+	gaudi_nic->pcs_remote_fault_cnt = 0;
+	gaudi_nic->correctable_errors_cnt = 0;
+	gaudi_nic->uncorrectable_errors_cnt = 0;
+}
+
+static int _gaudi_nic_sw_init(struct gaudi_nic_device *gaudi_nic)
+{
+	struct hl_device *hdev = gaudi_nic->hdev;
+	u32 port = gaudi_nic->port;
+	int rc;
+
+	gaudi_nic->rx_mem_size = NIC_RX_SIZE * NIC_MAX_PKT_SIZE;
+
+	gaudi_nic->rx_mem_cpu = hdev->asic_funcs->asic_dma_alloc_coherent(hdev,
+							gaudi_nic->rx_mem_size,
+							&gaudi_nic->rx_mem_dma,
+							GFP_KERNEL);
+	if (!gaudi_nic->rx_mem_cpu) {
+		dev_err(hdev->dev, "Failed to allocate Rx memory, port: %d\n",
+			port);
+		return -ENOMEM;
+	}
+
+	gaudi_nic->cq_mem_size = CQ_PORT_BUF_SIZE;
+
+	if (!IS_ALIGNED(gaudi_nic->cq_mem_size, PAGE_SIZE_4KB)) {
+		dev_err(hdev->dev,
+			"NIC CQ port buffer size should be aligned to 4KB, port: %d\n",
+			port);
+		rc = -EFAULT;
+		goto free_rx;
+	}
+
+	gaudi_nic->cq_mem_cpu = hdev->asic_funcs->asic_dma_alloc_coherent(hdev,
+							gaudi_nic->cq_mem_size,
+							&gaudi_nic->cq_mem_dma,
+							GFP_KERNEL);
+	if (!gaudi_nic->cq_mem_cpu) {
+		dev_err(hdev->dev, "Failed to allocate CQ memory, port: %d\n",
+			port);
+		rc = -ENOMEM;
+		goto free_rx;
+	}
+
+	gaudi_nic->qp_err_mem_size = QP_ERR_BUF_SIZE;
+
+	gaudi_nic->qp_err_mem_cpu = hdev->asic_funcs->asic_dma_alloc_coherent(
+						hdev,
+						gaudi_nic->qp_err_mem_size,
+						&gaudi_nic->qp_err_mem_dma,
+						GFP_KERNEL);
+	if (!gaudi_nic->qp_err_mem_cpu) {
+		dev_err(hdev->dev,
+			"Failed to allocate QP error memory, port: %d\n",
+			port);
+		rc = -ENOMEM;
+		goto free_cq;
+	}
+
+	mutex_init(&gaudi_nic->user_wq_lock);
+
+	mutex_init(&gaudi_nic->idr_lock);
+	idr_init(&gaudi_nic->qp_ids);
+
+	return 0;
+
+free_cq:
+	hdev->asic_funcs->asic_dma_free_coherent(hdev, gaudi_nic->cq_mem_size,
+							gaudi_nic->cq_mem_cpu,
+							gaudi_nic->cq_mem_dma);
+free_rx:
+	hdev->asic_funcs->asic_dma_free_coherent(hdev, gaudi_nic->rx_mem_size,
+							gaudi_nic->rx_mem_cpu,
+							gaudi_nic->rx_mem_dma);
+
+	return rc;
+}
+
+static void _gaudi_nic_sw_fini(struct gaudi_nic_device *gaudi_nic)
+{
+	struct hl_device *hdev = gaudi_nic->hdev;
+
+	idr_destroy(&gaudi_nic->qp_ids);
+	mutex_destroy(&gaudi_nic->idr_lock);
+
+	mutex_destroy(&gaudi_nic->user_wq_lock);
+
+	hdev->asic_funcs->asic_dma_free_coherent(hdev,
+						gaudi_nic->qp_err_mem_size,
+						gaudi_nic->qp_err_mem_cpu,
+						gaudi_nic->qp_err_mem_dma);
+
+	hdev->asic_funcs->asic_dma_free_coherent(hdev, gaudi_nic->cq_mem_size,
+							gaudi_nic->cq_mem_cpu,
+							gaudi_nic->cq_mem_dma);
+
+	hdev->asic_funcs->asic_dma_free_coherent(hdev, gaudi_nic->rx_mem_size,
+							gaudi_nic->rx_mem_cpu,
+							gaudi_nic->rx_mem_dma);
+}
+
+int gaudi_nic_sw_init(struct hl_device *hdev)
+{
+	struct gaudi_device *gaudi = hdev->asic_specific;
+	struct gaudi_nic_device *gaudi_nic;
+	int rc, i, init_cnt = 0;
+
+	/* At this stage, we don't know how many links we have, so we must
+	 * allocate for the maximum number of links (and also free all of them
+	 * in sw_fini)
+	 */
+	for (i = 0 ; i < NIC_NUMBER_OF_PORTS ; i++, init_cnt++) {
+		gaudi_nic = &gaudi->nic_devices[i];
+		gaudi_nic->hdev = hdev;
+		gaudi_nic->port = i;
+
+		rc = _gaudi_nic_sw_init(gaudi_nic);
+		if (rc) {
+			dev_err(hdev->dev,
+				"NIC S/W init failed, port: %d, rc: %d\n", i,
+				rc);
+			goto err;
+		}
+	}
+
+	mutex_init(&gaudi->nic_cq_user_lock);
+	mutex_init(&gaudi->nic_qp_err_lock);
+
+	return 0;
+
+err:
+	for (i = 0 ; i < init_cnt ; i++)
+		_gaudi_nic_sw_fini(&gaudi->nic_devices[i]);
+
+	return rc;
+}
+
+void gaudi_nic_sw_fini(struct hl_device *hdev)
+{
+	struct gaudi_device *gaudi = hdev->asic_specific;
+	int i;
+
+	mutex_destroy(&gaudi->nic_qp_err_lock);
+	mutex_destroy(&gaudi->nic_cq_user_lock);
+
+	for (i = 0 ; i < NIC_NUMBER_OF_PORTS ; i++)
+		_gaudi_nic_sw_fini(&gaudi->nic_devices[i]);
+}
+
+
+/* used for physically contiguous memory only */
+static int map_nic_mem(struct hl_device *hdev, u64 va, dma_addr_t pa, u32 size)
+{
+	struct gaudi_device *gaudi = hdev->asic_specific;
+	struct hl_ctx *ctx = hdev->kernel_ctx;
+	s64 off;
+	int rc;
+
+	if (!(gaudi->hw_cap_initialized & HW_CAP_MMU))
+		return 0;
+
+	mutex_lock(&ctx->mmu_lock);
+
+	for (off = 0 ; off < size ; off += PAGE_SIZE_4KB) {
+		rc = hl_mmu_map(ctx, va + off, pa + off, PAGE_SIZE_4KB,
+				(off + PAGE_SIZE_4KB) >= size);
+		if (rc) {
+			dev_err(hdev->dev,
+				"Map failed for va 0x%llx to pa 0x%llx\n",
+				va + off, pa + off);
+			goto unmap;
+		}
+	}
+
+	hdev->asic_funcs->mmu_invalidate_cache(hdev, false, 0);
+
+	mutex_unlock(&ctx->mmu_lock);
+
+	return 0;
+
+unmap:
+	for (off -= PAGE_SIZE_4KB ; off >= 0 ; off -= PAGE_SIZE_4KB)
+		if (hl_mmu_unmap(ctx, va + off, PAGE_SIZE_4KB,
+					(off - (s32) PAGE_SIZE_4KB) < 0))
+			dev_warn_ratelimited(hdev->dev,
+					"failed to unmap va 0x%llx\n",
+					va + off);
+
+	hdev->asic_funcs->mmu_invalidate_cache(hdev, true, 0);
+
+	mutex_unlock(&ctx->mmu_lock);
+
+	return rc;
+}
+
+static void unmap_nic_mem(struct hl_device *hdev, u64 va, u32 size)
+{
+	struct gaudi_device *gaudi = hdev->asic_specific;
+	struct hl_ctx *ctx = hdev->kernel_ctx;
+	s64 off;
+
+	if (!(gaudi->hw_cap_initialized & HW_CAP_MMU))
+		return;
+
+	mutex_lock(&ctx->mmu_lock);
+
+	for (off = 0 ; off < size ; off += PAGE_SIZE_4KB)
+		if (hl_mmu_unmap(ctx, va + off, PAGE_SIZE_4KB,
+				       (off + PAGE_SIZE_4KB) >= size))
+			dev_warn_ratelimited(hdev->dev,
+					"Failed to unmap va 0x%llx\n",
+					va + off);
+
+	hdev->asic_funcs->mmu_invalidate_cache(hdev, true, 0);
+
+	mutex_unlock(&ctx->mmu_lock);
+}
+
+static int map_cq_mem(struct gaudi_nic_device *gaudi_nic)
+{
+	struct gaudi_device *gaudi = gaudi_nic->hdev->asic_specific;
+	struct hl_device *hdev = gaudi_nic->hdev;
+
+	if (!(gaudi->hw_cap_initialized & HW_CAP_MMU)) {
+		gaudi_nic->cq_mem_device_va = gaudi_nic->cq_mem_dma;
+		return 0;
+	}
+
+	gaudi_nic->cq_mem_device_va = CQ_VIRTUAL_ADDRESS +
+				gaudi_nic->port * gaudi_nic->cq_mem_size;
+
+	return map_nic_mem(hdev, gaudi_nic->cq_mem_device_va,
+				gaudi_nic->cq_mem_dma, gaudi_nic->cq_mem_size);
+}
+
+static void unmap_cq_mem(struct gaudi_nic_device *gaudi_nic)
+{
+	struct gaudi_device *gaudi = gaudi_nic->hdev->asic_specific;
+	struct hl_device *hdev = gaudi_nic->hdev;
+
+	if (!(gaudi->hw_cap_initialized & HW_CAP_MMU))
+		return;
+
+	unmap_nic_mem(hdev, gaudi_nic->cq_mem_device_va,
+			gaudi_nic->cq_mem_size);
+}
+
+static void mac_channels_init(struct gaudi_nic_device *gaudi_nic)
+{
+	struct gaudi_nic_macro *nic_macro = gaudi_nic->nic_macro;
+	u32 port = gaudi_nic->port;
+
+	if (gaudi_nic->auto_neg_enable) {
+		if (gaudi_nic->speed == SPEED_100000) {
+			if (nic_macro->num_of_lanes == NIC_LANES_4) {
+				gaudi_nic->power_up_mask = 0x1;
+				gaudi_nic->fw_tuning_mask = 0xF;
+			} else {
+				gaudi_nic->power_up_mask =
+							(port & 1) ? 0xC : 0x3;
+				gaudi_nic->fw_tuning_mask =
+							(port & 1) ? 0xC : 0x3;
+				gaudi_nic->auto_neg_mask =
+							(port & 1) ? 0x4 : 0x1;
+			}
+		} else {
+			gaudi_nic->fw_tuning_mask = gaudi_nic->power_up_mask =
+				(port & 1) ? 0xC : 0x3;
+		}
+	} else {
+		if (nic_macro->num_of_lanes == NIC_LANES_2)
+			gaudi_nic->power_up_mask = (port & 1) ? 0xC : 0x3;
+		else
+			/*
+			 * in the special mode of 100000Mb/s with 4 lanes, only
+			 * the even port should be up and should configure all
+			 * the lanes
+			 */
+			gaudi_nic->power_up_mask = 0xF;
+
+		gaudi_nic->fw_tuning_mask = gaudi_nic->power_up_mask;
+	}
+}
+
+static int port_open(struct gaudi_nic_device *gaudi_nic)
+{
+	struct gaudi_device *gaudi = gaudi_nic->hdev->asic_specific;
+	u32 port = gaudi_nic->port, pcs_fifo_size;
+	struct hl_device *hdev = gaudi_nic->hdev;
+	char cq_wq_name[15] = {0};
+	int rc, rx_irq = 0;
+
+	if (gaudi_nic->port_open)
+		return 0;
+
+	/*
+	 * Temporary WA until DevOps starts to use nic_mac_loopback properly by
+	 * writing a bitmask rather than a boolean (SW-15223).
+	 * When they implement that, the following code should be used:
+	 * !!(gaudi->nic_mac_loopback_mask & BIT(port))
+	 */
+	gaudi_nic->mac_loopback = !!gaudi->nic_mac_loopback;
+
+	gaudi_nic->auto_neg_enable = !!(hdev->nic_auto_neg_mask & BIT(port));
+	mac_channels_init(gaudi_nic);
+
+	pcs_fifo_size = gaudi->nic_pcs_fail_threshold * sizeof(ktime_t);
+	if (!is_power_of_2(pcs_fifo_size)) {
+		dev_err(hdev->dev,
+			"PCS fifo size must be a power of 2, port: %d\n", port);
+		return -EFAULT;
+	}
+
+	rc = kfifo_alloc(&gaudi_nic->pcs_fail_fifo, pcs_fifo_size, GFP_KERNEL);
+	if (rc) {
+		dev_err(hdev->dev, "PCS fifo alloc failed, port: %d\n", port);
+		return rc;
+	}
+
+	/*
+	 * Workaround for H3 #HW-2061 bug.
+	 * MMU bypass cannot be set to the NIC CQ. But since it uses ASID 0, we
+	 * solve it by mapping the CQ buffer.
+	 */
+	rc = map_cq_mem(gaudi_nic);
+	if (rc) {
+		dev_err(hdev->dev, "Failed to map NIC CQ buffer, port: %d\n",
+			port);
+		goto pcs_fifo_free;
+	}
+
+	memset(gaudi_nic->rx_mem_cpu, 0, gaudi_nic->rx_mem_size);
+	memset(gaudi_nic->cq_mem_cpu, 0, gaudi_nic->cq_mem_size);
+
+	snprintf(cq_wq_name, sizeof(cq_wq_name) - 1, "nic%d-cq",
+			gaudi_nic->port);
+
+	/*
+	 * Use only one thread because cq_irq_work() should not be executed
+	 * concurrently for the same port.
+	 */
+	gaudi_nic->cq_wq = create_singlethread_workqueue(cq_wq_name);
+	if (!gaudi_nic->cq_wq) {
+		dev_err(hdev->dev, "Failed to create CQ WQ, port: %d\n", port);
+		rc = -ENOMEM;
+		goto cq_unmap;
+	}
+
+	if ((hdev->pdev) && (gaudi->multi_msi_mode)) {
+		rx_irq = pci_irq_vector(hdev->pdev, RX_MSI_IDX + port);
+
+		rc = request_irq(rx_irq, gaudi_nic_rx_irq_handler, 0,
+					gaudi_nic->ndev->name,
+					gaudi_nic);
+		if (rc) {
+			dev_err(hdev->dev,
+				"Failed to request Rx IRQ %d, port: %d, %d\n",
+				rx_irq, port, rc);
+			goto cq_wq_free;
+		}
+	}
+
+	gaudi_nic->rx_ci = gaudi_nic->tx_pi = gaudi_nic->tx_ci =
+		gaudi_nic->cq_ci = gaudi_nic->last_cqe_cnt = 0;
+
+	gaudi_nic->cq_delay = usecs_to_jiffies(1);
+	gaudi_nic->cq_delay_idle = msecs_to_jiffies(1);
+
+	/* after hw_config(), interrupts may arrive */
+	rc = hw_config(gaudi_nic);
+	if (rc) {
+		dev_err(hdev->dev, "Failed to configure NIC H/W, port: %d, %d\n",
+					port, rc);
+		goto rx_irq_free;
+	}
+
+	eth_start_stop(gaudi_nic, true);
+
+	if (hdev->nic_rx_poll) {
+		/*
+		 * Init the delayed work here to support switching on the fly
+		 * between NAPI and polling mode.
+		 */
+		INIT_DELAYED_WORK(&gaudi_nic->rx_poll_work, rx_pkt_poll);
+		schedule_delayed_work(&gaudi_nic->rx_poll_work,
+					msecs_to_jiffies(1));
+	} else {
+		napi_enable(&gaudi_nic->napi);
+	}
+
+	set_port_status(gaudi_nic, true);
+
+	gaudi_nic->port_open = true;
+
+	return 0;
+
+rx_irq_free:
+	if ((hdev->pdev) && (gaudi->multi_msi_mode)) {
+		synchronize_irq(rx_irq);
+		free_irq(rx_irq, gaudi_nic);
+	}
+cq_wq_free:
+	destroy_workqueue(gaudi_nic->cq_wq);
+cq_unmap:
+	unmap_cq_mem(gaudi_nic);
+pcs_fifo_free:
+	kfifo_free(&gaudi_nic->pcs_fail_fifo);
+
+	return rc;
+}
+
+static void port_open_work(struct work_struct *work)
+{
+	struct gaudi_nic_device *gaudi_nic = container_of(work,
+							struct gaudi_nic_device,
+							port_open_work.work);
+	struct hl_device *hdev = gaudi_nic->hdev;
+	int rc;
+
+	rc = port_open(gaudi_nic);
+	if (rc)
+		dev_err(hdev->dev, "Failed to init NIC H/W, port: %d\n",
+			gaudi_nic->port);
+
+	atomic_set(&gaudi_nic->in_reset, 0);
+}
+
+static void port_close(struct gaudi_nic_device *gaudi_nic)
+{
+	struct gaudi_device *gaudi = gaudi_nic->hdev->asic_specific;
+	struct hl_device *hdev = gaudi_nic->hdev;
+	u32 port = gaudi_nic->port;
+	int irq;
+
+	cancel_delayed_work_sync(&gaudi_nic->port_open_work);
+
+	if (!gaudi_nic->port_open)
+		return;
+
+	gaudi_nic->port_open = false;
+	gaudi_nic->active = false;
+
+	/* Print if not in hard reset flow e.g. from ifconfig */
+	if (gaudi_nic->pcs_link && !hdev->hard_reset_pending)
+		dev_info(hdev->dev, "port %d was closed\n", port);
+
+	port_reset_state(gaudi_nic);
+
+	kfifo_free(&gaudi_nic->pcs_fail_fifo);
+
+	/* disable Tx in S/W */
+	netif_stop_queue(gaudi_nic->ndev);
+
+	/* disable Rx/Tx in H/W */
+	eth_start_stop(gaudi_nic, false);
+
+	if (hdev->nic_rx_poll) {
+		cancel_delayed_work_sync(&gaudi_nic->rx_poll_work);
+	} else {
+		napi_synchronize(&gaudi_nic->napi);
+		napi_disable(&gaudi_nic->napi);
+	}
+
+	/* disable Rx in S/W */
+	if (hdev->pdev) {
+		if (gaudi->multi_msi_mode) {
+			irq = pci_irq_vector(hdev->pdev, RX_MSI_IDX + port);
+			synchronize_irq(irq);
+			free_irq(irq, gaudi_nic);
+		} else {
+			irq = pci_irq_vector(hdev->pdev, 0);
+			synchronize_irq(irq);
+		}
+	}
+
+	netif_carrier_off(gaudi_nic->ndev);
+
+	flush_workqueue(gaudi_nic->cq_wq);
+	destroy_workqueue(gaudi_nic->cq_wq);
+
+	unmap_cq_mem(gaudi_nic);
+}
+
+int gaudi_nic_port_reset(struct gaudi_nic_device *gaudi_nic)
+{
+	port_close(gaudi_nic);
+	return port_open(gaudi_nic);
+}
+
+static int gaudi_nic_open(struct net_device *netdev)
+{
+	struct gaudi_nic_device **ptr = netdev_priv(netdev);
+	struct gaudi_nic_device *gaudi_nic = *ptr;
+	struct hl_device *hdev = gaudi_nic->hdev;
+
+	if (gaudi_nic->enabled)
+		return 0;
+
+	if (atomic_cmpxchg(&gaudi_nic->in_reset, 0, 1)) {
+		dev_err(hdev->dev, "port %d is in reset, can't open it\n",
+			gaudi_nic->port);
+		return -EBUSY;
+	}
+
+	netif_carrier_off(netdev);
+
+	/* in_reset will be set to 0 in port_open_work() */
+	INIT_DELAYED_WORK(&gaudi_nic->port_open_work, port_open_work);
+	schedule_delayed_work(&gaudi_nic->port_open_work, msecs_to_jiffies(1));
+
+	gaudi_nic->enabled = true;
+
+	return 0;
+}
+
+static int gaudi_nic_close(struct net_device *netdev)
+{
+	struct gaudi_nic_device **ptr = netdev_priv(netdev);
+	struct gaudi_nic_device *gaudi_nic = *ptr;
+	struct hl_device *hdev = gaudi_nic->hdev;
+	struct gaudi_device *gaudi;
+
+	gaudi = hdev->asic_specific;
+
+	if (!gaudi_nic->enabled)
+		return 0;
+
+	if (atomic_cmpxchg(&gaudi_nic->in_reset, 0, 1)) {
+		if (!gaudi->nic_in_teardown)
+			dev_err(hdev->dev,
+				"port %d is in reset, can't close it\n",
+				gaudi_nic->port);
+		return -EBUSY;
+	}
+
+	/*
+	 * this function may be called from 'ifconfig <nic_name> down', hence
+	 * the cleanup
+	 */
+	port_close(gaudi_nic);
+
+	gaudi_nic->enabled = false;
+
+	atomic_set(&gaudi_nic->in_reset, 0);
+
+	return 0;
+}
+
+netdev_tx_t gaudi_nic_handle_tx_pkt(struct gaudi_nic_device *gaudi_nic,
+					struct sk_buff *skb)
+{
+	struct net_device_stats *stats = &gaudi_nic->ndev->stats;
+	bool pkt_sent;
+
+	/*
+	 * NETDEV_TX_OK transfers skb ownership, so free every dropped packet.
+	 */
+	if (!gaudi_nic->active || gaudi_nic->mac_loopback || !skb->len) {
+		dev_kfree_skb_any(skb);
+		return NETDEV_TX_OK;
+	}
+
+	if (disabled_or_in_reset(gaudi_nic))
+		return NETDEV_TX_BUSY;
+
+#if HL_NIC_DEBUG
+	{
+		struct hl_device *hdev = gaudi_nic->hdev;
+
+		dev_dbg_ratelimited(hdev->dev,
+			"port: %d, addr: 0x%p, size: %d, tx_pi: %d, tx_ci: %d\n",
+			gaudi_nic->port, skb->data, skb->len,
+			gaudi_nic->tx_pi, gaudi_nic->tx_ci);
+	}
+#endif
+
+	pkt_sent = write_pkt_to_hw(gaudi_nic, (u64 *) skb->data, skb->len);
+	if (pkt_sent) {
+		stats->tx_packets++;
+		stats->tx_bytes += skb->len;
+	}
+
+	dev_kfree_skb_any(skb);
+
+	return NETDEV_TX_OK;
+}
+
+static netdev_tx_t gaudi_nic_xmit_frame(struct sk_buff *skb,
+					struct net_device *netdev)
+{
+	struct gaudi_nic_device **ptr = netdev_priv(netdev);
+	struct gaudi_nic_device *gaudi_nic = *ptr;
+	struct gaudi_device *gaudi;
+
+	gaudi = gaudi_nic->hdev->asic_specific;
+
+	return (netdev_tx_t) gaudi->nic_handle_tx(gaudi_nic, skb);
+}
+
+static int gaudi_nic_change_mtu(struct net_device *netdev, int new_mtu)
+{
+	struct gaudi_nic_device **ptr = netdev_priv(netdev);
+	struct gaudi_nic_device *gaudi_nic = *ptr;
+	struct hl_device *hdev = gaudi_nic->hdev;
+	u32 port = gaudi_nic->port;
+	int rc;
+
+#ifndef _HAS_MIN_MAX_MTU
+	if (new_mtu < (ETH_ZLEN + ETH_FCS_LEN + VLAN_HLEN) ||
+			new_mtu > NIC_MAX_MTU)
+		return -EOPNOTSUPP;
+#endif
+
+	if (atomic_cmpxchg(&gaudi_nic->in_reset, 0, 1)) {
+		dev_err(hdev->dev, "port %d is in reset, can't change MTU\n",
+			port);
+		return -EBUSY;
+	}
+
+	if (gaudi_nic->enabled) {
+		port_close(gaudi_nic);
+		netdev->mtu = new_mtu;
+		rc = port_open(gaudi_nic);
+		if (rc)
+			dev_err(hdev->dev,
+				"Failed to reinit port %d for MTU change, rc %d\n",
+				port, rc);
+	}
+
+	atomic_set(&gaudi_nic->in_reset, 0);
+
+	return 0;
+}
+
+static const struct net_device_ops gaudi_nic_netdev_ops = {
+	.ndo_open		= gaudi_nic_open,
+	.ndo_stop		= gaudi_nic_close,
+	.ndo_start_xmit		= gaudi_nic_xmit_frame,
+	.ndo_validate_addr	= eth_validate_addr,
+	.ndo_change_mtu		= gaudi_nic_change_mtu,
+};
+
+static int port_register(struct hl_device *hdev, int port)
+{
+	struct gaudi_device *gaudi = hdev->asic_specific;
+	struct gaudi_nic_device *gaudi_nic;
+	struct gaudi_nic_device **ptr;
+	struct net_device *ndev;
+	int rc;
+
+	gaudi_nic = &gaudi->nic_devices[port];
+
+	ndev = alloc_etherdev(sizeof(struct gaudi_nic_device *));
+	if (!ndev) {
+		dev_err(hdev->dev, "netdevice %d alloc failed\n", port);
+		return -ENOMEM;
+	}
+
+	gaudi_nic->ndev = ndev;
+	gaudi_nic->speed = hdev->pldm ? SPEED_50000 : SPEED_100000;
+	gaudi_nic->nic_macro = &gaudi->nic_macros[port >> 1];
+
+	if (gaudi_nic->speed != SPEED_100000 &&
+		gaudi_nic->nic_macro->num_of_lanes == NIC_LANES_4) {
+		dev_err(hdev->dev,
+			"NIC %d with 4 lanes should be used only with speed of 100000Mb/s\n",
+			port);
+		rc = -EFAULT;
+		goto netdev_free;
+	}
+
+	if (gaudi_nic->speed == SPEED_100000 &&
+			gaudi_nic->nic_macro->num_of_lanes == NIC_LANES_4 &&
+			(port & 1)) {
+		dev_err(hdev->dev,
+			"only even NIC ports should be up for speed of 100000Mb/s with 4 lanes\n");
+		rc = -EFAULT;
+		goto netdev_free;
+	}
+
+	gaudi_nic->pfc_enable = true;
+
+	SET_NETDEV_DEV(ndev, hdev->pdev ? &hdev->pdev->dev : NULL);
+	ptr = netdev_priv(ndev);
+	*ptr = gaudi_nic;
+
+	/* this is necessary for creating multiple NICs by the same driver */
+	ndev->dev_port = port;
+
+	ndev->netdev_ops = &gaudi_nic_netdev_ops;
+	ndev->watchdog_timeo = NIC_TX_TIMEOUT;
+	ndev->min_mtu = ETH_MIN_MTU;
+	ndev->max_mtu = NIC_MAX_MTU;
+
+	netif_napi_add(ndev, &gaudi_nic->napi, napi_clean,
+			NIC_NAPI_MAX_RX_BUDGET);
+
+	ether_addr_copy(ndev->dev_addr,
+		hdev->asic_prop.cpucp_nic_info.mac_addrs[port].mac_addr);
+
+	if (register_netdev(ndev)) {
+		dev_err(hdev->dev,
+			"Could not register netdevice, port: %d\n", port);
+		rc = -EFAULT;
+		goto netdev_free;
+	}
+
+	netif_carrier_off(ndev);
+
+	return 0;
+
+netdev_free:
+	free_netdev(ndev);
+	gaudi_nic->ndev = NULL;
+
+	return rc;
+}
+
+static void port_unregister(struct gaudi_nic_device *gaudi_nic)
+{
+	unregister_netdev(gaudi_nic->ndev);
+
+	free_netdev(gaudi_nic->ndev);
+	gaudi_nic->ndev = NULL;
+}
+
+irqreturn_t gaudi_nic_cq_irq_handler(int irq, void *arg)
+{
+	return IRQ_HANDLED;
+}
+
+/**
+ * gaudi_nic_ports_init() - initialize NIC ports.
+ * @hdev: habanalabs device structure.
+ *
+ * Allocate and initialize the NIC ports.
+ *
+ * Return: 0 for success, non-zero for failure.
+ */
+int gaudi_nic_ports_init(struct hl_device *hdev)
+{
+	struct cpucp_nic_info *nic_info = &hdev->asic_prop.cpucp_nic_info;
+	struct cpucp_info *cpucp_info = &hdev->asic_prop.cpucp_info;
+	struct cpucp_mac_addr *mac_arr = nic_info->mac_addrs;
+	struct gaudi_device *gaudi = hdev->asic_specific;
+	int rc, i, nics_init = 0, cq_irq = 0;
+	bool read_card_location = false;
+	u8 mac[ETH_ALEN];
+	s32 *taps;
+
+	if (!hdev->nic_ports_mask)
+		return 0;
+
+	if (NIC_DRV_END_ADDR - NIC_DRV_BASE_ADDR > NIC_DRV_SIZE) {
+		dev_err(hdev->dev,
+			"DRAM allocation for NIC shouldn't exceed %dMB\n",
+			NIC_DRV_SIZE / 1024 / 1024);
+		return -ENOMEM;
+	}
+
+	if (TMR_FSM_SIZE + TMR_FREE_SIZE + TMR_FIFO_SIZE +
+			TMR_FIFO_STATIC_SIZE >
+		TMR_FSM_ENGINE_OFFS) {
+		dev_err(hdev->dev,
+			"NIC TMR data shouldn't be bigger than %dMB\n",
+			TMR_FSM_ENGINE_OFFS / 1024 / 1024);
+		return -ENOMEM;
+	}
+
+	/* set the default PAM4 Tx taps */
+	for (i = 0 ; i < NIC_MAX_NUM_OF_LANES ; i++) {
+		taps = gaudi->nic_pam4_tx_taps[i].taps;
+		taps[0] = 0;
+		taps[1] = -6;
+		taps[2] = 25;
+		taps[3] = 0;
+		taps[4] = 0;
+	}
+
+	/* copy the MAC OUI in reverse */
+	for (i = 0 ; i < 3 ; i++)
+		mac[i] = HABANALABS_MAC_OUI_1 >> (8 * (2 - i));
+
+	if (gaudi->hw_cap_initialized & HW_CAP_CPU_Q) {
+		char buf[VERSION_MAX_LEN] = {0}, *str;
+		u8 *mac_addr;
+
+		rc = hl_fw_cpucp_nic_info_get(hdev);
+		if (rc)
+			return rc;
+
+		for (i = 0 ; i < NIC_NUMBER_OF_PORTS ; i++) {
+			if (!(hdev->nic_ports_mask & BIT(i)))
+				continue;
+
+			mac_addr = mac_arr[i].mac_addr;
+			if (memcmp(mac, mac_addr, 3)) {
+				dev_err(hdev->dev,
+					"bad MAC OUI %02x:%02x:%02x:%02x:%02x:%02x, port %d\n",
+					mac_addr[0], mac_addr[1], mac_addr[2],
+					mac_addr[3], mac_addr[4], mac_addr[5],
+					i);
+				return -EFAULT;
+			}
+		}
+
+		hdev->nic_ports_mask &= le64_to_cpu(nic_info->link_mask[0]);
+		hdev->nic_ports_ext_mask &=
+					le64_to_cpu(nic_info->link_ext_mask[0]);
+		hdev->nic_auto_neg_mask &=
+					le64_to_cpu(nic_info->auto_neg_mask[0]);
+		gaudi->nic_use_fw_polarity = true;
+
+		for (i = 1 ; i < 11 ; i++) {
+			sprintf(buf, "hl-gaudi-0.%d.", i);
+			str = strstr(cpucp_info->kernel_version, buf);
+			if (!str)
+				continue;
+
+			/*
+			 * No PMC polarity and external ports mask prior to F/W
+			 * version 0.9.0.
+			 */
+			if (i < 9) {
+				hdev->nic_ports_ext_mask = HLS1_EXT_PORTS_MASK;
+				gaudi->nic_use_fw_polarity = false;
+			}
+
+			/* No Autoneg mask prior to F/W version 0.11.0, hence:
+			 * - No Autoneg on external ports on PMC card prior to
+			 *   that version.
+			 * - No Autoneg at all on PCI card prior to that
+			 *   version.
+			 */
+			if (hdev->card_type == cpucp_card_type_pmc)
+				hdev->nic_auto_neg_mask = hdev->nic_ports_mask &
+						~hdev->nic_ports_ext_mask;
+			else
+				hdev->nic_auto_neg_mask = 0;
+
+			/*
+			 * No privileged protection prior to F/W version 0.11.0
+			 * so we can read the card location from a register.
+			 */
+			read_card_location = true;
+			break;
+		}
+	} else {
+		/*
+		 * No CPU, hence set the MAC addresses manually.
+		 * Each device gets its own random MAC addresses.
+		 */
+		get_random_bytes(&mac[3], 2);
+
+		for (i = 0 ; i < NIC_NUMBER_OF_PORTS ; i++) {
+			mac[ETH_ALEN - 1] = i;
+			memcpy(mac_arr[i].mac_addr, mac, ETH_ALEN);
+		}
+
+		read_card_location = true;
+	}
+
+	if (read_card_location) {
+		u32 card_location = RREG32(mmPSOC_GLOBAL_CONF_BOOT_STRAP_PINS);
+
+		cpucp_info->card_location =
+				cpu_to_le32((card_location >> 22) & 0x7);
+	}
+
+	for (i = 0 ; i < NIC_NUMBER_OF_MACROS ; i++) {
+		gaudi->nic_macros[i].idx = i;
+		gaudi->nic_macros[i].num_of_lanes = NIC_LANES_2;
+	}
+
+	/*
+	 * for each NIC macro, set the even port to handle the macro
+	 * configuration, unless the even port is disabled and in this case the
+	 * odd port will handle the configuration.
+	 */
+	for (i = 0 ; i < NIC_NUMBER_OF_PORTS ; i++)
+		if ((hdev->nic_ports_mask & BIT(i)) &&
+			(!(i & 1) || !(hdev->nic_ports_mask & BIT(i - 1))))
+			gaudi->nic_devices[i].do_macro_cfg = true;
+
+	gaudi->nic_pcs_fail_time_frame = PCS_FAIL_TIME_FRAME_SEC;
+	gaudi->nic_pcs_fail_threshold = PCS_FAIL_THRESHOLD;
+	gaudi->nic_check_link = true;
+
+	if ((hdev->pdev) && (gaudi->multi_msi_mode)) {
+		/* One IRQ for all ports to indicate a CQ overrun */
+		cq_irq = pci_irq_vector(hdev->pdev, CQ_MSI_IDX);
+		rc = request_irq(cq_irq, gaudi_nic_cq_irq_handler, 0,
+					"gaudi nic cq", hdev);
+		if (rc) {
+			dev_err(hdev->dev, "Failed to request CQ IRQ %d, %d\n",
+				cq_irq, rc);
+			return rc;
+		}
+
+		gaudi->nic_cq_irq_enable = true;
+	}
+
+	/* Must be called here as it depends on the earlier initializations */
+	for (i = 0 ; i < NIC_NUMBER_OF_PORTS ; i++, nics_init++)
+		if (hdev->nic_ports_mask & BIT(i)) {
+			rc = port_register(hdev, i);
+			if (rc) {
+				dev_err(hdev->dev, "NIC port %d init failed\n",
+							i);
+				goto unregister_ports;
+			}
+		}
+
+	gaudi->hw_cap_initialized |= HW_CAP_NIC_DRV;
+
+	return 0;
+
+unregister_ports:
+	for (i = 0 ; i < nics_init ; i++)
+		if (hdev->nic_ports_mask & BIT(i))
+			port_unregister(&gaudi->nic_devices[i]);
+
+	if (gaudi->nic_cq_irq_enable) {
+		synchronize_irq(cq_irq);
+		free_irq(cq_irq, hdev);
+	}
+
+	return rc;
+}
+
+/**
+ * gaudi_nic_ports_fini() - cleanup NIC ports.
+ * @hdev: habanalabs device structure.
+ *
+ * Perform cleanup and freeing of the NIC ports.
+ */
+void gaudi_nic_ports_fini(struct hl_device *hdev)
+{
+	struct gaudi_device *gaudi = hdev->asic_specific;
+	int i, cq_irq;
+
+	gaudi->nic_in_teardown = true;
+
+	/* The HW_CAP_NIC_DRV bit of gaudi->hw_cap_initialized cannot be used as
+	 * a prerequisite for this function, as we may arrive here after a
+	 * failed hard reset without calling gaudi_nic_ports_reopen().
+	 */
+
+	for (i = 0 ; i < NIC_NUMBER_OF_PORTS ; i++) {
+		if (!(hdev->nic_ports_mask & BIT(i)) ||
+				!gaudi->nic_devices[i].ndev)
+			continue;
+
+		port_unregister(&gaudi->nic_devices[i]);
+	}
+
+	if (gaudi->nic_cq_irq_enable) {
+		cq_irq = pci_irq_vector(hdev->pdev, CQ_MSI_IDX);
+		synchronize_irq(cq_irq);
+		free_irq(cq_irq, hdev);
+		gaudi->nic_cq_irq_enable = false;
+	}
+}
+
+/**
+ * gaudi_nic_hard_reset_prepare() - stop the NIC Rx, Tx, CQ and synchronize
+ *                                  with other NIC reset flows.
+ * @hdev: habanalabs device structure.
+ *
+ * This function makes sure that during the reset no packets will be processed
+ * and that ndo_open/ndo_stop do not open/close the NIC.
+ * A hard reset might occur right after the driver was loaded, which means
+ * before the NICs initialization was finished. Therefore, even if the NIC is
+ * not yet enabled, we mark it as in reset to avoid races. We clear the in reset
+ * flag later on when reopening the NICs.
+ *
+ * Return: 0 for success, non-zero for failure.
+ */
+int gaudi_nic_hard_reset_prepare(struct hl_device *hdev)
+{
+	struct gaudi_device *gaudi = hdev->asic_specific;
+	struct gaudi_nic_device *gaudi_nic;
+	ktime_t timeout;
+	int i;
+
+	if (!(gaudi->hw_cap_initialized & HW_CAP_NIC_DRV) ||
+			(gaudi->nic_in_reset))
+		return 0;
+
+	for (i = 0 ; i < NIC_NUMBER_OF_PORTS ; i++) {
+		if (!(hdev->nic_ports_mask & BIT(i)))
+			continue;
+
+		gaudi_nic = &gaudi->nic_devices[i];
+
+		/*
+		 * This function is competing with the NIC reset from ethtool,
+		 * so try to take the in_reset atomic. If a reset is already in
+		 * progress, wait until it finishes.
+		 * The reset function is designed to always finish (it could
+		 * take up to a few seconds in the worst case).
+		 */
+
+		timeout = ktime_add_ms(ktime_get(),
+					HL_PENDING_RESET_PER_SEC * 1000 * 4);
+		while (atomic_cmpxchg(&gaudi_nic->in_reset, 0, 1)) {
+			usleep_range(50, 200);
+			if (ktime_compare(ktime_get(), timeout) > 0) {
+				WARN(1,
+					"Timeout while waiting for port %d to finish reset\n",
+					gaudi_nic->port);
+				return -EBUSY;
+			}
+		}
+	}
+
+	gaudi->nic_in_reset = true;
+
+	return 0;
+}
+
+/**
+ * gaudi_nic_stop() - stop the NIC S/W and H/W.
+ * @hdev: habanalabs device structure.
+ *
+ * This function stops the operation of the NIC S/W and H/W, no packets are
+ * processed after this call.
+ */
+void gaudi_nic_stop(struct hl_device *hdev)
+{
+	struct gaudi_device *gaudi = hdev->asic_specific;
+	struct gaudi_nic_device *gaudi_nic;
+	int i, cq_irq;
+
+	if (!(gaudi->hw_cap_initialized & HW_CAP_NIC_DRV))
+		return;
+
+	for (i = 0 ; i < NIC_NUMBER_OF_PORTS ; i++) {
+		gaudi_nic = &gaudi->nic_devices[i];
+
+		if ((hdev->nic_ports_mask & BIT(i)) && gaudi_nic->enabled)
+			port_close(gaudi_nic);
+	}
+
+	if (gaudi->nic_cq_irq_enable) {
+		cq_irq = pci_irq_vector(hdev->pdev, CQ_MSI_IDX);
+		synchronize_irq(cq_irq);
+		free_irq(cq_irq, hdev);
+		gaudi->nic_cq_irq_enable = false;
+	}
+}
+
+/**
+ * gaudi_nic_ports_reopen() - reopen the NIC ports.
+ * @hdev: habanalabs device structure.
+ *
+ * This function starts the operation of the NIC ports. Packets will be
+ * processed after this call.
+ * Called after hard reset to reopen the NIC ports that were closed during the
+ * reset.
+ */
+void gaudi_nic_ports_reopen(struct hl_device *hdev)
+{
+	struct gaudi_device *gaudi = hdev->asic_specific;
+	struct gaudi_nic_device *gaudi_nic;
+	int rc, i, nics_init = 0, cq_irq;
+	u32 port;
+
+	if (gaudi->hw_cap_initialized & HW_CAP_NIC_DRV)
+		return;
+
+	if ((hdev->pdev) && (gaudi->multi_msi_mode)) {
+		/* One IRQ for all ports to indicate a CQ overrun */
+		cq_irq = pci_irq_vector(hdev->pdev, CQ_MSI_IDX);
+		rc = request_irq(cq_irq, gaudi_nic_cq_irq_handler, 0,
+					"gaudi nic cq", hdev);
+		if (rc)
+			dev_err(hdev->dev, "Failed to request CQ IRQ %d, %d\n",
+				cq_irq, rc);
+		else
+			gaudi->nic_cq_irq_enable = true;
+	}
+
+	for (i = 0 ; i < NIC_NUMBER_OF_PORTS ; i++, nics_init++) {
+		if (!(hdev->nic_ports_mask & BIT(i)))
+			continue;
+
+		gaudi_nic = &gaudi->nic_devices[i];
+		port = gaudi_nic->port;
+
+		/*
+		 * It could be that the port was shut down by 'ifconfig down',
+		 * and there is no need to reopen it.
+		 * Since we mark the ports as in reset even if they are
+		 * disabled, we clear the flag here anyway.
+		 * See gaudi_nic_hard_reset_prepare() for more info.
+		 */
+		if (!gaudi_nic->enabled) {
+			atomic_set(&gaudi_nic->in_reset, 0);
+			continue;
+		}
+
+		schedule_delayed_work(&gaudi_nic->port_open_work,
+					msecs_to_jiffies(1));
+	}
+
+	gaudi->nic_in_reset = false;
+
+	gaudi->hw_cap_initialized |= HW_CAP_NIC_DRV;
+}
+
+void gaudi_nic_ctx_fini(struct hl_ctx *ctx)
+{
+}
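
Note on the macro-configuration ownership rule in gaudi_nic_ports_init():
each NIC macro pairs an even and an odd port, the even port normally handles
the shared configuration, and the odd port takes over only when its even
neighbour is masked out. The user-space C sketch below restates that rule on
a plain bitmask so it can be checked in isolation. It is an illustration
only: macro_cfg_owners() and its parameters are hypothetical names and are
not part of the driver.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper, not driver code: return a bitmask of the ports that
 * would own the shared macro configuration for a given enabled-ports mask.
 */
static uint32_t macro_cfg_owners(uint32_t ports_mask, unsigned int num_ports)
{
	uint32_t owners = 0;
	unsigned int i;

	assert(num_ports <= 32);

	for (i = 0; i < num_ports; i++) {
		uint32_t bit = UINT32_C(1) << i;

		if (!(ports_mask & bit))
			continue;

		/* an even port always owns; an odd port owns only if its
		 * even twin (bit i - 1) is disabled
		 */
		if (!(i & 1) || !(ports_mask & (bit >> 1)))
			owners |= bit;
	}

	return owners;
}
```

With ports 0-3 enabled (mask 0xF) the owners are ports 0 and 2 (0x5). With
only the odd ports enabled (mask 0xA) each odd port owns its macro.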
diff --git a/drivers/misc/habanalabs/gaudi/gaudi_nic.h b/drivers/misc/habanalabs/gaudi/gaudi_nic.h
new file mode 100644
index 000000000000..7259b01b78fb
--- /dev/null
+++ b/drivers/misc/habanalabs/gaudi/gaudi_nic.h
@@ -0,0 +1,336 @@
+/* SPDX-License-Identifier: GPL-2.0
+ *
+ * Copyright 2018-2020 HabanaLabs, Ltd.
+ * All Rights Reserved.
+ *
+ */
+
+#ifndef GAUDI_NIC_DRV_H_
+#define GAUDI_NIC_DRV_H_
+
+#include "gaudiP.h"
+#include "../include/gaudi/gaudi_fw_if.h"
+
+/* Time in jiffies before concluding the transmitter is hung */
+#define NIC_TX_TIMEOUT			(5 * HZ)
+
+#define NIC_RX_SIZE			1024
+#define NIC_NAPI_MAX_RX_BUDGET		64
+#define NIC_MAX_PKT_SIZE		2048
+#define NIC_ARP_PKT_SIZE		28
+
+#if (NIC_MAX_PKT_SIZE & (NIC_MAX_PKT_SIZE - 1))
+#error "Max ETH packet size is not a power of 2"
+#endif
+
+#define ETH_P_LLDP		0x88CC
+
+#define NIC_MACRO_CFG_SIZE	(mmNIC1_QM0_GLBL_CFG0 - mmNIC0_QM0_GLBL_CFG0)
+#define NIC_CFG_SIZE		(mmNIC0_QPC1_REQ_STATIC_CONFIG - \
+					mmNIC0_QPC0_REQ_STATIC_CONFIG)
+
+#define NIC_MAX_QP_NUM		(HL_NIC_MAX_CONN_ID + 1)
+#define NIC_HW_MAX_QP_NUM	0x8000 /* 32K */
+
+#if (NIC_MAX_QP_NUM > NIC_HW_MAX_QP_NUM)
+#error "Number of available QPs must be smaller than or equal to NIC_HW_MAX_QP_NUM"
+#endif
+
+/* The '*_SIZE' defines are per NIC port */
+#define REQ_QPC_BASE_SIZE	(NIC_MAX_QP_NUM * sizeof(struct qpc_requester))
+#define RES_QPC_BASE_SIZE	(NIC_MAX_QP_NUM * sizeof(struct qpc_responder))
+#define SWQ_BASE_SIZE		(WQ_BUFFER_SIZE * sizeof(struct sq_wqe))
+#define SB_BASE_SIZE		(WQ_BUFFER_SIZE * NIC_MAX_PKT_SIZE)
+
+#define TMR_BASE_SIZE		(TMR_FSM_ENGINE_OFFS + TMR_FSM_SIZE)
+
+#define TMR_FSM_ENGINE_OFFS	(1 << 22) /* H/W constraint */
+
+#define TMR_FSM_SIZE		ALIGN(NIC_HW_MAX_QP_NUM, DEVICE_CACHE_LINE_SIZE)
+#define TMR_FREE_SIZE		ALIGN(TMR_FREE_NUM_ENTRIES * 4, \
+					DEVICE_CACHE_LINE_SIZE)
+/* each timer serves two NICs, hence multiply by 2 */
+#define TMR_FIFO_SIZE		ALIGN((NIC_MAX_QP_NUM * 2 * 4), \
+					DEVICE_CACHE_LINE_SIZE)
+#define TMR_FIFO_STATIC_SIZE	(DEVICE_CACHE_LINE_SIZE * TMR_GRANULARITY)
+
+#define TMR_FSM0_OFFS		0
+#define TMR_FREE_OFFS		(TMR_FSM0_OFFS + TMR_FSM_SIZE)
+#define TMR_FIFO_OFFS		(TMR_FREE_OFFS + TMR_FREE_SIZE)
+#define TMR_FSM1_OFFS		(TMR_FSM0_OFFS + TMR_FSM_ENGINE_OFFS)
+
+#define TMR_FREE_NUM_ENTRIES	(TMR_FIFO_SIZE / DEVICE_CACHE_LINE_SIZE)
+#define TMR_GRANULARITY		128
+
+#define TXS_BASE_SIZE		(TXS_FREE_SIZE + TXS_FIFO_SIZE + \
+					TXS_FIFO_STATIC_SIZE)
+
+
+#define TXS_FREE_SIZE		ALIGN(TXS_FREE_NUM_ENTRIES * 4, \
+					DEVICE_CACHE_LINE_SIZE)
+/* TXS serves requester and responder QPs, hence multiply by 2 */
+#define TXS_FIFO_SIZE		ALIGN((NIC_MAX_QP_NUM * 2 * 4), \
+					DEVICE_CACHE_LINE_SIZE)
+#define TXS_FIFO_STATIC_SIZE	(DEVICE_CACHE_LINE_SIZE * TXS_GRANULARITY)
+
+#define TXS_FREE_OFFS		0
+#define TXS_FIFO_OFFS		(TXS_FREE_OFFS + TXS_FREE_SIZE)
+
+#define TXS_FREE_NUM_ENTRIES	(TXS_FIFO_SIZE / DEVICE_CACHE_LINE_SIZE)
+#define TXS_GRANULARITY		256
+#define TXS_SCHEDQ		256
+
+#define SECTION_ALIGN_SIZE	0x100000ull
+#define NIC_DRV_BASE_ADDR	ALIGN(NIC_DRV_ADDR, SECTION_ALIGN_SIZE)
+
+#define REQ_QPC_BASE_ADDR	NIC_DRV_BASE_ADDR
+
+#define RES_QPC_BASE_ADDR	ALIGN(REQ_QPC_BASE_ADDR + \
+					NIC_NUMBER_OF_ENGINES * \
+					REQ_QPC_BASE_SIZE, SECTION_ALIGN_SIZE)
+
+#define TMR_BASE_ADDR		ALIGN(RES_QPC_BASE_ADDR + \
+					NIC_NUMBER_OF_ENGINES * \
+					RES_QPC_BASE_SIZE, SECTION_ALIGN_SIZE)
+
+#define TXS_BASE_ADDR		ALIGN(TMR_BASE_ADDR + \
+					NIC_NUMBER_OF_MACROS * \
+					TMR_BASE_SIZE, SECTION_ALIGN_SIZE)
+
+#define SWQ_BASE_ADDR		ALIGN(TXS_BASE_ADDR + \
+					NIC_NUMBER_OF_ENGINES * \
+					TXS_BASE_SIZE, SECTION_ALIGN_SIZE)
+
+#define SB_BASE_ADDR		ALIGN(SWQ_BASE_ADDR + \
+					NIC_MAX_NUMBER_OF_PORTS * \
+					SWQ_BASE_SIZE, SECTION_ALIGN_SIZE)
+
+#define NIC_DRV_END_ADDR	ALIGN(SB_BASE_ADDR + NIC_MAX_NUMBER_OF_PORTS * \
+					SB_BASE_SIZE, SECTION_ALIGN_SIZE)
+
+#define WQ_BUFFER_LOG_SIZE		8
+#define WQ_BUFFER_SIZE			(1 << WQ_BUFFER_LOG_SIZE)
+#define CQ_PORT_BUF_LEN			(1 << 18)
+#define CQE_SIZE			sizeof(struct cqe)
+#define CQ_PORT_BUF_SIZE		(CQ_PORT_BUF_LEN * CQE_SIZE)
+#define CQ_USER_MAX_SIZE		(1 << 30) /* 1GB */
+#define CQ_USER_MIN_ENTRIES		128
+#define CQ_USER_MAX_ENTRIES		(CQ_USER_MAX_SIZE / CQE_SIZE)
+#define QP_ERR_BUF_SIZE			(QP_ERR_SIZE * QP_ERR_BUF_LEN)
+#define QP_ERR_SIZE			sizeof(struct qp_err)
+#define QP_ERR_BUF_LEN			1024
+#define RX_PKT_MAX_SIZE			2048
+#define QPC_RES_LOG_BUF_SIZE_MASK	10
+#define RAW_QPN				0
+#define RX_MSI_IDX			(GAUDI_EVENT_QUEUE_MSI_IDX + 1)
+#define RX_MSI_ADDRESS			(mmPCIE_MSI_INTR_0 + RX_MSI_IDX * 4)
+#define CQ_MSI_IDX			(NUMBER_OF_CMPLT_QUEUES + \
+						NUMBER_OF_CPU_HW_QUEUES + \
+						NIC_NUMBER_OF_ENGINES)
+#define CQ_MSI_ADDRESS			(mmPCIE_MSI_INTR_0 + CQ_MSI_IDX * 4)
+
+#define WQE_MAX_SIZE			max(NIC_SEND_WQE_SIZE, \
+						NIC_RECV_WQE_SIZE)
+#define USER_WQES_MAX_NUM		(1 << 21) /* 2M entries */
+#define USER_WQ_ARR_MAX_SIZE		ALIGN((1ull * NIC_HW_MAX_QP_NUM * \
+					USER_WQES_MAX_NUM * \
+						WQE_MAX_SIZE), PAGE_SIZE_2MB)
+
+#define CQ_VIRTUAL_ADDRESS		VA_NIC_MEM_ADDR
+
+#define USER_SWQ_VIRTUAL_ADDRESS	ALIGN(CQ_VIRTUAL_ADDRESS + \
+					NIC_NUMBER_OF_ENGINES * \
+						CQ_PORT_BUF_SIZE, \
+							SECTION_ALIGN_SIZE)
+
+#define USER_RWQ_VIRTUAL_ADDRESS	ALIGN(USER_SWQ_VIRTUAL_ADDRESS + \
+					NIC_NUMBER_OF_ENGINES * \
+						USER_WQ_ARR_MAX_SIZE, \
+							SECTION_ALIGN_SIZE)
+
+#define REQ_QPC_ADDR(port, conn_id) \
+	(REQ_QPC_BASE_ADDR + (port) * REQ_QPC_BASE_SIZE + (conn_id) * \
+			sizeof(struct qpc_requester))
+
+#define RES_QPC_ADDR(port, conn_id) \
+	(RES_QPC_BASE_ADDR + (port) * RES_QPC_BASE_SIZE + (conn_id) * \
+			sizeof(struct qpc_responder))
+
+#define NIC_DR_10		1031250
+#define NIC_DR_25		2578125
+#define NIC_DR_26		2656250
+#define NIC_DR_50		5312500
+
+#define NIC_LANES_2		2
+#define NIC_LANES_4		4
+
+/*
+ * change WQ_BUFFER_LOG_SIZE to log2(SWQ_BASE_SIZE/WQE_BB_SIZE).
+ * can use WQ_BUFFER_SIZE/WQE_BB_SIZE instead.
+ */
+
+enum ts_type {
+	TS_RC = 0,
+	TS_RAW = 1
+};
+
+enum wqe_opcode {
+	WQE_NOP = 0,
+	WQE_SEND = 1,
+	WQE_LINEAR = 2,
+	WQE_STRIDE = 3,
+	WQE_MULTI_STRIDE = 4,
+	WQE_RATE_UPDATE  = 5
+};
+
+enum trust_level {
+	UNSECURED = 0,
+	SECURED = 1,
+	PRIVILEGE = 2
+};
+
+struct qpc_requester {
+	u64	data[8];
+};
+
+#define QPC_SET(qpc, idx, shift, val, len) \
+	((qpc).data[idx] |= (u64) ((val) & (BIT_ULL(len) - 1)) << (shift))
+
+#define REQ_QPC_SET_DST_QP(req, val)		QPC_SET(req, 0, 0, val, 24)
+#define REQ_QPC_SET_PORT(req, val)		QPC_SET(req, 0, 24, val, 4)
+#define REQ_QPC_SET_PRIORITY(req, val)		QPC_SET(req, 0, 28, val, 2)
+#define REQ_QPC_SET_RKEY(req, val)		QPC_SET(req, 0, 32, val, 32)
+#define REQ_QPC_SET_DST_IP(req, val)		QPC_SET(req, 1, 0, val, 32)
+#define REQ_QPC_SET_SRC_IP(req, val)		QPC_SET(req, 1, 32, val, 32)
+#define REQ_QPC_SET_DST_MAC_31_0(req, val)	QPC_SET(req, 2, 0, val, 32)
+#define REQ_QPC_SET_DST_MAC_47_32(req, val)	QPC_SET(req, 2, 32, val, 16)
+#define REQ_QPC_SET_SQ_NUM(req, val)		QPC_SET(req, 3, 24, val, 8)
+#define REQ_QPC_SET_TM_GRANULARITY(req, val)	QPC_SET(req, 3, 56, val, 7)
+#define REQ_QPC_SET_SOB_EN(req, val)		QPC_SET(req, 3, 63, val, 1)
+#define REQ_QPC_SET_TRANSPORT_SERVICE(req, val)	QPC_SET(req, 5, 49, val, 1)
+#define REQ_QPC_SET_BURST_SIZE(req, val)	QPC_SET(req, 5, 50, val, 22)
+#define REQ_QPC_SET_LAST_IDX(req, val)		QPC_SET(req, 6, 8, val, 22)
+#define REQ_QPC_SET_SWQ_GRANULARITY(req, val)	QPC_SET(req, 7, 58, val, 1)
+#define REQ_QPC_SET_WQ_BASE_ADDR(req, val)	QPC_SET(req, 7, 32, val, 24)
+#define REQ_QPC_SET_SECURED(req, val)		QPC_SET(req, 7, 59, val, 2)
+#define REQ_QPC_SET_VALID(req, val)		QPC_SET(req, 7, 63, val, 1)
+
+struct qpc_responder {
+	u64	data[4];
+};
+
+#define RES_QPC_SET_DST_QP(res, val)		QPC_SET(res, 0, 0, val, 24)
+#define RES_QPC_SET_PORT(res, val)		QPC_SET(res, 0, 24, val, 4)
+#define RES_QPC_SET_PRIORITY(res, val)		QPC_SET(res, 0, 28, val, 2)
+#define RES_QPC_SET_SQ_NUM(res, val)		QPC_SET(res, 2, 48, val, 8)
+#define RES_QPC_SET_LKEY(res, val)		QPC_SET(res, 0, 32, val, 32)
+#define RES_QPC_SET_DST_IP(res, val)		QPC_SET(res, 1, 0, val, 32)
+#define RES_QPC_SET_SRC_IP(res, val)		QPC_SET(res, 1, 32, val, 32)
+#define RES_QPC_SET_DST_MAC_31_0(res, val)	QPC_SET(res, 2, 0, val, 32)
+#define RES_QPC_SET_DST_MAC_47_32(res, val)	QPC_SET(res, 2, 32, val, 16)
+#define RES_QPC_SET_TRANSPORT_SERVICE(res, val)	QPC_SET(res, 2, 63, val, 1)
+#define RES_QPC_SET_LOG_BUF_SIZE_MASK(res, val)	QPC_SET(res, 3, 24, val, 5)
+#define RES_QPC_SET_SOB_EN(res, val)		QPC_SET(res, 3, 59, val, 1)
+#define RES_QPC_SET_VALID(res, val)		QPC_SET(res, 3, 63, val, 1)
+#define RES_QPC_SET_SECURED(res, val)		QPC_SET(res, 3, 60, val, 2)
+
+/**
+ * struct hl_qp - Describes a NIC Queue Pair.
+ * @qpc_lock: Mutex to protect accessing the QP context.
+ * @refcount: Reference counter for the QP usage.
+ * @gaudi_nic: Pointer to NIC device this QP belongs to.
+ * @port: The port number this QP belongs to.
+ * @conn_id: The QP number within its port.
+ * @local_key: Key for local access.
+ * @remote_key: Key for remote access.
+ * @is_req: whether the requester context was set for the QP.
+ * @is_res: whether the responder context was set for the QP.
+ */
+struct hl_qp {
+	struct mutex qpc_lock;
+	struct kref refcount;
+	struct gaudi_nic_device *gaudi_nic;
+	u32 port;
+	u32 conn_id;
+	u32 local_key;
+	u32 remote_key;
+	u8 is_req;
+	u8 is_res;
+};
+
+struct sq_wqe {
+	u64	data[4];
+};
+
+#define CFG_SQ_WQE_OPCODE(swq, val) \
+					((swq).data[0] |= (u64) (val) << 28)
+#define CFG_SQ_WQE_LOCAL_ADDRESS_31_0(swq, val) \
+					((swq).data[0] |= (u64) (val) << 32)
+#define CFG_SQ_WQE_LOCAL_ADDRESS_49_32(swq, val) \
+					((swq).data[1] |= (u64) (val))
+#define CFG_SQ_WQE_SIZE(swq, val) \
+					((swq).data[1] |= (u64) (val) << 18)
+
+struct cqe {
+	u64	data;
+};
+
+#define CQE_IS_VALID(cqe)		(((cqe)->data >> 63) & 1)
+#define CQE_TYPE(cqe)			(((cqe)->data >> 23) & 1)
+#define CQE_RES_NIC(cqe)		(((cqe)->data >> 10) & 1)
+#define CQE_RES_IMDT_21_0(cqe)		(((cqe)->data >> 32) & 0x3FFFFF)
+#define CQE_RES_IMDT_31_22(cqe)		((cqe)->data & 0x3FF)
+#define CQE_REQ_WQE_IDX(cqe)		(((cqe)->data >> 32) & 0x3FFFFF)
+#define CQE_REQ_QPN(cqe)		((cqe)->data & 0x7FFFFF)
+#define CQE_SET_INVALID(cqe)		((cqe)->data &= ~(1ull << 63))
+
+struct qp_err {
+	u32	data;
+};
+
+#define QP_ERR_QP_NUM(qp_err)		((qp_err).data & 0xFFFFFF)
+#define QP_ERR_ERR_NUM(qp_err)		(((qp_err).data >> 24) & 0x7F)
+#define QP_ERR_IS_REQ(qp_err)		(((qp_err).data >> 31) & 1)
+
+/*
+ * Some registers are specific to each NIC port, and some are shared by the
+ * whole NIC macro (a pair of even and odd ports).
+ * Therefore we need different methods to handle these registers.
+ */
+
+/* read/write port specific registers */
+#define NIC_CFG_BASE(port) \
+			((u64) (NIC_MACRO_CFG_SIZE * (u64) ((port) >> 1) + \
+					NIC_CFG_SIZE * (u64) ((port) & 1)))
+
+#define NIC_RREG32(reg)		RREG32(NIC_CFG_BASE(gaudi_nic->port) + (reg))
+#define NIC_WREG32(reg, val)	WREG32(NIC_CFG_BASE(gaudi_nic->port) + (reg), \
+					(val))
+#define NIC_RMWREG32(reg, val, mask)	\
+		RMWREG32(NIC_CFG_BASE(gaudi_nic->port) + (reg), (val), (mask))
+
+/* read/write shared registers */
+#define NIC_MACRO_CFG_BASE(port) \
+			((u64) (NIC_MACRO_CFG_SIZE * (u64) ((port) >> 1)))
+
+#define NIC_MACRO_RREG32_PORT(reg, port) \
+			RREG32(NIC_MACRO_CFG_BASE(port) + (reg))
+#define NIC_MACRO_WREG32_PORT(reg, val, port) \
+			WREG32(NIC_MACRO_CFG_BASE(port) + (reg), (val))
+
+#define NIC_MACRO_RREG32(reg) NIC_MACRO_RREG32_PORT(reg, gaudi_nic->port)
+#define NIC_MACRO_WREG32(reg, val) \
+				NIC_MACRO_WREG32_PORT(reg, val, gaudi_nic->port)
+
+extern const struct ethtool_ops gaudi_nic_ethtool_ops;
+extern const struct dcbnl_rtnl_ops gaudi_nic_dcbnl_ops;
+
+void gaudi_nic_set_pfc(struct gaudi_nic_device *gaudi_nic);
+u32 gaudi_nic_mac_read(struct gaudi_nic_device *gaudi_nic, int mac,
+			char *cfg_type, u32 addr);
+int gaudi_nic_port_reset(struct gaudi_nic_device *gaudi_nic);
+bool disabled_or_in_reset(struct gaudi_nic_device *gaudi_nic);
+u64 gaudi_nic_read_mac_stat_counter(struct hl_device *hdev, u32 port, int idx,
+					bool is_rx);
+
+#endif /* GAUDI_NIC_DRV_H_ */
diff --git a/drivers/misc/habanalabs/goya/goya.c b/drivers/misc/habanalabs/goya/goya.c
index 5cddd46a8fb8..f82212310114 100644
--- a/drivers/misc/habanalabs/goya/goya.c
+++ b/drivers/misc/habanalabs/goya/goya.c
@@ -5274,6 +5274,11 @@ static int goya_ctx_init(struct hl_ctx *ctx)
 	return 0;
 }
 
+static void goya_ctx_fini(struct hl_ctx *ctx)
+{
+
+}
+
 u32 goya_get_queue_id_for_cq(struct hl_device *hdev, u32 cq_idx)
 {
 	return cq_idx;
@@ -5387,6 +5392,7 @@ static const struct hl_asic_funcs goya_funcs = {
 	.wreg = hl_wreg,
 	.halt_coresight = goya_halt_coresight,
 	.ctx_init = goya_ctx_init,
+	.ctx_fini = goya_ctx_fini,
 	.get_clk_rate = goya_get_clk_rate,
 	.get_queue_id_for_cq = goya_get_queue_id_for_cq,
 	.read_device_fw_version = goya_read_device_fw_version,
diff --git a/include/uapi/misc/habanalabs.h b/include/uapi/misc/habanalabs.h
index 9705b8adb60c..cd9d05e03464 100644
--- a/include/uapi/misc/habanalabs.h
+++ b/include/uapi/misc/habanalabs.h
@@ -831,6 +831,9 @@ struct hl_debug_args {
 	__u32 ctx_id;
 };
 
+#define HL_NIC_MIN_CONN_ID	1
+#define HL_NIC_MAX_CONN_ID	1023
+
 /*
  * Various information operations such as:
  * - H/W IP information
-- 
2.17.1


* [PATCH v3 06/14] habanalabs/gaudi: add NIC PHY code
  2020-09-15 17:10 [PATCH v3 00/14] Adding GAUDI NIC code to habanalabs driver Oded Gabbay
                   ` (3 preceding siblings ...)
  2020-09-15 17:10 ` [PATCH v3 05/14] habanalabs/gaudi: add NIC Ethernet support Oded Gabbay
@ 2020-09-15 17:10 ` Oded Gabbay
  2020-09-15 17:10 ` [PATCH v3 07/14] habanalabs/gaudi: allow user to get MAC addresses in INFO IOCTL Oded Gabbay
                   ` (10 subsequent siblings)
  15 siblings, 0 replies; 83+ messages in thread
From: Oded Gabbay @ 2020-09-15 17:10 UTC (permalink / raw)
  To: linux-kernel, netdev
  Cc: SW_Drivers, gregkh, davem, kuba, andrew, f.fainelli, Omer Shpigelman

From: Omer Shpigelman <oshpigelman@habana.ai>

Configure the NIC PHY (physical layer). The PHY is configured with the
correct polarity and Tx taps depending on the card type.

After the initial configuration, the PHY flow consists of the following steps:
- Auto-negotiation (if enabled)
- PHY F/W tuning
- Physical Coding Sublayer (PCS) link check

After the initial PCS link is acquired, it is checked periodically. Once we
detect that the link is lost, we fall back to PHY F/W tuning, or even to
Auto-negotiation, to re-acquire it.
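
As an illustration of the flow above, the periodic link handling can be
sketched as a small decision function. This is a simplified user-space
sketch, not the driver code: struct link_state, enum link_step,
next_link_step() and next_poll_ms() are hypothetical names, and the polling
intervals mirror the values used by check_link_status() in this patch.

```c
#include <stdbool.h>

/* Hypothetical, simplified mirror of the per-port link flags. */
struct link_state {
	bool auto_neg_enable;
	bool auto_neg_resolved;
	bool phy_fw_tuned;
	bool pcs_link;
};

enum link_step {
	STEP_AUTO_NEG,		/* auto-negotiation and link training */
	STEP_FW_TUNING,		/* PHY F/W tuning */
	STEP_ACQUIRE_PCS,	/* wait for the initial PCS link */
	STEP_CHECK_PCS,		/* periodic check of an established link */
};

/* Pick the step that the periodic worker runs for the current state. */
enum link_step next_link_step(const struct link_state *s)
{
	if (s->phy_fw_tuned)
		return s->pcs_link ? STEP_CHECK_PCS : STEP_ACQUIRE_PCS;

	if (s->auto_neg_enable && !s->auto_neg_resolved)
		return STEP_AUTO_NEG;

	return STEP_FW_TUNING;
}

/* Delay in milliseconds before the worker runs again. */
unsigned int next_poll_ms(const struct link_state *s)
{
	if (s->pcs_link)
		return 1000;	/* link is up: check once a second */
	if (s->phy_fw_tuned)
		return 500;	/* tuned, waiting for the PCS link */
	return 1;		/* still tuning: poll the F/W quickly */
}
```

A port that loses its link clears pcs_link, and a PHY reconfiguration clears
the tuning flags, so the same function naturally falls back to the earlier
steps.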

Currently we use Auto-negotiation only because it is a prerequisite for
link training (physical layer quality improvement), not for setting the
transmission parameters. As a result, Auto-negotiation is currently
supported only between Gaudi cards.
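
With Auto-negotiation enabled, a PCS link failure leads to a PHY
reconfiguration only once the number of failures reaches a threshold within
a given time frame. A minimal sketch of that sliding-window check follows,
assuming a fixed-size ring buffer instead of the kfifo used by the driver.
FAIL_THRESHOLD, struct fail_window and record_failure() are hypothetical
names, and clearing the window after a hit is an assumption that stands in
for the port state reset.

```c
#include <stdbool.h>
#include <stdint.h>

#define FAIL_THRESHOLD	4	/* assumed value, for illustration only */

/* Ring buffer holding the timestamps of the most recent failures. */
struct fail_window {
	uint64_t ts_ms[FAIL_THRESHOLD];
	unsigned int head;	/* index of the oldest entry */
	unsigned int count;	/* number of stored entries */
};

/*
 * Record a failure at now_ms. Return true when FAIL_THRESHOLD failures fall
 * within window_ms, which is the point where a reconfiguration is due.
 */
bool record_failure(struct fail_window *w, uint64_t now_ms, uint64_t window_ms)
{
	unsigned int tail = (w->head + w->count) % FAIL_THRESHOLD;

	w->ts_ms[tail] = now_ms;
	w->count++;

	if (w->count < FAIL_THRESHOLD)
		return false;

	/* If the oldest entry is inside the window, all later ones are too. */
	if (now_ms - w->ts_ms[w->head] <= window_ms) {
		/* Assumption: the caller reconfigures, so start over. */
		w->head = 0;
		w->count = 0;
		return true;
	}

	/* The oldest entry is stale: drop it and keep counting. */
	w->head = (w->head + 1) % FAIL_THRESHOLD;
	w->count--;
	return false;
}
```

Checking only the oldest timestamp is enough because entries are stored in
arrival order, which is the same argument the patch makes for peeking at the
head of the fifo.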

Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
---
 drivers/misc/habanalabs/gaudi/Makefile    |    2 +-
 drivers/misc/habanalabs/gaudi/gaudi_nic.c |  456 +++++++-
 drivers/misc/habanalabs/gaudi/gaudi_nic.h |   17 +
 drivers/misc/habanalabs/gaudi/gaudi_phy.c | 1276 +++++++++++++++++++++
 4 files changed, 1748 insertions(+), 3 deletions(-)
 create mode 100644 drivers/misc/habanalabs/gaudi/gaudi_phy.c

diff --git a/drivers/misc/habanalabs/gaudi/Makefile b/drivers/misc/habanalabs/gaudi/Makefile
index 24e14cff563d..c5143cf6f025 100644
--- a/drivers/misc/habanalabs/gaudi/Makefile
+++ b/drivers/misc/habanalabs/gaudi/Makefile
@@ -2,4 +2,4 @@
 HL_GAUDI_FILES := gaudi/gaudi.o gaudi/gaudi_hwmgr.o gaudi/gaudi_security.o \
 	gaudi/gaudi_coresight.o
 
-HL_GAUDI_FILES += gaudi/gaudi_nic.o
+HL_GAUDI_FILES += gaudi/gaudi_nic.o gaudi/gaudi_phy.o
diff --git a/drivers/misc/habanalabs/gaudi/gaudi_nic.c b/drivers/misc/habanalabs/gaudi/gaudi_nic.c
index 9fc6e9fe7ac4..1e3f58297e5e 100644
--- a/drivers/misc/habanalabs/gaudi/gaudi_nic.c
+++ b/drivers/misc/habanalabs/gaudi/gaudi_nic.c
@@ -704,13 +704,26 @@ static void config_port_mac(struct gaudi_nic_device *gaudi_nic)
 	}
 }
 
+static void phy_start_stop(struct gaudi_nic_device *gaudi_nic, bool is_start)
+{
+	int i;
+
+	for (i = NIC_MAC_LANES_START ; i < NIC_MAC_NUM_OF_LANES ; i++) {
+		if (!(gaudi_nic->power_up_mask & BIT(i)))
+			continue;
+
+		gaudi_nic_phy_start_stop(gaudi_nic, i, is_start);
+	}
+}
+
 static int hw_config(struct gaudi_nic_device *gaudi_nic)
 {
 	u32 port = gaudi_nic->port, data_rate, speed = gaudi_nic->speed;
 	struct gaudi_device *gaudi = gaudi_nic->hdev->asic_specific;
 	struct hl_device *hdev = gaudi_nic->hdev;
 	u64 mac_addr = 0, tmr_addr;
-	int i;
+	bool do_auto_neg;
+	int i, rc;
 
 	for (i = 0 ; i < ETH_ALEN ; i++) {
 		mac_addr <<= 8;
@@ -746,6 +759,26 @@ static int hw_config(struct gaudi_nic_device *gaudi_nic)
 
 	gaudi_nic->data_rate = data_rate;
 
+	if (gaudi->nic_phy_config_fw && !gaudi_nic->mac_loopback) {
+		for (i = NIC_MAC_LANES_START ; i < NIC_MAC_NUM_OF_LANES ; i++) {
+			if (!(gaudi_nic->power_up_mask & BIT(i)))
+				continue;
+
+			do_auto_neg = gaudi_nic->auto_neg_enable &&
+					(gaudi_nic->auto_neg_mask & BIT(i));
+
+			rc = gaudi_nic_phy_power_up(gaudi_nic, i, do_auto_neg);
+			if (rc) {
+				dev_err(hdev->dev,
+					"PHY power up failed for port %d\n",
+					port);
+				return rc;
+			}
+		}
+
+		phy_start_stop(gaudi_nic, true);
+	}
+
 	/* if no need in macro configuration, do only port configuration */
 	if (gaudi_nic->do_macro_cfg) {
 		config_port_mac(gaudi_nic);
@@ -1216,6 +1249,366 @@ static void port_reset_state(struct gaudi_nic_device *gaudi_nic)
 	gaudi_nic->uncorrectable_errors_cnt = 0;
 }
 
+static void phy_reconfig(struct gaudi_nic_device *gaudi_nic)
+{
+	struct gaudi_device *gaudi = gaudi_nic->hdev->asic_specific;
+	struct hl_device *hdev = gaudi_nic->hdev;
+	u32 port = gaudi_nic->port;
+	int i, rc;
+
+	if (!gaudi->nic_phy_config_fw)
+		return;
+
+	dev_dbg(hdev->dev, "reconfiguring PHY, port %d\n", port);
+
+	if (gaudi_nic->auto_neg_enable) {
+		for (i = NIC_MAC_LANES_START ; i < NIC_MAC_NUM_OF_LANES ; i++) {
+			if (!(gaudi_nic->auto_neg_mask & BIT(i)))
+				continue;
+
+			rc = gaudi_nic_phy_fw_config_auto_neg(gaudi_nic, i);
+			if (rc)
+				dev_dbg(hdev->dev,
+					"F/W reconfig autoneg failed, port: %d, lane: %d\n",
+					port, i);
+		}
+	} else {
+		for (i = NIC_MAC_LANES_START ; i < NIC_MAC_NUM_OF_LANES ; i++) {
+			if (!(gaudi_nic->power_up_mask & BIT(i)))
+				continue;
+
+			rc = gaudi_nic_phy_power_up(gaudi_nic, i, false);
+			if (rc) {
+				dev_err(hdev->dev,
+					"PHY reconfig power up failed for port %d\n",
+					port);
+				break;
+			}
+		}
+	}
+
+	port_reset_state(gaudi_nic);
+}
+
+static enum link_status update_pcs_link_failure(
+					struct gaudi_nic_device *gaudi_nic)
+{
+	struct gaudi_device *gaudi = gaudi_nic->hdev->asic_specific;
+	struct kfifo *pcs_fifo = &gaudi_nic->pcs_fail_fifo;
+	struct hl_device *hdev = gaudi_nic->hdev;
+	u32 port = gaudi_nic->port;
+	ktime_t now, before;
+	int count;
+
+	if (!gaudi_nic->auto_neg_enable)
+		return PCS_DOWN;
+
+	now = ktime_get();
+
+	count = kfifo_in(pcs_fifo, &now, sizeof(now));
+	if (count != sizeof(now)) {
+		dev_err(hdev->dev,
+			"Failed to push to PCS fifo, size: %d, count: %d, port: %d\n",
+			gaudi_nic->pcs_fail_cnt, count, port);
+		return PCS_DOWN;
+	}
+
+	gaudi_nic->pcs_fail_cnt++;
+
+	if (gaudi_nic->pcs_fail_cnt < gaudi->nic_pcs_fail_threshold)
+		return PCS_DOWN;
+
+	/*
+	 * Here we reached the threshold count of failures to reconfigure the
+	 * link. Now we need to check if all of the failures are in the needed
+	 * time frame. It is sufficient to check the first item in the queue as
+	 * it is the earliest failure and if it is in the needed time frame, all
+	 * the rest of the failures are in it too.
+	 */
+	count = kfifo_out_peek(pcs_fifo, &before, sizeof(before));
+	if (count != sizeof(before))
+		dev_err(hdev->dev,
+			"Failed to peek in PCS fifo, size: %d, count: %d, port: %d\n",
+			gaudi_nic->pcs_fail_cnt, count, port);
+
+	if (ktime_ms_delta(now, before) <=
+			(gaudi->nic_pcs_fail_time_frame * MSEC_PER_SEC)) {
+		dev_dbg(hdev->dev,
+			"PHY reconfig due to PCS link failure cnt, port: %d\n",
+			port);
+		return FAIL_RECONFIG;
+	}
+
+	/*
+	 * The earliest failure is not in the needed time frame, hence
+	 * we can remove it.
+	 */
+	count = kfifo_out(pcs_fifo, &before, sizeof(before));
+	if (count != sizeof(before))
+		dev_err(hdev->dev,
+			"Failed to pop from PCS fifo, size: %d, count: %d, port: %d\n",
+			gaudi_nic->pcs_fail_cnt, count, port);
+
+	gaudi_nic->pcs_fail_cnt--;
+
+	return PCS_DOWN;
+}
+
+static void reset_tx(struct gaudi_nic_device *gaudi_nic)
+{
+	struct hl_device *hdev = gaudi_nic->hdev;
+	int i;
+
+	/* This temporary WA is only for HLS external ports */
+	if ((hdev->card_type != cpucp_card_type_pmc) ||
+			(BIT(gaudi_nic->port) & ~hdev->nic_ports_ext_mask))
+		return;
+
+	for (i = NIC_MAC_LANES_START ; i < NIC_MAC_NUM_OF_LANES ; i++)
+		if (gaudi_nic->fw_tuning_mask & BIT(i))
+			gaudi_nic_phy_reset_tx(gaudi_nic, i);
+}
+
+static enum link_status _check_pcs_link(struct gaudi_nic_device *gaudi_nic)
+{
+	u32 port = gaudi_nic->port, pcs_val, mac_val, start_lane;
+	struct hl_device *hdev = gaudi_nic->hdev;
+	int i, rc;
+
+	start_lane = __ffs(gaudi_nic->fw_tuning_mask);
+
+	for (i = NIC_MAC_LANES_START ; i < NIC_MAC_NUM_OF_LANES ; i++) {
+		if (!(gaudi_nic->fw_tuning_mask & BIT(i)))
+			continue;
+
+		rc = gaudi_nic_phy_check_link_status(gaudi_nic, i);
+		if (rc)
+			return PHY_DOWN;
+	}
+
+	/* need to check the first lane only */
+	mac_val = gaudi_nic_mac_read(gaudi_nic, start_lane, "mac", 0x40);
+
+	if (mac_val & 1)
+		gaudi_nic->pcs_local_fault_cnt++;
+	else if (gaudi_nic->pcs_local_fault_cnt)
+		gaudi_nic->pcs_local_fault_cnt--;
+
+	if (mac_val & 2)
+		gaudi_nic->pcs_remote_fault_cnt++;
+	else if (gaudi_nic->pcs_remote_fault_cnt)
+		gaudi_nic->pcs_remote_fault_cnt--;
+
+	if (gaudi_nic->pcs_remote_fault_cnt == PCS_FAULT_THRESHOLD) {
+		dev_dbg(hdev->dev,
+			"PHY reconfig due to PCS remote fault cnt, port: %d\n",
+			port);
+		return FAULT_RECONFIG;
+	}
+
+	/* need to check the first lane only */
+	pcs_val = gaudi_nic_mac_read(gaudi_nic, start_lane, "xpcs", 0x20);
+
+	if ((pcs_val >> 12) & 1)
+		return LINK_UP;
+
+	return PCS_DOWN;
+}
+
+static void check_pcs_link(struct gaudi_nic_device *gaudi_nic)
+{
+	struct gaudi_device *gaudi = gaudi_nic->hdev->asic_specific;
+	struct hl_device *hdev = gaudi_nic->hdev;
+	enum link_status link_status;
+	u32 port = gaudi_nic->port;
+
+	if (!gaudi->nic_check_link)
+		return;
+
+	link_status = _check_pcs_link(gaudi_nic);
+	if ((link_status == PCS_DOWN) || (link_status == PHY_DOWN)) {
+		/* Try again to overcome a momentary glitch */
+		msleep(PCS_LINK_RETRY_MSEC);
+
+		link_status = _check_pcs_link(gaudi_nic);
+
+		if (link_status == LINK_UP)
+			dev_info(hdev->dev, "PCS link restored, port %d\n",
+					port);
+	}
+
+	if (link_status == LINK_UP)
+		return;
+
+	set_port_status(gaudi_nic, false);
+	gaudi_nic->pcs_link = false;
+	gaudi_nic->last_pcs_link_drop_ts = ktime_get();
+
+	dev_info(hdev->dev, "%s lost signal, port %d\n",
+			(link_status == PHY_DOWN) ? "PHY" : "PCS", port);
+
+	/* TODO: fix the bug in the retimer to remove this Tx reset WA */
+	/*
+	 * No need to update about the PCS failure if we already need to
+	 * reconfigure the PHY.
+	 */
+	if (link_status == FAULT_RECONFIG)
+		reset_tx(gaudi_nic);
+	else
+		link_status = update_pcs_link_failure(gaudi_nic);
+
+	if ((link_status == FAULT_RECONFIG) ||
+			(link_status == FAIL_RECONFIG))
+		phy_reconfig(gaudi_nic);
+}
+
+static void acquire_pcs_link(struct gaudi_nic_device *gaudi_nic)
+{
+	u32 port = gaudi_nic->port, pcs_val, start_lane;
+	struct hl_device *hdev = gaudi_nic->hdev;
+
+	start_lane = __ffs(gaudi_nic->fw_tuning_mask);
+
+	/* need to check the first lane only */
+	pcs_val = gaudi_nic_mac_read(gaudi_nic, start_lane, "xpcs", 0x20);
+	gaudi_nic->pcs_link = (pcs_val >> 12) & 1;
+	gaudi_nic->retry_cnt++;
+
+	if (gaudi_nic->pcs_link) {
+		dev_info(hdev->dev, "PCS link up, port %d\n", port);
+		set_port_status(gaudi_nic, true);
+		gaudi_nic->retry_cnt = 0;
+	} else if (gaudi_nic->retry_cnt == PCS_LINK_CNT) {
+		if (ktime_after(gaudi_nic->last_fw_tuning_ts,
+				gaudi_nic->last_pcs_link_drop_ts))
+			dev_dbg(hdev->dev,
+				"PHY reconfig due to PCS link down after F/W tuning, port %d\n",
+				port);
+		else
+			dev_dbg(hdev->dev,
+				"PHY reconfig due to PCS link cnt, port %d\n",
+				port);
+		phy_reconfig(gaudi_nic);
+	}
+}
+
+static void do_fw_tuning(struct gaudi_nic_device *gaudi_nic)
+{
+	struct hl_device *hdev = gaudi_nic->hdev;
+	u32 port = gaudi_nic->port;
+	int i, rc = 0;
+
+	for (i = NIC_MAC_LANES_START ; i < NIC_MAC_NUM_OF_LANES ; i++) {
+		if (!(gaudi_nic->fw_tuning_mask & BIT(i)))
+			continue;
+
+		rc = gaudi_nic_phy_fw_tuning(gaudi_nic, i, true);
+		if (rc) {
+			if (rc == -EAGAIN) {
+				if (gaudi_nic->retry_cnt++ == FW_TUNING_CNT) {
+					dev_dbg(hdev->dev,
+						"PHY reconfig due to F/W tuning cnt, port %d, lane %d\n",
+						port, i);
+					phy_reconfig(gaudi_nic);
+				}
+			} else {
+				dev_dbg(hdev->dev,
+					"PHY F/W tuning failed for port %d, lane %d, rc %d\n",
+					port, i, rc);
+				phy_reconfig(gaudi_nic);
+			}
+			break;
+		}
+	}
+
+	if (!rc) {
+		gaudi_nic->phy_fw_tuned = true;
+		gaudi_nic->retry_cnt = 0;
+		gaudi_nic->last_fw_tuning_ts = ktime_get();
+	}
+}
+
+static void do_fw_tuning_auto_neg(struct gaudi_nic_device *gaudi_nic)
+{
+	struct hl_device *hdev = gaudi_nic->hdev;
+	u32 port = gaudi_nic->port;
+	int i, rc;
+
+	for (i = NIC_MAC_LANES_START ; i < NIC_MAC_NUM_OF_LANES ; i++) {
+		if (!(gaudi_nic->auto_neg_mask & BIT(i)))
+			continue;
+
+		rc = gaudi_nic_phy_fw_tuning(gaudi_nic, i, false);
+		if (rc) {
+			if (rc != -EAGAIN)
+				dev_dbg(hdev->dev,
+					"PHY auto neg F/W tuning failed, port %d, lane %d, rc %d\n",
+					port, i, rc);
+			return;
+		}
+	}
+
+	for (i = NIC_MAC_LANES_START ; i < NIC_MAC_NUM_OF_LANES ; i++) {
+		if (!(gaudi_nic->fw_tuning_mask & BIT(i)))
+			continue;
+
+		rc = gaudi_nic_phy_config_pam4_link_training(gaudi_nic, i);
+		if (rc) {
+			if (rc == -EAGAIN) {
+				if (gaudi_nic->retry_cnt++ ==
+						FW_LINK_TRAINING_CNT) {
+					dev_dbg(hdev->dev,
+						"PHY reconfig due to PAM4 cnt, port: %d, lane: %d\n",
+						port, i);
+					phy_reconfig(gaudi_nic);
+				}
+			} else {
+				dev_dbg(hdev->dev,
+					"PHY auto neg F/W speed config failed, port %d, lane %d, rc %d\n",
+					port, i, rc);
+				phy_reconfig(gaudi_nic);
+			}
+
+			return;
+		}
+	}
+
+	dev_dbg(hdev->dev, "auto neg done, port: %d\n", port);
+	gaudi_nic->auto_neg_resolved = true;
+	gaudi_nic->retry_cnt = 0;
+	do_fw_tuning(gaudi_nic);
+}
+
+static void check_link_status(struct work_struct *work)
+{
+	struct gaudi_nic_device *gaudi_nic = container_of(work,
+							struct gaudi_nic_device,
+							link_status_work.work);
+	u32 timeout_ms;
+
+	if (gaudi_nic->phy_fw_tuned) {
+		if (gaudi_nic->pcs_link)
+			check_pcs_link(gaudi_nic);
+		else
+			acquire_pcs_link(gaudi_nic);
+	} else {
+		if (gaudi_nic->auto_neg_enable && !gaudi_nic->auto_neg_resolved)
+			do_fw_tuning_auto_neg(gaudi_nic);
+		else
+			do_fw_tuning(gaudi_nic);
+	}
+
+	if (gaudi_nic->pcs_link)
+		timeout_ms = 1000;
+	else if (gaudi_nic->phy_fw_tuned)
+		timeout_ms = 500;
+	else
+		timeout_ms = 1;
+
+	schedule_delayed_work(&gaudi_nic->link_status_work,
+				msecs_to_jiffies(timeout_ms));
+}
+
 static int _gaudi_nic_sw_init(struct gaudi_nic_device *gaudi_nic)
 {
 	struct hl_device *hdev = gaudi_nic->hdev;
@@ -1601,7 +1994,13 @@ static int port_open(struct gaudi_nic_device *gaudi_nic)
 		napi_enable(&gaudi_nic->napi);
 	}
 
-	set_port_status(gaudi_nic, true);
+	if (gaudi->nic_phy_config_fw && !gaudi_nic->mac_loopback) {
+		INIT_DELAYED_WORK(&gaudi_nic->link_status_work,
+					check_link_status);
+		schedule_delayed_work(&gaudi_nic->link_status_work, 0);
+	} else {
+		set_port_status(gaudi_nic, true);
+	}
 
 	gaudi_nic->port_open = true;
 
@@ -1653,10 +2052,17 @@ static void port_close(struct gaudi_nic_device *gaudi_nic)
 	gaudi_nic->port_open = false;
 	gaudi_nic->active = false;
 
+	if (gaudi->nic_phy_config_fw && !gaudi_nic->mac_loopback)
+		cancel_delayed_work_sync(&gaudi_nic->link_status_work);
+
 	/* Print if not in hard reset flow e.g. from ifconfig */
 	if (gaudi_nic->pcs_link && !hdev->hard_reset_pending)
 		dev_info(hdev->dev, "port %d was closed\n", port);
 
+	/* stop F/W so the peer port will also lose link */
+	if (gaudi->nic_phy_config_fw && !gaudi_nic->mac_loopback)
+		phy_start_stop(gaudi_nic, false);
+
 	port_reset_state(gaudi_nic);
 
 	kfifo_free(&gaudi_nic->pcs_fail_fifo);
@@ -1911,6 +2317,19 @@ static int port_register(struct hl_device *hdev, int port)
 	ether_addr_copy(ndev->dev_addr,
 		hdev->asic_prop.cpucp_nic_info.mac_addrs[port].mac_addr);
 
+	/*
+	 * Reset the NIC macro PHY before the PHY configuration by each port.
+	 * This function resets all the 4 lanes in the PHY macro, therefore only
+	 * one of the two ports should call it.
+	 */
+	if (gaudi->nic_phy_config_fw && gaudi_nic->do_macro_cfg) {
+		rc = gaudi_nic_phy_reset_macro(gaudi_nic);
+		if (rc)
+			dev_err(hdev->dev,
+				"PHY macro reset failed for port %d\n",
+				port);
+	}
+
 	if (register_netdev(ndev)) {
 		dev_err(hdev->dev,
 			"Could not register netdevice, port: %d\n", port);
@@ -2080,6 +2499,24 @@ int gaudi_nic_ports_init(struct hl_device *hdev)
 				cpu_to_le32((card_location >> 22) & 0x7);
 	}
 
+	if (gaudi->nic_phy_load_fw) {
+		rc = gaudi_nic_phy_has_fw(hdev);
+		if (rc) {
+			dev_err(hdev->dev, "NIC F/W file was not found\n");
+			return rc;
+		}
+
+		rc = gaudi_nic_phy_fw_load_all(hdev);
+		if (rc) {
+			dev_err(hdev->dev, "NIC F/W load for all failed\n");
+			return rc;
+		}
+	}
+
+	if (gaudi->nic_phy_config_fw)
+		dev_dbg(hdev->dev, "NIC F/W CRC: 0x%x\n",
+				gaudi_nic_phy_get_crc(hdev));
+
 	for (i = 0 ; i < NIC_NUMBER_OF_MACROS ; i++) {
 		gaudi->nic_macros[i].idx = i;
 		gaudi->nic_macros[i].num_of_lanes = NIC_LANES_2;
@@ -2301,6 +2738,21 @@ void gaudi_nic_ports_reopen(struct hl_device *hdev)
 		gaudi_nic = &gaudi->nic_devices[i];
 		port = gaudi_nic->port;
 
+		/*
+		 * Reset the NIC macro PHY before the PHY configuration by each
+		 * port. This function resets all the 4 lanes in the PHY macro,
+		 * therefore only one of the two ports should call it.
+		 * This must be called before we check if the port is enabled,
+		 * as the PHY reset should be called anyway.
+		 */
+		if (gaudi->nic_phy_config_fw && gaudi_nic->do_macro_cfg) {
+			rc = gaudi_nic_phy_reset_macro(gaudi_nic);
+			if (rc)
+				dev_err(hdev->dev,
+					"PHY macro reset failed for port %d\n",
+					port);
+		}
+
 		/*
 		 * It could be that the port was shutdown by 'ifconfig down',
 		 * and there is no need in reopening it.
diff --git a/drivers/misc/habanalabs/gaudi/gaudi_nic.h b/drivers/misc/habanalabs/gaudi/gaudi_nic.h
index 7259b01b78fb..2aa6ef712073 100644
--- a/drivers/misc/habanalabs/gaudi/gaudi_nic.h
+++ b/drivers/misc/habanalabs/gaudi/gaudi_nic.h
@@ -332,5 +332,22 @@ int gaudi_nic_port_reset(struct gaudi_nic_device *gaudi_nic);
 bool disabled_or_in_reset(struct gaudi_nic_device *gaudi_nic);
 u64 gaudi_nic_read_mac_stat_counter(struct hl_device *hdev, u32 port, int idx,
 					bool is_rx);
+int gaudi_nic_phy_reset_macro(struct gaudi_nic_device *gaudi_nic);
+int gaudi_nic_phy_power_up(struct gaudi_nic_device *gaudi_nic, int lane,
+				bool do_auto_neg);
+int gaudi_nic_phy_has_fw(struct hl_device *hdev);
+int gaudi_nic_phy_fw_tuning(struct gaudi_nic_device *gaudi_nic, int lane,
+				bool check_status);
+int gaudi_nic_phy_fw_load_all(struct hl_device *hdev);
+int gaudi_nic_phy_check_link_status(struct gaudi_nic_device *gaudi_nic,
+					int lane);
+int gaudi_nic_phy_config_pam4_link_training(struct gaudi_nic_device *gaudi_nic,
+						int lane);
+int gaudi_nic_phy_fw_config_auto_neg(struct gaudi_nic_device *gaudi_nic,
+					int lane);
+u16 gaudi_nic_phy_get_crc(struct hl_device *hdev);
+void gaudi_nic_phy_reset_tx(struct gaudi_nic_device *gaudi_nic, int lane);
+void gaudi_nic_phy_start_stop(struct gaudi_nic_device *gaudi_nic, int lane,
+				bool is_start);
 
 #endif /* GAUDI_NIC_DRV_H_ */
diff --git a/drivers/misc/habanalabs/gaudi/gaudi_phy.c b/drivers/misc/habanalabs/gaudi/gaudi_phy.c
new file mode 100644
index 000000000000..5ab8619502fd
--- /dev/null
+++ b/drivers/misc/habanalabs/gaudi/gaudi_phy.c
@@ -0,0 +1,1276 @@
+// SPDX-License-Identifier: GPL-2.0
+
+/*
+ * Copyright 2019 HabanaLabs, Ltd.
+ * All Rights Reserved.
+ */
+
+#include "gaudi_nic.h"
+#include "../include/gaudi/asic_reg/gaudi_regs.h"
+
+#include <linux/module.h>
+#include <linux/firmware.h>
+#include <asm/unaligned.h>
+
+#define HL_PHY_DEBUG 0
+
+#define GAUDI_PHY_FW_FILE	"habanalabs/gaudi/gaudi_nic_fw.bin"
+
+#define PHY_READ_COUNTS_PER_MS	1000
+#define PHY_FW_SIZE		0x1020
+#define PHY_FW_FINISHED		(1 << 2)
+#define PHY_FW_ERROR		(1 << 3)
+
+#define NIC0_PHY_BASE		(mmNIC0_PHY_BASE - CFG_BASE)
+
+static void phy_write_all(struct hl_device *hdev, u32 addr, u32 data)
+{
+	int lane, port;
+
+	for (port = 0 ; port < 10 ; port += 2)
+		for (lane = 0 ; lane < 4 ; lane++) {
+			NIC_MACRO_WREG32_PORT(NIC0_PHY_BASE + 0xF60 + lane * 4,
+						addr, port);
+
+			/* only the lower 16 bits are in use */
+			NIC_MACRO_WREG32_PORT(NIC0_PHY_BASE - 0x8000 + 0x2000 *
+						lane, data & 0xFFFF, port);
+		}
+}
+
+static void phy_write_port(struct hl_device *hdev, int port, int lane, u32 addr,
+				u32 data)
+{
+	NIC_MACRO_WREG32_PORT(NIC0_PHY_BASE + 0xF60 + lane * 4, addr, port);
+
+	/* only the lower 16 bits are in use */
+	NIC_MACRO_WREG32_PORT(NIC0_PHY_BASE - 0x8000 + 0x2000 * lane,
+				data & 0xFFFF, port);
+}
+
+static void phy_write(struct gaudi_nic_device *gaudi_nic, int lane, u32 addr,
+			u32 data)
+{
+	struct hl_device *hdev = gaudi_nic->hdev;
+
+	NIC_MACRO_WREG32(NIC0_PHY_BASE + 0xF60 + lane * 4, addr);
+
+	/* only the lower 16 bits are in use */
+	NIC_MACRO_WREG32(NIC0_PHY_BASE - 0x8000 + 0x2000 * lane, data & 0xFFFF);
+}
+
+static u32 phy_read_port(struct hl_device *hdev, int port, int lane, u32 addr)
+{
+	NIC_MACRO_WREG32_PORT(NIC0_PHY_BASE + 0xF60 + lane * 4, addr, port);
+
+	/* only the lower 16 bits are in use */
+	return NIC_MACRO_RREG32_PORT(NIC0_PHY_BASE - 0x8000 + 0x2000 * lane,
+					port) & 0xFFFF;
+}
+
+static u32 phy_read(struct gaudi_nic_device *gaudi_nic, int lane, u32 addr)
+{
+	struct hl_device *hdev = gaudi_nic->hdev;
+
+	NIC_MACRO_WREG32(NIC0_PHY_BASE + 0xF60 + lane * 4, addr);
+
+	/* only the lower 16 bits are in use */
+	return NIC_MACRO_RREG32(NIC0_PHY_BASE - 0x8000 + 0x2000 * lane) &
+									0xFFFF;
+}
+
+static void phy_write_mask(struct gaudi_nic_device *gaudi_nic, int lane,
+				u32 addr, u32 raw_data, u32 mask)
+{
+	struct hl_device *hdev = gaudi_nic->hdev;
+	u32 data;
+
+	NIC_MACRO_WREG32(NIC0_PHY_BASE + 0xF60 + lane * 4, addr);
+
+	data = (NIC_MACRO_RREG32(NIC0_PHY_BASE - 0x8000 + 0x2000 * lane)) &
+									0xFFFF;
+	data = (data & ~mask) | (((raw_data << (__ffs(mask) % 32))) & 0xFFFF);
+
+	NIC_MACRO_WREG32(NIC0_PHY_BASE - 0x8000 + 0x2000 * lane, data);
+}
+
+static u32 twos_to_int(s32 twos_val, u32 bit_width)
+{
+	return (u32) ((s32) (twos_val) -
+				((s32) ((twos_val << 1) & (1 << bit_width))));
+}
+
+static int fw_cmd_port(struct hl_device *hdev, int port, int lane, u32 cmd,
+			u32 detail, u32 expected_res, u32 *res_ptr)
+{
+	u32 res, val;
+	int checks;
+
+	if (detail)
+		phy_write_port(hdev, port, lane, 0x9816, detail);
+
+	phy_write_port(hdev, port, lane, 0x9815, cmd);
+
+	checks = 0;
+	do {
+		usleep_range(1000, 2000);
+		res = phy_read_port(hdev, port, lane, 0x9815);
+		if (checks++ > PHY_READ_COUNTS_PER_MS) {
+			dev_err(hdev->dev, "timeout for PHY cmd 0x%x\n", cmd);
+			return -ETIMEDOUT;
+		}
+	} while (res == cmd);
+
+	val = (res >> 8) & 0xF;
+	if (val != expected_res) {
+		dev_err(hdev->dev, "cmd 0x%x returned error 0x%x\n", cmd, val);
+		return -EFAULT;
+	}
+
+	*res_ptr = res;
+
+	return 0;
+}
+
+static int fw_cmd(struct gaudi_nic_device *gaudi_nic, int lane, u32 cmd,
+			u32 detail, u32 expected_res, u32 *res_ptr)
+{
+	struct hl_device *hdev = gaudi_nic->hdev;
+	u32 port = gaudi_nic->port;
+	u32 res, val;
+	int checks;
+
+	if (detail)
+		phy_write(gaudi_nic, lane, 0x9816, detail);
+
+	phy_write(gaudi_nic, lane, 0x9815, cmd);
+
+	checks = 0;
+	do {
+		usleep_range(1000, 2000);
+		res = phy_read(gaudi_nic, lane, 0x9815);
+		if (checks++ > PHY_READ_COUNTS_PER_MS) {
+			dev_dbg(hdev->dev,
+				"timeout for PHY cmd 0x%x port %d lane %d\n",
+				cmd, port, lane);
+			return -ETIMEDOUT;
+		}
+	} while (res == cmd);
+
+	val = (res >> 8) & 0xF;
+	if (val != expected_res) {
+		dev_dbg(hdev->dev,
+			"cmd 0x%x returned error 0x%x port %d lane %d\n", cmd,
+			val, port, lane);
+		return -EFAULT;
+	}
+
+	*res_ptr = res;
+
+	return 0;
+}
+
+static int fw_hash_port(struct hl_device *hdev, int port, int lane, u32 *hash)
+{
+	u32 res, low_word;
+	int rc;
+
+	rc = fw_cmd_port(hdev, port, lane, 0xF000, 0, 0xF, &res);
+	if (rc) {
+		dev_err(hdev->dev, "F/W hash failed for port %d lane %d\n",
+			port, lane);
+		return rc;
+	}
+
+	low_word = phy_read_port(hdev, port, lane, 0x9816);
+
+	*hash = ((res & 0xFF) << 16) | low_word;
+
+	return 0;
+}
+
+static void set_pll(struct gaudi_nic_device *gaudi_nic, int lane, u32 data_rate,
+			bool pam4)
+{
+	u32 pll_n_val = 0, pll_cap_val = 0;
+	bool div4 = true; /* for easy debug in the future */
+
+	phy_write_mask(gaudi_nic, lane, 0xFF, 1, 1 << 5);
+
+	if (!pam4)
+		phy_write_mask(gaudi_nic, lane, 0x179, data_rate == NIC_DR_10,
+				1);
+
+	if (data_rate == NIC_DR_50) {
+		if (div4)
+			pll_n_val = 170;
+		else
+			pll_n_val = 42;
+
+		pll_cap_val = 10;
+	} else if (data_rate == NIC_DR_25) {
+		if (div4)
+			pll_n_val = 165;
+		else
+			pll_n_val = 41;
+
+		pll_cap_val = 12;
+	} else if (data_rate == NIC_DR_10) {
+		if (div4)
+			pll_n_val = 132;
+		else
+			pll_n_val = 33;
+
+		pll_cap_val = 34;
+	}
+
+	phy_write_mask(gaudi_nic, lane, 0xFD, pll_n_val, 0xFF80);
+	phy_write_mask(gaudi_nic, lane, 0xFC, pll_cap_val, 0xFC00);
+}
+
+static void set_tx_taps(struct gaudi_nic_device *gaudi_nic, int lane,
+			s32 tx_pre2, s32 tx_pre1, s32 tx_main, s32 tx_post1,
+			s32 tx_post2)
+{
+	phy_write_mask(gaudi_nic, lane, 0xAD, twos_to_int(tx_pre2, 8), 0xFF00);
+	phy_write_mask(gaudi_nic, lane, 0xAB, twos_to_int(tx_pre1, 8), 0xFF00);
+	phy_write_mask(gaudi_nic, lane, 0xA9, twos_to_int(tx_main, 8), 0xFF00);
+	phy_write_mask(gaudi_nic, lane, 0xA7, twos_to_int(tx_post1, 8), 0xFF00);
+	phy_write_mask(gaudi_nic, lane, 0xA5, twos_to_int(tx_post2, 8), 0xFF00);
+}
+
+static void config_nrz_tx(struct gaudi_nic_device *gaudi_nic, int lane,
+				bool half_rate)
+{
+	phy_write(gaudi_nic, lane, 0xAF, 0xF83E);
+	phy_write(gaudi_nic, lane, 0xB0, 0x4802);
+	phy_write_mask(gaudi_nic, lane, 0xB0, half_rate ? 1 : 0, 1);
+	phy_write_mask(gaudi_nic, lane, 0xB0, 0, 0x800);
+	phy_write_mask(gaudi_nic, lane, 0xB0, 1, 0x800);
+	phy_write(gaudi_nic, lane, 0xA0, 0xE300);
+	set_tx_taps(gaudi_nic, lane, 0, -4, 25, 0, 0);
+}
+
+static void config_pam4_tx(struct gaudi_nic_device *gaudi_nic, int lane)
+{
+	u32 lane_idx = (gaudi_nic->port >> 1) * NIC_MAC_NUM_OF_LANES + lane;
+	struct gaudi_device *gaudi = gaudi_nic->hdev->asic_specific;
+	s32 *taps;
+
+	taps = gaudi->nic_pam4_tx_taps[lane_idx].taps;
+
+	phy_write(gaudi_nic, lane, 0xAF, 0xF83E);
+	phy_write(gaudi_nic, lane, 0xB0, 0);
+	phy_write(gaudi_nic, lane, 0xB0, 0x800);
+	phy_write(gaudi_nic, lane, 0xB0, 0);
+	phy_write(gaudi_nic, lane, 0xA0, 0xEF00);
+	set_tx_taps(gaudi_nic, lane, taps[0], taps[1], taps[2], taps[3],
+			taps[4]);
+}
+
+static void pol(struct gaudi_nic_device *gaudi_nic, int lane, bool pam4,
+		u32 tx_pol, u32 rx_pol)
+{
+	phy_write_mask(gaudi_nic, lane, 0xA0, tx_pol, 0x20);
+	phy_write_mask(gaudi_nic, lane, 0x161, rx_pol, 0x4000); /* nrz */
+	phy_write_mask(gaudi_nic, lane, 0x43, rx_pol, 0x80); /* pam4 */
+}
+
+static void msblsb(struct gaudi_nic_device *gaudi_nic, int lane, u32 tx_msblsb,
+			u32 rx_msblsb)
+{
+	phy_write_mask(gaudi_nic, lane, 0xAF, tx_msblsb, 0x400);
+	phy_write_mask(gaudi_nic, lane, 0x43, rx_msblsb, 0x8000);
+}
+
+static void gc(struct gaudi_nic_device *gaudi_nic, int lane, u32 tx_gc,
+		u32 rx_gc)
+{
+	phy_write_mask(gaudi_nic, lane, 0xAF, tx_gc, 0x200);
+	phy_write_mask(gaudi_nic, lane, 0x42, rx_gc, 1);
+}
+
+static void pc(struct gaudi_nic_device *gaudi_nic, int lane, u32 tx_pc,
+		u32 rx_pc)
+{
+	phy_write_mask(gaudi_nic, lane, 0xAF, tx_pc, 0x100);
+	phy_write_mask(gaudi_nic, lane, 0x42, rx_pc, 2);
+}
+
+static void set_prbs_type(struct gaudi_nic_device *gaudi_nic, int lane,
+				bool pam4, char *pat)
+{
+	u32 prbs_mode_sel_addr;
+	u32 prbs_mode_sel_mask;
+	u32 pat_sel = 0;
+
+	if (pam4) {
+		prbs_mode_sel_addr = 0x43;
+		prbs_mode_sel_mask = 0x60;
+	} else {
+		prbs_mode_sel_addr = 0x161;
+		prbs_mode_sel_mask = 0x3000;
+	}
+
+	if (pam4) {
+		if (!strcmp(pat, "PRBS9"))
+			pat_sel = 0;
+		else if (!strcmp(pat, "PRBS13"))
+			pat_sel = 1;
+		else if (!strcmp(pat, "PRBS15"))
+			pat_sel = 2;
+		else if (!strcmp(pat, "PRBS31"))
+			pat_sel = 3;
+	} else {
+		if (!strcmp(pat, "PRBS9"))
+			pat_sel = 0;
+		else if (!strcmp(pat, "PRBS15"))
+			pat_sel = 1;
+		else if (!strcmp(pat, "PRBS23"))
+			pat_sel = 2;
+		else if (!strcmp(pat, "PRBS31"))
+			pat_sel = 3;
+	}
+
+	phy_write_mask(gaudi_nic, lane, 0xA0, pat_sel, 0x300);
+	phy_write_mask(gaudi_nic, lane, prbs_mode_sel_addr, pat_sel,
+			prbs_mode_sel_mask);
+}
+
+static void get_pol_tx_rx(struct gaudi_nic_device *gaudi_nic, u32 lane_idx,
+				u32 *pol_tx, u32 *pol_rx)
+{
+	struct hl_device *hdev = gaudi_nic->hdev;
+	u32 card_location;
+
+	card_location = le32_to_cpu(hdev->asic_prop.cpucp_info.card_location);
+
+	switch (hdev->card_type) {
+	case cpucp_card_type_pci:
+		switch (lane_idx) {
+		case 0 ... 3:
+		case 10 ... 11:
+			*pol_tx = 0;
+			*pol_rx = 0;
+			break;
+		case 5 ... 8:
+		case 12:
+		case 16:
+			*pol_tx = 0;
+			*pol_rx = 1;
+			break;
+		case 15:
+		case 19:
+			*pol_tx = 1;
+			*pol_rx = 0;
+			break;
+		case 4:
+		case 9:
+		case 13 ... 14:
+		case 17 ... 18:
+			*pol_tx = 1;
+			*pol_rx = 1;
+			break;
+		default:
+			dev_err(hdev->dev, "PCI NIC %d wrong lane idx %d\n",
+				gaudi_nic->port, lane_idx);
+			break;
+		}
+		break;
+
+	case cpucp_card_type_pmc:
+		*pol_tx = *pol_rx = 0;
+		switch (card_location) {
+		case 0:
+			switch (lane_idx) {
+			case 0 ... 1:
+			case 3:
+			case 5 ... 6:
+			case 8 ... 9:
+			case 12 ... 15:
+				fallthrough;
+			case 17:
+			case 19:
+				*pol_rx = 1;
+				break;
+			case 2:
+			case 16:
+			case 18:
+				*pol_tx = 1;
+				break;
+			default:
+				break;
+			}
+			break;
+		case 1:
+			switch (lane_idx) {
+			case 0 ... 1:
+			case 3 ... 6:
+			case 8 ... 9:
+			case 12 ... 15:
+				fallthrough;
+			case 17:
+			case 19:
+				*pol_rx = 1;
+				break;
+			case 2:
+			case 16:
+			case 18:
+				*pol_tx = 1;
+				break;
+			default:
+				break;
+			}
+			break;
+		case 2:
+			switch (lane_idx) {
+			case 0 ... 1:
+			case 3:
+			case 5 ... 6:
+			case 8 ... 9:
+			case 12 ... 15:
+				fallthrough;
+			case 17:
+			case 19:
+				*pol_rx = 1;
+				break;
+			case 2:
+			case 16:
+			case 18:
+				*pol_tx = 1;
+				break;
+			default:
+				break;
+			}
+			break;
+		case 3:
+			switch (lane_idx) {
+			case 0 ... 1:
+			case 3:
+			case 5 ... 6:
+			case 8 ... 9:
+			case 12 ... 15:
+				fallthrough;
+			case 17:
+			case 19:
+				*pol_rx = 1;
+				break;
+			case 2:
+			case 16:
+			case 18:
+				*pol_tx = 1;
+				break;
+			default:
+				break;
+			}
+			break;
+		case 4:
+			switch (lane_idx) {
+			case 0 ... 1:
+			case 3:
+			case 5 ... 6:
+			case 8 ... 9:
+			case 12 ... 15:
+				fallthrough;
+			case 17:
+			case 19:
+				*pol_rx = 1;
+				break;
+			case 2:
+			case 16:
+			case 18:
+				*pol_tx = 1;
+				break;
+			default:
+				break;
+			}
+			break;
+		case 5:
+			switch (lane_idx) {
+			case 0 ... 1:
+			case 3:
+			case 5 ... 6:
+			case 8 ... 10:
+			case 12 ... 15:
+				fallthrough;
+			case 17:
+			case 19:
+				*pol_rx = 1;
+				break;
+			case 2:
+			case 16:
+			case 18:
+				*pol_tx = 1;
+				break;
+			default:
+				break;
+			}
+			break;
+		case 6:
+			switch (lane_idx) {
+			case 0 ... 1:
+			case 3:
+			case 5 ... 6:
+			case 8 ... 9:
+			case 12 ... 15:
+				fallthrough;
+			case 17:
+			case 19:
+				*pol_rx = 1;
+				break;
+			case 2:
+			case 16:
+			case 18:
+				*pol_tx = 1;
+				break;
+			default:
+				break;
+			}
+			break;
+		case 7:
+			switch (lane_idx) {
+			case 0 ... 1:
+			case 3 ... 6:
+			case 8 ... 9:
+			case 12 ... 15:
+				fallthrough;
+			case 17:
+			case 19:
+				*pol_rx = 1;
+				break;
+			case 2:
+			case 16:
+			case 18:
+				*pol_tx = 1;
+				break;
+			default:
+				break;
+			}
+			break;
+		}
+		break;
+	default:
+		dev_err(hdev->dev, "wrong card type %d\n", hdev->card_type);
+		break;
+	}
+}
+
+static void config_connection(struct gaudi_nic_device *gaudi_nic, int lane,
+				bool pam4, bool do_auto_neg)
+{
+	u32 lane_idx = (gaudi_nic->port >> 1) * NIC_MAC_NUM_OF_LANES + lane;
+	struct gaudi_device *gaudi = gaudi_nic->hdev->asic_specific;
+	struct hl_device *hdev = gaudi_nic->hdev;
+	struct cpucp_nic_info *nic_info;
+	char *prbs = "PRBS31";
+	u32 msblsb_tx = 0;
+	u32 msblsb_rx = 0;
+	u32 pol_tx = 0;
+	u32 pol_rx = 0;
+	u32 gc_tx = 1;
+	u32 gc_rx = 1;
+	u32 pc_tx = 0;
+	u32 pc_rx = 0;
+
+	nic_info = &hdev->asic_prop.cpucp_nic_info;
+
+	if (!pam4)
+		gc_tx = gc_rx = 0;
+
+	if (gaudi->nic_use_fw_polarity) {
+		pol_tx =
+			(le64_to_cpu(nic_info->pol_tx_mask[0]) >> lane_idx) & 1;
+		pol_rx =
+			(le64_to_cpu(nic_info->pol_rx_mask[0]) >> lane_idx) & 1;
+	} else {
+		get_pol_tx_rx(gaudi_nic, lane_idx, &pol_tx, &pol_rx);
+	}
+
+	phy_write_mask(gaudi_nic, lane, 0xF7, 1, 0x1000);
+	pol(gaudi_nic, lane, pam4, pol_tx, pol_rx);
+	msblsb(gaudi_nic, lane, msblsb_tx, msblsb_rx);
+	gc(gaudi_nic, lane, gc_tx, gc_rx);
+	pc(gaudi_nic, lane, pc_tx, pc_rx);
+
+	set_prbs_type(gaudi_nic, lane, pam4, prbs);
+}
+
+static void functional_mode(struct gaudi_nic_device *gaudi_nic, int lane,
+				bool pam4)
+{
+	if (!pam4) {
+		phy_write_mask(gaudi_nic, lane, 0xA0, 0, 0x2000);
+		phy_write_mask(gaudi_nic, lane, 0x161, 0, 0x400);
+	} else {
+		phy_write_mask(gaudi_nic, lane, 0xA0, 0, 0x2000);
+		phy_write_mask(gaudi_nic, lane, 0x43, 0, 0x10);
+	}
+}
+
+static u32 get_fw_reg(struct gaudi_nic_device *gaudi_nic, int lane, u32 fw_addr)
+{
+	u32 ignore;
+
+	fw_cmd(gaudi_nic, lane, 0xE010, fw_addr, 0xE, &ignore);
+
+	return phy_read(gaudi_nic, lane, 0x9812);
+}
+
+static void config_pam4_fw_rx(struct gaudi_nic_device *gaudi_nic, int lane)
+{
+	phy_write_mask(gaudi_nic, lane, 0x980F, 0x1, 0x1000);
+	phy_write_mask(gaudi_nic, lane, 0x980F, 0x1, 0x0400);
+	phy_write_mask(gaudi_nic, lane, 0x980F, 0x1, 0x0800);
+	phy_write_mask(gaudi_nic, lane, 0x980F, 0x1, 0x0200);
+
+	phy_write(gaudi_nic, lane, 0x43, 0x8CFA);
+	phy_write(gaudi_nic, lane, 0x44, 0x1035);
+	phy_write(gaudi_nic, lane, 0x45, 0x1008);
+}
+
+static int fw_config_speed_nrz(struct gaudi_nic_device *gaudi_nic, int lane,
+				u32 data_rate, u32 speed, bool half_rate,
+				bool fmode, bool pam4)
+{
+	struct hl_device *hdev = gaudi_nic->hdev;
+	u32 ignore;
+	int rc, i;
+
+	/* clear go bit */
+	phy_write_mask(gaudi_nic, lane, 0x980F, 0, 0x8000);
+
+	rc = fw_cmd(gaudi_nic, lane, 0x80C0, speed, 0x8, &ignore);
+	if (rc) {
+		dev_err(hdev->dev,
+			"F/W cmd failed for speed nrz configuration of lane %d\n",
+			lane);
+		return rc;
+	}
+
+	config_nrz_tx(gaudi_nic, lane, half_rate);
+	phy_write_mask(gaudi_nic, lane, 0x0161, 0x1D, 0xFC00);
+	config_connection(gaudi_nic, lane, pam4, false);
+	functional_mode(gaudi_nic, lane, pam4);
+
+	/* clock configuration */
+	for (i = 0 ; i < 4 ; i++)
+		if (i == 0)
+			phy_write(gaudi_nic, i, 0x00C9, 0x390);
+		else
+			phy_write(gaudi_nic, i, 0x00C9, 0x310);
+
+	set_pll(gaudi_nic, lane, data_rate, pam4);
+	phy_write_mask(gaudi_nic, lane, 0x980F, 1, 0x8000);
+
+	return 0;
+}
+
+int gaudi_nic_phy_fw_config_auto_neg(struct gaudi_nic_device *gaudi_nic,
+					int lane)
+{
+	struct gaudi_device *gaudi = gaudi_nic->hdev->asic_specific;
+	struct hl_device *hdev = gaudi_nic->hdev;
+	u64 basepage = 0x000080000001;
+	u32 ignore;
+	int rc;
+
+	usleep_range(500, 1000);
+
+	/* clear go bit */
+	phy_write_mask(gaudi_nic, lane, 0x980F, 0, 0x8000);
+
+	set_pll(gaudi_nic, lane, NIC_DR_25, false);
+
+	/* Disable AN/LT lane swapping */
+	phy_write_mask(gaudi_nic, lane, 0x8440, 0, 0x8000);
+	config_nrz_tx(gaudi_nic, lane, 0);
+
+	/* config_nrz_fw_rx */
+	phy_write_mask(gaudi_nic, lane, 0x0161, 0x1D, 0x0);
+	config_connection(gaudi_nic, lane, false, true);
+
+	phy_write_mask(gaudi_nic, lane, 0x8300, 7, 0xE000);
+
+	/* AN mode */
+	phy_write(gaudi_nic, lane, 0x8010, basepage & 0xffff);
+	phy_write(gaudi_nic, lane, 0x8011, (basepage >> 16) & 0xffff);
+	phy_write(gaudi_nic, lane, 0x8012, (basepage >> 32) & 0xffff);
+
+	/* IEEE */
+	phy_write_mask(gaudi_nic, lane, 0x8300, 1, 0x1000);
+
+	if (gaudi->nic_phy_auto_neg_lpbk)
+		phy_write_mask(gaudi_nic, lane, 0x8300, 1, 0x400);
+
+	/* set FW to start AN */
+	rc = fw_cmd(gaudi_nic, lane, 0x8000, 0, 8, &ignore);
+	if (rc) {
+		dev_err(hdev->dev,
+			"F/W cmd 0x8000 failed for auto neg, port %d, lane %d\n",
+			gaudi_nic->port, lane);
+		return rc;
+	}
+
+	/* set go bit */
+	phy_write_mask(gaudi_nic, lane, 0x980F, 1, 0x8000);
+
+	return 0;
+}
+
+static int fw_config_speed_pam4(struct gaudi_nic_device *gaudi_nic, int lane,
+				u32 data_rate, u32 speed, bool fmode, bool pam4)
+{
+	struct hl_device *hdev = gaudi_nic->hdev;
+	u32 ignore;
+	int rc;
+
+	dev_dbg(hdev->dev,
+		"port: %d, lane: %d, data rate: %d, pam4: %d, speed: %d\n",
+		gaudi_nic->port, lane, data_rate, pam4, speed);
+
+	/* clear go bit */
+	phy_write_mask(gaudi_nic, lane, 0x980F, 0, 0x8000);
+
+	phy_write_mask(gaudi_nic, lane, 0x8440, 0, 0x8000);
+
+	rc = fw_cmd(gaudi_nic, lane, 0x80D0, speed, 0x8, &ignore);
+	if (rc) {
+		dev_err(hdev->dev,
+			"F/W cmd failed for speed pam4 configuration of lane %d\n",
+			lane);
+		return rc;
+	}
+
+	config_pam4_tx(gaudi_nic, lane);
+	config_pam4_fw_rx(gaudi_nic, lane);
+	config_connection(gaudi_nic, lane, pam4, false);
+	functional_mode(gaudi_nic, lane, pam4);
+
+	/* set go bit */
+	phy_write_mask(gaudi_nic, lane, 0x980F, 1, 0x8000);
+
+	return 0;
+}
+
+int gaudi_nic_phy_config_pam4_link_training(struct gaudi_nic_device *gaudi_nic,
+						int lane)
+{
+	struct hl_device *hdev = gaudi_nic->hdev;
+	u32 port = gaudi_nic->port;
+	u32 ignore, speed = 9;
+	int rc;
+
+#if HL_PHY_DEBUG
+	dev_dbg(hdev->dev, "NIC %d lane: %d, speed: %d\n", port, lane, speed);
+#endif
+
+	/* clear go bit */
+	phy_write_mask(gaudi_nic, lane, 0x980F, 0, 0x8000);
+
+	/* Disable lane swapping */
+	phy_write_mask(gaudi_nic, lane, 0x8440, 0, 0x8000);
+
+	/* Enable Link Training */
+	speed |= 0x100;
+
+	config_pam4_tx(gaudi_nic, lane);
+	phy_write_mask(gaudi_nic, lane, 0xA0, 0, 0x2000);
+	config_pam4_fw_rx(gaudi_nic, lane);
+	config_connection(gaudi_nic, lane, true, false);
+
+	rc = fw_cmd(gaudi_nic, lane, 0x80D0, speed, 0x8, &ignore);
+	if (rc) {
+		dev_err(hdev->dev,
+			"F/W cmd failed for speed pam4 configuration of port %d lane %d\n",
+			port, lane);
+		return rc;
+	}
+
+	phy_write_mask(gaudi_nic, lane, 0xAF, 0, 0x200);
+	phy_write_mask(gaudi_nic, lane, 0xAF, 0, 0x100);
+	phy_write_mask(gaudi_nic, lane, 0x42, 0, 0x2);
+	phy_write_mask(gaudi_nic, lane, 0x42, 0, 0x1);
+
+	/* set go bit */
+	phy_write_mask(gaudi_nic, lane, 0x980F, 1, 0x8000);
+
+	return 0;
+}
+
+static int fw_config(struct gaudi_nic_device *gaudi_nic, int lane,
+			u32 data_rate, bool fmode, bool pam4)
+{
+	struct hl_device *hdev = gaudi_nic->hdev;
+
+	set_pll(gaudi_nic, lane, data_rate, pam4);
+
+	if (data_rate == NIC_DR_10)
+		return fw_config_speed_nrz(gaudi_nic, lane, data_rate, 1, 1,
+						fmode, pam4);
+	else if (data_rate == NIC_DR_25 || data_rate == NIC_DR_26)
+		return fw_config_speed_nrz(gaudi_nic, lane, data_rate, 3, 0,
+						fmode, pam4);
+	else if (data_rate == NIC_DR_50)
+		return fw_config_speed_pam4(gaudi_nic, lane, data_rate, 9,
+						fmode, pam4);
+
+	dev_err(hdev->dev, "invalid data_rate %d\n", data_rate);
+
+	return -EFAULT;
+}
+
+static int fw_crc_port(struct hl_device *hdev, int port, int lane, u16 *crc)
+{
+	u32 res;
+	int rc;
+
+	rc = fw_cmd_port(hdev, port, lane, 0xF001, 0, 0xF, &res);
+	if (rc) {
+		dev_err(hdev->dev, "F/W crc failed for port %d lane %d\n", port,
+			lane);
+		return rc;
+	}
+
+	*crc = phy_read_port(hdev, port, lane, 0x9816) & 0xFFFF;
+
+	return 0;
+}
+
+int gaudi_nic_phy_has_fw(struct hl_device *hdev)
+{
+	const struct firmware *fw;
+	int rc;
+
+	rc = request_firmware(&fw, GAUDI_PHY_FW_FILE, hdev->dev);
+	if (rc) {
+		dev_err(hdev->dev, "Firmware file %s is not found!\n",
+				GAUDI_PHY_FW_FILE);
+		return rc;
+	}
+
+	if (fw->size < PHY_FW_SIZE) {
+		dev_err(hdev->dev, "Illegal %s firmware size %zu\n",
+				GAUDI_PHY_FW_FILE, fw->size);
+		rc = -EFAULT;
+	}
+
+	release_firmware(fw);
+
+	return rc;
+}
+
+static void fw_unload_all(struct hl_device *hdev, bool pam4)
+{
+	phy_write_all(hdev, 0x9814, 0xFFF0);
+	phy_write_all(hdev, 0x980D, 0xAAA);
+	phy_write_all(hdev, 0x980D, 0);
+
+	msleep(100);
+
+	phy_write_all(hdev, 0x9814, 0);
+
+	if (pam4)
+		phy_write_all(hdev, 0x11, 0);
+	else
+		phy_write_all(hdev, 0x10B, 0);
+}
+
+u16 gaudi_nic_phy_get_crc(struct hl_device *hdev)
+{
+	u16 crc = 0;
+
+	fw_crc_port(hdev, 0, 0, &crc);
+
+	return crc;
+}
+
+int gaudi_nic_phy_fw_load_all(struct hl_device *hdev)
+{
+	u32 entry_point, length, ram_addr, sections, status, checks, hash = 0,
+		checksum = 0x800C, fw0 = 0x9F00, fw1 = 0x980D, fw2 = 0x9814;
+	int rc, i, j, port, data_ptr = 0, lane = 0;
+	const struct firmware *fw;
+	u16 mdio_data, crc = 0;
+	const void *fw_data;
+	bool pam4 = true; /* for debug */
+
+	fw_unload_all(hdev, pam4);
+
+	rc = request_firmware(&fw, GAUDI_PHY_FW_FILE, hdev->dev);
+	if (rc) {
+		dev_err(hdev->dev, "Firmware file %s is not found!\n",
+				GAUDI_PHY_FW_FILE);
+		return rc;
+	}
+
+	if (fw->size < PHY_FW_SIZE) {
+		dev_err(hdev->dev, "Illegal %s firmware size %zu\n",
+				GAUDI_PHY_FW_FILE, fw->size);
+		release_firmware(fw);
+		return -EFAULT;
+	}
+
+	fw_data = (const void *) fw->data;
+	fw_data += 0x1000;
+
+	/* skip hash, crc and date */
+	entry_point = get_unaligned_be32(fw_data + 8);
+	length = get_unaligned_be32(fw_data + 12);
+	ram_addr = get_unaligned_be32(fw_data + 16);
+
+	dev_dbg(hdev->dev, "entry_point: 0x%x\n", entry_point);
+	dev_dbg(hdev->dev, "length: 0x%x\n", length);
+
+	fw_data += 20;
+
+	sections = DIV_ROUND_UP(length, 24);
+
+	dev_dbg(hdev->dev, "sections: %d\n", sections);
+
+	phy_write_all(hdev, fw2, 0xFFF0);
+	phy_write_all(hdev, fw1, 0x0AAA);
+	phy_write_all(hdev, fw1, 0);
+
+	msleep(500);
+
+	checks = 0;
+	do {
+		usleep_range(10000, 20000);
+		status = phy_read_port(hdev, 0, 0, fw2);
+		dev_dbg(hdev->dev, "lane: %d, status: 0x%x\n", lane, status);
+		if (checks++ > PHY_READ_COUNTS_PER_MS) {
+			dev_err(hdev->dev,
+				"failed to load NIC F/W, fw2 timeout 0x%x\n",
+				status);
+			release_firmware(fw);
+			return -ETIMEDOUT;
+		}
+	} while (status);
+
+	phy_write_all(hdev, fw2, 0);
+
+	for (i = 0 ; i <= sections ; i++) {
+		checksum = 0x800C;
+		phy_write_all(hdev, fw0 + 12, ram_addr >> 16);
+		phy_write_all(hdev, fw0 + 13, ram_addr & 0xFFFF);
+		checksum += (ram_addr >> 16) + (ram_addr & 0xFFFF);
+		for (j = 0 ; j < 12 ; j++) {
+			if (data_ptr >= length)
+				mdio_data = 0;
+			else
+				mdio_data =
+					get_unaligned_be16(fw_data + data_ptr);
+
+			phy_write_all(hdev, fw0 + j, mdio_data);
+			checksum += mdio_data;
+			data_ptr += 2;
+			ram_addr += 2;
+		}
+
+		phy_write_all(hdev, fw0 + 14, (~checksum + 1) & 0xFFFF);
+		phy_write_all(hdev, fw0 + 15, 0x800C);
+
+		checks = 0;
+
+		do {
+			usleep_range(1000, 2000);
+			status = phy_read_port(hdev, 0, 0, fw0 + 15);
+			if (checks++ > PHY_READ_COUNTS_PER_MS) {
+				dev_err(hdev->dev,
+					"failed to load NIC F/W, fw0 timeout 0x%x\n",
+					status);
+				release_firmware(fw);
+				return -ETIMEDOUT;
+			}
+		} while (status == 0x800C);
+	}
+
+	phy_write_all(hdev, fw0 + 12, entry_point >> 16);
+	phy_write_all(hdev, fw0 + 13, entry_point & 0xFFFF);
+	checksum = (entry_point >> 16) + (entry_point & 0xFFFF) + 0x4000;
+	phy_write_all(hdev, fw0 + 14, (~checksum + 1) & 0xFFFF);
+	phy_write_all(hdev, fw0 + 15, 0x4000);
+
+	for (port = 0 ; port < 1 ; port += 2)
+		for (lane = 0 ; lane < 1 ; lane++) {
+			fw_crc_port(hdev, port, lane, &crc);
+			dev_dbg(hdev->dev, "port: %d lane: %d crc: 0x%x\n",
+				port, lane, crc);
+			fw_hash_port(hdev, port, lane, &hash);
+			dev_dbg(hdev->dev, "port: %d lane: %d hash: 0x%x\n",
+				port, lane, hash);
+		}
+	release_firmware(fw);
+	return 0;
+}
+
+static u32 fw_tuning_counter(struct gaudi_nic_device *gaudi_nic, int lane)
+{
+	return get_fw_reg(gaudi_nic, lane, 5);
+}
+
+static u32 fw_reset_counter(struct gaudi_nic_device *gaudi_nic, int lane)
+{
+	return get_fw_reg(gaudi_nic, lane, 4);
+}
+
+static void print_eye(struct gaudi_nic_device *gaudi_nic, int lane, bool pam4)
+{
+	s32 plus_margin, minus_margin, result, diff;
+	struct hl_device *hdev = gaudi_nic->hdev;
+	int pam4_eye[3], eye_index, i, sel;
+	u32 dac, eye, mask, val1, val2;
+
+	if (pam4) {
+		dac = (phy_read(gaudi_nic, lane, 0x28) & 0x1E0) >> 5;
+		for (eye_index = 0; eye_index < 3; eye_index++) {
+			result = 0xffff;
+			for (i = 0; i < 3; i++) {
+				sel = 3 * i + eye_index;
+				phy_write_mask(gaudi_nic, lane, 0x88, sel,
+						0xF00);
+				phy_write_mask(gaudi_nic, lane, 0x88, sel,
+						0xF000);
+
+				msleep(100);
+
+				val1 = phy_read(gaudi_nic, lane, 0x32);
+				plus_margin = (val1 & 0xFFF0) >> 4;
+				if (plus_margin > 0x7ff)
+					plus_margin = plus_margin - 0x1000;
+
+				val1 = phy_read(gaudi_nic, lane, 0x32);
+				val2 = phy_read(gaudi_nic, lane, 0x33);
+				minus_margin = ((val1 & 0xF) << 8) +
+						((val2 & 0xFF00) >> 8);
+				if (minus_margin > 0x7ff)
+					minus_margin = minus_margin - 0x1000;
+
+				diff = plus_margin - minus_margin;
+				if (diff < result)
+					result = diff;
+			}
+
+			pam4_eye[eye_index] =
+					(result * (100 + (50 * dac))) / 2048;
+		}
+
+		dev_dbg(hdev->dev,
+			"NIC PAM4 dac: %d eye0: %d eye1: %d eye2: %d\n", dac,
+			pam4_eye[0], pam4_eye[1], pam4_eye[2]);
+	} else {
+		mask = 0xF000;
+		dac = (phy_read(gaudi_nic, lane, 0x17F) & mask) >> __ffs(mask);
+		mask = 0xFFF;
+		eye = (phy_read(gaudi_nic, lane, 0x12A) & mask) >> __ffs(mask);
+
+		dev_dbg(hdev->dev, "dac: %d, eye: %d\n", dac, eye);
+
+		if (eye > 0)
+			dev_dbg(hdev->dev,
+				"NIC port %d lane %d: F/W eye is %d\n",
+				gaudi_nic->port, lane,
+				(eye * (200 + 50 * dac)) / 2048);
+		else
+			dev_err(hdev->dev,
+				"NIC port %d lane %d: F/W got no eye\n",
+				gaudi_nic->port, lane);
+	}
+}
+
+int gaudi_nic_phy_check_link_status(struct gaudi_nic_device *gaudi_nic,
+					int lane)
+{
+	bool phy_ready, pam4 = gaudi_nic->data_rate == NIC_DR_50;
+#if HL_PHY_DEBUG
+	bool signal_detect;
+#endif
+	u32 phy_status;
+
+	if (pam4) {
+		phy_status = phy_read(gaudi_nic, lane, 0x6A);
+		phy_ready = ((phy_status & 0x8000) >> 15) & 1;
+#if HL_PHY_DEBUG
+		signal_detect = ((phy_status & 0x80) >> 7) & 1;
+#endif
+	} else {
+		phy_status = phy_read(gaudi_nic, lane, 0x12E);
+		phy_ready = ((phy_status & 0x4) >> 2) & 1;
+#if HL_PHY_DEBUG
+		signal_detect = ((phy_status & 0x8) >> 3) & 1;
+#endif
+	}
+
+#if HL_PHY_DEBUG
+	{
+		struct hl_device *hdev = gaudi_nic->hdev;
+
+		dev_dbg_ratelimited(hdev->dev,
+			"port: %d, lane: %d, phy ready: %d, signal detect: %d\n",
+			gaudi_nic->port, lane, phy_ready, signal_detect);
+	}
+#endif
+
+	return phy_ready ? 0 : -EFAULT;
+}
+
+int gaudi_nic_phy_fw_tuning(struct gaudi_nic_device *gaudi_nic, int lane,
+				bool check_status)
+{
+	bool pam4 = (gaudi_nic->data_rate == NIC_DR_50);
+	struct hl_device *hdev = gaudi_nic->hdev;
+	u32 status, port = gaudi_nic->port;
+
+	fw_tuning_counter(gaudi_nic, lane);
+	fw_reset_counter(gaudi_nic, lane);
+	status = phy_read(gaudi_nic, lane, 0x9811);
+
+	if (status & PHY_FW_FINISHED) {
+		if (status & PHY_FW_ERROR) {
+			dev_dbg(hdev->dev, "NIC %d lane %d F/W tuning failed\n",
+				port, lane);
+			return -EFAULT;
+		}
+#if HL_PHY_DEBUG
+		dev_dbg(hdev->dev,
+			"NIC %d lane %d F/W Tuning is done\n", port, lane);
+#endif
+	} else {
+		return -EAGAIN;
+	}
+
+	if (!gaudi_nic->auto_neg_enable) {
+		phy_write_mask(gaudi_nic, lane, 0x14D, 1, 1 << 15);
+		print_eye(gaudi_nic, lane, pam4);
+	} else if (!check_status) {
+		return 0;
+	}
+
+	return gaudi_nic_phy_check_link_status(gaudi_nic, lane);
+}
+
+int gaudi_nic_phy_power_up(struct gaudi_nic_device *gaudi_nic, int lane,
+				bool do_auto_neg)
+{
+	struct hl_device *hdev = gaudi_nic->hdev;
+	u32 data_rate = gaudi_nic->data_rate;
+	bool pam4, fmode = 0;
+	int rc;
+
+	pam4 = (data_rate == NIC_DR_50);
+
+	dev_dbg(hdev->dev, "PHY power up port %d lane %d auto_neg: %d\n",
+		gaudi_nic->port, lane, do_auto_neg);
+
+	/* F/W configurations */
+	if (gaudi_nic->auto_neg_enable) {
+		if (do_auto_neg) {
+			rc = gaudi_nic_phy_fw_config_auto_neg(gaudi_nic, lane);
+			if (rc) {
+				dev_err(hdev->dev,
+					"F/W configuration failed for NIC PHY\n");
+				return rc;
+			}
+		}
+	} else {
+		rc = fw_config(gaudi_nic, lane, data_rate, fmode, pam4);
+		if (rc) {
+			dev_err(hdev->dev,
+				"F/W configuration failed for NIC PHY\n");
+			return rc;
+		}
+	}
+
+	return 0;
+}
+
+int gaudi_nic_phy_reset_macro(struct gaudi_nic_device *gaudi_nic)
+{
+	struct hl_device *hdev = gaudi_nic->hdev;
+	s32 chip_reset_addr = 0x980D;
+	bool fmode = 0;
+	int rc, i;
+
+	dev_dbg(hdev->dev, "PHY reset macro, port %d\n", gaudi_nic->port);
+
+	/* soft reset */
+	for (i = 0 ; i < 4 ; i++)
+		phy_write(gaudi_nic, i, chip_reset_addr, 0x888);
+
+	usleep_range(500, 1000);
+
+	/* clock configuration */
+	for (i = 0 ; i < 4 ; i++)
+		if (i == 0)
+			phy_write(gaudi_nic, i, 0x00C9, 0x390);
+		else
+			phy_write(gaudi_nic, i, 0x00C9, 0x310);
+
+	for (i = 0 ; i < 4 ; i++) {
+		phy_write(gaudi_nic, i, 0x8000, 0xC000);
+		phy_write(gaudi_nic, i, 0x8210, 0);
+		phy_write(gaudi_nic, i, 0x8100, 0);
+	}
+
+	/* PHY controller reset - to force F/W to start from pointer 0 */
+	for (i = 0 ; i < 4 ; i++) {
+		phy_write(gaudi_nic, i, chip_reset_addr, 0xAAA);
+		phy_write(gaudi_nic, i, chip_reset_addr, 0);
+	}
+
+	/* force the lane pll to run in PAM4 before logical reset */
+	for (i = 0 ; i < 4 ; i++) {
+		rc = fw_config(gaudi_nic, i, NIC_DR_50, fmode, true);
+		if (rc) {
+			dev_err(hdev->dev,
+				"F/W configuration failed for NIC PHY\n");
+			return rc;
+		}
+	}
+
+	/* logic reset */
+	for (i = 0 ; i < 4 ; i++) {
+		phy_write(gaudi_nic, i, chip_reset_addr, 0x777);
+		phy_write(gaudi_nic, i, chip_reset_addr, 0);
+	}
+
+	usleep_range(500, 1000);
+
+	return 0;
+}
+
+void gaudi_nic_phy_reset_tx(struct gaudi_nic_device *gaudi_nic, int lane)
+{
+	u32 val;
+
+	/* disable TX */
+	val = phy_read(gaudi_nic, lane, 0xA0);
+	/* set bit 13 to 1 */
+	val |= 0x2000;
+	/* set bit 11 to 0 */
+	val &= ~0x800;
+	phy_write(gaudi_nic, lane, 0xA0, val);
+
+	msleep(500);
+
+	/* enable TX */
+	val = phy_read(gaudi_nic, lane, 0xA0);
+	/* set bit 13 to 0 */
+	val &= ~0x2000;
+	phy_write(gaudi_nic, lane, 0xA0, val);
+}
+
+void gaudi_nic_phy_start_stop(struct gaudi_nic_device *gaudi_nic, int lane,
+				bool is_start)
+{
+	if (is_start) {
+		/* Enable TX driver in SerDes */
+		phy_write_mask(gaudi_nic, lane, 0xE3, 1, 0x2000);
+		/* F/W Rx tuning is enabled during the power-up sequence */
+	} else {
+		/* Disable TX driver in SerDes */
+		phy_write_mask(gaudi_nic, lane, 0xE3, 0, 0x2000);
+		/* Silence F/W Rx tuning */
+		phy_write(gaudi_nic, lane, 0x9815, 0x9000);
+	}
+}
-- 
2.17.1



* [PATCH v3 07/14] habanalabs/gaudi: allow user to get MAC addresses in INFO IOCTL
  2020-09-15 17:10 [PATCH v3 00/14] Adding GAUDI NIC code to habanalabs driver Oded Gabbay
                   ` (4 preceding siblings ...)
  2020-09-15 17:10 ` [PATCH v3 06/14] habanalabs/gaudi: add NIC PHY code Oded Gabbay
@ 2020-09-15 17:10 ` Oded Gabbay
  2020-09-15 17:10 ` [PATCH v3 08/14] habanalabs/gaudi: add a new IOCTL for NIC control operations Oded Gabbay
                   ` (9 subsequent siblings)
  15 siblings, 0 replies; 83+ messages in thread
From: Oded Gabbay @ 2020-09-15 17:10 UTC (permalink / raw)
  To: linux-kernel, netdev
  Cc: SW_Drivers, gregkh, davem, kuba, andrew, f.fainelli, Omer Shpigelman

From: Omer Shpigelman <oshpigelman@habana.ai>

The user needs this information when working in a distributed environment
with a master/slave configuration. All the slaves get their MAC addresses
from the driver and send them to the master.

Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
---
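Note for readers (not part of the commit message): a minimal user-space
sketch of how the buffer returned by HL_INFO_MAC_ADDR could be decoded.
The two structures are local mirrors of the UAPI definitions added by this
patch, using <stdint.h> types instead of the kernel __u8/__u64 types. The
helper names mac_entry_valid() and print_valid_macs() are hypothetical and
exist only for illustration. The ioctl call itself is omitted.

```c
#include <stdint.h>
#include <stdio.h>

#define ETH_ALEN			6
#define HL_INFO_MAC_ADDR_MAX_NUM	128

/* Local mirrors of the UAPI structures added by this patch. */
struct hl_mac_addr {
	uint8_t addr[ETH_ALEN];
	uint8_t pad[2];
};

struct hl_info_mac_addr {
	/* MAC address at index N belongs to port N */
	struct hl_mac_addr array[HL_INFO_MAC_ADDR_MAX_NUM];
	/* One bit per port: set when array[port] holds a valid address */
	uint64_t mask[2];
};

/* Hypothetical helper: return 1 if the entry for @port is marked valid. */
static int mac_entry_valid(const struct hl_info_mac_addr *info,
			   unsigned int port)
{
	if (port >= HL_INFO_MAC_ADDR_MAX_NUM)
		return 0;

	return (int)((info->mask[port / 64] >> (port % 64)) & 1);
}

/* Hypothetical helper: print each valid address, return how many exist. */
static int print_valid_macs(const struct hl_info_mac_addr *info)
{
	unsigned int port;
	int count = 0;

	for (port = 0; port < HL_INFO_MAC_ADDR_MAX_NUM; port++) {
		const uint8_t *a;

		if (!mac_entry_valid(info, port))
			continue;

		a = info->array[port].addr;
		printf("port %u: %02x:%02x:%02x:%02x:%02x:%02x\n", port,
		       a[0], a[1], a[2], a[3], a[4], a[5]);
		count++;
	}

	return count;
}
```

The mask must be checked before an entry is used: the driver skips ports
that are disabled or have no net device, so those slots stay zeroed.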
 drivers/misc/habanalabs/common/habanalabs.h   |  5 +++
 .../misc/habanalabs/common/habanalabs_ioctl.c | 31 +++++++++++++++++++
 drivers/misc/habanalabs/gaudi/gaudi.c         |  1 +
 drivers/misc/habanalabs/gaudi/gaudiP.h        |  2 ++
 drivers/misc/habanalabs/gaudi/gaudi_nic.c     | 27 ++++++++++++++++
 drivers/misc/habanalabs/goya/goya.c           |  9 ++++++
 include/uapi/misc/habanalabs.h                | 20 +++++++++++-
 7 files changed, 94 insertions(+), 1 deletion(-)

diff --git a/drivers/misc/habanalabs/common/habanalabs.h b/drivers/misc/habanalabs/common/habanalabs.h
index 45feb4884ab3..fee04299360d 100644
--- a/drivers/misc/habanalabs/common/habanalabs.h
+++ b/drivers/misc/habanalabs/common/habanalabs.h
@@ -619,6 +619,8 @@ enum div_select_defs {
 	DIV_SEL_DIVIDED_PLL = 3,
 };
 
+struct hl_info_mac_addr;
+
 /**
  * struct hl_asic_funcs - ASIC specific functions that are can be called from
  *                        common code.
@@ -696,6 +698,7 @@ enum div_select_defs {
  * @get_hw_state: retrieve the H/W state
  * @pci_bars_map: Map PCI BARs.
  * @init_iatu: Initialize the iATU unit inside the PCI controller.
+ * @get_mac_addr: Get list of MAC addresses.
  * @rreg: Read a register. Needed for simulator support.
  * @wreg: Write a register. Needed for simulator support.
  * @halt_coresight: stop the ETF and ETR traces.
@@ -799,6 +802,8 @@ struct hl_asic_funcs {
 	enum hl_device_hw_state (*get_hw_state)(struct hl_device *hdev);
 	int (*pci_bars_map)(struct hl_device *hdev);
 	int (*init_iatu)(struct hl_device *hdev);
+	int (*get_mac_addr)(struct hl_device *hdev,
+				struct hl_info_mac_addr *mac_addr);
 	u32 (*rreg)(struct hl_device *hdev, u32 reg);
 	void (*wreg)(struct hl_device *hdev, u32 reg, u32 val);
 	void (*halt_coresight)(struct hl_device *hdev);
diff --git a/drivers/misc/habanalabs/common/habanalabs_ioctl.c b/drivers/misc/habanalabs/common/habanalabs_ioctl.c
index 07317ea49129..5db6c978415c 100644
--- a/drivers/misc/habanalabs/common/habanalabs_ioctl.c
+++ b/drivers/misc/habanalabs/common/habanalabs_ioctl.c
@@ -203,6 +203,33 @@ static int debug_coresight(struct hl_device *hdev, struct hl_debug_args *args)
 	return rc;
 }
 
+static int mac_addr_info(struct hl_device *hdev, struct hl_info_args *args)
+{
+	void __user *out = (void __user *) (uintptr_t) args->return_pointer;
+	struct hl_info_mac_addr *mac_addr;
+	u32 max_size = args->return_size;
+	int rc;
+
+	if (!max_size || !out)
+		return -EINVAL;
+
+	mac_addr = kzalloc(sizeof(struct hl_info_mac_addr), GFP_KERNEL);
+	if (!mac_addr)
+		return -ENOMEM;
+
+	rc = hdev->asic_funcs->get_mac_addr(hdev, mac_addr);
+	if (rc)
+		goto out;
+
+	rc = copy_to_user(out, mac_addr,
+		min((size_t) max_size, sizeof(struct hl_info_mac_addr))) ?
+								-EFAULT : 0;
+
+out:
+	kfree(mac_addr);
+	return rc;
+}
+
 static int device_utilization(struct hl_device *hdev, struct hl_info_args *args)
 {
 	struct hl_info_device_utilization device_util = {0};
@@ -423,6 +450,10 @@ static int _hl_info_ioctl(struct hl_fpriv *hpriv, void *data,
 		rc = hw_idle(hdev, args);
 		break;
 
+	case HL_INFO_MAC_ADDR:
+		rc = mac_addr_info(hdev, args);
+		break;
+
 	case HL_INFO_DEVICE_UTILIZATION:
 		rc = device_utilization(hdev, args);
 		break;
diff --git a/drivers/misc/habanalabs/gaudi/gaudi.c b/drivers/misc/habanalabs/gaudi/gaudi.c
index eee83e0a8c6d..d2f51497fa8e 100644
--- a/drivers/misc/habanalabs/gaudi/gaudi.c
+++ b/drivers/misc/habanalabs/gaudi/gaudi.c
@@ -7472,6 +7472,7 @@ static const struct hl_asic_funcs gaudi_funcs = {
 	.get_hw_state = gaudi_get_hw_state,
 	.pci_bars_map = gaudi_pci_bars_map,
 	.init_iatu = gaudi_init_iatu,
+	.get_mac_addr = gaudi_nic_get_mac_addr,
 	.rreg = hl_rreg,
 	.wreg = hl_wreg,
 	.halt_coresight = gaudi_halt_coresight,
diff --git a/drivers/misc/habanalabs/gaudi/gaudiP.h b/drivers/misc/habanalabs/gaudi/gaudiP.h
index 6dea73c5682f..69b3656eaaeb 100644
--- a/drivers/misc/habanalabs/gaudi/gaudiP.h
+++ b/drivers/misc/habanalabs/gaudi/gaudiP.h
@@ -564,6 +564,8 @@ void gaudi_nic_ports_fini(struct hl_device *hdev);
 int gaudi_nic_hard_reset_prepare(struct hl_device *hdev);
 void gaudi_nic_stop(struct hl_device *hdev);
 void gaudi_nic_ports_reopen(struct hl_device *hdev);
+int gaudi_nic_get_mac_addr(struct hl_device *hdev,
+				struct hl_info_mac_addr *mac_addr);
 void gaudi_nic_ctx_fini(struct hl_ctx *ctx);
 irqreturn_t gaudi_nic_rx_irq_handler(int irq, void *arg);
 irqreturn_t gaudi_nic_cq_irq_handler(int irq, void *arg);
diff --git a/drivers/misc/habanalabs/gaudi/gaudi_nic.c b/drivers/misc/habanalabs/gaudi/gaudi_nic.c
index 1e3f58297e5e..fc4fc80eb005 100644
--- a/drivers/misc/habanalabs/gaudi/gaudi_nic.c
+++ b/drivers/misc/habanalabs/gaudi/gaudi_nic.c
@@ -2774,6 +2774,33 @@ void gaudi_nic_ports_reopen(struct hl_device *hdev)
 	gaudi->hw_cap_initialized |= HW_CAP_NIC_DRV;
 }
 
+int gaudi_nic_get_mac_addr(struct hl_device *hdev,
+				struct hl_info_mac_addr *mac_addr)
+{
+	struct gaudi_device *gaudi = hdev->asic_specific;
+	struct net_device *ndev;
+	int i, number_of_ports;
+
+	if (!(gaudi->hw_cap_initialized & HW_CAP_NIC_DRV))
+		goto out;
+
+	number_of_ports = min_t(int, NIC_NUMBER_OF_PORTS,
+				HL_INFO_MAC_ADDR_MAX_NUM);
+
+	for (i = 0 ; i < number_of_ports ; i++) {
+		if (!(hdev->nic_ports_mask & BIT(i)))
+			continue;
+
+		ndev = gaudi->nic_devices[i].ndev;
+		if (!ndev)
+			continue;
+
+		ether_addr_copy(mac_addr->array[i].addr, ndev->dev_addr);
+		mac_addr->mask[i / 64] |= BIT_ULL(i % 64);
+	}
+out:
+	return 0;
+}
 void gaudi_nic_ctx_fini(struct hl_ctx *ctx)
 {
 }
diff --git a/drivers/misc/habanalabs/goya/goya.c b/drivers/misc/habanalabs/goya/goya.c
index f82212310114..75e3b3bac47c 100644
--- a/drivers/misc/habanalabs/goya/goya.c
+++ b/drivers/misc/habanalabs/goya/goya.c
@@ -5269,6 +5269,14 @@ static enum hl_device_hw_state goya_get_hw_state(struct hl_device *hdev)
 	return RREG32(mmHW_STATE);
 }
 
+static int goya_get_mac_addr(struct hl_device *hdev,
+			struct hl_info_mac_addr *mac_addr)
+{
+	dev_err_ratelimited(hdev->dev,
+				"No MAC addresses are assigned to Goya\n");
+	return -ENXIO;
+}
+
 static int goya_ctx_init(struct hl_ctx *ctx)
 {
 	return 0;
@@ -5388,6 +5396,7 @@ static const struct hl_asic_funcs goya_funcs = {
 	.get_hw_state = goya_get_hw_state,
 	.pci_bars_map = goya_pci_bars_map,
 	.init_iatu = goya_init_iatu,
+	.get_mac_addr = goya_get_mac_addr,
 	.rreg = hl_rreg,
 	.wreg = hl_wreg,
 	.halt_coresight = goya_halt_coresight,
diff --git a/include/uapi/misc/habanalabs.h b/include/uapi/misc/habanalabs.h
index cd9d05e03464..4c545ae8b6df 100644
--- a/include/uapi/misc/habanalabs.h
+++ b/include/uapi/misc/habanalabs.h
@@ -10,6 +10,7 @@
 
 #include <linux/types.h>
 #include <linux/ioctl.h>
+#include <linux/if_ether.h>
 
 /*
  * Defines that are asic-specific but constitutes as ABI between kernel driver
@@ -248,6 +249,8 @@ enum hl_device_status {
  *                         internal engine.
  * HL_INFO_DEVICE_STATUS - Retrieve the device's status. This opcode doesn't
  *                         require an open context.
+ * HL_INFO_MAC_ADDR      - Retrieve the list of MAC addresses of the device's
+ *                         network ports, if the device has network ports.
  * HL_INFO_DEVICE_UTILIZATION  - Retrieve the total utilization of the device
  *                               over the last period specified by the user.
  *                               The period can be between 100ms to 1s, in
@@ -274,6 +277,7 @@ enum hl_device_status {
 #define HL_INFO_DRAM_USAGE		2
 #define HL_INFO_HW_IDLE			3
 #define HL_INFO_DEVICE_STATUS		4
+#define HL_INFO_MAC_ADDR		5
 #define HL_INFO_DEVICE_UTILIZATION	6
 #define HL_INFO_HW_EVENTS_AGGREGATE	7
 #define HL_INFO_CLK_RATE		8
@@ -285,9 +289,11 @@ enum hl_device_status {
 #define HL_INFO_SYNC_MANAGER		14
 #define HL_INFO_TOTAL_ENERGY		15
 
-#define HL_INFO_VERSION_MAX_LEN	128
+#define HL_INFO_VERSION_MAX_LEN		128
 #define HL_INFO_CARD_NAME_MAX_LEN	16
 
+#define HL_INFO_MAC_ADDR_MAX_NUM	128
+
 struct hl_info_hw_ip_info {
 	__u64 sram_base_address;
 	__u64 dram_base_address;
@@ -334,6 +340,18 @@ struct hl_info_device_status {
 	__u32 pad;
 };
 
+struct hl_mac_addr {
+	__u8 addr[ETH_ALEN];
+	__u8 pad[2];
+};
+
+struct hl_info_mac_addr {
+	/* MAC address at index N is of the corresponding PORT ID */
+	struct hl_mac_addr array[HL_INFO_MAC_ADDR_MAX_NUM];
+	/* Mask of valid entries at the MAC addresses array */
+	__u64 mask[2];
+};
+
 struct hl_info_device_utilization {
 	__u32 utilization;
 	__u32 pad;
-- 
2.17.1



* [PATCH v3 08/14] habanalabs/gaudi: add a new IOCTL for NIC control operations
  2020-09-15 17:10 [PATCH v3 00/14] Adding GAUDI NIC code to habanalabs driver Oded Gabbay
                   ` (5 preceding siblings ...)
  2020-09-15 17:10 ` [PATCH v3 07/14] habanalabs/gaudi: allow user to get MAC addresses in INFO IOCTL Oded Gabbay
@ 2020-09-15 17:10 ` Oded Gabbay
  2020-09-15 17:10 ` [PATCH v3 09/14] habanalabs/gaudi: add CQ " Oded Gabbay
                   ` (8 subsequent siblings)
  15 siblings, 0 replies; 83+ messages in thread
From: Oded Gabbay @ 2020-09-15 17:10 UTC (permalink / raw)
  To: linux-kernel, netdev
  Cc: SW_Drivers, gregkh, davem, kuba, andrew, f.fainelli, Omer Shpigelman

From: Omer Shpigelman <oshpigelman@habana.ai>

Add Queue Pair (QP) opcodes to the NIC ioctl.

A QP represents a connection between two Gaudi ports. Each port currently
supports 1024 QPs, where QP 0 is reserved for the driver for Ethernet.
A user-space process needs to create a QP in order to communicate with
other Gaudis.

A QP can have two contexts: requester (sender) and responder (receiver).
Both have unique parameters as well as shared ones.

The QP numbers are not recycled immediately but only after wraparound. This
is to avoid cases where a QP was closed and reopened and then received data
intended for the "old" QP.

The added opcodes are:

- Create a QP
- Set requester context
- Set responder context
- Destroy a QP

Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
---
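Note for readers (not part of the commit message): a self-contained
user-space sketch of the "recycle only after wraparound" policy described
above. It is not the driver's implementation, which lives in gaudi_nic.c
and is not reproduced here. The names struct qp_allocator, qp_alloc() and
qp_free() are hypothetical. Only the 1024-QP limit and the reservation of
QP 0 are taken from the commit message.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MAX_QPS		1024	/* QPs per port, per the commit message */
#define FIRST_USER_QP	1	/* QP 0 is reserved for Ethernet */

/* Hypothetical allocator state for a single port. */
struct qp_allocator {
	bool in_use[MAX_QPS];
	uint32_t next;		/* where the next search starts */
};

static uint32_t qp_advance(uint32_t id)
{
	return (id + 1 == MAX_QPS) ? FIRST_USER_QP : id + 1;
}

static void qp_allocator_init(struct qp_allocator *a)
{
	memset(a, 0, sizeof(*a));
	a->in_use[0] = true;
	a->next = FIRST_USER_QP;
}

/*
 * Search forward from the last allocation point, so a freed number is
 * handed out again only after the search has wrapped around.
 * Returns the QP number, or -1 if every QP is in use.
 */
static int qp_alloc(struct qp_allocator *a)
{
	uint32_t i, id = a->next;

	for (i = 0; i < MAX_QPS - FIRST_USER_QP; i++) {
		if (!a->in_use[id]) {
			a->in_use[id] = true;
			a->next = qp_advance(id);
			return (int)id;
		}
		id = qp_advance(id);
	}

	return -1;
}

/* Returns 0 on success, -1 for a reserved, out-of-range or free QP. */
static int qp_free(struct qp_allocator *a, uint32_t id)
{
	if (id < FIRST_USER_QP || id >= MAX_QPS || !a->in_use[id])
		return -1;

	a->in_use[id] = false;
	return 0;
}
```

With this policy, destroying QP 1 and immediately creating a new QP yields
a fresh number rather than 1, so late packets addressed to the old QP
cannot be mistaken for traffic of the new one.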
 drivers/misc/habanalabs/common/habanalabs.h   |   3 +
 .../misc/habanalabs/common/habanalabs_ioctl.c |  98 ++++-
 drivers/misc/habanalabs/gaudi/gaudi.c         |   1 +
 drivers/misc/habanalabs/gaudi/gaudiP.h        |   2 +
 drivers/misc/habanalabs/gaudi/gaudi_nic.c     | 406 ++++++++++++++++++
 drivers/misc/habanalabs/goya/goya.c           |   9 +
 include/uapi/misc/habanalabs.h                | 129 +++++-
 7 files changed, 646 insertions(+), 2 deletions(-)

diff --git a/drivers/misc/habanalabs/common/habanalabs.h b/drivers/misc/habanalabs/common/habanalabs.h
index fee04299360d..cae6d1e26c36 100644
--- a/drivers/misc/habanalabs/common/habanalabs.h
+++ b/drivers/misc/habanalabs/common/habanalabs.h
@@ -696,6 +696,7 @@ struct hl_info_mac_addr;
  *                    then the timeout is the default timeout for the specific
  *                    ASIC
  * @get_hw_state: retrieve the H/W state
+ * @nic_control: Perform NIC related operations.
  * @pci_bars_map: Map PCI BARs.
  * @init_iatu: Initialize the iATU unit inside the PCI controller.
  * @get_mac_addr: Get list of MAC addresses.
@@ -800,6 +801,8 @@ struct hl_asic_funcs {
 	int (*send_cpu_message)(struct hl_device *hdev, u32 *msg,
 				u16 len, u32 timeout, long *result);
 	enum hl_device_hw_state (*get_hw_state)(struct hl_device *hdev);
+	int (*nic_control)(struct hl_device *hdev, u32 op, void *input,
+				void *output);
 	int (*pci_bars_map)(struct hl_device *hdev);
 	int (*init_iatu)(struct hl_device *hdev);
 	int (*get_mac_addr)(struct hl_device *hdev,
diff --git a/drivers/misc/habanalabs/common/habanalabs_ioctl.c b/drivers/misc/habanalabs/common/habanalabs_ioctl.c
index 5db6c978415c..a0d6a9ad7882 100644
--- a/drivers/misc/habanalabs/common/habanalabs_ioctl.c
+++ b/drivers/misc/habanalabs/common/habanalabs_ioctl.c
@@ -24,6 +24,20 @@ static u32 hl_debug_struct_size[HL_DEBUG_OP_TIMESTAMP + 1] = {
 
 };
 
+static u32 hl_nic_input_size[HL_NIC_OP_DESTROY_CONN + 1] = {
+	[HL_NIC_OP_ALLOC_CONN] = sizeof(struct hl_nic_alloc_conn_in),
+	[HL_NIC_OP_SET_REQ_CONN_CTX] = sizeof(struct hl_nic_req_conn_ctx_in),
+	[HL_NIC_OP_SET_RES_CONN_CTX] = sizeof(struct hl_nic_res_conn_ctx_in),
+	[HL_NIC_OP_DESTROY_CONN] = sizeof(struct hl_nic_destroy_conn_in),
+};
+
+static u32 hl_nic_output_size[HL_NIC_OP_DESTROY_CONN + 1] = {
+	[HL_NIC_OP_ALLOC_CONN] = sizeof(struct hl_nic_alloc_conn_out),
+	[HL_NIC_OP_SET_REQ_CONN_CTX] = 0,
+	[HL_NIC_OP_SET_RES_CONN_CTX] = 0,
+	[HL_NIC_OP_DESTROY_CONN] = 0,
+};
+
 static int device_status_info(struct hl_device *hdev, struct hl_info_args *args)
 {
 	struct hl_info_device_status dev_stat = {0};
@@ -545,6 +559,87 @@ static int hl_debug_ioctl(struct hl_fpriv *hpriv, void *data)
 	return rc;
 }
 
+static int nic_control(struct hl_device *hdev, struct hl_nic_args *args)
+{
+	void *input = NULL, *output = NULL;
+	int rc;
+
+	if (args->input_ptr && args->input_size) {
+		input = kzalloc(hl_nic_input_size[args->op], GFP_KERNEL);
+		if (!input) {
+			rc = -ENOMEM;
+			goto out;
+		}
+
+		if (copy_from_user(input, u64_to_user_ptr(args->input_ptr),
+					args->input_size)) {
+			rc = -EFAULT;
+			dev_err(hdev->dev, "failed to copy input NIC data\n");
+			goto out;
+		}
+	}
+
+	if (args->output_ptr && args->output_size) {
+		output = kzalloc(hl_nic_output_size[args->op], GFP_KERNEL);
+		if (!output) {
+			rc = -ENOMEM;
+			goto out;
+		}
+	}
+
+	rc = hdev->asic_funcs->nic_control(hdev, args->op, input, output);
+	if (rc)
+		dev_err_ratelimited(hdev->dev,
+				"NIC control operation %d failed %d\n",
+				args->op, rc);
+
+	if (output && copy_to_user((void __user *) (uintptr_t) args->output_ptr,
+					output, args->output_size)) {
+		dev_err(hdev->dev, "copy to user failed in nic ioctl\n");
+		rc = -EFAULT;
+		goto out;
+	}
+
+out:
+	kfree(output);
+	kfree(input);
+
+	return rc;
+}
+
+static int hl_nic_ioctl(struct hl_fpriv *hpriv, void *data)
+{
+	struct hl_device *hdev = hpriv->hdev;
+	struct hl_nic_args *args = data;
+	int rc;
+
+	if (hl_device_disabled_or_in_reset(hdev)) {
+		dev_warn_ratelimited(hdev->dev,
+			"Device is %s. Can't execute NIC IOCTL\n",
+			atomic_read(&hdev->in_reset) ? "in_reset" : "disabled");
+		return -EBUSY;
+	}
+
+	switch (args->op) {
+	case HL_NIC_OP_ALLOC_CONN:
+	case HL_NIC_OP_SET_REQ_CONN_CTX:
+	case HL_NIC_OP_SET_RES_CONN_CTX:
+	case HL_NIC_OP_DESTROY_CONN:
+		args->input_size =
+			min(args->input_size, hl_nic_input_size[args->op]);
+		args->output_size =
+			min(args->output_size, hl_nic_output_size[args->op]);
+		rc = nic_control(hdev, args);
+		break;
+	default:
+		dev_err(hdev->dev, "Invalid request %d\n", args->op);
+		rc = -ENOTTY;
+		break;
+	}
+
+	return rc;
+}
+
 #define HL_IOCTL_DEF(ioctl, _func) \
 	[_IOC_NR(ioctl)] = {.cmd = ioctl, .func = _func}
 
@@ -554,7 +649,8 @@ static const struct hl_ioctl_desc hl_ioctls[] = {
 	HL_IOCTL_DEF(HL_IOCTL_CS, hl_cs_ioctl),
 	HL_IOCTL_DEF(HL_IOCTL_WAIT_CS, hl_cs_wait_ioctl),
 	HL_IOCTL_DEF(HL_IOCTL_MEMORY, hl_mem_ioctl),
-	HL_IOCTL_DEF(HL_IOCTL_DEBUG, hl_debug_ioctl)
+	HL_IOCTL_DEF(HL_IOCTL_DEBUG, hl_debug_ioctl),
+	HL_IOCTL_DEF(HL_IOCTL_NIC, hl_nic_ioctl)
 };
 
 static const struct hl_ioctl_desc hl_ioctls_control[] = {
diff --git a/drivers/misc/habanalabs/gaudi/gaudi.c b/drivers/misc/habanalabs/gaudi/gaudi.c
index d2f51497fa8e..9ad34e22f00b 100644
--- a/drivers/misc/habanalabs/gaudi/gaudi.c
+++ b/drivers/misc/habanalabs/gaudi/gaudi.c
@@ -7470,6 +7470,7 @@ static const struct hl_asic_funcs gaudi_funcs = {
 	.get_eeprom_data = gaudi_get_eeprom_data,
 	.send_cpu_message = gaudi_send_cpu_message,
 	.get_hw_state = gaudi_get_hw_state,
+	.nic_control = gaudi_nic_control,
 	.pci_bars_map = gaudi_pci_bars_map,
 	.init_iatu = gaudi_init_iatu,
 	.get_mac_addr = gaudi_nic_get_mac_addr,
diff --git a/drivers/misc/habanalabs/gaudi/gaudiP.h b/drivers/misc/habanalabs/gaudi/gaudiP.h
index 69b3656eaaeb..4143be6479fb 100644
--- a/drivers/misc/habanalabs/gaudi/gaudiP.h
+++ b/drivers/misc/habanalabs/gaudi/gaudiP.h
@@ -566,6 +566,8 @@ void gaudi_nic_stop(struct hl_device *hdev);
 void gaudi_nic_ports_reopen(struct hl_device *hdev);
 int gaudi_nic_get_mac_addr(struct hl_device *hdev,
 				struct hl_info_mac_addr *mac_addr);
+int gaudi_nic_control(struct hl_device *hdev, u32 op, void *input,
+			void *output);
 void gaudi_nic_ctx_fini(struct hl_ctx *ctx);
 irqreturn_t gaudi_nic_rx_irq_handler(int irq, void *arg);
 irqreturn_t gaudi_nic_cq_irq_handler(int irq, void *arg);
diff --git a/drivers/misc/habanalabs/gaudi/gaudi_nic.c b/drivers/misc/habanalabs/gaudi/gaudi_nic.c
index fc4fc80eb005..ed994d25da4f 100644
--- a/drivers/misc/habanalabs/gaudi/gaudi_nic.c
+++ b/drivers/misc/habanalabs/gaudi/gaudi_nic.c
@@ -56,6 +56,7 @@ enum eth_pkt_status {
 #define PCS_FAIL_THRESHOLD		8
 #define PCS_FAULT_THRESHOLD		20
 #define PCS_LINK_RETRY_MSEC		20
+#define QPC_REQ_BURST_SIZE		16
 
 /* NIC_MAX_MTU equals 8K minus eth header */
 #define NIC_MAX_MTU	((1 << 13) - (ETH_HLEN + VLAN_HLEN + ETH_FCS_LEN))
@@ -70,6 +71,9 @@ enum eth_pkt_status {
 #define MAC_CFG_XPCS91(addr, data)	\
 				mac_write(gaudi_nic, i, "xpcs91", addr, data)
 
+static struct hl_qp dummy_qp;
+static int qp_put(struct hl_qp *qp);
+
 bool disabled_or_in_reset(struct gaudi_nic_device *gaudi_nic)
 {
 	return atomic_read(&gaudi_nic->in_reset) ||
@@ -2801,6 +2805,408 @@ int gaudi_nic_get_mac_addr(struct hl_device *hdev,
 out:
 	return 0;
 }
+
+static struct hl_qp *qp_get(struct hl_device *hdev,
+			struct gaudi_nic_device *gaudi_nic, u32 conn_id)
+{
+	struct hl_qp *qp;
+
+	mutex_lock(&gaudi_nic->idr_lock);
+	qp = idr_find(&gaudi_nic->qp_ids, conn_id);
+	if (!qp || qp == &dummy_qp) {
+		dev_err(hdev->dev,
+			"Failed to find matching QP for handle %d in port %d\n",
+			conn_id, gaudi_nic->port);
+		goto out;
+	}
+
+	kref_get(&qp->refcount);
+out:
+	mutex_unlock(&gaudi_nic->idr_lock);
+
+	return qp;
+}
+
+static void qp_do_release(struct hl_qp *qp)
+{
+	mutex_destroy(&qp->qpc_lock);
+	kfree(qp);
+}
+
+static void qp_release(struct kref *ref)
+{
+	struct hl_qp *qp = container_of(ref, struct hl_qp, refcount);
+	struct gaudi_nic_device *gaudi_nic = qp->gaudi_nic;
+	struct hl_device *hdev = gaudi_nic->hdev;
+	struct qpc_requester req_qpc = {};
+	struct qpc_responder res_qpc = {};
+	u64 req_qpc_addr, res_qpc_addr;
+	void __iomem *base_bar_addr;
+	struct gaudi_device *gaudi;
+	int i;
+
+	gaudi = hdev->asic_specific;
+	base_bar_addr = hdev->pcie_bar[HBM_BAR_ID] - gaudi->hbm_bar_cur_addr;
+
+	req_qpc_addr = REQ_QPC_ADDR(qp->port, qp->conn_id);
+	res_qpc_addr = RES_QPC_ADDR(qp->port, qp->conn_id);
+
+	REQ_QPC_SET_VALID(req_qpc, 0);
+	RES_QPC_SET_VALID(res_qpc, 0);
+
+	mutex_lock(&qp->qpc_lock);
+
+	if (qp->is_req)
+		for (i = 0 ; i < (sizeof(req_qpc) / sizeof(u64)) ; i++)
+			writeq(req_qpc.data[i], base_bar_addr +
+					(req_qpc_addr + i * 8));
+
+	if (qp->is_res)
+		for (i = 0 ; i < (sizeof(res_qpc) / sizeof(u64)) ; i++)
+			writeq(res_qpc.data[i], base_bar_addr +
+					(res_qpc_addr + i * 8));
+
+	/* Perform read to flush the writes of the connection context */
+	readq(hdev->pcie_bar[HBM_BAR_ID]);
+
+	if (qp->is_req)
+		qpc_cache_inv(gaudi_nic, true);
+	if (qp->is_res)
+		qpc_cache_inv(gaudi_nic, false);
+
+	mutex_unlock(&qp->qpc_lock);
+
+	/*
+	 * No need in removing the QP ID from the IDR. This will be done once
+	 * the IDR gets full. We do this lazy cleanup because we don't want to
+	 * reuse a QP ID immediately after a QP was destroyed.
+	 */
+	qp_do_release(qp);
+}
+
+static int qp_put(struct hl_qp *qp)
+{
+	return kref_put(&qp->refcount, qp_release);
+}
+
+/* "gaudi_nic->idr_lock" should be taken from the caller function if needed */
+static void qps_clean_dummies(struct gaudi_nic_device *gaudi_nic)
+{
+	struct hl_qp *qp;
+	int qp_id;
+
+	idr_for_each_entry(&gaudi_nic->qp_ids, qp, qp_id)
+		if (qp == &dummy_qp)
+			idr_remove(&gaudi_nic->qp_ids, qp_id);
+}
+
+static int conn_ioctl_check(struct hl_device *hdev, u32 port, u32 conn_id)
+{
+	if (port >= NIC_NUMBER_OF_PORTS) {
+		dev_err(hdev->dev, "Invalid port %d\n", port);
+		return -EINVAL;
+	}
+
+	if (!(hdev->nic_ports_mask & BIT(port))) {
+		dev_err(hdev->dev, "Port %d is disabled\n", port);
+		return -ENODEV;
+	}
+
+	if (conn_id < HL_NIC_MIN_CONN_ID || conn_id > HL_NIC_MAX_CONN_ID) {
+		dev_err(hdev->dev, "Invalid connection ID %d for port %d\n",
+			conn_id, port);
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+static int alloc_conn(struct hl_device *hdev, struct hl_nic_alloc_conn_in *in,
+			struct hl_nic_alloc_conn_out *out)
+{
+	struct gaudi_device *gaudi = hdev->asic_specific;
+	struct gaudi_nic_device *gaudi_nic;
+	struct hl_qp *qp;
+	int id, rc;
+
+	if (!in || !out) {
+		dev_err(hdev->dev,
+			"Missing parameters to allocate a NIC context\n");
+		return -EINVAL;
+	}
+
+	rc = conn_ioctl_check(hdev, in->port, HL_NIC_MIN_CONN_ID);
+	if (rc)
+		return rc;
+
+	qp = kzalloc(sizeof(*qp), GFP_KERNEL);
+	if (!qp)
+		return -ENOMEM;
+
+	gaudi_nic = &gaudi->nic_devices[in->port];
+	mutex_init(&qp->qpc_lock);
+	kref_init(&qp->refcount);
+	qp->gaudi_nic = gaudi_nic;
+	qp->port = in->port;
+
+	/* TODO: handle local/remote keys */
+
+	mutex_lock(&gaudi_nic->idr_lock);
+	id = idr_alloc(&gaudi_nic->qp_ids, qp, HL_NIC_MIN_CONN_ID,
+			HL_NIC_MAX_CONN_ID + 1, GFP_KERNEL);
+
+	if (id < 0) {
+		/* Try again after removing the dummy ids */
+		qps_clean_dummies(gaudi_nic);
+		id = idr_alloc(&gaudi_nic->qp_ids, qp, HL_NIC_MIN_CONN_ID,
+				HL_NIC_MAX_CONN_ID + 1, GFP_KERNEL);
+	}
+
+	qp->conn_id = id;
+	mutex_unlock(&gaudi_nic->idr_lock);
+
+	if (id < 0) {
+		qp_do_release(qp);
+		return id;
+	}
+
+	dev_dbg(hdev->dev, "Allocating connection id %d in port %d",
+		id, qp->port);
+
+	out->conn_id = id;
+
+	return 0;
+}
+
+static int set_req_conn_ctx(struct hl_device *hdev,
+				struct hl_nic_req_conn_ctx_in *in)
+{
+	struct gaudi_device *gaudi = hdev->asic_specific;
+	struct gaudi_nic_device *gaudi_nic;
+	struct qpc_requester req_qpc = {};
+	struct hl_qp *qp;
+	u64 req_qpc_addr;
+	int i, rc;
+
+	if (!in) {
+		dev_err(hdev->dev,
+			"Missing parameters to set a requester context\n");
+		return -EINVAL;
+	}
+
+	rc = conn_ioctl_check(hdev, in->port, in->conn_id);
+	if (rc)
+		return rc;
+
+	gaudi_nic = &gaudi->nic_devices[in->port];
+
+	qp = qp_get(hdev, gaudi_nic, in->conn_id);
+	if (!qp)
+		return -EINVAL;
+
+	req_qpc_addr = REQ_QPC_ADDR(in->port, in->conn_id);
+	REQ_QPC_SET_DST_QP(req_qpc, in->dst_conn_id);
+	REQ_QPC_SET_PORT(req_qpc, 0);
+	REQ_QPC_SET_PRIORITY(req_qpc, in->priority);
+	REQ_QPC_SET_RKEY(req_qpc, qp->remote_key);
+	REQ_QPC_SET_DST_IP(req_qpc, in->dst_ip_addr);
+	REQ_QPC_SET_SRC_IP(req_qpc, in->src_ip_addr);
+	REQ_QPC_SET_DST_MAC_31_0(req_qpc, *(u32 *) in->dst_mac_addr);
+	REQ_QPC_SET_DST_MAC_47_32(req_qpc, *(u16 *) (in->dst_mac_addr + 4));
+	REQ_QPC_SET_SQ_NUM(req_qpc, in->sq_number);
+	REQ_QPC_SET_TM_GRANULARITY(req_qpc, in->timer_granularity);
+	REQ_QPC_SET_SOB_EN(req_qpc, in->enable_sob);
+	REQ_QPC_SET_TRANSPORT_SERVICE(req_qpc, TS_RC);
+	REQ_QPC_SET_BURST_SIZE(req_qpc, QPC_REQ_BURST_SIZE);
+	REQ_QPC_SET_LAST_IDX(req_qpc, in->last_index);
+	REQ_QPC_SET_WQ_BASE_ADDR(req_qpc, in->conn_id);
+	REQ_QPC_SET_SWQ_GRANULARITY(req_qpc, in->swq_granularity);
+	REQ_QPC_SET_VALID(req_qpc, 1);
+
+	mutex_lock(&qp->qpc_lock);
+
+	for (i = 0 ; i < (sizeof(req_qpc) / sizeof(u64)) ; i++)
+		writeq(req_qpc.data[i], hdev->pcie_bar[HBM_BAR_ID] +
+			((req_qpc_addr + i * 8) - gaudi->hbm_bar_cur_addr));
+
+	/* Perform read to flush the writes of the connection context */
+	readq(hdev->pcie_bar[HBM_BAR_ID]);
+
+	qp->is_req = true;
+	qpc_cache_inv(gaudi_nic, true);
+
+	mutex_unlock(&qp->qpc_lock);
+
+	qp_put(qp);
+
+	return 0;
+}
+
+static int set_res_conn_ctx(struct hl_device *hdev,
+				struct hl_nic_res_conn_ctx_in *in)
+{
+	struct gaudi_device *gaudi = hdev->asic_specific;
+	struct gaudi_nic_device *gaudi_nic;
+	struct qpc_responder res_qpc = {};
+	struct hl_qp *qp;
+	u64 res_qpc_addr;
+	int i, rc;
+
+	if (!in) {
+		dev_err(hdev->dev,
+			"Missing parameters to set a responder context\n");
+		return -EINVAL;
+	}
+
+	rc = conn_ioctl_check(hdev, in->port, in->conn_id);
+	if (rc)
+		return rc;
+
+	gaudi_nic = &gaudi->nic_devices[in->port];
+
+	qp = qp_get(hdev, gaudi_nic, in->conn_id);
+	if (!qp)
+		return -EINVAL;
+
+	res_qpc_addr = RES_QPC_ADDR(in->port, in->conn_id);
+	RES_QPC_SET_DST_QP(res_qpc, in->dst_conn_id);
+	RES_QPC_SET_PORT(res_qpc, 0);
+	RES_QPC_SET_PRIORITY(res_qpc, in->priority);
+	RES_QPC_SET_SQ_NUM(res_qpc, in->sq_number);
+	RES_QPC_SET_LKEY(res_qpc, qp->local_key);
+	RES_QPC_SET_DST_IP(res_qpc, in->dst_ip_addr);
+	RES_QPC_SET_SRC_IP(res_qpc, in->src_ip_addr);
+	RES_QPC_SET_DST_MAC_31_0(res_qpc, *(u32 *) in->dst_mac_addr);
+	RES_QPC_SET_DST_MAC_47_32(res_qpc, *(u16 *) (in->dst_mac_addr + 4));
+	RES_QPC_SET_TRANSPORT_SERVICE(res_qpc, TS_RC);
+	RES_QPC_SET_LOG_BUF_SIZE_MASK(res_qpc, 0);
+	RES_QPC_SET_SOB_EN(res_qpc, in->enable_sob);
+	RES_QPC_SET_VALID(res_qpc, 1);
+
+	mutex_lock(&qp->qpc_lock);
+
+	for (i = 0 ; i < (sizeof(res_qpc) / sizeof(u64)) ; i++)
+		writeq(res_qpc.data[i], hdev->pcie_bar[HBM_BAR_ID] +
+			((res_qpc_addr + i * 8) - gaudi->hbm_bar_cur_addr));
+
+	/* Perform read to flush the writes of the connection context */
+	readq(hdev->pcie_bar[HBM_BAR_ID]);
+
+	qp->is_res = true;
+	qpc_cache_inv(gaudi_nic, false);
+
+	mutex_unlock(&qp->qpc_lock);
+
+	qp_put(qp);
+
+	return 0;
+}
+
+static int destroy_conn(struct hl_device *hdev,
+			struct hl_nic_destroy_conn_in *in)
+{
+	struct gaudi_device *gaudi = hdev->asic_specific;
+	struct gaudi_nic_device *gaudi_nic;
+	struct hl_qp *qp;
+	int rc;
+
+	if (!in) {
+		dev_err(hdev->dev,
+			"Missing parameters to destroy a NIC context\n");
+		return -EINVAL;
+	}
+
+	rc = conn_ioctl_check(hdev, in->port, in->conn_id);
+	if (rc)
+		return rc;
+
+	gaudi_nic = &gaudi->nic_devices[in->port];
+
+	/* The QP pointer is replaced with the dummy QP to prevent other threads
+	 * from using the QP. The ID is kept allocated at this stage so the QP
+	 * context can be safely modified. qp_put() is called right afterwards.
+	 */
+	mutex_lock(&gaudi_nic->idr_lock);
+	qp = idr_replace(&gaudi_nic->qp_ids, &dummy_qp, in->conn_id);
+	mutex_unlock(&gaudi_nic->idr_lock);
+
+	if (IS_ERR(qp))
+		return PTR_ERR(qp);
+
+	qp_put(qp);
+
+	return 0;
+}
+
+int gaudi_nic_control(struct hl_device *hdev, u32 op, void *input, void *output)
+{
+	struct gaudi_device *gaudi = hdev->asic_specific;
+	int rc;
+
+	if (!(gaudi->hw_cap_initialized & HW_CAP_NIC_DRV))
+		return -EFAULT;
+
+	switch (op) {
+	case HL_NIC_OP_ALLOC_CONN:
+		rc = alloc_conn(hdev, input, output);
+		break;
+	case HL_NIC_OP_SET_REQ_CONN_CTX:
+		rc = set_req_conn_ctx(hdev, input);
+		break;
+	case HL_NIC_OP_SET_RES_CONN_CTX:
+		rc = set_res_conn_ctx(hdev, input);
+		break;
+	case HL_NIC_OP_DESTROY_CONN:
+		rc = destroy_conn(hdev, input);
+		break;
+	default:
+		dev_err(hdev->dev, "Invalid NIC control request %d\n", op);
+		return -ENOTTY;
+	}
+
+	return rc;
+}
+
+static void qps_destroy(struct hl_device *hdev)
+{
+	struct gaudi_device *gaudi = hdev->asic_specific;
+	struct gaudi_nic_device *gaudi_nic;
+	struct hl_qp *qp;
+	int qp_id, i;
+
+	for (i = 0 ; i < NIC_NUMBER_OF_PORTS ; i++) {
+		if (!(hdev->nic_ports_mask & BIT(i)))
+			continue;
+
+		gaudi_nic = &gaudi->nic_devices[i];
+
+		/*
+		 * No need to acquire "gaudi_nic->idr_lock", as qps_destroy() is
+		 * only called when a context is closed, and in Gaudi we have a
+		 * single context.
+		 */
+
+		qps_clean_dummies(gaudi_nic);
+
+		idr_for_each_entry(&gaudi_nic->qp_ids, qp, qp_id) {
+			idr_remove(&gaudi_nic->qp_ids, qp_id);
+			if (qp_put(qp) != 1)
+				dev_err(hdev->dev,
+					"QP %d of port %d is still alive\n",
+					qp->conn_id, qp->port);
+		}
+	}
+}
+
 void gaudi_nic_ctx_fini(struct hl_ctx *ctx)
 {
+	struct gaudi_device *gaudi = ctx->hdev->asic_specific;
+	struct hl_device *hdev = ctx->hdev;
+
+	if (!(gaudi->hw_cap_initialized & HW_CAP_NIC_DRV))
+		return;
+
+	qps_destroy(hdev);
+	/* wait for the NIC to digest the invalid QPs */
+	msleep(20);
 }
diff --git a/drivers/misc/habanalabs/goya/goya.c b/drivers/misc/habanalabs/goya/goya.c
index 75e3b3bac47c..13b2bfac2b7a 100644
--- a/drivers/misc/habanalabs/goya/goya.c
+++ b/drivers/misc/habanalabs/goya/goya.c
@@ -5269,6 +5269,14 @@ static enum hl_device_hw_state goya_get_hw_state(struct hl_device *hdev)
 	return RREG32(mmHW_STATE);
 }
 
+static int goya_nic_control(struct hl_device *hdev, u32 op, void *input,
+			void *output)
+{
+	dev_err_ratelimited(hdev->dev,
+				"NIC operations cannot be performed on Goya\n");
+	return -ENXIO;
+}
+
 static int goya_get_mac_addr(struct hl_device *hdev,
 			struct hl_info_mac_addr *mac_addr)
 {
@@ -5394,6 +5402,7 @@ static const struct hl_asic_funcs goya_funcs = {
 	.get_eeprom_data = goya_get_eeprom_data,
 	.send_cpu_message = goya_send_cpu_message,
 	.get_hw_state = goya_get_hw_state,
+	.nic_control = goya_nic_control,
 	.pci_bars_map = goya_pci_bars_map,
 	.init_iatu = goya_init_iatu,
 	.get_mac_addr = goya_get_mac_addr,
diff --git a/include/uapi/misc/habanalabs.h b/include/uapi/misc/habanalabs.h
index 4c545ae8b6df..dbee6a16b952 100644
--- a/include/uapi/misc/habanalabs.h
+++ b/include/uapi/misc/habanalabs.h
@@ -852,6 +852,116 @@ struct hl_debug_args {
 #define HL_NIC_MIN_CONN_ID	1
 #define HL_NIC_MAX_CONN_ID	1023
 
+struct hl_nic_alloc_conn_in {
+	/* NIC port ID */
+	__u32 port;
+	__u32 pad;
+};
+
+struct hl_nic_alloc_conn_out {
+	/* Connection ID */
+	__u32 conn_id;
+	__u32 pad;
+};
+
+struct hl_nic_req_conn_ctx_in {
+	/* Source IP address */
+	__u32 src_ip_addr;
+	/* Destination IP address */
+	__u32 dst_ip_addr;
+	/* Destination connection ID */
+	__u32 dst_conn_id;
+	/* Burst size [1..(2^22)-1 or 0 to disable] */
+	__u32 burst_size;
+	/* Index of last entry [2..(2^22)-1] */
+	__u32 last_index;
+	/* NIC port ID */
+	__u32 port;
+	/* Connection ID */
+	__u32 conn_id;
+	/* Destination MAC address */
+	__u8 dst_mac_addr[ETH_ALEN];
+	/* SQ number */
+	__u8 sq_number;
+	/* Connection priority [0..3] */
+	__u8 priority;
+	/* Enable/disable SOB */
+	__u8 enable_sob;
+	/* Timer granularity [0..127]*/
+	__u8 timer_granularity;
+	/* SWQ granularity [0 for 64B or 1 for 32B] */
+	__u8 swq_granularity;
+	/* Work queue type [1..3] */
+	__u8 wq_type;
+	/* Version type in remote side [0..1] */
+	__u8 version;
+	/* Completion queue number */
+	__u8 cq_number;
+	/* Remote Work queue log size [2^QPC] Rendezvous */
+	__u8 wq_remote_log_size;
+	__u8 pad;
+};
+
+struct hl_nic_res_conn_ctx_in {
+	/* Source IP address */
+	__u32 src_ip_addr;
+	/* Destination IP address */
+	__u32 dst_ip_addr;
+	/* Destination connection ID */
+	__u32 dst_conn_id;
+	/* NIC port ID */
+	__u32 port;
+	/* Connection ID */
+	__u32 conn_id;
+	/* Destination MAC address */
+	__u8 dst_mac_addr[ETH_ALEN];
+	/* Connection priority [0..3] */
+	__u8 priority;
+	/* SQ number */
+	__u8 sq_number;
+	/* Enable/disable SOB */
+	__u8 enable_sob;
+	/* Work queue granularity */
+	__u8 wq_peer_granularity;
+	/* Completion queue number */
+	__u8 cq_number;
+	/* Version type in remote side [0..1] */
+	__u8 version;
+	/* Connection peer */
+	__u32 conn_peer;
+};
+
+struct hl_nic_destroy_conn_in {
+	/* NIC port ID */
+	__u32 port;
+	/* Connection ID */
+	__u32 conn_id;
+};
+
+/* Opcode to allocate connection ID */
+#define HL_NIC_OP_ALLOC_CONN			0
+/* Opcode to set up a requester connection context */
+#define HL_NIC_OP_SET_REQ_CONN_CTX		1
+/* Opcode to set up a responder connection context */
+#define HL_NIC_OP_SET_RES_CONN_CTX		2
+/* Opcode to destroy a connection */
+#define HL_NIC_OP_DESTROY_CONN			3
+
+struct hl_nic_args {
+	/* Pointer to user input structure (relevant to specific opcodes) */
+	__u64 input_ptr;
+	/* Pointer to user output structure (relevant to specific opcodes) */
+	__u64 output_ptr;
+	/* Size of user input structure */
+	__u32 input_size;
+	/* Size of user output structure */
+	__u32 output_size;
+	/* Context ID - Currently not in use */
+	__u32 ctx_id;
+	/* HL_NIC_OP_* */
+	__u32 op;
+};
+
 /*
  * Various information operations such as:
  * - H/W IP information
@@ -1017,7 +1127,24 @@ struct hl_debug_args {
 #define HL_IOCTL_DEBUG		\
 		_IOWR('H', 0x06, struct hl_debug_args)
 
+/*
+ * NIC
+ *
+ * This IOCTL allows the user to manage and configure the device's NIC ports.
+ * The following operations are available:
+ * - Allocate connection ID
+ * - Set up a requester connection context
+ * - Set up a responder connection context
+ * - Destroy a connection
+ *
+ * For all operations, the user should provide a pointer to an input structure
+ * with the context parameters. Some of the operations also require a pointer to
+ * an output structure for result/status.
+ *
+ */
+#define HL_IOCTL_NIC	_IOWR('H', 0x07, struct hl_nic_args)
+
 #define HL_COMMAND_START	0x01
-#define HL_COMMAND_END		0x07
+#define HL_COMMAND_END		0x08
 
 #endif /* HABANALABS_H_ */
-- 
2.17.1



* [PATCH v3 09/14] habanalabs/gaudi: add CQ control operations
  2020-09-15 17:10 [PATCH v3 00/14] Adding GAUDI NIC code to habanalabs driver Oded Gabbay
                   ` (6 preceding siblings ...)
  2020-09-15 17:10 ` [PATCH v3 08/14] habanalabs/gaudi: add a new IOCTL for NIC control operations Oded Gabbay
@ 2020-09-15 17:10 ` Oded Gabbay
  2020-09-15 17:10 ` [PATCH v3 10/14] habanalabs/gaudi: add WQ " Oded Gabbay
                   ` (7 subsequent siblings)
  15 siblings, 0 replies; 83+ messages in thread
From: Oded Gabbay @ 2020-09-15 17:10 UTC (permalink / raw)
  To: linux-kernel, netdev
  Cc: SW_Drivers, gregkh, davem, kuba, andrew, f.fainelli, Omer Shpigelman

From: Omer Shpigelman <oshpigelman@habana.ai>

Add NIC Completion Queue (CQ) opcodes to the NIC ioctl. The CQ is used by
the user-space process to get notifications of completed work.

A CQ entry (CQE) is one of three types: requester (sender), responder
(receiver) or error. Each type has unique fields as well as shared ones.
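
One way to picture "unique fields as well as shared ones" is a tagged
union, sketched below. The field names follow the ones used by cq_work() in
this patch (port, qp_number, requester.wqe_index, responder.msg_id), but
the struct layout, the enum values and the helper names are assumptions
made for illustration and are not the uapi definition. The error variant is
left without private fields because they do not appear in this part of the
patch.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative type tags; the real values are defined by the uapi header. */
enum cqe_type { CQE_TYPE_REQ, CQE_TYPE_RES, CQE_TYPE_ERR };

struct sketch_cqe {
	/* Fields shared by all CQE types. */
	enum cqe_type type;
	uint32_t port;
	uint32_t qp_number;
	/* Fields that are specific to one CQE type. */
	union {
		struct { uint32_t wqe_index; } requester;
		struct { uint32_t msg_id; } responder;
	};
};

/*
 * Build the 32-bit responder message ID from its two H/W fields:
 * bits 31..22 (10 bits) and bits 21..0 (22 bits), as cq_work() does.
 */
static uint32_t res_msg_id(uint32_t imdt_31_22, uint32_t imdt_21_0)
{
	assert(imdt_31_22 < (1u << 10));
	assert(imdt_21_0 < (1u << 22));
	return (imdt_31_22 << 22) | imdt_21_0;
}

/* Fill a responder CQE; qp_number is left at zero in this sketch. */
static struct sketch_cqe make_res_cqe(uint32_t port, uint32_t imdt_31_22,
				      uint32_t imdt_21_0)
{
	struct sketch_cqe cqe = { .type = CQE_TYPE_RES, .port = port };

	cqe.responder.msg_id = res_msg_id(imdt_31_22, imdt_21_0);
	return cqe;
}
```

A consumer would switch on the type field first and only then read the
matching member of the union.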

Currently only a single user CQ is supported but it may be extended in the
future, hence proper locking was added as well. In addition, an error
interrupt was added to identify CQ overrun.

The added opcodes are:
- Create CQ
- Destroy CQ
- Wait on CQ: sleep until CQEs are available in the buffer.
- Poll CQ: check whether CQEs are available in the buffer. This is a
           non-blocking operation.
- Update consumed CQEs: the user informs the driver of processed CQEs so
                        that the driver can overwrite them.
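
The interplay between the driver filling the user CQ, the poll opcode and
the "update consumed CQEs" opcode can be sketched as a single-threaded ring
buffer. This is a simplified model under stated assumptions: the ring size
(4) and all names (user_cq, cq_push, cq_poll, cq_update_consumed) are
hypothetical, and the locking, memory barriers and blocking wait used by
the driver are left out.

```c
#include <assert.h>
#include <stdint.h>

#define CQ_ENTRIES 4	/* hypothetical size; must be a power of two */

struct cqe {
	uint32_t port;
	uint32_t qp_number;
};

struct user_cq {
	struct cqe buf[CQ_ENTRIES];
	uint32_t pi;		/* free-running producer index */
	uint32_t new_cqes;	/* produced but not yet consumed */
	int overflow;		/* set when a CQE arrives while full */
};

static void cq_init(struct user_cq *cq)
{
	assert(cq);
	*cq = (struct user_cq){ 0 };
}

/* Driver side: store a CQE and count it, or flag overflow if full. */
static void cq_push(struct user_cq *cq, struct cqe e)
{
	/* wraparound according to the ring length */
	uint32_t slot = cq->pi++ & (CQ_ENTRIES - 1);

	cq->buf[slot] = e;

	if (cq->new_cqes == CQ_ENTRIES)
		cq->overflow = 1;
	else
		cq->new_cqes++;
}

/* Poll: non-blocking, reports how many CQEs are waiting. */
static uint32_t cq_poll(const struct user_cq *cq)
{
	return cq->new_cqes;
}

/* Update consumed: hand n processed CQEs back so slots can be reused. */
static int cq_update_consumed(struct user_cq *cq, uint32_t n)
{
	if (n > cq->new_cqes)
		return -1;
	cq->new_cqes -= n;
	return 0;
}
```

In this model the user must report consumed CQEs before the producer wraps
around. Otherwise the next push overwrites an unread entry and the overflow
flag is raised, which in the patch means the CQ has to be recreated.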

Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
---
 drivers/misc/habanalabs/common/device.c       |   6 +-
 drivers/misc/habanalabs/common/habanalabs.h   |   3 +
 .../misc/habanalabs/common/habanalabs_ioctl.c |  20 +-
 drivers/misc/habanalabs/gaudi/gaudi.c         |   1 +
 drivers/misc/habanalabs/gaudi/gaudiP.h        |   1 +
 drivers/misc/habanalabs/gaudi/gaudi_nic.c     | 594 ++++++++++++++++++
 drivers/misc/habanalabs/goya/goya.c           |   8 +
 include/uapi/misc/habanalabs.h                | 111 ++++
 8 files changed, 741 insertions(+), 3 deletions(-)

diff --git a/drivers/misc/habanalabs/common/device.c b/drivers/misc/habanalabs/common/device.c
index 196e35d71118..73d64f84aeba 100644
--- a/drivers/misc/habanalabs/common/device.c
+++ b/drivers/misc/habanalabs/common/device.c
@@ -117,12 +117,13 @@ static int hl_device_release_ctrl(struct inode *inode, struct file *filp)
  * @*filp: pointer to file structure
  * @*vma: pointer to vm_area_struct of the process
  *
- * Called when process does an mmap on habanalabs device. Call the device's mmap
+ * Called when process does an mmap on habanalabs device. Call the relevant mmap
  * function at the end of the common code.
  */
 static int hl_mmap(struct file *filp, struct vm_area_struct *vma)
 {
 	struct hl_fpriv *hpriv = filp->private_data;
+	struct hl_device *hdev = hpriv->hdev;
 	unsigned long vm_pgoff;
 
 	vm_pgoff = vma->vm_pgoff;
@@ -131,6 +132,9 @@ static int hl_mmap(struct file *filp, struct vm_area_struct *vma)
 	switch (vm_pgoff & HL_MMAP_TYPE_MASK) {
 	case HL_MMAP_TYPE_CB:
 		return hl_cb_mmap(hpriv, vma);
+
+	case HL_MMAP_TYPE_NIC_CQ:
+		return hdev->asic_funcs->nic_cq_mmap(hdev, vma);
 	}
 
 	return -EINVAL;
diff --git a/drivers/misc/habanalabs/common/habanalabs.h b/drivers/misc/habanalabs/common/habanalabs.h
index cae6d1e26c36..65bc2527338b 100644
--- a/drivers/misc/habanalabs/common/habanalabs.h
+++ b/drivers/misc/habanalabs/common/habanalabs.h
@@ -32,6 +32,7 @@
 #define HL_MMAP_TYPE_SHIFT		(62 - PAGE_SHIFT)
 #define HL_MMAP_TYPE_MASK		(0x3ull << HL_MMAP_TYPE_SHIFT)
 #define HL_MMAP_TYPE_CB			(0x2ull << HL_MMAP_TYPE_SHIFT)
+#define HL_MMAP_TYPE_NIC_CQ		(0x1ull << HL_MMAP_TYPE_SHIFT)
 
 #define HL_MMAP_OFFSET_VALUE_MASK	(0x3FFFFFFFFFFFull >> PAGE_SHIFT)
 #define HL_MMAP_OFFSET_VALUE_GET(off)	(off & HL_MMAP_OFFSET_VALUE_MASK)
@@ -697,6 +698,7 @@ struct hl_info_mac_addr;
  *                    ASIC
  * @get_hw_state: retrieve the H/W state
  * @nic_control: Perform NIC related operations.
+ * @nic_cq_mmap: map the NIC CQ buffer.
  * @pci_bars_map: Map PCI BARs.
  * @init_iatu: Initialize the iATU unit inside the PCI controller.
  * @get_mac_addr: Get list of MAC addresses.
@@ -803,6 +805,7 @@ struct hl_asic_funcs {
 	enum hl_device_hw_state (*get_hw_state)(struct hl_device *hdev);
 	int (*nic_control)(struct hl_device *hdev, u32 op, void *input,
 				void *output);
+	int (*nic_cq_mmap)(struct hl_device *hdev, struct vm_area_struct *vma);
 	int (*pci_bars_map)(struct hl_device *hdev);
 	int (*init_iatu)(struct hl_device *hdev);
 	int (*get_mac_addr)(struct hl_device *hdev,
diff --git a/drivers/misc/habanalabs/common/habanalabs_ioctl.c b/drivers/misc/habanalabs/common/habanalabs_ioctl.c
index a0d6a9ad7882..6ba1b9da0486 100644
--- a/drivers/misc/habanalabs/common/habanalabs_ioctl.c
+++ b/drivers/misc/habanalabs/common/habanalabs_ioctl.c
@@ -24,18 +24,29 @@ static u32 hl_debug_struct_size[HL_DEBUG_OP_TIMESTAMP + 1] = {
 
 };
 
-static u32 hl_nic_input_size[HL_NIC_OP_DESTROY_CONN + 1] = {
+static u32 hl_nic_input_size[HL_NIC_OP_CQ_UPDATE_CONSUMED_CQES + 1] = {
 	[HL_NIC_OP_ALLOC_CONN] = sizeof(struct hl_nic_alloc_conn_in),
 	[HL_NIC_OP_SET_REQ_CONN_CTX] = sizeof(struct hl_nic_req_conn_ctx_in),
 	[HL_NIC_OP_SET_RES_CONN_CTX] = sizeof(struct hl_nic_res_conn_ctx_in),
 	[HL_NIC_OP_DESTROY_CONN] = sizeof(struct hl_nic_destroy_conn_in),
+	[HL_NIC_OP_CQ_CREATE] = sizeof(struct hl_nic_cq_create_in),
+	[HL_NIC_OP_CQ_DESTROY] = sizeof(struct hl_nic_cq_destroy_in),
+	[HL_NIC_OP_CQ_WAIT] = sizeof(struct hl_nic_cq_poll_wait_in),
+	[HL_NIC_OP_CQ_POLL] = sizeof(struct hl_nic_cq_poll_wait_in),
+	[HL_NIC_OP_CQ_UPDATE_CONSUMED_CQES] =
+			sizeof(struct hl_nic_cq_update_consumed_cqes_in),
 };
 
-static u32 hl_nic_output_size[HL_NIC_OP_DESTROY_CONN + 1] = {
+static u32 hl_nic_output_size[HL_NIC_OP_CQ_UPDATE_CONSUMED_CQES + 1] = {
 	[HL_NIC_OP_ALLOC_CONN] = sizeof(struct hl_nic_alloc_conn_out),
 	[HL_NIC_OP_SET_REQ_CONN_CTX] = 0,
 	[HL_NIC_OP_SET_RES_CONN_CTX] = 0,
 	[HL_NIC_OP_DESTROY_CONN] = 0,
+	[HL_NIC_OP_CQ_CREATE] = sizeof(struct hl_nic_cq_create_out),
+	[HL_NIC_OP_CQ_DESTROY] = 0,
+	[HL_NIC_OP_CQ_WAIT] = sizeof(struct hl_nic_cq_poll_wait_out),
+	[HL_NIC_OP_CQ_POLL] = sizeof(struct hl_nic_cq_poll_wait_out),
+	[HL_NIC_OP_CQ_UPDATE_CONSUMED_CQES] = 0,
 };
 
 static int device_status_info(struct hl_device *hdev, struct hl_info_args *args)
@@ -625,6 +636,11 @@ static int hl_nic_ioctl(struct hl_fpriv *hpriv, void *data)
 	case HL_NIC_OP_SET_REQ_CONN_CTX:
 	case HL_NIC_OP_SET_RES_CONN_CTX:
 	case HL_NIC_OP_DESTROY_CONN:
+	case HL_NIC_OP_CQ_CREATE:
+	case HL_NIC_OP_CQ_DESTROY:
+	case HL_NIC_OP_CQ_WAIT:
+	case HL_NIC_OP_CQ_POLL:
+	case HL_NIC_OP_CQ_UPDATE_CONSUMED_CQES:
 		args->input_size =
 			min(args->input_size, hl_nic_input_size[args->op]);
 		args->output_size =
diff --git a/drivers/misc/habanalabs/gaudi/gaudi.c b/drivers/misc/habanalabs/gaudi/gaudi.c
index 9ad34e22f00b..4602e4780651 100644
--- a/drivers/misc/habanalabs/gaudi/gaudi.c
+++ b/drivers/misc/habanalabs/gaudi/gaudi.c
@@ -7471,6 +7471,7 @@ static const struct hl_asic_funcs gaudi_funcs = {
 	.send_cpu_message = gaudi_send_cpu_message,
 	.get_hw_state = gaudi_get_hw_state,
 	.nic_control = gaudi_nic_control,
+	.nic_cq_mmap = gaudi_nic_cq_mmap,
 	.pci_bars_map = gaudi_pci_bars_map,
 	.init_iatu = gaudi_init_iatu,
 	.get_mac_addr = gaudi_nic_get_mac_addr,
diff --git a/drivers/misc/habanalabs/gaudi/gaudiP.h b/drivers/misc/habanalabs/gaudi/gaudiP.h
index 4143be6479fb..3158d5d68c1d 100644
--- a/drivers/misc/habanalabs/gaudi/gaudiP.h
+++ b/drivers/misc/habanalabs/gaudi/gaudiP.h
@@ -569,6 +569,7 @@ int gaudi_nic_get_mac_addr(struct hl_device *hdev,
 int gaudi_nic_control(struct hl_device *hdev, u32 op, void *input,
 			void *output);
 void gaudi_nic_ctx_fini(struct hl_ctx *ctx);
+int gaudi_nic_cq_mmap(struct hl_device *hdev, struct vm_area_struct *vma);
 irqreturn_t gaudi_nic_rx_irq_handler(int irq, void *arg);
 irqreturn_t gaudi_nic_cq_irq_handler(int irq, void *arg);
 netdev_tx_t gaudi_nic_handle_tx_pkt(struct gaudi_nic_device *gaudi_nic,
diff --git a/drivers/misc/habanalabs/gaudi/gaudi_nic.c b/drivers/misc/habanalabs/gaudi/gaudi_nic.c
index ed994d25da4f..999e9ded22fb 100644
--- a/drivers/misc/habanalabs/gaudi/gaudi_nic.c
+++ b/drivers/misc/habanalabs/gaudi/gaudi_nic.c
@@ -1757,6 +1757,466 @@ void gaudi_nic_sw_fini(struct hl_device *hdev)
 		_gaudi_nic_sw_fini(&gaudi->nic_devices[i]);
 }
 
+/* this function is called from multiple threads */
+static void copy_cqe_to_main_queue(struct hl_device *hdev,
+					struct hl_nic_cqe *cqe)
+{
+	struct gaudi_device *gaudi = hdev->asic_specific;
+	u32 pi;
+
+	spin_lock(&gaudi->nic_cq_lock);
+
+	pi = gaudi->nic_cq_user_pi++;
+	/* wraparound according to the user CQ length */
+	pi &= (gaudi->nic_cq_user_num_of_entries - 1);
+	memcpy(&gaudi->nic_cq_buf[pi], cqe, sizeof(*cqe));
+
+#if HL_NIC_DEBUG
+	if (cqe->type == HL_NIC_CQE_TYPE_RES) {
+		dev_dbg(hdev->dev,
+			"responder, msg_id: 0x%x, port: %d, was copied to pi %d\n",
+			cqe->responder.msg_id, cqe->port, pi);
+	} else {
+		dev_dbg(hdev->dev,
+			"requester, wqe_index: 0x%x, qp_number: %d, port: %d, was copied to pi %d\n",
+			cqe->requester.wqe_index,
+			cqe->qp_number, cqe->port, pi);
+	}
+#endif
+
+	/* copy the CQE before the counter update */
+	mb();
+
+	if (unlikely(!atomic_add_unless(&gaudi->nic_cq_user_new_cqes, 1,
+				gaudi->nic_cq_user_num_of_entries))) {
+		gaudi->nic_cq_status = HL_NIC_CQ_OVERFLOW;
+		dev_err(hdev->dev, "NIC CQ overflow, should recreate NIC CQ\n");
+	}
+
+	spin_unlock(&gaudi->nic_cq_lock);
+}
+
+static void cq_work(struct work_struct *work)
+{
+	struct gaudi_nic_device *gaudi_nic = container_of(work,
+							struct gaudi_nic_device,
+							cq_work.work);
+	u32 ci = gaudi_nic->cq_ci, cqe_cnt = 0, port = gaudi_nic->port, delay;
+	struct gaudi_device *gaudi = gaudi_nic->hdev->asic_specific;
+	struct cqe *cq_arr = gaudi_nic->cq_mem_cpu, *cqe_hw;
+	struct hl_device *hdev = gaudi_nic->hdev;
+	struct hl_nic_cqe cqe_sw;
+	bool stop_work = false;
+
+	while (1) {
+		if (unlikely(!gaudi->nic_cq_enable) ||
+			unlikely(gaudi->nic_cq_status != HL_NIC_CQ_SUCCESS)) {
+			stop_work = true;
+			break;
+		}
+
+		memset(&cqe_sw, 0, sizeof(cqe_sw));
+
+		/* wraparound according to our buffer length */
+		cqe_hw = &cq_arr[ci & (CQ_PORT_BUF_LEN - 1)];
+
+		if (!CQE_IS_VALID(cqe_hw))
+			break;
+		/* Make sure we read CQE contents after the valid bit check */
+		dma_rmb();
+
+		cqe_sw.port = port;
+
+		if (CQE_TYPE(cqe_hw)) {
+			cqe_sw.type = HL_NIC_CQE_TYPE_RES;
+			cqe_sw.responder.msg_id =
+					(CQE_RES_IMDT_31_22(cqe_hw) << 22) |
+						CQE_RES_IMDT_21_0(cqe_hw);
+
+			/*
+			 * the even port publishes its responder CQEs on the odd
+			 * port CQ. take the correct port in this case.
+			 */
+			if (!CQE_RES_NIC(cqe_hw))
+				cqe_sw.port--;
+		} else {
+			cqe_sw.requester.wqe_index = CQE_REQ_WQE_IDX(cqe_hw);
+			cqe_sw.qp_number = CQE_REQ_QPN(cqe_hw);
+		}
+
+		copy_cqe_to_main_queue(hdev, &cqe_sw);
+
+		CQE_SET_INVALID(cqe_hw);
+
+		/* the H/W CI does wraparound every 32 bit */
+		ci++;
+
+		cqe_cnt++;
+		if (unlikely(cqe_cnt > CQ_PORT_BUF_LEN)) {
+			dev_err(hdev->dev,
+				"handled too many CQEs (%d), port: %d\n",
+				cqe_cnt, port);
+			stop_work = true;
+			break;
+		}
+	}
+
+	/* no CQEs to handle */
+	if (cqe_cnt == 0)
+		goto out;
+
+#if HL_NIC_DEBUG
+	dev_dbg(hdev->dev, "update H/W CQ CI: %d, port: %d\n", ci, port);
+#endif
+
+	NIC_WREG32(mmNIC0_RXE0_CQ_CONSUMER_INDEX, ci);
+
+	/*
+	 * perform a read to flush the new CI value before checking for hidden
+	 * packets
+	 */
+	NIC_RREG32(mmNIC0_RXE0_CQ_CONSUMER_INDEX);
+
+	gaudi_nic->cq_ci = ci;
+
+	/* make sure we wake up the waiter after the CI update */
+	mb();
+
+	/* signal the completion queue that there are available CQEs */
+	complete(&gaudi->nic_cq_comp);
+
+	if (unlikely(stop_work))
+		goto out;
+
+out:
+	if (likely(cqe_cnt)) {
+		gaudi_nic->last_cqe_cnt = cqe_cnt;
+		delay = gaudi_nic->cq_delay;
+	} else {
+		ktime_t later;
+
+		/*
+		 * take base TS on the first polling invocation where no CQEs
+		 * were processed
+		 */
+		if (gaudi_nic->last_cqe_cnt) {
+			gaudi_nic->last_cqe_cnt = 0;
+			gaudi_nic->last_cqe_ts = ktime_get();
+		}
+
+		/* extend the delay if no CQEs were processed for 1 sec */
+		later = ktime_add_ms(gaudi_nic->last_cqe_ts, 1 * MSEC_PER_SEC);
+		if (ktime_compare(ktime_get(), later) > 0)
+			delay = gaudi_nic->cq_delay_idle;
+		else
+			delay = gaudi_nic->cq_delay;
+	}
+
+	queue_delayed_work(gaudi_nic->cq_wq, &gaudi_nic->cq_work, delay);
+}
+
+static int cq_update_consumed_cqes(struct hl_device *hdev,
+				struct hl_nic_cq_update_consumed_cqes_in *in)
+{
+	struct gaudi_device *gaudi = hdev->asic_specific;
+	u32 num_of_cqes;
+	int rc = 0;
+
+	if (!in) {
+		dev_err(hdev->dev,
+			"Missing parameters to update consumed CQEs\n");
+		return -EINVAL;
+	}
+
+	mutex_lock(&gaudi->nic_cq_user_lock);
+
+	if (!gaudi->nic_cq_enable) {
+		dev_err(hdev->dev,
+			"NIC CQ is not enabled, can't update user CI\n");
+		rc = -EFAULT;
+		goto out;
+	}
+
+	num_of_cqes = in->cq_num_of_consumed_entries;
+
+	if (atomic_read(&gaudi->nic_cq_user_new_cqes) < num_of_cqes) {
+		dev_err(hdev->dev,
+			"number of consumed CQEs is too big %d/%d\n",
+			num_of_cqes, atomic_read(&gaudi->nic_cq_user_new_cqes));
+		rc = -EINVAL;
+		goto out;
+	}
+
+	gaudi->nic_cq_user_ci = (gaudi->nic_cq_user_ci + num_of_cqes) &
+				(gaudi->nic_cq_user_num_of_entries - 1);
+
+	atomic_sub(num_of_cqes, &gaudi->nic_cq_user_new_cqes);
+
+#if HL_NIC_DEBUG
+	dev_dbg(hdev->dev, "consumed %d CQEs\n", num_of_cqes);
+	dev_dbg(hdev->dev, "user CQ CI: %d\n", gaudi->nic_cq_user_ci);
+#endif
+out:
+	mutex_unlock(&gaudi->nic_cq_user_lock);
+
+	return rc;
+}
+
+static int cq_poll_wait(struct hl_device *hdev,
+			struct hl_nic_cq_poll_wait_in *in,
+			struct hl_nic_cq_poll_wait_out *out,
+			bool do_wait)
+{
+	struct gaudi_device *gaudi = hdev->asic_specific;
+	char *op_str = do_wait ? "wait" : "poll";
+	bool has_work = false;
+	u32 num_of_cqes;
+	long rc_wait;
+	int rc = 0;
+
+	if (!in || !out) {
+		dev_err(hdev->dev, "Missing parameters to poll/wait on CQ\n");
+		return -EINVAL;
+	}
+
+	/* allow only one thread to wait */
+	mutex_lock(&gaudi->nic_cq_user_lock);
+
+	if (!gaudi->nic_cq_enable) {
+		dev_err(hdev->dev, "NIC CQ is not enabled, can't %s\n", op_str);
+		rc = -EFAULT;
+		goto out;
+	}
+
+	if (gaudi->nic_cq_status != HL_NIC_CQ_SUCCESS) {
+		dev_err(hdev->dev, "NIC CQ is not operational, can't %s\n",
+			op_str);
+		rc = -EFAULT;
+		goto out;
+	}
+
+#if HL_NIC_DEBUG
+	dev_dbg(hdev->dev, "ci: %d, wait: %d\n",
+		gaudi->nic_cq_user_ci, do_wait);
+#endif
+
+	if (do_wait) {
+		while (1) {
+			rc_wait = wait_for_completion_interruptible_timeout(
+					&gaudi->nic_cq_comp,
+					usecs_to_jiffies(in->timeout_us));
+
+			if (rc_wait == -ERESTARTSYS) {
+				dev_info(hdev->dev,
+						"stopping CQ %s due to signal\n",
+						op_str);
+				/* ERESTARTSYS is not returned to the user */
+				rc = -EINTR;
+				break;
+			}
+
+			if (!rc_wait) {
+				gaudi->nic_cq_status = HL_NIC_CQ_TIMEOUT;
+				break;
+			}
+
+			if (!gaudi->nic_cq_enable) {
+				dev_info(hdev->dev,
+						"stopping CQ %s upon request\n",
+						op_str);
+				rc = -EBUSY;
+				break;
+			}
+
+			if (gaudi->nic_cq_status != HL_NIC_CQ_SUCCESS)
+				break;
+
+			/*
+			 * A waiter can read 0 here.
+			 * Consider the following scenario:
+			 * 1. complete() is called twice for two CQEs.
+			 * 2. The first waiter grabs the two CQEs.
+			 * 3. The second waiter wakes up immediately and has no
+			 *    CQEs to handle.
+			 */
+			num_of_cqes = atomic_read(&gaudi->nic_cq_user_new_cqes);
+			if (num_of_cqes) {
+				has_work = true;
+				break;
+			}
+		}
+	} else {
+		has_work = try_wait_for_completion(&gaudi->nic_cq_comp);
+		if (has_work)
+			num_of_cqes = atomic_read(&gaudi->nic_cq_user_new_cqes);
+	}
+
+	if (rc)
+		goto out;
+
+	if (has_work) {
+		out->pi = gaudi->nic_cq_user_ci;
+		out->num_of_cqes = num_of_cqes;
+#if HL_NIC_DEBUG
+		dev_dbg(hdev->dev, "pulled %d CQEs\n", num_of_cqes);
+		dev_dbg(hdev->dev, "user CQ CI: %d\n", gaudi->nic_cq_user_ci);
+#endif
+	} else {
+		out->num_of_cqes = 0;
+	}
+
+	out->status = gaudi->nic_cq_status;
+
+	/* timeout is not a real error, CQ should stay operational */
+	if (gaudi->nic_cq_status == HL_NIC_CQ_TIMEOUT)
+		gaudi->nic_cq_status = HL_NIC_CQ_SUCCESS;
+out:
+	mutex_unlock(&gaudi->nic_cq_user_lock);
+
+	return rc;
+}
+
+static int cq_create(struct hl_device *hdev, struct hl_nic_cq_create_in *in,
+			struct hl_nic_cq_create_out *out)
+{
+	struct gaudi_device *gaudi = hdev->asic_specific;
+	struct gaudi_nic_device *gaudi_nic;
+	struct cqe *cq_arr;
+	int rc = 0, i, j;
+
+	if (!in || !out) {
+		dev_err(hdev->dev, "Missing parameters to create CQ\n");
+		return -EINVAL;
+	}
+
+	if (in->cq_num_of_entries < CQ_USER_MIN_ENTRIES) {
+		dev_err(hdev->dev, "NIC CQ buffer length must be at least %d entries\n",
+			CQ_USER_MIN_ENTRIES);
+		return -EINVAL;
+	}
+
+	if (!is_power_of_2(in->cq_num_of_entries)) {
+		dev_err(hdev->dev,
+			"NIC CQ buffer length must be a power of 2\n");
+		return -EINVAL;
+	}
+
+	if (in->cq_num_of_entries > CQ_USER_MAX_ENTRIES) {
+		dev_err(hdev->dev,
+			"NIC CQ buffer length must not be more than 0x%lx entries\n",
+			CQ_USER_MAX_ENTRIES);
+		return -EINVAL;
+	}
+
+	mutex_lock(&gaudi->nic_cq_user_lock);
+
+	if (gaudi->nic_cq_enable) {
+		dev_err(hdev->dev, "NIC CQ was already created\n");
+		rc = -EFAULT;
+		goto out;
+	}
+
+	gaudi->nic_cq_user_num_of_entries = in->cq_num_of_entries;
+	gaudi->nic_cq_buf = vmalloc_user(gaudi->nic_cq_user_num_of_entries *
+					sizeof(struct hl_nic_cqe));
+	if (!gaudi->nic_cq_buf) {
+		rc = -ENOMEM;
+		goto out;
+	}
+
+	init_completion(&gaudi->nic_cq_comp);
+	memset(gaudi->nic_cq_buf, 0,
+		gaudi->nic_cq_user_num_of_entries * sizeof(struct hl_nic_cqe));
+
+	spin_lock_init(&gaudi->nic_cq_lock);
+	gaudi->nic_cq_user_ci = 0;
+	gaudi->nic_cq_user_pi = 0;
+	atomic_set(&gaudi->nic_cq_user_new_cqes, 0);
+
+	for (i = 0 ; i < NIC_NUMBER_OF_PORTS ; i++) {
+		if (!(hdev->nic_ports_mask & BIT(i)) ||
+			!gaudi->nic_devices[i].port_open)
+			continue;
+
+		gaudi_nic = &gaudi->nic_devices[i];
+		gaudi_nic->cq_ci = gaudi_nic->last_cqe_cnt = 0;
+
+		NIC_WREG32(mmNIC0_RXE0_CQ_PRODUCER_INDEX, 0);
+		NIC_WREG32(mmNIC0_RXE0_CQ_CONSUMER_INDEX, 0);
+		NIC_WREG32(mmNIC0_RXE0_CQ_WRITE_INDEX, 0);
+
+		cq_arr = gaudi_nic->cq_mem_cpu;
+		for (j = 0 ; j < CQ_PORT_BUF_LEN ; j++)
+			CQE_SET_INVALID(&cq_arr[j]);
+
+	}
+
+	out->handle = HL_MMAP_TYPE_NIC_CQ << PAGE_SHIFT;
+	gaudi->nic_cq_status = HL_NIC_CQ_SUCCESS;
+	gaudi->nic_cq_enable = true;
+out:
+	mutex_unlock(&gaudi->nic_cq_user_lock);
+
+	return rc;
+}
+
+static void cq_stop(struct hl_device *hdev)
+{
+	struct gaudi_device *gaudi = hdev->asic_specific;
+
+	if (!gaudi->nic_cq_enable)
+		return;
+
+	/* if the CQ wait IOCTL is in progress, wake it up to return to US */
+	gaudi->nic_cq_enable = false;
+	/* make sure we disable the CQ before waking up the waiter */
+	mb();
+	complete(&gaudi->nic_cq_comp);
+
+	/* let the CQ wait IOCTL do cleanup gracefully */
+	msleep(100);
+}
+
+static int cq_destroy(struct hl_device *hdev)
+{
+	struct gaudi_device *gaudi = hdev->asic_specific;
+	int rc = 0;
+
+	mutex_lock(&gaudi->nic_cq_user_lock);
+
+	if (!gaudi->nic_cq_enable)
+		goto out;
+
+	if (gaudi->nic_cq_mmap) {
+		dev_err(hdev->dev, "NIC CQ is still mapped, can't destroy\n");
+		rc = -EFAULT;
+		goto out;
+	}
+
+	/*
+	 * mark the CQ as disabled while holding the NIC QP error lock to avoid
+	 * pushing QP error entries to a CQ under destruction
+	 */
+	mutex_lock(&gaudi->nic_qp_err_lock);
+	gaudi->nic_cq_enable = false;
+	mutex_unlock(&gaudi->nic_qp_err_lock);
+
+	/* make sure we disable the CQ before draining the polling threads */
+	mb();
+
+	/*
+	 * Wait for the polling threads to digest the new CQ state, so that the
+	 * user buffer is freed only after they have stopped processing CQEs
+	 * and copying them to the buffer.
+	 */
+	msleep(100);
+
+	vfree(gaudi->nic_cq_buf);
+out:
+	mutex_unlock(&gaudi->nic_cq_user_lock);
+
+	return rc;
+}
 
 /* used for physically contiguous memory only */
 static int map_nic_mem(struct hl_device *hdev, u64 va, dma_addr_t pa, u32 size)
@@ -1956,6 +2416,8 @@ static int port_open(struct gaudi_nic_device *gaudi_nic)
 		goto cq_unmap;
 	}
 
+	INIT_DELAYED_WORK(&gaudi_nic->cq_work, cq_work);
+
 	if ((hdev->pdev) && (gaudi->multi_msi_mode)) {
 		rx_irq = pci_irq_vector(hdev->pdev, RX_MSI_IDX + port);
 
@@ -1998,6 +2460,9 @@ static int port_open(struct gaudi_nic_device *gaudi_nic)
 		napi_enable(&gaudi_nic->napi);
 	}
 
+	queue_delayed_work(gaudi_nic->cq_wq, &gaudi_nic->cq_work,
+				gaudi_nic->cq_delay_idle);
+
 	if (gaudi->nic_phy_config_fw && !gaudi_nic->mac_loopback) {
 		INIT_DELAYED_WORK(&gaudi_nic->link_status_work,
 					check_link_status);
@@ -2098,6 +2563,8 @@ static void port_close(struct gaudi_nic_device *gaudi_nic)
 
 	netif_carrier_off(gaudi_nic->ndev);
 
+	cancel_delayed_work_sync(&gaudi_nic->cq_work);
+
 	flush_workqueue(gaudi_nic->cq_wq);
 	destroy_workqueue(gaudi_nic->cq_wq);
 
@@ -2362,6 +2829,33 @@ static void port_unregister(struct gaudi_nic_device *gaudi_nic)
 
 irqreturn_t gaudi_nic_cq_irq_handler(int irq, void *arg)
 {
+	struct gaudi_nic_device *gaudi_nic;
+	struct hl_device *hdev = arg;
+	struct gaudi_device *gaudi;
+	int i;
+
+	gaudi = hdev->asic_specific;
+
+	/* one IRQ for all ports, need to iterate and read the cause */
+	for (i = 0 ; i < NIC_NUMBER_OF_PORTS ; i++) {
+		if (!(hdev->nic_ports_mask & BIT(i)))
+			continue;
+
+		gaudi_nic = &gaudi->nic_devices[i];
+
+		if (disabled_or_in_reset(gaudi_nic))
+			continue;
+
+		if (NIC_RREG32(mmNIC0_RXE0_MSI_CAUSE) & 2) {
+			dev_crit(hdev->dev, "NIC CQ overrun, port %d\n",
+					gaudi_nic->port);
+			NIC_WREG32(mmNIC0_RXE0_MSI_CAUSE, 0);
+			NIC_WREG32(mmNIC0_RXE0_CQ_MSI_CAUSE_CLR, 0xFFFF);
+			/* flush the cause clear */
+			NIC_RREG32(mmNIC0_RXE0_CQ_MSI_CAUSE_CLR);
+		}
+	}
+
 	return IRQ_HANDLED;
 }
 
@@ -2641,6 +3135,8 @@ int gaudi_nic_hard_reset_prepare(struct hl_device *hdev)
 			(gaudi->nic_in_reset))
 		return 0;
 
+	cq_stop(hdev);
+
 	for (i = 0 ; i < NIC_NUMBER_OF_PORTS ; i++) {
 		if (!(hdev->nic_ports_mask & BIT(i)))
 			continue;
@@ -3159,6 +3655,21 @@ int gaudi_nic_control(struct hl_device *hdev, u32 op, void *input, void *output)
 	case HL_NIC_OP_DESTROY_CONN:
 		rc = destroy_conn(hdev, input);
 		break;
+	case HL_NIC_OP_CQ_CREATE:
+		rc = cq_create(hdev, input, output);
+		break;
+	case HL_NIC_OP_CQ_DESTROY:
+		rc = cq_destroy(hdev);
+		break;
+	case HL_NIC_OP_CQ_WAIT:
+		rc = cq_poll_wait(hdev, input, output, true);
+		break;
+	case HL_NIC_OP_CQ_POLL:
+		rc = cq_poll_wait(hdev, input, output, false);
+		break;
+	case HL_NIC_OP_CQ_UPDATE_CONSUMED_CQES:
+		rc = cq_update_consumed_cqes(hdev, input);
+		break;
 	default:
 		dev_err(hdev->dev, "Invalid NIC control request %d\n", op);
 		return -ENOTTY;
@@ -3209,4 +3720,87 @@ void gaudi_nic_ctx_fini(struct hl_ctx *ctx)
 	qps_destroy(hdev);
 	/* wait for the NIC to digest the invalid QPs */
 	msleep(20);
+	cq_destroy(hdev);
+}
+
+static void nic_cq_vm_close(struct vm_area_struct *vma)
+{
+	struct hl_device *hdev = (struct hl_device *) vma->vm_private_data;
+	struct gaudi_device *gaudi = hdev->asic_specific;
+	long new_mmap_size;
+
+	new_mmap_size = gaudi->nic_cq_mmap_size - (vma->vm_end - vma->vm_start);
+
+	dev_dbg(hdev->dev, "munmap NIC CQEs buffer, new_mmap_size: %ld\n",
+		new_mmap_size);
+
+	if (new_mmap_size > 0) {
+		gaudi->nic_cq_mmap_size = new_mmap_size;
+		return;
+	}
+
+	vma->vm_private_data = NULL;
+	gaudi->nic_cq_mmap = false;
+}
+
+static const struct vm_operations_struct nic_cq_vm_ops = {
+	.close = nic_cq_vm_close
+};
+
+int gaudi_nic_cq_mmap(struct hl_device *hdev, struct vm_area_struct *vma)
+{
+	struct gaudi_device *gaudi = hdev->asic_specific;
+	u32 size;
+	int rc;
+
+	if (!(gaudi->hw_cap_initialized & HW_CAP_NIC_DRV))
+		return -EFAULT;
+
+	mutex_lock(&gaudi->nic_cq_user_lock);
+
+	if (!gaudi->nic_cq_enable) {
+		dev_err(hdev->dev, "NIC CQ is disabled, can't mmap\n");
+		rc = -EFAULT;
+		goto out;
+	}
+
+	if (gaudi->nic_cq_mmap) {
+		dev_err(hdev->dev, "NIC CQ is already mmapped, can't mmap\n");
+		rc = -EFAULT;
+		goto out;
+	}
+
+	size = gaudi->nic_cq_user_num_of_entries * sizeof(struct hl_nic_cqe);
+
+	dev_dbg(hdev->dev, "mmap NIC CQ buffer, size: 0x%x\n", size);
+
+	/* Validation check */
+	if ((vma->vm_end - vma->vm_start) != ALIGN(size, PAGE_SIZE)) {
+		dev_err(hdev->dev,
+			"NIC mmap failed, mmap size 0x%lx != 0x%x CQ buffer size\n",
+			vma->vm_end - vma->vm_start, size);
+		rc = -EINVAL;
+		goto out;
+	}
+
+	vma->vm_ops = &nic_cq_vm_ops;
+	vma->vm_private_data = hdev;
+
+	dev_dbg(hdev->dev, "mapping NIC CQ buffer\n");
+
+	vma->vm_flags |= VM_DONTEXPAND | VM_DONTDUMP | VM_DONTCOPY |
+			VM_NORESERVE;
+
+	rc = remap_vmalloc_range(vma, gaudi->nic_cq_buf, 0);
+	if (rc) {
+		dev_err(hdev->dev, "failed to map the NIC CQ buffer\n");
+		goto out;
+	}
+
+	gaudi->nic_cq_mmap_size = size;
+	gaudi->nic_cq_mmap = true;
+out:
+	mutex_unlock(&gaudi->nic_cq_user_lock);
+
+	return rc;
 }
diff --git a/drivers/misc/habanalabs/goya/goya.c b/drivers/misc/habanalabs/goya/goya.c
index 13b2bfac2b7a..9620654eefae 100644
--- a/drivers/misc/habanalabs/goya/goya.c
+++ b/drivers/misc/habanalabs/goya/goya.c
@@ -5277,6 +5277,13 @@ static int goya_nic_control(struct hl_device *hdev, u32 op, void *input,
 	return -ENXIO;
 }
 
+static int goya_nic_mmap(struct hl_device *hdev, struct vm_area_struct *vma)
+{
+	dev_err_ratelimited(hdev->dev,
+			"NIC mmap operations cannot be performed on Goya\n");
+	return -ENXIO;
+}
+
 static int goya_get_mac_addr(struct hl_device *hdev,
 			struct hl_info_mac_addr *mac_addr)
 {
@@ -5403,6 +5410,7 @@ static const struct hl_asic_funcs goya_funcs = {
 	.send_cpu_message = goya_send_cpu_message,
 	.get_hw_state = goya_get_hw_state,
 	.nic_control = goya_nic_control,
+	.nic_cq_mmap = goya_nic_mmap,
 	.pci_bars_map = goya_pci_bars_map,
 	.init_iatu = goya_init_iatu,
 	.get_mac_addr = goya_get_mac_addr,
diff --git a/include/uapi/misc/habanalabs.h b/include/uapi/misc/habanalabs.h
index dbee6a16b952..83a707c207f7 100644
--- a/include/uapi/misc/habanalabs.h
+++ b/include/uapi/misc/habanalabs.h
@@ -852,6 +852,46 @@ struct hl_debug_args {
 #define HL_NIC_MIN_CONN_ID	1
 #define HL_NIC_MAX_CONN_ID	1023
 
+/* Requester */
+#define HL_NIC_CQE_TYPE_REQ	0
+/* Responder */
+#define HL_NIC_CQE_TYPE_RES	1
+
+/**
+ * struct hl_nic_cqe: NIC CQ entry. This structure is shared between the driver
+ *                    and the user application. It represents each entry of the
+ *                    NIC CQ buffer.
+ * @requester.wqe_index: work queue index - for requester only.
+ * @responder.msg_id: message ID to notify which receive action was completed -
+ *                    for responder only.
+ * @qp_err.syndrome: error syndrome of the QP error - for QP error only.
+ * @port: NIC port index of the related CQ.
+ * @qp_number: QP number - for requester or QP error only.
+ * @type: type of the CQE - requester or responder.
+ * @is_err: true for QP error entry, false otherwise.
+ */
+struct hl_nic_cqe {
+	union {
+		struct {
+			__u32 wqe_index;
+		} requester;
+
+		struct {
+			__u32 msg_id;
+		} responder;
+
+		struct {
+			__u32 syndrome;
+		} qp_err;
+	};
+
+	__u32 port;
+	__u32 qp_number;
+	__u8 type;
+	__u8 is_err;
+	__u8 pad[2];
+};
+
 struct hl_nic_alloc_conn_in {
 	/* NIC port ID */
 	__u32 port;
@@ -938,6 +978,53 @@ struct hl_nic_destroy_conn_in {
 	__u32 conn_id;
 };
 
+struct hl_nic_cq_create_in {
+	/* Number of entries in the CQ buffer */
+	__u32 cq_num_of_entries;
+	__u32 pad;
+};
+
+struct hl_nic_cq_create_out {
+	/* Handle of the CQ buffer */
+	__u64 handle;
+};
+
+struct hl_nic_cq_destroy_in {
+	/* Handle of the CQ buffer */
+	__u64 handle;
+};
+
+struct hl_nic_cq_update_consumed_cqes_in {
+	/* Handle of the CQ buffer */
+	__u64 handle;
+	/* Number of consumed CQEs */
+	__u32 cq_num_of_consumed_entries;
+	__u32 pad;
+};
+
+struct hl_nic_cq_poll_wait_in {
+	/* Handle of the CQ buffer */
+	__u64 handle;
+	/* Relative timeout to wait, in microseconds */
+	__u64 timeout_us;
+};
+
+enum hl_nic_cq_status {
+	HL_NIC_CQ_SUCCESS,
+	HL_NIC_CQ_TIMEOUT,
+	HL_NIC_CQ_OVERFLOW
+};
+
+struct hl_nic_cq_poll_wait_out {
+	/* CQE producer index - first CQE to consume */
+	__u32 pi;
+	/* Number of CQEs to consume, starting from pi */
+	__u32 num_of_cqes;
+	/* Return status */
+	__u32 status;
+	__u32 pad;
+};
+
 /* Opcode to allocate connection ID */
 #define HL_NIC_OP_ALLOC_CONN			0
 /* Opcode to set up a requester connection context */
@@ -946,6 +1033,16 @@ struct hl_nic_destroy_conn_in {
 #define HL_NIC_OP_SET_RES_CONN_CTX		2
 /* Opcode to destroy a connection */
 #define HL_NIC_OP_DESTROY_CONN			3
+/* Opcode to create a CQ */
+#define HL_NIC_OP_CQ_CREATE			4
+/* Opcode to destroy a CQ */
+#define HL_NIC_OP_CQ_DESTROY			5
+/* Opcode to wait on CQ */
+#define HL_NIC_OP_CQ_WAIT			6
+/* Opcode to poll on CQ */
+#define HL_NIC_OP_CQ_POLL			7
+/* Opcode to update the number of consumed CQ entries */
+#define HL_NIC_OP_CQ_UPDATE_CONSUMED_CQES	8
 
 struct hl_nic_args {
 	/* Pointer to user input structure (relevant to specific opcodes) */
@@ -1136,10 +1233,24 @@ struct hl_nic_args {
  * - Set up a requester connection context
  * - Set up a responder connection context
  * - Destroy a connection
+ * - Create a completion queue
+ * - Destroy a completion queue
+ * - Wait on completion queue
+ * - Poll a completion queue
+ * - Update consumed completion queue entries
  *
  * For all operations, the user should provide a pointer to an input structure
  * with the context parameters. Some of the operations also require a pointer to
  * an output structure for result/status.
+ * The CQ create operation returns a handle which the user-space process needs
+ * to use to mmap the CQ buffer in order to access the CQ entries.
+ * This handle should be provided when destroying the CQ.
+ * The poll/wait CQ operations return the number of available CQ entries of type
+ * struct hl_nic_cqe.
+ * Since the CQ is a cyclic buffer, the user-space process needs to inform the
+ * driver regarding how many of the available CQEs were actually
+ * processed/consumed. Only then the driver will override them with newer
+ * entries.
  *
  */
 #define HL_IOCTL_NIC	_IOWR('H', 0x07, struct hl_nic_args)
-- 
2.17.1
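
The user-facing CQ protocol in the patch above (poll/wait hands back a start
index and a count, and the user then reports how many entries it consumed)
comes down to a few lines of index arithmetic. The following is a stand-alone
sketch of that arithmetic only, not driver code: struct cq_model and the
cq_model_*() helpers are hypothetical names, and locking, the completion
object and the mmap step are deliberately left out.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical user-space model of the user CQ index arithmetic. */
struct cq_model {
	uint32_t num_of_entries;	/* must be a power of 2 */
	uint32_t pi;			/* free-running producer index */
	uint32_t ci;			/* consumer index, already wrapped */
	uint32_t new_cqes;		/* produced but not yet consumed */
};

static void cq_model_init(struct cq_model *cq, uint32_t num_of_entries)
{
	/* same constraint cq_create() enforces with is_power_of_2() */
	assert(num_of_entries && !(num_of_entries & (num_of_entries - 1)));

	cq->num_of_entries = num_of_entries;
	cq->pi = 0;
	cq->ci = 0;
	cq->new_cqes = 0;
}

/*
 * Producer side, modeled on copy_cqe_to_main_queue(): returns the slot the
 * entry is written to, or -1 if the pending counter was already full. As in
 * the driver, the producer index advances even in the overflow case.
 */
static int cq_model_push(struct cq_model *cq)
{
	uint32_t slot = cq->pi++ & (cq->num_of_entries - 1);

	if (cq->new_cqes == cq->num_of_entries)
		return -1;	/* the driver sets HL_NIC_CQ_OVERFLOW here */

	cq->new_cqes++;
	return (int)slot;
}

/* Consumer side, modeled on cq_update_consumed_cqes(). */
static int cq_model_consume(struct cq_model *cq, uint32_t consumed)
{
	if (cq->new_cqes < consumed)
		return -1;	/* the driver returns -EINVAL here */

	cq->ci = (cq->ci + consumed) & (cq->num_of_entries - 1);
	cq->new_cqes -= consumed;
	return 0;
}
```

Because both indices are masked with (num_of_entries - 1), the model relies
on the same power-of-2 restriction that cq_create() checks.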



* [PATCH v3 10/14] habanalabs/gaudi: add WQ control operations
  2020-09-15 17:10 [PATCH v3 00/14] Adding GAUDI NIC code to habanalabs driver Oded Gabbay
                   ` (7 preceding siblings ...)
  2020-09-15 17:10 ` [PATCH v3 09/14] habanalabs/gaudi: add CQ " Oded Gabbay
@ 2020-09-15 17:10 ` Oded Gabbay
  2020-09-15 17:10 ` [PATCH v3 11/14] habanalabs/gaudi: add QP error handling Oded Gabbay
                   ` (6 subsequent siblings)
  15 siblings, 0 replies; 83+ messages in thread
From: Oded Gabbay @ 2020-09-15 17:10 UTC (permalink / raw)
  To: linux-kernel, netdev
  Cc: SW_Drivers, gregkh, davem, kuba, andrew, f.fainelli, Omer Shpigelman

From: Omer Shpigelman <oshpigelman@habana.ai>

Add Work Queue (WQ) opcodes to NIC ioctl. A WQ contains entries (WQEs)
where each WQE represents a packet that should be sent or received.

A WQ is one of two types: requester (sender) or responder (receiver).

The added opcodes are:
- Set WQ: set the WQ configuration in the HW. The user should provide the
          device virtual address of the WQ.
- Unset WQ: reset the WQ configuration in the HW.
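
The input checks that Set WQ applies in this patch, and the way the send-side
base address is split across two registers, can be sketched in user-space C
as follows. This is an illustration under stated assumptions: the SKETCH_*
limits are placeholder values (the real NIC_HW_MAX_QP_NUM, USER_WQES_MAX_NUM
and DEVICE_CACHE_LINE_SIZE are defined in driver headers not shown here), and
every sketch_* name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed placeholder limits; the real values live in the driver headers. */
#define SKETCH_MAX_WQS		1024u
#define SKETCH_MAX_WQ_ENTRIES	(1u << 15)
#define SKETCH_CACHE_LINE_SIZE	128u

static int sketch_is_power_of_2(uint32_t n)
{
	return n && !(n & (n - 1));
}

/* floor(log2(n)) for n > 0, a stand-in for the kernel's ilog2() */
static uint32_t sketch_ilog2(uint32_t n)
{
	uint32_t log = 0;

	while (n >>= 1)
		log++;

	return log;
}

/* Returns 0 if the parameters pass the same kind of checks, -1 otherwise. */
static int sketch_wq_params_ok(uint64_t addr, uint32_t num_of_wqs,
				uint32_t num_of_wq_entries)
{
	if (num_of_wqs == 0 || num_of_wqs > SKETCH_MAX_WQS)
		return -1;

	if (!sketch_is_power_of_2(num_of_wq_entries))
		return -1;

	if (num_of_wq_entries < 4 || num_of_wq_entries > SKETCH_MAX_WQ_ENTRIES)
		return -1;

	/* the WQ address must be aligned to the device cache line */
	if (addr & (SKETCH_CACHE_LINE_SIZE - 1))
		return -1;

	return 0;
}

struct sketch_sq_regs {
	uint32_t base_hi;	/* address bits 49:32 register, masked 0x3FFFFF */
	uint32_t base_lo;	/* address bits 31:0 */
	uint32_t log_size;	/* log2(entries) - 2 */
};

/* Mirrors the three send-side register values written by the patch. */
static struct sketch_sq_regs sketch_sq_encode(uint64_t addr, uint32_t entries)
{
	struct sketch_sq_regs regs;

	assert(sketch_is_power_of_2(entries) && entries >= 4);

	regs.base_hi = (uint32_t)((addr >> 32) & 0x3FFFFF);
	regs.base_lo = (uint32_t)(addr & 0xFFFFFFFF);
	regs.log_size = sketch_ilog2(entries) - 2;

	return regs;
}
```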

Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
---
 .../misc/habanalabs/common/habanalabs_ioctl.c |  10 +-
 drivers/misc/habanalabs/gaudi/gaudi_nic.c     | 184 ++++++++++++++++++
 include/uapi/misc/habanalabs.h                |  33 ++++
 3 files changed, 225 insertions(+), 2 deletions(-)

diff --git a/drivers/misc/habanalabs/common/habanalabs_ioctl.c b/drivers/misc/habanalabs/common/habanalabs_ioctl.c
index 6ba1b9da0486..faf7eeb88b4f 100644
--- a/drivers/misc/habanalabs/common/habanalabs_ioctl.c
+++ b/drivers/misc/habanalabs/common/habanalabs_ioctl.c
@@ -24,7 +24,7 @@ static u32 hl_debug_struct_size[HL_DEBUG_OP_TIMESTAMP + 1] = {
 
 };
 
-static u32 hl_nic_input_size[HL_NIC_OP_CQ_UPDATE_CONSUMED_CQES + 1] = {
+static u32 hl_nic_input_size[HL_NIC_OP_USER_WQ_UNSET + 1] = {
 	[HL_NIC_OP_ALLOC_CONN] = sizeof(struct hl_nic_alloc_conn_in),
 	[HL_NIC_OP_SET_REQ_CONN_CTX] = sizeof(struct hl_nic_req_conn_ctx_in),
 	[HL_NIC_OP_SET_RES_CONN_CTX] = sizeof(struct hl_nic_res_conn_ctx_in),
@@ -35,9 +35,11 @@ static u32 hl_nic_input_size[HL_NIC_OP_CQ_UPDATE_CONSUMED_CQES + 1] = {
 	[HL_NIC_OP_CQ_POLL] = sizeof(struct hl_nic_cq_poll_wait_in),
 	[HL_NIC_OP_CQ_UPDATE_CONSUMED_CQES] =
 			sizeof(struct hl_nic_cq_update_consumed_cqes_in),
+	[HL_NIC_OP_USER_WQ_SET] = sizeof(struct hl_nic_user_wq_arr_set_in),
+	[HL_NIC_OP_USER_WQ_UNSET] = sizeof(struct hl_nic_user_wq_arr_unset_in)
 };
 
-static u32 hl_nic_output_size[HL_NIC_OP_CQ_UPDATE_CONSUMED_CQES + 1] = {
+static u32 hl_nic_output_size[HL_NIC_OP_USER_WQ_UNSET + 1] = {
 	[HL_NIC_OP_ALLOC_CONN] = sizeof(struct hl_nic_alloc_conn_out),
 	[HL_NIC_OP_SET_REQ_CONN_CTX] = 0,
 	[HL_NIC_OP_SET_RES_CONN_CTX] = 0,
@@ -47,6 +49,8 @@ static u32 hl_nic_output_size[HL_NIC_OP_CQ_UPDATE_CONSUMED_CQES + 1] = {
 	[HL_NIC_OP_CQ_WAIT] = sizeof(struct hl_nic_cq_poll_wait_out),
 	[HL_NIC_OP_CQ_POLL] = sizeof(struct hl_nic_cq_poll_wait_out),
 	[HL_NIC_OP_CQ_UPDATE_CONSUMED_CQES] = 0,
+	[HL_NIC_OP_USER_WQ_SET] = 0,
+	[HL_NIC_OP_USER_WQ_UNSET] = 0
 };
 
 static int device_status_info(struct hl_device *hdev, struct hl_info_args *args)
@@ -641,6 +645,8 @@ static int hl_nic_ioctl(struct hl_fpriv *hpriv, void *data)
 	case HL_NIC_OP_CQ_WAIT:
 	case HL_NIC_OP_CQ_POLL:
 	case HL_NIC_OP_CQ_UPDATE_CONSUMED_CQES:
+	case HL_NIC_OP_USER_WQ_SET:
+	case HL_NIC_OP_USER_WQ_UNSET:
 		args->input_size =
 			min(args->input_size, hl_nic_input_size[args->op]);
 		args->output_size =
diff --git a/drivers/misc/habanalabs/gaudi/gaudi_nic.c b/drivers/misc/habanalabs/gaudi/gaudi_nic.c
index 999e9ded22fb..37f25247f751 100644
--- a/drivers/misc/habanalabs/gaudi/gaudi_nic.c
+++ b/drivers/misc/habanalabs/gaudi/gaudi_nic.c
@@ -3302,6 +3302,170 @@ int gaudi_nic_get_mac_addr(struct hl_device *hdev,
 	return 0;
 }
 
+static int wq_port_check(struct hl_device *hdev, u32 port)
+{
+	if (port >= NIC_NUMBER_OF_ENGINES) {
+		dev_err(hdev->dev, "Invalid port %d\n", port);
+		return -EINVAL;
+	}
+
+	if (!(hdev->nic_ports_mask & BIT(port))) {
+		dev_err(hdev->dev, "Port %d is disabled\n", port);
+		return -ENODEV;
+	}
+
+	return 0;
+}
+
+static int user_wq_arr_set(struct hl_device *hdev,
+				struct hl_nic_user_wq_arr_set_in *in)
+{
+	struct gaudi_device *gaudi = hdev->asic_specific;
+	u64 wq_base_addr, num_of_wq_entries_log;
+	struct gaudi_nic_device *gaudi_nic;
+	u32 port, type;
+	int rc;
+
+	if (!in) {
+		dev_err(hdev->dev, "missing parameters, can't set user WQ\n");
+		return -EINVAL;
+	}
+
+	type = in->type;
+	if (type != HL_NIC_USER_WQ_SEND && type != HL_NIC_USER_WQ_RECV) {
+		dev_err(hdev->dev, "invalid type %d, can't set user WQ\n",
+			type);
+		return -EINVAL;
+	}
+
+	port = in->port;
+
+	rc = wq_port_check(hdev, port);
+	if (rc)
+		return rc;
+
+	gaudi_nic = &gaudi->nic_devices[port];
+
+	if (in->num_of_wqs == 0) {
+		dev_err(hdev->dev,
+			"number of WQs must be bigger than zero, port: %d\n",
+			port);
+		return -EINVAL;
+	}
+
+	/* H/W limitation */
+	if (in->num_of_wqs > NIC_HW_MAX_QP_NUM) {
+		dev_err(hdev->dev,
+			"number of WQs (0x%x) can't be bigger than 0x%x, port: %d\n",
+			in->num_of_wqs, NIC_HW_MAX_QP_NUM, port);
+		return -EINVAL;
+	}
+
+	if (!is_power_of_2(in->num_of_wq_entries)) {
+		dev_err(hdev->dev,
+			"number of entries (0x%x) must be a power of 2, port: %d\n",
+			in->num_of_wq_entries, port);
+		return -EINVAL;
+	}
+
+	/* H/W cache line constraint */
+	if (in->num_of_wq_entries < 4) {
+		dev_err(hdev->dev,
+			"number of entries (0x%x) must be at least 4, port: %d\n",
+			in->num_of_wq_entries, port);
+		return -EINVAL;
+	}
+
+	/* H/W limitation */
+	if (in->num_of_wq_entries > USER_WQES_MAX_NUM) {
+		dev_err(hdev->dev,
+			"number of entries (0x%x) can't be bigger than 0x%x, port: %d\n",
+			in->num_of_wq_entries, USER_WQES_MAX_NUM, port);
+		return -EINVAL;
+	}
+
+	if (!IS_ALIGNED(in->addr, DEVICE_CACHE_LINE_SIZE)) {
+		dev_err(hdev->dev,
+			"WQ VA (0x%llx) must be aligned to cache line size (0x%x), port: %d\n",
+			in->addr, DEVICE_CACHE_LINE_SIZE, port);
+		return -EINVAL;
+	}
+
+	wq_base_addr = in->addr;
+	num_of_wq_entries_log = ilog2(in->num_of_wq_entries);
+
+	mutex_lock(&gaudi_nic->user_wq_lock);
+
+	if (type == HL_NIC_USER_WQ_SEND) {
+		NIC_WREG32(mmNIC0_TXE0_SQ_BASE_ADDRESS_49_32_0,
+				(wq_base_addr >> 32) & 0x3FFFFF);
+		NIC_WREG32(mmNIC0_TXE0_SQ_BASE_ADDRESS_31_0_0,
+				wq_base_addr & 0xFFFFFFFF);
+		NIC_WREG32(mmNIC0_TXE0_LOG_MAX_WQ_SIZE_0,
+				num_of_wq_entries_log - 2);
+	} else {
+		NIC_WREG32(mmNIC0_RXE0_WIN0_WQ_BASE_LO,
+				wq_base_addr & 0xFFFFFFFF);
+		NIC_WREG32(mmNIC0_RXE0_WIN0_WQ_BASE_HI,
+			((wq_base_addr >> 32) & 0xFFFFFFFF) |
+			((num_of_wq_entries_log - 4) << 24));
+	}
+
+	mutex_unlock(&gaudi_nic->user_wq_lock);
+
+	return 0;
+}
+
+static void _user_wq_arr_unset(struct hl_device *hdev, u32 port, u32 type)
+{
+	struct gaudi_device *gaudi = hdev->asic_specific;
+	struct gaudi_nic_device *gaudi_nic;
+
+	gaudi_nic = &gaudi->nic_devices[port];
+
+	mutex_lock(&gaudi_nic->user_wq_lock);
+
+	if (type == HL_NIC_USER_WQ_SEND) {
+		NIC_WREG32(mmNIC0_TXE0_SQ_BASE_ADDRESS_49_32_0, 0);
+		NIC_WREG32(mmNIC0_TXE0_SQ_BASE_ADDRESS_31_0_0, 0);
+		NIC_WREG32(mmNIC0_TXE0_LOG_MAX_WQ_SIZE_0, 0);
+	} else {
+		NIC_WREG32(mmNIC0_RXE0_WIN0_WQ_BASE_LO, 0);
+		NIC_WREG32(mmNIC0_RXE0_WIN0_WQ_BASE_HI, 0);
+	}
+
+	mutex_unlock(&gaudi_nic->user_wq_lock);
+}
+
+static int user_wq_arr_unset(struct hl_device *hdev,
+				struct hl_nic_user_wq_arr_unset_in *in)
+{
+	u32 port, type;
+	int rc;
+
+	if (!in) {
+		dev_err(hdev->dev, "missing parameters, can't unset user WQ\n");
+		return -EINVAL;
+	}
+
+	type = in->type;
+	if (type != HL_NIC_USER_WQ_SEND && type != HL_NIC_USER_WQ_RECV) {
+		dev_err(hdev->dev, "invalid type %d, can't unset user WQ\n",
+			type);
+		return -EINVAL;
+	}
+
+	port = in->port;
+
+	rc = wq_port_check(hdev, port);
+	if (rc)
+		return rc;
+
+	_user_wq_arr_unset(hdev, port, type);
+
+	return 0;
+}
+
 static struct hl_qp *qp_get(struct hl_device *hdev,
 			struct gaudi_nic_device *gaudi_nic, u32 conn_id)
 {
@@ -3670,6 +3834,12 @@ int gaudi_nic_control(struct hl_device *hdev, u32 op, void *input, void *output)
 	case HL_NIC_OP_CQ_UPDATE_CONSUMED_CQES:
 		rc = cq_update_consumed_cqes(hdev, input);
 		break;
+	case HL_NIC_OP_USER_WQ_SET:
+		rc = user_wq_arr_set(hdev, input);
+		break;
+	case HL_NIC_OP_USER_WQ_UNSET:
+		rc = user_wq_arr_unset(hdev, input);
+		break;
 	default:
 		dev_err(hdev->dev, "Invalid NIC control request %d\n", op);
 		return -ENOTTY;
@@ -3709,6 +3879,19 @@ static void qps_destroy(struct hl_device *hdev)
 	}
 }
 
+static void wq_arrs_destroy(struct hl_device *hdev)
+{
+	int i;
+
+	for (i = 0 ; i < NIC_NUMBER_OF_PORTS ; i++) {
+		if (!(hdev->nic_ports_mask & BIT(i)))
+			continue;
+
+		_user_wq_arr_unset(hdev, i, HL_NIC_USER_WQ_SEND);
+		_user_wq_arr_unset(hdev, i, HL_NIC_USER_WQ_RECV);
+	}
+}
+
 void gaudi_nic_ctx_fini(struct hl_ctx *ctx)
 {
 	struct gaudi_device *gaudi = ctx->hdev->asic_specific;
@@ -3721,6 +3904,7 @@ void gaudi_nic_ctx_fini(struct hl_ctx *ctx)
 	/* wait for the NIC to digest the invalid QPs */
 	msleep(20);
 	cq_destroy(hdev);
+	wq_arrs_destroy(hdev);
 }
 
 static void nic_cq_vm_close(struct vm_area_struct *vma)
diff --git a/include/uapi/misc/habanalabs.h b/include/uapi/misc/habanalabs.h
index 83a707c207f7..7fa23b06249e 100644
--- a/include/uapi/misc/habanalabs.h
+++ b/include/uapi/misc/habanalabs.h
@@ -1025,6 +1025,31 @@ struct hl_nic_cq_poll_wait_out {
 	__u32 pad;
 };
 
+/* Send user WQ array type */
+#define HL_NIC_USER_WQ_SEND	0
+/* Receive user WQ array type */
+#define HL_NIC_USER_WQ_RECV	1
+
+struct hl_nic_user_wq_arr_set_in {
+	/* WQ array address */
+	__u64 addr;
+	/* NIC port ID */
+	__u32 port;
+	/* Number of user WQs */
+	__u32 num_of_wqs;
+	/* Number of entries per user WQ */
+	__u32 num_of_wq_entries;
+	/* Type of user WQ array */
+	__u32 type;
+};
+
+struct hl_nic_user_wq_arr_unset_in {
+	/* NIC port ID */
+	__u32 port;
+	/* Type of user WQ array */
+	__u32 type;
+};
+
 /* Opcode to allocate connection ID */
 #define HL_NIC_OP_ALLOC_CONN			0
 /* Opcode to set up a requester connection context */
@@ -1043,6 +1068,10 @@ struct hl_nic_cq_poll_wait_out {
 #define HL_NIC_OP_CQ_POLL			7
 /* Opcode to update the number of consumed CQ entries */
 #define HL_NIC_OP_CQ_UPDATE_CONSUMED_CQES	8
+/* Opcode to set a user WQ array */
+#define HL_NIC_OP_USER_WQ_SET			9
+/* Opcode to unset a user WQ array */
+#define HL_NIC_OP_USER_WQ_UNSET			10
 
 struct hl_nic_args {
 	/* Pointer to user input structure (relevant to specific opcodes) */
@@ -1238,6 +1267,8 @@ struct hl_nic_args {
  * - Wait on completion queue
  * - Poll a completion queue
  * - Update consumed completion queue entries
+ * - Set a work queue
+ * - Unset a work queue
  *
  * For all operations, the user should provide a pointer to an input structure
  * with the context parameters. Some of the operations also require a pointer to
@@ -1251,6 +1282,8 @@ struct hl_nic_args {
  * driver regarding how many of the available CQEs were actually
  * processed/consumed. Only then the driver will override them with newer
  * entries.
+ * The set WQ operation should provide the device virtual address of the WQ with
+ * a matching size for the number of WQs and entries per WQ.
  *
  */
 #define HL_IOCTL_NIC	_IOWR('H', 0x07, struct hl_nic_args)
-- 
2.17.1



* [PATCH v3 11/14] habanalabs/gaudi: add QP error handling
  2020-09-15 17:10 [PATCH v3 00/14] Adding GAUDI NIC code to habanalabs driver Oded Gabbay
                   ` (8 preceding siblings ...)
  2020-09-15 17:10 ` [PATCH v3 10/14] habanalabs/gaudi: add WQ " Oded Gabbay
@ 2020-09-15 17:10 ` Oded Gabbay
  2020-09-15 17:10 ` [PATCH v3 12/14] habanalabs/gaudi: Add ethtool support using coresight Oded Gabbay
                   ` (5 subsequent siblings)
  15 siblings, 0 replies; 83+ messages in thread
From: Oded Gabbay @ 2020-09-15 17:10 UTC (permalink / raw)
  To: linux-kernel, netdev
  Cc: SW_Drivers, gregkh, davem, kuba, andrew, f.fainelli, Omer Shpigelman

From: Omer Shpigelman <oshpigelman@habana.ai>

Add Queue Pair (QP) error notification to the user, e.g. security violation,
too many retransmissions, invalid QP, etc.

Whenever a QP causes an error, the firmware sends an event to the driver,
which pushes the error as an error entry to the Completion Queue (if one
exists).
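
As an illustration of the flow described above, here is a minimal userspace
sketch of draining an error ring between a consumer index and a producer
index. It is not the driver code: ERR_BUF_LEN, struct err_entry,
drain_err_ring() and count_entry() are hypothetical names, and the sketch
assumes the producer index wraps at the ring length. It compares the indices
with "!=" rather than "<" so that a wrapped producer index is still drained.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical ring length; the masking below needs a power of two. */
#define ERR_BUF_LEN 8u

/* Hypothetical error entry, loosely modeled on a QP error record. */
struct err_entry {
	uint32_t qpn;
	uint32_t syndrome;
};

typedef void (*err_handler_t)(const struct err_entry *entry, void *ctx);

/*
 * Hand every entry in [ci, pi) to the handler, wrapping at ERR_BUF_LEN,
 * and return the new consumer index.
 */
static uint32_t drain_err_ring(const struct err_entry *ring, uint32_t ci,
			       uint32_t pi, err_handler_t handle, void *ctx)
{
	assert((ERR_BUF_LEN & (ERR_BUF_LEN - 1)) == 0);

	/* Assumption: both indices live in [0, ERR_BUF_LEN). */
	ci &= ERR_BUF_LEN - 1;
	pi &= ERR_BUF_LEN - 1;

	while (ci != pi) {
		handle(&ring[ci], ctx);
		ci = (ci + 1) & (ERR_BUF_LEN - 1);
	}

	return ci;
}

/* Example handler: count the entries that were drained. */
static void count_entry(const struct err_entry *entry, void *ctx)
{
	(void)entry;
	(*(unsigned int *)ctx)++;
}
```

With ci = 6 and pi = 2 the sketch visits slots 6, 7, 0 and 1, which a
"ci < pi" comparison would skip entirely.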

Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
---
 drivers/misc/habanalabs/gaudi/gaudi.c     | 13 ++++
 drivers/misc/habanalabs/gaudi/gaudiP.h    |  1 +
 drivers/misc/habanalabs/gaudi/gaudi_nic.c | 95 +++++++++++++++++++++++
 3 files changed, 109 insertions(+)

diff --git a/drivers/misc/habanalabs/gaudi/gaudi.c b/drivers/misc/habanalabs/gaudi/gaudi.c
index 4602e4780651..71c9e2d18032 100644
--- a/drivers/misc/habanalabs/gaudi/gaudi.c
+++ b/drivers/misc/habanalabs/gaudi/gaudi.c
@@ -6660,6 +6660,19 @@ static void gaudi_handle_eqe(struct hl_device *hdev,
 		hl_fw_unmask_irq(hdev, event_type);
 		break;
 
+	case GAUDI_EVENT_NIC0_QP0:
+	case GAUDI_EVENT_NIC0_QP1:
+	case GAUDI_EVENT_NIC1_QP0:
+	case GAUDI_EVENT_NIC1_QP1:
+	case GAUDI_EVENT_NIC2_QP0:
+	case GAUDI_EVENT_NIC2_QP1:
+	case GAUDI_EVENT_NIC3_QP0:
+	case GAUDI_EVENT_NIC3_QP1:
+	case GAUDI_EVENT_NIC4_QP0:
+	case GAUDI_EVENT_NIC4_QP1:
+		gaudi_nic_handle_qp_err(hdev, event_type);
+		break;
+
 	case GAUDI_EVENT_PSOC_GPIO_U16_0:
 		cause = le64_to_cpu(eq_entry->data[0]) & 0xFF;
 		dev_err(hdev->dev,
diff --git a/drivers/misc/habanalabs/gaudi/gaudiP.h b/drivers/misc/habanalabs/gaudi/gaudiP.h
index 3158d5d68c1d..7d7439da88bc 100644
--- a/drivers/misc/habanalabs/gaudi/gaudiP.h
+++ b/drivers/misc/habanalabs/gaudi/gaudiP.h
@@ -576,5 +576,6 @@ netdev_tx_t gaudi_nic_handle_tx_pkt(struct gaudi_nic_device *gaudi_nic,
 					struct sk_buff *skb);
 int gaudi_nic_sw_init(struct hl_device *hdev);
 void gaudi_nic_sw_fini(struct hl_device *hdev);
+void gaudi_nic_handle_qp_err(struct hl_device *hdev, u16 event_type);
 
 #endif /* GAUDIP_H_ */
diff --git a/drivers/misc/habanalabs/gaudi/gaudi_nic.c b/drivers/misc/habanalabs/gaudi/gaudi_nic.c
index 37f25247f751..49e94e9c786a 100644
--- a/drivers/misc/habanalabs/gaudi/gaudi_nic.c
+++ b/drivers/misc/habanalabs/gaudi/gaudi_nic.c
@@ -3988,3 +3988,98 @@ int gaudi_nic_cq_mmap(struct hl_device *hdev, struct vm_area_struct *vma)
 
 	return rc;
 }
+
+static char *get_syndrome_text(u32 syndrome)
+{
+	char *str;
+
+	switch (syndrome) {
+	case 0x05:
+		str = "Rx got invalid QP";
+		break;
+	case 0x06:
+		str = "Rx transport service mismatch";
+		break;
+	case 0x09:
+		str = "Rx Rkey check failed";
+		break;
+	case 0x40:
+		str = "timer retry exceeded";
+		break;
+	case 0x41:
+		str = "NACK retry exceeded";
+		break;
+	case 0x42:
+		str = "doorbell on invalid QP";
+		break;
+	case 0x43:
+		str = "doorbell security check failed";
+		break;
+	case 0x44:
+		str = "Tx got invalid QP";
+		break;
+	case 0x45:
+		str = "responder got ACK/NACK on invalid QP";
+		break;
+	case 0x46:
+		str = "responder tried to send ACK/NACK on invalid QP";
+		break;
+	default:
+		str = "unknown syndrome";
+		break;
+	}
+
+	return str;
+}
+
+void gaudi_nic_handle_qp_err(struct hl_device *hdev, u16 event_type)
+{
+	struct gaudi_device *gaudi = hdev->asic_specific;
+	struct gaudi_nic_device *gaudi_nic;
+	struct qp_err *qp_err_arr;
+	struct hl_nic_cqe cqe_sw;
+	u32 pi, ci;
+
+	gaudi_nic = &gaudi->nic_devices[event_type - GAUDI_EVENT_NIC0_QP0];
+	qp_err_arr = gaudi_nic->qp_err_mem_cpu;
+
+	mutex_lock(&gaudi->nic_qp_err_lock);
+
+	if (!gaudi->nic_cq_enable)
+		dev_err_ratelimited(hdev->dev,
+			"received NIC %d QP error event %d but no CQ to push it\n",
+			gaudi_nic->port, event_type);
+
+	pi = NIC_RREG32(mmNIC0_QPC0_ERR_FIFO_PRODUCER_INDEX);
+	ci = gaudi_nic->qp_err_ci;
+
+	cqe_sw.is_err = true;
+	cqe_sw.port = gaudi_nic->port;
+
+	while (ci < pi) {
+		cqe_sw.type = QP_ERR_IS_REQ(qp_err_arr[ci]) ?
+				HL_NIC_CQE_TYPE_REQ : HL_NIC_CQE_TYPE_RES;
+		cqe_sw.qp_number = QP_ERR_QP_NUM(qp_err_arr[ci]);
+		cqe_sw.qp_err.syndrome = QP_ERR_ERR_NUM(qp_err_arr[ci]);
+
+		ci = (ci + 1) & (QP_ERR_BUF_LEN - 1);
+
+		dev_err_ratelimited(hdev->dev,
+			"NIC QP error port: %d, type: %d, qpn: %d, syndrome: %s (0x%x)\n",
+			cqe_sw.port, cqe_sw.type, cqe_sw.qp_number,
+			get_syndrome_text(cqe_sw.qp_err.syndrome),
+			cqe_sw.qp_err.syndrome);
+
+		if (gaudi->nic_cq_enable)
+			copy_cqe_to_main_queue(hdev, &cqe_sw);
+	}
+
+	gaudi_nic->qp_err_ci = ci;
+	NIC_WREG32(mmNIC0_QPC0_ERR_FIFO_CONSUMER_INDEX, ci);
+
+	/* signal the completion queue that there are available CQEs */
+	if (gaudi->nic_cq_enable)
+		complete(&gaudi->nic_cq_comp);
+
+	mutex_unlock(&gaudi->nic_qp_err_lock);
+}
-- 
2.17.1



* [PATCH v3 12/14] habanalabs/gaudi: Add ethtool support using coresight
  2020-09-15 17:10 [PATCH v3 00/14] Adding GAUDI NIC code to habanalabs driver Oded Gabbay
                   ` (9 preceding siblings ...)
  2020-09-15 17:10 ` [PATCH v3 11/14] habanalabs/gaudi: add QP error handling Oded Gabbay
@ 2020-09-15 17:10 ` Oded Gabbay
  2020-09-15 17:10 ` [PATCH v3 13/14] habanalabs/gaudi: support DCB protocol Oded Gabbay
                   ` (4 subsequent siblings)
  15 siblings, 0 replies; 83+ messages in thread
From: Oded Gabbay @ 2020-09-15 17:10 UTC (permalink / raw)
  To: linux-kernel, netdev
  Cc: SW_Drivers, gregkh, davem, kuba, andrew, f.fainelli, Omer Shpigelman

From: Omer Shpigelman <oshpigelman@habana.ai>

The driver supports ethtool callbacks and provides statistics using the
device's profiling infrastructure (coresight).

We support the basic ethtool functionality and counters, as far as our H/W
provides support.

A summary of the supported callbacks:

- get_drvinfo: fill in basic information about the driver.
- get_link_ksettings: get basic settings such as speed, duplex,
                      Auto-negotiation and link modes.
- set_link_ksettings: only the speed and Auto-negotiation settings are
                      supported.
- get_link: return the link indication.
- get_strings: get the counter strings.
- get_sset_count: get the number of counters.
- get_ethtool_stats: get the counter values.
- get_module_info: get the EEPROM type and length.
- get_module_eeprom: get the EEPROM (supported in raw mode only).
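
The three statistics callbacks in the list above only work together if
they agree on the number and order of counters. The sketch below shows one
way to keep them in lockstep by driving all three from the same tables. It
is a userspace illustration, not the driver code: GSTRING_LEN stands in for
ETH_GSTRING_LEN, and the tables, readers and function names are
hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define GSTRING_LEN 32 /* stands in for ETH_GSTRING_LEN */
#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

/* One counter: its name and a (hypothetical) function that reads it. */
struct stat_desc {
	char name[GSTRING_LEN];
	uint64_t (*read)(void);
};

static uint64_t read_rx_octets(void) { return 100; }
static uint64_t read_rx_frames(void) { return 2; }
static uint64_t read_tx_octets(void) { return 50; }

static const struct stat_desc rx_stats[] = {
	{ "rx_octets", read_rx_octets },
	{ "rx_frames", read_rx_frames },
};

static const struct stat_desc tx_stats[] = {
	{ "tx_octets", read_tx_octets },
};

/* Groups are emitted in this order by every function below. */
struct stat_group {
	const struct stat_desc *descs;
	size_t len;
};

static const struct stat_group groups[] = {
	{ rx_stats, ARRAY_LEN(rx_stats) },
	{ tx_stats, ARRAY_LEN(tx_stats) },
};

/* Counterpart of get_sset_count: total number of counters. */
static size_t stats_count(void)
{
	size_t g, total = 0;

	for (g = 0; g < ARRAY_LEN(groups); g++)
		total += groups[g].len;

	return total;
}

/* Counterpart of get_strings: one fixed-width name per counter. */
static void stats_strings(uint8_t *data)
{
	size_t g, i;

	for (g = 0; g < ARRAY_LEN(groups); g++)
		for (i = 0; i < groups[g].len; i++, data += GSTRING_LEN)
			memcpy(data, groups[g].descs[i].name, GSTRING_LEN);
}

/* Counterpart of get_ethtool_stats: one value per counter, same order. */
static void stats_values(uint64_t *data)
{
	size_t g, i;

	for (g = 0; g < ARRAY_LEN(groups); g++)
		for (i = 0; i < groups[g].len; i++)
			*data++ = groups[g].descs[i].read();
}
```

Because every loop starts at index 0 and walks the same group array, a
counter cannot be named without also being counted and filled.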

Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
---
Changes in v3:
  - Verify the offset and length before copying the EEPROM dump to the user
    to prevent kernel memory leakage
  - Remove the Tx/Rx counters titles from ethtool statistics
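
For reference, the offset/length verification mentioned in the first item
can be sketched in userspace as follows. This is an illustration under
stated assumptions, not the code in this patch: eeprom_read() and EEPROM_LEN
are hypothetical names, EEPROM_LEN uses the same value (256) as
ETH_MODULE_SFF_8636_LEN, and the sketch rejects an out-of-range request
instead of truncating it.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical constant; same value as ETH_MODULE_SFF_8636_LEN. */
#define EEPROM_LEN 256u

/*
 * Copy len bytes starting at offset from a cached EEPROM image into out.
 * Returns 0 on success or -EINVAL if the request does not fit the image.
 */
static int eeprom_read(const uint8_t *eeprom, uint32_t offset, uint32_t len,
		       uint8_t *out)
{
	if (len == 0)
		return -EINVAL;

	/*
	 * Compare len against the space left after offset instead of
	 * computing offset + len, which could wrap around in 32 bits.
	 */
	if (offset >= EEPROM_LEN || len > EEPROM_LEN - offset)
		return -EINVAL;

	memcpy(out, eeprom + offset, len);

	return 0;
}
```

A request for 6 bytes at offset 250 succeeds, while 7 bytes at the same
offset is rejected because it would read past the end of the image.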

 drivers/misc/habanalabs/gaudi/Makefile        |   3 +-
 drivers/misc/habanalabs/gaudi/gaudi.c         |   1 +
 drivers/misc/habanalabs/gaudi/gaudiP.h        |   7 +
 .../misc/habanalabs/gaudi/gaudi_coresight.c   | 144 ++++
 drivers/misc/habanalabs/gaudi/gaudi_nic.c     |   5 +
 .../misc/habanalabs/gaudi/gaudi_nic_ethtool.c | 616 ++++++++++++++++++
 6 files changed, 775 insertions(+), 1 deletion(-)
 create mode 100644 drivers/misc/habanalabs/gaudi/gaudi_nic_ethtool.c

diff --git a/drivers/misc/habanalabs/gaudi/Makefile b/drivers/misc/habanalabs/gaudi/Makefile
index c5143cf6f025..df674c5973e0 100644
--- a/drivers/misc/habanalabs/gaudi/Makefile
+++ b/drivers/misc/habanalabs/gaudi/Makefile
@@ -2,4 +2,5 @@
 HL_GAUDI_FILES := gaudi/gaudi.o gaudi/gaudi_hwmgr.o gaudi/gaudi_security.o \
 	gaudi/gaudi_coresight.o
 
-HL_GAUDI_FILES += gaudi/gaudi_nic.o gaudi/gaudi_phy.o
+HL_GAUDI_FILES += gaudi/gaudi_nic.o gaudi/gaudi_phy.o \
+	gaudi/gaudi_nic_ethtool.o
diff --git a/drivers/misc/habanalabs/gaudi/gaudi.c b/drivers/misc/habanalabs/gaudi/gaudi.c
index 71c9e2d18032..2af07eb4165c 100644
--- a/drivers/misc/habanalabs/gaudi/gaudi.c
+++ b/drivers/misc/habanalabs/gaudi/gaudi.c
@@ -1044,6 +1044,7 @@ static int gaudi_sw_init(struct hl_device *hdev)
 	gaudi->cpucp_info_get = gaudi_cpucp_info_get;
 	gaudi->nic_handle_rx = gaudi_nic_handle_rx;
 	gaudi->nic_handle_tx = gaudi_nic_handle_tx;
+	gaudi->nic_spmu_init = gaudi_nic_spmu_init;
 
 	gaudi->max_freq_value = GAUDI_MAX_CLK_FREQ;
 
diff --git a/drivers/misc/habanalabs/gaudi/gaudiP.h b/drivers/misc/habanalabs/gaudi/gaudiP.h
index 7d7439da88bc..8b420a86c11b 100644
--- a/drivers/misc/habanalabs/gaudi/gaudiP.h
+++ b/drivers/misc/habanalabs/gaudi/gaudiP.h
@@ -438,6 +438,7 @@ struct gaudi_internal_qman_info {
  * @cpucp_info_get: get information on device from CPU-CP
  * @nic_handle_rx: NIC handler for incoming packet.
  * @nic_handle_tx: NIC handler for outgoing packet.
+ * @nic_spmu_init: initialize NIC CoreSight spmu counters.
  * @nic_devices: array that holds all NIC ports manage structures.
  * @nic_macros: array that holds all NIC macros manage structures.
  * @nic_pam4_tx_taps: array that holds all PAM4 Tx taps of all NIC lanes.
@@ -501,6 +502,7 @@ struct gaudi_device {
 	int (*cpucp_info_get)(struct hl_device *hdev);
 	void (*nic_handle_rx)(struct gaudi_nic_device *gaudi_nic);
 	int (*nic_handle_tx)(struct gaudi_nic_device *gaudi_nic, void *data);
+	void (*nic_spmu_init)(struct hl_device *hdev, int port);
 	struct gaudi_nic_device		nic_devices[NIC_NUMBER_OF_PORTS];
 	struct gaudi_nic_macro		nic_macros[NIC_NUMBER_OF_MACROS];
 	struct gaudi_nic_tx_taps	nic_pam4_tx_taps[NIC_MAX_NUM_OF_LANES];
@@ -574,8 +576,13 @@ irqreturn_t gaudi_nic_rx_irq_handler(int irq, void *arg);
 irqreturn_t gaudi_nic_cq_irq_handler(int irq, void *arg);
 netdev_tx_t gaudi_nic_handle_tx_pkt(struct gaudi_nic_device *gaudi_nic,
 					struct sk_buff *skb);
+void gaudi_nic_spmu_init(struct hl_device *hdev, int port);
 int gaudi_nic_sw_init(struct hl_device *hdev);
 void gaudi_nic_sw_fini(struct hl_device *hdev);
 void gaudi_nic_handle_qp_err(struct hl_device *hdev, u16 event_type);
+int gaudi_config_spmu_nic(struct hl_device *hdev, u32 port,
+		u32 num_event_types, u32 event_types[]);
+int gaudi_sample_spmu_nic(struct hl_device *hdev, u32 port,
+		u32 num_out_data, u64 out_data[]);
 
 #endif /* GAUDIP_H_ */
diff --git a/drivers/misc/habanalabs/gaudi/gaudi_coresight.c b/drivers/misc/habanalabs/gaudi/gaudi_coresight.c
index 881531d4d9da..6b43501d20ad 100644
--- a/drivers/misc/habanalabs/gaudi/gaudi_coresight.c
+++ b/drivers/misc/habanalabs/gaudi/gaudi_coresight.c
@@ -16,6 +16,11 @@
 #define SPMU_SECTION_SIZE		MME0_ACC_SPMU_MAX_OFFSET
 #define SPMU_EVENT_TYPES_OFFSET		0x400
 #define SPMU_MAX_COUNTERS		6
+#define PMSCR				0x6F0	/* Snapshot Control */
+#define PMEVCNTSR0			0x620	/* Event Counters Snapshot */
+#define PMOVSSR				0x614	/* Overflow Status Snapshot */
+#define PMCCNTSR_L			0x618	/* Cycle Counter Snapshot */
+#define PMCCNTSR_H			0x61c	/* Cycle Counter Snapshot */
 
 static u64 debug_stm_regs[GAUDI_STM_LAST + 1] = {
 	[GAUDI_STM_MME0_ACC]	= mmMME0_ACC_STM_BASE,
@@ -752,6 +757,27 @@ static int gaudi_config_bmon(struct hl_device *hdev,
 	return 0;
 }
 
+static bool gaudi_reg_is_nic_spmu(enum gaudi_debug_spmu_regs_index reg_idx)
+{
+	switch (reg_idx) {
+	case GAUDI_SPMU_NIC0_0:
+	case GAUDI_SPMU_NIC0_1:
+	case GAUDI_SPMU_NIC1_0:
+	case GAUDI_SPMU_NIC1_1:
+	case GAUDI_SPMU_NIC2_0:
+	case GAUDI_SPMU_NIC2_1:
+	case GAUDI_SPMU_NIC3_0:
+	case GAUDI_SPMU_NIC3_1:
+	case GAUDI_SPMU_NIC4_0:
+	case GAUDI_SPMU_NIC4_1:
+		return true;
+	default:
+		break;
+	}
+
+	return false;
+}
+
 static int gaudi_config_spmu(struct hl_device *hdev,
 		struct hl_debug_params *params)
 {
@@ -769,6 +795,16 @@ static int gaudi_config_spmu(struct hl_device *hdev,
 		return -EINVAL;
 	}
 
+	/*
+	 * NIC spmus are now configured by driver at init
+	 * and not accessible to user in dbg mode
+	 */
+	if (hdev->in_debug && gaudi_reg_is_nic_spmu(params->reg_idx)) {
+		dev_err(hdev->dev,
+			"Rejecting user debug configuration for NIC spmu\n");
+		return -EFAULT;
+	}
+
 	base_reg = debug_spmu_regs[params->reg_idx] - CFG_BASE;
 
 	if (params->enable) {
@@ -837,6 +873,114 @@ static int gaudi_config_spmu(struct hl_device *hdev,
 	return 0;
 }
 
+static int gaudi_sample_spmu(struct hl_device *hdev,
+		struct hl_debug_params *params)
+{
+	u32 output_arr_len;
+	u32 cycle_cnt_idx;
+	u32 overflow_idx;
+	u32 events_num;
+	u64 base_reg;
+	u64 *output;
+	int i;
+
+	if (params->reg_idx >= ARRAY_SIZE(debug_spmu_regs)) {
+		dev_err(hdev->dev, "Invalid register index in SPMU\n");
+		return -EINVAL;
+	}
+
+	base_reg = debug_spmu_regs[params->reg_idx] - CFG_BASE;
+
+	output = params->output;
+	output_arr_len = params->output_size / 8;
+	events_num = output_arr_len - 2;
+	overflow_idx = output_arr_len - 2;
+	cycle_cnt_idx = output_arr_len - 1;
+
+	if (!output)
+		return -EINVAL;
+
+	if (output_arr_len < 1) {
+		dev_err(hdev->dev,
+			"not enough values for SPMU sample\n");
+		return -EINVAL;
+	}
+
+	if (events_num > SPMU_MAX_COUNTERS) {
+		dev_err(hdev->dev,
+			"too many events values for SPMU sample\n");
+		return -EINVAL;
+	}
+
+	/* capture */
+	WREG32(base_reg + PMSCR, 1);
+
+	/* read the shadow registers */
+	for (i = 0 ; i < events_num ; i++)
+		output[i] = RREG32(base_reg + PMEVCNTSR0 + i * 4);
+
+	/* also get overflow and cyclecount */
+	if (output_arr_len == SPMU_MAX_COUNTERS + 2) {
+		output[overflow_idx] = RREG32(base_reg + PMOVSSR);
+
+		output[cycle_cnt_idx] = RREG32(base_reg + PMCCNTSR_H);
+		output[cycle_cnt_idx] <<= 32;
+		output[cycle_cnt_idx] |= RREG32(base_reg + PMCCNTSR_L);
+	}
+
+	return 0;
+}
+
+int gaudi_config_spmu_nic(struct hl_device *hdev, u32 port,
+		u32 num_event_types, u32 event_types[])
+{
+	struct hl_debug_params_spmu spmu;
+	struct hl_debug_params params;
+	int i;
+
+	/* validate nic port */
+	if (!gaudi_reg_is_nic_spmu(GAUDI_SPMU_NIC0_0 + port)) {
+		dev_err(hdev->dev, "Invalid nic port %u\n", port);
+		return -EFAULT;
+	}
+
+	memset(&params, 0, sizeof(struct hl_debug_params));
+	params.op = HL_DEBUG_OP_SPMU;
+	params.input = &spmu;
+	params.enable = true;
+	params.reg_idx = GAUDI_SPMU_NIC0_0 + port;
+
+	memset(&spmu, 0, sizeof(struct hl_debug_params_spmu));
+	spmu.event_types_num  = num_event_types;
+
+	for (i = 0 ; i < spmu.event_types_num ; i++)
+		spmu.event_types[i] = event_types[i];
+
+	return gaudi_config_spmu(hdev, &params);
+}
+
+int gaudi_sample_spmu_nic(struct hl_device *hdev, u32 port,
+		u32 num_out_data, u64 out_data[])
+{
+	struct hl_debug_params params;
+
+	if (!hdev->supports_coresight)
+		return 0;
+
+	/* validate nic port */
+	if (!gaudi_reg_is_nic_spmu(GAUDI_SPMU_NIC0_0 + port)) {
+		dev_err(hdev->dev, "Invalid nic port %u\n", port);
+		return -EFAULT;
+	}
+
+	memset(&params, 0, sizeof(struct hl_debug_params));
+	params.output = out_data;
+	params.output_size = num_out_data * sizeof(uint64_t);
+	params.reg_idx = GAUDI_SPMU_NIC0_0 + port;
+
+	return gaudi_sample_spmu(hdev, &params);
+}
+
 int gaudi_debug_coresight(struct hl_device *hdev, void *data)
 {
 	struct hl_debug_params *params = data;
diff --git a/drivers/misc/habanalabs/gaudi/gaudi_nic.c b/drivers/misc/habanalabs/gaudi/gaudi_nic.c
index 49e94e9c786a..c97e5f0e1c53 100644
--- a/drivers/misc/habanalabs/gaudi/gaudi_nic.c
+++ b/drivers/misc/habanalabs/gaudi/gaudi_nic.c
@@ -2778,6 +2778,7 @@ static int port_register(struct hl_device *hdev, int port)
 	ndev->dev_port = port;
 
 	ndev->netdev_ops = &gaudi_nic_netdev_ops;
+	ndev->ethtool_ops = &gaudi_nic_ethtool_ops;
 	ndev->watchdog_timeo = NIC_TX_TIMEOUT;
 	ndev->min_mtu = ETH_MIN_MTU;
 	ndev->max_mtu = NIC_MAX_MTU;
@@ -2801,6 +2802,8 @@ static int port_register(struct hl_device *hdev, int port)
 				port);
 	}
 
+	gaudi->nic_spmu_init(hdev, port);
+
 	if (register_netdev(ndev)) {
 		dev_err(hdev->dev,
 			"Could not register netdevice, port: %d\n", port);
@@ -3265,6 +3268,8 @@ void gaudi_nic_ports_reopen(struct hl_device *hdev)
 			continue;
 		}
 
+		gaudi->nic_spmu_init(hdev, port);
+
 		schedule_delayed_work(&gaudi_nic->port_open_work,
 					msecs_to_jiffies(1));
 	}
diff --git a/drivers/misc/habanalabs/gaudi/gaudi_nic_ethtool.c b/drivers/misc/habanalabs/gaudi/gaudi_nic_ethtool.c
new file mode 100644
index 000000000000..62c8cd40c927
--- /dev/null
+++ b/drivers/misc/habanalabs/gaudi/gaudi_nic_ethtool.c
@@ -0,0 +1,616 @@
+// SPDX-License-Identifier: GPL-2.0
+
+/*
+ * Copyright 2018-2020 HabanaLabs, Ltd.
+ * All Rights Reserved.
+ */
+
+#include "gaudi_nic.h"
+#include "../include/gaudi/asic_reg/gaudi_regs.h"
+#include <linux/pci.h>
+
+#define NIC_STATS_LEN		ARRAY_SIZE(gaudi_nic_ethtool_stats)
+#define NIC_SPMU0_STATS_LEN	ARRAY_SIZE(gaudi_nic0_spmu_event_type)
+#define NIC_SPMU1_STATS_LEN	ARRAY_SIZE(gaudi_nic1_spmu_event_type)
+#define NIC_SPMU_STATS_LEN_MAX	6
+#define NIC_MAC_STATS_RX_LEN	ARRAY_SIZE(gaudi_nic_mac_stats_rx)
+#define NIC_MAC_STATS_TX_LEN	ARRAY_SIZE(gaudi_nic_mac_stats_tx)
+#define NIC_XPCS91_REGS_CNT_LEN	ARRAY_SIZE(gaudi_nic_xpcs91_reg_type)
+#define NIC_SW_CNT_LEN		ARRAY_SIZE(gaudi_nic_sw_cnt_type)
+
+#define NIC_MAC_STAT_BLOCK_SIZE	(mmNIC1_STAT_BASE - mmNIC0_STAT_BASE)
+#define NIC_MAC_STAT_HI_PART	mmNIC0_STAT_DATA_HI_REG
+#define NIC_MAC_RX_PORT0_OFFSET	mmNIC0_STAT_ETHERSTATSOCTETS
+#define NIC_MAC_RX_PORT1_OFFSET	mmNIC0_STAT_ETHERSTATSOCTETS_2
+#define NIC_MAC_TX_PORT0_OFFSET	mmNIC0_STAT_ETHERSTATSOCTETS_4
+#define NIC_MAC_TX_PORT1_OFFSET	mmNIC0_STAT_ETHERSTATSOCTETS_6
+
+#define NIC_MAC_STAT_BASE(port) \
+			((u64) (NIC_MAC_STAT_BLOCK_SIZE * (u64) ((port) >> 1)))
+
+#define NIC_MAC_STAT_RREG32(port, reg) \
+			RREG32(NIC_MAC_STAT_BASE(port) + (reg))
+
+#define ethtool_add_mode ethtool_link_ksettings_add_link_mode
+
+struct gaudi_nic_ethtool_stats {
+	char stat_string[ETH_GSTRING_LEN];
+	int stat_offset;
+};
+
+struct gaudi_nic_spmu_event_type {
+	char stat_string[ETH_GSTRING_LEN];
+	int index;
+};
+
+struct gaudi_nic_xpcs91_reg_type {
+	char stat_string[ETH_GSTRING_LEN];
+	int lo_offset;
+	int hi_offset;
+};
+
+struct gaudi_nic_sw_cnt_type {
+	char stat_string[ETH_GSTRING_LEN];
+};
+
+#define NIC_STAT(m) {__stringify(m), offsetof(struct net_device, stats.m)}
+
+static struct gaudi_nic_ethtool_stats gaudi_nic_ethtool_stats[] = {
+	NIC_STAT(rx_packets),
+	NIC_STAT(tx_packets),
+	NIC_STAT(rx_bytes),
+	NIC_STAT(tx_bytes),
+	NIC_STAT(rx_errors),
+	NIC_STAT(tx_errors),
+	NIC_STAT(rx_dropped),
+	NIC_STAT(tx_dropped),
+	NIC_STAT(multicast),
+	NIC_STAT(collisions),
+	NIC_STAT(rx_length_errors),
+	NIC_STAT(rx_over_errors),
+	NIC_STAT(rx_crc_errors),
+	NIC_STAT(rx_frame_errors),
+	NIC_STAT(rx_fifo_errors),
+	NIC_STAT(rx_missed_errors),
+	NIC_STAT(tx_aborted_errors),
+	NIC_STAT(tx_carrier_errors),
+	NIC_STAT(tx_fifo_errors),
+	NIC_STAT(tx_heartbeat_errors),
+	NIC_STAT(tx_window_errors)
+};
+
+static struct gaudi_nic_ethtool_stats gaudi_nic_mac_stats_rx[] = {
+	{"etherStatsOctets", 0x0},
+	{"OctetsReceivedOK", 0x4},
+	{"aAlignmentErrors", 0x8},
+	{"aPAUSEMACCtrlFramesReceived", 0xC},
+	{"aFrameTooLongErrors", 0x10},
+	{"aInRangeLengthErrors", 0x14},
+	{"aFramesReceivedOK", 0x18},
+	{"VLANReceivedOK", 0x1C},
+	{"aFrameCheckSequenceErrors", 0x20},
+	{"ifInErrors", 0x24},
+	{"ifInUcastPkts", 0x28},
+	{"ifInMulticastPkts", 0x2C},
+	{"ifInBroadcastPkts", 0x30},
+	{"etherStatsDropEvents", 0x34},
+	{"etherStatsUndersizePkts", 0x38},
+	{"etherStatsPkts", 0x3C},
+	{"etherStatsPkts64Octets", 0x40},
+	{"etherStatsPkts65to127Octets", 0x44},
+	{"etherStatsPkts128to255Octets", 0x48},
+	{"etherStatsPkts256to511Octets", 0x4C},
+	{"etherStatsPkts512to1023Octets", 0x50},
+	{"etherStatsPkts1024to1518Octets", 0x54},
+	{"etherStatsPkts1519toMaxOctets", 0x58},
+	{"etherStatsOversizePkts", 0x5C},
+	{"etherStatsJabbers", 0x60},
+	{"etherStatsFragments", 0x64},
+	{"aCBFCPAUSEFramesReceived_0", 0x68},
+	{"aCBFCPAUSEFramesReceived_1", 0x6C},
+	{"aCBFCPAUSEFramesReceived_2", 0x70},
+	{"aCBFCPAUSEFramesReceived_3", 0x74},
+	{"aCBFCPAUSEFramesReceived_4", 0x78},
+	{"aCBFCPAUSEFramesReceived_5", 0x7C},
+	{"aCBFCPAUSEFramesReceived_6", 0x80},
+	{"aCBFCPAUSEFramesReceived_7", 0x84},
+	{"aMACControlFramesReceived", 0x88}
+};
+
+static struct gaudi_nic_ethtool_stats gaudi_nic_mac_stats_tx[] = {
+	{"etherStatsOctets", 0x0},
+	{"OctetsTransmittedOK", 0x4},
+	{"aPAUSEMACCtrlFramesTransmitted", 0x8},
+	{"aFramesTransmittedOK", 0xC},
+	{"VLANTransmittedOK", 0x10},
+	{"ifOutErrors", 0x14},
+	{"ifOutUcastPkts", 0x18},
+	{"ifOutMulticastPkts", 0x1C},
+	{"ifOutBroadcastPkts", 0x20},
+	{"etherStatsPkts64Octets", 0x24},
+	{"etherStatsPkts65to127Octets", 0x28},
+	{"etherStatsPkts128to255Octets", 0x2C},
+	{"etherStatsPkts256to511Octets", 0x30},
+	{"etherStatsPkts512to1023Octets", 0x34},
+	{"etherStatsPkts1024to1518Octets", 0x38},
+	{"etherStatsPkts1519toMaxOctets", 0x3C},
+	{"aCBFCPAUSEFramesTransmitted_0", 0x40},
+	{"aCBFCPAUSEFramesTransmitted_1", 0x44},
+	{"aCBFCPAUSEFramesTransmitted_2", 0x48},
+	{"aCBFCPAUSEFramesTransmitted_3", 0x4C},
+	{"aCBFCPAUSEFramesTransmitted_4", 0x50},
+	{"aCBFCPAUSEFramesTransmitted_5", 0x54},
+	{"aCBFCPAUSEFramesTransmitted_6", 0x58},
+	{"aCBFCPAUSEFramesTransmitted_7", 0x5C},
+	{"aMACControlFramesTransmitted", 0x60},
+	{"etherStatsPkts", 0x64}
+};
+
+static struct gaudi_nic_spmu_event_type gaudi_nic0_spmu_event_type[] = {
+	{"requester_psn_out_of_range", 18},
+	{"responder_duplicate_psn", 21},
+	{"responder_out_of_sequence_psn", 22}
+};
+
+static struct gaudi_nic_spmu_event_type gaudi_nic1_spmu_event_type[] = {
+	{"requester_psn_out_of_range", 6},
+	{"responder_duplicate_psn", 9},
+	{"responder_out_of_sequence_psn", 10}
+};
+
+static struct gaudi_nic_xpcs91_reg_type gaudi_nic_xpcs91_reg_type[] = {
+	{"correctable_errors", 0x2, 0x3},
+	{"uncorrectable_errors", 0x4, 0x5}
+};
+
+static struct gaudi_nic_sw_cnt_type gaudi_nic_sw_cnt_type[] = {
+	{"pcs_local_faults"},
+	{"pcs_remote_faults"},
+};
+
+static void gaudi_nic_get_drvinfo(struct net_device *netdev,
+					struct ethtool_drvinfo *drvinfo)
+{
+	struct gaudi_nic_device **ptr = netdev_priv(netdev);
+	struct gaudi_nic_device *gaudi_nic = *ptr;
+	struct hl_device *hdev = gaudi_nic->hdev;
+
+	strlcpy(drvinfo->driver, HL_NAME, sizeof(drvinfo->driver));
+	strlcpy(drvinfo->fw_version, hdev->asic_prop.cpucp_info.cpucp_version,
+		sizeof(drvinfo->fw_version));
+	if (hdev->pdev)
+		strlcpy(drvinfo->bus_info, pci_name(hdev->pdev),
+				sizeof(drvinfo->bus_info));
+}
+
+static int gaudi_nic_get_link_ksettings(struct net_device *netdev,
+					struct ethtool_link_ksettings *cmd)
+{
+	struct gaudi_nic_device **ptr = netdev_priv(netdev);
+	struct gaudi_nic_device *gaudi_nic = *ptr;
+	struct hl_device *hdev = gaudi_nic->hdev;
+	u32 port, speed;
+
+	port = gaudi_nic->port;
+	speed = gaudi_nic->speed;
+
+	cmd->base.speed = speed;
+	cmd->base.duplex = DUPLEX_FULL;
+
+	ethtool_link_ksettings_zero_link_mode(cmd, supported);
+	ethtool_link_ksettings_zero_link_mode(cmd, advertising);
+
+	ethtool_add_mode(cmd, supported, 100000baseCR4_Full);
+	ethtool_add_mode(cmd, supported, 100000baseSR4_Full);
+	ethtool_add_mode(cmd, supported, 100000baseKR4_Full);
+	ethtool_add_mode(cmd, supported, 100000baseLR4_ER4_Full);
+
+	ethtool_add_mode(cmd, supported, 50000baseSR2_Full);
+	ethtool_add_mode(cmd, supported, 50000baseCR2_Full);
+	ethtool_add_mode(cmd, supported, 50000baseKR2_Full);
+
+	if (speed == SPEED_100000) {
+		ethtool_add_mode(cmd, advertising, 100000baseCR4_Full);
+		ethtool_add_mode(cmd, advertising, 100000baseSR4_Full);
+		ethtool_add_mode(cmd, advertising, 100000baseKR4_Full);
+		ethtool_add_mode(cmd, advertising, 100000baseLR4_ER4_Full);
+
+		cmd->base.port = PORT_FIBRE;
+
+		ethtool_add_mode(cmd, supported, FIBRE);
+		ethtool_add_mode(cmd, advertising, FIBRE);
+
+		ethtool_add_mode(cmd, supported, Backplane);
+		ethtool_add_mode(cmd, advertising, Backplane);
+	} else if (speed == SPEED_50000) {
+		ethtool_add_mode(cmd, advertising, 50000baseSR2_Full);
+		ethtool_add_mode(cmd, advertising, 50000baseCR2_Full);
+		ethtool_add_mode(cmd, advertising, 50000baseKR2_Full);
+	} else {
+		dev_err(hdev->dev, "unknown speed %d, port %d\n", speed, port);
+		return -EFAULT;
+	}
+
+	ethtool_add_mode(cmd, supported, Autoneg);
+
+	if (gaudi_nic->auto_neg_enable) {
+		ethtool_add_mode(cmd, advertising, Autoneg);
+		cmd->base.autoneg = AUTONEG_ENABLE;
+		if (gaudi_nic->auto_neg_resolved)
+			ethtool_add_mode(cmd, lp_advertising, Autoneg);
+	} else {
+		cmd->base.autoneg = AUTONEG_DISABLE;
+	}
+
+	ethtool_add_mode(cmd, supported, Pause);
+
+	if (gaudi_nic->pfc_enable)
+		ethtool_add_mode(cmd, advertising, Pause);
+
+	return 0;
+}
+
+static bool check_ksettings(const struct ethtool_link_ksettings *old_cmd,
+				const struct ethtool_link_ksettings *new_cmd)
+{
+	/* only autoneg and speed are mutable */
+	return (old_cmd->base.duplex == new_cmd->base.duplex) &&
+		(old_cmd->base.port == new_cmd->base.port) &&
+		(old_cmd->base.phy_address == new_cmd->base.phy_address) &&
+		(old_cmd->base.eth_tp_mdix_ctrl ==
+				new_cmd->base.eth_tp_mdix_ctrl) &&
+		bitmap_empty(new_cmd->link_modes.advertising,
+				__ETHTOOL_LINK_MODE_MASK_NBITS);
+}
+
+static int gaudi_nic_set_link_ksettings(struct net_device *netdev,
+				const struct ethtool_link_ksettings *cmd)
+{
+	struct gaudi_nic_device **ptr = netdev_priv(netdev);
+	struct ethtool_link_ksettings curr_cmd = {0};
+	struct gaudi_nic_device *gaudi_nic = *ptr;
+	u32 port = gaudi_nic->port;
+	struct hl_device *hdev;
+	bool auto_neg;
+	int rc, speed;
+
+	hdev = gaudi_nic->hdev;
+
+	rc = gaudi_nic_get_link_ksettings(netdev, &curr_cmd);
+	if (rc)
+		return rc;
+
+	if (!check_ksettings(&curr_cmd, cmd))
+		return -EOPNOTSUPP;
+
+	speed = cmd->base.speed;
+	auto_neg = cmd->base.autoneg == AUTONEG_ENABLE;
+
+	switch (speed) {
+	case SPEED_10000:
+	case SPEED_25000:
+	case SPEED_50000:
+		if (gaudi_nic->nic_macro->num_of_lanes == NIC_LANES_4) {
+			dev_err(hdev->dev,
+				"NIC %d with 4 lanes should be used only with speed of 100000Mb/s\n",
+				port);
+			return -EFAULT;
+		}
+		break;
+	case SPEED_100000:
+		break;
+	default:
+		dev_err(hdev->dev, "got invalid speed %dMb/s for NIC %d\n",
+			speed, port);
+		return -EINVAL;
+	}
+
+	if ((gaudi_nic->speed == speed) &&
+			(gaudi_nic->auto_neg_enable == auto_neg))
+		return 0;
+
+	if (atomic_cmpxchg(&gaudi_nic->in_reset, 0, 1)) {
+		dev_err(hdev->dev, "port %d is in reset, can't change speed\n",
+			port);
+		return -EBUSY;
+	}
+
+	gaudi_nic->speed = speed;
+	if (auto_neg)
+		hdev->nic_auto_neg_mask |= BIT(port);
+	else
+		hdev->nic_auto_neg_mask &= ~BIT(port);
+
+	if (gaudi_nic->enabled) {
+		rc = gaudi_nic_port_reset(gaudi_nic);
+		if (rc)
+			dev_err(hdev->dev,
+				"Failed to reset NIC %d for speed change, rc %d\n",
+				port, rc);
+	}
+
+	atomic_set(&gaudi_nic->in_reset, 0);
+
+	return rc;
+}
+
+static u32 gaudi_nic_get_link(struct net_device *netdev)
+{
+	return netif_carrier_ok(netdev);
+}
+
+static void gaudi_nic_get_internal_strings(struct net_device *netdev,
+					u8 *data)
+{
+	struct gaudi_nic_device **ptr = netdev_priv(netdev);
+	struct gaudi_nic_spmu_event_type *spmu_stats;
+	struct gaudi_nic_device *gaudi_nic = *ptr;
+	u32 port = gaudi_nic->port;
+	u32 num_spmus;
+	u32 i;
+
+	if (port & 1) {
+		num_spmus = NIC_SPMU1_STATS_LEN;
+		spmu_stats = gaudi_nic1_spmu_event_type;
+	} else {
+		num_spmus = NIC_SPMU0_STATS_LEN;
+		spmu_stats = gaudi_nic0_spmu_event_type;
+	}
+
+	for (i = 0 ; i < num_spmus ; i++)
+		memcpy(data + i * ETH_GSTRING_LEN,
+				spmu_stats[i].stat_string,
+				ETH_GSTRING_LEN);
+	data += i * ETH_GSTRING_LEN;
+	for (i = 0 ; i < NIC_MAC_STATS_RX_LEN ; i++)
+		memcpy(data + i * ETH_GSTRING_LEN,
+				gaudi_nic_mac_stats_rx[i].stat_string,
+				ETH_GSTRING_LEN);
+	data += i * ETH_GSTRING_LEN;
+	for (i = 0 ; i < NIC_XPCS91_REGS_CNT_LEN ; i++)
+		memcpy(data + i * ETH_GSTRING_LEN,
+				gaudi_nic_xpcs91_reg_type[i].stat_string,
+				ETH_GSTRING_LEN);
+	data += i * ETH_GSTRING_LEN;
+	for (i = 0 ; i < NIC_SW_CNT_LEN ; i++)
+		memcpy(data + i * ETH_GSTRING_LEN,
+				gaudi_nic_sw_cnt_type[i].stat_string,
+				ETH_GSTRING_LEN);
+	data += i * ETH_GSTRING_LEN;
+	for (i = 0 ; i < NIC_MAC_STATS_TX_LEN ; i++)
+		memcpy(data + i * ETH_GSTRING_LEN,
+				gaudi_nic_mac_stats_tx[i].stat_string,
+				ETH_GSTRING_LEN);
+
+}
+
+static void gaudi_nic_get_strings(struct net_device *netdev, u32 stringset,
+					u8 *data)
+{
+	int i;
+
+	if (stringset == ETH_SS_STATS) {
+		for (i = 0; i < NIC_STATS_LEN; i++)
+			memcpy(data + i * ETH_GSTRING_LEN,
+					gaudi_nic_ethtool_stats[i].stat_string,
+					ETH_GSTRING_LEN);
+		gaudi_nic_get_internal_strings(netdev,
+					data + i * ETH_GSTRING_LEN);
+	}
+}
+
+static int gaudi_nic_get_sset_count(struct net_device *netdev, int sset)
+{
+	int num_spmus, mac_counters, xpcs91_counters, sw_counters;
+	struct gaudi_nic_device **ptr = netdev_priv(netdev);
+	struct gaudi_nic_device *gaudi_nic = *ptr;
+	u32 port = gaudi_nic->port;
+
+	num_spmus = (port & 1) ? NIC_SPMU1_STATS_LEN : NIC_SPMU0_STATS_LEN;
+	mac_counters = NIC_MAC_STATS_RX_LEN + NIC_MAC_STATS_TX_LEN;
+	xpcs91_counters = NIC_XPCS91_REGS_CNT_LEN;
+	sw_counters = NIC_SW_CNT_LEN;
+
+	switch (sset) {
+	case ETH_SS_STATS:
+		return NIC_STATS_LEN + num_spmus + mac_counters +
+			xpcs91_counters + sw_counters;
+	default:
+		return -EOPNOTSUPP;
+	}
+}
+static u64 gaudi_nic_read_mac_counter(struct hl_device *hdev, u32 port,
+						int offset, bool is_rx)
+{
+	u64 lo_part, hi_part;
+	u64 start_reg;
+
+	if (!hdev->supports_coresight)
+		return 0;
+
+	if (is_rx)
+		if (port & 1)
+			start_reg = NIC_MAC_RX_PORT1_OFFSET;
+		else
+			start_reg = NIC_MAC_RX_PORT0_OFFSET;
+	else
+		if (port & 1)
+			start_reg = NIC_MAC_TX_PORT1_OFFSET;
+		else
+			start_reg = NIC_MAC_TX_PORT0_OFFSET;
+
+	lo_part = NIC_MAC_STAT_RREG32(port, start_reg + offset);
+	/* Volatile read: MUST read high part after low */
+	hi_part = NIC_MAC_STAT_RREG32(port, NIC_MAC_STAT_HI_PART);
+
+	return lo_part | (hi_part << 32);
+}
+
+static void gaudi_nic_read_xpcs91_regs(struct gaudi_nic_device *gaudi_nic,
+					u64 *out_data)
+{
+	u32 lo_part, hi_part, start_lane = __ffs(gaudi_nic->fw_tuning_mask);
+
+	lo_part = gaudi_nic_mac_read(gaudi_nic, start_lane, "xpcs91",
+			gaudi_nic_xpcs91_reg_type[0].lo_offset);
+	hi_part = gaudi_nic_mac_read(gaudi_nic, start_lane, "xpcs91",
+			gaudi_nic_xpcs91_reg_type[0].hi_offset);
+	gaudi_nic->correctable_errors_cnt +=
+					(hi_part << 16) | lo_part;
+	out_data[0] = gaudi_nic->correctable_errors_cnt;
+
+	lo_part = gaudi_nic_mac_read(gaudi_nic, start_lane, "xpcs91",
+			gaudi_nic_xpcs91_reg_type[1].lo_offset);
+	hi_part = gaudi_nic_mac_read(gaudi_nic, start_lane, "xpcs91",
+			gaudi_nic_xpcs91_reg_type[1].hi_offset);
+	gaudi_nic->uncorrectable_errors_cnt +=
+					(hi_part << 16) | lo_part;
+	out_data[1] = gaudi_nic->uncorrectable_errors_cnt;
+}
+
+static void gaudi_nic_read_sw_counters(struct gaudi_nic_device *gaudi_nic,
+					u64 *out_data)
+{
+	out_data[0] = gaudi_nic->pcs_local_fault_cnt;
+	out_data[1] = gaudi_nic->pcs_remote_fault_cnt;
+}
+
+static void gaudi_nic_get_internal_stats(struct net_device *netdev, u64 *data)
+{
+	struct gaudi_nic_device **ptr = netdev_priv(netdev);
+	struct gaudi_nic_device *gaudi_nic = *ptr;
+	struct hl_device *hdev = gaudi_nic->hdev;
+	u32 port = gaudi_nic->port;
+	u32 num_spmus;
+	int i;
+
+	num_spmus = (port & 1) ? NIC_SPMU1_STATS_LEN : NIC_SPMU0_STATS_LEN;
+
+	gaudi_sample_spmu_nic(hdev, port, num_spmus, data);
+	data += num_spmus;
+
+	for (i = 0 ; i < NIC_MAC_STATS_RX_LEN ; i++)
+		data[i] = gaudi_nic_read_mac_counter(hdev, port,
+				gaudi_nic_mac_stats_rx[i].stat_offset, true);
+	data += i;
+
+	gaudi_nic_read_xpcs91_regs(gaudi_nic, data);
+	data += NIC_XPCS91_REGS_CNT_LEN;
+
+	gaudi_nic_read_sw_counters(gaudi_nic, data);
+	data += NIC_SW_CNT_LEN;
+
+	for (i = 0 ; i < NIC_MAC_STATS_TX_LEN ; i++)
+		data[i] = gaudi_nic_read_mac_counter(hdev, port,
+				gaudi_nic_mac_stats_tx[i].stat_offset, false);
+}
+
+static void gaudi_nic_get_ethtool_stats(struct net_device *netdev,
+					struct ethtool_stats *stats, u64 *data)
+{
+	struct gaudi_nic_device **ptr = netdev_priv(netdev);
+	struct gaudi_nic_device *gaudi_nic = *ptr;
+	struct hl_device *hdev = gaudi_nic->hdev;
+	u32 port = gaudi_nic->port;
+	char *p;
+	int i;
+
+	if (disabled_or_in_reset(gaudi_nic)) {
+		dev_info_ratelimited(hdev->dev,
+			"port %d is in reset, can't get ethtool stats\n", port);
+		return;
+	}
+
+	for (i = 0; i < NIC_STATS_LEN ; i++) {
+		p = (char *) netdev + gaudi_nic_ethtool_stats[i].stat_offset;
+		data[i] = *(u32 *) p;
+	}
+
+	gaudi_nic_get_internal_stats(netdev, data + i);
+}
+
+static int gaudi_nic_get_module_info(struct net_device *netdev,
+					struct ethtool_modinfo *modinfo)
+{
+	modinfo->type = ETH_MODULE_SFF_8636;
+	modinfo->eeprom_len = ETH_MODULE_SFF_8636_LEN;
+
+	return 0;
+}
+
+static int gaudi_nic_get_module_eeprom(struct net_device *netdev,
+					struct ethtool_eeprom *ee, u8 *data)
+{
+	struct gaudi_nic_device **ptr = netdev_priv(netdev);
+	struct gaudi_nic_device *gaudi_nic = *ptr;
+	struct hl_device *hdev = gaudi_nic->hdev;
+	u32 first, last, len;
+
+	if (ee->len == 0)
+		return -EINVAL;
+
+	first = ee->offset;
+	last = ee->offset + ee->len;
+
+	if (first < ETH_MODULE_SFF_8636_LEN) {
+		len = min_t(unsigned int, last, ETH_MODULE_SFF_8636_LEN);
+		len -= first;
+
+		memcpy(data, hdev->asic_prop.cpucp_nic_info.qsfp_eeprom + first,
+			len);
+	}
+
+	return 0;
+}
+
+/* enable spmus for ethtool monitoring */
+void gaudi_nic_spmu_init(struct hl_device *hdev, int port)
+{
+	u32 spmu_events[NIC_SPMU_STATS_LEN_MAX], num_event_types;
+	struct gaudi_nic_spmu_event_type *event_types;
+	int rc, i;
+
+	if (port & 1) {
+		num_event_types = NIC_SPMU1_STATS_LEN;
+		event_types = gaudi_nic1_spmu_event_type;
+	} else {
+		num_event_types = NIC_SPMU0_STATS_LEN;
+		event_types = gaudi_nic0_spmu_event_type;
+	}
+
+	if (num_event_types > NIC_SPMU_STATS_LEN_MAX)
+		num_event_types = NIC_SPMU_STATS_LEN_MAX;
+
+	for (i = 0 ; i < num_event_types ; i++)
+		spmu_events[i] = event_types[i].index;
+
+	rc = gaudi_config_spmu_nic(hdev, port, num_event_types,
+			spmu_events);
+	if (rc)
+		dev_err(hdev->dev,
+			"Failed to configure spmu for NIC port %d\n",
+			port);
+}
+
+u64 gaudi_nic_read_mac_stat_counter(struct hl_device *hdev, u32 port, int idx,
+					bool is_rx)
+{
+	struct gaudi_nic_ethtool_stats *stat = is_rx ?
+						&gaudi_nic_mac_stats_rx[idx] :
+						&gaudi_nic_mac_stats_tx[idx];
+
+	return gaudi_nic_read_mac_counter(hdev, port, stat->stat_offset, is_rx);
+}
+
+const struct ethtool_ops gaudi_nic_ethtool_ops = {
+	.get_drvinfo = gaudi_nic_get_drvinfo,
+	.get_link_ksettings = gaudi_nic_get_link_ksettings,
+	.set_link_ksettings = gaudi_nic_set_link_ksettings,
+	.get_link = gaudi_nic_get_link,
+	.get_strings = gaudi_nic_get_strings,
+	.get_sset_count = gaudi_nic_get_sset_count,
+	.get_ethtool_stats = gaudi_nic_get_ethtool_stats,
+	.get_module_info   = gaudi_nic_get_module_info,
+	.get_module_eeprom = gaudi_nic_get_module_eeprom,
+};
+
-- 
2.17.1
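
A side note on the counter reads in the patch above: gaudi_nic_read_mac_counter() joins a low and a high register half into one 64-bit value, and the XPCS91 path accumulates (hi_part << 16) | lo_part. The declaration of lo_part/hi_part in gaudi_nic_read_mac_counter() is not visible in this excerpt, so the following is only a minimal user-space sketch of the arithmetic. The helper names are hypothetical, and treating the XPCS91 halves as 16-bit values is an assumption. The point is that a 32-bit half must be widened before it is shifted by 32.

```c
#include <stdint.h>

/*
 * Combine the low and high 32-bit halves of a statistics counter.
 * Shifting a 32-bit value left by 32 is undefined behavior in C, so the
 * high half is widened to 64 bits before the shift.
 */
static uint64_t combine_counter(uint32_t lo_part, uint32_t hi_part)
{
	return (uint64_t)lo_part | ((uint64_t)hi_part << 32);
}

/*
 * Add a 32-bit error count, built from two 16-bit halves, to a running
 * 64-bit total. This mirrors the (hi_part << 16) | lo_part pattern; the
 * 16-bit width of each half is an assumption made for this sketch.
 */
static uint64_t accumulate_errors(uint64_t total, uint16_t lo, uint16_t hi)
{
	return total + (((uint32_t)hi << 16) | lo);
}
```

With these helpers, combine_counter(0x1, 0x2) yields 0x200000001, and the running total grows by 0x10001 when both 16-bit halves read 1.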


* [PATCH v3 13/14] habanalabs/gaudi: support DCB protocol
  2020-09-15 17:10 [PATCH v3 00/14] Adding GAUDI NIC code to habanalabs driver Oded Gabbay
                   ` (10 preceding siblings ...)
  2020-09-15 17:10 ` [PATCH v3 12/14] habanalabs/gaudi: Add ethtool support using coresight Oded Gabbay
@ 2020-09-15 17:10 ` Oded Gabbay
  2020-09-15 17:10 ` [PATCH v3 14/14] habanalabs/gaudi: add NIC init/fini calls from common code Oded Gabbay
                   ` (3 subsequent siblings)
  15 siblings, 0 replies; 83+ messages in thread
From: Oded Gabbay @ 2020-09-15 17:10 UTC (permalink / raw)
  To: linux-kernel, netdev
  Cc: SW_Drivers, gregkh, davem, kuba, andrew, f.fainelli, Omer Shpigelman

From: Omer Shpigelman <oshpigelman@habana.ai>

Add DCB support to configure the NIC's Priority Flow Control (PFC).
The added support is minimal because full support is not
currently required.

A summary of the supported callbacks:

- ieee_getpfc: get the current PFC configuration. PFC is enabled by
               default.
- ieee_setpfc: set PFC configuration. Only 0 or all 4 priorities can be
               enabled, no subset is allowed.
- getdcbx: get DCBX capability.
- setdcbx: set DCBX capability. Only host LLDP agent and IEEE protocol
           flavors are supported.

Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
---
 drivers/misc/habanalabs/gaudi/Makefile        |   2 +-
 drivers/misc/habanalabs/gaudi/gaudi_nic.c     |   3 +
 .../misc/habanalabs/gaudi/gaudi_nic_dcbnl.c   | 108 ++++++++++++++++++
 3 files changed, 112 insertions(+), 1 deletion(-)
 create mode 100644 drivers/misc/habanalabs/gaudi/gaudi_nic_dcbnl.c

diff --git a/drivers/misc/habanalabs/gaudi/Makefile b/drivers/misc/habanalabs/gaudi/Makefile
index df674c5973e0..0345c91c40f8 100644
--- a/drivers/misc/habanalabs/gaudi/Makefile
+++ b/drivers/misc/habanalabs/gaudi/Makefile
@@ -3,4 +3,4 @@ HL_GAUDI_FILES := gaudi/gaudi.o gaudi/gaudi_hwmgr.o gaudi/gaudi_security.o \
 	gaudi/gaudi_coresight.o
 
 HL_GAUDI_FILES += gaudi/gaudi_nic.o gaudi/gaudi_phy.o \
-	gaudi/gaudi_nic_ethtool.o
+	gaudi/gaudi_nic_ethtool.o gaudi/gaudi_nic_dcbnl.o
diff --git a/drivers/misc/habanalabs/gaudi/gaudi_nic.c b/drivers/misc/habanalabs/gaudi/gaudi_nic.c
index c97e5f0e1c53..83d369ffbd89 100644
--- a/drivers/misc/habanalabs/gaudi/gaudi_nic.c
+++ b/drivers/misc/habanalabs/gaudi/gaudi_nic.c
@@ -2779,6 +2779,9 @@ static int port_register(struct hl_device *hdev, int port)
 
 	ndev->netdev_ops = &gaudi_nic_netdev_ops;
 	ndev->ethtool_ops = &gaudi_nic_ethtool_ops;
+#ifdef CONFIG_DCB
+	ndev->dcbnl_ops = &gaudi_nic_dcbnl_ops;
+#endif
 	ndev->watchdog_timeo = NIC_TX_TIMEOUT;
 	ndev->min_mtu = ETH_MIN_MTU;
 	ndev->max_mtu = NIC_MAX_MTU;
diff --git a/drivers/misc/habanalabs/gaudi/gaudi_nic_dcbnl.c b/drivers/misc/habanalabs/gaudi/gaudi_nic_dcbnl.c
new file mode 100644
index 000000000000..87394f50400a
--- /dev/null
+++ b/drivers/misc/habanalabs/gaudi/gaudi_nic_dcbnl.c
@@ -0,0 +1,108 @@
+// SPDX-License-Identifier: GPL-2.0
+
+/*
+ * Copyright 2018-2020 HabanaLabs, Ltd.
+ * All Rights Reserved.
+ */
+
+#include "gaudi_nic.h"
+
+#define PFC_PRIO_NUM		4
+#define PFC_PRIO_MASK_ALL	GENMASK(PFC_PRIO_NUM - 1, 0)
+#define PFC_PRIO_MASK_NONE	0
+#define PFC_STAT_TX_OFFSET	17
+#define PFC_STAT_RX_OFFSET	27
+
+#ifdef CONFIG_DCB
+static int gaudi_nic_dcbnl_ieee_getpfc(struct net_device *netdev,
+					struct ieee_pfc *pfc)
+{
+	struct gaudi_nic_device **ptr = netdev_priv(netdev);
+	struct gaudi_nic_device *gaudi_nic = *ptr;
+	struct hl_device *hdev = gaudi_nic->hdev;
+	int rc = 0, i, tx_idx, rx_idx;
+	u32 port = gaudi_nic->port;
+
+	if (disabled_or_in_reset(gaudi_nic)) {
+		dev_info_ratelimited(hdev->dev,
+				"port %d is in reset, can't get PFC\n", port);
+		return -EBUSY;
+	}
+
+	pfc->pfc_en = gaudi_nic->pfc_enable ? PFC_PRIO_MASK_ALL :
+							PFC_PRIO_MASK_NONE;
+	pfc->pfc_cap = PFC_PRIO_NUM;
+
+	for (i = 0 ; i < PFC_PRIO_NUM ; i++) {
+		tx_idx = PFC_STAT_TX_OFFSET + i;
+		rx_idx = PFC_STAT_RX_OFFSET + i;
+
+		pfc->requests[i] = gaudi_nic_read_mac_stat_counter(hdev, port,
+								tx_idx, false);
+		pfc->indications[i] = gaudi_nic_read_mac_stat_counter(hdev,
+							port, rx_idx, true);
+	}
+
+	return rc;
+}
+
+static int gaudi_nic_dcbnl_ieee_setpfc(struct net_device *netdev,
+					struct ieee_pfc *pfc)
+{
+	struct gaudi_nic_device **ptr = netdev_priv(netdev);
+	struct gaudi_nic_device *gaudi_nic = *ptr;
+	struct hl_device *hdev = gaudi_nic->hdev;
+	u32 port = gaudi_nic->port;
+	u8 curr_pfc_en;
+
+	if (pfc->pfc_en & ~PFC_PRIO_MASK_ALL) {
+		dev_info_ratelimited(hdev->dev,
+					"PFC supports %d priorities only, port %d\n",
+					PFC_PRIO_NUM, port);
+		return -EINVAL;
+	}
+
+	if ((pfc->pfc_en != PFC_PRIO_MASK_NONE) &&
+			(pfc->pfc_en != PFC_PRIO_MASK_ALL)) {
+		dev_info_ratelimited(hdev->dev,
+					"PFC should be enabled/disabled on all priorities, port %d\n",
+					port);
+		return -EINVAL;
+	}
+
+	if (disabled_or_in_reset(gaudi_nic)) {
+		dev_info_ratelimited(hdev->dev,
+				"port %d is in reset, can't set PFC\n", port);
+		return -EBUSY;
+	}
+
+	curr_pfc_en = gaudi_nic->pfc_enable ? PFC_PRIO_MASK_ALL :
+							PFC_PRIO_MASK_NONE;
+
+	if (pfc->pfc_en == curr_pfc_en)
+		return 0;
+
+	gaudi_nic->pfc_enable = !gaudi_nic->pfc_enable;
+
+	gaudi_nic_set_pfc(gaudi_nic);
+
+	return 0;
+}
+
+static u8 gaudi_nic_dcbnl_getdcbx(struct net_device *netdev)
+{
+	return DCB_CAP_DCBX_HOST | DCB_CAP_DCBX_VER_IEEE;
+}
+
+static u8 gaudi_nic_dcbnl_setdcbx(struct net_device *netdev, u8 mode)
+{
+	return !(mode == (DCB_CAP_DCBX_HOST | DCB_CAP_DCBX_VER_IEEE));
+}
+
+const struct dcbnl_rtnl_ops gaudi_nic_dcbnl_ops = {
+	.ieee_getpfc	= gaudi_nic_dcbnl_ieee_getpfc,
+	.ieee_setpfc	= gaudi_nic_dcbnl_ieee_setpfc,
+	.getdcbx	= gaudi_nic_dcbnl_getdcbx,
+	.setdcbx	= gaudi_nic_dcbnl_setdcbx
+};
+#endif
-- 
2.17.1
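
The all-or-nothing rule enforced by ieee_setpfc in the patch above can be tried outside the kernel. The following is a rough sketch only: struct pfc_state and pfc_set() are hypothetical stand-ins for the per-port state and the callback, the mask macro is a plain-C equivalent of GENMASK(PFC_PRIO_NUM - 1, 0), and the reset check and the hardware write are left out.

```c
#include <errno.h>
#include <stdint.h>

#define PFC_PRIO_NUM		4
/* Plain-C equivalent of the kernel's GENMASK(PFC_PRIO_NUM - 1, 0). */
#define PFC_PRIO_MASK_ALL	((1u << PFC_PRIO_NUM) - 1)
#define PFC_PRIO_MASK_NONE	0u

/* Hypothetical stand-in for the per-port PFC state. */
struct pfc_state {
	int pfc_enable;
};

/*
 * Accept only "no priorities" or "all four priorities".
 * Returns 0 on success (including the no-change case) and -EINVAL for
 * priorities outside the supported range or for a partial subset.
 */
static int pfc_set(struct pfc_state *st, uint8_t pfc_en)
{
	uint8_t curr;

	/* Reject bits above the four supported priorities. */
	if (pfc_en & ~PFC_PRIO_MASK_ALL)
		return -EINVAL;

	/* Reject any partial subset of the supported priorities. */
	if (pfc_en != PFC_PRIO_MASK_NONE && pfc_en != PFC_PRIO_MASK_ALL)
		return -EINVAL;

	curr = st->pfc_enable ? PFC_PRIO_MASK_ALL : PFC_PRIO_MASK_NONE;
	if (pfc_en == curr)
		return 0;	/* nothing to change */

	/* The real driver would reprogram the hardware at this point. */
	st->pfc_enable = !st->pfc_enable;
	return 0;
}
```

Starting from an enabled port, a request for 0x3 or 0x10 fails with -EINVAL, while 0x0 disables PFC and 0xF re-enables it.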


* [PATCH v3 14/14] habanalabs/gaudi: add NIC init/fini calls from common code
  2020-09-15 17:10 [PATCH v3 00/14] Adding GAUDI NIC code to habanalabs driver Oded Gabbay
                   ` (11 preceding siblings ...)
  2020-09-15 17:10 ` [PATCH v3 13/14] habanalabs/gaudi: support DCB protocol Oded Gabbay
@ 2020-09-15 17:10 ` Oded Gabbay
  2020-09-15 20:35 ` [PATCH v3 00/14] Adding GAUDI NIC code to habanalabs driver Jakub Kicinski
                   ` (2 subsequent siblings)
  15 siblings, 0 replies; 83+ messages in thread
From: Oded Gabbay @ 2020-09-15 17:10 UTC (permalink / raw)
  To: linux-kernel, netdev
  Cc: SW_Drivers, gregkh, davem, kuba, andrew, f.fainelli, Omer Shpigelman

From: Omer Shpigelman <oshpigelman@habana.ai>

Finally, enable the NIC engines. Initialize the NIC ports mask variable
with a full mask so that all ports are initialized.

Call the NIC init/fini from the common code.

Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
---
 drivers/misc/habanalabs/common/device.c       | 18 +++++++++++++++
 drivers/misc/habanalabs/common/habanalabs.h   |  6 +++++
 .../misc/habanalabs/common/habanalabs_drv.c   |  1 +
 drivers/misc/habanalabs/common/pci.c          |  1 +
 drivers/misc/habanalabs/gaudi/gaudi.c         | 23 +++++++++++++++++++
 drivers/misc/habanalabs/goya/goya.c           | 12 ++++++++++
 6 files changed, 61 insertions(+)

diff --git a/drivers/misc/habanalabs/common/device.c b/drivers/misc/habanalabs/common/device.c
index 73d64f84aeba..dd815a545160 100644
--- a/drivers/misc/habanalabs/common/device.c
+++ b/drivers/misc/habanalabs/common/device.c
@@ -1083,6 +1083,12 @@ int hl_device_reset(struct hl_device *hdev, bool hard_reset,
 			goto out_err;
 		}
 
+		rc = hdev->asic_funcs->nic_init(hdev);
+		if (rc) {
+			dev_err(hdev->dev, "Failed to init NIC driver\n");
+			goto out_err;
+		}
+
 		hl_set_max_power(hdev);
 	} else {
 		rc = hdev->asic_funcs->soft_reset_late_init(hdev);
@@ -1318,6 +1324,13 @@ int hl_device_init(struct hl_device *hdev, struct class *hclass)
 		goto out_disabled;
 	}
 
+	rc = hdev->asic_funcs->nic_init(hdev);
+	if (rc) {
+		dev_err(hdev->dev, "Failed to init NIC driver\n");
+		rc = 0;
+		goto out_disabled;
+	}
+
 	/*
 	 * Expose devices and sysfs nodes to user.
 	 * From here there is no need to add char devices and create sysfs nodes
@@ -1469,6 +1482,11 @@ void hl_device_fini(struct hl_device *hdev)
 
 	hl_cb_pool_fini(hdev);
 
+	/* the NIC uses the kernel context for MMU mappings, therefore it must
+	 * be cleaned up before the kernel context is released
+	 */
+	hdev->asic_funcs->nic_fini(hdev);
+
 	/* Release kernel context */
 	if ((hdev->kernel_ctx) && (hl_ctx_put(hdev->kernel_ctx) != 1))
 		dev_err(hdev->dev, "kernel ctx is still alive\n");
diff --git a/drivers/misc/habanalabs/common/habanalabs.h b/drivers/misc/habanalabs/common/habanalabs.h
index 65bc2527338b..d6130715a0ef 100644
--- a/drivers/misc/habanalabs/common/habanalabs.h
+++ b/drivers/misc/habanalabs/common/habanalabs.h
@@ -697,6 +697,10 @@ struct hl_info_mac_addr;
  *                    then the timeout is the default timeout for the specific
  *                    ASIC
  * @get_hw_state: retrieve the H/W state
+ * @nic_init: init the NIC H/W and I/F. This should be called in the final stage
+ *            of the init flow, as we must not have anything that might fail
+ *            during its initialization after the NIC init.
+ * @nic_fini: perform NIC cleanup.
  * @nic_control: Perform NIC related operations.
  * @nic_cq_mmap: map the NIC CQ buffer.
  * @pci_bars_map: Map PCI BARs.
@@ -803,6 +807,8 @@ struct hl_asic_funcs {
 	int (*send_cpu_message)(struct hl_device *hdev, u32 *msg,
 				u16 len, u32 timeout, long *result);
 	enum hl_device_hw_state (*get_hw_state)(struct hl_device *hdev);
+	int (*nic_init)(struct hl_device *hdev);
+	void (*nic_fini)(struct hl_device *hdev);
 	int (*nic_control)(struct hl_device *hdev, u32 op, void *input,
 				void *output);
 	int (*nic_cq_mmap)(struct hl_device *hdev, struct vm_area_struct *vma);
diff --git a/drivers/misc/habanalabs/common/habanalabs_drv.c b/drivers/misc/habanalabs/common/habanalabs_drv.c
index b7fbbe8f2577..e99e84b6a787 100644
--- a/drivers/misc/habanalabs/common/habanalabs_drv.c
+++ b/drivers/misc/habanalabs/common/habanalabs_drv.c
@@ -242,6 +242,7 @@ static void set_driver_behavior_per_device(struct hl_device *hdev)
 	hdev->bmc_enable = 1;
 	hdev->hard_reset_on_fw_events = 1;
 	hdev->card_type = cpucp_card_type_pci;
+	hdev->nic_ports_mask = 0x3FF;
 	hdev->nic_ports_ext_mask = 0x3FF;
 	hdev->nic_auto_neg_mask = 0x3FF;
 	hdev->nic_load_fw = 0;
diff --git a/drivers/misc/habanalabs/common/pci.c b/drivers/misc/habanalabs/common/pci.c
index 923b2606e29f..c376ab4695ab 100644
--- a/drivers/misc/habanalabs/common/pci.c
+++ b/drivers/misc/habanalabs/common/pci.c
@@ -230,6 +230,7 @@ int hl_pci_set_inbound_region(struct hl_device *hdev, u8 region,
 			lower_32_bits(pci_region->addr));
 	rc |= hl_pci_iatu_write(hdev, offset + 0x18,
 			upper_32_bits(pci_region->addr));
+	/* Set bar type as memory */
 	rc |= hl_pci_iatu_write(hdev, offset + 0x0, 0);
 
 	/* Enable + bar/address match + match enable + bar number */
diff --git a/drivers/misc/habanalabs/gaudi/gaudi.c b/drivers/misc/habanalabs/gaudi/gaudi.c
index 2af07eb4165c..836391ddb890 100644
--- a/drivers/misc/habanalabs/gaudi/gaudi.c
+++ b/drivers/misc/habanalabs/gaudi/gaudi.c
@@ -883,6 +883,27 @@ static void gaudi_late_fini(struct hl_device *hdev)
 	hdev->hl_chip_info->info = NULL;
 }
 
+static int gaudi_nic_init(struct hl_device *hdev)
+{
+	/*
+	 * In init flow we initialize the NIC ports from scratch. In hard reset
+	 * flow, we get here after the NIC ports were halted, hence we only
+	 * need to reopen them.
+	 */
+	if (atomic_read(&hdev->in_reset)) {
+		gaudi_nic_ports_reopen(hdev);
+		return 0;
+	}
+
+	return gaudi_nic_ports_init(hdev);
+}
+
+static void gaudi_nic_fini(struct hl_device *hdev)
+{
+	/* must be called after MSI was disabled */
+	gaudi_nic_ports_fini(hdev);
+}
+
 static void gaudi_nic_handle_rx(struct gaudi_nic_device *gaudi_nic)
 {
 	/* at this point, interrupts were disabled by the H/W */
@@ -7484,6 +7505,8 @@ static const struct hl_asic_funcs gaudi_funcs = {
 	.get_eeprom_data = gaudi_get_eeprom_data,
 	.send_cpu_message = gaudi_send_cpu_message,
 	.get_hw_state = gaudi_get_hw_state,
+	.nic_init = gaudi_nic_init,
+	.nic_fini = gaudi_nic_fini,
 	.nic_control = gaudi_nic_control,
 	.nic_cq_mmap = gaudi_nic_cq_mmap,
 	.pci_bars_map = gaudi_pci_bars_map,
diff --git a/drivers/misc/habanalabs/goya/goya.c b/drivers/misc/habanalabs/goya/goya.c
index 9620654eefae..76f855fbc4d5 100644
--- a/drivers/misc/habanalabs/goya/goya.c
+++ b/drivers/misc/habanalabs/goya/goya.c
@@ -5269,6 +5269,16 @@ static enum hl_device_hw_state goya_get_hw_state(struct hl_device *hdev)
 	return RREG32(mmHW_STATE);
 }
 
+static int goya_nic_init(struct hl_device *hdev)
+{
+	return 0;
+}
+
+static void goya_nic_fini(struct hl_device *hdev)
+{
+
+}
+
 static int goya_nic_control(struct hl_device *hdev, u32 op, void *input,
 			void *output)
 {
@@ -5409,6 +5419,8 @@ static const struct hl_asic_funcs goya_funcs = {
 	.get_eeprom_data = goya_get_eeprom_data,
 	.send_cpu_message = goya_send_cpu_message,
 	.get_hw_state = goya_get_hw_state,
+	.nic_init = goya_nic_init,
+	.nic_fini = goya_nic_fini,
 	.nic_control = goya_nic_control,
 	.nic_cq_mmap = goya_nic_mmap,
 	.pci_bars_map = goya_pci_bars_map,
-- 
2.17.1


* Re: [PATCH v3 00/14] Adding GAUDI NIC code to habanalabs driver
  2020-09-15 17:10 [PATCH v3 00/14] Adding GAUDI NIC code to habanalabs driver Oded Gabbay
                   ` (12 preceding siblings ...)
  2020-09-15 17:10 ` [PATCH v3 14/14] habanalabs/gaudi: add NIC init/fini calls from common code Oded Gabbay
@ 2020-09-15 20:35 ` Jakub Kicinski
  2020-09-15 20:46   ` Oded Gabbay
  2020-09-15 20:42 ` David Miller
  2020-09-18 12:00 ` Jason Gunthorpe
  15 siblings, 1 reply; 83+ messages in thread
From: Jakub Kicinski @ 2020-09-15 20:35 UTC (permalink / raw)
  To: Oded Gabbay
  Cc: linux-kernel, netdev, SW_Drivers, gregkh, davem, andrew,
	f.fainelli, linux-rdma

On Tue, 15 Sep 2020 20:10:08 +0300 Oded Gabbay wrote:
> Hello,
> 
> This is the second version of the patch-set to upstream the GAUDI NIC code
> into the habanalabs driver.
> 
> The only modification from v2 is in the ethtool patch (patch 12). Details
> are in that patch's commit message.

You keep reposting this, yet this SDK shim^W^W driver is still living in
drivers/misc. If you want to make it look like a NIC, the code belongs
where NIC drivers are.

Then again, is it a NIC? Why do you have those custom IOCTLs? That's far
from normal.

Please make sure to CC linux-rdma. You clearly stated that the device
does RDMA-like transfers.

* Re: [PATCH v3 00/14] Adding GAUDI NIC code to habanalabs driver
  2020-09-15 17:10 [PATCH v3 00/14] Adding GAUDI NIC code to habanalabs driver Oded Gabbay
                   ` (13 preceding siblings ...)
  2020-09-15 20:35 ` [PATCH v3 00/14] Adding GAUDI NIC code to habanalabs driver Jakub Kicinski
@ 2020-09-15 20:42 ` David Miller
  2020-09-15 20:49   ` Oded Gabbay
  2020-09-18 12:00 ` Jason Gunthorpe
  15 siblings, 1 reply; 83+ messages in thread
From: David Miller @ 2020-09-15 20:42 UTC (permalink / raw)
  To: oded.gabbay
  Cc: linux-kernel, netdev, SW_Drivers, gregkh, kuba, andrew, f.fainelli

From: Oded Gabbay <oded.gabbay@gmail.com>
Date: Tue, 15 Sep 2020 20:10:08 +0300

> This is the second version of the patch-set to upstream the GAUDI NIC code
> into the habanalabs driver.
> 
> The only modification from v2 is in the ethtool patch (patch 12). Details
> are in that patch's commit message.
> 
> Link to v2 cover letter:
> https://lkml.org/lkml/2020/9/12/201

I agree with Jakub, this driver definitely can't go-in as it is currently
structured and designed.  And because of the RDMA'ness of it, the RDMA
folks have to be CC:'d and have a chance to review this.

* Re: [PATCH v3 00/14] Adding GAUDI NIC code to habanalabs driver
  2020-09-15 20:35 ` [PATCH v3 00/14] Adding GAUDI NIC code to habanalabs driver Jakub Kicinski
@ 2020-09-15 20:46   ` Oded Gabbay
  2020-09-15 21:04     ` Jakub Kicinski
  2020-09-17 17:18     ` Jason Gunthorpe
  0 siblings, 2 replies; 83+ messages in thread
From: Oded Gabbay @ 2020-09-15 20:46 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Linux-Kernel@Vger. Kernel. Org, netdev, SW_Drivers,
	Greg Kroah-Hartman, David S. Miller, Andrew Lunn,
	Florian Fainelli, linux-rdma

On Tue, Sep 15, 2020 at 11:35 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Tue, 15 Sep 2020 20:10:08 +0300 Oded Gabbay wrote:
> > Hello,
> >
> > This is the second version of the patch-set to upstream the GAUDI NIC code
> > into the habanalabs driver.
> >
> > The only modification from v2 is in the ethtool patch (patch 12). Details
> > are in that patch's commit message.
>
> You keep reposting this, yet this SDK shim^W^W driver is still living in
> drivers/misc. If you want to make it look like a NIC, the code belongs
> where NIC drivers are.
>
> Then again, is it a NIC? Why do you have those custom IOCTLs? That's far
> from normal.

Hi Jakub,
I'm sorry but from your question it seems as if you didn't read my
cover letter at all, as I went to great lengths to explain exactly
what our device is and why we use custom IOCTLs.
TL;DR
We have an accelerator for deep learning (GAUDI) which uses RDMA as
infrastructure for communication between multiple accelerators. Same
as Nvidia uses NVlink, we use RDMA that we have inside our ASIC.
The RDMA implementation we did does NOT support some basic RDMA
IBverbs (such as MR and PD) and therefore, we can't use the rdma-core
library or connect to the rdma infrastructure in the kernel. We
wanted to do it but when we analyzed it, we saw we wouldn't be able to
support basic stuff and therefore we had to revert to our IOCTLs.
To sum it up, because our NIC is used for intra-communication, we
don't expose nor intend users to use it as a NIC per-se. However, to
be able to get statistics and manage them in a standard way, and
support control plane over Ethernet, we do register each port to the
net subsystem (i.e. create netdev per port).

I hope this short summary explains this better.
As per your request that this code lives in the net subsystem, I think
that will make it only more complicated and hard to upstream and
maintain.
I see there are other examples (e.g. sgi-xp) that contain networking
driver code in misc so I don't understand this objection.
>
> Please make sure to CC linux-rdma. You clearly stated that the device
> does RDMA-like transfers.

We don't use the RDMA infrastructure in the kernel and we can't
connect to it due to the lack of H/W support we have so I don't see
why we need to CC linux-rdma.

Oded

* Re: [PATCH v3 00/14] Adding GAUDI NIC code to habanalabs driver
  2020-09-15 20:42 ` David Miller
@ 2020-09-15 20:49   ` Oded Gabbay
  2020-09-16  6:26     ` Greg Kroah-Hartman
  0 siblings, 1 reply; 83+ messages in thread
From: Oded Gabbay @ 2020-09-15 20:49 UTC (permalink / raw)
  To: David Miller
  Cc: Linux-Kernel@Vger. Kernel. Org, netdev, SW_Drivers,
	Greg Kroah-Hartman, Jakub Kicinski, Andrew Lunn,
	Florian Fainelli

On Tue, Sep 15, 2020 at 11:42 PM David Miller <davem@davemloft.net> wrote:
>
> From: Oded Gabbay <oded.gabbay@gmail.com>
> Date: Tue, 15 Sep 2020 20:10:08 +0300
>
> > This is the second version of the patch-set to upstream the GAUDI NIC code
> > into the habanalabs driver.
> >
> > The only modification from v2 is in the ethtool patch (patch 12). Details
> > are in that patch's commit message.
> >
> > Link to v2 cover letter:
> > https://lkml.org/lkml/2020/9/12/201
>
> I agree with Jakub, this driver definitely can't go-in as it is currently
> structured and designed.
Why is that?
Can you please point to the things that bother you or are not working correctly?
I can't really fix the driver if I don't know what's wrong.

In addition, please read my reply to Jakub with the explanation of why
we designed this driver as is.

> And because of the RDMA'ness of it, the RDMA
> folks have to be CC:'d and have a chance to review this.
As I said to Jakub, the driver doesn't use the RDMA infrastructure in
the kernel and we can't connect to it due to the lack of H/W support
we have.
Therefore, I don't see why we need to CC linux-rdma.
I understood why Greg asked me to CC you because we do connect to the
netdev and standard eth infrastructure, but regarding the RDMA, it's
not really the same.

Thanks,
Oded

* Re: [PATCH v3 00/14] Adding GAUDI NIC code to habanalabs driver
  2020-09-15 20:46   ` Oded Gabbay
@ 2020-09-15 21:04     ` Jakub Kicinski
  2020-09-15 21:20       ` Oded Gabbay
  2020-09-17 17:18     ` Jason Gunthorpe
  1 sibling, 1 reply; 83+ messages in thread
From: Jakub Kicinski @ 2020-09-15 21:04 UTC (permalink / raw)
  To: Oded Gabbay
  Cc: Linux-Kernel@Vger. Kernel. Org, netdev, SW_Drivers,
	Greg Kroah-Hartman, David S. Miller, Andrew Lunn,
	Florian Fainelli, linux-rdma

On Tue, 15 Sep 2020 23:46:58 +0300 Oded Gabbay wrote:
> On Tue, Sep 15, 2020 at 11:35 PM Jakub Kicinski <kuba@kernel.org> wrote:
> > On Tue, 15 Sep 2020 20:10:08 +0300 Oded Gabbay wrote:  
> > > Hello,
> > >
> > > This is the second version of the patch-set to upstream the GAUDI NIC code
> > > into the habanalabs driver.
> > >
> > > The only modification from v2 is in the ethtool patch (patch 12). Details
> > > are in that patch's commit message.  
> >
> > You keep reposting this, yet this SDK shim^W^W driver is still living in
> > drivers/misc. If you want to make it look like a NIC, the code belongs
> > where NIC drivers are.
> >
> > Then again, is it a NIC? Why do you have those custom IOCTLs? That's far
> > from normal.  
> 
> I'm sorry but from your question it seems as if you didn't read my
> cover letter at all, as I took great lengths in explaining exactly
> what our device is and why we use custom IOCTLs.
> TL;DR
> We have an accelerator for deep learning (GAUDI) which uses RDMA as
> infrastructure for communication between multiple accelerators. Same
> as Nvidia uses NVlink, we use RDMA that we have inside our ASIC.
> The RDMA implementation we did does NOT support some basic RDMA
> IBverbs (such as MR and PD) and therefore, we can't use the rdma-core
> library or to connect to the rdma infrastructure in the kernel. We
> wanted to do it but when we analyzed it, we saw we wouldn't be able to
> support basic stuff and therefore we had to revert to our IOCTLs.
> To sum it up, because our NIC is used for intra-communication, we
> don't expose nor intend users to use it as a NIC per-se. However, to
> be able to get statistics and manage them in a standard way, and
> support control plane over Ethernet, we do register each port to the
> net subsystem (i.e. create netdev per port).
> 
> I hope this short summary explains this better.

I read your cover letter. Networking drivers don't get to define random
IOCTLs as they please. You have to take that part out of the "NIC"
driver.

> As per your request that this code lives in the net subsystem, I think
> that will make it only more complicated and hard to upstream and
> maintain.
> I see there are other examples (e.g. sgi-xp) that contain networking
> driver code in misc so I don't understand this objection.

The maintenance structure and CI systems for the kernel depend on the
directory layout. If you don't understand that I don't know how to help
you.

> > Please make sure to CC linux-rdma. You clearly stated that the device
> > does RDMA-like transfers.  
> 
> We don't use the RDMA infrastructure in the kernel and we can't
> connect to it due to the lack of H/W support we have so I don't see
> why we need to CC linux-rdma.

You have it backward. You don't get to pick and choose which parts of
the infrastructure you use, and therefore who reviews your drivers.
The device uses RDMA under the hood so Linux RDMA experts must very
much be okay with it getting merged. That's how we ensure Linux
interfaces are consistent and good quality.

* Re: [PATCH v3 00/14] Adding GAUDI NIC code to habanalabs driver
  2020-09-15 21:04     ` Jakub Kicinski
@ 2020-09-15 21:20       ` Oded Gabbay
  2020-09-15 21:37         ` Andrew Lunn
  2020-09-15 22:34         ` David Miller
  0 siblings, 2 replies; 83+ messages in thread
From: Oded Gabbay @ 2020-09-15 21:20 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Linux-Kernel@Vger. Kernel. Org, netdev, SW_Drivers,
	Greg Kroah-Hartman, David S. Miller, Andrew Lunn,
	Florian Fainelli, linux-rdma

On Wed, Sep 16, 2020 at 12:04 AM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Tue, 15 Sep 2020 23:46:58 +0300 Oded Gabbay wrote:
> > On Tue, Sep 15, 2020 at 11:35 PM Jakub Kicinski <kuba@kernel.org> wrote:
> > > On Tue, 15 Sep 2020 20:10:08 +0300 Oded Gabbay wrote:
> > > > Hello,
> > > >
> > > > This is the second version of the patch-set to upstream the GAUDI NIC code
> > > > into the habanalabs driver.
> > > >
> > > > The only modification from v2 is in the ethtool patch (patch 12). Details
> > > > are in that patch's commit message.
> > >
> > > You keep reposting this, yet this SDK shim^W^W driver is still living in
> > > drivers/misc. If you want to make it look like a NIC, the code belongs
> > > where NIC drivers are.
> > >
> > > Then again, is it a NIC? Why do you have those custom IOCTLs? That's far
> > > from normal.
> >
> > I'm sorry but from your question it seems as if you didn't read my
> > cover letter at all, as I took great lengths in explaining exactly
> > what our device is and why we use custom IOCTLs.
> > TL;DR
> > We have an accelerator for deep learning (GAUDI) which uses RDMA as
> > infrastructure for communication between multiple accelerators. Same
> > as Nvidia uses NVlink, we use RDMA that we have inside our ASIC.
> > The RDMA implementation we did does NOT support some basic RDMA
> > IBverbs (such as MR and PD) and therefore, we can't use the rdma-core
> > library or to connect to the rdma infrastructure in the kernel. We
> > wanted to do it but when we analyzed it, we saw we wouldn't be able to
> > support basic stuff and therefore we had to revert to our IOCTLs.
> > To sum it up, because our NIC is used for intra-communication, we
> > don't expose nor intend users to use it as a NIC per-se. However, to
> > be able to get statistics and manage them in a standard way, and
> > support control plane over Ethernet, we do register each port to the
> > net subsystem (i.e. create netdev per port).
> >
> > I hope this short summary explains this better.
>
> I read your cover letter. Networking drivers don't get to define random
> IOCTLs as they please. You have to take that part out of the "NIC"
> driver.

The IOCTLs are not for the Ethernet part. They are strictly for the
RDMA operations. RDMA drivers also have IOCTLs as interfaces in the
drivers/infiniband area, so I don't think I'm doing something
different here.
And my driver is not networking. It is an accelerator which has some
network ports.
btw, this is only a single new IOCTL call. The rest of the IOCTLs are
already upstreamed and are for the rest of the ASIC's compute
functionality. What I'm trying to say is that it's very common to
define IOCTLs for accelerators.

>
> > As per your request that this code lives in the net subsystem, I think
> > that will make it only more complicated and hard to upstream and
> > maintain.
> > I see there are other examples (e.g. sgi-xp) that contain networking
> > driver code in misc so I don't understand this objection.
>
> The maintenance structure and CI systems for the kernel depend on the
> directory layout. If you don't understand that I don't know how to help
> you.
I completely understand but you didn't answer my question. How come
there are drivers which create netdev objects, and specifically sgi-xp
in misc (but I also saw it in usb drivers) that live outside
drivers/net ? Why doesn't your request apply to them as well ?
When we wrote the code, we saw those examples and therefore assumed it was fine.

>
> > > Please make sure to CC linux-rdma. You clearly stated that the device
> > > does RDMA-like transfers.
> >
> > We don't use the RDMA infrastructure in the kernel and we can't
> > connect to it due to the lack of H/W support we have so I don't see
> > why we need to CC linux-rdma.
>
> You have it backward. You don't get to pick and choose which parts of
> the infrastructure you use, and therefore who reviews your drivers.
> The device uses RDMA under the hood so Linux RDMA experts must very
> much be okay with it getting merged. That's how we ensure Linux
> interfaces are consistent and good quality.

I understand your point of view but if my H/W doesn't support the
basic requirements of the RDMA infrastructure and interfaces, then
really there is nothing I can do about it. I can't use them.
I wish I was able to use that infrastructure but I can't. That's why
we wrote the IOCTLs in our accelerator driver.

Thanks,
Oded


* Re: [PATCH v3 00/14] Adding GAUDI NIC code to habanalabs driver
  2020-09-15 21:20       ` Oded Gabbay
@ 2020-09-15 21:37         ` Andrew Lunn
  2020-09-15 21:43           ` Oded Gabbay
  2020-09-15 22:36           ` David Miller
  2020-09-15 22:34         ` David Miller
  1 sibling, 2 replies; 83+ messages in thread
From: Andrew Lunn @ 2020-09-15 21:37 UTC (permalink / raw)
  To: Oded Gabbay
  Cc: Jakub Kicinski, Linux-Kernel@Vger. Kernel. Org, netdev,
	SW_Drivers, Greg Kroah-Hartman, David S. Miller,
	Florian Fainelli, linux-rdma

> I completely understand but you didn't answer my question. How come
> there are drivers which create netdev objects, and specifically sgi-xp
> in misc (but I also saw it in usb drivers) that live outside
> drivers/net ? Why doesn't your request apply to them as well ?
> When we wrote the code, we saw those examples and therefore assumed it was fine.

commit 45d9ca492e4bd1522d1b5bd125c2908f1cee3d4a
Author: Dean Nelson <dcn@sgi.com>
Date:   Tue Apr 22 14:46:56 2008 -0500

    [IA64] move XP and XPC to drivers/misc/sgi-xp
    
    Move XPC and XPNET from arch/ia64/sn/kernel to drivers/misc/sgi-xp.
    
    Signed-off-by: Dean Nelson <dcn@sgi.com>
    Signed-off-by: Tony Luck <tony.luck@intel.com>

It has been there a long time, and no networking person was involved
in its move.

drivers/usb/gadget/function/f_ncm.c
commit 00a2430ff07d4e0e0e7e24e02fd8adede333b797
Author: Andrzej Pietrasiewicz <andrzej.p@samsung.com>
Date:   Tue Jul 15 13:09:46 2014 +0200

    usb: gadget: Gadget directory cleanup - group usb functions
    
    The drivers/usb/gadget directory contains many files.
    Files which are related can be distributed into separate directories.
    This patch moves the USB functions implementations into a separate directory.
    
    Signed-off-by: Andrzej Pietrasiewicz <andrzej.p@samsung.com>
    Signed-off-by: Felipe Balbi <balbi@ti.com>

Again, old.

Can you find an example of a network driver added in the last couple
of years outside of drivers/net?

> > > > Please make sure to CC linux-rdma. You clearly stated that the device
> > > > does RDMA-like transfers.
> > >
> > > We don't use the RDMA infrastructure in the kernel and we can't
> > > connect to it due to the lack of H/W support we have so I don't see
> > > why we need to CC linux-rdma.
> >
> > You have it backward. You don't get to pick and choose which parts of
> > the infrastructure you use, and therefore who reviews your drivers.
> > The device uses RDMA under the hood so Linux RDMA experts must very
> > much be okay with it getting merged. That's how we ensure Linux
> > interfaces are consistent and good quality.
> 
> I understand your point of view but if my H/W doesn't support the
> basic requirements of the RDMA infrastructure and interfaces, then
> really there is nothing I can do about it. I can't use them.

It is up to the RDMA people to say that. They might see how the RDMA
core can be made to work for your hardware.

     Andrew


* Re: [PATCH v3 00/14] Adding GAUDI NIC code to habanalabs driver
  2020-09-15 21:37         ` Andrew Lunn
@ 2020-09-15 21:43           ` Oded Gabbay
  2020-09-15 22:35             ` David Miller
  2020-09-15 22:36           ` David Miller
  1 sibling, 1 reply; 83+ messages in thread
From: Oded Gabbay @ 2020-09-15 21:43 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Jakub Kicinski, Linux-Kernel@Vger. Kernel. Org, netdev,
	SW_Drivers, Greg Kroah-Hartman, David S. Miller,
	Florian Fainelli, linux-rdma

On Wed, Sep 16, 2020 at 12:37 AM Andrew Lunn <andrew@lunn.ch> wrote:
>
> > I completely understand but you didn't answer my question. How come
> > there are drivers which create netdev objects, and specifically sgi-xp
> > in misc (but I also saw it in usb drivers) that live outside
> > drivers/net ? Why doesn't your request apply to them as well ?
> > When we wrote the code, we saw those examples and therefore assumed it was fine.
>
> commit 45d9ca492e4bd1522d1b5bd125c2908f1cee3d4a
> Author: Dean Nelson <dcn@sgi.com>
> Date:   Tue Apr 22 14:46:56 2008 -0500
>
>     [IA64] move XP and XPC to drivers/misc/sgi-xp
>
>     Move XPC and XPNET from arch/ia64/sn/kernel to drivers/misc/sgi-xp.
>
>     Signed-off-by: Dean Nelson <dcn@sgi.com>
>     Signed-off-by: Tony Luck <tony.luck@intel.com>
>
> It has been there a long time, and no networking person was involved
> in its move.
>
> drivers/usb/gadget/function/f_ncm.c
> commit 00a2430ff07d4e0e0e7e24e02fd8adede333b797
> Author: Andrzej Pietrasiewicz <andrzej.p@samsung.com>
> Date:   Tue Jul 15 13:09:46 2014 +0200
>
>     usb: gadget: Gadget directory cleanup - group usb functions
>
>     The drivers/usb/gadget directory contains many files.
>     Files which are related can be distributed into separate directories.
>     This patch moves the USB functions implementations into a separate directory.
>
>     Signed-off-by: Andrzej Pietrasiewicz <andrzej.p@samsung.com>
>     Signed-off-by: Felipe Balbi <balbi@ti.com>
>
> Again, old.
>
> Can you find an example of a network driver added in the last couple
> of years outside of drivers/net?
I honestly don't know and I admit we didn't look at the dates of when
these drivers were introduced.
Oded

>
> > > > > Please make sure to CC linux-rdma. You clearly stated that the device
> > > > > does RDMA-like transfers.
> > > >
> > > > We don't use the RDMA infrastructure in the kernel and we can't
> > > > connect to it due to the lack of H/W support we have so I don't see
> > > > why we need to CC linux-rdma.
> > >
> > > You have it backward. You don't get to pick and choose which parts of
> > > the infrastructure you use, and therefore who reviews your drivers.
> > > The device uses RDMA under the hood so Linux RDMA experts must very
> > > much be okay with it getting merged. That's how we ensure Linux
> > > interfaces are consistent and good quality.
> >
> > I understand your point of view but if my H/W doesn't support the
> > basic requirements of the RDMA infrastructure and interfaces, then
> > really there is nothing I can do about it. I can't use them.
>
> It is up to the RDMA people to say that. They might see how the RDMA
> core can be made to work for your hardware.
>
>      Andrew


* Re: [PATCH v3 00/14] Adding GAUDI NIC code to habanalabs driver
  2020-09-15 21:20       ` Oded Gabbay
  2020-09-15 21:37         ` Andrew Lunn
@ 2020-09-15 22:34         ` David Miller
  2020-09-16  4:26           ` Oded Gabbay
  1 sibling, 1 reply; 83+ messages in thread
From: David Miller @ 2020-09-15 22:34 UTC (permalink / raw)
  To: oded.gabbay
  Cc: kuba, linux-kernel, netdev, SW_Drivers, gregkh, andrew,
	f.fainelli, linux-rdma

From: Oded Gabbay <oded.gabbay@gmail.com>
Date: Wed, 16 Sep 2020 00:20:12 +0300

> I completely understand but you didn't answer my question. How come
> there are drivers which create netdev objects, and specifically sgi-xp
> in misc (but I also saw it in usb drivers) that live outside
> drivers/net ? Why doesn't your request apply to them as well ?

Don't use examples of drivers doing the wrong thing as an excuse for
you to repeat the mistake.

Ok?

That kind of argument doesn't work here.


* Re: [PATCH v3 00/14] Adding GAUDI NIC code to habanalabs driver
  2020-09-15 21:43           ` Oded Gabbay
@ 2020-09-15 22:35             ` David Miller
  0 siblings, 0 replies; 83+ messages in thread
From: David Miller @ 2020-09-15 22:35 UTC (permalink / raw)
  To: oded.gabbay
  Cc: andrew, kuba, linux-kernel, netdev, SW_Drivers, gregkh,
	f.fainelli, linux-rdma

From: Oded Gabbay <oded.gabbay@gmail.com>
Date: Wed, 16 Sep 2020 00:43:00 +0300

> I honestly don't know and I admit we didn't look at the dates of when
> these drivers were introduced.

Please do research when you make claims in the future, thank you.


* Re: [PATCH v3 00/14] Adding GAUDI NIC code to habanalabs driver
  2020-09-15 21:37         ` Andrew Lunn
  2020-09-15 21:43           ` Oded Gabbay
@ 2020-09-15 22:36           ` David Miller
  1 sibling, 0 replies; 83+ messages in thread
From: David Miller @ 2020-09-15 22:36 UTC (permalink / raw)
  To: andrew
  Cc: oded.gabbay, kuba, linux-kernel, netdev, SW_Drivers, gregkh,
	f.fainelli, linux-rdma

From: Andrew Lunn <andrew@lunn.ch>
Date: Tue, 15 Sep 2020 23:37:35 +0200

>> I understand your point of view but if my H/W doesn't support the
>> basic requirements of the RDMA infrastructure and interfaces, then
>> really there is nothing I can do about it. I can't use them.
> 
> It is up to the RDMA people to say that. They might see how the RDMA
> core can be made to work for your hardware.

+1


* Re: [PATCH v3 00/14] Adding GAUDI NIC code to habanalabs driver
  2020-09-15 22:34         ` David Miller
@ 2020-09-16  4:26           ` Oded Gabbay
  0 siblings, 0 replies; 83+ messages in thread
From: Oded Gabbay @ 2020-09-16  4:26 UTC (permalink / raw)
  To: David Miller
  Cc: Jakub Kicinski, Linux-Kernel@Vger. Kernel. Org, netdev,
	SW_Drivers, Greg Kroah-Hartman, Andrew Lunn, Florian Fainelli,
	linux-rdma

On Wed, Sep 16, 2020 at 1:34 AM David Miller <davem@davemloft.net> wrote:
>
> From: Oded Gabbay <oded.gabbay@gmail.com>
> Date: Wed, 16 Sep 2020 00:20:12 +0300
>
> > I completely understand but you didn't answer my question. How come
> > there are drivers which create netdev objects, and specifically sgi-xp
> > in misc (but I also saw it in usb drivers) that live outside
> > drivers/net ? Why doesn't your request apply to them as well ?
>
> Don't use examples of drivers doing the wrong thing as an excuse for
> you to repeat the mistake.
>
> Ok?
Well, it's not like there is a big red warning near those drivers
saying "this is wrong"...
How could I have known that in advance ?

>
> That kind of argument doesn't work here.
I know that, I just didn't know those drivers did "the wrong thing"

Oded


* Re: [PATCH v3 00/14] Adding GAUDI NIC code to habanalabs driver
  2020-09-15 20:49   ` Oded Gabbay
@ 2020-09-16  6:26     ` Greg Kroah-Hartman
  2020-09-16  6:36       ` Oded Gabbay
  0 siblings, 1 reply; 83+ messages in thread
From: Greg Kroah-Hartman @ 2020-09-16  6:26 UTC (permalink / raw)
  To: Oded Gabbay
  Cc: David Miller, Linux-Kernel@Vger. Kernel. Org, netdev, SW_Drivers,
	Jakub Kicinski, Andrew Lunn, Florian Fainelli

On Tue, Sep 15, 2020 at 11:49:12PM +0300, Oded Gabbay wrote:
> On Tue, Sep 15, 2020 at 11:42 PM David Miller <davem@davemloft.net> wrote:
> >
> > From: Oded Gabbay <oded.gabbay@gmail.com>
> > Date: Tue, 15 Sep 2020 20:10:08 +0300
> >
> > > This is the second version of the patch-set to upstream the GAUDI NIC code
> > > into the habanalabs driver.
> > >
> > > The only modification from v2 is in the ethtool patch (patch 12). Details
> > > are in that patch's commit message.
> > >
> > > Link to v2 cover letter:
> > > https://lkml.org/lkml/2020/9/12/201
> >
> > I agree with Jakub, this driver definitely can't go-in as it is currently
> > structured and designed.
> Why is that ?
> Can you please point to the things that bother you or are not working correctly?
> I can't really fix the driver if I don't know what's wrong.
> 
> In addition, please read my reply to Jakub with the explanation of why
> we designed this driver as is.
> 
> > And because of the RDMA'ness of it, the RDMA
> > folks have to be CC:'d and have a chance to review this.
> As I said to Jakub, the driver doesn't use the RDMA infrastructure in
> the kernel and we can't connect to it due to the lack of H/W support
> we have
> Therefore, I don't see why we need to CC linux-rdma.
> I understood why Greg asked me to CC you because we do connect to the
> netdev and standard eth infrastructure, but regarding the RDMA, it's
> not really the same.

Ok, to do this "right" it needs to be split up into separate drivers,
hopefully using the "virtual bus" code that some day Intel will resubmit
again that will solve this issue.

That will allow you to put the network driver portion in drivers/net/
and split the code up into the proper different pieces easier.

I recommend grabbing the virtual bus code from the archives and looking
at that for how this can be done.  Now that you are part of Intel, I'm
sure that the internal-Intel-Linux-kernel-review-process can kick in and
those developers can help you out.  If not, let me know, so I can go
kick them :)

As for the RDMA stuff, yeah, you should look at the current RDMA
interfaces and verify that those really do not work for you here, and
then document why that is in your patch submission.

thanks,

greg k-h


* Re: [PATCH v3 00/14] Adding GAUDI NIC code to habanalabs driver
  2020-09-16  6:26     ` Greg Kroah-Hartman
@ 2020-09-16  6:36       ` Oded Gabbay
  2020-09-16  7:42         ` Greg Kroah-Hartman
  0 siblings, 1 reply; 83+ messages in thread
From: Oded Gabbay @ 2020-09-16  6:36 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: David Miller, Linux-Kernel@Vger. Kernel. Org, netdev, SW_Drivers,
	Jakub Kicinski, Andrew Lunn, Florian Fainelli

On Wed, Sep 16, 2020 at 9:25 AM Greg Kroah-Hartman
<gregkh@linuxfoundation.org> wrote:
>
> On Tue, Sep 15, 2020 at 11:49:12PM +0300, Oded Gabbay wrote:
> > On Tue, Sep 15, 2020 at 11:42 PM David Miller <davem@davemloft.net> wrote:
> > >
> > > From: Oded Gabbay <oded.gabbay@gmail.com>
> > > Date: Tue, 15 Sep 2020 20:10:08 +0300
> > >
> > > > This is the second version of the patch-set to upstream the GAUDI NIC code
> > > > into the habanalabs driver.
> > > >
> > > > The only modification from v2 is in the ethtool patch (patch 12). Details
> > > > are in that patch's commit message.
> > > >
> > > > Link to v2 cover letter:
> > > > https://lkml.org/lkml/2020/9/12/201
> > >
> > > I agree with Jakub, this driver definitely can't go-in as it is currently
> > > structured and designed.
> > Why is that ?
> > Can you please point to the things that bother you or are not working correctly?
> > I can't really fix the driver if I don't know what's wrong.
> >
> > In addition, please read my reply to Jakub with the explanation of why
> > we designed this driver as is.
> >
> > > And because of the RDMA'ness of it, the RDMA
> > > folks have to be CC:'d and have a chance to review this.
> > As I said to Jakub, the driver doesn't use the RDMA infrastructure in
> > the kernel and we can't connect to it due to the lack of H/W support
> > we have
> > Therefore, I don't see why we need to CC linux-rdma.
> > I understood why Greg asked me to CC you because we do connect to the
> > netdev and standard eth infrastructure, but regarding the RDMA, it's
> > not really the same.
>
> Ok, to do this "right" it needs to be split up into separate drivers,
> hopefully using the "virtual bus" code that some day Intel will resubmit
> again that will solve this issue.
Hi Greg,
Can I suggest an alternative for the short/medium term ?

In an earlier email, Jakub said:
"Is it not possible to move the files and still build them into a single
module?"

I thought maybe that's a good way to progress here ?
First, split the content to Ethernet and RDMA.
Then move the Ethernet part to drivers/net but build it as part of
habanalabs.ko.
Regarding the RDMA code, upstream/review it in a different patch-set
(maybe they will want me to put the files elsewhere).

What do you think ?

>
> That will allow you to put the network driver portion in drivers/net/
> and split the code up into the proper different pieces easier.
>
> I recommend grabbing the virtual bus code from the archives and looking
> at that for how this can be done.  Now that you are part of Intel, I'm
> sure that the internal-Intel-Linux-kernel-review-process can kick in and
> those developers can help you out.  If not, let me know, so I can go
> kick them :)
>
> As for the RDMA stuff, yeah, you should look at the current RDMA
> interfaces and verify that those really do not work for you here, and
> then document why that is in your patch submission.
ok, will do that.

Thanks,
Oded
>
> thanks,
>
> greg k-h


* Re: [PATCH v3 00/14] Adding GAUDI NIC code to habanalabs driver
  2020-09-16  6:36       ` Oded Gabbay
@ 2020-09-16  7:42         ` Greg Kroah-Hartman
  2020-09-16  8:02           ` Oded Gabbay
  0 siblings, 1 reply; 83+ messages in thread
From: Greg Kroah-Hartman @ 2020-09-16  7:42 UTC (permalink / raw)
  To: Oded Gabbay
  Cc: David Miller, Linux-Kernel@Vger. Kernel. Org, netdev, SW_Drivers,
	Jakub Kicinski, Andrew Lunn, Florian Fainelli

On Wed, Sep 16, 2020 at 09:36:23AM +0300, Oded Gabbay wrote:
> On Wed, Sep 16, 2020 at 9:25 AM Greg Kroah-Hartman
> <gregkh@linuxfoundation.org> wrote:
> >
> > On Tue, Sep 15, 2020 at 11:49:12PM +0300, Oded Gabbay wrote:
> > > On Tue, Sep 15, 2020 at 11:42 PM David Miller <davem@davemloft.net> wrote:
> > > >
> > > > From: Oded Gabbay <oded.gabbay@gmail.com>
> > > > Date: Tue, 15 Sep 2020 20:10:08 +0300
> > > >
> > > > > This is the second version of the patch-set to upstream the GAUDI NIC code
> > > > > into the habanalabs driver.
> > > > >
> > > > > The only modification from v2 is in the ethtool patch (patch 12). Details
> > > > > are in that patch's commit message.
> > > > >
> > > > > Link to v2 cover letter:
> > > > > https://lkml.org/lkml/2020/9/12/201
> > > >
> > > > I agree with Jakub, this driver definitely can't go-in as it is currently
> > > > structured and designed.
> > > Why is that ?
> > > Can you please point to the things that bother you or are not working correctly?
> > > I can't really fix the driver if I don't know what's wrong.
> > >
> > > In addition, please read my reply to Jakub with the explanation of why
> > > we designed this driver as is.
> > >
> > > > And because of the RDMA'ness of it, the RDMA
> > > > folks have to be CC:'d and have a chance to review this.
> > > As I said to Jakub, the driver doesn't use the RDMA infrastructure in
> > > the kernel and we can't connect to it due to the lack of H/W support
> > > we have
> > > Therefore, I don't see why we need to CC linux-rdma.
> > > I understood why Greg asked me to CC you because we do connect to the
> > > netdev and standard eth infrastructure, but regarding the RDMA, it's
> > > not really the same.
> >
> > Ok, to do this "right" it needs to be split up into separate drivers,
> > hopefully using the "virtual bus" code that some day Intel will resubmit
> > again that will solve this issue.
> Hi Greg,
> Can I suggest an alternative for the short/medium term ?
> 
> In an earlier email, Jakub said:
> "Is it not possible to move the files and still build them into a single
> module?"
> 
> I thought maybe that's a good way to progress here ?

Cross-directory builds of a single module are crazy.  Yes, they work,
but really, that's a mess, and I would never suggest doing that.

> First, split the content to Ethernet and RDMA.
> Then move the Ethernet part to drivers/net but build it as part of
> habanalabs.ko.
> Regarding the RDMA code, upstream/review it in a different patch-set
> (maybe they will want me to put the files elsewhere).
> 
> What do you think ?

I think you are asking for more work there than just splitting out into
separate modules :)

thanks,

greg k-h


* Re: [PATCH v3 00/14] Adding GAUDI NIC code to habanalabs driver
  2020-09-16  7:42         ` Greg Kroah-Hartman
@ 2020-09-16  8:02           ` Oded Gabbay
  2020-09-16  8:22             ` Greg Kroah-Hartman
  0 siblings, 1 reply; 83+ messages in thread
From: Oded Gabbay @ 2020-09-16  8:02 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: David Miller, Linux-Kernel@Vger. Kernel. Org, netdev, SW_Drivers,
	Jakub Kicinski, Andrew Lunn, Florian Fainelli

On Wed, Sep 16, 2020 at 10:41 AM Greg Kroah-Hartman
<gregkh@linuxfoundation.org> wrote:
>
> On Wed, Sep 16, 2020 at 09:36:23AM +0300, Oded Gabbay wrote:
> > On Wed, Sep 16, 2020 at 9:25 AM Greg Kroah-Hartman
> > <gregkh@linuxfoundation.org> wrote:
> > >
> > > On Tue, Sep 15, 2020 at 11:49:12PM +0300, Oded Gabbay wrote:
> > > > On Tue, Sep 15, 2020 at 11:42 PM David Miller <davem@davemloft.net> wrote:
> > > > >
> > > > > From: Oded Gabbay <oded.gabbay@gmail.com>
> > > > > Date: Tue, 15 Sep 2020 20:10:08 +0300
> > > > >
> > > > > > This is the second version of the patch-set to upstream the GAUDI NIC code
> > > > > > into the habanalabs driver.
> > > > > >
> > > > > > The only modification from v2 is in the ethtool patch (patch 12). Details
> > > > > > are in that patch's commit message.
> > > > > >
> > > > > > Link to v2 cover letter:
> > > > > > https://lkml.org/lkml/2020/9/12/201
> > > > >
> > > > > I agree with Jakub, this driver definitely can't go-in as it is currently
> > > > > structured and designed.
> > > > Why is that ?
> > > > Can you please point to the things that bother you or are not working correctly?
> > > > I can't really fix the driver if I don't know what's wrong.
> > > >
> > > > In addition, please read my reply to Jakub with the explanation of why
> > > > we designed this driver as is.
> > > >
> > > > > And because of the RDMA'ness of it, the RDMA
> > > > > folks have to be CC:'d and have a chance to review this.
> > > > As I said to Jakub, the driver doesn't use the RDMA infrastructure in
> > > > the kernel and we can't connect to it due to the lack of H/W support
> > > > we have
> > > > Therefore, I don't see why we need to CC linux-rdma.
> > > > I understood why Greg asked me to CC you because we do connect to the
> > > > netdev and standard eth infrastructure, but regarding the RDMA, it's
> > > > not really the same.
> > >
> > > Ok, to do this "right" it needs to be split up into separate drivers,
> > > hopefully using the "virtual bus" code that some day Intel will resubmit
> > > again that will solve this issue.
> > Hi Greg,
> > Can I suggest an alternative for the short/medium term ?
> >
> > In an earlier email, Jakub said:
> > "Is it not possible to move the files and still build them into a single
> > module?"
> >
> > I thought maybe that's a good way to progress here ?
>
> Cross-directory builds of a single module are crazy.  Yes, they work,
> but really, that's a mess, and I would never suggest doing that.
>
> > First, split the content to Ethernet and RDMA.
> > Then move the Ethernet part to drivers/net but build it as part of
> > habanalabs.ko.
> > Regarding the RDMA code, upstream/review it in a different patch-set
> > (maybe they will want me to put the files elsewhere).
> >
> > What do you think ?
>
> I think you are asking for more work there than just splitting out into
> separate modules :)
>
> thanks,
>
> greg k-h
Hi Greg,

If cross-directory building is out of the question, what about
splitting into separate modules ? And use cross-module notifiers/calls
? I did that with amdkfd and amdgpu/radeon a couple of years back. It
worked (that's the best thing I can say about it).
The main problem with this "virtual bus" thing is that I'm not
familiar with it at all and from my experience I imagine it would take
a considerable time and effort to upstream this infrastructure work.
This could delay the NIC code for a couple of years, by which time
it won't be relevant at all.

So I'm trying to find some middle ground here on how to proceed.

Thanks,
Oded


* Re: [PATCH v3 00/14] Adding GAUDI NIC code to habanalabs driver
  2020-09-16  8:02           ` Oded Gabbay
@ 2020-09-16  8:22             ` Greg Kroah-Hartman
  2020-09-16  8:47               ` Oded Gabbay
  2020-09-16 23:04               ` Williams, Dan J
  0 siblings, 2 replies; 83+ messages in thread
From: Greg Kroah-Hartman @ 2020-09-16  8:22 UTC (permalink / raw)
  To: Oded Gabbay
  Cc: David Miller, Linux-Kernel@Vger. Kernel. Org, netdev, SW_Drivers,
	Jakub Kicinski, Andrew Lunn, Florian Fainelli

On Wed, Sep 16, 2020 at 11:02:39AM +0300, Oded Gabbay wrote:
> On Wed, Sep 16, 2020 at 10:41 AM Greg Kroah-Hartman
> <gregkh@linuxfoundation.org> wrote:
> >
> > On Wed, Sep 16, 2020 at 09:36:23AM +0300, Oded Gabbay wrote:
> > > On Wed, Sep 16, 2020 at 9:25 AM Greg Kroah-Hartman
> > > <gregkh@linuxfoundation.org> wrote:
> > > >
> > > > On Tue, Sep 15, 2020 at 11:49:12PM +0300, Oded Gabbay wrote:
> > > > > On Tue, Sep 15, 2020 at 11:42 PM David Miller <davem@davemloft.net> wrote:
> > > > > >
> > > > > > From: Oded Gabbay <oded.gabbay@gmail.com>
> > > > > > Date: Tue, 15 Sep 2020 20:10:08 +0300
> > > > > >
> > > > > > > This is the second version of the patch-set to upstream the GAUDI NIC code
> > > > > > > into the habanalabs driver.
> > > > > > >
> > > > > > > The only modification from v2 is in the ethtool patch (patch 12). Details
> > > > > > > are in that patch's commit message.
> > > > > > >
> > > > > > > Link to v2 cover letter:
> > > > > > > https://lkml.org/lkml/2020/9/12/201
> > > > > >
> > > > > > I agree with Jakub, this driver definitely can't go-in as it is currently
> > > > > > structured and designed.
> > > > > Why is that ?
> > > > > Can you please point to the things that bother you or are not working correctly?
> > > > > I can't really fix the driver if I don't know what's wrong.
> > > > >
> > > > > In addition, please read my reply to Jakub with the explanation of why
> > > > > we designed this driver as is.
> > > > >
> > > > > > And because of the RDMA'ness of it, the RDMA
> > > > > > folks have to be CC:'d and have a chance to review this.
> > > > > As I said to Jakub, the driver doesn't use the RDMA infrastructure in
> > > > > the kernel and we can't connect to it due to the lack of H/W support
> > > > > we have
> > > > > Therefore, I don't see why we need to CC linux-rdma.
> > > > > I understood why Greg asked me to CC you because we do connect to the
> > > > > netdev and standard eth infrastructure, but regarding the RDMA, it's
> > > > > not really the same.
> > > >
> > > > Ok, to do this "right" it needs to be split up into separate drivers,
> > > > hopefully using the "virtual bus" code that some day Intel will resubmit
> > > > again that will solve this issue.
> > > Hi Greg,
> > > Can I suggest an alternative for the short/medium term ?
> > >
> > > In an earlier email, Jakub said:
> > > "Is it not possible to move the files and still build them into a single
> > > module?"
> > >
> > > I thought maybe that's a good way to progress here ?
> >
> > Cross-directory builds of a single module are crazy.  Yes, they work,
> > but really, that's a mess, and I would never suggest doing that.
> >
> > > First, split the content to Ethernet and RDMA.
> > > Then move the Ethernet part to drivers/net but build it as part of
> > > habanalabs.ko.
> > > Regarding the RDMA code, upstream/review it in a different patch-set
> > > (maybe they will want me to put the files elsewhere).
> > >
> > > What do you think ?
> >
> > I think you are asking for more work there than just splitting out into
> > separate modules :)
> >
> > thanks,
> >
> > greg k-h
> Hi Greg,
> 
> If cross-directory building is out of the question, what about
> splitting into separate modules ? And use cross-module notifiers/calls
> ? I did that with amdkfd and amdgpu/radeon a couple of years back. It
> worked (that's the best thing I can say about it).

That's fine with me.

> The main problem with this "virtual bus" thing is that I'm not
> familiar with it at all and from my experience I imagine it would take
> a considerable time and effort to upstream this infrastructure work.

It shouldn't be taking that long, but for some unknown reason, the
original author of that code is sitting on it and not resending it.  Go
poke them through internal Intel channels to find out what the problem
is, as I have no clue why a 200-300 line bus module is taking so long to
get "right" :(

I'm _ALMOST_ at the point where I would just do that work myself, but
due to my current status with Intel, I'll let them do it as I have
enough other things on my plate...

> This could delay the NIC code for a couple of years, by which time
> it won't be relevant at all.

Why wouldn't this code be relevant in a year?  It's going to be 2+ years
before any of this shows up in an "enterprise distro" based on their
release cycles anyway :)

thanks,

greg k-h


* Re: [PATCH v3 00/14] Adding GAUDI NIC code to habanalabs driver
  2020-09-16  8:22             ` Greg Kroah-Hartman
@ 2020-09-16  8:47               ` Oded Gabbay
  2020-09-16 12:00                 ` Greg Kroah-Hartman
  2020-09-16 23:04               ` Williams, Dan J
  1 sibling, 1 reply; 83+ messages in thread
From: Oded Gabbay @ 2020-09-16  8:47 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: David Miller, Linux-Kernel@Vger. Kernel. Org, netdev, SW_Drivers,
	Jakub Kicinski, Andrew Lunn, Florian Fainelli

On Wed, Sep 16, 2020 at 11:21 AM Greg Kroah-Hartman
<gregkh@linuxfoundation.org> wrote:
>
> On Wed, Sep 16, 2020 at 11:02:39AM +0300, Oded Gabbay wrote:
> > On Wed, Sep 16, 2020 at 10:41 AM Greg Kroah-Hartman
> > <gregkh@linuxfoundation.org> wrote:
> > >
> > > On Wed, Sep 16, 2020 at 09:36:23AM +0300, Oded Gabbay wrote:
> > > > On Wed, Sep 16, 2020 at 9:25 AM Greg Kroah-Hartman
> > > > <gregkh@linuxfoundation.org> wrote:
> > > > >
> > > > > On Tue, Sep 15, 2020 at 11:49:12PM +0300, Oded Gabbay wrote:
> > > > > > On Tue, Sep 15, 2020 at 11:42 PM David Miller <davem@davemloft.net> wrote:
> > > > > > >
> > > > > > > From: Oded Gabbay <oded.gabbay@gmail.com>
> > > > > > > Date: Tue, 15 Sep 2020 20:10:08 +0300
> > > > > > >
> > > > > > > > This is the second version of the patch-set to upstream the GAUDI NIC code
> > > > > > > > into the habanalabs driver.
> > > > > > > >
> > > > > > > > The only modification from v2 is in the ethtool patch (patch 12). Details
> > > > > > > > are in that patch's commit message.
> > > > > > > >
> > > > > > > > Link to v2 cover letter:
> > > > > > > > https://lkml.org/lkml/2020/9/12/201
> > > > > > >
> > > > > > > I agree with Jakub, this driver definitely can't go-in as it is currently
> > > > > > > structured and designed.
> > > > > > Why is that ?
> > > > > > Can you please point to the things that bother you or are not working correctly?
> > > > > > I can't really fix the driver if I don't know what's wrong.
> > > > > >
> > > > > > In addition, please read my reply to Jakub with the explanation of why
> > > > > > we designed this driver as is.
> > > > > >
> > > > > > > And because of the RDMA'ness of it, the RDMA
> > > > > > > folks have to be CC:'d and have a chance to review this.
> > > > > > As I said to Jakub, the driver doesn't use the RDMA infrastructure in
> > > > > > the kernel and we can't connect to it due to the lack of H/W support
> > > > > > we have
> > > > > > Therefore, I don't see why we need to CC linux-rdma.
> > > > > > I understood why Greg asked me to CC you because we do connect to the
> > > > > > netdev and standard eth infrastructure, but regarding the RDMA, it's
> > > > > > not really the same.
> > > > >
> > > > > Ok, to do this "right" it needs to be split up into separate drivers,
> > > > > hopefully using the "virtual bus" code that some day Intel will resubmit
> > > > > again that will solve this issue.
> > > > Hi Greg,
> > > > Can I suggest an alternative for the short/medium term ?
> > > >
> > > > In an earlier email, Jakub said:
> > > > "Is it not possible to move the files and still build them into a single
> > > > module?"
> > > >
> > > > I thought maybe that's a good way to progress here ?
> > >
> > > Cross-directory builds of a single module are crazy.  Yes, they work,
> > > but really, that's a mess, and would never suggest doing that.
> > >
> > > > First, split the content to Ethernet and RDMA.
> > > > Then move the Ethernet part to drivers/net but build it as part of
> > > > habanalabs.ko.
> > > > Regarding the RDMA code, upstream/review it in a different patch-set
> > > > (maybe they will want me to put the files elsewhere).
> > > >
> > > > What do you think ?
> > >
> > > I think you are asking for more work there than just splitting out into
> > > separate modules :)
> > >
> > > thanks,
> > >
> > > greg k-h
> > Hi Greg,
> >
> > If cross-directory building is out of the question, what about
> > splitting into separate modules ? And use cross-module notifiers/calls
> > ? I did that with amdkfd and amdgpu/radeon a couple of years back. It
> > worked (that's the best thing I can say about it).
>
> That's fine with me.
>
> > The main problem with this "virtual bus" thing is that I'm not
> > familiar with it at all and from my experience I imagine it would take
> > a considerable time and effort to upstream this infrastructure work.
>
> It shouldn't be taking that long, but for some unknown reason, the
> original author of that code is sitting on it and not resending it.  Go
> poke them through internal Intel channels to find out what the problem
> is, as I have no clue why a 200-300 line bus module is taking so long to
> get "right" :(
>
> I'm _ALMOST_ at the point where I would just do that work myself, but
> due to my current status with Intel, I'll let them do it as I have
> enough other things on my plate...
>
> > This could delay the NIC code for a couple of years, which by then
> > this won't be relevant at all.
>
> Why wouldn't this code be relevant in a year?  It's going to be 2+ years
> before any of this shows up in an "enterprise distro" based on their
> release cycles anyway :)
>
> thanks,
>
> greg k-h

Hi Greg,
ok, I'll take a look. Do you happen to have the name of the patch-set / author ?

Regarding the RDMA stuff, I'll do some work internally to separate it
from the Ethernet code and then will send that code only to RDMA
people with more detailed explanations.

Thanks,
Oded

* Re: [PATCH v3 00/14] Adding GAUDI NIC code to habanalabs driver
  2020-09-16  8:47               ` Oded Gabbay
@ 2020-09-16 12:00                 ` Greg Kroah-Hartman
  2020-09-20 16:45                   ` Daniel Vetter
  0 siblings, 1 reply; 83+ messages in thread
From: Greg Kroah-Hartman @ 2020-09-16 12:00 UTC (permalink / raw)
  To: Oded Gabbay
  Cc: David Miller, Linux-Kernel@Vger. Kernel. Org, netdev, SW_Drivers,
	Jakub Kicinski, Andrew Lunn, Florian Fainelli

On Wed, Sep 16, 2020 at 11:47:58AM +0300, Oded Gabbay wrote:
> On Wed, Sep 16, 2020 at 11:21 AM Greg Kroah-Hartman
> <gregkh@linuxfoundation.org> wrote:
> >
> > On Wed, Sep 16, 2020 at 11:02:39AM +0300, Oded Gabbay wrote:
> > > On Wed, Sep 16, 2020 at 10:41 AM Greg Kroah-Hartman
> > > <gregkh@linuxfoundation.org> wrote:
> > > >
> > > > On Wed, Sep 16, 2020 at 09:36:23AM +0300, Oded Gabbay wrote:
> > > > > On Wed, Sep 16, 2020 at 9:25 AM Greg Kroah-Hartman
> > > > > <gregkh@linuxfoundation.org> wrote:
> > > > > >
> > > > > > On Tue, Sep 15, 2020 at 11:49:12PM +0300, Oded Gabbay wrote:
> > > > > > > On Tue, Sep 15, 2020 at 11:42 PM David Miller <davem@davemloft.net> wrote:
> > > > > > > >
> > > > > > > > From: Oded Gabbay <oded.gabbay@gmail.com>
> > > > > > > > Date: Tue, 15 Sep 2020 20:10:08 +0300
> > > > > > > >
> > > > > > > > > This is the second version of the patch-set to upstream the GAUDI NIC code
> > > > > > > > > into the habanalabs driver.
> > > > > > > > >
> > > > > > > > > The only modification from v2 is in the ethtool patch (patch 12). Details
> > > > > > > > > are in that patch's commit message.
> > > > > > > > >
> > > > > > > > > Link to v2 cover letter:
> > > > > > > > > https://lkml.org/lkml/2020/9/12/201
> > > > > > > >
> > > > > > > > I agree with Jakub, this driver definitely can't go-in as it is currently
> > > > > > > > structured and designed.
> > > > > > > Why is that ?
> > > > > > > Can you please point to the things that bother you or not working correctly?
> > > > > > > I can't really fix the driver if I don't know what's wrong.
> > > > > > >
> > > > > > > In addition, please read my reply to Jakub with the explanation of why
> > > > > > > we designed this driver as is.
> > > > > > >
> > > > > > > And because of the RDMA'ness of it, the RDMA
> > > > > > > > folks have to be CC:'d and have a chance to review this.
> > > > > > > As I said to Jakub, the driver doesn't use the RDMA infrastructure in
> > > > > > > the kernel and we can't connect to it due to the lack of H/W support
> > > > > > > we have
> > > > > > > Therefore, I don't see why we need to CC linux-rdma.
> > > > > > > I understood why Greg asked me to CC you because we do connect to the
> > > > > > > netdev and standard eth infrastructure, but regarding the RDMA, it's
> > > > > > > not really the same.
> > > > > >
> > > > > > Ok, to do this "right" it needs to be split up into separate drivers,
> > > > > > hopefully using the "virtual bus" code that some day Intel will resubmit
> > > > > > again that will solve this issue.
> > > > > Hi Greg,
> > > > > Can I suggest an alternative for the short/medium term ?
> > > > >
> > > > > In an earlier email, Jakub said:
> > > > > "Is it not possible to move the files and still build them into a single
> > > > > module?"
> > > > >
> > > > > I thought maybe that's a good way to progress here ?
> > > >
> > > > Cross-directory builds of a single module are crazy.  Yes, they work,
> > > > but really, that's a mess, and would never suggest doing that.
> > > >
> > > > > First, split the content to Ethernet and RDMA.
> > > > > Then move the Ethernet part to drivers/net but build it as part of
> > > > > habanalabs.ko.
> > > > > Regarding the RDMA code, upstream/review it in a different patch-set
> > > > > (maybe they will want me to put the files elsewhere).
> > > > >
> > > > > What do you think ?
> > > >
> > > > I think you are asking for more work there than just splitting out into
> > > > separate modules :)
> > > >
> > > > thanks,
> > > >
> > > > greg k-h
> > > Hi Greg,
> > >
> > > If cross-directory building is out of the question, what about
> > > splitting into separate modules ? And use cross-module notifiers/calls
> > > ? I did that with amdkfd and amdgpu/radeon a couple of years back. It
> > > worked (that's the best thing I can say about it).
> >
> > That's fine with me.
> >
> > > The main problem with this "virtual bus" thing is that I'm not
> > > familiar with it at all and from my experience I imagine it would take
> > > a considerable time and effort to upstream this infrastructure work.
> >
> > It shouldn't be taking that long, but for some unknown reason, the
> > original author of that code is sitting on it and not resending it.  Go
> > poke them through internal Intel channels to find out what the problem
> > is, as I have no clue why a 200-300 line bus module is taking so long to
> > get "right" :(
> >
> > I'm _ALMOST_ at the point where I would just do that work myself, but
> > due to my current status with Intel, I'll let them do it as I have
> > enough other things on my plate...
> >
> > > This could delay the NIC code for a couple of years, which by then
> > > this won't be relevant at all.
> >
> > Why wouldn't this code be relevant in a year?  It's going to be 2+ years
> > before any of this shows up in an "enterprise distro" based on their
> > release cycles anyway :)
> >
> > thanks,
> >
> > greg k-h
> 
> Hi Greg,
> ok, I'll take a look. Do you happen to have the name of the patch-set / author ?

Here's at least one copy:
	https://lore.kernel.org/linux-rdma/20200520070227.3392100-2-jeffrey.t.kirsher@intel.com/

there might have been a newer one, can't remember, sorry.

thanks,

greg k-h

* Re: [PATCH v3 00/14] Adding GAUDI NIC code to habanalabs driver
  2020-09-16  8:22             ` Greg Kroah-Hartman
  2020-09-16  8:47               ` Oded Gabbay
@ 2020-09-16 23:04               ` Williams, Dan J
  1 sibling, 0 replies; 83+ messages in thread
From: Williams, Dan J @ 2020-09-16 23:04 UTC (permalink / raw)
  To: oded.gabbay, gregkh
  Cc: f.fainelli, andrew, kuba, davem, linux-kernel, SW_Drivers, netdev

On Wed, 2020-09-16 at 10:22 +0200, Greg Kroah-Hartman wrote:
> On Wed, Sep 16, 2020 at 11:02:39AM +0300, Oded Gabbay wrote:
> > On Wed, Sep 16, 2020 at 10:41 AM Greg Kroah-Hartman
> > <gregkh@linuxfoundation.org> wrote:
> > > On Wed, Sep 16, 2020 at 09:36:23AM +0300, Oded Gabbay wrote:
> > > > On Wed, Sep 16, 2020 at 9:25 AM Greg Kroah-Hartman
> > > > <gregkh@linuxfoundation.org> wrote:
> > > > > On Tue, Sep 15, 2020 at 11:49:12PM +0300, Oded Gabbay wrote:
> > > > > > On Tue, Sep 15, 2020 at 11:42 PM David Miller <
> > > > > > davem@davemloft.net> wrote:
> > > > > > > From: Oded Gabbay <oded.gabbay@gmail.com>
> > > > > > > Date: Tue, 15 Sep 2020 20:10:08 +0300
> > > > > > > 
> > > > > > > > This is the second version of the patch-set to upstream
> > > > > > > > the GAUDI NIC code
> > > > > > > > into the habanalabs driver.
> > > > > > > > 
> > > > > > > > The only modification from v2 is in the ethtool patch
> > > > > > > > (patch 12). Details
> > > > > > > > are in that patch's commit message.
> > > > > > > > 
> > > > > > > > Link to v2 cover letter:
> > > > > > > > https://lkml.org/lkml/2020/9/12/201
> > > > > > > 
> > > > > > > I agree with Jakub, this driver definitely can't go-in as
> > > > > > > it is currently
> > > > > > > structured and designed.
> > > > > > Why is that ?
> > > > > > Can you please point to the things that bother you or not
> > > > > > working correctly?
> > > > > > I can't really fix the driver if I don't know what's wrong.
> > > > > > 
> > > > > > In addition, please read my reply to Jakub with the
> > > > > > explanation of why
> > > > > > we designed this driver as is.
> > > > > > 
> > > > > > And because of the RDMA'ness of it, the RDMA
> > > > > > > folks have to be CC:'d and have a chance to review this.
> > > > > > As I said to Jakub, the driver doesn't use the RDMA
> > > > > > infrastructure in
> > > > > > the kernel and we can't connect to it due to the lack of
> > > > > > H/W support
> > > > > > we have
> > > > > > Therefore, I don't see why we need to CC linux-rdma.
> > > > > > I understood why Greg asked me to CC you because we do
> > > > > > connect to the
> > > > > > netdev and standard eth infrastructure, but regarding the
> > > > > > RDMA, it's
> > > > > > not really the same.
> > > > > 
> > > > > Ok, to do this "right" it needs to be split up into separate
> > > > > drivers,
> > > > > hopefully using the "virtual bus" code that some day Intel
> > > > > will resubmit
> > > > > again that will solve this issue.
> > > > Hi Greg,
> > > > Can I suggest an alternative for the short/medium term ?
> > > > 
> > > > In an earlier email, Jakub said:
> > > > "Is it not possible to move the files and still build them into
> > > > a single
> > > > module?"
> > > > 
> > > > I thought maybe that's a good way to progress here ?
> > > 
> > > Cross-directory builds of a single module are crazy.  Yes, they
> > > work,
> > > but really, that's a mess, and would never suggest doing that.
> > > 
> > > > First, split the content to Ethernet and RDMA.
> > > > Then move the Ethernet part to drivers/net but build it as part
> > > > of
> > > > habanalabs.ko.
> > > > Regarding the RDMA code, upstream/review it in a different
> > > > patch-set
> > > > (maybe they will want me to put the files elsewhere).
> > > > 
> > > > What do you think ?
> > > 
> > > I think you are asking for more work there than just splitting
> > > out into
> > > separate modules :)
> > > 
> > > thanks,
> > > 
> > > greg k-h
> > Hi Greg,
> > 
> > If cross-directory building is out of the question, what about
> > splitting into separate modules ? And use cross-module
> > notifiers/calls
> > ? I did that with amdkfd and amdgpu/radeon a couple of years back.
> > It
> > worked (that's the best thing I can say about it).
> 
> That's fine with me.
> 
> > The main problem with this "virtual bus" thing is that I'm not
> > familiar with it at all and from my experience I imagine it would
> > take
> > a considerable time and effort to upstream this infrastructure
> > work.
> 
> It shouldn't be taking that long, but for some unknown reason, the
> original author of that code is sitting on it and not resending
> it.  Go
> poke them through internal Intel channels to find out what the
> problem
> is, as I have no clue why a 200-300 line bus module is taking so long
> to
> get "right" :(

It turns out that they were caught between being deeply respectful of
your request to get another senior kernel developer to look at it
before sending it out, and deeply respectful of not disclosing that I
was out on bonding leave.

It just happened that I left before they could
get the latest version over to review.

> I'm _ALMOST_ at the point where I would just do that work myself, but
> due to my current status with Intel, I'll let them do it as I have
> enough other things on my plate...

I'm back now, let's get this thing moving. /me goes to review.


* Re: [PATCH v3 00/14] Adding GAUDI NIC code to habanalabs driver
  2020-09-15 20:46   ` Oded Gabbay
  2020-09-15 21:04     ` Jakub Kicinski
@ 2020-09-17 17:18     ` Jason Gunthorpe
  2020-09-18 11:36       ` Gal Pressman
  1 sibling, 1 reply; 83+ messages in thread
From: Jason Gunthorpe @ 2020-09-17 17:18 UTC (permalink / raw)
  To: Oded Gabbay
  Cc: Jakub Kicinski, Linux-Kernel@Vger. Kernel. Org, netdev,
	SW_Drivers, Greg Kroah-Hartman, David S. Miller, Andrew Lunn,
	Florian Fainelli, linux-rdma

On Tue, Sep 15, 2020 at 11:46:58PM +0300, Oded Gabbay wrote:
> infrastructure for communication between multiple accelerators. Same
> as Nvidia uses NVlink, we use RDMA that we have inside our ASIC.
> The RDMA implementation we did does NOT support some basic RDMA
> IBverbs (such as MR and PD) and therefore, we can't use the rdma-core
> library or to connect to the rdma infrastructure in the kernel. 

You can't create a parallel RDMA subsystem in netdev, or in misc, and
you can't add random device offloads as IOCTLs to netdevs.

RDMA is the proper home for all the networking offloads that don't fit
into netdev.

EFA was able to fit into rdma-core/etc and it isn't even RoCE at
all. I'm sure this can too.

> wanted to do it but when we analyzed it, we saw we wouldn't be able to
> support basic stuff and therefore we had to revert to our IOCTLs.

Try again. Ask for help.

Your patches add CQs, WQ, and other RDMA objects. This is very clearly
not an appropriate functionality for netdev.

> To sum it up, because our NIC is used for intra-communication, we
> don't expose nor intend users to use it as a NIC per-se. However, to
> be able to get statistics and manage them in a standard way, and
> support control plane over Ethernet, we do register each port to the
> net subsystem (i.e. create netdev per port).

Sure, the basic ethernet side is conceptually fine.

> > Please make sure to CC linux-rdma. You clearly stated that the device
> > does RDMA-like transfers.
> 
> We don't use the RDMA infrastructure in the kernel and we can't
> connect to it due to the lack of H/W support we have so I don't see
> why we need to CC linux-rdma.

Because you can't put RDMA-like concepts under net.

Jakub, NAK from me on this series.

Jason

* Re: [PATCH v3 00/14] Adding GAUDI NIC code to habanalabs driver
  2020-09-17 17:18     ` Jason Gunthorpe
@ 2020-09-18 11:36       ` Gal Pressman
  2020-09-18 11:52         ` Leon Romanovsky
                           ` (2 more replies)
  0 siblings, 3 replies; 83+ messages in thread
From: Gal Pressman @ 2020-09-18 11:36 UTC (permalink / raw)
  To: Jason Gunthorpe, Oded Gabbay
  Cc: Jakub Kicinski, Linux-Kernel@Vger. Kernel. Org, netdev,
	SW_Drivers, Greg Kroah-Hartman, David S. Miller, Andrew Lunn,
	Florian Fainelli, linux-rdma

On 17/09/2020 20:18, Jason Gunthorpe wrote:
> On Tue, Sep 15, 2020 at 11:46:58PM +0300, Oded Gabbay wrote:
>> infrastructure for communication between multiple accelerators. Same
>> as Nvidia uses NVlink, we use RDMA that we have inside our ASIC.
>> The RDMA implementation we did does NOT support some basic RDMA
>> IBverbs (such as MR and PD) and therefore, we can't use the rdma-core
>> library or to connect to the rdma infrastructure in the kernel. 
> 
> You can't create a parallel RDMA subsystem in netdev, or in misc, and
> you can't add random device offloads as IOCTLs to netdevs.
> 
> RDMA is the proper home for all the networking offloads that don't fit
> into netdev.
> 
> EFA was able to fit into rdma-core/etc and it isn't even RoCE at
> all. I'm sure this can too.

Well, EFA wasn't welcomed to the RDMA subsystem with open arms ;), initially it
was suggested to go through the vfio subsystem instead.

I think this comes back to the discussion we had when EFA was upstreamed, which
is what's the bar to get accepted to the RDMA subsystem.
IIRC, what we eventually agreed on is having a userspace rdma-core provider and
ibv_{ud,rc}_pingpong working (or just supporting one of the IB spec's QP types?).

Does GAUDI fit these requirements? If not, should it be in a different subsystem
or should we open the "what qualifies as an RDMA device" question again?

* Re: [PATCH v3 00/14] Adding GAUDI NIC code to habanalabs driver
  2020-09-18 11:36       ` Gal Pressman
@ 2020-09-18 11:52         ` Leon Romanovsky
  2020-09-18 11:56           ` Oded Gabbay
  2020-09-18 11:56         ` Jason Gunthorpe
  2020-09-18 12:10         ` Oded Gabbay
  2 siblings, 1 reply; 83+ messages in thread
From: Leon Romanovsky @ 2020-09-18 11:52 UTC (permalink / raw)
  To: Gal Pressman
  Cc: Jason Gunthorpe, Oded Gabbay, Jakub Kicinski,
	Linux-Kernel@Vger. Kernel. Org, netdev, SW_Drivers,
	Greg Kroah-Hartman, David S. Miller, Andrew Lunn,
	Florian Fainelli, linux-rdma

On Fri, Sep 18, 2020 at 02:36:10PM +0300, Gal Pressman wrote:
> On 17/09/2020 20:18, Jason Gunthorpe wrote:
> > On Tue, Sep 15, 2020 at 11:46:58PM +0300, Oded Gabbay wrote:
> >> infrastructure for communication between multiple accelerators. Same
> >> as Nvidia uses NVlink, we use RDMA that we have inside our ASIC.
> >> The RDMA implementation we did does NOT support some basic RDMA
> >> IBverbs (such as MR and PD) and therefore, we can't use the rdma-core
> >> library or to connect to the rdma infrastructure in the kernel.
> >
> > You can't create a parallel RDMA subsystem in netdev, or in misc, and
> > you can't add random device offloads as IOCTLs to netdevs.
> >
> > RDMA is the proper home for all the networking offloads that don't fit
> > into netdev.
> >
> > EFA was able to fit into rdma-core/etc and it isn't even RoCE at
> > all. I'm sure this can too.
>
> Well, EFA wasn't welcomed to the RDMA subsystem with open arms ;), initially it
> was suggested to go through the vfio subsystem instead.
>
> I think this comes back to the discussion we had when EFA was upstreamed, which
> is what's the bar to get accepted to the RDMA subsystem.
> IIRC, what we eventually agreed on is having a userspace rdma-core provider and
> ibv_{ud,rc}_pingpong working (or just supporting one of the IB spec's QP types?).
>
> Does GAUDI fit these requirements? If not, should it be in a different subsystem
> or should we open the "what qualifies as an RDMA device" question again?

I want to remind you that the rdma-core requirement exists to make sure
that anything RDMA exposes to userspace comes with strict, proper UAPI
header hygiene.

I doubt that Habana's ioctls are backed by anything like this.

Thanks

* Re: [PATCH v3 00/14] Adding GAUDI NIC code to habanalabs driver
  2020-09-18 11:36       ` Gal Pressman
  2020-09-18 11:52         ` Leon Romanovsky
@ 2020-09-18 11:56         ` Jason Gunthorpe
  2020-09-18 11:59           ` Oded Gabbay
  2020-09-18 12:10         ` Oded Gabbay
  2 siblings, 1 reply; 83+ messages in thread
From: Jason Gunthorpe @ 2020-09-18 11:56 UTC (permalink / raw)
  To: Gal Pressman
  Cc: Oded Gabbay, Jakub Kicinski, Linux-Kernel@Vger. Kernel. Org,
	netdev, SW_Drivers, Greg Kroah-Hartman, David S. Miller,
	Andrew Lunn, Florian Fainelli, linux-rdma

On Fri, Sep 18, 2020 at 02:36:10PM +0300, Gal Pressman wrote:
> On 17/09/2020 20:18, Jason Gunthorpe wrote:
> > On Tue, Sep 15, 2020 at 11:46:58PM +0300, Oded Gabbay wrote:
> >> infrastructure for communication between multiple accelerators. Same
> >> as Nvidia uses NVlink, we use RDMA that we have inside our ASIC.
> >> The RDMA implementation we did does NOT support some basic RDMA
> >> IBverbs (such as MR and PD) and therefore, we can't use the rdma-core
> >> library or to connect to the rdma infrastructure in the kernel. 
> > 
> > You can't create a parallel RDMA subsystem in netdev, or in misc, and
> > you can't add random device offloads as IOCTLs to netdevs.
> > 
> > RDMA is the proper home for all the networking offloads that don't fit
> > into netdev.
> > 
> > EFA was able to fit into rdma-core/etc and it isn't even RoCE at
> > all. I'm sure this can too.
> 
> Well, EFA wasn't welcomed to the RDMA subsystem with open arms ;), initially it
> was suggested to go through the vfio subsystem instead.
> 
> I think this comes back to the discussion we had when EFA was upstreamed, which
> is what's the bar to get accepted to the RDMA subsystem.
> IIRC, what we eventually agreed on is having a userspace rdma-core provider and
> ibv_{ud,rc}_pingpong working (or just supporting one of the IB spec's QP types?).

That is more or less where we ended up, yes.

I'm most worried about this lack of PD and MR.

Kernel must provide security for apps doing user DMA, PD and MR do
this. If the device doesn't have PD/MR then it is hard to see how a WQ
could ever be exposed directly to userspace, regardless of subsystem.
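
To make the PD/MR point concrete, here is a minimal userspace C sketch
(not kernel code, and not taken from any driver) of the kind of check
that has to pass before a work request posted by an untrusted app is
honoured: the MR key must exist, the MR must belong to the same PD as
the queue, and the requested range must sit inside the MR. All struct
and function names below are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model: a protection domain is just an identifier. */
struct pd {
    uint32_t id;
};

/* A memory region: a registered address range tied to exactly one PD. */
struct mr {
    uint32_t key;          /* handle userspace names in a work request */
    const struct pd *pd;   /* owning protection domain */
    uint64_t start;        /* first valid address */
    uint64_t len;          /* length in bytes */
};

/* A work request as posted by an untrusted application. */
struct wr {
    uint32_t mr_key;
    uint64_t addr;
    uint64_t len;
};

/*
 * Return true only if the request names a known MR, that MR belongs to
 * the PD of the queue the request was posted on, and [addr, addr + len)
 * lies entirely inside the MR. The bounds test avoids computing
 * addr + len so it cannot wrap around.
 */
static bool wr_is_allowed(const struct pd *qp_pd, const struct mr *mrs,
                          size_t n_mrs, const struct wr *wr)
{
    for (size_t i = 0; i < n_mrs; i++) {
        const struct mr *mr = &mrs[i];
        uint64_t off;

        if (mr->key != wr->mr_key)
            continue;
        if (mr->pd->id != qp_pd->id)
            return false;   /* MR exists but belongs to another PD */
        if (wr->addr < mr->start)
            return false;
        off = wr->addr - mr->start;
        return off <= mr->len && wr->len <= mr->len - off;
    }
    return false;           /* unknown key */
}
```

Without something playing the role of this check, a WQ mapped straight
into userspace would let the app name arbitrary DMA addresses.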

Jason

* Re: [PATCH v3 00/14] Adding GAUDI NIC code to habanalabs driver
  2020-09-18 11:52         ` Leon Romanovsky
@ 2020-09-18 11:56           ` Oded Gabbay
  2020-09-18 12:03             ` Leon Romanovsky
  0 siblings, 1 reply; 83+ messages in thread
From: Oded Gabbay @ 2020-09-18 11:56 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Gal Pressman, Jason Gunthorpe, Jakub Kicinski,
	Linux-Kernel@Vger. Kernel. Org, netdev, SW_Drivers,
	Greg Kroah-Hartman, David S. Miller, Andrew Lunn,
	Florian Fainelli, linux-rdma

On Fri, Sep 18, 2020 at 2:52 PM Leon Romanovsky <leon@kernel.org> wrote:
>
> On Fri, Sep 18, 2020 at 02:36:10PM +0300, Gal Pressman wrote:
> > On 17/09/2020 20:18, Jason Gunthorpe wrote:
> > > On Tue, Sep 15, 2020 at 11:46:58PM +0300, Oded Gabbay wrote:
> > >> infrastructure for communication between multiple accelerators. Same
> > >> as Nvidia uses NVlink, we use RDMA that we have inside our ASIC.
> > >> The RDMA implementation we did does NOT support some basic RDMA
> > >> IBverbs (such as MR and PD) and therefore, we can't use the rdma-core
> > >> library or to connect to the rdma infrastructure in the kernel.
> > >
> > > You can't create a parallel RDMA subsystem in netdev, or in misc, and
> > > you can't add random device offloads as IOCTLs to netdevs.
> > >
> > > RDMA is the proper home for all the networking offloads that don't fit
> > > into netdev.
> > >
> > > EFA was able to fit into rdma-core/etc and it isn't even RoCE at
> > > all. I'm sure this can too.
> >
> > Well, EFA wasn't welcomed to the RDMA subsystem with open arms ;), initially it
> > was suggested to go through the vfio subsystem instead.
> >
> > I think this comes back to the discussion we had when EFA was upstreamed, which
> > is what's the bar to get accepted to the RDMA subsystem.
> > IIRC, what we eventually agreed on is having a userspace rdma-core provider and
> > ibv_{ud,rc}_pingpong working (or just supporting one of the IB spec's QP types?).
> >
> > Does GAUDI fit these requirements? If not, should it be in a different subsystem
> > or should we open the "what qualifies as an RDMA device" question again?
>
> I want to remind you that the rdma-core requirement exists to make sure
> that anything RDMA exposes to userspace comes with strict, proper UAPI
> header hygiene.
>
> I doubt that Habana's ioctls are backed by anything like this.
>
> Thanks

Why do you doubt that ? Have you looked at our code ?
Our uapi and IOCTL interface is based on the drm subsystem's uapi
interface, and it is very safe and protected.
Otherwise Greg would have never allowed me to go upstream in the first place.

We have a single function which is the entry point for all the IOCTLs
of our drivers (only one IOCTL is RDMA related, all the others are
compute related).
That function is almost 1:1 copy of the function in drm.
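
For readers who have not seen that pattern, below is a minimal
userspace C sketch of a single table-driven IOCTL entry point. It is
not the habanalabs or drm code; the handler names, the argument union,
and the choice of error codes are all assumptions made for
illustration. The idea is that every command goes through one function
that bounds-checks the command number, validates the argument size,
and hands the handler a private copy of the argument rather than the
caller's buffer.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define DEMO_MAX_ARG 64

/* Private argument buffer; handlers never touch the caller's memory. */
union demo_arg {
    unsigned int value;
    unsigned char raw[DEMO_MAX_ARG];
};

struct demo_ioctl_desc {
    size_t arg_size;                     /* exact size the handler expects */
    int (*func)(union demo_arg *arg);    /* handler, NULL if unimplemented */
};

/* Hypothetical "info" command: returns a value to the caller. */
static int demo_info(union demo_arg *arg)
{
    arg->value = 0x1234;
    return 0;
}

/* Hypothetical "NIC control" command: only opcode 0 is accepted. */
static int demo_nic_ctl(union demo_arg *arg)
{
    return arg->value == 0 ? 0 : -EINVAL;
}

/* The command number is the index into this table. */
static const struct demo_ioctl_desc demo_ioctls[] = {
    { sizeof(unsigned int), demo_info },
    { sizeof(unsigned int), demo_nic_ctl },
};

/* Single entry point for every command. */
static int demo_ioctl(unsigned int nr, void *user_arg, size_t user_size)
{
    const struct demo_ioctl_desc *desc;
    union demo_arg karg;
    int ret;

    if (nr >= sizeof(demo_ioctls) / sizeof(demo_ioctls[0]))
        return -ENOTTY;
    desc = &demo_ioctls[nr];
    if (!desc->func)
        return -ENOTTY;
    if (user_size != desc->arg_size || desc->arg_size > sizeof(karg))
        return -EINVAL;

    /* memcpy stands in for copy_from_user()/copy_to_user() here. */
    memset(&karg, 0, sizeof(karg));
    memcpy(&karg, user_arg, desc->arg_size);
    ret = desc->func(&karg);
    if (ret == 0)
        memcpy(user_arg, &karg, desc->arg_size);
    return ret;
}
```

A call such as demo_ioctl(0, &v, sizeof(v)) fills v from the "info"
handler, while an out-of-range command number is rejected before any
handler runs.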

Thanks,
Oded

* Re: [PATCH v3 00/14] Adding GAUDI NIC code to habanalabs driver
  2020-09-18 11:56         ` Jason Gunthorpe
@ 2020-09-18 11:59           ` Oded Gabbay
  2020-09-18 12:16             ` Jason Gunthorpe
  0 siblings, 1 reply; 83+ messages in thread
From: Oded Gabbay @ 2020-09-18 11:59 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Gal Pressman, Jakub Kicinski, Linux-Kernel@Vger. Kernel. Org,
	netdev, SW_Drivers, Greg Kroah-Hartman, David S. Miller,
	Andrew Lunn, Florian Fainelli, linux-rdma

On Fri, Sep 18, 2020 at 2:56 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
>
> On Fri, Sep 18, 2020 at 02:36:10PM +0300, Gal Pressman wrote:
> > On 17/09/2020 20:18, Jason Gunthorpe wrote:
> > > On Tue, Sep 15, 2020 at 11:46:58PM +0300, Oded Gabbay wrote:
> > >> infrastructure for communication between multiple accelerators. Same
> > >> as Nvidia uses NVlink, we use RDMA that we have inside our ASIC.
> > >> The RDMA implementation we did does NOT support some basic RDMA
> > >> IBverbs (such as MR and PD) and therefore, we can't use the rdma-core
> > >> library or to connect to the rdma infrastructure in the kernel.
> > >
> > > You can't create a parallel RDMA subsystem in netdev, or in misc, and
> > > you can't add random device offloads as IOCTLs to netdevs.
> > >
> > > RDMA is the proper home for all the networking offloads that don't fit
> > > into netdev.
> > >
> > > EFA was able to fit into rdma-core/etc and it isn't even RoCE at
> > > all. I'm sure this can too.
> >
> > Well, EFA wasn't welcomed to the RDMA subsystem with open arms ;), initially it
> > was suggested to go through the vfio subsystem instead.
> >
> > I think this comes back to the discussion we had when EFA was upstreamed, which
> > is what's the bar to get accepted to the RDMA subsystem.
> > IIRC, what we eventually agreed on is having a userspace rdma-core provider and
> > ibv_{ud,rc}_pingpong working (or just supporting one of the IB spec's QP types?).
>
> That is more or less where we ended up, yes.
>
> I'm most worried about this lack of PD and MR.
>
> Kernel must provide security for apps doing user DMA, PD and MR do
> this. If the device doesn't have PD/MR then it is hard to see how a WQ
> could ever be exposed directly to userspace, regardless of subsystem.
>
> Jason

Hi Jason,
What you say here is very true and we handle that with different
mechanisms. I will start working on a dedicated patch-set of the RDMA
code in the next few weeks with MUCH MORE details in the commit
messages. That will explain exactly how we expose things and protect them.

For example, regarding isolation between applications, we only support
a single application opening our file descriptor.
Another example is that the submission of WQ is done through our QMAN
mechanism and is NOT mapped to userspace (due to the restrictions you
mentioned above and other restrictions).

But again, I want to send something organized and with proper explanations.
I hope to have something in a couple of weeks.

Thanks,
Oded

* Re: [PATCH v3 00/14] Adding GAUDI NIC code to habanalabs driver
  2020-09-15 17:10 [PATCH v3 00/14] Adding GAUDI NIC code to habanalabs driver Oded Gabbay
                   ` (14 preceding siblings ...)
  2020-09-15 20:42 ` David Miller
@ 2020-09-18 12:00 ` Jason Gunthorpe
  2020-09-18 12:01   ` Oded Gabbay
  15 siblings, 1 reply; 83+ messages in thread
From: Jason Gunthorpe @ 2020-09-18 12:00 UTC (permalink / raw)
  To: Oded Gabbay
  Cc: linux-kernel, netdev, SW_Drivers, gregkh, davem, kuba, andrew,
	f.fainelli

 
On Tue, Sep 15, 2020 at 08:10:08PM +0300, Oded Gabbay wrote:
> Hello,
> 
> This is the second version of the patch-set to upstream the GAUDI NIC code
> into the habanalabs driver.
> 
> The only modification from v2 is in the ethtool patch (patch 12). Details
> are in that patch's commit message.
> 
> Link to v2 cover letter:
> https://lkml.org/lkml/2020/9/12/201

>1. The NIC functionality is NOT exposed as different PCI Physical
>   Functions. There is a single PF which is used for compute and
>   networking, as the main goal of the NIC ports is to be used as
>   intra-communication and not as standard network interfaces. This
>   implies we can't connect different drivers to handle the networking
>   ports because it is the same device, from the kernel POV, as the
>   compute. Therefore, we must integrate the networking code into the
>   main habanalabs driver.

No, this means you need to use virtual bus/ancillary bus that your
other Intel colleagues have been working on with Greg.

It is specifically intended as the way to split a single PCI function
across multiple subsystems, e.g. drivers/misc/habanalabs would be the
pci_driver and drivers/net/ethernet/habanalabs would be the
'virtual/ancillary' driver. Probably one per port.
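
As a rough illustration of that split, here is a small userspace C
model (not the real virtual/ancillary bus API; every name in it is
hypothetical). The parent, standing in for the pci_driver, owns the
hardware state and creates one child device per port; a separate
"ethernet" driver is matched to each child by name and gets a pointer
back to the parent.

```c
#include <stddef.h>
#include <string.h>

#define MAX_PORTS 16

struct parent_dev;

/* One child device per NIC port, created by the parent driver. */
struct child_dev {
    const char *name;            /* match key used by the bus */
    int port;
    struct parent_dev *parent;   /* state owned by the parent driver */
    int bound;
};

/* A driver living in another subsystem that binds to child devices. */
struct child_drv {
    const char *name;
    int (*probe)(struct child_dev *dev);
};

/* State owned by the parent (the single PCI function). */
struct parent_dev {
    int n_ports;
    int netdevs_created;
    struct child_dev children[MAX_PORTS];
};

/* Stand-in for the Ethernet driver's probe: one "netdev" per port. */
static int eth_probe(struct child_dev *dev)
{
    dev->parent->netdevs_created++;
    return 0;
}

static const struct child_drv eth_drv = { "habanalabs.eth", eth_probe };

/* Bus core: bind the driver to the device if the names match. */
static int bus_match_and_probe(const struct child_drv *drv,
                               struct child_dev *dev)
{
    if (strcmp(drv->name, dev->name) != 0)
        return 0;
    if (drv->probe(dev) != 0)
        return 0;
    dev->bound = 1;
    return 1;
}

/* Parent probe: create one child per port and offer it to the driver. */
static int parent_probe(struct parent_dev *p, int n_ports,
                        const struct child_drv *drv)
{
    int bound = 0;

    if (n_ports > MAX_PORTS)
        n_ports = MAX_PORTS;
    p->n_ports = n_ports;
    p->netdevs_created = 0;
    for (int i = 0; i < n_ports; i++) {
        struct child_dev *c = &p->children[i];

        c->name = "habanalabs.eth";
        c->port = i;
        c->parent = p;
        c->bound = 0;
        bound += bus_match_and_probe(drv, c);
    }
    return bound;
}
```

The parent never calls into the Ethernet code directly; it only
publishes child devices, which is what lets the two halves live in
different directories and be reviewed by different maintainers.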

Jason

* Re: [PATCH v3 00/14] Adding GAUDI NIC code to habanalabs driver
  2020-09-18 12:00 ` Jason Gunthorpe
@ 2020-09-18 12:01   ` Oded Gabbay
  0 siblings, 0 replies; 83+ messages in thread
From: Oded Gabbay @ 2020-09-18 12:01 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Linux-Kernel@Vger. Kernel. Org, netdev, SW_Drivers,
	Greg Kroah-Hartman, David S. Miller, Jakub Kicinski, Andrew Lunn,
	Florian Fainelli

On Fri, Sep 18, 2020 at 3:00 PM Jason Gunthorpe <jgg@nvidia.com> wrote:
>
>
> On Tue, Sep 15, 2020 at 08:10:08PM +0300, Oded Gabbay wrote:
> > Hello,
> >
> > This is the second version of the patch-set to upstream the GAUDI NIC code
> > into the habanalabs driver.
> >
> > The only modification from v2 is in the ethtool patch (patch 12). Details
> > are in that patch's commit message.
> >
> > Link to v2 cover letter:
> > https://lkml.org/lkml/2020/9/12/201
>
> >1. The NIC functionality is NOT exposed as different PCI Physical
> >   Functions. There is a single PF which is used for compute and
> >   networking, as the main goal of the NIC ports is to be used as
> >   intra-communication and not as standard network interfaces. This
> >   implies we can't connect different drivers to handle the networking
> >   ports because it is the same device, from the kernel POV, as the
> >   compute. Therefore, we must integrate the networking code into the
> >   main habanalabs driver.
>
> No, this means you need to use virtual bus/ancillary bus that your
> other Intel colleagues have been working on with Greg.
>
> It is specifically intended as the way to split a single PCI function
> across multiple subsystems. eg drivers/misc/habanalabs would be the
> pci_driver and drivers/net/ethernet/habanalabs would be the
> 'virtual/ancillary' driver. Probably one per port.
>
> Jason

Understood.
We are doing a refactor of the code according to those guidelines and
will send an updated patch-set in a couple of weeks.
Thanks,
Oded

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v3 00/14] Adding GAUDI NIC code to habanalabs driver
  2020-09-18 11:56           ` Oded Gabbay
@ 2020-09-18 12:03             ` Leon Romanovsky
  2020-09-18 12:07               ` Oded Gabbay
  0 siblings, 1 reply; 83+ messages in thread
From: Leon Romanovsky @ 2020-09-18 12:03 UTC (permalink / raw)
  To: Oded Gabbay
  Cc: Gal Pressman, Jason Gunthorpe, Jakub Kicinski,
	Linux-Kernel@Vger. Kernel. Org, netdev, SW_Drivers,
	Greg Kroah-Hartman, David S. Miller, Andrew Lunn,
	Florian Fainelli, linux-rdma

On Fri, Sep 18, 2020 at 02:56:09PM +0300, Oded Gabbay wrote:
> On Fri, Sep 18, 2020 at 2:52 PM Leon Romanovsky <leon@kernel.org> wrote:
> >
> > On Fri, Sep 18, 2020 at 02:36:10PM +0300, Gal Pressman wrote:
> > > On 17/09/2020 20:18, Jason Gunthorpe wrote:
> > > > On Tue, Sep 15, 2020 at 11:46:58PM +0300, Oded Gabbay wrote:
> > > >> infrastructure for communication between multiple accelerators. Same
> > > >> as Nvidia uses NVlink, we use RDMA that we have inside our ASIC.
> > > >> The RDMA implementation we did does NOT support some basic RDMA
> > > >> IBverbs (such as MR and PD) and therefore, we can't use the rdma-core
> > > >> library or to connect to the rdma infrastructure in the kernel.
> > > >
> > > > You can't create a parallel RDMA subsystem in netdev, or in misc, and
> > > > you can't add random device offloads as IOCTL to netdevs.
> > > >
> > > > RDMA is the proper home for all the networking offloads that don't fit
> > > > into netdev.
> > > >
> > > > EFA was able to fit into rdma-core/etc and it isn't even RoCE at
> > > > all. I'm sure this can too.
> > >
> > > Well, EFA wasn't welcomed to the RDMA subsystem with open arms ;), initially it
> > > was suggested to go through the vfio subsystem instead.
> > >
> > > I think this comes back to the discussion we had when EFA was upstreamed, which
> > > is what's the bar to get accepted to the RDMA subsystem.
> > > IIRC, what we eventually agreed on is having a userspace rdma-core provider and
> > > ibv_{ud,rc}_pingpong working (or just supporting one of the IB spec's QP types?).
> > >
> > > Does GAUDI fit these requirements? If not, should it be in a different subsystem
> > > or should we open the "what qualifies as an RDMA device" question again?
> >
> > I want to remind you that rdma-core requirement came to make sure that
> > anything exposed from the RDMA to the userspace is strict with proper
> > UAPI header hygiene.
> >
> > I doubt that Habana's ioctls are backed by anything like this.
> >
> > Thanks
>
> Why do you doubt that ? Have you looked at our code ?
> Our uapi and IOCTLs interface is based on drm subsystem uapi interface
> and it is very safe and protected.

Yes, I looked and didn't find open-source users of your UAPI headers.
It is not related to being safe or protected but to the common request
to present userspace that relies on those exported interfaces.

> Otherwise Greg would have never allowed me to go upstream in the first place.

Nice, can we get a link?

>
> We have a single function which is the entry point for all the IOCTLs
> of our drivers (only one IOCTL is RDMA related, all the others are
> compute related).
> That function is almost 1:1 copy of the function in drm.

DRM has same rules as RDMA, no kernel code will be merged without seeing
open-source userspace.

Thanks

>
> Thanks,
> Oded

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v3 00/14] Adding GAUDI NIC code to habanalabs driver
  2020-09-18 12:03             ` Leon Romanovsky
@ 2020-09-18 12:07               ` Oded Gabbay
  2020-09-18 12:19                 ` Leon Romanovsky
  0 siblings, 1 reply; 83+ messages in thread
From: Oded Gabbay @ 2020-09-18 12:07 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Gal Pressman, Jason Gunthorpe, Jakub Kicinski,
	Linux-Kernel@Vger. Kernel. Org, netdev, SW_Drivers,
	Greg Kroah-Hartman, David S. Miller, Andrew Lunn,
	Florian Fainelli, linux-rdma

On Fri, Sep 18, 2020 at 3:03 PM Leon Romanovsky <leon@kernel.org> wrote:
>
> On Fri, Sep 18, 2020 at 02:56:09PM +0300, Oded Gabbay wrote:
> > On Fri, Sep 18, 2020 at 2:52 PM Leon Romanovsky <leon@kernel.org> wrote:
> > >
> > > On Fri, Sep 18, 2020 at 02:36:10PM +0300, Gal Pressman wrote:
> > > > On 17/09/2020 20:18, Jason Gunthorpe wrote:
> > > > > On Tue, Sep 15, 2020 at 11:46:58PM +0300, Oded Gabbay wrote:
> > > > >> infrastructure for communication between multiple accelerators. Same
> > > > >> as Nvidia uses NVlink, we use RDMA that we have inside our ASIC.
> > > > >> The RDMA implementation we did does NOT support some basic RDMA
> > > > >> IBverbs (such as MR and PD) and therefore, we can't use the rdma-core
> > > > >> library or to connect to the rdma infrastructure in the kernel.
> > > > >
> > > > > You can't create a parallel RDMA subsystem in netdev, or in misc, and
> > > > > you can't add random device offloads as IOCTL to netdevs.
> > > > >
> > > > > RDMA is the proper home for all the networking offloads that don't fit
> > > > > into netdev.
> > > > >
> > > > > EFA was able to fit into rdma-core/etc and it isn't even RoCE at
> > > > > all. I'm sure this can too.
> > > >
> > > > Well, EFA wasn't welcomed to the RDMA subsystem with open arms ;), initially it
> > > > was suggested to go through the vfio subsystem instead.
> > > >
> > > > I think this comes back to the discussion we had when EFA was upstreamed, which
> > > > is what's the bar to get accepted to the RDMA subsystem.
> > > > IIRC, what we eventually agreed on is having a userspace rdma-core provider and
> > > > ibv_{ud,rc}_pingpong working (or just supporting one of the IB spec's QP types?).
> > > >
> > > > Does GAUDI fit these requirements? If not, should it be in a different subsystem
> > > > or should we open the "what qualifies as an RDMA device" question again?
> > >
> > > I want to remind you that rdma-core requirement came to make sure that
> > > anything exposed from the RDMA to the userspace is strict with proper
> > > UAPI header hygiene.
> > >
> > > I doubt that Habana's ioctls are backed by anything like this.
> > >
> > > Thanks
> >
> > Why do you doubt that ? Have you looked at our code ?
> > Our uapi and IOCTLs interface is based on drm subsystem uapi interface
> > and it is very safe and protected.
>
> Yes, I looked and didn't find open-source users of your UAPI headers.
> It is not related to being safe or protected but to the common request
> to present userspace that relies on those exported interfaces.
>
> > Otherwise Greg would have never allowed me to go upstream in the first place.
>
> Nice, can we get a link?
>
> >
> > We have a single function which is the entry point for all the IOCTLs
> > of our drivers (only one IOCTL is RDMA related, all the others are
> > compute related).
> > That function is almost 1:1 copy of the function in drm.
>
> DRM has same rules as RDMA, no kernel code will be merged without seeing
> open-source userspace.
>
> Thanks
>
> >
> > Thanks,
> > Oded

So we do have an open-source library called hl-thunk, which uses our
driver and indeed that was part of the requirement.
It is similar to libdrm.
Here is the link:
https://github.com/HabanaAI/hl-thunk

That library also comes with a comprehensive suite of tests which
shows how to use the accelerator and we have many NIC tests which show
how to use the NIC.
All the rest of the user-space code in Habana is going through that library.

Currently, you won't find the NIC code there because we didn't
upstream it as the driver code wasn't ready, but I'll push it there in
a private branch if you want to take a look.

Thanks,
Oded

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v3 00/14] Adding GAUDI NIC code to habanalabs driver
  2020-09-18 11:36       ` Gal Pressman
  2020-09-18 11:52         ` Leon Romanovsky
  2020-09-18 11:56         ` Jason Gunthorpe
@ 2020-09-18 12:10         ` Oded Gabbay
  2 siblings, 0 replies; 83+ messages in thread
From: Oded Gabbay @ 2020-09-18 12:10 UTC (permalink / raw)
  To: izur
  Cc: Jason Gunthorpe, Jakub Kicinski, Linux-Kernel@Vger. Kernel. Org,
	netdev, SW_Drivers, Greg Kroah-Hartman, David S. Miller,
	Andrew Lunn, Florian Fainelli, linux-rdma, Gal Pressman,
	Leon Romanovsky

On Fri, Sep 18, 2020 at 2:36 PM Gal Pressman <galpress@amazon.com> wrote:
>
> On 17/09/2020 20:18, Jason Gunthorpe wrote:
> > On Tue, Sep 15, 2020 at 11:46:58PM +0300, Oded Gabbay wrote:
> >> infrastructure for communication between multiple accelerators. Same
> >> as Nvidia uses NVlink, we use RDMA that we have inside our ASIC.
> >> The RDMA implementation we did does NOT support some basic RDMA
> >> IBverbs (such as MR and PD) and therefore, we can't use the rdma-core
> >> library or to connect to the rdma infrastructure in the kernel.
> >
> > You can't create a parallel RDMA subsystem in netdev, or in misc, and
> > you can't add random device offloads as IOCTL to netdevs.
> >
> > RDMA is the proper home for all the networking offloads that don't fit
> > into netdev.
> >
> > EFA was able to fit into rdma-core/etc and it isn't even RoCE at
> > all. I'm sure this can too.
>
> Well, EFA wasn't welcomed to the RDMA subsystem with open arms ;), initially it
> was suggested to go through the vfio subsystem instead.
>
> I think this comes back to the discussion we had when EFA was upstreamed, which
> is what's the bar to get accepted to the RDMA subsystem.
> IIRC, what we eventually agreed on is having a userspace rdma-core provider and
> ibv_{ud,rc}_pingpong working (or just supporting one of the IB spec's QP types?).
>
> Does GAUDI fit these requirements? If not, should it be in a different subsystem
> or should we open the "what qualifies as an RDMA device" question again?

Hi Itay,
Please see the above comments/questions.
Thanks,
Oded

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v3 00/14] Adding GAUDI NIC code to habanalabs driver
  2020-09-18 11:59           ` Oded Gabbay
@ 2020-09-18 12:16             ` Jason Gunthorpe
  2020-09-18 12:34               ` Oded Gabbay
  0 siblings, 1 reply; 83+ messages in thread
From: Jason Gunthorpe @ 2020-09-18 12:16 UTC (permalink / raw)
  To: Oded Gabbay
  Cc: Gal Pressman, Jakub Kicinski, Linux-Kernel@Vger. Kernel. Org,
	netdev, SW_Drivers, Greg Kroah-Hartman, David S. Miller,
	Andrew Lunn, Florian Fainelli, linux-rdma

On Fri, Sep 18, 2020 at 02:59:28PM +0300, Oded Gabbay wrote:
> On Fri, Sep 18, 2020 at 2:56 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> >
> > On Fri, Sep 18, 2020 at 02:36:10PM +0300, Gal Pressman wrote:
> > > On 17/09/2020 20:18, Jason Gunthorpe wrote:
> > > > On Tue, Sep 15, 2020 at 11:46:58PM +0300, Oded Gabbay wrote:
> > > >> infrastructure for communication between multiple accelerators. Same
> > > >> as Nvidia uses NVlink, we use RDMA that we have inside our ASIC.
> > > >> The RDMA implementation we did does NOT support some basic RDMA
> > > >> IBverbs (such as MR and PD) and therefore, we can't use the rdma-core
> > > >> library or to connect to the rdma infrastructure in the kernel.
> > > >
> > > > You can't create a parallel RDMA subsystem in netdev, or in misc, and
> > > > you can't add random device offloads as IOCTL to netdevs.
> > > >
> > > > RDMA is the proper home for all the networking offloads that don't fit
> > > > into netdev.
> > > >
> > > > EFA was able to fit into rdma-core/etc and it isn't even RoCE at
> > > > all. I'm sure this can too.
> > >
> > > Well, EFA wasn't welcomed to the RDMA subsystem with open arms ;), initially it
> > > was suggested to go through the vfio subsystem instead.
> > >
> > > I think this comes back to the discussion we had when EFA was upstreamed, which
> > > is what's the bar to get accepted to the RDMA subsystem.
> > > IIRC, what we eventually agreed on is having a userspace rdma-core provider and
> > > ibv_{ud,rc}_pingpong working (or just supporting one of the IB spec's QP types?).
> >
> > That is more or less where we ended up, yes.
> >
> > I'm most worried about this lack of PD and MR.
> >
> > Kernel must provide security for apps doing user DMA, PD and MR do
> > this. If the device doesn't have PD/MR then it is hard to see how a WQ
> > could ever be exposed directly to userspace, regardless of subsystem.
> 
> Hi Jason,
> What you say here is very true and we handle that with different
> mechanisms. I will start working on a dedicated patch-set of the RDMA
> code in the next few weeks with MUCH MORE details in the commit
> messages. That will explain exactly how we expose stuff and protect.
> 
> For example, regarding isolating between applications, we only support
> a single application opening our file descriptor.

Then the driver has a special PD create that requires the misc file
descriptor to authorize RDMA access to the resources in that security
context.

> Another example is that the submission of WQ is done through our QMAN
> mechanism and is NOT mapped to userspace (due to the restrictions you
> mentioned above and other restrictions).

Sure, other RDMA drivers also require a kernel ioctl for command
execution.

In this model the MR can be a software construct, again representing a
security authorization:

- A 'full process' MR, in which case the kernel command execution
  handles dma map and pinning at command execution time
- A 'normal' MR, in which case the DMA list is pre-created and the
  command execution just re-uses this data

The general requirement for RDMA is the same as DRM, you must provide
enough code in rdma-core to show how the device works, and minimally
test it. EFA uses ibv_ud_pingpong, and some pyverbs tests IIRC.

So you'll want to arrange something where the default MR and PD
mechanisms do something workable on this device, like auto-open the
misc FD when building the PD, and support the 'normal' MR flow for
command execution.

Jason
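
The two MR flavours listed above can be illustrated with a small
userspace C model. This is a hedged sketch under stated assumptions, not
code from the habanalabs driver or from rdma-core: all names (toy_mr,
toy_mr_reg_full, toy_mr_reg_normal, toy_exec_cmd) are hypothetical, the
PD is reduced to an integer id, and "pin and map" is reduced to a
counter. It shows the MR acting as a software authorization that every
command is checked against.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum toy_mr_kind {
	TOY_MR_FULL_PROCESS,	/* pin and map at command execution time */
	TOY_MR_NORMAL,		/* DMA list built once at registration */
};

struct toy_mr {
	enum toy_mr_kind kind;
	int pd_id;		/* security context that owns this MR */
	uint64_t start;		/* covered range (TOY_MR_NORMAL only) */
	uint64_t length;
	unsigned int map_count;	/* times the model "pinned and mapped" */
};

/* Register an MR that authorizes the whole address space of the process. */
struct toy_mr toy_mr_reg_full(int pd_id)
{
	struct toy_mr mr = { TOY_MR_FULL_PROCESS, pd_id, 0, 0, 0 };

	return mr;
}

/* Register an MR over a fixed range; the DMA list is created up front. */
struct toy_mr toy_mr_reg_normal(int pd_id, uint64_t start, uint64_t length)
{
	struct toy_mr mr = { TOY_MR_NORMAL, pd_id, start, length, 1 };

	assert(length > 0 && start + length > start);
	return mr;
}

/* Kernel-side command execution: returns true if the DMA is authorized. */
bool toy_exec_cmd(struct toy_mr *mr, int pd_id, uint64_t addr, uint64_t len)
{
	if (mr->pd_id != pd_id)		/* wrong security context */
		return false;
	if (len == 0 || addr + len < addr)	/* empty or wrapping range */
		return false;

	if (mr->kind == TOY_MR_NORMAL) {
		/* Re-use the pre-created DMA list; only bounds are checked. */
		return addr >= mr->start &&
		       addr + len <= mr->start + mr->length;
	}

	/* Full-process MR: pin and map on every command. */
	mr->map_count++;
	return true;
}
```

The trade-off the model makes visible is that the 'normal' MR pays the
mapping cost once, while the 'full process' MR pays it on each command.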

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v3 00/14] Adding GAUDI NIC code to habanalabs driver
  2020-09-18 12:07               ` Oded Gabbay
@ 2020-09-18 12:19                 ` Leon Romanovsky
  2020-09-18 12:31                   ` Oded Gabbay
  2020-09-19  6:40                   ` Greg Kroah-Hartman
  0 siblings, 2 replies; 83+ messages in thread
From: Leon Romanovsky @ 2020-09-18 12:19 UTC (permalink / raw)
  To: Oded Gabbay
  Cc: Gal Pressman, Jason Gunthorpe, Jakub Kicinski,
	Linux-Kernel@Vger. Kernel. Org, netdev, SW_Drivers,
	Greg Kroah-Hartman, David S. Miller, Andrew Lunn,
	Florian Fainelli, linux-rdma

On Fri, Sep 18, 2020 at 03:07:19PM +0300, Oded Gabbay wrote:
> On Fri, Sep 18, 2020 at 3:03 PM Leon Romanovsky <leon@kernel.org> wrote:
> >
> > On Fri, Sep 18, 2020 at 02:56:09PM +0300, Oded Gabbay wrote:
> > > On Fri, Sep 18, 2020 at 2:52 PM Leon Romanovsky <leon@kernel.org> wrote:
> > > >
> > > > On Fri, Sep 18, 2020 at 02:36:10PM +0300, Gal Pressman wrote:
> > > > > On 17/09/2020 20:18, Jason Gunthorpe wrote:
> > > > > > On Tue, Sep 15, 2020 at 11:46:58PM +0300, Oded Gabbay wrote:
> > > > > >> infrastructure for communication between multiple accelerators. Same
> > > > > >> as Nvidia uses NVlink, we use RDMA that we have inside our ASIC.
> > > > > >> The RDMA implementation we did does NOT support some basic RDMA
> > > > > >> IBverbs (such as MR and PD) and therefore, we can't use the rdma-core
> > > > > >> library or to connect to the rdma infrastructure in the kernel.
> > > > > >
> > > > > > You can't create a parallel RDMA subsystem in netdev, or in misc, and
> > > > > > you can't add random device offloads as IOCTL to netdevs.
> > > > > >
> > > > > > RDMA is the proper home for all the networking offloads that don't fit
> > > > > > into netdev.
> > > > > >
> > > > > > EFA was able to fit into rdma-core/etc and it isn't even RoCE at
> > > > > > all. I'm sure this can too.
> > > > >
> > > > > Well, EFA wasn't welcomed to the RDMA subsystem with open arms ;), initially it
> > > > > was suggested to go through the vfio subsystem instead.
> > > > >
> > > > > I think this comes back to the discussion we had when EFA was upstreamed, which
> > > > > is what's the bar to get accepted to the RDMA subsystem.
> > > > > IIRC, what we eventually agreed on is having a userspace rdma-core provider and
> > > > > ibv_{ud,rc}_pingpong working (or just supporting one of the IB spec's QP types?).
> > > > >
> > > > > Does GAUDI fit these requirements? If not, should it be in a different subsystem
> > > > > or should we open the "what qualifies as an RDMA device" question again?
> > > >
> > > > I want to remind you that rdma-core requirement came to make sure that
> > > > anything exposed from the RDMA to the userspace is strict with proper
> > > > UAPI header hygiene.
> > > >
> > > > I doubt that Habana's ioctls are backed by anything like this.
> > > >
> > > > Thanks
> > >
> > > Why do you doubt that ? Have you looked at our code ?
> > > Our uapi and IOCTLs interface is based on drm subsystem uapi interface
> > > and it is very safe and protected.
> >
> > Yes, I looked and didn't find open-source users of your UAPI headers.
> > It is not related to being safe or protected but to the common request
> > to present userspace that relies on those exported interfaces.
> >
> > > Otherwise Greg would have never allowed me to go upstream in the first place.
> >
> > Nice, can we get a link?
> >
> > >
> > > We have a single function which is the entry point for all the IOCTLs
> > > of our drivers (only one IOCTL is RDMA related, all the others are
> > > compute related).
> > > That function is almost 1:1 copy of the function in drm.
> >
> > DRM has same rules as RDMA, no kernel code will be merged without seeing
> > open-source userspace.
> >
> > Thanks
> >
> > >
> > > Thanks,
> > > Oded
>
> So we do have an open-source library called hl-thunk, which uses our
> driver and indeed that was part of the requirement.
> It is similar to libdrm.
> Here is the link:
> https://github.com/HabanaAI/hl-thunk

Are you kidding?

This is mirror of some internal repository that looks like dumpster
with ChangeId, internal bug tracker numbers, not part of major OS
distributions.

It is not open-source library and shows very clear why you chose
to upstream your driver through drivers/misc/ tree.

Thanks

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v3 00/14] Adding GAUDI NIC code to habanalabs driver
  2020-09-18 12:19                 ` Leon Romanovsky
@ 2020-09-18 12:31                   ` Oded Gabbay
  2020-09-18 13:09                     ` Leon Romanovsky
  2020-09-19  6:40                   ` Greg Kroah-Hartman
  1 sibling, 1 reply; 83+ messages in thread
From: Oded Gabbay @ 2020-09-18 12:31 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Gal Pressman, Jason Gunthorpe, Jakub Kicinski,
	Linux-Kernel@Vger. Kernel. Org, netdev, SW_Drivers,
	Greg Kroah-Hartman, David S. Miller, Andrew Lunn,
	Florian Fainelli, linux-rdma, izur, Olof Johansson

On Fri, Sep 18, 2020 at 3:19 PM Leon Romanovsky <leon@kernel.org> wrote:
>
> On Fri, Sep 18, 2020 at 03:07:19PM +0300, Oded Gabbay wrote:
> > On Fri, Sep 18, 2020 at 3:03 PM Leon Romanovsky <leon@kernel.org> wrote:
> > >
> > > On Fri, Sep 18, 2020 at 02:56:09PM +0300, Oded Gabbay wrote:
> > > > On Fri, Sep 18, 2020 at 2:52 PM Leon Romanovsky <leon@kernel.org> wrote:
> > > > >
> > > > > On Fri, Sep 18, 2020 at 02:36:10PM +0300, Gal Pressman wrote:
> > > > > > On 17/09/2020 20:18, Jason Gunthorpe wrote:
> > > > > > > On Tue, Sep 15, 2020 at 11:46:58PM +0300, Oded Gabbay wrote:
> > > > > > >> infrastructure for communication between multiple accelerators. Same
> > > > > > >> as Nvidia uses NVlink, we use RDMA that we have inside our ASIC.
> > > > > > >> The RDMA implementation we did does NOT support some basic RDMA
> > > > > > >> IBverbs (such as MR and PD) and therefore, we can't use the rdma-core
> > > > > > >> library or to connect to the rdma infrastructure in the kernel.
> > > > > > >
> > > > > > > You can't create a parallel RDMA subsystem in netdev, or in misc, and
> > > > > > > you can't add random device offloads as IOCTL to netdevs.
> > > > > > >
> > > > > > > RDMA is the proper home for all the networking offloads that don't fit
> > > > > > > into netdev.
> > > > > > >
> > > > > > > EFA was able to fit into rdma-core/etc and it isn't even RoCE at
> > > > > > > all. I'm sure this can too.
> > > > > >
> > > > > > Well, EFA wasn't welcomed to the RDMA subsystem with open arms ;), initially it
> > > > > > was suggested to go through the vfio subsystem instead.
> > > > > >
> > > > > > I think this comes back to the discussion we had when EFA was upstreamed, which
> > > > > > is what's the bar to get accepted to the RDMA subsystem.
> > > > > > IIRC, what we eventually agreed on is having a userspace rdma-core provider and
> > > > > > ibv_{ud,rc}_pingpong working (or just supporting one of the IB spec's QP types?).
> > > > > >
> > > > > > Does GAUDI fit these requirements? If not, should it be in a different subsystem
> > > > > > or should we open the "what qualifies as an RDMA device" question again?
> > > > >
> > > > > I want to remind you that rdma-core requirement came to make sure that
> > > > > anything exposed from the RDMA to the userspace is strict with proper
> > > > > UAPI header hygiene.
> > > > >
> > > > > I doubt that Habana's ioctls are backed by anything like this.
> > > > >
> > > > > Thanks
> > > >
> > > > Why do you doubt that ? Have you looked at our code ?
> > > > Our uapi and IOCTLs interface is based on drm subsystem uapi interface
> > > > and it is very safe and protected.
> > >
> > > Yes, I looked and didn't find open-source users of your UAPI headers.
> > > It is not related to being safe or protected but to the common request
> > > to present userspace that relies on those exported interfaces.
> > >
> > > > Otherwise Greg would have never allowed me to go upstream in the first place.
> > >
> > > Nice, can we get a link?
> > >
> > > >
> > > > We have a single function which is the entry point for all the IOCTLs
> > > > of our drivers (only one IOCTL is RDMA related, all the others are
> > > > compute related).
> > > > That function is almost 1:1 copy of the function in drm.
> > >
> > > DRM has same rules as RDMA, no kernel code will be merged without seeing
> > > open-source userspace.
> > >
> > > Thanks
> > >
> > > >
> > > > Thanks,
> > > > Oded
> >
> > So we do have an open-source library called hl-thunk, which uses our
> > driver and indeed that was part of the requirement.
> > It is similar to libdrm.
> > Here is the link:
> > https://github.com/HabanaAI/hl-thunk
>
> Are you kidding?
>
> This is mirror of some internal repository that looks like dumpster
> with ChangeId, internal bug tracker numbers, not part of major OS
> distributions.
>
> It is not open-source library and shows very clear why you chose
> to upstream your driver through drivers/misc/ tree.
>
> Thanks

Adding Olof here.

No, usually not.
But are you kidding ?
What did you exactly expect to find ? Is there an open-source project
somewhere that encapsulates Deep-learning accelerators which I could
connect to ?
AFAIK, the only thing remotely relevant is CUDA and that is
closed-source (strange to hear lectures about open-source from NVIDIA
people here...)

So we are trying to give to the community such an open source library,
or at least an example. Hopefully one day, when more companies
upstream their drivers for deep-learning accelerators we could do
something like libdrm or rdma-core, but for now, it's just our driver.

I have been in this community since 2013 with AMD and then RedHat, and
I come with good intentions and a desire to open source and upstream
as much as I can. I don't think I deserve this kind of response.

The bottom line is that we had this discussion with Greg and Olof and
DRM people almost 2 years ago and if there was some open-source
project in user-space or some subsystem in the kernel we could connect
to, we would have done that instead of what we did, but the fact of
the matter there isn't such thing. Olof tried and is trying to create
a h/w accelerator subsystem but it still hasn't got up from the ground
yet.

Oded

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v3 00/14] Adding GAUDI NIC code to habanalabs driver
  2020-09-18 12:16             ` Jason Gunthorpe
@ 2020-09-18 12:34               ` Oded Gabbay
  2020-09-18 12:50                 ` Jason Gunthorpe
  0 siblings, 1 reply; 83+ messages in thread
From: Oded Gabbay @ 2020-09-18 12:34 UTC (permalink / raw)
  To: Jason Gunthorpe, izur
  Cc: Gal Pressman, Jakub Kicinski, Linux-Kernel@Vger. Kernel. Org,
	netdev, SW_Drivers, Greg Kroah-Hartman, David S. Miller,
	Andrew Lunn, Florian Fainelli, linux-rdma

On Fri, Sep 18, 2020 at 3:16 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
>
> On Fri, Sep 18, 2020 at 02:59:28PM +0300, Oded Gabbay wrote:
> > On Fri, Sep 18, 2020 at 2:56 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> > >
> > > On Fri, Sep 18, 2020 at 02:36:10PM +0300, Gal Pressman wrote:
> > > > On 17/09/2020 20:18, Jason Gunthorpe wrote:
> > > > > On Tue, Sep 15, 2020 at 11:46:58PM +0300, Oded Gabbay wrote:
> > > > >> infrastructure for communication between multiple accelerators. Same
> > > > >> as Nvidia uses NVlink, we use RDMA that we have inside our ASIC.
> > > > >> The RDMA implementation we did does NOT support some basic RDMA
> > > > >> IBverbs (such as MR and PD) and therefore, we can't use the rdma-core
> > > > >> library or to connect to the rdma infrastructure in the kernel.
> > > > >
> > > > > You can't create a parallel RDMA subsystem in netdev, or in misc, and
> > > > > you can't add random device offloads as IOCTL to netdevs.
> > > > >
> > > > > RDMA is the proper home for all the networking offloads that don't fit
> > > > > into netdev.
> > > > >
> > > > > EFA was able to fit into rdma-core/etc and it isn't even RoCE at
> > > > > all. I'm sure this can too.
> > > >
> > > > Well, EFA wasn't welcomed to the RDMA subsystem with open arms ;), initially it
> > > > was suggested to go through the vfio subsystem instead.
> > > >
> > > > I think this comes back to the discussion we had when EFA was upstreamed, which
> > > > is what's the bar to get accepted to the RDMA subsystem.
> > > > IIRC, what we eventually agreed on is having a userspace rdma-core provider and
> > > > ibv_{ud,rc}_pingpong working (or just supporting one of the IB spec's QP types?).
> > >
> > > That is more or less where we ended up, yes.
> > >
> > > I'm most worried about this lack of PD and MR.
> > >
> > > Kernel must provide security for apps doing user DMA, PD and MR do
> > > this. If the device doesn't have PD/MR then it is hard to see how a WQ
> > > could ever be exposed directly to userspace, regardless of subsystem.
> >
> > Hi Jason,
> > What you say here is very true and we handle that with different
> > mechanisms. I will start working on a dedicated patch-set of the RDMA
> > code in the next few weeks with MUCH MORE details in the commit
> > messages. That will explain exactly how we expose stuff and protect.
> >
> > For example, regarding isolating between applications, we only support
> > a single application opening our file descriptor.
>
> Then the driver has a special PD create that requires the misc file
> descriptor to authorize RDMA access to the resources in that security
> context.
>
> > Another example is that the submission of WQ is done through our QMAN
> > mechanism and is NOT mapped to userspace (due to the restrictions you
> > mentioned above and other restrictions).
>
> Sure, other RDMA drivers also require a kernel ioctl for command
> execution.
>
> In this model the MR can be a software construct, again representing a
> security authorization:
>
> - A 'full process' MR, in which case the kernel command execution
>   handles dma map and pinning at command execution time
> - A 'normal' MR, in which case the DMA list is pre-created and the
>   command execution just re-uses this data
>
> The general requirement for RDMA is the same as DRM, you must provide
> enough code in rdma-core to show how the device works, and minimally
> test it. EFA uses ibv_ud_pingpong, and some pyverbs tests IIRC.
>
> So you'll want to arrange something where the default MR and PD
> mechanisms do something workable on this device, like auto-open the
> misc FD when building the PD, and support the 'normal' MR flow for
> command execution.
>
> Jason

I don't know how we can support MR because we can't support any
virtual address on the host. Our internal MMU doesn't support 64-bits.
We investigated in the past, very much wanted to use IBverbs but
didn't figure out how to make it work.
I'm adding Itay here and he can also shed more details on that.
Oded

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v3 00/14] Adding GAUDI NIC code to habanalabs driver
  2020-09-18 12:34               ` Oded Gabbay
@ 2020-09-18 12:50                 ` Jason Gunthorpe
  2020-09-18 13:02                   ` Oded Gabbay
  0 siblings, 1 reply; 83+ messages in thread
From: Jason Gunthorpe @ 2020-09-18 12:50 UTC (permalink / raw)
  To: Oded Gabbay
  Cc: izur, Gal Pressman, Jakub Kicinski,
	Linux-Kernel@Vger. Kernel. Org, netdev, SW_Drivers,
	Greg Kroah-Hartman, David S. Miller, Andrew Lunn,
	Florian Fainelli, linux-rdma

On Fri, Sep 18, 2020 at 03:34:54PM +0300, Oded Gabbay wrote:
> > > Another example is that the submission of WQ is done through our QMAN
> > > mechanism and is NOT mapped to userspace (due to the restrictions you
> > > mentioned above and other restrictions).
> >
> > Sure, other RDMA drivers also require a kernel ioctl for command
> > execution.
> >
> > In this model the MR can be a software construct, again representing a
> > security authorization:
> >
> > - A 'full process' MR, in which case the kernel command execution
> >   handles dma map and pinning at command execution time
> > - A 'normal' MR, in which case the DMA list is pre-created and the
> >   command execution just re-uses this data
> >
> > The general requirement for RDMA is the same as DRM, you must provide
> > enough code in rdma-core to show how the device works, and minimally
> > test it. EFA uses ibv_ud_pingpong, and some pyverbs tests IIRC.
> >
> > So you'll want to arrange something where the default MR and PD
> > mechanisms do something workable on this device, like auto-open the
> > misc FD when building the PD, and support the 'normal' MR flow for
> > command execution.
> 
> I don't know how we can support MR because we can't support any
> virtual address on the host. Our internal MMU doesn't support 64-bits.
> We investigated in the past, very much wanted to use IBverbs but
> didn't figure out how to make it work.
> I'm adding Itay here and he can also shed more details on that.

I'm not sure what that means, if the driver intends to DMA from
process memory then it certainly has a MR concept. 

MRs can control the IOVA directly so if you say the HW needs a MR IOVA
< 2**32 then that is still OK.

Jason
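
The point about the MR controlling the IOVA can be sketched in userspace
C. This is a toy model under stated assumptions, not the RDMA core or
the habanalabs MMU code: the names (toy_iova_space, toy_iova_mr,
toy_iova_mr_reg, toy_iova_to_va) are hypothetical, the allocator is a
simple bump allocator with 4 KiB granularity, and the sub-page offset of
the user address is ignored. It shows a 64-bit process address being
registered and handed to the device as an address below 2**32.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_IOVA_LIMIT	(1ULL << 32)	/* device can only address < 2**32 */
#define TOY_PAGE_SIZE	4096ULL

/* Bump allocator over the device-visible address window. */
struct toy_iova_space {
	uint64_t next;
};

struct toy_iova_mr {
	uint64_t user_va;	/* 64-bit address in the process */
	uint64_t iova;		/* address the device uses, always < 2**32 */
	uint64_t length;
};

/* Register a buffer: the MR picks the IOVA, independent of user_va. */
bool toy_iova_mr_reg(struct toy_iova_space *s, uint64_t user_va,
		     uint64_t length, struct toy_iova_mr *out)
{
	uint64_t size;

	assert(s->next <= TOY_IOVA_LIMIT);
	if (length == 0 || length > TOY_IOVA_LIMIT)
		return false;

	/* Round up to whole pages, then check that the window has room. */
	size = (length + TOY_PAGE_SIZE - 1) & ~(TOY_PAGE_SIZE - 1);
	if (size > TOY_IOVA_LIMIT - s->next)
		return false;

	out->user_va = user_va;
	out->iova = s->next;
	out->length = length;
	s->next += size;
	return true;
}

/* Translate a device address inside the MR back to the process address. */
bool toy_iova_to_va(const struct toy_iova_mr *mr, uint64_t iova,
		    uint64_t *va)
{
	if (iova < mr->iova || iova - mr->iova >= mr->length)
		return false;
	*va = mr->user_va + (iova - mr->iova);
	return true;
}
```

With this arrangement the width of the process virtual address never
reaches the device; only the IOVA chosen at registration does.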

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v3 00/14] Adding GAUDI NIC code to habanalabs driver
  2020-09-18 12:50                 ` Jason Gunthorpe
@ 2020-09-18 13:02                   ` Oded Gabbay
  2020-09-18 13:26                     ` Jason Gunthorpe
  0 siblings, 1 reply; 83+ messages in thread
From: Oded Gabbay @ 2020-09-18 13:02 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: izur, Gal Pressman, Jakub Kicinski,
	Linux-Kernel@Vger. Kernel. Org, netdev, SW_Drivers,
	Greg Kroah-Hartman, David S. Miller, Andrew Lunn,
	Florian Fainelli, linux-rdma

On Fri, Sep 18, 2020 at 3:50 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
>
> On Fri, Sep 18, 2020 at 03:34:54PM +0300, Oded Gabbay wrote:
> > > > Another example is that the submission of WQ is done through our QMAN
> > > > mechanism and is NOT mapped to userspace (due to the restrictions you
> > > > mentioned above and other restrictions).
> > >
> > > Sure, other RDMA drivers also require a kernel ioctl for command
> > > execution.
> > >
> > > In this model the MR can be a software construct, again representing a
> > > security authorization:
> > >
> > > > - A 'full process' MR, in which case the kernel command execution
> > >   handles dma map and pinning at command execution time
> > > - A 'normal' MR, in which case the DMA list is pre-created and the
> > >   command execution just re-uses this data
> > >
> > > The general requirement for RDMA is the same as DRM, you must provide
> > > enough code in rdma-core to show how the device works, and minimally
> > > test it. EFA uses ibv_ud_pingpong, and some pyverbs tests IIRC.
> > >
> > > So you'll want to arrange something where the default MR and PD
> > > mechanisms do something workable on this device, like auto-open the
> > > misc FD when building the PD, and support the 'normal' MR flow for
> > > command execution.
> >
> > I don't know how we can support MR because we can't support any
> > virtual address on the host. Our internal MMU doesn't support 64-bits.
> > We investigated in the past, very much wanted to use IBverbs but
> > didn't figure out how to make it work.
> > I'm adding Itay here and he can also shed more details on that.
>
> I'm not sure what that means, if the driver intends to DMA from
> process memory then it certainly has a MR concept.
>
> MRs can control the IOVA directly so if you say the HW needs a MR IOVA
> < 2**32 then that is still OK.
>
> Jason

Hi Jason,
I'll try to explain but please bear with me because it requires some
understanding of our H/W architecture.

Our ASIC has 32 GB of HBM memory (similar to GPUs). The problem is
that HBM memory is accessed by our ASIC's engines (DMA, NIC, etc.)
with physical addressing, which is mapped inside our device between
0x0 to 0x8_0000_0000.

Now, if a user performs malloc and then maps that memory to our device
(using our memory MAP ioctl, similar to how GPU works), it will get a
new virtual address, which is in the range of 0x80_0000_0000 - (2^50
-1). Then, he can use that new VA in our device with different engines
(DMA, NIC, compute).

That way, addresses that represent the host memory do not overlap
addresses that represent HBM memory.

The problem with MR is that the API doesn't let us return a new VA. It
forces us to use the original VA that the Host OS allocated. What will
we do if that VA is in the range of our HBM addresses ? The device
won't be able to distinguish between them. The transaction that is
generated by an engine inside our device will go to the HBM instead of
going to the PCI controller and then to the host.
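
A minimal sketch of the address split described above, assuming the HBM
window ends just below 0x8_0000_0000 and the host window ends at 2^50 - 1.
The macro, enum and function names are hypothetical and are not taken from
the habanalabs driver; the code only illustrates why an unmodified host VA
that happens to be small would be routed to HBM.

```c
#include <stdint.h>

/* Ranges as described in the mail above; all names here are hypothetical. */
#define SKETCH_HBM_END       0x800000000ULL   /* 0x8_0000_0000, i.e. 32 GB */
#define SKETCH_HOST_VA_BASE  0x8000000000ULL  /* 0x80_0000_0000 */
#define SKETCH_HOST_VA_END   (1ULL << 50)     /* exclusive upper bound */

enum dev_target { TARGET_HBM, TARGET_HOST, TARGET_INVALID };

/* Decide where a transaction goes, based only on the device address. */
static enum dev_target classify_device_addr(uint64_t addr)
{
	if (addr < SKETCH_HBM_END)
		return TARGET_HBM;	/* physical HBM access */
	if (addr >= SKETCH_HOST_VA_BASE && addr < SKETCH_HOST_VA_END)
		return TARGET_HOST;	/* goes through the MMU to PCI/host */
	return TARGET_INVALID;		/* hole between the two windows */
}
```

With this split, a host pointer such as 0x10000000 passed through unchanged
classifies as TARGET_HBM, which is the collision described above.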

That's the crux of the problem and why we didn't use MR.
If that's not clear, I'll be happy to explain more.

Thanks,
Oded

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v3 00/14] Adding GAUDI NIC code to habanalabs driver
  2020-09-18 12:31                   ` Oded Gabbay
@ 2020-09-18 13:09                     ` Leon Romanovsky
  0 siblings, 0 replies; 83+ messages in thread
From: Leon Romanovsky @ 2020-09-18 13:09 UTC (permalink / raw)
  To: Oded Gabbay
  Cc: Gal Pressman, Jason Gunthorpe, Jakub Kicinski,
	Linux-Kernel@Vger. Kernel. Org, netdev, SW_Drivers,
	Greg Kroah-Hartman, David S. Miller, Andrew Lunn,
	Florian Fainelli, linux-rdma, izur, Olof Johansson

On Fri, Sep 18, 2020 at 03:31:51PM +0300, Oded Gabbay wrote:
> On Fri, Sep 18, 2020 at 3:19 PM Leon Romanovsky <leon@kernel.org> wrote:
> >
> > On Fri, Sep 18, 2020 at 03:07:19PM +0300, Oded Gabbay wrote:
> > > On Fri, Sep 18, 2020 at 3:03 PM Leon Romanovsky <leon@kernel.org> wrote:
> > > >
> > > > On Fri, Sep 18, 2020 at 02:56:09PM +0300, Oded Gabbay wrote:
> > > > > On Fri, Sep 18, 2020 at 2:52 PM Leon Romanovsky <leon@kernel.org> wrote:
> > > > > >
> > > > > > On Fri, Sep 18, 2020 at 02:36:10PM +0300, Gal Pressman wrote:
> > > > > > > On 17/09/2020 20:18, Jason Gunthorpe wrote:
> > > > > > > > On Tue, Sep 15, 2020 at 11:46:58PM +0300, Oded Gabbay wrote:
> > > > > > > >> infrastructure for communication between multiple accelerators. Same
> > > > > > > >> as Nvidia uses NVlink, we use RDMA that we have inside our ASIC.
> > > > > > > >> The RDMA implementation we did does NOT support some basic RDMA
> > > > > > > >> IBverbs (such as MR and PD) and therefore, we can't use the rdma-core
> > > > > > > >> library or to connect to the rdma infrastructure in the kernel.
> > > > > > > >
> > > > > > > > You can't create a parallel RDMA subsystem in netdev, or in misc, and
> > > > > > > > you can't add random device offloads as IOCTL to netdevs.
> > > > > > > >
> > > > > > > > RDMA is the proper home for all the networking offloads that don't fit
> > > > > > > > into netdev.
> > > > > > > >
> > > > > > > > EFA was able to fit into rdma-core/etc and it isn't even RoCE at
> > > > > > > > all. I'm sure this can too.
> > > > > > >
> > > > > > > Well, EFA wasn't welcomed to the RDMA subsystem with open arms ;), initially it
> > > > > > > was suggested to go through the vfio subsystem instead.
> > > > > > >
> > > > > > > I think this comes back to the discussion we had when EFA was upstreamed, which
> > > > > > > is what's the bar to get accepted to the RDMA subsystem.
> > > > > > > IIRC, what we eventually agreed on is having a userspace rdma-core provider and
> > > > > > > ibv_{ud,rc}_pingpong working (or just supporting one of the IB spec's QP types?).
> > > > > > >
> > > > > > > Does GAUDI fit these requirements? If not, should it be in a different subsystem
> > > > > > > or should we open the "what qualifies as an RDMA device" question again?
> > > > > >
> > > > > > I want to remind you that rdma-core requirement came to make sure that
> > > > > > anything exposed from the RDMA to the userspace is strict with proper
> > > > > > UAPI header hygiene.
> > > > > >
> > > > > > I doubt that Habana's ioctls are backed by anything like this.
> > > > > >
> > > > > > Thanks
> > > > >
> > > > > Why do you doubt that ? Have you looked at our code ?
> > > > > Our uapi and IOCTLs interface is based on drm subsystem uapi interface
> > > > > and it is very safe and protected.
> > > >
> > > > Yes, I looked and didn't find open-source users of your UAPI headers.
> > > > It is not related to being safe or protected but to the common request
> > > > to present userspace that relies on those exported interfaces.
> > > >
> > > > > Otherwise Greg would have never allowed me to go upstream in the first place.
> > > >
> > > > Nice, can we get a link?
> > > >
> > > > >
> > > > > We have a single function which is the entry point for all the IOCTLs
> > > > > of our drivers (only one IOCTL is RDMA related, all the others are
> > > > > compute related).
> > > > > That function is almost 1:1 copy of the function in drm.
> > > >
> > > > DRM has same rules as RDMA, no kernel code will be merged without seeing
> > > > open-source userspace.
> > > >
> > > > Thanks
> > > >
> > > > >
> > > > > Thanks,
> > > > > Oded
> > >
> > > So we do have an open-source library called hl-thunk, which uses our
> > > driver and indeed that was part of the requirement.
> > > It is similar to libdrm.
> > > Here is the link:
> > > https://github.com/HabanaAI/hl-thunk
> >
> > Are you kidding?
> >
> > This is mirror of some internal repository that looks like dumpster
> > with ChangeId, internal bug tracker numbers, not part of major OS
> > distributions.
> >
> > It is not an open-source library and shows very clearly why you chose
> > to upstream your driver through the drivers/misc/ tree.
> >
> > Thanks
>
> Adding Olof here.
>
> No, usually not.
> But are you kidding ?
> What did you exactly expect to find ? Is there an open-source project
> somewhere that encapsulates Deep-learning accelerators which I could
> connect to ?

I would expect a certain level of code quality, collaboration and review
that distros require for inclusion. It is not the case for the github
repo you presented.

> AFAIK, the only thing remotely relevant is CUDA and that is
> closed-source (strange to hear lectures about open-source from NVIDIA
> people here...)

Please check git log statistics to estimate Nvidia/Mellanox/Cumulus
contributions to the Linux kernel and the open-source. You will be
surprised.

>
> So we are trying to give to the community such an open source library,
> or at least an example. Hopefully one day, when more companies
> upstream their drivers for deep-learning accelerators we could do
> something like libdrm or rdma-core, but for now, it's just our driver.

AFAIR, your driver is not unique, HiSilicon tried to submit something
similar years ago (warpdrive) and they are not alone.

>
> I have been in this community since 2013 with AMD and then RedHat, and
> I come with good intentions and a desire to open source and upstream
> as much as I can. I don't think I deserve this kind of response.

There is no need to take it personally. It was you who posted a link
to the github repo. What did you expect?

>
> The bottom line is that we had this discussion with Greg and Olof and
> DRM people almost 2 years ago and if there was some open-source
> project in user-space or some subsystem in the kernel we could connect
> to, we would have done that instead of what we did, but the fact of
> the matter there isn't such thing. Olof tried and is trying to create
> a h/w accelerator subsystem but it still hasn't got up from the ground
> yet.

Maybe it is time to do it right.

>
> Oded

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v3 00/14] Adding GAUDI NIC code to habanalabs driver
  2020-09-18 13:02                   ` Oded Gabbay
@ 2020-09-18 13:26                     ` Jason Gunthorpe
  2020-09-18 13:49                       ` Oded Gabbay
  0 siblings, 1 reply; 83+ messages in thread
From: Jason Gunthorpe @ 2020-09-18 13:26 UTC (permalink / raw)
  To: Oded Gabbay
  Cc: izur, Gal Pressman, Jakub Kicinski,
	Linux-Kernel@Vger. Kernel. Org, netdev, SW_Drivers,
	Greg Kroah-Hartman, David S. Miller, Andrew Lunn,
	Florian Fainelli, linux-rdma

On Fri, Sep 18, 2020 at 04:02:24PM +0300, Oded Gabbay wrote:
 
> The problem with MR is that the API doesn't let us return a new VA. It
> forces us to use the original VA that the Host OS allocated.

If using the common MR API you'd have to assign a unique linear range
in the single device address map and record both the IOVA and the MMU
VA in the kernel struct.

Then when submitting work using that MR lkey the kernel will adjust
the work VA using the equation (WORK_VA - IOVA) + MMU_VA before
forwarding to HW.
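
The adjustment above can be sketched as follows. This is only an
illustration under assumptions: struct sketch_mr and sketch_mr_translate()
are hypothetical names, not rdma-core or habanalabs APIs, and the bounds
check is an added assumption about what a kernel would validate before
forwarding the address to HW.

```c
#include <stdint.h>

/* Hypothetical per-MR record kept by the kernel. */
struct sketch_mr {
	uint64_t iova;    /* address the user places in work requests */
	uint64_t mmu_va;  /* address the device MMU actually maps */
	uint64_t length;  /* size of the registered region in bytes */
};

/*
 * Apply (WORK_VA - IOVA) + MMU_VA after checking that the access
 * [work_va, work_va + len) lies inside the MR.
 * Returns 0 and stores the HW address, or -1 if out of range.
 */
static int sketch_mr_translate(const struct sketch_mr *mr, uint64_t work_va,
			       uint64_t len, uint64_t *hw_va)
{
	uint64_t off;

	if (work_va < mr->iova)
		return -1;
	off = work_va - mr->iova;
	if (off > mr->length || len > mr->length - off)
		return -1;

	*hw_va = off + mr->mmu_va;
	return 0;
}
```

The cost per work request is one subtraction, one addition and a range
check; whether that is acceptable on a given submission path is a separate
question from the arithmetic itself.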

EFA doesn't support rkeys, so they are not required to be emulated. It
would have to create rkeys using some guadidv_reg_mr_rkey()

It is important to understand that the usual way we support these
non-RDMA devices is to insist that they use SW to construct a minimal
standards based RDMA API, and then allow the device to have a 'dv' API
to access a faster, highly device specific, SW bypass path.

So for instance you might have some guadidv_post_work(qp) that doesn't
use lkeys and works directly on the MMU_VA. A guadidv_get_mmu_va(mr)
would return the required HW VA from the kernel.

Usually the higher level communication library (UCX, MPI, etc) forms
the dv primitives into something application usable.

> we do if that VA is in the range of our HBM addresses ? The device
> won't be able to distinguish between them. The transaction that is
> generated by an engine inside our device will go to the HBM instead of
> going to the PCI controller and then to the host.
> 
> That's the crux of the problem and why we didn't use MR.

No, the problem with the device is that it doesn't have a lkey/rkey,
so it is stuck with a single translation domain. RoCE compliant
devices are required to have multiple translation domains - each
lkey/rkey specifies a unique translation.

The MR concept is a region of process VA mapped into the device for
device access, and this device *clearly* has that.

Jason

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v3 00/14] Adding GAUDI NIC code to habanalabs driver
  2020-09-18 13:26                     ` Jason Gunthorpe
@ 2020-09-18 13:49                       ` Oded Gabbay
  2020-09-18 13:59                         ` Jason Gunthorpe
  0 siblings, 1 reply; 83+ messages in thread
From: Oded Gabbay @ 2020-09-18 13:49 UTC (permalink / raw)
  To: Jason Gunthorpe, Greg Kroah-Hartman
  Cc: izur, Gal Pressman, Jakub Kicinski,
	Linux-Kernel@Vger. Kernel. Org, netdev, SW_Drivers,
	David S. Miller, Andrew Lunn, Florian Fainelli, linux-rdma,
	Olof Johansson

On Fri, Sep 18, 2020 at 4:26 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
>
> On Fri, Sep 18, 2020 at 04:02:24PM +0300, Oded Gabbay wrote:
>
> > The problem with MR is that the API doesn't let us return a new VA. It
> > forces us to use the original VA that the Host OS allocated.
>
> If using the common MR API you'd have to assign a unique linear range
> in the single device address map and record both the IOVA and the MMU
> VA in the kernel struct.
>
> Then when submitting work using that MR lkey the kernel will adjust
> the work VA using the equation (WORK_VA - IOVA) + MMU_VA before
> forwarding to HW.
>
We can't do that. That will kill the performance. If for every
submission I need to modify the packet's contents, the throughput will
go downhill.
Also, submissions to our RDMA qmans are coupled with submissions to
our DMA/Compute QMANs. We can't separate those to different API calls.
That will also kill performance and in addition, will prevent us from
synchronizing all the engines.

I also have to say, it troubles me that you keep referring to our
device as an RDMA device. It is not an RDMA device. It is a
deep-learning accelerator which uses RDMA as a way to interconnect
multiple devices. We don't intend to replace General-Purpose RDMA
devices. We know we don't support that.
Therefore, I still fail to see why we need to support all the above...

Our work submission is not to just "send/receive packets". Sending
packets is part of a general recipe to do DMA, perform compute on data
and send/receive data. All together, in a synchronized fashion.

The way you try to force me to go is to separate that into different
functionality, as if I have different ASICs, which is very
counter-productive in terms of performance and simplicity. i.e. have
one method of submitting work to DMA/compute and another way to RDMA
ports.

I know this is how the kernel is structured now - subsystems for
devices that belong to a single domain (graphics, net, storage). But I
fear that you will soon see this paradigm doesn't work with new
devices in AI, which combine multiple domains into a single ASIC.

Greg, I would love to hear your opinion here. Am I totally wrong ? Is
treating a single ASIC that belongs to multiple domains as if it were
multiple ASICs a good thing ? Don't you think it will hurt the
performance ?

Oded

> EFA doesn't support rkeys, so they are not required to be emulated. It
> would have to create rkeys using some guadidv_reg_mr_rkey()
>
> It is important to understand that the usual way we support these
> non-RDMA devices is to insist that they use SW to construct a minimal
> standards based RDMA API, and then allow the device to have a 'dv' API
> to access a faster, highly device specific, SW bypass path.
>
> So for instance you might have some guadidv_post_work(qp) that doesn't
> use lkeys and works directly on the MMU_VA. A guadidv_get_mmu_va(mr)
> would return the required HW VA from the kernel.
>
> Usually the higher level communication library (UCX, MPI, etc) forms
> the dv primitives into something application usable.
>
> > we do if that VA is in the range of our HBM addresses ? The device
> > won't be able to distinguish between them. The transaction that is
> > generated by an engine inside our device will go to the HBM instead of
> > going to the PCI controller and then to the host.
> >
> > That's the crux of the problem and why we didn't use MR.
>
> No, the problem with the device is that it doesn't have a lkey/rkey,
> so it is stuck with a single translation domain. RoCE compliant
> devices are required to have multiple translation domains - each
> lkey/rkey specifies a unique translation.
>
> The MR concept is a region of process VA mapped into the device for
> device access, and this device *clearly* has that.
>
> Jason

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v3 00/14] Adding GAUDI NIC code to habanalabs driver
  2020-09-18 13:49                       ` Oded Gabbay
@ 2020-09-18 13:59                         ` Jason Gunthorpe
  2020-09-18 14:12                           ` Oded Gabbay
  0 siblings, 1 reply; 83+ messages in thread
From: Jason Gunthorpe @ 2020-09-18 13:59 UTC (permalink / raw)
  To: Oded Gabbay
  Cc: Greg Kroah-Hartman, izur, Gal Pressman, Jakub Kicinski,
	Linux-Kernel@Vger. Kernel. Org, netdev, SW_Drivers,
	David S. Miller, Andrew Lunn, Florian Fainelli, linux-rdma,
	Olof Johansson

On Fri, Sep 18, 2020 at 04:49:25PM +0300, Oded Gabbay wrote:
> On Fri, Sep 18, 2020 at 4:26 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> >
> > On Fri, Sep 18, 2020 at 04:02:24PM +0300, Oded Gabbay wrote:
> >
> > > The problem with MR is that the API doesn't let us return a new VA. It
> > > forces us to use the original VA that the Host OS allocated.
> >
> > If using the common MR API you'd have to assign a unique linear range
> > in the single device address map and record both the IOVA and the MMU
> > VA in the kernel struct.
> >
> > Then when submitting work using that MR lkey the kernel will adjust
> > the work VA using the equation (WORK_VA - IOVA) + MMU_VA before
> > forwarding to HW.
> >
> We can't do that. That will kill the performance. If for every
> submission I need to modify the packet's contents, the throughput will
> go downhill.

You clearly didn't read where I explained there is a fast path and
slow path expectation.

> Also, submissions to our RDMA qmans are coupled with submissions to
> our DMA/Compute QMANs. We can't separate those to different API calls.
> That will also kill performance and in addition, will prevent us from
> synchronizing all the engines.

Not sure I see why this is a problem. I already explained the fast
device specific path. 

As long as the kernel maintains proper security when it processes
submissions the driver can allow objects to cross between the two
domains.

Jason

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v3 00/14] Adding GAUDI NIC code to habanalabs driver
  2020-09-18 13:59                         ` Jason Gunthorpe
@ 2020-09-18 14:12                           ` Oded Gabbay
  2020-09-18 14:19                             ` Jason Gunthorpe
  0 siblings, 1 reply; 83+ messages in thread
From: Oded Gabbay @ 2020-09-18 14:12 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Greg Kroah-Hartman, izur, Gal Pressman, Jakub Kicinski,
	Linux-Kernel@Vger. Kernel. Org, netdev, SW_Drivers,
	David S. Miller, Andrew Lunn, Florian Fainelli, linux-rdma,
	Olof Johansson

On Fri, Sep 18, 2020 at 4:59 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
>
> On Fri, Sep 18, 2020 at 04:49:25PM +0300, Oded Gabbay wrote:
> > On Fri, Sep 18, 2020 at 4:26 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> > >
> > > On Fri, Sep 18, 2020 at 04:02:24PM +0300, Oded Gabbay wrote:
> > >
> > > > The problem with MR is that the API doesn't let us return a new VA. It
> > > > forces us to use the original VA that the Host OS allocated.
> > >
> > > If using the common MR API you'd have to assign a unique linear range
> > > in the single device address map and record both the IOVA and the MMU
> > > VA in the kernel struct.
> > >
> > > Then when submitting work using that MR lkey the kernel will adjust
> > > the work VA using the equation (WORK_VA - IOVA) + MMU_VA before
> > > forwarding to HW.
> > >
> > We can't do that. That will kill the performance. If for every
> > submission I need to modify the packet's contents, the throughput will
> > go downhill.
>
> You clearly didn't read where I explained there is a fast path and
> slow path expectation.
>
> > Also, submissions to our RDMA qmans are coupled with submissions to
> > our DMA/Compute QMANs. We can't separate those to different API calls.
> > That will also kill performance and in addition, will prevent us from
> > synchronizing all the engines.
>
> Not sure I see why this is a problem. I already explained the fast
> device specific path.
>
> As long as the kernel maintains proper security when it processes
> submissions the driver can allow objects to cross between the two
> domains.
Can you please explain what you mean by "two domains" ?
You mean the RDMA and compute domains ? Or something else ?

What I was trying to say is that I don't want the application to split
its submissions to different system calls.

Currently we perform submissions through the CS_IOCTL that is defined
in our driver. It is a single IOCTL which allows the user to submit
work to all queues, without regard to the underlying engine of each
queue.
If I need to split that to different system calls it will have major
implications. I don't even want to start thinking about all the
synchronization at the host (userspace) level that I will need to do.
That's what I meant by saying that you force me to treat my device as
if it were multiple devices. The whole point of our ASIC is to combine
multiple IPs on the same ASIC.

What will happen when we will add a third domain to our device (e.g.
storage, video decoding, encryption engine, whatever). Will I then
need to separate submissions to 3 different system calls ? In 3
different subsystems ? This doesn't scale. And I strongly say that it
will kill the performance of the device. Not because of the driver.
Because of the complications to the user-space.

Oded

>
> Jason

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v3 00/14] Adding GAUDI NIC code to habanalabs driver
  2020-09-18 14:12                           ` Oded Gabbay
@ 2020-09-18 14:19                             ` Jason Gunthorpe
  2020-09-18 14:45                               ` Oded Gabbay
  0 siblings, 1 reply; 83+ messages in thread
From: Jason Gunthorpe @ 2020-09-18 14:19 UTC (permalink / raw)
  To: Oded Gabbay
  Cc: Greg Kroah-Hartman, izur, Gal Pressman, Jakub Kicinski,
	Linux-Kernel@Vger. Kernel. Org, netdev, SW_Drivers,
	David S. Miller, Andrew Lunn, Florian Fainelli, linux-rdma,
	Olof Johansson

On Fri, Sep 18, 2020 at 05:12:04PM +0300, Oded Gabbay wrote:
> On Fri, Sep 18, 2020 at 4:59 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> >
> > On Fri, Sep 18, 2020 at 04:49:25PM +0300, Oded Gabbay wrote:
> > > On Fri, Sep 18, 2020 at 4:26 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> > > >
> > > > On Fri, Sep 18, 2020 at 04:02:24PM +0300, Oded Gabbay wrote:
> > > >
> > > > > The problem with MR is that the API doesn't let us return a new VA. It
> > > > > forces us to use the original VA that the Host OS allocated.
> > > >
> > > > If using the common MR API you'd have to assign a unique linear range
> > > > in the single device address map and record both the IOVA and the MMU
> > > > VA in the kernel struct.
> > > >
> > > > Then when submitting work using that MR lkey the kernel will adjust
> > > > the work VA using the equation (WORK_VA - IOVA) + MMU_VA before
> > > > forwarding to HW.
> > > >
> > > We can't do that. That will kill the performance. If for every
> > > submission I need to modify the packet's contents, the throughput will
> > > go downhill.
> >
> > You clearly didn't read where I explained there is a fast path and
> > slow path expectation.
> >
> > > Also, submissions to our RDMA qmans are coupled with submissions to
> > > our DMA/Compute QMANs. We can't separate those to different API calls.
> > > That will also kill performance and in addition, will prevent us from
> > > synchronizing all the engines.
> >
> > Not sure I see why this is a problem. I already explained the fast
> > device specific path.
> >
> > As long as the kernel maintains proper security when it processes
> > submissions the driver can allow objects to cross between the two
> > domains.
> Can you please explain what you mean by "two domains" ?
> You mean the RDMA and compute domains ? Or something else ?

Yes

> What I was trying to say is that I don't want the application to split
> its submissions to different system calls.

If you can manage the security then you can cross them. E.g., since the
RDMA PD would be created on top of the /dev/misc char dev then it is
fine for the /dev/misc char dev to access the RDMA objects as a 'dv
fast path'.

But now that you say everything is interconnected, I'm wondering,
without HW security how do you keep netdev isolated from userspace?

Can I issue commands to /dev/misc and write to kernel memory (does the
kernel put any pages into the single MMU?) or corrupt the netdev
driver operations in any way?

Jason

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v3 00/14] Adding GAUDI NIC code to habanalabs driver
  2020-09-18 14:19                             ` Jason Gunthorpe
@ 2020-09-18 14:45                               ` Oded Gabbay
  2020-09-18 15:07                                 ` Jason Gunthorpe
  0 siblings, 1 reply; 83+ messages in thread
From: Oded Gabbay @ 2020-09-18 14:45 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Greg Kroah-Hartman, izur, Gal Pressman, Jakub Kicinski,
	Linux-Kernel@Vger. Kernel. Org, netdev, SW_Drivers,
	David S. Miller, Andrew Lunn, Florian Fainelli, linux-rdma,
	Olof Johansson

On Fri, Sep 18, 2020 at 5:19 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
>
> On Fri, Sep 18, 2020 at 05:12:04PM +0300, Oded Gabbay wrote:
> > On Fri, Sep 18, 2020 at 4:59 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> > >
> > > On Fri, Sep 18, 2020 at 04:49:25PM +0300, Oded Gabbay wrote:
> > > > On Fri, Sep 18, 2020 at 4:26 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> > > > >
> > > > > On Fri, Sep 18, 2020 at 04:02:24PM +0300, Oded Gabbay wrote:
> > > > >
> > > > > > The problem with MR is that the API doesn't let us return a new VA. It
> > > > > > forces us to use the original VA that the Host OS allocated.
> > > > >
> > > > > If using the common MR API you'd have to assign a unique linear range
> > > > > in the single device address map and record both the IOVA and the MMU
> > > > > VA in the kernel struct.
> > > > >
> > > > > Then when submitting work using that MR lkey the kernel will adjust
> > > > > the work VA using the equation (WORK_VA - IOVA) + MMU_VA before
> > > > > forwarding to HW.
> > > > >
> > > > We can't do that. That will kill the performance. If for every
> > > > submission I need to modify the packet's contents, the throughput will
> > > > go downhill.
> > >
> > > You clearly didn't read where I explained there is a fast path and
> > > slow path expectation.
> > >
> > > > Also, submissions to our RDMA qmans are coupled with submissions to
> > > > our DMA/Compute QMANs. We can't separate those to different API calls.
> > > > That will also kill performance and in addition, will prevent us from
> > > > synchronizing all the engines.
> > >
> > > Not sure I see why this is a problem. I already explained the fast
> > > device specific path.
> > >
> > > As long as the kernel maintains proper security when it processes
> > > submissions the driver can allow objects to cross between the two
> > > domains.
> > Can you please explain what you mean by "two domains" ?
> > You mean the RDMA and compute domains ? Or something else ?
>
> Yes
>
> > What I was trying to say is that I don't want the application to split
> > its submissions to different system calls.
>
> If you can manage the security then you can cross them. E.g., since the
> RDMA PD would be created on top of the /dev/misc char dev then it is
> fine for the /dev/misc char dev to access the RDMA objects as a 'dv
> fast path'.
>
> But now that you say everything is interconnected, I'm wondering,
> without HW security how do you keep netdev isolated from userspace?
>
> Can I issue commands to /dev/misc and write to kernel memory (does the
> kernel put any pages into the single MMU?) or corrupt the netdev
> driver operations in any way?
>
> Jason

No, no, no. Please give me more credit :) btw, our kernel interface
was scrutinized when we upstreamed the driver and it was under review
by the Intel security team.

To explain our security mechanism will require some time. It is
detailed in the driver, but it is hard to understand without some
background.
I wonder where to start...

First of all, we support open, close, mmap and IOCTLs to
/dev/misc/hlX. We don't support read/write system calls.
A user never gets direct access to kernel memory. Only through
standard mmap. The only thing we allow to mmap is a command buffer
(which is used to submit work to certain DMA  queues on our device)
and to a memory region we use for "CQ" for the RDMA. That's it.

Any access by the device's engines to the host memory is done via our
device's MMU. Our MMU supports multiple ASIDs - Address Space IDs. The
kernel driver is assigned ASID 0, while the user is assigned ASID 1.
We can support up to 1024 ASIDs, but because we limit the user to have
a single application, we only use ASID 0 and 1.

The above means a user can't program an engine (DMA, NIC, compute) to
access memory he didn't first map into our device's MMU. The
mapping is done via one of our IOCTLs and the kernel driver makes sure
(using standard kernel internal APIs) the host memory truly belongs to
the user process. All those mappings are done using ASID 1.

If the driver needs to map kernel pages into the device's MMU, then
this is done using ASID 0. This is how we take care of separation
between kernel memory and user memory.

Each transaction that our engines create and that goes to the host
first passes through our MMU. The transaction comes with its ASID value.
According to that, the MMU knows which page tables to do the walk on.
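
A rough model of the ASID selection described above, assuming a flat list
of mappings per ASID instead of real page tables. All names below are
hypothetical and do not correspond to the driver's MMU code; the sketch
only shows that a lookup is confined to the table chosen by the
transaction's ASID.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_ASID_KERNEL 0u	/* kernel driver mappings */
#define SKETCH_ASID_USER   1u	/* the single user application */

/* One contiguous device-VA to host-PA mapping (stands in for page tables). */
struct sketch_mapping {
	uint64_t dev_va;
	uint64_t host_pa;
	uint64_t len;
};

struct sketch_asid_table {
	const struct sketch_mapping *maps;
	size_t count;
};

/*
 * Translate dev_va using only the table that belongs to 'asid'.
 * Returns 0 and stores the host address, or -1 if nothing is mapped.
 */
static int sketch_mmu_walk(const struct sketch_asid_table *tables,
			   size_t n_asids, unsigned int asid,
			   uint64_t dev_va, uint64_t *host_pa)
{
	size_t i;

	if (asid >= n_asids)
		return -1;

	for (i = 0; i < tables[asid].count; i++) {
		const struct sketch_mapping *m = &tables[asid].maps[i];

		if (dev_va >= m->dev_va && dev_va - m->dev_va < m->len) {
			*host_pa = m->host_pa + (dev_va - m->dev_va);
			return 0;
		}
	}
	return -1;
}
```

In this model a transaction tagged with SKETCH_ASID_USER can never resolve
an address that exists only in the SKETCH_ASID_KERNEL table, which is the
separation the paragraphs above rely on.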

Specifically regarding RDMA, the user prepares a WQE on the host
memory in an area which is mapped into our MMU using ASID 1. The user
uses the NIC control IOCTL to give the kernel driver the virtual base
address of the WQ and the driver programs it to the H/W. Then, the
user can submit the WQE by submitting a command buffer to the NIC
QMAN. The command buffer contains a message to the QMAN that tells it
to ring the doorbell of the relevant NIC port. The user can't do it
from userspace.

For regular Ethernet traffic, we don't have any IOCTLs of course. All
Ethernet operations are done via the standard networking subsystem
(sockets, etc.).

There are more details of course. I don't know how much you want me to
go deeper. If you have specific questions I'll be happy to answer.
Oded

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v3 00/14] Adding GAUDI NIC code to habanalabs driver
  2020-09-18 14:45                               ` Oded Gabbay
@ 2020-09-18 15:07                                 ` Jason Gunthorpe
  2020-09-18 15:15                                   ` Oded Gabbay
  0 siblings, 1 reply; 83+ messages in thread
From: Jason Gunthorpe @ 2020-09-18 15:07 UTC (permalink / raw)
  To: Oded Gabbay
  Cc: Greg Kroah-Hartman, izur, Gal Pressman, Jakub Kicinski,
	Linux-Kernel@Vger. Kernel. Org, netdev, SW_Drivers,
	David S. Miller, Andrew Lunn, Florian Fainelli, linux-rdma,
	Olof Johansson

On Fri, Sep 18, 2020 at 05:45:21PM +0300, Oded Gabbay wrote:

> Any access by the device's engines to the host memory is done via our
> device's MMU. Our MMU supports multiple ASIDs - Address Space IDs. The
> kernel driver is assigned ASID 0, while the user is assigned ASID 1.
> We can support up to 1024 ASIDs, but because we limit the user to have
> a single application, we only use ASID 0 and 1.

If the QP/WQ/etc is HW bound to an ASID then that binding is called a
PD and the ASID is acting in the PD role.

If the ASID is translating from on the wire IOVA to DMA PA, then it is
acting in the MR role as well.

Bundling those two things together is not as flexible as standards
based RDMA, but it is not as far away as you are making things out to
be.

Jason

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v3 00/14] Adding GAUDI NIC code to habanalabs driver
  2020-09-18 15:07                                 ` Jason Gunthorpe
@ 2020-09-18 15:15                                   ` Oded Gabbay
  2020-09-18 15:28                                     ` Jason Gunthorpe
  0 siblings, 1 reply; 83+ messages in thread
From: Oded Gabbay @ 2020-09-18 15:15 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Greg Kroah-Hartman, izur, Gal Pressman, Jakub Kicinski,
	Linux-Kernel@Vger. Kernel. Org, netdev, SW_Drivers,
	David S. Miller, Andrew Lunn, Florian Fainelli, linux-rdma,
	Olof Johansson

On Fri, Sep 18, 2020 at 6:07 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
>
> On Fri, Sep 18, 2020 at 05:45:21PM +0300, Oded Gabbay wrote:
>
> > Any access by the device's engines to the host memory is done via our
> > device's MMU. Our MMU supports multiple ASIDs - Address Space IDs. The
> > kernel driver is assigned ASID 0, while the user is assigned ASID 1.
> > We can support up to 1024 ASIDs, but because we limit the user to have
> > a single application, we only use ASID 0 and 1.
>
> If the QP/WQ/etc is HW bound to an ASID then that binding is called a
> PD and the ASID is acting in the PD role.
>
> If the ASID is translating from on the wire IOVA to DMA PA, then it is
> acting in the MR role as well.
>
> Bundling those two things together is not as flexible as standards
> based RDMA, but it is not as far away as you are making things out to
> be.
>
> Jason

But Jason, why do I need to use RDMA definitions in my common code ?
RDMA is such a small part of our ASIC. We also have an ASIC called
GOYA for inference, which is handled by the same driver, but doesn't
have RDMA ports at all. Why would I need to use RDMA definitions for
that ?

I'm sorry, but you won't be able to convince me here that I need to
"enslave" my entire code to RDMA, just because my ASIC "also" has some
RDMA ports.
By the same token, the GPU people tried and failed to say that my
device is a GPU. And I think the reasoning that we applied back then,
and Greg and Olof agreed with it, applies here as well.

I want to play along, but it has to be something that won't make my
entire device's driver into an RDMA driver. And it has to be something
that doesn't hurt performance.
All other things can and will be changed according to your inputs.

Thanks,
Oded

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v3 00/14] Adding GAUDI NIC code to habanalabs driver
  2020-09-18 15:15                                   ` Oded Gabbay
@ 2020-09-18 15:28                                     ` Jason Gunthorpe
  2020-09-21 11:22                                       ` Gal Pressman
  0 siblings, 1 reply; 83+ messages in thread
From: Jason Gunthorpe @ 2020-09-18 15:28 UTC (permalink / raw)
  To: Oded Gabbay
  Cc: Greg Kroah-Hartman, izur, Gal Pressman, Jakub Kicinski,
	Linux-Kernel@Vger. Kernel. Org, netdev, SW_Drivers,
	David S. Miller, Andrew Lunn, Florian Fainelli, linux-rdma,
	Olof Johansson

On Fri, Sep 18, 2020 at 06:15:52PM +0300, Oded Gabbay wrote:

> I'm sorry, but you won't be able to convince me here that I need to
> "enslave" my entire code to RDMA, just because my ASIC "also" has some
> RDMA ports.

You can't recreate common shared subsystems in a driver just because
you don't want to work with the subsystem.

I don't care what else the ASIC has. In Linux the netdev part is
exposed through netdev, the RDMA part through RDMA, the
totally-not-a-GPU part through drivers/misc.

It has always been this way. Chelsio didn't get to rebuild the SCSI
stack in their driver just because "storage is a small part of their
device"

Drivers are not allowed to re-implement I2C/SPI/etc without re-using
the common code for that just because "I2C is a small part of their
device"

Exposing to userspace the creation of RoCE QPs and their related
objects is unambiguously an RDMA subsystem task. I don't even know how
you think you can argue it is not. It is your company proudly claiming
the device has 100G RoCE ports in all the marketing literature, after
all.

It is too bad the device has a non-standards compliant implementation
of RoCE so this will be a bit hard for you. Oh well.

Jason

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v3 00/14] Adding GAUDI NIC code to habanalabs driver
  2020-09-18 12:19                 ` Leon Romanovsky
  2020-09-18 12:31                   ` Oded Gabbay
@ 2020-09-19  6:40                   ` Greg Kroah-Hartman
  2020-09-19  8:20                     ` Leon Romanovsky
  1 sibling, 1 reply; 83+ messages in thread
From: Greg Kroah-Hartman @ 2020-09-19  6:40 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Oded Gabbay, Gal Pressman, Jason Gunthorpe, Jakub Kicinski,
	Linux-Kernel@Vger. Kernel. Org, netdev, SW_Drivers,
	David S. Miller, Andrew Lunn, Florian Fainelli, linux-rdma

On Fri, Sep 18, 2020 at 03:19:05PM +0300, Leon Romanovsky wrote:
> > So we do have an open-source library called hl-thunk, which uses our
> > driver and indeed that was part of the requirement.
> > It is similar to libdrm.
> > Here is the link:
> > https://github.com/HabanaAI/hl-thunk
> 
> Are you kidding?
> 
> This is mirror of some internal repository that looks like dumpster
> with ChangeId, internal bug tracker numbers, not part of major OS
> distributions.
> 
> It is not open-source library and shows very clear why you chose
> to upstream your driver through driver/misc/ tree.

It is an open source library, as per the license and the code
availability.  What more is expected here?

No distro has to pick it up, that's not a requirement for kernel code,
we have many kernel helper programs that are not in distros.  Heck, udev
took a long time to get into distros, does that mean the kernel side of
that interface should never have been merged?

I don't understand your complaint here, it's not our place to judge the
code quality of userspace libraries, otherwise we would never get any
real-work done :)

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v3 00/14] Adding GAUDI NIC code to habanalabs driver
  2020-09-19  6:40                   ` Greg Kroah-Hartman
@ 2020-09-19  8:20                     ` Leon Romanovsky
  2020-09-19  8:30                       ` Greg Kroah-Hartman
  0 siblings, 1 reply; 83+ messages in thread
From: Leon Romanovsky @ 2020-09-19  8:20 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Oded Gabbay, Gal Pressman, Jason Gunthorpe, Jakub Kicinski,
	Linux-Kernel@Vger. Kernel. Org, netdev, SW_Drivers,
	David S. Miller, Andrew Lunn, Florian Fainelli, linux-rdma

On Sat, Sep 19, 2020 at 08:40:20AM +0200, Greg Kroah-Hartman wrote:
> On Fri, Sep 18, 2020 at 03:19:05PM +0300, Leon Romanovsky wrote:
> > > So we do have an open-source library called hl-thunk, which uses our
> > > driver and indeed that was part of the requirement.
> > > It is similar to libdrm.
> > > Here is the link:
> > > https://github.com/HabanaAI/hl-thunk
> >
> > Are you kidding?
> >
> > This is mirror of some internal repository that looks like dumpster
> > with ChangeId, internal bug tracker numbers, not part of major OS
> > distributions.
> >
> > It is not open-source library and shows very clear why you chose
> > to upstream your driver through driver/misc/ tree.
>
> It is an open source library, as per the license and the code
> availability.  What more is expected here?

So can I fork iproute2, add a bunch of new custom netlink UAPIs and expect
Dave to merge it after I throw it on github?

>
> No distro has to pick it up, that's not a requirement for kernel code,
> we have many kernel helper programs that are not in distros.  Heck, udev
> took a long time to get into distros, does that mean the kernel side of
> that interface should never have been merged?
>
> I don't understand your complaint here, it's not our place to judge the
> code quality of userspace libraries, otherwise we would never get any
> real-work done :)

My main complaint is that you can't imagine merging code into large
subsystems (netdev, RDMA, DRM? etc.) without being a civil open-source
citizen. It means using existing user-space libraries/tools and/or
providing new ones that will be usable by everyone.

In this case, we have some custom char device with a library that is not
usable by anyone else, and this is why drivers/misc/ is the right place.

While we are talking about real work, it is to our benefit to push
companies to invest in the ecosystem and not let them find an excuse
for not doing it.

Thanks

>
> thanks,
>
> greg k-h

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v3 00/14] Adding GAUDI NIC code to habanalabs driver
  2020-09-19  8:20                     ` Leon Romanovsky
@ 2020-09-19  8:30                       ` Greg Kroah-Hartman
  2020-09-19  8:58                         ` Leon Romanovsky
  2020-09-19 16:43                         ` Oded Gabbay
  0 siblings, 2 replies; 83+ messages in thread
From: Greg Kroah-Hartman @ 2020-09-19  8:30 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Oded Gabbay, Gal Pressman, Jason Gunthorpe, Jakub Kicinski,
	Linux-Kernel@Vger. Kernel. Org, netdev, SW_Drivers,
	David S. Miller, Andrew Lunn, Florian Fainelli, linux-rdma

On Sat, Sep 19, 2020 at 11:20:03AM +0300, Leon Romanovsky wrote:
> On Sat, Sep 19, 2020 at 08:40:20AM +0200, Greg Kroah-Hartman wrote:
> > On Fri, Sep 18, 2020 at 03:19:05PM +0300, Leon Romanovsky wrote:
> > > > So we do have an open-source library called hl-thunk, which uses our
> > > > driver and indeed that was part of the requirement.
> > > > It is similar to libdrm.
> > > > Here is the link:
> > > > https://github.com/HabanaAI/hl-thunk
> > >
> > > Are you kidding?
> > >
> > > This is mirror of some internal repository that looks like dumpster
> > > with ChangeId, internal bug tracker numbers, not part of major OS
> > > distributions.
> > >
> > > It is not open-source library and shows very clear why you chose
> > > to upstream your driver through driver/misc/ tree.
> >
> > It is an open source library, as per the license and the code
> > availability.  What more is expected here?
> 
> So can I fork iproute2, add bunch of new custom netlink UAPIs and expect
> Dave to merge it after I throw it on github?

Don't be silly, that's not the case here at all and you know that.

> > No distro has to pick it up, that's not a requirement for kernel code,
> > we have many kernel helper programs that are not in distros.  Heck, udev
> > took a long time to get into distros, does that mean the kernel side of
> > that interface should never have been merged?
> >
> > I don't understand your complaint here, it's not our place to judge the
> > code quality of userspace libraries, otherwise we would never get any
> > real-work done :)
> 
> My main complaint is that you can't imagine merging code into large
> subsystems (netdev, RDMA, DRM? e.t.c) without being civil open-source
> citizen. It means use of existing user-space libraries/tools and/or
> providing new ones that will be usable for everyone.

Agreed.

> In this case, we have some custom char device with library that is not
> usable for anyone else and this is why drivers/misc/ is right place.

Also agreed.

> While we are talking about real-work, it is our benefit to push companies
> to make investment into ecosystem and not letting them to find an excuse
> for not doing it.

So why are you complaining about a stand-alone driver that does not have
any shared subsystem's userspace code to control that driver?

Yes, when integrating into other subsystems (i.e. networking and rdma),
they should use those common subsystems interfaces, no one is arguing
that at all.

totally lost,

greg k-h

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v3 00/14] Adding GAUDI NIC code to habanalabs driver
  2020-09-19  8:30                       ` Greg Kroah-Hartman
@ 2020-09-19  8:58                         ` Leon Romanovsky
  2020-09-19 16:43                         ` Oded Gabbay
  1 sibling, 0 replies; 83+ messages in thread
From: Leon Romanovsky @ 2020-09-19  8:58 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Oded Gabbay, Gal Pressman, Jason Gunthorpe, Jakub Kicinski,
	Linux-Kernel@Vger. Kernel. Org, netdev, SW_Drivers,
	David S. Miller, Andrew Lunn, Florian Fainelli, linux-rdma

On Sat, Sep 19, 2020 at 10:30:12AM +0200, Greg Kroah-Hartman wrote:
> On Sat, Sep 19, 2020 at 11:20:03AM +0300, Leon Romanovsky wrote:
> > On Sat, Sep 19, 2020 at 08:40:20AM +0200, Greg Kroah-Hartman wrote:
> > > On Fri, Sep 18, 2020 at 03:19:05PM +0300, Leon Romanovsky wrote:
> > > > > So we do have an open-source library called hl-thunk, which uses our
> > > > > driver and indeed that was part of the requirement.
> > > > > It is similar to libdrm.
> > > > > Here is the link:
> > > > > https://github.com/HabanaAI/hl-thunk
> > > >
> > > > Are you kidding?
> > > >
> > > > This is mirror of some internal repository that looks like dumpster
> > > > with ChangeId, internal bug tracker numbers, not part of major OS
> > > > distributions.
> > > >
> > > > It is not open-source library and shows very clear why you chose
> > > > to upstream your driver through driver/misc/ tree.
> > >
> > > It is an open source library, as per the license and the code
> > > availability.  What more is expected here?
> >
> > So can I fork iproute2, add bunch of new custom netlink UAPIs and expect
> > Dave to merge it after I throw it on github?
>
> Don't be silly, that's not the case here at all and you know that.

It was a far-fetched example.

>
> > > No distro has to pick it up, that's not a requirement for kernel code,
> > > we have many kernel helper programs that are not in distros.  Heck, udev
> > > took a long time to get into distros, does that mean the kernel side of
> > > that interface should never have been merged?
> > >
> > > I don't understand your complaint here, it's not our place to judge the
> > > code quality of userspace libraries, otherwise we would never get any
> > > real-work done :)
> >
> > My main complaint is that you can't imagine merging code into large
> > subsystems (netdev, RDMA, DRM? e.t.c) without being civil open-source
> > citizen. It means use of existing user-space libraries/tools and/or
> > providing new ones that will be usable for everyone.
>
> Agreed.
>
> > In this case, we have some custom char device with library that is not
> > usable for anyone else and this is why drivers/misc/ is right place.
>
> Also agreed.
>
> > While we are talking about real-work, it is our benefit to push companies
> > to make investment into ecosystem and not letting them to find an excuse
> > for not doing it.
>
> So why are you complaining about a stand-alone driver that does not have
> any shared subsystems's userspace code to control that driver?

I didn't. Everything started when I explained to Gal why the RDMA
subsystem requires an rdma-core counterpart for any UAPI code.
https://lore.kernel.org/linux-rdma/CAFCwf12B4vCCwmfA7+VTUYUgJ9EHAtvg6F0bMYnsSCUBST+aWA@mail.gmail.com/T/#m17d52d61adadf54c12bfecf1af5db40f5d829ac3

And I expressed my view on the quality of the library that was presented
as an open-source example.
https://lore.kernel.org/linux-rdma/CAFCwf12B4vCCwmfA7+VTUYUgJ9EHAtvg6F0bMYnsSCUBST+aWA@mail.gmail.com/T/#m9059c5a9405ba932d9ffb731195a43b27443d265

>
> Yes, when integrating into other subsystems (i.e. networking and rdma),
> they should use those common subsystems interfaces, no one is arguing
> that at all.
>
> totally lost,

And here comes my request to do it right
https://lore.kernel.org/linux-rdma/CAFCwf12B4vCCwmfA7+VTUYUgJ9EHAtvg6F0bMYnsSCUBST+aWA@mail.gmail.com/T/#ma1fa6fe63666f630674eb668f1c00e6a672db85b

All that I asked of Oded is to do the UAPI/libraries right, while all the
responses can be summarized in one sentence: "it is too hard, we don't
want to do it."

Thanks

>
> greg k-h

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v3 00/14] Adding GAUDI NIC code to habanalabs driver
  2020-09-19  8:30                       ` Greg Kroah-Hartman
  2020-09-19  8:58                         ` Leon Romanovsky
@ 2020-09-19 16:43                         ` Oded Gabbay
  2020-09-19 17:27                           ` Greg Kroah-Hartman
  2020-09-19 18:49                           ` Andrew Lunn
  1 sibling, 2 replies; 83+ messages in thread
From: Oded Gabbay @ 2020-09-19 16:43 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Leon Romanovsky, Gal Pressman, Jason Gunthorpe, Jakub Kicinski,
	Linux-Kernel@Vger. Kernel. Org, netdev, SW_Drivers,
	David S. Miller, Andrew Lunn, Florian Fainelli, linux-rdma

On Sat, Sep 19, 2020 at 11:30 AM Greg Kroah-Hartman
<gregkh@linuxfoundation.org> wrote:
>
> On Sat, Sep 19, 2020 at 11:20:03AM +0300, Leon Romanovsky wrote:
> > On Sat, Sep 19, 2020 at 08:40:20AM +0200, Greg Kroah-Hartman wrote:
> > > On Fri, Sep 18, 2020 at 03:19:05PM +0300, Leon Romanovsky wrote:
> > > > > So we do have an open-source library called hl-thunk, which uses our
> > > > > driver and indeed that was part of the requirement.
> > > > > It is similar to libdrm.
> > > > > Here is the link:
> > > > > https://github.com/HabanaAI/hl-thunk
> > > >
> > > > Are you kidding?
> > > >
> > > > This is mirror of some internal repository that looks like dumpster
> > > > with ChangeId, internal bug tracker numbers, not part of major OS
> > > > distributions.
> > > >
> > > > It is not open-source library and shows very clear why you chose
> > > > to upstream your driver through driver/misc/ tree.
> > >
> > > It is an open source library, as per the license and the code
> > > availability.  What more is expected here?
> >
> > So can I fork iproute2, add bunch of new custom netlink UAPIs and expect
> > Dave to merge it after I throw it on github?
>
> Don't be silly, that's not the case here at all and you know that.
>
> > > No distro has to pick it up, that's not a requirement for kernel code,
> > > we have many kernel helper programs that are not in distros.  Heck, udev
> > > took a long time to get into distros, does that mean the kernel side of
> > > that interface should never have been merged?
> > >
> > > I don't understand your complaint here, it's not our place to judge the
> > > code quality of userspace libraries, otherwise we would never get any
> > > real-work done :)
> >
> > My main complaint is that you can't imagine merging code into large
> > subsystems (netdev, RDMA, DRM? e.t.c) without being civil open-source
> > citizen. It means use of existing user-space libraries/tools and/or
> > providing new ones that will be usable for everyone.
>
> Agreed.
>
> > In this case, we have some custom char device with library that is not
> > usable for anyone else and this is why drivers/misc/ is right place.
>
> Also agreed.
>
> > While we are talking about real-work, it is our benefit to push companies
> > to make investment into ecosystem and not letting them to find an excuse
> > for not doing it.
>
> So why are you complaining about a stand-alone driver that does not have
> any shared subsystems's userspace code to control that driver?
>
> Yes, when integrating into other subsystems (i.e. networking and rdma),
> they should use those common subsystems interfaces, no one is arguing
> that at all.
Hi Greg,
It's probably heresy, but why do I need to integrate into the RDMA subsystem ?
I understand your reasoning about networking (Ethernet) as the driver
connects to the kernel networking stack (netdev), but with RDMA the
driver doesn't use or connect to anything in that stack. If I were to
support IBverbs and declare that I support it, then of course I would
need to integrate to the RDMA subsystem and add my backend to
rdma-core.
But we don't do that so why am I being forced to support IBverbs ?
Forcing GAUDI to use the RDMA stack and IBverbs is like swatting flies
with a sledgehammer.
I do hope that in future devices we will support it natively and of
course then we will integrate as requested, but for GAUDI it is just a
huge overkill IMHO.

Thanks,
Oded
>
> totally lost,
>
> greg k-h

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v3 00/14] Adding GAUDI NIC code to habanalabs driver
  2020-09-19 16:43                         ` Oded Gabbay
@ 2020-09-19 17:27                           ` Greg Kroah-Hartman
  2020-09-19 19:22                             ` Jason Gunthorpe
  2020-09-19 18:49                           ` Andrew Lunn
  1 sibling, 1 reply; 83+ messages in thread
From: Greg Kroah-Hartman @ 2020-09-19 17:27 UTC (permalink / raw)
  To: Oded Gabbay
  Cc: Leon Romanovsky, Gal Pressman, Jason Gunthorpe, Jakub Kicinski,
	Linux-Kernel@Vger. Kernel. Org, netdev, SW_Drivers,
	David S. Miller, Andrew Lunn, Florian Fainelli, linux-rdma

On Sat, Sep 19, 2020 at 07:43:28PM +0300, Oded Gabbay wrote:
> On Sat, Sep 19, 2020 at 11:30 AM Greg Kroah-Hartman
> <gregkh@linuxfoundation.org> wrote:
> >
> > On Sat, Sep 19, 2020 at 11:20:03AM +0300, Leon Romanovsky wrote:
> > > On Sat, Sep 19, 2020 at 08:40:20AM +0200, Greg Kroah-Hartman wrote:
> > > > On Fri, Sep 18, 2020 at 03:19:05PM +0300, Leon Romanovsky wrote:
> > > > > > So we do have an open-source library called hl-thunk, which uses our
> > > > > > driver and indeed that was part of the requirement.
> > > > > > It is similar to libdrm.
> > > > > > Here is the link:
> > > > > > https://github.com/HabanaAI/hl-thunk
> > > > >
> > > > > Are you kidding?
> > > > >
> > > > > This is mirror of some internal repository that looks like dumpster
> > > > > with ChangeId, internal bug tracker numbers, not part of major OS
> > > > > distributions.
> > > > >
> > > > > It is not open-source library and shows very clear why you chose
> > > > > to upstream your driver through driver/misc/ tree.
> > > >
> > > > It is an open source library, as per the license and the code
> > > > availability.  What more is expected here?
> > >
> > > So can I fork iproute2, add bunch of new custom netlink UAPIs and expect
> > > Dave to merge it after I throw it on github?
> >
> > Don't be silly, that's not the case here at all and you know that.
> >
> > > > No distro has to pick it up, that's not a requirement for kernel code,
> > > > we have many kernel helper programs that are not in distros.  Heck, udev
> > > > took a long time to get into distros, does that mean the kernel side of
> > > > that interface should never have been merged?
> > > >
> > > > I don't understand your complaint here, it's not our place to judge the
> > > > code quality of userspace libraries, otherwise we would never get any
> > > > real-work done :)
> > >
> > > My main complaint is that you can't imagine merging code into large
> > > subsystems (netdev, RDMA, DRM? e.t.c) without being civil open-source
> > > citizen. It means use of existing user-space libraries/tools and/or
> > > providing new ones that will be usable for everyone.
> >
> > Agreed.
> >
> > > In this case, we have some custom char device with library that is not
> > > usable for anyone else and this is why drivers/misc/ is right place.
> >
> > Also agreed.
> >
> > > While we are talking about real-work, it is our benefit to push companies
> > > to make investment into ecosystem and not letting them to find an excuse
> > > for not doing it.
> >
> > So why are you complaining about a stand-alone driver that does not have
> > any shared subsystems's userspace code to control that driver?
> >
> > Yes, when integrating into other subsystems (i.e. networking and rdma),
> > they should use those common subsystems interfaces, no one is arguing
> > that at all.
> Hi Greg,
> It's probably heresy, but why do I need to integrate into the RDMA subsystem ?
> I understand your reasoning about networking (Ethernet) as the driver
> connects to the kernel networking stack (netdev), but with RDMA the
> driver doesn't use or connect to anything in that stack. If I were to
> support IBverbs and declare that I support it, then of course I would
> need to integrate to the RDMA subsystem and add my backend to
> rdma-core.

IBverbs are horrid and I would not wish them on anyone.  Seriously.

> But we don't do that so why am I being forced to support IBverbs ?

You shouldn't.

> Forcing GAUDI to use the RDMA stack and IBverbs is like swatting flies
> with a sledgehammer.
> I do hope that in future devices we will support it natively and of
> course then we will integrate as requested, but for GAUDI it is just a
> huge overkill IMHO.

I think the general rdma apis are the key here, not the userspace api.

Note, I do not know exactly what they are, but no, IBverbs are not ok.

Ick.

greg k-h

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v3 00/14] Adding GAUDI NIC code to habanalabs driver
  2020-09-19 16:43                         ` Oded Gabbay
  2020-09-19 17:27                           ` Greg Kroah-Hartman
@ 2020-09-19 18:49                           ` Andrew Lunn
  1 sibling, 0 replies; 83+ messages in thread
From: Andrew Lunn @ 2020-09-19 18:49 UTC (permalink / raw)
  To: Oded Gabbay
  Cc: Greg Kroah-Hartman, Leon Romanovsky, Gal Pressman,
	Jason Gunthorpe, Jakub Kicinski, Linux-Kernel@Vger. Kernel. Org,
	netdev, SW_Drivers, David S. Miller, Florian Fainelli,
	linux-rdma

On Sat, Sep 19, 2020 at 07:43:28PM +0300, Oded Gabbay wrote:
> It's probably heresy, but why do I need to integrate into the RDMA subsystem ?

Hi Oded

I don't know the RDMA subsystem at all. So I will give a more generic
answer. Are you reinventing things which a subsystem core already has?
The subsystem core will be well tested, since lots of devices use
it. Because of this, subsystem cores generally have a lower bug count
per line of code than driver code. Using core code means drivers are
smaller, and smaller code has fewer bugs by definition.

We as maintainers have to assume you are going to abandon the driver
at some point, while the hardware still exists, and leave the
community to maintain it. So a smaller driver, which makes heavy use
of the core is much easier to maintain.

By making use of core code, you also get freebies. Somebody adds new
functionality to the core, your driver automatically gets it.

Look at this from the opposite perspective. Say every driver
implemented their own TCP/IP stack? Or DMA engine? SPI infrastructure?
How big a nightmare would it be to maintain?

In your case, some parts of your hardware look a bit like RDMA? So you
ideally want to use the core code from the RDMA subsystem. Maybe you
just need some of the lower layers? Maybe you need to refactor some of
the RDMA core to make it a library from which you can pick and choose
the bits useful to you? What you really want to avoid is re-implementing
stuff in your driver which is already in the core.

      Andrew

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v3 00/14] Adding GAUDI NIC code to habanalabs driver
  2020-09-19 17:27                           ` Greg Kroah-Hartman
@ 2020-09-19 19:22                             ` Jason Gunthorpe
  2020-09-20  8:47                               ` Greg Kroah-Hartman
  0 siblings, 1 reply; 83+ messages in thread
From: Jason Gunthorpe @ 2020-09-19 19:22 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Oded Gabbay, Leon Romanovsky, Gal Pressman, Jakub Kicinski,
	Linux-Kernel@Vger. Kernel. Org, netdev, SW_Drivers,
	David S. Miller, Andrew Lunn, Florian Fainelli, linux-rdma

On Sat, Sep 19, 2020 at 07:27:30PM +0200, Greg Kroah-Hartman wrote:
> > It's probably heresy, but why do I need to integrate into the RDMA subsystem ?
> > I understand your reasoning about networking (Ethernet) as the driver
> > connects to the kernel networking stack (netdev), but with RDMA the
> > driver doesn't use or connect to anything in that stack. If I were to
> > support IBverbs and declare that I support it, then of course I would
> > need to integrate to the RDMA subsystem and add my backend to
> > rdma-core.
> 
> IBverbs are horrid and I would not wish them on anyone.  Seriously.

I'm curious what drives this opinion? Did you have it since you
reviewed the initial submission all those years ago?

> I think the general rdma apis are the key here, not the userspace api.

Are you proposing that habana should have uAPI in drivers/misc and
present a standard rdma-core userspace for it? This is the only
userspace programming interface for RoCE HW. I think that would be
much more work.

If not, what open source userspace are you going to ask them to
present to merge the kernel side into misc?

> Note, I do not know exactly what they are, but no, IBverbs are not ok.

Should we stop merging new drivers and abandon the RDMA subsystem? Is
there something you'd like to see fixed?

Don't really understand your position, sorry.

Jason

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v3 00/14] Adding GAUDI NIC code to habanalabs driver
  2020-09-19 19:22                             ` Jason Gunthorpe
@ 2020-09-20  8:47                               ` Greg Kroah-Hartman
  2020-09-20 19:05                                 ` Oded Gabbay
  2020-09-21 11:52                                 ` Jason Gunthorpe
  0 siblings, 2 replies; 83+ messages in thread
From: Greg Kroah-Hartman @ 2020-09-20  8:47 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Oded Gabbay, Leon Romanovsky, Gal Pressman, Jakub Kicinski,
	Linux-Kernel@Vger. Kernel. Org, netdev, SW_Drivers,
	David S. Miller, Andrew Lunn, Florian Fainelli, linux-rdma

On Sat, Sep 19, 2020 at 04:22:35PM -0300, Jason Gunthorpe wrote:
> On Sat, Sep 19, 2020 at 07:27:30PM +0200, Greg Kroah-Hartman wrote:
> > > It's probably heresy, but why do I need to integrate into the RDMA subsystem ?
> > > I understand your reasoning about networking (Ethernet) as the driver
> > > connects to the kernel networking stack (netdev), but with RDMA the
> > > driver doesn't use or connect to anything in that stack. If I were to
> > > support IBverbs and declare that I support it, then of course I would
> > > need to integrate to the RDMA subsystem and add my backend to
> > > rdma-core.
> > 
> > IBverbs are horrid and I would not wish them on anyone.  Seriously.
> 
> I'm curious what drives this opinion? Did you have it since you
> reviewed the initial submission all those years ago?

As I learned more about that interface, yes, I like it less and less :)

But that's the userspace api you all are stuck with, for various
reasons, my opinion doesn't matter here.

> > I think the general rdma apis are the key here, not the userspace api.
> 
> Are you proposing that habana should have uAPI in drivers/misc and
> present a standard rdma-core userspace for it? This is the only
> userspace programming interface for RoCE HW. I think that would be
> much more work.
> 
> If not, what open source userspace are you going to ask them to
> present to merge the kernel side into misc?

I don't think that they have a userspace api to their rdma feature from
what I understand, but I could be totally wrong as I do not know their
hardware at all, so I'll let them answer this question.

> > Note, I do not know exactly what they are, but no, IBverbs are not ok.
> 
> Should we stop merging new drivers and abandon the RDMA subsystem? Is
> there something you'd like to see fixed?
> 
> Don't really understand your position, sorry.

For anything that _has_ to have a userspace RDMA interface, sure ibverbs
are the one we are stuck with, but I didn't think that was the issue
here at all, which is why I wrote the above comments.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v3 00/14] Adding GAUDI NIC code to habanalabs driver
  2020-09-16 12:00                 ` Greg Kroah-Hartman
@ 2020-09-20 16:45                   ` Daniel Vetter
  0 siblings, 0 replies; 83+ messages in thread
From: Daniel Vetter @ 2020-09-20 16:45 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Oded Gabbay, David Miller, Linux-Kernel@Vger. Kernel. Org,
	netdev, SW_Drivers, Jakub Kicinski, Andrew Lunn,
	Florian Fainelli

On Wed, Sep 16, 2020 at 02:00:54PM +0200, Greg Kroah-Hartman wrote:
> On Wed, Sep 16, 2020 at 11:47:58AM +0300, Oded Gabbay wrote:
> > On Wed, Sep 16, 2020 at 11:21 AM Greg Kroah-Hartman
> > <gregkh@linuxfoundation.org> wrote:
> > >
> > > On Wed, Sep 16, 2020 at 11:02:39AM +0300, Oded Gabbay wrote:
> > > > On Wed, Sep 16, 2020 at 10:41 AM Greg Kroah-Hartman
> > > > <gregkh@linuxfoundation.org> wrote:
> > > > >
> > > > > On Wed, Sep 16, 2020 at 09:36:23AM +0300, Oded Gabbay wrote:
> > > > > > On Wed, Sep 16, 2020 at 9:25 AM Greg Kroah-Hartman
> > > > > > <gregkh@linuxfoundation.org> wrote:
> > > > > > >
> > > > > > > On Tue, Sep 15, 2020 at 11:49:12PM +0300, Oded Gabbay wrote:
> > > > > > > > On Tue, Sep 15, 2020 at 11:42 PM David Miller <davem@davemloft.net> wrote:
> > > > > > > > >
> > > > > > > > > From: Oded Gabbay <oded.gabbay@gmail.com>
> > > > > > > > > Date: Tue, 15 Sep 2020 20:10:08 +0300
> > > > > > > > >
> > > > > > > > > > This is the second version of the patch-set to upstream the GAUDI NIC code
> > > > > > > > > > into the habanalabs driver.
> > > > > > > > > >
> > > > > > > > > > The only modification from v2 is in the ethtool patch (patch 12). Details
> > > > > > > > > > are in that patch's commit message.
> > > > > > > > > >
> > > > > > > > > > Link to v2 cover letter:
> > > > > > > > > > https://lkml.org/lkml/2020/9/12/201
> > > > > > > > >
> > > > > > > > > I agree with Jakub, this driver definitely can't go-in as it is currently
> > > > > > > > > structured and designed.
> > > > > > > > Why is that ?
> > > > > > > > Can you please point to the things that bother you or not working correctly?
> > > > > > > > I can't really fix the driver if I don't know what's wrong.
> > > > > > > >
> > > > > > > > In addition, please read my reply to Jakub with the explanation of why
> > > > > > > > we designed this driver as is.
> > > > > > > >
> > > > > > > > > > And because of the RDMA'ness of it, the RDMA
> > > > > > > > > folks have to be CC:'d and have a chance to review this.
> > > > > > > > As I said to Jakub, the driver doesn't use the RDMA infrastructure in
> > > > > > > > the kernel and we can't connect to it due to the lack of H/W support
> > > > > > > > we have
> > > > > > > > Therefore, I don't see why we need to CC linux-rdma.
> > > > > > > > I understood why Greg asked me to CC you because we do connect to the
> > > > > > > > netdev and standard eth infrastructure, but regarding the RDMA, it's
> > > > > > > > not really the same.
> > > > > > >
> > > > > > > Ok, to do this "right" it needs to be split up into separate drivers,
> > > > > > > hopefully using the "virtual bus" code that some day Intel will resubmit
> > > > > > > again that will solve this issue.
> > > > > > Hi Greg,
> > > > > > Can I suggest an alternative for the short/medium term ?
> > > > > >
> > > > > > In an earlier email, Jakub said:
> > > > > > "Is it not possible to move the files and still build them into a single
> > > > > > module?"
> > > > > >
> > > > > > I thought maybe that's a good way to progress here ?
> > > > >
> > > > > Cross-directory builds of a single module are crazy.  Yes, they work,
> > > > > but really, that's a mess, and would never suggest doing that.
> > > > >
> > > > > > First, split the content to Ethernet and RDMA.
> > > > > > Then move the Ethernet part to drivers/net but build it as part of
> > > > > > habanalabs.ko.
> > > > > > Regarding the RDMA code, upstream/review it in a different patch-set
> > > > > > (maybe they will want me to put the files elsewhere).
> > > > > >
> > > > > > What do you think ?
> > > > >
> > > > > I think you are asking for more work there than just splitting out into
> > > > > separate modules :)
> > > > >
> > > > > thanks,
> > > > >
> > > > > greg k-h
> > > > Hi Greg,
> > > >
> > > > If cross-directory building is out of the question, what about
> > > > splitting into separate modules ? And use cross-module notifiers/calls
> > > > ? I did that with amdkfd and amdgpu/radeon a couple of years back. It
> > > > worked (that's the best thing I can say about it).
> > >
> > > That's fine with me.
> > >
> > > > The main problem with this "virtual bus" thing is that I'm not
> > > > familiar with it at all and from my experience I imagine it would take
> > > > a considerable time and effort to upstream this infrastructure work.
> > >
> > > It shouldn't be taking that long, but for some unknown reason, the
> > > original author of that code is sitting on it and not resending it.  Go
> > > poke them through internal Intel channels to find out what the problem
> > > is, as I have no clue why a 200-300 line bus module is taking so long to
> > > get "right" :(
> > >
> > > I'm _ALMOST_ at the point where I would just do that work myself, but
> > > due to my current status with Intel, I'll let them do it as I have
> > > enough other things on my plate...
> > >
> > > > This could delay the NIC code for a couple of years, which by then
> > > > this won't be relevant at all.
> > >
> > > Why wouldn't this code be relevant in a year?  It's going to be 2+ years
> > > before any of this shows up in an "enterprise distro" based on their
> > > release cycles anyway :)
> > >
> > > thanks,
> > >
> > > greg k-h
> > 
> > Hi Greg,
> > ok, I'll take a look. Do you happen to have the name of the patch-set / author ?
> 
> Here's at least one copy:
> 	https://lore.kernel.org/linux-rdma/20200520070227.3392100-2-jeffrey.t.kirsher@intel.com/
> 
> there might have been a newer one, can't remember, sorry.

Maybe I'm missing something or maybe the in-tree code we have already
should be refactored to use more buses and drivers, but
drivers/base/component.c is made for this. We use this to glue all kinds
of things across all kinds of subsystems already.

Of course it really should be only used for one-off problems, as soon as
you have a standard interface/interaction there should be some kind of
standard lookup way to get at your thing (and the driver behind it), e.g.
in drivers/gpu we're now building up drm_bridge and trying to get away
from component.c for these things.

Cheers, Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v3 00/14] Adding GAUDI NIC code to habanalabs driver
  2020-09-20  8:47                               ` Greg Kroah-Hartman
@ 2020-09-20 19:05                                 ` Oded Gabbay
  2020-09-21 10:39                                   ` Leon Romanovsky
  2020-09-21 11:52                                 ` Jason Gunthorpe
  1 sibling, 1 reply; 83+ messages in thread
From: Oded Gabbay @ 2020-09-20 19:05 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Jason Gunthorpe, Leon Romanovsky, Gal Pressman, Jakub Kicinski,
	Linux-Kernel@Vger. Kernel. Org, netdev, SW_Drivers,
	David S. Miller, Andrew Lunn, Florian Fainelli, linux-rdma, izur

On Sun, Sep 20, 2020 at 11:47 AM Greg Kroah-Hartman
<gregkh@linuxfoundation.org> wrote:
>
> On Sat, Sep 19, 2020 at 04:22:35PM -0300, Jason Gunthorpe wrote:
> > On Sat, Sep 19, 2020 at 07:27:30PM +0200, Greg Kroah-Hartman wrote:
> > > > It's probably heresy, but why do I need to integrate into the RDMA subsystem ?
> > > > I understand your reasoning about networking (Ethernet) as the driver
> > > > connects to the kernel networking stack (netdev), but with RDMA the
> > > > driver doesn't use or connect to anything in that stack. If I were to
> > > > support IBverbs and declare that I support it, then of course I would
> > > > need to integrate to the RDMA subsystem and add my backend to
> > > > rdma-core.
> > >
> > > IBverbs are horrid and I would not wish them on anyone.  Seriously.
> >
> > I'm curious what drives this opinion? Did you have it since you
> > reviewed the initial submission all those years ago?
>
> As I learned more about that interface, yes, I like it less and less :)
>
> But that's the userspace api you all are stuck with, for various
> reasons, my opinion doesn't matter here.
>
> > > I think the general rdma apis are the key here, not the userspace api.
> >
> > Are you proposing that habana should have uAPI in drivers/misc and
> > present a standard rdma-core userspace for it? This is the only
> > userspace programming interface for RoCE HW. I think that would be
> > much more work.
> >
> > If not, what open source userspace are you going to ask them to
> > present to merge the kernel side into misc?
>
> I don't think that they have a userspace api to their rdma feature from
> what I understand, but I could be totally wrong as I do not know their
> hardware at all, so I'll let them answer this question.

Hi Greg,
We do expose a new IOCTL to enable the user to configure connections
between multiple GAUDI devices.

Having said that, we restrict this IOCTL to be used only by the same
user who is doing the compute on our device, as opposed to a real RDMA
device where any number of applications can send and receive.
In addition, this IOCTL limits the user to connect ONLY to another
GAUDI device and not to a 3rd party RDMA device.

It is true that GAUDI supports RDMA data movement but the data
movement is NOT done by the user. It is done by our compute engines.
i.e. the compute engines perform "send" and "receive" without going
to the host (aka no support for ibv_post_send, ibv_post_recv). The
only thing that is controlled by the user is to say which GAUDI is
connected to which. After that, the command submission the user
performs to operate our compute engines will cause them to send and
receive RDMA packets.

Moreover, as opposed to smart NICs where the Networking is the main
focus and the compute is only secondary, in our device the compute is
our major focus and the networking is a slave for it.

The hl-thunk userspace library will have wrappers around this single
IOCTL (like all our driver's IOCTLs) and also contain demos to show
how to use it.


>
> > > Note, I do not know exactly what they are, but no, IBverbs are not ok.
> >
> > Should we stop merging new drivers and abandon the RDMA subsystem? Is
> > there something you'd like to see fixed?
> >
> > Don't really understand your position, sorry.
>
> For anything that _has_ to have a userspace RDMA interface, sure ibverbs
> are the one we are stuck with, but I didn't think that was the issue
> here at all, which is why I wrote the above comments.
To emphasize again, we don't want to expose a userspace RDMA interface.
We just want to allow our single compute user to configure a
connection to another GAUDI.

Thanks,
Oded

>
> thanks,
>
> greg k-h

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v3 00/14] Adding GAUDI NIC code to habanalabs driver
  2020-09-20 19:05                                 ` Oded Gabbay
@ 2020-09-21 10:39                                   ` Leon Romanovsky
  0 siblings, 0 replies; 83+ messages in thread
From: Leon Romanovsky @ 2020-09-21 10:39 UTC (permalink / raw)
  To: Oded Gabbay
  Cc: Greg Kroah-Hartman, Jason Gunthorpe, Gal Pressman,
	Jakub Kicinski, Linux-Kernel@Vger. Kernel. Org, netdev,
	SW_Drivers, David S. Miller, Andrew Lunn, Florian Fainelli,
	linux-rdma, izur

On Sun, Sep 20, 2020 at 10:05:39PM +0300, Oded Gabbay wrote:
> On Sun, Sep 20, 2020 at 11:47 AM Greg Kroah-Hartman
> <gregkh@linuxfoundation.org> wrote:
> >
> > On Sat, Sep 19, 2020 at 04:22:35PM -0300, Jason Gunthorpe wrote:
> > > On Sat, Sep 19, 2020 at 07:27:30PM +0200, Greg Kroah-Hartman wrote:
> > > > > It's probably heresy, but why do I need to integrate into the RDMA subsystem ?
> > > > > I understand your reasoning about networking (Ethernet) as the driver
> > > > > connects to the kernel networking stack (netdev), but with RDMA the
> > > > > driver doesn't use or connect to anything in that stack. If I were to
> > > > > support IBverbs and declare that I support it, then of course I would
> > > > > need to integrate to the RDMA subsystem and add my backend to
> > > > > rdma-core.
> > > >
> > > > IBverbs are horrid and I would not wish them on anyone.  Seriously.
> > >
> > > I'm curious what drives this opinion? Did you have it since you
> > > reviewed the initial submission all those years ago?
> >
> > As I learned more about that interface, yes, I like it less and less :)
> >
> > But that's the userspace api you all are stuck with, for various
> > reasons, my opinion doesn't matter here.
> >
> > > > I think the general rdma apis are the key here, not the userspace api.
> > >
> > > Are you proposing that habana should have uAPI in drivers/misc and
> > > present a standard rdma-core userspace for it? This is the only
> > > userspace programming interface for RoCE HW. I think that would be
> > > much more work.
> > >
> > > If not, what open source userspace are you going to ask them to
> > > present to merge the kernel side into misc?
> >
> > I don't think that they have a userspace api to their rdma feature from
> > what I understand, but I could be totally wrong as I do not know their
> > hardware at all, so I'll let them answer this question.
>
> Hi Greg,
> We do expose a new IOCTL to enable the user to configure connections
> between multiple GAUDI devices.

How is it different from RDMA QP configuration?

>
> Having said that, we restrict this IOCTL to be used only by the same
> user who is doing the compute on our device, as opposed to a real RDMA
> device where any number of applications can send and receive.

The ability to support multiple applications is not RDMA-requirement,
but the implementation. For example MPI jobs are single user of RDMA device.

> In addition, this IOCTL limits the user to connect ONLY to another
> GAUDI device and not to a 3rd party RDMA device.

I don't see how it is different from EFA with their SQD QP type or mlx5
devices with DC QPs that you can connect only to similar devices (no
interoperability).

Thanks

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v3 00/14] Adding GAUDI NIC code to habanalabs driver
  2020-09-18 15:28                                     ` Jason Gunthorpe
@ 2020-09-21 11:22                                       ` Gal Pressman
  2020-09-21 11:49                                         ` Leon Romanovsky
  2020-09-22 11:41                                         ` Jason Gunthorpe
  0 siblings, 2 replies; 83+ messages in thread
From: Gal Pressman @ 2020-09-21 11:22 UTC (permalink / raw)
  To: Jason Gunthorpe, Oded Gabbay
  Cc: Greg Kroah-Hartman, izur, Jakub Kicinski,
	Linux-Kernel@Vger. Kernel. Org, netdev, SW_Drivers,
	David S. Miller, Andrew Lunn, Florian Fainelli, linux-rdma,
	Olof Johansson

On 18/09/2020 18:28, Jason Gunthorpe wrote:
> On Fri, Sep 18, 2020 at 06:15:52PM +0300, Oded Gabbay wrote:
> 
>> I'm sorry, but you won't be able to convince me here that I need to
>> "enslave" my entire code to RDMA, just because my ASIC "also" has some
>> RDMA ports.
> 
> You can't recreate common shared subsystems in a driver just because
> you don't want to work with the subsystem.
> 
> I don't care what else the ASIC has. In Linux the netdev part is
> exposed through netdev, the RDMA part through RDMA, the
> totally-not-a-GPU part through drivers/misc.
> 
> It has always been this way. Chelsio didn't get to rebuild the SCSI
> stack in their driver just because "storage is a small part of their
> device"
> 
> Drivers are not allowed to re-implement I2C/SPI/etc without re-using
> the common code for that just because "I2C is a small part of their
> device"
> 
> Exposing to userspace the creation of RoCE QPs and their related
> objects are unambiguously a RDMA subsystem task. I don't even know how
> you think you can argue it is not. It is your company proudly claiming
> the device has 100G RoCE ports in all the marketing literature, after
> all.
> 
> It is too bad the device has a non-standards compliant implementation
> of RoCE so this will be a bit hard for you. Oh well.

What is considered a RoCE port in this case if it's not compliant with RoCE?
Sounds like it's an implementation of RDMA over ethernet, not RoCE.
Does GAUDI support UD/RC/.. QPs? Is it using a proprietary wire protocol?
(BTW, Oded claims it's similar to nvlink, how is nvlink's implementation
exposed? Or is it closed source?)

Jason, how do you imagine GAUDI in the RDMA subsystem? Userspace control path
verbs (used by hl-thunk?) and all data path verbs exposed as kverbs (used by
habanalabs driver)?
So neither any userspace verbs apps could use it nor kernel ULPs?

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v3 00/14] Adding GAUDI NIC code to habanalabs driver
  2020-09-21 11:22                                       ` Gal Pressman
@ 2020-09-21 11:49                                         ` Leon Romanovsky
  2020-09-22 11:41                                         ` Jason Gunthorpe
  1 sibling, 0 replies; 83+ messages in thread
From: Leon Romanovsky @ 2020-09-21 11:49 UTC (permalink / raw)
  To: Gal Pressman
  Cc: Jason Gunthorpe, Oded Gabbay, Greg Kroah-Hartman, izur,
	Jakub Kicinski, Linux-Kernel@Vger. Kernel. Org, netdev,
	SW_Drivers, David S. Miller, Andrew Lunn, Florian Fainelli,
	linux-rdma, Olof Johansson

On Mon, Sep 21, 2020 at 02:22:02PM +0300, Gal Pressman wrote:
> On 18/09/2020 18:28, Jason Gunthorpe wrote:
> > On Fri, Sep 18, 2020 at 06:15:52PM +0300, Oded Gabbay wrote:
> >
> >> I'm sorry, but you won't be able to convince me here that I need to
> >> "enslave" my entire code to RDMA, just because my ASIC "also" has some
> >> RDMA ports.
> >
> > You can't recreate common shared subsystems in a driver just because
> > you don't want to work with the subsystem.
> >
> > I don't care what else the ASIC has. In Linux the netdev part is
> > exposed through netdev, the RDMA part through RDMA, the
> > totally-not-a-GPU part through drivers/misc.
> >
> > It has always been this way. Chelsio didn't get to rebuild the SCSI
> > stack in their driver just because "storage is a small part of their
> > device"
> >
> > Drivers are not allowed to re-implement I2C/SPI/etc without re-using
> > the common code for that just because "I2C is a small part of their
> > device"
> >
> > Exposing to userspace the creation of RoCE QPs and their related
> > objects are unambiguously a RDMA subsystem task. I don't even know how
> > you think you can argue it is not. It is your company proudly claiming
> > the device has 100G RoCE ports in all the marketing literature, after
> > all.
> >
> > It is too bad the device has a non-standards compliant implementation
> > of RoCE so this will be a bit hard for you. Oh well.
>
> What is considered a RoCE port in this case if it's not compliant with RoCE?

They claim that it is RoCE v2.
https://www.hotchips.org/hc31/HC31_1.14_HabanaLabs.Eitan_Medina.v9.pdf

Thanks

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v3 00/14] Adding GAUDI NIC code to habanalabs driver
  2020-09-20  8:47                               ` Greg Kroah-Hartman
  2020-09-20 19:05                                 ` Oded Gabbay
@ 2020-09-21 11:52                                 ` Jason Gunthorpe
  2020-09-21 21:20                                   ` Jakub Kicinski
  1 sibling, 1 reply; 83+ messages in thread
From: Jason Gunthorpe @ 2020-09-21 11:52 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Oded Gabbay, Leon Romanovsky, Gal Pressman, Jakub Kicinski,
	Linux-Kernel@Vger. Kernel. Org, netdev, SW_Drivers,
	David S. Miller, Andrew Lunn, Florian Fainelli, linux-rdma

On Sun, Sep 20, 2020 at 10:47:02AM +0200, Greg Kroah-Hartman wrote:
> > If not, what open source userspace are you going to ask them to
> > present to merge the kernel side into misc?
> 
> I don't think that they have a userspace api to their rdma feature from
> what I understand, but I could be totally wrong as I do not know their
> hardware at all, so I'll let them answer this question.

I thought Oded was pretty clear, the goal of this series is to expose
their RDMA HW to userspace. This problem space requires co-mingling
networking and compute at extremely high speed/low overhead. This is
all done in userspace.

We are specifically talking about this in
include/uapi/misc/habanalabs.h:

 /*
  * NIC
  *
  * This IOCTL allows the user to manage and configure the device's NIC ports.
  * The following operations are available:
  * - Create a completion queue
  * - Destroy a completion queue
  * - Wait on completion queue
  * - Poll a completion queue
  * - Update consumed completion queue entries
  * - Set a work queue
  * - Unset a work queue
  *
  * For all operations, the user should provide a pointer to an input structure
  * with the context parameters. Some of the operations also require a pointer to
  * driver regarding how many of the available CQEs were actually
  * processed/consumed. Only then the driver will override them with newer
  * entries.
  * The set WQ operation should provide the device virtual address of the WQ with
  * a matching size for the number of WQs and entries per WQ.
  *
  */
 #define HL_IOCTL_NIC	_IOWR('H', 0x07, struct hl_nic_args)

Which is ibv_create_qp, ibv_create_cq, ibv_poll_cq, etc, etc
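
As an aside on the request number itself: _IOWR() packs the transfer
direction, the 'H' type character, the command number 0x07 and the size of
the argument structure into a single value. A minimal sketch of decoding
those fields from userspace follows, assuming a Linux build with
<linux/ioctl.h>. Note that struct demo_nic_args, DEMO_IOCTL_NIC and
decode_ioctl() are hypothetical names used only for illustration; the real
struct hl_nic_args layout is defined by the patch set and is not reproduced
here.

```c
#include <assert.h>
#include <stdint.h>
#include <linux/ioctl.h>

/*
 * Hypothetical stand-in for struct hl_nic_args. The fields below are
 * assumptions chosen only so that the example has a concrete size.
 */
struct demo_nic_args {
	uint64_t input_ptr;	/* user pointer to an input structure */
	uint64_t output_ptr;	/* user pointer to an output structure */
	uint32_t op;		/* which NIC operation is requested */
	uint32_t input_size;
	uint32_t output_size;
	uint32_t pad;
};

static_assert(sizeof(struct demo_nic_args) == 32,
	      "stand-in structure should have no implicit padding");

/* Same encoding as HL_IOCTL_NIC, but over the stand-in structure. */
#define DEMO_IOCTL_NIC	_IOWR('H', 0x07, struct demo_nic_args)

struct ioctl_fields {
	unsigned int dir;	/* _IOC_READ and/or _IOC_WRITE */
	unsigned int type;	/* the "magic" character, 'H' here */
	unsigned int nr;	/* command number within that type */
	unsigned int size;	/* sizeof() the argument structure */
};

/* Split an ioctl request number into its four encoded fields. */
static struct ioctl_fields decode_ioctl(unsigned long req)
{
	struct ioctl_fields f;

	f.dir = _IOC_DIR(req);
	f.type = _IOC_TYPE(req);
	f.nr = _IOC_NR(req);
	f.size = _IOC_SIZE(req);
	return f;
}
```

Calling decode_ioctl(DEMO_IOCTL_NIC) yields a direction of
_IOC_READ | _IOC_WRITE, which only says the argument is copied in both
directions; every CQ/WQ operation listed above is multiplexed behind this
one number by whatever opcode the argument structure carries.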

Habana has repeatedly described their HW as having multiple 100G RoCE
ports. RoCE is one of the common industry standards that ibverbs
unambiguously is responsible for.

I would be much less annoyed if they were not actively marketing their
product as RoCE RDMA.

Sure there is some argument that their RoCE isn't spec compliant, but
I don't think it excuses the basic principle of our subsystem:

 RDMA HW needs to demonstrate some basic functionality using the
 standard open source userspace software stack.

I don't like this idea of backdooring a bunch of proprietary closed
source RDMA userspace through drivers/misc, and if you don't have a
clear idea how to get something equal for drivers/misc you should not
accept the HL_IOCTL_NIC.

Plus RoCE is complicated, there is a bunch of interaction with netdev
and rules related to that that really needs to be respected.

> For anything that _has_ to have a userspace RDMA interface, sure ibverbs
> are the one we are stuck with, but I didn't think that was the issue
> here at all, which is why I wrote the above comments.

I think you should look at the patches #8 through 11:

https://lore.kernel.org/lkml/20200915171022.10561-9-oded.gabbay@gmail.com/

Jason

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v3 00/14] Adding GAUDI NIC code to habanalabs driver
  2020-09-21 11:52                                 ` Jason Gunthorpe
@ 2020-09-21 21:20                                   ` Jakub Kicinski
  2020-09-22 11:49                                     ` Jason Gunthorpe
  0 siblings, 1 reply; 83+ messages in thread
From: Jakub Kicinski @ 2020-09-21 21:20 UTC (permalink / raw)
  To: Oded Gabbay
  Cc: Jason Gunthorpe, Greg Kroah-Hartman, Leon Romanovsky,
	Gal Pressman, Linux-Kernel@Vger. Kernel. Org, netdev, SW_Drivers,
	David S. Miller, Andrew Lunn, Florian Fainelli, linux-rdma

On Mon, 21 Sep 2020 08:52:39 -0300 Jason Gunthorpe wrote:
> I don't like this idea of backdooring a bunch of proprietary closed
> source RDMA userspace through drivers/misc, and if you don't have a
> clear idea how to get something equal for drivers/misc you should not
> accept the HL_IOCTL_NIC.
> 
> Plus RoCE is complicated, there is a bunch of interaction with netdev
> and rules related to that that really needs to be respected.

+1

To me this code quite clearly fits the description of vendor SDK which
runs proprietary stuff on top. It's such a vendor SDK thing to do to
pick the parts of our infrastructure they like and "simplify the rest"
with its own re-implementation.

I'd wager the only reason you expose the netdevs at all is for link
settings, stats, packet capture and debug. You'd never run TCP traffic
over those links. And you're fighting against using Linux APIs for the
only real traffic that runs on those links - RDMA(ish) traffic.

Greg - I'm probably the least experienced of the folks involved in this
conversation - could you ELI5 what's the benefit to the community from
merging this code?

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v3 00/14] Adding GAUDI NIC code to habanalabs driver
  2020-09-21 11:22                                       ` Gal Pressman
  2020-09-21 11:49                                         ` Leon Romanovsky
@ 2020-09-22 11:41                                         ` Jason Gunthorpe
  2020-09-22 12:46                                           ` Gal Pressman
  1 sibling, 1 reply; 83+ messages in thread
From: Jason Gunthorpe @ 2020-09-22 11:41 UTC (permalink / raw)
  To: Gal Pressman
  Cc: Oded Gabbay, Greg Kroah-Hartman, izur, Jakub Kicinski,
	Linux-Kernel@Vger. Kernel. Org, netdev, SW_Drivers,
	David S. Miller, Andrew Lunn, Florian Fainelli, linux-rdma,
	Olof Johansson

On Mon, Sep 21, 2020 at 02:22:02PM +0300, Gal Pressman wrote:

> What is considered a RoCE port in this case if it's not compliant with RoCE?
> Sounds like it's an implementation of RDMA over ethernet, not RoCE.
> Does GAUDI support UD/RC/.. QPs? Is it using a proprietary wire protocol?
> (BTW, Oded claims it's similar to nvlink, how is nvlink's implementation
> exposed? Or is it closed source?)

I think Oded was drawing a parallel to how nvlink is integral with the
compute element. From Oded's descriptions I don't think it is much
like nvlink at all.

> Jason, how do you imagine GAUDI in the RDMA subsystem? Userspace control path
> verbs (used by hl-thunk?) and all data path verbs exposed as kverbs (used by
> habanalabs driver)?
> So neither any userspace verbs apps could use it nor kernel ULPs?

Based on what Oded described it seems like a reasonable RDMA device
with some limitations around MR IOVA.

Looks like the desire is to create a RDMA WR and CQ ring in userspace,
and then co-mingle that with the compute side of the device.

So instead of doing the special IOCTL and mmap against the compute FD
it would create a RDMA QP and RDMA CQ, use dv to access the raw
internals, and the proprietary stack would have exactly the same stuff
it would have had with the misc ioctl.

But, completely separately, they'd also have to implement some of
verbs which serves as the open source userspace showing how this HW
works. What that is depends largely on what their HW can do, and if
they want to connect to UCX/mpi/libfabric/etc

A bunch of ioctl stubs or a few tests is far below our standard in
RDMA.

There may have been some argument that the compute side of this device
has no industry standards so should be a drivers/misc, but HPC
networking *does* have extensive standards and extensive open source
software stacks. It is very hard for me to see how a device in this
market could be competitive without integrating with that stuff.

Jason

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v3 00/14] Adding GAUDI NIC code to habanalabs driver
  2020-09-21 21:20                                   ` Jakub Kicinski
@ 2020-09-22 11:49                                     ` Jason Gunthorpe
  0 siblings, 0 replies; 83+ messages in thread
From: Jason Gunthorpe @ 2020-09-22 11:49 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Oded Gabbay, Greg Kroah-Hartman, Leon Romanovsky, Gal Pressman,
	Linux-Kernel@Vger. Kernel. Org, netdev, SW_Drivers,
	David S. Miller, Andrew Lunn, Florian Fainelli, linux-rdma

On Mon, Sep 21, 2020 at 02:20:53PM -0700, Jakub Kicinski wrote:
> I'd wager the only reason you expose the netdevs at all is for link
> settings, stats, packet capture and debug. You'd never run TCP traffic
> over those links. And you're fighting against using Linux APIs for the
> only real traffic that runs on those links - RDMA(ish) traffic.

The usual working flow is to use something like TCP to exchange
connection information then pivot to RDMA for the actual data
flow. This is why a driver like this could get away with such a low
performance implementation for a 100G NIC, it is just application boot
metadata being exchanged.
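
To make that flow concrete, here is a minimal sketch of the out-of-band
step: each side sends a small fixed-size record over an already connected
stream socket and reads the peer's record back, after which the real data
path would be set up. The record layout (QP number, starting PSN, 16-byte
GID) and the helper names send_conn_info()/recv_conn_info() are assumptions
for illustration only, not anything taken from the habanalabs patches or
from rdma-core.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical connection record; the field set is an assumption. */
struct conn_info {
	uint32_t qp_num;	/* queue pair number of the sender */
	uint32_t psn;		/* initial packet sequence number */
	uint8_t gid[16];	/* address identifier of the sender */
};

#define CONN_INFO_WIRE_LEN 24
static_assert(CONN_INFO_WIRE_LEN == 4 + 4 + 16, "wire layout mismatch");

/* Send exactly len bytes; any error or closed peer is a failure. */
static int send_all(int fd, const uint8_t *buf, size_t len)
{
	while (len > 0) {
		ssize_t n = send(fd, buf, len, 0);

		if (n <= 0)
			return -1;
		buf += n;
		len -= (size_t)n;
	}
	return 0;
}

/* Receive exactly len bytes; any error or early EOF is a failure. */
static int recv_all(int fd, uint8_t *buf, size_t len)
{
	while (len > 0) {
		ssize_t n = recv(fd, buf, len, 0);

		if (n <= 0)
			return -1;
		buf += n;
		len -= (size_t)n;
	}
	return 0;
}

/* Serialize the record in network byte order and send it. */
static int send_conn_info(int fd, const struct conn_info *ci)
{
	uint8_t wire[CONN_INFO_WIRE_LEN];
	uint32_t qp = htonl(ci->qp_num);
	uint32_t psn = htonl(ci->psn);

	memcpy(wire, &qp, 4);
	memcpy(wire + 4, &psn, 4);
	memcpy(wire + 8, ci->gid, 16);
	return send_all(fd, wire, sizeof(wire));
}

/* Receive one record and convert it back to host byte order. */
static int recv_conn_info(int fd, struct conn_info *ci)
{
	uint8_t wire[CONN_INFO_WIRE_LEN];
	uint32_t qp, psn;

	if (recv_all(fd, wire, sizeof(wire)) != 0)
		return -1;
	memcpy(&qp, wire, 4);
	memcpy(&psn, wire + 4, 4);
	ci->qp_num = ntohl(qp);
	ci->psn = ntohl(psn);
	memcpy(ci->gid, wire + 8, 16);
	return 0;
}
```

The helpers can be tried on the two ends of a socketpair(AF_UNIX,
SOCK_STREAM, 0, sv) before using a real TCP connection. The exchange is a
few dozen bytes per connection, which is why the throughput of the netdev
path matters so little here.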

Sniffing probably won't work as typically the HW will capture the RoCE
traffic before reaching Linux - and the Linux driver couldn't handle a
100G flow anyhow. Stats might not work either.

As far as the "usual rules" we do require that accelerator devices
sharing a netdev are secure in the concept of netdev userspace
security. They can access the assigned RoCEv2 UDP port but cannot do
things like forge src IP/MAC addresses, violate VLANs, reach outside
net namespaces, capture arbitrary traffic, etc.

This stuff is tricky and generally requires HW support. Someone has to
audit all of this and ensure it meets the netdev security requirements
too, otherwise it will need CAP_NET_RAW to function. Obviously this
requires seeing enough of a userspace implementation to understand how
the design approaches verbs 'Address Handles' and so forth.

RDMA HW has had errors before and when discovered it was blocked with
CAP_NET_RAW until new chip revs came out, this is something I take
very seriously.
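
For readers less familiar with what such a gate means in practice: in the
kernel it is a capable(CAP_NET_RAW) style check on the caller, and from
userspace the same bit is visible in the effective capability mask. The
sketch below, assuming a Linux system with /proc mounted, reads that mask
from the "CapEff:" line of /proc/self/status and tests the CAP_NET_RAW bit.
The helper names mask_has_cap() and read_effective_caps() are hypothetical
and exist only for this illustration.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <linux/capability.h>

static_assert(CAP_NET_RAW < 64, "capability must fit in a 64-bit mask");

/* Return 1 if capability number cap is set in the bitmask, else 0. */
static int mask_has_cap(uint64_t mask, int cap)
{
	return (int)((mask >> cap) & 1);
}

/*
 * Read the effective capability mask of the calling process from the
 * "CapEff:" line of /proc/self/status.
 * Returns 0 on success and -1 if the file or the line is unavailable.
 */
static int read_effective_caps(uint64_t *mask)
{
	char line[256];
	FILE *f = fopen("/proc/self/status", "r");
	int ret = -1;

	if (!f)
		return -1;
	while (fgets(line, sizeof(line), f)) {
		/* The value is printed as a hexadecimal bitmask. */
		if (sscanf(line, "CapEff: %" SCNx64, mask) == 1) {
			ret = 0;
			break;
		}
	}
	fclose(f);
	return ret;
}
```

An unprivileged process would normally see the CAP_NET_RAW bit clear, so
an interface blocked behind that capability is effectively unusable by
ordinary compute jobs until the HW is shown to respect the netdev rules.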

Jason

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v3 00/14] Adding GAUDI NIC code to habanalabs driver
  2020-09-22 11:41                                         ` Jason Gunthorpe
@ 2020-09-22 12:46                                           ` Gal Pressman
  2020-09-22 16:14                                             ` Jason Gunthorpe
  0 siblings, 1 reply; 83+ messages in thread
From: Gal Pressman @ 2020-09-22 12:46 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Oded Gabbay, Greg Kroah-Hartman, izur, Jakub Kicinski,
	Linux-Kernel@Vger. Kernel. Org, netdev, SW_Drivers,
	David S. Miller, Andrew Lunn, Florian Fainelli, linux-rdma,
	Olof Johansson

On 22/09/2020 14:41, Jason Gunthorpe wrote:
> On Mon, Sep 21, 2020 at 02:22:02PM +0300, Gal Pressman wrote:
> 
>> What is considered a RoCE port in this case if it's not compliant with RoCE?
>> Sounds like it's an implementation of RDMA over ethernet, not RoCE.
>> Does GAUDI support UD/RC/.. QPs? Is it using a proprietary wire protocol?
>> (BTW, Oded claims it's similar to nvlink, how is nvlink's implementation
>> exposed? Or is it closed source?)
> 
> I think Oded was drawing a parallel to how nvlink is integral with the
> compute element. From Oded's descriptions I don't think it is much
> like nvlink at all.
> 
>> Jason, how do you imagine GAUDI in the RDMA subsystem? Userspace control path
>> verbs (used by hl-thunk?) and all data path verbs exposed as kverbs (used by
>> habanalabs driver)?
>> So neither any userspace verbs apps could use it nor kernel ULPs?
> 
> Based on what Oded described it seems like a reasonable RDMA device
> with some limitations around MR IOVA.
> 
> Looks like the desire is to create a RDMA WR and CQ ring in userspace,
> and then co-mingle that with the compute side of the device.
> 
> So instead of doing the special IOCTL and mmap against the compute FD
> it would create a RDMA QP and RDMA CQ, use dv to access the raw
> internals, and the proprietary stack would have exactly the same stuff
> it would have had with the misc ioctl.
> 
> But, completely separately, they'd also have to implement some of
> verbs which serves as the open source userspace showing how this HW
> works. What that is depends largely on what their HW can do, and if
> they want to connect to UCX/mpi/libfabric/etc
> 
> A bunch of ioctl stubs or a few tests is far below our standard in
> RDMA.
> 
> There may have been some argument that the compute side of this device
> has no industry standards so should be a drivers/misc, but HPC
> networking *does* have extensive standards and extensive open source
> software stacks. It is very hard for me to see how a device in this
> market could be competitive without integrating with that stuff.

I agree, that makes sense.
But assuming Oded actually goes and implements all the needed verbs to get a
basic functional libibverbs provider (assuming their HW can do it somehow), is
it really useful if no one is going to use it?
It doesn't sound like habanalabs want people to use GAUDI as an RDMA adapter,
and I'm assuming the only real world use case is going to be using the hl stack,
which means we're left with a lot of dead code that's not used/tested by anyone.

Genuine question, wouldn't it be better if they only implement what's actually
going to be used and tested by their customers?


* Re: [PATCH v3 00/14] Adding GAUDI NIC code to habanalabs driver
  2020-09-22 12:46                                           ` Gal Pressman
@ 2020-09-22 16:14                                             ` Jason Gunthorpe
  2020-09-22 16:30                                               ` Gal Pressman
  0 siblings, 1 reply; 83+ messages in thread
From: Jason Gunthorpe @ 2020-09-22 16:14 UTC (permalink / raw)
  To: Gal Pressman
  Cc: Oded Gabbay, Greg Kroah-Hartman, izur, Jakub Kicinski,
	Linux-Kernel@Vger. Kernel. Org, netdev, SW_Drivers,
	David S. Miller, Andrew Lunn, Florian Fainelli, linux-rdma,
	Olof Johansson

On Tue, Sep 22, 2020 at 03:46:29PM +0300, Gal Pressman wrote:

> I agree, that makes sense.
> But assuming Oded actually goes and implements all the needed verbs to get a
> basic functional libibverbs provider (assuming their HW can do it somehow), is
> it really useful if no one is going to use it?
> It doesn't sound like habanalabs want people to use GAUDI as an RDMA adapter,
> and I'm assuming the only real world use case is going to be using the hl stack,
> which means we're left with a lot of dead code that's not used/tested by anyone.
> 
> Genuine question, wouldn't it be better if they only implement what's actually
> going to be used and tested by their customers?

The general standard for this 'accel' hardware, both in DRM and RDMA
is to present an open source userspace. Companies are encouraged to
use that as their main interface but I suppose are free to carry the
cost of dual APIs, and the community's wrath if they want.

At least for RDMA this is guided by the founding event of Linux RDMA
where all customers demanded the madness of every supplier having a
unique software stack from the kernel down stop. Since then the low
level stack has been cross vendor and uniform.

Jason


* Re: [PATCH v3 00/14] Adding GAUDI NIC code to habanalabs driver
  2020-09-22 16:14                                             ` Jason Gunthorpe
@ 2020-09-22 16:30                                               ` Gal Pressman
  2020-09-22 16:52                                                 ` Jason Gunthorpe
  0 siblings, 1 reply; 83+ messages in thread
From: Gal Pressman @ 2020-09-22 16:30 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Oded Gabbay, Greg Kroah-Hartman, izur, Jakub Kicinski,
	Linux-Kernel@Vger. Kernel. Org, netdev, SW_Drivers,
	David S. Miller, Andrew Lunn, Florian Fainelli, linux-rdma,
	Olof Johansson

On 22/09/2020 19:14, Jason Gunthorpe wrote:
> On Tue, Sep 22, 2020 at 03:46:29PM +0300, Gal Pressman wrote:
> 
>> I agree, that makes sense.
>> But assuming Oded actually goes and implements all the needed verbs to get a
>> basic functional libibverbs provider (assuming their HW can do it somehow), is
>> it really useful if no one is going to use it?
>> It doesn't sound like habanalabs want people to use GAUDI as an RDMA adapter,
>> and I'm assuming the only real world use case is going to be using the hl stack,
>> which means we're left with a lot of dead code that's not used/tested by anyone.
>>
>> Genuine question, wouldn't it be better if they only implement what's actually
>> going to be used and tested by their customers?
> 
> The general standard for this 'accel' hardware, both in DRM and RDMA
> is to present an open source userspace. Companies are encouraged to
> use that as their main interface but I suppose are free to carry the
> cost of dual APIs, and the community's wrath if they want.

I didn't mean they should maintain two interfaces.
The question is whether they should implement libibverbs support that covers the
cases used by their stack, or should they implement all "mandatory" verbs so
they could be able to run libibverbs' examples/perftest/pyverbs as well, even
though these will likely be the only apps covering these verbs.


* Re: [PATCH v3 00/14] Adding GAUDI NIC code to habanalabs driver
  2020-09-22 16:30                                               ` Gal Pressman
@ 2020-09-22 16:52                                                 ` Jason Gunthorpe
  0 siblings, 0 replies; 83+ messages in thread
From: Jason Gunthorpe @ 2020-09-22 16:52 UTC (permalink / raw)
  To: Gal Pressman
  Cc: Oded Gabbay, Greg Kroah-Hartman, izur, Jakub Kicinski,
	Linux-Kernel@Vger. Kernel. Org, netdev, SW_Drivers,
	David S. Miller, Andrew Lunn, Florian Fainelli, linux-rdma,
	Olof Johansson

On Tue, Sep 22, 2020 at 07:30:32PM +0300, Gal Pressman wrote:
> On 22/09/2020 19:14, Jason Gunthorpe wrote:
> > On Tue, Sep 22, 2020 at 03:46:29PM +0300, Gal Pressman wrote:
> > 
> >> I agree, that makes sense.
> >> But assuming Oded actually goes and implements all the needed verbs to get a
> >> basic functional libibverbs provider (assuming their HW can do it somehow), is
> >> it really useful if no one is going to use it?
> >> It doesn't sound like habanalabs want people to use GAUDI as an RDMA adapter,
> >> and I'm assuming the only real world use case is going to be using the hl stack,
> >> which means we're left with a lot of dead code that's not used/tested by anyone.
> >>
> >> Genuine question, wouldn't it be better if they only implement what's actually
> >> going to be used and tested by their customers?
> > 
> > The general standard for this 'accel' hardware, both in DRM and RDMA
> > is to present an open source userspace. Companies are encouraged to
> > use that as their main interface but I suppose are free to carry the
> > cost of dual APIs, and the community's wrath if they want.
> 
> I didn't mean they should maintain two interfaces.
> The question is whether they should implement libibverbs support that covers the
> cases used by their stack, or should they implement all "mandatory" verbs so
> they could be able to run libibverbs' examples/perftest/pyverbs as well, even
> though these will likely be the only apps covering these verbs.

As I said, the minimum standard is an open source user space that will
operate the NIC. For EFA we decided that was ibv_ud_pingpong, and now
parts of pyverbs. A similar decision would be needed here too. It is a
conversation that should start with a proposal from Oded.

The *point* is to have the open userspace, so I really don't care what
their proprietary universe does, and shrinking the opensource side
because it is "redundant" is completely backwards to what we want to
see.

Jason


end of thread, other threads:[~2020-09-22 16:52 UTC | newest]

Thread overview: 83+ messages
2020-09-15 17:10 [PATCH v3 00/14] Adding GAUDI NIC code to habanalabs driver Oded Gabbay
2020-09-15 17:10 ` [PATCH v3 02/14] habanalabs/gaudi: add NIC firmware-related definitions Oded Gabbay
2020-09-15 17:10 ` [PATCH v3 03/14] habanalabs/gaudi: add NIC security configuration Oded Gabbay
2020-09-15 17:10 ` [PATCH v3 04/14] habanalabs/gaudi: add support for NIC QMANs Oded Gabbay
2020-09-15 17:10 ` [PATCH v3 05/14] habanalabs/gaudi: add NIC Ethernet support Oded Gabbay
2020-09-15 17:10 ` [PATCH v3 06/14] habanalabs/gaudi: add NIC PHY code Oded Gabbay
2020-09-15 17:10 ` [PATCH v3 07/14] habanalabs/gaudi: allow user to get MAC addresses in INFO IOCTL Oded Gabbay
2020-09-15 17:10 ` [PATCH v3 08/14] habanalabs/gaudi: add a new IOCTL for NIC control operations Oded Gabbay
2020-09-15 17:10 ` [PATCH v3 09/14] habanalabs/gaudi: add CQ " Oded Gabbay
2020-09-15 17:10 ` [PATCH v3 10/14] habanalabs/gaudi: add WQ " Oded Gabbay
2020-09-15 17:10 ` [PATCH v3 11/14] habanalabs/gaudi: add QP error handling Oded Gabbay
2020-09-15 17:10 ` [PATCH v3 12/14] habanalabs/gaudi: Add ethtool support using coresight Oded Gabbay
2020-09-15 17:10 ` [PATCH v3 13/14] habanalabs/gaudi: support DCB protocol Oded Gabbay
2020-09-15 17:10 ` [PATCH v3 14/14] habanalabs/gaudi: add NIC init/fini calls from common code Oded Gabbay
2020-09-15 20:35 ` [PATCH v3 00/14] Adding GAUDI NIC code to habanalabs driver Jakub Kicinski
2020-09-15 20:46   ` Oded Gabbay
2020-09-15 21:04     ` Jakub Kicinski
2020-09-15 21:20       ` Oded Gabbay
2020-09-15 21:37         ` Andrew Lunn
2020-09-15 21:43           ` Oded Gabbay
2020-09-15 22:35             ` David Miller
2020-09-15 22:36           ` David Miller
2020-09-15 22:34         ` David Miller
2020-09-16  4:26           ` Oded Gabbay
2020-09-17 17:18     ` Jason Gunthorpe
2020-09-18 11:36       ` Gal Pressman
2020-09-18 11:52         ` Leon Romanovsky
2020-09-18 11:56           ` Oded Gabbay
2020-09-18 12:03             ` Leon Romanovsky
2020-09-18 12:07               ` Oded Gabbay
2020-09-18 12:19                 ` Leon Romanovsky
2020-09-18 12:31                   ` Oded Gabbay
2020-09-18 13:09                     ` Leon Romanovsky
2020-09-19  6:40                   ` Greg Kroah-Hartman
2020-09-19  8:20                     ` Leon Romanovsky
2020-09-19  8:30                       ` Greg Kroah-Hartman
2020-09-19  8:58                         ` Leon Romanovsky
2020-09-19 16:43                         ` Oded Gabbay
2020-09-19 17:27                           ` Greg Kroah-Hartman
2020-09-19 19:22                             ` Jason Gunthorpe
2020-09-20  8:47                               ` Greg Kroah-Hartman
2020-09-20 19:05                                 ` Oded Gabbay
2020-09-21 10:39                                   ` Leon Romanovsky
2020-09-21 11:52                                 ` Jason Gunthorpe
2020-09-21 21:20                                   ` Jakub Kicinski
2020-09-22 11:49                                     ` Jason Gunthorpe
2020-09-19 18:49                           ` Andrew Lunn
2020-09-18 11:56         ` Jason Gunthorpe
2020-09-18 11:59           ` Oded Gabbay
2020-09-18 12:16             ` Jason Gunthorpe
2020-09-18 12:34               ` Oded Gabbay
2020-09-18 12:50                 ` Jason Gunthorpe
2020-09-18 13:02                   ` Oded Gabbay
2020-09-18 13:26                     ` Jason Gunthorpe
2020-09-18 13:49                       ` Oded Gabbay
2020-09-18 13:59                         ` Jason Gunthorpe
2020-09-18 14:12                           ` Oded Gabbay
2020-09-18 14:19                             ` Jason Gunthorpe
2020-09-18 14:45                               ` Oded Gabbay
2020-09-18 15:07                                 ` Jason Gunthorpe
2020-09-18 15:15                                   ` Oded Gabbay
2020-09-18 15:28                                     ` Jason Gunthorpe
2020-09-21 11:22                                       ` Gal Pressman
2020-09-21 11:49                                         ` Leon Romanovsky
2020-09-22 11:41                                         ` Jason Gunthorpe
2020-09-22 12:46                                           ` Gal Pressman
2020-09-22 16:14                                             ` Jason Gunthorpe
2020-09-22 16:30                                               ` Gal Pressman
2020-09-22 16:52                                                 ` Jason Gunthorpe
2020-09-18 12:10         ` Oded Gabbay
2020-09-15 20:42 ` David Miller
2020-09-15 20:49   ` Oded Gabbay
2020-09-16  6:26     ` Greg Kroah-Hartman
2020-09-16  6:36       ` Oded Gabbay
2020-09-16  7:42         ` Greg Kroah-Hartman
2020-09-16  8:02           ` Oded Gabbay
2020-09-16  8:22             ` Greg Kroah-Hartman
2020-09-16  8:47               ` Oded Gabbay
2020-09-16 12:00                 ` Greg Kroah-Hartman
2020-09-20 16:45                   ` Daniel Vetter
2020-09-16 23:04               ` Williams, Dan J
2020-09-18 12:00 ` Jason Gunthorpe
2020-09-18 12:01   ` Oded Gabbay
