linux-kernel.vger.kernel.org archive mirror
* [PATCH 00/15] Adding GAUDI NIC code to habanalabs driver
@ 2020-09-10 16:11 Oded Gabbay
  2020-09-10 16:11 ` [PATCH 02/15] habanalabs/gaudi: add NIC firmware-related definitions Oded Gabbay
                   ` (14 more replies)
  0 siblings, 15 replies; 44+ messages in thread
From: Oded Gabbay @ 2020-09-10 16:11 UTC
  To: linux-kernel, netdev; +Cc: SW_Drivers, gregkh, davem, kuba

This patch-set adds support for initializing and using the GAUDI NIC ports,
which function as a scale-out interconnect for distributed Deep Learning
training. The training can be performed over tens of thousands of GAUDIs
and uses the RDMA-over-converged-Ethernet (RoCE) v2 protocol.

Each GAUDI exposes 10x100GbE ports that are designed to scale out the
inter-GAUDI communication by integrating a complete communication engine
on-die. This native integration allows users to use the same scaling
technology both inside the server and rack (termed scale-up) and for
scaling across racks (scale-out). The racks can be connected
directly between GAUDI processors, or through any number of standard
Ethernet switches.

The driver exposes the NIC ports to the user as standard Ethernet ports by
registering each port to the networking subsystem. This allows the user to
manage the ports with standard tools such as ifconfig, ethtool, etc. It
also enables us to connect to the Linux networking stack and thus support
standard networking protocols, such as IPv4, IPv6, TCP, etc. In addition,
we can also leverage protocols such as DCB for dynamically configuring
priorities to avoid congestion.

For each NIC port there is a matching QMAN entity. For RoCE, the user
submits workloads to the NIC through the QMAN, just as for the
compute engines. For regular Ethernet, the user sends and receives packets
through the standard Ethernet sockets. Those sockets are used only as a
control path. The data path that is used for AI training goes through the
RoCE interface.

It is important to note that GAUDI's NIC H/W has some limitations and
unique properties, compared to other networking adapters, that forced us
to use a less-than-common driver design:

1. The NIC functionality is NOT exposed as different PCI Physical
   Functions. There is a single PF which is used for compute and
   networking, as the main goal of the NIC ports is to be used as
   intra-communication and not as standard network interfaces. This
   implies we can't connect different drivers to handle the networking
   ports because it is the same device, from the kernel POV, as the
   compute. Therefore, we must integrate the networking code into the
   main habanalabs driver.

2. Although our communication engine implements RDMA, and the driver code
   uses well-known RDMA concepts such as QP context, CQ, WQ, etc., the
   GAUDI architecture does NOT support other basic IBverbs concepts, such
   as MR and protection domain. Therefore, we can't connect to the standard
   IBverbs infrastructure in the user-space and kernel (rdma-core library
   and infiniband subsystem, respectively) because the standard RDMA s/w
   and tools won't work on our H/W. Instead, we added a new IOCTL to the
   driver's existing IOCTL API. The new IOCTL exposes the available
   NIC control operations to the user (e.g. create a QP context).

3. The on-die communication engine provides minimal offloading for standard
   Ethernet and TCP/IP protocols, as those are only used for control plane.
   E.g. packets are copied rather than handled through descriptors.
   Therefore, the Ethernet performance is quite low compared to standard
   Ethernet adapters.

4. There is no virtualization support per port.

Most or all of the above limitations will hopefully be improved in future
ASIC generations.

Patch-set organization:

- Patches 1 & 2 add some auto-generated register header files
  and NIC-related definitions to the interface between the driver and the
  GAUDI firmware.

- Patch 3 adds initialization of security restrictions on the NIC engines.

- Patch 4 adds initialization of the NIC QMANs. The QMANs are needed to
  send RDMA packets through the NIC engines.

- Patches 5-11 add the NIC driver code. They contain the basic Ethernet
  driver and H/W initialization, the NIC PHY driver code and the new NIC
  control IOCTL operations.

- Patches 12-14 add support for debugfs, ethtool and DCB.

- Patch 15 adds the implementation of the high-level init/fini functions
  and their calls from the common code. This is the patch that actually
  enables the NIC ports and allows the user to work with them.

Thanks,
Oded


Omer Shpigelman (15):
  habanalabs/gaudi: add NIC H/W and registers definitions
  habanalabs/gaudi: add NIC firmware-related definitions
  habanalabs/gaudi: add NIC security configuration
  habanalabs/gaudi: add support for NIC QMANs
  habanalabs/gaudi: add NIC Ethernet support
  habanalabs/gaudi: add NIC PHY code
  habanalabs/gaudi: allow user to get MAC addresses in INFO IOCTL
  habanalabs/gaudi: add a new IOCTL for NIC control operations
  habanalabs/gaudi: add CQ control operations
  habanalabs/gaudi: add WQ control operations
  habanalabs/gaudi: add QP error handling
  habanalabs/gaudi: add debugfs entries for the NIC
  habanalabs/gaudi: Add ethtool support using coresight
  habanalabs/gaudi: support DCB protocol
  habanalabs/gaudi: add NIC init/fini calls from common code

 .../ABI/testing/debugfs-driver-habanalabs     |   69 +
 drivers/misc/habanalabs/common/context.c      |    1 +
 drivers/misc/habanalabs/common/device.c       |   24 +-
 drivers/misc/habanalabs/common/firmware_if.c  |   44 +
 drivers/misc/habanalabs/common/habanalabs.h   |   33 +-
 .../misc/habanalabs/common/habanalabs_drv.c   |   11 +
 .../misc/habanalabs/common/habanalabs_ioctl.c |  151 +-
 drivers/misc/habanalabs/common/pci.c          |    1 +
 drivers/misc/habanalabs/gaudi/Makefile        |    4 +
 drivers/misc/habanalabs/gaudi/gaudi.c         |  958 +++-
 drivers/misc/habanalabs/gaudi/gaudiP.h        |  333 +-
 .../misc/habanalabs/gaudi/gaudi_coresight.c   |  144 +
 drivers/misc/habanalabs/gaudi/gaudi_nic.c     | 4063 +++++++++++++++++
 drivers/misc/habanalabs/gaudi/gaudi_nic.h     |  354 ++
 .../misc/habanalabs/gaudi/gaudi_nic_dcbnl.c   |  108 +
 .../misc/habanalabs/gaudi/gaudi_nic_debugfs.c |  402 ++
 .../misc/habanalabs/gaudi/gaudi_nic_ethtool.c |  582 +++
 drivers/misc/habanalabs/gaudi/gaudi_phy.c     | 1272 ++++++
 .../misc/habanalabs/gaudi/gaudi_security.c    | 3973 ++++++++++++++++
 drivers/misc/habanalabs/goya/goya.c           |   44 +
 .../misc/habanalabs/include/common/cpucp_if.h |   34 +-
 .../include/gaudi/asic_reg/gaudi_regs.h       |   26 +-
 .../include/gaudi/asic_reg/nic0_qm0_masks.h   |  800 ++++
 .../include/gaudi/asic_reg/nic0_qm0_regs.h    |  834 ++++
 .../include/gaudi/asic_reg/nic0_qm1_regs.h    |  834 ++++
 .../include/gaudi/asic_reg/nic0_qpc0_masks.h  |  500 ++
 .../include/gaudi/asic_reg/nic0_qpc0_regs.h   |  710 +++
 .../include/gaudi/asic_reg/nic0_qpc1_regs.h   |  710 +++
 .../include/gaudi/asic_reg/nic0_rxb_regs.h    |  508 +++
 .../include/gaudi/asic_reg/nic0_rxe0_masks.h  |  354 ++
 .../include/gaudi/asic_reg/nic0_rxe0_regs.h   |  158 +
 .../include/gaudi/asic_reg/nic0_rxe1_regs.h   |  158 +
 .../include/gaudi/asic_reg/nic0_stat_regs.h   |  518 +++
 .../include/gaudi/asic_reg/nic0_tmr_regs.h    |  184 +
 .../include/gaudi/asic_reg/nic0_txe0_masks.h  |  336 ++
 .../include/gaudi/asic_reg/nic0_txe0_regs.h   |  264 ++
 .../include/gaudi/asic_reg/nic0_txe1_regs.h   |  264 ++
 .../include/gaudi/asic_reg/nic0_txs0_masks.h  |  336 ++
 .../include/gaudi/asic_reg/nic0_txs0_regs.h   |  214 +
 .../include/gaudi/asic_reg/nic0_txs1_regs.h   |  214 +
 .../include/gaudi/asic_reg/nic1_qm0_regs.h    |  834 ++++
 .../include/gaudi/asic_reg/nic1_qm1_regs.h    |  834 ++++
 .../include/gaudi/asic_reg/nic2_qm0_regs.h    |  834 ++++
 .../include/gaudi/asic_reg/nic2_qm1_regs.h    |  834 ++++
 .../include/gaudi/asic_reg/nic3_qm0_regs.h    |  834 ++++
 .../include/gaudi/asic_reg/nic3_qm1_regs.h    |  834 ++++
 .../include/gaudi/asic_reg/nic4_qm0_regs.h    |  834 ++++
 .../include/gaudi/asic_reg/nic4_qm1_regs.h    |  834 ++++
 drivers/misc/habanalabs/include/gaudi/gaudi.h |   12 +
 .../habanalabs/include/gaudi/gaudi_fw_if.h    |   24 +
 .../habanalabs/include/gaudi/gaudi_masks.h    |   15 +
 .../include/hw_ip/nic/nic_general.h           |   13 +
 include/uapi/misc/habanalabs.h                |  296 +-
 53 files changed, 27497 insertions(+), 62 deletions(-)
 create mode 100644 drivers/misc/habanalabs/gaudi/gaudi_nic.c
 create mode 100644 drivers/misc/habanalabs/gaudi/gaudi_nic.h
 create mode 100644 drivers/misc/habanalabs/gaudi/gaudi_nic_dcbnl.c
 create mode 100644 drivers/misc/habanalabs/gaudi/gaudi_nic_debugfs.c
 create mode 100644 drivers/misc/habanalabs/gaudi/gaudi_nic_ethtool.c
 create mode 100644 drivers/misc/habanalabs/gaudi/gaudi_phy.c
 create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic0_qm0_masks.h
 create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic0_qm0_regs.h
 create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic0_qm1_regs.h
 create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic0_qpc0_masks.h
 create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic0_qpc0_regs.h
 create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic0_qpc1_regs.h
 create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic0_rxb_regs.h
 create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic0_rxe0_masks.h
 create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic0_rxe0_regs.h
 create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic0_rxe1_regs.h
 create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic0_stat_regs.h
 create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic0_tmr_regs.h
 create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic0_txe0_masks.h
 create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic0_txe0_regs.h
 create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic0_txe1_regs.h
 create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic0_txs0_masks.h
 create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic0_txs0_regs.h
 create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic0_txs1_regs.h
 create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic1_qm0_regs.h
 create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic1_qm1_regs.h
 create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic2_qm0_regs.h
 create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic2_qm1_regs.h
 create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic3_qm0_regs.h
 create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic3_qm1_regs.h
 create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic4_qm0_regs.h
 create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic4_qm1_regs.h
 create mode 100644 drivers/misc/habanalabs/include/hw_ip/nic/nic_general.h

-- 
2.17.1



* [PATCH 02/15] habanalabs/gaudi: add NIC firmware-related definitions
  2020-09-10 16:11 [PATCH 00/15] Adding GAUDI NIC code to habanalabs driver Oded Gabbay
@ 2020-09-10 16:11 ` Oded Gabbay
  2020-09-10 16:11 ` [PATCH 03/15] habanalabs/gaudi: add NIC security configuration Oded Gabbay
                   ` (13 subsequent siblings)
  14 siblings, 0 replies; 44+ messages in thread
From: Oded Gabbay @ 2020-09-10 16:11 UTC
  To: linux-kernel, netdev; +Cc: SW_Drivers, gregkh, davem, kuba, Omer Shpigelman

From: Omer Shpigelman <oshpigelman@habana.ai>

Add new structures and messages that the driver uses to interact with the
firmware to receive information and events (errors) about GAUDI's NIC.

Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
---
 .../misc/habanalabs/include/common/cpucp_if.h | 34 ++++++++++++++++---
 .../habanalabs/include/gaudi/gaudi_fw_if.h    | 24 +++++++++++++
 2 files changed, 54 insertions(+), 4 deletions(-)
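
As a rough illustration of how the bitmask arrays in struct cpucp_nic_info
are sized and indexed, the sketch below reuses the macro values from this
patch in standalone user-space C. The helpers nic_mask_test() and
nic_mask_count() are hypothetical and not part of the patch. The sketch also
assumes that bit i of a mask describes NIC (or lane) i, and that the words
were already converted from little-endian (in the driver, via le64_to_cpu()).

```c
#include <assert.h>
#include <stdint.h>

/* Same values as the macros added to cpucp_if.h by this patch. */
#define CPUCP_MAX_NICS			128
#define CPUCP_LANES_PER_NIC		4
#define CPUCP_MAX_NIC_LANES		(CPUCP_MAX_NICS * CPUCP_LANES_PER_NIC)
#define CPUCP_NIC_MASK_ARR_LEN		((CPUCP_MAX_NICS + 63) / 64)
#define CPUCP_NIC_POLARITY_ARR_LEN	((CPUCP_MAX_NIC_LANES + 63) / 64)

/* The "+ 63) / 64" idiom rounds up to whole 64-bit words. */
static_assert(CPUCP_NIC_MASK_ARR_LEN == 2, "128 NICs fit in 2 words");
static_assert(CPUCP_NIC_POLARITY_ARR_LEN == 8, "512 lanes fit in 8 words");

/*
 * Hypothetical helper: return 1 if bit 'idx' is set in an array of
 * CPU-endian 64-bit words. The caller must keep 'idx' inside the array.
 */
static inline int nic_mask_test(const uint64_t *mask, unsigned int idx)
{
	return (int)((mask[idx / 64] >> (idx % 64)) & 1);
}

/* Hypothetical helper: count the set bits among the first 'nbits' bits. */
static inline unsigned int nic_mask_count(const uint64_t *mask,
					  unsigned int nbits)
{
	unsigned int i, cnt = 0;

	for (i = 0; i < nbits; i++)
		cnt += (unsigned int)nic_mask_test(mask, i);

	return cnt;
}
```

With a link mask whose first word is 0x3FF, nic_mask_test() reports NICs 0-9
as present and NIC 10 as absent, and nic_mask_count() over CPUCP_MAX_NICS
bits returns 10.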

diff --git a/drivers/misc/habanalabs/include/common/cpucp_if.h b/drivers/misc/habanalabs/include/common/cpucp_if.h
index dcde440427b4..ace746bb206e 100644
--- a/drivers/misc/habanalabs/include/common/cpucp_if.h
+++ b/drivers/misc/habanalabs/include/common/cpucp_if.h
@@ -9,6 +9,7 @@
 #define CPUCP_IF_H
 
 #include <linux/types.h>
+#include <linux/if_ether.h>
 
 /*
  * EVENT QUEUE
@@ -199,6 +200,11 @@ enum pq_init_status {
  *       CpuCP to write to the structure, to prevent data corruption in case of
  *       mismatched driver/FW versions.
  *
+ * CPUCP_PACKET_NIC_INFO_GET -
+ *       Fetch information from the device regarding the NIC. The host's driver
+ *       passes the max size it allows the CpuCP to write to the structure, to
+ *       prevent data corruption in case of mismatched driver/FW versions.
+ *
  * CPUCP_PACKET_TEMPERATURE_SET -
  *       Set the value of the offset property of a specified thermal sensor.
  *       The packet's arguments specify the desired sensor and the field to
@@ -238,12 +244,12 @@ enum cpucp_packet_id {
 	CPUCP_PACKET_MAX_POWER_GET,		/* sysfs */
 	CPUCP_PACKET_MAX_POWER_SET,		/* sysfs */
 	CPUCP_PACKET_EEPROM_DATA_GET,		/* sysfs */
-	CPUCP_RESERVED,
+	CPUCP_PACKET_NIC_INFO_GET,		/* internal */
 	CPUCP_PACKET_TEMPERATURE_SET,		/* sysfs */
 	CPUCP_PACKET_VOLTAGE_SET,		/* sysfs */
 	CPUCP_PACKET_CURRENT_SET,		/* sysfs */
-	CPUCP_PACKET_PCIE_THROUGHPUT_GET,		/* internal */
-	CPUCP_PACKET_PCIE_REPLAY_CNT_GET,		/* internal */
+	CPUCP_PACKET_PCIE_THROUGHPUT_GET,	/* internal */
+	CPUCP_PACKET_PCIE_REPLAY_CNT_GET,	/* internal */
 	CPUCP_PACKET_TOTAL_ENERGY_GET,		/* internal */
 };
 
@@ -288,7 +294,7 @@ struct cpucp_packet {
 		/* For led set */
 		__le32 led_index;
 
-		/* For get CpuCP info/EEPROM data */
+		/* For get CpuCP info/EEPROM data/NIC info */
 		__le32 data_max_size;
 	};
 
@@ -367,6 +373,12 @@ struct eq_generic_event {
 #define CARD_NAME_MAX_LEN		16
 #define VERSION_MAX_LEN			128
 #define CPUCP_MAX_SENSORS		128
+#define CPUCP_MAX_NICS			128
+#define CPUCP_LANES_PER_NIC		4
+#define CPUCP_NIC_QSFP_EEPROM_MAX_LEN	1024
+#define CPUCP_MAX_NIC_LANES		(CPUCP_MAX_NICS * CPUCP_LANES_PER_NIC)
+#define CPUCP_NIC_MASK_ARR_LEN		((CPUCP_MAX_NICS + 63) / 64)
+#define CPUCP_NIC_POLARITY_ARR_LEN	((CPUCP_MAX_NIC_LANES + 63) / 64)
 
 struct cpucp_sensor {
 	__le32 type;
@@ -415,4 +427,18 @@ struct cpucp_info {
 	char card_name[CARD_NAME_MAX_LEN];
 };
 
+struct cpucp_mac_addr {
+	__u8 mac_addr[ETH_ALEN];
+};
+
+struct cpucp_nic_info {
+	struct cpucp_mac_addr mac_addrs[CPUCP_MAX_NICS];
+	__le64 link_mask[CPUCP_NIC_MASK_ARR_LEN];
+	__le64 pol_tx_mask[CPUCP_NIC_POLARITY_ARR_LEN];
+	__le64 pol_rx_mask[CPUCP_NIC_POLARITY_ARR_LEN];
+	__le64 link_ext_mask[CPUCP_NIC_MASK_ARR_LEN];
+	__u8 qsfp_eeprom[CPUCP_NIC_QSFP_EEPROM_MAX_LEN];
+	__le64 auto_neg_mask[CPUCP_NIC_MASK_ARR_LEN];
+};
+
 #endif /* CPUCP_IF_H */
diff --git a/drivers/misc/habanalabs/include/gaudi/gaudi_fw_if.h b/drivers/misc/habanalabs/include/gaudi/gaudi_fw_if.h
index 8aadc6357da1..d61a4c87b765 100644
--- a/drivers/misc/habanalabs/include/gaudi/gaudi_fw_if.h
+++ b/drivers/misc/habanalabs/include/gaudi/gaudi_fw_if.h
@@ -8,6 +8,8 @@
 #ifndef GAUDI_FW_IF_H
 #define GAUDI_FW_IF_H
 
+#include <linux/types.h>
+
 #define GAUDI_EVENT_QUEUE_MSI_IDX	8
 #define GAUDI_NIC_PORT1_MSI_IDX		10
 #define GAUDI_NIC_PORT3_MSI_IDX		12
@@ -31,6 +33,28 @@ enum gaudi_pll_index {
 	IF_PLL
 };
 
+enum gaudi_nic_axi_error {
+	RXB,
+	RXE,
+	TXS,
+	TXE,
+	QPC_RESP,
+	NON_AXI_ERR,
+};
+
+/*
+ * struct eq_nic_sei_event - describes an AXI error cause.
+ * @axi_error_cause: one of the events defined in enum gaudi_nic_axi_error.
+ * @id: can be either 0 or 1, to further describe unit with interrupt cause
+ *      (i.e. TXE0 or TXE1).
+ * @pad[6]: padding structure to 64bit.
+ */
+struct eq_nic_sei_event {
+	__u8 axi_error_cause;
+	__u8 id;
+	__u8 pad[6];
+};
+
 #define GAUDI_PLL_FREQ_LOW		200000000 /* 200 MHz */
 
 #endif /* GAUDI_FW_IF_H */
-- 
2.17.1



* [PATCH 03/15] habanalabs/gaudi: add NIC security configuration
  2020-09-10 16:11 [PATCH 00/15] Adding GAUDI NIC code to habanalabs driver Oded Gabbay
  2020-09-10 16:11 ` [PATCH 02/15] habanalabs/gaudi: add NIC firmware-related definitions Oded Gabbay
@ 2020-09-10 16:11 ` Oded Gabbay
  2020-09-10 16:11 ` [PATCH 04/15] habanalabs/gaudi: add support for NIC QMANs Oded Gabbay
                   ` (12 subsequent siblings)
  14 siblings, 0 replies; 44+ messages in thread
From: Oded Gabbay @ 2020-09-10 16:11 UTC
  To: linux-kernel, netdev; +Cc: SW_Drivers, gregkh, davem, kuba, Omer Shpigelman

From: Omer Shpigelman <oshpigelman@habana.ai>

Configure the security properties of the NIC IP. This prevents a user
process from doing something with the NIC that it shouldn't, e.g. crashing
the server or stealing data.

Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
---
 .../misc/habanalabs/gaudi/gaudi_security.c    | 3973 +++++++++++++++++
 1 file changed, 3973 insertions(+)
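
The address arithmetic that repeats throughout the function below can be
read as follows; this is a sketch for orientation, not the driver's code.
It assumes PROT_BITS_OFFS is 0xF80 (the macro is defined outside this hunk),
and the pb_*() helper names and the example register offsets are hypothetical.
Under that assumption each 4 KB register block is split into 32 windows of
128 bytes, each window has one 32-bit protection word, and each 4-byte
register in the window owns one bit of that word.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed value; the real macro is defined outside the quoted hunk. */
#define PROT_BITS_OFFS	0xF80

/* Start of the protection-bits area of the 4 KB block containing 'reg'. */
static inline uint32_t pb_block_addr(uint32_t reg)
{
	return (reg & ~0xFFFu) + PROT_BITS_OFFS;
}

/* Byte offset of the 32-bit protection word for the 128-byte window. */
static inline uint8_t pb_word_offset(uint32_t reg)
{
	return (uint8_t)(((reg & PROT_BITS_OFFS) >> 7) << 2);
}

/* Bit owned by 'reg' inside its window's protection word. */
static inline uint32_t pb_bit(uint32_t reg)
{
	return 1U << ((reg & 0x7F) >> 2);
}

/*
 * Value written for one window: the bits of the listed registers are
 * cleared and all other bits are set, mirroring WREG32(..., ~mask).
 * All entries of 'regs' must belong to the same 128-byte window.
 */
static inline uint32_t pb_window_value(const uint32_t *regs, size_t n)
{
	uint32_t mask = 0;
	size_t i;

	for (i = 0; i < n; i++)
		mask |= pb_bit(regs[i]);

	return ~mask;
}
```

For a hypothetical register at 0xCE0104, the protection area starts at
0xCE0F80, the word offset is 8 and the register owns bit 1. The largest word
offset this scheme can produce is 0x7C, which matches the two writes to
PROT_BITS_OFFS + 0x7C at the top of the function.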

diff --git a/drivers/misc/habanalabs/gaudi/gaudi_security.c b/drivers/misc/habanalabs/gaudi/gaudi_security.c
index 2d7add0e5bcc..8a921ab56557 100644
--- a/drivers/misc/habanalabs/gaudi/gaudi_security.c
+++ b/drivers/misc/habanalabs/gaudi/gaudi_security.c
@@ -5157,6 +5157,3977 @@ static void gaudi_init_dma_protection_bits(struct hl_device *hdev)
 	WREG32(pb_addr + word_offset, ~mask);
 }
 
+static void gaudi_init_nic_protection_bits(struct hl_device *hdev)
+{
+	u32 pb_addr, mask;
+	u8 word_offset;
+
+	WREG32(mmNIC0_QM0_BASE - CFG_BASE + PROT_BITS_OFFS + 0x7C, 0);
+	WREG32(mmNIC0_QM1_BASE - CFG_BASE + PROT_BITS_OFFS + 0x7C, 0);
+
+	pb_addr = (mmNIC0_QM0_GLBL_CFG0 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC0_QM0_GLBL_CFG0 & PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC0_QM0_GLBL_CFG0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_GLBL_CFG1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_GLBL_PROT & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_GLBL_ERR_CFG & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_GLBL_SECURE_PROPS_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_GLBL_SECURE_PROPS_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_GLBL_SECURE_PROPS_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_GLBL_SECURE_PROPS_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_GLBL_SECURE_PROPS_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_GLBL_NON_SECURE_PROPS_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_GLBL_NON_SECURE_PROPS_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_GLBL_NON_SECURE_PROPS_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_GLBL_NON_SECURE_PROPS_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_GLBL_NON_SECURE_PROPS_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_GLBL_STS0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_GLBL_STS1_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_GLBL_STS1_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_GLBL_STS1_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_GLBL_STS1_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_GLBL_STS1_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_GLBL_MSG_EN_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_GLBL_MSG_EN_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_GLBL_MSG_EN_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_GLBL_MSG_EN_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_GLBL_MSG_EN_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_PQ_BASE_LO_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_PQ_BASE_LO_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_PQ_BASE_LO_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_PQ_BASE_LO_3 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC0_QM0_PQ_BASE_HI_0 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC0_QM0_PQ_BASE_HI_0 & PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC0_QM0_PQ_BASE_HI_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_PQ_BASE_HI_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_PQ_BASE_HI_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_PQ_BASE_HI_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_PQ_SIZE_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_PQ_SIZE_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_PQ_SIZE_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_PQ_SIZE_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_PQ_PI_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_PQ_PI_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_PQ_PI_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_PQ_PI_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_PQ_CI_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_PQ_CI_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_PQ_CI_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_PQ_CI_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_PQ_CFG0_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_PQ_CFG0_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_PQ_CFG0_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_PQ_CFG0_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_PQ_CFG1_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_PQ_CFG1_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_PQ_CFG1_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_PQ_CFG1_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_PQ_ARUSER_31_11_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_PQ_ARUSER_31_11_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_PQ_ARUSER_31_11_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_PQ_ARUSER_31_11_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_PQ_STS0_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_PQ_STS0_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_PQ_STS0_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_PQ_STS0_3 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC0_QM0_PQ_STS1_0 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC0_QM0_PQ_STS1_0 & PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC0_QM0_PQ_STS1_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_PQ_STS1_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_PQ_STS1_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_PQ_STS1_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CQ_STS0_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CQ_STS0_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CQ_STS0_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CQ_STS0_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CQ_STS1_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CQ_STS1_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CQ_STS1_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CQ_STS1_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CQ_PTR_LO_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CQ_PTR_HI_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CQ_TSIZE_0 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC0_QM0_CQ_CTL_0 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC0_QM0_CQ_CTL_0 & PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC0_QM0_CQ_CTL_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CQ_PTR_LO_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CQ_PTR_HI_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CQ_TSIZE_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CQ_CTL_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CQ_PTR_LO_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CQ_PTR_HI_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CQ_TSIZE_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CQ_CTL_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CQ_PTR_LO_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CQ_PTR_HI_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CQ_TSIZE_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CQ_CTL_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CQ_PTR_LO_STS_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CQ_PTR_LO_STS_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CQ_PTR_LO_STS_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CQ_PTR_LO_STS_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CQ_PTR_LO_STS_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CQ_PTR_HI_STS_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CQ_PTR_HI_STS_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CQ_PTR_HI_STS_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CQ_PTR_HI_STS_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CQ_PTR_HI_STS_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CQ_TSIZE_STS_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CQ_TSIZE_STS_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CQ_TSIZE_STS_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CQ_TSIZE_STS_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CQ_TSIZE_STS_4 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC0_QM0_CQ_CTL_STS_0 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC0_QM0_CQ_CTL_STS_0 & PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC0_QM0_CQ_CTL_STS_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CQ_CTL_STS_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CQ_CTL_STS_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CQ_CTL_STS_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CQ_CTL_STS_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CQ_IFIFO_CNT_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CQ_IFIFO_CNT_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CQ_IFIFO_CNT_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CQ_IFIFO_CNT_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CQ_IFIFO_CNT_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CP_MSG_BASE0_ADDR_LO_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CP_MSG_BASE0_ADDR_LO_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CP_MSG_BASE0_ADDR_LO_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CP_MSG_BASE0_ADDR_LO_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CP_MSG_BASE0_ADDR_LO_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CP_MSG_BASE0_ADDR_HI_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CP_MSG_BASE0_ADDR_HI_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CP_MSG_BASE0_ADDR_HI_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CP_MSG_BASE0_ADDR_HI_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CP_MSG_BASE0_ADDR_HI_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CP_MSG_BASE1_ADDR_LO_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CP_MSG_BASE1_ADDR_LO_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CP_MSG_BASE1_ADDR_LO_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CP_MSG_BASE1_ADDR_LO_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CP_MSG_BASE1_ADDR_LO_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CP_MSG_BASE1_ADDR_HI_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CP_MSG_BASE1_ADDR_HI_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CP_MSG_BASE1_ADDR_HI_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CP_MSG_BASE1_ADDR_HI_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CP_MSG_BASE1_ADDR_HI_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CP_MSG_BASE2_ADDR_LO_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CP_MSG_BASE2_ADDR_LO_1 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC0_QM0_CP_MSG_BASE2_ADDR_LO_2 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC0_QM0_CP_MSG_BASE2_ADDR_LO_2 &
+			PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC0_QM0_CP_MSG_BASE2_ADDR_LO_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CP_MSG_BASE2_ADDR_LO_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CP_MSG_BASE2_ADDR_LO_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CP_MSG_BASE2_ADDR_HI_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CP_MSG_BASE2_ADDR_HI_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CP_MSG_BASE2_ADDR_HI_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CP_MSG_BASE2_ADDR_HI_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CP_MSG_BASE2_ADDR_HI_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CP_MSG_BASE3_ADDR_LO_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CP_MSG_BASE3_ADDR_LO_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CP_MSG_BASE3_ADDR_LO_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CP_MSG_BASE3_ADDR_LO_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CP_MSG_BASE3_ADDR_LO_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CP_MSG_BASE3_ADDR_HI_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CP_MSG_BASE3_ADDR_HI_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CP_MSG_BASE3_ADDR_HI_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CP_MSG_BASE3_ADDR_HI_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CP_MSG_BASE3_ADDR_HI_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CP_LDMA_TSIZE_OFFSET_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CP_LDMA_TSIZE_OFFSET_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CP_LDMA_TSIZE_OFFSET_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CP_LDMA_TSIZE_OFFSET_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CP_LDMA_TSIZE_OFFSET_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CP_LDMA_SRC_BASE_LO_OFFSET_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CP_LDMA_SRC_BASE_LO_OFFSET_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CP_LDMA_SRC_BASE_LO_OFFSET_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CP_LDMA_SRC_BASE_LO_OFFSET_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CP_LDMA_SRC_BASE_LO_OFFSET_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CP_LDMA_DST_BASE_LO_OFFSET_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CP_LDMA_DST_BASE_LO_OFFSET_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CP_LDMA_DST_BASE_LO_OFFSET_2 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC0_QM0_CP_LDMA_DST_BASE_LO_OFFSET_3 & ~0xFFF) +
+				PROT_BITS_OFFS;
+	word_offset = ((mmNIC0_QM0_CP_LDMA_DST_BASE_LO_OFFSET_3 &
+			PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC0_QM0_CP_LDMA_DST_BASE_LO_OFFSET_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CP_LDMA_DST_BASE_LO_OFFSET_4 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC0_QM0_CP_STS_0 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC0_QM0_CP_STS_0 & PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC0_QM0_CP_STS_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CP_STS_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CP_STS_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CP_STS_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CP_STS_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CP_CURRENT_INST_LO_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CP_CURRENT_INST_LO_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CP_CURRENT_INST_LO_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CP_CURRENT_INST_LO_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CP_CURRENT_INST_LO_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CP_CURRENT_INST_HI_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CP_CURRENT_INST_HI_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CP_CURRENT_INST_HI_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CP_CURRENT_INST_HI_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CP_CURRENT_INST_HI_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CP_BARRIER_CFG_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CP_BARRIER_CFG_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CP_BARRIER_CFG_2 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC0_QM0_CP_BARRIER_CFG_3 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC0_QM0_CP_BARRIER_CFG_3 & PROT_BITS_OFFS)
+			>> 7) << 2;
+	mask = 1U << ((mmNIC0_QM0_CP_BARRIER_CFG_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CP_BARRIER_CFG_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CP_DBG_0_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CP_DBG_0_1 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC0_QM0_CP_DBG_0_2 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC0_QM0_CP_DBG_0_2 & PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC0_QM0_CP_DBG_0_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CP_DBG_0_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CP_DBG_0_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CP_ARUSER_31_11_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CP_ARUSER_31_11_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CP_ARUSER_31_11_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CP_ARUSER_31_11_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CP_ARUSER_31_11_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CP_AWUSER_31_11_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CP_AWUSER_31_11_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CP_AWUSER_31_11_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CP_AWUSER_31_11_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CP_AWUSER_31_11_4 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC0_QM0_ARB_CFG_0 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC0_QM0_ARB_CFG_0 & PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC0_QM0_ARB_CFG_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_ARB_MST_AVAIL_CRED_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_ARB_MST_AVAIL_CRED_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_ARB_MST_AVAIL_CRED_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_ARB_MST_AVAIL_CRED_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_ARB_MST_AVAIL_CRED_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_ARB_MST_AVAIL_CRED_5 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_ARB_MST_AVAIL_CRED_6 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_ARB_MST_AVAIL_CRED_7 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_ARB_MST_AVAIL_CRED_8 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_ARB_MST_AVAIL_CRED_9 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_ARB_MST_AVAIL_CRED_10 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_ARB_MST_AVAIL_CRED_11 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_ARB_MST_AVAIL_CRED_12 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_ARB_MST_AVAIL_CRED_13 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_ARB_MST_AVAIL_CRED_14 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_ARB_MST_AVAIL_CRED_15 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_ARB_MST_AVAIL_CRED_16 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_ARB_MST_AVAIL_CRED_17 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_ARB_MST_AVAIL_CRED_18 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_ARB_MST_AVAIL_CRED_19 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_ARB_MST_AVAIL_CRED_20 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_ARB_MST_AVAIL_CRED_21 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_ARB_MST_AVAIL_CRED_22 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_ARB_MST_AVAIL_CRED_23 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC0_QM0_ARB_MST_AVAIL_CRED_24 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC0_QM0_ARB_MST_AVAIL_CRED_24 &
+			PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC0_QM0_ARB_MST_AVAIL_CRED_24 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_ARB_MST_AVAIL_CRED_25 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_ARB_MST_AVAIL_CRED_26 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_ARB_MST_AVAIL_CRED_27 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_ARB_MST_AVAIL_CRED_28 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_ARB_MST_AVAIL_CRED_29 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_ARB_MST_AVAIL_CRED_30 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_ARB_MST_AVAIL_CRED_31 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC0_QM0_ARB_MST_CHOISE_PUSH_OFST_23 & ~0xFFF) +
+			PROT_BITS_OFFS;
+	word_offset = ((mmNIC0_QM0_ARB_MST_CHOISE_PUSH_OFST_23 &
+			PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC0_QM0_ARB_SLV_CHOISE_WDT & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_ARB_MSG_MAX_INFLIGHT & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_ARB_MSG_AWUSER_31_11 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_ARB_MSG_AWUSER_SEC_PROP & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_ARB_MSG_AWUSER_NON_SEC_PROP & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC0_QM0_ARB_STATE_STS & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC0_QM0_ARB_STATE_STS & PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC0_QM0_ARB_STATE_STS & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_ARB_CHOISE_FULLNESS_STS & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_ARB_MSG_STS & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_ARB_SLV_CHOISE_Q_HEAD & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_ARB_ERR_CAUSE & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_ARB_ERR_MSG_EN & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_ARB_ERR_STS_DRP & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_ARB_MST_CRED_STS_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_ARB_MST_CRED_STS_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_ARB_MST_CRED_STS_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_ARB_MST_CRED_STS_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_ARB_MST_CRED_STS_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_ARB_MST_CRED_STS_5 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_ARB_MST_CRED_STS_6 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_ARB_MST_CRED_STS_7 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_ARB_MST_CRED_STS_8 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_ARB_MST_CRED_STS_9 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_ARB_MST_CRED_STS_10 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_ARB_MST_CRED_STS_11 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_ARB_MST_CRED_STS_12 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_ARB_MST_CRED_STS_13 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_ARB_MST_CRED_STS_14 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_ARB_MST_CRED_STS_15 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_ARB_MST_CRED_STS_16 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_ARB_MST_CRED_STS_17 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_ARB_MST_CRED_STS_18 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_ARB_MST_CRED_STS_19 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC0_QM0_ARB_MST_CRED_STS_20 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC0_QM0_ARB_MST_CRED_STS_20 & PROT_BITS_OFFS)
+			>> 7) << 2;
+	mask = 1U << ((mmNIC0_QM0_ARB_MST_CRED_STS_20 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_ARB_MST_CRED_STS_21 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_ARB_MST_CRED_STS_22 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_ARB_MST_CRED_STS_23 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_ARB_MST_CRED_STS_24 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_ARB_MST_CRED_STS_25 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_ARB_MST_CRED_STS_26 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_ARB_MST_CRED_STS_27 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_ARB_MST_CRED_STS_28 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_ARB_MST_CRED_STS_29 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_ARB_MST_CRED_STS_30 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_ARB_MST_CRED_STS_31 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CGM_CFG & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CGM_STS & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CGM_CFG1 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC0_QM0_LOCAL_RANGE_BASE & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC0_QM0_LOCAL_RANGE_BASE & PROT_BITS_OFFS)
+			>> 7) << 2;
+	mask = 1U << ((mmNIC0_QM0_LOCAL_RANGE_BASE & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_LOCAL_RANGE_SIZE & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_CSMR_STRICT_PRIO_CFG & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_HBW_RD_RATE_LIM_CFG_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_LBW_WR_RATE_LIM_CFG_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_LBW_WR_RATE_LIM_CFG_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_HBW_RD_RATE_LIM_CFG_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_GLBL_AXCACHE & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_IND_GW_APB_CFG & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_IND_GW_APB_WDATA & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_IND_GW_APB_RDATA & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_IND_GW_APB_STATUS & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_GLBL_ERR_ADDR_LO & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_GLBL_ERR_ADDR_HI & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM0_GLBL_ERR_WDATA & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC0_QM0_GLBL_MEM_INIT_BUSY & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC0_QM0_GLBL_MEM_INIT_BUSY & PROT_BITS_OFFS)
+			>> 7) << 2;
+	mask = 1U << ((mmNIC0_QM0_GLBL_MEM_INIT_BUSY & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC0_QM1_GLBL_CFG0 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC0_QM1_GLBL_CFG0 & PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC0_QM1_GLBL_CFG0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_GLBL_CFG1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_GLBL_PROT & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_GLBL_ERR_CFG & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_GLBL_SECURE_PROPS_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_GLBL_SECURE_PROPS_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_GLBL_SECURE_PROPS_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_GLBL_SECURE_PROPS_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_GLBL_SECURE_PROPS_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_GLBL_NON_SECURE_PROPS_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_GLBL_NON_SECURE_PROPS_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_GLBL_NON_SECURE_PROPS_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_GLBL_NON_SECURE_PROPS_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_GLBL_NON_SECURE_PROPS_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_GLBL_STS0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_GLBL_STS1_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_GLBL_STS1_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_GLBL_STS1_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_GLBL_STS1_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_GLBL_STS1_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_GLBL_MSG_EN_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_GLBL_MSG_EN_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_GLBL_MSG_EN_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_GLBL_MSG_EN_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_GLBL_MSG_EN_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_PQ_BASE_LO_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_PQ_BASE_LO_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_PQ_BASE_LO_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_PQ_BASE_LO_3 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC0_QM1_PQ_BASE_HI_0 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC0_QM1_PQ_BASE_HI_0 & PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC0_QM1_PQ_BASE_HI_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_PQ_BASE_HI_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_PQ_BASE_HI_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_PQ_BASE_HI_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_PQ_SIZE_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_PQ_SIZE_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_PQ_SIZE_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_PQ_SIZE_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_PQ_PI_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_PQ_PI_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_PQ_PI_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_PQ_PI_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_PQ_CI_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_PQ_CI_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_PQ_CI_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_PQ_CI_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_PQ_CFG0_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_PQ_CFG0_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_PQ_CFG0_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_PQ_CFG0_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_PQ_CFG1_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_PQ_CFG1_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_PQ_CFG1_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_PQ_CFG1_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_PQ_ARUSER_31_11_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_PQ_ARUSER_31_11_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_PQ_ARUSER_31_11_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_PQ_ARUSER_31_11_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_PQ_STS0_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_PQ_STS0_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_PQ_STS0_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_PQ_STS0_3 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC0_QM1_PQ_STS1_0 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC0_QM1_PQ_STS1_0 & PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC0_QM1_PQ_STS1_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_PQ_STS1_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_PQ_STS1_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_PQ_STS1_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CQ_STS0_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CQ_STS0_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CQ_STS0_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CQ_STS0_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CQ_STS1_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CQ_STS1_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CQ_STS1_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CQ_STS1_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CQ_PTR_LO_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CQ_PTR_HI_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CQ_TSIZE_0 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC0_QM1_CQ_CTL_0 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC0_QM1_CQ_CTL_0 & PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC0_QM1_CQ_CTL_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CQ_PTR_LO_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CQ_PTR_HI_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CQ_TSIZE_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CQ_CTL_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CQ_PTR_LO_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CQ_PTR_HI_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CQ_TSIZE_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CQ_CTL_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CQ_PTR_LO_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CQ_PTR_HI_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CQ_TSIZE_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CQ_CTL_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CQ_PTR_LO_STS_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CQ_PTR_LO_STS_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CQ_PTR_LO_STS_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CQ_PTR_LO_STS_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CQ_PTR_LO_STS_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CQ_PTR_HI_STS_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CQ_PTR_HI_STS_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CQ_PTR_HI_STS_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CQ_PTR_HI_STS_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CQ_PTR_HI_STS_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CQ_TSIZE_STS_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CQ_TSIZE_STS_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CQ_TSIZE_STS_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CQ_TSIZE_STS_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CQ_TSIZE_STS_4 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC0_QM1_CQ_CTL_STS_0 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC0_QM1_CQ_CTL_STS_0 & PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC0_QM1_CQ_CTL_STS_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CQ_CTL_STS_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CQ_CTL_STS_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CQ_CTL_STS_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CQ_CTL_STS_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CQ_IFIFO_CNT_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CQ_IFIFO_CNT_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CQ_IFIFO_CNT_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CQ_IFIFO_CNT_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CQ_IFIFO_CNT_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CP_MSG_BASE0_ADDR_LO_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CP_MSG_BASE0_ADDR_LO_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CP_MSG_BASE0_ADDR_LO_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CP_MSG_BASE0_ADDR_LO_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CP_MSG_BASE0_ADDR_LO_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CP_MSG_BASE0_ADDR_HI_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CP_MSG_BASE0_ADDR_HI_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CP_MSG_BASE0_ADDR_HI_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CP_MSG_BASE0_ADDR_HI_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CP_MSG_BASE0_ADDR_HI_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CP_MSG_BASE1_ADDR_LO_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CP_MSG_BASE1_ADDR_LO_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CP_MSG_BASE1_ADDR_LO_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CP_MSG_BASE1_ADDR_LO_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CP_MSG_BASE1_ADDR_LO_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CP_MSG_BASE1_ADDR_HI_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CP_MSG_BASE1_ADDR_HI_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CP_MSG_BASE1_ADDR_HI_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CP_MSG_BASE1_ADDR_HI_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CP_MSG_BASE1_ADDR_HI_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CP_MSG_BASE2_ADDR_LO_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CP_MSG_BASE2_ADDR_LO_1 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC0_QM1_CP_MSG_BASE2_ADDR_LO_2 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC0_QM1_CP_MSG_BASE2_ADDR_LO_2 &
+			PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC0_QM1_CP_MSG_BASE2_ADDR_LO_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CP_MSG_BASE2_ADDR_LO_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CP_MSG_BASE2_ADDR_LO_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CP_MSG_BASE2_ADDR_HI_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CP_MSG_BASE2_ADDR_HI_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CP_MSG_BASE2_ADDR_HI_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CP_MSG_BASE2_ADDR_HI_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CP_MSG_BASE2_ADDR_HI_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CP_MSG_BASE3_ADDR_LO_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CP_MSG_BASE3_ADDR_LO_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CP_MSG_BASE3_ADDR_LO_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CP_MSG_BASE3_ADDR_LO_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CP_MSG_BASE3_ADDR_LO_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CP_MSG_BASE3_ADDR_HI_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CP_MSG_BASE3_ADDR_HI_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CP_MSG_BASE3_ADDR_HI_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CP_MSG_BASE3_ADDR_HI_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CP_MSG_BASE3_ADDR_HI_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CP_LDMA_TSIZE_OFFSET_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CP_LDMA_TSIZE_OFFSET_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CP_LDMA_TSIZE_OFFSET_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CP_LDMA_TSIZE_OFFSET_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CP_LDMA_TSIZE_OFFSET_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CP_LDMA_SRC_BASE_LO_OFFSET_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CP_LDMA_SRC_BASE_LO_OFFSET_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CP_LDMA_SRC_BASE_LO_OFFSET_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CP_LDMA_SRC_BASE_LO_OFFSET_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CP_LDMA_SRC_BASE_LO_OFFSET_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CP_LDMA_DST_BASE_LO_OFFSET_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CP_LDMA_DST_BASE_LO_OFFSET_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CP_LDMA_DST_BASE_LO_OFFSET_2 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC0_QM1_CP_LDMA_DST_BASE_LO_OFFSET_3 & ~0xFFF) +
+			PROT_BITS_OFFS;
+	word_offset = ((mmNIC0_QM1_CP_LDMA_DST_BASE_LO_OFFSET_3 &
+			PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC0_QM1_CP_LDMA_DST_BASE_LO_OFFSET_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CP_LDMA_DST_BASE_LO_OFFSET_4 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC0_QM1_CP_STS_0 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC0_QM1_CP_STS_0 & PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC0_QM1_CP_STS_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CP_STS_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CP_STS_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CP_STS_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CP_STS_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CP_CURRENT_INST_LO_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CP_CURRENT_INST_LO_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CP_CURRENT_INST_LO_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CP_CURRENT_INST_LO_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CP_CURRENT_INST_LO_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CP_CURRENT_INST_HI_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CP_CURRENT_INST_HI_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CP_CURRENT_INST_HI_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CP_CURRENT_INST_HI_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CP_CURRENT_INST_HI_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CP_BARRIER_CFG_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CP_BARRIER_CFG_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CP_BARRIER_CFG_2 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC0_QM1_CP_BARRIER_CFG_3 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC0_QM1_CP_BARRIER_CFG_3 & PROT_BITS_OFFS)
+			>> 7) << 2;
+	mask = 1U << ((mmNIC0_QM1_CP_BARRIER_CFG_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CP_BARRIER_CFG_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CP_DBG_0_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CP_DBG_0_1 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC0_QM1_CP_DBG_0_2 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC0_QM1_CP_DBG_0_2 & PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC0_QM1_CP_DBG_0_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CP_DBG_0_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CP_DBG_0_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CP_ARUSER_31_11_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CP_ARUSER_31_11_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CP_ARUSER_31_11_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CP_ARUSER_31_11_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CP_ARUSER_31_11_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CP_AWUSER_31_11_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CP_AWUSER_31_11_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CP_AWUSER_31_11_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CP_AWUSER_31_11_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CP_AWUSER_31_11_4 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC0_QM1_ARB_CFG_0 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC0_QM1_ARB_CFG_0 & PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC0_QM1_ARB_CFG_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_ARB_MST_AVAIL_CRED_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_ARB_MST_AVAIL_CRED_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_ARB_MST_AVAIL_CRED_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_ARB_MST_AVAIL_CRED_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_ARB_MST_AVAIL_CRED_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_ARB_MST_AVAIL_CRED_5 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_ARB_MST_AVAIL_CRED_6 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_ARB_MST_AVAIL_CRED_7 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_ARB_MST_AVAIL_CRED_8 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_ARB_MST_AVAIL_CRED_9 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_ARB_MST_AVAIL_CRED_10 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_ARB_MST_AVAIL_CRED_11 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_ARB_MST_AVAIL_CRED_12 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_ARB_MST_AVAIL_CRED_13 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_ARB_MST_AVAIL_CRED_14 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_ARB_MST_AVAIL_CRED_15 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_ARB_MST_AVAIL_CRED_16 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_ARB_MST_AVAIL_CRED_17 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_ARB_MST_AVAIL_CRED_18 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_ARB_MST_AVAIL_CRED_19 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_ARB_MST_AVAIL_CRED_20 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_ARB_MST_AVAIL_CRED_21 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_ARB_MST_AVAIL_CRED_22 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_ARB_MST_AVAIL_CRED_23 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC0_QM1_ARB_MST_AVAIL_CRED_24 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC0_QM1_ARB_MST_AVAIL_CRED_24 &
+			PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC0_QM1_ARB_MST_AVAIL_CRED_24 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_ARB_MST_AVAIL_CRED_25 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_ARB_MST_AVAIL_CRED_26 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_ARB_MST_AVAIL_CRED_27 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_ARB_MST_AVAIL_CRED_28 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_ARB_MST_AVAIL_CRED_29 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_ARB_MST_AVAIL_CRED_30 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_ARB_MST_AVAIL_CRED_31 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC0_QM1_ARB_MST_CHOISE_PUSH_OFST_23 & ~0xFFF) +
+			PROT_BITS_OFFS;
+	word_offset = ((mmNIC0_QM1_ARB_MST_CHOISE_PUSH_OFST_23 &
+			PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC0_QM1_ARB_SLV_CHOISE_WDT & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_ARB_MSG_MAX_INFLIGHT & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_ARB_MSG_AWUSER_31_11 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_ARB_MSG_AWUSER_SEC_PROP & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_ARB_MSG_AWUSER_NON_SEC_PROP & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC0_QM1_ARB_STATE_STS & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC0_QM1_ARB_STATE_STS & PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC0_QM1_ARB_STATE_STS & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_ARB_CHOISE_FULLNESS_STS & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_ARB_MSG_STS & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_ARB_SLV_CHOISE_Q_HEAD & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_ARB_ERR_CAUSE & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_ARB_ERR_MSG_EN & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_ARB_ERR_STS_DRP & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_ARB_MST_CRED_STS_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_ARB_MST_CRED_STS_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_ARB_MST_CRED_STS_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_ARB_MST_CRED_STS_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_ARB_MST_CRED_STS_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_ARB_MST_CRED_STS_5 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_ARB_MST_CRED_STS_6 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_ARB_MST_CRED_STS_7 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_ARB_MST_CRED_STS_8 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_ARB_MST_CRED_STS_9 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_ARB_MST_CRED_STS_10 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_ARB_MST_CRED_STS_11 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_ARB_MST_CRED_STS_12 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_ARB_MST_CRED_STS_13 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_ARB_MST_CRED_STS_14 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_ARB_MST_CRED_STS_15 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_ARB_MST_CRED_STS_16 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_ARB_MST_CRED_STS_17 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_ARB_MST_CRED_STS_18 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_ARB_MST_CRED_STS_19 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC0_QM1_ARB_MST_CRED_STS_20 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC0_QM1_ARB_MST_CRED_STS_20 & PROT_BITS_OFFS)
+			>> 7) << 2;
+	mask = 1U << ((mmNIC0_QM1_ARB_MST_CRED_STS_20 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_ARB_MST_CRED_STS_21 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_ARB_MST_CRED_STS_22 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_ARB_MST_CRED_STS_23 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_ARB_MST_CRED_STS_24 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_ARB_MST_CRED_STS_25 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_ARB_MST_CRED_STS_26 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_ARB_MST_CRED_STS_27 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_ARB_MST_CRED_STS_28 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_ARB_MST_CRED_STS_29 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_ARB_MST_CRED_STS_30 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_ARB_MST_CRED_STS_31 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CGM_CFG & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CGM_STS & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CGM_CFG1 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC0_QM1_LOCAL_RANGE_BASE & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC0_QM1_LOCAL_RANGE_BASE & PROT_BITS_OFFS)
+			>> 7) << 2;
+	mask = 1U << ((mmNIC0_QM1_LOCAL_RANGE_BASE & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_LOCAL_RANGE_SIZE & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_CSMR_STRICT_PRIO_CFG & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_HBW_RD_RATE_LIM_CFG_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_LBW_WR_RATE_LIM_CFG_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_LBW_WR_RATE_LIM_CFG_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_HBW_RD_RATE_LIM_CFG_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_GLBL_AXCACHE & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_IND_GW_APB_CFG & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_IND_GW_APB_WDATA & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_IND_GW_APB_RDATA & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_IND_GW_APB_STATUS & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_GLBL_ERR_ADDR_LO & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_GLBL_ERR_ADDR_HI & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC0_QM1_GLBL_ERR_WDATA & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC0_QM1_GLBL_MEM_INIT_BUSY & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC0_QM1_GLBL_MEM_INIT_BUSY & PROT_BITS_OFFS)
+			>> 7) << 2;
+	mask = 1U << ((mmNIC0_QM1_GLBL_MEM_INIT_BUSY & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	WREG32(mmNIC1_QM0_BASE - CFG_BASE + PROT_BITS_OFFS + 0x7C, 0);
+	WREG32(mmNIC1_QM1_BASE - CFG_BASE + PROT_BITS_OFFS + 0x7C, 0);
+
+	pb_addr = (mmNIC1_QM0_GLBL_CFG0 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC1_QM0_GLBL_CFG0 & PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC1_QM0_GLBL_CFG0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_GLBL_CFG1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_GLBL_PROT & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_GLBL_ERR_CFG & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_GLBL_SECURE_PROPS_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_GLBL_SECURE_PROPS_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_GLBL_SECURE_PROPS_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_GLBL_SECURE_PROPS_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_GLBL_SECURE_PROPS_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_GLBL_NON_SECURE_PROPS_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_GLBL_NON_SECURE_PROPS_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_GLBL_NON_SECURE_PROPS_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_GLBL_NON_SECURE_PROPS_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_GLBL_NON_SECURE_PROPS_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_GLBL_STS0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_GLBL_STS1_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_GLBL_STS1_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_GLBL_STS1_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_GLBL_STS1_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_GLBL_STS1_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_GLBL_MSG_EN_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_GLBL_MSG_EN_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_GLBL_MSG_EN_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_GLBL_MSG_EN_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_GLBL_MSG_EN_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_PQ_BASE_LO_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_PQ_BASE_LO_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_PQ_BASE_LO_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_PQ_BASE_LO_3 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC1_QM0_PQ_BASE_HI_0 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC1_QM0_PQ_BASE_HI_0 & PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC1_QM0_PQ_BASE_HI_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_PQ_BASE_HI_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_PQ_BASE_HI_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_PQ_BASE_HI_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_PQ_SIZE_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_PQ_SIZE_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_PQ_SIZE_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_PQ_SIZE_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_PQ_PI_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_PQ_PI_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_PQ_PI_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_PQ_PI_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_PQ_CI_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_PQ_CI_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_PQ_CI_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_PQ_CI_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_PQ_CFG0_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_PQ_CFG0_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_PQ_CFG0_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_PQ_CFG0_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_PQ_CFG1_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_PQ_CFG1_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_PQ_CFG1_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_PQ_CFG1_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_PQ_ARUSER_31_11_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_PQ_ARUSER_31_11_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_PQ_ARUSER_31_11_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_PQ_ARUSER_31_11_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_PQ_STS0_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_PQ_STS0_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_PQ_STS0_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_PQ_STS0_3 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC1_QM0_PQ_STS1_0 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC1_QM0_PQ_STS1_0 & PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC1_QM0_PQ_STS1_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_PQ_STS1_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_PQ_STS1_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_PQ_STS1_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CQ_STS0_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CQ_STS0_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CQ_STS0_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CQ_STS0_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CQ_STS1_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CQ_STS1_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CQ_STS1_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CQ_STS1_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CQ_PTR_LO_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CQ_PTR_HI_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CQ_TSIZE_0 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC1_QM0_CQ_CTL_0 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC1_QM0_CQ_CTL_0 & PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC1_QM0_CQ_CTL_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CQ_PTR_LO_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CQ_PTR_HI_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CQ_TSIZE_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CQ_CTL_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CQ_PTR_LO_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CQ_PTR_HI_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CQ_TSIZE_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CQ_CTL_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CQ_PTR_LO_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CQ_PTR_HI_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CQ_TSIZE_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CQ_CTL_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CQ_PTR_LO_STS_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CQ_PTR_LO_STS_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CQ_PTR_LO_STS_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CQ_PTR_LO_STS_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CQ_PTR_LO_STS_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CQ_PTR_HI_STS_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CQ_PTR_HI_STS_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CQ_PTR_HI_STS_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CQ_PTR_HI_STS_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CQ_PTR_HI_STS_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CQ_TSIZE_STS_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CQ_TSIZE_STS_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CQ_TSIZE_STS_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CQ_TSIZE_STS_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CQ_TSIZE_STS_4 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC1_QM0_CQ_CTL_STS_0 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC1_QM0_CQ_CTL_STS_0 & PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC1_QM0_CQ_CTL_STS_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CQ_CTL_STS_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CQ_CTL_STS_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CQ_CTL_STS_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CQ_CTL_STS_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CQ_IFIFO_CNT_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CQ_IFIFO_CNT_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CQ_IFIFO_CNT_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CQ_IFIFO_CNT_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CQ_IFIFO_CNT_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CP_MSG_BASE0_ADDR_LO_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CP_MSG_BASE0_ADDR_LO_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CP_MSG_BASE0_ADDR_LO_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CP_MSG_BASE0_ADDR_LO_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CP_MSG_BASE0_ADDR_LO_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CP_MSG_BASE0_ADDR_HI_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CP_MSG_BASE0_ADDR_HI_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CP_MSG_BASE0_ADDR_HI_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CP_MSG_BASE0_ADDR_HI_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CP_MSG_BASE0_ADDR_HI_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CP_MSG_BASE1_ADDR_LO_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CP_MSG_BASE1_ADDR_LO_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CP_MSG_BASE1_ADDR_LO_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CP_MSG_BASE1_ADDR_LO_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CP_MSG_BASE1_ADDR_LO_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CP_MSG_BASE1_ADDR_HI_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CP_MSG_BASE1_ADDR_HI_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CP_MSG_BASE1_ADDR_HI_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CP_MSG_BASE1_ADDR_HI_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CP_MSG_BASE1_ADDR_HI_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CP_MSG_BASE2_ADDR_LO_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CP_MSG_BASE2_ADDR_LO_1 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC1_QM0_CP_MSG_BASE2_ADDR_LO_2 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC1_QM0_CP_MSG_BASE2_ADDR_LO_2 &
+			PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC1_QM0_CP_MSG_BASE2_ADDR_LO_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CP_MSG_BASE2_ADDR_LO_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CP_MSG_BASE2_ADDR_LO_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CP_MSG_BASE2_ADDR_HI_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CP_MSG_BASE2_ADDR_HI_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CP_MSG_BASE2_ADDR_HI_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CP_MSG_BASE2_ADDR_HI_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CP_MSG_BASE2_ADDR_HI_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CP_MSG_BASE3_ADDR_LO_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CP_MSG_BASE3_ADDR_LO_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CP_MSG_BASE3_ADDR_LO_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CP_MSG_BASE3_ADDR_LO_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CP_MSG_BASE3_ADDR_LO_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CP_MSG_BASE3_ADDR_HI_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CP_MSG_BASE3_ADDR_HI_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CP_MSG_BASE3_ADDR_HI_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CP_MSG_BASE3_ADDR_HI_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CP_MSG_BASE3_ADDR_HI_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CP_LDMA_TSIZE_OFFSET_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CP_LDMA_TSIZE_OFFSET_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CP_LDMA_TSIZE_OFFSET_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CP_LDMA_TSIZE_OFFSET_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CP_LDMA_TSIZE_OFFSET_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CP_LDMA_SRC_BASE_LO_OFFSET_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CP_LDMA_SRC_BASE_LO_OFFSET_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CP_LDMA_SRC_BASE_LO_OFFSET_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CP_LDMA_SRC_BASE_LO_OFFSET_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CP_LDMA_SRC_BASE_LO_OFFSET_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CP_LDMA_DST_BASE_LO_OFFSET_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CP_LDMA_DST_BASE_LO_OFFSET_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CP_LDMA_DST_BASE_LO_OFFSET_2 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC1_QM0_CP_LDMA_DST_BASE_LO_OFFSET_3 & ~0xFFF) +
+			PROT_BITS_OFFS;
+	word_offset = ((mmNIC1_QM0_CP_LDMA_DST_BASE_LO_OFFSET_3 &
+			PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC1_QM0_CP_LDMA_DST_BASE_LO_OFFSET_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CP_LDMA_DST_BASE_LO_OFFSET_4 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC1_QM0_CP_STS_0 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC1_QM0_CP_STS_0 & PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC1_QM0_CP_STS_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CP_STS_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CP_STS_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CP_STS_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CP_STS_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CP_CURRENT_INST_LO_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CP_CURRENT_INST_LO_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CP_CURRENT_INST_LO_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CP_CURRENT_INST_LO_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CP_CURRENT_INST_LO_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CP_CURRENT_INST_HI_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CP_CURRENT_INST_HI_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CP_CURRENT_INST_HI_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CP_CURRENT_INST_HI_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CP_CURRENT_INST_HI_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CP_BARRIER_CFG_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CP_BARRIER_CFG_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CP_BARRIER_CFG_2 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC1_QM0_CP_BARRIER_CFG_3 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC1_QM0_CP_BARRIER_CFG_3 & PROT_BITS_OFFS)
+			>> 7) << 2;
+	mask = 1U << ((mmNIC1_QM0_CP_BARRIER_CFG_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CP_BARRIER_CFG_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CP_DBG_0_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CP_DBG_0_1 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC1_QM0_CP_DBG_0_2 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC1_QM0_CP_DBG_0_2 & PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC1_QM0_CP_DBG_0_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CP_DBG_0_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CP_DBG_0_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CP_ARUSER_31_11_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CP_ARUSER_31_11_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CP_ARUSER_31_11_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CP_ARUSER_31_11_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CP_ARUSER_31_11_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CP_AWUSER_31_11_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CP_AWUSER_31_11_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CP_AWUSER_31_11_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CP_AWUSER_31_11_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CP_AWUSER_31_11_4 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC1_QM0_ARB_CFG_0 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC1_QM0_ARB_CFG_0 & PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC1_QM0_ARB_CFG_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_ARB_MST_AVAIL_CRED_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_ARB_MST_AVAIL_CRED_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_ARB_MST_AVAIL_CRED_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_ARB_MST_AVAIL_CRED_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_ARB_MST_AVAIL_CRED_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_ARB_MST_AVAIL_CRED_5 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_ARB_MST_AVAIL_CRED_6 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_ARB_MST_AVAIL_CRED_7 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_ARB_MST_AVAIL_CRED_8 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_ARB_MST_AVAIL_CRED_9 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_ARB_MST_AVAIL_CRED_10 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_ARB_MST_AVAIL_CRED_11 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_ARB_MST_AVAIL_CRED_12 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_ARB_MST_AVAIL_CRED_13 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_ARB_MST_AVAIL_CRED_14 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_ARB_MST_AVAIL_CRED_15 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_ARB_MST_AVAIL_CRED_16 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_ARB_MST_AVAIL_CRED_17 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_ARB_MST_AVAIL_CRED_18 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_ARB_MST_AVAIL_CRED_19 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_ARB_MST_AVAIL_CRED_20 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_ARB_MST_AVAIL_CRED_21 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_ARB_MST_AVAIL_CRED_22 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_ARB_MST_AVAIL_CRED_23 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC1_QM0_ARB_MST_AVAIL_CRED_24 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC1_QM0_ARB_MST_AVAIL_CRED_24 &
+			PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC1_QM0_ARB_MST_AVAIL_CRED_24 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_ARB_MST_AVAIL_CRED_25 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_ARB_MST_AVAIL_CRED_26 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_ARB_MST_AVAIL_CRED_27 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_ARB_MST_AVAIL_CRED_28 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_ARB_MST_AVAIL_CRED_29 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_ARB_MST_AVAIL_CRED_30 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_ARB_MST_AVAIL_CRED_31 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC1_QM0_ARB_MST_CHOISE_PUSH_OFST_23 & ~0xFFF) +
+			PROT_BITS_OFFS;
+	word_offset = ((mmNIC1_QM0_ARB_MST_CHOISE_PUSH_OFST_23 &
+			PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC1_QM0_ARB_SLV_CHOISE_WDT & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_ARB_MSG_MAX_INFLIGHT & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_ARB_MSG_AWUSER_31_11 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_ARB_MSG_AWUSER_SEC_PROP & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_ARB_MSG_AWUSER_NON_SEC_PROP & 0x7F) >> 2);
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC1_QM0_ARB_STATE_STS & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC1_QM0_ARB_STATE_STS & PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC1_QM0_ARB_STATE_STS & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_ARB_CHOISE_FULLNESS_STS & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_ARB_MSG_STS & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_ARB_SLV_CHOISE_Q_HEAD & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_ARB_ERR_CAUSE & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_ARB_ERR_MSG_EN & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_ARB_ERR_STS_DRP & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_ARB_MST_CRED_STS_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_ARB_MST_CRED_STS_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_ARB_MST_CRED_STS_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_ARB_MST_CRED_STS_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_ARB_MST_CRED_STS_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_ARB_MST_CRED_STS_5 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_ARB_MST_CRED_STS_6 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_ARB_MST_CRED_STS_7 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_ARB_MST_CRED_STS_8 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_ARB_MST_CRED_STS_9 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_ARB_MST_CRED_STS_10 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_ARB_MST_CRED_STS_11 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_ARB_MST_CRED_STS_12 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_ARB_MST_CRED_STS_13 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_ARB_MST_CRED_STS_14 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_ARB_MST_CRED_STS_15 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_ARB_MST_CRED_STS_16 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_ARB_MST_CRED_STS_17 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_ARB_MST_CRED_STS_18 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_ARB_MST_CRED_STS_19 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC1_QM0_ARB_MST_CRED_STS_20 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC1_QM0_ARB_MST_CRED_STS_20 & PROT_BITS_OFFS)
+			>> 7) << 2;
+	mask = 1U << ((mmNIC1_QM0_ARB_MST_CRED_STS_20 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_ARB_MST_CRED_STS_21 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_ARB_MST_CRED_STS_22 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_ARB_MST_CRED_STS_23 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_ARB_MST_CRED_STS_24 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_ARB_MST_CRED_STS_25 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_ARB_MST_CRED_STS_26 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_ARB_MST_CRED_STS_27 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_ARB_MST_CRED_STS_28 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_ARB_MST_CRED_STS_29 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_ARB_MST_CRED_STS_30 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_ARB_MST_CRED_STS_31 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CGM_CFG & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CGM_STS & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CGM_CFG1 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC1_QM0_LOCAL_RANGE_BASE & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC1_QM0_LOCAL_RANGE_BASE & PROT_BITS_OFFS)
+			>> 7) << 2;
+	mask = 1U << ((mmNIC1_QM0_LOCAL_RANGE_BASE & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_LOCAL_RANGE_SIZE & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_CSMR_STRICT_PRIO_CFG & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_HBW_RD_RATE_LIM_CFG_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_LBW_WR_RATE_LIM_CFG_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_LBW_WR_RATE_LIM_CFG_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_HBW_RD_RATE_LIM_CFG_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_GLBL_AXCACHE & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_IND_GW_APB_CFG & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_IND_GW_APB_WDATA & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_IND_GW_APB_RDATA & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_IND_GW_APB_STATUS & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_GLBL_ERR_ADDR_LO & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_GLBL_ERR_ADDR_HI & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM0_GLBL_ERR_WDATA & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC1_QM0_GLBL_MEM_INIT_BUSY & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC1_QM0_GLBL_MEM_INIT_BUSY & PROT_BITS_OFFS)
+			>> 7) << 2;
+	mask = 1U << ((mmNIC1_QM0_GLBL_MEM_INIT_BUSY & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC1_QM1_GLBL_CFG0 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC1_QM1_GLBL_CFG0 & PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC1_QM1_GLBL_CFG0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_GLBL_CFG1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_GLBL_PROT & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_GLBL_ERR_CFG & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_GLBL_SECURE_PROPS_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_GLBL_SECURE_PROPS_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_GLBL_SECURE_PROPS_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_GLBL_SECURE_PROPS_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_GLBL_SECURE_PROPS_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_GLBL_NON_SECURE_PROPS_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_GLBL_NON_SECURE_PROPS_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_GLBL_NON_SECURE_PROPS_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_GLBL_NON_SECURE_PROPS_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_GLBL_NON_SECURE_PROPS_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_GLBL_STS0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_GLBL_STS1_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_GLBL_STS1_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_GLBL_STS1_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_GLBL_STS1_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_GLBL_STS1_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_GLBL_MSG_EN_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_GLBL_MSG_EN_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_GLBL_MSG_EN_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_GLBL_MSG_EN_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_GLBL_MSG_EN_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_PQ_BASE_LO_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_PQ_BASE_LO_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_PQ_BASE_LO_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_PQ_BASE_LO_3 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC1_QM1_PQ_BASE_HI_0 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC1_QM1_PQ_BASE_HI_0 & PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC1_QM1_PQ_BASE_HI_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_PQ_BASE_HI_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_PQ_BASE_HI_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_PQ_BASE_HI_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_PQ_SIZE_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_PQ_SIZE_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_PQ_SIZE_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_PQ_SIZE_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_PQ_PI_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_PQ_PI_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_PQ_PI_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_PQ_PI_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_PQ_CI_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_PQ_CI_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_PQ_CI_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_PQ_CI_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_PQ_CFG0_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_PQ_CFG0_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_PQ_CFG0_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_PQ_CFG0_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_PQ_CFG1_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_PQ_CFG1_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_PQ_CFG1_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_PQ_CFG1_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_PQ_ARUSER_31_11_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_PQ_ARUSER_31_11_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_PQ_ARUSER_31_11_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_PQ_ARUSER_31_11_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_PQ_STS0_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_PQ_STS0_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_PQ_STS0_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_PQ_STS0_3 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC1_QM1_PQ_STS1_0 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC1_QM1_PQ_STS1_0 & PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC1_QM1_PQ_STS1_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_PQ_STS1_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_PQ_STS1_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_PQ_STS1_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CQ_STS0_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CQ_STS0_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CQ_STS0_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CQ_STS0_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CQ_STS1_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CQ_STS1_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CQ_STS1_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CQ_STS1_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CQ_PTR_LO_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CQ_PTR_HI_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CQ_TSIZE_0 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC1_QM1_CQ_CTL_0 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC1_QM1_CQ_CTL_0 & PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC1_QM1_CQ_CTL_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CQ_PTR_LO_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CQ_PTR_HI_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CQ_TSIZE_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CQ_CTL_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CQ_PTR_LO_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CQ_PTR_HI_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CQ_TSIZE_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CQ_CTL_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CQ_PTR_LO_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CQ_PTR_HI_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CQ_TSIZE_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CQ_CTL_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CQ_PTR_LO_STS_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CQ_PTR_LO_STS_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CQ_PTR_LO_STS_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CQ_PTR_LO_STS_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CQ_PTR_LO_STS_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CQ_PTR_HI_STS_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CQ_PTR_HI_STS_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CQ_PTR_HI_STS_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CQ_PTR_HI_STS_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CQ_PTR_HI_STS_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CQ_TSIZE_STS_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CQ_TSIZE_STS_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CQ_TSIZE_STS_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CQ_TSIZE_STS_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CQ_TSIZE_STS_4 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC1_QM1_CQ_CTL_STS_0 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC1_QM1_CQ_CTL_STS_0 & PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC1_QM1_CQ_CTL_STS_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CQ_CTL_STS_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CQ_CTL_STS_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CQ_CTL_STS_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CQ_CTL_STS_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CQ_IFIFO_CNT_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CQ_IFIFO_CNT_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CQ_IFIFO_CNT_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CQ_IFIFO_CNT_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CQ_IFIFO_CNT_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CP_MSG_BASE0_ADDR_LO_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CP_MSG_BASE0_ADDR_LO_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CP_MSG_BASE0_ADDR_LO_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CP_MSG_BASE0_ADDR_LO_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CP_MSG_BASE0_ADDR_LO_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CP_MSG_BASE0_ADDR_HI_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CP_MSG_BASE0_ADDR_HI_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CP_MSG_BASE0_ADDR_HI_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CP_MSG_BASE0_ADDR_HI_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CP_MSG_BASE0_ADDR_HI_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CP_MSG_BASE1_ADDR_LO_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CP_MSG_BASE1_ADDR_LO_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CP_MSG_BASE1_ADDR_LO_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CP_MSG_BASE1_ADDR_LO_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CP_MSG_BASE1_ADDR_LO_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CP_MSG_BASE1_ADDR_HI_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CP_MSG_BASE1_ADDR_HI_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CP_MSG_BASE1_ADDR_HI_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CP_MSG_BASE1_ADDR_HI_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CP_MSG_BASE1_ADDR_HI_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CP_MSG_BASE2_ADDR_LO_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CP_MSG_BASE2_ADDR_LO_1 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC1_QM1_CP_MSG_BASE2_ADDR_LO_2 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC1_QM1_CP_MSG_BASE2_ADDR_LO_2 &
+			PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC1_QM1_CP_MSG_BASE2_ADDR_LO_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CP_MSG_BASE2_ADDR_LO_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CP_MSG_BASE2_ADDR_LO_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CP_MSG_BASE2_ADDR_HI_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CP_MSG_BASE2_ADDR_HI_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CP_MSG_BASE2_ADDR_HI_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CP_MSG_BASE2_ADDR_HI_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CP_MSG_BASE2_ADDR_HI_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CP_MSG_BASE3_ADDR_LO_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CP_MSG_BASE3_ADDR_LO_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CP_MSG_BASE3_ADDR_LO_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CP_MSG_BASE3_ADDR_LO_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CP_MSG_BASE3_ADDR_LO_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CP_MSG_BASE3_ADDR_HI_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CP_MSG_BASE3_ADDR_HI_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CP_MSG_BASE3_ADDR_HI_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CP_MSG_BASE3_ADDR_HI_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CP_MSG_BASE3_ADDR_HI_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CP_LDMA_TSIZE_OFFSET_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CP_LDMA_TSIZE_OFFSET_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CP_LDMA_TSIZE_OFFSET_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CP_LDMA_TSIZE_OFFSET_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CP_LDMA_TSIZE_OFFSET_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CP_LDMA_SRC_BASE_LO_OFFSET_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CP_LDMA_SRC_BASE_LO_OFFSET_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CP_LDMA_SRC_BASE_LO_OFFSET_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CP_LDMA_SRC_BASE_LO_OFFSET_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CP_LDMA_SRC_BASE_LO_OFFSET_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CP_LDMA_DST_BASE_LO_OFFSET_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CP_LDMA_DST_BASE_LO_OFFSET_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CP_LDMA_DST_BASE_LO_OFFSET_2 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC1_QM1_CP_LDMA_DST_BASE_LO_OFFSET_3 & ~0xFFF) +
+			PROT_BITS_OFFS;
+	word_offset = ((mmNIC1_QM1_CP_LDMA_DST_BASE_LO_OFFSET_3 &
+			PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC1_QM1_CP_LDMA_DST_BASE_LO_OFFSET_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CP_LDMA_DST_BASE_LO_OFFSET_4 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC1_QM1_CP_STS_0 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC1_QM1_CP_STS_0 & PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC1_QM1_CP_STS_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CP_STS_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CP_STS_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CP_STS_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CP_STS_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CP_CURRENT_INST_LO_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CP_CURRENT_INST_LO_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CP_CURRENT_INST_LO_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CP_CURRENT_INST_LO_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CP_CURRENT_INST_LO_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CP_CURRENT_INST_HI_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CP_CURRENT_INST_HI_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CP_CURRENT_INST_HI_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CP_CURRENT_INST_HI_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CP_CURRENT_INST_HI_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CP_BARRIER_CFG_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CP_BARRIER_CFG_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CP_BARRIER_CFG_2 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC1_QM1_CP_BARRIER_CFG_3 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC1_QM1_CP_BARRIER_CFG_3 & PROT_BITS_OFFS)
+			>> 7) << 2;
+	mask = 1U << ((mmNIC1_QM1_CP_BARRIER_CFG_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CP_BARRIER_CFG_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CP_DBG_0_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CP_DBG_0_1 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC1_QM1_CP_DBG_0_2 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC1_QM1_CP_DBG_0_2 & PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC1_QM1_CP_DBG_0_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CP_DBG_0_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CP_DBG_0_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CP_ARUSER_31_11_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CP_ARUSER_31_11_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CP_ARUSER_31_11_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CP_ARUSER_31_11_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CP_ARUSER_31_11_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CP_AWUSER_31_11_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CP_AWUSER_31_11_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CP_AWUSER_31_11_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CP_AWUSER_31_11_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CP_AWUSER_31_11_4 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC1_QM1_ARB_CFG_0 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC1_QM1_ARB_CFG_0 & PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC1_QM1_ARB_CFG_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_ARB_MST_AVAIL_CRED_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_ARB_MST_AVAIL_CRED_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_ARB_MST_AVAIL_CRED_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_ARB_MST_AVAIL_CRED_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_ARB_MST_AVAIL_CRED_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_ARB_MST_AVAIL_CRED_5 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_ARB_MST_AVAIL_CRED_6 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_ARB_MST_AVAIL_CRED_7 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_ARB_MST_AVAIL_CRED_8 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_ARB_MST_AVAIL_CRED_9 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_ARB_MST_AVAIL_CRED_10 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_ARB_MST_AVAIL_CRED_11 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_ARB_MST_AVAIL_CRED_12 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_ARB_MST_AVAIL_CRED_13 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_ARB_MST_AVAIL_CRED_14 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_ARB_MST_AVAIL_CRED_15 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_ARB_MST_AVAIL_CRED_16 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_ARB_MST_AVAIL_CRED_17 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_ARB_MST_AVAIL_CRED_18 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_ARB_MST_AVAIL_CRED_19 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_ARB_MST_AVAIL_CRED_20 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_ARB_MST_AVAIL_CRED_21 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_ARB_MST_AVAIL_CRED_22 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_ARB_MST_AVAIL_CRED_23 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC1_QM1_ARB_MST_AVAIL_CRED_24 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC1_QM1_ARB_MST_AVAIL_CRED_24 &
+			PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC1_QM1_ARB_MST_AVAIL_CRED_24 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_ARB_MST_AVAIL_CRED_25 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_ARB_MST_AVAIL_CRED_26 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_ARB_MST_AVAIL_CRED_27 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_ARB_MST_AVAIL_CRED_28 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_ARB_MST_AVAIL_CRED_29 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_ARB_MST_AVAIL_CRED_30 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_ARB_MST_AVAIL_CRED_31 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC1_QM1_ARB_MST_CHOISE_PUSH_OFST_23 & ~0xFFF) +
+			PROT_BITS_OFFS;
+	word_offset = ((mmNIC1_QM1_ARB_MST_CHOISE_PUSH_OFST_23 &
+			PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC1_QM1_ARB_SLV_CHOISE_WDT & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_ARB_MSG_MAX_INFLIGHT & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_ARB_MSG_AWUSER_31_11 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_ARB_MSG_AWUSER_SEC_PROP & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_ARB_MSG_AWUSER_NON_SEC_PROP & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC1_QM1_ARB_STATE_STS & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC1_QM1_ARB_STATE_STS & PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC1_QM1_ARB_STATE_STS & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_ARB_CHOISE_FULLNESS_STS & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_ARB_MSG_STS & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_ARB_SLV_CHOISE_Q_HEAD & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_ARB_ERR_CAUSE & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_ARB_ERR_MSG_EN & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_ARB_ERR_STS_DRP & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_ARB_MST_CRED_STS_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_ARB_MST_CRED_STS_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_ARB_MST_CRED_STS_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_ARB_MST_CRED_STS_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_ARB_MST_CRED_STS_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_ARB_MST_CRED_STS_5 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_ARB_MST_CRED_STS_6 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_ARB_MST_CRED_STS_7 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_ARB_MST_CRED_STS_8 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_ARB_MST_CRED_STS_9 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_ARB_MST_CRED_STS_10 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_ARB_MST_CRED_STS_11 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_ARB_MST_CRED_STS_12 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_ARB_MST_CRED_STS_13 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_ARB_MST_CRED_STS_14 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_ARB_MST_CRED_STS_15 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_ARB_MST_CRED_STS_16 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_ARB_MST_CRED_STS_17 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_ARB_MST_CRED_STS_18 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_ARB_MST_CRED_STS_19 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC1_QM1_ARB_MST_CRED_STS_20 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC1_QM1_ARB_MST_CRED_STS_20 & PROT_BITS_OFFS)
+			>> 7) << 2;
+	mask = 1U << ((mmNIC1_QM1_ARB_MST_CRED_STS_20 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_ARB_MST_CRED_STS_21 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_ARB_MST_CRED_STS_22 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_ARB_MST_CRED_STS_23 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_ARB_MST_CRED_STS_24 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_ARB_MST_CRED_STS_25 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_ARB_MST_CRED_STS_26 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_ARB_MST_CRED_STS_27 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_ARB_MST_CRED_STS_28 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_ARB_MST_CRED_STS_29 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_ARB_MST_CRED_STS_30 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_ARB_MST_CRED_STS_31 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CGM_CFG & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CGM_STS & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CGM_CFG1 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC1_QM1_LOCAL_RANGE_BASE & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC1_QM1_LOCAL_RANGE_BASE & PROT_BITS_OFFS)
+			>> 7) << 2;
+	mask = 1U << ((mmNIC1_QM1_LOCAL_RANGE_BASE & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_LOCAL_RANGE_SIZE & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_CSMR_STRICT_PRIO_CFG & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_HBW_RD_RATE_LIM_CFG_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_LBW_WR_RATE_LIM_CFG_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_LBW_WR_RATE_LIM_CFG_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_HBW_RD_RATE_LIM_CFG_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_GLBL_AXCACHE & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_IND_GW_APB_CFG & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_IND_GW_APB_WDATA & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_IND_GW_APB_RDATA & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_IND_GW_APB_STATUS & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_GLBL_ERR_ADDR_LO & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_GLBL_ERR_ADDR_HI & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC1_QM1_GLBL_ERR_WDATA & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC1_QM1_GLBL_MEM_INIT_BUSY & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC1_QM1_GLBL_MEM_INIT_BUSY & PROT_BITS_OFFS)
+			>> 7) << 2;
+	mask = 1U << ((mmNIC1_QM1_GLBL_MEM_INIT_BUSY & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
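The blocks above repeat one address computation for every group of registers. As a minimal sketch of that arithmetic, and not the driver's own code, the helpers below are hypothetical names introduced only for illustration. The value 0xF80 for PROT_BITS_OFFS is an assumption; the driver supplies the real definition. Under that assumption, each 4 KB register block appears to keep its protection bits at offset PROT_BITS_OFFS, each 32-bit protection word covers a 128-byte window (32 registers of 4 bytes), and the patch writes the complement of the accumulated mask to that word.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed for illustration only; the driver supplies the real definition. */
#define PROT_BITS_OFFS 0xF80u

/* Address of the 32-bit protection word covering the 128-byte window of reg. */
static inline uint32_t pb_word_addr(uint32_t reg)
{
	/* Base of the 4 KB block plus the start of its protection-bits area. */
	uint32_t pb_addr = (reg & ~0xFFFu) + PROT_BITS_OFFS;
	/* Index of the 128-byte window inside the block, scaled to bytes. */
	uint32_t word_offset = ((reg & PROT_BITS_OFFS) >> 7) << 2;

	return pb_addr + word_offset;
}

/* Bit for reg inside its protection word: one bit per 4-byte register. */
static inline uint32_t pb_bit(uint32_t reg)
{
	return 1u << ((reg & 0x7Fu) >> 2);
}

/* OR together the bits of n registers that must all share one word. */
static inline uint32_t pb_build_mask(const uint32_t *regs, size_t n)
{
	uint32_t mask = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		/* Mixing registers from different windows would be a bug. */
		assert(pb_word_addr(regs[i]) == pb_word_addr(regs[0]));
		mask |= pb_bit(regs[i]);
	}
	return mask;
}
```

For the made-up addresses 0x123A00, 0x123A18 and 0x123A20, pb_word_addr() gives 0x123FD0 for all three and pb_build_mask() gives 0x141, so the value written in the style of the patch would be ~0x141. The same-window check also shows why the patch may take pb_addr and word_offset from one register and start the mask with a different one, as long as both lie in the same 128-byte window.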
+	WREG32(mmNIC2_QM0_BASE - CFG_BASE + PROT_BITS_OFFS + 0x7C, 0);
+	WREG32(mmNIC2_QM1_BASE - CFG_BASE + PROT_BITS_OFFS + 0x7C, 0);
+
+	pb_addr = (mmNIC2_QM0_GLBL_CFG0 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC2_QM0_GLBL_CFG0 & PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC2_QM0_GLBL_CFG0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_GLBL_CFG1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_GLBL_PROT & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_GLBL_ERR_CFG & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_GLBL_SECURE_PROPS_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_GLBL_SECURE_PROPS_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_GLBL_SECURE_PROPS_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_GLBL_SECURE_PROPS_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_GLBL_SECURE_PROPS_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_GLBL_NON_SECURE_PROPS_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_GLBL_NON_SECURE_PROPS_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_GLBL_NON_SECURE_PROPS_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_GLBL_NON_SECURE_PROPS_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_GLBL_NON_SECURE_PROPS_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_GLBL_STS0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_GLBL_STS1_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_GLBL_STS1_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_GLBL_STS1_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_GLBL_STS1_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_GLBL_STS1_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_GLBL_MSG_EN_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_GLBL_MSG_EN_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_GLBL_MSG_EN_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_GLBL_MSG_EN_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_GLBL_MSG_EN_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_PQ_BASE_LO_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_PQ_BASE_LO_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_PQ_BASE_LO_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_PQ_BASE_LO_3 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC2_QM0_PQ_BASE_HI_0 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC2_QM0_PQ_BASE_HI_0 & PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC2_QM0_PQ_BASE_HI_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_PQ_BASE_HI_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_PQ_BASE_HI_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_PQ_BASE_HI_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_PQ_SIZE_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_PQ_SIZE_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_PQ_SIZE_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_PQ_SIZE_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_PQ_PI_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_PQ_PI_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_PQ_PI_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_PQ_PI_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_PQ_CI_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_PQ_CI_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_PQ_CI_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_PQ_CI_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_PQ_CFG0_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_PQ_CFG0_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_PQ_CFG0_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_PQ_CFG0_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_PQ_CFG1_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_PQ_CFG1_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_PQ_CFG1_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_PQ_CFG1_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_PQ_ARUSER_31_11_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_PQ_ARUSER_31_11_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_PQ_ARUSER_31_11_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_PQ_ARUSER_31_11_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_PQ_STS0_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_PQ_STS0_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_PQ_STS0_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_PQ_STS0_3 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC2_QM0_PQ_STS1_0 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC2_QM0_PQ_STS1_0 & PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC2_QM0_PQ_STS1_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_PQ_STS1_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_PQ_STS1_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_PQ_STS1_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CQ_STS0_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CQ_STS0_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CQ_STS0_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CQ_STS0_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CQ_STS1_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CQ_STS1_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CQ_STS1_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CQ_STS1_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CQ_PTR_LO_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CQ_PTR_HI_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CQ_TSIZE_0 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC2_QM0_CQ_CTL_0 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC2_QM0_CQ_CTL_0 & PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC2_QM0_CQ_CTL_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CQ_PTR_LO_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CQ_PTR_HI_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CQ_TSIZE_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CQ_CTL_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CQ_PTR_LO_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CQ_PTR_HI_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CQ_TSIZE_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CQ_CTL_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CQ_PTR_LO_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CQ_PTR_HI_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CQ_TSIZE_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CQ_CTL_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CQ_PTR_LO_STS_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CQ_PTR_LO_STS_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CQ_PTR_LO_STS_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CQ_PTR_LO_STS_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CQ_PTR_LO_STS_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CQ_PTR_HI_STS_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CQ_PTR_HI_STS_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CQ_PTR_HI_STS_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CQ_PTR_HI_STS_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CQ_PTR_HI_STS_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CQ_TSIZE_STS_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CQ_TSIZE_STS_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CQ_TSIZE_STS_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CQ_TSIZE_STS_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CQ_TSIZE_STS_4 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC2_QM0_CQ_CTL_STS_0 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC2_QM0_CQ_CTL_STS_0 & PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC2_QM0_CQ_CTL_STS_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CQ_CTL_STS_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CQ_CTL_STS_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CQ_CTL_STS_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CQ_CTL_STS_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CQ_IFIFO_CNT_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CQ_IFIFO_CNT_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CQ_IFIFO_CNT_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CQ_IFIFO_CNT_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CQ_IFIFO_CNT_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CP_MSG_BASE0_ADDR_LO_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CP_MSG_BASE0_ADDR_LO_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CP_MSG_BASE0_ADDR_LO_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CP_MSG_BASE0_ADDR_LO_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CP_MSG_BASE0_ADDR_LO_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CP_MSG_BASE0_ADDR_HI_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CP_MSG_BASE0_ADDR_HI_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CP_MSG_BASE0_ADDR_HI_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CP_MSG_BASE0_ADDR_HI_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CP_MSG_BASE0_ADDR_HI_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CP_MSG_BASE1_ADDR_LO_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CP_MSG_BASE1_ADDR_LO_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CP_MSG_BASE1_ADDR_LO_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CP_MSG_BASE1_ADDR_LO_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CP_MSG_BASE1_ADDR_LO_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CP_MSG_BASE1_ADDR_HI_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CP_MSG_BASE1_ADDR_HI_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CP_MSG_BASE1_ADDR_HI_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CP_MSG_BASE1_ADDR_HI_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CP_MSG_BASE1_ADDR_HI_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CP_MSG_BASE2_ADDR_LO_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CP_MSG_BASE2_ADDR_LO_1 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC2_QM0_CP_MSG_BASE2_ADDR_LO_2 & ~0xFFF) +
+			PROT_BITS_OFFS;
+	word_offset = ((mmNIC2_QM0_CP_MSG_BASE2_ADDR_LO_2 & PROT_BITS_OFFS)
+				>> 7) << 2;
+	mask = 1U << ((mmNIC2_QM0_CP_MSG_BASE2_ADDR_LO_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CP_MSG_BASE2_ADDR_LO_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CP_MSG_BASE2_ADDR_LO_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CP_MSG_BASE2_ADDR_HI_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CP_MSG_BASE2_ADDR_HI_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CP_MSG_BASE2_ADDR_HI_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CP_MSG_BASE2_ADDR_HI_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CP_MSG_BASE2_ADDR_HI_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CP_MSG_BASE3_ADDR_LO_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CP_MSG_BASE3_ADDR_LO_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CP_MSG_BASE3_ADDR_LO_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CP_MSG_BASE3_ADDR_LO_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CP_MSG_BASE3_ADDR_LO_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CP_MSG_BASE3_ADDR_HI_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CP_MSG_BASE3_ADDR_HI_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CP_MSG_BASE3_ADDR_HI_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CP_MSG_BASE3_ADDR_HI_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CP_MSG_BASE3_ADDR_HI_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CP_LDMA_TSIZE_OFFSET_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CP_LDMA_TSIZE_OFFSET_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CP_LDMA_TSIZE_OFFSET_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CP_LDMA_TSIZE_OFFSET_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CP_LDMA_TSIZE_OFFSET_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CP_LDMA_SRC_BASE_LO_OFFSET_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CP_LDMA_SRC_BASE_LO_OFFSET_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CP_LDMA_SRC_BASE_LO_OFFSET_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CP_LDMA_SRC_BASE_LO_OFFSET_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CP_LDMA_SRC_BASE_LO_OFFSET_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CP_LDMA_DST_BASE_LO_OFFSET_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CP_LDMA_DST_BASE_LO_OFFSET_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CP_LDMA_DST_BASE_LO_OFFSET_2 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC2_QM0_CP_LDMA_DST_BASE_LO_OFFSET_3 & ~0xFFF) +
+			PROT_BITS_OFFS;
+	word_offset = ((mmNIC2_QM0_CP_LDMA_DST_BASE_LO_OFFSET_3 &
+			PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC2_QM0_CP_LDMA_DST_BASE_LO_OFFSET_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CP_LDMA_DST_BASE_LO_OFFSET_4 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC2_QM0_CP_STS_0 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC2_QM0_CP_STS_0 & PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC2_QM0_CP_STS_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CP_STS_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CP_STS_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CP_STS_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CP_STS_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CP_CURRENT_INST_LO_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CP_CURRENT_INST_LO_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CP_CURRENT_INST_LO_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CP_CURRENT_INST_LO_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CP_CURRENT_INST_LO_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CP_CURRENT_INST_HI_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CP_CURRENT_INST_HI_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CP_CURRENT_INST_HI_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CP_CURRENT_INST_HI_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CP_CURRENT_INST_HI_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CP_BARRIER_CFG_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CP_BARRIER_CFG_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CP_BARRIER_CFG_2 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC2_QM0_CP_BARRIER_CFG_3 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC2_QM0_CP_BARRIER_CFG_3 & PROT_BITS_OFFS)
+			>> 7) << 2;
+	mask = 1U << ((mmNIC2_QM0_CP_BARRIER_CFG_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CP_BARRIER_CFG_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CP_DBG_0_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CP_DBG_0_1 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC2_QM0_CP_DBG_0_2 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC2_QM0_CP_DBG_0_2 & PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC2_QM0_CP_DBG_0_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CP_DBG_0_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CP_DBG_0_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CP_ARUSER_31_11_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CP_ARUSER_31_11_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CP_ARUSER_31_11_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CP_ARUSER_31_11_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CP_ARUSER_31_11_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CP_AWUSER_31_11_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CP_AWUSER_31_11_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CP_AWUSER_31_11_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CP_AWUSER_31_11_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CP_AWUSER_31_11_4 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC2_QM0_ARB_CFG_0 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC2_QM0_ARB_CFG_0 & PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC2_QM0_ARB_CFG_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_ARB_MST_AVAIL_CRED_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_ARB_MST_AVAIL_CRED_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_ARB_MST_AVAIL_CRED_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_ARB_MST_AVAIL_CRED_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_ARB_MST_AVAIL_CRED_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_ARB_MST_AVAIL_CRED_5 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_ARB_MST_AVAIL_CRED_6 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_ARB_MST_AVAIL_CRED_7 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_ARB_MST_AVAIL_CRED_8 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_ARB_MST_AVAIL_CRED_9 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_ARB_MST_AVAIL_CRED_10 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_ARB_MST_AVAIL_CRED_11 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_ARB_MST_AVAIL_CRED_12 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_ARB_MST_AVAIL_CRED_13 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_ARB_MST_AVAIL_CRED_14 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_ARB_MST_AVAIL_CRED_15 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_ARB_MST_AVAIL_CRED_16 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_ARB_MST_AVAIL_CRED_17 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_ARB_MST_AVAIL_CRED_18 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_ARB_MST_AVAIL_CRED_19 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_ARB_MST_AVAIL_CRED_20 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_ARB_MST_AVAIL_CRED_21 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_ARB_MST_AVAIL_CRED_22 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_ARB_MST_AVAIL_CRED_23 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC2_QM0_ARB_MST_AVAIL_CRED_24 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC2_QM0_ARB_MST_AVAIL_CRED_24 &
+			PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC2_QM0_ARB_MST_AVAIL_CRED_24 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_ARB_MST_AVAIL_CRED_25 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_ARB_MST_AVAIL_CRED_26 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_ARB_MST_AVAIL_CRED_27 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_ARB_MST_AVAIL_CRED_28 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_ARB_MST_AVAIL_CRED_29 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_ARB_MST_AVAIL_CRED_30 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_ARB_MST_AVAIL_CRED_31 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC2_QM0_ARB_MST_CHOISE_PUSH_OFST_23 & ~0xFFF) +
+			PROT_BITS_OFFS;
+	word_offset = ((mmNIC2_QM0_ARB_MST_CHOISE_PUSH_OFST_23 &
+			PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC2_QM0_ARB_SLV_CHOISE_WDT & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_ARB_MSG_MAX_INFLIGHT & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_ARB_MSG_AWUSER_31_11 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_ARB_MSG_AWUSER_SEC_PROP & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_ARB_MSG_AWUSER_NON_SEC_PROP & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC2_QM0_ARB_STATE_STS & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC2_QM0_ARB_STATE_STS & PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC2_QM0_ARB_STATE_STS & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_ARB_CHOISE_FULLNESS_STS & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_ARB_MSG_STS & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_ARB_SLV_CHOISE_Q_HEAD & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_ARB_ERR_CAUSE & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_ARB_ERR_MSG_EN & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_ARB_ERR_STS_DRP & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_ARB_MST_CRED_STS_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_ARB_MST_CRED_STS_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_ARB_MST_CRED_STS_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_ARB_MST_CRED_STS_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_ARB_MST_CRED_STS_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_ARB_MST_CRED_STS_5 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_ARB_MST_CRED_STS_6 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_ARB_MST_CRED_STS_7 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_ARB_MST_CRED_STS_8 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_ARB_MST_CRED_STS_9 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_ARB_MST_CRED_STS_10 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_ARB_MST_CRED_STS_11 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_ARB_MST_CRED_STS_12 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_ARB_MST_CRED_STS_13 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_ARB_MST_CRED_STS_14 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_ARB_MST_CRED_STS_15 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_ARB_MST_CRED_STS_16 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_ARB_MST_CRED_STS_17 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_ARB_MST_CRED_STS_18 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_ARB_MST_CRED_STS_19 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC2_QM0_ARB_MST_CRED_STS_20 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC2_QM0_ARB_MST_CRED_STS_20 & PROT_BITS_OFFS)
+			>> 7) << 2;
+	mask = 1U << ((mmNIC2_QM0_ARB_MST_CRED_STS_20 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_ARB_MST_CRED_STS_21 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_ARB_MST_CRED_STS_22 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_ARB_MST_CRED_STS_23 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_ARB_MST_CRED_STS_24 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_ARB_MST_CRED_STS_25 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_ARB_MST_CRED_STS_26 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_ARB_MST_CRED_STS_27 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_ARB_MST_CRED_STS_28 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_ARB_MST_CRED_STS_29 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_ARB_MST_CRED_STS_30 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_ARB_MST_CRED_STS_31 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CGM_CFG & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CGM_STS & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CGM_CFG1 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC2_QM0_LOCAL_RANGE_BASE & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC2_QM0_LOCAL_RANGE_BASE & PROT_BITS_OFFS)
+			>> 7) << 2;
+	mask = 1U << ((mmNIC2_QM0_LOCAL_RANGE_BASE & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_LOCAL_RANGE_SIZE & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_CSMR_STRICT_PRIO_CFG & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_HBW_RD_RATE_LIM_CFG_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_LBW_WR_RATE_LIM_CFG_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_LBW_WR_RATE_LIM_CFG_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_HBW_RD_RATE_LIM_CFG_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_GLBL_AXCACHE & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_IND_GW_APB_CFG & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_IND_GW_APB_WDATA & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_IND_GW_APB_RDATA & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_IND_GW_APB_STATUS & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_GLBL_ERR_ADDR_LO & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_GLBL_ERR_ADDR_HI & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM0_GLBL_ERR_WDATA & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC2_QM0_GLBL_MEM_INIT_BUSY & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC2_QM0_GLBL_MEM_INIT_BUSY & PROT_BITS_OFFS)
+			>> 7) << 2;
+	mask = 1U << ((mmNIC2_QM0_GLBL_MEM_INIT_BUSY & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC2_QM1_GLBL_CFG0 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC2_QM1_GLBL_CFG0 & PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC2_QM1_GLBL_CFG0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_GLBL_CFG1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_GLBL_PROT & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_GLBL_ERR_CFG & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_GLBL_SECURE_PROPS_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_GLBL_SECURE_PROPS_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_GLBL_SECURE_PROPS_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_GLBL_SECURE_PROPS_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_GLBL_SECURE_PROPS_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_GLBL_NON_SECURE_PROPS_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_GLBL_NON_SECURE_PROPS_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_GLBL_NON_SECURE_PROPS_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_GLBL_NON_SECURE_PROPS_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_GLBL_NON_SECURE_PROPS_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_GLBL_STS0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_GLBL_STS1_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_GLBL_STS1_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_GLBL_STS1_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_GLBL_STS1_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_GLBL_STS1_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_GLBL_MSG_EN_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_GLBL_MSG_EN_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_GLBL_MSG_EN_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_GLBL_MSG_EN_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_GLBL_MSG_EN_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_PQ_BASE_LO_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_PQ_BASE_LO_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_PQ_BASE_LO_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_PQ_BASE_LO_3 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC2_QM1_PQ_BASE_HI_0 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC2_QM1_PQ_BASE_HI_0 & PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC2_QM1_PQ_BASE_HI_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_PQ_BASE_HI_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_PQ_BASE_HI_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_PQ_BASE_HI_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_PQ_SIZE_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_PQ_SIZE_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_PQ_SIZE_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_PQ_SIZE_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_PQ_PI_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_PQ_PI_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_PQ_PI_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_PQ_PI_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_PQ_CI_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_PQ_CI_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_PQ_CI_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_PQ_CI_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_PQ_CFG0_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_PQ_CFG0_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_PQ_CFG0_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_PQ_CFG0_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_PQ_CFG1_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_PQ_CFG1_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_PQ_CFG1_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_PQ_CFG1_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_PQ_ARUSER_31_11_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_PQ_ARUSER_31_11_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_PQ_ARUSER_31_11_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_PQ_ARUSER_31_11_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_PQ_STS0_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_PQ_STS0_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_PQ_STS0_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_PQ_STS0_3 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC2_QM1_PQ_STS1_0 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC2_QM1_PQ_STS1_0 & PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC2_QM1_PQ_STS1_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_PQ_STS1_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_PQ_STS1_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_PQ_STS1_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CQ_STS0_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CQ_STS0_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CQ_STS0_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CQ_STS0_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CQ_STS1_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CQ_STS1_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CQ_STS1_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CQ_STS1_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CQ_PTR_LO_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CQ_PTR_HI_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CQ_TSIZE_0 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC2_QM1_CQ_CTL_0 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC2_QM1_CQ_CTL_0 & PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC2_QM1_CQ_CTL_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CQ_PTR_LO_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CQ_PTR_HI_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CQ_TSIZE_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CQ_CTL_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CQ_PTR_LO_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CQ_PTR_HI_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CQ_TSIZE_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CQ_CTL_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CQ_PTR_LO_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CQ_PTR_HI_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CQ_TSIZE_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CQ_CTL_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CQ_PTR_LO_STS_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CQ_PTR_LO_STS_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CQ_PTR_LO_STS_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CQ_PTR_LO_STS_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CQ_PTR_LO_STS_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CQ_PTR_HI_STS_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CQ_PTR_HI_STS_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CQ_PTR_HI_STS_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CQ_PTR_HI_STS_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CQ_PTR_HI_STS_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CQ_TSIZE_STS_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CQ_TSIZE_STS_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CQ_TSIZE_STS_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CQ_TSIZE_STS_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CQ_TSIZE_STS_4 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC2_QM1_CQ_CTL_STS_0 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC2_QM1_CQ_CTL_STS_0 & PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC2_QM1_CQ_CTL_STS_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CQ_CTL_STS_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CQ_CTL_STS_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CQ_CTL_STS_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CQ_CTL_STS_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CQ_IFIFO_CNT_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CQ_IFIFO_CNT_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CQ_IFIFO_CNT_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CQ_IFIFO_CNT_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CQ_IFIFO_CNT_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CP_MSG_BASE0_ADDR_LO_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CP_MSG_BASE0_ADDR_LO_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CP_MSG_BASE0_ADDR_LO_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CP_MSG_BASE0_ADDR_LO_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CP_MSG_BASE0_ADDR_LO_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CP_MSG_BASE0_ADDR_HI_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CP_MSG_BASE0_ADDR_HI_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CP_MSG_BASE0_ADDR_HI_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CP_MSG_BASE0_ADDR_HI_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CP_MSG_BASE0_ADDR_HI_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CP_MSG_BASE1_ADDR_LO_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CP_MSG_BASE1_ADDR_LO_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CP_MSG_BASE1_ADDR_LO_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CP_MSG_BASE1_ADDR_LO_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CP_MSG_BASE1_ADDR_LO_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CP_MSG_BASE1_ADDR_HI_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CP_MSG_BASE1_ADDR_HI_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CP_MSG_BASE1_ADDR_HI_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CP_MSG_BASE1_ADDR_HI_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CP_MSG_BASE1_ADDR_HI_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CP_MSG_BASE2_ADDR_LO_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CP_MSG_BASE2_ADDR_LO_1 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC2_QM1_CP_MSG_BASE2_ADDR_LO_2 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC2_QM1_CP_MSG_BASE2_ADDR_LO_2 &
+			PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC2_QM1_CP_MSG_BASE2_ADDR_LO_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CP_MSG_BASE2_ADDR_LO_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CP_MSG_BASE2_ADDR_LO_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CP_MSG_BASE2_ADDR_HI_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CP_MSG_BASE2_ADDR_HI_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CP_MSG_BASE2_ADDR_HI_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CP_MSG_BASE2_ADDR_HI_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CP_MSG_BASE2_ADDR_HI_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CP_MSG_BASE3_ADDR_LO_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CP_MSG_BASE3_ADDR_LO_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CP_MSG_BASE3_ADDR_LO_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CP_MSG_BASE3_ADDR_LO_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CP_MSG_BASE3_ADDR_LO_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CP_MSG_BASE3_ADDR_HI_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CP_MSG_BASE3_ADDR_HI_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CP_MSG_BASE3_ADDR_HI_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CP_MSG_BASE3_ADDR_HI_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CP_MSG_BASE3_ADDR_HI_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CP_LDMA_TSIZE_OFFSET_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CP_LDMA_TSIZE_OFFSET_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CP_LDMA_TSIZE_OFFSET_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CP_LDMA_TSIZE_OFFSET_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CP_LDMA_TSIZE_OFFSET_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CP_LDMA_SRC_BASE_LO_OFFSET_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CP_LDMA_SRC_BASE_LO_OFFSET_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CP_LDMA_SRC_BASE_LO_OFFSET_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CP_LDMA_SRC_BASE_LO_OFFSET_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CP_LDMA_SRC_BASE_LO_OFFSET_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CP_LDMA_DST_BASE_LO_OFFSET_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CP_LDMA_DST_BASE_LO_OFFSET_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CP_LDMA_DST_BASE_LO_OFFSET_2 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC2_QM1_CP_LDMA_DST_BASE_LO_OFFSET_3 & ~0xFFF) +
+			PROT_BITS_OFFS;
+	word_offset = ((mmNIC2_QM1_CP_LDMA_DST_BASE_LO_OFFSET_3 &
+			PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC2_QM1_CP_LDMA_DST_BASE_LO_OFFSET_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CP_LDMA_DST_BASE_LO_OFFSET_4 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC2_QM1_CP_STS_0 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC2_QM1_CP_STS_0 & PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC2_QM1_CP_STS_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CP_STS_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CP_STS_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CP_STS_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CP_STS_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CP_CURRENT_INST_LO_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CP_CURRENT_INST_LO_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CP_CURRENT_INST_LO_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CP_CURRENT_INST_LO_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CP_CURRENT_INST_LO_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CP_CURRENT_INST_HI_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CP_CURRENT_INST_HI_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CP_CURRENT_INST_HI_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CP_CURRENT_INST_HI_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CP_CURRENT_INST_HI_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CP_BARRIER_CFG_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CP_BARRIER_CFG_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CP_BARRIER_CFG_2 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC2_QM1_CP_BARRIER_CFG_3 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC2_QM1_CP_BARRIER_CFG_3 & PROT_BITS_OFFS)
+			>> 7) << 2;
+	mask = 1U << ((mmNIC2_QM1_CP_BARRIER_CFG_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CP_BARRIER_CFG_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CP_DBG_0_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CP_DBG_0_1 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC2_QM1_CP_DBG_0_2 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC2_QM1_CP_DBG_0_2 & PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC2_QM1_CP_DBG_0_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CP_DBG_0_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CP_DBG_0_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CP_ARUSER_31_11_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CP_ARUSER_31_11_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CP_ARUSER_31_11_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CP_ARUSER_31_11_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CP_ARUSER_31_11_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CP_AWUSER_31_11_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CP_AWUSER_31_11_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CP_AWUSER_31_11_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CP_AWUSER_31_11_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CP_AWUSER_31_11_4 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC2_QM1_ARB_CFG_0 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC2_QM1_ARB_CFG_0 & PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC2_QM1_ARB_CFG_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_ARB_MST_AVAIL_CRED_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_ARB_MST_AVAIL_CRED_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_ARB_MST_AVAIL_CRED_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_ARB_MST_AVAIL_CRED_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_ARB_MST_AVAIL_CRED_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_ARB_MST_AVAIL_CRED_5 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_ARB_MST_AVAIL_CRED_6 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_ARB_MST_AVAIL_CRED_7 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_ARB_MST_AVAIL_CRED_8 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_ARB_MST_AVAIL_CRED_9 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_ARB_MST_AVAIL_CRED_10 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_ARB_MST_AVAIL_CRED_11 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_ARB_MST_AVAIL_CRED_12 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_ARB_MST_AVAIL_CRED_13 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_ARB_MST_AVAIL_CRED_14 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_ARB_MST_AVAIL_CRED_15 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_ARB_MST_AVAIL_CRED_16 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_ARB_MST_AVAIL_CRED_17 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_ARB_MST_AVAIL_CRED_18 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_ARB_MST_AVAIL_CRED_19 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_ARB_MST_AVAIL_CRED_20 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_ARB_MST_AVAIL_CRED_21 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_ARB_MST_AVAIL_CRED_22 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_ARB_MST_AVAIL_CRED_23 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC2_QM1_ARB_MST_AVAIL_CRED_24 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC2_QM1_ARB_MST_AVAIL_CRED_24 &
+			PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC2_QM1_ARB_MST_AVAIL_CRED_24 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_ARB_MST_AVAIL_CRED_25 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_ARB_MST_AVAIL_CRED_26 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_ARB_MST_AVAIL_CRED_27 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_ARB_MST_AVAIL_CRED_28 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_ARB_MST_AVAIL_CRED_29 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_ARB_MST_AVAIL_CRED_30 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_ARB_MST_AVAIL_CRED_31 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC2_QM1_ARB_MST_CHOISE_PUSH_OFST_23 & ~0xFFF) +
+			PROT_BITS_OFFS;
+	word_offset = ((mmNIC2_QM1_ARB_MST_CHOISE_PUSH_OFST_23 &
+			PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC2_QM1_ARB_SLV_CHOISE_WDT & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_ARB_MSG_MAX_INFLIGHT & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_ARB_MSG_AWUSER_31_11 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_ARB_MSG_AWUSER_SEC_PROP & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_ARB_MSG_AWUSER_NON_SEC_PROP & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC2_QM1_ARB_STATE_STS & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC2_QM1_ARB_STATE_STS & PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC2_QM1_ARB_STATE_STS & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_ARB_CHOISE_FULLNESS_STS & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_ARB_MSG_STS & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_ARB_SLV_CHOISE_Q_HEAD & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_ARB_ERR_CAUSE & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_ARB_ERR_MSG_EN & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_ARB_ERR_STS_DRP & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_ARB_MST_CRED_STS_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_ARB_MST_CRED_STS_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_ARB_MST_CRED_STS_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_ARB_MST_CRED_STS_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_ARB_MST_CRED_STS_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_ARB_MST_CRED_STS_5 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_ARB_MST_CRED_STS_6 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_ARB_MST_CRED_STS_7 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_ARB_MST_CRED_STS_8 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_ARB_MST_CRED_STS_9 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_ARB_MST_CRED_STS_10 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_ARB_MST_CRED_STS_11 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_ARB_MST_CRED_STS_12 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_ARB_MST_CRED_STS_13 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_ARB_MST_CRED_STS_14 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_ARB_MST_CRED_STS_15 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_ARB_MST_CRED_STS_16 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_ARB_MST_CRED_STS_17 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_ARB_MST_CRED_STS_18 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_ARB_MST_CRED_STS_19 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC2_QM1_ARB_MST_CRED_STS_20 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC2_QM1_ARB_MST_CRED_STS_20 & PROT_BITS_OFFS)
+			>> 7) << 2;
+	mask = 1U << ((mmNIC2_QM1_ARB_MST_CRED_STS_20 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_ARB_MST_CRED_STS_21 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_ARB_MST_CRED_STS_22 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_ARB_MST_CRED_STS_23 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_ARB_MST_CRED_STS_24 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_ARB_MST_CRED_STS_25 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_ARB_MST_CRED_STS_26 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_ARB_MST_CRED_STS_27 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_ARB_MST_CRED_STS_28 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_ARB_MST_CRED_STS_29 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_ARB_MST_CRED_STS_30 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_ARB_MST_CRED_STS_31 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CGM_CFG & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CGM_STS & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CGM_CFG1 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC2_QM1_LOCAL_RANGE_BASE & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC2_QM1_LOCAL_RANGE_BASE & PROT_BITS_OFFS)
+			>> 7) << 2;
+	mask = 1U << ((mmNIC2_QM1_LOCAL_RANGE_BASE & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_LOCAL_RANGE_SIZE & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_CSMR_STRICT_PRIO_CFG & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_HBW_RD_RATE_LIM_CFG_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_LBW_WR_RATE_LIM_CFG_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_LBW_WR_RATE_LIM_CFG_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_HBW_RD_RATE_LIM_CFG_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_GLBL_AXCACHE & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_IND_GW_APB_CFG & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_IND_GW_APB_WDATA & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_IND_GW_APB_RDATA & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_IND_GW_APB_STATUS & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_GLBL_ERR_ADDR_LO & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_GLBL_ERR_ADDR_HI & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC2_QM1_GLBL_ERR_WDATA & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC2_QM1_GLBL_MEM_INIT_BUSY & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC2_QM1_GLBL_MEM_INIT_BUSY & PROT_BITS_OFFS)
+			>> 7) << 2;
+	mask = 1U << ((mmNIC2_QM1_GLBL_MEM_INIT_BUSY & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	WREG32(mmNIC3_QM0_BASE - CFG_BASE + PROT_BITS_OFFS + 0x7C, 0);
+	WREG32(mmNIC3_QM1_BASE - CFG_BASE + PROT_BITS_OFFS + 0x7C, 0);
+
+	pb_addr = (mmNIC3_QM0_GLBL_CFG0 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC3_QM0_GLBL_CFG0 & PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC3_QM0_GLBL_CFG0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_GLBL_CFG1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_GLBL_PROT & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_GLBL_ERR_CFG & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_GLBL_SECURE_PROPS_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_GLBL_SECURE_PROPS_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_GLBL_SECURE_PROPS_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_GLBL_SECURE_PROPS_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_GLBL_SECURE_PROPS_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_GLBL_NON_SECURE_PROPS_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_GLBL_NON_SECURE_PROPS_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_GLBL_NON_SECURE_PROPS_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_GLBL_NON_SECURE_PROPS_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_GLBL_NON_SECURE_PROPS_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_GLBL_STS0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_GLBL_STS1_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_GLBL_STS1_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_GLBL_STS1_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_GLBL_STS1_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_GLBL_STS1_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_GLBL_MSG_EN_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_GLBL_MSG_EN_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_GLBL_MSG_EN_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_GLBL_MSG_EN_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_GLBL_MSG_EN_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_PQ_BASE_LO_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_PQ_BASE_LO_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_PQ_BASE_LO_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_PQ_BASE_LO_3 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC3_QM0_PQ_BASE_HI_0 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC3_QM0_PQ_BASE_HI_0 & PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC3_QM0_PQ_BASE_HI_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_PQ_BASE_HI_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_PQ_BASE_HI_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_PQ_BASE_HI_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_PQ_SIZE_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_PQ_SIZE_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_PQ_SIZE_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_PQ_SIZE_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_PQ_PI_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_PQ_PI_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_PQ_PI_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_PQ_PI_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_PQ_CI_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_PQ_CI_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_PQ_CI_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_PQ_CI_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_PQ_CFG0_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_PQ_CFG0_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_PQ_CFG0_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_PQ_CFG0_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_PQ_CFG1_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_PQ_CFG1_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_PQ_CFG1_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_PQ_CFG1_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_PQ_ARUSER_31_11_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_PQ_ARUSER_31_11_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_PQ_ARUSER_31_11_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_PQ_ARUSER_31_11_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_PQ_STS0_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_PQ_STS0_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_PQ_STS0_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_PQ_STS0_3 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC3_QM0_PQ_STS1_0 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC3_QM0_PQ_STS1_0 & PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC3_QM0_PQ_STS1_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_PQ_STS1_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_PQ_STS1_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_PQ_STS1_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CQ_STS0_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CQ_STS0_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CQ_STS0_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CQ_STS0_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CQ_STS1_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CQ_STS1_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CQ_STS1_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CQ_STS1_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CQ_PTR_LO_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CQ_PTR_HI_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CQ_TSIZE_0 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC3_QM0_CQ_CTL_0 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC3_QM0_CQ_CTL_0 & PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC3_QM0_CQ_CTL_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CQ_PTR_LO_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CQ_PTR_HI_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CQ_TSIZE_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CQ_CTL_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CQ_PTR_LO_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CQ_PTR_HI_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CQ_TSIZE_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CQ_CTL_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CQ_PTR_LO_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CQ_PTR_HI_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CQ_TSIZE_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CQ_CTL_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CQ_PTR_LO_STS_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CQ_PTR_LO_STS_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CQ_PTR_LO_STS_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CQ_PTR_LO_STS_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CQ_PTR_LO_STS_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CQ_PTR_HI_STS_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CQ_PTR_HI_STS_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CQ_PTR_HI_STS_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CQ_PTR_HI_STS_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CQ_PTR_HI_STS_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CQ_TSIZE_STS_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CQ_TSIZE_STS_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CQ_TSIZE_STS_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CQ_TSIZE_STS_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CQ_TSIZE_STS_4 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC3_QM0_CQ_CTL_STS_0 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC3_QM0_CQ_CTL_STS_0 & PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC3_QM0_CQ_CTL_STS_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CQ_CTL_STS_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CQ_CTL_STS_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CQ_CTL_STS_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CQ_CTL_STS_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CQ_IFIFO_CNT_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CQ_IFIFO_CNT_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CQ_IFIFO_CNT_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CQ_IFIFO_CNT_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CQ_IFIFO_CNT_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CP_MSG_BASE0_ADDR_LO_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CP_MSG_BASE0_ADDR_LO_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CP_MSG_BASE0_ADDR_LO_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CP_MSG_BASE0_ADDR_LO_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CP_MSG_BASE0_ADDR_LO_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CP_MSG_BASE0_ADDR_HI_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CP_MSG_BASE0_ADDR_HI_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CP_MSG_BASE0_ADDR_HI_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CP_MSG_BASE0_ADDR_HI_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CP_MSG_BASE0_ADDR_HI_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CP_MSG_BASE1_ADDR_LO_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CP_MSG_BASE1_ADDR_LO_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CP_MSG_BASE1_ADDR_LO_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CP_MSG_BASE1_ADDR_LO_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CP_MSG_BASE1_ADDR_LO_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CP_MSG_BASE1_ADDR_HI_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CP_MSG_BASE1_ADDR_HI_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CP_MSG_BASE1_ADDR_HI_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CP_MSG_BASE1_ADDR_HI_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CP_MSG_BASE1_ADDR_HI_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CP_MSG_BASE2_ADDR_LO_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CP_MSG_BASE2_ADDR_LO_1 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC3_QM0_CP_MSG_BASE2_ADDR_LO_2 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC3_QM0_CP_MSG_BASE2_ADDR_LO_2 &
+			PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC3_QM0_CP_MSG_BASE2_ADDR_LO_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CP_MSG_BASE2_ADDR_LO_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CP_MSG_BASE2_ADDR_LO_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CP_MSG_BASE2_ADDR_HI_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CP_MSG_BASE2_ADDR_HI_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CP_MSG_BASE2_ADDR_HI_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CP_MSG_BASE2_ADDR_HI_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CP_MSG_BASE2_ADDR_HI_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CP_MSG_BASE3_ADDR_LO_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CP_MSG_BASE3_ADDR_LO_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CP_MSG_BASE3_ADDR_LO_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CP_MSG_BASE3_ADDR_LO_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CP_MSG_BASE3_ADDR_LO_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CP_MSG_BASE3_ADDR_HI_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CP_MSG_BASE3_ADDR_HI_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CP_MSG_BASE3_ADDR_HI_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CP_MSG_BASE3_ADDR_HI_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CP_MSG_BASE3_ADDR_HI_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CP_LDMA_TSIZE_OFFSET_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CP_LDMA_TSIZE_OFFSET_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CP_LDMA_TSIZE_OFFSET_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CP_LDMA_TSIZE_OFFSET_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CP_LDMA_TSIZE_OFFSET_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CP_LDMA_SRC_BASE_LO_OFFSET_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CP_LDMA_SRC_BASE_LO_OFFSET_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CP_LDMA_SRC_BASE_LO_OFFSET_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CP_LDMA_SRC_BASE_LO_OFFSET_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CP_LDMA_SRC_BASE_LO_OFFSET_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CP_LDMA_DST_BASE_LO_OFFSET_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CP_LDMA_DST_BASE_LO_OFFSET_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CP_LDMA_DST_BASE_LO_OFFSET_2 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC3_QM0_CP_LDMA_DST_BASE_LO_OFFSET_3 & ~0xFFF) +
+			PROT_BITS_OFFS;
+	word_offset = ((mmNIC3_QM0_CP_LDMA_DST_BASE_LO_OFFSET_3 &
+			PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC3_QM0_CP_LDMA_DST_BASE_LO_OFFSET_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CP_LDMA_DST_BASE_LO_OFFSET_4 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC3_QM0_CP_STS_0 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC3_QM0_CP_STS_0 & PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC3_QM0_CP_STS_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CP_STS_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CP_STS_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CP_STS_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CP_STS_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CP_CURRENT_INST_LO_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CP_CURRENT_INST_LO_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CP_CURRENT_INST_LO_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CP_CURRENT_INST_LO_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CP_CURRENT_INST_LO_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CP_CURRENT_INST_HI_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CP_CURRENT_INST_HI_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CP_CURRENT_INST_HI_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CP_CURRENT_INST_HI_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CP_CURRENT_INST_HI_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CP_BARRIER_CFG_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CP_BARRIER_CFG_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CP_BARRIER_CFG_2 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC3_QM0_CP_BARRIER_CFG_3 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC3_QM0_CP_BARRIER_CFG_3 & PROT_BITS_OFFS)
+			>> 7) << 2;
+	mask = 1U << ((mmNIC3_QM0_CP_BARRIER_CFG_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CP_BARRIER_CFG_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CP_DBG_0_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CP_DBG_0_1 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC3_QM0_CP_DBG_0_2 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC3_QM0_CP_DBG_0_2 & PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC3_QM0_CP_DBG_0_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CP_DBG_0_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CP_DBG_0_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CP_ARUSER_31_11_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CP_ARUSER_31_11_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CP_ARUSER_31_11_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CP_ARUSER_31_11_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CP_ARUSER_31_11_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CP_AWUSER_31_11_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CP_AWUSER_31_11_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CP_AWUSER_31_11_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CP_AWUSER_31_11_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CP_AWUSER_31_11_4 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC3_QM0_ARB_CFG_0 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC3_QM0_ARB_CFG_0 & PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC3_QM0_ARB_CFG_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_ARB_MST_AVAIL_CRED_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_ARB_MST_AVAIL_CRED_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_ARB_MST_AVAIL_CRED_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_ARB_MST_AVAIL_CRED_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_ARB_MST_AVAIL_CRED_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_ARB_MST_AVAIL_CRED_5 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_ARB_MST_AVAIL_CRED_6 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_ARB_MST_AVAIL_CRED_7 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_ARB_MST_AVAIL_CRED_8 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_ARB_MST_AVAIL_CRED_9 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_ARB_MST_AVAIL_CRED_10 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_ARB_MST_AVAIL_CRED_11 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_ARB_MST_AVAIL_CRED_12 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_ARB_MST_AVAIL_CRED_13 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_ARB_MST_AVAIL_CRED_14 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_ARB_MST_AVAIL_CRED_15 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_ARB_MST_AVAIL_CRED_16 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_ARB_MST_AVAIL_CRED_17 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_ARB_MST_AVAIL_CRED_18 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_ARB_MST_AVAIL_CRED_19 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_ARB_MST_AVAIL_CRED_20 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_ARB_MST_AVAIL_CRED_21 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_ARB_MST_AVAIL_CRED_22 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_ARB_MST_AVAIL_CRED_23 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC3_QM0_ARB_MST_AVAIL_CRED_24 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC3_QM0_ARB_MST_AVAIL_CRED_24 &
+			PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC3_QM0_ARB_MST_AVAIL_CRED_24 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_ARB_MST_AVAIL_CRED_25 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_ARB_MST_AVAIL_CRED_26 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_ARB_MST_AVAIL_CRED_27 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_ARB_MST_AVAIL_CRED_28 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_ARB_MST_AVAIL_CRED_29 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_ARB_MST_AVAIL_CRED_30 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_ARB_MST_AVAIL_CRED_31 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC3_QM0_ARB_MST_CHOISE_PUSH_OFST_23 & ~0xFFF) +
+			PROT_BITS_OFFS;
+	word_offset = ((mmNIC3_QM0_ARB_MST_CHOISE_PUSH_OFST_23 &
+			PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC3_QM0_ARB_SLV_CHOISE_WDT & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_ARB_MSG_MAX_INFLIGHT & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_ARB_MSG_AWUSER_31_11 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_ARB_MSG_AWUSER_SEC_PROP & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_ARB_MSG_AWUSER_NON_SEC_PROP & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC3_QM0_ARB_STATE_STS & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC3_QM0_ARB_STATE_STS & PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC3_QM0_ARB_STATE_STS & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_ARB_CHOISE_FULLNESS_STS & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_ARB_MSG_STS & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_ARB_SLV_CHOISE_Q_HEAD & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_ARB_ERR_CAUSE & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_ARB_ERR_MSG_EN & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_ARB_ERR_STS_DRP & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_ARB_MST_CRED_STS_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_ARB_MST_CRED_STS_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_ARB_MST_CRED_STS_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_ARB_MST_CRED_STS_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_ARB_MST_CRED_STS_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_ARB_MST_CRED_STS_5 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_ARB_MST_CRED_STS_6 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_ARB_MST_CRED_STS_7 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_ARB_MST_CRED_STS_8 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_ARB_MST_CRED_STS_9 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_ARB_MST_CRED_STS_10 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_ARB_MST_CRED_STS_11 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_ARB_MST_CRED_STS_12 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_ARB_MST_CRED_STS_13 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_ARB_MST_CRED_STS_14 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_ARB_MST_CRED_STS_15 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_ARB_MST_CRED_STS_16 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_ARB_MST_CRED_STS_17 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_ARB_MST_CRED_STS_18 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_ARB_MST_CRED_STS_19 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC3_QM0_ARB_MST_CRED_STS_20 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC3_QM0_ARB_MST_CRED_STS_20 & PROT_BITS_OFFS)
+			>> 7) << 2;
+	mask = 1U << ((mmNIC3_QM0_ARB_MST_CRED_STS_20 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_ARB_MST_CRED_STS_21 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_ARB_MST_CRED_STS_22 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_ARB_MST_CRED_STS_23 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_ARB_MST_CRED_STS_24 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_ARB_MST_CRED_STS_25 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_ARB_MST_CRED_STS_26 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_ARB_MST_CRED_STS_27 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_ARB_MST_CRED_STS_28 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_ARB_MST_CRED_STS_29 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_ARB_MST_CRED_STS_30 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_ARB_MST_CRED_STS_31 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CGM_CFG & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CGM_STS & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CGM_CFG1 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC3_QM0_LOCAL_RANGE_BASE & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC3_QM0_LOCAL_RANGE_BASE & PROT_BITS_OFFS)
+			>> 7) << 2;
+	mask = 1U << ((mmNIC3_QM0_LOCAL_RANGE_BASE & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_LOCAL_RANGE_SIZE & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_CSMR_STRICT_PRIO_CFG & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_HBW_RD_RATE_LIM_CFG_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_LBW_WR_RATE_LIM_CFG_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_LBW_WR_RATE_LIM_CFG_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_HBW_RD_RATE_LIM_CFG_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_GLBL_AXCACHE & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_IND_GW_APB_CFG & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_IND_GW_APB_WDATA & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_IND_GW_APB_RDATA & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_IND_GW_APB_STATUS & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_GLBL_ERR_ADDR_LO & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_GLBL_ERR_ADDR_HI & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM0_GLBL_ERR_WDATA & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC3_QM0_GLBL_MEM_INIT_BUSY & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC3_QM0_GLBL_MEM_INIT_BUSY & PROT_BITS_OFFS)
+			>> 7) << 2;
+	mask = 1U << ((mmNIC3_QM0_GLBL_MEM_INIT_BUSY & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC3_QM1_GLBL_CFG0 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC3_QM1_GLBL_CFG0 & PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC3_QM1_GLBL_CFG0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_GLBL_CFG1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_GLBL_PROT & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_GLBL_ERR_CFG & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_GLBL_SECURE_PROPS_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_GLBL_SECURE_PROPS_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_GLBL_SECURE_PROPS_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_GLBL_SECURE_PROPS_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_GLBL_SECURE_PROPS_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_GLBL_NON_SECURE_PROPS_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_GLBL_NON_SECURE_PROPS_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_GLBL_NON_SECURE_PROPS_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_GLBL_NON_SECURE_PROPS_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_GLBL_NON_SECURE_PROPS_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_GLBL_STS0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_GLBL_STS1_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_GLBL_STS1_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_GLBL_STS1_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_GLBL_STS1_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_GLBL_STS1_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_GLBL_MSG_EN_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_GLBL_MSG_EN_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_GLBL_MSG_EN_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_GLBL_MSG_EN_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_GLBL_MSG_EN_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_PQ_BASE_LO_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_PQ_BASE_LO_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_PQ_BASE_LO_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_PQ_BASE_LO_3 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC3_QM1_PQ_BASE_HI_0 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC3_QM1_PQ_BASE_HI_0 & PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC3_QM1_PQ_BASE_HI_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_PQ_BASE_HI_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_PQ_BASE_HI_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_PQ_BASE_HI_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_PQ_SIZE_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_PQ_SIZE_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_PQ_SIZE_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_PQ_SIZE_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_PQ_PI_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_PQ_PI_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_PQ_PI_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_PQ_PI_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_PQ_CI_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_PQ_CI_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_PQ_CI_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_PQ_CI_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_PQ_CFG0_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_PQ_CFG0_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_PQ_CFG0_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_PQ_CFG0_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_PQ_CFG1_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_PQ_CFG1_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_PQ_CFG1_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_PQ_CFG1_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_PQ_ARUSER_31_11_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_PQ_ARUSER_31_11_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_PQ_ARUSER_31_11_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_PQ_ARUSER_31_11_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_PQ_STS0_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_PQ_STS0_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_PQ_STS0_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_PQ_STS0_3 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC3_QM1_PQ_STS1_0 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC3_QM1_PQ_STS1_0 & PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC3_QM1_PQ_STS1_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_PQ_STS1_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_PQ_STS1_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_PQ_STS1_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CQ_STS0_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CQ_STS0_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CQ_STS0_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CQ_STS0_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CQ_STS1_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CQ_STS1_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CQ_STS1_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CQ_STS1_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CQ_PTR_LO_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CQ_PTR_HI_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CQ_TSIZE_0 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC3_QM1_CQ_CTL_0 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC3_QM1_CQ_CTL_0 & PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC3_QM1_CQ_CTL_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CQ_PTR_LO_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CQ_PTR_HI_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CQ_TSIZE_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CQ_CTL_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CQ_PTR_LO_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CQ_PTR_HI_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CQ_TSIZE_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CQ_CTL_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CQ_PTR_LO_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CQ_PTR_HI_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CQ_TSIZE_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CQ_CTL_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CQ_PTR_LO_STS_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CQ_PTR_LO_STS_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CQ_PTR_LO_STS_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CQ_PTR_LO_STS_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CQ_PTR_LO_STS_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CQ_PTR_HI_STS_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CQ_PTR_HI_STS_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CQ_PTR_HI_STS_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CQ_PTR_HI_STS_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CQ_PTR_HI_STS_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CQ_TSIZE_STS_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CQ_TSIZE_STS_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CQ_TSIZE_STS_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CQ_TSIZE_STS_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CQ_TSIZE_STS_4 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC3_QM1_CQ_CTL_STS_0 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC3_QM1_CQ_CTL_STS_0 & PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC3_QM1_CQ_CTL_STS_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CQ_CTL_STS_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CQ_CTL_STS_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CQ_CTL_STS_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CQ_CTL_STS_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CQ_IFIFO_CNT_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CQ_IFIFO_CNT_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CQ_IFIFO_CNT_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CQ_IFIFO_CNT_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CQ_IFIFO_CNT_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CP_MSG_BASE0_ADDR_LO_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CP_MSG_BASE0_ADDR_LO_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CP_MSG_BASE0_ADDR_LO_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CP_MSG_BASE0_ADDR_LO_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CP_MSG_BASE0_ADDR_LO_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CP_MSG_BASE0_ADDR_HI_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CP_MSG_BASE0_ADDR_HI_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CP_MSG_BASE0_ADDR_HI_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CP_MSG_BASE0_ADDR_HI_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CP_MSG_BASE0_ADDR_HI_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CP_MSG_BASE1_ADDR_LO_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CP_MSG_BASE1_ADDR_LO_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CP_MSG_BASE1_ADDR_LO_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CP_MSG_BASE1_ADDR_LO_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CP_MSG_BASE1_ADDR_LO_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CP_MSG_BASE1_ADDR_HI_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CP_MSG_BASE1_ADDR_HI_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CP_MSG_BASE1_ADDR_HI_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CP_MSG_BASE1_ADDR_HI_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CP_MSG_BASE1_ADDR_HI_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CP_MSG_BASE2_ADDR_LO_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CP_MSG_BASE2_ADDR_LO_1 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC3_QM1_CP_MSG_BASE2_ADDR_LO_2 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC3_QM1_CP_MSG_BASE2_ADDR_LO_2 &
+			PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC3_QM1_CP_MSG_BASE2_ADDR_LO_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CP_MSG_BASE2_ADDR_LO_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CP_MSG_BASE2_ADDR_LO_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CP_MSG_BASE2_ADDR_HI_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CP_MSG_BASE2_ADDR_HI_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CP_MSG_BASE2_ADDR_HI_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CP_MSG_BASE2_ADDR_HI_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CP_MSG_BASE2_ADDR_HI_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CP_MSG_BASE3_ADDR_LO_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CP_MSG_BASE3_ADDR_LO_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CP_MSG_BASE3_ADDR_LO_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CP_MSG_BASE3_ADDR_LO_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CP_MSG_BASE3_ADDR_LO_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CP_MSG_BASE3_ADDR_HI_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CP_MSG_BASE3_ADDR_HI_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CP_MSG_BASE3_ADDR_HI_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CP_MSG_BASE3_ADDR_HI_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CP_MSG_BASE3_ADDR_HI_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CP_LDMA_TSIZE_OFFSET_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CP_LDMA_TSIZE_OFFSET_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CP_LDMA_TSIZE_OFFSET_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CP_LDMA_TSIZE_OFFSET_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CP_LDMA_TSIZE_OFFSET_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CP_LDMA_SRC_BASE_LO_OFFSET_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CP_LDMA_SRC_BASE_LO_OFFSET_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CP_LDMA_SRC_BASE_LO_OFFSET_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CP_LDMA_SRC_BASE_LO_OFFSET_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CP_LDMA_SRC_BASE_LO_OFFSET_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CP_LDMA_DST_BASE_LO_OFFSET_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CP_LDMA_DST_BASE_LO_OFFSET_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CP_LDMA_DST_BASE_LO_OFFSET_2 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC3_QM1_CP_LDMA_DST_BASE_LO_OFFSET_3 & ~0xFFF) +
+			PROT_BITS_OFFS;
+	word_offset = ((mmNIC3_QM1_CP_LDMA_DST_BASE_LO_OFFSET_3 &
+			PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC3_QM1_CP_LDMA_DST_BASE_LO_OFFSET_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CP_LDMA_DST_BASE_LO_OFFSET_4 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC3_QM1_CP_STS_0 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC3_QM1_CP_STS_0 & PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC3_QM1_CP_STS_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CP_STS_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CP_STS_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CP_STS_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CP_STS_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CP_CURRENT_INST_LO_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CP_CURRENT_INST_LO_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CP_CURRENT_INST_LO_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CP_CURRENT_INST_LO_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CP_CURRENT_INST_LO_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CP_CURRENT_INST_HI_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CP_CURRENT_INST_HI_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CP_CURRENT_INST_HI_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CP_CURRENT_INST_HI_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CP_CURRENT_INST_HI_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CP_BARRIER_CFG_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CP_BARRIER_CFG_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CP_BARRIER_CFG_2 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC3_QM1_CP_BARRIER_CFG_3 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC3_QM1_CP_BARRIER_CFG_3 & PROT_BITS_OFFS)
+			>> 7) << 2;
+	mask = 1U << ((mmNIC3_QM1_CP_BARRIER_CFG_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CP_BARRIER_CFG_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CP_DBG_0_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CP_DBG_0_1 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC3_QM1_CP_DBG_0_2 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC3_QM1_CP_DBG_0_2 & PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC3_QM1_CP_DBG_0_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CP_DBG_0_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CP_DBG_0_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CP_ARUSER_31_11_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CP_ARUSER_31_11_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CP_ARUSER_31_11_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CP_ARUSER_31_11_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CP_ARUSER_31_11_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CP_AWUSER_31_11_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CP_AWUSER_31_11_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CP_AWUSER_31_11_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CP_AWUSER_31_11_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CP_AWUSER_31_11_4 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC3_QM1_ARB_CFG_0 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC3_QM1_ARB_CFG_0 & PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC3_QM1_ARB_CFG_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_ARB_MST_AVAIL_CRED_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_ARB_MST_AVAIL_CRED_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_ARB_MST_AVAIL_CRED_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_ARB_MST_AVAIL_CRED_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_ARB_MST_AVAIL_CRED_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_ARB_MST_AVAIL_CRED_5 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_ARB_MST_AVAIL_CRED_6 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_ARB_MST_AVAIL_CRED_7 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_ARB_MST_AVAIL_CRED_8 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_ARB_MST_AVAIL_CRED_9 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_ARB_MST_AVAIL_CRED_10 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_ARB_MST_AVAIL_CRED_11 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_ARB_MST_AVAIL_CRED_12 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_ARB_MST_AVAIL_CRED_13 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_ARB_MST_AVAIL_CRED_14 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_ARB_MST_AVAIL_CRED_15 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_ARB_MST_AVAIL_CRED_16 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_ARB_MST_AVAIL_CRED_17 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_ARB_MST_AVAIL_CRED_18 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_ARB_MST_AVAIL_CRED_19 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_ARB_MST_AVAIL_CRED_20 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_ARB_MST_AVAIL_CRED_21 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_ARB_MST_AVAIL_CRED_22 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_ARB_MST_AVAIL_CRED_23 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC3_QM1_ARB_MST_AVAIL_CRED_24 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC3_QM1_ARB_MST_AVAIL_CRED_24 &
+			PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC3_QM1_ARB_MST_AVAIL_CRED_24 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_ARB_MST_AVAIL_CRED_25 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_ARB_MST_AVAIL_CRED_26 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_ARB_MST_AVAIL_CRED_27 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_ARB_MST_AVAIL_CRED_28 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_ARB_MST_AVAIL_CRED_29 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_ARB_MST_AVAIL_CRED_30 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_ARB_MST_AVAIL_CRED_31 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC3_QM1_ARB_MST_CHOISE_PUSH_OFST_23 & ~0xFFF) +
+			PROT_BITS_OFFS;
+	word_offset = ((mmNIC3_QM1_ARB_MST_CHOISE_PUSH_OFST_23 &
+			PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC3_QM1_ARB_SLV_CHOISE_WDT & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_ARB_MSG_MAX_INFLIGHT & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_ARB_MSG_AWUSER_31_11 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_ARB_MSG_AWUSER_SEC_PROP & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_ARB_MSG_AWUSER_NON_SEC_PROP & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC3_QM1_ARB_STATE_STS & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC3_QM1_ARB_STATE_STS & PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC3_QM1_ARB_STATE_STS & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_ARB_CHOISE_FULLNESS_STS & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_ARB_MSG_STS & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_ARB_SLV_CHOISE_Q_HEAD & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_ARB_ERR_CAUSE & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_ARB_ERR_MSG_EN & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_ARB_ERR_STS_DRP & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_ARB_MST_CRED_STS_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_ARB_MST_CRED_STS_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_ARB_MST_CRED_STS_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_ARB_MST_CRED_STS_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_ARB_MST_CRED_STS_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_ARB_MST_CRED_STS_5 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_ARB_MST_CRED_STS_6 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_ARB_MST_CRED_STS_7 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_ARB_MST_CRED_STS_8 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_ARB_MST_CRED_STS_9 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_ARB_MST_CRED_STS_10 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_ARB_MST_CRED_STS_11 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_ARB_MST_CRED_STS_12 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_ARB_MST_CRED_STS_13 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_ARB_MST_CRED_STS_14 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_ARB_MST_CRED_STS_15 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_ARB_MST_CRED_STS_16 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_ARB_MST_CRED_STS_17 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_ARB_MST_CRED_STS_18 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_ARB_MST_CRED_STS_19 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC3_QM1_ARB_MST_CRED_STS_20 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC3_QM1_ARB_MST_CRED_STS_20 & PROT_BITS_OFFS)
+			>> 7) << 2;
+	mask = 1U << ((mmNIC3_QM1_ARB_MST_CRED_STS_20 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_ARB_MST_CRED_STS_21 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_ARB_MST_CRED_STS_22 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_ARB_MST_CRED_STS_23 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_ARB_MST_CRED_STS_24 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_ARB_MST_CRED_STS_25 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_ARB_MST_CRED_STS_26 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_ARB_MST_CRED_STS_27 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_ARB_MST_CRED_STS_28 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_ARB_MST_CRED_STS_29 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_ARB_MST_CRED_STS_30 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_ARB_MST_CRED_STS_31 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CGM_CFG & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CGM_STS & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CGM_CFG1 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC3_QM1_LOCAL_RANGE_BASE & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC3_QM1_LOCAL_RANGE_BASE & PROT_BITS_OFFS)
+			>> 7) << 2;
+	mask = 1U << ((mmNIC3_QM1_LOCAL_RANGE_BASE & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_LOCAL_RANGE_SIZE & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_CSMR_STRICT_PRIO_CFG & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_HBW_RD_RATE_LIM_CFG_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_LBW_WR_RATE_LIM_CFG_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_LBW_WR_RATE_LIM_CFG_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_HBW_RD_RATE_LIM_CFG_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_GLBL_AXCACHE & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_IND_GW_APB_CFG & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_IND_GW_APB_WDATA & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_IND_GW_APB_RDATA & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_IND_GW_APB_STATUS & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_GLBL_ERR_ADDR_LO & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_GLBL_ERR_ADDR_HI & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC3_QM1_GLBL_ERR_WDATA & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC3_QM1_GLBL_MEM_INIT_BUSY & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC3_QM1_GLBL_MEM_INIT_BUSY & PROT_BITS_OFFS)
+			>> 7) << 2;
+	mask = 1U << ((mmNIC3_QM1_GLBL_MEM_INIT_BUSY & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	WREG32(mmNIC4_QM0_BASE - CFG_BASE + PROT_BITS_OFFS + 0x7C, 0);
+	WREG32(mmNIC4_QM1_BASE - CFG_BASE + PROT_BITS_OFFS + 0x7C, 0);
+
+	pb_addr = (mmNIC4_QM0_GLBL_CFG0 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC4_QM0_GLBL_CFG0 & PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC4_QM0_GLBL_CFG0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_GLBL_CFG1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_GLBL_PROT & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_GLBL_ERR_CFG & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_GLBL_SECURE_PROPS_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_GLBL_SECURE_PROPS_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_GLBL_SECURE_PROPS_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_GLBL_SECURE_PROPS_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_GLBL_SECURE_PROPS_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_GLBL_NON_SECURE_PROPS_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_GLBL_NON_SECURE_PROPS_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_GLBL_NON_SECURE_PROPS_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_GLBL_NON_SECURE_PROPS_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_GLBL_NON_SECURE_PROPS_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_GLBL_STS0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_GLBL_STS1_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_GLBL_STS1_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_GLBL_STS1_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_GLBL_STS1_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_GLBL_STS1_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_GLBL_MSG_EN_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_GLBL_MSG_EN_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_GLBL_MSG_EN_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_GLBL_MSG_EN_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_GLBL_MSG_EN_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_PQ_BASE_LO_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_PQ_BASE_LO_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_PQ_BASE_LO_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_PQ_BASE_LO_3 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC4_QM0_PQ_BASE_HI_0 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC4_QM0_PQ_BASE_HI_0 & PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC4_QM0_PQ_BASE_HI_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_PQ_BASE_HI_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_PQ_BASE_HI_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_PQ_BASE_HI_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_PQ_SIZE_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_PQ_SIZE_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_PQ_SIZE_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_PQ_SIZE_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_PQ_PI_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_PQ_PI_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_PQ_PI_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_PQ_PI_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_PQ_CI_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_PQ_CI_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_PQ_CI_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_PQ_CI_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_PQ_CFG0_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_PQ_CFG0_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_PQ_CFG0_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_PQ_CFG0_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_PQ_CFG1_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_PQ_CFG1_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_PQ_CFG1_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_PQ_CFG1_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_PQ_ARUSER_31_11_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_PQ_ARUSER_31_11_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_PQ_ARUSER_31_11_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_PQ_ARUSER_31_11_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_PQ_STS0_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_PQ_STS0_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_PQ_STS0_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_PQ_STS0_3 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC4_QM0_PQ_STS1_0 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC4_QM0_PQ_STS1_0 & PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC4_QM0_PQ_STS1_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_PQ_STS1_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_PQ_STS1_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_PQ_STS1_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CQ_STS0_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CQ_STS0_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CQ_STS0_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CQ_STS0_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CQ_STS1_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CQ_STS1_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CQ_STS1_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CQ_STS1_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CQ_PTR_LO_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CQ_PTR_HI_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CQ_TSIZE_0 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC4_QM0_CQ_CTL_0 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC4_QM0_CQ_CTL_0 & PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC4_QM0_CQ_CTL_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CQ_PTR_LO_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CQ_PTR_HI_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CQ_TSIZE_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CQ_CTL_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CQ_PTR_LO_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CQ_PTR_HI_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CQ_TSIZE_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CQ_CTL_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CQ_PTR_LO_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CQ_PTR_HI_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CQ_TSIZE_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CQ_CTL_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CQ_PTR_LO_STS_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CQ_PTR_LO_STS_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CQ_PTR_LO_STS_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CQ_PTR_LO_STS_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CQ_PTR_LO_STS_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CQ_PTR_HI_STS_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CQ_PTR_HI_STS_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CQ_PTR_HI_STS_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CQ_PTR_HI_STS_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CQ_PTR_HI_STS_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CQ_TSIZE_STS_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CQ_TSIZE_STS_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CQ_TSIZE_STS_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CQ_TSIZE_STS_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CQ_TSIZE_STS_4 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC4_QM0_CQ_CTL_STS_0 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC4_QM0_CQ_CTL_STS_0 & PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC4_QM0_CQ_CTL_STS_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CQ_CTL_STS_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CQ_CTL_STS_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CQ_CTL_STS_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CQ_CTL_STS_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CQ_IFIFO_CNT_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CQ_IFIFO_CNT_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CQ_IFIFO_CNT_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CQ_IFIFO_CNT_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CQ_IFIFO_CNT_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CP_MSG_BASE0_ADDR_LO_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CP_MSG_BASE0_ADDR_LO_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CP_MSG_BASE0_ADDR_LO_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CP_MSG_BASE0_ADDR_LO_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CP_MSG_BASE0_ADDR_LO_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CP_MSG_BASE0_ADDR_HI_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CP_MSG_BASE0_ADDR_HI_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CP_MSG_BASE0_ADDR_HI_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CP_MSG_BASE0_ADDR_HI_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CP_MSG_BASE0_ADDR_HI_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CP_MSG_BASE1_ADDR_LO_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CP_MSG_BASE1_ADDR_LO_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CP_MSG_BASE1_ADDR_LO_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CP_MSG_BASE1_ADDR_LO_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CP_MSG_BASE1_ADDR_LO_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CP_MSG_BASE1_ADDR_HI_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CP_MSG_BASE1_ADDR_HI_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CP_MSG_BASE1_ADDR_HI_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CP_MSG_BASE1_ADDR_HI_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CP_MSG_BASE1_ADDR_HI_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CP_MSG_BASE2_ADDR_LO_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CP_MSG_BASE2_ADDR_LO_1 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC4_QM0_CP_MSG_BASE2_ADDR_LO_2 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC4_QM0_CP_MSG_BASE2_ADDR_LO_2 &
+			PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC4_QM0_CP_MSG_BASE2_ADDR_LO_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CP_MSG_BASE2_ADDR_LO_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CP_MSG_BASE2_ADDR_LO_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CP_MSG_BASE2_ADDR_HI_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CP_MSG_BASE2_ADDR_HI_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CP_MSG_BASE2_ADDR_HI_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CP_MSG_BASE2_ADDR_HI_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CP_MSG_BASE2_ADDR_HI_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CP_MSG_BASE3_ADDR_LO_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CP_MSG_BASE3_ADDR_LO_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CP_MSG_BASE3_ADDR_LO_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CP_MSG_BASE3_ADDR_LO_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CP_MSG_BASE3_ADDR_LO_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CP_MSG_BASE3_ADDR_HI_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CP_MSG_BASE3_ADDR_HI_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CP_MSG_BASE3_ADDR_HI_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CP_MSG_BASE3_ADDR_HI_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CP_MSG_BASE3_ADDR_HI_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CP_LDMA_TSIZE_OFFSET_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CP_LDMA_TSIZE_OFFSET_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CP_LDMA_TSIZE_OFFSET_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CP_LDMA_TSIZE_OFFSET_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CP_LDMA_TSIZE_OFFSET_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CP_LDMA_SRC_BASE_LO_OFFSET_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CP_LDMA_SRC_BASE_LO_OFFSET_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CP_LDMA_SRC_BASE_LO_OFFSET_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CP_LDMA_SRC_BASE_LO_OFFSET_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CP_LDMA_SRC_BASE_LO_OFFSET_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CP_LDMA_DST_BASE_LO_OFFSET_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CP_LDMA_DST_BASE_LO_OFFSET_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CP_LDMA_DST_BASE_LO_OFFSET_2 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC4_QM0_CP_LDMA_DST_BASE_LO_OFFSET_3 & ~0xFFF) +
+			PROT_BITS_OFFS;
+	word_offset = ((mmNIC4_QM0_CP_LDMA_DST_BASE_LO_OFFSET_3 &
+			PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC4_QM0_CP_LDMA_DST_BASE_LO_OFFSET_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CP_LDMA_DST_BASE_LO_OFFSET_4 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC4_QM0_CP_STS_0 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC4_QM0_CP_STS_0 & PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC4_QM0_CP_STS_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CP_STS_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CP_STS_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CP_STS_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CP_STS_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CP_CURRENT_INST_LO_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CP_CURRENT_INST_LO_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CP_CURRENT_INST_LO_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CP_CURRENT_INST_LO_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CP_CURRENT_INST_LO_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CP_CURRENT_INST_HI_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CP_CURRENT_INST_HI_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CP_CURRENT_INST_HI_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CP_CURRENT_INST_HI_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CP_CURRENT_INST_HI_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CP_BARRIER_CFG_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CP_BARRIER_CFG_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CP_BARRIER_CFG_2 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC4_QM0_CP_BARRIER_CFG_3 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC4_QM0_CP_BARRIER_CFG_3 & PROT_BITS_OFFS)
+			>> 7) << 2;
+	mask = 1U << ((mmNIC4_QM0_CP_BARRIER_CFG_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CP_BARRIER_CFG_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CP_DBG_0_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CP_DBG_0_1 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC4_QM0_CP_DBG_0_2 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC4_QM0_CP_DBG_0_2 & PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC4_QM0_CP_DBG_0_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CP_DBG_0_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CP_DBG_0_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CP_ARUSER_31_11_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CP_ARUSER_31_11_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CP_ARUSER_31_11_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CP_ARUSER_31_11_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CP_ARUSER_31_11_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CP_AWUSER_31_11_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CP_AWUSER_31_11_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CP_AWUSER_31_11_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CP_AWUSER_31_11_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CP_AWUSER_31_11_4 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC4_QM0_ARB_CFG_0 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC4_QM0_ARB_CFG_0 & PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC4_QM0_ARB_CFG_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_ARB_MST_AVAIL_CRED_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_ARB_MST_AVAIL_CRED_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_ARB_MST_AVAIL_CRED_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_ARB_MST_AVAIL_CRED_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_ARB_MST_AVAIL_CRED_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_ARB_MST_AVAIL_CRED_5 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_ARB_MST_AVAIL_CRED_6 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_ARB_MST_AVAIL_CRED_7 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_ARB_MST_AVAIL_CRED_8 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_ARB_MST_AVAIL_CRED_9 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_ARB_MST_AVAIL_CRED_10 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_ARB_MST_AVAIL_CRED_11 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_ARB_MST_AVAIL_CRED_12 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_ARB_MST_AVAIL_CRED_13 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_ARB_MST_AVAIL_CRED_14 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_ARB_MST_AVAIL_CRED_15 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_ARB_MST_AVAIL_CRED_16 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_ARB_MST_AVAIL_CRED_17 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_ARB_MST_AVAIL_CRED_18 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_ARB_MST_AVAIL_CRED_19 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_ARB_MST_AVAIL_CRED_20 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_ARB_MST_AVAIL_CRED_21 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_ARB_MST_AVAIL_CRED_22 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_ARB_MST_AVAIL_CRED_23 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC4_QM0_ARB_MST_AVAIL_CRED_24 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC4_QM0_ARB_MST_AVAIL_CRED_24 &
+			PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC4_QM0_ARB_MST_AVAIL_CRED_24 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_ARB_MST_AVAIL_CRED_25 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_ARB_MST_AVAIL_CRED_26 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_ARB_MST_AVAIL_CRED_27 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_ARB_MST_AVAIL_CRED_28 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_ARB_MST_AVAIL_CRED_29 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_ARB_MST_AVAIL_CRED_30 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_ARB_MST_AVAIL_CRED_31 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC4_QM0_ARB_MST_CHOISE_PUSH_OFST_23 & ~0xFFF) +
+			PROT_BITS_OFFS;
+	word_offset = ((mmNIC4_QM0_ARB_MST_CHOISE_PUSH_OFST_23 &
+			PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC4_QM0_ARB_SLV_CHOISE_WDT & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_ARB_MSG_MAX_INFLIGHT & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_ARB_MSG_AWUSER_31_11 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_ARB_MSG_AWUSER_SEC_PROP & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_ARB_MSG_AWUSER_NON_SEC_PROP & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC4_QM0_ARB_STATE_STS & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC4_QM0_ARB_STATE_STS & PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC4_QM0_ARB_STATE_STS & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_ARB_CHOISE_FULLNESS_STS & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_ARB_MSG_STS & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_ARB_SLV_CHOISE_Q_HEAD & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_ARB_ERR_CAUSE & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_ARB_ERR_MSG_EN & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_ARB_ERR_STS_DRP & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_ARB_MST_CRED_STS_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_ARB_MST_CRED_STS_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_ARB_MST_CRED_STS_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_ARB_MST_CRED_STS_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_ARB_MST_CRED_STS_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_ARB_MST_CRED_STS_5 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_ARB_MST_CRED_STS_6 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_ARB_MST_CRED_STS_7 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_ARB_MST_CRED_STS_8 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_ARB_MST_CRED_STS_9 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_ARB_MST_CRED_STS_10 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_ARB_MST_CRED_STS_11 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_ARB_MST_CRED_STS_12 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_ARB_MST_CRED_STS_13 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_ARB_MST_CRED_STS_14 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_ARB_MST_CRED_STS_15 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_ARB_MST_CRED_STS_16 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_ARB_MST_CRED_STS_17 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_ARB_MST_CRED_STS_18 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_ARB_MST_CRED_STS_19 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC4_QM0_ARB_MST_CRED_STS_20 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC4_QM0_ARB_MST_CRED_STS_20 & PROT_BITS_OFFS)
+			>> 7) << 2;
+	mask = 1U << ((mmNIC4_QM0_ARB_MST_CRED_STS_20 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_ARB_MST_CRED_STS_21 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_ARB_MST_CRED_STS_22 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_ARB_MST_CRED_STS_23 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_ARB_MST_CRED_STS_24 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_ARB_MST_CRED_STS_25 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_ARB_MST_CRED_STS_26 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_ARB_MST_CRED_STS_27 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_ARB_MST_CRED_STS_28 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_ARB_MST_CRED_STS_29 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_ARB_MST_CRED_STS_30 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_ARB_MST_CRED_STS_31 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CGM_CFG & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CGM_STS & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CGM_CFG1 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC4_QM0_LOCAL_RANGE_BASE & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC4_QM0_LOCAL_RANGE_BASE & PROT_BITS_OFFS)
+			>> 7) << 2;
+	mask = 1U << ((mmNIC4_QM0_LOCAL_RANGE_BASE & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_LOCAL_RANGE_SIZE & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_CSMR_STRICT_PRIO_CFG & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_HBW_RD_RATE_LIM_CFG_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_LBW_WR_RATE_LIM_CFG_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_LBW_WR_RATE_LIM_CFG_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_HBW_RD_RATE_LIM_CFG_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_GLBL_AXCACHE & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_IND_GW_APB_CFG & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_IND_GW_APB_WDATA & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_IND_GW_APB_RDATA & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_IND_GW_APB_STATUS & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_GLBL_ERR_ADDR_LO & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_GLBL_ERR_ADDR_HI & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM0_GLBL_ERR_WDATA & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC4_QM0_GLBL_MEM_INIT_BUSY & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC4_QM0_GLBL_MEM_INIT_BUSY & PROT_BITS_OFFS)
+			>> 7) << 2;
+	mask = 1U << ((mmNIC4_QM0_GLBL_MEM_INIT_BUSY & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC4_QM1_GLBL_CFG0 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC4_QM1_GLBL_CFG0 & PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC4_QM1_GLBL_CFG0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_GLBL_CFG1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_GLBL_PROT & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_GLBL_ERR_CFG & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_GLBL_SECURE_PROPS_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_GLBL_SECURE_PROPS_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_GLBL_SECURE_PROPS_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_GLBL_SECURE_PROPS_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_GLBL_SECURE_PROPS_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_GLBL_NON_SECURE_PROPS_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_GLBL_NON_SECURE_PROPS_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_GLBL_NON_SECURE_PROPS_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_GLBL_NON_SECURE_PROPS_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_GLBL_NON_SECURE_PROPS_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_GLBL_STS0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_GLBL_STS1_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_GLBL_STS1_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_GLBL_STS1_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_GLBL_STS1_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_GLBL_STS1_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_GLBL_MSG_EN_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_GLBL_MSG_EN_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_GLBL_MSG_EN_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_GLBL_MSG_EN_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_GLBL_MSG_EN_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_PQ_BASE_LO_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_PQ_BASE_LO_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_PQ_BASE_LO_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_PQ_BASE_LO_3 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC4_QM1_PQ_BASE_HI_0 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC4_QM1_PQ_BASE_HI_0 & PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC4_QM1_PQ_BASE_HI_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_PQ_BASE_HI_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_PQ_BASE_HI_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_PQ_BASE_HI_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_PQ_SIZE_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_PQ_SIZE_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_PQ_SIZE_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_PQ_SIZE_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_PQ_PI_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_PQ_PI_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_PQ_PI_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_PQ_PI_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_PQ_CI_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_PQ_CI_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_PQ_CI_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_PQ_CI_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_PQ_CFG0_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_PQ_CFG0_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_PQ_CFG0_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_PQ_CFG0_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_PQ_CFG1_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_PQ_CFG1_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_PQ_CFG1_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_PQ_CFG1_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_PQ_ARUSER_31_11_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_PQ_ARUSER_31_11_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_PQ_ARUSER_31_11_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_PQ_ARUSER_31_11_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_PQ_STS0_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_PQ_STS0_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_PQ_STS0_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_PQ_STS0_3 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC4_QM1_PQ_STS1_0 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC4_QM1_PQ_STS1_0 & PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC4_QM1_PQ_STS1_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_PQ_STS1_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_PQ_STS1_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_PQ_STS1_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CQ_STS0_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CQ_STS0_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CQ_STS0_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CQ_STS0_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CQ_STS1_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CQ_STS1_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CQ_STS1_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CQ_STS1_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CQ_PTR_LO_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CQ_PTR_HI_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CQ_TSIZE_0 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC4_QM1_CQ_CTL_0 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC4_QM1_CQ_CTL_0 & PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC4_QM1_CQ_CTL_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CQ_PTR_LO_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CQ_PTR_HI_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CQ_TSIZE_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CQ_CTL_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CQ_PTR_LO_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CQ_PTR_HI_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CQ_TSIZE_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CQ_CTL_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CQ_PTR_LO_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CQ_PTR_HI_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CQ_TSIZE_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CQ_CTL_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CQ_PTR_LO_STS_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CQ_PTR_LO_STS_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CQ_PTR_LO_STS_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CQ_PTR_LO_STS_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CQ_PTR_LO_STS_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CQ_PTR_HI_STS_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CQ_PTR_HI_STS_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CQ_PTR_HI_STS_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CQ_PTR_HI_STS_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CQ_PTR_HI_STS_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CQ_TSIZE_STS_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CQ_TSIZE_STS_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CQ_TSIZE_STS_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CQ_TSIZE_STS_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CQ_TSIZE_STS_4 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC4_QM1_CQ_CTL_STS_0 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC4_QM1_CQ_CTL_STS_0 & PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC4_QM1_CQ_CTL_STS_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CQ_CTL_STS_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CQ_CTL_STS_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CQ_CTL_STS_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CQ_CTL_STS_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CQ_IFIFO_CNT_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CQ_IFIFO_CNT_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CQ_IFIFO_CNT_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CQ_IFIFO_CNT_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CQ_IFIFO_CNT_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CP_MSG_BASE0_ADDR_LO_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CP_MSG_BASE0_ADDR_LO_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CP_MSG_BASE0_ADDR_LO_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CP_MSG_BASE0_ADDR_LO_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CP_MSG_BASE0_ADDR_LO_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CP_MSG_BASE0_ADDR_HI_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CP_MSG_BASE0_ADDR_HI_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CP_MSG_BASE0_ADDR_HI_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CP_MSG_BASE0_ADDR_HI_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CP_MSG_BASE0_ADDR_HI_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CP_MSG_BASE1_ADDR_LO_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CP_MSG_BASE1_ADDR_LO_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CP_MSG_BASE1_ADDR_LO_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CP_MSG_BASE1_ADDR_LO_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CP_MSG_BASE1_ADDR_LO_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CP_MSG_BASE1_ADDR_HI_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CP_MSG_BASE1_ADDR_HI_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CP_MSG_BASE1_ADDR_HI_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CP_MSG_BASE1_ADDR_HI_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CP_MSG_BASE1_ADDR_HI_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CP_MSG_BASE2_ADDR_LO_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CP_MSG_BASE2_ADDR_LO_1 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC4_QM1_CP_MSG_BASE2_ADDR_LO_2 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC4_QM1_CP_MSG_BASE2_ADDR_LO_2 &
+			PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC4_QM1_CP_MSG_BASE2_ADDR_LO_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CP_MSG_BASE2_ADDR_LO_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CP_MSG_BASE2_ADDR_LO_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CP_MSG_BASE2_ADDR_HI_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CP_MSG_BASE2_ADDR_HI_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CP_MSG_BASE2_ADDR_HI_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CP_MSG_BASE2_ADDR_HI_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CP_MSG_BASE2_ADDR_HI_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CP_MSG_BASE3_ADDR_LO_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CP_MSG_BASE3_ADDR_LO_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CP_MSG_BASE3_ADDR_LO_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CP_MSG_BASE3_ADDR_LO_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CP_MSG_BASE3_ADDR_LO_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CP_MSG_BASE3_ADDR_HI_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CP_MSG_BASE3_ADDR_HI_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CP_MSG_BASE3_ADDR_HI_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CP_MSG_BASE3_ADDR_HI_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CP_MSG_BASE3_ADDR_HI_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CP_LDMA_TSIZE_OFFSET_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CP_LDMA_TSIZE_OFFSET_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CP_LDMA_TSIZE_OFFSET_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CP_LDMA_TSIZE_OFFSET_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CP_LDMA_TSIZE_OFFSET_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CP_LDMA_SRC_BASE_LO_OFFSET_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CP_LDMA_SRC_BASE_LO_OFFSET_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CP_LDMA_SRC_BASE_LO_OFFSET_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CP_LDMA_SRC_BASE_LO_OFFSET_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CP_LDMA_SRC_BASE_LO_OFFSET_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CP_LDMA_DST_BASE_LO_OFFSET_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CP_LDMA_DST_BASE_LO_OFFSET_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CP_LDMA_DST_BASE_LO_OFFSET_2 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC4_QM1_CP_LDMA_DST_BASE_LO_OFFSET_3 & ~0xFFF) +
+			PROT_BITS_OFFS;
+	word_offset = ((mmNIC4_QM1_CP_LDMA_DST_BASE_LO_OFFSET_3 &
+			PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC4_QM1_CP_LDMA_DST_BASE_LO_OFFSET_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CP_LDMA_DST_BASE_LO_OFFSET_4 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC4_QM1_CP_STS_0 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC4_QM1_CP_STS_0 & PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC4_QM1_CP_STS_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CP_STS_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CP_STS_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CP_STS_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CP_STS_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CP_CURRENT_INST_LO_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CP_CURRENT_INST_LO_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CP_CURRENT_INST_LO_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CP_CURRENT_INST_LO_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CP_CURRENT_INST_LO_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CP_CURRENT_INST_HI_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CP_CURRENT_INST_HI_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CP_CURRENT_INST_HI_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CP_CURRENT_INST_HI_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CP_CURRENT_INST_HI_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CP_BARRIER_CFG_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CP_BARRIER_CFG_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CP_BARRIER_CFG_2 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC4_QM1_CP_BARRIER_CFG_3 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC4_QM1_CP_BARRIER_CFG_3 & PROT_BITS_OFFS)
+			>> 7) << 2;
+	mask = 1U << ((mmNIC4_QM1_CP_BARRIER_CFG_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CP_BARRIER_CFG_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CP_DBG_0_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CP_DBG_0_1 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC4_QM1_CP_DBG_0_2 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC4_QM1_CP_DBG_0_2 & PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC4_QM1_CP_DBG_0_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CP_DBG_0_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CP_DBG_0_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CP_ARUSER_31_11_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CP_ARUSER_31_11_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CP_ARUSER_31_11_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CP_ARUSER_31_11_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CP_ARUSER_31_11_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CP_AWUSER_31_11_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CP_AWUSER_31_11_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CP_AWUSER_31_11_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CP_AWUSER_31_11_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CP_AWUSER_31_11_4 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC4_QM1_ARB_CFG_0 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC4_QM1_ARB_CFG_0 & PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC4_QM1_ARB_CFG_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_ARB_MST_AVAIL_CRED_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_ARB_MST_AVAIL_CRED_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_ARB_MST_AVAIL_CRED_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_ARB_MST_AVAIL_CRED_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_ARB_MST_AVAIL_CRED_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_ARB_MST_AVAIL_CRED_5 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_ARB_MST_AVAIL_CRED_6 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_ARB_MST_AVAIL_CRED_7 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_ARB_MST_AVAIL_CRED_8 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_ARB_MST_AVAIL_CRED_9 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_ARB_MST_AVAIL_CRED_10 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_ARB_MST_AVAIL_CRED_11 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_ARB_MST_AVAIL_CRED_12 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_ARB_MST_AVAIL_CRED_13 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_ARB_MST_AVAIL_CRED_14 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_ARB_MST_AVAIL_CRED_15 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_ARB_MST_AVAIL_CRED_16 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_ARB_MST_AVAIL_CRED_17 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_ARB_MST_AVAIL_CRED_18 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_ARB_MST_AVAIL_CRED_19 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_ARB_MST_AVAIL_CRED_20 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_ARB_MST_AVAIL_CRED_21 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_ARB_MST_AVAIL_CRED_22 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_ARB_MST_AVAIL_CRED_23 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC4_QM1_ARB_MST_AVAIL_CRED_24 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC4_QM1_ARB_MST_AVAIL_CRED_24 &
+			PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC4_QM1_ARB_MST_AVAIL_CRED_24 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_ARB_MST_AVAIL_CRED_25 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_ARB_MST_AVAIL_CRED_26 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_ARB_MST_AVAIL_CRED_27 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_ARB_MST_AVAIL_CRED_28 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_ARB_MST_AVAIL_CRED_29 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_ARB_MST_AVAIL_CRED_30 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_ARB_MST_AVAIL_CRED_31 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC4_QM1_ARB_MST_CHOISE_PUSH_OFST_23 & ~0xFFF) +
+			PROT_BITS_OFFS;
+	word_offset = ((mmNIC4_QM1_ARB_MST_CHOISE_PUSH_OFST_23 &
+			PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC4_QM1_ARB_SLV_CHOISE_WDT & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_ARB_MSG_MAX_INFLIGHT & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_ARB_MSG_AWUSER_31_11 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_ARB_MSG_AWUSER_SEC_PROP & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_ARB_MSG_AWUSER_NON_SEC_PROP & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC4_QM1_ARB_STATE_STS & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC4_QM1_ARB_STATE_STS & PROT_BITS_OFFS) >> 7) << 2;
+	mask = 1U << ((mmNIC4_QM1_ARB_STATE_STS & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_ARB_CHOISE_FULLNESS_STS & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_ARB_MSG_STS & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_ARB_SLV_CHOISE_Q_HEAD & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_ARB_ERR_CAUSE & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_ARB_ERR_MSG_EN & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_ARB_ERR_STS_DRP & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_ARB_MST_CRED_STS_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_ARB_MST_CRED_STS_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_ARB_MST_CRED_STS_2 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_ARB_MST_CRED_STS_3 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_ARB_MST_CRED_STS_4 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_ARB_MST_CRED_STS_5 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_ARB_MST_CRED_STS_6 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_ARB_MST_CRED_STS_7 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_ARB_MST_CRED_STS_8 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_ARB_MST_CRED_STS_9 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_ARB_MST_CRED_STS_10 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_ARB_MST_CRED_STS_11 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_ARB_MST_CRED_STS_12 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_ARB_MST_CRED_STS_13 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_ARB_MST_CRED_STS_14 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_ARB_MST_CRED_STS_15 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_ARB_MST_CRED_STS_16 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_ARB_MST_CRED_STS_17 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_ARB_MST_CRED_STS_18 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_ARB_MST_CRED_STS_19 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC4_QM1_ARB_MST_CRED_STS_20 & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC4_QM1_ARB_MST_CRED_STS_20 & PROT_BITS_OFFS)
+			>> 7) << 2;
+	mask = 1U << ((mmNIC4_QM1_ARB_MST_CRED_STS_20 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_ARB_MST_CRED_STS_21 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_ARB_MST_CRED_STS_22 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_ARB_MST_CRED_STS_23 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_ARB_MST_CRED_STS_24 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_ARB_MST_CRED_STS_25 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_ARB_MST_CRED_STS_26 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_ARB_MST_CRED_STS_27 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_ARB_MST_CRED_STS_28 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_ARB_MST_CRED_STS_29 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_ARB_MST_CRED_STS_30 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_ARB_MST_CRED_STS_31 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CGM_CFG & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CGM_STS & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CGM_CFG1 & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC4_QM1_LOCAL_RANGE_BASE & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC4_QM1_LOCAL_RANGE_BASE & PROT_BITS_OFFS)
+			>> 7) << 2;
+	mask = 1U << ((mmNIC4_QM1_LOCAL_RANGE_BASE & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_LOCAL_RANGE_SIZE & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_CSMR_STRICT_PRIO_CFG & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_HBW_RD_RATE_LIM_CFG_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_LBW_WR_RATE_LIM_CFG_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_LBW_WR_RATE_LIM_CFG_1 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_HBW_RD_RATE_LIM_CFG_0 & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_GLBL_AXCACHE & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_IND_GW_APB_CFG & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_IND_GW_APB_WDATA & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_IND_GW_APB_RDATA & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_IND_GW_APB_STATUS & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_GLBL_ERR_ADDR_LO & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_GLBL_ERR_ADDR_HI & 0x7F) >> 2);
+	mask |= 1U << ((mmNIC4_QM1_GLBL_ERR_WDATA & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+
+	pb_addr = (mmNIC4_QM1_GLBL_MEM_INIT_BUSY & ~0xFFF) + PROT_BITS_OFFS;
+	word_offset = ((mmNIC4_QM1_GLBL_MEM_INIT_BUSY & PROT_BITS_OFFS)
+			>> 7) << 2;
+	mask = 1U << ((mmNIC4_QM1_GLBL_MEM_INIT_BUSY & 0x7F) >> 2);
+
+	WREG32(pb_addr + word_offset, ~mask);
+}
+
 static void gaudi_init_tpc_protection_bits(struct hl_device *hdev)
 {
 	u32 pb_addr, mask;
@@ -8861,6 +12832,8 @@ static void gaudi_init_protection_bits(struct hl_device *hdev)
 
 	gaudi_init_mme_protection_bits(hdev);
 
+	gaudi_init_nic_protection_bits(hdev);
+
 	gaudi_init_tpc_protection_bits(hdev);
 }
 
-- 
2.17.1
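
For readers following the arithmetic in gaudi_init_nic_protection_bits(),
here is a minimal sketch of how each register address appears to map to a
protection word and a bit inside it. It is not part of the patch. The value
of PROT_BITS_OFFS is an assumption (the macro is defined outside this
excerpt), the pb_*() helper names are hypothetical, and the register
addresses in the usage note are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed value: the real macro is defined elsewhere in the driver. */
#define PROT_BITS_OFFS 0xF80u

/* Start of the protection-bits area in the 4 KB block that holds reg. */
static uint32_t pb_block_addr(uint32_t reg)
{
	return (reg & ~0xFFFu) + PROT_BITS_OFFS;
}

/* Byte offset of the 32-bit protection word covering reg's 128-byte window. */
static uint32_t pb_word_offset(uint32_t reg)
{
	return ((reg & PROT_BITS_OFFS) >> 7) << 2;
}

/* Bit inside that word which corresponds to reg (one bit per 32-bit reg). */
static uint32_t pb_bit(uint32_t reg)
{
	assert((reg & 0x3u) == 0);	/* registers are 4-byte aligned */
	return 1u << ((reg & 0x7Fu) >> 2);
}

/*
 * Value written for a group of registers that share one protection word.
 * The collected bits are inverted, as in WREG32(pb_addr + word_offset, ~mask).
 */
static uint32_t pb_word_value(const uint32_t *regs, int n)
{
	uint32_t mask = 0;
	int i;

	for (i = 0; i < n; i++) {
		/* Every register must sit in the same 128-byte window. */
		assert((regs[i] & ~0x7Fu) == (regs[0] & ~0x7Fu));
		mask |= pb_bit(regs[i]);
	}
	return ~mask;
}
```

With the assumed constant, a made-up register at 0xCE2184 maps to the
protection area at 0xCE2F80, word offset 0xC, bit 1. This also suggests why
the patch starts a new pb_addr/word_offset group whenever a register list
crosses a 128-byte boundary: one 32-bit word covers 32 registers.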



* [PATCH 04/15] habanalabs/gaudi: add support for NIC QMANs
  2020-09-10 16:11 [PATCH 00/15] Adding GAUDI NIC code to habanalabs driver Oded Gabbay
  2020-09-10 16:11 ` [PATCH 02/15] habanalabs/gaudi: add NIC firmware-related definitions Oded Gabbay
  2020-09-10 16:11 ` [PATCH 03/15] habanalabs/gaudi: add NIC security configuration Oded Gabbay
@ 2020-09-10 16:11 ` Oded Gabbay
  2020-09-10 16:11 ` [PATCH 05/15] habanalabs/gaudi: add NIC Ethernet support Oded Gabbay
                   ` (11 subsequent siblings)
  14 siblings, 0 replies; 44+ messages in thread
From: Oded Gabbay @ 2020-09-10 16:11 UTC (permalink / raw)
  To: linux-kernel, netdev; +Cc: SW_Drivers, gregkh, davem, kuba, Omer Shpigelman

From: Omer Shpigelman <oshpigelman@habana.ai>

Initialize the QMANs that are responsible for submitting doorbells to the
NIC engines. Add support for stopping and disabling them, and reset them
as part of the GAUDI hard-reset procedure. This allows the user to submit
work to the NICs.

Add support for receiving events on QMAN errors from the firmware.

However, nic_ports_mask is still initialized to 0, so this code does not
initialize the QMANs yet. That will be enabled in a later patch.
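
As an illustration of how the port mask gates the rest of this patch, here
is a small stand-alone model. It is a sketch only: the function names are
hypothetical, and it checks the port mask directly, whereas the driver
checks the capability bits that are set once a QMAN has been initialized.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_PORTS        10
#define STREAMS_PER_QMAN 4	/* the patch lists four queues per NIC port */

/* Mirrors the late-init rule in the patch: PCI cards drop ports 0 and 1. */
static uint64_t effective_ports_mask(uint64_t ports_mask, int is_pci_card)
{
	if (is_pci_card)
		ports_mask &= ~(uint64_t)0x3;
	return ports_mask;
}

/* Number of ports whose QMAN would be initialized for a given mask. */
static int enabled_port_count(uint64_t ports_mask)
{
	int nic_id, count = 0;

	for (nic_id = 0; nic_id < NUM_PORTS; nic_id++)
		if (ports_mask & ((uint64_t)1 << nic_id))
			count++;
	return count;
}

/* Map a queue index, relative to the first NIC queue, to its port. */
static int queue_to_port(int rel_queue_id)
{
	assert(rel_queue_id >= 0 &&
	       rel_queue_id < NUM_PORTS * STREAMS_PER_QMAN);
	return rel_queue_id >> 2;
}

/* A job is accepted only if the port behind its queue is enabled. */
static int queue_is_usable(uint64_t ports_mask, int rel_queue_id)
{
	return (int)((ports_mask >> queue_to_port(rel_queue_id)) & 1);
}
```

With a mask of 0, enabled_port_count() returns 0, which matches the
statement above that no QMAN is initialized by this patch alone. With all
ten ports set on a PCI card, the mask becomes 0x3FC and eight ports remain.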

Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
---
 drivers/misc/habanalabs/common/habanalabs.h |   3 +-
 drivers/misc/habanalabs/gaudi/gaudi.c       | 741 ++++++++++++++++++--
 drivers/misc/habanalabs/gaudi/gaudiP.h      |  32 +
 3 files changed, 731 insertions(+), 45 deletions(-)
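
Note on the register offsets (not part of the commit): the init and disable
loops below advance nic_offset incrementally, while
gaudi_restore_qm_registers() computes it in closed form from the port
number. The sketch below shows that the two forms agree, given two QMANs per
NIC macro as the register names suggest. The stride values are placeholders
picked for illustration, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Placeholder strides; the driver derives the real ones from registers. */
#define QMAN_STRIDE	0x1000u		/* hypothetical: NICn_QM1 - NICn_QM0 */
#define MACRO_STRIDE	0x40000u	/* hypothetical: NIC(n+1)_QM0 - NICn_QM0 */
#define NUM_PORTS	10

/* Closed form: port -> (macro, engine) -> register offset. */
static uint32_t nic_qman_offset(int nic_id)
{
	assert(nic_id >= 0 && nic_id < NUM_PORTS);
	return (uint32_t)(nic_id >> 1) * MACRO_STRIDE +
	       (uint32_t)(nic_id & 1) * QMAN_STRIDE;
}

/* Incremental form, mirroring the loop in gaudi_init_nic_qmans(). */
static uint32_t nic_qman_offset_walk(int nic_id)
{
	uint32_t off = 0;
	int i;

	for (i = 0; i < nic_id; i++) {
		off += QMAN_STRIDE;
		if (i & 1) {
			/* Leaving the second QMAN: jump to the next macro. */
			off -= QMAN_STRIDE * 2;
			off += MACRO_STRIDE;
		}
	}
	return off;
}
```

The closed form is easier to audit and avoids repeating the update in the
skip path of the loop, but the incremental form is what this patch uses in
the init and disable functions.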

diff --git a/drivers/misc/habanalabs/common/habanalabs.h b/drivers/misc/habanalabs/common/habanalabs.h
index ec765320159a..3be39d8b0563 100644
--- a/drivers/misc/habanalabs/common/habanalabs.h
+++ b/drivers/misc/habanalabs/common/habanalabs.h
@@ -1516,8 +1516,6 @@ struct hl_device_idle_busy_ts {
  * @pmmu_huge_range: is a different virtual addresses range used for PMMU with
  *                   huge pages.
  * @init_done: is the initialization of the device done.
- * @mmu_enable: is MMU enabled.
- * @mmu_huge_page_opt: is MMU huge pages optimization enabled.
  * @device_cpu_disabled: is the device CPU disabled (due to timeouts)
  * @dma_mask: the dma mask that was set for this device
  * @in_debug: is device under debug. This, together with fpriv_list, enforces
@@ -1630,6 +1628,7 @@ struct hl_device {
 	u8				supports_soft_reset;
 
 	/* Parameters for bring-up */
+	u64				nic_ports_mask;
 	u8				mmu_enable;
 	u8				mmu_huge_page_opt;
 	u8				cpu_enable;
diff --git a/drivers/misc/habanalabs/gaudi/gaudi.c b/drivers/misc/habanalabs/gaudi/gaudi.c
index 483989500863..2159e14be4ef 100644
--- a/drivers/misc/habanalabs/gaudi/gaudi.c
+++ b/drivers/misc/habanalabs/gaudi/gaudi.c
@@ -301,46 +301,46 @@ static enum hl_queue_type gaudi_queue_type[GAUDI_QUEUE_ID_SIZE] = {
 	QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_TPC_7_1 */
 	QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_TPC_7_2 */
 	QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_TPC_7_3 */
-	QUEUE_TYPE_NA,  /* GAUDI_QUEUE_ID_NIC_0_0 */
-	QUEUE_TYPE_NA,  /* GAUDI_QUEUE_ID_NIC_0_1 */
-	QUEUE_TYPE_NA,  /* GAUDI_QUEUE_ID_NIC_0_2 */
-	QUEUE_TYPE_NA,  /* GAUDI_QUEUE_ID_NIC_0_3 */
-	QUEUE_TYPE_NA,  /* GAUDI_QUEUE_ID_NIC_1_0 */
-	QUEUE_TYPE_NA,  /* GAUDI_QUEUE_ID_NIC_1_1 */
-	QUEUE_TYPE_NA,  /* GAUDI_QUEUE_ID_NIC_1_2 */
-	QUEUE_TYPE_NA,  /* GAUDI_QUEUE_ID_NIC_1_3 */
-	QUEUE_TYPE_NA,  /* GAUDI_QUEUE_ID_NIC_2_0 */
-	QUEUE_TYPE_NA,  /* GAUDI_QUEUE_ID_NIC_2_1 */
-	QUEUE_TYPE_NA,  /* GAUDI_QUEUE_ID_NIC_2_2 */
-	QUEUE_TYPE_NA,  /* GAUDI_QUEUE_ID_NIC_2_3 */
-	QUEUE_TYPE_NA,  /* GAUDI_QUEUE_ID_NIC_3_0 */
-	QUEUE_TYPE_NA,  /* GAUDI_QUEUE_ID_NIC_3_1 */
-	QUEUE_TYPE_NA,  /* GAUDI_QUEUE_ID_NIC_3_2 */
-	QUEUE_TYPE_NA,  /* GAUDI_QUEUE_ID_NIC_3_3 */
-	QUEUE_TYPE_NA,  /* GAUDI_QUEUE_ID_NIC_4_0 */
-	QUEUE_TYPE_NA,  /* GAUDI_QUEUE_ID_NIC_4_1 */
-	QUEUE_TYPE_NA,  /* GAUDI_QUEUE_ID_NIC_4_2 */
-	QUEUE_TYPE_NA,  /* GAUDI_QUEUE_ID_NIC_4_3 */
-	QUEUE_TYPE_NA,  /* GAUDI_QUEUE_ID_NIC_5_0 */
-	QUEUE_TYPE_NA,  /* GAUDI_QUEUE_ID_NIC_5_1 */
-	QUEUE_TYPE_NA,  /* GAUDI_QUEUE_ID_NIC_5_2 */
-	QUEUE_TYPE_NA,  /* GAUDI_QUEUE_ID_NIC_5_3 */
-	QUEUE_TYPE_NA,  /* GAUDI_QUEUE_ID_NIC_6_0 */
-	QUEUE_TYPE_NA,  /* GAUDI_QUEUE_ID_NIC_6_1 */
-	QUEUE_TYPE_NA,  /* GAUDI_QUEUE_ID_NIC_6_2 */
-	QUEUE_TYPE_NA,  /* GAUDI_QUEUE_ID_NIC_6_3 */
-	QUEUE_TYPE_NA,  /* GAUDI_QUEUE_ID_NIC_7_0 */
-	QUEUE_TYPE_NA,  /* GAUDI_QUEUE_ID_NIC_7_1 */
-	QUEUE_TYPE_NA,  /* GAUDI_QUEUE_ID_NIC_7_2 */
-	QUEUE_TYPE_NA,  /* GAUDI_QUEUE_ID_NIC_7_3 */
-	QUEUE_TYPE_NA,  /* GAUDI_QUEUE_ID_NIC_8_0 */
-	QUEUE_TYPE_NA,  /* GAUDI_QUEUE_ID_NIC_8_1 */
-	QUEUE_TYPE_NA,  /* GAUDI_QUEUE_ID_NIC_8_2 */
-	QUEUE_TYPE_NA,  /* GAUDI_QUEUE_ID_NIC_8_3 */
-	QUEUE_TYPE_NA,  /* GAUDI_QUEUE_ID_NIC_9_0 */
-	QUEUE_TYPE_NA,  /* GAUDI_QUEUE_ID_NIC_9_1 */
-	QUEUE_TYPE_NA,  /* GAUDI_QUEUE_ID_NIC_9_2 */
-	QUEUE_TYPE_NA,  /* GAUDI_QUEUE_ID_NIC_9_3 */
+	QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_NIC_0_0 */
+	QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_NIC_0_1 */
+	QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_NIC_0_2 */
+	QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_NIC_0_3 */
+	QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_NIC_1_0 */
+	QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_NIC_1_1 */
+	QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_NIC_1_2 */
+	QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_NIC_1_3 */
+	QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_NIC_2_0 */
+	QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_NIC_2_1 */
+	QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_NIC_2_2 */
+	QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_NIC_2_3 */
+	QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_NIC_3_0 */
+	QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_NIC_3_1 */
+	QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_NIC_3_2 */
+	QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_NIC_3_3 */
+	QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_NIC_4_0 */
+	QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_NIC_4_1 */
+	QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_NIC_4_2 */
+	QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_NIC_4_3 */
+	QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_NIC_5_0 */
+	QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_NIC_5_1 */
+	QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_NIC_5_2 */
+	QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_NIC_5_3 */
+	QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_NIC_6_0 */
+	QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_NIC_6_1 */
+	QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_NIC_6_2 */
+	QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_NIC_6_3 */
+	QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_NIC_7_0 */
+	QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_NIC_7_1 */
+	QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_NIC_7_2 */
+	QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_NIC_7_3 */
+	QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_NIC_8_0 */
+	QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_NIC_8_1 */
+	QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_NIC_8_2 */
+	QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_NIC_8_3 */
+	QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_NIC_9_0 */
+	QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_NIC_9_1 */
+	QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_NIC_9_2 */
+	QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_NIC_9_3 */
 };
 
 struct ecc_info_extract_params {
@@ -792,6 +792,27 @@ static int gaudi_late_init(struct hl_device *hdev)
 		return rc;
 	}
 
+	if ((hdev->card_type == cpucp_card_type_pci) &&
+			(hdev->nic_ports_mask & 0x3)) {
+		dev_info(hdev->dev,
+			"PCI card detected, only 8 ports are enabled\n");
+		hdev->nic_ports_mask &= ~0x3;
+
+		/* Stop and disable unused NIC QMANs */
+		WREG32(mmNIC0_QM0_GLBL_CFG1, NIC0_QM0_GLBL_CFG1_PQF_STOP_MASK |
+					NIC0_QM0_GLBL_CFG1_CQF_STOP_MASK |
+					NIC0_QM0_GLBL_CFG1_CP_STOP_MASK);
+
+		WREG32(mmNIC0_QM1_GLBL_CFG1, NIC0_QM0_GLBL_CFG1_PQF_STOP_MASK |
+					NIC0_QM0_GLBL_CFG1_CQF_STOP_MASK |
+					NIC0_QM0_GLBL_CFG1_CP_STOP_MASK);
+
+		WREG32(mmNIC0_QM0_GLBL_CFG0, 0);
+		WREG32(mmNIC0_QM1_GLBL_CFG0, 0);
+
+		gaudi->hw_cap_initialized &= ~(HW_CAP_NIC0 | HW_CAP_NIC1);
+	}
+
 	rc = hl_fw_send_pci_access_msg(hdev, CPUCP_PACKET_ENABLE_PCI_ACCESS);
 	if (rc) {
 		dev_err(hdev->dev, "Failed to enable PCI access from CPU\n");
@@ -938,6 +959,9 @@ static int gaudi_alloc_internal_qmans_pq_mem(struct hl_device *hdev)
 		case GAUDI_QUEUE_ID_TPC_0_0 ... GAUDI_QUEUE_ID_TPC_7_3:
 			q->pq_size = TPC_QMAN_SIZE_IN_BYTES;
 			break;
+		case GAUDI_QUEUE_ID_NIC_0_0 ... GAUDI_QUEUE_ID_NIC_9_3:
+			q->pq_size = NIC_QMAN_SIZE_IN_BYTES;
+			break;
 		default:
 			dev_err(hdev->dev, "Bad internal queue index %d", i);
 			rc = -EINVAL;
@@ -2332,6 +2356,133 @@ static void gaudi_init_tpc_qmans(struct hl_device *hdev)
 	}
 }
 
+static void gaudi_init_nic_qman(struct hl_device *hdev, u32 nic_offset,
+				int qman_id, u64 qman_base_addr, int nic_id)
+{
+	u32 mtr_base_lo, mtr_base_hi;
+	u32 so_base_lo, so_base_hi;
+	u32 q_off;
+	u32 nic_qm_err_cfg;
+
+	mtr_base_lo = lower_32_bits(CFG_BASE +
+				mmSYNC_MNGR_E_N_SYNC_MNGR_OBJS_MON_PAY_ADDRL_0);
+	mtr_base_hi = upper_32_bits(CFG_BASE +
+				mmSYNC_MNGR_E_N_SYNC_MNGR_OBJS_MON_PAY_ADDRL_0);
+	so_base_lo = lower_32_bits(CFG_BASE +
+				mmSYNC_MNGR_E_N_SYNC_MNGR_OBJS_SOB_OBJ_0);
+	so_base_hi = upper_32_bits(CFG_BASE +
+				mmSYNC_MNGR_E_N_SYNC_MNGR_OBJS_SOB_OBJ_0);
+
+	q_off = nic_offset + qman_id * 4;
+
+	WREG32(mmNIC0_QM0_PQ_BASE_LO_0 + q_off, lower_32_bits(qman_base_addr));
+	WREG32(mmNIC0_QM0_PQ_BASE_HI_0 + q_off, upper_32_bits(qman_base_addr));
+
+	WREG32(mmNIC0_QM0_PQ_SIZE_0 + q_off, ilog2(NIC_QMAN_LENGTH));
+	WREG32(mmNIC0_QM0_PQ_PI_0 + q_off, 0);
+	WREG32(mmNIC0_QM0_PQ_CI_0 + q_off, 0);
+
+	WREG32(mmNIC0_QM0_CP_LDMA_TSIZE_OFFSET_0 + q_off, 0x74);
+	WREG32(mmNIC0_QM0_CP_LDMA_SRC_BASE_LO_OFFSET_0 + q_off, 0x14);
+	WREG32(mmNIC0_QM0_CP_LDMA_DST_BASE_LO_OFFSET_0 + q_off, 0x1C);
+
+	WREG32(mmNIC0_QM0_CP_MSG_BASE0_ADDR_LO_0 + q_off, mtr_base_lo);
+	WREG32(mmNIC0_QM0_CP_MSG_BASE0_ADDR_HI_0 + q_off, mtr_base_hi);
+	WREG32(mmNIC0_QM0_CP_MSG_BASE1_ADDR_LO_0 + q_off, so_base_lo);
+	WREG32(mmNIC0_QM0_CP_MSG_BASE1_ADDR_HI_0 + q_off, so_base_hi);
+
+	if (qman_id == 0) {
+		/* Configure RAZWI IRQ */
+		nic_qm_err_cfg = NIC_QMAN_GLBL_ERR_CFG_MSG_EN_MASK;
+		if (hdev->stop_on_err) {
+			nic_qm_err_cfg |=
+				NIC_QMAN_GLBL_ERR_CFG_STOP_ON_ERR_EN_MASK;
+		}
+
+		WREG32(mmNIC0_QM0_GLBL_ERR_CFG + nic_offset, nic_qm_err_cfg);
+		WREG32(mmNIC0_QM0_GLBL_ERR_ADDR_LO + nic_offset,
+			lower_32_bits(CFG_BASE +
+				mmGIC_DISTRIBUTOR__5_GICD_SETSPI_NSR));
+		WREG32(mmNIC0_QM0_GLBL_ERR_ADDR_HI + nic_offset,
+			upper_32_bits(CFG_BASE +
+				mmGIC_DISTRIBUTOR__5_GICD_SETSPI_NSR));
+		WREG32(mmNIC0_QM0_GLBL_ERR_WDATA + nic_offset,
+			gaudi_irq_map_table[GAUDI_EVENT_NIC0_QM0].cpu_id +
+									nic_id);
+
+		WREG32(mmNIC0_QM0_ARB_ERR_MSG_EN + nic_offset,
+				QM_ARB_ERR_MSG_EN_MASK);
+
+		/* Increase ARB WDT to support streams architecture */
+		WREG32(mmNIC0_QM0_ARB_SLV_CHOISE_WDT + nic_offset,
+				GAUDI_ARB_WDT_TIMEOUT);
+
+		WREG32(mmNIC0_QM0_GLBL_CFG1 + nic_offset, 0);
+		WREG32(mmNIC0_QM0_GLBL_PROT + nic_offset,
+				QMAN_INTERNAL_MAKE_TRUSTED);
+	}
+}
+
+/**
+ * gaudi_init_nic_qmans - Initialize NIC QMAN registers
+ *
+ * @hdev: pointer to hl_device structure
+ *
+ * Initialize the H/W registers of the NIC QMANs
+ *
+ */
+void gaudi_init_nic_qmans(struct hl_device *hdev)
+{
+	struct gaudi_device *gaudi = hdev->asic_specific;
+	struct gaudi_internal_qman_info *q;
+	u64 qman_base_addr;
+	u32 nic_offset = 0;
+	u32 nic_delta_between_qmans =
+			mmNIC0_QM1_GLBL_CFG0 - mmNIC0_QM0_GLBL_CFG0;
+	u32 nic_delta_between_nics =
+			mmNIC1_QM0_GLBL_CFG0 - mmNIC0_QM0_GLBL_CFG0;
+	int i, nic_id, internal_q_index;
+
+	if (!hdev->nic_ports_mask)
+		return;
+
+	if (gaudi->hw_cap_initialized & HW_CAP_NIC_MASK)
+		return;
+
+	dev_dbg(hdev->dev, "Initializing NIC QMANs\n");
+
+	for (nic_id = 0 ; nic_id < NIC_NUMBER_OF_ENGINES ; nic_id++) {
+		if (!(hdev->nic_ports_mask & (1 << nic_id))) {
+			nic_offset += nic_delta_between_qmans;
+			if (nic_id & 1) {
+				nic_offset -= (nic_delta_between_qmans * 2);
+				nic_offset += nic_delta_between_nics;
+			}
+			continue;
+		}
+
+		for (i = 0 ; i < QMAN_STREAMS ; i++) {
+			internal_q_index = GAUDI_QUEUE_ID_NIC_0_0 +
+						nic_id * QMAN_STREAMS + i;
+			q = &gaudi->internal_qmans[internal_q_index];
+			qman_base_addr = (u64) q->pq_dma_addr;
+			gaudi_init_nic_qman(hdev, nic_offset, (i & 0x3),
+						qman_base_addr, nic_id);
+		}
+
+		/* Enable the QMAN */
+		WREG32(mmNIC0_QM0_GLBL_CFG0 + nic_offset, NIC_QMAN_ENABLE);
+
+		nic_offset += nic_delta_between_qmans;
+		if (nic_id & 1) {
+			nic_offset -= (nic_delta_between_qmans * 2);
+			nic_offset += nic_delta_between_nics;
+		}
+
+		gaudi->hw_cap_initialized |= 1 << (HW_CAP_NIC_SHIFT + nic_id);
+	}
+}
+
 static void gaudi_disable_pci_dma_qmans(struct hl_device *hdev)
 {
 	struct gaudi_device *gaudi = hdev->asic_specific;
@@ -2384,6 +2535,30 @@ static void gaudi_disable_tpc_qmans(struct hl_device *hdev)
 	}
 }
 
+static void gaudi_disable_nic_qmans(struct hl_device *hdev)
+{
+	struct gaudi_device *gaudi = hdev->asic_specific;
+	u32 nic_mask, nic_offset = 0;
+	u32 nic_delta_between_qmans =
+			mmNIC0_QM1_GLBL_CFG0 - mmNIC0_QM0_GLBL_CFG0;
+	u32 nic_delta_between_nics =
+			mmNIC1_QM0_GLBL_CFG0 - mmNIC0_QM0_GLBL_CFG0;
+	int nic_id;
+
+	for (nic_id = 0 ; nic_id < NIC_NUMBER_OF_ENGINES ; nic_id++) {
+		nic_mask = 1 << (HW_CAP_NIC_SHIFT + nic_id);
+
+		if (gaudi->hw_cap_initialized & nic_mask)
+			WREG32(mmNIC0_QM0_GLBL_CFG0 + nic_offset, 0);
+
+		nic_offset += nic_delta_between_qmans;
+		if (nic_id & 1) {
+			nic_offset -= (nic_delta_between_qmans * 2);
+			nic_offset += nic_delta_between_nics;
+		}
+	}
+}
+
 static void gaudi_stop_pci_dma_qmans(struct hl_device *hdev)
 {
 	struct gaudi_device *gaudi = hdev->asic_specific;
@@ -2442,6 +2617,73 @@ static void gaudi_stop_tpc_qmans(struct hl_device *hdev)
 	WREG32(mmTPC7_QM_GLBL_CFG1, 0x1F << TPC0_QM_GLBL_CFG1_CP_STOP_SHIFT);
 }
 
+static void gaudi_stop_nic_qmans(struct hl_device *hdev)
+{
+	struct gaudi_device *gaudi = hdev->asic_specific;
+
+	/* Stop upper CPs of QMANs */
+
+	if (gaudi->hw_cap_initialized & HW_CAP_NIC0)
+		WREG32(mmNIC0_QM0_GLBL_CFG1,
+				NIC0_QM0_GLBL_CFG1_PQF_STOP_MASK |
+				NIC0_QM0_GLBL_CFG1_CQF_STOP_MASK |
+				NIC0_QM0_GLBL_CFG1_CP_STOP_MASK);
+
+	if (gaudi->hw_cap_initialized & HW_CAP_NIC1)
+		WREG32(mmNIC0_QM1_GLBL_CFG1,
+				NIC0_QM0_GLBL_CFG1_PQF_STOP_MASK |
+				NIC0_QM0_GLBL_CFG1_CQF_STOP_MASK |
+				NIC0_QM0_GLBL_CFG1_CP_STOP_MASK);
+
+	if (gaudi->hw_cap_initialized & HW_CAP_NIC2)
+		WREG32(mmNIC1_QM0_GLBL_CFG1,
+				NIC0_QM0_GLBL_CFG1_PQF_STOP_MASK |
+				NIC0_QM0_GLBL_CFG1_CQF_STOP_MASK |
+				NIC0_QM0_GLBL_CFG1_CP_STOP_MASK);
+
+	if (gaudi->hw_cap_initialized & HW_CAP_NIC3)
+		WREG32(mmNIC1_QM1_GLBL_CFG1,
+				NIC0_QM0_GLBL_CFG1_PQF_STOP_MASK |
+				NIC0_QM0_GLBL_CFG1_CQF_STOP_MASK |
+				NIC0_QM0_GLBL_CFG1_CP_STOP_MASK);
+
+	if (gaudi->hw_cap_initialized & HW_CAP_NIC4)
+		WREG32(mmNIC2_QM0_GLBL_CFG1,
+				NIC0_QM0_GLBL_CFG1_PQF_STOP_MASK |
+				NIC0_QM0_GLBL_CFG1_CQF_STOP_MASK |
+				NIC0_QM0_GLBL_CFG1_CP_STOP_MASK);
+
+	if (gaudi->hw_cap_initialized & HW_CAP_NIC5)
+		WREG32(mmNIC2_QM1_GLBL_CFG1,
+				NIC0_QM0_GLBL_CFG1_PQF_STOP_MASK |
+				NIC0_QM0_GLBL_CFG1_CQF_STOP_MASK |
+				NIC0_QM0_GLBL_CFG1_CP_STOP_MASK);
+
+	if (gaudi->hw_cap_initialized & HW_CAP_NIC6)
+		WREG32(mmNIC3_QM0_GLBL_CFG1,
+				NIC0_QM0_GLBL_CFG1_PQF_STOP_MASK |
+				NIC0_QM0_GLBL_CFG1_CQF_STOP_MASK |
+				NIC0_QM0_GLBL_CFG1_CP_STOP_MASK);
+
+	if (gaudi->hw_cap_initialized & HW_CAP_NIC7)
+		WREG32(mmNIC3_QM1_GLBL_CFG1,
+				NIC0_QM0_GLBL_CFG1_PQF_STOP_MASK |
+				NIC0_QM0_GLBL_CFG1_CQF_STOP_MASK |
+				NIC0_QM0_GLBL_CFG1_CP_STOP_MASK);
+
+	if (gaudi->hw_cap_initialized & HW_CAP_NIC8)
+		WREG32(mmNIC4_QM0_GLBL_CFG1,
+				NIC0_QM0_GLBL_CFG1_PQF_STOP_MASK |
+				NIC0_QM0_GLBL_CFG1_CQF_STOP_MASK |
+				NIC0_QM0_GLBL_CFG1_CP_STOP_MASK);
+
+	if (gaudi->hw_cap_initialized & HW_CAP_NIC9)
+		WREG32(mmNIC4_QM1_GLBL_CFG1,
+				NIC0_QM0_GLBL_CFG1_PQF_STOP_MASK |
+				NIC0_QM0_GLBL_CFG1_CQF_STOP_MASK |
+				NIC0_QM0_GLBL_CFG1_CP_STOP_MASK);
+}
+
 static void gaudi_pci_dma_stall(struct hl_device *hdev)
 {
 	struct gaudi_device *gaudi = hdev->asic_specific;
@@ -2631,6 +2873,7 @@ static void gaudi_halt_engines(struct hl_device *hdev, bool hard_reset)
 	else
 		wait_timeout_ms = GAUDI_RESET_WAIT_MSEC;
 
+	gaudi_stop_nic_qmans(hdev);
 
 	gaudi_stop_mme_qmans(hdev);
 	gaudi_stop_tpc_qmans(hdev);
@@ -2648,6 +2891,7 @@ static void gaudi_halt_engines(struct hl_device *hdev, bool hard_reset)
 
 	msleep(wait_timeout_ms);
 
+	gaudi_disable_nic_qmans(hdev);
 	gaudi_disable_mme_qmans(hdev);
 	gaudi_disable_tpc_qmans(hdev);
 	gaudi_disable_hbm_dma_qmans(hdev);
@@ -2963,11 +3207,13 @@ static int gaudi_hw_init(struct hl_device *hdev)
 
 	gaudi_init_tpc_qmans(hdev);
 
+	gaudi_init_nic_qmans(hdev);
+
 	hdev->asic_funcs->set_clock_gating(hdev);
 
 	gaudi_enable_timestamp(hdev);
 
-	/* MSI must be enabled before CPU queues are initialized */
+	/* MSI must be enabled before CPU queues and NIC are initialized */
 	rc = gaudi_enable_msi(hdev);
 	if (rc)
 		goto disable_queues;
@@ -3066,7 +3312,7 @@ static void gaudi_hw_fini(struct hl_device *hdev, bool hard_reset)
 					HW_CAP_HBM | HW_CAP_PCI_DMA |
 					HW_CAP_MME | HW_CAP_TPC_MASK |
 					HW_CAP_HBM_DMA | HW_CAP_PLL |
-					HW_CAP_MMU |
+					HW_CAP_NIC_MASK | HW_CAP_MMU |
 					HW_CAP_SRAM_SCRAMBLER |
 					HW_CAP_HBM_SCRAMBLER |
 					HW_CAP_CLK_GATE);
@@ -3336,6 +3582,166 @@ static void gaudi_ring_doorbell(struct hl_device *hdev, u32 hw_queue_id, u32 pi)
 		db_reg_offset = mmTPC7_QM_PQ_PI_3;
 		break;
 
+	case GAUDI_QUEUE_ID_NIC_0_0:
+		db_reg_offset = mmNIC0_QM0_PQ_PI_0;
+		break;
+
+	case GAUDI_QUEUE_ID_NIC_0_1:
+		db_reg_offset = mmNIC0_QM0_PQ_PI_1;
+		break;
+
+	case GAUDI_QUEUE_ID_NIC_0_2:
+		db_reg_offset = mmNIC0_QM0_PQ_PI_2;
+		break;
+
+	case GAUDI_QUEUE_ID_NIC_0_3:
+		db_reg_offset = mmNIC0_QM0_PQ_PI_3;
+		break;
+
+	case GAUDI_QUEUE_ID_NIC_1_0:
+		db_reg_offset = mmNIC0_QM1_PQ_PI_0;
+		break;
+
+	case GAUDI_QUEUE_ID_NIC_1_1:
+		db_reg_offset = mmNIC0_QM1_PQ_PI_1;
+		break;
+
+	case GAUDI_QUEUE_ID_NIC_1_2:
+		db_reg_offset = mmNIC0_QM1_PQ_PI_2;
+		break;
+
+	case GAUDI_QUEUE_ID_NIC_1_3:
+		db_reg_offset = mmNIC0_QM1_PQ_PI_3;
+		break;
+
+	case GAUDI_QUEUE_ID_NIC_2_0:
+		db_reg_offset = mmNIC1_QM0_PQ_PI_0;
+		break;
+
+	case GAUDI_QUEUE_ID_NIC_2_1:
+		db_reg_offset = mmNIC1_QM0_PQ_PI_1;
+		break;
+
+	case GAUDI_QUEUE_ID_NIC_2_2:
+		db_reg_offset = mmNIC1_QM0_PQ_PI_2;
+		break;
+
+	case GAUDI_QUEUE_ID_NIC_2_3:
+		db_reg_offset = mmNIC1_QM0_PQ_PI_3;
+		break;
+
+	case GAUDI_QUEUE_ID_NIC_3_0:
+		db_reg_offset = mmNIC1_QM1_PQ_PI_0;
+		break;
+
+	case GAUDI_QUEUE_ID_NIC_3_1:
+		db_reg_offset = mmNIC1_QM1_PQ_PI_1;
+		break;
+
+	case GAUDI_QUEUE_ID_NIC_3_2:
+		db_reg_offset = mmNIC1_QM1_PQ_PI_2;
+		break;
+
+	case GAUDI_QUEUE_ID_NIC_3_3:
+		db_reg_offset = mmNIC1_QM1_PQ_PI_3;
+		break;
+
+	case GAUDI_QUEUE_ID_NIC_4_0:
+		db_reg_offset = mmNIC2_QM0_PQ_PI_0;
+		break;
+
+	case GAUDI_QUEUE_ID_NIC_4_1:
+		db_reg_offset = mmNIC2_QM0_PQ_PI_1;
+		break;
+
+	case GAUDI_QUEUE_ID_NIC_4_2:
+		db_reg_offset = mmNIC2_QM0_PQ_PI_2;
+		break;
+
+	case GAUDI_QUEUE_ID_NIC_4_3:
+		db_reg_offset = mmNIC2_QM0_PQ_PI_3;
+		break;
+
+	case GAUDI_QUEUE_ID_NIC_5_0:
+		db_reg_offset = mmNIC2_QM1_PQ_PI_0;
+		break;
+
+	case GAUDI_QUEUE_ID_NIC_5_1:
+		db_reg_offset = mmNIC2_QM1_PQ_PI_1;
+		break;
+
+	case GAUDI_QUEUE_ID_NIC_5_2:
+		db_reg_offset = mmNIC2_QM1_PQ_PI_2;
+		break;
+
+	case GAUDI_QUEUE_ID_NIC_5_3:
+		db_reg_offset = mmNIC2_QM1_PQ_PI_3;
+		break;
+
+	case GAUDI_QUEUE_ID_NIC_6_0:
+		db_reg_offset = mmNIC3_QM0_PQ_PI_0;
+		break;
+
+	case GAUDI_QUEUE_ID_NIC_6_1:
+		db_reg_offset = mmNIC3_QM0_PQ_PI_1;
+		break;
+
+	case GAUDI_QUEUE_ID_NIC_6_2:
+		db_reg_offset = mmNIC3_QM0_PQ_PI_2;
+		break;
+
+	case GAUDI_QUEUE_ID_NIC_6_3:
+		db_reg_offset = mmNIC3_QM0_PQ_PI_3;
+		break;
+
+	case GAUDI_QUEUE_ID_NIC_7_0:
+		db_reg_offset = mmNIC3_QM1_PQ_PI_0;
+		break;
+
+	case GAUDI_QUEUE_ID_NIC_7_1:
+		db_reg_offset = mmNIC3_QM1_PQ_PI_1;
+		break;
+
+	case GAUDI_QUEUE_ID_NIC_7_2:
+		db_reg_offset = mmNIC3_QM1_PQ_PI_2;
+		break;
+
+	case GAUDI_QUEUE_ID_NIC_7_3:
+		db_reg_offset = mmNIC3_QM1_PQ_PI_3;
+		break;
+
+	case GAUDI_QUEUE_ID_NIC_8_0:
+		db_reg_offset = mmNIC4_QM0_PQ_PI_0;
+		break;
+
+	case GAUDI_QUEUE_ID_NIC_8_1:
+		db_reg_offset = mmNIC4_QM0_PQ_PI_1;
+		break;
+
+	case GAUDI_QUEUE_ID_NIC_8_2:
+		db_reg_offset = mmNIC4_QM0_PQ_PI_2;
+		break;
+
+	case GAUDI_QUEUE_ID_NIC_8_3:
+		db_reg_offset = mmNIC4_QM0_PQ_PI_3;
+		break;
+
+	case GAUDI_QUEUE_ID_NIC_9_0:
+		db_reg_offset = mmNIC4_QM1_PQ_PI_0;
+		break;
+
+	case GAUDI_QUEUE_ID_NIC_9_1:
+		db_reg_offset = mmNIC4_QM1_PQ_PI_1;
+		break;
+
+	case GAUDI_QUEUE_ID_NIC_9_2:
+		db_reg_offset = mmNIC4_QM1_PQ_PI_2;
+		break;
+
+	case GAUDI_QUEUE_ID_NIC_9_3:
+		db_reg_offset = mmNIC4_QM1_PQ_PI_3;
+		break;
+
 	default:
 		invalid_queue = true;
 	}
@@ -4230,6 +4636,17 @@ static int gaudi_parse_cb_no_ext_queue(struct hl_device *hdev,
 					struct hl_cs_parser *parser)
 {
 	struct asic_fixed_properties *asic_prop = &hdev->asic_prop;
+	struct gaudi_device *gaudi = hdev->asic_specific;
+	u32 nic_mask_q_id = 1 << (HW_CAP_NIC_SHIFT +
+		((parser->hw_queue_id - GAUDI_QUEUE_ID_NIC_0_0) >> 2));
+
+	if ((parser->hw_queue_id >= GAUDI_QUEUE_ID_NIC_0_0) &&
+			(parser->hw_queue_id <= GAUDI_QUEUE_ID_NIC_9_3) &&
+			(!(gaudi->hw_cap_initialized & nic_mask_q_id))) {
+		dev_err(hdev->dev, "h/w queue %d is disabled\n",
+				parser->hw_queue_id);
+		return -EINVAL;
+	}
 
 	/* For internal queue jobs just check if CB address is valid */
 	if (hl_mem_area_inside_range((u64) (uintptr_t) parser->user_cb,
@@ -4463,6 +4880,12 @@ static void gaudi_restore_qm_registers(struct hl_device *hdev)
 		qman_offset = i * TPC_QMAN_OFFSET;
 		WREG32(mmTPC0_QM_ARB_CFG_0 + qman_offset, 0);
 	}
+
+	for (i = 0 ; i < NIC_NUMBER_OF_ENGINES ; i++) {
+		qman_offset = (i >> 1) * NIC_MACRO_QMAN_OFFSET +
+				(i & 0x1) * NIC_ENGINE_QMAN_OFFSET;
+		WREG32(mmNIC0_QM0_ARB_CFG_0 + qman_offset, 0);
+	}
 }
 
 static void gaudi_restore_user_registers(struct hl_device *hdev)
@@ -4897,6 +5320,136 @@ static void gaudi_mmu_prepare(struct hl_device *hdev, u32 asid)
 	gaudi_mmu_prepare_reg(hdev, mmMME2_ACC_WBC, asid);
 	gaudi_mmu_prepare_reg(hdev, mmMME3_ACC_WBC, asid);
 
+	if (hdev->nic_ports_mask & GAUDI_NIC_MASK_NIC0) {
+		gaudi_mmu_prepare_reg(hdev, mmNIC0_QM0_GLBL_NON_SECURE_PROPS_0,
+				asid);
+		gaudi_mmu_prepare_reg(hdev, mmNIC0_QM0_GLBL_NON_SECURE_PROPS_1,
+				asid);
+		gaudi_mmu_prepare_reg(hdev, mmNIC0_QM0_GLBL_NON_SECURE_PROPS_2,
+				asid);
+		gaudi_mmu_prepare_reg(hdev, mmNIC0_QM0_GLBL_NON_SECURE_PROPS_3,
+				asid);
+		gaudi_mmu_prepare_reg(hdev, mmNIC0_QM0_GLBL_NON_SECURE_PROPS_4,
+				asid);
+	}
+
+	if (hdev->nic_ports_mask & GAUDI_NIC_MASK_NIC1) {
+		gaudi_mmu_prepare_reg(hdev, mmNIC0_QM1_GLBL_NON_SECURE_PROPS_0,
+				asid);
+		gaudi_mmu_prepare_reg(hdev, mmNIC0_QM1_GLBL_NON_SECURE_PROPS_1,
+				asid);
+		gaudi_mmu_prepare_reg(hdev, mmNIC0_QM1_GLBL_NON_SECURE_PROPS_2,
+				asid);
+		gaudi_mmu_prepare_reg(hdev, mmNIC0_QM1_GLBL_NON_SECURE_PROPS_3,
+				asid);
+		gaudi_mmu_prepare_reg(hdev, mmNIC0_QM1_GLBL_NON_SECURE_PROPS_4,
+				asid);
+	}
+
+	if (hdev->nic_ports_mask & GAUDI_NIC_MASK_NIC2) {
+		gaudi_mmu_prepare_reg(hdev, mmNIC1_QM0_GLBL_NON_SECURE_PROPS_0,
+				asid);
+		gaudi_mmu_prepare_reg(hdev, mmNIC1_QM0_GLBL_NON_SECURE_PROPS_1,
+				asid);
+		gaudi_mmu_prepare_reg(hdev, mmNIC1_QM0_GLBL_NON_SECURE_PROPS_2,
+				asid);
+		gaudi_mmu_prepare_reg(hdev, mmNIC1_QM0_GLBL_NON_SECURE_PROPS_3,
+				asid);
+		gaudi_mmu_prepare_reg(hdev, mmNIC1_QM0_GLBL_NON_SECURE_PROPS_4,
+				asid);
+	}
+
+	if (hdev->nic_ports_mask & GAUDI_NIC_MASK_NIC3) {
+		gaudi_mmu_prepare_reg(hdev, mmNIC1_QM1_GLBL_NON_SECURE_PROPS_0,
+				asid);
+		gaudi_mmu_prepare_reg(hdev, mmNIC1_QM1_GLBL_NON_SECURE_PROPS_1,
+				asid);
+		gaudi_mmu_prepare_reg(hdev, mmNIC1_QM1_GLBL_NON_SECURE_PROPS_2,
+				asid);
+		gaudi_mmu_prepare_reg(hdev, mmNIC1_QM1_GLBL_NON_SECURE_PROPS_3,
+				asid);
+		gaudi_mmu_prepare_reg(hdev, mmNIC1_QM1_GLBL_NON_SECURE_PROPS_4,
+				asid);
+	}
+
+	if (hdev->nic_ports_mask & GAUDI_NIC_MASK_NIC4) {
+		gaudi_mmu_prepare_reg(hdev, mmNIC2_QM0_GLBL_NON_SECURE_PROPS_0,
+				asid);
+		gaudi_mmu_prepare_reg(hdev, mmNIC2_QM0_GLBL_NON_SECURE_PROPS_1,
+				asid);
+		gaudi_mmu_prepare_reg(hdev, mmNIC2_QM0_GLBL_NON_SECURE_PROPS_2,
+				asid);
+		gaudi_mmu_prepare_reg(hdev, mmNIC2_QM0_GLBL_NON_SECURE_PROPS_3,
+				asid);
+		gaudi_mmu_prepare_reg(hdev, mmNIC2_QM0_GLBL_NON_SECURE_PROPS_4,
+				asid);
+	}
+
+	if (hdev->nic_ports_mask & GAUDI_NIC_MASK_NIC5) {
+		gaudi_mmu_prepare_reg(hdev, mmNIC2_QM1_GLBL_NON_SECURE_PROPS_0,
+				asid);
+		gaudi_mmu_prepare_reg(hdev, mmNIC2_QM1_GLBL_NON_SECURE_PROPS_1,
+				asid);
+		gaudi_mmu_prepare_reg(hdev, mmNIC2_QM1_GLBL_NON_SECURE_PROPS_2,
+				asid);
+		gaudi_mmu_prepare_reg(hdev, mmNIC2_QM1_GLBL_NON_SECURE_PROPS_3,
+				asid);
+		gaudi_mmu_prepare_reg(hdev, mmNIC2_QM1_GLBL_NON_SECURE_PROPS_4,
+				asid);
+	}
+
+	if (hdev->nic_ports_mask & GAUDI_NIC_MASK_NIC6) {
+		gaudi_mmu_prepare_reg(hdev, mmNIC3_QM0_GLBL_NON_SECURE_PROPS_0,
+				asid);
+		gaudi_mmu_prepare_reg(hdev, mmNIC3_QM0_GLBL_NON_SECURE_PROPS_1,
+				asid);
+		gaudi_mmu_prepare_reg(hdev, mmNIC3_QM0_GLBL_NON_SECURE_PROPS_2,
+				asid);
+		gaudi_mmu_prepare_reg(hdev, mmNIC3_QM0_GLBL_NON_SECURE_PROPS_3,
+				asid);
+		gaudi_mmu_prepare_reg(hdev, mmNIC3_QM0_GLBL_NON_SECURE_PROPS_4,
+				asid);
+	}
+
+	if (hdev->nic_ports_mask & GAUDI_NIC_MASK_NIC7) {
+		gaudi_mmu_prepare_reg(hdev, mmNIC3_QM1_GLBL_NON_SECURE_PROPS_0,
+				asid);
+		gaudi_mmu_prepare_reg(hdev, mmNIC3_QM1_GLBL_NON_SECURE_PROPS_1,
+				asid);
+		gaudi_mmu_prepare_reg(hdev, mmNIC3_QM1_GLBL_NON_SECURE_PROPS_2,
+				asid);
+		gaudi_mmu_prepare_reg(hdev, mmNIC3_QM1_GLBL_NON_SECURE_PROPS_3,
+				asid);
+		gaudi_mmu_prepare_reg(hdev, mmNIC3_QM1_GLBL_NON_SECURE_PROPS_4,
+				asid);
+	}
+
+	if (hdev->nic_ports_mask & GAUDI_NIC_MASK_NIC8) {
+		gaudi_mmu_prepare_reg(hdev, mmNIC4_QM0_GLBL_NON_SECURE_PROPS_0,
+				asid);
+		gaudi_mmu_prepare_reg(hdev, mmNIC4_QM0_GLBL_NON_SECURE_PROPS_1,
+				asid);
+		gaudi_mmu_prepare_reg(hdev, mmNIC4_QM0_GLBL_NON_SECURE_PROPS_2,
+				asid);
+		gaudi_mmu_prepare_reg(hdev, mmNIC4_QM0_GLBL_NON_SECURE_PROPS_3,
+				asid);
+		gaudi_mmu_prepare_reg(hdev, mmNIC4_QM0_GLBL_NON_SECURE_PROPS_4,
+				asid);
+	}
+
+	if (hdev->nic_ports_mask & GAUDI_NIC_MASK_NIC9) {
+		gaudi_mmu_prepare_reg(hdev, mmNIC4_QM1_GLBL_NON_SECURE_PROPS_0,
+				asid);
+		gaudi_mmu_prepare_reg(hdev, mmNIC4_QM1_GLBL_NON_SECURE_PROPS_1,
+				asid);
+		gaudi_mmu_prepare_reg(hdev, mmNIC4_QM1_GLBL_NON_SECURE_PROPS_2,
+				asid);
+		gaudi_mmu_prepare_reg(hdev, mmNIC4_QM1_GLBL_NON_SECURE_PROPS_3,
+				asid);
+		gaudi_mmu_prepare_reg(hdev, mmNIC4_QM1_GLBL_NON_SECURE_PROPS_4,
+				asid);
+	}
+
 	gaudi_mmu_prepare_reg(hdev, mmPSOC_GLOBAL_CONF_TRACE_ARUSER, asid);
 	gaudi_mmu_prepare_reg(hdev, mmPSOC_GLOBAL_CONF_TRACE_AWUSER, asid);
 
@@ -5426,6 +5979,8 @@ static void gaudi_handle_ecc_event(struct hl_device *hdev, u16 event_type,
 		params.num_memories = 33;
 		params.derr = true;
 		params.disable_clock_gating = true;
+		extract_info_from_fw = false;
+		break;
 	default:
 		return;
 	}
@@ -5477,6 +6032,56 @@ static void gaudi_handle_qman_err(struct hl_device *hdev, u16 event_type)
 			mmDMA0_QM_ARB_ERR_CAUSE + index * DMA_QMAN_OFFSET;
 		snprintf(desc, ARRAY_SIZE(desc), "%s%d", "DMA_QM", index);
 		break;
+	case GAUDI_EVENT_NIC0_QM0:
+		glbl_sts_addr = mmNIC0_QM0_GLBL_STS1_0;
+		arb_err_addr = mmNIC0_QM0_ARB_ERR_CAUSE;
+		snprintf(desc, ARRAY_SIZE(desc), "NIC0_QM0");
+		break;
+	case GAUDI_EVENT_NIC0_QM1:
+		glbl_sts_addr = mmNIC0_QM1_GLBL_STS1_0;
+		arb_err_addr = mmNIC0_QM1_ARB_ERR_CAUSE;
+		snprintf(desc, ARRAY_SIZE(desc), "NIC0_QM1");
+		break;
+	case GAUDI_EVENT_NIC1_QM0:
+		glbl_sts_addr = mmNIC1_QM0_GLBL_STS1_0;
+		arb_err_addr = mmNIC1_QM0_ARB_ERR_CAUSE;
+		snprintf(desc, ARRAY_SIZE(desc), "NIC1_QM0");
+		break;
+	case GAUDI_EVENT_NIC1_QM1:
+		glbl_sts_addr = mmNIC1_QM1_GLBL_STS1_0;
+		arb_err_addr = mmNIC1_QM1_ARB_ERR_CAUSE;
+		snprintf(desc, ARRAY_SIZE(desc), "NIC1_QM1");
+		break;
+	case GAUDI_EVENT_NIC2_QM0:
+		glbl_sts_addr = mmNIC2_QM0_GLBL_STS1_0;
+		arb_err_addr = mmNIC2_QM0_ARB_ERR_CAUSE;
+		snprintf(desc, ARRAY_SIZE(desc), "NIC2_QM0");
+		break;
+	case GAUDI_EVENT_NIC2_QM1:
+		glbl_sts_addr = mmNIC2_QM1_GLBL_STS1_0;
+		arb_err_addr = mmNIC2_QM1_ARB_ERR_CAUSE;
+		snprintf(desc, ARRAY_SIZE(desc), "NIC2_QM1");
+		break;
+	case GAUDI_EVENT_NIC3_QM0:
+		glbl_sts_addr = mmNIC3_QM0_GLBL_STS1_0;
+		arb_err_addr = mmNIC3_QM0_ARB_ERR_CAUSE;
+		snprintf(desc, ARRAY_SIZE(desc), "NIC3_QM0");
+		break;
+	case GAUDI_EVENT_NIC3_QM1:
+		glbl_sts_addr = mmNIC3_QM1_GLBL_STS1_0;
+		arb_err_addr = mmNIC3_QM1_ARB_ERR_CAUSE;
+		snprintf(desc, ARRAY_SIZE(desc), "NIC3_QM1");
+		break;
+	case GAUDI_EVENT_NIC4_QM0:
+		glbl_sts_addr = mmNIC4_QM0_GLBL_STS1_0;
+		arb_err_addr = mmNIC4_QM0_ARB_ERR_CAUSE;
+		snprintf(desc, ARRAY_SIZE(desc), "NIC4_QM0");
+		break;
+	case GAUDI_EVENT_NIC4_QM1:
+		glbl_sts_addr = mmNIC4_QM1_GLBL_STS1_0;
+		arb_err_addr = mmNIC4_QM1_ARB_ERR_CAUSE;
+		snprintf(desc, ARRAY_SIZE(desc), "NIC4_QM1");
+		break;
 	default:
 		return;
 	}
@@ -5854,6 +6459,16 @@ static void gaudi_handle_eqe(struct hl_device *hdev,
 	case GAUDI_EVENT_MME0_QM ... GAUDI_EVENT_MME2_QM:
 	case GAUDI_EVENT_DMA0_QM ... GAUDI_EVENT_DMA7_QM:
 		fallthrough;
+	case GAUDI_EVENT_NIC0_QM0:
+	case GAUDI_EVENT_NIC0_QM1:
+	case GAUDI_EVENT_NIC1_QM0:
+	case GAUDI_EVENT_NIC1_QM1:
+	case GAUDI_EVENT_NIC2_QM0:
+	case GAUDI_EVENT_NIC2_QM1:
+	case GAUDI_EVENT_NIC3_QM0:
+	case GAUDI_EVENT_NIC3_QM1:
+	case GAUDI_EVENT_NIC4_QM0:
+	case GAUDI_EVENT_NIC4_QM1:
 	case GAUDI_EVENT_DMA0_CORE ... GAUDI_EVENT_DMA7_CORE:
 		gaudi_print_irq_info(hdev, event_type, true);
 		gaudi_handle_qman_err(hdev, event_type);
@@ -6087,10 +6702,11 @@ static bool gaudi_is_device_idle(struct hl_device *hdev, u64 *mask,
 	struct gaudi_device *gaudi = hdev->asic_specific;
 	const char *fmt = "%-5d%-9s%#-14x%#-12x%#x\n";
 	const char *mme_slave_fmt = "%-5d%-9s%-14s%-12s%#x\n";
+	const char *nic_fmt = "%-5d%-9s%#-14x%#x\n";
 	u32 qm_glbl_sts0, qm_cgm_sts, dma_core_sts0, tpc_cfg_sts, mme_arch_sts;
 	bool is_idle = true, is_eng_idle, is_slave;
 	u64 offset;
-	int i, dma_id;
+	int i, dma_id, port;
 
 	mutex_lock(&gaudi->clk_gate_mutex);
 
@@ -6179,6 +6795,45 @@ static bool gaudi_is_device_idle(struct hl_device *hdev, u64 *mask,
 		}
 	}
 
+	if (s)
+		seq_puts(s, "\nNIC  is_idle  QM_GLBL_STS0  QM_CGM_STS\n"
+				"---  -------  ------------  ----------\n");
+
+	for (i = 0 ; i < (NIC_NUMBER_OF_ENGINES / 2) ; i++) {
+		offset = i * NIC_MACRO_QMAN_OFFSET;
+		port = 2 * i;
+		if (hdev->nic_ports_mask & BIT(port)) {
+			qm_glbl_sts0 = RREG32(mmNIC0_QM0_GLBL_STS0 + offset);
+			qm_cgm_sts = RREG32(mmNIC0_QM0_CGM_STS + offset);
+			is_eng_idle = IS_QM_IDLE(qm_glbl_sts0, qm_cgm_sts);
+			is_idle &= is_eng_idle;
+
+			if (mask)
+				*mask |= ((u64) !is_eng_idle) <<
+						(GAUDI_ENGINE_ID_NIC_0 + port);
+			if (s)
+				seq_printf(s, nic_fmt, port,
+						is_eng_idle ? "Y" : "N",
+						qm_glbl_sts0, qm_cgm_sts);
+		}
+
+		port = 2 * i + 1;
+		if (hdev->nic_ports_mask & BIT(port)) {
+			qm_glbl_sts0 = RREG32(mmNIC0_QM1_GLBL_STS0 + offset);
+			qm_cgm_sts = RREG32(mmNIC0_QM1_CGM_STS + offset);
+			is_eng_idle = IS_QM_IDLE(qm_glbl_sts0, qm_cgm_sts);
+			is_idle &= is_eng_idle;
+
+			if (mask)
+				*mask |= ((u64) !is_eng_idle) <<
+						(GAUDI_ENGINE_ID_NIC_0 + port);
+			if (s)
+				seq_printf(s, nic_fmt, port,
+						is_eng_idle ? "Y" : "N",
+						qm_glbl_sts0, qm_cgm_sts);
+		}
+	}
+
 	if (s)
 		seq_puts(s, "\n");
 
diff --git a/drivers/misc/habanalabs/gaudi/gaudiP.h b/drivers/misc/habanalabs/gaudi/gaudiP.h
index dd222bc128f9..2ccf7e5a97c7 100644
--- a/drivers/misc/habanalabs/gaudi/gaudiP.h
+++ b/drivers/misc/habanalabs/gaudi/gaudiP.h
@@ -79,6 +79,7 @@
 #define TPC_QMAN_OFFSET		(mmTPC1_QM_BASE - mmTPC0_QM_BASE)
 #define MME_QMAN_OFFSET		(mmMME1_QM_BASE - mmMME0_QM_BASE)
 #define NIC_MACRO_QMAN_OFFSET	(mmNIC1_QM0_BASE - mmNIC0_QM0_BASE)
+#define NIC_ENGINE_QMAN_OFFSET	(mmNIC0_QM1_BASE - mmNIC0_QM0_BASE)
 
 #define TPC_CFG_OFFSET		(mmTPC1_CFG_BASE - mmTPC0_CFG_BASE)
 
@@ -132,6 +133,10 @@
 #define TPC_QMAN_LENGTH			1024
 #define TPC_QMAN_SIZE_IN_BYTES		(TPC_QMAN_LENGTH * QMAN_PQ_ENTRY_SIZE)
 
+#define NIC_QMAN_LENGTH			1024
+#define NIC_QMAN_SIZE_IN_BYTES		(NIC_QMAN_LENGTH * QMAN_PQ_ENTRY_SIZE)
+
+
 #define SRAM_USER_BASE_OFFSET  GAUDI_DRIVER_SRAM_RESERVED_SIZE_FROM_START
 
 /* Virtual address space */
@@ -153,6 +158,19 @@
 #define HW_CAP_SRAM_SCRAMBLER	BIT(10)
 #define HW_CAP_HBM_SCRAMBLER	BIT(11)
 
+#define HW_CAP_NIC0		BIT(14)
+#define HW_CAP_NIC1		BIT(15)
+#define HW_CAP_NIC2		BIT(16)
+#define HW_CAP_NIC3		BIT(17)
+#define HW_CAP_NIC4		BIT(18)
+#define HW_CAP_NIC5		BIT(19)
+#define HW_CAP_NIC6		BIT(20)
+#define HW_CAP_NIC7		BIT(21)
+#define HW_CAP_NIC8		BIT(22)
+#define HW_CAP_NIC9		BIT(23)
+#define HW_CAP_NIC_MASK		GENMASK(23, 14)
+#define HW_CAP_NIC_SHIFT	14
+
 #define HW_CAP_TPC0		BIT(24)
 #define HW_CAP_TPC1		BIT(25)
 #define HW_CAP_TPC2		BIT(26)
@@ -200,6 +218,20 @@ enum gaudi_tpc_mask {
 	GAUDI_TPC_MASK_ALL = 0xFF
 };
 
+enum gaudi_nic_mask {
+	GAUDI_NIC_MASK_NIC0 = 0x01,
+	GAUDI_NIC_MASK_NIC1 = 0x02,
+	GAUDI_NIC_MASK_NIC2 = 0x04,
+	GAUDI_NIC_MASK_NIC3 = 0x08,
+	GAUDI_NIC_MASK_NIC4 = 0x10,
+	GAUDI_NIC_MASK_NIC5 = 0x20,
+	GAUDI_NIC_MASK_NIC6 = 0x40,
+	GAUDI_NIC_MASK_NIC7 = 0x80,
+	GAUDI_NIC_MASK_NIC8 = 0x100,
+	GAUDI_NIC_MASK_NIC9 = 0x200,
+	GAUDI_NIC_MASK_ALL = 0x3FF
+};
+
 /**
  * struct gaudi_internal_qman_info - Internal QMAN information.
  * @pq_kernel_addr: Kernel address of the PQ memory area in the host.
-- 
2.17.1



* [PATCH 05/15] habanalabs/gaudi: add NIC Ethernet support
  2020-09-10 16:11 [PATCH 00/15] Adding GAUDI NIC code to habanalabs driver Oded Gabbay
                   ` (2 preceding siblings ...)
  2020-09-10 16:11 ` [PATCH 04/15] habanalabs/gaudi: add support for NIC QMANs Oded Gabbay
@ 2020-09-10 16:11 ` Oded Gabbay
  2020-09-10 20:03   ` Jakub Kicinski
  2020-09-10 16:11 ` [PATCH 06/15] habanalabs/gaudi: add NIC PHY code Oded Gabbay
                   ` (10 subsequent siblings)
  14 siblings, 1 reply; 44+ messages in thread
From: Oded Gabbay @ 2020-09-10 16:11 UTC (permalink / raw)
  To: linux-kernel, netdev; +Cc: SW_Drivers, gregkh, davem, kuba, Omer Shpigelman

From: Omer Shpigelman <oshpigelman@habana.ai>

Add a basic NIC driver that handles several types of Ethernet packets,
such as IPv4, IPv6, LLDP, VLAN and ARP.

The NIC H/W is composed of 5 NIC macros, each containing 2 NIC engines.
Each engine exposes a single 100GbE port, so in total we have 10 ports
per GAUDI device.
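
The macro/engine layout above implies a simple mapping from a port index
to its macro and engine. The QMAN code in this series uses the same
arithmetic when it computes register offsets:
(i >> 1) * NIC_MACRO_QMAN_OFFSET + (i & 0x1) * NIC_ENGINE_QMAN_OFFSET.
The following is a minimal user-space sketch of that arithmetic only. The
two offset values are placeholders chosen for illustration, not the real
register distances, and the example_* helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Layout stated in the commit message: 5 macros, 2 engines per macro. */
#define EXAMPLE_NIC_MACROS		5
#define EXAMPLE_NIC_ENGINES_PER_MACRO	2
#define EXAMPLE_NIC_PORTS \
	(EXAMPLE_NIC_MACROS * EXAMPLE_NIC_ENGINES_PER_MACRO)

/* Placeholder values for illustration only, NOT the real H/W offsets. */
#define EXAMPLE_MACRO_QMAN_OFFSET	0x40000u
#define EXAMPLE_ENGINE_QMAN_OFFSET	0x2000u

/* Hypothetical helper: index of the macro hosting this port (0..4). */
static inline uint32_t example_port_to_macro(uint32_t port)
{
	assert(port < EXAMPLE_NIC_PORTS);
	return port >> 1;
}

/* Hypothetical helper: index of the engine inside its macro (0 or 1). */
static inline uint32_t example_port_to_engine(uint32_t port)
{
	assert(port < EXAMPLE_NIC_PORTS);
	return port & 0x1;
}

/* Offset of this port's QMAN block relative to port 0's QMAN block. */
static inline uint32_t example_port_qman_offset(uint32_t port)
{
	return example_port_to_macro(port) * EXAMPLE_MACRO_QMAN_OFFSET +
		example_port_to_engine(port) * EXAMPLE_ENGINE_QMAN_OFFSET;
}
```

With this mapping, port 7 lands on macro 3, engine 1, which agrees with
the NIC3_QM1 register names used for GAUDI_QUEUE_ID_NIC_7_* in the diff.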

The driver gets the information needed for initialization from the
firmware, such as card type, available ports, auto-negotiation support,
polarity and Tx taps configuration.

Two card types are supported: standalone PCI and PCI Mezzanine Card (PMC),
which is part of a server called HLS-1. Each type has its own
configuration.

We define two types of port connectivity: internal and external. An
internal port is connected to a port on another GAUDI card, and an
external port is connected to a switch.
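
The diff below adds per-device bitmasks (nic_ports_mask and
nic_ports_ext_mask) with one bit per port. A minimal sketch of how a port
could be classified from such masks is shown here. It assumes that a set
bit in the external mask marks an external port, which is inferred from
the field name rather than stated in this patch. The enum and function
names are hypothetical.

```c
#include <stdint.h>

#define EXAMPLE_BIT(n)	(1ull << (n))

enum example_port_kind {
	EXAMPLE_PORT_DISABLED,	/* port not present in the enabled mask */
	EXAMPLE_PORT_INTERNAL,	/* connected to another GAUDI card */
	EXAMPLE_PORT_EXTERNAL,	/* connected to a switch */
};

/*
 * Classify one port from the two masks.
 * Assumption: bit N set in ports_ext_mask means port N is external.
 */
static enum example_port_kind
example_classify_port(uint64_t ports_mask, uint64_t ports_ext_mask,
		      unsigned int port)
{
	if (port >= 64 || !(ports_mask & EXAMPLE_BIT(port)))
		return EXAMPLE_PORT_DISABLED;

	if (ports_ext_mask & EXAMPLE_BIT(port))
		return EXAMPLE_PORT_EXTERNAL;

	return EXAMPLE_PORT_INTERNAL;
}
```

Under that assumption, the default of 0x3FF set for nic_ports_ext_mask in
set_driver_behavior_per_device() would treat all 10 ports as external.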

The Ethernet support is needed only for control flows, e.g. obtaining an
IP address. Hence it is implemented in a very simple way: the packets are
copied rather than passed by descriptors.

The Rx flow uses NAPI by default, and polling mode can be enabled with
the nic_rx_poll kernel module parameter.
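
For readers unfamiliar with NAPI, the general contract is that the
interrupt path only schedules a poll routine, and the poll routine
processes at most a fixed budget of packets, re-arming the interrupt only
when it drains the ring before the budget is used up. The following is a
user-space simulation of that contract, not code from this driver. The
struct and function names are hypothetical, and the kernel calls
mentioned in the comments (napi_schedule(), napi_complete_done()) are
named only to show where they would sit.

```c
#include <stdbool.h>

/* Hypothetical stand-in for one port's Rx state. */
struct example_rx_port {
	int pending;		/* packets waiting in the Rx ring */
	bool irq_enabled;	/* whether the Rx interrupt is unmasked */
};

/*
 * Interrupt path: no packet processing here. Further Rx interrupts are
 * masked and the work is deferred to the poll routine. In the kernel
 * this is where napi_schedule() would be called.
 */
static void example_rx_irq(struct example_rx_port *port)
{
	port->irq_enabled = false;
}

/*
 * Poll path: consume at most 'budget' packets and return the number
 * handled. Interrupts are re-enabled only if the ring was drained
 * before the budget was exhausted, which is where the kernel would
 * call napi_complete_done().
 */
static int example_rx_poll(struct example_rx_port *port, int budget)
{
	int done = 0;

	while (done < budget && port->pending > 0) {
		port->pending--;	/* "receive" one packet */
		done++;
	}

	if (done < budget)
		port->irq_enabled = true;

	return done;
}
```

With 100 pending packets and a budget of 64, the first poll handles 64
packets and leaves the interrupt masked; the second handles the remaining
36 and re-enables it.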

Because we must not access the H/W during a hard reset of the device, a
new stage that stops all NIC activity is added at the beginning of the
reset flow.

Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
---
 drivers/misc/habanalabs/common/context.c      |    1 +
 drivers/misc/habanalabs/common/firmware_if.c  |   44 +
 drivers/misc/habanalabs/common/habanalabs.h   |   13 +-
 .../misc/habanalabs/common/habanalabs_drv.c   |   10 +
 drivers/misc/habanalabs/gaudi/Makefile        |    2 +
 drivers/misc/habanalabs/gaudi/gaudi.c         |  177 +-
 drivers/misc/habanalabs/gaudi/gaudiP.h        |  288 ++-
 drivers/misc/habanalabs/gaudi/gaudi_nic.c     | 2298 +++++++++++++++++
 drivers/misc/habanalabs/gaudi/gaudi_nic.h     |  337 +++
 drivers/misc/habanalabs/goya/goya.c           |    6 +
 include/uapi/misc/habanalabs.h                |    3 +
 11 files changed, 3171 insertions(+), 8 deletions(-)
 create mode 100644 drivers/misc/habanalabs/gaudi/gaudi_nic.c
 create mode 100644 drivers/misc/habanalabs/gaudi/gaudi_nic.h

diff --git a/drivers/misc/habanalabs/common/context.c b/drivers/misc/habanalabs/common/context.c
index b168a9fce817..248fd3287d88 100644
--- a/drivers/misc/habanalabs/common/context.c
+++ b/drivers/misc/habanalabs/common/context.c
@@ -37,6 +37,7 @@ static void hl_ctx_fini(struct hl_ctx *ctx)
 		if ((hdev->in_debug) && (hdev->compute_ctx == ctx))
 			hl_device_set_debug_mode(hdev, false);
 
+		hdev->asic_funcs->ctx_fini(ctx);
 		hl_vm_ctx_fini(ctx);
 		hl_asid_free(hdev, ctx->asid);
 	} else {
diff --git a/drivers/misc/habanalabs/common/firmware_if.c b/drivers/misc/habanalabs/common/firmware_if.c
index 4409962d30ae..95260dc0458b 100644
--- a/drivers/misc/habanalabs/common/firmware_if.c
+++ b/drivers/misc/habanalabs/common/firmware_if.c
@@ -364,6 +364,50 @@ int hl_fw_get_eeprom_data(struct hl_device *hdev, void *data, size_t max_size)
 	return rc;
 }
 
+int hl_fw_cpucp_nic_info_get(struct hl_device *hdev)
+{
+	struct asic_fixed_properties *prop = &hdev->asic_prop;
+	struct cpucp_packet pkt = {};
+	void *cpucp_nic_info_cpu_addr;
+	dma_addr_t cpucp_nic_info_dma_addr;
+	long result;
+	int rc;
+
+	cpucp_nic_info_cpu_addr =
+			hdev->asic_funcs->cpu_accessible_dma_pool_alloc(hdev,
+					sizeof(struct cpucp_nic_info),
+					&cpucp_nic_info_dma_addr);
+	if (!cpucp_nic_info_cpu_addr) {
+		dev_err(hdev->dev,
+			"Failed to allocate DMA memory for CPU-CP NIC info packet\n");
+		return -ENOMEM;
+	}
+
+	memset(cpucp_nic_info_cpu_addr, 0, sizeof(struct cpucp_nic_info));
+
+	pkt.ctl = cpu_to_le32(CPUCP_PACKET_NIC_INFO_GET <<
+				CPUCP_PKT_CTL_OPCODE_SHIFT);
+	pkt.addr = cpu_to_le64(cpucp_nic_info_dma_addr);
+	pkt.data_max_size = cpu_to_le32(sizeof(struct cpucp_nic_info));
+
+	rc = hdev->asic_funcs->send_cpu_message(hdev, (u32 *) &pkt, sizeof(pkt),
+					HL_CPUCP_INFO_TIMEOUT_USEC, &result);
+	if (rc) {
+		dev_err(hdev->dev,
+			"Failed to handle CPU-CP NIC info pkt, error %d\n", rc);
+		goto out;
+	}
+
+	memcpy(&prop->cpucp_nic_info, cpucp_nic_info_cpu_addr,
+			sizeof(prop->cpucp_nic_info));
+
+out:
+	hdev->asic_funcs->cpu_accessible_dma_pool_free(hdev,
+			sizeof(struct cpucp_nic_info), cpucp_nic_info_cpu_addr);
+
+	return rc;
+}
+
 int hl_fw_cpucp_pci_counters_get(struct hl_device *hdev,
 		struct hl_info_pci_counters *counters)
 {
diff --git a/drivers/misc/habanalabs/common/habanalabs.h b/drivers/misc/habanalabs/common/habanalabs.h
index 3be39d8b0563..f99db3483ba4 100644
--- a/drivers/misc/habanalabs/common/habanalabs.h
+++ b/drivers/misc/habanalabs/common/habanalabs.h
@@ -264,6 +264,8 @@ struct hl_mmu_properties {
  * @hw_queues_props: H/W queues properties.
  * @cpucp_info: received various information from CPU-CP regarding the H/W, e.g.
  *		available sensors.
+ * @cpucp_nic_info: received various information from CPU-CP regarding the NIC
+ *                  H/W, e.g. MAC addresses.
  * @uboot_ver: F/W U-boot version.
  * @preboot_ver: F/W Preboot version.
  * @dmmu: DRAM MMU address translation properties.
@@ -278,7 +280,7 @@ struct hl_mmu_properties {
  * @dram_user_base_address: DRAM physical start address for user access.
  * @dram_size: DRAM total size.
  * @dram_pci_bar_size: size of PCI bar towards DRAM.
- * @max_power_default: max power of the device after reset
+ * @max_power_default: max power of the device after reset.
  * @dram_size_for_default_page_mapping: DRAM size needed to map to avoid page
  *                                      fault.
  * @pcie_dbi_base_address: Base address of the PCIE_DBI block.
@@ -314,6 +316,7 @@ struct hl_mmu_properties {
 struct asic_fixed_properties {
 	struct hw_queue_properties	*hw_queues_props;
 	struct cpucp_info		cpucp_info;
+	struct cpucp_nic_info		cpucp_nic_info;
 	char				uboot_ver[VERSION_MAX_LEN];
 	char				preboot_ver[VERSION_MAX_LEN];
 	struct hl_mmu_properties	dmmu;
@@ -680,6 +683,7 @@ enum div_select_defs {
  * @wreg: Write a register. Needed for simulator support.
  * @halt_coresight: stop the ETF and ETR traces.
  * @ctx_init: context dependent initialization.
+ * @ctx_fini: context dependent cleanup.
  * @get_clk_rate: Retrieve the ASIC current and maximum clock rate in MHz
  * @get_queue_id_for_cq: Get the H/W queue id related to the given CQ index.
  * @read_device_fw_version: read the device's firmware versions that are
@@ -782,6 +786,7 @@ struct hl_asic_funcs {
 	void (*wreg)(struct hl_device *hdev, u32 reg, u32 val);
 	void (*halt_coresight)(struct hl_device *hdev);
 	int (*ctx_init)(struct hl_ctx *ctx);
+	void (*ctx_fini)(struct hl_ctx *ctx);
 	int (*get_clk_rate)(struct hl_device *hdev, u32 *cur_clk, u32 *max_clk);
 	u32 (*get_queue_id_for_cq)(struct hl_device *hdev, u32 cq_idx);
 	void (*read_device_fw_version)(struct hl_device *hdev,
@@ -1528,6 +1533,7 @@ struct hl_device_idle_busy_ts {
  * @sync_stream_queue_idx: helper index for sync stream queues initialization.
  * @supports_coresight: is CoreSight supported.
  * @supports_soft_reset: is soft reset supported.
+ * @nic_rx_poll: enable NIC Rx in polling mode rather than IRQ.
  */
 struct hl_device {
 	struct pci_dev			*pdev;
@@ -1626,9 +1632,12 @@ struct hl_device {
 	u8				sync_stream_queue_idx;
 	u8				supports_coresight;
 	u8				supports_soft_reset;
+	u8				nic_rx_poll;
 
 	/* Parameters for bring-up */
 	u64				nic_ports_mask;
+	u64				nic_ports_ext_mask;
+	u64				nic_auto_neg_mask;
 	u8				mmu_enable;
 	u8				mmu_huge_page_opt;
 	u8				cpu_enable;
@@ -1641,6 +1650,7 @@ struct hl_device {
 	u8				dram_scrambler_enable;
 	u8				hard_reset_on_fw_events;
 	u8				bmc_enable;
+	u8				nic_load_fw;
 	u8				rl_enable;
 };
 
@@ -1858,6 +1868,7 @@ void hl_fw_cpu_accessible_dma_pool_free(struct hl_device *hdev, size_t size,
 int hl_fw_send_heartbeat(struct hl_device *hdev);
 int hl_fw_cpucp_info_get(struct hl_device *hdev);
 int hl_fw_get_eeprom_data(struct hl_device *hdev, void *data, size_t max_size);
+int hl_fw_cpucp_nic_info_get(struct hl_device *hdev);
 int hl_fw_cpucp_pci_counters_get(struct hl_device *hdev,
 		struct hl_info_pci_counters *counters);
 int hl_fw_cpucp_total_energy_get(struct hl_device *hdev,
diff --git a/drivers/misc/habanalabs/common/habanalabs_drv.c b/drivers/misc/habanalabs/common/habanalabs_drv.c
index f9067d3ef437..df92afc1b9d5 100644
--- a/drivers/misc/habanalabs/common/habanalabs_drv.c
+++ b/drivers/misc/habanalabs/common/habanalabs_drv.c
@@ -29,6 +29,7 @@ static DEFINE_MUTEX(hl_devs_idr_lock);
 
 static int timeout_locked = 5;
 static int reset_on_lockup = 1;
+static int nic_rx_poll;
 
 module_param(timeout_locked, int, 0444);
 MODULE_PARM_DESC(timeout_locked,
@@ -38,6 +39,10 @@ module_param(reset_on_lockup, int, 0444);
 MODULE_PARM_DESC(reset_on_lockup,
 	"Do device reset on lockup (0 = no, 1 = yes, default yes)");
 
+module_param(nic_rx_poll, int, 0444);
+MODULE_PARM_DESC(nic_rx_poll,
+	"Enable NIC Rx polling mode (0 = no, 1 = yes, default no)");
+
 #define PCI_VENDOR_ID_HABANALABS	0x1da3
 
 #define PCI_IDS_GOYA			0x0001
@@ -241,6 +246,10 @@ static void set_driver_behavior_per_device(struct hl_device *hdev)
 	hdev->dram_scrambler_enable = 1;
 	hdev->bmc_enable = 1;
 	hdev->hard_reset_on_fw_events = 1;
+	hdev->card_type = cpucp_card_type_pci;
+	hdev->nic_ports_ext_mask = 0x3FF;
+	hdev->nic_auto_neg_mask = 0x3FF;
+	hdev->nic_load_fw = 0;
 }
 
 /*
@@ -283,6 +292,7 @@ int create_hdev(struct hl_device **dev, struct pci_dev *pdev,
 
 	hdev->major = hl_major;
 	hdev->reset_on_lockup = reset_on_lockup;
+	hdev->nic_rx_poll = nic_rx_poll;
 	hdev->pldm = 0;
 
 	set_driver_behavior_per_device(hdev);
diff --git a/drivers/misc/habanalabs/gaudi/Makefile b/drivers/misc/habanalabs/gaudi/Makefile
index c9f4703cff24..24e14cff563d 100644
--- a/drivers/misc/habanalabs/gaudi/Makefile
+++ b/drivers/misc/habanalabs/gaudi/Makefile
@@ -1,3 +1,5 @@
 # SPDX-License-Identifier: GPL-2.0-only
 HL_GAUDI_FILES := gaudi/gaudi.o gaudi/gaudi_hwmgr.o gaudi/gaudi_security.o \
 	gaudi/gaudi_coresight.o
+
+HL_GAUDI_FILES += gaudi/gaudi_nic.o
diff --git a/drivers/misc/habanalabs/gaudi/gaudi.c b/drivers/misc/habanalabs/gaudi/gaudi.c
index 2159e14be4ef..d350519a9e31 100644
--- a/drivers/misc/habanalabs/gaudi/gaudi.c
+++ b/drivers/misc/habanalabs/gaudi/gaudi.c
@@ -78,6 +78,7 @@
 #define GAUDI_PLDM_MMU_TIMEOUT_USEC	(MMU_CONFIG_TIMEOUT_USEC * 100)
 #define GAUDI_PLDM_QMAN0_TIMEOUT_USEC	(HL_DEVICE_TIMEOUT_USEC * 30)
 #define GAUDI_PLDM_TPC_KERNEL_WAIT_USEC	(HL_DEVICE_TIMEOUT_USEC * 30)
+#define GAUDI_PLDM_NIC_QPC_INV_USEC	(NIC_QPC_INV_USEC * 10)
 #define GAUDI_BOOT_FIT_REQ_TIMEOUT_USEC	1000000		/* 1s */
 #define GAUDI_MSG_TO_CPU_TIMEOUT_USEC	4000000		/* 4s */
 
@@ -457,7 +458,10 @@ static int gaudi_get_fixed_properties(struct hl_device *hdev)
 	prop->num_of_events = GAUDI_EVENT_SIZE;
 	prop->tpc_enabled_mask = TPC_ENABLED_MASK;
 
-	prop->max_power_default = MAX_POWER_DEFAULT_PCI;
+	if (hdev->card_type == cpucp_card_type_pmc)
+		prop->max_power_default = MAX_POWER_DEFAULT_PMC;
+	else
+		prop->max_power_default = MAX_POWER_DEFAULT_PCI;
 
 	prop->cb_pool_cb_cnt = GAUDI_CB_POOL_CB_CNT;
 	prop->cb_pool_cb_size = GAUDI_CB_POOL_CB_SIZE;
@@ -781,6 +785,14 @@ static int gaudi_init_tpc_mem(struct hl_device *hdev)
 	return rc;
 }
 
+static int gaudi_nic_clear_mem(struct hl_device *hdev)
+{
+	if (!hdev->nic_ports_mask)
+		return 0;
+
+	return gaudi_memset_device_memory(hdev, NIC_DRV_ADDR, NIC_DRV_SIZE, 0);
+}
+
 static int gaudi_late_init(struct hl_device *hdev)
 {
 	struct gaudi_device *gaudi = hdev->asic_specific;
@@ -835,6 +847,12 @@ static int gaudi_late_init(struct hl_device *hdev)
 		goto disable_pci_access;
 	}
 
+	rc = gaudi_nic_clear_mem(hdev);
+	if (rc) {
+		dev_err(hdev->dev, "Failed to clear NIC memory\n");
+		goto disable_pci_access;
+	}
+
 	return 0;
 
 disable_pci_access:
@@ -864,6 +882,17 @@ static void gaudi_late_fini(struct hl_device *hdev)
 	hdev->hl_chip_info->info = NULL;
 }
 
+static void gaudi_nic_handle_rx(struct gaudi_nic_device *gaudi_nic)
+{
+	/* at this point, interrupts were disabled by the H/W */
+	napi_schedule(&gaudi_nic->napi);
+}
+
+static int gaudi_nic_handle_tx(struct gaudi_nic_device *gaudi_nic, void *data)
+{
+	return gaudi_nic_handle_tx_pkt(gaudi_nic, data);
+}
+
 static int gaudi_alloc_cpu_accessible_dma_mem(struct hl_device *hdev)
 {
 	dma_addr_t dma_addr_arr[GAUDI_ALLOC_CPU_MEM_RETRY_CNT] = {}, end_addr;
@@ -1012,6 +1041,8 @@ static int gaudi_sw_init(struct hl_device *hdev)
 	}
 
 	gaudi->cpucp_info_get = gaudi_cpucp_info_get;
+	gaudi->nic_handle_rx = gaudi_nic_handle_rx;
+	gaudi->nic_handle_tx = gaudi_nic_handle_tx;
 
 	gaudi->max_freq_value = GAUDI_MAX_CLK_FREQ;
 
@@ -1052,14 +1083,30 @@ static int gaudi_sw_init(struct hl_device *hdev)
 	if (rc)
 		goto free_cpu_accessible_dma_pool;
 
+	rc = gaudi_nic_sw_init(hdev);
+	if (rc) {
+		dev_err(hdev->dev, "Failed to init NIC S/W\n");
+		rc = -ENOMEM;
+		goto free_internal_qmans_pq_mem;
+	}
+
 	spin_lock_init(&gaudi->hw_queues_lock);
 	mutex_init(&gaudi->clk_gate_mutex);
 
+	/* Device CPU loads the PHY F/W at boot */
+	gaudi->nic_phy_load_fw = (!hdev->cpu_enable && !hdev->pldm) ||
+					(hdev->nic_load_fw);
+	gaudi->nic_phy_config_fw = !hdev->pldm;
+	gaudi->nic_qpc_cache_inv_timeout = hdev->pldm ?
+			GAUDI_PLDM_NIC_QPC_INV_USEC : NIC_QPC_INV_USEC;
+	gaudi->nic_debugfs_reset = true;
 	hdev->supports_sync_stream = true;
 	hdev->supports_coresight = true;
 
 	return 0;
 
+free_internal_qmans_pq_mem:
+	gaudi_free_internal_qmans_pq_mem(hdev);
 free_cpu_accessible_dma_pool:
 	gen_pool_destroy(hdev->cpu_accessible_dma_pool);
 free_cpu_dma_mem:
@@ -1080,6 +1127,8 @@ static int gaudi_sw_fini(struct hl_device *hdev)
 {
 	struct gaudi_device *gaudi = hdev->asic_specific;
 
+	gaudi_nic_sw_fini(hdev);
+
 	gaudi_free_internal_qmans_pq_mem(hdev);
 
 	gen_pool_destroy(hdev->cpu_accessible_dma_pool);
@@ -1103,6 +1152,8 @@ static int gaudi_sw_fini(struct hl_device *hdev)
 static irqreturn_t gaudi_irq_handler_single(int irq, void *arg)
 {
 	struct hl_device *hdev = arg;
+	struct gaudi_device *gaudi = hdev->asic_specific;
+	struct gaudi_nic_device *gaudi_nic;
 	int i;
 
 	if (hdev->disabled)
@@ -1111,6 +1162,16 @@ static irqreturn_t gaudi_irq_handler_single(int irq, void *arg)
 	for (i = 0 ; i < hdev->asic_prop.completion_queues_count ; i++)
 		hl_irq_handler_cq(irq, &hdev->completion_queue[i]);
 
+	for (i = 0 ; i < NIC_NUMBER_OF_PORTS ; i++) {
+		gaudi_nic = &gaudi->nic_devices[i];
+
+		if (!(hdev->nic_ports_mask & BIT(i)) || (!gaudi_nic->port_open))
+			continue;
+
+		gaudi_nic_rx_irq_handler(irq, gaudi_nic);
+	}
+
+	gaudi_nic_cq_irq_handler(irq, hdev);
 	hl_irq_handler_eq(irq, &hdev->event_queue);
 
 	return IRQ_HANDLED;
@@ -1270,6 +1331,8 @@ static void gaudi_disable_msi(struct hl_device *hdev)
 static void gaudi_init_scrambler_sram(struct hl_device *hdev)
 {
 	struct gaudi_device *gaudi = hdev->asic_specific;
+	u32 status;
+	int rc;
 
 	if (gaudi->hw_cap_initialized & HW_CAP_SRAM_SCRAMBLER)
 		return;
@@ -1277,6 +1340,36 @@ static void gaudi_init_scrambler_sram(struct hl_device *hdev)
 	if (!hdev->sram_scrambler_enable)
 		return;
 
+	/* In case we don't load F/W, we must wait for uboot to finish before
+	 * we enable scrambling. Otherwise, we risk interrupting it in the
+	 * middle of initialization, which can cause the device to get stuck
+	 */
+	if ((!hdev->pldm) && (hdev->cpu_enable) && (!hdev->fw_loading)) {
+		dev_info(hdev->dev,
+			"Waiting for u-boot to finish before enabling SRAM scrambler\n");
+
+		rc = hl_poll_timeout(
+			hdev,
+			mmPSOC_GLOBAL_CONF_CPU_BOOT_STATUS,
+			status,
+			(status == CPU_BOOT_STATUS_NIC_FW_RDY) ||
+			(status == CPU_BOOT_STATUS_READY_TO_BOOT) ||
+			(status == CPU_BOOT_STATUS_SRAM_AVAIL),
+			10000,
+			GAUDI_NIC_FW_TIMEOUT_USEC);
+
+		if (rc)
+			dev_warn(hdev->dev,
+				"Failed to detect u-boot has finished loading NIC F/W. Maybe running old F/W?\n");
+
+		if (status != CPU_BOOT_STATUS_SRAM_AVAIL)
+			ssleep(1);
+
+		/* Stop the device CPU to make sure nothing bad happens */
+		WREG32(mmPSOC_GLOBAL_CONF_KMD_MSG_TO_CPU, KMD_MSG_GOTO_WFE);
+		msleep(GAUDI_CPU_RESET_WAIT_MSEC);
+	}
+
 	WREG32(mmNIF_RTR_CTRL_0_SCRAM_SRAM_EN,
 			1 << IF_RTR_CTRL_SCRAM_SRAM_EN_VAL_SHIFT);
 	WREG32(mmNIF_RTR_CTRL_1_SCRAM_SRAM_EN,
@@ -2873,6 +2966,13 @@ static void gaudi_halt_engines(struct hl_device *hdev, bool hard_reset)
 	else
 		wait_timeout_ms = GAUDI_RESET_WAIT_MSEC;
 
+	/*
+	 * Mark the NIC as in reset to avoid any new NIC accesses to the
+	 * H/W. This must be done before we stop the CPU as the NIC
+	 * might use it e.g. get/set EEPROM data.
+	 */
+	gaudi_nic_hard_reset_prepare(hdev);
+
 	gaudi_stop_nic_qmans(hdev);
 
 	gaudi_stop_mme_qmans(hdev);
@@ -2899,6 +2999,8 @@ static void gaudi_halt_engines(struct hl_device *hdev, bool hard_reset)
 
 	gaudi_disable_timestamp(hdev);
 
+	/* NIC stop must be called before MSI is disabled */
+	gaudi_nic_stop(hdev);
 	gaudi_disable_msi(hdev);
 }
 
@@ -3183,6 +3285,16 @@ static int gaudi_hw_init(struct hl_device *hdev)
 
 	gaudi_init_hbm_dma_qmans(hdev);
 
+	/*
+	 * Before pushing u-boot/linux to device, need to set the hbm bar to
+	 * base address of dram
+	 */
+	if (gaudi_set_hbm_bar_base(hdev, DRAM_PHYS_BASE) == U64_MAX) {
+		dev_err(hdev->dev,
+			"failed to map HBM bar to DRAM base address\n");
+		return -EIO;
+	}
+
 	rc = gaudi_init_cpu(hdev);
 	if (rc) {
 		dev_err(hdev->dev, "failed to initialize CPU\n");
@@ -3314,7 +3426,7 @@ static void gaudi_hw_fini(struct hl_device *hdev, bool hard_reset)
 					HW_CAP_HBM_DMA | HW_CAP_PLL |
 					HW_CAP_NIC_MASK | HW_CAP_MMU |
 					HW_CAP_SRAM_SCRAMBLER |
-					HW_CAP_HBM_SCRAMBLER |
+					HW_CAP_HBM_SCRAMBLER | HW_CAP_NIC_DRV |
 					HW_CAP_CLK_GATE);
 
 	memset(gaudi->events_stat, 0, sizeof(gaudi->events_stat));
@@ -6104,6 +6216,45 @@ static void gaudi_print_irq_info(struct hl_device *hdev, u16 event_type,
 	}
 }
 
+static void gaudi_print_nic_axi_irq_info(struct hl_device *hdev, u16 event_type,
+						void *data)
+{
+	char desc[64] = "", *type;
+	struct eq_nic_sei_event *eq_nic_sei = data;
+	u16 nic_id = event_type - GAUDI_EVENT_NIC_SEI_0;
+
+	switch (eq_nic_sei->axi_error_cause) {
+	case RXB:
+		type = "RXB";
+		break;
+	case RXE:
+		type = "RXE";
+		break;
+	case TXS:
+		type = "TXS";
+		break;
+	case TXE:
+		type = "TXE";
+		break;
+	case QPC_RESP:
+		type = "QPC_RESP";
+		break;
+	case NON_AXI_ERR:
+		type = "NON_AXI_ERR";
+		break;
+	default:
+		dev_err(hdev->dev, "unknown NIC AXI cause %d\n",
+			eq_nic_sei->axi_error_cause);
+		type = "N/A";
+		break;
+	}
+
+	snprintf(desc, sizeof(desc), "NIC%d_%s%d", nic_id, type,
+			eq_nic_sei->id);
+	dev_err_ratelimited(hdev->dev, "Received H/W interrupt %d [\"%s\"]\n",
+		event_type, desc);
+}
+
 static int gaudi_soft_reset_late_init(struct hl_device *hdev)
 {
 	struct gaudi_device *gaudi = hdev->asic_specific;
@@ -6302,6 +6453,7 @@ static void gaudi_handle_eqe(struct hl_device *hdev,
 				struct hl_eq_entry *eq_entry)
 {
 	struct gaudi_device *gaudi = hdev->asic_specific;
+	u64 data = le64_to_cpu(eq_entry->data[0]);
 	u32 ctl = le32_to_cpu(eq_entry->hdr.ctl);
 	u16 event_type = ((ctl & EQ_CTL_EVENT_TYPE_MASK)
 			>> EQ_CTL_EVENT_TYPE_SHIFT);
@@ -6330,6 +6482,7 @@ static void gaudi_handle_eqe(struct hl_device *hdev,
 	case GAUDI_EVENT_PSOC_MEM_DERR:
 	case GAUDI_EVENT_PSOC_CORESIGHT_DERR:
 	case GAUDI_EVENT_SRAM0_DERR ... GAUDI_EVENT_SRAM28_DERR:
+	case GAUDI_EVENT_NIC0_DERR ... GAUDI_EVENT_NIC4_DERR:
 	case GAUDI_EVENT_DMA_IF0_DERR ... GAUDI_EVENT_DMA_IF3_DERR:
 	case GAUDI_EVENT_HBM_0_DERR ... GAUDI_EVENT_HBM_3_DERR:
 	case GAUDI_EVENT_MMU_DERR:
@@ -6431,6 +6584,7 @@ static void gaudi_handle_eqe(struct hl_device *hdev,
 	case GAUDI_EVENT_PSOC_MEM_SERR:
 	case GAUDI_EVENT_PSOC_CORESIGHT_SERR:
 	case GAUDI_EVENT_SRAM0_SERR ... GAUDI_EVENT_SRAM28_SERR:
+	case GAUDI_EVENT_NIC0_SERR ... GAUDI_EVENT_NIC4_SERR:
 	case GAUDI_EVENT_DMA_IF0_SERR ... GAUDI_EVENT_DMA_IF3_SERR:
 	case GAUDI_EVENT_HBM_0_SERR ... GAUDI_EVENT_HBM_3_SERR:
 		fallthrough;
@@ -6494,6 +6648,11 @@ static void gaudi_handle_eqe(struct hl_device *hdev,
 		hl_fw_unmask_irq(hdev, event_type);
 		break;
 
+	case GAUDI_EVENT_NIC_SEI_0 ... GAUDI_EVENT_NIC_SEI_4:
+		gaudi_print_nic_axi_irq_info(hdev, event_type, &data);
+		hl_fw_unmask_irq(hdev, event_type);
+		break;
+
 	case GAUDI_EVENT_FIX_POWER_ENV_S ... GAUDI_EVENT_FIX_THERMAL_ENV_E:
 		gaudi_print_clk_change_info(hdev, event_type);
 		hl_fw_unmask_irq(hdev, event_type);
@@ -6999,6 +7158,19 @@ static int gaudi_ctx_init(struct hl_ctx *ctx)
 	return 0;
 }
 
+static void gaudi_ctx_fini(struct hl_ctx *ctx)
+{
+	struct hl_device *hdev = ctx->hdev;
+
+	/* Gaudi will NEVER support more than a single compute context.
+	 * Therefore, don't clear anything unless it is the compute context
+	 */
+	if (hdev->compute_ctx != ctx)
+		return;
+
+	gaudi_nic_ctx_fini(ctx);
+}
+
 static u32 gaudi_get_queue_id_for_cq(struct hl_device *hdev, u32 cq_idx)
 {
 	return gaudi_cq_assignment[cq_idx];
@@ -7302,6 +7474,7 @@ static const struct hl_asic_funcs gaudi_funcs = {
 	.wreg = hl_wreg,
 	.halt_coresight = gaudi_halt_coresight,
 	.ctx_init = gaudi_ctx_init,
+	.ctx_fini = gaudi_ctx_fini,
 	.get_clk_rate = gaudi_get_clk_rate,
 	.get_queue_id_for_cq = gaudi_get_queue_id_for_cq,
 	.read_device_fw_version = gaudi_read_device_fw_version,
diff --git a/drivers/misc/habanalabs/gaudi/gaudiP.h b/drivers/misc/habanalabs/gaudi/gaudiP.h
index 2ccf7e5a97c7..bf3a215e0f8e 100644
--- a/drivers/misc/habanalabs/gaudi/gaudiP.h
+++ b/drivers/misc/habanalabs/gaudi/gaudiP.h
@@ -15,6 +15,9 @@
 #include "../include/gaudi/gaudi.h"
 #include "../include/gaudi/gaudi_async_events.h"
 
+#include <linux/netdevice.h>
+#include <linux/kfifo.h>
+
 #define NUMBER_OF_EXT_HW_QUEUES		12
 #define NUMBER_OF_CMPLT_QUEUES		NUMBER_OF_EXT_HW_QUEUES
 #define NUMBER_OF_CPU_HW_QUEUES		1
@@ -27,9 +30,12 @@
  * Number of MSI interrupts IDS:
  * Each completion queue has 1 ID
  * The event queue has 1 ID
+ * Each NIC engine has 1 ID for Rx
+ * The NIC CQ has 1 ID
  */
 #define NUMBER_OF_INTERRUPTS		(NUMBER_OF_CMPLT_QUEUES + \
-						NUMBER_OF_CPU_HW_QUEUES)
+						NUMBER_OF_CPU_HW_QUEUES + \
+						NIC_NUMBER_OF_ENGINES + 1)
 
 #if (NUMBER_OF_INTERRUPTS > GAUDI_MSI_ENTRIES)
 #error "Number of MSI interrupts must be smaller or equal to GAUDI_MSI_ENTRIES"
@@ -44,6 +50,10 @@
 
 #define GAUDI_CPU_TIMEOUT_USEC		15000000	/* 15s */
 
+#define GAUDI_NIC_FW_TIMEOUT_USEC	12000000	/* 12s */
+
+#define NIC_QPC_INV_USEC		1000000		/* 1s */
+
 #define TPC_ENABLED_MASK		0xFF
 
 #define GAUDI_HBM_SIZE_32GB		0x800000000ull
@@ -100,20 +110,22 @@
 	(((mmSYNC_MNGR_E_N_SYNC_MNGR_OBJS_MON_STATUS_511 - \
 	mmSYNC_MNGR_E_N_SYNC_MNGR_OBJS_MON_STATUS_0) + 4) >> 2)
 
+#define NIC_NUMBER_OF_PORTS	NIC_NUMBER_OF_ENGINES
+#define NIC_MAX_NUM_OF_LANES	(NIC_NUMBER_OF_MACROS * NIC_MAC_NUM_OF_LANES)
 
 /* DRAM Memory Map */
 
 #define CPU_FW_IMAGE_SIZE	0x10000000	/* 256MB */
 #define MMU_PAGE_TABLES_SIZE	0x0BF00000	/* 191MB */
 #define MMU_CACHE_MNG_SIZE	0x00100000	/* 1MB */
-#define RESERVED		0x04000000	/* 64MB */
+#define NIC_DRV_SIZE		0x04000000	/* 64MB */
 
 #define CPU_FW_IMAGE_ADDR	DRAM_PHYS_BASE
 #define MMU_PAGE_TABLES_ADDR	(CPU_FW_IMAGE_ADDR + CPU_FW_IMAGE_SIZE)
 #define MMU_CACHE_MNG_ADDR	(MMU_PAGE_TABLES_ADDR + MMU_PAGE_TABLES_SIZE)
+#define NIC_DRV_ADDR		(MMU_CACHE_MNG_ADDR + MMU_CACHE_MNG_SIZE)
 
-#define DRAM_DRIVER_END_ADDR	(MMU_CACHE_MNG_ADDR + MMU_CACHE_MNG_SIZE +\
-								RESERVED)
+#define DRAM_DRIVER_END_ADDR	(NIC_DRV_ADDR + NIC_DRV_SIZE)
 
 #define DRAM_BASE_ADDR_USER	0x20000000
 
@@ -145,6 +157,8 @@
 #define VA_HOST_SPACE_SIZE	(VA_HOST_SPACE_END - \
 					VA_HOST_SPACE_START) /* 767TB */
 
+#define VA_NIC_MEM_ADDR		0x10000000000ull /* 1TB */
+
 #define HW_CAP_PLL		BIT(0)
 #define HW_CAP_HBM		BIT(1)
 #define HW_CAP_MMU		BIT(2)
@@ -157,6 +171,7 @@
 #define HW_CAP_CLK_GATE		BIT(9)
 #define HW_CAP_SRAM_SCRAMBLER	BIT(10)
 #define HW_CAP_HBM_SCRAMBLER	BIT(11)
+#define HW_CAP_NIC_DRV		BIT(12)
 
 #define HW_CAP_NIC0		BIT(14)
 #define HW_CAP_NIC1		BIT(15)
@@ -232,6 +247,180 @@ enum gaudi_nic_mask {
 	GAUDI_NIC_MASK_ALL = 0x3FF
 };
 
+/**
+ * struct gaudi_nic_tx_taps - holds the NIC Tx taps values for a specific lane.
+ *                            Currently used for PAM4 only.
+ * @taps: holds all taps - tx_pre2, tx_pre1, tx_main, tx_post1 and tx_post2.
+ */
+struct gaudi_nic_tx_taps {
+	s32	taps[NIC_PHY_TX_TAPS_NUM];
+};
+
+/**
+ * struct gaudi_nic_macro - manage specific NIC macro that holds two NIC
+ *                          engines.
+ * @idx: index of the NIC macro.
+ * @num_of_lanes: number of lanes in the NIC macro.
+ */
+struct gaudi_nic_macro {
+	u8	idx;
+	u8	num_of_lanes;
+};
+
+/**
+ * struct gaudi_nic_device - manage specific NIC port.
+ * @hdev: habanalabs device structure.
+ * @ndev: pointer to network device.
+ * @nic_macro: pointer to the manage structure of the containing NIC macro.
+ * @napi: New API structure.
+ * @tx_wq: Tx work queue for handling packet transmission outside interrupt
+ *         context (for simulator only).
+ * @rx_wq: Rx work queue for handling incoming packets outside interrupt
+ *         context (for simulator only).
+ * @cq_wq: CQ work queue for handling CQEs outside interrupt context.
+ * @rx_mem_cpu: CPU address of RX memory.
+ * @rx_mem_dma: DMA address of RX memory.
+ * @cq_mem_cpu: CPU address of CQ memory.
+ * @cq_mem_dma: DMA address of CQ memory.
+ * @qp_err_mem_cpu: CPU address of QP error memory.
+ * @qp_err_mem_dma: DMA address of QP error memory.
+ * @in_reset: 1 if the NIC is currently in reset, 0 otherwise.
+ * @rx_poll_work: Rx work for polling mode.
+ * @cq_work: CQ work for processing CQEs.
+ * @link_status_work: work for checking NIC link status.
+ * @port_open_work: work for initializing the port H/W.
+ * @idr_lock: Protects qp_ids.
+ * @user_wq_lock: protects the user WQ configuration.
+ * @qp_ids: IDR to hold all connections IDs.
+ * @pcs_fail_fifo: queue for keeping the PCS link failures time stamps in order
+ *                 to reconfigure F/W if needed.
+ * @last_cqe_ts: time stamp of last processed CQE.
+ * @last_fw_tuning_ts: time stamp of last F/W tuning.
+ * @last_pcs_link_drop_ts: time stamp of last PCS link drop.
+ * @rx_msi_addr: Rx MSI address.
+ * @tx_swq_mem_device_va: device virtual address of Tx SWQ memory.
+ * @cq_mem_device_va: device virtual address of CQ memory.
+ * @rx_mem_size: Rx memory size.
+ * @cq_mem_size: CQ memory size.
+ * @qp_err_mem_size: QP error buffer memory size.
+ * @rx_ci: incremented by S/W for each received packet from the H/W.
+ * @tx_pi: incremented by S/W for each sent packet to the H/W.
+ * @tx_ci: incremented by H/W for each sent packet from the H/W.
+ * @cq_ci: incremented by S/W for each consumed CQE.
+ * @port: NIC specific port.
+ * @data_rate: NIC data rate according to speed and number of lanes.
+ * @tx_wq_pi: TX work queue PI for transmitting packets by their order (for
+ *            simulator only).
+ * @tx_wq_ci: TX work queue CI for transmitting packets by their order (for
+ *            simulator only).
+ * @qp_err_ci: next index of the QP error to fetch.
+ * @retry_cnt: counts the number of retries during link establishment.
+ * @pcs_fail_cnt: counter of PCS link failures since last F/W configuration.
+ * @pcs_local_fault_cnt: counter of PCS link local errors since last F/W
+ *                       configuration. These errors can appear even when link
+ *                       is up.
+ * @pcs_remote_fault_cnt: counter of PCS link remote errors since last F/W
+ *                        configuration. These errors can appear even when link
+ *                        is up.
+ * @speed: the bandwidth of the port in Mb/s.
+ * @last_cqe_cnt: the last number of processed CQEs.
+ * @cq_delay: the time between two invocations of the CQ polling work when not
+ *            idle.
+ * @cq_delay_idle: the time between two invocations of the CQ polling work when
+ *                 idle.
+ * @correctable_errors_cnt: count the correctable FEC blocks.
+ * @uncorrectable_errors_cnt: count the uncorrectable FEC blocks.
+ * @enabled: true if the NIC is enabled by the S/W, false otherwise. Can be
+ *           changed only from ndo_open/ndo_stop callbacks.
+ * @active: true if the NIC H/W is operational, false otherwise.
+ * @port_open: true if the port H/W is initialized, false otherwise.
+ * @do_macro_cfg: true if this port should handle the macro configuration, false
+ *              otherwise. Each NIC macro contains two ports - even and odd, and
+ *              only one of them should handle the shared configuration.
+ *              The default is for the even port to handle it, but in case that
+ *              the even port is disabled, the odd port will do it.
+ * @phy_fw_tuned: true if F/W is tuned, false otherwise.
+ * @pcs_link: true if the NIC has PCS link, false otherwise.
+ * @mac_loopback: true if port in MAC loopback mode, false otherwise.
+ * @auto_neg_enable: true if this port supports Autonegotiation, false
+ *                   otherwise.
+ * @auto_neg_resolved: true if this port completed Autonegotiation, false
+ *                     otherwise.
+ * @power_up_mask: represents which MAC channels should be configured during PHY
+ *                 power up.
+ * @fw_tuning_mask: represents which MAC channels should be configured during
+ *                  F/W tuning.
+ * @auto_neg_mask: represents which MAC channels should be configured during
+ *                 Autonegotiation.
+ * @pfc_enable: true if this port supports Priority Flow Control, false
+ *              otherwise.
+ */
+struct gaudi_nic_device {
+	struct hl_device	*hdev;
+	struct net_device	*ndev;
+	struct gaudi_nic_macro	*nic_macro;
+	struct napi_struct	napi;
+	struct workqueue_struct	*tx_wq;
+	struct workqueue_struct	*rx_wq;
+	struct workqueue_struct	*cq_wq;
+	void			*rx_mem_cpu;
+	dma_addr_t		rx_mem_dma;
+	void			*cq_mem_cpu;
+	dma_addr_t		cq_mem_dma;
+	void			*qp_err_mem_cpu;
+	dma_addr_t		qp_err_mem_dma;
+	atomic_t		in_reset;
+	struct delayed_work	rx_poll_work;
+	struct delayed_work	cq_work;
+	struct delayed_work	link_status_work;
+	struct delayed_work	port_open_work;
+	struct mutex		idr_lock;
+	struct mutex		user_wq_lock;
+	struct idr		qp_ids;
+	struct kfifo		pcs_fail_fifo;
+	ktime_t			last_cqe_ts;
+	ktime_t			last_fw_tuning_ts;
+	ktime_t			last_pcs_link_drop_ts;
+	u64			rx_msi_addr;
+	u64			tx_swq_mem_device_va;
+	u64			cq_mem_device_va;
+	u32			rx_mem_size;
+	u32			cq_mem_size;
+	u32			qp_err_mem_size;
+	u32			rx_ci;
+	u32			tx_pi;
+	u32			tx_ci;
+	u32			cq_ci;
+	u32			port;
+	u32			data_rate;
+	u32			tx_wq_pi;
+	u32			tx_wq_ci;
+	u32			qp_err_ci;
+	u32			retry_cnt;
+	u32			pcs_fail_cnt;
+	u32			pcs_local_fault_cnt;
+	u32			pcs_remote_fault_cnt;
+	u32			speed;
+	u32			last_cqe_cnt;
+	u32			cq_delay;
+	u32			cq_delay_idle;
+	u32			correctable_errors_cnt;
+	u32			uncorrectable_errors_cnt;
+	u8			enabled;
+	u8			active;
+	u8			port_open;
+	u8			do_macro_cfg;
+	u8			phy_fw_tuned;
+	u8			pcs_link;
+	u8			mac_loopback;
+	u8			auto_neg_enable;
+	u8			auto_neg_resolved;
+	u8			power_up_mask;
+	u8			fw_tuning_mask;
+	u8			auto_neg_mask;
+	u8			pfc_enable;
+};
+
 /**
  * struct gaudi_internal_qman_info - Internal QMAN information.
  * @pq_kernel_addr: Kernel address of the PQ memory area in the host.
@@ -247,14 +436,29 @@ struct gaudi_internal_qman_info {
 /**
  * struct gaudi_device - ASIC specific manage structure.
  * @cpucp_info_get: get information on device from CPU-CP
+ * @nic_handle_rx: NIC handler for incoming packet.
+ * @nic_handle_tx: NIC handler for outgoing packet.
+ * @nic_devices: array that holds all NIC ports manage structures.
+ * @nic_macros: array that holds all NIC macros manage structures.
+ * @nic_pam4_tx_taps: array that holds all PAM4 Tx taps of all NIC lanes.
+ * @nic_cq_comp: completion queue to handle wait/poll NIC CQ IOCTL.
+ * @nic_cq_lock: for serial copying of the CQEs from the NIC buffer to the user
+ *               queue.
  * @hw_queues_lock: protects the H/W queues from concurrent access.
  * @clk_gate_mutex: protects code areas that require clock gating to be disabled
  *                  temporarily
+ * @nic_cq_user_lock: protects the NIC CQ from concurrent operations that may
+ *               interfere with each other such as wait/mmap/destroy etc.
+ * @nic_qp_err_lock: protects the NIC QP error handler from pushing error
+ *                   entries to the CQ while it is under destruction.
+ * @nic_cq_buf: NIC CQ buffer, shared for all ports.
  * @internal_qmans: Internal QMANs information. The array size is larger than
  *                  the actual number of internal queues because they are not in
  *                  consecutive order.
  * @hbm_bar_cur_addr: current address of HBM PCI bar.
  * @max_freq_value: current max clk frequency.
+ * @nic_mac_loopback: enable MAC loopback on specific NIC ports.
+ * @nic_cq_user_new_cqes: number of available CQEs to process.
  * @events: array that holds all event id's
  * @events_stat: array that holds histogram of all received events.
  * @events_stat_aggregate: same as events_stat but doesn't get cleared on reset
@@ -263,29 +467,88 @@ struct gaudi_internal_qman_info {
  *                      signal we can use this engine in later code paths.
  *                      Each bit is cleared upon reset of its corresponding H/W
  *                      engine.
+ * @nic_cq_user_num_of_entries: number of CQ entries in the user CQ buffer
+ *                              (received from the user).
+ * @nic_cq_user_pi: producer index of the NIC CQ user buffer.
+ * @nic_cq_user_ci: consumer index of the NIC CQ user buffer.
+ * @nic_cq_status: return status of the CQ.
+ * @nic_cq_mmap_size: size of the mmapped CQ buffer.
+ * @nic_pcs_fail_time_frame: time frame in seconds to count PCS link failures.
+ * @nic_pcs_fail_threshold: PCS link failures threshold to reset link.
+ * @nic_qpc_cache_inv_timeout: timeout for NIC QPC cache invalidation.
+ * @nic_phy_load_fw: true if the NIC PHY F/W should be loaded, false otherwise.
+ * @nic_phy_config_fw: true if the NIC PHY F/W should be configured, false
+ *                     otherwise. The NIC PHY F/W should be configured on ASIC
+ *                     only, as opposed to simulator/Palladium.
+ * @nic_cq_enable: true if NIC CQ is enabled, false otherwise.
+ * @nic_cq_mmap: true if NIC CQ is mmapped, false otherwise.
+ * @nic_use_fw_polarity: true if NIC should use polarity values from F/W,
+ *                       false if NIC should use hard coded values.
  * @multi_msi_mode: whether we are working in multi MSI single MSI mode.
  *                  Multi MSI is possible only with IOMMU enabled.
+ * @nic_in_reset: true if the NIC was marked as in reset, false otherwise. Used
+ *                to avoid an additional stopping of the NIC if a hard reset was
+ *                re-initiated.
  * @mmu_cache_inv_pi: PI for MMU cache invalidation flow. The H/W expects an
  *                    8-bit value so use u8.
+ * @nic_check_link: true if the PCS link should be checked periodically.
+ * @nic_cq_irq_enable: true if an interrupt was allocated for the NIC CQ.
+ * @nic_in_teardown: true if the NIC is in teardown (during device remove).
+ * @nic_phy_auto_neg_lpbk: true if the NIC PHY should support Autoneg in
+ *                         loopback mode.
+ * @nic_debugfs_reset: true if a device reset can be done from NIC debugfs.
  */
 struct gaudi_device {
 	int (*cpucp_info_get)(struct hl_device *hdev);
-
+	void (*nic_handle_rx)(struct gaudi_nic_device *gaudi_nic);
+	int (*nic_handle_tx)(struct gaudi_nic_device *gaudi_nic, void *data);
+	struct gaudi_nic_device		nic_devices[NIC_NUMBER_OF_PORTS];
+	struct gaudi_nic_macro		nic_macros[NIC_NUMBER_OF_MACROS];
+	struct gaudi_nic_tx_taps	nic_pam4_tx_taps[NIC_MAX_NUM_OF_LANES];
+	struct completion		nic_cq_comp;
+
+	spinlock_t			nic_cq_lock;
 	/* TODO: remove hw_queues_lock after moving to scheduler code */
 	spinlock_t			hw_queues_lock;
 	struct mutex			clk_gate_mutex;
 
+	struct mutex			nic_cq_user_lock;
+	struct mutex			nic_qp_err_lock;
+
+	struct hl_nic_cqe		*nic_cq_buf;
 	struct gaudi_internal_qman_info	internal_qmans[GAUDI_QUEUE_ID_SIZE];
 
 	u64				hbm_bar_cur_addr;
 	u64				max_freq_value;
+	u64				nic_mac_loopback;
+
+	atomic_t			nic_cq_user_new_cqes;
 
 	u32				events[GAUDI_EVENT_SIZE];
 	u32				events_stat[GAUDI_EVENT_SIZE];
 	u32				events_stat_aggregate[GAUDI_EVENT_SIZE];
 	u32				hw_cap_initialized;
+	u32				nic_cq_user_num_of_entries;
+	u32				nic_cq_user_pi;
+	u32				nic_cq_user_ci;
+	u32				nic_cq_status;
+	u32				nic_cq_mmap_size;
+	u32				nic_pcs_fail_time_frame;
+	u32				nic_pcs_fail_threshold;
+	u32				nic_qpc_cache_inv_timeout;
+	u8				nic_phy_load_fw;
+	u8				nic_phy_config_fw;
+	u8				nic_cq_enable;
+	u8				nic_cq_mmap;
+	u8				nic_use_fw_polarity;
 	u8				multi_msi_mode;
+	u8				nic_in_reset;
 	u8				mmu_cache_inv_pi;
+	u8				nic_check_link;
+	u8				nic_cq_irq_enable;
+	u8				nic_in_teardown;
+	u8				nic_phy_auto_neg_lpbk;
+	u8				nic_debugfs_reset;
 };
 
 void gaudi_init_security(struct hl_device *hdev);
@@ -296,4 +559,19 @@ int gaudi_debug_coresight(struct hl_device *hdev, void *data);
 void gaudi_halt_coresight(struct hl_device *hdev);
 int gaudi_get_clk_rate(struct hl_device *hdev, u32 *cur_clk, u32 *max_clk);
 
+/* NIC functions */
+
+int gaudi_nic_ports_init(struct hl_device *hdev);
+void gaudi_nic_ports_fini(struct hl_device *hdev);
+int gaudi_nic_hard_reset_prepare(struct hl_device *hdev);
+void gaudi_nic_stop(struct hl_device *hdev);
+void gaudi_nic_ports_reopen(struct hl_device *hdev);
+void gaudi_nic_ctx_fini(struct hl_ctx *ctx);
+irqreturn_t gaudi_nic_rx_irq_handler(int irq, void *arg);
+irqreturn_t gaudi_nic_cq_irq_handler(int irq, void *arg);
+netdev_tx_t gaudi_nic_handle_tx_pkt(struct gaudi_nic_device *gaudi_nic,
+					struct sk_buff *skb);
+int gaudi_nic_sw_init(struct hl_device *hdev);
+void gaudi_nic_sw_fini(struct hl_device *hdev);
+
 #endif /* GAUDIP_H_ */
diff --git a/drivers/misc/habanalabs/gaudi/gaudi_nic.c b/drivers/misc/habanalabs/gaudi/gaudi_nic.c
new file mode 100644
index 000000000000..df41de95ba58
--- /dev/null
+++ b/drivers/misc/habanalabs/gaudi/gaudi_nic.c
@@ -0,0 +1,2298 @@
+// SPDX-License-Identifier: GPL-2.0
+
+/*
+ * Copyright 2018-2020 HabanaLabs, Ltd.
+ * All Rights Reserved.
+ */
+
+#include "gaudi_nic.h"
+#include "../include/gaudi/asic_reg/gaudi_regs.h"
+#include "../include/hw_ip/mmu/mmu_general.h"
+#include "../include/hw_ip/nic/nic_general.h"
+#include <uapi/misc/habanalabs.h>
+
+#include <linux/vmalloc.h>
+#include <linux/etherdevice.h>
+#include <linux/pci.h>
+#include <linux/ipv6.h>
+#include <linux/if_vlan.h>
+#include <linux/io-64-nonatomic-lo-hi.h>
+
+#define HL_NIC_DEBUG 0
+
+/*
+ * enum link_status - link status
+ * @LINK_UP: PHY is ready and PCS has link.
+ * @PCS_DOWN: PCS has no link.
+ * @PHY_DOWN: PHY is not ready.
+ * @FAIL_RECONFIG: need to reconfigure the PHY due to PCS link failures.
+ * @FAULT_RECONFIG: need to reconfigure the PHY due to PCS link faults.
+ */
+enum link_status {
+	LINK_UP,
+	PCS_DOWN,
+	PHY_DOWN,
+	FAIL_RECONFIG,
+	FAULT_RECONFIG
+};
+
+#define HLS1_EXT_PORTS_MASK		0x302
+#define FW_LINK_TRAINING_CNT		200
+#define FW_TUNING_CNT			3000
+#define PCS_LINK_CNT			10
+#define PCS_FAIL_TIME_FRAME_SEC		(60 * 5) /* 5 minutes */
+#define PCS_FAIL_THRESHOLD		8
+#define PCS_FAULT_THRESHOLD		20
+#define PCS_LINK_RETRY_MSEC		20
+
+/* NIC_MAX_MTU equals 8K minus eth header */
+#define NIC_MAX_MTU	((1 << 13) - (ETH_HLEN + VLAN_HLEN + ETH_FCS_LEN))
+
+/* MAC configuration */
+#define MAC_CFG_MAC(addr, data)		\
+				mac_write(gaudi_nic, i, "mac", addr, data)
+#define MAC_CFG_MAC_CORE(addr, data)	\
+				mac_write(gaudi_nic, i, "mac_core", addr, data)
+#define MAC_CFG_XPCS(addr, data)	\
+				mac_write(gaudi_nic, i, "xpcs", addr, data)
+#define MAC_CFG_XPCS91(addr, data)	\
+				mac_write(gaudi_nic, i, "xpcs91", addr, data)
+
+static void qpc_cache_inv(struct gaudi_nic_device *gaudi_nic, bool is_req)
+{
+	struct hl_device *hdev = gaudi_nic->hdev;
+	struct gaudi_device *gaudi = hdev->asic_specific;
+	u64 inv_reg, status_reg, base;
+	u32 status, port = gaudi_nic->port;
+	int rc;
+
+	if (is_req) {
+		inv_reg = mmNIC0_QPC0_REQ_QPC_CACHE_INVALIDATE;
+		status_reg = mmNIC0_QPC0_REQ_QPC_CACHE_INV_STATUS;
+	} else {
+		inv_reg = mmNIC0_QPC0_RES_QPC_CACHE_INVALIDATE;
+		status_reg = mmNIC0_QPC0_RES_QPC_CACHE_INV_STATUS;
+	}
+
+	/* fix the address to the correct NIC */
+	base = NIC_CFG_BASE(port);
+	inv_reg += base;
+	status_reg += base;
+
+	WREG32(inv_reg, 1);
+	WREG32(inv_reg, 0);
+
+	/* no need to wait for the status in case of hard reset */
+	if (hdev->hard_reset_pending)
+		return;
+
+	rc = hl_poll_timeout(
+		hdev,
+		status_reg,
+		status,
+		status &
+			NIC0_QPC0_REQ_QPC_CACHE_INV_STATUS_INVALIDATE_DONE_MASK,
+		1000,
+		gaudi->nic_qpc_cache_inv_timeout);
+
+	if (rc)
+		dev_warn(hdev->dev,
+			"NIC %s QPC cache invalidation timeout, port: %d\n",
+			is_req ? "requester" : "responder", port);
+}
+
+static void eth_start_stop(struct gaudi_nic_device *gaudi_nic, bool is_start)
+{
+	struct hl_device *hdev = gaudi_nic->hdev;
+	struct gaudi_device *gaudi = hdev->asic_specific;
+	struct qpc_requester req_qp;
+	struct qpc_responder res_qp;
+	u64 *qpc_addr, req_qpc_addr, res_qpc_addr;
+	u32 port = gaudi_nic->port;
+	int i;
+
+	/*
+	 * Due to H/W bug, odd ports cannot generate MSI interrupts.
+	 * Hence they generate wire interrupts and CPU-CP converts them to MSI
+	 * interrupts. To prevent CPU-CP from generating MSI interrupts after
+	 * the odd port goes down, clear the interrupt enable bit here.
+	 */
+	if (!is_start && !hdev->nic_rx_poll && (port & 1))
+		NIC_RMWREG32(mmNIC0_QPC0_INTERRUPT_EN, 0,
+				NIC0_QPC0_INTERRUPT_EN_INTERRUPT4_WIRE_EN_MASK);
+
+	/* ETH uses QP 0 */
+	req_qpc_addr = REQ_QPC_ADDR(port, 0);
+
+	memset(&req_qp, 0, sizeof(req_qp));
+	REQ_QPC_SET_TRANSPORT_SERVICE(req_qp, TS_RAW);
+	REQ_QPC_SET_LAST_IDX(req_qp, (WQ_BUFFER_SIZE - 1));
+	/*
+	 * See comment regarding the NIC_HW_MAX_QP_NUM value in the section of
+	 * TXE configuration in config_port_hw().
+	 */
+	REQ_QPC_SET_WQ_BASE_ADDR(req_qp, NIC_HW_MAX_QP_NUM);
+	REQ_QPC_SET_VALID(req_qp, (u64) is_start);
+	REQ_QPC_SET_SECURED(req_qp, SECURED);
+	REQ_QPC_SET_PORT(req_qp, 0);
+
+	qpc_addr = (u64 *) &req_qp;
+	for (i = 0 ; i < (sizeof(req_qp) / sizeof(u64)) ; i++)
+		writeq(qpc_addr[i], hdev->pcie_bar[HBM_BAR_ID] +
+			((req_qpc_addr + i * 8) - gaudi->hbm_bar_cur_addr));
+
+	readq(hdev->pcie_bar[HBM_BAR_ID]);
+
+	qpc_cache_inv(gaudi_nic, true);
+
+	/* ETH uses QP 0 */
+	res_qpc_addr = RES_QPC_ADDR(port, 0);
+
+	memset(&res_qp, 0, sizeof(res_qp));
+	RES_QPC_SET_TRANSPORT_SERVICE(res_qp, TS_RAW);
+	RES_QPC_SET_LOG_BUF_SIZE_MASK(res_qp, QPC_RES_LOG_BUF_SIZE_MASK);
+	RES_QPC_SET_VALID(res_qp, (u64) is_start);
+	RES_QPC_SET_SECURED(res_qp, SECURED);
+	RES_QPC_SET_PORT(res_qp, 0);
+
+	qpc_addr = (u64 *) &res_qp;
+	for (i = 0 ; i < (sizeof(res_qp) / sizeof(u64)) ; i++)
+		writeq(qpc_addr[i], hdev->pcie_bar[HBM_BAR_ID] +
+			((res_qpc_addr + i * 8) - gaudi->hbm_bar_cur_addr));
+
+	readq(hdev->pcie_bar[HBM_BAR_ID]);
+
+	qpc_cache_inv(gaudi_nic, false);
+}
+
+static u32 mac_addr_convert(int mac, char *cfg_type, u32 addr)
+{
+	if (!strcmp(cfg_type, "xpcs")) {
+		if (addr >= 200 && addr <= 219)
+			addr = addr - 200 + 54;
+		else if (addr >= 400 && addr <= 419)
+			addr = addr - 400 + 74;
+		else if (addr >= (1 << 15))
+			addr = addr - (1 << 15) + 95;
+
+		addr = addr * 4 + mac * (1 << 12);
+	} else if (!strcmp(cfg_type, "mac")) {
+		addr = addr + mac * (1 << 12) + (1 << 10);
+	} else if (!strcmp(cfg_type, "mac_core")) {
+		addr = addr + (1 << 15);
+	} else if (!strcmp(cfg_type, "xpcs91")) {
+		addr = addr * 4 + (1 << 11) * 10;
+	}
+
+	return addr + 0xCC0000;
+}
+
+static void mac_write(struct gaudi_nic_device *gaudi_nic, int mac,
+			char *cfg_type, u32 addr, u32 data)
+{
+	struct hl_device *hdev = gaudi_nic->hdev;
+
+	addr = mac_addr_convert(mac, cfg_type, addr);
+
+	NIC_MACRO_WREG32(addr, data);
+}
+
+u32 gaudi_nic_mac_read(struct gaudi_nic_device *gaudi_nic, int mac,
+			char *cfg_type, u32 addr)
+{
+	struct hl_device *hdev = gaudi_nic->hdev;
+
+	addr = mac_addr_convert(mac, cfg_type, addr);
+
+	return NIC_MACRO_RREG32(addr);
+}
+
+static void config_port_hw(struct gaudi_nic_device *gaudi_nic, u64 mac_addr)
+{
+	struct hl_device *hdev = gaudi_nic->hdev;
+	struct gaudi_device *gaudi = hdev->asic_specific;
+	u64 swq_base_addr = SWQ_BASE_ADDR + gaudi_nic->port * SWQ_BASE_SIZE;
+	u64 tx_swq_base, cq_mem_addr = gaudi_nic->cq_mem_device_va;
+	u64 req_qpc_base_addr = REQ_QPC_ADDR(gaudi_nic->port, 0),
+		res_qpc_base_addr = RES_QPC_ADDR(gaudi_nic->port, 0),
+		txs_addr, cq_msi_addr;
+	u32 port = gaudi_nic->port, rx_mem_addr_lo, rx_mem_addr_hi,
+		txs_fence_idx, txs_pi, txs_ci, txs_tail, txs_head,
+		txs_timeout_31_0, timeout_47_32, prio, txs_port, rl_en_log_time,
+		txs_schedq;
+	int i;
+
+	if (gaudi->multi_msi_mode) {
+		gaudi_nic->rx_msi_addr = RX_MSI_ADDRESS + port * 4;
+		cq_msi_addr = CQ_MSI_ADDRESS;
+	} else {
+		gaudi_nic->rx_msi_addr = cq_msi_addr = mmPCIE_MSI_INTR_0;
+	}
+
+	/* TXS Configuration */
+	txs_addr = TXS_BASE_ADDR + port * TXS_BASE_SIZE;
+
+	/* Timer free list */
+	for (i = 0 ; i < TXS_FREE_NUM_ENTRIES ; i++) {
+		writel(TXS_GRANULARITY + i, hdev->pcie_bar[HBM_BAR_ID] +
+			((txs_addr + TXS_FREE_OFFS + i * 4) -
+				gaudi->hbm_bar_cur_addr));
+	}
+
+	/* Perform read to flush the writes */
+	readq(hdev->pcie_bar[HBM_BAR_ID]);
+
+	NIC_WREG32(mmNIC0_TXS0_BASE_ADDRESS_49_18,
+				(txs_addr + TXS_FIFO_OFFS) >> 18);
+	NIC_WREG32(mmNIC0_TXS0_BASE_ADDRESS_17_7,
+				((txs_addr + TXS_FIFO_OFFS) >> 7) & 0x7FF);
+	NIC_WREG32(mmNIC0_TXS0_FREE_LIST_PUSH_MASK_EN, 1);
+
+	txs_fence_idx = 0;
+	txs_pi = 0;
+	txs_ci = 0;
+	txs_tail = 0;
+	txs_head = 0;
+	txs_timeout_31_0 = 0;
+	timeout_47_32 = 0;
+	prio = 0;
+	txs_port = 0;
+	rl_en_log_time = 0;
+	txs_schedq = (timeout_47_32 & 0xFFFF) | ((prio & 0x3) << 16) |
+			((txs_port & 1) << 18) |
+			((rl_en_log_time & 0x3F) << 19);
+
+	for (i = 0 ; i < TXS_SCHEDQ ; i++) {
+		txs_tail = txs_head = i;
+		NIC_WREG32(mmNIC0_TXS0_SCHEDQ_UPDATE_DESC_31_0, txs_fence_idx);
+		NIC_WREG32(mmNIC0_TXS0_SCHEDQ_UPDATE_DESC_63_32, txs_pi);
+		NIC_WREG32(mmNIC0_TXS0_SCHEDQ_UPDATE_DESC_95_64, txs_ci);
+		NIC_WREG32(mmNIC0_TXS0_SCHEDQ_UPDATE_DESC_127_96, txs_tail);
+		NIC_WREG32(mmNIC0_TXS0_SCHEDQ_UPDATE_DESC_159_128, txs_head);
+		NIC_WREG32(mmNIC0_TXS0_SCHEDQ_UPDATE_DESC_191_160,
+							txs_timeout_31_0);
+		NIC_WREG32(mmNIC0_TXS0_SCHEDQ_UPDATE_DESC_217_192, txs_schedq);
+		NIC_WREG32(mmNIC0_TXS0_SCHEDQ_UPDATE_FIFO, i);
+		NIC_WREG32(mmNIC0_TXS0_SCHEDQ_UPDATE_EN, 1);
+	}
+
+	NIC_WREG32(mmNIC0_TXS0_TICK_WRAP, 100);
+	NIC_WREG32(mmNIC0_TXS0_FIRST_SCHEDQ_ID,
+			0 << NIC0_TXS0_FIRST_SCHEDQ_ID_R0_SHIFT |
+			64 << NIC0_TXS0_FIRST_SCHEDQ_ID_R1_SHIFT |
+			128 << NIC0_TXS0_FIRST_SCHEDQ_ID_R2_SHIFT |
+			192 << NIC0_TXS0_FIRST_SCHEDQ_ID_R3_SHIFT);
+	NIC_WREG32(mmNIC0_TXS0_LAST_SCHEDQ_ID,
+			63 << NIC0_TXS0_FIRST_SCHEDQ_ID_R0_SHIFT |
+			127 << NIC0_TXS0_FIRST_SCHEDQ_ID_R1_SHIFT |
+			191 << NIC0_TXS0_FIRST_SCHEDQ_ID_R2_SHIFT |
+			155 << NIC0_TXS0_FIRST_SCHEDQ_ID_R3_SHIFT);
+	NIC_WREG32(mmNIC0_TXS0_SCAN_TIME_COMPARE_0, 4);
+	NIC_WREG32(mmNIC0_TXS0_SCAN_TIME_COMPARE_1, 0);
+	NIC_WREG32(mmNIC0_TXS0_TMR_SCAN_EN, 1);
+
+	NIC_WREG32(mmNIC0_TXS0_BASE_ADDRESS_FREE_LIST_49_32,
+				(txs_addr + TXS_FREE_OFFS) >> 32);
+	NIC_WREG32(mmNIC0_TXS0_BASE_ADDRESS_FREE_LIST_31_0,
+				(txs_addr + TXS_FREE_OFFS) & 0xFFFFFFFF);
+
+	NIC_WREG32(mmNIC0_TXS0_LIST_MASK,
+			~(0xFFFFFFFF << (ilog2(TXS_FREE_NUM_ENTRIES) - 5)));
+	NIC_WREG32(mmNIC0_TXS0_PRODUCER_UPDATE, TXS_FREE_NUM_ENTRIES);
+	NIC_WREG32(mmNIC0_TXS0_PRODUCER_UPDATE_EN, 1);
+	NIC_WREG32(mmNIC0_TXS0_PRODUCER_UPDATE_EN, 0);
+	NIC_WREG32(mmNIC0_TXS0_LIST_MEM_READ_MASK, 0);
+	NIC_WREG32(mmNIC0_TXS0_PUSH_LOCK_EN, 1);
+
+	/* Consider burst size */
+	NIC_WREG32(mmNIC0_TXS0_IGNORE_BURST_EN, 0);
+
+	/* TXE Configuration */
+
+	/*
+	 * We want to separate the driver WQ from the user WQs.
+	 * Since the NIC supports 4 different WQ base addresses, base address 0
+	 * will be used by the user and base address 1 by the driver.
+	 * The WQ base address index is inferred by two bits that are taken from
+	 * QPC.WQ_BASE_ADDR and are configurable by SQ_BASE_ADDRESS_SEL.
+	 * Since we support up to NIC_HW_MAX_QP_NUM user QPs and the single
+	 * driver QP is located after them, we configure the driver
+	 * QPC.WQ_BASE_ADDR to the value NIC_HW_MAX_QP_NUM, and
+	 * SQ_BASE_ADDRESS_SEL to have the right shift value so the driver will
+	 * indeed use base address 1.
+	 */
+
+	/*
+	 * Need to subtract the size of the user WQs because the driver uses WQ
+	 * base address 1.
+	 */
+	tx_swq_base = swq_base_addr -
+			(1 << (WQ_BUFFER_LOG_SIZE - 2)) * NIC_HW_MAX_QP_NUM *
+				DEVICE_CACHE_LINE_SIZE;
+
+	NIC_WREG32(mmNIC0_TXE0_SQ_BASE_ADDRESS_49_32_1,
+			(tx_swq_base >> 32) & 0x3FFFFF);
+	NIC_WREG32(mmNIC0_TXE0_SQ_BASE_ADDRESS_31_0_1,
+			tx_swq_base & 0xFFFFFFFF);
+
+	/*
+	 * This register should contain the value of the shift that the H/W will
+	 * apply on QPC.WQ_BASE_ADDR in order to get the WQ base address index.
+	 * The driver uses WQ base address 1 so we need to trim the trailing
+	 * zero bits.
+	 */
+	NIC_WREG32(mmNIC0_TXE0_SQ_BASE_ADDRESS_SEL, ffs(NIC_HW_MAX_QP_NUM) - 1);
+
+	NIC_WREG32(mmNIC0_TXE0_LOG_MAX_WQ_SIZE_1, WQ_BUFFER_LOG_SIZE - 2);
+	NIC_WREG32(mmNIC0_TXE0_PORT0_MAC_CFG_47_32, (mac_addr >> 32) & 0xFFFF);
+	NIC_WREG32(mmNIC0_TXE0_PORT0_MAC_CFG_31_0, mac_addr & 0xFFFFFFFF);
+	NIC_WREG32(mmNIC0_TXE0_PORT1_MAC_CFG_47_32, (mac_addr >> 32) & 0xFFFF);
+	NIC_WREG32(mmNIC0_TXE0_PORT1_MAC_CFG_31_0, mac_addr & 0xFFFFFFFF);
+
+	/* Since the user WQs are mapped via MMU by the user, their AXI_USER
+	 * registers are set without MMU bypass and with the user ASID.
+	 * Because these configuration registers are shared between the user WQs
+	 * and the ETH Tx WQ, the latter can't be mapped via MMU as we need to
+	 * configure the LKD ASID for that.
+	 * In addition, the ETH Tx WQ is secured so the user shouldn't be able
+	 * to access it. Hence we place the ETH Tx WQ on HBM in the LKD reserved
+	 * section.
+	 */
+	NIC_WREG32(mmNIC0_TXE0_WQE_FETCH_AXI_USER, 1);
+	/*
+	 * The Tx data is placed on HBM. Hence configure it without MMU bypass
+	 * and with the user ASID to avoid any successful access to the host
+	 */
+	NIC_WREG32(mmNIC0_TXE0_DATA_FETCH_AXI_USER, 1);
+	NIC_WREG32(mmNIC0_TXE0_INTERRUPT_MASK, 3);
+
+	/* Make sure data fetch can never be privileged */
+	NIC_WREG32(mmNIC0_TXE0_DATA_FETCH_AXI_PROT, 0x80);
+	/* Make sure WQE fetch can never be privileged */
+	NIC_WREG32(mmNIC0_TXE0_WQE_FETCH_AXI_PROT, 0x80);
+
+	/* QPC Configuration */
+	NIC_WREG32(mmNIC0_QPC0_REQ_BASE_ADDRESS_49_18,
+			(req_qpc_base_addr >> 18) & 0xFFFFFFFF);
+	NIC_WREG32(mmNIC0_QPC0_REQ_BASE_ADDRESS_17_7,
+			(req_qpc_base_addr >> 7) & 0x7FF);
+	NIC_WREG32(mmNIC0_QPC0_RES_BASE_ADDRESS_49_18,
+			(res_qpc_base_addr >> 18) & 0xFFFFFFFF);
+	NIC_WREG32(mmNIC0_QPC0_RES_BASE_ADDRESS_17_7,
+			(res_qpc_base_addr >> 7) & 0x7FF);
+	NIC_WREG32(mmNIC0_QPC0_RES_QPC_CACHE_INVALIDATE, 1);
+	NIC_WREG32(mmNIC0_QPC0_REQ_QPC_CACHE_INVALIDATE, 1);
+	NIC_WREG32(mmNIC0_QPC0_RES_QPC_CACHE_INVALIDATE, 0);
+	NIC_WREG32(mmNIC0_QPC0_REQ_QPC_CACHE_INVALIDATE, 0);
+	NIC_WREG32(mmNIC0_QPC0_INTERRUPT_BASE_4, gaudi_nic->rx_msi_addr);
+	NIC_WREG32(mmNIC0_QPC0_INTERRUPT_DATA_4, 1);
+	NIC_WREG32(mmNIC0_QPC0_RES_RING0_CFG, RAW_QPN);
+	/* Interrupt each packet */
+	NIC_WREG32(mmNIC0_QPC0_INTERRUPT_CFG, 0x1FF);
+	NIC_WREG32(mmNIC0_QPC0_INTERRUPT_CAUSE, 0);
+	/* enable only the QP error interrupt, other interrupts are unused */
+	NIC_WREG32(mmNIC0_QPC0_INTERRUPT_MASK, 0x110);
+	NIC_WREG32(mmNIC0_QPC0_AXI_PROT, 0); /* secured */
+
+	NIC_WREG32(mmNIC0_QPC0_ERR_FIFO_BASE_ADDR_49_18,
+			(gaudi_nic->qp_err_mem_dma >> 18) & 0xFFFFFFFF);
+	NIC_WREG32(mmNIC0_QPC0_ERR_FIFO_BASE_ADDR_17_7,
+			gaudi_nic->qp_err_mem_dma & 0x3FF80);
+	NIC_WREG32(mmNIC0_QPC0_ERR_FIFO_PRODUCER_INDEX, 0);
+	NIC_WREG32(mmNIC0_QPC0_ERR_FIFO_CONSUMER_INDEX, 0);
+	NIC_WREG32(mmNIC0_QPC0_ERR_FIFO_WRITE_INDEX, 0);
+	NIC_WREG32(mmNIC0_QPC0_ERR_FIFO_MASK, QP_ERR_BUF_SIZE - 1);
+	/* The error FIFO is unmapped, hence the bypass */
+	NIC_WREG32(mmNIC0_QPC0_AXI_USER, 0x400);
+	NIC_WREG32(mmNIC0_QPC0_RETRY_COUNT_MAX, 0xFEFE);
+
+	/*
+	 * Generate wire interrupt in case of a QP error.
+	 * CPU-CP converts it to event.
+	 */
+	NIC_WREG32(mmNIC0_QPC0_INTERRUPT_EN,
+		1 << NIC0_QPC0_INTERRUPT_EN_INTERRUPT8_WIRE_EN_SHIFT);
+
+	/* RXE Configuration */
+	rx_mem_addr_lo = lower_32_bits(gaudi_nic->rx_mem_dma);
+	/* discard packets above the max size */
+	rx_mem_addr_hi = (upper_32_bits(gaudi_nic->rx_mem_dma) <<
+			NIC0_RXE0_RAW_BASE_HI_P1_RAW_BASE_ADDR_HI_P1_SHIFT) |
+		(ilog2(NIC_MAX_PKT_SIZE) <<
+			NIC0_RXE0_RAW_BASE_HI_P1_LOG_RAW_ENTRY_SIZE_P1_SHIFT);
+
+	NIC_WREG32(mmNIC0_RXE0_ARUSER_HBW_10_0, 1);
+	NIC_WREG32(mmNIC0_RXE0_ARUSER_HBW_31_11, 0);
+
+	/* Make sure LBW write access (for SM) can never be privileged */
+	NIC_WREG32(mmNIC0_RXE0_AWPROT_LBW, 0x2);
+
+	/* Make sure HBW read access (for WQE) is always unsecured */
+	NIC_WREG32(mmNIC0_RXE0_ARPROT_HBW, 0x222);
+
+	NIC_WREG32(mmNIC0_RXE0_RAW_QPN_P0_0, RAW_QPN);
+	NIC_WREG32(mmNIC0_RXE0_RAW_QPN_P0_1, RAW_QPN);
+	NIC_WREG32(mmNIC0_RXE0_RAW_QPN_P1_0, RAW_QPN);
+	NIC_WREG32(mmNIC0_RXE0_RAW_QPN_P1_1, RAW_QPN);
+	NIC_WREG32(mmNIC0_RXE0_RAW_QPN_P2_0, RAW_QPN);
+	NIC_WREG32(mmNIC0_RXE0_RAW_QPN_P2_1, RAW_QPN);
+	NIC_WREG32(mmNIC0_RXE0_RAW_QPN_P3_0, RAW_QPN);
+	NIC_WREG32(mmNIC0_RXE0_RAW_QPN_P3_1, RAW_QPN);
+	NIC_WREG32(mmNIC0_RXE0_RAW_BASE_LO_P0_0, rx_mem_addr_lo);
+	NIC_WREG32(mmNIC0_RXE0_RAW_BASE_LO_P0_1, rx_mem_addr_lo);
+	NIC_WREG32(mmNIC0_RXE0_RAW_BASE_HI_P0_0, rx_mem_addr_hi);
+	NIC_WREG32(mmNIC0_RXE0_RAW_BASE_HI_P0_1, rx_mem_addr_hi);
+
+	NIC_WREG32(mmNIC0_RXE0_RAW_BASE_LO_P1_0, rx_mem_addr_lo);
+	NIC_WREG32(mmNIC0_RXE0_RAW_BASE_LO_P1_1, rx_mem_addr_lo);
+	NIC_WREG32(mmNIC0_RXE0_RAW_BASE_HI_P1_0, rx_mem_addr_hi);
+	NIC_WREG32(mmNIC0_RXE0_RAW_BASE_HI_P1_1, rx_mem_addr_hi);
+
+	NIC_WREG32(mmNIC0_RXE0_RAW_BASE_LO_P2_0, rx_mem_addr_lo);
+	NIC_WREG32(mmNIC0_RXE0_RAW_BASE_LO_P2_1, rx_mem_addr_lo);
+	NIC_WREG32(mmNIC0_RXE0_RAW_BASE_HI_P2_0, rx_mem_addr_hi);
+	NIC_WREG32(mmNIC0_RXE0_RAW_BASE_HI_P2_1, rx_mem_addr_hi);
+
+	NIC_WREG32(mmNIC0_RXE0_RAW_BASE_LO_P3_0, rx_mem_addr_lo);
+	NIC_WREG32(mmNIC0_RXE0_RAW_BASE_LO_P3_1, rx_mem_addr_lo);
+	NIC_WREG32(mmNIC0_RXE0_RAW_BASE_HI_P3_0, rx_mem_addr_hi);
+	NIC_WREG32(mmNIC0_RXE0_RAW_BASE_HI_P3_1, rx_mem_addr_hi);
+
+	/*
+	 * See the comment for mmNIC0_TXE0_SQ_BASE_ADDRESS_SEL. The same applies
+	 * for the Rx.
+	 */
+	NIC_WREG32(mmNIC0_RXE0_WQ_BASE_WINDOW_SEL, ffs(NIC_HW_MAX_QP_NUM) - 1);
+
+	NIC_WREG32(mmNIC0_RXE0_PKT_DROP,
+			(0 << NIC0_RXE0_PKT_DROP_ERR_QP_INVALID_SHIFT) |
+			(1 << NIC0_RXE0_PKT_DROP_ERR_TS_MISMATCH_SHIFT) |
+			(0 << NIC0_RXE0_PKT_DROP_ERR_CS_INVALID_SHIFT) |
+			(0 << NIC0_RXE0_PKT_DROP_ERR_REQ_PSN_INVALID_SHIFT) |
+			(1 << NIC0_RXE0_PKT_DROP_ERR_RES_RKEY_INVALID_SHIFT) |
+			(0 << NIC0_RXE0_PKT_DROP_ERR_RES_RESYNC_INVALID_SHIFT) |
+			/* H/W WA for check priority order */
+			(0 << NIC0_RXE0_PKT_DROP_ERR_INV_OPCODE_SHIFT) |
+			(0 << NIC0_RXE0_PKT_DROP_ERR_INV_SYNDROME_SHIFT) |
+			(0 << NIC0_RXE0_PKT_DROP_ERR_INV_RAW_SIZE_SHIFT));
+
+	/* CQ */
+	NIC_WREG32(mmNIC0_RXE0_CQ_BASE_ADDR_31_7, cq_mem_addr &
+					NIC0_RXE0_CQ_BASE_ADDR_31_7_R_MASK);
+	NIC_WREG32(mmNIC0_RXE0_CA_BASE_ADDR_49_32, cq_mem_addr >> 32);
+	NIC_WREG32(mmNIC0_RXE0_CQ_WRITE_INDEX, 0);
+	NIC_WREG32(mmNIC0_RXE0_CQ_PRODUCER_INDEX, 0);
+	NIC_WREG32(mmNIC0_RXE0_CQ_CONSUMER_INDEX, 0);
+	NIC_WREG32(mmNIC0_RXE0_CQ_CFG0,
+			(1 << NIC0_RXE0_CQ_CFG0_ENABLE_SHIFT) |
+			(1 << NIC0_RXE0_CQ_CFG0_INTERRUPT_MASK_SHIFT) |
+			(8 << NIC0_RXE0_CQ_CFG0_CREDIT_SHIFT) |
+			(1 << NIC0_RXE0_CQ_CFG0_WRAPAROUND_EN_SHIFT) |
+			(1 << NIC0_RXE0_CQ_CFG0_SOB_CQ_MUTEX_SHIFT) |
+			(24 << NIC0_RXE0_CQ_CFG0_CQ_SELECT_SHIFT));
+	NIC_WREG32(mmNIC0_RXE0_CQ_MASK, CQ_PORT_BUF_LEN - 1);
+	/* CQ overrun interrupt only */
+	NIC_WREG32(mmNIC0_RXE0_CQ_MSI_ADDR_1, cq_msi_addr);
+	NIC_WREG32(mmNIC0_RXE0_CQ_MSI_DATA_1, 1);
+	NIC_WREG32(mmNIC0_RXE0_MSI_CASUE_MASK, 2);
+	NIC_WREG32(mmNIC0_RXE0_MSI_CAUSE, 0);
+
+	/*
+	 * Due to a H/W bug, odd ports cannot generate MSI interrupts.
+	 * Hence they generate wire interrupts and CPU-CP converts them to MSI
+	 * interrupts.
+	 */
+	if (!hdev->nic_rx_poll && (port & 1))
+		NIC_RMWREG32(mmNIC0_QPC0_INTERRUPT_EN, 1,
+			NIC0_QPC0_INTERRUPT_EN_INTERRUPT4_WIRE_EN_MASK);
+	else
+		NIC_RMWREG32(mmNIC0_QPC0_INTERRUPT_EN, 1,
+			NIC0_QPC0_INTERRUPT_EN_INTERRUPT4_MSI_EN_MASK);
+
+	/* MAC filtering */
+	if (port & 1) {
+		NIC_MACRO_WREG32(mmNIC0_RXB_TS_RC_MAC_31_0_2,
+					mac_addr & 0xFFFFFFFF);
+		NIC_MACRO_WREG32(mmNIC0_RXB_TS_RC_MAC_31_0_3,
+					mac_addr & 0xFFFFFFFF);
+
+		NIC_MACRO_WREG32(mmNIC0_RXB_TS_RC_MAC_47_32_2,
+					(mac_addr >> 32) & 0xFFFF);
+		NIC_MACRO_WREG32(mmNIC0_RXB_TS_RC_MAC_47_32_3,
+					(mac_addr >> 32) & 0xFFFF);
+	} else {
+		NIC_MACRO_WREG32(mmNIC0_RXB_TS_RC_MAC_31_0_0,
+					mac_addr & 0xFFFFFFFF);
+		NIC_MACRO_WREG32(mmNIC0_RXB_TS_RC_MAC_31_0_1,
+					mac_addr & 0xFFFFFFFF);
+
+		NIC_MACRO_WREG32(mmNIC0_RXB_TS_RC_MAC_47_32_0,
+					(mac_addr >> 32) & 0xFFFF);
+		NIC_MACRO_WREG32(mmNIC0_RXB_TS_RC_MAC_47_32_1,
+					(mac_addr >> 32) & 0xFFFF);
+	}
+
+	for (i = NIC_MAC_LANES_START ; i < NIC_MAC_NUM_OF_LANES ; i++) {
+		if (!(gaudi_nic->fw_tuning_mask & BIT(i)))
+			continue;
+
+		MAC_CFG_XPCS(0, gaudi_nic->mac_loopback ? 0xC000 : 0x8000);
+	}
+
+	gaudi_nic_set_pfc(gaudi_nic);
+}
+
+void gaudi_nic_set_pfc(struct gaudi_nic_device *gaudi_nic)
+{
+	int i;
+
+	for (i = NIC_MAC_LANES_START ; i < NIC_MAC_NUM_OF_LANES ; i++) {
+		if (!(gaudi_nic->fw_tuning_mask & BIT(i)))
+			continue;
+
+		MAC_CFG_MAC(0x8, gaudi_nic->pfc_enable ? 0x80813 : 0x2913);
+	}
+}
+
+static void config_port_mac(struct gaudi_nic_device *gaudi_nic)
+{
+	struct hl_device *hdev = gaudi_nic->hdev;
+	u32 port = gaudi_nic->port, speed = gaudi_nic->speed;
+	int i;
+
+	for (i = NIC_MAC_LANES_START ; i < NIC_MAC_NUM_OF_LANES ; i++) {
+		/* H/W WA for error length */
+		MAC_CFG_MAC(0x14, 8192);
+
+		/* Disable FC FEC */
+		MAC_CFG_MAC_CORE(0x10, 0);
+
+		MAC_CFG_MAC(0x20, 4);
+		MAC_CFG_MAC(0x1C, 4);
+
+		switch (speed) {
+		case SPEED_10000:
+			MAC_CFG_XPCS(0x8010, 3);
+			break;
+		case SPEED_25000:
+			MAC_CFG_XPCS(0x8002, 0x4FFF);
+			MAC_CFG_XPCS(0x8010, 5);
+			MAC_CFG_XPCS(0x8008, 0x68C1);
+			MAC_CFG_XPCS(0x8009, 0x21);
+			MAC_CFG_XPCS(0x800A, 0xC4F0);
+			MAC_CFG_XPCS(0x800B, 0xE6);
+			MAC_CFG_XPCS(0x800C, 0x65C5);
+			MAC_CFG_XPCS(0x800D, 0x9B);
+			MAC_CFG_XPCS(0x800E, 0x79A2);
+			MAC_CFG_XPCS(0x800F, 0x3D);
+			break;
+		case SPEED_50000:
+			MAC_CFG_XPCS(0x8002, 0x4FFF);
+			MAC_CFG_XPCS(0x8010, 0);
+			MAC_CFG_XPCS(0x8008, 0x7690);
+			MAC_CFG_XPCS(0x8009, 0x47);
+			MAC_CFG_XPCS(0x800A, 0xC4F0);
+			MAC_CFG_XPCS(0x800B, 0xE6);
+			MAC_CFG_XPCS(0x800C, 0x65C5);
+			MAC_CFG_XPCS(0x800D, 0x9B);
+			MAC_CFG_XPCS(0x800E, 0x79A2);
+			MAC_CFG_XPCS(0x800F, 0x3D);
+			break;
+		case SPEED_100000:
+			MAC_CFG_XPCS(0x8002, 0x3FFF);
+			MAC_CFG_XPCS(0x8010, 0);
+			MAC_CFG_XPCS(0x8008, 0x68C1);
+			MAC_CFG_XPCS(0x8009, 0x21);
+			MAC_CFG_XPCS(0x800A, 0x719D);
+			MAC_CFG_XPCS(0x800B, 0x8E);
+			MAC_CFG_XPCS(0x800C, 0x4B59);
+			MAC_CFG_XPCS(0x800D, 0xE8);
+			MAC_CFG_XPCS(0x800E, 0x954D);
+			MAC_CFG_XPCS(0x800F, 0x7B);
+			MAC_CFG_XPCS(0x8048, 0x07F5);
+			MAC_CFG_XPCS(0x8049, 0x09);
+			MAC_CFG_XPCS(0x804A, 0x14DD);
+			MAC_CFG_XPCS(0x804B, 0xC2);
+			MAC_CFG_XPCS(0x804C, 0x4A9A);
+			MAC_CFG_XPCS(0x804D, 0x26);
+			MAC_CFG_XPCS(0x804E, 0x457B);
+			MAC_CFG_XPCS(0x804F, 0x66);
+			MAC_CFG_XPCS(0x8050, 0x24A0);
+			MAC_CFG_XPCS(0x8051, 0x76);
+			MAC_CFG_XPCS(0x8052, 0xC968);
+			MAC_CFG_XPCS(0x8053, 0xFB);
+			MAC_CFG_XPCS(0x8054, 0x6CFD);
+			MAC_CFG_XPCS(0x8055, 0x99);
+			MAC_CFG_XPCS(0x8056, 0x91B9);
+			MAC_CFG_XPCS(0x8057, 0x55);
+			MAC_CFG_XPCS(0x8058, 0xB95C);
+			MAC_CFG_XPCS(0x8059, 0xB2);
+			MAC_CFG_XPCS(0x805A, 0xF81A);
+			MAC_CFG_XPCS(0x805B, 0xBD);
+			MAC_CFG_XPCS(0x805C, 0xC783);
+			MAC_CFG_XPCS(0x805D, 0xCA);
+			MAC_CFG_XPCS(0x805E, 0x3635);
+			MAC_CFG_XPCS(0x805F, 0xCD);
+			MAC_CFG_XPCS(0x8060, 0x31C4);
+			MAC_CFG_XPCS(0x8061, 0x4C);
+			MAC_CFG_XPCS(0x8062, 0xD6AD);
+			MAC_CFG_XPCS(0x8063, 0xB7);
+			MAC_CFG_XPCS(0x8064, 0x665F);
+			MAC_CFG_XPCS(0x8065, 0x2A);
+			MAC_CFG_XPCS(0x8066, 0xF0C0);
+			MAC_CFG_XPCS(0x8067, 0xE5);
+			break;
+		default:
+			dev_err(hdev->dev,
+				"unknown NIC port %d speed %dMb/s, cannot configure MAC XPCS\n",
+				port, speed);
+			break;
+		}
+	}
+
+	switch (speed) {
+	case SPEED_10000:
+		MAC_CFG_MAC_CORE(0, 0xF0FF00);
+		MAC_CFG_MAC_CORE(0x1C, 0);
+		MAC_CFG_MAC_CORE(0x10, 0);
+		break;
+	case SPEED_25000:
+		MAC_CFG_MAC_CORE(0, 0xF0FF00);
+		MAC_CFG_MAC_CORE(0x18, 0x60F);
+		MAC_CFG_MAC_CORE(0x1C, 0);
+		MAC_CFG_MAC_CORE(0x10, 0);
+		break;
+	case SPEED_50000:
+		MAC_CFG_MAC_CORE(0x18, 0xFF);
+		MAC_CFG_MAC_CORE(0, 0xF0FFF0);
+		MAC_CFG_MAC_CORE(0x1C, 0);
+		MAC_CFG_XPCS91(0, 0x400);
+		MAC_CFG_XPCS91(0x8, 0x400);
+		MAC_CFG_XPCS91(0x10, 0x400);
+		MAC_CFG_XPCS91(0x18, 0x400);
+		break;
+	case SPEED_100000:
+		if (gaudi_nic->nic_macro->num_of_lanes == NIC_LANES_4) {
+			MAC_CFG_MAC_CORE(0, 0xF0FF00);
+			MAC_CFG_MAC_CORE(0x18, 0x0F);
+		} else {
+			MAC_CFG_MAC_CORE(0x18, 0xFF);
+		}
+		break;
+	default:
+		dev_err(hdev->dev,
+			"unknown NIC port %d speed %dMb/s, cannot configure MAC CORE\n",
+			port, speed);
+		break;
+	}
+}
+
+static int hw_config(struct gaudi_nic_device *gaudi_nic)
+{
+	struct hl_device *hdev = gaudi_nic->hdev;
+	struct gaudi_device *gaudi = hdev->asic_specific;
+	u64 mac_addr = 0, tmr_addr;
+	u32 port = gaudi_nic->port, data_rate, speed = gaudi_nic->speed;
+	int i;
+
+	for (i = 0 ; i < ETH_ALEN ; i++) {
+		mac_addr <<= 8;
+		mac_addr |= gaudi_nic->ndev->dev_addr[i];
+	}
+
+	switch (speed) {
+	case SPEED_10000:
+		data_rate = NIC_DR_10;
+		break;
+	case SPEED_25000:
+		data_rate = NIC_DR_25;
+		break;
+	case SPEED_50000:
+		data_rate = NIC_DR_50;
+		break;
+	case SPEED_100000:
+		if (gaudi_nic->nic_macro->num_of_lanes == NIC_LANES_4)
+			data_rate = NIC_DR_25;
+		else
+			data_rate = NIC_DR_50;
+		break;
+	default:
+		data_rate = NIC_DR_50;
+		dev_err(hdev->dev,
+			"unknown NIC port %d speed, continue with 50 Gb/s\n",
+			port);
+		break;
+	}
+
+	dev_dbg(hdev->dev, "NIC port %d, speed %d data rate %d\n",
+		port, speed, data_rate);
+
+	gaudi_nic->data_rate = data_rate;
+
+	/* if macro configuration is not needed, do only port configuration */
+	if (gaudi_nic->do_macro_cfg) {
+		config_port_mac(gaudi_nic);
+		config_port_hw(gaudi_nic, mac_addr);
+	} else {
+		config_port_hw(gaudi_nic, mac_addr);
+		goto out;
+	}
+
+	/*
+	 * The following registers are shared between each pair of ports,
+	 * hence they need to be configured only once per NIC macro.
+	 */
+	/* RXB Configuration */
+	NIC_MACRO_WREG32(mmNIC0_RXB_LBW_OFFSET_0, CFG_BASE & 0xFFFFFFFF);
+	NIC_MACRO_WREG32(mmNIC0_RXB_LBW_OFFSET_1, (CFG_BASE >> 32) & 0x3FFFF);
+	NIC_MACRO_WREG32(mmNIC0_RXB_ICRC_CFG, 0);
+	NIC_MACRO_WREG32(mmNIC0_RXB_TS_RC_MAC_31_0_MASK_0, 0xFFFFFFFF);
+	NIC_MACRO_WREG32(mmNIC0_RXB_TS_RC_MAC_31_0_MASK_1, 0xFFFFFFFF);
+	NIC_MACRO_WREG32(mmNIC0_RXB_TS_RC_MAC_31_0_MASK_2, 0xFFFFFFFF);
+	NIC_MACRO_WREG32(mmNIC0_RXB_TS_RC_MAC_31_0_MASK_3, 0xFFFFFFFF);
+	NIC_MACRO_WREG32(mmNIC0_RXB_TS_RC_MAC_47_32_MASK_0, 0xFFFF);
+	NIC_MACRO_WREG32(mmNIC0_RXB_TS_RC_MAC_47_32_MASK_1, 0xFFFF);
+	NIC_MACRO_WREG32(mmNIC0_RXB_TS_RC_MAC_47_32_MASK_2, 0xFFFF);
+	NIC_MACRO_WREG32(mmNIC0_RXB_TS_RC_MAC_47_32_MASK_3, 0xFFFF);
+	/* H/W WA for credit leakage */
+	NIC_MACRO_WREG32(mmNIC0_RXB_DROP_THRESHOLD_0, 0xB37 | (0xB37 << 13));
+	NIC_MACRO_WREG32(mmNIC0_RXB_DROP_THRESHOLD_1, 0xB37 | (0xB37 << 13));
+	NIC_MACRO_WREG32(mmNIC0_RXB_DROP_THRESHOLD_2, 0xB37 | (0xB37 << 13));
+	NIC_MACRO_WREG32(mmNIC0_RXB_DROP_THRESHOLD_3, 0xB37 | (0xB37 << 13));
+	NIC_MACRO_WREG32(mmNIC0_RXB_DROP_THRESHOLD_4, 0xB37 | (0xB37 << 13));
+	NIC_MACRO_WREG32(mmNIC0_RXB_DROP_THRESHOLD_5, 0xB37 | (0xB37 << 13));
+	NIC_MACRO_WREG32(mmNIC0_RXB_DROP_THRESHOLD_6, 0xB37 | (0xB37 << 13));
+	NIC_MACRO_WREG32(mmNIC0_RXB_DROP_THRESHOLD_7, 0xB37 | (0xB37 << 13));
+	NIC_MACRO_WREG32(mmNIC0_RXB_DROP_THRESHOLD_8, 0xB37 | (0xB37 << 13));
+	NIC_MACRO_WREG32(mmNIC0_RXB_DROP_THRESHOLD_9, 0xB37 | (0xB37 << 13));
+	NIC_MACRO_WREG32(mmNIC0_RXB_DROP_THRESHOLD_10, 0xB37 | (0xB37 << 13));
+	NIC_MACRO_WREG32(mmNIC0_RXB_DROP_THRESHOLD_11, 0xB37 | (0xB37 << 13));
+	NIC_MACRO_WREG32(mmNIC0_RXB_DROP_THRESHOLD_12, 0xB37 | (0xB37 << 13));
+	NIC_MACRO_WREG32(mmNIC0_RXB_DROP_THRESHOLD_13, 0xB37 | (0xB37 << 13));
+	NIC_MACRO_WREG32(mmNIC0_RXB_DROP_THRESHOLD_14, 0xB37 | (0xB37 << 13));
+	NIC_MACRO_WREG32(mmNIC0_RXB_DROP_THRESHOLD_15, 0xB37 | (0xB37 << 13));
+	NIC_MACRO_WREG32(mmNIC0_RXB_AXI_AXUSER_10_0_UNTRUST, 1);
+	NIC_MACRO_WREG32(mmNIC0_RXB_AXI_AXUSER_10_0_TRUST, 0x400);
+	NIC_MACRO_WREG32(mmNIC0_RXB_AXI_AXUSER_10_0_PRIV, 0x400);
+	NIC_MACRO_WREG32(mmNIC0_RXB_AXI_AXPROT_PRIV, 0);
+	NIC_MACRO_WREG32(mmNIC0_RXB_AXI_AXPROT_TRUST, 0);
+	NIC_MACRO_WREG32(mmNIC0_RXB_AXI_AXPROT_UNTRUST, 2);
+
+	/* MAC filtering */
+	NIC_MACRO_WREG32(mmNIC0_RXB_TS_RC_MAC_31_0_MASK_0, 0);
+	NIC_MACRO_WREG32(mmNIC0_RXB_TS_RC_MAC_31_0_MASK_1, 0);
+	NIC_MACRO_WREG32(mmNIC0_RXB_TS_RC_MAC_31_0_MASK_2, 0);
+	NIC_MACRO_WREG32(mmNIC0_RXB_TS_RC_MAC_31_0_MASK_3, 0);
+	NIC_MACRO_WREG32(mmNIC0_RXB_TS_RC_MAC_47_32_MASK_0, 0);
+	NIC_MACRO_WREG32(mmNIC0_RXB_TS_RC_MAC_47_32_MASK_1, 0);
+	NIC_MACRO_WREG32(mmNIC0_RXB_TS_RC_MAC_47_32_MASK_2, 0);
+	NIC_MACRO_WREG32(mmNIC0_RXB_TS_RC_MAC_47_32_MASK_3, 0);
+
+	/* Credits allocation - all dynamic */
+	/* H/W WA for credit leakage */
+	NIC_MACRO_WREG32(mmNIC0_RXB_MAX_DYNAMIC, 0xB36);
+	NIC_MACRO_WREG32(mmNIC0_RXB_MAX_STATIC_CREDITS_0, 0x41);
+	NIC_MACRO_WREG32(mmNIC0_RXB_MAX_STATIC_CREDITS_1, 0x41);
+	NIC_MACRO_WREG32(mmNIC0_RXB_MAX_STATIC_CREDITS_2, 0x41);
+	NIC_MACRO_WREG32(mmNIC0_RXB_MAX_STATIC_CREDITS_3, 0x41);
+	NIC_MACRO_WREG32(mmNIC0_RXB_MAX_STATIC_CREDITS_4, 0x41);
+	NIC_MACRO_WREG32(mmNIC0_RXB_MAX_STATIC_CREDITS_5, 0x41);
+	NIC_MACRO_WREG32(mmNIC0_RXB_MAX_STATIC_CREDITS_6, 0x41);
+	NIC_MACRO_WREG32(mmNIC0_RXB_MAX_STATIC_CREDITS_7, 0x41);
+	NIC_MACRO_WREG32(mmNIC0_RXB_MAX_STATIC_CREDITS_8, 0x41);
+	NIC_MACRO_WREG32(mmNIC0_RXB_MAX_STATIC_CREDITS_9, 0x41);
+	NIC_MACRO_WREG32(mmNIC0_RXB_MAX_STATIC_CREDITS_10, 0x41);
+	NIC_MACRO_WREG32(mmNIC0_RXB_MAX_STATIC_CREDITS_11, 0x41);
+	NIC_MACRO_WREG32(mmNIC0_RXB_MAX_STATIC_CREDITS_12, 0x41);
+	NIC_MACRO_WREG32(mmNIC0_RXB_MAX_STATIC_CREDITS_13, 0x41);
+	NIC_MACRO_WREG32(mmNIC0_RXB_MAX_STATIC_CREDITS_14, 0x41);
+	NIC_MACRO_WREG32(mmNIC0_RXB_MAX_STATIC_CREDITS_15, 0x41);
+
+	/* TMR Configuration */
+	tmr_addr = TMR_BASE_ADDR + gaudi_nic->nic_macro->idx * TMR_BASE_SIZE;
+
+	/* Clear timer FSM0 */
+	for (i = 0 ; i < NIC_HW_MAX_QP_NUM ; i++)
+		writeb(0, hdev->pcie_bar[HBM_BAR_ID] +
+			((tmr_addr + TMR_FSM0_OFFS + i) -
+				gaudi->hbm_bar_cur_addr));
+
+	/* Clear timer FSM1 */
+	for (i = 0 ; i < NIC_HW_MAX_QP_NUM ; i++)
+		writeb(0, hdev->pcie_bar[HBM_BAR_ID] +
+			((tmr_addr + TMR_FSM1_OFFS + i) -
+				gaudi->hbm_bar_cur_addr));
+
+	/* Timer free list */
+	for (i = 0 ; i < TMR_FREE_NUM_ENTRIES ; i++)
+		writel(TMR_GRANULARITY + i, hdev->pcie_bar[HBM_BAR_ID] +
+			((tmr_addr + TMR_FREE_OFFS + i * 4) -
+				gaudi->hbm_bar_cur_addr));
+
+	/* Perform read to flush the writes */
+	readq(hdev->pcie_bar[HBM_BAR_ID]);
+
+	NIC_MACRO_WREG32(mmNIC0_TMR_TMR_BASE_ADDRESS_49_18,
+				(tmr_addr + TMR_FIFO_OFFS) >> 18);
+	NIC_MACRO_WREG32(mmNIC0_TMR_TMR_BASE_ADDRESS_17_7,
+				((tmr_addr + TMR_FIFO_OFFS) >> 7) & 0x7FF);
+	NIC_MACRO_WREG32(mmNIC0_TMR_TMR_BASE_ADDRESS_FREE_LIST_49_32,
+				(tmr_addr + TMR_FREE_OFFS) >> 32);
+	NIC_MACRO_WREG32(mmNIC0_TMR_TMR_BASE_ADDRESS_FREE_LIST_31_0,
+				(tmr_addr + TMR_FREE_OFFS) & 0xFFFFFFFF);
+	NIC_MACRO_WREG32(mmNIC0_TMR_TMR_CACHE_BASE_ADDR_49_32,
+				(tmr_addr + TMR_FSM0_OFFS) >> 32);
+	NIC_MACRO_WREG32(mmNIC0_TMR_TMR_CACHE_BASE_ADDR_31_7,
+				((tmr_addr + TMR_FSM0_OFFS) >> 7) & 0xFFFFFF);
+
+	NIC_MACRO_WREG32(mmNIC0_TMR_TMR_SCHEDQ_UPDATE_DESC_31_0, 0);
+	NIC_MACRO_WREG32(mmNIC0_TMR_TMR_SCHEDQ_UPDATE_DESC_63_32, 0);
+	NIC_MACRO_WREG32(mmNIC0_TMR_TMR_SCHEDQ_UPDATE_DESC_95_64, 0);
+	NIC_MACRO_WREG32(mmNIC0_TMR_TMR_SCHEDQ_UPDATE_DESC_191_160, 1000);
+	NIC_MACRO_WREG32(mmNIC0_TMR_TMR_SCHEDQ_UPDATE_DESC_216_192, 0);
+
+	for (i = 0 ; i < TMR_GRANULARITY ; i++) {
+		NIC_MACRO_WREG32(mmNIC0_TMR_TMR_SCHEDQ_UPDATE_DESC_127_96, i);
+		NIC_MACRO_WREG32(mmNIC0_TMR_TMR_SCHEDQ_UPDATE_DESC_159_128, i);
+		NIC_MACRO_WREG32(mmNIC0_TMR_TMR_SCHEDQ_UPDATE_FIFO, i);
+		NIC_MACRO_WREG32(mmNIC0_TMR_TMR_SCHEDQ_UPDATE_EN, 1);
+	}
+
+	NIC_MACRO_WREG32(mmNIC0_TMR_TMR_SCAN_TIMER_COMP_31_0, 10);
+	NIC_MACRO_WREG32(mmNIC0_TMR_TMR_TICK_WRAP, 500);
+	NIC_MACRO_WREG32(mmNIC0_TMR_TMR_LIST_MASK,
+			~(0xFFFFFFFF << (ilog2(TMR_FREE_NUM_ENTRIES) - 5)));
+	NIC_MACRO_WREG32(mmNIC0_TMR_TMR_PRODUCER_UPDATE, TMR_FREE_NUM_ENTRIES);
+	NIC_MACRO_WREG32(mmNIC0_TMR_TMR_PRODUCER_UPDATE_EN, 1);
+	NIC_MACRO_WREG32(mmNIC0_TMR_TMR_PRODUCER_UPDATE_EN, 0);
+	NIC_MACRO_WREG32(mmNIC0_TMR_TMR_LIST_MEM_READ_MASK, 0);
+	NIC_MACRO_WREG32(mmNIC0_TMR_TMR_PUSH_LOCK_EN, 1);
+	NIC_MACRO_WREG32(mmNIC0_TMR_TMR_TIMER_EN, 1);
+	NIC_MACRO_WREG32(mmNIC0_TMR_FREE_LIST_PUSH_MASK_EN, 0);
+
+out:
+	/* Perform read from the device to flush all configurations */
+	NIC_MACRO_RREG32(mmNIC0_TMR_TMR_TIMER_EN);
+
+	return 0;
+}
+
+static bool write_pkt_to_hw(struct gaudi_nic_device *gaudi_nic, u64 *data,
+				u64 size)
+{
+	struct hl_device *hdev = gaudi_nic->hdev;
+	struct gaudi_device *gaudi = hdev->asic_specific;
+	struct sq_wqe swq;
+	u64 *swq_p = (u64 *) &swq;
+	u64 swq_addr, sb_base_address, swq_base_addr;
+	u32 port = gaudi_nic->port, pi = gaudi_nic->tx_pi,
+		ci = gaudi_nic->tx_ci, diff, new_pi;
+	int i;
+
+	if (pi >= ci)
+		diff = pi - ci;
+	else
+		diff = WQ_BUFFER_SIZE - ci + pi;
+
+	/* update CI once in a while */
+	if (diff > (WQ_BUFFER_SIZE >> 1))
+		gaudi_nic->tx_ci = ci = NIC_RREG32(mmNIC0_QPC0_REQ_RING0_CI);
+
+	new_pi = (pi + 1) & (WQ_BUFFER_SIZE - 1);
+	if (new_pi == ci)
+		return false;
+
+	gaudi_nic->tx_pi = new_pi;
+
+	sb_base_address = (SB_BASE_ADDR + port * SB_BASE_SIZE) +
+				pi * NIC_MAX_PKT_SIZE;
+	swq_base_addr = SWQ_BASE_ADDR + port * SWQ_BASE_SIZE;
+
+	/* Create SWQ */
+	memset(&swq, 0, sizeof(swq));
+	CFG_SQ_WQE_OPCODE(swq, WQE_LINEAR);
+	CFG_SQ_WQE_LOCAL_ADDRESS_31_0(swq, sb_base_address & 0xFFFFFFFF);
+	CFG_SQ_WQE_LOCAL_ADDRESS_49_32(swq, (sb_base_address >> 32) & 0x3FFFF);
+	CFG_SQ_WQE_SIZE(swq, size);
+
+	/* Copy packet to SB */
+	for (i = 0 ; i < size ; i++)
+		writeq(data[i], hdev->pcie_bar[HBM_BAR_ID] +
+			((sb_base_address + i * 8) - gaudi->hbm_bar_cur_addr));
+
+	/* Copy WQE to SWQ Buffer */
+	for (i = 0 ; i < (sizeof(swq) / sizeof(u64)) ; i++) {
+		swq_addr = swq_base_addr +
+				(pi * sizeof(struct sq_wqe) + i * 8);
+		writeq(swq_p[i], hdev->pcie_bar[HBM_BAR_ID] +
+				(swq_addr - gaudi->hbm_bar_cur_addr));
+	}
+
+	readq(hdev->pcie_bar[HBM_BAR_ID]);
+
+	/* Make sure we ring the doorbell after the data copying */
+	mb();
+
+	/* Doorbell push */
+	/* TODO: change to QMAN cmd */
+	NIC_WREG32(mmNIC0_QPC0_SECURED_DOORBELL_PI, new_pi);
+	NIC_WREG32(mmNIC0_QPC0_SECURED_DOORBELL_QPN, 0x80000000 | RAW_QPN);
+
+	return true;
+}
+
+static bool get_pkt_from_hw(struct gaudi_nic_device *gaudi_nic, u64 mem_addr,
+				u64 *ppkt_addr, u32 *ppkt_size, u32 *pi)
+{
+	struct hl_device *hdev = gaudi_nic->hdev;
+	u64 pkt_addr;
+	u32 ci = gaudi_nic->rx_ci, ether_type, tpid, ipv4_len, ipv6_len,
+		pkt_size, hdr_size = ETH_HLEN;
+	__be32 *data;
+	int idx;
+	bool vlan_double_tag = false, ret = true;
+
+	/*
+	 * check if packet is available by reading the PI, but do it only if
+	 * needed as it is expensive
+	 */
+	if (*pi == ci) {
+		/* TODO: need this wraparound? */
+		*pi = NIC_RREG32(mmNIC0_QPC0_RES_RING0_PI) & (NIC_RX_SIZE - 1);
+		if (*pi == ci)
+			return false;
+	}
+
+	pkt_addr = mem_addr + ci * NIC_MAX_PKT_SIZE;
+	data = (__be32 *) pkt_addr;
+
+	/* skip MAC header */
+	idx = (ETH_ALEN * 2) / 4;
+
+	/* handle VLAN tagging */
+	tpid = ntohl(data[idx++]) >> 16;
+	if (tpid == ETH_P_8021AD) {
+		/* skip VLAN double tagging */
+		tpid = ntohl(data[idx++]) >> 16;
+		vlan_double_tag = true;
+		hdr_size += 4;
+	}
+
+	if (tpid == ETH_P_8021Q) {
+		/* skip VLAN tagging */
+		ether_type = ntohl(data[idx++]) >> 16;
+		hdr_size += 4;
+	} else if (vlan_double_tag) {
+		dev_dbg_ratelimited(hdev->dev,
+					"Wrong VLAN TPID double tagging 0x%x\n",
+					tpid);
+		ether_type = UINT_MAX;
+	} else {
+		ether_type = tpid;
+	}
+
+	if (ether_type <= ETH_DATA_LEN) {
+		pkt_size = ether_type;
+	} else if (ether_type == ETH_P_ARP) {
+		pkt_size = hdr_size + NIC_ARP_PKT_SIZE;
+	} else if (ether_type == ETH_P_IP) {
+		ipv4_len = ntohl(data[idx]) >> 16;
+		pkt_size = hdr_size + ipv4_len;
+	} else if (ether_type == ETH_P_IPV6) {
+		ipv6_len = ntohl(data[idx]) & 0xFFFF;
+		pkt_size = hdr_size + ipv6_len + sizeof(struct ipv6hdr);
+	} else if ((ether_type == ETH_P_LLDP) ||
+			(ether_type == ETH_P_LOOPBACK)) {
+		pkt_size = hdr_size + ETH_DATA_LEN;
+	} else {
+		dev_err_ratelimited(hdev->dev,
+					"error, unsupported EtherType 0x%x, port %d\n",
+					ether_type, gaudi_nic->port);
+		ret = false;
+		goto out;
+	}
+
+	if (pkt_size > NIC_MAX_PKT_SIZE) {
+		dev_err_ratelimited(hdev->dev,
+				"error, packet size %u exceeds maximum of %u\n",
+				pkt_size, NIC_MAX_PKT_SIZE);
+		ret = false;
+		goto out;
+	}
+
+#if HL_NIC_DEBUG
+	dev_dbg_ratelimited(hdev->dev,
+				"port %d packet_size %d ether_type 0x%x\n",
+				gaudi_nic->port, pkt_size,
+				ether_type);
+#endif
+
+	*ppkt_addr = pkt_addr;
+	*ppkt_size = pkt_size;
+out:
+	gaudi_nic->rx_ci = (ci + 1) & (NIC_RX_SIZE - 1);
+
+	return ret;
+}
+
+bool disabled_or_in_reset(struct gaudi_nic_device *gaudi_nic)
+{
+	return atomic_read(&gaudi_nic->in_reset) ||
+			hl_device_disabled_or_in_reset(gaudi_nic->hdev);
+}
+
+static int gaudi_nic_handle_rx_pkt(struct gaudi_nic_device *gaudi_nic,
+					int budget, u32 *last_pi)
+{
+	struct hl_device *hdev = gaudi_nic->hdev;
+	struct net_device_stats *stats = &gaudi_nic->ndev->stats;
+	struct sk_buff *skb;
+	u64 pkt_address;
+	u32 pkt_size, pi = gaudi_nic->rx_ci;
+	int rc, pkt_count = 0;
+
+	if (!gaudi_nic->active)
+		return 0;
+
+	while (1) {
+		if (pkt_count >= budget || disabled_or_in_reset(gaudi_nic))
+			break;
+
+		rc = get_pkt_from_hw(gaudi_nic,
+					(u64) (uintptr_t) gaudi_nic->rx_mem_cpu,
+					&pkt_address, &pkt_size, &pi);
+		if (!rc)
+			break;
+
+		if (hdev->nic_rx_poll)
+			skb = netdev_alloc_skb_ip_align(gaudi_nic->ndev,
+							pkt_size);
+		else
+			skb = napi_alloc_skb(&gaudi_nic->napi, pkt_size);
+
+		if (!skb)
+			break;
+
+		skb_copy_to_linear_data(skb, (void *) pkt_address, pkt_size);
+		skb_put(skb, pkt_size);
+		skb->protocol = eth_type_trans(skb, gaudi_nic->ndev);
+
+#if HL_NIC_DEBUG
+		dev_dbg_ratelimited(hdev->dev,
+					"port: %d, addr: 0x%llx, size: %d, rx_ci: %d\n",
+					gaudi_nic->port, pkt_address, pkt_size,
+					gaudi_nic->rx_ci);
+#endif
+
+		rc = netif_receive_skb(skb);
+		if (rc == NET_RX_DROP) {
+			stats->rx_dropped++;
+		} else {
+			stats->rx_packets++;
+			stats->rx_bytes += pkt_size;
+			pkt_count++;
+		}
+	}
+
+	*last_pi = pi;
+
+	return pkt_count;
+}
+
+static void rx_pkt_poll(struct work_struct *work)
+{
+	struct gaudi_nic_device *gaudi_nic = container_of(work,
+							struct gaudi_nic_device,
+							rx_poll_work.work);
+	u32 ignore;
+
+	gaudi_nic_handle_rx_pkt(gaudi_nic, NIC_NAPI_MAX_RX_BUDGET, &ignore);
+	schedule_delayed_work(&gaudi_nic->rx_poll_work, msecs_to_jiffies(1));
+}
+
+static void gaudi_nic_reenable_rx_irq(struct gaudi_nic_device *gaudi_nic,
+								u32 last_pi)
+{
+	struct hl_device *hdev = gaudi_nic->hdev;
+	u32 new_pi;
+
+	NIC_WREG32(mmNIC0_QPC0_INTERRUPT_CLR, 0xFFFF);
+
+	if (gaudi_nic->active) {
+		/*
+		 * Packets can still arrive when the IRQ is disabled. Hence if
+		 * the PI has changed since we finished handling the Rx ring,
+		 * it means we have more packets to process, so we generate an
+		 * IRQ to handle them.
+		 */
+		new_pi = NIC_RREG32(mmNIC0_QPC0_RES_RING0_PI) &
+				(NIC_RX_SIZE - 1);
+		if (last_pi != new_pi)
+			WREG32(gaudi_nic->rx_msi_addr, 1);
+	}
+}
+
+static int napi_clean(struct napi_struct *napi, int budget)
+{
+	struct gaudi_nic_device *gaudi_nic =
+			container_of(napi, struct gaudi_nic_device, napi);
+	u32 last_pi;
+	int work_done = gaudi_nic_handle_rx_pkt(gaudi_nic, budget, &last_pi);
+
+	/* If budget not fully consumed, exit the polling mode */
+	if (work_done < budget) {
+		napi_complete_done(napi, work_done);
+		gaudi_nic_reenable_rx_irq(gaudi_nic, last_pi);
+	}
+
+	return work_done;
+}
+
+irqreturn_t gaudi_nic_rx_irq_handler(int irq, void *arg)
+{
+	struct gaudi_nic_device *gaudi_nic = arg;
+	struct hl_device *hdev = gaudi_nic->hdev;
+	struct gaudi_device *gaudi = gaudi_nic->hdev->asic_specific;
+
+	if (!hdev->nic_rx_poll)
+		gaudi->nic_handle_rx(gaudi_nic);
+
+	return IRQ_HANDLED;
+}
+
+static void set_port_status(struct gaudi_nic_device *gaudi_nic, bool active)
+{
+	if (gaudi_nic->active == active)
+		return;
+
+	if (active) {
+		netif_wake_queue(gaudi_nic->ndev);
+		netif_start_queue(gaudi_nic->ndev);
+		netif_carrier_on(gaudi_nic->ndev);
+		gaudi_nic->active = true;
+	} else {
+		netif_stop_queue(gaudi_nic->ndev);
+		netif_carrier_off(gaudi_nic->ndev);
+		gaudi_nic->active = false;
+	}
+}
+
+static void port_reset_state(struct gaudi_nic_device *gaudi_nic)
+{
+	kfifo_reset(&gaudi_nic->pcs_fail_fifo);
+	gaudi_nic->pcs_link = false;
+	gaudi_nic->auto_neg_resolved = false;
+	gaudi_nic->phy_fw_tuned = false;
+	gaudi_nic->retry_cnt = 0;
+	gaudi_nic->pcs_fail_cnt = 0;
+	gaudi_nic->pcs_local_fault_cnt = 0;
+	gaudi_nic->pcs_remote_fault_cnt = 0;
+	gaudi_nic->correctable_errors_cnt = 0;
+	gaudi_nic->uncorrectable_errors_cnt = 0;
+}
+
+static int _gaudi_nic_sw_init(struct gaudi_nic_device *gaudi_nic)
+{
+	struct hl_device *hdev = gaudi_nic->hdev;
+	u32 port = gaudi_nic->port;
+	int rc;
+
+	gaudi_nic->rx_mem_size = NIC_RX_SIZE * NIC_MAX_PKT_SIZE;
+
+	gaudi_nic->rx_mem_cpu = hdev->asic_funcs->asic_dma_alloc_coherent(hdev,
+							gaudi_nic->rx_mem_size,
+							&gaudi_nic->rx_mem_dma,
+							GFP_KERNEL);
+	if (!gaudi_nic->rx_mem_cpu) {
+		dev_err(hdev->dev, "Failed to allocate Rx memory, port: %d\n",
+			port);
+		return -ENOMEM;
+	}
+
+	gaudi_nic->cq_mem_size = CQ_PORT_BUF_SIZE;
+
+	if (!IS_ALIGNED(gaudi_nic->cq_mem_size, PAGE_SIZE_4KB)) {
+		dev_err(hdev->dev,
+			"NIC CQ port buffer size should be aligned to 4KB, port: %d\n",
+			port);
+		rc = -EFAULT;
+		goto free_rx;
+	}
+
+	gaudi_nic->cq_mem_cpu = hdev->asic_funcs->asic_dma_alloc_coherent(hdev,
+							gaudi_nic->cq_mem_size,
+							&gaudi_nic->cq_mem_dma,
+							GFP_KERNEL);
+	if (!gaudi_nic->cq_mem_cpu) {
+		dev_err(hdev->dev, "Failed to allocate CQ memory, port: %d\n",
+			port);
+		rc = -ENOMEM;
+		goto free_rx;
+	}
+
+	gaudi_nic->qp_err_mem_size = QP_ERR_BUF_SIZE;
+
+	gaudi_nic->qp_err_mem_cpu = hdev->asic_funcs->asic_dma_alloc_coherent(
+						hdev,
+						gaudi_nic->qp_err_mem_size,
+						&gaudi_nic->qp_err_mem_dma,
+						GFP_KERNEL);
+	if (!gaudi_nic->qp_err_mem_cpu) {
+		dev_err(hdev->dev,
+			"Failed to allocate QP error memory, port: %d\n",
+			port);
+		rc = -ENOMEM;
+		goto free_cq;
+	}
+
+	mutex_init(&gaudi_nic->user_wq_lock);
+
+	mutex_init(&gaudi_nic->idr_lock);
+	idr_init(&gaudi_nic->qp_ids);
+
+	return 0;
+
+free_cq:
+	hdev->asic_funcs->asic_dma_free_coherent(hdev, gaudi_nic->cq_mem_size,
+							gaudi_nic->cq_mem_cpu,
+							gaudi_nic->cq_mem_dma);
+free_rx:
+	hdev->asic_funcs->asic_dma_free_coherent(hdev, gaudi_nic->rx_mem_size,
+							gaudi_nic->rx_mem_cpu,
+							gaudi_nic->rx_mem_dma);
+
+	return rc;
+}
+
+static void _gaudi_nic_sw_fini(struct gaudi_nic_device *gaudi_nic)
+{
+	struct hl_device *hdev = gaudi_nic->hdev;
+
+	idr_destroy(&gaudi_nic->qp_ids);
+	mutex_destroy(&gaudi_nic->idr_lock);
+
+	mutex_destroy(&gaudi_nic->user_wq_lock);
+
+	hdev->asic_funcs->asic_dma_free_coherent(hdev,
+						gaudi_nic->qp_err_mem_size,
+						gaudi_nic->qp_err_mem_cpu,
+						gaudi_nic->qp_err_mem_dma);
+
+	hdev->asic_funcs->asic_dma_free_coherent(hdev, gaudi_nic->cq_mem_size,
+							gaudi_nic->cq_mem_cpu,
+							gaudi_nic->cq_mem_dma);
+
+	hdev->asic_funcs->asic_dma_free_coherent(hdev, gaudi_nic->rx_mem_size,
+							gaudi_nic->rx_mem_cpu,
+							gaudi_nic->rx_mem_dma);
+}
+
+int gaudi_nic_sw_init(struct hl_device *hdev)
+{
+	struct gaudi_nic_device *gaudi_nic;
+	struct gaudi_device *gaudi = hdev->asic_specific;
+	int rc, i, init_cnt = 0;
+
+	/* At this stage, we don't know how many links we have, so we must
+	 * allocate for the maximum number of links (and also free all of them
+	 * in sw_fini).
+	 */
+	for (i = 0 ; i < NIC_NUMBER_OF_PORTS ; i++, init_cnt++) {
+		gaudi_nic = &gaudi->nic_devices[i];
+		gaudi_nic->hdev = hdev;
+		gaudi_nic->port = i;
+
+		rc = _gaudi_nic_sw_init(gaudi_nic);
+		if (rc) {
+			dev_err(hdev->dev,
+				"NIC S/W init failed, port: %d, rc: %d\n", i,
+				rc);
+			goto err;
+		}
+	}
+
+	mutex_init(&gaudi->nic_cq_user_lock);
+	mutex_init(&gaudi->nic_qp_err_lock);
+
+	return 0;
+
+err:
+	for (i = 0 ; i < init_cnt ; i++)
+		_gaudi_nic_sw_fini(&gaudi->nic_devices[i]);
+
+	return rc;
+}
+
+void gaudi_nic_sw_fini(struct hl_device *hdev)
+{
+	struct gaudi_device *gaudi = hdev->asic_specific;
+	int i;
+
+	mutex_destroy(&gaudi->nic_qp_err_lock);
+	mutex_destroy(&gaudi->nic_cq_user_lock);
+
+	for (i = 0 ; i < NIC_NUMBER_OF_PORTS ; i++)
+		_gaudi_nic_sw_fini(&gaudi->nic_devices[i]);
+}
+
+
+/* used for physically contiguous memory only */
+static int map_nic_mem(struct hl_device *hdev, u64 va, dma_addr_t pa, u32 size)
+{
+	struct gaudi_device *gaudi = hdev->asic_specific;
+	struct hl_ctx *ctx = hdev->kernel_ctx;
+	s64 off;
+	int rc;
+
+	if (!(gaudi->hw_cap_initialized & HW_CAP_MMU))
+		return 0;
+
+	mutex_lock(&ctx->mmu_lock);
+
+	for (off = 0 ; off < size ; off += PAGE_SIZE_4KB) {
+		rc = hl_mmu_map(ctx, va + off, pa + off, PAGE_SIZE_4KB,
+				(off + PAGE_SIZE_4KB) >= size);
+		if (rc) {
+			dev_err(hdev->dev,
+				"Map failed for va 0x%llx to pa 0x%llx\n",
+				va + off, pa + off);
+			goto unmap;
+		}
+	}
+
+	hdev->asic_funcs->mmu_invalidate_cache(hdev, false, 0);
+
+	mutex_unlock(&ctx->mmu_lock);
+
+	return 0;
+
+unmap:
+	for (; off >= 0 ; off -= PAGE_SIZE_4KB)
+		if (hl_mmu_unmap(ctx, va + off, PAGE_SIZE_4KB,
+					(off - (s32) PAGE_SIZE_4KB) < 0))
+			dev_warn_ratelimited(hdev->dev,
+					"failed to unmap va 0x%llx\n",
+					va + off);
+
+	hdev->asic_funcs->mmu_invalidate_cache(hdev, true, 0);
+
+	mutex_unlock(&ctx->mmu_lock);
+
+	return rc;
+}
+
+static void unmap_nic_mem(struct hl_device *hdev, u64 va, u32 size)
+{
+	struct gaudi_device *gaudi = hdev->asic_specific;
+	struct hl_ctx *ctx = hdev->kernel_ctx;
+	s64 off;
+
+	if (!(gaudi->hw_cap_initialized & HW_CAP_MMU))
+		return;
+
+	mutex_lock(&ctx->mmu_lock);
+
+	for (off = 0 ; off < size ; off += PAGE_SIZE_4KB)
+		if (hl_mmu_unmap(ctx, va + off, PAGE_SIZE_4KB,
+				       (off + PAGE_SIZE_4KB) >= size))
+			dev_warn_ratelimited(hdev->dev,
+					"Failed to unmap va 0x%llx\n",
+					va + off);
+
+	hdev->asic_funcs->mmu_invalidate_cache(hdev, true, 0);
+
+	mutex_unlock(&ctx->mmu_lock);
+}
+
+static int map_cq_mem(struct gaudi_nic_device *gaudi_nic)
+{
+	struct hl_device *hdev = gaudi_nic->hdev;
+	struct gaudi_device *gaudi = hdev->asic_specific;
+
+	if (!(gaudi->hw_cap_initialized & HW_CAP_MMU)) {
+		gaudi_nic->cq_mem_device_va = gaudi_nic->cq_mem_dma;
+		return 0;
+	}
+
+	gaudi_nic->cq_mem_device_va = CQ_VIRTUAL_ADDRESS +
+				gaudi_nic->port * gaudi_nic->cq_mem_size;
+
+	return map_nic_mem(hdev, gaudi_nic->cq_mem_device_va,
+				gaudi_nic->cq_mem_dma, gaudi_nic->cq_mem_size);
+}
+
+static void unmap_cq_mem(struct gaudi_nic_device *gaudi_nic)
+{
+	struct hl_device *hdev = gaudi_nic->hdev;
+	struct gaudi_device *gaudi = hdev->asic_specific;
+
+	if (!(gaudi->hw_cap_initialized & HW_CAP_MMU))
+		return;
+
+	unmap_nic_mem(hdev, gaudi_nic->cq_mem_device_va,
+			gaudi_nic->cq_mem_size);
+}
+
+static void mac_channels_init(struct gaudi_nic_device *gaudi_nic)
+{
+	struct gaudi_nic_macro *nic_macro = gaudi_nic->nic_macro;
+	u32 port = gaudi_nic->port;
+
+	if (gaudi_nic->auto_neg_enable) {
+		if (gaudi_nic->speed == SPEED_100000) {
+			if (nic_macro->num_of_lanes == NIC_LANES_4) {
+				gaudi_nic->power_up_mask = 0x1;
+				gaudi_nic->fw_tuning_mask = 0xF;
+			} else {
+				gaudi_nic->power_up_mask =
+							(port & 1) ? 0xC : 0x3;
+				gaudi_nic->fw_tuning_mask =
+							(port & 1) ? 0xC : 0x3;
+				gaudi_nic->auto_neg_mask =
+							(port & 1) ? 0x4 : 0x1;
+			}
+		} else {
+			gaudi_nic->fw_tuning_mask = gaudi_nic->power_up_mask =
+				(port & 1) ? 0xC : 0x3;
+		}
+	} else {
+		if (nic_macro->num_of_lanes == NIC_LANES_2)
+			gaudi_nic->power_up_mask = (port & 1) ? 0xC : 0x3;
+		else
+			/*
+			 * in the special mode of 100000Mb/s with 4 lanes, only
+			 * the even port should be up and should configure all
+			 * the lanes
+			 */
+			gaudi_nic->power_up_mask = 0xF;
+
+		gaudi_nic->fw_tuning_mask = gaudi_nic->power_up_mask;
+	}
+}
+
+static int port_open(struct gaudi_nic_device *gaudi_nic)
+{
+	struct hl_device *hdev = gaudi_nic->hdev;
+	struct gaudi_device *gaudi = hdev->asic_specific;
+	u32 port = gaudi_nic->port, pcs_fifo_size;
+	char cq_wq_name[15] = {0};
+	int rc, rx_irq = 0;
+
+	if (gaudi_nic->port_open)
+		return 0;
+
+	/*
+	 * Temporary WA until DevOps starts to use nic_mac_loopback properly by
+	 * writing a bitmask rather than a boolean (SW-15223).
+	 * When they implement that, the following code should be used:
+	 * !!(gaudi->nic_mac_loopback_mask & BIT(port))
+	 */
+	gaudi_nic->mac_loopback = !!gaudi->nic_mac_loopback;
+
+	gaudi_nic->auto_neg_enable = !!(hdev->nic_auto_neg_mask & BIT(port));
+	mac_channels_init(gaudi_nic);
+
+	pcs_fifo_size = gaudi->nic_pcs_fail_threshold * sizeof(ktime_t);
+	if (!is_power_of_2(pcs_fifo_size)) {
+		dev_err(hdev->dev,
+			"PCS fifo size must be a power of 2, port: %d\n", port);
+		return -EFAULT;
+	}
+
+	rc = kfifo_alloc(&gaudi_nic->pcs_fail_fifo, pcs_fifo_size, GFP_KERNEL);
+	if (rc) {
+		dev_err(hdev->dev, "PCS fifo alloc failed, port: %d\n", port);
+		return rc;
+	}
+
+	/*
+	 * Workaround for H3 #HW-2061 bug.
+	 * MMU bypass cannot be set to the NIC CQ. But since it uses ASID 0, we
+	 * solve it by mapping the CQ buffer.
+	 */
+	rc = map_cq_mem(gaudi_nic);
+	if (rc) {
+		dev_err(hdev->dev, "Failed to map NIC CQ buffer, port: %d\n",
+			port);
+		goto pcs_fifo_free;
+	}
+
+	memset(gaudi_nic->rx_mem_cpu, 0, gaudi_nic->rx_mem_size);
+	memset(gaudi_nic->cq_mem_cpu, 0, gaudi_nic->cq_mem_size);
+
+	snprintf(cq_wq_name, sizeof(cq_wq_name) - 1, "nic%d-cq",
+			gaudi_nic->port);
+
+	/*
+	 * Use only one thread because cq_irq_work() should not be executed
+	 * concurrently for the same port.
+	 */
+	gaudi_nic->cq_wq = create_singlethread_workqueue(cq_wq_name);
+	if (!gaudi_nic->cq_wq) {
+		rc = -ENOMEM;
+		dev_err(hdev->dev, "Failed to create CQ WQ, port: %d\n", port);
+		goto cq_unmap;
+	}
+
+	if ((hdev->pdev) && (gaudi->multi_msi_mode)) {
+		rx_irq = pci_irq_vector(hdev->pdev, RX_MSI_IDX + port);
+
+		rc = request_irq(rx_irq, gaudi_nic_rx_irq_handler, 0,
+					gaudi_nic->ndev->name,
+					gaudi_nic);
+		if (rc) {
+			dev_err(hdev->dev,
+				"Failed to request Rx IRQ %d, port: %d, %d\n",
+				rx_irq, port, rc);
+			goto cq_wq_free;
+		}
+	}
+
+	gaudi_nic->rx_ci = gaudi_nic->tx_pi = gaudi_nic->tx_ci =
+		gaudi_nic->cq_ci = gaudi_nic->last_cqe_cnt = 0;
+
+	gaudi_nic->cq_delay = usecs_to_jiffies(1);
+	gaudi_nic->cq_delay_idle = msecs_to_jiffies(1);
+
+	/* after hw_config(), interrupts may arrive */
+	rc = hw_config(gaudi_nic);
+	if (rc) {
+		dev_err(hdev->dev, "Failed to configure NIC H/W, port: %d, %d\n",
+					port, rc);
+		goto rx_irq_free;
+	}
+
+	eth_start_stop(gaudi_nic, true);
+
+	if (hdev->nic_rx_poll) {
+		/*
+		 * init the delayed work here to support on the fly switch
+		 * between NAPI and polling mode.
+		 */
+		INIT_DELAYED_WORK(&gaudi_nic->rx_poll_work, rx_pkt_poll);
+		schedule_delayed_work(&gaudi_nic->rx_poll_work,
+					msecs_to_jiffies(1));
+	} else {
+		napi_enable(&gaudi_nic->napi);
+	}
+
+	set_port_status(gaudi_nic, true);
+
+	gaudi_nic->port_open = true;
+
+	return 0;
+
+rx_irq_free:
+	if ((hdev->pdev) && (gaudi->multi_msi_mode)) {
+		synchronize_irq(rx_irq);
+		free_irq(rx_irq, gaudi_nic);
+	}
+cq_wq_free:
+	destroy_workqueue(gaudi_nic->cq_wq);
+cq_unmap:
+	unmap_cq_mem(gaudi_nic);
+pcs_fifo_free:
+	kfifo_free(&gaudi_nic->pcs_fail_fifo);
+
+	return rc;
+}
+
+static void port_open_work(struct work_struct *work)
+{
+	struct gaudi_nic_device *gaudi_nic = container_of(work,
+							struct gaudi_nic_device,
+							port_open_work.work);
+	struct hl_device *hdev = gaudi_nic->hdev;
+	int rc;
+
+	rc = port_open(gaudi_nic);
+	if (rc)
+		dev_err(hdev->dev, "Failed to init NIC H/W, port: %d\n",
+			gaudi_nic->port);
+
+	atomic_set(&gaudi_nic->in_reset, 0);
+}
+
+static void port_close(struct gaudi_nic_device *gaudi_nic)
+{
+	struct hl_device *hdev = gaudi_nic->hdev;
+	struct gaudi_device *gaudi = hdev->asic_specific;
+	u32 port = gaudi_nic->port;
+	int irq;
+
+	cancel_delayed_work_sync(&gaudi_nic->port_open_work);
+
+	if (!gaudi_nic->port_open)
+		return;
+
+	gaudi_nic->port_open = false;
+	gaudi_nic->active = false;
+
+	/* Print if not in hard reset flow e.g. from ifconfig */
+	if (gaudi_nic->pcs_link && !hdev->hard_reset_pending)
+		dev_info(hdev->dev, "port %d was closed\n", port);
+
+	port_reset_state(gaudi_nic);
+
+	kfifo_free(&gaudi_nic->pcs_fail_fifo);
+
+	/* disable Tx in S/W */
+	netif_stop_queue(gaudi_nic->ndev);
+
+	/* disable Rx/Tx in H/W */
+	eth_start_stop(gaudi_nic, false);
+
+	if (hdev->nic_rx_poll) {
+		cancel_delayed_work_sync(&gaudi_nic->rx_poll_work);
+	} else {
+		napi_synchronize(&gaudi_nic->napi);
+		napi_disable(&gaudi_nic->napi);
+	}
+
+	/* disable Rx in S/W */
+	if (hdev->pdev) {
+		if (gaudi->multi_msi_mode) {
+			irq = pci_irq_vector(hdev->pdev, RX_MSI_IDX + port);
+			synchronize_irq(irq);
+			free_irq(irq, gaudi_nic);
+		} else {
+			irq = pci_irq_vector(hdev->pdev, 0);
+			synchronize_irq(irq);
+		}
+	}
+
+	netif_carrier_off(gaudi_nic->ndev);
+
+	flush_workqueue(gaudi_nic->cq_wq);
+	destroy_workqueue(gaudi_nic->cq_wq);
+
+	unmap_cq_mem(gaudi_nic);
+}
+
+int gaudi_nic_port_reset(struct gaudi_nic_device *gaudi_nic)
+{
+	port_close(gaudi_nic);
+	return port_open(gaudi_nic);
+}
+
+static int gaudi_nic_open(struct net_device *netdev)
+{
+	struct gaudi_nic_device **ptr = netdev_priv(netdev);
+	struct gaudi_nic_device *gaudi_nic = *ptr;
+	struct hl_device *hdev = gaudi_nic->hdev;
+
+	if (gaudi_nic->enabled)
+		return 0;
+
+	if (atomic_cmpxchg(&gaudi_nic->in_reset, 0, 1)) {
+		dev_err(hdev->dev, "port %d is in reset, can't open it\n",
+			gaudi_nic->port);
+		return -EBUSY;
+	}
+
+	netif_carrier_off(netdev);
+
+	/* in_reset will be set to 0 in port_open_work() */
+	INIT_DELAYED_WORK(&gaudi_nic->port_open_work, port_open_work);
+	schedule_delayed_work(&gaudi_nic->port_open_work, msecs_to_jiffies(1));
+
+	gaudi_nic->enabled = true;
+
+	return 0;
+}
+
+static int gaudi_nic_close(struct net_device *netdev)
+{
+	struct gaudi_nic_device **ptr = netdev_priv(netdev);
+	struct gaudi_nic_device *gaudi_nic = *ptr;
+	struct hl_device *hdev = gaudi_nic->hdev;
+	struct gaudi_device *gaudi = gaudi_nic->hdev->asic_specific;
+
+	if (!gaudi_nic->enabled)
+		return 0;
+
+	if (atomic_cmpxchg(&gaudi_nic->in_reset, 0, 1)) {
+		if (!gaudi->nic_in_teardown)
+			dev_err(hdev->dev,
+				"port %d is in reset, can't close it\n",
+				gaudi_nic->port);
+		return -EBUSY;
+	}
+
+	/*
+	 * this function may be called from 'ifconfig <nic_name> down', hence
+	 * the cleanup
+	 */
+	port_close(gaudi_nic);
+
+	gaudi_nic->enabled = false;
+
+	atomic_set(&gaudi_nic->in_reset, 0);
+
+	return 0;
+}
+
+netdev_tx_t gaudi_nic_handle_tx_pkt(struct gaudi_nic_device *gaudi_nic,
+					struct sk_buff *skb)
+{
+	struct net_device_stats *stats = &gaudi_nic->ndev->stats;
+	bool pkt_sent;
+
+	/* NETDEV_TX_OK hands ownership of the skb to the driver, so packets
+	 * that are dropped here must be freed to avoid leaking them.
+	 */
+	if (!gaudi_nic->active || gaudi_nic->mac_loopback || !skb->len) {
+		dev_kfree_skb_any(skb);
+		return NETDEV_TX_OK;
+	}
+
+	if (disabled_or_in_reset(gaudi_nic))
+		return NETDEV_TX_BUSY;
+
+#if HL_NIC_DEBUG
+	{
+		struct hl_device *hdev = gaudi_nic->hdev;
+
+		dev_dbg_ratelimited(hdev->dev,
+			"port: %d, addr: 0x%p, size: %d, tx_pi: %d, tx_ci: %d\n",
+			gaudi_nic->port, skb->data, skb->len,
+			gaudi_nic->tx_pi, gaudi_nic->tx_ci);
+	}
+#endif
+
+	pkt_sent = write_pkt_to_hw(gaudi_nic, (u64 *) skb->data, skb->len);
+	if (pkt_sent) {
+		stats->tx_packets++;
+		stats->tx_bytes += skb->len;
+	}
+
+	dev_kfree_skb_any(skb);
+
+	return NETDEV_TX_OK;
+}
+
+static netdev_tx_t gaudi_nic_xmit_frame(struct sk_buff *skb,
+					struct net_device *netdev)
+{
+	struct gaudi_nic_device **ptr = netdev_priv(netdev);
+	struct gaudi_nic_device *gaudi_nic = *ptr;
+	struct gaudi_device *gaudi = gaudi_nic->hdev->asic_specific;
+
+	return (netdev_tx_t) gaudi->nic_handle_tx(gaudi_nic, skb);
+}
+
+static int gaudi_nic_change_mtu(struct net_device *netdev, int new_mtu)
+{
+	struct gaudi_nic_device **ptr = netdev_priv(netdev);
+	struct gaudi_nic_device *gaudi_nic = *ptr;
+	struct hl_device *hdev = gaudi_nic->hdev;
+	u32 port = gaudi_nic->port;
+	int rc;
+
+#ifndef _HAS_MIN_MAX_MTU
+	if (new_mtu < (ETH_ZLEN + ETH_FCS_LEN + VLAN_HLEN) ||
+			new_mtu > NIC_MAX_MTU)
+		return -EOPNOTSUPP;
+#endif
+
+	if (atomic_cmpxchg(&gaudi_nic->in_reset, 0, 1)) {
+		dev_err(hdev->dev, "port %d is in reset, can't change MTU\n",
+			port);
+		return -EBUSY;
+	}
+
+	if (gaudi_nic->enabled) {
+		port_close(gaudi_nic);
+		netdev->mtu = new_mtu;
+		rc = port_open(gaudi_nic);
+		if (rc)
+			dev_err(hdev->dev,
+				"Failed to reinit port %d for MTU change, rc %d\n",
+				port, rc);
+	}
+
+	atomic_set(&gaudi_nic->in_reset, 0);
+
+	return 0;
+}
+
+static const struct net_device_ops gaudi_nic_netdev_ops = {
+	.ndo_open		= gaudi_nic_open,
+	.ndo_stop		= gaudi_nic_close,
+	.ndo_start_xmit		= gaudi_nic_xmit_frame,
+	.ndo_validate_addr	= eth_validate_addr,
+	.ndo_change_mtu		= gaudi_nic_change_mtu,
+};
+
+static int port_register(struct hl_device *hdev, int port)
+{
+	struct gaudi_device *gaudi = hdev->asic_specific;
+	struct gaudi_nic_device *gaudi_nic = &gaudi->nic_devices[port];
+	struct gaudi_nic_device **ptr;
+	struct net_device *ndev;
+	int rc;
+
+	ndev = alloc_etherdev(sizeof(struct gaudi_nic_device *));
+	if (!ndev) {
+		dev_err(hdev->dev, "netdevice %d alloc failed\n", port);
+		return -ENOMEM;
+	}
+
+	gaudi_nic->ndev = ndev;
+	gaudi_nic->speed = hdev->pldm ? SPEED_50000 : SPEED_100000;
+	gaudi_nic->nic_macro = &gaudi->nic_macros[port >> 1];
+
+	if (gaudi_nic->speed != SPEED_100000 &&
+		gaudi_nic->nic_macro->num_of_lanes == NIC_LANES_4) {
+		dev_err(hdev->dev,
+			"NIC %d with 4 lanes should be used only with speed of 100000Mb/s\n",
+			port);
+		rc = -EFAULT;
+		goto netdev_free;
+	}
+
+	if (gaudi_nic->speed == SPEED_100000 &&
+			gaudi_nic->nic_macro->num_of_lanes == NIC_LANES_4 &&
+			(port & 1)) {
+		dev_err(hdev->dev,
+			"only even NIC ports should be up for speed of 100000Mb/s with 4 lanes\n");
+		rc = -EFAULT;
+		goto netdev_free;
+	}
+
+	gaudi_nic->pfc_enable = true;
+
+	SET_NETDEV_DEV(ndev, hdev->pdev ? &hdev->pdev->dev : NULL);
+	ptr = netdev_priv(ndev);
+	*ptr = gaudi_nic;
+
+	/* this is necessary for creating multiple NICs by the same driver */
+	ndev->dev_port = port;
+
+	ndev->netdev_ops = &gaudi_nic_netdev_ops;
+	ndev->watchdog_timeo = NIC_TX_TIMEOUT;
+	ndev->min_mtu = ETH_MIN_MTU;
+	ndev->max_mtu = NIC_MAX_MTU;
+
+	netif_napi_add(ndev, &gaudi_nic->napi, napi_clean,
+			NIC_NAPI_MAX_RX_BUDGET);
+
+	/* TODO: declare the supported features */
+
+	ether_addr_copy(ndev->dev_addr,
+		hdev->asic_prop.cpucp_nic_info.mac_addrs[port].mac_addr);
+
+	if (register_netdev(ndev)) {
+		dev_err(hdev->dev,
+			"Could not register netdevice, port: %d\n", port);
+		rc = -EFAULT;
+		goto netdev_free;
+	}
+
+	netif_carrier_off(ndev);
+
+	return 0;
+
+netdev_free:
+	free_netdev(ndev);
+	gaudi_nic->ndev = NULL;
+
+	return rc;
+}
+
+static void port_unregister(struct gaudi_nic_device *gaudi_nic)
+{
+	unregister_netdev(gaudi_nic->ndev);
+
+	free_netdev(gaudi_nic->ndev);
+	gaudi_nic->ndev = NULL;
+}
+
+irqreturn_t gaudi_nic_cq_irq_handler(int irq, void *arg)
+{
+	return IRQ_HANDLED;
+}
+
+/**
+ * gaudi_nic_ports_init() - initialize NIC ports.
+ * @hdev: habanalabs device structure.
+ *
+ * Allocate and initialize the NIC ports.
+ *
+ * Return: 0 for success, non-zero for failure.
+ */
+int gaudi_nic_ports_init(struct hl_device *hdev)
+{
+	struct gaudi_device *gaudi = hdev->asic_specific;
+	struct cpucp_info *cpucp_info = &hdev->asic_prop.cpucp_info;
+	struct cpucp_nic_info *nic_info = &hdev->asic_prop.cpucp_nic_info;
+	struct cpucp_mac_addr *mac_arr = nic_info->mac_addrs;
+	s32 *taps;
+	int rc, i, nics_init = 0, cq_irq = 0;
+	u8 mac[ETH_ALEN];
+	bool read_card_location = false;
+
+	if (!hdev->nic_ports_mask)
+		return 0;
+
+	if (NIC_DRV_END_ADDR - NIC_DRV_BASE_ADDR > NIC_DRV_SIZE) {
+		dev_err(hdev->dev,
+			"DRAM allocation for NIC shouldn't exceed %dMB\n",
+			NIC_DRV_SIZE / 1024 / 1024);
+		return -ENOMEM;
+	}
+
+	if (TMR_FSM_SIZE + TMR_FREE_SIZE + TMR_FIFO_SIZE +
+			TMR_FIFO_STATIC_SIZE >
+		TMR_FSM_ENGINE_OFFS) {
+		dev_err(hdev->dev,
+			"NIC TMR data shouldn't be bigger than %dMB\n",
+			TMR_FSM_ENGINE_OFFS / 1024 / 1024);
+		return -ENOMEM;
+	}
+
+	/* set the default PAM4 Tx taps */
+	for (i = 0 ; i < NIC_MAX_NUM_OF_LANES ; i++) {
+		taps = gaudi->nic_pam4_tx_taps[i].taps;
+		taps[0] = 0;
+		taps[1] = -6;
+		taps[2] = 25;
+		taps[3] = 0;
+		taps[4] = 0;
+	}
+
+	/* copy the MAC OUI in reverse */
+	for (i = 0 ; i < 3 ; i++)
+		mac[i] = HABANALABS_MAC_OUI_1 >> (8 * (2 - i));
+
+	if (gaudi->hw_cap_initialized & HW_CAP_CPU_Q) {
+		char buf[VERSION_MAX_LEN] = {0}, *str;
+		u8 *mac_addr;
+
+		rc = hl_fw_cpucp_nic_info_get(hdev);
+		if (rc)
+			return rc;
+
+		for (i = 0 ; i < NIC_NUMBER_OF_PORTS ; i++) {
+			if (!(hdev->nic_ports_mask & BIT(i)))
+				continue;
+
+			mac_addr = mac_arr[i].mac_addr;
+			if (memcmp(mac, mac_addr, 3)) {
+				dev_err(hdev->dev,
+					"bad MAC OUI %02x:%02x:%02x:%02x:%02x:%02x, port %d\n",
+					mac_addr[0], mac_addr[1], mac_addr[2],
+					mac_addr[3], mac_addr[4], mac_addr[5],
+					i);
+				return -EFAULT;
+			}
+		}
+
+		hdev->nic_ports_mask &= le64_to_cpu(nic_info->link_mask[0]);
+		hdev->nic_ports_ext_mask &=
+					le64_to_cpu(nic_info->link_ext_mask[0]);
+		hdev->nic_auto_neg_mask &=
+					le64_to_cpu(nic_info->auto_neg_mask[0]);
+		gaudi->nic_use_fw_polarity = true;
+
+		for (i = 1 ; i < 11 ; i++) {
+			sprintf(buf, "hl-gaudi-0.%d.", i);
+			str = strstr(cpucp_info->kernel_version, buf);
+			if (!str)
+				continue;
+
+			/*
+			 * No PMC polarity and external ports mask prior to F/W
+			 * version 0.9.0.
+			 */
+			if (i < 9) {
+				hdev->nic_ports_ext_mask = HLS1_EXT_PORTS_MASK;
+				gaudi->nic_use_fw_polarity = false;
+			}
+
+			/* No Autoneg mask prior to F/W version 0.11.0, hence:
+			 * - No Autoneg on external ports on PMC card prior to
+			 *   that version.
+			 * - No Autoneg at all on PCI card prior to that
+			 *   version.
+			 */
+			if (hdev->card_type == cpucp_card_type_pmc)
+				hdev->nic_auto_neg_mask = hdev->nic_ports_mask &
+						~hdev->nic_ports_ext_mask;
+			else
+				hdev->nic_auto_neg_mask = 0;
+
+			/*
+			 * No privileged protection prior to F/W version 0.11.0
+			 * so we can read the card location from a register.
+			 */
+			read_card_location = true;
+			break;
+		}
+	} else {
+		/*
+		 * No CPU, hence set the MAC addresses manually.
+		 * Each device gets its own randomly generated MAC addresses.
+		 */
+		get_random_bytes(&mac[3], 2);
+
+		for (i = 0 ; i < NIC_NUMBER_OF_PORTS ; i++) {
+			mac[ETH_ALEN - 1] = i;
+			memcpy(mac_arr[i].mac_addr, mac, ETH_ALEN);
+		}
+
+		read_card_location = true;
+	}
+
+	if (read_card_location) {
+		u32 card_location = RREG32(mmPSOC_GLOBAL_CONF_BOOT_STRAP_PINS);
+
+		cpucp_info->card_location =
+				cpu_to_le32((card_location >> 22) & 0x7);
+	}
+
+	for (i = 0 ; i < NIC_NUMBER_OF_MACROS ; i++) {
+		gaudi->nic_macros[i].idx = i;
+		gaudi->nic_macros[i].num_of_lanes = NIC_LANES_2;
+	}
+
+	/*
+	 * for each NIC macro, set the even port to handle the macro
+	 * configuration, unless the even port is disabled and in this case the
+	 * odd port will handle the configuration.
+	 */
+	for (i = 0 ; i < NIC_NUMBER_OF_PORTS ; i++)
+		if ((hdev->nic_ports_mask & BIT(i)) &&
+			(!(i & 1) || !(hdev->nic_ports_mask & BIT(i - 1))))
+			gaudi->nic_devices[i].do_macro_cfg = true;
+
+	gaudi->nic_pcs_fail_time_frame = PCS_FAIL_TIME_FRAME_SEC;
+	gaudi->nic_pcs_fail_threshold = PCS_FAIL_THRESHOLD;
+	gaudi->nic_check_link = true;
+
+	if ((hdev->pdev) && (gaudi->multi_msi_mode)) {
+		/* One IRQ for all ports to indicate a CQ overrun */
+		cq_irq = pci_irq_vector(hdev->pdev, CQ_MSI_IDX);
+		rc = request_irq(cq_irq, gaudi_nic_cq_irq_handler, 0,
+					"gaudi nic cq", hdev);
+		if (rc) {
+			dev_err(hdev->dev, "Failed to request CQ IRQ %d, %d\n",
+				cq_irq, rc);
+			return rc;
+		}
+
+		gaudi->nic_cq_irq_enable = true;
+	}
+
+	/* Must be called here as it depends on the earlier initializations */
+	for (i = 0 ; i < NIC_NUMBER_OF_PORTS ; i++, nics_init++)
+		if (hdev->nic_ports_mask & BIT(i)) {
+			rc = port_register(hdev, i);
+			if (rc) {
+				dev_err(hdev->dev, "NIC port %d init failed\n",
+							i);
+				goto unregister_ports;
+			}
+		}
+
+	gaudi->hw_cap_initialized |= HW_CAP_NIC_DRV;
+
+	return 0;
+
+unregister_ports:
+	for (i = 0 ; i < nics_init ; i++)
+		if (hdev->nic_ports_mask & BIT(i))
+			port_unregister(&gaudi->nic_devices[i]);
+
+	if (gaudi->nic_cq_irq_enable) {
+		synchronize_irq(cq_irq);
+		free_irq(cq_irq, hdev);
+	}
+
+	return rc;
+}
+
+/**
+ * gaudi_nic_ports_fini() - cleanup NIC ports.
+ * @hdev: habanalabs device structure.
+ *
+ * Perform cleanup and freeing of the NIC ports.
+ */
+void gaudi_nic_ports_fini(struct hl_device *hdev)
+{
+	struct gaudi_device *gaudi = hdev->asic_specific;
+	int i, cq_irq;
+
+	gaudi->nic_in_teardown = true;
+
+	/* The HW_CAP_NIC_DRV bit of gaudi->hw_cap_initialized cannot be used as
+	 * a prerequisite for this function, as we may arrive here after a
+	 * failed hard reset without calling gaudi_nic_ports_reopen().
+	 */
+
+	for (i = 0 ; i < NIC_NUMBER_OF_PORTS ; i++) {
+		if (!(hdev->nic_ports_mask & BIT(i)) ||
+				!gaudi->nic_devices[i].ndev)
+			continue;
+
+		port_unregister(&gaudi->nic_devices[i]);
+	}
+
+	if (gaudi->nic_cq_irq_enable) {
+		cq_irq = pci_irq_vector(hdev->pdev, CQ_MSI_IDX);
+		synchronize_irq(cq_irq);
+		free_irq(cq_irq, hdev);
+		gaudi->nic_cq_irq_enable = false;
+	}
+}
+
+/**
+ * gaudi_nic_hard_reset_prepare() - stop the NIC Rx, Tx, CQ and synchronize
+ *                                  with other NIC reset flows.
+ * @hdev: habanalabs device structure.
+ *
+ * This function makes sure that during the reset no packets will be processed
+ * and that ndo_open/ndo_stop do not open/close the NIC.
+ * A hard reset might occur right after the driver was loaded, which means
+ * before the NICs initialization was finished. Therefore, even if the NIC is
+ * not yet enabled, we mark it as in reset to avoid races. We clear the in reset
+ * flag later on when reopening the NICs.
+ *
+ * Return: 0 for success, non-zero for failure.
+ */
+int gaudi_nic_hard_reset_prepare(struct hl_device *hdev)
+{
+	struct gaudi_device *gaudi = hdev->asic_specific;
+	struct gaudi_nic_device *gaudi_nic;
+	ktime_t timeout;
+	int i;
+
+	if (!(gaudi->hw_cap_initialized & HW_CAP_NIC_DRV) ||
+			(gaudi->nic_in_reset))
+		return 0;
+
+	for (i = 0 ; i < NIC_NUMBER_OF_PORTS ; i++) {
+		if (!(hdev->nic_ports_mask & BIT(i)))
+			continue;
+
+		gaudi_nic = &gaudi->nic_devices[i];
+
+		/*
+		 * This function competes with the NIC reset from ethtool, so
+		 * try to take the in_reset atomic, and if a reset is already
+		 * in progress, wait until it finishes.
+		 * The reset function is designed to always finish (it could
+		 * take up to a few seconds in the worst case).
+		 */
+
+		timeout = ktime_add_ms(ktime_get(),
+					HL_PENDING_RESET_PER_SEC * 1000 * 4);
+		while (atomic_cmpxchg(&gaudi_nic->in_reset, 0, 1)) {
+			usleep_range(50, 200);
+			if (ktime_compare(ktime_get(), timeout) > 0) {
+				WARN(1,
+					"Timeout while waiting for port %d to finish reset\n",
+					gaudi_nic->port);
+				return -EBUSY;
+			}
+		}
+	}
+
+	gaudi->nic_in_reset = true;
+
+	return 0;
+}
+
+/**
+ * gaudi_nic_stop() - stop the NIC S/W and H/W.
+ * @hdev: habanalabs device structure.
+ *
+ * This function stops the operation of the NIC S/W and H/W, no packets are
+ * processed after this call.
+ */
+void gaudi_nic_stop(struct hl_device *hdev)
+{
+	struct gaudi_device *gaudi = hdev->asic_specific;
+	struct gaudi_nic_device *gaudi_nic;
+	int i, cq_irq;
+
+	if (!(gaudi->hw_cap_initialized & HW_CAP_NIC_DRV))
+		return;
+
+	for (i = 0 ; i < NIC_NUMBER_OF_PORTS ; i++) {
+		gaudi_nic = &gaudi->nic_devices[i];
+
+		if ((hdev->nic_ports_mask & BIT(i)) && gaudi_nic->enabled)
+			port_close(gaudi_nic);
+	}
+
+	if (gaudi->nic_cq_irq_enable) {
+		cq_irq = pci_irq_vector(hdev->pdev, CQ_MSI_IDX);
+		synchronize_irq(cq_irq);
+		free_irq(cq_irq, hdev);
+		gaudi->nic_cq_irq_enable = false;
+	}
+}
+
+/**
+ * gaudi_nic_ports_reopen() - reopen the NIC ports.
+ * @hdev: habanalabs device structure.
+ *
+ * This function starts the operation of the NIC ports; packets will be
+ * processed after this call.
+ * Called after hard reset to reopen the NIC ports that were closed during the
+ * reset.
+ */
+void gaudi_nic_ports_reopen(struct hl_device *hdev)
+{
+	struct gaudi_device *gaudi = hdev->asic_specific;
+	struct gaudi_nic_device *gaudi_nic;
+	u32 port;
+	int rc, i, nics_init = 0, cq_irq;
+
+	if (gaudi->hw_cap_initialized & HW_CAP_NIC_DRV)
+		return;
+
+	if ((hdev->pdev) && (gaudi->multi_msi_mode)) {
+		/* One IRQ for all ports to indicate a CQ overrun */
+		cq_irq = pci_irq_vector(hdev->pdev, CQ_MSI_IDX);
+		rc = request_irq(cq_irq, gaudi_nic_cq_irq_handler, 0,
+					"gaudi nic cq", hdev);
+		if (rc)
+			dev_err(hdev->dev, "Failed to request CQ IRQ %d, %d\n",
+				cq_irq, rc);
+		else
+			gaudi->nic_cq_irq_enable = true;
+	}
+
+	for (i = 0 ; i < NIC_NUMBER_OF_PORTS ; i++, nics_init++) {
+		if (!(hdev->nic_ports_mask & BIT(i)))
+			continue;
+
+		gaudi_nic = &gaudi->nic_devices[i];
+		port = gaudi_nic->port;
+
+		/*
+		 * It could be that the port was shut down by 'ifconfig down',
+		 * and there is no need to reopen it.
+		 * Since we mark the ports as in reset even if they are
+		 * disabled, we clear the flag here anyway.
+		 * See gaudi_nic_hard_reset_prepare() for more info.
+		 */
+		if (!gaudi_nic->enabled) {
+			atomic_set(&gaudi_nic->in_reset, 0);
+			continue;
+		}
+
+		schedule_delayed_work(&gaudi_nic->port_open_work,
+					msecs_to_jiffies(1));
+	}
+
+	gaudi->nic_in_reset = false;
+
+	gaudi->hw_cap_initialized |= HW_CAP_NIC_DRV;
+}
+
+void gaudi_nic_ctx_fini(struct hl_ctx *ctx)
+{
+}
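
Note on the in_reset handling in gaudi_nic.c above: ndo_open, ndo_stop,
the MTU change and the hard-reset prepare flow all serialize on a per-port
atomic flag that is claimed with a 0 -> 1 compare-and-swap and released by
writing 0. The following is only a minimal userspace sketch of that pattern
using C11 atomics, not driver code. The names port_guard, port_guard_try_claim,
port_guard_release, port_guard_claim_bounded and port_guard_demo are
hypothetical, and a plain retry count stands in for the ktime-based timeout
and the usleep_range() call used by gaudi_nic_hard_reset_prepare().

```c
/* Userspace sketch of the in_reset guard using C11 atomics; not driver code. */
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the in_reset field of struct gaudi_nic_device. */
struct port_guard {
	atomic_int in_reset;	/* 0 = free, 1 = some flow owns the port */
};

/*
 * Claim the port. Like "if (atomic_cmpxchg(&in_reset, 0, 1)) return -EBUSY;",
 * this succeeds only for the caller that flips the flag from 0 to 1.
 */
static bool port_guard_try_claim(struct port_guard *g)
{
	int expected = 0;

	return atomic_compare_exchange_strong(&g->in_reset, &expected, 1);
}

/* Release the port, like atomic_set(&in_reset, 0). */
static void port_guard_release(struct port_guard *g)
{
	atomic_store(&g->in_reset, 0);
}

/*
 * Bounded wait, like the loop in gaudi_nic_hard_reset_prepare(). A retry
 * count stands in for the ktime-based timeout (assumption of this sketch).
 */
static bool port_guard_claim_bounded(struct port_guard *g, int max_tries)
{
	while (max_tries-- > 0)
		if (port_guard_try_claim(g))
			return true;

	return false;
}

/* Walks through one possible interleaving; returns 0 on success. */
static int port_guard_demo(void)
{
	struct port_guard g = { 0 };

	assert(port_guard_try_claim(&g));		/* "open" takes the port */
	assert(!port_guard_try_claim(&g));		/* "close" would get -EBUSY */
	assert(!port_guard_claim_bounded(&g, 3));	/* reset prepare gives up */

	port_guard_release(&g);				/* open work finished */
	assert(port_guard_claim_bounded(&g, 3));	/* reset prepare now wins */

	return 0;
}
```

In the sketch, as in the patch, whoever wins the compare-and-swap is
responsible for clearing the flag again. In the driver, gaudi_nic_open()
leaves that to port_open_work(), which is why the flag stays set between
the two.
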
diff --git a/drivers/misc/habanalabs/gaudi/gaudi_nic.h b/drivers/misc/habanalabs/gaudi/gaudi_nic.h
new file mode 100644
index 000000000000..34bcf0514d30
--- /dev/null
+++ b/drivers/misc/habanalabs/gaudi/gaudi_nic.h
@@ -0,0 +1,337 @@
+/* SPDX-License-Identifier: GPL-2.0
+ *
+ * Copyright 2018-2020 HabanaLabs, Ltd.
+ * All Rights Reserved.
+ *
+ */
+
+#ifndef GAUDI_NIC_DRV_H_
+#define GAUDI_NIC_DRV_H_
+
+#include "gaudiP.h"
+#include "../include/gaudi/gaudi_fw_if.h"
+
+/* Time in jiffies before concluding the transmitter is hung */
+#define NIC_TX_TIMEOUT			(5 * HZ)
+
+#define NIC_RX_SIZE			1024
+#define NIC_NAPI_MAX_RX_BUDGET		64
+#define NIC_MAX_PKT_SIZE		2048
+#define NIC_ARP_PKT_SIZE		28
+
+#if (NIC_MAX_PKT_SIZE & (NIC_MAX_PKT_SIZE - 1))
+#error "Max ETH packet size is not a power of 2"
+#endif
+
+#define ETH_P_LLDP		0x88CC
+
+#define NIC_MACRO_CFG_SIZE	(mmNIC1_QM0_GLBL_CFG0 - mmNIC0_QM0_GLBL_CFG0)
+#define NIC_CFG_SIZE		(mmNIC0_QPC1_REQ_STATIC_CONFIG - \
+					mmNIC0_QPC0_REQ_STATIC_CONFIG)
+
+#define NIC_MAX_QP_NUM		(HL_NIC_MAX_CONN_ID + 1)
+#define NIC_HW_MAX_QP_NUM	0x8000 /* 32K */
+
+#if (NIC_MAX_QP_NUM > NIC_HW_MAX_QP_NUM)
+#error "Number of available QPs must be smaller than or equal to NIC_HW_MAX_QP_NUM"
+#endif
+
+/* The '*_SIZE' defines are per NIC port */
+#define REQ_QPC_BASE_SIZE	(NIC_MAX_QP_NUM * sizeof(struct qpc_requester))
+#define RES_QPC_BASE_SIZE	(NIC_MAX_QP_NUM * sizeof(struct qpc_responder))
+#define SWQ_BASE_SIZE		(WQ_BUFFER_SIZE * sizeof(struct sq_wqe))
+#define SB_BASE_SIZE		(WQ_BUFFER_SIZE * NIC_MAX_PKT_SIZE)
+
+#define TMR_BASE_SIZE		(TMR_FSM_ENGINE_OFFS + TMR_FSM_SIZE)
+
+#define TMR_FSM_ENGINE_OFFS	(1 << 22) /* H/W constraint */
+
+#define TMR_FSM_SIZE		ALIGN(NIC_HW_MAX_QP_NUM, DEVICE_CACHE_LINE_SIZE)
+#define TMR_FREE_SIZE		ALIGN(TMR_FREE_NUM_ENTRIES * 4, \
+					DEVICE_CACHE_LINE_SIZE)
+/* each timer serves two NICs, hence multiply by 2 */
+#define TMR_FIFO_SIZE		ALIGN((NIC_MAX_QP_NUM * 2 * 4), \
+					DEVICE_CACHE_LINE_SIZE)
+#define TMR_FIFO_STATIC_SIZE	(DEVICE_CACHE_LINE_SIZE * TMR_GRANULARITY)
+
+#define TMR_FSM0_OFFS		0
+#define TMR_FREE_OFFS		(TMR_FSM0_OFFS + TMR_FSM_SIZE)
+#define TMR_FIFO_OFFS		(TMR_FREE_OFFS + TMR_FREE_SIZE)
+#define TMR_FSM1_OFFS		(TMR_FSM0_OFFS + TMR_FSM_ENGINE_OFFS)
+
+#define TMR_FREE_NUM_ENTRIES	(TMR_FIFO_SIZE / DEVICE_CACHE_LINE_SIZE)
+#define TMR_GRANULARITY		128
+
+#define TXS_BASE_SIZE		(TXS_FREE_SIZE + TXS_FIFO_SIZE + \
+					TXS_FIFO_STATIC_SIZE)
+
+
+#define TXS_FREE_SIZE		ALIGN(TXS_FREE_NUM_ENTRIES * 4, \
+					DEVICE_CACHE_LINE_SIZE)
+/* TXS serves requester and responder QPs, hence multiply by 2 */
+#define TXS_FIFO_SIZE		ALIGN((NIC_MAX_QP_NUM * 2 * 4), \
+					DEVICE_CACHE_LINE_SIZE)
+#define TXS_FIFO_STATIC_SIZE	(DEVICE_CACHE_LINE_SIZE * TXS_GRANULARITY)
+
+#define TXS_FREE_OFFS		0
+#define TXS_FIFO_OFFS		(TXS_FREE_OFFS + TXS_FREE_SIZE)
+
+#define TXS_FREE_NUM_ENTRIES	(TXS_FIFO_SIZE / DEVICE_CACHE_LINE_SIZE)
+#define TXS_GRANULARITY		256
+#define TXS_SCHEDQ		256
+
+#define SECTION_ALIGN_SIZE	0x100000ull
+#define NIC_DRV_BASE_ADDR	ALIGN(NIC_DRV_ADDR, SECTION_ALIGN_SIZE)
+
+#define REQ_QPC_BASE_ADDR	NIC_DRV_BASE_ADDR
+
+#define RES_QPC_BASE_ADDR	ALIGN(REQ_QPC_BASE_ADDR + \
+					NIC_NUMBER_OF_ENGINES * \
+					REQ_QPC_BASE_SIZE, SECTION_ALIGN_SIZE)
+
+#define TMR_BASE_ADDR		ALIGN(RES_QPC_BASE_ADDR + \
+					NIC_NUMBER_OF_ENGINES * \
+					RES_QPC_BASE_SIZE, SECTION_ALIGN_SIZE)
+
+#define TXS_BASE_ADDR		ALIGN(TMR_BASE_ADDR + \
+					NIC_NUMBER_OF_MACROS * \
+					TMR_BASE_SIZE, SECTION_ALIGN_SIZE)
+
+#define SWQ_BASE_ADDR		ALIGN(TXS_BASE_ADDR + \
+					NIC_NUMBER_OF_ENGINES * \
+					TXS_BASE_SIZE, SECTION_ALIGN_SIZE)
+
+#define SB_BASE_ADDR		ALIGN(SWQ_BASE_ADDR + \
+					NIC_MAX_NUMBER_OF_PORTS * \
+					SWQ_BASE_SIZE, SECTION_ALIGN_SIZE)
+
+#define NIC_DRV_END_ADDR	ALIGN(SB_BASE_ADDR + NIC_MAX_NUMBER_OF_PORTS * \
+					SB_BASE_SIZE, SECTION_ALIGN_SIZE)
+
+#define WQ_BUFFER_LOG_SIZE		8
+#define WQ_BUFFER_SIZE			(1 << WQ_BUFFER_LOG_SIZE)
+#define CQ_PORT_BUF_LEN			(1 << 18)
+#define CQE_SIZE			sizeof(struct cqe)
+#define CQ_PORT_BUF_SIZE		(CQ_PORT_BUF_LEN * CQE_SIZE)
+#define CQ_USER_MAX_SIZE		(1 << 30) /* 1GB */
+#define CQ_USER_MIN_ENTRIES		128
+#define CQ_USER_MAX_ENTRIES		(CQ_USER_MAX_SIZE / CQE_SIZE)
+#define QP_ERR_BUF_SIZE			(QP_ERR_SIZE * QP_ERR_BUF_LEN)
+#define QP_ERR_SIZE			sizeof(struct qp_err)
+#define QP_ERR_BUF_LEN			1024
+#define RX_PKT_MAX_SIZE			2048
+#define QPC_RES_LOG_BUF_SIZE_MASK	10
+#define RAW_QPN				0
+#define RX_MSI_IDX			(GAUDI_EVENT_QUEUE_MSI_IDX + 1)
+#define RX_MSI_ADDRESS			(mmPCIE_MSI_INTR_0 + RX_MSI_IDX * 4)
+#define CQ_MSI_IDX			(NUMBER_OF_CMPLT_QUEUES + \
+						NUMBER_OF_CPU_HW_QUEUES + \
+						NIC_NUMBER_OF_ENGINES)
+#define CQ_MSI_ADDRESS			(mmPCIE_MSI_INTR_0 + CQ_MSI_IDX * 4)
+
+#define WQE_MAX_SIZE			max(NIC_SEND_WQE_SIZE, \
+						NIC_RECV_WQE_SIZE)
+#define USER_WQES_MAX_NUM		(1 << 21) /* 2M entries */
+#define USER_WQ_ARR_MAX_SIZE		ALIGN((1ull * NIC_HW_MAX_QP_NUM * \
+					USER_WQES_MAX_NUM * \
+						WQE_MAX_SIZE), PAGE_SIZE_2MB)
+
+#define CQ_VIRTUAL_ADDRESS		VA_NIC_MEM_ADDR
+
+#define USER_SWQ_VIRTUAL_ADDRESS	ALIGN(CQ_VIRTUAL_ADDRESS + \
+					NIC_NUMBER_OF_ENGINES * \
+						CQ_PORT_BUF_SIZE, \
+							SECTION_ALIGN_SIZE)
+
+#define USER_RWQ_VIRTUAL_ADDRESS	ALIGN(USER_SWQ_VIRTUAL_ADDRESS + \
+					NIC_NUMBER_OF_ENGINES * \
+						USER_WQ_ARR_MAX_SIZE, \
+							SECTION_ALIGN_SIZE)
+
+#define REQ_QPC_ADDR(port, conn_id) \
+	(REQ_QPC_BASE_ADDR + (port) * REQ_QPC_BASE_SIZE + (conn_id) * \
+			sizeof(struct qpc_requester))
+
+#define RES_QPC_ADDR(port, conn_id) \
+	(RES_QPC_BASE_ADDR + (port) * RES_QPC_BASE_SIZE + (conn_id) * \
+			sizeof(struct qpc_responder))
+
+#define NIC_DR_10		1031250
+#define NIC_DR_25		2578125
+#define NIC_DR_26		2656250
+#define NIC_DR_50		5312500
+
+#define NIC_LANES_2		2
+#define NIC_LANES_4		4
+
+/*
+ * change WQ_BUFFER_LOG_SIZE to log2(SWQ_BASE_SIZE/WQE_BB_SIZE).
+ * can use WQ_BUFFER_SIZE/WQE_BB_SIZE instead.
+ */
+
+enum ts_type {
+	TS_RC = 0,
+	TS_RAW = 1
+};
+
+enum wqe_opcode {
+	WQE_NOP = 0,
+	WQE_SEND = 1,
+	WQE_LINEAR = 2,
+	WQE_STRIDE = 3,
+	WQE_MULTI_STRIDE = 4,
+	WQE_RATE_UPDATE  = 5
+};
+
+enum trust_level {
+	UNSECURED = 0,
+	SECURED = 1,
+	PRIVILEGE = 2
+};
+
+struct qpc_requester {
+	u64	data[8];
+};
+
+#define QPC_SET(qpc, idx, shift, val, len) \
+		((qpc).data[idx] |= (u64) ((val) & (BIT(len) - 1)) << (shift))
+
+#define REQ_QPC_SET_DST_QP(req, val)		QPC_SET(req, 0, 0, val, 24)
+#define REQ_QPC_SET_PORT(req, val)		QPC_SET(req, 0, 24, val, 4)
+#define REQ_QPC_SET_PRIORITY(req, val)		QPC_SET(req, 0, 28, val, 2)
+#define REQ_QPC_SET_RKEY(req, val)		QPC_SET(req, 0, 32, val, 32)
+#define REQ_QPC_SET_DST_IP(req, val)		QPC_SET(req, 1, 0, val, 32)
+#define REQ_QPC_SET_SRC_IP(req, val)		QPC_SET(req, 1, 32, val, 32)
+#define REQ_QPC_SET_DST_MAC_31_0(req, val)	QPC_SET(req, 2, 0, val, 32)
+#define REQ_QPC_SET_DST_MAC_47_32(req, val)	QPC_SET(req, 2, 32, val, 16)
+#define REQ_QPC_SET_SQ_NUM(req, val)		QPC_SET(req, 3, 24, val, 8)
+#define REQ_QPC_SET_TM_GRANULARITY(req, val)	QPC_SET(req, 3, 56, val, 7)
+#define REQ_QPC_SET_SOB_EN(req, val)		QPC_SET(req, 3, 63, val, 1)
+#define REQ_QPC_SET_TRANSPORT_SERVICE(req, val)	QPC_SET(req, 5, 49, val, 1)
+#define REQ_QPC_SET_BURST_SIZE(req, val)	QPC_SET(req, 5, 50, val, 22)
+#define REQ_QPC_SET_LAST_IDX(req, val)		QPC_SET(req, 6, 8, val, 22)
+#define REQ_QPC_SET_SWQ_GRANULARITY(req, val)	QPC_SET(req, 7, 58, val, 1)
+#define REQ_QPC_SET_WQ_BASE_ADDR(req, val)	QPC_SET(req, 7, 32, val, 24)
+#define REQ_QPC_SET_SECURED(req, val)		QPC_SET(req, 7, 59, val, 2)
+#define REQ_QPC_SET_VALID(req, val)		QPC_SET(req, 7, 63, val, 1)
+
+struct qpc_responder {
+	u64	data[4];
+};
+
+#define RES_QPC_SET_DST_QP(res, val)		QPC_SET(res, 0, 0, val, 24)
+#define RES_QPC_SET_PORT(res, val)		QPC_SET(res, 0, 24, val, 4)
+#define RES_QPC_SET_PRIORITY(res, val)		QPC_SET(res, 0, 28, val, 2)
+#define RES_QPC_SET_SQ_NUM(res, val)		QPC_SET(res, 2, 48, val, 8)
+#define RES_QPC_SET_LKEY(res, val)		QPC_SET(res, 0, 32, val, 32)
+#define RES_QPC_SET_DST_IP(res, val)		QPC_SET(res, 1, 0, val, 32)
+#define RES_QPC_SET_SRC_IP(res, val)		QPC_SET(res, 1, 32, val, 32)
+#define RES_QPC_SET_DST_MAC_31_0(res, val)	QPC_SET(res, 2, 0, val, 32)
+#define RES_QPC_SET_DST_MAC_47_32(res, val)	QPC_SET(res, 2, 32, val, 16)
+#define RES_QPC_SET_TRANSPORT_SERVICE(res, val)	QPC_SET(res, 2, 63, val, 1)
+#define RES_QPC_SET_LOG_BUF_SIZE_MASK(res, val)	QPC_SET(res, 3, 24, val, 5)
+#define RES_QPC_SET_SOB_EN(res, val)		QPC_SET(res, 3, 59, val, 1)
+#define RES_QPC_SET_VALID(res, val)		QPC_SET(res, 3, 63, val, 1)
+#define RES_QPC_SET_SECURED(res, val)		QPC_SET(res, 3, 60, val, 2)
+
+/**
+ * struct hl_qp - Describes a NIC Queue Pair.
+ * @qpc_lock: Mutex to protect accessing the QP context.
+ * @refcount: Reference counter for the QP usage.
+ * @gaudi_nic: Pointer to NIC device this QP belongs to.
+ * @port: The port number this QP belongs to.
+ * @conn_id: The QP number within its port.
+ * @local_key: Key for local access.
+ * @remote_key: Key for remote access.
+ * @is_req: true if the requester context was set for the QP.
+ * @is_res: true if the responder context was set for the QP.
+ */
+struct hl_qp {
+	struct mutex qpc_lock;
+	struct kref refcount;
+	struct gaudi_nic_device *gaudi_nic;
+	u32 port;
+	u32 conn_id;
+	u32 local_key;
+	u32 remote_key;
+	u8 is_req;
+	u8 is_res;
+};
+
+struct sq_wqe {
+	u64	data[4];
+};
+
+#define CFG_SQ_WQE_OPCODE(swq, val) \
+						((swq).data[0] |= (val) << 28)
+#define CFG_SQ_WQE_LOCAL_ADDRESS_31_0(swq, val) \
+						((swq).data[0] |= (val) << 32)
+#define CFG_SQ_WQE_LOCAL_ADDRESS_49_32(swq, val) \
+						((swq).data[1] |= (val))
+#define CFG_SQ_WQE_SIZE(swq, val) \
+						((swq).data[1] |= (val) << 18)
+
+struct cqe {
+	u64	data;
+};
+
+#define CQE_IS_VALID(cqe)		(((cqe)->data >> 63) & 1)
+#define CQE_TYPE(cqe)			(((cqe)->data >> 23) & 1)
+#define CQE_RES_NIC(cqe)		(((cqe)->data >> 10) & 1)
+#define CQE_RES_IMDT_21_0(cqe)		(((cqe)->data >> 32) & 0x3FFFFF)
+#define CQE_RES_IMDT_31_22(cqe)		((cqe)->data & 0x3FF)
+#define CQE_REQ_WQE_IDX(cqe)		(((cqe)->data >> 32) & 0x3FFFFF)
+#define CQE_REQ_QPN(cqe)		((cqe)->data & 0x7FFFFF)
+#define CQE_SET_INVALID(cqe)		((cqe)->data &= ~(1ull << 63))
+
+struct qp_err {
+	u32	data;
+};
+
+#define QP_ERR_QP_NUM(qp_err)		((qp_err).data & 0xFFFFFF)
+#define QP_ERR_ERR_NUM(qp_err)		(((qp_err).data >> 24) & 0x7F)
+#define QP_ERR_IS_REQ(qp_err)		(((qp_err).data >> 31) & 1)
+
+/*
+ * Some registers are specific to each NIC port, and some are shared by the
+ * whole NIC macro (a pair of even and odd ports).
+ * Therefore we need different methods to handle these registers.
+ */
+
+/* read/write port specific registers */
+#define NIC_CFG_BASE(port) \
+			((u64) (NIC_MACRO_CFG_SIZE * (u64) ((port) >> 1) + \
+					NIC_CFG_SIZE * (u64) ((port) & 1)))
+
+#define NIC_RREG32(reg)		RREG32(NIC_CFG_BASE(gaudi_nic->port) + (reg))
+#define NIC_WREG32(reg, val)	WREG32(NIC_CFG_BASE(gaudi_nic->port) + (reg), \
+					(val))
+#define NIC_RMWREG32(reg, val, mask)	\
+		RMWREG32(NIC_CFG_BASE(gaudi_nic->port) + (reg), (val), (mask))
+
+/* read/write shared registers */
+#define NIC_MACRO_CFG_BASE(port) \
+			((u64) (NIC_MACRO_CFG_SIZE * (u64) ((port) >> 1)))
+
+#define NIC_MACRO_RREG32_PORT(reg, port) \
+			RREG32(NIC_MACRO_CFG_BASE(port) + reg)
+#define NIC_MACRO_WREG32_PORT(reg, val, port) \
+			WREG32(NIC_MACRO_CFG_BASE(port) + reg, val)
+
+#define NIC_MACRO_RREG32(reg) NIC_MACRO_RREG32_PORT(reg, gaudi_nic->port)
+#define NIC_MACRO_WREG32(reg, val) \
+				NIC_MACRO_WREG32_PORT(reg, val, gaudi_nic->port)
+
+extern const struct ethtool_ops gaudi_nic_ethtool_ops;
+extern const struct dcbnl_rtnl_ops gaudi_nic_dcbnl_ops;
+
+void gaudi_nic_debugfs_init(struct hl_device *hdev);
+void gaudi_nic_set_pfc(struct gaudi_nic_device *gaudi_nic);
+u32 gaudi_nic_mac_read(struct gaudi_nic_device *gaudi_nic, int mac,
+			char *cfg_type, u32 addr);
+int gaudi_nic_port_reset(struct gaudi_nic_device *gaudi_nic);
+bool disabled_or_in_reset(struct gaudi_nic_device *gaudi_nic);
+u64 gaudi_nic_read_mac_stat_counter(struct hl_device *hdev, u32 port, int idx,
+					bool is_rx);
+
+#endif /* GAUDI_NIC_DRV_H_ */
diff --git a/drivers/misc/habanalabs/goya/goya.c b/drivers/misc/habanalabs/goya/goya.c
index 46a900fb3ef8..8e15d9f85af8 100644
--- a/drivers/misc/habanalabs/goya/goya.c
+++ b/drivers/misc/habanalabs/goya/goya.c
@@ -5270,6 +5270,11 @@ static int goya_ctx_init(struct hl_ctx *ctx)
 	return 0;
 }
 
+static void goya_ctx_fini(struct hl_ctx *ctx)
+{
+
+}
+
 u32 goya_get_queue_id_for_cq(struct hl_device *hdev, u32 cq_idx)
 {
 	return cq_idx;
@@ -5383,6 +5388,7 @@ static const struct hl_asic_funcs goya_funcs = {
 	.wreg = hl_wreg,
 	.halt_coresight = goya_halt_coresight,
 	.ctx_init = goya_ctx_init,
+	.ctx_fini = goya_ctx_fini,
 	.get_clk_rate = goya_get_clk_rate,
 	.get_queue_id_for_cq = goya_get_queue_id_for_cq,
 	.read_device_fw_version = goya_read_device_fw_version,
diff --git a/include/uapi/misc/habanalabs.h b/include/uapi/misc/habanalabs.h
index 69fb44d35292..e8a5b62b95dd 100644
--- a/include/uapi/misc/habanalabs.h
+++ b/include/uapi/misc/habanalabs.h
@@ -827,6 +827,9 @@ struct hl_debug_args {
 	__u32 ctx_id;
 };
 
+#define HL_NIC_MIN_CONN_ID	1
+#define HL_NIC_MAX_CONN_ID	1023
+
 /*
  * Various information operations such as:
  * - H/W IP information
-- 
2.17.1



* [PATCH 06/15] habanalabs/gaudi: add NIC PHY code
  2020-09-10 16:11 [PATCH 00/15] Adding GAUDI NIC code to habanalabs driver Oded Gabbay
                   ` (3 preceding siblings ...)
  2020-09-10 16:11 ` [PATCH 05/15] habanalabs/gaudi: add NIC Ethernet support Oded Gabbay
@ 2020-09-10 16:11 ` Oded Gabbay
  2020-09-10 16:11 ` [PATCH 07/15] habanalabs/gaudi: allow user to get MAC addresses in INFO IOCTL Oded Gabbay
                   ` (9 subsequent siblings)
  14 siblings, 0 replies; 44+ messages in thread
From: Oded Gabbay @ 2020-09-10 16:11 UTC (permalink / raw)
  To: linux-kernel, netdev; +Cc: SW_Drivers, gregkh, davem, kuba, Omer Shpigelman

From: Omer Shpigelman <oshpigelman@habana.ai>

Configure the NIC PHY (physical layer). The PHY is configured with the
correct polarity and Tx taps depending on the card type.

After the initial configuration, the PHY flow contains the following:
- Auto-negotiation (if enabled)
- PHY F/W tuning
- Physical Coding Sublayer (PCS) link check

After the initial PCS link is acquired, it is checked periodically. Once we
detect that the link is lost, we fall back to PHY F/W tuning, or even to
Auto-negotiation, to re-acquire the link.

Currently we use Auto-negotiation only because it is a prerequisite for
link training (physical layer quality improvement), not for setting the
transmission parameters. As a result, Auto-negotiation is currently
supported only between Gaudi cards.

Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
---
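The PHY flow described in the commit message can be read as a small state
machine driven by a periodic work item. The sketch below is a simplified,
stand-alone model of that flow and is not part of the patch: struct
link_model, enum link_step, next_step(), poll_interval_ms() and
model_reconfig() are hypothetical names used only for illustration. The
polling intervals (1000/500/1 ms) are the ones used by check_link_status()
in this patch. That a reconfiguration clears all three progress flags is an
assumption about port_reset_state(), whose body is not shown in this diff.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified model of the per-port link flags. */
struct link_model {
	bool auto_neg_enable;   /* Auto-negotiation was requested */
	bool auto_neg_resolved; /* Auto-negotiation has completed */
	bool phy_fw_tuned;      /* PHY F/W tuning has completed */
	bool pcs_link;          /* PCS link is currently up */
};

enum link_step {
	STEP_AUTO_NEG,    /* run Auto-negotiation and link training */
	STEP_FW_TUNING,   /* run PHY F/W tuning */
	STEP_ACQUIRE_PCS, /* wait for the PCS link to come up */
	STEP_CHECK_PCS    /* periodic check of an established link */
};

/* Pick the step the periodic work would run for the current flags. */
static enum link_step next_step(const struct link_model *m)
{
	assert(m);

	if (!m->phy_fw_tuned) {
		if (m->auto_neg_enable && !m->auto_neg_resolved)
			return STEP_AUTO_NEG;
		return STEP_FW_TUNING;
	}

	return m->pcs_link ? STEP_CHECK_PCS : STEP_ACQUIRE_PCS;
}

/* Poll slowly once the link is up, quickly while still tuning. */
static unsigned int poll_interval_ms(const struct link_model *m)
{
	assert(m);

	if (m->pcs_link)
		return 1000;
	if (m->phy_fw_tuned)
		return 500;
	return 1;
}

/*
 * Fall back to the start of the flow. Assumption: a PHY reconfiguration
 * clears all progress flags, so the flow restarts from Auto-negotiation
 * (if enabled) or from F/W tuning.
 */
static void model_reconfig(struct link_model *m)
{
	assert(m);

	m->auto_neg_resolved = false;
	m->phy_fw_tuned = false;
	m->pcs_link = false;
}
```

With auto_neg_enable set, a fresh model starts at STEP_AUTO_NEG, moves to
STEP_FW_TUNING once Auto-negotiation resolves, then to STEP_ACQUIRE_PCS and
finally STEP_CHECK_PCS. Calling model_reconfig() returns it to the first
step.
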
 drivers/misc/habanalabs/gaudi/Makefile    |    2 +-
 drivers/misc/habanalabs/gaudi/gaudi_nic.c |  454 +++++++-
 drivers/misc/habanalabs/gaudi/gaudi_nic.h |   17 +
 drivers/misc/habanalabs/gaudi/gaudi_phy.c | 1272 +++++++++++++++++++++
 4 files changed, 1742 insertions(+), 3 deletions(-)
 create mode 100644 drivers/misc/habanalabs/gaudi/gaudi_phy.c

diff --git a/drivers/misc/habanalabs/gaudi/Makefile b/drivers/misc/habanalabs/gaudi/Makefile
index 24e14cff563d..c5143cf6f025 100644
--- a/drivers/misc/habanalabs/gaudi/Makefile
+++ b/drivers/misc/habanalabs/gaudi/Makefile
@@ -2,4 +2,4 @@
 HL_GAUDI_FILES := gaudi/gaudi.o gaudi/gaudi_hwmgr.o gaudi/gaudi_security.o \
 	gaudi/gaudi_coresight.o
 
-HL_GAUDI_FILES += gaudi/gaudi_nic.o
+HL_GAUDI_FILES += gaudi/gaudi_nic.o gaudi/gaudi_phy.o
diff --git a/drivers/misc/habanalabs/gaudi/gaudi_nic.c b/drivers/misc/habanalabs/gaudi/gaudi_nic.c
index df41de95ba58..ff08cfc81e69 100644
--- a/drivers/misc/habanalabs/gaudi/gaudi_nic.c
+++ b/drivers/misc/habanalabs/gaudi/gaudi_nic.c
@@ -687,13 +687,26 @@ static void config_port_mac(struct gaudi_nic_device *gaudi_nic)
 	}
 }
 
+static void phy_start_stop(struct gaudi_nic_device *gaudi_nic, bool is_start)
+{
+	int i;
+
+	for (i = NIC_MAC_LANES_START ; i < NIC_MAC_NUM_OF_LANES ; i++) {
+		if (!(gaudi_nic->power_up_mask & BIT(i)))
+			continue;
+
+		gaudi_nic_phy_start_stop(gaudi_nic, i, is_start);
+	}
+}
+
 static int hw_config(struct gaudi_nic_device *gaudi_nic)
 {
 	struct hl_device *hdev = gaudi_nic->hdev;
 	struct gaudi_device *gaudi = hdev->asic_specific;
 	u64 mac_addr = 0, tmr_addr;
 	u32 port = gaudi_nic->port, data_rate, speed = gaudi_nic->speed;
-	int i;
+	int i, rc;
+	bool do_auto_neg;
 
 	for (i = 0 ; i < ETH_ALEN ; i++) {
 		mac_addr <<= 8;
@@ -729,6 +742,26 @@ static int hw_config(struct gaudi_nic_device *gaudi_nic)
 
 	gaudi_nic->data_rate = data_rate;
 
+	if (gaudi->nic_phy_config_fw && !gaudi_nic->mac_loopback) {
+		for (i = NIC_MAC_LANES_START ; i < NIC_MAC_NUM_OF_LANES ; i++) {
+			if (!(gaudi_nic->power_up_mask & BIT(i)))
+				continue;
+
+			do_auto_neg = gaudi_nic->auto_neg_enable &&
+					(gaudi_nic->auto_neg_mask & BIT(i));
+
+			rc = gaudi_nic_phy_power_up(gaudi_nic, i, do_auto_neg);
+			if (rc) {
+				dev_err(hdev->dev,
+					"PHY power up failed for port %d\n",
+					port);
+				return rc;
+			}
+		}
+
+		phy_start_stop(gaudi_nic, true);
+	}
+
 	/* if no need in macro configuration, do only port configuration */
 	if (gaudi_nic->do_macro_cfg) {
 		config_port_mac(gaudi_nic);
@@ -1191,6 +1224,364 @@ static void port_reset_state(struct gaudi_nic_device *gaudi_nic)
 	gaudi_nic->uncorrectable_errors_cnt = 0;
 }
 
+static void phy_reconfig(struct gaudi_nic_device *gaudi_nic)
+{
+	struct hl_device *hdev = gaudi_nic->hdev;
+	struct gaudi_device *gaudi = gaudi_nic->hdev->asic_specific;
+	u32 port = gaudi_nic->port;
+	int i, rc;
+
+	if (!gaudi->nic_phy_config_fw)
+		return;
+
+	dev_dbg(hdev->dev, "reconfiguring PHY, port %d\n", port);
+
+	if (gaudi_nic->auto_neg_enable) {
+		for (i = NIC_MAC_LANES_START ; i < NIC_MAC_NUM_OF_LANES ; i++) {
+			if (!(gaudi_nic->auto_neg_mask & BIT(i)))
+				continue;
+
+			rc = gaudi_nic_phy_fw_config_auto_neg(gaudi_nic, i);
+			if (rc)
+				dev_dbg(hdev->dev,
+					"F/W reconfig autoneg failed, port: %d, lane: %d\n",
+					port, i);
+		}
+	} else {
+		for (i = NIC_MAC_LANES_START ; i < NIC_MAC_NUM_OF_LANES ; i++) {
+			if (!(gaudi_nic->power_up_mask & BIT(i)))
+				continue;
+
+			rc = gaudi_nic_phy_power_up(gaudi_nic, i, false);
+			if (rc) {
+				dev_err(hdev->dev,
+					"PHY reconfig power up failed for port %d\n",
+					port);
+				break;
+			}
+		}
+	}
+
+	port_reset_state(gaudi_nic);
+}
+
+static enum link_status update_pcs_link_failure(
+					struct gaudi_nic_device *gaudi_nic)
+{
+	struct hl_device *hdev = gaudi_nic->hdev;
+	struct gaudi_device *gaudi = gaudi_nic->hdev->asic_specific;
+	struct kfifo *pcs_fifo = &gaudi_nic->pcs_fail_fifo;
+	ktime_t now, before;
+	u32 port = gaudi_nic->port;
+	int count;
+
+	if (!gaudi_nic->auto_neg_enable)
+		return PCS_DOWN;
+
+	now = ktime_get();
+
+	count = kfifo_in(pcs_fifo, &now, sizeof(now));
+	if (count != sizeof(now)) {
+		dev_err(hdev->dev,
+			"Failed to push to PCS fifo, size: %d, count: %d, port: %d\n",
+			gaudi_nic->pcs_fail_cnt, count, port);
+		return PCS_DOWN;
+	}
+
+	gaudi_nic->pcs_fail_cnt++;
+
+	if (gaudi_nic->pcs_fail_cnt < gaudi->nic_pcs_fail_threshold)
+		return PCS_DOWN;
+
+	/*
+	 * Here we reached the threshold count of failures to reconfigure the
+	 * link. Now we need to check if all of the failures are in the needed
+	 * time frame. It is sufficient to check the first item in the queue as
+	 * it is the earliest failure and if it is in the needed time frame,
+	 * all the rest of the failures are in it too.
+	 */
+	count = kfifo_out_peek(pcs_fifo, &before, sizeof(before));
+	if (count != sizeof(before))
+		dev_err(hdev->dev,
+			"Failed to peek in PCS fifo, size: %d, count: %d, port: %d\n",
+			gaudi_nic->pcs_fail_cnt, count, port);
+
+	if (ktime_ms_delta(now, before) <=
+			(gaudi->nic_pcs_fail_time_frame * MSEC_PER_SEC)) {
+		dev_dbg(hdev->dev,
+			"PHY reconfig due to PCS link failure cnt, port: %d\n",
+			port);
+		return FAIL_RECONFIG;
+	}
+
+	/*
+	 * The earliest failure is not in the needed time frame, hence
+	 * we can remove it.
+	 */
+	count = kfifo_out(pcs_fifo, &before, sizeof(before));
+	if (count != sizeof(before))
+		dev_err(hdev->dev,
+			"Failed to pop from PCS fifo, size: %d, count: %d, port: %d\n",
+			gaudi_nic->pcs_fail_cnt, count, port);
+
+	gaudi_nic->pcs_fail_cnt--;
+
+	return PCS_DOWN;
+}
+
+static void reset_tx(struct gaudi_nic_device *gaudi_nic)
+{
+	struct hl_device *hdev = gaudi_nic->hdev;
+	int i;
+
+	/* This temporary WA is only for HLS external ports */
+	if ((hdev->card_type != cpucp_card_type_pmc) ||
+			(BIT(gaudi_nic->port) & ~hdev->nic_ports_ext_mask))
+		return;
+
+	for (i = NIC_MAC_LANES_START ; i < NIC_MAC_NUM_OF_LANES ; i++)
+		if (gaudi_nic->fw_tuning_mask & BIT(i))
+			gaudi_nic_phy_reset_tx(gaudi_nic, i);
+}
+
+static enum link_status _check_pcs_link(struct gaudi_nic_device *gaudi_nic)
+{
+	struct hl_device *hdev = gaudi_nic->hdev;
+	u32 port = gaudi_nic->port, pcs_val, mac_val,
+		start_lane = __ffs(gaudi_nic->fw_tuning_mask);
+	int i, rc;
+
+	for (i = NIC_MAC_LANES_START ; i < NIC_MAC_NUM_OF_LANES ; i++) {
+		if (!(gaudi_nic->fw_tuning_mask & BIT(i)))
+			continue;
+
+		rc = gaudi_nic_phy_check_link_status(gaudi_nic, i);
+		if (rc)
+			return PHY_DOWN;
+	}
+
+	/* need to check the first lane only */
+	mac_val = gaudi_nic_mac_read(gaudi_nic, start_lane, "mac", 0x40);
+
+	if (mac_val & 1)
+		gaudi_nic->pcs_local_fault_cnt++;
+	else if (gaudi_nic->pcs_local_fault_cnt)
+		gaudi_nic->pcs_local_fault_cnt--;
+
+	if (mac_val & 2)
+		gaudi_nic->pcs_remote_fault_cnt++;
+	else if (gaudi_nic->pcs_remote_fault_cnt)
+		gaudi_nic->pcs_remote_fault_cnt--;
+
+	if (gaudi_nic->pcs_remote_fault_cnt == PCS_FAULT_THRESHOLD) {
+		dev_dbg(hdev->dev,
+			"PHY reconfig due to PCS remote fault cnt, port: %d\n",
+			port);
+		return FAULT_RECONFIG;
+	}
+
+	/* need to check the first lane only */
+	pcs_val = gaudi_nic_mac_read(gaudi_nic, start_lane, "xpcs", 0x20);
+
+	if ((pcs_val >> 12) & 1)
+		return LINK_UP;
+
+	return PCS_DOWN;
+}
+
+static void check_pcs_link(struct gaudi_nic_device *gaudi_nic)
+{
+	struct hl_device *hdev = gaudi_nic->hdev;
+	struct gaudi_device *gaudi = gaudi_nic->hdev->asic_specific;
+	u32 port = gaudi_nic->port;
+	enum link_status link_status;
+
+	if (!gaudi->nic_check_link)
+		return;
+
+	link_status = _check_pcs_link(gaudi_nic);
+	if ((link_status == PCS_DOWN) || (link_status == PHY_DOWN)) {
+		/* Try again to overcome a momentary glitch */
+		msleep(PCS_LINK_RETRY_MSEC);
+
+		link_status = _check_pcs_link(gaudi_nic);
+
+		if (link_status == LINK_UP)
+			dev_info(hdev->dev, "PCS link restored, port %d\n",
+					port);
+	}
+
+	if (link_status == LINK_UP)
+		return;
+
+	set_port_status(gaudi_nic, false);
+	gaudi_nic->pcs_link = false;
+	gaudi_nic->last_pcs_link_drop_ts = ktime_get();
+
+	dev_info(hdev->dev, "%s lost signal, port %d\n",
+			(link_status == PHY_DOWN) ? "PHY" : "PCS", port);
+
+	/* TODO: fix the bug in the retimer to remove this Tx reset WA */
+	/*
+	 * No need to update about the PCS failure if we already need to
+	 * reconfigure the PHY.
+	 */
+	if (link_status == FAULT_RECONFIG)
+		reset_tx(gaudi_nic);
+	else
+		link_status = update_pcs_link_failure(gaudi_nic);
+
+	if ((link_status == FAULT_RECONFIG) ||
+			(link_status == FAIL_RECONFIG))
+		phy_reconfig(gaudi_nic);
+}
+
+static void acquire_pcs_link(struct gaudi_nic_device *gaudi_nic)
+{
+	struct hl_device *hdev = gaudi_nic->hdev;
+	u32 port = gaudi_nic->port, pcs_val,
+		start_lane = __ffs(gaudi_nic->fw_tuning_mask);
+
+	/* need to check the first lane only */
+	pcs_val = gaudi_nic_mac_read(gaudi_nic, start_lane, "xpcs", 0x20);
+	gaudi_nic->pcs_link = (pcs_val >> 12) & 1;
+	gaudi_nic->retry_cnt++;
+
+	if (gaudi_nic->pcs_link) {
+		dev_info(hdev->dev, "PCS link up, port %d\n", port);
+		set_port_status(gaudi_nic, true);
+		gaudi_nic->retry_cnt = 0;
+	} else if (gaudi_nic->retry_cnt == PCS_LINK_CNT) {
+		if (ktime_after(gaudi_nic->last_fw_tuning_ts,
+				gaudi_nic->last_pcs_link_drop_ts))
+			dev_dbg(hdev->dev,
+				"PHY_reconfig due to PCS link down after F/W tuning, port %d\n",
+				port);
+		else
+			dev_dbg(hdev->dev,
+				"PHY reconfig due to PCS link cnt, port %d\n",
+				port);
+		phy_reconfig(gaudi_nic);
+	}
+}
+
+static void do_fw_tuning(struct gaudi_nic_device *gaudi_nic)
+{
+	struct hl_device *hdev = gaudi_nic->hdev;
+	u32 port = gaudi_nic->port;
+	int i, rc = 0;
+
+	for (i = NIC_MAC_LANES_START ; i < NIC_MAC_NUM_OF_LANES ; i++) {
+		if (!(gaudi_nic->fw_tuning_mask & BIT(i)))
+			continue;
+
+		rc = gaudi_nic_phy_fw_tuning(gaudi_nic, i, true);
+		if (rc) {
+			if (rc == -EAGAIN) {
+				if (gaudi_nic->retry_cnt++ == FW_TUNING_CNT) {
+					dev_dbg(hdev->dev,
+						"PHY reconfig due to F/W tuning cnt, port %d, lane %d\n",
+						port, i);
+					phy_reconfig(gaudi_nic);
+				}
+			} else {
+				dev_dbg(hdev->dev,
+					"PHY F/W tuning failed for port %d, lane %d, rc %d\n",
+					port, i, rc);
+				phy_reconfig(gaudi_nic);
+			}
+			break;
+		}
+	}
+
+	if (!rc) {
+		gaudi_nic->phy_fw_tuned = true;
+		gaudi_nic->retry_cnt = 0;
+		gaudi_nic->last_fw_tuning_ts = ktime_get();
+	}
+}
+
+static void do_fw_tuning_auto_neg(struct gaudi_nic_device *gaudi_nic)
+{
+	struct hl_device *hdev = gaudi_nic->hdev;
+	u32 port = gaudi_nic->port;
+	int i, rc;
+
+	for (i = NIC_MAC_LANES_START ; i < NIC_MAC_NUM_OF_LANES ; i++) {
+		if (!(gaudi_nic->auto_neg_mask & BIT(i)))
+			continue;
+
+		rc = gaudi_nic_phy_fw_tuning(gaudi_nic, i, false);
+		if (rc) {
+			if (rc != -EAGAIN)
+				dev_dbg(hdev->dev,
+					"PHY auto neg F/W tuning failed, port %d, lane %d, rc %d\n",
+					port, i, rc);
+			return;
+		}
+	}
+
+	for (i = NIC_MAC_LANES_START ; i < NIC_MAC_NUM_OF_LANES ; i++) {
+		if (!(gaudi_nic->fw_tuning_mask & BIT(i)))
+			continue;
+
+		rc = gaudi_nic_phy_config_pam4_link_training(gaudi_nic, i);
+		if (rc) {
+			if (rc == -EAGAIN) {
+				if (gaudi_nic->retry_cnt++ ==
+						FW_LINK_TRAINING_CNT) {
+					dev_dbg(hdev->dev,
+						"PHY reconfig due to PAM4 cnt, port: %d, lane: %d\n",
+						port, i);
+					phy_reconfig(gaudi_nic);
+				}
+			} else {
+				dev_dbg(hdev->dev,
+					"PHY auto neg F/W speed config failed, port %d, lane %d, rc %d\n",
+					port, i, rc);
+				phy_reconfig(gaudi_nic);
+			}
+
+			return;
+		}
+	}
+
+	dev_dbg(hdev->dev, "auto neg done, port: %d\n", port);
+	gaudi_nic->auto_neg_resolved = true;
+	gaudi_nic->retry_cnt = 0;
+	do_fw_tuning(gaudi_nic);
+}
+
+static void check_link_status(struct work_struct *work)
+{
+	struct gaudi_nic_device *gaudi_nic = container_of(work,
+							struct gaudi_nic_device,
+							link_status_work.work);
+	u32 timeout_ms;
+
+	if (gaudi_nic->phy_fw_tuned) {
+		if (gaudi_nic->pcs_link)
+			check_pcs_link(gaudi_nic);
+		else
+			acquire_pcs_link(gaudi_nic);
+	} else {
+		if (gaudi_nic->auto_neg_enable && !gaudi_nic->auto_neg_resolved)
+			do_fw_tuning_auto_neg(gaudi_nic);
+		else
+			do_fw_tuning(gaudi_nic);
+	}
+
+	if (gaudi_nic->pcs_link)
+		timeout_ms = 1000;
+	else if (gaudi_nic->phy_fw_tuned)
+		timeout_ms = 500;
+	else
+		timeout_ms = 1;
+
+	schedule_delayed_work(&gaudi_nic->link_status_work,
+				msecs_to_jiffies(timeout_ms));
+}
+
 static int _gaudi_nic_sw_init(struct gaudi_nic_device *gaudi_nic)
 {
 	struct hl_device *hdev = gaudi_nic->hdev;
@@ -1576,7 +1967,13 @@ static int port_open(struct gaudi_nic_device *gaudi_nic)
 		napi_enable(&gaudi_nic->napi);
 	}
 
-	set_port_status(gaudi_nic, true);
+	if (gaudi->nic_phy_config_fw && !gaudi_nic->mac_loopback) {
+		INIT_DELAYED_WORK(&gaudi_nic->link_status_work,
+					check_link_status);
+		schedule_delayed_work(&gaudi_nic->link_status_work, 0);
+	} else {
+		set_port_status(gaudi_nic, true);
+	}
 
 	gaudi_nic->port_open = true;
 
@@ -1628,10 +2025,17 @@ static void port_close(struct gaudi_nic_device *gaudi_nic)
 	gaudi_nic->port_open = false;
 	gaudi_nic->active = false;
 
+	if (gaudi->nic_phy_config_fw && !gaudi_nic->mac_loopback)
+		cancel_delayed_work_sync(&gaudi_nic->link_status_work);
+
 	/* Print if not in hard reset flow e.g. from ifconfig */
 	if (gaudi_nic->pcs_link && !hdev->hard_reset_pending)
 		dev_info(hdev->dev, "port %d was closed\n", port);
 
+	/* stop F/W so the peer port will also lose link */
+	if (gaudi->nic_phy_config_fw && !gaudi_nic->mac_loopback)
+		phy_start_stop(gaudi_nic, false);
+
 	port_reset_state(gaudi_nic);
 
 	kfifo_free(&gaudi_nic->pcs_fail_fifo);
@@ -1882,6 +2286,19 @@ static int port_register(struct hl_device *hdev, int port)
 	ether_addr_copy(ndev->dev_addr,
 		hdev->asic_prop.cpucp_nic_info.mac_addrs[port].mac_addr);
 
+	/*
+	 * Reset the NIC macro PHY before the PHY configuration by each port.
+	 * This function resets all the 4 lanes in the PHY macro, therefore only
+	 * one of the two ports should call it.
+	 */
+	if (gaudi->nic_phy_config_fw && gaudi_nic->do_macro_cfg) {
+		rc = gaudi_nic_phy_reset_macro(gaudi_nic);
+		if (rc)
+			dev_err(hdev->dev,
+				"PHY power up 1 failed for port %d\n",
+				port);
+	}
+
 	if (register_netdev(ndev)) {
 		dev_err(hdev->dev,
 			"Could not register netdevice, port: %d\n", port);
@@ -2051,6 +2468,24 @@ int gaudi_nic_ports_init(struct hl_device *hdev)
 				cpu_to_le32((card_location >> 22) & 0x7);
 	}
 
+	if (gaudi->nic_phy_load_fw) {
+		rc = gaudi_nic_phy_has_fw(hdev);
+		if (rc) {
+			dev_err(hdev->dev, "NIC F/W file was not found\n");
+			return rc;
+		}
+
+		rc = gaudi_nic_phy_fw_load_all(hdev);
+		if (rc) {
+			dev_err(hdev->dev, "NIC F/W load for all failed\n");
+			return rc;
+		}
+	}
+
+	if (gaudi->nic_phy_config_fw)
+		dev_dbg(hdev->dev, "NIC F/W CRC: 0x%x\n",
+				gaudi_nic_phy_get_crc(hdev));
+
 	for (i = 0 ; i < NIC_NUMBER_OF_MACROS ; i++) {
 		gaudi->nic_macros[i].idx = i;
 		gaudi->nic_macros[i].num_of_lanes = NIC_LANES_2;
@@ -2272,6 +2707,21 @@ void gaudi_nic_ports_reopen(struct hl_device *hdev)
 		gaudi_nic = &gaudi->nic_devices[i];
 		port = gaudi_nic->port;
 
+		/*
+		 * Reset the NIC macro PHY before the PHY configuration by each
+		 * port. This function resets all the 4 lanes in the PHY macro,
+		 * therefore only one of the two ports should call it.
+		 * This must be called before we check if the port is enabled,
+		 * as the PHY reset should be called anyway.
+		 */
+		if (gaudi->nic_phy_config_fw && gaudi_nic->do_macro_cfg) {
+			rc = gaudi_nic_phy_reset_macro(gaudi_nic);
+			if (rc)
+				dev_err(hdev->dev,
+					"PHY power up 1 failed for port %d\n",
+					port);
+		}
+
 		/*
 		 * It could be that the port was shutdown by 'ifconfig down',
 		 * and there is no need in reopening it.
diff --git a/drivers/misc/habanalabs/gaudi/gaudi_nic.h b/drivers/misc/habanalabs/gaudi/gaudi_nic.h
index 34bcf0514d30..1b2c42fb927c 100644
--- a/drivers/misc/habanalabs/gaudi/gaudi_nic.h
+++ b/drivers/misc/habanalabs/gaudi/gaudi_nic.h
@@ -333,5 +333,22 @@ int gaudi_nic_port_reset(struct gaudi_nic_device *gaudi_nic);
 bool disabled_or_in_reset(struct gaudi_nic_device *gaudi_nic);
 u64 gaudi_nic_read_mac_stat_counter(struct hl_device *hdev, u32 port, int idx,
 					bool is_rx);
+int gaudi_nic_phy_reset_macro(struct gaudi_nic_device *gaudi_nic);
+int gaudi_nic_phy_power_up(struct gaudi_nic_device *gaudi_nic, int lane,
+				bool do_auto_neg);
+int gaudi_nic_phy_has_fw(struct hl_device *hdev);
+int gaudi_nic_phy_fw_tuning(struct gaudi_nic_device *gaudi_nic, int lane,
+				bool check_status);
+int gaudi_nic_phy_fw_load_all(struct hl_device *hdev);
+int gaudi_nic_phy_check_link_status(struct gaudi_nic_device *gaudi_nic,
+					int lane);
+int gaudi_nic_phy_config_pam4_link_training(struct gaudi_nic_device *gaudi_nic,
+						int lane);
+int gaudi_nic_phy_fw_config_auto_neg(struct gaudi_nic_device *gaudi_nic,
+					int lane);
+u16 gaudi_nic_phy_get_crc(struct hl_device *hdev);
+void gaudi_nic_phy_reset_tx(struct gaudi_nic_device *gaudi_nic, int lane);
+void gaudi_nic_phy_start_stop(struct gaudi_nic_device *gaudi_nic, int lane,
+				bool is_start);
 
 #endif /* GAUDI_NIC_DRV_H_ */
diff --git a/drivers/misc/habanalabs/gaudi/gaudi_phy.c b/drivers/misc/habanalabs/gaudi/gaudi_phy.c
new file mode 100644
index 000000000000..e96188d9e47f
--- /dev/null
+++ b/drivers/misc/habanalabs/gaudi/gaudi_phy.c
@@ -0,0 +1,1272 @@
+// SPDX-License-Identifier: GPL-2.0
+
+/*
+ * Copyright 2019 HabanaLabs, Ltd.
+ * All Rights Reserved.
+ */
+
+#include "gaudi_nic.h"
+#include "../include/gaudi/asic_reg/gaudi_regs.h"
+
+#include <linux/module.h>
+#include <linux/firmware.h>
+#include <asm/unaligned.h>
+
+#define HL_PHY_DEBUG 0
+
+#define GAUDI_PHY_FW_FILE	"habanalabs/gaudi/gaudi_nic_fw.bin"
+
+#define PHY_READ_COUNTS_PER_MS	1000
+#define PHY_FW_SIZE		0x1020
+#define PHY_FW_FINISHED		(1 << 2)
+#define PHY_FW_ERROR		(1 << 3)
+
+static void phy_write_all(struct hl_device *hdev, u32 addr, u32 data)
+{
+	int lane, port;
+	u32 phy_base = mmNIC0_PHY_BASE - CFG_BASE;
+
+	for (port = 0 ; port < 10 ; port += 2)
+		for (lane = 0 ; lane < 4 ; lane++) {
+			NIC_MACRO_WREG32_PORT(phy_base + 0xF60 + lane * 4, addr,
+						port);
+			/* only the lower 16 bits are in use */
+			NIC_MACRO_WREG32_PORT(phy_base - 0x8000 + 0x2000 * lane,
+						data & 0xFFFF, port);
+		}
+}
+
+static void phy_write_port(struct hl_device *hdev, int port, int lane, u32 addr,
+				u32 data)
+{
+	u32 phy_base = mmNIC0_PHY_BASE - CFG_BASE;
+
+	NIC_MACRO_WREG32_PORT(phy_base + 0xF60 + lane * 4, addr, port);
+	/* only the lower 16 bits are in use */
+	NIC_MACRO_WREG32_PORT(phy_base - 0x8000 + 0x2000 * lane, data & 0xFFFF,
+				port);
+}
+
+static void phy_write(struct gaudi_nic_device *gaudi_nic, int lane, u32 addr,
+			u32 data)
+{
+	struct hl_device *hdev = gaudi_nic->hdev;
+	u32 phy_base = mmNIC0_PHY_BASE - CFG_BASE;
+
+	NIC_MACRO_WREG32(phy_base + 0xF60 + lane * 4, addr);
+	/* only the lower 16 bits are in use */
+	NIC_MACRO_WREG32(phy_base - 0x8000 + 0x2000 * lane, data & 0xFFFF);
+}
+
+static u32 phy_read_port(struct hl_device *hdev, int port, int lane, u32 addr)
+{
+	u32 phy_base = mmNIC0_PHY_BASE - CFG_BASE;
+
+	NIC_MACRO_WREG32_PORT(phy_base + 0xF60 + lane * 4, addr, port);
+	/* only the lower 16 bits are in use */
+	return NIC_MACRO_RREG32_PORT(phy_base - 0x8000 + 0x2000 * lane, port) &
+					0xFFFF;
+}
+
+static u32 phy_read(struct gaudi_nic_device *gaudi_nic, int lane, u32 addr)
+{
+	struct hl_device *hdev = gaudi_nic->hdev;
+	u32 phy_base = mmNIC0_PHY_BASE - CFG_BASE;
+
+	NIC_MACRO_WREG32(phy_base + 0xF60 + lane * 4, addr);
+
+	/* only the lower 16 bits are in use */
+	return NIC_MACRO_RREG32(phy_base - 0x8000 + 0x2000 * lane) & 0xFFFF;
+}
+
+static void phy_write_mask(struct gaudi_nic_device *gaudi_nic, int lane,
+				u32 addr, u32 raw_data, u32 mask)
+{
+	struct hl_device *hdev = gaudi_nic->hdev;
+	u32 phy_base = mmNIC0_PHY_BASE - CFG_BASE;
+	u32 data;
+
+	NIC_MACRO_WREG32(phy_base + 0xF60 + lane * 4, addr);
+
+	data = (NIC_MACRO_RREG32(phy_base - 0x8000 + 0x2000 * lane)) & 0xFFFF;
+	data = (data & ~mask) | (((raw_data << (__ffs(mask) % 32))) & 0xFFFF);
+
+	NIC_MACRO_WREG32(phy_base - 0x8000 + 0x2000 * lane, data);
+}
+
+static u32 twos_to_int(s32 twos_val, u32 bitWidth)
+{
+	return (u32) ((s32) (twos_val) -
+				((s32) ((twos_val << 1) & (1 << bitWidth))));
+}
+
+static int fw_cmd_port(struct hl_device *hdev, int port, int lane, u32 cmd,
+			u32 detail, u32 expected_res, u32 *res_ptr)
+{
+	u32 res, val;
+	int checks;
+
+	if (detail)
+		phy_write_port(hdev, port, lane, 0x9816, detail);
+
+	phy_write_port(hdev, port, lane, 0x9815, cmd);
+
+	checks = 0;
+	do {
+		usleep_range(1000, 2000);
+		res = phy_read_port(hdev, port, lane, 0x9815);
+		if (checks++ > PHY_READ_COUNTS_PER_MS) {
+			dev_err(hdev->dev, "timeout for PHY cmd 0x%x\n", cmd);
+			return -ETIMEDOUT;
+		}
+	} while (res == cmd);
+
+	val = (res >> 8) & 0xF;
+	if (val != expected_res) {
+		dev_err(hdev->dev, "cmd 0x%x returned error 0x%x\n", cmd, val);
+		return -EFAULT;
+	}
+
+	*res_ptr = res;
+
+	return 0;
+}
+
+static int fw_cmd(struct gaudi_nic_device *gaudi_nic, int lane, u32 cmd,
+			u32 detail, u32 expected_res, u32 *res_ptr)
+{
+	struct hl_device *hdev = gaudi_nic->hdev;
+	u32 port = gaudi_nic->port;
+	u32 res, val;
+	int checks;
+
+	if (detail)
+		phy_write(gaudi_nic, lane, 0x9816, detail);
+
+	phy_write(gaudi_nic, lane, 0x9815, cmd);
+
+	checks = 0;
+	do {
+		usleep_range(1000, 2000);
+		res = phy_read(gaudi_nic, lane, 0x9815);
+		if (checks++ > PHY_READ_COUNTS_PER_MS) {
+			dev_dbg(hdev->dev,
+				"timeout for PHY cmd 0x%x port %d lane %d\n",
+				cmd, port, lane);
+			return -ETIMEDOUT;
+		}
+	} while (res == cmd);
+
+	val = (res >> 8) & 0xF;
+	if (val != expected_res) {
+		dev_dbg(hdev->dev,
+			"cmd 0x%x returned error 0x%x port %d lane %d\n", cmd,
+			val, port, lane);
+		return -EFAULT;
+	}
+
+	*res_ptr = res;
+
+	return 0;
+}
+
+static int fw_hash_port(struct hl_device *hdev, int port, int lane, u32 *hash)
+{
+	u32 res, low_word;
+	int rc;
+
+	rc = fw_cmd_port(hdev, port, lane, 0xF000, 0, 0xF, &res);
+	if (rc) {
+		dev_err(hdev->dev, "F/W hash failed for port %d lane %d\n",
+			port, lane);
+		return rc;
+	}
+
+	low_word = phy_read_port(hdev, port, lane, 0x9816);
+
+	*hash = ((res & 0xFF) << 16) | low_word;
+
+	return 0;
+}
+
+static void set_pll(struct gaudi_nic_device *gaudi_nic, int lane, u32 data_rate,
+			bool pam4)
+{
+	u32 pll_n_val = 0, pll_cap_val = 0;
+	bool div4 = 1; /* for easy debug in the future */
+
+	phy_write_mask(gaudi_nic, lane, 0xFF, 1, 1 << 5);
+
+	if (!pam4)
+		phy_write_mask(gaudi_nic, lane, 0x179, data_rate == NIC_DR_10,
+				1);
+
+	if (data_rate == NIC_DR_50) {
+		if (div4)
+			pll_n_val = 170;
+		else
+			pll_n_val = 42;
+
+		pll_cap_val = 10;
+	} else if (data_rate == NIC_DR_25) {
+		if (div4)
+			pll_n_val = 165;
+		else
+			pll_n_val = 41;
+
+		pll_cap_val = 12;
+	} else if (data_rate == NIC_DR_10) {
+		if (div4)
+			pll_n_val = 132;
+		else
+			pll_n_val = 33;
+
+		pll_cap_val = 34;
+	}
+
+	phy_write_mask(gaudi_nic, lane, 0xFD, pll_n_val, 0xFF80);
+	phy_write_mask(gaudi_nic, lane, 0xFC, pll_cap_val, 0xFC00);
+}
+
+static void set_tx_taps(struct gaudi_nic_device *gaudi_nic, int lane,
+			s32 tx_pre2, s32 tx_pre1, s32 tx_main, s32 tx_post1,
+			s32 tx_post2)
+{
+	phy_write_mask(gaudi_nic, lane, 0xAD, twos_to_int(tx_pre2, 8), 0xFF00);
+	phy_write_mask(gaudi_nic, lane, 0xAB, twos_to_int(tx_pre1, 8), 0xFF00);
+	phy_write_mask(gaudi_nic, lane, 0xA9, twos_to_int(tx_main, 8), 0xFF00);
+	phy_write_mask(gaudi_nic, lane, 0xA7, twos_to_int(tx_post1, 8), 0xFF00);
+	phy_write_mask(gaudi_nic, lane, 0xA5, twos_to_int(tx_post2, 8), 0xFF00);
+}
+
+static void config_nrz_tx(struct gaudi_nic_device *gaudi_nic, int lane,
+				bool half_rate)
+{
+	phy_write(gaudi_nic, lane, 0xAF, 0xF83E);
+	phy_write(gaudi_nic, lane, 0xB0, 0x4802);
+	phy_write_mask(gaudi_nic, lane, 0xB0, half_rate ? 1 : 0, 1);
+	phy_write_mask(gaudi_nic, lane, 0xB0, 0, 0x800);
+	phy_write_mask(gaudi_nic, lane, 0xB0, 1, 0x800);
+	phy_write(gaudi_nic, lane, 0xA0, 0xE300);
+	set_tx_taps(gaudi_nic, lane, 0, -4, 25, 0, 0);
+}
+
+static void config_pam4_tx(struct gaudi_nic_device *gaudi_nic, int lane)
+{
+	struct hl_device *hdev = gaudi_nic->hdev;
+	struct gaudi_device *gaudi = hdev->asic_specific;
+	u32 lane_idx = (gaudi_nic->port >> 1) * NIC_MAC_NUM_OF_LANES + lane;
+	s32 *taps;
+
+	taps = gaudi->nic_pam4_tx_taps[lane_idx].taps;
+
+	phy_write(gaudi_nic, lane, 0xAF, 0xF83E);
+	phy_write(gaudi_nic, lane, 0xB0, 0);
+	phy_write(gaudi_nic, lane, 0xB0, 0x800);
+	phy_write(gaudi_nic, lane, 0xB0, 0);
+	phy_write(gaudi_nic, lane, 0xA0, 0xEF00);
+	set_tx_taps(gaudi_nic, lane, taps[0], taps[1], taps[2], taps[3],
+			taps[4]);
+}
+
+static void pol(struct gaudi_nic_device *gaudi_nic, int lane, bool pam4,
+		u32 tx_pol, u32 rx_pol)
+{
+	phy_write_mask(gaudi_nic, lane, 0xA0, tx_pol, 0x20);
+	phy_write_mask(gaudi_nic, lane, 0x161, rx_pol, 0x4000); /* nrz */
+	phy_write_mask(gaudi_nic, lane, 0x43, rx_pol, 0x80); /* pam4 */
+}
+
+static void msblsb(struct gaudi_nic_device *gaudi_nic, int lane, u32 tx_msblsb,
+		u32 rx_msblsb)
+{
+	phy_write_mask(gaudi_nic, lane, 0xAF, tx_msblsb, 0x400);
+	phy_write_mask(gaudi_nic, lane, 0x43, rx_msblsb, 0x8000);
+}
+
+static void gc(struct gaudi_nic_device *gaudi_nic, int lane, u32 tx_gc,
+		u32 rx_gc)
+{
+	phy_write_mask(gaudi_nic, lane, 0xAF, tx_gc, 0x200);
+	phy_write_mask(gaudi_nic, lane, 0x42, rx_gc, 1);
+}
+
+static void pc(struct gaudi_nic_device *gaudi_nic, int lane, u32 tx_pc,
+		u32 rx_pc)
+{
+	phy_write_mask(gaudi_nic, lane, 0xAF, tx_pc, 0x100);
+	phy_write_mask(gaudi_nic, lane, 0x42, rx_pc, 2);
+}
+
+static void set_prbs_type(struct gaudi_nic_device *gaudi_nic, int lane,
+		bool pam4, char *pat)
+{
+	u32 prbs_mode_sel_addr;
+	u32 prbs_mode_sel_mask;
+	u32 pat_sel = 0;
+
+	if (pam4) {
+		prbs_mode_sel_addr = 0x43;
+		prbs_mode_sel_mask = 0x60;
+	} else {
+		prbs_mode_sel_addr = 0x161;
+		prbs_mode_sel_mask = 0x3000;
+	}
+
+	if (pam4) {
+		if (!strncmp(pat, "PRBS9", strlen(pat)))
+			pat_sel = 0;
+		else if (!strncmp(pat, "PRBS13", strlen(pat)))
+			pat_sel = 1;
+		else if (!strncmp(pat, "PRBS15", strlen(pat)))
+			pat_sel = 2;
+		else if (!strncmp(pat, "PRBS31", strlen(pat)))
+			pat_sel = 3;
+	} else {
+		if (!strncmp(pat, "PRBS9", strlen(pat)))
+			pat_sel = 0;
+		else if (!strncmp(pat, "PRBS15", strlen(pat)))
+			pat_sel = 1;
+		else if (!strncmp(pat, "PRBS23", strlen(pat)))
+			pat_sel = 2;
+		else if (!strncmp(pat, "PRBS31", strlen(pat)))
+			pat_sel = 3;
+	}
+
+	phy_write_mask(gaudi_nic, lane, 0xA0, pat_sel, 0x300);
+	phy_write_mask(gaudi_nic, lane, prbs_mode_sel_addr, pat_sel,
+			prbs_mode_sel_mask);
+}
+
+static void get_pol_tx_rx(struct gaudi_nic_device *gaudi_nic, u32 lane_idx,
+				u32 *pol_tx, u32 *pol_rx)
+{
+	struct hl_device *hdev = gaudi_nic->hdev;
+	u32 card_location =
+			le32_to_cpu(hdev->asic_prop.cpucp_info.card_location);
+
+	switch (hdev->card_type) {
+	case cpucp_card_type_pci:
+		switch (lane_idx) {
+		case 0 ... 3:
+		case 10 ... 11:
+			*pol_tx = 0;
+			*pol_rx = 0;
+			break;
+		case 5 ... 8:
+		case 12:
+		case 16:
+			*pol_tx = 0;
+			*pol_rx = 1;
+			break;
+		case 15:
+		case 19:
+			*pol_tx = 1;
+			*pol_rx = 0;
+			break;
+		case 4:
+		case 9:
+		case 13 ... 14:
+		case 17 ... 18:
+			*pol_tx = 1;
+			*pol_rx = 1;
+			break;
+		default:
+			dev_err(hdev->dev, "PCI NIC %d wrong lane idx %d\n",
+				gaudi_nic->port, lane_idx);
+			break;
+		}
+		break;
+
+	case cpucp_card_type_pmc:
+		*pol_tx = *pol_rx = 0;
+		switch (card_location) {
+		case 0:
+			switch (lane_idx) {
+			case 0 ... 1:
+			case 3:
+			case 5 ... 6:
+			case 8 ... 9:
+			case 12 ... 15:
+				fallthrough;
+			case 17:
+			case 19:
+				*pol_rx = 1;
+				break;
+			case 2:
+			case 16:
+			case 18:
+				*pol_tx = 1;
+				break;
+			default:
+				break;
+			}
+			break;
+		case 1:
+			switch (lane_idx) {
+			case 0 ... 1:
+			case 3 ... 6:
+			case 8 ... 9:
+			case 12 ... 15:
+				fallthrough;
+			case 17:
+			case 19:
+				*pol_rx = 1;
+				break;
+			case 2:
+			case 16:
+			case 18:
+				*pol_tx = 1;
+				break;
+			default:
+				break;
+			}
+			break;
+		case 2:
+			switch (lane_idx) {
+			case 0 ... 1:
+			case 3:
+			case 5 ... 6:
+			case 8 ... 9:
+			case 12 ... 15:
+				fallthrough;
+			case 17:
+			case 19:
+				*pol_rx = 1;
+				break;
+			case 2:
+			case 16:
+			case 18:
+				*pol_tx = 1;
+				break;
+			default:
+				break;
+			}
+			break;
+		case 3:
+			switch (lane_idx) {
+			case 0 ... 1:
+			case 3:
+			case 5 ... 6:
+			case 8 ... 9:
+			case 12 ... 15:
+				fallthrough;
+			case 17:
+			case 19:
+				*pol_rx = 1;
+				break;
+			case 2:
+			case 16:
+			case 18:
+				*pol_tx = 1;
+				break;
+			default:
+				break;
+			}
+			break;
+		case 4:
+			switch (lane_idx) {
+			case 0 ... 1:
+			case 3:
+			case 5 ... 6:
+			case 8 ... 9:
+			case 12 ... 15:
+				fallthrough;
+			case 17:
+			case 19:
+				*pol_rx = 1;
+				break;
+			case 2:
+			case 16:
+			case 18:
+				*pol_tx = 1;
+				break;
+			default:
+				break;
+			}
+			break;
+		case 5:
+			switch (lane_idx) {
+			case 0 ... 1:
+			case 3:
+			case 5 ... 6:
+			case 8 ... 10:
+			case 12 ... 15:
+				fallthrough;
+			case 17:
+			case 19:
+				*pol_rx = 1;
+				break;
+			case 2:
+			case 16:
+			case 18:
+				*pol_tx = 1;
+				break;
+			default:
+				break;
+			}
+			break;
+		case 6:
+			switch (lane_idx) {
+			case 0 ... 1:
+			case 3:
+			case 5 ... 6:
+			case 8 ... 9:
+			case 12 ... 15:
+				fallthrough;
+			case 17:
+			case 19:
+				*pol_rx = 1;
+				break;
+			case 2:
+			case 16:
+			case 18:
+				*pol_tx = 1;
+				break;
+			default:
+				break;
+			}
+			break;
+		case 7:
+			switch (lane_idx) {
+			case 0 ... 1:
+			case 3 ... 6:
+			case 8 ... 9:
+			case 12 ... 15:
+				fallthrough;
+			case 17:
+			case 19:
+				*pol_rx = 1;
+				break;
+			case 2:
+			case 16:
+			case 18:
+				*pol_tx = 1;
+				break;
+			default:
+				break;
+			}
+			break;
+		}
+		break;
+	default:
+		dev_err(hdev->dev, "wrong card type %d\n", hdev->card_type);
+		break;
+	}
+}
+
+static void config_connection(struct gaudi_nic_device *gaudi_nic, int lane,
+				bool pam4, bool do_auto_neg)
+{
+	struct hl_device *hdev = gaudi_nic->hdev;
+	struct gaudi_device *gaudi = hdev->asic_specific;
+	struct cpucp_nic_info *nic_info = &hdev->asic_prop.cpucp_nic_info;
+	char *prbs = "PRBS31";
+	u32 pol_tx = 0;
+	u32 pol_rx = 0;
+	u32 msblsb_tx = 0;
+	u32 msblsb_rx = 0;
+	u32 gc_tx = 1;
+	u32 gc_rx = 1;
+	u32 pc_tx = 0;
+	u32 pc_rx = 0;
+	u32 lane_idx = (gaudi_nic->port >> 1) * NIC_MAC_NUM_OF_LANES + lane;
+
+	if (!pam4)
+		gc_tx = gc_rx = 0;
+
+	if (gaudi->nic_use_fw_polarity) {
+		pol_tx =
+			(le64_to_cpu(nic_info->pol_tx_mask[0]) >> lane_idx) & 1;
+		pol_rx =
+			(le64_to_cpu(nic_info->pol_rx_mask[0]) >> lane_idx) & 1;
+	} else {
+		get_pol_tx_rx(gaudi_nic, lane_idx, &pol_tx, &pol_rx);
+	}
+
+	phy_write_mask(gaudi_nic, lane, 0xF7, 1, 0x1000);
+	pol(gaudi_nic, lane, pam4, pol_tx, pol_rx);
+	msblsb(gaudi_nic, lane, msblsb_tx, msblsb_rx);
+	gc(gaudi_nic, lane, gc_tx, gc_rx);
+	pc(gaudi_nic, lane, pc_tx, pc_rx);
+
+	set_prbs_type(gaudi_nic, lane, pam4, prbs);
+}
+
+static void functional_mode(struct gaudi_nic_device *gaudi_nic, int lane,
+				bool pam4)
+{
+	if (!pam4) {
+		phy_write_mask(gaudi_nic, lane, 0xA0, 0, 0x2000);
+		phy_write_mask(gaudi_nic, lane, 0x161, 0, 0x400);
+	} else {
+		phy_write_mask(gaudi_nic, lane, 0xA0, 0, 0x2000);
+		phy_write_mask(gaudi_nic, lane, 0x43, 0, 0x10);
+	}
+}
+
+static u32 get_fw_reg(struct gaudi_nic_device *gaudi_nic, int lane, u32 fw_addr)
+{
+	u32 ignore;
+
+	fw_cmd(gaudi_nic, lane, 0xE010, fw_addr, 0xE, &ignore);
+
+	return phy_read(gaudi_nic, lane, 0x9812);
+}
+
+static void config_pam4_fw_rx(struct gaudi_nic_device *gaudi_nic, int lane)
+{
+	phy_write_mask(gaudi_nic, lane, 0x980F, 0x1, 0x1000);
+	phy_write_mask(gaudi_nic, lane, 0x980F, 0x1, 0x0400);
+	phy_write_mask(gaudi_nic, lane, 0x980F, 0x1, 0x0800);
+	phy_write_mask(gaudi_nic, lane, 0x980F, 0x1, 0x0200);
+
+	phy_write(gaudi_nic, lane, 0x43, 0x8CFA);
+	phy_write(gaudi_nic, lane, 0x44, 0x1035);
+	phy_write(gaudi_nic, lane, 0x45, 0x1008);
+}
+
+static int fw_config_speed_nrz(struct gaudi_nic_device *gaudi_nic, int lane,
+				u32 data_rate, u32 speed, bool half_rate,
+				bool fmode, bool pam4)
+{
+	struct hl_device *hdev = gaudi_nic->hdev;
+	u32 ignore;
+	int rc, i;
+
+	/* clear go bit */
+	phy_write_mask(gaudi_nic, lane, 0x980F, 0, 0x8000);
+
+	rc = fw_cmd(gaudi_nic, lane, 0x80C0, speed, 0x8, &ignore);
+	if (rc) {
+		dev_err(hdev->dev,
+			"F/W cmd failed for speed nrz configuration of lane %d\n",
+			lane);
+		return rc;
+	}
+
+	config_nrz_tx(gaudi_nic, lane, half_rate);
+	phy_write_mask(gaudi_nic, lane, 0x0161, 0x1D, 0xFC00);
+	config_connection(gaudi_nic, lane, pam4, false);
+	functional_mode(gaudi_nic, lane, pam4);
+
+	/* clock configuration */
+	for (i = 0 ; i < 4 ; i++)
+		if (i == 0)
+			phy_write(gaudi_nic, i, 0x00C9, 0x390);
+		else
+			phy_write(gaudi_nic, i, 0x00C9, 0x310);
+
+	set_pll(gaudi_nic, lane, data_rate, pam4);
+	phy_write_mask(gaudi_nic, lane, 0x980F, 1, 0x8000);
+
+	return 0;
+}
+
+int gaudi_nic_phy_fw_config_auto_neg(struct gaudi_nic_device *gaudi_nic,
+					int lane)
+{
+	struct hl_device *hdev = gaudi_nic->hdev;
+	struct gaudi_device *gaudi = hdev->asic_specific;
+	u32 ignore;
+	u64 basepage = 0x000080000001;
+	int rc;
+
+	usleep_range(500, 1000);
+
+	/* clear go bit */
+	phy_write_mask(gaudi_nic, lane, 0x980F, 0, 0x8000);
+
+	set_pll(gaudi_nic, lane, NIC_DR_25, false);
+
+	/* Disable AN/LT lane swapping */
+	phy_write_mask(gaudi_nic, lane, 0x8440, 0, 0x8000);
+	config_nrz_tx(gaudi_nic, lane, false);
+
+	/* config_nrz_fw_rx */
+	phy_write_mask(gaudi_nic, lane, 0x0161, 0x1D, 0xFC00);
+	config_connection(gaudi_nic, lane, false, true);
+
+	phy_write_mask(gaudi_nic, lane, 0x8300, 7, 0xE000);
+
+	/* AN mode */
+	phy_write(gaudi_nic, lane, 0x8010, basepage & 0xffff);
+	phy_write(gaudi_nic, lane, 0x8011, (basepage >> 16) & 0xffff);
+	phy_write(gaudi_nic, lane, 0x8012, (basepage >> 32) & 0xffff);
+
+	/* IEEE */
+	phy_write_mask(gaudi_nic, lane, 0x8300, 1, 0x1000);
+
+	if (gaudi->nic_phy_auto_neg_lpbk)
+		phy_write_mask(gaudi_nic, lane, 0x8300, 1, 0x400);
+
+	/* set FW to start AN */
+	rc = fw_cmd(gaudi_nic, lane, 0x8000, 0, 8, &ignore);
+	if (rc) {
+		dev_err(hdev->dev,
+			"F/W cmd 0x8000 failed for auto neg, port %d, lane %d\n",
+			gaudi_nic->port, lane);
+		return rc;
+	}
+
+	/* set go bit */
+	phy_write_mask(gaudi_nic, lane, 0x980F, 1, 0x8000);
+
+	return 0;
+}
+
+static int fw_config_speed_pam4(struct gaudi_nic_device *gaudi_nic, int lane,
+				u32 data_rate, u32 speed, bool fmode, bool pam4)
+{
+	struct hl_device *hdev = gaudi_nic->hdev;
+	u32 ignore;
+	int rc;
+
+	dev_dbg(hdev->dev,
+		"port: %d, lane: %d, data rate: %d, pam4: %d, speed: %d\n",
+		gaudi_nic->port, lane, data_rate, pam4, speed);
+
+	/* clear go bit */
+	phy_write_mask(gaudi_nic, lane, 0x980F, 0, 0x8000);
+
+	phy_write_mask(gaudi_nic, lane, 0x8440, 0, 0x8000);
+
+	rc = fw_cmd(gaudi_nic, lane, 0x80D0, speed, 0x8, &ignore);
+	if (rc) {
+		dev_err(hdev->dev,
+			"F/W cmd failed for speed pam4 configuration of lane %d\n",
+			lane);
+		return rc;
+	}
+
+	config_pam4_tx(gaudi_nic, lane);
+	config_pam4_fw_rx(gaudi_nic, lane);
+	config_connection(gaudi_nic, lane, pam4, false);
+	functional_mode(gaudi_nic, lane, pam4);
+
+	/* set go bit */
+	phy_write_mask(gaudi_nic, lane, 0x980F, 1, 0x8000);
+
+	return 0;
+}
+
+int gaudi_nic_phy_config_pam4_link_training(struct gaudi_nic_device *gaudi_nic,
+						int lane)
+{
+	struct hl_device *hdev = gaudi_nic->hdev;
+	u32 port = gaudi_nic->port;
+	u32 ignore, speed = 9;
+	int rc;
+
+#if HL_PHY_DEBUG
+	dev_dbg(hdev->dev, "NIC %d lane: %d, speed: %d\n", port, lane, speed);
+#endif
+
+	/* clear go bit */
+	phy_write_mask(gaudi_nic, lane, 0x980F, 0, 0x8000);
+
+	/* Disable lane swapping */
+	phy_write_mask(gaudi_nic, lane, 0x8440, 0, 0x8000);
+
+	/* Enable Link Training */
+	speed |= 0x100;
+
+	config_pam4_tx(gaudi_nic, lane);
+	phy_write_mask(gaudi_nic, lane, 0xA0, 0, 0x2000);
+	config_pam4_fw_rx(gaudi_nic, lane);
+	config_connection(gaudi_nic, lane, true, false);
+
+	rc = fw_cmd(gaudi_nic, lane, 0x80D0, speed, 0x8, &ignore);
+	if (rc) {
+		dev_err(hdev->dev,
+			"F/W cmd failed for speed pam4 configuration of port %d lane %d\n",
+			port, lane);
+		return rc;
+	}
+
+	phy_write_mask(gaudi_nic, lane, 0xAF, 0, 0x200);
+	phy_write_mask(gaudi_nic, lane, 0xAF, 0, 0x100);
+	phy_write_mask(gaudi_nic, lane, 0x42, 0, 0x2);
+	phy_write_mask(gaudi_nic, lane, 0x42, 0, 0x1);
+
+	/* set go bit */
+	phy_write_mask(gaudi_nic, lane, 0x980F, 1, 0x8000);
+
+	return 0;
+}
+
+static int fw_config(struct gaudi_nic_device *gaudi_nic, int lane,
+			u32 data_rate, bool fmode, bool pam4)
+{
+	struct hl_device *hdev = gaudi_nic->hdev;
+
+	set_pll(gaudi_nic, lane, data_rate, pam4);
+
+	if (data_rate == NIC_DR_10)
+		return fw_config_speed_nrz(gaudi_nic, lane, data_rate, 1, 1,
+						fmode, pam4);
+	else if (data_rate == NIC_DR_25 || data_rate == NIC_DR_26)
+		return fw_config_speed_nrz(gaudi_nic, lane, data_rate, 3, 0,
+						fmode, pam4);
+	else if (data_rate == NIC_DR_50)
+		return fw_config_speed_pam4(gaudi_nic, lane, data_rate, 9,
+						fmode, pam4);
+
+	dev_err(hdev->dev, "invalid data_rate %d\n", data_rate);
+
+	return -EFAULT;
+}
+
+static int fw_crc_port(struct hl_device *hdev, int port, int lane, u16 *crc)
+{
+	u32 res;
+	int rc;
+
+	rc = fw_cmd_port(hdev, port, lane, 0xF001, 0, 0xF, &res);
+	if (rc) {
+		dev_err(hdev->dev, "F/W crc failed for port %d lane %d\n", port,
+			lane);
+		return rc;
+	}
+
+	*crc = phy_read_port(hdev, port, lane, 0x9816) & 0xFFFF;
+
+	return 0;
+}
+
+int gaudi_nic_phy_has_fw(struct hl_device *hdev)
+{
+	const struct firmware *fw;
+	int rc;
+
+	rc = request_firmware(&fw, GAUDI_PHY_FW_FILE, hdev->dev);
+	if (rc) {
+		dev_err(hdev->dev, "Firmware file %s is not found!\n",
+				GAUDI_PHY_FW_FILE);
+		return rc;
+	}
+
+	if (fw->size < PHY_FW_SIZE) {
+		dev_err(hdev->dev, "Illegal %s firmware size %zu\n",
+				GAUDI_PHY_FW_FILE, fw->size);
+		rc = -EFAULT;
+	}
+
+	release_firmware(fw);
+
+	return rc;
+}
+
+static void fw_unload_all(struct hl_device *hdev, bool pam4)
+{
+	phy_write_all(hdev, 0x9814, 0xFFF0);
+	phy_write_all(hdev, 0x980D, 0xAAA);
+	phy_write_all(hdev, 0x980D, 0);
+
+	msleep(100);
+
+	phy_write_all(hdev, 0x9814, 0);
+
+	if (pam4)
+		phy_write_all(hdev, 0x11, 0);
+	else
+		phy_write_all(hdev, 0x10B, 0);
+}
+
+u16 gaudi_nic_phy_get_crc(struct hl_device *hdev)
+{
+	u16 crc = 0;
+
+	fw_crc_port(hdev, 0, 0, &crc);
+
+	return crc;
+}
+
+int gaudi_nic_phy_fw_load_all(struct hl_device *hdev)
+{
+	const struct firmware *fw;
+	const void *fw_data;
+	u32 entry_point, length, ram_addr, sections, status, checks, hash = 0,
+		checksum = 0x800C, fw0 = 0x9F00, fw1 = 0x980D, fw2 = 0x9814;
+	u16 mdio_data, crc = 0;
+	int rc, i, j, port, data_ptr = 0, lane = 0;
+	bool pam4 = true; /* for debug */
+
+	fw_unload_all(hdev, pam4);
+
+	rc = request_firmware(&fw, GAUDI_PHY_FW_FILE, hdev->dev);
+	if (rc) {
+		dev_err(hdev->dev, "Firmware file %s is not found!\n",
+				GAUDI_PHY_FW_FILE);
+		return rc;
+	}
+
+	if (fw->size < PHY_FW_SIZE) {
+		dev_err(hdev->dev, "Illegal %s firmware size %zu\n",
+				GAUDI_PHY_FW_FILE, fw->size);
+		release_firmware(fw);
+		return -EFAULT;
+	}
+
+	fw_data = (const void *) fw->data;
+	fw_data += 0x1000;
+
+	/* skip hash, crc and date */
+	entry_point = get_unaligned_be32(fw_data + 8);
+	length = get_unaligned_be32(fw_data + 12);
+	ram_addr = get_unaligned_be32(fw_data + 16);
+
+	dev_dbg(hdev->dev, "entry_point: 0x%x\n", entry_point);
+	dev_dbg(hdev->dev, "length: 0x%x\n", length);
+
+	fw_data += 20;
+
+	sections = DIV_ROUND_UP(length, 24);
+
+	dev_dbg(hdev->dev, "sections: %d\n", sections);
+
+	phy_write_all(hdev, fw2, 0xFFF0);
+	phy_write_all(hdev, fw1, 0x0AAA);
+	phy_write_all(hdev, fw1, 0);
+
+	msleep(500);
+
+	checks = 0;
+	do {
+		usleep_range(10000, 20000);
+		status = phy_read_port(hdev, 0, 0, fw2);
+		dev_dbg(hdev->dev, "lane: %d, status: 0x%x\n", lane, status);
+		if (checks++ > PHY_READ_COUNTS_PER_MS) {
+			dev_err(hdev->dev,
+				"failed to load NIC F/W, fw2 timeout 0x%x\n",
+				status);
+			release_firmware(fw);
+			return -ETIMEDOUT;
+		}
+	} while (status);
+
+	phy_write_all(hdev, fw2, 0);
+
+	for (i = 0 ; i <= sections ; i++) {
+		checksum = 0x800C;
+		phy_write_all(hdev, fw0 + 12, ram_addr >> 16);
+		phy_write_all(hdev, fw0 + 13, ram_addr & 0xFFFF);
+		checksum += (ram_addr >> 16) + (ram_addr & 0xFFFF);
+		for (j = 0 ; j < 12 ; j++) {
+			if (data_ptr >= length)
+				mdio_data = 0;
+			else
+				mdio_data =
+					get_unaligned_be16(fw_data + data_ptr);
+
+			phy_write_all(hdev, fw0 + j, mdio_data);
+			checksum += mdio_data;
+			data_ptr += 2;
+			ram_addr += 2;
+		}
+
+		phy_write_all(hdev, fw0 + 14, (~checksum + 1) & 0xFFFF);
+		phy_write_all(hdev, fw0 + 15, 0x800C);
+
+		checks = 0;
+
+		do {
+			usleep_range(1000, 2000);
+			status = phy_read_port(hdev, 0, 0, fw0 + 15);
+			if (checks++ > PHY_READ_COUNTS_PER_MS) {
+				dev_err(hdev->dev,
+					"failed to load NIC F/W, fw0 timeout 0x%x\n",
+					status);
+				release_firmware(fw);
+				return -ETIMEDOUT;
+			}
+		} while (status == 0x800C);
+	}
+
+	phy_write_all(hdev, fw0 + 12, entry_point >> 16);
+	phy_write_all(hdev, fw0 + 13, entry_point & 0xFFFF);
+	checksum = (entry_point >> 16) + (entry_point & 0xFFFF) + 0x4000;
+	phy_write_all(hdev, fw0 + 14, (~checksum + 1) & 0xFFFF);
+	phy_write_all(hdev, fw0 + 15, 0x4000);
+
+	for (port = 0 ; port < 1 ; port += 2)
+		for (lane = 0 ; lane < 1 ; lane++) {
+			fw_crc_port(hdev, port, lane, &crc);
+			dev_dbg(hdev->dev, "port: %d lane: %d crc: 0x%x\n",
+				port, lane, crc);
+			fw_hash_port(hdev, port, lane, &hash);
+			dev_dbg(hdev->dev, "port: %d lane: %d hash: 0x%x\n",
+				port, lane, hash);
+		}
+
+	return 0;
+}
+
+static u32 fw_tuning_counter(struct gaudi_nic_device *gaudi_nic, int lane)
+{
+	return get_fw_reg(gaudi_nic, lane, 5);
+}
+
+static u32 fw_reset_counter(struct gaudi_nic_device *gaudi_nic, int lane)
+{
+	return get_fw_reg(gaudi_nic, lane, 4);
+}
+
+static void print_eye(struct gaudi_nic_device *gaudi_nic, int lane, bool pam4)
+{
+	struct hl_device *hdev = gaudi_nic->hdev;
+	u32 dac, eye, mask, val1, val2;
+	s32 plus_margin, minus_margin, result, diff;
+	int pam4_eye[3], eye_index, i, sel;
+
+	if (pam4) {
+		dac = (phy_read(gaudi_nic, lane, 0x28) & 0x1E0) >> 5;
+		for (eye_index = 0; eye_index < 3; eye_index++) {
+			result = 0xffff;
+			for (i = 0; i < 3; i++) {
+				sel = 3 * i + eye_index;
+				phy_write_mask(gaudi_nic, lane, 0x88, sel,
+						0xF00);
+				phy_write_mask(gaudi_nic, lane, 0x88, sel,
+						0xF000);
+
+				msleep(100);
+
+				val1 = phy_read(gaudi_nic, lane, 0x32);
+				plus_margin = (val1 & 0xFFF0) >> 4;
+				if (plus_margin > 0x7ff)
+					plus_margin = plus_margin - 0x1000;
+
+				val1 = phy_read(gaudi_nic, lane, 0x32);
+				val2 = phy_read(gaudi_nic, lane, 0x33);
+				minus_margin = ((val1 & 0xF) << 8) +
+						((val2 & 0xFF00) >> 8);
+				if (minus_margin > 0x7ff)
+					minus_margin = minus_margin - 0x1000;
+
+				diff = plus_margin - minus_margin;
+				if (diff < result)
+					result = diff;
+			}
+
+			pam4_eye[eye_index] =
+					(result * (100 + (50 * dac))) / 2048;
+		}
+
+		dev_dbg(hdev->dev,
+			"NIC PAM4 dac: %d eye0: %d eye1: %d eye2: %d\n", dac,
+			pam4_eye[0], pam4_eye[1], pam4_eye[2]);
+	} else {
+		mask = 0xF000;
+		dac = (phy_read(gaudi_nic, lane, 0x17F) & mask) >> __ffs(mask);
+		mask = 0xFFF;
+		eye = (phy_read(gaudi_nic, lane, 0x12A) & mask) >> __ffs(mask);
+
+		dev_dbg(hdev->dev, "dac: %d, eye: %d\n", dac, eye);
+
+		if (eye > 0)
+			dev_dbg(hdev->dev,
+				"NIC port %d lane %d: F/W eye is %d\n",
+				gaudi_nic->port, lane,
+				(eye * (200 + 50 * dac)) / 2048);
+		else
+			dev_err(hdev->dev,
+				"NIC port %d lane %d: F/W got no eye\n",
+				gaudi_nic->port, lane);
+	}
+}
+
+int gaudi_nic_phy_check_link_status(struct gaudi_nic_device *gaudi_nic,
+					int lane)
+{
+	u32 phy_status;
+#if HL_PHY_DEBUG
+	bool signal_detect;
+#endif
+	bool phy_ready, pam4 = gaudi_nic->data_rate == NIC_DR_50;
+
+	if (pam4) {
+		phy_status = phy_read(gaudi_nic, lane, 0x6A);
+		phy_ready = ((phy_status & 0x8000) >> 15) & 1;
+#if HL_PHY_DEBUG
+		signal_detect = ((phy_status & 0x80) >> 7) & 1;
+#endif
+	} else {
+		phy_status = phy_read(gaudi_nic, lane, 0x12E);
+		phy_ready = ((phy_status & 0x4) >> 2) & 1;
+#if HL_PHY_DEBUG
+		signal_detect = ((phy_status & 0x8) >> 3) & 1;
+#endif
+	}
+
+#if HL_PHY_DEBUG
+	{
+		struct hl_device *hdev = gaudi_nic->hdev;
+
+		dev_dbg_ratelimited(hdev->dev,
+			"port: %d, lane: %d, phy ready: %d, signal detect: %d\n",
+			gaudi_nic->port, lane, phy_ready, signal_detect);
+	}
+#endif
+
+	return phy_ready ? 0 : -EFAULT;
+}
+
+int gaudi_nic_phy_fw_tuning(struct gaudi_nic_device *gaudi_nic, int lane,
+				bool check_status)
+{
+	struct hl_device *hdev = gaudi_nic->hdev;
+	u32 status, port = gaudi_nic->port;
+	bool pam4 = gaudi_nic->data_rate == NIC_DR_50;
+
+	fw_tuning_counter(gaudi_nic, lane);
+	fw_reset_counter(gaudi_nic, lane);
+	status = phy_read(gaudi_nic, lane, 0x9811);
+
+	if (status & PHY_FW_FINISHED) {
+		if (status & PHY_FW_ERROR) {
+			dev_dbg(hdev->dev, "NIC %d lane %d F/W tuning failed\n",
+				port, lane);
+			return -EFAULT;
+		}
+#if HL_PHY_DEBUG
+		dev_dbg(hdev->dev,
+			"NIC %d lane %d F/W Tuning is done\n", port, lane);
+#endif
+	} else {
+		return -EAGAIN;
+	}
+
+	if (!gaudi_nic->auto_neg_enable) {
+		phy_write_mask(gaudi_nic, lane, 0x14D, 1, 1 << 15);
+		print_eye(gaudi_nic, lane, pam4);
+	} else if (!check_status) {
+		return 0;
+	}
+
+	return gaudi_nic_phy_check_link_status(gaudi_nic, lane);
+}
+
+int gaudi_nic_phy_power_up(struct gaudi_nic_device *gaudi_nic, int lane,
+				bool do_auto_neg)
+{
+	struct hl_device *hdev = gaudi_nic->hdev;
+	u32 data_rate = gaudi_nic->data_rate;
+	bool pam4 = data_rate == NIC_DR_50, fmode = false;
+	int rc;
+
+	dev_dbg(hdev->dev, "PHY power up port %d lane %d auto_neg: %d\n",
+		gaudi_nic->port, lane, do_auto_neg);
+
+	/* F/W configurations */
+	if (gaudi_nic->auto_neg_enable) {
+		if (do_auto_neg) {
+			rc = gaudi_nic_phy_fw_config_auto_neg(gaudi_nic, lane);
+			if (rc) {
+				dev_err(hdev->dev,
+					"F/W configuration failed for NIC PHY\n");
+				return rc;
+			}
+		}
+	} else {
+		rc = fw_config(gaudi_nic, lane, data_rate, fmode, pam4);
+		if (rc) {
+			dev_err(hdev->dev,
+				"F/W configuration failed for NIC PHY\n");
+			return rc;
+		}
+	}
+
+	return 0;
+}
+
+int gaudi_nic_phy_reset_macro(struct gaudi_nic_device *gaudi_nic)
+{
+	struct hl_device *hdev = gaudi_nic->hdev;
+	s32 chip_reset_addr = 0x980D;
+	bool fmode = false;
+	int rc, i;
+
+	dev_dbg(hdev->dev, "PHY reset macro, port %d\n", gaudi_nic->port);
+
+	/* soft reset */
+	for (i = 0 ; i < 4 ; i++)
+		phy_write(gaudi_nic, i, chip_reset_addr, 0x888);
+
+	usleep_range(500, 1000);
+
+	/* clock configuration */
+	for (i = 0 ; i < 4 ; i++)
+		if (i == 0)
+			phy_write(gaudi_nic, i, 0x00C9, 0x390);
+		else
+			phy_write(gaudi_nic, i, 0x00C9, 0x310);
+
+	for (i = 0 ; i < 4 ; i++) {
+		phy_write(gaudi_nic, i, 0x8000, 0xC000);
+		phy_write(gaudi_nic, i, 0x8210, 0);
+		phy_write(gaudi_nic, i, 0x8100, 0);
+	}
+
+	/* PHY controller reset - to force F/W to start from pointer 0 */
+	for (i = 0 ; i < 4 ; i++) {
+		phy_write(gaudi_nic, i, chip_reset_addr, 0xAAA);
+		phy_write(gaudi_nic, i, chip_reset_addr, 0);
+	}
+
+	/* force the lane pll to run in PAM4 before logical reset */
+	for (i = 0 ; i < 4 ; i++) {
+		rc = fw_config(gaudi_nic, i, NIC_DR_50, fmode, true);
+		if (rc) {
+			dev_err(hdev->dev,
+				"F/W configuration failed for NIC PHY\n");
+			return rc;
+		}
+	}
+
+	/* logic reset */
+	for (i = 0 ; i < 4 ; i++) {
+		phy_write(gaudi_nic, i, chip_reset_addr, 0x777);
+		phy_write(gaudi_nic, i, chip_reset_addr, 0);
+	}
+
+	usleep_range(500, 1000);
+
+	return 0;
+}
+
+void gaudi_nic_phy_reset_tx(struct gaudi_nic_device *gaudi_nic, int lane)
+{
+	u32 val;
+
+	/* disable TX */
+	val = phy_read(gaudi_nic, lane, 0xA0);
+	/* set bit 13 to 1 */
+	val |= 0x2000;
+	/* set bit 11 to 0 */
+	val &= ~0x800;
+	phy_write(gaudi_nic, lane, 0xA0, val);
+
+	msleep(500);
+
+	/* enable TX */
+	val = phy_read(gaudi_nic, lane, 0xA0);
+	/* set bit 13 to 0 */
+	val &= ~0x2000;
+	phy_write(gaudi_nic, lane, 0xA0, val);
+}
+
+void gaudi_nic_phy_start_stop(struct gaudi_nic_device *gaudi_nic, int lane,
+				bool is_start)
+{
+	if (is_start) {
+		/* Enable TX driver in SerDes */
+		phy_write_mask(gaudi_nic, lane, 0xE3, 1, 0x2000);
+		/* F/W Rx tuning is enabled during the power up sequence */
+	} else {
+		/* Disable TX driver in SerDes */
+		phy_write_mask(gaudi_nic, lane, 0xE3, 0, 0x2000);
+		/* Silence F/W Rx tuning */
+		phy_write(gaudi_nic, lane, 0x9815, 0x9000);
+	}
+}
-- 
2.17.1


* [PATCH 07/15] habanalabs/gaudi: allow user to get MAC addresses in INFO IOCTL
  2020-09-10 16:11 [PATCH 00/15] Adding GAUDI NIC code to habanalabs driver Oded Gabbay
                   ` (4 preceding siblings ...)
  2020-09-10 16:11 ` [PATCH 06/15] habanalabs/gaudi: add NIC PHY code Oded Gabbay
@ 2020-09-10 16:11 ` Oded Gabbay
  2020-09-10 16:11 ` [PATCH 08/15] habanalabs/gaudi: add a new IOCTL for NIC control operations Oded Gabbay
                   ` (8 subsequent siblings)
  14 siblings, 0 replies; 44+ messages in thread
From: Oded Gabbay @ 2020-09-10 16:11 UTC (permalink / raw)
  To: linux-kernel, netdev; +Cc: SW_Drivers, gregkh, davem, kuba, Omer Shpigelman

From: Omer Shpigelman <oshpigelman@habana.ai>

The user needs this information when working in a distributed environment
with master/slave configuration. All the slaves get their MAC addresses
from the driver and send them to the master.

Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
---
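Note for reviewers (not part of the commit message): a rough sketch of how
user space might consume the reply of the new opcode. It assumes the caller
has already issued the INFO IOCTL with HL_INFO_MAC_ADDR, passing a buffer of
sizeof(struct hl_info_mac_addr) through the return_pointer/return_size
fields that mac_addr_info() reads. The two structures below are local
mirrors of the uapi definitions in this patch (using stdint types so the
snippet builds without the patched header), and both helper functions are
hypothetical names that exist only for illustration.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Local mirrors of the uapi definitions added by this patch, so the sketch
 * builds without the patched <misc/habanalabs.h> installed.
 */
#define ETH_ALEN			6
#define HL_INFO_MAC_ADDR_MAX_NUM	128

struct hl_mac_addr {
	uint8_t addr[ETH_ALEN];
	uint8_t pad[2];
};

struct hl_info_mac_addr {
	/* MAC address at index N is of the corresponding port ID */
	struct hl_mac_addr array[HL_INFO_MAC_ADDR_MAX_NUM];
	/* Mask of valid entries, 64 ports per word */
	uint64_t mask[2];
};

/* Hypothetical helper: returns 1 if the entry of 'port' is marked valid */
static int hl_mac_addr_is_valid(const struct hl_info_mac_addr *info,
				unsigned int port)
{
	if (port >= HL_INFO_MAC_ADDR_MAX_NUM)
		return 0;

	return (info->mask[port / 64] >> (port % 64)) & 1;
}

/*
 * Hypothetical helper: formats the MAC address of 'port' into 'buf', which
 * should hold at least 18 bytes. Returns 0 on success, or -1 if the driver
 * did not mark the entry as valid.
 */
static int hl_mac_addr_format(const struct hl_info_mac_addr *info,
			      unsigned int port, char *buf, size_t len)
{
	const uint8_t *a;

	if (!hl_mac_addr_is_valid(info, port))
		return -1;

	a = info->array[port].addr;
	snprintf(buf, len, "%02x:%02x:%02x:%02x:%02x:%02x",
		 a[0], a[1], a[2], a[3], a[4], a[5]);

	return 0;
}
```

The mask is checked before the array is read because the driver skips ports
that are disabled or have no net_device, leaving their entries zeroed. A
slave in the distributed setup described above would presumably loop over
all port IDs and forward only the entries that pass this check.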
 drivers/misc/habanalabs/common/habanalabs.h   |  5 +++
 .../misc/habanalabs/common/habanalabs_ioctl.c | 31 +++++++++++++++++++
 drivers/misc/habanalabs/gaudi/gaudi.c         |  1 +
 drivers/misc/habanalabs/gaudi/gaudiP.h        |  2 ++
 drivers/misc/habanalabs/gaudi/gaudi_nic.c     | 27 ++++++++++++++++
 drivers/misc/habanalabs/goya/goya.c           |  9 ++++++
 include/uapi/misc/habanalabs.h                | 20 +++++++++++-
 7 files changed, 94 insertions(+), 1 deletion(-)

diff --git a/drivers/misc/habanalabs/common/habanalabs.h b/drivers/misc/habanalabs/common/habanalabs.h
index f99db3483ba4..6bfef3da6e61 100644
--- a/drivers/misc/habanalabs/common/habanalabs.h
+++ b/drivers/misc/habanalabs/common/habanalabs.h
@@ -602,6 +602,8 @@ enum div_select_defs {
 	DIV_SEL_DIVIDED_PLL = 3,
 };
 
+struct hl_info_mac_addr;
+
 /**
  * struct hl_asic_funcs - ASIC specific functions that are can be called from
  *                        common code.
@@ -679,6 +681,7 @@ enum div_select_defs {
  * @get_hw_state: retrieve the H/W state
  * @pci_bars_map: Map PCI BARs.
  * @init_iatu: Initialize the iATU unit inside the PCI controller.
+ * @get_mac_addr: Get list of MAC addresses.
  * @rreg: Read a register. Needed for simulator support.
  * @wreg: Write a register. Needed for simulator support.
  * @halt_coresight: stop the ETF and ETR traces.
@@ -782,6 +785,8 @@ struct hl_asic_funcs {
 	enum hl_device_hw_state (*get_hw_state)(struct hl_device *hdev);
 	int (*pci_bars_map)(struct hl_device *hdev);
 	int (*init_iatu)(struct hl_device *hdev);
+	int (*get_mac_addr)(struct hl_device *hdev,
+				struct hl_info_mac_addr *mac_addr);
 	u32 (*rreg)(struct hl_device *hdev, u32 reg);
 	void (*wreg)(struct hl_device *hdev, u32 reg, u32 val);
 	void (*halt_coresight)(struct hl_device *hdev);
diff --git a/drivers/misc/habanalabs/common/habanalabs_ioctl.c b/drivers/misc/habanalabs/common/habanalabs_ioctl.c
index 07317ea49129..01425b821828 100644
--- a/drivers/misc/habanalabs/common/habanalabs_ioctl.c
+++ b/drivers/misc/habanalabs/common/habanalabs_ioctl.c
@@ -203,6 +203,33 @@ static int debug_coresight(struct hl_device *hdev, struct hl_debug_args *args)
 	return rc;
 }
 
+static int mac_addr_info(struct hl_device *hdev, struct hl_info_args *args)
+{
+	struct hl_info_mac_addr *mac_addr;
+	u32 max_size = args->return_size;
+	void __user *out = (void __user *) (uintptr_t) args->return_pointer;
+	int rc;
+
+	if (!max_size || !out)
+		return -EINVAL;
+
+	mac_addr = kzalloc(sizeof(struct hl_info_mac_addr), GFP_KERNEL);
+	if (!mac_addr)
+		return -ENOMEM;
+
+	rc = hdev->asic_funcs->get_mac_addr(hdev, mac_addr);
+	if (rc)
+		goto out;
+
+	rc = copy_to_user(out, mac_addr,
+		min((size_t) max_size, sizeof(struct hl_info_mac_addr))) ?
+								-EFAULT : 0;
+
+out:
+	kfree(mac_addr);
+	return rc;
+}
+
 static int device_utilization(struct hl_device *hdev, struct hl_info_args *args)
 {
 	struct hl_info_device_utilization device_util = {0};
@@ -423,6 +450,10 @@ static int _hl_info_ioctl(struct hl_fpriv *hpriv, void *data,
 		rc = hw_idle(hdev, args);
 		break;
 
+	case HL_INFO_MAC_ADDR:
+		rc = mac_addr_info(hdev, args);
+		break;
+
 	case HL_INFO_DEVICE_UTILIZATION:
 		rc = device_utilization(hdev, args);
 		break;
diff --git a/drivers/misc/habanalabs/gaudi/gaudi.c b/drivers/misc/habanalabs/gaudi/gaudi.c
index d350519a9e31..8ce20e0f8c59 100644
--- a/drivers/misc/habanalabs/gaudi/gaudi.c
+++ b/drivers/misc/habanalabs/gaudi/gaudi.c
@@ -7470,6 +7470,7 @@ static const struct hl_asic_funcs gaudi_funcs = {
 	.get_hw_state = gaudi_get_hw_state,
 	.pci_bars_map = gaudi_pci_bars_map,
 	.init_iatu = gaudi_init_iatu,
+	.get_mac_addr = gaudi_nic_get_mac_addr,
 	.rreg = hl_rreg,
 	.wreg = hl_wreg,
 	.halt_coresight = gaudi_halt_coresight,
diff --git a/drivers/misc/habanalabs/gaudi/gaudiP.h b/drivers/misc/habanalabs/gaudi/gaudiP.h
index bf3a215e0f8e..17560510a05f 100644
--- a/drivers/misc/habanalabs/gaudi/gaudiP.h
+++ b/drivers/misc/habanalabs/gaudi/gaudiP.h
@@ -566,6 +566,8 @@ void gaudi_nic_ports_fini(struct hl_device *hdev);
 int gaudi_nic_hard_reset_prepare(struct hl_device *hdev);
 void gaudi_nic_stop(struct hl_device *hdev);
 void gaudi_nic_ports_reopen(struct hl_device *hdev);
+int gaudi_nic_get_mac_addr(struct hl_device *hdev,
+				struct hl_info_mac_addr *mac_addr);
 void gaudi_nic_ctx_fini(struct hl_ctx *ctx);
 irqreturn_t gaudi_nic_rx_irq_handler(int irq, void *arg);
 irqreturn_t gaudi_nic_cq_irq_handler(int irq, void *arg);
diff --git a/drivers/misc/habanalabs/gaudi/gaudi_nic.c b/drivers/misc/habanalabs/gaudi/gaudi_nic.c
index ff08cfc81e69..491f426ab0bb 100644
--- a/drivers/misc/habanalabs/gaudi/gaudi_nic.c
+++ b/drivers/misc/habanalabs/gaudi/gaudi_nic.c
@@ -2743,6 +2743,33 @@ void gaudi_nic_ports_reopen(struct hl_device *hdev)
 	gaudi->hw_cap_initialized |= HW_CAP_NIC_DRV;
 }
 
+int gaudi_nic_get_mac_addr(struct hl_device *hdev,
+				struct hl_info_mac_addr *mac_addr)
+{
+	struct gaudi_device *gaudi = hdev->asic_specific;
+	struct net_device *ndev;
+	int i, number_of_ports;
+
+	if (!(gaudi->hw_cap_initialized & HW_CAP_NIC_DRV))
+		goto out;
+
+	number_of_ports = min_t(int, NIC_NUMBER_OF_PORTS,
+				HL_INFO_MAC_ADDR_MAX_NUM);
+
+	for (i = 0 ; i < number_of_ports ; i++) {
+		if (!(hdev->nic_ports_mask & BIT(i)))
+			continue;
+
+		ndev = gaudi->nic_devices[i].ndev;
+		if (!ndev)
+			continue;
+
+		ether_addr_copy(mac_addr->array[i].addr, ndev->dev_addr);
+		mac_addr->mask[i / 64] |= BIT_ULL(i % 64);
+	}
+out:
+	return 0;
+}
 void gaudi_nic_ctx_fini(struct hl_ctx *ctx)
 {
 }
diff --git a/drivers/misc/habanalabs/goya/goya.c b/drivers/misc/habanalabs/goya/goya.c
index 8e15d9f85af8..e49ee24cde50 100644
--- a/drivers/misc/habanalabs/goya/goya.c
+++ b/drivers/misc/habanalabs/goya/goya.c
@@ -5265,6 +5265,14 @@ static enum hl_device_hw_state goya_get_hw_state(struct hl_device *hdev)
 	return RREG32(mmHW_STATE);
 }
 
+static int goya_get_mac_addr(struct hl_device *hdev,
+			struct hl_info_mac_addr *mac_addr)
+{
+	dev_err_ratelimited(hdev->dev,
+				"No MAC addresses are assigned to Goya\n");
+	return -ENXIO;
+}
+
 static int goya_ctx_init(struct hl_ctx *ctx)
 {
 	return 0;
@@ -5384,6 +5392,7 @@ static const struct hl_asic_funcs goya_funcs = {
 	.get_hw_state = goya_get_hw_state,
 	.pci_bars_map = goya_pci_bars_map,
 	.init_iatu = goya_init_iatu,
+	.get_mac_addr = goya_get_mac_addr,
 	.rreg = hl_rreg,
 	.wreg = hl_wreg,
 	.halt_coresight = goya_halt_coresight,
diff --git a/include/uapi/misc/habanalabs.h b/include/uapi/misc/habanalabs.h
index e8a5b62b95dd..cd600a52f40a 100644
--- a/include/uapi/misc/habanalabs.h
+++ b/include/uapi/misc/habanalabs.h
@@ -10,6 +10,7 @@
 
 #include <linux/types.h>
 #include <linux/ioctl.h>
+#include <linux/if_ether.h>
 
 /*
  * Defines that are asic-specific but constitutes as ABI between kernel driver
@@ -248,6 +249,8 @@ enum hl_device_status {
  *                         internal engine.
  * HL_INFO_DEVICE_STATUS - Retrieve the device's status. This opcode doesn't
  *                         require an open context.
+ * HL_INFO_MAC_ADDR      - Retrieve the list of MAC addresses of the device's
+ *                         network ports, if the device has network ports.
  * HL_INFO_DEVICE_UTILIZATION  - Retrieve the total utilization of the device
  *                               over the last period specified by the user.
  *                               The period can be between 100ms to 1s, in
@@ -274,6 +277,7 @@ enum hl_device_status {
 #define HL_INFO_DRAM_USAGE		2
 #define HL_INFO_HW_IDLE			3
 #define HL_INFO_DEVICE_STATUS		4
+#define HL_INFO_MAC_ADDR		5
 #define HL_INFO_DEVICE_UTILIZATION	6
 #define HL_INFO_HW_EVENTS_AGGREGATE	7
 #define HL_INFO_CLK_RATE		8
@@ -285,9 +289,11 @@ enum hl_device_status {
 #define HL_INFO_SYNC_MANAGER		14
 #define HL_INFO_TOTAL_ENERGY		15
 
-#define HL_INFO_VERSION_MAX_LEN	128
+#define HL_INFO_VERSION_MAX_LEN		128
 #define HL_INFO_CARD_NAME_MAX_LEN	16
 
+#define HL_INFO_MAC_ADDR_MAX_NUM	128
+
 struct hl_info_hw_ip_info {
 	__u64 sram_base_address;
 	__u64 dram_base_address;
@@ -334,6 +340,18 @@ struct hl_info_device_status {
 	__u32 pad;
 };
 
+struct hl_mac_addr {
+	__u8 addr[ETH_ALEN];
+	__u8 pad[2];
+};
+
+struct hl_info_mac_addr {
+	/* MAC address at index N is of the corresponding PORT ID */
+	struct hl_mac_addr array[HL_INFO_MAC_ADDR_MAX_NUM];
+	/* Mask of valid entries at the MAC addresses array */
+	__u64 mask[2];
+};
+
 struct hl_info_device_utilization {
 	__u32 utilization;
 	__u32 pad;
-- 
2.17.1



* [PATCH 08/15] habanalabs/gaudi: add a new IOCTL for NIC control operations
  2020-09-10 16:11 [PATCH 00/15] Adding GAUDI NIC code to habanalabs driver Oded Gabbay
                   ` (5 preceding siblings ...)
  2020-09-10 16:11 ` [PATCH 07/15] habanalabs/gaudi: allow user to get MAC addresses in INFO IOCTL Oded Gabbay
@ 2020-09-10 16:11 ` Oded Gabbay
  2020-09-10 16:11 ` [PATCH 09/15] habanalabs/gaudi: add CQ " Oded Gabbay
                   ` (7 subsequent siblings)
  14 siblings, 0 replies; 44+ messages in thread
From: Oded Gabbay @ 2020-09-10 16:11 UTC (permalink / raw)
  To: linux-kernel, netdev; +Cc: SW_Drivers, gregkh, davem, kuba, Omer Shpigelman

From: Omer Shpigelman <oshpigelman@habana.ai>

Add Queue Pair (QP) opcodes to the NIC ioctl.

A QP represents a connection between two Gaudi ports. Each port currently
supports 1024 QPs, where QP 0 is reserved for the driver for Ethernet.
A user-space process needs to create a QP in order to communicate with
other Gaudis.
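
As a rough illustration of the ID range described above, the following
user-space sketch mirrors the kind of validation the patch performs in
conn_ioctl_check(). The ID limits are taken from the uapi header in this
patch; the port count of ten is an assumption based on the cover letter
(10x100GbE), and sketch_conn_check() is a hypothetical name, not driver code.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Values taken from include/uapi/misc/habanalabs.h in this patch. */
#define HL_NIC_MIN_CONN_ID	1
#define HL_NIC_MAX_CONN_ID	1023

/* Assumption: ten ports per device, as stated in the cover letter. */
#define SKETCH_NUM_PORTS	10

/*
 * Hypothetical helper: validate a (port, conn_id) pair against a mask of
 * enabled ports. QP 0 is below HL_NIC_MIN_CONN_ID and is therefore
 * rejected, matching "QP 0 is reserved for the driver".
 */
static int sketch_conn_check(uint64_t ports_mask, uint32_t port,
				uint32_t conn_id)
{
	if (port >= SKETCH_NUM_PORTS)
		return -EINVAL;

	if (!(ports_mask & (1ULL << port)))
		return -ENODEV;

	if (conn_id < HL_NIC_MIN_CONN_ID || conn_id > HL_NIC_MAX_CONN_ID)
		return -EINVAL;

	return 0;
}
```

With all ten ports enabled (mask 0x3FF), user connection IDs 1..1023 pass
the check, while ID 0 and ID 1024 are rejected.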

A QP can have two contexts: requester (sender) and responder (receiver).
Both have unique parameters as well as shared ones.

The QP numbers are not recycled immediately but only after wraparound. This
is to avoid cases where a QP is closed and reopened and then receives data
that was meant for the "old" QP.

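The deferred recycling described above can be modelled in user space as
follows. This is only a sketch under stated assumptions: the patch itself
uses an IDR and a dummy QP pointer, whereas this model uses a small array of
slot states, and every name here (id_table, sketch_alloc, sketch_destroy) is
hypothetical. The ID range is shrunk to 1..4 so the behaviour is easy to
follow.

```c
#include <assert.h>
#include <errno.h>

#define SKETCH_MIN_ID	1
#define SKETCH_MAX_ID	4	/* tiny range for illustration only */

enum slot_state { SLOT_FREE = 0, SLOT_LIVE, SLOT_DUMMY };

struct id_table {
	enum slot_state slot[SKETCH_MAX_ID + 1];
};

/* Return the lowest free ID, or -ENOSPC if every slot is taken. */
static int find_free(const struct id_table *t)
{
	for (int id = SKETCH_MIN_ID; id <= SKETCH_MAX_ID; id++)
		if (t->slot[id] == SLOT_FREE)
			return id;

	return -ENOSPC;
}

/*
 * Allocate an ID. Destroyed IDs stay marked as SLOT_DUMMY and are only
 * released for reuse when no free ID is left.
 */
static int sketch_alloc(struct id_table *t)
{
	int id = find_free(t);

	if (id < 0) {
		/* Table is full: purge the dummies and try once more. */
		for (int i = SKETCH_MIN_ID; i <= SKETCH_MAX_ID; i++)
			if (t->slot[i] == SLOT_DUMMY)
				t->slot[i] = SLOT_FREE;
		id = find_free(t);
	}

	if (id >= 0)
		t->slot[id] = SLOT_LIVE;

	return id;
}

/* Destroy a live ID by turning it into a dummy; it is not freed yet. */
static int sketch_destroy(struct id_table *t, int id)
{
	if (id < SKETCH_MIN_ID || id > SKETCH_MAX_ID ||
			t->slot[id] != SLOT_LIVE)
		return -EINVAL;

	t->slot[id] = SLOT_DUMMY;
	return 0;
}
```

In this model, allocating ID 1, destroying it and allocating again yields
ID 2 rather than ID 1. ID 1 becomes available again only once IDs 2..4 have
been handed out and the table has no free slot left.
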
The added opcodes are:

- Create a QP
- Set requester context
- Set responder context
- Destroy a QP
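
The four opcodes above are dispatched through per-opcode size tables, and
the user-supplied sizes are clamped to the expected struct sizes before any
copy takes place. The sketch below models that clamping in user space. The
opcode values and struct layouts are copied from the uapi header in this
patch (with the local names shortened); sketch_sizes, op_sizes and
sketch_clamp() are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Opcode values as defined in the uapi header of this patch. */
#define HL_NIC_OP_ALLOC_CONN		0
#define HL_NIC_OP_SET_REQ_CONN_CTX	1
#define HL_NIC_OP_SET_RES_CONN_CTX	2
#define HL_NIC_OP_DESTROY_CONN		3

/* Local mirrors of the uapi structs, written compactly. */
struct alloc_in { uint32_t port, pad; };
struct alloc_out { uint32_t conn_id, pad; };
struct req_in {
	uint32_t src_ip_addr, dst_ip_addr, dst_conn_id, burst_size,
		 last_index, port, conn_id;
	uint8_t dst_mac_addr[6];
	uint8_t sq_number, priority, enable_sob, timer_granularity,
		swq_granularity, wq_type, version, cq_number,
		wq_remote_log_size, pad;
};
struct res_in {
	uint32_t src_ip_addr, dst_ip_addr, dst_conn_id, port, conn_id;
	uint8_t dst_mac_addr[6];
	uint8_t priority, sq_number, enable_sob, wq_peer_granularity,
		cq_number, version;
	uint32_t conn_peer;
};
struct destroy_in { uint32_t port, conn_id; };

struct sketch_sizes { uint32_t in, out; };

/* Expected input/output sizes, indexed by opcode. */
static const struct sketch_sizes op_sizes[] = {
	[HL_NIC_OP_ALLOC_CONN] = { sizeof(struct alloc_in),
				   sizeof(struct alloc_out) },
	[HL_NIC_OP_SET_REQ_CONN_CTX] = { sizeof(struct req_in), 0 },
	[HL_NIC_OP_SET_RES_CONN_CTX] = { sizeof(struct res_in), 0 },
	[HL_NIC_OP_DESTROY_CONN] = { sizeof(struct destroy_in), 0 },
};

static uint32_t min_u32(uint32_t a, uint32_t b)
{
	return a < b ? a : b;
}

/* Clamp user sizes to the table; unknown opcodes get -ENOTTY. */
static int sketch_clamp(uint32_t op, uint32_t *in_size, uint32_t *out_size)
{
	if (op >= sizeof(op_sizes) / sizeof(op_sizes[0]))
		return -ENOTTY;

	*in_size = min_u32(*in_size, op_sizes[op].in);
	*out_size = min_u32(*out_size, op_sizes[op].out);
	return 0;
}
```

Clamping means an oversized request is truncated to the struct the driver
knows about, and an opcode with no output (such as destroy) always ends up
with an output size of zero, so nothing is copied back to the user.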

Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
---
 drivers/misc/habanalabs/common/habanalabs.h   |   3 +
 .../misc/habanalabs/common/habanalabs_ioctl.c |  98 ++++-
 drivers/misc/habanalabs/gaudi/gaudi.c         |   1 +
 drivers/misc/habanalabs/gaudi/gaudiP.h        |   2 +
 drivers/misc/habanalabs/gaudi/gaudi_nic.c     | 409 ++++++++++++++++++
 drivers/misc/habanalabs/goya/goya.c           |   9 +
 include/uapi/misc/habanalabs.h                | 129 +++++-
 7 files changed, 649 insertions(+), 2 deletions(-)

diff --git a/drivers/misc/habanalabs/common/habanalabs.h b/drivers/misc/habanalabs/common/habanalabs.h
index 6bfef3da6e61..5c48d9855e2e 100644
--- a/drivers/misc/habanalabs/common/habanalabs.h
+++ b/drivers/misc/habanalabs/common/habanalabs.h
@@ -679,6 +679,7 @@ struct hl_info_mac_addr;
  *                    then the timeout is the default timeout for the specific
  *                    ASIC
  * @get_hw_state: retrieve the H/W state
+ * @nic_control: Perform NIC related operations.
  * @pci_bars_map: Map PCI BARs.
  * @init_iatu: Initialize the iATU unit inside the PCI controller.
  * @get_mac_addr: Get list of MAC addresses.
@@ -783,6 +784,8 @@ struct hl_asic_funcs {
 	int (*send_cpu_message)(struct hl_device *hdev, u32 *msg,
 				u16 len, u32 timeout, long *result);
 	enum hl_device_hw_state (*get_hw_state)(struct hl_device *hdev);
+	int (*nic_control)(struct hl_device *hdev, u32 op, void *input,
+				void *output);
 	int (*pci_bars_map)(struct hl_device *hdev);
 	int (*init_iatu)(struct hl_device *hdev);
 	int (*get_mac_addr)(struct hl_device *hdev,
diff --git a/drivers/misc/habanalabs/common/habanalabs_ioctl.c b/drivers/misc/habanalabs/common/habanalabs_ioctl.c
index 01425b821828..9924288aabae 100644
--- a/drivers/misc/habanalabs/common/habanalabs_ioctl.c
+++ b/drivers/misc/habanalabs/common/habanalabs_ioctl.c
@@ -24,6 +24,20 @@ static u32 hl_debug_struct_size[HL_DEBUG_OP_TIMESTAMP + 1] = {
 
 };
 
+static u32 hl_nic_input_size[HL_NIC_OP_DESTROY_CONN + 1] = {
+	[HL_NIC_OP_ALLOC_CONN] = sizeof(struct hl_nic_alloc_conn_in),
+	[HL_NIC_OP_SET_REQ_CONN_CTX] = sizeof(struct hl_nic_req_conn_ctx_in),
+	[HL_NIC_OP_SET_RES_CONN_CTX] = sizeof(struct hl_nic_res_conn_ctx_in),
+	[HL_NIC_OP_DESTROY_CONN] = sizeof(struct hl_nic_destroy_conn_in),
+};
+
+static u32 hl_nic_output_size[HL_NIC_OP_DESTROY_CONN + 1] = {
+	[HL_NIC_OP_ALLOC_CONN] = sizeof(struct hl_nic_alloc_conn_out),
+	[HL_NIC_OP_SET_REQ_CONN_CTX] = 0,
+	[HL_NIC_OP_SET_RES_CONN_CTX] = 0,
+	[HL_NIC_OP_DESTROY_CONN] = 0,
+};
+
 static int device_status_info(struct hl_device *hdev, struct hl_info_args *args)
 {
 	struct hl_info_device_status dev_stat = {0};
@@ -545,6 +559,87 @@ static int hl_debug_ioctl(struct hl_fpriv *hpriv, void *data)
 	return rc;
 }
 
+static int nic_control(struct hl_device *hdev, struct hl_nic_args *args)
+{
+	void *input = NULL, *output = NULL;
+	int rc;
+
+	if (args->input_ptr && args->input_size) {
+		input = kzalloc(hl_nic_input_size[args->op], GFP_KERNEL);
+		if (!input) {
+			rc = -ENOMEM;
+			goto out;
+		}
+
+		if (copy_from_user(input, u64_to_user_ptr(args->input_ptr),
+					args->input_size)) {
+			rc = -EFAULT;
+			dev_err(hdev->dev, "failed to copy input NIC data\n");
+			goto out;
+		}
+	}
+
+	if (args->output_ptr && args->output_size) {
+		output = kzalloc(hl_nic_output_size[args->op], GFP_KERNEL);
+		if (!output) {
+			rc = -ENOMEM;
+			goto out;
+		}
+	}
+
+	rc = hdev->asic_funcs->nic_control(hdev, args->op, input, output);
+	if (rc)
+		dev_err_ratelimited(hdev->dev,
+				"NIC control operation %d failed %d\n",
+				args->op, rc);
+
+	if (output && copy_to_user((void __user *) (uintptr_t) args->output_ptr,
+					output, args->output_size)) {
+		dev_err(hdev->dev, "copy to user failed in nic ioctl\n");
+		rc = -EFAULT;
+		goto out;
+	}
+
+out:
+	kfree(output);
+	kfree(input);
+
+	return rc;
+}
+
+static int hl_nic_ioctl(struct hl_fpriv *hpriv, void *data)
+{
+	struct hl_nic_args *args = data;
+	struct hl_device *hdev = hpriv->hdev;
+	int rc;
+
+	if (hl_device_disabled_or_in_reset(hdev)) {
+		dev_warn_ratelimited(hdev->dev,
+			"Device is %s. Can't execute NIC IOCTL\n",
+			atomic_read(&hdev->in_reset) ? "in_reset" : "disabled");
+		return -EBUSY;
+	}
+
+	switch (args->op) {
+	case HL_NIC_OP_ALLOC_CONN:
+	case HL_NIC_OP_SET_REQ_CONN_CTX:
+	case HL_NIC_OP_SET_RES_CONN_CTX:
+	case HL_NIC_OP_DESTROY_CONN:
+		args->input_size =
+			min(args->input_size, hl_nic_input_size[args->op]);
+		args->output_size =
+			min(args->output_size, hl_nic_output_size[args->op]);
+		rc = nic_control(hdev, args);
+		break;
+	default:
+		dev_err(hdev->dev, "Invalid request %d\n", args->op);
+		rc = -ENOTTY;
+		break;
+	}
+
+	return rc;
+}
+
 #define HL_IOCTL_DEF(ioctl, _func) \
 	[_IOC_NR(ioctl)] = {.cmd = ioctl, .func = _func}
 
@@ -554,7 +649,8 @@ static const struct hl_ioctl_desc hl_ioctls[] = {
 	HL_IOCTL_DEF(HL_IOCTL_CS, hl_cs_ioctl),
 	HL_IOCTL_DEF(HL_IOCTL_WAIT_CS, hl_cs_wait_ioctl),
 	HL_IOCTL_DEF(HL_IOCTL_MEMORY, hl_mem_ioctl),
-	HL_IOCTL_DEF(HL_IOCTL_DEBUG, hl_debug_ioctl)
+	HL_IOCTL_DEF(HL_IOCTL_DEBUG, hl_debug_ioctl),
+	HL_IOCTL_DEF(HL_IOCTL_NIC, hl_nic_ioctl)
 };
 
 static const struct hl_ioctl_desc hl_ioctls_control[] = {
diff --git a/drivers/misc/habanalabs/gaudi/gaudi.c b/drivers/misc/habanalabs/gaudi/gaudi.c
index 8ce20e0f8c59..33464bbad157 100644
--- a/drivers/misc/habanalabs/gaudi/gaudi.c
+++ b/drivers/misc/habanalabs/gaudi/gaudi.c
@@ -7468,6 +7468,7 @@ static const struct hl_asic_funcs gaudi_funcs = {
 	.get_eeprom_data = gaudi_get_eeprom_data,
 	.send_cpu_message = gaudi_send_cpu_message,
 	.get_hw_state = gaudi_get_hw_state,
+	.nic_control = gaudi_nic_control,
 	.pci_bars_map = gaudi_pci_bars_map,
 	.init_iatu = gaudi_init_iatu,
 	.get_mac_addr = gaudi_nic_get_mac_addr,
diff --git a/drivers/misc/habanalabs/gaudi/gaudiP.h b/drivers/misc/habanalabs/gaudi/gaudiP.h
index 17560510a05f..8a89da6b86a1 100644
--- a/drivers/misc/habanalabs/gaudi/gaudiP.h
+++ b/drivers/misc/habanalabs/gaudi/gaudiP.h
@@ -568,6 +568,8 @@ void gaudi_nic_stop(struct hl_device *hdev);
 void gaudi_nic_ports_reopen(struct hl_device *hdev);
 int gaudi_nic_get_mac_addr(struct hl_device *hdev,
 				struct hl_info_mac_addr *mac_addr);
+int gaudi_nic_control(struct hl_device *hdev, u32 op, void *input,
+			void *output);
 void gaudi_nic_ctx_fini(struct hl_ctx *ctx);
 irqreturn_t gaudi_nic_rx_irq_handler(int irq, void *arg);
 irqreturn_t gaudi_nic_cq_irq_handler(int irq, void *arg);
diff --git a/drivers/misc/habanalabs/gaudi/gaudi_nic.c b/drivers/misc/habanalabs/gaudi/gaudi_nic.c
index 491f426ab0bb..cbe625a9b6f3 100644
--- a/drivers/misc/habanalabs/gaudi/gaudi_nic.c
+++ b/drivers/misc/habanalabs/gaudi/gaudi_nic.c
@@ -58,6 +58,9 @@ enum link_status {
 #define MAC_CFG_XPCS91(addr, data)	\
 				mac_write(gaudi_nic, i, "xpcs91", addr, data)
 
+static struct hl_qp dummy_qp;
+static int qp_put(struct hl_qp *qp);
+
 static void qpc_cache_inv(struct gaudi_nic_device *gaudi_nic, bool is_req)
 {
 	struct hl_device *hdev = gaudi_nic->hdev;
@@ -2770,6 +2773,412 @@ int gaudi_nic_get_mac_addr(struct hl_device *hdev,
 out:
 	return 0;
 }
+
+static struct hl_qp *qp_get(struct hl_device *hdev,
+			struct gaudi_nic_device *gaudi_nic, u32 conn_id)
+{
+	struct hl_qp *qp;
+
+	mutex_lock(&gaudi_nic->idr_lock);
+	qp = idr_find(&gaudi_nic->qp_ids, conn_id);
+	if (!qp || qp == &dummy_qp) {
+		dev_err(hdev->dev,
+			"Failed to find matching QP for handle %d in port %d\n",
+			conn_id, gaudi_nic->port);
+		goto out;
+	}
+
+	kref_get(&qp->refcount);
+out:
+	mutex_unlock(&gaudi_nic->idr_lock);
+
+	return qp;
+}
+
+static void qp_do_release(struct hl_qp *qp)
+{
+	mutex_destroy(&qp->qpc_lock);
+	kfree(qp);
+}
+
+static void qp_release(struct kref *ref)
+{
+	struct hl_qp *qp = container_of(ref, struct hl_qp, refcount);
+	struct gaudi_nic_device *gaudi_nic = qp->gaudi_nic;
+	struct hl_device *hdev = gaudi_nic->hdev;
+	struct gaudi_device *gaudi = hdev->asic_specific;
+	void __iomem *base_bar_addr = hdev->pcie_bar[HBM_BAR_ID] -
+					gaudi->hbm_bar_cur_addr;
+	struct qpc_requester req_qpc = {};
+	struct qpc_responder res_qpc = {};
+	u64 req_qpc_addr, res_qpc_addr;
+	int i;
+
+	req_qpc_addr = REQ_QPC_ADDR(qp->port, qp->conn_id);
+	res_qpc_addr = RES_QPC_ADDR(qp->port, qp->conn_id);
+
+	REQ_QPC_SET_VALID(req_qpc, 0);
+	RES_QPC_SET_VALID(res_qpc, 0);
+
+	mutex_lock(&qp->qpc_lock);
+
+	if (qp->is_req)
+		for (i = 0 ; i < (sizeof(req_qpc) / sizeof(u64)) ; i++)
+			writeq(req_qpc.data[i], base_bar_addr +
+					(req_qpc_addr + i * 8));
+
+	if (qp->is_res)
+		for (i = 0 ; i < (sizeof(res_qpc) / sizeof(u64)) ; i++)
+			writeq(res_qpc.data[i], base_bar_addr +
+					(res_qpc_addr + i * 8));
+
+	/* Perform read to flush the writes of the connection context */
+	readq(hdev->pcie_bar[HBM_BAR_ID]);
+
+	if (qp->is_req)
+		qpc_cache_inv(gaudi_nic, true);
+	if (qp->is_res)
+		qpc_cache_inv(gaudi_nic, false);
+
+	mutex_unlock(&qp->qpc_lock);
+
+	/*
+	 * No need in removing the QP ID from the IDR. This will be done once
+	 * the IDR gets full. We do this lazy cleanup because we don't want to
+	 * reuse a QP ID immediately after a QP was destroyed.
+	 */
+	qp_do_release(qp);
+}
+
+static int qp_put(struct hl_qp *qp)
+{
+	return kref_put(&qp->refcount, qp_release);
+}
+
+/* "gaudi_nic->idr_lock" should be taken from the caller function if needed */
+static void qps_clean_dummies(struct gaudi_nic_device *gaudi_nic)
+{
+	struct hl_qp *qp;
+	int qp_id;
+
+	idr_for_each_entry(&gaudi_nic->qp_ids, qp, qp_id)
+		if (qp == &dummy_qp)
+			idr_remove(&gaudi_nic->qp_ids, qp_id);
+}
+
+static int conn_ioctl_check(struct hl_device *hdev, u32 port, u32 conn_id)
+{
+	if (port >= NIC_NUMBER_OF_PORTS) {
+		dev_err(hdev->dev, "Invalid port %d\n", port);
+		return -EINVAL;
+	}
+
+	if (!(hdev->nic_ports_mask & BIT(port))) {
+		dev_err(hdev->dev, "Port %d is disabled\n", port);
+		return -ENODEV;
+	}
+
+	if (conn_id < HL_NIC_MIN_CONN_ID || conn_id > HL_NIC_MAX_CONN_ID) {
+		dev_err(hdev->dev, "Invalid connection ID %d for port %d\n",
+			conn_id, port);
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+static int alloc_conn(struct hl_device *hdev, struct hl_nic_alloc_conn_in *in,
+			struct hl_nic_alloc_conn_out *out)
+{
+	struct gaudi_device *gaudi = hdev->asic_specific;
+	struct gaudi_nic_device *gaudi_nic;
+	struct hl_qp *qp;
+	int id, rc;
+
+	if (!in || !out) {
+		dev_err(hdev->dev,
+			"Missing parameters to allocate a NIC context\n");
+		return -EINVAL;
+	}
+
+	rc = conn_ioctl_check(hdev, in->port, HL_NIC_MIN_CONN_ID);
+	if (rc)
+		return rc;
+
+	qp = kzalloc(sizeof(*qp), GFP_KERNEL);
+	if (!qp)
+		return -ENOMEM;
+
+	gaudi_nic = &gaudi->nic_devices[in->port];
+	mutex_init(&qp->qpc_lock);
+	kref_init(&qp->refcount);
+	qp->gaudi_nic = gaudi_nic;
+	qp->port = in->port;
+
+	/* TODO: handle local/remote keys */
+
+	mutex_lock(&gaudi_nic->idr_lock);
+	id = idr_alloc(&gaudi_nic->qp_ids, qp, HL_NIC_MIN_CONN_ID,
+			HL_NIC_MAX_CONN_ID + 1, GFP_KERNEL);
+
+	if (id < 0) {
+		/* Try again after removing the dummy ids */
+		qps_clean_dummies(gaudi_nic);
+		id = idr_alloc(&gaudi_nic->qp_ids, qp, HL_NIC_MIN_CONN_ID,
+				HL_NIC_MAX_CONN_ID + 1, GFP_KERNEL);
+	}
+
+	qp->conn_id = id;
+	mutex_unlock(&gaudi_nic->idr_lock);
+
+	if (id < 0) {
+		qp_do_release(qp);
+		return id;
+	}
+
+	dev_dbg(hdev->dev, "Allocating connection id %d in port %d",
+		id, qp->port);
+
+	out->conn_id = id;
+
+	return 0;
+}
+
+static int set_req_conn_ctx(struct hl_device *hdev,
+				struct hl_nic_req_conn_ctx_in *in)
+{
+	struct gaudi_device *gaudi = hdev->asic_specific;
+	struct gaudi_nic_device *gaudi_nic;
+	struct qpc_requester req_qpc = {};
+	struct hl_qp *qp;
+	u64 req_qpc_addr;
+	int i, rc;
+
+	if (!in) {
+		dev_err(hdev->dev,
+			"Missing parameters to set a requester context\n");
+		return -EINVAL;
+	}
+
+	if (in->burst_size == 0) {
+		dev_err(hdev->dev,
+			"Burst size can't be disabled in requester context\n");
+		return -EINVAL;
+	}
+
+	rc = conn_ioctl_check(hdev, in->port, in->conn_id);
+	if (rc)
+		return rc;
+
+	gaudi_nic = &gaudi->nic_devices[in->port];
+
+	qp = qp_get(hdev, gaudi_nic, in->conn_id);
+	if (!qp)
+		return -EINVAL;
+
+	req_qpc_addr = REQ_QPC_ADDR(in->port, in->conn_id);
+	REQ_QPC_SET_DST_QP(req_qpc, in->dst_conn_id);
+	REQ_QPC_SET_PORT(req_qpc, 0);
+	REQ_QPC_SET_PRIORITY(req_qpc, in->priority);
+	REQ_QPC_SET_RKEY(req_qpc, qp->remote_key);
+	REQ_QPC_SET_DST_IP(req_qpc, in->dst_ip_addr);
+	REQ_QPC_SET_SRC_IP(req_qpc, in->src_ip_addr);
+	REQ_QPC_SET_DST_MAC_31_0(req_qpc, *(u32 *) in->dst_mac_addr);
+	REQ_QPC_SET_DST_MAC_47_32(req_qpc, *(u16 *) (in->dst_mac_addr + 4));
+	REQ_QPC_SET_SQ_NUM(req_qpc, in->sq_number);
+	REQ_QPC_SET_TM_GRANULARITY(req_qpc, in->timer_granularity);
+	REQ_QPC_SET_SOB_EN(req_qpc, in->enable_sob);
+	REQ_QPC_SET_TRANSPORT_SERVICE(req_qpc, TS_RC);
+	REQ_QPC_SET_BURST_SIZE(req_qpc, in->burst_size);
+	REQ_QPC_SET_LAST_IDX(req_qpc, in->last_index);
+	REQ_QPC_SET_WQ_BASE_ADDR(req_qpc, in->conn_id);
+	REQ_QPC_SET_SWQ_GRANULARITY(req_qpc, in->swq_granularity);
+	REQ_QPC_SET_VALID(req_qpc, 1);
+
+	mutex_lock(&qp->qpc_lock);
+
+	for (i = 0 ; i < (sizeof(req_qpc) / sizeof(u64)) ; i++)
+		writeq(req_qpc.data[i], hdev->pcie_bar[HBM_BAR_ID] +
+			((req_qpc_addr + i * 8) - gaudi->hbm_bar_cur_addr));
+
+	/* Perform read to flush the writes of the connection context */
+	readq(hdev->pcie_bar[HBM_BAR_ID]);
+
+	qp->is_req = true;
+	qpc_cache_inv(gaudi_nic, true);
+
+	mutex_unlock(&qp->qpc_lock);
+
+	qp_put(qp);
+
+	return 0;
+}
+
+static int set_res_conn_ctx(struct hl_device *hdev,
+				struct hl_nic_res_conn_ctx_in *in)
+{
+	struct gaudi_device *gaudi = hdev->asic_specific;
+	struct gaudi_nic_device *gaudi_nic;
+	struct qpc_responder res_qpc = {};
+	struct hl_qp *qp;
+	u64 res_qpc_addr;
+	int i, rc;
+
+	if (!in) {
+		dev_err(hdev->dev,
+			"Missing parameters to set a responder context\n");
+		return -EINVAL;
+	}
+
+	rc = conn_ioctl_check(hdev, in->port, in->conn_id);
+	if (rc)
+		return rc;
+
+	gaudi_nic = &gaudi->nic_devices[in->port];
+
+	qp = qp_get(hdev, gaudi_nic, in->conn_id);
+	if (!qp)
+		return -EINVAL;
+
+	res_qpc_addr = RES_QPC_ADDR(in->port, in->conn_id);
+	RES_QPC_SET_DST_QP(res_qpc, in->dst_conn_id);
+	RES_QPC_SET_PORT(res_qpc, 0);
+	RES_QPC_SET_PRIORITY(res_qpc, in->priority);
+	RES_QPC_SET_SQ_NUM(res_qpc, in->sq_number);
+	RES_QPC_SET_LKEY(res_qpc, qp->local_key);
+	RES_QPC_SET_DST_IP(res_qpc, in->dst_ip_addr);
+	RES_QPC_SET_SRC_IP(res_qpc, in->src_ip_addr);
+	RES_QPC_SET_DST_MAC_31_0(res_qpc, *(u32 *) in->dst_mac_addr);
+	RES_QPC_SET_DST_MAC_47_32(res_qpc, *(u16 *) (in->dst_mac_addr + 4));
+	RES_QPC_SET_TRANSPORT_SERVICE(res_qpc, TS_RC);
+	RES_QPC_SET_LOG_BUF_SIZE_MASK(res_qpc, 0);
+	RES_QPC_SET_SOB_EN(res_qpc, in->enable_sob);
+	RES_QPC_SET_VALID(res_qpc, 1);
+
+	mutex_lock(&qp->qpc_lock);
+
+	for (i = 0 ; i < (sizeof(res_qpc) / sizeof(u64)) ; i++)
+		writeq(res_qpc.data[i], hdev->pcie_bar[HBM_BAR_ID] +
+			((res_qpc_addr + i * 8) - gaudi->hbm_bar_cur_addr));
+
+	/* Perform read to flush the writes of the connection context */
+	readq(hdev->pcie_bar[HBM_BAR_ID]);
+
+	qp->is_res = true;
+	qpc_cache_inv(gaudi_nic, false);
+
+	mutex_unlock(&qp->qpc_lock);
+
+	qp_put(qp);
+
+	return 0;
+}
+
+static int destroy_conn(struct hl_device *hdev,
+			struct hl_nic_destroy_conn_in *in)
+{
+	struct gaudi_device *gaudi = hdev->asic_specific;
+	struct gaudi_nic_device *gaudi_nic;
+	struct hl_qp *qp;
+	int rc;
+
+	if (!in) {
+		dev_err(hdev->dev,
+			"Missing parameters to destroy a NIC context\n");
+		return -EINVAL;
+	}
+
+	rc = conn_ioctl_check(hdev, in->port, in->conn_id);
+	if (rc)
+		return rc;
+
+	gaudi_nic = &gaudi->nic_devices[in->port];
+
+	/* The QP pointer is replaced with the dummy QP to prevent other threads
+	 * from using the QP. The ID is kept allocated at this stage so the QP
+	 * context can be safely modified. qp_put() is called right afterwards.
+	 */
+	mutex_lock(&gaudi_nic->idr_lock);
+	qp = idr_replace(&gaudi_nic->qp_ids, &dummy_qp, in->conn_id);
+	mutex_unlock(&gaudi_nic->idr_lock);
+
+	if (IS_ERR(qp))
+		return PTR_ERR(qp);
+
+	qp_put(qp);
+
+	return 0;
+}
+
+int gaudi_nic_control(struct hl_device *hdev, u32 op, void *input, void *output)
+{
+	struct gaudi_device *gaudi = hdev->asic_specific;
+	int rc;
+
+	if (!(gaudi->hw_cap_initialized & HW_CAP_NIC_DRV))
+		return -EFAULT;
+
+	switch (op) {
+	case HL_NIC_OP_ALLOC_CONN:
+		rc = alloc_conn(hdev, input, output);
+		break;
+	case HL_NIC_OP_SET_REQ_CONN_CTX:
+		rc = set_req_conn_ctx(hdev, input);
+		break;
+	case HL_NIC_OP_SET_RES_CONN_CTX:
+		rc = set_res_conn_ctx(hdev, input);
+		break;
+	case HL_NIC_OP_DESTROY_CONN:
+		rc = destroy_conn(hdev, input);
+		break;
+	default:
+		dev_err(hdev->dev, "Invalid NIC control request %d\n", op);
+		return -ENOTTY;
+	}
+
+	return rc;
+}
+
+static void qps_destroy(struct hl_device *hdev)
+{
+	struct gaudi_device *gaudi = hdev->asic_specific;
+	struct gaudi_nic_device *gaudi_nic;
+	struct hl_qp *qp;
+	int qp_id, i;
+
+	for (i = 0 ; i < NIC_NUMBER_OF_PORTS ; i++) {
+		if (!(hdev->nic_ports_mask & BIT(i)))
+			continue;
+
+		gaudi_nic = &gaudi->nic_devices[i];
+
+		/*
+		 * No need to acquire "gaudi_nic->idr_lock", as qps_destroy() is
+		 * only called when a context is closed, and in Gaudi we have a
+		 * single context.
+		 */
+
+		qps_clean_dummies(gaudi_nic);
+
+		idr_for_each_entry(&gaudi_nic->qp_ids, qp, qp_id) {
+			idr_remove(&gaudi_nic->qp_ids, qp_id);
+			if (qp_put(qp) != 1)
+				dev_err(hdev->dev,
+					"QP %d of port %d is still alive\n",
+					qp->conn_id, qp->port);
+		}
+	}
+}
+
 void gaudi_nic_ctx_fini(struct hl_ctx *ctx)
 {
+	struct hl_device *hdev = ctx->hdev;
+	struct gaudi_device *gaudi = hdev->asic_specific;
+
+	if (!(gaudi->hw_cap_initialized & HW_CAP_NIC_DRV))
+		return;
+
+	qps_destroy(hdev);
+	/* wait for the NIC to digest the invalid QPs */
+	msleep(20);
 }
diff --git a/drivers/misc/habanalabs/goya/goya.c b/drivers/misc/habanalabs/goya/goya.c
index e49ee24cde50..151f886cd7c4 100644
--- a/drivers/misc/habanalabs/goya/goya.c
+++ b/drivers/misc/habanalabs/goya/goya.c
@@ -5265,6 +5265,14 @@ static enum hl_device_hw_state goya_get_hw_state(struct hl_device *hdev)
 	return RREG32(mmHW_STATE);
 }
 
+static int goya_nic_control(struct hl_device *hdev, u32 op, void *input,
+			void *output)
+{
+	dev_err_ratelimited(hdev->dev,
+				"NIC operations cannot be performed on Goya\n");
+	return -ENXIO;
+}
+
 static int goya_get_mac_addr(struct hl_device *hdev,
 			struct hl_info_mac_addr *mac_addr)
 {
@@ -5390,6 +5398,7 @@ static const struct hl_asic_funcs goya_funcs = {
 	.get_eeprom_data = goya_get_eeprom_data,
 	.send_cpu_message = goya_send_cpu_message,
 	.get_hw_state = goya_get_hw_state,
+	.nic_control = goya_nic_control,
 	.pci_bars_map = goya_pci_bars_map,
 	.init_iatu = goya_init_iatu,
 	.get_mac_addr = goya_get_mac_addr,
diff --git a/include/uapi/misc/habanalabs.h b/include/uapi/misc/habanalabs.h
index cd600a52f40a..227bc7c98e08 100644
--- a/include/uapi/misc/habanalabs.h
+++ b/include/uapi/misc/habanalabs.h
@@ -848,6 +848,116 @@ struct hl_debug_args {
 #define HL_NIC_MIN_CONN_ID	1
 #define HL_NIC_MAX_CONN_ID	1023
 
+struct hl_nic_alloc_conn_in {
+	/* NIC port ID */
+	__u32 port;
+	__u32 pad;
+};
+
+struct hl_nic_alloc_conn_out {
+	/* Connection ID */
+	__u32 conn_id;
+	__u32 pad;
+};
+
+struct hl_nic_req_conn_ctx_in {
+	/* Source IP address */
+	__u32 src_ip_addr;
+	/* Destination IP address */
+	__u32 dst_ip_addr;
+	/* Destination connection ID */
+	__u32 dst_conn_id;
+	/* Burst size [1..(2^22)-1 or 0 to disable] */
+	__u32 burst_size;
+	/* Index of last entry [2..(2^22)-1] */
+	__u32 last_index;
+	/* NIC port ID */
+	__u32 port;
+	/* Connection ID */
+	__u32 conn_id;
+	/* Destination MAC address */
+	__u8 dst_mac_addr[ETH_ALEN];
+	/* SQ number */
+	__u8 sq_number;
+	/* Connection priority [0..3] */
+	__u8 priority;
+	/* Enable/disable SOB */
+	__u8 enable_sob;
+	/* Timer granularity [0..127]*/
+	__u8 timer_granularity;
+	/* SWQ granularity [0 for 64B or 1 for 32B] */
+	__u8 swq_granularity;
+	/* Work queue type [1..3] */
+	__u8 wq_type;
+	/* Version type in remote side [0..1] */
+	__u8 version;
+	/* Completion queue number */
+	__u8 cq_number;
+	/* Remote Work queue log size [2^QPC] Rendezvous */
+	__u8 wq_remote_log_size;
+	__u8 pad;
+};
+
+struct hl_nic_res_conn_ctx_in {
+	/* Source IP address */
+	__u32 src_ip_addr;
+	/* Destination IP address */
+	__u32 dst_ip_addr;
+	/* Destination connection ID */
+	__u32 dst_conn_id;
+	/* NIC port ID */
+	__u32 port;
+	/* Connection ID */
+	__u32 conn_id;
+	/* Destination MAC address */
+	__u8 dst_mac_addr[ETH_ALEN];
+	/* Connection priority [0..3] */
+	__u8 priority;
+	/* SQ number */
+	__u8 sq_number;
+	/* Enable/disable SOB */
+	__u8 enable_sob;
+	/* Work queue granularity */
+	__u8 wq_peer_granularity;
+	/* Completion queue number */
+	__u8 cq_number;
+	/* Version type in remote side [0..1] */
+	__u8 version;
+	/* Connection peer */
+	__u32 conn_peer;
+};
+
+struct hl_nic_destroy_conn_in {
+	/* NIC port ID */
+	__u32 port;
+	/* Connection ID */
+	__u32 conn_id;
+};
+
+/* Opcode to allocate connection ID */
+#define HL_NIC_OP_ALLOC_CONN			0
+/* Opcode to set up a requester connection context */
+#define HL_NIC_OP_SET_REQ_CONN_CTX		1
+/* Opcode to set up a responder connection context */
+#define HL_NIC_OP_SET_RES_CONN_CTX		2
+/* Opcode to destroy a connection */
+#define HL_NIC_OP_DESTROY_CONN			3
+
+struct hl_nic_args {
+	/* Pointer to user input structure (relevant to specific opcodes) */
+	__u64 input_ptr;
+	/* Pointer to user output structure (relevant to specific opcodes) */
+	__u64 output_ptr;
+	/* Size of user input structure */
+	__u32 input_size;
+	/* Size of user output structure */
+	__u32 output_size;
+	/* Context ID - Currently not in use */
+	__u32 ctx_id;
+	/* HL_NIC_OP_* */
+	__u32 op;
+};
+
 /*
  * Various information operations such as:
  * - H/W IP information
@@ -1004,7 +1114,24 @@ struct hl_debug_args {
 #define HL_IOCTL_DEBUG		\
 		_IOWR('H', 0x06, struct hl_debug_args)
 
+/*
+ * NIC
+ *
+ * This IOCTL allows the user to manage and configure the device's NIC ports.
+ * The following operations are available:
+ * - Allocate connection ID
+ * - Set up a requester connection context
+ * - Set up a responder connection context
+ * - Destroy a connection
+ *
+ * For all operations, the user should provide a pointer to an input structure
+ * with the context parameters. Some of the operations also require a pointer to
+ * an output structure for result/status.
+ *
+ */
+#define HL_IOCTL_NIC	_IOWR('H', 0x07, struct hl_nic_args)
+
 #define HL_COMMAND_START	0x01
-#define HL_COMMAND_END		0x07
+#define HL_COMMAND_END		0x08
 
 #endif /* HABANALABS_H_ */
-- 
2.17.1



* [PATCH 09/15] habanalabs/gaudi: add CQ control operations
  2020-09-10 16:11 [PATCH 00/15] Adding GAUDI NIC code to habanalabs driver Oded Gabbay
                   ` (6 preceding siblings ...)
  2020-09-10 16:11 ` [PATCH 08/15] habanalabs/gaudi: add a new IOCTL for NIC control operations Oded Gabbay
@ 2020-09-10 16:11 ` Oded Gabbay
  2020-09-10 16:11 ` [PATCH 10/15] habanalabs/gaudi: add WQ " Oded Gabbay
                   ` (6 subsequent siblings)
  14 siblings, 0 replies; 44+ messages in thread
From: Oded Gabbay @ 2020-09-10 16:11 UTC (permalink / raw)
  To: linux-kernel, netdev; +Cc: SW_Drivers, gregkh, davem, kuba, Omer Shpigelman

From: Omer Shpigelman <oshpigelman@habana.ai>

Add NIC Completion Queue (CQ) opcodes to the NIC ioctl. The CQ is used by
the user-space process to get notified of completed work.

A CQ entry (CQE) is one of three types: requester (sender), responder
(receiver) and error. Each type has unique fields as well as shared ones.

Currently only a single user CQ is supported but it may be extended in the
future, hence proper locking was added as well. In addition, an error
interrupt was added to identify CQ overrun.

The added opcodes are:
- Create CQ
- Destroy CQ
- Wait on CQ: sleeps until CQEs are available in the buffer.
- Poll CQ: checks whether CQEs are available in the buffer. It is a
           non-blocking function.
- Update consumed CQEs: The user informs the driver regarding processed
                        CQEs so these can be overwritten by the driver.
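
The poll and update-consumed operations listed above can be pictured with a
simple producer/consumer index pair. The following is a minimal user-space
model, not the driver's implementation: the ring size, the sketch_cq struct
and all function names are assumptions made for illustration, and the
blocking "wait" opcode is not modelled.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define SKETCH_CQ_ENTRIES	4u	/* hypothetical ring size */

struct sketch_cq {
	uint32_t pi;	/* producer index, advanced when a CQE is written */
	uint32_t ci;	/* consumer index, advanced by the user's update */
};

/* Poll: non-blocking, returns how many CQEs are ready to be read. */
static uint32_t sketch_cq_poll(const struct sketch_cq *cq)
{
	return cq->pi - cq->ci;
}

/*
 * Produce one CQE. If the user has not released any entry and the ring
 * is full, report an overrun instead of overwriting unread data.
 */
static int sketch_cq_produce(struct sketch_cq *cq)
{
	if (cq->pi - cq->ci == SKETCH_CQ_ENTRIES)
		return -ENOSPC;

	cq->pi++;
	return 0;
}

/* Update consumed CQEs: release n processed entries for reuse. */
static int sketch_cq_update_consumed(struct sketch_cq *cq, uint32_t n)
{
	if (n > cq->pi - cq->ci)
		return -EINVAL;

	cq->ci += n;
	return 0;
}
```

In this model a full ring refuses new entries until the consumer releases
some, which corresponds to the overrun condition the commit message says
the error interrupt is meant to identify.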

Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
---
 drivers/misc/habanalabs/common/device.c       |   6 +-
 drivers/misc/habanalabs/common/habanalabs.h   |   3 +
 .../misc/habanalabs/common/habanalabs_ioctl.c |  20 +-
 drivers/misc/habanalabs/gaudi/gaudi.c         |   1 +
 drivers/misc/habanalabs/gaudi/gaudiP.h        |   1 +
 drivers/misc/habanalabs/gaudi/gaudi_nic.c     | 592 ++++++++++++++++++
 drivers/misc/habanalabs/goya/goya.c           |   8 +
 include/uapi/misc/habanalabs.h                | 111 ++++
 8 files changed, 739 insertions(+), 3 deletions(-)

diff --git a/drivers/misc/habanalabs/common/device.c b/drivers/misc/habanalabs/common/device.c
index aa7fa9e94651..57f5b945fa41 100644
--- a/drivers/misc/habanalabs/common/device.c
+++ b/drivers/misc/habanalabs/common/device.c
@@ -117,12 +117,13 @@ static int hl_device_release_ctrl(struct inode *inode, struct file *filp)
  * @*filp: pointer to file structure
  * @*vma: pointer to vm_area_struct of the process
  *
- * Called when process does an mmap on habanalabs device. Call the device's mmap
+ * Called when process does an mmap on habanalabs device. Call the relevant mmap
  * function at the end of the common code.
  */
 static int hl_mmap(struct file *filp, struct vm_area_struct *vma)
 {
 	struct hl_fpriv *hpriv = filp->private_data;
+	struct hl_device *hdev = hpriv->hdev;
 	unsigned long vm_pgoff;
 
 	vm_pgoff = vma->vm_pgoff;
@@ -131,6 +132,9 @@ static int hl_mmap(struct file *filp, struct vm_area_struct *vma)
 	switch (vm_pgoff & HL_MMAP_TYPE_MASK) {
 	case HL_MMAP_TYPE_CB:
 		return hl_cb_mmap(hpriv, vma);
+
+	case HL_MMAP_TYPE_NIC_CQ:
+		return hdev->asic_funcs->nic_cq_mmap(hdev, vma);
 	}
 
 	return -EINVAL;
diff --git a/drivers/misc/habanalabs/common/habanalabs.h b/drivers/misc/habanalabs/common/habanalabs.h
index 5c48d9855e2e..1f3735a64d88 100644
--- a/drivers/misc/habanalabs/common/habanalabs.h
+++ b/drivers/misc/habanalabs/common/habanalabs.h
@@ -32,6 +32,7 @@
 #define HL_MMAP_TYPE_SHIFT		(62 - PAGE_SHIFT)
 #define HL_MMAP_TYPE_MASK		(0x3ull << HL_MMAP_TYPE_SHIFT)
 #define HL_MMAP_TYPE_CB			(0x2ull << HL_MMAP_TYPE_SHIFT)
+#define HL_MMAP_TYPE_NIC_CQ		(0x1ull << HL_MMAP_TYPE_SHIFT)
 
 #define HL_MMAP_OFFSET_VALUE_MASK	(0x3FFFFFFFFFFFull >> PAGE_SHIFT)
 #define HL_MMAP_OFFSET_VALUE_GET(off)	(off & HL_MMAP_OFFSET_VALUE_MASK)
@@ -680,6 +681,7 @@ struct hl_info_mac_addr;
  *                    ASIC
  * @get_hw_state: retrieve the H/W state
  * @nic_control: Perform NIC related operations.
+ * @nic_cq_mmap: map the NIC CQ buffer.
  * @pci_bars_map: Map PCI BARs.
  * @init_iatu: Initialize the iATU unit inside the PCI controller.
  * @get_mac_addr: Get list of MAC addresses.
@@ -786,6 +788,7 @@ struct hl_asic_funcs {
 	enum hl_device_hw_state (*get_hw_state)(struct hl_device *hdev);
 	int (*nic_control)(struct hl_device *hdev, u32 op, void *input,
 				void *output);
+	int (*nic_cq_mmap)(struct hl_device *hdev, struct vm_area_struct *vma);
 	int (*pci_bars_map)(struct hl_device *hdev);
 	int (*init_iatu)(struct hl_device *hdev);
 	int (*get_mac_addr)(struct hl_device *hdev,
diff --git a/drivers/misc/habanalabs/common/habanalabs_ioctl.c b/drivers/misc/habanalabs/common/habanalabs_ioctl.c
index 9924288aabae..6947ef519872 100644
--- a/drivers/misc/habanalabs/common/habanalabs_ioctl.c
+++ b/drivers/misc/habanalabs/common/habanalabs_ioctl.c
@@ -24,18 +24,29 @@ static u32 hl_debug_struct_size[HL_DEBUG_OP_TIMESTAMP + 1] = {
 
 };
 
-static u32 hl_nic_input_size[HL_NIC_OP_DESTROY_CONN + 1] = {
+static u32 hl_nic_input_size[HL_NIC_OP_CQ_UPDATE_CONSUMED_CQES + 1] = {
 	[HL_NIC_OP_ALLOC_CONN] = sizeof(struct hl_nic_alloc_conn_in),
 	[HL_NIC_OP_SET_REQ_CONN_CTX] = sizeof(struct hl_nic_req_conn_ctx_in),
 	[HL_NIC_OP_SET_RES_CONN_CTX] = sizeof(struct hl_nic_res_conn_ctx_in),
 	[HL_NIC_OP_DESTROY_CONN] = sizeof(struct hl_nic_destroy_conn_in),
+	[HL_NIC_OP_CQ_CREATE] = sizeof(struct hl_nic_cq_create_in),
+	[HL_NIC_OP_CQ_DESTROY] = sizeof(struct hl_nic_cq_destroy_in),
+	[HL_NIC_OP_CQ_WAIT] = sizeof(struct hl_nic_cq_poll_wait_in),
+	[HL_NIC_OP_CQ_POLL] = sizeof(struct hl_nic_cq_poll_wait_in),
+	[HL_NIC_OP_CQ_UPDATE_CONSUMED_CQES] =
+			sizeof(struct hl_nic_cq_update_consumed_cqes_in),
 };
 
-static u32 hl_nic_output_size[HL_NIC_OP_DESTROY_CONN + 1] = {
+static u32 hl_nic_output_size[HL_NIC_OP_CQ_UPDATE_CONSUMED_CQES + 1] = {
 	[HL_NIC_OP_ALLOC_CONN] = sizeof(struct hl_nic_alloc_conn_out),
 	[HL_NIC_OP_SET_REQ_CONN_CTX] = 0,
 	[HL_NIC_OP_SET_RES_CONN_CTX] = 0,
 	[HL_NIC_OP_DESTROY_CONN] = 0,
+	[HL_NIC_OP_CQ_CREATE] = sizeof(struct hl_nic_cq_create_out),
+	[HL_NIC_OP_CQ_DESTROY] = 0,
+	[HL_NIC_OP_CQ_WAIT] = sizeof(struct hl_nic_cq_poll_wait_out),
+	[HL_NIC_OP_CQ_POLL] = sizeof(struct hl_nic_cq_poll_wait_out),
+	[HL_NIC_OP_CQ_UPDATE_CONSUMED_CQES] = 0,
 };
 
 static int device_status_info(struct hl_device *hdev, struct hl_info_args *args)
@@ -625,6 +636,11 @@ static int hl_nic_ioctl(struct hl_fpriv *hpriv, void *data)
 	case HL_NIC_OP_SET_REQ_CONN_CTX:
 	case HL_NIC_OP_SET_RES_CONN_CTX:
 	case HL_NIC_OP_DESTROY_CONN:
+	case HL_NIC_OP_CQ_CREATE:
+	case HL_NIC_OP_CQ_DESTROY:
+	case HL_NIC_OP_CQ_WAIT:
+	case HL_NIC_OP_CQ_POLL:
+	case HL_NIC_OP_CQ_UPDATE_CONSUMED_CQES:
 		args->input_size =
 			min(args->input_size, hl_nic_input_size[args->op]);
 		args->output_size =
diff --git a/drivers/misc/habanalabs/gaudi/gaudi.c b/drivers/misc/habanalabs/gaudi/gaudi.c
index 33464bbad157..34b99bd94ef0 100644
--- a/drivers/misc/habanalabs/gaudi/gaudi.c
+++ b/drivers/misc/habanalabs/gaudi/gaudi.c
@@ -7469,6 +7469,7 @@ static const struct hl_asic_funcs gaudi_funcs = {
 	.send_cpu_message = gaudi_send_cpu_message,
 	.get_hw_state = gaudi_get_hw_state,
 	.nic_control = gaudi_nic_control,
+	.nic_cq_mmap = gaudi_nic_cq_mmap,
 	.pci_bars_map = gaudi_pci_bars_map,
 	.init_iatu = gaudi_init_iatu,
 	.get_mac_addr = gaudi_nic_get_mac_addr,
diff --git a/drivers/misc/habanalabs/gaudi/gaudiP.h b/drivers/misc/habanalabs/gaudi/gaudiP.h
index 8a89da6b86a1..ba3150c073ca 100644
--- a/drivers/misc/habanalabs/gaudi/gaudiP.h
+++ b/drivers/misc/habanalabs/gaudi/gaudiP.h
@@ -571,6 +571,7 @@ int gaudi_nic_get_mac_addr(struct hl_device *hdev,
 int gaudi_nic_control(struct hl_device *hdev, u32 op, void *input,
 			void *output);
 void gaudi_nic_ctx_fini(struct hl_ctx *ctx);
+int gaudi_nic_cq_mmap(struct hl_device *hdev, struct vm_area_struct *vma);
 irqreturn_t gaudi_nic_rx_irq_handler(int irq, void *arg);
 irqreturn_t gaudi_nic_cq_irq_handler(int irq, void *arg);
 netdev_tx_t gaudi_nic_handle_tx_pkt(struct gaudi_nic_device *gaudi_nic,
diff --git a/drivers/misc/habanalabs/gaudi/gaudi_nic.c b/drivers/misc/habanalabs/gaudi/gaudi_nic.c
index cbe625a9b6f3..0583b34a728f 100644
--- a/drivers/misc/habanalabs/gaudi/gaudi_nic.c
+++ b/drivers/misc/habanalabs/gaudi/gaudi_nic.c
@@ -1729,6 +1729,466 @@ void gaudi_nic_sw_fini(struct hl_device *hdev)
 		_gaudi_nic_sw_fini(&gaudi->nic_devices[i]);
 }
 
+/* this function is called from multiple threads */
+static void copy_cqe_to_main_queue(struct hl_device *hdev,
+					struct hl_nic_cqe *cqe)
+{
+	struct gaudi_device *gaudi = hdev->asic_specific;
+	u32 pi;
+
+	spin_lock(&gaudi->nic_cq_lock);
+
+	pi = gaudi->nic_cq_user_pi++;
+	/* wraparound according to the user CQ length */
+	pi &= (gaudi->nic_cq_user_num_of_entries - 1);
+	memcpy(&gaudi->nic_cq_buf[pi], cqe, sizeof(*cqe));
+
+#if HL_NIC_DEBUG
+	if (cqe->type == HL_NIC_CQE_TYPE_RES) {
+		dev_dbg(hdev->dev,
+			"responder, msg_id: 0x%x, port: %d, was copied to pi %d\n",
+			cqe->responder.msg_id, cqe->port, pi);
+	} else {
+		dev_dbg(hdev->dev,
+			"requester, wqe_index: 0x%x, qp_number: %d, port: %d, was copied to pi %d\n",
+			cqe->requester.wqe_index,
+			cqe->qp_number, cqe->port, pi);
+	}
+#endif
+
+	/* copy the CQE before the counter update */
+	mb();
+
+	if (unlikely(!atomic_add_unless(&gaudi->nic_cq_user_new_cqes, 1,
+				gaudi->nic_cq_user_num_of_entries))) {
+		gaudi->nic_cq_status = HL_NIC_CQ_OVERFLOW;
+		dev_err(hdev->dev, "NIC CQ overflow, should recreate NIC CQ\n");
+	}
+
+	spin_unlock(&gaudi->nic_cq_lock);
+}
+
+static void cq_work(struct work_struct *work)
+{
+	struct gaudi_nic_device *gaudi_nic = container_of(work,
+							struct gaudi_nic_device,
+							cq_work.work);
+	struct hl_device *hdev = gaudi_nic->hdev;
+	struct gaudi_device *gaudi = hdev->asic_specific;
+	struct cqe *cq_arr = gaudi_nic->cq_mem_cpu, *cqe_hw;
+	struct hl_nic_cqe cqe_sw;
+	u32 ci = gaudi_nic->cq_ci, cqe_cnt = 0, port = gaudi_nic->port, delay;
+	bool stop_work = false;
+
+	while (1) {
+		if (unlikely(!gaudi->nic_cq_enable) ||
+			unlikely(gaudi->nic_cq_status != HL_NIC_CQ_SUCCESS)) {
+			stop_work = true;
+			break;
+		}
+
+		memset(&cqe_sw, 0, sizeof(cqe_sw));
+
+		/* wraparound according to our buffer length */
+		cqe_hw = &cq_arr[ci & (CQ_PORT_BUF_LEN - 1)];
+
+		if (!CQE_IS_VALID(cqe_hw))
+			break;
+		/* Make sure we read CQE contents after the valid bit check */
+		dma_rmb();
+
+		cqe_sw.port = port;
+
+		if (CQE_TYPE(cqe_hw)) {
+			cqe_sw.type = HL_NIC_CQE_TYPE_RES;
+			cqe_sw.responder.msg_id =
+					(CQE_RES_IMDT_31_22(cqe_hw) << 22) |
+						CQE_RES_IMDT_21_0(cqe_hw);
+
+			/*
+			 * the even port publishes its responder CQEs on the odd
+			 * port CQ. take the correct port in this case.
+			 */
+			if (!CQE_RES_NIC(cqe_hw))
+				cqe_sw.port--;
+		} else {
+			cqe_sw.requester.wqe_index = CQE_REQ_WQE_IDX(cqe_hw);
+			cqe_sw.qp_number = CQE_REQ_QPN(cqe_hw);
+		}
+
+		copy_cqe_to_main_queue(hdev, &cqe_sw);
+
+		CQE_SET_INVALID(cqe_hw);
+
+		/* the H/W CI wraps around at 32 bits */
+		ci++;
+
+		cqe_cnt++;
+		if (unlikely(cqe_cnt > CQ_PORT_BUF_LEN)) {
+			dev_err(hdev->dev,
+				"handled too many CQEs (%d), port: %d\n",
+				cqe_cnt, port);
+			stop_work = true;
+			break;
+		}
+	}
+
+	/* no CQEs to handle */
+	if (cqe_cnt == 0)
+		goto out;
+
+#if HL_NIC_DEBUG
+	dev_dbg(hdev->dev, "update H/W CQ CI: %d, port: %d\n", ci, port);
+#endif
+
+	NIC_WREG32(mmNIC0_RXE0_CQ_CONSUMER_INDEX, ci);
+
+	/*
+	 * perform a read to flush the new CI value before checking for hidden
+	 * packets
+	 */
+	NIC_RREG32(mmNIC0_RXE0_CQ_CONSUMER_INDEX);
+
+	gaudi_nic->cq_ci = ci;
+
+	/* make sure we wake up the waiter after the CI update */
+	mb();
+
+	/* signal the completion queue that there are available CQEs */
+	complete(&gaudi->nic_cq_comp);
+
+	if (unlikely(stop_work))
+		goto out;
+
+out:
+	if (likely(cqe_cnt)) {
+		gaudi_nic->last_cqe_cnt = cqe_cnt;
+		delay = gaudi_nic->cq_delay;
+	} else {
+		ktime_t later;
+
+		/*
+		 * take base TS on the first polling invocation where no CQEs
+		 * were processed
+		 */
+		if (gaudi_nic->last_cqe_cnt) {
+			gaudi_nic->last_cqe_cnt = 0;
+			gaudi_nic->last_cqe_ts = ktime_get();
+		}
+
+		/* extend the delay if no CQEs were processed for 1 sec */
+		later = ktime_add_ms(gaudi_nic->last_cqe_ts, 1 * MSEC_PER_SEC);
+		if (ktime_compare(ktime_get(), later) > 0)
+			delay = gaudi_nic->cq_delay_idle;
+		else
+			delay = gaudi_nic->cq_delay;
+	}
+
+	queue_delayed_work(gaudi_nic->cq_wq, &gaudi_nic->cq_work, delay);
+}
+
+static int cq_update_consumed_cqes(struct hl_device *hdev,
+				struct hl_nic_cq_update_consumed_cqes_in *in)
+{
+	struct gaudi_device *gaudi = hdev->asic_specific;
+	u32 num_of_cqes;
+	int rc = 0;
+
+	if (!in) {
+		dev_err(hdev->dev,
+			"Missing parameters to update consumed CQEs\n");
+		return -EINVAL;
+	}
+
+	mutex_lock(&gaudi->nic_cq_user_lock);
+
+	if (!gaudi->nic_cq_enable) {
+		dev_err(hdev->dev,
+			"NIC CQ is not enabled, can't update user CI\n");
+		rc = -EFAULT;
+		goto out;
+	}
+
+	num_of_cqes = in->cq_num_of_consumed_entries;
+
+	if (atomic_read(&gaudi->nic_cq_user_new_cqes) < num_of_cqes) {
+		dev_err(hdev->dev,
+			"number of consumed CQEs is too big %d/%d\n",
+			num_of_cqes, atomic_read(&gaudi->nic_cq_user_new_cqes));
+		rc = -EINVAL;
+		goto out;
+	}
+
+	gaudi->nic_cq_user_ci = (gaudi->nic_cq_user_ci + num_of_cqes) &
+				(gaudi->nic_cq_user_num_of_entries - 1);
+
+	atomic_sub(num_of_cqes, &gaudi->nic_cq_user_new_cqes);
+
+#if HL_NIC_DEBUG
+	dev_dbg(hdev->dev, "consumed %d CQEs\n", num_of_cqes);
+	dev_dbg(hdev->dev, "user CQ CI: %d\n", gaudi->nic_cq_user_ci);
+#endif
+out:
+	mutex_unlock(&gaudi->nic_cq_user_lock);
+
+	return rc;
+}
+
+static int cq_poll_wait(struct hl_device *hdev,
+			struct hl_nic_cq_poll_wait_in *in,
+			struct hl_nic_cq_poll_wait_out *out,
+			bool do_wait)
+{
+	struct gaudi_device *gaudi = hdev->asic_specific;
+	char *op_str = do_wait ? "wait" : "poll";
+	u32 num_of_cqes;
+	bool has_work = false;
+	long rc_wait;
+	int rc = 0;
+
+	if (!in || !out) {
+		dev_err(hdev->dev, "Missing parameters to poll/wait on CQ\n");
+		return -EINVAL;
+	}
+
+	/* allow only one thread to wait */
+	mutex_lock(&gaudi->nic_cq_user_lock);
+
+	if (!gaudi->nic_cq_enable) {
+		dev_err(hdev->dev, "NIC CQ is not enabled, can't %s\n", op_str);
+		rc = -EFAULT;
+		goto out;
+	}
+
+	if (gaudi->nic_cq_status != HL_NIC_CQ_SUCCESS) {
+		dev_err(hdev->dev, "NIC CQ is not operational, can't %s\n",
+			op_str);
+		rc = -EFAULT;
+		goto out;
+	}
+
+#if HL_NIC_DEBUG
+	dev_dbg(hdev->dev, "ci: %d, wait: %d\n",
+		gaudi->nic_cq_user_ci, do_wait);
+#endif
+
+	if (do_wait) {
+		while (1) {
+			rc_wait = wait_for_completion_interruptible_timeout(
+					&gaudi->nic_cq_comp,
+					usecs_to_jiffies(in->timeout_us));
+
+			if (rc_wait == -ERESTARTSYS) {
+				dev_info(hdev->dev,
+						"stopping CQ %s due to signal\n",
+						op_str);
+				/* ERESTARTSYS is not returned to the user */
+				rc = -EINTR;
+				break;
+			}
+
+			if (!rc_wait) {
+				gaudi->nic_cq_status = HL_NIC_CQ_TIMEOUT;
+				break;
+			}
+
+			if (!gaudi->nic_cq_enable) {
+				dev_info(hdev->dev,
+						"stopping CQ %s upon request\n",
+						op_str);
+				rc = -EBUSY;
+				break;
+			}
+
+			if (gaudi->nic_cq_status != HL_NIC_CQ_SUCCESS)
+				break;
+
+			/*
+			 * A waiter can read 0 here.
+			 * Consider the following scenario:
+			 * 1. complete() is called twice for two CQEs.
+			 * 2. The first waiter grabs the two CQEs.
+			 * 3. The second waiter wakes up immediately and has no
+			 *    CQEs to handle.
+			 */
+			num_of_cqes = atomic_read(&gaudi->nic_cq_user_new_cqes);
+			if (num_of_cqes) {
+				has_work = true;
+				break;
+			}
+		}
+	} else {
+		has_work = try_wait_for_completion(&gaudi->nic_cq_comp);
+		if (has_work)
+			num_of_cqes = atomic_read(&gaudi->nic_cq_user_new_cqes);
+	}
+
+	if (rc)
+		goto out;
+
+	if (has_work) {
+		out->pi = gaudi->nic_cq_user_ci;
+		out->num_of_cqes = num_of_cqes;
+#if HL_NIC_DEBUG
+		dev_dbg(hdev->dev, "pulled %d CQEs\n", num_of_cqes);
+		dev_dbg(hdev->dev, "user CQ CI: %d\n", gaudi->nic_cq_user_ci);
+#endif
+	} else {
+		out->num_of_cqes = 0;
+	}
+
+	out->status = gaudi->nic_cq_status;
+
+	/* timeout is not a real error, CQ should stay operational */
+	if (gaudi->nic_cq_status == HL_NIC_CQ_TIMEOUT)
+		gaudi->nic_cq_status = HL_NIC_CQ_SUCCESS;
+out:
+	mutex_unlock(&gaudi->nic_cq_user_lock);
+
+	return rc;
+}
+
+static int cq_create(struct hl_device *hdev, struct hl_nic_cq_create_in *in,
+			struct hl_nic_cq_create_out *out)
+{
+	struct gaudi_device *gaudi = hdev->asic_specific;
+	struct gaudi_nic_device *gaudi_nic;
+	struct cqe *cq_arr;
+	int rc = 0, i, j;
+
+	if (!in || !out) {
+		dev_err(hdev->dev, "Missing parameters to create CQ\n");
+		return -EINVAL;
+	}
+
+	if (in->cq_num_of_entries < CQ_USER_MIN_ENTRIES) {
+		dev_err(hdev->dev, "NIC CQ buffer length must be at least %d entries\n",
+			CQ_USER_MIN_ENTRIES);
+		return -EINVAL;
+	}
+
+	if (!is_power_of_2(in->cq_num_of_entries)) {
+		dev_err(hdev->dev,
+			"NIC CQ buffer length must be a power of 2\n");
+		return -EINVAL;
+	}
+
+	if (in->cq_num_of_entries > CQ_USER_MAX_ENTRIES) {
+		dev_err(hdev->dev,
+			"NIC CQ buffer length must not be more than 0x%lx entries\n",
+			CQ_USER_MAX_ENTRIES);
+		return -EINVAL;
+	}
+
+	mutex_lock(&gaudi->nic_cq_user_lock);
+
+	if (gaudi->nic_cq_enable) {
+		dev_err(hdev->dev, "NIC CQ was already created\n");
+		rc = -EFAULT;
+		goto out;
+	}
+
+	gaudi->nic_cq_user_num_of_entries = in->cq_num_of_entries;
+	gaudi->nic_cq_buf = vmalloc_user(gaudi->nic_cq_user_num_of_entries *
+					sizeof(struct hl_nic_cqe));
+	if (!gaudi->nic_cq_buf) {
+		rc = -ENOMEM;
+		goto out;
+	}
+
+	init_completion(&gaudi->nic_cq_comp);
+	memset(gaudi->nic_cq_buf, 0,
+		gaudi->nic_cq_user_num_of_entries * sizeof(struct hl_nic_cqe));
+
+	spin_lock_init(&gaudi->nic_cq_lock);
+	gaudi->nic_cq_user_ci = 0;
+	gaudi->nic_cq_user_pi = 0;
+	atomic_set(&gaudi->nic_cq_user_new_cqes, 0);
+
+	for (i = 0 ; i < NIC_NUMBER_OF_PORTS ; i++) {
+		if (!(hdev->nic_ports_mask & BIT(i)) ||
+			!gaudi->nic_devices[i].port_open)
+			continue;
+
+		gaudi_nic = &gaudi->nic_devices[i];
+		gaudi_nic->cq_ci = gaudi_nic->last_cqe_cnt = 0;
+
+		NIC_WREG32(mmNIC0_RXE0_CQ_PRODUCER_INDEX, 0);
+		NIC_WREG32(mmNIC0_RXE0_CQ_CONSUMER_INDEX, 0);
+		NIC_WREG32(mmNIC0_RXE0_CQ_WRITE_INDEX, 0);
+
+		cq_arr = gaudi_nic->cq_mem_cpu;
+		for (j = 0 ; j < CQ_PORT_BUF_LEN ; j++)
+			CQE_SET_INVALID(&cq_arr[j]);
+
+	}
+
+	out->handle = HL_MMAP_TYPE_NIC_CQ << PAGE_SHIFT;
+	gaudi->nic_cq_status = HL_NIC_CQ_SUCCESS;
+	gaudi->nic_cq_enable = true;
+out:
+	mutex_unlock(&gaudi->nic_cq_user_lock);
+
+	return rc;
+}
+
+static void cq_stop(struct hl_device *hdev)
+{
+	struct gaudi_device *gaudi = hdev->asic_specific;
+
+	if (!gaudi->nic_cq_enable)
+		return;
+
+	/* if the CQ wait IOCTL is in progress, wake it up to return to US */
+	gaudi->nic_cq_enable = false;
+	/* make sure we disable the CQ before waking up the waiter */
+	mb();
+	complete(&gaudi->nic_cq_comp);
+
+	/* let the CQ wait IOCTL do cleanup gracefully */
+	msleep(100);
+}
+
+static int cq_destroy(struct hl_device *hdev)
+{
+	struct gaudi_device *gaudi = hdev->asic_specific;
+	int rc = 0;
+
+	mutex_lock(&gaudi->nic_cq_user_lock);
+
+	if (!gaudi->nic_cq_enable)
+		goto out;
+
+	if (gaudi->nic_cq_mmap) {
+		dev_err(hdev->dev, "NIC CQ is still mapped, can't destroy\n");
+		rc = -EFAULT;
+		goto out;
+	}
+
+	/*
+	 * mark the CQ as disabled while holding the NIC QP error lock to avoid
+	 * pushing QP error entries to a CQ under destruction
+	 */
+	mutex_lock(&gaudi->nic_qp_err_lock);
+	gaudi->nic_cq_enable = false;
+	mutex_unlock(&gaudi->nic_qp_err_lock);
+
+	/* make sure we disable the CQ before draining the polling threads */
+	mb();
+
+	/*
+	 * Wait for the polling threads to digest the new CQ state, so that the
+	 * user buffer is freed only after they have stopped processing CQEs
+	 * and copying them to the buffer.
+	 */
+	msleep(100);
+
+	vfree(gaudi->nic_cq_buf);
+out:
+	mutex_unlock(&gaudi->nic_cq_user_lock);
+
+	return rc;
+}
 
 /* used for physically contiguous memory only */
 static int map_nic_mem(struct hl_device *hdev, u64 va, dma_addr_t pa, u32 size)
@@ -1928,6 +2388,8 @@ static int port_open(struct gaudi_nic_device *gaudi_nic)
 		goto cq_unmap;
 	}
 
+	INIT_DELAYED_WORK(&gaudi_nic->cq_work, cq_work);
+
 	if ((hdev->pdev) && (gaudi->multi_msi_mode)) {
 		rx_irq = pci_irq_vector(hdev->pdev, RX_MSI_IDX + port);
 
@@ -1970,6 +2432,9 @@ static int port_open(struct gaudi_nic_device *gaudi_nic)
 		napi_enable(&gaudi_nic->napi);
 	}
 
+	queue_delayed_work(gaudi_nic->cq_wq, &gaudi_nic->cq_work,
+				gaudi_nic->cq_delay_idle);
+
 	if (gaudi->nic_phy_config_fw && !gaudi_nic->mac_loopback) {
 		INIT_DELAYED_WORK(&gaudi_nic->link_status_work,
 					check_link_status);
@@ -2070,6 +2535,8 @@ static void port_close(struct gaudi_nic_device *gaudi_nic)
 
 	netif_carrier_off(gaudi_nic->ndev);
 
+	cancel_delayed_work_sync(&gaudi_nic->cq_work);
+
 	flush_workqueue(gaudi_nic->cq_wq);
 	destroy_workqueue(gaudi_nic->cq_wq);
 
@@ -2330,6 +2797,31 @@ static void port_unregister(struct gaudi_nic_device *gaudi_nic)
 
 irqreturn_t gaudi_nic_cq_irq_handler(int irq, void *arg)
 {
+	struct hl_device *hdev = arg;
+	struct gaudi_device *gaudi = hdev->asic_specific;
+	struct gaudi_nic_device *gaudi_nic;
+	int i;
+
+	/* one IRQ for all ports, need to iterate and read the cause */
+	for (i = 0 ; i < NIC_NUMBER_OF_PORTS ; i++) {
+		if (!(hdev->nic_ports_mask & BIT(i)))
+			continue;
+
+		gaudi_nic = &gaudi->nic_devices[i];
+
+		if (disabled_or_in_reset(gaudi_nic))
+			continue;
+
+		if (NIC_RREG32(mmNIC0_RXE0_MSI_CAUSE) & 2) {
+			dev_crit(hdev->dev, "NIC CQ overrun, port %d\n",
+					gaudi_nic->port);
+			NIC_WREG32(mmNIC0_RXE0_MSI_CAUSE, 0);
+			NIC_WREG32(mmNIC0_RXE0_CQ_MSI_CAUSE_CLR, 0xFFFF);
+			/* flush the cause clear */
+			NIC_RREG32(mmNIC0_RXE0_CQ_MSI_CAUSE_CLR);
+		}
+	}
+
 	return IRQ_HANDLED;
 }
 
@@ -2609,6 +3101,8 @@ int gaudi_nic_hard_reset_prepare(struct hl_device *hdev)
 			(gaudi->nic_in_reset))
 		return 0;
 
+	cq_stop(hdev);
+
 	for (i = 0 ; i < NIC_NUMBER_OF_PORTS ; i++) {
 		if (!(hdev->nic_ports_mask & BIT(i)))
 			continue;
@@ -3131,6 +3625,21 @@ int gaudi_nic_control(struct hl_device *hdev, u32 op, void *input, void *output)
 	case HL_NIC_OP_DESTROY_CONN:
 		rc = destroy_conn(hdev, input);
 		break;
+	case HL_NIC_OP_CQ_CREATE:
+		rc = cq_create(hdev, input, output);
+		break;
+	case HL_NIC_OP_CQ_DESTROY:
+		rc = cq_destroy(hdev);
+		break;
+	case HL_NIC_OP_CQ_WAIT:
+		rc = cq_poll_wait(hdev, input, output, true);
+		break;
+	case HL_NIC_OP_CQ_POLL:
+		rc = cq_poll_wait(hdev, input, output, false);
+		break;
+	case HL_NIC_OP_CQ_UPDATE_CONSUMED_CQES:
+		rc = cq_update_consumed_cqes(hdev, input);
+		break;
 	default:
 		dev_err(hdev->dev, "Invalid NIC control request %d\n", op);
 		return -ENOTTY;
@@ -3181,4 +3690,87 @@ void gaudi_nic_ctx_fini(struct hl_ctx *ctx)
 	qps_destroy(hdev);
 	/* wait for the NIC to digest the invalid QPs */
 	msleep(20);
+	cq_destroy(hdev);
+}
+
+static void nic_cq_vm_close(struct vm_area_struct *vma)
+{
+	struct hl_device *hdev = (struct hl_device *) vma->vm_private_data;
+	struct gaudi_device *gaudi = hdev->asic_specific;
+	long new_mmap_size;
+
+	new_mmap_size = gaudi->nic_cq_mmap_size - (vma->vm_end - vma->vm_start);
+
+	dev_dbg(hdev->dev, "munmap NIC CQEs buffer, new_mmap_size: %ld\n",
+		new_mmap_size);
+
+	if (new_mmap_size > 0) {
+		gaudi->nic_cq_mmap_size = new_mmap_size;
+		return;
+	}
+
+	vma->vm_private_data = NULL;
+	gaudi->nic_cq_mmap = false;
+}
+
+static const struct vm_operations_struct nic_cq_vm_ops = {
+	.close = nic_cq_vm_close
+};
+
+int gaudi_nic_cq_mmap(struct hl_device *hdev, struct vm_area_struct *vma)
+{
+	struct gaudi_device *gaudi = hdev->asic_specific;
+	u32 size;
+	int rc;
+
+	if (!(gaudi->hw_cap_initialized & HW_CAP_NIC_DRV))
+		return -EFAULT;
+
+	mutex_lock(&gaudi->nic_cq_user_lock);
+
+	if (!gaudi->nic_cq_enable) {
+		dev_err(hdev->dev, "NIC CQ is disabled, can't mmap\n");
+		rc = -EFAULT;
+		goto out;
+	}
+
+	if (gaudi->nic_cq_mmap) {
+		dev_err(hdev->dev, "NIC CQ is already mmapped, can't mmap\n");
+		rc = -EFAULT;
+		goto out;
+	}
+
+	size = gaudi->nic_cq_user_num_of_entries * sizeof(struct hl_nic_cqe);
+
+	dev_dbg(hdev->dev, "mmap NIC CQ buffer, size: 0x%x\n", size);
+
+	/* Validation check */
+	if ((vma->vm_end - vma->vm_start) != ALIGN(size, PAGE_SIZE)) {
+		dev_err(hdev->dev,
+			"NIC mmap failed, mmap size 0x%lx != 0x%x CQ buffer size\n",
+			vma->vm_end - vma->vm_start, size);
+		rc = -EINVAL;
+		goto out;
+	}
+
+	vma->vm_ops = &nic_cq_vm_ops;
+	vma->vm_private_data = hdev;
+
+	dev_dbg(hdev->dev, "mapping NIC CQ buffer\n");
+
+	vma->vm_flags |= VM_DONTEXPAND | VM_DONTDUMP | VM_DONTCOPY |
+			VM_NORESERVE;
+
+	rc = remap_vmalloc_range(vma, gaudi->nic_cq_buf, 0);
+	if (rc) {
+		dev_err(hdev->dev, "failed to map the NIC CQ buffer\n");
+		goto out;
+	}
+
+	gaudi->nic_cq_mmap_size = size;
+	gaudi->nic_cq_mmap = true;
+out:
+	mutex_unlock(&gaudi->nic_cq_user_lock);
+
+	return rc;
 }
diff --git a/drivers/misc/habanalabs/goya/goya.c b/drivers/misc/habanalabs/goya/goya.c
index 151f886cd7c4..6e98c830f6a2 100644
--- a/drivers/misc/habanalabs/goya/goya.c
+++ b/drivers/misc/habanalabs/goya/goya.c
@@ -5273,6 +5273,13 @@ static int goya_nic_control(struct hl_device *hdev, u32 op, void *input,
 	return -ENXIO;
 }
 
+static int goya_nic_mmap(struct hl_device *hdev, struct vm_area_struct *vma)
+{
+	dev_err_ratelimited(hdev->dev,
+			"NIC mmap operations cannot be performed on Goya\n");
+	return -ENXIO;
+}
+
 static int goya_get_mac_addr(struct hl_device *hdev,
 			struct hl_info_mac_addr *mac_addr)
 {
@@ -5399,6 +5406,7 @@ static const struct hl_asic_funcs goya_funcs = {
 	.send_cpu_message = goya_send_cpu_message,
 	.get_hw_state = goya_get_hw_state,
 	.nic_control = goya_nic_control,
+	.nic_cq_mmap = goya_nic_mmap,
 	.pci_bars_map = goya_pci_bars_map,
 	.init_iatu = goya_init_iatu,
 	.get_mac_addr = goya_get_mac_addr,
diff --git a/include/uapi/misc/habanalabs.h b/include/uapi/misc/habanalabs.h
index 227bc7c98e08..840f31a18209 100644
--- a/include/uapi/misc/habanalabs.h
+++ b/include/uapi/misc/habanalabs.h
@@ -848,6 +848,46 @@ struct hl_debug_args {
 #define HL_NIC_MIN_CONN_ID	1
 #define HL_NIC_MAX_CONN_ID	1023
 
+/* Requester */
+#define HL_NIC_CQE_TYPE_REQ	0
+/* Responder */
+#define HL_NIC_CQE_TYPE_RES	1
+
+/**
+ * struct hl_nic_cqe: NIC CQ entry. This structure is shared between the driver
+ *                    and the user application. It represents each entry of the
+ *                    NIC CQ buffer.
+ * @requester.wqe_index: work queue index - for requester only.
+ * @responder.msg_id: message ID to notify which receive action was completed -
+ *                    for responder only.
+ * @qp_err.syndrome: error syndrome of the QP error - for QP error only.
+ * @port: NIC port index of the related CQ.
+ * @qp_number: QP number - for requester or QP error only.
+ * @type: type of the CQE - requester or responder.
+ * @is_err: true for QP error entry, false otherwise.
+ */
+struct hl_nic_cqe {
+	union {
+		struct {
+			__u32 wqe_index;
+		} requester;
+
+		struct {
+			__u32 msg_id;
+		} responder;
+
+		struct {
+			__u32 syndrome;
+		} qp_err;
+	};
+
+	__u32 port;
+	__u32 qp_number;
+	__u8 type;
+	__u8 is_err;
+	__u8 pad[2];
+};
+
 struct hl_nic_alloc_conn_in {
 	/* NIC port ID */
 	__u32 port;
@@ -934,6 +974,53 @@ struct hl_nic_destroy_conn_in {
 	__u32 conn_id;
 };
 
+struct hl_nic_cq_create_in {
+	/* Number of entries in the CQ buffer */
+	__u32 cq_num_of_entries;
+	__u32 pad;
+};
+
+struct hl_nic_cq_create_out {
+	/* Handle of the CQ buffer */
+	__u64 handle;
+};
+
+struct hl_nic_cq_destroy_in {
+	/* Handle of the CQ buffer */
+	__u64 handle;
+};
+
+struct hl_nic_cq_update_consumed_cqes_in {
+	/* Handle of the CQ buffer */
+	__u64 handle;
+	/* Number of consumed CQEs */
+	__u32 cq_num_of_consumed_entries;
+	__u32 pad;
+};
+
+struct hl_nic_cq_poll_wait_in {
+	/* Handle of the CQ buffer */
+	__u64 handle;
+	/* Timeout to wait, in microseconds (relative duration) */
+	__u64 timeout_us;
+};
+
+enum hl_nic_cq_status {
+	HL_NIC_CQ_SUCCESS,
+	HL_NIC_CQ_TIMEOUT,
+	HL_NIC_CQ_OVERFLOW
+};
+
+struct hl_nic_cq_poll_wait_out {
+	/* CQE producer index - first CQE to consume */
+	__u32 pi;
+	/* Number of CQEs to consume, starting from pi */
+	__u32 num_of_cqes;
+	/* Return status */
+	__u32 status;
+	__u32 pad;
+};
+
 /* Opcode to allocate connection ID */
 #define HL_NIC_OP_ALLOC_CONN			0
 /* Opcode to set up a requester connection context */
@@ -942,6 +1029,16 @@ struct hl_nic_destroy_conn_in {
 #define HL_NIC_OP_SET_RES_CONN_CTX		2
 /* Opcode to destroy a connection */
 #define HL_NIC_OP_DESTROY_CONN			3
+/* Opcode to create a CQ */
+#define HL_NIC_OP_CQ_CREATE			4
+/* Opcode to destroy a CQ */
+#define HL_NIC_OP_CQ_DESTROY			5
+/* Opcode to wait on CQ */
+#define HL_NIC_OP_CQ_WAIT			6
+/* Opcode to poll on CQ */
+#define HL_NIC_OP_CQ_POLL			7
+/* Opcode to update the number of consumed CQ entries */
+#define HL_NIC_OP_CQ_UPDATE_CONSUMED_CQES	8
 
 struct hl_nic_args {
 	/* Pointer to user input structure (relevant to specific opcodes) */
@@ -1123,10 +1220,24 @@ struct hl_nic_args {
  * - Set up a requester connection context
  * - Set up a responder connection context
  * - Destroy a connection
+ * - Create a completion queue
+ * - Destroy a completion queue
+ * - Wait on completion queue
+ * - Poll a completion queue
+ * - Update consumed completion queue entries
  *
  * For all operations, the user should provide a pointer to an input structure
  * with the context parameters. Some of the operations also require a pointer to
  * an output structure for result/status.
+ * The CQ create operation returns a handle which the user-space process needs
+ * to use to mmap the CQ buffer in order to access the CQ entries.
+ * This handle should be provided when destroying the CQ.
+ * The poll/wait CQ operations return the number of available CQ entries of type
+ * struct hl_nic_cqe.
+ * Since the CQ is a cyclic buffer, the user-space process needs to inform the
+ * driver regarding how many of the available CQEs were actually
+ * processed/consumed. Only then will the driver overwrite them with newer
+ * entries.
  *
  */
 #define HL_IOCTL_NIC	_IOWR('H', 0x07, struct hl_nic_args)
-- 
2.17.1



* [PATCH 10/15] habanalabs/gaudi: add WQ control operations
  2020-09-10 16:11 [PATCH 00/15] Adding GAUDI NIC code to habanalabs driver Oded Gabbay
                   ` (7 preceding siblings ...)
  2020-09-10 16:11 ` [PATCH 09/15] habanalabs/gaudi: add CQ " Oded Gabbay
@ 2020-09-10 16:11 ` Oded Gabbay
  2020-09-10 16:11 ` [PATCH 11/15] habanalabs/gaudi: add QP error handling Oded Gabbay
                   ` (5 subsequent siblings)
  14 siblings, 0 replies; 44+ messages in thread
From: Oded Gabbay @ 2020-09-10 16:11 UTC (permalink / raw)
  To: linux-kernel, netdev; +Cc: SW_Drivers, gregkh, davem, kuba, Omer Shpigelman

From: Omer Shpigelman <oshpigelman@habana.ai>

Add Work Queue (WQ) opcodes to NIC ioctl. A WQ contains entries (WQEs)
where each WQE represents a packet that should be sent or received.

Each WQ is one of two types: requester (sender) or responder (receiver).

The added opcodes are:
- Set WQ: set the WQ configuration in the HW. The user should provide the
          device virtual address of the WQ.
- Unset WQ: reset the WQ configuration in the HW.
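
As an illustration only, the parameter checks applied by the Set WQ
opcode and the send-side register encoding can be sketched in standalone
C. EX_WQES_MAX_NUM and EX_CACHE_LINE_SIZE are placeholder values assumed
for this sketch; the real USER_WQES_MAX_NUM and DEVICE_CACHE_LINE_SIZE
are defined elsewhere in the driver. All helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Placeholder limits, assumed for this sketch only */
#define EX_WQES_MAX_NUM		0x8000u
#define EX_CACHE_LINE_SIZE	128u

static int is_pow2(uint32_t v)
{
	return v && !(v & (v - 1));
}

/* Integer log2 for a non-zero value */
static uint32_t log2_u32(uint32_t v)
{
	uint32_t l = 0;

	while (v >>= 1)
		l++;
	return l;
}

/* Same order of checks as the patch: power of 2, minimum, maximum, alignment */
static int wq_params_check(uint64_t addr, uint32_t num_entries)
{
	if (!is_pow2(num_entries))
		return -1;
	if (num_entries < 4)
		return -1;
	if (num_entries > EX_WQES_MAX_NUM)
		return -1;
	if (addr & (EX_CACHE_LINE_SIZE - 1))
		return -1;
	return 0;
}

/* Send side: the size register holds log2(entries) - 2 */
static uint32_t wq_send_log_size(uint32_t num_entries)
{
	assert(is_pow2(num_entries) && num_entries >= 4);
	return log2_u32(num_entries) - 2;
}

/* Send side: low 32 bits of the WQ base address */
static uint32_t wq_send_base_lo(uint64_t addr)
{
	return (uint32_t)(addr & 0xFFFFFFFFu);
}

/* Send side: upper address bits, masked with 0x3FFFFF as in the patch */
static uint32_t wq_send_base_hi(uint64_t addr)
{
	return (uint32_t)((addr >> 32) & 0x3FFFFFu);
}
```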

Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
---
 .../misc/habanalabs/common/habanalabs_ioctl.c |  10 +-
 drivers/misc/habanalabs/gaudi/gaudi_nic.c     | 184 ++++++++++++++++++
 include/uapi/misc/habanalabs.h                |  33 ++++
 3 files changed, 225 insertions(+), 2 deletions(-)

diff --git a/drivers/misc/habanalabs/common/habanalabs_ioctl.c b/drivers/misc/habanalabs/common/habanalabs_ioctl.c
index 6947ef519872..ad6dab5344f9 100644
--- a/drivers/misc/habanalabs/common/habanalabs_ioctl.c
+++ b/drivers/misc/habanalabs/common/habanalabs_ioctl.c
@@ -24,7 +24,7 @@ static u32 hl_debug_struct_size[HL_DEBUG_OP_TIMESTAMP + 1] = {
 
 };
 
-static u32 hl_nic_input_size[HL_NIC_OP_CQ_UPDATE_CONSUMED_CQES + 1] = {
+static u32 hl_nic_input_size[HL_NIC_OP_USER_WQ_UNSET + 1] = {
 	[HL_NIC_OP_ALLOC_CONN] = sizeof(struct hl_nic_alloc_conn_in),
 	[HL_NIC_OP_SET_REQ_CONN_CTX] = sizeof(struct hl_nic_req_conn_ctx_in),
 	[HL_NIC_OP_SET_RES_CONN_CTX] = sizeof(struct hl_nic_res_conn_ctx_in),
@@ -35,9 +35,11 @@ static u32 hl_nic_input_size[HL_NIC_OP_CQ_UPDATE_CONSUMED_CQES + 1] = {
 	[HL_NIC_OP_CQ_POLL] = sizeof(struct hl_nic_cq_poll_wait_in),
 	[HL_NIC_OP_CQ_UPDATE_CONSUMED_CQES] =
 			sizeof(struct hl_nic_cq_update_consumed_cqes_in),
+	[HL_NIC_OP_USER_WQ_SET] = sizeof(struct hl_nic_user_wq_arr_set_in),
+	[HL_NIC_OP_USER_WQ_UNSET] = sizeof(struct hl_nic_user_wq_arr_unset_in)
 };
 
-static u32 hl_nic_output_size[HL_NIC_OP_CQ_UPDATE_CONSUMED_CQES + 1] = {
+static u32 hl_nic_output_size[HL_NIC_OP_USER_WQ_UNSET + 1] = {
 	[HL_NIC_OP_ALLOC_CONN] = sizeof(struct hl_nic_alloc_conn_out),
 	[HL_NIC_OP_SET_REQ_CONN_CTX] = 0,
 	[HL_NIC_OP_SET_RES_CONN_CTX] = 0,
@@ -47,6 +49,8 @@ static u32 hl_nic_output_size[HL_NIC_OP_CQ_UPDATE_CONSUMED_CQES + 1] = {
 	[HL_NIC_OP_CQ_WAIT] = sizeof(struct hl_nic_cq_poll_wait_out),
 	[HL_NIC_OP_CQ_POLL] = sizeof(struct hl_nic_cq_poll_wait_out),
 	[HL_NIC_OP_CQ_UPDATE_CONSUMED_CQES] = 0,
+	[HL_NIC_OP_USER_WQ_SET] = 0,
+	[HL_NIC_OP_USER_WQ_UNSET] = 0
 };
 
 static int device_status_info(struct hl_device *hdev, struct hl_info_args *args)
@@ -641,6 +645,8 @@ static int hl_nic_ioctl(struct hl_fpriv *hpriv, void *data)
 	case HL_NIC_OP_CQ_WAIT:
 	case HL_NIC_OP_CQ_POLL:
 	case HL_NIC_OP_CQ_UPDATE_CONSUMED_CQES:
+	case HL_NIC_OP_USER_WQ_SET:
+	case HL_NIC_OP_USER_WQ_UNSET:
 		args->input_size =
 			min(args->input_size, hl_nic_input_size[args->op]);
 		args->output_size =
diff --git a/drivers/misc/habanalabs/gaudi/gaudi_nic.c b/drivers/misc/habanalabs/gaudi/gaudi_nic.c
index 0583b34a728f..8f6585c700cf 100644
--- a/drivers/misc/habanalabs/gaudi/gaudi_nic.c
+++ b/drivers/misc/habanalabs/gaudi/gaudi_nic.c
@@ -3268,6 +3268,170 @@ int gaudi_nic_get_mac_addr(struct hl_device *hdev,
 	return 0;
 }
 
+static int wq_port_check(struct hl_device *hdev, u32 port)
+{
+	if (port >= NIC_NUMBER_OF_ENGINES) {
+		dev_err(hdev->dev, "Invalid port %d\n", port);
+		return -EINVAL;
+	}
+
+	if (!(hdev->nic_ports_mask & BIT(port))) {
+		dev_err(hdev->dev, "Port %d is disabled\n", port);
+		return -ENODEV;
+	}
+
+	return 0;
+}
+
+static int user_wq_arr_set(struct hl_device *hdev,
+				struct hl_nic_user_wq_arr_set_in *in)
+{
+	struct gaudi_device *gaudi = hdev->asic_specific;
+	struct gaudi_nic_device *gaudi_nic;
+	u64 wq_base_addr, num_of_wq_entries_log;
+	u32 port, type;
+	int rc;
+
+	if (!in) {
+		dev_err(hdev->dev, "missing parameters, can't set user WQ\n");
+		return -EINVAL;
+	}
+
+	type = in->type;
+	if (type != HL_NIC_USER_WQ_SEND && type != HL_NIC_USER_WQ_RECV) {
+		dev_err(hdev->dev, "invalid type %d, can't set user WQ\n",
+			type);
+		return -EINVAL;
+	}
+
+	port = in->port;
+
+	rc = wq_port_check(hdev, port);
+	if (rc)
+		return rc;
+
+	gaudi_nic = &gaudi->nic_devices[port];
+
+	if (in->num_of_wqs == 0) {
+		dev_err(hdev->dev,
+			"number of WQs must be bigger than zero, port: %d\n",
+			port);
+		return -EINVAL;
+	}
+
+	/* H/W limitation */
+	if (in->num_of_wqs > NIC_HW_MAX_QP_NUM) {
+		dev_err(hdev->dev,
+			"number of WQs (0x%x) can't be bigger than 0x%x, port: %d\n",
+			in->num_of_wqs, NIC_HW_MAX_QP_NUM, port);
+		return -EINVAL;
+	}
+
+	if (!is_power_of_2(in->num_of_wq_entries)) {
+		dev_err(hdev->dev,
+			"number of entries (0x%x) must be a power of 2, port: %d\n",
+			in->num_of_wq_entries, port);
+		return -EINVAL;
+	}
+
+	/* H/W cache line constraint */
+	if (in->num_of_wq_entries < 4) {
+		dev_err(hdev->dev,
+			"number of entries (0x%x) must be at least 4, port: %d\n",
+			in->num_of_wq_entries, port);
+		return -EINVAL;
+	}
+
+	/* H/W limitation */
+	if (in->num_of_wq_entries > USER_WQES_MAX_NUM) {
+		dev_err(hdev->dev,
+			"number of entries (0x%x) can't be bigger than 0x%x, port: %d\n",
+			in->num_of_wq_entries, USER_WQES_MAX_NUM, port);
+		return -EINVAL;
+	}
+
+	if (!IS_ALIGNED(in->addr, DEVICE_CACHE_LINE_SIZE)) {
+		dev_err(hdev->dev,
+			"WQ VA (0x%llx) must be aligned to cache line size (0x%x), port: %d\n",
+			in->addr, DEVICE_CACHE_LINE_SIZE, port);
+		return -EINVAL;
+	}
+
+	wq_base_addr = in->addr;
+	num_of_wq_entries_log = ilog2(in->num_of_wq_entries);
+
+	mutex_lock(&gaudi_nic->user_wq_lock);
+
+	if (type == HL_NIC_USER_WQ_SEND) {
+		NIC_WREG32(mmNIC0_TXE0_SQ_BASE_ADDRESS_49_32_0,
+				(wq_base_addr >> 32) & 0x3FFFFF);
+		NIC_WREG32(mmNIC0_TXE0_SQ_BASE_ADDRESS_31_0_0,
+				wq_base_addr & 0xFFFFFFFF);
+		NIC_WREG32(mmNIC0_TXE0_LOG_MAX_WQ_SIZE_0,
+				num_of_wq_entries_log - 2);
+	} else {
+		NIC_WREG32(mmNIC0_RXE0_WIN0_WQ_BASE_LO,
+				wq_base_addr & 0xFFFFFFFF);
+		NIC_WREG32(mmNIC0_RXE0_WIN0_WQ_BASE_HI,
+			((wq_base_addr >> 32) & 0xFFFFFFFF) |
+			((num_of_wq_entries_log - 4) << 24));
+	}
+
+	mutex_unlock(&gaudi_nic->user_wq_lock);
+
+	return 0;
+}
+
+static void _user_wq_arr_unset(struct hl_device *hdev, u32 port, u32 type)
+{
+	struct gaudi_device *gaudi = hdev->asic_specific;
+	struct gaudi_nic_device *gaudi_nic;
+
+	gaudi_nic = &gaudi->nic_devices[port];
+
+	mutex_lock(&gaudi_nic->user_wq_lock);
+
+	if (type == HL_NIC_USER_WQ_SEND) {
+		NIC_WREG32(mmNIC0_TXE0_SQ_BASE_ADDRESS_49_32_0, 0);
+		NIC_WREG32(mmNIC0_TXE0_SQ_BASE_ADDRESS_31_0_0, 0);
+		NIC_WREG32(mmNIC0_TXE0_LOG_MAX_WQ_SIZE_0, 0);
+	} else {
+		NIC_WREG32(mmNIC0_RXE0_WIN0_WQ_BASE_LO, 0);
+		NIC_WREG32(mmNIC0_RXE0_WIN0_WQ_BASE_HI, 0);
+	}
+
+	mutex_unlock(&gaudi_nic->user_wq_lock);
+}
+
+static int user_wq_arr_unset(struct hl_device *hdev,
+				struct hl_nic_user_wq_arr_unset_in *in)
+{
+	u32 port, type;
+	int rc;
+
+	if (!in) {
+		dev_err(hdev->dev, "missing parameters, can't unset user WQ\n");
+		return -EINVAL;
+	}
+
+	type = in->type;
+	if (type != HL_NIC_USER_WQ_SEND && type != HL_NIC_USER_WQ_RECV) {
+		dev_err(hdev->dev, "invalid type %d, can't unset user WQ\n",
+			type);
+		return -EINVAL;
+	}
+
+	port = in->port;
+
+	rc = wq_port_check(hdev, port);
+	if (rc)
+		return rc;
+
+	_user_wq_arr_unset(hdev, port, type);
+
+	return 0;
+}
+
 static struct hl_qp *qp_get(struct hl_device *hdev,
 			struct gaudi_nic_device *gaudi_nic, u32 conn_id)
 {
@@ -3640,6 +3804,12 @@ int gaudi_nic_control(struct hl_device *hdev, u32 op, void *input, void *output)
 	case HL_NIC_OP_CQ_UPDATE_CONSUMED_CQES:
 		rc = cq_update_consumed_cqes(hdev, input);
 		break;
+	case HL_NIC_OP_USER_WQ_SET:
+		rc = user_wq_arr_set(hdev, input);
+		break;
+	case HL_NIC_OP_USER_WQ_UNSET:
+		rc = user_wq_arr_unset(hdev, input);
+		break;
 	default:
 		dev_err(hdev->dev, "Invalid NIC control request %d\n", op);
 		return -ENOTTY;
@@ -3679,6 +3849,19 @@ static void qps_destroy(struct hl_device *hdev)
 	}
 }
 
+static void wq_arrs_destroy(struct hl_device *hdev)
+{
+	int i;
+
+	for (i = 0 ; i < NIC_NUMBER_OF_PORTS ; i++) {
+		if (!(hdev->nic_ports_mask & BIT(i)))
+			continue;
+
+		_user_wq_arr_unset(hdev, i, HL_NIC_USER_WQ_SEND);
+		_user_wq_arr_unset(hdev, i, HL_NIC_USER_WQ_RECV);
+	}
+}
+
 void gaudi_nic_ctx_fini(struct hl_ctx *ctx)
 {
 	struct hl_device *hdev = ctx->hdev;
@@ -3691,6 +3874,7 @@ void gaudi_nic_ctx_fini(struct hl_ctx *ctx)
 	/* wait for the NIC to digest the invalid QPs */
 	msleep(20);
 	cq_destroy(hdev);
+	wq_arrs_destroy(hdev);
 }
 
 static void nic_cq_vm_close(struct vm_area_struct *vma)
diff --git a/include/uapi/misc/habanalabs.h b/include/uapi/misc/habanalabs.h
index 840f31a18209..5678fda2fddc 100644
--- a/include/uapi/misc/habanalabs.h
+++ b/include/uapi/misc/habanalabs.h
@@ -1021,6 +1021,31 @@ struct hl_nic_cq_poll_wait_out {
 	__u32 pad;
 };
 
+/* Send user WQ array type */
+#define HL_NIC_USER_WQ_SEND	0
+/* Receive user WQ array type */
+#define HL_NIC_USER_WQ_RECV	1
+
+struct hl_nic_user_wq_arr_set_in {
+	/* WQ array address */
+	__u64 addr;
+	/* NIC port ID */
+	__u32 port;
+	/* Number of user WQs */
+	__u32 num_of_wqs;
+	/* Number of entries per user WQ */
+	__u32 num_of_wq_entries;
+	/* Type of user WQ array */
+	__u32 type;
+};
+
+struct hl_nic_user_wq_arr_unset_in {
+	/* NIC port ID */
+	__u32 port;
+	/* Type of user WQ array */
+	__u32 type;
+};
+
 /* Opcode to allocate connection ID */
 #define HL_NIC_OP_ALLOC_CONN			0
 /* Opcode to set up a requester connection context */
@@ -1039,6 +1064,10 @@ struct hl_nic_cq_poll_wait_out {
 #define HL_NIC_OP_CQ_POLL			7
 /* Opcode to update the number of consumed CQ entries */
 #define HL_NIC_OP_CQ_UPDATE_CONSUMED_CQES	8
+/* Opcode to set a user WQ array */
+#define HL_NIC_OP_USER_WQ_SET			9
+/* Opcode to unset a user WQ array */
+#define HL_NIC_OP_USER_WQ_UNSET			10
 
 struct hl_nic_args {
 	/* Pointer to user input structure (relevant to specific opcodes) */
@@ -1225,6 +1254,8 @@ struct hl_nic_args {
  * - Wait on completion queue
  * - Poll a completion queue
  * - Update consumed completion queue entries
+ * - Set a work queue
+ * - Unset a work queue
  *
  * For all operations, the user should provide a pointer to an input structure
  * with the context parameters. Some of the operations also require a pointer to
@@ -1238,6 +1269,8 @@ struct hl_nic_args {
  * driver regarding how many of the available CQEs were actually
  * processed/consumed. Only then the driver will override them with newer
  * entries.
+ * The set WQ operation should provide the device virtual address of the WQ with
+ * a matching size for the number of WQs and entries per WQ.
  *
  */
 #define HL_IOCTL_NIC	_IOWR('H', 0x07, struct hl_nic_args)
-- 
2.17.1
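
For readers following the WQ set path in the patch above, here is a minimal
user-space sketch of the same parameter checks (minimum entries, maximum
entries, address alignment) and of the log2 size the driver derives from the
entry count. The SKETCH_* limits and all sketch_* names are hypothetical and
chosen only for illustration; the real USER_WQES_MAX_NUM and
DEVICE_CACHE_LINE_SIZE values live in the driver headers and are not shown in
this mail. The power-of-two helper is an extra that the patch itself does not
perform.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical limits, for illustration only. */
#define SKETCH_WQES_MAX_NUM	(1u << 16)
#define SKETCH_CACHE_LINE_SIZE	128u

/* Integer log2 of a non-zero value, rounding down like the kernel's ilog2(). */
static unsigned int sketch_ilog2(uint32_t v)
{
	unsigned int log = 0;

	while (v >>= 1)
		log++;

	return log;
}

/* True only for exact powers of two (extra check, not in the patch). */
static bool sketch_is_pow2(uint32_t v)
{
	return v && !(v & (v - 1));
}

/* Same three checks as the patch: minimum, maximum and alignment. */
static bool sketch_wq_params_valid(uint64_t addr, uint32_t num_entries)
{
	if (num_entries < 4)
		return false;

	if (num_entries > SKETCH_WQES_MAX_NUM)
		return false;

	if (addr & (SKETCH_CACHE_LINE_SIZE - 1))
		return false;

	return true;
}
```

Because the log2 rounds down, an entry count that is not a power of two is
encoded as the next lower power of two, which is why a caller may want the
extra sketch_is_pow2() check before submitting the request.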



* [PATCH 11/15] habanalabs/gaudi: add QP error handling
  2020-09-10 16:11 [PATCH 00/15] Adding GAUDI NIC code to habanalabs driver Oded Gabbay
                   ` (8 preceding siblings ...)
  2020-09-10 16:11 ` [PATCH 10/15] habanalabs/gaudi: add WQ " Oded Gabbay
@ 2020-09-10 16:11 ` Oded Gabbay
  2020-09-10 16:11 ` [PATCH 12/15] habanalabs/gaudi: add debugfs entries for the NIC Oded Gabbay
                   ` (4 subsequent siblings)
  14 siblings, 0 replies; 44+ messages in thread
From: Oded Gabbay @ 2020-09-10 16:11 UTC (permalink / raw)
  To: linux-kernel, netdev; +Cc: SW_Drivers, gregkh, davem, kuba, Omer Shpigelman

From: Omer Shpigelman <oshpigelman@habana.ai>

Add Queue Pair (QP) error notification to the user, e.g. security violation,
too many retransmissions, invalid QP, etc.

Whenever a QP causes an error, the firmware sends an event to the driver,
which pushes the error as an error entry to the Completion Queue (if one
exists).
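
The flow described above amounts to draining a ring of error entries between
a consumer index kept by software and a producer index reported by hardware.
A minimal user-space sketch of that loop follows. All sketch_* names and the
buffer length are hypothetical. Note one deliberate difference from the patch:
the patch loops while ci < pi, whereas this sketch loops while ci != pi so
that a producer index that has already wrapped is still drained. Whether the
hardware producer index wraps this way is an assumption.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical ring length; must be a power of two for the mask to work. */
#define SKETCH_ERR_BUF_LEN	8u

struct sketch_err_fifo {
	uint32_t entries[SKETCH_ERR_BUF_LEN];
	uint32_t ci;	/* consumer index, owned by software */
};

/*
 * Handle every entry from the current consumer index up to (not including)
 * the producer index, wrapping at the end of the ring. The handler may be
 * NULL. Returns the number of entries consumed.
 */
static unsigned int sketch_drain(struct sketch_err_fifo *fifo, uint32_t pi,
				 void (*handle)(uint32_t entry))
{
	unsigned int consumed = 0;

	pi &= SKETCH_ERR_BUF_LEN - 1;

	while (fifo->ci != pi) {
		if (handle)
			handle(fifo->entries[fifo->ci]);

		fifo->ci = (fifo->ci + 1) & (SKETCH_ERR_BUF_LEN - 1);
		consumed++;
	}

	return consumed;
}
```

With a ring of 8 entries, ci = 6 and pi = 2, the loop visits entries 6, 7, 0
and 1 and leaves ci equal to pi.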

Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
---
 drivers/misc/habanalabs/gaudi/gaudi.c     | 13 ++++
 drivers/misc/habanalabs/gaudi/gaudiP.h    |  1 +
 drivers/misc/habanalabs/gaudi/gaudi_nic.c | 93 +++++++++++++++++++++++
 3 files changed, 107 insertions(+)

diff --git a/drivers/misc/habanalabs/gaudi/gaudi.c b/drivers/misc/habanalabs/gaudi/gaudi.c
index 34b99bd94ef0..8fc2288fb424 100644
--- a/drivers/misc/habanalabs/gaudi/gaudi.c
+++ b/drivers/misc/habanalabs/gaudi/gaudi.c
@@ -6658,6 +6658,19 @@ static void gaudi_handle_eqe(struct hl_device *hdev,
 		hl_fw_unmask_irq(hdev, event_type);
 		break;
 
+	case GAUDI_EVENT_NIC0_QP0:
+	case GAUDI_EVENT_NIC0_QP1:
+	case GAUDI_EVENT_NIC1_QP0:
+	case GAUDI_EVENT_NIC1_QP1:
+	case GAUDI_EVENT_NIC2_QP0:
+	case GAUDI_EVENT_NIC2_QP1:
+	case GAUDI_EVENT_NIC3_QP0:
+	case GAUDI_EVENT_NIC3_QP1:
+	case GAUDI_EVENT_NIC4_QP0:
+	case GAUDI_EVENT_NIC4_QP1:
+		gaudi_nic_handle_qp_err(hdev, event_type);
+		break;
+
 	case GAUDI_EVENT_PSOC_GPIO_U16_0:
 		cause = le64_to_cpu(eq_entry->data[0]) & 0xFF;
 		dev_err(hdev->dev,
diff --git a/drivers/misc/habanalabs/gaudi/gaudiP.h b/drivers/misc/habanalabs/gaudi/gaudiP.h
index ba3150c073ca..dc1dcff43cd6 100644
--- a/drivers/misc/habanalabs/gaudi/gaudiP.h
+++ b/drivers/misc/habanalabs/gaudi/gaudiP.h
@@ -578,5 +578,6 @@ netdev_tx_t gaudi_nic_handle_tx_pkt(struct gaudi_nic_device *gaudi_nic,
 					struct sk_buff *skb);
 int gaudi_nic_sw_init(struct hl_device *hdev);
 void gaudi_nic_sw_fini(struct hl_device *hdev);
+void gaudi_nic_handle_qp_err(struct hl_device *hdev, u16 event_type);
 
 #endif /* GAUDIP_H_ */
diff --git a/drivers/misc/habanalabs/gaudi/gaudi_nic.c b/drivers/misc/habanalabs/gaudi/gaudi_nic.c
index 8f6585c700cf..41789f7ed32e 100644
--- a/drivers/misc/habanalabs/gaudi/gaudi_nic.c
+++ b/drivers/misc/habanalabs/gaudi/gaudi_nic.c
@@ -3958,3 +3958,96 @@ int gaudi_nic_cq_mmap(struct hl_device *hdev, struct vm_area_struct *vma)
 
 	return rc;
 }
+
+static const char *get_syndrome_text(u32 syndrome)
+{
+	const char *str;
+
+	switch (syndrome) {
+	case 0x05:
+		str = "Rx got invalid QP";
+		break;
+	case 0x06:
+		str = "Rx transport service mismatch";
+		break;
+	case 0x09:
+		str = "Rx Rkey check failed";
+		break;
+	case 0x40:
+		str = "timer retry exceeded";
+		break;
+	case 0x41:
+		str = "NACK retry exceeded";
+		break;
+	case 0x42:
+		str = "doorbell on invalid QP";
+		break;
+	case 0x43:
+		str = "doorbell security check failed";
+		break;
+	case 0x44:
+		str = "Tx got invalid QP";
+		break;
+	case 0x45:
+		str = "responder got ACK/NACK on invalid QP";
+		break;
+	case 0x46:
+		str = "responder tried to send ACK/NACK on invalid QP";
+		break;
+	default:
+		str = "unknown syndrome";
+		break;
+	}
+
+	return str;
+}
+
+void gaudi_nic_handle_qp_err(struct hl_device *hdev, u16 event_type)
+{
+	struct gaudi_device *gaudi = hdev->asic_specific;
+	struct gaudi_nic_device *gaudi_nic =
+			&gaudi->nic_devices[event_type - GAUDI_EVENT_NIC0_QP0];
+	struct qp_err *qp_err_arr = gaudi_nic->qp_err_mem_cpu;
+	struct hl_nic_cqe cqe_sw = {};
+	u32 pi, ci;
+
+	mutex_lock(&gaudi->nic_qp_err_lock);
+
+	if (!gaudi->nic_cq_enable)
+		dev_err_ratelimited(hdev->dev,
+			"received NIC %d QP error event %d but no CQ to push it\n",
+			gaudi_nic->port, event_type);
+
+	pi = NIC_RREG32(mmNIC0_QPC0_ERR_FIFO_PRODUCER_INDEX);
+	ci = gaudi_nic->qp_err_ci;
+
+	cqe_sw.is_err = true;
+	cqe_sw.port = gaudi_nic->port;
+
+	while (ci < pi) {
+		cqe_sw.type = QP_ERR_IS_REQ(qp_err_arr[ci]) ?
+				HL_NIC_CQE_TYPE_REQ : HL_NIC_CQE_TYPE_RES;
+		cqe_sw.qp_number = QP_ERR_QP_NUM(qp_err_arr[ci]);
+		cqe_sw.qp_err.syndrome = QP_ERR_ERR_NUM(qp_err_arr[ci]);
+
+		ci = (ci + 1) & (QP_ERR_BUF_LEN - 1);
+
+		dev_err_ratelimited(hdev->dev,
+			"NIC QP error port: %d, type: %d, qpn: %d, syndrome: %s (0x%x)\n",
+			cqe_sw.port, cqe_sw.type, cqe_sw.qp_number,
+			get_syndrome_text(cqe_sw.qp_err.syndrome),
+			cqe_sw.qp_err.syndrome);
+
+		if (gaudi->nic_cq_enable)
+			copy_cqe_to_main_queue(hdev, &cqe_sw);
+	}
+
+	gaudi_nic->qp_err_ci = ci;
+	NIC_WREG32(mmNIC0_QPC0_ERR_FIFO_CONSUMER_INDEX, ci);
+
+	/* signal the completion queue that there are available CQEs */
+	if (gaudi->nic_cq_enable)
+		complete(&gaudi->nic_cq_comp);
+
+	mutex_unlock(&gaudi->nic_qp_err_lock);
+}
-- 
2.17.1



* [PATCH 12/15] habanalabs/gaudi: add debugfs entries for the NIC
  2020-09-10 16:11 [PATCH 00/15] Adding GAUDI NIC code to habanalabs driver Oded Gabbay
                   ` (9 preceding siblings ...)
  2020-09-10 16:11 ` [PATCH 11/15] habanalabs/gaudi: add QP error handling Oded Gabbay
@ 2020-09-10 16:11 ` Oded Gabbay
  2020-09-10 20:01   ` Jakub Kicinski
  2020-09-10 16:11 ` [PATCH 13/15] habanalabs/gaudi: Add ethtool support using coresight Oded Gabbay
                   ` (3 subsequent siblings)
  14 siblings, 1 reply; 44+ messages in thread
From: Oded Gabbay @ 2020-09-10 16:11 UTC (permalink / raw)
  To: linux-kernel, netdev; +Cc: SW_Drivers, gregkh, davem, kuba, Omer Shpigelman

From: Omer Shpigelman <oshpigelman@habana.ai>

Add several debugfs entries to help us debug the NIC engines and ports and
also the communication layer of the DL training application that uses them.

There are eight new entries. A detailed description is in the documentation
file, but here is a summary:

- nic_mac_loopback: enable MAC loopback mode per port
- nic_ports_status: print physical connection status per port
- nic_pcs_fail_time_frame: configure window size for measuring PCS
                           failures
- nic_pcs_fail_threshold: configure PCS failure threshold for
                          reconfiguring the link
- nic_pam4_tx_taps: configure PAM4 TX taps
- nic_polarity: configure polarity for NIC port lanes
- nic_check_link: configure whether to check the PCS link periodically
- nic_phy_auto_neg_lpbk: enable PHY auto-negotiation loopback
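
As an illustration of the nic_polarity input format documented in this patch
("<lane> <pol_tx> <pol_rx>", lanes 0-39, polarity 0 or 1), here is a minimal
user-space sketch that parses such a line and updates a pair of 64-bit lane
masks. It is not the driver code: the driver parses with strchr() and the
kstrto*() helpers, while this sketch uses sscanf(), and the sketch_* names
are hypothetical.

```c
#include <stdint.h>
#include <stdio.h>

/* Lanes are indexed 0-39 according to the ABI documentation in the patch. */
#define SKETCH_MAX_LANES	40u

/*
 * Parse "<lane> <pol_tx> <pol_rx>" and set or clear the lane's bit in each
 * mask. Returns 0 on success and -1 on malformed or out-of-range input.
 */
static int sketch_set_polarity(const char *line, uint64_t *tx_mask,
			       uint64_t *rx_mask)
{
	unsigned int lane, pol_tx, pol_rx;
	char extra;

	/* Exactly three numbers; a fourth non-blank token is an error. */
	if (sscanf(line, "%u %u %u %c", &lane, &pol_tx, &pol_rx, &extra) != 3)
		return -1;

	if (lane >= SKETCH_MAX_LANES || pol_tx > 1 || pol_rx > 1)
		return -1;

	*tx_mask = (*tx_mask & ~(1ULL << lane)) | ((uint64_t)pol_tx << lane);
	*rx_mask = (*rx_mask & ~(1ULL << lane)) | ((uint64_t)pol_rx << lane);

	return 0;
}
```

For example, the input "3 1 0" sets bit 3 in the Tx mask and clears bit 3 in
the Rx mask, leaving all other lanes untouched.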

Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
---
 .../ABI/testing/debugfs-driver-habanalabs     |  69 +++
 drivers/misc/habanalabs/gaudi/Makefile        |   3 +-
 drivers/misc/habanalabs/gaudi/gaudi_nic.c     |   2 +
 .../misc/habanalabs/gaudi/gaudi_nic_debugfs.c | 402 ++++++++++++++++++
 4 files changed, 475 insertions(+), 1 deletion(-)
 create mode 100644 drivers/misc/habanalabs/gaudi/gaudi_nic_debugfs.c

diff --git a/Documentation/ABI/testing/debugfs-driver-habanalabs b/Documentation/ABI/testing/debugfs-driver-habanalabs
index 2e9ae311e02d..8fca02b92a80 100644
--- a/Documentation/ABI/testing/debugfs-driver-habanalabs
+++ b/Documentation/ABI/testing/debugfs-driver-habanalabs
@@ -176,3 +176,72 @@ KernelVersion:  5.6
 Contact:        oded.gabbay@gmail.com
 Description:    Sets the stop-on_error option for the device engines. Value of
                 "0" is for disable, otherwise enable.
+
+What:           /sys/kernel/debug/habanalabs/hl<n>/nic_mac_loopback
+Date:           Nov 2020
+KernelVersion:  5.10
+Contact:        oshpigelman@habana.ai
+Description:    Allows the root user to disable/enable MAC loopback for Gaudi
+                NIC ports. The ports will function as if a physical loopback
+                transceiver was connected. A bitmask should be provided where
+                each bit represents a port, up to 20 bits.
+                Known issues for this mode:
+                1. Odd ports PHY is not stopped so the peer's odd ports will get
+                PCS link.
+                2. Might cause an interrupt storm due to high B/W.
+
+What:           /sys/kernel/debug/habanalabs/hl<n>/nic_ports_status
+Date:           Nov 2020
+KernelVersion:  5.10
+Contact:        oshpigelman@habana.ai
+Description:    Displays a summary of the PCS link state of all Gaudi NIC ports.
+
+What:           /sys/kernel/debug/habanalabs/hl<n>/nic_pcs_fail_time_frame
+Date:           Nov 2020
+KernelVersion:  5.10
+Contact:        oshpigelman@habana.ai
+Description:    Allows the root user to set the used time frame in seconds for
+                detecting a loose PCS link of a Gaudi NIC port. We count how
+                many PCS link failures occurred in a time frame up to a
+                threshold which will cause PHY reconfiguration for getting a new
+                PCS link.
+
+What:           /sys/kernel/debug/habanalabs/hl<n>/nic_pcs_fail_threshold
+Date:           Nov 2020
+KernelVersion:  5.10
+Contact:        oshpigelman@habana.ai
+Description:    Allows the root user to set the used threshold for detecting a
+                loose PCS link of a Gaudi NIC port. We count how many PCS link
+                failures occurred in a time frame up to the threshold which will
+                cause PHY reconfiguration for getting a new PCS link.
+
+What:           /sys/kernel/debug/habanalabs/hl<n>/nic_pam4_tx_taps
+Date:           Nov 2020
+KernelVersion:  5.10
+Contact:        oshpigelman@habana.ai
+Description:    Allows the root user to set the PAM4 Tx taps for Gaudi NIC port
+                lanes. The lane indices are 0-39.
+                Acceptable input string form:
+                <lane> <tx_pre2> <tx_pre1> <tx_main> <tx_post1> <tx_post2>.
+
+What:           /sys/kernel/debug/habanalabs/hl<n>/nic_polarity
+Date:           Nov 2020
+KernelVersion:  5.10
+Contact:        oshpigelman@habana.ai
+Description:    Allows the root user to set the polarity for Gaudi NIC port
+                lanes. The lane indices are 0-39.
+                Acceptable input string form: <lane> <pol_tx> <pol_rx>.
+
+What:           /sys/kernel/debug/habanalabs/hl<n>/nic_check_link
+Date:           Nov 2020
+KernelVersion:  5.10
+Contact:        oshpigelman@habana.ai
+Description:    Sets the PCS link periodic check for all Gaudi NIC ports. Value
+                of "0" is for disable, otherwise enable.
+
+What:           /sys/kernel/debug/habanalabs/hl<n>/nic_phy_auto_neg_lpbk
+Date:           Nov 2020
+KernelVersion:  5.10
+Contact:        oshpigelman@habana.ai
+Description:    Sets the PHY Autoneg loopback support for all Gaudi NIC ports.
+                Value of "0" is for disable, otherwise enable.
diff --git a/drivers/misc/habanalabs/gaudi/Makefile b/drivers/misc/habanalabs/gaudi/Makefile
index c5143cf6f025..437b21e54c95 100644
--- a/drivers/misc/habanalabs/gaudi/Makefile
+++ b/drivers/misc/habanalabs/gaudi/Makefile
@@ -2,4 +2,5 @@
 HL_GAUDI_FILES := gaudi/gaudi.o gaudi/gaudi_hwmgr.o gaudi/gaudi_security.o \
 	gaudi/gaudi_coresight.o
 
-HL_GAUDI_FILES += gaudi/gaudi_nic.o gaudi/gaudi_phy.o
+HL_GAUDI_FILES += gaudi/gaudi_nic.o gaudi/gaudi_phy.o \
+	gaudi/gaudi_nic_debugfs.o
diff --git a/drivers/misc/habanalabs/gaudi/gaudi_nic.c b/drivers/misc/habanalabs/gaudi/gaudi_nic.c
index 41789f7ed32e..a73635a4c44b 100644
--- a/drivers/misc/habanalabs/gaudi/gaudi_nic.c
+++ b/drivers/misc/habanalabs/gaudi/gaudi_nic.c
@@ -3025,6 +3025,8 @@ int gaudi_nic_ports_init(struct hl_device *hdev)
 			}
 		}
 
+	gaudi_nic_debugfs_init(hdev);
+
 	gaudi->hw_cap_initialized |= HW_CAP_NIC_DRV;
 
 	return 0;
diff --git a/drivers/misc/habanalabs/gaudi/gaudi_nic_debugfs.c b/drivers/misc/habanalabs/gaudi/gaudi_nic_debugfs.c
new file mode 100644
index 000000000000..2e99d2683512
--- /dev/null
+++ b/drivers/misc/habanalabs/gaudi/gaudi_nic_debugfs.c
@@ -0,0 +1,402 @@
+// SPDX-License-Identifier: GPL-2.0
+
+/*
+ * Copyright 2018-2020 HabanaLabs, Ltd.
+ * All Rights Reserved.
+ */
+
+#include "gaudi_nic.h"
+#include <linux/debugfs.h>
+#include <linux/nospec.h>
+
+#ifdef CONFIG_DEBUG_FS
+
+#define POLARITY_KBUF_SIZE	8
+#define TX_TAPS_KBUF_SIZE	25
+
+static ssize_t debugfs_pam4_tx_taps_write(struct file *f,
+						const char __user *buf,
+						size_t count, loff_t *ppos)
+{
+	struct hl_device *hdev = file_inode(f)->i_private;
+	struct gaudi_device *gaudi = hdev->asic_specific;
+	char kbuf[TX_TAPS_KBUF_SIZE];
+	char *c1, *c2;
+	ssize_t rc;
+	u32 lane;
+	s32 tx_pre2, tx_pre1, tx_main, tx_post1, tx_post2;
+	s32 *taps;
+
+	if (count > sizeof(kbuf) - 1)
+		goto err;
+	if (copy_from_user(kbuf, buf, count))
+		goto err;
+	kbuf[count] = '\0';
+
+	c1 = kbuf;
+	c2 = strchr(c1, ' ');
+	if (!c2)
+		goto err;
+	*c2 = '\0';
+
+	rc = kstrtou32(c1, 10, &lane);
+	if (rc)
+		goto err;
+
+	if (lane >= NIC_MAX_NUM_OF_LANES) {
+		dev_err(hdev->dev, "lane max value is %d\n",
+			NIC_MAX_NUM_OF_LANES - 1);
+		return -EINVAL;
+	}
+
+	/* Turn off speculation due to Spectre vulnerability */
+	lane = array_index_nospec(lane, NIC_MAX_NUM_OF_LANES);
+
+	c1 = c2 + 1;
+
+	c2 = strchr(c1, ' ');
+	if (!c2)
+		goto err;
+	*c2 = '\0';
+
+	rc = kstrtos32(c1, 10, &tx_pre2);
+	if (rc)
+		goto err;
+
+	c1 = c2 + 1;
+
+	c2 = strchr(c1, ' ');
+	if (!c2)
+		goto err;
+	*c2 = '\0';
+
+	rc = kstrtos32(c1, 10, &tx_pre1);
+	if (rc)
+		goto err;
+
+	c1 = c2 + 1;
+
+	c2 = strchr(c1, ' ');
+	if (!c2)
+		goto err;
+	*c2 = '\0';
+
+	rc = kstrtos32(c1, 10, &tx_main);
+	if (rc)
+		goto err;
+
+	c1 = c2 + 1;
+
+	c2 = strchr(c1, ' ');
+	if (!c2)
+		goto err;
+	*c2 = '\0';
+
+	rc = kstrtos32(c1, 10, &tx_post1);
+	if (rc)
+		goto err;
+
+	c1 = c2 + 1;
+
+	rc = kstrtos32(c1, 10, &tx_post2);
+	if (rc)
+		goto err;
+
+	taps = gaudi->nic_pam4_tx_taps[lane].taps;
+	taps[0] = tx_pre2;
+	taps[1] = tx_pre1;
+	taps[2] = tx_main;
+	taps[3] = tx_post1;
+	taps[4] = tx_post2;
+
+	return count;
+err:
+	dev_err(hdev->dev,
+		"usage: echo <lane> <tx_pre2> <tx_pre1> <tx_main> <tx_post1> <tx_post2> > nic_pam4_tx_taps\n");
+
+	return -EINVAL;
+}
+
+static const struct file_operations debugfs_pam4_tx_taps_fops = {
+	.owner = THIS_MODULE,
+	.write = debugfs_pam4_tx_taps_write,
+};
+
+static ssize_t debugfs_polarity_write(struct file *f, const char __user *buf,
+					size_t count, loff_t *ppos)
+{
+	struct hl_device *hdev = file_inode(f)->i_private;
+	struct gaudi_device *gaudi = hdev->asic_specific;
+	struct cpucp_nic_info *nic_info = &hdev->asic_prop.cpucp_nic_info;
+	char kbuf[POLARITY_KBUF_SIZE];
+	char *c1, *c2;
+	ssize_t rc;
+	u64 val;
+	u32 lane;
+	u8 pol_tx, pol_rx;
+
+	if (count > sizeof(kbuf) - 1)
+		goto err;
+	if (copy_from_user(kbuf, buf, count))
+		goto err;
+	kbuf[count] = '\0';
+
+	c1 = kbuf;
+	c2 = strchr(c1, ' ');
+	if (!c2)
+		goto err;
+	*c2 = '\0';
+
+	rc = kstrtou32(c1, 10, &lane);
+	if (rc)
+		goto err;
+
+	if (lane >= NIC_MAX_NUM_OF_LANES) {
+		dev_err(hdev->dev, "lane max value is %d\n",
+			NIC_MAX_NUM_OF_LANES - 1);
+		return -EINVAL;
+	}
+
+	c1 = c2 + 1;
+
+	c2 = strchr(c1, ' ');
+	if (!c2)
+		goto err;
+	*c2 = '\0';
+
+	rc = kstrtou8(c1, 10, &pol_tx);
+	if (rc)
+		goto err;
+
+	c1 = c2 + 1;
+
+	rc = kstrtou8(c1, 10, &pol_rx);
+	if (rc)
+		goto err;
+
+	if ((pol_tx & ~1) || (pol_rx & ~1)) {
+		dev_err(hdev->dev, "pol_tx and pol_rx should be 0 or 1\n");
+		goto err;
+	}
+
+	val = le64_to_cpu(nic_info->pol_tx_mask[0]);
+	val &= ~BIT_ULL(lane);
+	val |= ((u64) pol_tx) << lane;
+	nic_info->pol_tx_mask[0] = cpu_to_le64(val);
+
+	val = le64_to_cpu(nic_info->pol_rx_mask[0]);
+	val &= ~BIT_ULL(lane);
+	val |= ((u64) pol_rx) << lane;
+	nic_info->pol_rx_mask[0] = cpu_to_le64(val);
+
+	gaudi->nic_use_fw_polarity = true;
+
+	return count;
+err:
+	dev_err(hdev->dev,
+		"usage: echo <lane> <pol_tx> <pol_rx> > nic_polarity\n");
+
+	return -EINVAL;
+}
+
+static const struct file_operations debugfs_polarity_fops = {
+	.owner = THIS_MODULE,
+	.write = debugfs_polarity_write,
+};
+
+static ssize_t debugfs_ports_status_read(struct file *f, char __user *buf,
+					size_t count, loff_t *ppos)
+{
+	struct hl_device *hdev = file_inode(f)->i_private;
+	struct gaudi_device *gaudi = hdev->asic_specific;
+	char tmp_buf[512] = {0};
+	ssize_t rc;
+	int i, up_cnt = 0, down_cnt = 0;
+
+	if (*ppos)
+		return 0;
+
+	for (i = 0 ; i < NIC_NUMBER_OF_PORTS ; i++)
+		if ((hdev->nic_ports_mask & BIT(i))) {
+			if (gaudi->nic_devices[i].active)
+				up_cnt++;
+			else
+				down_cnt++;
+		}
+
+	if (up_cnt) {
+		sprintf(tmp_buf, "%d ports up (", up_cnt);
+
+		for (i = 0 ; i < NIC_NUMBER_OF_PORTS ; i++)
+			if ((hdev->nic_ports_mask & BIT(i)) &&
+				gaudi->nic_devices[i].active)
+				sprintf(tmp_buf + strlen(tmp_buf), "%d, ", i);
+
+		sprintf(tmp_buf + strlen(tmp_buf) - 2, ")");
+	}
+
+	if (down_cnt) {
+		if (up_cnt)
+			sprintf(tmp_buf + strlen(tmp_buf), "\n");
+
+		sprintf(tmp_buf + strlen(tmp_buf), "%d ports down (", down_cnt);
+
+		for (i = 0 ; i < NIC_NUMBER_OF_PORTS ; i++)
+			if ((hdev->nic_ports_mask & BIT(i)) &&
+				!gaudi->nic_devices[i].active)
+				sprintf(tmp_buf + strlen(tmp_buf), "%d, ", i);
+
+		sprintf(tmp_buf + strlen(tmp_buf) - 2, ")");
+	}
+
+	sprintf(tmp_buf + strlen(tmp_buf), "\n");
+
+	rc = simple_read_from_buffer(buf, count, ppos, tmp_buf,
+					strlen(tmp_buf) + 1);
+
+	return rc;
+}
+
+static const struct file_operations debugfs_ports_status_fops = {
+	.owner = THIS_MODULE,
+	.read = debugfs_ports_status_read,
+};
+
+#define NIC_DEBUGFS(X, fmt, do_reset) \
+static ssize_t debugfs_##X##_read(struct file *f, \
+					char __user *buf, \
+					size_t count, \
+					loff_t *ppos) \
+{ \
+	struct hl_device *hdev = file_inode(f)->i_private; \
+	struct gaudi_device *gaudi = hdev->asic_specific; \
+	char tmp_buf[32]; \
+	ssize_t rc; \
+\
+	if (*ppos) \
+		return 0; \
+\
+	sprintf(tmp_buf, fmt "\n", gaudi->nic_##X); \
+	rc = simple_read_from_buffer(buf, count, ppos, tmp_buf, \
+			strlen(tmp_buf) + 1); \
+\
+	return rc; \
+} \
+\
+static ssize_t debugfs_##X##_write(struct file *f, \
+					const char __user *buf, \
+					size_t count, \
+					loff_t *ppos) \
+{ \
+	struct hl_device *hdev = file_inode(f)->i_private; \
+	struct gaudi_device *gaudi = hdev->asic_specific; \
+	u64 val, base; \
+	ssize_t rc; \
+\
+	if (!strcmp(fmt, "%d")) \
+		base = 10; \
+	else \
+		base = 16; \
+\
+	rc = kstrtoull_from_user(buf, count, base, &val); \
+	if (rc) \
+		return rc; \
+\
+	if (val == gaudi->nic_##X) \
+		return count; \
+\
+	if (do_reset && gaudi->nic_debugfs_reset) { \
+		gaudi->nic_##X = val; \
+		hl_device_reset(hdev, true, false); \
+		ssleep(HL_PENDING_RESET_PER_SEC); \
+		return count; \
+	} \
+\
+	dev_info(hdev->dev, "NIC reset for %s started\n", __stringify(X)); \
+\
+	rc = gaudi_nic_hard_reset_prepare(hdev); \
+	if (rc) \
+		return rc; \
+\
+	gaudi_nic_stop(hdev); \
+\
+	/* must do this so the ports will be reopened */ \
+	gaudi->hw_cap_initialized &= ~HW_CAP_NIC_DRV; \
+\
+	gaudi->nic_##X = val; \
+\
+	gaudi_nic_ports_reopen(hdev); \
+\
+	dev_info(hdev->dev, "NIC reset for %s finished\n", __stringify(X)); \
+\
+	return count; \
+} \
+\
+static const struct file_operations debugfs_##X##_fops = { \
+	.owner = THIS_MODULE, \
+	.read = debugfs_##X##_read, \
+	.write = debugfs_##X##_write, \
+}
+
+NIC_DEBUGFS(mac_loopback, "0x%llx", true);
+NIC_DEBUGFS(pcs_fail_time_frame, "%d", false);
+NIC_DEBUGFS(pcs_fail_threshold, "%d", false);
+
+void gaudi_nic_debugfs_init(struct hl_device *hdev)
+{
+	struct gaudi_device *gaudi = hdev->asic_specific;
+
+	debugfs_create_file("nic_mac_loopback",
+				0644,
+				hdev->hl_debugfs.root,
+				hdev,
+				&debugfs_mac_loopback_fops);
+
+	debugfs_create_file("nic_ports_status",
+				0444,
+				hdev->hl_debugfs.root,
+				hdev,
+				&debugfs_ports_status_fops);
+
+	debugfs_create_file("nic_pcs_fail_time_frame",
+				0644,
+				hdev->hl_debugfs.root,
+				hdev,
+				&debugfs_pcs_fail_time_frame_fops);
+
+	debugfs_create_file("nic_pcs_fail_threshold",
+				0644,
+				hdev->hl_debugfs.root,
+				hdev,
+				&debugfs_pcs_fail_threshold_fops);
+
+	debugfs_create_file("nic_pam4_tx_taps",
+				0200,
+				hdev->hl_debugfs.root,
+				hdev,
+				&debugfs_pam4_tx_taps_fops);
+
+	debugfs_create_file("nic_polarity",
+				0200,
+				hdev->hl_debugfs.root,
+				hdev,
+				&debugfs_polarity_fops);
+
+	debugfs_create_u8("nic_check_link",
+				0644,
+				hdev->hl_debugfs.root,
+				&gaudi->nic_check_link);
+
+	debugfs_create_u8("nic_phy_auto_neg_lpbk",
+				0644,
+				hdev->hl_debugfs.root,
+				&gaudi->nic_phy_auto_neg_lpbk);
+}
+
+#else
+
+void gaudi_nic_debugfs_init(struct hl_device *hdev)
+{
+}
+
+#endif /* CONFIG_DEBUG_FS */
-- 
2.17.1



* [PATCH 13/15] habanalabs/gaudi: Add ethtool support using coresight
  2020-09-10 16:11 [PATCH 00/15] Adding GAUDI NIC code to habanalabs driver Oded Gabbay
                   ` (10 preceding siblings ...)
  2020-09-10 16:11 ` [PATCH 12/15] habanalabs/gaudi: add debugfs entries for the NIC Oded Gabbay
@ 2020-09-10 16:11 ` Oded Gabbay
  2020-09-10 20:19   ` Andrew Lunn
  2020-09-10 16:11 ` [PATCH 14/15] habanalabs/gaudi: support DCB protocol Oded Gabbay
                   ` (2 subsequent siblings)
  14 siblings, 1 reply; 44+ messages in thread
From: Oded Gabbay @ 2020-09-10 16:11 UTC (permalink / raw)
  To: linux-kernel, netdev; +Cc: SW_Drivers, gregkh, davem, kuba, Omer Shpigelman

From: Omer Shpigelman <oshpigelman@habana.ai>

The driver supports ethtool callbacks and provides statistics using the
device's profiling infrastructure (coresight).

We support the basic ethtool functionality and counters, as far as our H/W
provides support.

A summary of the supported callbacks:

- get_drvinfo: fill some basic information regarding the driver
- get_link_ksettings: get basic settings like speed, duplex,
                      Auto-negotiation and link modes.
- set_link_ksettings: only speed and Auto-negotiation settings are supported.
- get_link: returns link indication.
- get_strings: get counters strings.
- get_sset_count: get counters number.
- get_ethtool_stats: get counters values.
- get_module_info: get EEPROM type and length.
- get_module_eeprom: get EEPROM (supported in raw mode only).
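
The counter values come from the SPMU snapshot registers. As a rough sketch of
the output layout used by the sampling routine in this patch (event counters
first, then the overflow status, then a 64-bit cycle count assembled from a
high and a low 32-bit read), consider the following user-space code. The
sketch_* names are hypothetical, and unlike the patch this sketch always
fills the last two slots and checks the length before subtracting from it.

```c
#include <stddef.h>
#include <stdint.h>

/* Same value as SPMU_MAX_COUNTERS in the patch. */
#define SKETCH_SPMU_MAX_COUNTERS	6u

/*
 * Lay out one sample in out[]: out_len - 2 event counters, followed by the
 * overflow status and the 64-bit cycle count. Returns 0 on success and -1
 * when the output array is missing, too short or too long.
 */
static int sketch_pack_sample(const uint32_t *events, uint32_t overflow,
			      uint32_t cycles_hi, uint32_t cycles_lo,
			      uint64_t *out, size_t out_len)
{
	size_t events_num, i;

	if (!events || !out || out_len < 2)
		return -1;

	events_num = out_len - 2;
	if (events_num > SKETCH_SPMU_MAX_COUNTERS)
		return -1;

	for (i = 0; i < events_num; i++)
		out[i] = events[i];

	out[out_len - 2] = overflow;
	/* The high word is shifted into place before OR-ing in the low word. */
	out[out_len - 1] = ((uint64_t)cycles_hi << 32) | cycles_lo;

	return 0;
}
```

With six event counters the output array therefore holds eight 64-bit values,
with the cycle count in the last slot.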

Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
---
 drivers/misc/habanalabs/gaudi/Makefile        |   1 +
 drivers/misc/habanalabs/gaudi/gaudi.c         |   1 +
 drivers/misc/habanalabs/gaudi/gaudiP.h        |   7 +
 .../misc/habanalabs/gaudi/gaudi_coresight.c   | 144 +++++
 drivers/misc/habanalabs/gaudi/gaudi_nic.c     |   5 +
 .../misc/habanalabs/gaudi/gaudi_nic_ethtool.c | 582 ++++++++++++++++++
 6 files changed, 740 insertions(+)
 create mode 100644 drivers/misc/habanalabs/gaudi/gaudi_nic_ethtool.c

diff --git a/drivers/misc/habanalabs/gaudi/Makefile b/drivers/misc/habanalabs/gaudi/Makefile
index 437b21e54c95..e3002dc34a74 100644
--- a/drivers/misc/habanalabs/gaudi/Makefile
+++ b/drivers/misc/habanalabs/gaudi/Makefile
@@ -3,4 +3,5 @@ HL_GAUDI_FILES := gaudi/gaudi.o gaudi/gaudi_hwmgr.o gaudi/gaudi_security.o \
 	gaudi/gaudi_coresight.o
 
 HL_GAUDI_FILES += gaudi/gaudi_nic.o gaudi/gaudi_phy.o \
+	gaudi/gaudi_nic_ethtool.o \
 	gaudi/gaudi_nic_debugfs.o
diff --git a/drivers/misc/habanalabs/gaudi/gaudi.c b/drivers/misc/habanalabs/gaudi/gaudi.c
index 8fc2288fb424..eb733a48eb72 100644
--- a/drivers/misc/habanalabs/gaudi/gaudi.c
+++ b/drivers/misc/habanalabs/gaudi/gaudi.c
@@ -1043,6 +1043,7 @@ static int gaudi_sw_init(struct hl_device *hdev)
 	gaudi->cpucp_info_get = gaudi_cpucp_info_get;
 	gaudi->nic_handle_rx = gaudi_nic_handle_rx;
 	gaudi->nic_handle_tx = gaudi_nic_handle_tx;
+	gaudi->nic_spmu_init = gaudi_nic_spmu_init;
 
 	gaudi->max_freq_value = GAUDI_MAX_CLK_FREQ;
 
diff --git a/drivers/misc/habanalabs/gaudi/gaudiP.h b/drivers/misc/habanalabs/gaudi/gaudiP.h
index dc1dcff43cd6..c7dbca13f197 100644
--- a/drivers/misc/habanalabs/gaudi/gaudiP.h
+++ b/drivers/misc/habanalabs/gaudi/gaudiP.h
@@ -438,6 +438,7 @@ struct gaudi_internal_qman_info {
  * @cpucp_info_get: get information on device from CPU-CP
  * @nic_handle_rx: NIC handler for incoming packet.
  * @nic_handle_tx: NIC handler for outgoing packet.
+ * @nic_spmu_init: initialize NIC CoreSight spmu counters.
  * @nic_devices: array that holds all NIC ports manage structures.
  * @nic_macros: array that holds all NIC macros manage structures.
  * @nic_pam4_tx_taps: array that holds all PAM4 Tx taps of all NIC lanes.
@@ -502,6 +503,7 @@ struct gaudi_device {
 	int (*cpucp_info_get)(struct hl_device *hdev);
 	void (*nic_handle_rx)(struct gaudi_nic_device *gaudi_nic);
 	int (*nic_handle_tx)(struct gaudi_nic_device *gaudi_nic, void *data);
+	void (*nic_spmu_init)(struct hl_device *hdev, int port);
 	struct gaudi_nic_device		nic_devices[NIC_NUMBER_OF_PORTS];
 	struct gaudi_nic_macro		nic_macros[NIC_NUMBER_OF_MACROS];
 	struct gaudi_nic_tx_taps	nic_pam4_tx_taps[NIC_MAX_NUM_OF_LANES];
@@ -576,8 +578,13 @@ irqreturn_t gaudi_nic_rx_irq_handler(int irq, void *arg);
 irqreturn_t gaudi_nic_cq_irq_handler(int irq, void *arg);
 netdev_tx_t gaudi_nic_handle_tx_pkt(struct gaudi_nic_device *gaudi_nic,
 					struct sk_buff *skb);
+void gaudi_nic_spmu_init(struct hl_device *hdev, int port);
 int gaudi_nic_sw_init(struct hl_device *hdev);
 void gaudi_nic_sw_fini(struct hl_device *hdev);
 void gaudi_nic_handle_qp_err(struct hl_device *hdev, u16 event_type);
+int gaudi_config_spmu_nic(struct hl_device *hdev, u32 port,
+		u32 num_event_types, u32 event_types[]);
+int gaudi_sample_spmu_nic(struct hl_device *hdev, u32 port,
+		u32 num_out_data, u64 out_data[]);
 
 #endif /* GAUDIP_H_ */
diff --git a/drivers/misc/habanalabs/gaudi/gaudi_coresight.c b/drivers/misc/habanalabs/gaudi/gaudi_coresight.c
index 881531d4d9da..c31953403d09 100644
--- a/drivers/misc/habanalabs/gaudi/gaudi_coresight.c
+++ b/drivers/misc/habanalabs/gaudi/gaudi_coresight.c
@@ -16,6 +16,11 @@
 #define SPMU_SECTION_SIZE		MME0_ACC_SPMU_MAX_OFFSET
 #define SPMU_EVENT_TYPES_OFFSET		0x400
 #define SPMU_MAX_COUNTERS		6
+#define PMSCR				0x6F0	/* Snapshot Control */
+#define PMEVCNTSR0			0x620	/* Event Counters Snapshot */
+#define PMOVSSR				0x614	/* Overflow Status Snapshot */
+#define PMCCNTSR_L			0x618	/* Cycle Counter Snapshot */
+#define PMCCNTSR_H			0x61c	/* Cycle Counter Snapshot */
 
 static u64 debug_stm_regs[GAUDI_STM_LAST + 1] = {
 	[GAUDI_STM_MME0_ACC]	= mmMME0_ACC_STM_BASE,
@@ -752,6 +757,27 @@ static int gaudi_config_bmon(struct hl_device *hdev,
 	return 0;
 }
 
+static bool gaudi_reg_is_nic_spmu(enum gaudi_debug_spmu_regs_index reg_idx)
+{
+	switch (reg_idx) {
+	case GAUDI_SPMU_NIC0_0:
+	case GAUDI_SPMU_NIC0_1:
+	case GAUDI_SPMU_NIC1_0:
+	case GAUDI_SPMU_NIC1_1:
+	case GAUDI_SPMU_NIC2_0:
+	case GAUDI_SPMU_NIC2_1:
+	case GAUDI_SPMU_NIC3_0:
+	case GAUDI_SPMU_NIC3_1:
+	case GAUDI_SPMU_NIC4_0:
+	case GAUDI_SPMU_NIC4_1:
+		return true;
+	default:
+		break;
+	}
+
+	return false;
+}
+
 static int gaudi_config_spmu(struct hl_device *hdev,
 		struct hl_debug_params *params)
 {
@@ -769,6 +795,16 @@ static int gaudi_config_spmu(struct hl_device *hdev,
 		return -EINVAL;
 	}
 
+	/*
+	 * NIC spmus are now configured by driver at init
+	 * and not accessible to user in dbg mode
+	 */
+	if (hdev->in_debug && gaudi_reg_is_nic_spmu(params->reg_idx)) {
+		dev_err(hdev->dev,
+			"Rejecting user debug configuration for NIC spmu\n");
+		return -EFAULT;
+	}
+
 	base_reg = debug_spmu_regs[params->reg_idx] - CFG_BASE;
 
 	if (params->enable) {
@@ -837,6 +873,114 @@ static int gaudi_config_spmu(struct hl_device *hdev,
 	return 0;
 }
 
+static int gaudi_sample_spmu(struct hl_device *hdev,
+		struct hl_debug_params *params)
+{
+	u64 base_reg;
+	u64 *output;
+	u32 output_arr_len;
+	u32 events_num;
+	u32 overflow_idx;
+	u32 cycle_cnt_idx;
+	int i;
+
+	if (params->reg_idx >= ARRAY_SIZE(debug_spmu_regs)) {
+		dev_err(hdev->dev, "Invalid register index in SPMU\n");
+		return -EINVAL;
+	}
+
+	base_reg = debug_spmu_regs[params->reg_idx] - CFG_BASE;
+
+	output = params->output;
+	output_arr_len = params->output_size / 8;
+	events_num = output_arr_len - 2;
+	overflow_idx = output_arr_len - 2;
+	cycle_cnt_idx = output_arr_len - 1;
+
+	if (!output)
+		return -EINVAL;
+
+	if (output_arr_len < 1) {
+		dev_err(hdev->dev,
+			"not enough values for SPMU sample\n");
+		return -EINVAL;
+	}
+
+	if (events_num > SPMU_MAX_COUNTERS) {
+		dev_err(hdev->dev,
+			"too many events values for SPMU sample\n");
+		return -EINVAL;
+	}
+
+	/* capture */
+	WREG32(base_reg + PMSCR, 1);
+
+	/* read the shadow registers */
+	for (i = 0 ; i < events_num ; i++)
+		output[i] = RREG32(base_reg + PMEVCNTSR0 + i * 4);
+
+	/* also get overflow and cyclecount */
+	if (output_arr_len == SPMU_MAX_COUNTERS + 2) {
+		output[overflow_idx] = RREG32(base_reg + PMOVSSR);
+
+		output[cycle_cnt_idx] = RREG32(base_reg + PMCCNTSR_H);
+		output[cycle_cnt_idx] <<= 32;
+		output[cycle_cnt_idx] |= RREG32(base_reg + PMCCNTSR_L);
+	}
+
+	return 0;
+}
+
+int gaudi_config_spmu_nic(struct hl_device *hdev, u32 port,
+		u32 num_event_types, u32 event_types[])
+{
+	struct hl_debug_params params;
+	struct hl_debug_params_spmu spmu;
+	int i;
+
+	/* validate nic port */
+	if (!gaudi_reg_is_nic_spmu(GAUDI_SPMU_NIC0_0 + port)) {
+		dev_err(hdev->dev, "Invalid nic port %u\n", port);
+		return -EFAULT;
+	}
+
+	memset(&params, 0, sizeof(struct hl_debug_params));
+	params.op = HL_DEBUG_OP_SPMU;
+	params.input = &spmu;
+	params.enable = true;
+	params.reg_idx = GAUDI_SPMU_NIC0_0 + port;
+
+	memset(&spmu, 0, sizeof(struct hl_debug_params_spmu));
+	spmu.event_types_num  = num_event_types;
+
+	for (i = 0 ; i < spmu.event_types_num ; i++)
+		spmu.event_types[i] = event_types[i];
+
+	return gaudi_config_spmu(hdev, &params);
+}
+
+int gaudi_sample_spmu_nic(struct hl_device *hdev, u32 port,
+		u32 num_out_data, u64 out_data[])
+{
+	struct hl_debug_params params;
+
+	if (!hdev->supports_coresight)
+		return 0;
+
+	/* validate nic port */
+	if (!gaudi_reg_is_nic_spmu(GAUDI_SPMU_NIC0_0 + port)) {
+		dev_err(hdev->dev, "Invalid nic port %u\n", port);
+		return -EFAULT;
+	}
+
+	memset(&params, 0, sizeof(struct hl_debug_params));
+	params.output = out_data;
+	params.output_size = num_out_data * sizeof(uint64_t);
+	params.reg_idx = GAUDI_SPMU_NIC0_0 + port;
+
+	return gaudi_sample_spmu(hdev, &params);
+}
+
 int gaudi_debug_coresight(struct hl_device *hdev, void *data)
 {
 	struct hl_debug_params *params = data;
diff --git a/drivers/misc/habanalabs/gaudi/gaudi_nic.c b/drivers/misc/habanalabs/gaudi/gaudi_nic.c
index a73635a4c44b..108db990efa8 100644
--- a/drivers/misc/habanalabs/gaudi/gaudi_nic.c
+++ b/drivers/misc/habanalabs/gaudi/gaudi_nic.c
@@ -2744,6 +2744,7 @@ static int port_register(struct hl_device *hdev, int port)
 	ndev->dev_port = port;
 
 	ndev->netdev_ops = &gaudi_nic_netdev_ops;
+	ndev->ethtool_ops = &gaudi_nic_ethtool_ops;
 	ndev->watchdog_timeo = NIC_TX_TIMEOUT;
 	ndev->min_mtu = ETH_MIN_MTU;
 	ndev->max_mtu = NIC_MAX_MTU;
@@ -2769,6 +2770,8 @@ static int port_register(struct hl_device *hdev, int port)
 				port);
 	}
 
+	gaudi->nic_spmu_init(hdev, port);
+
 	if (register_netdev(ndev)) {
 		dev_err(hdev->dev,
 			"Could not register netdevice, port: %d\n", port);
@@ -3233,6 +3236,8 @@ void gaudi_nic_ports_reopen(struct hl_device *hdev)
 			continue;
 		}
 
+		gaudi->nic_spmu_init(hdev, port);
+
 		schedule_delayed_work(&gaudi_nic->port_open_work,
 					msecs_to_jiffies(1));
 	}
diff --git a/drivers/misc/habanalabs/gaudi/gaudi_nic_ethtool.c b/drivers/misc/habanalabs/gaudi/gaudi_nic_ethtool.c
new file mode 100644
index 000000000000..28982192e98d
--- /dev/null
+++ b/drivers/misc/habanalabs/gaudi/gaudi_nic_ethtool.c
@@ -0,0 +1,582 @@
+// SPDX-License-Identifier: GPL-2.0
+
+/*
+ * Copyright 2018-2020 HabanaLabs, Ltd.
+ * All Rights Reserved.
+ */
+
+#include "gaudi_nic.h"
+#include "../include/gaudi/asic_reg/gaudi_regs.h"
+#include <linux/pci.h>
+
+#define NIC_STATS_LEN		ARRAY_SIZE(gaudi_nic_ethtool_stats)
+#define NIC_SPMU0_STATS_LEN	ARRAY_SIZE(gaudi_nic0_spmu_event_type)
+#define NIC_SPMU1_STATS_LEN	ARRAY_SIZE(gaudi_nic1_spmu_event_type)
+#define NIC_SPMU_STATS_LEN_MAX	6
+#define NIC_MAC_STATS_RX_LEN	ARRAY_SIZE(gaudi_nic_mac_stats_rx)
+#define NIC_MAC_STATS_TX_LEN	ARRAY_SIZE(gaudi_nic_mac_stats_tx)
+#define NIC_XPCS91_REGS_CNT_LEN	ARRAY_SIZE(gaudi_nic_xpcs91_reg_type)
+#define NIC_SW_CNT_LEN		ARRAY_SIZE(gaudi_nic_sw_cnt_type)
+
+#define NIC_MAC_STAT_BLOCK_SIZE	(mmNIC1_STAT_BASE - mmNIC0_STAT_BASE)
+#define NIC_MAC_STAT_HI_PART	mmNIC0_STAT_DATA_HI_REG
+#define NIC_MAC_RX_PORT0_OFFSET	mmNIC0_STAT_ETHERSTATSOCTETS
+#define NIC_MAC_RX_PORT1_OFFSET	mmNIC0_STAT_ETHERSTATSOCTETS_2
+#define NIC_MAC_TX_PORT0_OFFSET	mmNIC0_STAT_ETHERSTATSOCTETS_4
+#define NIC_MAC_TX_PORT1_OFFSET	mmNIC0_STAT_ETHERSTATSOCTETS_6
+
+#define NIC_MAC_STAT_BASE(port) \
+			((u64) (NIC_MAC_STAT_BLOCK_SIZE * (u64) ((port) >> 1)))
+
+#define NIC_MAC_STAT_RREG32(port, reg) \
+			RREG32(NIC_MAC_STAT_BASE(port) + (reg))
+
+#define ethtool_add_mode ethtool_link_ksettings_add_link_mode
+
+struct gaudi_nic_ethtool_stats {
+	char stat_string[ETH_GSTRING_LEN];
+	int stat_offset;
+};
+
+struct gaudi_nic_spmu_event_type {
+	char stat_string[ETH_GSTRING_LEN];
+	int index;
+};
+
+struct gaudi_nic_xpcs91_reg_type {
+	char stat_string[ETH_GSTRING_LEN];
+	int lo_offset;
+	int hi_offset;
+};
+
+struct gaudi_nic_sw_cnt_type {
+	char stat_string[ETH_GSTRING_LEN];
+};
+
+#define NIC_STAT(m) {__stringify(m), offsetof(struct net_device, stats.m)}
+
+static struct gaudi_nic_ethtool_stats gaudi_nic_ethtool_stats[] = {
+	NIC_STAT(rx_packets),
+	NIC_STAT(tx_packets),
+	NIC_STAT(rx_bytes),
+	NIC_STAT(tx_bytes),
+	NIC_STAT(rx_errors),
+	NIC_STAT(tx_errors),
+	NIC_STAT(rx_dropped),
+	NIC_STAT(tx_dropped),
+	NIC_STAT(multicast),
+	NIC_STAT(collisions),
+	NIC_STAT(rx_length_errors),
+	NIC_STAT(rx_over_errors),
+	NIC_STAT(rx_crc_errors),
+	NIC_STAT(rx_frame_errors),
+	NIC_STAT(rx_fifo_errors),
+	NIC_STAT(rx_missed_errors),
+	NIC_STAT(tx_aborted_errors),
+	NIC_STAT(tx_carrier_errors),
+	NIC_STAT(tx_fifo_errors),
+	NIC_STAT(tx_heartbeat_errors),
+	NIC_STAT(tx_window_errors)
+};
+
+static struct gaudi_nic_ethtool_stats gaudi_nic_mac_stats_rx[] = {
+	{"Rx MAC counters", 0},
+	{"  etherStatsOctets", 0x0},
+	{"  OctetsReceivedOK", 0x4},
+	{"  aAlignmentErrors", 0x8},
+	{"  aPAUSEMACCtrlFramesReceived", 0xC},
+	{"  aFrameTooLongErrors", 0x10},
+	{"  aInRangeLengthErrors", 0x14},
+	{"  aFramesReceivedOK", 0x18},
+	{"  VLANReceivedOK", 0x1C},
+	{"  aFrameCheckSequenceErrors", 0x20},
+	{"  ifInErrors", 0x24},
+	{"  ifInUcastPkts", 0x28},
+	{"  ifInMulticastPkts", 0x2C},
+	{"  ifInBroadcastPkts", 0x30},
+	{"  etherStatsDropEvents", 0x34},
+	{"  etherStatsUndersizePkts", 0x38},
+	{"  etherStatsPkts", 0x3C},
+	{"  etherStatsPkts64Octets", 0x40},
+	{"  etherStatsPkts65to127Octets", 0x44},
+	{"  etherStatsPkts128to255Octets", 0x48},
+	{"  etherStatsPkts256to511Octets", 0x4C},
+	{"  etherStatsPkts512to1023Octets", 0x50},
+	{"  etherStatsPkts1024to1518Octets", 0x54},
+	{"  etherStatsPkts1519toMaxOctets", 0x58},
+	{"  etherStatsOversizePkts", 0x5C},
+	{"  etherStatsJabbers", 0x60},
+	{"  etherStatsFragments", 0x64},
+	{"  aCBFCPAUSEFramesReceived_0", 0x68},
+	{"  aCBFCPAUSEFramesReceived_1", 0x6C},
+	{"  aCBFCPAUSEFramesReceived_2", 0x70},
+	{"  aCBFCPAUSEFramesReceived_3", 0x74},
+	{"  aCBFCPAUSEFramesReceived_4", 0x78},
+	{"  aCBFCPAUSEFramesReceived_5", 0x7C},
+	{"  aCBFCPAUSEFramesReceived_6", 0x80},
+	{"  aCBFCPAUSEFramesReceived_7", 0x84},
+	{"  aMACControlFramesReceived", 0x88}
+};
+
+static struct gaudi_nic_ethtool_stats gaudi_nic_mac_stats_tx[] = {
+	{"Tx MAC counters", 0},
+	{"  etherStatsOctets", 0x0},
+	{"  OctetsTransmittedOK", 0x4},
+	{"  aPAUSEMACCtrlFramesTransmitted", 0x8},
+	{"  aFramesTransmittedOK", 0xC},
+	{"  VLANTransmittedOK", 0x10},
+	{"  ifOutErrors", 0x14},
+	{"  ifOutUcastPkts", 0x18},
+	{"  ifOutMulticastPkts", 0x1C},
+	{"  ifOutBroadcastPkts", 0x20},
+	{"  etherStatsPkts64Octets", 0x24},
+	{"  etherStatsPkts65to127Octets", 0x28},
+	{"  etherStatsPkts128to255Octets", 0x2C},
+	{"  etherStatsPkts256to511Octets", 0x30},
+	{"  etherStatsPkts512to1023Octets", 0x34},
+	{"  etherStatsPkts1024to1518Octets", 0x38},
+	{"  etherStatsPkts1519toMaxOctets", 0x3C},
+	{"  aCBFCPAUSEFramesTransmitted_0", 0x40},
+	{"  aCBFCPAUSEFramesTransmitted_1", 0x44},
+	{"  aCBFCPAUSEFramesTransmitted_2", 0x48},
+	{"  aCBFCPAUSEFramesTransmitted_3", 0x4C},
+	{"  aCBFCPAUSEFramesTransmitted_4", 0x50},
+	{"  aCBFCPAUSEFramesTransmitted_5", 0x54},
+	{"  aCBFCPAUSEFramesTransmitted_6", 0x58},
+	{"  aCBFCPAUSEFramesTransmitted_7", 0x5C},
+	{"  aMACControlFramesTransmitted", 0x60},
+	{"  etherStatsPkts", 0x64}
+};
+
+static struct gaudi_nic_spmu_event_type gaudi_nic0_spmu_event_type[] = {
+	{"requester_psn_out_of_range", 18},
+	{"responder_duplicate_psn", 21},
+	{"responder_out_of_sequence_psn", 22}
+};
+
+static struct gaudi_nic_spmu_event_type gaudi_nic1_spmu_event_type[] = {
+	{"requester_psn_out_of_range", 6},
+	{"responder_duplicate_psn", 9},
+	{"responder_out_of_sequence_psn", 10}
+};
+
+static struct gaudi_nic_xpcs91_reg_type gaudi_nic_xpcs91_reg_type[] = {
+	{"  correctable_errors", 0x2, 0x3},
+	{"  uncorrectable_errors", 0x4, 0x5}
+};
+
+static struct gaudi_nic_sw_cnt_type gaudi_nic_sw_cnt_type[] = {
+	{"  pcs_local_faults"},
+	{"  pcs_remote_faults"},
+};
+
+static void gaudi_nic_get_drvinfo(struct net_device *netdev,
+					struct ethtool_drvinfo *drvinfo)
+{
+	struct gaudi_nic_device **ptr = netdev_priv(netdev);
+	struct gaudi_nic_device *gaudi_nic = *ptr;
+	struct hl_device *hdev = gaudi_nic->hdev;
+
+	strlcpy(drvinfo->driver, HL_NAME, sizeof(drvinfo->driver));
+	strlcpy(drvinfo->fw_version, hdev->asic_prop.cpucp_info.cpucp_version,
+		sizeof(drvinfo->fw_version));
+	if (hdev->pdev)
+		strlcpy(drvinfo->bus_info, pci_name(hdev->pdev),
+				sizeof(drvinfo->bus_info));
+}
+
+static int gaudi_nic_get_link_ksettings(struct net_device *netdev,
+					struct ethtool_link_ksettings *cmd)
+{
+	struct gaudi_nic_device **ptr = netdev_priv(netdev);
+	struct gaudi_nic_device *gaudi_nic = *ptr;
+	struct hl_device *hdev = gaudi_nic->hdev;
+	u32 port = gaudi_nic->port, speed = gaudi_nic->speed;
+
+	cmd->base.speed = speed;
+	cmd->base.duplex = DUPLEX_FULL;
+
+	ethtool_link_ksettings_zero_link_mode(cmd, supported);
+	ethtool_link_ksettings_zero_link_mode(cmd, advertising);
+
+	ethtool_add_mode(cmd, supported, 100000baseCR4_Full);
+	ethtool_add_mode(cmd, supported, 100000baseSR4_Full);
+	ethtool_add_mode(cmd, supported, 100000baseKR4_Full);
+	ethtool_add_mode(cmd, supported, 100000baseLR4_ER4_Full);
+
+	ethtool_add_mode(cmd, supported, 50000baseSR2_Full);
+	ethtool_add_mode(cmd, supported, 50000baseCR2_Full);
+	ethtool_add_mode(cmd, supported, 50000baseKR2_Full);
+
+	if (speed == SPEED_100000) {
+		ethtool_add_mode(cmd, advertising, 100000baseCR4_Full);
+		ethtool_add_mode(cmd, advertising, 100000baseSR4_Full);
+		ethtool_add_mode(cmd, advertising, 100000baseKR4_Full);
+		ethtool_add_mode(cmd, advertising, 100000baseLR4_ER4_Full);
+
+		cmd->base.port = PORT_FIBRE;
+
+		ethtool_add_mode(cmd, supported, FIBRE);
+		ethtool_add_mode(cmd, advertising, FIBRE);
+
+		ethtool_add_mode(cmd, supported, Backplane);
+		ethtool_add_mode(cmd, advertising, Backplane);
+	} else if (speed == SPEED_50000) {
+		ethtool_add_mode(cmd, advertising, 50000baseSR2_Full);
+		ethtool_add_mode(cmd, advertising, 50000baseCR2_Full);
+		ethtool_add_mode(cmd, advertising, 50000baseKR2_Full);
+	} else {
+		dev_err(hdev->dev, "unknown speed %d, port %d\n", speed, port);
+		return -EFAULT;
+	}
+
+	ethtool_add_mode(cmd, supported, Autoneg);
+
+	if (gaudi_nic->auto_neg_enable) {
+		ethtool_add_mode(cmd, advertising, Autoneg);
+		cmd->base.autoneg = AUTONEG_ENABLE;
+		if (gaudi_nic->auto_neg_resolved)
+			ethtool_add_mode(cmd, lp_advertising, Autoneg);
+	} else {
+		cmd->base.autoneg = AUTONEG_DISABLE;
+	}
+
+	ethtool_add_mode(cmd, supported, Pause);
+
+	if (gaudi_nic->pfc_enable)
+		ethtool_add_mode(cmd, advertising, Pause);
+
+	return 0;
+}
+
+static int gaudi_nic_set_link_ksettings(struct net_device *netdev,
+				const struct ethtool_link_ksettings *cmd)
+{
+	struct gaudi_nic_device **ptr = netdev_priv(netdev);
+	struct gaudi_nic_device *gaudi_nic = *ptr;
+	struct hl_device *hdev = gaudi_nic->hdev;
+	u32 port = gaudi_nic->port;
+	int rc = 0, speed = cmd->base.speed;
+	bool auto_neg = cmd->base.autoneg == AUTONEG_ENABLE;
+
+	switch (speed) {
+	case SPEED_10000:
+	case SPEED_25000:
+	case SPEED_50000:
+		if (gaudi_nic->nic_macro->num_of_lanes == NIC_LANES_4) {
+			dev_err(hdev->dev,
+				"NIC %d with 4 lanes should be used only with speed of 100000Mb/s\n",
+				port);
+			return -EFAULT;
+		}
+		break;
+	case SPEED_100000:
+		break;
+	default:
+		dev_err(hdev->dev, "got invalid speed %dMb/s for NIC %d\n",
+			speed, port);
+		return -EINVAL;
+	}
+
+	if ((gaudi_nic->speed == speed) &&
+			(gaudi_nic->auto_neg_enable == auto_neg))
+		return 0;
+
+	if (atomic_cmpxchg(&gaudi_nic->in_reset, 0, 1)) {
+		dev_err(hdev->dev, "port %d is in reset, can't change speed\n",
+			port);
+		return -EBUSY;
+	}
+
+	gaudi_nic->speed = speed;
+	if (auto_neg)
+		hdev->nic_auto_neg_mask |= BIT(port);
+	else
+		hdev->nic_auto_neg_mask &= ~BIT(port);
+
+	if (gaudi_nic->enabled) {
+		rc = gaudi_nic_port_reset(gaudi_nic);
+		if (rc)
+			dev_err(hdev->dev,
+				"Failed to reset NIC %d for speed change, rc %d\n",
+				port, rc);
+	}
+
+	atomic_set(&gaudi_nic->in_reset, 0);
+
+	return rc;
+}
+
+static u32 gaudi_nic_get_link(struct net_device *netdev)
+{
+	return netif_carrier_ok(netdev);
+}
+
+static void gaudi_nic_get_internal_strings(struct net_device *netdev,
+					u8 *data)
+{
+	struct gaudi_nic_device **ptr = netdev_priv(netdev);
+	struct gaudi_nic_device *gaudi_nic = *ptr;
+	struct gaudi_nic_spmu_event_type *spmu_stats;
+	u32 port = gaudi_nic->port;
+	u32 num_spmus;
+	u32 i;
+
+	if (port & 1) {
+		num_spmus = NIC_SPMU1_STATS_LEN;
+		spmu_stats = gaudi_nic1_spmu_event_type;
+	} else {
+		num_spmus = NIC_SPMU0_STATS_LEN;
+		spmu_stats = gaudi_nic0_spmu_event_type;
+	}
+
+	for (i = 0 ; i < num_spmus ; i++)
+		memcpy(data + i * ETH_GSTRING_LEN,
+				spmu_stats[i].stat_string,
+				ETH_GSTRING_LEN);
+	data += i * ETH_GSTRING_LEN;
+	for (i = 0 ; i < NIC_MAC_STATS_RX_LEN ; i++)
+		memcpy(data + i * ETH_GSTRING_LEN,
+				gaudi_nic_mac_stats_rx[i].stat_string,
+				ETH_GSTRING_LEN);
+	data += i * ETH_GSTRING_LEN;
+	for (i = 0 ; i < NIC_XPCS91_REGS_CNT_LEN ; i++)
+		memcpy(data + i * ETH_GSTRING_LEN,
+				gaudi_nic_xpcs91_reg_type[i].stat_string,
+				ETH_GSTRING_LEN);
+	data += i * ETH_GSTRING_LEN;
+	for (i = 0 ; i < NIC_SW_CNT_LEN ; i++)
+		memcpy(data + i * ETH_GSTRING_LEN,
+				gaudi_nic_sw_cnt_type[i].stat_string,
+				ETH_GSTRING_LEN);
+	data += i * ETH_GSTRING_LEN;
+	for (i = 0 ; i < NIC_MAC_STATS_TX_LEN ; i++)
+		memcpy(data + i * ETH_GSTRING_LEN,
+				gaudi_nic_mac_stats_tx[i].stat_string,
+				ETH_GSTRING_LEN);
+
+}
+
+static void gaudi_nic_get_strings(struct net_device *netdev, u32 stringset,
+					u8 *data)
+{
+	int i;
+
+	if (stringset == ETH_SS_STATS) {
+		for (i = 0; i < NIC_STATS_LEN; i++)
+			memcpy(data + i * ETH_GSTRING_LEN,
+					gaudi_nic_ethtool_stats[i].stat_string,
+					ETH_GSTRING_LEN);
+		gaudi_nic_get_internal_strings(netdev,
+					data + i * ETH_GSTRING_LEN);
+	}
+}
+
+static int gaudi_nic_get_sset_count(struct net_device *netdev, int sset)
+{
+	struct gaudi_nic_device **ptr = netdev_priv(netdev);
+	struct gaudi_nic_device *gaudi_nic = *ptr;
+	u32 port = gaudi_nic->port;
+	int num_spmus, mac_counters, xpcs91_counters, sw_counters;
+
+	num_spmus = (port & 1) ? NIC_SPMU1_STATS_LEN : NIC_SPMU0_STATS_LEN;
+	mac_counters = NIC_MAC_STATS_RX_LEN + NIC_MAC_STATS_TX_LEN;
+	xpcs91_counters = NIC_XPCS91_REGS_CNT_LEN;
+	sw_counters = NIC_SW_CNT_LEN;
+
+	switch (sset) {
+	case ETH_SS_STATS:
+		return NIC_STATS_LEN + num_spmus + mac_counters +
+			xpcs91_counters + sw_counters;
+	default:
+		return -EOPNOTSUPP;
+	}
+}
+static u64 gaudi_nic_read_mac_counter(struct hl_device *hdev, u32 port,
+						int offset, bool is_rx)
+{
+	u64 lo_part, hi_part;
+	u64 start_reg;
+
+	if (!hdev->supports_coresight)
+		return 0;
+
+	if (is_rx)
+		if (port & 1)
+			start_reg = NIC_MAC_RX_PORT1_OFFSET;
+		else
+			start_reg = NIC_MAC_RX_PORT0_OFFSET;
+	else
+		if (port & 1)
+			start_reg = NIC_MAC_TX_PORT1_OFFSET;
+		else
+			start_reg = NIC_MAC_TX_PORT0_OFFSET;
+
+	lo_part = NIC_MAC_STAT_RREG32(port, start_reg + offset);
+	/* Volatile read: MUST read high part after low */
+	hi_part = NIC_MAC_STAT_RREG32(port, NIC_MAC_STAT_HI_PART);
+
+	return lo_part | (hi_part << 32);
+}
+
+static void gaudi_nic_read_xpcs91_regs(struct gaudi_nic_device *gaudi_nic,
+					u64 *out_data)
+{
+	u32 lo_part, hi_part, start_lane = __ffs(gaudi_nic->fw_tuning_mask);
+
+	lo_part = gaudi_nic_mac_read(gaudi_nic, start_lane, "xpcs91",
+			gaudi_nic_xpcs91_reg_type[0].lo_offset);
+	hi_part = gaudi_nic_mac_read(gaudi_nic, start_lane, "xpcs91",
+			gaudi_nic_xpcs91_reg_type[0].hi_offset);
+	gaudi_nic->correctable_errors_cnt +=
+					(hi_part << 16) | lo_part;
+	out_data[0] = gaudi_nic->correctable_errors_cnt;
+
+	lo_part = gaudi_nic_mac_read(gaudi_nic, start_lane, "xpcs91",
+			gaudi_nic_xpcs91_reg_type[1].lo_offset);
+	hi_part = gaudi_nic_mac_read(gaudi_nic, start_lane, "xpcs91",
+			gaudi_nic_xpcs91_reg_type[1].hi_offset);
+	gaudi_nic->uncorrectable_errors_cnt +=
+					(hi_part << 16) | lo_part;
+	out_data[1] = gaudi_nic->uncorrectable_errors_cnt;
+}
+
+static void gaudi_nic_read_sw_counters(struct gaudi_nic_device *gaudi_nic,
+					u64 *out_data)
+{
+	out_data[0] = gaudi_nic->pcs_local_fault_cnt;
+	out_data[1] = gaudi_nic->pcs_remote_fault_cnt;
+}
+
+static void gaudi_nic_get_internal_stats(struct net_device *netdev, u64 *data)
+{
+	struct gaudi_nic_device **ptr = netdev_priv(netdev);
+	struct gaudi_nic_device *gaudi_nic = *ptr;
+	struct hl_device *hdev = gaudi_nic->hdev;
+	u32 port = gaudi_nic->port;
+	u32 num_spmus = (port & 1) ? NIC_SPMU1_STATS_LEN : NIC_SPMU0_STATS_LEN;
+	int i;
+
+	gaudi_sample_spmu_nic(hdev, port, num_spmus, data);
+	data += num_spmus;
+
+	/* first entry is title */
+	data[0] = 0;
+	for (i = 1 ; i < NIC_MAC_STATS_RX_LEN ; i++)
+		data[i] = gaudi_nic_read_mac_counter(hdev, port,
+				gaudi_nic_mac_stats_rx[i].stat_offset, true);
+	data += i;
+
+	gaudi_nic_read_xpcs91_regs(gaudi_nic, data);
+	data += NIC_XPCS91_REGS_CNT_LEN;
+
+	gaudi_nic_read_sw_counters(gaudi_nic, data);
+	data += NIC_SW_CNT_LEN;
+
+	/* first entry is title */
+	data[0] = 0;
+	for (i = 1 ; i < NIC_MAC_STATS_TX_LEN ; i++)
+		data[i] = gaudi_nic_read_mac_counter(hdev, port,
+				gaudi_nic_mac_stats_tx[i].stat_offset, false);
+}
+
+static void gaudi_nic_get_ethtool_stats(struct net_device *netdev,
+					struct ethtool_stats *stats, u64 *data)
+{
+	struct gaudi_nic_device **ptr = netdev_priv(netdev);
+	struct gaudi_nic_device *gaudi_nic = *ptr;
+	struct hl_device *hdev = gaudi_nic->hdev;
+	char *p;
+	u32 port = gaudi_nic->port;
+	int i;
+
+	if (disabled_or_in_reset(gaudi_nic)) {
+		dev_info_ratelimited(hdev->dev,
+			"port %d is in reset, can't get ethtool stats\n", port);
+		return;
+	}
+
+	for (i = 0; i < NIC_STATS_LEN ; i++) {
+		p = (char *) netdev + gaudi_nic_ethtool_stats[i].stat_offset;
+		data[i] = *(unsigned long *) p;
+	}
+
+	gaudi_nic_get_internal_stats(netdev, data + i);
+}
+
+static int gaudi_nic_get_module_info(struct net_device *netdev,
+					struct ethtool_modinfo *modinfo)
+{
+	modinfo->type = ETH_MODULE_SFF_8636;
+	modinfo->eeprom_len = ETH_MODULE_SFF_8636_LEN;
+
+	return 0;
+}
+
+static int gaudi_nic_get_module_eeprom(struct net_device *netdev,
+					struct ethtool_eeprom *ee, u8 *data)
+{
+	struct gaudi_nic_device **ptr = netdev_priv(netdev);
+	struct gaudi_nic_device *gaudi_nic = *ptr;
+	struct hl_device *hdev = gaudi_nic->hdev;
+
+	if (!ee->len)
+		return -EINVAL;
+
+	memset(data, 0, ee->len);
+	memcpy(data, hdev->asic_prop.cpucp_nic_info.qsfp_eeprom, ee->len);
+
+	return 0;
+}
+
+/* enable spmus for ethtool monitoring */
+void gaudi_nic_spmu_init(struct hl_device *hdev, int port)
+{
+	struct gaudi_nic_spmu_event_type *event_types;
+	u32 spmu_events[NIC_SPMU_STATS_LEN_MAX], num_event_types;
+	int rc, i;
+
+	if (port & 1) {
+		num_event_types = NIC_SPMU1_STATS_LEN;
+		event_types = gaudi_nic1_spmu_event_type;
+	} else {
+		num_event_types = NIC_SPMU0_STATS_LEN;
+		event_types = gaudi_nic0_spmu_event_type;
+	}
+
+	if (num_event_types > NIC_SPMU_STATS_LEN_MAX)
+		num_event_types = NIC_SPMU_STATS_LEN_MAX;
+
+	for (i = 0 ; i < num_event_types ; i++)
+		spmu_events[i] = event_types[i].index;
+
+	rc = gaudi_config_spmu_nic(hdev, port, num_event_types,
+			spmu_events);
+	if (rc)
+		dev_err(hdev->dev,
+			"Failed to configure spmu for NIC port %d\n",
+			port);
+}
+
+u64 gaudi_nic_read_mac_stat_counter(struct hl_device *hdev, u32 port, int idx,
+					bool is_rx)
+{
+	struct gaudi_nic_ethtool_stats *stat = is_rx ?
+						&gaudi_nic_mac_stats_rx[idx] :
+						&gaudi_nic_mac_stats_tx[idx];
+
+	return gaudi_nic_read_mac_counter(hdev, port, stat->stat_offset, is_rx);
+}
+
+const struct ethtool_ops gaudi_nic_ethtool_ops = {
+	.get_drvinfo = gaudi_nic_get_drvinfo,
+	.get_link_ksettings = gaudi_nic_get_link_ksettings,
+	.set_link_ksettings = gaudi_nic_set_link_ksettings,
+	.get_link = gaudi_nic_get_link,
+	.get_strings = gaudi_nic_get_strings,
+	.get_sset_count = gaudi_nic_get_sset_count,
+	.get_ethtool_stats = gaudi_nic_get_ethtool_stats,
+	.get_module_info   = gaudi_nic_get_module_info,
+	.get_module_eeprom = gaudi_nic_get_module_eeprom,
+};
+
-- 
2.17.1



* [PATCH 14/15] habanalabs/gaudi: support DCB protocol
  2020-09-10 16:11 [PATCH 00/15] Adding GAUDI NIC code to habanalabs driver Oded Gabbay
                   ` (11 preceding siblings ...)
  2020-09-10 16:11 ` [PATCH 13/15] habanalabs/gaudi: Add ethtool support using coresight Oded Gabbay
@ 2020-09-10 16:11 ` Oded Gabbay
  2020-09-10 16:11 ` [PATCH 15/15] habanalabs/gaudi: add NIC init/fini calls from common code Oded Gabbay
  2020-09-10 20:01 ` [PATCH 00/15] Adding GAUDI NIC code to habanalabs driver Jakub Kicinski
  14 siblings, 0 replies; 44+ messages in thread
From: Oded Gabbay @ 2020-09-10 16:11 UTC (permalink / raw)
  To: linux-kernel, netdev; +Cc: SW_Drivers, gregkh, davem, kuba, Omer Shpigelman

From: Omer Shpigelman <oshpigelman@habana.ai>

Add DCB support to configure the NIC's Priority Flow Control (PFC).
The added support is minimal because full support is not
currently required.

A summary of the supported callbacks:

- ieee_getpfc: get the current PFC configuration. PFC is enabled by
               default.
- ieee_setpfc: set PFC configuration. Either none or all 4 priorities can
               be enabled; no subset is allowed.
- getdcbx: get DCBX capability.
- setdcbx: set DCBX capability. Only host LLDP agent and IEEE protocol
           flavors are supported.
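
The all-or-nothing rule of ieee_setpfc can be sketched in plain user-space C
as below. This is only an illustration under stated assumptions:
pfc_apply_mask() is a hypothetical helper and not a driver function, and the
mask macro is open-coded instead of using the kernel's GENMASK().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define PFC_PRIO_NUM		4
/* Open-coded equivalent of GENMASK(PFC_PRIO_NUM - 1, 0), i.e. 0xF */
#define PFC_PRIO_MASK_ALL	((1u << PFC_PRIO_NUM) - 1u)
#define PFC_PRIO_MASK_NONE	0u

/*
 * Hypothetical helper: validate a requested PFC priority mask and, if it
 * is acceptable, record whether PFC ends up enabled.
 * Returns 0 on success or -EINVAL for an unsupported mask.
 */
static int pfc_apply_mask(uint8_t pfc_en, bool *enabled)
{
	assert(enabled);

	/* Reject priorities beyond the supported ones */
	if (pfc_en & ~PFC_PRIO_MASK_ALL)
		return -EINVAL;

	/* Reject partial subsets: only "none" or "all" is allowed */
	if (pfc_en != PFC_PRIO_MASK_NONE && pfc_en != PFC_PRIO_MASK_ALL)
		return -EINVAL;

	*enabled = (pfc_en == PFC_PRIO_MASK_ALL);
	return 0;
}
```

In this sketch a mask of 0xF or 0x0 is accepted, while 0x3 (a subset) and
0x10 (a priority outside the supported 4) are both rejected with -EINVAL.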

Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
---
 drivers/misc/habanalabs/gaudi/Makefile        |   2 +-
 drivers/misc/habanalabs/gaudi/gaudi_nic.c     |   3 +
 .../misc/habanalabs/gaudi/gaudi_nic_dcbnl.c   | 108 ++++++++++++++++++
 3 files changed, 112 insertions(+), 1 deletion(-)
 create mode 100644 drivers/misc/habanalabs/gaudi/gaudi_nic_dcbnl.c

diff --git a/drivers/misc/habanalabs/gaudi/Makefile b/drivers/misc/habanalabs/gaudi/Makefile
index e3002dc34a74..9757c8ba1cb0 100644
--- a/drivers/misc/habanalabs/gaudi/Makefile
+++ b/drivers/misc/habanalabs/gaudi/Makefile
@@ -3,5 +3,5 @@ HL_GAUDI_FILES := gaudi/gaudi.o gaudi/gaudi_hwmgr.o gaudi/gaudi_security.o \
 	gaudi/gaudi_coresight.o
 
 HL_GAUDI_FILES += gaudi/gaudi_nic.o gaudi/gaudi_phy.o \
-	gaudi/gaudi_nic_ethtool.o \
+	gaudi/gaudi_nic_ethtool.o gaudi/gaudi_nic_dcbnl.o \
 	gaudi/gaudi_nic_debugfs.o
diff --git a/drivers/misc/habanalabs/gaudi/gaudi_nic.c b/drivers/misc/habanalabs/gaudi/gaudi_nic.c
index 108db990efa8..1ea410cdafdf 100644
--- a/drivers/misc/habanalabs/gaudi/gaudi_nic.c
+++ b/drivers/misc/habanalabs/gaudi/gaudi_nic.c
@@ -2745,6 +2745,9 @@ static int port_register(struct hl_device *hdev, int port)
 
 	ndev->netdev_ops = &gaudi_nic_netdev_ops;
 	ndev->ethtool_ops = &gaudi_nic_ethtool_ops;
+#ifdef CONFIG_DCB
+	ndev->dcbnl_ops = &gaudi_nic_dcbnl_ops;
+#endif
 	ndev->watchdog_timeo = NIC_TX_TIMEOUT;
 	ndev->min_mtu = ETH_MIN_MTU;
 	ndev->max_mtu = NIC_MAX_MTU;
diff --git a/drivers/misc/habanalabs/gaudi/gaudi_nic_dcbnl.c b/drivers/misc/habanalabs/gaudi/gaudi_nic_dcbnl.c
new file mode 100644
index 000000000000..b8f37fd0d3cf
--- /dev/null
+++ b/drivers/misc/habanalabs/gaudi/gaudi_nic_dcbnl.c
@@ -0,0 +1,108 @@
+// SPDX-License-Identifier: GPL-2.0
+
+/*
+ * Copyright 2018-2020 HabanaLabs, Ltd.
+ * All Rights Reserved.
+ */
+
+#include "gaudi_nic.h"
+
+#define PFC_PRIO_NUM		4
+#define PFC_PRIO_MASK_ALL	GENMASK(PFC_PRIO_NUM - 1, 0)
+#define PFC_PRIO_MASK_NONE	0
+#define PFC_STAT_TX_OFFSET	17
+#define PFC_STAT_RX_OFFSET	27
+
+#ifdef CONFIG_DCB
+static int gaudi_nic_dcbnl_ieee_getpfc(struct net_device *netdev,
+					struct ieee_pfc *pfc)
+{
+	struct gaudi_nic_device **ptr = netdev_priv(netdev);
+	struct gaudi_nic_device *gaudi_nic = *ptr;
+	struct hl_device *hdev = gaudi_nic->hdev;
+	u32 port = gaudi_nic->port;
+	int rc = 0, i, tx_idx, rx_idx;
+
+	if (disabled_or_in_reset(gaudi_nic)) {
+		dev_info_ratelimited(hdev->dev,
+				"port %d is in reset, can't get PFC\n", port);
+		return -EBUSY;
+	}
+
+	pfc->pfc_en = gaudi_nic->pfc_enable ? PFC_PRIO_MASK_ALL :
+							PFC_PRIO_MASK_NONE;
+	pfc->pfc_cap = PFC_PRIO_NUM;
+
+	for (i = 0 ; i < PFC_PRIO_NUM ; i++) {
+		tx_idx = PFC_STAT_TX_OFFSET + i;
+		rx_idx = PFC_STAT_RX_OFFSET + i;
+
+		pfc->requests[i] = gaudi_nic_read_mac_stat_counter(hdev, port,
+								tx_idx, false);
+		pfc->indications[i] = gaudi_nic_read_mac_stat_counter(hdev,
+							port, rx_idx, true);
+	}
+
+	return rc;
+}
+
+static int gaudi_nic_dcbnl_ieee_setpfc(struct net_device *netdev,
+					struct ieee_pfc *pfc)
+{
+	struct gaudi_nic_device **ptr = netdev_priv(netdev);
+	struct gaudi_nic_device *gaudi_nic = *ptr;
+	struct hl_device *hdev = gaudi_nic->hdev;
+	u32 port = gaudi_nic->port;
+	u8 curr_pfc_en;
+
+	if (pfc->pfc_en & ~PFC_PRIO_MASK_ALL) {
+		dev_info_ratelimited(hdev->dev,
+					"PFC supports %d priorities only, port %d\n",
+					PFC_PRIO_NUM, port);
+		return -EINVAL;
+	}
+
+	if ((pfc->pfc_en != PFC_PRIO_MASK_NONE) &&
+			(pfc->pfc_en != PFC_PRIO_MASK_ALL)) {
+		dev_info_ratelimited(hdev->dev,
+					"PFC should be enabled/disabled on all priorities, port %d\n",
+					port);
+		return -EINVAL;
+	}
+
+	if (disabled_or_in_reset(gaudi_nic)) {
+		dev_info_ratelimited(hdev->dev,
+				"port %d is in reset, can't set PFC\n", port);
+		return -EBUSY;
+	}
+
+	curr_pfc_en = gaudi_nic->pfc_enable ? PFC_PRIO_MASK_ALL :
+							PFC_PRIO_MASK_NONE;
+
+	if (pfc->pfc_en == curr_pfc_en)
+		return 0;
+
+	gaudi_nic->pfc_enable = !gaudi_nic->pfc_enable;
+
+	gaudi_nic_set_pfc(gaudi_nic);
+
+	return 0;
+}
+
+static u8 gaudi_nic_dcbnl_getdcbx(struct net_device *netdev)
+{
+	return DCB_CAP_DCBX_HOST | DCB_CAP_DCBX_VER_IEEE;
+}
+
+static u8 gaudi_nic_dcbnl_setdcbx(struct net_device *netdev, u8 mode)
+{
+	return !(mode == (DCB_CAP_DCBX_HOST | DCB_CAP_DCBX_VER_IEEE));
+}
+
+const struct dcbnl_rtnl_ops gaudi_nic_dcbnl_ops = {
+	.ieee_getpfc	= gaudi_nic_dcbnl_ieee_getpfc,
+	.ieee_setpfc	= gaudi_nic_dcbnl_ieee_setpfc,
+	.getdcbx	= gaudi_nic_dcbnl_getdcbx,
+	.setdcbx	= gaudi_nic_dcbnl_setdcbx
+};
+#endif
-- 
2.17.1



* [PATCH 15/15] habanalabs/gaudi: add NIC init/fini calls from common code
  2020-09-10 16:11 [PATCH 00/15] Adding GAUDI NIC code to habanalabs driver Oded Gabbay
                   ` (12 preceding siblings ...)
  2020-09-10 16:11 ` [PATCH 14/15] habanalabs/gaudi: support DCB protocol Oded Gabbay
@ 2020-09-10 16:11 ` Oded Gabbay
  2020-09-10 20:01 ` [PATCH 00/15] Adding GAUDI NIC code to habanalabs driver Jakub Kicinski
  14 siblings, 0 replies; 44+ messages in thread
From: Oded Gabbay @ 2020-09-10 16:11 UTC (permalink / raw)
  To: linux-kernel, netdev; +Cc: SW_Drivers, gregkh, davem, kuba, Omer Shpigelman

From: Omer Shpigelman <oshpigelman@habana.ai>

Finally, enable the NIC engines. Initialize the NIC ports mask variable
with a full mask (0x3FF) so that all 10 ports are initialized.

Call the NIC init/fini from the common code.
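
As a rough illustration of what the full mask means: each bit selects one
NIC port, so 0x3FF selects ports 0 to 9. The user-space sketch below is an
example under stated assumptions; count_enabled_ports() and NUM_PORTS are
hypothetical names that do not exist in the driver.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical constant: ten NIC ports per GAUDI, as in the cover letter */
#define NUM_PORTS	10u

/*
 * Hypothetical helper: count how many of the first num_ports ports are
 * selected by ports_mask (bit N set means port N is selected).
 */
static unsigned int count_enabled_ports(uint32_t ports_mask,
					unsigned int num_ports)
{
	unsigned int port, enabled = 0;

	/* The mask is 32 bits wide, so it cannot describe more ports */
	assert(num_ports <= 32u);

	for (port = 0; port < num_ports; port++)
		if (ports_mask & (UINT32_C(1) << port))
			enabled++;

	return enabled;
}
```

With NUM_PORTS ports, a mask of 0x3FF selects all ten, and any bits above
bit 9 are ignored by the loop.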

Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
---
 drivers/misc/habanalabs/common/device.c       | 18 +++++++++++++++
 drivers/misc/habanalabs/common/habanalabs.h   |  6 +++++
 .../misc/habanalabs/common/habanalabs_drv.c   |  1 +
 drivers/misc/habanalabs/common/pci.c          |  1 +
 drivers/misc/habanalabs/gaudi/gaudi.c         | 23 +++++++++++++++++++
 drivers/misc/habanalabs/goya/goya.c           | 12 ++++++++++
 6 files changed, 61 insertions(+)

diff --git a/drivers/misc/habanalabs/common/device.c b/drivers/misc/habanalabs/common/device.c
index 57f5b945fa41..8f8744e4f2c7 100644
--- a/drivers/misc/habanalabs/common/device.c
+++ b/drivers/misc/habanalabs/common/device.c
@@ -1077,6 +1077,12 @@ int hl_device_reset(struct hl_device *hdev, bool hard_reset,
 			goto out_err;
 		}
 
+		rc = hdev->asic_funcs->nic_init(hdev);
+		if (rc) {
+			dev_err(hdev->dev, "Failed to init NIC driver\n");
+			goto out_err;
+		}
+
 		hl_set_max_power(hdev);
 	} else {
 		rc = hdev->asic_funcs->soft_reset_late_init(hdev);
@@ -1312,6 +1318,13 @@ int hl_device_init(struct hl_device *hdev, struct class *hclass)
 		goto out_disabled;
 	}
 
+	rc = hdev->asic_funcs->nic_init(hdev);
+	if (rc) {
+		dev_err(hdev->dev, "Failed to init NIC driver\n");
+		rc = 0;
+		goto out_disabled;
+	}
+
 	/*
 	 * Expose devices and sysfs nodes to user.
 	 * From here there is no need to add char devices and create sysfs nodes
@@ -1463,6 +1476,11 @@ void hl_device_fini(struct hl_device *hdev)
 
 	hl_cb_pool_fini(hdev);
 
+	/* the NIC uses the kernel context for MMU mappings, therefore it must
+	 * be cleaned up before the kernel context is released
+	 */
+	hdev->asic_funcs->nic_fini(hdev);
+
 	/* Release kernel context */
 	if ((hdev->kernel_ctx) && (hl_ctx_put(hdev->kernel_ctx) != 1))
 		dev_err(hdev->dev, "kernel ctx is still alive\n");
diff --git a/drivers/misc/habanalabs/common/habanalabs.h b/drivers/misc/habanalabs/common/habanalabs.h
index 1f3735a64d88..1c1850114aa4 100644
--- a/drivers/misc/habanalabs/common/habanalabs.h
+++ b/drivers/misc/habanalabs/common/habanalabs.h
@@ -680,6 +680,10 @@ struct hl_info_mac_addr;
  *                    then the timeout is the default timeout for the specific
  *                    ASIC
  * @get_hw_state: retrieve the H/W state
+ * @nic_init: init the NIC H/W and I/F. This should be called in the final stage
+ *            of the init flow, because no step that might fail may run after
+ *            the NIC init.
+ * @nic_fini: perform NIC cleanup.
  * @nic_control: Perform NIC related operations.
  * @nic_cq_mmap: map the NIC CQ buffer.
  * @pci_bars_map: Map PCI BARs.
@@ -786,6 +790,8 @@ struct hl_asic_funcs {
 	int (*send_cpu_message)(struct hl_device *hdev, u32 *msg,
 				u16 len, u32 timeout, long *result);
 	enum hl_device_hw_state (*get_hw_state)(struct hl_device *hdev);
+	int (*nic_init)(struct hl_device *hdev);
+	void (*nic_fini)(struct hl_device *hdev);
 	int (*nic_control)(struct hl_device *hdev, u32 op, void *input,
 				void *output);
 	int (*nic_cq_mmap)(struct hl_device *hdev, struct vm_area_struct *vma);
diff --git a/drivers/misc/habanalabs/common/habanalabs_drv.c b/drivers/misc/habanalabs/common/habanalabs_drv.c
index df92afc1b9d5..fcb28975fac5 100644
--- a/drivers/misc/habanalabs/common/habanalabs_drv.c
+++ b/drivers/misc/habanalabs/common/habanalabs_drv.c
@@ -247,6 +247,7 @@ static void set_driver_behavior_per_device(struct hl_device *hdev)
 	hdev->bmc_enable = 1;
 	hdev->hard_reset_on_fw_events = 1;
 	hdev->card_type = cpucp_card_type_pci;
+	hdev->nic_ports_mask = 0x3FF;
 	hdev->nic_ports_ext_mask = 0x3FF;
 	hdev->nic_auto_neg_mask = 0x3FF;
 	hdev->nic_load_fw = 0;
diff --git a/drivers/misc/habanalabs/common/pci.c b/drivers/misc/habanalabs/common/pci.c
index 923b2606e29f..c376ab4695ab 100644
--- a/drivers/misc/habanalabs/common/pci.c
+++ b/drivers/misc/habanalabs/common/pci.c
@@ -230,6 +230,7 @@ int hl_pci_set_inbound_region(struct hl_device *hdev, u8 region,
 			lower_32_bits(pci_region->addr));
 	rc |= hl_pci_iatu_write(hdev, offset + 0x18,
 			upper_32_bits(pci_region->addr));
+	/* Set bar type as memory */
 	rc |= hl_pci_iatu_write(hdev, offset + 0x0, 0);
 
 	/* Enable + bar/address match + match enable + bar number */
diff --git a/drivers/misc/habanalabs/gaudi/gaudi.c b/drivers/misc/habanalabs/gaudi/gaudi.c
index eb733a48eb72..758a26b43cf2 100644
--- a/drivers/misc/habanalabs/gaudi/gaudi.c
+++ b/drivers/misc/habanalabs/gaudi/gaudi.c
@@ -882,6 +882,27 @@ static void gaudi_late_fini(struct hl_device *hdev)
 	hdev->hl_chip_info->info = NULL;
 }
 
+static int gaudi_nic_init(struct hl_device *hdev)
+{
+	/*
+	 * In init flow we initialize the NIC ports from scratch. In hard reset
+	 * flow, we get here after the NIC ports were halted, hence we only
+	 * need to reopen them.
+	 */
+	if (atomic_read(&hdev->in_reset)) {
+		gaudi_nic_ports_reopen(hdev);
+		return 0;
+	}
+
+	return gaudi_nic_ports_init(hdev);
+}
+
+static void gaudi_nic_fini(struct hl_device *hdev)
+{
+	/* must be called after MSI was disabled */
+	gaudi_nic_ports_fini(hdev);
+}
+
 static void gaudi_nic_handle_rx(struct gaudi_nic_device *gaudi_nic)
 {
 	/* at this point, interrupts were disabled by the H/W */
@@ -7482,6 +7503,8 @@ static const struct hl_asic_funcs gaudi_funcs = {
 	.get_eeprom_data = gaudi_get_eeprom_data,
 	.send_cpu_message = gaudi_send_cpu_message,
 	.get_hw_state = gaudi_get_hw_state,
+	.nic_init = gaudi_nic_init,
+	.nic_fini = gaudi_nic_fini,
 	.nic_control = gaudi_nic_control,
 	.nic_cq_mmap = gaudi_nic_cq_mmap,
 	.pci_bars_map = gaudi_pci_bars_map,
diff --git a/drivers/misc/habanalabs/goya/goya.c b/drivers/misc/habanalabs/goya/goya.c
index 6e98c830f6a2..e753b3a0079f 100644
--- a/drivers/misc/habanalabs/goya/goya.c
+++ b/drivers/misc/habanalabs/goya/goya.c
@@ -5265,6 +5265,16 @@ static enum hl_device_hw_state goya_get_hw_state(struct hl_device *hdev)
 	return RREG32(mmHW_STATE);
 }
 
+static int goya_nic_init(struct hl_device *hdev)
+{
+	return 0;
+}
+
+static void goya_nic_fini(struct hl_device *hdev)
+{
+	/* Stub: the Goya code in this patch has no NIC state to clean up */
+}
+
 static int goya_nic_control(struct hl_device *hdev, u32 op, void *input,
 			void *output)
 {
@@ -5405,6 +5415,8 @@ static const struct hl_asic_funcs goya_funcs = {
 	.get_eeprom_data = goya_get_eeprom_data,
 	.send_cpu_message = goya_send_cpu_message,
 	.get_hw_state = goya_get_hw_state,
+	.nic_init = goya_nic_init,
+	.nic_fini = goya_nic_fini,
 	.nic_control = goya_nic_control,
 	.nic_cq_mmap = goya_nic_mmap,
 	.pci_bars_map = goya_pci_bars_map,
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* Re: [PATCH 00/15] Adding GAUDI NIC code to habanalabs driver
  2020-09-10 16:11 [PATCH 00/15] Adding GAUDI NIC code to habanalabs driver Oded Gabbay
                   ` (13 preceding siblings ...)
  2020-09-10 16:11 ` [PATCH 15/15] habanalabs/gaudi: add NIC init/fini calls from common code Oded Gabbay
@ 2020-09-10 20:01 ` Jakub Kicinski
  2020-09-10 20:16   ` Oded Gabbay
  14 siblings, 1 reply; 44+ messages in thread
From: Jakub Kicinski @ 2020-09-10 20:01 UTC (permalink / raw)
  To: Oded Gabbay; +Cc: linux-kernel, netdev, SW_Drivers, gregkh, davem

On Thu, 10 Sep 2020 19:11:11 +0300 Oded Gabbay wrote:
>  create mode 100644 drivers/misc/habanalabs/gaudi/gaudi_nic.c
>  create mode 100644 drivers/misc/habanalabs/gaudi/gaudi_nic.h
>  create mode 100644 drivers/misc/habanalabs/gaudi/gaudi_nic_dcbnl.c
>  create mode 100644 drivers/misc/habanalabs/gaudi/gaudi_nic_debugfs.c
>  create mode 100644 drivers/misc/habanalabs/gaudi/gaudi_nic_ethtool.c
>  create mode 100644 drivers/misc/habanalabs/gaudi/gaudi_phy.c
>  create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic0_qm0_masks.h
>  create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic0_qm0_regs.h
>  create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic0_qm1_regs.h
>  create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic0_qpc0_masks.h
>  create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic0_qpc0_regs.h
>  create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic0_qpc1_regs.h
>  create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic0_rxb_regs.h
>  create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic0_rxe0_masks.h
>  create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic0_rxe0_regs.h
>  create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic0_rxe1_regs.h
>  create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic0_stat_regs.h
>  create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic0_tmr_regs.h
>  create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic0_txe0_masks.h
>  create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic0_txe0_regs.h
>  create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic0_txe1_regs.h
>  create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic0_txs0_masks.h
>  create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic0_txs0_regs.h
>  create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic0_txs1_regs.h
>  create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic1_qm0_regs.h
>  create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic1_qm1_regs.h
>  create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic2_qm0_regs.h
>  create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic2_qm1_regs.h
>  create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic3_qm0_regs.h
>  create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic3_qm1_regs.h
>  create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic4_qm0_regs.h
>  create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic4_qm1_regs.h
>  create mode 100644 drivers/misc/habanalabs/include/hw_ip/nic/nic_general.h

The relevant code needs to live under drivers/net/(ethernet/).
For one thing our automation won't trigger for drivers in random
(/misc) part of the tree.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 12/15] habanalabs/gaudi: add debugfs entries for the NIC
  2020-09-10 16:11 ` [PATCH 12/15] habanalabs/gaudi: add debugfs entries for the NIC Oded Gabbay
@ 2020-09-10 20:01   ` Jakub Kicinski
  2020-09-10 20:10     ` Oded Gabbay
  0 siblings, 1 reply; 44+ messages in thread
From: Jakub Kicinski @ 2020-09-10 20:01 UTC (permalink / raw)
  To: Oded Gabbay
  Cc: linux-kernel, netdev, SW_Drivers, gregkh, davem, Omer Shpigelman

On Thu, 10 Sep 2020 19:11:23 +0300 Oded Gabbay wrote:
> From: Omer Shpigelman <oshpigelman@habana.ai>
> 
> Add several debugfs entries to help us debug the NIC engines and ports and
> also the communication layer of the DL training application that use them.
> 
> There are eight new entries. Detailed description is in the documentation
> file but here is a summary:
> 
> - nic_mac_loopback: enable mac loopback mode per port
> - nic_ports_status: print physical connection status per port
> - nic_pcs_fail_time_frame: configure windows size for measuring pcs
>                            failures
> - nic_pcs_fail_threshold: configure pcs failures threshold for
>                           reconfiguring the link
> - nic_pam4_tx_taps: configure PAM4 TX taps
> - nic_polarity: configure polarity for NIC port lanes
> - nic_check_link: configure whether to check the PCS link periodically
> - nic_phy_auto_neg_lpbk: enable PHY auto-negotiation loopback
> 
> Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai>
> Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
> Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>

debugfs configuration interfaces are not acceptable in netdev.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 05/15] habanalabs/gaudi: add NIC Ethernet support
  2020-09-10 16:11 ` [PATCH 05/15] habanalabs/gaudi: add NIC Ethernet support Oded Gabbay
@ 2020-09-10 20:03   ` Jakub Kicinski
  2020-09-10 20:18     ` Oded Gabbay
  2020-09-14  9:52     ` Omer Shpigelman
  0 siblings, 2 replies; 44+ messages in thread
From: Jakub Kicinski @ 2020-09-10 20:03 UTC (permalink / raw)
  To: Oded Gabbay
  Cc: linux-kernel, netdev, SW_Drivers, gregkh, davem, Omer Shpigelman

On Thu, 10 Sep 2020 19:11:16 +0300 Oded Gabbay wrote:
> +module_param(nic_rx_poll, int, 0444);
> +MODULE_PARM_DESC(nic_rx_poll,
> +	"Enable NIC Rx polling mode (0 = no, 1 = yes, default no)");

If your chip does not support IRQ coalescing you can configure polling
and the timeout via ethtool -C, rather than a module parameter.
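
A rough sketch of what that could look like, written as a user-space mock so
it can be compiled on its own. The struct and function names below
(mock_coalesce, mock_port, mock_set_coalesce, mock_get_coalesce) are
hypothetical stand-ins and not part of the patch or the kernel API, and
treating rx-usecs as the polling interval is an assumption about how the
driver might map the setting:

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the Rx fields of the kernel's struct ethtool_coalesce. */
struct mock_coalesce {
	uint32_t rx_coalesce_usecs;
	uint32_t rx_max_coalesced_frames;
};

/* Hypothetical per-port state; not taken from the patch. */
struct mock_port {
	bool rx_poll;           /* true: poll the Rx ring, false: use interrupts */
	uint32_t rx_poll_usecs; /* polling interval when rx_poll is set */
};

#define MOCK_MAX_POLL_USECS 1000000u

/* Assumed mapping: rx-usecs == 0 selects interrupt mode, anything else
 * selects polling with that interval.
 */
static int mock_set_coalesce(struct mock_port *port,
			     const struct mock_coalesce *ec)
{
	/* Frame-count based coalescing is not modelled in this sketch */
	if (ec->rx_max_coalesced_frames)
		return -EOPNOTSUPP;

	if (ec->rx_coalesce_usecs > MOCK_MAX_POLL_USECS)
		return -EINVAL;

	port->rx_poll = ec->rx_coalesce_usecs != 0;
	port->rx_poll_usecs = ec->rx_coalesce_usecs;

	return 0;
}

/* Report the current mode back in the same terms */
static void mock_get_coalesce(const struct mock_port *port,
			      struct mock_coalesce *ec)
{
	ec->rx_coalesce_usecs = port->rx_poll ? port->rx_poll_usecs : 0;
	ec->rx_max_coalesced_frames = 0;
}
```

With handlers along these lines wired into the driver's ethtool ops,
something like "ethtool -C <iface> rx-usecs 50" would switch a port to
polling and "rx-usecs 0" would switch it back, per port and at runtime,
which a read-only module parameter cannot do.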

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 12/15] habanalabs/gaudi: add debugfs entries for the NIC
  2020-09-10 20:01   ` Jakub Kicinski
@ 2020-09-10 20:10     ` Oded Gabbay
  2020-09-10 20:16       ` Jakub Kicinski
  0 siblings, 1 reply; 44+ messages in thread
From: Oded Gabbay @ 2020-09-10 20:10 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Linux-Kernel@Vger. Kernel. Org, netdev, SW_Drivers,
	Greg Kroah-Hartman, David S. Miller, Omer Shpigelman

On Thu, Sep 10, 2020 at 11:01 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Thu, 10 Sep 2020 19:11:23 +0300 Oded Gabbay wrote:
> > From: Omer Shpigelman <oshpigelman@habana.ai>
> >
> > Add several debugfs entries to help us debug the NIC engines and ports and
> > also the communication layer of the DL training application that use them.
> >
> > There are eight new entries. Detailed description is in the documentation
> > file but here is a summary:
> >
> > - nic_mac_loopback: enable mac loopback mode per port
> > - nic_ports_status: print physical connection status per port
> > - nic_pcs_fail_time_frame: configure windows size for measuring pcs
> >                            failures
> > - nic_pcs_fail_threshold: configure pcs failures threshold for
> >                           reconfiguring the link
> > - nic_pam4_tx_taps: configure PAM4 TX taps
> > - nic_polarity: configure polarity for NIC port lanes
> > - nic_check_link: configure whether to check the PCS link periodically
> > - nic_phy_auto_neg_lpbk: enable PHY auto-negotiation loopback
> >
> > Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai>
> > Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
> > Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
>
> debugfs configuration interfaces are not acceptable in netdev.

no problem, but can we have only these two entries:
> - nic_mac_loopback: enable mac loopback mode per port
> - nic_ports_status: print physical connection status per port

nic_ports_status is print only (not configuration). nic_mac_loopback
is to set a port to loopback mode and out of it. It's not really
configuration but rather a mode change.
If not, what's the alternative?

Thanks,
Oded

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 00/15] Adding GAUDI NIC code to habanalabs driver
  2020-09-10 20:01 ` [PATCH 00/15] Adding GAUDI NIC code to habanalabs driver Jakub Kicinski
@ 2020-09-10 20:16   ` Oded Gabbay
  2020-09-10 20:25     ` Andrew Lunn
  2020-09-10 20:28     ` Jakub Kicinski
  0 siblings, 2 replies; 44+ messages in thread
From: Oded Gabbay @ 2020-09-10 20:16 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Linux-Kernel@Vger. Kernel. Org, netdev, SW_Drivers,
	Greg Kroah-Hartman, David S. Miller

On Thu, Sep 10, 2020 at 11:01 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Thu, 10 Sep 2020 19:11:11 +0300 Oded Gabbay wrote:
> >  create mode 100644 drivers/misc/habanalabs/gaudi/gaudi_nic.c
> >  create mode 100644 drivers/misc/habanalabs/gaudi/gaudi_nic.h
> >  create mode 100644 drivers/misc/habanalabs/gaudi/gaudi_nic_dcbnl.c
> >  create mode 100644 drivers/misc/habanalabs/gaudi/gaudi_nic_debugfs.c
> >  create mode 100644 drivers/misc/habanalabs/gaudi/gaudi_nic_ethtool.c
> >  create mode 100644 drivers/misc/habanalabs/gaudi/gaudi_phy.c
> >  create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic0_qm0_masks.h
> >  create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic0_qm0_regs.h
> >  create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic0_qm1_regs.h
> >  create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic0_qpc0_masks.h
> >  create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic0_qpc0_regs.h
> >  create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic0_qpc1_regs.h
> >  create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic0_rxb_regs.h
> >  create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic0_rxe0_masks.h
> >  create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic0_rxe0_regs.h
> >  create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic0_rxe1_regs.h
> >  create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic0_stat_regs.h
> >  create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic0_tmr_regs.h
> >  create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic0_txe0_masks.h
> >  create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic0_txe0_regs.h
> >  create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic0_txe1_regs.h
> >  create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic0_txs0_masks.h
> >  create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic0_txs0_regs.h
> >  create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic0_txs1_regs.h
> >  create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic1_qm0_regs.h
> >  create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic1_qm1_regs.h
> >  create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic2_qm0_regs.h
> >  create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic2_qm1_regs.h
> >  create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic3_qm0_regs.h
> >  create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic3_qm1_regs.h
> >  create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic4_qm0_regs.h
> >  create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic4_qm1_regs.h
> >  create mode 100644 drivers/misc/habanalabs/include/hw_ip/nic/nic_general.h
>
> The relevant code needs to live under drivers/net/(ethernet/).
> For one thing our automation won't trigger for drivers in random
> (/misc) part of the tree.

Can you please elaborate on how to do this with a single driver that
is already in misc?
As I mentioned in the cover letter, we are not developing a
stand-alone NIC. We have a deep-learning accelerator with a NIC
interface.
Therefore, we don't have a separate PCI physical function for the NIC
and I can't have a second driver registering to it.

We based this design on existing drivers in the kernel that register
with netdev from outside drivers/net, for example:
1. sgi-xp driver in drivers/misc/sgi-xp (see file xpnet.c)
2. bnx2fc in drivers/scsi/bnx2fc

Thanks,
Oded

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 12/15] habanalabs/gaudi: add debugfs entries for the NIC
  2020-09-10 20:10     ` Oded Gabbay
@ 2020-09-10 20:16       ` Jakub Kicinski
  2020-09-10 20:17         ` Oded Gabbay
  0 siblings, 1 reply; 44+ messages in thread
From: Jakub Kicinski @ 2020-09-10 20:16 UTC (permalink / raw)
  To: Oded Gabbay
  Cc: Linux-Kernel@Vger. Kernel. Org, netdev, SW_Drivers,
	Greg Kroah-Hartman, David S. Miller, Omer Shpigelman

On Thu, 10 Sep 2020 23:10:47 +0300 Oded Gabbay wrote:
> On Thu, Sep 10, 2020 at 11:01 PM Jakub Kicinski <kuba@kernel.org> wrote:
> >
> > On Thu, 10 Sep 2020 19:11:23 +0300 Oded Gabbay wrote:  
> > > From: Omer Shpigelman <oshpigelman@habana.ai>
> > >
> > > Add several debugfs entries to help us debug the NIC engines and ports and
> > > also the communication layer of the DL training application that use them.
> > >
> > > There are eight new entries. Detailed description is in the documentation
> > > file but here is a summary:
> > >
> > > - nic_mac_loopback: enable mac loopback mode per port
> > > - nic_ports_status: print physical connection status per port
> > > - nic_pcs_fail_time_frame: configure windows size for measuring pcs
> > >                            failures
> > > - nic_pcs_fail_threshold: configure pcs failures threshold for
> > >                           reconfiguring the link
> > > - nic_pam4_tx_taps: configure PAM4 TX taps
> > > - nic_polarity: configure polarity for NIC port lanes
> > > - nic_check_link: configure whether to check the PCS link periodically
> > > - nic_phy_auto_neg_lpbk: enable PHY auto-negotiation loopback
> > >
> > > Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai>
> > > Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
> > > Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>  
> >
> > debugfs configuration interfaces are not acceptable in netdev.  
> 
> no problem, but can we have only these two entries:
> > - nic_mac_loopback: enable mac loopback mode per port
> > - nic_ports_status: print physical connection status per port  
> 
> nic_ports_status is print only (not configuration).

Doesn't seem like this one shows any more information than can be
queried with ethtool, right?

> nic_mac_loopback
> is to set a port to loopback mode and out of it. It's not really
> configuration but rather a mode change.

What is this loopback for? Testing?

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 12/15] habanalabs/gaudi: add debugfs entries for the NIC
  2020-09-10 20:16       ` Jakub Kicinski
@ 2020-09-10 20:17         ` Oded Gabbay
  2020-09-10 20:30           ` Jakub Kicinski
  0 siblings, 1 reply; 44+ messages in thread
From: Oded Gabbay @ 2020-09-10 20:17 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Linux-Kernel@Vger. Kernel. Org, netdev, SW_Drivers,
	Greg Kroah-Hartman, David S. Miller, Omer Shpigelman

On Thu, Sep 10, 2020 at 11:16 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Thu, 10 Sep 2020 23:10:47 +0300 Oded Gabbay wrote:
> > On Thu, Sep 10, 2020 at 11:01 PM Jakub Kicinski <kuba@kernel.org> wrote:
> > >
> > > On Thu, 10 Sep 2020 19:11:23 +0300 Oded Gabbay wrote:
> > > > From: Omer Shpigelman <oshpigelman@habana.ai>
> > > >
> > > > Add several debugfs entries to help us debug the NIC engines and ports and
> > > > also the communication layer of the DL training application that use them.
> > > >
> > > > There are eight new entries. Detailed description is in the documentation
> > > > file but here is a summary:
> > > >
> > > > - nic_mac_loopback: enable mac loopback mode per port
> > > > - nic_ports_status: print physical connection status per port
> > > > - nic_pcs_fail_time_frame: configure windows size for measuring pcs
> > > >                            failures
> > > > - nic_pcs_fail_threshold: configure pcs failures threshold for
> > > >                           reconfiguring the link
> > > > - nic_pam4_tx_taps: configure PAM4 TX taps
> > > > - nic_polarity: configure polarity for NIC port lanes
> > > > - nic_check_link: configure whether to check the PCS link periodically
> > > > - nic_phy_auto_neg_lpbk: enable PHY auto-negotiation loopback
> > > >
> > > > Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai>
> > > > Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
> > > > Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
> > >
> > > debugfs configuration interfaces are not acceptable in netdev.
> >
> > no problem, but can we have only these two entries:
> > > - nic_mac_loopback: enable mac loopback mode per port
> > > - nic_ports_status: print physical connection status per port
> >
> > nic_ports_status is print only (not configuration).
>
> Doesn't seem like this one shows any more information than can be
> queried with ethtool, right?

Correct, it just displays it in a format that is much more readable.

>
> > nic_mac_loopback
> > is to set a port to loopback mode and out of it. It's not really
> > configuration but rather a mode change.
>
> What is this loopback for? Testing?

Correct.

Thanks,
Oded

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 05/15] habanalabs/gaudi: add NIC Ethernet support
  2020-09-10 20:03   ` Jakub Kicinski
@ 2020-09-10 20:18     ` Oded Gabbay
  2020-09-14  9:52     ` Omer Shpigelman
  1 sibling, 0 replies; 44+ messages in thread
From: Oded Gabbay @ 2020-09-10 20:18 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Linux-Kernel@Vger. Kernel. Org, netdev, SW_Drivers,
	Greg Kroah-Hartman, David S. Miller, Omer Shpigelman

On Thu, Sep 10, 2020 at 11:03 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Thu, 10 Sep 2020 19:11:16 +0300 Oded Gabbay wrote:
> > +module_param(nic_rx_poll, int, 0444);
> > +MODULE_PARM_DESC(nic_rx_poll,
> > +     "Enable NIC Rx polling mode (0 = no, 1 = yes, default no)");
>
> If your chip does not support IRQ coalescing you can configure polling
> and the timeout via ethtool -C, rather than a module parameter.

Thanks for the pointer, we will take a look at that.

Oded

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 13/15] habanalabs/gaudi: Add ethtool support using coresight
  2020-09-10 16:11 ` [PATCH 13/15] habanalabs/gaudi: Add ethtool support using coresight Oded Gabbay
@ 2020-09-10 20:19   ` Andrew Lunn
  2020-09-10 20:22     ` Oded Gabbay
  0 siblings, 1 reply; 44+ messages in thread
From: Andrew Lunn @ 2020-09-10 20:19 UTC (permalink / raw)
  To: Oded Gabbay
  Cc: linux-kernel, netdev, SW_Drivers, gregkh, davem, kuba, Omer Shpigelman

> +static int gaudi_nic_get_link_ksettings(struct net_device *netdev,
> +					struct ethtool_link_ksettings *cmd)
> +{
> +	struct gaudi_nic_device **ptr = netdev_priv(netdev);
> +	struct gaudi_nic_device *gaudi_nic = *ptr;
> +	struct hl_device *hdev = gaudi_nic->hdev;
> +	u32 port = gaudi_nic->port, speed = gaudi_nic->speed;

Please go through the code and fixup Reverse Christmas tree.

> +
> +	cmd->base.speed = speed;
> +	cmd->base.duplex = DUPLEX_FULL;
> +
> +	ethtool_link_ksettings_zero_link_mode(cmd, supported);
> +	ethtool_link_ksettings_zero_link_mode(cmd, advertising);
> +
> +	ethtool_add_mode(cmd, supported, 100000baseCR4_Full);
> +	ethtool_add_mode(cmd, supported, 100000baseSR4_Full);
> +	ethtool_add_mode(cmd, supported, 100000baseKR4_Full);
> +	ethtool_add_mode(cmd, supported, 100000baseLR4_ER4_Full);
> +
> +	ethtool_add_mode(cmd, supported, 50000baseSR2_Full);
> +	ethtool_add_mode(cmd, supported, 50000baseCR2_Full);
> +	ethtool_add_mode(cmd, supported, 50000baseKR2_Full);
> +
> +	if (speed == SPEED_100000) {
> +		ethtool_add_mode(cmd, advertising, 100000baseCR4_Full);
> +		ethtool_add_mode(cmd, advertising, 100000baseSR4_Full);
> +		ethtool_add_mode(cmd, advertising, 100000baseKR4_Full);
> +		ethtool_add_mode(cmd, advertising, 100000baseLR4_ER4_Full);
> +
> +		cmd->base.port = PORT_FIBRE;
> +
> +		ethtool_add_mode(cmd, supported, FIBRE);
> +		ethtool_add_mode(cmd, advertising, FIBRE);
> +
> +		ethtool_add_mode(cmd, supported, Backplane);
> +		ethtool_add_mode(cmd, advertising, Backplane);
> +	} else if (speed == SPEED_50000) {
> +		ethtool_add_mode(cmd, advertising, 50000baseSR2_Full);
> +		ethtool_add_mode(cmd, advertising, 50000baseCR2_Full);
> +		ethtool_add_mode(cmd, advertising, 50000baseKR2_Full);
> +	} else {
> +		dev_err(hdev->dev, "unknown speed %d, port %d\n", speed, port);
> +		return -EFAULT;
> +	}
> +
> +	ethtool_add_mode(cmd, supported, Autoneg);
> +
> +	if (gaudi_nic->auto_neg_enable) {
> +		ethtool_add_mode(cmd, advertising, Autoneg);
> +		cmd->base.autoneg = AUTONEG_ENABLE;
> +		if (gaudi_nic->auto_neg_resolved)
> +			ethtool_add_mode(cmd, lp_advertising, Autoneg);
> +	} else {
> +		cmd->base.autoneg = AUTONEG_DISABLE;
> +	}
> +
> +	ethtool_add_mode(cmd, supported, Pause);
> +
> +	if (gaudi_nic->pfc_enable)
> +		ethtool_add_mode(cmd, advertising, Pause);
> +
> +	return 0;
> +}
> +
> +static int gaudi_nic_set_link_ksettings(struct net_device *netdev,
> +				const struct ethtool_link_ksettings *cmd)
> +{
> +	struct gaudi_nic_device **ptr = netdev_priv(netdev);
> +	struct gaudi_nic_device *gaudi_nic = *ptr;
> +	struct hl_device *hdev = gaudi_nic->hdev;
> +	u32 port = gaudi_nic->port;
> +	int rc = 0, speed = cmd->base.speed;
> +	bool auto_neg = cmd->base.autoneg == AUTONEG_ENABLE;

It appears you only support speed and auto_neg. You should check that
all other things which could be configured are empty, e.g. none of the
bits are set in cmd->link_modes.advertising. If you are requested to
configure something which is not supported, you need to return
-EOPNOTSUPP.

	Andrew
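
A minimal sketch of the check described above, written as a user-space mock
so it can be compiled on its own. The types and names (mock_link_settings,
mock_validate_link_settings, the MOCK_* constants) are hypothetical
stand-ins for the kernel's ethtool definitions, and the choice to also
reject non-full duplex and unknown speeds with -EOPNOTSUPP is an
assumption, not something taken from the patch:

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel's ethtool constants */
#define MOCK_LINK_MODE_WORDS	4
#define MOCK_AUTONEG_DISABLE	0
#define MOCK_AUTONEG_ENABLE	1
#define MOCK_DUPLEX_FULL	1
#define MOCK_SPEED_50000	50000
#define MOCK_SPEED_100000	100000

/* Reduced model of struct ethtool_link_ksettings: only the fields checked here */
struct mock_link_settings {
	uint32_t speed;
	uint8_t duplex;
	uint8_t autoneg;
	unsigned long advertising[MOCK_LINK_MODE_WORDS];
};

/* True if no bit is set anywhere in the link-mode mask */
static bool mock_link_modes_empty(const unsigned long *mask)
{
	size_t i;

	for (i = 0; i < MOCK_LINK_MODE_WORDS; i++)
		if (mask[i])
			return false;

	return true;
}

/* Accept a request only if it touches nothing but speed and autoneg.
 * Returns 0 when the request is acceptable, -EOPNOTSUPP otherwise.
 */
static int mock_validate_link_settings(const struct mock_link_settings *cmd)
{
	/* The caller asked to change advertised modes: not supported */
	if (!mock_link_modes_empty(cmd->advertising))
		return -EOPNOTSUPP;

	/* Only full duplex exists at these speeds */
	if (cmd->duplex != MOCK_DUPLEX_FULL)
		return -EOPNOTSUPP;

	if (cmd->speed != MOCK_SPEED_50000 && cmd->speed != MOCK_SPEED_100000)
		return -EOPNOTSUPP;

	if (cmd->autoneg != MOCK_AUTONEG_ENABLE &&
	    cmd->autoneg != MOCK_AUTONEG_DISABLE)
		return -EOPNOTSUPP;

	return 0;
}
```

In a real set_link_ksettings handler a check of this kind would run before
any hardware state is touched, so an unsupported request fails without
side effects.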

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 13/15] habanalabs/gaudi: Add ethtool support using coresight
  2020-09-10 20:19   ` Andrew Lunn
@ 2020-09-10 20:22     ` Oded Gabbay
  0 siblings, 0 replies; 44+ messages in thread
From: Oded Gabbay @ 2020-09-10 20:22 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Linux-Kernel@Vger. Kernel. Org, netdev, SW_Drivers,
	Greg Kroah-Hartman, David S. Miller, Jakub Kicinski,
	Omer Shpigelman

On Thu, Sep 10, 2020 at 11:19 PM Andrew Lunn <andrew@lunn.ch> wrote:
>
> > +static int gaudi_nic_get_link_ksettings(struct net_device *netdev,
> > +                                     struct ethtool_link_ksettings *cmd)
> > +{
> > +     struct gaudi_nic_device **ptr = netdev_priv(netdev);
> > +     struct gaudi_nic_device *gaudi_nic = *ptr;
> > +     struct hl_device *hdev = gaudi_nic->hdev;
> > +     u32 port = gaudi_nic->port, speed = gaudi_nic->speed;
>
> Please go through the code and fixup Reverse Christmas tree.

Of course, we will fix this.

>
> > +
> > +     cmd->base.speed = speed;
> > +     cmd->base.duplex = DUPLEX_FULL;
> > +
> > +     ethtool_link_ksettings_zero_link_mode(cmd, supported);
> > +     ethtool_link_ksettings_zero_link_mode(cmd, advertising);
> > +
> > +     ethtool_add_mode(cmd, supported, 100000baseCR4_Full);
> > +     ethtool_add_mode(cmd, supported, 100000baseSR4_Full);
> > +     ethtool_add_mode(cmd, supported, 100000baseKR4_Full);
> > +     ethtool_add_mode(cmd, supported, 100000baseLR4_ER4_Full);
> > +
> > +     ethtool_add_mode(cmd, supported, 50000baseSR2_Full);
> > +     ethtool_add_mode(cmd, supported, 50000baseCR2_Full);
> > +     ethtool_add_mode(cmd, supported, 50000baseKR2_Full);
> > +
> > +     if (speed == SPEED_100000) {
> > +             ethtool_add_mode(cmd, advertising, 100000baseCR4_Full);
> > +             ethtool_add_mode(cmd, advertising, 100000baseSR4_Full);
> > +             ethtool_add_mode(cmd, advertising, 100000baseKR4_Full);
> > +             ethtool_add_mode(cmd, advertising, 100000baseLR4_ER4_Full);
> > +
> > +             cmd->base.port = PORT_FIBRE;
> > +
> > +             ethtool_add_mode(cmd, supported, FIBRE);
> > +             ethtool_add_mode(cmd, advertising, FIBRE);
> > +
> > +             ethtool_add_mode(cmd, supported, Backplane);
> > +             ethtool_add_mode(cmd, advertising, Backplane);
> > +     } else if (speed == SPEED_50000) {
> > +             ethtool_add_mode(cmd, advertising, 50000baseSR2_Full);
> > +             ethtool_add_mode(cmd, advertising, 50000baseCR2_Full);
> > +             ethtool_add_mode(cmd, advertising, 50000baseKR2_Full);
> > +     } else {
> > +             dev_err(hdev->dev, "unknown speed %d, port %d\n", speed, port);
> > +             return -EFAULT;
> > +     }
> > +
> > +     ethtool_add_mode(cmd, supported, Autoneg);
> > +
> > +     if (gaudi_nic->auto_neg_enable) {
> > +             ethtool_add_mode(cmd, advertising, Autoneg);
> > +             cmd->base.autoneg = AUTONEG_ENABLE;
> > +             if (gaudi_nic->auto_neg_resolved)
> > +                     ethtool_add_mode(cmd, lp_advertising, Autoneg);
> > +     } else {
> > +             cmd->base.autoneg = AUTONEG_DISABLE;
> > +     }
> > +
> > +     ethtool_add_mode(cmd, supported, Pause);
> > +
> > +     if (gaudi_nic->pfc_enable)
> > +             ethtool_add_mode(cmd, advertising, Pause);
> > +
> > +     return 0;
> > +}
> > +
> > +static int gaudi_nic_set_link_ksettings(struct net_device *netdev,
> > +                             const struct ethtool_link_ksettings *cmd)
> > +{
> > +     struct gaudi_nic_device **ptr = netdev_priv(netdev);
> > +     struct gaudi_nic_device *gaudi_nic = *ptr;
> > +     struct hl_device *hdev = gaudi_nic->hdev;
> > +     u32 port = gaudi_nic->port;
> > +     int rc = 0, speed = cmd->base.speed;
> > +     bool auto_neg = cmd->base.autoneg == AUTONEG_ENABLE;
>
> It appears you only support speed and auto_neg. You should check that
> all other things which could be configured are empty, e.g. none of the
> bits are set in cmd->link_modes.advertising. If you are requested to
> configure something which is not supported, you need to return
> -EOPNOTSUPP.
>
>         Andrew

Thanks Andrew,
We will do that and send an updated version.
Oded

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 00/15] Adding GAUDI NIC code to habanalabs driver
  2020-09-10 20:16   ` Oded Gabbay
@ 2020-09-10 20:25     ` Andrew Lunn
  2020-09-10 20:30       ` Oded Gabbay
  2020-09-10 20:28     ` Jakub Kicinski
  1 sibling, 1 reply; 44+ messages in thread
From: Andrew Lunn @ 2020-09-10 20:25 UTC (permalink / raw)
  To: Oded Gabbay
  Cc: Jakub Kicinski, Linux-Kernel@Vger. Kernel. Org, netdev,
	SW_Drivers, Greg Kroah-Hartman, David S. Miller

> Can you please elaborate on how to do this with a single driver that
> is already in misc?
> As I mentioned in the cover letter, we are not developing a
> stand-alone NIC. We have a deep-learning accelerator with a NIC
> interface.

This sounds like an MFD.

     Andrew

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 00/15] Adding GAUDI NIC code to habanalabs driver
  2020-09-10 20:16   ` Oded Gabbay
  2020-09-10 20:25     ` Andrew Lunn
@ 2020-09-10 20:28     ` Jakub Kicinski
  2020-09-10 20:32       ` Oded Gabbay
  1 sibling, 1 reply; 44+ messages in thread
From: Jakub Kicinski @ 2020-09-10 20:28 UTC (permalink / raw)
  To: Oded Gabbay
  Cc: Linux-Kernel@Vger. Kernel. Org, netdev, SW_Drivers,
	Greg Kroah-Hartman, David S. Miller

On Thu, 10 Sep 2020 23:16:22 +0300 Oded Gabbay wrote:
> On Thu, Sep 10, 2020 at 11:01 PM Jakub Kicinski <kuba@kernel.org> wrote:
> > On Thu, 10 Sep 2020 19:11:11 +0300 Oded Gabbay wrote:  
> > >  create mode 100644 drivers/misc/habanalabs/gaudi/gaudi_nic.c
> > >  create mode 100644 drivers/misc/habanalabs/gaudi/gaudi_nic.h
> > >  create mode 100644 drivers/misc/habanalabs/gaudi/gaudi_nic_dcbnl.c
> > >  create mode 100644 drivers/misc/habanalabs/gaudi/gaudi_nic_debugfs.c
> > >  create mode 100644 drivers/misc/habanalabs/gaudi/gaudi_nic_ethtool.c
> > >  create mode 100644 drivers/misc/habanalabs/gaudi/gaudi_phy.c
> > >  create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic0_qm0_masks.h
> > >  create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic0_qm0_regs.h
> > >  create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic0_qm1_regs.h
> > >  create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic0_qpc0_masks.h
> > >  create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic0_qpc0_regs.h
> > >  create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic0_qpc1_regs.h
> > >  create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic0_rxb_regs.h
> > >  create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic0_rxe0_masks.h
> > >  create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic0_rxe0_regs.h
> > >  create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic0_rxe1_regs.h
> > >  create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic0_stat_regs.h
> > >  create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic0_tmr_regs.h
> > >  create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic0_txe0_masks.h
> > >  create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic0_txe0_regs.h
> > >  create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic0_txe1_regs.h
> > >  create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic0_txs0_masks.h
> > >  create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic0_txs0_regs.h
> > >  create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic0_txs1_regs.h
> > >  create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic1_qm0_regs.h
> > >  create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic1_qm1_regs.h
> > >  create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic2_qm0_regs.h
> > >  create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic2_qm1_regs.h
> > >  create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic3_qm0_regs.h
> > >  create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic3_qm1_regs.h
> > >  create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic4_qm0_regs.h
> > >  create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic4_qm1_regs.h
> > >  create mode 100644 drivers/misc/habanalabs/include/hw_ip/nic/nic_general.h  
> >
> > The relevant code needs to live under drivers/net/(ethernet/).
> > For one thing our automation won't trigger for drivers in random
> > (/misc) part of the tree.  
> 
> Can you please elaborate on how to do this with a single driver that
> is already in misc ?
> As I mentioned in the cover letter, we are not developing a
> stand-alone NIC. We have a deep-learning accelerator with a NIC
> interface.
> Therefore, we don't have a separate PCI physical function for the NIC
> and I can't have a second driver registering to it.

Is it not possible to move the files and still build them into a single
module?

> We did this design based on existing examples in the kernel
> (registering to netdev), such as (just a few examples):
> 1. sgi-xp driver in drivers/misc/sgi-xp (see file xpnet.c)
> 2. bnx2fc in drivers/scsi/bnx2fc
> 
> Thanks,
> Oded


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 00/15] Adding GAUDI NIC code to habanalabs driver
  2020-09-10 20:25     ` Andrew Lunn
@ 2020-09-10 20:30       ` Oded Gabbay
  2020-09-10 20:38         ` Andrew Lunn
  0 siblings, 1 reply; 44+ messages in thread
From: Oded Gabbay @ 2020-09-10 20:30 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Jakub Kicinski, Linux-Kernel@Vger. Kernel. Org, netdev,
	SW_Drivers, Greg Kroah-Hartman, David S. Miller

On Thu, Sep 10, 2020 at 11:25 PM Andrew Lunn <andrew@lunn.ch> wrote:
>
> > Can you please elaborate on how to do this with a single driver that
> > is already in misc ?
> > As I mentioned in the cover letter, we are not developing a
> > stand-alone NIC. We have a deep-learning accelerator with a NIC
> > interface.
>
> This sounds like an MFD.
>
>      Andrew

Yes and no. There is only one functionality - training of deep
learning networks (accelerating compute operations) :)
RDMA is just our method of scaling out - our method of
interconnecting GAUDI devices (similar to NVLink or AMD
CrossFire).
So the H/W exposes a single physical function at the PCI level, and
thus Linux can bind only a single driver to it during the PCI probe.

I hope that in future generations we will improve that, but it is what
it is for GAUDI.
I don't currently see how to do it otherwise, but if you have ideas,
please share.
Oded

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 12/15] habanalabs/gaudi: add debugfs entries for the NIC
  2020-09-10 20:17         ` Oded Gabbay
@ 2020-09-10 20:30           ` Jakub Kicinski
  2020-09-10 20:33             ` Oded Gabbay
  2020-09-14 13:48             ` Omer Shpigelman
  0 siblings, 2 replies; 44+ messages in thread
From: Jakub Kicinski @ 2020-09-10 20:30 UTC (permalink / raw)
  To: Oded Gabbay
  Cc: Linux-Kernel@Vger. Kernel. Org, netdev, SW_Drivers,
	Greg Kroah-Hartman, David S. Miller, Omer Shpigelman

On Thu, 10 Sep 2020 23:17:59 +0300 Oded Gabbay wrote:
> > Doesn't seem like this one shows any more information than can be
> > queried with ethtool, right?  
> correct, it just displays it in a format that is much more readable

You can cat /sys/class/net/$ifc/carrier if you want 0/1.

> > > nic_mac_loopback
> > > is to set a port to loopback mode and out of it. It's not really
> > > configuration but rather a mode change.  
> >
> > What is this loopback for? Testing?  
> 
> Correct.

Loopback test is commonly implemented via ethtool -t
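
To make the sysfs suggestion concrete, here is a minimal user-space sketch
(not part of the patch set) that reads the same attribute from C. The
function name read_carrier() and its sysfs_net parameter are made up for
illustration; the only real interface it relies on is the
/sys/class/net/<ifname>/carrier file mentioned above.

```c
#include <stdio.h>

/*
 * Read the carrier flag of a network interface from sysfs.
 *
 * sysfs_net is normally "/sys/class/net"; it is a parameter only so that
 * the function can be pointed at a different directory tree.
 *
 * Returns 1 (carrier present), 0 (no carrier), or -1 if the attribute
 * cannot be opened or does not contain 0 or 1.
 */
int read_carrier(const char *sysfs_net, const char *ifname)
{
	char path[256];
	FILE *f;
	int value;
	int n;

	n = snprintf(path, sizeof(path), "%s/%s/carrier", sysfs_net, ifname);
	if (n < 0 || (size_t)n >= sizeof(path))
		return -1;

	f = fopen(path, "r");
	if (!f)
		return -1;

	/* A failed read leaves nothing to parse, so report it as -1. */
	if (fscanf(f, "%d", &value) != 1)
		value = -1;
	fclose(f);

	return (value == 0 || value == 1) ? value : -1;
}
```

Calling read_carrier("/sys/class/net", ifname) with a real interface name
should give the same 0/1 as the cat above. An interface that is
administratively down typically fails the read, which this sketch reports
as -1.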

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 00/15] Adding GAUDI NIC code to habanalabs driver
  2020-09-10 20:28     ` Jakub Kicinski
@ 2020-09-10 20:32       ` Oded Gabbay
  2020-09-10 21:05         ` Florian Fainelli
  0 siblings, 1 reply; 44+ messages in thread
From: Oded Gabbay @ 2020-09-10 20:32 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Linux-Kernel@Vger. Kernel. Org, netdev, SW_Drivers,
	Greg Kroah-Hartman, David S. Miller

On Thu, Sep 10, 2020 at 11:28 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Thu, 10 Sep 2020 23:16:22 +0300 Oded Gabbay wrote:
> > On Thu, Sep 10, 2020 at 11:01 PM Jakub Kicinski <kuba@kernel.org> wrote:
> > > On Thu, 10 Sep 2020 19:11:11 +0300 Oded Gabbay wrote:
> > > >  [...]
> > >
> > > The relevant code needs to live under drivers/net/(ethernet/).
> > > For one thing our automation won't trigger for drivers in random
> > > (/misc) part of the tree.
> >
> > Can you please elaborate on how to do this with a single driver that
> > is already in misc ?
> > As I mentioned in the cover letter, we are not developing a
> > stand-alone NIC. We have a deep-learning accelerator with a NIC
> > interface.
> > Therefore, we don't have a separate PCI physical function for the NIC
> > and I can't have a second driver registering to it.
>
> Is it not possible to move the files and still build them into a single
> module?
Hmm...
I actually didn't try that, as I thought it would be very strange, and
I'm not familiar with other drivers that build as a single .ko but have
files spread across different subsystems.
I don't feel it is a better option than what we did here.

Would I need to split pull requests between different subsystem
maintainers? For the same driver?
It sounds to me like this is not going to fly.
Thanks,
Oded


>
> > We did this design based on existing examples in the kernel
> > (registering to netdev), such as (just a few examples):
> > 1. sgi-xp driver in drivers/misc/sgi-xp (see file xpnet.c)
> > 2. bnx2fc in drivers/scsi/bnx2fc
> >
> > Thanks,
> > Oded
>

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 12/15] habanalabs/gaudi: add debugfs entries for the NIC
  2020-09-10 20:30           ` Jakub Kicinski
@ 2020-09-10 20:33             ` Oded Gabbay
  2020-09-14 13:48             ` Omer Shpigelman
  1 sibling, 0 replies; 44+ messages in thread
From: Oded Gabbay @ 2020-09-10 20:33 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Linux-Kernel@Vger. Kernel. Org, netdev, SW_Drivers,
	Greg Kroah-Hartman, David S. Miller, Omer Shpigelman

On Thu, Sep 10, 2020 at 11:31 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Thu, 10 Sep 2020 23:17:59 +0300 Oded Gabbay wrote:
> > > Doesn't seem like this one shows any more information than can be
> > > queried with ethtool, right?
> > correct, it just displays it in a format that is much more readable
>
> You can cat /sys/class/net/$ifc/carrier if you want 0/1.
>
> > > > nic_mac_loopback
> > > > is to set a port to loopback mode and out of it. It's not really
> > > > configuration but rather a mode change.
> > >
> > > What is this loopback for? Testing?
> >
> > Correct.
>
> Loopback test is commonly implemented via ethtool -t

OK, thanks for the feedback. We will take a look at it and update the patch.
Oded

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 00/15] Adding GAUDI NIC code to habanalabs driver
  2020-09-10 20:30       ` Oded Gabbay
@ 2020-09-10 20:38         ` Andrew Lunn
  2020-09-10 20:52           ` Oded Gabbay
  2020-09-11  6:22           ` Greg Kroah-Hartman
  0 siblings, 2 replies; 44+ messages in thread
From: Andrew Lunn @ 2020-09-10 20:38 UTC (permalink / raw)
  To: Oded Gabbay
  Cc: Jakub Kicinski, Linux-Kernel@Vger. Kernel. Org, netdev,
	SW_Drivers, Greg Kroah-Hartman, David S. Miller

On Thu, Sep 10, 2020 at 11:30:33PM +0300, Oded Gabbay wrote:
> On Thu, Sep 10, 2020 at 11:25 PM Andrew Lunn <andrew@lunn.ch> wrote:
> >
> > > Can you please elaborate on how to do this with a single driver that
> > > is already in misc ?
> > > As I mentioned in the cover letter, we are not developing a
> > > stand-alone NIC. We have a deep-learning accelerator with a NIC
> > > interface.
> >
> > This sounds like an MFD.
> >
> >      Andrew
> 
> Yes and no. There is only one functionality - training of deep
> learning (Accelerating compute operations) :)
> The rdma is just our method of scaling-out - our method of
> intra-connection between GAUDI devices (similar to NVlink or AMD
> crossfire).
> So the H/W exposes a single physical function at the PCI level. And
> thus Linux can call a single driver for it during the PCI probe.

Yes, it probes the MFD driver. The MFD driver then creates platform
drivers for the sub functions. i.e. it would create an Ethernet
platform driver. That then gets probed in the usual way. The child
device can get access to the parent device, if it needs to share
things, e.g. a device on a bus. This is typically I2C or SPI, but
there is no reason it cannot be a PCI device.

Go look in drivers/mfd.
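
The flow described above can be modelled in plain user-space C. This is
only an illustration, not kernel code: every name below (struct device,
struct cell, add_cells(), eth_probe(), struct accel) is hypothetical. In
the kernel the corresponding pieces are struct mfd_cell and
mfd_add_devices() from <linux/mfd/core.h>, and the matching of child
devices to drivers is done by the driver core rather than by the helper
itself.

```c
#include <stddef.h>
#include <string.h>

struct device {
	const char *name;
	struct device *parent;	/* set for children created from cells */
	void *drvdata;		/* driver-private state */
	int bound;		/* 1 once a driver's probe() succeeded */
};

struct cell {			/* one sub-function of the parent device */
	const char *name;
};

struct driver {
	const char *name;	/* matched against the child device name */
	int (*probe)(struct device *dev);
};

#define MAX_CHILDREN 4
struct device children[MAX_CHILDREN];

/*
 * Create one child device per cell, point it at the parent, and let a
 * driver with the same name probe it. Returns the number of children
 * that were successfully bound.
 */
int add_cells(struct device *parent, const struct cell *cells, size_t n,
	      const struct driver *drivers, size_t n_drivers)
{
	int bound = 0;
	size_t i, j;

	for (i = 0; i < n && i < MAX_CHILDREN; i++) {
		struct device *child = &children[i];

		child->name = cells[i].name;
		child->parent = parent;
		child->drvdata = NULL;
		child->bound = 0;

		for (j = 0; j < n_drivers; j++) {
			if (strcmp(drivers[j].name, child->name) == 0 &&
			    drivers[j].probe(child) == 0) {
				child->bound = 1;
				bound++;
				break;
			}
		}
	}
	return bound;
}

/* Shared state owned by the parent (PCI) driver. */
struct accel {
	int nic_ports;
};

/* Child "Ethernet" driver: reaches the shared state via dev->parent. */
int eth_probe(struct device *dev)
{
	struct accel *accel = dev->parent ? dev->parent->drvdata : NULL;

	if (!accel || accel->nic_ports <= 0)
		return -1;
	dev->drvdata = accel;
	return 0;
}
```

The point of the model is the last function: the child driver lives in its
own file (and could live in its own subsystem directory), yet it shares
state with the parent through the parent pointer.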

      Andrew

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 00/15] Adding GAUDI NIC code to habanalabs driver
  2020-09-10 20:38         ` Andrew Lunn
@ 2020-09-10 20:52           ` Oded Gabbay
  2020-09-11  6:22           ` Greg Kroah-Hartman
  1 sibling, 0 replies; 44+ messages in thread
From: Oded Gabbay @ 2020-09-10 20:52 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Jakub Kicinski, Linux-Kernel@Vger. Kernel. Org, netdev,
	SW_Drivers, Greg Kroah-Hartman, David S. Miller

On Thu, Sep 10, 2020 at 11:38 PM Andrew Lunn <andrew@lunn.ch> wrote:
>
> On Thu, Sep 10, 2020 at 11:30:33PM +0300, Oded Gabbay wrote:
> > On Thu, Sep 10, 2020 at 11:25 PM Andrew Lunn <andrew@lunn.ch> wrote:
> > >
> > > > Can you please elaborate on how to do this with a single driver that
> > > > is already in misc ?
> > > > As I mentioned in the cover letter, we are not developing a
> > > > stand-alone NIC. We have a deep-learning accelerator with a NIC
> > > > interface.
> > >
> > > This sounds like an MFD.
> > >
> > >      Andrew
> >
> > Yes and no. There is only one functionality - training of deep
> > learning (Accelerating compute operations) :)
> > The rdma is just our method of scaling-out - our method of
> > intra-connection between GAUDI devices (similar to NVlink or AMD
> > crossfire).
> > So the H/W exposes a single physical function at the PCI level. And
> > thus Linux can call a single driver for it during the PCI probe.
>
> Yes, it probes the MFD driver. The MFD driver then creates platform
> drivers for the sub functions. i.e. it would create an Ethernet
> platform driver. That then gets probed in the usual way. The child
> device can get access to the parent device, if it needs to share
> things, e.g. a device on a bus. This is typically I2C or SPI, but
> there is no reason it cannot be a PCI device.
>
> Go look in drivers/mfd.
>
>       Andrew

I'm slightly familiar with drivers/mfd and, as you mentioned, those are
for "simple" devices on a bus that carries several different functions,
like I2C with many devices (sensors for various things, etc.).
I've never seen anyone put a PCI device there and, frankly, I don't
see the benefit of trying to migrate our complex PCI driver to that
subsystem, if it would even work.
And I would like to reiterate that our NIC ports are highly integrated
with our compute engines.
They "talk" to each other via sync objects inside the SOC, and all of
them are used as part of the training of the deep learning network.
Another example of why this is not an MFD - when a compute engine gets
stuck, all the NIC ports go through reset.
So it's not the same as multiple devices that use the same bus or H/W.
It's a single device with some engines that work in harmony.
The bottom line is that we have a single functionality, and the
scale-out is done via RDMA that is integrated on the device.
We could have chosen other ways to scale out (like some proprietary
bus); would that then count as another functionality? I think not.

So I'm not going to drivers/mfd with our driver. I wish that I had
multiple PCI PFs so I could do a proper Ethernet driver, but I can't for
this H/W.
And I think that physically splitting the files into two subsystems
will be very hard to maintain, and I will definitely want to hear
Greg's opinion on that.

Thanks,
Oded

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 00/15] Adding GAUDI NIC code to habanalabs driver
  2020-09-10 20:32       ` Oded Gabbay
@ 2020-09-10 21:05         ` Florian Fainelli
  2020-09-10 21:15           ` Oded Gabbay
  0 siblings, 1 reply; 44+ messages in thread
From: Florian Fainelli @ 2020-09-10 21:05 UTC (permalink / raw)
  To: Oded Gabbay, Jakub Kicinski
  Cc: Linux-Kernel@Vger. Kernel. Org, netdev, SW_Drivers,
	Greg Kroah-Hartman, David S. Miller



On 9/10/2020 1:32 PM, Oded Gabbay wrote:
> On Thu, Sep 10, 2020 at 11:28 PM Jakub Kicinski <kuba@kernel.org> wrote:
>>
>> On Thu, 10 Sep 2020 23:16:22 +0300 Oded Gabbay wrote:
>>> On Thu, Sep 10, 2020 at 11:01 PM Jakub Kicinski <kuba@kernel.org> wrote:
>>>> On Thu, 10 Sep 2020 19:11:11 +0300 Oded Gabbay wrote:
>>>>>   [...]
>>>>
>>>> The relevant code needs to live under drivers/net/(ethernet/).
>>>> For one thing our automation won't trigger for drivers in random
>>>> (/misc) part of the tree.
>>>
>>> Can you please elaborate on how to do this with a single driver that
>>> is already in misc ?
>>> As I mentioned in the cover letter, we are not developing a
>>> stand-alone NIC. We have a deep-learning accelerator with a NIC
>>> interface.
>>> Therefore, we don't have a separate PCI physical function for the NIC
>>> and I can't have a second driver registering to it.
>>
>> Is it not possible to move the files and still build them into a single
>> module?
> hmm...
> I actually didn't try that as I thought it will be very strange and
> I'm not familiar with other drivers that build as a single ko but have
> files spread out in different subsystems.
> I don't feel it is a better option than what we did here.
> 
> Will I need to split pull requests to different subsystem maintainers
> ? For the same driver ?
> Sounds to me this is not going to fly.

Not necessarily, you can post your patches to all relevant lists and 
seek maintainer review/acked-by tags from the relevant maintainers. This 
is not unheard of with mlx5 for instance.

Have you considered using notifiers to get your NIC driver registered 
while the NIC code lives in a different module?
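
As a rough picture of what that could look like, here is a user-space
model of a notifier chain. It is a sketch under assumptions, not kernel
code: the accel_* and nic_* names and the event list are invented for
illustration. The kernel's real API is in <linux/notifier.h>
(struct notifier_block, blocking_notifier_chain_register(),
blocking_notifier_call_chain()).

```c
#include <stddef.h>

/* Events the (hypothetical) core accelerator driver publishes. */
enum accel_event {
	ACCEL_EVENT_PROBED,
	ACCEL_EVENT_PRE_RESET,
	ACCEL_EVENT_REMOVED,
};

struct notifier {
	int (*call)(struct notifier *nb, enum accel_event event, void *data);
	struct notifier *next;
};

static struct notifier *accel_chain;

/* Core side: add a subscriber at the head of the chain. */
void accel_register_notifier(struct notifier *nb)
{
	nb->next = accel_chain;
	accel_chain = nb;
}

/* Core side: unlink a subscriber if it is on the chain. */
void accel_unregister_notifier(struct notifier *nb)
{
	struct notifier **p;

	for (p = &accel_chain; *p; p = &(*p)->next) {
		if (*p == nb) {
			*p = nb->next;
			nb->next = NULL;
			return;
		}
	}
}

/* Core side: deliver an event; returns how many subscribers were called. */
int accel_call_chain(enum accel_event event, void *data)
{
	struct notifier *nb;
	int called = 0;

	for (nb = accel_chain; nb; nb = nb->next) {
		nb->call(nb, event, data);
		called++;
	}
	return called;
}

/* NIC side: state kept by the separate (hypothetical) NIC module. */
struct nic_state {
	int netdev_registered;
	int resets_seen;
};

struct nic_state nic;

int nic_event(struct notifier *nb, enum accel_event event, void *data)
{
	(void)nb;
	(void)data;

	switch (event) {
	case ACCEL_EVENT_PROBED:	/* core found the device */
		nic.netdev_registered = 1;
		break;
	case ACCEL_EVENT_PRE_RESET:	/* core is about to reset the SOC */
		nic.resets_seen++;
		break;
	case ACCEL_EVENT_REMOVED:	/* core is going away */
		nic.netdev_registered = 0;
		break;
	}
	return 0;
}

struct notifier nic_nb = { .call = nic_event };
```

The NIC module would register nic_nb when it loads and unregister it when
it unloads; the core driver never calls into the NIC code directly.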
-- 
Florian

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 00/15] Adding GAUDI NIC code to habanalabs driver
  2020-09-10 21:05         ` Florian Fainelli
@ 2020-09-10 21:15           ` Oded Gabbay
  2020-09-10 21:23             ` Florian Fainelli
  0 siblings, 1 reply; 44+ messages in thread
From: Oded Gabbay @ 2020-09-10 21:15 UTC (permalink / raw)
  To: Florian Fainelli
  Cc: Jakub Kicinski, Linux-Kernel@Vger. Kernel. Org, netdev,
	SW_Drivers, Greg Kroah-Hartman, David S. Miller

On Fri, Sep 11, 2020 at 12:05 AM Florian Fainelli <f.fainelli@gmail.com> wrote:
>
>
>
> On 9/10/2020 1:32 PM, Oded Gabbay wrote:
> > On Thu, Sep 10, 2020 at 11:28 PM Jakub Kicinski <kuba@kernel.org> wrote:
> >>
> >> On Thu, 10 Sep 2020 23:16:22 +0300 Oded Gabbay wrote:
> >>> On Thu, Sep 10, 2020 at 11:01 PM Jakub Kicinski <kuba@kernel.org> wrote:
> >>>> On Thu, 10 Sep 2020 19:11:11 +0300 Oded Gabbay wrote:
> >>>>>   [...]
> >>>>
> >>>> The relevant code needs to live under drivers/net/(ethernet/).
> >>>> For one thing our automation won't trigger for drivers in random
> >>>> (/misc) part of the tree.
> >>>
> >>> Can you please elaborate on how to do this with a single driver that
> >>> is already in misc ?
> >>> As I mentioned in the cover letter, we are not developing a
> >>> stand-alone NIC. We have a deep-learning accelerator with a NIC
> >>> interface.
> >>> Therefore, we don't have a separate PCI physical function for the NIC
> >>> and I can't have a second driver registering to it.
> >>
> >> Is it not possible to move the files and still build them into a single
> >> module?
> > hmm...
> > I actually didn't try that as I thought it will be very strange and
> > I'm not familiar with other drivers that build as a single ko but have
> > files spread out in different subsystems.
> > I don't feel it is a better option than what we did here.
> >
> > Will I need to split pull requests to different subsystem maintainers
> > ? For the same driver ?
> > Sounds to me this is not going to fly.
>
> Not necessarily, you can post your patches to all relevant lists and
> seek maintainer review/acked-by tags from the relevant maintainers. This
> is not unheard of with mlx5 for instance.
Yeah, I see what you are saying. The problem is that sometimes,
because everything is tightly integrated in our SOC, a patch
contains changes to common code (common to ALL our ASICs, even those
that don't have a NIC at all), to GAUDI-specific code which is not NIC
related, and to the NIC code itself.
But I guess that as a last resort, if this is a *must*, I can do that.
Though I would like to hear Greg's opinion on this, as he is my current
maintainer.

Personally I do want to send relevant patches to netdev because I want
to get your expert reviews on them, but still keep the code in a
single location.

>
> Have you considered using notifiers to get your NIC driver registered
> while the NIC code lives in a different module?
Yes, and I preferred to keep it simple. I didn't want to start sending
notifications to the NIC driver every time, for example, I needed to
reset the SOC because a compute engine got stuck. Or vice versa - when
some error happened in the NIC, to start sending notifications to the
common driver.

In addition, from my AMD days, we had a very tough time managing two
drivers that "talk" to each other and manage the same H/W. I'm talking
about amdgpu for graphics and amdkfd for compute (of which I was the
maintainer). AMD has been working in recent years to unite those two
drivers to get out of that mess. That's why I didn't want to go down
that road.

Thanks,
Oded

> --
> Florian

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 00/15] Adding GAUDI NIC code to habanalabs driver
  2020-09-10 21:15           ` Oded Gabbay
@ 2020-09-10 21:23             ` Florian Fainelli
  0 siblings, 0 replies; 44+ messages in thread
From: Florian Fainelli @ 2020-09-10 21:23 UTC (permalink / raw)
  To: Oded Gabbay
  Cc: Jakub Kicinski, Linux-Kernel@Vger. Kernel. Org, netdev,
	SW_Drivers, Greg Kroah-Hartman, David S. Miller



On 9/10/2020 2:15 PM, Oded Gabbay wrote:
> On Fri, Sep 11, 2020 at 12:05 AM Florian Fainelli <f.fainelli@gmail.com> wrote:
>>
>>
>>
>> On 9/10/2020 1:32 PM, Oded Gabbay wrote:
>>> On Thu, Sep 10, 2020 at 11:28 PM Jakub Kicinski <kuba@kernel.org> wrote:
>>>>
>>>> On Thu, 10 Sep 2020 23:16:22 +0300 Oded Gabbay wrote:
>>>>> On Thu, Sep 10, 2020 at 11:01 PM Jakub Kicinski <kuba@kernel.org> wrote:
>>>>>> On Thu, 10 Sep 2020 19:11:11 +0300 Oded Gabbay wrote:
>>>>>>>    [...]
>>>>>>
>>>>>> The relevant code needs to live under drivers/net/(ethernet/).
>>>>>> For one thing our automation won't trigger for drivers in random
>>>>>> (/misc) part of the tree.
>>>>>
>>>>> Can you please elaborate on how to do this with a single driver that
>>>>> is already in misc ?
>>>>> As I mentioned in the cover letter, we are not developing a
>>>>> stand-alone NIC. We have a deep-learning accelerator with a NIC
>>>>> interface.
>>>>> Therefore, we don't have a separate PCI physical function for the NIC
>>>>> and I can't have a second driver registering to it.
>>>>
>>>> Is it not possible to move the files and still build them into a single
>>>> module?
>>> hmm...
>>> I actually didn't try that as I thought it will be very strange and
>>> I'm not familiar with other drivers that build as a single ko but have
>>> files spread out in different subsystems.
>>> I don't feel it is a better option than what we did here.
>>>
>>> Will I need to split pull requests to different subsystem maintainers
>>> ? For the same driver ?
>>> Sounds to me this is not going to fly.
>>
>> Not necessarily, you can post your patches to all relevant lists and
>> seek maintainer review/acked-by tags from the relevant maintainers. This
>> is not unheard of with mlx5 for instance.
> Yeah, I see what you are saying, the problem is that sometimes,
> because everything is tightly integrated in our SOC, the patches
> contain code from common code (common to ALL our ASICs, even those who
> don't have NIC at all), GAUDI specific code which is not NIC related
> and the NIC code itself.
> But I guess that as a last resort if this is a *must* I can do that.
> Though I would like to hear Greg's opinion on this as he is my current
> maintainer.
> 
> Personally I do want to send relevant patches to netdev because I want
> to get your expert reviews on them, but still keep the code in a
> single location.

We do have network drivers sprinkled across the kernel tree already, but 
I would agree that from a networking maintainer perspective this makes 
auditing code harder, you would naturally grep for net/ and drivers/net 
and easily miss arch/um/ for instance. When you do treewide changes,
having all your ducklings in the same pond is a lot easier.

There is a possible "risk" with posting a patch series for the 
habanalabs driver to netdev that people will be wondering what this is 
about and completely miss it is about the networking bits. If there is a 
NIC driver under drivers/net then people will start to filter or pay 
attention based on the directory.

> 
>>
>> Have you considered using notifiers to get your NIC driver registered
>> while the NIC code lives in a different module?
> Yes, and I preferred to keep it simple. I didn't want to start sending
> notifications to the NIC driver every time, for example, I needed to
> reset the SOC because a compute engine got stuck. Or vice-versa - when
> some error happened in the NIC to start sending notifications to the
> common driver.
> 
> In addition, from my AMD days, we had a very tough time managing two
> drivers that "talk" to each other and manage the same H/W. I'm talking
> about amdgpu for graphics and amdkfd for compute (which I was the
> maintainer). AMD has been working in the past years to unite those two
> drivers to get out of that mess. That's why I didn't want to go down
> that road.

You are trading an indirect call for a direct call, and it does provide 
some nice interface, but it could be challenging to work with given the 
context in which the notifier is called can be problematic. You could 
still have direct module references then, and that would avoid the need 
for notifiers.

You are the driver maintainer, so you definitely have a bigger say in
the matter than most of us drive-by contributors.
-- 
Florian

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 00/15] Adding GAUDI NIC code to habanalabs driver
  2020-09-10 20:38         ` Andrew Lunn
  2020-09-10 20:52           ` Oded Gabbay
@ 2020-09-11  6:22           ` Greg Kroah-Hartman
  1 sibling, 0 replies; 44+ messages in thread
From: Greg Kroah-Hartman @ 2020-09-11  6:22 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Oded Gabbay, Jakub Kicinski, Linux-Kernel@Vger. Kernel. Org,
	netdev, SW_Drivers, David S. Miller

On Thu, Sep 10, 2020 at 10:38:48PM +0200, Andrew Lunn wrote:
> On Thu, Sep 10, 2020 at 11:30:33PM +0300, Oded Gabbay wrote:
> > On Thu, Sep 10, 2020 at 11:25 PM Andrew Lunn <andrew@lunn.ch> wrote:
> > >
> > > > Can you please elaborate on how to do this with a single driver that
> > > > is already in misc ?
> > > > As I mentioned in the cover letter, we are not developing a
> > > > stand-alone NIC. We have a deep-learning accelerator with a NIC
> > > > interface.
> > >
> > > This sounds like an MFD.
> > >
> > >      Andrew
> > 
> > Yes and no. There is only one functionality - training of deep
> > learning (Accelerating compute operations) :)
> > The rdma is just our method of scaling-out - our method of
> > intra-connection between GAUDI devices (similar to NVlink or AMD
> > crossfire).
> > So the H/W exposes a single physical function at the PCI level. And
> > thus Linux can call a single driver for it during the PCI probe.
> 
> Yes, it probes the MFD driver. The MFD driver then creates platform
> drivers for the sub functions. i.e. it would create an Ethernet
> platform driver. That then gets probed in the usual way. The child
> device can get access to the parent device, if it needs to share
> things, e.g. a device on a bus. This is typically I2C or SPI, but
> there is no reason it cannot be a PCI device.
> 
> Go look in drivers/mfd.

Why do we keep going down this path each time this comes up...

No, mfd doesn't work for this case, but the "virtual bus" that gets
posted every once in a while would solve this issue.  Hopefully one day
the Intel developers will wake up and post it again here so that we can
review it and hopefully merge it to solve this very common problem...

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 44+ messages in thread

* RE: [PATCH 05/15] habanalabs/gaudi: add NIC Ethernet support
  2020-09-10 20:03   ` Jakub Kicinski
  2020-09-10 20:18     ` Oded Gabbay
@ 2020-09-14  9:52     ` Omer Shpigelman
  2020-09-14 16:47       ` Jakub Kicinski
  1 sibling, 1 reply; 44+ messages in thread
From: Omer Shpigelman @ 2020-09-14  9:52 UTC (permalink / raw)
  To: Jakub Kicinski, Oded Gabbay
  Cc: linux-kernel, netdev, SW_Drivers, gregkh, davem

On Thu, Sep 10, 2020 at 11:03 PM Jakub Kicinski <kuba@kernel.org> wrote:
> On Thu, 10 Sep 2020 19:11:16 +0300 Oded Gabbay wrote:
> > +module_param(nic_rx_poll, int, 0444);
> MODULE_PARM_DESC(nic_rx_poll,
> > +	"Enable NIC Rx polling mode (0 = no, 1 = yes, default no)");
> 
> If your chip does not support IRQ coalescing you can configure polling and the
> timeout via ethtool -C, rather than a module parameter.

I couldn't find an example for that in other drivers and I didn't see anything regarding polling mode in the parameters description of this ethtool callback.
Can you please specify some pointer for that? Or in other words, what parameter can we use to enable polling/setting the timeout?

Thanks,
Omer

^ permalink raw reply	[flat|nested] 44+ messages in thread

* RE: [PATCH 12/15] habanalabs/gaudi: add debugfs entries for the NIC
  2020-09-10 20:30           ` Jakub Kicinski
  2020-09-10 20:33             ` Oded Gabbay
@ 2020-09-14 13:48             ` Omer Shpigelman
  2020-09-14 16:50               ` Jakub Kicinski
  1 sibling, 1 reply; 44+ messages in thread
From: Omer Shpigelman @ 2020-09-14 13:48 UTC (permalink / raw)
  To: Jakub Kicinski, Oded Gabbay
  Cc: Linux-Kernel@Vger. Kernel. Org, netdev, SW_Drivers,
	Greg Kroah-Hartman, David S. Miller

On Thu, Sep 10, 2020 at 11:31 PM Jakub Kicinski <kuba@kernel.org> wrote:
> On Thu, 10 Sep 2020 23:17:59 +0300 Oded Gabbay wrote:
> > > Doesn't seem like this one shows any more information than can be
> > > queried with ethtool, right?
> > correct, it just displays it in a format that is much more readable
> 
> You can cat /sys/class/net/$ifc/carrier if you want 0/1.
> 
> > > > nic_mac_loopback
> > > > is to set a port to loopback mode and out of it. It's not really
> > > > configuration but rather a mode change.
> > >
> > > What is this loopback for? Testing?
> >
> > Correct.
> 
> Loopback test is commonly implemented via ethtool -t

This debugfs entry is only to set the port to loopback mode, not running a loopback test.
Hence IMO adding a private flag is more suitable here and please correct me if I'm wrong.
But either way, doing that from ethtool instead of debugfs is not a good practice in our case.
Due to HW limitations, when we switch a port to/from loopback mode, we need to reset the device.
Since ethtool works on specific interface rather than an entire device, we'll need to reset the device 10 times in order to switch the entire device to loopback mode.
Moreover, running this command for one interface affects other interfaces which is not desirable when using ethtool AFAIK.
Is there any other acceptable debugfs-like mechanism for that?

Thanks,
Omer

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 05/15] habanalabs/gaudi: add NIC Ethernet support
  2020-09-14  9:52     ` Omer Shpigelman
@ 2020-09-14 16:47       ` Jakub Kicinski
  0 siblings, 0 replies; 44+ messages in thread
From: Jakub Kicinski @ 2020-09-14 16:47 UTC (permalink / raw)
  To: Omer Shpigelman
  Cc: Oded Gabbay, linux-kernel, netdev, SW_Drivers, gregkh, davem

On Mon, 14 Sep 2020 09:52:00 +0000 Omer Shpigelman wrote:
> On Thu, Sep 10, 2020 at 11:03 PM Jakub Kicinski <kuba@kernel.org> wrote:
> > On Thu, 10 Sep 2020 19:11:16 +0300 Oded Gabbay wrote:  
> > > +module_param(nic_rx_poll, int, 0444);  
> > MODULE_PARM_DESC(nic_rx_poll,  
> > > +	"Enable NIC Rx polling mode (0 = no, 1 = yes, default no)");  
> > 
> > If your chip does not support IRQ coalescing you can configure polling and the
> > timeout via ethtool -C, rather than a module parameter.  
> 
> I couldn't find an example for that in other drivers and I didn't see
> anything regarding polling mode in the parameters description of this
> ethtool callback.
> Can you please specify some pointer for that? Or in other words, what
> parameter can we use to enable polling/setting the timeout?

Look at stmmac, hip04_eth..

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 12/15] habanalabs/gaudi: add debugfs entries for the NIC
  2020-09-14 13:48             ` Omer Shpigelman
@ 2020-09-14 16:50               ` Jakub Kicinski
  2020-09-15 12:57                 ` Oded Gabbay
  0 siblings, 1 reply; 44+ messages in thread
From: Jakub Kicinski @ 2020-09-14 16:50 UTC (permalink / raw)
  To: Omer Shpigelman
  Cc: Oded Gabbay, Linux-Kernel@Vger. Kernel. Org, netdev, SW_Drivers,
	Greg Kroah-Hartman, David S. Miller

On Mon, 14 Sep 2020 13:48:14 +0000 Omer Shpigelman wrote:
> On Thu, Sep 10, 2020 at 11:31 PM Jakub Kicinski <kuba@kernel.org> wrote:
> > On Thu, 10 Sep 2020 23:17:59 +0300 Oded Gabbay wrote:  
> > > > Doesn't seem like this one shows any more information than can be
> > > > queried with ethtool, right?  
> > > correct, it just displays it in a format that is much more readable  
> > 
> > You can cat /sys/class/net/$ifc/carrier if you want 0/1.
> >   
> > > > > nic_mac_loopback
> > > > > is to set a port to loopback mode and out of it. It's not really
> > > > > configuration but rather a mode change.  
> > > >
> > > > What is this loopback for? Testing?  
> > >
> > > Correct.  
> > 
> > Loopback test is commonly implemented via ethtool -t  
> 
> This debugfs entry is only to set the port to loopback mode, not running a loopback test.
> Hence IMO adding a private flag is more suitable here and please correct me if I'm wrong.
> But either way, doing that from ethtool instead of debugfs is not a good practice in our case.
> Due to HW limitations, when we switch a port to/from loopback mode, we need to reset the device.
> Since ethtool works on specific interface rather than an entire device, we'll need to reset the device 10 times in order to switch the entire device to loopback mode.
> Moreover, running this command for one interface affects other interfaces which is not desirable when using ethtool AFAIK.
> Is there any other acceptable debugfs-like mechanism for that?

What's the use for a networking device which only communicates with
itself, other than testing?

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 12/15] habanalabs/gaudi: add debugfs entries for the NIC
  2020-09-14 16:50               ` Jakub Kicinski
@ 2020-09-15 12:57                 ` Oded Gabbay
  2020-09-16 16:38                   ` Jakub Kicinski
  0 siblings, 1 reply; 44+ messages in thread
From: Oded Gabbay @ 2020-09-15 12:57 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Omer Shpigelman, Linux-Kernel@Vger. Kernel. Org, netdev,
	SW_Drivers, Greg Kroah-Hartman, David S. Miller

On Mon, Sep 14, 2020 at 7:50 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Mon, 14 Sep 2020 13:48:14 +0000 Omer Shpigelman wrote:
> > On Thu, Sep 10, 2020 at 11:31 PM Jakub Kicinski <kuba@kernel.org> wrote:
> > > On Thu, 10 Sep 2020 23:17:59 +0300 Oded Gabbay wrote:
> > > > > Doesn't seem like this one shows any more information than can be
> > > > > queried with ethtool, right?
> > > > correct, it just displays it in a format that is much more readable
> > >
> > > You can cat /sys/class/net/$ifc/carrier if you want 0/1.
> > >
> > > > > > nic_mac_loopback
> > > > > > is to set a port to loopback mode and out of it. It's not really
> > > > > > configuration but rather a mode change.
> > > > >
> > > > > What is this loopback for? Testing?
> > > >
> > > > Correct.
> > >
> > > Loopback test is commonly implemented via ethtool -t
> >
> > This debugfs entry is only to set the port to loopback mode, not running a loopback test.
> > Hence IMO adding a private flag is more suitable here and please correct me if I'm wrong.
> > But either way, doing that from ethtool instead of debugfs is not a good practice in our case.
> > Due to HW limitations, when we switch a port to/from loopback mode, we need to reset the device.
> > Since ethtool works on specific interface rather than an entire device, we'll need to reset the device 10 times in order to switch the entire device to loopback mode.
> > Moreover, running this command for one interface affects other interfaces which is not desirable when using ethtool AFAIK.
> > Is there any other acceptable debugfs-like mechanism for that?
>
> What's the use for a networking device which only communicates with
> itself, other than testing?

No use, and we do have a suite of tests that runs from user-space on
the device after we move the interfaces to loopback mode.
The main problem, as Omer said, is that we have two H/W bugs:

1. Where you need to reset the entire SoC in case you want to move a
single interface into (or out of) loopback mode. So doing it via
ethtool will cause a reset to the entire SoC, and if you want to move
all 10 ports to loopback mode, you need to reset the device 10 times
before you can actually use that.

2. Our 10 ports are divided into 5 groups of 2 ports each, from H/W
POV. That means if you move port 0 to loopback mode, it will affect
port 1 (and vice-versa). I don't think we want that behavior.

That's why we need this specific exception to the rule and do it via
debugfs. I understand it is not common practice, but due to H/W bugs
we can't workaround, we ask this exception.
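
To make the arithmetic behind the two limitations concrete, here is a
minimal user-space sketch. It is not driver code. The pairing (0,1),
(2,3), ... is inferred from the "port 0 affects port 1" example above,
and all names (macro_of_port, sibling_port, resets_needed) are
hypothetical. The single-reset case assumes a device-wide control that
applies the whole port mask at once.

```c
#include <assert.h>
#include <stdbool.h>

#define NUM_PORTS       10u
#define PORTS_PER_MACRO 2u

/* Index of the H/W group (0..4) that a port belongs to. */
unsigned int macro_of_port(unsigned int port)
{
	assert(port < NUM_PORTS);
	return port / PORTS_PER_MACRO;
}

/* The other port in the same group; assumes pairs (0,1), (2,3), ... */
unsigned int sibling_port(unsigned int port)
{
	assert(port < NUM_PORTS);
	return port ^ 1u;
}

/* Ports whose mode is affected when 'port' is moved in/out of loopback. */
unsigned int affected_ports_mask(unsigned int port)
{
	return (1u << port) | (1u << sibling_port(port));
}

/*
 * Number of SoC resets needed to switch every port in 'ports_mask'.
 * per_interface == true models one request per netdev (one reset each);
 * false models a hypothetical device-wide request (one reset in total).
 */
unsigned int resets_needed(unsigned int ports_mask, bool per_interface)
{
	unsigned int n = 0;

	if (!per_interface)
		return ports_mask ? 1u : 0u;

	for (; ports_mask; ports_mask >>= 1)
		n += ports_mask & 1u;

	return n;
}
```

With all 10 ports selected (mask 0x3FF), the per-interface path costs 10
resets while the device-wide path costs 1, which is the gap described in
item 1 above.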

Thanks,
Oded

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 12/15] habanalabs/gaudi: add debugfs entries for the NIC
  2020-09-15 12:57                 ` Oded Gabbay
@ 2020-09-16 16:38                   ` Jakub Kicinski
  0 siblings, 0 replies; 44+ messages in thread
From: Jakub Kicinski @ 2020-09-16 16:38 UTC (permalink / raw)
  To: Oded Gabbay
  Cc: Omer Shpigelman, Linux-Kernel@Vger. Kernel. Org, netdev,
	SW_Drivers, Greg Kroah-Hartman, David S. Miller

On Tue, 15 Sep 2020 15:57:16 +0300 Oded Gabbay wrote:
> On Mon, Sep 14, 2020 at 7:50 PM Jakub Kicinski <kuba@kernel.org> wrote:
> > What's the use for a networking device which only communicates with
> > itself, other than testing?  
> 
> No use, and we do have a suite of tests that runs from user-space on
> the device after we move the interfaces to loopback mode.
> The main problem, as Omer said, is that we have two H/W bugs:
> 
> 1. Where you need to reset the entire SoC in case you want to move a
> single interface into (or out of) loopback mode. So doing it via
> ethtool will cause a reset to the entire SoC, and if you want to move
> all 10 ports to loopback mode, you need to reset the device 10 times
> before you can actually use that.
> 
> 2. Our 10 ports are divided into 5 groups of 2 ports each, from H/W
> POV. That means if you move port 0 to loopback mode, it will affect
> port 1 (and vice-versa). I don't think we want that behavior.
> 
> That's why we need this specific exception to the rule and do it via
> debugfs. I understand it is not common practice, but due to H/W bugs
> we can't workaround, we ask this exception.

Are those tests open source?

Are you sure you need this upstream? Are your users going to run those
tests?

^ permalink raw reply	[flat|nested] 44+ messages in thread

* [PATCH 06/15] habanalabs/gaudi: add NIC PHY code
  2020-09-10 15:03 Oded Gabbay
@ 2020-09-10 15:03 ` Oded Gabbay
  0 siblings, 0 replies; 44+ messages in thread
From: Oded Gabbay @ 2020-09-10 15:03 UTC (permalink / raw)
  To: linux-kernel, SW_Drivers; +Cc: gregkh, Omer Shpigelman

From: Omer Shpigelman <oshpigelman@habana.ai>

Configure the NIC PHY (physical layer). The PHY is configured with the
correct polarity and Tx taps depending on the card type.

After the initial configuration, the PHY flow contains the following:
- Auto-negotiation (if enabled)
- PHY F/W tuning
- Physical Coding Sublayer (PCS) link check
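
The ordering of these steps can be illustrated with a small user-space
model of the dispatch done by check_link_status() in this patch. This is
only a sketch: struct port_state, enum link_step, next_step() and
next_poll_ms() are hypothetical names, while the flags and the
1000/500/1 ms polling intervals are taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mirror of the per-port flags used by the patch. */
struct port_state {
	bool auto_neg_enable;
	bool auto_neg_resolved;
	bool phy_fw_tuned;
	bool pcs_link;
};

enum link_step {
	STEP_AUTO_NEG,    /* F/W tuning for auto-negotiation + link training */
	STEP_FW_TUNING,   /* PHY F/W tuning */
	STEP_ACQUIRE_PCS, /* wait for the first PCS link */
	STEP_CHECK_PCS,   /* periodic check of an established link */
};

/* Which step the periodic work would run for the current state. */
enum link_step next_step(const struct port_state *s)
{
	assert(s);

	if (s->phy_fw_tuned)
		return s->pcs_link ? STEP_CHECK_PCS : STEP_ACQUIRE_PCS;

	if (s->auto_neg_enable && !s->auto_neg_resolved)
		return STEP_AUTO_NEG;

	return STEP_FW_TUNING;
}

/* Delay before the next run: slow when the link is up, fast while tuning. */
unsigned int next_poll_ms(const struct port_state *s)
{
	assert(s);

	if (s->pcs_link)
		return 1000;
	if (s->phy_fw_tuned)
		return 500;
	return 1;
}
```

In the patch the delay is computed after the step has run, so a caller
of this model would update the flags first and then call next_poll_ms().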

After acquiring the initial PCS link, it is checked periodically. Once we
detect that there is no link, we fall back to PHY F/W tuning or even
Auto-negotiation to re-acquire the link.
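
The decision to escalate from a plain PCS link drop to a full PHY
reconfiguration is based on how many failures fall inside a time frame
(see update_pcs_link_failure() in the diff). The following user-space
sketch models that sliding window. It uses a plain ring buffer instead
of a kfifo, and struct fail_window, window_init() and record_failure()
are hypothetical names. Clearing the window when the threshold is hit is
an assumption standing in for the port state reset that follows a
reconfiguration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_FAILS 8u /* hypothetical capacity, must be >= threshold */

struct fail_window {
	int64_t ts_ms[MAX_FAILS]; /* ring buffer of failure timestamps */
	unsigned int head;        /* index of the oldest entry */
	unsigned int cnt;         /* number of stored failures */
	unsigned int threshold;   /* failures needed to trigger a reconfig */
	int64_t frame_ms;         /* time frame the failures must fit in */
};

void window_init(struct fail_window *w, unsigned int threshold,
		 int64_t frame_ms)
{
	assert(threshold >= 1 && threshold <= MAX_FAILS);
	w->head = 0;
	w->cnt = 0;
	w->threshold = threshold;
	w->frame_ms = frame_ms;
}

/* Record one link failure; returns true if the PHY should be reconfigured. */
bool record_failure(struct fail_window *w, int64_t now_ms)
{
	w->ts_ms[(w->head + w->cnt) % MAX_FAILS] = now_ms;
	w->cnt++;

	if (w->cnt < w->threshold)
		return false;

	/*
	 * Only the oldest entry needs checking: if it is inside the time
	 * frame, every later failure is inside it too.
	 */
	if (now_ms - w->ts_ms[w->head] <= w->frame_ms) {
		/* Assumption: reconfiguration clears the failure history. */
		w->head = 0;
		w->cnt = 0;
		return true;
	}

	/* The oldest failure is too old to matter, drop it. */
	w->head = (w->head + 1) % MAX_FAILS;
	w->cnt--;
	return false;
}
```

For example, with a threshold of 3 and a 1000 ms frame, failures at 0,
5000 and 5200 ms do not trigger a reconfiguration because the first one
is too old, but a further failure at 5400 ms does.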

Currently we use Auto-negotiation only because it is a prerequisite for
link training (physical layer quality improvement) and not for setting the
transmission parameters. As a result, the Auto-negotiation is currently
supported only between Gaudi cards.

Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
---
 drivers/misc/habanalabs/gaudi/Makefile    |    2 +-
 drivers/misc/habanalabs/gaudi/gaudi_nic.c |  454 +++++++-
 drivers/misc/habanalabs/gaudi/gaudi_nic.h |   17 +
 drivers/misc/habanalabs/gaudi/gaudi_phy.c | 1272 +++++++++++++++++++++
 4 files changed, 1742 insertions(+), 3 deletions(-)
 create mode 100644 drivers/misc/habanalabs/gaudi/gaudi_phy.c

diff --git a/drivers/misc/habanalabs/gaudi/Makefile b/drivers/misc/habanalabs/gaudi/Makefile
index 24e14cff563d..c5143cf6f025 100644
--- a/drivers/misc/habanalabs/gaudi/Makefile
+++ b/drivers/misc/habanalabs/gaudi/Makefile
@@ -2,4 +2,4 @@
 HL_GAUDI_FILES := gaudi/gaudi.o gaudi/gaudi_hwmgr.o gaudi/gaudi_security.o \
 	gaudi/gaudi_coresight.o
 
-HL_GAUDI_FILES += gaudi/gaudi_nic.o
+HL_GAUDI_FILES += gaudi/gaudi_nic.o gaudi/gaudi_phy.o
diff --git a/drivers/misc/habanalabs/gaudi/gaudi_nic.c b/drivers/misc/habanalabs/gaudi/gaudi_nic.c
index df41de95ba58..ff08cfc81e69 100644
--- a/drivers/misc/habanalabs/gaudi/gaudi_nic.c
+++ b/drivers/misc/habanalabs/gaudi/gaudi_nic.c
@@ -687,13 +687,26 @@ static void config_port_mac(struct gaudi_nic_device *gaudi_nic)
 	}
 }
 
+static void phy_start_stop(struct gaudi_nic_device *gaudi_nic, bool is_start)
+{
+	int i;
+
+	for (i = NIC_MAC_LANES_START ; i < NIC_MAC_NUM_OF_LANES ; i++) {
+		if (!(gaudi_nic->power_up_mask & BIT(i)))
+			continue;
+
+		gaudi_nic_phy_start_stop(gaudi_nic, i, is_start);
+	}
+}
+
 static int hw_config(struct gaudi_nic_device *gaudi_nic)
 {
 	struct hl_device *hdev = gaudi_nic->hdev;
 	struct gaudi_device *gaudi = hdev->asic_specific;
 	u64 mac_addr = 0, tmr_addr;
 	u32 port = gaudi_nic->port, data_rate, speed = gaudi_nic->speed;
-	int i;
+	int i, rc;
+	bool do_auto_neg;
 
 	for (i = 0 ; i < ETH_ALEN ; i++) {
 		mac_addr <<= 8;
@@ -729,6 +742,26 @@ static int hw_config(struct gaudi_nic_device *gaudi_nic)
 
 	gaudi_nic->data_rate = data_rate;
 
+	if (gaudi->nic_phy_config_fw && !gaudi_nic->mac_loopback) {
+		for (i = NIC_MAC_LANES_START ; i < NIC_MAC_NUM_OF_LANES ; i++) {
+			if (!(gaudi_nic->power_up_mask & BIT(i)))
+				continue;
+
+			do_auto_neg = gaudi_nic->auto_neg_enable &&
+					(gaudi_nic->auto_neg_mask & BIT(i));
+
+			rc = gaudi_nic_phy_power_up(gaudi_nic, i, do_auto_neg);
+			if (rc) {
+				dev_err(hdev->dev,
+					"PHY power up failed for port %d\n",
+					port);
+				return rc;
+			}
+		}
+
+		phy_start_stop(gaudi_nic, true);
+	}
+
 	/* if no need in macro configuration, do only port configuration */
 	if (gaudi_nic->do_macro_cfg) {
 		config_port_mac(gaudi_nic);
@@ -1191,6 +1224,364 @@ static void port_reset_state(struct gaudi_nic_device *gaudi_nic)
 	gaudi_nic->uncorrectable_errors_cnt = 0;
 }
 
+static void phy_reconfig(struct gaudi_nic_device *gaudi_nic)
+{
+	struct hl_device *hdev = gaudi_nic->hdev;
+	struct gaudi_device *gaudi = gaudi_nic->hdev->asic_specific;
+	u32 port = gaudi_nic->port;
+	int i, rc;
+
+	if (!gaudi->nic_phy_config_fw)
+		return;
+
+	dev_dbg(hdev->dev, "reconfiguring PHY, port %d\n", port);
+
+	if (gaudi_nic->auto_neg_enable) {
+		for (i = NIC_MAC_LANES_START ; i < NIC_MAC_NUM_OF_LANES ; i++) {
+			if (!(gaudi_nic->auto_neg_mask & BIT(i)))
+				continue;
+
+			rc = gaudi_nic_phy_fw_config_auto_neg(gaudi_nic, i);
+			if (rc)
+				dev_dbg(hdev->dev,
+					"F/W reconfig autoneg failed, port: %d, lane: %d\n",
+					port, i);
+		}
+	} else {
+		for (i = NIC_MAC_LANES_START ; i < NIC_MAC_NUM_OF_LANES ; i++) {
+			if (!(gaudi_nic->power_up_mask & BIT(i)))
+				continue;
+
+			rc = gaudi_nic_phy_power_up(gaudi_nic, i, false);
+			if (rc) {
+				dev_err(hdev->dev,
+					"PHY reconfig power up failed for port %d\n",
+					port);
+				break;
+			}
+		}
+	}
+
+	port_reset_state(gaudi_nic);
+}
+
+static enum link_status update_pcs_link_failure(
+					struct gaudi_nic_device *gaudi_nic)
+{
+	struct hl_device *hdev = gaudi_nic->hdev;
+	struct gaudi_device *gaudi = gaudi_nic->hdev->asic_specific;
+	struct kfifo *pcs_fifo = &gaudi_nic->pcs_fail_fifo;
+	ktime_t now, before;
+	u32 port = gaudi_nic->port;
+	int count;
+
+	if (!gaudi_nic->auto_neg_enable)
+		return PCS_DOWN;
+
+	now = ktime_get();
+
+	count = kfifo_in(pcs_fifo, &now, sizeof(now));
+	if (count != sizeof(now)) {
+		dev_err(hdev->dev,
+			"Failed to push to PCS fifo, size: %d, count: %d, port: %d\n",
+			gaudi_nic->pcs_fail_cnt, count, port);
+		return PCS_DOWN;
+	}
+
+	gaudi_nic->pcs_fail_cnt++;
+
+	if (gaudi_nic->pcs_fail_cnt < gaudi->nic_pcs_fail_threshold)
+		return PCS_DOWN;
+
+	/*
+	 * Here we reached the threshold count of failures to reconfigure the
+	 * link. Now need to check if all of the failures are in the needed time
+	 * frame. It is sufficient to check the first item in the queue as it is
+	 * the earliest failure and if it is in the needed time frame, all the
+	 * rest of the failures are in it too.
+	 */
+	count = kfifo_out_peek(pcs_fifo, &before, sizeof(before));
+	if (count != sizeof(before))
+		dev_err(hdev->dev,
+			"Failed to peek in PCS fifo, size: %d, count: %d, port: %d\n",
+			gaudi_nic->pcs_fail_cnt, count, port);
+
+	if (ktime_ms_delta(now, before) <=
+			(gaudi->nic_pcs_fail_time_frame * MSEC_PER_SEC)) {
+		dev_dbg(hdev->dev,
+			"PHY reconfig due to PCS link failure cnt, port: %d\n",
+			port);
+		return FAIL_RECONFIG;
+	}
+
+	/*
+	 * The earliest failure is not in the needed time frame, hence
+	 * we can remove it.
+	 */
+	count = kfifo_out(pcs_fifo, &before, sizeof(before));
+	if (count != sizeof(before))
+		dev_err(hdev->dev,
+			"Failed to pop from PCS fifo, size: %d, count: %d, port: %d\n",
+			gaudi_nic->pcs_fail_cnt, count, port);
+
+	gaudi_nic->pcs_fail_cnt--;
+
+	return PCS_DOWN;
+}
+
+static void reset_tx(struct gaudi_nic_device *gaudi_nic)
+{
+	struct hl_device *hdev = gaudi_nic->hdev;
+	int i;
+
+	/* This temporary WA is only for HLS external ports */
+	if ((hdev->card_type != cpucp_card_type_pmc) ||
+			(BIT(gaudi_nic->port) & ~hdev->nic_ports_ext_mask))
+		return;
+
+	for (i = NIC_MAC_LANES_START ; i < NIC_MAC_NUM_OF_LANES ; i++)
+		if (gaudi_nic->fw_tuning_mask & BIT(i))
+			gaudi_nic_phy_reset_tx(gaudi_nic, i);
+}
+
+static enum link_status _check_pcs_link(struct gaudi_nic_device *gaudi_nic)
+{
+	struct hl_device *hdev = gaudi_nic->hdev;
+	u32 port = gaudi_nic->port, pcs_val, mac_val,
+		start_lane = __ffs(gaudi_nic->fw_tuning_mask);
+	int i, rc;
+
+	for (i = NIC_MAC_LANES_START ; i < NIC_MAC_NUM_OF_LANES ; i++) {
+		if (!(gaudi_nic->fw_tuning_mask & BIT(i)))
+			continue;
+
+		rc = gaudi_nic_phy_check_link_status(gaudi_nic, i);
+		if (rc)
+			return PHY_DOWN;
+	}
+
+	/* need to check the first lane only */
+	mac_val = gaudi_nic_mac_read(gaudi_nic, start_lane, "mac", 0x40);
+
+	if (mac_val & 1)
+		gaudi_nic->pcs_local_fault_cnt++;
+	else if (gaudi_nic->pcs_local_fault_cnt)
+		gaudi_nic->pcs_local_fault_cnt--;
+
+	if (mac_val & 2)
+		gaudi_nic->pcs_remote_fault_cnt++;
+	else if (gaudi_nic->pcs_remote_fault_cnt)
+		gaudi_nic->pcs_remote_fault_cnt--;
+
+	if (gaudi_nic->pcs_remote_fault_cnt == PCS_FAULT_THRESHOLD) {
+		dev_dbg(hdev->dev,
+			"PHY reconfig due to PCS remote fault cnt, port: %d\n",
+			port);
+		return FAULT_RECONFIG;
+	}
+
+	/* need to check the first lane only */
+	pcs_val = gaudi_nic_mac_read(gaudi_nic, start_lane, "xpcs", 0x20);
+
+	if ((pcs_val >> 12) & 1)
+		return LINK_UP;
+
+	return PCS_DOWN;
+}
+
+static void check_pcs_link(struct gaudi_nic_device *gaudi_nic)
+{
+	struct hl_device *hdev = gaudi_nic->hdev;
+	struct gaudi_device *gaudi = gaudi_nic->hdev->asic_specific;
+	u32 port = gaudi_nic->port;
+	enum link_status link_status;
+
+	if (!gaudi->nic_check_link)
+		return;
+
+	link_status = _check_pcs_link(gaudi_nic);
+	if ((link_status == PCS_DOWN) || (link_status == PHY_DOWN)) {
+		/* Try again to overcome a momentary glitch */
+		msleep(PCS_LINK_RETRY_MSEC);
+
+		link_status = _check_pcs_link(gaudi_nic);
+
+		if (link_status == LINK_UP)
+			dev_info(hdev->dev, "PCS link restored, port %d\n",
+					port);
+	}
+
+	if (link_status == LINK_UP)
+		return;
+
+	set_port_status(gaudi_nic, false);
+	gaudi_nic->pcs_link = false;
+	gaudi_nic->last_pcs_link_drop_ts = ktime_get();
+
+	dev_info(hdev->dev, "%s lost signal, port %d\n",
+			(link_status == PHY_DOWN) ? "PHY" : "PCS", port);
+
+	/* TODO: fix the bug in the retimer to remove this Tx reset WA */
+	/*
+	 * No need to update about the PCS failure if we already need to
+	 * reconfigure the PHY.
+	 */
+	if (link_status == FAULT_RECONFIG)
+		reset_tx(gaudi_nic);
+	else
+		link_status = update_pcs_link_failure(gaudi_nic);
+
+	if ((link_status == FAULT_RECONFIG) ||
+			(link_status == FAIL_RECONFIG))
+		phy_reconfig(gaudi_nic);
+}
+
+static void acquire_pcs_link(struct gaudi_nic_device *gaudi_nic)
+{
+	struct hl_device *hdev = gaudi_nic->hdev;
+	u32 port = gaudi_nic->port, pcs_val,
+		start_lane = __ffs(gaudi_nic->fw_tuning_mask);
+
+	/* need to check the first lane only */
+	pcs_val = gaudi_nic_mac_read(gaudi_nic, start_lane, "xpcs", 0x20);
+	gaudi_nic->pcs_link = (pcs_val >> 12) & 1;
+	gaudi_nic->retry_cnt++;
+
+	if (gaudi_nic->pcs_link) {
+		dev_info(hdev->dev, "PCS link up, port %d\n", port);
+		set_port_status(gaudi_nic, true);
+		gaudi_nic->retry_cnt = 0;
+	} else if (gaudi_nic->retry_cnt == PCS_LINK_CNT) {
+		if (ktime_after(gaudi_nic->last_fw_tuning_ts,
+				gaudi_nic->last_pcs_link_drop_ts))
+			dev_dbg(hdev->dev,
+				"PHY_reconfig due to PCS link down after F/W tuning, port %d\n",
+				port);
+		else
+			dev_dbg(hdev->dev,
+				"PHY reconfig due to PCS link cnt, port %d\n",
+				port);
+		phy_reconfig(gaudi_nic);
+	}
+}
+
+static void do_fw_tuning(struct gaudi_nic_device *gaudi_nic)
+{
+	struct hl_device *hdev = gaudi_nic->hdev;
+	u32 port = gaudi_nic->port;
+	int i, rc = 0;
+
+	for (i = NIC_MAC_LANES_START ; i < NIC_MAC_NUM_OF_LANES ; i++) {
+		if (!(gaudi_nic->fw_tuning_mask & BIT(i)))
+			continue;
+
+		rc = gaudi_nic_phy_fw_tuning(gaudi_nic, i, true);
+		if (rc) {
+			if (rc == -EAGAIN) {
+				if (gaudi_nic->retry_cnt++ == FW_TUNING_CNT) {
+					dev_dbg(hdev->dev,
+						"PHY reconfig due to F/W tuning cnt, port %d, lane %d\n",
+						port, i);
+					phy_reconfig(gaudi_nic);
+				}
+			} else {
+				dev_dbg(hdev->dev,
+					"PHY F/W tuning failed for port %d, lane %d, rc %d\n",
+					port, i, rc);
+				phy_reconfig(gaudi_nic);
+			}
+			break;
+		}
+	}
+
+	if (!rc) {
+		gaudi_nic->phy_fw_tuned = true;
+		gaudi_nic->retry_cnt = 0;
+		gaudi_nic->last_fw_tuning_ts = ktime_get();
+	}
+}
+
+static void do_fw_tuning_auto_neg(struct gaudi_nic_device *gaudi_nic)
+{
+	struct hl_device *hdev = gaudi_nic->hdev;
+	u32 port = gaudi_nic->port;
+	int i, rc;
+
+	for (i = NIC_MAC_LANES_START ; i < NIC_MAC_NUM_OF_LANES ; i++) {
+		if (!(gaudi_nic->auto_neg_mask & BIT(i)))
+			continue;
+
+		rc = gaudi_nic_phy_fw_tuning(gaudi_nic, i, false);
+		if (rc) {
+			if (rc != -EAGAIN)
+				dev_dbg(hdev->dev,
+					"PHY auto neg F/W tuning failed, port %d, lane %d, rc %d\n",
+					port, i, rc);
+			return;
+		}
+	}
+
+	for (i = NIC_MAC_LANES_START ; i < NIC_MAC_NUM_OF_LANES ; i++) {
+		if (!(gaudi_nic->fw_tuning_mask & BIT(i)))
+			continue;
+
+		rc = gaudi_nic_phy_config_pam4_link_training(gaudi_nic, i);
+		if (rc) {
+			if (rc == -EAGAIN) {
+				if (gaudi_nic->retry_cnt++ ==
+						FW_LINK_TRAINING_CNT) {
+					dev_dbg(hdev->dev,
+						"PHY reconfig due to PAM4 cnt, port: %d, lane: %d\n",
+						port, i);
+					phy_reconfig(gaudi_nic);
+				}
+			} else {
+				dev_dbg(hdev->dev,
+					"PHY auto neg F/W speed config failed, port %d, lane %d, rc %d\n",
+					port, i, rc);
+				phy_reconfig(gaudi_nic);
+			}
+
+			return;
+		}
+	}
+
+	dev_dbg(hdev->dev, "auto neg done, port: %d\n", port);
+	gaudi_nic->auto_neg_resolved = true;
+	gaudi_nic->retry_cnt = 0;
+	do_fw_tuning(gaudi_nic);
+}
+
+static void check_link_status(struct work_struct *work)
+{
+	struct gaudi_nic_device *gaudi_nic = container_of(work,
+							struct gaudi_nic_device,
+							link_status_work.work);
+	u32 timeout_ms;
+
+	if (gaudi_nic->phy_fw_tuned) {
+		if (gaudi_nic->pcs_link)
+			check_pcs_link(gaudi_nic);
+		else
+			acquire_pcs_link(gaudi_nic);
+	} else {
+		if (gaudi_nic->auto_neg_enable && !gaudi_nic->auto_neg_resolved)
+			do_fw_tuning_auto_neg(gaudi_nic);
+		else
+			do_fw_tuning(gaudi_nic);
+	}
+
+	if (gaudi_nic->pcs_link)
+		timeout_ms = 1000;
+	else if (gaudi_nic->phy_fw_tuned)
+		timeout_ms = 500;
+	else
+		timeout_ms = 1;
+
+	schedule_delayed_work(&gaudi_nic->link_status_work,
+				msecs_to_jiffies(timeout_ms));
+}
+
 static int _gaudi_nic_sw_init(struct gaudi_nic_device *gaudi_nic)
 {
 	struct hl_device *hdev = gaudi_nic->hdev;
@@ -1576,7 +1967,13 @@ static int port_open(struct gaudi_nic_device *gaudi_nic)
 		napi_enable(&gaudi_nic->napi);
 	}
 
-	set_port_status(gaudi_nic, true);
+	if (gaudi->nic_phy_config_fw && !gaudi_nic->mac_loopback) {
+		INIT_DELAYED_WORK(&gaudi_nic->link_status_work,
+					check_link_status);
+		schedule_delayed_work(&gaudi_nic->link_status_work, 0);
+	} else {
+		set_port_status(gaudi_nic, true);
+	}
 
 	gaudi_nic->port_open = true;
 
@@ -1628,10 +2025,17 @@ static void port_close(struct gaudi_nic_device *gaudi_nic)
 	gaudi_nic->port_open = false;
 	gaudi_nic->active = false;
 
+	if (gaudi->nic_phy_config_fw && !gaudi_nic->mac_loopback)
+		cancel_delayed_work_sync(&gaudi_nic->link_status_work);
+
 	/* Print if not in hard reset flow e.g. from ifconfig */
 	if (gaudi_nic->pcs_link && !hdev->hard_reset_pending)
 		dev_info(hdev->dev, "port %d was closed\n", port);
 
+	/* stop F/W so the peer port will also lose link */
+	if (gaudi->nic_phy_config_fw && !gaudi_nic->mac_loopback)
+		phy_start_stop(gaudi_nic, false);
+
 	port_reset_state(gaudi_nic);
 
 	kfifo_free(&gaudi_nic->pcs_fail_fifo);
@@ -1882,6 +2286,19 @@ static int port_register(struct hl_device *hdev, int port)
 	ether_addr_copy(ndev->dev_addr,
 		hdev->asic_prop.cpucp_nic_info.mac_addrs[port].mac_addr);
 
+	/*
+	 * Reset the NIC macro PHY before each port configures the PHY.
+	 * This function resets all 4 lanes in the PHY macro, therefore only
+	 * one of the two ports should call it.
+	 */
+	if (gaudi->nic_phy_config_fw && gaudi_nic->do_macro_cfg) {
+		rc = gaudi_nic_phy_reset_macro(gaudi_nic);
+		if (rc)
+			dev_err(hdev->dev,
+				"PHY macro reset failed for port %d\n",
+				port);
+	}
+
 	if (register_netdev(ndev)) {
 		dev_err(hdev->dev,
 			"Could not register netdevice, port: %d\n", port);
@@ -2051,6 +2468,24 @@ int gaudi_nic_ports_init(struct hl_device *hdev)
 				cpu_to_le32((card_location >> 22) & 0x7);
 	}
 
+	if (gaudi->nic_phy_load_fw) {
+		rc = gaudi_nic_phy_has_fw(hdev);
+		if (rc) {
+			dev_err(hdev->dev, "NIC F/W file was not found\n");
+			return rc;
+		}
+
+		rc = gaudi_nic_phy_fw_load_all(hdev);
+		if (rc) {
+			dev_err(hdev->dev, "NIC F/W load for all failed\n");
+			return rc;
+		}
+	}
+
+	if (gaudi->nic_phy_config_fw)
+		dev_dbg(hdev->dev, "NIC F/W CRC: 0x%x\n",
+				gaudi_nic_phy_get_crc(hdev));
+
 	for (i = 0 ; i < NIC_NUMBER_OF_MACROS ; i++) {
 		gaudi->nic_macros[i].idx = i;
 		gaudi->nic_macros[i].num_of_lanes = NIC_LANES_2;
@@ -2272,6 +2707,21 @@ void gaudi_nic_ports_reopen(struct hl_device *hdev)
 		gaudi_nic = &gaudi->nic_devices[i];
 		port = gaudi_nic->port;
 
+		/*
+		 * Reset the NIC macro PHY before each port configures the
+		 * PHY. This function resets all 4 lanes in the PHY macro,
+		 * therefore only one of the two ports should call it.
+		 * This must be done before we check if the port is enabled,
+		 * because the PHY reset is needed regardless.
+		 */
+		if (gaudi->nic_phy_config_fw && gaudi_nic->do_macro_cfg) {
+			rc = gaudi_nic_phy_reset_macro(gaudi_nic);
+			if (rc)
+				dev_err(hdev->dev,
+					"PHY macro reset failed for port %d\n",
+					port);
+		}
+
 		/*
 		 * It could be that the port was shutdown by 'ifconfig down',
 		 * and there is no need in reopening it.
diff --git a/drivers/misc/habanalabs/gaudi/gaudi_nic.h b/drivers/misc/habanalabs/gaudi/gaudi_nic.h
index 34bcf0514d30..1b2c42fb927c 100644
--- a/drivers/misc/habanalabs/gaudi/gaudi_nic.h
+++ b/drivers/misc/habanalabs/gaudi/gaudi_nic.h
@@ -333,5 +333,22 @@ int gaudi_nic_port_reset(struct gaudi_nic_device *gaudi_nic);
 bool disabled_or_in_reset(struct gaudi_nic_device *gaudi_nic);
 u64 gaudi_nic_read_mac_stat_counter(struct hl_device *hdev, u32 port, int idx,
 					bool is_rx);
+int gaudi_nic_phy_reset_macro(struct gaudi_nic_device *gaudi_nic);
+int gaudi_nic_phy_power_up(struct gaudi_nic_device *gaudi_nic, int lane,
+				bool do_auto_neg);
+int gaudi_nic_phy_has_fw(struct hl_device *hdev);
+int gaudi_nic_phy_fw_tuning(struct gaudi_nic_device *gaudi_nic, int lane,
+				bool check_status);
+int gaudi_nic_phy_fw_load_all(struct hl_device *hdev);
+int gaudi_nic_phy_check_link_status(struct gaudi_nic_device *gaudi_nic,
+					int lane);
+int gaudi_nic_phy_config_pam4_link_training(struct gaudi_nic_device *gaudi_nic,
+						int lane);
+int gaudi_nic_phy_fw_config_auto_neg(struct gaudi_nic_device *gaudi_nic,
+					int lane);
+u16 gaudi_nic_phy_get_crc(struct hl_device *hdev);
+void gaudi_nic_phy_reset_tx(struct gaudi_nic_device *gaudi_nic, int lane);
+void gaudi_nic_phy_start_stop(struct gaudi_nic_device *gaudi_nic, int lane,
+				bool is_start);
 
 #endif /* GAUDI_NIC_DRV_H_ */
diff --git a/drivers/misc/habanalabs/gaudi/gaudi_phy.c b/drivers/misc/habanalabs/gaudi/gaudi_phy.c
new file mode 100644
index 000000000000..e96188d9e47f
--- /dev/null
+++ b/drivers/misc/habanalabs/gaudi/gaudi_phy.c
@@ -0,0 +1,1272 @@
+// SPDX-License-Identifier: GPL-2.0
+
+/*
+ * Copyright 2019 HabanaLabs, Ltd.
+ * All Rights Reserved.
+ */
+
+#include "gaudi_nic.h"
+#include "../include/gaudi/asic_reg/gaudi_regs.h"
+
+#include <linux/module.h>
+#include <linux/firmware.h>
+#include <asm/unaligned.h>
+
+#define HL_PHY_DEBUG 0
+
+#define GAUDI_PHY_FW_FILE	"habanalabs/gaudi/gaudi_nic_fw.bin"
+
+#define PHY_READ_COUNTS_PER_MS	1000
+#define PHY_FW_SIZE		0x1020
+#define PHY_FW_FINISHED		(1 << 2)
+#define PHY_FW_ERROR		(1 << 3)
+
+static void phy_write_all(struct hl_device *hdev, u32 addr, u32 data)
+{
+	int lane, port;
+	u32 phy_base = mmNIC0_PHY_BASE - CFG_BASE;
+
+	for (port = 0 ; port < 10 ; port += 2)
+		for (lane = 0 ; lane < 4 ; lane++) {
+			NIC_MACRO_WREG32_PORT(phy_base + 0xF60 + lane * 4, addr,
+						port);
+			/* only the lower 16 bits are in use */
+			NIC_MACRO_WREG32_PORT(phy_base - 0x8000 + 0x2000 * lane,
+						data & 0xFFFF, port);
+		}
+}
+
+static void phy_write_port(struct hl_device *hdev, int port, int lane, u32 addr,
+				u32 data)
+{
+	u32 phy_base = mmNIC0_PHY_BASE - CFG_BASE;
+
+	NIC_MACRO_WREG32_PORT(phy_base + 0xF60 + lane * 4, addr, port);
+	/* only the lower 16 bits are in use */
+	NIC_MACRO_WREG32_PORT(phy_base - 0x8000 + 0x2000 * lane, data & 0xFFFF,
+				port);
+}
+
+static void phy_write(struct gaudi_nic_device *gaudi_nic, int lane, u32 addr,
+			u32 data)
+{
+	struct hl_device *hdev = gaudi_nic->hdev;
+	u32 phy_base = mmNIC0_PHY_BASE - CFG_BASE;
+
+	NIC_MACRO_WREG32(phy_base + 0xF60 + lane * 4, addr);
+	/* only the lower 16 bits are in use */
+	NIC_MACRO_WREG32(phy_base - 0x8000 + 0x2000 * lane, data & 0xFFFF);
+}
+
+static u32 phy_read_port(struct hl_device *hdev, int port, int lane, u32 addr)
+{
+	u32 phy_base = mmNIC0_PHY_BASE - CFG_BASE;
+
+	NIC_MACRO_WREG32_PORT(phy_base + 0xF60 + lane * 4, addr, port);
+	/* only the lower 16 bits are in use */
+	return NIC_MACRO_RREG32_PORT(phy_base - 0x8000 + 0x2000 * lane, port) &
+					0xFFFF;
+}
+
+static u32 phy_read(struct gaudi_nic_device *gaudi_nic, int lane, u32 addr)
+{
+	struct hl_device *hdev = gaudi_nic->hdev;
+	u32 phy_base = mmNIC0_PHY_BASE - CFG_BASE;
+
+	NIC_MACRO_WREG32(phy_base + 0xF60 + lane * 4, addr);
+
+	/* only the lower 16 bits are in use */
+	return NIC_MACRO_RREG32(phy_base - 0x8000 + 0x2000 * lane) & 0xFFFF;
+}
+
+static void phy_write_mask(struct gaudi_nic_device *gaudi_nic, int lane,
+				u32 addr, u32 raw_data, u32 mask)
+{
+	struct hl_device *hdev = gaudi_nic->hdev;
+	u32 phy_base = mmNIC0_PHY_BASE - CFG_BASE;
+	u32 data;
+
+	NIC_MACRO_WREG32(phy_base + 0xF60 + lane * 4, addr);
+
+	data = (NIC_MACRO_RREG32(phy_base - 0x8000 + 0x2000 * lane)) & 0xFFFF;
+	data = (data & ~mask) | (((raw_data << (__ffs(mask) % 32))) & 0xFFFF);
+
+	NIC_MACRO_WREG32(phy_base - 0x8000 + 0x2000 * lane, data);
+}
+
+static u32 twos_to_int(s32 twos_val, u32 bit_width)
+{
+	return (u32) ((s32) (twos_val) -
+				((s32) ((twos_val << 1) & (1 << bit_width))));
+}
+
+static int fw_cmd_port(struct hl_device *hdev, int port, int lane, u32 cmd,
+			u32 detail, u32 expected_res, u32 *res_ptr)
+{
+	u32 res, val;
+	int checks;
+
+	if (detail)
+		phy_write_port(hdev, port, lane, 0x9816, detail);
+
+	phy_write_port(hdev, port, lane, 0x9815, cmd);
+
+	checks = 0;
+	do {
+		usleep_range(1000, 2000);
+		res = phy_read_port(hdev, port, lane, 0x9815);
+		if (checks++ > PHY_READ_COUNTS_PER_MS) {
+			dev_err(hdev->dev, "timeout for PHY cmd 0x%x\n", cmd);
+			return -ETIMEDOUT;
+		}
+	} while (res == cmd);
+
+	val = (res >> 8) & 0xF;
+	if (val != expected_res) {
+		dev_err(hdev->dev, "cmd 0x%x returned error 0x%x\n", cmd, val);
+		return -EFAULT;
+	}
+
+	*res_ptr = res;
+
+	return 0;
+}
+
+static int fw_cmd(struct gaudi_nic_device *gaudi_nic, int lane, u32 cmd,
+			u32 detail, u32 expected_res, u32 *res_ptr)
+{
+	struct hl_device *hdev = gaudi_nic->hdev;
+	u32 port = gaudi_nic->port;
+	u32 res, val;
+	int checks;
+
+	if (detail)
+		phy_write(gaudi_nic, lane, 0x9816, detail);
+
+	phy_write(gaudi_nic, lane, 0x9815, cmd);
+
+	checks = 0;
+	do {
+		usleep_range(1000, 2000);
+		res = phy_read(gaudi_nic, lane, 0x9815);
+		if (checks++ > PHY_READ_COUNTS_PER_MS) {
+			dev_dbg(hdev->dev,
+				"timeout for PHY cmd 0x%x port %d lane %d\n",
+				cmd, port, lane);
+			return -ETIMEDOUT;
+		}
+	} while (res == cmd);
+
+	val = (res >> 8) & 0xF;
+	if (val != expected_res) {
+		dev_dbg(hdev->dev,
+			"cmd 0x%x returned error 0x%x port %d lane %d\n", cmd,
+			val, port, lane);
+		return -EFAULT;
+	}
+
+	*res_ptr = res;
+
+	return 0;
+}
+
+static int fw_hash_port(struct hl_device *hdev, int port, int lane, u32 *hash)
+{
+	u32 res, low_word;
+	int rc;
+
+	rc = fw_cmd_port(hdev, port, lane, 0xF000, 0, 0xF, &res);
+	if (rc) {
+		dev_err(hdev->dev, "F/W hash failed for port %d lane %d\n",
+			port, lane);
+		return rc;
+	}
+
+	low_word = phy_read_port(hdev, port, lane, 0x9816);
+
+	*hash = ((res & 0xFF) << 16) | low_word;
+
+	return 0;
+}
+
+static void set_pll(struct gaudi_nic_device *gaudi_nic, int lane, u32 data_rate,
+			bool pam4)
+{
+	u32 pll_n_val = 0, pll_cap_val = 0;
+	bool div4 = true; /* for easy debug in the future */
+
+	phy_write_mask(gaudi_nic, lane, 0xFF, 1, 1 << 5);
+
+	if (!pam4)
+		phy_write_mask(gaudi_nic, lane, 0x179, data_rate == NIC_DR_10,
+				1);
+
+	if (data_rate == NIC_DR_50) {
+		if (div4)
+			pll_n_val = 170;
+		else
+			pll_n_val = 42;
+
+		pll_cap_val = 10;
+	} else if (data_rate == NIC_DR_25) {
+		if (div4)
+			pll_n_val = 165;
+		else
+			pll_n_val = 41;
+
+		pll_cap_val = 12;
+	} else if (data_rate == NIC_DR_10) {
+		if (div4)
+			pll_n_val = 132;
+		else
+			pll_n_val = 33;
+
+		pll_cap_val = 34;
+	}
+
+	phy_write_mask(gaudi_nic, lane, 0xFD, pll_n_val, 0xFF80);
+	phy_write_mask(gaudi_nic, lane, 0xFC, pll_cap_val, 0xFC00);
+}
+
+static void set_tx_taps(struct gaudi_nic_device *gaudi_nic, int lane,
+			s32 tx_pre2, s32 tx_pre1, s32 tx_main, s32 tx_post1,
+			s32 tx_post2)
+{
+	phy_write_mask(gaudi_nic, lane, 0xAD, twos_to_int(tx_pre2, 8), 0xFF00);
+	phy_write_mask(gaudi_nic, lane, 0xAB, twos_to_int(tx_pre1, 8), 0xFF00);
+	phy_write_mask(gaudi_nic, lane, 0xA9, twos_to_int(tx_main, 8), 0xFF00);
+	phy_write_mask(gaudi_nic, lane, 0xA7, twos_to_int(tx_post1, 8), 0xFF00);
+	phy_write_mask(gaudi_nic, lane, 0xA5, twos_to_int(tx_post2, 8), 0xFF00);
+}
+
+static void config_nrz_tx(struct gaudi_nic_device *gaudi_nic, int lane,
+				bool half_rate)
+{
+	phy_write(gaudi_nic, lane, 0xAF, 0xF83E);
+	phy_write(gaudi_nic, lane, 0xB0, 0x4802);
+	phy_write_mask(gaudi_nic, lane, 0xB0, half_rate ? 1 : 0, 1);
+	phy_write_mask(gaudi_nic, lane, 0xB0, 0, 0x800);
+	phy_write_mask(gaudi_nic, lane, 0xB0, 1, 0x800);
+	phy_write(gaudi_nic, lane, 0xA0, 0xE300);
+	set_tx_taps(gaudi_nic, lane, 0, -4, 25, 0, 0);
+}
+
+static void config_pam4_tx(struct gaudi_nic_device *gaudi_nic, int lane)
+{
+	struct hl_device *hdev = gaudi_nic->hdev;
+	struct gaudi_device *gaudi = hdev->asic_specific;
+	u32 lane_idx = (gaudi_nic->port >> 1) * NIC_MAC_NUM_OF_LANES + lane;
+	s32 *taps;
+
+	taps = gaudi->nic_pam4_tx_taps[lane_idx].taps;
+
+	phy_write(gaudi_nic, lane, 0xAF, 0xF83E);
+	phy_write(gaudi_nic, lane, 0xB0, 0);
+	phy_write(gaudi_nic, lane, 0xB0, 0x800);
+	phy_write(gaudi_nic, lane, 0xB0, 0);
+	phy_write(gaudi_nic, lane, 0xA0, 0xEF00);
+	set_tx_taps(gaudi_nic, lane, taps[0], taps[1], taps[2], taps[3],
+			taps[4]);
+}
+
+static void pol(struct gaudi_nic_device *gaudi_nic, int lane, bool pam4,
+		u32 tx_pol, u32 rx_pol)
+{
+	phy_write_mask(gaudi_nic, lane, 0xA0, tx_pol, 0x20);
+	phy_write_mask(gaudi_nic, lane, 0x161, rx_pol, 0x4000); /* nrz */
+	phy_write_mask(gaudi_nic, lane, 0x43, rx_pol, 0x80); /* pam4 */
+}
+
+static void msblsb(struct gaudi_nic_device *gaudi_nic, int lane, u32 tx_msblsb,
+		u32 rx_msblsb)
+{
+	phy_write_mask(gaudi_nic, lane, 0xAF, tx_msblsb, 0x400);
+	phy_write_mask(gaudi_nic, lane, 0x43, rx_msblsb, 0x8000);
+}
+
+static void gc(struct gaudi_nic_device *gaudi_nic, int lane, u32 tx_gc,
+		u32 rx_gc)
+{
+	phy_write_mask(gaudi_nic, lane, 0xAF, tx_gc, 0x200);
+	phy_write_mask(gaudi_nic, lane, 0x42, rx_gc, 1);
+}
+
+static void pc(struct gaudi_nic_device *gaudi_nic, int lane, u32 tx_pc,
+		u32 rx_pc)
+{
+	phy_write_mask(gaudi_nic, lane, 0xAF, tx_pc, 0x100);
+	phy_write_mask(gaudi_nic, lane, 0x42, rx_pc, 2);
+}
+
+static void set_prbs_type(struct gaudi_nic_device *gaudi_nic, int lane,
+		bool pam4, char *pat)
+{
+	u32 prbs_mode_sel_addr;
+	u32 prbs_mode_sel_mask;
+	u32 pat_sel = 0;
+
+	if (pam4) {
+		prbs_mode_sel_addr = 0x43;
+		prbs_mode_sel_mask = 0x60;
+	} else {
+		prbs_mode_sel_addr = 0x161;
+		prbs_mode_sel_mask = 0x3000;
+	}
+
+	if (pam4) {
+		if (!strncmp(pat, "PRBS9", strlen(pat)))
+			pat_sel = 0;
+		else if (!strncmp(pat, "PRBS13", strlen(pat)))
+			pat_sel = 1;
+		else if (!strncmp(pat, "PRBS15", strlen(pat)))
+			pat_sel = 2;
+		else if (!strncmp(pat, "PRBS31", strlen(pat)))
+			pat_sel = 3;
+	} else {
+		if (!strncmp(pat, "PRBS9", strlen(pat)))
+			pat_sel = 0;
+		else if (!strncmp(pat, "PRBS15", strlen(pat)))
+			pat_sel = 1;
+		else if (!strncmp(pat, "PRBS23", strlen(pat)))
+			pat_sel = 2;
+		else if (!strncmp(pat, "PRBS31", strlen(pat)))
+			pat_sel = 3;
+	}
+
+	phy_write_mask(gaudi_nic, lane, 0xA0, pat_sel, 0x300);
+	phy_write_mask(gaudi_nic, lane, prbs_mode_sel_addr, pat_sel,
+			prbs_mode_sel_mask);
+}
+
+static void get_pol_tx_rx(struct gaudi_nic_device *gaudi_nic, u32 lane_idx,
+				u32 *pol_tx, u32 *pol_rx)
+{
+	struct hl_device *hdev = gaudi_nic->hdev;
+	u32 card_location =
+			le32_to_cpu(hdev->asic_prop.cpucp_info.card_location);
+
+	switch (hdev->card_type) {
+	case cpucp_card_type_pci:
+		switch (lane_idx) {
+		case 0 ... 3:
+		case 10 ... 11:
+			*pol_tx = 0;
+			*pol_rx = 0;
+			break;
+		case 5 ... 8:
+		case 12:
+		case 16:
+			*pol_tx = 0;
+			*pol_rx = 1;
+			break;
+		case 15:
+		case 19:
+			*pol_tx = 1;
+			*pol_rx = 0;
+			break;
+		case 4:
+		case 9:
+		case 13 ... 14:
+		case 17 ... 18:
+			*pol_tx = 1;
+			*pol_rx = 1;
+			break;
+		default:
+			dev_err(hdev->dev, "PCI NIC %d wrong lane idx %d\n",
+				gaudi_nic->port, lane_idx);
+			break;
+		}
+		break;
+
+	case cpucp_card_type_pmc:
+		*pol_tx = *pol_rx = 0;
+		switch (card_location) {
+		case 0:
+			switch (lane_idx) {
+			case 0 ... 1:
+			case 3:
+			case 5 ... 6:
+			case 8 ... 9:
+			case 12 ... 15:
+				fallthrough;
+			case 17:
+			case 19:
+				*pol_rx = 1;
+				break;
+			case 2:
+			case 16:
+			case 18:
+				*pol_tx = 1;
+				break;
+			default:
+				break;
+			}
+			break;
+		case 1:
+			switch (lane_idx) {
+			case 0 ... 1:
+			case 3 ... 6:
+			case 8 ... 9:
+			case 12 ... 15:
+				fallthrough;
+			case 17:
+			case 19:
+				*pol_rx = 1;
+				break;
+			case 2:
+			case 16:
+			case 18:
+				*pol_tx = 1;
+				break;
+			default:
+				break;
+			}
+			break;
+		case 2:
+			switch (lane_idx) {
+			case 0 ... 1:
+			case 3:
+			case 5 ... 6:
+			case 8 ... 9:
+			case 12 ... 15:
+				fallthrough;
+			case 17:
+			case 19:
+				*pol_rx = 1;
+				break;
+			case 2:
+			case 16:
+			case 18:
+				*pol_tx = 1;
+				break;
+			default:
+				break;
+			}
+			break;
+		case 3:
+			switch (lane_idx) {
+			case 0 ... 1:
+			case 3:
+			case 5 ... 6:
+			case 8 ... 9:
+			case 12 ... 15:
+				fallthrough;
+			case 17:
+			case 19:
+				*pol_rx = 1;
+				break;
+			case 2:
+			case 16:
+			case 18:
+				*pol_tx = 1;
+				break;
+			default:
+				break;
+			}
+			break;
+		case 4:
+			switch (lane_idx) {
+			case 0 ... 1:
+			case 3:
+			case 5 ... 6:
+			case 8 ... 9:
+			case 12 ... 15:
+				fallthrough;
+			case 17:
+			case 19:
+				*pol_rx = 1;
+				break;
+			case 2:
+			case 16:
+			case 18:
+				*pol_tx = 1;
+				break;
+			default:
+				break;
+			}
+			break;
+		case 5:
+			switch (lane_idx) {
+			case 0 ... 1:
+			case 3:
+			case 5 ... 6:
+			case 8 ... 10:
+			case 12 ... 15:
+				fallthrough;
+			case 17:
+			case 19:
+				*pol_rx = 1;
+				break;
+			case 2:
+			case 16:
+			case 18:
+				*pol_tx = 1;
+				break;
+			default:
+				break;
+			}
+			break;
+		case 6:
+			switch (lane_idx) {
+			case 0 ... 1:
+			case 3:
+			case 5 ... 6:
+			case 8 ... 9:
+			case 12 ... 15:
+				fallthrough;
+			case 17:
+			case 19:
+				*pol_rx = 1;
+				break;
+			case 2:
+			case 16:
+			case 18:
+				*pol_tx = 1;
+				break;
+			default:
+				break;
+			}
+			break;
+		case 7:
+			switch (lane_idx) {
+			case 0 ... 1:
+			case 3 ... 6:
+			case 8 ... 9:
+			case 12 ... 15:
+				fallthrough;
+			case 17:
+			case 19:
+				*pol_rx = 1;
+				break;
+			case 2:
+			case 16:
+			case 18:
+				*pol_tx = 1;
+				break;
+			default:
+				break;
+			}
+			break;
+		}
+		break;
+	default:
+		dev_err(hdev->dev, "wrong card type %d\n", hdev->card_type);
+		break;
+	}
+}
+
+static void config_connection(struct gaudi_nic_device *gaudi_nic, int lane,
+				bool pam4, bool do_auto_neg)
+{
+	struct hl_device *hdev = gaudi_nic->hdev;
+	struct gaudi_device *gaudi = hdev->asic_specific;
+	struct cpucp_nic_info *nic_info = &hdev->asic_prop.cpucp_nic_info;
+	char *prbs = "PRBS31";
+	u32 pol_tx = 0;
+	u32 pol_rx = 0;
+	u32 msblsb_tx = 0;
+	u32 msblsb_rx = 0;
+	u32 gc_tx = 1;
+	u32 gc_rx = 1;
+	u32 pc_tx = 0;
+	u32 pc_rx = 0;
+	u32 lane_idx = (gaudi_nic->port >> 1) * NIC_MAC_NUM_OF_LANES + lane;
+
+	if (!pam4)
+		gc_tx = gc_rx = 0;
+
+	if (gaudi->nic_use_fw_polarity) {
+		pol_tx =
+			(le64_to_cpu(nic_info->pol_tx_mask[0]) >> lane_idx) & 1;
+		pol_rx =
+			(le64_to_cpu(nic_info->pol_rx_mask[0]) >> lane_idx) & 1;
+	} else {
+		get_pol_tx_rx(gaudi_nic, lane_idx, &pol_tx, &pol_rx);
+	}
+
+	phy_write_mask(gaudi_nic, lane, 0xF7, 1, 0x1000);
+	pol(gaudi_nic, lane, pam4, pol_tx, pol_rx);
+	msblsb(gaudi_nic, lane, msblsb_tx, msblsb_rx);
+	gc(gaudi_nic, lane, gc_tx, gc_rx);
+	pc(gaudi_nic, lane, pc_tx, pc_rx);
+
+	set_prbs_type(gaudi_nic, lane, pam4, prbs);
+}
+
+static void functional_mode(struct gaudi_nic_device *gaudi_nic, int lane,
+				bool pam4)
+{
+	if (!pam4) {
+		phy_write_mask(gaudi_nic, lane, 0xA0, 0, 0x2000);
+		phy_write_mask(gaudi_nic, lane, 0x161, 0, 0x400);
+	} else {
+		phy_write_mask(gaudi_nic, lane, 0xA0, 0, 0x2000);
+		phy_write_mask(gaudi_nic, lane, 0x43, 0, 0x10);
+	}
+}
+
+static u32 get_fw_reg(struct gaudi_nic_device *gaudi_nic, int lane, u32 fw_addr)
+{
+	u32 ignore;
+
+	fw_cmd(gaudi_nic, lane, 0xE010, fw_addr, 0xE, &ignore);
+
+	return phy_read(gaudi_nic, lane, 0x9812);
+}
+
+static void config_pam4_fw_rx(struct gaudi_nic_device *gaudi_nic, int lane)
+{
+	phy_write_mask(gaudi_nic, lane, 0x980F, 0x1, 0x1000);
+	phy_write_mask(gaudi_nic, lane, 0x980F, 0x1, 0x0400);
+	phy_write_mask(gaudi_nic, lane, 0x980F, 0x1, 0x0800);
+	phy_write_mask(gaudi_nic, lane, 0x980F, 0x1, 0x0200);
+
+	phy_write(gaudi_nic, lane, 0x43, 0x8CFA);
+	phy_write(gaudi_nic, lane, 0x44, 0x1035);
+	phy_write(gaudi_nic, lane, 0x45, 0x1008);
+}
+
+static int fw_config_speed_nrz(struct gaudi_nic_device *gaudi_nic, int lane,
+				u32 data_rate, u32 speed, bool half_rate,
+				bool fmode, bool pam4)
+{
+	struct hl_device *hdev = gaudi_nic->hdev;
+	u32 ignore;
+	int rc, i;
+
+	/* clear go bit */
+	phy_write_mask(gaudi_nic, lane, 0x980F, 0, 0x8000);
+
+	rc = fw_cmd(gaudi_nic, lane, 0x80C0, speed, 0x8, &ignore);
+	if (rc) {
+		dev_err(hdev->dev,
+			"F/W cmd failed for speed nrz configuration of lane %d\n",
+			lane);
+		return rc;
+	}
+
+	config_nrz_tx(gaudi_nic, lane, half_rate);
+	phy_write_mask(gaudi_nic, lane, 0x0161, 0x1D, 0xFC00);
+	config_connection(gaudi_nic, lane, pam4, false);
+	functional_mode(gaudi_nic, lane, pam4);
+
+	/* clock configuration */
+	for (i = 0 ; i < 4 ; i++)
+		if (i == 0)
+			phy_write(gaudi_nic, i, 0x00C9, 0x390);
+		else
+			phy_write(gaudi_nic, i, 0x00C9, 0x310);
+
+	set_pll(gaudi_nic, lane, data_rate, pam4);
+	phy_write_mask(gaudi_nic, lane, 0x980F, 1, 0x8000);
+
+	return 0;
+}
+
+int gaudi_nic_phy_fw_config_auto_neg(struct gaudi_nic_device *gaudi_nic,
+					int lane)
+{
+	struct hl_device *hdev = gaudi_nic->hdev;
+	struct gaudi_device *gaudi = hdev->asic_specific;
+	u32 ignore;
+	u64 basepage = 0x000080000001;
+	int rc;
+
+	usleep_range(500, 1000);
+
+	/* clear go bit */
+	phy_write_mask(gaudi_nic, lane, 0x980F, 0, 0x8000);
+
+	set_pll(gaudi_nic, lane, NIC_DR_25, false);
+
+	/* Disable AN/LT lane swapping */
+	phy_write_mask(gaudi_nic, lane, 0x8440, 0, 0x8000);
+	config_nrz_tx(gaudi_nic, lane, false);
+
+	/* config_nrz_fw_rx */
+	phy_write_mask(gaudi_nic, lane, 0x0161, 0x1D, 0xFC00);
+	config_connection(gaudi_nic, lane, false, true);
+
+	phy_write_mask(gaudi_nic, lane, 0x8300, 7, 0xE000);
+
+	/* AN mode */
+	phy_write(gaudi_nic, lane, 0x8010, basepage & 0xffff);
+	phy_write(gaudi_nic, lane, 0x8011, (basepage >> 16) & 0xffff);
+	phy_write(gaudi_nic, lane, 0x8012, (basepage >> 32) & 0xffff);
+
+	/* IEEE */
+	phy_write_mask(gaudi_nic, lane, 0x8300, 1, 0x1000);
+
+	if (gaudi->nic_phy_auto_neg_lpbk)
+		phy_write_mask(gaudi_nic, lane, 0x8300, 1, 0x400);
+
+	/* set FW to start AN */
+	rc = fw_cmd(gaudi_nic, lane, 0x8000, 0, 8, &ignore);
+	if (rc) {
+		dev_err(hdev->dev,
+			"F/W cmd 0x8000 failed for auto neg, port %d, lane %d\n",
+			gaudi_nic->port, lane);
+		return rc;
+	}
+
+	/* set go bit */
+	phy_write_mask(gaudi_nic, lane, 0x980F, 1, 0x8000);
+
+	return 0;
+}
+
+static int fw_config_speed_pam4(struct gaudi_nic_device *gaudi_nic, int lane,
+				u32 data_rate, u32 speed, bool fmode, bool pam4)
+{
+	struct hl_device *hdev = gaudi_nic->hdev;
+	u32 ignore;
+	int rc;
+
+	dev_dbg(hdev->dev,
+		"port: %d, lane: %d, data rate: %d, pam4: %d, speed: %d\n",
+		gaudi_nic->port, lane, data_rate, pam4, speed);
+
+	/* clear go bit */
+	phy_write_mask(gaudi_nic, lane, 0x980F, 0, 0x8000);
+
+	phy_write_mask(gaudi_nic, lane, 0x8440, 0, 0x8000);
+
+	rc = fw_cmd(gaudi_nic, lane, 0x80D0, speed, 0x8, &ignore);
+	if (rc) {
+		dev_err(hdev->dev,
+			"F/W cmd failed for speed pam4 configuration of lane %d\n",
+			lane);
+		return rc;
+	}
+
+	config_pam4_tx(gaudi_nic, lane);
+	config_pam4_fw_rx(gaudi_nic, lane);
+	config_connection(gaudi_nic, lane, pam4, false);
+	functional_mode(gaudi_nic, lane, pam4);
+
+	/* set go bit */
+	phy_write_mask(gaudi_nic, lane, 0x980F, 1, 0x8000);
+
+	return 0;
+}
+
+int gaudi_nic_phy_config_pam4_link_training(struct gaudi_nic_device *gaudi_nic,
+						int lane)
+{
+	struct hl_device *hdev = gaudi_nic->hdev;
+	u32 port = gaudi_nic->port;
+	u32 ignore, speed = 9;
+	int rc;
+
+#if HL_PHY_DEBUG
+	dev_dbg(hdev->dev, "NIC %d lane: %d, speed: %d\n", port, lane, speed);
+#endif
+
+	/* clear go bit */
+	phy_write_mask(gaudi_nic, lane, 0x980F, 0, 0x8000);
+
+	/* Disable lane swapping */
+	phy_write_mask(gaudi_nic, lane, 0x8440, 0, 0x8000);
+
+	/* Enable Link Training */
+	speed |= 0x100;
+
+	config_pam4_tx(gaudi_nic, lane);
+	phy_write_mask(gaudi_nic, lane, 0xA0, 0, 0x2000);
+	config_pam4_fw_rx(gaudi_nic, lane);
+	config_connection(gaudi_nic, lane, true, false);
+
+	rc = fw_cmd(gaudi_nic, lane, 0x80D0, speed, 0x8, &ignore);
+	if (rc) {
+		dev_err(hdev->dev,
+			"F/W cmd failed for speed pam4 configuration of port %d lane %d\n",
+			port, lane);
+		return rc;
+	}
+
+	phy_write_mask(gaudi_nic, lane, 0xAF, 0, 0x200);
+	phy_write_mask(gaudi_nic, lane, 0xAF, 0, 0x100);
+	phy_write_mask(gaudi_nic, lane, 0x42, 0, 0x2);
+	phy_write_mask(gaudi_nic, lane, 0x42, 0, 0x1);
+
+	/* set go bit */
+	phy_write_mask(gaudi_nic, lane, 0x980F, 1, 0x8000);
+
+	return 0;
+}
+
+static int fw_config(struct gaudi_nic_device *gaudi_nic, int lane,
+			u32 data_rate, bool fmode, bool pam4)
+{
+	struct hl_device *hdev = gaudi_nic->hdev;
+
+	set_pll(gaudi_nic, lane, data_rate, pam4);
+
+	if (data_rate == NIC_DR_10)
+		return fw_config_speed_nrz(gaudi_nic, lane, data_rate, 1, 1,
+						fmode, pam4);
+	else if (data_rate == NIC_DR_25 || data_rate == NIC_DR_26)
+		return fw_config_speed_nrz(gaudi_nic, lane, data_rate, 3, 0,
+						fmode, pam4);
+	else if (data_rate == NIC_DR_50)
+		return fw_config_speed_pam4(gaudi_nic, lane, data_rate, 9,
+						fmode, pam4);
+
+	dev_err(hdev->dev, "invalid data_rate %d\n", data_rate);
+
+	return -EFAULT;
+}
+
+static int fw_crc_port(struct hl_device *hdev, int port, int lane, u16 *crc)
+{
+	u32 res;
+	int rc;
+
+	rc = fw_cmd_port(hdev, port, lane, 0xF001, 0, 0xF, &res);
+	if (rc) {
+		dev_err(hdev->dev, "F/W crc failed for port %d lane %d\n", port,
+			lane);
+		return rc;
+	}
+
+	*crc = phy_read_port(hdev, port, lane, 0x9816) & 0xFFFF;
+
+	return 0;
+}
+
+int gaudi_nic_phy_has_fw(struct hl_device *hdev)
+{
+	const struct firmware *fw;
+	int rc;
+
+	rc = request_firmware(&fw, GAUDI_PHY_FW_FILE, hdev->dev);
+	if (rc) {
+		dev_err(hdev->dev, "Firmware file %s is not found!\n",
+				GAUDI_PHY_FW_FILE);
+		return rc;
+	}
+
+	if (fw->size < PHY_FW_SIZE) {
+		dev_err(hdev->dev, "Illegal %s firmware size %zu\n",
+				GAUDI_PHY_FW_FILE, fw->size);
+		rc = -EFAULT;
+	}
+
+	release_firmware(fw);
+
+	return rc;
+}
+
+static void fw_unload_all(struct hl_device *hdev, bool pam4)
+{
+	phy_write_all(hdev, 0x9814, 0xFFF0);
+	phy_write_all(hdev, 0x980D, 0xAAA);
+	phy_write_all(hdev, 0x980D, 0);
+
+	msleep(100);
+
+	phy_write_all(hdev, 0x9814, 0);
+
+	if (pam4)
+		phy_write_all(hdev, 0x11, 0);
+	else
+		phy_write_all(hdev, 0x10B, 0);
+}
+
+u16 gaudi_nic_phy_get_crc(struct hl_device *hdev)
+{
+	u16 crc = 0;
+
+	fw_crc_port(hdev, 0, 0, &crc);
+
+	return crc;
+}
+
+int gaudi_nic_phy_fw_load_all(struct hl_device *hdev)
+{
+	const struct firmware *fw;
+	const void *fw_data;
+	u32 entry_point, length, ram_addr, sections, status, checks, hash = 0,
+		checksum = 0x800C, fw0 = 0x9F00, fw1 = 0x980D, fw2 = 0x9814;
+	u16 mdio_data, crc = 0;
+	int rc, i, j, port, data_ptr = 0, lane = 0;
+	bool pam4 = true; /* for debug */
+
+	fw_unload_all(hdev, pam4);
+
+	rc = request_firmware(&fw, GAUDI_PHY_FW_FILE, hdev->dev);
+	if (rc) {
+		dev_err(hdev->dev, "Firmware file %s is not found!\n",
+				GAUDI_PHY_FW_FILE);
+		return rc;
+	}
+
+	if (fw->size < PHY_FW_SIZE) {
+		dev_err(hdev->dev, "Illegal %s firmware size %zu\n",
+				GAUDI_PHY_FW_FILE, fw->size);
+		release_firmware(fw);
+		return -EFAULT;
+	}
+
+	fw_data = (const void *) fw->data;
+	fw_data += 0x1000;
+
+	/* skip hash, crc and date */
+	entry_point = get_unaligned_be32(fw_data + 8);
+	length = get_unaligned_be32(fw_data + 12);
+	ram_addr = get_unaligned_be32(fw_data + 16);
+
+	dev_dbg(hdev->dev, "entry_point: 0x%x\n", entry_point);
+	dev_dbg(hdev->dev, "length: 0x%x\n", length);
+
+	fw_data += 20;
+
+	sections = DIV_ROUND_UP(length, 24);
+
+	dev_dbg(hdev->dev, "sections: %d\n", sections);
+
+	phy_write_all(hdev, fw2, 0xFFF0);
+	phy_write_all(hdev, fw1, 0x0AAA);
+	phy_write_all(hdev, fw1, 0);
+
+	msleep(500);
+
+	checks = 0;
+	do {
+		usleep_range(10000, 20000);
+		status = phy_read_port(hdev, 0, 0, fw2);
+		dev_dbg(hdev->dev, "lane: %d, status: 0x%x\n", lane, status);
+		if (checks++ > PHY_READ_COUNTS_PER_MS) {
+			dev_err(hdev->dev,
+				"failed to load NIC F/W, fw2 timeout 0x%x\n",
+				status);
+			release_firmware(fw);
+			return -ETIMEDOUT;
+		}
+	} while (status);
+
+	phy_write_all(hdev, fw2, 0);
+
+	for (i = 0 ; i <= sections ; i++) {
+		checksum = 0x800C;
+		phy_write_all(hdev, fw0 + 12, ram_addr >> 16);
+		phy_write_all(hdev, fw0 + 13, ram_addr & 0xFFFF);
+		checksum += (ram_addr >> 16) + (ram_addr & 0xFFFF);
+		for (j = 0 ; j < 12 ; j++) {
+			if (data_ptr >= length)
+				mdio_data = 0;
+			else
+				mdio_data =
+					get_unaligned_be16(fw_data + data_ptr);
+
+			phy_write_all(hdev, fw0 + j, mdio_data);
+			checksum += mdio_data;
+			data_ptr += 2;
+			ram_addr += 2;
+		}
+
+		phy_write_all(hdev, fw0 + 14, (~checksum + 1) & 0xFFFF);
+		phy_write_all(hdev, fw0 + 15, 0x800C);
+
+		checks = 0;
+
+		do {
+			usleep_range(1000, 2000);
+			status = phy_read_port(hdev, 0, 0, fw0 + 15);
+			if (checks++ > PHY_READ_COUNTS_PER_MS) {
+				dev_err(hdev->dev,
+					"failed to load NIC F/W, fw0 timeout 0x%x\n",
+					status);
+				release_firmware(fw);
+				return -ETIMEDOUT;
+			}
+		} while (status == 0x800C);
+	}
+
+	phy_write_all(hdev, fw0 + 12, entry_point >> 16);
+	phy_write_all(hdev, fw0 + 13, entry_point & 0xFFFF);
+	checksum = (entry_point >> 16) + (entry_point & 0xFFFF) + 0x4000;
+	phy_write_all(hdev, fw0 + 14, (~checksum + 1) & 0xFFFF);
+	phy_write_all(hdev, fw0 + 15, 0x4000);
+
+	for (port = 0 ; port < 1 ; port += 2)
+		for (lane = 0 ; lane < 1 ; lane++) {
+			fw_crc_port(hdev, port, lane, &crc);
+			dev_dbg(hdev->dev, "port: %d lane: %d crc: 0x%x\n",
+				port, lane, crc);
+			fw_hash_port(hdev, port, lane, &hash);
+			dev_dbg(hdev->dev, "port: %d lane: %d hash: 0x%x\n",
+				port, lane, hash);
+		}
+	release_firmware(fw);
+	return 0;
+}
+
+static u32 fw_tuning_counter(struct gaudi_nic_device *gaudi_nic, int lane)
+{
+	return get_fw_reg(gaudi_nic, lane, 5);
+}
+
+static u32 fw_reset_counter(struct gaudi_nic_device *gaudi_nic, int lane)
+{
+	return get_fw_reg(gaudi_nic, lane, 4);
+}
+
+static void print_eye(struct gaudi_nic_device *gaudi_nic, int lane, bool pam4)
+{
+	struct hl_device *hdev = gaudi_nic->hdev;
+	u32 dac, eye, mask, val1, val2;
+	s32 plus_margin, minus_margin, result, diff;
+	int pam4_eye[3], eye_index, i, sel;
+
+	if (pam4) {
+		dac = (phy_read(gaudi_nic, lane, 0x28) & 0x1E0) >> 5;
+		for (eye_index = 0; eye_index < 3; eye_index++) {
+			result = 0xffff;
+			for (i = 0; i < 3; i++) {
+				sel = 3 * i + eye_index;
+				phy_write_mask(gaudi_nic, lane, 0x88, sel,
+						0xF00);
+				phy_write_mask(gaudi_nic, lane, 0x88, sel,
+						0xF000);
+
+				msleep(100);
+
+				val1 = phy_read(gaudi_nic, lane, 0x32);
+				plus_margin = (val1 & 0xFFF0) >> 4;
+				if (plus_margin > 0x7ff)
+					plus_margin = plus_margin - 0x1000;
+
+				val1 = phy_read(gaudi_nic, lane, 0x32);
+				val2 = phy_read(gaudi_nic, lane, 0x33);
+				minus_margin = ((val1 & 0xF) << 8) +
+						((val2 & 0xFF00) >> 8);
+				if (minus_margin > 0x7ff)
+					minus_margin = minus_margin - 0x1000;
+
+				diff = plus_margin - minus_margin;
+				if (diff < result)
+					result = diff;
+			}
+
+			pam4_eye[eye_index] =
+					(result * (100 + (50 * dac))) / 2048;
+		}
+
+		dev_dbg(hdev->dev,
+			"NIC PAM4 dac: %d eye0: %d eye1: %d eye2: %d\n", dac,
+			pam4_eye[0], pam4_eye[1], pam4_eye[2]);
+	} else {
+		mask = 0xF000;
+		dac = (phy_read(gaudi_nic, lane, 0x17F) & mask) >> __ffs(mask);
+		mask = 0xFFF;
+		eye = (phy_read(gaudi_nic, lane, 0x12A) & mask) >> __ffs(mask);
+
+		dev_dbg(hdev->dev, "dac: %d, eye: %d\n", dac, eye);
+
+		if (eye > 0)
+			dev_dbg(hdev->dev,
+				"NIC port %d lane %d: F/W eye is %d\n",
+				gaudi_nic->port, lane,
+				(eye * (200 + 50 * dac)) / 2048);
+		else
+			dev_err(hdev->dev,
+				"NIC port %d lane %d: F/W got no eye\n",
+				gaudi_nic->port, lane);
+	}
+}
+
+int gaudi_nic_phy_check_link_status(struct gaudi_nic_device *gaudi_nic,
+					int lane)
+{
+	u32 phy_status;
+#if HL_PHY_DEBUG
+	bool signal_detect;
+#endif
+	bool phy_ready, pam4 = gaudi_nic->data_rate == NIC_DR_50;
+
+	if (pam4) {
+		phy_status = phy_read(gaudi_nic, lane, 0x6A);
+		phy_ready = ((phy_status & 0x8000) >> 15) & 1;
+#if HL_PHY_DEBUG
+		signal_detect = ((phy_status & 0x80) >> 7) & 1;
+#endif
+	} else {
+		phy_status = phy_read(gaudi_nic, lane, 0x12E);
+		phy_ready = ((phy_status & 0x4) >> 2) & 1;
+#if HL_PHY_DEBUG
+		signal_detect = ((phy_status & 0x8) >> 3) & 1;
+#endif
+	}
+
+#if HL_PHY_DEBUG
+	{
+		struct hl_device *hdev = gaudi_nic->hdev;
+
+		dev_dbg_ratelimited(hdev->dev,
+			"port: %d, lane: %d, phy ready: %d, signal detect: %d\n",
+			gaudi_nic->port, lane, phy_ready, signal_detect);
+	}
+#endif
+
+	return phy_ready ? 0 : -EFAULT;
+}
+
+int gaudi_nic_phy_fw_tuning(struct gaudi_nic_device *gaudi_nic, int lane,
+				bool check_status)
+{
+	struct hl_device *hdev = gaudi_nic->hdev;
+	u32 status, port = gaudi_nic->port;
+	bool pam4 = gaudi_nic->data_rate == NIC_DR_50;
+
+	fw_tuning_counter(gaudi_nic, lane);
+	fw_reset_counter(gaudi_nic, lane);
+	status = phy_read(gaudi_nic, lane, 0x9811);
+
+	if (status & PHY_FW_FINISHED) {
+		if (status & PHY_FW_ERROR) {
+			dev_dbg(hdev->dev, "NIC %d lane %d F/W tuning failed\n",
+				port, lane);
+			return -EFAULT;
+		}
+#if HL_PHY_DEBUG
+		dev_dbg(hdev->dev,
+			"NIC %d lane %d F/W Tuning is done\n", port, lane);
+#endif
+	} else {
+		return -EAGAIN;
+	}
+
+	if (!gaudi_nic->auto_neg_enable) {
+		phy_write_mask(gaudi_nic, lane, 0x14D, 1, 1 << 15);
+		print_eye(gaudi_nic, lane, pam4);
+	} else if (!check_status) {
+		return 0;
+	}
+
+	return gaudi_nic_phy_check_link_status(gaudi_nic, lane);
+}
+
+int gaudi_nic_phy_power_up(struct gaudi_nic_device *gaudi_nic, int lane,
+				bool do_auto_neg)
+{
+	struct hl_device *hdev = gaudi_nic->hdev;
+	u32 data_rate = gaudi_nic->data_rate;
+	bool pam4 = data_rate == NIC_DR_50, fmode = false;
+	int rc;
+
+	dev_dbg(hdev->dev, "PHY power up port %d lane %d auto_neg: %d\n",
+		gaudi_nic->port, lane, do_auto_neg);
+
+	/* F/W configurations */
+	if (gaudi_nic->auto_neg_enable) {
+		if (do_auto_neg) {
+			rc = gaudi_nic_phy_fw_config_auto_neg(gaudi_nic, lane);
+			if (rc) {
+				dev_err(hdev->dev,
+					"F/W configuration failed for NIC PHY\n");
+				return rc;
+			}
+		}
+	} else {
+		rc = fw_config(gaudi_nic, lane, data_rate, fmode, pam4);
+		if (rc) {
+			dev_err(hdev->dev,
+				"F/W configuration failed for NIC PHY\n");
+			return rc;
+		}
+	}
+
+	return 0;
+}
+
+int gaudi_nic_phy_reset_macro(struct gaudi_nic_device *gaudi_nic)
+{
+	struct hl_device *hdev = gaudi_nic->hdev;
+	s32 chip_reset_addr = 0x980D;
+	bool fmode = false;
+	int rc, i;
+
+	dev_dbg(hdev->dev, "PHY reset macro, port %d\n", gaudi_nic->port);
+
+	/* soft reset */
+	for (i = 0 ; i < 4 ; i++)
+		phy_write(gaudi_nic, i, chip_reset_addr, 0x888);
+
+	usleep_range(500, 1000);
+
+	/* clock configuration */
+	for (i = 0 ; i < 4 ; i++)
+		if (i == 0)
+			phy_write(gaudi_nic, i, 0x00C9, 0x390);
+		else
+			phy_write(gaudi_nic, i, 0x00C9, 0x310);
+
+	for (i = 0 ; i < 4 ; i++) {
+		phy_write(gaudi_nic, i, 0x8000, 0xC000);
+		phy_write(gaudi_nic, i, 0x8210, 0);
+		phy_write(gaudi_nic, i, 0x8100, 0);
+	}
+
+	/* PHY controller reset - to force F/W to start from pointer 0 */
+	for (i = 0 ; i < 4 ; i++) {
+		phy_write(gaudi_nic, i, chip_reset_addr, 0xAAA);
+		phy_write(gaudi_nic, i, chip_reset_addr, 0);
+	}
+
+	/* force the lane pll to run in PAM4 before logical reset */
+	for (i = 0 ; i < 4 ; i++) {
+		rc = fw_config(gaudi_nic, i, NIC_DR_50, fmode, true);
+		if (rc) {
+			dev_err(hdev->dev,
+				"F/W configuration failed for NIC PHY\n");
+			return rc;
+		}
+	}
+
+	/* logic reset */
+	for (i = 0 ; i < 4 ; i++) {
+		phy_write(gaudi_nic, i, chip_reset_addr, 0x777);
+		phy_write(gaudi_nic, i, chip_reset_addr, 0);
+	}
+
+	usleep_range(500, 1000);
+
+	return 0;
+}
+
+void gaudi_nic_phy_reset_tx(struct gaudi_nic_device *gaudi_nic, int lane)
+{
+	u32 val;
+
+	/* disable TX */
+	val = phy_read(gaudi_nic, lane, 0xA0);
+	/* set bit 13 to 1 */
+	val |= 0x2000;
+	/* set bit 11 to 0 */
+	val &= ~0x800;
+	phy_write(gaudi_nic, lane, 0xA0, val);
+
+	msleep(500);
+
+	/* enable TX */
+	val = phy_read(gaudi_nic, lane, 0xA0);
+	/* set bit 13 to 0 */
+	val &= ~0x2000;
+	phy_write(gaudi_nic, lane, 0xA0, val);
+}
+
+void gaudi_nic_phy_start_stop(struct gaudi_nic_device *gaudi_nic, int lane,
+				bool is_start)
+{
+	if (is_start) {
+		/* Enable TX driver in SerDes */
+		phy_write_mask(gaudi_nic, lane, 0xE3, 1, 0x2000);
+		/* F/W Rx tuning is enabled during the power up sequence */
+	} else {
+		/* Disable TX driver in SerDes */
+		phy_write_mask(gaudi_nic, lane, 0xE3, 0, 0x2000);
+		/* Silence F/W Rx tuning */
+		phy_write(gaudi_nic, lane, 0x9815, 0x9000);
+	}
+}
-- 
2.17.1


end of thread, other threads:[~2020-09-16 17:56 UTC | newest]

Thread overview: 44+ messages
2020-09-10 16:11 [PATCH 00/15] Adding GAUDI NIC code to habanalabs driver Oded Gabbay
2020-09-10 16:11 ` [PATCH 02/15] habanalabs/gaudi: add NIC firmware-related definitions Oded Gabbay
2020-09-10 16:11 ` [PATCH 03/15] habanalabs/gaudi: add NIC security configuration Oded Gabbay
2020-09-10 16:11 ` [PATCH 04/15] habanalabs/gaudi: add support for NIC QMANs Oded Gabbay
2020-09-10 16:11 ` [PATCH 05/15] habanalabs/gaudi: add NIC Ethernet support Oded Gabbay
2020-09-10 20:03   ` Jakub Kicinski
2020-09-10 20:18     ` Oded Gabbay
2020-09-14  9:52     ` Omer Shpigelman
2020-09-14 16:47       ` Jakub Kicinski
2020-09-10 16:11 ` [PATCH 06/15] habanalabs/gaudi: add NIC PHY code Oded Gabbay
2020-09-10 16:11 ` [PATCH 07/15] habanalabs/gaudi: allow user to get MAC addresses in INFO IOCTL Oded Gabbay
2020-09-10 16:11 ` [PATCH 08/15] habanalabs/gaudi: add a new IOCTL for NIC control operations Oded Gabbay
2020-09-10 16:11 ` [PATCH 09/15] habanalabs/gaudi: add CQ " Oded Gabbay
2020-09-10 16:11 ` [PATCH 10/15] habanalabs/gaudi: add WQ " Oded Gabbay
2020-09-10 16:11 ` [PATCH 11/15] habanalabs/gaudi: add QP error handling Oded Gabbay
2020-09-10 16:11 ` [PATCH 12/15] habanalabs/gaudi: add debugfs entries for the NIC Oded Gabbay
2020-09-10 20:01   ` Jakub Kicinski
2020-09-10 20:10     ` Oded Gabbay
2020-09-10 20:16       ` Jakub Kicinski
2020-09-10 20:17         ` Oded Gabbay
2020-09-10 20:30           ` Jakub Kicinski
2020-09-10 20:33             ` Oded Gabbay
2020-09-14 13:48             ` Omer Shpigelman
2020-09-14 16:50               ` Jakub Kicinski
2020-09-15 12:57                 ` Oded Gabbay
2020-09-16 16:38                   ` Jakub Kicinski
2020-09-10 16:11 ` [PATCH 13/15] habanalabs/gaudi: Add ethtool support using coresight Oded Gabbay
2020-09-10 20:19   ` Andrew Lunn
2020-09-10 20:22     ` Oded Gabbay
2020-09-10 16:11 ` [PATCH 14/15] habanalabs/gaudi: support DCB protocol Oded Gabbay
2020-09-10 16:11 ` [PATCH 15/15] habanalabs/gaudi: add NIC init/fini calls from common code Oded Gabbay
2020-09-10 20:01 ` [PATCH 00/15] Adding GAUDI NIC code to habanalabs driver Jakub Kicinski
2020-09-10 20:16   ` Oded Gabbay
2020-09-10 20:25     ` Andrew Lunn
2020-09-10 20:30       ` Oded Gabbay
2020-09-10 20:38         ` Andrew Lunn
2020-09-10 20:52           ` Oded Gabbay
2020-09-11  6:22           ` Greg Kroah-Hartman
2020-09-10 20:28     ` Jakub Kicinski
2020-09-10 20:32       ` Oded Gabbay
2020-09-10 21:05         ` Florian Fainelli
2020-09-10 21:15           ` Oded Gabbay
2020-09-10 21:23             ` Florian Fainelli
  -- strict thread matches above, loose matches on Subject: below --
2020-09-10 15:03 Oded Gabbay
2020-09-10 15:03 ` [PATCH 06/15] habanalabs/gaudi: add NIC PHY code Oded Gabbay
