linux-kernel.vger.kernel.org archive mirror
* [PATCH 00/15] habanalabs fixes for merge window
@ 2019-02-28  8:46 Oded Gabbay
  2019-02-28  8:46 ` [PATCH 01/15] habanalabs: Dissociate RAZWI info from event types Oded Gabbay
                   ` (14 more replies)
  0 siblings, 15 replies; 18+ messages in thread
From: Oded Gabbay @ 2019-02-28  8:46 UTC (permalink / raw)
  To: gregkh, linux-kernel

Hi Greg,

This patch-set contains only fixes for H/W, F/W and driver bugs that were
discovered and fixed since v5 of the habanalabs upstream patch-set.

In addition, the patch-set contains fixes for sparse warnings regarding
little-endian to/from CPU conversions (and for other sparse warnings).
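
For readers unfamiliar with this class of warning: sparse flags code that
mixes little-endian typed values (e.g. __le32) with CPU-native integers
without an explicit conversion such as le32_to_cpu() or cpu_to_le32(). The
following is only a userspace sketch of what such a conversion does; the
demo_* helpers are hypothetical stand-ins, not the kernel macros.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for le32_to_cpu(): decode a 32-bit little-endian
 * value byte by byte, so the result is the same on any host byte order.
 */
static uint32_t demo_le32_to_cpu(const uint8_t *b)
{
	assert(b);

	return (uint32_t)b[0] |
	       ((uint32_t)b[1] << 8) |
	       ((uint32_t)b[2] << 16) |
	       ((uint32_t)b[3] << 24);
}

/* Hypothetical stand-in for cpu_to_le32(): encode into little-endian bytes. */
static void demo_cpu_to_le32(uint32_t v, uint8_t *b)
{
	assert(b);

	b[0] = (uint8_t)(v & 0xff);
	b[1] = (uint8_t)((v >> 8) & 0xff);
	b[2] = (uint8_t)((v >> 16) & 0xff);
	b[3] = (uint8_t)((v >> 24) & 0xff);
}
```

Encoding a value and decoding it again returns the original value
regardless of the host's endianness, which is the property the explicit
conversions guarantee.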

Thanks,
Oded

Oded Gabbay (10):
  habanalabs: disable CPU access on timeouts
  habanalabs: fix mmu cache registers init
  habanalabs: fix validation of WREG32 to DMA completion
  habanalabs: set DMA0 completion to SOB 1007
  habanalabs: add comments in uapi/misc/habanalabs.h
  habanalabs: fix memory leak with CBs with unaligned size
  habanalabs: print pointer using %p
  habanalabs: soft-reset device if context-switch fails
  habanalabs: use NULL to initialize array of pointers
  habanalabs: fix little-endian<->cpu conversion warnings

Omer Shpigelman (3):
  habanalabs: add MMU DRAM default page mapping
  habanalabs: extend QMAN0 job timeout
  habanalabs: return correct error code on MMU mapping failure

Tomer Tayar (2):
  habanalabs: Dissociate RAZWI info from event types
  habanalabs: fix little-endian<->cpu conversion warnings

 drivers/misc/habanalabs/command_buffer.c      |   9 +-
 drivers/misc/habanalabs/command_submission.c  |  16 +-
 drivers/misc/habanalabs/debugfs.c             |  21 +-
 drivers/misc/habanalabs/device.c              |   2 +
 drivers/misc/habanalabs/goya/goya.c           | 667 +++++++++---------
 drivers/misc/habanalabs/goya/goyaP.h          |  29 +-
 drivers/misc/habanalabs/habanalabs.h          |  14 +-
 drivers/misc/habanalabs/habanalabs_ioctl.c    |   2 +-
 drivers/misc/habanalabs/hw_queue.c            |  23 +-
 drivers/misc/habanalabs/hwmon.c               |  54 +-
 .../include/goya/asic_reg/goya_regs.h         |   1 +
 .../include/hw_ip/mmu/mmu_general.h           |   1 +
 drivers/misc/habanalabs/irq.c                 |   8 +-
 drivers/misc/habanalabs/memory.c              |  12 +-
 drivers/misc/habanalabs/mmu.c                 | 287 +++++++-
 drivers/misc/habanalabs/sysfs.c               |  29 +-
 include/uapi/misc/habanalabs.h                |  10 +-
 17 files changed, 740 insertions(+), 445 deletions(-)

-- 
2.17.1



* [PATCH 01/15] habanalabs: Dissociate RAZWI info from event types
  2019-02-28  8:46 [PATCH 00/15] habanalabs fixes for merge window Oded Gabbay
@ 2019-02-28  8:46 ` Oded Gabbay
  2019-02-28  8:46 ` [PATCH 02/15] habanalabs: add MMU DRAM default page mapping Oded Gabbay
                   ` (13 subsequent siblings)
  14 siblings, 0 replies; 18+ messages in thread
From: Oded Gabbay @ 2019-02-28  8:46 UTC (permalink / raw)
  To: gregkh, linux-kernel; +Cc: Tomer Tayar

From: Tomer Tayar <ttayar@habana.ai>

This patch provides a workaround for a H/W bug in the RAZWI logger in
Goya. The logger doesn't recognize the initiator correctly and as a
result, accesses from one initiator are reported as coming from a
different initiator.

The WA is to print the error information from the event entries we receive
without looking at the RAZWI logger at all.
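
A minimal userspace sketch of the idea, assuming made-up event IDs
(DEMO_EVENT_*) rather than the real GOYA_ASYNC_EVENT_ID_* values: the
description is derived from the event type alone, and the engine index is
computed from the offset within a range of consecutive IDs.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical event IDs; the real GOYA_ASYNC_EVENT_ID_* values differ. */
enum demo_event {
	DEMO_EVENT_PCIE_DEC = 1,
	DEMO_EVENT_DMA0_QM = 10,	/* DMA0..DMA4 QM events are consecutive */
	DEMO_EVENT_DMA4_QM = 14,
};

/* Build a printable description from the event type, with no logger read. */
static void demo_get_event_desc(unsigned int event_type, char *desc,
				size_t size)
{
	assert(desc && size > 0);

	if (event_type == DEMO_EVENT_PCIE_DEC)
		snprintf(desc, size, "%s", "PCIe_dec");
	else if (event_type >= DEMO_EVENT_DMA0_QM &&
		 event_type <= DEMO_EVENT_DMA4_QM)
		snprintf(desc, size, "DMA%u_qm",
			 event_type - DEMO_EVENT_DMA0_QM);
	else
		snprintf(desc, size, "%s", "N/A");
}
```

The sketch always passes a literal format string to snprintf() and supplies
the name as an argument, which is one way to keep compilers from warning
about non-literal formats.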

Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
---
 drivers/misc/habanalabs/goya/goya.c | 227 ++++++++++++++++------------
 1 file changed, 127 insertions(+), 100 deletions(-)

diff --git a/drivers/misc/habanalabs/goya/goya.c b/drivers/misc/habanalabs/goya/goya.c
index 54218f147627..447d907bddf3 100644
--- a/drivers/misc/habanalabs/goya/goya.c
+++ b/drivers/misc/habanalabs/goya/goya.c
@@ -111,29 +111,6 @@ static u16 goya_packet_sizes[MAX_PACKET_ID] = {
 	[PACKET_STOP]		= sizeof(struct packet_stop)
 };
 
-static const char *goya_axi_name[GOYA_MAX_INITIATORS] = {
-	"MME0",
-	"MME1",
-	"MME2",
-	"MME3",
-	"MME4",
-	"MME5",
-	"TPC0",
-	"TPC1",
-	"TPC2",
-	"TPC3",
-	"TPC4",
-	"TPC5",
-	"TPC6",
-	"TPC7",
-	"PCI",
-	"DMA", /* HBW */
-	"DMA", /* LBW */
-	"PSOC",
-	"CPU",
-	"MMU"
-};
-
 static u64 goya_mmu_regs[GOYA_MMU_REGS_NUM] = {
 	mmDMA_QM_0_GLBL_NON_SECURE_PROPS,
 	mmDMA_QM_1_GLBL_NON_SECURE_PROPS,
@@ -4554,111 +4531,161 @@ static void goya_write_pte(struct hl_device *hdev, u64 addr, u64 val)
 			(addr - goya->ddr_bar_cur_addr));
 }
 
-static void goya_get_axi_name(struct hl_device *hdev, u32 agent_id,
-		u16 event_type, char *axi_name, int len)
+static const char *_goya_get_event_desc(u16 event_type)
 {
-	if (!strcmp(goya_axi_name[agent_id], "DMA"))
-		if (event_type >= GOYA_ASYNC_EVENT_ID_DMA0_CH)
-			snprintf(axi_name, len, "DMA %d",
-				event_type - GOYA_ASYNC_EVENT_ID_DMA0_CH);
-		else
-			snprintf(axi_name, len, "DMA %d",
-				event_type - GOYA_ASYNC_EVENT_ID_DMA0_QM);
-	else
-		snprintf(axi_name, len, "%s", goya_axi_name[agent_id]);
+	switch (event_type) {
+	case GOYA_ASYNC_EVENT_ID_PCIE_DEC:
+		return "PCIe_dec";
+	case GOYA_ASYNC_EVENT_ID_TPC0_DEC:
+	case GOYA_ASYNC_EVENT_ID_TPC1_DEC:
+	case GOYA_ASYNC_EVENT_ID_TPC2_DEC:
+	case GOYA_ASYNC_EVENT_ID_TPC3_DEC:
+	case GOYA_ASYNC_EVENT_ID_TPC4_DEC:
+	case GOYA_ASYNC_EVENT_ID_TPC5_DEC:
+	case GOYA_ASYNC_EVENT_ID_TPC6_DEC:
+	case GOYA_ASYNC_EVENT_ID_TPC7_DEC:
+		return "TPC%d_dec";
+	case GOYA_ASYNC_EVENT_ID_MME_WACS:
+		return "MME_wacs";
+	case GOYA_ASYNC_EVENT_ID_MME_WACSD:
+		return "MME_wacsd";
+	case GOYA_ASYNC_EVENT_ID_CPU_AXI_SPLITTER:
+		return "CPU_axi_splitter";
+	case GOYA_ASYNC_EVENT_ID_PSOC_AXI_DEC:
+		return "PSOC_axi_dec";
+	case GOYA_ASYNC_EVENT_ID_PSOC:
+		return "PSOC";
+	case GOYA_ASYNC_EVENT_ID_TPC0_KRN_ERR:
+	case GOYA_ASYNC_EVENT_ID_TPC1_KRN_ERR:
+	case GOYA_ASYNC_EVENT_ID_TPC2_KRN_ERR:
+	case GOYA_ASYNC_EVENT_ID_TPC3_KRN_ERR:
+	case GOYA_ASYNC_EVENT_ID_TPC4_KRN_ERR:
+	case GOYA_ASYNC_EVENT_ID_TPC5_KRN_ERR:
+	case GOYA_ASYNC_EVENT_ID_TPC6_KRN_ERR:
+	case GOYA_ASYNC_EVENT_ID_TPC7_KRN_ERR:
+		return "TPC%d_krn_err";
+	case GOYA_ASYNC_EVENT_ID_TPC0_CMDQ ... GOYA_ASYNC_EVENT_ID_TPC7_CMDQ:
+		return "TPC%d_cq";
+	case GOYA_ASYNC_EVENT_ID_TPC0_QM ... GOYA_ASYNC_EVENT_ID_TPC7_QM:
+		return "TPC%d_qm";
+	case GOYA_ASYNC_EVENT_ID_MME_QM:
+		return "MME_qm";
+	case GOYA_ASYNC_EVENT_ID_MME_CMDQ:
+		return "MME_cq";
+	case GOYA_ASYNC_EVENT_ID_DMA0_QM ... GOYA_ASYNC_EVENT_ID_DMA4_QM:
+		return "DMA%d_qm";
+	case GOYA_ASYNC_EVENT_ID_DMA0_CH ... GOYA_ASYNC_EVENT_ID_DMA4_CH:
+		return "DMA%d_ch";
+	default:
+		return "N/A";
+	}
 }
 
-static void goya_print_razwi_info(struct hl_device *hdev, u64 reg,
-		bool is_hbw, bool is_read, u16 event_type)
+static void goya_get_event_desc(u16 event_type, char *desc, size_t size)
 {
-	u32 val, agent_id;
-	char axi_name[10] = {0};
-
-	val = RREG32(reg);
+	u8 index;
 
-	if (is_hbw)
-		agent_id = (val & GOYA_IRQ_HBW_AGENT_ID_MASK) >>
-				GOYA_IRQ_HBW_AGENT_ID_SHIFT;
-	else
-		agent_id = (val & GOYA_IRQ_LBW_AGENT_ID_MASK) >>
-				GOYA_IRQ_LBW_AGENT_ID_SHIFT;
-
-	if (agent_id >= GOYA_MAX_INITIATORS) {
-		dev_err(hdev->dev,
-			"Illegal %s %s with wrong initiator id %d, H/W IRQ %d\n",
-				is_read ? "read from" : "write to",
-				is_hbw ? "HBW" : "LBW",
-				agent_id,
-				event_type);
-	} else {
-		goya_get_axi_name(hdev, agent_id, event_type, axi_name,
-				sizeof(axi_name));
-		dev_err(hdev->dev, "Illegal %s by %s %s %s, H/W IRQ %d\n",
-				is_read ? "read" : "write",
-				axi_name,
-				is_read ? "from" : "to",
-				is_hbw ? "HBW" : "LBW",
-				event_type);
+	switch (event_type) {
+	case GOYA_ASYNC_EVENT_ID_TPC0_DEC:
+	case GOYA_ASYNC_EVENT_ID_TPC1_DEC:
+	case GOYA_ASYNC_EVENT_ID_TPC2_DEC:
+	case GOYA_ASYNC_EVENT_ID_TPC3_DEC:
+	case GOYA_ASYNC_EVENT_ID_TPC4_DEC:
+	case GOYA_ASYNC_EVENT_ID_TPC5_DEC:
+	case GOYA_ASYNC_EVENT_ID_TPC6_DEC:
+	case GOYA_ASYNC_EVENT_ID_TPC7_DEC:
+		index = (event_type - GOYA_ASYNC_EVENT_ID_TPC0_DEC) / 3;
+		snprintf(desc, size, _goya_get_event_desc(event_type), index);
+		break;
+	case GOYA_ASYNC_EVENT_ID_TPC0_KRN_ERR:
+	case GOYA_ASYNC_EVENT_ID_TPC1_KRN_ERR:
+	case GOYA_ASYNC_EVENT_ID_TPC2_KRN_ERR:
+	case GOYA_ASYNC_EVENT_ID_TPC3_KRN_ERR:
+	case GOYA_ASYNC_EVENT_ID_TPC4_KRN_ERR:
+	case GOYA_ASYNC_EVENT_ID_TPC5_KRN_ERR:
+	case GOYA_ASYNC_EVENT_ID_TPC6_KRN_ERR:
+	case GOYA_ASYNC_EVENT_ID_TPC7_KRN_ERR:
+		index = (event_type - GOYA_ASYNC_EVENT_ID_TPC0_KRN_ERR) / 10;
+		snprintf(desc, size, _goya_get_event_desc(event_type), index);
+		break;
+	case GOYA_ASYNC_EVENT_ID_TPC0_CMDQ ... GOYA_ASYNC_EVENT_ID_TPC7_CMDQ:
+		index = event_type - GOYA_ASYNC_EVENT_ID_TPC0_CMDQ;
+		snprintf(desc, size, _goya_get_event_desc(event_type), index);
+		break;
+	case GOYA_ASYNC_EVENT_ID_TPC0_QM ... GOYA_ASYNC_EVENT_ID_TPC7_QM:
+		index = event_type - GOYA_ASYNC_EVENT_ID_TPC0_QM;
+		snprintf(desc, size, _goya_get_event_desc(event_type), index);
+		break;
+	case GOYA_ASYNC_EVENT_ID_DMA0_QM ... GOYA_ASYNC_EVENT_ID_DMA4_QM:
+		index = event_type - GOYA_ASYNC_EVENT_ID_DMA0_QM;
+		snprintf(desc, size, _goya_get_event_desc(event_type), index);
+		break;
+	case GOYA_ASYNC_EVENT_ID_DMA0_CH ... GOYA_ASYNC_EVENT_ID_DMA4_CH:
+		index = event_type - GOYA_ASYNC_EVENT_ID_DMA0_CH;
+		snprintf(desc, size, _goya_get_event_desc(event_type), index);
+		break;
+	default:
+		snprintf(desc, size, _goya_get_event_desc(event_type));
+		break;
 	}
 }
 
-static void goya_print_irq_info(struct hl_device *hdev, u16 event_type)
+static void goya_print_razwi_info(struct hl_device *hdev)
 {
-	struct goya_device *goya = hdev->asic_specific;
-	bool is_hbw = false, is_read = false, is_info = false;
-
 	if (RREG32(mmDMA_MACRO_RAZWI_LBW_WT_VLD)) {
-		goya_print_razwi_info(hdev, mmDMA_MACRO_RAZWI_LBW_WT_ID, is_hbw,
-				is_read, event_type);
+		dev_err(hdev->dev, "Illegal write to LBW\n");
 		WREG32(mmDMA_MACRO_RAZWI_LBW_WT_VLD, 0);
-		is_info = true;
 	}
+
 	if (RREG32(mmDMA_MACRO_RAZWI_LBW_RD_VLD)) {
-		is_read = true;
-		goya_print_razwi_info(hdev, mmDMA_MACRO_RAZWI_LBW_RD_ID, is_hbw,
-				is_read, event_type);
+		dev_err(hdev->dev, "Illegal read from LBW\n");
 		WREG32(mmDMA_MACRO_RAZWI_LBW_RD_VLD, 0);
-		is_info = true;
 	}
+
 	if (RREG32(mmDMA_MACRO_RAZWI_HBW_WT_VLD)) {
-		is_hbw = true;
-		goya_print_razwi_info(hdev, mmDMA_MACRO_RAZWI_HBW_WT_ID, is_hbw,
-				is_read, event_type);
+		dev_err(hdev->dev, "Illegal write to HBW\n");
 		WREG32(mmDMA_MACRO_RAZWI_HBW_WT_VLD, 0);
-		is_info = true;
 	}
+
 	if (RREG32(mmDMA_MACRO_RAZWI_HBW_RD_VLD)) {
-		is_hbw = true;
-		is_read = true;
-		goya_print_razwi_info(hdev, mmDMA_MACRO_RAZWI_HBW_RD_ID, is_hbw,
-				is_read, event_type);
+		dev_err(hdev->dev, "Illegal read from HBW\n");
 		WREG32(mmDMA_MACRO_RAZWI_HBW_RD_VLD, 0);
-		is_info = true;
-	}
-	if (!is_info) {
-		dev_err(hdev->dev,
-			"Received H/W interrupt %d, no additional info\n",
-			event_type);
-		return;
 	}
+}
 
-	if (goya->hw_cap_initialized & HW_CAP_MMU) {
-		u32 val = RREG32(mmMMU_PAGE_ERROR_CAPTURE);
-		u64 addr;
+static void goya_print_mmu_error_info(struct hl_device *hdev)
+{
+	struct goya_device *goya = hdev->asic_specific;
+	u64 addr;
+	u32 val;
+
+	if (!(goya->hw_cap_initialized & HW_CAP_MMU))
+		return;
 
-		if (val & MMU_PAGE_ERROR_CAPTURE_ENTRY_VALID_MASK) {
-			addr = val & MMU_PAGE_ERROR_CAPTURE_VA_49_32_MASK;
-			addr <<= 32;
-			addr |= RREG32(mmMMU_PAGE_ERROR_CAPTURE_VA);
+	val = RREG32(mmMMU_PAGE_ERROR_CAPTURE);
+	if (val & MMU_PAGE_ERROR_CAPTURE_ENTRY_VALID_MASK) {
+		addr = val & MMU_PAGE_ERROR_CAPTURE_VA_49_32_MASK;
+		addr <<= 32;
+		addr |= RREG32(mmMMU_PAGE_ERROR_CAPTURE_VA);
 
-			dev_err(hdev->dev, "MMU page fault on va 0x%llx\n",
-					addr);
+		dev_err(hdev->dev, "MMU page fault on va 0x%llx\n", addr);
 
-			WREG32(mmMMU_PAGE_ERROR_CAPTURE, 0);
-		}
+		WREG32(mmMMU_PAGE_ERROR_CAPTURE, 0);
 	}
 }
 
+static void goya_print_irq_info(struct hl_device *hdev, u16 event_type)
+{
+	char desc[20] = "";
+
+	goya_get_event_desc(event_type, desc, sizeof(desc));
+	dev_err(hdev->dev, "Received H/W interrupt %d [\"%s\"]\n",
+		event_type, desc);
+
+	goya_print_razwi_info(hdev);
+	goya_print_mmu_error_info(hdev);
+}
+
 static int goya_unmask_irq_arr(struct hl_device *hdev, u32 *irq_arr,
 		size_t irq_arr_size)
 {
-- 
2.17.1



* [PATCH 02/15] habanalabs: add MMU DRAM default page mapping
  2019-02-28  8:46 [PATCH 00/15] habanalabs fixes for merge window Oded Gabbay
  2019-02-28  8:46 ` [PATCH 01/15] habanalabs: Dissociate RAZWI info from event types Oded Gabbay
@ 2019-02-28  8:46 ` Oded Gabbay
  2019-02-28  8:46 ` [PATCH 03/15] habanalabs: disable CPU access on timeouts Oded Gabbay
                   ` (12 subsequent siblings)
  14 siblings, 0 replies; 18+ messages in thread
From: Oded Gabbay @ 2019-02-28  8:46 UTC (permalink / raw)
  To: gregkh, linux-kernel; +Cc: Omer Shpigelman

From: Omer Shpigelman <oshpigelman@habana.ai>

This patch provides a workaround for a H/W bug in Goya, where access to
RAZWI from TPC can cause a PCI completion timeout.

The WA is to use the device MMU to map any unmapped DRAM memory to a
default page in the DRAM. That way, the TPC will never reach RAZWI upon
accessing a bad address in the DRAM.

When a DRAM page is mapped by the user, its default mapping is
overwritten. Once that page is unmapped, the MMU driver will map that page
to the default page.
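
The map/unmap behaviour described above can be sketched with a toy,
single-level table. This is only an illustration under simplifying
assumptions: the demo_* names, the table size and the addresses are
hypothetical, and the real driver uses multi-level hop tables.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_NUM_PAGES		8u
#define DEMO_DEFAULT_PAGE	0x1000u	/* hypothetical physical address */

/* Toy MMU: one translation entry per virtual page. */
struct demo_mmu {
	uint64_t pte[DEMO_NUM_PAGES];
};

/* At init, every virtual page points at the default page. */
static void demo_mmu_init(struct demo_mmu *mmu)
{
	for (size_t i = 0; i < DEMO_NUM_PAGES; i++)
		mmu->pte[i] = DEMO_DEFAULT_PAGE;
}

/* A user mapping overwrites the default mapping. */
static void demo_mmu_map(struct demo_mmu *mmu, size_t vpage, uint64_t phys)
{
	assert(vpage < DEMO_NUM_PAGES);
	mmu->pte[vpage] = phys;
}

/* Unmapping restores the default mapping instead of leaving a hole. */
static void demo_mmu_unmap(struct demo_mmu *mmu, size_t vpage)
{
	assert(vpage < DEMO_NUM_PAGES);
	mmu->pte[vpage] = DEMO_DEFAULT_PAGE;
}

/* Translation always succeeds, so a bad access lands on the default page. */
static uint64_t demo_mmu_translate(const struct demo_mmu *mmu, size_t vpage)
{
	assert(vpage < DEMO_NUM_PAGES);
	return mmu->pte[vpage];
}
```

In this model there is never an unmapped entry, which mirrors the intent
of the workaround: an access to a bad DRAM address resolves to the default
page rather than reaching RAZWI.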

To help with debugging, the driver will fill the default page area with
the byte pattern 0x99 on device initialization.
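
As a small illustration of how a one-byte pattern becomes the 64-bit
memset value used in the diff (0x9999999999999999ull), the sketch below
replicates a byte across a 64-bit word. The demo_* helpers are
hypothetical and run on the host; in the driver the fill is done by a
LIN_DMA memset packet.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Replicate one byte into all eight bytes of a 64-bit word. */
static uint64_t demo_fill_pattern(uint8_t byte)
{
	return (uint64_t)byte * 0x0101010101010101ull;
}

/* Fill n 64-bit words with the replicated byte (host-side stand-in). */
static void demo_fill_words(uint64_t *buf, size_t n, uint8_t byte)
{
	assert(buf != NULL || n == 0);

	for (size_t i = 0; i < n; i++)
		buf[i] = demo_fill_pattern(byte);
}
```

A recognizable fill value makes it easier to spot, in a memory dump, data
that was read through the default mapping instead of a real user mapping.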

Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
---
 drivers/misc/habanalabs/goya/goya.c           | 190 +++++-------
 drivers/misc/habanalabs/goya/goyaP.h          |  29 +-
 drivers/misc/habanalabs/habanalabs.h          |  12 +-
 .../include/hw_ip/mmu/mmu_general.h           |   1 +
 drivers/misc/habanalabs/memory.c              |  12 +-
 drivers/misc/habanalabs/mmu.c                 | 285 +++++++++++++++---
 6 files changed, 361 insertions(+), 168 deletions(-)

diff --git a/drivers/misc/habanalabs/goya/goya.c b/drivers/misc/habanalabs/goya/goya.c
index 447d907bddf3..7c2edabe20bd 100644
--- a/drivers/misc/habanalabs/goya/goya.c
+++ b/drivers/misc/habanalabs/goya/goya.c
@@ -304,6 +304,7 @@ static u32 goya_non_fatal_events[GOYA_ASYC_EVENT_GROUP_NON_FATAL_SIZE] = {
 static int goya_armcp_info_get(struct hl_device *hdev);
 static void goya_mmu_prepare(struct hl_device *hdev, u32 asid);
 static int goya_mmu_clear_pgt_range(struct hl_device *hdev);
+static int goya_mmu_set_dram_default_page(struct hl_device *hdev);
 static int goya_mmu_update_asid_hop0_addr(struct hl_device *hdev, u32 asid,
 					u64 phys_addr);
 
@@ -345,6 +346,7 @@ static void goya_get_fixed_properties(struct hl_device *hdev)
 						SRAM_USER_BASE_OFFSET;
 
 	prop->mmu_pgt_addr = MMU_PAGE_TABLES_ADDR;
+	prop->mmu_dram_default_page_addr = MMU_DRAM_DEFAULT_PAGE_ADDR;
 	if (hdev->pldm)
 		prop->mmu_pgt_size = 0x800000; /* 8MB */
 	else
@@ -359,6 +361,8 @@ static void goya_get_fixed_properties(struct hl_device *hdev)
 	prop->va_space_host_end_address = VA_HOST_SPACE_END;
 	prop->va_space_dram_start_address = VA_DDR_SPACE_START;
 	prop->va_space_dram_end_address = VA_DDR_SPACE_END;
+	prop->dram_size_for_default_page_mapping =
+			prop->va_space_dram_end_address;
 	prop->cfg_size = CFG_SIZE;
 	prop->max_asid = MAX_ASID;
 	prop->num_of_events = GOYA_ASYNC_EVENT_ID_SIZE;
@@ -816,6 +820,12 @@ static int goya_late_init(struct hl_device *hdev)
 		goto disable_pci_access;
 	}
 
+	rc = goya_mmu_set_dram_default_page(hdev);
+	if (rc) {
+		dev_err(hdev->dev, "Failed to set DRAM default page\n");
+		goto disable_pci_access;
+	}
+
 	return 0;
 
 disable_pci_access:
@@ -2648,6 +2658,7 @@ static int goya_mmu_init(struct hl_device *hdev)
 		return 0;
 
 	hdev->dram_supports_virtual_memory = true;
+	hdev->dram_default_page_mapping = true;
 
 	for (i = 0 ; i < prop->max_asid ; i++) {
 		hop0_addr = prop->mmu_pgt_addr +
@@ -4303,98 +4314,6 @@ static void goya_update_eq_ci(struct hl_device *hdev, u32 val)
 	WREG32(mmPSOC_GLOBAL_CONF_SCRATCHPAD_6, val);
 }
 
-static int goya_context_switch(struct hl_device *hdev, u32 asid)
-{
-	struct asic_fixed_properties *prop = &hdev->asic_prop;
-	struct packet_lin_dma *clear_sram_pkt;
-	struct hl_cs_parser parser;
-	struct hl_cs_job *job;
-	u32 cb_size;
-	struct hl_cb *cb;
-	int rc;
-
-	cb = hl_cb_kernel_create(hdev, PAGE_SIZE);
-	if (!cb)
-		return -EFAULT;
-
-	clear_sram_pkt = (struct packet_lin_dma *)
-					(uintptr_t) cb->kernel_address;
-
-	memset(clear_sram_pkt, 0, sizeof(*clear_sram_pkt));
-	cb_size = sizeof(*clear_sram_pkt);
-
-	clear_sram_pkt->ctl = ((PACKET_LIN_DMA << GOYA_PKT_CTL_OPCODE_SHIFT) |
-		(DMA_HOST_TO_SRAM << GOYA_PKT_LIN_DMA_CTL_DMA_DIR_SHIFT) |
-		(1 << GOYA_PKT_LIN_DMA_CTL_MEMSET_SHIFT) |
-		(1 << GOYA_PKT_LIN_DMA_CTL_WO_SHIFT) |
-		(1 << GOYA_PKT_CTL_RB_SHIFT) |
-		(1 << GOYA_PKT_CTL_MB_SHIFT));
-
-	clear_sram_pkt->src_addr = 0x7777777777777777ull;
-	clear_sram_pkt->dst_addr = prop->sram_base_address;
-	if (hdev->pldm)
-		clear_sram_pkt->tsize = 0x10000;
-	else
-		clear_sram_pkt->tsize = prop->sram_size;
-
-	job = hl_cs_allocate_job(hdev, true);
-	if (!job) {
-		dev_err(hdev->dev, "Failed to allocate a new job\n");
-		rc = -ENOMEM;
-		goto release_cb;
-	}
-
-	job->id = 0;
-	job->user_cb = cb;
-	job->user_cb->cs_cnt++;
-	job->user_cb_size = cb_size;
-	job->hw_queue_id = GOYA_QUEUE_ID_DMA_0;
-
-	hl_debugfs_add_job(hdev, job);
-
-	parser.ctx_id = HL_KERNEL_ASID_ID;
-	parser.cs_sequence = 0;
-	parser.job_id = job->id;
-	parser.hw_queue_id = job->hw_queue_id;
-	parser.job_userptr_list = &job->userptr_list;
-	parser.user_cb = job->user_cb;
-	parser.user_cb_size = job->user_cb_size;
-	parser.ext_queue = job->ext_queue;
-	parser.use_virt_addr = hdev->mmu_enable;
-
-	rc = hdev->asic_funcs->cs_parser(hdev, &parser);
-	if (rc) {
-		dev_err(hdev->dev,
-			"Failed to parse kernel CB during context switch\n");
-		goto free_job;
-	}
-
-	job->patched_cb = parser.patched_cb;
-	job->job_cb_size = parser.patched_cb_size;
-	job->patched_cb->cs_cnt++;
-
-	rc = goya_send_job_on_qman0(hdev, job);
-
-	/* no point in setting the asid in case of failure */
-	if (!rc)
-		goya_mmu_prepare(hdev, asid);
-
-	job->patched_cb->cs_cnt--;
-	hl_cb_put(job->patched_cb);
-
-free_job:
-	hl_userptr_delete_list(hdev, &job->userptr_list);
-	hl_debugfs_remove_job(hdev, job);
-	kfree(job);
-	cb->cs_cnt--;
-
-release_cb:
-	hl_cb_put(cb);
-	hl_cb_destroy(hdev, &hdev->kernel_cb_mgr, cb->id << PAGE_SHIFT);
-
-	return rc;
-}
-
 static void goya_restore_phase_topology(struct hl_device *hdev)
 {
 	int i, num_of_sob_in_longs, num_of_mon_in_longs;
@@ -4864,41 +4783,37 @@ void *goya_get_events_stat(struct hl_device *hdev, u32 *size)
 	return goya->events_stat;
 }
 
-static int goya_mmu_clear_pgt_range(struct hl_device *hdev)
+static int goya_memset_device_memory(struct hl_device *hdev, u64 addr, u32 size,
+				u64 val, bool is_dram)
 {
-	struct asic_fixed_properties *prop = &hdev->asic_prop;
-	struct goya_device *goya = hdev->asic_specific;
-	struct packet_lin_dma *clear_pgt_range_pkt;
+	struct packet_lin_dma *lin_dma_pkt;
 	struct hl_cs_parser parser;
 	struct hl_cs_job *job;
 	u32 cb_size;
 	struct hl_cb *cb;
 	int rc;
 
-	if (!(goya->hw_cap_initialized & HW_CAP_MMU))
-		return 0;
-
 	cb = hl_cb_kernel_create(hdev, PAGE_SIZE);
 	if (!cb)
 		return -EFAULT;
 
-	clear_pgt_range_pkt = (struct packet_lin_dma *)
-					(uintptr_t) cb->kernel_address;
+	lin_dma_pkt = (struct packet_lin_dma *) (uintptr_t) cb->kernel_address;
+
+	memset(lin_dma_pkt, 0, sizeof(*lin_dma_pkt));
+	cb_size = sizeof(*lin_dma_pkt);
 
-	memset(clear_pgt_range_pkt, 0, sizeof(*clear_pgt_range_pkt));
-	cb_size = sizeof(*clear_pgt_range_pkt);
+	lin_dma_pkt->ctl = ((PACKET_LIN_DMA << GOYA_PKT_CTL_OPCODE_SHIFT) |
+				(1 << GOYA_PKT_LIN_DMA_CTL_MEMSET_SHIFT) |
+				(1 << GOYA_PKT_LIN_DMA_CTL_WO_SHIFT) |
+				(1 << GOYA_PKT_CTL_RB_SHIFT) |
+				(1 << GOYA_PKT_CTL_MB_SHIFT));
 
-	clear_pgt_range_pkt->ctl =
-		((PACKET_LIN_DMA << GOYA_PKT_CTL_OPCODE_SHIFT) |
-		(DMA_HOST_TO_DRAM << GOYA_PKT_LIN_DMA_CTL_DMA_DIR_SHIFT) |
-		(1 << GOYA_PKT_LIN_DMA_CTL_MEMSET_SHIFT) |
-		(1 << GOYA_PKT_LIN_DMA_CTL_WO_SHIFT) |
-		(1 << GOYA_PKT_CTL_RB_SHIFT) |
-		(1 << GOYA_PKT_CTL_MB_SHIFT));
+	lin_dma_pkt->ctl |= (is_dram ? DMA_HOST_TO_DRAM : DMA_HOST_TO_SRAM) <<
+				GOYA_PKT_LIN_DMA_CTL_DMA_DIR_SHIFT;
 
-	clear_pgt_range_pkt->src_addr = 0;
-	clear_pgt_range_pkt->dst_addr = prop->mmu_pgt_addr;
-	clear_pgt_range_pkt->tsize = prop->mmu_pgt_size + MMU_CACHE_MNG_SIZE;
+	lin_dma_pkt->src_addr = val;
+	lin_dma_pkt->dst_addr = addr;
+	lin_dma_pkt->tsize = size;
 
 	job = hl_cs_allocate_job(hdev, true);
 	if (!job) {
@@ -4927,8 +4842,7 @@ static int goya_mmu_clear_pgt_range(struct hl_device *hdev)
 
 	rc = hdev->asic_funcs->cs_parser(hdev, &parser);
 	if (rc) {
-		dev_err(hdev->dev,
-			"Failed to parse kernel CB when clearing pgt\n");
+		dev_err(hdev->dev, "Failed to parse kernel CB\n");
 		goto free_job;
 	}
 
@@ -4954,6 +4868,52 @@ static int goya_mmu_clear_pgt_range(struct hl_device *hdev)
 	return rc;
 }
 
+static int goya_context_switch(struct hl_device *hdev, u32 asid)
+{
+	struct asic_fixed_properties *prop = &hdev->asic_prop;
+	u64 addr = prop->sram_base_address;
+	u32 size = hdev->pldm ? 0x10000 : prop->sram_size;
+	u64 val = 0x7777777777777777ull;
+	int rc;
+
+	rc = goya_memset_device_memory(hdev, addr, size, val, false);
+	if (rc) {
+		dev_err(hdev->dev, "Failed to clear SRAM in context switch\n");
+		return rc;
+	}
+
+	goya_mmu_prepare(hdev, asid);
+
+	return 0;
+}
+
+static int goya_mmu_clear_pgt_range(struct hl_device *hdev)
+{
+	struct asic_fixed_properties *prop = &hdev->asic_prop;
+	struct goya_device *goya = hdev->asic_specific;
+	u64 addr = prop->mmu_pgt_addr;
+	u32 size = prop->mmu_pgt_size + MMU_DRAM_DEFAULT_PAGE_SIZE +
+			MMU_CACHE_MNG_SIZE;
+
+	if (!(goya->hw_cap_initialized & HW_CAP_MMU))
+		return 0;
+
+	return goya_memset_device_memory(hdev, addr, size, 0, true);
+}
+
+static int goya_mmu_set_dram_default_page(struct hl_device *hdev)
+{
+	struct goya_device *goya = hdev->asic_specific;
+	u64 addr = hdev->asic_prop.mmu_dram_default_page_addr;
+	u32 size = MMU_DRAM_DEFAULT_PAGE_SIZE;
+	u64 val = 0x9999999999999999ull;
+
+	if (!(goya->hw_cap_initialized & HW_CAP_MMU))
+		return 0;
+
+	return goya_memset_device_memory(hdev, addr, size, val, true);
+}
+
 static void goya_mmu_prepare(struct hl_device *hdev, u32 asid)
 {
 	struct goya_device *goya = hdev->asic_specific;
diff --git a/drivers/misc/habanalabs/goya/goyaP.h b/drivers/misc/habanalabs/goya/goyaP.h
index 0631bc133cce..830551b6b062 100644
--- a/drivers/misc/habanalabs/goya/goyaP.h
+++ b/drivers/misc/habanalabs/goya/goyaP.h
@@ -56,18 +56,23 @@
 
 /* DRAM Memory Map */
 
-#define CPU_FW_IMAGE_SIZE	0x10000000	/* 256MB */
-#define MMU_PAGE_TABLES_SIZE	0x0E000000	/* 224MB */
-#define MMU_CACHE_MNG_SIZE	0x00001000	/* 4KB */
-#define CPU_PQ_PKT_SIZE		0x00001000	/* 4KB */
-#define CPU_PQ_DATA_SIZE	0x01FFE000	/* 32MB - 8KB  */
-
-#define CPU_FW_IMAGE_ADDR	DRAM_PHYS_BASE
-#define MMU_PAGE_TABLES_ADDR	(CPU_FW_IMAGE_ADDR + CPU_FW_IMAGE_SIZE)
-#define MMU_CACHE_MNG_ADDR	(MMU_PAGE_TABLES_ADDR + MMU_PAGE_TABLES_SIZE)
-#define CPU_PQ_PKT_ADDR		(MMU_CACHE_MNG_ADDR + MMU_CACHE_MNG_SIZE)
-#define CPU_PQ_DATA_ADDR	(CPU_PQ_PKT_ADDR + CPU_PQ_PKT_SIZE)
-#define DRAM_BASE_ADDR_USER	(CPU_PQ_DATA_ADDR + CPU_PQ_DATA_SIZE)
+#define CPU_FW_IMAGE_SIZE		0x10000000	/* 256MB */
+#define MMU_PAGE_TABLES_SIZE		0x0DE00000	/* 222MB */
+#define MMU_DRAM_DEFAULT_PAGE_SIZE	0x00200000	/* 2MB */
+#define MMU_CACHE_MNG_SIZE		0x00001000	/* 4KB */
+#define CPU_PQ_PKT_SIZE			0x00001000	/* 4KB */
+#define CPU_PQ_DATA_SIZE		0x01FFE000	/* 32MB - 8KB  */
+
+#define CPU_FW_IMAGE_ADDR		DRAM_PHYS_BASE
+#define MMU_PAGE_TABLES_ADDR		(CPU_FW_IMAGE_ADDR + CPU_FW_IMAGE_SIZE)
+#define MMU_DRAM_DEFAULT_PAGE_ADDR	(MMU_PAGE_TABLES_ADDR + \
+						MMU_PAGE_TABLES_SIZE)
+#define MMU_CACHE_MNG_ADDR		(MMU_DRAM_DEFAULT_PAGE_ADDR + \
+					MMU_DRAM_DEFAULT_PAGE_SIZE)
+#define CPU_PQ_PKT_ADDR			(MMU_CACHE_MNG_ADDR + \
+						MMU_CACHE_MNG_SIZE)
+#define CPU_PQ_DATA_ADDR		(CPU_PQ_PKT_ADDR + CPU_PQ_PKT_SIZE)
+#define DRAM_BASE_ADDR_USER		(CPU_PQ_DATA_ADDR + CPU_PQ_DATA_SIZE)
 
 #if (DRAM_BASE_ADDR_USER != 0x20000000)
 #error "KMD must reserve 512MB"
diff --git a/drivers/misc/habanalabs/habanalabs.h b/drivers/misc/habanalabs/habanalabs.h
index ee29971822c6..59b25c6fae00 100644
--- a/drivers/misc/habanalabs/habanalabs.h
+++ b/drivers/misc/habanalabs/habanalabs.h
@@ -143,7 +143,10 @@ enum hl_device_hw_state {
  *                               mapping DRAM memory.
  * @va_space_dram_end_address: end address of virtual memory range for
  *                             mapping DRAM memory.
+ * @dram_size_for_default_page_mapping: DRAM size needed to map to avoid page
+ *                                      fault.
  * @mmu_pgt_addr: base physical address in DRAM of MMU page tables.
+ * @mmu_dram_default_page_addr: DRAM default page physical address.
  * @mmu_pgt_size: MMU page tables total size.
  * @mmu_pte_size: PTE size in MMU page tables.
  * @mmu_hop_table_size: MMU hop table size.
@@ -182,7 +185,9 @@ struct asic_fixed_properties {
 	u64			va_space_host_end_address;
 	u64			va_space_dram_start_address;
 	u64			va_space_dram_end_address;
+	u64			dram_size_for_default_page_mapping;
 	u64			mmu_pgt_addr;
+	u64			mmu_dram_default_page_addr;
 	u32			mmu_pgt_size;
 	u32			mmu_pte_size;
 	u32			mmu_hop_table_size;
@@ -592,6 +597,8 @@ struct hl_va_range {
  * @cs_sequence: sequence number for CS. Value is assigned to a CS and passed
  *			to user so user could inquire about CS. It is used as
  *			index to cs_pending array.
+ * @dram_default_hops: array that holds all hops addresses needed for default
+ *                     DRAM mapping.
  * @cs_lock: spinlock to protect cs_sequence.
  * @dram_phys_mem: amount of used physical DRAM memory by this context.
  * @thread_restore_token: token to prevent multiple threads of the same context
@@ -615,6 +622,7 @@ struct hl_ctx {
 	struct mutex		mmu_lock;
 	struct list_head	debugfs_list;
 	u64			cs_sequence;
+	u64			*dram_default_hops;
 	spinlock_t		cs_lock;
 	atomic64_t		dram_phys_mem;
 	atomic_t		thread_restore_token;
@@ -1068,6 +1076,7 @@ struct hl_device_reset_work {
  * @reset_on_lockup: true if a reset should be done in case of stuck CS, false
  *                   otherwise.
  * @dram_supports_virtual_memory: is MMU enabled towards DRAM.
+ * @dram_default_page_mapping: is DRAM default page mapping enabled.
  * @init_done: is the initialization of the device done.
  * @mmu_enable: is MMU enabled.
  */
@@ -1135,6 +1144,7 @@ struct hl_device {
 	u8				heartbeat;
 	u8				reset_on_lockup;
 	u8				dram_supports_virtual_memory;
+	u8				dram_default_page_mapping;
 	u8				init_done;
 
 	/* Parameters for bring-up */
@@ -1329,7 +1339,7 @@ bool hl_userptr_is_pinned(struct hl_device *hdev, u64 addr, u32 size,
 
 int hl_mmu_init(struct hl_device *hdev);
 void hl_mmu_fini(struct hl_device *hdev);
-void hl_mmu_ctx_init(struct hl_ctx *ctx);
+int hl_mmu_ctx_init(struct hl_ctx *ctx);
 void hl_mmu_ctx_fini(struct hl_ctx *ctx);
 int hl_mmu_map(struct hl_ctx *ctx, u64 virt_addr, u64 phys_addr, u32 page_size);
 int hl_mmu_unmap(struct hl_ctx *ctx, u64 virt_addr, u32 page_size);
diff --git a/drivers/misc/habanalabs/include/hw_ip/mmu/mmu_general.h b/drivers/misc/habanalabs/include/hw_ip/mmu/mmu_general.h
index 1bc36aba1426..b680052ee3f0 100644
--- a/drivers/misc/habanalabs/include/hw_ip/mmu/mmu_general.h
+++ b/drivers/misc/habanalabs/include/hw_ip/mmu/mmu_general.h
@@ -36,6 +36,7 @@
 
 #define HL_PTE_SIZE			sizeof(u64)
 #define HOP_TABLE_SIZE			PAGE_SIZE_4KB
+#define PTE_ENTRIES_IN_HOP		(HOP_TABLE_SIZE / HL_PTE_SIZE)
 #define HOP0_TABLES_TOTAL_SIZE		(HOP_TABLE_SIZE * MAX_ASID)
 
 #define MMU_HOP0_PA43_12_SHIFT		12
diff --git a/drivers/misc/habanalabs/memory.c b/drivers/misc/habanalabs/memory.c
index 660cf67258fd..3a12fd1a5274 100644
--- a/drivers/misc/habanalabs/memory.c
+++ b/drivers/misc/habanalabs/memory.c
@@ -925,8 +925,7 @@ static int map_device_va(struct hl_ctx *ctx, struct hl_mem_in *args,
 		goto map_err;
 	}
 
-	hdev->asic_funcs->mmu_invalidate_cache_range(hdev, false, ctx->asid,
-			ret_vaddr, phys_pg_pack->total_size);
+	hdev->asic_funcs->mmu_invalidate_cache(hdev, false);
 
 	mutex_unlock(&ctx->mmu_lock);
 
@@ -1050,8 +1049,7 @@ static int unmap_device_va(struct hl_ctx *ctx, u64 vaddr)
 			dev_warn_ratelimited(hdev->dev,
 				"unmap failed for vaddr: 0x%llx\n", next_vaddr);
 
-	hdev->asic_funcs->mmu_invalidate_cache_range(hdev, true, ctx->asid,
-			vaddr, phys_pg_pack->total_size);
+	hdev->asic_funcs->mmu_invalidate_cache(hdev, true);
 
 	mutex_unlock(&ctx->mmu_lock);
 
@@ -1455,7 +1453,11 @@ static int hl_vm_ctx_init_with_ranges(struct hl_ctx *ctx, u64 host_range_start,
 	struct hl_device *hdev = ctx->hdev;
 	int rc;
 
-	hl_mmu_ctx_init(ctx);
+	rc = hl_mmu_ctx_init(ctx);
+	if (rc) {
+		dev_err(hdev->dev, "failed to init context %d\n", ctx->asid);
+		return rc;
+	}
 
 	mutex_init(&ctx->mem_hash_lock);
 	hash_init(ctx->mem_hash);
diff --git a/drivers/misc/habanalabs/mmu.c b/drivers/misc/habanalabs/mmu.c
index 79c70d92e74b..a7187f9a5948 100644
--- a/drivers/misc/habanalabs/mmu.c
+++ b/drivers/misc/habanalabs/mmu.c
@@ -151,7 +151,7 @@ static inline u64 get_alloc_next_hop_addr(struct hl_ctx *ctx, u64 curr_pte,
 
 	if (hop_addr == ULLONG_MAX) {
 		hop_addr = alloc_hop(ctx);
-		*is_new_hop = true;
+		*is_new_hop = (hop_addr != ULLONG_MAX);
 	}
 
 	return hop_addr;
@@ -234,22 +234,122 @@ void hl_mmu_fini(struct hl_device *hdev)
 	/* MMU HW fini will be done in device hw_fini() */
 }
 
-/*
- * hl_mmu_ctx_init - init a ctx for using the mmu module
- *
- * @ctx: pointer to the context structure
+/**
+ * hl_mmu_ctx_init() - initialize a context for using the MMU module.
+ * @ctx: pointer to the context structure to initialize.
  *
- * This function does the following:
- * - Init a mutex to protect the concurrent mapping flow
- * - Init a hash to hold all pgts related to this ctx
+ * Initialize a mutex to protect the concurrent mapping flow, a hash to hold all
+ * page tables hops related to this context and an optional DRAM default page
+ * mapping.
+ * Return: 0 on success, non-zero otherwise.
  */
-void hl_mmu_ctx_init(struct hl_ctx *ctx)
+int hl_mmu_ctx_init(struct hl_ctx *ctx)
 {
-	if (!ctx->hdev->mmu_enable)
-		return;
+	struct hl_device *hdev = ctx->hdev;
+	struct asic_fixed_properties *prop = &hdev->asic_prop;
+	u64 num_of_hop3, total_hops, hop1_addr, hop2_addr, hop2_pte_addr,
+		hop3_pte_addr, pte_val;
+	int rc, i, j, hop3_allocated = 0;
+
+	if (!hdev->mmu_enable)
+		return 0;
 
 	mutex_init(&ctx->mmu_lock);
 	hash_init(ctx->mmu_hash);
+
+	if (!hdev->dram_supports_virtual_memory ||
+			!hdev->dram_default_page_mapping)
+		return 0;
+
+	num_of_hop3 = (prop->dram_size_for_default_page_mapping /
+			prop->dram_page_size) /
+			PTE_ENTRIES_IN_HOP;
+
+	/* add hop1 and hop2 */
+	total_hops = num_of_hop3 + 2;
+
+	ctx->dram_default_hops = kzalloc(HL_PTE_SIZE * total_hops,  GFP_KERNEL);
+	if (!ctx->dram_default_hops) {
+		rc = -ENOMEM;
+		goto alloc_err;
+	}
+
+	hop1_addr = alloc_hop(ctx);
+	if (hop1_addr == ULLONG_MAX) {
+		dev_err(hdev->dev, "failed to alloc hop 1\n");
+		rc = -ENOMEM;
+		goto hop1_err;
+	}
+
+	ctx->dram_default_hops[total_hops - 1] = hop1_addr;
+
+	hop2_addr = alloc_hop(ctx);
+	if (hop2_addr == ULLONG_MAX) {
+		dev_err(hdev->dev, "failed to alloc hop 2\n");
+		rc = -ENOMEM;
+		goto hop2_err;
+	}
+
+	ctx->dram_default_hops[total_hops - 2] = hop2_addr;
+
+	for (i = 0 ; i < num_of_hop3 ; i++) {
+		ctx->dram_default_hops[i] = alloc_hop(ctx);
+		if (ctx->dram_default_hops[i] == ULLONG_MAX) {
+			dev_err(hdev->dev, "failed to alloc hop 3, i: %d\n", i);
+			rc = -ENOMEM;
+			goto hop3_err;
+		}
+		hop3_allocated++;
+	}
+
+	/* need only pte 0 in hops 0 and 1 */
+	pte_val = (hop1_addr & PTE_PHYS_ADDR_MASK) | PAGE_PRESENT_MASK;
+	hdev->asic_funcs->write_pte(hdev, get_hop0_addr(ctx), pte_val);
+
+	pte_val = (hop2_addr & PTE_PHYS_ADDR_MASK) | PAGE_PRESENT_MASK;
+	hdev->asic_funcs->write_pte(hdev, hop1_addr, pte_val);
+	get_pte(ctx, hop1_addr);
+
+	hop2_pte_addr = hop2_addr;
+	for (i = 0 ; i < num_of_hop3 ; i++) {
+		pte_val = (ctx->dram_default_hops[i] & PTE_PHYS_ADDR_MASK) |
+				PAGE_PRESENT_MASK;
+		hdev->asic_funcs->write_pte(hdev, hop2_pte_addr, pte_val);
+		get_pte(ctx, hop2_addr);
+		hop2_pte_addr += HL_PTE_SIZE;
+	}
+
+	pte_val = (prop->mmu_dram_default_page_addr & PTE_PHYS_ADDR_MASK) |
+			LAST_MASK | PAGE_PRESENT_MASK;
+
+	for (i = 0 ; i < num_of_hop3 ; i++) {
+		hop3_pte_addr = ctx->dram_default_hops[i];
+		for (j = 0 ; j < PTE_ENTRIES_IN_HOP ; j++) {
+			hdev->asic_funcs->write_pte(hdev, hop3_pte_addr,
+					pte_val);
+			get_pte(ctx, ctx->dram_default_hops[i]);
+			hop3_pte_addr += HL_PTE_SIZE;
+		}
+	}
+
+	/* flush all writes to reach PCI */
+	mb();
+	hdev->asic_funcs->read_pte(hdev, hop2_addr);
+
+	return 0;
+
+hop3_err:
+	for (i = 0 ; i < hop3_allocated ; i++)
+		free_hop(ctx, ctx->dram_default_hops[i]);
+	free_hop(ctx, hop2_addr);
+hop2_err:
+	free_hop(ctx, hop1_addr);
+hop1_err:
+	kfree(ctx->dram_default_hops);
+alloc_err:
+	mutex_destroy(&ctx->mmu_lock);
+
+	return rc;
 }
 
 /*
@@ -260,22 +360,65 @@ void hl_mmu_ctx_init(struct hl_ctx *ctx)
  * This function does the following:
  * - Free any pgts which were not freed yet
  * - Free the mutex
+ * - Free DRAM default page mapping hops
  */
 void hl_mmu_ctx_fini(struct hl_ctx *ctx)
 {
+	struct hl_device *hdev = ctx->hdev;
+	struct asic_fixed_properties *prop = &hdev->asic_prop;
 	struct pgt_info *pgt_info;
 	struct hlist_node *tmp;
-	int i;
+	u64 num_of_hop3, total_hops, hop1_addr, hop2_addr, hop2_pte_addr,
+		hop3_pte_addr;
+	int i, j;
 
 	if (!ctx->hdev->mmu_enable)
 		return;
 
+	if (hdev->dram_supports_virtual_memory &&
+			hdev->dram_default_page_mapping) {
+
+		num_of_hop3 = (prop->dram_size_for_default_page_mapping /
+				prop->dram_page_size) /
+				PTE_ENTRIES_IN_HOP;
+
+		/* add hop1 and hop2 */
+		total_hops = num_of_hop3 + 2;
+		hop1_addr = ctx->dram_default_hops[total_hops - 1];
+		hop2_addr = ctx->dram_default_hops[total_hops - 2];
+
+		for (i = 0 ; i < num_of_hop3 ; i++) {
+			hop3_pte_addr = ctx->dram_default_hops[i];
+			for (j = 0 ; j < PTE_ENTRIES_IN_HOP ; j++) {
+				clear_pte(hdev, hop3_pte_addr);
+				put_pte(ctx, ctx->dram_default_hops[i]);
+				hop3_pte_addr += HL_PTE_SIZE;
+			}
+		}
+
+		hop2_pte_addr = hop2_addr;
+		for (i = 0 ; i < num_of_hop3 ; i++) {
+			clear_pte(hdev, hop2_pte_addr);
+			put_pte(ctx, hop2_addr);
+			hop2_pte_addr += HL_PTE_SIZE;
+		}
+
+		clear_pte(hdev, hop1_addr);
+		put_pte(ctx, hop1_addr);
+		clear_pte(hdev, get_hop0_addr(ctx));
+
+		kfree(ctx->dram_default_hops);
+
+		/* flush all writes to reach PCI */
+		mb();
+		hdev->asic_funcs->read_pte(hdev, hop2_addr);
+	}
+
 	if (!hash_empty(ctx->mmu_hash))
-		dev_err(ctx->hdev->dev,
-				"ctx is freed while it has pgts in use\n");
+		dev_err(hdev->dev, "ctx is freed while it has pgts in use\n");
 
 	hash_for_each_safe(ctx->mmu_hash, i, tmp, pgt_info, node) {
-		dev_err(ctx->hdev->dev,
+		dev_err(hdev->dev,
 			"pgt_info of addr 0x%llx of asid %d was not destroyed, num_ptes: %d\n",
 			pgt_info->addr, ctx->asid, pgt_info->num_of_ptes);
 		free_hop(ctx, pgt_info->addr);
@@ -287,6 +430,7 @@ void hl_mmu_ctx_fini(struct hl_ctx *ctx)
 static int _hl_mmu_unmap(struct hl_ctx *ctx, u64 virt_addr)
 {
 	struct hl_device *hdev = ctx->hdev;
+	struct asic_fixed_properties *prop = &hdev->asic_prop;
 	u64 hop0_addr = 0, hop0_pte_addr = 0,
 		hop1_addr = 0, hop1_pte_addr = 0,
 		hop2_addr = 0, hop2_pte_addr = 0,
@@ -294,6 +438,11 @@ static int _hl_mmu_unmap(struct hl_ctx *ctx, u64 virt_addr)
 		hop4_addr = 0, hop4_pte_addr = 0,
 		curr_pte;
 	int clear_hop3 = 1;
+	bool is_dram_addr, is_huge, is_dram_default_page_mapping;
+
+	is_dram_addr = hl_mem_area_inside_range(virt_addr, PAGE_SIZE_2MB,
+				prop->va_space_dram_start_address,
+				prop->va_space_dram_end_address);
 
 	hop0_addr = get_hop0_addr(ctx);
 
@@ -328,7 +477,18 @@ static int _hl_mmu_unmap(struct hl_ctx *ctx, u64 virt_addr)
 
 	curr_pte = hdev->asic_funcs->read_pte(hdev, hop3_pte_addr);
 
-	if (!(curr_pte & LAST_MASK)) {
+	is_huge = curr_pte & LAST_MASK;
+
+	if (is_dram_addr && !is_huge) {
+		dev_err(hdev->dev,
+				"DRAM unmapping should use huge pages only\n");
+		return -EFAULT;
+	}
+
+	is_dram_default_page_mapping =
+			hdev->dram_default_page_mapping && is_dram_addr;
+
+	if (!is_huge) {
 		hop4_addr = get_next_hop_addr(curr_pte);
 
 		if (hop4_addr == ULLONG_MAX)
@@ -341,29 +501,51 @@ static int _hl_mmu_unmap(struct hl_ctx *ctx, u64 virt_addr)
 		clear_hop3 = 0;
 	}
 
-	if (!(curr_pte & PAGE_PRESENT_MASK))
-		goto not_mapped;
+	if (is_dram_default_page_mapping) {
+		u64 zero_pte = (prop->mmu_dram_default_page_addr &
+				PTE_PHYS_ADDR_MASK) | LAST_MASK |
+					PAGE_PRESENT_MASK;
+		if (curr_pte == zero_pte) {
+			dev_err(hdev->dev,
+				"DRAM: hop3 PTE points to zero page, can't unmap, va: 0x%llx\n",
+					virt_addr);
+			goto not_mapped;
+		}
+
+		if (!(curr_pte & PAGE_PRESENT_MASK)) {
+			dev_err(hdev->dev,
+				"DRAM: hop3 PTE is cleared! can't unmap, va: 0x%llx\n",
+					virt_addr);
+			goto not_mapped;
+		}
 
-	clear_pte(hdev, hop4_addr ? hop4_pte_addr : hop3_pte_addr);
+		hdev->asic_funcs->write_pte(hdev, hop3_pte_addr, zero_pte);
+		put_pte(ctx, hop3_addr);
+	} else {
+		if (!(curr_pte & PAGE_PRESENT_MASK))
+			goto not_mapped;
+
+		clear_pte(hdev, hop4_addr ? hop4_pte_addr : hop3_pte_addr);
 
-	if (hop4_addr && !put_pte(ctx, hop4_addr))
-		clear_hop3 = 1;
+		if (hop4_addr && !put_pte(ctx, hop4_addr))
+			clear_hop3 = 1;
 
-	if (!clear_hop3)
-		goto flush;
-	clear_pte(hdev, hop3_pte_addr);
+		if (!clear_hop3)
+			goto flush;
+		clear_pte(hdev, hop3_pte_addr);
 
-	if (put_pte(ctx, hop3_addr))
-		goto flush;
-	clear_pte(hdev, hop2_pte_addr);
+		if (put_pte(ctx, hop3_addr))
+			goto flush;
+		clear_pte(hdev, hop2_pte_addr);
 
-	if (put_pte(ctx, hop2_addr))
-		goto flush;
-	clear_pte(hdev, hop1_pte_addr);
+		if (put_pte(ctx, hop2_addr))
+			goto flush;
+		clear_pte(hdev, hop1_pte_addr);
 
-	if (put_pte(ctx, hop1_addr))
-		goto flush;
-	clear_pte(hdev, hop0_pte_addr);
+		if (put_pte(ctx, hop1_addr))
+			goto flush;
+		clear_pte(hdev, hop0_pte_addr);
+	}
 
 flush:
 	/* flush all writes from all cores to reach PCI */
@@ -442,6 +624,7 @@ static int _hl_mmu_map(struct hl_ctx *ctx, u64 virt_addr, u64 phys_addr,
 		u32 page_size)
 {
 	struct hl_device *hdev = ctx->hdev;
+	struct asic_fixed_properties *prop = &hdev->asic_prop;
 	u64 hop0_addr = 0, hop0_pte_addr = 0,
 		hop1_addr = 0, hop1_pte_addr = 0,
 		hop2_addr = 0, hop2_pte_addr = 0,
@@ -449,7 +632,8 @@ static int _hl_mmu_map(struct hl_ctx *ctx, u64 virt_addr, u64 phys_addr,
 		hop4_addr = 0, hop4_pte_addr = 0,
 		curr_pte = 0;
 	bool hop1_new = false, hop2_new = false, hop3_new = false,
-		hop4_new = false, is_huge;
+		hop4_new = false, is_huge, is_dram_addr,
+		is_dram_default_page_mapping;
 	int rc = -ENOMEM;
 
 	/*
@@ -461,6 +645,18 @@ static int _hl_mmu_map(struct hl_ctx *ctx, u64 virt_addr, u64 phys_addr,
 	 */
 	is_huge = page_size == PAGE_SIZE_2MB;
 
+	is_dram_addr = hl_mem_area_inside_range(virt_addr, page_size,
+				prop->va_space_dram_start_address,
+				prop->va_space_dram_end_address);
+
+	if (is_dram_addr && !is_huge) {
+		dev_err(hdev->dev, "DRAM mapping should use huge pages only\n");
+		return -EFAULT;
+	}
+
+	is_dram_default_page_mapping =
+			hdev->dram_default_page_mapping && is_dram_addr;
+
 	hop0_addr = get_hop0_addr(ctx);
 
 	hop0_pte_addr = get_hop0_pte_addr(ctx, hop0_addr, virt_addr);
@@ -505,7 +701,26 @@ static int _hl_mmu_map(struct hl_ctx *ctx, u64 virt_addr, u64 phys_addr,
 		curr_pte = hdev->asic_funcs->read_pte(hdev, hop4_pte_addr);
 	}
 
-	if (curr_pte & PAGE_PRESENT_MASK) {
+	if (is_dram_default_page_mapping) {
+		u64 zero_pte = (prop->mmu_dram_default_page_addr &
+					PTE_PHYS_ADDR_MASK) | LAST_MASK |
+						PAGE_PRESENT_MASK;
+
+		if (curr_pte != zero_pte) {
+			dev_err(hdev->dev,
+				"DRAM: mapping already exists for virt_addr 0x%llx\n",
+					virt_addr);
+			rc = EINVAL;
+			goto err;
+		}
+
+		if (hop1_new || hop2_new || hop3_new || hop4_new) {
+			dev_err(hdev->dev,
+				"DRAM mapping should not allocate more hops\n");
+			rc = -EFAULT;
+			goto err;
+		}
+	} else if (curr_pte & PAGE_PRESENT_MASK) {
 		dev_err(hdev->dev,
 				"mapping already exists for virt_addr 0x%llx\n",
 					virt_addr);
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread
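
The sizing arithmetic that hl_mmu_ctx_init() uses for the DRAM default page
mapping can be looked at in isolation. The sketch below is not driver code:
the helper names are hypothetical, and the value of 512 entries per hop is an
assumption made for illustration (the real PTE_ENTRIES_IN_HOP is defined in
the driver headers, which are not shown in this mail).

```c
#include <assert.h>
#include <stdint.h>

/* Assumed value for illustration only; the driver's constant may differ. */
#define EXAMPLE_PTE_ENTRIES_IN_HOP 512ULL

/* Number of hop3 tables needed so that every DRAM page has a default PTE. */
static uint64_t example_num_of_hop3(uint64_t dram_size, uint64_t page_size)
{
	assert(page_size != 0);
	return (dram_size / page_size) / EXAMPLE_PTE_ENTRIES_IN_HOP;
}

/*
 * The hop3 tables plus one hop1 and one hop2, as in the patch. In the
 * patch, indices [0, num_of_hop3) of dram_default_hops hold the hop3
 * addresses, index total_hops - 2 holds hop2 and total_hops - 1 holds hop1.
 */
static uint64_t example_total_hops(uint64_t dram_size, uint64_t page_size)
{
	return example_num_of_hop3(dram_size, page_size) + 2;
}
```

Under these assumptions, 8GB of DRAM mapped with 2MB pages gives 4096 pages,
which fit in 8 hop3 tables, for 10 hops in total.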

* [PATCH 03/15] habanalabs: disable CPU access on timeouts
  2019-02-28  8:46 [PATCH 00/15] habanalabs fixes for merge window Oded Gabbay
  2019-02-28  8:46 ` [PATCH 01/15] habanalabs: Dissociate RAZWI info from event types Oded Gabbay
  2019-02-28  8:46 ` [PATCH 02/15] habanalabs: add MMU DRAM default page mapping Oded Gabbay
@ 2019-02-28  8:46 ` Oded Gabbay
  2019-02-28  8:46 ` [PATCH 04/15] habanalabs: fix mmu cache registers init Oded Gabbay
                   ` (11 subsequent siblings)
  14 siblings, 0 replies; 18+ messages in thread
From: Oded Gabbay @ 2019-02-28  8:46 UTC (permalink / raw)
  To: gregkh, linux-kernel

This patch provides a workaround for a bug in the F/W where the response
to a request from KMD may take more than 100ms. This could cause the
queue between KMD and the F/W to get out of sync.

The WA is to:
1. Increase the timeout of ALL requests to 1s.
2. In case a request isn't answered in time, mark the state as
"cpu_disabled" and prevent sending further requests from KMD to the F/W.
This will eventually lead to a heartbeat failure and hard reset of the
device.

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
---
 drivers/misc/habanalabs/debugfs.c    | 6 ++++--
 drivers/misc/habanalabs/device.c     | 2 ++
 drivers/misc/habanalabs/goya/goya.c  | 9 +++++++--
 drivers/misc/habanalabs/habanalabs.h | 2 ++
 drivers/misc/habanalabs/hwmon.c      | 2 +-
 drivers/misc/habanalabs/sysfs.c      | 4 ++--
 6 files changed, 18 insertions(+), 7 deletions(-)

diff --git a/drivers/misc/habanalabs/debugfs.c b/drivers/misc/habanalabs/debugfs.c
index f472b572faea..1d2bbcf90f16 100644
--- a/drivers/misc/habanalabs/debugfs.c
+++ b/drivers/misc/habanalabs/debugfs.c
@@ -723,7 +723,7 @@ static ssize_t hl_device_read(struct file *f, char __user *buf,
 		return 0;
 
 	sprintf(tmp_buf,
-		"Valid values are: disable, enable, suspend, resume\n");
+		"Valid values: disable, enable, suspend, resume, cpu_timeout\n");
 	rc = simple_read_from_buffer(buf, strlen(tmp_buf) + 1, ppos, tmp_buf,
 			strlen(tmp_buf) + 1);
 
@@ -751,9 +751,11 @@ static ssize_t hl_device_write(struct file *f, const char __user *buf,
 		hdev->asic_funcs->suspend(hdev);
 	} else if (strncmp("resume", data, strlen("resume")) == 0) {
 		hdev->asic_funcs->resume(hdev);
+	} else if (strncmp("cpu_timeout", data, strlen("cpu_timeout")) == 0) {
+		hdev->device_cpu_disabled = true;
 	} else {
 		dev_err(hdev->dev,
-			"Valid values are: disable, enable, suspend, resume\n");
+			"Valid values: disable, enable, suspend, resume, cpu_timeout\n");
 		count = -EINVAL;
 	}
 
diff --git a/drivers/misc/habanalabs/device.c b/drivers/misc/habanalabs/device.c
index 120d30a13afb..de46aa6ed154 100644
--- a/drivers/misc/habanalabs/device.c
+++ b/drivers/misc/habanalabs/device.c
@@ -636,6 +636,8 @@ int hl_device_reset(struct hl_device *hdev, bool hard_reset,
 	/* Finished tear-down, starting to re-initialize */
 
 	if (hard_reset) {
+		hdev->device_cpu_disabled = false;
+
 		/* Allocate the kernel context */
 		hdev->kernel_ctx = kzalloc(sizeof(*hdev->kernel_ctx),
 						GFP_KERNEL);
diff --git a/drivers/misc/habanalabs/goya/goya.c b/drivers/misc/habanalabs/goya/goya.c
index 7c2edabe20bd..5780041abe32 100644
--- a/drivers/misc/habanalabs/goya/goya.c
+++ b/drivers/misc/habanalabs/goya/goya.c
@@ -3232,6 +3232,11 @@ int goya_send_cpu_message(struct hl_device *hdev, u32 *msg, u16 len,
 	if (hdev->disabled)
 		goto out;
 
+	if (hdev->device_cpu_disabled) {
+		rc = -EIO;
+		goto out;
+	}
+
 	rc = hl_hw_queue_send_cb_no_cmpl(hdev, GOYA_QUEUE_ID_CPU_PQ, len,
 			pkt_dma_addr);
 	if (rc) {
@@ -3245,8 +3250,8 @@ int goya_send_cpu_message(struct hl_device *hdev, u32 *msg, u16 len,
 	hl_hw_queue_inc_ci_kernel(hdev, GOYA_QUEUE_ID_CPU_PQ);
 
 	if (rc == -ETIMEDOUT) {
-		dev_err(hdev->dev,
-			"Timeout while waiting for CPU packet fence\n");
+		dev_err(hdev->dev, "Timeout while waiting for device CPU\n");
+		hdev->device_cpu_disabled = true;
 		goto out;
 	}
 
diff --git a/drivers/misc/habanalabs/habanalabs.h b/drivers/misc/habanalabs/habanalabs.h
index 59b25c6fae00..a7c95e9f9b9a 100644
--- a/drivers/misc/habanalabs/habanalabs.h
+++ b/drivers/misc/habanalabs/habanalabs.h
@@ -1079,6 +1079,7 @@ struct hl_device_reset_work {
  * @dram_default_page_mapping: is DRAM default page mapping enabled.
  * @init_done: is the initialization of the device done.
  * @mmu_enable: is MMU enabled.
+ * @device_cpu_disabled: is the device CPU disabled (due to timeouts)
  */
 struct hl_device {
 	struct pci_dev			*pdev;
@@ -1146,6 +1147,7 @@ struct hl_device {
 	u8				dram_supports_virtual_memory;
 	u8				dram_default_page_mapping;
 	u8				init_done;
+	u8				device_cpu_disabled;
 
 	/* Parameters for bring-up */
 	u8				mmu_enable;
diff --git a/drivers/misc/habanalabs/hwmon.c b/drivers/misc/habanalabs/hwmon.c
index 9c359a1dd868..7eec21f9b96e 100644
--- a/drivers/misc/habanalabs/hwmon.c
+++ b/drivers/misc/habanalabs/hwmon.c
@@ -10,7 +10,7 @@
 #include <linux/pci.h>
 #include <linux/hwmon.h>
 
-#define SENSORS_PKT_TIMEOUT		100000	/* 100ms */
+#define SENSORS_PKT_TIMEOUT		1000000	/* 1s */
 #define HWMON_NR_SENSOR_TYPES		(hwmon_pwm + 1)
 
 int hl_build_hwmon_channel_info(struct hl_device *hdev,
diff --git a/drivers/misc/habanalabs/sysfs.c b/drivers/misc/habanalabs/sysfs.c
index 6d80e7e0885c..12c782112a8c 100644
--- a/drivers/misc/habanalabs/sysfs.c
+++ b/drivers/misc/habanalabs/sysfs.c
@@ -9,8 +9,8 @@
 
 #include <linux/pci.h>
 
-#define SET_CLK_PKT_TIMEOUT	200000	/* 200ms */
-#define SET_PWR_PKT_TIMEOUT	400000	/* 400ms */
+#define SET_CLK_PKT_TIMEOUT	1000000	/* 1s */
+#define SET_PWR_PKT_TIMEOUT	1000000	/* 1s */
 
 long hl_get_frequency(struct hl_device *hdev, u32 pll_index, bool curr)
 {
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread
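
The "cpu_disabled" behaviour described in patch 03 is a one-way latch: the
first timeout sets the flag, and every later request is rejected until a hard
reset clears it. A minimal user-space sketch of that idea follows. The struct,
the function names and the callback type are hypothetical stand-ins, not the
driver's API.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the relevant part of struct hl_device. */
struct example_dev {
	bool device_cpu_disabled;
};

/* Hypothetical stand-in for "queue the packet and poll for the fence". */
typedef int (*example_send_fn)(void);

static int example_send_cpu_message(struct example_dev *dev,
					example_send_fn send)
{
	int rc;

	assert(dev != NULL && send != NULL);

	/* Once latched, do not touch the device CPU queue again. */
	if (dev->device_cpu_disabled)
		return -EIO;

	rc = send();
	if (rc == -ETIMEDOUT)
		dev->device_cpu_disabled = true;

	return rc;
}

/* Mirrors the hard-reset path, which clears the flag. */
static void example_hard_reset(struct example_dev *dev)
{
	dev->device_cpu_disabled = false;
}

/* Two trivial senders used to exercise the latch. */
static int example_send_ok(void) { return 0; }
static int example_send_timeout(void) { return -ETIMEDOUT; }
```

With this sketch, a successful send after a timeout still returns -EIO, and
only example_hard_reset() makes the device CPU reachable again.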

* [PATCH 04/15] habanalabs: fix mmu cache registers init
  2019-02-28  8:46 [PATCH 00/15] habanalabs fixes for merge window Oded Gabbay
                   ` (2 preceding siblings ...)
  2019-02-28  8:46 ` [PATCH 03/15] habanalabs: disable CPU access on timeouts Oded Gabbay
@ 2019-02-28  8:46 ` Oded Gabbay
  2019-02-28  8:46 ` [PATCH 05/15] habanalabs: fix validation of WREG32 to DMA completion Oded Gabbay
                   ` (10 subsequent siblings)
  14 siblings, 0 replies; 18+ messages in thread
From: Oded Gabbay @ 2019-02-28  8:46 UTC (permalink / raw)
  To: gregkh, linux-kernel

This patch fixes an incorrect initialization of the MMU cache registers. The
shift operation was done in the wrong direction.

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
---
 drivers/misc/habanalabs/goya/goya.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/misc/habanalabs/goya/goya.c b/drivers/misc/habanalabs/goya/goya.c
index 5780041abe32..5444cd0824b4 100644
--- a/drivers/misc/habanalabs/goya/goya.c
+++ b/drivers/misc/habanalabs/goya/goya.c
@@ -2675,8 +2675,9 @@ static int goya_mmu_init(struct hl_device *hdev)
 	goya->hw_cap_initialized |= HW_CAP_MMU;
 
 	/* init MMU cache manage page */
-	WREG32(mmSTLB_CACHE_INV_BASE_39_8, MMU_CACHE_MNG_ADDR >> 8);
-	WREG32(mmSTLB_CACHE_INV_BASE_49_40, MMU_CACHE_MNG_ADDR << 40);
+	WREG32(mmSTLB_CACHE_INV_BASE_39_8,
+				lower_32_bits(MMU_CACHE_MNG_ADDR >> 8));
+	WREG32(mmSTLB_CACHE_INV_BASE_49_40, MMU_CACHE_MNG_ADDR >> 40);
 
 	/* Remove follower feature due to performance bug */
 	WREG32_AND(mmSTLB_STLB_FEATURE_EN,
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread
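
To see why the shift direction matters in patch 04, the split of an address
into the two register fields can be written as plain functions. This is only
a sketch: the helper names are hypothetical, and it assumes (from the
register names) that one register takes address bits 39:8 and the other
takes bits 49:40.

```c
#include <assert.h>
#include <stdint.h>

/* Bits 39:8 of the address, i.e. lower_32_bits(addr >> 8). */
static uint32_t example_inv_base_39_8(uint64_t addr)
{
	return (uint32_t)(addr >> 8);
}

/* Bits 49:40 of the address; the address is assumed to fit in 50 bits. */
static uint32_t example_inv_base_49_40(uint64_t addr)
{
	assert((addr >> 50) == 0);
	return (uint32_t)(addr >> 40);
}

/* What the old code effectively wrote: the low 32 bits of (addr << 40). */
static uint32_t example_old_inv_base_49_40(uint64_t addr)
{
	return (uint32_t)(addr << 40);
}
```

Because a left shift by 40 leaves the low 40 bits zero, the old expression
always yields 0 once it is truncated to a 32-bit register value, whatever
the address is.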

* [PATCH 05/15] habanalabs: fix validation of WREG32 to DMA completion
  2019-02-28  8:46 [PATCH 00/15] habanalabs fixes for merge window Oded Gabbay
                   ` (3 preceding siblings ...)
  2019-02-28  8:46 ` [PATCH 04/15] habanalabs: fix mmu cache registers init Oded Gabbay
@ 2019-02-28  8:46 ` Oded Gabbay
  2019-02-28  8:46 ` [PATCH 06/15] habanalabs: set DMA0 completion to SOB 1007 Oded Gabbay
                   ` (9 subsequent siblings)
  14 siblings, 0 replies; 18+ messages in thread
From: Oded Gabbay @ 2019-02-28  8:46 UTC (permalink / raw)
  To: gregkh, linux-kernel

This patch fixes a bug in the validation of WREG32 in DMA queues. The
validation was too strict. It allowed the user to set the completion
address only for DMA channel 1.

The fix allows the user to set the completion address for all 5 DMA
channels.

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
---
 drivers/misc/habanalabs/goya/goya.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/misc/habanalabs/goya/goya.c b/drivers/misc/habanalabs/goya/goya.c
index 5444cd0824b4..6f0075c4e935 100644
--- a/drivers/misc/habanalabs/goya/goya.c
+++ b/drivers/misc/habanalabs/goya/goya.c
@@ -3769,7 +3769,7 @@ static int goya_validate_wreg32(struct hl_device *hdev,
 	dev_dbg(hdev->dev, "reg_offset == 0x%x\n", reg_offset);
 	dev_dbg(hdev->dev, "value      == 0x%x\n", wreg_pkt->value);
 
-	if (reg_offset != (mmDMA_CH_1_WR_COMP_ADDR_LO & 0xFFFF)) {
+	if (reg_offset != (mmDMA_CH_0_WR_COMP_ADDR_LO & 0x1FFF)) {
 		dev_err(hdev->dev, "WREG32 packet with illegal address 0x%x\n",
 			reg_offset);
 		return -EPERM;
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 06/15] habanalabs: set DMA0 completion to SOB 1007
  2019-02-28  8:46 [PATCH 00/15] habanalabs fixes for merge window Oded Gabbay
                   ` (4 preceding siblings ...)
  2019-02-28  8:46 ` [PATCH 05/15] habanalabs: fix validation of WREG32 to DMA completion Oded Gabbay
@ 2019-02-28  8:46 ` Oded Gabbay
  2019-02-28  8:46 ` [PATCH 07/15] habanalabs: extend QMAN0 job timeout Oded Gabbay
                   ` (8 subsequent siblings)
  14 siblings, 0 replies; 18+ messages in thread
From: Oded Gabbay @ 2019-02-28  8:46 UTC (permalink / raw)
  To: gregkh, linux-kernel

This patch fixes a bug where the DMA channel 0 completion address wasn't
initialized by the driver.

The patch sets the address to Sync Object no. 1007.

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
---
 drivers/misc/habanalabs/goya/goya.c                | 14 +++++++-------
 .../habanalabs/include/goya/asic_reg/goya_regs.h   |  1 +
 2 files changed, 8 insertions(+), 7 deletions(-)

diff --git a/drivers/misc/habanalabs/goya/goya.c b/drivers/misc/habanalabs/goya/goya.c
index 6f0075c4e935..578e4bdc3a49 100644
--- a/drivers/misc/habanalabs/goya/goya.c
+++ b/drivers/misc/habanalabs/goya/goya.c
@@ -1036,15 +1036,15 @@ static void goya_init_dma_ch(struct hl_device *hdev, int dma_id)
 	WREG32(mmDMA_CH_0_ERRMSG_WDATA + reg_off,
 			GOYA_ASYNC_EVENT_ID_DMA0_CH + dma_id);
 
-	if (dma_id) {
+	if (dma_id)
 		sob_addr = CFG_BASE + mmSYNC_MNGR_SOB_OBJ_1000 +
 				(dma_id - 1) * 4;
-		WREG32(mmDMA_CH_0_WR_COMP_ADDR_LO + reg_off,
-				lower_32_bits(sob_addr));
-		WREG32(mmDMA_CH_0_WR_COMP_ADDR_HI + reg_off,
-				upper_32_bits(sob_addr));
-		WREG32(mmDMA_CH_0_WR_COMP_WDATA + reg_off, 0x80000001);
-	}
+	else
+		sob_addr = CFG_BASE + mmSYNC_MNGR_SOB_OBJ_1007;
+
+	WREG32(mmDMA_CH_0_WR_COMP_ADDR_LO + reg_off, lower_32_bits(sob_addr));
+	WREG32(mmDMA_CH_0_WR_COMP_ADDR_HI + reg_off, upper_32_bits(sob_addr));
+	WREG32(mmDMA_CH_0_WR_COMP_WDATA + reg_off, 0x80000001);
 }
 
 /*
diff --git a/drivers/misc/habanalabs/include/goya/asic_reg/goya_regs.h b/drivers/misc/habanalabs/include/goya/asic_reg/goya_regs.h
index a3c746849f02..6cb0b6e54d41 100644
--- a/drivers/misc/habanalabs/include/goya/asic_reg/goya_regs.h
+++ b/drivers/misc/habanalabs/include/goya/asic_reg/goya_regs.h
@@ -108,6 +108,7 @@
 #define mmSYNC_MNGR_MON_PAY_ADDRL_0                                  0x113000
 #define mmSYNC_MNGR_SOB_OBJ_0                                        0x112000
 #define mmSYNC_MNGR_SOB_OBJ_1000                                     0x112FA0
+#define mmSYNC_MNGR_SOB_OBJ_1007                                     0x112FBC
 #define mmSYNC_MNGR_SOB_OBJ_1023                                     0x112FFC
 #define mmSYNC_MNGR_MON_STATUS_0                                     0x114000
 #define mmSYNC_MNGR_MON_STATUS_255                                   0x1143FC
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread
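
The new mmSYNC_MNGR_SOB_OBJ_1007 offset in patch 06 can be cross-checked
against the neighbouring defines. The sketch below assumes that sync objects
are 4 bytes apart, which is what the SOB_OBJ_0, SOB_OBJ_1000 and SOB_OBJ_1023
values in goya_regs.h imply. The helper names are hypothetical and CFG_BASE
is left out.

```c
#include <assert.h>
#include <stdint.h>

#define EXAMPLE_SOB_OBJ_0	0x112000u	/* mmSYNC_MNGR_SOB_OBJ_0 */
#define EXAMPLE_SOB_OBJ_SIZE	4u		/* assumed stride in bytes */

/* Register offset of sync object number sob_index (0..1023). */
static uint32_t example_sob_offset(unsigned int sob_index)
{
	assert(sob_index <= 1023);
	return EXAMPLE_SOB_OBJ_0 + sob_index * EXAMPLE_SOB_OBJ_SIZE;
}

/*
 * Channel to sync object selection as in goya_init_dma_ch() after the
 * patch: channels 1..4 use SOB 1000..1003, channel 0 uses SOB 1007.
 */
static unsigned int example_dma_ch_to_sob(unsigned int dma_id)
{
	return dma_id ? 1000 + (dma_id - 1) : 1007;
}
```

With that stride, sync object 1007 lands at 0x112FA0 + 7 * 4 = 0x112FBC,
which matches the value added by the patch.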

* [PATCH 07/15] habanalabs: extend QMAN0 job timeout
  2019-02-28  8:46 [PATCH 00/15] habanalabs fixes for merge window Oded Gabbay
                   ` (5 preceding siblings ...)
  2019-02-28  8:46 ` [PATCH 06/15] habanalabs: set DMA0 completion to SOB 1007 Oded Gabbay
@ 2019-02-28  8:46 ` Oded Gabbay
  2019-02-28  8:46 ` [PATCH 08/15] habanalabs: add comments in uapi/misc/habanalabs.h Oded Gabbay
                   ` (7 subsequent siblings)
  14 siblings, 0 replies; 18+ messages in thread
From: Oded Gabbay @ 2019-02-28  8:46 UTC (permalink / raw)
  To: gregkh, linux-kernel; +Cc: Omer Shpigelman

From: Omer Shpigelman <oshpigelman@habana.ai>

This patch fixes a bug where the timeout for sending a job on QMAN0 by
KMD was too short in a Palladium environment.

Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
---
 drivers/misc/habanalabs/goya/goya.c | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/drivers/misc/habanalabs/goya/goya.c b/drivers/misc/habanalabs/goya/goya.c
index 578e4bdc3a49..13923f4127af 100644
--- a/drivers/misc/habanalabs/goya/goya.c
+++ b/drivers/misc/habanalabs/goya/goya.c
@@ -83,6 +83,7 @@
 #define GOYA_CPU_TIMEOUT_USEC		10000000	/* 10s */
 #define GOYA_TEST_QUEUE_WAIT_USEC	100000		/* 100ms */
 #define GOYA_PLDM_MMU_TIMEOUT_USEC	(MMU_CONFIG_TIMEOUT_USEC * 100)
+#define GOYA_PLDM_QMAN0_TIMEOUT_USEC	(HL_DEVICE_TIMEOUT_USEC * 30)
 
 #define GOYA_QMAN0_FENCE_VAL		0xD169B243
 
@@ -3126,9 +3127,14 @@ static int goya_send_job_on_qman0(struct hl_device *hdev, struct hl_cs_job *job)
 	u32 *fence_ptr;
 	dma_addr_t fence_dma_addr;
 	struct hl_cb *cb;
-	u32 tmp;
+	u32 tmp, timeout;
 	int rc;
 
+	if (hdev->pldm)
+		timeout = GOYA_PLDM_QMAN0_TIMEOUT_USEC;
+	else
+		timeout = HL_DEVICE_TIMEOUT_USEC;
+
 	if (!hdev->asic_funcs->is_device_idle(hdev)) {
 		dev_err_ratelimited(hdev->dev,
 			"Can't send KMD job on QMAN0 if device is not idle\n");
@@ -3175,8 +3181,8 @@ static int goya_send_job_on_qman0(struct hl_device *hdev, struct hl_cs_job *job)
 		goto free_fence_ptr;
 	}
 
-	rc = hl_poll_timeout_memory(hdev, (u64) (uintptr_t) fence_ptr,
-					HL_DEVICE_TIMEOUT_USEC, &tmp);
+	rc = hl_poll_timeout_memory(hdev, (u64) (uintptr_t) fence_ptr, timeout,
+					&tmp);
 
 	hl_hw_queue_inc_ci_kernel(hdev, GOYA_QUEUE_ID_DMA_0);
 
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 08/15] habanalabs: add comments in uapi/misc/habanalabs.h
  2019-02-28  8:46 [PATCH 00/15] habanalabs fixes for merge window Oded Gabbay
                   ` (6 preceding siblings ...)
  2019-02-28  8:46 ` [PATCH 07/15] habanalabs: extend QMAN0 job timeout Oded Gabbay
@ 2019-02-28  8:46 ` Oded Gabbay
  2019-02-28  8:46 ` [PATCH 09/15] habanalabs: return correct error code on MMU mapping failure Oded Gabbay
                   ` (6 subsequent siblings)
  14 siblings, 0 replies; 18+ messages in thread
From: Oded Gabbay @ 2019-02-28  8:46 UTC (permalink / raw)
  To: gregkh, linux-kernel

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
---
 include/uapi/misc/habanalabs.h | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/include/uapi/misc/habanalabs.h b/include/uapi/misc/habanalabs.h
index 23d6ad3459cb..7fd6f633534c 100644
--- a/include/uapi/misc/habanalabs.h
+++ b/include/uapi/misc/habanalabs.h
@@ -112,7 +112,9 @@ struct hl_cb_in {
 	__u64 cb_handle;
 	/* HL_CB_OP_* */
 	__u32 op;
-	/* Size of CB. Minimum requested size must be PAGE_SIZE */
+	/* Size of CB. Maximum size is 2MB. The minimum size that will be
+	 * allocated, regardless of this parameter's value, is PAGE_SIZE
+	 */
 	__u32 cb_size;
 	/* Context ID - Currently not in use */
 	__u32 ctx_id;
@@ -364,6 +366,12 @@ union hl_mem_args {
  * internal. The driver will get completion notifications from the device only
  * on JOBS which are enqueued in the external queues.
  *
+ * For jobs on external queues, the user needs to create command buffers
+ * through the CB ioctl and give the CB's handle to the CS ioctl. For jobs on
+ * internal queues, the user needs to prepare a "command buffer" with packets
+ * on either the SRAM or DRAM, and give the device address of that buffer to
+ * the CS ioctl.
+ *
  * This IOCTL is asynchronous in regard to the actual execution of the CS. This
  * means it returns immediately after ALL the JOBS were enqueued on their
  * relevant queues. Therefore, the user mustn't assume the CS has been completed
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 09/15] habanalabs: return correct error code on MMU mapping failure
  2019-02-28  8:46 [PATCH 00/15] habanalabs fixes for merge window Oded Gabbay
                   ` (7 preceding siblings ...)
  2019-02-28  8:46 ` [PATCH 08/15] habanalabs: add comments in uapi/misc/habanalabs.h Oded Gabbay
@ 2019-02-28  8:46 ` Oded Gabbay
  2019-02-28  8:46 ` [PATCH 10/15] habanalabs: fix memory leak with CBs with unaligned size Oded Gabbay
                   ` (5 subsequent siblings)
  14 siblings, 0 replies; 18+ messages in thread
From: Oded Gabbay @ 2019-02-28  8:46 UTC (permalink / raw)
  To: gregkh, linux-kernel; +Cc: Omer Shpigelman

From: Omer Shpigelman <oshpigelman@habana.ai>

This patch fixes a bug where EINVAL was returned instead of -EINVAL.

Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
---
 drivers/misc/habanalabs/mmu.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/misc/habanalabs/mmu.c b/drivers/misc/habanalabs/mmu.c
index a7187f9a5948..ce404e6cc9a9 100644
--- a/drivers/misc/habanalabs/mmu.c
+++ b/drivers/misc/habanalabs/mmu.c
@@ -710,7 +710,7 @@ static int _hl_mmu_map(struct hl_ctx *ctx, u64 virt_addr, u64 phys_addr,
 			dev_err(hdev->dev,
 				"DRAM: mapping already exists for virt_addr 0x%llx\n",
 					virt_addr);
-			rc = EINVAL;
+			rc = -EINVAL;
 			goto err;
 		}
 
@@ -744,7 +744,7 @@ static int _hl_mmu_map(struct hl_ctx *ctx, u64 virt_addr, u64 phys_addr,
 							hop4_pte_addr),
 							hop4_pte_addr);
 
-		rc = EINVAL;
+		rc = -EINVAL;
 		goto err;
 	}
 
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread
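
The sign matters in patch 09 because kernel-style error handling treats
negative values as errors. The sketch below shows how a caller that tests
"rc < 0" misses a positive EINVAL. The functions are hypothetical and only
model the return-value convention, not the mapping code itself.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* errno constants are positive; the convention is to return them negated. */
static_assert(EINVAL > 0, "errno values are expected to be positive");

/* Models the code before the fix: positive errno on failure. */
static int example_map_buggy(bool already_mapped)
{
	return already_mapped ? EINVAL : 0;
}

/* Models the code after the fix: negative errno on failure. */
static int example_map_fixed(bool already_mapped)
{
	return already_mapped ? -EINVAL : 0;
}

/* A caller-side check of the common "negative means error" form. */
static bool example_failed(int rc)
{
	return rc < 0;
}
```

A caller written this way sees the buggy return value as success, while the
fixed one is reported as a failure.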

* [PATCH 10/15] habanalabs: fix memory leak with CBs with unaligned size
  2019-02-28  8:46 [PATCH 00/15] habanalabs fixes for merge window Oded Gabbay
                   ` (8 preceding siblings ...)
  2019-02-28  8:46 ` [PATCH 09/15] habanalabs: return correct error code on MMU mapping failure Oded Gabbay
@ 2019-02-28  8:46 ` Oded Gabbay
  2019-02-28  8:46 ` [PATCH 11/15] habanalabs: print pointer using %p Oded Gabbay
                   ` (4 subsequent siblings)
  14 siblings, 0 replies; 18+ messages in thread
From: Oded Gabbay @ 2019-02-28  8:46 UTC (permalink / raw)
  To: gregkh, linux-kernel

This patch fixes a bug that occurred when a command buffer with an
unaligned size (with regard to PAGE_SIZE) was used. The accounting for the
unmap operation wasn't done correctly and could result in a memory leak.

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
---
 drivers/misc/habanalabs/command_buffer.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/drivers/misc/habanalabs/command_buffer.c b/drivers/misc/habanalabs/command_buffer.c
index 28e359731fb8..85f75806a9a7 100644
--- a/drivers/misc/habanalabs/command_buffer.c
+++ b/drivers/misc/habanalabs/command_buffer.c
@@ -236,11 +236,14 @@ int hl_cb_ioctl(struct hl_fpriv *hpriv, void *data)
 static void cb_vm_close(struct vm_area_struct *vma)
 {
 	struct hl_cb *cb = (struct hl_cb *) vma->vm_private_data;
+	long new_mmap_size;
 
-	cb->mmap_size -= vma->vm_end - vma->vm_start;
+	new_mmap_size = cb->mmap_size - (vma->vm_end - vma->vm_start);
 
-	if (cb->mmap_size)
+	if (new_mmap_size > 0) {
+		cb->mmap_size = new_mmap_size;
 		return;
+	}
 
 	spin_lock(&cb->lock);
 	cb->mmap = false;
@@ -273,7 +276,7 @@ int hl_cb_mmap(struct hl_fpriv *hpriv, struct vm_area_struct *vma)
 	}
 
 	/* Validation check */
-	if ((vma->vm_end - vma->vm_start) != cb->size) {
+	if ((vma->vm_end - vma->vm_start) != ALIGN(cb->size, PAGE_SIZE)) {
 		dev_err(hdev->dev,
 			"CB mmap failed, mmap size 0x%lx != 0x%x cb size\n",
 			vma->vm_end - vma->vm_start, cb->size);
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread
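
The accounting problem fixed by patch 10 can be reproduced with plain
integers. The sketch assumes a 4KB page size and assumes that the CB's
mmap_size starts out as the unaligned CB size while the VMA length is always
page aligned. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define EXAMPLE_PAGE_SIZE	4096L	/* assumed page size */
#define EXAMPLE_ALIGN(x, a)	((((x) + (a) - 1) / (a)) * (a))

/*
 * Models cb_vm_close() after the fix. Returns true when nothing is left
 * mapped and the CB can be released, false when part of it is still mapped.
 */
static bool example_vm_close(long *mmap_size, long vma_len)
{
	long new_mmap_size;

	assert(mmap_size != NULL);

	new_mmap_size = *mmap_size - vma_len;
	if (new_mmap_size > 0) {
		*mmap_size = new_mmap_size;
		return false;
	}

	return true;
}
```

For a 5000-byte CB the VMA spans 8192 bytes, so the old "mmap_size -= len;
if (mmap_size) return;" test saw a non-zero remainder and never reached the
release path. The signed comparison above treats any remainder at or below
zero as fully unmapped.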

* [PATCH 11/15] habanalabs: print pointer using %p
  2019-02-28  8:46 [PATCH 00/15] habanalabs fixes for merge window Oded Gabbay
                   ` (9 preceding siblings ...)
  2019-02-28  8:46 ` [PATCH 10/15] habanalabs: fix memory leak with CBs with unaligned size Oded Gabbay
@ 2019-02-28  8:46 ` Oded Gabbay
  2019-02-28  9:31   ` Greg KH
  2019-02-28  8:46 ` [PATCH 12/15] habanalabs: soft-reset device if context-switch fails Oded Gabbay
                   ` (3 subsequent siblings)
  14 siblings, 1 reply; 18+ messages in thread
From: Oded Gabbay @ 2019-02-28  8:46 UTC (permalink / raw)
  To: gregkh, linux-kernel

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
---
 drivers/misc/habanalabs/goya/goya.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/misc/habanalabs/goya/goya.c b/drivers/misc/habanalabs/goya/goya.c
index 13923f4127af..39824214ce61 100644
--- a/drivers/misc/habanalabs/goya/goya.c
+++ b/drivers/misc/habanalabs/goya/goya.c
@@ -4276,9 +4276,8 @@ static int goya_parse_cb_no_ext_quque(struct hl_device *hdev,
 			return 0;
 
 		dev_err(hdev->dev,
-			"Internal CB address 0x%llx + 0x%x is not in SRAM nor in DRAM\n",
-			(u64) (uintptr_t) parser->user_cb,
-			parser->user_cb_size);
+			"Internal CB address %p + 0x%x is not in SRAM nor in DRAM\n",
+			parser->user_cb, parser->user_cb_size);
 
 		return -EFAULT;
 	}
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 12/15] habanalabs: soft-reset device if context-switch fails
  2019-02-28  8:46 [PATCH 00/15] habanalabs fixes for merge window Oded Gabbay
                   ` (10 preceding siblings ...)
  2019-02-28  8:46 ` [PATCH 11/15] habanalabs: print pointer using %p Oded Gabbay
@ 2019-02-28  8:46 ` Oded Gabbay
  2019-02-28  8:46 ` [PATCH 13/15] habanalabs: fix little-endian<->cpu conversion warnings Oded Gabbay
                   ` (2 subsequent siblings)
  14 siblings, 0 replies; 18+ messages in thread
From: Oded Gabbay @ 2019-02-28  8:46 UTC (permalink / raw)
  To: gregkh, linux-kernel

This patch fixes a bug in the driver, where if the TPC or MME remains
non-IDLE even after all the command submissions are done (due to a user bug
or a malicious user), then future command submissions will fail in the
context-switch stage and the driver will remain in a "stuck" mode.

The fix is to do a soft-reset of the device in case the context-switch
fails, because the device should be IDLE during context-switch. If it is
not IDLE, then something is wrong and we should reset the compute engines.

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
---
 drivers/misc/habanalabs/command_submission.c | 16 +++++++++-------
 drivers/misc/habanalabs/goya/goya.c          |  2 +-
 2 files changed, 10 insertions(+), 8 deletions(-)

diff --git a/drivers/misc/habanalabs/command_submission.c b/drivers/misc/habanalabs/command_submission.c
index 25ad9d805cfa..3525236ed8d9 100644
--- a/drivers/misc/habanalabs/command_submission.c
+++ b/drivers/misc/habanalabs/command_submission.c
@@ -622,13 +622,15 @@ int hl_cs_ioctl(struct hl_fpriv *hpriv, void *data)
 					"Failed to switch to context %d, rejecting CS! %d\n",
 					ctx->asid, rc);
 				/*
-				 * If we timedout, we need to soft-reset because
-				 * QMAN is probably stuck. However, we can't
-				 * call to reset here directly because of
-				 * deadlock, so need to do it at the very end
-				 * of this function
+				 * If we timedout, or if the device is not IDLE
+				 * while we want to do context-switch (-EBUSY),
+				 * we need to soft-reset because QMAN is
+				 * probably stuck. However, we can't call to
+				 * reset here directly because of deadlock, so
+				 * need to do it at the very end of this
+				 * function
 				 */
-				if (rc == -ETIMEDOUT)
+				if ((rc == -ETIMEDOUT) || (rc == -EBUSY))
 					need_soft_reset = true;
 				mutex_unlock(&hpriv->restore_phase_mutex);
 				goto out;
@@ -706,7 +708,7 @@ int hl_cs_ioctl(struct hl_fpriv *hpriv, void *data)
 		args->out.seq = cs_seq;
 	}
 
-	if ((rc == -ETIMEDOUT) && (need_soft_reset))
+	if (((rc == -ETIMEDOUT) || (rc == -EBUSY)) && (need_soft_reset))
 		hl_device_reset(hdev, false, false);
 
 	return rc;
diff --git a/drivers/misc/habanalabs/goya/goya.c b/drivers/misc/habanalabs/goya/goya.c
index 39824214ce61..11597432f519 100644
--- a/drivers/misc/habanalabs/goya/goya.c
+++ b/drivers/misc/habanalabs/goya/goya.c
@@ -3138,7 +3138,7 @@ static int goya_send_job_on_qman0(struct hl_device *hdev, struct hl_cs_job *job)
 	if (!hdev->asic_funcs->is_device_idle(hdev)) {
 		dev_err_ratelimited(hdev->dev,
 			"Can't send KMD job on QMAN0 if device is not idle\n");
-		return -EFAULT;
+		return -EBUSY;
 	}
 
 	fence_ptr = hdev->asic_funcs->dma_pool_zalloc(hdev, 4, GFP_KERNEL,
-- 
2.17.1


* [PATCH 13/15] habanalabs: fix little-endian<->cpu conversion warnings
  2019-02-28  8:46 [PATCH 00/15] habanalabs fixes for merge window Oded Gabbay
                   ` (11 preceding siblings ...)
  2019-02-28  8:46 ` [PATCH 12/15] habanalabs: soft-reset device if context-switch fails Oded Gabbay
@ 2019-02-28  8:46 ` Oded Gabbay
  2019-02-28  8:46 ` [PATCH 14/15] habanalabs: use NULL to initialize array of pointers Oded Gabbay
  2019-02-28  8:46 ` [PATCH 15/15] habanalabs: fix little-endian<->cpu conversion warnings Oded Gabbay
  14 siblings, 0 replies; 18+ messages in thread
From: Oded Gabbay @ 2019-02-28  8:46 UTC (permalink / raw)
  To: gregkh, linux-kernel; +Cc: Tomer Tayar

From: Tomer Tayar <ttayar@habana.ai>

Add cpu_to_le32/64 and le32/64_to_cpu conversions where sparse reports
that they are needed.

Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
---
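Note for readers: the pattern applied throughout this patch is "build or
inspect the value in CPU byte order, convert exactly once at the struct
boundary". A minimal user-space sketch follows. The demo_* helpers are
hypothetical stand-ins for the kernel's cpu_to_le32()/le32_to_cpu(), and
the DEMO_* shift and mask are made up for illustration; they are not the
values from the ArmCP or Goya headers.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical field layout, for illustration only. */
#define DEMO_CTL_OPCODE_SHIFT	16
#define DEMO_CTL_OPCODE_MASK	0x1FFF0000u

/* User-space stand-in for cpu_to_le32(): emit bytes LSB first. */
static uint32_t demo_cpu_to_le32(uint32_t v)
{
	uint8_t b[4] = {
		(uint8_t)(v & 0xFF), (uint8_t)((v >> 8) & 0xFF),
		(uint8_t)((v >> 16) & 0xFF), (uint8_t)((v >> 24) & 0xFF)
	};
	uint32_t le;

	memcpy(&le, b, sizeof(le));
	return le;
}

/* User-space stand-in for le32_to_cpu(): read bytes LSB first. */
static uint32_t demo_le32_to_cpu(uint32_t le)
{
	uint8_t b[4];

	memcpy(b, &le, sizeof(b));
	return (uint32_t)b[0] | ((uint32_t)b[1] << 8) |
	       ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
}

struct demo_pkt {
	uint32_t ctl;	/* stored little-endian, as the device expects */
};

/* Compose the field in CPU order, then convert once on the store. */
static void demo_pkt_set_opcode(struct demo_pkt *pkt, uint32_t opcode)
{
	uint32_t tmp = opcode << DEMO_CTL_OPCODE_SHIFT;

	pkt->ctl = demo_cpu_to_le32(tmp);
}

/* Convert once into a local, then mask and shift in CPU order. */
static uint32_t demo_pkt_get_opcode(const struct demo_pkt *pkt)
{
	uint32_t ctl = demo_le32_to_cpu(pkt->ctl);

	return (ctl & DEMO_CTL_OPCODE_MASK) >> DEMO_CTL_OPCODE_SHIFT;
}
```

Masking the raw field without the conversion happens to work on
little-endian hosts, which is why sparse is needed to catch these cases.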
 drivers/misc/habanalabs/goya/goya.c | 223 ++++++++++++++++------------
 1 file changed, 125 insertions(+), 98 deletions(-)

diff --git a/drivers/misc/habanalabs/goya/goya.c b/drivers/misc/habanalabs/goya/goya.c
index 11597432f519..c4f3ec1e9d8b 100644
--- a/drivers/misc/habanalabs/goya/goya.c
+++ b/drivers/misc/habanalabs/goya/goya.c
@@ -381,7 +381,7 @@ int goya_send_pci_access_msg(struct hl_device *hdev, u32 opcode)
 
 	memset(&pkt, 0, sizeof(pkt));
 
-	pkt.ctl = opcode << ARMCP_PKT_CTL_OPCODE_SHIFT;
+	pkt.ctl = cpu_to_le32(opcode << ARMCP_PKT_CTL_OPCODE_SHIFT);
 
 	return hdev->asic_funcs->send_cpu_message(hdev, (u32 *) &pkt,
 			sizeof(pkt), HL_DEVICE_TIMEOUT_USEC, NULL);
@@ -3167,12 +3167,13 @@ static int goya_send_job_on_qman0(struct hl_device *hdev, struct hl_cs_job *job)
 	fence_pkt = (struct packet_msg_prot *) (uintptr_t) (cb->kernel_address +
 			job->job_cb_size - sizeof(struct packet_msg_prot));
 
-	fence_pkt->ctl = (PACKET_MSG_PROT << GOYA_PKT_CTL_OPCODE_SHIFT) |
+	tmp = (PACKET_MSG_PROT << GOYA_PKT_CTL_OPCODE_SHIFT) |
 			(1 << GOYA_PKT_CTL_EB_SHIFT) |
 			(1 << GOYA_PKT_CTL_MB_SHIFT);
-	fence_pkt->value = GOYA_QMAN0_FENCE_VAL;
-	fence_pkt->addr = fence_dma_addr +
-			hdev->asic_prop.host_phys_base_address;
+	fence_pkt->ctl = cpu_to_le32(tmp);
+	fence_pkt->value = cpu_to_le32(GOYA_QMAN0_FENCE_VAL);
+	fence_pkt->addr = cpu_to_le64(fence_dma_addr +
+					hdev->asic_prop.host_phys_base_address);
 
 	rc = hl_hw_queue_send_cb_no_cmpl(hdev, GOYA_QUEUE_ID_DMA_0,
 					job->job_cb_size, cb->bus_address);
@@ -3263,16 +3264,17 @@ int goya_send_cpu_message(struct hl_device *hdev, u32 *msg, u16 len,
 	}
 
 	if (tmp == ARMCP_PACKET_FENCE_VAL) {
-		rc = (pkt->ctl & ARMCP_PKT_CTL_RC_MASK) >>
-						ARMCP_PKT_CTL_RC_SHIFT;
+		u32 ctl = le32_to_cpu(pkt->ctl);
+
+		rc = (ctl & ARMCP_PKT_CTL_RC_MASK) >> ARMCP_PKT_CTL_RC_SHIFT;
 		if (rc) {
 			dev_err(hdev->dev,
 				"F/W ERROR %d for CPU packet %d\n",
-				rc, (pkt->ctl & ARMCP_PKT_CTL_OPCODE_MASK)
+				rc, (ctl & ARMCP_PKT_CTL_OPCODE_MASK)
 						>> ARMCP_PKT_CTL_OPCODE_SHIFT);
 			rc = -EINVAL;
 		} else if (result) {
-			*result = pkt->result;
+			*result = (long) le64_to_cpu(pkt->result);
 		}
 	} else {
 		dev_err(hdev->dev, "CPU packet wrong fence value\n");
@@ -3318,12 +3320,13 @@ int goya_test_queue(struct hl_device *hdev, u32 hw_queue_id)
 		goto free_fence_ptr;
 	}
 
-	fence_pkt->ctl = (PACKET_MSG_PROT << GOYA_PKT_CTL_OPCODE_SHIFT) |
+	tmp = (PACKET_MSG_PROT << GOYA_PKT_CTL_OPCODE_SHIFT) |
 			(1 << GOYA_PKT_CTL_EB_SHIFT) |
 			(1 << GOYA_PKT_CTL_MB_SHIFT);
-	fence_pkt->value = fence_val;
-	fence_pkt->addr = fence_dma_addr +
-				hdev->asic_prop.host_phys_base_address;
+	fence_pkt->ctl = cpu_to_le32(tmp);
+	fence_pkt->value = cpu_to_le32(fence_val);
+	fence_pkt->addr = cpu_to_le64(fence_dma_addr +
+					hdev->asic_prop.host_phys_base_address);
 
 	rc = hl_hw_queue_send_cb_no_cmpl(hdev, hw_queue_id,
 					sizeof(struct packet_msg_prot),
@@ -3369,8 +3372,9 @@ int goya_test_cpu_queue(struct hl_device *hdev)
 
 	memset(&test_pkt, 0, sizeof(test_pkt));
 
-	test_pkt.ctl = ARMCP_PACKET_TEST << ARMCP_PKT_CTL_OPCODE_SHIFT;
-	test_pkt.value = ARMCP_PACKET_FENCE_VAL;
+	test_pkt.ctl = cpu_to_le32(ARMCP_PACKET_TEST <<
+					ARMCP_PKT_CTL_OPCODE_SHIFT);
+	test_pkt.value = cpu_to_le64(ARMCP_PACKET_FENCE_VAL);
 
 	rc = hdev->asic_funcs->send_cpu_message(hdev, (u32 *) &test_pkt,
 			sizeof(test_pkt), HL_DEVICE_TIMEOUT_USEC, &result);
@@ -3514,7 +3518,7 @@ static int goya_pin_memory_before_cs(struct hl_device *hdev,
 	struct hl_userptr *userptr;
 	int rc;
 
-	if (hl_userptr_is_pinned(hdev, addr, user_dma_pkt->tsize,
+	if (hl_userptr_is_pinned(hdev, addr, le32_to_cpu(user_dma_pkt->tsize),
 			parser->job_userptr_list, &userptr))
 		goto already_pinned;
 
@@ -3522,7 +3526,8 @@ static int goya_pin_memory_before_cs(struct hl_device *hdev,
 	if (!userptr)
 		return -ENOMEM;
 
-	rc = hl_pin_host_memory(hdev, addr, user_dma_pkt->tsize, userptr);
+	rc = hl_pin_host_memory(hdev, addr, le32_to_cpu(user_dma_pkt->tsize),
+				userptr);
 	if (rc)
 		goto free_userptr;
 
@@ -3561,12 +3566,15 @@ static int goya_validate_dma_pkt_host(struct hl_device *hdev,
 	bool sram_addr = true;
 	bool skip_host_mem_pin = false;
 	bool user_memset;
+	u32 ctl;
 	int rc = 0;
 
-	user_dir = (user_dma_pkt->ctl & GOYA_PKT_LIN_DMA_CTL_DMA_DIR_MASK) >>
+	ctl = le32_to_cpu(user_dma_pkt->ctl);
+
+	user_dir = (ctl & GOYA_PKT_LIN_DMA_CTL_DMA_DIR_MASK) >>
 			GOYA_PKT_LIN_DMA_CTL_DMA_DIR_SHIFT;
 
-	user_memset = (user_dma_pkt->ctl & GOYA_PKT_LIN_DMA_CTL_MEMSET_MASK) >>
+	user_memset = (ctl & GOYA_PKT_LIN_DMA_CTL_MEMSET_MASK) >>
 			GOYA_PKT_LIN_DMA_CTL_MEMSET_SHIFT;
 
 	switch (user_dir) {
@@ -3574,8 +3582,8 @@ static int goya_validate_dma_pkt_host(struct hl_device *hdev,
 		dev_dbg(hdev->dev, "DMA direction is HOST --> DRAM\n");
 		dir = DMA_TO_DEVICE;
 		sram_addr = false;
-		addr = user_dma_pkt->src_addr;
-		device_memory_addr = user_dma_pkt->dst_addr;
+		addr = le64_to_cpu(user_dma_pkt->src_addr);
+		device_memory_addr = le64_to_cpu(user_dma_pkt->dst_addr);
 		if (user_memset)
 			skip_host_mem_pin = true;
 		break;
@@ -3584,15 +3592,15 @@ static int goya_validate_dma_pkt_host(struct hl_device *hdev,
 		dev_dbg(hdev->dev, "DMA direction is DRAM --> HOST\n");
 		dir = DMA_FROM_DEVICE;
 		sram_addr = false;
-		addr = user_dma_pkt->dst_addr;
-		device_memory_addr = user_dma_pkt->src_addr;
+		addr = le64_to_cpu(user_dma_pkt->dst_addr);
+		device_memory_addr = le64_to_cpu(user_dma_pkt->src_addr);
 		break;
 
 	case DMA_HOST_TO_SRAM:
 		dev_dbg(hdev->dev, "DMA direction is HOST --> SRAM\n");
 		dir = DMA_TO_DEVICE;
-		addr = user_dma_pkt->src_addr;
-		device_memory_addr = user_dma_pkt->dst_addr;
+		addr = le64_to_cpu(user_dma_pkt->src_addr);
+		device_memory_addr = le64_to_cpu(user_dma_pkt->dst_addr);
 		if (user_memset)
 			skip_host_mem_pin = true;
 		break;
@@ -3600,8 +3608,8 @@ static int goya_validate_dma_pkt_host(struct hl_device *hdev,
 	case DMA_SRAM_TO_HOST:
 		dev_dbg(hdev->dev, "DMA direction is SRAM --> HOST\n");
 		dir = DMA_FROM_DEVICE;
-		addr = user_dma_pkt->dst_addr;
-		device_memory_addr = user_dma_pkt->src_addr;
+		addr = le64_to_cpu(user_dma_pkt->dst_addr);
+		device_memory_addr = le64_to_cpu(user_dma_pkt->src_addr);
 		break;
 	default:
 		dev_err(hdev->dev, "DMA direction is undefined\n");
@@ -3611,7 +3619,7 @@ static int goya_validate_dma_pkt_host(struct hl_device *hdev,
 	if (parser->ctx_id != HL_KERNEL_ASID_ID) {
 		if (sram_addr) {
 			if (!hl_mem_area_inside_range(device_memory_addr,
-					user_dma_pkt->tsize,
+					le32_to_cpu(user_dma_pkt->tsize),
 					hdev->asic_prop.sram_user_base_address,
 					hdev->asic_prop.sram_end_address)) {
 
@@ -3623,7 +3631,7 @@ static int goya_validate_dma_pkt_host(struct hl_device *hdev,
 			}
 		} else {
 			if (!hl_mem_area_inside_range(device_memory_addr,
-					user_dma_pkt->tsize,
+					le32_to_cpu(user_dma_pkt->tsize),
 					hdev->asic_prop.dram_user_base_address,
 					hdev->asic_prop.dram_end_address)) {
 
@@ -3659,21 +3667,24 @@ static int goya_validate_dma_pkt_no_host(struct hl_device *hdev,
 {
 	u64 sram_memory_addr, dram_memory_addr;
 	enum goya_dma_direction user_dir;
+	u32 ctl;
 
-	user_dir = (user_dma_pkt->ctl & GOYA_PKT_LIN_DMA_CTL_DMA_DIR_MASK) >>
+	ctl = le32_to_cpu(user_dma_pkt->ctl);
+	user_dir = (ctl & GOYA_PKT_LIN_DMA_CTL_DMA_DIR_MASK) >>
 			GOYA_PKT_LIN_DMA_CTL_DMA_DIR_SHIFT;
 
 	if (user_dir == DMA_DRAM_TO_SRAM) {
 		dev_dbg(hdev->dev, "DMA direction is DRAM --> SRAM\n");
-		dram_memory_addr = user_dma_pkt->src_addr;
-		sram_memory_addr = user_dma_pkt->dst_addr;
+		dram_memory_addr = le64_to_cpu(user_dma_pkt->src_addr);
+		sram_memory_addr = le64_to_cpu(user_dma_pkt->dst_addr);
 	} else {
 		dev_dbg(hdev->dev, "DMA direction is SRAM --> DRAM\n");
-		sram_memory_addr = user_dma_pkt->src_addr;
-		dram_memory_addr = user_dma_pkt->dst_addr;
+		sram_memory_addr = le64_to_cpu(user_dma_pkt->src_addr);
+		dram_memory_addr = le64_to_cpu(user_dma_pkt->dst_addr);
 	}
 
-	if (!hl_mem_area_inside_range(sram_memory_addr, user_dma_pkt->tsize,
+	if (!hl_mem_area_inside_range(sram_memory_addr,
+				le32_to_cpu(user_dma_pkt->tsize),
 				hdev->asic_prop.sram_user_base_address,
 				hdev->asic_prop.sram_end_address)) {
 		dev_err(hdev->dev, "SRAM address 0x%llx + 0x%x is invalid\n",
@@ -3681,7 +3692,8 @@ static int goya_validate_dma_pkt_no_host(struct hl_device *hdev,
 		return -EFAULT;
 	}
 
-	if (!hl_mem_area_inside_range(dram_memory_addr, user_dma_pkt->tsize,
+	if (!hl_mem_area_inside_range(dram_memory_addr,
+				le32_to_cpu(user_dma_pkt->tsize),
 				hdev->asic_prop.dram_user_base_address,
 				hdev->asic_prop.dram_end_address)) {
 		dev_err(hdev->dev, "DRAM address 0x%llx + 0x%x is invalid\n",
@@ -3699,6 +3711,7 @@ static int goya_validate_dma_pkt_no_mmu(struct hl_device *hdev,
 				struct packet_lin_dma *user_dma_pkt)
 {
 	enum goya_dma_direction user_dir;
+	u32 ctl;
 	int rc;
 
 	dev_dbg(hdev->dev, "DMA packet details:\n");
@@ -3706,7 +3719,8 @@ static int goya_validate_dma_pkt_no_mmu(struct hl_device *hdev,
 	dev_dbg(hdev->dev, "destination == 0x%llx\n", user_dma_pkt->dst_addr);
 	dev_dbg(hdev->dev, "size == %u\n", user_dma_pkt->tsize);
 
-	user_dir = (user_dma_pkt->ctl & GOYA_PKT_LIN_DMA_CTL_DMA_DIR_MASK) >>
+	ctl = le32_to_cpu(user_dma_pkt->ctl);
+	user_dir = (ctl & GOYA_PKT_LIN_DMA_CTL_DMA_DIR_MASK) >>
 			GOYA_PKT_LIN_DMA_CTL_DMA_DIR_SHIFT;
 
 	/*
@@ -3741,8 +3755,8 @@ static int goya_validate_dma_pkt_mmu(struct hl_device *hdev,
 	 * We can't allow user to read from Host using QMANs other than 1.
 	 */
 	if (parser->hw_queue_id > GOYA_QUEUE_ID_DMA_1 &&
-		hl_mem_area_inside_range(user_dma_pkt->src_addr,
-				user_dma_pkt->tsize,
+		hl_mem_area_inside_range(le64_to_cpu(user_dma_pkt->src_addr),
+				le32_to_cpu(user_dma_pkt->tsize),
 				hdev->asic_prop.va_space_host_start_address,
 				hdev->asic_prop.va_space_host_end_address)) {
 		dev_err(hdev->dev,
@@ -3769,7 +3783,8 @@ static int goya_validate_wreg32(struct hl_device *hdev,
 	u32 sob_start_addr, sob_end_addr;
 	u16 reg_offset;
 
-	reg_offset = wreg_pkt->ctl & GOYA_PKT_WREG32_CTL_REG_OFFSET_MASK;
+	reg_offset = le32_to_cpu(wreg_pkt->ctl) &
+			GOYA_PKT_WREG32_CTL_REG_OFFSET_MASK;
 
 	dev_dbg(hdev->dev, "WREG32 packet details:\n");
 	dev_dbg(hdev->dev, "reg_offset == 0x%x\n", reg_offset);
@@ -3792,8 +3807,8 @@ static int goya_validate_wreg32(struct hl_device *hdev,
 	sob_start_addr = lower_32_bits(CFG_BASE + mmSYNC_MNGR_SOB_OBJ_0);
 	sob_end_addr = lower_32_bits(CFG_BASE + mmSYNC_MNGR_SOB_OBJ_1023);
 
-	if ((wreg_pkt->value < sob_start_addr) ||
-			(wreg_pkt->value > sob_end_addr)) {
+	if ((le32_to_cpu(wreg_pkt->value) < sob_start_addr) ||
+			(le32_to_cpu(wreg_pkt->value) > sob_end_addr)) {
 
 		dev_err(hdev->dev, "WREG32 packet with illegal value 0x%x\n",
 			wreg_pkt->value);
@@ -3919,12 +3934,14 @@ static int goya_patch_dma_packet(struct hl_device *hdev,
 	struct sg_table *sgt;
 	bool skip_host_mem_pin = false;
 	bool user_memset;
-	u32 user_rdcomp_mask, user_wrcomp_mask;
+	u32 user_rdcomp_mask, user_wrcomp_mask, ctl;
 
-	user_dir = (user_dma_pkt->ctl & GOYA_PKT_LIN_DMA_CTL_DMA_DIR_MASK) >>
+	ctl = le32_to_cpu(user_dma_pkt->ctl);
+
+	user_dir = (ctl & GOYA_PKT_LIN_DMA_CTL_DMA_DIR_MASK) >>
 			GOYA_PKT_LIN_DMA_CTL_DMA_DIR_SHIFT;
 
-	user_memset = (user_dma_pkt->ctl & GOYA_PKT_LIN_DMA_CTL_MEMSET_MASK) >>
+	user_memset = (ctl & GOYA_PKT_LIN_DMA_CTL_MEMSET_MASK) >>
 			GOYA_PKT_LIN_DMA_CTL_MEMSET_SHIFT;
 
 	if ((user_dir == DMA_DRAM_TO_SRAM) || (user_dir == DMA_SRAM_TO_DRAM) ||
@@ -3935,19 +3952,20 @@ static int goya_patch_dma_packet(struct hl_device *hdev,
 	}
 
 	if ((user_dir == DMA_HOST_TO_DRAM) || (user_dir == DMA_HOST_TO_SRAM)) {
-		addr = user_dma_pkt->src_addr;
-		device_memory_addr = user_dma_pkt->dst_addr;
+		addr = le64_to_cpu(user_dma_pkt->src_addr);
+		device_memory_addr = le64_to_cpu(user_dma_pkt->dst_addr);
 		dir = DMA_TO_DEVICE;
 		if (user_memset)
 			skip_host_mem_pin = true;
 	} else {
-		addr = user_dma_pkt->dst_addr;
-		device_memory_addr = user_dma_pkt->src_addr;
+		addr = le64_to_cpu(user_dma_pkt->dst_addr);
+		device_memory_addr = le64_to_cpu(user_dma_pkt->src_addr);
 		dir = DMA_FROM_DEVICE;
 	}
 
 	if ((!skip_host_mem_pin) &&
-		(hl_userptr_is_pinned(hdev, addr, user_dma_pkt->tsize,
+		(hl_userptr_is_pinned(hdev, addr,
+			le32_to_cpu(user_dma_pkt->tsize),
 			parser->job_userptr_list, &userptr) == false)) {
 		dev_err(hdev->dev, "Userptr 0x%llx + 0x%x NOT mapped\n",
 				addr, user_dma_pkt->tsize);
@@ -3960,11 +3978,9 @@ static int goya_patch_dma_packet(struct hl_device *hdev,
 		return 0;
 	}
 
-	user_rdcomp_mask =
-			(user_dma_pkt->ctl & GOYA_PKT_LIN_DMA_CTL_RDCOMP_MASK);
+	user_rdcomp_mask = ctl & GOYA_PKT_LIN_DMA_CTL_RDCOMP_MASK;
 
-	user_wrcomp_mask =
-			(user_dma_pkt->ctl & GOYA_PKT_LIN_DMA_CTL_WRCOMP_MASK);
+	user_wrcomp_mask = ctl & GOYA_PKT_LIN_DMA_CTL_WRCOMP_MASK;
 
 	sgt = userptr->sgt;
 	dma_desc_cnt = 0;
@@ -3994,21 +4010,22 @@ static int goya_patch_dma_packet(struct hl_device *hdev,
 			}
 		}
 
-		new_dma_pkt->ctl = user_dma_pkt->ctl;
+		ctl = le32_to_cpu(user_dma_pkt->ctl);
 		if (likely(dma_desc_cnt))
-			new_dma_pkt->ctl &= ~GOYA_PKT_CTL_EB_MASK;
-		new_dma_pkt->ctl &= ~(GOYA_PKT_LIN_DMA_CTL_RDCOMP_MASK |
-					GOYA_PKT_LIN_DMA_CTL_WRCOMP_MASK);
-		new_dma_pkt->tsize = len;
+			ctl &= ~GOYA_PKT_CTL_EB_MASK;
+		ctl &= ~(GOYA_PKT_LIN_DMA_CTL_RDCOMP_MASK |
+				GOYA_PKT_LIN_DMA_CTL_WRCOMP_MASK);
+		new_dma_pkt->ctl = cpu_to_le32(ctl);
+		new_dma_pkt->tsize = cpu_to_le32((u32) len);
 
 		dma_addr += hdev->asic_prop.host_phys_base_address;
 
 		if (dir == DMA_TO_DEVICE) {
-			new_dma_pkt->src_addr = dma_addr;
-			new_dma_pkt->dst_addr = device_memory_addr;
+			new_dma_pkt->src_addr = cpu_to_le64(dma_addr);
+			new_dma_pkt->dst_addr = cpu_to_le64(device_memory_addr);
 		} else {
-			new_dma_pkt->src_addr = device_memory_addr;
-			new_dma_pkt->dst_addr = dma_addr;
+			new_dma_pkt->src_addr = cpu_to_le64(device_memory_addr);
+			new_dma_pkt->dst_addr = cpu_to_le64(dma_addr);
 		}
 
 		if (!user_memset)
@@ -4025,7 +4042,7 @@ static int goya_patch_dma_packet(struct hl_device *hdev,
 
 	/* Fix the last dma packet - rdcomp/wrcomp must be as user set them */
 	new_dma_pkt--;
-	new_dma_pkt->ctl |= (user_rdcomp_mask | user_wrcomp_mask);
+	new_dma_pkt->ctl |= cpu_to_le32(user_rdcomp_mask | user_wrcomp_mask);
 
 	*new_dma_pkt_size = dma_desc_cnt * sizeof(struct packet_lin_dma);
 
@@ -4302,22 +4319,25 @@ void goya_add_end_of_cb_packets(u64 kernel_address, u32 len, u64 cq_addr,
 				u32 cq_val, u32 msix_vec)
 {
 	struct packet_msg_prot *cq_pkt;
+	u32 tmp;
 
 	cq_pkt = (struct packet_msg_prot *) (uintptr_t)
 		(kernel_address + len - (sizeof(struct packet_msg_prot) * 2));
 
-	cq_pkt->ctl = (PACKET_MSG_PROT << GOYA_PKT_CTL_OPCODE_SHIFT) |
+	tmp = (PACKET_MSG_PROT << GOYA_PKT_CTL_OPCODE_SHIFT) |
 			(1 << GOYA_PKT_CTL_EB_SHIFT) |
 			(1 << GOYA_PKT_CTL_MB_SHIFT);
-	cq_pkt->value = cq_val;
-	cq_pkt->addr = cq_addr;
+	cq_pkt->ctl = cpu_to_le32(tmp);
+	cq_pkt->value = cpu_to_le32(cq_val);
+	cq_pkt->addr = cpu_to_le64(cq_addr);
 
 	cq_pkt++;
 
-	cq_pkt->ctl = (PACKET_MSG_PROT << GOYA_PKT_CTL_OPCODE_SHIFT) |
+	tmp = (PACKET_MSG_PROT << GOYA_PKT_CTL_OPCODE_SHIFT) |
 			(1 << GOYA_PKT_CTL_MB_SHIFT);
-	cq_pkt->value = msix_vec & 0x7FF;
-	cq_pkt->addr = CFG_BASE + mmPCIE_DBI_MSIX_DOORBELL_OFF;
+	cq_pkt->ctl = cpu_to_le32(tmp);
+	cq_pkt->value = cpu_to_le32(msix_vec & 0x7FF);
+	cq_pkt->addr = cpu_to_le64(CFG_BASE + mmPCIE_DBI_MSIX_DOORBELL_OFF);
 }
 
 static void goya_update_eq_ci(struct hl_device *hdev, u32 val)
@@ -4640,11 +4660,11 @@ static int goya_unmask_irq_arr(struct hl_device *hdev, u32 *irq_arr,
 	if (!pkt)
 		return -ENOMEM;
 
-	pkt->length = irq_arr_size / sizeof(irq_arr[0]);
+	pkt->length = cpu_to_le32(irq_arr_size / sizeof(irq_arr[0]));
 	memcpy(&pkt->irqs, irq_arr, irq_arr_size);
 
-	pkt->armcp_pkt.ctl = ARMCP_PACKET_UNMASK_RAZWI_IRQ_ARRAY <<
-						ARMCP_PKT_CTL_OPCODE_SHIFT;
+	pkt->armcp_pkt.ctl = cpu_to_le32(ARMCP_PACKET_UNMASK_RAZWI_IRQ_ARRAY <<
+						ARMCP_PKT_CTL_OPCODE_SHIFT);
 
 	rc = hdev->asic_funcs->send_cpu_message(hdev, (u32 *) pkt,
 			total_pkt_size, HL_DEVICE_TIMEOUT_USEC, &result);
@@ -4675,8 +4695,9 @@ static int goya_unmask_irq(struct hl_device *hdev, u16 event_type)
 
 	memset(&pkt, 0, sizeof(pkt));
 
-	pkt.ctl = ARMCP_PACKET_UNMASK_RAZWI_IRQ << ARMCP_PKT_CTL_OPCODE_SHIFT;
-	pkt.value = event_type;
+	pkt.ctl = cpu_to_le32(ARMCP_PACKET_UNMASK_RAZWI_IRQ <<
+				ARMCP_PKT_CTL_OPCODE_SHIFT);
+	pkt.value = cpu_to_le64(event_type);
 
 	rc = hdev->asic_funcs->send_cpu_message(hdev, (u32 *) &pkt, sizeof(pkt),
 			HL_DEVICE_TIMEOUT_USEC, &result);
@@ -4689,8 +4710,9 @@ static int goya_unmask_irq(struct hl_device *hdev, u16 event_type)
 
 void goya_handle_eqe(struct hl_device *hdev, struct hl_eq_entry *eq_entry)
 {
-	u16 event_type = ((eq_entry->hdr.ctl & EQ_CTL_EVENT_TYPE_MASK)
-			>> EQ_CTL_EVENT_TYPE_SHIFT);
+	u32 ctl = le32_to_cpu(eq_entry->hdr.ctl);
+	u16 event_type = ((ctl & EQ_CTL_EVENT_TYPE_MASK)
+				>> EQ_CTL_EVENT_TYPE_SHIFT);
 	struct goya_device *goya = hdev->asic_specific;
 
 	goya->events_stat[event_type]++;
@@ -4800,7 +4822,7 @@ static int goya_memset_device_memory(struct hl_device *hdev, u64 addr, u32 size,
 	struct packet_lin_dma *lin_dma_pkt;
 	struct hl_cs_parser parser;
 	struct hl_cs_job *job;
-	u32 cb_size;
+	u32 cb_size, ctl;
 	struct hl_cb *cb;
 	int rc;
 
@@ -4813,18 +4835,18 @@ static int goya_memset_device_memory(struct hl_device *hdev, u64 addr, u32 size,
 	memset(lin_dma_pkt, 0, sizeof(*lin_dma_pkt));
 	cb_size = sizeof(*lin_dma_pkt);
 
-	lin_dma_pkt->ctl = ((PACKET_LIN_DMA << GOYA_PKT_CTL_OPCODE_SHIFT) |
-				(1 << GOYA_PKT_LIN_DMA_CTL_MEMSET_SHIFT) |
-				(1 << GOYA_PKT_LIN_DMA_CTL_WO_SHIFT) |
-				(1 << GOYA_PKT_CTL_RB_SHIFT) |
-				(1 << GOYA_PKT_CTL_MB_SHIFT));
-
-	lin_dma_pkt->ctl |= (is_dram ? DMA_HOST_TO_DRAM : DMA_HOST_TO_SRAM) <<
-				GOYA_PKT_LIN_DMA_CTL_DMA_DIR_SHIFT;
+	ctl = ((PACKET_LIN_DMA << GOYA_PKT_CTL_OPCODE_SHIFT) |
+			(1 << GOYA_PKT_LIN_DMA_CTL_MEMSET_SHIFT) |
+			(1 << GOYA_PKT_LIN_DMA_CTL_WO_SHIFT) |
+			(1 << GOYA_PKT_CTL_RB_SHIFT) |
+			(1 << GOYA_PKT_CTL_MB_SHIFT));
+	ctl |= (is_dram ? DMA_HOST_TO_DRAM : DMA_HOST_TO_SRAM) <<
+			GOYA_PKT_LIN_DMA_CTL_DMA_DIR_SHIFT;
+	lin_dma_pkt->ctl = cpu_to_le32(ctl);
 
-	lin_dma_pkt->src_addr = val;
-	lin_dma_pkt->dst_addr = addr;
-	lin_dma_pkt->tsize = size;
+	lin_dma_pkt->src_addr = cpu_to_le64(val);
+	lin_dma_pkt->dst_addr = cpu_to_le64(addr);
+	lin_dma_pkt->tsize = cpu_to_le32(size);
 
 	job = hl_cs_allocate_job(hdev, true);
 	if (!job) {
@@ -5077,8 +5099,9 @@ int goya_send_heartbeat(struct hl_device *hdev)
 
 	memset(&hb_pkt, 0, sizeof(hb_pkt));
 
-	hb_pkt.ctl = ARMCP_PACKET_TEST << ARMCP_PKT_CTL_OPCODE_SHIFT;
-	hb_pkt.value = ARMCP_PACKET_FENCE_VAL;
+	hb_pkt.ctl = cpu_to_le32(ARMCP_PACKET_TEST <<
+					ARMCP_PKT_CTL_OPCODE_SHIFT);
+	hb_pkt.value = cpu_to_le64(ARMCP_PACKET_FENCE_VAL);
 
 	rc = hdev->asic_funcs->send_cpu_message(hdev, (u32 *) &hb_pkt,
 			sizeof(hb_pkt), HL_DEVICE_TIMEOUT_USEC, &result);
@@ -5116,9 +5139,11 @@ static int goya_armcp_info_get(struct hl_device *hdev)
 
 	memset(&pkt, 0, sizeof(pkt));
 
-	pkt.ctl = ARMCP_PACKET_INFO_GET << ARMCP_PKT_CTL_OPCODE_SHIFT;
-	pkt.addr = armcp_info_dma_addr + prop->host_phys_base_address;
-	pkt.data_max_size = sizeof(struct armcp_info);
+	pkt.ctl = cpu_to_le32(ARMCP_PACKET_INFO_GET <<
+				ARMCP_PKT_CTL_OPCODE_SHIFT);
+	pkt.addr = cpu_to_le64(armcp_info_dma_addr +
+				prop->host_phys_base_address);
+	pkt.data_max_size = cpu_to_le32(sizeof(struct armcp_info));
 
 	rc = hdev->asic_funcs->send_cpu_message(hdev, (u32 *) &pkt, sizeof(pkt),
 			GOYA_ARMCP_INFO_TIMEOUT, &result);
@@ -5132,7 +5157,7 @@ static int goya_armcp_info_get(struct hl_device *hdev)
 	memcpy(&prop->armcp_info, armcp_info_cpu_addr,
 			sizeof(prop->armcp_info));
 
-	dram_size = prop->armcp_info.dram_size;
+	dram_size = le64_to_cpu(prop->armcp_info.dram_size);
 	if (dram_size) {
 		if ((!is_power_of_2(dram_size)) ||
 				(dram_size < DRAM_PHYS_DEFAULT_SIZE)) {
@@ -5270,9 +5295,11 @@ static int goya_get_eeprom_data(struct hl_device *hdev, void *data,
 
 	memset(&pkt, 0, sizeof(pkt));
 
-	pkt.ctl = ARMCP_PACKET_EEPROM_DATA_GET << ARMCP_PKT_CTL_OPCODE_SHIFT;
-	pkt.addr = eeprom_info_dma_addr + prop->host_phys_base_address;
-	pkt.data_max_size = max_size;
+	pkt.ctl = cpu_to_le32(ARMCP_PACKET_EEPROM_DATA_GET <<
+				ARMCP_PKT_CTL_OPCODE_SHIFT);
+	pkt.addr = cpu_to_le64(eeprom_info_dma_addr +
+				prop->host_phys_base_address);
+	pkt.data_max_size = cpu_to_le32(max_size);
 
 	rc = hdev->asic_funcs->send_cpu_message(hdev, (u32 *) &pkt, sizeof(pkt),
 			GOYA_ARMCP_EEPROM_TIMEOUT, &result);
-- 
2.17.1


* [PATCH 14/15] habanalabs: use NULL to initialize array of pointers
  2019-02-28  8:46 [PATCH 00/15] habanalabs fixes for merge window Oded Gabbay
                   ` (12 preceding siblings ...)
  2019-02-28  8:46 ` [PATCH 13/15] habanalabs: fix little-endian<->cpu conversion warnings Oded Gabbay
@ 2019-02-28  8:46 ` Oded Gabbay
  2019-02-28  8:46 ` [PATCH 15/15] habanalabs: fix little-endian<->cpu conversion warnings Oded Gabbay
  14 siblings, 0 replies; 18+ messages in thread
From: Oded Gabbay @ 2019-02-28  8:46 UTC (permalink / raw)
  To: gregkh, linux-kernel

This patch fixes the following sparse warning:

drivers/misc/habanalabs/hwmon.c:20:56: warning: Using plain integer as NULL pointer

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
---
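Note for readers: both initializers zero the whole array, so the change is
about intent and type checking rather than behaviour. A small stand-alone
sketch, where DEMO_NR_TYPES and demo_count_unset() are hypothetical names
used only for this illustration:

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_NR_TYPES 4	/* hypothetical number of sensor types */

/* Counts slots that are still unset right after initialization. */
static size_t demo_count_unset(void)
{
	/* Integer array: a plain 0 is the natural initializer. */
	uint32_t counts[DEMO_NR_TYPES] = {0};
	/* Pointer array: NULL states the intent and keeps sparse quiet. */
	uint32_t *by_type[DEMO_NR_TYPES] = {NULL};
	size_t i, unset = 0;

	for (i = 0; i < DEMO_NR_TYPES; i++)
		if (by_type[i] == NULL && counts[i] == 0)
			unset++;

	return unset;
}
```

Elements without an explicit initializer are zero-initialized as well, so
every pointer in the array starts out as NULL.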
 drivers/misc/habanalabs/hwmon.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/misc/habanalabs/hwmon.c b/drivers/misc/habanalabs/hwmon.c
index 7eec21f9b96e..af81084ef495 100644
--- a/drivers/misc/habanalabs/hwmon.c
+++ b/drivers/misc/habanalabs/hwmon.c
@@ -17,7 +17,7 @@ int hl_build_hwmon_channel_info(struct hl_device *hdev,
 				struct armcp_sensor *sensors_arr)
 {
 	u32 counts[HWMON_NR_SENSOR_TYPES] = {0};
-	u32 *sensors_by_type[HWMON_NR_SENSOR_TYPES] = {0};
+	u32 *sensors_by_type[HWMON_NR_SENSOR_TYPES] = {NULL};
 	u32 sensors_by_type_next_index[HWMON_NR_SENSOR_TYPES] = {0};
 	struct hwmon_channel_info **channels_info;
 	u32 num_sensors_for_type, num_active_sensor_types = 0,
-- 
2.17.1


* [PATCH 15/15] habanalabs: fix little-endian<->cpu conversion warnings
  2019-02-28  8:46 [PATCH 00/15] habanalabs fixes for merge window Oded Gabbay
                   ` (13 preceding siblings ...)
  2019-02-28  8:46 ` [PATCH 14/15] habanalabs: use NULL to initialize array of pointers Oded Gabbay
@ 2019-02-28  8:46 ` Oded Gabbay
  14 siblings, 0 replies; 18+ messages in thread
From: Oded Gabbay @ 2019-02-28  8:46 UTC (permalink / raw)
  To: gregkh, linux-kernel

Add __cpu_to_le16/32/64 and __le16/32/64_to_cpu conversions where sparse
reports that they are needed.

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
---
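Note for readers: several hunks below store an int or u32 argument into a
16-bit little-endian packet field. A user-space sketch of that case
follows. The demo_* helpers and struct demo_sensor_req are hypothetical
stand-ins (not the ArmCP packet layout), and the explicit casts only make
visible the narrowing that such an assignment implies.

```c
#include <stdint.h>
#include <string.h>

/* User-space stand-in for __cpu_to_le16(): emit the low byte first. */
static uint16_t demo_cpu_to_le16(uint16_t v)
{
	uint8_t b[2] = { (uint8_t)(v & 0xFF), (uint8_t)(v >> 8) };
	uint16_t le;

	memcpy(&le, b, sizeof(le));
	return le;
}

/* User-space stand-in for __le16_to_cpu(): read the low byte first. */
static uint16_t demo_le16_to_cpu(uint16_t le)
{
	uint8_t b[2];

	memcpy(b, &le, sizeof(b));
	return (uint16_t)(b[0] | (b[1] << 8));
}

/* Hypothetical request with two 16-bit little-endian fields. */
struct demo_sensor_req {
	uint16_t sensor_index;
	uint16_t type;
};

/* Wider arguments are narrowed to 16 bits before the conversion. */
static void demo_fill_sensor_req(struct demo_sensor_req *req,
				 int sensor_index, uint32_t attr)
{
	req->sensor_index = demo_cpu_to_le16((uint16_t)sensor_index);
	req->type = demo_cpu_to_le16((uint16_t)attr);
}
```

Values that do not fit in 16 bits are silently truncated, so callers are
assumed to pass small indices and attribute codes.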
 drivers/misc/habanalabs/debugfs.c          | 15 ++++---
 drivers/misc/habanalabs/habanalabs_ioctl.c |  2 +-
 drivers/misc/habanalabs/hw_queue.c         | 23 +++++-----
 drivers/misc/habanalabs/hwmon.c            | 50 ++++++++++++----------
 drivers/misc/habanalabs/irq.c              |  8 ++--
 drivers/misc/habanalabs/sysfs.c            | 25 ++++++-----
 6 files changed, 70 insertions(+), 53 deletions(-)

diff --git a/drivers/misc/habanalabs/debugfs.c b/drivers/misc/habanalabs/debugfs.c
index 1d2bbcf90f16..a53c12aff6ad 100644
--- a/drivers/misc/habanalabs/debugfs.c
+++ b/drivers/misc/habanalabs/debugfs.c
@@ -29,7 +29,8 @@ static int hl_debugfs_i2c_read(struct hl_device *hdev, u8 i2c_bus, u8 i2c_addr,
 
 	memset(&pkt, 0, sizeof(pkt));
 
-	pkt.ctl = ARMCP_PACKET_I2C_RD << ARMCP_PKT_CTL_OPCODE_SHIFT;
+	pkt.ctl = __cpu_to_le32(ARMCP_PACKET_I2C_RD <<
+				ARMCP_PKT_CTL_OPCODE_SHIFT);
 	pkt.i2c_bus = i2c_bus;
 	pkt.i2c_addr = i2c_addr;
 	pkt.i2c_reg = i2c_reg;
@@ -54,11 +55,12 @@ static int hl_debugfs_i2c_write(struct hl_device *hdev, u8 i2c_bus, u8 i2c_addr,
 
 	memset(&pkt, 0, sizeof(pkt));
 
-	pkt.ctl = ARMCP_PACKET_I2C_WR << ARMCP_PKT_CTL_OPCODE_SHIFT;
+	pkt.ctl = __cpu_to_le32(ARMCP_PACKET_I2C_WR <<
+				ARMCP_PKT_CTL_OPCODE_SHIFT);
 	pkt.i2c_bus = i2c_bus;
 	pkt.i2c_addr = i2c_addr;
 	pkt.i2c_reg = i2c_reg;
-	pkt.value = val;
+	pkt.value = __cpu_to_le64(val);
 
 	rc = hdev->asic_funcs->send_cpu_message(hdev, (u32 *) &pkt, sizeof(pkt),
 					HL_DEVICE_TIMEOUT_USEC, NULL);
@@ -79,9 +81,10 @@ static void hl_debugfs_led_set(struct hl_device *hdev, u8 led, u8 state)
 
 	memset(&pkt, 0, sizeof(pkt));
 
-	pkt.ctl = ARMCP_PACKET_LED_SET << ARMCP_PKT_CTL_OPCODE_SHIFT;
-	pkt.led_index = led;
-	pkt.value = state;
+	pkt.ctl = __cpu_to_le32(ARMCP_PACKET_LED_SET <<
+				ARMCP_PKT_CTL_OPCODE_SHIFT);
+	pkt.led_index = __cpu_to_le32(led);
+	pkt.value = __cpu_to_le64(state);
 
 	rc = hdev->asic_funcs->send_cpu_message(hdev, (u32 *) &pkt, sizeof(pkt),
 						HL_DEVICE_TIMEOUT_USEC, NULL);
diff --git a/drivers/misc/habanalabs/habanalabs_ioctl.c b/drivers/misc/habanalabs/habanalabs_ioctl.c
index 12408d3302e9..2c2739a3c5ec 100644
--- a/drivers/misc/habanalabs/habanalabs_ioctl.c
+++ b/drivers/misc/habanalabs/habanalabs_ioctl.c
@@ -39,7 +39,7 @@ static int hw_ip_info(struct hl_device *hdev, struct hl_info_args *args)
 	hw_ip.num_of_events = prop->num_of_events;
 	memcpy(hw_ip.armcp_version,
 		prop->armcp_info.armcp_version, VERSION_MAX_LEN);
-	hw_ip.armcp_cpld_version = prop->armcp_info.cpld_version;
+	hw_ip.armcp_cpld_version = __le32_to_cpu(prop->armcp_info.cpld_version);
 	hw_ip.psoc_pci_pll_nr = prop->psoc_pci_pll_nr;
 	hw_ip.psoc_pci_pll_nf = prop->psoc_pci_pll_nf;
 	hw_ip.psoc_pci_pll_od = prop->psoc_pci_pll_od;
diff --git a/drivers/misc/habanalabs/hw_queue.c b/drivers/misc/habanalabs/hw_queue.c
index 68dfda59a875..67bece26417c 100644
--- a/drivers/misc/habanalabs/hw_queue.c
+++ b/drivers/misc/habanalabs/hw_queue.c
@@ -80,9 +80,9 @@ static void ext_queue_submit_bd(struct hl_device *hdev, struct hl_hw_queue *q,
 
 	bd = (struct hl_bd *) (uintptr_t) q->kernel_address;
 	bd += hl_pi_2_offset(q->pi);
-	bd->ctl = ctl;
-	bd->len = len;
-	bd->ptr = ptr + hdev->asic_prop.host_phys_base_address;
+	bd->ctl = __cpu_to_le32(ctl);
+	bd->len = __cpu_to_le32(len);
+	bd->ptr = __cpu_to_le64(ptr + hdev->asic_prop.host_phys_base_address);
 
 	q->pi = hl_queue_inc_ptr(q->pi);
 	hdev->asic_funcs->ring_doorbell(hdev, q->hw_queue_id, q->pi);
@@ -249,10 +249,11 @@ static void ext_hw_queue_schedule_job(struct hl_cs_job *job)
 	len = job->job_cb_size;
 	ptr = cb->bus_address;
 
-	cq_pkt.data = (q->pi << CQ_ENTRY_SHADOW_INDEX_SHIFT)
-					& CQ_ENTRY_SHADOW_INDEX_MASK;
-	cq_pkt.data |= 1 << CQ_ENTRY_SHADOW_INDEX_VALID_SHIFT;
-	cq_pkt.data |= 1 << CQ_ENTRY_READY_SHIFT;
+	cq_pkt.data = __cpu_to_le32(
+				((q->pi << CQ_ENTRY_SHADOW_INDEX_SHIFT)
+					& CQ_ENTRY_SHADOW_INDEX_MASK) |
+				(1 << CQ_ENTRY_SHADOW_INDEX_VALID_SHIFT) |
+				(1 << CQ_ENTRY_READY_SHIFT));
 
 	/*
 	 * No need to protect pi_offset because scheduling to the
@@ -267,7 +268,9 @@ static void ext_hw_queue_schedule_job(struct hl_cs_job *job)
 	cq_addr += cq->pi * sizeof(struct hl_cq_entry);
 
 	hdev->asic_funcs->add_end_of_cb_packets(cb->kernel_address, len,
-				cq_addr, cq_pkt.data, q->hw_queue_id);
+						cq_addr,
+						__le32_to_cpu(cq_pkt.data),
+						q->hw_queue_id);
 
 	q->shadow_queue[hl_pi_2_offset(q->pi)] = job;
 
@@ -292,8 +295,8 @@ static void int_hw_queue_schedule_job(struct hl_cs_job *job)
 	u64 *pi, *pbd = (u64 *) &bd;
 
 	bd.ctl = 0;
-	bd.len = job->job_cb_size;
-	bd.ptr = (u64) (uintptr_t) job->user_cb;
+	bd.len = __cpu_to_le32(job->job_cb_size);
+	bd.ptr = __cpu_to_le64((u64) (uintptr_t) job->user_cb);
 
 	pi = (u64 *) (uintptr_t) (q->kernel_address +
 		((q->pi & (q->int_queue_len - 1)) * sizeof(bd)));
diff --git a/drivers/misc/habanalabs/hwmon.c b/drivers/misc/habanalabs/hwmon.c
index af81084ef495..77facd25c4a2 100644
--- a/drivers/misc/habanalabs/hwmon.c
+++ b/drivers/misc/habanalabs/hwmon.c
@@ -26,7 +26,7 @@ int hl_build_hwmon_channel_info(struct hl_device *hdev,
 	int rc, i, j;
 
 	for (i = 0 ; i < ARMCP_MAX_SENSORS ; i++) {
-		type = sensors_arr[i].type;
+		type = __le32_to_cpu(sensors_arr[i].type);
 
 		if ((type == 0) && (sensors_arr[i].flags == 0))
 			break;
@@ -58,10 +58,10 @@ int hl_build_hwmon_channel_info(struct hl_device *hdev,
 	}
 
 	for (i = 0 ; i < arr_size ; i++) {
-		type = sensors_arr[i].type;
+		type = __le32_to_cpu(sensors_arr[i].type);
 		curr_arr = sensors_by_type[type];
 		curr_arr[sensors_by_type_next_index[type]++] =
-				sensors_arr[i].flags;
+				__le32_to_cpu(sensors_arr[i].flags);
 	}
 
 	channels_info = kcalloc(num_active_sensor_types + 1,
@@ -273,9 +273,10 @@ long hl_get_temperature(struct hl_device *hdev, int sensor_index, u32 attr)
 
 	memset(&pkt, 0, sizeof(pkt));
 
-	pkt.ctl = ARMCP_PACKET_TEMPERATURE_GET << ARMCP_PKT_CTL_OPCODE_SHIFT;
-	pkt.sensor_index = sensor_index;
-	pkt.type = attr;
+	pkt.ctl = __cpu_to_le32(ARMCP_PACKET_TEMPERATURE_GET <<
+				ARMCP_PKT_CTL_OPCODE_SHIFT);
+	pkt.sensor_index = __cpu_to_le16(sensor_index);
+	pkt.type = __cpu_to_le16(attr);
 
 	rc = hdev->asic_funcs->send_cpu_message(hdev, (u32 *) &pkt, sizeof(pkt),
 			SENSORS_PKT_TIMEOUT, &result);
@@ -298,9 +299,10 @@ long hl_get_voltage(struct hl_device *hdev, int sensor_index, u32 attr)
 
 	memset(&pkt, 0, sizeof(pkt));
 
-	pkt.ctl = ARMCP_PACKET_VOLTAGE_GET << ARMCP_PKT_CTL_OPCODE_SHIFT;
-	pkt.sensor_index = sensor_index;
-	pkt.type = attr;
+	pkt.ctl = __cpu_to_le32(ARMCP_PACKET_VOLTAGE_GET <<
+				ARMCP_PKT_CTL_OPCODE_SHIFT);
+	pkt.sensor_index = __cpu_to_le16(sensor_index);
+	pkt.type = __cpu_to_le16(attr);
 
 	rc = hdev->asic_funcs->send_cpu_message(hdev, (u32 *) &pkt, sizeof(pkt),
 					SENSORS_PKT_TIMEOUT, &result);
@@ -323,9 +325,10 @@ long hl_get_current(struct hl_device *hdev, int sensor_index, u32 attr)
 
 	memset(&pkt, 0, sizeof(pkt));
 
-	pkt.ctl = ARMCP_PACKET_CURRENT_GET << ARMCP_PKT_CTL_OPCODE_SHIFT;
-	pkt.sensor_index = sensor_index;
-	pkt.type = attr;
+	pkt.ctl = __cpu_to_le32(ARMCP_PACKET_CURRENT_GET <<
+				ARMCP_PKT_CTL_OPCODE_SHIFT);
+	pkt.sensor_index = __cpu_to_le16(sensor_index);
+	pkt.type = __cpu_to_le16(attr);
 
 	rc = hdev->asic_funcs->send_cpu_message(hdev, (u32 *) &pkt, sizeof(pkt),
 					SENSORS_PKT_TIMEOUT, &result);
@@ -348,9 +351,10 @@ long hl_get_fan_speed(struct hl_device *hdev, int sensor_index, u32 attr)
 
 	memset(&pkt, 0, sizeof(pkt));
 
-	pkt.ctl = ARMCP_PACKET_FAN_SPEED_GET << ARMCP_PKT_CTL_OPCODE_SHIFT;
-	pkt.sensor_index = sensor_index;
-	pkt.type = attr;
+	pkt.ctl = __cpu_to_le32(ARMCP_PACKET_FAN_SPEED_GET <<
+				ARMCP_PKT_CTL_OPCODE_SHIFT);
+	pkt.sensor_index = __cpu_to_le16(sensor_index);
+	pkt.type = __cpu_to_le16(attr);
 
 	rc = hdev->asic_funcs->send_cpu_message(hdev, (u32 *) &pkt, sizeof(pkt),
 					SENSORS_PKT_TIMEOUT, &result);
@@ -373,9 +377,10 @@ long hl_get_pwm_info(struct hl_device *hdev, int sensor_index, u32 attr)
 
 	memset(&pkt, 0, sizeof(pkt));
 
-	pkt.ctl = ARMCP_PACKET_PWM_GET << ARMCP_PKT_CTL_OPCODE_SHIFT;
-	pkt.sensor_index = sensor_index;
-	pkt.type = attr;
+	pkt.ctl = __cpu_to_le32(ARMCP_PACKET_PWM_GET <<
+				ARMCP_PKT_CTL_OPCODE_SHIFT);
+	pkt.sensor_index = __cpu_to_le16(sensor_index);
+	pkt.type = __cpu_to_le16(attr);
 
 	rc = hdev->asic_funcs->send_cpu_message(hdev, (u32 *) &pkt, sizeof(pkt),
 					SENSORS_PKT_TIMEOUT, &result);
@@ -398,10 +403,11 @@ void hl_set_pwm_info(struct hl_device *hdev, int sensor_index, u32 attr,
 
 	memset(&pkt, 0, sizeof(pkt));
 
-	pkt.ctl = ARMCP_PACKET_PWM_SET << ARMCP_PKT_CTL_OPCODE_SHIFT;
-	pkt.sensor_index = sensor_index;
-	pkt.type = attr;
-	pkt.value = value;
+	pkt.ctl = __cpu_to_le32(ARMCP_PACKET_PWM_SET <<
+				ARMCP_PKT_CTL_OPCODE_SHIFT);
+	pkt.sensor_index = __cpu_to_le16(sensor_index);
+	pkt.type = __cpu_to_le16(attr);
+	pkt.value = __cpu_to_le64(value);
 
 	rc = hdev->asic_funcs->send_cpu_message(hdev, (u32 *) &pkt, sizeof(pkt),
 					SENSORS_PKT_TIMEOUT, NULL);
diff --git a/drivers/misc/habanalabs/irq.c b/drivers/misc/habanalabs/irq.c
index d4c2077a3718..e69a09c10e3f 100644
--- a/drivers/misc/habanalabs/irq.c
+++ b/drivers/misc/habanalabs/irq.c
@@ -161,8 +161,8 @@ irqreturn_t hl_irq_handler_eq(int irq, void *arg)
 
 	while (1) {
 		bool entry_ready =
-				((eq_base[eq->ci].hdr.ctl & EQ_CTL_READY_MASK)
-						>> EQ_CTL_READY_SHIFT);
+			((__le32_to_cpu(eq_base[eq->ci].hdr.ctl) &
+				EQ_CTL_READY_MASK) >> EQ_CTL_READY_SHIFT);
 
 		if (!entry_ready)
 			break;
@@ -194,7 +194,9 @@ irqreturn_t hl_irq_handler_eq(int irq, void *arg)
 		}
 skip_irq:
 		/* Clear EQ entry ready bit */
-		eq_entry->hdr.ctl &= ~EQ_CTL_READY_MASK;
+		eq_entry->hdr.ctl =
+			__cpu_to_le32(__le32_to_cpu(eq_entry->hdr.ctl) &
+							~EQ_CTL_READY_MASK);
 
 		eq->ci = hl_eq_inc_ptr(eq->ci);
 
diff --git a/drivers/misc/habanalabs/sysfs.c b/drivers/misc/habanalabs/sysfs.c
index 12c782112a8c..c900ab15cceb 100644
--- a/drivers/misc/habanalabs/sysfs.c
+++ b/drivers/misc/habanalabs/sysfs.c
@@ -21,12 +21,12 @@ long hl_get_frequency(struct hl_device *hdev, u32 pll_index, bool curr)
 	memset(&pkt, 0, sizeof(pkt));
 
 	if (curr)
-		pkt.ctl = ARMCP_PACKET_FREQUENCY_CURR_GET <<
-						ARMCP_PKT_CTL_OPCODE_SHIFT;
+		pkt.ctl = __cpu_to_le32(ARMCP_PACKET_FREQUENCY_CURR_GET <<
+						ARMCP_PKT_CTL_OPCODE_SHIFT);
 	else
-		pkt.ctl = ARMCP_PACKET_FREQUENCY_GET <<
-						ARMCP_PKT_CTL_OPCODE_SHIFT;
-	pkt.pll_index = pll_index;
+		pkt.ctl = __cpu_to_le32(ARMCP_PACKET_FREQUENCY_GET <<
+						ARMCP_PKT_CTL_OPCODE_SHIFT);
+	pkt.pll_index = __cpu_to_le32(pll_index);
 
 	rc = hdev->asic_funcs->send_cpu_message(hdev, (u32 *) &pkt, sizeof(pkt),
 						SET_CLK_PKT_TIMEOUT, &result);
@@ -48,9 +48,10 @@ void hl_set_frequency(struct hl_device *hdev, u32 pll_index, u64 freq)
 
 	memset(&pkt, 0, sizeof(pkt));
 
-	pkt.ctl = ARMCP_PACKET_FREQUENCY_SET << ARMCP_PKT_CTL_OPCODE_SHIFT;
-	pkt.pll_index = pll_index;
-	pkt.value = freq;
+	pkt.ctl = __cpu_to_le32(ARMCP_PACKET_FREQUENCY_SET <<
+					ARMCP_PKT_CTL_OPCODE_SHIFT);
+	pkt.pll_index = __cpu_to_le32(pll_index);
+	pkt.value = __cpu_to_le64(freq);
 
 	rc = hdev->asic_funcs->send_cpu_message(hdev, (u32 *) &pkt, sizeof(pkt),
 					SET_CLK_PKT_TIMEOUT, NULL);
@@ -69,7 +70,8 @@ u64 hl_get_max_power(struct hl_device *hdev)
 
 	memset(&pkt, 0, sizeof(pkt));
 
-	pkt.ctl = ARMCP_PACKET_MAX_POWER_GET << ARMCP_PKT_CTL_OPCODE_SHIFT;
+	pkt.ctl = __cpu_to_le32(ARMCP_PACKET_MAX_POWER_GET <<
+				ARMCP_PKT_CTL_OPCODE_SHIFT);
 
 	rc = hdev->asic_funcs->send_cpu_message(hdev, (u32 *) &pkt, sizeof(pkt),
 						SET_PWR_PKT_TIMEOUT, &result);
@@ -89,8 +91,9 @@ void hl_set_max_power(struct hl_device *hdev, u64 value)
 
 	memset(&pkt, 0, sizeof(pkt));
 
-	pkt.ctl = ARMCP_PACKET_MAX_POWER_SET << ARMCP_PKT_CTL_OPCODE_SHIFT;
-	pkt.value = value;
+	pkt.ctl = __cpu_to_le32(ARMCP_PACKET_MAX_POWER_SET <<
+				ARMCP_PKT_CTL_OPCODE_SHIFT);
+	pkt.value = __cpu_to_le64(value);
 
 	rc = hdev->asic_funcs->send_cpu_message(hdev, (u32 *) &pkt, sizeof(pkt),
 					SET_PWR_PKT_TIMEOUT, NULL);
-- 
2.17.1
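
The irq.c hunk above converts the EQ control word to CPU order before
testing or clearing the ready bit, then converts it back before storing.
A minimal user-space sketch of that read-modify-write pattern follows. It
is not the driver's code: the sketch_* helpers are hypothetical stand-ins
for the kernel's __cpu_to_le32()/__le32_to_cpu(), and the mask and shift
values are assumed for illustration only.

```c
#include <stdint.h>
#include <string.h>

/* Assumed values, for illustration; not taken from the driver headers. */
#define SKETCH_READY_SHIFT	31
#define SKETCH_READY_MASK	0x80000000u

/* Hypothetical stand-in for __cpu_to_le32(): lay the bytes out LSB first. */
static uint32_t sketch_cpu_to_le32(uint32_t v)
{
	uint8_t b[4] = {
		(uint8_t)(v & 0xff),
		(uint8_t)((v >> 8) & 0xff),
		(uint8_t)((v >> 16) & 0xff),
		(uint8_t)((v >> 24) & 0xff),
	};
	uint32_t out;

	memcpy(&out, b, sizeof(out));
	return out;
}

/* Hypothetical stand-in for __le32_to_cpu(): read the bytes LSB first. */
static uint32_t sketch_le32_to_cpu(uint32_t v)
{
	uint8_t b[4];

	memcpy(b, &v, sizeof(b));
	return (uint32_t)b[0] | ((uint32_t)b[1] << 8) |
	       ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
}

/* Test the ready bit of a control word stored in little-endian order. */
static int sketch_entry_ready(uint32_t ctl_le)
{
	return (sketch_le32_to_cpu(ctl_le) & SKETCH_READY_MASK) >>
						SKETCH_READY_SHIFT;
}

/* Clear the ready bit: convert to CPU order, mask, convert back. */
static uint32_t sketch_clear_ready(uint32_t ctl_le)
{
	return sketch_cpu_to_le32(sketch_le32_to_cpu(ctl_le) &
						~SKETCH_READY_MASK);
}
```

On a little-endian host both helpers return their argument unchanged, so
the conversions affect behaviour only on big-endian hosts. Masking the
stored word directly, as the removed line did, would operate on
byte-swapped data there.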



* Re: [PATCH 11/15] habanalabs: print pointer using %p
  2019-02-28  8:46 ` [PATCH 11/15] habanalabs: print pointer using %p Oded Gabbay
@ 2019-02-28  9:31   ` Greg KH
  2019-02-28  9:47     ` Oded Gabbay
  0 siblings, 1 reply; 18+ messages in thread
From: Greg KH @ 2019-02-28  9:31 UTC (permalink / raw)
  To: Oded Gabbay; +Cc: linux-kernel

On Thu, Feb 28, 2019 at 10:46:20AM +0200, Oded Gabbay wrote:
> Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>

I can't take patches without any changelog text :(



* Re: [PATCH 11/15] habanalabs: print pointer using %p
  2019-02-28  9:31   ` Greg KH
@ 2019-02-28  9:47     ` Oded Gabbay
  0 siblings, 0 replies; 18+ messages in thread
From: Oded Gabbay @ 2019-02-28  9:47 UTC (permalink / raw)
  To: Greg KH; +Cc: linux-kernel

On Thu, Feb 28, 2019 at 11:31 AM Greg KH <gregkh@linuxfoundation.org> wrote:
>
> On Thu, Feb 28, 2019 at 10:46:20AM +0200, Oded Gabbay wrote:
> > Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
>
> I can't take patches without any changelog text :(
>
Ah, I didn't know it was required even for this trivial change.
I'll resend this patch and the other one that lacks a commit message.

Oded


Thread overview: 18+ messages
2019-02-28  8:46 [PATCH 00/15] habanalabs fixes for merge window Oded Gabbay
2019-02-28  8:46 ` [PATCH 01/15] habanalabs: Dissociate RAZWI info from event types Oded Gabbay
2019-02-28  8:46 ` [PATCH 02/15] habanalabs: add MMU DRAM default page mapping Oded Gabbay
2019-02-28  8:46 ` [PATCH 03/15] habanalabs: disable CPU access on timeouts Oded Gabbay
2019-02-28  8:46 ` [PATCH 04/15] habanalabs: fix mmu cache registers init Oded Gabbay
2019-02-28  8:46 ` [PATCH 05/15] habanalabs: fix validation of WREG32 to DMA completion Oded Gabbay
2019-02-28  8:46 ` [PATCH 06/15] habanalabs: set DMA0 completion to SOB 1007 Oded Gabbay
2019-02-28  8:46 ` [PATCH 07/15] habanalabs: extend QMAN0 job timeout Oded Gabbay
2019-02-28  8:46 ` [PATCH 08/15] habanalabs: add comments in uapi/misc/habanalabs.h Oded Gabbay
2019-02-28  8:46 ` [PATCH 09/15] habanalabs: return correct error code on MMU mapping failure Oded Gabbay
2019-02-28  8:46 ` [PATCH 10/15] habanalabs: fix memory leak with CBs with unaligned size Oded Gabbay
2019-02-28  8:46 ` [PATCH 11/15] habanalabs: print pointer using %p Oded Gabbay
2019-02-28  9:31   ` Greg KH
2019-02-28  9:47     ` Oded Gabbay
2019-02-28  8:46 ` [PATCH 12/15] habanalabs: soft-reset device if context-switch fails Oded Gabbay
2019-02-28  8:46 ` [PATCH 13/15] habanalabs: fix little-endian<->cpu conversion warnings Oded Gabbay
2019-02-28  8:46 ` [PATCH 14/15] habanalabs: use NULL to initialize array of pointers Oded Gabbay
2019-02-28  8:46 ` [PATCH 15/15] habanalabs: fix little-endian<->cpu conversion warnings Oded Gabbay
