[PATCH 0/6] megaraid_sas: Fix system stall with iommu enabled

* [PATCH 0/6] megaraid_sas: Fix system stall with iommu enabled
@ 2014-01-16 10:25 Hannes Reinecke
  2014-01-16 10:25 ` [PATCH 1/6] megaraid_sas: Do not wait forever Hannes Reinecke
                   ` (5 more replies)
  0 siblings, 6 replies; 15+ messages in thread
From: Hannes Reinecke @ 2014-01-16 10:25 UTC (permalink / raw)
  To: James Bottomley; +Cc: linux-scsi, Hannes Reinecke

Hi all,

Recently I enabled VT-d on one of my systems, which happens to have
one of the latest MegaRAID SAS cards (Fury) installed.
It failed miserably due to invalid DMAR tables.
That would be okay if only the HBA had failed to initialize,
but the entire system stalled because the megaraid_sas driver
went into an endless loop waiting for the init command to
come back, which it never did.

So I went on to debug this, and found several issues along the way.
With this patchset the initialisation routine for megaraid_sas
correctly aborts and allows the system to boot.

Hannes Reinecke (6):
  megaraid_sas: Do not wait forever
  megaraid_sas_fusion: Fixup fire_cmd syntax
  megaraid_sas_fusion: correctly pass queue info pointer
  megaraid_sas: catch errors from megasas_get_map_info()
  megaraid_sas_fusion: Return correct error value in
    megasas_get_ld_map_info()
  megaraid_sas: check return value for megasas_get_pd_list()

 drivers/scsi/megaraid/megaraid_sas_base.c   | 56 +++++++++++++---------
 drivers/scsi/megaraid/megaraid_sas_fusion.c | 74 ++++++++++++++++-------------
 drivers/scsi/megaraid/megaraid_sas_fusion.h |  8 +---
 3 files changed, 75 insertions(+), 63 deletions(-)

-- 
1.7.12.4

* [PATCH 1/6] megaraid_sas: Do not wait forever
  2014-01-16 10:25 [PATCH 0/6] megaraid_sas: Fix system stall with iommu enabled Hannes Reinecke
@ 2014-01-16 10:25 ` Hannes Reinecke
  2014-01-24  7:46   ` Desai, Kashyap
  2014-01-16 10:25 ` [PATCH 2/6] megaraid_sas_fusion: Fixup fire_cmd syntax Hannes Reinecke
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 15+ messages in thread
From: Hannes Reinecke @ 2014-01-16 10:25 UTC (permalink / raw)
  To: James Bottomley; +Cc: linux-scsi, Hannes Reinecke, Kashyap Desai, Adam Radford

If the firmware is incommunicado for whatever reason the driver
will wait forever during initialisation, causing all sorts
of hangcheck timers to trigger.
We should rather wait for a defined time, and give up on the
command if no response was received.

Cc: Kashyap Desai <kashyap.desai@lsi.com>
Cc: Adam Radford <aradford@gmail.com>
Signed-off-by: Hannes Reinecke <hare@suse.de>
---
 drivers/scsi/megaraid/megaraid_sas_base.c | 43 ++++++++++++++++++-------------
 1 file changed, 25 insertions(+), 18 deletions(-)

diff --git a/drivers/scsi/megaraid/megaraid_sas_base.c b/drivers/scsi/megaraid/megaraid_sas_base.c
index 3b7ad10..95d4e5c 100644
--- a/drivers/scsi/megaraid/megaraid_sas_base.c
+++ b/drivers/scsi/megaraid/megaraid_sas_base.c
@@ -911,9 +911,11 @@ megasas_issue_blocked_cmd(struct megasas_instance *instance,
 
 	instance->instancet->issue_dcmd(instance, cmd);
 
-	wait_event(instance->int_cmd_wait_q, cmd->cmd_status != ENODATA);
+	wait_event_timeout(instance->int_cmd_wait_q,
+			   cmd->cmd_status != ENODATA,
+			   MEGASAS_INTERNAL_CMD_WAIT_TIME * HZ);
 
-	return 0;
+	return cmd->cmd_status == ENODATA ? -ENODATA : 0;
 }
 
 /**
@@ -932,11 +934,12 @@ megasas_issue_blocked_abort_cmd(struct megasas_instance *instance,
 {
 	struct megasas_cmd *cmd;
 	struct megasas_abort_frame *abort_fr;
+	int status;
 
 	cmd = megasas_get_cmd(instance);
 
 	if (!cmd)
-		return -1;
+		return -ENOMEM;
 
 	abort_fr = &cmd->frame->abort;
 
@@ -960,11 +963,14 @@ megasas_issue_blocked_abort_cmd(struct megasas_instance *instance,
 	/*
 	 * Wait for this cmd to complete
 	 */
-	wait_event(instance->abort_cmd_wait_q, cmd->cmd_status != 0xFF);
+	wait_event_timeout(instance->abort_cmd_wait_q,
+			   cmd->cmd_status != 0xFF,
+			   MEGASAS_INTERNAL_CMD_WAIT_TIME * HZ);
 	cmd->sync_cmd = 0;
+	status = cmd->cmd_status;
 
 	megasas_return_cmd(instance, cmd);
-	return 0;
+	return status == 0xFF ? -ENODATA : 0;
 }
 
 /**
@@ -3902,6 +3908,7 @@ megasas_get_seq_num(struct megasas_instance *instance,
 	struct megasas_dcmd_frame *dcmd;
 	struct megasas_evt_log_info *el_info;
 	dma_addr_t el_info_h = 0;
+	int rc;
 
 	cmd = megasas_get_cmd(instance);
 
@@ -3933,23 +3940,23 @@ megasas_get_seq_num(struct megasas_instance *instance,
 	dcmd->sgl.sge32[0].phys_addr = cpu_to_le32(el_info_h);
 	dcmd->sgl.sge32[0].length = cpu_to_le32(sizeof(struct megasas_evt_log_info));
 
-	megasas_issue_blocked_cmd(instance, cmd);
-
-	/*
-	 * Copy the data back into callers buffer
-	 */
-	eli->newest_seq_num = le32_to_cpu(el_info->newest_seq_num);
-	eli->oldest_seq_num = le32_to_cpu(el_info->oldest_seq_num);
-	eli->clear_seq_num = le32_to_cpu(el_info->clear_seq_num);
-	eli->shutdown_seq_num = le32_to_cpu(el_info->shutdown_seq_num);
-	eli->boot_seq_num = le32_to_cpu(el_info->boot_seq_num);
-
+	rc = megasas_issue_blocked_cmd(instance, cmd);
+	if (!rc) {
+		/*
+		 * Copy the data back into callers buffer
+		 */
+		eli->newest_seq_num = le32_to_cpu(el_info->newest_seq_num);
+		eli->oldest_seq_num = le32_to_cpu(el_info->oldest_seq_num);
+		eli->clear_seq_num = le32_to_cpu(el_info->clear_seq_num);
+		eli->shutdown_seq_num = le32_to_cpu(el_info->shutdown_seq_num);
+		eli->boot_seq_num = le32_to_cpu(el_info->boot_seq_num);
+	}
 	pci_free_consistent(instance->pdev, sizeof(struct megasas_evt_log_info),
 			    el_info, el_info_h);
 
 	megasas_return_cmd(instance, cmd);
 
-	return 0;
+	return rc;
 }
 
 /**
@@ -5045,7 +5052,7 @@ megasas_mgmt_fw_ioctl(struct megasas_instance *instance,
 	 * cmd to the SCSI mid-layer
 	 */
 	cmd->sync_cmd = 1;
-	megasas_issue_blocked_cmd(instance, cmd);
+	error = megasas_issue_blocked_cmd(instance, cmd);
 	cmd->sync_cmd = 0;
 
 	/*
-- 
1.7.12.4


* [PATCH 2/6] megaraid_sas_fusion: Fixup fire_cmd syntax
  2014-01-16 10:25 [PATCH 0/6] megaraid_sas: Fix system stall with iommu enabled Hannes Reinecke
  2014-01-16 10:25 ` [PATCH 1/6] megaraid_sas: Do not wait forever Hannes Reinecke
@ 2014-01-16 10:25 ` Hannes Reinecke
  2014-01-16 10:25 ` [PATCH 3/6] megaraid_sas_fusion: correctly pass queue info pointer Hannes Reinecke
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 15+ messages in thread
From: Hannes Reinecke @ 2014-01-16 10:25 UTC (permalink / raw)
  To: James Bottomley; +Cc: linux-scsi, Hannes Reinecke, Kashyap Desai, Adam Radford

The 'fire_cmd' callback is not called according to its declared
signature (frame address and frame count), so fix up the fusion code
to be consistent with the original definition.

Cc: Kashyap Desai <kashyap.desai@lsi.com>
Cc: Adam Radford <aradford@gmail.com>
Signed-off-by: Hannes Reinecke <hare@suse.de>
---
 drivers/scsi/megaraid/megaraid_sas_fusion.c | 26 ++++++++++++--------------
 drivers/scsi/megaraid/megaraid_sas_fusion.h |  8 +-------
 2 files changed, 13 insertions(+), 21 deletions(-)

diff --git a/drivers/scsi/megaraid/megaraid_sas_fusion.c b/drivers/scsi/megaraid/megaraid_sas_fusion.c
index f655592..0a588a6 100644
--- a/drivers/scsi/megaraid/megaraid_sas_fusion.c
+++ b/drivers/scsi/megaraid/megaraid_sas_fusion.c
@@ -669,8 +669,8 @@ megasas_ioc_init_fusion(struct megasas_instance *instance)
 			break;
 	}
 
-	instance->instancet->fire_cmd(instance, req_desc->u.low,
-				      req_desc->u.high, instance->reg_set);
+	instance->instancet->fire_cmd(instance, req_desc->Words, 0,
+				      instance->reg_set);
 
 	wait_and_poll(instance, cmd);
 
@@ -1052,16 +1052,17 @@ fail_alloc_mfi_cmds:
  */
 void
 megasas_fire_cmd_fusion(struct megasas_instance *instance,
-			dma_addr_t req_desc_lo,
-			u32 req_desc_hi,
+			dma_addr_t frame_phys_addr,
+			u32 frame_count,
 			struct megasas_register_set __iomem *regs)
 {
 	unsigned long flags;
 
 	spin_lock_irqsave(&instance->hba_lock, flags);
-
-	writel(le32_to_cpu(req_desc_lo), &(regs)->inbound_low_queue_port);
-	writel(le32_to_cpu(req_desc_hi), &(regs)->inbound_high_queue_port);
+	writel(le32_to_cpu(lower_32_bits(frame_phys_addr)),
+	       &(regs)->inbound_low_queue_port);
+	writel(le32_to_cpu(upper_32_bits(frame_phys_addr)),
+	       &(regs)->inbound_high_queue_port);
 	spin_unlock_irqrestore(&instance->hba_lock, flags);
 }
 
@@ -1830,8 +1831,7 @@ megasas_build_and_issue_cmd_fusion(struct megasas_instance *instance,
 	 */
 	atomic_inc(&instance->fw_outstanding);
 
-	instance->instancet->fire_cmd(instance,
-				      req_desc->u.low, req_desc->u.high,
+	instance->instancet->fire_cmd(instance, req_desc->Words, 0,
 				      instance->reg_set);
 
 	return 0;
@@ -2158,8 +2158,8 @@ megasas_issue_dcmd_fusion(struct megasas_instance *instance,
 		printk(KERN_ERR "Couldn't issue MFI pass thru cmd\n");
 		return;
 	}
-	instance->instancet->fire_cmd(instance, req_desc->u.low,
-				      req_desc->u.high, instance->reg_set);
+	instance->instancet->fire_cmd(instance, req_desc->Words, 0,
+				      instance->reg_set);
 }
 
 /**
@@ -2441,9 +2441,7 @@ int megasas_reset_fusion(struct Scsi_Host *shost)
 							instance->instancet->
 							fire_cmd(instance,
 								 req_desc->
-								 u.low,
-								 req_desc->
-								 u.high,
+								 Words, 0,
 								 instance->
 								 reg_set);
 						}
diff --git a/drivers/scsi/megaraid/megaraid_sas_fusion.h b/drivers/scsi/megaraid/megaraid_sas_fusion.h
index 35a5139..321896f 100644
--- a/drivers/scsi/megaraid/megaraid_sas_fusion.h
+++ b/drivers/scsi/megaraid/megaraid_sas_fusion.h
@@ -366,13 +366,7 @@ union MEGASAS_REQUEST_DESCRIPTOR_UNION {
 	struct MPI2_SCSI_TARGET_REQUEST_DESCRIPTOR         SCSITarget;
 	struct MPI2_RAID_ACCEL_REQUEST_DESCRIPTOR          RAIDAccelerator;
 	struct MEGASAS_RAID_MFA_IO_REQUEST_DESCRIPTOR      MFAIo;
-	union {
-		struct {
-			u32 low;
-			u32 high;
-		} u;
-		u64 Words;
-	};
+	u64 Words;
 };
 
 /* Default Reply Descriptor */
-- 
1.7.12.4


* [PATCH 3/6] megaraid_sas_fusion: correctly pass queue info pointer
  2014-01-16 10:25 [PATCH 0/6] megaraid_sas: Fix system stall with iommu enabled Hannes Reinecke
  2014-01-16 10:25 ` [PATCH 1/6] megaraid_sas: Do not wait forever Hannes Reinecke
  2014-01-16 10:25 ` [PATCH 2/6] megaraid_sas_fusion: Fixup fire_cmd syntax Hannes Reinecke
@ 2014-01-16 10:25 ` Hannes Reinecke
  2014-01-24  8:41   ` Desai, Kashyap
  2014-01-16 10:25 ` [PATCH 4/6] megaraid_sas: catch errors from megasas_get_map_info() Hannes Reinecke
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 15+ messages in thread
From: Hannes Reinecke @ 2014-01-16 10:25 UTC (permalink / raw)
  To: James Bottomley; +Cc: linux-scsi, Hannes Reinecke, Kashyap Desai, Adam Radford

The pointer to the queue info structure is potentially
a 64-bit value, so we should be using the correct macros
to set the values in the init frame.
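
The split described above can be sketched in user-space C. This is only
an illustration: lower_32(), upper_32() and struct queue_info_addr are
hypothetical stand-ins for the kernel's lower_32_bits()/upper_32_bits()
macros and the init frame fields, and the cpu_to_le32() byte-order
conversion is left out.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the two 32-bit address fields of the init frame. */
struct queue_info_addr {
	uint32_t lo;
	uint32_t hi;
};

/* User-space equivalents of the kernel's lower_32_bits()/upper_32_bits(). */
static uint32_t lower_32(uint64_t v)
{
	return (uint32_t)(v & 0xffffffffu);
}

static uint32_t upper_32(uint64_t v)
{
	return (uint32_t)(v >> 32);
}

/* Store both halves; a bare 32-bit cast would keep only the low half. */
struct queue_info_addr split_dma_handle(uint64_t handle)
{
	struct queue_info_addr a;

	a.lo = lower_32(handle);
	a.hi = upper_32(handle);
	return a;
}

/* Usage: an address above 4 GiB only round-trips if the high word is kept. */
void split_dma_handle_example(void)
{
	uint64_t handle = 0x00000001fff57000ULL;
	struct queue_info_addr a = split_dma_handle(handle);

	assert((((uint64_t)a.hi << 32) | a.lo) == handle);
	assert((uint64_t)(uint32_t)handle != handle);
}
```

With only the low word stored, any handle above 4 GiB would silently
point the firmware at the wrong address.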

Cc: Kashyap Desai <kashyap.desai@lsi.com>
Cc: Adam Radford <aradford@gmail.com>
Signed-off-by: Hannes Reinecke <hare@suse.de>
---
 drivers/scsi/megaraid/megaraid_sas_fusion.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/scsi/megaraid/megaraid_sas_fusion.c b/drivers/scsi/megaraid/megaraid_sas_fusion.c
index 0a588a6..5c30f9d 100644
--- a/drivers/scsi/megaraid/megaraid_sas_fusion.c
+++ b/drivers/scsi/megaraid/megaraid_sas_fusion.c
@@ -644,7 +644,10 @@ megasas_ioc_init_fusion(struct megasas_instance *instance)
 	/* Convert capability to LE32 */
 	cpu_to_le32s((u32 *)&init_frame->driver_operations.mfi_capabilities);
 
-	init_frame->queue_info_new_phys_addr_lo = cpu_to_le32((u32)ioc_init_handle);
+	init_frame->queue_info_new_phys_addr_hi =
+		cpu_to_le32(upper_32_bits(ioc_init_handle));
+	init_frame->queue_info_new_phys_addr_lo =
+		cpu_to_le32(lower_32_bits(ioc_init_handle));
 	init_frame->data_xfer_len = cpu_to_le32(sizeof(struct MPI2_IOC_INIT_REQUEST));
 
 	req_desc =
-- 
1.7.12.4


* [PATCH 4/6] megaraid_sas: catch errors from megasas_get_map_info()
  2014-01-16 10:25 [PATCH 0/6] megaraid_sas: Fix system stall with iommu enabled Hannes Reinecke
                   ` (2 preceding siblings ...)
  2014-01-16 10:25 ` [PATCH 3/6] megaraid_sas_fusion: correctly pass queue info pointer Hannes Reinecke
@ 2014-01-16 10:25 ` Hannes Reinecke
  2014-01-24  8:35   ` Desai, Kashyap
  2014-01-16 10:25 ` [PATCH 5/6] megaraid_sas_fusion: Return correct error value in megasas_get_ld_map_info() Hannes Reinecke
  2014-01-16 10:25 ` [PATCH 6/6] megaraid_sas: check return value for megasas_get_pd_list() Hannes Reinecke
  5 siblings, 1 reply; 15+ messages in thread
From: Hannes Reinecke @ 2014-01-16 10:25 UTC (permalink / raw)
  To: James Bottomley; +Cc: linux-scsi, Hannes Reinecke, Kashyap Desai, Adam Radford

megasas_get_map_info() might fail, after which it'll be
pointless to call megasas_sync_map_info().
So update megasas_get_map_info() to correctly handle errors
and call megasas_sync_map_info() only if no error occurred.
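
One point worth checking, sketched in user-space C below: the hunk
header in this patch suggests that megasas_get_map_info() is declared
as returning u8. Assuming that declaration stays as it is (the diff
does not touch it), a negative errno is converted to a value in the
range 0..255 on return, so a '< 0' test in the caller can never fire.
All function names in the sketch are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical model: error code squeezed through an 8-bit unsigned type. */
uint8_t get_map_info_u8(int fail)
{
	return fail ? (uint8_t)-ENXIO : 0;
}

/* Same logic with a signed return type. */
int get_map_info_int(int fail)
{
	return fail ? -ENXIO : 0;
}

/* Returns 1 if a caller testing "< 0" would notice the failure. */
int caller_sees_error_u8(int fail)
{
	int rc = get_map_info_u8(fail);	/* promoted to int, never negative */

	return rc < 0;
}

int caller_sees_error_int(int fail)
{
	return get_map_info_int(fail) < 0;
}

/* Usage: only the signed variant lets the caller detect the error. */
void return_type_example(void)
{
	assert(caller_sees_error_u8(1) == 0);
	assert(caller_sees_error_int(1) == 1);
	assert(caller_sees_error_int(0) == 0);
}
```

If the return type really is u8, switching it to int would be needed
for the new error checks to take effect.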

Cc: Kashyap Desai <kashyap.desai@lsi.com>
Cc: Adam Radford <aradford@gmail.com>
Signed-off-by: Hannes Reinecke <hare@suse.de>
---
 drivers/scsi/megaraid/megaraid_sas_base.c   |  5 ++--
 drivers/scsi/megaraid/megaraid_sas_fusion.c | 41 ++++++++++++++++-------------
 2 files changed, 26 insertions(+), 20 deletions(-)

diff --git a/drivers/scsi/megaraid/megaraid_sas_base.c b/drivers/scsi/megaraid/megaraid_sas_base.c
index 95d4e5c..d17a097 100644
--- a/drivers/scsi/megaraid/megaraid_sas_base.c
+++ b/drivers/scsi/megaraid/megaraid_sas_base.c
@@ -4689,8 +4689,9 @@ megasas_resume(struct pci_dev *pdev)
 			megasas_free_cmds_fusion(instance);
 			goto fail_init_mfi;
 		}
-		if (!megasas_get_map_info(instance))
-			megasas_sync_map_info(instance);
+		if (megasas_get_map_info(instance) < 0)
+			goto fail_init_mfi;
+		megasas_sync_map_info(instance);
 	}
 	break;
 	default:
diff --git a/drivers/scsi/megaraid/megaraid_sas_fusion.c b/drivers/scsi/megaraid/megaraid_sas_fusion.c
index 5c30f9d..be6de80 100644
--- a/drivers/scsi/megaraid/megaraid_sas_fusion.c
+++ b/drivers/scsi/megaraid/megaraid_sas_fusion.c
@@ -773,15 +773,16 @@ u8
 megasas_get_map_info(struct megasas_instance *instance)
 {
 	struct fusion_context *fusion = instance->ctrl_context;
+	int rc;
 
 	fusion->fast_path_io = 0;
-	if (!megasas_get_ld_map_info(instance)) {
-		if (MR_ValidateMapInfo(instance)) {
-			fusion->fast_path_io = 1;
-			return 0;
-		}
-	}
-	return 1;
+	rc = megasas_get_ld_map_info(instance);
+	if (rc < 0)
+		return rc;
+
+	if (MR_ValidateMapInfo(instance))
+		fusion->fast_path_io = 1;
+	return 0;
 }
 
 /*
@@ -807,6 +808,14 @@ megasas_sync_map_info(struct megasas_instance *instance)
 	dma_addr_t ci_h = 0;
 	u32 size_map_info;
 
+	fusion = instance->ctrl_context;
+	if (!fusion)
+		return -ENXIO;
+
+	if (!fusion->fast_path_io)
+		return 0;
+
+
 	cmd = megasas_get_cmd(instance);
 
 	if (!cmd) {
@@ -815,13 +824,6 @@ megasas_sync_map_info(struct megasas_instance *instance)
 		return -ENOMEM;
 	}
 
-	fusion = instance->ctrl_context;
-
-	if (!fusion) {
-		megasas_return_cmd(instance, cmd);
-		return 1;
-	}
-
 	map = fusion->ld_map[instance->map_id & 1];
 
 	num_lds = le32_to_cpu(map->raidMap.ldCount);
@@ -1030,8 +1032,10 @@ megasas_init_adapter_fusion(struct megasas_instance *instance)
 		}
 	}
 
-	if (!megasas_get_map_info(instance))
-		megasas_sync_map_info(instance);
+	if (megasas_get_map_info(instance) < 0)
+		goto fail_map_info;
+
+	megasas_sync_map_info(instance);
 
 	return 0;
 
@@ -2457,8 +2461,9 @@ int megasas_reset_fusion(struct Scsi_Host *shost)
 			       sizeof(struct LD_LOAD_BALANCE_INFO)
 			       *MAX_LOGICAL_DRIVES);
 
-			if (!megasas_get_map_info(instance))
-				megasas_sync_map_info(instance);
+			if (megasas_get_map_info(instance) < 0)
+				break;
+			megasas_sync_map_info(instance);
 
 			/* Adapter reset completed successfully */
 			printk(KERN_WARNING "megaraid_sas: Reset "
-- 
1.7.12.4


* [PATCH 5/6] megaraid_sas_fusion: Return correct error value in megasas_get_ld_map_info()
  2014-01-16 10:25 [PATCH 0/6] megaraid_sas: Fix system stall with iommu enabled Hannes Reinecke
                   ` (3 preceding siblings ...)
  2014-01-16 10:25 ` [PATCH 4/6] megaraid_sas: catch errors from megasas_get_map_info() Hannes Reinecke
@ 2014-01-16 10:25 ` Hannes Reinecke
  2014-01-24  8:45   ` Desai, Kashyap
  2014-01-16 10:25 ` [PATCH 6/6] megaraid_sas: check return value for megasas_get_pd_list() Hannes Reinecke
  5 siblings, 1 reply; 15+ messages in thread
From: Hannes Reinecke @ 2014-01-16 10:25 UTC (permalink / raw)
  To: James Bottomley; +Cc: linux-scsi, Hannes Reinecke, Kashyap Desai, Adam Radford

When no HBA is found we should be returning '-ENXIO' to be consistent
with the other return values.

Cc: Kashyap Desai <kashyap.desai@lsi.com>
Cc: Adam Radford <aradford@gmail.com>
Signed-off-by: Hannes Reinecke <hare@suse.de>
---
 drivers/scsi/megaraid/megaraid_sas_fusion.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/scsi/megaraid/megaraid_sas_fusion.c b/drivers/scsi/megaraid/megaraid_sas_fusion.c
index be6de80..c2dc033 100644
--- a/drivers/scsi/megaraid/megaraid_sas_fusion.c
+++ b/drivers/scsi/megaraid/megaraid_sas_fusion.c
@@ -726,7 +726,7 @@ megasas_get_ld_map_info(struct megasas_instance *instance)
 
 	if (!fusion) {
 		megasas_return_cmd(instance, cmd);
-		return 1;
+		return -ENXIO;
 	}
 
 	dcmd = &cmd->frame->dcmd;
-- 
1.7.12.4


* [PATCH 6/6] megaraid_sas: check return value for megasas_get_pd_list()
  2014-01-16 10:25 [PATCH 0/6] megaraid_sas: Fix system stall with iommu enabled Hannes Reinecke
                   ` (4 preceding siblings ...)
  2014-01-16 10:25 ` [PATCH 5/6] megaraid_sas_fusion: Return correct error value in megasas_get_ld_map_info() Hannes Reinecke
@ 2014-01-16 10:25 ` Hannes Reinecke
  2014-01-24  8:38   ` Desai, Kashyap
  5 siblings, 1 reply; 15+ messages in thread
From: Hannes Reinecke @ 2014-01-16 10:25 UTC (permalink / raw)
  To: James Bottomley; +Cc: linux-scsi, Hannes Reinecke, Kashyap Desai, Adam Radford

When megasas_get_pd_list() fails we cannot detect any drives,
so we should be checking the return value accordingly.

Cc: Kashyap Desai <kashyap.desai@lsi.com>
Cc: Adam Radford <aradford@gmail.com>
Signed-off-by: Hannes Reinecke <hare@suse.de>
---
 drivers/scsi/megaraid/megaraid_sas_base.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/scsi/megaraid/megaraid_sas_base.c b/drivers/scsi/megaraid/megaraid_sas_base.c
index d17a097..6b4c4b7 100644
--- a/drivers/scsi/megaraid/megaraid_sas_base.c
+++ b/drivers/scsi/megaraid/megaraid_sas_base.c
@@ -3769,7 +3769,10 @@ static int megasas_init_fw(struct megasas_instance *instance)
 
 	memset(instance->pd_list, 0 ,
 		(MEGASAS_MAX_PD * sizeof(struct megasas_pd_list)));
-	megasas_get_pd_list(instance);
+	if (megasas_get_pd_list(instance) < 0) {
+		printk(KERN_ERR "megasas: failed to get PD list\n");
+		goto fail_init_adapter;
+	}
 
 	memset(instance->ld_ids, 0xff, MEGASAS_MAX_LD_IDS);
 	if (megasas_ld_list_query(instance,
@@ -5600,7 +5603,7 @@ megasas_aen_polling(struct work_struct *work)
 
 	if (doscan) {
 		printk(KERN_INFO "scanning ...\n");
-		megasas_get_pd_list(instance);
+		if (megasas_get_pd_list(instance) == 0) {
 		for (i = 0; i < MEGASAS_MAX_PD_CHANNELS; i++) {
 			for (j = 0; j < MEGASAS_MAX_DEV_PER_CHANNEL; j++) {
 				pd_index = i*MEGASAS_MAX_DEV_PER_CHANNEL + j;
@@ -5620,6 +5623,7 @@ megasas_aen_polling(struct work_struct *work)
 				}
 			}
 		}
+		}
 
 		if (megasas_ld_list_query(instance,
 					  MR_LD_QUERY_TYPE_EXPOSED_TO_HOST))
-- 
1.7.12.4


* RE: [PATCH 1/6] megaraid_sas: Do not wait forever
  2014-01-16 10:25 ` [PATCH 1/6] megaraid_sas: Do not wait forever Hannes Reinecke
@ 2014-01-24  7:46   ` Desai, Kashyap
  2014-01-24  8:24     ` Hannes Reinecke
  0 siblings, 1 reply; 15+ messages in thread
From: Desai, Kashyap @ 2014-01-24  7:46 UTC (permalink / raw)
  To: Hannes Reinecke, James Bottomley; +Cc: linux-scsi, Adam Radford, Saxena, Sumit

Hannes:

We have already worked on the "wait_event" usage in "megasas_issue_blocked_cmd". That code will be posted by LSI once we receive test results from the LSI Q/A team.

If you look at the current OCR code in the Linux driver, we re-send the IOCTL command.
The MR product does not want IOCTLs to time out, for certain reasons. That is why, even if the FW has faulted, the driver will do OCR and re-send all existing <Management commands> (IOCTLs come under management commands).

Just for info, see the snippet below from the OCR code:

/* Re-fire management commands */
                        for (j = 0 ; j < instance->max_fw_cmds; j++) {
                                cmd_fusion = fusion->cmd_list[j];
                                if (cmd_fusion->sync_cmd_idx != (u32)ULONG_MAX) {
                                        cmd_mfi = instance->cmd_list[cmd_fusion->sync_cmd_idx];
                                        if (cmd_mfi->frame->dcmd.opcode == MR_DCMD_LD_MAP_GET_INFO) {
                                                megasas_return_cmd(instance, cmd_mfi);
                                                megasas_return_cmd_fusion(instance, cmd_fusion);



The current <MR> driver is not designed to have a <timeout> in the DCMD and IOCTL path. [ I added a timeout only for a limited set of DCMDs, which are harmless to continue after a timeout ]

For now you can skip this patch; we will be submitting a patch to fix a similar issue.
But note that we cannot add "wait_event_timeout" everywhere due to the day-1 design; we will try to cover wait_event_timeout for some valid cases.

` Kashyap



* Re: [PATCH 1/6] megaraid_sas: Do not wait forever
  2014-01-24  7:46   ` Desai, Kashyap
@ 2014-01-24  8:24     ` Hannes Reinecke
  2014-01-24  8:34       ` Desai, Kashyap
  0 siblings, 1 reply; 15+ messages in thread
From: Hannes Reinecke @ 2014-01-24  8:24 UTC (permalink / raw)
  To: Desai, Kashyap, James Bottomley; +Cc: linux-scsi, Adam Radford, Saxena, Sumit

On 01/24/2014 08:46 AM, Desai, Kashyap wrote:
> Hannes:
> 
> We have already worked on "wait_event" usage in "megasas_issue_blocked_cmd".
> That code will be posted  by LSI once we received test result from
> LSI Q/A team.
> 
> If you see the current OCR code in Linux Driver we do "re-send the IOCTL command".
> MR product does not want IOCTL timeout due to some reason. That is why even if
> FW faulted, Driver will do OCR and re-send all existing
> <Management commands>
> (IOCTL comes under management commands).
> 
> Just for info. (see below snippet in  OCR code)
> 
> /* Re-fire management commands */
>                         for (j = 0 ; j < instance->max_fw_cmds; j++) {
>                                 cmd_fusion = fusion->cmd_list[j];
>                                 if (cmd_fusion->sync_cmd_idx != (u32)ULONG_MAX) {
>                                         cmd_mfi = instance->cmd_list[cmd_fusion->sync_cmd_idx];
>                                         if (cmd_mfi->frame->dcmd.opcode == MR_DCMD_LD_MAP_GET_INFO) {
>                                                 megasas_return_cmd(instance, cmd_mfi);
>                                                 megasas_return_cmd_fusion(instance, cmd_fusion);
> 
> 
> 
> Current <MR> Driver is not designed to add <timeout> for DCMD and IOCTL path.
> [ I added timeout only for limited DCMDs, which are harmless to
> continue after timeout ]
> 
> As of now, you can skip this patch and we will be submitting patch to fix similar issue.
> But note, we cannot add complete "wait_event_timeout" due to day-1 design, but will
> try to cover wait_event_timeout for some valid cases.
> 
Ouch.

The reason I sent this patch is that I've got an Intel box here,
which blocks megaraid_sas initialisation when the IOMMU is turned on:

[   21.867264] megasas: io_request_frames ffff880800f50000
[   21.867363] megasas: init frame 00000000fff57000
[   22.223234] megasas: frame status 00
[   22.223235] megasas: IOC Init cmd success
[   22.223282] megasas: ld map ffff88080b600000
[   22.223289] megasas: issue dcmd 05 opcode 300e101
[   22.244184] dmar: DRHD: handling fault status reg 2
[   22.244186] dmar: DMAR:[DMA Read] Request device [06:00.0] fault addr 6980000
[   22.244186] DMAR:[fault reason 06] PTE Read access is not set
[   22.247223] megasas: frame status 00
[   22.247231] megasas: issue dcmd 05 opcode 300e101
[   22.247231] megasas: INIT adapter done
[   22.247237] megasas: pd list ffff88080cfd0000 size 8192
[   22.247237] megasas: issue dcmd 05 opcode 2010100
[   22.253516] dmar: DRHD: handling fault status reg 102
[   22.253518] dmar: DMAR:[DMA Write] Request device [06:00.0] fault addr e3f0000
[   22.253518] DMAR:[fault reason 05] PTE Write access is not set
[   22.253521] dmar: DMAR:[DMA Write] Request device [06:00.0] fault addr e3f0000
[   22.253521] DMAR:[fault reason 05] PTE Write access is not set
[   22.253523] dmar: DMAR:[DMA Write] Request device [06:00.0] fault addr e3f0000

[ Some more DMAR messages snipped ]

[   22.273199] dmar: DRHD: handling fault status reg 2
[   22.273201] dmar: DMAR:[DMA Read] Request device [06:00.0] fault addr 6cef000
[   22.273201] DMAR:[fault reason 06] PTE Read access is not set

[ .. ]

[   94.222456] megasas: frame status ff
[   94.240946] megasas: failed to get PD list

(I've inserted some debugging messages :-)

This is really weird. The 'write' faults correspond to the
number of (megaraid) commands reserved at the initial step.
(This is a 'Fury' card, btw).
What is more puzzling is that the INIT command and the initial
LD List command go through, but the PD List command gets blocked.

Incidentally, this is not consistent; occasionally even the LD List
command gets blocked, and the DMAR messages occur earlier.

Anyway. Point is, if we cannot time out these initial commands
the megaraid_sas driver will be stuck during initialisation (as the
loop _never_ terminates).
Which in turn means that the modprobe command hangs indefinitely,
and you cannot even unload the module.
The only way to recover here is a reboot.
Nasty.

Hence the patch for the timeout; when this triggers the HBA is
pretty much hosed anyway, so the state of the firmware is pretty
much irrelevant here. But at least you can continue to boot.

(And OCR doesn't work at this point, either. But that's a different
story).
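
The behaviour argued for above can be sketched in user-space C. This is
a simplified model and not the driver code: struct fake_cmd, fake_tick()
and the tick budget are hypothetical and stand in for the command, the
completion interrupt and MEGASAS_INTERNAL_CMD_WAIT_TIME * HZ.

```c
#include <assert.h>
#include <errno.h>

#define STATUS_PENDING 0xFF	/* mirrors the 0xFF "not completed" sentinel */

/* Hypothetical command: firmware completes it after a number of ticks. */
struct fake_cmd {
	int status;
	int ticks_until_done;	/* 0 means the firmware never answers */
};

/* One tick of simulated time; sets status to 0 once the command completes. */
static void fake_tick(struct fake_cmd *cmd)
{
	if (cmd->ticks_until_done > 0 && --cmd->ticks_until_done == 0)
		cmd->status = 0;
}

/* Bounded wait: give up with -ENODATA when the tick budget runs out. */
int issue_blocked_cmd(struct fake_cmd *cmd, int max_ticks)
{
	int i;

	cmd->status = STATUS_PENDING;
	for (i = 0; i < max_ticks && cmd->status == STATUS_PENDING; i++)
		fake_tick(cmd);

	return cmd->status == STATUS_PENDING ? -ENODATA : 0;
}

/* Usage: a responsive command succeeds, a silent one fails but returns. */
void issue_blocked_cmd_example(void)
{
	struct fake_cmd ok = { 0, 3 };
	struct fake_cmd dead = { 0, 0 };

	assert(issue_blocked_cmd(&ok, 10) == 0);
	assert(issue_blocked_cmd(&dead, 10) == -ENODATA);
}
```

The key property is that the caller always gets control back and can
unwind initialisation, whatever state the firmware is in.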

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		      zSeries & Storage
hare@suse.de			      +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)

* RE: [PATCH 1/6] megaraid_sas: Do not wait forever
  2014-01-24  8:24     ` Hannes Reinecke
@ 2014-01-24  8:34       ` Desai, Kashyap
  2014-01-24 10:04         ` Hannes Reinecke
  0 siblings, 1 reply; 15+ messages in thread
From: Desai, Kashyap @ 2014-01-24  8:34 UTC (permalink / raw)
  To: Hannes Reinecke, James Bottomley; +Cc: linux-scsi, Adam Radford, Saxena, Sumit



> -----Original Message-----
> From: Hannes Reinecke [mailto:hare@suse.de]
> Sent: Friday, January 24, 2014 1:54 PM
> To: Desai, Kashyap; James Bottomley
> Cc: linux-scsi@vger.kernel.org; Adam Radford; Saxena, Sumit
> Subject: Re: [PATCH 1/6] megaraid_sas: Do not wait forever
> 
> On 01/24/2014 08:46 AM, Desai, Kashyap wrote:
> > Hannes:
> >
> > We have already worked on "wait_event" usage in
> "megasas_issue_blocked_cmd".
> > That code will be posted  by LSI once we received test result from
> LSI Q/A team.
> >
> > If you see the current OCR code in Linux Driver we do "re-send the IOCTL
> command".
> > MR product does not want IOCTL timeout due to some reason. That is why
> > even if FW faulted, Driver will do OCR and re-send all existing
> <Management commands>
> > (IOCTL comes under management commands).
> >
> > Just for info. (see below snippet in  OCR code)
> >
> > /* Re-fire management commands */
> > for (j = 0 ; j < instance->max_fw_cmds; j++) {
> >         cmd_fusion = fusion->cmd_list[j];
> >         if (cmd_fusion->sync_cmd_idx != (u32)ULONG_MAX) {
> >                 cmd_mfi = instance->cmd_list[cmd_fusion->sync_cmd_idx];
> >                 if (cmd_mfi->frame->dcmd.opcode == MR_DCMD_LD_MAP_GET_INFO) {
> >                         megasas_return_cmd(instance, cmd_mfi);
> >                         megasas_return_cmd_fusion(instance, cmd_fusion);
> >
> >
> >
> > Current <MR> Driver is not designed to add <timeout> for DCMD and IOCTL
> path.
> > [ I added timeout only for limited DCMDs, which are harmless to
> continue after timeout ]
> >
> > As of now, you can skip this patch and we will be submitting patch to fix
> similar issue.
> > But note, we cannot add complete "wait_event_timeout" due to day-1
> > design, but will try to cover wait_event_timeout for some valid cases.
> >
> Ouch.
> 
> The reason I sent this patch is that I've got an Intel box here, which blocks
> megaraid_sas initialisation when the IOMMU is turned on:
> 
> [   21.867264] megasas: io_request_frames ffff880800f50000
> [   21.867363] megasas: init frame 00000000fff57000
> [   22.223234] megasas: frame status 00
> [   22.223235] megasas: IOC Init cmd success
> [   22.223282] megasas: ld map ffff88080b600000
> [   22.223289] megasas: issue dcmd 05 opcode 300e101
> [   22.244184] dmar: DRHD: handling fault status reg 2
> [   22.244186] dmar: DMAR:[DMA Read] Request device [06:00.0] fault
> addr 6980000
> [   22.244186] DMAR:[fault reason 06] PTE Read access is not set
> [   22.247223] megasas: frame status 00
> [   22.247231] megasas: issue dcmd 05 opcode 300e101
> [   22.247231] megasas: INIT adapter done
> [   22.247237] megasas: pd list ffff88080cfd0000 size 8192
> [   22.247237] megasas: issue dcmd 05 opcode 2010100
> [   22.253516] dmar: DRHD: handling fault status reg 102
> [   22.253518] dmar: DMAR:[DMA Write] Request device [06:00.0] fault
> addr e3f0000
> [   22.253518] DMAR:[fault reason 05] PTE Write access is not set
> [   22.253521] dmar: DMAR:[DMA Write] Request device [06:00.0] fault
> addr e3f0000
> [   22.253521] DMAR:[fault reason 05] PTE Write access is not set
> [   22.253523] dmar: DMAR:[DMA Write] Request device [06:00.0] fault
> addr e3f0000
> 
> [ Some more DMAR messages snipped ]
> 
> [   22.273199] dmar: DRHD: handling fault status reg 2
> [   22.273201] dmar: DMAR:[DMA Read] Request device [06:00.0] fault
> addr 6cef000
> [   22.273201] DMAR:[fault reason 06] PTE Read access is not set
> 
> [ .. ]
> 
> [   94.222456] megasas: frame status ff
> [   94.240946] megasas: failed to get PD list
> 
> (I've inserted some debugging messages :-)
> 
> This is really weird. The 'write' faults do correspond with the number of
> (megaraid) commands, reserved at the initial step.
> (This is a 'Fury' card, btw).

The Fury card runs iMR FW, and we have seen issues with iMR FW when the IOMMU is on, but nothing like a driver load failure.
Is your OS driver behind Fury? What RAID type is used on your setup?

Which system are you using?

> What is more puzzling is that the INIT command and the initial LD List
> command go through, but the PD List command gets blocked.
> 
> Incidentally, this is not consistent; occasionally even the LD List command
> gets blocked, and the DMAR messages occur earlier.

The LD command uses megasas_issue_polled, which is already a timeout-based mechanism.
Below is the list of DCMD commands which use an infinite timeout.

megasas_get_seq_num
megasas_flush_cache
megasas_shutdown_controller
megasas_mgmt_fw_ioctl 


We can convert all DCMDs except IOCTL to use a timeout value. In your case "megasas_get_seq_num" might be hanging in FW. It cannot be "megasas_get_ld_list".


> 
> Anyway. Point is, if we cannot time out these initial commands the
> megaraid_sas driver will be stuck during initialisation (as the loop _never_
> terminates).
> Which in turn means that the modprobe command hangs indefinitely, and
> you cannot even unload the module.
> The only way to recover here is a reboot.
> Nasty.
> 
> Hence the patch for the timeout; when this triggers the HBA is pretty much
> hosed anyway, so the state of the firmware is pretty much irrelevant here.
> But at least you can continue to boot.
> 
> (And OCR doesn't work at this point, either. But that's a different story).
> 
> Cheers,
> 
> Hannes
> --
> Dr. Hannes Reinecke		      zSeries & Storage
> hare@suse.de			      +49 911 74053 688
> SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
> GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)



* RE: [PATCH 4/6] megaraid_sas: catch errors from megasas_get_map_info()
  2014-01-16 10:25 ` [PATCH 4/6] megaraid_sas: catch errors from megasas_get_map_info() Hannes Reinecke
@ 2014-01-24  8:35   ` Desai, Kashyap
  0 siblings, 0 replies; 15+ messages in thread
From: Desai, Kashyap @ 2014-01-24  8:35 UTC (permalink / raw)
  To: Hannes Reinecke, James Bottomley; +Cc: linux-scsi, Adam Radford

Hannes,

This patch is not really required, since the MR controller can still operate even if fetching the LD map info fails.
We just disable Fast Path IO in that case, so the driver can still continue without the LD map.

This patch is not required for the megaraid_sas driver.
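
The fallback described here can be sketched in a few lines of userspace C. This is only an illustration under assumptions: `struct fake_ctrl`, `fake_get_map()` and `fake_init()` are hypothetical names, and the real driver tracks this state in `fusion->fast_path_io`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified controller state; not the driver's structure. */
struct fake_ctrl {
	bool map_available;	/* does fetching the LD map succeed? */
	bool fast_path_io;	/* set only when a map was fetched */
};

/* Returns 0 if the map could be fetched and -1 otherwise. */
int fake_get_map(const struct fake_ctrl *ctrl)
{
	assert(ctrl != NULL);
	return ctrl->map_available ? 0 : -1;
}

/*
 * Initialisation does not fail because of the map: a missing map only
 * leaves the fast path switched off, and I/O takes the regular path.
 */
int fake_init(struct fake_ctrl *ctrl)
{
	assert(ctrl != NULL);
	ctrl->fast_path_io = false;
	if (fake_get_map(ctrl) == 0)
		ctrl->fast_path_io = true;
	return 0;
}
```

In this model a map failure costs performance but not availability, which is the behaviour argued for above.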


` Kashyap

> -----Original Message-----
> From: Hannes Reinecke [mailto:hare@suse.de]
> Sent: Thursday, January 16, 2014 3:56 PM
> To: James Bottomley
> Cc: linux-scsi@vger.kernel.org; Hannes Reinecke; Desai, Kashyap; Adam
> Radford
> Subject: [PATCH 4/6] megaraid_sas: catch errors from
> megasas_get_map_info()
> 
> megasas_get_map_info() might fail, after which it'll be pointless to call
> megasas_sync_map_info().
> So update megasas_get_map_info() to correctly handle errors and call
> megasas_sync_map_info() only if no error occurred.
> 
> Cc: Kashyap Desai <kashyap.desai@lsi.com>
> Cc: Adam Radford <aradford@gmail.com>
> Signed-off-by: Hannes Reinecke <hare@suse.de>
> ---
>  drivers/scsi/megaraid/megaraid_sas_base.c   |  5 ++--
>  drivers/scsi/megaraid/megaraid_sas_fusion.c | 41 ++++++++++++++++------
> -------
>  2 files changed, 26 insertions(+), 20 deletions(-)
> 
> diff --git a/drivers/scsi/megaraid/megaraid_sas_base.c
> b/drivers/scsi/megaraid/megaraid_sas_base.c
> index 95d4e5c..d17a097 100644
> --- a/drivers/scsi/megaraid/megaraid_sas_base.c
> +++ b/drivers/scsi/megaraid/megaraid_sas_base.c
> @@ -4689,8 +4689,9 @@ megasas_resume(struct pci_dev *pdev)
>  			megasas_free_cmds_fusion(instance);
>  			goto fail_init_mfi;
>  		}
> -		if (!megasas_get_map_info(instance))
> -			megasas_sync_map_info(instance);
> +		if (megasas_get_map_info(instance) < 0)
> +			goto fail_init_mfi;
> +		megasas_sync_map_info(instance);
>  	}
>  	break;
>  	default:
> diff --git a/drivers/scsi/megaraid/megaraid_sas_fusion.c
> b/drivers/scsi/megaraid/megaraid_sas_fusion.c
> index 5c30f9d..be6de80 100644
> --- a/drivers/scsi/megaraid/megaraid_sas_fusion.c
> +++ b/drivers/scsi/megaraid/megaraid_sas_fusion.c
> @@ -773,15 +773,16 @@ u8
>  megasas_get_map_info(struct megasas_instance *instance)
>  {
>  	struct fusion_context *fusion = instance->ctrl_context;
> +	int rc;
> 
>  	fusion->fast_path_io = 0;
> -	if (!megasas_get_ld_map_info(instance)) {
> -		if (MR_ValidateMapInfo(instance)) {
> -			fusion->fast_path_io = 1;
> -			return 0;
> -		}
> -	}
> -	return 1;
> +	rc = megasas_get_ld_map_info(instance);
> +	if (rc < 0)
> +		return rc;
> +
> +	if (MR_ValidateMapInfo(instance))
> +		fusion->fast_path_io = 1;
> +	return 0;
>  }
> 
>  /*
> @@ -807,6 +808,14 @@ megasas_sync_map_info(struct megasas_instance
> *instance)
>  	dma_addr_t ci_h = 0;
>  	u32 size_map_info;
> 
> +	fusion = instance->ctrl_context;
> +	if (!fusion)
> +		return -ENXIO;
> +
> +	if (!fusion->fast_path_io)
> +		return 0;
> +
> +
>  	cmd = megasas_get_cmd(instance);
> 
>  	if (!cmd) {
> @@ -815,13 +824,6 @@ megasas_sync_map_info(struct megasas_instance
> *instance)
>  		return -ENOMEM;
>  	}
> 
> -	fusion = instance->ctrl_context;
> -
> -	if (!fusion) {
> -		megasas_return_cmd(instance, cmd);
> -		return 1;
> -	}
> -
>  	map = fusion->ld_map[instance->map_id & 1];
> 
>  	num_lds = le32_to_cpu(map->raidMap.ldCount);
> @@ -1030,8 +1032,10 @@ megasas_init_adapter_fusion(struct
> megasas_instance *instance)
>  		}
>  	}
> 
> -	if (!megasas_get_map_info(instance))
> -		megasas_sync_map_info(instance);
> +	if (megasas_get_map_info(instance) < 0)
> +		goto fail_map_info;
> +
> +	megasas_sync_map_info(instance);
> 
>  	return 0;
> 
> @@ -2457,8 +2461,9 @@ int megasas_reset_fusion(struct Scsi_Host *shost)
>  			       sizeof(struct LD_LOAD_BALANCE_INFO)
>  			       *MAX_LOGICAL_DRIVES);
> 
> -			if (!megasas_get_map_info(instance))
> -				megasas_sync_map_info(instance);
> +			if (megasas_get_map_info(instance) < 0)
> +				break;
> +			megasas_sync_map_info(instance);
> 
>  			/* Adapter reset completed successfully */
>  			printk(KERN_WARNING "megaraid_sas: Reset "
> --
> 1.7.12.4
> 



* RE: [PATCH 6/6] megaraid_sas: check return value for megasas_get_pd_list()
  2014-01-16 10:25 ` [PATCH 6/6] megaraid_sas: check return value for megasas_get_pd_list() Hannes Reinecke
@ 2014-01-24  8:38   ` Desai, Kashyap
  0 siblings, 0 replies; 15+ messages in thread
From: Desai, Kashyap @ 2014-01-24  8:38 UTC (permalink / raw)
  To: Hannes Reinecke, James Bottomley; +Cc: linux-scsi, Adam Radford, Saxena, Sumit

ACK this patch.
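
The patch quoted below makes megasas_init_fw() bail out through its failure label when the PD list cannot be fetched. The general pattern, checking the return value and unwinding with goto, looks roughly like the following userspace sketch. All names (`struct fake_hba`, `fake_get_pd_list()`, `fake_init_fw()`) are hypothetical, and the `-ENODEV` error code is an arbitrary choice for the example.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for the adapter instance. */
struct fake_hba {
	bool pd_list_fails;	/* simulate a firmware that never answers */
	int *pd_list;		/* buffer that must be released on failure */
	bool online;
};

/* Returns 0 on success and a negative errno value on failure. */
int fake_get_pd_list(struct fake_hba *hba)
{
	assert(hba != NULL);
	return hba->pd_list_fails ? -ENODEV : 0;
}

int fake_init_fw(struct fake_hba *hba)
{
	int rc;

	assert(hba != NULL);
	hba->online = false;
	hba->pd_list = calloc(16, sizeof(*hba->pd_list));
	if (!hba->pd_list)
		return -ENOMEM;

	rc = fake_get_pd_list(hba);
	if (rc < 0)
		goto fail_pd_list;	/* no drives can be detected: give up */

	hba->online = true;
	return 0;

fail_pd_list:
	free(hba->pd_list);
	hba->pd_list = NULL;
	return rc;
}
```

On failure the caller gets the error code back and the buffer has already been released, so nothing is left half-initialised.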

> -----Original Message-----
> From: Hannes Reinecke [mailto:hare@suse.de]
> Sent: Thursday, January 16, 2014 3:56 PM
> To: James Bottomley
> Cc: linux-scsi@vger.kernel.org; Hannes Reinecke; Desai, Kashyap; Adam
> Radford
> Subject: [PATCH 6/6] megaraid_sas: check return value for
> megasas_get_pd_list()
> 
> When megasas_get_pd_list() fails we cannot detect any drives, so we should
> be checking the return value accordingly.
> 
> Cc: Kashyap Desai <kashyap.desai@lsi.com>
> Cc: Adam Radford <aradford@gmail.com>
> Signed-off-by: Hannes Reinecke <hare@suse.de>
> ---
>  drivers/scsi/megaraid/megaraid_sas_base.c | 8 ++++++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/scsi/megaraid/megaraid_sas_base.c
> b/drivers/scsi/megaraid/megaraid_sas_base.c
> index d17a097..6b4c4b7 100644
> --- a/drivers/scsi/megaraid/megaraid_sas_base.c
> +++ b/drivers/scsi/megaraid/megaraid_sas_base.c
> @@ -3769,7 +3769,10 @@ static int megasas_init_fw(struct
> megasas_instance *instance)
> 
>  	memset(instance->pd_list, 0 ,
>  		(MEGASAS_MAX_PD * sizeof(struct megasas_pd_list)));
> -	megasas_get_pd_list(instance);
> +	if (megasas_get_pd_list(instance) < 0) {
> +		printk(KERN_ERR "megasas: failed to get PD list\n");
> +		goto fail_init_adapter;
> +	}
> 
>  	memset(instance->ld_ids, 0xff, MEGASAS_MAX_LD_IDS);
>  	if (megasas_ld_list_query(instance,
> @@ -5600,7 +5603,7 @@ megasas_aen_polling(struct work_struct *work)
> 
>  	if (doscan) {
>  		printk(KERN_INFO "scanning ...\n");
> -		megasas_get_pd_list(instance);
> +		if (megasas_get_pd_list(instance) == 0) {
>  		for (i = 0; i < MEGASAS_MAX_PD_CHANNELS; i++) {
>  			for (j = 0; j < MEGASAS_MAX_DEV_PER_CHANNEL; j++) {
>  				pd_index =
>  				i*MEGASAS_MAX_DEV_PER_CHANNEL + j;
> @@ -5620,6 +5623,7 @@ megasas_aen_polling(struct work_struct *work)
>  				}
>  			}
>  		}
> +		}
> 
>  		if (megasas_ld_list_query(instance,
> 
> MR_LD_QUERY_TYPE_EXPOSED_TO_HOST))
> --
> 1.7.12.4
> 



* RE: [PATCH 3/6] megaraid_sas_fusion: correctly pass queue info pointer
  2014-01-16 10:25 ` [PATCH 3/6] megaraid_sas_fusion: correctly pass queue info pointer Hannes Reinecke
@ 2014-01-24  8:41   ` Desai, Kashyap
  0 siblings, 0 replies; 15+ messages in thread
From: Desai, Kashyap @ 2014-01-24  8:41 UTC (permalink / raw)
  To: Hannes Reinecke, James Bottomley; +Cc: linux-scsi, Adam Radford

ACK this patch.

` Kashyap
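
For reference, the patch quoted below stores both halves of a possibly 64-bit address instead of only the low 32 bits. A minimal userspace sketch of that split follows. `upper32()`, `lower32()`, `struct fake_init_frame` and `fake_set_queue_addr()` are hypothetical helpers modelled on the kernel's upper_32_bits()/lower_32_bits(); the little-endian conversion done by cpu_to_le32() in the patch is deliberately left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace equivalents of the kernel's upper_32_bits()/lower_32_bits(). */
uint32_t upper32(uint64_t v)
{
	return (uint32_t)(v >> 32);
}

uint32_t lower32(uint64_t v)
{
	return (uint32_t)(v & 0xffffffffu);
}

/* Hypothetical frame with a split 64-bit address field. */
struct fake_init_frame {
	uint32_t addr_lo;
	uint32_t addr_hi;
};

/* Store both halves; a plain (uint32_t) cast would drop the upper one. */
void fake_set_queue_addr(struct fake_init_frame *frame, uint64_t dma_addr)
{
	assert(frame != NULL);
	frame->addr_hi = upper32(dma_addr);
	frame->addr_lo = lower32(dma_addr);
}
```

For an address such as 0x0000000812345678 the high word is 0x8 and the low word is 0x12345678; writing only the low word would hand the firmware a different address.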

> -----Original Message-----
> From: Hannes Reinecke [mailto:hare@suse.de]
> Sent: Thursday, January 16, 2014 3:56 PM
> To: James Bottomley
> Cc: linux-scsi@vger.kernel.org; Hannes Reinecke; Desai, Kashyap; Adam
> Radford
> Subject: [PATCH 3/6] megaraid_sas_fusion: correctly pass queue info pointer
> 
> The pointer to the queue info structure is potentially a 64-bit value, so we
> should be using the correct macros to set the values in the init frame.
> 
> Cc: Kashyap Desai <kashyap.desai@lsi.com>
> Cc: Adam Radford <aradford@gmail.com>
> Signed-off-by: Hannes Reinecke <hare@suse.de>
> ---
>  drivers/scsi/megaraid/megaraid_sas_fusion.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/scsi/megaraid/megaraid_sas_fusion.c
> b/drivers/scsi/megaraid/megaraid_sas_fusion.c
> index 0a588a6..5c30f9d 100644
> --- a/drivers/scsi/megaraid/megaraid_sas_fusion.c
> +++ b/drivers/scsi/megaraid/megaraid_sas_fusion.c
> @@ -644,7 +644,10 @@ megasas_ioc_init_fusion(struct megasas_instance
> *instance)
>  	/* Convert capability to LE32 */
>  	cpu_to_le32s((u32 *)&init_frame-
> >driver_operations.mfi_capabilities);
> 
> -	init_frame->queue_info_new_phys_addr_lo =
> cpu_to_le32((u32)ioc_init_handle);
> +	init_frame->queue_info_new_phys_addr_hi =
> +		cpu_to_le32(upper_32_bits(ioc_init_handle));
> +	init_frame->queue_info_new_phys_addr_lo =
> +		cpu_to_le32(lower_32_bits(ioc_init_handle));
>  	init_frame->data_xfer_len = cpu_to_le32(sizeof(struct
> MPI2_IOC_INIT_REQUEST));
> 
>  	req_desc =
> --
> 1.7.12.4
> 



* RE: [PATCH 5/6] megaraid_sas_fusion: Return correct error value in megasas_get_ld_map_info()
  2014-01-16 10:25 ` [PATCH 5/6] megaraid_sas_fusion: Return correct error value in megasas_get_ld_map_info() Hannes Reinecke
@ 2014-01-24  8:45   ` Desai, Kashyap
  0 siblings, 0 replies; 15+ messages in thread
From: Desai, Kashyap @ 2014-01-24  8:45 UTC (permalink / raw)
  To: Hannes Reinecke, James Bottomley; +Cc: linux-scsi, Adam Radford

ACK this patch.

` Kashyap
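
Why the sign of the return value matters can be shown with a small userspace sketch. It assumes a caller that treats only negative values as errors, as the callers in patch 4/6 of this series do; all function and structure names below are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the adapter instance. */
struct fake_instance {
	void *ctrl_context;	/* NULL models a missing HBA context */
};

/* Old convention: failure reported as a positive 1. */
int fake_get_ld_map_old(const struct fake_instance *inst)
{
	assert(inst != NULL);
	return inst->ctrl_context ? 0 : 1;
}

/* New convention: failure reported as a negative errno value. */
int fake_get_ld_map_new(const struct fake_instance *inst)
{
	assert(inst != NULL);
	return inst->ctrl_context ? 0 : -ENXIO;
}

/* A caller that, like those in patch 4/6, only treats rc < 0 as an error. */
int fake_caller_detects_failure(int rc)
{
	return rc < 0;
}
```

With the old convention the failure slips past an `rc < 0` check; with `-ENXIO` the same caller notices it.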

> -----Original Message-----
> From: Hannes Reinecke [mailto:hare@suse.de]
> Sent: Thursday, January 16, 2014 3:56 PM
> To: James Bottomley
> Cc: linux-scsi@vger.kernel.org; Hannes Reinecke; Desai, Kashyap; Adam
> Radford
> Subject: [PATCH 5/6] megaraid_sas_fusion: Return correct error value in
> megasas_get_ld_map_info()
> 
> When no HBA is found we should be returning '-ENXIO' to be consistent with
> the other return values.
> 
> Cc: Kashyap Desai <kashyap.desai@lsi.com>
> Cc: Adam Radford <aradford@gmail.com>
> Signed-off-by: Hannes Reinecke <hare@suse.de>
> ---
>  drivers/scsi/megaraid/megaraid_sas_fusion.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/scsi/megaraid/megaraid_sas_fusion.c
> b/drivers/scsi/megaraid/megaraid_sas_fusion.c
> index be6de80..c2dc033 100644
> --- a/drivers/scsi/megaraid/megaraid_sas_fusion.c
> +++ b/drivers/scsi/megaraid/megaraid_sas_fusion.c
> @@ -726,7 +726,7 @@ megasas_get_ld_map_info(struct megasas_instance
> *instance)
> 
>  	if (!fusion) {
>  		megasas_return_cmd(instance, cmd);
> -		return 1;
> +		return -ENXIO;
>  	}
> 
>  	dcmd = &cmd->frame->dcmd;
> --
> 1.7.12.4
> 



* Re: [PATCH 1/6] megaraid_sas: Do not wait forever
  2014-01-24  8:34       ` Desai, Kashyap
@ 2014-01-24 10:04         ` Hannes Reinecke
  0 siblings, 0 replies; 15+ messages in thread
From: Hannes Reinecke @ 2014-01-24 10:04 UTC (permalink / raw)
  To: Desai, Kashyap, James Bottomley; +Cc: linux-scsi, Adam Radford, Saxena, Sumit

On 01/24/2014 09:34 AM, Desai, Kashyap wrote:
> 
> 
>> -----Original Message-----
>> From: Hannes Reinecke [mailto:hare@suse.de]
>> Sent: Friday, January 24, 2014 1:54 PM
>> To: Desai, Kashyap; James Bottomley
>> Cc: linux-scsi@vger.kernel.org; Adam Radford; Saxena, Sumit
>> Subject: Re: [PATCH 1/6] megaraid_sas: Do not wait forever
>>
>> On 01/24/2014 08:46 AM, Desai, Kashyap wrote:
>>> Hannes:
>>>
>>> We have already worked on "wait_event" usage in
>> "megasas_issue_blocked_cmd".
>>> That code will be posted  by LSI once we received test result from
>> LSI Q/A team.
>>>
>>> If you see the current OCR code in Linux Driver we do "re-send the IOCTL
>> command".
>>> MR product does not want IOCTL timeout due to some reason. That is why
>>> even if FW faulted, Driver will do OCR and re-send all existing
>> <Management commands>
>>> (IOCTL comes under management commands).
>>>
>>> Just for info. (see below snippet in  OCR code)
>>>
>>> /* Re-fire management commands */
>>> for (j = 0 ; j < instance->max_fw_cmds; j++) {
>>>         cmd_fusion = fusion->cmd_list[j];
>>>         if (cmd_fusion->sync_cmd_idx != (u32)ULONG_MAX) {
>>>                 cmd_mfi = instance->cmd_list[cmd_fusion->sync_cmd_idx];
>>>                 if (cmd_mfi->frame->dcmd.opcode == MR_DCMD_LD_MAP_GET_INFO) {
>>>                         megasas_return_cmd(instance, cmd_mfi);
>>>                         megasas_return_cmd_fusion(instance, cmd_fusion);
>>>
>>>
>>>
>>> Current <MR> Driver is not designed to add <timeout> for DCMD and IOCTL
>> path.
>>> [ I added timeout only for limited DCMDs, which are harmless to
>> continue after timeout ]
>>>
>>> As of now, you can skip this patch and we will be submitting patch to fix
>> similar issue.
>>> But note, we cannot add complete "wait_event_timeout" due to day-1
>>> design, but will try to cover wait_event_timeout for some valid cases.
>>>
>> Ouch.
>>
>> The reason I sent this patch is that I've got an Intel box here, which blocks
>> megaraid_sas initialisation when the IOMMU is turned on:
>>
>> [   21.867264] megasas: io_request_frames ffff880800f50000
>> [   21.867363] megasas: init frame 00000000fff57000
>> [   22.223234] megasas: frame status 00
>> [   22.223235] megasas: IOC Init cmd success
>> [   22.223282] megasas: ld map ffff88080b600000
>> [   22.223289] megasas: issue dcmd 05 opcode 300e101
>> [   22.244184] dmar: DRHD: handling fault status reg 2
>> [   22.244186] dmar: DMAR:[DMA Read] Request device [06:00.0] fault
>> addr 6980000
>> [   22.244186] DMAR:[fault reason 06] PTE Read access is not set
>> [   22.247223] megasas: frame status 00
>> [   22.247231] megasas: issue dcmd 05 opcode 300e101
>> [   22.247231] megasas: INIT adapter done
>> [   22.247237] megasas: pd list ffff88080cfd0000 size 8192
>> [   22.247237] megasas: issue dcmd 05 opcode 2010100
>> [   22.253516] dmar: DRHD: handling fault status reg 102
>> [   22.253518] dmar: DMAR:[DMA Write] Request device [06:00.0] fault
>> addr e3f0000
>> [   22.253518] DMAR:[fault reason 05] PTE Write access is not set
>> [   22.253521] dmar: DMAR:[DMA Write] Request device [06:00.0] fault
>> addr e3f0000
>> [   22.253521] DMAR:[fault reason 05] PTE Write access is not set
>> [   22.253523] dmar: DMAR:[DMA Write] Request device [06:00.0] fault
>> addr e3f0000
>>
>> [ Some more DMAR messages snipped ]
>>
>> [   22.273199] dmar: DRHD: handling fault status reg 2
>> [   22.273201] dmar: DMAR:[DMA Read] Request device [06:00.0] fault
>> addr 6cef000
>> [   22.273201] DMAR:[fault reason 06] PTE Read access is not set
>>
>> [ .. ]
>>
>> [   94.222456] megasas: frame status ff
>> [   94.240946] megasas: failed to get PD list
>>
>> (I've inserted some debugging messages :-)
>>
>> This is really weird. The 'write' faults do correspond with the number of
>> (megaraid) commands, reserved at the initial step.
>> (This is a 'Fury' card, btw).
> 
> The Fury card runs iMR FW, and we have seen issues with iMR FW when the IOMMU is on, but nothing like a driver load failure.
> Is your OS driver behind Fury? What RAID type is used on your setup?
> 
It's SLES12 (alpha), basically plain 3.13

> Which system are you using?
> 
Pre-production hardware, admittedly. So it _might_ be a BIOS issue.

>> What is more puzzling is that the INIT command and the initial LD List
>> command go through, but the PD List command gets blocked.
>>
>> Incidentally, this is not consistent; occasionally even the LD List command
>> gets blocked, and the DMAR messages occur earlier.
> 
> The LD command uses megasas_issue_polled, which is already a timeout-based mechanism.
> Below is the list of DCMD commands which use an infinite timeout.
> 
> megasas_get_seq_num
> megasas_flush_cache
> megasas_shutdown_controller
> megasas_mgmt_fw_ioctl 
> 
> 
> We can convert all DCMDs except IOCTL to use a timeout value. In your case "megasas_get_seq_num"
> might be hanging in FW. It cannot be "megasas_get_ld_list".
> 
Ahh. Okay, will try to modify megasas_get_seq_num() and see if that
works.

Cheers,

Hannes
P.S.: I've also sent an earlier patch named 'megaraid_sas: disable
controller reset for PPC' to linux-scsi. Care to review it, too?
Thanks.
-- 
Dr. Hannes Reinecke		      zSeries & Storage
hare@suse.de			      +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)


end of thread, other threads:[~2014-01-24 10:04 UTC | newest]

Thread overview: 15+ messages
2014-01-16 10:25 [PATCH 0/6] megaraid_sas: Fix system stall with iommu enabled Hannes Reinecke
2014-01-16 10:25 ` [PATCH 1/6] megaraid_sas: Do not wait forever Hannes Reinecke
2014-01-24  7:46   ` Desai, Kashyap
2014-01-24  8:24     ` Hannes Reinecke
2014-01-24  8:34       ` Desai, Kashyap
2014-01-24 10:04         ` Hannes Reinecke
2014-01-16 10:25 ` [PATCH 2/6] megaraid_sas_fusion: Fixup fire_cmd syntax Hannes Reinecke
2014-01-16 10:25 ` [PATCH 3/6] megaraid_sas_fusion: correctly pass queue info pointer Hannes Reinecke
2014-01-24  8:41   ` Desai, Kashyap
2014-01-16 10:25 ` [PATCH 4/6] megaraid_sas: catch errors from megasas_get_map_info() Hannes Reinecke
2014-01-24  8:35   ` Desai, Kashyap
2014-01-16 10:25 ` [PATCH 5/6] megaraid_sas_fusion: Return correct error value in megasas_get_ld_map_info() Hannes Reinecke
2014-01-24  8:45   ` Desai, Kashyap
2014-01-16 10:25 ` [PATCH 6/6] megaraid_sas: check return value for megasas_get_pd_list() Hannes Reinecke
2014-01-24  8:38   ` Desai, Kashyap
