* [PATCH V4 0/4] scsi: ufs: Improve UFS error handling
@ 2013-07-23  7:08 Sujit Reddy Thumma
From: Sujit Reddy Thumma @ 2013-07-23  7:08 UTC (permalink / raw)
  To: Vinayak Holikatti, Santosh Y
  Cc: James E.J. Bottomley, linux-scsi, Sujit Reddy Thumma, linux-arm-msm

The first patch fixes several issues with the current task management
handling in the UFSHCD driver. The others improve error handling in
various scenarios.

These patches depend on:
[PATCH V4 1/2] scsi: ufs: Add support for sending NOP OUT UPIU
[PATCH V4 2/2] scsi: ufs: Set fDeviceInit flag to initiate device initialization
[PATCH V4 1/2] scsi: ufs: Add support for host assisted background operations
[PATCH V4 2/2] scsi: ufs: Add runtime PM support for UFS host controller driver

Changes from v3:
	- Rebased.
Changes from v2:
	- [PATCH V3 1/4]: Make the task management command task tag unique
	  across SCSI/NOP/QUERY request tags.
	- [PATCH V3 3/4]: While handling device/host reset, wait for
	  pending fatal handler to return if running.
Changes from v1:
	- [PATCH V2 1/4]: Fix a race condition because of overloading
	  outstanding_tasks variable to lock the slots. A new variable
	  tm_slots_in_use will track which slots are in use by the driver.
	- [PATCH V2 2/4]: Commit text update to clarify the hardware race
	  with more details.
	- [PATCH V2 3/4]: Minor cleanup and rebase
	- [PATCH V2 4/4]: Fix a bug - sleeping in atomic context


Sujit Reddy Thumma (4):
  scsi: ufs: Fix broken task management command implementation
  scsi: ufs: Fix hardware race conditions while aborting a command
  scsi: ufs: Fix device and host reset methods
  scsi: ufs: Improve UFS fatal error handling

 drivers/scsi/ufs/ufshcd.c | 1003 ++++++++++++++++++++++++++++++++++++---------
 drivers/scsi/ufs/ufshcd.h |   12 +-
 drivers/scsi/ufs/ufshci.h |   19 +-
 3 files changed, 841 insertions(+), 193 deletions(-)

-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation.



* [PATCH V4 1/4] scsi: ufs: Fix broken task management command implementation
@ 2013-07-23  7:08 ` Sujit Reddy Thumma
From: Sujit Reddy Thumma @ 2013-07-23  7:08 UTC (permalink / raw)
  To: Vinayak Holikatti, Santosh Y
  Cc: James E.J. Bottomley, linux-scsi, Sujit Reddy Thumma, linux-arm-msm

Currently, sending a Task Management (TM) command to the card can
fail in the scenarios listed below:

Problem: If there are more than 8 TM commands, the implementation
         returns an error to the caller.
Fix:     Wait for one of the slots to be emptied and send the command.

Problem: Sometimes it is necessary for the caller to know the TM service
         response code to determine the task status.
Fix:     Propagate the service response to the caller.

Problem: If the TM command times out, no proper error recovery is
         implemented.
Fix:     Clear the command in the controller door-bell register, so that
         further commands for the same slot don't fail.

Problem: While preparing the TM command descriptor, the task tag used
         should be unique across SCSI/NOP/QUERY/TM commands and not the
         task tag of the command which the TM command is trying to manage.
Fix:     Use a unique task tag instead of the task tag of the SCSI command.

Problem: Since the TM command involves H/W communication, abruptly ending
         the request on a kill or interrupt signal might cause h/w
         malfunction.
Fix:     Wait for the hardware completion interrupt with
         TASK_UNINTERRUPTIBLE set.
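
For illustration only, the slot-locking and unique-tag ideas above can be
sketched in user space roughly as follows. This is not the driver code: it
swaps the kernel bit operations for C11 atomics, and the names
NUM_TM_SLOTS, get_tm_free_slot(), put_tm_slot() and tm_task_tag() are
hypothetical stand-ins chosen for this sketch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define NUM_TM_SLOTS 8 /* assumption: the 8 TM slots mentioned above */

/* Hypothetical stand-in for a "slots in use" bitmap: one bit per slot. */
static atomic_ulong tm_slots_in_use;

/* Try to claim a free slot; returns true and stores its index in *slot. */
static bool get_tm_free_slot(int *slot)
{
	unsigned long cur = atomic_load(&tm_slots_in_use);

	for (;;) {
		int tag = 0;

		/* find the first zero bit in the current snapshot */
		while (tag < NUM_TM_SLOTS && (cur & (1UL << tag)))
			tag++;
		if (tag >= NUM_TM_SLOTS)
			return false; /* all slots busy, caller has to wait */

		/* claim the bit; on failure cur is refreshed and we retry */
		if (atomic_compare_exchange_weak(&tm_slots_in_use, &cur,
						 cur | (1UL << tag))) {
			*slot = tag;
			return true;
		}
	}
}

/* Release a slot so that a waiter can claim it. */
static void put_tm_slot(int slot)
{
	assert(slot >= 0 && slot < NUM_TM_SLOTS);
	atomic_fetch_and(&tm_slots_in_use, ~(1UL << slot));
}

/* Offset the TM tag past the transfer request tags to keep it unique. */
static int tm_task_tag(int nutrs, int slot)
{
	return nutrs + slot;
}
```

A caller would loop (or sleep) until get_tm_free_slot() succeeds, use
tm_task_tag() for the request header, and call put_tm_slot() once the
command has completed or timed out.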

Signed-off-by: Sujit Reddy Thumma <sthumma@codeaurora.org>
---
 drivers/scsi/ufs/ufshcd.c |  174 ++++++++++++++++++++++++++++++---------------
 drivers/scsi/ufs/ufshcd.h |    8 ++-
 2 files changed, 123 insertions(+), 59 deletions(-)

diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
index 1f2caa0..1d7e027 100644
--- a/drivers/scsi/ufs/ufshcd.c
+++ b/drivers/scsi/ufs/ufshcd.c
@@ -53,6 +53,9 @@
 /* Query request timeout */
 #define QUERY_REQ_TIMEOUT 30 /* msec */
 
+/* Task management command timeout */
+#define TM_CMD_TIMEOUT	100 /* msecs */
+
 /* Expose the flag value from utp_upiu_query.value */
 #define MASK_QUERY_UPIU_FLAG_LOC 0xFF
 
@@ -183,13 +186,35 @@ ufshcd_get_tmr_ocs(struct utp_task_req_desc *task_req_descp)
 /**
  * ufshcd_get_tm_free_slot - get a free slot for task management request
  * @hba: per adapter instance
+ * @free_slot: pointer to variable with available slot value
  *
- * Returns maximum number of task management request slots in case of
- * task management queue full or returns the free slot number
+ * Get a free tag and lock it until ufshcd_put_tm_slot() is called.
+ * Returns 0 if free slot is not available, else return 1 with tag value
+ * in @free_slot.
  */
-static inline int ufshcd_get_tm_free_slot(struct ufs_hba *hba)
+static bool ufshcd_get_tm_free_slot(struct ufs_hba *hba, int *free_slot)
 {
-	return find_first_zero_bit(&hba->outstanding_tasks, hba->nutmrs);
+	int tag;
+	bool ret = false;
+
+	if (!free_slot)
+		goto out;
+
+	do {
+		tag = find_first_zero_bit(&hba->tm_slots_in_use, hba->nutmrs);
+		if (tag >= hba->nutmrs)
+			goto out;
+	} while (test_and_set_bit_lock(tag, &hba->tm_slots_in_use));
+
+	*free_slot = tag;
+	ret = true;
+out:
+	return ret;
+}
+
+static inline void ufshcd_put_tm_slot(struct ufs_hba *hba, int slot)
+{
+	clear_bit_unlock(slot, &hba->tm_slots_in_use);
 }
 
 /**
@@ -1700,10 +1725,11 @@ static void ufshcd_slave_destroy(struct scsi_device *sdev)
  * ufshcd_task_req_compl - handle task management request completion
  * @hba: per adapter instance
  * @index: index of the completed request
+ * @resp: task management service response
  *
- * Returns SUCCESS/FAILED
+ * Returns non-zero value on error, zero on success
  */
-static int ufshcd_task_req_compl(struct ufs_hba *hba, u32 index)
+static int ufshcd_task_req_compl(struct ufs_hba *hba, u32 index, u8 *resp)
 {
 	struct utp_task_req_desc *task_req_descp;
 	struct utp_upiu_task_rsp *task_rsp_upiup;
@@ -1724,19 +1750,15 @@ static int ufshcd_task_req_compl(struct ufs_hba *hba, u32 index)
 				task_req_descp[index].task_rsp_upiu;
 		task_result = be32_to_cpu(task_rsp_upiup->header.dword_1);
 		task_result = ((task_result & MASK_TASK_RESPONSE) >> 8);
-
-		if (task_result != UPIU_TASK_MANAGEMENT_FUNC_COMPL &&
-		    task_result != UPIU_TASK_MANAGEMENT_FUNC_SUCCEEDED)
-			task_result = FAILED;
-		else
-			task_result = SUCCESS;
+		if (resp)
+			*resp = (u8)task_result;
 	} else {
-		task_result = FAILED;
-		dev_err(hba->dev,
-			"trc: Invalid ocs = %x\n", ocs_value);
+		dev_err(hba->dev, "%s: failed, ocs = 0x%x\n",
+				__func__, ocs_value);
 	}
 	spin_unlock_irqrestore(hba->host->host_lock, flags);
-	return task_result;
+
+	return ocs_value;
 }
 
 /**
@@ -2237,7 +2259,7 @@ static void ufshcd_tmc_handler(struct ufs_hba *hba)
 
 	tm_doorbell = ufshcd_readl(hba, REG_UTP_TASK_REQ_DOOR_BELL);
 	hba->tm_condition = tm_doorbell ^ hba->outstanding_tasks;
-	wake_up_interruptible(&hba->ufshcd_tm_wait_queue);
+	wake_up(&hba->tm_wq);
 }
 
 /**
@@ -2287,38 +2309,58 @@ static irqreturn_t ufshcd_intr(int irq, void *__hba)
 	return retval;
 }
 
+static int ufshcd_clear_tm_cmd(struct ufs_hba *hba, int tag)
+{
+	int err = 0;
+	u32 mask = 1 << tag;
+	unsigned long flags;
+
+	if (!test_bit(tag, &hba->outstanding_tasks))
+		goto out;
+
+	spin_lock_irqsave(hba->host->host_lock, flags);
+	ufshcd_writel(hba, ~(1 << tag), REG_UTP_TASK_REQ_LIST_CLEAR);
+	spin_unlock_irqrestore(hba->host->host_lock, flags);
+
+	/* poll for max. 1 sec to clear door bell register by h/w */
+	err = ufshcd_wait_for_register(hba,
+			REG_UTP_TASK_REQ_DOOR_BELL,
+			mask, 0, 1000, 1000);
+out:
+	return err;
+}
+
 /**
  * ufshcd_issue_tm_cmd - issues task management commands to controller
  * @hba: per adapter instance
- * @lrbp: pointer to local reference block
+ * @lun_id: LUN ID to which TM command is sent
+ * @task_id: task ID to which the TM command is applicable
+ * @tm_function: task management function opcode
+ * @tm_response: task management service response return value
  *
- * Returns SUCCESS/FAILED
+ * Returns non-zero value on error, zero on success.
  */
-static int
-ufshcd_issue_tm_cmd(struct ufs_hba *hba,
-		    struct ufshcd_lrb *lrbp,
-		    u8 tm_function)
+static int ufshcd_issue_tm_cmd(struct ufs_hba *hba, int lun_id, int task_id,
+		u8 tm_function, u8 *tm_response)
 {
 	struct utp_task_req_desc *task_req_descp;
 	struct utp_upiu_task_req *task_req_upiup;
 	struct Scsi_Host *host;
 	unsigned long flags;
-	int free_slot = 0;
+	int free_slot;
 	int err;
+	int task_tag;
 
 	host = hba->host;
 
-	spin_lock_irqsave(host->host_lock, flags);
-
-	/* If task management queue is full */
-	free_slot = ufshcd_get_tm_free_slot(hba);
-	if (free_slot >= hba->nutmrs) {
-		spin_unlock_irqrestore(host->host_lock, flags);
-		dev_err(hba->dev, "Task management queue full\n");
-		err = FAILED;
-		goto out;
-	}
+	/*
+	 * Get free slot, sleep if slots are unavailable.
+	 * Even though we use wait_event() which sleeps indefinitely,
+	 * the maximum wait time is bounded by %TM_CMD_TIMEOUT.
+	 */
+	wait_event(hba->tm_tag_wq, ufshcd_get_tm_free_slot(hba, &free_slot));
 
+	spin_lock_irqsave(host->host_lock, flags);
 	task_req_descp = hba->utmrdl_base_addr;
 	task_req_descp += free_slot;
 
@@ -2330,18 +2372,15 @@ ufshcd_issue_tm_cmd(struct ufs_hba *hba,
 	/* Configure task request UPIU */
 	task_req_upiup =
 		(struct utp_upiu_task_req *) task_req_descp->task_req_upiu;
+	task_tag = hba->nutrs + free_slot;
 	task_req_upiup->header.dword_0 =
 		UPIU_HEADER_DWORD(UPIU_TRANSACTION_TASK_REQ, 0,
-					      lrbp->lun, lrbp->task_tag);
+				lun_id, task_tag);
 	task_req_upiup->header.dword_1 =
 		UPIU_HEADER_DWORD(0, tm_function, 0, 0);
 
-	task_req_upiup->input_param1 = lrbp->lun;
-	task_req_upiup->input_param1 =
-		cpu_to_be32(task_req_upiup->input_param1);
-	task_req_upiup->input_param2 = lrbp->task_tag;
-	task_req_upiup->input_param2 =
-		cpu_to_be32(task_req_upiup->input_param2);
+	task_req_upiup->input_param1 = cpu_to_be32(lun_id);
+	task_req_upiup->input_param2 = cpu_to_be32(task_id);
 
 	/* send command to the controller */
 	__set_bit(free_slot, &hba->outstanding_tasks);
@@ -2350,20 +2389,24 @@ ufshcd_issue_tm_cmd(struct ufs_hba *hba,
 	spin_unlock_irqrestore(host->host_lock, flags);
 
 	/* wait until the task management command is completed */
-	err =
-	wait_event_interruptible_timeout(hba->ufshcd_tm_wait_queue,
-					 (test_bit(free_slot,
-					 &hba->tm_condition) != 0),
-					 60 * HZ);
+	err = wait_event_timeout(hba->tm_wq,
+			test_bit(free_slot, &hba->tm_condition),
+			msecs_to_jiffies(TM_CMD_TIMEOUT));
 	if (!err) {
-		dev_err(hba->dev,
-			"Task management command timed-out\n");
-		err = FAILED;
-		goto out;
+		dev_err(hba->dev, "%s: task management cmd 0x%.2x timed-out\n",
+				__func__, tm_function);
+		if (ufshcd_clear_tm_cmd(hba, free_slot))
+			dev_WARN(hba->dev, "%s: unable to clear tm cmd (slot %d) after timeout\n",
+					__func__, free_slot);
+		err = -ETIMEDOUT;
+	} else {
+		err = ufshcd_task_req_compl(hba, free_slot, tm_response);
 	}
+
 	clear_bit(free_slot, &hba->tm_condition);
-	err = ufshcd_task_req_compl(hba, free_slot);
-out:
+	ufshcd_put_tm_slot(hba, free_slot);
+	wake_up(&hba->tm_tag_wq);
+
 	return err;
 }
 
@@ -2380,14 +2423,22 @@ static int ufshcd_device_reset(struct scsi_cmnd *cmd)
 	unsigned int tag;
 	u32 pos;
 	int err;
+	u8 resp;
+	struct ufshcd_lrb *lrbp;
 
 	host = cmd->device->host;
 	hba = shost_priv(host);
 	tag = cmd->request->tag;
 
-	err = ufshcd_issue_tm_cmd(hba, &hba->lrb[tag], UFS_LOGICAL_RESET);
-	if (err == FAILED)
+	lrbp = &hba->lrb[tag];
+	err = ufshcd_issue_tm_cmd(hba, lrbp->lun, lrbp->task_tag,
+			UFS_LOGICAL_RESET, &resp);
+	if (err || resp != UPIU_TASK_MANAGEMENT_FUNC_COMPL) {
+		err = FAILED;
 		goto out;
+	} else {
+		err = SUCCESS;
+	}
 
 	for (pos = 0; pos < hba->nutrs; pos++) {
 		if (test_bit(pos, &hba->outstanding_reqs) &&
@@ -2444,6 +2495,8 @@ static int ufshcd_abort(struct scsi_cmnd *cmd)
 	unsigned long flags;
 	unsigned int tag;
 	int err;
+	u8 resp;
+	struct ufshcd_lrb *lrbp;
 
 	host = cmd->device->host;
 	hba = shost_priv(host);
@@ -2459,9 +2512,15 @@ static int ufshcd_abort(struct scsi_cmnd *cmd)
 	}
 	spin_unlock_irqrestore(host->host_lock, flags);
 
-	err = ufshcd_issue_tm_cmd(hba, &hba->lrb[tag], UFS_ABORT_TASK);
-	if (err == FAILED)
+	lrbp = &hba->lrb[tag];
+	err = ufshcd_issue_tm_cmd(hba, lrbp->lun, lrbp->task_tag,
+			UFS_ABORT_TASK, &resp);
+	if (err || resp != UPIU_TASK_MANAGEMENT_FUNC_COMPL) {
+		err = FAILED;
 		goto out;
+	} else {
+		err = SUCCESS;
+	}
 
 	scsi_dma_unmap(cmd);
 
@@ -2682,7 +2741,8 @@ int ufshcd_init(struct device *dev, struct ufs_hba **hba_handle,
 	host->max_cmd_len = MAX_CDB_SIZE;
 
 	/* Initailize wait queue for task management */
-	init_waitqueue_head(&hba->ufshcd_tm_wait_queue);
+	init_waitqueue_head(&hba->tm_wq);
+	init_waitqueue_head(&hba->tm_tag_wq);
 
 	/* Initialize work queues */
 	INIT_WORK(&hba->feh_workq, ufshcd_fatal_err_handler);
diff --git a/drivers/scsi/ufs/ufshcd.h b/drivers/scsi/ufs/ufshcd.h
index 59c9c48..fe7c947 100644
--- a/drivers/scsi/ufs/ufshcd.h
+++ b/drivers/scsi/ufs/ufshcd.h
@@ -174,8 +174,10 @@ struct ufs_dev_cmd {
  * @irq: Irq number of the controller
  * @active_uic_cmd: handle of active UIC command
  * @uic_cmd_mutex: mutex for uic command
- * @ufshcd_tm_wait_queue: wait queue for task management
+ * @tm_wq: wait queue for task management
+ * @tm_tag_wq: wait queue for free task management slots
  * @tm_condition: condition variable for task management
+ * @tm_slots_in_use: bit map of task management request slots in use
  * @ufshcd_state: UFSHCD states
  * @intr_mask: Interrupt Mask Bits
  * @ee_ctrl_mask: Exception event control mask
@@ -216,8 +218,10 @@ struct ufs_hba {
 	struct uic_command *active_uic_cmd;
 	struct mutex uic_cmd_mutex;
 
-	wait_queue_head_t ufshcd_tm_wait_queue;
+	wait_queue_head_t tm_wq;
+	wait_queue_head_t tm_tag_wq;
 	unsigned long tm_condition;
+	unsigned long tm_slots_in_use;
 
 	u32 ufshcd_state;
 	u32 intr_mask;

* [PATCH V4 2/4] scsi: ufs: Fix hardware race conditions while aborting a command
@ 2013-07-23  7:08 ` Sujit Reddy Thumma
From: Sujit Reddy Thumma @ 2013-07-23  7:08 UTC (permalink / raw)
  To: Vinayak Holikatti, Santosh Y
  Cc: James E.J. Bottomley, linux-scsi, Sujit Reddy Thumma, linux-arm-msm

There is a possible race condition in the hardware when the abort
command is issued to terminate an ongoing SCSI command, as described
below:

- A bit in the door-bell register is set in the controller for a
  new SCSI command.
- In some rare situations, before the controller gets a chance to issue
  the command to the device, the software issues an abort command.
- If the device receives the abort command first, it returns success
  because the command itself is not present.
- If the controller now commits the command to the device, it will be
  processed.
- The software assumes that the command is aborted and proceeds while
  the device is still processing it.
- The software, controller and device may go out of sync because of
  this race condition.

To avoid this, query task presence in the device before sending the abort
task command so that after the abort operation, the command is guaranteed
to be non-existent in both the controller and the device.
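
As a rough illustration of the query-before-abort idea, the sketch below
models the controller and device with a small user-space struct. Everything
in it (struct sim, query_task(), abort_cmd(), the enum names) is
hypothetical and only mimics the decision logic; it does not talk to any
hardware.

```c
#include <assert.h>
#include <stdbool.h>

enum tm_resp { TM_FUNC_COMPL, TM_FUNC_SUCCEEDED };
enum abort_result { ABORT_DONE, ABORT_ALREADY_COMPLETE, ABORT_BUSY };

/* Hypothetical model of one command slot. */
struct sim {
	bool doorbell_set;     /* controller still owns the command */
	bool in_device;        /* device has received the command */
	int  polls_until_sent; /* polls before the controller commits it */
};

/* Ask the "device" whether it knows about the command. */
static enum tm_resp query_task(struct sim *s)
{
	if (s->doorbell_set && !s->in_device) {
		if (s->polls_until_sent == 0)
			s->in_device = true; /* controller commits it now */
		else
			s->polls_until_sent--;
	}
	return s->in_device ? TM_FUNC_SUCCEEDED : TM_FUNC_COMPL;
}

/* Abort only once the device is known to hold the command. */
static enum abort_result abort_cmd(struct sim *s, int max_polls)
{
	assert(max_polls > 0);

	for (int i = 0; i < max_polls; i++) {
		if (query_task(s) == TM_FUNC_SUCCEEDED) {
			/* command is in the device: abort on both sides */
			s->in_device = false;
			s->doorbell_set = false;
			return ABORT_DONE;
		}
		if (!s->doorbell_set)
			return ABORT_ALREADY_COMPLETE;
		/* door-bell set but device has not seen it yet: retry */
	}
	return ABORT_BUSY;
}
```

The key point is the middle case: a "not present" answer from the device is
trusted only when the door-bell bit is already clear. Otherwise the command
may still be in transition, so the query is repeated.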

Signed-off-by: Sujit Reddy Thumma <sthumma@codeaurora.org>
---
 drivers/scsi/ufs/ufshcd.c |   70 +++++++++++++++++++++++++++++++++++---------
 1 files changed, 55 insertions(+), 15 deletions(-)

diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
index 1d7e027..3f80396 100644
--- a/drivers/scsi/ufs/ufshcd.c
+++ b/drivers/scsi/ufs/ufshcd.c
@@ -2486,6 +2486,12 @@ static int ufshcd_host_reset(struct scsi_cmnd *cmd)
  * ufshcd_abort - abort a specific command
  * @cmd: SCSI command pointer
  *
+ * Abort the pending command in device by sending UFS_ABORT_TASK task management
+ * command, and in host controller by clearing the door-bell register. There can
+ * be race between controller sending the command to the device while abort is
+ * issued. To avoid that, first issue UFS_QUERY_TASK to check if the command is
+ * really issued and then try to abort it.
+ *
  * Returns SUCCESS/FAILED
  */
 static int ufshcd_abort(struct scsi_cmnd *cmd)
@@ -2494,7 +2500,8 @@ static int ufshcd_abort(struct scsi_cmnd *cmd)
 	struct ufs_hba *hba;
 	unsigned long flags;
 	unsigned int tag;
-	int err;
+	int err = 0;
+	int poll_cnt;
 	u8 resp;
 	struct ufshcd_lrb *lrbp;
 
@@ -2502,33 +2509,59 @@ static int ufshcd_abort(struct scsi_cmnd *cmd)
 	hba = shost_priv(host);
 	tag = cmd->request->tag;
 
-	spin_lock_irqsave(host->host_lock, flags);
+	/* If command is already aborted/completed, return SUCCESS */
+	if (!(test_bit(tag, &hba->outstanding_reqs)))
+		goto out;
 
-	/* check if command is still pending */
-	if (!(test_bit(tag, &hba->outstanding_reqs))) {
-		err = FAILED;
-		spin_unlock_irqrestore(host->host_lock, flags);
+	lrbp = &hba->lrb[tag];
+	for (poll_cnt = 100; poll_cnt; poll_cnt--) {
+		err = ufshcd_issue_tm_cmd(hba, lrbp->lun, lrbp->task_tag,
+				UFS_QUERY_TASK, &resp);
+		if (!err && resp == UPIU_TASK_MANAGEMENT_FUNC_SUCCEEDED) {
+			/* cmd pending in the device */
+			break;
+		} else if (!err && resp == UPIU_TASK_MANAGEMENT_FUNC_COMPL) {
+			u32 reg;
+
+			/*
+			 * cmd not pending in the device, check if it is
+			 * in transition.
+			 */
+			reg = ufshcd_readl(hba, REG_UTP_TRANSFER_REQ_DOOR_BELL);
+			if (reg & (1 << tag)) {
+				/* sleep for max. 2ms to stabilize */
+				usleep_range(1000, 2000);
+				continue;
+			}
+			/* command completed already */
+			goto out;
+		} else {
+			if (!err)
+				err = resp; /* service response error */
+			goto out;
+		}
+	}
+
+	if (!poll_cnt) {
+		err = -EBUSY;
 		goto out;
 	}
-	spin_unlock_irqrestore(host->host_lock, flags);
 
-	lrbp = &hba->lrb[tag];
 	err = ufshcd_issue_tm_cmd(hba, lrbp->lun, lrbp->task_tag,
 			UFS_ABORT_TASK, &resp);
 	if (err || resp != UPIU_TASK_MANAGEMENT_FUNC_COMPL) {
-		err = FAILED;
+		if (!err)
+			err = resp; /* service response error */
 		goto out;
-	} else {
-		err = SUCCESS;
 	}
 
+	err = ufshcd_clear_cmd(hba, tag);
+	if (err)
+		goto out;
+
 	scsi_dma_unmap(cmd);
 
 	spin_lock_irqsave(host->host_lock, flags);
-
-	/* clear the respective UTRLCLR register bit */
-	ufshcd_utrl_clear(hba, tag);
-
 	__clear_bit(tag, &hba->outstanding_reqs);
 	hba->lrb[tag].cmd = NULL;
 	spin_unlock_irqrestore(host->host_lock, flags);
@@ -2536,6 +2569,13 @@ static int ufshcd_abort(struct scsi_cmnd *cmd)
 	clear_bit_unlock(tag, &hba->lrb_in_use);
 	wake_up(&hba->dev_cmd.tag_wq);
 out:
+	if (!err) {
+		err = SUCCESS;
+	} else {
+		dev_err(hba->dev, "%s: failed with err %d\n", __func__, err);
+		err = FAILED;
+	}
+
 	return err;
 }
 

* [PATCH V4 3/4] scsi: ufs: Fix device and host reset methods
@ 2013-07-23  7:08 ` Sujit Reddy Thumma
From: Sujit Reddy Thumma @ 2013-07-23  7:08 UTC (permalink / raw)
  To: Vinayak Holikatti, Santosh Y
  Cc: James E.J. Bottomley, linux-scsi, Sujit Reddy Thumma, linux-arm-msm

SCSI-initiated error handling is currently broken because the reset
APIs don't try to bring the device back to an initialized state, ready
for further transfers.

In case of timeouts, the SCSI error handler takes care of handling aborts
and resets. Improve the error handling in such scenarios by resetting the
device and host and re-initializing them properly.
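
The escalation from device reset to host reset can be pictured with the
simplified user-space sketch below. It only mirrors the pending-flag logic;
struct reset_ops and the stub hooks are hypothetical and stand in for the
real reset routines, which are considerably more involved.

```c
#include <assert.h>

enum {
	EH_HOST_RESET_PENDING   = 1 << 0,
	EH_DEVICE_RESET_PENDING = 1 << 1,
};

/* Hypothetical hooks standing in for the reset routines; 0 means success. */
struct reset_ops {
	int (*device_reset)(void);
	int (*host_reset)(void);
};

/* Stub hooks, for demonstration only. */
static int stub_ok(void)   { return 0; }
static int stub_fail(void) { return -1; }

static int reset_and_restore(unsigned int *eh_flags,
			     const struct reset_ops *ops)
{
	int err = 0;

	assert(eh_flags && ops && ops->device_reset && ops->host_reset);

	/* try the lighter device reset first, unless host reset is due */
	if ((*eh_flags & EH_DEVICE_RESET_PENDING) &&
	    !(*eh_flags & EH_HOST_RESET_PENDING)) {
		err = ops->device_reset();
		*eh_flags &= ~EH_DEVICE_RESET_PENDING;
		if (err)
			*eh_flags |= EH_HOST_RESET_PENDING; /* escalate */
	}

	/* host reset, either requested directly or after escalation */
	if (*eh_flags & EH_HOST_RESET_PENDING) {
		err = ops->host_reset();
		if (!err)
			*eh_flags &= ~EH_HOST_RESET_PENDING;
	}

	return err;
}
```

With a failing device reset and a working host reset, the call still
returns 0 and leaves no flag pending, which is the recovery path the
description above is after.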

Signed-off-by: Sujit Reddy Thumma <sthumma@codeaurora.org>
---
 drivers/scsi/ufs/ufshcd.c |  437 +++++++++++++++++++++++++++++++++++++++------
 drivers/scsi/ufs/ufshcd.h |    2 +
 2 files changed, 381 insertions(+), 58 deletions(-)

diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
index 3f80396..1a0ceb2 100644
--- a/drivers/scsi/ufs/ufshcd.c
+++ b/drivers/scsi/ufs/ufshcd.c
@@ -69,9 +69,15 @@ enum {
 
 /* UFSHCD states */
 enum {
-	UFSHCD_STATE_OPERATIONAL,
 	UFSHCD_STATE_RESET,
 	UFSHCD_STATE_ERROR,
+	UFSHCD_STATE_OPERATIONAL,
+};
+
+/* UFSHCD error handling flags */
+enum {
+	UFSHCD_EH_HOST_RESET_PENDING = (1 << 0),
+	UFSHCD_EH_DEVICE_RESET_PENDING = (1 << 1),
 };
 
 /* Interrupt configuration options */
@@ -87,6 +93,22 @@ enum {
 	INT_AGGR_CONFIG,
 };
 
+#define ufshcd_set_device_reset_pending(h) \
+	(h->eh_flags |= UFSHCD_EH_DEVICE_RESET_PENDING)
+#define ufshcd_set_host_reset_pending(h) \
+	(h->eh_flags |= UFSHCD_EH_HOST_RESET_PENDING)
+#define ufshcd_device_reset_pending(h) \
+	(h->eh_flags & UFSHCD_EH_DEVICE_RESET_PENDING)
+#define ufshcd_host_reset_pending(h) \
+	(h->eh_flags & UFSHCD_EH_HOST_RESET_PENDING)
+#define ufshcd_clear_device_reset_pending(h) \
+	(h->eh_flags &= ~UFSHCD_EH_DEVICE_RESET_PENDING)
+#define ufshcd_clear_host_reset_pending(h) \
+	(h->eh_flags &= ~UFSHCD_EH_HOST_RESET_PENDING)
+
+static void ufshcd_tmc_handler(struct ufs_hba *hba);
+static void ufshcd_async_scan(void *data, async_cookie_t cookie);
+
 /*
  * ufshcd_wait_for_register - wait for register value to change
  * @hba - per-adapter interface
@@ -840,9 +862,22 @@ static int ufshcd_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *cmd)
 
 	tag = cmd->request->tag;
 
-	if (hba->ufshcd_state != UFSHCD_STATE_OPERATIONAL) {
+	switch (hba->ufshcd_state) {
+	case UFSHCD_STATE_OPERATIONAL:
+		break;
+	case UFSHCD_STATE_RESET:
 		err = SCSI_MLQUEUE_HOST_BUSY;
 		goto out;
+	case UFSHCD_STATE_ERROR:
+		set_host_byte(cmd, DID_ERROR);
+		cmd->scsi_done(cmd);
+		goto out;
+	default:
+		dev_WARN_ONCE(hba->dev, 1, "%s: invalid state %d\n",
+				__func__, hba->ufshcd_state);
+		set_host_byte(cmd, DID_BAD_TARGET);
+		cmd->scsi_done(cmd);
+		goto out;
 	}
 
 	/* acquire the tag to make sure device cmds don't use it */
@@ -1495,8 +1530,6 @@ static int ufshcd_make_hba_operational(struct ufs_hba *hba)
 	if (hba->ufshcd_state == UFSHCD_STATE_RESET)
 		scsi_unblock_requests(hba->host);
 
-	hba->ufshcd_state = UFSHCD_STATE_OPERATIONAL;
-
 out:
 	return err;
 }
@@ -2212,6 +2245,100 @@ out:
 }
 
 /**
+ * ufshcd_utrl_is_rsr_enabled - check if run-stop register is enabled
+ * @hba: per-adapter instance
+ */
+static bool ufshcd_utrl_is_rsr_enabled(struct ufs_hba *hba)
+{
+	return ufshcd_readl(hba, REG_UTP_TRANSFER_REQ_LIST_RUN_STOP) & 0x1;
+}
+
+/**
+ * ufshcd_utmrl_is_rsr_enabled - check if run-stop register is enabled
+ * @hba: per-adapter instance
+ */
+static bool ufshcd_utmrl_is_rsr_enabled(struct ufs_hba *hba)
+{
+	return ufshcd_readl(hba, REG_UTP_TASK_REQ_LIST_RUN_STOP) & 0x1;
+}
+
+/**
+ * ufshcd_complete_pending_tasks - complete outstanding tasks
+ * @hba: per adapter instance
+ *
+ * Abort in-progress task management commands and wakeup
+ * waiting threads.
+ *
+ * Returns non-zero error value when failed to clear all the commands.
+ */
+static int ufshcd_complete_pending_tasks(struct ufs_hba *hba)
+{
+	int err = 0;
+	unsigned long flags;
+
+	if (!hba->outstanding_tasks)
+		goto out;
+
+	/* Clear UTMRL only when run-stop is enabled */
+	if (ufshcd_utmrl_is_rsr_enabled(hba))
+		ufshcd_writel(hba, ~hba->outstanding_tasks,
+				REG_UTP_TASK_REQ_LIST_CLEAR);
+
+	/* poll for max. 1 sec to clear door bell register by h/w */
+	err = ufshcd_wait_for_register(hba, REG_UTP_TASK_REQ_DOOR_BELL,
+			hba->outstanding_tasks,	~hba->outstanding_tasks,
+			1000, 1000);
+
+	spin_lock_irqsave(hba->host->host_lock, flags);
+	/* complete commands that were cleared out */
+	ufshcd_tmc_handler(hba);
+	spin_unlock_irqrestore(hba->host->host_lock, flags);
+out:
+	if (err)
+		dev_err(hba->dev, "%s: failed, still pending = 0x%.8x\n",
+		__func__, ufshcd_readl(hba, REG_UTP_TASK_REQ_DOOR_BELL));
+	return err;
+}
+
+/**
+ * ufshcd_complete_pending_reqs - complete outstanding requests
+ * @hba: per adapter instance
+ *
+ * Abort in-progress transfer request commands and return them to SCSI.
+ *
+ * Returns non-zero error value when failed to clear all the commands.
+ */
+static int ufshcd_complete_pending_reqs(struct ufs_hba *hba)
+{
+	int err = 0;
+	unsigned long flags;
+
+	/* check if we completed all of them */
+	if (!hba->outstanding_reqs)
+		goto out;
+
+	/* Clear UTRL only when run-stop is enabled */
+	if (ufshcd_utrl_is_rsr_enabled(hba))
+		ufshcd_writel(hba, ~hba->outstanding_reqs,
+				REG_UTP_TRANSFER_REQ_LIST_CLEAR);
+
+	/* poll for max. 1 sec to clear door bell register by h/w */
+	err = ufshcd_wait_for_register(hba, REG_UTP_TRANSFER_REQ_DOOR_BELL,
+			hba->outstanding_reqs, ~hba->outstanding_reqs,
+			1000, 1000);
+
+	spin_lock_irqsave(hba->host->host_lock, flags);
+	/* complete commands that were cleared out */
+	ufshcd_transfer_req_compl(hba);
+	spin_unlock_irqrestore(hba->host->host_lock, flags);
+out:
+	if (err)
+		dev_err(hba->dev, "%s: failed, still pending = 0x%.8x\n",
+		__func__, ufshcd_readl(hba, REG_UTP_TRANSFER_REQ_DOOR_BELL));
+	return err;
+}
+
+/**
  * ufshcd_fatal_err_handler - handle fatal errors
  * @hba: per adapter instance
  */
@@ -2245,8 +2372,12 @@ static void ufshcd_err_handler(struct ufs_hba *hba)
 	}
 	return;
 fatal_eh:
-	hba->ufshcd_state = UFSHCD_STATE_ERROR;
-	schedule_work(&hba->feh_workq);
+	/* handle fatal errors only when link is functional */
+	if (hba->ufshcd_state == UFSHCD_STATE_OPERATIONAL) {
+		/* block commands at driver layer until error is handled */
+		hba->ufshcd_state = UFSHCD_STATE_ERROR;
+		schedule_work(&hba->feh_workq);
+	}
 }
 
 /**
@@ -2411,75 +2542,155 @@ static int ufshcd_issue_tm_cmd(struct ufs_hba *hba, int lun_id, int task_id,
 }
 
 /**
- * ufshcd_device_reset - reset device and abort all the pending commands
- * @cmd: SCSI command pointer
+ * ufshcd_dme_end_point_reset - Notify device Unipro to perform reset
+ * @hba: per adapter instance
  *
- * Returns SUCCESS/FAILED
+ * UIC_CMD_DME_END_PT_RST resets the UFS device completely, the UFS flags,
+ * attributes and descriptors are reset to default state. Callers are
+ * expected to initialize the whole device again after this.
+ *
+ * Returns zero on success, non-zero on failure
  */
-static int ufshcd_device_reset(struct scsi_cmnd *cmd)
+static int ufshcd_dme_end_point_reset(struct ufs_hba *hba)
 {
-	struct Scsi_Host *host;
-	struct ufs_hba *hba;
-	unsigned int tag;
-	u32 pos;
-	int err;
-	u8 resp;
-	struct ufshcd_lrb *lrbp;
+	struct uic_command uic_cmd = {0};
+	int ret;
 
-	host = cmd->device->host;
-	hba = shost_priv(host);
-	tag = cmd->request->tag;
+	uic_cmd.command = UIC_CMD_DME_END_PT_RST;
 
-	lrbp = &hba->lrb[tag];
-	err = ufshcd_issue_tm_cmd(hba, lrbp->lun, lrbp->task_tag,
-			UFS_LOGICAL_RESET, &resp);
-	if (err || resp != UPIU_TASK_MANAGEMENT_FUNC_COMPL) {
-		err = FAILED;
+	ret = ufshcd_send_uic_cmd(hba, &uic_cmd);
+	if (ret)
+		dev_err(hba->dev, "%s: error code %d\n", __func__, ret);
+
+	return ret;
+}
+
+/**
+ * ufshcd_dme_reset - Local UniPro reset
+ * @hba: per adapter instance
+ *
+ * Returns zero on success, non-zero on failure
+ */
+static int ufshcd_dme_reset(struct ufs_hba *hba)
+{
+	struct uic_command uic_cmd = {0};
+	int ret;
+
+	uic_cmd.command = UIC_CMD_DME_RESET;
+
+	ret = ufshcd_send_uic_cmd(hba, &uic_cmd);
+	if (ret)
+		dev_err(hba->dev, "%s: error code %d\n", __func__, ret);
+
+	return ret;
+
+}
+
+/**
+ * ufshcd_dme_enable - Local UniPro DME Enable
+ * @hba: per adapter instance
+ *
+ * Returns zero on success, non-zero on failure
+ */
+static int ufshcd_dme_enable(struct ufs_hba *hba)
+{
+	struct uic_command uic_cmd = {0};
+	int ret;
+	uic_cmd.command = UIC_CMD_DME_ENABLE;
+
+	ret = ufshcd_send_uic_cmd(hba, &uic_cmd);
+	if (ret)
+		dev_err(hba->dev, "%s: error code %d\n", __func__, ret);
+
+	return ret;
+
+}
+
+/**
+ * ufshcd_device_reset_and_restore - reset and restore device
+ * @hba: per-adapter instance
+ *
+ * Note that the device reset issues DME_END_POINT_RESET which
+ * may reset entire device and restore device attributes to
+ * default state.
+ *
+ * Returns zero on success, non-zero on failure
+ */
+static int ufshcd_device_reset_and_restore(struct ufs_hba *hba)
+{
+	int err = 0;
+	u32 reg;
+
+	err = ufshcd_dme_end_point_reset(hba);
+	if (err)
 		goto out;
-	} else {
-		err = SUCCESS;
-	}
 
-	for (pos = 0; pos < hba->nutrs; pos++) {
-		if (test_bit(pos, &hba->outstanding_reqs) &&
-		    (hba->lrb[tag].lun == hba->lrb[pos].lun)) {
+	/* restore communication with the device */
+	err = ufshcd_dme_reset(hba);
+	if (err)
+		goto out;
 
-			/* clear the respective UTRLCLR register bit */
-			ufshcd_utrl_clear(hba, pos);
+	err = ufshcd_dme_enable(hba);
+	if (err)
+		goto out;
 
-			clear_bit(pos, &hba->outstanding_reqs);
+	err = ufshcd_dme_link_startup(hba);
+	if (err)
+		goto out;
 
-			if (hba->lrb[pos].cmd) {
-				scsi_dma_unmap(hba->lrb[pos].cmd);
-				hba->lrb[pos].cmd->result =
-					DID_ABORT << 16;
-				hba->lrb[pos].cmd->scsi_done(cmd);
-				hba->lrb[pos].cmd = NULL;
-				clear_bit_unlock(pos, &hba->lrb_in_use);
-				wake_up(&hba->dev_cmd.tag_wq);
-			}
-		}
-	} /* end of for */
+	/* check if link is up and device is detected */
+	reg = ufshcd_readl(hba, REG_CONTROLLER_STATUS);
+	if (!ufshcd_is_device_present(reg)) {
+		dev_err(hba->dev, "Device not present\n");
+		err = -ENXIO;
+		goto out;
+	}
+
+	ufshcd_clear_device_reset_pending(hba);
 out:
+	dev_dbg(hba->dev, "%s: done err = %d\n", __func__, err);
 	return err;
 }
 
 /**
- * ufshcd_host_reset - Main reset function registered with scsi layer
- * @cmd: SCSI command pointer
+ * ufshcd_host_reset_and_restore - reset and restore host controller
+ * @hba: per-adapter instance
  *
- * Returns SUCCESS/FAILED
+ * Note that host controller reset may issue DME_RESET to
+ * local and remote (device) Uni-Pro stack and the attributes
+ * are reset to default state.
+ *
+ * Returns zero on success, non-zero on failure
  */
-static int ufshcd_host_reset(struct scsi_cmnd *cmd)
+static int ufshcd_host_reset_and_restore(struct ufs_hba *hba)
 {
-	struct ufs_hba *hba;
+	int err;
+	async_cookie_t cookie;
+	unsigned long flags;
 
-	hba = shost_priv(cmd->device->host);
+	/* Reset the host controller */
+	spin_lock_irqsave(hba->host->host_lock, flags);
+	ufshcd_hba_stop(hba);
+	spin_unlock_irqrestore(hba->host->host_lock, flags);
 
-	if (hba->ufshcd_state == UFSHCD_STATE_RESET)
-		return SUCCESS;
+	err = ufshcd_hba_enable(hba);
+	if (err)
+		goto out;
+
+	/* Establish the link again and restore the device */
+	cookie = async_schedule(ufshcd_async_scan, hba);
+	/* wait for async scan to be completed */
+	async_synchronize_cookie(++cookie);
+	if (hba->ufshcd_state != UFSHCD_STATE_OPERATIONAL)
+		err = -EIO;
+out:
+	if (err)
+		dev_err(hba->dev, "%s: Host init failed %d\n", __func__, err);
+	else
+		ufshcd_clear_host_reset_pending(hba);
 
-	return ufshcd_do_reset(hba);
+	dev_dbg(hba->dev, "%s: done err = %d\n", __func__, err);
+	return err;
 }
 
 /**
@@ -2580,6 +2791,110 @@ out:
 }
 
 /**
+ * ufshcd_reset_and_restore - resets device or host or both
+ * @hba: per-adapter instance
+ *
+ * Reset and recover device, host and re-establish link. This
+ * is helpful to recover the communication in fatal error conditions.
+ *
+ * Returns zero on success, non-zero on failure
+ */
+static int ufshcd_reset_and_restore(struct ufs_hba *hba)
+{
+	int err = 0;
+
+	if (ufshcd_device_reset_pending(hba) &&
+			!ufshcd_host_reset_pending(hba)) {
+		err = ufshcd_device_reset_and_restore(hba);
+		if (err) {
+			ufshcd_clear_device_reset_pending(hba);
+			ufshcd_set_host_reset_pending(hba);
+		}
+	}
+
+	if (ufshcd_host_reset_pending(hba))
+		err = ufshcd_host_reset_and_restore(hba);
+
+	/*
+	 * The reset may have cleared the doorbell registers, so
+	 * complete the outstanding requests in software here.
+	 */
+	ufshcd_complete_pending_reqs(hba);
+	ufshcd_complete_pending_tasks(hba);
+
+	return err;
+}
+
+static int ufshcd_eh_reset_handler(struct scsi_cmnd *cmd, int eh_flag)
+{
+	int err;
+	unsigned long flags;
+	struct ufs_hba *hba;
+
+	hba = shost_priv(cmd->device->host);
+
+	/*
+	 * Check if there is any race with fatal error handling.
+	 * If so, wait for it to complete. Even though fatal error
+	 * handling does reset and restore in some cases, don't assume
+	 * anything out of it. We are just avoiding race here.
+	 */
+	do {
+		spin_lock_irqsave(hba->host->host_lock, flags);
+		if (!(work_pending(&hba->feh_workq) ||
+				hba->ufshcd_state == UFSHCD_STATE_RESET))
+			break;
+		spin_unlock_irqrestore(hba->host->host_lock, flags);
+		dev_dbg(hba->dev, "%s: reset in progress\n", __func__);
+		flush_work_sync(&hba->feh_workq);
+	} while (1);
+
+	hba->ufshcd_state = UFSHCD_STATE_RESET;
+	if (eh_flag & UFSHCD_EH_DEVICE_RESET_PENDING)
+		ufshcd_set_device_reset_pending(hba);
+	else
+		ufshcd_set_host_reset_pending(hba);
+	spin_unlock_irqrestore(hba->host->host_lock, flags);
+
+	err = ufshcd_reset_and_restore(hba);
+
+	spin_lock_irqsave(hba->host->host_lock, flags);
+	if (!err) {
+		err = SUCCESS;
+		hba->ufshcd_state = UFSHCD_STATE_OPERATIONAL;
+	} else {
+		err = FAILED;
+		hba->ufshcd_state = UFSHCD_STATE_ERROR;
+	}
+	spin_unlock_irqrestore(hba->host->host_lock, flags);
+
+	return err;
+}
+
+/**
+ * ufshcd_eh_device_reset_handler - device reset handler registered to
+ *                                    scsi layer.
+ * @cmd: SCSI command pointer
+ *
+ * Returns SUCCESS/FAILED
+ */
+static int ufshcd_eh_device_reset_handler(struct scsi_cmnd *cmd)
+{
+	return ufshcd_eh_reset_handler(cmd, UFSHCD_EH_DEVICE_RESET_PENDING);
+}
+
+/**
+ * ufshcd_eh_host_reset_handler - host reset handler registered to scsi layer
+ * @cmd: SCSI command pointer
+ *
+ * Returns SUCCESS/FAILED
+ */
+static int ufshcd_eh_host_reset_handler(struct scsi_cmnd *cmd)
+{
+	return ufshcd_eh_reset_handler(cmd, UFSHCD_EH_HOST_RESET_PENDING);
+}
+
+/**
  * ufshcd_async_scan - asynchronous execution for link startup
  * @data: data pointer to pass to this function
  * @cookie: cookie data
@@ -2602,8 +2917,14 @@ static void ufshcd_async_scan(void *data, async_cookie_t cookie)
 		goto out;
 
 	ufshcd_force_reset_auto_bkops(hba);
-	scsi_scan_host(hba->host);
-	pm_runtime_put_sync(hba->dev);
+	hba->ufshcd_state = UFSHCD_STATE_OPERATIONAL;
+
+	/* If we are in error handling context no need to scan the host */
+	if (!(ufshcd_device_reset_pending(hba) ||
+			ufshcd_host_reset_pending(hba))) {
+		scsi_scan_host(hba->host);
+		pm_runtime_put_sync(hba->dev);
+	}
 out:
 	return;
 }
@@ -2616,8 +2937,8 @@ static struct scsi_host_template ufshcd_driver_template = {
 	.slave_alloc		= ufshcd_slave_alloc,
 	.slave_destroy		= ufshcd_slave_destroy,
 	.eh_abort_handler	= ufshcd_abort,
-	.eh_device_reset_handler = ufshcd_device_reset,
-	.eh_host_reset_handler	= ufshcd_host_reset,
+	.eh_device_reset_handler = ufshcd_eh_device_reset_handler,
+	.eh_host_reset_handler   = ufshcd_eh_host_reset_handler,
 	.this_id		= -1,
 	.sg_tablesize		= SG_ALL,
 	.cmd_per_lun		= UFSHCD_CMD_PER_LUN,
diff --git a/drivers/scsi/ufs/ufshcd.h b/drivers/scsi/ufs/ufshcd.h
index fe7c947..1e0585c 100644
--- a/drivers/scsi/ufs/ufshcd.h
+++ b/drivers/scsi/ufs/ufshcd.h
@@ -179,6 +179,7 @@ struct ufs_dev_cmd {
  * @tm_condition: condition variable for task management
  * @tm_slots_in_use: bit map of task management request slots in use
  * @ufshcd_state: UFSHCD states
+ * @eh_flags: Error handling flags
  * @intr_mask: Interrupt Mask Bits
  * @ee_ctrl_mask: Exception event control mask
  * @feh_workq: Work queue for fatal controller error handling
@@ -224,6 +225,7 @@ struct ufs_hba {
 	unsigned long tm_slots_in_use;
 
 	u32 ufshcd_state;
+	u32 eh_flags;
 	u32 intr_mask;
 	u16 ee_ctrl_mask;
 
-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation.

* [PATCH V4 4/4] scsi: ufs: Improve UFS fatal error handling
  2013-07-23  7:08 [PATCH V4 0/4] scsi: ufs: Improve UFS error handling Sujit Reddy Thumma
                   ` (2 preceding siblings ...)
  2013-07-23  7:08 ` [PATCH V4 3/4] scsi: ufs: Fix device and host reset methods Sujit Reddy Thumma
@ 2013-07-23  7:08 ` Sujit Reddy Thumma
  3 siblings, 0 replies; 5+ messages in thread
From: Sujit Reddy Thumma @ 2013-07-23  7:08 UTC (permalink / raw)
  To: Vinayak Holikatti, Santosh Y
  Cc: James E.J. Bottomley, linux-scsi, Sujit Reddy Thumma, linux-arm-msm

Error handling in the UFS driver is broken: on fatal errors it resets
the host controller without re-initializing it. Correct the fatal error
handling sequence according to the UFS Host Controller Interface (HCI)
v1.1 specification.

o Once a fatal error condition is detected, the host controller may hang
  until a reset is applied, so retrying the command without a reset does
  not help. The reset is therefore applied in driver context from a
  separate work item, and the SCSI mid-layer is not informed until the
  reset has been applied.

o Requests that were processed and completed without error are reported
  to the SCSI layer as successful. Pending commands that have not started
  yet, or that did not cause the error, are re-queued into the SCSI
  mid-layer queue. For the command that caused the error, the host
  controller or device is reset and DID_ERROR is returned so that the
  command is retried after the reset.

o The SCSI layer is informed about the Unit Attention condition expected
  from the device for the first command after a reset, so that it can
  take the necessary steps to establish communication with the device.
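
As a rough illustration of the second point above, the standalone sketch
below classifies each request slot from a doorbell snapshot, the driver's
outstanding bitmap and the per-slot OCS value. It is not code from this
patch: classify_slot(), error_mask() and enum slot_action are hypothetical
names used only for illustration, and only a subset of the OCS values from
ufshci.h is reproduced.

```c
#include <assert.h>
#include <stdint.h>

/* Subset of the OCS values from ufshci.h, repeated here for illustration. */
enum {
	OCS_SUCCESS                = 0x0,
	OCS_FATAL_ERROR            = 0x7,
	OCS_INVALID_COMMAND_STATUS = 0xF,
};

/* Hypothetical per-slot outcome; the driver itself uses DID_* result codes. */
enum slot_action {
	SLOT_IDLE,              /* slot was never issued */
	SLOT_COMPLETE_OK,       /* processed without error */
	SLOT_REQUEUE,           /* not processed yet: hand back for re-queue */
	SLOT_RETRY_AFTER_RESET, /* caused the error: retry once reset is done */
};

static enum slot_action classify_slot(uint32_t doorbell, uint32_t outstanding,
				      unsigned int slot, int ocs)
{
	uint32_t bit;

	assert(slot < 32);
	bit = UINT32_C(1) << slot;

	if (!(outstanding & bit))
		return SLOT_IDLE;
	/* Doorbell bit still set: the controller has not processed it. */
	if (doorbell & bit)
		return SLOT_REQUEUE;
	/* Doorbell bit cleared: processed, so the OCS field decides. */
	if (ocs == OCS_SUCCESS)
		return SLOT_COMPLETE_OK;
	if (ocs == OCS_INVALID_COMMAND_STATUS)
		return SLOT_REQUEUE;
	return SLOT_RETRY_AFTER_RESET;
}

/* Bitmask of the slots whose commands must be retried after the reset. */
static uint32_t error_mask(uint32_t doorbell, uint32_t outstanding,
			   const int *ocs, unsigned int nslots)
{
	uint32_t mask = 0;
	unsigned int slot;

	assert(nslots <= 32);
	for (slot = 0; slot < nslots; slot++) {
		if (classify_slot(doorbell, outstanding, slot, ocs[slot]) ==
		    SLOT_RETRY_AFTER_RESET)
			mask |= UINT32_C(1) << slot;
	}
	return mask;
}
```

The mask returned by error_mask() plays the role of err_xfer in the patch:
those slots are held back while the successful and pending ones are
completed, and are dealt with once the reset has been applied.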

Signed-off-by: Sujit Reddy Thumma <sthumma@codeaurora.org>
---
 drivers/scsi/ufs/ufshcd.c |  350 +++++++++++++++++++++++++++++++++++---------
 drivers/scsi/ufs/ufshcd.h |    2 +
 drivers/scsi/ufs/ufshci.h |   19 ++-
 3 files changed, 296 insertions(+), 75 deletions(-)

diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
index 1a0ceb2..82600d6 100644
--- a/drivers/scsi/ufs/ufshcd.c
+++ b/drivers/scsi/ufs/ufshcd.c
@@ -80,6 +80,14 @@ enum {
 	UFSHCD_EH_DEVICE_RESET_PENDING = (1 << 1),
 };
 
+/* UFSHCD UIC layer error flags */
+enum {
+	UFSHCD_UIC_DL_PA_INIT_ERROR = (1 << 0), /* Data link layer error */
+	UFSHCD_UIC_NL_ERROR = (1 << 1), /* Network layer error */
+	UFSHCD_UIC_TL_ERROR = (1 << 2), /* Transport Layer error */
+	UFSHCD_UIC_DME_ERROR = (1 << 3), /* DME error */
+};
+
 /* Interrupt configuration options */
 enum {
 	UFSHCD_INT_DISABLE,
@@ -108,6 +116,7 @@ enum {
 
 static void ufshcd_tmc_handler(struct ufs_hba *hba);
 static void ufshcd_async_scan(void *data, async_cookie_t cookie);
+static int ufshcd_reset_and_restore(struct ufs_hba *hba);
 
 /*
  * ufshcd_wait_for_register - wait for register value to change
@@ -1527,9 +1536,6 @@ static int ufshcd_make_hba_operational(struct ufs_hba *hba)
 		goto out;
 	}
 
-	if (hba->ufshcd_state == UFSHCD_STATE_RESET)
-		scsi_unblock_requests(hba->host);
-
 out:
 	return err;
 }
@@ -1655,66 +1661,6 @@ static int ufshcd_verify_dev_init(struct ufs_hba *hba)
 }
 
 /**
- * ufshcd_do_reset - reset the host controller
- * @hba: per adapter instance
- *
- * Returns SUCCESS/FAILED
- */
-static int ufshcd_do_reset(struct ufs_hba *hba)
-{
-	struct ufshcd_lrb *lrbp;
-	unsigned long flags;
-	int tag;
-
-	/* block commands from midlayer */
-	scsi_block_requests(hba->host);
-
-	spin_lock_irqsave(hba->host->host_lock, flags);
-	hba->ufshcd_state = UFSHCD_STATE_RESET;
-
-	/* send controller to reset state */
-	ufshcd_hba_stop(hba);
-	spin_unlock_irqrestore(hba->host->host_lock, flags);
-
-	/* abort outstanding commands */
-	for (tag = 0; tag < hba->nutrs; tag++) {
-		if (test_bit(tag, &hba->outstanding_reqs)) {
-			lrbp = &hba->lrb[tag];
-			if (lrbp->cmd) {
-				scsi_dma_unmap(lrbp->cmd);
-				lrbp->cmd->result = DID_RESET << 16;
-				lrbp->cmd->scsi_done(lrbp->cmd);
-				lrbp->cmd = NULL;
-				clear_bit_unlock(tag, &hba->lrb_in_use);
-			}
-		}
-	}
-
-	/* complete device management command */
-	if (hba->dev_cmd.complete)
-		complete(hba->dev_cmd.complete);
-
-	/* clear outstanding request/task bit maps */
-	hba->outstanding_reqs = 0;
-	hba->outstanding_tasks = 0;
-
-	/* Host controller enable */
-	if (ufshcd_hba_enable(hba)) {
-		dev_err(hba->dev,
-			"Reset: Controller initialization failed\n");
-		return FAILED;
-	}
-
-	if (ufshcd_link_startup(hba)) {
-		dev_err(hba->dev,
-			"Reset: Link start-up failed\n");
-		return FAILED;
-	}
-
-	return SUCCESS;
-}
-
-/**
  * ufshcd_slave_alloc - handle initial SCSI device configurations
  * @sdev: pointer to SCSI device
  *
@@ -1731,6 +1677,9 @@ static int ufshcd_slave_alloc(struct scsi_device *sdev)
 	sdev->use_10_for_ms = 1;
 	scsi_set_tag_type(sdev, MSG_SIMPLE_TAG);
 
+	/* allow SCSI layer to restart the device in case of errors */
+	sdev->allow_restart = 1;
+
 	/*
 	 * Inform SCSI Midlayer that the LUN queue depth is same as the
 	 * controller queue depth. If a LUN queue depth is less than the
@@ -1934,6 +1883,9 @@ ufshcd_transfer_rsp_status(struct ufs_hba *hba, struct ufshcd_lrb *lrbp)
 	case OCS_ABORTED:
 		result |= DID_ABORT << 16;
 		break;
+	case OCS_INVALID_COMMAND_STATUS:
+		result |= DID_REQUEUE << 16;
+		break;
 	case OCS_INVALID_CMD_TABLE_ATTR:
 	case OCS_INVALID_PRDT_ATTR:
 	case OCS_MISMATCH_DATA_BUF_SIZE:
@@ -2338,42 +2290,296 @@ out:
 	return err;
 }
 
+static void ufshcd_decide_eh_xfer_req(struct ufs_hba *hba, u32 ocs)
+{
+	switch (ocs) {
+	case OCS_SUCCESS:
+	case OCS_INVALID_COMMAND_STATUS:
+		break;
+	case OCS_MISMATCH_DATA_BUF_SIZE:
+	case OCS_MISMATCH_RESP_UPIU_SIZE:
+	case OCS_PEER_COMM_FAILURE:
+	case OCS_FATAL_ERROR:
+	case OCS_ABORTED:
+	case OCS_INVALID_CMD_TABLE_ATTR:
+	case OCS_INVALID_PRDT_ATTR:
+		ufshcd_set_host_reset_pending(hba);
+		break;
+	default:
+		dev_err(hba->dev, "%s: unknown OCS 0x%x\n",
+				__func__, ocs);
+		BUG();
+	}
+}
+
+static void ufshcd_decide_eh_task_req(struct ufs_hba *hba, u32 ocs)
+{
+	switch (ocs) {
+	case OCS_TMR_SUCCESS:
+	case OCS_TMR_INVALID_COMMAND_STATUS:
+		break;
+	case OCS_TMR_MISMATCH_REQ_SIZE:
+	case OCS_TMR_MISMATCH_RESP_SIZE:
+	case OCS_TMR_PEER_COMM_FAILURE:
+	case OCS_TMR_INVALID_ATTR:
+	case OCS_TMR_ABORTED:
+	case OCS_TMR_FATAL_ERROR:
+		ufshcd_set_host_reset_pending(hba);
+		break;
+	default:
+		dev_err(hba->dev, "%s: unknown TMR OCS 0x%x\n",
+				__func__, ocs);
+		BUG();
+	}
+}
+
 /**
- * ufshcd_fatal_err_handler - handle fatal errors
+ * ufshcd_error_autopsy_transfer_req() - read OCS field of failed command and
+ *                          decide error handling
+ * @hba: per adapter instance
+ * @err_xfer: bit mask for transfer request errors
+ *
+ * Iterate over completed transfer requests and
+ * set error handling flags.
+ */
+static void
+ufshcd_error_autopsy_transfer_req(struct ufs_hba *hba, u32 *err_xfer)
+{
+	unsigned long completed;
+	u32 doorbell;
+	int index;
+	int ocs;
+
+	if (!err_xfer)
+		goto out;
+
+	doorbell = ufshcd_readl(hba, REG_UTP_TRANSFER_REQ_DOOR_BELL);
+	completed = doorbell ^ (u32)hba->outstanding_reqs;
+
+	for (index = 0; index < hba->nutrs; index++) {
+		if (test_bit(index, &completed)) {
+			ocs = ufshcd_get_tr_ocs(&hba->lrb[index]);
+			if ((ocs == OCS_SUCCESS) ||
+					(ocs == OCS_INVALID_COMMAND_STATUS))
+				continue;
+
+			*err_xfer |= (1 << index);
+			ufshcd_decide_eh_xfer_req(hba, ocs);
+		}
+	}
+out:
+	return;
+}
+
+/**
+ * ufshcd_error_autopsy_task_req() - read OCS field of failed command and
+ *                          decide error handling
  * @hba: per adapter instance
+ * @err_tm: bit mask for task management errors
+ *
+ * Iterate over completed task management requests and
+ * set error handling flags.
+ */
+static void
+ufshcd_error_autopsy_task_req(struct ufs_hba *hba, u32 *err_tm)
+{
+	unsigned long completed;
+	u32 doorbell;
+	int index;
+	int ocs;
+
+	if (!err_tm)
+		goto out;
+
+	doorbell = ufshcd_readl(hba, REG_UTP_TASK_REQ_DOOR_BELL);
+	completed = doorbell ^ (u32)hba->outstanding_tasks;
+
+	for (index = 0; index < hba->nutmrs; index++) {
+		if (test_bit(index, &completed)) {
+			struct utp_task_req_desc *tm_descp;
+
+			tm_descp = hba->utmrdl_base_addr;
+			ocs = ufshcd_get_tmr_ocs(&tm_descp[index]);
+			if ((ocs == OCS_TMR_SUCCESS) ||
+					(ocs == OCS_TMR_INVALID_COMMAND_STATUS))
+				continue;
+
+			*err_tm |= (1 << index);
+			ufshcd_decide_eh_task_req(hba, ocs);
+		}
+	}
+
+out:
+	return;
+}
+
+/**
+ * ufshcd_fatal_err_handler - handle fatal errors
+ * @work: pointer to work structure
  */
 static void ufshcd_fatal_err_handler(struct work_struct *work)
 {
 	struct ufs_hba *hba;
+	unsigned long flags;
+	u32 err_xfer = 0;
+	u32 err_tm = 0;
+	int err;
+
 	hba = container_of(work, struct ufs_hba, feh_workq);
 
 	pm_runtime_get_sync(hba->dev);
-	/* check if reset is already in progress */
-	if (hba->ufshcd_state != UFSHCD_STATE_RESET)
-		ufshcd_do_reset(hba);
+	spin_lock_irqsave(hba->host->host_lock, flags);
+	if (hba->ufshcd_state == UFSHCD_STATE_RESET) {
+		/* complete processed requests and exit */
+		ufshcd_transfer_req_compl(hba);
+		ufshcd_tmc_handler(hba);
+		spin_unlock_irqrestore(hba->host->host_lock, flags);
+		pm_runtime_put_sync(hba->dev);
+		return;
+	}
+
+	hba->ufshcd_state = UFSHCD_STATE_RESET;
+	ufshcd_error_autopsy_transfer_req(hba, &err_xfer);
+	ufshcd_error_autopsy_task_req(hba, &err_tm);
+
+	/*
+	 * Complete successful and pending transfer requests.
+	 * DID_REQUEUE is returned for pending requests because they are
+	 * unrelated to the failed request, so the SCSI layer should not
+	 * treat them as errors or decrement their retry count.
+	 */
+	hba->outstanding_reqs &= ~err_xfer;
+	ufshcd_transfer_req_compl(hba);
+	spin_unlock_irqrestore(hba->host->host_lock, flags);
+	ufshcd_complete_pending_reqs(hba);
+	spin_lock_irqsave(hba->host->host_lock, flags);
+	hba->outstanding_reqs |= err_xfer;
+
+	/* Complete successful and pending task requests */
+	hba->outstanding_tasks &= ~err_tm;
+	ufshcd_tmc_handler(hba);
+	spin_unlock_irqrestore(hba->host->host_lock, flags);
+	ufshcd_complete_pending_tasks(hba);
+	spin_lock_irqsave(hba->host->host_lock, flags);
+
+	hba->outstanding_tasks |= err_tm;
+
+	/*
+	 * Controller may generate multiple fatal errors, handle
+	 * errors based on severity.
+	 * 1) DEVICE_FATAL_ERROR
+	 * 2) SYSTEM_BUS/CONTROLLER_FATAL_ERROR
+	 * 3) UIC_ERROR
+	 */
+	if (hba->errors & DEVICE_FATAL_ERROR) {
+		/*
+		 * Some HBAs may not clear UTRLDBR/UTMRLDBR or update
+		 * OCS field on device fatal error.
+		 */
+		ufshcd_set_host_reset_pending(hba);
+	} else if (hba->errors & (SYSTEM_BUS_FATAL_ERROR |
+				CONTROLLER_FATAL_ERROR)) {
+		/* eh flags should be set in err autopsy based on OCS values */
+		if (!hba->eh_flags)
+			WARN(1, "%s: fatal error without error handling\n",
+				dev_name(hba->dev));
+	} else if (hba->errors & UIC_ERROR) {
+		if (hba->uic_error & UFSHCD_UIC_DL_PA_INIT_ERROR) {
+			/* fatal error - reset controller */
+			ufshcd_set_host_reset_pending(hba);
+		} else if (hba->uic_error & (UFSHCD_UIC_NL_ERROR |
+					UFSHCD_UIC_TL_ERROR |
+					UFSHCD_UIC_DME_ERROR)) {
+			/* non-fatal, report error to SCSI layer */
+			if (!hba->eh_flags) {
+				spin_unlock_irqrestore(
+						hba->host->host_lock, flags);
+				ufshcd_complete_pending_reqs(hba);
+				ufshcd_complete_pending_tasks(hba);
+				spin_lock_irqsave(hba->host->host_lock, flags);
+			}
+		}
+	}
+	spin_unlock_irqrestore(hba->host->host_lock, flags);
+
+	if (hba->eh_flags) {
+		err = ufshcd_reset_and_restore(hba);
+		if (err) {
+			ufshcd_clear_host_reset_pending(hba);
+			ufshcd_clear_device_reset_pending(hba);
+			dev_err(hba->dev, "%s: reset and restore failed\n",
+					__func__);
+			hba->ufshcd_state = UFSHCD_STATE_ERROR;
+		}
+		/*
+		 * Inform the SCSI mid-layer that a reset was done so that
+		 * it can handle the Unit Attention condition properly.
+		 */
+		scsi_report_bus_reset(hba->host, 0);
+		hba->errors = 0;
+		hba->uic_error = 0;
+	}
+	scsi_unblock_requests(hba->host);
 	pm_runtime_put_sync(hba->dev);
 }
 
 /**
- * ufshcd_err_handler - Check for fatal errors
- * @work: pointer to a work queue structure
+ * ufshcd_update_uic_error - check and set fatal UIC error flags.
+ * @hba: per-adapter instance
  */
-static void ufshcd_err_handler(struct ufs_hba *hba)
+static void ufshcd_update_uic_error(struct ufs_hba *hba)
 {
 	u32 reg;
 
+	/* PA_INIT_ERROR is fatal and needs UIC reset */
+	reg = ufshcd_readl(hba, REG_UIC_ERROR_CODE_DATA_LINK_LAYER);
+	if (reg & UIC_DATA_LINK_LAYER_ERROR_PA_INIT)
+		hba->uic_error |= UFSHCD_UIC_DL_PA_INIT_ERROR;
+
+	/* UIC NL/TL/DME errors need software retry */
+	reg = ufshcd_readl(hba, REG_UIC_ERROR_CODE_NETWORK_LAYER);
+	if (reg)
+		hba->uic_error |= UFSHCD_UIC_NL_ERROR;
+
+	reg = ufshcd_readl(hba, REG_UIC_ERROR_CODE_TRANSPORT_LAYER);
+	if (reg)
+		hba->uic_error |= UFSHCD_UIC_TL_ERROR;
+
+	reg = ufshcd_readl(hba, REG_UIC_ERROR_CODE_DME);
+	if (reg)
+		hba->uic_error |= UFSHCD_UIC_DME_ERROR;
+
+	dev_dbg(hba->dev, "%s: UIC error flags = 0x%08x\n",
+			__func__, hba->uic_error);
+}
+
+/**
+ * ufshcd_err_handler - Check for fatal errors
+ * @hba: per-adapter instance
+ */
+static void ufshcd_err_handler(struct ufs_hba *hba)
+{
 	if (hba->errors & INT_FATAL_ERRORS)
 		goto fatal_eh;
 
 	if (hba->errors & UIC_ERROR) {
-		reg = ufshcd_readl(hba, REG_UIC_ERROR_CODE_DATA_LINK_LAYER);
-		if (reg & UIC_DATA_LINK_LAYER_ERROR_PA_INIT)
+		hba->uic_error = 0;
+		ufshcd_update_uic_error(hba);
+		if (hba->uic_error)
 			goto fatal_eh;
 	}
+	/*
+	 * Other errors are either non-fatal or completed by the
+	 * controller by updating OCS fields with success/failure.
+	 */
 	return;
+
 fatal_eh:
 	/* handle fatal errors only when link is functional */
 	if (hba->ufshcd_state == UFSHCD_STATE_OPERATIONAL) {
+		/* block commands from midlayer */
+		scsi_block_requests(hba->host);
+
 		/* block commands at driver layer until error is handled */
 		hba->ufshcd_state = UFSHCD_STATE_ERROR;
 		schedule_work(&hba->feh_workq);
diff --git a/drivers/scsi/ufs/ufshcd.h b/drivers/scsi/ufs/ufshcd.h
index 1e0585c..47f54a2 100644
--- a/drivers/scsi/ufs/ufshcd.h
+++ b/drivers/scsi/ufs/ufshcd.h
@@ -185,6 +185,7 @@ struct ufs_dev_cmd {
  * @feh_workq: Work queue for fatal controller error handling
  * @eeh_work: Worker to handle exception events
  * @errors: HBA errors
+ * @uic_error: UFS interconnect layer error status
  * @dev_cmd: ufs device management command information
  * @auto_bkops_enabled: to track whether bkops is enabled in device
  */
@@ -235,6 +236,7 @@ struct ufs_hba {
 
 	/* HBA Errors */
 	u32 errors;
+	u32 uic_error;
 
 	/* Device management request data */
 	struct ufs_dev_cmd dev_cmd;
diff --git a/drivers/scsi/ufs/ufshci.h b/drivers/scsi/ufs/ufshci.h
index f1e1b74..36f68ef 100644
--- a/drivers/scsi/ufs/ufshci.h
+++ b/drivers/scsi/ufs/ufshci.h
@@ -264,7 +264,7 @@ enum {
 	UTP_DEVICE_TO_HOST	= 0x04000000,
 };
 
-/* Overall command status values */
+/* Overall command status values for transfer request */
 enum {
 	OCS_SUCCESS			= 0x0,
 	OCS_INVALID_CMD_TABLE_ATTR	= 0x1,
@@ -274,8 +274,21 @@ enum {
 	OCS_PEER_COMM_FAILURE		= 0x5,
 	OCS_ABORTED			= 0x6,
 	OCS_FATAL_ERROR			= 0x7,
-	OCS_INVALID_COMMAND_STATUS	= 0x0F,
-	MASK_OCS			= 0x0F,
+	OCS_INVALID_COMMAND_STATUS	= 0xF,
+	MASK_OCS			= 0xFF,
+};
+
+/* Overall command status values for task management request */
+enum {
+	OCS_TMR_SUCCESS			= 0x0,
+	OCS_TMR_INVALID_ATTR		= 0x1,
+	OCS_TMR_MISMATCH_REQ_SIZE	= 0x2,
+	OCS_TMR_MISMATCH_RESP_SIZE	= 0x3,
+	OCS_TMR_PEER_COMM_FAILURE	= 0x4,
+	OCS_TMR_ABORTED			= 0x5,
+	OCS_TMR_FATAL_ERROR		= 0x6,
+	OCS_TMR_INVALID_COMMAND_STATUS	= 0xF,
+	MASK_OCS_TMR			= 0xFF,
 };
 
 /**
-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation.
