linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/6] ncr5380: Exception handling fixes for v4.5
@ 2016-02-22 23:07 Finn Thain
  2016-02-22 23:07 ` [PATCH 1/6] ncr5380: Correctly clear command pointers and lists after bus reset Finn Thain
                   ` (6 more replies)
  0 siblings, 7 replies; 10+ messages in thread
From: Finn Thain @ 2016-02-22 23:07 UTC (permalink / raw)
  To: James E.J. Bottomley, Martin K. Petersen, Michael Schmitz,
	linux-m68k, linux-scsi, linux-kernel


These patches fix some exception handling and autosense bugs that I
accidentally introduced in v4.5-rc1.

The error recovery and autosense code in these drivers has been unstable
for a long time. Despite that, v4.5-rc1 shows a regression in as much as
it exposes a bug in the aranym emulator. This leads to error recovery,
which can crash.

Also, Michael Schmitz reported some crashes involving abort handling
for a certain target device. And Dan Carpenter found a NULL pointer deref
in the new bus reset code.

Error recovery and autosense are stable with these patches.

I tested them using a Domex 3191D PCI card. Errors during IO were
simulated by sending bus resets and unplugging/replugging the SCSI
cables. Some of these patches fix bugs that only affect more capable
hardware (like Atari). Thanks to Michael Schmitz for patiently testing
those.

Please review this series for v4.5.

---
 drivers/scsi/NCR5380.c       |  133 +++++++++++++++++++------------------------
 drivers/scsi/atari_NCR5380.c |  133 +++++++++++++++++++------------------------
 2 files changed, 118 insertions(+), 148 deletions(-)

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH 1/6] ncr5380: Correctly clear command pointers and lists after bus reset
  2016-02-22 23:07 [PATCH 0/6] ncr5380: Exception handling fixes for v4.5 Finn Thain
@ 2016-02-22 23:07 ` Finn Thain
  2016-02-22 23:07 ` [PATCH 2/6] ncr5380: Dont release lock for PIO transfer Finn Thain
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 10+ messages in thread
From: Finn Thain @ 2016-02-22 23:07 UTC (permalink / raw)
  To: James E.J. Bottomley, Martin K. Petersen, Michael Schmitz,
	linux-m68k, linux-scsi, linux-kernel

[-- Attachment #1: ncr5380-null-pointer-deref-in-bus_reset-handler --]
[-- Type: text/plain, Size: 4403 bytes --]

Commands subject to exception handling are to be returned to the scsi
mid-layer. Make sure that the various command pointers and command lists
in the low-level driver are correctly cleansed of affected commands.

This fixes some bugs that I accidentally introduced in v4.5-rc1 including
the removal of INIT_LIST_HEAD for the 'autosense' and 'disconnected'
command lists, and the possible NULL pointer dereference in
NCR5380_bus_reset() that was reported by Dan Carpenter.

hostdata->sensing may also point to an affected command so this pointer
also has to be cleared. The abort handler calls complete_cmd() to take
care of this; let's have the bus reset handler do the same.

The issue queue may also contain an affected command. If so, remove it.
This also follows the abort handler logic.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Fixes: 62717f537e1b ("ncr5380: Implement new eh_bus_reset_handler")
Tested-by: Michael Schmitz <schmitzmic@gmail.com>
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>

---
 drivers/scsi/NCR5380.c       |   19 ++++++++++++-------
 drivers/scsi/atari_NCR5380.c |   19 ++++++++++++-------
 2 files changed, 24 insertions(+), 14 deletions(-)

Index: linux/drivers/scsi/NCR5380.c
===================================================================
--- linux.orig/drivers/scsi/NCR5380.c	2016-02-23 10:06:56.000000000 +1100
+++ linux/drivers/scsi/NCR5380.c	2016-02-23 10:06:56.000000000 +1100
@@ -2450,7 +2450,16 @@ static int NCR5380_bus_reset(struct scsi
 	 * commands!
 	 */
 
-	hostdata->selecting = NULL;
+	if (list_del_cmd(&hostdata->unissued, cmd)) {
+		cmd->result = DID_RESET << 16;
+		cmd->scsi_done(cmd);
+	}
+
+	if (hostdata->selecting) {
+		hostdata->selecting->result = DID_RESET << 16;
+		complete_cmd(instance, hostdata->selecting);
+		hostdata->selecting = NULL;
+	}
 
 	list_for_each_entry(ncmd, &hostdata->disconnected, list) {
 		struct scsi_cmnd *cmd = NCR5380_to_scmd(ncmd);
@@ -2458,6 +2467,7 @@ static int NCR5380_bus_reset(struct scsi
 		set_host_byte(cmd, DID_RESET);
 		cmd->scsi_done(cmd);
 	}
+	INIT_LIST_HEAD(&hostdata->disconnected);
 
 	list_for_each_entry(ncmd, &hostdata->autosense, list) {
 		struct scsi_cmnd *cmd = NCR5380_to_scmd(ncmd);
@@ -2465,6 +2475,7 @@ static int NCR5380_bus_reset(struct scsi
 		set_host_byte(cmd, DID_RESET);
 		cmd->scsi_done(cmd);
 	}
+	INIT_LIST_HEAD(&hostdata->autosense);
 
 	if (hostdata->connected) {
 		set_host_byte(hostdata->connected, DID_RESET);
@@ -2472,12 +2483,6 @@ static int NCR5380_bus_reset(struct scsi
 		hostdata->connected = NULL;
 	}
 
-	if (hostdata->sensing) {
-		set_host_byte(hostdata->connected, DID_RESET);
-		complete_cmd(instance, hostdata->sensing);
-		hostdata->sensing = NULL;
-	}
-
 	for (i = 0; i < 8; ++i)
 		hostdata->busy[i] = 0;
 #ifdef REAL_DMA
Index: linux/drivers/scsi/atari_NCR5380.c
===================================================================
--- linux.orig/drivers/scsi/atari_NCR5380.c	2016-02-23 10:06:56.000000000 +1100
+++ linux/drivers/scsi/atari_NCR5380.c	2016-02-23 10:06:56.000000000 +1100
@@ -2646,7 +2646,16 @@ static int NCR5380_bus_reset(struct scsi
 	 * commands!
 	 */
 
-	hostdata->selecting = NULL;
+	if (list_del_cmd(&hostdata->unissued, cmd)) {
+		cmd->result = DID_RESET << 16;
+		cmd->scsi_done(cmd);
+	}
+
+	if (hostdata->selecting) {
+		hostdata->selecting->result = DID_RESET << 16;
+		complete_cmd(instance, hostdata->selecting);
+		hostdata->selecting = NULL;
+	}
 
 	list_for_each_entry(ncmd, &hostdata->disconnected, list) {
 		struct scsi_cmnd *cmd = NCR5380_to_scmd(ncmd);
@@ -2654,6 +2663,7 @@ static int NCR5380_bus_reset(struct scsi
 		set_host_byte(cmd, DID_RESET);
 		cmd->scsi_done(cmd);
 	}
+	INIT_LIST_HEAD(&hostdata->disconnected);
 
 	list_for_each_entry(ncmd, &hostdata->autosense, list) {
 		struct scsi_cmnd *cmd = NCR5380_to_scmd(ncmd);
@@ -2661,6 +2671,7 @@ static int NCR5380_bus_reset(struct scsi
 		set_host_byte(cmd, DID_RESET);
 		cmd->scsi_done(cmd);
 	}
+	INIT_LIST_HEAD(&hostdata->autosense);
 
 	if (hostdata->connected) {
 		set_host_byte(hostdata->connected, DID_RESET);
@@ -2668,12 +2679,6 @@ static int NCR5380_bus_reset(struct scsi
 		hostdata->connected = NULL;
 	}
 
-	if (hostdata->sensing) {
-		set_host_byte(hostdata->connected, DID_RESET);
-		complete_cmd(instance, hostdata->sensing);
-		hostdata->sensing = NULL;
-	}
-
 #ifdef SUPPORT_TAGS
 	free_all_tags(hostdata);
 #endif

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH 2/6] ncr5380: Dont release lock for PIO transfer
  2016-02-22 23:07 [PATCH 0/6] ncr5380: Exception handling fixes for v4.5 Finn Thain
  2016-02-22 23:07 ` [PATCH 1/6] ncr5380: Correctly clear command pointers and lists after bus reset Finn Thain
@ 2016-02-22 23:07 ` Finn Thain
  2016-02-22 23:07 ` [PATCH 3/6] ncr5380: Dont re-enter NCR5380_select() Finn Thain
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 10+ messages in thread
From: Finn Thain @ 2016-02-22 23:07 UTC (permalink / raw)
  To: James E.J. Bottomley, Martin K. Petersen, Michael Schmitz,
	linux-m68k, linux-scsi, linux-kernel

[-- Attachment #1: ncr5380-transfer_pio-locking --]
[-- Type: text/plain, Size: 4358 bytes --]

The calls to NCR5380_transfer_pio() for DATA IN and DATA OUT phases will
modify cmd->SCp.this_residual, cmd->SCp.ptr and cmd->SCp.buffer. That
works as long as EH does not intervene, which became possible in
atari_NCR5380.c when I changed the locking to bring it closer to
NCR5380.c.

If error recovery aborts the command, the scsi_cmnd in question and its
buffer will be returned to the mid-layer. So the transfer has to cease,
but it can't be stopped by the initiator because the target controls the
bus phase.

The problem does not arise if the lock is not released. That was fine for
atari_scsi, because it implements DMA. For the other drivers, we have to
release the lock and re-enable interrupts for long PIO data transfers.

The solution is to split the transfer into small chunks. In between chunks
the main loop releases the lock and re-enables interrupts. Thus interrupts
can be serviced and eh_bus_reset_handler can intervene if need be.

This fixes an oops in NCR5380_transfer_pio() that can happen when the EH
abort handler is invoked during DATA IN or DATA OUT phase.

Fixes: 11d2f63b9cf5 ("ncr5380: Change instance->host_lock to hostdata->lock")
Reported-and-tested-by: Michael Schmitz <schmitzmic@gmail.com>
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>

---
 drivers/scsi/NCR5380.c       |   16 +++++++++-------
 drivers/scsi/atari_NCR5380.c |   16 +++++++++-------
 2 files changed, 18 insertions(+), 14 deletions(-)

Index: linux/drivers/scsi/NCR5380.c
===================================================================
--- linux.orig/drivers/scsi/NCR5380.c	2016-02-23 10:06:56.000000000 +1100
+++ linux/drivers/scsi/NCR5380.c	2016-02-23 10:06:57.000000000 +1100
@@ -1759,9 +1759,7 @@ static void NCR5380_information_transfer
 	unsigned char msgout = NOP;
 	int sink = 0;
 	int len;
-#if defined(PSEUDO_DMA) || defined(REAL_DMA_POLL)
 	int transfersize;
-#endif
 	unsigned char *data;
 	unsigned char phase, tmp, extended_msg[10], old_phase = 0xff;
 	struct scsi_cmnd *cmd;
@@ -1854,13 +1852,17 @@ static void NCR5380_information_transfer
 				} else
 #endif				/* defined(PSEUDO_DMA) || defined(REAL_DMA_POLL) */
 				{
-					spin_unlock_irq(&hostdata->lock);
-					NCR5380_transfer_pio(instance, &phase,
-					                     (int *)&cmd->SCp.this_residual,
+					/* Break up transfer into 3 ms chunks,
+					 * presuming 6 accesses per handshake.
+					 */
+					transfersize = min((unsigned long)cmd->SCp.this_residual,
+					                   hostdata->accesses_per_ms / 2);
+					len = transfersize;
+					NCR5380_transfer_pio(instance, &phase, &len,
 					                     (unsigned char **)&cmd->SCp.ptr);
-					spin_lock_irq(&hostdata->lock);
+					cmd->SCp.this_residual -= transfersize - len;
 				}
-				break;
+				return;
 			case PHASE_MSGIN:
 				len = 1;
 				data = &tmp;
Index: linux/drivers/scsi/atari_NCR5380.c
===================================================================
--- linux.orig/drivers/scsi/atari_NCR5380.c	2016-02-23 10:06:56.000000000 +1100
+++ linux/drivers/scsi/atari_NCR5380.c	2016-02-23 10:06:57.000000000 +1100
@@ -1838,9 +1838,7 @@ static void NCR5380_information_transfer
 	unsigned char msgout = NOP;
 	int sink = 0;
 	int len;
-#if defined(REAL_DMA)
 	int transfersize;
-#endif
 	unsigned char *data;
 	unsigned char phase, tmp, extended_msg[10], old_phase = 0xff;
 	struct scsi_cmnd *cmd;
@@ -1983,18 +1981,22 @@ static void NCR5380_information_transfer
 				} else
 #endif /* defined(REAL_DMA) */
 				{
-					spin_unlock_irq(&hostdata->lock);
-					NCR5380_transfer_pio(instance, &phase,
-					                     (int *)&cmd->SCp.this_residual,
+					/* Break up transfer into 3 ms chunks,
+					 * presuming 6 accesses per handshake.
+					 */
+					transfersize = min((unsigned long)cmd->SCp.this_residual,
+					                   hostdata->accesses_per_ms / 2);
+					len = transfersize;
+					NCR5380_transfer_pio(instance, &phase, &len,
 					                     (unsigned char **)&cmd->SCp.ptr);
-					spin_lock_irq(&hostdata->lock);
+					cmd->SCp.this_residual -= transfersize - len;
 				}
 #if defined(CONFIG_SUN3) && defined(REAL_DMA)
 				/* if we had intended to dma that command clear it */
 				if (sun3_dma_setup_done == cmd)
 					sun3_dma_setup_done = NULL;
 #endif
-				break;
+				return;
 			case PHASE_MSGIN:
 				len = 1;
 				data = &tmp;

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH 3/6] ncr5380: Dont re-enter NCR5380_select()
  2016-02-22 23:07 [PATCH 0/6] ncr5380: Exception handling fixes for v4.5 Finn Thain
  2016-02-22 23:07 ` [PATCH 1/6] ncr5380: Correctly clear command pointers and lists after bus reset Finn Thain
  2016-02-22 23:07 ` [PATCH 2/6] ncr5380: Dont release lock for PIO transfer Finn Thain
@ 2016-02-22 23:07 ` Finn Thain
  2016-02-22 23:07 ` [PATCH 4/6] ncr5380: Forget aborted commands Finn Thain
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 10+ messages in thread
From: Finn Thain @ 2016-02-22 23:07 UTC (permalink / raw)
  To: James E.J. Bottomley, Martin K. Petersen, Michael Schmitz,
	linux-m68k, linux-scsi, linux-kernel

[-- Attachment #1: ncr5380-dont-reenter-NCR5380_select --]
[-- Type: text/plain, Size: 3347 bytes --]

Calling NCR5380_select() from the abort handler causes various problems.
Firstly, it means potentially re-entering NCR5380_select(). Secondly, it
means that the lock is released, which permits the EH handlers to be
re-entered. The combination results in crashes. Don't do it.

Fixes: 8b00c3d5d40d ("ncr5380: Implement new eh_abort_handler")
Reported-and-tested-by: Michael Schmitz <schmitzmic@gmail.com>
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>

---
 drivers/scsi/NCR5380.c       |   16 ++++++++--------
 drivers/scsi/atari_NCR5380.c |   16 ++++++++--------
 2 files changed, 16 insertions(+), 16 deletions(-)

Index: linux/drivers/scsi/NCR5380.c
===================================================================
--- linux.orig/drivers/scsi/NCR5380.c	2016-02-23 10:06:57.000000000 +1100
+++ linux/drivers/scsi/NCR5380.c	2016-02-23 10:06:58.000000000 +1100
@@ -2302,6 +2302,9 @@ static bool list_del_cmd(struct list_hea
  * If cmd was not found at all then presumably it has already been completed,
  * in which case return SUCCESS to try to avoid further EH measures.
  * If the command has not completed yet, we must not fail to find it.
+ *
+ * The lock protects driver data structures, but EH handlers also use it
+ * to serialize their own execution and prevent their own re-entry.
  */
 
 static int NCR5380_abort(struct scsi_cmnd *cmd)
@@ -2338,14 +2341,11 @@ static int NCR5380_abort(struct scsi_cmn
 	if (list_del_cmd(&hostdata->disconnected, cmd)) {
 		dsprintk(NDEBUG_ABORT, instance,
 		         "abort: removed %p from disconnected list\n", cmd);
-		cmd->result = DID_ERROR << 16;
-		if (!hostdata->connected)
-			NCR5380_select(instance, cmd);
-		if (hostdata->connected != cmd) {
-			complete_cmd(instance, cmd);
-			result = FAILED;
-			goto out;
-		}
+		/* Can't call NCR5380_select() and send ABORT because that
+		 * means releasing the lock. Need a bus reset.
+		 */
+		result = FAILED;
+		goto out;
 	}
 
 	if (hostdata->connected == cmd) {
Index: linux/drivers/scsi/atari_NCR5380.c
===================================================================
--- linux.orig/drivers/scsi/atari_NCR5380.c	2016-02-23 10:06:57.000000000 +1100
+++ linux/drivers/scsi/atari_NCR5380.c	2016-02-23 10:06:58.000000000 +1100
@@ -2497,6 +2497,9 @@ static bool list_del_cmd(struct list_hea
  * If cmd was not found at all then presumably it has already been completed,
  * in which case return SUCCESS to try to avoid further EH measures.
  * If the command has not completed yet, we must not fail to find it.
+ *
+ * The lock protects driver data structures, but EH handlers also use it
+ * to serialize their own execution and prevent their own re-entry.
  */
 
 static int NCR5380_abort(struct scsi_cmnd *cmd)
@@ -2533,14 +2536,11 @@ static int NCR5380_abort(struct scsi_cmn
 	if (list_del_cmd(&hostdata->disconnected, cmd)) {
 		dsprintk(NDEBUG_ABORT, instance,
 		         "abort: removed %p from disconnected list\n", cmd);
-		cmd->result = DID_ERROR << 16;
-		if (!hostdata->connected)
-			NCR5380_select(instance, cmd);
-		if (hostdata->connected != cmd) {
-			complete_cmd(instance, cmd);
-			result = FAILED;
-			goto out;
-		}
+		/* Can't call NCR5380_select() and send ABORT because that
+		 * means releasing the lock. Need a bus reset.
+		 */
+		result = FAILED;
+		goto out;
 	}
 
 	if (hostdata->connected == cmd) {

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH 4/6] ncr5380: Forget aborted commands
  2016-02-22 23:07 [PATCH 0/6] ncr5380: Exception handling fixes for v4.5 Finn Thain
                   ` (2 preceding siblings ...)
  2016-02-22 23:07 ` [PATCH 3/6] ncr5380: Dont re-enter NCR5380_select() Finn Thain
@ 2016-02-22 23:07 ` Finn Thain
  2016-02-22 23:07 ` [PATCH 5/6] ncr5380: Fix NCR5380_select() EH checks and result handling Finn Thain
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 10+ messages in thread
From: Finn Thain @ 2016-02-22 23:07 UTC (permalink / raw)
  To: James E.J. Bottomley, Martin K. Petersen, Michael Schmitz,
	linux-m68k, linux-scsi, linux-kernel

[-- Attachment #1: ncr5380-always-forget-aborted-commands --]
[-- Type: text/plain, Size: 9445 bytes --]

The list structures and related logic used in the NCR5380 driver mean that
a command cannot be queued twice (i.e. can't appear on more than one queue
and can't appear on the same queue more than once).

The abort handler must forget the command so that the mid-layer can re-use
it. E.g. the ML may send it back to the LLD via via scsi_eh_get_sense().

Fix this and also fix two error paths, so that commands get forgotten iff
completed.

Fixes: 8b00c3d5d40d ("ncr5380: Implement new eh_abort_handler")
Tested-by: Michael Schmitz <schmitzmic@gmail.com>
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>

---
 drivers/scsi/NCR5380.c       |   62 +++++++++++--------------------------------
 drivers/scsi/atari_NCR5380.c |   62 +++++++++++--------------------------------
 2 files changed, 34 insertions(+), 90 deletions(-)

Index: linux/drivers/scsi/NCR5380.c
===================================================================
--- linux.orig/drivers/scsi/NCR5380.c	2016-02-23 10:06:58.000000000 +1100
+++ linux/drivers/scsi/NCR5380.c	2016-02-23 10:07:00.000000000 +1100
@@ -1796,6 +1796,7 @@ static void NCR5380_information_transfer
 				do_abort(instance);
 				cmd->result = DID_ERROR << 16;
 				complete_cmd(instance, cmd);
+				hostdata->connected = NULL;
 				return;
 #endif
 			case PHASE_DATAIN:
@@ -1845,7 +1846,6 @@ static void NCR5380_information_transfer
 						sink = 1;
 						do_abort(instance);
 						cmd->result = DID_ERROR << 16;
-						complete_cmd(instance, cmd);
 						/* XXX - need to source or sink data here, as appropriate */
 					} else
 						cmd->SCp.this_residual -= transfersize - len;
@@ -2294,14 +2294,14 @@ static bool list_del_cmd(struct list_hea
  * [disconnected -> connected ->]...
  * [autosense -> connected ->] done
  *
- * If cmd is unissued then just remove it.
- * If cmd is disconnected, try to select the target.
- * If cmd is connected, try to send an abort message.
- * If cmd is waiting for autosense, give it a chance to complete but check
- * that it isn't left connected.
  * If cmd was not found at all then presumably it has already been completed,
  * in which case return SUCCESS to try to avoid further EH measures.
+ *
  * If the command has not completed yet, we must not fail to find it.
+ * We have no option but to forget the aborted command (even if it still
+ * lacks sense data). The mid-layer may re-issue a command that is in error
+ * recovery (see scsi_send_eh_cmnd), but the logic and data structures in
+ * this driver are such that a command can appear on one queue only.
  *
  * The lock protects driver data structures, but EH handlers also use it
  * to serialize their own execution and prevent their own re-entry.
@@ -2327,6 +2327,7 @@ static int NCR5380_abort(struct scsi_cmn
 		         "abort: removed %p from issue queue\n", cmd);
 		cmd->result = DID_ABORT << 16;
 		cmd->scsi_done(cmd); /* No tag or busy flag to worry about */
+		goto out;
 	}
 
 	if (hostdata->selecting == cmd) {
@@ -2344,6 +2345,8 @@ static int NCR5380_abort(struct scsi_cmn
 		/* Can't call NCR5380_select() and send ABORT because that
 		 * means releasing the lock. Need a bus reset.
 		 */
+		set_host_byte(cmd, DID_ERROR);
+		complete_cmd(instance, cmd);
 		result = FAILED;
 		goto out;
 	}
@@ -2351,45 +2354,9 @@ static int NCR5380_abort(struct scsi_cmn
 	if (hostdata->connected == cmd) {
 		dsprintk(NDEBUG_ABORT, instance, "abort: cmd %p is connected\n", cmd);
 		hostdata->connected = NULL;
-		if (do_abort(instance)) {
-			set_host_byte(cmd, DID_ERROR);
-			complete_cmd(instance, cmd);
-			result = FAILED;
-			goto out;
-		}
-		set_host_byte(cmd, DID_ABORT);
 #ifdef REAL_DMA
 		hostdata->dma_len = 0;
 #endif
-		if (cmd->cmnd[0] == REQUEST_SENSE)
-			complete_cmd(instance, cmd);
-		else {
-			struct NCR5380_cmd *ncmd = scsi_cmd_priv(cmd);
-
-			/* Perform autosense for this command */
-			list_add(&ncmd->list, &hostdata->autosense);
-		}
-	}
-
-	if (list_find_cmd(&hostdata->autosense, cmd)) {
-		dsprintk(NDEBUG_ABORT, instance,
-		         "abort: found %p on sense queue\n", cmd);
-		spin_unlock_irqrestore(&hostdata->lock, flags);
-		queue_work(hostdata->work_q, &hostdata->main_task);
-		msleep(1000);
-		spin_lock_irqsave(&hostdata->lock, flags);
-		if (list_del_cmd(&hostdata->autosense, cmd)) {
-			dsprintk(NDEBUG_ABORT, instance,
-			         "abort: removed %p from sense queue\n", cmd);
-			set_host_byte(cmd, DID_ABORT);
-			complete_cmd(instance, cmd);
-			goto out;
-		}
-	}
-
-	if (hostdata->connected == cmd) {
-		dsprintk(NDEBUG_ABORT, instance, "abort: cmd %p is connected\n", cmd);
-		hostdata->connected = NULL;
 		if (do_abort(instance)) {
 			set_host_byte(cmd, DID_ERROR);
 			complete_cmd(instance, cmd);
@@ -2397,9 +2364,14 @@ static int NCR5380_abort(struct scsi_cmn
 			goto out;
 		}
 		set_host_byte(cmd, DID_ABORT);
-#ifdef REAL_DMA
-		hostdata->dma_len = 0;
-#endif
+		complete_cmd(instance, cmd);
+		goto out;
+	}
+
+	if (list_del_cmd(&hostdata->autosense, cmd)) {
+		dsprintk(NDEBUG_ABORT, instance,
+		         "abort: removed %p from sense queue\n", cmd);
+		set_host_byte(cmd, DID_ERROR);
 		complete_cmd(instance, cmd);
 	}
 
Index: linux/drivers/scsi/atari_NCR5380.c
===================================================================
--- linux.orig/drivers/scsi/atari_NCR5380.c	2016-02-23 10:06:58.000000000 +1100
+++ linux/drivers/scsi/atari_NCR5380.c	2016-02-23 10:07:00.000000000 +1100
@@ -1907,6 +1907,7 @@ static void NCR5380_information_transfer
 				do_abort(instance);
 				cmd->result = DID_ERROR << 16;
 				complete_cmd(instance, cmd);
+				hostdata->connected = NULL;
 				return;
 #endif
 			case PHASE_DATAIN:
@@ -1964,7 +1965,6 @@ static void NCR5380_information_transfer
 						sink = 1;
 						do_abort(instance);
 						cmd->result = DID_ERROR << 16;
-						complete_cmd(instance, cmd);
 						/* XXX - need to source or sink data here, as appropriate */
 					} else {
 #ifdef REAL_DMA
@@ -2489,14 +2489,14 @@ static bool list_del_cmd(struct list_hea
  * [disconnected -> connected ->]...
  * [autosense -> connected ->] done
  *
- * If cmd is unissued then just remove it.
- * If cmd is disconnected, try to select the target.
- * If cmd is connected, try to send an abort message.
- * If cmd is waiting for autosense, give it a chance to complete but check
- * that it isn't left connected.
  * If cmd was not found at all then presumably it has already been completed,
  * in which case return SUCCESS to try to avoid further EH measures.
+ *
  * If the command has not completed yet, we must not fail to find it.
+ * We have no option but to forget the aborted command (even if it still
+ * lacks sense data). The mid-layer may re-issue a command that is in error
+ * recovery (see scsi_send_eh_cmnd), but the logic and data structures in
+ * this driver are such that a command can appear on one queue only.
  *
  * The lock protects driver data structures, but EH handlers also use it
  * to serialize their own execution and prevent their own re-entry.
@@ -2522,6 +2522,7 @@ static int NCR5380_abort(struct scsi_cmn
 		         "abort: removed %p from issue queue\n", cmd);
 		cmd->result = DID_ABORT << 16;
 		cmd->scsi_done(cmd); /* No tag or busy flag to worry about */
+		goto out;
 	}
 
 	if (hostdata->selecting == cmd) {
@@ -2539,6 +2540,8 @@ static int NCR5380_abort(struct scsi_cmn
 		/* Can't call NCR5380_select() and send ABORT because that
 		 * means releasing the lock. Need a bus reset.
 		 */
+		set_host_byte(cmd, DID_ERROR);
+		complete_cmd(instance, cmd);
 		result = FAILED;
 		goto out;
 	}
@@ -2546,45 +2549,9 @@ static int NCR5380_abort(struct scsi_cmn
 	if (hostdata->connected == cmd) {
 		dsprintk(NDEBUG_ABORT, instance, "abort: cmd %p is connected\n", cmd);
 		hostdata->connected = NULL;
-		if (do_abort(instance)) {
-			set_host_byte(cmd, DID_ERROR);
-			complete_cmd(instance, cmd);
-			result = FAILED;
-			goto out;
-		}
-		set_host_byte(cmd, DID_ABORT);
 #ifdef REAL_DMA
 		hostdata->dma_len = 0;
 #endif
-		if (cmd->cmnd[0] == REQUEST_SENSE)
-			complete_cmd(instance, cmd);
-		else {
-			struct NCR5380_cmd *ncmd = scsi_cmd_priv(cmd);
-
-			/* Perform autosense for this command */
-			list_add(&ncmd->list, &hostdata->autosense);
-		}
-	}
-
-	if (list_find_cmd(&hostdata->autosense, cmd)) {
-		dsprintk(NDEBUG_ABORT, instance,
-		         "abort: found %p on sense queue\n", cmd);
-		spin_unlock_irqrestore(&hostdata->lock, flags);
-		queue_work(hostdata->work_q, &hostdata->main_task);
-		msleep(1000);
-		spin_lock_irqsave(&hostdata->lock, flags);
-		if (list_del_cmd(&hostdata->autosense, cmd)) {
-			dsprintk(NDEBUG_ABORT, instance,
-			         "abort: removed %p from sense queue\n", cmd);
-			set_host_byte(cmd, DID_ABORT);
-			complete_cmd(instance, cmd);
-			goto out;
-		}
-	}
-
-	if (hostdata->connected == cmd) {
-		dsprintk(NDEBUG_ABORT, instance, "abort: cmd %p is connected\n", cmd);
-		hostdata->connected = NULL;
 		if (do_abort(instance)) {
 			set_host_byte(cmd, DID_ERROR);
 			complete_cmd(instance, cmd);
@@ -2592,9 +2559,14 @@ static int NCR5380_abort(struct scsi_cmn
 			goto out;
 		}
 		set_host_byte(cmd, DID_ABORT);
-#ifdef REAL_DMA
-		hostdata->dma_len = 0;
-#endif
+		complete_cmd(instance, cmd);
+		goto out;
+	}
+
+	if (list_del_cmd(&hostdata->autosense, cmd)) {
+		dsprintk(NDEBUG_ABORT, instance,
+		         "abort: removed %p from sense queue\n", cmd);
+		set_host_byte(cmd, DID_ERROR);
 		complete_cmd(instance, cmd);
 	}
 

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH 5/6] ncr5380: Fix NCR5380_select() EH checks and result handling
  2016-02-22 23:07 [PATCH 0/6] ncr5380: Exception handling fixes for v4.5 Finn Thain
                   ` (3 preceding siblings ...)
  2016-02-22 23:07 ` [PATCH 4/6] ncr5380: Forget aborted commands Finn Thain
@ 2016-02-22 23:07 ` Finn Thain
  2016-02-22 23:07 ` [PATCH 6/6] ncr5380: Call scsi_eh_prep_cmnd() and scsi_eh_restore_cmnd() as and when appropriate Finn Thain
  2016-03-01  2:31 ` [PATCH 0/6] ncr5380: Exception handling fixes for v4.5 Martin K. Petersen
  6 siblings, 0 replies; 10+ messages in thread
From: Finn Thain @ 2016-02-22 23:07 UTC (permalink / raw)
  To: James E.J. Bottomley, Martin K. Petersen, Michael Schmitz,
	linux-m68k, linux-scsi, linux-kernel

[-- Attachment #1: ncr5380-simplify-NCR5380_select-state --]
[-- Type: text/plain, Size: 3549 bytes --]

Add missing checks for EH abort during arbitration and selection.
Rework the handling of NCR5380_select() result to improve clarity.

Fixes: 707d62b37fbb ("ncr5380: Fix EH during arbitration and selection")
Tested-by: Michael Schmitz <schmitzmic@gmail.com>
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>

---
 drivers/scsi/NCR5380.c       |   16 +++++++++++-----
 drivers/scsi/atari_NCR5380.c |   16 +++++++++++-----
 2 files changed, 22 insertions(+), 10 deletions(-)

Index: linux/drivers/scsi/NCR5380.c
===================================================================
--- linux.orig/drivers/scsi/NCR5380.c	2016-02-23 10:07:00.000000000 +1100
+++ linux/drivers/scsi/NCR5380.c	2016-02-23 10:07:01.000000000 +1100
@@ -815,15 +815,17 @@ static void NCR5380_main(struct work_str
 	struct NCR5380_hostdata *hostdata =
 		container_of(work, struct NCR5380_hostdata, main_task);
 	struct Scsi_Host *instance = hostdata->host;
-	struct scsi_cmnd *cmd;
 	int done;
 
 	do {
 		done = 1;
 
 		spin_lock_irq(&hostdata->lock);
-		while (!hostdata->connected &&
-		       (cmd = dequeue_next_cmd(instance))) {
+		while (!hostdata->connected && !hostdata->selecting) {
+			struct scsi_cmnd *cmd = dequeue_next_cmd(instance);
+
+			if (!cmd)
+				break;
 
 			dsprintk(NDEBUG_MAIN, instance, "main: dequeued %p\n", cmd);
 
@@ -840,8 +842,7 @@ static void NCR5380_main(struct work_str
 			 * entire unit.
 			 */
 
-			cmd = NCR5380_select(instance, cmd);
-			if (!cmd) {
+			if (!NCR5380_select(instance, cmd)) {
 				dsprintk(NDEBUG_MAIN, instance, "main: select complete\n");
 			} else {
 				dsprintk(NDEBUG_MAIN | NDEBUG_QUEUES, instance,
@@ -1056,6 +1057,11 @@ static struct scsi_cmnd *NCR5380_select(
 		/* Reselection interrupt */
 		goto out;
 	}
+	if (!hostdata->selecting) {
+		/* Command was aborted */
+		NCR5380_write(MODE_REG, MR_BASE);
+		goto out;
+	}
 	if (err < 0) {
 		NCR5380_write(MODE_REG, MR_BASE);
 		shost_printk(KERN_ERR, instance,
Index: linux/drivers/scsi/atari_NCR5380.c
===================================================================
--- linux.orig/drivers/scsi/atari_NCR5380.c	2016-02-23 10:07:00.000000000 +1100
+++ linux/drivers/scsi/atari_NCR5380.c	2016-02-23 10:07:01.000000000 +1100
@@ -923,7 +923,6 @@ static void NCR5380_main(struct work_str
 	struct NCR5380_hostdata *hostdata =
 		container_of(work, struct NCR5380_hostdata, main_task);
 	struct Scsi_Host *instance = hostdata->host;
-	struct scsi_cmnd *cmd;
 	int done;
 
 	/*
@@ -936,8 +935,11 @@ static void NCR5380_main(struct work_str
 		done = 1;
 
 		spin_lock_irq(&hostdata->lock);
-		while (!hostdata->connected &&
-		       (cmd = dequeue_next_cmd(instance))) {
+		while (!hostdata->connected && !hostdata->selecting) {
+			struct scsi_cmnd *cmd = dequeue_next_cmd(instance);
+
+			if (!cmd)
+				break;
 
 			dsprintk(NDEBUG_MAIN, instance, "main: dequeued %p\n", cmd);
 
@@ -960,8 +962,7 @@ static void NCR5380_main(struct work_str
 #ifdef SUPPORT_TAGS
 			cmd_get_tag(cmd, cmd->cmnd[0] != REQUEST_SENSE);
 #endif
-			cmd = NCR5380_select(instance, cmd);
-			if (!cmd) {
+			if (!NCR5380_select(instance, cmd)) {
 				dsprintk(NDEBUG_MAIN, instance, "main: select complete\n");
 				maybe_release_dma_irq(instance);
 			} else {
@@ -1257,6 +1258,11 @@ static struct scsi_cmnd *NCR5380_select(
 		/* Reselection interrupt */
 		goto out;
 	}
+	if (!hostdata->selecting) {
+		/* Command was aborted */
+		NCR5380_write(MODE_REG, MR_BASE);
+		goto out;
+	}
 	if (err < 0) {
 		NCR5380_write(MODE_REG, MR_BASE);
 		shost_printk(KERN_ERR, instance,

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH 6/6] ncr5380: Call scsi_eh_prep_cmnd() and scsi_eh_restore_cmnd() as and when appropriate
  2016-02-22 23:07 [PATCH 0/6] ncr5380: Exception handling fixes for v4.5 Finn Thain
                   ` (4 preceding siblings ...)
  2016-02-22 23:07 ` [PATCH 5/6] ncr5380: Fix NCR5380_select() EH checks and result handling Finn Thain
@ 2016-02-22 23:07 ` Finn Thain
  2016-03-01  2:31 ` [PATCH 0/6] ncr5380: Exception handling fixes for v4.5 Martin K. Petersen
  6 siblings, 0 replies; 10+ messages in thread
From: Finn Thain @ 2016-02-22 23:07 UTC (permalink / raw)
  To: James E.J. Bottomley, Martin K. Petersen, Michael Schmitz,
	linux-m68k, linux-scsi, linux-kernel

[-- Attachment #1: ncr5380-autosense-command --]
[-- Type: text/plain, Size: 2899 bytes --]

This bug causes the wrong command to have its sense pointer overwritten,
which sometimes leads to a NULL pointer deref. Fix this by checking which
command is being requeued before restoring the scsi_eh_save data.

It turns out that some targets will disconnect a REQUEST SENSE command.
The autosense algorithm doesn't anticipate this. Hence multiple commands
can end up undergoing autosense simultaneously, and they will all try to
use the same scsi_eh_save struct, which won't work. Defer autosense when
the scsi_eh_save storage is in use by another command.

Fixes: f27db8eb98a1 ("ncr5380: Fix autosense bugs")
Reported-and-tested-by: Michael Schmitz <schmitzmic@gmail.com>
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>

---
 drivers/scsi/NCR5380.c       |    4 ++--
 drivers/scsi/atari_NCR5380.c |    4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

Index: linux/drivers/scsi/NCR5380.c
===================================================================
--- linux.orig/drivers/scsi/NCR5380.c	2016-02-23 10:07:01.000000000 +1100
+++ linux/drivers/scsi/NCR5380.c	2016-02-23 10:07:02.000000000 +1100
@@ -760,7 +760,7 @@ static struct scsi_cmnd *dequeue_next_cm
 	struct NCR5380_cmd *ncmd;
 	struct scsi_cmnd *cmd;
 
-	if (list_empty(&hostdata->autosense)) {
+	if (hostdata->sensing || list_empty(&hostdata->autosense)) {
 		list_for_each_entry(ncmd, &hostdata->unissued, list) {
 			cmd = NCR5380_to_scmd(ncmd);
 			dsprintk(NDEBUG_QUEUES, instance, "dequeue: cmd=%p target=%d busy=0x%02x lun=%llu\n",
@@ -793,7 +793,7 @@ static void requeue_cmd(struct Scsi_Host
 	struct NCR5380_hostdata *hostdata = shost_priv(instance);
 	struct NCR5380_cmd *ncmd = scsi_cmd_priv(cmd);
 
-	if (hostdata->sensing) {
+	if (hostdata->sensing == cmd) {
 		scsi_eh_restore_cmnd(cmd, &hostdata->ses);
 		list_add(&ncmd->list, &hostdata->autosense);
 		hostdata->sensing = NULL;
Index: linux/drivers/scsi/atari_NCR5380.c
===================================================================
--- linux.orig/drivers/scsi/atari_NCR5380.c	2016-02-23 10:07:01.000000000 +1100
+++ linux/drivers/scsi/atari_NCR5380.c	2016-02-23 10:07:02.000000000 +1100
@@ -862,7 +862,7 @@ static struct scsi_cmnd *dequeue_next_cm
 	struct NCR5380_cmd *ncmd;
 	struct scsi_cmnd *cmd;
 
-	if (list_empty(&hostdata->autosense)) {
+	if (hostdata->sensing || list_empty(&hostdata->autosense)) {
 		list_for_each_entry(ncmd, &hostdata->unissued, list) {
 			cmd = NCR5380_to_scmd(ncmd);
 			dsprintk(NDEBUG_QUEUES, instance, "dequeue: cmd=%p target=%d busy=0x%02x lun=%llu\n",
@@ -901,7 +901,7 @@ static void requeue_cmd(struct Scsi_Host
 	struct NCR5380_hostdata *hostdata = shost_priv(instance);
 	struct NCR5380_cmd *ncmd = scsi_cmd_priv(cmd);
 
-	if (hostdata->sensing) {
+	if (hostdata->sensing == cmd) {
 		scsi_eh_restore_cmnd(cmd, &hostdata->ses);
 		list_add(&ncmd->list, &hostdata->autosense);
 		hostdata->sensing = NULL;

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 0/6] ncr5380: Exception handling fixes for v4.5
  2016-02-22 23:07 [PATCH 0/6] ncr5380: Exception handling fixes for v4.5 Finn Thain
                   ` (5 preceding siblings ...)
  2016-02-22 23:07 ` [PATCH 6/6] ncr5380: Call scsi_eh_prep_cmnd() and scsi_eh_restore_cmnd() as and when appropriate Finn Thain
@ 2016-03-01  2:31 ` Martin K. Petersen
  2016-03-01  3:32   ` Finn Thain
  6 siblings, 1 reply; 10+ messages in thread
From: Martin K. Petersen @ 2016-03-01  2:31 UTC (permalink / raw)
  To: Finn Thain
  Cc: James Bottomley, Martin K. Petersen, Michael Schmitz, linux-m68k,
	linux-scsi, linux-kernel

>>>>> "Finn" == Finn Thain <fthain@telegraphics.com.au> writes:

Finn> These patches fix some exception handling and autosense bugs that
Finn> I accidentally introduced in v4.5-rc1.

Finn> drivers/scsi/NCR5380.c       |  133 +++++++++++++++++++------------------------
Finn> drivers/scsi/atari_NCR5380.c |  133 +++++++++++++++++++------------------------
Finn> 2 files changed, 118 insertions(+), 148 deletions(-)

This is a pretty big lump of changes for a 4.5 bug fix!

Given the limited exposure based on the nature of the affected hardware
I have queued them for 4.6.

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 0/6] ncr5380: Exception handling fixes for v4.5
  2016-03-01  2:31 ` [PATCH 0/6] ncr5380: Exception handling fixes for v4.5 Martin K. Petersen
@ 2016-03-01  3:32   ` Finn Thain
  2016-03-01 13:00     ` Martin K. Petersen
  0 siblings, 1 reply; 10+ messages in thread
From: Finn Thain @ 2016-03-01  3:32 UTC (permalink / raw)
  To: Martin K. Petersen
  Cc: James Bottomley, Michael Schmitz, linux-m68k, linux-scsi, linux-kernel


On Mon, 29 Feb 2016, Martin K. Petersen wrote:

> >>>>> "Finn" == Finn Thain <fthain@telegraphics.com.au> writes:
> 
> Finn> These patches fix some exception handling and autosense bugs that
> Finn> I accidentally introduced in v4.5-rc1.
> 
> Finn> drivers/scsi/NCR5380.c       |  133 +++++++++++++++++++------------------------
> Finn> drivers/scsi/atari_NCR5380.c |  133 +++++++++++++++++++------------------------
> Finn> 2 files changed, 118 insertions(+), 148 deletions(-)
> 
> This is a pretty big lump of changes for a 4.5 bug fix!

I'm sure this is a lot of rework when compared to the high standard set by 
well-funded corporate contributions.

However, I don't have anyone paying me to write and execute thorough test 
plans for error paths on a wide variety of different hardware platforms.

Nonetheless, these error paths have now been thoroughly tested on several 
platforms.

Please keep in mind that these fixes are all rework of the changes I made 
in -rc1. So this submission is
   2 files changed, 118 insertions(+), 148 deletions(-)
which is all rework of the previous submission, which was
   18 files changed, 2940 insertions(+), 3688 deletions(-)

Better than 5% rate of rework. Bugs-per-line-of-code should drop quite 
quickly at that rate, presuming fixes get merged.

> 
> Given the limited exposure based on the nature of the affected hardware 
> I have queued them for 4.6.
> 

The affected hardware doesn't matter (though it is likely outside of any 
commercial support contract). It's the affected users that interest me.

-- 

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 0/6] ncr5380: Exception handling fixes for v4.5
  2016-03-01  3:32   ` Finn Thain
@ 2016-03-01 13:00     ` Martin K. Petersen
  0 siblings, 0 replies; 10+ messages in thread
From: Martin K. Petersen @ 2016-03-01 13:00 UTC (permalink / raw)
  To: Finn Thain
  Cc: Martin K. Petersen, James Bottomley, Michael Schmitz, linux-m68k,
	linux-scsi, linux-kernel

>>>>> "Finn" == Finn Thain <fthain@telegraphics.com.au> writes:

Finn,

>> This is a pretty big lump of changes for a 4.5 bug fix!

Finn> Please keep in mind that these fixes are all rework of the changes
Finn> I made in -rc1. So this submission is 2 files changed, 118
Finn> insertions(+), 148 deletions(-) which is all rework of the
Finn> previous submission, which was 18 files changed, 2940
Finn> insertions(+), 3688 deletions(-)

I am well aware of and appreciate the huge amount of work you have done.

My concern is merely that Linus is going to get pretty upset with such a
big delta this late in the 4.5 cycle. By comparison, our other fixes at
this time are in the 1-5 line bucket!

I think putting the fixes into 4.6 and tagging them for stable is a
better (less emperor penguin wrath-inducing) approach.

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2016-03-01 13:00 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-02-22 23:07 [PATCH 0/6] ncr5380: Exception handling fixes for v4.5 Finn Thain
2016-02-22 23:07 ` [PATCH 1/6] ncr5380: Correctly clear command pointers and lists after bus reset Finn Thain
2016-02-22 23:07 ` [PATCH 2/6] ncr5380: Dont release lock for PIO transfer Finn Thain
2016-02-22 23:07 ` [PATCH 3/6] ncr5380: Dont re-enter NCR5380_select() Finn Thain
2016-02-22 23:07 ` [PATCH 4/6] ncr5380: Forget aborted commands Finn Thain
2016-02-22 23:07 ` [PATCH 5/6] ncr5380: Fix NCR5380_select() EH checks and result handling Finn Thain
2016-02-22 23:07 ` [PATCH 6/6] ncr5380: Call scsi_eh_prep_cmnd() and scsi_eh_restore_cmnd() as and when appropriate Finn Thain
2016-03-01  2:31 ` [PATCH 0/6] ncr5380: Exception handling fixes for v4.5 Martin K. Petersen
2016-03-01  3:32   ` Finn Thain
2016-03-01 13:00     ` Martin K. Petersen

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).