All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/7] Limit overall SCSI EH runtime
@ 2013-06-10 11:11 Hannes Reinecke
  2013-06-10 11:11 ` [PATCH 1/7] dpt_i2o: Remove DPTI_STATE_IOCTL Hannes Reinecke
                   ` (7 more replies)
  0 siblings, 8 replies; 14+ messages in thread
From: Hannes Reinecke @ 2013-06-10 11:11 UTC (permalink / raw)
  To: James Bottomley
  Cc: linux-scsi, Joern Engel, Ewan Milne, James Smart, Ren Mingxin,
	Roland Dreier, Bryn Reeves, Christoph Hellwig, Hannes Reinecke

This patchset implements a new 'eh_deadline' attribute to the
SCSI host. It will limit the overall SCSI EH runtime by a given
timeout. If the timeout expires all intermediate steps will
be skipped and host reset will be scheduled immediately.

For this patch I've re-used the existing 'last_reset' field
of the SCSI host to store the initial time SCSI EH started.
Also the field 'resetting' has been removed as it never has
been used as intended.

As 'last_reset' might be in use by transport-specific EH
implementation I've disallowed eh_deadline setting there.

Patchset is incremental to my earlier patchset
'scsi: improved eh timeout handler'.

As usual, comments etc are welcome.

Hannes Reinecke (7):
  dpt_i2o: Remove DPTI_STATE_IOCTL
  dpt_i2o: return SCSI_MLQUEUE_HOST_BUSY when in reset
  advansys: Remove 'last_reset' references
  tmscsim: Move 'last_reset' into host structure
  dc395: Move 'last_reset' into internal host structure
  scsi: remove check for 'resetting'
  scsi: Add 'eh_deadline' to limit SCSI EH runtime

 drivers/scsi/advansys.c   |   8 +--
 drivers/scsi/dc395x.c     |  24 ++++----
 drivers/scsi/dpt_i2o.c    |  35 +++++-------
 drivers/scsi/dpti.h       |   1 -
 drivers/scsi/hosts.c      |   7 +++
 drivers/scsi/scsi.c       |  28 ---------
 drivers/scsi/scsi_error.c | 142 +++++++++++++++++++++++++++++++++++++++++++---
 drivers/scsi/scsi_sysfs.c |  37 ++++++++++++
 drivers/scsi/tmscsim.c    |  14 ++---
 drivers/scsi/tmscsim.h    |   1 +
 include/scsi/scsi_host.h  |   2 +-
 11 files changed, 218 insertions(+), 81 deletions(-)

-- 
1.7.12.4


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH 1/7] dpt_i2o: Remove DPTI_STATE_IOCTL
  2013-06-10 11:11 [PATCH 0/7] Limit overall SCSI EH runtime Hannes Reinecke
@ 2013-06-10 11:11 ` Hannes Reinecke
  2013-06-10 11:11 ` [PATCH 2/7] dpt_i2o: return SCSI_MLQUEUE_HOST_BUSY when in reset Hannes Reinecke
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 14+ messages in thread
From: Hannes Reinecke @ 2013-06-10 11:11 UTC (permalink / raw)
  To: James Bottomley
  Cc: linux-scsi, Joern Engel, Ewan Milne, James Smart, Ren Mingxin,
	Roland Dreier, Bryn Reeves, Christoph Hellwig, Hannes Reinecke

scsi_block_host/scsi_unlock_host provides the required
functionality.

Signed-off-by: Hannes Reinecke <hare@suse.de>
---
 drivers/scsi/dpt_i2o.c | 30 ++++++++++++------------------
 drivers/scsi/dpti.h    |  1 -
 2 files changed, 12 insertions(+), 19 deletions(-)

diff --git a/drivers/scsi/dpt_i2o.c b/drivers/scsi/dpt_i2o.c
index 19e1b42..106ff1a 100644
--- a/drivers/scsi/dpt_i2o.c
+++ b/drivers/scsi/dpt_i2o.c
@@ -448,15 +448,7 @@ static int adpt_queue_lck(struct scsi_cmnd * cmd, void (*done) (struct scsi_cmnd
 	}
 
 	rmb();
-	/*
-	 * TODO: I need to block here if I am processing ioctl cmds
-	 * but if the outstanding cmds all finish before the ioctl,
-	 * the scsi-core will not know to start sending cmds to me again.
-	 * I need to a way to restart the scsi-cores queues or should I block
-	 * calling scsi_done on the outstanding cmds instead
-	 * for now we don't set the IOCTL state
-	 */
-	if(((pHba->state) & DPTI_STATE_IOCTL) || ((pHba->state) & DPTI_STATE_RESET)) {
+	if((pHba->state) & DPTI_STATE_RESET) {
 		pHba->host->last_reset = jiffies;
 		pHba->host->resetting = 1;
 		return 1;
@@ -1811,21 +1803,23 @@ static int adpt_i2o_passthru(adpt_hba* pHba, u32 __user *arg)
 	}
 
 	do {
-		if(pHba->host)
+		/*
+		 * Stop any new commands from enterring the
+		 * controller while processing the ioctl
+		 */
+		if(pHba->host) {
+			scsi_block_requests(pHba->host);
 			spin_lock_irqsave(pHba->host->host_lock, flags);
-		// This state stops any new commands from enterring the
-		// controller while processing the ioctl
-//		pHba->state |= DPTI_STATE_IOCTL;
-//		We can't set this now - The scsi subsystem sets host_blocked and
-//		the queue empties and stops.  We need a way to restart the queue
+		}
 		rcode = adpt_i2o_post_wait(pHba, msg, size, FOREVER);
 		if (rcode != 0)
 			printk("adpt_i2o_passthru: post wait failed %d %p\n",
 					rcode, reply);
-//		pHba->state &= ~DPTI_STATE_IOCTL;
-		if(pHba->host)
+		if(pHba->host) {
 			spin_unlock_irqrestore(pHba->host->host_lock, flags);
-	} while(rcode == -ETIMEDOUT);  
+			scsi_unblock_requests(pHba->host);
+		}
+	} while(rcode == -ETIMEDOUT);
 
 	if(rcode){
 		goto cleanup;
diff --git a/drivers/scsi/dpti.h b/drivers/scsi/dpti.h
index beded71..aeb0461 100644
--- a/drivers/scsi/dpti.h
+++ b/drivers/scsi/dpti.h
@@ -202,7 +202,6 @@ struct adpt_channel {
 
 // HBA state flags
 #define DPTI_STATE_RESET	(0x01)
-#define DPTI_STATE_IOCTL	(0x02)
 
 typedef struct _adpt_hba {
 	struct _adpt_hba *next;
-- 
1.7.12.4


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 2/7] dpt_i2o: return SCSI_MLQUEUE_HOST_BUSY when in reset
  2013-06-10 11:11 [PATCH 0/7] Limit overall SCSI EH runtime Hannes Reinecke
  2013-06-10 11:11 ` [PATCH 1/7] dpt_i2o: Remove DPTI_STATE_IOCTL Hannes Reinecke
@ 2013-06-10 11:11 ` Hannes Reinecke
  2013-06-10 11:11 ` [PATCH 3/7] advansys: Remove 'last_reset' references Hannes Reinecke
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 14+ messages in thread
From: Hannes Reinecke @ 2013-06-10 11:11 UTC (permalink / raw)
  To: James Bottomley
  Cc: linux-scsi, Joern Engel, Ewan Milne, James Smart, Ren Mingxin,
	Roland Dreier, Bryn Reeves, Christoph Hellwig, Hannes Reinecke

When the HBA is in reset we should be returning 'busy' and not
rely on the obscure 'last_reset' feature.

Signed-off-by: Hannes Reinecke <hare@suse.de>
---
 drivers/scsi/dpt_i2o.c | 7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/drivers/scsi/dpt_i2o.c b/drivers/scsi/dpt_i2o.c
index 106ff1a..0a2d801 100644
--- a/drivers/scsi/dpt_i2o.c
+++ b/drivers/scsi/dpt_i2o.c
@@ -448,11 +448,8 @@ static int adpt_queue_lck(struct scsi_cmnd * cmd, void (*done) (struct scsi_cmnd
 	}
 
 	rmb();
-	if((pHba->state) & DPTI_STATE_RESET) {
-		pHba->host->last_reset = jiffies;
-		pHba->host->resetting = 1;
-		return 1;
-	}
+	if((pHba->state) & DPTI_STATE_RESET)
+		return SCSI_MLQUEUE_HOST_BUSY;
 
 	// TODO if the cmd->device if offline then I may need to issue a bus rescan
 	// followed by a get_lct to see if the device is there anymore
-- 
1.7.12.4


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 3/7] advansys: Remove 'last_reset' references
  2013-06-10 11:11 [PATCH 0/7] Limit overall SCSI EH runtime Hannes Reinecke
  2013-06-10 11:11 ` [PATCH 1/7] dpt_i2o: Remove DPTI_STATE_IOCTL Hannes Reinecke
  2013-06-10 11:11 ` [PATCH 2/7] dpt_i2o: return SCSI_MLQUEUE_HOST_BUSY when in reset Hannes Reinecke
@ 2013-06-10 11:11 ` Hannes Reinecke
  2013-06-10 11:11 ` [PATCH 4/7] tmscsim: Move 'last_reset' into host structure Hannes Reinecke
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 14+ messages in thread
From: Hannes Reinecke @ 2013-06-10 11:11 UTC (permalink / raw)
  To: James Bottomley
  Cc: linux-scsi, Joern Engel, Ewan Milne, James Smart, Ren Mingxin,
	Roland Dreier, Bryn Reeves, Christoph Hellwig, Hannes Reinecke

Serves no purpose whatsoever.

Signed-off-by: Hannes Reinecke <hare@suse.de>
---
 drivers/scsi/advansys.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/scsi/advansys.c b/drivers/scsi/advansys.c
index c67e401..d814588 100644
--- a/drivers/scsi/advansys.c
+++ b/drivers/scsi/advansys.c
@@ -2511,8 +2511,8 @@ static void asc_prt_scsi_host(struct Scsi_Host *s)
 	struct asc_board *boardp = shost_priv(s);
 
 	printk("Scsi_Host at addr 0x%p, device %s\n", s, dev_name(boardp->dev));
-	printk(" host_busy %u, host_no %d, last_reset %d,\n",
-	       s->host_busy, s->host_no, (unsigned)s->last_reset);
+	printk(" host_busy %u, host_no %d,\n",
+	       s->host_busy, s->host_no);
 
 	printk(" base 0x%lx, io_port 0x%lx, irq %d,\n",
 	       (ulong)s->base, (ulong)s->io_port, boardp->irq);
@@ -3345,8 +3345,8 @@ static void asc_prt_driver_conf(struct seq_file *m, struct Scsi_Host *shost)
 		shost->host_no);
 
 	seq_printf(m,
-		   " host_busy %u, last_reset %lu, max_id %u, max_lun %u, max_channel %u\n",
-		   shost->host_busy, shost->last_reset, shost->max_id,
+		   " host_busy %u, max_id %u, max_lun %u, max_channel %u\n",
+		   shost->host_busy, shost->max_id,
 		   shost->max_lun, shost->max_channel);
 
 	seq_printf(m,
-- 
1.7.12.4


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 4/7] tmscsim: Move 'last_reset' into host structure
  2013-06-10 11:11 [PATCH 0/7] Limit overall SCSI EH runtime Hannes Reinecke
                   ` (2 preceding siblings ...)
  2013-06-10 11:11 ` [PATCH 3/7] advansys: Remove 'last_reset' references Hannes Reinecke
@ 2013-06-10 11:11 ` Hannes Reinecke
  2013-06-10 11:11 ` [PATCH 5/7] dc395: Move 'last_reset' into internal " Hannes Reinecke
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 14+ messages in thread
From: Hannes Reinecke @ 2013-06-10 11:11 UTC (permalink / raw)
  To: James Bottomley
  Cc: linux-scsi, Joern Engel, Ewan Milne, James Smart, Ren Mingxin,
	Roland Dreier, Bryn Reeves, Christoph Hellwig, Hannes Reinecke

The 'last_reset' value is only used internally, so move it into
the internal host structure.

Signed-off-by: Hannes Reinecke <hare@suse.de>
---
 drivers/scsi/tmscsim.c | 14 +++++++-------
 drivers/scsi/tmscsim.h |  1 +
 2 files changed, 8 insertions(+), 7 deletions(-)

diff --git a/drivers/scsi/tmscsim.c b/drivers/scsi/tmscsim.c
index 9327f5f..1142361 100644
--- a/drivers/scsi/tmscsim.c
+++ b/drivers/scsi/tmscsim.c
@@ -521,7 +521,7 @@ dc390_StartSCSI( struct dc390_acb* pACB, struct dc390_dcb* pDCB, struct dc390_sr
 	pACB->SelConn++;
 	return 1;
     }
-    if (time_before (jiffies, pACB->pScsiHost->last_reset))
+    if (time_before (jiffies, pACB->last_reset))
     {
 	DEBUG0(printk ("DC390: We were just reset and don't accept commands yet!\n"));
 	return 1;
@@ -1863,7 +1863,7 @@ dc390_ScsiRstDetect( struct dc390_acb* pACB )
     /* delay half a second */
     udelay (1000);
     DC390_write8 (ScsiCmd, CLEAR_FIFO_CMD);
-    pACB->pScsiHost->last_reset = jiffies + 5*HZ/2
+    pACB->last_reset = jiffies + 5*HZ/2
 		    + HZ * dc390_eepromBuf[pACB->AdapterIndex][EE_DELAY];
     pACB->Connected = 0;
 
@@ -2048,9 +2048,9 @@ static int DC390_bus_reset (struct scsi_cmnd *cmd)
 
 	dc390_ResetDevParam(pACB);
 	mdelay(1);
-	pACB->pScsiHost->last_reset = jiffies + 3*HZ/2 
+	pACB->last_reset = jiffies + 3*HZ/2
 		+ HZ * dc390_eepromBuf[pACB->AdapterIndex][EE_DELAY];
-    
+
 	DC390_write8(ScsiCmd, CLEAR_FIFO_CMD);
 	DC390_read8(INT_Status);		/* Reset Pending INT */
 
@@ -2383,7 +2383,7 @@ static void dc390_init_hw(struct dc390_acb *pACB, u8 index)
 	if (pACB->Gmode2 & RST_SCSI_BUS) {
 		dc390_ResetSCSIBus(pACB);
 		udelay(1000);
-		shost->last_reset = jiffies + HZ/2 +
+		pACB->last_reset = jiffies + HZ/2 +
 			HZ * dc390_eepromBuf[pACB->AdapterIndex][EE_DELAY];
 	}
 
@@ -2455,8 +2455,8 @@ static int dc390_probe_one(struct pci_dev *pdev, const struct pci_device_id *id)
 	shost->irq = pdev->irq;
 	shost->base = io_port;
 	shost->unique_id = io_port;
-	shost->last_reset = jiffies;
-	
+
+	pACB->last_reset = jiffies;
 	pACB->pScsiHost = shost;
 	pACB->IOPortBase = (u16) io_port;
 	pACB->IRQLevel = pdev->irq;
diff --git a/drivers/scsi/tmscsim.h b/drivers/scsi/tmscsim.h
index 77adc54..3d1bb4a 100644
--- a/drivers/scsi/tmscsim.h
+++ b/drivers/scsi/tmscsim.h
@@ -143,6 +143,7 @@ u8		Ignore_IRQ;	/* Not used */
 
 struct pci_dev	*pdev;
 
+unsigned long   last_reset;
 unsigned long	Cmds;
 u32		SelLost;
 u32		SelConn;
-- 
1.7.12.4


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 5/7] dc395: Move 'last_reset' into internal host structure
  2013-06-10 11:11 [PATCH 0/7] Limit overall SCSI EH runtime Hannes Reinecke
                   ` (3 preceding siblings ...)
  2013-06-10 11:11 ` [PATCH 4/7] tmscsim: Move 'last_reset' into host structure Hannes Reinecke
@ 2013-06-10 11:11 ` Hannes Reinecke
  2013-06-10 11:11 ` [PATCH 6/7] scsi: remove check for 'resetting' Hannes Reinecke
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 14+ messages in thread
From: Hannes Reinecke @ 2013-06-10 11:11 UTC (permalink / raw)
  To: James Bottomley
  Cc: linux-scsi, Joern Engel, Ewan Milne, James Smart, Ren Mingxin,
	Roland Dreier, Bryn Reeves, Christoph Hellwig, Hannes Reinecke

'last_reset' is only used internally, so move it into the
internal host structure.

Signed-off-by: Hannes Reinecke <hare@suse.de>
---
 drivers/scsi/dc395x.c | 24 +++++++++++++-----------
 1 file changed, 13 insertions(+), 11 deletions(-)

diff --git a/drivers/scsi/dc395x.c b/drivers/scsi/dc395x.c
index 694e13c..42e8624 100644
--- a/drivers/scsi/dc395x.c
+++ b/drivers/scsi/dc395x.c
@@ -308,6 +308,8 @@ struct AdapterCtlBlk {
 	struct timer_list waiting_timer;
 	struct timer_list selto_timer;
 
+	unsigned long last_reset;
+
 	u16 srb_count;
 
 	u8 sel_timeout;
@@ -860,9 +862,9 @@ static void waiting_set_timer(struct AdapterCtlBlk *acb, unsigned long to)
 	init_timer(&acb->waiting_timer);
 	acb->waiting_timer.function = waiting_timeout;
 	acb->waiting_timer.data = (unsigned long) acb;
-	if (time_before(jiffies + to, acb->scsi_host->last_reset - HZ / 2))
+	if (time_before(jiffies + to, acb->last_reset - HZ / 2))
 		acb->waiting_timer.expires =
-		    acb->scsi_host->last_reset - HZ / 2 + 1;
+		    acb->last_reset - HZ / 2 + 1;
 	else
 		acb->waiting_timer.expires = jiffies + to + 1;
 	add_timer(&acb->waiting_timer);
@@ -1319,7 +1321,7 @@ static int __dc395x_eh_bus_reset(struct scsi_cmnd *cmd)
 	udelay(500);
 
 	/* We may be in serious trouble. Wait some seconds */
-	acb->scsi_host->last_reset =
+	acb->last_reset =
 	    jiffies + 3 * HZ / 2 +
 	    HZ * acb->eeprom.delay_time;
 
@@ -1462,9 +1464,9 @@ static void selto_timer(struct AdapterCtlBlk *acb)
 	acb->selto_timer.function = selection_timeout_missed;
 	acb->selto_timer.data = (unsigned long) acb;
 	if (time_before
-	    (jiffies + HZ, acb->scsi_host->last_reset + HZ / 2))
+	    (jiffies + HZ, acb->last_reset + HZ / 2))
 		acb->selto_timer.expires =
-		    acb->scsi_host->last_reset + HZ / 2 + 1;
+		    acb->last_reset + HZ / 2 + 1;
 	else
 		acb->selto_timer.expires = jiffies + HZ + 1;
 	add_timer(&acb->selto_timer);
@@ -1535,7 +1537,7 @@ static u8 start_scsi(struct AdapterCtlBlk* acb, struct DeviceCtlBlk* dcb,
 	}
 	/* Allow starting of SCSI commands half a second before we allow the mid-level
 	 * to queue them again after a reset */
-	if (time_before(jiffies, acb->scsi_host->last_reset - HZ / 2)) {
+	if (time_before(jiffies, acb->last_reset - HZ / 2)) {
 		dprintkdbg(DBG_KG, "start_scsi: Refuse cmds (reset wait)\n");
 		return 1;
 	}
@@ -3031,7 +3033,7 @@ static void disconnect(struct AdapterCtlBlk *acb)
 		dprintkl(KERN_ERR, "disconnect: No such device\n");
 		udelay(500);
 		/* Suspend queue for a while */
-		acb->scsi_host->last_reset =
+		acb->last_reset =
 		    jiffies + HZ / 2 +
 		    HZ * acb->eeprom.delay_time;
 		clear_fifo(acb, "disconnectEx");
@@ -3053,7 +3055,7 @@ static void disconnect(struct AdapterCtlBlk *acb)
 		waiting_process_next(acb);
 	} else if (srb->state & SRB_ABORT_SENT) {
 		dcb->flag &= ~ABORT_DEV_;
-		acb->scsi_host->last_reset = jiffies + HZ / 2 + 1;
+		acb->last_reset = jiffies + HZ / 2 + 1;
 		dprintkl(KERN_ERR, "disconnect: SRB_ABORT_SENT\n");
 		doing_srb_done(acb, DID_ABORT, srb->cmd, 1);
 		waiting_process_next(acb);
@@ -3649,7 +3651,7 @@ static void scsi_reset_detect(struct AdapterCtlBlk *acb)
 	/*DC395x_write8(acb, TRM_S1040_DMA_CONTROL,STOPDMAXFER); */
 	udelay(500);
 	/* Maybe we locked up the bus? Then lets wait even longer ... */
-	acb->scsi_host->last_reset =
+	acb->last_reset =
 	    jiffies + 5 * HZ / 2 +
 	    HZ * acb->eeprom.delay_time;
 
@@ -4426,7 +4428,7 @@ static void adapter_init_scsi_host(struct Scsi_Host *host)
 	host->dma_channel = -1;
 	host->unique_id = acb->io_port_base;
 	host->irq = acb->irq_level;
-	host->last_reset = jiffies;
+	acb->last_reset = jiffies;
 
 	host->max_id = 16;
 	if (host->max_id - 1 == eeprom->scsi_id)
@@ -4484,7 +4486,7 @@ static void adapter_init_chip(struct AdapterCtlBlk *acb)
 		/*spin_unlock_irq (&io_request_lock); */
 		udelay(500);
 
-		acb->scsi_host->last_reset =
+		acb->last_reset =
 		    jiffies + HZ / 2 +
 		    HZ * acb->eeprom.delay_time;
 
-- 
1.7.12.4


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 6/7] scsi: remove check for 'resetting'
  2013-06-10 11:11 [PATCH 0/7] Limit overall SCSI EH runtime Hannes Reinecke
                   ` (4 preceding siblings ...)
  2013-06-10 11:11 ` [PATCH 5/7] dc395: Move 'last_reset' into internal " Hannes Reinecke
@ 2013-06-10 11:11 ` Hannes Reinecke
  2013-06-10 11:11 ` [PATCH 7/7] scsi: Add 'eh_deadline' to limit SCSI EH runtime Hannes Reinecke
  2013-06-27  9:23 ` [PATCH 0/7] Limit overall " Ren Mingxin
  7 siblings, 0 replies; 14+ messages in thread
From: Hannes Reinecke @ 2013-06-10 11:11 UTC (permalink / raw)
  To: James Bottomley
  Cc: linux-scsi, Joern Engel, Ewan Milne, James Smart, Ren Mingxin,
	Roland Dreier, Bryn Reeves, Christoph Hellwig, Hannes Reinecke

Field is now unused, so this is dead code.

Signed-off-by: Hannes Reinecke <hare@suse.de>
---
 drivers/scsi/scsi.c | 28 ----------------------------
 1 file changed, 28 deletions(-)

diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
index 2c0d0ec..ebe3b0a 100644
--- a/drivers/scsi/scsi.c
+++ b/drivers/scsi/scsi.c
@@ -78,11 +78,6 @@ static void scsi_done(struct scsi_cmnd *cmd);
  * Definitions and constants.
  */
 
-#define MIN_RESET_DELAY (2*HZ)
-
-/* Do not call reset on error if we just did a reset within 15 sec. */
-#define MIN_RESET_PERIOD (15*HZ)
-
 /*
  * Note - the initial logging level can be set here to log events at boot time.
  * After the system is up, you may enable logging via the /proc interface.
@@ -658,7 +653,6 @@ EXPORT_SYMBOL(scsi_cmd_get_serial);
 int scsi_dispatch_cmd(struct scsi_cmnd *cmd)
 {
 	struct Scsi_Host *host = cmd->device->host;
-	unsigned long timeout;
 	int rtn = 0;
 
 	atomic_inc(&cmd->device->iorequest_cnt);
@@ -704,28 +698,6 @@ int scsi_dispatch_cmd(struct scsi_cmnd *cmd)
 			       (cmd->device->lun << 5 & 0xe0);
 	}
 
-	/*
-	 * We will wait MIN_RESET_DELAY clock ticks after the last reset so
-	 * we can avoid the drive not being ready.
-	 */
-	timeout = host->last_reset + MIN_RESET_DELAY;
-
-	if (host->resetting && time_before(jiffies, timeout)) {
-		int ticks_remaining = timeout - jiffies;
-		/*
-		 * NOTE: This may be executed from within an interrupt
-		 * handler!  This is bad, but for now, it'll do.  The irq
-		 * level of the interrupt handler has been masked out by the
-		 * platform dependent interrupt handling code already, so the
-		 * sti() here will not cause another call to the SCSI host's
-		 * interrupt handler (assuming there is one irq-level per
-		 * host).
-		 */
-		while (--ticks_remaining >= 0)
-			mdelay(1 + 999 / HZ);
-		host->resetting = 0;
-	}
-
 	scsi_log_send(cmd);
 
 	/*
-- 
1.7.12.4


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 7/7] scsi: Add 'eh_deadline' to limit SCSI EH runtime
  2013-06-10 11:11 [PATCH 0/7] Limit overall SCSI EH runtime Hannes Reinecke
                   ` (5 preceding siblings ...)
  2013-06-10 11:11 ` [PATCH 6/7] scsi: remove check for 'resetting' Hannes Reinecke
@ 2013-06-10 11:11 ` Hannes Reinecke
  2013-06-27 14:33   ` Ewan Milne
  2013-06-28  7:29   ` Bart Van Assche
  2013-06-27  9:23 ` [PATCH 0/7] Limit overall " Ren Mingxin
  7 siblings, 2 replies; 14+ messages in thread
From: Hannes Reinecke @ 2013-06-10 11:11 UTC (permalink / raw)
  To: James Bottomley
  Cc: linux-scsi, Joern Engel, Ewan Milne, James Smart, Ren Mingxin,
	Roland Dreier, Bryn Reeves, Christoph Hellwig, Hannes Reinecke

This patchs adds an 'eh_deadline' attribute to the scsi
host which limits the overall runtime of the SCSI EH.
When a command is failed the start time of the EH is stored
in 'last_reset'. If the overall runtime of the SCSI EH is longer
than last_reset + eh_deadline, the EH is short-circuited and
falls through to issue a host reset only.

Signed-off-by: Hannes Reinecke <hare@suse.de>
---
 drivers/scsi/hosts.c      |   7 +++
 drivers/scsi/scsi_error.c | 142 +++++++++++++++++++++++++++++++++++++++++++---
 drivers/scsi/scsi_sysfs.c |  37 ++++++++++++
 include/scsi/scsi_host.h  |   2 +-
 4 files changed, 180 insertions(+), 8 deletions(-)

diff --git a/drivers/scsi/hosts.c b/drivers/scsi/hosts.c
index df0c3c7..c8d828f 100644
--- a/drivers/scsi/hosts.c
+++ b/drivers/scsi/hosts.c
@@ -316,6 +316,12 @@ static void scsi_host_dev_release(struct device *dev)
 	kfree(shost);
 }
 
+static unsigned int shost_eh_deadline;
+
+module_param_named(eh_deadline, shost_eh_deadline, uint, S_IRUGO|S_IWUSR);
+MODULE_PARM_DESC(eh_deadline,
+		 "SCSI EH deadline in seconds (should be between 1 and 2^32-1)");
+
 static struct device_type scsi_host_type = {
 	.name =		"scsi_host",
 	.release =	scsi_host_dev_release,
@@ -388,6 +394,7 @@ struct Scsi_Host *scsi_host_alloc(struct scsi_host_template *sht, int privsize)
 	shost->unchecked_isa_dma = sht->unchecked_isa_dma;
 	shost->use_clustering = sht->use_clustering;
 	shost->ordered_tag = sht->ordered_tag;
+	shost->eh_deadline = shost_eh_deadline;
 
 	if (sht->supported_mode == MODE_UNKNOWN)
 		/* means we didn't set it ... default to INITIATOR */
diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index 467cb3c..cf30475 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -91,6 +91,31 @@ void scsi_schedule_eh(struct Scsi_Host *shost)
 }
 EXPORT_SYMBOL_GPL(scsi_schedule_eh);
 
+static int sdev_eh_deadline(struct Scsi_Host *shost,
+			   unsigned long eh_start)
+{
+	if (!shost->eh_deadline)
+		return 0;
+
+	if (shost->last_reset != 0 &&
+	    time_before(shost->last_reset, eh_start))
+		eh_start = shost->last_reset;
+
+	if (time_before(jiffies,
+			eh_start + shost->eh_deadline))
+		return 0;
+
+	return 1;
+}
+
+static int scsi_host_eh_deadline(struct Scsi_Host *shost)
+{
+	if (!shost->last_reset)
+		return 0;
+
+	return sdev_eh_deadline(shost, shost->last_reset);
+}
+
 /**
  * scsi_eh_abort_handler - Handle command aborts
  * @work:	sdev on which commands should be aborted.
@@ -102,13 +127,15 @@ scsi_eh_abort_handler(struct work_struct *work)
 		container_of(work, struct scsi_device, abort_work);
 	struct scsi_cmnd *scmd, *tmp;
 	LIST_HEAD(abort_list);
-	unsigned long flags;
+	unsigned long flags, eh_start;
 	int rtn;
 
 	spin_lock_irqsave(&sdev->list_lock, flags);
 	list_splice_init(&sdev->eh_abort_list, &abort_list);
 	spin_unlock_irqrestore(&sdev->list_lock, flags);
 
+	eh_start = jiffies;
+
 	list_for_each_entry_safe(scmd, tmp, &abort_list, eh_entry) {
 		list_del_init(&scmd->eh_entry);
 		if (sdev->sdev_state == SDEV_CANCEL) {
@@ -119,6 +146,13 @@ scsi_eh_abort_handler(struct work_struct *work)
 			scsi_finish_command(scmd);
 			continue;
 		}
+		if (sdev_eh_deadline(sdev->host, eh_start)) {
+			SCSI_LOG_ERROR_RECOVERY(3,
+				scmd_printk(KERN_INFO, scmd,
+					     "eh timeout, not aborting\n"));
+			list_move_tail(&scmd->eh_entry, &abort_list);
+			goto start_eh;
+		}
 		SCSI_LOG_ERROR_RECOVERY(3,
 			scmd_printk(KERN_INFO, scmd,
 				    "aborting command %p\n", scmd));
@@ -151,6 +185,12 @@ scsi_eh_abort_handler(struct work_struct *work)
 		return;
 
 start_eh:
+	spin_lock_irqsave(sdev->host->host_lock, flags);
+	if (sdev->host->eh_deadline &&
+	    (!sdev->host->last_reset ||
+	     time_before(eh_start, sdev->host->last_reset)))
+		sdev->host->last_reset = eh_start;
+	spin_unlock_irqrestore(sdev->host->host_lock, flags);
 	list_for_each_entry_safe(scmd, tmp, &abort_list, eh_entry) {
 		scmd->result |= DID_TIME_OUT << 16;
 		if (!scsi_eh_scmd_add(scmd, 0)) {
@@ -232,6 +272,9 @@ int scsi_eh_scmd_add(struct scsi_cmnd *scmd, int eh_flag)
 		if (scsi_host_set_state(shost, SHOST_CANCEL_RECOVERY))
 			goto out_unlock;
 
+	if (sdev->eh_deadline && !shost->last_reset)
+		shost->last_reset = jiffies;
+
 	ret = 1;
 	scmd->eh_eflags |= eh_flag;
 	list_add_tail(&scmd->eh_entry, &shost->eh_cmd_q);
@@ -1052,13 +1095,25 @@ int scsi_eh_get_sense(struct list_head *work_q,
 		      struct list_head *done_q)
 {
 	struct scsi_cmnd *scmd, *next;
+	struct Scsi_Host *shost;
 	int rtn;
+	unsigned long flags;
 
 	list_for_each_entry_safe(scmd, next, work_q, eh_entry) {
 		if ((scmd->eh_eflags & SCSI_EH_CANCEL_CMD) ||
 		    SCSI_SENSE_VALID(scmd))
 			continue;
 
+		shost = scmd->device->host;
+		spin_lock_irqsave(shost->host_lock, flags);
+		if (scsi_host_eh_deadline(shost)) {
+			spin_unlock_irqrestore(shost->host_lock, flags);
+			SCSI_LOG_ERROR_RECOVERY(3,
+				shost_printk(KERN_INFO, shost,
+					    "skip %s, eh timeout\n", __func__));
+			break;
+		}
+		spin_unlock_irqrestore(shost->host_lock, flags);
 		SCSI_LOG_ERROR_RECOVERY(2, scmd_printk(KERN_INFO, scmd,
 						  "%s: requesting sense\n",
 						  current->comm));
@@ -1143,11 +1198,22 @@ static int scsi_eh_test_devices(struct list_head *cmd_list,
 	struct scsi_cmnd *scmd, *next;
 	struct scsi_device *sdev;
 	int finish_cmds;
+	unsigned long flags;
 
 	while (!list_empty(cmd_list)) {
 		scmd = list_entry(cmd_list->next, struct scsi_cmnd, eh_entry);
 		sdev = scmd->device;
 
+		if (!try_stu) {
+			spin_lock_irqsave(sdev->host->host_lock, flags);
+			if (scsi_host_eh_deadline(sdev->host)) {
+				spin_unlock_irqrestore(sdev->host->host_lock,
+						       flags);
+				break;
+			}
+			spin_unlock_irqrestore(sdev->host->host_lock, flags);
+		}
+
 		finish_cmds = !scsi_device_online(scmd->device) ||
 			(try_stu && !scsi_eh_try_stu(scmd) &&
 			 !scsi_eh_tur(scmd)) ||
@@ -1183,14 +1249,26 @@ static int scsi_eh_abort_cmds(struct list_head *work_q,
 	struct scsi_cmnd *scmd, *next;
 	LIST_HEAD(check_list);
 	int rtn;
+	struct Scsi_Host *shost;
+	unsigned long flags;
 
 	list_for_each_entry_safe(scmd, next, work_q, eh_entry) {
 		if (!(scmd->eh_eflags & SCSI_EH_CANCEL_CMD))
 			continue;
+		shost = scmd->device->host;
+		spin_lock_irqsave(shost->host_lock, flags);
+		if (scsi_host_eh_deadline(shost)) {
+			spin_unlock_irqrestore(shost->host_lock, flags);
+			SCSI_LOG_ERROR_RECOVERY(3,
+				shost_printk(KERN_INFO, shost,
+					    "skip %s, eh timeout\n", __func__));
+			return 1;
+		}
+		spin_unlock_irqrestore(shost->host_lock, flags);
 		SCSI_LOG_ERROR_RECOVERY(3, printk("%s: aborting cmd:"
 						  "0x%p\n", current->comm,
 						  scmd));
-		rtn = scsi_try_to_abort_cmd(scmd->device->host->hostt, scmd);
+		rtn = scsi_try_to_abort_cmd(shost->hostt, scmd);
 		if (rtn == SUCCESS || rtn == FAST_IO_FAIL) {
 			scmd->eh_eflags &= ~SCSI_EH_CANCEL_CMD;
 			if (rtn == FAST_IO_FAIL)
@@ -1248,8 +1326,18 @@ static int scsi_eh_stu(struct Scsi_Host *shost,
 {
 	struct scsi_cmnd *scmd, *stu_scmd, *next;
 	struct scsi_device *sdev;
+	unsigned long flags;
 
 	shost_for_each_device(sdev, shost) {
+		spin_lock_irqsave(shost->host_lock, flags);
+		if (scsi_host_eh_deadline(shost)) {
+			spin_unlock_irqrestore(shost->host_lock, flags);
+			SCSI_LOG_ERROR_RECOVERY(3,
+				shost_printk(KERN_INFO, shost,
+					    "skip %s, eh timeout\n", __func__));
+			break;
+		}
+		spin_unlock_irqrestore(shost->host_lock, flags);
 		stu_scmd = NULL;
 		list_for_each_entry(scmd, work_q, eh_entry)
 			if (scmd->device == sdev && SCSI_SENSE_VALID(scmd) &&
@@ -1302,9 +1390,19 @@ static int scsi_eh_bus_device_reset(struct Scsi_Host *shost,
 {
 	struct scsi_cmnd *scmd, *bdr_scmd, *next;
 	struct scsi_device *sdev;
+	unsigned long flags;
 	int rtn;
 
 	shost_for_each_device(sdev, shost) {
+		spin_lock_irqsave(shost->host_lock, flags);
+		if (scsi_host_eh_deadline(shost)) {
+			spin_unlock_irqrestore(shost->host_lock, flags);
+			SCSI_LOG_ERROR_RECOVERY(3,
+				shost_printk(KERN_INFO, shost,
+					    "skip %s, eh timeout\n", __func__));
+			break;
+		}
+		spin_unlock_irqrestore(shost->host_lock, flags);
 		bdr_scmd = NULL;
 		list_for_each_entry(scmd, work_q, eh_entry)
 			if (scmd->device == sdev) {
@@ -1364,6 +1462,19 @@ static int scsi_eh_target_reset(struct Scsi_Host *shost,
 		struct scsi_cmnd *next, *scmd;
 		int rtn;
 		unsigned int id;
+		unsigned long flags;
+
+		spin_lock_irqsave(shost->host_lock, flags);
+		if (scsi_host_eh_deadline(shost)) {
+			spin_unlock_irqrestore(shost->host_lock, flags);
+			/* push back on work queue for further processing */
+			list_splice_init(&tmp_list, work_q);
+			SCSI_LOG_ERROR_RECOVERY(3,
+				shost_printk(KERN_INFO, shost,
+					    "skip %s, eh timeout\n", __func__));
+			return list_empty(work_q);
+		}
+		spin_unlock_irqrestore(shost->host_lock, flags);
 
 		scmd = list_entry(tmp_list.next, struct scsi_cmnd, eh_entry);
 		id = scmd_id(scmd);
@@ -1408,6 +1519,7 @@ static int scsi_eh_bus_reset(struct Scsi_Host *shost,
 	LIST_HEAD(check_list);
 	unsigned int channel;
 	int rtn;
+	unsigned long flags;
 
 	/*
 	 * we really want to loop over the various channels, and do this on
@@ -1417,6 +1529,16 @@ static int scsi_eh_bus_reset(struct Scsi_Host *shost,
 	 */
 
 	for (channel = 0; channel <= shost->max_channel; channel++) {
+		spin_lock_irqsave(shost->host_lock, flags);
+		if (scsi_host_eh_deadline(shost)) {
+			spin_unlock_irqrestore(shost->host_lock, flags);
+			SCSI_LOG_ERROR_RECOVERY(3,
+				shost_printk(KERN_INFO, shost,
+					    "skip %s, eh timeout\n", __func__));
+			return list_empty(work_q);
+		}
+		spin_unlock_irqrestore(shost->host_lock, flags);
+
 		chan_scmd = NULL;
 		list_for_each_entry(scmd, work_q, eh_entry) {
 			if (channel == scmd_channel(scmd)) {
@@ -1822,8 +1944,9 @@ static void scsi_restart_operations(struct Scsi_Host *shost)
 	 * will be requests for character device operations, and also for
 	 * ioctls to queued block devices.
 	 */
-	SCSI_LOG_ERROR_RECOVERY(3, printk("%s: waking up host to restart\n",
-					  __func__));
+	SCSI_LOG_ERROR_RECOVERY(3,
+		printk("scsi_eh_%d waking up host to restart\n",
+		       shost->host_no));
 
 	spin_lock_irqsave(shost->host_lock, flags);
 	if (scsi_host_set_state(shost, SHOST_RUNNING))
@@ -1950,6 +2073,10 @@ static void scsi_unjam_host(struct Scsi_Host *shost)
 		if (!scsi_eh_abort_cmds(&eh_work_q, &eh_done_q))
 			scsi_eh_ready_devs(shost, &eh_work_q, &eh_done_q);
 
+	spin_lock_irqsave(shost->host_lock, flags);
+	if (sdev->eh_deadline)
+		shost->last_reset = 0;
+	spin_unlock_irqrestore(shost->host_lock, flags);
 	scsi_eh_flush_done_q(&eh_done_q);
 }
 
@@ -1976,7 +2103,7 @@ int scsi_error_handler(void *data)
 		if ((shost->host_failed == 0 && shost->host_eh_scheduled == 0) ||
 		    shost->host_failed != shost->host_busy) {
 			SCSI_LOG_ERROR_RECOVERY(1,
-				printk("Error handler scsi_eh_%d sleeping\n",
+				printk("scsi_eh_%d: sleeping\n",
 					shost->host_no));
 			schedule();
 			continue;
@@ -1984,8 +2111,9 @@ int scsi_error_handler(void *data)
 
 		__set_current_state(TASK_RUNNING);
 		SCSI_LOG_ERROR_RECOVERY(1,
-			printk("Error handler scsi_eh_%d waking up\n",
-				shost->host_no));
+			printk("scsi_eh_%d: waking up %d/%d/%d\n",
+			       shost->host_no, shost->host_eh_scheduled,
+			       shost->host_failed, shost->host_busy));
 
 		/*
 		 * We have a host that is failing for some reason.  Figure out
diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
index af64c1c..3c1742f 100644
--- a/drivers/scsi/scsi_sysfs.c
+++ b/drivers/scsi/scsi_sysfs.c
@@ -281,6 +281,42 @@ exit_store_host_reset:
 
 static DEVICE_ATTR(host_reset, S_IWUSR, NULL, store_host_reset);
 
+static ssize_t
+show_shost_eh_deadline(struct device *dev,
+		      struct device_attribute *attr, char *buf)
+{
+	struct Scsi_Host *shost = class_to_shost(dev);
+
+	return sprintf(buf, "%d\n", shost->eh_deadline);
+}
+
+static ssize_t
+store_shost_eh_deadline(struct device *dev, struct device_attribute *attr,
+		const char *buf, size_t count)
+{
+	struct Scsi_Host *shost = class_to_shost(dev);
+	int ret = -EINVAL;
+	int timeout;
+	unsigned long flags;
+
+	if (shost->transportt->eh_strategy_handler)
+		return ret;
+
+	if (sscanf(buf, "%d\n", &timeout) == 1) {
+		spin_lock_irqsave(shost->host_lock, flags);
+		if (scsi_host_in_recovery(shost))
+			ret = -EBUSY;
+		else {
+			shost->eh_deadline = timeout;
+			ret = count;
+		}
+		spin_unlock_irqrestore(shost->host_lock, flags);
+	}
+	return ret;
+}
+
+static DEVICE_ATTR(eh_deadline, S_IRUGO | S_IWUSR, show_shost_eh_deadline, store_shost_eh_deadline);
+
 shost_rd_attr(unique_id, "%u\n");
 shost_rd_attr(host_busy, "%hu\n");
 shost_rd_attr(cmd_per_lun, "%hd\n");
@@ -308,6 +344,7 @@ static struct attribute *scsi_sysfs_shost_attrs[] = {
 	&dev_attr_prot_capabilities.attr,
 	&dev_attr_prot_guard_type.attr,
 	&dev_attr_host_reset.attr,
+	&dev_attr_eh_deadline.attr,
 	NULL
 };
 
diff --git a/include/scsi/scsi_host.h b/include/scsi/scsi_host.h
index 7552435..ca87486 100644
--- a/include/scsi/scsi_host.h
+++ b/include/scsi/scsi_host.h
@@ -598,7 +598,7 @@ struct Scsi_Host {
 	unsigned int host_eh_scheduled;    /* EH scheduled without command */
     
 	unsigned int host_no;  /* Used for IOCTL_GET_IDLUN, /proc/scsi et al. */
-	int resetting; /* if set, it means that last_reset is a valid value */
+	int eh_deadline; /* Deadline for EH runtime */
 	unsigned long last_reset;
 
 	/*
-- 
1.7.12.4


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH 0/7] Limit overall SCSI EH runtime
  2013-06-10 11:11 [PATCH 0/7] Limit overall SCSI EH runtime Hannes Reinecke
                   ` (6 preceding siblings ...)
  2013-06-10 11:11 ` [PATCH 7/7] scsi: Add 'eh_deadline' to limit SCSI EH runtime Hannes Reinecke
@ 2013-06-27  9:23 ` Ren Mingxin
  7 siblings, 0 replies; 14+ messages in thread
From: Ren Mingxin @ 2013-06-27  9:23 UTC (permalink / raw)
  To: Hannes Reinecke, James Smart
  Cc: James Bottomley, linux-scsi, Joern Engel, Ewan Milne,
	Roland Dreier, Bryn Reeves, Christoph Hellwig

Hi, Hannes & James:

On 06/10/2013 07:11 PM, Hannes Reinecke wrote:
> This patchset implements a new 'eh_deadline' attribute to the
> SCSI host. It will limit the overall SCSI EH runtime by a given
> timeout. If the timeout expires all intermediate steps will
> be skipped and host reset will be scheduled immediately.

First of all, I think this patchset is useful to restrict some actual
interminable EH processes. BTW: There were some patches which tried
to add user interface to customize different hardware reset levels to
shorten the EH duration, and unfortunately they were not accepted.
Your hard work on EH improvement is much appreciated:-)

But please let me know yourserialEH improvement jobs - will the
redundant environment(such as multipath, mirroring) be taken into
account specially? To this patchset, will just giving a appropriate
timeout to skip EH except for host reset is enough for a quick fail
over in redundant systems? In other words, do you think reserving the
host reset which will occupy the longest time in the escalated levels
is acceptable with redundant configuration?

Thanks,
Ren

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 7/7] scsi: Add 'eh_deadline' to limit SCSI EH runtime
  2013-06-10 11:11 ` [PATCH 7/7] scsi: Add 'eh_deadline' to limit SCSI EH runtime Hannes Reinecke
@ 2013-06-27 14:33   ` Ewan Milne
  2013-06-28  7:14     ` Hannes Reinecke
  2013-06-28  7:29   ` Bart Van Assche
  1 sibling, 1 reply; 14+ messages in thread
From: Ewan Milne @ 2013-06-27 14:33 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: James Bottomley, linux-scsi, Joern Engel, James Smart,
	Ren Mingxin, Roland Dreier, Bryn Reeves, Christoph Hellwig

The eh_deadline changes allow for a significant improvement
in multipath failover time.  It works very well in our testing.
I do have a few corrections, see below:

On Mon, 2013-06-10 at 13:11 +0200, Hannes Reinecke wrote:
> This patchs adds an 'eh_deadline' attribute to the scsi
> host which limits the overall runtime of the SCSI EH.
> When a command is failed the start time of the EH is stored
> in 'last_reset'. If the overall runtime of the SCSI EH is longer
> than last_reset + eh_deadline, the EH is short-circuited and
> falls through to issue a host reset only.
> 
> Signed-off-by: Hannes Reinecke <hare@suse.de>
> ---
>  drivers/scsi/hosts.c      |   7 +++
>  drivers/scsi/scsi_error.c | 142 +++++++++++++++++++++++++++++++++++++++++++---
>  drivers/scsi/scsi_sysfs.c |  37 ++++++++++++
>  include/scsi/scsi_host.h  |   2 +-
>  4 files changed, 180 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/scsi/hosts.c b/drivers/scsi/hosts.c
> index df0c3c7..c8d828f 100644
> --- a/drivers/scsi/hosts.c
> +++ b/drivers/scsi/hosts.c
> @@ -316,6 +316,12 @@ static void scsi_host_dev_release(struct device *dev)
>  	kfree(shost);
>  }
>  
> +static unsigned int shost_eh_deadline;
> +
> +module_param_named(eh_deadline, shost_eh_deadline, uint, S_IRUGO|S_IWUSR);
> +MODULE_PARM_DESC(eh_deadline,
> +		 "SCSI EH deadline in seconds (should be between 1 and 2^32-1)");
> +
>  static struct device_type scsi_host_type = {
>  	.name =		"scsi_host",
>  	.release =	scsi_host_dev_release,
> @@ -388,6 +394,7 @@ struct Scsi_Host *scsi_host_alloc(struct scsi_host_template *sht, int privsize)
>  	shost->unchecked_isa_dma = sht->unchecked_isa_dma;
>  	shost->use_clustering = sht->use_clustering;
>  	shost->ordered_tag = sht->ordered_tag;
> +	shost->eh_deadline = shost_eh_deadline;

This should be shost->eh_deadline = shost_eh_deadline * HZ; since the
parameter is specified in seconds.

>  
>  	if (sht->supported_mode == MODE_UNKNOWN)
>  		/* means we didn't set it ... default to INITIATOR */
> diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
> index 467cb3c..cf30475 100644
> --- a/drivers/scsi/scsi_error.c
> +++ b/drivers/scsi/scsi_error.c
> @@ -91,6 +91,31 @@ void scsi_schedule_eh(struct Scsi_Host *shost)
>  }
>  EXPORT_SYMBOL_GPL(scsi_schedule_eh);
>  
> +static int sdev_eh_deadline(struct Scsi_Host *shost,
> +			   unsigned long eh_start)
> +{
> +	if (!shost->eh_deadline)
> +		return 0;
> +
> +	if (shost->last_reset != 0 &&
> +	    time_before(shost->last_reset, eh_start))
> +		eh_start = shost->last_reset;
> +
> +	if (time_before(jiffies,
> +			eh_start + shost->eh_deadline))
> +		return 0;
> +
> +	return 1;
> +}
> +
> +static int scsi_host_eh_deadline(struct Scsi_Host *shost)
> +{
> +	if (!shost->last_reset)
> +		return 0;
> +
> +	return sdev_eh_deadline(shost, shost->last_reset);
> +}
> +
>  /**
>   * scsi_eh_abort_handler - Handle command aborts
>   * @work:	sdev on which commands should be aborted.
> @@ -102,13 +127,15 @@ scsi_eh_abort_handler(struct work_struct *work)
>  		container_of(work, struct scsi_device, abort_work);
>  	struct scsi_cmnd *scmd, *tmp;
>  	LIST_HEAD(abort_list);
> -	unsigned long flags;
> +	unsigned long flags, eh_start;
>  	int rtn;
>  
>  	spin_lock_irqsave(&sdev->list_lock, flags);
>  	list_splice_init(&sdev->eh_abort_list, &abort_list);
>  	spin_unlock_irqrestore(&sdev->list_lock, flags);
>  
> +	eh_start = jiffies;
> +
>  	list_for_each_entry_safe(scmd, tmp, &abort_list, eh_entry) {
>  		list_del_init(&scmd->eh_entry);
>  		if (sdev->sdev_state == SDEV_CANCEL) {
> @@ -119,6 +146,13 @@ scsi_eh_abort_handler(struct work_struct *work)
>  			scsi_finish_command(scmd);
>  			continue;
>  		}
> +		if (sdev_eh_deadline(sdev->host, eh_start)) {
> +			SCSI_LOG_ERROR_RECOVERY(3,
> +				scmd_printk(KERN_INFO, scmd,
> +					     "eh timeout, not aborting\n"));
> +			list_move_tail(&scmd->eh_entry, &abort_list);
> +			goto start_eh;
> +		}
>  		SCSI_LOG_ERROR_RECOVERY(3,
>  			scmd_printk(KERN_INFO, scmd,
>  				    "aborting command %p\n", scmd));
> @@ -151,6 +185,12 @@ scsi_eh_abort_handler(struct work_struct *work)
>  		return;
>  
>  start_eh:
> +	spin_lock_irqsave(sdev->host->host_lock, flags);
> +	if (sdev->host->eh_deadline &&
> +	    (!sdev->host->last_reset ||
> +	     time_before(eh_start, sdev->host->last_reset)))
> +		sdev->host->last_reset = eh_start;
> +	spin_unlock_irqrestore(sdev->host->host_lock, flags);
>  	list_for_each_entry_safe(scmd, tmp, &abort_list, eh_entry) {
>  		scmd->result |= DID_TIME_OUT << 16;
>  		if (!scsi_eh_scmd_add(scmd, 0)) {
> @@ -232,6 +272,9 @@ int scsi_eh_scmd_add(struct scsi_cmnd *scmd, int eh_flag)
>  		if (scsi_host_set_state(shost, SHOST_CANCEL_RECOVERY))
>  			goto out_unlock;
>  
> +	if (sdev->eh_deadline && !shost->last_reset)
> +		shost->last_reset = jiffies;
> +

I think this is supposed to be if (shost->eh_deadline ...

>  	ret = 1;
>  	scmd->eh_eflags |= eh_flag;
>  	list_add_tail(&scmd->eh_entry, &shost->eh_cmd_q);
> @@ -1052,13 +1095,25 @@ int scsi_eh_get_sense(struct list_head *work_q,
>  		      struct list_head *done_q)
>  {
>  	struct scsi_cmnd *scmd, *next;
> +	struct Scsi_Host *shost;
>  	int rtn;
> +	unsigned long flags;
>  
>  	list_for_each_entry_safe(scmd, next, work_q, eh_entry) {
>  		if ((scmd->eh_eflags & SCSI_EH_CANCEL_CMD) ||
>  		    SCSI_SENSE_VALID(scmd))
>  			continue;
>  
> +		shost = scmd->device->host;
> +		spin_lock_irqsave(shost->host_lock, flags);
> +		if (scsi_host_eh_deadline(shost)) {
> +			spin_unlock_irqrestore(shost->host_lock, flags);
> +			SCSI_LOG_ERROR_RECOVERY(3,
> +				shost_printk(KERN_INFO, shost,
> +					    "skip %s, eh timeout\n", __func__));
> +			break;
> +		}
> +		spin_unlock_irqrestore(shost->host_lock, flags);
>  		SCSI_LOG_ERROR_RECOVERY(2, scmd_printk(KERN_INFO, scmd,
>  						  "%s: requesting sense\n",
>  						  current->comm));
> @@ -1143,11 +1198,22 @@ static int scsi_eh_test_devices(struct list_head *cmd_list,
>  	struct scsi_cmnd *scmd, *next;
>  	struct scsi_device *sdev;
>  	int finish_cmds;
> +	unsigned long flags;
>  
>  	while (!list_empty(cmd_list)) {
>  		scmd = list_entry(cmd_list->next, struct scsi_cmnd, eh_entry);
>  		sdev = scmd->device;
>  
> +		if (!try_stu) {
> +			spin_lock_irqsave(sdev->host->host_lock, flags);
> +			if (scsi_host_eh_deadline(sdev->host)) {
> +				spin_unlock_irqrestore(sdev->host->host_lock,
> +						       flags);

I think a list_splice_init(cmd_list, work_q); is needed here, otherwise
scmds that are still on the cmd_list will be orphaned.  There should
also be a SCSI_LOG_ERROR_RECOVERY() as was done in other places.

> +				break;
> +			}
> +			spin_unlock_irqrestore(sdev->host->host_lock, flags);
> +		}
> +
>  		finish_cmds = !scsi_device_online(scmd->device) ||
>  			(try_stu && !scsi_eh_try_stu(scmd) &&
>  			 !scsi_eh_tur(scmd)) ||
> @@ -1183,14 +1249,26 @@ static int scsi_eh_abort_cmds(struct list_head *work_q,
>  	struct scsi_cmnd *scmd, *next;
>  	LIST_HEAD(check_list);
>  	int rtn;
> +	struct Scsi_Host *shost;
> +	unsigned long flags;
>  
>  	list_for_each_entry_safe(scmd, next, work_q, eh_entry) {
>  		if (!(scmd->eh_eflags & SCSI_EH_CANCEL_CMD))
>  			continue;
> +		shost = scmd->device->host;
> +		spin_lock_irqsave(shost->host_lock, flags);
> +		if (scsi_host_eh_deadline(shost)) {
> +			spin_unlock_irqrestore(shost->host_lock, flags);

I think a list_splice_init(&check_list, work_q); is needed here,
otherwise scmds that are on the check_list will be orphaned.

> +			SCSI_LOG_ERROR_RECOVERY(3,
> +				shost_printk(KERN_INFO, shost,
> +					    "skip %s, eh timeout\n", __func__));
> +			return 1;
> +		}
> +		spin_unlock_irqrestore(shost->host_lock, flags);
>  		SCSI_LOG_ERROR_RECOVERY(3, printk("%s: aborting cmd:"
>  						  "0x%p\n", current->comm,
>  						  scmd));
> -		rtn = scsi_try_to_abort_cmd(scmd->device->host->hostt, scmd);
> +		rtn = scsi_try_to_abort_cmd(shost->hostt, scmd);
>  		if (rtn == SUCCESS || rtn == FAST_IO_FAIL) {
>  			scmd->eh_eflags &= ~SCSI_EH_CANCEL_CMD;
>  			if (rtn == FAST_IO_FAIL)
> @@ -1248,8 +1326,18 @@ static int scsi_eh_stu(struct Scsi_Host *shost,
>  {
>  	struct scsi_cmnd *scmd, *stu_scmd, *next;
>  	struct scsi_device *sdev;
> +	unsigned long flags;
>  
>  	shost_for_each_device(sdev, shost) {
> +		spin_lock_irqsave(shost->host_lock, flags);
> +		if (scsi_host_eh_deadline(shost)) {
> +			spin_unlock_irqrestore(shost->host_lock, flags);
> +			SCSI_LOG_ERROR_RECOVERY(3,
> +				shost_printk(KERN_INFO, shost,
> +					    "skip %s, eh timeout\n", __func__));
> +			break;
> +		}
> +		spin_unlock_irqrestore(shost->host_lock, flags);
>  		stu_scmd = NULL;
>  		list_for_each_entry(scmd, work_q, eh_entry)
>  			if (scmd->device == sdev && SCSI_SENSE_VALID(scmd) &&
> @@ -1302,9 +1390,19 @@ static int scsi_eh_bus_device_reset(struct Scsi_Host *shost,
>  {
>  	struct scsi_cmnd *scmd, *bdr_scmd, *next;
>  	struct scsi_device *sdev;
> +	unsigned long flags;
>  	int rtn;
>  
>  	shost_for_each_device(sdev, shost) {
> +		spin_lock_irqsave(shost->host_lock, flags);
> +		if (scsi_host_eh_deadline(shost)) {
> +			spin_unlock_irqrestore(shost->host_lock, flags);
> +			SCSI_LOG_ERROR_RECOVERY(3,
> +				shost_printk(KERN_INFO, shost,
> +					    "skip %s, eh timeout\n", __func__));
> +			break;
> +		}
> +		spin_unlock_irqrestore(shost->host_lock, flags);
>  		bdr_scmd = NULL;
>  		list_for_each_entry(scmd, work_q, eh_entry)
>  			if (scmd->device == sdev) {
> @@ -1364,6 +1462,19 @@ static int scsi_eh_target_reset(struct Scsi_Host *shost,
>  		struct scsi_cmnd *next, *scmd;
>  		int rtn;
>  		unsigned int id;
> +		unsigned long flags;
> +
> +		spin_lock_irqsave(shost->host_lock, flags);
> +		if (scsi_host_eh_deadline(shost)) {
> +			spin_unlock_irqrestore(shost->host_lock, flags);
> +			/* push back on work queue for further processing */

I think a list_splice_init(&check_list, work_q); is needed here,
otherwise scmds that are on the check_list will be orphaned.

> +			list_splice_init(&tmp_list, work_q);
> +			SCSI_LOG_ERROR_RECOVERY(3,
> +				shost_printk(KERN_INFO, shost,
> +					    "skip %s, eh timeout\n", __func__));
> +			return list_empty(work_q);
> +		}
> +		spin_unlock_irqrestore(shost->host_lock, flags);
>  
>  		scmd = list_entry(tmp_list.next, struct scsi_cmnd, eh_entry);
>  		id = scmd_id(scmd);
> @@ -1408,6 +1519,7 @@ static int scsi_eh_bus_reset(struct Scsi_Host *shost,
>  	LIST_HEAD(check_list);
>  	unsigned int channel;
>  	int rtn;
> +	unsigned long flags;
>  
>  	/*
>  	 * we really want to loop over the various channels, and do this on
> @@ -1417,6 +1529,16 @@ static int scsi_eh_bus_reset(struct Scsi_Host *shost,
>  	 */
>  
>  	for (channel = 0; channel <= shost->max_channel; channel++) {
> +		spin_lock_irqsave(shost->host_lock, flags);
> +		if (scsi_host_eh_deadline(shost)) {
> +			spin_unlock_irqrestore(shost->host_lock, flags);

I think a list_splice_init(&check_list, work_q); is needed here,
otherwise scmds that are on the check_list will be orphaned.

> +			SCSI_LOG_ERROR_RECOVERY(3,
> +				shost_printk(KERN_INFO, shost,
> +					    "skip %s, eh timeout\n", __func__));
> +			return list_empty(work_q);
> +		}
> +		spin_unlock_irqrestore(shost->host_lock, flags);
> +
>  		chan_scmd = NULL;
>  		list_for_each_entry(scmd, work_q, eh_entry) {
>  			if (channel == scmd_channel(scmd)) {
> @@ -1822,8 +1944,9 @@ static void scsi_restart_operations(struct Scsi_Host *shost)
>  	 * will be requests for character device operations, and also for
>  	 * ioctls to queued block devices.
>  	 */
> -	SCSI_LOG_ERROR_RECOVERY(3, printk("%s: waking up host to restart\n",
> -					  __func__));
> +	SCSI_LOG_ERROR_RECOVERY(3,
> +		printk("scsi_eh_%d waking up host to restart\n",
> +		       shost->host_no));
>  
>  	spin_lock_irqsave(shost->host_lock, flags);
>  	if (scsi_host_set_state(shost, SHOST_RUNNING))
> @@ -1950,6 +2073,10 @@ static void scsi_unjam_host(struct Scsi_Host *shost)
>  		if (!scsi_eh_abort_cmds(&eh_work_q, &eh_done_q))
>  			scsi_eh_ready_devs(shost, &eh_work_q, &eh_done_q);
>  
> +	spin_lock_irqsave(shost->host_lock, flags);
> +	if (sdev->eh_deadline)

I think this is supposed to be if (shost->eh_deadline ...

> +		shost->last_reset = 0;
> +	spin_unlock_irqrestore(shost->host_lock, flags);
>  	scsi_eh_flush_done_q(&eh_done_q);
>  }
>  
> @@ -1976,7 +2103,7 @@ int scsi_error_handler(void *data)
>  		if ((shost->host_failed == 0 && shost->host_eh_scheduled == 0) ||
>  		    shost->host_failed != shost->host_busy) {
>  			SCSI_LOG_ERROR_RECOVERY(1,
> -				printk("Error handler scsi_eh_%d sleeping\n",
> +				printk("scsi_eh_%d: sleeping\n",
>  					shost->host_no));
>  			schedule();
>  			continue;
> @@ -1984,8 +2111,9 @@ int scsi_error_handler(void *data)
>  
>  		__set_current_state(TASK_RUNNING);
>  		SCSI_LOG_ERROR_RECOVERY(1,
> -			printk("Error handler scsi_eh_%d waking up\n",
> -				shost->host_no));
> +			printk("scsi_eh_%d: waking up %d/%d/%d\n",
> +			       shost->host_no, shost->host_eh_scheduled,
> +			       shost->host_failed, shost->host_busy));
>  
>  		/*
>  		 * We have a host that is failing for some reason.  Figure out
> diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
> index af64c1c..3c1742f 100644
> --- a/drivers/scsi/scsi_sysfs.c
> +++ b/drivers/scsi/scsi_sysfs.c
> @@ -281,6 +281,42 @@ exit_store_host_reset:
>  
>  static DEVICE_ATTR(host_reset, S_IWUSR, NULL, store_host_reset);
>  
> +static ssize_t
> +show_shost_eh_deadline(struct device *dev,
> +		      struct device_attribute *attr, char *buf)
> +{
> +	struct Scsi_Host *shost = class_to_shost(dev);
> +
> +	return sprintf(buf, "%d\n", shost->eh_deadline);

I think that the attribute should be specified in seconds, so
this should be shost->eh_deadline / HZ.

> +}
> +
> +static ssize_t
> +store_shost_eh_deadline(struct device *dev, struct device_attribute *attr,
> +		const char *buf, size_t count)
> +{
> +	struct Scsi_Host *shost = class_to_shost(dev);
> +	int ret = -EINVAL;
> +	int timeout;
> +	unsigned long flags;
> +
> +	if (shost->transportt->eh_strategy_handler)
> +		return ret;
> +
> +	if (sscanf(buf, "%d\n", &timeout) == 1) {
> +		spin_lock_irqsave(shost->host_lock, flags);
> +		if (scsi_host_in_recovery(shost))
> +			ret = -EBUSY;
> +		else {
> +			shost->eh_deadline = timeout;

I think the deadline should be specified in seconds, so this
should be shost->eh_deadline = timeout * HZ;

> +			ret = count;
> +		}
> +		spin_unlock_irqrestore(shost->host_lock, flags);
> +	}
> +	return ret;
> +}
> +
> +static DEVICE_ATTR(eh_deadline, S_IRUGO | S_IWUSR, show_shost_eh_deadline, store_shost_eh_deadline);
> +
>  shost_rd_attr(unique_id, "%u\n");
>  shost_rd_attr(host_busy, "%hu\n");
>  shost_rd_attr(cmd_per_lun, "%hd\n");
> @@ -308,6 +344,7 @@ static struct attribute *scsi_sysfs_shost_attrs[] = {
>  	&dev_attr_prot_capabilities.attr,
>  	&dev_attr_prot_guard_type.attr,
>  	&dev_attr_host_reset.attr,
> +	&dev_attr_eh_deadline.attr,
>  	NULL
>  };
>  
> diff --git a/include/scsi/scsi_host.h b/include/scsi/scsi_host.h
> index 7552435..ca87486 100644
> --- a/include/scsi/scsi_host.h
> +++ b/include/scsi/scsi_host.h
> @@ -598,7 +598,7 @@ struct Scsi_Host {
>  	unsigned int host_eh_scheduled;    /* EH scheduled without command */
>      
>  	unsigned int host_no;  /* Used for IOCTL_GET_IDLUN, /proc/scsi et al. */
> -	int resetting; /* if set, it means that last_reset is a valid value */
> +	int eh_deadline; /* Deadline for EH runtime */
>  	unsigned long last_reset;
>  
>  	/*



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 7/7] scsi: Add 'eh_deadline' to limit SCSI EH runtime
  2013-06-27 14:33   ` Ewan Milne
@ 2013-06-28  7:14     ` Hannes Reinecke
  2013-06-28 12:54       ` Ewan Milne
  0 siblings, 1 reply; 14+ messages in thread
From: Hannes Reinecke @ 2013-06-28  7:14 UTC (permalink / raw)
  To: emilne
  Cc: James Bottomley, linux-scsi, Joern Engel, James Smart,
	Ren Mingxin, Roland Dreier, Bryn Reeves, Christoph Hellwig

On 06/27/2013 04:33 PM, Ewan Milne wrote:
> The eh_deadline changes allow for a significant improvement
> in multipath failover time.  It works very well in our testing.
> I do have a few corrections, see below:
> 
> On Mon, 2013-06-10 at 13:11 +0200, Hannes Reinecke wrote:
>> This patchs adds an 'eh_deadline' attribute to the scsi
>> host which limits the overall runtime of the SCSI EH.
>> When a command is failed the start time of the EH is stored
>> in 'last_reset'. If the overall runtime of the SCSI EH is longer
>> than last_reset + eh_deadline, the EH is short-circuited and
>> falls through to issue a host reset only.
>>
>> Signed-off-by: Hannes Reinecke <hare@suse.de>
>> ---
>>  drivers/scsi/hosts.c      |   7 +++
>>  drivers/scsi/scsi_error.c | 142 +++++++++++++++++++++++++++++++++++++++++++---
>>  drivers/scsi/scsi_sysfs.c |  37 ++++++++++++
>>  include/scsi/scsi_host.h  |   2 +-
>>  4 files changed, 180 insertions(+), 8 deletions(-)
>>
>> diff --git a/drivers/scsi/hosts.c b/drivers/scsi/hosts.c
>> index df0c3c7..c8d828f 100644
>> --- a/drivers/scsi/hosts.c
>> +++ b/drivers/scsi/hosts.c
>> @@ -316,6 +316,12 @@ static void scsi_host_dev_release(struct device *dev)
>>  	kfree(shost);
>>  }
>>  
>> +static unsigned int shost_eh_deadline;
>> +
>> +module_param_named(eh_deadline, shost_eh_deadline, uint, S_IRUGO|S_IWUSR);
>> +MODULE_PARM_DESC(eh_deadline,
>> +		 "SCSI EH deadline in seconds (should be between 1 and 2^32-1)");
>> +
>>  static struct device_type scsi_host_type = {
>>  	.name =		"scsi_host",
>>  	.release =	scsi_host_dev_release,
>> @@ -388,6 +394,7 @@ struct Scsi_Host *scsi_host_alloc(struct scsi_host_template *sht, int privsize)
>>  	shost->unchecked_isa_dma = sht->unchecked_isa_dma;
>>  	shost->use_clustering = sht->use_clustering;
>>  	shost->ordered_tag = sht->ordered_tag;
>> +	shost->eh_deadline = shost_eh_deadline;
> 
> This should be shost->eh_deadline = shost_eh_deadline * HZ; since the
> parameter is specified in seconds.
> 
Yes, correct. Will be fixed with the next round.

[ .. ]
>> @@ -232,6 +272,9 @@ int scsi_eh_scmd_add(struct scsi_cmnd *scmd, int eh_flag)
>>  		if (scsi_host_set_state(shost, SHOST_CANCEL_RECOVERY))
>>  			goto out_unlock;
>>  
>> +	if (sdev->eh_deadline && !shost->last_reset)
>> +		shost->last_reset = jiffies;
>> +
> 
> I think this is supposed to be if (shost->eh_deadline ...
> 
No. ->last_reset is set to the time the first command timeout/failure.

->last_reset + ->eh_deadline will give you the expiry time.

[ .. ]
>> @@ -1143,11 +1198,22 @@ static int scsi_eh_test_devices(struct list_head *cmd_list,
>>  	struct scsi_cmnd *scmd, *next;
>>  	struct scsi_device *sdev;
>>  	int finish_cmds;
>> +	unsigned long flags;
>>  
>>  	while (!list_empty(cmd_list)) {
>>  		scmd = list_entry(cmd_list->next, struct scsi_cmnd, eh_entry);
>>  		sdev = scmd->device;
>>  
>> +		if (!try_stu) {
>> +			spin_lock_irqsave(sdev->host->host_lock, flags);
>> +			if (scsi_host_eh_deadline(sdev->host)) {
>> +				spin_unlock_irqrestore(sdev->host->host_lock,
>> +						       flags);
> 
> I think a list_splice_init(cmd_list, work_q); is needed here, otherwise
> scmds that are still on the cmd_list will be orphaned.  There should
> also be a SCSI_LOG_ERROR_RECOVERY() as was done in other places.
> 
Yes. Will be fixed with the next round.

>> +				break;
>> +			}
>> +			spin_unlock_irqrestore(sdev->host->host_lock, flags);
>> +		}
>> +
>>  		finish_cmds = !scsi_device_online(scmd->device) ||
>>  			(try_stu && !scsi_eh_try_stu(scmd) &&
>>  			 !scsi_eh_tur(scmd)) ||
>> @@ -1183,14 +1249,26 @@ static int scsi_eh_abort_cmds(struct list_head *work_q,
>>  	struct scsi_cmnd *scmd, *next;
>>  	LIST_HEAD(check_list);
>>  	int rtn;
>> +	struct Scsi_Host *shost;
>> +	unsigned long flags;
>>  
>>  	list_for_each_entry_safe(scmd, next, work_q, eh_entry) {
>>  		if (!(scmd->eh_eflags & SCSI_EH_CANCEL_CMD))
>>  			continue;
>> +		shost = scmd->device->host;
>> +		spin_lock_irqsave(shost->host_lock, flags);
>> +		if (scsi_host_eh_deadline(shost)) {
>> +			spin_unlock_irqrestore(shost->host_lock, flags);
> 
> I think a list_splice_init(&check_list, work_q); is needed here,
> otherwise scmds that are on the check_list will be orphaned.
> 
Yes. Will be fixed with the next round.

[ .. ]
>> @@ -1364,6 +1462,19 @@ static int scsi_eh_target_reset(struct Scsi_Host *shost,
>>  		struct scsi_cmnd *next, *scmd;
>>  		int rtn;
>>  		unsigned int id;
>> +		unsigned long flags;
>> +
>> +		spin_lock_irqsave(shost->host_lock, flags);
>> +		if (scsi_host_eh_deadline(shost)) {
>> +			spin_unlock_irqrestore(shost->host_lock, flags);
>> +			/* push back on work queue for further processing */
> 
> I think a list_splice_init(&check_list, work_q); is needed here,
> otherwise scmds that are on the check_list will be orphaned.
> 
I already did so; will be fixed with the next round.

>> +			list_splice_init(&tmp_list, work_q);
>> +			SCSI_LOG_ERROR_RECOVERY(3,
>> +				shost_printk(KERN_INFO, shost,
>> +					    "skip %s, eh timeout\n", __func__));
>> +			return list_empty(work_q);
>> +		}
>> +		spin_unlock_irqrestore(shost->host_lock, flags);
>>  
>>  		scmd = list_entry(tmp_list.next, struct scsi_cmnd, eh_entry);
>>  		id = scmd_id(scmd);
>> @@ -1408,6 +1519,7 @@ static int scsi_eh_bus_reset(struct Scsi_Host *shost,
>>  	LIST_HEAD(check_list);
>>  	unsigned int channel;
>>  	int rtn;
>> +	unsigned long flags;
>>  
>>  	/*
>>  	 * we really want to loop over the various channels, and do this on
>> @@ -1417,6 +1529,16 @@ static int scsi_eh_bus_reset(struct Scsi_Host *shost,
>>  	 */
>>  
>>  	for (channel = 0; channel <= shost->max_channel; channel++) {
>> +		spin_lock_irqsave(shost->host_lock, flags);
>> +		if (scsi_host_eh_deadline(shost)) {
>> +			spin_unlock_irqrestore(shost->host_lock, flags);
> 
> I think a list_splice_init(&check_list, work_q); is needed here,
> otherwise scmds that are on the check_list will be orphaned.
> 
Yes, will be fixed.

>> +			SCSI_LOG_ERROR_RECOVERY(3,
>> +				shost_printk(KERN_INFO, shost,
>> +					    "skip %s, eh timeout\n", __func__));
>> +			return list_empty(work_q);
>> +		}
>> +		spin_unlock_irqrestore(shost->host_lock, flags);
>> +
>>  		chan_scmd = NULL;
>>  		list_for_each_entry(scmd, work_q, eh_entry) {
>>  			if (channel == scmd_channel(scmd)) {
>> @@ -1822,8 +1944,9 @@ static void scsi_restart_operations(struct Scsi_Host *shost)
>>  	 * will be requests for character device operations, and also for
>>  	 * ioctls to queued block devices.
>>  	 */
>> -	SCSI_LOG_ERROR_RECOVERY(3, printk("%s: waking up host to restart\n",
>> -					  __func__));
>> +	SCSI_LOG_ERROR_RECOVERY(3,
>> +		printk("scsi_eh_%d waking up host to restart\n",
>> +		       shost->host_no));
>>  
>>  	spin_lock_irqsave(shost->host_lock, flags);
>>  	if (scsi_host_set_state(shost, SHOST_RUNNING))
>> @@ -1950,6 +2073,10 @@ static void scsi_unjam_host(struct Scsi_Host *shost)
>>  		if (!scsi_eh_abort_cmds(&eh_work_q, &eh_done_q))
>>  			scsi_eh_ready_devs(shost, &eh_work_q, &eh_done_q);
>>  
>> +	spin_lock_irqsave(shost->host_lock, flags);
>> +	if (sdev->eh_deadline)
> 
> I think this is supposed to be if (shost->eh_deadline ...
> 
Of course.

>> +		shost->last_reset = 0;
>> +	spin_unlock_irqrestore(shost->host_lock, flags);
>>  	scsi_eh_flush_done_q(&eh_done_q);
>>  }
>>  
>> @@ -1976,7 +2103,7 @@ int scsi_error_handler(void *data)
>>  		if ((shost->host_failed == 0 && shost->host_eh_scheduled == 0) ||
>>  		    shost->host_failed != shost->host_busy) {
>>  			SCSI_LOG_ERROR_RECOVERY(1,
>> -				printk("Error handler scsi_eh_%d sleeping\n",
>> +				printk("scsi_eh_%d: sleeping\n",
>>  					shost->host_no));
>>  			schedule();
>>  			continue;
>> @@ -1984,8 +2111,9 @@ int scsi_error_handler(void *data)
>>  
>>  		__set_current_state(TASK_RUNNING);
>>  		SCSI_LOG_ERROR_RECOVERY(1,
>> -			printk("Error handler scsi_eh_%d waking up\n",
>> -				shost->host_no));
>> +			printk("scsi_eh_%d: waking up %d/%d/%d\n",
>> +			       shost->host_no, shost->host_eh_scheduled,
>> +			       shost->host_failed, shost->host_busy));
>>  
>>  		/*
>>  		 * We have a host that is failing for some reason.  Figure out
>> diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
>> index af64c1c..3c1742f 100644
>> --- a/drivers/scsi/scsi_sysfs.c
>> +++ b/drivers/scsi/scsi_sysfs.c
>> @@ -281,6 +281,42 @@ exit_store_host_reset:
>>  
>>  static DEVICE_ATTR(host_reset, S_IWUSR, NULL, store_host_reset);
>>  
>> +static ssize_t
>> +show_shost_eh_deadline(struct device *dev,
>> +		      struct device_attribute *attr, char *buf)
>> +{
>> +	struct Scsi_Host *shost = class_to_shost(dev);
>> +
>> +	return sprintf(buf, "%d\n", shost->eh_deadline);
> 
> I think that the attribute should be specified in seconds, so
> this should be shost->eh_deadline / HZ.
> 
Already did so.

>> +}
>> +
>> +static ssize_t
>> +store_shost_eh_deadline(struct device *dev, struct device_attribute *attr,
>> +		const char *buf, size_t count)
>> +{
>> +	struct Scsi_Host *shost = class_to_shost(dev);
>> +	int ret = -EINVAL;
>> +	int timeout;
>> +	unsigned long flags;
>> +
>> +	if (shost->transportt->eh_strategy_handler)
>> +		return ret;
>> +
>> +	if (sscanf(buf, "%d\n", &timeout) == 1) {
>> +		spin_lock_irqsave(shost->host_lock, flags);
>> +		if (scsi_host_in_recovery(shost))
>> +			ret = -EBUSY;
>> +		else {
>> +			shost->eh_deadline = timeout;
> 
> I think the deadline should be specified in seconds, so this
> should be shost->eh_deadline = timeout * HZ;
> 
Same here.

>> +			ret = count;
>> +		}
>> +		spin_unlock_irqrestore(shost->host_lock, flags);
>> +	}
>> +	return ret;
>> +}
>> +
>> +static DEVICE_ATTR(eh_deadline, S_IRUGO | S_IWUSR, show_shost_eh_deadline, store_shost_eh_deadline);
>> +
>>  shost_rd_attr(unique_id, "%u\n");
>>  shost_rd_attr(host_busy, "%hu\n");
>>  shost_rd_attr(cmd_per_lun, "%hd\n");
>> @@ -308,6 +344,7 @@ static struct attribute *scsi_sysfs_shost_attrs[] = {
>>  	&dev_attr_prot_capabilities.attr,
>>  	&dev_attr_prot_guard_type.attr,
>>  	&dev_attr_host_reset.attr,
>> +	&dev_attr_eh_deadline.attr,
>>  	NULL
>>  };
>>  
>> diff --git a/include/scsi/scsi_host.h b/include/scsi/scsi_host.h
>> index 7552435..ca87486 100644
>> --- a/include/scsi/scsi_host.h
>> +++ b/include/scsi/scsi_host.h
>> @@ -598,7 +598,7 @@ struct Scsi_Host {
>>  	unsigned int host_eh_scheduled;    /* EH scheduled without command */
>>      
>>  	unsigned int host_no;  /* Used for IOCTL_GET_IDLUN, /proc/scsi et al. */
>> -	int resetting; /* if set, it means that last_reset is a valid value */
>> +	int eh_deadline; /* Deadline for EH runtime */
>>  	unsigned long last_reset;
>>  
>>  	/*
> 
> 

Thanks for the review.

I'll be sending out a next round of patches.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		      zSeries & Storage
hare@suse.de			      +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 7/7] scsi: Add 'eh_deadline' to limit SCSI EH runtime
  2013-06-10 11:11 ` [PATCH 7/7] scsi: Add 'eh_deadline' to limit SCSI EH runtime Hannes Reinecke
  2013-06-27 14:33   ` Ewan Milne
@ 2013-06-28  7:29   ` Bart Van Assche
  2013-06-28  7:42     ` Hannes Reinecke
  1 sibling, 1 reply; 14+ messages in thread
From: Bart Van Assche @ 2013-06-28  7:29 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: James Bottomley, linux-scsi, Joern Engel, Ewan Milne,
	James Smart, Ren Mingxin, Roland Dreier, Bryn Reeves,
	Christoph Hellwig

On 06/10/13 13:11, Hannes Reinecke wrote:
> +static int sdev_eh_deadline(struct Scsi_Host *shost,
> +			   unsigned long eh_start)
> +{
> +	if (!shost->eh_deadline)
> +		return 0;
> +
> +	if (shost->last_reset != 0 &&
> +	    time_before(shost->last_reset, eh_start))
> +		eh_start = shost->last_reset;
> +
> +	if (time_before(jiffies,
> +			eh_start + shost->eh_deadline))
> +		return 0;
> +
> +	return 1;
> +}
> +
> +static int scsi_host_eh_deadline(struct Scsi_Host *shost)
> +{
> +	if (!shost->last_reset)
> +		return 0;
> +
> +	return sdev_eh_deadline(shost, shost->last_reset);
> +}

Hello Hannes,

I would appreciate if you would choose other names for these two 
functions and also for shost->eh_deadline. To me a deadline is a time 
instant. As far as I can see the two functions above check whether a 
deadline has been passed, and shost->eh_deadline is a time interval. How 
about the following names: sdev_eh_past_deadline(), 
shost_eh_past_deadline() and shost->max_eh_jiffies ?

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 7/7] scsi: Add 'eh_deadline' to limit SCSI EH runtime
  2013-06-28  7:29   ` Bart Van Assche
@ 2013-06-28  7:42     ` Hannes Reinecke
  0 siblings, 0 replies; 14+ messages in thread
From: Hannes Reinecke @ 2013-06-28  7:42 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: James Bottomley, linux-scsi, Joern Engel, Ewan Milne,
	James Smart, Ren Mingxin, Roland Dreier, Bryn Reeves,
	Christoph Hellwig

On 06/28/2013 09:29 AM, Bart Van Assche wrote:
> On 06/10/13 13:11, Hannes Reinecke wrote:
>> +static int sdev_eh_deadline(struct Scsi_Host *shost,
>> +               unsigned long eh_start)
>> +{
>> +    if (!shost->eh_deadline)
>> +        return 0;
>> +
>> +    if (shost->last_reset != 0 &&
>> +        time_before(shost->last_reset, eh_start))
>> +        eh_start = shost->last_reset;
>> +
>> +    if (time_before(jiffies,
>> +            eh_start + shost->eh_deadline))
>> +        return 0;
>> +
>> +    return 1;
>> +}
>> +
>> +static int scsi_host_eh_deadline(struct Scsi_Host *shost)
>> +{
>> +    if (!shost->last_reset)
>> +        return 0;
>> +
>> +    return sdev_eh_deadline(shost, shost->last_reset);
>> +}
> 
> Hello Hannes,
> 
> I would appreciate if you would choose other names for these two
> functions and also for shost->eh_deadline. To me a deadline is a
> time instant. As far as I can see the two functions above check
> whether a deadline has been passed, and shost->eh_deadline is a time
> interval. How about the following names: sdev_eh_past_deadline(),
> shost_eh_past_deadline() and shost->max_eh_jiffies ?
> 
Sure. I changed the naming once already, so I don't have any issues
with that.
And yes, the suggested naming does make more sense.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		      zSeries & Storage
hare@suse.de			      +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 7/7] scsi: Add 'eh_deadline' to limit SCSI EH runtime
  2013-06-28  7:14     ` Hannes Reinecke
@ 2013-06-28 12:54       ` Ewan Milne
  0 siblings, 0 replies; 14+ messages in thread
From: Ewan Milne @ 2013-06-28 12:54 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: James Bottomley, linux-scsi, Joern Engel, James Smart,
	Ren Mingxin, Roland Dreier, Bryn Reeves, Christoph Hellwig

On Fri, 2013-06-28 at 09:14 +0200, Hannes Reinecke wrote:

> >> @@ -232,6 +272,9 @@ int scsi_eh_scmd_add(struct scsi_cmnd *scmd, int eh_flag)
> >>  		if (scsi_host_set_state(shost, SHOST_CANCEL_RECOVERY))
> >>  			goto out_unlock;
> >>  
> >> +	if (sdev->eh_deadline && !shost->last_reset)
> >> +		shost->last_reset = jiffies;
> >> +
> > 
> > I think this is supposed to be if (shost->eh_deadline ...
> > 
> No. ->last_reset is set to the time the first command timeout/failure.
> 
> ->last_reset + ->eh_deadline will give you the expiry time.

Sorry, I wasn't clear what I meant.  What I meant was:

          if (shost->eh_deadline && !shost->last_reset)
                  shost->last_reset = jiffies;

...because the eh_deadline field is in struct Scsi_Host, not
in struct scsi_device.

-Ewan




^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2013-06-28 12:54 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-06-10 11:11 [PATCH 0/7] Limit overall SCSI EH runtime Hannes Reinecke
2013-06-10 11:11 ` [PATCH 1/7] dpt_i2o: Remove DPTI_STATE_IOCTL Hannes Reinecke
2013-06-10 11:11 ` [PATCH 2/7] dpt_i2o: return SCSI_MLQUEUE_HOST_BUSY when in reset Hannes Reinecke
2013-06-10 11:11 ` [PATCH 3/7] advansys: Remove 'last_reset' references Hannes Reinecke
2013-06-10 11:11 ` [PATCH 4/7] tmscsim: Move 'last_reset' into host structure Hannes Reinecke
2013-06-10 11:11 ` [PATCH 5/7] dc395: Move 'last_reset' into internal " Hannes Reinecke
2013-06-10 11:11 ` [PATCH 6/7] scsi: remove check for 'resetting' Hannes Reinecke
2013-06-10 11:11 ` [PATCH 7/7] scsi: Add 'eh_deadline' to limit SCSI EH runtime Hannes Reinecke
2013-06-27 14:33   ` Ewan Milne
2013-06-28  7:14     ` Hannes Reinecke
2013-06-28 12:54       ` Ewan Milne
2013-06-28  7:29   ` Bart Van Assche
2013-06-28  7:42     ` Hannes Reinecke
2013-06-27  9:23 ` [PATCH 0/7] Limit overall " Ren Mingxin

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.