All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 00/16] hpsa: May 3, 2011 updates
@ 2011-05-03 19:58 Stephen M. Cameron
  2011-05-03 19:58 ` [PATCH 01/16] hpsa: do readl after writel in main i/o path to ensure commands don't get lost Stephen M. Cameron
                   ` (15 more replies)
  0 siblings, 16 replies; 31+ messages in thread
From: Stephen M. Cameron @ 2011-05-03 19:58 UTC (permalink / raw)
  To: james.bottomley; +Cc: linux-scsi, linux-kernel, smcameron, thenzl, akpm, mikem

The following series mostly contains fixes to improve kdump behavior
and esp. to make older controllers which cannot be hard reset work by
doing a soft reset instead.   There are a few patches factoring out
various functionality into individual functions make way for the soft
reset functionality.  There is also a bugfix which prevents PCI write
combining from potentially causing commands to get lost. 

---

Stephen M. Cameron (16):
      hpsa: do readl after writel in main i/o path to ensure commands don't get lost.
      hpsa: add readl after writel in interrupt mask setting code
      hpsa: remove unused parameter from hpsa_complete_scsi_command()
      hpsa: delete old unused padding garbage
      hpsa: do a better job of detecting controller reset failure
      hpsa: wait longer for no-op to complete after resetting controller
      hpsa: factor out cmd pool allocation functions
      hpsa: factor out irq request code
      hpsa: increase time to wait for board reset
      hpsa: clarify messages around reset behavior
      hpsa: remove atrophied hpsa_scsi_setup function
      hpsa: use new doorbell-bit-5 reset method
      hpsa: do soft reset if hard reset is broken
      hpsa: remove superfluous sleeps around reset code
      hpsa: do not attempt PCI power management reset method if we know it won't work.
      hpsa: add P2000 to list of shared SAS devices


 drivers/scsi/hpsa.c     |  494 ++++++++++++++++++++++++++++++++++++++---------
 drivers/scsi/hpsa.h     |   15 +
 drivers/scsi/hpsa_cmd.h |   11 -
 3 files changed, 412 insertions(+), 108 deletions(-)

-- 
-- steve

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [PATCH 01/16] hpsa: do readl after writel in main i/o path to ensure commands don't get lost.
  2011-05-03 19:58 [PATCH 00/16] hpsa: May 3, 2011 updates Stephen M. Cameron
@ 2011-05-03 19:58 ` Stephen M. Cameron
  2011-05-04 11:15   ` Tomas Henzl
  2011-05-03 19:58 ` [PATCH 02/16] hpsa: add readl after writel in interrupt mask setting code Stephen M. Cameron
                   ` (14 subsequent siblings)
  15 siblings, 1 reply; 31+ messages in thread
From: Stephen M. Cameron @ 2011-05-03 19:58 UTC (permalink / raw)
  To: james.bottomley; +Cc: linux-scsi, linux-kernel, smcameron, thenzl, akpm, mikem

From: Stephen M. Cameron <scameron@beardog.cce.hp.com>

Apparently we've been doin it rong for a decade, but only lately do we
run into problems.

Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
---
 drivers/scsi/hpsa.h |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/drivers/scsi/hpsa.h b/drivers/scsi/hpsa.h
index 621a153..98c97ca 100644
--- a/drivers/scsi/hpsa.h
+++ b/drivers/scsi/hpsa.h
@@ -212,6 +212,7 @@ static void SA5_submit_command(struct ctlr_info *h,
 	dev_dbg(&h->pdev->dev, "Sending %x, tag = %x\n", c->busaddr,
 		c->Header.Tag.lower);
 	writel(c->busaddr, h->vaddr + SA5_REQUEST_PORT_OFFSET);
+	(void) readl(h->vaddr + SA5_REQUEST_PORT_OFFSET);
 	h->commands_outstanding++;
 	if (h->commands_outstanding > h->max_outstanding)
 		h->max_outstanding = h->commands_outstanding;


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH 02/16] hpsa: add readl after writel in interrupt mask setting code
  2011-05-03 19:58 [PATCH 00/16] hpsa: May 3, 2011 updates Stephen M. Cameron
  2011-05-03 19:58 ` [PATCH 01/16] hpsa: do readl after writel in main i/o path to ensure commands don't get lost Stephen M. Cameron
@ 2011-05-03 19:58 ` Stephen M. Cameron
  2011-05-03 19:59 ` [PATCH 03/16] hpsa: remove unused parameter from hpsa_complete_scsi_command() Stephen M. Cameron
                   ` (13 subsequent siblings)
  15 siblings, 0 replies; 31+ messages in thread
From: Stephen M. Cameron @ 2011-05-03 19:58 UTC (permalink / raw)
  To: james.bottomley; +Cc: linux-scsi, linux-kernel, smcameron, thenzl, akpm, mikem

From: Stephen M. Cameron <scameron@beardog.cce.hp.com>

This is to ensure the board interrupts are really off when
these functions return.

Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
---
 drivers/scsi/hpsa.h |    4 ++++
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/drivers/scsi/hpsa.h b/drivers/scsi/hpsa.h
index 98c97ca..7eec397 100644
--- a/drivers/scsi/hpsa.h
+++ b/drivers/scsi/hpsa.h
@@ -228,10 +228,12 @@ static void SA5_intr_mask(struct ctlr_info *h, unsigned long val)
 	if (val) { /* Turn interrupts on */
 		h->interrupts_enabled = 1;
 		writel(0, h->vaddr + SA5_REPLY_INTR_MASK_OFFSET);
+		(void) readl(h->vaddr + SA5_REPLY_INTR_MASK_OFFSET);
 	} else { /* Turn them off */
 		h->interrupts_enabled = 0;
 		writel(SA5_INTR_OFF,
 			h->vaddr + SA5_REPLY_INTR_MASK_OFFSET);
+		(void) readl(h->vaddr + SA5_REPLY_INTR_MASK_OFFSET);
 	}
 }
 
@@ -240,10 +242,12 @@ static void SA5_performant_intr_mask(struct ctlr_info *h, unsigned long val)
 	if (val) { /* turn on interrupts */
 		h->interrupts_enabled = 1;
 		writel(0, h->vaddr + SA5_REPLY_INTR_MASK_OFFSET);
+		(void) readl(h->vaddr + SA5_REPLY_INTR_MASK_OFFSET);
 	} else {
 		h->interrupts_enabled = 0;
 		writel(SA5_PERF_INTR_OFF,
 			h->vaddr + SA5_REPLY_INTR_MASK_OFFSET);
+		(void) readl(h->vaddr + SA5_REPLY_INTR_MASK_OFFSET);
 	}
 }
 


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH 03/16] hpsa: remove unused parameter from hpsa_complete_scsi_command()
  2011-05-03 19:58 [PATCH 00/16] hpsa: May 3, 2011 updates Stephen M. Cameron
  2011-05-03 19:58 ` [PATCH 01/16] hpsa: do readl after writel in main i/o path to ensure commands don't get lost Stephen M. Cameron
  2011-05-03 19:58 ` [PATCH 02/16] hpsa: add readl after writel in interrupt mask setting code Stephen M. Cameron
@ 2011-05-03 19:59 ` Stephen M. Cameron
  2011-05-03 19:59 ` [PATCH 04/16] hpsa: delete old unused padding garbage Stephen M. Cameron
                   ` (12 subsequent siblings)
  15 siblings, 0 replies; 31+ messages in thread
From: Stephen M. Cameron @ 2011-05-03 19:59 UTC (permalink / raw)
  To: james.bottomley; +Cc: linux-scsi, linux-kernel, smcameron, thenzl, akpm, mikem

From: Stephen M. Cameron <scameron@beardog.cce.hp.com>

Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
---
 drivers/scsi/hpsa.c |    5 ++---
 1 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/scsi/hpsa.c b/drivers/scsi/hpsa.c
index 415ad4f..8266850 100644
--- a/drivers/scsi/hpsa.c
+++ b/drivers/scsi/hpsa.c
@@ -1006,8 +1006,7 @@ static void hpsa_unmap_sg_chain_block(struct ctlr_info *h,
 	pci_unmap_single(h->pdev, temp64.val, chain_sg->Len, PCI_DMA_TODEVICE);
 }
 
-static void complete_scsi_command(struct CommandList *cp,
-	int timeout, u32 tag)
+static void complete_scsi_command(struct CommandList *cp)
 {
 	struct scsi_cmnd *cmd;
 	struct ctlr_info *h;
@@ -2936,7 +2935,7 @@ static inline void finish_cmd(struct CommandList *c, u32 raw_tag)
 {
 	removeQ(c);
 	if (likely(c->cmd_type == CMD_SCSI))
-		complete_scsi_command(c, 0, raw_tag);
+		complete_scsi_command(c);
 	else if (c->cmd_type == CMD_IOCTL_PEND)
 		complete(c->waiting);
 }


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH 04/16] hpsa: delete old unused padding garbage
  2011-05-03 19:58 [PATCH 00/16] hpsa: May 3, 2011 updates Stephen M. Cameron
                   ` (2 preceding siblings ...)
  2011-05-03 19:59 ` [PATCH 03/16] hpsa: remove unused parameter from hpsa_complete_scsi_command() Stephen M. Cameron
@ 2011-05-03 19:59 ` Stephen M. Cameron
  2011-05-03 19:59 ` [PATCH 05/16] hpsa: do a better job of detecting controller reset failure Stephen M. Cameron
                   ` (11 subsequent siblings)
  15 siblings, 0 replies; 31+ messages in thread
From: Stephen M. Cameron @ 2011-05-03 19:59 UTC (permalink / raw)
  To: james.bottomley; +Cc: linux-scsi, linux-kernel, smcameron, thenzl, akpm, mikem

From: Stephen M. Cameron <scameron@beardog.cce.hp.com>

Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
---
 drivers/scsi/hpsa_cmd.h |    8 --------
 1 files changed, 0 insertions(+), 8 deletions(-)

diff --git a/drivers/scsi/hpsa_cmd.h b/drivers/scsi/hpsa_cmd.h
index 1846490..0500740 100644
--- a/drivers/scsi/hpsa_cmd.h
+++ b/drivers/scsi/hpsa_cmd.h
@@ -256,14 +256,6 @@ struct ErrorInfo {
 #define CMD_IOCTL_PEND  0x01
 #define CMD_SCSI	0x03
 
-/* This structure needs to be divisible by 32 for new
- * indexing method and performant mode.
- */
-#define PAD32 32
-#define PAD64DIFF 0
-#define USEEXTRA ((sizeof(void *) - 4)/4)
-#define PADSIZE (PAD32 + PAD64DIFF * USEEXTRA)
-
 #define DIRECT_LOOKUP_SHIFT 5
 #define DIRECT_LOOKUP_BIT 0x10
 #define DIRECT_LOOKUP_MASK (~((1 << DIRECT_LOOKUP_SHIFT) - 1))


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH 05/16] hpsa: do a better job of detecting controller reset failure
  2011-05-03 19:58 [PATCH 00/16] hpsa: May 3, 2011 updates Stephen M. Cameron
                   ` (3 preceding siblings ...)
  2011-05-03 19:59 ` [PATCH 04/16] hpsa: delete old unused padding garbage Stephen M. Cameron
@ 2011-05-03 19:59 ` Stephen M. Cameron
  2011-05-03 19:59 ` [PATCH 06/16] hpsa: wait longer for no-op to complete after resetting controller Stephen M. Cameron
                   ` (10 subsequent siblings)
  15 siblings, 0 replies; 31+ messages in thread
From: Stephen M. Cameron @ 2011-05-03 19:59 UTC (permalink / raw)
  To: james.bottomley; +Cc: linux-scsi, linux-kernel, smcameron, thenzl, akpm, mikem

From: Stephen M. Cameron <scameron@beardog.cce.hp.com>

Detect failure of controller reset by noticing if the 32 bytes of
"driver version" we store on the hardware in the config table
fail to get zeroed out.  Previously we noticed if the controller
did not transition to "simple mode", but this did not detect reset
failure if the controller was already in simple mode prior to
the reset attempt (e.g. due to module parameter hpsa_simple_mode=1).

Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
---
 drivers/scsi/hpsa.c     |   76 +++++++++++++++++++++++++++++++++++++++++------
 drivers/scsi/hpsa_cmd.h |    1 +
 2 files changed, 67 insertions(+), 10 deletions(-)

diff --git a/drivers/scsi/hpsa.c b/drivers/scsi/hpsa.c
index 8266850..2aeb210 100644
--- a/drivers/scsi/hpsa.c
+++ b/drivers/scsi/hpsa.c
@@ -3184,6 +3184,59 @@ static int hpsa_controller_hard_reset(struct pci_dev *pdev,
 	return 0;
 }
 
+static __devinit void init_driver_version(char *driver_version, int len)
+{
+	memset(driver_version, 0, len);
+	strncpy(driver_version, "hpsa " HPSA_DRIVER_VERSION, len - 1);
+}
+
+static __devinit int write_driver_ver_to_cfgtable(
+	struct CfgTable __iomem *cfgtable)
+{
+	char *driver_version;
+	int i, size = sizeof(cfgtable->driver_version);
+
+	driver_version = kmalloc(size, GFP_KERNEL);
+	if (!driver_version)
+		return -ENOMEM;
+
+	init_driver_version(driver_version, size);
+	for (i = 0; i < size; i++)
+		writeb(driver_version[i], &cfgtable->driver_version[i]);
+	kfree(driver_version);
+	return 0;
+}
+
+static __devinit void read_driver_ver_from_cfgtable(
+	struct CfgTable __iomem *cfgtable, unsigned char *driver_ver)
+{
+	int i;
+
+	for (i = 0; i < sizeof(cfgtable->driver_version); i++)
+		driver_ver[i] = readb(&cfgtable->driver_version[i]);
+}
+
+static __devinit int controller_reset_failed(
+	struct CfgTable __iomem *cfgtable)
+{
+
+	char *driver_ver, *old_driver_ver;
+	int rc, size = sizeof(cfgtable->driver_version);
+
+	old_driver_ver = kmalloc(2 * size, GFP_KERNEL);
+	if (!old_driver_ver)
+		return -ENOMEM;
+	driver_ver = old_driver_ver + size;
+
+	/* After a reset, the 32 bytes of "driver version" in the cfgtable
+	 * should have been changed, otherwise we know the reset failed.
+	 */
+	init_driver_version(old_driver_ver, size);
+	read_driver_ver_from_cfgtable(cfgtable, driver_ver);
+	rc = !memcmp(driver_ver, old_driver_ver, size);
+	kfree(old_driver_ver);
+	return rc;
+}
 /* This does a hard reset of the controller using PCI power management
  * states or the using the doorbell register.
  */
@@ -3194,7 +3247,7 @@ static __devinit int hpsa_kdump_hard_reset_controller(struct pci_dev *pdev)
 	u64 cfg_base_addr_index;
 	void __iomem *vaddr;
 	unsigned long paddr;
-	u32 misc_fw_support, active_transport;
+	u32 misc_fw_support;
 	int rc;
 	struct CfgTable __iomem *cfgtable;
 	bool use_doorbell;
@@ -3256,6 +3309,9 @@ static __devinit int hpsa_kdump_hard_reset_controller(struct pci_dev *pdev)
 		rc = -ENOMEM;
 		goto unmap_vaddr;
 	}
+	rc = write_driver_ver_to_cfgtable(cfgtable);
+	if (rc)
+		goto unmap_vaddr;
 
 	/* If reset via doorbell register is supported, use that. */
 	misc_fw_support = readl(&cfgtable->misc_fw_support);
@@ -3289,19 +3345,16 @@ static __devinit int hpsa_kdump_hard_reset_controller(struct pci_dev *pdev)
 			"failed waiting for board to become ready\n");
 		goto unmap_cfgtable;
 	}
-	dev_info(&pdev->dev, "board ready.\n");
 
-	/* Controller should be in simple mode at this point.  If it's not,
-	 * It means we're on one of those controllers which doesn't support
-	 * the doorbell reset method and on which the PCI power management reset
-	 * method doesn't work (P800, for example.)
-	 * In those cases, don't try to proceed, as it generally doesn't work.
-	 */
-	active_transport = readl(&cfgtable->TransportActive);
-	if (active_transport & PERFORMANT_MODE) {
+	rc = controller_reset_failed(vaddr);
+	if (rc < 0)
+		goto unmap_cfgtable;
+	if (rc) {
 		dev_warn(&pdev->dev, "Unable to successfully reset controller,"
 			" Ignoring controller.\n");
 		rc = -ENODEV;
+	} else {
+		dev_info(&pdev->dev, "board ready.\n");
 	}
 
 unmap_cfgtable:
@@ -3542,6 +3595,9 @@ static int __devinit hpsa_find_cfgtables(struct ctlr_info *h)
 		       cfg_base_addr_index) + cfg_offset, sizeof(*h->cfgtable));
 	if (!h->cfgtable)
 		return -ENOMEM;
+	rc = write_driver_ver_to_cfgtable(h->cfgtable);
+	if (rc)
+		return rc;
 	/* Find performant mode table. */
 	trans_offset = readl(&h->cfgtable->TransMethodOffset);
 	h->transtable = remap_pci_mem(pci_resource_start(h->pdev,
diff --git a/drivers/scsi/hpsa_cmd.h b/drivers/scsi/hpsa_cmd.h
index 0500740..8fd35a7 100644
--- a/drivers/scsi/hpsa_cmd.h
+++ b/drivers/scsi/hpsa_cmd.h
@@ -337,6 +337,7 @@ struct CfgTable {
 	u8		reserved[0x78 - 0x58];
 	u32		misc_fw_support; /* offset 0x78 */
 #define			MISC_FW_DOORBELL_RESET (0x02)
+	u8		driver_version[32];
 };
 
 #define NUM_BLOCKFETCH_ENTRIES 8


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH 06/16] hpsa: wait longer for no-op to complete after resetting controller
  2011-05-03 19:58 [PATCH 00/16] hpsa: May 3, 2011 updates Stephen M. Cameron
                   ` (4 preceding siblings ...)
  2011-05-03 19:59 ` [PATCH 05/16] hpsa: do a better job of detecting controller reset failure Stephen M. Cameron
@ 2011-05-03 19:59 ` Stephen M. Cameron
  2011-05-03 19:59 ` [PATCH 07/16] hpsa: factor out cmd pool allocation functions Stephen M. Cameron
                   ` (9 subsequent siblings)
  15 siblings, 0 replies; 31+ messages in thread
From: Stephen M. Cameron @ 2011-05-03 19:59 UTC (permalink / raw)
  To: james.bottomley; +Cc: linux-scsi, linux-kernel, smcameron, thenzl, akpm, mikem

From: Stephen M. Cameron <scameron@beardog.cce.hp.com>

This is to avoid the usual two or three messages about the command
timing out.  We're obviously not waiting long enough.

Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
---
 drivers/scsi/hpsa.h |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/scsi/hpsa.h b/drivers/scsi/hpsa.h
index 7eec397..7a4ee73 100644
--- a/drivers/scsi/hpsa.h
+++ b/drivers/scsi/hpsa.h
@@ -130,7 +130,7 @@ struct ctlr_info {
 #define HPSA_BUS_RESET_MSG 2
 #define HPSA_HOST_RESET_MSG 3
 #define HPSA_MSG_SEND_RETRY_LIMIT 10
-#define HPSA_MSG_SEND_RETRY_INTERVAL_MSECS 1000
+#define HPSA_MSG_SEND_RETRY_INTERVAL_MSECS (10000)
 
 /* Maximum time in seconds driver will wait for command completions
  * when polling before giving up.


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH 07/16] hpsa: factor out cmd pool allocation functions
  2011-05-03 19:58 [PATCH 00/16] hpsa: May 3, 2011 updates Stephen M. Cameron
                   ` (5 preceding siblings ...)
  2011-05-03 19:59 ` [PATCH 06/16] hpsa: wait longer for no-op to complete after resetting controller Stephen M. Cameron
@ 2011-05-03 19:59 ` Stephen M. Cameron
  2011-05-03 19:59 ` [PATCH 08/16] hpsa: factor out irq request code Stephen M. Cameron
                   ` (8 subsequent siblings)
  15 siblings, 0 replies; 31+ messages in thread
From: Stephen M. Cameron @ 2011-05-03 19:59 UTC (permalink / raw)
  To: james.bottomley; +Cc: linux-scsi, linux-kernel, smcameron, thenzl, akpm, mikem

From: Stephen M. Cameron <scameron@beardog.cce.hp.com>

Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
---
 drivers/scsi/hpsa.c |   66 ++++++++++++++++++++++++++++-----------------------
 1 files changed, 36 insertions(+), 30 deletions(-)

diff --git a/drivers/scsi/hpsa.c b/drivers/scsi/hpsa.c
index 2aeb210..7336f3c 100644
--- a/drivers/scsi/hpsa.c
+++ b/drivers/scsi/hpsa.c
@@ -3847,6 +3847,40 @@ static __devinit int hpsa_init_reset_devices(struct pci_dev *pdev)
 	return 0;
 }
 
+static __devinit int hpsa_allocate_cmd_pool(struct ctlr_info *h)
+{
+	h->cmd_pool_bits = kzalloc(
+		DIV_ROUND_UP(h->nr_cmds, BITS_PER_LONG) *
+		sizeof(unsigned long), GFP_KERNEL);
+	h->cmd_pool = pci_alloc_consistent(h->pdev,
+		    h->nr_cmds * sizeof(*h->cmd_pool),
+		    &(h->cmd_pool_dhandle));
+	h->errinfo_pool = pci_alloc_consistent(h->pdev,
+		    h->nr_cmds * sizeof(*h->errinfo_pool),
+		    &(h->errinfo_pool_dhandle));
+	if ((h->cmd_pool_bits == NULL)
+	    || (h->cmd_pool == NULL)
+	    || (h->errinfo_pool == NULL)) {
+		dev_err(&h->pdev->dev, "out of memory in %s", __func__);
+		return -ENOMEM;
+	}
+	return 0;
+}
+
+static void hpsa_free_cmd_pool(struct ctlr_info *h)
+{
+	kfree(h->cmd_pool_bits);
+	if (h->cmd_pool)
+		pci_free_consistent(h->pdev,
+			    h->nr_cmds * sizeof(struct CommandList),
+			    h->cmd_pool, h->cmd_pool_dhandle);
+	if (h->errinfo_pool)
+		pci_free_consistent(h->pdev,
+			    h->nr_cmds * sizeof(struct ErrorInfo),
+			    h->errinfo_pool,
+			    h->errinfo_pool_dhandle);
+}
+
 static int __devinit hpsa_init_one(struct pci_dev *pdev,
 				    const struct pci_device_id *ent)
 {
@@ -3917,33 +3951,14 @@ static int __devinit hpsa_init_one(struct pci_dev *pdev,
 	dev_info(&pdev->dev, "%s: <0x%x> at IRQ %d%s using DAC\n",
 	       h->devname, pdev->device,
 	       h->intr[h->intr_mode], dac ? "" : " not");
-
-	h->cmd_pool_bits =
-	    kmalloc(((h->nr_cmds + BITS_PER_LONG -
-		      1) / BITS_PER_LONG) * sizeof(unsigned long), GFP_KERNEL);
-	h->cmd_pool = pci_alloc_consistent(h->pdev,
-		    h->nr_cmds * sizeof(*h->cmd_pool),
-		    &(h->cmd_pool_dhandle));
-	h->errinfo_pool = pci_alloc_consistent(h->pdev,
-		    h->nr_cmds * sizeof(*h->errinfo_pool),
-		    &(h->errinfo_pool_dhandle));
-	if ((h->cmd_pool_bits == NULL)
-	    || (h->cmd_pool == NULL)
-	    || (h->errinfo_pool == NULL)) {
-		dev_err(&pdev->dev, "out of memory");
-		rc = -ENOMEM;
+	if (hpsa_allocate_cmd_pool(h))
 		goto clean4;
-	}
 	if (hpsa_allocate_sg_chain_blocks(h))
 		goto clean4;
 	init_waitqueue_head(&h->scan_wait_queue);
 	h->scan_finished = 1; /* no scan currently in progress */
 
 	pci_set_drvdata(pdev, h);
-	memset(h->cmd_pool_bits, 0,
-	       ((h->nr_cmds + BITS_PER_LONG -
-		 1) / BITS_PER_LONG) * sizeof(unsigned long));
-
 	hpsa_scsi_setup(h);
 
 	/* Turn the interrupts on so we can service requests */
@@ -3957,16 +3972,7 @@ static int __devinit hpsa_init_one(struct pci_dev *pdev,
 
 clean4:
 	hpsa_free_sg_chain_blocks(h);
-	kfree(h->cmd_pool_bits);
-	if (h->cmd_pool)
-		pci_free_consistent(h->pdev,
-			    h->nr_cmds * sizeof(struct CommandList),
-			    h->cmd_pool, h->cmd_pool_dhandle);
-	if (h->errinfo_pool)
-		pci_free_consistent(h->pdev,
-			    h->nr_cmds * sizeof(struct ErrorInfo),
-			    h->errinfo_pool,
-			    h->errinfo_pool_dhandle);
+	hpsa_free_cmd_pool(h);
 	free_irq(h->intr[h->intr_mode], h);
 clean2:
 clean1:


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH 08/16] hpsa: factor out irq request code
  2011-05-03 19:58 [PATCH 00/16] hpsa: May 3, 2011 updates Stephen M. Cameron
                   ` (6 preceding siblings ...)
  2011-05-03 19:59 ` [PATCH 07/16] hpsa: factor out cmd pool allocation functions Stephen M. Cameron
@ 2011-05-03 19:59 ` Stephen M. Cameron
  2011-05-03 19:59 ` [PATCH 09/16] hpsa: increase time to wait for board reset Stephen M. Cameron
                   ` (7 subsequent siblings)
  15 siblings, 0 replies; 31+ messages in thread
From: Stephen M. Cameron @ 2011-05-03 19:59 UTC (permalink / raw)
  To: james.bottomley; +Cc: linux-scsi, linux-kernel, smcameron, thenzl, akpm, mikem

From: Stephen M. Cameron <scameron@beardog.cce.hp.com>

Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
---
 drivers/scsi/hpsa.c |   32 +++++++++++++++++++++-----------
 1 files changed, 21 insertions(+), 11 deletions(-)

diff --git a/drivers/scsi/hpsa.c b/drivers/scsi/hpsa.c
index 7336f3c..97db2e5 100644
--- a/drivers/scsi/hpsa.c
+++ b/drivers/scsi/hpsa.c
@@ -3881,6 +3881,26 @@ static void hpsa_free_cmd_pool(struct ctlr_info *h)
 			    h->errinfo_pool_dhandle);
 }
 
+static int hpsa_request_irq(struct ctlr_info *h,
+	irqreturn_t (*msixhandler)(int, void *),
+	irqreturn_t (*intxhandler)(int, void *))
+{
+	int rc;
+
+	if (h->msix_vector || h->msi_vector)
+		rc = request_irq(h->intr[h->intr_mode], msixhandler,
+				IRQF_DISABLED, h->devname, h);
+	else
+		rc = request_irq(h->intr[h->intr_mode], intxhandler,
+				IRQF_DISABLED, h->devname, h);
+	if (rc) {
+		dev_err(&h->pdev->dev, "unable to get irq %d for %s\n",
+		       h->intr[h->intr_mode], h->devname);
+		return -ENODEV;
+	}
+	return 0;
+}
+
 static int __devinit hpsa_init_one(struct pci_dev *pdev,
 				    const struct pci_device_id *ent)
 {
@@ -3936,18 +3956,8 @@ static int __devinit hpsa_init_one(struct pci_dev *pdev,
 	/* make sure the board interrupts are off */
 	h->access.set_intr_mask(h, HPSA_INTR_OFF);
 
-	if (h->msix_vector || h->msi_vector)
-		rc = request_irq(h->intr[h->intr_mode], do_hpsa_intr_msi,
-				IRQF_DISABLED, h->devname, h);
-	else
-		rc = request_irq(h->intr[h->intr_mode], do_hpsa_intr_intx,
-				IRQF_DISABLED, h->devname, h);
-	if (rc) {
-		dev_err(&pdev->dev, "unable to get irq %d for %s\n",
-		       h->intr[h->intr_mode], h->devname);
+	if (hpsa_request_irq(h, do_hpsa_intr_msi, do_hpsa_intr_intx))
 		goto clean2;
-	}
-
 	dev_info(&pdev->dev, "%s: <0x%x> at IRQ %d%s using DAC\n",
 	       h->devname, pdev->device,
 	       h->intr[h->intr_mode], dac ? "" : " not");


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH 09/16] hpsa: increase time to wait for board reset
  2011-05-03 19:58 [PATCH 00/16] hpsa: May 3, 2011 updates Stephen M. Cameron
                   ` (7 preceding siblings ...)
  2011-05-03 19:59 ` [PATCH 08/16] hpsa: factor out irq request code Stephen M. Cameron
@ 2011-05-03 19:59 ` Stephen M. Cameron
  2011-05-03 19:59 ` [PATCH 10/16] hpsa: clarify messages around reset behavior Stephen M. Cameron
                   ` (6 subsequent siblings)
  15 siblings, 0 replies; 31+ messages in thread
From: Stephen M. Cameron @ 2011-05-03 19:59 UTC (permalink / raw)
  To: james.bottomley; +Cc: linux-scsi, linux-kernel, smcameron, thenzl, akpm, mikem

From: Stephen M. Cameron <scameron@beardog.cce.hp.com>

Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
---
 drivers/scsi/hpsa.h |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/scsi/hpsa.h b/drivers/scsi/hpsa.h
index 7a4ee73..b1412a7 100644
--- a/drivers/scsi/hpsa.h
+++ b/drivers/scsi/hpsa.h
@@ -155,7 +155,7 @@ struct ctlr_info {
  * HPSA_BOARD_READY_ITERATIONS are derived from those.
  */
 #define HPSA_BOARD_READY_WAIT_SECS (120)
-#define HPSA_BOARD_NOT_READY_WAIT_SECS (10)
+#define HPSA_BOARD_NOT_READY_WAIT_SECS (100)
 #define HPSA_BOARD_READY_POLL_INTERVAL_MSECS (100)
 #define HPSA_BOARD_READY_POLL_INTERVAL \
 	((HPSA_BOARD_READY_POLL_INTERVAL_MSECS * HZ) / 1000)


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH 10/16] hpsa: clarify messages around reset behavior
  2011-05-03 19:58 [PATCH 00/16] hpsa: May 3, 2011 updates Stephen M. Cameron
                   ` (8 preceding siblings ...)
  2011-05-03 19:59 ` [PATCH 09/16] hpsa: increase time to wait for board reset Stephen M. Cameron
@ 2011-05-03 19:59 ` Stephen M. Cameron
  2011-05-03 19:59 ` [PATCH 11/16] hpsa: remove atrophied hpsa_scsi_setup function Stephen M. Cameron
                   ` (5 subsequent siblings)
  15 siblings, 0 replies; 31+ messages in thread
From: Stephen M. Cameron @ 2011-05-03 19:59 UTC (permalink / raw)
  To: james.bottomley; +Cc: linux-scsi, linux-kernel, smcameron, thenzl, akpm, mikem

From: Stephen M. Cameron <scameron@beardog.cce.hp.com>

When waiting for the board to become "not ready"
don't print a message saying "waiting for board to
become ready" (possibly followed by a message saying
"failed waiting for board to become not ready".  Instead,
it should be "waiting for board to reset" and "failed
waiting for board to reset."

Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
---
 drivers/scsi/hpsa.c |    5 +++--
 1 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/scsi/hpsa.c b/drivers/scsi/hpsa.c
index 97db2e5..2a90e9a 100644
--- a/drivers/scsi/hpsa.c
+++ b/drivers/scsi/hpsa.c
@@ -3334,11 +3334,11 @@ static __devinit int hpsa_kdump_hard_reset_controller(struct pci_dev *pdev)
 	msleep(HPSA_POST_RESET_PAUSE_MSECS);
 
 	/* Wait for board to become not ready, then ready. */
-	dev_info(&pdev->dev, "Waiting for board to become ready.\n");
+	dev_info(&pdev->dev, "Waiting for board to reset.\n");
 	rc = hpsa_wait_for_board_state(pdev, vaddr, BOARD_NOT_READY);
 	if (rc)
 		dev_warn(&pdev->dev,
-			"failed waiting for board to become not ready\n");
+			"failed waiting for board to reset\n");
 	rc = hpsa_wait_for_board_state(pdev, vaddr, BOARD_READY);
 	if (rc) {
 		dev_warn(&pdev->dev,
@@ -3837,6 +3837,7 @@ static __devinit int hpsa_init_reset_devices(struct pci_dev *pdev)
 		return -ENODEV;
 
 	/* Now try to get the controller to respond to a no-op */
+	dev_warn(&pdev->dev, "Waiting for controller to respond to no-op\n");
 	for (i = 0; i < HPSA_POST_RESET_NOOP_RETRIES; i++) {
 		if (hpsa_noop(pdev) == 0)
 			break;


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH 11/16] hpsa: remove atrophied hpsa_scsi_setup function
  2011-05-03 19:58 [PATCH 00/16] hpsa: May 3, 2011 updates Stephen M. Cameron
                   ` (9 preceding siblings ...)
  2011-05-03 19:59 ` [PATCH 10/16] hpsa: clarify messages around reset behavior Stephen M. Cameron
@ 2011-05-03 19:59 ` Stephen M. Cameron
  2011-05-03 19:59 ` [PATCH 12/16] hpsa: use new doorbell-bit-5 reset method Stephen M. Cameron
                   ` (4 subsequent siblings)
  15 siblings, 0 replies; 31+ messages in thread
From: Stephen M. Cameron @ 2011-05-03 19:59 UTC (permalink / raw)
  To: james.bottomley; +Cc: linux-scsi, linux-kernel, smcameron, thenzl, akpm, mikem

From: Stephen M. Cameron <scameron@beardog.cce.hp.com>

hpsa_scsi_setup at one time contained enough code to justify
its existence, but that time has passed.

Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
---
 drivers/scsi/hpsa.c |   11 +++--------
 1 files changed, 3 insertions(+), 8 deletions(-)

diff --git a/drivers/scsi/hpsa.c b/drivers/scsi/hpsa.c
index 2a90e9a..ca92141 100644
--- a/drivers/scsi/hpsa.c
+++ b/drivers/scsi/hpsa.c
@@ -929,13 +929,6 @@ static void hpsa_slave_destroy(struct scsi_device *sdev)
 	/* nothing to do. */
 }
 
-static void hpsa_scsi_setup(struct ctlr_info *h)
-{
-	h->ndevices = 0;
-	h->scsi_host = NULL;
-	spin_lock_init(&h->devlock);
-}
-
 static void hpsa_free_sg_chain_blocks(struct ctlr_info *h)
 {
 	int i;
@@ -3970,7 +3963,9 @@ static int __devinit hpsa_init_one(struct pci_dev *pdev,
 	h->scan_finished = 1; /* no scan currently in progress */
 
 	pci_set_drvdata(pdev, h);
-	hpsa_scsi_setup(h);
+	h->ndevices = 0;
+	h->scsi_host = NULL;
+	spin_lock_init(&h->devlock);
 
 	/* Turn the interrupts on so we can service requests */
 	h->access.set_intr_mask(h, HPSA_INTR_ON);


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH 12/16] hpsa: use new doorbell-bit-5 reset method
  2011-05-03 19:58 [PATCH 00/16] hpsa: May 3, 2011 updates Stephen M. Cameron
                   ` (10 preceding siblings ...)
  2011-05-03 19:59 ` [PATCH 11/16] hpsa: remove atrophied hpsa_scsi_setup function Stephen M. Cameron
@ 2011-05-03 19:59 ` Stephen M. Cameron
  2011-05-03 19:59 ` [PATCH 13/16] hpsa: do soft reset if hard reset is broken Stephen M. Cameron
                   ` (3 subsequent siblings)
  15 siblings, 0 replies; 31+ messages in thread
From: Stephen M. Cameron @ 2011-05-03 19:59 UTC (permalink / raw)
  To: james.bottomley; +Cc: linux-scsi, linux-kernel, smcameron, thenzl, akpm, mikem

From: Stephen M. Cameron <scameron@beardog.cce.hp.com>

The bit-2-doorbell reset method seemed to cause (survivable) NMIs
on some systems and (unsurvivable) IOCK NMIs on some G7 servers.
Firmware guys implemented a new doorbell method to alleviate these
problems triggered by bit 5 of the doorbell register.  We want to
use it if it's available.

Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
---
 drivers/scsi/hpsa.c     |   25 ++++++++++++++++++++-----
 drivers/scsi/hpsa_cmd.h |    2 ++
 2 files changed, 22 insertions(+), 5 deletions(-)

diff --git a/drivers/scsi/hpsa.c b/drivers/scsi/hpsa.c
index ca92141..c096cda 100644
--- a/drivers/scsi/hpsa.c
+++ b/drivers/scsi/hpsa.c
@@ -3128,7 +3128,7 @@ static __devinit int hpsa_message(struct pci_dev *pdev, unsigned char opcode,
 #define hpsa_noop(p) hpsa_message(p, 3, 0)
 
 static int hpsa_controller_hard_reset(struct pci_dev *pdev,
-	void * __iomem vaddr, bool use_doorbell)
+	void * __iomem vaddr, u32 use_doorbell)
 {
 	u16 pmcsr;
 	int pos;
@@ -3139,7 +3139,7 @@ static int hpsa_controller_hard_reset(struct pci_dev *pdev,
 		 * other way using the doorbell register.
 		 */
 		dev_info(&pdev->dev, "using doorbell to reset controller\n");
-		writel(DOORBELL_CTLR_RESET, vaddr + SA5_DOORBELL);
+		writel(use_doorbell, vaddr + SA5_DOORBELL);
 		msleep(1000);
 	} else { /* Try to do it the PCI power state way */
 
@@ -3243,7 +3243,7 @@ static __devinit int hpsa_kdump_hard_reset_controller(struct pci_dev *pdev)
 	u32 misc_fw_support;
 	int rc;
 	struct CfgTable __iomem *cfgtable;
-	bool use_doorbell;
+	u32 use_doorbell;
 	u32 board_id;
 	u16 command_register;
 
@@ -3306,9 +3306,24 @@ static __devinit int hpsa_kdump_hard_reset_controller(struct pci_dev *pdev)
 	if (rc)
 		goto unmap_vaddr;
 
-	/* If reset via doorbell register is supported, use that. */
+	/* If reset via doorbell register is supported, use that.
+	 * There are two such methods.  Favor the newest method.
+	 */
 	misc_fw_support = readl(&cfgtable->misc_fw_support);
-	use_doorbell = misc_fw_support & MISC_FW_DOORBELL_RESET;
+	use_doorbell = misc_fw_support & MISC_FW_DOORBELL_RESET2;
+	if (use_doorbell) {
+		use_doorbell = DOORBELL_CTLR_RESET2;
+	} else {
+		use_doorbell = misc_fw_support & MISC_FW_DOORBELL_RESET;
+		if (use_doorbell) {
+			dev_warn(&pdev->dev, "Controller claims that "
+				"'Bit 2 doorbell reset' is "
+				"supported, but not 'bit 5 doorbell reset'.  "
+				"Firmware update is recommended.\n");
+			rc = -ENODEV;
+			goto unmap_cfgtable;
+		}
+	}
 
 	rc = hpsa_controller_hard_reset(pdev, vaddr, use_doorbell);
 	if (rc)
diff --git a/drivers/scsi/hpsa_cmd.h b/drivers/scsi/hpsa_cmd.h
index 8fd35a7..55d741b 100644
--- a/drivers/scsi/hpsa_cmd.h
+++ b/drivers/scsi/hpsa_cmd.h
@@ -101,6 +101,7 @@
 #define CFGTBL_ChangeReq        0x00000001l
 #define CFGTBL_AccCmds          0x00000001l
 #define DOORBELL_CTLR_RESET	0x00000004l
+#define DOORBELL_CTLR_RESET2	0x00000020l
 
 #define CFGTBL_Trans_Simple     0x00000002l
 #define CFGTBL_Trans_Performant 0x00000004l
@@ -337,6 +338,7 @@ struct CfgTable {
 	u8		reserved[0x78 - 0x58];
 	u32		misc_fw_support; /* offset 0x78 */
 #define			MISC_FW_DOORBELL_RESET (0x02)
+#define			MISC_FW_DOORBELL_RESET2 (0x010)
 	u8		driver_version[32];
 };
 


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH 13/16] hpsa: do soft reset if hard reset is broken
  2011-05-03 19:58 [PATCH 00/16] hpsa: May 3, 2011 updates Stephen M. Cameron
                   ` (11 preceding siblings ...)
  2011-05-03 19:59 ` [PATCH 12/16] hpsa: use new doorbell-bit-5 reset method Stephen M. Cameron
@ 2011-05-03 19:59 ` Stephen M. Cameron
  2011-05-03 19:59 ` [PATCH 14/16] hpsa: remove superfluous sleeps around reset code Stephen M. Cameron
                   ` (2 subsequent siblings)
  15 siblings, 0 replies; 31+ messages in thread
From: Stephen M. Cameron @ 2011-05-03 19:59 UTC (permalink / raw)
  To: james.bottomley; +Cc: linux-scsi, linux-kernel, smcameron, thenzl, akpm, mikem

From: Stephen M. Cameron <scameron@beardog.cce.hp.com>

on driver load, if reset_devices is set, and the hard reset
attempts fail, try to bring up the controller to the point that
a command can be sent, and send it a soft reset command, then
after the reset undo whatever driver initialization was done to get
it to the point to take a command, and re-do it after the reset.

This is to get kdump to work on all the "non-resettable" controllers
(except 64xx controllers which can't be reset due to the potentially
shared cache module.)

Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
---
 drivers/scsi/hpsa.c |  226 ++++++++++++++++++++++++++++++++++++++++++++++++---
 drivers/scsi/hpsa.h |    6 +
 2 files changed, 216 insertions(+), 16 deletions(-)

diff --git a/drivers/scsi/hpsa.c b/drivers/scsi/hpsa.c
index c096cda..6fe77d0 100644
--- a/drivers/scsi/hpsa.c
+++ b/drivers/scsi/hpsa.c
@@ -2743,6 +2743,26 @@ static int hpsa_ioctl(struct scsi_device *dev, int cmd, void *arg)
 	}
 }
 
+static int __devinit hpsa_send_host_reset(struct ctlr_info *h,
+	unsigned char *scsi3addr, u8 reset_type)
+{
+	struct CommandList *c;
+
+	c = cmd_alloc(h);
+	if (!c)
+		return -ENOMEM;
+	fill_cmd(c, HPSA_DEVICE_RESET_MSG, h, NULL, 0, 0,
+		RAID_CTLR_LUNID, TYPE_MSG);
+	c->Request.CDB[1] = reset_type; /* fill_cmd defaults to target reset */
+	c->waiting = NULL;
+	enqueue_cmd_and_start_io(h, c);
+	/* Don't wait for completion, the reset won't complete.  Don't free
+	 * the command either.  This is the last command we will send before
+	 * re-initializing everything, so it doesn't matter and won't leak.
+	 */
+	return 0;
+}
+
 static void fill_cmd(struct CommandList *c, u8 cmd, struct ctlr_info *h,
 	void *buff, size_t size, u8 page_code, unsigned char *scsi3addr,
 	int cmd_type)
@@ -2820,7 +2840,8 @@ static void fill_cmd(struct CommandList *c, u8 cmd, struct ctlr_info *h,
 			c->Request.Type.Attribute = ATTR_SIMPLE;
 			c->Request.Type.Direction = XFER_NONE;
 			c->Request.Timeout = 0; /* Don't time out */
-			c->Request.CDB[0] =  0x01; /* RESET_MSG is 0x01 */
+			memset(&c->Request.CDB[0], 0, sizeof(c->Request.CDB));
+			c->Request.CDB[0] =  cmd;
 			c->Request.CDB[1] = 0x03;  /* Reset target above */
 			/* If bytes 4-7 are zero, it means reset the */
 			/* LunID device */
@@ -2986,6 +3007,63 @@ static inline u32 process_nonindexed_cmd(struct ctlr_info *h,
 	return next_command(h);
 }
 
+/* Some controllers, like p400, will give us one interrupt
+ * after a soft reset, even if we turned interrupts off.
+ * Only need to check for this in the hpsa_xxx_discard_completions
+ * functions.
+ */
+static int ignore_bogus_interrupt(struct ctlr_info *h)
+{
+	if (likely(!reset_devices))
+		return 0;
+
+	if (likely(h->interrupts_enabled))
+		return 0;
+
+	dev_info(&h->pdev->dev, "Received interrupt while interrupts disabled "
+		"(known firmware bug.)  Ignoring.\n");
+
+	return 1;
+}
+
+static irqreturn_t hpsa_intx_discard_completions(int irq, void *dev_id)
+{
+	struct ctlr_info *h = dev_id;
+	unsigned long flags;
+	u32 raw_tag;
+
+	if (ignore_bogus_interrupt(h))
+		return IRQ_NONE;
+
+	if (interrupt_not_for_us(h))
+		return IRQ_NONE;
+	spin_lock_irqsave(&h->lock, flags);
+	while (interrupt_pending(h)) {
+		raw_tag = get_next_completion(h);
+		while (raw_tag != FIFO_EMPTY)
+			raw_tag = next_command(h);
+	}
+	spin_unlock_irqrestore(&h->lock, flags);
+	return IRQ_HANDLED;
+}
+
+static irqreturn_t hpsa_msix_discard_completions(int irq, void *dev_id)
+{
+	struct ctlr_info *h = dev_id;
+	unsigned long flags;
+	u32 raw_tag;
+
+	if (ignore_bogus_interrupt(h))
+		return IRQ_NONE;
+
+	spin_lock_irqsave(&h->lock, flags);
+	raw_tag = get_next_completion(h);
+	while (raw_tag != FIFO_EMPTY)
+		raw_tag = next_command(h);
+	spin_unlock_irqrestore(&h->lock, flags);
+	return IRQ_HANDLED;
+}
+
 static irqreturn_t do_hpsa_intr_intx(int irq, void *dev_id)
 {
 	struct ctlr_info *h = dev_id;
@@ -3124,7 +3202,6 @@ static __devinit int hpsa_message(struct pci_dev *pdev, unsigned char opcode,
 	return 0;
 }
 
-#define hpsa_soft_reset_controller(p) hpsa_message(p, 1, 0)
 #define hpsa_noop(p) hpsa_message(p, 3, 0)
 
 static int hpsa_controller_hard_reset(struct pci_dev *pdev,
@@ -3320,7 +3397,7 @@ static __devinit int hpsa_kdump_hard_reset_controller(struct pci_dev *pdev)
 				"'Bit 2 doorbell reset' is "
 				"supported, but not 'bit 5 doorbell reset'.  "
 				"Firmware update is recommended.\n");
-			rc = -ENODEV;
+			rc = -ENOTSUPP; /* try soft reset */
 			goto unmap_cfgtable;
 		}
 	}
@@ -3344,13 +3421,18 @@ static __devinit int hpsa_kdump_hard_reset_controller(struct pci_dev *pdev)
 	/* Wait for board to become not ready, then ready. */
 	dev_info(&pdev->dev, "Waiting for board to reset.\n");
 	rc = hpsa_wait_for_board_state(pdev, vaddr, BOARD_NOT_READY);
-	if (rc)
+	if (rc) {
 		dev_warn(&pdev->dev,
-			"failed waiting for board to reset\n");
+			"failed waiting for board to reset."
+			" Will try soft reset.\n");
+		rc = -ENOTSUPP; /* Not expected, but try soft reset later */
+		goto unmap_cfgtable;
+	}
 	rc = hpsa_wait_for_board_state(pdev, vaddr, BOARD_READY);
 	if (rc) {
 		dev_warn(&pdev->dev,
-			"failed waiting for board to become ready\n");
+			"failed waiting for board to become ready "
+			"after hard reset\n");
 		goto unmap_cfgtable;
 	}
 
@@ -3358,11 +3440,11 @@ static __devinit int hpsa_kdump_hard_reset_controller(struct pci_dev *pdev)
 	if (rc < 0)
 		goto unmap_cfgtable;
 	if (rc) {
-		dev_warn(&pdev->dev, "Unable to successfully reset controller,"
-			" Ignoring controller.\n");
-		rc = -ENODEV;
+		dev_warn(&pdev->dev, "Unable to successfully reset "
+			"controller. Will try soft reset.\n");
+		rc = -ENOTSUPP;
 	} else {
-		dev_info(&pdev->dev, "board ready.\n");
+		dev_info(&pdev->dev, "board ready after hard reset.\n");
 	}
 
 unmap_cfgtable:
@@ -3840,7 +3922,7 @@ static __devinit int hpsa_init_reset_devices(struct pci_dev *pdev)
 	 * due to concerns about shared bbwc between 6402/6404 pair.
 	 */
 	if (rc == -ENOTSUPP)
-		return 0; /* just try to do the kdump anyhow. */
+		return rc; /* just try to do the kdump anyhow. */
 	if (rc)
 		return -ENODEV;
 
@@ -3910,18 +3992,79 @@ static int hpsa_request_irq(struct ctlr_info *h,
 	return 0;
 }
 
+static int __devinit hpsa_kdump_soft_reset(struct ctlr_info *h)
+{
+	if (hpsa_send_host_reset(h, RAID_CTLR_LUNID,
+		HPSA_RESET_TYPE_CONTROLLER)) {
+		dev_warn(&h->pdev->dev, "Resetting array controller failed.\n");
+		return -EIO;
+	}
+
+	dev_info(&h->pdev->dev, "Waiting for board to soft reset.\n");
+	if (hpsa_wait_for_board_state(h->pdev, h->vaddr, BOARD_NOT_READY)) {
+		dev_warn(&h->pdev->dev, "Soft reset had no effect.\n");
+		return -1;
+	}
+
+	dev_info(&h->pdev->dev, "Board reset, awaiting READY status.\n");
+	if (hpsa_wait_for_board_state(h->pdev, h->vaddr, BOARD_READY)) {
+		dev_warn(&h->pdev->dev, "Board failed to become ready "
+			"after soft reset.\n");
+		return -1;
+	}
+
+	return 0;
+}
+
+static void hpsa_undo_allocations_after_kdump_soft_reset(struct ctlr_info *h)
+{
+	free_irq(h->intr[h->intr_mode], h);
+#ifdef CONFIG_PCI_MSI
+	if (h->msix_vector)
+		pci_disable_msix(h->pdev);
+	else if (h->msi_vector)
+		pci_disable_msi(h->pdev);
+#endif /* CONFIG_PCI_MSI */
+	hpsa_free_sg_chain_blocks(h);
+	hpsa_free_cmd_pool(h);
+	kfree(h->blockFetchTable);
+	pci_free_consistent(h->pdev, h->reply_pool_size,
+		h->reply_pool, h->reply_pool_dhandle);
+	if (h->vaddr)
+		iounmap(h->vaddr);
+	if (h->transtable)
+		iounmap(h->transtable);
+	if (h->cfgtable)
+		iounmap(h->cfgtable);
+	pci_release_regions(h->pdev);
+	kfree(h);
+}
+
 static int __devinit hpsa_init_one(struct pci_dev *pdev,
 				    const struct pci_device_id *ent)
 {
 	int dac, rc;
 	struct ctlr_info *h;
+	int try_soft_reset = 0;
+	unsigned long flags;
 
 	if (number_of_controllers == 0)
 		printk(KERN_INFO DRIVER_NAME "\n");
 
 	rc = hpsa_init_reset_devices(pdev);
-	if (rc)
-		return rc;
+	if (rc) {
+		if (rc != -ENOTSUPP)
+			return rc;
+		/* If the reset fails in a particular way (it has no way to do
+		 * a proper hard reset, so returns -ENOTSUPP) we can try to do
+		 * a soft reset once we get the controller configured up to the
+		 * point that it can accept a command.
+		 */
+		try_soft_reset = 1;
+		rc = 0;
+	}
+
+reinit_after_soft_reset:
 
 	/* Command structures must be aligned on a 32-byte boundary because
 	 * the 5 lower bits of the address are used by the hardware. and by
@@ -3981,11 +4124,66 @@ static int __devinit hpsa_init_one(struct pci_dev *pdev,
 	h->ndevices = 0;
 	h->scsi_host = NULL;
 	spin_lock_init(&h->devlock);
+	hpsa_put_ctlr_into_performant_mode(h);
+
+	/* At this point, the controller is ready to take commands.
+	 * Now, if reset_devices and the hard reset didn't work, try
+	 * the soft reset and see if that works.
+	 */
+	if (try_soft_reset) {
+
+		/* This is kind of gross.  We may or may not get a completion
+		 * from the soft reset command, and if we do, then the value
+		 * from the fifo may or may not be valid.  So, we wait 10 secs
+		 * after the reset throwing away any completions we get during
+		 * that time.  Unregister the interrupt handler and register
+		 * fake ones to scoop up any residual completions.
+		 */
+		spin_lock_irqsave(&h->lock, flags);
+		h->access.set_intr_mask(h, HPSA_INTR_OFF);
+		spin_unlock_irqrestore(&h->lock, flags);
+		free_irq(h->intr[h->intr_mode], h);
+		rc = hpsa_request_irq(h, hpsa_msix_discard_completions,
+					hpsa_intx_discard_completions);
+		if (rc) {
+			dev_warn(&h->pdev->dev, "Failed to request_irq after "
+				"soft reset.\n");
+			goto clean4;
+		}
+
+		rc = hpsa_kdump_soft_reset(h);
+		if (rc)
+			/* Neither hard nor soft reset worked, we're hosed. */
+			goto clean4;
+
+		dev_info(&h->pdev->dev, "Board READY.\n");
+		dev_info(&h->pdev->dev,
+			"Waiting for stale completions to drain.\n");
+		h->access.set_intr_mask(h, HPSA_INTR_ON);
+		msleep(10000);
+		h->access.set_intr_mask(h, HPSA_INTR_OFF);
+
+		rc = controller_reset_failed(h->cfgtable);
+		if (rc)
+			dev_info(&h->pdev->dev,
+				"Soft reset appears to have failed.\n");
+
+		/* since the controller's reset, we have to go back and re-init
+		 * everything.  Easiest to just forget what we've done and do it
+		 * all over again.
+		 */
+		hpsa_undo_allocations_after_kdump_soft_reset(h);
+		try_soft_reset = 0;
+		if (rc)
+			/* don't go to clean4, we already unallocated */
+			return -ENODEV;
+
+		goto reinit_after_soft_reset;
+	}
 
 	/* Turn the interrupts on so we can service requests */
 	h->access.set_intr_mask(h, HPSA_INTR_ON);
 
-	hpsa_put_ctlr_into_performant_mode(h);
 	hpsa_hba_inquiry(h);
 	hpsa_register_scsi(h);	/* hook ourselves into SCSI subsystem */
 	h->busy_initializing = 0;
diff --git a/drivers/scsi/hpsa.h b/drivers/scsi/hpsa.h
index b1412a7..6d8dcd4 100644
--- a/drivers/scsi/hpsa.h
+++ b/drivers/scsi/hpsa.h
@@ -127,8 +127,10 @@ struct ctlr_info {
 };
 #define HPSA_ABORT_MSG 0
 #define HPSA_DEVICE_RESET_MSG 1
-#define HPSA_BUS_RESET_MSG 2
-#define HPSA_HOST_RESET_MSG 3
+#define HPSA_RESET_TYPE_CONTROLLER 0x00
+#define HPSA_RESET_TYPE_BUS 0x01
+#define HPSA_RESET_TYPE_TARGET 0x03
+#define HPSA_RESET_TYPE_LUN 0x04
 #define HPSA_MSG_SEND_RETRY_LIMIT 10
 #define HPSA_MSG_SEND_RETRY_INTERVAL_MSECS (10000)
 


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH 14/16] hpsa: remove superfluous sleeps around reset code
  2011-05-03 19:58 [PATCH 00/16] hpsa: May 3, 2011 updates Stephen M. Cameron
                   ` (12 preceding siblings ...)
  2011-05-03 19:59 ` [PATCH 13/16] hpsa: do soft reset if hard reset is broken Stephen M. Cameron
@ 2011-05-03 19:59 ` Stephen M. Cameron
  2011-05-03 20:00 ` [PATCH 15/16] hpsa: do not attempt PCI power management reset method if we know it won't work Stephen M. Cameron
  2011-05-03 20:00 ` [PATCH 16/16] hpsa: add P2000 to list of shared SAS devices Stephen M. Cameron
  15 siblings, 0 replies; 31+ messages in thread
From: Stephen M. Cameron @ 2011-05-03 19:59 UTC (permalink / raw)
  To: james.bottomley; +Cc: linux-scsi, linux-kernel, smcameron, thenzl, akpm, mikem

From: Stephen M. Cameron <scameron@beardog.cce.hp.com>

Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
---
 drivers/scsi/hpsa.c |    3 ---
 1 files changed, 0 insertions(+), 3 deletions(-)

diff --git a/drivers/scsi/hpsa.c b/drivers/scsi/hpsa.c
index 6fe77d0..ca8ee92 100644
--- a/drivers/scsi/hpsa.c
+++ b/drivers/scsi/hpsa.c
@@ -3217,7 +3217,6 @@ static int hpsa_controller_hard_reset(struct pci_dev *pdev,
 		 */
 		dev_info(&pdev->dev, "using doorbell to reset controller\n");
 		writel(use_doorbell, vaddr + SA5_DOORBELL);
-		msleep(1000);
 	} else { /* Try to do it the PCI power state way */
 
 		/* Quoting from the Open CISS Specification: "The Power
@@ -3248,8 +3247,6 @@ static int hpsa_controller_hard_reset(struct pci_dev *pdev,
 		pmcsr &= ~PCI_PM_CTRL_STATE_MASK;
 		pmcsr |= PCI_D0;
 		pci_write_config_word(pdev, pos + PCI_PM_CTRL, pmcsr);
-
-		msleep(500);
 	}
 	return 0;
 }


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH 15/16] hpsa: do not attempt PCI power management reset method if we know it won't work.
  2011-05-03 19:58 [PATCH 00/16] hpsa: May 3, 2011 updates Stephen M. Cameron
                   ` (13 preceding siblings ...)
  2011-05-03 19:59 ` [PATCH 14/16] hpsa: remove superfluous sleeps around reset code Stephen M. Cameron
@ 2011-05-03 20:00 ` Stephen M. Cameron
  2011-05-03 20:00 ` [PATCH 16/16] hpsa: add P2000 to list of shared SAS devices Stephen M. Cameron
  15 siblings, 0 replies; 31+ messages in thread
From: Stephen M. Cameron @ 2011-05-03 20:00 UTC (permalink / raw)
  To: james.bottomley; +Cc: linux-scsi, linux-kernel, smcameron, thenzl, akpm, mikem

From: Stephen M. Cameron <scameron@beardog.cce.hp.com>

Just go straight to the soft-reset method instead.

Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
---
 drivers/scsi/hpsa.c |   52 +++++++++++++++++++++++++++++++++++++--------------
 1 files changed, 38 insertions(+), 14 deletions(-)

diff --git a/drivers/scsi/hpsa.c b/drivers/scsi/hpsa.c
index ca8ee92..5c03cb5 100644
--- a/drivers/scsi/hpsa.c
+++ b/drivers/scsi/hpsa.c
@@ -273,7 +273,7 @@ static ssize_t host_show_transport_mode(struct device *dev,
 			"performant" : "simple");
 }
 
-/* List of controllers which cannot be reset on kexec with reset_devices */
+/* List of controllers which cannot be hard reset on kexec with reset_devices */
 static u32 unresettable_controller[] = {
 	0x324a103C, /* Smart Array P712m */
 	0x324b103C, /* SmartArray P711m */
@@ -291,16 +291,45 @@ static u32 unresettable_controller[] = {
 	0x409D0E11, /* Smart Array 6400 EM */
 };
 
-static int ctlr_is_resettable(struct ctlr_info *h)
+/* List of controllers which cannot even be soft reset */
+static u32 soft_unresettable_controller[] = {
+	/* Exclude 640x boards.  These are two pci devices in one slot
+	 * which share a battery backed cache module.  One controls the
+	 * cache, the other accesses the cache through the one that controls
+	 * it.  If we reset the one controlling the cache, the other will
+	 * likely not be happy.  Just forbid resetting this conjoined mess.
+	 * The 640x isn't really supported by hpsa anyway.
+	 */
+	0x409C0E11, /* Smart Array 6400 */
+	0x409D0E11, /* Smart Array 6400 EM */
+};
+
+static int ctlr_is_hard_resettable(u32 board_id)
 {
 	int i;
 
 	for (i = 0; i < ARRAY_SIZE(unresettable_controller); i++)
-		if (unresettable_controller[i] == h->board_id)
+		if (unresettable_controller[i] == board_id)
+			return 0;
+	return 1;
+}
+
+static int ctlr_is_soft_resettable(u32 board_id)
+{
+	int i;
+
+	for (i = 0; i < ARRAY_SIZE(soft_unresettable_controller); i++)
+		if (soft_unresettable_controller[i] == board_id)
 			return 0;
 	return 1;
 }
 
+static int ctlr_is_resettable(u32 board_id)
+{
+	return ctlr_is_hard_resettable(board_id) ||
+		ctlr_is_soft_resettable(board_id);
+}
+
 static ssize_t host_show_resettable(struct device *dev,
 	struct device_attribute *attr, char *buf)
 {
@@ -308,7 +337,7 @@ static ssize_t host_show_resettable(struct device *dev,
 	struct Scsi_Host *shost = class_to_shost(dev);
 
 	h = shost_to_hba(shost);
-	return snprintf(buf, 20, "%d\n", ctlr_is_resettable(h));
+	return snprintf(buf, 20, "%d\n", ctlr_is_resettable(h->board_id));
 }
 
 static inline int is_logical_dev_addr_mode(unsigned char scsi3addr[])
@@ -3334,20 +3363,15 @@ static __devinit int hpsa_kdump_hard_reset_controller(struct pci_dev *pdev)
 	 * using the doorbell register.
 	 */
 
-	/* Exclude 640x boards.  These are two pci devices in one slot
-	 * which share a battery backed cache module.  One controls the
-	 * cache, the other accesses the cache through the one that controls
-	 * it.  If we reset the one controlling the cache, the other will
-	 * likely not be happy.  Just forbid resetting this conjoined mess.
-	 * The 640x isn't really supported by hpsa anyway.
-	 */
 	rc = hpsa_lookup_board_id(pdev, &board_id);
-	if (rc < 0) {
+	if (rc < 0 || !ctlr_is_resettable(board_id)) {
 		dev_warn(&pdev->dev, "Not resetting device.\n");
 		return -ENODEV;
 	}
-	if (board_id == 0x409C0E11 || board_id == 0x409D0E11)
-		return -ENOTSUPP;
+
+	/* if controller is soft- but not hard resettable... */
+	if (!ctlr_is_hard_resettable(board_id))
+		return -ENOTSUPP; /* try soft reset later. */
 
 	/* Save the PCI command register */
 	pci_read_config_word(pdev, 4, &command_register);


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH 16/16] hpsa: add P2000 to list of shared SAS devices
  2011-05-03 19:58 [PATCH 00/16] hpsa: May 3, 2011 updates Stephen M. Cameron
                   ` (14 preceding siblings ...)
  2011-05-03 20:00 ` [PATCH 15/16] hpsa: do not attempt PCI power management reset method if we know it won't work Stephen M. Cameron
@ 2011-05-03 20:00 ` Stephen M. Cameron
  2011-05-17 10:12   ` James Bottomley
  15 siblings, 1 reply; 31+ messages in thread
From: Stephen M. Cameron @ 2011-05-03 20:00 UTC (permalink / raw)
  To: james.bottomley; +Cc: linux-scsi, linux-kernel, smcameron, thenzl, akpm, mikem

From: Stephen M. Cameron <scameron@beardog.cce.hp.com>

Signed-off-by: Scott Teel <scott.stacy.teel@hp.com>
---
 drivers/scsi/hpsa.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/drivers/scsi/hpsa.c b/drivers/scsi/hpsa.c
index 5c03cb5..cffc7bb 100644
--- a/drivers/scsi/hpsa.c
+++ b/drivers/scsi/hpsa.c
@@ -1591,6 +1591,7 @@ static unsigned char *msa2xxx_model[] = {
 	"MSA2024",
 	"MSA2312",
 	"MSA2324",
+	"P2000 G3 SAS",
 	NULL,
 };
 


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* Re: [PATCH 01/16] hpsa: do readl after writel in main i/o path to ensure commands don't get lost.
  2011-05-03 19:58 ` [PATCH 01/16] hpsa: do readl after writel in main i/o path to ensure commands don't get lost Stephen M. Cameron
@ 2011-05-04 11:15   ` Tomas Henzl
  2011-05-04 12:52     ` scameron
  0 siblings, 1 reply; 31+ messages in thread
From: Tomas Henzl @ 2011-05-04 11:15 UTC (permalink / raw)
  To: Stephen M. Cameron
  Cc: james.bottomley, linux-scsi, linux-kernel, smcameron, akpm, mikem

On 05/03/2011 09:58 PM, Stephen M. Cameron wrote:
> From: Stephen M. Cameron <scameron@beardog.cce.hp.com>
>
> Apparently we've been doin it rong for a decade, but only lately do we
> run into problems.
>
> Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
> ---
>  drivers/scsi/hpsa.h |    1 +
>  1 files changed, 1 insertions(+), 0 deletions(-)
>
> diff --git a/drivers/scsi/hpsa.h b/drivers/scsi/hpsa.h
> index 621a153..98c97ca 100644
> --- a/drivers/scsi/hpsa.h
> +++ b/drivers/scsi/hpsa.h
> @@ -212,6 +212,7 @@ static void SA5_submit_command(struct ctlr_info *h,
>  	dev_dbg(&h->pdev->dev, "Sending %x, tag = %x\n", c->busaddr,
>  		c->Header.Tag.lower);
>  	writel(c->busaddr, h->vaddr + SA5_REQUEST_PORT_OFFSET);
> +	(void) readl(h->vaddr + SA5_REQUEST_PORT_OFFSET);
>   
Hi,
a small nit -
the (void) ^ is I think not needed for gcc and isn't present in the cciss.h patch

Tomas

>  	h->commands_outstanding++;
>  	if (h->commands_outstanding > h->max_outstanding)
>  		h->max_outstanding = h->commands_outstanding;
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>   


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 01/16] hpsa: do readl after writel in main i/o path to ensure commands don't get lost.
  2011-05-04 11:15   ` Tomas Henzl
@ 2011-05-04 12:52     ` scameron
  2011-05-04 13:34       ` Tomas Henzl
  2011-05-04 17:28       ` Valdis.Kletnieks
  0 siblings, 2 replies; 31+ messages in thread
From: scameron @ 2011-05-04 12:52 UTC (permalink / raw)
  To: Tomas Henzl
  Cc: james.bottomley, linux-scsi, linux-kernel, smcameron, akpm,
	mikem, scameron

On Wed, May 04, 2011 at 01:15:50PM +0200, Tomas Henzl wrote:
> On 05/03/2011 09:58 PM, Stephen M. Cameron wrote:
> > From: Stephen M. Cameron <scameron@beardog.cce.hp.com>
> >
> > Apparently we've been doin it rong for a decade, but only lately do we
> > run into problems.
> >
> > Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
> > ---
> >  drivers/scsi/hpsa.h |    1 +
> >  1 files changed, 1 insertions(+), 0 deletions(-)
> >
> > diff --git a/drivers/scsi/hpsa.h b/drivers/scsi/hpsa.h
> > index 621a153..98c97ca 100644
> > --- a/drivers/scsi/hpsa.h
> > +++ b/drivers/scsi/hpsa.h
> > @@ -212,6 +212,7 @@ static void SA5_submit_command(struct ctlr_info *h,
> >  	dev_dbg(&h->pdev->dev, "Sending %x, tag = %x\n", c->busaddr,
> >  		c->Header.Tag.lower);
> >  	writel(c->busaddr, h->vaddr + SA5_REQUEST_PORT_OFFSET);
> > +	(void) readl(h->vaddr + SA5_REQUEST_PORT_OFFSET);
> >   
> Hi,
> a small nit -
> the (void) ^ is I think not needed for gcc and isn't present in the cciss.h patch

I just put it there to make it clear that it ignoring the return of readl is 
done intentionally, not accidentally.  If this goes against some coding convention,
whatever, I'm not super attached to the (void), but I did put it there on purpose,
and would have done it in cciss as well, had I thought of it at the time.

-- steve

> 
> Tomas
> 
> >  	h->commands_outstanding++;
> >  	if (h->commands_outstanding > h->max_outstanding)
> >  		h->max_outstanding = h->commands_outstanding;
> >
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >   

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 01/16] hpsa: do readl after writel in main i/o path to ensure commands don't get lost.
  2011-05-04 12:52     ` scameron
@ 2011-05-04 13:34       ` Tomas Henzl
  2011-05-04 17:28       ` Valdis.Kletnieks
  1 sibling, 0 replies; 31+ messages in thread
From: Tomas Henzl @ 2011-05-04 13:34 UTC (permalink / raw)
  To: scameron
  Cc: james.bottomley, linux-scsi, linux-kernel, smcameron, akpm, mikem

On 05/04/2011 02:52 PM, scameron@beardog.cce.hp.com wrote:
> On Wed, May 04, 2011 at 01:15:50PM +0200, Tomas Henzl wrote:
>   
>> On 05/03/2011 09:58 PM, Stephen M. Cameron wrote:
>>     
>>> From: Stephen M. Cameron <scameron@beardog.cce.hp.com>
>>>
>>> Apparently we've been doin it rong for a decade, but only lately do we
>>> run into problems.
>>>
>>> Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
>>> ---
>>>  drivers/scsi/hpsa.h |    1 +
>>>  1 files changed, 1 insertions(+), 0 deletions(-)
>>>
>>> diff --git a/drivers/scsi/hpsa.h b/drivers/scsi/hpsa.h
>>> index 621a153..98c97ca 100644
>>> --- a/drivers/scsi/hpsa.h
>>> +++ b/drivers/scsi/hpsa.h
>>> @@ -212,6 +212,7 @@ static void SA5_submit_command(struct ctlr_info *h,
>>>  	dev_dbg(&h->pdev->dev, "Sending %x, tag = %x\n", c->busaddr,
>>>  		c->Header.Tag.lower);
>>>  	writel(c->busaddr, h->vaddr + SA5_REQUEST_PORT_OFFSET);
>>> +	(void) readl(h->vaddr + SA5_REQUEST_PORT_OFFSET);
>>>   
>>>       
>> Hi,
>> a small nit -
>> the (void) ^ is I think not needed for gcc and isn't present in the cciss.h patch
>>     
> I just put it there to make it clear that it ignoring the return of readl is 
> done intentionally, not accidentally.  If this goes against some coding convention,
> whatever, I'm not super attached to the (void), but I did put it there on purpose,
> and would have done it in cciss as well, had I thought of it at the time.
>   
I'm not sure what the coding convention says about this, I personally would omit it,
both ways are used in the kernel - so it is fine for me.

Ack - Tomas



> -- steve
>
>   
>> Tomas
>>
>>     
>>>  	h->commands_outstanding++;
>>>  	if (h->commands_outstanding > h->max_outstanding)
>>>  		h->max_outstanding = h->commands_outstanding;
>>>
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
>>> the body of a message to majordomo@vger.kernel.org
>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>   
>>>       
> --
> To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>   


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 01/16] hpsa: do readl after writel in main i/o path to ensure commands don't get lost.
  2011-05-04 12:52     ` scameron
  2011-05-04 13:34       ` Tomas Henzl
@ 2011-05-04 17:28       ` Valdis.Kletnieks
  2011-05-04 17:37         ` Matthew Wilcox
  1 sibling, 1 reply; 31+ messages in thread
From: Valdis.Kletnieks @ 2011-05-04 17:28 UTC (permalink / raw)
  To: scameron
  Cc: Tomas Henzl, james.bottomley, linux-scsi, linux-kernel,
	smcameron, akpm, mikem

[-- Attachment #1: Type: text/plain, Size: 992 bytes --]

On Wed, 04 May 2011 07:52:12 CDT, scameron@beardog.cce.hp.com said:
> On Wed, May 04, 2011 at 01:15:50PM +0200, Tomas Henzl wrote:
> > On 05/03/2011 09:58 PM, Stephen M. Cameron wrote:
> > > From: Stephen M. Cameron <scameron@beardog.cce.hp.com>

> > >  	dev_dbg(&h->pdev->dev, "Sending %x, tag = %x\n", c->busaddr,
> > >  		c->Header.Tag.lower);
> > >  	writel(c->busaddr, h->vaddr + SA5_REQUEST_PORT_OFFSET);
> > > +	(void) readl(h->vaddr + SA5_REQUEST_PORT_OFFSET);

> I just put it there to make it clear that it ignoring the return of readl is 
> done intentionally, not accidentally.  If this goes against some coding convention,
> whatever, I'm not super attached to the (void), but I did put it there on purpose,
> and would have done it in cciss as well, had I thought of it at the time.

This probably needs a comment like
	/* don't care - dummy read just to force write posting to chipset */
or similar.  I'm assuming it's just functioning as a barrier-type flush of some sort?




[-- Attachment #2: Type: application/pgp-signature, Size: 227 bytes --]

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 01/16] hpsa: do readl after writel in main i/o path to ensure commands don't get lost.
  2011-05-04 17:28       ` Valdis.Kletnieks
@ 2011-05-04 17:37         ` Matthew Wilcox
  2011-05-04 17:54           ` Valdis.Kletnieks
  0 siblings, 1 reply; 31+ messages in thread
From: Matthew Wilcox @ 2011-05-04 17:37 UTC (permalink / raw)
  To: Valdis.Kletnieks
  Cc: scameron, Tomas Henzl, james.bottomley, linux-scsi, linux-kernel,
	smcameron, akpm, mikem

On Wed, May 04, 2011 at 01:28:21PM -0400, Valdis.Kletnieks@vt.edu wrote:
> On Wed, 04 May 2011 07:52:12 CDT, scameron@beardog.cce.hp.com said:
> > On Wed, May 04, 2011 at 01:15:50PM +0200, Tomas Henzl wrote:
> > > On 05/03/2011 09:58 PM, Stephen M. Cameron wrote:
> > > > From: Stephen M. Cameron <scameron@beardog.cce.hp.com>
> 
> > > >  	dev_dbg(&h->pdev->dev, "Sending %x, tag = %x\n", c->busaddr,
> > > >  		c->Header.Tag.lower);
> > > >  	writel(c->busaddr, h->vaddr + SA5_REQUEST_PORT_OFFSET);
> > > > +	(void) readl(h->vaddr + SA5_REQUEST_PORT_OFFSET);
> 
> > I just put it there to make it clear that it ignoring the return of readl is 
> > done intentionally, not accidentally.  If this goes against some coding convention,
> > whatever, I'm not super attached to the (void), but I did put it there on purpose,
> > and would have done it in cciss as well, had I thought of it at the time.
> 
> This probably needs a comment like
> 	/* don't care - dummy read just to force write posting to chipset */
> or similar.  I'm assuming it's just functioning as a barrier-type flush of some sort?

It's a PCI write flush.  It's not clear to me why it's needed here,
though.  The write will eventually get to the device; why we need to
make the CPU wait around for it to actually get there doesn't make sense.

-- 
Matthew Wilcox				Intel Open Source Technology Centre
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours.  We can't possibly take such
a retrograde step."

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 01/16] hpsa: do readl after writel in main i/o path to ensure commands don't get lost.
  2011-05-04 17:37         ` Matthew Wilcox
@ 2011-05-04 17:54           ` Valdis.Kletnieks
  2011-05-05 18:35               ` Mike Miller
  0 siblings, 1 reply; 31+ messages in thread
From: Valdis.Kletnieks @ 2011-05-04 17:54 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: scameron, Tomas Henzl, james.bottomley, linux-scsi, linux-kernel,
	smcameron, akpm, mikem

[-- Attachment #1: Type: text/plain, Size: 534 bytes --]

On Wed, 04 May 2011 11:37:35 MDT, Matthew Wilcox said:
> > This probably needs a comment like
> > 	/* don't care - dummy read just to force write posting to chipset */
> > or similar.  I'm assuming it's just functioning as a barrier-type flush of some sort?
> 
> It's a PCI write flush.  It's not clear to me why it's needed here,
> though.  The write will eventually get to the device; why we need to
> make the CPU wait around for it to actually get there doesn't make sense.

Exactly why I think it needs a one-liner comment. :)



[-- Attachment #2: Type: application/pgp-signature, Size: 227 bytes --]

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 01/16] hpsa: do readl after writel in main i/o path to ensure commands don't get lost.
  2011-05-04 17:54           ` Valdis.Kletnieks
@ 2011-05-05 18:35               ` Mike Miller
  0 siblings, 0 replies; 31+ messages in thread
From: Mike Miller @ 2011-05-05 18:35 UTC (permalink / raw)
  To: Valdis.Kletnieks
  Cc: Valdis.Kletnieks, scameron, thenzl, Andrew Morton, LKML,
	LKML-scsi, Jens Axboe

On Wed, May 04, 2011 at 01:54:22PM -0400, Valdis.Kletnieks@vt.edu wrote:
> On Wed, 04 May 2011 11:37:35 MDT, Matthew Wilcox said:
> > > This probably needs a comment like
> > > 	/* don't care - dummy read just to force write posting to chipset */
> > > or similar.  I'm assuming it's just functioning as a barrier-type flush of some sort?
> > 
> > It's a PCI write flush.  It's not clear to me why it's needed here,
> > though.  The write will eventually get to the device; why we need to
> > make the CPU wait around for it to actually get there doesn't make sense.
> 
> Exactly why I think it needs a one-liner comment. :)
> 
So we're not exactly sure why it's needed either. We've had reports of
commands getting "lost" or "stuck" under some workloads. The extra readl
works around the issue but certainly may have negative side effects.

I'm not sure I understand how writel works.

>From linux-2.6/arch/x86/include/asm/io.h:

#define build_mmio_write(name, size, type, reg, barrier) \
static inline void name(type val, volatile void __iomem *addr) \
{ asm volatile("mov" size " %0,%1": :reg (val), \
"m" (*(volatile type __force *)addr) barrier); }

This implies (at least to me) that a barrier is part of writel. I don't know
why a write operation needs a barrier but thats essentially what we've done
by adding the extra readl. Can someone confirm or deny that a barrier is
actually built into writel? Or used by writel? If so, does this indicate
that barrier is broken?

At this point we (the software guys) are pretty much at a loss as to how to
continue debugging. We don't know what to trigger on for the PCIe analyzer.
If we track outstanding commands then trigger on one that doesn't complete in
some amount of time the problem could conceivably be far in the past and
difficult to correlate to the data in the trace.

If anyone has any thoughts, suggestions, or flames they would be greatly
appreciated.

-- mikem

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 01/16] hpsa: do readl after writel in main i/o path to ensure commands don't get lost.
@ 2011-05-05 18:35               ` Mike Miller
  0 siblings, 0 replies; 31+ messages in thread
From: Mike Miller @ 2011-05-05 18:35 UTC (permalink / raw)
  Cc: Valdis.Kletnieks, scameron, thenzl, Andrew Morton, LKML,
	LKML-scsi, Jens Axboe

On Wed, May 04, 2011 at 01:54:22PM -0400, Valdis.Kletnieks@vt.edu wrote:
> On Wed, 04 May 2011 11:37:35 MDT, Matthew Wilcox said:
> > > This probably needs a comment like
> > > 	/* don't care - dummy read just to force write posting to chipset */
> > > or similar.  I'm assuming it's just functioning as a barrier-type flush of some sort?
> > 
> > It's a PCI write flush.  It's not clear to me why it's needed here,
> > though.  The write will eventually get to the device; why we need to
> > make the CPU wait around for it to actually get there doesn't make sense.
> 
> Exactly why I think it needs a one-liner comment. :)
> 
So we're not exactly sure why it's needed either. We've had reports of
commands getting "lost" or "stuck" under some workloads. The extra readl
works around the issue but certainly may have negative side effects.

I'm not sure I understand how writel works.

>From linux-2.6/arch/x86/include/asm/io.h:

#define build_mmio_write(name, size, type, reg, barrier) \
static inline void name(type val, volatile void __iomem *addr) \
{ asm volatile("mov" size " %0,%1": :reg (val), \
"m" (*(volatile type __force *)addr) barrier); }

This implies (at least to me) that a barrier is part of writel. I don't know
why a write operation needs a barrier but thats essentially what we've done
by adding the extra readl. Can someone confirm or deny that a barrier is
actually built into writel? Or used by writel? If so, does this indicate
that barrier is broken?

At this point we (the software guys) are pretty much at a loss as to how to
continue debugging. We don't know what to trigger on for the PCIe analyzer.
If we track outstanding commands then trigger on one that doesn't complete in
some amount of time the problem could conceivably be far in the past and
difficult to correlate to the data in the trace.

If anyone has any thoughts, suggestions, or flames they would be greatly
appreciated.

-- mikem

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 16/16] hpsa: add P2000 to list of shared SAS devices
  2011-05-03 20:00 ` [PATCH 16/16] hpsa: add P2000 to list of shared SAS devices Stephen M. Cameron
@ 2011-05-17 10:12   ` James Bottomley
  2011-05-17 13:26     ` scameron
  0 siblings, 1 reply; 31+ messages in thread
From: James Bottomley @ 2011-05-17 10:12 UTC (permalink / raw)
  To: Stephen M. Cameron
  Cc: linux-scsi, linux-kernel, smcameron, thenzl, akpm, mikem

On Tue, 2011-05-03 at 15:00 -0500, Stephen M. Cameron wrote:
> From: Stephen M. Cameron <scameron@beardog.cce.hp.com>
> 
> Signed-off-by: Scott Teel <scott.stacy.teel@hp.com>

Just FYI for future, this is missing your signoff.; it needs to be there
because you're sending me the patch.  I also suspect this patch is
actually by Stott Teel?  In which case it should show as From: him.

James



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 16/16] hpsa: add P2000 to list of shared SAS devices
  2011-05-17 10:12   ` James Bottomley
@ 2011-05-17 13:26     ` scameron
  0 siblings, 0 replies; 31+ messages in thread
From: scameron @ 2011-05-17 13:26 UTC (permalink / raw)
  To: James Bottomley; +Cc: linux-scsi, linux-kernel, smcameron, thenzl, akpm, mikem

On Tue, May 17, 2011 at 02:12:27PM +0400, James Bottomley wrote:
> On Tue, 2011-05-03 at 15:00 -0500, Stephen M. Cameron wrote:
> > From: Stephen M. Cameron <scameron@beardog.cce.hp.com>
> > 
> > Signed-off-by: Scott Teel <scott.stacy.teel@hp.com>
> 
> Just FYI for future, this is missing your signoff.; it needs to be there
> because you're sending me the patch.  I also suspect this patch is
> actually by Stott Teel?  In which case it should show as From: him.

Ok thanks, and sorry about that.  Yes, it is from Scott Teel.  I'm not
very accustomed to forwarding patches from other people.

-- steve


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 01/16] hpsa: do readl after writel in main i/o path to ensure commands don't get lost.
  2011-05-05 18:35               ` Mike Miller
  (?)
@ 2011-05-23 11:37               ` Tomas Henzl
  2011-05-25 15:20                 ` Miller, Mike (OS Dev)
  -1 siblings, 1 reply; 31+ messages in thread
From: Tomas Henzl @ 2011-05-23 11:37 UTC (permalink / raw)
  To: Mike Miller
  Cc: Valdis.Kletnieks, scameron, Andrew Morton, LKML, LKML-scsi, Jens Axboe

On 05/05/2011 08:35 PM, Mike Miller wrote:
> On Wed, May 04, 2011 at 01:54:22PM -0400, Valdis.Kletnieks@vt.edu wrote:
>   
>> On Wed, 04 May 2011 11:37:35 MDT, Matthew Wilcox said:
>>     
>>>> This probably needs a comment like
>>>> 	/* don't care - dummy read just to force write posting to chipset */
>>>> or similar.  I'm assuming it's just functioning as a barrier-type flush of some sort?
>>>>         
>>> It's a PCI write flush.  It's not clear to me why it's needed here,
>>> though.  The write will eventually get to the device; why we need to
>>> make the CPU wait around for it to actually get there doesn't make sense.
>>>       
>> Exactly why I think it needs a one-liner comment. :)
>>
>>     
> So we're not exactly sure why it's needed either. We've had reports of
> commands getting "lost" or "stuck" under some workloads. The extra readl
> works around the issue but certainly may have negative side effects.
>
> I'm not sure I understand how writel works.
>
> From linux-2.6/arch/x86/include/asm/io.h:
>
> #define build_mmio_write(name, size, type, reg, barrier) \
> static inline void name(type val, volatile void __iomem *addr) \
> { asm volatile("mov" size " %0,%1": :reg (val), \
> "m" (*(volatile type __force *)addr) barrier); }
>
> This implies (at least to me) that a barrier is part of writel. I don't know
> why a write operation needs a barrier but thats essentially what we've done
> by adding the extra readl. Can someone confirm or deny that a barrier is
> actually built into writel? Or used by writel? If so, does this indicate
> that barrier is broken?
>
> At this point we (the software guys) are pretty much at a loss as to how to
> continue debugging. We don't know what to trigger on for the PCIe analyzer.
> If we track outstanding commands then trigger on one that doesn't complete in
> some amount of time the problem could conceivably be far in the past and
> difficult to correlate to the data in the trace.
>   
I'd look at the firmware part, you could check what happens for example when
the firmware gets send a command it doesn't understand.
You could also change the communication with the fw by adding a count field, which can
be then checked for the !(next_value == previous_value + 1) and raise an event.
tomas


> If anyone has any thoughts, suggestions, or flames they would be greatly
> appreciated.
>
> -- mikem
> --
> To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>   


^ permalink raw reply	[flat|nested] 31+ messages in thread

* RE: [PATCH 01/16] hpsa: do readl after writel in main i/o path to ensure commands don't get lost.
  2011-05-23 11:37               ` Tomas Henzl
@ 2011-05-25 15:20                 ` Miller, Mike (OS Dev)
  2011-05-26 12:13                   ` Tomas Henzl
  0 siblings, 1 reply; 31+ messages in thread
From: Miller, Mike (OS Dev) @ 2011-05-25 15:20 UTC (permalink / raw)
  To: Tomas Henzl
  Cc: Valdis.Kletnieks, scameron, Andrew Morton, LKML, LKML-scsi, Jens Axboe

Tomas wrote:

> -----Original Message-----
> From: Tomas Henzl [mailto:thenzl@redhat.com]
> Sent: Monday, May 23, 2011 6:38 AM
> To: Miller, Mike (OS Dev)
> Cc: Valdis.Kletnieks@vt.edu; scameron@beardog.cce.hp.com; Andrew Morton;
> LKML; LKML-scsi; Jens Axboe
> Subject: Re: [PATCH 01/16] hpsa: do readl after writel in main i/o path
> to ensure commands don't get lost.
> 
> On 05/05/2011 08:35 PM, Mike Miller wrote:
> > On Wed, May 04, 2011 at 01:54:22PM -0400, Valdis.Kletnieks@vt.edu
> wrote:
> >
> >> On Wed, 04 May 2011 11:37:35 MDT, Matthew Wilcox said:
> >>
> >>>> This probably needs a comment like
> >>>> 	/* don't care - dummy read just to force write posting to chipset
> */
> >>>> or similar.  I'm assuming it's just functioning as a barrier-type
> flush of some sort?
> >>>>
> >>> It's a PCI write flush.  It's not clear to me why it's needed here,
> >>> though.  The write will eventually get to the device; why we need to
> >>> make the CPU wait around for it to actually get there doesn't make
> sense.
> >>>
> >> Exactly why I think it needs a one-liner comment. :)
> >>
> >>
> > So we're not exactly sure why it's needed either. We've had reports of
> > commands getting "lost" or "stuck" under some workloads. The extra
> readl
> > works around the issue but certainly may have negative side effects.
> >
> > I'm not sure I understand how writel works.
> >
> > From linux-2.6/arch/x86/include/asm/io.h:
> >
> > #define build_mmio_write(name, size, type, reg, barrier) \
> > static inline void name(type val, volatile void __iomem *addr) \
> > { asm volatile("mov" size " %0,%1": :reg (val), \
> > "m" (*(volatile type __force *)addr) barrier); }
> >
> > This implies (at least to me) that a barrier is part of writel. I
> don't know
> > why a write operation needs a barrier but thats essentially what we've
> done
> > by adding the extra readl. Can someone confirm or deny that a barrier
> is
> > actually built into writel? Or used by writel? If so, does this
> indicate
> > that barrier is broken?
> >
> > At this point we (the software guys) are pretty much at a loss as to
> how to
> > continue debugging. We don't know what to trigger on for the PCIe
> analyzer.
> > If we track outstanding commands then trigger on one that doesn't
> complete in
> > some amount of time the problem could conceivably be far in the past
> and
> > difficult to correlate to the data in the trace.
> >
> I'd look at the firmware part, you could check what happens for example
> when
> the firmware gets send a command it doesn't understand.
> You could also change the communication with the fw by adding a count
> field, which can
> be then checked for the !(next_value == previous_value + 1) and raise an
> event.
> tomas

Tomas,
We've tried something very similar to the counter idea in fw. It doesn't help because the controller thinks he's done with the request. We have a (pretty crude) counter in the driver but no timing mechanism. We could add a timer. But what's a suitable timeout value? Is 2 seconds too short, too long? Suggestions, please.

-- mikem


> 
> 
> > If anyone has any thoughts, suggestions, or flames they would be
> greatly
> > appreciated.
> >
> > -- mikem
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-scsi"
> in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 01/16] hpsa: do readl after writel in main i/o path to ensure commands don't get lost.
  2011-05-25 15:20                 ` Miller, Mike (OS Dev)
@ 2011-05-26 12:13                   ` Tomas Henzl
  2011-05-26 14:53                     ` Miller, Mike (OS Dev)
  0 siblings, 1 reply; 31+ messages in thread
From: Tomas Henzl @ 2011-05-26 12:13 UTC (permalink / raw)
  To: Miller, Mike (OS Dev)
  Cc: Valdis.Kletnieks, scameron, Andrew Morton, LKML, LKML-scsi, Jens Axboe

On 05/25/2011 05:20 PM, Miller, Mike (OS Dev) wrote:
> Tomas wrote:
>
>   
>> -----Original Message-----
>> From: Tomas Henzl [mailto:thenzl@redhat.com]
>> Sent: Monday, May 23, 2011 6:38 AM
>> To: Miller, Mike (OS Dev)
>> Cc: Valdis.Kletnieks@vt.edu; scameron@beardog.cce.hp.com; Andrew Morton;
>> LKML; LKML-scsi; Jens Axboe
>> Subject: Re: [PATCH 01/16] hpsa: do readl after writel in main i/o path
>> to ensure commands don't get lost.
>>
>> On 05/05/2011 08:35 PM, Mike Miller wrote:
>>     
>>> On Wed, May 04, 2011 at 01:54:22PM -0400, Valdis.Kletnieks@vt.edu
>>>       
>> wrote:
>>     
>>>       
>>>> On Wed, 04 May 2011 11:37:35 MDT, Matthew Wilcox said:
>>>>
>>>>         
>>>>>> This probably needs a comment like
>>>>>> 	/* don't care - dummy read just to force write posting to chipset
>>>>>>             
>> */
>>     
>>>>>> or similar.  I'm assuming it's just functioning as a barrier-type
>>>>>>             
>> flush of some sort?
>>     
>>>>>>             
>>>>> It's a PCI write flush.  It's not clear to me why it's needed here,
>>>>> though.  The write will eventually get to the device; why we need to
>>>>> make the CPU wait around for it to actually get there doesn't make
>>>>>           
>> sense.
>>     
>>>>>           
>>>> Exactly why I think it needs a one-liner comment. :)
>>>>
>>>>
>>>>         
>>> So we're not exactly sure why it's needed either. We've had reports of
>>> commands getting "lost" or "stuck" under some workloads. The extra
>>>       
>> readl
>>     
>>> works around the issue but certainly may have negative side effects.
>>>
>>> I'm not sure I understand how writel works.
>>>
>>> From linux-2.6/arch/x86/include/asm/io.h:
>>>
>>> #define build_mmio_write(name, size, type, reg, barrier) \
>>> static inline void name(type val, volatile void __iomem *addr) \
>>> { asm volatile("mov" size " %0,%1": :reg (val), \
>>> "m" (*(volatile type __force *)addr) barrier); }
>>>
>>> This implies (at least to me) that a barrier is part of writel. I
>>>       
>> don't know
>>     
>>> why a write operation needs a barrier but thats essentially what we've
>>>       
>> done
>>     
>>> by adding the extra readl. Can someone confirm or deny that a barrier
>>>       
>> is
>>     
>>> actually built into writel? Or used by writel? If so, does this
>>>       
>> indicate
>>     
>>> that barrier is broken?
>>>
>>> At this point we (the software guys) are pretty much at a loss as to
>>>       
>> how to
>>     
>>> continue debugging. We don't know what to trigger on for the PCIe
>>>       
>> analyzer.
>>     
>>> If we track outstanding commands then trigger on one that doesn't
>>>       
>> complete in
>>     
>>> some amount of time the problem could conceivably be far in the past
>>>       
>> and
>>     
>>> difficult to correlate to the data in the trace.
>>>
>>>       
>> I'd look at the firmware part, you could check what happens for example
>> when
>> the firmware gets send a command it doesn't understand.
>> You could also change the communication with the fw by adding a count
>> field, which can
>> be then checked for the !(next_value == previous_value + 1) and raise an
>> event.
>> tomas
>>     
> Tomas,
> We've tried something very similar to the counter idea in fw. It doesn't help because the controller thinks he's done with the request. We have a (pretty crude) counter in the driver but no timing mechanism. We could add a timer. But what's a suitable timeout value? Is 2 seconds too short, too long? Suggestions, please.
>   
I know that a counter isn't a ground-breaking idea, just wanted to show some interest :)
The command can be either eaten by the firmware or during the communication in or out from the device.
I'd would start by the communication, by adding some fields to the command to detect if a command in the row(s) isn't
missing - I know even that isn't easy. The same could be done independently done for the other direction.

tomash

> -- mikem
>
>
>   
>>
>>     
>>> If anyone has any thoughts, suggestions, or flames they would be
>>>       
>> greatly
>>     
>>> appreciated.
>>>
>>> -- mikem
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe linux-scsi"
>>>       
>> in
>>     
>>> the body of a message to majordomo@vger.kernel.org
>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>
>>>       
>   


^ permalink raw reply	[flat|nested] 31+ messages in thread

* RE: [PATCH 01/16] hpsa: do readl after writel in main i/o path to ensure commands don't get lost.
  2011-05-26 12:13                   ` Tomas Henzl
@ 2011-05-26 14:53                     ` Miller, Mike (OS Dev)
  0 siblings, 0 replies; 31+ messages in thread
From: Miller, Mike (OS Dev) @ 2011-05-26 14:53 UTC (permalink / raw)
  To: Tomas Henzl
  Cc: Valdis.Kletnieks, scameron, Andrew Morton, LKML, LKML-scsi, Jens Axboe



> -----Original Message-----
> From: Tomas Henzl [mailto:thenzl@redhat.com]
> Sent: Thursday, May 26, 2011 7:14 AM
> To: Miller, Mike (OS Dev)
> Cc: Valdis.Kletnieks@vt.edu; scameron@beardog.cce.hp.com; Andrew Morton;
> LKML; LKML-scsi; Jens Axboe
> Subject: Re: [PATCH 01/16] hpsa: do readl after writel in main i/o path
> to ensure commands don't get lost.
> 
> On 05/25/2011 05:20 PM, Miller, Mike (OS Dev) wrote:
> > Tomas wrote:
> >
> >
> >> -----Original Message-----
> >> From: Tomas Henzl [mailto:thenzl@redhat.com]
> >> Sent: Monday, May 23, 2011 6:38 AM
> >> To: Miller, Mike (OS Dev)
> >> Cc: Valdis.Kletnieks@vt.edu; scameron@beardog.cce.hp.com; Andrew
> Morton;
> >> LKML; LKML-scsi; Jens Axboe
> >> Subject: Re: [PATCH 01/16] hpsa: do readl after writel in main i/o
> path
> >> to ensure commands don't get lost.
> >>
> >> On 05/05/2011 08:35 PM, Mike Miller wrote:
> >>
> >>> On Wed, May 04, 2011 at 01:54:22PM -0400, Valdis.Kletnieks@vt.edu
> >>>
> >> wrote:
> >>
> >>>
> >>>> On Wed, 04 May 2011 11:37:35 MDT, Matthew Wilcox said:
> >>>>
> >>>>
> >>>>>> This probably needs a comment like
> >>>>>> 	/* don't care - dummy read just to force write posting to
> chipset
> >>>>>>
> >> */
> >>
> >>>>>> or similar.  I'm assuming it's just functioning as a barrier-type
> >>>>>>
> >> flush of some sort?
> >>
> >>>>>>
> >>>>> It's a PCI write flush.  It's not clear to me why it's needed
> here,
> >>>>> though.  The write will eventually get to the device; why we need
> to
> >>>>> make the CPU wait around for it to actually get there doesn't make
> >>>>>
> >> sense.
> >>
> >>>>>
> >>>> Exactly why I think it needs a one-liner comment. :)
> >>>>
> >>>>
> >>>>
> >>> So we're not exactly sure why it's needed either. We've had reports
> of
> >>> commands getting "lost" or "stuck" under some workloads. The extra
> >>>
> >> readl
> >>
> >>> works around the issue but certainly may have negative side effects.
> >>>
> >>> I'm not sure I understand how writel works.
> >>>
> >>> From linux-2.6/arch/x86/include/asm/io.h:
> >>>
> >>> #define build_mmio_write(name, size, type, reg, barrier) \
> >>> static inline void name(type val, volatile void __iomem *addr) \
> >>> { asm volatile("mov" size " %0,%1": :reg (val), \
> >>> "m" (*(volatile type __force *)addr) barrier); }
> >>>
> >>> This implies (at least to me) that a barrier is part of writel. I
> >>>
> >> don't know
> >>
> >>> why a write operation needs a barrier but thats essentially what
> we've
> >>>
> >> done
> >>
> >>> by adding the extra readl. Can someone confirm or deny that a
> barrier
> >>>
> >> is
> >>
> >>> actually built into writel? Or used by writel? If so, does this
> >>>
> >> indicate
> >>
> >>> that barrier is broken?
> >>>
> >>> At this point we (the software guys) are pretty much at a loss as to
> >>>
> >> how to
> >>
> >>> continue debugging. We don't know what to trigger on for the PCIe
> >>>
> >> analyzer.
> >>
> >>> If we track outstanding commands then trigger on one that doesn't
> >>>
> >> complete in
> >>
> >>> some amount of time the problem could conceivably be far in the past
> >>>
> >> and
> >>
> >>> difficult to correlate to the data in the trace.
> >>>
> >>>
> >> I'd look at the firmware part, you could check what happens for
> example
> >> when
> >> the firmware gets send a command it doesn't understand.
> >> You could also change the communication with the fw by adding a count
> >> field, which can
> >> be then checked for the !(next_value == previous_value + 1) and raise
> an
> >> event.
> >> tomas
> >>
> > Tomas,
> > We've tried something very similar to the counter idea in fw. It
> doesn't help because the controller thinks he's done with the request.
> We have a (pretty crude) counter in the driver but no timing mechanism.
> We could add a timer. But what's a suitable timeout value? Is 2 seconds
> too short, too long? Suggestions, please.
> >
> I know that a counter isn't a ground-breaking idea, just wanted to show
> some interest :)

:)

> The command can be either eaten by the firmware or during the
> communication in or out from the device.
> I'd would start by the communication, by adding some fields to the
> command to detect if a command in the row(s) isn't
> missing - I know even that isn't easy. The same could be done
> independently done for the other direction.
> 
> tomash

Thanks, Tomas.

> 
> > -- mikem
> >
> >
> >
> >>
> >>
> >>> If anyone has any thoughts, suggestions, or flames they would be
> >>>
> >> greatly
> >>
> >>> appreciated.
> >>>
> >>> -- mikem
> >>> --
> >>> To unsubscribe from this list: send the line "unsubscribe linux-
> scsi"
> >>>
> >> in
> >>
> >>> the body of a message to majordomo@vger.kernel.org
> >>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >>>
> >>>
> >


^ permalink raw reply	[flat|nested] 31+ messages in thread

end of thread, other threads:[~2011-05-26 14:55 UTC | newest]

Thread overview: 31+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-05-03 19:58 [PATCH 00/16] hpsa: May 3, 2011 updates Stephen M. Cameron
2011-05-03 19:58 ` [PATCH 01/16] hpsa: do readl after writel in main i/o path to ensure commands don't get lost Stephen M. Cameron
2011-05-04 11:15   ` Tomas Henzl
2011-05-04 12:52     ` scameron
2011-05-04 13:34       ` Tomas Henzl
2011-05-04 17:28       ` Valdis.Kletnieks
2011-05-04 17:37         ` Matthew Wilcox
2011-05-04 17:54           ` Valdis.Kletnieks
2011-05-05 18:35             ` Mike Miller
2011-05-05 18:35               ` Mike Miller
2011-05-23 11:37               ` Tomas Henzl
2011-05-25 15:20                 ` Miller, Mike (OS Dev)
2011-05-26 12:13                   ` Tomas Henzl
2011-05-26 14:53                     ` Miller, Mike (OS Dev)
2011-05-03 19:58 ` [PATCH 02/16] hpsa: add readl after writel in interrupt mask setting code Stephen M. Cameron
2011-05-03 19:59 ` [PATCH 03/16] hpsa: remove unused parameter from hpsa_complete_scsi_command() Stephen M. Cameron
2011-05-03 19:59 ` [PATCH 04/16] hpsa: delete old unused padding garbage Stephen M. Cameron
2011-05-03 19:59 ` [PATCH 05/16] hpsa: do a better job of detecting controller reset failure Stephen M. Cameron
2011-05-03 19:59 ` [PATCH 06/16] hpsa: wait longer for no-op to complete after resetting controller Stephen M. Cameron
2011-05-03 19:59 ` [PATCH 07/16] hpsa: factor out cmd pool allocation functions Stephen M. Cameron
2011-05-03 19:59 ` [PATCH 08/16] hpsa: factor out irq request code Stephen M. Cameron
2011-05-03 19:59 ` [PATCH 09/16] hpsa: increase time to wait for board reset Stephen M. Cameron
2011-05-03 19:59 ` [PATCH 10/16] hpsa: clarify messages around reset behavior Stephen M. Cameron
2011-05-03 19:59 ` [PATCH 11/16] hpsa: remove atrophied hpsa_scsi_setup function Stephen M. Cameron
2011-05-03 19:59 ` [PATCH 12/16] hpsa: use new doorbell-bit-5 reset method Stephen M. Cameron
2011-05-03 19:59 ` [PATCH 13/16] hpsa: do soft reset if hard reset is broken Stephen M. Cameron
2011-05-03 19:59 ` [PATCH 14/16] hpsa: remove superfluous sleeps around reset code Stephen M. Cameron
2011-05-03 20:00 ` [PATCH 15/16] hpsa: do not attempt PCI power management reset method if we know it won't work Stephen M. Cameron
2011-05-03 20:00 ` [PATCH 16/16] hpsa: add P2000 to list of shared SAS devices Stephen M. Cameron
2011-05-17 10:12   ` James Bottomley
2011-05-17 13:26     ` scameron

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.