All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/2] megaraid_sas: Fix an errant command from hw
@ 2011-08-24 15:35 Tomas Henzl
  2011-08-24 15:35 ` [PATCH 1/2] megaraid_sas: Remove the shutdown_controller call Tomas Henzl
  2011-08-24 15:35 ` [PATCH 2/2] megaraid_sas: Initialize the cmd field to a known value Tomas Henzl
  0 siblings, 2 replies; 7+ messages in thread
From: Tomas Henzl @ 2011-08-24 15:35 UTC (permalink / raw)
  To: linux-scsi; +Cc: steve.masters, michael.benz, aradford, bo.yang, Tomas Henzl

The following patch series provides a partial fix for a problem
seen in kdump after saving vmcore when the driver shuts down.

For unknown reason the card sends a command to the driver with a random values, 
we can then sometimes see this -
BUG: unable to handle kernel NULL pointer dereference at 00000090
....
The proper solution is to prevent this, maybe a firmware change is needed?

The first patch initializes the cmd field to a dummy value so the 'ghost' command
can be recognized.
The second patch removes the call to the megasas_shutdown_controller
in the megasas_shutdown_controller, because the errant cmd is sent while the controller
is shut down and it seems to be enough to only flush cache and disable interrupts.

Tomas

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH 1/2] megaraid_sas: Remove the shutdown_controller call
  2011-08-24 15:35 [PATCH 0/2] megaraid_sas: Fix an errant command from hw Tomas Henzl
@ 2011-08-24 15:35 ` Tomas Henzl
  2011-08-24 20:55   ` adam radford
  2011-08-24 15:35 ` [PATCH 2/2] megaraid_sas: Initialize the cmd field to a known value Tomas Henzl
  1 sibling, 1 reply; 7+ messages in thread
From: Tomas Henzl @ 2011-08-24 15:35 UTC (permalink / raw)
  To: linux-scsi; +Cc: steve.masters, michael.benz, aradford, bo.yang, Tomas Henzl

Remove the call to megasas_shutdown_controller in the megasas_shutdown.
It's unclear why it should be needed, flushing the cache and disabling ints 
seems to be enough, because the system is likely going to be switched off
or restarted anyway. The main reason for removing this is a errant message
sent sometimes to the driver when the card is shut down.

Signed-off-by: Tomas Henzl <thenzl@redhat.com>
---
 drivers/scsi/megaraid/megaraid_sas_base.c |    1 -
 1 files changed, 0 insertions(+), 1 deletions(-)

diff --git a/drivers/scsi/megaraid/megaraid_sas_base.c b/drivers/scsi/megaraid/megaraid_sas_base.c
index a58f84a..14c33f6 100644
--- a/drivers/scsi/megaraid/megaraid_sas_base.c
+++ b/drivers/scsi/megaraid/megaraid_sas_base.c
@@ -4532,7 +4532,6 @@ static void megasas_shutdown(struct pci_dev *pdev)
 	struct megasas_instance *instance = pci_get_drvdata(pdev);
 	instance->unload = 1;
 	megasas_flush_cache(instance);
-	megasas_shutdown_controller(instance, MR_DCMD_CTRL_SHUTDOWN);
 	instance->instancet->disable_intr(instance->reg_set);
 	free_irq(instance->msi_flag ? instance->msixentry.vector :
 		 instance->pdev->irq, instance);
-- 
1.7.6


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH 2/2] megaraid_sas: Initialize the cmd field to a known value
  2011-08-24 15:35 [PATCH 0/2] megaraid_sas: Fix an errant command from hw Tomas Henzl
  2011-08-24 15:35 ` [PATCH 1/2] megaraid_sas: Remove the shutdown_controller call Tomas Henzl
@ 2011-08-24 15:35 ` Tomas Henzl
  1 sibling, 0 replies; 7+ messages in thread
From: Tomas Henzl @ 2011-08-24 15:35 UTC (permalink / raw)
  To: linux-scsi; +Cc: steve.masters, michael.benz, aradford, bo.yang, Tomas Henzl

When the command is returned to the pool initialize it with a known value,
this helps to recognize an errant (non-initialized) command and skip it later.

Signed-off-by: Tomas Henzl <thenzl@redhat.com>
---
 drivers/scsi/megaraid/megaraid_sas.h      |    4 ++++
 drivers/scsi/megaraid/megaraid_sas_base.c |    5 +++++
 2 files changed, 9 insertions(+), 0 deletions(-)

diff --git a/drivers/scsi/megaraid/megaraid_sas.h b/drivers/scsi/megaraid/megaraid_sas.h
index 3cdb769..edc0117 100644
--- a/drivers/scsi/megaraid/megaraid_sas.h
+++ b/drivers/scsi/megaraid/megaraid_sas.h
@@ -138,6 +138,10 @@
 #define MFI_CMD_ABORT				0x06
 #define MFI_CMD_SMP				0x07
 #define MFI_CMD_STP				0x08
+#define MFI_CMD_GHOST				0x09
+/*
+ * Ghost command - used to catch unexpected messages from the hardware
+ */
 
 #define MR_DCMD_CTRL_GET_INFO			0x01010000
 #define MR_DCMD_LD_GET_LIST			0x03010000
diff --git a/drivers/scsi/megaraid/megaraid_sas_base.c b/drivers/scsi/megaraid/megaraid_sas_base.c
index 14c33f6..4b8f457 100644
--- a/drivers/scsi/megaraid/megaraid_sas_base.c
+++ b/drivers/scsi/megaraid/megaraid_sas_base.c
@@ -212,6 +212,7 @@ megasas_return_cmd(struct megasas_instance *instance, struct megasas_cmd *cmd)
 
 	cmd->scmd = NULL;
 	cmd->frame_count = 0;
+	cmd->frame->hdr.cmd = MFI_CMD_GHOST;
 	list_add_tail(&cmd->list, &instance->cmd_pool);
 
 	spin_unlock_irqrestore(&instance->cmd_pool_lock, flags);
@@ -2288,6 +2289,10 @@ megasas_complete_cmd(struct megasas_instance *instance, struct megasas_cmd *cmd,
 		megasas_complete_abort(instance, cmd);
 		break;
 
+	case MFI_CMD_GHOST:
+		printk("megasas: Ghost command arrived, this really shoudn't happen!\n");
+		break;
+
 	default:
 		printk("megasas: Unknown command completed! [0x%X]\n",
 		       hdr->cmd);
-- 
1.7.6


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH 1/2] megaraid_sas: Remove the shutdown_controller call
  2011-08-24 15:35 ` [PATCH 1/2] megaraid_sas: Remove the shutdown_controller call Tomas Henzl
@ 2011-08-24 20:55   ` adam radford
  2011-08-25  8:17     ` Tomas Henzl
  0 siblings, 1 reply; 7+ messages in thread
From: adam radford @ 2011-08-24 20:55 UTC (permalink / raw)
  To: Tomas Henzl; +Cc: linux-scsi, steve.masters, michael.benz, bo.yang

On Wed, Aug 24, 2011 at 8:35 AM, Tomas Henzl <thenzl@redhat.com> wrote:
> Remove the call to megasas_shutdown_controller in the megasas_shutdown.

NACK.   The call to megasas_shutdown_controller() should not be removed.

megasas_shutdown_controller() does 3 things:

1. Cancels pended AEN command in firmware.
2. Cancels pended map update command in firmware.
3. Fires a MR_DCMD_CTRL_SHUTDOWN to notify firmware there was a clean
shutdown.  Without this, FW will think there was an 'unclean shutdown'
on the next restart, which will cause rebuilds and other background FW
activity to kick on every shutdown/reboot.

-Adam

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 1/2] megaraid_sas: Remove the shutdown_controller call
  2011-08-24 20:55   ` adam radford
@ 2011-08-25  8:17     ` Tomas Henzl
  2011-08-26  0:00       ` adam radford
  0 siblings, 1 reply; 7+ messages in thread
From: Tomas Henzl @ 2011-08-25  8:17 UTC (permalink / raw)
  To: adam radford; +Cc: linux-scsi, steve.masters, michael.benz, bo.yang

On 08/24/2011 10:55 PM, adam radford wrote:
> On Wed, Aug 24, 2011 at 8:35 AM, Tomas Henzl <thenzl@redhat.com> wrote:
>> Remove the call to megasas_shutdown_controller in the megasas_shutdown.
> NACK.   The call to megasas_shutdown_controller() should not be removed.
>
> megasas_shutdown_controller() does 3 things:
>
> 1. Cancels pended AEN command in firmware.
> 2. Cancels pended map update command in firmware.
> 3. Fires a MR_DCMD_CTRL_SHUTDOWN to notify firmware there was a clean
> shutdown.  Without this, FW will think there was an 'unclean shutdown'
> on the next restart, which will cause rebuilds and other background FW
> activity to kick on every shutdown/reboot.
OK, from what I have seen the problem happens as a response to MR_DCMD_CTRL_SHUTDOWN
(it means we could let the 1+2 pass and remove only the third point),
but nobody wants 'background FW activity...' etc so let s skip patch.
We need to find a better method for this problem.

In the meantime - I've sent two patches, what do you think about the second one?
It solves(hides) the problem too.
 
Tomas

>
> -Adam
> --
> To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 1/2] megaraid_sas: Remove the shutdown_controller call
  2011-08-25  8:17     ` Tomas Henzl
@ 2011-08-26  0:00       ` adam radford
  2011-09-05 14:17         ` Tomas Henzl
  0 siblings, 1 reply; 7+ messages in thread
From: adam radford @ 2011-08-26  0:00 UTC (permalink / raw)
  To: Tomas Henzl; +Cc: linux-scsi, steve.masters, michael.benz, bo.yang

On Thu, Aug 25, 2011 at 1:17 AM, Tomas Henzl <thenzl@redhat.com> wrote:
> In the meantime - I've sent two patches, what do you think about the second one?
> It solves(hides) the problem too.

Here are my thoughts on your patches:

1. You are not checking if (reset_devices) around these code changes
to determine if you are in the kdump kernel, which is the only place
you are seeing this issue.
2. You should only change behavior for the device ID's you are having
issue with.
3. We need to update your PERC5 controller to the latest firmware,
then try to reproduce.

-Adam

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 1/2] megaraid_sas: Remove the shutdown_controller call
  2011-08-26  0:00       ` adam radford
@ 2011-09-05 14:17         ` Tomas Henzl
  0 siblings, 0 replies; 7+ messages in thread
From: Tomas Henzl @ 2011-09-05 14:17 UTC (permalink / raw)
  To: adam radford; +Cc: linux-scsi, steve.masters, michael.benz, bo.yang

On 08/26/2011 02:00 AM, adam radford wrote:
> On Thu, Aug 25, 2011 at 1:17 AM, Tomas Henzl <thenzl@redhat.com> wrote:
>> In the meantime - I've sent two patches, what do you think about the second one?
>> It solves(hides) the problem too.
> Here are my thoughts on your patches:
>
> 1. You are not checking if (reset_devices) around these code changes
> to determine if you are in the kdump kernel, which is the only place
> you are seeing this issue.
> 2. You should only change behavior for the device ID's you are having
> issue with.
> 3. We need to update your PERC5 controller to the latest firmware,
> then try to reproduce.
This problem is fixed in firmware thanks to Adam (LSI), so neither patch is needed.

Tomas


>
> -Adam
> --
> To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2011-09-05 14:17 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-08-24 15:35 [PATCH 0/2] megaraid_sas: Fix an errant command from hw Tomas Henzl
2011-08-24 15:35 ` [PATCH 1/2] megaraid_sas: Remove the shutdown_controller call Tomas Henzl
2011-08-24 20:55   ` adam radford
2011-08-25  8:17     ` Tomas Henzl
2011-08-26  0:00       ` adam radford
2011-09-05 14:17         ` Tomas Henzl
2011-08-24 15:35 ` [PATCH 2/2] megaraid_sas: Initialize the cmd field to a known value Tomas Henzl

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.