* [PATCH 04/11] megaraid_sas : Firmware crash dump feature support
@ 2014-09-06 13:25 Sumit.Saxena
  2014-09-09 15:54 ` Tomas Henzl
  0 siblings, 1 reply; 14+ messages in thread
From: Sumit.Saxena @ 2014-09-06 13:25 UTC
  To: linux-scsi
  Cc: thenzl, martin.petersen, hch, jbottomley, kashyap.desai, aradford

This feature provides an interface similar to the kernel crash dump feature.
When the megaraid firmware crashes, the driver collects the raw firmware image and
dumps it into a pre-configured location.

The driver allocates two different segments of memory:
#1 A large non-DMA-able buffer (allocated on demand) to capture the actual FW crash dump.
#2 A DMA buffer (persistent allocation) that only acts as an intermediary.

The firmware keeps writing crash dump data into #2 in chunks of the DMA buffer size,
and the driver copies each chunk back to the host memory described in #1.
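
The two-buffer flow above can be sketched in plain user-space C. This is only
an illustration, not the driver code: STAGE_SIZE, MAX_CHUNKS, struct
host_crash_buf, host_store_chunk() and collect_dump() are hypothetical names,
and the "firmware" is simulated by copying from an in-memory array into a
small staging buffer that stands in for the DMA buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define STAGE_SIZE 16	/* assumption: stands in for the 1MB DMA buffer (#2) */
#define MAX_CHUNKS 8	/* assumption: stands in for the host chunk limit (#1) */

/* Hypothetical model of the host buffer: separately allocated chunks. */
struct host_crash_buf {
	unsigned char *chunk[MAX_CHUNKS];
	size_t allocated;	/* chunks that could be allocated */
	size_t copied;		/* chunks filled so far */
};

/* "Driver" side: copy one staged chunk; returns 0 once the host buffer is full. */
static int host_store_chunk(struct host_crash_buf *h, const unsigned char *stage)
{
	assert(h->allocated <= MAX_CHUNKS);
	if (h->copied >= h->allocated)
		return 0;
	memcpy(h->chunk[h->copied], stage, STAGE_SIZE);
	h->copied++;
	return 1;
}

/* Simulated "firmware" loop: stage the dump piece by piece, driver drains it. */
static size_t collect_dump(struct host_crash_buf *h,
			   const unsigned char *fw_dump, size_t fw_len)
{
	unsigned char stage[STAGE_SIZE];
	size_t off;

	for (off = 0; off < fw_len; off += STAGE_SIZE) {
		size_t n = fw_len - off;

		if (n > STAGE_SIZE)
			n = STAGE_SIZE;
		memset(stage, 0, sizeof(stage));
		memcpy(stage, fw_dump + off, n);	/* firmware fills #2 */
		if (!host_store_chunk(h, stage))	/* driver copies to #1 */
			break;				/* partial dump */
	}
	return h->copied;
}
```

If the host buffer has fewer chunks than the dump needs, the loop stops early
and what has been copied so far remains as a partial dump.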

Driver-Firmware interface:
==================
A.) The host driver can allocate at most 512MB of host memory to store crash dump data.

This memory is internal to the host and is not exposed to the firmware.
The driver may not be able to allocate the full 512MB. In that case, the driver allocates
as much memory as is available at run time to store crash dump data.

Let’s call this buffer the Host Crash Buffer.

The Host Crash Buffer is not contiguous as a whole; it consists of multiple chunks of contiguous memory.
This is internal to the driver; the firmware and the application are unaware of it.
A partially allocated Host Crash Buffer may still hold information that is useful for debugging,
depending on what was collected in the buffer and on the nature of the failure.

A complete crash dump is the best case, but we do want to capture a partial buffer so that we grab something rather than nothing.
The Host Crash Buffer is allocated only when FW crash dump data is available,
and is deallocated once the application has copied the Host Crash Buffer to a file.
The Host Crash Buffer size can be anything from 1MB to 512MB, in multiples of 1MB.
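
The best-effort allocation described in A.) can be sketched as follows. This
is a user-space illustration under stated assumptions, not the driver code:
alloc_host_chunks(), free_host_chunks() and the budget_alloc() allocator are
hypothetical, and budget_alloc() exists only to make an allocation failure
reproducible in the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef void *(*alloc_fn)(size_t size);

/* Hypothetical allocator: pretend memory for only three chunks is free. */
static size_t alloc_budget = 3;

static void *budget_alloc(size_t size)
{
	if (alloc_budget == 0)
		return NULL;
	alloc_budget--;
	return malloc(size);
}

/*
 * Try to allocate up to max_chunks chunks and stop at the first failure.
 * Returns how many chunks were obtained; a short count means a partial
 * Host Crash Buffer, which is still usable.
 */
static size_t alloc_host_chunks(void *chunk[], size_t max_chunks,
				size_t chunk_size, alloc_fn alloc)
{
	size_t i;

	assert(alloc != NULL);
	for (i = 0; i < max_chunks; i++) {
		chunk[i] = alloc(chunk_size);
		if (!chunk[i])
			break;
	}
	return i;
}

/* Release the chunks once the application has copied the dump out. */
static void free_host_chunks(void *chunk[], size_t count)
{
	size_t i;

	for (i = 0; i < count; i++) {
		free(chunk[i]);
		chunk[i] = NULL;
	}
}
```

Recording the count instead of failing outright is what allows a partial dump
to be kept when the full 512MB cannot be obtained.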


B.) Irrespective of whether the underlying firmware supports crash dump,
the driver allocates a DMA buffer at start of day for each MR controller.
Let’s call this buffer the “DMA Crash Buffer”.

For this feature, the size of the DMA Crash Buffer is 1MB.
(We would not gain much even if the DMA buffer size were increased.)

C.) The driver then reads the controller info by sending the existing DCMD “MR_DCMD_CTRL_GET_INFO”.
The driver extracts the information from the ctrl info provided by the firmware and
determines whether the firmware supports the crash dump feature.

The driver enables the crash dump feature only if both of the following hold:
“Firmware supports crash dump” +
“Driver was able to create the DMA Crash Buffer”.

If either of the above is not set, the crash dump feature is disabled in the driver.
The firmware enables the crash dump feature only if “Driver sends DCMD MR_DCMD_CTRL_SET_CRASH_DUMP_PARAMS with MR_CRASH_BUF_TURN_ON”.
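
The gating rules in C.) amount to two small predicates. The sketch below is
an illustration only: struct crash_dump_gate, enum crash_buf_cmd and both
functions are hypothetical names, and the firmware-side check is modelled
from the description above rather than taken from firmware code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical mirror of the ON/OFF value carried by the DCMD. */
enum crash_buf_cmd {
	CRASH_BUF_TURN_OFF = 0,
	CRASH_BUF_TURN_ON = 1,
};

struct crash_dump_gate {
	bool fw_supports_crash_dump;	/* taken from the controller info */
	void *dma_crash_buf;		/* NULL if the DMA allocation failed */
};

/* Driver side: both conditions must hold. */
static bool driver_crash_dump_enabled(const struct crash_dump_gate *g)
{
	assert(g != NULL);
	return g->fw_supports_crash_dump && g->dma_crash_buf != NULL;
}

/* Firmware side (modelled): enabled only after the driver sent TURN_ON. */
static bool firmware_crash_dump_enabled(const struct crash_dump_gate *g,
					enum crash_buf_cmd last_cmd)
{
	return driver_crash_dump_enabled(g) && last_cmd == CRASH_BUF_TURN_ON;
}
```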

A helper application/script should use the sysfs attributes fw_crash_xxx to actually copy the data from
host memory to the filesystem.
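
A helper could drain the dump with a loop like the one sketched below. This is
a minimal user-space illustration, not a shipped tool: read_piece_fn,
struct fake_dump, fake_read_piece() and drain_dump() are hypothetical. The
callback stands in for "write the offset to fw_crash_buffer, then read the
attribute back"; the in-memory fake replaces sysfs so the example runs
anywhere. PIECE_CAP assumes 4K pages, since the patch caps each read at
PAGE_SIZE - 1 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PIECE_CAP 4095	/* assumption: PAGE_SIZE - 1 with 4K pages */

/* Hypothetical callback: set the offset, then read one piece at it. */
typedef size_t (*read_piece_fn)(void *ctx, size_t offset,
				unsigned char *dst, size_t cap);

/* In-memory stand-in for the sysfs attribute, for demonstration only. */
struct fake_dump {
	const unsigned char *data;
	size_t len;
};

static size_t fake_read_piece(void *ctx, size_t offset,
			      unsigned char *dst, size_t cap)
{
	const struct fake_dump *d = ctx;
	size_t n;

	if (offset >= d->len)
		return 0;	/* nothing left past the end of the dump */
	n = d->len - offset;
	if (n > cap)
		n = cap;
	memcpy(dst, d->data + offset, n);
	return n;
}

/*
 * Read pieces at increasing offsets until a read returns 0 or the output
 * buffer is full. Returns the number of bytes copied. A real helper would
 * write each piece to a file instead of a memory buffer.
 */
static size_t drain_dump(read_piece_fn read_piece, void *ctx,
			 unsigned char *out, size_t out_cap)
{
	unsigned char piece[PIECE_CAP];
	size_t offset = 0;

	assert(read_piece != NULL);
	while (offset < out_cap) {
		size_t cap = out_cap - offset;
		size_t n;

		if (cap > sizeof(piece))
			cap = sizeof(piece);
		n = read_piece(ctx, offset, piece, cap);
		if (n == 0)
			break;
		memcpy(out + offset, piece, n);
		offset += n;
	}
	return offset;
}
```

Going by the state values in the patch, the helper would presumably also write
2 (COPYING) to fw_crash_state before the loop and 3 (COPIED) or 4 (COPY_ERROR)
afterwards, which lets the driver free the Host Crash Buffer.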

Signed-off-by: Sumit Saxena <sumit.saxena@avagotech.com>
Signed-off-by: Kashyap Desai <kashyap.desai@avagotech.com>
---
 drivers/scsi/megaraid/megaraid_sas.h        |  58 +++++-
 drivers/scsi/megaraid/megaraid_sas_base.c   | 292 +++++++++++++++++++++++++++-
 drivers/scsi/megaraid/megaraid_sas_fusion.c | 172 +++++++++++++++-
 3 files changed, 517 insertions(+), 5 deletions(-)

diff --git a/drivers/scsi/megaraid/megaraid_sas.h b/drivers/scsi/megaraid/megaraid_sas.h
index bc7adcf..e0f03e2 100644
--- a/drivers/scsi/megaraid/megaraid_sas.h
+++ b/drivers/scsi/megaraid/megaraid_sas.h
@@ -105,6 +105,9 @@
 #define MFI_STATE_READY				0xB0000000
 #define MFI_STATE_OPERATIONAL			0xC0000000
 #define MFI_STATE_FAULT				0xF0000000
+#define MFI_STATE_FORCE_OCR			0x00000080
+#define MFI_STATE_DMADONE			0x00000008
+#define MFI_STATE_CRASH_DUMP_DONE		0x00000004
 #define MFI_RESET_REQUIRED			0x00000001
 #define MFI_RESET_ADAPTER			0x00000002
 #define MEGAMFI_FRAME_SIZE			64
@@ -191,6 +194,9 @@
 #define MR_DCMD_CLUSTER_RESET_LD		0x08010200
 #define MR_DCMD_PD_LIST_QUERY                   0x02010100
 
+#define MR_DCMD_CTRL_SET_CRASH_DUMP_PARAMS	0x01190100
+#define MR_DRIVER_SET_APP_CRASHDUMP_MODE	(0xF0010000 | 0x0600)
+
 /*
  * Global functions
  */
@@ -264,6 +270,25 @@ enum MFI_STAT {
 };
 
 /*
+ * Crash dump related defines
+ */
+#define MAX_CRASH_DUMP_SIZE 512
+#define CRASH_DMA_BUF_SIZE  (1024 * 1024)
+
+enum MR_FW_CRASH_DUMP_STATE {
+	UNAVAILABLE = 0,
+	AVAILABLE = 1,
+	COPYING = 2,
+	COPIED = 3,
+	COPY_ERROR = 4,
+};
+
+enum _MR_CRASH_BUF_STATUS {
+	MR_CRASH_BUF_TURN_OFF = 0,
+	MR_CRASH_BUF_TURN_ON = 1,
+};
+
+/*
  * Number of mailbox bytes in DCMD message frame
  */
 #define MFI_MBOX_SIZE				12
@@ -933,7 +958,19 @@ struct megasas_ctrl_info {
 		u8  reserved;                   /*0x7E7*/
 	} iov;
 
-	u8          pad[0x800-0x7E8];           /*0x7E8 pad to 2k */
+	struct {
+#if defined(__BIG_ENDIAN_BITFIELD)
+		u32     reserved:25;
+		u32     supportCrashDump:1;
+		u32     reserved1:6;
+#else
+		u32     reserved1:6;
+		u32     supportCrashDump:1;
+		u32     reserved:25;
+#endif
+	} adapterOperations3;
+
+	u8          pad[0x800-0x7EC];
 } __packed;
 
 /*
@@ -1559,6 +1596,20 @@ struct megasas_instance {
 	u32 *reply_queue;
 	dma_addr_t reply_queue_h;
 
+	u32 *crash_dump_buf;
+	dma_addr_t crash_dump_h;
+	void *crash_buf[MAX_CRASH_DUMP_SIZE];
+	u32 crash_buf_pages;
+	unsigned int    fw_crash_buffer_size;
+	unsigned int    fw_crash_state;
+	unsigned int    fw_crash_buffer_offset;
+	u32 drv_buf_index;
+	u32 drv_buf_alloc;
+	u32 crash_dump_fw_support;
+	u32 crash_dump_drv_support;
+	u32 crash_dump_app_support;
+	spinlock_t crashdump_lock;
+
 	struct megasas_register_set __iomem *reg_set;
 	u32 *reply_post_host_index_addr[MR_MAX_MSIX_REG_ARRAY];
 	struct megasas_pd_list          pd_list[MEGASAS_MAX_PD];
@@ -1606,6 +1657,7 @@ struct megasas_instance {
 	struct megasas_instance_template *instancet;
 	struct tasklet_struct isr_tasklet;
 	struct work_struct work_init;
+	struct work_struct crash_init;
 
 	u8 flag;
 	u8 unload;
@@ -1830,4 +1882,8 @@ u16 MR_LdSpanArrayGet(u32 ld, u32 span, struct MR_FW_RAID_MAP_ALL *map);
 u16 MR_PdDevHandleGet(u32 pd, struct MR_FW_RAID_MAP_ALL *map);
 u16 MR_GetLDTgtId(u32 ld, struct MR_FW_RAID_MAP_ALL *map);
 
+int megasas_set_crash_dump_params(struct megasas_instance *instance,
+		u8 crash_buf_state);
+void megasas_free_host_crash_buffer(struct megasas_instance *instance);
+void megasas_fusion_crash_dump_wq(struct work_struct *work);
 #endif				/*LSI_MEGARAID_SAS_H */
diff --git a/drivers/scsi/megaraid/megaraid_sas_base.c b/drivers/scsi/megaraid/megaraid_sas_base.c
index a894f13..5b58e39d 100644
--- a/drivers/scsi/megaraid/megaraid_sas_base.c
+++ b/drivers/scsi/megaraid/megaraid_sas_base.c
@@ -2560,6 +2560,152 @@ static int megasas_change_queue_depth(struct scsi_device *sdev,
 	return queue_depth;
 }
 
+static ssize_t
+megasas_fw_crash_buffer_store(struct device *cdev,
+	struct device_attribute *attr, const char *buf, size_t count)
+{
+	struct Scsi_Host *shost = class_to_shost(cdev);
+	struct megasas_instance *instance =
+		(struct megasas_instance *) shost->hostdata;
+	int val = 0;
+	unsigned long flags;
+
+	if (kstrtoint(buf, 0, &val) != 0)
+		return -EINVAL;
+
+	spin_lock_irqsave(&instance->crashdump_lock, flags);
+	instance->fw_crash_buffer_offset = val;
+	spin_unlock_irqrestore(&instance->crashdump_lock, flags);
+	return strlen(buf);
+}
+
+static ssize_t
+megasas_fw_crash_buffer_show(struct device *cdev,
+	struct device_attribute *attr, char *buf)
+{
+	struct Scsi_Host *shost = class_to_shost(cdev);
+	struct megasas_instance *instance =
+		(struct megasas_instance *) shost->hostdata;
+	u32 size;
+	unsigned long buff_addr;
+	unsigned long dmachunk = CRASH_DMA_BUF_SIZE;
+	unsigned long src_addr;
+	unsigned long flags;
+	u32 buff_offset;
+
+	buff_offset = instance->fw_crash_buffer_offset;
+	spin_lock_irqsave(&instance->crashdump_lock, flags);
+	if (!instance->crash_dump_buf &&
+		!((instance->fw_crash_state == AVAILABLE) ||
+		(instance->fw_crash_state == COPYING))) {
+		dev_err(&instance->pdev->dev,
+			"Firmware crash dump is not available\n");
+		spin_unlock_irqrestore(&instance->crashdump_lock, flags);
+		return -EINVAL;
+	}
+
+	buff_addr = (unsigned long) buf;
+
+	if (buff_offset >
+		(instance->fw_crash_buffer_size * dmachunk)) {
+		dev_err(&instance->pdev->dev,
+			"Firmware crash dump offset is out of range\n");
+		spin_unlock_irqrestore(&instance->crashdump_lock, flags);
+		return 0;
+	}
+
+	size = (instance->fw_crash_buffer_size * dmachunk) - buff_offset;
+	size = (size >= PAGE_SIZE) ? (PAGE_SIZE - 1) : size;
+
+	src_addr = (unsigned long)instance->crash_buf[buff_offset / dmachunk] +
+		(buff_offset % dmachunk);
+	memcpy(buf, (void *)src_addr,  size);
+	spin_unlock_irqrestore(&instance->crashdump_lock, flags);
+
+	return size;
+}
+
+static ssize_t
+megasas_fw_crash_buffer_size_show(struct device *cdev,
+	struct device_attribute *attr, char *buf)
+{
+	struct Scsi_Host *shost = class_to_shost(cdev);
+	struct megasas_instance *instance =
+		(struct megasas_instance *) shost->hostdata;
+
+	return snprintf(buf, PAGE_SIZE, "%ld\n", (unsigned long)
+		((instance->fw_crash_buffer_size) * 1024 * 1024)/PAGE_SIZE);
+}
+
+static ssize_t
+megasas_fw_crash_state_store(struct device *cdev,
+	struct device_attribute *attr, const char *buf, size_t count)
+{
+	struct Scsi_Host *shost = class_to_shost(cdev);
+	struct megasas_instance *instance =
+		(struct megasas_instance *) shost->hostdata;
+	int val = 0;
+	unsigned long flags;
+
+	if (kstrtoint(buf, 0, &val) != 0)
+		return -EINVAL;
+
+	if ((val <= AVAILABLE || val > COPY_ERROR)) {
+		dev_err(&instance->pdev->dev, "application updates invalid "
+			"firmware crash state\n");
+		return -EINVAL;
+	}
+
+	instance->fw_crash_state = val;
+
+	if ((val == COPIED) || (val == COPY_ERROR)) {
+		spin_lock_irqsave(&instance->crashdump_lock, flags);
+		megasas_free_host_crash_buffer(instance);
+		spin_unlock_irqrestore(&instance->crashdump_lock, flags);
+		if (val == COPY_ERROR)
+			dev_info(&instance->pdev->dev, "application failed to "
+				"copy Firmware crash dump\n");
+		else
+			dev_info(&instance->pdev->dev, "Firmware crash dump "
+				"copied successfully\n");
+	}
+	return strlen(buf);
+}
+
+static ssize_t
+megasas_fw_crash_state_show(struct device *cdev,
+	struct device_attribute *attr, char *buf)
+{
+	struct Scsi_Host *shost = class_to_shost(cdev);
+	struct megasas_instance *instance =
+		(struct megasas_instance *) shost->hostdata;
+	return snprintf(buf, PAGE_SIZE, "%d\n", instance->fw_crash_state);
+}
+
+static ssize_t
+megasas_page_size_show(struct device *cdev,
+	struct device_attribute *attr, char *buf)
+{
+	return snprintf(buf, PAGE_SIZE, "%ld\n", (unsigned long)PAGE_SIZE - 1);
+}
+
+static DEVICE_ATTR(fw_crash_buffer, S_IRUGO | S_IWUSR,
+	megasas_fw_crash_buffer_show, megasas_fw_crash_buffer_store);
+static DEVICE_ATTR(fw_crash_buffer_size, S_IRUGO,
+	megasas_fw_crash_buffer_size_show, NULL);
+static DEVICE_ATTR(fw_crash_state, S_IRUGO | S_IWUSR,
+	megasas_fw_crash_state_show, megasas_fw_crash_state_store);
+static DEVICE_ATTR(page_size, S_IRUGO,
+	megasas_page_size_show, NULL);
+
+struct device_attribute *megaraid_host_attrs[] = {
+	&dev_attr_fw_crash_buffer_size,
+	&dev_attr_fw_crash_buffer,
+	&dev_attr_fw_crash_state,
+	&dev_attr_page_size,
+	NULL,
+};
+
 /*
  * Scsi host template for megaraid_sas driver
  */
@@ -2575,6 +2721,7 @@ static struct scsi_host_template megasas_template = {
 	.eh_bus_reset_handler = megasas_reset_bus_host,
 	.eh_host_reset_handler = megasas_reset_bus_host,
 	.eh_timed_out = megasas_reset_timer,
+	.shost_attrs = megaraid_host_attrs,
 	.bios_param = megasas_bios_param,
 	.use_clustering = ENABLE_CLUSTERING,
 	.change_queue_depth = megasas_change_queue_depth,
@@ -3887,6 +4034,59 @@ megasas_get_ctrl_info(struct megasas_instance *instance,
 	return ret;
 }
 
+/*
+ * megasas_set_crash_dump_params -	Sends address of crash dump DMA buffer
+ *					to firmware
+ *
+ * @instance:				Adapter soft state
+ * @crash_buf_state		-	tell FW to turn ON/OFF crash dump feature
+					MR_CRASH_BUF_TURN_OFF = 0
+					MR_CRASH_BUF_TURN_ON = 1
+ * @return 0 on success non-zero on failure.
+ * Issues an internal command (DCMD) to set parameters for crash dump feature.
+ * Driver will send address of crash dump DMA buffer and set mbox to tell FW
+ * that driver supports crash dump feature. This DCMD will be sent only if
+ * crash dump feature is supported by the FW.
+ *
+ */
+int megasas_set_crash_dump_params(struct megasas_instance *instance,
+	u8 crash_buf_state)
+{
+	int ret = 0;
+	struct megasas_cmd *cmd;
+	struct megasas_dcmd_frame *dcmd;
+
+	cmd = megasas_get_cmd(instance);
+
+	if (!cmd) {
+		dev_err(&instance->pdev->dev, "Failed to get a free cmd\n");
+		return -ENOMEM;
+	}
+
+
+	dcmd = &cmd->frame->dcmd;
+
+	memset(dcmd->mbox.b, 0, MFI_MBOX_SIZE);
+	dcmd->mbox.b[0] = crash_buf_state;
+	dcmd->cmd = MFI_CMD_DCMD;
+	dcmd->cmd_status = 0xFF;
+	dcmd->sge_count = 1;
+	dcmd->flags = cpu_to_le16(MFI_FRAME_DIR_NONE);
+	dcmd->timeout = 0;
+	dcmd->pad_0 = 0;
+	dcmd->data_xfer_len = cpu_to_le32(CRASH_DMA_BUF_SIZE);
+	dcmd->opcode = cpu_to_le32(MR_DCMD_CTRL_SET_CRASH_DUMP_PARAMS);
+	dcmd->sgl.sge32[0].phys_addr = cpu_to_le32(instance->crash_dump_h);
+	dcmd->sgl.sge32[0].length = cpu_to_le32(CRASH_DMA_BUF_SIZE);
+
+	if (!megasas_issue_polled(instance, cmd))
+		ret = 0;
+	else
+		ret = -1;
+	megasas_return_cmd(instance, cmd);
+	return ret;
+}
+
 /**
  * megasas_issue_init_mfi -	Initializes the FW
  * @instance:		Adapter soft state
@@ -4272,6 +4472,27 @@ static int megasas_init_fw(struct megasas_instance *instance)
 			printk(KERN_WARNING "megaraid_sas: I am VF "
 			       "requestorId %d\n", instance->requestorId);
 		}
+
+		le32_to_cpus((u32 *)&ctrl_info->adapterOperations3);
+		instance->crash_dump_fw_support =
+			ctrl_info->adapterOperations3.supportCrashDump;
+		instance->crash_dump_drv_support =
+			(instance->crash_dump_fw_support &&
+			instance->crash_dump_buf);
+		if (instance->crash_dump_drv_support) {
+			dev_info(&instance->pdev->dev, "Firmware Crash dump "
+				"feature is supported\n");
+			megasas_set_crash_dump_params(instance,
+				MR_CRASH_BUF_TURN_OFF);
+
+		} else {
+			if (instance->crash_dump_buf)
+				pci_free_consistent(instance->pdev,
+					CRASH_DMA_BUF_SIZE,
+					instance->crash_dump_buf,
+					instance->crash_dump_h);
+			instance->crash_dump_buf = NULL;
+		}
 	}
 	instance->max_sectors_per_req = instance->max_num_sge *
 						PAGE_SIZE / 512;
@@ -4791,6 +5012,21 @@ static int megasas_probe_one(struct pci_dev *pdev,
 		break;
 	}
 
+	/* Crash dump feature related initialisation*/
+	instance->drv_buf_index = 0;
+	instance->drv_buf_alloc = 0;
+	instance->crash_dump_fw_support = 0;
+	instance->crash_dump_app_support = 0;
+	instance->fw_crash_state = UNAVAILABLE;
+	spin_lock_init(&instance->crashdump_lock);
+
+	instance->crash_dump_buf = pci_alloc_consistent(pdev,
+						CRASH_DMA_BUF_SIZE,
+						&instance->crash_dump_h);
+	if (!instance->crash_dump_buf)
+		dev_err(&instance->pdev->dev, "Can't allocate Firmware "
+			"crash dump DMA buffer\n");
+
 	megasas_poll_wait_aen = 0;
 	instance->flag_ieee = 0;
 	instance->ev = NULL;
@@ -4852,9 +5088,10 @@ static int megasas_probe_one(struct pci_dev *pdev,
 	if ((instance->pdev->device == PCI_DEVICE_ID_LSI_FUSION) ||
 	    (instance->pdev->device == PCI_DEVICE_ID_LSI_PLASMA) ||
 	    (instance->pdev->device == PCI_DEVICE_ID_LSI_INVADER) ||
-	    (instance->pdev->device == PCI_DEVICE_ID_LSI_FURY))
+	    (instance->pdev->device == PCI_DEVICE_ID_LSI_FURY)) {
 		INIT_WORK(&instance->work_init, megasas_fusion_ocr_wq);
-	else
+		INIT_WORK(&instance->crash_init, megasas_fusion_crash_dump_wq);
+	} else
 		INIT_WORK(&instance->work_init, process_fw_state_change_wq);
 
 	/*
@@ -5342,6 +5579,8 @@ static void megasas_detach_one(struct pci_dev *pdev)
 	if (instance->requestorId && !instance->skip_heartbeat_timer_del)
 		del_timer_sync(&instance->sriov_heartbeat_timer);
 
+	if (instance->fw_crash_state != UNAVAILABLE)
+		megasas_free_host_crash_buffer(instance);
 	scsi_remove_host(instance->host);
 	megasas_flush_cache(instance);
 	megasas_shutdown_controller(instance, MR_DCMD_CTRL_SHUTDOWN);
@@ -5432,6 +5671,10 @@ static void megasas_detach_one(struct pci_dev *pdev)
 				    instance->hb_host_mem,
 				    instance->hb_host_mem_h);
 
+	if (instance->crash_dump_buf)
+		pci_free_consistent(pdev, CRASH_DMA_BUF_SIZE,
+			    instance->crash_dump_buf, instance->crash_dump_h);
+
 	scsi_host_put(host);
 
 	pci_disable_device(pdev);
@@ -5523,6 +5766,45 @@ static unsigned int megasas_mgmt_poll(struct file *file, poll_table *wait)
 	return mask;
 }
 
+/*
+ * megasas_set_crash_dump_params_ioctl:
+ *		Send CRASH_DUMP_MODE DCMD to all controllers
+ * @cmd:	MFI command frame
+ */
+
+static int megasas_set_crash_dump_params_ioctl(
+	struct megasas_cmd *cmd)
+{
+	struct megasas_instance *local_instance;
+	int i, error = 0;
+	int crash_support;
+
+	crash_support = cmd->frame->dcmd.mbox.w[0];
+
+	for (i = 0; i < megasas_mgmt_info.max_index; i++) {
+		local_instance = megasas_mgmt_info.instance[i];
+		if (local_instance && local_instance->crash_dump_drv_support) {
+			if ((local_instance->adprecovery ==
+				MEGASAS_HBA_OPERATIONAL) &&
+				!megasas_set_crash_dump_params(local_instance,
+					crash_support)) {
+				local_instance->crash_dump_app_support =
+					crash_support;
+				dev_info(&local_instance->pdev->dev,
+					"Application firmware crash "
+					"dump mode set success\n");
+				error = 0;
+			} else {
+				dev_info(&local_instance->pdev->dev,
+					"Application firmware crash "
+					"dump mode set failed\n");
+				error = -1;
+			}
+		}
+	}
+	return error;
+}
+
 /**
  * megasas_mgmt_fw_ioctl -	Issues management ioctls to FW
  * @instance:			Adapter soft state
@@ -5569,6 +5851,12 @@ megasas_mgmt_fw_ioctl(struct megasas_instance *instance,
 					       MFI_FRAME_SGL64 |
 					       MFI_FRAME_SENSE64));
 
+	if (cmd->frame->dcmd.opcode == MR_DRIVER_SET_APP_CRASHDUMP_MODE) {
+		error = megasas_set_crash_dump_params_ioctl(cmd);
+		megasas_return_cmd(instance, cmd);
+		return error;
+	}
+
 	/*
 	 * The management interface between applications and the fw uses
 	 * MFI frames. E.g, RAID configuration changes, LD property changes
diff --git a/drivers/scsi/megaraid/megaraid_sas_fusion.c b/drivers/scsi/megaraid/megaraid_sas_fusion.c
index f30297d..aaba2a7 100644
--- a/drivers/scsi/megaraid/megaraid_sas_fusion.c
+++ b/drivers/scsi/megaraid/megaraid_sas_fusion.c
@@ -91,6 +91,8 @@ void megasas_start_timer(struct megasas_instance *instance,
 extern struct megasas_mgmt_info megasas_mgmt_info;
 extern int resetwaittime;
 
+
+
 /**
  * megasas_enable_intr_fusion -	Enables interrupts
  * @regs:			MFI register set
@@ -2057,7 +2059,7 @@ irqreturn_t megasas_isr_fusion(int irq, void *devp)
 {
 	struct megasas_irq_context *irq_context = devp;
 	struct megasas_instance *instance = irq_context->instance;
-	u32 mfiStatus, fw_state;
+	u32 mfiStatus, fw_state, dma_state;
 
 	if (instance->mask_interrupts)
 		return IRQ_NONE;
@@ -2079,7 +2081,16 @@ irqreturn_t megasas_isr_fusion(int irq, void *devp)
 		/* If we didn't complete any commands, check for FW fault */
 		fw_state = instance->instancet->read_fw_status_reg(
 			instance->reg_set) & MFI_STATE_MASK;
-		if (fw_state == MFI_STATE_FAULT) {
+		dma_state = instance->instancet->read_fw_status_reg
+			(instance->reg_set) & MFI_STATE_DMADONE;
+		if (instance->crash_dump_drv_support &&
+			instance->crash_dump_app_support) {
+			/* Start collecting crash, if DMA bit is done */
+			if ((fw_state == MFI_STATE_FAULT) && dma_state)
+				schedule_work(&instance->crash_init);
+			else if (fw_state == MFI_STATE_FAULT)
+				schedule_work(&instance->work_init);
+		} else if (fw_state == MFI_STATE_FAULT) {
 			printk(KERN_WARNING "megaraid_sas: Iop2SysDoorbellInt"
 			       "for scsi%d\n", instance->host->host_no);
 			schedule_work(&instance->work_init);
@@ -2232,6 +2243,49 @@ megasas_read_fw_status_reg_fusion(struct megasas_register_set __iomem *regs)
 }
 
 /**
+ * megasas_alloc_host_crash_buffer -	Host buffers for Crash dump collection from Firmware
+ * @instance:				Controller's soft instance
+ * Return: none; the number of allocated buffers is stored in drv_buf_alloc
+ */
+static void
+megasas_alloc_host_crash_buffer(struct megasas_instance *instance)
+{
+	unsigned int i;
+
+	instance->crash_buf_pages = get_order(CRASH_DMA_BUF_SIZE);
+	for (i = 0; i < MAX_CRASH_DUMP_SIZE; i++) {
+		instance->crash_buf[i] = (void	*)__get_free_pages(GFP_KERNEL,
+				instance->crash_buf_pages);
+		if (!instance->crash_buf[i]) {
+			dev_info(&instance->pdev->dev, "Firmware crash dump "
+				"memory allocation failed at index %d\n", i);
+			break;
+		}
+	}
+	instance->drv_buf_alloc = i;
+}
+
+/**
+ * megasas_free_host_crash_buffer -	Host buffers for Crash dump collection from Firmware
+ * @instance:				Controller's soft instance
+ */
+void
+megasas_free_host_crash_buffer(struct megasas_instance *instance)
+{
+	unsigned int i;
+
+	for (i = 0; i < MAX_CRASH_DUMP_SIZE; i++) {
+		if (instance->crash_buf[i])
+			free_pages((ulong)instance->crash_buf[i],
+					instance->crash_buf_pages);
+	}
+	instance->drv_buf_index = 0;
+	instance->drv_buf_alloc = 0;
+	instance->fw_crash_state = UNAVAILABLE;
+	instance->fw_crash_buffer_size = 0;
+}
+
+/**
  * megasas_adp_reset_fusion -	For controller reset
  * @regs:				MFI register set
  */
@@ -2374,6 +2428,7 @@ int megasas_reset_fusion(struct Scsi_Host *shost, int iotimeout)
 	struct megasas_cmd *cmd_mfi;
 	union MEGASAS_REQUEST_DESCRIPTOR_UNION *req_desc;
 	u32 host_diag, abs_state, status_reg, reset_adapter;
+	u32 io_timeout_in_crash_mode = 0;
 
 	instance = (struct megasas_instance *)shost->hostdata;
 	fusion = instance->ctrl_context;
@@ -2387,6 +2442,42 @@ int megasas_reset_fusion(struct Scsi_Host *shost, int iotimeout)
 		mutex_unlock(&instance->reset_mutex);
 		return FAILED;
 	}
+	status_reg = instance->instancet->read_fw_status_reg(instance->reg_set);
+	abs_state = status_reg & MFI_STATE_MASK;
+
+	/* IO timeout detected, forcibly put FW in FAULT state */
+	if (abs_state != MFI_STATE_FAULT && instance->crash_dump_buf &&
+		instance->crash_dump_app_support && iotimeout) {
+		dev_info(&instance->pdev->dev, "IO timeout is detected, "
+			"forcibly FAULT Firmware\n");
+		instance->adprecovery = MEGASAS_ADPRESET_SM_INFAULT;
+		status_reg = readl(&instance->reg_set->doorbell);
+		writel(status_reg | MFI_STATE_FORCE_OCR,
+			&instance->reg_set->doorbell);
+		readl(&instance->reg_set->doorbell);
+		mutex_unlock(&instance->reset_mutex);
+		do {
+			ssleep(3);
+			io_timeout_in_crash_mode++;
+			dev_dbg(&instance->pdev->dev, "waiting for [%d] "
+				"seconds for crash dump collection and OCR "
+				"to be done\n", (io_timeout_in_crash_mode * 3));
+		} while ((instance->adprecovery != MEGASAS_HBA_OPERATIONAL) &&
+			(io_timeout_in_crash_mode < 80));
+
+		if (instance->adprecovery == MEGASAS_HBA_OPERATIONAL) {
+			dev_info(&instance->pdev->dev, "OCR done for IO "
+				"timeout case\n");
+			retval = SUCCESS;
+		} else {
+			dev_info(&instance->pdev->dev, "Controller is not "
+				"operational after 240 seconds wait for IO "
+				"timeout case in FW crash dump mode\n do "
+				"OCR/kill adapter\n");
+			retval = megasas_reset_fusion(shost, 0);
+		}
+		return retval;
+	}
 
 	if (instance->requestorId && !instance->skip_heartbeat_timer_del)
 		del_timer_sync(&instance->sriov_heartbeat_timer);
@@ -2653,6 +2744,15 @@ int megasas_reset_fusion(struct Scsi_Host *shost, int iotimeout)
 			printk(KERN_WARNING "megaraid_sas: Reset "
 			       "successful for scsi%d.\n",
 				instance->host->host_no);
+
+			if (instance->crash_dump_drv_support) {
+				if (instance->crash_dump_app_support)
+					megasas_set_crash_dump_params(instance,
+						MR_CRASH_BUF_TURN_ON);
+				else
+					megasas_set_crash_dump_params(instance,
+						MR_CRASH_BUF_TURN_OFF);
+			}
 			retval = SUCCESS;
 			goto out;
 		}
@@ -2681,6 +2781,74 @@ out:
 	return retval;
 }
 
+/* Fusion Crash dump collection work queue */
+void  megasas_fusion_crash_dump_wq(struct work_struct *work)
+{
+	struct megasas_instance *instance =
+		container_of(work, struct megasas_instance, crash_init);
+	u32 status_reg;
+	u8 partial_copy = 0;
+
+
+	status_reg = instance->instancet->read_fw_status_reg(instance->reg_set);
+
+	/*
+	 * Allocate host crash buffers to copy data from 1 MB DMA crash buffer
+	 * to host crash buffers
+	 */
+	if (instance->drv_buf_index == 0) {
+		/* Buffer is already allocated for old Crash dump.
+		 * Do OCR and do not wait for crash dump collection
+		 */
+		if (instance->drv_buf_alloc) {
+			dev_info(&instance->pdev->dev, "earlier crash dump is "
+				"not yet copied by application, ignoring this "
+				"crash dump and initiating OCR\n");
+			status_reg |= MFI_STATE_CRASH_DUMP_DONE;
+			writel(status_reg,
+				&instance->reg_set->outbound_scratch_pad);
+			readl(&instance->reg_set->outbound_scratch_pad);
+			return;
+		}
+		megasas_alloc_host_crash_buffer(instance);
+		dev_info(&instance->pdev->dev, "Number of host crash buffers "
+			"allocated: %d\n", instance->drv_buf_alloc);
+	}
+
+	/*
+	 * Driver has allocated max buffers, which can be allocated
+	 * and FW has more crash dump data, then driver will
+	 * ignore the data.
+	 */
+	if (instance->drv_buf_index >= (instance->drv_buf_alloc)) {
+		dev_info(&instance->pdev->dev, "Driver is done copying "
+			"the buffer: %d\n", instance->drv_buf_alloc);
+		status_reg |= MFI_STATE_CRASH_DUMP_DONE;
+		partial_copy = 1;
+	} else {
+		memcpy(instance->crash_buf[instance->drv_buf_index],
+			instance->crash_dump_buf, CRASH_DMA_BUF_SIZE);
+		instance->drv_buf_index++;
+		status_reg &= ~MFI_STATE_DMADONE;
+	}
+
+	if (status_reg & MFI_STATE_CRASH_DUMP_DONE) {
+		dev_info(&instance->pdev->dev, "Crash Dump is available,number "
+			"of copied buffers: %d\n", instance->drv_buf_index);
+		instance->fw_crash_buffer_size =  instance->drv_buf_index;
+		instance->fw_crash_state = AVAILABLE;
+		instance->drv_buf_index = 0;
+		writel(status_reg, &instance->reg_set->outbound_scratch_pad);
+		readl(&instance->reg_set->outbound_scratch_pad);
+		if (!partial_copy)
+			megasas_reset_fusion(instance->host, 0);
+	} else {
+		writel(status_reg, &instance->reg_set->outbound_scratch_pad);
+		readl(&instance->reg_set->outbound_scratch_pad);
+	}
+}
+
+
 /* Fusion OCR work queue */
 void megasas_fusion_ocr_wq(struct work_struct *work)
 {
-- 
1.8.3.1



* Re: [PATCH 04/11] megaraid_sas : Firmware crash dump feature support
  2014-09-06 13:25 [PATCH 04/11] megaraid_sas : Firmware crash dump feature support Sumit.Saxena
@ 2014-09-09 15:54 ` Tomas Henzl
  2014-09-09 16:18   ` Elliott, Robert (Server Storage)
  2014-09-10 12:12   ` Sumit Saxena
  0 siblings, 2 replies; 14+ messages in thread
From: Tomas Henzl @ 2014-09-09 15:54 UTC
  To: Sumit.Saxena, linux-scsi
  Cc: martin.petersen, hch, jbottomley, kashyap.desai, aradford

On 09/06/2014 03:25 PM, Sumit.Saxena@avagotech.com wrote:
> This feature will provide similar interface as kernel crash dump feature.
> When megaraid firmware encounter any crash, driver will collect the firmware raw image and 
> dump it into pre-configured location.
>
> Driver will allocate two different segment of memory. 
> #1 Non-DMA able large buffer (will be allocated on demand) to capture actual FW crash dump.
> #2 DMA buffer (persistence allocation) just to do a arbitrator job. 
>
> Firmware will keep writing Crash dump data in chucks of DMA buffer size into #2, 
> which will be copy back by driver to the host memory as described in #1.
>
> Driver-Firmware interface:
> ==================
> A.) Host driver can allocate maximum 512MB Host memory to store crash dump data. 
>
> This memory will be internal to the host and will not be exposed to the Firmware.
> Driver may not be able to allocate 512 MB. In that case, driver will do possible memory 
> (available at run time) allocation to store crash dump data. 
>
> Let’s call this buffer as Host Crash Buffer. 
>
> Host Crash buffer will not be contigious as a whole, but it will have multiple chunk of contigious memory. 
> This will be internal to driver and firmware/application are unaware of it. 
> Partial allocation of Host Crash buffer may have valid information to debug depending upon 
> what was collected in that buffer and depending on nature of failure. 
>
> Complete Crash dump is the best case, but we do want to capture partial buffer just to grab something rather than nothing.
> Host Crash buffer will be allocated only when FW Crash dump data is available, 
> and will be deallocated once application copy Host Crash buffer to the file. 
> Host Crash buffer size can be anything between 1MB to 512MB. (It will be multiple of 1MBs)
>
>
> B.) Irrespective of underlying Firmware capability of crash dump support, 
> driver will allocate DMA buffer at start of the day for each MR controllers. 
> Let’s call this buffer as “DMA Crash Buffer”.
>
> For this feature, size of DMA crash buffer will be 1MB. 
> (We will not gain much even if DMA buffer size is increased.) 
>
> C.) Driver will now read Controller Info sending existing dcmd “MR_DCMD_CTRL_GET_INFO”. 
> Driver should extract the information from ctrl info provided by firmware and 
> figure out if firmware support crash dump feature or not.
>
> Driver will enable crash dump feature only if
> “Firmware support Crash dump” +
> “Driver was able to create DMA Crash Buffer”.
>
> If either one from above is not set, Crash dump feature should be disable in driver.
> Firmware will enable crash dump feature only if “Driver Send DCMD- MR_DCMD_SET_CRASH_BUF_PARA with MR_CRASH_BUF_TURN_ON”
>
> Helper application/script should use sysfs parameter fw_crash_xxx to actually copy data from
> host memory to the filesystem.

Is it possible to store the crash dump data on a filesystem on the same controller after
the controller has crashed, or do you expect another disk/controller to be used?
With several controllers in a system this may take a lot of memory. Could you also
reduce it when a kdump kernel is running, by not using this feature?

See other comments inline.

>
> Signed-off-by: Sumit Saxena <sumit.saxena@avagotech.com>
> Signed-off-by: Kashyap Desai <kashyap.desai@avagotech.com>
> ---
>  drivers/scsi/megaraid/megaraid_sas.h        |  58 +++++-
>  drivers/scsi/megaraid/megaraid_sas_base.c   | 292 +++++++++++++++++++++++++++-
>  drivers/scsi/megaraid/megaraid_sas_fusion.c | 172 +++++++++++++++-
>  3 files changed, 517 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/scsi/megaraid/megaraid_sas.h b/drivers/scsi/megaraid/megaraid_sas.h
> index bc7adcf..e0f03e2 100644
> --- a/drivers/scsi/megaraid/megaraid_sas.h
> +++ b/drivers/scsi/megaraid/megaraid_sas.h
> @@ -105,6 +105,9 @@
>  #define MFI_STATE_READY				0xB0000000
>  #define MFI_STATE_OPERATIONAL			0xC0000000
>  #define MFI_STATE_FAULT				0xF0000000
> +#define MFI_STATE_FORCE_OCR			0x00000080
> +#define MFI_STATE_DMADONE			0x00000008
> +#define MFI_STATE_CRASH_DUMP_DONE		0x00000004
>  #define MFI_RESET_REQUIRED			0x00000001
>  #define MFI_RESET_ADAPTER			0x00000002
>  #define MEGAMFI_FRAME_SIZE			64
> @@ -191,6 +194,9 @@
>  #define MR_DCMD_CLUSTER_RESET_LD		0x08010200
>  #define MR_DCMD_PD_LIST_QUERY                   0x02010100
>  
> +#define MR_DCMD_CTRL_SET_CRASH_DUMP_PARAMS	0x01190100
> +#define MR_DRIVER_SET_APP_CRASHDUMP_MODE	(0xF0010000 | 0x0600)
> +
>  /*
>   * Global functions
>   */
> @@ -264,6 +270,25 @@ enum MFI_STAT {
>  };
>  
>  /*
> + * Crash dump related defines
> + */
> +#define MAX_CRASH_DUMP_SIZE 512
> +#define CRASH_DMA_BUF_SIZE  (1024 * 1024)
> +
> +enum MR_FW_CRASH_DUMP_STATE {
> +	UNAVAILABLE = 0,
> +	AVAILABLE = 1,
> +	COPYING = 2,
> +	COPIED = 3,
> +	COPY_ERROR = 4,
> +};
> +
> +enum _MR_CRASH_BUF_STATUS {
> +	MR_CRASH_BUF_TURN_OFF = 0,
> +	MR_CRASH_BUF_TURN_ON = 1,
> +};
> +
> +/*
>   * Number of mailbox bytes in DCMD message frame
>   */
>  #define MFI_MBOX_SIZE				12
> @@ -933,7 +958,19 @@ struct megasas_ctrl_info {
>  		u8  reserved;                   /*0x7E7*/
>  	} iov;
>  
> -	u8          pad[0x800-0x7E8];           /*0x7E8 pad to 2k */
> +	struct {
> +#if defined(__BIG_ENDIAN_BITFIELD)
> +		u32     reserved:25;
> +		u32     supportCrashDump:1;
> +		u32     reserved1:6;
> +#else
> +		u32     reserved1:6;
> +		u32     supportCrashDump:1;
> +		u32     reserved:25;
> +#endif
> +	} adapterOperations3;
> +
> +	u8          pad[0x800-0x7EC];
>  } __packed;
>  
>  /*
> @@ -1559,6 +1596,20 @@ struct megasas_instance {
>  	u32 *reply_queue;
>  	dma_addr_t reply_queue_h;
>  
> +	u32 *crash_dump_buf;
> +	dma_addr_t crash_dump_h;
> +	void *crash_buf[MAX_CRASH_DUMP_SIZE];
> +	u32 crash_buf_pages;
> +	unsigned int    fw_crash_buffer_size;
> +	unsigned int    fw_crash_state;
> +	unsigned int    fw_crash_buffer_offset;
> +	u32 drv_buf_index;
> +	u32 drv_buf_alloc;
> +	u32 crash_dump_fw_support;
> +	u32 crash_dump_drv_support;
> +	u32 crash_dump_app_support;
> +	spinlock_t crashdump_lock;
> +
>  	struct megasas_register_set __iomem *reg_set;
>  	u32 *reply_post_host_index_addr[MR_MAX_MSIX_REG_ARRAY];
>  	struct megasas_pd_list          pd_list[MEGASAS_MAX_PD];
> @@ -1606,6 +1657,7 @@ struct megasas_instance {
>  	struct megasas_instance_template *instancet;
>  	struct tasklet_struct isr_tasklet;
>  	struct work_struct work_init;
> +	struct work_struct crash_init;
>  
>  	u8 flag;
>  	u8 unload;
> @@ -1830,4 +1882,8 @@ u16 MR_LdSpanArrayGet(u32 ld, u32 span, struct MR_FW_RAID_MAP_ALL *map);
>  u16 MR_PdDevHandleGet(u32 pd, struct MR_FW_RAID_MAP_ALL *map);
>  u16 MR_GetLDTgtId(u32 ld, struct MR_FW_RAID_MAP_ALL *map);
>  
> +int megasas_set_crash_dump_params(struct megasas_instance *instance,
> +		u8 crash_buf_state);
> +void megasas_free_host_crash_buffer(struct megasas_instance *instance);
> +void megasas_fusion_crash_dump_wq(struct work_struct *work);
>  #endif				/*LSI_MEGARAID_SAS_H */
> diff --git a/drivers/scsi/megaraid/megaraid_sas_base.c b/drivers/scsi/megaraid/megaraid_sas_base.c
> index a894f13..5b58e39d 100644
> --- a/drivers/scsi/megaraid/megaraid_sas_base.c
> +++ b/drivers/scsi/megaraid/megaraid_sas_base.c
> @@ -2560,6 +2560,152 @@ static int megasas_change_queue_depth(struct scsi_device *sdev,
>  	return queue_depth;
>  }
>  
> +static ssize_t
> +megasas_fw_crash_buffer_store(struct device *cdev,
> +	struct device_attribute *attr, const char *buf, size_t count)
> +{
> +	struct Scsi_Host *shost = class_to_shost(cdev);
> +	struct megasas_instance *instance =
> +		(struct megasas_instance *) shost->hostdata;
> +	int val = 0;
> +	unsigned long flags;
> +
> +	if (kstrtoint(buf, 0, &val) != 0)
> +		return -EINVAL;
> +
> +	spin_lock_irqsave(&instance->crashdump_lock, flags);
> +	instance->fw_crash_buffer_offset = val;
> +	spin_unlock_irqrestore(&instance->crashdump_lock, flags);

The access to fw_crash_buffer_offset is not protected in the function below.
Why is the spinlock needed here? Could it be removed?
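
For context on what the reader does with this offset: the show function
below splits it into a chunk index and an offset inside that chunk. A
minimal user-space sketch of that lookup follows. The struct and function
names are hypothetical. Two details are assumptions of the sketch rather
than behaviour of the patch: the range check uses ">=" and the length is
clamped so that a copy never runs past the end of the current chunk.

```c
#include <stddef.h>
#include <stdint.h>

#define CRASH_DMA_BUF_SIZE (1024u * 1024u)  /* chunk size used by the patch */

/* Hypothetical result type: where a linear dump offset lands. */
struct chunk_pos {
	uint32_t index;    /* which host crash buffer chunk */
	uint32_t offset;   /* byte offset inside that chunk */
	uint32_t len;      /* bytes that may be copied from there */
};

/*
 * Map a linear offset to (chunk, offset-in-chunk) with "/" and "%".
 * Returns 0 on success, -1 if the offset is at or past the end of the
 * collected data (nr_chunks chunks of CRASH_DMA_BUF_SIZE bytes).
 */
int locate_offset(uint64_t buff_offset, uint32_t nr_chunks,
		  uint32_t max_read, struct chunk_pos *pos)
{
	uint64_t total = (uint64_t)nr_chunks * CRASH_DMA_BUF_SIZE;
	uint64_t left_in_chunk;
	uint64_t len;

	if (buff_offset >= total)
		return -1;

	pos->index = (uint32_t)(buff_offset / CRASH_DMA_BUF_SIZE);
	pos->offset = (uint32_t)(buff_offset % CRASH_DMA_BUF_SIZE);

	/* Never return more than the caller asked for ... */
	len = total - buff_offset;
	if (len > max_read)
		len = max_read;

	/* ... and never read across a chunk boundary. */
	left_in_chunk = CRASH_DMA_BUF_SIZE - pos->offset;
	if (len > left_in_chunk)
		len = left_in_chunk;

	pos->len = (uint32_t)len;
	return 0;
}
```

Because the chunks are separate allocations, the boundary clamp matters
whenever the read size does not divide the chunk size evenly.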

> +	return strlen(buf);
> +}
> +
> +static ssize_t
> +megasas_fw_crash_buffer_show(struct device *cdev,
> +	struct device_attribute *attr, char *buf)
> +{
> +	struct Scsi_Host *shost = class_to_shost(cdev);
> +	struct megasas_instance *instance =
> +		(struct megasas_instance *) shost->hostdata;
> +	u32 size;
> +	unsigned long buff_addr;
> +	unsigned long dmachunk = CRASH_DMA_BUF_SIZE;
> +	unsigned long src_addr;
> +	unsigned long flags;
> +	u32 buff_offset;
> +
> +	buff_offset = instance->fw_crash_buffer_offset;
> +	spin_lock_irqsave(&instance->crashdump_lock, flags);
> +	if (!instance->crash_dump_buf &&
> +		!((instance->fw_crash_state == AVAILABLE) ||
> +		(instance->fw_crash_state == COPYING))) {
> +		dev_err(&instance->pdev->dev,
> +			"Firmware crash dump is not available\n");
> +		spin_unlock_irqrestore(&instance->crashdump_lock, flags);
> +		return -EINVAL;
> +	}
> +
> +	buff_addr = (unsigned long) buf;
> +
> +	if (buff_offset >
> +		(instance->fw_crash_buffer_size * dmachunk)) {
> +		dev_err(&instance->pdev->dev,
> +			"Firmware crash dump offset is out of range\n");
> +		spin_unlock_irqrestore(&instance->crashdump_lock, flags);
> +		return 0;
> +	}
> +
> +	size = (instance->fw_crash_buffer_size * dmachunk) - buff_offset;
> +	size = (size >= PAGE_SIZE) ? (PAGE_SIZE - 1) : size;
> +
> +	src_addr = (unsigned long)instance->crash_buf[buff_offset / dmachunk] +
> +		(buff_offset % dmachunk);
> +	memcpy(buf, (void *)src_addr,  size);
> +	spin_unlock_irqrestore(&instance->crashdump_lock, flags);
> +
> +	return size;
> +}
> +
> +static ssize_t
> +megasas_fw_crash_buffer_size_show(struct device *cdev,
> +	struct device_attribute *attr, char *buf)
> +{
> +	struct Scsi_Host *shost = class_to_shost(cdev);
> +	struct megasas_instance *instance =
> +		(struct megasas_instance *) shost->hostdata;
> +
> +	return snprintf(buf, PAGE_SIZE, "%ld\n", (unsigned long)
> +		((instance->fw_crash_buffer_size) * 1024 * 1024)/PAGE_SIZE);
> +}
> +
> +static ssize_t
> +megasas_fw_crash_state_store(struct device *cdev,
> +	struct device_attribute *attr, const char *buf, size_t count)
> +{
> +	struct Scsi_Host *shost = class_to_shost(cdev);
> +	struct megasas_instance *instance =
> +		(struct megasas_instance *) shost->hostdata;
> +	int val = 0;
> +	unsigned long flags;
> +
> +	if (kstrtoint(buf, 0, &val) != 0)
> +		return -EINVAL;
> +
> +	if ((val <= AVAILABLE || val > COPY_ERROR)) {
> +		dev_err(&instance->pdev->dev, "application updates invalid "
> +			"firmware crash state\n");
> +		return -EINVAL;
> +	}
> +
> +	instance->fw_crash_state = val;
> +
> +	if ((val == COPIED) || (val == COPY_ERROR)) {
> +		spin_lock_irqsave(&instance->crashdump_lock, flags);
> +		megasas_free_host_crash_buffer(instance);
> +		spin_unlock_irqrestore(&instance->crashdump_lock, flags);
> +		if (val == COPY_ERROR)
> +			dev_info(&instance->pdev->dev, "application failed to "
> +				"copy Firmware crash dump\n");
> +		else
> +			dev_info(&instance->pdev->dev, "Firmware crash dump "
> +				"copied successfully\n");
> +	}
> +	return strlen(buf);
> +}
> +
> +static ssize_t
> +megasas_fw_crash_state_show(struct device *cdev,
> +	struct device_attribute *attr, char *buf)
> +{
> +	struct Scsi_Host *shost = class_to_shost(cdev);
> +	struct megasas_instance *instance =
> +		(struct megasas_instance *) shost->hostdata;
> +	return snprintf(buf, PAGE_SIZE, "%d\n", instance->fw_crash_state);
> +}
> +
> +static ssize_t
> +megasas_page_size_show(struct device *cdev,
> +	struct device_attribute *attr, char *buf)
> +{
> +	return snprintf(buf, PAGE_SIZE, "%ld\n", (unsigned long)PAGE_SIZE - 1);
> +}
> +
> +static DEVICE_ATTR(fw_crash_buffer, S_IRUGO | S_IWUSR,
> +	megasas_fw_crash_buffer_show, megasas_fw_crash_buffer_store);
> +static DEVICE_ATTR(fw_crash_buffer_size, S_IRUGO,
> +	megasas_fw_crash_buffer_size_show, NULL);
> +static DEVICE_ATTR(fw_crash_state, S_IRUGO | S_IWUSR,
> +	megasas_fw_crash_state_show, megasas_fw_crash_state_store);
> +static DEVICE_ATTR(page_size, S_IRUGO,
> +	megasas_page_size_show, NULL);
> +
> +struct device_attribute *megaraid_host_attrs[] = {
> +	&dev_attr_fw_crash_buffer_size,
> +	&dev_attr_fw_crash_buffer,
> +	&dev_attr_fw_crash_state,
> +	&dev_attr_page_size,
> +	NULL,
> +};
> +
>  /*
>   * Scsi host template for megaraid_sas driver
>   */
> @@ -2575,6 +2721,7 @@ static struct scsi_host_template megasas_template = {
>  	.eh_bus_reset_handler = megasas_reset_bus_host,
>  	.eh_host_reset_handler = megasas_reset_bus_host,
>  	.eh_timed_out = megasas_reset_timer,
> +	.shost_attrs = megaraid_host_attrs,
>  	.bios_param = megasas_bios_param,
>  	.use_clustering = ENABLE_CLUSTERING,
>  	.change_queue_depth = megasas_change_queue_depth,
> @@ -3887,6 +4034,59 @@ megasas_get_ctrl_info(struct megasas_instance *instance,
>  	return ret;
>  }
>  
> +/*
> + * megasas_set_crash_dump_params -	Sends address of crash dump DMA buffer
> + *					to firmware
> + *
> + * @instance:				Adapter soft state
> + * @crash_buf_state		-	tell FW to turn ON/OFF crash dump feature
> +					MR_CRASH_BUF_TURN_OFF = 0
> +					MR_CRASH_BUF_TURN_ON = 1
> + * @return 0 on success non-zero on failure.
> + * Issues an internal command (DCMD) to set parameters for crash dump feature.
> + * Driver will send address of crash dump DMA buffer and set mbox to tell FW
> + * that driver supports crash dump feature. This DCMD will be sent only if
> + * crash dump feature is supported by the FW.
> + *
> + */
> +int megasas_set_crash_dump_params(struct megasas_instance *instance,
> +	u8 crash_buf_state)
> +{
> +	int ret = 0;
> +	struct megasas_cmd *cmd;
> +	struct megasas_dcmd_frame *dcmd;
> +
> +	cmd = megasas_get_cmd(instance);
> +
> +	if (!cmd) {
> +		dev_err(&instance->pdev->dev, "Failed to get a free cmd\n");
> +		return -ENOMEM;
> +	}
> +
> +
> +	dcmd = &cmd->frame->dcmd;
> +
> +	memset(dcmd->mbox.b, 0, MFI_MBOX_SIZE);
> +	dcmd->mbox.b[0] = crash_buf_state;
> +	dcmd->cmd = MFI_CMD_DCMD;
> +	dcmd->cmd_status = 0xFF;
> +	dcmd->sge_count = 1;
> +	dcmd->flags = cpu_to_le16(MFI_FRAME_DIR_NONE);
> +	dcmd->timeout = 0;
> +	dcmd->pad_0 = 0;
> +	dcmd->data_xfer_len = cpu_to_le32(CRASH_DMA_BUF_SIZE);
> +	dcmd->opcode = cpu_to_le32(MR_DCMD_CTRL_SET_CRASH_DUMP_PARAMS);
> +	dcmd->sgl.sge32[0].phys_addr = cpu_to_le32(instance->crash_dump_h);
> +	dcmd->sgl.sge32[0].length = cpu_to_le32(CRASH_DMA_BUF_SIZE);
> +
> +	if (!megasas_issue_polled(instance, cmd))
> +		ret = 0;
> +	else
> +		ret = -1;
> +	megasas_return_cmd(instance, cmd);
> +	return ret;
> +}
> +
>  /**
>   * megasas_issue_init_mfi -	Initializes the FW
>   * @instance:		Adapter soft state
> @@ -4272,6 +4472,27 @@ static int megasas_init_fw(struct megasas_instance *instance)
>  			printk(KERN_WARNING "megaraid_sas: I am VF "
>  			       "requestorId %d\n", instance->requestorId);
>  		}
> +
> +		le32_to_cpus((u32 *)&ctrl_info->adapterOperations3);
> +		instance->crash_dump_fw_support =
> +			ctrl_info->adapterOperations3.supportCrashDump;
> +		instance->crash_dump_drv_support =
> +			(instance->crash_dump_fw_support &&
> +			instance->crash_dump_buf);
> +		if (instance->crash_dump_drv_support) {
> +			dev_info(&instance->pdev->dev, "Firmware Crash dump "
> +				"feature is supported\n");
> +			megasas_set_crash_dump_params(instance,
> +				MR_CRASH_BUF_TURN_OFF);
> +
> +		} else {
> +			if (instance->crash_dump_buf)
> +				pci_free_consistent(instance->pdev,
> +					CRASH_DMA_BUF_SIZE,
> +					instance->crash_dump_buf,
> +					instance->crash_dump_h);
> +			instance->crash_dump_buf = NULL;
> +		}
>  	}
>  	instance->max_sectors_per_req = instance->max_num_sge *
>  						PAGE_SIZE / 512;
> @@ -4791,6 +5012,21 @@ static int megasas_probe_one(struct pci_dev *pdev,
>  		break;
>  	}
>  
> +	/* Crash dump feature related initialisation*/
> +	instance->drv_buf_index = 0;
> +	instance->drv_buf_alloc = 0;
> +	instance->crash_dump_fw_support = 0;
> +	instance->crash_dump_app_support = 0;
> +	instance->fw_crash_state = UNAVAILABLE;
> +	spin_lock_init(&instance->crashdump_lock);
> +
> +	instance->crash_dump_buf = pci_alloc_consistent(pdev,
> +						CRASH_DMA_BUF_SIZE,
> +						&instance->crash_dump_h);
> +	if (!instance->crash_dump_buf)
> +		dev_err(&instance->pdev->dev, "Can't allocate Firmware "
> +			"crash dump DMA buffer\n");
> +
>  	megasas_poll_wait_aen = 0;
>  	instance->flag_ieee = 0;
>  	instance->ev = NULL;
> @@ -4852,9 +5088,10 @@ static int megasas_probe_one(struct pci_dev *pdev,
>  	if ((instance->pdev->device == PCI_DEVICE_ID_LSI_FUSION) ||
>  	    (instance->pdev->device == PCI_DEVICE_ID_LSI_PLASMA) ||
>  	    (instance->pdev->device == PCI_DEVICE_ID_LSI_INVADER) ||
> -	    (instance->pdev->device == PCI_DEVICE_ID_LSI_FURY))
> +	    (instance->pdev->device == PCI_DEVICE_ID_LSI_FURY)) {
>  		INIT_WORK(&instance->work_init, megasas_fusion_ocr_wq);
> -	else
> +		INIT_WORK(&instance->crash_init, megasas_fusion_crash_dump_wq);
> +	} else
>  		INIT_WORK(&instance->work_init, process_fw_state_change_wq);
>  
>  	/*
> @@ -5342,6 +5579,8 @@ static void megasas_detach_one(struct pci_dev *pdev)
>  	if (instance->requestorId && !instance->skip_heartbeat_timer_del)
>  		del_timer_sync(&instance->sriov_heartbeat_timer);
>  
> +	if (instance->fw_crash_state != UNAVAILABLE)
> +		megasas_free_host_crash_buffer(instance);
>  	scsi_remove_host(instance->host);
>  	megasas_flush_cache(instance);
>  	megasas_shutdown_controller(instance, MR_DCMD_CTRL_SHUTDOWN);
> @@ -5432,6 +5671,10 @@ static void megasas_detach_one(struct pci_dev *pdev)
>  				    instance->hb_host_mem,
>  				    instance->hb_host_mem_h);
>  
> +	if (instance->crash_dump_buf)
> +		pci_free_consistent(pdev, CRASH_DMA_BUF_SIZE,
> +			    instance->crash_dump_buf, instance->crash_dump_h);
> +
>  	scsi_host_put(host);
>  
>  	pci_disable_device(pdev);
> @@ -5523,6 +5766,45 @@ static unsigned int megasas_mgmt_poll(struct file *file, poll_table *wait)
>  	return mask;
>  }
>  
> +/*
> + * megasas_set_crash_dump_params_ioctl:
> + *		Send CRASH_DUMP_MODE DCMD to all controllers
> + * @cmd:	MFI command frame
> + */
> +
> +static int megasas_set_crash_dump_params_ioctl(
> +	struct megasas_cmd *cmd)
> +{
> +	struct megasas_instance *local_instance;
> +	int i, error = 0;
> +	int crash_support;
> +
> +	crash_support = cmd->frame->dcmd.mbox.w[0];
> +
> +	for (i = 0; i < megasas_mgmt_info.max_index; i++) {
> +		local_instance = megasas_mgmt_info.instance[i];
> +		if (local_instance && local_instance->crash_dump_drv_support) {
> +			if ((local_instance->adprecovery ==
> +				MEGASAS_HBA_OPERATIONAL) &&
> +				!megasas_set_crash_dump_params(local_instance,
> +					crash_support)) {
> +				local_instance->crash_dump_app_support =
> +					crash_support;
> +				dev_info(&local_instance->pdev->dev,
> +					"Application firmware crash "
> +					"dump mode set success\n");
> +				error = 0;
> +			} else {
> +				dev_info(&local_instance->pdev->dev,
> +					"Application firmware crash "
> +					"dump mode set failed\n");
> +				error = -1;
> +			}
> +		}
> +	}
> +	return error;
> +}
> +
>  /**
>   * megasas_mgmt_fw_ioctl -	Issues management ioctls to FW
>   * @instance:			Adapter soft state
> @@ -5569,6 +5851,12 @@ megasas_mgmt_fw_ioctl(struct megasas_instance *instance,
>  					       MFI_FRAME_SGL64 |
>  					       MFI_FRAME_SENSE64));
>  
> +	if (cmd->frame->dcmd.opcode == MR_DRIVER_SET_APP_CRASHDUMP_MODE) {
> +		error = megasas_set_crash_dump_params_ioctl(cmd);
> +		megasas_return_cmd(instance, cmd);
> +		return error;
> +	}
> +
>  	/*
>  	 * The management interface between applications and the fw uses
>  	 * MFI frames. E.g, RAID configuration changes, LD property changes
> diff --git a/drivers/scsi/megaraid/megaraid_sas_fusion.c b/drivers/scsi/megaraid/megaraid_sas_fusion.c
> index f30297d..aaba2a7 100644
> --- a/drivers/scsi/megaraid/megaraid_sas_fusion.c
> +++ b/drivers/scsi/megaraid/megaraid_sas_fusion.c
> @@ -91,6 +91,8 @@ void megasas_start_timer(struct megasas_instance *instance,
>  extern struct megasas_mgmt_info megasas_mgmt_info;
>  extern int resetwaittime;
>  
> +
> +
>  /**
>   * megasas_enable_intr_fusion -	Enables interrupts
>   * @regs:			MFI register set
> @@ -2057,7 +2059,7 @@ irqreturn_t megasas_isr_fusion(int irq, void *devp)
>  {
>  	struct megasas_irq_context *irq_context = devp;
>  	struct megasas_instance *instance = irq_context->instance;
> -	u32 mfiStatus, fw_state;
> +	u32 mfiStatus, fw_state, dma_state;
>  
>  	if (instance->mask_interrupts)
>  		return IRQ_NONE;
> @@ -2079,7 +2081,16 @@ irqreturn_t megasas_isr_fusion(int irq, void *devp)
>  		/* If we didn't complete any commands, check for FW fault */
>  		fw_state = instance->instancet->read_fw_status_reg(
>  			instance->reg_set) & MFI_STATE_MASK;
> -		if (fw_state == MFI_STATE_FAULT) {
> +		dma_state = instance->instancet->read_fw_status_reg
> +			(instance->reg_set) & MFI_STATE_DMADONE;
> +		if (instance->crash_dump_drv_support &&
> +			instance->crash_dump_app_support) {
> +			/* Start collecting crash, if DMA bit is done */
> +			if ((fw_state == MFI_STATE_FAULT) && dma_state)
> +				schedule_work(&instance->crash_init);
> +			else if (fw_state == MFI_STATE_FAULT)
> +				schedule_work(&instance->work_init);
> +		} else if (fw_state == MFI_STATE_FAULT) {
>  			printk(KERN_WARNING "megaraid_sas: Iop2SysDoorbellInt"
>  			       "for scsi%d\n", instance->host->host_no);
>  			schedule_work(&instance->work_init);
> @@ -2232,6 +2243,49 @@ megasas_read_fw_status_reg_fusion(struct megasas_register_set __iomem *regs)
>  }
>  
>  /**
> + * megasas_alloc_host_crash_buffer -	Host buffers for Crash dump collection from Firmware
> + * @instance:				Controller's soft instance
> + * return:			        Number of allocated host crash buffers
> + */
> +static void
> +megasas_alloc_host_crash_buffer(struct megasas_instance *instance)
> +{
> +	unsigned int i;
> +
> +	instance->crash_buf_pages = get_order(CRASH_DMA_BUF_SIZE);
> +	for (i = 0; i < MAX_CRASH_DUMP_SIZE; i++) {
> +		instance->crash_buf[i] = (void	*)__get_free_pages(GFP_KERNEL,
> +				instance->crash_buf_pages);
> +		if (!instance->crash_buf[i]) {
> +			dev_info(&instance->pdev->dev, "Firmware crash dump "
> +				"memory allocation failed at index %d\n", i);
> +			break;
> +		}
> +	}
> +	instance->drv_buf_alloc = i;
> +}
> +
> +/**
> + * megasas_free_host_crash_buffer -	Host buffers for Crash dump collection from Firmware
> + * @instance:				Controller's soft instance
> + */
> +void
> +megasas_free_host_crash_buffer(struct megasas_instance *instance)
> +{
> +	unsigned int i
> +;
> +	for (i = 0; i < MAX_CRASH_DUMP_SIZE; i++) {

I'm not sure, but shouldn't this be changed to the following?
for (i = 0; i < instance->drv_buf_alloc; i++) {
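
The idea behind that suggestion can be sketched in user space as an
allocate/free pair that shares one count. This is only an illustration:
struct crash_bufs and both function names are hypothetical, and malloc()
and free() stand in for __get_free_pages() and free_pages().

```c
#include <stdlib.h>

#define MAX_CRASH_DUMP_SIZE 512   /* array size used by the patch */

/* Hypothetical stand-in for the crash buffer fields of megasas_instance. */
struct crash_bufs {
	void *buf[MAX_CRASH_DUMP_SIZE];
	unsigned int nr_alloc;   /* plays the role of drv_buf_alloc */
};

/*
 * Allocate up to "want" chunks of chunk_size bytes, stopping at the first
 * failure. Only buf[0..nr_alloc-1] are valid afterwards.
 */
unsigned int crash_bufs_alloc(struct crash_bufs *cb, unsigned int want,
			      size_t chunk_size)
{
	unsigned int i;

	if (want > MAX_CRASH_DUMP_SIZE)
		want = MAX_CRASH_DUMP_SIZE;
	for (i = 0; i < want; i++) {
		cb->buf[i] = malloc(chunk_size);
		if (!cb->buf[i])
			break;
	}
	cb->nr_alloc = i;
	return i;
}

/* Free exactly the chunks that were allocated and reset the count. */
void crash_bufs_free(struct crash_bufs *cb)
{
	unsigned int i;

	for (i = 0; i < cb->nr_alloc; i++) {
		free(cb->buf[i]);
		cb->buf[i] = NULL;
	}
	cb->nr_alloc = 0;
}
```

Bounding the free loop by the count also keeps it away from stale pointers
left over from an earlier, larger allocation, since the quoted free routine
does not clear the array entries after freeing them.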
 

> +		if (instance->crash_buf[i])
> +			free_pages((ulong)instance->crash_buf[i],
> +					instance->crash_buf_pages);
> +	}
> +	instance->drv_buf_index = 0;
> +	instance->drv_buf_alloc = 0;
> +	instance->fw_crash_state = UNAVAILABLE;
> +	instance->fw_crash_buffer_size = 0;
> +}
> +
> +/**
>   * megasas_adp_reset_fusion -	For controller reset
>   * @regs:				MFI register set
>   */
> @@ -2374,6 +2428,7 @@ int megasas_reset_fusion(struct Scsi_Host *shost, int iotimeout)
>  	struct megasas_cmd *cmd_mfi;
>  	union MEGASAS_REQUEST_DESCRIPTOR_UNION *req_desc;
>  	u32 host_diag, abs_state, status_reg, reset_adapter;
> +	u32 io_timeout_in_crash_mode = 0;
>  
>  	instance = (struct megasas_instance *)shost->hostdata;
>  	fusion = instance->ctrl_context;
> @@ -2387,6 +2442,42 @@ int megasas_reset_fusion(struct Scsi_Host *shost, int iotimeout)
>  		mutex_unlock(&instance->reset_mutex);
>  		return FAILED;
>  	}
> +	status_reg = instance->instancet->read_fw_status_reg(instance->reg_set);
> +	abs_state = status_reg & MFI_STATE_MASK;
> +
> +	/* IO timeout detected, forcibly put FW in FAULT state */
> +	if (abs_state != MFI_STATE_FAULT && instance->crash_dump_buf &&
> +		instance->crash_dump_app_support && iotimeout) {
> +		dev_info(&instance->pdev->dev, "IO timeout is detected, "
> +			"forcibly FAULT Firmware\n");
> +		instance->adprecovery = MEGASAS_ADPRESET_SM_INFAULT;
> +		status_reg = readl(&instance->reg_set->doorbell);
> +		writel(status_reg | MFI_STATE_FORCE_OCR,
> +			&instance->reg_set->doorbell);
> +		readl(&instance->reg_set->doorbell);
> +		mutex_unlock(&instance->reset_mutex);
> +		do {
> +			ssleep(3);
> +			io_timeout_in_crash_mode++;
> +			dev_dbg(&instance->pdev->dev, "waiting for [%d] "
> +				"seconds for crash dump collection and OCR "
> +				"to be done\n", (io_timeout_in_crash_mode * 3));
> +		} while ((instance->adprecovery != MEGASAS_HBA_OPERATIONAL) &&
> +			(io_timeout_in_crash_mode < 80));
> +
> +		if (instance->adprecovery == MEGASAS_HBA_OPERATIONAL) {
> +			dev_info(&instance->pdev->dev, "OCR done for IO "
> +				"timeout case\n");
> +			retval = SUCCESS;
> +		} else {
> +			dev_info(&instance->pdev->dev, "Controller is not "
> +				"operational after 240 seconds wait for IO "
> +				"timeout case in FW crash dump mode\n do "
> +				"OCR/kill adapter\n");
> +			retval = megasas_reset_fusion(shost, 0);
> +		}
> +		return retval;
> +	}
>  
>  	if (instance->requestorId && !instance->skip_heartbeat_timer_del)
>  		del_timer_sync(&instance->sriov_heartbeat_timer);
> @@ -2653,6 +2744,15 @@ int megasas_reset_fusion(struct Scsi_Host *shost, int iotimeout)
>  			printk(KERN_WARNING "megaraid_sas: Reset "
>  			       "successful for scsi%d.\n",
>  				instance->host->host_no);
> +
> +			if (instance->crash_dump_drv_support) {
> +				if (instance->crash_dump_app_support)
> +					megasas_set_crash_dump_params(instance,
> +						MR_CRASH_BUF_TURN_ON);
> +				else
> +					megasas_set_crash_dump_params(instance,
> +						MR_CRASH_BUF_TURN_OFF);
> +			}
>  			retval = SUCCESS;
>  			goto out;
>  		}
> @@ -2681,6 +2781,74 @@ out:
>  	return retval;
>  }
>  
> +/* Fusion Crash dump collection work queue */
> +void  megasas_fusion_crash_dump_wq(struct work_struct *work)
> +{
> +	struct megasas_instance *instance =
> +		container_of(work, struct megasas_instance, crash_init);
> +	u32 status_reg;
> +	u8 partial_copy = 0;
> +
> +
> +	status_reg = instance->instancet->read_fw_status_reg(instance->reg_set);
> +
> +	/*
> +	 * Allocate host crash buffers to copy data from 1 MB DMA crash buffer
> +	 * to host crash buffers
> +	 */
> +	if (instance->drv_buf_index == 0) {
> +		/* Buffer is already allocated for old Crash dump.
> +		 * Do OCR and do not wait for crash dump collection
> +		 */
> +		if (instance->drv_buf_alloc) {
> +			dev_info(&instance->pdev->dev, "earlier crash dump is "
> +				"not yet copied by application, ignoring this "
> +				"crash dump and initiating OCR\n");
> +			status_reg |= MFI_STATE_CRASH_DUMP_DONE;
> +			writel(status_reg,
> +				&instance->reg_set->outbound_scratch_pad);
> +			readl(&instance->reg_set->outbound_scratch_pad);
> +			return;
> +		}
> +		megasas_alloc_host_crash_buffer(instance);
> +		dev_info(&instance->pdev->dev, "Number of host crash buffers "
> +			"allocated: %d\n", instance->drv_buf_alloc);
> +	}
> +
> +	/*
> +	 * Driver has allocated max buffers, which can be allocated
> +	 * and FW has more crash dump data, then driver will
> +	 * ignore the data.
> +	 */
> +	if (instance->drv_buf_index >= (instance->drv_buf_alloc)) {
> +		dev_info(&instance->pdev->dev, "Driver is done copying "
> +			"the buffer: %d\n", instance->drv_buf_alloc);
> +		status_reg |= MFI_STATE_CRASH_DUMP_DONE;
> +		partial_copy = 1;
> +	} else {
> +		memcpy(instance->crash_buf[instance->drv_buf_index],
> +			instance->crash_dump_buf, CRASH_DMA_BUF_SIZE);
> +		instance->drv_buf_index++;
> +		status_reg &= ~MFI_STATE_DMADONE;
> +	}
> +
> +	if (status_reg & MFI_STATE_CRASH_DUMP_DONE) {
> +		dev_info(&instance->pdev->dev, "Crash Dump is available,number "
> +			"of copied buffers: %d\n", instance->drv_buf_index);
> +		instance->fw_crash_buffer_size =  instance->drv_buf_index;
> +		instance->fw_crash_state = AVAILABLE;
> +		instance->drv_buf_index = 0;
> +		writel(status_reg, &instance->reg_set->outbound_scratch_pad);
> +		readl(&instance->reg_set->outbound_scratch_pad);
> +		if (!partial_copy)
> +			megasas_reset_fusion(instance->host, 0);
> +	} else {
> +		writel(status_reg, &instance->reg_set->outbound_scratch_pad);
> +		readl(&instance->reg_set->outbound_scratch_pad);
> +	}
> +}
> +
> +
>  /* Fusion OCR work queue */
>  void megasas_fusion_ocr_wq(struct work_struct *work)
>  {

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 14+ messages in thread

* RE: [PATCH 04/11] megaraid_sas : Firmware crash dump feature support
  2014-09-09 15:54 ` Tomas Henzl
@ 2014-09-09 16:18   ` Elliott, Robert (Server Storage)
  2014-09-10 10:08     ` Tomas Henzl
  2014-09-10 12:12   ` Sumit Saxena
  1 sibling, 1 reply; 14+ messages in thread
From: Elliott, Robert (Server Storage) @ 2014-09-09 16:18 UTC (permalink / raw)
  To: Tomas Henzl, Sumit.Saxena, linux-scsi
  Cc: martin.petersen, hch, jbottomley, kashyap.desai, aradford



> -----Original Message-----
> From: linux-scsi-owner@vger.kernel.org [mailto:linux-scsi-
> owner@vger.kernel.org] On Behalf Of Tomas Henzl
> Sent: Tuesday, 09 September, 2014 10:54 AM
> Subject: Re: [PATCH 04/11] megaraid_sas : Firmware crash dump feature
> support
> 
> On 09/06/2014 03:25 PM, Sumit.Saxena@avagotech.com wrote:
> > This feature will provide similar interface as kernel crash dump
> feature.
> > When megaraid firmware encounter any crash, driver will collect the
> firmware raw image and
> > dump it into pre-configured location.
> >
...
> With several controllers in a system this may take a lot memory,
> could you also
> in case when a kdump kernel is running lower it, by not using this
> feature?

What is the correct way for a driver to determine that it is
running in a kdump kernel?  The reset_devices global variable?


---
Rob Elliott    HP Server Storage




^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 04/11] megaraid_sas : Firmware crash dump feature support
  2014-09-09 16:18   ` Elliott, Robert (Server Storage)
@ 2014-09-10 10:08     ` Tomas Henzl
  0 siblings, 0 replies; 14+ messages in thread
From: Tomas Henzl @ 2014-09-10 10:08 UTC (permalink / raw)
  To: Elliott, Robert (Server Storage), Sumit.Saxena, linux-scsi
  Cc: martin.petersen, hch, jbottomley, kashyap.desai, aradford

On 09/09/2014 06:18 PM, Elliott, Robert (Server Storage) wrote:
>
>> -----Original Message-----
>> From: linux-scsi-owner@vger.kernel.org [mailto:linux-scsi-
>> owner@vger.kernel.org] On Behalf Of Tomas Henzl
>> Sent: Tuesday, 09 September, 2014 10:54 AM
>> Subject: Re: [PATCH 04/11] megaraid_sas : Firmware crash dump feature
>> support
>>
>> On 09/06/2014 03:25 PM, Sumit.Saxena@avagotech.com wrote:
>>> This feature will provide similar interface as kernel crash dump
>> feature.
>>> When megaraid firmware encounter any crash, driver will collect the
>> firmware raw image and
>>> dump it into pre-configured location.
>>>
> ...
>> With several controllers in a system this may take a lot memory,
>> could you also
>> in case when a kdump kernel is running lower it, by not using this
>> feature?
> What is the correct way for a driver to determine that it is
> running in a kdump kernel?  The reset_devices global variable?

Yes, reset_devices is used for this purpose.
I think I've seen some drivers even trying to reduce their normal
memory use in that case, because top performance is not needed.

>
>
> ---
> Rob Elliott    HP Server Storage
>
>
>


^ permalink raw reply	[flat|nested] 14+ messages in thread

* RE: [PATCH 04/11] megaraid_sas : Firmware crash dump feature support
  2014-09-09 15:54 ` Tomas Henzl
  2014-09-09 16:18   ` Elliott, Robert (Server Storage)
@ 2014-09-10 12:12   ` Sumit Saxena
  2014-09-10 15:06     ` Elliott, Robert (Server Storage)
  1 sibling, 1 reply; 14+ messages in thread
From: Sumit Saxena @ 2014-09-10 12:12 UTC (permalink / raw)
  To: Tomas Henzl, linux-scsi
  Cc: martin.petersen, hch, jbottomley, Kashyap Desai, aradford

>-----Original Message-----
>From: Tomas Henzl [mailto:thenzl@redhat.com]
>Sent: Tuesday, September 09, 2014 9:24 PM
>To: Sumit.Saxena@avagotech.com; linux-scsi@vger.kernel.org
>Cc: martin.petersen@oracle.com; hch@infradead.org;
>jbottomley@parallels.com; kashyap.desai@avagotech.com;
>aradford@gmail.com
>Subject: Re: [PATCH 04/11] megaraid_sas : Firmware crash dump feature
>support
>
>On 09/06/2014 03:25 PM, Sumit.Saxena@avagotech.com wrote:
>> This feature will provide similar interface as kernel crash dump feature.
>> When megaraid firmware encounter any crash, driver will collect the
>> firmware raw image and dump it into pre-configured location.
>>
>> Driver will allocate two different segment of memory.
>> #1 Non-DMA able large buffer (will be allocated on demand) to capture
>actual FW crash dump.
>> #2 DMA buffer (persistence allocation) just to do a arbitrator job.
>>
>> Firmware will keep writing Crash dump data in chucks of DMA buffer
>> size into #2, which will be copy back by driver to the host memory as
>described in #1.
>>
>> Driver-Firmware interface:
>> ==================
>> A.) Host driver can allocate maximum 512MB Host memory to store crash
>dump data.
>>
>> This memory will be internal to the host and will not be exposed to the
>Firmware.
>> Driver may not be able to allocate 512 MB. In that case, driver will
>> do possible memory (available at run time) allocation to store crash dump
>data.
>>
>> Let’s call this buffer as Host Crash Buffer.
>>
>> Host Crash buffer will not be contigious as a whole, but it will have
>> multiple
>chunk of contigious memory.
>> This will be internal to driver and firmware/application are unaware of
>> it.
>> Partial allocation of Host Crash buffer may have valid information to
>> debug depending upon what was collected in that buffer and depending on
>nature of failure.
>>
>> Complete Crash dump is the best case, but we do want to capture partial
>buffer just to grab something rather than nothing.
>> Host Crash buffer will be allocated only when FW Crash dump data is
>> available, and will be deallocated once application copy Host Crash
>> buffer to
>the file.
>> Host Crash buffer size can be anything between 1MB to 512MB. (It will
>> be multiple of 1MBs)
>>
>>
>> B.) Irrespective of underlying Firmware capability of crash dump
>> support, driver will allocate DMA buffer at start of the day for each MR
>controllers.
>> Let’s call this buffer as “DMA Crash Buffer”.
>>
>> For this feature, size of DMA crash buffer will be 1MB.
>> (We will not gain much even if DMA buffer size is increased.)
>>
>> C.) Driver will now read Controller Info sending existing dcmd
>“MR_DCMD_CTRL_GET_INFO”.
>> Driver should extract the information from ctrl info provided by
>> firmware and figure out if firmware support crash dump feature or not.
>>
>> Driver will enable crash dump feature only if “Firmware support Crash
>> dump” + “Driver was able to create DMA Crash Buffer”.
>>
>> If either one from above is not set, Crash dump feature should be disable
>> in
>driver.
>> Firmware will enable crash dump feature only if “Driver Send DCMD-
>MR_DCMD_SET_CRASH_BUF_PARA with MR_CRASH_BUF_TURN_ON”
>>
>> Helper application/script should use sysfs parameter fw_crash_xxx to
>> actually copy data from host memory to the filesystem.
>
>Is it possible to store the crash dump data on filesystem on the same
>controller after the controller has crashed or do you expect that a use of
>another disk/controller?

The location where the crash dump is collected is configurable via the
config file of the application collecting the dump.
By default, crash data is collected on the disk from which the OS booted.
If Online Controller Reset (OCR) is enabled on the crashed controller and
the OS disk is behind that same controller, crash data will be collected on
the same controller's filesystem.
There is one case that needs different handling: if the crashed controller
has OCR disabled and the OS is behind the crashed controller, the location
of crash data collection needs to be configured to a disk on some other
storage.
The reason is that the crashed controller becomes functional again only
after a system reboot, and we need to collect the data before the reboot,
so the crash data must go to a disk that is not behind the crashed
controller.
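
The placement rules above can be summarized as a small decision function.
This is only a sketch of the reasoning in this mail, not code from the
driver or from the collecting application; the enum and the function name
are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical names, used only to illustrate the rules described above. */
enum dump_target {
	DUMP_TO_OS_DISK,	/* default: the disk the OS booted from */
	DUMP_TO_OTHER_DISK,	/* must be configured to a disk elsewhere */
};

/*
 * ocr_enabled:       OCR is enabled on the crashed controller
 * os_behind_crashed: the OS disk sits behind the crashed controller
 */
static enum dump_target choose_dump_target(bool ocr_enabled,
					   bool os_behind_crashed)
{
	/*
	 * Without OCR the crashed controller only comes back after a
	 * reboot, but the dump has to be saved before that reboot.
	 */
	if (os_behind_crashed && !ocr_enabled)
		return DUMP_TO_OTHER_DISK;

	return DUMP_TO_OS_DISK;
}
```

Only the combination "OS behind the crashed controller" plus "OCR disabled"
forces a non-default location; every other combination can keep the default.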

>With several controllers in a system this may take a lot memory, could you
>also
>in case when a kdump kernel is running lower it, by not using this feature?
>
Agreed, we will disable this feature for the kdump kernel by checking the
"reset_devices" global variable.
The check is required in only one place; everywhere else in the code the
feature will then remain disabled.
Code snippet for the same:

        instance->crash_dump_drv_support = (!reset_devices &&
                                crashdump_enable &&
                                instance->crash_dump_fw_support &&
                                instance->crash_dump_buf);
        if (instance->crash_dump_drv_support) {
                printk(KERN_INFO "megaraid_sas: FW Crash dump is supported\n");
                megasas_set_crash_dump_params(instance,
                                MR_CRASH_BUF_TURN_OFF);
        } else {
..
        }
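
To make the enable condition easier to reason about, here is a user-space
model of it. It is a sketch only: the parameter names mirror the snippet
above, but the function itself is hypothetical and not part of the driver.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical user-space model of the enable condition.
 * The feature is on only when all four inputs allow it.
 */
static bool crash_dump_drv_supported(bool reset_devices,
				     bool crashdump_enable,
				     bool fw_support,
				     const void *dma_buf)
{
	return !reset_devices &&	/* not running as a kdump kernel */
	       crashdump_enable &&	/* feature switched on in the driver */
	       fw_support &&		/* firmware reports crash dump support */
	       dma_buf != NULL;		/* DMA crash buffer was allocated */
}
```

Because every other code path tests crash_dump_drv_support, forcing it to
false here is enough to keep the feature off in the kdump kernel.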

>see other comments inside
>
>>
>> Signed-off-by: Sumit Saxena <sumit.saxena@avagotech.com>
>> Signed-off-by: Kashyap Desai <kashyap.desai@avagotech.com>
>> ---
>>  drivers/scsi/megaraid/megaraid_sas.h        |  58 +++++-
>>  drivers/scsi/megaraid/megaraid_sas_base.c   | 292
>+++++++++++++++++++++++++++-
>>  drivers/scsi/megaraid/megaraid_sas_fusion.c | 172 +++++++++++++++-
>>  3 files changed, 517 insertions(+), 5 deletions(-)
>>
>> diff --git a/drivers/scsi/megaraid/megaraid_sas.h
>> b/drivers/scsi/megaraid/megaraid_sas.h
>> index bc7adcf..e0f03e2 100644
>> --- a/drivers/scsi/megaraid/megaraid_sas.h
>> +++ b/drivers/scsi/megaraid/megaraid_sas.h
>> @@ -105,6 +105,9 @@
>>  #define MFI_STATE_READY				0xB0000000
>>  #define MFI_STATE_OPERATIONAL			0xC0000000
>>  #define MFI_STATE_FAULT				0xF0000000
>> +#define MFI_STATE_FORCE_OCR			0x00000080
>> +#define MFI_STATE_DMADONE			0x00000008
>> +#define MFI_STATE_CRASH_DUMP_DONE		0x00000004
>>  #define MFI_RESET_REQUIRED			0x00000001
>>  #define MFI_RESET_ADAPTER			0x00000002
>>  #define MEGAMFI_FRAME_SIZE			64
>> @@ -191,6 +194,9 @@
>>  #define MR_DCMD_CLUSTER_RESET_LD		0x08010200
>>  #define MR_DCMD_PD_LIST_QUERY                   0x02010100
>>
>> +#define MR_DCMD_CTRL_SET_CRASH_DUMP_PARAMS	0x01190100
>> +#define MR_DRIVER_SET_APP_CRASHDUMP_MODE	(0xF0010000 |
>0x0600)
>> +
>>  /*
>>   * Global functions
>>   */
>> @@ -264,6 +270,25 @@ enum MFI_STAT {
>>  };
>>
>>  /*
>> + * Crash dump related defines
>> + */
>> +#define MAX_CRASH_DUMP_SIZE 512
>> +#define CRASH_DMA_BUF_SIZE  (1024 * 1024)
>> +
>> +enum MR_FW_CRASH_DUMP_STATE {
>> +	UNAVAILABLE = 0,
>> +	AVAILABLE = 1,
>> +	COPYING = 2,
>> +	COPIED = 3,
>> +	COPY_ERROR = 4,
>> +};
>> +
>> +enum _MR_CRASH_BUF_STATUS {
>> +	MR_CRASH_BUF_TURN_OFF = 0,
>> +	MR_CRASH_BUF_TURN_ON = 1,
>> +};
>> +
>> +/*
>>   * Number of mailbox bytes in DCMD message frame
>>   */
>>  #define MFI_MBOX_SIZE				12
>> @@ -933,7 +958,19 @@ struct megasas_ctrl_info {
>>  		u8  reserved;                   /*0x7E7*/
>>  	} iov;
>>
>> -	u8          pad[0x800-0x7E8];           /*0x7E8 pad to 2k */
>> +	struct {
>> +#if defined(__BIG_ENDIAN_BITFIELD)
>> +		u32     reserved:25;
>> +		u32     supportCrashDump:1;
>> +		u32     reserved1:6;
>> +#else
>> +		u32     reserved1:6;
>> +		u32     supportCrashDump:1;
>> +		u32     reserved:25;
>> +#endif
>> +	} adapterOperations3;
>> +
>> +	u8          pad[0x800-0x7EC];
>>  } __packed;
>>
>>  /*
>> @@ -1559,6 +1596,20 @@ struct megasas_instance {
>>  	u32 *reply_queue;
>>  	dma_addr_t reply_queue_h;
>>
>> +	u32 *crash_dump_buf;
>> +	dma_addr_t crash_dump_h;
>> +	void *crash_buf[MAX_CRASH_DUMP_SIZE];
>> +	u32 crash_buf_pages;
>> +	unsigned int    fw_crash_buffer_size;
>> +	unsigned int    fw_crash_state;
>> +	unsigned int    fw_crash_buffer_offset;
>> +	u32 drv_buf_index;
>> +	u32 drv_buf_alloc;
>> +	u32 crash_dump_fw_support;
>> +	u32 crash_dump_drv_support;
>> +	u32 crash_dump_app_support;
>> +	spinlock_t crashdump_lock;
>> +
>>  	struct megasas_register_set __iomem *reg_set;
>>  	u32 *reply_post_host_index_addr[MR_MAX_MSIX_REG_ARRAY];
>>  	struct megasas_pd_list          pd_list[MEGASAS_MAX_PD];
>> @@ -1606,6 +1657,7 @@ struct megasas_instance {
>>  	struct megasas_instance_template *instancet;
>>  	struct tasklet_struct isr_tasklet;
>>  	struct work_struct work_init;
>> +	struct work_struct crash_init;
>>
>>  	u8 flag;
>>  	u8 unload;
>> @@ -1830,4 +1882,8 @@ u16 MR_LdSpanArrayGet(u32 ld, u32 span, struct
>> MR_FW_RAID_MAP_ALL *map);
>>  u16 MR_PdDevHandleGet(u32 pd, struct MR_FW_RAID_MAP_ALL *map);
>>  u16 MR_GetLDTgtId(u32 ld, struct MR_FW_RAID_MAP_ALL *map);
>>
>> +int megasas_set_crash_dump_params(struct megasas_instance *instance,
>> +		u8 crash_buf_state);
>> +void megasas_free_host_crash_buffer(struct megasas_instance
>> +*instance); void megasas_fusion_crash_dump_wq(struct work_struct
>> +*work);
>>  #endif				/*LSI_MEGARAID_SAS_H */
>> diff --git a/drivers/scsi/megaraid/megaraid_sas_base.c
>> b/drivers/scsi/megaraid/megaraid_sas_base.c
>> index a894f13..5b58e39d 100644
>> --- a/drivers/scsi/megaraid/megaraid_sas_base.c
>> +++ b/drivers/scsi/megaraid/megaraid_sas_base.c
>> @@ -2560,6 +2560,152 @@ static int megasas_change_queue_depth(struct
>scsi_device *sdev,
>>  	return queue_depth;
>>  }
>>
>> +static ssize_t
>> +megasas_fw_crash_buffer_store(struct device *cdev,
>> +	struct device_attribute *attr, const char *buf, size_t count) {
>> +	struct Scsi_Host *shost = class_to_shost(cdev);
>> +	struct megasas_instance *instance =
>> +		(struct megasas_instance *) shost->hostdata;
>> +	int val = 0;
>> +	unsigned long flags;
>> +
>> +	if (kstrtoint(buf, 0, &val) != 0)
>> +		return -EINVAL;
>> +
>> +	spin_lock_irqsave(&instance->crashdump_lock, flags);
>> +	instance->fw_crash_buffer_offset = val;
>> +	spin_unlock_irqrestore(&instance->crashdump_lock, flags);
>
>The access to fw_crash_buffer_offset is not protected in the function below
>why is the spinlock needed here, could it be removed?

While doing some code optimization for this patch, I introduced the local
variable "buff_offset" to hold "instance->fw_crash_buffer_offset" and
missed putting that read under the spinlock. The in-house driver does not
have this issue, since "buff_offset" is not present in that code.
I will move "buff_offset = instance->fw_crash_buffer_offset;" inside the
spinlock.
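
The intended pattern is to take the snapshot of the shared offset while
holding the lock and then work on the local copy. A user-space analogue,
with a pthread mutex standing in for crashdump_lock, could look like the
sketch below; the struct and function names are hypothetical.

```c
#include <pthread.h>
#include <stdint.h>

/* Hypothetical stand-in for the relevant part of struct megasas_instance. */
struct crash_ctx {
	pthread_mutex_t lock;		/* plays the role of crashdump_lock */
	uint32_t fw_crash_buffer_offset;
};

/* Writer side: corresponds to the sysfs store handler. */
static void crash_ctx_set_offset(struct crash_ctx *ctx, uint32_t val)
{
	pthread_mutex_lock(&ctx->lock);
	ctx->fw_crash_buffer_offset = val;
	pthread_mutex_unlock(&ctx->lock);
}

/* Reader side: snapshot under the lock, then use only the local copy. */
static uint32_t crash_ctx_get_offset(struct crash_ctx *ctx)
{
	uint32_t buff_offset;

	pthread_mutex_lock(&ctx->lock);
	buff_offset = ctx->fw_crash_buffer_offset;
	pthread_mutex_unlock(&ctx->lock);

	return buff_offset;
}
```

In the driver the read would simply move below spin_lock_irqsave() in the
show handler; the sketch only illustrates why both sides need the lock.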

>
>> +	return strlen(buf);
>> +}
>> +
>> +static ssize_t
>> +megasas_fw_crash_buffer_show(struct device *cdev,
>> +	struct device_attribute *attr, char *buf) {
>> +	struct Scsi_Host *shost = class_to_shost(cdev);
>> +	struct megasas_instance *instance =
>> +		(struct megasas_instance *) shost->hostdata;
>> +	u32 size;
>> +	unsigned long buff_addr;
>> +	unsigned long dmachunk = CRASH_DMA_BUF_SIZE;
>> +	unsigned long src_addr;
>> +	unsigned long flags;
>> +	u32 buff_offset;
>> +
>> +	buff_offset = instance->fw_crash_buffer_offset;
>> +	spin_lock_irqsave(&instance->crashdump_lock, flags);
>> +	if (!instance->crash_dump_buf &&
>> +		!((instance->fw_crash_state == AVAILABLE) ||
>> +		(instance->fw_crash_state == COPYING))) {
>> +		dev_err(&instance->pdev->dev,
>> +			"Firmware crash dump is not available\n");
>> +		spin_unlock_irqrestore(&instance->crashdump_lock, flags);
>> +		return -EINVAL;
>> +	}
>> +
>> +	buff_addr = (unsigned long) buf;
>> +
>> +	if (buff_offset >
>> +		(instance->fw_crash_buffer_size * dmachunk)) {
>> +		dev_err(&instance->pdev->dev,
>> +			"Firmware crash dump offset is out of range\n");
>> +		spin_unlock_irqrestore(&instance->crashdump_lock, flags);
>> +		return 0;
>> +	}
>> +
>> +	size = (instance->fw_crash_buffer_size * dmachunk) - buff_offset;
>> +	size = (size >= PAGE_SIZE) ? (PAGE_SIZE - 1) : size;
>> +
>> +	src_addr = (unsigned long)instance->crash_buf[buff_offset /
>dmachunk] +
>> +		(buff_offset % dmachunk);
>> +	memcpy(buf, (void *)src_addr,  size);
>> +	spin_unlock_irqrestore(&instance->crashdump_lock, flags);
>> +
>> +	return size;
>> +}
>> +
>> +static ssize_t
>> +megasas_fw_crash_buffer_size_show(struct device *cdev,
>> +	struct device_attribute *attr, char *buf) {
>> +	struct Scsi_Host *shost = class_to_shost(cdev);
>> +	struct megasas_instance *instance =
>> +		(struct megasas_instance *) shost->hostdata;
>> +
>> +	return snprintf(buf, PAGE_SIZE, "%ld\n", (unsigned long)
>> +		((instance->fw_crash_buffer_size) * 1024 *
>1024)/PAGE_SIZE); }
>> +
>> +static ssize_t
>> +megasas_fw_crash_state_store(struct device *cdev,
>> +	struct device_attribute *attr, const char *buf, size_t count) {
>> +	struct Scsi_Host *shost = class_to_shost(cdev);
>> +	struct megasas_instance *instance =
>> +		(struct megasas_instance *) shost->hostdata;
>> +	int val = 0;
>> +	unsigned long flags;
>> +
>> +	if (kstrtoint(buf, 0, &val) != 0)
>> +		return -EINVAL;
>> +
>> +	if ((val <= AVAILABLE || val > COPY_ERROR)) {
>> +		dev_err(&instance->pdev->dev, "application updates invalid "
>> +			"firmware crash state\n");
>> +		return -EINVAL;
>> +	}
>> +
>> +	instance->fw_crash_state = val;
>> +
>> +	if ((val == COPIED) || (val == COPY_ERROR)) {
>> +		spin_lock_irqsave(&instance->crashdump_lock, flags);
>> +		megasas_free_host_crash_buffer(instance);
>> +		spin_unlock_irqrestore(&instance->crashdump_lock, flags);
>> +		if (val == COPY_ERROR)
>> +			dev_info(&instance->pdev->dev, "application failed
>to "
>> +				"copy Firmware crash dump\n");
>> +		else
>> +			dev_info(&instance->pdev->dev, "Firmware crash
>dump "
>> +				"copied successfully\n");
>> +	}
>> +	return strlen(buf);
>> +}
>> +
>> +static ssize_t
>> +megasas_fw_crash_state_show(struct device *cdev,
>> +	struct device_attribute *attr, char *buf) {
>> +	struct Scsi_Host *shost = class_to_shost(cdev);
>> +	struct megasas_instance *instance =
>> +		(struct megasas_instance *) shost->hostdata;
>> +	return snprintf(buf, PAGE_SIZE, "%d\n", instance->fw_crash_state); }
>> +
>> +static ssize_t
>> +megasas_page_size_show(struct device *cdev,
>> +	struct device_attribute *attr, char *buf) {
>> +	return snprintf(buf, PAGE_SIZE, "%ld\n", (unsigned long)PAGE_SIZE -
>> +1); }
>> +
>> +static DEVICE_ATTR(fw_crash_buffer, S_IRUGO | S_IWUSR,
>> +	megasas_fw_crash_buffer_show, megasas_fw_crash_buffer_store);
>static
>> +DEVICE_ATTR(fw_crash_buffer_size, S_IRUGO,
>> +	megasas_fw_crash_buffer_size_show, NULL); static
>> +DEVICE_ATTR(fw_crash_state, S_IRUGO | S_IWUSR,
>> +	megasas_fw_crash_state_show, megasas_fw_crash_state_store);
>static
>> +DEVICE_ATTR(page_size, S_IRUGO,
>> +	megasas_page_size_show, NULL);
>> +
>> +struct device_attribute *megaraid_host_attrs[] = {
>> +	&dev_attr_fw_crash_buffer_size,
>> +	&dev_attr_fw_crash_buffer,
>> +	&dev_attr_fw_crash_state,
>> +	&dev_attr_page_size,
>> +	NULL,
>> +};
>> +
>>  /*
>>   * Scsi host template for megaraid_sas driver
>>   */
>> @@ -2575,6 +2721,7 @@ static struct scsi_host_template
>megasas_template = {
>>  	.eh_bus_reset_handler = megasas_reset_bus_host,
>>  	.eh_host_reset_handler = megasas_reset_bus_host,
>>  	.eh_timed_out = megasas_reset_timer,
>> +	.shost_attrs = megaraid_host_attrs,
>>  	.bios_param = megasas_bios_param,
>>  	.use_clustering = ENABLE_CLUSTERING,
>>  	.change_queue_depth = megasas_change_queue_depth, @@ -
>3887,6
>> +4034,59 @@ megasas_get_ctrl_info(struct megasas_instance *instance,
>>  	return ret;
>>  }
>>
>> +/*
>> + * megasas_set_crash_dump_params -	Sends address of crash dump
>DMA buffer
>> + *					to firmware
>> + *
>> + * @instance:				Adapter soft state
>> + * @crash_buf_state		-	tell FW to turn ON/OFF crash
>dump feature
>> +					MR_CRASH_BUF_TURN_OFF = 0
>> +					MR_CRASH_BUF_TURN_ON = 1
>> + * @return 0 on success non-zero on failure.
>> + * Issues an internal command (DCMD) to set parameters for crash dump
>feature.
>> + * Driver will send address of crash dump DMA buffer and set mbox to
>> +tell FW
>> + * that driver supports crash dump feature. This DCMD will be sent
>> +only if
>> + * crash dump feature is supported by the FW.
>> + *
>> + */
>> +int megasas_set_crash_dump_params(struct megasas_instance *instance,
>> +	u8 crash_buf_state)
>> +{
>> +	int ret = 0;
>> +	struct megasas_cmd *cmd;
>> +	struct megasas_dcmd_frame *dcmd;
>> +
>> +	cmd = megasas_get_cmd(instance);
>> +
>> +	if (!cmd) {
>> +		dev_err(&instance->pdev->dev, "Failed to get a free cmd\n");
>> +		return -ENOMEM;
>> +	}
>> +
>> +
>> +	dcmd = &cmd->frame->dcmd;
>> +
>> +	memset(dcmd->mbox.b, 0, MFI_MBOX_SIZE);
>> +	dcmd->mbox.b[0] = crash_buf_state;
>> +	dcmd->cmd = MFI_CMD_DCMD;
>> +	dcmd->cmd_status = 0xFF;
>> +	dcmd->sge_count = 1;
>> +	dcmd->flags = cpu_to_le16(MFI_FRAME_DIR_NONE);
>> +	dcmd->timeout = 0;
>> +	dcmd->pad_0 = 0;
>> +	dcmd->data_xfer_len = cpu_to_le32(CRASH_DMA_BUF_SIZE);
>> +	dcmd->opcode =
>cpu_to_le32(MR_DCMD_CTRL_SET_CRASH_DUMP_PARAMS);
>> +	dcmd->sgl.sge32[0].phys_addr = cpu_to_le32(instance-
>>crash_dump_h);
>> +	dcmd->sgl.sge32[0].length = cpu_to_le32(CRASH_DMA_BUF_SIZE);
>> +
>> +	if (!megasas_issue_polled(instance, cmd))
>> +		ret = 0;
>> +	else
>> +		ret = -1;
>> +	megasas_return_cmd(instance, cmd);
>> +	return ret;
>> +}
>> +
>>  /**
>>   * megasas_issue_init_mfi -	Initializes the FW
>>   * @instance:		Adapter soft state
>> @@ -4272,6 +4472,27 @@ static int megasas_init_fw(struct
>megasas_instance *instance)
>>  			printk(KERN_WARNING "megaraid_sas: I am VF "
>>  			       "requestorId %d\n", instance->requestorId);
>>  		}
>> +
>> +		le32_to_cpus((u32 *)&ctrl_info->adapterOperations3);
>> +		instance->crash_dump_fw_support =
>> +			ctrl_info->adapterOperations3.supportCrashDump;
>> +		instance->crash_dump_drv_support =
>> +			(instance->crash_dump_fw_support &&
>> +			instance->crash_dump_buf);
>> +		if (instance->crash_dump_drv_support) {
>> +			dev_info(&instance->pdev->dev, "Firmware Crash
>dump "
>> +				"feature is supported\n");
>> +			megasas_set_crash_dump_params(instance,
>> +				MR_CRASH_BUF_TURN_OFF);
>> +
>> +		} else {
>> +			if (instance->crash_dump_buf)
>> +				pci_free_consistent(instance->pdev,
>> +					CRASH_DMA_BUF_SIZE,
>> +					instance->crash_dump_buf,
>> +					instance->crash_dump_h);
>> +			instance->crash_dump_buf = NULL;
>> +		}
>>  	}
>>  	instance->max_sectors_per_req = instance->max_num_sge *
>>  						PAGE_SIZE / 512;
>> @@ -4791,6 +5012,21 @@ static int megasas_probe_one(struct pci_dev
>*pdev,
>>  		break;
>>  	}
>>
>> +	/* Crash dump feature related initialisation*/
>> +	instance->drv_buf_index = 0;
>> +	instance->drv_buf_alloc = 0;
>> +	instance->crash_dump_fw_support = 0;
>> +	instance->crash_dump_app_support = 0;
>> +	instance->fw_crash_state = UNAVAILABLE;
>> +	spin_lock_init(&instance->crashdump_lock);
>> +
>> +	instance->crash_dump_buf = pci_alloc_consistent(pdev,
>> +						CRASH_DMA_BUF_SIZE,
>> +						&instance->crash_dump_h);
>> +	if (!instance->crash_dump_buf)
>> +		dev_err(&instance->pdev->dev, "Can't allocate Firmware "
>> +			"crash dump DMA buffer\n");
>> +
>>  	megasas_poll_wait_aen = 0;
>>  	instance->flag_ieee = 0;
>>  	instance->ev = NULL;
>> @@ -4852,9 +5088,10 @@ static int megasas_probe_one(struct pci_dev
>*pdev,
>>  	if ((instance->pdev->device == PCI_DEVICE_ID_LSI_FUSION) ||
>>  	    (instance->pdev->device == PCI_DEVICE_ID_LSI_PLASMA) ||
>>  	    (instance->pdev->device == PCI_DEVICE_ID_LSI_INVADER) ||
>> -	    (instance->pdev->device == PCI_DEVICE_ID_LSI_FURY))
>> +	    (instance->pdev->device == PCI_DEVICE_ID_LSI_FURY)) {
>>  		INIT_WORK(&instance->work_init, megasas_fusion_ocr_wq);
>> -	else
>> +		INIT_WORK(&instance->crash_init,
>megasas_fusion_crash_dump_wq);
>> +	} else
>>  		INIT_WORK(&instance->work_init,
>process_fw_state_change_wq);
>>
>>  	/*
>> @@ -5342,6 +5579,8 @@ static void megasas_detach_one(struct pci_dev
>*pdev)
>>  	if (instance->requestorId && !instance->skip_heartbeat_timer_del)
>>  		del_timer_sync(&instance->sriov_heartbeat_timer);
>>
>> +	if (instance->fw_crash_state != UNAVAILABLE)
>> +		megasas_free_host_crash_buffer(instance);
>>  	scsi_remove_host(instance->host);
>>  	megasas_flush_cache(instance);
>>  	megasas_shutdown_controller(instance,
>MR_DCMD_CTRL_SHUTDOWN); @@
>> -5432,6 +5671,10 @@ static void megasas_detach_one(struct pci_dev
>*pdev)
>>  				    instance->hb_host_mem,
>>  				    instance->hb_host_mem_h);
>>
>> +	if (instance->crash_dump_buf)
>> +		pci_free_consistent(pdev, CRASH_DMA_BUF_SIZE,
>> +			    instance->crash_dump_buf, instance-
>>crash_dump_h);
>> +
>>  	scsi_host_put(host);
>>
>>  	pci_disable_device(pdev);
>> @@ -5523,6 +5766,45 @@ static unsigned int megasas_mgmt_poll(struct file
>*file, poll_table *wait)
>>  	return mask;
>>  }
>>
>> +/*
>> + * megasas_set_crash_dump_params_ioctl:
>> + *		Send CRASH_DUMP_MODE DCMD to all controllers
>> + * @cmd:	MFI command frame
>> + */
>> +
>> +static int megasas_set_crash_dump_params_ioctl(
>> +	struct megasas_cmd *cmd)
>> +{
>> +	struct megasas_instance *local_instance;
>> +	int i, error = 0;
>> +	int crash_support;
>> +
>> +	crash_support = cmd->frame->dcmd.mbox.w[0];
>> +
>> +	for (i = 0; i < megasas_mgmt_info.max_index; i++) {
>> +		local_instance = megasas_mgmt_info.instance[i];
>> +		if (local_instance && local_instance-
>>crash_dump_drv_support) {
>> +			if ((local_instance->adprecovery ==
>> +				MEGASAS_HBA_OPERATIONAL) &&
>> +
>	!megasas_set_crash_dump_params(local_instance,
>> +					crash_support)) {
>> +				local_instance->crash_dump_app_support =
>> +					crash_support;
>> +				dev_info(&local_instance->pdev->dev,
>> +					"Application firmware crash "
>> +					"dump mode set success\n");
>> +				error = 0;
>> +			} else {
>> +				dev_info(&local_instance->pdev->dev,
>> +					"Application firmware crash "
>> +					"dump mode set failed\n");
>> +				error = -1;
>> +			}
>> +		}
>> +	}
>> +	return error;
>> +}
>> +
>>  /**
>>   * megasas_mgmt_fw_ioctl -	Issues management ioctls to FW
>>   * @instance:			Adapter soft state
>> @@ -5569,6 +5851,12 @@ megasas_mgmt_fw_ioctl(struct
>megasas_instance *instance,
>>  					       MFI_FRAME_SGL64 |
>>  					       MFI_FRAME_SENSE64));
>>
>> +	if (cmd->frame->dcmd.opcode ==
>MR_DRIVER_SET_APP_CRASHDUMP_MODE) {
>> +		error = megasas_set_crash_dump_params_ioctl(cmd);
>> +		megasas_return_cmd(instance, cmd);
>> +		return error;
>> +	}
>> +
>>  	/*
>>  	 * The management interface between applications and the fw uses
>>  	 * MFI frames. E.g, RAID configuration changes, LD property changes
>> diff --git a/drivers/scsi/megaraid/megaraid_sas_fusion.c
>> b/drivers/scsi/megaraid/megaraid_sas_fusion.c
>> index f30297d..aaba2a7 100644
>> --- a/drivers/scsi/megaraid/megaraid_sas_fusion.c
>> +++ b/drivers/scsi/megaraid/megaraid_sas_fusion.c
>> @@ -91,6 +91,8 @@ void megasas_start_timer(struct megasas_instance
>> *instance,  extern struct megasas_mgmt_info megasas_mgmt_info;  extern
>> int resetwaittime;
>>
>> +
>> +
>>  /**
>>   * megasas_enable_intr_fusion -	Enables interrupts
>>   * @regs:			MFI register set
>> @@ -2057,7 +2059,7 @@ irqreturn_t megasas_isr_fusion(int irq, void
>> *devp)  {
>>  	struct megasas_irq_context *irq_context = devp;
>>  	struct megasas_instance *instance = irq_context->instance;
>> -	u32 mfiStatus, fw_state;
>> +	u32 mfiStatus, fw_state, dma_state;
>>
>>  	if (instance->mask_interrupts)
>>  		return IRQ_NONE;
>> @@ -2079,7 +2081,16 @@ irqreturn_t megasas_isr_fusion(int irq, void
>*devp)
>>  		/* If we didn't complete any commands, check for FW fault */
>>  		fw_state = instance->instancet->read_fw_status_reg(
>>  			instance->reg_set) & MFI_STATE_MASK;
>> -		if (fw_state == MFI_STATE_FAULT) {
>> +		dma_state = instance->instancet->read_fw_status_reg
>> +			(instance->reg_set) & MFI_STATE_DMADONE;
>> +		if (instance->crash_dump_drv_support &&
>> +			instance->crash_dump_app_support) {
>> +			/* Start collecting crash, if DMA bit is done */
>> +			if ((fw_state == MFI_STATE_FAULT) && dma_state)
>> +				schedule_work(&instance->crash_init);
>> +			else if (fw_state == MFI_STATE_FAULT)
>> +				schedule_work(&instance->work_init);
>> +		} else if (fw_state == MFI_STATE_FAULT) {
>>  			printk(KERN_WARNING "megaraid_sas:
>Iop2SysDoorbellInt"
>>  			       "for scsi%d\n", instance->host->host_no);
>>  			schedule_work(&instance->work_init);
>> @@ -2232,6 +2243,49 @@ megasas_read_fw_status_reg_fusion(struct
>> megasas_register_set __iomem *regs)  }
>>
>>  /**
>> + * megasas_alloc_host_crash_buffer -	Host buffers for Crash dump
>collection from Firmware
>> + * @instance:				Controller's soft instance
>> + * return:			        Number of allocated host crash buffers
>> + */
>> +static void
>> +megasas_alloc_host_crash_buffer(struct megasas_instance *instance) {
>> +	unsigned int i;
>> +
>> +	instance->crash_buf_pages = get_order(CRASH_DMA_BUF_SIZE);
>> +	for (i = 0; i < MAX_CRASH_DUMP_SIZE; i++) {
>> +		instance->crash_buf[i] = (void
>	*)__get_free_pages(GFP_KERNEL,
>> +				instance->crash_buf_pages);
>> +		if (!instance->crash_buf[i]) {
>> +			dev_info(&instance->pdev->dev, "Firmware crash
>dump "
>> +				"memory allocation failed at index %d\n", i);
>> +			break;
>> +		}
>> +	}
>> +	instance->drv_buf_alloc = i;
>> +}
>> +
>> +/**
>> + * megasas_free_host_crash_buffer -	Host buffers for Crash dump
>collection from Firmware
>> + * @instance:				Controller's soft instance
>> + */
>> +void
>> +megasas_free_host_crash_buffer(struct megasas_instance *instance) {
>> +	unsigned int i
>> +;
>> +	for (i = 0; i < MAX_CRASH_DUMP_SIZE; i++) {
>
>I'm not sure, but shouldn't this be changed to ?
>for (i = 0; i < instance->drv_buf_alloc; i++) {
>
>
Agreed. It is not critical from a functional point of view, but I will
make this change and resubmit the patch.
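
For reference, the bookkeeping being discussed can be modeled in user space
as below. This is a sketch with hypothetical names (malloc/free stand in
for __get_free_pages/free_pages, and the array is much smaller than
MAX_CRASH_DUMP_SIZE); it is not the resubmitted patch.

```c
#include <stdlib.h>

#define MAX_CRASH_BUFS 8	/* small stand-in for MAX_CRASH_DUMP_SIZE */

/* Hypothetical user-space model of the host crash buffer bookkeeping. */
struct host_crash_bufs {
	void *buf[MAX_CRASH_BUFS];
	unsigned int alloc;	/* number of valid entries, like drv_buf_alloc */
};

/* Allocate up to 'want' chunks and stop at the first failure. */
static void host_crash_bufs_alloc(struct host_crash_bufs *h,
				  unsigned int want, size_t chunk)
{
	unsigned int i;

	if (want > MAX_CRASH_BUFS)
		want = MAX_CRASH_BUFS;
	for (i = 0; i < want; i++) {
		h->buf[i] = malloc(chunk);
		if (!h->buf[i])
			break;
	}
	h->alloc = i;
}

/* Walk only the entries that were actually allocated. */
static unsigned int host_crash_bufs_free(struct host_crash_bufs *h)
{
	unsigned int i, freed = 0;

	for (i = 0; i < h->alloc; i++) {
		free(h->buf[i]);
		h->buf[i] = NULL;
		freed++;
	}
	h->alloc = 0;
	return freed;
}
```

Bounding the loop by the allocation count avoids touching slots that were
never filled, and clearing each pointer keeps a second call harmless.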

>> +		if (instance->crash_buf[i])
>> +			free_pages((ulong)instance->crash_buf[i],
>> +					instance->crash_buf_pages);
>> +	}
>> +	instance->drv_buf_index = 0;
>> +	instance->drv_buf_alloc = 0;
>> +	instance->fw_crash_state = UNAVAILABLE;
>> +	instance->fw_crash_buffer_size = 0;
>> +}
>> +
>> +/**
>>   * megasas_adp_reset_fusion -	For controller reset
>>   * @regs:				MFI register set
>>   */
>> @@ -2374,6 +2428,7 @@ int megasas_reset_fusion(struct Scsi_Host *shost,
>int iotimeout)
>>  	struct megasas_cmd *cmd_mfi;
>>  	union MEGASAS_REQUEST_DESCRIPTOR_UNION *req_desc;
>>  	u32 host_diag, abs_state, status_reg, reset_adapter;
>> +	u32 io_timeout_in_crash_mode = 0;
>>
>>  	instance = (struct megasas_instance *)shost->hostdata;
>>  	fusion = instance->ctrl_context;
>> @@ -2387,6 +2442,42 @@ int megasas_reset_fusion(struct Scsi_Host
>*shost, int iotimeout)
>>  		mutex_unlock(&instance->reset_mutex);
>>  		return FAILED;
>>  	}
>> +	status_reg = instance->instancet->read_fw_status_reg(instance-
>>reg_set);
>> +	abs_state = status_reg & MFI_STATE_MASK;
>> +
>> +	/* IO timeout detected, forcibly put FW in FAULT state */
>> +	if (abs_state != MFI_STATE_FAULT && instance->crash_dump_buf
>&&
>> +		instance->crash_dump_app_support && iotimeout) {
>> +		dev_info(&instance->pdev->dev, "IO timeout is detected, "
>> +			"forcibly FAULT Firmware\n");
>> +		instance->adprecovery = MEGASAS_ADPRESET_SM_INFAULT;
>> +		status_reg = readl(&instance->reg_set->doorbell);
>> +		writel(status_reg | MFI_STATE_FORCE_OCR,
>> +			&instance->reg_set->doorbell);
>> +		readl(&instance->reg_set->doorbell);
>> +		mutex_unlock(&instance->reset_mutex);
>> +		do {
>> +			ssleep(3);
>> +			io_timeout_in_crash_mode++;
>> +			dev_dbg(&instance->pdev->dev, "waiting for [%d] "
>> +				"seconds for crash dump collection and OCR "
>> +				"to be done\n", (io_timeout_in_crash_mode
>* 3));
>> +		} while ((instance->adprecovery !=
>MEGASAS_HBA_OPERATIONAL) &&
>> +			(io_timeout_in_crash_mode < 80));
>> +
>> +		if (instance->adprecovery == MEGASAS_HBA_OPERATIONAL)
>{
>> +			dev_info(&instance->pdev->dev, "OCR done for IO "
>> +				"timeout case\n");
>> +			retval = SUCCESS;
>> +		} else {
>> +			dev_info(&instance->pdev->dev, "Controller is not "
>> +				"operational after 240 seconds wait for IO "
>> +				"timeout case in FW crash dump mode\n do "
>> +				"OCR/kill adapter\n");
>> +			retval = megasas_reset_fusion(shost, 0);
>> +		}
>> +		return retval;
>> +	}
>>
>>  	if (instance->requestorId && !instance->skip_heartbeat_timer_del)
>>  		del_timer_sync(&instance->sriov_heartbeat_timer);
>> @@ -2653,6 +2744,15 @@ int megasas_reset_fusion(struct Scsi_Host
>*shost, int iotimeout)
>>  			printk(KERN_WARNING "megaraid_sas: Reset "
>>  			       "successful for scsi%d.\n",
>>  				instance->host->host_no);
>> +
>> +			if (instance->crash_dump_drv_support) {
>> +				if (instance->crash_dump_app_support)
>> +
>	megasas_set_crash_dump_params(instance,
>> +						MR_CRASH_BUF_TURN_ON);
>> +				else
>> +
>	megasas_set_crash_dump_params(instance,
>> +						MR_CRASH_BUF_TURN_OFF);
>> +			}
>>  			retval = SUCCESS;
>>  			goto out;
>>  		}
>> @@ -2681,6 +2781,74 @@ out:
>>  	return retval;
>>  }
>>
>> +/* Fusion Crash dump collection work queue */ void
>> +megasas_fusion_crash_dump_wq(struct work_struct *work) {
>> +	struct megasas_instance *instance =
>> +		container_of(work, struct megasas_instance, crash_init);
>> +	u32 status_reg;
>> +	u8 partial_copy = 0;
>> +
>> +
>> +	status_reg =
>> +instance->instancet->read_fw_status_reg(instance->reg_set);
>> +
>> +	/*
>> +	 * Allocate host crash buffers to copy data from 1 MB DMA crash
>buffer
>> +	 * to host crash buffers
>> +	 */
>> +	if (instance->drv_buf_index == 0) {
>> +		/* Buffer is already allocated for old Crash dump.
>> +		 * Do OCR and do not wait for crash dump collection
>> +		 */
>> +		if (instance->drv_buf_alloc) {
>> +			dev_info(&instance->pdev->dev, "earlier crash dump
>is "
>> +				"not yet copied by application, ignoring this "
>> +				"crash dump and initiating OCR\n");
>> +			status_reg |= MFI_STATE_CRASH_DUMP_DONE;
>> +			writel(status_reg,
>> +				&instance->reg_set-
>>outbound_scratch_pad);
>> +			readl(&instance->reg_set->outbound_scratch_pad);
>> +			return;
>> +		}
>> +		megasas_alloc_host_crash_buffer(instance);
>> +		dev_info(&instance->pdev->dev, "Number of host crash
>buffers "
>> +			"allocated: %d\n", instance->drv_buf_alloc);
>> +	}
>> +
>> +	/*
>> +	 * Driver has allocated max buffers, which can be allocated
>> +	 * and FW has more crash dump data, then driver will
>> +	 * ignore the data.
>> +	 */
>> +	if (instance->drv_buf_index >= (instance->drv_buf_alloc)) {
>> +		dev_info(&instance->pdev->dev, "Driver is done copying "
>> +			"the buffer: %d\n", instance->drv_buf_alloc);
>> +		status_reg |= MFI_STATE_CRASH_DUMP_DONE;
>> +		partial_copy = 1;
>> +	} else {
>> +		memcpy(instance->crash_buf[instance->drv_buf_index],
>> +			instance->crash_dump_buf, CRASH_DMA_BUF_SIZE);
>> +		instance->drv_buf_index++;
>> +		status_reg &= ~MFI_STATE_DMADONE;
>> +	}
>> +
>> +	if (status_reg & MFI_STATE_CRASH_DUMP_DONE) {
>> +		dev_info(&instance->pdev->dev, "Crash Dump is
>available,number "
>> +			"of copied buffers: %d\n", instance->drv_buf_index);
>> +		instance->fw_crash_buffer_size =  instance->drv_buf_index;
>> +		instance->fw_crash_state = AVAILABLE;
>> +		instance->drv_buf_index = 0;
>> +		writel(status_reg, &instance->reg_set-
>>outbound_scratch_pad);
>> +		readl(&instance->reg_set->outbound_scratch_pad);
>> +		if (!partial_copy)
>> +			megasas_reset_fusion(instance->host, 0);
>> +	} else {
>> +		writel(status_reg, &instance->reg_set-
>>outbound_scratch_pad);
>> +		readl(&instance->reg_set->outbound_scratch_pad);
>> +	}
>> +}
>> +
>> +
>>  /* Fusion OCR work queue */
>>  void megasas_fusion_ocr_wq(struct work_struct *work)  {
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 14+ messages in thread

* RE: [PATCH 04/11] megaraid_sas : Firmware crash dump feature support
  2014-09-10 12:12   ` Sumit Saxena
@ 2014-09-10 15:06     ` Elliott, Robert (Server Storage)
  2014-09-10 15:28       ` Tomas Henzl
  0 siblings, 1 reply; 14+ messages in thread
From: Elliott, Robert (Server Storage) @ 2014-09-10 15:06 UTC (permalink / raw)
  To: Sumit Saxena, Tomas Henzl, linux-scsi
  Cc: martin.petersen, hch, jbottomley, Kashyap Desai, aradford,
	Michal Schmidt, amirv, vgoyal,
	Jens Axboe <axboe@kernel.dk> (axboe@kernel.dk),
	scameron

> From: linux-scsi-owner@vger.kernel.org [mailto:linux-scsi-
> owner@vger.kernel.org] On Behalf Of Sumit Saxena
> 
> >From: Tomas Henzl [mailto:thenzl@redhat.com]
> >
> >With several controllers in a system this may take a lot memory,
> > could you also in case when a kdump kernel is running lower it,
> > by not using this feature?
> >
> Agreed, we will disable this feature for kdump kernel by adding
> "reset_devices" global varaiable.
> That check is required for only one place, throughout the code, this
> feature will remain disabled.  Code snippet for the same-
> 
>         instance->crash_dump_drv_support = (!reset_devices) &&
> crashdump_enable &&
>                                 instance->crash_dump_fw_support &&
>                                 instance->crash_dump_buf);
>         if(instance->crash_dump_drv_support) {
>                 printk(KERN_INFO "megaraid_sas: FW Crash dump is
> supported\n");
>                 megasas_set_crash_dump_params(instance,
> MR_CRASH_BUF_TURN_OFF);
> 
>         } else {
> ..
>         }

Network drivers have been running into similar problems.

There's a new patch from Amir coming through net-next to make 
is_kdump_kernel() (in crash_dump.h) accessible to modules.
That may be a better signal than reset_devices that the
driver should use minimal resources.

http://comments.gmane.org/gmane.linux.network/324737

I'm not sure about the logistics of a SCSI patch depending
on a net-next patch.

---
Rob Elliott    HP Server Storage





^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 04/11] megaraid_sas : Firmware crash dump feature support
  2014-09-10 15:06     ` Elliott, Robert (Server Storage)
@ 2014-09-10 15:28       ` Tomas Henzl
  2014-09-10 15:46         ` Vivek Goyal
  0 siblings, 1 reply; 14+ messages in thread
From: Tomas Henzl @ 2014-09-10 15:28 UTC (permalink / raw)
  To: Elliott, Robert (Server Storage), Sumit Saxena, linux-scsi
  Cc: martin.petersen, hch, jbottomley, Kashyap Desai, aradford,
	Michal Schmidt, amirv, vgoyal,
	Jens Axboe <axboe@kernel.dk> (axboe@kernel.dk),
	scameron

On 09/10/2014 05:06 PM, Elliott, Robert (Server Storage) wrote:
>> From: linux-scsi-owner@vger.kernel.org [mailto:linux-scsi-
>> owner@vger.kernel.org] On Behalf Of Sumit Saxena
>>
>>> From: Tomas Henzl [mailto:thenzl@redhat.com]
>>>
>>> With several controllers in a system this may take a lot memory,
>>> could you also in case when a kdump kernel is running lower it,
>>> by not using this feature?
>>>
>> Agreed, we will disable this feature for kdump kernel by adding
>> "reset_devices" global varaiable.
>> That check is required for only one place, throughout the code, this
>> feature will remain disabled.  Code snippet for the same-
>>
>>         instance->crash_dump_drv_support = (!reset_devices) &&
>> crashdump_enable &&
>>                                 instance->crash_dump_fw_support &&
>>                                 instance->crash_dump_buf);
>>         if(instance->crash_dump_drv_support) {
>>                 printk(KERN_INFO "megaraid_sas: FW Crash dump is
>> supported\n");
>>                 megasas_set_crash_dump_params(instance,
>> MR_CRASH_BUF_TURN_OFF);
>>
>>         } else {
>> ..
>>         }
> Network drivers have been running into similar problems.
>
> There's a new patch from Amir coming through net-next to make 
> is_kdump_kernel() (in crash_dump.h) accessible to modules.
> That may be a better signal than reset_devices that the
> driver should use minimal resources.
>
> http://comments.gmane.org/gmane.linux.network/324737
>
> I'm not sure about the logistics of a SCSI patch depending
> on a net-next patch.

It is probably better to start with reset_devices and switch to
is_kdump_kernel() later.
This is not meant as a discussion of reset_devices versus
is_kdump_kernel(), and it does look good to have the two distinguished,
but is reset_devices actually used anywhere other than in the kdump kernel?
 

>
> ---
> Rob Elliott    HP Server Storage
>
>
>
>

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: [PATCH 04/11] megaraid_sas : Firmware crash dump feature support
  2014-09-10 15:28       ` Tomas Henzl
@ 2014-09-10 15:46         ` Vivek Goyal
  2014-09-11  9:02           ` Kashyap Desai
  0 siblings, 1 reply; 14+ messages in thread
From: Vivek Goyal @ 2014-09-10 15:46 UTC (permalink / raw)
  To: Tomas Henzl
  Cc: Elliott, Robert (Server Storage),
	Sumit Saxena, linux-scsi, martin.petersen, hch, jbottomley,
	Kashyap Desai, aradford, Michal Schmidt, amirv,
	Jens Axboe <axboe@kernel.dk> (axboe@kernel.dk),
	scameron

On Wed, Sep 10, 2014 at 05:28:40PM +0200, Tomas Henzl wrote:
> On 09/10/2014 05:06 PM, Elliott, Robert (Server Storage) wrote:
> >> From: linux-scsi-owner@vger.kernel.org [mailto:linux-scsi-
> >> owner@vger.kernel.org] On Behalf Of Sumit Saxena
> >>
> >>> From: Tomas Henzl [mailto:thenzl@redhat.com]
> >>>
> >>> With several controllers in a system this may take a lot memory,
> >>> could you also in case when a kdump kernel is running lower it,
> >>> by not using this feature?
> >>>
> >> Agreed, we will disable this feature for kdump kernel by adding
> >> "reset_devices" global variable.
> >> That check is required for only one place, throughout the code, this
> >> feature will remain disabled.  Code snippet for the same-
> >>
> >>         instance->crash_dump_drv_support = (!reset_devices) &&
> >>                                 crashdump_enable &&
> >>                                 instance->crash_dump_fw_support &&
> >>                                 instance->crash_dump_buf;
> >>         if (instance->crash_dump_drv_support) {
> >>                 printk(KERN_INFO "megaraid_sas: FW Crash dump is supported\n");
> >>                 megasas_set_crash_dump_params(instance,
> >>                                 MR_CRASH_BUF_TURN_OFF);
> >>
> >>         } else {
> >> ..
> >>         }
> > Network drivers have been running into similar problems.
> >
> > There's a new patch from Amir coming through net-next to make 
> > is_kdump_kernel() (in crash_dump.h) accessible to modules.
> > That may be a better signal than reset_devices that the
> > driver should use minimal resources.
> >
> > http://comments.gmane.org/gmane.linux.network/324737
> >
> > I'm not sure about the logistics of a SCSI patch depending
> > on a net-next patch.
> 
> Probably better to start with reset_devices and switch to is_kdump_kernel()
> later. 
> This is not a discussion about reset_devices versus is_kdump_kernel, but
> while it looks good to have it distinguished - is the reset_devices actually
> used anywhere else than in kdump kernel?

I think using reset_devices to lower the memory footprint of the driver
is plain wrong. It only tells the driver to reset the device, because the
BIOS might not have done it right or we skipped the BIOS completely.

Using is_kdump_kernel() is not perfect either, but it is at least better than
reset_devices.

Thanks
Vivek
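
A minimal sketch of the distinction raised above, written as a userspace model
rather than driver code. It assumes that reset_devices can also be set on a
normal boot, since it is a kernel command-line parameter, so a gate keyed on it
turns the feature off in cases that are not kdump. The struct and function
names are hypothetical stand-ins; only the four conditions come from the quoted
snippet.

```c
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-ins for kernel state; both fields are assumptions. */
struct boot_state {
	bool reset_devices;	/* models the kernel's reset_devices flag */
	bool kdump_kernel;	/* models what is_kdump_kernel() would return */
};

/* Hypothetical subset of the driver fields used in the quoted snippet. */
struct fw_dump_caps {
	bool crashdump_enable;		/* module parameter */
	bool crash_dump_fw_support;	/* firmware advertises the feature */
	const void *crash_dump_buf;	/* persistent DMA buffer, NULL if absent */
};

/* Gate as quoted in the thread: keyed on reset_devices. */
static bool dump_supported_reset_devices(const struct boot_state *b,
					 const struct fw_dump_caps *c)
{
	return !b->reset_devices && c->crashdump_enable &&
	       c->crash_dump_fw_support && c->crash_dump_buf != NULL;
}

/* Alternative discussed in the thread: keyed on running as a kdump kernel. */
static bool dump_supported_kdump(const struct boot_state *b,
				 const struct fw_dump_caps *c)
{
	return !b->kdump_kernel && c->crashdump_enable &&
	       c->crash_dump_fw_support && c->crash_dump_buf != NULL;
}
```

On a regular boot with reset_devices on the command line, the first gate
reports the feature as unsupported while the second still allows it. In a
kdump kernel both gates turn the feature off.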


* RE: [PATCH 04/11] megaraid_sas : Firmware crash dump feature support
  2014-09-10 15:46         ` Vivek Goyal
@ 2014-09-11  9:02           ` Kashyap Desai
  2014-09-11 11:20             ` Tomas Henzl
  0 siblings, 1 reply; 14+ messages in thread
From: Kashyap Desai @ 2014-09-11  9:02 UTC (permalink / raw)
  To: Vivek Goyal, Tomas Henzl
  Cc: Elliott, Robert (Server Storage),
	Sumit Saxena, linux-scsi, martin.petersen, hch, jbottomley,
	aradford, Michal Schmidt, amirv, axboe, scameron

> -----Original Message-----
> From: Vivek Goyal [mailto:vgoyal@redhat.com]
> Sent: Wednesday, September 10, 2014 9:17 PM
> To: Tomas Henzl
> Cc: Elliott, Robert (Server Storage); Sumit Saxena; linux-scsi@vger.kernel.org;
> martin.petersen@oracle.com; hch@infradead.org;
> jbottomley@parallels.com; Kashyap Desai; aradford@gmail.com; Michal
> Schmidt; amirv@mellanox.com; Jens Axboe <axboe@kernel.dk>
> (axboe@kernel.dk); scameron@beardog.cce.hp.com
> Subject: Re: [PATCH 04/11] megaraid_sas : Firmware crash dump feature
> support
>
> On Wed, Sep 10, 2014 at 05:28:40PM +0200, Tomas Henzl wrote:
> > On 09/10/2014 05:06 PM, Elliott, Robert (Server Storage) wrote:
> > >> From: linux-scsi-owner@vger.kernel.org [mailto:linux-scsi-
> > >> owner@vger.kernel.org] On Behalf Of Sumit Saxena
> > >>
> > >>> From: Tomas Henzl [mailto:thenzl@redhat.com]
> > >>>
> > >>> With several controllers in a system this may take a lot memory,
> > >>> could you also in case when a kdump kernel is running lower it, by
> > >>> not using this feature?
> > >>>
> > >> Agreed, we will disable this feature for kdump kernel by adding
> > >> "reset_devices" global variable.
> > >> That check is required for only one place, throughout the code,
> > >> this feature will remain disabled.  Code snippet for the same-
> > >>
> > >>         instance->crash_dump_drv_support = (!reset_devices) &&
> > >>                                 crashdump_enable &&
> > >>                                 instance->crash_dump_fw_support &&
> > >>                                 instance->crash_dump_buf;
> > >>         if (instance->crash_dump_drv_support) {
> > >>                 printk(KERN_INFO "megaraid_sas: FW Crash dump is supported\n");
> > >>                 megasas_set_crash_dump_params(instance,
> > >>                                 MR_CRASH_BUF_TURN_OFF);
> > >>
> > >>         } else {
> > >> ..
> > >>         }
> > > Network drivers have been running into similar problems.
> > >
> > > There's a new patch from Amir coming through net-next to make
> > > is_kdump_kernel() (in crash_dump.h) accessible to modules.
> > > That may be a better signal than reset_devices that the driver
> > > should use minimal resources.
> > >
> > > http://comments.gmane.org/gmane.linux.network/324737
> > >
> > > I'm not sure about the logistics of a SCSI patch depending on a
> > > net-next patch.
> >
> > Probably better to start with reset_devices and switch to
> > is_kdump_kernel() later.
> > This is not a discussion about reset_devices versus is_kdump_kernel,
> > but while it looks good to have it distinguished - is the
> > reset_devices actually used anywhere else than in kdump kernel?
>
> I think usage of reset_devices for lowering memory footprint of driver is
> plain wrong. It tells driver to only reset the device as BIOS might not have
> done it right or we skipped BIOS completely.
>
> Using is_kdump_kernel() is also not perfect either but at least better than
> reset_devices.

We will use is_kdump_kernel() to keep this feature disabled in the kdump kernel.

The MegaRaid driver will not allocate host memory (which is used to collect
the complete FW crash image) unless the associated FW has crashed or an IO
has timed out.
The driver allocates only if required. Also, it stops allocating as soon as
a memory allocation fails and operates on the memory obtained so far.

To be on the safer side, we can disable this feature in kdump mode.

~ Kashyap
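
The on-demand allocation described above can be sketched roughly as follows.
This is a userspace model under stated assumptions, not the driver's code: the
1 MB chunk size, the 512-chunk cap, and every name here are hypothetical, and
malloc() stands in for whatever allocator the driver uses. limited_alloc() is
only a helper for simulating memory pressure.

```c
#include <stddef.h>
#include <stdlib.h>

#define CHUNK_SIZE (1024u * 1024u)	/* assumed chunk size: 1 MB */
#define MAX_CHUNKS 512u			/* assumed cap: 512 x 1 MB = 512 MB */

/* Non-contiguous host crash buffer built from separately allocated chunks. */
struct host_crash_buf {
	void *chunk[MAX_CHUNKS];
	size_t nr_chunks;		/* chunks actually obtained */
};

typedef void *(*alloc_fn)(size_t size);

/*
 * Allocate chunks until the cap is reached or an allocation fails.
 * On failure, stop immediately and keep what was obtained, so a
 * partial dump can still be captured. Returns the chunk count.
 */
static size_t host_crash_buf_alloc(struct host_crash_buf *b, alloc_fn alloc)
{
	b->nr_chunks = 0;
	while (b->nr_chunks < MAX_CHUNKS) {
		void *p = alloc(CHUNK_SIZE);

		if (!p)
			break;
		b->chunk[b->nr_chunks++] = p;
	}
	return b->nr_chunks;
}

/* Release every chunk, e.g. after the dump has been copied out. */
static void host_crash_buf_free(struct host_crash_buf *b)
{
	while (b->nr_chunks > 0)
		free(b->chunk[--b->nr_chunks]);
}

/* Helper allocator: succeeds alloc_budget times, then returns NULL. */
static size_t alloc_budget;

static void *limited_alloc(size_t size)
{
	if (alloc_budget == 0)
		return NULL;
	alloc_budget--;
	return malloc(size);
}
```

With alloc_budget set to 3, host_crash_buf_alloc() returns 3 and the buffer
holds 3 MB of the dump instead of failing outright.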
>
> Thanks
> Vivek


* Re: [PATCH 04/11] megaraid_sas : Firmware crash dump feature support
  2014-09-11  9:02           ` Kashyap Desai
@ 2014-09-11 11:20             ` Tomas Henzl
  2014-09-11 16:39               ` Kashyap Desai
  0 siblings, 1 reply; 14+ messages in thread
From: Tomas Henzl @ 2014-09-11 11:20 UTC (permalink / raw)
  To: Kashyap Desai, Vivek Goyal
  Cc: Elliott, Robert (Server Storage),
	Sumit Saxena, linux-scsi, martin.petersen, hch, jbottomley,
	aradford, Michal Schmidt, amirv, axboe, scameron

On 09/11/2014 11:02 AM, Kashyap Desai wrote:
>> -----Original Message-----
>> From: Vivek Goyal [mailto:vgoyal@redhat.com]
>> Sent: Wednesday, September 10, 2014 9:17 PM
>> To: Tomas Henzl
>> Cc: Elliott, Robert (Server Storage); Sumit Saxena; linux-scsi@vger.kernel.org;
>> martin.petersen@oracle.com; hch@infradead.org;
>> jbottomley@parallels.com; Kashyap Desai; aradford@gmail.com; Michal
>> Schmidt; amirv@mellanox.com; Jens Axboe <axboe@kernel.dk>
>> (axboe@kernel.dk); scameron@beardog.cce.hp.com
>> Subject: Re: [PATCH 04/11] megaraid_sas : Firmware crash dump feature
>> support
>>
>> On Wed, Sep 10, 2014 at 05:28:40PM +0200, Tomas Henzl wrote:
>>> On 09/10/2014 05:06 PM, Elliott, Robert (Server Storage) wrote:
>>>>> From: linux-scsi-owner@vger.kernel.org [mailto:linux-scsi-
>>>>> owner@vger.kernel.org] On Behalf Of Sumit Saxena
>>>>>
>>>>>> From: Tomas Henzl [mailto:thenzl@redhat.com]
>>>>>>
>>>>>> With several controllers in a system this may take a lot memory,
>>>>>> could you also in case when a kdump kernel is running lower it, by
>>>>>> not using this feature?
>>>>>>
>>>>> Agreed, we will disable this feature for kdump kernel by adding
>>>>> "reset_devices" global variable.
>>>>> That check is required for only one place, throughout the code,
>>>>> this feature will remain disabled.  Code snippet for the same-
>>>>>
>>>>>         instance->crash_dump_drv_support = (!reset_devices) &&
>>>>>                                 crashdump_enable &&
>>>>>                                 instance->crash_dump_fw_support &&
>>>>>                                 instance->crash_dump_buf;
>>>>>         if (instance->crash_dump_drv_support) {
>>>>>                 printk(KERN_INFO "megaraid_sas: FW Crash dump is supported\n");
>>>>>                 megasas_set_crash_dump_params(instance,
>>>>>                                 MR_CRASH_BUF_TURN_OFF);
>>>>>
>>>>>         } else {
>>>>> ..
>>>>>         }
>>>> Network drivers have been running into similar problems.
>>>>
>>>> There's a new patch from Amir coming through net-next to make
>>>> is_kdump_kernel() (in crash_dump.h) accessible to modules.
>>>> That may be a better signal than reset_devices that the driver
>>>> should use minimal resources.
>>>>
>>>> http://comments.gmane.org/gmane.linux.network/324737
>>>>
>>>> I'm not sure about the logistics of a SCSI patch depending on a
>>>> net-next patch.
>>> Probably better to start with reset_devices and switch to
>>> is_kdump_kernel() later.
>>> This is not a discussion about reset_devices versus is_kdump_kernel,
>>> but while it looks good to have it distinguished - is the
>>> reset_devices actually used anywhere else than in kdump kernel?
>> I think usage of reset_devices for lowering memory footprint of driver is
>> plain wrong. It tells driver to only reset the device as BIOS might not have
>> done it right or we skipped BIOS completely.
>>
>> Using is_kdump_kernel() is also not perfect either but at least better than
>> reset_devices.
> We will use is_kdump_kernel() not to enable this feature in Kdump kernel.

OK, just keep in mind that is_kdump_kernel() is not yet in mainline.

>
> MegaRaid Driver will not allocate Host memory (which is used to collect
> complete FW crash image) until and unless associated FW is crashed/IO
> timeout.

Is this new for the driver? A previous reply stated that the driver will
allocate 1MB for each MR controller at the 'start of the day'.
If I misunderstood it somehow, then nothing is needed.

> Driver do allocation only if required. Also, it will break the function
> immediately after memory allocation is failed and operate on available
> memory.
>
> To be on safer side, we can disable this feature in Kdump mode.
>
> ~ Kashyap
>> Thanks
>> Vivek



* Re: [PATCH 04/11] megaraid_sas : Firmware crash dump feature support
  2014-09-11 11:20             ` Tomas Henzl
@ 2014-09-11 16:39               ` Kashyap Desai
  2014-09-11 17:02                 ` Vivek Goyal
  2014-09-11 18:58                 ` Tomas Henzl
  0 siblings, 2 replies; 14+ messages in thread
From: Kashyap Desai @ 2014-09-11 16:39 UTC (permalink / raw)
  To: Tomas Henzl
  Cc: Vivek Goyal, Elliott, Robert (Server Storage),
	Sumit Saxena, linux-scsi, Martin K. Petersen, Christoph Hellwig,
	jbottomley, aradford, Michal Schmidt, amirv, Jens Axboe,
	scameron

On Thu, Sep 11, 2014 at 4:50 PM, Tomas Henzl <thenzl@redhat.com> wrote:
> On 09/11/2014 11:02 AM, Kashyap Desai wrote:
>>> -----Original Message-----
>>> From: Vivek Goyal [mailto:vgoyal@redhat.com]
>>> Sent: Wednesday, September 10, 2014 9:17 PM
>>> To: Tomas Henzl
>>> Cc: Elliott, Robert (Server Storage); Sumit Saxena; linux-scsi@vger.kernel.org;
>>> martin.petersen@oracle.com; hch@infradead.org;
>>> jbottomley@parallels.com; Kashyap Desai; aradford@gmail.com; Michal
>>> Schmidt; amirv@mellanox.com; Jens Axboe <axboe@kernel.dk>
>>> (axboe@kernel.dk); scameron@beardog.cce.hp.com
>>> Subject: Re: [PATCH 04/11] megaraid_sas : Firmware crash dump feature
>>> support
>>>
>>> On Wed, Sep 10, 2014 at 05:28:40PM +0200, Tomas Henzl wrote:
>>>> On 09/10/2014 05:06 PM, Elliott, Robert (Server Storage) wrote:
>>>>>> From: linux-scsi-owner@vger.kernel.org [mailto:linux-scsi-
>>>>>> owner@vger.kernel.org] On Behalf Of Sumit Saxena
>>>>>>
>>>>>>> From: Tomas Henzl [mailto:thenzl@redhat.com]
>>>>>>>
>>>>>>> With several controllers in a system this may take a lot memory,
>>>>>>> could you also in case when a kdump kernel is running lower it, by
>>>>>>> not using this feature?
>>>>>>>
>>>>>> Agreed, we will disable this feature for kdump kernel by adding
>>>>>> "reset_devices" global variable.
>>>>>> That check is required for only one place, throughout the code,
>>>>>> this feature will remain disabled.  Code snippet for the same-
>>>>>>
>>>>>>         instance->crash_dump_drv_support = (!reset_devices) &&
>>>>>>                                 crashdump_enable &&
>>>>>>                                 instance->crash_dump_fw_support &&
>>>>>>                                 instance->crash_dump_buf;
>>>>>>         if (instance->crash_dump_drv_support) {
>>>>>>                 printk(KERN_INFO "megaraid_sas: FW Crash dump is supported\n");
>>>>>>                 megasas_set_crash_dump_params(instance,
>>>>>>                                 MR_CRASH_BUF_TURN_OFF);
>>>>>>
>>>>>>         } else {
>>>>>> ..
>>>>>>         }
>>>>> Network drivers have been running into similar problems.
>>>>>
>>>>> There's a new patch from Amir coming through net-next to make
>>>>> is_kdump_kernel() (in crash_dump.h) accessible to modules.
>>>>> That may be a better signal than reset_devices that the driver
>>>>> should use minimal resources.
>>>>>
>>>>> http://comments.gmane.org/gmane.linux.network/324737
>>>>>
>>>>> I'm not sure about the logistics of a SCSI patch depending on a
>>>>> net-next patch.
>>>> Probably better to start with reset_devices and switch to
>>>> is_kdump_kernel() later.
>>>> This is not a discussion about reset_devices versus is_kdump_kernel,
>>>> but while it looks good to have it distinguished - is the
>>>> reset_devices actually used anywhere else than in kdump kernel?
>>> I think usage of reset_devices for lowering memory footprint of driver is
>>> plain wrong. It tells driver to only reset the device as BIOS might not have
>>> done it right or we skipped BIOS completely.
>>>
>>> Using is_kdump_kernel() is also not perfect either but at least better than
>>> reset_devices.
>> We will use is_kdump_kernel() not to enable this feature in Kdump kernel.
>
> OK, just keep in mind that the is_kdump_kernel() is not yet in mainline.

Sure, I read that part, but when I checked the kernel source I found that
is_kdump_kernel() is available.

http://lxr.free-electrons.com/source/include/linux/crash_dump.h#L55
Let's do one thing: we will submit a separate patch for this to avoid
any confusion.


>
>>
>> MegaRaid Driver will not allocate Host memory (which is used to collect
>> complete FW crash image) until and unless associated FW is crashed/IO
>> timeout.
>
> This is new for the driver? A previous reply stated that the driver will
> allocate 1MB for each MR controller at the 'start of the day'.
> If I misunderstood it somehow then nothing is needed.

This is correct. A 1MB DMA buffer per controller will be allocated
irrespective of whether the FW has crashed or not.
When the FW actually crashes, the driver will try to allocate up to 512MB of
memory. I thought you were asking about the 512MB of memory.

>
>> Driver do allocation only if required. Also, it will break the function
>> immediately after memory allocation is failed and operate on available
>> memory.
>>
>> To be on safer side, we can disable this feature in Kdump mode.
>>
>> ~ Kashyap
>>> Thanks
>>> Vivek
>



-- 
Device Driver Developer @ Avagotech
Kashyap D. Desai
Note - my new email address
kashyap.desai@avagotech.com


* Re: [PATCH 04/11] megaraid_sas : Firmware crash dump feature support
  2014-09-11 16:39               ` Kashyap Desai
@ 2014-09-11 17:02                 ` Vivek Goyal
  2014-09-11 18:58                 ` Tomas Henzl
  1 sibling, 0 replies; 14+ messages in thread
From: Vivek Goyal @ 2014-09-11 17:02 UTC (permalink / raw)
  To: Kashyap Desai
  Cc: Tomas Henzl, Elliott, Robert (Server Storage),
	Sumit Saxena, linux-scsi, Martin K. Petersen, Christoph Hellwig,
	jbottomley, aradford, Michal Schmidt, amirv, Jens Axboe,
	scameron

On Thu, Sep 11, 2014 at 10:09:27PM +0530, Kashyap Desai wrote:

[..]
> > OK, just keep in mind that the is_kdump_kernel() is not yet in mainline.
> 
> Sure I read that part, but when I check kernel source I found
> is_kdump_kernel() is available.
> 
> http://lxr.free-electrons.com/source/include/linux/crash_dump.h#L55
> Let's do one thing - We will submit a separate patch for this to avoid
> any confusion.

is_kdump_kernel() has been there for a long time; it just was not
exported. Now a patch is sitting in Dave Miller's tree to export it
and make it available to drivers.

Thanks
Vivek


* Re: [PATCH 04/11] megaraid_sas : Firmware crash dump feature support
  2014-09-11 16:39               ` Kashyap Desai
  2014-09-11 17:02                 ` Vivek Goyal
@ 2014-09-11 18:58                 ` Tomas Henzl
  2014-09-11 19:11                   ` Kashyap Desai
  1 sibling, 1 reply; 14+ messages in thread
From: Tomas Henzl @ 2014-09-11 18:58 UTC (permalink / raw)
  To: Kashyap Desai
  Cc: Vivek Goyal, Elliott, Robert (Server Storage),
	Sumit Saxena, linux-scsi, Martin K. Petersen, Christoph Hellwig,
	jbottomley, aradford, Michal Schmidt, amirv, Jens Axboe,
	scameron

On 09/11/2014 06:39 PM, Kashyap Desai wrote:
> On Thu, Sep 11, 2014 at 4:50 PM, Tomas Henzl <thenzl@redhat.com> wrote:
>> On 09/11/2014 11:02 AM, Kashyap Desai wrote:
>>>> -----Original Message-----
>>>> From: Vivek Goyal [mailto:vgoyal@redhat.com]
>>>> Sent: Wednesday, September 10, 2014 9:17 PM
>>>> To: Tomas Henzl
>>>> Cc: Elliott, Robert (Server Storage); Sumit Saxena; linux-scsi@vger.kernel.org;
>>>> martin.petersen@oracle.com; hch@infradead.org;
>>>> jbottomley@parallels.com; Kashyap Desai; aradford@gmail.com; Michal
>>>> Schmidt; amirv@mellanox.com; Jens Axboe <axboe@kernel.dk>
>>>> (axboe@kernel.dk); scameron@beardog.cce.hp.com
>>>> Subject: Re: [PATCH 04/11] megaraid_sas : Firmware crash dump feature
>>>> support
>>>>
>>>> On Wed, Sep 10, 2014 at 05:28:40PM +0200, Tomas Henzl wrote:
>>>>> On 09/10/2014 05:06 PM, Elliott, Robert (Server Storage) wrote:
>>>>>>> From: linux-scsi-owner@vger.kernel.org [mailto:linux-scsi-
>>>>>>> owner@vger.kernel.org] On Behalf Of Sumit Saxena
>>>>>>>
>>>>>>>> From: Tomas Henzl [mailto:thenzl@redhat.com]
>>>>>>>>
>>>>>>>> With several controllers in a system this may take a lot memory,
>>>>>>>> could you also in case when a kdump kernel is running lower it, by
>>>>>>>> not using this feature?
>>>>>>>>
>>>>>>> Agreed, we will disable this feature for kdump kernel by adding
>>>>>>> "reset_devices" global variable.
>>>>>>> That check is required for only one place, throughout the code,
>>>>>>> this feature will remain disabled.  Code snippet for the same-
>>>>>>>
>>>>>>>         instance->crash_dump_drv_support = (!reset_devices) &&
>>>>>>>                                 crashdump_enable &&
>>>>>>>                                 instance->crash_dump_fw_support &&
>>>>>>>                                 instance->crash_dump_buf;
>>>>>>>         if (instance->crash_dump_drv_support) {
>>>>>>>                 printk(KERN_INFO "megaraid_sas: FW Crash dump is supported\n");
>>>>>>>                 megasas_set_crash_dump_params(instance,
>>>>>>>                                 MR_CRASH_BUF_TURN_OFF);
>>>>>>>
>>>>>>>         } else {
>>>>>>> ..
>>>>>>>         }
>>>>>> Network drivers have been running into similar problems.
>>>>>>
>>>>>> There's a new patch from Amir coming through net-next to make
>>>>>> is_kdump_kernel() (in crash_dump.h) accessible to modules.
>>>>>> That may be a better signal than reset_devices that the driver
>>>>>> should use minimal resources.
>>>>>>
>>>>>> http://comments.gmane.org/gmane.linux.network/324737
>>>>>>
>>>>>> I'm not sure about the logistics of a SCSI patch depending on a
>>>>>> net-next patch.
>>>>> Probably better to start with reset_devices and switch to
>>>>> is_kdump_kernel() later.
>>>>> This is not a discussion about reset_devices versus is_kdump_kernel,
>>>>> but while it looks good to have it distinguished - is the
>>>>> reset_devices actually used anywhere else than in kdump kernel?
>>>> I think usage of reset_devices for lowering memory footprint of driver is
>>>> plain wrong. It tells driver to only reset the device as BIOS might not have
>>>> done it right or we skipped BIOS completely.
>>>>
>>>> Using is_kdump_kernel() is also not perfect either but at least better than
>>>> reset_devices.
>>> We will use is_kdump_kernel() not to enable this feature in Kdump kernel.
>> OK, just keep in mind that the is_kdump_kernel() is not yet in mainline.
> Sure I read that part, but when I check kernel source I found
> is_kdump_kernel() is available.
>
> http://lxr.free-electrons.com/source/include/linux/crash_dump.h#L55
> Let's do one thing - We will submit a separate patch for this to avoid
> any confusion.
>
>
>>> MegaRaid Driver will not allocate Host memory (which is used to collect
>>> complete FW crash image) until and unless associated FW is crashed/IO
>>> timeout.
>> This is new for the driver? A previous reply stated that the driver will
>> allocate 1MB for each MR controller at the 'start of the day'.
>> If I misunderstood it somehow then nothing is needed.
> This is correct. 1MB DMA buffer per controller will be allocated
> irrespective of FW is crashed or not.
> When FW actually crash driver will go and try to allocate up to 512MB
> memory. I thought you queried for 512 MB memory.

Memory allocated for a crashkernel depends on the system configuration;
values below 100MB are often used, and the less the better.
Every MB counts.
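
To put rough numbers on this, a small sketch of the arithmetic using the sizes
mentioned in this thread. It assumes the 1MB DMA buffer and the 512MB cap both
apply per controller; the function names are hypothetical.

```c
#define MB (1024ull * 1024ull)

/* Always-allocated footprint: one 1MB DMA buffer per controller. */
static unsigned long long persistent_bytes(unsigned int controllers)
{
	return (unsigned long long)controllers * 1 * MB;
}

/* Worst-case on-demand footprint: up to 512MB per crashed controller. */
static unsigned long long on_demand_max_bytes(unsigned int crashed)
{
	return (unsigned long long)crashed * 512 * MB;
}
```

With four controllers the persistent cost is 4MB, which already matters in a
crashkernel reservation below 100MB, and a single full-size host crash buffer
would exceed such a reservation several times over.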

>
>>> Driver do allocation only if required. Also, it will break the function
>>> immediately after memory allocation is failed and operate on available
>>> memory.
>>>
>>> To be on safer side, we can disable this feature in Kdump mode.
>>>
>>> ~ Kashyap
>>>> Thanks
>>>> Vivek
>
>



* Re: [PATCH 04/11] megaraid_sas : Firmware crash dump feature support
  2014-09-11 18:58                 ` Tomas Henzl
@ 2014-09-11 19:11                   ` Kashyap Desai
  0 siblings, 0 replies; 14+ messages in thread
From: Kashyap Desai @ 2014-09-11 19:11 UTC (permalink / raw)
  To: Tomas Henzl
  Cc: Vivek Goyal, Elliott, Robert (Server Storage),
	Sumit Saxena, linux-scsi, Martin K. Petersen, Christoph Hellwig,
	jbottomley, aradford, Michal Schmidt, amirv, Jens Axboe,
	scameron

On Fri, Sep 12, 2014 at 12:28 AM, Tomas Henzl <thenzl@redhat.com> wrote:
> On 09/11/2014 06:39 PM, Kashyap Desai wrote:
>> On Thu, Sep 11, 2014 at 4:50 PM, Tomas Henzl <thenzl@redhat.com> wrote:
>>> On 09/11/2014 11:02 AM, Kashyap Desai wrote:
>>>>> -----Original Message-----
>>>>> From: Vivek Goyal [mailto:vgoyal@redhat.com]
>>>>> Sent: Wednesday, September 10, 2014 9:17 PM
>>>>> To: Tomas Henzl
>>>>> Cc: Elliott, Robert (Server Storage); Sumit Saxena; linux-scsi@vger.kernel.org;
>>>>> martin.petersen@oracle.com; hch@infradead.org;
>>>>> jbottomley@parallels.com; Kashyap Desai; aradford@gmail.com; Michal
>>>>> Schmidt; amirv@mellanox.com; Jens Axboe <axboe@kernel.dk>
>>>>> (axboe@kernel.dk); scameron@beardog.cce.hp.com
>>>>> Subject: Re: [PATCH 04/11] megaraid_sas : Firmware crash dump feature
>>>>> support
>>>>>
>>>>> On Wed, Sep 10, 2014 at 05:28:40PM +0200, Tomas Henzl wrote:
>>>>>> On 09/10/2014 05:06 PM, Elliott, Robert (Server Storage) wrote:
>>>>>>>> From: linux-scsi-owner@vger.kernel.org [mailto:linux-scsi-
>>>>>>>> owner@vger.kernel.org] On Behalf Of Sumit Saxena
>>>>>>>>
>>>>>>>>> From: Tomas Henzl [mailto:thenzl@redhat.com]
>>>>>>>>>
>>>>>>>>> With several controllers in a system this may take a lot memory,
>>>>>>>>> could you also in case when a kdump kernel is running lower it, by
>>>>>>>>> not using this feature?
>>>>>>>>>
>>>>>>>> Agreed, we will disable this feature for kdump kernel by adding
>>>>>>>> "reset_devices" global variable.
>>>>>>>> That check is required for only one place, throughout the code,
>>>>>>>> this feature will remain disabled.  Code snippet for the same-
>>>>>>>>
>>>>>>>>         instance->crash_dump_drv_support = (!reset_devices) &&
>>>>>>>>                                 crashdump_enable &&
>>>>>>>>                                 instance->crash_dump_fw_support &&
>>>>>>>>                                 instance->crash_dump_buf;
>>>>>>>>         if (instance->crash_dump_drv_support) {
>>>>>>>>                 printk(KERN_INFO "megaraid_sas: FW Crash dump is supported\n");
>>>>>>>>                 megasas_set_crash_dump_params(instance,
>>>>>>>>                                 MR_CRASH_BUF_TURN_OFF);
>>>>>>>>
>>>>>>>>         } else {
>>>>>>>> ..
>>>>>>>>         }
>>>>>>> Network drivers have been running into similar problems.
>>>>>>>
>>>>>>> There's a new patch from Amir coming through net-next to make
>>>>>>> is_kdump_kernel() (in crash_dump.h) accessible to modules.
>>>>>>> That may be a better signal than reset_devices that the driver
>>>>>>> should use minimal resources.
>>>>>>>
>>>>>>> http://comments.gmane.org/gmane.linux.network/324737
>>>>>>>
>>>>>>> I'm not sure about the logistics of a SCSI patch depending on a
>>>>>>> net-next patch.
>>>>>> Probably better to start with reset_devices and switch to
>>>>>> is_kdump_kernel() later.
>>>>>> This is not a discussion about reset_devices versus is_kdump_kernel,
>>>>>> but while it looks good to have it distinguished - is the
>>>>>> reset_devices actually used anywhere else than in kdump kernel?
>>>>> I think usage of reset_devices for lowering memory footprint of driver is
>>>>> plain wrong. It tells driver to only reset the device as BIOS might not have
>>>>> done it right or we skipped BIOS completely.
>>>>>
>>>>> Using is_kdump_kernel() is also not perfect either but at least better than
>>>>> reset_devices.
>>>> We will use is_kdump_kernel() not to enable this feature in Kdump kernel.
>>> OK, just keep in mind that the is_kdump_kernel() is not yet in mainline.
>> Sure I read that part, but when I check kernel source I found
>> is_kdump_kernel() is available.
>>
>> http://lxr.free-electrons.com/source/include/linux/crash_dump.h#L55
>> Let's do one thing - We will submit a separate patch for this to avoid
>> any confusion.
>>
>>
>>>> MegaRaid Driver will not allocate Host memory (which is used to collect
>>>> complete FW crash image) until and unless associated FW is crashed/IO
>>>> timeout.
>>> This is new for the driver? A previous reply stated that the driver will
>>> allocate 1MB for each MR controller at the 'start of the day'.
>>> If I misunderstood it somehow then nothing is needed.
>> This is correct. 1MB DMA buffer per controller will be allocated
>> irrespective of FW is crashed or not.
>> When FW actually crash driver will go and try to allocate up to 512MB
>> memory. I thought you queried for 512 MB memory.
>
> Memory allocated for a crashkernel depends on system configuration,
> values below 100MB are often used - the less the better.
> Every MB counts.

It is true that we don't need the full feature set in kdump mode, and the
FW crash dump feature especially is better disabled in kdump mode.
The megaraid_sas driver already uses the reset_devices variable in a couple
of places, so we will continue to use that and will resend the patch.
Later, when we have a better API like "is_kdump_kernel()", we can replace it
throughout the megaraid_sas driver.

>
>>
>>>> Driver do allocation only if required. Also, it will break the function
>>>> immediately after memory allocation is failed and operate on available
>>>> memory.
>>>>
>>>> To be on safer side, we can disable this feature in Kdump mode.
>>>>
>>>> ~ Kashyap
>>>>> Thanks
>>>>> Vivek
>>>> --
>>>> To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
>>>> the body of a message to majordomo@vger.kernel.org
>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
>>
>



-- 
Device Driver Developer @ Avagotech
Kashyap D. Desai
Note - my new email address
kashyap.desai@avagotech.com

Thread overview: 14+ messages
2014-09-06 13:25 [PATCH 04/11] megaraid_sas : Firmware crash dump feature support Sumit.Saxena
2014-09-09 15:54 ` Tomas Henzl
2014-09-09 16:18   ` Elliott, Robert (Server Storage)
2014-09-10 10:08     ` Tomas Henzl
2014-09-10 12:12   ` Sumit Saxena
2014-09-10 15:06     ` Elliott, Robert (Server Storage)
2014-09-10 15:28       ` Tomas Henzl
2014-09-10 15:46         ` Vivek Goyal
2014-09-11  9:02           ` Kashyap Desai
2014-09-11 11:20             ` Tomas Henzl
2014-09-11 16:39               ` Kashyap Desai
2014-09-11 17:02                 ` Vivek Goyal
2014-09-11 18:58                 ` Tomas Henzl
2014-09-11 19:11                   ` Kashyap Desai
