All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/4] GenWQE: patches to improve RAS features
@ 2014-06-04 13:57 Kleber Sacilotto de Souza
  2014-06-04 13:57 ` [PATCH 1/4] GenWQE: Add sysfs interface for bitstream reload Kleber Sacilotto de Souza
                   ` (3 more replies)
  0 siblings, 4 replies; 15+ messages in thread
From: Kleber Sacilotto de Souza @ 2014-06-04 13:57 UTC (permalink / raw)
  To: linux-kernel; +Cc: haver, gregkh, Kleber Sacilotto de Souza

Hi,

This is a patch series for the IBM GenWQE driver to improve/implement RAS
features and increase the driver version number.


Kleber Sacilotto de Souza (4):
  GenWQE: Add sysfs interface for bitstream reload
  GenWQE: Add support for EEH error recovery
  GenWQE: Improve hardware error recovery
  GenWQE: Increase driver version number

 Documentation/ABI/testing/sysfs-driver-genwqe |    9 +
 drivers/misc/genwqe/Kconfig                   |    6 +
 drivers/misc/genwqe/card_base.c               |  214 +++++++++++++++++++++++--
 drivers/misc/genwqe/card_base.h               |    2 +
 drivers/misc/genwqe/card_ddcb.c               |   24 +++-
 drivers/misc/genwqe/card_debugfs.c            |    7 +
 drivers/misc/genwqe/card_dev.c                |    5 +
 drivers/misc/genwqe/card_sysfs.c              |   25 +++
 drivers/misc/genwqe/card_utils.c              |   10 ++
 drivers/misc/genwqe/genwqe_driver.h           |    2 +-
 include/uapi/linux/genwqe/genwqe_card.h       |    1 +
 11 files changed, 286 insertions(+), 19 deletions(-)


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 1/4] GenWQE: Add sysfs interface for bitstream reload
  2014-06-04 13:57 [PATCH 0/4] GenWQE: patches to improve RAS features Kleber Sacilotto de Souza
@ 2014-06-04 13:57 ` Kleber Sacilotto de Souza
  2014-06-04 14:05   ` Frank Haverkamp
  2014-06-04 13:57 ` [PATCH 2/4] GenWQE: Add support for EEH error recovery Kleber Sacilotto de Souza
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 15+ messages in thread
From: Kleber Sacilotto de Souza @ 2014-06-04 13:57 UTC (permalink / raw)
  To: linux-kernel; +Cc: haver, gregkh, Kleber Sacilotto de Souza

This patch adds an interface on sysfs for userspace to request a card
bitstream reload. It sets the appropriate register and try to perform a
fundamental reset on the PCIe slot for the card to reload the bitstream
from the chosen partition.

Signed-off-by: Kleber Sacilotto de Souza <klebers@linux.vnet.ibm.com>
---
 Documentation/ABI/testing/sysfs-driver-genwqe |    9 +++
 drivers/misc/genwqe/card_base.c               |   90 +++++++++++++++++++++++++
 drivers/misc/genwqe/card_sysfs.c              |   25 +++++++
 include/uapi/linux/genwqe/genwqe_card.h       |    1 +
 4 files changed, 125 insertions(+), 0 deletions(-)

diff --git a/Documentation/ABI/testing/sysfs-driver-genwqe b/Documentation/ABI/testing/sysfs-driver-genwqe
index 1870737..64ac6d5 100644
--- a/Documentation/ABI/testing/sysfs-driver-genwqe
+++ b/Documentation/ABI/testing/sysfs-driver-genwqe
@@ -25,6 +25,15 @@ Date:           Oct 2013
 Contact:        haver@linux.vnet.ibm.com
 Description:    Interface to set the next bitstream to be used.
 
+What:           /sys/class/genwqe/genwqe<n>_card/reload_bitstream
+Date:           May 2014
+Contact:        klebers@linux.vnet.ibm.com
+Description:    Interface to trigger a PCIe card reset to reload the bitstream.
+                  sudo sh -c 'echo 1 > \
+                    /sys/class/genwqe/genwqe0_card/reload_bitstream'
+                If successfully, the card will come back with the bitstream set
+                on 'next_bitstream'.
+
 What:           /sys/class/genwqe/genwqe<n>_card/tempsens
 Date:           Oct 2013
 Contact:        haver@linux.vnet.ibm.com
diff --git a/drivers/misc/genwqe/card_base.c b/drivers/misc/genwqe/card_base.c
index 74d51c9..e6cc3e1 100644
--- a/drivers/misc/genwqe/card_base.c
+++ b/drivers/misc/genwqe/card_base.c
@@ -761,6 +761,89 @@ static u64 genwqe_fir_checking(struct genwqe_dev *cd)
 }
 
 /**
+ * genwqe_pci_fundamental_reset() - trigger a PCIe fundamental reset on the slot
+ *
+ * Note: pci_set_pcie_reset_state() is not implemented on all archs, so this
+ * reset method will not work in all cases.
+ *
+ * Return: 0 on success or error code from pci_set_pcie_reset_state()
+ */
+static int genwqe_pci_fundamental_reset(struct pci_dev *pci_dev)
+{
+	int rc;
+
+	/*
+	 * lock pci config space access from userspace,
+	 * save state and issue PCIe fundamental reset
+	 */
+	pci_cfg_access_lock(pci_dev);
+	pci_save_state(pci_dev);
+	rc = pci_set_pcie_reset_state(pci_dev, pcie_warm_reset);
+	if (!rc) {
+		/* keep PCIe reset asserted for 250ms */
+		msleep(250);
+		pci_set_pcie_reset_state(pci_dev, pcie_deassert_reset);
+		/* Wait for 2s to reload flash and train the link */
+		msleep(2000);
+	}
+	pci_restore_state(pci_dev);
+	pci_cfg_access_unlock(pci_dev);
+	return rc;
+}
+
+/*
+ * genwqe_reload_bistream() - reload card bitstream
+ *
+ * Set the appropriate register and call fundamental reset to reaload the card
+ * bitstream.
+ *
+ * Return: 0 on success, error code otherwise
+ */
+static int genwqe_reload_bistream(struct genwqe_dev *cd)
+{
+	struct pci_dev *pci_dev = cd->pci_dev;
+	int rc;
+
+	dev_info(&pci_dev->dev,
+		 "[%s] resetting card for bitstream reload\n",
+		 __func__);
+
+	genwqe_stop(cd);
+
+	/*
+	 * Cause a CPLD reprogram with the 'next_bitstream'
+	 * partition on PCIe hot or fundamental reset
+	 */
+	__genwqe_writeq(cd, IO_SLC_CFGREG_SOFTRESET,
+			(cd->softreset & 0xcull) | 0x70ull);
+
+	rc = genwqe_pci_fundamental_reset(pci_dev);
+	if (rc) {
+		/*
+		 * A fundamental reset failure can be caused
+		 * by lack of support on the arch, so we just
+		 * log the error and try to start the card
+		 * again.
+		 */
+		dev_err(&pci_dev->dev,
+			"[%s] err: failed to reset card for bitstream reload\n",
+			__func__);
+	}
+
+	rc = genwqe_start(cd);
+	if (rc) {
+		dev_err(&pci_dev->dev,
+			"[%s] err: cannot start card services! (err=%d)\n",
+			__func__, rc);
+		return rc;
+	}
+	dev_info(&pci_dev->dev,
+		 "[%s] card reloaded\n", __func__);
+	return 0;
+}
+
+
+/**
  * genwqe_health_thread() - Health checking thread
  *
  * This thread is only started for the PF of the card.
@@ -846,6 +929,13 @@ static int genwqe_health_thread(void *data)
 			}
 		}
 
+		if (cd->card_state == GENWQE_CARD_RELOAD_BITSTREAM) {
+			/* Userspace requested card bitstream reload */
+			rc = genwqe_reload_bistream(cd);
+			if (rc)
+				goto fatal_error;
+		}
+
 		cd->last_gfir = gfir;
 		cond_resched();
 	}
diff --git a/drivers/misc/genwqe/card_sysfs.c b/drivers/misc/genwqe/card_sysfs.c
index a72a992..7232e40 100644
--- a/drivers/misc/genwqe/card_sysfs.c
+++ b/drivers/misc/genwqe/card_sysfs.c
@@ -223,6 +223,30 @@ static ssize_t next_bitstream_store(struct device *dev,
 }
 static DEVICE_ATTR_RW(next_bitstream);
 
+static ssize_t reload_bitstream_store(struct device *dev,
+				struct device_attribute *attr,
+				const char *buf, size_t count)
+{
+	int reload;
+	struct genwqe_dev *cd = dev_get_drvdata(dev);
+
+	if (kstrtoint(buf, 0, &reload) < 0)
+		return -EINVAL;
+
+	if (reload == 0x1) {
+		if (cd->card_state == GENWQE_CARD_UNUSED ||
+		    cd->card_state == GENWQE_CARD_USED)
+			cd->card_state = GENWQE_CARD_RELOAD_BITSTREAM;
+		else
+			return -EIO;
+	} else {
+		return -EINVAL;
+	}
+
+	return count;
+}
+static DEVICE_ATTR_WO(reload_bitstream);
+
 /*
  * Create device_attribute structures / params: name, mode, show, store
  * additional flag if valid in VF
@@ -239,6 +263,7 @@ static struct attribute *genwqe_attributes[] = {
 	&dev_attr_status.attr,
 	&dev_attr_freerunning_timer.attr,
 	&dev_attr_queue_working_time.attr,
+	&dev_attr_reload_bitstream.attr,
 	NULL,
 };
 
diff --git a/include/uapi/linux/genwqe/genwqe_card.h b/include/uapi/linux/genwqe/genwqe_card.h
index 795e957..4fc065f 100644
--- a/include/uapi/linux/genwqe/genwqe_card.h
+++ b/include/uapi/linux/genwqe/genwqe_card.h
@@ -328,6 +328,7 @@ enum genwqe_card_state {
 	GENWQE_CARD_UNUSED = 0,
 	GENWQE_CARD_USED = 1,
 	GENWQE_CARD_FATAL_ERROR = 2,
+	GENWQE_CARD_RELOAD_BITSTREAM = 3,
 	GENWQE_CARD_STATE_MAX,
 };
 
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 2/4] GenWQE: Add support for EEH error recovery
  2014-06-04 13:57 [PATCH 0/4] GenWQE: patches to improve RAS features Kleber Sacilotto de Souza
  2014-06-04 13:57 ` [PATCH 1/4] GenWQE: Add sysfs interface for bitstream reload Kleber Sacilotto de Souza
@ 2014-06-04 13:57 ` Kleber Sacilotto de Souza
  2014-06-04 14:06   ` Frank Haverkamp
  2014-06-04 13:57 ` [PATCH 3/4] GenWQE: Improve hardware " Kleber Sacilotto de Souza
  2014-06-04 13:57 ` [PATCH 4/4] GenWQE: Increase driver version number Kleber Sacilotto de Souza
  3 siblings, 1 reply; 15+ messages in thread
From: Kleber Sacilotto de Souza @ 2014-06-04 13:57 UTC (permalink / raw)
  To: linux-kernel; +Cc: haver, gregkh, Kleber Sacilotto de Souza

This patch implements the callbacks and functions necessary to have EEH
recovery support.

It adds a config option to enable or disable explicit calls to trigger
platform specific mechanisms on error recovery paths. This option is
enabled by default only on PPC64 systems and can be overritten via
debugfs. If this option is enabled, on the error recovery path the
driver will call pci_channel_offline() to check for error condition and
issue non-raw MMIO reads to trigger early EEH detection in case of
hardware failures. This is necessary since the driver MMIO helper
funtions use raw accessors.

Signed-off-by: Kleber Sacilotto de Souza <klebers@linux.vnet.ibm.com>
---
 drivers/misc/genwqe/Kconfig        |    6 +++
 drivers/misc/genwqe/card_base.c    |   79 ++++++++++++++++++++++++++++++------
 drivers/misc/genwqe/card_base.h    |    2 +
 drivers/misc/genwqe/card_ddcb.c    |   24 +++++++++--
 drivers/misc/genwqe/card_debugfs.c |    7 +++
 drivers/misc/genwqe/card_dev.c     |    5 ++
 drivers/misc/genwqe/card_utils.c   |   10 +++++
 7 files changed, 115 insertions(+), 18 deletions(-)

diff --git a/drivers/misc/genwqe/Kconfig b/drivers/misc/genwqe/Kconfig
index 6069d8c..4c0a033 100644
--- a/drivers/misc/genwqe/Kconfig
+++ b/drivers/misc/genwqe/Kconfig
@@ -11,3 +11,9 @@ menuconfig GENWQE
 	  Enables PCIe card driver for IBM GenWQE accelerators.
 	  The user-space interface is described in
 	  include/linux/genwqe/genwqe_card.h.
+
+config GENWQE_PLATFORM_ERROR_RECOVERY
+	int "Use platform recovery procedures (0=off, 1=on)"
+	depends on GENWQE
+	default 1 if PPC64
+	default 0
diff --git a/drivers/misc/genwqe/card_base.c b/drivers/misc/genwqe/card_base.c
index e6cc3e1..87ebaba 100644
--- a/drivers/misc/genwqe/card_base.c
+++ b/drivers/misc/genwqe/card_base.c
@@ -140,6 +140,12 @@ static struct genwqe_dev *genwqe_dev_alloc(void)
 	cd->class_genwqe = class_genwqe;
 	cd->debugfs_genwqe = debugfs_genwqe;
 
+	/*
+	 * This comes from kernel config option and can be overritten via
+	 * debugfs.
+	 */
+	cd->use_platform_recovery = CONFIG_GENWQE_PLATFORM_ERROR_RECOVERY;
+
 	init_waitqueue_head(&cd->queue_waitq);
 
 	spin_lock_init(&cd->file_lock);
@@ -943,6 +949,19 @@ static int genwqe_health_thread(void *data)
 	return 0;
 
  fatal_error:
+	if (cd->use_platform_recovery) {
+		/*
+		 * Since we use raw accessors, EEH errors won't be detected
+		 * by the platform until we do a non-raw MMIO or config space
+		 * read
+		 */
+		readq(cd->mmio + IO_SLC_CFGREG_GFIR);
+
+		/* We do nothing if the card is going over PCI recovery */
+		if (pci_channel_offline(pci_dev))
+			return -EIO;
+	}
+
 	dev_err(&pci_dev->dev,
 		"[%s] card unusable. Please trigger unbind!\n", __func__);
 
@@ -1048,6 +1067,9 @@ static int genwqe_pci_setup(struct genwqe_dev *cd)
 	pci_set_master(pci_dev);
 	pci_enable_pcie_error_reporting(pci_dev);
 
+	/* EEH recovery requires PCIe fundamental reset */
+	pci_dev->needs_freset = 1;
+
 	/* request complete BAR-0 space (length = 0) */
 	cd->mmio_len = pci_resource_len(pci_dev, 0);
 	cd->mmio = pci_iomap(pci_dev, 0, 0);
@@ -1186,23 +1208,40 @@ static pci_ers_result_t genwqe_err_error_detected(struct pci_dev *pci_dev,
 
 	dev_err(&pci_dev->dev, "[%s] state=%d\n", __func__, state);
 
-	if (pci_dev == NULL)
-		return PCI_ERS_RESULT_NEED_RESET;
-
 	cd = dev_get_drvdata(&pci_dev->dev);
 	if (cd == NULL)
-		return PCI_ERS_RESULT_NEED_RESET;
+		return PCI_ERS_RESULT_DISCONNECT;
 
-	switch (state) {
-	case pci_channel_io_normal:
-		return PCI_ERS_RESULT_CAN_RECOVER;
-	case pci_channel_io_frozen:
-		return PCI_ERS_RESULT_NEED_RESET;
-	case pci_channel_io_perm_failure:
+	/* Stop the card */
+	genwqe_health_check_stop(cd);
+	genwqe_stop(cd);
+
+	/*
+	 * On permanent failure, the PCI code will call device remove
+	 * after the return of this function.
+	 * genwqe_stop() can be called twice.
+	 */
+	if (state == pci_channel_io_perm_failure) {
 		return PCI_ERS_RESULT_DISCONNECT;
+	} else {
+		genwqe_pci_remove(cd);
+		return PCI_ERS_RESULT_NEED_RESET;
 	}
+}
+
+static pci_ers_result_t genwqe_err_slot_reset(struct pci_dev *pci_dev)
+{
+	int rc;
+	struct genwqe_dev *cd = dev_get_drvdata(&pci_dev->dev);
 
-	return PCI_ERS_RESULT_NEED_RESET;
+	rc = genwqe_pci_setup(cd);
+	if (!rc) {
+		return PCI_ERS_RESULT_RECOVERED;
+	} else {
+		dev_err(&pci_dev->dev,
+			"err: problems with PCI setup (err=%d)\n", rc);
+		return PCI_ERS_RESULT_DISCONNECT;
+	}
 }
 
 static pci_ers_result_t genwqe_err_result_none(struct pci_dev *dev)
@@ -1210,8 +1249,22 @@ static pci_ers_result_t genwqe_err_result_none(struct pci_dev *dev)
 	return PCI_ERS_RESULT_NONE;
 }
 
-static void genwqe_err_resume(struct pci_dev *dev)
+static void genwqe_err_resume(struct pci_dev *pci_dev)
 {
+	int rc;
+	struct genwqe_dev *cd = dev_get_drvdata(&pci_dev->dev);
+
+	rc = genwqe_start(cd);
+	if (!rc) {
+		rc = genwqe_health_check_start(cd);
+		if (rc)
+			dev_err(&pci_dev->dev,
+				"err: cannot start health checking! (err=%d)\n",
+				rc);
+	} else {
+		dev_err(&pci_dev->dev,
+			"err: cannot start card services! (err=%d)\n", rc);
+	}
 }
 
 static int genwqe_sriov_configure(struct pci_dev *dev, int numvfs)
@@ -1234,7 +1287,7 @@ static struct pci_error_handlers genwqe_err_handler = {
 	.error_detected = genwqe_err_error_detected,
 	.mmio_enabled	= genwqe_err_result_none,
 	.link_reset	= genwqe_err_result_none,
-	.slot_reset	= genwqe_err_result_none,
+	.slot_reset	= genwqe_err_slot_reset,
 	.resume		= genwqe_err_resume,
 };
 
diff --git a/drivers/misc/genwqe/card_base.h b/drivers/misc/genwqe/card_base.h
index 0e608a2..67abd8c 100644
--- a/drivers/misc/genwqe/card_base.h
+++ b/drivers/misc/genwqe/card_base.h
@@ -291,6 +291,8 @@ struct genwqe_dev {
 	struct task_struct *health_thread;
 	wait_queue_head_t health_waitq;
 
+	int use_platform_recovery;	/* use platform recovery mechanisms */
+
 	/* char device */
 	dev_t  devnum_genwqe;		/* major/minor num card */
 	struct class *class_genwqe;	/* reference to class object */
diff --git a/drivers/misc/genwqe/card_ddcb.c b/drivers/misc/genwqe/card_ddcb.c
index c8046db..f0de615 100644
--- a/drivers/misc/genwqe/card_ddcb.c
+++ b/drivers/misc/genwqe/card_ddcb.c
@@ -1118,7 +1118,21 @@ static irqreturn_t genwqe_pf_isr(int irq, void *dev_id)
 	 * safer, but slower for the good-case ... See above.
 	 */
 	gfir = __genwqe_readq(cd, IO_SLC_CFGREG_GFIR);
-	if ((gfir & GFIR_ERR_TRIGGER) != 0x0) {
+	if (((gfir & GFIR_ERR_TRIGGER) != 0x0) &&
+	    !pci_channel_offline(pci_dev)) {
+
+		if (cd->use_platform_recovery) {
+			/*
+			 * Since we use raw accessors, EEH errors won't be
+			 * detected by the platform until we do a non-raw
+			 * MMIO or config space read
+			 */
+			readq(cd->mmio + IO_SLC_CFGREG_GFIR);
+
+			/* Don't do anything if the PCI channel is frozen */
+			if (pci_channel_offline(pci_dev))
+				goto exit;
+		}
 
 		wake_up_interruptible(&cd->health_waitq);
 
@@ -1126,12 +1140,12 @@ static irqreturn_t genwqe_pf_isr(int irq, void *dev_id)
 		 * By default GFIRs causes recovery actions. This
 		 * count is just for debug when recovery is masked.
 		 */
-		printk_ratelimited(KERN_ERR
-				   "%s %s: [%s] GFIR=%016llx\n",
-				   GENWQE_DEVNAME, dev_name(&pci_dev->dev),
-				   __func__, gfir);
+		dev_err_ratelimited(&pci_dev->dev,
+				    "[%s] GFIR=%016llx\n",
+				    __func__, gfir);
 	}
 
+ exit:
 	return IRQ_HANDLED;
 }
 
diff --git a/drivers/misc/genwqe/card_debugfs.c b/drivers/misc/genwqe/card_debugfs.c
index 50d2096..2171915 100644
--- a/drivers/misc/genwqe/card_debugfs.c
+++ b/drivers/misc/genwqe/card_debugfs.c
@@ -485,6 +485,13 @@ int genwqe_init_debugfs(struct genwqe_dev *cd)
 		goto err1;
 	}
 
+	file = debugfs_create_u32("use_platform_recovery", 0666, root,
+				  &cd->use_platform_recovery);
+	if (!file) {
+		ret = -ENOMEM;
+		goto err1;
+	}
+
 	cd->debugfs_root = root;
 	return 0;
 err1:
diff --git a/drivers/misc/genwqe/card_dev.c b/drivers/misc/genwqe/card_dev.c
index 1d2f163..aae4255 100644
--- a/drivers/misc/genwqe/card_dev.c
+++ b/drivers/misc/genwqe/card_dev.c
@@ -1048,10 +1048,15 @@ static long genwqe_ioctl(struct file *filp, unsigned int cmd,
 	int rc = 0;
 	struct genwqe_file *cfile = (struct genwqe_file *)filp->private_data;
 	struct genwqe_dev *cd = cfile->cd;
+	struct pci_dev *pci_dev = cd->pci_dev;
 	struct genwqe_reg_io __user *io;
 	u64 val;
 	u32 reg_offs;
 
+	/* Return -EIO if card hit EEH */
+	if (pci_channel_offline(pci_dev))
+		return -EIO;
+
 	if (_IOC_TYPE(cmd) != GENWQE_IOC_CODE)
 		return -EINVAL;
 
diff --git a/drivers/misc/genwqe/card_utils.c b/drivers/misc/genwqe/card_utils.c
index d049d27..61846dd 100644
--- a/drivers/misc/genwqe/card_utils.c
+++ b/drivers/misc/genwqe/card_utils.c
@@ -53,12 +53,17 @@
  */
 int __genwqe_writeq(struct genwqe_dev *cd, u64 byte_offs, u64 val)
 {
+	struct pci_dev *pci_dev = cd->pci_dev;
+
 	if (cd->err_inject & GENWQE_INJECT_HARDWARE_FAILURE)
 		return -EIO;
 
 	if (cd->mmio == NULL)
 		return -EIO;
 
+	if (pci_channel_offline(pci_dev))
+		return -EIO;
+
 	__raw_writeq((__force u64)cpu_to_be64(val), cd->mmio + byte_offs);
 	return 0;
 }
@@ -99,12 +104,17 @@ u64 __genwqe_readq(struct genwqe_dev *cd, u64 byte_offs)
  */
 int __genwqe_writel(struct genwqe_dev *cd, u64 byte_offs, u32 val)
 {
+	struct pci_dev *pci_dev = cd->pci_dev;
+
 	if (cd->err_inject & GENWQE_INJECT_HARDWARE_FAILURE)
 		return -EIO;
 
 	if (cd->mmio == NULL)
 		return -EIO;
 
+	if (pci_channel_offline(pci_dev))
+		return -EIO;
+
 	__raw_writel((__force u32)cpu_to_be32(val), cd->mmio + byte_offs);
 	return 0;
 }
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 3/4] GenWQE: Improve hardware error recovery
  2014-06-04 13:57 [PATCH 0/4] GenWQE: patches to improve RAS features Kleber Sacilotto de Souza
  2014-06-04 13:57 ` [PATCH 1/4] GenWQE: Add sysfs interface for bitstream reload Kleber Sacilotto de Souza
  2014-06-04 13:57 ` [PATCH 2/4] GenWQE: Add support for EEH error recovery Kleber Sacilotto de Souza
@ 2014-06-04 13:57 ` Kleber Sacilotto de Souza
  2014-06-04 14:06   ` Frank Haverkamp
  2014-06-04 13:57 ` [PATCH 4/4] GenWQE: Increase driver version number Kleber Sacilotto de Souza
  3 siblings, 1 reply; 15+ messages in thread
From: Kleber Sacilotto de Souza @ 2014-06-04 13:57 UTC (permalink / raw)
  To: linux-kernel; +Cc: haver, gregkh, Kleber Sacilotto de Souza

Currently, in the event of a fatal hardware error, the driver tries a
recovery procedure that calls pci_reset_function() to reset the card.
This is not sufficient in some cases, needing a fundamental reset to
bring the card back.

This patch implements a call to the platform fundamental reset procedure
on the error recovery path if GENWQE_PLATFORM_ERROR_RECOVERY is enabled.
This is implemented by default only on PPC64, since this can cause
problems on other archs, e.g. zSeries, where the platform has its own
recovery procedures, leading to a potencial race conditition. For these
cases, the recovery is kept as it was before.

Signed-off-by: Kleber Sacilotto de Souza <klebers@linux.vnet.ibm.com>
---
 drivers/misc/genwqe/card_base.c |   45 +++++++++++++++++++++++++++++++++++++++
 1 files changed, 45 insertions(+), 0 deletions(-)

diff --git a/drivers/misc/genwqe/card_base.c b/drivers/misc/genwqe/card_base.c
index 87ebaba..abb7961 100644
--- a/drivers/misc/genwqe/card_base.c
+++ b/drivers/misc/genwqe/card_base.c
@@ -797,6 +797,41 @@ static int genwqe_pci_fundamental_reset(struct pci_dev *pci_dev)
 	return rc;
 }
 
+
+static int genwqe_platform_recovery(struct genwqe_dev *cd)
+{
+	struct pci_dev *pci_dev = cd->pci_dev;
+	int rc;
+
+	dev_info(&pci_dev->dev,
+		 "[%s] resetting card for error recovery\n", __func__);
+
+	/* Clear out error injection flags */
+	cd->err_inject &= ~(GENWQE_INJECT_HARDWARE_FAILURE |
+			    GENWQE_INJECT_GFIR_FATAL |
+			    GENWQE_INJECT_GFIR_INFO);
+
+	genwqe_stop(cd);
+
+	/* Try recoverying the card with fundamental reset */
+	rc = genwqe_pci_fundamental_reset(pci_dev);
+	if (!rc) {
+		rc = genwqe_start(cd);
+		if (!rc)
+			dev_info(&pci_dev->dev,
+				 "[%s] card recovered\n", __func__);
+		else
+			dev_err(&pci_dev->dev,
+				"[%s] err: cannot start card services! (err=%d)\n",
+				__func__, rc);
+	} else {
+		dev_err(&pci_dev->dev,
+			"[%s] card reset failed\n", __func__);
+	}
+
+	return rc;
+}
+
 /*
  * genwqe_reload_bistream() - reload card bitstream
  *
@@ -875,6 +910,7 @@ static int genwqe_health_thread(void *data)
 	struct pci_dev *pci_dev = cd->pci_dev;
 	u64 gfir, gfir_masked, slu_unitcfg, app_unitcfg;
 
+ health_thread_begin:
 	while (!kthread_should_stop()) {
 		rc = wait_event_interruptible_timeout(cd->health_waitq,
 			 (genwqe_health_check_cond(cd, &gfir) ||
@@ -960,6 +996,15 @@ static int genwqe_health_thread(void *data)
 		/* We do nothing if the card is going over PCI recovery */
 		if (pci_channel_offline(pci_dev))
 			return -EIO;
+
+		/*
+		 * If it's supported by the platform, we try a fundamental reset
+		 * to recover from a fatal error. Otherwise, we continue to wait
+		 * for an external recovery procedure to take care of it.
+		 */
+		rc = genwqe_platform_recovery(cd);
+		if (!rc)
+			goto health_thread_begin;
 	}
 
 	dev_err(&pci_dev->dev,
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 4/4] GenWQE: Increase driver version number
  2014-06-04 13:57 [PATCH 0/4] GenWQE: patches to improve RAS features Kleber Sacilotto de Souza
                   ` (2 preceding siblings ...)
  2014-06-04 13:57 ` [PATCH 3/4] GenWQE: Improve hardware " Kleber Sacilotto de Souza
@ 2014-06-04 13:57 ` Kleber Sacilotto de Souza
  2014-06-04 14:06   ` Frank Haverkamp
  2014-06-04 15:54   ` Greg KH
  3 siblings, 2 replies; 15+ messages in thread
From: Kleber Sacilotto de Souza @ 2014-06-04 13:57 UTC (permalink / raw)
  To: linux-kernel; +Cc: haver, gregkh, Kleber Sacilotto de Souza

Increase genwqe driver version number.

Signed-off-by: Kleber Sacilotto de Souza <klebers@linux.vnet.ibm.com>
---
 drivers/misc/genwqe/genwqe_driver.h |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/misc/genwqe/genwqe_driver.h b/drivers/misc/genwqe/genwqe_driver.h
index cd52631..a506e9a 100644
--- a/drivers/misc/genwqe/genwqe_driver.h
+++ b/drivers/misc/genwqe/genwqe_driver.h
@@ -36,7 +36,7 @@
 #include <asm/byteorder.h>
 #include <linux/genwqe/genwqe_card.h>
 
-#define DRV_VERS_STRING		"2.0.15"
+#define DRV_VERS_STRING		"2.0.21"
 
 /*
  * Static minor number assignement, until we decide/implement
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/4] GenWQE: Add sysfs interface for bitstream reload
  2014-06-04 13:57 ` [PATCH 1/4] GenWQE: Add sysfs interface for bitstream reload Kleber Sacilotto de Souza
@ 2014-06-04 14:05   ` Frank Haverkamp
  0 siblings, 0 replies; 15+ messages in thread
From: Frank Haverkamp @ 2014-06-04 14:05 UTC (permalink / raw)
  To: Kleber Sacilotto de Souza; +Cc: linux-kernel, gregkh


Am Mittwoch, den 04.06.2014, 10:57 -0300 schrieb Kleber Sacilotto de
Souza:
> This patch adds an interface on sysfs for userspace to request a card
> bitstream reload. It sets the appropriate register and try to perform a
> fundamental reset on the PCIe slot for the card to reload the bitstream
> from the chosen partition.
> 
> Signed-off-by: Kleber Sacilotto de Souza <klebers@linux.vnet.ibm.com>
> ---
>  Documentation/ABI/testing/sysfs-driver-genwqe |    9 +++
>  drivers/misc/genwqe/card_base.c               |   90 +++++++++++++++++++++++++
>  drivers/misc/genwqe/card_sysfs.c              |   25 +++++++
>  include/uapi/linux/genwqe/genwqe_card.h       |    1 +
>  4 files changed, 125 insertions(+), 0 deletions(-)
> 
> diff --git a/Documentation/ABI/testing/sysfs-driver-genwqe b/Documentation/ABI/testing/sysfs-driver-genwqe
> index 1870737..64ac6d5 100644
> --- a/Documentation/ABI/testing/sysfs-driver-genwqe
> +++ b/Documentation/ABI/testing/sysfs-driver-genwqe
> @@ -25,6 +25,15 @@ Date:           Oct 2013
>  Contact:        haver@linux.vnet.ibm.com
>  Description:    Interface to set the next bitstream to be used.
> 
> +What:           /sys/class/genwqe/genwqe<n>_card/reload_bitstream
> +Date:           May 2014
> +Contact:        klebers@linux.vnet.ibm.com
> +Description:    Interface to trigger a PCIe card reset to reload the bitstream.
> +                  sudo sh -c 'echo 1 > \
> +                    /sys/class/genwqe/genwqe0_card/reload_bitstream'
> +                If successfully, the card will come back with the bitstream set
> +                on 'next_bitstream'.
> +
>  What:           /sys/class/genwqe/genwqe<n>_card/tempsens
>  Date:           Oct 2013
>  Contact:        haver@linux.vnet.ibm.com
> diff --git a/drivers/misc/genwqe/card_base.c b/drivers/misc/genwqe/card_base.c
> index 74d51c9..e6cc3e1 100644
> --- a/drivers/misc/genwqe/card_base.c
> +++ b/drivers/misc/genwqe/card_base.c
> @@ -761,6 +761,89 @@ static u64 genwqe_fir_checking(struct genwqe_dev *cd)
>  }
> 
>  /**
> + * genwqe_pci_fundamental_reset() - trigger a PCIe fundamental reset on the slot
> + *
> + * Note: pci_set_pcie_reset_state() is not implemented on all archs, so this
> + * reset method will not work in all cases.
> + *
> + * Return: 0 on success or error code from pci_set_pcie_reset_state()
> + */
> +static int genwqe_pci_fundamental_reset(struct pci_dev *pci_dev)
> +{
> +	int rc;
> +
> +	/*
> +	 * lock pci config space access from userspace,
> +	 * save state and issue PCIe fundamental reset
> +	 */
> +	pci_cfg_access_lock(pci_dev);
> +	pci_save_state(pci_dev);
> +	rc = pci_set_pcie_reset_state(pci_dev, pcie_warm_reset);
> +	if (!rc) {
> +		/* keep PCIe reset asserted for 250ms */
> +		msleep(250);
> +		pci_set_pcie_reset_state(pci_dev, pcie_deassert_reset);
> +		/* Wait for 2s to reload flash and train the link */
> +		msleep(2000);
> +	}
> +	pci_restore_state(pci_dev);
> +	pci_cfg_access_unlock(pci_dev);
> +	return rc;
> +}
> +
> +/*
> + * genwqe_reload_bistream() - reload card bitstream
> + *
> + * Set the appropriate register and call fundamental reset to reaload the card
> + * bitstream.
> + *
> + * Return: 0 on success, error code otherwise
> + */
> +static int genwqe_reload_bistream(struct genwqe_dev *cd)
> +{
> +	struct pci_dev *pci_dev = cd->pci_dev;
> +	int rc;
> +
> +	dev_info(&pci_dev->dev,
> +		 "[%s] resetting card for bitstream reload\n",
> +		 __func__);
> +
> +	genwqe_stop(cd);
> +
> +	/*
> +	 * Cause a CPLD reprogram with the 'next_bitstream'
> +	 * partition on PCIe hot or fundamental reset
> +	 */
> +	__genwqe_writeq(cd, IO_SLC_CFGREG_SOFTRESET,
> +			(cd->softreset & 0xcull) | 0x70ull);
> +
> +	rc = genwqe_pci_fundamental_reset(pci_dev);
> +	if (rc) {
> +		/*
> +		 * A fundamental reset failure can be caused
> +		 * by lack of support on the arch, so we just
> +		 * log the error and try to start the card
> +		 * again.
> +		 */
> +		dev_err(&pci_dev->dev,
> +			"[%s] err: failed to reset card for bitstream reload\n",
> +			__func__);
> +	}
> +
> +	rc = genwqe_start(cd);
> +	if (rc) {
> +		dev_err(&pci_dev->dev,
> +			"[%s] err: cannot start card services! (err=%d)\n",
> +			__func__, rc);
> +		return rc;
> +	}
> +	dev_info(&pci_dev->dev,
> +		 "[%s] card reloaded\n", __func__);
> +	return 0;
> +}
> +
> +
> +/**
>   * genwqe_health_thread() - Health checking thread
>   *
>   * This thread is only started for the PF of the card.
> @@ -846,6 +929,13 @@ static int genwqe_health_thread(void *data)
>  			}
>  		}
> 
> +		if (cd->card_state == GENWQE_CARD_RELOAD_BITSTREAM) {
> +			/* Userspace requested card bitstream reload */
> +			rc = genwqe_reload_bistream(cd);
> +			if (rc)
> +				goto fatal_error;
> +		}
> +
>  		cd->last_gfir = gfir;
>  		cond_resched();
>  	}
> diff --git a/drivers/misc/genwqe/card_sysfs.c b/drivers/misc/genwqe/card_sysfs.c
> index a72a992..7232e40 100644
> --- a/drivers/misc/genwqe/card_sysfs.c
> +++ b/drivers/misc/genwqe/card_sysfs.c
> @@ -223,6 +223,30 @@ static ssize_t next_bitstream_store(struct device *dev,
>  }
>  static DEVICE_ATTR_RW(next_bitstream);
> 
> +static ssize_t reload_bitstream_store(struct device *dev,
> +				struct device_attribute *attr,
> +				const char *buf, size_t count)
> +{
> +	int reload;
> +	struct genwqe_dev *cd = dev_get_drvdata(dev);
> +
> +	if (kstrtoint(buf, 0, &reload) < 0)
> +		return -EINVAL;
> +
> +	if (reload == 0x1) {
> +		if (cd->card_state == GENWQE_CARD_UNUSED ||
> +		    cd->card_state == GENWQE_CARD_USED)
> +			cd->card_state = GENWQE_CARD_RELOAD_BITSTREAM;
> +		else
> +			return -EIO;
> +	} else {
> +		return -EINVAL;
> +	}
> +
> +	return count;
> +}
> +static DEVICE_ATTR_WO(reload_bitstream);
> +
>  /*
>   * Create device_attribute structures / params: name, mode, show, store
>   * additional flag if valid in VF
> @@ -239,6 +263,7 @@ static struct attribute *genwqe_attributes[] = {
>  	&dev_attr_status.attr,
>  	&dev_attr_freerunning_timer.attr,
>  	&dev_attr_queue_working_time.attr,
> +	&dev_attr_reload_bitstream.attr,
>  	NULL,
>  };
> 
> diff --git a/include/uapi/linux/genwqe/genwqe_card.h b/include/uapi/linux/genwqe/genwqe_card.h
> index 795e957..4fc065f 100644
> --- a/include/uapi/linux/genwqe/genwqe_card.h
> +++ b/include/uapi/linux/genwqe/genwqe_card.h
> @@ -328,6 +328,7 @@ enum genwqe_card_state {
>  	GENWQE_CARD_UNUSED = 0,
>  	GENWQE_CARD_USED = 1,
>  	GENWQE_CARD_FATAL_ERROR = 2,
> +	GENWQE_CARD_RELOAD_BITSTREAM = 3,
>  	GENWQE_CARD_STATE_MAX,
>  };
> 

Thanks for contributing those additions to our driver.

Acked-by: Frank Haverkamp <haver@linux.vnet.ibm.com>


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/4] GenWQE: Add support for EEH error recovery
  2014-06-04 13:57 ` [PATCH 2/4] GenWQE: Add support for EEH error recovery Kleber Sacilotto de Souza
@ 2014-06-04 14:06   ` Frank Haverkamp
  0 siblings, 0 replies; 15+ messages in thread
From: Frank Haverkamp @ 2014-06-04 14:06 UTC (permalink / raw)
  To: Kleber Sacilotto de Souza; +Cc: linux-kernel, gregkh

Am Mittwoch, den 04.06.2014, 10:57 -0300 schrieb Kleber Sacilotto de
Souza:
> This patch implements the callbacks and functions necessary to have EEH
> recovery support.
> 
> It adds a config option to enable or disable explicit calls to trigger
> platform specific mechanisms on error recovery paths. This option is
> enabled by default only on PPC64 systems and can be overritten via
> debugfs. If this option is enabled, on the error recovery path the
> driver will call pci_channel_offline() to check for error condition and
> issue non-raw MMIO reads to trigger early EEH detection in case of
> hardware failures. This is necessary since the driver MMIO helper
> funtions use raw accessors.
> 
> Signed-off-by: Kleber Sacilotto de Souza <klebers@linux.vnet.ibm.com>
> ---
>  drivers/misc/genwqe/Kconfig        |    6 +++
>  drivers/misc/genwqe/card_base.c    |   79 ++++++++++++++++++++++++++++++------
>  drivers/misc/genwqe/card_base.h    |    2 +
>  drivers/misc/genwqe/card_ddcb.c    |   24 +++++++++--
>  drivers/misc/genwqe/card_debugfs.c |    7 +++
>  drivers/misc/genwqe/card_dev.c     |    5 ++
>  drivers/misc/genwqe/card_utils.c   |   10 +++++
>  7 files changed, 115 insertions(+), 18 deletions(-)
> 
> diff --git a/drivers/misc/genwqe/Kconfig b/drivers/misc/genwqe/Kconfig
> index 6069d8c..4c0a033 100644
> --- a/drivers/misc/genwqe/Kconfig
> +++ b/drivers/misc/genwqe/Kconfig
> @@ -11,3 +11,9 @@ menuconfig GENWQE
>  	  Enables PCIe card driver for IBM GenWQE accelerators.
>  	  The user-space interface is described in
>  	  include/linux/genwqe/genwqe_card.h.
> +
> +config GENWQE_PLATFORM_ERROR_RECOVERY
> +	int "Use platform recovery procedures (0=off, 1=on)"
> +	depends on GENWQE
> +	default 1 if PPC64
> +	default 0
> diff --git a/drivers/misc/genwqe/card_base.c b/drivers/misc/genwqe/card_base.c
> index e6cc3e1..87ebaba 100644
> --- a/drivers/misc/genwqe/card_base.c
> +++ b/drivers/misc/genwqe/card_base.c
> @@ -140,6 +140,12 @@ static struct genwqe_dev *genwqe_dev_alloc(void)
>  	cd->class_genwqe = class_genwqe;
>  	cd->debugfs_genwqe = debugfs_genwqe;
> 
> +	/*
> +	 * This comes from kernel config option and can be overritten via
> +	 * debugfs.
> +	 */
> +	cd->use_platform_recovery = CONFIG_GENWQE_PLATFORM_ERROR_RECOVERY;
> +
>  	init_waitqueue_head(&cd->queue_waitq);
> 
>  	spin_lock_init(&cd->file_lock);
> @@ -943,6 +949,19 @@ static int genwqe_health_thread(void *data)
>  	return 0;
> 
>   fatal_error:
> +	if (cd->use_platform_recovery) {
> +		/*
> +		 * Since we use raw accessors, EEH errors won't be detected
> +		 * by the platform until we do a non-raw MMIO or config space
> +		 * read
> +		 */
> +		readq(cd->mmio + IO_SLC_CFGREG_GFIR);
> +
> +		/* We do nothing if the card is going over PCI recovery */
> +		if (pci_channel_offline(pci_dev))
> +			return -EIO;
> +	}
> +
>  	dev_err(&pci_dev->dev,
>  		"[%s] card unusable. Please trigger unbind!\n", __func__);
> 
> @@ -1048,6 +1067,9 @@ static int genwqe_pci_setup(struct genwqe_dev *cd)
>  	pci_set_master(pci_dev);
>  	pci_enable_pcie_error_reporting(pci_dev);
> 
> +	/* EEH recovery requires PCIe fundamental reset */
> +	pci_dev->needs_freset = 1;
> +
>  	/* request complete BAR-0 space (length = 0) */
>  	cd->mmio_len = pci_resource_len(pci_dev, 0);
>  	cd->mmio = pci_iomap(pci_dev, 0, 0);
> @@ -1186,23 +1208,40 @@ static pci_ers_result_t genwqe_err_error_detected(struct pci_dev *pci_dev,
> 
>  	dev_err(&pci_dev->dev, "[%s] state=%d\n", __func__, state);
> 
> -	if (pci_dev == NULL)
> -		return PCI_ERS_RESULT_NEED_RESET;
> -
>  	cd = dev_get_drvdata(&pci_dev->dev);
>  	if (cd == NULL)
> -		return PCI_ERS_RESULT_NEED_RESET;
> +		return PCI_ERS_RESULT_DISCONNECT;
> 
> -	switch (state) {
> -	case pci_channel_io_normal:
> -		return PCI_ERS_RESULT_CAN_RECOVER;
> -	case pci_channel_io_frozen:
> -		return PCI_ERS_RESULT_NEED_RESET;
> -	case pci_channel_io_perm_failure:
> +	/* Stop the card */
> +	genwqe_health_check_stop(cd);
> +	genwqe_stop(cd);
> +
> +	/*
> +	 * On permanent failure, the PCI code will call device remove
> +	 * after the return of this function.
> +	 * genwqe_stop() can be called twice.
> +	 */
> +	if (state == pci_channel_io_perm_failure) {
>  		return PCI_ERS_RESULT_DISCONNECT;
> +	} else {
> +		genwqe_pci_remove(cd);
> +		return PCI_ERS_RESULT_NEED_RESET;
>  	}
> +}
> +
> +static pci_ers_result_t genwqe_err_slot_reset(struct pci_dev *pci_dev)
> +{
> +	int rc;
> +	struct genwqe_dev *cd = dev_get_drvdata(&pci_dev->dev);
> 
> -	return PCI_ERS_RESULT_NEED_RESET;
> +	rc = genwqe_pci_setup(cd);
> +	if (!rc) {
> +		return PCI_ERS_RESULT_RECOVERED;
> +	} else {
> +		dev_err(&pci_dev->dev,
> +			"err: problems with PCI setup (err=%d)\n", rc);
> +		return PCI_ERS_RESULT_DISCONNECT;
> +	}
>  }
> 
>  static pci_ers_result_t genwqe_err_result_none(struct pci_dev *dev)
> @@ -1210,8 +1249,22 @@ static pci_ers_result_t genwqe_err_result_none(struct pci_dev *dev)
>  	return PCI_ERS_RESULT_NONE;
>  }
> 
> -static void genwqe_err_resume(struct pci_dev *dev)
> +static void genwqe_err_resume(struct pci_dev *pci_dev)
>  {
> +	int rc;
> +	struct genwqe_dev *cd = dev_get_drvdata(&pci_dev->dev);
> +
> +	rc = genwqe_start(cd);
> +	if (!rc) {
> +		rc = genwqe_health_check_start(cd);
> +		if (rc)
> +			dev_err(&pci_dev->dev,
> +				"err: cannot start health checking! (err=%d)\n",
> +				rc);
> +	} else {
> +		dev_err(&pci_dev->dev,
> +			"err: cannot start card services! (err=%d)\n", rc);
> +	}
>  }
> 
>  static int genwqe_sriov_configure(struct pci_dev *dev, int numvfs)
> @@ -1234,7 +1287,7 @@ static struct pci_error_handlers genwqe_err_handler = {
>  	.error_detected = genwqe_err_error_detected,
>  	.mmio_enabled	= genwqe_err_result_none,
>  	.link_reset	= genwqe_err_result_none,
> -	.slot_reset	= genwqe_err_result_none,
> +	.slot_reset	= genwqe_err_slot_reset,
>  	.resume		= genwqe_err_resume,
>  };
> 
> diff --git a/drivers/misc/genwqe/card_base.h b/drivers/misc/genwqe/card_base.h
> index 0e608a2..67abd8c 100644
> --- a/drivers/misc/genwqe/card_base.h
> +++ b/drivers/misc/genwqe/card_base.h
> @@ -291,6 +291,8 @@ struct genwqe_dev {
>  	struct task_struct *health_thread;
>  	wait_queue_head_t health_waitq;
> 
> +	int use_platform_recovery;	/* use platform recovery mechanisms */
> +
>  	/* char device */
>  	dev_t  devnum_genwqe;		/* major/minor num card */
>  	struct class *class_genwqe;	/* reference to class object */
> diff --git a/drivers/misc/genwqe/card_ddcb.c b/drivers/misc/genwqe/card_ddcb.c
> index c8046db..f0de615 100644
> --- a/drivers/misc/genwqe/card_ddcb.c
> +++ b/drivers/misc/genwqe/card_ddcb.c
> @@ -1118,7 +1118,21 @@ static irqreturn_t genwqe_pf_isr(int irq, void *dev_id)
>  	 * safer, but slower for the good-case ... See above.
>  	 */
>  	gfir = __genwqe_readq(cd, IO_SLC_CFGREG_GFIR);
> -	if ((gfir & GFIR_ERR_TRIGGER) != 0x0) {
> +	if (((gfir & GFIR_ERR_TRIGGER) != 0x0) &&
> +	    !pci_channel_offline(pci_dev)) {
> +
> +		if (cd->use_platform_recovery) {
> +			/*
> +			 * Since we use raw accessors, EEH errors won't be
> +			 * detected by the platform until we do a non-raw
> +			 * MMIO or config space read
> +			 */
> +			readq(cd->mmio + IO_SLC_CFGREG_GFIR);
> +
> +			/* Don't do anything if the PCI channel is frozen */
> +			if (pci_channel_offline(pci_dev))
> +				goto exit;
> +		}
> 
>  		wake_up_interruptible(&cd->health_waitq);
> 
> @@ -1126,12 +1140,12 @@ static irqreturn_t genwqe_pf_isr(int irq, void *dev_id)
>  		 * By default GFIRs causes recovery actions. This
>  		 * count is just for debug when recovery is masked.
>  		 */
> -		printk_ratelimited(KERN_ERR
> -				   "%s %s: [%s] GFIR=%016llx\n",
> -				   GENWQE_DEVNAME, dev_name(&pci_dev->dev),
> -				   __func__, gfir);
> +		dev_err_ratelimited(&pci_dev->dev,
> +				    "[%s] GFIR=%016llx\n",
> +				    __func__, gfir);
>  	}
> 
> + exit:
>  	return IRQ_HANDLED;
>  }
> 
> diff --git a/drivers/misc/genwqe/card_debugfs.c b/drivers/misc/genwqe/card_debugfs.c
> index 50d2096..2171915 100644
> --- a/drivers/misc/genwqe/card_debugfs.c
> +++ b/drivers/misc/genwqe/card_debugfs.c
> @@ -485,6 +485,13 @@ int genwqe_init_debugfs(struct genwqe_dev *cd)
>  		goto err1;
>  	}
> 
> +	file = debugfs_create_u32("use_platform_recovery", 0666, root,
> +				  &cd->use_platform_recovery);
> +	if (!file) {
> +		ret = -ENOMEM;
> +		goto err1;
> +	}
> +
>  	cd->debugfs_root = root;
>  	return 0;
>  err1:
> diff --git a/drivers/misc/genwqe/card_dev.c b/drivers/misc/genwqe/card_dev.c
> index 1d2f163..aae4255 100644
> --- a/drivers/misc/genwqe/card_dev.c
> +++ b/drivers/misc/genwqe/card_dev.c
> @@ -1048,10 +1048,15 @@ static long genwqe_ioctl(struct file *filp, unsigned int cmd,
>  	int rc = 0;
>  	struct genwqe_file *cfile = (struct genwqe_file *)filp->private_data;
>  	struct genwqe_dev *cd = cfile->cd;
> +	struct pci_dev *pci_dev = cd->pci_dev;
>  	struct genwqe_reg_io __user *io;
>  	u64 val;
>  	u32 reg_offs;
> 
> +	/* Return -EIO if card hit EEH */
> +	if (pci_channel_offline(pci_dev))
> +		return -EIO;
> +
>  	if (_IOC_TYPE(cmd) != GENWQE_IOC_CODE)
>  		return -EINVAL;
> 
> diff --git a/drivers/misc/genwqe/card_utils.c b/drivers/misc/genwqe/card_utils.c
> index d049d27..61846dd 100644
> --- a/drivers/misc/genwqe/card_utils.c
> +++ b/drivers/misc/genwqe/card_utils.c
> @@ -53,12 +53,17 @@
>   */
>  int __genwqe_writeq(struct genwqe_dev *cd, u64 byte_offs, u64 val)
>  {
> +	struct pci_dev *pci_dev = cd->pci_dev;
> +
>  	if (cd->err_inject & GENWQE_INJECT_HARDWARE_FAILURE)
>  		return -EIO;
> 
>  	if (cd->mmio == NULL)
>  		return -EIO;
> 
> +	if (pci_channel_offline(pci_dev))
> +		return -EIO;
> +
>  	__raw_writeq((__force u64)cpu_to_be64(val), cd->mmio + byte_offs);
>  	return 0;
>  }
> @@ -99,12 +104,17 @@ u64 __genwqe_readq(struct genwqe_dev *cd, u64 byte_offs)
>   */
>  int __genwqe_writel(struct genwqe_dev *cd, u64 byte_offs, u32 val)
>  {
> +	struct pci_dev *pci_dev = cd->pci_dev;
> +
>  	if (cd->err_inject & GENWQE_INJECT_HARDWARE_FAILURE)
>  		return -EIO;
> 
>  	if (cd->mmio == NULL)
>  		return -EIO;
> 
> +	if (pci_channel_offline(pci_dev))
> +		return -EIO;
> +
>  	__raw_writel((__force u32)cpu_to_be32(val), cd->mmio + byte_offs);
>  	return 0;
>  }

Thanks for contributing those additions to our driver.

Acked-by: Frank Haverkamp <haver@linux.vnet.ibm.com>


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 3/4] GenWQE: Improve hardware error recovery
  2014-06-04 13:57 ` [PATCH 3/4] GenWQE: Improve hardware " Kleber Sacilotto de Souza
@ 2014-06-04 14:06   ` Frank Haverkamp
  0 siblings, 0 replies; 15+ messages in thread
From: Frank Haverkamp @ 2014-06-04 14:06 UTC (permalink / raw)
  To: Kleber Sacilotto de Souza; +Cc: linux-kernel, gregkh

Am Mittwoch, den 04.06.2014, 10:57 -0300 schrieb Kleber Sacilotto de
Souza:
> Currently, in the event of a fatal hardware error, the driver tries a
> recovery procedure that calls pci_reset_function() to reset the card.
> This is not sufficient in some cases, needing a fundamental reset to
> bring the card back.
> 
> This patch implements a call to the platform fundamental reset procedure
> on the error recovery path if GENWQE_PLATFORM_ERROR_RECOVERY is enabled.
> This is implemented by default only on PPC64, since this can cause
> problems on other archs, e.g. zSeries, where the platform has its own
> recovery procedures, leading to a potencial race conditition. For these
> cases, the recovery is kept as it was before.
> 
> Signed-off-by: Kleber Sacilotto de Souza <klebers@linux.vnet.ibm.com>
> ---
>  drivers/misc/genwqe/card_base.c |   45 +++++++++++++++++++++++++++++++++++++++
>  1 files changed, 45 insertions(+), 0 deletions(-)
> 
> diff --git a/drivers/misc/genwqe/card_base.c b/drivers/misc/genwqe/card_base.c
> index 87ebaba..abb7961 100644
> --- a/drivers/misc/genwqe/card_base.c
> +++ b/drivers/misc/genwqe/card_base.c
> @@ -797,6 +797,41 @@ static int genwqe_pci_fundamental_reset(struct pci_dev *pci_dev)
>  	return rc;
>  }
> 
> +
> +static int genwqe_platform_recovery(struct genwqe_dev *cd)
> +{
> +	struct pci_dev *pci_dev = cd->pci_dev;
> +	int rc;
> +
> +	dev_info(&pci_dev->dev,
> +		 "[%s] resetting card for error recovery\n", __func__);
> +
> +	/* Clear out error injection flags */
> +	cd->err_inject &= ~(GENWQE_INJECT_HARDWARE_FAILURE |
> +			    GENWQE_INJECT_GFIR_FATAL |
> +			    GENWQE_INJECT_GFIR_INFO);
> +
> +	genwqe_stop(cd);
> +
> +	/* Try recoverying the card with fundamental reset */
> +	rc = genwqe_pci_fundamental_reset(pci_dev);
> +	if (!rc) {
> +		rc = genwqe_start(cd);
> +		if (!rc)
> +			dev_info(&pci_dev->dev,
> +				 "[%s] card recovered\n", __func__);
> +		else
> +			dev_err(&pci_dev->dev,
> +				"[%s] err: cannot start card services! (err=%d)\n",
> +				__func__, rc);
> +	} else {
> +		dev_err(&pci_dev->dev,
> +			"[%s] card reset failed\n", __func__);
> +	}
> +
> +	return rc;
> +}
> +
>  /*
>   * genwqe_reload_bistream() - reload card bitstream
>   *
> @@ -875,6 +910,7 @@ static int genwqe_health_thread(void *data)
>  	struct pci_dev *pci_dev = cd->pci_dev;
>  	u64 gfir, gfir_masked, slu_unitcfg, app_unitcfg;
> 
> + health_thread_begin:
>  	while (!kthread_should_stop()) {
>  		rc = wait_event_interruptible_timeout(cd->health_waitq,
>  			 (genwqe_health_check_cond(cd, &gfir) ||
> @@ -960,6 +996,15 @@ static int genwqe_health_thread(void *data)
>  		/* We do nothing if the card is going over PCI recovery */
>  		if (pci_channel_offline(pci_dev))
>  			return -EIO;
> +
> +		/*
> +		 * If it's supported by the platform, we try a fundamental reset
> +		 * to recover from a fatal error. Otherwise, we continue to wait
> +		 * for an external recovery procedure to take care of it.
> +		 */
> +		rc = genwqe_platform_recovery(cd);
> +		if (!rc)
> +			goto health_thread_begin;
>  	}
> 
>  	dev_err(&pci_dev->dev,

Thanks for contributing those additions to our driver.

Acked-by: Frank Haverkamp <haver@linux.vnet.ibm.com>


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 4/4] GenWQE: Increase driver version number
  2014-06-04 13:57 ` [PATCH 4/4] GenWQE: Increase driver version number Kleber Sacilotto de Souza
@ 2014-06-04 14:06   ` Frank Haverkamp
  2014-06-04 15:54   ` Greg KH
  1 sibling, 0 replies; 15+ messages in thread
From: Frank Haverkamp @ 2014-06-04 14:06 UTC (permalink / raw)
  To: Kleber Sacilotto de Souza; +Cc: linux-kernel, gregkh

Am Mittwoch, den 04.06.2014, 10:57 -0300 schrieb Kleber Sacilotto de
Souza:
> Increase genwqe driver version number.
> 
> Signed-off-by: Kleber Sacilotto de Souza <klebers@linux.vnet.ibm.com>
> ---
>  drivers/misc/genwqe/genwqe_driver.h |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
> 
> diff --git a/drivers/misc/genwqe/genwqe_driver.h b/drivers/misc/genwqe/genwqe_driver.h
> index cd52631..a506e9a 100644
> --- a/drivers/misc/genwqe/genwqe_driver.h
> +++ b/drivers/misc/genwqe/genwqe_driver.h
> @@ -36,7 +36,7 @@
>  #include <asm/byteorder.h>
>  #include <linux/genwqe/genwqe_card.h>
> 
> -#define DRV_VERS_STRING		"2.0.15"
> +#define DRV_VERS_STRING		"2.0.21"
> 
>  /*
>   * Static minor number assignement, until we decide/implement

Thanks for contributing those additions to our driver.

Acked-by: Frank Haverkamp <haver@linux.vnet.ibm.com>


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 4/4] GenWQE: Increase driver version number
  2014-06-04 13:57 ` [PATCH 4/4] GenWQE: Increase driver version number Kleber Sacilotto de Souza
  2014-06-04 14:06   ` Frank Haverkamp
@ 2014-06-04 15:54   ` Greg KH
  2014-06-05  9:23     ` Frank Haverkamp
  1 sibling, 1 reply; 15+ messages in thread
From: Greg KH @ 2014-06-04 15:54 UTC (permalink / raw)
  To: Kleber Sacilotto de Souza; +Cc: linux-kernel, haver

On Wed, Jun 04, 2014 at 10:57:53AM -0300, Kleber Sacilotto de Souza wrote:
> Increase genwqe driver version number.
> 
> Signed-off-by: Kleber Sacilotto de Souza <klebers@linux.vnet.ibm.com>
> ---
>  drivers/misc/genwqe/genwqe_driver.h |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
> 
> diff --git a/drivers/misc/genwqe/genwqe_driver.h b/drivers/misc/genwqe/genwqe_driver.h
> index cd52631..a506e9a 100644
> --- a/drivers/misc/genwqe/genwqe_driver.h
> +++ b/drivers/misc/genwqe/genwqe_driver.h
> @@ -36,7 +36,7 @@
>  #include <asm/byteorder.h>
>  #include <linux/genwqe/genwqe_card.h>
>  
> -#define DRV_VERS_STRING		"2.0.15"
> +#define DRV_VERS_STRING		"2.0.21"

Why is this even needed?  Can't you go off of the kernel version number
now?  Who needs / wants this number?

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 4/4] GenWQE: Increase driver version number
  2014-06-04 15:54   ` Greg KH
@ 2014-06-05  9:23     ` Frank Haverkamp
  2014-06-05 16:00       ` Greg KH
  0 siblings, 1 reply; 15+ messages in thread
From: Frank Haverkamp @ 2014-06-05  9:23 UTC (permalink / raw)
  To: Greg KH; +Cc: Kleber Sacilotto de Souza, linux-kernel

Hi Greg,

Am Mittwoch, den 04.06.2014, 08:54 -0700 schrieb Greg KH:
> On Wed, Jun 04, 2014 at 10:57:53AM -0300, Kleber Sacilotto de Souza wrote:
> > Increase genwqe driver version number.
> > 
> > Signed-off-by: Kleber Sacilotto de Souza <klebers@linux.vnet.ibm.com>
> > ---
> >  drivers/misc/genwqe/genwqe_driver.h |    2 +-
> >  1 files changed, 1 insertions(+), 1 deletions(-)
> > 
> > diff --git a/drivers/misc/genwqe/genwqe_driver.h b/drivers/misc/genwqe/genwqe_driver.h
> > index cd52631..a506e9a 100644
> > --- a/drivers/misc/genwqe/genwqe_driver.h
> > +++ b/drivers/misc/genwqe/genwqe_driver.h
> > @@ -36,7 +36,7 @@
> >  #include <asm/byteorder.h>
> >  #include <linux/genwqe/genwqe_card.h>
> >  
> > -#define DRV_VERS_STRING		"2.0.15"
> > +#define DRV_VERS_STRING		"2.0.21"
> 
> Why is this even needed?  Can't you go off of the kernel version number
> now?  Who needs / wants this number?

I am aware that if just considering the mainline kernels, we could use
the kernel version itself for the purpose of identifying which code we
are running.

But in our lab we are running multiple back-ported versions of this
driver on different Linux distributions using different kernel versions.
Our user-space software needs to know if the driver has or has not
bug-fixes or features. For this purpose, we are using this extra number.

> thanks,
> 
> greg k-h
> 

Regards

Frank


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 4/4] GenWQE: Increase driver version number
  2014-06-05  9:23     ` Frank Haverkamp
@ 2014-06-05 16:00       ` Greg KH
  2014-06-06 12:07         ` Frank Haverkamp
  0 siblings, 1 reply; 15+ messages in thread
From: Greg KH @ 2014-06-05 16:00 UTC (permalink / raw)
  To: Frank Haverkamp; +Cc: Kleber Sacilotto de Souza, linux-kernel

On Thu, Jun 05, 2014 at 11:23:04AM +0200, Frank Haverkamp wrote:
> Hi Greg,
> 
> Am Mittwoch, den 04.06.2014, 08:54 -0700 schrieb Greg KH:
> > On Wed, Jun 04, 2014 at 10:57:53AM -0300, Kleber Sacilotto de Souza wrote:
> > > Increase genwqe driver version number.
> > > 
> > > Signed-off-by: Kleber Sacilotto de Souza <klebers@linux.vnet.ibm.com>
> > > ---
> > >  drivers/misc/genwqe/genwqe_driver.h |    2 +-
> > >  1 files changed, 1 insertions(+), 1 deletions(-)
> > > 
> > > diff --git a/drivers/misc/genwqe/genwqe_driver.h b/drivers/misc/genwqe/genwqe_driver.h
> > > index cd52631..a506e9a 100644
> > > --- a/drivers/misc/genwqe/genwqe_driver.h
> > > +++ b/drivers/misc/genwqe/genwqe_driver.h
> > > @@ -36,7 +36,7 @@
> > >  #include <asm/byteorder.h>
> > >  #include <linux/genwqe/genwqe_card.h>
> > >  
> > > -#define DRV_VERS_STRING		"2.0.15"
> > > +#define DRV_VERS_STRING		"2.0.21"
> > 
> > Why is this even needed?  Can't you go off of the kernel version number
> > now?  Who needs / wants this number?
> 
> I am aware that if just considering the mainline kernels, we could use
> the kernel version itself for the purpose of identifying which code we
> are running.

Which is what you are patching here :)

> But in our lab we are running multiple back-ported versions of this
> driver on different Linux distributions using different kernel versions.

Then deal with that in the backported code, the upstream kernel doesn't
care about this.

> Our user-space software needs to know if the driver has or has not
> bug-fixes or features. For this purpose, we are using this extra number.

Why would you rely on a version number for this, shouldn't you be able
to tell with your api what features are present?

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 4/4] GenWQE: Increase driver version number
  2014-06-05 16:00       ` Greg KH
@ 2014-06-06 12:07         ` Frank Haverkamp
  2014-06-06 12:40           ` Frank Haverkamp
  0 siblings, 1 reply; 15+ messages in thread
From: Frank Haverkamp @ 2014-06-06 12:07 UTC (permalink / raw)
  To: Greg KH; +Cc: Kleber Sacilotto de Souza, linux-kernel, haver

Hi Greg,

Am Donnerstag, den 05.06.2014, 09:00 -0700 schrieb Greg KH:
> On Thu, Jun 05, 2014 at 11:23:04AM +0200, Frank Haverkamp wrote:
> > Hi Greg,
> > 
> > Am Mittwoch, den 04.06.2014, 08:54 -0700 schrieb Greg KH:
> > > On Wed, Jun 04, 2014 at 10:57:53AM -0300, Kleber Sacilotto de Souza wrote:
> > > > Increase genwqe driver version number.
> > > > 
> > > > Signed-off-by: Kleber Sacilotto de Souza <klebers@linux.vnet.ibm.com>
> > > > ---
> > > >  drivers/misc/genwqe/genwqe_driver.h |    2 +-
> > > >  1 files changed, 1 insertions(+), 1 deletions(-)
> > > > 
> > > > diff --git a/drivers/misc/genwqe/genwqe_driver.h b/drivers/misc/genwqe/genwqe_driver.h
> > > > index cd52631..a506e9a 100644
> > > > --- a/drivers/misc/genwqe/genwqe_driver.h
> > > > +++ b/drivers/misc/genwqe/genwqe_driver.h
> > > > @@ -36,7 +36,7 @@
> > > >  #include <asm/byteorder.h>
> > > >  #include <linux/genwqe/genwqe_card.h>
> > > >  
> > > > -#define DRV_VERS_STRING		"2.0.15"
> > > > +#define DRV_VERS_STRING		"2.0.21"
> > > 
> > > Why is this even needed?  Can't you go off of the kernel version number
> > > now?  Who needs / wants this number?
> > 
> > I am aware that if just considering the mainline kernels, we could use
> > the kernel version itself for the purpose of identifying which code we
> > are running.
> 
> Which is what you are patching here :)
> 
> > But in our lab we are running multiple back-ported versions of this
> > driver on different Linux distributions using different kernel versions.
> 
> Then deal with that in the backported code, the upstream kernel doesn't
> care about this.
> 
> > Our user-space software needs to know if the driver has or has not
> > bug-fixes or features. For this purpose, we are using this extra number.
> 
> Why would you rely on a version number for this, shouldn't you be able
> to tell with your api what features are present?

For version "2.0.15" there is no automatic recovery for certain
problems, for "2.0.21" there is.

I personally use the driver versions sysfs entry if a user complains
that something e.g. the recovery is not working right. First thing I am
asking for is the version of the code/driver  as part of the debug data.
If that is not matching my expectations, I will tell the user to update
the code.

In the current example new applications could more gracefully handle
failing recovery scenarios by knowing that the old version of the code
cannot properly handle certain problems. And it could this without
knowing if it is using a driver which is in the mainline tree or if it
is a back-ported version.

Therefore I find it much more convenient for us to handle such things
and I would kindly ask you to accept patch 4/4.

> thanks,
> 
> greg k-h

Thanks

Frank


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 4/4] GenWQE: Increase driver version number
  2014-06-06 12:07         ` Frank Haverkamp
@ 2014-06-06 12:40           ` Frank Haverkamp
  2014-07-09 21:13             ` Greg KH
  0 siblings, 1 reply; 15+ messages in thread
From: Frank Haverkamp @ 2014-06-06 12:40 UTC (permalink / raw)
  To: Greg KH; +Cc: Kleber Sacilotto de Souza, linux-kernel, haver, schwidefsky

Hi Greg,

Am Freitag, den 06.06.2014, 14:07 +0200 schrieb Frank Haverkamp:
> Hi Greg,
> 
> Am Donnerstag, den 05.06.2014, 09:00 -0700 schrieb Greg KH:
> > On Thu, Jun 05, 2014 at 11:23:04AM +0200, Frank Haverkamp wrote:
> > > Hi Greg,
> > > 
> > > Am Mittwoch, den 04.06.2014, 08:54 -0700 schrieb Greg KH:
> > > > On Wed, Jun 04, 2014 at 10:57:53AM -0300, Kleber Sacilotto de Souza wrote:
> > > > > Increase genwqe driver version number.
> > > > > 
> > > > > Signed-off-by: Kleber Sacilotto de Souza <klebers@linux.vnet.ibm.com>
> > > > > ---
> > > > >  drivers/misc/genwqe/genwqe_driver.h |    2 +-
> > > > >  1 files changed, 1 insertions(+), 1 deletions(-)
> > > > > 
> > > > > diff --git a/drivers/misc/genwqe/genwqe_driver.h b/drivers/misc/genwqe/genwqe_driver.h
> > > > > index cd52631..a506e9a 100644
> > > > > --- a/drivers/misc/genwqe/genwqe_driver.h
> > > > > +++ b/drivers/misc/genwqe/genwqe_driver.h
> > > > > @@ -36,7 +36,7 @@
> > > > >  #include <asm/byteorder.h>
> > > > >  #include <linux/genwqe/genwqe_card.h>
> > > > >  
> > > > > -#define DRV_VERS_STRING		"2.0.15"
> > > > > +#define DRV_VERS_STRING		"2.0.21"
> > > > 
> > > > Why is this even needed?  Can't you go off of the kernel version number
> > > > now?  Who needs / wants this number?
> > > 
> > > I am aware that if just considering the mainline kernels, we could use
> > > the kernel version itself for the purpose of identifying which code we
> > > are running.
> > 
> > Which is what you are patching here :)
> > 
> > > But in our lab we are running multiple back-ported versions of this
> > > driver on different Linux distributions using different kernel versions.
> > 
> > Then deal with that in the backported code, the upstream kernel doesn't
> > care about this.
> > 
> > > Our user-space software needs to know if the driver has or has not
> > > bug-fixes or features. For this purpose, we are using this extra number.
> > 
> > Why would you rely on a version number for this, shouldn't you be able
> > to tell with your api what features are present?
> 
> For version "2.0.15" there is no automatic recovery for certain
> problems, for "2.0.21" there is.
> 
> I personally use the driver versions sysfs entry if a user complains
> that something e.g. the recovery is not working right. First thing I am
> asking for is the version of the code/driver  as part of the debug data.
> If that is not matching my expectations, I will tell the user to update
> the code.
> 
> In the current example new applications could more gracefully handle
> failing recovery scenarios by knowing that the old version of the code
> cannot properly handle certain problems. And it could this without
> knowing if it is using a driver which is in the mainline tree or if it
> is a back-ported version.
> 
> Therefore I find it much more convenient for us to handle such things
> and I would kindly ask you to accept patch 4/4.
> 

I talked a bit with a colleague about version numbers for kernel
drivers. He pointed me on to the fact that when other people contribute
to our code, they will mostly not alter my version number, which is
certainly ok. That in turn makes it impossible for me to figure out the
exact code version from my own (internal) version number.

So at the end it is the kernel version number or maybe the git checksum
used to build the code what enables me to identify the exact version of
the code. If this be the case, than I wonder, if we should not remove
the "version" sysfs entry entirely. I mean, if you reject the update to
it anyways ;-).

> > thanks,
> > 
> > greg k-h
> 
> Thanks
> 
> Frank
> 

Regards

Frank


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 4/4] GenWQE: Increase driver version number
  2014-06-06 12:40           ` Frank Haverkamp
@ 2014-07-09 21:13             ` Greg KH
  0 siblings, 0 replies; 15+ messages in thread
From: Greg KH @ 2014-07-09 21:13 UTC (permalink / raw)
  To: Frank Haverkamp; +Cc: Kleber Sacilotto de Souza, linux-kernel, schwidefsky

On Fri, Jun 06, 2014 at 02:40:35PM +0200, Frank Haverkamp wrote:
> Hi Greg,
> 
> Am Freitag, den 06.06.2014, 14:07 +0200 schrieb Frank Haverkamp:
> > Hi Greg,
> > 
> > Am Donnerstag, den 05.06.2014, 09:00 -0700 schrieb Greg KH:
> > > On Thu, Jun 05, 2014 at 11:23:04AM +0200, Frank Haverkamp wrote:
> > > > Hi Greg,
> > > > 
> > > > Am Mittwoch, den 04.06.2014, 08:54 -0700 schrieb Greg KH:
> > > > > On Wed, Jun 04, 2014 at 10:57:53AM -0300, Kleber Sacilotto de Souza wrote:
> > > > > > Increase genwqe driver version number.
> > > > > > 
> > > > > > Signed-off-by: Kleber Sacilotto de Souza <klebers@linux.vnet.ibm.com>
> > > > > > ---
> > > > > >  drivers/misc/genwqe/genwqe_driver.h |    2 +-
> > > > > >  1 files changed, 1 insertions(+), 1 deletions(-)
> > > > > > 
> > > > > > diff --git a/drivers/misc/genwqe/genwqe_driver.h b/drivers/misc/genwqe/genwqe_driver.h
> > > > > > index cd52631..a506e9a 100644
> > > > > > --- a/drivers/misc/genwqe/genwqe_driver.h
> > > > > > +++ b/drivers/misc/genwqe/genwqe_driver.h
> > > > > > @@ -36,7 +36,7 @@
> > > > > >  #include <asm/byteorder.h>
> > > > > >  #include <linux/genwqe/genwqe_card.h>
> > > > > >  
> > > > > > -#define DRV_VERS_STRING		"2.0.15"
> > > > > > +#define DRV_VERS_STRING		"2.0.21"
> > > > > 
> > > > > Why is this even needed?  Can't you go off of the kernel version number
> > > > > now?  Who needs / wants this number?
> > > > 
> > > > I am aware that if just considering the mainline kernels, we could use
> > > > the kernel version itself for the purpose of identifying which code we
> > > > are running.
> > > 
> > > Which is what you are patching here :)
> > > 
> > > > But in our lab we are running multiple back-ported versions of this
> > > > driver on different Linux distributions using different kernel versions.
> > > 
> > > Then deal with that in the backported code, the upstream kernel doesn't
> > > care about this.
> > > 
> > > > Our user-space software needs to know if the driver has or has not
> > > > bug-fixes or features. For this purpose, we are using this extra number.
> > > 
> > > Why would you rely on a version number for this, shouldn't you be able
> > > to tell with your api what features are present?
> > 
> > For version "2.0.15" there is no automatic recovery for certain
> > problems, for "2.0.21" there is.
> > 
> > I personally use the driver versions sysfs entry if a user complains
> > that something e.g. the recovery is not working right. First thing I am
> > asking for is the version of the code/driver  as part of the debug data.
> > If that is not matching my expectations, I will tell the user to update
> > the code.
> > 
> > In the current example new applications could more gracefully handle
> > failing recovery scenarios by knowing that the old version of the code
> > cannot properly handle certain problems. And it could this without
> > knowing if it is using a driver which is in the mainline tree or if it
> > is a back-ported version.
> > 
> > Therefore I find it much more convenient for us to handle such things
> > and I would kindly ask you to accept patch 4/4.
> > 
> 
> I talked a bit with a colleague about version numbers for kernel
> drivers. He pointed me on to the fact that when other people contribute
> to our code, they will mostly not alter my version number, which is
> certainly ok. That in turn makes it impossible for me to figure out the
> exact code version from my own (internal) version number.
> 
> So at the end it is the kernel version number or maybe the git checksum
> used to build the code what enables me to identify the exact version of
> the code. If this be the case, than I wonder, if we should not remove
> the "version" sysfs entry entirely. I mean, if you reject the update to
> it anyways ;-).

I suggest just removing the version entirely.  I'll take this patch, but
can you send a follow-on one removing it?

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2014-07-09 21:09 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-06-04 13:57 [PATCH 0/4] GenWQE: patches to improve RAS features Kleber Sacilotto de Souza
2014-06-04 13:57 ` [PATCH 1/4] GenWQE: Add sysfs interface for bitstream reload Kleber Sacilotto de Souza
2014-06-04 14:05   ` Frank Haverkamp
2014-06-04 13:57 ` [PATCH 2/4] GenWQE: Add support for EEH error recovery Kleber Sacilotto de Souza
2014-06-04 14:06   ` Frank Haverkamp
2014-06-04 13:57 ` [PATCH 3/4] GenWQE: Improve hardware " Kleber Sacilotto de Souza
2014-06-04 14:06   ` Frank Haverkamp
2014-06-04 13:57 ` [PATCH 4/4] GenWQE: Increase driver version number Kleber Sacilotto de Souza
2014-06-04 14:06   ` Frank Haverkamp
2014-06-04 15:54   ` Greg KH
2014-06-05  9:23     ` Frank Haverkamp
2014-06-05 16:00       ` Greg KH
2014-06-06 12:07         ` Frank Haverkamp
2014-06-06 12:40           ` Frank Haverkamp
2014-07-09 21:13             ` Greg KH

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.